eess.IV - 2023-10-24

A Comparative Study of Variational Autoencoders, Normalizing Flows, and Score-based Diffusion Models for Electrical Impedance Tomography

  • paper_url: http://arxiv.org/abs/2310.15831
  • repo_url: https://github.com/adahfbch/dgm-eit
  • paper_authors: Huihui Wang, Guixian Xu, Qingping Zhou
  • for: This study aims to investigate the potential of deep generative models (DGMs) in learning implicit regularizers for Electrical Impedance Tomography (EIT) imaging.
  • methods: The study uses three DGMs - variational autoencoder networks, normalizing flow, and score-based diffusion model - to learn implicit regularizers for EIT inverse problems.
  • results: The study shows that no single method consistently outperforms the others across all settings, and the conditional normalizing flow model (CNF) exhibits the best generalization in low-level noise, while the conditional score-based diffusion model (CSD*) demonstrates the best generalization in high-level noise settings.
    Abstract Electrical Impedance Tomography (EIT) is a widely employed imaging technique in industrial inspection, geophysical prospecting, and medical imaging. However, the inherent nonlinearity and ill-posedness of EIT image reconstruction present challenges for classical regularization techniques, such as the critical selection of regularization terms and the lack of prior knowledge. Deep generative models (DGMs) have been shown to play a crucial role in learning implicit regularizers and prior knowledge. This study aims to investigate the potential of three DGMs-variational autoencoder networks, normalizing flow, and score-based diffusion model-to learn implicit regularizers in learning-based EIT imaging. We first introduce background information on EIT imaging and its inverse problem formulation. Next, we propose three algorithms for performing EIT inverse problems based on corresponding DGMs. Finally, we present numerical and visual experiments, which reveal that (1) no single method consistently outperforms the others across all settings, and (2) when reconstructing an object with 2 anomalies using a well-trained model based on a training dataset containing 4 anomalies, the conditional normalizing flow model (CNF) exhibits the best generalization in low-level noise, while the conditional score-based diffusion model (CSD*) demonstrates the best generalization in high-level noise settings. We hope our preliminary efforts will encourage other researchers to assess their DGMs in EIT and other nonlinear inverse problems.
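To make the learning-based regularization idea above concrete, here is a minimal, hypothetical Python sketch of a plug-and-play reconstruction loop in which a pretrained generative model supplies a score (gradient of the log prior) that acts as an implicit regularizer alongside the data-fidelity gradient. The callables `forward_op`, `adjoint_op`, and `score_model`, as well as the step sizes, are placeholders and not the paper's algorithms.

```python
import numpy as np

def plug_and_play_reconstruct(y, forward_op, adjoint_op, score_model,
                              n_iters=200, step=1e-2, reg_weight=0.1):
    """Toy iterative reconstruction: gradient descent on 0.5*||F(x) - y||^2,
    steered by a learned score that acts as an implicit regularizer."""
    x = adjoint_op(y)                      # crude initialization from the measurements
    for _ in range(n_iters):
        residual = forward_op(x) - y       # data misfit
        grad_data = adjoint_op(residual)   # gradient of the data-fidelity term
        grad_prior = -score_model(x)       # score ~ grad log p(x); pulls x toward the prior
        x = x - step * (grad_data + reg_weight * grad_prior)
    return x
```

Each of the paper's three DGMs would plug into such a loop differently (e.g., optimizing in a VAE or flow latent space rather than using a score), so this sketch only conveys the general role of a learned implicit prior.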

eess.SP - 2023-10-24

Resource Allocation for UAV-Assisted Industrial IoT User with Finite Blocklength

  • paper_url: http://arxiv.org/abs/2310.16211
  • repo_url: None
  • paper_authors: Atefeh Rezaei, Ata Khalili, Falko Dressler
  • for: To improve the reliability and efficiency of downlink information delivery by employing an unmanned aerial vehicle (UAV) as a relay under finite blocklength constraints.
  • methods: Non-orthogonal multiple access (NOMA) is used over two phases: in the first phase the remote controller transmits information to both the UAV and the IIoT device; in the second phase the UAV decodes and forwards the information to the IIoT device.
  • results: An alternating optimization (AO)-based strategy is proposed that effectively mitigates the non-convexity of the problem; simulations show that it outperforms baseline schemes, achieving a lower DEP at the remote device with rapid convergence.
    Abstract We consider a relay system empowered by an unmanned aerial vehicle (UAV) that facilitates downlink information delivery while adhering to finite blocklength requirements. The setup involves a remote controller transmitting information to both a UAV and an industrial Internet of Things (IIoT) or remote device, employing the non-orthogonal multiple access (NOMA) technique in the first phase. Subsequently, the UAV decodes and forwards this information to the remote device in the second phase. Our primary objective is to minimize the decoding error probability (DEP) at the remote device, which is influenced by the DEP at the UAV. To achieve this goal, we optimize the blocklength, transmission power, and location of the UAV. However, the underlying problem is highly non-convex and generally intractable to be solved directly. To overcome this challenge, we adopt an alternative optimization (AO) approach and decompose the original problem into three sub-problems. This approach leads to a sub-optimal solution, which effectively mitigates the non-convexity issue. In our simulations, we compare the performance of our proposed algorithm with baseline schemes. The results reveal that the proposed framework outperforms the baseline schemes, demonstrating its superiority in achieving lower DEP at the remote device. Furthermore, the simulation results illustrate the rapid convergence of our proposed algorithm, indicating its efficiency and effectiveness in solving the optimization problem.

Systematic Physics-Compliant Analysis of Over-the-Air Channel Equalization in RIS-Parametrized Wireless Networks-on-Chip

  • paper_url: http://arxiv.org/abs/2310.16195
  • repo_url: None
  • paper_authors: Jean Tapie, Hugo Prod’homme, Mohammadreza F. Imani, Philipp del Hougne
  • for: To develop a model that rapidly predicts the behavior of RIS-parametrized wireless networks-on-chip, enabling efficient simulation-driven exploration and optimization of on-chip wireless links.
  • methods: A physics-compliant reduced-basis model is used that can be calibrated with a single full-wave simulation and then predicts the channel impulse response (CIR) for any RIS configuration almost instantaneously.
  • results: The calibrated model makes it possible to systematically explore over-the-air equalization, including the optimal choice of delay time for the RIS-shaped CIR's peak, and to study the simultaneous optimization of multiple on-chip wireless links for broadcasting.
    Abstract Wireless networks-on-chip (WNoCs) are an enticing complementary interconnect technology for multi-core chips but face severe resource constraints. Being limited to simple on-off-keying modulation, the reverberant nature of the chip enclosure imposes limits on allowed modulation speeds in sight of inter-symbol interference, casting doubts on the competitiveness of WNoCs as interconnect technology. Fortunately, this vexing problem was recently overcome by parametrizing the on-chip radio environment with a reconfigurable intelligent surface (RIS). By suitably configuring the RIS, selected channel impulse responses (CIRs) can be tuned to be (almost) pulse-like despite rich scattering thanks to judiciously tailored multi-bounce path interferences. However, the exploration of this "over-the-air" (OTA) equalization is thwarted by (i) the overwhelming complexity of the propagation environment, and (ii) the non-linear dependence of the CIR on the RIS configuration, requiring a costly and lengthy full-wave simulation for every optimization step. Here, we show that a reduced-basis physics-compliant model for RIS-parametrized WNoCs can be calibrated with a single full-wave simulation. Thereby, we unlock the possibility of predicting the CIR for any RIS configuration almost instantaneously without any additional full-wave simulation. We leverage this new tool to systematically explore OTA equalization in RIS-parametrized WNoCs regarding the optimal choice of delay time for the RIS-shaped CIR's peak. We also study the simultaneous optimization of multiple on-chip wireless links for broadcasting. Looking forward, the introduced tools will enable the efficient exploration of various types of OTA analog computing in RIS-parametrized WNoCs.

  • paper_url: http://arxiv.org/abs/2310.16137
  • repo_url: None
  • paper_authors: Liu Cao, Yahia Shabara, Parisa Cheraghi
  • for: To improve the uplink (UL) performance of fifth-generation (5G) mobile devices.
  • methods: Codebook-based UL single-layer transmission with fully coherent antenna ports is investigated in the context of sub-band (SB) precoding; the SB precoder selection criteria are analyzed and an improved UL codebook is designed by increasing the number of relative phase shifts of each port.
  • results: Link-level simulations show that UL SB precoding provides up to 2 dB gain in block error rate (BLER) over the current wide-band (WB) precoding scheme, and that the UL performance gain is sensitive to the SB size selection as well as the relative phase shift diversity.
    Abstract The transformative enhancements of fifth-generation (5G) mobile devices bring about new challenges to achieve better uplink (UL) performance. Particularly, in codebook-based transmission, the wide-band (WB) precoding and the legacy UL codebook may become main bottlenecks for higher efficient data transmission. In this paper, we investigate the codebook-based UL single-layer transmission performance using fully coherent antenna ports in the context of sub-band (SB) precoding. We analyze the SB precoder selection criteria and design an UL codebook used for SB precoding by increasing the number of relative phase shifts of each port. Via link-level simulations, we verify that the UL SB precoding can improve up to 2 dB performance gain in terms of the block error rate (BLER) compared with the UL WB precoding which is the current UL precoding scheme. We also show that UL performance gain is sensitive to the SB size selection as well as the relative phase shift diversity.

Enhancing Energy Efficiency for Reconfigurable Intelligent Surfaces with Practical Power Models

  • paper_url: http://arxiv.org/abs/2310.15901
  • repo_url: None
  • paper_authors: Zhiyi Li, Jida Zhang, Jieao Zhu, Shi Jin, Linglong Dai
  • for: To improve the energy efficiency (EE) of RIS-assisted wireless communication systems, addressing the fact that previous studies ignored the different power consumption of the ON and OFF states of the PIN diodes attached to each RIS element.
  • methods: A practical power model is adopted that accounts for the ON/OFF power difference of the PIN diodes, and a more accurate EE optimization problem is formulated from it. Because this problem is non-convex with mixed-integer properties, an alternating optimization (AO) framework is used to optimize the base station and RIS beamforming precoders separately.
  • results: Theoretical analysis shows that the exponential complexity of the original problem is reduced to polynomial complexity, and simulations demonstrate that the proposed algorithm outperforms existing ones, yielding a significant EE increase across diverse scenarios.
    Abstract Reconfigurable intelligent surfaces (RISs) are widely considered a promising technology for future wireless communication systems. As an important indicator of RIS-assisted communication systems in green wireless communications, energy efficiency (EE) has recently received intensive research interest as an optimization target. However, most previous works have ignored the different power consumption between ON and OFF states of the PIN diodes attached to each RIS element. This oversight results in extensive unnecessary power consumption and reduction of actual EE due to the inaccurate power model. To address this issue, in this paper, we first utilize a practical power model for a RIS-assisted multi-user multiple-input single-output (MU-MISO) communication system, which takes into account the difference in power dissipation caused by ON-OFF states of RIS's PIN diodes. Based on this model, we formulate a more accurate EE optimization problem. However, this problem is non-convex and has mixed-integer properties, which poses a challenge for optimization. To solve the problem, an effective alternating optimization (AO) algorithm framework is utilized to optimize the base station and RIS beamforming precoder separately. To obtain the essential RIS beamforming precoder, we develop two effective methods based on maximum gradient search and SDP relaxation respectively. Theoretical analysis shows the exponential complexity of the original problem has been reduced to polynomial complexity. Simulation results demonstrate that the proposed algorithm outperforms the existing ones, leading to a significant increase in EE across a diverse set of scenarios.
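As a rough illustration of why the ON/OFF power asymmetry of the PIN diodes matters for the EE metric, the sketch below evaluates energy efficiency with a total-power term that charges different costs for ON and OFF RIS elements. All numerical values are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def energy_efficiency(sum_rate_bps, pin_diode_on,
                      p_bs=10.0, p_static=5.0, p_on=0.010, p_off=0.002):
    """Toy EE metric (bits/Joule): sum rate divided by total consumed power.
    pin_diode_on is a boolean array marking which RIS PIN diodes are switched ON."""
    n_on = int(np.count_nonzero(pin_diode_on))
    n_off = pin_diode_on.size - n_on
    total_power = p_bs + p_static + n_on * p_on + n_off * p_off  # Watts
    return sum_rate_bps / total_power

# Example: a 256-element RIS with half of the PIN diodes switched ON
config = np.zeros(256, dtype=bool)
config[:128] = True
print(energy_efficiency(1e8, config))
```

A model that ignores the ON/OFF difference (e.g., charging the same cost for every element) would rank RIS configurations differently, which is the inaccuracy the paper's practical power model removes.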

Data-Driven Modeling and Analysis of Transmission Error in Harmonic Drive Systems: Nonlinear Dynamics, Error Modeling, and Compensation Techniques

  • paper_url: http://arxiv.org/abs/2310.15875
  • repo_url: None
  • paper_authors: Ju Wu, Philippe Louis Schuchert, Alireza Karimi
  • for: This paper aims to improve the precision performance of harmonic drive systems (HDS) by data-driven modeling and analysis of the system’s kinematic errors.
  • methods: The authors use Lagrange equations to derive the HDS dynamics, and develop various linear and nonlinear models to predict the kinematic errors. They also propose novel compensation mechanisms and policies, including nonlinear model predictive control and frequency loop-shaping.
  • results: The best-performing model, based on a nonlinear neural network, achieves over 98% accuracy for one-step predictions on both the training and validation data sets. The authors also analyze the feedback loop to select the controller for vibration mitigation. The main contributions of the paper include the nonlinear dynamics derivation, data-driven nonlinear modeling of flexible kinematic errors, repeatable experiment design, and proposed novel compensation mechanism and policies.
    Abstract Harmonic drive systems (HDS) are high-precision robotic transmissions featuring compact size and high gear ratios. However, issues like kinematic transmission errors hamper their precision performance. This article focuses on data-driven modeling and analysis of an HDS to improve kinematic error compensation. The background introduces HDS mechanics, nonlinear attributes, and modeling approaches from literature. The HDS dynamics are derived using Lagrange equations. Experiments under aggressive conditions provide training data exhibiting deterministic patterns. Various linear and nonlinear models have been developed. The best-performing model, based on a nonlinear neural network, achieves over 98\% accuracy for one-step predictions on both the training and validation data sets. A phenomenological model separates the kinematic error into a periodic pure part and flexible part. Apart from implementation of estimated transmission error injection compensation, novel compensation mechanisms policies for the kinematic error are analyzed and proposed, including nonlinear model predictive control and frequency loop-shaping. The feedback loop is analyzed to select the controller for vibration mitigation. Main contributions include the nonlinear dynamics derivation, data-driven nonlinear modeling of flexible kinematic errors, repeatable experiment design, and proposed novel compensation mechanism and policies. Future work involves using physics-informed neural networks, sensitivity analysis, full life-cycle monitoring, and extracting physical laws directly from data.

A High-Performance and Low-Complexity 5G LDPC Decoder: Algorithm and Implementation

  • paper_url: http://arxiv.org/abs/2310.15801
  • repo_url: None
  • paper_authors: Yuqing Ren, Hassan Harb, Yifei Shen, Alexios Balatsoukas-Stimming, Andreas Burg
  • for: To design a low-density parity-check (LDPC) decoding algorithm and corresponding Very Large-Scale Integration (VLSI) implementation for 5G New Radio (NR).
  • methods: A high-performance, low-complexity LDPC decoding algorithm is proposed to meet the 5G requirements. Specifically, the authors propose generalized adjusted min-sum (GA-MS) decoding, an extension of adjusted min-sum decoding that flexibly truncates incoming messages at the check node level and carefully approximates the non-linear functions of belief propagation for hardware implementation.
  • results: Numerical results show that the fixed-point GA-MS algorithm is within only 0.1 dB of floating-point BP decoding under various 5G standard specifications. The authors also implement a fully reconfigurable 5G NR LDPC decoder based on GA-MS decoding, using multiple data compression and approximation techniques to reduce the decoder's memory overhead.
    Abstract 5G New Radio (NR) has stringent demands on both performance and complexity for the design of low-density parity-check (LDPC) decoding algorithms and corresponding VLSI implementations. Furthermore, decoders must fully support the wide range of all 5G NR blocklengths and code rates, which is a significant challenge. In this paper, we present a high-performance and low-complexity LDPC decoder, tailor-made to fulfill the 5G requirements. First, to close the gap between belief propagation (BP) decoding and its approximations in hardware, we propose an extension of adjusted min-sum decoding, called generalized adjusted min-sum (GA-MS) decoding. This decoding algorithm flexibly truncates the incoming messages at the check node level and carefully approximates the non-linear functions of BP decoding to balance the error-rate and hardware complexity. Numerical results demonstrate that the proposed fixed-point GAMS has only a minor gap of 0.1 dB compared to floating-point BP under various scenarios of 5G standard specifications. Secondly, we present a fully reconfigurable 5G NR LDPC decoder implementation based on GA-MS decoding. Given that memory occupies a substantial portion of the decoder area, we adopt multiple data compression and approximation techniques to reduce 42.2% of the memory overhead. The corresponding 28nm FD-SOI ASIC decoder has a core area of 1.823 mm2 and operates at 895 MHz. It is compatible with all 5G NR LDPC codes and achieves a peak throughput of 24.42 Gbps and a maximum area efficiency of 13.40 Gbps/mm2 at 4 decoding iterations.
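For background on the operation GA-MS refines, here is a minimal sketch of a normalized/offset min-sum check-node update, the hardware-friendly approximation of the BP check-node rule. The scaling and offset values are illustrative, and this is not the paper's GA-MS algorithm.

```python
import numpy as np

def check_node_update(incoming_llrs, alpha=0.8, beta=0.1):
    """Normalized/offset min-sum check-node update (background sketch).
    incoming_llrs: LLR messages from the variable nodes connected to one check node.
    Returns the outgoing extrinsic message toward each variable node."""
    llrs = np.asarray(incoming_llrs, dtype=float)
    signs = np.sign(llrs)
    mags = np.abs(llrs)
    out = np.empty_like(llrs)
    for i in range(llrs.size):
        other_mags = np.delete(mags, i)
        mag = max(alpha * other_mags.min() - beta, 0.0)  # scaled and offset minimum
        sign = np.prod(np.delete(signs, i))              # parity of the other signs
        out[i] = sign * mag
    return out

print(check_node_update([1.2, -0.4, 2.5, -3.0]))
```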

Observation of Damped Oscillations in Chemical-Quantum-Magnetic Interactions

  • paper_url: http://arxiv.org/abs/2310.15775
  • repo_url: None
  • paper_authors: Luana Hildever, Thiago Ferro, José Holanda
  • for: This paper reports a new type of interaction, termed the chemical-quantum-magnetic interaction.
  • methods: The study uses a Fe3O4/PANI nanostructure and observes the chemical-quantum-magnetic interaction by examining the valence behavior and magnetic response of this structure.
  • results: Under certain conditions the Fe3O4/PANI nanostructure acquires a double-valence effect: PANI activates the chemical part of the oscillations, while the quantum and magnetic parts are governed by the double-valence effect and the resulting change in the number of spins at the nanostructure sites. Interaction measurements further show that the oscillations occur in a subcritical regime, following the behavior of a damped harmonic oscillator.
    Abstract Fundamental interactions are the basis of the most diverse phenomena in science that allow the dazzling of possible applications. In this work, we report a new interaction, which we call chemical-quantum-magnetic interaction. This interaction arises due to the difference in valence that the Fe3O4/PANI nanostructure acquires under certain conditions. In this study, PANI activates the chemical part of the oscillations, leaving the quantum and magnetic part for the double valence effect and consequently for changing the number of spins of the nanostructure sites. We also observed using interaction measurements that chemical-quantum-magnetic interactions oscillate in a subcritical regime satisfying the behavior of a damped harmonic oscillator.
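For reference, the subcritical (underdamped) damped-harmonic-oscillator behavior mentioned above takes the standard textbook form; the symbols below are generic and not taken from the paper:

$$
x(t) = A\, e^{-\gamma t}\cos\!\left(\omega_d t + \phi\right), \qquad
\omega_d = \sqrt{\omega_0^2 - \gamma^2}, \qquad \gamma < \omega_0,
$$

where $\omega_0$ is the natural frequency, $\gamma$ the damping rate, and $A$, $\phi$ are fixed by the initial conditions; the subcritical condition $\gamma < \omega_0$ is what keeps the oscillations visible while their envelope decays.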

Neuromorphic Sampling of Sparse Signals

  • paper_url: http://arxiv.org/abs/2310.15750
  • repo_url: None
  • paper_authors: Abijith Jagannath Kamath, Chandra Sekhar Seelamantula
  • for: This paper studies neuromorphic sampling, the technique behind event cameras and dynamic vision sensors that offers low power consumption, high dynamic range, and high temporal resolution.
  • methods: A sampling-theoretic approach is used, exploiting the close connection between neuromorphic sensing and time-based sampling; signal parameters are estimated with high-resolution spectral estimation methods, and a kernel-based sampling approach enables perfect reconstruction.
  • results: Perfect signal reconstruction is shown to be possible with a sample complexity equal to the rate of innovation of the signal, and the analysis extends to multichannel neuromorphic sampling in single-input multi-output (SIMO) and multi-input multi-output (MIMO) configurations, where the signal parameters can be jointly estimated. Experimental results support the theoretical claims.
    Abstract Neuromorphic sampling is a bioinspired and opportunistic analog-to-digital conversion technique, where the measurements are recorded only when there is a significant change in the signal amplitude. Neuromorphic sampling has paved the way for a new class of vision sensors called event cameras or dynamic vision sensors (DVS), which consume low power, accommodate a high-dynamic range, and provide sparse measurements with high temporal resolution making it convenient for downstream inference tasks. In this paper, we consider neuromorphic sensing of signals with a finite rate of innovation (FRI), including a stream of Dirac impulses, sum of weighted and time-shifted pulses, and piecewise-polynomial functions. We consider a sampling-theoretic approach and leverage the close connection between neuromorphic sensing and time-based sampling, where the measurements are encoded temporally. Using Fourier-domain analysis, we show that perfect signal reconstruction is possible via parameter estimation using high-resolution spectral estimation methods. We develop a kernel-based sampling approach, which allows for perfect reconstruction with a sample complexity equal to the rate of innovation of the signal. We provide sufficient conditions on the parameters of the neuromorphic encoder for perfect reconstruction. Furthermore, we extend the analysis to multichannel neuromorphic sampling of FRI signals, in the single-input multi-output (SIMO) and multi-input multi-output (MIMO) configurations. We show that the signal parameters can be jointly estimated using multichannel measurements. Experimental results are provided to substantiate the theoretical claims.
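As a rough illustration of the neuromorphic (event-driven) sampling principle, the sketch below emits an event only when the signal amplitude moves by more than a threshold since the last event, which is the basic send-on-delta mechanism behind DVS-style encoding. The threshold and test signal are arbitrary, and this is not the paper's FRI reconstruction method.

```python
import numpy as np

def neuromorphic_events(signal, times, threshold=0.1):
    """Send-on-delta event encoding: emit (time, polarity) whenever the signal
    moves by more than `threshold` away from the last recorded level."""
    events = []
    ref = signal[0]
    for t, x in zip(times, signal):
        delta = x - ref
        if abs(delta) >= threshold:
            events.append((t, 1 if delta > 0 else -1))  # ON / OFF event
            ref = x                                      # reset the reference level
    return events

t = np.linspace(0.0, 1.0, 1000)
x = np.exp(-((t - 0.3) / 0.02) ** 2) + 0.5 * np.exp(-((t - 0.7) / 0.05) ** 2)  # two pulses
print(len(neuromorphic_events(x, t)))  # events cluster around the pulses, not the flat parts
```

The events are sparse and time-encoded, which is the structure the paper exploits when treating the measurements as samples of a finite-rate-of-innovation signal.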

User Clustering for Coexistence between Near-field and Far-field Communications

  • paper_url: http://arxiv.org/abs/2310.15707
  • repo_url: None
  • paper_authors: Kaidi Wang, Zhiguo Ding, George K. Karagiannidis
  • for: This work investigates the coexistence of near-field (NF) and far-field (FF) communications, where multiple FF users are clustered to be served on the beams of legacy NF users via non-orthogonal multiple access (NOMA).
  • methods: Three different successive interference cancellation (SIC) decoding strategies are proposed, a sum-rate maximization problem is formulated to optimize the assignment and decoding order, and the beam allocation problem is recast as an overlapping coalitional game that guides the design of the clustering algorithm.
  • results: Simulation results show that the proposed clustering algorithm significantly improves the sum rate of the considered system, and the developed strategies achieve different trade-offs between sum rate and fairness.
    Abstract This letter investigates the coexistence between near-field (NF) and far-field (FF) communications, where multiple FF users are clustered to be served on the beams of legacy NF users, via non-orthogonal multiple access (NOMA). Three different successive interference cancellation (SIC) decoding strategies are proposed and a sum rate maximization problem is formulated to optimize the assignment and decoding order. The beam allocation problem is further reformulated as an overlapping coalitional game, which facilitates the the design of the proposed clustering algorithm. The optimal decoding order in each cluster is also derived, which can be integrated into the proposed clustering. Simulation results demonstrate that the proposed clustering algorithm is able to significantly improve the sum rate of the considered system, and the developed strategies achieve different trade-offs between sum rate and fairness.

Exploitation des propriétés de saturation synaptique pour obtenir un neurone à fréquence spécifique

  • paper_url: http://arxiv.org/abs/2310.15635
  • repo_url: None
  • paper_authors: Guillaume Marthe, Claire Goursaud
  • for: To address the energy-consumption problem in IoT applications, in particular the excessive power consumption of micro-controllers.
  • methods: A bio-inspired phenomenon called Interacting Synapses is exploited to build a time filter, using spiking neurons and analog computing.
  • results: A new way to handle temporal sequences in the analog domain is proposed: a synapse model is introduced that makes the neuron fire only for a specific range of delays between two incoming spikes, and the model's parameters are studied to understand how to adapt this inter-spike timing.
    Abstract Energy consumption remains the main limiting factors in many promising IoT applications. In particular, micro-controllers consume far too much power. In order to overcome this problem, new circuit designs have been proposed and the use of spiking neurons and analog computing has emerged as it allows a very significant consumption reduction. However, working in the analog domain brings difficulty to handle the sequential processing of incoming signals as is needed in many use cases.In this paper, we propose to use a bio-inspired phenomenon called Interacting Synapses to produce a time filter. We propose a model of synapses that makes the neuron fire for a specific range of delays between two incoming spikes, but not react when this Inter-Spike Timing is not in that range. We study the parameters of the model to understand how to adapt the Inter-Spike Timing. The originality of the paper is to propose a new way, in the analog domain, to deal with temporal sequences.
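A minimal sketch of the behaviour the abstract describes: a unit that responds only when the inter-spike timing between two consecutive incoming spikes falls inside a target window. The window bounds and input spike times are made-up values, and this is an illustration of the idea rather than the paper's synapse model.

```python
def isi_selective_neuron(spike_times, t_min=2.0, t_max=5.0):
    """Fire an output spike only when the interval between two consecutive input
    spikes lies within [t_min, t_max] (time units are arbitrary)."""
    output_spikes = []
    for prev, curr in zip(spike_times, spike_times[1:]):
        isi = curr - prev
        if t_min <= isi <= t_max:
            output_spikes.append(curr)   # selective response to this ISI range
    return output_spikes

print(isi_selective_neuron([0.0, 1.0, 4.5, 11.0, 14.0]))  # fires at 4.5 and 14.0 only
```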

3D Multi-Target Localization Via Intelligent Reflecting Surface: Protocol and Analysis

  • paper_url: http://arxiv.org/abs/2310.15574
  • repo_url: None
  • paper_authors: Meng Hua, Guangji Chen, Shaodan Ma, Chau Yuen, Hing Cheung So
  • for: This paper studies a 3-dimensional (3D) multi-target localization system assisted by multiple intelligent reflecting surfaces (IRSs).
  • methods: The on/off state of the IRSs is controlled to enable 3D localization: a two-stage localization protocol is proposed for the single-target-single-IRS case, and an IRS-adaptive sensing protocol with a multi-target localization algorithm is developed for the multi-target-multi-IRS case.
  • results: Simulation results demonstrate the effectiveness of the proposed scheme, and sub-meter-level positioning accuracy can be achieved.
    Abstract With the emerging environment-aware applications, ubiquitous sensing is expected to play a key role in future networks. In this paper, we study a 3-dimensional (3D) multi-target localization system where multiple intelligent reflecting surfaces (IRSs) are applied to create virtual line-of-sight (LoS) links that bypass the base station (BS) and targets. To fully unveil the fundamental limit of IRS for sensing, we first study a single-target-single-IRS case and propose a novel \textit{two-stage localization protocol} by controlling the on/off state of IRS. To be specific, in the IRS-off stage, we derive the Cram\'{e}r-Rao bound (CRB) of the azimuth/elevation direction-of-arrival (DoA) of the BS-target link and design a DoA estimator based on the MUSIC algorithm. In the IRS-on stage, the CRB of the azimuth/elevation DoA of the IRS-target link is derived and a simple DoA estimator based on the on-grid IRS beam scanning method is proposed. Particularly, the impact of echo signals reflected by IRS from different paths on sensing performance is analyzed. Moreover, we prove that the single-beam of the IRS is not capable of sensing, but it can be achieved with \textit{multi-beam}. Based on the two obtained DoAs, the 3D single-target location is constructed. We then extend to the multi-target-multi-IRS case and propose an \textit{IRS-adaptive sensing protocol} by controlling the on/off state of multiple IRSs, and a multi-target localization algorithm is developed. Simulation results demonstrate the effectiveness of our scheme and show that sub-meter-level positioning accuracy can be achieved.
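Since the IRS-off stage relies on the MUSIC algorithm for DoA estimation, here is a minimal one-dimensional MUSIC sketch for a uniform linear array. The array size, noise level, and source angle are illustrative assumptions; the paper's estimator works on azimuth and elevation jointly and is more elaborate.

```python
import numpy as np

def music_spectrum(snapshots, n_sources, scan_deg, d=0.5):
    """1D MUSIC pseudo-spectrum for a uniform linear array with element spacing d
    (in wavelengths). snapshots: (n_antennas, n_snapshots) complex array."""
    n_ant = snapshots.shape[0]
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]   # sample covariance
    _, eigvecs = np.linalg.eigh(R)                            # eigenvalues in ascending order
    En = eigvecs[:, : n_ant - n_sources]                      # noise subspace
    spectrum = []
    for theta in np.deg2rad(scan_deg):
        a = np.exp(-2j * np.pi * d * np.arange(n_ant) * np.sin(theta))  # steering vector
        spectrum.append(1.0 / np.real(a.conj() @ En @ En.conj().T @ a))
    return np.array(spectrum)

# Toy example: a single source at 20 degrees, 8-element ULA, 200 snapshots
rng = np.random.default_rng(0)
n_ant, true_deg = 8, 20.0
a_true = np.exp(-2j * np.pi * 0.5 * np.arange(n_ant) * np.sin(np.deg2rad(true_deg)))
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
noise = 0.1 * (rng.standard_normal((n_ant, 200)) + 1j * rng.standard_normal((n_ant, 200)))
X = np.outer(a_true, s) + noise
scan = np.arange(-90.0, 90.0, 0.5)
print(scan[np.argmax(music_spectrum(X, 1, scan))])  # peak close to 20 degrees
```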

Reconfigurable Intelligent Surface-Based Receive Generalized Spatial Modulation Design

  • paper_url: http://arxiv.org/abs/2310.15566
  • repo_url: None
  • paper_authors: Xinghao Guo, Hanjiang Hong, Yin Xu, Yi-yan Wu, Dazhi He, Wenjun Zhang
  • for: This paper proposes a reconfigurable intelligent surface (RIS)-assisted receive generalized spatial modulation (RGSM) scheme to achieve high efficiency in future wireless communication.
  • methods: The RIS group controllers change the reflected phases of the RIS elements to realize receive-antenna selection and phase shift keying (PSK) modulation, and adjust the amplitudes of the received symbols by changing the activation states of the elements to realize amplitude phase shift keying (APSK) modulation.
  • results: Compared with the existing RIS-aided receive generalized space shift keying (RIS-RGSSK) scheme, the proposed scheme achieves better bit error rate (BER) performance at the same rate; the results also show that PSK performs better at low modulation orders, while APSK is better at high modulation orders.
    Abstract In this paper, the receive generalized spatial modulation (RGSM) scheme with reconfigurable intelligent surfaces (RIS) assistance is proposed. The RIS group controllers change the reflected phases of the RIS elements to achieve the selection of receive antennas and phase shift keying (PSK) modulation, and the amplitudes of the received symbols are adjusted by changing the activation states of the elements to achieve amplitude phase shift keying (APSK) modulation. Compared with the existing RIS-aided receive generalized space shift keying (RIS-RGSSK) scheme, the proposed scheme realizes that the selected antennas respectively receive different modulation symbols, and only adds the process to control the modulated phases and the activation states of elements. The proposed scheme has better bit error rate (BER) performance than the RIS-RGSSK scheme at the same rate. In addition, the results show that for low modulation orders, the proposed scheme will perform better with PSK, while for high modulation order, APSK is better. The proposed scheme is a promising scheme for future wireless communication to achieve high-efficiency.

Capacity-based Spatial Modulation Constellation and Pre-scaling Design

  • paper_url: http://arxiv.org/abs/2310.15565
  • repo_url: None
  • paper_authors: Xinghao Guo, Hanjiang Hong, Yin Xu, Yi-yan Wu, Dazhi He, Wenjun Zhang
  • for: To improve the performance of spatial modulation (SM) systems without channel state information (CSI) feedback.
  • methods: A non-uniform constellation (NUC) and pre-scaling coefficient optimization design scheme is proposed; the constellation and pre-scaling coefficients are optimized by maximizing the bit-interleaved coded modulation (BICM) capacity of the SM system.
  • results: Optimization results are given for a multiple-input single-output (MISO) system with a Rayleigh channel; simulations show a meaningful performance gain over the conventional SM system without CSI feedback, making the scheme a promising technology for achieving high efficiency in future 6G systems.
    Abstract Spatial Modulation (SM) can utilize the index of the transmit antenna (TA) to transmit additional information. In this paper, to improve the performance of SM, a non-uniform constellation (NUC) and pre-scaling coefficients optimization design scheme is proposed. The bit-interleaved coded modulation (BICM) capacity calculation formula of SM system is firstly derived. The constellation and pre-scaling coefficients are optimized by maximizing the BICM capacity without channel state information (CSI) feedback. Optimization results are given for the multiple-input-single-output (MISO) system with Rayleigh channel. Simulation result shows the proposed scheme provides a meaningful performance gain compared to conventional SM system without CSI feedback. The proposed optimization design scheme can be a promising technology for future 6G to achieve high-efficiency.
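For reference, the BICM capacity that the constellation and pre-scaling coefficients are optimized against is usually written as a sum of per-bit mutual informations. The generic form below uses standard notation and is not taken from the paper, whose SM-specific expression also covers the bits carried by the transmit-antenna index:

$$
C_{\mathrm{BICM}} \;=\; \sum_{i=1}^{m} I(b_i; y)
\;=\; m - \sum_{i=1}^{m} \mathbb{E}_{b_i,\,y}\!\left[\log_2
\frac{\sum_{x \in \mathcal{X}} p(y \mid x)}{\sum_{x \in \mathcal{X}^{i}_{b_i}} p(y \mid x)}\right],
$$

where $m$ is the number of bits per symbol, $\mathcal{X}$ is the constellation, and $\mathcal{X}^{i}_{b}$ is the subset of constellation points whose $i$-th bit label equals $b$. Maximizing this quantity over the constellation points and pre-scaling coefficients, without CSI feedback, is the design criterion referred to in the bullets above.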

Knowledge-driven Meta-learning for CSI Feedback

  • paper_url: http://arxiv.org/abs/2310.15548
  • repo_url: None
  • paper_authors: Han Xiao, Wenqiang Tian, Wendong Liu, Jiajia Guo, Zhi Zhang, Shi Jin, Zhihua Shi, Li Guo, Jia Shen
  • for: To improve the accuracy of channel state information (CSI) feedback in massive multiple-input multiple-output systems using deep learning (DL) while avoiding costly large-scale data collection and lengthy training.
  • methods: A knowledge-driven meta-learning approach is proposed, in which the DL model, initialized by the meta model obtained in the meta-training phase, converges rapidly to a new scenario during the target retraining phase. Instead of training on massive data collected from various scenarios, the meta task environment is constructed from intrinsic knowledge of the spatial-frequency characteristics of CSI, and the target task dataset is augmented by exploiting knowledge of the statistical characteristics of the wireless channel, so that the model achieves high performance with a small collected dataset and short training time.
  • results: Simulation results demonstrate the superiority of the proposed approach in terms of both feedback performance and convergence speed.
    Abstract Accurate and effective channel state information (CSI) feedback is a key technology for massive multiple-input and multiple-output systems. Recently, deep learning (DL) has been introduced for CSI feedback enhancement through massive collected training data and lengthy training time, which is quite costly and impractical for realistic deployment. In this article, a knowledge-driven meta-learning approach is proposed, where the DL model initialized by the meta model obtained from meta training phase is able to achieve rapid convergence when facing a new scenario during target retraining phase. Specifically, instead of training with massive data collected from various scenarios, the meta task environment is constructed based on the intrinsic knowledge of spatial-frequency characteristics of CSI for meta training. Moreover, the target task dataset is also augmented by exploiting the knowledge of statistical characteristics of wireless channel, so that the DL model can achieve higher performance with small actually collected dataset and short training time. In addition, we provide analyses of rationale for the improvement yielded by the knowledge in both phases. Simulation results demonstrate the superiority of the proposed approach from the perspective of feedback performance and convergence speed.

LDPC Decoding with Degree-Specific Neural Message Weights and RCQ Decoding

  • paper_url: http://arxiv.org/abs/2310.15483
  • repo_url: None
  • paper_authors: Linfang Wang, Caleb Terrill, Richard Wesel, Dariush Divsalar
  • for: This paper proposes a family of weight-sharing schemes for low-density parity-check (LDPC) codes that use the same weight for edges with the same check node degree and/or variable node degree, reducing neural network complexity and storage requirements.
  • methods: The paper combines degree-specific neural weights with a reconstruction-computation-quantization (RCQ) decoder to produce a weighted RCQ (W-RCQ) decoder, and identifies and resolves a gradient explosion issue that can arise when training neural LDPC decoders.
  • results: The paper shows that node-degree-based weight-sharing can deliver the same performance as using distinct weights for each node, and the W-RCQ decoder with node-degree-based weight sharing has a reduced hardware requirement compared with the original RCQ decoder.
    Abstract Recently, neural networks have improved MinSum message-passing decoders for low-density parity-check (LDPC) codes by multiplying or adding weights to the messages, where the weights are determined by a neural network. The neural network complexity to determine distinct weights for each edge is high, often limiting the application to relatively short LDPC codes. Furthermore, storing separate weights for every edge and every iteration can be a burden for hardware implementations. To reduce neural network complexity and storage requirements, this paper proposes a family of weight-sharing schemes that use the same weight for edges that have the same check node degree and/or variable node degree. Our simulation results show that node-degree-based weight-sharing can deliver the same performance requiring distinct weights for each node. This paper also combines these degree-specific neural weights with a reconstruction-computation-quantization (RCQ) decoder to produce a weighted RCQ (W-RCQ) decoder. The W-RCQ decoder with node-degree-based weight sharing has a reduced hardware requirement compared with the original RCQ decoder. As an additional contribution, this paper identifies and resolves a gradient explosion issue that can arise when training neural LDPC decoders.

cs.SD - 2023-10-23

GESI: Gammachirp Envelope Similarity Index for Predicting Intelligibility of Simulated Hearing Loss Sounds

  • paper_url: http://arxiv.org/abs/2310.15399
  • repo_url: None
  • paper_authors: Ayako Yamamoto, Toshio Irino, Fuki Miyazaki, Honoka Tamaru
  • for: This study aims to develop a new objective intelligibility measure (OIM) that predicts the speech intelligibility (SI) of simulated hearing loss (HL) sounds for normal hearing (NH) listeners.
  • methods: The study uses a new method, the Gammachirp Envelope Similarity Index (GESI), which computes the SI metric using the gammachirp filterbank (GCFB), the modulation filterbank, and the extended cosine similarity measure. GESI can accept level asymmetry between the reference and test sounds and reflect the listener's hearing level as it appears on the audiogram.
  • results: In four SI experiments, GESI predicted mean and individual SI values, whereas the conventional OIMs (STOI, ESTOI, MBSTOI, and HASPI) did not. GESI can also incorporate an individual participant's listening condition into the SI prediction.
    Abstract We proposed a new objective intelligibility measure (OIM), called the Gammachirp Envelope Similarity Index (GESI), which can predict the speech intelligibility (SI) of simulated hearing loss (HL) sounds for normal hearing (NH) listeners. GESI is an intrusive method that computes the SI metric using the gammachirp filterbank (GCFB), the modulation filterbank, and the extended cosine similarity measure. GESI can accept the level asymmetry of the reference and test sounds and reflect the HI listener's hearing level as it appears on the audiogram. A unique feature of GESI is its ability to incorporate an individual participant's listening condition into the SI prediction. We conducted four SI experiments on male and female speech sounds in both laboratory and crowdsourced remote environments. We then evaluated GESI and the conventional OIMs, STOI, ESTOI, MBSTOI, and HASPI, for their ability to predict mean and individual SI values with and without the use of simulated HL sounds. GESI outperformed the other OIMs in all evaluations. STOI, ESTOI, and MBSTOI did not predict SI at all, even when using the simulated HL sounds. HASPI did not predict the difference between the laboratory and remote experiments on male speech sounds and the individual SI values. GESI may provide a first step toward SI prediction for individual HI listeners whose HL is caused solely by peripheral dysfunction.

8+8=4: Formalizing Time Units to Handle Symbolic Music Durations

  • paper_url: http://arxiv.org/abs/2310.14952
  • repo_url: None
  • paper_authors: Emmanouil Karystinaios, Francesco Foscarin, Florent Jacquemard, Masahiko Sakai, Satoshi Tojo, Gerhard Widmer
  • for: This paper focuses on the nominal durations of musical events (notes and rests) in a symbolic musical score, and on how to conveniently handle these in computer applications.
  • methods: The authors propose the use of a temporal unit that is directly related to the graphical symbols in musical scores, paired with a set of operations that cover typical computations in music applications.
  • results: The proposed time unit and the more commonly used approach are formalized in a single mathematical framework as semirings; practical use cases are discussed, showing where the system makes such pipelines more efficient in terms of the data type used and the number of computations.
    Abstract This paper focuses on the nominal durations of musical events (notes and rests) in a symbolic musical score, and on how to conveniently handle these in computer applications. We propose the usage of a temporal unit that is directly related to the graphical symbols in musical scores and pair this with a set of operations that cover typical computations in music applications. We formalize this time unit and the more commonly used approach in a single mathematical framework, as semirings, algebraic structures that enable an abstract description of algorithms/processing pipelines. We then discuss some practical use cases and highlight when our system can improve such pipelines by making them more efficient in terms of data type used and the number of computations.
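As a toy illustration of the kind of arithmetic the proposed time unit is meant to make convenient (the title's "8+8=4" alludes to two eighth notes summing to a quarter note), here is a small hypothetical Python sketch using exact rational durations measured in fractions of a whole note; it conveys the flavour of the operations, not the paper's semiring formalization.

```python
from fractions import Fraction

# Nominal durations expressed as fractions of a whole note
EIGHTH = Fraction(1, 8)
QUARTER = Fraction(1, 4)
HALF = Fraction(1, 2)
DOTTED_QUARTER = QUARTER * Fraction(3, 2)   # a dot multiplies the duration by 3/2

print(EIGHTH + EIGHTH == QUARTER)           # True: "8 + 8 = 4"
print(DOTTED_QUARTER + EIGHTH == HALF)      # True: dotted quarter plus eighth fills a half note
```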

Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model

  • paper_url: http://arxiv.org/abs/2310.14946
  • repo_url: None
  • paper_authors: Joanna Hong, Se Jin Park, Yong Man Ro
  • for: This work aims to develop a multilingual audio-visual speech recognition system that uses a single trained model across languages.
  • methods: Motivated by the human cognitive system, in which humans intuitively distinguish different languages without conscious effort, the model captures the inherent similarities and differences between languages to identify which language is given as input speech. A prompt fine-tuning technique is added to a largely pre-trained audio-visual representation model so that the network recognizes both the language class and the speech in that language.
  • results: The approach enables robust and efficient multilingual audio-visual speech recognition while reducing the need for language-specific models.
    Abstract We present a novel approach to multilingual audio-visual speech recognition tasks by introducing a single model on a multilingual dataset. Motivated by a human cognitive system where humans can intuitively distinguish different languages without any conscious effort or guidance, we propose a model that can capture which language is given as an input speech by distinguishing the inherent similarities and differences between languages. To do so, we design a prompt fine-tuning technique into the largely pre-trained audio-visual representation model so that the network can recognize the language class as well as the speech with the corresponding language. Our work contributes to developing robust and efficient multilingual audio-visual speech recognition systems, reducing the need for language-specific models.

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

  • paper_url: http://arxiv.org/abs/2310.14778
  • repo_url: None
  • paper_authors: Jinzheng Zhao, Yong Xu, Xinyuan Qian, Davide Berghi, Peipei Wu, Meng Cui, Jianyuan Sun, Philip J. B. Jackson, Wenwu Wang
  • for: This paper provides a comprehensive overview of audio-visual speaker tracking, covering the family of Bayesian filters and methods for obtaining audio-visual measurements.
  • methods: The survey discusses how the Bayesian filter family and deep learning techniques address data association, audio-visual fusion, and track management.
  • results: Existing trackers and their performance on the AV16.3 dataset are summarized, and the influence of deep learning techniques on measurement extraction and state estimation is discussed, together with connections to related areas such as speech separation and distributed speaker tracking.
    Abstract Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic values and wide application. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visual information, the Bayesian-based filter can solve the problem of data association, audio-visual fusion and track management. In this paper, we conduct a comprehensive overview of audio-visual speaker tracking. To our knowledge, this is the first extensive survey over the past five years. We introduce the family of Bayesian filters and summarize the methods for obtaining audio-visual measurements. In addition, the existing trackers and their performance on AV16.3 dataset are summarized. In the past few years, deep learning techniques have thrived, which also boosts the development of audio visual speaker tracking. The influence of deep learning techniques in terms of measurement extraction and state estimation is also discussed. At last, we discuss the connections between audio-visual speaker tracking and other areas such as speech separation and distributed speaker tracking.

Acoustic BPE for Speech Generation with Discrete Tokens

  • paper_url: http://arxiv.org/abs/2310.14580
  • repo_url: None
  • paper_authors: Feiyu Shen, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu
  • for: To improve the inference speed and syntax-capturing ability of speech generation with discrete tokens.
  • methods: Byte-pair encoding (BPE) is applied to acoustic tokens (acoustic BPE), which reduces the sequence length and leverages the morphological information present in the token sequence.
  • results: A comprehensive investigation of a speech language model trained with acoustic BPE confirms its advantages, including faster inference and improved syntax capturing; a novel rescore method is also proposed to select the optimal synthetic speech among multiple candidates generated by a rich-diversity TTS system, and experiments show that rescore selection aligns closely with human preference.
    Abstract Discrete audio tokens derived from self-supervised learning models have gained widespread usage in speech generation. However, current practice of directly utilizing audio tokens poses challenges for sequence modeling due to the length of the token sequence. Additionally, this approach places the burden on the model to establish correlations between tokens, further complicating the modeling process. To address this issue, we propose acoustic BPE which encodes frequent audio token patterns by utilizing byte-pair encoding. Acoustic BPE effectively reduces the sequence length and leverages the prior morphological information present in token sequence, which alleviates the modeling challenges of token correlation. Through comprehensive investigations on a speech language model trained with acoustic BPE, we confirm the notable advantages it offers, including faster inference and improved syntax capturing capabilities. In addition, we propose a novel rescore method to select the optimal synthetic speech among multiple candidates generated by rich-diversity TTS system. Experiments prove that rescore selection aligns closely with human preference, which highlights acoustic BPE's potential to other speech generation tasks.
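To make the token-sequence shortening concrete, below is a minimal, generic byte-pair-encoding sketch applied to a sequence of discrete audio token IDs: the most frequent adjacent pair is repeatedly merged into a fresh token. The token values and number of merges are arbitrary, and this is a textbook BPE loop rather than the authors' implementation.

```python
from collections import Counter

def bpe_merges(tokens, n_merges):
    """Greedy BPE over a token-ID sequence: repeatedly replace the most frequent
    adjacent pair with a new token ID, shortening the sequence."""
    tokens = list(tokens)
    next_id = max(tokens) + 1
    merge_table = {}
    for _ in range(n_merges):
        pair_counts = Counter(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        best, count = pair_counts.most_common(1)[0]
        if count < 2:
            break                                    # nothing left worth merging
        merge_table[best] = next_id
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(next_id)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
        next_id += 1
    return tokens, merge_table

seq = [3, 7, 7, 2, 3, 7, 7, 2, 5]
print(bpe_merges(seq, 2))   # two merges shorten the sequence from 9 to 5 tokens
```

A speech language model then operates on the merged IDs, which is why the sequence length, and hence the modeling and inference cost, drops.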

eess.AS - 2023-10-23

Prompt-driven Target Speech Diarization

  • paper_url: http://arxiv.org/abs/2310.14823
  • repo_url: None
  • paper_authors: Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li
  • for: This work introduces the task of target speech diarization, which seeks to determine when a target speech event occurs within an audio signal.
  • methods: A prompt-driven neural architecture, Prompt-driven Target Speech Diarization (PTSD), is proposed that works with diverse prompts specifying the target speech events of interest.
  • results: PTSD is trained and evaluated on the sim2spk, sim3spk, and sim4spk datasets derived from Librispeech and accurately localizes target speech events. It also shows versatility through strong performance on three diarization-related tasks: target speaker voice activity detection, overlapped speech detection, and gender diarization, achieving performance comparable to specialized models on both real and simulated data.
    Abstract We introduce a novel task named `target speech diarization', which seeks to determine `when target event occurred' within an audio signal. We devise a neural architecture called Prompt-driven Target Speech Diarization (PTSD), that works with diverse prompts that specify the target speech events of interest. We train and evaluate PTSD using sim2spk, sim3spk and sim4spk datasets, which are derived from the Librispeech. We show that the proposed framework accurately localizes target speech events. Furthermore, our framework exhibits versatility through its impressive performance in three diarization-related tasks: target speaker voice activity detection, overlapped speech detection and gender diarization. In particular, PTSD achieves comparable performance to specialized models across these tasks on both real and simulated data. This work serves as a reference benchmark and provides valuable insights into prompt-driven target speech processing.

cs.CV - 2023-10-23

Towards contrast-agnostic soft segmentation of the spinal cord

  • paper_url: http://arxiv.org/abs/2310.15402
  • repo_url: https://github.com/sct-pipeline/contrast-agnostic-softseg-spinalcord
  • paper_authors: Sandrine Bédard, Naga Karthik Enamundram, Charidimos Tsagkas, Emanuele Pravatà, Cristina Granziera, Andrew Smith, Kenneth Arnold Weber II, Julien Cohen-Adad
  • for: This paper proposes a deep learning-based spinal cord segmentation method that reduces the variability of the spinal cord cross-sectional area (CSA) across MRI contrasts, improving the accuracy and reproducibility of spinal cord segmentation.
  • methods: Using the Spine Generic Public Database (n = 267, 6 contrasts), participant-wise soft ground truth (soft GT) is generated by averaging the binary segmentations across all contrasts; a UNet model is then trained on the soft GT with a regression-based loss function.
  • results: Compared with previous methods, the proposed approach reduces CSA variability ($p < 0.05$, Wilcoxon signed-rank test) and generalizes better than state-of-the-art contrast-specific methods across unseen datasets, vendors, contrasts, and pathologies (compression, lesions), while accounting for partial volume effects.
    Abstract Spinal cord segmentation is clinically relevant and is notably used to compute spinal cord cross-sectional area (CSA) for the diagnosis and monitoring of cord compression or neurodegenerative diseases such as multiple sclerosis. While several semi and automatic methods exist, one key limitation remains: the segmentation depends on the MRI contrast, resulting in different CSA across contrasts. This is partly due to the varying appearance of the boundary between the spinal cord and the cerebrospinal fluid that depends on the sequence and acquisition parameters. This contrast-sensitive CSA adds variability in multi-center studies where protocols can vary, reducing the sensitivity to detect subtle atrophies. Moreover, existing methods enhance the CSA variability by training one model per contrast, while also producing binary masks that do not account for partial volume effects. In this work, we present a deep learning-based method that produces soft segmentations of the spinal cord. Using the Spine Generic Public Database of healthy participants ($\text{n}=267$; $\text{contrasts}=6$), we first generated participant-wise soft ground truth (GT) by averaging the binary segmentations across all 6 contrasts. These soft GT, along with a regression-based loss function, were then used to train a UNet model for spinal cord segmentation. We evaluated our model against state-of-the-art methods and performed ablation studies involving different GT mask types, loss functions, and contrast-specific models. Our results show that using the soft average segmentations along with a regression loss function reduces CSA variability ($p < 0.05$, Wilcoxon signed-rank test). The proposed spinal cord segmentation model generalizes better than the state-of-the-art contrast-specific methods amongst unseen datasets, vendors, contrasts, and pathologies (compression, lesions), while accounting for partial volume effects.
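A minimal numpy sketch of the soft ground-truth construction described above: per-contrast binary segmentations are averaged into a soft mask, which a model is then trained against with a regression-style loss. Plain MSE is used here only as a stand-in; the paper studies specific loss choices in its ablations.

```python
import numpy as np

def soft_ground_truth(binary_masks):
    """Average per-contrast binary masks (n_contrasts x H x W) into a soft mask whose
    values in [0, 1] capture boundary disagreement / partial volume effects."""
    masks = np.stack(binary_masks).astype(np.float32)
    return masks.mean(axis=0)

def regression_loss(prediction, soft_gt):
    """Simple regression loss against the soft target (MSE as a placeholder)."""
    return float(np.mean((prediction - soft_gt) ** 2))

# Toy example: six 4x4 binary masks that disagree near the cord boundary
rng = np.random.default_rng(0)
masks = [(rng.random((4, 4)) > 0.4).astype(np.uint8) for _ in range(6)]
soft_gt = soft_ground_truth(masks)
print(soft_gt)                               # fractional values where the contrasts disagree
print(regression_loss(rng.random((4, 4)), soft_gt))
```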
    摘要 临床重要的脊椎神经段化是用于计算脊椎跨sectional area (CSA)的诊断和监测多发性静脉炎和其他神经退化疾病。 although several semi and automatic methods exist, one key limitation remains: the segmentation depends on the MRI contrast, resulting in different CSA across contrasts. This is partly due to the varying appearance of the boundary between the spinal cord and the cerebrospinal fluid that depends on the sequence and acquisition parameters. This contrast-sensitive CSA adds variability in multi-center studies where protocols can vary, reducing the sensitivity to detect subtle atrophies. Moreover, existing methods enhance the CSA variability by training one model per contrast, while also producing binary masks that do not account for partial volume effects.在这项工作中,我们提出了一种基于深度学习的方法,该方法生成了软分 segmentation of the spinal cord。 使用Generic Public Database of healthy participants(n = 267,contrasts = 6),我们首先生成了每个参与者的soft ground truth(GT),通过对所有6个对比进行平均来生成每个参与者的GT。这些soft GT,以及一种回归基于的损失函数,然后用于训练UNet模型。我们对我们的模型进行比较,并进行了不同GT层次、损失函数和对比特定模型的ablation研究。我们的结果表明,使用soft average segmentations along with a regression loss function reduces CSA variability ($p < 0.05$, Wilcoxon signed-rank test).我们的提议的脊椎神经段化模型在未看过的数据集、供应商、对比和疾病(压缩、损害)方面更好地总结,同时考虑到partial volume effects。
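
A minimal sketch of the core recipe described above, assuming the per-contrast binary masks are already registered to a common space; the helper names, toy shapes, and the plain L1 loss are placeholders, and the paper's actual loss and preprocessing may differ.

```python
# Illustrative sketch (not the authors' code): binary cord masks from several
# contrasts are averaged into a soft mask in [0, 1], and the network is trained
# with a regression-style loss against that soft target.
import torch
import torch.nn.functional as F

def make_soft_ground_truth(binary_masks: torch.Tensor) -> torch.Tensor:
    """binary_masks: (n_contrasts, D, H, W) tensor of {0, 1} segmentations
    registered to a common space; returns a soft mask in [0, 1]."""
    return binary_masks.float().mean(dim=0)

def regression_loss(logits: torch.Tensor, soft_gt: torch.Tensor) -> torch.Tensor:
    """Plain L1 on the sigmoid output stands in for the regression-based loss."""
    return F.l1_loss(torch.sigmoid(logits), soft_gt)

# toy usage with 6 contrasts of a 32^3 volume
masks = (torch.rand(6, 32, 32, 32) > 0.5).int()
soft_gt = make_soft_ground_truth(masks)          # values in {0, 1/6, ..., 1}
logits = torch.randn(32, 32, 32, requires_grad=True)
regression_loss(logits, soft_gt).backward()
```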

Remote Heart Rate Monitoring in Smart Environments from Videos with Self-supervised Pre-training

  • paper_url: http://arxiv.org/abs/2310.15388
  • repo_url: None
  • paper_authors: Divij Gupta, Ali Etemad
  • for: Proposes a self-supervised learning approach for remote heart rate estimation that reduces the reliance on large amounts of labeled data while improving performance.
  • methods: Self-supervised contrastive learning with 3 spatial and 3 temporal augmentations is used to pre-train an encoder whose late-intermediate embeddings are then used for remote PPG and heart rate estimation.
  • results: Experiments on two public datasets show that the method outperforms related works and supervised learning baselines, approaching the state-of-the-art.
    Abstract Recent advances in deep learning have made it increasingly feasible to estimate heart rate remotely in smart environments by analyzing videos. However, a notable limitation of deep learning methods is their heavy reliance on extensive sets of labeled data for effective training. To address this issue, self-supervised learning has emerged as a promising avenue. Building on this, we introduce a solution that utilizes self-supervised contrastive learning for the estimation of remote photoplethysmography (PPG) and heart rate monitoring, thereby reducing the dependence on labeled data and enhancing performance. We propose the use of 3 spatial and 3 temporal augmentations for training an encoder through a contrastive framework, followed by utilizing the late-intermediate embeddings of the encoder for remote PPG and heart rate estimation. Our experiments on two publicly available datasets showcase the improvement of our proposed approach over several related works as well as supervised learning baselines, as our results approach the state-of-the-art. We also perform thorough experiments to showcase the effects of using different design choices such as the video representation learning method, the augmentations used in the pre-training stage, and others. We also demonstrate the robustness of our proposed method over the supervised learning approaches on reduced amounts of labeled data.
    摘要 近期深度学习的发展使得智能环境中 Remote 心率估计变得越来越可能。然而,深度学习方法却存在一定的问题,即需要大量标注数据进行有效的训练。为解决这个问题,自动学习 emerged as a promising avenue。我们基于这个 Avenues 提出了一种解决方案,利用自动学习的对比学习来进行 Remote 光谱 Plethysmography (PPG) 和心率监测,从而减少标注数据的依赖性和提高性能。我们提议使用 3 个空间和 3 个时间的扩展来训练一个编码器,然后使用编码器的晚期中间 embedding 进行 Remote PPG 和心率估计。我们在两个公共可用的数据集上进行了实验,并证明了我们的提议方法在许多相关作品以及标注学习基准点上的改进。我们还进行了广泛的实验,以展示不同的设计选择的影响,如视频学习方法、预训练阶段的扩展和其他。此外,我们还证明了我们的提议方法在标注学习方法下的减少量标注数据上的Robustness。
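
As a rough illustration of the pre-training stage described above: two augmented views of the same clip are encoded and pulled together with an InfoNCE-style loss. The single spatial and temporal augmentations, the temperature, and the encoder are stand-ins; the paper uses 3 spatial and 3 temporal augmentations of its own choosing.

```python
# Minimal SimCLR-style sketch for video clips (assumed details, not the paper's).
import torch
import torch.nn.functional as F

def augment(clip: torch.Tensor) -> torch.Tensor:
    """clip: (T, C, H, W). One temporal + one spatial augmentation as examples."""
    t = clip.shape[0]
    t_keep = max(1, int(0.9 * t))                         # temporal: random 90% crop
    start = torch.randint(0, t - t_keep + 1, (1,)).item()
    clip = clip[start:start + t_keep]
    if torch.rand(1).item() < 0.5:                        # spatial: horizontal flip
        clip = torch.flip(clip, dims=[-1])
    return clip

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: (B, d) embeddings of two views of the same clips."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                 # (B, B); positives on the diagonal
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

# usage: loss = info_nce(encoder(augment(clip)), encoder(augment(clip)))
```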

Deep Integrated Explanations

  • paper_url: http://arxiv.org/abs/2310.15368
  • repo_url: https://github.com/dix-cikm23/dix
  • paper_authors: Oren Barkan, Yehonatan Elisha, Jonathan Weill, Yuval Asher, Amit Eshel, Noam Koenigstein
  • for: 这篇论文提出了一种名为深度集成解释(DIX)的全面方法,用于解释视觉模型。
  • methods: DIX使用模型的中间表示和相应的梯度来生成解释地图。
  • results: 经过对多种任务、数据集和模型配置的广泛评估,DIX能够生成 faithful和准确的解释地图,并超过当前状态的方法。
    Abstract This paper presents Deep Integrated Explanations (DIX) - a universal method for explaining vision models. DIX generates explanation maps by integrating information from the intermediate representations of the model, coupled with their corresponding gradients. Through an extensive array of both objective and subjective evaluations spanning diverse tasks, datasets, and model configurations, we showcase the efficacy of DIX in generating faithful and accurate explanation maps, while surpassing current state-of-the-art methods.
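
The abstract describes combining intermediate representations with their corresponding gradients; the sketch below shows that generic activation-times-gradient ingredient using forward/backward hooks on one ResNet layer. The layer choice, the ReLU, and the channel-sum aggregation are assumptions for illustration, not DIX's actual integration rule.

```python
# Build a coarse explanation map from an intermediate feature map and its gradient.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=None).eval()
feats, grads = {}, {}
layer = model.layer3
layer.register_forward_hook(lambda m, inp, out: feats.update(a=out))
layer.register_full_backward_hook(lambda m, gin, gout: grads.update(g=gout[0]))

x = torch.randn(1, 3, 224, 224)
score = model(x)[0].max()                  # score of the top predicted class
score.backward()

attr = torch.relu(feats["a"] * grads["g"]).sum(dim=1, keepdim=True)   # (1, 1, h, w)
heatmap = F.interpolate(attr, size=(224, 224), mode="bilinear", align_corners=False)
```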

DeepVox and SAVE-CT: a contrast- and dose-independent 3D deep learning approach for thoracic aorta segmentation and aneurysm prediction using computed tomography scans

  • paper_url: http://arxiv.org/abs/2310.15328
  • repo_url: None
  • paper_authors: Matheus del-Valle, Lariza Laura de Oliveira, Henrique Cursino Vieira, Henrique Min Ho Lee, Lucas Lembrança Pinheiro, Maria Fernanda Portugal, Newton Shydeo Brandão Miyoshi, Nelson Wolosker
  • for: 这个研究旨在提出一个可靠且自动化的脊梗动脉炎(TAA)检测方法,以减少TAA的死亡率和专业医生的评估过程中的时间。
  • methods: 这个研究使用了一个新的分类模型(DeepVox)和一个新的TAA分类模型(SAVE-CT),它们可以自动识别和分类TAA,并且可以处理不同数量的图像和不同的脊梗序列。
  • results: 这个研究发现,使用DeepVox和SAVE-CT模型可以实现自动化的TAA检测,并且可以提高医生的评估效率和准确性。这个方法可以帮助减少TAA的死亡率和专业医生的负担。
    Abstract Thoracic aortic aneurysm (TAA) is a fatal disease which potentially leads to dissection or rupture through progressive enlargement of the aorta. It is usually asymptomatic and screening recommendation are limited. The gold-standard evaluation is performed by computed tomography angiography (CTA) and radiologists time-consuming assessment. Scans for other indications could help on this screening, however if acquired without contrast enhancement or with low dose protocol, it can make the clinical evaluation difficult, besides increasing the scans quantity for the radiologists. In this study, it was selected 587 unique CT scans including control and TAA patients, acquired with low and standard dose protocols, with or without contrast enhancement. A novel segmentation model, DeepVox, exhibited dice score coefficients of 0.932 and 0.897 for development and test sets, respectively, with faster training speed in comparison to models reported in the literature. The novel TAA classification model, SAVE-CT, presented accuracies of 0.930 and 0.922 for development and test sets, respectively, using only the binary segmentation mask from DeepVox as input, without hand-engineered features. These two models together are a potential approach for TAA screening, as they can handle variable number of slices as input, handling thoracic and thoracoabdominal sequences, in a fully automated contrast- and dose-independent evaluation. This may assist to decrease TAA mortality and prioritize the evaluation queue of patients for radiologists.
    摘要 腹部大动脉瘤 (TAA) 是一种可能导致分解或爆裂的致命疾病,通过进行不断扩大的大动脉。它通常是无症状的,而检查建议则有限。黄金标准评估是通过 computed tomography angiography (CTA) 进行,但这需要 radiologists 的时间负担。在这个研究中,选择了 587 个独特的 CT 扫描,包括控制和 TAA 患者,采用低和标准剂量剂 protocol。一个新的分 segmentation 模型,DeepVox,在发展和测试集上显示了 dice 分数价值为 0.932 和 0.897,并且比文献中报告的模型更快的训练速度。一个新的 TAA 分类模型,SAVE-CT,在发展和测试集上显示了准确率为 0.930 和 0.922,使用仅有 binary 分 segmentation 面组的 DeepVox 输入,不需手工设计特征。这两个模型共同形成一个可能的 TAA 检查方法,可以处理变数数量的萤幕,处理腹部和腹部类别的扫描,并且完全自动、对比物和剂量无需关注。这可能帮助降低 TAA 的死亡率,并将检查顺序优先级为 radiologists 评估。
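
A small sketch of the metric quoted above (the Dice score coefficient) and of the two-stage flow in which the classifier sees only the binary aorta mask; the function names, shapes, and random inputs are placeholders.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2 * |pred ∩ gt| / (|pred| + |gt|), on binary volumes."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))

# two-stage idea, at pseudocode level:
#   mask  = deepvox(ct_volume)     # segmentation network
#   label = save_ct(mask)          # classifier fed only the binary mask
pred = np.random.rand(64, 64, 64) > 0.5
gt = np.random.rand(64, 64, 64) > 0.5
print(f"Dice: {dice_coefficient(pred, gt):.3f}")
```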

Videoprompter: an ensemble of foundational models for zero-shot video understanding

  • paper_url: http://arxiv.org/abs/2310.15324
  • repo_url: None
  • paper_authors: Adeel Yousaf, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
  • for: Proposes a framework to improve zero-shot performance in video understanding.
  • methods: The framework combines pre-trained discriminative vision-language models (VLMs) with pre-trained generative video-to-text and text-to-text models, with two key modifications: language-guided visual feature enhancement, where a video-to-text model converts the query video into a descriptive form, and video-specific prompting of LLMs to generate more meaningful descriptions that enrich class label representations.
  • results: Evaluated in three zero-shot settings (video action recognition, video-to-text and text-to-video retrieval, and time-sensitive video tasks), the framework shows consistent improvements across multiple benchmarks and with various VLMs. The code will be made publicly available.
    Abstract Vision-language models (VLMs) classify the query video by calculating a similarity score between the visual features and text-based class label representations. Recently, large language models (LLMs) have been used to enrich the text-based class labels by enhancing the descriptiveness of the class names. However, these improvements are restricted to the text-based classifier only, and the query visual features are not considered. In this paper, we propose a framework which combines pre-trained discriminative VLMs with pre-trained generative video-to-text and text-to-text models. We introduce two key modifications to the standard zero-shot setting. First, we propose language-guided visual feature enhancement and employ a video-to-text model to convert the query video to its descriptive form. The resulting descriptions contain vital visual cues of the query video, such as what objects are present and their spatio-temporal interactions. These descriptive cues provide additional semantic knowledge to VLMs to enhance their zeroshot performance. Second, we propose video-specific prompts to LLMs to generate more meaningful descriptions to enrich class label representations. Specifically, we introduce prompt techniques to create a Tree Hierarchy of Categories for class names, offering a higher-level action context for additional visual cues, We demonstrate the effectiveness of our approach in video understanding across three different zero-shot settings: 1) video action recognition, 2) video-to-text and textto-video retrieval, and 3) time-sensitive video tasks. Consistent improvements across multiple benchmarks and with various VLMs demonstrate the effectiveness of our proposed framework. Our code will be made publicly available.
    摘要 视力语模型(VLM)将查询视频分类为计算视觉特征和文本基础标签表示之间的相似度。现在,大型语言模型(LLM)已经用于提高文本基础标签的描述性。然而,这些改进只适用于文本基础标签,查询视频特征未被考虑。在这篇论文中,我们提出一个框架,结合预训练的推断性VLM和预训练的生成视频到文本和文本到文本模型。我们提出两个关键修改:首先,我们提出语言引导的视觉特征增强,并使用视频到文本模型将查询视频转换成其描述性的形式。 resulting descriptions contain vital visual cues of the query video, such as what objects are present and their spatio-temporal interactions。这些描述性缺省提供了额外的semantic knowledge to VLMs,以提高其零基础性性能。其次,我们提出视频特有的提示,以便LLMs生成更加有意义的描述,以恰到类标签表示。specifically, we introduce prompt techniques to create a Tree Hierarchy of Categories for class names, offering a higher-level action context for additional visual cues。我们在视频理解中进行了三种零基础设定:1)视频动作识别,2)视频到文本和文本到视频检索,和3)时间敏感视频任务。我们在多个标准 benchark 和不同VLMs上展现了一致性的改进, demonstrating the effectiveness of our proposed framework。我们将代码公开。

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

  • paper_url: http://arxiv.org/abs/2310.15308
  • repo_url: None
  • paper_authors: Haoxiang Wang, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari
  • for: This paper aims to create a unified model that combines the strengths of two pre-trained vision foundation models (VFMs), Segment Anything Model (SAM) and Contrastive Language-Image Pre-training (CLIP).
  • methods: The proposed method uses a simple recipe that integrates multi-task learning, continual learning techniques, and teacher-student distillation to efficiently merge SAM and CLIP into a single backbone.
  • results: The resulting model, called SAM-CLIP, learns richer visual representations that are equipped with both localization and semantic features, leading to improved performance on several head probing tasks and zero-shot semantic segmentation tasks, with new state-of-the-art results on 5 benchmarks.
    Abstract The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficiently merge VFMs into a unified model that assimilates their expertise. Our proposed method integrates multi-task learning, continual learning techniques, and teacher-student distillation. This strategy entails significantly less computational cost compared to traditional multi-task training from scratch. Additionally, it only demands a small fraction of the pre-training datasets that were initially used to train individual models. By applying our method to SAM and CLIP, we derive SAM-CLIP: a unified model that amalgamates the strengths of SAM and CLIP into a single backbone, making it apt for edge device applications. We show that SAM-CLIP learns richer visual representations, equipped with both localization and semantic features, suitable for a broad range of vision tasks. SAM-CLIP obtains improved performance on several head probing tasks when compared with SAM and CLIP. We further show that SAM-CLIP not only retains the foundational strengths of its precursor models but also introduces synergistic functionalities, most notably in zero-shot semantic segmentation, where SAM-CLIP establishes new state-of-the-art results on 5 benchmarks. It outperforms previous models that are specifically designed for this task by a large margin, including +6.8% and +5.9% mean IoU improvement on Pascal-VOC and COCO-Stuff datasets, respectively.
    摘要 “公共可用的视觉基础模型(VFM)的领域正在迅速扩展。VFM具有不同的预训练目标,从而具备不同的能力。例如,CLIP excel在Semantic Understanding方面,而SAM专注于 segmentation 的空间理解。在这项工作中,我们提出了一个简单的方法,可以快速将 VFM 合并成一个统一的模型,并融合它们的专长。我们的提议的方法包括多任务学习、 continual learning 技术和教师学习。这种策略相比传统的多任务训练从零开始,需要 significatively 更少的计算成本。此外,它只需要原始训练数据的一小部分。通过应用我们的方法于 SAM 和 CLIP,我们得到了 SAM-CLIP:一个统一的模型,将 SAM 和 CLIP 的强点融合到一起,适用于边缘设备应用。我们显示,SAM-CLIP 学习了更加丰富的视觉表示,具有 Both localization 和 Semantic 特征,适用于广泛的视觉任务。SAM-CLIP 在多个头 probing 任务中表现出色,比 SAM 和 CLIP 更好。我们进一步显示,SAM-CLIP 不仅保留了其前置模型的基础优势,还 introduce 了相互补做的功能,主要在零shot Semantic Segmentation 方面,SAM-CLIP 在5个标准 benchmark 上设置了新的州队纪录,包括 Pascal-VOC 和 COCO-Stuff 数据集,升级 +6.8% 和 +5.9% 的 Mean IoU。”
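
Schematically, merging two teachers into one backbone by multi-task distillation can look like the sketch below: one student backbone, one head per teacher, and a feature-distillation loss against each frozen teacher on its own data. The modules, shapes, and MSE losses are illustrative placeholders, not the implementation described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MergedStudent(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, dim, 16, 16), nn.GELU())
        self.seg_head = nn.Conv2d(dim, dim, 1)    # distilled from the SAM teacher
        self.clip_head = nn.Conv2d(dim, dim, 1)   # distilled from the CLIP teacher

    def forward(self, x):
        f = self.backbone(x)
        return self.seg_head(f), self.clip_head(f)

def distill_step(student, x_sam, sam_feats, x_clip, clip_feats):
    """Each frozen teacher provides target features on its own batch."""
    seg_pred, _ = student(x_sam)
    _, clip_pred = student(x_clip)
    return F.mse_loss(seg_pred, sam_feats) + F.mse_loss(clip_pred, clip_feats)
```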

SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis

  • paper_url: http://arxiv.org/abs/2310.15247
  • repo_url: None
  • paper_authors: Marco Comunità, Riccardo F. Gramaccioni, Emilian Postolache, Emanuele Rodolà, Danilo Comminiello, Joshua D. Reiss
  • for: 本研究旨在提高声效设计的效率和自动化水平,使声效设计更加创新和灵活。
  • methods: 本研究提出了一种基于扩散模型的声效自动生成方法,使用环境录音或文本嵌入来控制扩散模型生成的声效音轨。
  • results: 实验结果表明,该方法可以准确地检测视频中的重复动作开头,并生成符合视频的声效音轨。此外,编辑开头轨或更改控制嵌入需要 much less 的努力 than 编辑音轨本身,从而简化了声效设计过程。
    Abstract Sound design involves creatively selecting, recording, and editing sound effects for various media like cinema, video games, and virtual/augmented reality. One of the most time-consuming steps when designing sound is synchronizing audio with video. In some cases, environmental recordings from video shoots are available, which can aid in the process. However, in video games and animations, no reference audio exists, requiring manual annotation of event timings from the video. We propose a system to extract repetitive action onsets from a video, which are then used - in conjunction with audio or textual embeddings - to condition a diffusion model trained to generate a new synchronized sound effects audio track. In this way, we leave complete creative control to the sound designer while removing the burden of synchronization with video. Furthermore, editing the onset track or changing the conditioning embedding requires much less effort than editing the audio track itself, simplifying the sonification process. We provide sound examples, source code, and pretrained models to facilitate reproducibility.
    摘要 声音设计包括创atively选择、录音和编辑声效 для不同媒体 like 电影、游戏和虚拟/增强现实。同视频的 Audio 的同步是设计声音的一个最时consuming的步骤。在一些情况下,可以使用视频拍摄的环境录音来帮助同步,但在游戏和动画中,没有参考音频,需要手动标注视频中的事件时间。我们提出一种系统,可以提取视频中的重复动作开始时间,这些时间可以与音频或文本嵌入一起用于conditioning一个 diffusion 模型,以生成一个新的同步的声效音轨。这样,我们保留了声音设计师完整的创作控制,同时消除了与视频的同步相关的劳动。此外,编辑启动轨或改变conditioning嵌入需要 much less effort than editing 音频轨本身,这使得声音同化过程变得更加简单。我们提供了声音示例、源代码和预训练模型,以便促进可重现性。
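
As a small illustration of the conditioning signal mentioned above, the sketch below turns a list of detected onset times into a frame-aligned binary track that could be fed to the generator; how those onsets are extracted from video is the paper's contribution, and the frame rate and example times here are placeholders.

```python
import numpy as np

def onset_conditioning_track(onset_times_s, duration_s, frame_rate=86.0):
    """Return a 0/1 vector with one entry per conditioning frame."""
    n_frames = int(np.ceil(duration_s * frame_rate))
    track = np.zeros(n_frames, dtype=np.float32)
    for t in onset_times_s:
        idx = min(int(round(t * frame_rate)), n_frames - 1)
        track[idx] = 1.0
    return track

track = onset_conditioning_track([0.31, 0.92, 1.55], duration_s=2.0)
```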

RoboDepth: Robust Out-of-Distribution Depth Estimation under Corruptions

  • paper_url: http://arxiv.org/abs/2310.15171
  • repo_url: https://github.com/ldkong1205/robodepth
  • paper_authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi
  • For: The paper aims to address the issue of out-of-distribution (OoD) situations in depth estimation from monocular images, which is crucial for real-world visual perception systems like autonomous driving.
  • Methods: The authors introduce a comprehensive robustness test suite called RoboDepth, which includes 18 corruptions across three categories: weather and lighting conditions, sensor failures and movement, and data processing anomalies. They benchmark 42 depth estimation models across indoor and outdoor scenes to assess their resilience to these corruptions.
  • Results: The authors find that many leading depth estimation models are susceptible to typical corruptions, highlighting the need for more robust models. They discuss design considerations for crafting more robust depth estimation models, including pre-training, augmentation, modality, model capacity, and learning paradigms.
    Abstract Depth estimation from monocular images is pivotal for real-world visual perception systems. While current learning-based depth estimation models train and test on meticulously curated data, they often overlook out-of-distribution (OoD) situations. Yet, in practical settings -- especially safety-critical ones like autonomous driving -- common corruptions can arise. Addressing this oversight, we introduce a comprehensive robustness test suite, RoboDepth, encompassing 18 corruptions spanning three categories: i) weather and lighting conditions; ii) sensor failures and movement; and iii) data processing anomalies. We subsequently benchmark 42 depth estimation models across indoor and outdoor scenes to assess their resilience to these corruptions. Our findings underscore that, in the absence of a dedicated robustness evaluation framework, many leading depth estimation models may be susceptible to typical corruptions. We delve into design considerations for crafting more robust depth estimation models, touching upon pre-training, augmentation, modality, model capacity, and learning paradigms. We anticipate our benchmark will establish a foundational platform for advancing robust OoD depth estimation.
    摘要 深度估计从单目图像中是实际视觉系统中的关键任务。当前的学习型深度估计模型通常在精心挑选的数据上训练和测试,但它们经常忽视非标准(OoD)情况。然而,在实际应用中,特别是自动驾驶等安全关键的场景中,常见的损害可能会出现。为解决这一问题,我们介绍了一个完整的RoboDepth robustness测试 suite,包括18种损害类型,即:一、天气和照明条件;二、传感器故障和运动;三、数据处理异常。然后,我们在室内和室外场景中测试了42个深度估计模型的可靠性。我们的发现表明,在缺乏专门的可靠性评估框架的情况下,许多领先的深度估计模型可能会受到 Typical corruptions 的影响。我们进一步探讨了制定更加可靠的深度估计模型的设计考虑因素,包括预训练、扩展、模式、容量和学习方法。我们预计,RoboDepth 会成为建立可靠 OoD 深度估计模型的基础平台。
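
A toy version of such a robustness probe, assuming any monocular-depth callable: corrupt the input at a chosen severity, re-run the model, and compare a standard error metric. Gaussian noise is just one of the 18 corruption types in the benchmark, and the severity values here are assumptions.

```python
import numpy as np

def gaussian_noise(img: np.ndarray, severity: int) -> np.ndarray:
    """img in [0, 1]; severity in 1..5 with assumed sigma levels."""
    sigma = [0.02, 0.04, 0.08, 0.12, 0.18][severity - 1]
    return np.clip(img + np.random.normal(0.0, sigma, img.shape), 0.0, 1.0)

def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Absolute relative error, a standard monocular-depth metric."""
    mask = gt > 0
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

# usage, with depth_model returning an HxW map:
#   clean_err   = abs_rel(depth_model(img), gt_depth)
#   corrupt_err = abs_rel(depth_model(gaussian_noise(img, severity=3)), gt_depth)
```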

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

  • paper_url: http://arxiv.org/abs/2310.15169
  • repo_url: https://github.com/arthur-qiu/longercrafter
  • paper_authors: Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu
  • for: 这种研究旨在扩展基于文本的视频生成能力,以便在执行中生成更高质量的长视频。
  • methods: 我们首先分析了视频扩散模型中的初始噪声的影响,然后基于这个观察,我们提出了一种免除调参的高效方法来增强已经预训练的视频扩散模型的生成能力,同时保持内容一致性。我们采用了一种窗口函数进行时间注意力,并对不同框架的噪声进行顺序调整。
  • results: 我们的方法比前一个最佳方法带来了约17%的时间成本增加,而且生成的视频样本可以在我们的网站上查看:http://haonanqiu.com/projects/FreeNoise.html。
    Abstract With the availability of large-scale video datasets and the advances of diffusion models, text-driven video generation has achieved substantial progress. However, existing video generation models are typically trained on a limited number of frames, resulting in the inability to generate high-fidelity long videos during inference. Furthermore, these models only support single-text conditions, whereas real-life scenarios often require multi-text conditions as the video content changes over time. To tackle these challenges, this study explores the potential of extending the text-driven capability to generate longer videos conditioned on multiple texts. 1) We first analyze the impact of initial noise in video diffusion models. Then building upon the observation of noise, we propose FreeNoise, a tuning-free and time-efficient paradigm to enhance the generative capabilities of pretrained video diffusion models while preserving content consistency. Specifically, instead of initializing noises for all frames, we reschedule a sequence of noises for long-range correlation and perform temporal attention over them by window-based function. 2) Additionally, we design a novel motion injection method to support the generation of videos conditioned on multiple text prompts. Extensive experiments validate the superiority of our paradigm in extending the generative capabilities of video diffusion models. It is noteworthy that compared with the previous best-performing method which brought about 255% extra time cost, our method incurs only negligible time cost of approximately 17%. Generated video samples are available at our website: http://haonanqiu.com/projects/FreeNoise.html.
    摘要 通过大规模视频集和扩散模型的进步,文本驱动视频生成已经取得了显著进步。然而,现有的视频生成模型通常只在有限数量的帧上进行训练,导致在推理过程中无法生成高质量的长视频。此外,这些模型只支持单个文本条件,而实际场景通常需要多个文本条件,以适应视频内容的变化。为解决这些挑战,本研究探讨了扩展文本驱动能力,以生成基于多个文本条件的长视频。1. 我们首先分析了视频扩散模型中的初始噪声的影响。然后,我们提出了一种免除调整和时间效率的方法 FreeNoise,以提高预训练视频扩散模型的生成能力,同时保持内容一致。特别是,而不是为所有帧 initialize 噪声,我们重新安排了一个序列噪声,并通过窗口函数进行时间注意力。2. 此外,我们设计了一种新的运动插入方法,以支持基于多个文本条件的视频生成。广泛的实验证明了我们的方法的优越性。与之前最佳成果相比,我们的方法只增加了约17%的时间成本,而其他方法增加了255%的时间成本。生成的视频示例可以在我们的网站上找到:http://haonanqiu.com/projects/FreeNoise.html。
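
The sketch below is one rough reading of the noise-rescheduling idea: rather than sampling independent noise for every frame of a longer video, a base noise sequence is reused with shuffling inside small windows, so frames far apart stay correlated with the original sequence. The exact schedule and the window-based temporal attention are as described in the paper and are not reproduced here.

```python
import torch

def rescheduled_noise(base_frames: int, total_frames: int, shape, window: int = 4):
    """Extend `base_frames` i.i.d. noise frames to `total_frames` by reusing them
    with local (window-sized) shuffling."""
    base = torch.randn(base_frames, *shape)
    frames = [base[i] for i in range(base_frames)]
    while len(frames) < total_frames:
        start = len(frames) % base_frames
        idx = (torch.randperm(window) + start) % base_frames   # wrap around the base
        frames.extend(base[i] for i in idx.tolist())
    return torch.stack(frames[:total_frames])

noise = rescheduled_noise(base_frames=16, total_frames=64, shape=(4, 32, 32))
```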

Ghost on the Shell: An Expressive Representation of General 3D Shapes

  • paper_url: http://arxiv.org/abs/2310.15168
  • repo_url: https://github.com/lzzcd001/GShell
  • paper_authors: Zhen Liu, Yao Feng, Yuliang Xiu, Weiyang Liu, Liam Paull, Michael J. Black, Bernhard Schölkopf
  • for: 该论文目的是描述一种能够模型精准的3D表面几何体,以便创建真实的虚拟世界。
  • methods: 该论文使用了一种基于 manifold signed distance field的parameterization方法,以便模型开放的表面。
  • results: 该论文的实验结果表明,该方法可以实现非常高的重建和生成表面的精度,并且可以快速地渲染与材料和灯光相关的场景。
    Abstract The creation of photorealistic virtual worlds requires the accurate modeling of 3D surface geometry for a wide range of objects. For this, meshes are appealing since they 1) enable fast physics-based rendering with realistic material and lighting, 2) support physical simulation, and 3) are memory-efficient for modern graphics pipelines. Recent work on reconstructing and statistically modeling 3D shape, however, has critiqued meshes as being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight, shapes as well as thin, open, surfaces. Recent work has focused on the former, and methods for reconstructing open surfaces do not support fast reconstruction with material and lighting or unconditional generative modelling. Inspired by the observation that open surfaces can be seen as islands floating on watertight surfaces, we parameterize open surfaces by defining a manifold signed distance field on watertight templates. With this parameterization, we further develop a grid-based and differentiable representation that parameterizes both watertight and non-watertight meshes of arbitrary topology. Our new representation, called Ghost-on-the-Shell (G-Shell), enables two important applications: differentiable rasterization-based reconstruction from multiview images and generative modelling of non-watertight meshes. We empirically demonstrate that G-Shell achieves state-of-the-art performance on non-watertight mesh reconstruction and generation tasks, while also performing effectively for watertight meshes.
    摘要 创造光realistic虚拟世界需要准确地模型3D表面几何,为这,缓冲是有吸引力的,因为它们可以1)快速地基于物理学渲染,使用真实的材料和照明,2)支持物理模拟,3)对现代图形管道来说是内存有效。然而, latest research on reconstructing and statistically modeling 3D shape has criticized meshes for being topologically inflexible. To capture a wide range of object shapes, any 3D representation must be able to model solid, watertight shapes as well as thin, open surfaces. Recent work has focused on the former, and methods for reconstructing open surfaces do not support fast reconstruction with material and lighting or unconditional generative modeling.我们注意到,开放表面可以被看作是浮在 watertight 表面上的岛屿,我们可以将开放表面 parameterized by defining a manifold signed distance field on watertight templates. With this parameterization, we further develop a grid-based and differentiable representation that parameterizes both watertight and non-watertight meshes of arbitrary topology. Our new representation, called Ghost-on-the-Shell (G-Shell), enables two important applications: differentiable rasterization-based reconstruction from multiview images and generative modeling of non-watertight meshes. We empirically demonstrate that G-Shell achieves state-of-the-art performance on non-watertight mesh reconstruction and generation tasks, while also performing effectively for watertight meshes.

SAM-Med3D

  • paper_url: http://arxiv.org/abs/2310.15161
  • repo_url: https://github.com/uni-medical/sam-med3d
  • paper_authors: Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, Yu Qiao
  • for: This paper aims to improve the performance of the Segment Anything Model (SAM) in 3D volumetric medical image segmentation.
  • methods: The authors modify SAM to a 3D architecture trained on a comprehensively processed large-scale volumetric medical dataset, and provide a comprehensive evaluation of its performance.
  • results: SAM-Med3D excels at capturing 3D spatial information and exhibits competitive performance with significantly fewer prompt points than the top-performing fine-tuned SAM in the medical domain. It also shows enhanced efficiency and broad segmentation capabilities for 3D volumetric medical images.
    Abstract Although the Segment Anything Model (SAM) has demonstrated impressive performance in 2D natural image segmentation, its application to 3D volumetric medical images reveals significant shortcomings, namely suboptimal performance and unstable prediction, necessitating an excessive number of prompt points to attain the desired outcomes. These issues can hardly be addressed by fine-tuning SAM on medical data because the original 2D structure of SAM neglects 3D spatial information. In this paper, we introduce SAM-Med3D, the most comprehensive study to modify SAM for 3D medical images. Our approach is characterized by its comprehensiveness in two primary aspects: firstly, by comprehensively reformulating SAM to a thorough 3D architecture trained on a comprehensively processed large-scale volumetric medical dataset; and secondly, by providing a comprehensive evaluation of its performance. Specifically, we train SAM-Med3D with over 131K 3D masks and 247 categories. Our SAM-Med3D excels at capturing 3D spatial information, exhibiting competitive performance with significantly fewer prompt points than the top-performing fine-tuned SAM in the medical domain. We then evaluate its capabilities across 15 datasets and analyze it from multiple perspectives, including anatomical structures, modalities, targets, and generalization abilities. Our approach, compared with SAM, showcases pronouncedly enhanced efficiency and broad segmentation capabilities for 3D volumetric medical images. Our code is released at https://github.com/uni-medical/SAM-Med3D.

FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models

  • paper_url: http://arxiv.org/abs/2310.15160
  • repo_url: https://github.com/LiheYoung/FreeMask
  • paper_authors: Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi, Hengshuang Zhao
  • for: 提高 semantic segmentation 模型的训练效果,使其更加准确地分类图像中的各个对象。
  • methods: 使用生成模型生成具有真实描述信息的 sintetic 图像,并通过 Conditional GAN 生成映射,生成更多的具有描述信息的图像-描述映射对。
  • results: 使用 synthetic 图像进行训练,可以达到与使用真实图像进行训练相同的性能水平(e.g., 48.3 vs. 48.5 mIoU on ADE20K,和 49.3 vs. 50.5 on COCO-Stuff)。此外,可以通过对 synthetic 图像进行 filtering 和重新分配,提高 segmentation 模型的性能。
    Abstract Semantic segmentation has witnessed tremendous progress due to the proposal of various advanced network architectures. However, they are extremely hungry for delicate annotations to train, and the acquisition is laborious and unaffordable. Therefore, we present FreeMask in this work, which resorts to synthetic images from generative models to ease the burden of both data collection and annotation procedures. Concretely, we first synthesize abundant training images conditioned on the semantic masks provided by realistic datasets. This yields extra well-aligned image-mask training pairs for semantic segmentation models. We surprisingly observe that, solely trained with synthetic images, we already achieve comparable performance with real ones (e.g., 48.3 vs. 48.5 mIoU on ADE20K, and 49.3 vs. 50.5 on COCO-Stuff). Then, we investigate the role of synthetic images by joint training with real images, or pre-training for real images. Meantime, we design a robust filtering principle to suppress incorrectly synthesized regions. In addition, we propose to inequally treat different semantic masks to prioritize those harder ones and sample more corresponding synthetic images for them. As a result, either jointly trained or pre-trained with our filtered and re-sampled synthesized images, segmentation models can be greatly enhanced, e.g., from 48.7 to 52.0 on ADE20K. Code is available at https://github.com/LiheYoung/FreeMask.
    摘要 Semantic segmentation 技术在过去几年中历史猛增,但是这些高级网络架构却很需要精细的标注来训练,而这些标注的收集和生成却很困难和昂贵。因此,我们在这个工作中提出了FreeMask,它利用生成模型生成的 sintetic 图像来减轻数据收集和标注过程的压力。具体来说,我们首先使用生成模型生成大量的训练图像,并将这些图像与实际数据中的 semantic mask 相对应。这些图像-mask 对 alignment 非常好,可以用于semantic segmentation 模型的训练。我们奇怪的发现,只使用 sintetic 图像进行训练,我们就可以达到与实际数据相当的性能(例如,ADE20K 中的 mIoU 从 48.3 提高到 48.5,COCO-Stuff 中的 mIoU 从 49.3 提高到 50.5)。然后,我们研究了使用 sintetic 图像进行 joint 训练或 pre-training 的效果,同时设计了一种鲁棒的 filtering 原则来排除 incorrect 生成的区域。此外,我们还提出了对不同的 semantic mask 进行不同的 treated 方式,以便更好地调整对各种 semantic mask 的训练。结果是,通过 jointly 训练或 pre-training 使用我们的 filtered 和重新分配的 sintetic 图像,semantic segmentation 模型的性能可以得到大幅提高(例如,ADE20K 中的 mIoU 从 48.7 提高到 52.0)。代码可以在 上找到。
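
The two ingredients described above, suppressing incorrectly synthesized regions and drawing more synthetic images for harder classes, might be sketched as follows; the tolerance, the hardness score, and the assumption that label values are valid class indices are all placeholders rather than the authors' implementation.

```python
import numpy as np

def filter_noisy_pixels(pixel_loss: np.ndarray, class_mean_loss: np.ndarray,
                        label: np.ndarray, tolerance: float = 1.25) -> np.ndarray:
    """Mark pixels as ignore (255) when their loss under a real-data-trained model
    exceeds `tolerance` times the average loss of their class."""
    cleaned = label.copy()
    cleaned[pixel_loss > tolerance * class_mean_loss[label]] = 255
    return cleaned

def sampling_weights(class_hardness: np.ndarray) -> np.ndarray:
    """Harder classes (e.g. 1 - IoU of a baseline model) get more synthetic images."""
    w = np.maximum(class_hardness, 1e-6)
    return w / w.sum()
```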

Online Detection of AI-Generated Images

  • paper_url: http://arxiv.org/abs/2310.15150
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: David C. Epstein, Ishan Jain, Oliver Wang, Richard Zhang
  • for: 本研究旨在检测AI生成的图像,以适应现实中新生成器不断更新的情况。
  • methods: 本研究使用N个生成器进行训练,并在N+k个生成器上进行测试,根据历史上公布的生成方法的发布日期进行设置。
  • results: 研究表明,通过抽象像素预测,可以实现强大的表现,并且可以在没有自动生成数据的情况下训练检测器。
    Abstract With advancements in AI-generated images coming on a continuous basis, it is increasingly difficult to distinguish traditionally-sourced images (e.g., photos, artwork) from AI-generated ones. Previous detection methods study the generalization from a single generator to another in isolation. However, in reality, new generators are released on a streaming basis. We study generalization in this setting, training on N models and testing on the next (N+k), following the historical release dates of well-known generation methods. Furthermore, images increasingly consist of both real and generated components, for example through image inpainting. Thus, we extend this approach to pixel prediction, demonstrating strong performance using automatically-generated inpainted data. In addition, for settings where commercial models are not publicly available for automatic data generation, we evaluate if pixel detectors can be trained solely on whole synthetic images.
    摘要 随着人工智能生成图像的进步不断,现在很难分辨来自传统源的图像(例如照片、艺术作品)和人工智能生成的图像。先前的检测方法通常研究单个生成器之间的泛化,但在实际情况下,新的生成器不断发布。我们研究这种设定,训练在N个模型上,测试在下一个(N+k)个,按照历史上发布的人工生成方法的时间顺序。此外,图像越来越多地包含真实和生成的组成部分,例如通过图像填充。因此,我们扩展了这种方法到像素预测,并证明了使用自动生成的填充数据表现出色。此外,在商业模型没有公开可用的自动数据生成情况下,我们评估了像素检测器是否可以在solely基于完整的人工图像上训练。

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

  • paper_url: http://arxiv.org/abs/2310.15144
  • repo_url: https://github.com/design-bench/design-bench.github.io
  • paper_authors: Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Lijuan Wang
  • for: 这篇论文是为了研究和评估文本到图像(T2I)生成模型在视觉设计场景中的潜力。
  • methods: 作者提出了一个名为DEsignBench的T2I生成测试平台,包括了评估T2I模型在“设计技术能力”和“设计应用场景”两个维度上的测试样本。
  • results: 作者使用DALL-E 3和其他领先的T2I模型在DEsignBench平台上进行测试,并创建了一个Side-by-Side比较图库,以便对生成图像进行人类评估和自动评估。人类评估包括图文对齐、视觉美感和设计创新等方面,而自动评估则使用GPT-4V引擎进行评估。
    Abstract We introduce DEsignBench, a text-to-image (T2I) generation benchmark tailored for visual design scenarios. Recent T2I models like DALL-E 3 and others, have demonstrated remarkable capabilities in generating photorealistic images that align closely with textual inputs. While the allure of creating visually captivating images is undeniable, our emphasis extends beyond mere aesthetic pleasure. We aim to investigate the potential of using these powerful models in authentic design contexts. In pursuit of this goal, we develop DEsignBench, which incorporates test samples designed to assess T2I models on both "design technical capability" and "design application scenario." Each of these two dimensions is supported by a diverse set of specific design categories. We explore DALL-E 3 together with other leading T2I models on DEsignBench, resulting in a comprehensive visual gallery for side-by-side comparisons. For DEsignBench benchmarking, we perform human evaluations on generated images in DEsignBench gallery, against the criteria of image-text alignment, visual aesthetic, and design creativity. Our evaluation also considers other specialized design capabilities, including text rendering, layout composition, color harmony, 3D design, and medium style. In addition to human evaluations, we introduce the first automatic image generation evaluator powered by GPT-4V. This evaluator provides ratings that align well with human judgments, while being easily replicable and cost-efficient. A high-resolution version is available at https://github.com/design-bench/design-bench.github.io/raw/main/designbench.pdf?download=
    摘要 我们介绍DEsignBench,一个文本到图像(T2I)生成测试准则,适用于视觉设计场景。最近的T2I模型如DALL-E 3等,已经表现出了惊人的能力,可以生成高质量的图像,与文本输入高度吻合。而我们的目标不仅在于创造美丽的图像,更在于探索使用这些强大模型在实际设计场景中的潜在可能性。为实现这个目标,我们开发了DEsignBench,它包括了评测T2I模型的“设计技术能力”和“设计应用场景”两个维度。每个维度都有多种特定的设计类别支持。我们使用DALL-E 3和其他领先的T2I模型在DEsignBench上进行测试,并创建了一个丰富的视觉图库,用于对各模型进行侧对比。为DEsignBench测试,我们进行了人类评价生成图像,以评价图像和文本之间的吻合度、视觉美感和设计创新性。我们的评价还考虑了其他专业设计能力,包括文本渲染、布局组合、颜色和彩色协调、3D设计和媒体风格。此外,我们还引入了基于GPT-4V的自动生成图像评价器,它提供了与人类评价相符的评价结果,同时易于复制和经济。高分辨率版本可以在https://github.com/design-bench/design-bench.github.io/raw/main/designbench.pdf?download=下载。

Fusion-Driven Tree Reconstruction and Fruit Localization: Advancing Precision in Agriculture

  • paper_url: http://arxiv.org/abs/2310.15138
  • repo_url: None
  • paper_authors: Kaiming Fu, Peng Wei, Juan Villacres, Zhaodan Kong, Stavros G. Vougioukas, Brian N. Bailey
  • for: This study aims to improve the precision of guidance for agricultural robotics and automation systems by analyzing fruit distribution in orchards.
  • methods: The study uses a fusion of RGB imagery, LiDAR, and IMU data to reconstruct trees and locate fruits with high precision.
  • results: The experiments conducted in both a controlled environment and an actual peach orchard demonstrate the robustness and efficacy of the proposed methodology, highlighting its potential for transforming agricultural robotics and precision farming.
    Abstract Fruit distribution is pivotal in shaping the future of both agriculture and agricultural robotics, paving the way for a streamlined supply chain. This study introduces an innovative methodology that harnesses the synergy of RGB imagery, LiDAR, and IMU data, to achieve intricate tree reconstructions and the pinpoint localization of fruits. Such integration not only offers insights into the fruit distribution, which enhances the precision of guidance for agricultural robotics and automation systems, but also sets the stage for simulating synthetic fruit patterns across varied tree architectures. To validate this approach, experiments have been carried out in both a controlled environment and an actual peach orchard. The results underscore the robustness and efficacy of this fusion-driven methodology, highlighting its potential as a transformative tool for future agricultural robotics and precision farming.
    摘要 Frruit 分布对未来农业和农业机器人发展起着关键作用,为农业自动化系统的指导提供精准的信息。本研究提出了一种创新的方法,通过RGB图像、LiDAR和IMU数据的结合,实现精细的树形 reconstruction和果实的精确定位。这种整合不仅提供了果实分布的信息,也为 simulate synthetic fruit patterns across varied tree architectures 创造了条件。为验证这种方法的可行性,在控制环境和实际桃果园中进行了实验。结果表明这种融合驱动的方法有 robustness和效果, highlighting its potential as a transformative tool for future agricultural robotics and precision farming。Note: Please note that the translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you need the translation in Traditional Chinese, please let me know.

Novel-View Acoustic Synthesis from 3D Reconstructed Rooms

  • paper_url: http://arxiv.org/abs/2310.15130
  • repo_url: https://github.com/apple/ml-nvas3d
  • paper_authors: Byeongjoo Ahn, Karren Yang, Brian Hamilton, Jonathan Sheaffer, Anurag Ranjan, Miguel Sarabia, Oncel Tuzel, Jen-Hao Rick Chang
  • for: 本研究探讨了将盲音频记录与3D场景信息结合使用以实现新视角声音合成。
  • methods: 我们使用了2-4个麦克风的盲音频记录和场景中的多个不知道声音源的3D几何和材料,并计算出场景中的任何声音位置。
  • results: 我们的方法比既有的方法更高效,能够同时解决声音源localization、separation和干扰除等问题。在Matterport3D-NVAS数据集上的模拟研究中,我们的模型实现了99.8%的源localization精度、PSNR26.44dB和SDR14.23dB等佳绩。
    Abstract We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene. We identify the main challenges of novel-view acoustic synthesis as sound source localization, separation, and dereverberation. While naively training an end-to-end network fails to produce high-quality results, we show that incorporating room impulse responses (RIRs) derived from 3D reconstructed rooms enables the same network to jointly tackle these tasks. Our method outperforms existing methods designed for the individual tasks, demonstrating its effectiveness at utilizing 3D visual information. In a simulated study on the Matterport3D-NVAS dataset, our model achieves near-perfect accuracy on source localization, a PSNR of 26.44 dB and a SDR of 14.23 dB for source separation and dereverberation, resulting in a PSNR of 25.55 dB and a SDR of 14.20 dB on novel-view acoustic synthesis. Code, pretrained model, and video results are available on the project webpage (https://github.com/apple/ml-nvas3d).
    摘要 我们调查了结合无视录音 recording 与 3D 场景信息的独特观点音响合成的优点。我们使用两到四个麦克风的音录音,以及场景中多个不知名的声音来源的 3D 几何和材料,估计场景中任何声音的位置。我们识别了独特观点音响合成的主要挑战为声音来源位置Localization、分离和降噪。而将数据集训练成为终端网络,却无法生成高品质结果。我们显示,将3D 房间响应函数(RIR) derive from 3D 重建的房间,可以让同一个网络同时解决这些任务。我们的方法比于现有的方法设计 для个别任务,具有更高的效果,实现了将3D 视觉信息作用到音响合成中。在 simulated 的 Matterport3D-NVAS 数据集上,我们的模型实现了近乎完美的精度在声音来源Localization,PSNR 26.44 dB 和 SDR 14.23 dB для声音分离和降噪,最终实现了 PSNR 25.55 dB 和 SDR 14.20 dB 在独特观点音响合成中。代码、预训模型和视频结果可以在项目网页(https://github.com/apple/ml-nvas3d)上获取。

Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients

  • paper_url: http://arxiv.org/abs/2310.15128
  • repo_url: None
  • paper_authors: Maximilian Krahn, Michelle Sasdelli, Fengyi Yang, Vladislav Golyanik, Juho Kannala, Tat-Jun Chin, Tolga Birdal
  • for: 该文章提出了一种新的层wise随机优化器,用于在量子硬件上训练使用二进制权重的神经网络(BNNs)。BNNs可以减少深度学习模型的计算需求和能 consumption,但是在实际训练中仍然是一个开放的挑战。
  • methods: 该优化器使用了一种新的方法,称为层wise随机梯度映射(QP-SBGD),它可以将梯度映射到二进制变量上,并通过解决一个二次约束 Binary optimization 问题来实现。
  • results: 通过对 Rosenbrock 函数、BNNs 和二进制图 neural networks 进行训练,我们展示了 QP-SBGD 可以与其他竞争性和成熔的基准值相比,或者与其相当。此外,我们还证明了 QP-SBGD 的修改版本可以 converge to a fixed point in the binary variable space。
    Abstract We present, QP-SBGD, a novel layer-wise stochastic optimiser tailored towards training neural networks with binary weights, known as binary neural networks (BNNs), on quantum hardware. BNNs reduce the computational requirements and energy consumption of deep learning models with minimal loss in accuracy. However, training them in practice remains to be an open challenge. Most known BNN-optimisers either rely on projected updates or binarise weights post-training. Instead, QP-SBGD approximately maps the gradient onto binary variables, by solving a quadratic constrained binary optimisation. Under practically reasonable assumptions, we show that this update rule converges with a rate of $\mathcal{O}(1 / \sqrt{T})$. Moreover, we show how the $\mathcal{NP}$-hard projection can be effectively executed on an adiabatic quantum annealer, harnessing recent advancements in quantum computation. We also introduce a projected version of this update rule and prove that if a fixed point exists in the binary variable space, the modified updates will converge to it. Last but not least, our algorithm is implemented layer-wise, making it suitable to train larger networks on resource-limited quantum hardware. Through extensive evaluations, we show that QP-SBGD outperforms or is on par with competitive and well-established baselines such as BinaryConnect, signSGD and ProxQuant when optimising the Rosenbrock function, training BNNs as well as binary graph neural networks.
    摘要 我们提出了QP-SBGD,一种适用于训练使用二进制权重的神经网络(BNNs)的新的层 wise随机优化器。BNNs可以减少深度学习模型的计算需求和能 consumption,但是在实际训练中仍然是一个开放的挑战。大多数已知的BNN优化器都是通过 projeted 更新或者在训练后对权重进行二进制化。然而,QP-SBGD可以将梯度约束在二进制变量上,通过解决一个二次约束 binary 优化问题来approximately 映射梯度。在实际可能的假设下,我们证明了这个更新规则在 $O(\frac{1}{\sqrt{T})$ 的速率下收敛。此外,我们还证明了在Quantum Annealer上实现这个更新规则的 $\mathcal{NP}$-hard проекcion可以高效地执行。此外,我们还引入了一个修改后的版本,并证明如果在二进制变量空间中存在固定点,那么修改后的更新规则将收敛到它。最后,我们的算法是层 wise 的,使其适用于训练更大的神经网络在有限的Quantum 硬件上。通过广泛的评估,我们表明了QP-SBGD可以与现有的竞争力强的基准值相比,如BinaryConnect、signSGD 和 ProxQuant 等,在 Rosenbrock 函数、BNN 以及二进制图 neural network 上进行优化。
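
As a toy illustration of projecting a gradient step onto binary variables through a quadratic binary problem: the sketch below encodes "stay close to the real-valued SGD target" as a QUBO and solves it by exhaustive search, a classical stand-in for the adiabatic annealer. The paper's actual QUBO formulation, constraints, and layer-wise schedule are more involved than this.

```python
import itertools
import numpy as np

def qubo_from_target(target: np.ndarray) -> np.ndarray:
    """Minimise ||(2b - 1) - target||^2 over b in {0,1}^n. Since b_i^2 = b_i,
    only linear (diagonal) terms of -4 * target_i survive, up to a constant."""
    Q = np.zeros((len(target), len(target)))
    np.fill_diagonal(Q, -4.0 * target)
    return Q

def solve_qubo_bruteforce(Q: np.ndarray) -> np.ndarray:
    n = Q.shape[0]
    best, best_val = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        b = np.array(bits, dtype=float)
        val = float(b @ Q @ b)
        if val < best_val:
            best, best_val = b, val
    return best

w = np.sign(np.random.randn(6))            # current binary weights in {-1, +1}
g = np.random.randn(6)                     # real-valued gradient
target = w - 0.1 * g                       # SGD target before binarisation
b = solve_qubo_bruteforce(qubo_from_target(target))
w_new = 2 * b - 1                          # projected binary update (= sign(target))
```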

SpVOS: Efficient Video Object Segmentation with Triple Sparse Convolution

  • paper_url: http://arxiv.org/abs/2310.15115
  • repo_url: None
  • paper_authors: Weihao Lin, Tao Chen, Chong Yu
  • for: Studies semi-supervised video object segmentation (Semi-VOS), which requires annotating only the first frame of a video to segment the subsequent frames.
  • methods: Proposes a sparse baseline named SpVOS that uses a novel triple sparse convolution to reduce the computation cost of the overall VOS framework.
  • results: Experiments on two mainstream VOS datasets show that SpVOS achieves strong segmentation performance while saving up to 42% FLOPs; for example, it obtains an 83.04% (79.29%) overall score on the DAVIS-2017 (Youtube-VOS) validation set, comparable to the typical non-sparse VOS baseline (82.88% for DAVIS-2017 and 80.36% for Youtube-VOS).
    Abstract Semi-supervised video object segmentation (Semi-VOS), which requires only annotating the first frame of a video to segment future frames, has received increased attention recently. Among existing pipelines, the memory-matching-based one is becoming the main research stream, as it can fully utilize the temporal sequence information to obtain high-quality segmentation results. Even though this type of method has achieved promising performance, the overall framework still suffers from heavy computation overhead, mainly caused by the per-frame dense convolution operations between high-resolution feature maps and each kernel filter. Therefore, we propose a sparse baseline of VOS named SpVOS in this work, which develops a novel triple sparse convolution to reduce the computation costs of the overall VOS framework. The designed triple gate, taking full consideration of both spatial and temporal redundancy between adjacent video frames, adaptively makes a triple decision to decide how to apply the sparse convolution on each pixel to control the computation overhead of each layer, while maintaining sufficient discrimination capability to distinguish similar objects and avoid error accumulation. A mixed sparse training strategy, coupled with a designed objective considering the sparsity constraint, is also developed to balance the VOS segmentation performance and computation costs. Experiments are conducted on two mainstream VOS datasets, including DAVIS and Youtube-VOS. Results show that, the proposed SpVOS achieves superior performance over other state-of-the-art sparse methods, and even maintains comparable performance, e.g., an 83.04% (79.29%) overall score on the DAVIS-2017 (Youtube-VOS) validation set, with the typical non-sparse VOS baseline (82.88% for DAVIS-2017 and 80.36% for Youtube-VOS) while saving up to 42% FLOPs, showing its application potential for resource-constrained scenarios.
    摘要 半supervised视频对象分割(Semi-VOS),需要只annotating the first frame of a video to segment future frames,在最近received increased attention。 Among existing pipelines, the memory-matching-based one is becoming the main research stream, as it can fully utilize the temporal sequence information to obtain high-quality segmentation results. Although this type of method has achieved promising performance, the overall framework still suffers from heavy computation overhead, mainly caused by the per-frame dense convolution operations between high-resolution feature maps and each kernel filter. Therefore, we propose a sparse baseline of VOS named SpVOS in this work, which develops a novel triple sparse convolution to reduce the computation costs of the overall VOS framework. The designed triple gate, taking full consideration of both spatial and temporal redundancy between adjacent video frames, adaptively makes a triple decision to decide how to apply the sparse convolution on each pixel to control the computation overhead of each layer, while maintaining sufficient discrimination capability to distinguish similar objects and avoid error accumulation. A mixed sparse training strategy, coupled with a designed objective considering the sparsity constraint, is also developed to balance the VOS segmentation performance and computation costs. Experiments are conducted on two mainstream VOS datasets, including DAVIS and Youtube-VOS. Results show that, the proposed SpVOS achieves superior performance over other state-of-the-art sparse methods, and even maintains comparable performance, e.g., an 83.04% (79.29%) overall score on the DAVIS-2017 (Youtube-VOS) validation set, with the typical non-sparse VOS baseline (82.88% for DAVIS-2017 and 80.36% for Youtube-VOS) while saving up to 42% FLOPs, showing its application potential for resource-constrained scenarios.

Matryoshka Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.15111
  • repo_url: None
  • paper_authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly
  • for: The paper is focused on high-resolution image and video synthesis, and it proposes a novel framework called Matryoshka Diffusion Models (MDM) to achieve this goal.
  • methods: The paper uses a diffusion process to denoise inputs at multiple resolutions jointly, and employs a NestedUNet architecture where features and parameters for small-scale inputs are nested within those of large scales. Additionally, the paper proposes a progressive training schedule from lower to higher resolutions, which helps to improve optimization for high-resolution generation.
  • results: The paper demonstrates the effectiveness of its approach on various benchmarks, including class-conditioned image generation, high-resolution text-to-image, and text-to-video applications. Remarkably, it shows that a single pixel-space model can be trained at resolutions of up to 1024x1024 pixels, with strong zero-shot generalization using the CC12M dataset, which contains only 12 million images.
    Abstract Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing methods often resort to training cascaded models in pixel space or using a downsampled latent space of a separately trained auto-encoder. In this paper, we introduce Matryoshka Diffusion Models(MDM), an end-to-end framework for high-resolution image and video synthesis. We propose a diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small-scale inputs are nested within those of large scales. In addition, MDM enables a progressive training schedule from lower to higher resolutions, which leads to significant improvements in optimization for high-resolution generation. We demonstrate the effectiveness of our approach on various benchmarks, including class-conditioned image generation, high-resolution text-to-image, and text-to-video applications. Remarkably, we can train a single pixel-space model at resolutions of up to 1024x1024 pixels, demonstrating strong zero-shot generalization using the CC12M dataset, which contains only 12 million images.
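
A schematic of the joint multi-resolution denoising objective could look like the following: the same clean image is noised at several scales and an epsilon-prediction loss is summed over scales. The toy cosine schedule, equal loss weights, and the placeholder `model` interface are assumptions; the NestedUNet and the paper's actual schedule are not reproduced here.

```python
import math
import torch
import torch.nn.functional as F

def multi_resolution_loss(model, x0: torch.Tensor, t: torch.Tensor,
                          scales=(1.0, 0.5, 0.25)) -> torch.Tensor:
    """x0: (B, C, H, W) clean images; t: (B,) diffusion times in [0, 1].
    `model(noisy_list, t)` is assumed to return one noise prediction per scale."""
    noisy, eps = [], []
    alpha = torch.cos(t * math.pi / 2).view(-1, 1, 1, 1)       # toy schedule
    for s in scales:
        xs = F.interpolate(x0, scale_factor=s, mode="bilinear") if s != 1.0 else x0
        e = torch.randn_like(xs)
        noisy.append(alpha * xs + (1 - alpha ** 2).sqrt() * e)
        eps.append(e)
    preds = model(noisy, t)
    return sum(F.mse_loss(p, e) for p, e in zip(preds, eps))
```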

Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model

  • paper_url: http://arxiv.org/abs/2310.15110
  • repo_url: https://github.com/sudo-ai-3d/zero123plus
  • paper_authors: Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, Hao Su
  • for: Generating 3D-consistent multi-view images from a single input view.
  • methods: An image diffusion model is conditioned and trained so as to take full advantage of pretrained 2D generative priors.
  • results: Produces high-quality, consistent multi-view images from a single image, overcoming common issues such as texture degradation and geometric misalignment, and supports enhanced control over the generation process by training a ControlNet on top of the model.
    Abstract We report Zero123++, an image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view. To take full advantage of pretrained 2D generative priors, we develop various conditioning and training schemes to minimize the effort of finetuning from off-the-shelf image diffusion models such as Stable Diffusion. Zero123++ excels in producing high-quality, consistent multi-view images from a single image, overcoming common issues like texture degradation and geometric misalignment. Furthermore, we showcase the feasibility of training a ControlNet on Zero123++ for enhanced control over the generation process. The code is available at https://github.com/SUDO-AI-3D/zero123plus.
    摘要 我团队报道Zero123++,一种基于图像的扩散模型,可以从单个输入视图中生成3D保持一致的多视图图像。为了完全利用预训练的2D生成假设,我们开发了多种conditioning和训练方案,以最小化从存储库中的扩散模型(如稳定扩散)的训练时间。Zero123++在生成高质量、一致的多视图图像方面表现出色,解决了通常出现的 текстура强制下降和几何不对齐问题。此外,我们还展示了在Zero123++上训练控制网络以提高生成过程的控制性的可能性。代码可以在https://github.com/SUDO-AI-3D/zero123plus上获取。

FD-Align: Feature Discrimination Alignment for Fine-tuning Pre-Trained Models in Few-Shot Learning

  • paper_url: http://arxiv.org/abs/2310.15105
  • repo_url: https://github.com/skingorz/fd-align
  • paper_authors: Kun Song, Huimin Ma, Bochao Zou, Huishuai Zhang, Weiran Huang
  • for: 提高预训练模型的下游任务性能,尤其是在分布转移时。
  • methods: 提出了一种名为Feature Discrimination Alignment(FD-Align)的细化方法,通过保持幌子特征的一致性来增强模型的通用性。
  • results: 经验证明,该方法可以提高ID和OOD任务的性能,并且可以轻松地与现有方法集成,从而提高模型的总性能。
    Abstract Due to the limited availability of data, existing few-shot learning methods trained from scratch fail to achieve satisfactory performance. In contrast, large-scale pre-trained models such as CLIP demonstrate remarkable few-shot and zero-shot capabilities. To enhance the performance of pre-trained models for downstream tasks, fine-tuning the model on downstream data is frequently necessary. However, fine-tuning the pre-trained model leads to a decrease in its generalizability in the presence of distribution shift, while the limited number of samples in few-shot learning makes the model highly susceptible to overfitting. Consequently, existing methods for fine-tuning few-shot learning primarily focus on fine-tuning the model's classification head or introducing additional structure. In this paper, we introduce a fine-tuning approach termed Feature Discrimination Alignment (FD-Align). Our method aims to bolster the model's generalizability by preserving the consistency of spurious features across the fine-tuning process. Extensive experimental results validate the efficacy of our approach for both ID and OOD tasks. Once fine-tuned, the model can seamlessly integrate with existing methods, leading to performance improvements. Our code can be found in https://github.com/skingorz/FD-Align.

On the Detection of Image-Scaling Attacks in Machine Learning

  • paper_url: http://arxiv.org/abs/2310.15085
  • repo_url: https://github.com/equiw/2023-detection-scalingattacks
  • paper_authors: Erwin Quiring, Andreas Müller, Konrad Rieck
  • for: studying how to detect image-scaling attacks, so that this class of attacks can be reliably spotted and defended against in practice.
  • methods: systematizes two general detection paradigms and derives novel detection methods from them that are simple in design yet significantly outperform previous work.
  • results: in a comprehensive evaluation across all major learning platforms and scaling algorithms, the methods reliably detect attacks that modify the entire scaled image (even under an adaptive adversary) and still perform strongly when only minor parts of the image are manipulated.
    Abstract Image scaling is an integral part of machine learning and computer vision systems. Unfortunately, this preprocessing step is vulnerable to so-called image-scaling attacks where an attacker makes unnoticeable changes to an image so that it becomes a new image after scaling. This opens up new ways for attackers to control the prediction or to improve poisoning and backdoor attacks. While effective techniques exist to prevent scaling attacks, their detection has not been rigorously studied yet. Consequently, it is currently not possible to reliably spot these attacks in practice. This paper presents the first in-depth systematization and analysis of detection methods for image-scaling attacks. We identify two general detection paradigms and derive novel methods from them that are simple in design yet significantly outperform previous work. We demonstrate the efficacy of these methods in a comprehensive evaluation with all major learning platforms and scaling algorithms. First, we show that image-scaling attacks modifying the entire scaled image can be reliably detected even under an adaptive adversary. Second, we find that our methods provide strong detection performance even if only minor parts of the image are manipulated. As a result, we can introduce a novel protection layer against image-scaling attacks.
    摘要 Image scaling 是机器学习和计算机视觉系统中的一个基本步骤。然而,这个预处理步骤受到称为图像缩放攻击的威胁。这些攻击使攻击者可以隐蔽地改变图像,使其变成一个新的图像 после 缩放。这开创了新的攻击方式,让攻击者可以控制预测或提高毒剂和后门攻击。虽然有有效的防御技术,但检测这些攻击还没有得到系统的研究。因此,目前并不可靠地检测这些攻击。本文提出了第一个系统化和分析检测图像缩放攻击的方法。我们标识了两个通用检测方法,并从这些方法中 derivation 出了简单设计的新方法。我们在对所有主要学习平台和缩放算法进行了全面的评估中,证明了这些方法的效果。首先,我们表明了修改整个缩放图像的攻击可以可靠地检测,即使敌方是可靠的。其次,我们发现我们的方法在只有少量图像部分被修改时也有强大的检测性能。因此,我们可以在图像缩放攻击中引入一种新的保护层。
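For readers who want a feel for the comparison-based detection paradigm mentioned above, here is a minimal Python sketch of one plausible heuristic; it is not the authors' detector, and the threshold, target size, and choice of interpolation kernels are illustrative assumptions only.

```python
import cv2
import numpy as np

def scaling_attack_score(image, target_hw=(224, 224)):
    """Heuristic score for image-scaling attacks (illustrative, not the paper's method).

    A scaling attack hides content in exactly the sparse pixels that kernels such as
    nearest-neighbour sample, so a sparse downscale diverges from a robust area
    average when an attack is present."""
    h, w = target_hw
    sparse = cv2.resize(image, (w, h), interpolation=cv2.INTER_NEAREST).astype(np.float32)
    robust = cv2.resize(image, (w, h), interpolation=cv2.INTER_AREA).astype(np.float32)
    return float(np.mean((sparse - robust) ** 2))  # flag as suspicious above a calibrated threshold
```

In practice such a score would first be calibrated on benign images of the target pipeline before fixing a decision threshold.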

E4S: Fine-grained Face Swapping via Editing With Regional GAN Inversion

  • paper_url: http://arxiv.org/abs/2310.15081
  • repo_url: https://github.com/e4s2023/e4s2023
  • paper_authors: Maomao Li, Ge Yuan, Cairong Wang, Zhian Liu, Yong Zhang, Yongwei Nie, Jue Wang, Dong Xu
  • for: fine-grained face swapping, recast as an editing task ("editing for swapping", E4S); traditional face swapping relies on global feature extraction and often loses the source identity, whereas this framework uses a Regional GAN Inversion (RGI) method that explicitly disentangles shape and texture.
  • methods: performs face swapping in the latent space of a pretrained StyleGAN: a multi-scale mask-guided encoder projects the texture of each facial component into regional style codes, and a mask-guided injection module manipulates feature maps with those codes, reducing face swapping to style and mask swapping.
  • results: preserves texture, shape, and lighting better than prior methods; a re-coloring network keeps the swapped face consistent with the target lighting, and a face inpainting network handles potential mismatch areas after mask exchange. Implementation: https://github.com/e4s2023/E4S2023.
    Abstract This paper proposes a novel approach to face swapping from the perspective of fine-grained facial editing, dubbed "editing for swapping" (E4S). The traditional face swapping methods rely on global feature extraction and often fail to preserve the source identity. In contrast, our framework proposes a Regional GAN Inversion (RGI) method, which allows the explicit disentanglement of shape and texture. Specifically, our E4S performs face swapping in the latent space of a pretrained StyleGAN, where a multi-scale mask-guided encoder is applied to project the texture of each facial component into regional style codes and a mask-guided injection module then manipulates feature maps with the style codes. Based on this disentanglement, face swapping can be simplified as style and mask swapping. Besides, since reconstructing the source face in the target image may lead to disharmony lighting, we propose to train a re-coloring network to make the swapped face maintain the lighting condition on the target face. Further, to deal with the potential mismatch area during mask exchange, we designed a face inpainting network as post-processing. The extensive comparisons with state-of-the-art methods demonstrate that our E4S outperforms existing methods in preserving texture, shape, and lighting. Our implementation is available at https://github.com/e4s2023/E4S2023.

RD-VIO: Robust Visual-Inertial Odometry for Mobile Augmented Reality in Dynamic Environments

  • paper_url: http://arxiv.org/abs/2310.15072
  • repo_url: None
  • paper_authors: Jinyu Li, Xiaokun Pan, Gan Huang, Ziyang Zhang, Nan Wang, Hujun Bao, Guofeng Zhang
  • for: a visual-inertial odometry (VIO) system for mobile augmented reality that handles dynamic scenes and pure-rotation motion.
  • methods: proposes RD-VIO, which uses an IMU-PARSAC algorithm to robustly detect and match keypoints in two stages (landmarks are first matched to new keypoints using visual and IMU measurements, and the collected statistics then guide intra-keypoint matching); pure-rotational frames are detected and treated as special subframes with deferred triangulation, adding extra constraints in the visual-inertial bundle adjustment.
  • results: experiments on public datasets show that RD-VIO has clear advantages over other methods in dynamic environments.
    Abstract It is typically challenging for visual or visual-inertial odometry systems to handle the problems of dynamic scenes and pure rotation. In this work, we design a novel visual-inertial odometry (VIO) system called RD-VIO to handle both of these two problems. Firstly, we propose an IMU-PARSAC algorithm which can robustly detect and match keypoints in a two-stage process. In the first state, landmarks are matched with new keypoints using visual and IMU measurements. We collect statistical information from the matching and then guide the intra-keypoint matching in the second stage. Secondly, to handle the problem of pure rotation, we detect the motion type and adapt the deferred-triangulation technique during the data-association process. We make the pure-rotational frames into the special subframes. When solving the visual-inertial bundle adjustment, they provide additional constraints to the pure-rotational motion. We evaluate the proposed VIO system on public datasets. Experiments show the proposed RD-VIO has obvious advantages over other methods in dynamic environments.
    摘要 通常情况下,视觉或视觉-遥感增益系统会遇到动态场景和纯旋转的问题。在这项工作中,我们设计了一种新的视觉-遥感增益(VIO)系统,称为RD-VIO,以处理这两个问题。首先,我们提出了一种IMU-PARSAC算法,可以强健地检测和匹配视觉和IMU测量中的关键点。在第一个阶段,我们使用视觉和IMU测量来匹配地标,并收集视觉和IMU测量中的统计信息,以后在第二个阶段进行内部匹配。其次,为了处理纯旋转的问题,我们在数据关联过程中检测运动类型,并适应延迟三角形技术。将纯旋转的帧转换为特殊子帧,并在视觉-遥感套件调整中提供附加纯旋转运动的约束。我们对公共数据集进行了评测,实验表明,我们的RD-VIO系统在动态环境中具有明显的优势。

DREAM+: Efficient Dataset Distillation by Bidirectional Representative Matching

  • paper_url: http://arxiv.org/abs/2310.15052
  • repo_url: https://github.com/lyq312318224/dream
  • paper_authors: Yanqing Liu, Jianyang Gu, Kai Wang, Zheng Zhu, Kaipeng Zhang, Wei Jiang, Yang You
  • for: dataset distillation, i.e., creating compact datasets that match the training performance of the original large-scale ones while cutting storage and training costs
  • methods: a new matching strategy, Dataset Distillation by Bidirectional REpresentAtive Matching (DREAM+), which selects representative original images for bidirectional matching instead of naive random sampling and is applicable to mainstream dataset distillation frameworks
  • results: reduces the number of distillation iterations by more than 15 times without affecting performance, and with sufficient training time further improves results to the state of the art
    Abstract Dataset distillation plays a crucial role in creating compact datasets with similar training performance compared with original large-scale ones. This is essential for addressing the challenges of data storage and training costs. Prevalent methods facilitate knowledge transfer by matching the gradients, embedding distributions, or training trajectories of synthetic images with those of the sampled original images. Although there are various matching objectives, currently the strategy for selecting original images is limited to naive random sampling. We argue that random sampling overlooks the evenness of the selected sample distribution, which may result in noisy or biased matching targets. Besides, the sample diversity is also not constrained by random sampling. Additionally, current methods predominantly focus on single-dimensional matching, where information is not fully utilized. To address these challenges, we propose a novel matching strategy called Dataset Distillation by Bidirectional REpresentAtive Matching (DREAM+), which selects representative original images for bidirectional matching. DREAM+ is applicable to a variety of mainstream dataset distillation frameworks and significantly reduces the number of distillation iterations by more than 15 times without affecting performance. Given sufficient training time, DREAM+ can further improve the performance and achieve state-of-the-art results. We have released the code at github.com/NUS-HPC-AI-Lab/DREAM+.
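As a rough illustration of what "representative matching" can mean in contrast to naive random sampling, the sketch below picks the images closest to k-means sub-cluster centers of a class. This mirrors the clustering idea behind DREAM-style selection, but the function and its parameters are assumptions for illustration, not code from the DREAM+ repository.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_representative_indices(features, n_select, seed=0):
    """features: (N, D) embeddings of real images from one class.

    Instead of uniform random sampling, return the indices of the n_select images
    closest to k-means sub-cluster centres, so the selected matching targets cover
    the class distribution evenly and with less noise."""
    km = KMeans(n_clusters=n_select, n_init=10, random_state=seed).fit(features)
    return [int(np.argmin(np.linalg.norm(features - c, axis=1)))
            for c in km.cluster_centers_]
```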

CalibrationPhys: Self-supervised Video-based Heart and Respiratory Rate Measurements by Calibrating Between Multiple Cameras

  • paper_url: http://arxiv.org/abs/2310.15043
  • repo_url: None
  • paper_authors: Yusuke Akamatsu, Terumi Umematsu, Hitoshi Imaoka
  • for: video-based heart and respiratory rate measurement from facial videos, which is more useful and user-friendly than traditional contact-based sensors.
  • methods: CalibrationPhys, a self-supervised method that trains on facial videos captured simultaneously by multiple cameras and calibrates between them with contrastive learning, so no costly ground-truth pulse or respiratory waves are needed; data augmentation and a pre-trained camera-specific model further improve robustness.
  • results: experiments on two datasets show that it outperforms state-of-the-art methods and makes it easy to use arbitrary cameras for heart and respiratory rate measurement.
    Abstract Video-based heart and respiratory rate measurements using facial videos are more useful and user-friendly than traditional contact-based sensors. However, most of the current deep learning approaches require ground-truth pulse and respiratory waves for model training, which are expensive to collect. In this paper, we propose CalibrationPhys, a self-supervised video-based heart and respiratory rate measurement method that calibrates between multiple cameras. CalibrationPhys trains deep learning models without supervised labels by using facial videos captured simultaneously by multiple cameras. Contrastive learning is performed so that the pulse and respiratory waves predicted from the synchronized videos using multiple cameras are positive and those from different videos are negative. CalibrationPhys also improves the robustness of the models by means of a data augmentation technique and successfully leverages a pre-trained model for a particular camera. Experimental results utilizing two datasets demonstrate that CalibrationPhys outperforms state-of-the-art heart and respiratory rate measurement methods. Since we optimize camera-specific models using only videos from multiple cameras, our approach makes it easy to use arbitrary cameras for heart and respiratory rate measurements.
    摘要 traditional contact-based sensors的替代方案,使用视频来测量心跳和呼吸频率更加有用和易用。然而,现有的深度学习方法大多需要训练用的真实心跳和呼吸波,这些数据集成本昂贵。本文提出了一种自我超vised video基于心跳和呼吸频率测量方法,即CalibrationPhys。CalibrationPhys使用多个摄像头同时拍摄的人脸视频进行自我超vised学习,无需真实心跳和呼吸波的标注。我们使用对ynchronous的多个摄像头拍摄的人脸视频进行对比学习,以便在多个摄像头上测量心跳和呼吸频率。此外,我们还使用数据增强技术来提高模型的Robustness。实验结果表明,CalibrationPhys可以高效地测量心跳和呼吸频率,并且可以使用任意摄像头进行测量。因为我们只需要使用多个摄像头拍摄的视频来优化相机特定的模型,因此我们的方法很容易使用。
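The cross-camera calibration described above boils down to a contrastive objective over the predicted waves. The PyTorch sketch below shows one way such a loss could look, assuming batches of pulse waves predicted from two synchronized cameras; the Pearson-correlation similarity and the temperature value are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def pearson(a, b, eps=1e-8):
    # a, b: (batch, time) waves; returns the per-row Pearson correlation.
    a = a - a.mean(dim=1, keepdim=True)
    b = b - b.mean(dim=1, keepdim=True)
    return (a * b).sum(dim=1) / (a.norm(dim=1) * b.norm(dim=1) + eps)

def calibration_contrastive_loss(waves_cam1, waves_cam2, temperature=0.1):
    """waves_cam*: (batch, time) waves predicted from two synchronized cameras.

    Waves from the same recording (same batch index) are positives, waves from
    different recordings are negatives -- an InfoNCE-style objective."""
    sim = torch.stack([pearson(waves_cam1[i].expand_as(waves_cam2), waves_cam2)
                       for i in range(waves_cam1.shape[0])])      # (batch, batch)
    labels = torch.arange(waves_cam1.shape[0])
    return F.cross_entropy(sim / temperature, labels)

# e.g. loss = calibration_contrastive_loss(torch.randn(8, 300), torch.randn(8, 300))
```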

Manipulation Mask Generator: High-Quality Image Manipulation Mask Generation Method Based on Modified Total Variation Noise Reduction

  • paper_url: http://arxiv.org/abs/2310.15041
  • repo_url: None
  • paper_authors: Xinyu Yang, Jizhe Zhou
  • for: generating high-quality image-manipulation masks, so that the resulting datasets improve the performance of deep learning tamper-detection models
  • methods: a modified total variation noise reduction method applied to the difference between original and tampered images (automatically crawled from the Baidu PS Bar), combined with MSER and NMS to recover slender text information that would otherwise be lost
  • results: yields images with little noise while largely preserving text information, producing datasets that can be used to train deep learning models to better results
    Abstract In artificial intelligence, any model that wants to achieve a good result is inseparable from a large number of high-quality data. It is especially true in the field of tamper detection. This paper proposes a modified total variation noise reduction method to acquire high-quality tampered images. We automatically crawl original and tampered images from the Baidu PS Bar. Baidu PS Bar is a website where net friends post countless tampered images. Subtracting the original image with the tampered image can highlight the tampered area. However, there is also substantial noise on the final print, so these images can't be directly used in the deep learning model. Our modified total variation noise reduction method is aimed at solving this problem. Because a lot of text is slender, it is easy to lose text information after the opening and closing operation. We use MSER (Maximally Stable Extremal Regions) and NMS (Non-maximum Suppression) technology to extract text information. And then use the modified total variation noise reduction technology to process the subtracted image. Finally, we can obtain an image with little noise by adding the image and text information. And the idea also largely retains the text information. Datasets generated in this way can be used in deep learning models, and they will help the model achieve better results.
    摘要 在人工智能中,任何模型想要获得好的结果,都是不可或缺的大量高质量数据。特别是在妥协检测领域。这篇论文提出了一种修改后总变量噪声减少方法,以获得高质量的妥协图像。我们自动爬取了原始图像和妥协图像从百度PS栏。百度PS栏是一个网上网友发布 countless 妥协图像的网站。将原始图像 subtracted 妥协图像可以高亮妥协区域。但是,最终图像还有较大的噪声,所以这些图像无法直接用于深度学习模型。我们修改后总变量噪声减少方法是解决这个问题的目标。因为文本很多是细长的,在开关和关闭操作中容易产生文本信息损失。我们使用 MSER (最大稳定极值区域) 和 NMS (非最大suppression) 技术提取文本信息。然后使用修改后总变量噪声减少技术处理减去后的图像。最后,我们可以获得一个噪声少的图像,并将图像和文本信息相加。这种方法可以帮助模型获得更好的结果,同时保留了文本信息的大部分。这些数据集可以用于深度学习模型,并帮助模型取得更好的结果。
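To make the pipeline above concrete, here is a compact OpenCV/scikit-image sketch of the same idea (difference image, total-variation denoising, MSER text recovery). The parameter values and the omission of the NMS step are simplifications, so treat it as an illustration rather than the authors' implementation.

```python
import cv2
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def manipulation_mask(original_bgr, tampered_bgr):
    # 1) The difference image highlights the manipulated area, plus noise.
    gray = cv2.cvtColor(cv2.absdiff(original_bgr, tampered_bgr), cv2.COLOR_BGR2GRAY)

    # 2) Total-variation denoising suppresses the residual noise in the difference.
    denoised = (denoise_tv_chambolle(gray.astype(np.float32) / 255.0, weight=0.1) * 255).astype(np.uint8)

    # 3) MSER recovers slender text regions that the smoothing tends to wash out.
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(cv2.cvtColor(tampered_bgr, cv2.COLOR_BGR2GRAY))
    text_mask = np.zeros_like(gray)
    for pts in regions:
        cv2.fillPoly(text_mask, [pts.reshape(-1, 1, 2)], 255)

    # 4) Add back the text regions that overlap the active difference, then binarize.
    active = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)[1]
    combined = cv2.max(denoised, cv2.bitwise_and(text_mask, active))
    return cv2.threshold(combined, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
```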

P2AT: Pyramid Pooling Axial Transformer for Real-time Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.15025
  • repo_url: None
  • paper_authors: Mohammed A. M. Elhassan, Changjun Zhou, Amina Benabid, Abuzar B. M. Adam
  • for: real-time semantic segmentation, improving scene-understanding accuracy for real-time tasks such as autonomous driving.
  • methods: proposes the Pyramid Pooling Axial Transformer (P2AT), consisting of a CNN encoder, a pyramid pooling axial transformer for scale-aware contextual features, a Bidirectional Fusion (BiF) module, a Global Context Enhancer, and a decoder block.
  • results: P2AT variants reach state-of-the-art results on three challenging scene-understanding datasets: 80.5%, 81.0%, and 81.1% on Camvid for P2AT-S, P2AT-M, and P2AT-L, and 78.7% on Cityscapes for P2AT-M, with further evaluation on Pascal VOC 2012.
    Abstract Recently, Transformer-based models have achieved promising results in various vision tasks, due to their ability to model long-range dependencies. However, transformers are computationally expensive, which limits their applications in real-time tasks such as autonomous driving. In addition, an efficient local and global feature selection and fusion are vital for accurate dense prediction, especially driving scene understanding tasks. In this paper, we propose a real-time semantic segmentation architecture named Pyramid Pooling Axial Transformer (P2AT). The proposed P2AT takes a coarse feature from the CNN encoder to produce scale-aware contextual features, which are then combined with the multi-level feature aggregation scheme to produce enhanced contextual features. Specifically, we introduce a pyramid pooling axial transformer to capture intricate spatial and channel dependencies, leading to improved performance on semantic segmentation. Then, we design a Bidirectional Fusion module (BiF) to combine semantic information at different levels. Meanwhile, a Global Context Enhancer is introduced to compensate for the inadequacy of concatenating different semantic levels. Finally, a decoder block is proposed to help maintain a larger receptive field. We evaluate P2AT variants on three challenging scene-understanding datasets. In particular, our P2AT variants achieve state-of-art results on the Camvid dataset 80.5%, 81.0%, 81.1% for P2AT-S, P2ATM, and P2AT-L, respectively. Furthermore, our experiment on Cityscapes and Pascal VOC 2012 have demonstrated the efficiency of the proposed architecture, with results showing that P2AT-M, achieves 78.7% on Cityscapes. The source code will be available at
    摘要 近些时间,基于Transformer的模型在各种视觉任务中取得了有前途的成绩,这主要归功于它们的长距离依赖关系模型。然而,Transformer是 computationally 昂贵的,这限制了它们在实时任务,如自动驾驶,的应用。此外,fficient的本地和全局特征选择和融合是 dense prediction 精度的关键,特别是驾驶场景理解任务。在这篇论文中,我们提出了一种实时semantic segmentation 架构,名为Pyramid Pooling Axial Transformer (P2AT)。我们的提案的P2AT使用 CNN Encoder 中的粗细特征来生成尺度意义的Contextual Features,然后使用多级特征聚合方案来生成加强的Contextual Features。具体来说,我们引入了一种pyramid pooling axial transformer,以捕捉细致的空间和通道相互关系,从而提高了semantic segmentation 的性能。此外,我们设计了一种Bidirectional Fusion 模块(BiF),用于将不同水平的semantic信息融合。同时,我们引入了一种全球上下文增强器(Global Context Enhancer),以补做不同水平semantic信息融合的不足。最后,我们提出了一个解码块,以帮助保持更大的接收器场。我们的P2AT变体在三个复杂的场景理解数据集上进行了评估,具体来说是 Camvid 数据集的80.5%, 81.0%, 81.1% для P2AT-S, P2ATM, 和 P2AT-L 分别。此外,我们对 Cityscapes 和 Pascal VOC 2012 进行了实验,结果表明,P2AT-M 在 Cityscapes 上达到了78.7%。源代码将在 [

SONIC: Sonar Image Correspondence using Pose Supervised Learning for Imaging Sonars

  • paper_url: http://arxiv.org/abs/2310.15023
  • repo_url: None
  • paper_authors: Samiran Gode, Akshay Hinduja, Michael Kaess
  • for: solving the data association problem in underwater SLAM by matching sonar images with learned features.
  • methods: proposes SONIC (SONar Image Correspondence), a pose-supervised network that yields feature correspondences robust to viewpoint variations.
  • results: generates significantly better correspondences for sonar images, paving the way for more accurate loop-closure constraints and sonar-based place recognition; code and both simulated and real-world datasets will be made public.
    Abstract In this paper, we address the challenging problem of data association for underwater SLAM through a novel method for sonar image correspondence using learned features. We introduce SONIC (SONar Image Correspondence), a pose-supervised network designed to yield robust feature correspondence capable of withstanding viewpoint variations. The inherent complexity of the underwater environment stems from the dynamic and frequently limited visibility conditions, restricting vision to a few meters of often featureless expanses. This makes camera-based systems suboptimal in most open water application scenarios. Consequently, multibeam imaging sonars emerge as the preferred choice for perception sensors. However, they too are not without their limitations. While imaging sonars offer superior long-range visibility compared to cameras, their measurements can appear different from varying viewpoints. This inherent variability presents formidable challenges in data association, particularly for feature-based methods. Our method demonstrates significantly better performance in generating correspondences for sonar images which will pave the way for more accurate loop closure constraints and sonar-based place recognition. Code as well as simulated and real-world datasets will be made public to facilitate further development in the field.
    摘要 在这篇论文中,我们解决了水下SLAM中数据关联的挑战问题,通过一种新的声波图像匹配方法,使用学习的特征。我们提出了SONIC(声波图像匹配),一种姿态监睹的网络,可以生成Robust的特征匹配,抗抗视点变化。水下环境的内在复杂性来自于动态和有限的视野条件,这限制了视野到几米的空气。这使得摄像头系统在大多数开水应用场景中不太适用。因此,多束声波扫描仪成为了感知传感器的首选。然而,它们也不 Without its limitations。声波扫描仪可以在不同视点下提供长距离可见性,但是它们的测量可能会因视点变化而显示不同。这种内在的变化对数据关联具有挑战性,特别是基于特征方法。我们的方法在生成声波图像匹配中表现出了显著的改善,这将为更精确的循环关闭约束和声波基于地理位置识别提供道路。我们将代码、模拟和实际数据集一起发布,以便进一步发展在这个领域。

Wonder3D: Single Image to 3D using Cross-Domain Diffusion

  • paper_url: http://arxiv.org/abs/2310.15008
  • repo_url: None
  • paper_authors: Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, Wenping Wang
  • for: efficiently generating high-fidelity textured 3D meshes from a single-view image, improving the quality, consistency, and efficiency of image-to-3D tasks.
  • methods: a cross-domain multi-view diffusion model that generates multi-view normal maps together with the corresponding color images, a multi-view cross-domain attention mechanism that keeps the views consistent, and a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the multi-view 2D representations.
  • results: extensive evaluations show high-quality reconstruction results, robust generalization, and reasonably good efficiency compared with prior work.
    Abstract In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images.Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly produce 3D information via fast network inferences, but their results are often of low quality and lack geometric details. To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. To ensure consistency, we employ a multi-view cross-domain attention mechanism that facilitates information exchange across views and modalities. Lastly, we introduce a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the multi-view 2D representations. Our extensive evaluations demonstrate that our method achieves high-quality reconstruction results, robust generalization, and reasonably good efficiency compared to prior works.
    摘要 在这个研究中,我们介绍了 Wonder3D,一种新的方法,能够高效地生成具有高品质的纹理降降 mesh 从单视图图像。现有的基于 Score Distillation Sampling(SDS)的方法有可能从二维扩散先验中恢复三维几何结构,但它们通常受到每个形状的优化时间consuming和不一致的几何结构的限制。与之相反,一些工作直接通过快速网络推理生成了三维信息,但其结果通常是低质量的,缺乏几何细节。为了全面提高图像到三维任务的质量、一致性和效率,我们提议了一种域隔扩散模型,该模型生成了多视图正常地图和对应的颜色图像。为保持一致性,我们使用了多视图域隔扩散注意机制,该机制促进了不同视图和模式之间的信息交换。最后,我们引入了一种几何意识的正常融合算法,该算法从多视图二维表示中提取出高质量的表面。我们的广泛评估表明,我们的方法可以实现高质量的重建结果,良好的一致性和理想的效率,相比于先前的方法。

StenUNet: Automatic Stenosis Detection from X-ray Coronary Angiography

  • paper_url: http://arxiv.org/abs/2310.14961
  • repo_url: https://github.com/huilin0220/stenunet
  • paper_authors: Hui Lin, Tom Liu, Aggelos Katsaggelos, Adrienne Kline
  • for: automatic stenosis detection from X-ray coronary angiography, the primary method for diagnosing coronary artery disease (CAD)
  • methods: the StenUNet architecture, combining machine learning with other computer vision techniques
  • results: F1 score of 0.5348 on the test set, 0.0005 below the 2nd place, ranking 3rd among all teams in the ARCADE challenge
    Abstract Coronary angiography continues to serve as the primary method for diagnosing coronary artery disease (CAD), which is the leading global cause of mortality. The severity of CAD is quantified by the location, degree of narrowing (stenosis), and number of arteries involved. In current practice, this quantification is performed manually using visual inspection and thus suffers from poor inter- and intra-rater reliability. The MICCAI grand challenge: Automatic Region-based Coronary Artery Disease diagnostics using the X-ray angiography imagEs (ARCADE) curated a dataset with stenosis annotations, with the goal of creating an automated stenosis detection algorithm. Using a combination of machine learning and other computer vision techniques, we propose the architecture and algorithm StenUNet to accurately detect stenosis from X-ray Coronary Angiography. Our submission to the ARCADE challenge placed 3rd among all teams. We achieved an F1 score of 0.5348 on the test set, 0.0005 lower than the 2nd place.
    摘要 心血管绘影继续serve as the primary方法 для诊断心血管疾病(CAD),这是全球最主要的死亡原因。CAD的严重程度由病变的位置、狭窄程度(stenosis)和涉及的动脉数量来衡量。在当前的实践中,这些评估是通过视觉检查进行手动实施,因此受到poor inter-和intra-评估者的可靠性的限制。MICCAI大挑战:自动区域基于X射线绘影的 coronary artery disease 诊断(ARCADE)筹集了stenosis注解,目的是创建一个自动检测stenosis的算法。通过machine learning和其他计算机视觉技术,我们提出了StenUNet架构和算法,以准确地从X射线心血管绘影中检测stenosis。我们对ARCADE挑战的提交位列第3,测试集F1分数为0.5348,相比第2名的0.0005低。

Learning Real-World Image De-Weathering with Imperfect Supervision

  • paper_url: http://arxiv.org/abs/2310.14958
  • repo_url: https://github.com/1180300419/imperfect-deweathering
  • paper_authors: Xiaohui Liu, Zhilu Zhang, Xiaohe Wu, Chaoyu Feng, Xiaotao Wang, LEI LEI, Wangmeng Zuo
  • for: improving real-world image de-weathering, where inconsistent illumination, position, and textures between the ground-truth and degraded images in existing datasets provide imperfect supervision for learning-based methods.
  • methods: a Unified Inconsistency Addressing (UIA) approach: a Consistent Label Constructor (CLC), inspired by information bottleneck theory, generates pseudo-labels consistent with the input degraded image (also exploiting adjacent frames), and an Information Allocation Strategy (IAS) combines the original imperfect labels with the pseudo-labels to jointly supervise the de-weathering model.
  • results: experiments on two real-world de-weathering datasets show that the method helps existing de-weathering models achieve better performance; code is available at https://github.com/1180300419/imperfect-deweathering.
    Abstract Real-world image de-weathering aims at removing various undesirable weather-related artifacts. Owing to the impossibility of capturing image pairs concurrently, existing real-world de-weathering datasets often exhibit inconsistent illumination, position, and textures between the ground-truth images and the input degraded images, resulting in imperfect supervision. Such non-ideal supervision negatively affects the training process of learning-based de-weathering methods. In this work, we attempt to address the problem with a unified solution for various inconsistencies. Specifically, inspired by information bottleneck theory, we first develop a Consistent Label Constructor (CLC) to generate a pseudo-label as consistent as possible with the input degraded image while removing most weather-related degradations. In particular, multiple adjacent frames of the current input are also fed into CLC to enhance the pseudo-label. Then we combine the original imperfect labels and pseudo-labels to jointly supervise the de-weathering model by the proposed Information Allocation Strategy (IAS). During testing, only the de-weathering model is used for inference. Experiments on two real-world de-weathering datasets show that our method helps existing de-weathering models achieve better performance. Codes are available at https://github.com/1180300419/imperfect-deweathering.
    摘要 现实世界中的图像去气化目标是去除各种不好的天气相关的artefacts。由于不能同时拍摄图像对,现有的现实世界去气化数据集经常表现出不一致的照明、位置和文本ure между真实图像和输入降低图像,这会负面影响学习基于的去气化方法的训练过程。在这种情况下,我们尝试解决这个问题,通过一种统一的解决方案来处理不同的不一致。具体来说,我们首先开发了一种适应信息瓶颈理论的Consistent Label Constructor (CLC),用于生成与输入降低图像最接近的pseudo-标签。具体来说,我们将多个相邻帧的输入图像 feed into CLC,以增强pseudo-标签。然后,我们将原始不完美标签和pseudo-标签共同用于supervising去气化模型,通过我们提出的信息分配策略(IAS)。在测试时,只有去气化模型进行推理。实验结果表明,我们的方法可以帮助现有的去气化模型在两个真实的去气化数据集上提高性能。代码可以在https://github.com/1180300419/imperfect-deweathering上获取。

Robust Depth Linear Error Decomposition with Double Total Variation and Nuclear Norm for Dynamic MRI Reconstruction

  • paper_url: http://arxiv.org/abs/2310.14934
  • repo_url: None
  • paper_authors: Junpeng Tan, Chunmei Qing, Xiangmin Xu
  • for: improving both the speed and the accuracy of dynamic MRI reconstruction.
  • methods: a robust low-rank dynamic MRI reconstruction optimization model for highly under-sampled k-space data with the discrete Fourier transform (DFT), called the Robust Depth Linear Error Decomposition Model (RDLEDM); it combines a linear image-domain error decomposition with double total variation (TV) and double nuclear norm (NN) regularizations, and is optimized with a fast primal-dual algorithm.
  • results: compared with five state-of-the-art methods, extensive experiments on dynamic MRI data demonstrate superior performance in both reconstruction accuracy and time complexity.
    Abstract Compressed Sensing (CS) significantly speeds up Magnetic Resonance Image (MRI) processing and achieves accurate MRI reconstruction from under-sampled k-space data. According to the current research, there are still several problems with dynamic MRI k-space reconstruction based on CS. 1) There are differences between the Fourier domain and the Image domain, and the differences between MRI processing of different domains need to be considered. 2) As three-dimensional data, dynamic MRI has its spatial-temporal characteristics, which need to calculate the difference and consistency of surface textures while preserving structural integrity and uniqueness. 3) Dynamic MRI reconstruction is time-consuming and computationally resource-dependent. In this paper, we propose a novel robust low-rank dynamic MRI reconstruction optimization model via highly under-sampled and Discrete Fourier Transform (DFT) called the Robust Depth Linear Error Decomposition Model (RDLEDM). Our method mainly includes linear decomposition, double Total Variation (TV), and double Nuclear Norm (NN) regularizations. By adding linear image domain error analysis, the noise is reduced after under-sampled and DFT processing, and the anti-interference ability of the algorithm is enhanced. Double TV and NN regularizations can utilize both spatial-temporal characteristics and explore the complementary relationship between different dimensions in dynamic MRI sequences. In addition, Due to the non-smoothness and non-convexity of TV and NN terms, it is difficult to optimize the unified objective model. To address this issue, we utilize a fast algorithm by solving a primal-dual form of the original problem. Compared with five state-of-the-art methods, extensive experiments on dynamic MRI data demonstrate the superior performance of the proposed method in terms of both reconstruction accuracy and time complexity.
    摘要 压缩感知(CS)可以大大提高 магнит共振成像(MRI)处理速度,并实现高精度的 MRI重建从下采样的 k-空间数据中。根据当前研究,有些动态 MRI k-空间重建基于 CS 的问题仍然存在。其中有以下几点:1. 埃尔增殖和像域重建存在差异,需要考虑不同域的处理方法。2. 动态 MRI 数据是三维的,具有空间-时间特征,需要计算表面文件的差异和一致性,同时保持结构完整性和特有性。3. 动态 MRI 重建时间consuming和计算资源依赖。在本文中,我们提出了一种新的 Robust Depth Linear Error Decomposition Model (RDLEDM),用于解决上述问题。我们的方法主要包括线性分解、双 Total Variation (TV) 和双 Nuclear Norm (NN) 正则化。通过加入线性图像域错误分析,可以减少下采样和DFT处理后的噪声,提高算法的抗干扰能力。同时,双 TV 和 NN 正则化可以利用动态 MRI 序列的空间-时间特征,并探索不同维度之间的协同关系。由于 TV 和 NN 正则化是不卷积和非拟合的,因此优化单一目标函数是困难的。为此,我们利用一种快速的算法,解决原始问题的预子问题。与现有五种方法进行比较,我们在动态 MRI 数据上进行了广泛的实验,并证明了我们的方法在重建精度和计算时间上的优越性。

Converting Depth Images and Point Clouds for Feature-based Pose Estimation

  • paper_url: http://arxiv.org/abs/2310.14924
  • repo_url: https://github.com/rlsch/depth-conversions
  • paper_authors: Robert Lösch, Mark Sastuba, Jonas Toth, Bernhard Jung
  • for: converting depth data into images that make visible the spatial details largely hidden in traditional depth images.
  • methods: after noise removal, a neighborhood of points forms two normal vectors whose difference is encoded into the new image representation.
  • results: compared with Bearing Angle images, the new flexion images are brighter and higher-contrast with more visible contours and details; in feature-based pose estimation (visual odometry and RGB-D SLAM) they yield better results for all tested features (AKAZE, ORB, SIFT, SURF), showing great potential to bridge depth data and classical computer vision.
    Abstract In recent years, depth sensors have become more and more affordable and have found their way into a growing amount of robotic systems. However, mono- or multi-modal sensor registration, often a necessary step for further processing, faces many challenges on raw depth images or point clouds. This paper presents a method of converting depth data into images capable of visualizing spatial details that are basically hidden in traditional depth images. After noise removal, a neighborhood of points forms two normal vectors whose difference is encoded into this new conversion. Compared to Bearing Angle images, our method yields brighter, higher-contrast images with more visible contours and more details. We tested feature-based pose estimation of both conversions in a visual odometry task and RGB-D SLAM. For all tested features, AKAZE, ORB, SIFT, and SURF, our new Flexion images yield better results than Bearing Angle images and show great potential to bridge the gap between depth data and classical computer vision. Source code is available here: https://rlsch.github.io/depth-flexion-conversion.
    摘要 近年来,深度感知器件成本逐渐下降,并在机器人系统中得到广泛应用。然而,单模或多模态感知器件注册,经常需要进一步处理Raw深度图像或点云数据,这在实际应用中遇到了许多挑战。本文提出了将深度数据转换为可视化 spatial details的方法,从而解决传统深度图像中隐藏的问题。经过噪声除除后,点云中的两个 Normal vector 的差异被编码到这个新的转换中。与比较 Bearing Angle 图像的情况相比,我们的新方法可以提供更亮、更高对比度的图像,图像中的缘界更加明显,更多的细节可见。我们在视觉运动任务和RGB-D SLAM 中测试了基于特征的姿态估计,并发现所有测试特征(AKAZE、ORB、SIFT、SURF),我们的新Flexion图像比 Bearing Angle 图像更好,可能为传统深度数据和经典计算机视ión之间的桥梁。源代码可以在以下链接中找到:https://rlsch.github.io/depth-flexion-conversion。
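The description of two neighbourhood normal vectors whose difference is encoded suggests something along the lines of the NumPy sketch below. The camera intrinsics, the exact neighbour pattern, and the angle encoding are guesses made for illustration; the paper's actual flexion definition may differ.

```python
import numpy as np

def flexion_like_image(depth, fx=525.0, fy=525.0):
    """depth: (H, W) depth map. Back-project to points, build two normals per pixel
    from axis-aligned and diagonal neighbour pairs, and encode their angle as an
    8-bit intensity. Rough sketch only; border pixels (np.roll wrap-around) are
    not handled."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.dstack([(u - w / 2) * depth / fx, (v - h / 2) * depth / fy, depth])

    du = np.roll(pts, -1, axis=1) - pts                        # right neighbour
    dv = np.roll(pts, -1, axis=0) - pts                        # lower neighbour
    da = np.roll(np.roll(pts, -1, axis=0), -1, axis=1) - pts   # lower-right neighbour
    db = np.roll(np.roll(pts, -1, axis=0), 1, axis=1) - pts    # lower-left neighbour

    n1 = np.cross(du, dv)   # normal from axis-aligned neighbours
    n2 = np.cross(da, db)   # normal from diagonal neighbours
    n1 /= np.linalg.norm(n1, axis=2, keepdims=True) + 1e-9
    n2 /= np.linalg.norm(n2, axis=2, keepdims=True) + 1e-9

    angle = np.arccos(np.clip((n1 * n2).sum(axis=2), -1.0, 1.0))
    return np.uint8(255 * angle / np.pi)
```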

GRLib: An Open-Source Hand Gesture Detection and Recognition Python Library

  • paper_url: http://arxiv.org/abs/2310.14919
  • repo_url: https://github.com/mikhail-vlasenko/grlib
  • paper_authors: Jan Warchocki, Mikhail Vlasenko, Yke Bauke Eisma
  • for: an open-source Python library for detecting and classifying static and dynamic hand gestures.
  • methods: processes an RGB camera feed with data augmentation, uses MediaPipe Hands for hand landmark detection, and supports dynamic gestures through trajectories and keyframe extraction; the library can also be trained on existing data for improved classification robustness.
  • results: outperforms another publicly available HGR system, MediaPipe Solutions, on three diverse real-world datasets.
    Abstract Hand gesture recognition systems provide a natural way for humans to interact with computer systems. Although various algorithms have been designed for this task, a host of external conditions, such as poor lighting or distance from the camera, make it difficult to create an algorithm that performs well across a range of environments. In this work, we present GRLib: an open-source Python library able to detect and classify static and dynamic hand gestures. Moreover, the library can be trained on existing data for improved classification robustness. The proposed solution utilizes a feed from an RGB camera. The retrieved frames are then subjected to data augmentation and passed on to MediaPipe Hands to perform hand landmark detection. The landmarks are then classified into their respective gesture class. The library supports dynamic hand gestures through trajectories and keyframe extraction. It was found that the library outperforms another publicly available HGR system - MediaPipe Solutions, on three diverse, real-world datasets. The library is available at https://github.com/mikhail-vlasenko/grlib and can be installed with pip.
    摘要 人体姿势识别系统提供了一种自然的人机交互方式。虽然有各种算法被设计用于这项任务,但外部条件,如照明不佳或相机距离较远,使得创建一个在多种环境下表现良好的算法变得困难。在这种工作中,我们提出了GRLib:一个开源的Python库,能够检测和分类静止和动态手势。此外,库还可以在现有数据上进行训练,以提高分类稳定性。该解决方案利用RGB摄像头的数据流,并对数据进行数据增强。接下来,抓取的帧将被传递给MediaPipe Hands进行手指地标检测。手指地标会被分类为各自的姿势类。库支持动态手势通过轨迹和关键帧EXTRACTION。实验表明,库在三个多样化的实际数据集上表现出色,超过了另一个公开available的HGR系统——MediaPipe Solutions。库可以在https://github.com/mikhail-vlasenko/grlib中下载,并使用pip安装。
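Since the library's own API is not described in the abstract, the sketch below only illustrates the underlying pipeline it builds on: MediaPipe Hands landmark extraction followed by a trivial nearest-centroid classifier. The classifier and its inputs are assumptions for illustration, not GRLib functions.

```python
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_landmarks(rgb_image):
    """Return the 21 hand landmarks as a flat (63,) vector, or None if no hand is found."""
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(rgb_image)
    if not result.multi_hand_landmarks:
        return None
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm]).flatten()

def classify(landmarks, class_centroids):
    """class_centroids: dict mapping gesture name -> mean landmark vector from training data."""
    return min(class_centroids, key=lambda name: np.linalg.norm(landmarks - class_centroids[name]))
```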

Object Pose Estimation Annotation Pipeline for Multi-view Monocular Camera Systems in Industrial Settings

  • paper_url: http://arxiv.org/abs/2310.14914
  • repo_url: None
  • paper_authors: Hazem Youssef, Frederik Polachowski, Jérôme Rutinowski, Moritz Roidl, Christopher Reining
  • for: object localization and pose estimation in large industrial spaces such as warehouses and production facilities, without installing artificial artifacts or excessively expensive equipment.
  • methods: leverages the cameras already present in such spaces and annotates large monocular-image datasets without manual labor: the cameras are localized in space, unified with a motion capture system, and a set of linear mappings projects 3D object models at their ground-truth 6D poses into the images.
  • results: on a custom dataset collected from eight cameras in an industrial mock-up setting, the pipeline provided consistent, high-quality annotations for 26,482 object instances at a fraction of the time required by human annotators.
    Abstract Object localization, and more specifically object pose estimation, in large industrial spaces such as warehouses and production facilities, is essential for material flow operations. Traditional approaches rely on artificial artifacts installed in the environment or excessively expensive equipment, that is not suitable at scale. A more practical approach is to utilize existing cameras in such spaces in order to address the underlying pose estimation problem and to localize objects of interest. In order to leverage state-of-the-art methods in deep learning for object pose estimation, large amounts of data need to be collected and annotated. In this work, we provide an approach to the annotation of large datasets of monocular images without the need for manual labor. Our approach localizes cameras in space, unifies their location with a motion capture system, and uses a set of linear mappings to project 3D models of objects of interest at their ground truth 6D pose locations. We test our pipeline on a custom dataset collected from a system of eight cameras in an industrial setting that mimics the intended area of operation. Our approach was able to provide consistent quality annotations for our dataset with 26, 482 object instances at a fraction of the time required by human annotators.
    摘要 在大型工厂和生产设施中,物流运作中的对象位置和姿态估计是关键。传统方法通常采用在环境中安装人工设施或过分昂贵的设备,这些设备不适用于大规模应用。我们提出了一种更实用的方法,利用现有的摄像头来解决对象姿态估计问题并将对象的位置进行标注。为了利用深度学习的最新方法进行对象姿态估计,需要收集和标注大量数据。在这项工作中,我们提供了一种无需人工劳动的对象标注方法。我们的方法将摄像头在空间中定位,与动作捕捉系统统一,并使用一组线性映射将3D对象模型投影到其真实6D姿态位置。我们在一个自定义的数据集上测试了我们的管道,该数据集由8个摄像头在模拟工业场景中采集而成。我们的方法能够以人工标注所需时间的一小部分,为26,482个对象实例提供一致的高质量标注。
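The core of the annotation pipeline is re-projecting a 3D model at its known pose into each calibrated camera. A minimal OpenCV sketch of that single step is given below, assuming the object pose is already expressed in the camera frame; the variable names and the single-camera scope are illustrative.

```python
import cv2
import numpy as np

def annotate_view(model_points, rvec_obj, tvec_obj, K, dist):
    """Project 3D model points at their ground-truth pose into one camera to obtain
    2D annotations without manual labelling.

    model_points: (N, 3) points on the object model
    rvec_obj, tvec_obj: object pose in the camera frame (e.g. from the unified
                        motion-capture calibration)
    K: 3x3 camera intrinsics, dist: distortion coefficients"""
    img_pts, _ = cv2.projectPoints(model_points.astype(np.float64), rvec_obj, tvec_obj, K, dist)
    return img_pts.reshape(-1, 2)
```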

Orientation-Aware Leg Movement Learning for Action-Driven Human Motion Prediction

  • paper_url: http://arxiv.org/abs/2310.14907
  • repo_url: None
  • paper_authors: Chunzhi Gu, Chao Zhang, Shigeru Kuriyama
  • for: action-driven human motion prediction, i.e., forecasting future human motion that respects a given action label.
  • methods: casts natural transitions as an action-conditioned in-betweening (ACB) learning task so that orientation changes are handled with realistic leg movements; ACB is performed only on a few action classes with active gait motions (such as Walk or Run), and a two-stage strategy first generates the target motion with a motion diffusion model and then produces the in-betweening that smoothly connects observation and prediction.
  • results: extensive experiments on three benchmark datasets show state-of-the-art visual quality, prediction accuracy, and action faithfulness, with the trained in-betweening model generalizing to two unseen large-scale motion datasets.
    Abstract The task of action-driven human motion prediction aims to forecast future human motion from the observed sequence while respecting the given action label. It requires modeling not only the stochasticity within human motion but the smooth yet realistic transition between multiple action labels. However, the fact that most of the datasets do not contain such transition data complicates this task. Existing work tackles this issue by learning a smoothness prior to simply promote smooth transitions, yet doing so can result in unnatural transitions especially when the history and predicted motions differ significantly in orientations. In this paper, we argue that valid human motion transitions should incorporate realistic leg movements to handle orientation changes, and cast it as an action-conditioned in-betweening (ACB) learning task to encourage transition naturalness. Because modeling all possible transitions is virtually unreasonable, our ACB is only performed on very few selected action classes with active gait motions, such as Walk or Run. Specifically, we follow a two-stage forecasting strategy by first employing the motion diffusion model to generate the target motion with a specified future action, and then producing the in-betweening to smoothly connect the observation and prediction to eventually address motion prediction. Our method is completely free from the labeled motion transition data during training. To show the robustness of our approach, we generalize our trained in-betweening learning model on one dataset to two unseen large-scale motion datasets to produce natural transitions. Extensive methods on three benchmark datasets demonstrate that our method yields the state-of-the-art performance in terms of visual quality, prediction accuracy, and action faithfulness.
    摘要 人体动作预测任务的目标是预测未来人体动作,而且需要遵循给定的动作标签。这需要模型人体动作中的随机性以及多个动作标签之间的平滑过渡。然而,大多数数据集不包含这种过渡数据,这使得这个任务变得更加复杂。现有的方法通过学习一个平滑性先验来促进平滑过渡,但这可能会导致不自然的过渡,特别是当历史动作和预测动作差异较大时。在这篇论文中,我们认为有效的人体动作过渡应该包含实际的脚部运动,以处理方向变化。我们将这种任务定义为动作条件的宽权(ACB)学习任务,以促进过渡自然性。由于模型所有可能的过渡是无法实现的,我们只在一些活动步态动作类型,如走或跑,进行ACB学习。我们采用了两个阶段预测策略:首先,使用动作扩散模型生成target动作,然后生成宽权来连接观察和预测,以最终解决动作预测问题。我们的方法不需要在训练时使用标注过渡动作数据。为了证明我们的方法的稳定性,我们在一个数据集上进行了一些推广和特化的方法,并在三个 benchmark 数据集上进行了广泛的测试。结果表明,我们的方法在视觉质量、预测精度和动作忠实度等方面达到了领先水平。

Deep learning denoiser assisted roughness measurements extraction from thin resists with low Signal-to-Noise Ratio(SNR) SEM images: analysis with SMILE

  • paper_url: http://arxiv.org/abs/2310.14815
  • repo_url: None
  • paper_authors: Sara Sacchi, Bappaditya Dey, Iacopo Mochi, Sandip Halder, Philippe Leray
  • for: with High NA EUVL driving research on thinner photoresists (below 30 nm), SEM images suffer from reduced contrast and low signal-to-noise ratio (SNR), which hampers the measurement of unbiased line edge roughness (uLER) and line width roughness (uLWR); the goal is to enhance the SNR of SEM images with a deep learning denoiser and enable robust roughness extraction from thin resists.
  • methods: a deep learning denoiser applied to SEM images of line-space patterns, followed by a systematic analysis of both noisy and denoised images with the open-source metrology software SMILE 2.3.2.
  • results: across resist thicknesses (15, 20, 25, 30 nm), underlayers (SOG, OUL), and averaging frames (4 to 64), denoised images with few frames keep CDs unaltered, show enhanced SNR, and yield uLER/uLWR measurements as accurate as noisy images with many more frames; images with SNR < 2 can be successfully denoised, improving metrology throughput while keeping roughness measurements reliable.
    Abstract The technological advance of High Numerical Aperture Extreme Ultraviolet Lithography (High NA EUVL) has opened the gates to extensive researches on thinner photoresists (below 30nm), necessary for the industrial implementation of High NA EUVL. Consequently, images from Scanning Electron Microscopy (SEM) suffer from reduced imaging contrast and low Signal-to-Noise Ratio (SNR), impacting the measurement of unbiased Line Edge Roughness (uLER) and Line Width Roughness (uLWR). Thus, the aim of this work is to enhance the SNR of SEM images by using a Deep Learning denoiser and enable robust roughness extraction of the thin resist. For this study, we acquired SEM images of Line-Space (L/S) patterns with a Chemically Amplified Resist (CAR) with different thicknesses (15nm, 20nm, 25nm, 30nm), underlayers (Spin-On-Glass-SOG, Organic Underlayer-OUL) and frames of averaging (4, 8, 16, 32, and 64 Fr). After denoising, a systematic analysis has been carried out on both noisy and denoised images using an open-source metrology software, SMILE 2.3.2, for investigating mean CD, SNR improvement factor, biased and unbiased LWR/LER Power Spectral Density (PSD). Denoised images with lower number of frames present unaltered Critical Dimensions (CDs), enhanced SNR (especially for low number of integration frames), and accurate measurements of uLER and uLWR, with the same accuracy as for noisy images with a consistent higher number of frames. Therefore, images with a small number of integration frames and with SNR < 2 can be successfully denoised, and advantageously used in improving metrology throughput while maintaining reliable roughness measurements for the thin resist.
    摘要 高 numerical aperture extreme ultraviolet литография (高 NA EUVL) 的技术进步已经开启了追究薄膜抗抗�� (Below 30nm) 的广泛研究,这是高 NA EUVL 的工业实现所必需的。然而,由于 SEM 图像的快照射镜观察镜影响,导致 SEM 图像的呈现效果受到了干扰,从而影响了无偏线Edge 粗 roughness (uLER) 和 Line Width Roughness (uLWR) 的测量。因此,本研究的目标是使用深度学习去噪器提高 SEM 图像的信噪比 (SNR),以便robustly 提取薄膜中的粗 roughness。我们对 Line-Space (L/S) 模式中的 Chemically Amplified Resist (CAR) WITH different thicknesses (15nm, 20nm, 25nm, 30nm)、underlayers (Spin-On-Glass-SOG, Organic Underlayer-OUL) 和 frames of averaging (4, 8, 16, 32, and 64 Fr) 进行了SEM 图像的取样,并对这些图像进行了去噪处理。然后,我们使用开源的测量软件 SMILE 2.3.2 进行了系统性的分析, investigate mean CD, SNR improvement factor, biased and unbiased LWR/LER Power Spectral Density (PSD)。去噪后的图像显示了下降的 Critical Dimensions (CDs)、提高的 SNR (特别是低数量的整合帧)、和精确地测量 uLER 和 uLWR,与不去噪的图像相同的精度。因此,具有少量的整合帧和 SNR < 2 的图像可以成功地去噪,并且可以提高测量过程的效率而无需损失精度。
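For readers unfamiliar with how LWR/LER PSDs and their unbiased versions are obtained from measured edges, here is a simplified NumPy sketch. The normalisation and the noise-floor estimate are common textbook approximations, not the exact procedure used by SMILE 2.3.2.

```python
import numpy as np

def lwr_psd(left_edges, right_edges, pixel_size_nm):
    """left_edges / right_edges: 1D arrays of edge x-positions (pixels), one per scan line.

    Returns spatial frequencies, the raw line-width-roughness PSD, and an 'unbiased'
    PSD where the white-noise floor (estimated from the high-frequency plateau) is
    subtracted -- a simplified version of the standard unbiased-roughness recipe."""
    width = (right_edges - left_edges) * pixel_size_nm
    width = width - width.mean()
    n = len(width)
    psd = (np.abs(np.fft.rfft(width)) ** 2) * pixel_size_nm / n
    freqs = np.fft.rfftfreq(n, d=pixel_size_nm)
    noise_floor = psd[3 * len(psd) // 4:].mean()   # high-frequency plateau
    return freqs, psd, np.clip(psd - noise_floor, 0.0, None)
```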

DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading

  • paper_url: http://arxiv.org/abs/2310.14802
  • repo_url: https://github.com/hint-lab/doctrack
  • paper_authors: Hao Wang, Qingxuan Wang, Yue Li, Changqing Wang, Chenhui Chu, Rui Wang
  • for: providing a visually-rich document dataset aligned with human eye-movement data, to advance Document AI models that read and comprehend documents like humans.
  • methods: uses eye-tracking technology to record human reading paths over visually-rich documents and align them with document elements, and studies the impact of human reading order on document understanding tasks.
  • results: finds that despite significant progress, Document AI models still cannot read visually-rich documents as accurately, continuously, and flexibly as humans; the dataset is available at https://github.com/hint-lab/doctrack.
    Abstract The use of visually-rich documents (VRDs) in various fields has created a demand for Document AI models that can read and comprehend documents like humans, which requires the overcoming of technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered advancements in the field. To address this issue, we introduce \textsc{DocTrack}, a VRD dataset really aligned with human eye-movement information using eye-tracking technology. This dataset can be used to investigate the challenges mentioned above. Additionally, we explore the impact of human reading order on document understanding tasks and examine what would happen if a machine reads in the same order as a human. Our results suggest that although Document AI models have made significant progress, they still have a long way to go before they can read VRDs as accurately, continuously, and flexibly as humans do. These findings have potential implications for future research and development of Document AI models. The data is available at \url{https://github.com/hint-lab/doctrack}.
    摘要 使用触发性文档(VRD)在不同领域的应用已经创造了人工智能文档模型能够像人类一样阅读和理解文档的需求,但这些需求却受到技术、语言和认知障碍的影响。然而,缺乏适当的数据集的问题使得这一领域的进步受到了很大的限制。为了解决这个问题,我们介绍了《 DocTrack》,一个基于人类眼动信息的 VRD 数据集。这个数据集可以用于调查以上挑战。此外,我们还探讨了人类阅读顺序对文档理解任务的影响,以及机器人是否可以像人类一样阅读。我们的结果表明,虽然文档人工智能模型已经做出了很大的进步,但它们仍然需要进一步的改进,以达到人类一样的精度、连续性和灵活性。这些发现有可能对未来文档人工智能模型的研发产生影响。数据可以在 GitHub 上获取:https://github.com/hint-lab/doctrack。

SAMCLR: Contrastive pre-training on complex scenes using SAM for view sampling

  • paper_url: http://arxiv.org/abs/2310.14736
  • repo_url: None
  • paper_authors: Benjamin Missaoui, Chongbin Yuan
  • for: improving self-supervised contrastive pre-training on complex scenes with multiple objects, where two views of the same image rarely depict the same object category.
  • methods: SAMCLR, an add-on to SimCLR that uses SAM to segment the image into semantic regions and then samples the two views from the same region.
  • results: preliminary results show that when pre-training on Cityscapes and ADE20K and evaluating classification on CIFAR-10, STL10, and ImageNette, SAMCLR performs at least on par with, and most often significantly outperforms, SimCLR, DINO, and MoCo.
    Abstract In Computer Vision, self-supervised contrastive learning enforces similar representations between different views of the same image. The pre-training is most often performed on image classification datasets, like ImageNet, where images mainly contain a single class of objects. However, when dealing with complex scenes with multiple items, it becomes very unlikely for several views of the same image to represent the same object category. In this setting, we propose SAMCLR, an add-on to SimCLR which uses SAM to segment the image into semantic regions, then sample the two views from the same region. Preliminary results show empirically that when pre-training on Cityscapes and ADE20K, then evaluating on classification on CIFAR-10, STL10 and ImageNette, SAMCLR performs at least on par with, and most often significantly outperforms not only SimCLR, but also DINO and MoCo.
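A minimal sketch of the region-constrained view sampling described above is shown below, using the public segment-anything package. The checkpoint path, crop size, and area-weighted region choice are assumptions for illustration rather than the exact SAMCLR procedure.

```python
import random
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")   # hypothetical checkpoint path
mask_generator = SamAutomaticMaskGenerator(sam)

def sample_two_views(image, crop=224):
    """Pick one SAM region (area-weighted) and cut both contrastive views from crops
    centred inside that region, so the positive pair shows the same object even in a
    cluttered scene. Both crops then go through the usual SimCLR augmentations/encoder."""
    masks = mask_generator.generate(image)                       # dicts with 'segmentation', 'area'
    areas = np.array([m["area"] for m in masks], dtype=np.float64)
    region = masks[np.random.choice(len(masks), p=areas / areas.sum())]["segmentation"]
    ys, xs = np.nonzero(region)
    views = []
    for _ in range(2):
        i = random.randrange(len(ys))
        y0 = int(np.clip(ys[i] - crop // 2, 0, image.shape[0] - crop))
        x0 = int(np.clip(xs[i] - crop // 2, 0, image.shape[1] - crop))
        views.append(image[y0:y0 + crop, x0:x0 + crop])
    return views
```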

MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion

  • paper_url: http://arxiv.org/abs/2310.14729
  • repo_url: None
  • paper_authors: Roy Kapon, Guy Tevet, Daniel Cohen-Or, Amit H. Bermano
  • for: generating consistent multi-view 2D samples of a motion sequence so that its 3D counterpart can be created.
  • methods: Multi-view Ancestral Sampling (MAS), built on a diffusion model trained solely on 2D data, simultaneously denoises multiple 2D motion sequences depicting the same motion from different angles; a consistency block combines the individual generations into a unified 3D sequence at each diffusion step and projects it back to the original views for the next iteration.
  • results: on 2D pose data from professional basketball maneuvers, rhythmic gymnastics with a ball apparatus, and horse obstacle course races (domains where 3D motion capture is arduous), MAS generates diverse and realistic 3D sequences without textual conditioning, integrating more naturally with the diffusion framework than denoising-optimization approaches and avoiding out-of-domain sampling, lack of detail, and mode collapse.
    Abstract We introduce Multi-view Ancestral Sampling (MAS), a method for generating consistent multi-view 2D samples of a motion sequence, enabling the creation of its 3D counterpart. MAS leverages a diffusion model trained solely on 2D data, opening opportunities to exciting and diverse fields of motion previously under-explored as 3D data is scarce and hard to collect. MAS works by simultaneously denoising multiple 2D motion sequences representing the same motion from different angles. Our consistency block ensures consistency across all views at each diffusion step by combining the individual generations into a unified 3D sequence, and projecting it back to the original views for the next iteration. We demonstrate MAS on 2D pose data acquired from videos depicting professional basketball maneuvers, rhythmic gymnastic performances featuring a ball apparatus, and horse obstacle course races. In each of these domains, 3D motion capture is arduous, and yet, MAS generates diverse and realistic 3D sequences without textual conditioning. As we demonstrate, our ancestral sampling-based approach offers a more natural integration with the diffusion framework compared to popular denoising optimization-based approaches, and avoids common issues such as out-of-domain sampling, lack of details and mode-collapse. https://guytevet.github.io/mas-page/
    摘要 我们介绍 Multi-view Ancestral Sampling(MAS),一种生成一致的多视角2D动作样本、从而构建其3D版本的方法。MAS 利用一个仅用2D数据训练的 diffusion 模型,这为3D数据稀缺且难以采集的多样化动作领域打开了大门。MAS 的工作方式是同时对代表同一动作不同视角的多个2D动作序列去噪。我们的一致性模块在每个扩散步骤中保证各视角的一致性:它将各视角的生成结果合并为统一的3D序列,再投影回原始视角供下一次迭代使用。我们在2D姿态数据上进行了评估,包括职业篮球动作、使用球的艺术体操表演以及马术障碍赛等领域。在这些领域中,3D动作捕捉十分困难,但MAS 仍能在无需文本条件的情况下生成多样化且逼真的3D序列。我们还展示了这种基于祖先采样的方法与流行的去噪优化方法相比,能更自然地与扩散框架结合,并避免了域外采样、缺乏细节和模式崩溃等常见问题。

Rethinking Scale Imbalance in Semi-supervised Object Detection for Aerial Images

  • paper_url: http://arxiv.org/abs/2310.14718
  • repo_url: None
  • paper_authors: Ruixiang Zhang, Chang Xu, Fang Xu, Wen Yang, Guangjun He, Huai Yu, Gui-Song Xia
  • for: semi-supervised object detection (SSOD) in aerial images, where objects are small and numerous and a large proportion of small objects causes a drastic performance drop.
  • methods: a novel Scale-discriminative Semi-Supervised Object Detection (S^3OD) learning pipeline that tackles the scale bias with three key components: Size-aware Adaptive Thresholding (SAT) for scale-dependent pseudo-label filtering, Size-rebalanced Label Assignment (SLA) for resampling and reweighting positive samples across scales, and Teacher-guided Negative Learning (TNL) for alleviating the imbalance in negative samples.
  • results: extensive experiments on the DOTA-v1.5 benchmark demonstrate that the proposed method outperforms state-of-the-art competitors; codes will be released soon.
    Abstract This paper focuses on the scale imbalance problem of semi-supervised object detection(SSOD) in aerial images. Compared to natural images, objects in aerial images show smaller sizes and larger quantities per image, increasing the difficulty of manual annotation. Meanwhile, the advanced SSOD technique can train superior detectors by leveraging limited labeled data and massive unlabeled data, saving annotation costs. However, as an understudied task in aerial images, SSOD suffers from a drastic performance drop when facing a large proportion of small objects. By analyzing the predictions between small and large objects, we identify three imbalance issues caused by the scale bias, i.e., pseudo-label imbalance, label assignment imbalance, and negative learning imbalance. To tackle these issues, we propose a novel Scale-discriminative Semi-Supervised Object Detection (S^3OD) learning pipeline for aerial images. In our S^3OD, three key components, Size-aware Adaptive Thresholding (SAT), Size-rebalanced Label Assignment (SLA), and Teacher-guided Negative Learning (TNL), are proposed to warrant scale unbiased learning. Specifically, SAT adaptively selects appropriate thresholds to filter pseudo-labels for objects at different scales. SLA balances positive samples of objects at different scales through resampling and reweighting. TNL alleviates the imbalance in negative samples by leveraging information generated by a teacher model. Extensive experiments conducted on the DOTA-v1.5 benchmark demonstrate the superiority of our proposed methods over state-of-the-art competitors. Codes will be released soon.

Interaction-Driven Active 3D Reconstruction with Object Interiors

  • paper_url: http://arxiv.org/abs/2310.14700
  • repo_url: https://github.com/Salingo/Interaction-Driven-Reconstruction
  • paper_authors: Zihao Yan, Fubao Su, Mingyang Wang, Ruizhen Hu, Hao Zhang, Hui Huang
  • for: an active 3D reconstruction method that integrates visual perception, robot-object interaction, and 3D scanning to recover both the exterior and the unexposed interior geometry of a target 3D object.
  • methods: analyzes the interactability of the object's parts and lets a Fetch robot with built-in RGBD sensors manipulate them to expose occluded regions, iterating between interaction analysis and interaction-driven reconstruction; articulated part detection and mesh reconstruction are carried out by neural networks.
  • results: fully automatically reconstructs the target object, acquiring complete geometry together with an understanding of its part articulations and interior structures.
    Abstract We introduce an active 3D reconstruction method which integrates visual perception, robot-object interaction, and 3D scanning to recover both the exterior and interior, i.e., unexposed, geometries of a target 3D object. Unlike other works in active vision which focus on optimizing camera viewpoints to better investigate the environment, the primary feature of our reconstruction is an analysis of the interactability of various parts of the target object and the ensuing part manipulation by a robot to enable scanning of occluded regions. As a result, an understanding of part articulations of the target object is obtained on top of complete geometry acquisition. Our method operates fully automatically by a Fetch robot with built-in RGBD sensors. It iterates between interaction analysis and interaction-driven reconstruction, scanning and reconstructing detected moveable parts one at a time, where both the articulated part detection and mesh reconstruction are carried out by neural networks. In the final step, all the remaining, non-articulated parts, including all the interior structures that had been exposed by prior part manipulations and subsequently scanned, are reconstructed to complete the acquisition. We demonstrate the performance of our method via qualitative and quantitative evaluation, ablation studies, comparisons to alternatives, as well as experiments in a real environment.
    摘要 我们介绍了一种活动三维重建方法,该方法结合视觉感知、机器人对象互动和3D扫描,以获取目标3D对象的外部和内部结构。与其他有关活动视觉的研究不同,我们的重建方法不是关注摄像头视点优化以更好地探索环境,而是通过分析机器人对目标对象不同部分的互动性,以及由此导致的部件扫描和重建 occluded 区域。因此,我们可以获得目标对象的部件骨格,同时完全获得其三维结构。我们的方法可以凭借Fetch机器人内置的RGBD感知器自动完成,它在互动分析和互动驱动重建、扫描和重建等步骤中循环运行。在最后一步,我们重建了所有未被扫描的非骨立部分,包括所有在先前的部件扫描中暴露出来的内部结构。我们通过质量和量度评估、简除研究、相对研究和实际环境中的实验,证明了我们的方法的效果。

CAwa-NeRF: Instant Learning of Compression-Aware NeRF Features

  • paper_url: http://arxiv.org/abs/2310.14695
  • repo_url: None
  • paper_authors: Omnia Mahmoud, Théo Ladune, Matthieu Gendrin
  • for: reducing the storage cost of Neural Radiance Fields (NeRF) that model 3D scenes with volumetric feature grids, while preserving their quality and fast training.
  • methods: builds on multi-resolution hash encoding of trainable feature grids (as in Instant-NGP) and introduces instant learning of compression-aware NeRF features (CAwa-NeRF), which exports zip-compressed feature grids at the end of training with negligible extra time, without changing the storage architecture or the original parameters; the approach is not limited to INGP and can be adapted to other models.
  • results: on different kinds of static scenes, CAwa-NeRF compresses the feature grids down to 6% (1.2 MB) of their original size without any PSNR loss (33 dB) on single-object masked-background scenes, or to 2.4% (0.53 MB) with a slight loss (32.31 dB).
    Abstract Modeling 3D scenes by volumetric feature grids is one of the promising directions of neural approximations to improve Neural Radiance Fields (NeRF). Instant-NGP (INGP) introduced multi-resolution hash encoding from a lookup table of trainable feature grids which enabled learning high-quality neural graphics primitives in a matter of seconds. However, this improvement came at the cost of higher storage size. In this paper, we address this challenge by introducing instant learning of compression-aware NeRF features (CAwa-NeRF), that allows exporting the zip compressed feature grids at the end of the model training with a negligible extra time overhead without changing neither the storage architecture nor the parameters used in the original INGP paper. Nonetheless, the proposed method is not limited to INGP but could also be adapted to any model. By means of extensive simulations, our proposed instant learning pipeline can achieve impressive results on different kinds of static scenes such as single object masked background scenes and real-life scenes captured in our studio. In particular, for single object masked background scenes CAwa-NeRF compresses the feature grids down to 6% (1.2 MB) of the original size without any loss in the PSNR (33 dB) or down to 2.4% (0.53 MB) with a slight virtual loss (32.31 dB).
    摘要 <>模型3D场景使用分割特征网格是一个有前途的方向,以提高神经预测场景(NeRF)的性能。INSTant-NGP(INGP)引入多尺度哈希编码,从一个可调特征网格的lookup表中学习高质量神经图形基元,只需几秒钟内。然而,这种改进带来了更高的存储大小。在这篇论文中,我们解决这个挑战,通过引入压缩意识NeRF特征(CAwa-NeRF),允许在模型训练结束时,压缩特征网格,并在训练参数和存储架构不变的情况下,实现无损压缩。此外,我们的提案不仅适用于INGP,也可以适用于任何模型。通过广泛的仿真实验,我们的快速学习管道可以在不同类型的静止场景中实现惊人的结果,包括单个对象遮盖背景场景和实际studio中捕捉的真实场景。特别是在单个对象遮盖背景场景中,CAwa-NeRF可以将特征网格压缩到6%(1.2MB)原始大小的1/6,无损PSNR(33dB)或者压缩到2.4%(0.53MB),有一定的虚拟损失(32.31dB)。

On Partial Shape Correspondence and Functional Maps

  • paper_url: http://arxiv.org/abs/2310.14692
  • repo_url: None
  • paper_authors: Amit Bracha, Thomas Dagès, Ron Kimmel
  • for: the problem of matching shapes to their parts (partial shape correspondence).
  • methods: analyzes how functional maps translate the matching problem into "convenient" spaces where matching is performed algebraically by solving a least squares problem, argues that such formulations introduce errors under partiality, and proposes a new method that establishes direct correspondence between partial and full shapes through feature matching, bypassing the functional-map intermediate spaces; the loss combines a Gromov-distance term with an area-preserving regularizer (or a relaxed version that requires no functional map).
  • results: outperforms existing unsupervised methods for partial shape matching on the SHREC'16 dataset, achieving state-of-the-art results on the SHREC'16 HOLES benchmark, superior even to supervised methods.
    Abstract While dealing with matching shapes to their parts, we often utilize an instrument known as functional maps. The idea is to translate the shape matching problem into ``convenient'' spaces by which matching is performed algebraically by solving a least squares problem. Here, we argue that such formulations, though popular in this field, introduce errors in the estimated match when partiality is invoked. Such errors are unavoidable even when considering advanced feature extraction networks, and they can be shown to escalate with increasing degrees of shape partiality, adversely affecting the learning capability of such systems. To circumvent these limitations, we propose a novel approach for partial shape matching. Our study of functional maps led us to a novel method that establishes direct correspondence between partial and full shapes through feature matching bypassing the need for functional map intermediate spaces. The Gromov distance between metric spaces leads to the construction of the first part of our loss functions. For regularization we use two options: a term based on the area preserving property of the mapping, and a relaxed version of it without the need to compute a functional map. The proposed approach shows superior performance on the SHREC'16 dataset, outperforming existing unsupervised methods for partial shape matching. In particular, it achieves state-of-the-art result on the SHREC'16 HOLES benchmark, superior also compared to supervised methods.
    摘要 而在匹配形状与其部件时,我们经常使用一种工具称之为功能地图。这个想法是将形状匹配问题转化成``方便''的空间中进行算术匹配,解决一个最小二乘问题。我们认为这种形式ulation,虽然在这个领域非常流行,但是会在形状partiality情况下引入误差。这些误差不仅会在高度形状partiality情况下出现,而且会随着形状partiality的增加而增长,从而对这些系统的学习能力产生负面影响。为了缺陷这些限制,我们提出了一种新的方法 для partial shape matching。我们的研究表示,功能地图不仅可以用于匹配完整形状,还可以用于匹配部件之间的相互关系。我们提出了一种新的方法,可以直接将partial shape与完整形状之间建立对应关系,不需要 intermediate spaces。我们使用Gromov距离来建立首部分的损失函数。为了正则化,我们使用两种选项:一个基于形状匹配的区域性质,以及一个宽松化的版本,不需要计算功能地图。我们的方法在SHREC'16 dataset上表现出色,超过了现有的无监督方法。特别是在SHREC'16 HOLES benchmark上,我们的方法达到了状态 искусственный智能领域的最佳结果,并且在supervised方法之上。
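The "convenient space" computation the abstract refers to is the standard functional-map least-squares fit; a minimal NumPy version is sketched below, assuming the descriptor coefficients A and B in truncated Laplace-Beltrami bases are already given. The paper's argument is that this very step becomes error-prone when the source shape is only partial.

```python
import numpy as np

def fit_functional_map(A_src, B_tgt):
    """Solve min_C ||C A - B||_F^2 for the k x k functional map C.

    A_src, B_tgt: (k, n) spectral coefficients of n corresponding descriptor
    functions on the source and target shapes."""
    C_T, *_ = np.linalg.lstsq(A_src.T, B_tgt.T, rcond=None)   # solves A^T C^T = B^T
    return C_T.T
```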

Online Out-of-Domain Detection for Automated Driving

  • paper_url: http://arxiv.org/abs/2310.14675
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Timo Sämann, Horst-Michael Groß
  • for: Ensuring safety in automated driving, particularly in detecting distributional shifts in Deep Neural Networks (DNNs)
  • methods: Proof of concept for a safety mechanism that detects leaving of the training domain online (at runtime) using the Synthia data set
  • results: Achieved 100% correct detection of whether the input data is inside or outside the domain
    Abstract Ensuring safety in automated driving is a major challenge for the automotive industry. Special attention is paid to artificial intelligence, in particular to Deep Neural Networks (DNNs), which is considered a key technology in the realization of highly automated driving. DNNs learn from training data, which means that they only achieve good accuracy within the underlying data distribution of the training data. When leaving the training domain, a distributional shift is caused, which can lead to a drastic reduction of accuracy. In this work, we present a proof of concept for a safety mechanism that can detect the leaving of the domain online, i.e. at runtime. In our experiments with the Synthia data set we can show that a 100 % correct detection of whether the input data is inside or outside the domain is achieved. The ability to detect when the vehicle leaves the domain can be an important requirement for certification.

Inject Semantic Concepts into Image Tagging for Open-Set Recognition

  • paper_url: http://arxiv.org/abs/2310.15200
  • repo_url: https://github.com/xinyu1205/recognize-anything
  • paper_authors: Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, Lei Zhang
  • for: Proposes a fundamental image recognition model with strong open-set recognition capabilities by injecting semantic concepts into the image tagging training framework.
  • methods: Integrates image-text alignment and image tagging within a unified fine-grained interaction framework based on image-tags-text triplets, and employs large language models (LLMs) to generate diverse visual tag descriptions.
  • results: Evaluations on multiple image recognition benchmarks show that RAM++ surpasses existing fundamental image recognition models, especially on open-set recognition: for common predefined tag categories it improves over CLIP by 10.2 mAP on OpenImages and 15.4 mAP on ImageNet; for open-set categories it gains 5 mAP and 6.4 mAP over CLIP and RAM on OpenImages; and it improves by 7.8 mAP and 4.7 mAP on HICO human-object interaction phrases.
    Abstract In this paper, we introduce the Recognize Anything Plus Model (RAM++), a fundamental image recognition model with strong open-set recognition capabilities, built by injecting semantic concepts into the image tagging training framework. Previous approaches are either image tagging models constrained by limited semantics, or vision-language models with shallow interaction for suboptimal performance in multi-tag recognition. In contrast, RAM++ integrates image-text alignment and image-tagging within a unified fine-grained interaction framework based on image-tags-text triplets. This design enables RAM++ to not only excel in identifying predefined categories, but also significantly augment the recognition ability in open-set categories. Moreover, RAM++ employs large language models (LLMs) to generate diverse visual tag descriptions, pioneering the integration of LLM's knowledge into image tagging training. This approach empowers RAM++ to integrate visual description concepts for open-set recognition during inference. Evaluations on comprehensive image recognition benchmarks demonstrate RAM++ exceeds existing state-of-the-art (SOTA) fundamental image recognition models on most aspects. Specifically, for predefined common-used tag categories, RAM++ showcases 10.2 mAP and 15.4 mAP enhancements over CLIP on OpenImages and ImageNet. For open-set categories beyond predefined, RAM++ records improvements of 5 mAP and 6.4 mAP over CLIP and RAM respectively on OpenImages. For diverse human-object interaction phrases, RAM++ achieves 7.8 mAP and 4.7 mAP improvements on the HICO benchmark. Code, datasets and pre-trained models are available at https://github.com/xinyu1205/recognize-anything.

Invariant Feature Regularization for Fair Face Recognition

  • paper_url: http://arxiv.org/abs/2310.14652
  • repo_url: https://github.com/panasonicconnect/invreg
  • paper_authors: Jiali Ma, Zhongqi Yue, Kagaya Tomoyuki, Suzuki Tomoki, Karlekar Jayashree, Sugiri Pranata, Hanwang Zhang
  • for: The paper aims to address the issue of bias in face recognition systems due to the imbalanced demographic attributes in the training data.
  • methods: The proposed method, called Invariant Feature Regularization (INV-REG), uses unsupervised data partitioning to generate diverse partitions that act as self-annotated confounders, allowing the model to deconfound and learn invariant features that generalize well across different demographic groups.
  • results: The proposed method improves face recognition performance on a variety of demographic groups, achieving new state-of-the-art results when combined with two strong baselines (Arcface and CIFP).
    Abstract Fair face recognition is all about learning invariant feature that generalizes to unseen faces in any demographic group. Unfortunately, face datasets inevitably capture the imbalanced demographic attributes that are ubiquitous in real-world observations, and the model learns biased feature that generalizes poorly in the minority group. We point out that the bias arises due to the confounding demographic attributes, which mislead the model to capture the spurious demographic-specific feature. The confounding effect can only be removed by causal intervention, which requires the confounder annotations. However, such annotations can be prohibitively expensive due to the diversity of the demographic attributes. To tackle this, we propose to generate diverse data partitions iteratively in an unsupervised fashion. Each data partition acts as a self-annotated confounder, enabling our Invariant Feature Regularization (INV-REG) to deconfound. INV-REG is orthogonal to existing methods, and combining INV-REG with two strong baselines (Arcface and CIFP) leads to new state-of-the-art that improves face recognition on a variety of demographic groups. Code is available at https://github.com/PanasonicConnect/InvReg.
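To illustrate the general idea of using unsupervised data partitions as self-annotated confounders, the sketch below adds an IRM-style invariance penalty, computed per partition, to the recognition loss. The penalty form, the partitioning, and all interfaces are illustrative assumptions, not the authors' exact INV-REG formulation.

```python
import torch
import torch.nn.functional as F

def invariance_penalty(logits, labels):
    """IRM-v1-style penalty: squared gradient of the loss w.r.t. a dummy scale."""
    scale = torch.ones(1, requires_grad=True, device=logits.device)
    loss = F.cross_entropy(logits * scale, labels)
    grad = torch.autograd.grad(loss, scale, create_graph=True)[0]
    return (grad ** 2).sum()

def partition_regularized_loss(model, batches_by_partition, lam=1.0):
    """batches_by_partition: list of (images, labels) tuples, one batch per
    unsupervised data partition acting as a self-annotated confounder."""
    erm_terms, penalty_terms = [], []
    for images, labels in batches_by_partition:
        logits = model(images)
        erm_terms.append(F.cross_entropy(logits, labels))
        penalty_terms.append(invariance_penalty(logits, labels))
    # Features that only work for one partition are penalized; invariant
    # (core) features keep both terms low across all partitions.
    return torch.stack(erm_terms).mean() + lam * torch.stack(penalty_terms).mean()
```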

Relit-NeuLF: Efficient Relighting and Novel View Synthesis via Neural 4D Light Field

  • paper_url: http://arxiv.org/abs/2310.14642
  • repo_url: https://github.com/oppo-us-research/relitneulf
  • paper_authors: Zhong Li, Liangchen Song, Zhang Chen, Xiangyu Du, Lele Chen, Junsong Yuan, Yi Xu
  • for: Addresses simultaneous relighting and novel view synthesis of a complex scene from multi-view images, using an analysis-synthesis approach called Relit-NeuLF.
  • methods: Relit-NeuLF first uses a two-plane light field representation to parameterize each ray in a 4D coordinate system, enabling efficient learning and inference. It then recovers the spatially-varying BRDF in a self-supervised manner, decomposing each ray into albedo, normal, and roughness; based on these decomposed BRDF components and the conditioning light direction, a RenderNet learns to synthesize the ray color.
  • results: Experiments show the method is efficient and effective, achieving state-of-the-art results on both synthetic data and real-world human face data while recovering the scene SVBRDF in a self-supervised way. Code is released at https://github.com/oppo-us-research/RelitNeuLF.
    Abstract In this paper, we address the problem of simultaneous relighting and novel view synthesis of a complex scene from multi-view images with a limited number of light sources. We propose an analysis-synthesis approach called Relit-NeuLF. Following the recent neural 4D light field network (NeuLF), Relit-NeuLF first leverages a two-plane light field representation to parameterize each ray in a 4D coordinate system, enabling efficient learning and inference. Then, we recover the spatially-varying bidirectional reflectance distribution function (SVBRDF) of a 3D scene in a self-supervised manner. A DecomposeNet learns to map each ray to its SVBRDF components: albedo, normal, and roughness. Based on the decomposed BRDF components and conditioning light directions, a RenderNet learns to synthesize the color of the ray. To self-supervise the SVBRDF decomposition, we encourage the predicted ray color to be close to the physically-based rendering result using the microfacet model. Comprehensive experiments demonstrate that the proposed method is efficient and effective on both synthetic data and real-world human face data, and outperforms the state-of-the-art results. We publicly released our code on GitHub. You can find it here: https://github.com/oppo-us-research/RelitNeuLF
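The two-plane light field parameterization mentioned in the abstract maps every ray to a 4D coordinate (u, v, s, t) given by its intersections with two parallel planes. A minimal sketch follows; the plane positions z = 0 and z = 1 are an arbitrary choice for illustration.

```python
import numpy as np

def ray_to_uvst(origin, direction, z_uv=0.0, z_st=1.0):
    """Intersect a ray with the planes z = z_uv and z = z_st and return the
    4D light-field coordinate (u, v, s, t)."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    if np.isclose(direction[2], 0.0):
        raise ValueError("ray is parallel to the parameterization planes")
    t1 = (z_uv - origin[2]) / direction[2]
    t2 = (z_st - origin[2]) / direction[2]
    u, v = (origin + t1 * direction)[:2]
    s, t = (origin + t2 * direction)[:2]
    return np.array([u, v, s, t])

print(ray_to_uvst(origin=[0.0, 0.0, -1.0], direction=[0.1, 0.2, 1.0]))
```

This compact per-ray coordinate is what lets the networks operate per ray rather than per 3D sample, which is the source of the efficiency claimed in the abstract.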

Semantic-Aware Adversarial Training for Reliable Deep Hashing Retrieval

  • paper_url: http://arxiv.org/abs/2310.14637
  • repo_url: https://github.com/xandery-geek/SAAT
  • paper_authors: Xu Yuan, Zheng Zhang, Xunguang Wang, Lin Wu
  • for: Improving the adversarial robustness of deep hashing models by proposing Semantic-Aware Adversarial Training (SAAT).
  • methods: Proposes Discriminative Mainstay Features Learning (DMFL) to learn reliable mainstay features that guide adversarial learning in deep hashing; DMFL is adaptively optimized in a discriminative manner that jointly considers discriminative and semantic properties.
  • results: Experiments on benchmark datasets show superb attack performance, and the proposed adversarial training effectively eliminates adversarial perturbations, keeping deep hashing-based retrieval robust under different attacks.
    Abstract Deep hashing has been intensively studied and successfully applied in large-scale image retrieval systems due to its efficiency and effectiveness. Recent studies have recognized that the existence of adversarial examples poses a security threat to deep hashing models, that is, adversarial vulnerability. Notably, it is challenging to efficiently distill reliable semantic representatives for deep hashing to guide adversarial learning, and thereby it hinders the enhancement of adversarial robustness of deep hashing-based retrieval models. Moreover, current researches on adversarial training for deep hashing are hard to be formalized into a unified minimax structure. In this paper, we explore Semantic-Aware Adversarial Training (SAAT) for improving the adversarial robustness of deep hashing models. Specifically, we conceive a discriminative mainstay features learning (DMFL) scheme to construct semantic representatives for guiding adversarial learning in deep hashing. Particularly, our DMFL with the strict theoretical guarantee is adaptively optimized in a discriminative learning manner, where both discriminative and semantic properties are jointly considered. Moreover, adversarial examples are fabricated by maximizing the Hamming distance between the hash codes of adversarial samples and mainstay features, the efficacy of which is validated in the adversarial attack trials. Further, we, for the first time, formulate the formalized adversarial training of deep hashing into a unified minimax optimization under the guidance of the generated mainstay codes. Extensive experiments on benchmark datasets show superb attack performance against the state-of-the-art algorithms, meanwhile, the proposed adversarial training can effectively eliminate adversarial perturbations for trustworthy deep hashing-based retrieval. Our code is available at https://github.com/xandery-geek/SAAT.
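The adversarial examples described in the abstract are fabricated by pushing the hash code of the perturbed image away, in Hamming distance, from the mainstay code. Below is a generic PGD-style sketch of that objective; hashing_model (assumed to output real-valued code logits) and mainstay_code (a ±1 vector) are placeholder interfaces, so this illustrates the objective rather than the authors' released implementation.

```python
import torch

def hamming_attack(hashing_model, image, mainstay_code,
                   eps=8/255, alpha=1/255, steps=40):
    """Craft an adversarial image whose (tanh-relaxed) hash code is far in
    Hamming distance from the given mainstay code (entries in {-1, +1})."""
    image = image.detach()
    adv = image.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        code = torch.tanh(hashing_model(adv))        # relaxed hash code in (-1, 1)
        # Inner product with the mainstay code is a differentiable surrogate for
        # (negative) Hamming distance, so minimizing it pushes the codes apart.
        similarity = (code * mainstay_code).sum()
        grad = torch.autograd.grad(similarity, adv)[0]
        adv = adv.detach() - alpha * grad.sign()
        adv = image + torch.clamp(adv - image, -eps, eps)   # stay in the eps-ball
        adv = torch.clamp(adv, 0.0, 1.0)
    return adv.detach()
```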

Multilevel Perception Boundary-guided Network for Breast Lesion Segmentation in Ultrasound Images

  • paper_url: http://arxiv.org/abs/2310.14636
  • repo_url: None
  • paper_authors: Xing Yang, Jian Zhang, Qijian Chen, Li Wang, Lihui Wang
  • for: Breast tumor segmentation from ultrasound images.
  • methods: A multilevel global perception module (MGPM), a boundary guided module (BGM), and a multi-level boundary-enhanced segmentation (BS) loss.
  • results: Improved segmentation performance for tumor boundaries, outperforming state-of-the-art methods on qualitative and quantitative evaluation metrics; Dice score, Jaccard coefficient, Specificity, and HD95 improved by 0.70%, 1.1%, 0.1%, and 2.5%, respectively.
    Abstract Automatic segmentation of breast tumors from the ultrasound images is essential for the subsequent clinical diagnosis and treatment plan. Although the existing deep learning-based methods have achieved significant progress in automatic segmentation of breast tumor, their performance on tumors with similar intensity to the normal tissues is still not pleasant, especially for the tumor boundaries. To address this issue, we propose a PBNet composed by a multilevel global perception module (MGPM) and a boundary guided module (BGM) to segment breast tumors from ultrasound images. Specifically, in MGPM, the long-range spatial dependence between the voxels in a single level feature maps are modeled, and then the multilevel semantic information is fused to promote the recognition ability of the model for non-enhanced tumors. In BGM, the tumor boundaries are extracted from the high-level semantic maps using the dilation and erosion effects of max pooling, such boundaries are then used to guide the fusion of low and high-level features. Moreover, to improve the segmentation performance for tumor boundaries, a multi-level boundary-enhanced segmentation (BS) loss is proposed. The extensive comparison experiments on both publicly available dataset and in-house dataset demonstrate that the proposed PBNet outperforms the state-of-the-art methods in terms of both qualitative visualization results and quantitative evaluation metrics, with the Dice score, Jaccard coefficient, Specificity and HD95 improved by 0.70%, 1.1%, 0.1% and 2.5% respectively. In addition, the ablation experiments validate that the proposed MGPM is indeed beneficial for distinguishing the non-enhanced tumors and the BGM as well as the BS loss are also helpful for refining the segmentation contours of the tumor.
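The boundary guided module extracts boundaries using the dilation and erosion effects of max pooling. The small sketch below shows that operation on a probability map; the kernel size is an illustrative choice.

```python
import torch
import torch.nn.functional as F

def boundary_from_probmap(prob, kernel_size=3):
    """prob: [B, 1, H, W] foreground probability map.
    Max pooling dilates the foreground, max pooling the complement erodes it;
    their difference is a soft boundary band."""
    pad = kernel_size // 2
    dilated = F.max_pool2d(prob, kernel_size, stride=1, padding=pad)
    eroded = 1.0 - F.max_pool2d(1.0 - prob, kernel_size, stride=1, padding=pad)
    return dilated - eroded

prob = torch.zeros(1, 1, 16, 16)
prob[:, :, 4:12, 4:12] = 1.0                 # a square "tumor"
boundary = boundary_from_probmap(prob)
print(boundary.nonzero().shape[0])           # non-zero only around the square's edge
```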

Pre-Training LiDAR-Based 3D Object Detectors Through Colorization

  • paper_url: http://arxiv.org/abs/2310.14592
  • repo_url: None
  • paper_authors: Tai-Yu Pan, Chenyang Ma, Tianle Chen, Cheng Perng Phoo, Katie Z Luo, Yurong You, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao
  • for: Improving 3D object detection and understanding for self-driving cars even when only limited labeled data is available.
  • methods: Proposes an innovative pre-training approach, Grounded Point Colorization (GPC), which teaches the model to colorize LiDAR point clouds and thereby equips it with valuable semantic cues.
  • results: Experiments on the KITTI and Waymo datasets show GPC substantially improves fine-tuning performance, notably when using only 20% of the KITTI dataset.
    Abstract Accurate 3D object detection and understanding for self-driving cars heavily relies on LiDAR point clouds, necessitating large amounts of labeled data to train. In this work, we introduce an innovative pre-training approach, Grounded Point Colorization (GPC), to bridge the gap between data and labels by teaching the model to colorize LiDAR point clouds, equipping it with valuable semantic cues. To tackle challenges arising from color variations and selection bias, we incorporate color as "context" by providing ground-truth colors as hints during colorization. Experimental results on the KITTI and Waymo datasets demonstrate GPC's remarkable effectiveness. Even with limited labeled data, GPC significantly improves fine-tuning performance; notably, on just 20% of the KITTI dataset, GPC outperforms training from scratch with the entire dataset. In sum, we introduce a fresh perspective on pre-training for 3D object detection, aligning the objective with the model's intended role and ultimately advancing the accuracy and efficiency of 3D object detection for autonomous vehicles.

Tensor Decomposition Based Attention Module for Spiking Neural Networks

  • paper_url: http://arxiv.org/abs/2310.14576
  • repo_url: None
  • paper_authors: Haoyu Deng, Ruijie Zhu, Xuerui Qiu, Yule Duan, Malu Zhang, Liangjian Deng
  • for: Improving the performance of spiking neural networks (SNN)
  • methods: Using tensor decomposition techniques to implement attention mechanisms that are relevant to tensors
  • results: Achieving state-of-the-art performance on both static and dynamic benchmark datasets, outperforming existing SNN models with Transformer-based and CNN-based backbones.
    Abstract The attention mechanism has been proven to be an effective way to improve spiking neural network (SNN). However, based on the fact that the current SNN input data flow is split into tensors to process on GPUs, none of the previous works consider the properties of tensors to implement an attention module. This inspires us to rethink current SNN from the perspective of tensor-relevant theories. Using tensor decomposition, we design the projected full attention (PFA) module, which demonstrates excellent results with linearly growing parameters. Specifically, PFA is composed of the linear projection of spike tensor (LPST) module and the attention map composing (AMC) module. In LPST, we start by compressing the original spike tensor into three projected tensors using a single property-preserving strategy with learnable parameters for each dimension. Then, in AMC, we exploit the inverse procedure of the tensor decomposition process to combine the three tensors into the attention map using a so-called connecting factor. To validate the effectiveness of the proposed PFA module, we integrate it into the widely used VGG and ResNet architectures for classification tasks. Our method achieves state-of-the-art performance on both static and dynamic benchmark datasets, surpassing the existing SNN models with Transformer-based and CNN-based backbones.

DICE: Diverse Diffusion Model with Scoring for Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2310.14570
  • repo_url: None
  • paper_authors: Younwoo Choi, Ray Coden Mercurius, Soheil Mohamad Alizadeh Shabestary, Amir Rasouli
  • for: Predicting future trajectories of road users to improve the safety and effectiveness of applications such as autonomous driving.
  • methods: Uses diffusion models together with an efficient sampling mechanism and a scoring method that reduce the computational cost of inference.
  • results: Achieves state-of-the-art performance on several subsets and metrics of the UCY/ETH and nuScenes benchmarks.
    Abstract Road user trajectory prediction in dynamic environments is a challenging but crucial task for various applications, such as autonomous driving. One of the main challenges in this domain is the multimodal nature of future trajectories stemming from the unknown yet diverse intentions of the agents. Diffusion models have shown to be very effective in capturing such stochasticity in prediction tasks. However, these models involve many computationally expensive denoising steps and sampling operations that make them a less desirable option for real-time safety-critical applications. To this end, we present a novel framework that leverages diffusion models for predicting future trajectories in a computationally efficient manner. To minimize the computational bottlenecks in iterative sampling, we employ an efficient sampling mechanism that allows us to maximize the number of sampled trajectories for improved accuracy while maintaining inference time in real time. Moreover, we propose a scoring mechanism to select the most plausible trajectories by assigning relative ranks. We show the effectiveness of our approach by conducting empirical evaluations on common pedestrian (UCY/ETH) and autonomous driving (nuScenes) benchmark datasets on which our model achieves state-of-the-art performance on several subsets and metrics.

F$^2$AT: Feature-Focusing Adversarial Training via Disentanglement of Natural and Perturbed Patterns

  • paper_url: http://arxiv.org/abs/2310.14561
  • repo_url: None
  • paper_authors: Yaguan Qian, Chenyu Zhao, Zhaoquan Gu, Bin Wang, Shouling Ji, Wei Wang, Boyang Zhou, Pan Zhou
  • for: Defending deep neural networks (DNNs) against adversarial attacks to protect critical applications such as self-driving cars, surveillance security, and medical diagnosis.
  • methods: Proposes Feature-Focusing Adversarial Training (F$^2$AT), which disentangles natural and perturbed patterns by bit-plane slicing so the model focuses on core features from natural patterns and reduces the impact of spurious features from perturbed patterns.
  • results: F$^2$AT outperforms previous methods in both clean accuracy and adversarial robustness.
    Abstract Deep neural networks (DNNs) are vulnerable to adversarial examples crafted by well-designed perturbations. This could lead to disastrous results on critical applications such as self-driving cars, surveillance security, and medical diagnosis. At present, adversarial training is one of the most effective defenses against adversarial examples. However, traditional adversarial training makes it difficult to achieve a good trade-off between clean accuracy and robustness since spurious features are still learned by DNNs. The intrinsic reason is that traditional adversarial training makes it difficult to fully learn core features from adversarial examples when adversarial noise and clean examples cannot be disentangled. In this paper, we disentangle the adversarial examples into natural and perturbed patterns by bit-plane slicing. We assume the higher bit-planes represent natural patterns and the lower bit-planes represent perturbed patterns, respectively. We propose a Feature-Focusing Adversarial Training (F$^2$AT), which differs from previous work in that it enforces the model to focus on the core features from natural patterns and reduce the impact of spurious features from perturbed patterns. The experimental results demonstrated that F$^2$AT outperforms state-of-the-art methods in clean accuracy and adversarial robustness.
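Bit-plane slicing, the operation F$^2$AT uses to disentangle natural from perturbed patterns, splits each 8-bit pixel into its constituent bits; higher planes carry the coarse natural structure while small adversarial perturbations mostly live in the lower planes. A minimal NumPy sketch follows; the 4/4 split of high and low planes is an illustrative choice.

```python
import numpy as np

def bit_planes(img_uint8):
    """Return an array of shape (8, H, W); plane 0 is the most significant bit."""
    return np.stack([(img_uint8 >> b) & 1 for b in range(7, -1, -1)], axis=0)

def split_high_low(img_uint8, n_high=4):
    """Split an 8-bit image into its high bit planes (coarse natural patterns)
    and low bit planes (where small perturbations mostly live)."""
    high_mask = np.uint8((0xFF << (8 - n_high)) & 0xFF)   # 0b11110000 for n_high=4
    high = img_uint8 & high_mask
    low = img_uint8 & ~high_mask
    return high, low

img = np.random.randint(0, 256, size=(32, 32), dtype=np.uint8)
high, low = split_high_low(img)
assert np.array_equal(high + low, img)        # the two parts recompose the image
```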

Polyhedral Surface: Self-supervised Point Cloud Reconstruction Based on Polyhedral Surface

  • paper_url: http://arxiv.org/abs/2310.14560
  • repo_url: None
  • paper_authors: Hui Tian, Kai Xu
  • for: Point cloud reconstruction, in particular building a local geometry that fits the local surface, an important problem in computer graphics.
  • methods: Proposes a novel polyhedral representation of the local surface that better captures sharp features and surface boundaries on open surfaces and requires no local coordinate system.
  • results: The polyhedral surface is constructed from normals, in dihedral (two-normal) and trihedral (three-normal) variants, and achieves state-of-the-art results on three commonly used datasets (ShapeNetCore, ABC, and ScanNet).
    Abstract Point cloud reconstruction from raw point clouds has been an important topic in computer graphics for decades, especially due to its high demand in modeling and rendering applications. An important way to solve this problem is establishing a local geometry to fit the local curve. However, previous methods build either a local plane or a polynomial curve. A local plane loses sharp features and introduces boundary artefacts on open surfaces. A polynomial curve is hard to combine with a neural network due to the local coordinate consistency problem. To address this, we propose a novel polyhedral surface to represent the local surface. This method provides more flexibility to represent sharp features and surface boundaries on open surfaces. It does not require any local coordinate system, which is important when introducing neural networks. Specifically, we use normals to construct the polyhedral surface, including both dihedral and trihedral surfaces using 2 and 3 normals, respectively. Our method achieves state-of-the-art results on three commonly used datasets (ShapeNetCore, ABC, and ScanNet). Code will be released upon acceptance.

S3Aug: Segmentation, Sampling, and Shift for Action Recognition

  • paper_url: http://arxiv.org/abs/2310.14556
  • repo_url: None
  • paper_authors: Taiki Sugiura, Toru Tamaki
  • for: Proposes a video data augmentation method to improve action recognition performance.
  • methods: Generates new videos from a single training video through segmentation and label-to-image transformation, samples certain categories of label images to create variety, and shifts intermediate features to enhance temporal coherency between generated frames.
  • results: Experiments show the method effectively improves action recognition, particularly on out-of-context videos of the Mimetics dataset.
    Abstract Action recognition is a well-established area of research in computer vision. In this paper, we propose S3Aug, a video data augmentation method for action recognition. Unlike conventional video data augmentation methods that involve cutting and pasting regions from two videos, the proposed method generates new videos from a single training video through segmentation and label-to-image transformation. Furthermore, the proposed method modifies certain categories of label images by sampling to generate a variety of videos, and shifts intermediate features to enhance the temporal coherency between frames of the generated videos. Experimental results on the UCF101, HMDB51, and Mimetics datasets demonstrate the effectiveness of the proposed method, particularly for out-of-context videos of the Mimetics dataset.

Practical Deep Dispersed Watermarking with Synchronization and Fusion

  • paper_url: http://arxiv.org/abs/2310.14532
  • repo_url: https://github.com/bytedance/dwsf
  • paper_authors: Hengchang Guo, Qilong Zhang, Junwei Luo, Feng Guo, Wenbin Zhang, Xiaodong Su, Minglei Li
  • for: Proposes a practical deep watermarking technique for robust blind watermarking of high-resolution images.
  • methods: Sparsely and randomly selects several fixed small-size blocks of the cover image and embeds a consistent watermark with a trained encoder; at extraction, a watermark synchronization module locates and rectifies the encoded blocks before decoding, and a similarity-based message fusion strategy determines a reliable message.
  • results: Compared with state-of-the-art methods, the blind watermarking improves bit accuracy by 5.28% and 5.93% on average against single and combined attacks, respectively, with less file size increment and better visual quality.
    Abstract Deep learning based blind watermarking works have gradually emerged and achieved impressive performance. However, previous deep watermarking studies mainly focus on fixed low-resolution images while paying less attention to arbitrary resolution images, especially widespread high-resolution images nowadays. Moreover, most works usually demonstrate robustness against typical non-geometric attacks (e.g., JPEG compression) but ignore common geometric attacks (e.g., Rotate) and more challenging combined attacks. To overcome the above limitations, we propose a practical deep Dispersed Watermarking with Synchronization and Fusion, called DWSF. Specifically, given an arbitrary-resolution cover image, we adopt a dispersed embedding scheme which sparsely and randomly selects several fixed small-size cover blocks to embed a consistent watermark message by a well-trained encoder. In the extraction stage, we first design a watermark synchronization module to locate and rectify the encoded blocks in the noised watermarked image. We then utilize a decoder to obtain messages embedded in these blocks, and propose a message fusion strategy based on similarity to make full use of the consistency among messages, thus determining a reliable message. Extensive experiments conducted on different datasets convincingly demonstrate the effectiveness of our proposed DWSF. Compared with state-of-the-art approaches, our blind watermarking can achieve better performance: averagely improve the bit accuracy by 5.28% and 5.93% against single and combined attacks, respectively, and show less file size increment and better visual quality. Our code is available at https://github.com/bytedance/DWSF.
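The dispersed embedding scheme can be pictured as picking a few fixed small-size blocks at random and writing the same message into each one. The sketch below only shows the block selection and bookkeeping; encoder is a placeholder for the trained embedding network, and the block size and count are illustrative, not the paper's settings.

```python
import torch

def dispersed_embed(cover, message, encoder, block_size=128, num_blocks=4, seed=0):
    """cover: [C, H, W] image tensor; message: bit tensor for the watermark.
    encoder(block, message) is assumed to return a watermarked block of the
    same shape as `block`."""
    g = torch.Generator().manual_seed(seed)
    _, H, W = cover.shape
    watermarked = cover.clone()
    positions = []
    for _ in range(num_blocks):
        y = torch.randint(0, H - block_size + 1, (1,), generator=g).item()
        x = torch.randint(0, W - block_size + 1, (1,), generator=g).item()
        block = watermarked[:, y:y + block_size, x:x + block_size]
        watermarked[:, y:y + block_size, x:x + block_size] = encoder(block, message)
        positions.append((y, x))
    return watermarked, positions
```

At extraction time the block positions are not transmitted, which is exactly why the paper's watermark synchronization module has to locate and rectify the encoded blocks before decoding and fusing the messages.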

Poster: Real-Time Object Substitution for Mobile Diminished Reality with Edge Computing

  • paper_url: http://arxiv.org/abs/2310.14511
  • repo_url: None
  • paper_authors: Hongyu Ke, Haoxin Wang
  • For: This research aims to provide a reliable and real-time diminished reality (DR) architecture for mobile devices, enabling high-quality real-time scene construction.
  • Methods: The research uses edge computing technology to achieve real-time object substitution and proposes an end-to-end architecture to facilitate high-quality real-time scene construction.
  • Results: The research obtains a reliable and real-time DR architecture, and demonstrates its high quality and reliability through experiments.
    Abstract Diminished Reality (DR) is considered the conceptual counterpart to Augmented Reality (AR), and has recently gained increasing attention from both industry and academia. Unlike AR, which adds virtual objects to the real world, DR allows users to remove physical content from the real world. When combined with object replacement technology, it presents a further exciting avenue for exploration within the metaverse. Although a few studies have been conducted on the intersection of object substitution and DR, there is no real-time object substitution for a mobile diminished reality architecture with high quality. In this paper, we propose an end-to-end architecture to facilitate immersive and real-time scene construction for mobile devices with edge computing.

ADoPT: LiDAR Spoofing Attack Detection Based on Point-Level Temporal Consistency

  • paper_url: http://arxiv.org/abs/2310.14504
  • repo_url: None
  • paper_authors: Minkyoung Cho, Yulong Cao, Zixiang Zhou, Z. Morley Mao
  • For: The paper aims to address the challenge of LiDAR spoofing attacks in autonomous vehicles by proposing a novel framework called ADoPT.
  • Methods: The proposed method uses temporal consistency to identify abnormal objects based on the coherency of point clusters across consecutive frames.
  • Results: The evaluation using the nuScenes dataset shows that the proposed algorithm effectively counters various LiDAR spoofing attacks with a low false positive ratio (< 10%) and a high true positive ratio (> 85%), outperforming existing state-of-the-art defense methods (CARLO and 3D-TC2).
    Abstract Deep neural networks (DNNs) are increasingly integrated into LiDAR (Light Detection and Ranging)-based perception systems for autonomous vehicles (AVs), requiring robust performance under adversarial conditions. We aim to address the challenge of LiDAR spoofing attacks, where attackers inject fake objects into LiDAR data and fool AVs to misinterpret their environment and make erroneous decisions. However, current defense algorithms predominantly depend on perception outputs (i.e., bounding boxes) thus face limitations in detecting attackers given the bounding boxes are generated by imperfect perception models processing limited points, acquired based on the ego vehicle's viewpoint. To overcome these limitations, we propose a novel framework, named ADoPT (Anomaly Detection based on Point-level Temporal consistency), which quantitatively measures temporal consistency across consecutive frames and identifies abnormal objects based on the coherency of point clusters. In our evaluation using the nuScenes dataset, our algorithm effectively counters various LiDAR spoofing attacks, achieving a low (< 10%) false positive ratio (FPR) and high (> 85%) true positive ratio (TPR), outperforming existing state-of-the-art defense methods, CARLO and 3D-TC2. Furthermore, our evaluation demonstrates the promising potential for accurate attack detection across various road environments.
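A rough sketch of the kind of point-level temporal consistency check described: measure what fraction of a cluster's points have support in the previous (ego-motion-compensated) frame and flag clusters without such support. The distance threshold and voting rule here are illustrative assumptions, not the paper's exact anomaly score.

```python
import numpy as np
from scipy.spatial import cKDTree

def temporal_consistency(cluster_pts, prev_frame_pts, dist_thresh=0.3):
    """Fraction of cluster points that have a neighbor within dist_thresh meters
    in the previous, ego-motion-compensated frame."""
    tree = cKDTree(prev_frame_pts)
    dists, _ = tree.query(cluster_pts, k=1)
    return float(np.mean(dists < dist_thresh))

def looks_spoofed(cluster_pts, prev_frame_pts, min_consistency=0.5):
    """Flag clusters whose points are not coherent across consecutive frames."""
    return temporal_consistency(cluster_pts, prev_frame_pts) < min_consistency

prev = np.random.rand(1000, 3) * 50.0
genuine = prev[:200] + np.random.normal(scale=0.05, size=(200, 3))   # persists over time
injected = np.random.rand(50, 3) * 5.0 + 200.0                        # appears from nowhere
print(looks_spoofed(genuine, prev), looks_spoofed(injected, prev))    # False True
```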

MSFormer: A Skeleton-multiview Fusion Method For Tooth Instance Segmentation

  • paper_url: http://arxiv.org/abs/2310.14489
  • repo_url: None
  • paper_authors: Yuan Li, Huan Liu, Yubo Tao, Xiangyang He, Haifeng Li, Xiaohu Guo, Hai Lin
  • for: Improving deep learning-based tooth segmentation so that high-precision segmentation can be achieved with limited data.
  • methods: Proposes a 2D-3D joint perception method, MSFormer, that uses skeletons as lightweight 3D inputs, adding a 3D-skeleton perception module and a skeleton-image contrastive learning module to existing multiview-based models.
  • results: Combined with large pre-trained multiview models, MSFormer achieves state-of-the-art tooth segmentation with only 100 training meshes, and segmentation accuracy improves by 2.4%-5.5% as the volume of training data grows.
    Abstract Recently, deep learning-based tooth segmentation methods have been limited by the expensive and time-consuming processes of data collection and labeling. Achieving high-precision segmentation with limited datasets is critical. A viable solution to this entails fine-tuning pre-trained multiview-based models, thereby enhancing performance with limited data. However, relying solely on two-dimensional (2D) images for three-dimensional (3D) tooth segmentation can produce suboptimal outcomes because of occlusion and deformation, i.e., incomplete and distorted shape perception. To improve this fine-tuning-based solution, this paper advocates 2D-3D joint perception. The fundamental challenge in employing 2D-3D joint perception with limited data is that the 3D-related inputs and modules must follow a lightweight policy instead of using huge 3D data and parameter-rich modules that require extensive training data. Following this lightweight policy, this paper selects skeletons as the 3D inputs and introduces MSFormer, a novel method for tooth segmentation. MSFormer incorporates two lightweight modules into existing multiview-based models: a 3D-skeleton perception module to extract 3D perception from skeletons and a skeleton-image contrastive learning module to obtain the 2D-3D joint perception by fusing both multiview and skeleton perceptions. The experimental results reveal that MSFormer paired with large pre-trained multiview models achieves state-of-the-art performance, requiring only 100 training meshes. Furthermore, the segmentation accuracy is improved by 2.4%-5.5% with the increasing volume of training data.

Player Re-Identification Using Body Part Appearences

  • paper_url: http://arxiv.org/abs/2310.14469
  • repo_url: https://github.com/abhinine4/Soccerplayer_Reidentification
  • paper_authors: Mahesh Bhosale, Abhishek Kumar, David Doermann
  • for: Proposes a neural network that learns body part appearances for soccer player re-identification.
  • methods: Uses a two-stream network (one stream for appearance map extraction, the other for body part map extraction) and a bilinear-pooling layer that generates and spatially pools the body part map; each local feature of the body part map is obtained by a bilinear mapping of the corresponding local appearance and body part descriptors.
  • results: The model outperforms state-of-the-art models such as OsNet and InceptionNet on the SoccerNet-V3 dataset.
    Abstract We propose a neural network architecture that learns body part appearances for soccer player re-identification. Our model consists of a two-stream network (one stream for appearance map extraction and the other for body part map extraction) and a bilinear-pooling layer that generates and spatially pools the body part map. Each local feature of the body part map is obtained by a bilinear mapping of the corresponding local appearance and body part descriptors. Our novel representation yields a robust image-matching feature map, which results from combining the local similarities of the relevant body parts with the weighted appearance similarity. Our model does not require any part annotation on the SoccerNet-V3 re-identification dataset to train the network. Instead, we use a sub-network of an existing pose estimation network (OpenPose) to initialize the part substream and then train the entire network to minimize the triplet loss. The appearance stream is pre-trained on the ImageNet dataset, and the part stream is trained from scratch for the SoccerNet-V3 dataset. We demonstrate the validity of our model by showing that it outperforms state-of-the-art models such as OsNet and InceptionNet.
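The bilinear mapping between local appearance features and body part descriptors amounts to an outer product at every spatial location followed by spatial pooling. A minimal sketch with assumed tensor shapes:

```python
import torch

def bilinear_part_pooling(appearance, parts):
    """appearance: [B, C, H, W] appearance feature map.
    parts:         [B, P, H, W] body part map, one channel per part.
    Returns [B, P, C]: for each part, the spatially pooled bilinear
    (outer-product) combination of part activation and appearance feature."""
    B, C, H, W = appearance.shape
    return torch.einsum('bphw,bchw->bpc', parts, appearance) / (H * W)

appearance = torch.randn(2, 256, 16, 8)
parts = torch.softmax(torch.randn(2, 6, 16, 8), dim=1)    # e.g. 6 body parts
print(bilinear_part_pooling(appearance, parts).shape)       # torch.Size([2, 6, 256])
```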

cs.AI - 2023-10-23

DoGE: Domain Reweighting with Generalization Estimation

  • paper_url: http://arxiv.org/abs/2310.15393
  • repo_url: None
  • paper_authors: Simin Fan, Matteo Pagliardini, Martin Jaggi
  • for: Improving the generalization of large language models by studying how the coverage and composition of the pretraining corpus affect generalization ability.
  • methods: Proposes DOmain reweighting with Generalization Estimation (DoGE), which reweights the sampling probability of each pretraining source domain based on its estimated contribution to the final generalization objective.
  • results: On the SlimPajama-6B dataset, DoGE achieves better average perplexity and zero-shot reasoning accuracy; on out-of-domain generalization tasks it reduces target-domain perplexity by a large margin. A parameter-selection scheme further improves the efficiency of generalization estimation.
    Abstract The coverage and composition of the pretraining data corpus significantly impacts the generalization ability of large language models. Conventionally, the pretraining corpus is composed of various source domains (e.g. CommonCrawl, Wikipedia, Github etc.) according to certain sampling probabilities (domain weights). However, current methods lack a principled way to optimize domain weights for the ultimate goal of generalization. We propose DOmain reweighting with Generalization Estimation (DoGE), where we reweigh the sampling probability from each domain based on its contribution to the final generalization objective assessed by a gradient-based generalization estimation function. First, we train a small-scale proxy model with a min-max optimization to obtain the reweighted domain weights. At each step, the domain weights are updated to maximize the overall generalization gain by mirror descent. Finally, we use the obtained domain weights to train a larger scale full-size language model. On the SlimPajama-6B dataset, with a universal generalization objective, DoGE achieves better average perplexity and zero-shot reasoning accuracy. On out-of-domain generalization tasks, DoGE reduces perplexity on the target domain by a large margin. We further apply a parameter-selection scheme which improves the efficiency of generalization estimation.
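The reweighting step can be sketched as a mirror-descent (multiplicative-weights) update that keeps the domain weights on the probability simplex. How the per-domain generalization gains are estimated with the proxy model is the paper's contribution; here they are simply taken as given, so treat this only as a schematic of the update rule.

```python
import numpy as np

def mirror_descent_update(domain_weights, generalization_gains, lr=1.0):
    """One multiplicative-weights step on the simplex of domain sampling weights.
    generalization_gains: estimated contribution of each domain to the
    generalization objective (e.g. from a small proxy model)."""
    w = np.asarray(domain_weights, dtype=float)
    g = np.asarray(generalization_gains, dtype=float)
    w = w * np.exp(lr * g)
    return w / w.sum()

weights = np.full(5, 0.2)                        # 5 domains, start uniform
gains = np.array([0.3, -0.1, 0.05, 0.4, -0.2])   # assumed gradient-based estimates
for _ in range(10):
    weights = mirror_descent_update(weights, gains, lr=0.5)
print(np.round(weights, 3))                      # mass shifts to high-gain domains
```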

Irreducible Curriculum for Language Model Pretraining

  • paper_url: http://arxiv.org/abs/2310.15389
  • repo_url: None
  • paper_authors: Simin Fan, Martin Jaggi
  • for: Automatic data selection and curriculum design for pretraining large language models.
  • methods: Proposes the irreducible curriculum algorithm, which prioritizes training samples with higher learnability; sample loss along the main model's training trajectory is simulated with a small-scale proxy model to avoid prohibitive extra computation.
  • results: On the RedPajama-1B dataset, validation perplexity improves consistently across all 7 domains compared with a random uniform baseline and an anti-curriculum strategy; the method also reduces network sharpness and yields better 5-shot accuracy on the MMLU benchmark.
    Abstract Automatic data selection and curriculum design for training large language models is challenging, with only a few existing methods showing improvements over standard training. Furthermore, current schemes focus on domain-level selection, overlooking the more fine-grained contributions of each individual training point. It is difficult to apply traditional datapoint selection methods on large language models: most online batch selection methods perform two-times forward or backward passes, which introduces considerable extra costs with large-scale models. To mitigate these obstacles, we propose irreducible curriculum as a curriculum learning algorithm for language model pretraining, which prioritizes samples with higher learnability. Specifically, to avoid prohibitive extra computation overhead, we simulate the sample loss along the main model's training trajectory using a small-scale proxy model. Our experiments on the RedPajama-1B dataset demonstrate a consistent improvement on validation perplexity across all 7 domains compared to random uniform baseline and the anti-curriculum strategy. Our method also reduces the sharpness of the network and illustrates a better 5-shot accuracy on MMLU benchmarks.

Course Correcting Koopman Representations

  • paper_url: http://arxiv.org/abs/2310.15386
  • repo_url: None
  • paper_authors: Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, Pierre-Luc Bacon, Ross Goroshin
  • for: Learning features of nonlinear dynamical systems (NLDS) that lead to linear dynamics in the latent space.
  • methods: Studies autoencoder formulations of the problem and different ways they can be used to model dynamics, specifically future state prediction over long horizons.
  • results: Identifies limitations of predicting future states in the latent space and proposes an inference-time mechanism, Periodic Reencoding, for faithfully capturing long-term dynamics; the approach is justified analytically and empirically on low- and high-dimensional NLDS.
    Abstract Koopman representations aim to learn features of nonlinear dynamical systems (NLDS) which lead to linear dynamics in the latent space. Theoretically, such features can be used to simplify many problems in modeling and control of NLDS. In this work we study autoencoder formulations of this problem, and different ways they can be used to model dynamics, specifically for future state prediction over long horizons. We discover several limitations of predicting future states in the latent space and propose an inference-time mechanism, which we refer to as Periodic Reencoding, for faithfully capturing long term dynamics. We justify this method both analytically and empirically via experiments in low and high dimensional NLDS.
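A schematic of the setup the abstract describes: an autoencoder whose latent dynamics are a single linear map K, with the proposed Periodic Reencoding implemented by decoding back to observation space every few steps and re-encoding before continuing the rollout. Network sizes are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

class KoopmanAE(nn.Module):
    def __init__(self, x_dim=4, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
        self.K = nn.Linear(z_dim, z_dim, bias=False)    # linear latent dynamics

    def rollout(self, x0, horizon, reencode_every=None):
        """Predict future states. With reencode_every=None the latent state is
        advanced purely linearly; with a finite period it is periodically decoded
        and re-encoded (Periodic Reencoding) to correct drift over long horizons."""
        z = self.enc(x0)
        preds = []
        for t in range(1, horizon + 1):
            z = self.K(z)
            x_hat = self.dec(z)
            preds.append(x_hat)
            if reencode_every and t % reencode_every == 0:
                z = self.enc(x_hat)
        return torch.stack(preds, dim=1)

model = KoopmanAE()
x0 = torch.randn(8, 4)
print(model.rollout(x0, horizon=50, reencode_every=10).shape)   # torch.Size([8, 50, 4])
```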

Health Disparities through Generative AI Models: A Comparison Study Using A Domain Specific large language model

  • paper_url: http://arxiv.org/abs/2310.18355
  • repo_url: None
  • paper_authors: Yohn Jairo Parra Bautista, Vinicious Lima, Carlos Theran, Richard Alo
  • for: Reducing health disparities and improving the quality of healthcare communication.
  • methods: Compares a domain-specific large language model (SciBERT) with a multi-purpose language model (BERT), using cosine similarity to analyze text queries about health disparities.
  • results: When the query "race" is used alone, SciBERT fails to differentiate it from "perpetuates health disparities," suggesting that more data and domain expertise are needed for domain-specific models to be effective.
    Abstract Health disparities are differences in health outcomes and access to healthcare between different groups, including racial and ethnic minorities, low-income people, and rural residents. A class of artificial intelligence (AI) programs called large language models (LLMs) can understand and generate human language, improving health communication and reducing health disparities. There are many challenges in using LLMs in human-doctor interaction, including the need for diverse and representative data, privacy concerns, and collaboration between healthcare providers and technology experts. We introduce a comparative investigation of domain-specific large language models such as SciBERT against a multi-purpose LLM, BERT. We used cosine similarity to analyze text queries about health disparities in exam rooms when factors such as race are used alone. Using text queries, SciBERT fails when it does not differentiate between the query "race" alone and "perpetuates health disparities." We believe clinicians can use generative AI to create a draft response when communicating asynchronously with patients. However, careful attention must be paid to ensure they are developed and implemented ethically and equitably.

Semantic Data Management in Data Lakes

  • paper_url: http://arxiv.org/abs/2310.15373
  • repo_url: None
  • paper_authors: Sayed Hoseini, Johannes Theissen-Lipp, Christoph Quix
  • for: Surveys semantic data management in data lake systems, reviewing recent approaches and techniques that make data access more expressive and interoperable.
  • methods: Classifies the approaches into three categories: basic semantic data management, semantic modeling approaches for enriching metadata in data lakes, and ontology-based data access; each category is described and the latest research compared.
  • results: Identifies challenges for future work on applying and scaling semantic data management in data lakes, pointing to a closer integration of Big Data and Semantic Web technologies.
    Abstract In recent years, data lakes emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Some approaches propose the linkage of metadata to knowledge graphs based on the Linked Data principles to provide more meaning and semantics to the data in the lake. Such a semantic layer may be utilized not only for data management but also to tackle the problem of data integration from heterogeneous sources, in order to make data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on the application within data lake systems and scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for ontology-based data access. In each category, we cover the main techniques and their background, and compare the latest research. Finally, we point out challenges for future work in this research area, which needs a closer integration of Big Data and Semantic Web technologies.

EpiK-Eval: Evaluation for Language Models as Epistemic Models

  • paper_url: http://arxiv.org/abs/2310.15372
  • repo_url: None
  • paper_authors: Gabriele Prato, Jerry Huang, Prasannna Parthasarathi, Shagun Sodhani, Sarath Chandar
  • for: Investigating the ability of large language models (LLMs) to consolidate knowledge from different training documents.
  • methods: Introduces EpiK-Eval, a novel question-answering benchmark that evaluates how well LLMs form a coherent and consistent knowledge representation from segmented narratives.
  • results: Existing LLMs show significant weaknesses in this regard, attributed to the nature of prevailing training objectives; the study advocates refining knowledge consolidation to improve overall effectiveness and performance.
    Abstract In the age of artificial intelligence, the role of large language models (LLMs) is becoming increasingly central. Despite their growing prevalence, their capacity to consolidate knowledge from different training documents - a crucial ability in numerous applications - remains unexplored. This paper presents the first study examining the capability of LLMs to effectively combine such information within their parameter space. We introduce EpiK-Eval, a novel question-answering benchmark tailored to evaluate LLMs' proficiency in formulating a coherent and consistent knowledge representation from segmented narratives. Evaluations across various LLMs reveal significant weaknesses in this domain. We contend that these shortcomings stem from the intrinsic nature of prevailing training objectives. Consequently, we advocate for refining the approach towards knowledge consolidation, as it harbors the potential to dramatically improve their overall effectiveness and performance. The findings from this study offer insights for developing more robust and reliable LLMs. Our code and benchmark are available at https://github.com/chandar-lab/EpiK-Eval

Vicinal Feature Statistics Augmentation for Federated 3D Medical Volume Segmentation

  • paper_url: http://arxiv.org/abs/2310.15371
  • repo_url: None
  • paper_authors: Yongsong Huang, Wanqing Xie, Mingzhen Li, Mingmei Cheng, Jinzhou Wu, Weixiao Wang, Jane You, Xiaofeng Liu
  • for: This paper aims to develop a vicinal feature-level data augmentation (VFDA) scheme to improve the performance of federated learning (FL) for 3D medical segmentation, while preserving data privacy.
  • methods: The proposed VFDA scheme exploits batch-wise feature statistics in each institute to abstractly represent the discrepancy of data, and models each feature statistic probabilistically using a Gaussian prototype. The scheme is designed to mitigate the local feature shift and facilitate collaborative training for privacy-aware FL segmentation, without the need for cross-institute transfer of raw data or their mixup.
  • results: The proposed VFDA scheme consistently yielded marked improvements over six advanced FL methods on both 3D brain tumor and cardiac segmentation.
    Abstract Federated learning (FL) enables multiple client medical institutes collaboratively train a deep learning (DL) model with privacy protection. However, the performance of FL can be constrained by the limited availability of labeled data in small institutes and the heterogeneous (i.e., non-i.i.d.) data distribution across institutes. Though data augmentation has been a proven technique to boost the generalization capabilities of conventional centralized DL as a "free lunch", its application in FL is largely underexplored. Notably, constrained by costly labeling, 3D medical segmentation generally relies on data augmentation. In this work, we aim to develop a vicinal feature-level data augmentation (VFDA) scheme to efficiently alleviate the local feature shift and facilitate collaborative training for privacy-aware FL segmentation. We take both the inner- and inter-institute divergence into consideration, without the need for cross-institute transfer of raw data or their mixup. Specifically, we exploit the batch-wise feature statistics (e.g., mean and standard deviation) in each institute to abstractly represent the discrepancy of data, and model each feature statistic probabilistically via a Gaussian prototype, with the mean corresponding to the original statistic and the variance quantifying the augmentation scope. From the vicinal risk minimization perspective, novel feature statistics can be drawn from the Gaussian distribution to fulfill augmentation. The variance is explicitly derived by the data bias in each individual institute and the underlying feature statistics characterized by all participating institutes. The added-on VFDA consistently yielded marked improvements over six advanced FL methods on both 3D brain tumor and cardiac segmentation.
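The core augmentation step can be pictured as drawing "vicinal" batch statistics from a Gaussian centered on the observed per-channel mean and standard deviation, then re-normalizing the features with them. The noise scales below are simple stand-ins for the bias-derived variances described in the abstract, so this is an illustration of the mechanism rather than the exact VFDA formulation.

```python
import torch

def vicinal_feature_augment(feat, sigma_mu=0.1, sigma_std=0.1):
    """feat: [B, C, D, H, W] 3D feature map from one institute's batch.
    Perturb the batch-wise channel statistics with Gaussian noise and
    re-normalize the features with the perturbed (vicinal) statistics."""
    dims = (0, 2, 3, 4)
    mu = feat.mean(dim=dims, keepdim=True)
    std = feat.std(dim=dims, keepdim=True) + 1e-6
    # sigma_mu / sigma_std stand in for the bias-derived augmentation scope.
    new_mu = mu + sigma_mu * std * torch.randn_like(mu)
    new_std = std * (1.0 + sigma_std * torch.randn_like(std)).clamp(min=0.1)
    return (feat - mu) / std * new_std + new_mu

feat = torch.randn(2, 8, 16, 32, 32)
print(vicinal_feature_augment(feat).shape)    # torch.Size([2, 8, 16, 32, 32])
```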
    摘要 联邦学习(FL)使多个医疗机构能够在保护隐私的前提下协作训练深度学习(DL)模型。然而,小机构标注数据有限,且各机构间数据分布异质(非独立同分布),会制约联邦学习的性能。数据增强已被证明是提升集中式深度学习泛化能力的"免费午餐",但其在联邦学习中的应用仍缺乏研究;受限于昂贵的标注成本,3D 医学分割通常依赖数据增强。本文提出一种邻近特征级数据增强(VFDA)方案,在无需跨机构传输原始数据或进行混合的情况下,同时考虑机构内与机构间差异,以缓解局部特征偏移并促进隐私感知的联邦分割协作训练。具体而言,我们利用各机构的批次特征统计量(如均值和标准差)抽象刻画数据差异,并以高斯原型对每个统计量进行概率建模:均值对应原始统计量,方差刻画增强范围;从邻近风险最小化的角度,可从该高斯分布中采样新的特征统计量以实现增强。方差由各机构自身的数据偏差以及所有参与机构刻画的特征统计量共同确定。在 3D 脑肿瘤与心脏分割任务上,加入 VFDA 后在六种先进联邦学习方法上均取得了明显提升。
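The core of VFDA, treating each batch feature statistic as a Gaussian prototype and drawing vicinal statistics from it, translates directly into a short module. The PyTorch sketch below is a simplified stand-in for the paper's scheme: it perturbs the per-channel mean and standard deviation of intermediate feature maps during local training. The way the augmentation scope is estimated here (from within-batch variability of the statistics) is an assumption for illustration, not the paper's exact formula.

```python
import torch
import torch.nn as nn

class VicinalFeatureStatAugment(nn.Module):
    """Illustrative vicinal feature-statistics augmentation.

    Per-channel mean/std of a feature map are treated as Gaussian prototypes;
    novel statistics are sampled around them and the features re-normalized.
    The scope estimate (batch-wise variability) is a simplifying assumption.
    """

    def __init__(self, p: float = 0.5, eps: float = 1e-6):
        super().__init__()
        self.p = p
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W) or (B, C, D, H, W)
        if not self.training or torch.rand(1).item() > self.p:
            return x
        dims = tuple(range(2, x.dim()))              # spatial dimensions
        mu = x.mean(dim=dims, keepdim=True)          # per-sample, per-channel mean
        sig = x.std(dim=dims, keepdim=True) + self.eps

        # Gaussian prototypes: variability of the statistics quantifies the augmentation scope.
        mu_scope = mu.std(dim=0, keepdim=True) + self.eps
        sig_scope = sig.std(dim=0, keepdim=True) + self.eps

        new_mu = mu + torch.randn_like(mu) * mu_scope
        new_sig = sig + torch.randn_like(sig) * sig_scope
        return (x - mu) / sig * new_sig + new_mu
```

In a federated setup, such a module would sit after early feature-extraction layers of each institute's local model, so that novel statistics are sampled every iteration without exchanging raw data between institutes.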

Why LLMs Hallucinate, and How to Get (Evidential) Closure: Perceptual, Intensional, and Extensional Learning for Faithful Natural Language Generation

  • paper_url: http://arxiv.org/abs/2310.15355
  • repo_url: None
  • paper_authors: Adam Bouyamourn
  • for: 这篇论文旨在解释大语言模型(LLMs)为何会产生幻觉,并探讨其输出与证据之间应满足的关系。
  • methods: 论文提出了名为"Learn-Babble-Prune"的启发式流程,通过拒绝与已有证据不同义的输出来约束 LLMs。
  • results: 论文指出 LLMs 会"幻觉",即其输出不一定有证据支持;应用"Learn-Babble-Prune"流程后,可以使 LLMs 的输出更加忠实可靠。
    Abstract We show that LLMs hallucinate because their output is not constrained to be synonymous with claims for which they have evidence: a condition that we call evidential closure. Information about the truth or falsity of sentences is not statistically identified in the standard neural probabilistic language model setup, and so cannot be conditioned on to generate new strings. We then show how to constrain LLMs to produce output that does satisfy evidential closure. A multimodal LLM must learn about the external world (perceptual learning); it must learn a mapping from strings to states of the world (extensional learning); and, to achieve fluency when generalizing beyond a body of evidence, it must learn mappings from strings to their synonyms (intensional learning). The output of a unimodal LLM must be synonymous with strings in a validated evidence set. Finally, we present a heuristic procedure, Learn-Babble-Prune, that yields faithful output from an LLM by rejecting output that is not synonymous with claims for which the LLM has evidence.
    摘要 我们指出,LLMs 之所以产生幻觉,是因为其输出并未被约束为与其拥有证据的陈述同义,我们将这一条件称为"证据闭合"。在标准的神经概率语言模型设置中,句子真伪的信息无法被统计识别,因此也无法作为生成新字符串的条件。我们进而说明如何约束 LLMs,使其输出满足证据闭合:多模态 LLM 必须学习外部世界(感知学习),学习从字符串到世界状态的映射(外延学习),并且为了在证据之外泛化时保持流畅,还必须学习从字符串到其同义表达的映射(内涵学习);单模态 LLM 的输出则必须与经过验证的证据集中的字符串同义。最后,我们提出一个启发式流程 Learn-Babble-Prune,通过拒绝与 LLM 拥有证据的陈述不同义的输出,使 LLM 产生忠实的输出。
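The Learn-Babble-Prune idea (sample candidate outputs, then keep only those whose claims can be matched to a validated evidence set) can be sketched as a simple filter loop. In the snippet below, `generate_candidates` and `is_synonymous` are hypothetical placeholders standing in for an LLM sampler and a paraphrase/entailment checker; the sketch shows the control flow only, not the paper's exact procedure.

```python
from typing import Callable, Iterable, List

def learn_babble_prune(
    prompt: str,
    evidence: Iterable[str],
    generate_candidates: Callable[[str, int], List[str]],   # e.g. sample n outputs from an LLM
    is_synonymous: Callable[[str, str], bool],               # e.g. a paraphrase/entailment checker
    n_samples: int = 16,
) -> List[str]:
    """Keep only sampled outputs that are synonymous with some claim in the evidence set."""
    evidence = list(evidence)
    faithful = []
    for candidate in generate_candidates(prompt, n_samples):                 # "babble"
        if any(is_synonymous(candidate, claim) for claim in evidence):       # "prune"
            faithful.append(candidate)
    return faithful

# Toy usage with trivial stand-ins for the two callables.
if __name__ == "__main__":
    evidence_set = ["The museum opens at 9 am.", "Entry is free on Sundays."]
    babble = lambda prompt, n: ["The museum opens at 9 am.", "The museum opens at noon."]
    synonym = lambda a, b: a.strip().lower() == b.strip().lower()
    print(learn_babble_prune("When does the museum open?", evidence_set, babble, synonym))
```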

Moral Foundations of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15337
  • repo_url: https://github.com/abdulhaim/moral_foundations_llm
  • paper_authors: Marwa Abdulhai, Gregory Serapio-Garcia, Clément Crepy, Daria Valter, John Canny, Natasha Jaques
  • for: 本研究用 moral foundations theory (MFT) 作为研究大型自然语言模型 (LLM) 是否具有特定道德价值观的工具。
  • methods: 本研究使用 MFT 分析知名的 LLM 是否具有特定道德价值观,并与人类道德价值观和政治倾向相关性分析。 并且对这些偏见进行了验证和检验。
  • results: 研究发现知名的 LLM 具有特定道德价值观,并且这些偏见与人类道德价值观和政治倾向有关。此外,研究还发现可以通过针对性地选择提示来让模型展现出特定的道德价值观,并且这会影响模型在下游任务中的行为。这些发现可能暴露了 LLM 做出不良决策的风险和不良后果。
    Abstract Moral foundations theory (MFT) is a psychological assessment tool that decomposes human moral reasoning into five factors, including care/harm, liberty/oppression, and sanctity/degradation (Graham et al., 2009). People vary in the weight they place on these dimensions when making moral decisions, in part due to their cultural upbringing and political ideology. As large language models (LLMs) are trained on datasets collected from the internet, they may reflect the biases that are present in such corpora. This paper uses MFT as a lens to analyze whether popular LLMs have acquired a bias towards a particular set of moral values. We analyze known LLMs and find they exhibit particular moral foundations, and show how these relate to human moral foundations and political affiliations. We also measure the consistency of these biases, or whether they vary strongly depending on the context of how the model is prompted. Finally, we show that we can adversarially select prompts that encourage the moral to exhibit a particular set of moral foundations, and that this can affect the model's behavior on downstream tasks. These findings help illustrate the potential risks and unintended consequences of LLMs assuming a particular moral stance.
    摘要 道德基础理论(MFT)是一种心理测评工具,它将人类道德推理分解为五个维度,包括关怀/伤害、自由/压迫和圣洁/堕落等(Graham 等,2009)。由于文化背景和政治观念的不同,人们在做道德决策时对这些维度的重视程度各不相同。大型语言模型(LLM)是在来自互联网的数据上训练的,因此可能反映这些语料中存在的偏见。本文以 MFT 为透镜,分析流行的 LLM 是否习得了偏向某组特定道德价值的倾向。我们分析了若干知名 LLM,发现它们表现出特定的道德基础,并展示这些基础与人类道德基础及政治倾向之间的关系。我们还测量了这些偏向的一致性,即它们是否会随提示语境的不同而发生强烈变化。最后,我们证明可以对抗性地选择提示,诱导模型表现出特定的道德基础,并且这会影响模型在下游任务中的行为。这些发现有助于说明 LLM 采取特定道德立场可能带来的风险和意外后果。

Serverless Federated Learning with flwr-serverless

  • paper_url: http://arxiv.org/abs/2310.15329
  • repo_url: https://github.com/kungfuai/flwr_serverless
  • paper_authors: Sanjeev V. Namjoshi, Reese Green, Krishi Sharma, Zhangzhang Si
  • for: 这个研究旨在提供一个可扩展、可靠且易用的 Federated Learning 框架,以便在不同的资料来源上进行训练,而不需要中央服务器。
  • methods: 这个研究基于 Flower 库,将其扩展为同时支持同步和异步 Federated Learning,并且无需中央服务器来协调 client-side training jobs。
  • results: 这个研究通过了一系列实验,证明了其可以降低 Federated Training 的时间和成本,并且提供了一个更易用的方式来实现和实验 Federated Learning 系统。
    Abstract Federated learning is becoming increasingly relevant and popular as we witness a surge in data collection and storage of personally identifiable information. Alongside these developments there have been many proposals from governments around the world to provide more protections for individuals' data and a heightened interest in data privacy measures. As deep learning continues to become more relevant in new and existing domains, it is vital to develop strategies like federated learning that can effectively train data from different sources, such as edge devices, without compromising security and privacy. Recently, the Flower (\texttt{Flwr}) Python package was introduced to provide a scalable, flexible, and easy-to-use framework for implementing federated learning. However, to date, Flower is only able to run synchronous federated learning which can be costly and time-consuming to run because the process is bottlenecked by client-side training jobs that are slow or fragile. Here, we introduce \texttt{flwr-serverless}, a wrapper around the Flower package that extends its functionality to allow for both synchronous and asynchronous federated learning with minimal modification to Flower's design paradigm. Furthermore, our approach to federated learning allows the process to run without a central server, which increases the domains of application and accessibility of its use. This paper presents the design details and usage of this approach through a series of experiments that were conducted using public datasets. Overall, we believe that our approach decreases the time and cost to run federated training and provides an easier way to implement and experiment with federated learning systems.
    摘要 随着个人身份信息的数据收集与存储不断增长,联邦学习正变得越来越重要和受欢迎。与此同时,世界各国政府提出了更多保护个人数据的举措,对数据隐私措施的关注也日益提高。随着深度学习在新领域和现有领域中的应用不断深入,开发像联邦学习这样能够在不损害安全与隐私的前提下、有效利用来自不同来源(如边缘设备)数据进行训练的策略变得至关重要。最近推出的 Flower(\texttt{Flwr})Python 包为实现联邦学习提供了一个可扩展、灵活且易用的框架。然而,目前 Flower 只能运行同步联邦学习,由于整个过程会被缓慢或脆弱的客户端训练作业所阻塞,其运行往往代价高且耗时。为此,我们提出 \texttt{flwr-serverless},它是对 Flower 包的一层封装,在尽量不改动 Flower 设计范式的前提下,同时支持同步与异步联邦学习。此外,我们的联邦学习方式无需中央服务器即可运行,从而扩大了其适用范围并提高了易用性。本文通过在公开数据集上进行的一系列实验介绍了该方法的设计细节与使用方式。总体而言,我们认为该方法降低了联邦训练的时间与成本,并为实现和试验联邦学习系统提供了更简便的途径。
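The paper's central idea, federated averaging coordinated through shared state rather than a central server, can be illustrated independently of the library. In the NumPy sketch below, clients publish their updated weights to a shared folder and average whatever peers have published so far, which permits asynchronous participation. This is a conceptual sketch only, not the flwr-serverless API; the file layout and averaging schedule are assumptions.

```python
import glob
import os
import numpy as np

SHARED_DIR = "shared_state"          # stands in for shared storage (e.g. an object store)
os.makedirs(SHARED_DIR, exist_ok=True)

def publish(client_id: str, weights: np.ndarray) -> None:
    """Each client publishes its latest weights; no central server is involved."""
    np.save(os.path.join(SHARED_DIR, f"{client_id}.npy"), weights)

def aggregate() -> np.ndarray:
    """Average whatever peer updates are currently available (asynchronous FedAvg-style step)."""
    files = glob.glob(os.path.join(SHARED_DIR, "*.npy"))
    return np.mean([np.load(f) for f in files], axis=0)

def local_step(weights: np.ndarray, data: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Placeholder local training step: one gradient step of a toy least-squares objective."""
    grad = weights - data.mean()
    return weights - lr * grad

# Two clients train at their own pace and synchronize only through the shared folder.
w = {"a": np.zeros(3), "b": np.zeros(3)}
for rnd in range(5):
    for cid, local_data in [("a", np.array([1.0])), ("b", np.array([3.0]))]:
        w[cid] = local_step(aggregate() if rnd else w[cid], local_data)
        publish(cid, w[cid])
print("consensus weights:", aggregate())
```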

Hallucination Detection for Grounded Instruction Generation

  • paper_url: http://arxiv.org/abs/2310.15319
  • repo_url: None
  • paper_authors: Lingjun Zhao, Khanh Nguyen, Hal Daumé III
  • for: 本研究旨在生成用于导航 simulate 的家居环境的指南。
  • methods: 我们采用一个在大规模图文对上预训练的模型,并用对比损失对其进行微调,以检测指令中幻觉式的动作或物体引用。
  • results: 我们的最终模型优于多种基线方法,包括利用指令生成模型给出的词概率,以及基于 LSTM 和 Transformer 的监督模型。
    Abstract We investigate the problem of generating instructions to guide humans to navigate in simulated residential environments. A major issue with current models is hallucination: they generate references to actions or objects that are inconsistent with what a human follower would perform or encounter along the described path. We develop a model that detects these hallucinated references by adopting a model pre-trained on a large corpus of image-text pairs, and fine-tuning it with a contrastive loss that separates correct instructions from instructions containing synthesized hallucinations. Our final model outperforms several baselines, including using word probability estimated by the instruction-generation model, and supervised models based on LSTM and Transformer.
    摘要 我们研究如何生成指令来引导人类在模拟的住宅环境中导航。当前模型的一个主要问题是幻觉:它们会生成与人类跟随者在所述路径上实际执行或遇到的情况不一致的动作或物体引用。我们开发了一个检测这类幻觉引用的模型:采用一个在大规模图文对上预训练的模型,并用对比损失对其微调,将正确的指令与包含合成幻觉的指令区分开来。我们的最终模型优于多种基线方法,包括利用指令生成模型给出的词概率,以及基于 LSTM 和 Transformer 的监督模型。
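The detector is described as a pre-trained image-text model fine-tuned with a contrastive loss that separates correct instructions from instructions containing synthesized hallucinations. A minimal PyTorch sketch of one such loss is shown below; the scalar scores would come from the pre-trained cross-modal scorer (not implemented here), and the margin value is an assumption.

```python
import torch
import torch.nn.functional as F

def contrastive_hallucination_loss(
    pos_scores: torch.Tensor,   # scores of correct instructions, shape (B,)
    neg_scores: torch.Tensor,   # scores of instructions with synthesized hallucinations, shape (B,)
    margin: float = 1.0,        # assumed margin; the paper's exact objective may differ
) -> torch.Tensor:
    """Margin-based contrastive loss pushing correct instructions above hallucinated ones."""
    return F.relu(margin - (pos_scores - neg_scores)).mean()

# Toy usage: stand-in scores for a batch of (environment, instruction) pairs.
B = 4
pos_scores = torch.randn(B, requires_grad=True)
neg_scores = torch.randn(B, requires_grad=True)
loss = contrastive_hallucination_loss(pos_scores, neg_scores)
loss.backward()
print(float(loss))
```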

HetGPT: Harnessing the Power of Prompt Tuning in Pre-Trained Heterogeneous Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.15318
  • repo_url: None
  • paper_authors: Yihong Ma, Ning Yan, Jiayu Li, Masood Mortazavi, Nitesh V. Chawla
  • for: 这篇论文旨在提高预训练的异类图神经网络(HGNN)的预测性能。
  • methods: 该论文提出了一种通用的后训练提示框架(HetGPT),以改善预训练HGNN的预测性能。该框架包含一种新的提示函数,将虚拟类提示和异质特征提示相结合,以重新表述下游任务,使其与预训练的前置任务更加一致。此外,HetGPT还提出了一种多视图邻居聚合机制,以捕捉异质图中复杂的邻居结构。
  • results: 在三个benchmark dataset上,HetGPT可以提高预训练的HGNN的 semi-supervised node classification性能。
    Abstract Graphs have emerged as a natural choice to represent and analyze the intricate patterns and rich information of the Web, enabling applications such as online page classification and social recommendation. The prevailing "pre-train, fine-tune" paradigm has been widely adopted in graph machine learning tasks, particularly in scenarios with limited labeled nodes. However, this approach often exhibits a misalignment between the training objectives of pretext tasks and those of downstream tasks. This gap can result in the "negative transfer" problem, wherein the knowledge gained from pre-training adversely affects performance in the downstream tasks. The surge in prompt-based learning within Natural Language Processing (NLP) suggests the potential of adapting a "pre-train, prompt" paradigm to graphs as an alternative. However, existing graph prompting techniques are tailored to homogeneous graphs, neglecting the inherent heterogeneity of Web graphs. To bridge this gap, we propose HetGPT, a general post-training prompting framework to improve the predictive performance of pre-trained heterogeneous graph neural networks (HGNNs). The key is the design of a novel prompting function that integrates a virtual class prompt and a heterogeneous feature prompt, with the aim to reformulate downstream tasks to mirror pretext tasks. Moreover, HetGPT introduces a multi-view neighborhood aggregation mechanism, capturing the complex neighborhood structure in heterogeneous graphs. Extensive experiments on three benchmark datasets demonstrate HetGPT's capability to enhance the performance of state-of-the-art HGNNs on semi-supervised node classification.
    摘要 图已成为表示和分析 Web 中复杂模式与丰富信息的自然选择,支撑着网页分类、社交推荐等应用。"预训练、微调"范式已被广泛用于图机器学习任务,尤其是在标注节点有限的场景中。然而,这一范式常常存在前置任务与下游任务训练目标不一致的问题,可能导致"负迁移":预训练获得的知识反而损害下游任务的性能。自然语言处理领域中提示学习的兴起表明,将"预训练、提示"范式迁移到图上是一种可行的替代方案。但现有的图提示技术都是针对同质图设计的,忽略了 Web 图固有的异质性。为弥补这一差距,我们提出 HetGPT,一个通用的后训练提示框架,用于提升预训练异质图神经网络(HGNN)的预测性能。其关键在于设计了一种新的提示函数,将虚拟类提示与异质特征提示相结合,从而将下游任务重新表述为与前置任务一致的形式。此外,HetGPT 引入多视图邻居聚合机制,以捕捉异质图中复杂的邻域结构。在三个基准数据集上的大量实验表明,HetGPT 能够提升最先进 HGNN 在半监督节点分类任务上的性能。
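HetGPT keeps the pre-trained HGNN frozen and learns only a small set of prompt parameters: a heterogeneous feature prompt added to node inputs and a virtual class prompt used to match node representations to classes. The sketch below shows one plausible reading of that design (per-node-type additive prompts plus learnable class prototypes); the exact prompt composition and the multi-view neighborhood aggregation are simplified away, and the backbone here is a dummy stand-in.

```python
import torch
import torch.nn as nn

class PromptedHGNN(nn.Module):
    """Post-training prompt tuning around a frozen heterogeneous GNN (illustrative)."""

    def __init__(self, frozen_hgnn: nn.Module, feat_dims: dict, hidden_dim: int, num_classes: int):
        super().__init__()
        self.hgnn = frozen_hgnn
        for p in self.hgnn.parameters():          # backbone stays frozen
            p.requires_grad_(False)
        # Heterogeneous feature prompt: one learnable additive vector per node type.
        self.feature_prompt = nn.ParameterDict(
            {ntype: nn.Parameter(torch.zeros(dim)) for ntype, dim in feat_dims.items()}
        )
        # Virtual class prompt: one learnable prototype per class.
        self.class_prompt = nn.Parameter(torch.randn(num_classes, hidden_dim) * 0.01)

    def forward(self, graph, features: dict) -> torch.Tensor:
        prompted = {ntype: x + self.feature_prompt[ntype] for ntype, x in features.items()}
        h = self.hgnn(graph, prompted)            # embeddings of the target node type, (N, hidden_dim)
        # Classify by similarity to class prototypes, mirroring a pretext-style matching task.
        return h @ self.class_prompt.t()          # logits, (N, num_classes)

if __name__ == "__main__":
    class DummyHGNN(nn.Module):                   # stand-in backbone for the sketch
        def __init__(self): super().__init__(); self.lin = nn.Linear(8, 16)
        def forward(self, graph, feats): return self.lin(feats["paper"])

    model = PromptedHGNN(DummyHGNN(), feat_dims={"paper": 8}, hidden_dim=16, num_classes=3)
    logits = model(graph=None, features={"paper": torch.randn(5, 8)})
    print(logits.shape)   # torch.Size([5, 3])
```

Only the prompt parameters receive gradients, which is what makes this a lightweight post-training step rather than full fine-tuning.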

Toward a Critical Toponymy Framework for Named Entity Recognition: A Case Study of Airbnb in New York City

  • paper_url: http://arxiv.org/abs/2310.15302
  • repo_url: None
  • paper_authors: Mikael Brunila, Jack LaViolette, Sky CH-Wang, Priyanka Verma, Clara Féré, Grant McKenzie
  • for: 本研究探讨了地名的动态power、资本和抵抗力,以及地名的用途和生产过程。
  • methods: 本研究使用计算机方法测量了人们在日常话语中如何引用地方,并通过一个 novel的 Airbnb 列表数据集来验证这些方法。
  • results: 研究发现了一些新的地名概念和语言信号,这些信号有助于我们更好地理解社区地位、住房和旅游市场等问题。
    Abstract Critical toponymy examines the dynamics of power, capital, and resistance through place names and the sites to which they refer. Studies here have traditionally focused on the semantic content of toponyms and the top-down institutional processes that produce them. However, they have generally ignored the ways in which toponyms are used by ordinary people in everyday discourse, as well as the other strategies of geospatial description that accompany and contextualize toponymic reference. Here, we develop computational methods to measure how cultural and economic capital shape the ways in which people refer to places, through a novel annotated dataset of 47,440 New York City Airbnb listings from the 2010s. Building on this dataset, we introduce a new named entity recognition (NER) model able to identify important discourse categories integral to the characterization of place. Our findings point toward new directions for critical toponymy and to a range of previously understudied linguistic signals relevant to research on neighborhood status, housing and tourism markets, and gentrification.
    摘要 批判地名学通过地名及其所指的地点来考察权力、资本与抵抗的动态。以往的研究主要关注地名的语义内容以及产生地名的自上而下的制度过程,却普遍忽视了普通人在日常话语中使用地名的方式,以及伴随并为地名指称提供语境的其他地理空间描述策略。本文基于一个新标注的数据集(包含 2010 年代纽约市的 47,440 条 Airbnb 房源列表),开发了计算方法来测量文化资本与经济资本如何塑造人们指称地点的方式。在此数据集之上,我们提出了一个新的命名实体识别(NER)模型,能够识别刻画地点时不可或缺的重要话语类别。我们的发现为批判地名学指出了新的方向,并揭示了一系列此前研究不足、但与社区地位、住房与旅游市场以及士绅化研究相关的语言信号。

Neural Network with Local Converging Input (NNLCI) for Supersonic Flow Problems with Unstructured Grids

  • paper_url: http://arxiv.org/abs/2310.15299
  • repo_url: None
  • paper_authors: Weiming Ding, Haoxiang Huang, Tzu Jung Lee, Yingjie Liu, Vigor Yang
  • for: 这项研究的目的是开发一种基于深度神经网络的高精度预测方法,用于解决含有复杂物理问题的partial differential equations。
  • methods: 该方法提出了一种局部收敛输入神经网络(NNLCI),以局部依赖域内逐步收敛的粗网格解作为输入,从而大幅减少了计算资源和训练时间。
  • results: 在带凸起的通道内无粘超音速流动算例中,NNLCI方法可以系统地研究激波相互作用等各种流动结构,并针对不同的凸起形状和位置验证了方法的有效性和通用性。
    Abstract In recent years, surrogate models based on deep neural networks (DNN) have been widely used to solve partial differential equations, which were traditionally handled by means of numerical simulations. This kind of surrogate models, however, focuses on global interpolation of the training dataset, and thus requires a large network structure. The process is both time consuming and computationally costly, thereby restricting their use for high-fidelity prediction of complex physical problems. In the present study, we develop a neural network with local converging input (NNLCI) for high-fidelity prediction using unstructured data. The framework utilizes the local domain of dependence with converging coarse solutions as input, which greatly reduces computational resource and training time. As a validation case, the NNLCI method is applied to study inviscid supersonic flows in channels with bumps. Different bump geometries and locations are considered to benchmark the effectiveness and versability of the proposed approach. Detailed flow structures, including shock-wave interactions, are examined systematically.
    摘要 近年来,基于深度神经网络的代理模型被广泛用于求解偏微分方程,以替代传统的数值模拟。然而,这类代理模型侧重于对训练数据的全局插值,因而需要较大的网络结构,训练耗时且计算代价高,限制了其在复杂物理问题高保真预测中的应用。本文提出一种基于局部收敛输入的神经网络(NNLCI),用于非结构化数据的高保真预测。该框架以局部依赖域内逐步收敛的粗网格解作为输入,大幅降低了计算资源与训练时间。作为验证算例,我们将 NNLCI 方法应用于带凸起的通道内无粘超音速流动,考察了不同的凸起形状与位置以检验方法的有效性和通用性,并系统分析了包括激波相互作用在内的详细流动结构。
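The defining choice in NNLCI is that the network's input is not the global field but the local domain of dependence extracted from two converging coarse solutions, and its output is the high-fidelity value at that point. A schematic one-dimensional, fully-connected version is sketched below; the patch size, network width, and the toy "shock-like" profile are illustrative assumptions, not the paper's setup.

```python
import numpy as np
import torch
import torch.nn as nn

PATCH = 5  # local domain of dependence: a few neighboring cells from each coarse solution

class NNLCI(nn.Module):
    """Maps local patches of two converging coarse solutions to the fine solution value."""
    def __init__(self, patch: int = PATCH, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * patch, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coarse1_patch, coarse2_patch):
        return self.net(torch.cat([coarse1_patch, coarse2_patch], dim=-1)).squeeze(-1)

def extract_patches(u: np.ndarray, patch: int = PATCH) -> torch.Tensor:
    """Collect sliding local windows (the discrete domain of dependence) from a 1-D solution."""
    half = patch // 2
    padded = np.pad(u, half, mode="edge")
    windows = np.stack([padded[i:i + patch] for i in range(len(u))])
    return torch.tensor(windows, dtype=torch.float32)

# Toy data: two coarse approximations converging toward a sharper "fine" reference profile.
x = np.linspace(0.0, 1.0, 200)
fine = np.tanh(40 * (x - 0.5))
coarse1 = np.tanh(10 * (x - 0.5))
coarse2 = np.tanh(20 * (x - 0.5))

model = NNLCI()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
p1, p2, target = extract_patches(coarse1), extract_patches(coarse2), torch.tensor(fine, dtype=torch.float32)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(p1, p2), target)
    loss.backward()
    opt.step()
print("final training loss:", float(loss))
```

Because each prediction only consumes a small local stencil, the same trained network can be applied point by point on unstructured grids, which is what keeps the surrogate small compared with global interpolation models.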

TaskDiff: A Similarity Metric for Task-Oriented Conversations

  • paper_url: http://arxiv.org/abs/2310.15298
  • repo_url: None
  • paper_authors: Ankita Bhaumik, Praveen Venkateswaran, Yara Rizk, Vatche Isahagian
  • for: 这个论文的目的是提出一种新的对话相似度度量方法,以便更好地评估和优化对话式数字助手的人机交互。
  • methods: 该论文指出,使用 ChatGPT 等流行大语言模型构建此类助手时,需要额外重视提示工程和评估方法。TaskDiff 利用不同对话组件(话语、意图和槽位)的分布来计算相似度。
  • results: 对于一个 benchmark 数据集,TaskDiff 在对话相似度度量方法中表现出色,与其他相关方法相比,具有更高的性能和更好的Robustness。
    Abstract The popularity of conversational digital assistants has resulted in the availability of large amounts of conversational data which can be utilized for improved user experience and personalized response generation. Building these assistants using popular large language models like ChatGPT also require additional emphasis on prompt engineering and evaluation methods. Textual similarity metrics are a key ingredient for such analysis and evaluations. While many similarity metrics have been proposed in the literature, they have not proven effective for task-oriented conversations as they do not take advantage of unique conversational features. To address this gap, we present TaskDiff, a novel conversational similarity metric that utilizes different dialogue components (utterances, intents, and slots) and their distributions to compute similarity. Extensive experimental evaluation of TaskDiff on a benchmark dataset demonstrates its superior performance and improved robustness over other related approaches.
    摘要 对话式数字助手的普及带来了大量可用于改进用户体验和个性化响应生成的对话数据。使用 ChatGPT 等流行大语言模型构建此类助手,还需要额外重视提示工程和评估方法。文本相似度度量是此类分析与评估的关键组成部分。尽管文献中已提出许多相似度度量,但它们未能充分利用任务导向对话的独特特征,因而在此类对话上效果欠佳。为弥补这一差距,我们提出 TaskDiff,一种新的对话相似度度量,利用不同对话组件(话语、意图和槽位)及其分布来计算相似度。在基准数据集上的大量实验评估表明,TaskDiff 相比其他相关方法具有更优的性能和更强的鲁棒性。
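TaskDiff compares task-oriented conversations through the distributions of their components rather than raw text alone. The abstract does not spell out the exact distance, so the sketch below is only one plausible instantiation: cosine similarity of averaged utterance embeddings combined with the overlap of intent and slot distributions. The component weights and the embedding function are assumptions.

```python
from collections import Counter
from typing import Callable, Dict, List
import numpy as np

def distribution_overlap(a: Counter, b: Counter) -> float:
    """Histogram intersection between two normalized label distributions."""
    keys = set(a) | set(b)
    ta, tb = sum(a.values()) or 1, sum(b.values()) or 1
    return sum(min(a[k] / ta, b[k] / tb) for k in keys)

def task_similarity(
    conv1: Dict[str, List[str]],               # {"utterances": [...], "intents": [...], "slots": [...]}
    conv2: Dict[str, List[str]],
    embed: Callable[[List[str]], np.ndarray],  # placeholder sentence-embedding function
    weights=(0.4, 0.3, 0.3),                   # assumed component weights
) -> float:
    e1, e2 = embed(conv1["utterances"]).mean(0), embed(conv2["utterances"]).mean(0)
    cos = float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2) + 1e-9))
    intent_sim = distribution_overlap(Counter(conv1["intents"]), Counter(conv2["intents"]))
    slot_sim = distribution_overlap(Counter(conv1["slots"]), Counter(conv2["slots"]))
    w_u, w_i, w_s = weights
    return w_u * cos + w_i * intent_sim + w_s * slot_sim

# Toy usage with a random (placeholder) embedder.
rng = np.random.default_rng(0)
fake_embed = lambda sents: rng.normal(size=(len(sents), 16))
c1 = {"utterances": ["book a flight to Paris"], "intents": ["book_flight"], "slots": ["destination"]}
c2 = {"utterances": ["I need a ticket to Paris"], "intents": ["book_flight"], "slots": ["destination"]}
print(task_similarity(c1, c2, fake_embed))
```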

DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM

  • paper_url: http://arxiv.org/abs/2310.15296
  • repo_url: None
  • paper_authors: Weijie Xu, Wenxiang Hu, Fanyou Wu, Srinivasan Sengamedu
  • for: 这篇论文是关于自然语言处理领域中的 Neural Topic Models (NTMs) 和 Large Language Models (LLMs) 的研究。
  • methods: 该研究提出了一种名为 Diffusion-Enhanced Topic Modeling using Encoder-Decoder-based LLMs (DeTiME) 的新框架,利用基于编码器-解码器的 LLM 生成高度可聚类的嵌入,从而得到聚类性更好、语义一致性更强的主题。
  • results: 该研究表明,DeTiME 可以生成高度可聚类的主题,并能同时生成与所识别主题相关的内容。此外,DeTiME 还可用于生成可聚类的嵌入,且训练高效、适应性强,适用于多种应用。
    Abstract In the burgeoning field of natural language processing, Neural Topic Models (NTMs) and Large Language Models (LLMs) have emerged as areas of significant research interest. Despite this, NTMs primarily utilize contextual embeddings from LLMs, which are not optimal for clustering or capable for topic generation. Our study addresses this gap by introducing a novel framework named Diffusion-Enhanced Topic Modeling using Encoder-Decoder-based LLMs (DeTiME). DeTiME leverages ncoder-Decoder-based LLMs to produce highly clusterable embeddings that could generate topics that exhibit both superior clusterability and enhanced semantic coherence compared to existing methods. Additionally, by exploiting the power of diffusion, our framework also provides the capability to generate content relevant to the identified topics. This dual functionality allows users to efficiently produce highly clustered topics and related content simultaneously. DeTiME's potential extends to generating clustered embeddings as well. Notably, our proposed framework proves to be efficient to train and exhibits high adaptability, demonstrating its potential for a wide array of applications.
    摘要 在自然语言处理领域的发展中,神经主题模型(NTM)和大语言模型(LLM)已成为研究热点。然而,NTM 主要使用来自 LLM 的上下文嵌入,这类嵌入并不适合聚类,也无法直接用于主题生成。我们的研究通过引入新框架 DeTiME(基于编码器-解码器 LLM 的扩散增强主题建模)来弥补这一差距。DeTiME 利用基于编码器-解码器的 LLM 生成高度可聚类的嵌入,所得到的主题在聚类性和语义一致性上均优于现有方法。此外,借助扩散模型的能力,该框架还能够生成与所识别主题相关的内容。这种双重功能使用户可以同时高效地获得高度聚类的主题及其相关内容。DeTiME 还可用于生成可聚类的嵌入。值得一提的是,该框架训练高效、适应性强,在多种应用场景中具有潜力。

Active teacher selection for reinforcement learning from human feedback

  • paper_url: http://arxiv.org/abs/2310.15288
  • repo_url: None
  • paper_authors: Rachel Freedman, Justin Svegliato, Kyle Wray, Stuart Russell
  • for: 这篇论文旨在探讨人工智能系统如何从人类反馈中学习目标,并模型不同教师的理性、专业和成本差异。
  • methods: 论文提出了隐藏效用赌博机(Hidden Utility Bandit, HUB)框架,用以建模教师在理性、专业程度和查询成本上的差异,从而形式化多教师学习问题,并开发了多种求解算法。
  • results: 在两个实际应用领域(论文推荐系统和 COVID-19 疫苗测试)中的实验表明,主动教师选择(ATS)算法通过主动选择何时向哪位教师征询反馈,性能优于基线算法。
    Abstract Reinforcement learning from human feedback (RLHF) enables machine learning systems to learn objectives from human feedback. A core limitation of these systems is their assumption that all feedback comes from a single human teacher, despite querying a range of distinct teachers. We propose the Hidden Utility Bandit (HUB) framework to model differences in teacher rationality, expertise, and costliness, formalizing the problem of learning from multiple teachers. We develop a variety of solution algorithms and apply them to two real-world domains: paper recommendation systems and COVID-19 vaccine testing. We find that the Active Teacher Selection (ATS) algorithm outperforms baseline algorithms by actively selecting when and which teacher to query. The HUB framework and ATS algorithm demonstrate the importance of leveraging differences between teachers to learn accurate reward models, facilitating future research on active teacher selection for robust reward modeling.
    摘要 基于人类反馈的强化学习(RLHF)使机器学习系统能够从人类反馈中学习目标。此类系统的一个核心局限在于,它们假设所有反馈都来自同一位人类教师,而实际查询的往往是多位不同的教师。我们提出隐藏效用赌博机(HUB)框架,用以建模教师在理性、专业程度和查询成本上的差异,从而形式化从多位教师学习的问题。我们开发了多种求解算法,并将其应用于论文推荐系统和 COVID-19 疫苗测试这两个真实领域。结果表明,主动教师选择(ATS)算法通过主动选择何时向哪位教师征询反馈,性能优于基线算法。HUB 框架与 ATS 算法表明,利用教师之间的差异有助于学习准确的奖励模型,为后续关于主动教师选择与稳健奖励建模的研究奠定了基础。
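The HUB setting has a learner paying different costs to query teachers whose feedback differs in reliability. The abstract does not specify the ATS algorithm, so the snippet below only conveys the flavor of active teacher selection: a cost-aware UCB score trades off a teacher's estimated usefulness against its query cost. The usefulness signal and the selection rule are stand-ins, not the paper's method.

```python
import math
import random

class ActiveTeacherSelector:
    """Illustrative cost-aware UCB over teachers (not the paper's exact ATS algorithm)."""

    def __init__(self, costs):
        self.costs = costs                        # query cost per teacher
        self.counts = [0] * len(costs)
        self.value = [0.0] * len(costs)           # running estimate of feedback usefulness

    def select(self, t: int) -> int:
        scores = []
        for i, c in enumerate(self.costs):
            if self.counts[i] == 0:
                return i                          # query each teacher at least once
            bonus = math.sqrt(2 * math.log(t + 1) / self.counts[i])
            scores.append((self.value[i] + bonus) / c)
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, i: int, usefulness: float) -> None:
        self.counts[i] += 1
        self.value[i] += (usefulness - self.value[i]) / self.counts[i]

# Toy loop: teacher 0 is cheap but noisy, teacher 1 is expensive but accurate.
random.seed(0)
true_accuracy, costs = [0.6, 0.95], [1.0, 3.0]
ats = ActiveTeacherSelector(costs)
for t in range(200):
    i = ats.select(t)
    feedback_correct = random.random() < true_accuracy[i]   # stand-in for feedback quality
    ats.update(i, 1.0 if feedback_correct else 0.0)
print("queries per teacher:", ats.counts)
```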

Systematic AI Approach for AGI: Addressing Alignment, Energy, and AGI Grand Challenges

  • paper_url: http://arxiv.org/abs/2310.15274
  • repo_url: None
  • paper_authors: Eren Kurshan
  • for: 本研究旨在解决AI面临的三大挑战(能源墙、对齐问题和从 narrow AI 到 AGI 的跃迁)。
  • methods: 本研究提出了一种系统设计方法来解决这三个挑战。这种方法基于人类大脑系统设计的原则,包括信息处理和决策等方面。
  • results: 研究表明,通过采用系统设计方法,可以有效地解决能源墙、对齐问题和从 narrow AI 到 AGI 的跃迁。这种方法可以提高AI的可持续性和效率,同时也可以帮助实现健康的道德决策。
    Abstract AI faces a trifecta of grand challenges: the Energy Wall, the Alignment Problem, and the Leap from Narrow AI to AGI. Contemporary AI solutions consume unsustainable amounts of energy during model training and daily operations. Making things worse, the amount of computation required to train each new AI model has been doubling every 2 months since 2020, directly translating to increases in energy consumption. The leap from AI to AGI requires multiple functional subsystems operating in a balanced manner, which requires a system architecture. However, the current approach to artificial intelligence lacks system design, even though system characteristics play a key role in the human brain, from the way it processes information to how it makes decisions. Similarly, current alignment and AI ethics approaches largely ignore system design, yet studies show that the brain's system architecture plays a critical role in healthy moral decisions. In this paper, we argue that system design is critically important in overcoming all three grand challenges. We posit that system design is the missing piece in overcoming the grand challenges. We present a Systematic AI Approach for AGI that utilizes system design principles for AGI, while providing ways to overcome the energy wall and the alignment challenges.
    摘要 AI面临着三大挑战:能源墙、对齐问题和从 narrow AI 到 AGI 的跳跃。现代 AI 解决方案在模型训练和日常运行中消耗着不可持续的能源。更糟的是,自 2020 年以来,训练每个新 AI 模型所需的计算量每两个月就翻一番,直接导致能源消耗的增加。迈向 AGI 的跳跃需要多个功能子系统在均衡状态下运行,这需要一套系统架构。然而,当前人工智能的方法缺乏系统设计,尽管系统特性在人类大脑中(从信息处理方式到决策方式)都扮演着关键角色。同时,当前的对齐和 AI 伦理方法大多忽略系统设计,然而研究表明,大脑的系统架构在健康的道德决策中发挥着关键作用。在这篇论文中,我们认为系统设计是克服三大挑战的关键因素,并提出了一种面向 AGI 的系统化 AI 方法,利用系统设计原则实现 AGI,同时提供了克服能源墙和对齐挑战的途径。

Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

  • paper_url: http://arxiv.org/abs/2310.15264
  • repo_url: None
  • paper_authors: Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, Amrit Singh Bedi
  • for: 这篇论文旨在探讨人工智能生成的文本检测问题,以帮助解决违规用户生成的文本的问题。
  • methods: 论文使用了许多现有的检测方法,包括语言模型、深度学习和自然语言处理等技术。
  • results: 论文提出了一些限制和挑战,并提出了一些未解决的问题,例如检测效果的提高和误报率的降低等。
    Abstract Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses. However, despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs such as spreading misinformation, generating fake news, plagiarism in academia, and contaminating the web. To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text. The basic idea is that whenever we can tell if the given text is either written by a human or an AI, we can utilize this information to address the above-mentioned concerns. To that end, a plethora of detection frameworks have been proposed, highlighting the possibilities of AI-generated text detection. But in parallel to the development of detection frameworks, researchers have also concentrated on designing strategies to elude detection, i.e., focusing on the impossibilities of AI-generated text detection. This is a crucial step in order to make sure the detection frameworks are robust enough and it is not too easy to fool a detector. Despite the huge interest and the flurry of research in this domain, the community currently lacks a comprehensive analysis of recent developments. In this survey, we aim to provide a concise categorization and overview of current work encompassing both the prospects and the limitations of AI-generated text detection. To enrich the collective knowledge, we engage in an exhaustive discussion on critical and challenging open questions related to ongoing research on AI-generated text detection.
    摘要 大型语言模型(LLM)在自然语言处理(NLP)领域取得了很大的进步,其可以生成人类语言样式的文本响应。然而,有一些研究表明,LLM可能会被用于散播谣言、生成假新闻、学术抄袭和污染网络等不良用途。为 Addressing these concerns, the research community has reached a consensus on developing algorithmic solutions to detect AI-generated text. The basic idea is that if we can distinguish between human-written and AI-generated text, we can use this information to address the above-mentioned concerns. To this end, a variety of detection frameworks have been proposed, highlighting the possibilities of AI-generated text detection. However, in parallel to the development of detection frameworks, researchers have also focused on designing strategies to evade detection, i.e., making it difficult for detectors to identify AI-generated text. This is a crucial step to ensure that detection frameworks are robust and cannot be easily fooled. Despite the significant interest and the flurry of research in this domain, the community currently lacks a comprehensive analysis of recent developments. In this survey, we aim to provide a concise categorization and overview of current work encompassing both the prospects and limitations of AI-generated text detection. To enrich the collective knowledge, we engage in an exhaustive discussion on critical and challenging open questions related to ongoing research on AI-generated text detection.

Reference Free Domain Adaptation for Translation of Noisy Questions with Question Specific Rewards

  • paper_url: http://arxiv.org/abs/2310.15259
  • repo_url: https://github.com/babangain/unsup_questions_translation
  • paper_authors: Baban Gain, Ramakrishna Appicharla, Soumya Chennabasavaraj, Nikesh Garera, Asif Ekbal, Muthusamy Chelliah
  • for: This paper aims to improve the accuracy of Neural Machine Translation (NMT) for question translation in noisy environments, where the grammatical correctness of the questions is not monitored.
  • methods: The authors propose a training methodology that fine-tunes the NMT system only using source-side data, and combines BERTScore and Masked Language Model (MLM) Score to balance adequacy and fluency.
  • results: The proposed method surpasses the conventional Maximum Likelihood Estimation (MLE) based fine-tuning approach, achieving a 1.9 BLEU score improvement, and shows robustness when adding noise to the baseline, with a 1.1 BLEU improvement and large improvements on TER and BLEURT metrics.
    Abstract Community Question-Answering (CQA) portals serve as a valuable tool for helping users within an organization. However, making them accessible to non-English-speaking users continues to be a challenge. Translating questions can broaden the community's reach, benefiting individuals with similar inquiries in various languages. Translating questions using Neural Machine Translation (NMT) poses more challenges, especially in noisy environments, where the grammatical correctness of the questions is not monitored. These questions may be phrased as statements by non-native speakers, with incorrect subject-verb order and sometimes even missing question marks. Creating a synthetic parallel corpus from such data is also difficult due to its noisy nature. To address this issue, we propose a training methodology that fine-tunes the NMT system only using source-side data. Our approach balances adequacy and fluency by utilizing a loss function that combines BERTScore and Masked Language Model (MLM) Score. Our method surpasses the conventional Maximum Likelihood Estimation (MLE) based fine-tuning approach, which relies on synthetic target data, by achieving a 1.9 BLEU score improvement. Our model exhibits robustness while we add noise to our baseline, and still achieve 1.1 BLEU improvement and large improvements on TER and BLEURT metrics. Our proposed methodology is model-agnostic and is only necessary during the training phase. We make the codes and datasets publicly available at \url{https://www.iitp.ac.in/~ai-nlp-ml/resources.html#DomainAdapt} for facilitating further research.
    摘要 社区问答(CQA)门户是帮助组织内用户的宝贵工具,但让非英语用户也能使用它们仍然是一大挑战。翻译问题可以扩大社区的覆盖面,使使用不同语言、有类似疑问的用户受益。在嘈杂环境下使用神经机器翻译(NMT)翻译问题面临更多挑战:这些问题的语法正确性无人把关,非母语者可能把问句写成陈述句,主谓顺序颠倒,甚至缺少问号;由于数据嘈杂,从中构建合成平行语料也很困难。为解决这一问题,我们提出了一种仅使用源端数据对 NMT 系统进行微调的训练方法。该方法使用结合 BERTScore 与掩码语言模型(MLM)得分的损失函数,在充分性与流畅性之间取得平衡。与依赖合成目标数据的传统最大似然估计(MLE)微调方法相比,我们的方法取得了 1.9 BLEU 的提升;在基线中加入噪声后,模型仍表现稳健,取得 1.1 BLEU 的提升,并在 TER 和 BLEURT 指标上有大幅改善。所提出的方法与模型无关,且仅在训练阶段需要。我们在 \url{https://www.iitp.ac.in/~ai-nlp-ml/resources.html#DomainAdapt} 公开了代码与数据集,以便后续研究。
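The training signal combines an adequacy term (a BERTScore-style match against the source content) with a fluency term (a masked-language-model score), so that fine-tuning needs only source-side data. The fluency half can be computed concretely as a pseudo-log-likelihood under a masked LM, as sketched below; the adequacy scorer is left as a placeholder and the mixing weight is an assumption rather than the paper's exact formulation.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased").eval()

@torch.no_grad()
def mlm_fluency(sentence: str) -> float:
    """Pseudo-log-likelihood: mask each token in turn and score it under the masked LM."""
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for pos in range(1, len(ids) - 1):            # skip [CLS] / [SEP]
        masked = ids.clone()
        masked[pos] = tok.mask_token_id
        logits = mlm(masked.unsqueeze(0)).logits[0, pos]
        total += torch.log_softmax(logits, dim=-1)[ids[pos]].item()
    return total / max(len(ids) - 2, 1)

def reward(source_question: str, translation: str, adequacy_fn, alpha: float = 0.5) -> float:
    """Assumed combination of adequacy (e.g. BERTScore-style) and MLM fluency."""
    return alpha * adequacy_fn(source_question, translation) + (1 - alpha) * mlm_fluency(translation)

# Toy usage with a placeholder adequacy scorer.
dummy_adequacy = lambda src, hyp: 0.8
print(reward("mera order kab aayega", "When will my order arrive?", dummy_adequacy))
```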

CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks

  • paper_url: http://arxiv.org/abs/2310.15239
  • repo_url: https://github.com/mismayil/crow
  • paper_authors: Mete Ismayilzada, Debjit Paul, Syrielle Montariol, Mor Geva, Antoine Bosselut
  • for: 这个论文的目的是评估自然语言处理(NLP)系统在真实世界任务 setting中的常识理解能力。
  • methods: 这个论文使用了一种多项目数据采集管道,将现有数据集中的例子 rewrite 为具有常识抵触的例子,以构建一个手动精心编辑的多任务数据集(CRoW)。
  • results: 根据CRoW数据集进行评估,研究发现现有NLP系统在常识理解方面存在很大的性能差距,与人类的表现相比。这表明常识理解在真实世界任务 setting中仍然是一个待解决的问题。
    Abstract Recent efforts in natural language processing (NLP) commonsense reasoning research have yielded a considerable number of new datasets and benchmarks. However, most of these datasets formulate commonsense reasoning challenges in artificial scenarios that are not reflective of the tasks which real-world NLP systems are designed to solve. In this work, we present CRoW, a manually-curated, multi-task benchmark that evaluates the ability of models to apply commonsense reasoning in the context of six real-world NLP tasks. CRoW is constructed using a multi-stage data collection pipeline that rewrites examples from existing datasets using commonsense-violating perturbations. We use CRoW to study how NLP systems perform across different dimensions of commonsense knowledge, such as physical, temporal, and social reasoning. We find a significant performance gap when NLP systems are evaluated on CRoW compared to humans, showcasing that commonsense reasoning is far from being solved in real-world task settings. We make our dataset and leaderboard available to the research community at https://github.com/mismayil/crow.
    摘要 近年来,自然语言处理(NLP)领域在常识推理方面的研究产出了大量新的数据集和基准。然而,这些数据集大多在人为构造的场景中设置常识推理挑战,并不能反映现实世界中 NLP 系统所要解决的任务。在这项工作中,我们提出 CRoW,一个人工精编的多任务基准,用于评估模型在六种真实世界 NLP 任务中应用常识推理的能力。CRoW 通过一个多阶段数据收集流程构建:利用违反常识的扰动改写现有数据集中的样例。我们使用 CRoW 研究 NLP 系统在物理、时间和社会推理等不同常识知识维度上的表现。结果显示,NLP 系统在 CRoW 上的表现与人类相比存在显著差距,表明常识推理在真实任务场景中远未得到解决。我们已在 https://github.com/mismayil/crow 向研究社区公开数据集和排行榜。

A new approach to template banks of gravitational waves with higher harmonics: reducing matched-filtering cost by over an order of magnitude

  • paper_url: http://arxiv.org/abs/2310.15233
  • repo_url: https://github.com/jaywadekar/gw_higher_harmonics_search
  • paper_authors: Digvijay Wadekar, Tejaswi Venumadhav, Ajit Kumar Mehta, Javier Roulet, Seth Olsen, Jonathan Mushkin, Barak Zackay, Matias Zaldarriaga
  • for: The paper aims to improve the sensitivity of gravitational wave event searches by including higher-order modes (HM) in the template banks, which are currently dominated by the quadrupole mode.
  • methods: The paper proposes a new strategy that exploits the natural connection between modes to include HM in template banks, using a combination of post-Newtonian formulae and machine learning tools to model aligned-spin waveforms.
  • results: The paper shows that the proposed method can significantly reduce the matched-filtering cost of HM searches, and is generally applicable for template banks constructed with either stochastic or geometric placement techniques. Additionally, the paper discusses compression of $(2,2)$-only geometric-placement template banks using machine learning algorithms.
    Abstract Searches for gravitational wave events use models, or templates, for the signals of interest. The templates used in current searches in the LIGO-Virgo-Kagra (LVK) data model the dominant quadrupole mode $(\ell,m)=(2,2)$ of the signals, and omit sub-dominant higher-order modes (HM) such as $(\ell,m)=(3,3)$, $(4,4)$, which are predicted by general relativity. Hence, these searches could lose sensitivity to black hole mergers in interesting parts of parameter space, such as systems with high-masses and asymmetric mass ratios. We develop a new strategy to include HM in template banks that exploits the natural connection between the modes. We use a combination of post-Newtonian formulae and machine learning tools to model aligned-spin $(3,3)$, $(4,4)$ waveforms corresponding to a given $(2,2)$ waveform. Each of these modes can be individually filtered against the data to yield separate timeseries of signal-to-noise ratios (SNR), which can be combined in a relatively inexpensive way to marginalize over extrinsic parameters of the signals. This leads to a HM search pipeline whose matched-filtering cost is just $\approx 3\times$ that of a quadrupole-only search (in contrast to being $\approx\! 100 \times$, as in previously proposed HM search methods). Our method is effectual and is generally applicable for template banks constructed with either stochastic or geometric placement techniques. Additionally, we discuss compression of $(2,2)$-only geometric-placement template banks using machine learning algorithms.
    摘要 Current gravitational wave event searches use templates to model the signals of interest. These templates in the LIGO-Virgo-Kagra (LVK) data model the dominant quadrupole mode ($\ell,m$)=(2,2) of the signals and omit sub-dominant higher-order modes (HM) such as ($\ell,m$)=(3,3), (4,4), which are predicted by general relativity. As a result, these searches could lose sensitivity to black hole mergers in interesting parts of parameter space, such as systems with high masses and asymmetric mass ratios.To address this issue, we develop a new strategy that includes HM in template banks by exploiting the natural connection between the modes. We use a combination of post-Newtonian formulae and machine learning tools to model aligned-spin ($\ell,m$)=(3,3), (4,4) waveforms corresponding to a given ($\ell,m$)=(2,2) waveform. Each of these modes can be individually filtered against the data to yield separate time series of signal-to-noise ratios (SNR), which can be combined in a relatively inexpensive way to marginalize over extrinsic parameters of the signals. This leads to a HM search pipeline whose matched-filtering cost is just $\approx 3\times$ that of a quadrupole-only search (in contrast to being $\approx\! 100 \times$, as in previously proposed HM search methods). Our method is effective and is generally applicable for template banks constructed with either stochastic or geometric placement techniques. Additionally, we discuss compression of (2,2)-only geometric-placement template banks using machine learning algorithms.
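Once the (3,3) and (4,4) templates are predicted from a given (2,2) template, each mode is matched-filtered separately and the per-mode SNR time series are combined. As a simple stand-in for the paper's cheaper marginalization over extrinsic parameters, the snippet below combines the complex SNR streams in quadrature with fixed weights; those weights and the synthetic data are placeholders for illustration only.

```python
import numpy as np

def combine_mode_snrs(rho_22: np.ndarray, rho_33: np.ndarray, rho_44: np.ndarray,
                      w_33: float = 1.0, w_44: float = 1.0) -> np.ndarray:
    """Illustrative quadrature combination of complex per-mode SNR time series.

    The actual pipeline marginalizes over the relative amplitudes and phases of the
    higher modes; fixed weights here are a simplifying assumption.
    """
    return np.sqrt(np.abs(rho_22) ** 2 + w_33 * np.abs(rho_33) ** 2 + w_44 * np.abs(rho_44) ** 2)

# Toy example: a common peak at sample 500 rises above what the quadrupole alone would give.
rng = np.random.default_rng(1)
n = 1000
noise = lambda: (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
rho22, rho33, rho44 = noise(), noise(), noise()
rho22[500] += 6.0; rho33[500] += 2.5; rho44[500] += 1.5
combined = combine_mode_snrs(rho22, rho33, rho44)
print("peak combined SNR:", combined.max(), "at sample", combined.argmax())
```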

Handling Data Heterogeneity via Architectural Design for Federated Visual Recognition

  • paper_url: http://arxiv.org/abs/2310.15165
  • repo_url: https://github.com/sarapieri/fed_het
  • paper_authors: Sara Pieri, Jose Renato Restom, Samuel Horvath, Hisham Cholakkal
  • for: 该论文旨在探讨基于联合学习的视觉识别系统的性能问题。
  • methods: 该论文采用了多种现代化的建筑设计方法,包括卷积神经网络、转换器和MLP混合器,以实验性地证明这些设计方法对联合学习系统的性能有着重要的提高作用,特别是在处理不同数据时。
  • results: 该论文通过对19种视觉识别模型的测试和分析,发现了不同建筑设计方法对联合学习系统的性能有着重要的影响,并且发现了基于卷积神经网络的模型在联合学习 Setting 中的下降性能问题。
    Abstract Federated Learning (FL) is a promising research paradigm that enables the collaborative training of machine learning models among various parties without the need for sensitive information exchange. Nonetheless, retaining data in individual clients introduces fundamental challenges to achieving performance on par with centrally trained models. Our study provides an extensive review of federated learning applied to visual recognition. It underscores the critical role of thoughtful architectural design choices in achieving optimal performance, a factor often neglected in the FL literature. Many existing FL solutions are tested on shallow or simple networks, which may not accurately reflect real-world applications. This practice restricts the transferability of research findings to large-scale visual recognition models. Through an in-depth analysis of diverse cutting-edge architectures such as convolutional neural networks, transformers, and MLP-mixers, we experimentally demonstrate that architectural choices can substantially enhance FL systems' performance, particularly when handling heterogeneous data. We study 19 visual recognition models from five different architectural families on four challenging FL datasets. We also re-investigate the inferior performance of convolution-based architectures in the FL setting and analyze the influence of normalization layers on the FL performance. Our findings emphasize the importance of architectural design for computer vision tasks in practical scenarios, effectively narrowing the performance gap between federated and centralized learning. Our source code is available at https://github.com/sarapieri/fed_het.git.
    摘要 联邦学习(FL)是一种有前途的研究范式,它允许各参与方在无需交换敏感信息的情况下协同训练机器学习模型。然而,将数据保留在各个客户端上,给达到与集中式训练模型相当的性能带来了根本性挑战。我们的研究对应用于视觉识别的联邦学习进行了广泛综述,强调了周全的架构设计选择对达到最优性能的关键作用,而这一因素在 FL 文献中经常被忽视。许多现有的 FL 解决方案只在浅层或简单的网络上进行测试,可能无法准确反映真实应用场景,这限制了研究结论向大规模视觉识别模型的迁移。通过对卷积神经网络、Transformer 和 MLP-mixer 等多种前沿架构的深入分析,我们以实验证明架构选择可以显著提升 FL 系统的性能,尤其是在处理异质数据时。我们在四个具有挑战性的 FL 数据集上研究了来自五个架构家族的 19 个视觉识别模型,重新审视了卷积架构在 FL 设置下表现较差的问题,并分析了归一化层对 FL 性能的影响。我们的发现强调了架构设计在实际计算机视觉任务中的重要性,有效缩小了联邦学习与集中式学习之间的性能差距。源代码见 https://github.com/sarapieri/fed_het.git。

LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers

  • paper_url: http://arxiv.org/abs/2310.15164
  • repo_url: https://github.com/benlipkin/linc
  • paper_authors: Theo X. Olausson, Alex Gu, Benjamin Lipkin, Cedegao E. Zhang, Armando Solar-Lezama, Joshua B. Tenenbaum, Roger Levy
  • for: 本研究旨在帮助人工智能机器人更好地进行逻辑推理,以便在科学、数学和社会中发挥广泛的影响。
  • methods: 本研究使用的方法是LINC(逻辑推理via神经符号计算),即将 premise 和 conclusion 翻译成首选逻辑表示形式,然后交给外部符号证明器进行符号计算。
  • results: 研究发现,LINC 方法可以大幅提高 GPT-3.5 和 GPT-4 的逻辑推理性能,特别是在 ProofWriter 上。当与 GPT-4 结合使用时,LINC 方法可以与 CoT 提示方法相比,在 ProofWriter 上提高逻辑推理性能。
    Abstract Logical reasoning, i.e., deductively inferring the truth value of a conclusion from a set of premises, is an important task for artificial intelligence with wide potential impacts on science, mathematics, and society. While many prompting-based strategies have been proposed to enable Large Language Models (LLMs) to do such reasoning more effectively, they still appear unsatisfactory, often failing in subtle and unpredictable ways. In this work, we investigate the validity of instead reformulating such tasks as modular neurosymbolic programming, which we call LINC: Logical Inference via Neurosymbolic Computation. In LINC, the LLM acts as a semantic parser, translating premises and conclusions from natural language to expressions in first-order logic. These expressions are then offloaded to an external theorem prover, which symbolically performs deductive inference. Leveraging this approach, we observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate. On ProofWriter, augmenting the comparatively small open-source StarCoder+ (15.5B parameters) with LINC even outperforms GPT-3.5 and GPT-4 with Chain-of-Thought (CoT) prompting by an absolute 38% and 10%, respectively. When used with GPT-4, LINC scores 26% higher than CoT on ProofWriter while performing comparatively on FOLIO. Further analysis reveals that although both methods on average succeed roughly equally often on this dataset, they exhibit distinct and complementary failure modes. We thus provide promising evidence for how logical reasoning over natural language can be tackled through jointly leveraging LLMs alongside symbolic provers. All corresponding code is publicly available at https://github.com/benlipkin/linc
    摘要 逻辑推理,即从一组前提演绎推断结论的真值,是人工智能的一项重要任务,对科学、数学和社会具有广泛的潜在影响。虽然已有许多基于提示的策略被提出,以帮助大语言模型(LLM)更有效地完成此类推理,但其表现仍不尽如人意,常以微妙且难以预料的方式出错。在这项工作中,我们研究将此类任务重新表述为模块化神经符号编程的有效性,并将该方法称为 LINC(基于神经符号计算的逻辑推理)。在 LINC 中,LLM 充当语义解析器,将前提和结论从自然语言翻译为一阶逻辑表达式;这些表达式随后交由外部定理证明器以符号方式完成演绎推理。借助这一方法,我们在 FOLIO 和 ProofWriter 的均衡子集上,对三种不同模型在几乎所有实验条件下都观察到显著的性能提升。在 ProofWriter 上,将规模相对较小的开源模型 StarCoder+(155 亿参数)与 LINC 结合,其表现甚至分别比使用思维链(CoT)提示的 GPT-3.5 和 GPT-4 高出 38% 和 10%(绝对值)。与 GPT-4 搭配使用时,LINC 在 ProofWriter 上比 CoT 高 26%,在 FOLIO 上则表现相当。进一步分析表明,尽管两种方法在该数据集上的平均成功率大致相当,但它们表现出不同且互补的错误模式。因此,我们为如何通过联合利用 LLM 与符号证明器来处理自然语言上的逻辑推理提供了有力的证据。所有相关代码公开于 https://github.com/benlipkin/linc
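The LINC pipeline has two stages: an LLM translates premises and conclusion into first-order logic, and a symbolic prover decides entailment. The sketch below wires a placeholder parsing step to NLTK's resolution prover (an assumption on my part; the paper offloads to an external FOL prover). To keep the example self-contained, `parse_to_fol` simply looks up hand-written formulas where a real system would call the LLM.

```python
from nltk.sem import Expression
from nltk.inference import ResolutionProver

read_expr = Expression.fromstring

def parse_to_fol(sentence: str) -> str:
    """Placeholder for the LLM semantic-parsing step (would normally call a model)."""
    lookup = {
        "All people who regularly drink coffee are dependent on caffeine.":
            "all x.(drinks_coffee(x) -> dependent_on_caffeine(x))",
        "Rina regularly drinks coffee.": "drinks_coffee(rina)",
        "Rina is dependent on caffeine.": "dependent_on_caffeine(rina)",
    }
    return lookup[sentence]

def linc_style_entailment(premises, conclusion) -> bool:
    """Translate to FOL with the (placeholder) parser, then offload to a symbolic prover."""
    fol_premises = [read_expr(parse_to_fol(p)) for p in premises]
    fol_goal = read_expr(parse_to_fol(conclusion))
    return ResolutionProver().prove(fol_goal, fol_premises)

premises = ["All people who regularly drink coffee are dependent on caffeine.",
            "Rina regularly drinks coffee."]
print(linc_style_entailment(premises, "Rina is dependent on caffeine."))   # expected: True
```

The division of labor is the point: the LLM only has to produce well-formed formulas, while correctness of the deduction itself is guaranteed by the prover.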

Linear Representations of Sentiment in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15154
  • repo_url: https://github.com/curt-tigges/eliciting-latent-sentiment
  • paper_authors: Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda
  • for: 本研究探讨了大语言模型中情感表达的表示方式。
  • methods: 研究人员通过对各模型的激活空间进行分析,发现情感在各模型中都表示为一条直线,大致上对应于正面和负面两种情感。通过对这些模型进行 causal 干预,研究人员发现这条直线在多种任务上具有 causally 相关性,并在真实世界数据集中如 Stanford Sentiment Treebank 中得到证实。
  • results: 研究人员发现,情感不仅表示在带有情感色彩的词语上,还会在本身不含情感的中间位置(如标点符号和名字)被汇总表示。在 Stanford Sentiment Treebank 零样本分类任务中,消除情感方向会损失76%的超出随机水平的分类准确率,其中36%可归因于仅在逗号位置消除被汇总的情感方向。
    Abstract Sentiment is a pervasive feature in natural language text, yet it is an open question how sentiment is represented within Large Language Models (LLMs). In this study, we reveal that across a range of models, sentiment is represented linearly: a single direction in activation space mostly captures the feature across a range of tasks with one extreme for positive and the other for negative. Through causal interventions, we isolate this direction and show it is causally relevant in both toy tasks and real world datasets such as Stanford Sentiment Treebank. Through this case study we model a thorough investigation of what a single direction means on a broad data distribution. We further uncover the mechanisms that involve this direction, highlighting the roles of a small subset of attention heads and neurons. Finally, we discover a phenomenon which we term the summarization motif: sentiment is not solely represented on emotionally charged words, but is additionally summarized at intermediate positions without inherent sentiment, such as punctuation and names. We show that in Stanford Sentiment Treebank zero-shot classification, 76% of above-chance classification accuracy is lost when ablating the sentiment direction, nearly half of which (36%) is due to ablating the summarized sentiment direction exclusively at comma positions.
    摘要 情感是自然语言文本中普遍存在的特征,然而情感在大语言模型(LLM)内部如何表示仍是一个开放问题。在这项研究中,我们发现,在一系列模型中情感都是线性表示的:激活空间中的单一方向在多种任务上基本捕捉了这一特征,一端对应正面、另一端对应负面。通过因果干预,我们分离出这一方向,并证明它在玩具任务和 Stanford Sentiment Treebank 等真实数据集中都具有因果相关性。通过这一案例研究,我们对单一方向在广泛数据分布上的含义进行了深入考察,并进一步揭示了涉及该方向的机制,指出少数注意力头和神经元所起的作用。最后,我们发现了一种称为"总结模式"(summarization motif)的现象:情感不仅表示在带有情感色彩的词语上,还会在本身不含情感的中间位置(如标点符号和名字)被汇总表示。我们证明,在 Stanford Sentiment Treebank 零样本分类中,消除情感方向会损失76%的超出随机水平的分类准确率,其中近一半(36%)源于仅在逗号位置消除被汇总的情感方向。
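The paper's central object, a single linear direction in activation space separating positive from negative sentiment, can be estimated with something as simple as a difference of class means, and "ablating" it amounts to projecting activations onto the orthogonal complement. The NumPy sketch below illustrates both steps on synthetic activations; it is a schematic, not the paper's exact probing and intervention setup.

```python
import numpy as np

def sentiment_direction(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means estimate of the linear sentiment direction (unit norm)."""
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each activation along the sentiment direction."""
    return acts - np.outer(acts @ direction, direction)

# Synthetic activations whose first axis carries "sentiment".
rng = np.random.default_rng(0)
d_model = 32
base = rng.normal(size=(200, d_model))
truth = np.zeros(d_model); truth[0] = 1.0
pos, neg = base[:100] + 2.0 * truth, base[100:] - 2.0 * truth

direction = sentiment_direction(pos, neg)
print("cosine with ground-truth axis:", float(direction @ truth))
print("separation before ablation:", float(pos.mean(0) @ direction - neg.mean(0) @ direction))
abl_pos, abl_neg = ablate_direction(pos, direction), ablate_direction(neg, direction)
print("separation after ablation:", float(abl_pos.mean(0) @ direction - abl_neg.mean(0) @ direction))
```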

Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number

  • paper_url: http://arxiv.org/abs/2310.15151
  • repo_url: None
  • paper_authors: Sophie Hao, Tal Linzen
  • for: 这个论文旨在探讨Transformer架构中的语言特征是否具有可解释性。
  • methods: 作者使用了 causal intervention analysis 方法来显示BERT在 conjugate 动词时实际上使用了一种可解释的线性编码方式。
  • results: 研究发现,BERT在 conjugate 动词时会根据主语数目的线性编码,这种编码可以预测性地影响 conjugation 准确率。这种编码存在主语位置的第一层和动词位置的最后一层,以及中间层的多个位置,特别是当有多个cue提示主语数目时。
    Abstract Deep architectures such as Transformers are sometimes criticized for having uninterpretable "black-box" representations. We use causal intervention analysis to show that, in fact, some linguistic features are represented in a linear, interpretable format. Specifically, we show that BERT's ability to conjugate verbs relies on a linear encoding of subject number that can be manipulated with predictable effects on conjugation accuracy. This encoding is found in the subject position at the first layer and the verb position at the last layer, but distributed across positions at middle layers, particularly when there are multiple cues to subject number.
    摘要 像 Transformer 这样的深层架构有时被批评为具有难以解释的"黑盒"表示。我们通过因果干预分析表明,事实上某些语言特征是以线性、可解释的形式表示的。具体而言,我们证明 BERT 的动词变位能力依赖于对主语单复数的线性编码,对该编码进行操纵会对变位准确率产生可预测的影响。这一编码出现在第一层的主语位置和最后一层的动词位置,而在中间各层则分布于多个位置,尤其是在存在多个指示主语数的线索时。

Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.15145
  • repo_url: https://github.com/Hermannovski/React
  • paper_authors: Jingyun Yang, Max Sobol Mark, Brandon Vu, Archit Sharma, Jeannette Bohg, Chelsea Finn
  • for: 该研究旨在帮助机器人快速适应新任务,无需大量人工干预。
  • methods: 该方法利用互联网上的数据和模型,先进行预训练、再进行微调来快速学习新任务。它还使用校准的离线强化学习技术,以及预训练的视觉语言模型来自动提供奖励信号。
  • results: 在五个真实机器人操作任务中,该方法只需3小时的自主真实环境经验即可在目标任务上取得改进。在模拟实验中,该方法也优于使用其他RL算法或其他奖励预测方式的先前工作。
    Abstract The pre-train and fine-tune paradigm in machine learning has had dramatic success in a wide range of domains because the use of existing data or pre-trained models on the internet enables quick and easy learning of new tasks. We aim to enable this paradigm in robotic reinforcement learning, allowing a robot to learn a new task with little human effort by leveraging data and models from the Internet. However, reinforcement learning often requires significant human effort in the form of manual reward specification or environment resets, even if the policy is pre-trained. We introduce RoboFuME, a reset-free fine-tuning system that pre-trains a multi-task manipulation policy from diverse datasets of prior experiences and self-improves online to learn a target task with minimal human intervention. Our insights are to utilize calibrated offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy in the presence of distribution shifts and leverage pre-trained vision language models (VLMs) to build a robust reward classifier for autonomously providing reward signals during the online fine-tuning process. In a diverse set of five real robot manipulation tasks, we show that our method can incorporate data from an existing robot dataset collected at a different institution and improve on a target task within as little as 3 hours of autonomous real-world experience. We also demonstrate in simulation experiments that our method outperforms prior works that use different RL algorithms or different approaches for predicting rewards. Project website: https://robofume.github.io
    摘要 机器学习中的"预训练-微调"范式在众多领域取得了巨大成功,因为利用互联网上已有的数据或预训练模型可以快速、轻松地学习新任务。我们希望将这一范式引入机器人强化学习,使机器人能够借助互联网上的数据和模型,以极少的人力投入学习新任务。然而,强化学习往往需要大量人工干预,例如手工设计奖励或重置环境,即使策略已经过预训练。我们提出 RoboFuME,一个无需人工重置的微调系统:它先从多样的既有经验数据中预训练一个多任务操作策略,然后在线自我改进,以最少的人工干预学习目标任务。我们的核心思路是:利用校准的离线强化学习技术,确保预训练策略在存在分布偏移时仍能高效地进行在线微调;并借助预训练的视觉语言模型(VLM)构建稳健的奖励分类器,在在线微调过程中自动提供奖励信号。在五个多样的真实机器人操作任务中,我们的方法能够利用在其他机构采集的现有机器人数据集,并仅凭3小时的自主真实环境经验就在目标任务上取得改进。我们还在模拟实验中证明,该方法优于使用其他RL算法或其他奖励预测方式的先前工作。项目网站:https://robofume.github.io
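One ingredient that translates directly into code is the use of a pre-trained vision-language model as a reward classifier: image observations are scored against a text description of the target task so the online loop can run without hand-designed rewards. The sketch below uses off-the-shelf CLIP zero-shot scoring as a simplified stand-in; RoboFuME fine-tunes its VLM on demonstration data, which this snippet does not do, and the caption texts are assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def vlm_reward(observation: Image.Image, task_text: str, failure_text: str) -> float:
    """Probability that the observation matches the task description vs. a failure description."""
    inputs = processor(text=[task_text, failure_text], images=observation,
                       return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image[0]      # similarity of the image to each caption
    return torch.softmax(logits, dim=-1)[0].item()    # reward in [0, 1]

# Toy usage on a blank image standing in for a camera observation from the robot.
obs = Image.new("RGB", (224, 224), color="gray")
r = vlm_reward(obs, "a cloth folded neatly on the table", "a crumpled cloth on the table")
print("reward:", r)
```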

AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15140
  • repo_url: None
  • paper_authors: Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun
  • for: 本研究旨在测试大型自然语言模型(LLMs)的安全性,并研究如何使用敏捷攻击和自动攻击破坏LLMs的安全性。
  • methods: 本研究使用了两种攻击方法: manually crafted jailbreak attacks 和自动生成的攻击 prompts。两种攻击方法都可以破坏LLMs的安全性,但是自动生成的攻击 prompts 可以更好地避免被侦测。
  • results: 本研究发现,两种攻击方法都可以成功破坏LLMs的安全性,但是自动生成的攻击 prompts 可以更好地避免被侦测,并且可以在使用有限的训练数据或单一的代理模型时表现更好。此外,本研究还发现了一个新的攻击方法,即“自动驱动攻击”,可以将系统提示泄露出来。
    Abstract Safety alignment of Large Language Models (LLMs) can be compromised with manual jailbreak attacks and (automatic) adversarial attacks. Recent work suggests that patching LLMs against these attacks is possible: manual jailbreak attacks are human-readable but often limited and public, making them easy to block; adversarial attacks generate gibberish prompts that can be detected using perplexity-based filters. In this paper, we show that these solutions may be too optimistic. We propose an interpretable adversarial attack, \texttt{AutoDAN}, that combines the strengths of both types of attacks. It automatically generates attack prompts that bypass perplexity-based filters while maintaining a high attack success rate like manual jailbreak attacks. These prompts are interpretable and diverse, exhibiting strategies commonly used in manual jailbreak attacks, and transfer better than their non-readable counterparts when using limited training data or a single proxy model. We also customize \texttt{AutoDAN}'s objective to leak system prompts, another jailbreak application not addressed in the adversarial attack literature. Our work provides a new way to red-team LLMs and to understand the mechanism of jailbreak attacks.
    摘要 大语言模型(LLM)的安全对齐可能会被手动越狱攻击和(自动)对抗攻击破坏。近期研究表明,针对这些攻击对 LLM 进行修补是可行的:手动越狱攻击虽然人类可读,但通常数量有限且公开,易于封堵;对抗攻击生成的则是可以用基于困惑度的过滤器检测出来的乱码提示。在本文中,我们指出这些方案可能过于乐观。我们提出一种可解释的对抗攻击 \texttt{AutoDAN},它结合了两类攻击的优点:自动生成既能绕过困惑度过滤器、又能像手动越狱攻击一样保持高攻击成功率的攻击提示。这些提示可解释且多样,表现出手动越狱攻击中常用的策略,并且在训练数据有限或仅使用单一代理模型时,比不可读的对抗提示具有更好的迁移性。我们还将 \texttt{AutoDAN} 的目标定制为泄露系统提示,这是对抗攻击文献中尚未涉及的另一种越狱应用。我们的工作为红队测试 LLM 以及理解越狱攻击的机制提供了新的途径。
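The perplexity-based filter that gibberish adversarial prompts trip over (and that readable prompts such as AutoDAN's are designed to pass) is easy to state concretely: score the prompt with a small causal LM and flag it if its perplexity exceeds a threshold. The sketch below implements such a defensive filter; the model choice and threshold are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt")["input_ids"]
    loss = lm(ids, labels=ids).loss               # mean token negative log-likelihood
    return float(torch.exp(loss))

def passes_filter(prompt: str, threshold: float = 200.0) -> bool:
    """Flag prompts whose perplexity under a small LM is implausibly high (assumed threshold)."""
    return perplexity(prompt) < threshold

print(passes_filter("Please summarize the following paragraph for me."))
print(passes_filter("zx!! qq describing + similarlyNow write oppositeley."))
```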

Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15127
  • repo_url: None
  • paper_authors: Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki
  • for: 这篇论文是为了解决人工智能机器人在执行人工语言指令时的问题,提出了一种基于协助者(HELPER)的解决方案。
  • methods: 该论文使用预训练且冻结的大语言模型(LLM),通过合适的少样本示例提示,将简单的场景重排指令映射为基于机器人视觉运动功能的程序,并借助语言-程序对的外部记忆进行检索增强提示。
  • results: 根据论文中的结果,HELPER 在 TEACh 基准的 EDH 和 TfD 两项设定上均创下新纪录,其中在 TfD 上比之前的最佳方法(SOTA)提升 1.7 倍。
    Abstract Pre-trained and frozen LLMs can effectively map simple scene re-arrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting. To parse open-domain natural language and adapt to a user's idiosyncratic procedures, not known during prompt engineering time, fixed prompts fall short. In this paper, we introduce HELPER, an embodied agent equipped with an external memory of language-program pairs that parses free-form human-robot dialogue into action programs through retrieval-augmented LLM prompting: relevant memories are retrieved based on the current dialogue, instruction, correction or VLM description, and used as in-context prompt examples for LLM querying. The memory is expanded during deployment to include pairs of user's language and action plans, to assist future inferences and personalize them to the user's language and routines. HELPER sets a new state-of-the-art in the TEACh benchmark in both Execution from Dialog History (EDH) and Trajectory from Dialogue (TfD), with 1.7x improvement over the previous SOTA for TfD. Our models, code and video results can be found in our project's website: https://helper-agent-llm.github.io.
    摘要 预训练且冻结的 LLM 可以通过合适的少样本示例提示,将简单的场景重排指令有效地映射为基于机器人视觉运动功能的程序。然而,要解析开放领域的自然语言,并适应用户在提示设计阶段未知的个性化流程,固定的提示就力不从心了。在本文中,我们介绍 HELPER,一个配备语言-程序对外部记忆的具身智能体,它通过检索增强的 LLM 提示,将自由形式的人机对话解析为动作程序:根据当前对话、指令、纠正或 VLM 描述检索相关记忆,并将其作为上下文示例供 LLM 查询使用。在部署过程中,记忆会不断扩充,加入用户语言与动作计划的配对,以辅助后续推理,并针对用户的语言和习惯进行个性化。HELPER 在 TEACh 基准的"基于对话历史执行"(EDH)和"基于对话生成轨迹"(TfD)两项设定上均创下新纪录,其中 TfD 比之前的 SOTA 提升 1.7 倍。我们的模型、代码和视频结果见项目网站:https://helper-agent-llm.github.io。
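HELPER's memory is a store of (language, program) pairs; at inference time the entries most relevant to the current dialogue are retrieved and pasted into the LLM prompt as in-context examples. A minimal sketch of that retrieve-then-prompt step is shown below; the embedding function and prompt format are assumptions, and the toy memory entries are invented for illustration.

```python
from typing import Callable, List, Tuple
import numpy as np

def retrieve(query: str, memory: List[Tuple[str, str]], embed: Callable[[str], np.ndarray],
             k: int = 2) -> List[Tuple[str, str]]:
    """Return the k (language, program) pairs whose keys are closest to the query."""
    q = embed(query)
    sims = [float(q @ embed(lang) / (np.linalg.norm(q) * np.linalg.norm(embed(lang)) + 1e-9))
            for lang, _ in memory]
    top = np.argsort(sims)[::-1][:k]
    return [memory[i] for i in top]

def build_prompt(query: str, memory, embed) -> str:
    """Assemble retrieved pairs as in-context examples, then append the current request."""
    examples = retrieve(query, memory, embed)
    shots = "\n\n".join(f"Instruction: {lang}\nProgram: {prog}" for lang, prog in examples)
    return f"{shots}\n\nInstruction: {query}\nProgram:"

# Toy usage with a hashing-trick embedder standing in for a real sentence encoder.
def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v

memory = [("put the mug in the sink", "goto(mug); pickup(mug); goto(sink); place(sink)"),
          ("slice the bread", "goto(bread); pickup(knife); slice(bread)")]
print(build_prompt("place the cup in the sink", memory, toy_embed))
```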

Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

  • paper_url: http://arxiv.org/abs/2310.15123
  • repo_url: None
  • paper_authors: Swarnadeep Saha, Omer Levy, Asli Celikyilmaz, Mohit Bansal, Jason Weston, Xian Li
  • for: This work aims to improve Large Language Model (LLM) performance on multi-faceted language generation and evaluation tasks, such as satisfying intricate user constraints and weighing multiple aspects and criteria.
  • methods: It proposes Branch-Solve-Merge (BSM), whose branch, solve, and merge modules plan a decomposition of the task into parallel sub-tasks, solve them independently, and fuse the sub-task solutions.
  • results: BSM improves LLM evaluation correctness and consistency and reduces length and pairwise position biases; on constrained story generation it improves coherence while raising constraint satisfaction by 12%.
    Abstract Large Language Models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks that involve satisfying intricate user constraints or taking into account multiple aspects and criteria. However, their performance can fall short, due to the model's lack of coherence and inability to plan and decompose the problem. We propose Branch-Solve-Merge (BSM), a Large Language Model program (Schlag et al., 2023) for tackling such challenging natural language tasks. It consists of branch, solve, and merge modules that are parameterized with specific prompts to the base LLM. These three modules plan a decomposition of the task into multiple parallel sub-tasks, independently solve them, and fuse the solutions to the sub-tasks. We apply our method to the tasks of LLM response evaluation and constrained text generation and evaluate its effectiveness with multiple LLMs, including Vicuna, LLaMA-2-chat, and GPT-4. BSM improves the evaluation correctness and consistency for each LLM by enhancing human-LLM agreement by up to 26%, reducing length and pairwise position biases by up to 50%, and allowing LLaMA-2-chat to match or outperform GPT-4 on most domains. On the constraint story generation task, BSM improves the coherence of the stories while also improving constraint satisfaction by 12%.
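
A minimal sketch of the Branch-Solve-Merge control flow is shown below. The module prompts and the stub LLM are assumptions for illustration; the paper parameterizes each module with its own prompt to the base model.

```python
from typing import Callable, List

def branch_solve_merge(task: str, llm: Callable[[str], str]) -> str:
    """Branch a task into sub-tasks, solve each independently, then merge the answers."""
    # Branch: ask the model to decompose the task into parallel sub-tasks.
    plan = llm(f"List independent sub-tasks for: {task}")
    sub_tasks: List[str] = [line.strip() for line in plan.splitlines() if line.strip()]

    # Solve: handle each sub-task on its own.
    solutions = [llm(f"Solve this sub-task: {sub}") for sub in sub_tasks]

    # Merge: fuse the sub-task solutions into a final answer.
    joined = "\n".join(solutions)
    return llm(f"Combine these partial solutions into one answer for '{task}':\n{joined}")

# Stub LLM so the sketch runs without any model; replace with a real API call.
def fake_llm(prompt: str) -> str:
    return "sub-task A\nsub-task B" if prompt.startswith("List") else f"[answer to: {prompt[:40]}...]"

print(branch_solve_merge("evaluate this story against five user constraints", fake_llm))
```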

Modeling Path Importance for Effective Alzheimer’s Disease Drug Repurposing

  • paper_url: http://arxiv.org/abs/2310.15211
  • repo_url: None
  • paper_authors: Shunian Xiang, Patrick J. Lawrence, Bo Peng, ChienWei Chiang, Dokyoon Kim, Li Shen, Xia Ning
  • for: This paper explores a new network-based approach to drug repurposing for Alzheimer's disease (AD) that leverages complex networks integrating multiple interaction types to identify candidate drugs more effectively.
  • methods: The proposed method, MPI (Modeling Path Importance), prioritizes important paths via learned node embeddings that capture the network's rich structural information, rather than assuming that paths of equal length are equally important.
  • results: Compared with a shortest-path baseline, MPI prioritizes 20.0% more drugs with anti-AD evidence among the top-50 candidates. Cox proportional-hazard models built from insurance claims data further suggest that etodolac, nicotine, and BBB-crossing ACE inhibitors are associated with reduced AD risk, making them candidates worth exploring for repurposing.
    Abstract Recently, drug repurposing has emerged as an effective and resource-efficient paradigm for AD drug discovery. Among various methods for drug repurposing, network-based methods have shown promising results as they are capable of leveraging complex networks that integrate multiple interaction types, such as protein-protein interactions, to more effectively identify candidate drugs. However, existing approaches typically assume paths of the same length in the network have equal importance in identifying the therapeutic effect of drugs. Other domains have found that same length paths do not necessarily have the same importance. Thus, relying on this assumption may be deleterious to drug repurposing attempts. In this work, we propose MPI (Modeling Path Importance), a novel network-based method for AD drug repurposing. MPI is unique in that it prioritizes important paths via learned node embeddings, which can effectively capture a network's rich structural information. Thus, leveraging learned embeddings allows MPI to effectively differentiate the importance among paths. We evaluate MPI against a commonly used baseline method that identifies anti-AD drug candidates primarily based on the shortest paths between drugs and AD in the network. We observe that among the top-50 ranked drugs, MPI prioritizes 20.0% more drugs with anti-AD evidence compared to the baseline. Finally, Cox proportional-hazard models produced from insurance claims data aid us in identifying the use of etodolac, nicotine, and BBB-crossing ACE-INHs as having a reduced risk of AD, suggesting such drugs may be viable candidates for repurposing and should be explored further in future studies.

Causal Inference Using LLM-Guided Discovery

  • paper_url: http://arxiv.org/abs/2310.15117
  • repo_url: None
  • paper_authors: Aniket Vashishtha, Abbavaram Gowtham Reddy, Abhinav Kumar, Saketh Bachu, Vineeth N Balasubramanian, Amit Sharma
  • for: This paper focuses on developing a method to determine reliable causal graphs solely based on observational data, which is a challenging task in causal inference.
  • methods: The authors propose using large language models (LLMs) such as GPT-3.5-turbo and GPT-4 to obtain causal order, which is easier to elicit from domain experts compared to graph edges. They employ different prompting strategies and contextual cues to propose a robust technique of obtaining causal order from LLMs.
  • results: The authors' approach significantly improves causal ordering accuracy as compared to established causal discovery algorithms, highlighting the potential of LLMs to enhance causal inference across diverse fields.
    Abstract At the core of causal inference lies the challenge of determining reliable causal graphs solely based on observational data. Since the well-known backdoor criterion depends on the graph, any errors in the graph can propagate downstream to effect inference. In this work, we initially show that complete graph information is not necessary for causal effect inference; the topological order over graph variables (causal order) alone suffices. Further, given a node pair, causal order is easier to elicit from domain experts compared to graph edges since determining the existence of an edge can depend extensively on other variables. Interestingly, we find that the same principle holds for Large Language Models (LLMs) such as GPT-3.5-turbo and GPT-4, motivating an automated method to obtain causal order (and hence causal effect) with LLMs acting as virtual domain experts. To this end, we employ different prompting strategies and contextual cues to propose a robust technique of obtaining causal order from LLMs. Acknowledging LLMs' limitations, we also study possible techniques to integrate LLMs with established causal discovery algorithms, including constraint-based and score-based methods, to enhance their performance. Extensive experiments demonstrate that our approach significantly improves causal ordering accuracy as compared to discovery algorithms, highlighting the potential of LLMs to enhance causal inference across diverse fields.
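
The idea of eliciting a causal order from an LLM through pairwise queries can be sketched as below. The query protocol and the stubbed "expert" are assumptions for illustration; the paper studies several prompting strategies and ways of combining LLMs with constraint- and score-based discovery algorithms.

```python
from graphlib import TopologicalSorter
from itertools import combinations
from typing import Callable, Dict, Set

def causal_order(variables: list, ask_llm: Callable[[str, str], str]) -> list:
    """Query a (stubbed) LLM for the pairwise cause and return one consistent causal order."""
    graph: Dict[str, Set[str]] = {v: set() for v in variables}
    for a, b in combinations(variables, 2):
        answer = ask_llm(a, b)          # expected to name the cause
        cause, effect = (a, b) if answer == a else (b, a)
        graph[effect].add(cause)        # TopologicalSorter maps node -> predecessors
    return list(TopologicalSorter(graph).static_order())

# Stub "domain expert" standing in for GPT-4; replace with a real LLM call.
def fake_expert(a: str, b: str) -> str:
    prior = ["smoking", "tar in lungs", "lung cancer"]   # assumed ground-truth order
    return a if prior.index(a) < prior.index(b) else b

print(causal_order(["lung cancer", "smoking", "tar in lungs"], fake_expert))
# -> ['smoking', 'tar in lungs', 'lung cancer']
```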

The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills

  • paper_url: http://arxiv.org/abs/2310.15112
  • repo_url: None
  • paper_authors: Qingxiao Zheng, Yun Huang
  • for: This study examines how AI-generated digital self-clones affect the improvement of online presentation skills.
  • methods: A mixed-design experiment with 44 international students compared self-recorded videos (control) with self-clone videos (AI group) for English presentation practice; the AI videos used voice cloning, face swapping, lip-sync, and body-language simulation to refine the participants' original presentations in terms of repetition, filler words, and pronunciation.
  • results: Machine-rated scores improved for both groups with no significant between-group difference, but the AI group showed deeper reflection, greater self-compassion, and a meaningful shift from a corrective to an enhancive approach to self-critique. Within the AI group, congruence between self-perception and the AI self-clone reduced speech anxiety and increased enjoyment. The findings support the ethical use of digital self-clones to enhance the emotional and cognitive facets of skill development.
    Abstract This study explores the impact of AI-generated digital self-clones on improving online presentation skills. We carried out a mixed-design experiment involving 44 international students, comparing self-recorded videos (control) with self-clone videos (AI group) for English presentation practice. The AI videos utilized voice cloning, face swapping, lip-sync, and body-language simulation to refine participants' original presentations in terms of repetition, filler words, and pronunciation. Machine-rated scores indicated enhancements in speech performance for both groups. Though the groups didn't significantly differ, the AI group exhibited a heightened depth of reflection, self-compassion, and a meaningful transition from a corrective to an enhancive approach to self-critique. Within the AI group, congruence between self-perception and AI self-clones resulted in diminished speech anxiety and increased enjoyment. Our findings recommend the ethical employment of digital self-clones to enhance the emotional and cognitive facets of skill development.

Dual-path convolutional neural network using micro-FTIR imaging to predict breast cancer subtypes and biomarkers levels: estrogen receptor, progesterone receptor, HER2 and Ki67

  • paper_url: http://arxiv.org/abs/2310.15099
  • repo_url: None
  • paper_authors: Matheus del-Valle, Emerson Soares Bernardes, Denise Maria Zezell
  • for: This study aims to develop a deep-learning approach based on two-dimensional micro-FTIR imaging to evaluate breast cancer, improving diagnostic accuracy and speeding up the diagnostic workflow.
  • methods: Sixty micro-FTIR images were collected from a human breast biopsy microarray, clustered with K-means, preprocessed, and split into 32x32 patches with a fully automated pipeline; CaReNet-V2, a convolutional neural network, was developed to classify cancer vs. adjacent tissue and molecular subtypes and to predict biomarker (ER, PR, HER2) levels and Ki67 percentage.
  • results: Test accuracies for cancer vs. adjacent tissue and for molecular subtypes were above 0.84; ER, PR, and HER2 levels could be predicted, with lower performance for borderline values (minimum accuracy of 0.54), and Ki67 regression showed a mean error of 3.6%. CaReNet-V2 is therefore a promising screening technique for breast cancer biopsies that can help prioritize patients.
    Abstract Breast cancer molecular subtypes classification plays an import role to sort patients with divergent prognosis. The biomarkers used are Estrogen Receptor (ER), Progesterone Receptor (PR), HER2, and Ki67. Based on these biomarkers expression levels, subtypes are classified as Luminal A (LA), Luminal B (LB), HER2 subtype, and Triple-Negative Breast Cancer (TNBC). Immunohistochemistry is used to classify subtypes, although interlaboratory and interobserver variations can affect its accuracy, besides being a time-consuming technique. The Fourier transform infrared micro-spectroscopy may be coupled with deep learning for cancer evaluation, where there is still a lack of studies for subtypes and biomarker levels prediction. This study presents a novel 2D deep learning approach to achieve these predictions. Sixty micro-FTIR images of 320x320 pixels were collected from a human breast biopsies microarray. Data were clustered by K-means, preprocessed and 32x32 patches were generated using a fully automated approach. CaReNet-V2, a novel convolutional neural network, was developed to classify breast cancer (CA) vs adjacent tissue (AT) and molecular subtypes, and to predict biomarkers level. The clustering method enabled to remove non-tissue pixels. Test accuracies for CA vs AT and subtype were above 0.84. The model enabled the prediction of ER, PR, and HER2 levels, where borderline values showed lower performance (minimum accuracy of 0.54). Ki67 percentage regression demonstrated a mean error of 3.6%. Thus, CaReNet-V2 is a potential technique for breast cancer biopsies evaluation, standing out as a screening analysis technique and helping to prioritize patients.
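
The K-means-based tissue selection and 32x32 patch generation described in the abstract can be sketched roughly as below. The cluster-selection heuristic and the 90% tissue threshold are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def tissue_patches(image: np.ndarray, patch: int = 32, clusters: int = 2) -> np.ndarray:
    """Cluster pixels with K-means, then keep non-overlapping patches that are mostly tissue."""
    h, w, bands = image.shape
    flat = image.reshape(-1, bands)
    labels = KMeans(n_clusters=clusters, n_init=10, random_state=0).fit_predict(flat).reshape(h, w)
    # Assume the cluster with the higher mean intensity corresponds to tissue.
    tissue_label = max(range(clusters), key=lambda c: flat[labels.ravel() == c].mean())
    patches = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            block = labels[y:y + patch, x:x + patch]
            if (block == tissue_label).mean() > 0.9:        # mostly tissue pixels
                patches.append(image[y:y + patch, x:x + patch])
    return np.stack(patches) if patches else np.empty((0, patch, patch, bands))

fake_image = np.random.rand(320, 320, 8)                     # stand-in for a micro-FTIR cube
print(tissue_patches(fake_image).shape)
```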

Acquiring Weak Annotations for Tumor Localization in Temporal and Volumetric Data

  • paper_url: http://arxiv.org/abs/2310.15098
  • repo_url: https://github.com/johnson111788/drag-drop
  • paper_authors: Yu-Cheng Chou, Bowen Li, Deng-Ping Fan, Alan Yuille, Zongwei Zhou
  • for: This work aims to improve automated tumor detection and localization by making it practical to build large-scale, well-annotated medical imaging datasets for training AI algorithms.
  • methods: It proposes a new annotation strategy, Drag&Drop, which reduces annotation to dragging and dropping a marker onto the tumor, and a weakly supervised learning method based on the watershed algorithm to exploit these annotations.
  • results: Experiments show that Drag&Drop annotations yield better detection and localization than other weak annotations and approach the performance of detailed per-pixel annotations; with limited resources, weak annotations from a diverse patient population produce models more robust to unseen images.
    Abstract Creating large-scale and well-annotated datasets to train AI algorithms is crucial for automated tumor detection and localization. However, with limited resources, it is challenging to determine the best type of annotations when annotating massive amounts of unlabeled data. To address this issue, we focus on polyps in colonoscopy videos and pancreatic tumors in abdominal CT scans; both applications require significant effort and time for pixel-wise annotation due to the high dimensional nature of the data, involving either temporary or spatial dimensions. In this paper, we develop a new annotation strategy, termed Drag&Drop, which simplifies the annotation process to drag and drop. This annotation strategy is more efficient, particularly for temporal and volumetric imaging, than other types of weak annotations, such as per-pixel, bounding boxes, scribbles, ellipses, and points. Furthermore, to exploit our Drag&Drop annotations, we develop a novel weakly supervised learning method based on the watershed algorithm. Experimental results show that our method achieves better detection and localization performance than alternative weak annotations and, more importantly, achieves similar performance to that trained on detailed per-pixel annotations. Interestingly, we find that, with limited resources, allocating weak annotations from a diverse patient population can foster models more robust to unseen images than allocating per-pixel annotations for a small set of images. In summary, this research proposes an efficient annotation strategy for tumor detection and localization that is less accurate than per-pixel annotations but useful for creating large-scale datasets for screening tumors in various medical modalities.

One-dimensional convolutional neural network model for breast cancer subtypes classification and biochemical content evaluation using micro-FTIR hyperspectral images

  • paper_url: http://arxiv.org/abs/2310.15094
  • repo_url: None
  • paper_authors: Matheus del-Valle, Emerson Soares Bernardes, Denise Maria Zezell
  • for: This study aims to develop a deep-learning tool for evaluating breast cancer molecular subtypes and biochemical content.
  • methods: Fourier transform infrared (FTIR) micro-spectroscopy hyperspectral images are combined with CaReNet-V1, a 1D convolutional neural network, and a 1D adaptation of Grad-CAM to obtain biochemistry-related explanations of the predictions.
  • results: The tool classified cancer vs. adjacent tissue with a test accuracy of 0.89, HER2 and TNBC subtypes with 0.83 and 0.86, and Luminal A and Luminal B with 0.74 and 0.68, while also identifying the wavenumbers that contribute most to the predictions, linking them directly to biochemical content.
    Abstract Breast cancer treatment still remains a challenge, where molecular subtypes classification plays a crucial role in selecting appropriate and specific therapy. The four subtypes are Luminal A (LA), Luminal B (LB), HER2 subtype, and Triple-Negative Breast Cancer (TNBC). Immunohistochemistry is the gold-standard evaluation, although interobserver variations are reported and molecular signatures identification is time-consuming. Fourier transform infrared micro-spectroscopy with machine learning approaches have been used to evaluate cancer samples, presenting biochemical-related explainability. However, this explainability is harder when using deep learning. This study created a 1D deep learning tool for breast cancer subtype evaluation and biochemical contribution. Sixty hyperspectral images were acquired from a human breast cancer microarray. K-Means clustering was applied to select tissue and paraffin spectra. CaReNet-V1, a novel 1D convolutional neural network, was developed to classify breast cancer (CA) and adjacent tissue (AT), and molecular subtypes. A 1D adaptation of Grad-CAM was applied to assess the biochemical impact to the classifications. CaReNet-V1 effectively classified CA and AT (test accuracy of 0.89), as well as HER2 and TNBC subtypes (0.83 and 0.86), with greater difficulty for LA and LB (0.74 and 0.68). The model enabled the evaluation of the most contributing wavenumbers to the predictions, providing a direct relationship with the biochemical content. Therefore, CaReNet-V1 and hyperspectral images is a potential approach for breast cancer biopsies assessment, providing additional information to the pathology report. Biochemical content impact feature may be used for other studies, such as treatment efficacy evaluation and development new diagnostics and therapeutic methods.
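
For readers unfamiliar with 1D CNNs over spectra, the sketch below shows a generic model of this kind in PyTorch. The layer sizes are illustrative assumptions and do not reproduce CaReNet-V1.

```python
import torch
import torch.nn as nn

class Spectrum1DCNN(nn.Module):
    """A generic 1D CNN for classifying per-pixel absorbance spectra."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_wavenumbers) absorbance spectra
        return self.classifier(self.features(x).squeeze(-1))

model = Spectrum1DCNN()
spectra = torch.randn(8, 1, 450)          # batch of 8 synthetic spectra
print(model(spectra).shape)               # torch.Size([8, 2])
```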

MGAS: Multi-Granularity Architecture Search for Effective and Efficient Neural Networks

  • paper_url: http://arxiv.org/abs/2310.15074
  • repo_url: None
  • paper_authors: Xiaoyun Liu, Divya Saxena, Jiannong Cao, Yuqing Zhao, Penghui Ruan
  • for: This work aims at an effective and efficient neural architecture search that balances model performance against model size.
  • methods: The proposed Multi-Granularity Architecture Search (MGAS) framework learns granularity-specific discretization functions that adaptively determine the remaining ratios of operation-, kernel-, and weight-level units, breaks super-net optimization and discretization into multiple sub-net stages to limit memory consumption, and uses progressive re-evaluation to compensate for bias introduced in early stages.
  • results: Experiments on CIFAR-10, CIFAR-100, and ImageNet show that MGAS outperforms other state-of-the-art methods in the trade-off between model performance and model size.
    Abstract Differentiable architecture search (DAS) revolutionizes neural architecture search (NAS) with time-efficient automation, transitioning from discrete candidate sampling and evaluation to differentiable super-net optimization and discretization. However, existing DAS methods either only conduct coarse-grained operation-level search or manually define the remaining ratios for fine-grained kernel-level and weight-level units, which fail to simultaneously optimize model size and model performance. Furthermore, these methods compromise search quality to reduce memory consumption. To tackle these issues, we introduce multi-granularity architecture search (MGAS), a unified framework which aims to comprehensively and memory-efficiently explore the multi-granularity search space to discover both effective and efficient neural networks. Specifically, we learn discretization functions specific to each granularity level to adaptively determine the remaining ratios according to the evolving architecture. This ensures an optimal balance among units of different granularity levels for different target model sizes. Considering the memory demands, we break down the super-net optimization and discretization into multiple sub-net stages. Nevertheless, the greedy nature of this approach may introduce bias in the early stages. To compensate for the bias, we propose progressive re-evaluation to allow for re-pruning and regrowing of previous units during subsequent stages. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate that MGAS outperforms other state-of-the-art methods in achieving a better trade-off between model performance and model size.

Synergizing Human-AI Agency: A Guide of 23 Heuristics for Service Co-Creation with LLM-Based Agents

  • paper_url: http://arxiv.org/abs/2310.15065
  • repo_url: None
  • paper_authors: Qingxiao Zheng, Zhongwei Xu, Abhinav Choudhary, Yuting Chen, Yongming Li, Yun Huang
  • for: This empirical study serves as a primer for service providers deciding whether and how to integrate Large Language Model (LLM) technology for their practitioners and the broader community.
  • methods: The study used CoAGent, an LLM-agent-based service co-creation tool, in a three-stage participatory design process with 23 domain experts from public libraries across the U.S., exploring the mutual learning journey of non-AI experts and AI.
  • results: The work distills 23 actionable heuristics for service co-creation with AI, highlighting the nuanced shared responsibilities between humans and AI, and identifies 9 foundational agency aspects for AI, such as ownership, fair treatment, and freedom of expression. The approach extends participatory design by treating AI as a key stakeholder and using AI-AI interaction to uncover blind spots, paving the way for synergistic, ethical human-AI co-creation in service contexts.
    Abstract This empirical study serves as a primer for interested service providers to determine if and how Large Language Models (LLMs) technology will be integrated for their practitioners and the broader community. We investigate the mutual learning journey of non-AI experts and AI through CoAGent, a service co-creation tool with LLM-based agents. Engaging in a three-stage participatory design processes, we work with with 23 domain experts from public libraries across the U.S., uncovering their fundamental challenges of integrating AI into human workflows. Our findings provide 23 actionable "heuristics for service co-creation with AI", highlighting the nuanced shared responsibilities between humans and AI. We further exemplar 9 foundational agency aspects for AI, emphasizing essentials like ownership, fair treatment, and freedom of expression. Our innovative approach enriches the participatory design model by incorporating AI as crucial stakeholders and utilizing AI-AI interaction to identify blind spots. Collectively, these insights pave the way for synergistic and ethical human-AI co-creation in service contexts, preparing for workforce ecosystems where AI coexists.

The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models

  • paper_url: http://arxiv.org/abs/2310.15061
  • repo_url: https://github.com/shin-ee-chen/bla
  • paper_authors: Xinyi Chen, Raquel Fernández, Sandro Pezzelle
  • for: This paper probes how well pre-trained multimodal models understand basic linguistic constructions.
  • methods: It introduces BLA, a novel, automatically constructed benchmark covering active-passive voice, coordination, and relative clauses, and uses it to evaluate Transformer-based multimodal models.
  • results: Most tested models struggle with BLA in a zero-shot setting and benefit only marginally from fine-tuning or construction-specific prompting, while the generative BLIP2 shows promising trends, especially in an in-context learning setting.
    Abstract Despite the impressive performance achieved by pre-trained language-and-vision models in downstream tasks, it remains an open question whether this reflects a proper understanding of image-text interaction. In this work, we explore to what extent they handle basic linguistic constructions -- active-passive voice, coordination, and relative clauses -- that even preschool children can typically master. We present BLA, a novel, automatically constructed benchmark to evaluate multimodal models on these Basic Language Abilities. We show that different types of Transformer-based systems, such as CLIP, ViLBERT, and BLIP2, generally struggle with BLA in a zero-shot setting, in line with previous findings. Our experiments, in particular, show that most of the tested models only marginally benefit when fine-tuned or prompted with construction-specific samples. Yet, the generative BLIP2 shows promising trends, especially in an in-context learning setting. This opens the door to using BLA not only as an evaluation benchmark but also to improve models' basic language abilities.

Robot Skill Generalization via Keypoint Integrated Soft Actor-Critic Gaussian Mixture Models

  • paper_url: http://arxiv.org/abs/2310.15059
  • repo_url: None
  • paper_authors: Iman Nematollahi, Kirill Yankov, Wolfram Burgard, Tim Welschehold
  • for: This work aims to improve the adaptation and generalization of robotic manipulation skills to unseen real-world environments.
  • methods: It combines imitation and reinforcement paradigms in hybrid skill models; the proposed KIS-GMM approach learns to predict the reference of a dynamical system as a 3D scene keypoint from visual observations gathered during the robot's physical interactions while learning the skill.
  • results: Evaluations in simulated and real-world environments show significant zero-shot generalization to novel environments and faster skill refinement in target environments than learning from scratch, without requiring new ground-truth data, while also coping with scene displacements.
    Abstract A long-standing challenge for a robotic manipulation system operating in real-world scenarios is adapting and generalizing its acquired motor skills to unseen environments. We tackle this challenge employing hybrid skill models that integrate imitation and reinforcement paradigms, to explore how the learning and adaptation of a skill, along with its core grounding in the scene through a learned keypoint, can facilitate such generalization. To that end, we develop Keypoint Integrated Soft Actor-Critic Gaussian Mixture Models (KIS-GMM) approach that learns to predict the reference of a dynamical system within the scene as a 3D keypoint, leveraging visual observations obtained by the robot's physical interactions during skill learning. Through conducting comprehensive evaluations in both simulated and real-world environments, we show that our method enables a robot to gain a significant zero-shot generalization to novel environments and to refine skills in the target environments faster than learning from scratch. Importantly, this is achieved without the need for new ground truth data. Moreover, our method effectively copes with scene displacements.

Towards Conceptualization of “Fair Explanation”: Disparate Impacts of anti-Asian Hate Speech Explanations on Content Moderators

  • paper_url: http://arxiv.org/abs/2310.15055
  • repo_url: https://github.com/jiannan-xu/emnlp23_fair_explanation
  • paper_authors: Tin Nguyen, Jiannan Xu, Aayushi Roy, Hal Daumé III, Marine Carpuat
  • for: This paper focuses on developing a novel evaluation method for "fair explanations" in AI systems, specifically in the context of content moderation of potential hate speech.
  • methods: The authors propose using a combination of metrics, including mental discomfort, stereotype activation, and perceived workload, to evaluate the psychological impact of explanations on different user groups. They apply this method in the context of content moderation of potential hate speech, using saliency maps and counterfactual explanations as examples.
  • results: The authors find that saliency maps generally perform better and show less evidence of disparate impact and individual unfairness than counterfactual explanations, suggesting that these maps may be a more effective and fair approach to explanation in this context.
    Abstract Recent research at the intersection of AI explainability and fairness has focused on how explanations can improve human-plus-AI task performance as assessed by fairness measures. We propose to characterize what constitutes an explanation that is itself "fair" -- an explanation that does not adversely impact specific populations. We formulate a novel evaluation method of "fair explanations" using not just accuracy and label time, but also psychological impact of explanations on different user groups across many metrics (mental discomfort, stereotype activation, and perceived workload). We apply this method in the context of content moderation of potential hate speech, and its differential impact on Asian vs. non-Asian proxy moderators, across explanation approaches (saliency map and counterfactual explanation). We find that saliency maps generally perform better and show less evidence of disparate impact (group) and individual unfairness than counterfactual explanations. Content warning: This paper contains examples of hate speech and racially discriminatory language. The authors do not support such content. Please consider your risk of discomfort carefully before continuing reading!

TeleQnA: A Benchmark Dataset to Assess Large Language Models Telecommunications Knowledge

  • paper_url: http://arxiv.org/abs/2310.15051
  • repo_url: https://github.com/netop-team/teleqna
  • paper_authors: Ali Maatouk, Fadhel Ayed, Nicola Piovesan, Antonio De Domenico, Merouane Debbah, Zhi-Quan Luo
  • for: The paper is written to evaluate the knowledge of Large Language Models (LLMs) in telecommunications and to provide a benchmark dataset for assessing their capabilities.
  • methods: The paper uses an automated question generation framework to create a dataset of 10,000 questions and answers related to telecommunications, drawing from diverse sources such as standards and research articles. Human input was integrated at various stages to ensure the quality of the questions.
  • results: The paper evaluates the capabilities of LLMs, including GPT-3.5 and GPT-4, using the provided dataset. The results show that these models struggle with complex standards-related questions but perform well on general telecom-related inquiries. Incorporating telecom knowledge context significantly enhances their performance, highlighting the need for a specialized telecom foundation model. The paper also compares the performance of LLMs with active telecom professionals, showing that LLMs can rival the performance of humans in telecom knowledge.
    Abstract We introduce TeleQnA, the first benchmark dataset designed to evaluate the knowledge of Large Language Models (LLMs) in telecommunications. Comprising 10,000 questions and answers, this dataset draws from diverse sources, including standards and research articles. This paper outlines the automated question generation framework responsible for creating this dataset, along with how human input was integrated at various stages to ensure the quality of the questions. Afterwards, using the provided dataset, an evaluation is conducted to assess the capabilities of LLMs, including GPT-3.5 and GPT-4. The results highlight that these models struggle with complex standards related questions but exhibit proficiency in addressing general telecom-related inquiries. Additionally, our results showcase how incorporating telecom knowledge context significantly enhances their performance, thus shedding light on the need for a specialized telecom foundation model. Finally, the dataset is shared with active telecom professionals, whose performance is subsequently benchmarked against that of the LLMs. The findings illustrate that LLMs can rival the performance of active professionals in telecom knowledge, thanks to their capacity to process vast amounts of information, underscoring the potential of LLMs within this domain. The dataset has been made publicly accessible on GitHub.
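
Evaluating an LLM on a multiple-choice benchmark of this kind reduces to a simple scoring loop, sketched below. The record layout and the dummy model are assumptions; the actual dataset schema is documented in the linked repository.

```python
import json
from typing import Callable

# Hypothetical record layout; see the GitHub repository for the actual schema.
SAMPLE = json.loads("""[
  {"question": "Which layer handles HARQ retransmissions?",
   "options": ["PDCP", "MAC", "RRC"],
   "answer": 1}
]""")

def evaluate(questions: list, model: Callable[[str, list], int]) -> float:
    """Score a model that returns the index of its chosen option for each question."""
    correct = sum(model(q["question"], q["options"]) == q["answer"] for q in questions)
    return correct / len(questions)

def dummy_model(question: str, options: list) -> int:
    return 0                                  # always picks the first option

print(f"accuracy = {evaluate(SAMPLE, dummy_model):.2f}")
```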

Meta- (out-of-context) learning in neural networks

  • paper_url: http://arxiv.org/abs/2310.15047
  • repo_url: https://github.com/krasheninnikov/internalization
  • paper_authors: Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, David Krueger
  • for: This paper investigates out-of-context learning phenomena in large language models (LLMs) and how they manifest in different settings.
  • methods: The authors run carefully designed synthetic experiments with LLMs, also demonstrate the meta-OCL phenomenon in a synthetic computer vision setting, and propose two hypotheses for its emergence: one based on how models store knowledge in their parameters and one based on the implicit gradient-alignment bias of gradient-descent-based optimizers.
  • results: Meta-OCL leads LLMs to more readily "internalize" semantic content that is, or appears to be, broadly useful, such as true statements or text from authoritative sources, and to use it in appropriate circumstances.
    Abstract Brown et al. (2020) famously introduced the phenomenon of in-context learning in large language models (LLMs). We establish the existence of a phenomenon we call meta-out-of-context learning (meta-OCL) via carefully designed synthetic experiments with LLMs. Our results suggest that meta-OCL leads LLMs to more readily "internalize" the semantic content of text that is, or appears to be, broadly useful (such as true statements, or text from authoritative sources) and use it in appropriate circumstances. We further demonstrate meta-OCL in a synthetic computer vision setting, and propose two hypotheses for the emergence of meta-OCL: one relying on the way models store knowledge in their parameters, and another suggesting that the implicit gradient alignment bias of gradient-descent-based optimizers may be responsible. Finally, we reflect on what our results might imply about capabilities of future AI systems, and discuss potential risks. Our code can be found at https://github.com/krasheninnikov/internalization.

A Universal Anti-Spoofing Approach for Contactless Fingerprint Biometric Systems

  • paper_url: http://arxiv.org/abs/2310.15044
  • repo_url: None
  • paper_authors: Banafsheh Adami, Sara Tehranipoor, Nasser Nasrabadi, Nima Karimian
  • for: This work aims to improve the security of contactless fingerprint recognition against various presentation attack instruments (PAIs).
  • methods: The authors propose a universal presentation attack detection method that trains a semi-supervised ResNet-18 on synthetic contactless fingerprints generated with StyleGAN from live finger photos, using a novel joint loss that combines ArcFace and Center loss with a regularization parameter to balance the two terms.
  • results: The proposed method achieves a Bona Fide Classification Error Rate (BPCER) of 0.12%, an Attack Presentation Classification Error Rate (APCER) of 0.63%, and an Average Classification Error Rate (ACER) of 0.37%, evaluated on unseen spoof attack types and live data.
    Abstract With the increasing integration of smartphones into our daily lives, fingerphotos are becoming a potential contactless authentication method. While it offers convenience, it is also more vulnerable to spoofing using various presentation attack instruments (PAI). The contactless fingerprint is an emerging biometric authentication but has not yet been heavily investigated for anti-spoofing. While existing anti-spoofing approaches demonstrated fair results, they have encountered challenges in terms of universality and scalability to detect any unseen/unknown spoofed samples. To address this issue, we propose a universal presentation attack detection method for contactless fingerprints, despite having limited knowledge of presentation attack samples. We generated synthetic contactless fingerprints using StyleGAN from live finger photos and integrating them to train a semi-supervised ResNet-18 model. A novel joint loss function, combining the Arcface and Center loss, is introduced with a regularization to balance between the two loss functions and minimize the variations within the live samples while enhancing the inter-class variations between the deepfake and live samples. We also conducted a comprehensive comparison of different regularizations' impact on the joint loss function for presentation attack detection (PAD) and explored the performance of a modified ResNet-18 architecture with different activation functions (i.e., leaky ReLU and RelU) in conjunction with Arcface and center loss. Finally, we evaluate the performance of the model using unseen types of spoof attacks and live data. Our proposed method achieves a Bona Fide Classification Error Rate (BPCER) of 0.12\%, an Attack Presentation Classification Error Rate (APCER) of 0.63\%, and an Average Classification Error Rate (ACER) of 0.37\%.
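
A rough PyTorch sketch of the joint ArcFace + Center loss described above is given below; the scale, margin, balancing weight, and embedding size are assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointArcCenterLoss(nn.Module):
    """ArcFace term plus a Center-loss term, balanced by a regularization weight lam."""
    def __init__(self, n_classes: int, dim: int, s: float = 30.0, m: float = 0.5, lam: float = 0.01):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, dim))
        self.centers = nn.Parameter(torch.randn(n_classes, dim))
        self.s, self.m, self.lam = s, m, lam

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # ArcFace: add an angular margin to the target-class logit.
        cos = F.linear(F.normalize(feats), F.normalize(self.weight)).clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cos)
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = self.s * torch.where(target, torch.cos(theta + self.m), cos)
        arc = F.cross_entropy(logits, labels)
        # Center loss: pull embeddings toward their class centers.
        center = ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()
        return arc + self.lam * center

criterion = JointArcCenterLoss(n_classes=2, dim=128)
feats = torch.randn(16, 128)
labels = torch.randint(0, 2, (16,))
print(criterion(feats, labels).item())
```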

Machine Learning and Knowledge: Why Robustness Matters

  • paper_url: http://arxiv.org/abs/2310.19819
  • repo_url: None
  • paper_authors: Jonathan Vandenburgh
  • for: This paper examines when users are justified in trusting the outputs of machine learning models.
  • methods: The author proposes understanding the epistemic dimension of trust through the concept of knowledge: an algorithm is trustworthy when its users are in a position to know that its outputs are correct.
  • results: The author argues that machine learning models can provide knowledge only if they work well across counterfactual scenarios and make decisions based on the right features, which explains why properties such as interpretability, causal shortcut independence, and distribution-shift robustness matter even when they are not required for model reliability.
    Abstract Trusting machine learning algorithms requires having confidence in their outputs. Confidence is typically interpreted in terms of model reliability, where a model is reliable if it produces a high proportion of correct outputs. However, model reliability does not address concerns about the robustness of machine learning models, such as models relying on the wrong features or variations in performance based on context. I argue that the epistemic dimension of trust can instead be understood through the concept of knowledge, where the trustworthiness of an algorithm depends on whether its users are in the position to know that its outputs are correct. Knowledge requires beliefs to be formed for the right reasons and to be robust to error, so machine learning algorithms can only provide knowledge if they work well across counterfactual scenarios and if they make decisions based on the right features. This, I argue, can explain why we should care about model properties like interpretability, causal shortcut independence, and distribution shift robustness even if such properties are not required for model reliability.

UWB Based Static Gesture Classification

  • paper_url: http://arxiv.org/abs/2310.15036
  • repo_url: None
  • paper_authors: Abhishek Sebastian
  • for: This work aims to improve static gesture recognition with ultra-wideband (UWB) technology for use across a range of application domains.
  • methods: Using a proprietary UWB radar sensor, the authors collected datasets for five commonly used gestures, built a preprocessing pipeline with outlier handling, aspect-ratio-preserving resizing, and false-color image transformation, and trained CNN and MobileNet models on the processed images.
  • results: The best-performing model reached 96.78% accuracy, and a user-friendly GUI framework for assessing system resource usage and processing time showed low memory utilization and real-time task completion in under one second, indicating practical applicability of the approach.
    Abstract Our paper presents a robust framework for UWB-based static gesture recognition, leveraging proprietary UWB radar sensor technology. Extensive data collection efforts were undertaken to compile datasets containing five commonly used gestures. Our approach involves a comprehensive data pre-processing pipeline that encompasses outlier handling, aspect ratio-preserving resizing, and false-color image transformation. Both CNN and MobileNet models were trained on the processed images. Remarkably, our best-performing model achieved an accuracy of 96.78%. Additionally, we developed a user-friendly GUI framework to assess the model's system resource usage and processing times, which revealed low memory utilization and real-time task completion in under one second. This research marks a significant step towards enhancing static gesture recognition using UWB technology, promising practical applications in various domains.

Deep Autoencoder-based Z-Interference Channels with Perfect and Imperfect CSI

  • paper_url: http://arxiv.org/abs/2310.15027
  • repo_url: None
  • paper_authors: Xinliang Zhang, Mojtaba Vaezi
  • for: This paper designs a deep autoencoder (DAE)-based structure for end-to-end communication over the two-user Z-interference channel (ZIC) with finite-alphabet inputs.
  • methods: The two encoder/decoder pairs are jointly optimized to generate interference-aware constellations that dynamically adapt their shape to the interference intensity to minimize the bit error rate (BER); an in-phase/quadrature (I/Q) power allocation layer enforces an average power constraint and enables non-uniform constellations, and the structure is extended to imperfect CSI caused by estimation and quantization errors.
  • results: Compared with standard and rotated constellation baselines, the proposed DAE-ZIC improves performance in all interference regimes (weak, moderate, and strong), with gains that grow with SNR: more than an order of magnitude BER reduction at weak interference for SNR > 15 dB with two bits per symbol, and about two orders of magnitude when quantization error is present, indicating greater robustness to interference.
    Abstract A deep autoencoder (DAE)-based structure for endto-end communication over the two-user Z-interference channel (ZIC) with finite-alphabet inputs is designed in this paper. The proposed structure jointly optimizes the two encoder/decoder pairs and generates interference-aware constellations that dynamically adapt their shape based on interference intensity to minimize the bit error rate (BER). An in-phase/quadrature-phase (I/Q) power allocation layer is introduced in the DAE to guarantee an average power constraint and enable the architecture to generate constellations with nonuniform shapes. This brings further gain compared to standard uniform constellations such as quadrature amplitude modulation. The proposed structure is then extended to work with imperfect channel state information (CSI). The CSI imperfection due to both the estimation and quantization errors are examined. The performance of the DAEZIC is compared with two baseline methods, i.e., standard and rotated constellations. The proposed structure significantly enhances the performance of the ZIC both for the perfect and imperfect CSI. Simulation results show that the improvement is achieved in all interference regimes (weak, moderate, and strong) and consistently increases with the signal-to-noise ratio (SNR). For example, more than an order of magnitude BER reduction is obtained with respect to the most competitive conventional method at weak interference when SNR>15dB and two bits per symbol are transmitted. The improvements reach about two orders of magnitude when quantization error exists, indicating that the DAE-ZIC is more robust to the interference compared to the conventional methods.
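
The I/Q power allocation layer can be illustrated, in its simplest form, as a normalization that enforces the average power constraint on the learned constellation. The sketch below makes that concrete; everything beyond the normalization step (encoder sizes, message format) is assumed for illustration.

```python
import torch
import torch.nn as nn

class AvgPowerNormalization(nn.Module):
    """Rescale encoder outputs so the constellation meets an average power constraint
    while leaving its (possibly non-uniform) shape free to adapt."""
    def __init__(self, avg_power: float = 1.0):
        super().__init__()
        self.avg_power = avg_power

    def forward(self, iq: torch.Tensor) -> torch.Tensor:
        # iq: (batch, 2) with in-phase and quadrature components per symbol.
        power = iq.pow(2).sum(dim=1).mean()              # average symbol power in the batch
        return iq * (self.avg_power / (power + 1e-12)).sqrt()

encoder = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2), AvgPowerNormalization())
bits = torch.randint(0, 2, (64, 4)).float()              # 64 messages of 4 bits (16-point alphabet)
symbols = encoder(bits)
print(symbols.pow(2).sum(dim=1).mean())                   # ~1.0, the average power constraint
```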

Efficient Data Learning for Open Information Extraction with Pre-trained Language Models

  • paper_url: http://arxiv.org/abs/2310.15021
  • repo_url: None
  • paper_authors: Zhiyuan Fan, Shizhu He
  • for: OK-IE is designed to improve the efficiency of Open Information Extraction (OpenIE) tasks in Natural Language Processing.
  • methods: OK-IE uses a novel framework that transforms the task form of OpenIE into the pre-training task form of the T5 model, reducing the need for extensive training data. Additionally, OK-IE introduces an innovative concept called Anchor to control the sequence of model outputs and eliminate the impact of order penalty on model convergence.
  • results: Compared to previous state-of-the-art (SOTA) methods, OK-IE requires only 1/100 of the training data (900 instances) and 1/120 of the training time (3 minutes) to achieve comparable results.
    Abstract Open Information Extraction (OpenIE) is a fundamental yet challenging task in Natural Language Processing, which involves extracting all triples (subject, predicate, object) from a given sentence. While labeling-based methods have their merits, generation-based techniques offer unique advantages, such as the ability to generate tokens not present in the original sentence. However, these generation-based methods often require a significant amount of training data to learn the task form of OpenIE and substantial training time to overcome slow model convergence due to the order penalty. In this paper, we introduce a novel framework, OK-IE, that ingeniously transforms the task form of OpenIE into the pre-training task form of the T5 model, thereby reducing the need for extensive training data. Furthermore, we introduce an innovative concept of Anchor to control the sequence of model outputs, effectively eliminating the impact of order penalty on model convergence and significantly reducing training time. Experimental results indicate that, compared to previous SOTA methods, OK-IE requires only 1/100 of the training data (900 instances) and 1/120 of the training time (3 minutes) to achieve comparable results.
    摘要 开放信息提取(OpenIE)是自然语言处理中的基本 yet 挑战性任务,它涉及提取每个句子中的所有三元组(主语、谓语、谓Object)。虽然标注方法有其优点,但生成型技术具有生成 tokens 不存在于原始句子的优点。然而,这些生成型方法通常需要很大量的训练数据来学习 OpenIE 任务的形式,并且需要很长的训练时间来超越顺序罚。在这篇论文中,我们介绍了一种新的框架,OK-IE,它巧妙地将 OpenIE 任务的形式转换为 T5 模型的预训练任务形式,从而减少了训练数据的需求。此外,我们还引入了一个新的概念,即 Anchor,用于控制模型输出的顺序,有效地消除了顺序罚对模型的整合和训练时间的影响,并显著减少了训练时间。实验结果表明,相比前一代 SOTA 方法,OK-IE 只需要 1/100 的训练数据(900 个实例)和 1/120 的训练时间(3 分钟)来实现相似的结果。

Invariance is Key to Generalization: Examining the Role of Representation in Sim-to-Real Transfer for Visual Navigation

  • paper_url: http://arxiv.org/abs/2310.15020
  • repo_url: None
  • paper_authors: Bo Ai, Zhanxin Wu, David Hsu
  • for: This paper studies generalization in data-driven robot control, arguing that the key to generalizing across task domains lies in the representation.
  • methods: A representation containing both depth and semantic information is used for visual navigation control.
  • results: Experiments show that this representation lets a control policy trained entirely in simulated indoor scenes generalize to diverse real-world environments, both indoors and outdoors, and that it reduces the A-distance between training and test domains, improving the generalization error bound.
    Abstract The data-driven approach to robot control has been gathering pace rapidly, yet generalization to unseen task domains remains a critical challenge. We argue that the key to generalization is representations that are (i) rich enough to capture all task-relevant information and (ii) invariant to superfluous variability between the training and the test domains. We experimentally study such a representation -- containing both depth and semantic information -- for visual navigation and show that it enables a control policy trained entirely in simulated indoor scenes to generalize to diverse real-world environments, both indoors and outdoors. Further, we show that our representation reduces the A-distance between the training and test domains, improving the generalization error bound as a result. Our proposed approach is scalable: the learned policy improves continuously, as the foundation models that it exploits absorb more diverse data during pre-training.
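
Since the abstract reports a reduced A-distance between training and test domains, the sketch below shows one common proxy estimator for the A-distance based on a domain classifier; whether this matches the paper's exact estimator is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(source_feats: np.ndarray, target_feats: np.ndarray) -> float:
    """Proxy A-distance: train a domain classifier and convert its test error via 2*(1 - 2*err)."""
    X = np.vstack([source_feats, target_feats])
    y = np.concatenate([np.zeros(len(source_feats)), np.ones(len(target_feats))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    err = 1.0 - clf.score(X_te, y_te)
    return 2.0 * (1.0 - 2.0 * err)

rng = np.random.default_rng(0)
sim = rng.normal(0.0, 1.0, size=(200, 16))        # features of simulated scenes (stand-in)
real = rng.normal(0.5, 1.0, size=(200, 16))       # features of real scenes (stand-in)
print(proxy_a_distance(sim, real))                 # smaller values suggest more invariant features
```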

Meta learning with language models: Challenges and opportunities in the classification of imbalanced text

  • paper_url: http://arxiv.org/abs/2310.15019
  • repo_url: https://github.com/usnistgov/NIST-AI-Meta-Learning-LLM
  • paper_authors: Apostol Vassilev, Honglan Jin, Munawar Hasan
  • for: Detecting out-of-policy speech (OOPS) content is important but difficult.
  • methods: The authors propose a meta learning technique (MLT) that combines individual models built with different text representations, show analytically that the resulting combination is numerically stable with reasonable combining weights, and pair MLT with a threshold-moving (TM) technique.
  • results: The combined predictor improves performance on highly imbalanced in-distribution and out-of-distribution datasets, and computational results demonstrate statistically significant advantages of the proposed MLT approach.
    Abstract Detecting out of policy speech (OOPS) content is important but difficult. While machine learning is a powerful tool to tackle this challenging task, it is hard to break the performance ceiling due to factors like quantity and quality limitations on training data and inconsistencies in OOPS definition and data labeling. To realize the full potential of available limited resources, we propose a meta learning technique (MLT) that combines individual models built with different text representations. We analytically show that the resulting technique is numerically stable and produces reasonable combining weights. We combine the MLT with a threshold-moving (TM) technique to further improve the performance of the combined predictor on highly-imbalanced in-distribution and out-of-distribution datasets. We also provide computational results to show the statistically significant advantages of the proposed MLT approach. All authors contributed equally to this work.

The primacy bias in Model-based RL

  • paper_url: http://arxiv.org/abs/2310.15017
  • repo_url: None
  • paper_authors: Zhongjian Qiao, Jiafei Lyu, Xiu Li
  • for: This work investigates the primacy bias in deep reinforcement learning (DRL), focusing on model-based reinforcement learning (MBRL).
  • methods: The authors propose world model resetting to alleviate the primacy bias in MBRL.
  • results: Applied to MBPO and DreamerV2, world model resetting effectively reduces the primacy bias and improves performance, validated on continuous control tasks in MuJoCo and the DeepMind Control Suite as well as discrete control tasks on the Atari 100k benchmark.
    Abstract The primacy bias in deep reinforcement learning (DRL), which refers to the agent's tendency to overfit early data and lose the ability to learn from new data, can significantly decrease the performance of DRL algorithms. Previous studies have shown that employing simple techniques, such as resetting the agent's parameters, can substantially alleviate the primacy bias. However, we observe that resetting the agent's parameters harms its performance in the context of model-based reinforcement learning (MBRL). In fact, on further investigation, we find that the primacy bias in MBRL differs from that in model-free RL. In this work, we focus on investigating the primacy bias in MBRL and propose world model resetting, which works in MBRL. We apply our method to two different MBRL algorithms, MBPO and DreamerV2. We validate the effectiveness of our method on multiple continuous control tasks on MuJoCo and DeepMind Control Suite, as well as discrete control tasks on Atari 100k benchmark. The results show that world model resetting can significantly alleviate the primacy bias in model-based setting and improve algorithm's performance. We also give a guide on how to perform world model resetting effectively.
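
World model resetting amounts to periodically re-initializing the world-model parameters while keeping the rest of the agent (and its replay buffer) intact. The sketch below illustrates the mechanism; the reset schedule and the choice of modules to reset are assumptions, not the paper's recipe.

```python
import torch
import torch.nn as nn

def reset_world_model(world_model: nn.Module) -> None:
    """Re-initialize every sub-module that exposes reset_parameters()."""
    for module in world_model.modules():
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()

world_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 8))
policy = nn.Linear(8, 2)

wm_before = world_model[0].weight.clone()
policy_before = policy.weight.clone()
# ... training loop would go here; every N environment steps:
reset_world_model(world_model)
print(torch.equal(wm_before, world_model[0].weight))    # False: world model re-initialized
print(torch.equal(policy_before, policy.weight))        # True: policy left untouched
```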

Understanding the Inner Workings of Language Models Through Representation Dissimilarity

  • paper_url: http://arxiv.org/abs/2310.14993
  • repo_url: None
  • paper_authors: Davis Brown, Charles Godfrey, Nicholas Konz, Jonathan Tu, Henry Kvinge
  • for: This work aims to deepen our understanding of the inner workings of language models, improving model trust, interpretability, and transparency.
  • methods: It uses representation dissimilarity measures, functions that quantify how much two models' internal representations differ, to analyze language models.
  • results: The analysis reveals an apparent asymmetry in the internal representations of models using SoLU and GeLU activation functions, shows that dissimilarity measures can identify and locate generalization properties that are invisible from in-distribution test-set performance, and provides new evaluations of how language model features vary as width and depth increase.
    Abstract As language models are applied to an increasing number of real-world applications, understanding their inner workings has become an important issue in model trust, interpretability, and transparency. In this work we show that representation dissimilarity measures, which are functions that measure the extent to which two model's internal representations differ, can be a valuable tool for gaining insight into the mechanics of language models. Among our insights are: (i) an apparent asymmetry in the internal representations of model using SoLU and GeLU activation functions, (ii) evidence that dissimilarity measures can identify and locate generalization properties of models that are invisible via in-distribution test set performance, and (iii) new evaluations of how language model features vary as width and depth are increased. Our results suggest that dissimilarity measures are a promising set of tools for shedding light on the inner workings of language models.
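
One widely used representation (dis)similarity measure is linear centered kernel alignment (CKA); the paper's specific measures are not named here, so the sketch below is only an example of the kind of function involved.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two activation matrices (n_examples x n_features);
    1 - CKA can serve as a representation dissimilarity measure."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(256, 64))           # layer activations from model A (stand-in)
acts_b = rng.normal(size=(256, 64))           # layer activations from model B (stand-in)
print(1.0 - linear_cka(acts_a, acts_b))       # dissimilarity between the two representations
```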
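
As an illustration of the kind of measure involved, below is a self-contained sketch of linear CKA, a widely used representation similarity index; the paper may use different dissimilarity measures, so treat this purely as an example:

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA similarity between two activation matrices of shape (n_samples, dim)."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(512, 768))   # layer activations from model A on 512 inputs
acts_b = rng.normal(size=(512, 768))   # activations from model B on the same inputs
print("dissimilarity:", 1.0 - linear_cka(acts_a, acts_b))
```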

ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation

  • paper_url: http://arxiv.org/abs/2310.14979
  • repo_url: None
  • paper_authors: Xinpeng Wang, Barbara Plank
  • for: The paper addresses annotator disagreement in dataset creation and proposes a multi-head-model active learning strategy to reduce annotation cost.
  • methods: The paper combines a multi-head model with annotator-specific classification heads and active learning, and evaluates different acquisition functions on two datasets (see the sketch below the abstract).
  • results: The paper shows that the multi-head model with active learning can save up to 70% of the annotation budget while maintaining performance in both prediction and uncertainty estimation.
    Abstract Label aggregation such as majority voting is commonly used to resolve annotator disagreement in dataset creation. However, this may disregard minority values and opinions. Recent studies indicate that learning from individual annotations outperforms learning from aggregated labels, though they require a considerable amount of annotation. Active learning, as an annotation cost-saving strategy, has not been fully explored in the context of learning from disagreement. We show that in the active learning setting, a multi-head model performs significantly better than a single-head model in terms of uncertainty estimation. By designing and evaluating acquisition functions with annotator-specific heads on two datasets, we show that group-level entropy works generally well on both datasets. Importantly, it achieves performance in terms of both prediction and uncertainty estimation comparable to full-scale training from disagreement, while saving up to 70% of the annotation budget.
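
A rough sketch of a group-level entropy acquisition function computed over annotator-specific heads (array shapes, names, and the selection rule are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def group_level_entropy(head_probs: np.ndarray) -> np.ndarray:
    """head_probs: (n_samples, n_heads, n_classes) softmax outputs, one head per annotator.
    Returns the entropy of the head-averaged distribution for each unlabeled sample."""
    group_probs = head_probs.mean(axis=1)                        # average over annotator heads
    return -(group_probs * np.log(group_probs + 1e-12)).sum(axis=1)

def select_batch(head_probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` most uncertain samples to send for annotation."""
    scores = group_level_entropy(head_probs)
    return np.argsort(-scores)[:budget]

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 5, 3))                           # 1000 samples, 5 heads, 3 classes
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(select_batch(probs, budget=16))
```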

The WHY in Business Processes: Discovery of Causal Execution Dependencies

  • paper_url: http://arxiv.org/abs/2310.14975
  • repo_url: None
  • paper_authors: Fabiana Fournier, Lior Limonad, Inna Skarbovsky, Yuval David
  • for: The study aims to unveil the genuine causal dependencies among process activities, enabling better prediction of process-intervention outcomes and more informed decisions.
  • methods: The study leverages an existing causal discovery algorithm over activity timings and, under three causal patterns, searches for discrepancies between the mined process model and the causal business process model, annotating these inconsistencies over the mined model.
  • results: The methodology is demonstrated with two open process mining algorithms, the IBM Process Mining tool, and the LiNGAM causal discovery technique, on a synthesized dataset and two open benchmark datasets.
    Abstract A crucial element in predicting the outcomes of process interventions and making informed decisions about the process is unraveling the genuine relationships between the execution of process activities. Contemporary process discovery algorithms exploit time precedence as their main source of model derivation. Such reliance can sometimes be deceiving from a causal perspective. This calls for faithful new techniques to discover the true execution dependencies among the tasks in the process. To this end, our work offers a systematic approach to the unveiling of the true causal business process by leveraging an existing causal discovery algorithm over activity timing. In addition, this work delves into a set of conditions under which process mining discovery algorithms generate a model that is incongruent with the causal business process model, and shows how the latter model can be methodologically employed for a sound analysis of the process. Our methodology searches for such discrepancies between the two models in the context of three causal patterns, and derives a new view in which these inconsistencies are annotated over the mined process model. We demonstrate our methodology employing two open process mining algorithms, the IBM Process Mining tool, and the LiNGAM causal discovery technique. We apply it on a synthesized dataset and on two open benchmark data sets.

Efficient Causal Discovery for Robotics Applications

  • paper_url: http://arxiv.org/abs/2310.14925
  • repo_url: None
  • paper_authors: Luca Castri, Sariah Mghames, Nicola Bellotto
  • for: The paper aims to provide fast and accurate causal analysis for automating tasks in environments shared by humans and robots.
  • methods: The paper uses Filtered PCMCI (F-PCMCI) for fast and accurate analysis of the causal relationships underlying human-robot interaction.
  • results: The application shows that F-PCMCI can accurately and promptly reconstruct the causal model of a human-robot interaction scenario, which can then be leveraged to enhance the quality of the interaction.
    Abstract Using robots for automating tasks in environments shared with humans, such as warehouses, shopping centres, or hospitals, requires these robots to comprehend the fundamental physical interactions among nearby agents and objects. Specifically, creating models to represent cause-and-effect relationships among these elements can aid in predicting unforeseen human behaviours and anticipate the outcome of particular robot actions. To be suitable for robots, causal analysis must be both fast and accurate, meeting real-time demands and the limited computational resources typical in most robotics applications. In this paper, we present a practical demonstration of our approach for fast and accurate causal analysis, known as Filtered PCMCI (F-PCMCI), along with a real-world robotics application. The provided application illustrates how our F-PCMCI can accurately and promptly reconstruct the causal model of a human-robot interaction scenario, which can then be leveraged to enhance the quality of the interaction.

PartialFormer: Modeling Part Instead of Whole

  • paper_url: http://arxiv.org/abs/2310.14921
  • repo_url: https://github.com/zhengkid/partialformer
  • paper_authors: Tong Zheng, Bei Li, Huiwen Bao, Weiqiao Shan, Tong Xiao, Jingbo Zhu
  • for: The work proposes a Transformer feed-forward network (FFN) design with lower parameter and computation overhead, emphasizing the hidden dimension as the key factor for lightweight FFNs.
  • methods: The work introduces PartialFormer, a parameter-efficient Transformer architecture that uses multiple smaller FFNs to reduce parameters and computation while preserving essential hidden dimensions, together with a tailored head scaling strategy and a residual-like attention calculation for better depth scaling (a sketch of the FFN idea follows the abstract below).
  • results: Extensive experiments on 9 translation tasks and 1 abstractive summarization task validate the effectiveness of the PartialFormer approach.
    Abstract The design choices in Transformer feed-forward neural networks have resulted in significant computational and parameter overhead. In this work, we emphasize the importance of hidden dimension in designing lightweight FFNs, a factor often overlooked in previous architectures. Guided by this principle, we introduce PartialFormer, a parameter-efficient Transformer architecture utilizing multiple smaller FFNs to reduce parameters and computation while maintaining essential hidden dimensions. These smaller FFNs are integrated into a multi-head attention system to enable effective collaboration. We also propose a tailored head scaling strategy to enhance PartialFormer's capabilities. Furthermore, we present a residual-like attention calculation to improve depth scaling within PartialFormer. Extensive experiments on 9 translation tasks and 1 abstractive summarization task validate the effectiveness of our PartialFormer approach. Our code would be available at: \url{https://github.com/zhengkid/PartialFormer}.
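
A minimal PyTorch sketch of the central idea, replacing one wide FFN with several smaller per-head FFNs over slices of the hidden dimension (dimensions and the recombination rule are illustrative; the released PartialFormer code integrates the heads into the attention system):

```python
import torch
import torch.nn as nn

class PartialFFN(nn.Module):
    """Split the model dimension into `n_heads` slices and give each its own small FFN."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_hidden_per_head: int = 256):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.ffns = nn.ModuleList([
            nn.Sequential(nn.Linear(self.d_head, d_hidden_per_head), nn.ReLU(),
                          nn.Linear(d_hidden_per_head, self.d_head))
            for _ in range(n_heads)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq, d_model)
        chunks = x.split(self.d_head, dim=-1)
        return torch.cat([ffn(c) for ffn, c in zip(self.ffns, chunks)], dim=-1)

x = torch.randn(2, 10, 512)
print(PartialFFN()(x).shape)   # torch.Size([2, 10, 512])
```

In this illustrative configuration, the eight small FFNs together use roughly one eighth of the weights of a single 512-to-2048-to-512 FFN.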

Linking Surface Facts to Large-Scale Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2310.14909
  • repo_url: https://github.com/nec-research/fact-linking
  • paper_authors: Gorjan Radevski, Kiril Gashteovski, Chia-Chien Hung, Carolin Lawrence, Goran Glavaš
  • for: The study aims to bridge information extracted from natural-language text and large-scale knowledge graphs (KGs).
  • methods: The study uses Open Information Extraction (OIE) to extract ("subject"; "relation"; "object") triples from text and links them to a KG, combining the high coverage of free-text OIE with the semantic precision of KGs; a new benchmark with granular evaluation protocols is proposed (a toy linking sketch follows the abstract below).
  • results: The evaluation shows that detecting surface forms with no counterpart in the KG is harder than accurately linking surface forms to existing KG entities and relations.
    Abstract Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples. These facts are, however, merely surface forms, the ambiguity of which impedes their downstream usage; e.g., the surface phrase "Michael Jordan" may refer to either the former basketball player or the university professor. Knowledge Graphs (KGs), on the other hand, contain facts in a canonical (i.e., unambiguous) form, but their coverage is limited by a static schema (i.e., a fixed set of entities and predicates). To bridge this gap, we need the best of both worlds: (i) high coverage of free-text OIEs, and (ii) semantic precision (i.e., monosemy) of KGs. In order to achieve this goal, we propose a new benchmark with novel evaluation protocols that can, for example, measure fact linking performance on a granular triple slot level, while also measuring if a system has the ability to recognize that a surface form has no match in the existing KG. Our extensive evaluation of several baselines show that detection of out-of-KG entities and predicates is more difficult than accurate linking to existing ones, thus calling for more research efforts on this difficult task. We publicly release all resources (data, benchmark and code) on https://github.com/nec-research/fact-linking.
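
To make the two sub-tasks concrete, here is a toy sketch of linking a surface-form embedding to its nearest KG entity while rejecting out-of-KG mentions with a similarity threshold (entities, vectors, and the threshold are invented for illustration):

```python
import numpy as np

kg_entities = ["Michael Jordan (basketball player)", "Michael I. Jordan (professor)"]
kg_vecs = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]])      # pretend entity embeddings

def link(surface_vec: np.ndarray, threshold: float = 0.7):
    sims = kg_vecs @ surface_vec / (
        np.linalg.norm(kg_vecs, axis=1) * np.linalg.norm(surface_vec) + 1e-12)
    best = int(np.argmax(sims))
    if sims[best] < threshold:          # no convincing match -> flag as out-of-KG
        return None, float(sims[best])
    return kg_entities[best], float(sims[best])

print(link(np.array([0.85, 0.15, 0.05])))   # links to the basketball player
print(link(np.array([0.0, 0.0, 1.0])))      # rejected as out-of-KG
```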

Universal Knowledge Graph Embeddings

  • paper_url: http://arxiv.org/abs/2310.14899
  • repo_url: https://github.com/dice-group/universal_embeddings
  • paper_authors: N’Dah Jean Kouagou, Caglar Demir, Hamada M. Zahera, Adrian Wilke, Stefan Heindorf, Jiayi Li, Axel-Cyrille Ngonga Ngomo
  • for: The paper aims to learn embeddings over large-scale interlinked knowledge sources, so that similar entities can be found and relations predicted across multiple knowledge graphs.
  • methods: The paper fuses large knowledge graphs via the owl:sameAs relation so that every entity is represented by a unique identity, and learns universal knowledge graph embeddings on the fused graph (see the fusion sketch below the abstract).
  • results: Link prediction experiments show that the universal embeddings encode better semantics than embeddings computed on a single knowledge graph, and a convenient API provides the embeddings as a service.
    Abstract A variety of knowledge graph embedding approaches have been developed. Most of them obtain embeddings by learning the structure of the knowledge graph within a link prediction setting. As a result, the embeddings reflect only the semantics of a single knowledge graph, and embeddings for different knowledge graphs are not aligned, e.g., they cannot be used to find similar entities across knowledge graphs via nearest neighbor search. However, knowledge graph embedding applications such as entity disambiguation require a more global representation, i.e., a representation that is valid across multiple sources. We propose to learn universal knowledge graph embeddings from large-scale interlinked knowledge sources. To this end, we fuse large knowledge graphs based on the owl:sameAs relation such that every entity is represented by a unique identity. We instantiate our idea by computing universal embeddings based on DBpedia and Wikidata yielding embeddings for about 180 million entities, 15 thousand relations, and 1.2 billion triples. Moreover, we develop a convenient API to provide embeddings as a service. Experiments on link prediction show that universal knowledge graph embeddings encode better semantics compared to embeddings computed on a single knowledge graph. For reproducibility purposes, we provide our source code and datasets open access at https://github.com/dice-group/Universal_Embeddings
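
The fusion step described above can be pictured with a small union-find pass over owl:sameAs links, so that every interlinked entity ends up with a single canonical identity before embeddings are trained (the IRIs below are illustrative):

```python
def fuse_same_as(same_as_pairs):
    """Union-find over owl:sameAs links; returns a map from each IRI to one canonical identity."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in same_as_pairs:
        union(a, b)
    return {iri: find(iri) for iri in parent}

links = [("dbpedia:Berlin", "wikidata:Q64"), ("wikidata:Q64", "yago:Berlin")]
print(fuse_same_as(links))   # all three IRIs map to the same canonical identity
```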

Local Universal Rule-based Explanations

  • paper_url: http://arxiv.org/abs/2310.14894
  • repo_url: https://github.com/sbobek/lux
  • paper_authors: Szymon Bobek, Grzegorz J. Nalepa
  • for: The work proposes an explainable artificial intelligence (XAI) method that helps understand why a model made a particular decision.
  • methods: The work presents a rule-based explainer built on a modified decision tree algorithm with oblique splits, which generates factual, counterfactual, and visual explanations and can integrate feature-importance XAI methods such as SHAP or LIME. Unlike other algorithms, LUX does not rely on data generation; it selects local concepts in the form of high-density clusters of real data that most influence the decision boundary.
  • results: On real and synthetic datasets, LUX outperforms state-of-the-art rule-based explainers such as LORE, EXPLAN, and Anchor in terms of simplicity, global fidelity, and representativeness.
    Abstract Explainable artificial intelligence (XAI) is one of the most intensively developed are of AI in recent years. It is also one of the most fragmented one with multiple methods that focus on different aspects of explanations. This makes difficult to obtain the full spectrum of explanation at once in a compact and consistent way. To address this issue, we present Local Universal Explainer (LUX) that is a rule-based explainer which can generate factual, counterfactual and visual explanations. It is based on a modified version of decision tree algorithms that allows for oblique splits and integration with feature importance XAI methods such as SHAP or LIME. It does not use data generation in opposite to other algorithms, but is focused on selecting local concepts in a form of high-density clusters of real data that have the highest impact on forming the decision boundary of the explained model. We tested our method on real and synthetic datasets and compared it with state-of-the-art rule-based explainers such as LORE, EXPLAN and Anchor. Our method outperforms currently existing approaches in terms of simplicity, global fidelity and representativeness.

Non-autoregressive Streaming Transformer for Simultaneous Translation

  • paper_url: http://arxiv.org/abs/2310.14883
  • repo_url: https://github.com/ictnlp/nast
  • paper_authors: Zhengrui Ma, Shaolei Zhang, Shoutao Guo, Chenze Shao, Min Zhang, Yang Feng
  • for: To improve the balance between latency and translation quality in simultaneous machine translation (SiMT).
  • methods: The non-autoregressive streaming Transformer (NAST) comprises a unidirectional encoder and a non-autoregressive decoder with intra-chunk parallelism, and can generate blank or repetitive tokens to flexibly adjust its READ/WRITE strategy.
  • results: NAST outperforms previous strong autoregressive SiMT baselines on various SiMT benchmarks.
    Abstract Simultaneous machine translation (SiMT) models are trained to strike a balance between latency and translation quality. However, training these models to achieve high quality while maintaining low latency often leads to a tendency for aggressive anticipation. We argue that such issue stems from the autoregressive architecture upon which most existing SiMT models are built. To address those issues, we propose non-autoregressive streaming Transformer (NAST) which comprises a unidirectional encoder and a non-autoregressive decoder with intra-chunk parallelism. We enable NAST to generate the blank token or repetitive tokens to adjust its READ/WRITE strategy flexibly, and train it to maximize the non-monotonic latent alignment with an alignment-based latency loss. Experiments on various SiMT benchmarks demonstrate that NAST outperforms previous strong autoregressive SiMT baselines.

A Study on Knowledge Graph Embeddings and Graph Neural Networks for Web Of Things

  • paper_url: http://arxiv.org/abs/2310.14866
  • repo_url: https://github.com/kgrl2021/submission-one
  • paper_authors: Rohith Teja Mittakola, Thomas Hassan
  • for: The study applies knowledge graphs to the Web of Things (WoT) domain, providing a digital representation of the physical world and enabling cross-domain applications on this massive, highly connected graph of things.
  • methods: The study uses state-of-the-art knowledge graph embedding (KGE) methods to learn numerical representations of graph entities and evaluates them on downstream tasks such as link prediction, node classification, and triple classification; graph neural networks (GNNs) are also investigated and compared with the KGE methods on the same tasks.
  • results: Both state-of-the-art KGE and GNN methods perform well on node classification, while GNN approaches are superior on link prediction; overall, the study shows that state-of-the-art approaches are relevant in a WoT context and provides initial insights for implementing and evaluating them.
    Abstract Graph data structures are widely used to store relational information between several entities. With data being generated worldwide on a large scale, we see a significant growth in the generation of knowledge graphs. Thing in the future is Orange's take on a knowledge graph in the domain of the Web Of Things (WoT), where the main objective of the platform is to provide a digital representation of the physical world and enable cross-domain applications to be built upon this massive and highly connected graph of things. In this context, as the knowledge graph grows in size, it is prone to have noisy and messy data. In this paper, we explore state-of-the-art knowledge graph embedding (KGE) methods to learn numerical representations of the graph entities and, subsequently, explore downstream tasks like link prediction, node classification, and triple classification. We also investigate Graph neural networks (GNN) alongside KGEs and compare their performance on the same downstream tasks. Our evaluation highlights the encouraging performance of both KGE and GNN-based methods on node classification, and the superiority of GNN approaches in the link prediction task. Overall, we show that state-of-the-art approaches are relevant in a WoT context, and this preliminary work provides insights to implement and evaluate them in this context.

BioImage.IO Chatbot: A Personalized Assistant for BioImage Analysis Augmented by Community Knowledge Base

  • paper_url: http://arxiv.org/abs/2310.18351
  • repo_url: https://github.com/bioimage-io/bioimageio-chatbot
  • paper_authors: Wanlu Lei, Caterina Fuster-Barceló, Arrate Muñoz-Barrutia, Wei Ouyang
  • for: The work addresses the rapidly expanding and complex landscape of bioimage analysis tools, giving both experts and newcomers an easy way to find and use them.
  • methods: The work builds the BioImage$.$IO Chatbot, a conversational assistant based on large language models that aggregates and interprets information from diverse databases, tool-specific documentation, and structured data sources, enhanced by a community-contributed knowledge base and fine-tuned retrieval methods.
  • results: The chatbot provides personalized, knowledge-enriched, context-aware assistance for finding and using bioimage analysis tools, transforming how biologists, bioimage analysts, and developers navigate them and setting a new standard for community-driven, accessible scientific research.
    Abstract The rapidly expanding landscape of bioimage analysis tools presents a navigational challenge for both experts and newcomers. Traditional search methods often fall short in assisting users in this complex environment. To address this, we introduce the BioImage$.$IO Chatbot, an AI-driven conversational assistant tailored for the bioimage community. Built upon large language models, this chatbot provides personalized, context-aware answers by aggregating and interpreting information from diverse databases, tool-specific documentation, and structured data sources. Enhanced by a community-contributed knowledge base and fine-tuned retrieval methods, the BioImage$.$IO Chatbot offers not just a personalized interaction but also a knowledge-enriched, context-aware experience. It fundamentally transforms the way biologists, bioimage analysts, and developers navigate and utilize advanced bioimage analysis tools, setting a new standard for community-driven, accessible scientific research.

Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing

  • paper_url: http://arxiv.org/abs/2310.14855
  • repo_url: None
  • paper_authors: Sai Koneru, Miriam Exel, Matthias Huck, Jan Niehues
  • for: The work explores using large language models (LLMs) for machine translation (MT) and investigates recent parameter-efficient fine-tuning techniques.
  • methods: The work adapts LLMs as Automatic Post-Editors (APE) rather than direct translators, extends the approach to document-level translation, and applies Low-Rank-Adapter fine-tuning to improve APE performance (a prompt-construction sketch follows the abstract below).
  • results: The approach achieves a state-of-the-art accuracy of 89% on the ContraPro test set and generalizes to out-of-domain data; when reference context is available, leveraging human corrections substantially reduces the number of edits required for subsequent translations.
    Abstract Large Language Models (LLM's) have demonstrated considerable success in various Natural Language Processing tasks, but they have yet to attain state-of-the-art performance in Neural Machine Translation (NMT). Nevertheless, their significant performance in tasks demanding a broad understanding and contextual processing shows their potential for translation. To exploit these abilities, we investigate using LLM's for MT and explore recent parameter-efficient fine-tuning techniques. Surprisingly, our initial experiments find that fine-tuning for translation purposes even led to performance degradation. To overcome this, we propose an alternative approach: adapting LLM's as Automatic Post-Editors (APE) rather than direct translators. Building on the LLM's exceptional ability to process and generate lengthy sequences, we also propose extending our approach to document-level translation. We show that leveraging Low-Rank-Adapter fine-tuning for APE can yield significant improvements across both sentence and document-level metrics while generalizing to out-of-domain data. Most notably, we achieve a state-of-the-art accuracy rate of 89\% on the ContraPro test set, which specifically assesses the model's ability to resolve pronoun ambiguities when translating from English to German. Lastly, we investigate a practical scenario involving manual post-editing for document-level translation, where reference context is made available. Here, we demonstrate that leveraging human corrections can significantly reduce the number of edits required for subsequent translations\footnote{Interactive Demo for integrating manual feedback can be found \href{https://huggingface.co/spaces/skoneru/contextual_refinement_ende}{here}
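
A small sketch of how the post-editing setup can be framed as a prompt, combining the source sentence, the draft MT output, and optional document-level context (the template wording and the example are my own, not the paper's prompts):

```python
def build_ape_prompt(source: str, draft: str, context: str = "") -> str:
    """Assemble an automatic post-editing prompt for an instruction-tuned LLM."""
    parts = []
    if context:
        parts.append(f"Document context:\n{context}\n")
    parts.append(
        "Improve the draft translation so that it is fluent, accurate, and consistent "
        "with the context. Output only the corrected translation.\n"
        f"Source (English): {source}\n"
        f"Draft (German): {draft}\n"
        "Corrected (German):"
    )
    return "\n".join(parts)

print(build_ape_prompt(
    source="She put the book on the table because it was heavy.",
    draft="Sie legte das Buch auf den Tisch, weil er schwer war.",
    context="The previous sentence mentions a heavy dictionary.",
))
```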

ESVAE: An Efficient Spiking Variational Autoencoder with Reparameterizable Poisson Spiking Sampling

  • paper_url: http://arxiv.org/abs/2310.14839
  • repo_url: https://github.com/qgzhan/esvae
  • paper_authors: Qiugang Zhan, Xiurui Xie, Guisong Liu, Malu Zhang
  • for: The paper studies variational autoencoder (VAE) models built on spiking neural networks (SNNs) with the goal of improving image generation quality.
  • methods: The paper proposes an efficient spiking variational autoencoder (ESVAE) that constructs the prior and posterior of the latent space as Poisson distributions based on the firing rates of spiking neurons, together with a reparameterizable Poisson spiking sampling method that needs no additional network (a generic illustration follows the abstract below).
  • results: Experiments show that ESVAE outperforms previous SNN VAE methods in the quality of reconstructed and generated images, that the encoder retains original image information more efficiently, and that the decoder is more robust.
    Abstract In recent years, studies on image generation models of spiking neural networks (SNNs) have gained the attention of many researchers. Variational autoencoders (VAEs), as one of the most popular image generation models, have attracted a lot of work exploring their SNN implementation. Due to the constrained binary representation in SNNs, existing SNN VAE methods implicitly construct the latent space by an elaborated autoregressive network and use the network outputs as the sampling variables. However, this unspecified implicit representation of the latent space will increase the difficulty of generating high-quality images and introduces additional network parameters. In this paper, we propose an efficient spiking variational autoencoder (ESVAE) that constructs an interpretable latent space distribution and design a reparameterizable spiking sampling method. Specifically, we construct the prior and posterior of the latent space as a Poisson distribution using the firing rate of the spiking neurons. Subsequently, we propose a reparameterizable Poisson spiking sampling method, which is free from the additional network. Comprehensive experiments have been conducted, and the experimental results show that the proposed ESVAE outperforms previous SNN VAE methods in reconstructed & generated images quality. In addition, experiments demonstrate that ESVAE's encoder is able to retain the original image information more efficiently, and the decoder is more robust. The source code is available at https://github.com/QgZhan/ESVAE.
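
To illustrate the flavor of reparameterizable spike sampling, here is a generic straight-through sketch that draws Bernoulli spikes from firing rates while letting gradients flow back to the rates; the paper derives its own Poisson-based sampler, so this is only an analogy:

```python
import torch

def sample_spikes(firing_rate: torch.Tensor, time_steps: int = 16) -> torch.Tensor:
    """firing_rate in [0, 1], shape (batch, latent_dim); returns spikes of shape (time, batch, latent_dim)."""
    rate = firing_rate.clamp(0.0, 1.0).expand(time_steps, *firing_rate.shape)
    hard = (torch.rand_like(rate) < rate).float()   # Bernoulli spikes per time step
    return rate + (hard - rate).detach()            # straight-through: forward = hard, backward = identity

rate = torch.sigmoid(torch.randn(4, 32, requires_grad=True))
spikes = sample_spikes(rate)
spikes.mean().backward()        # gradients flow back through `rate` via the straight-through trick
print(spikes.shape)             # torch.Size([16, 4, 32])
```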

Calibration of Time-Series Forecasting Transformers: Detecting and Adapting Context-Driven Distribution Shift

  • paper_url: http://arxiv.org/abs/2310.14838
  • repo_url: None
  • paper_authors: Mouxiang Chen, Lefei Shen, Han Fu, Zhuo Li, Jianling Sun, Chenghao Liu
  • for: The work aims to improve the reliability of Transformer-based time-series forecasting, particularly under context-driven distribution shift.
  • methods: The work proposes a general detection method, the residual-based "Reconditionor", which quantifies a model's vulnerability to context-driven distribution shift, and an adapter framework, "SOLID", which fine-tunes the prediction layer on a contextually similar dataset curated for each test sample.
  • results: Experiments show that the approach consistently improves state-of-the-art Transformer forecasters on real-world datasets, especially in cases with substantial distribution shift detected by the Reconditionor.
    Abstract Recent years have witnessed the success of introducing Transformers to time series forecasting. From a data generation perspective, we illustrate that existing Transformers are susceptible to distribution shifts driven by temporal contexts, whether observed or unobserved. Such context-driven distribution shift (CDS) introduces biases in predictions within specific contexts and poses challenges for conventional training paradigm. In this paper, we introduce a universal calibration methodology for the detection and adaptation of CDS with a trained Transformer model. To this end, we propose a novel CDS detector, termed the "residual-based CDS detector" or "Reconditionor", which quantifies the model's vulnerability to CDS by evaluating the mutual information between prediction residuals and their corresponding contexts. A high Reconditionor score indicates a severe susceptibility, thereby necessitating model adaptation. In this circumstance, we put forth a straightforward yet potent adapter framework for model calibration, termed the "sample-level contextualized adapter" or "SOLID". This framework involves the curation of a contextually similar dataset to the provided test sample and the subsequent fine-tuning of the model's prediction layer with a limited number of steps. Our theoretical analysis demonstrates that this adaptation strategy is able to achieve an optimal equilibrium between bias and variance. Notably, our proposed Reconditionor and SOLID are model-agnostic and readily adaptable to a wide range of Transformers. Extensive experiments show that SOLID consistently enhances the performance of current SOTA Transformers on real-world datasets, especially on cases with substantial CDS detected by the proposed Reconditionor, thus validate the effectiveness of the calibration approach.

Harnessing Attention Mechanisms: Efficient Sequence Reduction using Attention-based Autoencoders

  • paper_url: http://arxiv.org/abs/2310.14837
  • repo_url: None
  • paper_authors: Daniel Biermann, Fabrizio Palumbo, Morten Goodwin, Ole-Christoffer Granmo
  • for: The paper explores an attention-based method that directly manipulates sequence length as an additional way to tune model performance.
  • methods: The method uses an attention mechanism inside an autoencoder to compress the input sequence to a shorter latent sequence and then reconstruct the original sequence from it (a plausible realization is sketched after the abstract below).
  • results: Experiments show that the autoencoder retains all significant information when reducing the sequence to half its original length, and still reconstructs the original sequence with roughly 90% accuracy when reducing it to a quarter of its original length.
    Abstract Many machine learning models use the manipulation of dimensions as a driving force to enable models to identify and learn important features in data. In the case of sequential data this manipulation usually happens on the token dimension level. Despite the fact that many tasks require a change in sequence length itself, the step of sequence length reduction usually happens out of necessity and in a single step. As far as we are aware, no model uses the sequence length reduction step as an additional opportunity to tune the models performance. In fact, sequence length manipulation as a whole seems to be an overlooked direction. In this study we introduce a novel attention-based method that allows for the direct manipulation of sequence lengths. To explore the method's capabilities, we employ it in an autoencoder model. The autoencoder reduces the input sequence to a smaller sequence in latent space. It then aims to reproduce the original sequence from this reduced form. In this setting, we explore the methods reduction performance for different input and latent sequence lengths. We are able to show that the autoencoder retains all the significant information when reducing the original sequence to half its original size. When reducing down to as low as a quarter of its original size, the autoencoder is still able to reproduce the original sequence with an accuracy of around 90%.
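
One plausible way to realize attention-based sequence-length reduction is to let a small set of learned query tokens cross-attend over the input sequence; the sketch below follows that assumption and is not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class AttentionPooler(nn.Module):
    """Reduce a (batch, seq_len, d) sequence to (batch, n_latent, d) with learned queries."""
    def __init__(self, d_model: int = 128, n_latent: int = 16, n_heads: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, n_latent, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.queries.expand(x.size(0), -1, -1)
        reduced, _ = self.attn(query=q, key=x, value=x)
        return reduced

x = torch.randn(8, 64, 128)            # original sequence of length 64
print(AttentionPooler()(x).shape)      # torch.Size([8, 16, 128]) -- compressed to length 16
```

In the autoencoder setting described above, a decoder would then attend back from this shorter latent sequence to reconstruct the sequence at its original length.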

Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias

  • paper_url: http://arxiv.org/abs/2310.14814
  • repo_url: None
  • paper_authors: Ambroise Odonnat, Vasilii Feofanov, Ievgen Redko
  • for: The paper proposes a new confidence measure for self-training in semi-supervised learning.
  • methods: Self-training iteratively assigns pseudo-labels to unlabeled data on which the model is confident and treats them as labeled examples; softmax probabilities, the usual confidence measure for neural networks, are known to be overconfident and are further distorted by sample selection bias.
  • results: The authors propose the $\mathcal{T}$-similarity, a confidence measure built on the prediction diversity of an ensemble of linear classifiers, provide a theoretical analysis relating member diversity to performance, and empirically demonstrate its benefit for three pseudo-labeling policies on classification datasets of various modalities (a loose illustration follows the abstract below).
    Abstract Self-training is a well-known approach for semi-supervised learning. It consists of iteratively assigning pseudo-labels to unlabeled data for which the model is confident and treating them as labeled examples. For neural networks, softmax prediction probabilities are often used as a confidence measure, despite the fact that they are known to be overconfident, even for wrong predictions. This phenomenon is particularly intensified in the presence of sample selection bias, i.e., when data labeling is subject to some constraint. To address this issue, we propose a novel confidence measure, called $\mathcal{T}$-similarity, built upon the prediction diversity of an ensemble of linear classifiers. We provide the theoretical analysis of our approach by studying stationary points and describing the relationship between the diversity of the individual members and their performance. We empirically demonstrate the benefit of our confidence measure for three different pseudo-labeling policies on classification datasets of various data modalities.
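
The precise definition of the $\mathcal{T}$-similarity is given in the paper; as a loose, hypothetical illustration of a diversity-aware confidence, one can score each unlabeled sample by how strongly the ensemble's linear heads agree:

```python
import numpy as np

def ensemble_agreement_confidence(probs: np.ndarray) -> np.ndarray:
    """probs: (n_heads, n_samples, n_classes) softmax outputs of an ensemble of linear classifiers.
    Returns, per sample, the mean pairwise inner product of head predictions (high = heads agree)."""
    n_heads = probs.shape[0]
    pair_scores = []
    for i in range(n_heads):
        for j in range(i + 1, n_heads):
            pair_scores.append((probs[i] * probs[j]).sum(axis=-1))
    return np.mean(pair_scores, axis=0)

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 200, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
conf = ensemble_agreement_confidence(probs)
pseudo_label_mask = conf > np.quantile(conf, 0.8)   # keep only the most confidently agreed samples
print(pseudo_label_mask.sum(), "samples selected for pseudo-labeling")
```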

Large Language Models can Share Images, Too!

  • paper_url: http://arxiv.org/abs/2310.14804
  • repo_url: https://github.com/passing2961/LLM-Share-Image
  • paper_authors: Young-Jun Lee, Jonghwan Hyeon, Ho-Jin Choi
  • for: The paper explores the image-sharing capability of Large Language Models (LLMs) in a zero-shot setting, without visual foundation models.
  • methods: A two-stage framework lets LLMs predict potential image-sharing turns and generate related image descriptions using an effective restriction-based prompt template.
  • results: Extensive experiments unlock the image-sharing capability of LLMs in zero-shot prompting, with GPT-4 achieving the best performance, and reveal an emergent image-sharing ability driven by the restriction-based prompts in both stages of the framework.
    Abstract This paper explores the image-sharing capability of Large Language Models (LLMs), such as InstructGPT, ChatGPT, and GPT-4, in a zero-shot setting, without the help of visual foundation models. Inspired by the two-stage process of image-sharing in human dialogues, we propose a two-stage framework that allows LLMs to predict potential image-sharing turns and generate related image descriptions using our effective restriction-based prompt template. With extensive experiments, we unlock the \textit{image-sharing} capability of LLMs in zero-shot prompting, with GPT-4 achieving the best performance. Additionally, we uncover the emergent \textit{image-sharing} ability in zero-shot prompting, demonstrating the effectiveness of restriction-based prompts in both stages of our framework. Based on this framework, we augment the PhotoChat dataset with images generated by Stable Diffusion at predicted turns, namely PhotoChat++. To our knowledge, this is the first study to assess the image-sharing ability of LLMs in a zero-shot setting without visual foundation models. The source code and the dataset will be released after publication.

Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages

  • paper_url: http://arxiv.org/abs/2310.14799
  • repo_url: None
  • paper_authors: Libo Qin, Qiguang Chen, Fuxuan Wei, Shijue Huang, Wanxiang Che
  • for: To improve the accuracy and generality of zero-shot chain-of-thought reasoning across languages.
  • methods: The paper proposes cross-lingual prompting (CLP), consisting of cross-lingual alignment prompting and task-specific solver prompting, together with cross-lingual self-consistent prompting (CLSP), which ensembles reasoning paths across languages.
  • results: Experiments on several benchmarks show that CLP and CLSP significantly outperform existing prompting methods and achieve state-of-the-art performance.
    Abstract Chain-of-thought (CoT) is capable of eliciting models to explicitly generate reasoning paths, thus promoting reasoning accuracy and attracting increasing attention. Specifically, zero-shot CoT achieves remarkable improvements in a wide range of reasoning tasks by simply instructing the LLM with the prompt "Let's think step by step!". Despite the success of zero-shot CoT, the existing zero-shot prompting techniques remain limited to a single language, making it challenging to generalize to other languages and hindering global development. In this work, we introduce cross-lingual prompting (CLP), aiming to improve zero-shot CoT reasoning across languages. Specifically, CLP consists of two main components: (1) cross-lingual alignment prompting and (2) task-specific solver prompting. The cross-lingual alignment prompting is responsible for aligning representations across different languages, whereas the task-specific solver prompting is used to generate the final chain of thoughts and results for the reasoning task. In addition, we further introduce cross-lingual self-consistent prompting (CLSP) to ensemble different reasoning paths across languages. Our experimental evaluations on several benchmarks demonstrate that CLP and CLSP significantly outperform the existing prompting methods and achieve state-of-the-art performance. We hope this work will inspire further breakthroughs in cross-lingual CoT.

What do Deck Chairs and Sun Hats Have in Common? Uncovering Shared Properties in Large Concept Vocabularies

  • paper_url: http://arxiv.org/abs/2310.14793
  • repo_url: None
  • paper_authors: Amit Gajbhiye, Zied Bouraoui, Na Li, Usashi Chatterjee, Luis Espinosa Anke, Steven Schockaert
  • for: The study aims to improve how concepts are modelled in the absence of sentence context, so that they can be handled better in downstream applications.
  • methods: The study identifies what different concepts from a potentially large concept vocabulary have in common, and represents each concept in terms of the properties it shares with other concepts.
  • results: Augmenting the label set with shared properties improves the performance of state-of-the-art models on ultra-fine entity typing, a challenging multi-label classification task.
    Abstract Concepts play a central role in many applications. This includes settings where concepts have to be modelled in the absence of sentence context. Previous work has therefore focused on distilling decontextualised concept embeddings from language models. But concepts can be modelled from different perspectives, whereas concept embeddings typically mostly capture taxonomic structure. To address this issue, we propose a strategy for identifying what different concepts, from a potentially large concept vocabulary, have in common with others. We then represent concepts in terms of the properties they share with the other concepts. To demonstrate the practical usefulness of this way of modelling concepts, we consider the task of ultra-fine entity typing, which is a challenging multi-label classification problem. We show that by augmenting the label set with shared properties, we can improve the performance of the state-of-the-art models for this task.

Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning

  • paper_url: http://arxiv.org/abs/2310.14785
  • repo_url: None
  • paper_authors: Hao Wang, Xiahua Chen, Rui Wang, Chenhui Chu
  • for: The paper addresses extracting meaningful entities of predefined categories from visually-rich form-like documents.
  • methods: It proposes a Visually-Asymmetric coNsistenCy Learning (\textsc{Vancl}) approach that incorporates color priors to strengthen the model's ability to capture fine-grained visual and layout features.
  • results: Experiments on benchmark datasets show that the approach substantially outperforms the strong LayoutLM-series baselines; the paper also investigates how different color schemes affect performance, offering insights for future multimodal information extraction research.
    Abstract Extracting meaningful entities belonging to predefined categories from Visually-rich Form-like Documents (VFDs) is a challenging task. Visual and layout features such as font, background, color, and bounding box location and size provide important cues for identifying entities of the same type. However, existing models commonly train a visual encoder with weak cross-modal supervision signals, resulting in a limited capacity to capture these non-textual features and suboptimal performance. In this paper, we propose a novel \textbf{V}isually-\textbf{A}symmetric co\textbf{N}sisten\textbf{C}y \textbf{L}earning (\textsc{Vancl}) approach that addresses the above limitation by enhancing the model's ability to capture fine-grained visual and layout features through the incorporation of color priors. Experimental results on benchmark datasets show that our approach substantially outperforms the strong LayoutLM series baseline, demonstrating the effectiveness of our approach. Additionally, we investigate the effects of different color schemes on our approach, providing insights for optimizing model performance. We believe our work will inspire future research on multimodal information extraction.

An Efficient Imbalance-Aware Federated Learning Approach for Wearable Healthcare with Autoregressive Ratio Observation

  • paper_url: http://arxiv.org/abs/2310.14784
  • repo_url: None
  • paper_authors: Wenhao Yan, He Li, Kaoru Ota, Mianxiong Dong
  • for: This paper aims to address the challenges of class imbalance in federated learning scenarios.
  • methods: The proposed FedImT framework uses an online scheme to estimate data composition and a self-attenuating iterative method to track variations and adjust loss computation for minority classes.
  • results: Experiments demonstrate the effectiveness of FedImT in solving the imbalance problem without extra energy consumption and without incurring privacy risks.
    Abstract Widely available healthcare services are now getting popular because of advancements in wearable sensing techniques and mobile edge computing. People's health information is collected by edge devices such as smartphones and wearable bands for further analysis on servers, then send back suggestions and alerts for abnormal conditions. The recent emergence of federated learning allows users to train private data on local devices while updating models collaboratively. However, the heterogeneous distribution of the health condition data may lead to significant risks to model performance due to class imbalance. Meanwhile, as FL training is powered by sharing gradients only with the server, training data is almost inaccessible. The conventional solutions to class imbalance do not work for federated learning. In this work, we propose a new federated learning framework FedImT, dedicated to addressing the challenges of class imbalance in federated learning scenarios. FedImT contains an online scheme that can estimate the data composition during each round of aggregation, then introduces a self-attenuating iterative equivalent to track variations of multiple estimations and promptly tweak the balance of the loss computing for minority classes. Experiments demonstrate the effectiveness of FedImT in solving the imbalance problem without extra energy consumption and avoiding privacy risks.

Evaluating the Knowledge Base Completion Potential of GPT

  • paper_url: http://arxiv.org/abs/2310.14771
  • repo_url: None
  • paper_authors: Blerta Veseli, Simon Razniewski, Jan-Christoph Kalo, Gerhard Weikum
  • for: This paper is written for evaluating the ability of language models (LMs) to complete structured knowledge bases (KBs) at scale and with high accuracy.
  • methods: The paper uses GPT-3, ChatGPT, and GPT-4 to perform unsupervised knowledge base completion (KBC) on the largest public KB, Wikidata.
  • results: The paper finds that, despite the size and capabilities of GPT-3 and other models, they do not achieve fully convincing results on this task, but they do provide solid improvements over earlier approaches with smaller LMs. Specifically, with proper thresholding, GPT-3 enables the extension of Wikidata by 27M facts at 90% precision.
    Abstract Structured knowledge bases (KBs) are an asset for search engines and other applications, but are inevitably incomplete. Language models (LMs) have been proposed for unsupervised knowledge base completion (KBC), yet, their ability to do this at scale and with high accuracy remains an open question. Prior experimental studies mostly fall short because they only evaluate on popular subjects, or sample already existing facts from KBs. In this work, we perform a careful evaluation of GPT's potential to complete the largest public KB: Wikidata. We find that, despite their size and capabilities, models like GPT-3, ChatGPT and GPT-4 do not achieve fully convincing results on this task. Nonetheless, they provide solid improvements over earlier approaches with smaller LMs. In particular, we show that, with proper thresholding, GPT-3 enables to extend Wikidata by 27M facts at 90% precision.

Policy Gradient with Kernel Quadrature

  • paper_url: http://arxiv.org/abs/2310.14768
  • repo_url: None
  • paper_authors: Satoshi Hayakawa, Tetsuro Morimura
  • for: The goal is to make reward evaluation of episodes more efficient by selecting a small but representative subset of a large batch of episodes and computing rewards only on that subset for more efficient policy gradient iterations.
  • methods: The method builds a Gaussian process model of discounted returns or rewards to derive a positive definite kernel on the space of episodes, runs an "episodic" kernel quadrature method to compress the information of the sampled episodes, and passes the reduced episodes to the policy network for gradient updates (a generic compression sketch follows the abstract below).
  • results: The paper presents the theoretical background of the procedure together with numerical illustrations on MuJoCo and causal discovery tasks.
    Abstract Reward evaluation of episodes becomes a bottleneck in a broad range of reinforcement learning tasks. Our aim in this paper is to select a small but representative subset of a large batch of episodes, only on which we actually compute rewards for more efficient policy gradient iterations. We build a Gaussian process modeling of discounted returns or rewards to derive a positive definite kernel on the space of episodes, run an "episodic" kernel quadrature method to compress the information of sample episodes, and pass the reduced episodes to the policy network for gradient updates. We present the theoretical background of this procedure as well as its numerical illustrations in MuJoCo and causal discovery tasks.
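
As a stand-in for the episodic kernel quadrature step, the sketch below compresses a batch of episodes by kernel herding on episode feature vectors; it is a generic subset-selection heuristic over invented features, not the paper's Gaussian-process-based construction:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def herd_select(features: np.ndarray, m: int) -> list:
    """Greedily pick m episodes whose kernel mean embedding tracks that of the full batch."""
    K = rbf_kernel(features, features)
    mean_embedding = K.mean(axis=1)          # k(x_i, .) averaged over the full batch
    selected = []
    for _ in range(m):
        penalty = K[:, selected].sum(axis=1) if selected else np.zeros(len(features))
        scores = mean_embedding - penalty / (len(selected) + 1)
        scores[selected] = -np.inf           # sample without replacement
        selected.append(int(np.argmax(scores)))
    return selected

rng = np.random.default_rng(0)
episode_feats = rng.normal(size=(500, 8))    # e.g. summaries of 500 sampled episodes
subset = herd_select(episode_feats, m=32)    # reward evaluation / gradients use only these 32
print(subset[:5])
```

A full quadrature rule would also attach weights to the selected episodes; equal weights are implied here for brevity.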

The Safety Challenges of Deep Learning in Real-World Type 1 Diabetes Management

  • paper_url: http://arxiv.org/abs/2310.14743
  • repo_url: https://github.com/hemerson1/openaps_cleaner
  • paper_authors: Harry Emerson, Ryan McConville, Matthew Guy
  • for: The study evaluates whether deep learning algorithms trained on real-world data can safely model glucose dynamics for type 1 diabetes (T1D) management without harming patients.
  • methods: Free-living data from the OpenAPS Data Commons was processed and supplemented with patient-reported tags of challenging diabetes events, forming one of the most detailed real-world T1D datasets; this dataset was used to train and evaluate state-of-the-art glucose simulators, comparing their prediction errors in safety-critical scenarios and assessing the physiological appropriateness of the learned dynamics with SHAP.
  • results: Deep learning surpassed the widely used mathematical simulator in prediction accuracy, but the models deteriorated in safety-critical scenarios, failed to leverage self-reported meal and exercise information, and, according to the SHAP analysis, confused the roles of insulin and carbohydrates, one of the most basic T1D management principles. The work highlights the importance of physiological appropriateness when applying deep learning to real-world systems in T1D and healthcare more broadly, and offers recommendations for building models robust to real-world data constraints.
    Abstract Blood glucose simulation allows the effectiveness of type 1 diabetes (T1D) management strategies to be evaluated without patient harm. Deep learning algorithms provide a promising avenue for extending simulator capabilities; however, these algorithms are limited in that they do not necessarily learn physiologically correct glucose dynamics and can learn incorrect and potentially dangerous relationships from confounders in training data. This is likely to be more important in real-world scenarios, as data is not collected under strict research protocol. This work explores the implications of using deep learning algorithms trained on real-world data to model glucose dynamics. Free-living data was processed from the OpenAPS Data Commons and supplemented with patient-reported tags of challenging diabetes events, constituting one of the most detailed real-world T1D datasets. This dataset was used to train and evaluate state-of-the-art glucose simulators, comparing their prediction error across safety critical scenarios and assessing the physiological appropriateness of the learned dynamics using Shapley Additive Explanations (SHAP). While deep learning prediction accuracy surpassed the widely-used mathematical simulator approach, the model deteriorated in safety critical scenarios and struggled to leverage self-reported meal and exercise information. SHAP value analysis also indicated the model had fundamentally confused the roles of insulin and carbohydrates, which is one of the most basic T1D management principles. This work highlights the importance of considering physiological appropriateness when using deep learning to model real-world systems in T1D and healthcare more broadly, and provides recommendations for building models that are robust to real-world data constraints.

Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review

  • paper_url: http://arxiv.org/abs/2310.14735
  • repo_url: None
  • paper_authors: Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, Shengxin Zhu
  • for: The review examines prompt engineering techniques for Large Language Models (LLMs) as a way to unleash their capabilities.
  • methods: It covers foundational principles such as role prompting and one-shot and few-shot prompting, as well as more advanced methodologies such as chain-of-thought and tree-of-thoughts prompting, and discusses how external plugins can assist prompting and reduce machine hallucination by retrieving external knowledge (a minimal chain-of-thought prompt sketch follows the abstract below).
  • results: The review highlights the transformative potential of prompt engineering in fields such as education and programming, and outlines prospective research directions, including a deeper understanding of prompt structures and the role of agents in Artificial Intelligence-Generated Content (AIGC) tools.
    Abstract This paper delves into the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs). Prompt engineering is the process of structuring input text for LLMs and is a technique integral to optimizing the efficacy of LLMs. This survey elucidates foundational principles of prompt engineering, such as role-prompting, one-shot, and few-shot prompting, as well as more advanced methodologies such as the chain-of-thought and tree-of-thoughts prompting. The paper sheds light on how external assistance in the form of plugins can assist in this task, and reduce machine hallucination by retrieving external knowledge. We subsequently delineate prospective directions in prompt engineering research, emphasizing the need for a deeper understanding of structures and the role of agents in Artificial Intelligence-Generated Content (AIGC) tools. We discuss how to assess the efficacy of prompt methods from different perspectives and using different methods. Finally, we gather information about the application of prompt engineering in such fields as education and programming, showing its transformative potential. This comprehensive survey aims to serve as a friendly guide for anyone venturing through the big world of LLMs and prompt engineering.
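
For concreteness, the zero-shot chain-of-thought technique surveyed here boils down to appending a reasoning trigger such as "Let's think step by step" to the query; below is a minimal, provider-agnostic sketch of building such a prompt (the message format is illustrative):

```python
def zero_shot_cot(question: str) -> list:
    """Build a chat-style message list that elicits step-by-step reasoning before the answer."""
    return [
        {"role": "system", "content": "You are a careful reasoner."},
        {"role": "user", "content": f"{question}\n\nLet's think step by step, "
                                    "then give the final answer on its own line."},
    ]

messages = zero_shot_cot("A train travels 60 km in 45 minutes. What is its average speed in km/h?")
for m in messages:
    print(m["role"], ":", m["content"])
```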

Predicting Transcription Factor Binding Sites using Transformer based Capsule Network

  • paper_url: http://arxiv.org/abs/2310.15202
  • repo_url: https://github.com/NimishaGhosh/DNABERT-Cap
  • paper_authors: Nimisha Ghosh, Daniele Santoni, Indrajit Saha, Giovanni Felici
  • for: Predicting transcription factor binding sites, in order to understand how transcription factors regulate gene expression and how this regulation can be modulated for therapeutic purposes.
  • methods: Proposes a transformer-based capsule network (DNABERT-Cap) that is pre-trained on a large number of genomic DNA sequences, with a capsule layer responsible for the final prediction. The model jointly optimizes features from the bidirectional encoder and the capsule layer, together with convolutional and bidirectional long short-term memory layers, to build the binding-site predictor.
  • results: Evaluated on ChIP-seq data for five cell lines from the ENCODE repository (A549, GM12878, Hep-G2, H1-hESC and Hela), DNABERT-Cap achieves an average area under the receiver operating characteristic curve above 0.91 on all five cell lines and outperforms existing deep-learning-based predictors such as DeepARC, DeepTF, CNN-Zeng and DeepBind.
    Abstract Prediction of binding sites for transcription factors is important to understand how they regulate gene expression and how this regulation can be modulated for therapeutic purposes. Although in the past few years there are significant works addressing this issue, there is still space for improvement. In this regard, a transformer based capsule network viz. DNABERT-Cap is proposed in this work to predict transcription factor binding sites mining ChIP-seq datasets. DNABERT-Cap is a bidirectional encoder pre-trained with large number of genomic DNA sequences, empowered with a capsule layer responsible for the final prediction. The proposed model builds a predictor for transcription factor binding sites using the joint optimisation of features encompassing both bidirectional encoder and capsule layer, along with convolutional and bidirectional long-short term memory layers. To evaluate the efficiency of the proposed approach, we use a benchmark ChIP-seq datasets of five cell lines viz. A549, GM12878, Hep-G2, H1-hESC and Hela, available in the ENCODE repository. The results show that the average area under the receiver operating characteristic curve score exceeds 0.91 for all such five cell lines. DNABERT-Cap is also compared with existing state-of-the-art deep learning based predictors viz. DeepARC, DeepTF, CNN-Zeng and DeepBind, and is seen to outperform them.
    摘要 预测蛋白质因子绑定位点是理解蛋白质因子如何调控蛋白质表达的关键,以及如何通过调控来实现治疗目标。虽然过去几年来有很多研究addressing这个问题,但还有很多空间可以进行改进。在这个 regard,本文提出了一种基于 transformer 的宫墩网络模型,称为 DNABERT-Cap,用于预测蛋白质因子绑定位点,并 mine ChIP-seq 数据集。DNABERT-Cap 是一种双向编码器,预先训练了大量的 genomic DNA 序列,并具有一个负责最终预测的宫墩层。提出的模型建立了一个基于 joint 优化的预测器,包括双向编码器和宫墩层,以及 convolutional 和双向 long-short term memory 层。为评估提出的方法的效率,我们使用了五个 cell line 的 ChIP-seq 数据集,即 A549、GM12878、Hep-G2、H1-hESC 和 Hela,这些数据集可以在 ENCODE 存储库中找到。结果显示,DNABERT-Cap 的平均 receiver operating characteristic curve 分数超过 0.91 的所有五个 cell line。此外,DNABERT-Cap 还与现有的深度学习基于预测器 viz. DeepARC、DeepTF、CNN-Zeng 和 DeepBind 进行比较,并被证明超越了它们。

Generating Prototypes for Contradiction Detection Using Large Language Models and Linguistic Rules

  • paper_url: http://arxiv.org/abs/2310.14732
  • repo_url: https://github.com/fraunhofer-iais/informed_nlu
  • paper_authors: Maren Pielka, Svetlana Schmidt, Rafet Sifa
  • for: The paper proposes a new data generation method for contradiction detection that leverages the generative power of large language models together with linguistic rules.
  • methods: Large language models are instructed to generate contradicting statements for descriptions of specific contradiction types, while linguistic rules are followed to construct simple contradictions such as those arising from negation, antonymy and numeric mismatch.
  • results: The method yields coherent and varied data that can be used for language model fine-tuning, although further study and manual refinement are needed before the data can be used in a machine learning setup.
    Abstract We introduce a novel data generation method for contradiction detection, which leverages the generative power of large language models as well as linguistic rules. Our vision is to provide a condensed corpus of prototypical contradictions, allowing for in-depth linguistic analysis as well as efficient language model fine-tuning. To this end, we instruct the generative models to create contradicting statements with respect to descriptions of specific contradiction types. In addition, the model is also instructed to come up with completely new contradiction typologies. As an auxiliary approach, we use linguistic rules to construct simple contradictions such as those arising from negation, antonymy and numeric mismatch. We find that our methods yield promising results in terms of coherence and variety of the data. Further studies, as well as manual refinement are necessary to make use of this data in a machine learning setup.
    摘要 我们介绍了一种新的数据生成方法,用于探测矛盾,这种方法利用大语言模型的生成力以及语言规则。我们的目标是提供一个简化的矛盾词库,以便进行深入的语言分析以及语言模型精细调整。为此,我们指定生成模型创造与特定矛盾类型相关的矛盾声明,同时也让模型创造新的矛盾类型。此外,我们还使用语言规则构建简单的矛盾,如从否定、反义和数字差异中导致的矛盾。我们发现,我们的方法可以生成具有凝聚力和多样性的数据,进一步研究和人工优化是必要的,以使用这些数据在机器学习设置中。
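A rough sketch (not the authors' code) of the rule-based part described above: producing simple contradictions via negation, antonym substitution, and numeric mismatch. The antonym table, regular expressions, and example sentence are illustrative assumptions; a real pipeline would use proper parsing and lexical resources.

```python
import random
import re

ANTONYMS = {"open": "closed", "rising": "falling", "cheap": "expensive"}

def negate(sentence: str) -> str:
    # Naive negation: insert "not" after the first copula.
    return re.sub(r"\b(is|are|was|were)\b", r"\1 not", sentence, count=1)

def swap_antonym(sentence: str) -> str:
    # Replace the first word that has a known antonym.
    for word, opposite in ANTONYMS.items():
        if re.search(rf"\b{word}\b", sentence):
            return re.sub(rf"\b{word}\b", opposite, sentence, count=1)
    return sentence

def perturb_number(sentence: str) -> str:
    # Numeric mismatch: change the first number found.
    return re.sub(r"\d+", lambda m: str(int(m.group()) + random.randint(1, 9)),
                  sentence, count=1)

premise = "The store is open and sells 12 kinds of bread."
for rule in (negate, swap_antonym, perturb_number):
    print(f"{rule.__name__:>14}: {premise}  <->  {rule(premise)}")
```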

A Survey on LLM-generated Text Detection: Necessity, Methods, and Future Directions

  • paper_url: http://arxiv.org/abs/2310.14724
  • repo_url: https://github.com/nlp2ct/llm-generated-text-detection
  • paper_authors: Junchao Wu, Shu Yang, Runzhe Zhan, Yulin Yuan, Derek F. Wong, Lidia S. Chao
  • for: The paper is written to detect LLM-generated text and to mitigate the potential misuse of LLMs in various areas, such as artistic expression and social networks.
  • methods: The paper discusses various techniques for LLM-generated text detection, including watermarking, zero-shot methods, fine-tuning LMs, adversarial learning, and human-assisted methods.
  • results: The paper highlights recent research breakthroughs in LLM-generated text detection and emphasizes the need for further research to improve the accuracy and robustness of detectors. It also discusses the limitations and developmental requirements of prevalent datasets and analyzes various detection paradigms, shedding light on challenges such as out-of-distribution problems, potential attacks, and data ambiguity.
    Abstract The powerful ability to understand, follow, and generate complex language emerging from large language models (LLMs) makes LLM-generated text flood many areas of our daily lives at an incredible speed and is widely accepted by humans. As LLMs continue to expand, there is an imperative need to develop detectors that can detect LLM-generated text. This is crucial to mitigate potential misuse of LLMs and safeguard realms like artistic expression and social networks from harmful influence of LLM-generated content. The LLM-generated text detection aims to discern if a piece of text was produced by an LLM, which is essentially a binary classification task. The detector techniques have witnessed notable advancements recently, propelled by innovations in watermarking techniques, zero-shot methods, fine-turning LMs methods, adversarial learning methods, LLMs as detectors, and human-assisted methods. In this survey, we collate recent research breakthroughs in this area and underscore the pressing need to bolster detector research. We also delve into prevalent datasets, elucidating their limitations and developmental requirements. Furthermore, we analyze various LLM-generated text detection paradigms, shedding light on challenges like out-of-distribution problems, potential attacks, and data ambiguity. Conclusively, we highlight interesting directions for future research in LLM-generated text detection to advance the implementation of responsible artificial intelligence (AI). Our aim with this survey is to provide a clear and comprehensive introduction for newcomers while also offering seasoned researchers a valuable update in the field of LLM-generated text detection. The useful resources are publicly available at: https://github.com/NLP2CT/LLM-generated-Text-Detection.
    摘要 “LLM生成文本检测技术在当前的应用和发展中具有强大的能力,能够识别和检测LLM生成的文本。随着LLM的不断扩展,有必要开发检测LLM生成文本的技术,以避免LLM生成的文本在艺术表达和社交媒体等领域产生不良影响。LLM生成文本检测的目标是判断一个文本是否由LLM生成的,这是一个二分类问题。在最近几年内,检测技术有了很大的进步,它们包括水印技术、零shot方法、细腻LM方法、对抗学习方法、LLM作为检测器和人工协助方法。在这篇评论中,我们收集了最新的研究突破和推动 LLM生成文本检测的技术。我们还探讨了常见的数据集,描述了它们的局限性和发展需求。此外,我们分析了不同的LLM生成文本检测方法,描述了它们面临的挑战,如非典型输入、攻击和数据抖抖。最后,我们指出了未来研究的有优点的方向,以便推动负责任人工智能的实施。我们的目标是通过这篇评论,为新手提供一个清晰的入门,同时为经验老研究人员提供一个有价值的更新。有关的有用资源可以在:https://github.com/NLP2CT/LLM-generated-Text-Detection 中找到。”
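A hedged sketch of one detection family the survey mentions (zero-shot methods): score a text by its perplexity under an open language model, on the intuition that LLM-generated text tends to be more predictable. The GPT-2 backbone, the threshold, and the example strings are placeholders; real detectors are considerably more elaborate than this.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss          # mean per-token cross-entropy
    return float(torch.exp(loss))

THRESHOLD = 40.0                                # illustrative cut-off only
for text in ["The mitochondria is the powerhouse of the cell.",
             "Colorless green ideas sleep furiously near the old harbor."]:
    ppl = perplexity(text)
    print(f"ppl={ppl:6.1f}  flagged-as-LLM={ppl < THRESHOLD}  | {text}")
```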

A Skin Microbiome Model with AMP interactions and Analysis of Quasi-Stability vs Stability in Population Dynamics

  • paper_url: http://arxiv.org/abs/2310.15201
  • repo_url: None
  • paper_authors: Eléa Thibault Greugny, François Fages, Ovidiu Radulescu, Peter Szmolyan, Georgios Stamatas
  • for: Studies the stability of the skin microbiome and its importance for maintaining healthy skin.
  • methods: Uses a mathematical model based on ordinary differential equations, together with quantitative temporal logic, to study the interactions and competition between the populations of the skin microbiome.
  • results: The model predicts that an elevated skin surface pH favors the emergence and colonization of opportunistic pathogens, while the production of human AMPs has a non-linear effect on the balance between populations; simulations over longer time scales also reveal a quasi-stable state, which is analyzed with tropical algebraic methods.
    Abstract The skin microbiome plays an important role in the maintenance of a healthy skin. It is an ecosystem, composed of several species, competing for resources and interacting with the skin cells. Imbalance in the cutaneous microbiome, also called dysbiosis, has been correlated with several skin conditions, including acne and atopic dermatitis. Generally, dysbiosis is linked to colonization of the skin by a population of opportunistic pathogenic bacteria. Treatments consisting in non-specific elimination of cutaneous microflora have shown conflicting results. In this article, we introduce a mathematical model based on ordinary differential equations, with 2 types of bacteria populations (skin commensals and opportunistic pathogens) and including the production of antimicrobial peptides to study the mechanisms driving the dominance of one population over the other. By using published experimental data, assumed to correspond to the observation of stable states in our model, we reduce the number of parameters of the model from 13 to 5. We then use a formal specification in quantitative temporal logic to calibrate our model by global parameter optimization and perform sensitivity analyses. On the time scale of 2 days of the experiments, the model predicts that certain changes of the environment, like the elevation of skin surface pH, create favorable conditions for the emergence and colonization of the skin by the opportunistic pathogen population, while the production of human AMPs has non-linear effect on the balance between pathogens and commensals. Surprisingly, simulations on longer time scales reveal that the equilibrium reached around 2 days can in fact be a quasi-stable state followed by the reaching of a reversed stable state after 12 days or more. We analyse the conditions of quasi-stability observed in this model using tropical algebraic methods, and show their non-generic character in contrast to slow-fast systems. These conditions are then generalized to a large class of population dynamics models over any number of species.
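A toy two-population model in the spirit of the one described above (not the authors' equations): commensals C and opportunistic pathogens P compete for a shared resource while an antimicrobial peptide (AMP) term preferentially suppresses the pathogen. All parameter values and the 48-hour horizon are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

r_c, r_p = 1.0, 1.2        # growth rates of commensals and pathogens
K = 1.0                    # shared carrying capacity
amp = 0.5                  # AMP production level (acts mainly on the pathogen)

def rhs(t, y):
    C, P = y
    total = C + P
    dC = r_c * C * (1 - total / K) - 0.05 * amp * C
    dP = r_p * P * (1 - total / K) - 0.60 * amp * P
    return [dC, dP]

sol = solve_ivp(rhs, t_span=(0, 48), y0=[0.2, 0.05], dense_output=True)
C48, P48 = sol.y[:, -1]
print(f"after 48 h: commensals={C48:.3f}, pathogens={P48:.3f}")
```

Lowering the `amp` term or raising the pathogen growth rate (a stand-in for an elevated skin pH) shifts such a model toward pathogen dominance, which is the qualitative behavior the paper reports.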

BatteryML:An Open-source platform for Machine Learning on Battery Degradation

  • paper_url: http://arxiv.org/abs/2310.14714
  • repo_url: https://github.com/microsoft/batteryml
  • paper_authors: Han Zhang, Xiaofan Gui, Shun Zheng, Ziheng Lu, Yuqi Li, Jiang Bian
  • for: The paper presents a one-step, all-encompassing, open-source platform that helps researchers with data preprocessing, feature extraction, and model implementation, improving the practicality and efficiency of battery research.
  • methods: The platform unifies data preprocessing, feature extraction, and the implementation of both traditional and state-of-the-art models.
  • results: Experiments show that BatteryML helps researchers better understand and predict battery degradation behavior, improving the efficiency and reusability of battery research.
    Abstract Battery degradation remains a pivotal concern in the energy storage domain, with machine learning emerging as a potent tool to drive forward insights and solutions. However, this intersection of electrochemical science and machine learning poses complex challenges. Machine learning experts often grapple with the intricacies of battery science, while battery researchers face hurdles in adapting intricate models tailored to specific datasets. Beyond this, a cohesive standard for battery degradation modeling, inclusive of data formats and evaluative benchmarks, is conspicuously absent. Recognizing these impediments, we present BatteryML - a one-step, all-encompass, and open-source platform designed to unify data preprocessing, feature extraction, and the implementation of both traditional and state-of-the-art models. This streamlined approach promises to enhance the practicality and efficiency of research applications. BatteryML seeks to fill this void, fostering an environment where experts from diverse specializations can collaboratively contribute, thus elevating the collective understanding and advancement of battery research.The code for our project is publicly available on GitHub at https://github.com/microsoft/BatteryML.
    摘要 锂电池衰退仍然是能量存储领域中的一个关键问题,机器学习技术在解决这个问题上表现出了潜在的优势。然而,这两个领域的交叉点也存在许多复杂的挑战。机器学习专家经常遇到锂电池科学中的复杂性,而锂电池研究人员则面临着适应特定数据集的复杂模型的挑战。此外,一个包容性的锂电池衰退模型标准,包括数据格式和评估标准,缺失着。认识到这些障碍,我们提出了锂电池ML(BatteryML) - 一个一步、全面、开源的平台,旨在统一数据预处理、特征提取和传统和当前模型的实现。这种流lined的方法将提高研究应用的实用性和效率。锂电池ML旨在填补这个空白,创造一个多元专业人员合作的环境,提高锂电池研究的共同理解和进步。我们的项目代码公开 disponibles on GitHub at .

Random Forest Dissimilarity for High-Dimension Low Sample Size Classification

  • paper_url: http://arxiv.org/abs/2310.14710
  • repo_url: None
  • paper_authors: Lucca Portes Cavalheiro, Simon Bernard, Jean Paul Barddal, Laurent Heutte
  • for: solves high-dimensional low-sample-size (HDLSS) classification problems
  • methods: uses a learned precomputed support vector machine (SVM) kernel based on the random forest (RF) similarity measure
  • results: outperforms existing methods for the majority of HDLSS problems and remains competitive for low or non-HDLSS problems
    Abstract High dimension, low sample size (HDLSS) problems are numerous among real-world applications of machine learning. From medical images to text processing, traditional machine learning algorithms are usually unsuccessful in learning the best possible concept from such data. In a previous work, we proposed a dissimilarity-based approach for multi-view classification, the Random Forest Dissimilarity (RFD), that achieves state-of-the-art results for such problems. In this work, we transpose the core principle of this approach to solving HDLSS classification problems, by using the RF similarity measure as a learned precomputed SVM kernel (RFSVM). We show that such a learned similarity measure is particularly suited and accurate for this classification context. Experiments conducted on 40 public HDLSS classification datasets, supported by rigorous statistical analyses, show that the RFSVM method outperforms existing methods for the majority of HDLSS problems and remains at the same time very competitive for low or non-HDLSS problems.
    摘要 高维度低样本数(HDLSS)问题在实际应用中很普遍。从医疗图像到文本处理,传统机器学习算法通常无法从这种数据中学习最佳概念。在前一项工作中,我们提出了一种相似度基于的多视图分类方法,即随机森林相似度(RFD),其在这些问题上实现了状态艺术结果。在这项工作中,我们将把核心原理转移到解决HDLSSB类型问题上,通过使用RF相似度度量作为学习得到的SVM核度(RFSVM)。我们表明该学习相似度度量在这种分类上特别适合和准确。对40个公共HDLSSB分类数据集进行了严格的统计分析,并证明RFSVM方法在大多数HDLSSB问题上超越现有方法,并且在低或非HDLSSB问题上保持竞争力。
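A small sketch of the idea summarized above, under the assumption that the random-forest similarity between two samples is the fraction of trees in which they land in the same leaf; this similarity matrix is then passed to an SVM as a precomputed kernel. It is an approximation for illustration, not the authors' exact RFD construction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A toy HDLSS-like setting: many features, few samples.
X, y = make_classification(n_samples=120, n_features=500, n_informative=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(Xtr, ytr)

def rf_similarity(A, B):
    # apply() returns, for each sample, the leaf index it reaches in every tree.
    la, lb = rf.apply(A), rf.apply(B)
    return (la[:, None, :] == lb[None, :, :]).mean(axis=2)   # co-leaf frequency

svm = SVC(kernel="precomputed").fit(rf_similarity(Xtr, Xtr), ytr)
print("test accuracy:", svm.score(rf_similarity(Xte, Xtr), yte))
```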

BM2CP: Efficient Collaborative Perception with LiDAR-Camera Modalities

  • paper_url: http://arxiv.org/abs/2310.14702
  • repo_url: https://github.com/byzhaoai/bm2cp
  • paper_authors: Binyu Zhao, Wei Zhang, Zhaonian Zou
  • for: This paper aims to improve the perception performance of autonomous driving systems by enabling agents to share complementary perceptual information.
  • methods: The proposed approach, BM2CP, utilizes LiDAR and camera data to achieve efficient multi-modal perception. It employs LiDAR-guided modal fusion, cooperative depth generation, and modality-guided intermediate fusion to acquire deep interactions among modalities of different agents.
  • results: The proposed approach outperforms state-of-the-art methods with 50X lower communication volumes in both simulated and real-world autonomous driving scenarios.
    Abstract Collaborative perception enables agents to share complementary perceptual information with nearby agents. This would improve the perception performance and alleviate the issues of single-view perception, such as occlusion and sparsity. Most existing approaches mainly focus on single modality (especially LiDAR), and not fully exploit the superiority of multi-modal perception. We propose a collaborative perception paradigm, BM2CP, which employs LiDAR and camera to achieve efficient multi-modal perception. It utilizes LiDAR-guided modal fusion, cooperative depth generation and modality-guided intermediate fusion to acquire deep interactions among modalities of different agents, Moreover, it is capable to cope with the special case where one of the sensors, same or different type, of any agent is missing. Extensive experiments validate that our approach outperforms the state-of-the-art methods with 50X lower communication volumes in both simulated and real-world autonomous driving scenarios. Our code is available at https://github.com/byzhaoAI/BM2CP.
    摘要 以下文本翻译成简化中文:共享感知使得智能代理能共享相互补充的感知信息,从而提高感知性能并解决单视角的问题,如遮挡和缺失。现有的大多数方法主要集中在单一模式(尤其是LiDAR),未充分利用多模式感知的优势。我们提议一种共享感知模式,BM2CP,它使用LiDAR和摄像头实现高效的多模式感知。它利用LiDAR导航多模态融合、合作深度生成和模式导向中间融合来实现深度间的交互,并能够处理特殊情况下,任何代理的某种感知器(同或不同类型) missing。我们的实验表明,我们的方法在 simulated 和实际自动驾驶场景中都能够超越当前状态的方法,并且通信量为50倍下降。我们的代码可以在 上获取。

API-Assisted Code Generation for Question Answering on Varied Table Structures

  • paper_url: http://arxiv.org/abs/2310.14687
  • repo_url: None
  • paper_authors: Yihan Cao, Shuyi Chen, Ryan Liu, Zhiruo Wang, Daniel Fried
  • for: The paper presents a unified Table Question Answering (TableQA) framework that can handle questions over varied table structures.
  • methods: The framework uses Python as the query language and few-shot prompting to translate natural language (NL) questions into Python programs executable on Pandas data frames; to answer complex relational questions and incorporate external knowledge, the framework also allows customized APIs that the programs can call.
  • results: Experiments on four TableQA datasets show prominent improvements over past state-of-the-art systems. Ablation studies (1) show the benefits of the multi-index representation and APIs over LLM-only baselines, and (2) demonstrate that the approach is modular and can incorporate additional APIs.
    Abstract A persistent challenge to table question answering (TableQA) by generating executable programs has been adapting to varied table structures, typically requiring domain-specific logical forms. In response, this paper introduces a unified TableQA framework that: (1) provides a unified representation for structured tables as multi-index Pandas data frames, (2) uses Python as a powerful querying language, and (3) uses few-shot prompting to translate NL questions into Python programs, which are executable on Pandas data frames. Furthermore, to answer complex relational questions with extended program functionality and external knowledge, our framework allows customized APIs that Python programs can call. We experiment with four TableQA datasets that involve tables of different structures -- relational, multi-table, and hierarchical matrix shapes -- and achieve prominent improvements over past state-of-the-art systems. In ablation studies, we (1) show benefits from our multi-index representation and APIs over baselines that use only an LLM, and (2) demonstrate that our approach is modular and can incorporate additional APIs.
    摘要 困难问题:表格问答(TableQA)通常需要适应不同的表格结构,通常需要域pecific的逻辑形式。为此,本文介绍了一个统一的 TableQA 框架,它:1. 将结构化表格表示为多个索引的 Pandas 数据框,以便更好地处理表格数据。2. 使用 Python 作为强大的查询语言,以便更好地处理表格数据。3. 使用少量的提示来将自然语言问题翻译成 Python 程序,该程序可以执行于 Pandas 数据框上。此外,为回答复杂的关系问题和追加的外部知识,我们的框架允许自定义 API,Python 程序可以调用。我们在四个不同结构的 TableQA 数据集上进行了实验,并实现了过去的状态顶点系统的显著改进。在剥离研究中,我们:1. 显示了我们的多索引表示和 API 对基eline系统的改进。2. 示示了我们的方法是可模块的,可以添加更多的 API。
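An illustrative example (not from the paper) of the representation described above: a hierarchical table stored as a multi-index Pandas data frame, plus the kind of short Python program a model would be prompted to emit for a natural-language question such as "Which region had the highest 2022 revenue?". The table contents are invented for the example.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([["2021", "2022"], ["revenue", "profit"]],
                                     names=["year", "metric"])
table = pd.DataFrame(
    [[120, 30, 150, 45],
     [ 90, 10, 110, 20],
     [200, 60, 180, 50]],
    index=pd.Index(["North", "South", "West"], name="region"),
    columns=columns,
)

# Hypothetical model-generated answer program for the question above:
answer = table[("2022", "revenue")].idxmax()
print(answer)   # -> "West"
```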

Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond

  • paper_url: http://arxiv.org/abs/2310.14670
  • repo_url: None
  • paper_authors: Zhecan Wang, Long Chen, Haoxuan You, Keyang Xu, Yicheng He, Wenhao Li, Noal Codella, Kai-Wei Chang, Shih-Fu Chang
  • for: The paper aims to address dataset biases in vision-language (VL) understanding tasks by proposing Adversarial Data Synthesis (ADS) and Intra-sample Counterfactual Training (ICT) to improve model performance.
  • methods: The paper uses ADS to generate synthetic training and debiased evaluation data, and ICT to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation.
  • results: The paper shows that ADS and ICT consistently improve model performance across different benchmarks, even in domain-shifted scenarios.
    Abstract Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions. However, we have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly without proper understanding. The first type of dataset bias is \emph{Unbalanced Matching} bias, where the correct answer overlaps the question and image more than the incorrect answers. The second type of dataset bias is \emph{Distractor Similarity} bias, where incorrect answers are overly dissimilar to the correct answer but significantly similar to other incorrect answers within the same sample. To address these dataset biases, we first propose Adversarial Data Synthesis (ADS) to generate synthetic training and debiased evaluation data. We then introduce Intra-sample Counterfactual Training (ICT) to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation. Extensive experiments demonstrate the effectiveness of ADS and ICT in consistently improving model performance across different benchmarks, even in domain-shifted scenarios.
    摘要 视觉语言(VL)理解任务评估模型对复杂视觉场景的理解能力通过多选问题。然而,我们已经发现了两种数据集偏见,这些偏见可以让模型通过短cut使得不具备深入理解而 Correctly解决多个VL任务。第一种数据集偏见是“不均匀匹配”偏见,正确答案与问题和图像之间的 overlap 比 incorrect answers 更大。第二种数据集偏见是“Distractor Similarity”偏见,错误答案与正确答案之间的 similarity 远大于其他错误答案之间的 similarity。为了解决这些数据集偏见,我们首先提出了对抗数据生成(ADS),用于生成Synthetic 训练和不偏见评估数据。然后,我们引入Intra-sampleCounterfactual Training(ICT),以帮助模型利用生成的训练数据,特别是对偶数据。广泛的实验表明,ADS和ICT可以在不同的benchmark上提高模型性能,甚至在领域转移 scenarios 中。

B^2SFL: A Bi-level Blockchained Architecture for Secure Federated Learning-based Traffic Prediction

  • paper_url: http://arxiv.org/abs/2310.14669
  • repo_url: None
  • paper_authors: Hao Guo, Collin Meese, Wanxin Li, Chien-Chung Shen, Mark Nejad
  • for: The paper proposes a secure and decentralized federated learning (FL) architecture for privacy-preserving machine learning (ML) model training.
  • methods: A bi-level blockchain architecture is proposed in which the bottom-layer chain stores local models and the top-layer chain stores the globally aggregated parameters, and a distributed homomorphic-encrypted federated averaging (DHFA) scheme addresses the secure computation problem.
  • results: Experimental results show that the proposed system enables secure, decentralized federated learning for real-world traffic flow prediction tasks.
    Abstract Federated Learning (FL) is a privacy-preserving machine learning (ML) technology that enables collaborative training and learning of a global ML model based on aggregating distributed local model updates. However, security and privacy guarantees could be compromised due to malicious participants and the centralized FL server. This article proposed a bi-level blockchained architecture for secure federated learning-based traffic prediction. The bottom and top layer blockchain store the local model and global aggregated parameters accordingly, and the distributed homomorphic-encrypted federated averaging (DHFA) scheme addresses the secure computation problems. We propose the partial private key distribution protocol and a partially homomorphic encryption/decryption scheme to achieve the distributed privacy-preserving federated averaging model. We conduct extensive experiments to measure the running time of DHFA operations, quantify the read and write performance of the blockchain network, and elucidate the impacts of varying regional group sizes and model complexities on the resulting prediction accuracy for the online traffic flow prediction task. The results indicate that the proposed system can facilitate secure and decentralized federated learning for real-world traffic prediction tasks.
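A hedged sketch of the homomorphic-averaging idea summarized above, using the third-party `phe` (python-paillier) library as a stand-in: here a single Paillier key pair is used purely for illustration, whereas the paper's DHFA scheme distributes partial private keys among participants.

```python
import numpy as np
from phe import paillier

# Single key pair for the sketch only; DHFA would split the private key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Each client encrypts its local model update before sending it.
client_updates = [np.array([0.10, -0.20, 0.05]),
                  np.array([0.30,  0.10, -0.15])]
encrypted = [[public_key.encrypt(float(w)) for w in update] for update in client_updates]

# The aggregator sums ciphertexts without ever seeing plaintext updates.
agg = [sum(col) for col in zip(*encrypted)]
avg = [c * (1.0 / len(client_updates)) for c in agg]

# Only the private-key holder(s) can decrypt the averaged model.
print([round(private_key.decrypt(c), 4) for c in avg])   # [0.2, -0.05, -0.05]
```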

Data Pruning via Moving-one-Sample-out

  • paper_url: http://arxiv.org/abs/2310.14664
  • repo_url: None
  • paper_authors: Haoru Tan, Sitong Wu, Fei Du, Yukang Chen, Zhibin Wang, Fan Wang, Xiaojuan Qi
  • for: The paper proposes a new data pruning method, moving-one-sample-out (MoSo), for identifying and removing the least informative samples from a training set.
  • methods: Each sample's importance is assessed by its impact on the optimal empirical risk, approximated with an efficient first-order estimator that only requires gradient information from different training stages.
  • results: Experiments show that MoSo effectively mitigates the performance degradation caused by high pruning ratios and achieves satisfactory performance across various settings.
    Abstract In this paper, we propose a novel data-pruning approach called moving-one-sample-out (MoSo), which aims to identify and remove the least informative samples from the training set. The core insight behind MoSo is to determine the importance of each sample by assessing its impact on the optimal empirical risk. This is achieved by measuring the extent to which the empirical risk changes when a particular sample is excluded from the training set. Instead of using the computationally expensive leaving-one-out-retraining procedure, we propose an efficient first-order approximator that only requires gradient information from different training stages. The key idea behind our approximation is that samples with gradients that are consistently aligned with the average gradient of the training set are more informative and should receive higher scores, which could be intuitively understood as follows: if the gradient from a specific sample is consistent with the average gradient vector, it implies that optimizing the network using the sample will yield a similar effect on all remaining samples. Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios and achieves satisfactory performance across various settings.
    摘要 在这篇论文中,我们提出了一种新的数据剔除方法,称为“移动一个样本出”(MoSo),旨在在训练集中标识和移除最少有用的样本。MoSo的核心想法是根据每个样本对优化的实际风险的影响来确定每个样本的重要性。这是通过计算剔除特定样本后训练集的实际风险变化来实现的。而不是使用计算成本较高的离开一个样本重新训练过程,我们提议了一种高效的第一个 aproximator,只需要不同训练阶段的梯度信息。MoSo的关键思想是,样本的梯度与训练集的平均梯度方向相互平行,这些样本更有用,因为它们在优化网络时会对所有剩下的样本产生相似的效果。实验结果表明,MoSo可以在高剔除率下有效地避免严重的性能下降,并在不同设置下达到满意的性能。
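A simplified PyTorch sketch of the first-order scoring intuition described above: a sample scores highly when its per-sample gradient aligns with the average gradient of the training set. For brevity this uses gradients from a single checkpoint and a toy linear model, whereas the paper aggregates information across several training stages.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
model = torch.nn.Linear(10, 2)

def flat_grad(xb, yb):
    model.zero_grad()
    F.cross_entropy(model(xb), yb).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

avg_grad = flat_grad(X, y)                      # gradient of the full-set loss
scores = torch.stack([torch.dot(flat_grad(X[i:i + 1], y[i:i + 1]), avg_grad)
                      for i in range(len(X))])

keep = scores.argsort(descending=True)[: int(0.7 * len(X))]   # prune the lowest 30%
print("kept", len(keep), "of", len(X), "samples")
```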

Reasoning about Ambiguous Definite Descriptions

  • paper_url: http://arxiv.org/abs/2310.14657
  • repo_url: https://github.com/sfschouten/exploiting-ambiguity
  • paper_authors: Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen
  • for: The paper aims to evaluate the ability of Large Language Models (LLMs) to use explicit reasoning to resolve context-dependent ambiguity in language.
  • methods: The paper proposes using ambiguous definite descriptions to create a benchmark dataset for this purpose, and all information required to resolve the ambiguity is included in the prompt, allowing models to rely solely on reasoning to perform well.
  • results: The authors find that recent LLMs struggle with this task, indicating that there is room for improvement in their ability to use explicit reasoning to resolve ambiguity.
    Abstract Natural language reasoning plays an increasingly important role in improving language models' ability to solve complex language understanding tasks. An interesting use case for reasoning is the resolution of context-dependent ambiguity. But no resources exist to evaluate how well Large Language Models can use explicit reasoning to resolve ambiguity in language. We propose to use ambiguous definite descriptions for this purpose and create and publish the first benchmark dataset consisting of such phrases. Our method includes all information required to resolve the ambiguity in the prompt, which means a model does not require anything but reasoning to do well. We find this to be a challenging task for recent LLMs. Code and data available at: https://github.com/sfschouten/exploiting-ambiguity
    摘要 自然语言理解在解决复杂语言理解任务中发挥越来越重要的作用。一个有趣的应用例子是解决上下文依赖性的歧义。但目前没有资源来评估大语言模型如何使用显式理解来解决语言中的歧义。我们提议使用不确定定语的描述来解决这个问题,并创建了首个包含这些短语的数据集。我们的方法包括所有需要解决歧义的信息,这意味着模型只需要理解来完成任务。我们发现最新的大语言模型很困难地解决这个任务。代码和数据可以在 GitHub 上获取:https://github.com/sfschouten/exploiting-ambiguity。

$Λ$-Split: A Privacy-Preserving Split Computing Framework for Cloud-Powered Generative AI

  • paper_url: http://arxiv.org/abs/2310.14651
  • repo_url: https://github.com/nishio-laboratory/lambda_split
  • paper_authors: Shoki Ohta, Takayuki Nishio
  • for: The paper proposes $\Lambda$-Split, a split computing framework that offloads the computation of generative AI services from resource-constrained mobile devices to the cloud while protecting sensitive data from privacy and security risks.
  • methods: A generative model is partitioned into three sub-models distributed across the user's local device and a cloud server, so that privacy-sensitive raw inputs and outputs never leave the device; because the model is a black box, estimating the original input or output from intercepted hidden-layer outputs is difficult for an eavesdropper.
  • results: Experiments with Llama 2 and Stable Diffusion XL show that the framework achieves effective privacy and security protection, and that it is complementary to traditional encryption-based security mechanisms.
    Abstract In the wake of the burgeoning expansion of generative artificial intelligence (AI) services, the computational demands inherent to these technologies frequently necessitate cloud-powered computational offloading, particularly for resource-constrained mobile devices. These services commonly employ prompts to steer the generative process, and both the prompts and the resultant content, such as text and images, may harbor privacy-sensitive or confidential information, thereby elevating security and privacy risks. To mitigate these concerns, we introduce $\Lambda$-Split, a split computing framework to facilitate computational offloading while simultaneously fortifying data privacy against risks such as eavesdropping and unauthorized access. In $\Lambda$-Split, a generative model, usually a deep neural network (DNN), is partitioned into three sub-models and distributed across the user's local device and a cloud server: the input-side and output-side sub-models are allocated to the local, while the intermediate, computationally-intensive sub-model resides on the cloud server. This architecture ensures that only the hidden layer outputs are transmitted, thereby preventing the external transmission of privacy-sensitive raw input and output data. Given the black-box nature of DNNs, estimating the original input or output from intercepted hidden layer outputs poses a significant challenge for malicious eavesdroppers. Moreover, $\Lambda$-Split is orthogonal to traditional encryption-based security mechanisms, offering enhanced security when deployed in conjunction. We empirically validate the efficacy of the $\Lambda$-Split framework using Llama 2 and Stable Diffusion XL, representative large language and diffusion models developed by Meta and Stability AI, respectively. Our $\Lambda$-Split implementation is publicly accessible at https://github.com/nishio-laboratory/lambda_split.
    摘要 在生长式人工智能(AI)服务的扩展中,这些技术的计算需求经常需要云计算的卷积式加载,特别是 для资源有限的移动设备。这些服务通常使用提示来引导生成过程,而提示和生成的内容,如文本和图像,可能包含隐私敏感或机密信息,从而增加安全和隐私风险。为了缓解这些问题,我们介绍了Lambda-Split,一种分布式计算框架,以便在用户的本地设备和云服务器之间分配计算任务,从而实现计算加载。在Lambda-Split中,一个生成模型,通常是深度神经网络(DNN),被分解成三个子模型,分别分配到用户的本地设备和云服务器:输入和输出子模型分别分配到本地,而中间、计算昂贵的子模型寄存在云服务器。这种架构确保只有隐藏层输出被传输,因此防止了外部传输隐私敏感的原始输入和输出数据。由于DNN的黑盒特性,对 intercepted 隐藏层输出进行重建原始输入或输出的挑战非常大。此外,Lambda-Split与传统的加密基础设施不同,可以增强安全性,当与其他安全机制相结合使用时。我们通过使用Llama 2和Stable Diffusion XL,代表Meta和Stability AI开发的大语言和扩散模型,进行了实验 validate Lambda-Split 框架的可靠性。我们的Lambda-Split实现可公开访问于https://github.com/nishio-laboratory/lambda_split。
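A minimal sketch (not the released implementation) of the three-way split described above: the input-side and output-side pieces of a network stay on the device, the compute-heavy middle runs on the server, and only hidden activations cross the network. The toy MLP and split points are assumptions standing in for the real generative models.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),        # layers 0-1: input-side, stays on device
    nn.Linear(64, 64), nn.ReLU(),        # layers 2-5: compute-heavy middle, goes to cloud
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),                   # layer 6: output-side, stays on device
)
device_head, cloud_mid, device_tail = backbone[:2], backbone[2:6], backbone[6:]

x = torch.randn(1, 32)                   # raw input never leaves the device
hidden = device_head(x)                  # only this activation is transmitted
hidden = cloud_mid(hidden)               # server-side computation
logits = device_tail(hidden)             # final prediction produced back on the device
print(logits.shape)                      # torch.Size([1, 10])
```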

Spiking mode-based neural networks

  • paper_url: http://arxiv.org/abs/2310.14621
  • repo_url: https://github.com/linzhanghan/smnn
  • paper_authors: Zhanghan Lin, Haiping Huang
  • for: The paper proposes a mode-based training protocol that addresses the high cost of training large-scale spiking neural networks and the difficulty of interpreting what is hidden in the trained weight matrix.
  • methods: The protocol interprets the weights through input and output modes and their associated importance scores, so the number of modes is adjustable and provides more degrees of freedom for modeling experimental data; it also allows high-dimensional neural activity to be projected onto a low-dimensional mode space, reducing the dimensionality of the learning space.
  • results: The authors analyze the framework on two computational tasks (digit classification and a selective sensory integration task) and derive a mode-based learning rule for spiking neural networks.
    Abstract Spiking neural networks play an important role in brain-like neuromorphic computations and in studying working mechanisms of neural circuits. One drawback of training a large scale spiking neural network is that an expensive cost of updating all weights is required. Furthermore, after training, all information related to the computational task is hidden into the weight matrix, prohibiting us from a transparent understanding of circuit mechanisms. Therefore, in this work, we address these challenges by proposing a spiking mode-based training protocol. The first advantage is that the weight is interpreted by input and output modes and their associated scores characterizing importance of each decomposition term. The number of modes is thus adjustable, allowing more degrees of freedom for modeling the experimental data. This reduces a sizable training cost because of significantly reduced space complexity for learning. The second advantage is that one can project the high dimensional neural activity in the ambient space onto the mode space which is typically of a low dimension, e.g., a few modes are sufficient to capture the shape of the underlying neural manifolds. We analyze our framework in two computational tasks -- digit classification and selective sensory integration tasks. Our work thus derives a mode-based learning rule for spiking neural networks.
    摘要 神经网络在脑动-类似计算中扮演着重要的角色,同时也用于研究神经综合体的工作机制。然而,训练大规模神经网络时存在一个昂贵的更新所有权重的问题。此外,训练后,所有相关计算任务的信息都被储存在权重矩阵中,这使得我们无法从权重矩阵中获得透彻的认识。因此,在这项工作中,我们解决这些挑战,提出了一种神经模式基于的训练协议。首先,在输入和输出模式和它们关联的分数中,权重被解释。因此,模式数量可以被调整,以获得更多的自由度来模型实验数据。这 reduces 训练成本,因为学习空间复杂度减少了。其次,可以将高维神经活动在 ambient 空间中 проек到模式空间中,模式空间通常是低维的,例如只需几个模式可以捕捉神经 manifold 的形状。我们在 digit 分类和选择性感知任务中分析了我们的框架。因此,我们 derive 一种基于模式的学习规则 для神经网络。
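A small numpy illustration (not the paper's learning rule) of the mode-based weight parameterization described above: the connectivity matrix is built from a handful of input/output modes and per-mode importance scores, so the number of learnable quantities scales with the mode count rather than with N^2. Symbol names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 500, 3                                  # neurons, modes (K << N)

out_modes = rng.standard_normal((N, K))        # output modes
in_modes = rng.standard_normal((N, K))         # input modes
scores = np.array([1.0, -0.5, 0.2])            # importance score of each mode

# W = sum_k score_k * out_mode_k @ in_mode_k.T  -- a rank-K connectivity matrix
W = (out_modes * scores) @ in_modes.T
print("full weight entries:", N * N, "| mode parameters:", 2 * N * K + K)

# Project high-dimensional activity onto the K-dimensional mode space.
activity = rng.standard_normal(N)
latent = in_modes.T @ activity                 # low-dimensional representation
print("latent shape:", latent.shape)           # (3,)
```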

Prefix-Tuning Based Unsupervised Text Style Transfer

  • paper_url: http://arxiv.org/abs/2310.14599
  • repo_url: None
  • paper_authors: Huiyu Mai, Wenhao Jiang, Zhihong Deng
  • for: The paper targets unsupervised text style transfer, i.e., training a generative model that can alter the style of an input sentence while preserving its content, without using any parallel data.
  • methods: Building on powerful pre-trained large language models, the paper presents a new prefix-tuning-based method that uses three kinds of prefixes -- a shared prefix, a style prefix, and a content prefix -- to encode task-specific information, the target style, and the content of the input sentence, respectively; these prefixes provide richer information than the embeddings used in previous work. The language model is also used recursively during style transfer, which enables more effective interaction between the input sentence and the model, helps it construct more informative prefixes, and thus improves performance.
  • results: Evaluations on well-known datasets show that the method outperforms state-of-the-art baselines; comparative analyses, ablation studies, and human subjective evaluations are also provided for a deeper understanding of the method.
    Abstract Unsupervised text style transfer aims at training a generative model that can alter the style of the input sentence while preserving its content without using any parallel data. In this paper, we employ powerful pre-trained large language models and present a new prefix-tuning-based method for unsupervised text style transfer. We construct three different kinds of prefixes, i.e., \textit{shared prefix, style prefix}, and \textit{content prefix}, to encode task-specific information, target style, and the content information of the input sentence, respectively. Compared to embeddings used by previous works, the proposed prefixes can provide richer information for the model. Furthermore, we adopt a recursive way of using language models in the process of style transfer. This strategy provides a more effective way for the interactions between the input sentence and GPT-2, helps the model construct more informative prefixes, and thus, helps improve the performance. Evaluations on the well-known datasets show that our method outperforms the state-of-the-art baselines. Results, analysis of ablation studies, and subjective evaluations from humans are also provided for a deeper understanding of the proposed method.
    摘要 <>translate into Simplified Chinese无监督文本风格传输目标在训练一个生成模型,以alter输入句子的风格而保持其内容,不使用任何平行数据。在这篇论文中,我们利用强大的预训练大语言模型,并提出了一种基于前缀修改的新方法 для无监督文本风格传输。我们构建了三种不同的前缀,即\textit{共享前缀}, \textit{风格前缀}和\textit{内容前缀},以编码任务特定信息、目标风格和输入句子的内容信息,分别。与前作使用的嵌入相比,我们的提案的前缀可以为模型提供更丰富的信息。另外,我们采用了回归的语言模型使用策略,这种策略可以在文本风格传输过程中更有效地进行输入句子和GPT-2之间的交互,帮助模型构建更有用的前缀,从而提高性能。评估在知名的数据集上表明,我们的方法比前一阶段的基准值更高。结果、简洁分析和人类的主观评价也提供了更深入的理解方法。

Learning to Correct Noisy Labels for Fine-Grained Entity Typing via Co-Prediction Prompt Tuning

  • paper_url: http://arxiv.org/abs/2310.14596
  • repo_url: https://github.com/mhtang1995/cppt
  • paper_authors: Minghao Tang, Yongquan He, Yongxiu Xu, Hongbo Xu, Wenyuan Zhang, Yang Lin
  • for: The paper addresses the noisy-label problem in fine-grained entity typing (FET): existing methods identify noisy labels by estimating the noise distribution, but are confused by diverse deviations in that distribution.
  • methods: A co-prediction prompt tuning method is proposed that integrates multiple prediction results to recall labeled labels and uses a differentiated margin to identify inaccurate labels. An optimization objective over divergent co-predictions during fine-tuning ensures the model captures sufficient information and stays robust when identifying noise.
  • results: Experiments on three widely used FET datasets show that the noise correction approach significantly improves the quality of various types of training samples, including samples annotated via distant supervision, ChatGPT, and crowdsourcing.
    Abstract Fine-grained entity typing (FET) is an essential task in natural language processing that aims to assign semantic types to entities in text. However, FET poses a major challenge known as the noise labeling problem, whereby current methods rely on estimating noise distribution to identify noisy labels but are confused by diverse noise distribution deviation. To address this limitation, we introduce Co-Prediction Prompt Tuning for noise correction in FET, which leverages multiple prediction results to identify and correct noisy labels. Specifically, we integrate prediction results to recall labeled labels and utilize a differentiated margin to identify inaccurate labels. Moreover, we design an optimization objective concerning divergent co-predictions during fine-tuning, ensuring that the model captures sufficient information and maintains robustness in noise identification. Experimental results on three widely-used FET datasets demonstrate that our noise correction approach significantly enhances the quality of various types of training samples, including those annotated using distant supervision, ChatGPT, and crowdsourcing.
    摘要 优化的实体类型分类(FET)是自然语言处理中的一项重要任务,它的目标是将文本中的实体分类为semantic类型。然而,FET受到一个主要的挑战,即噪声标签问题,现有方法依据估计噪声分布来标识噪声标签,但是这些方法容易受到多种噪声分布的 deviation。为了解决这些限制,我们介绍了Co-Prediction Prompt Tuning(CPPT),一种用于噪声 corrections的方法,它利用多个预测结果来确定和更正噪声标签。具体来说,我们将预测结果集成起来,以便回忆标记标签,并使用分化的margin来识别错误的标签。此外,我们还设计了一个在精度适应中的优化目标,确保模型捕捉到足够的信息,并保持噪声识别的稳定性。实验结果表明,我们的噪声 corrections策略在三个广泛使用的FET数据集上有效地提高了不同类型的训练样本的质量,包括使用远程监督、ChatGPT和大量签名的标注样本。

Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track

  • paper_url: http://arxiv.org/abs/2310.14581
  • repo_url: None
  • paper_authors: Shuhei Yokoo, Peifei Zhu, Yuchi Ishikawa, Mikihiro Tanaka, Masayoshi Kondo, Hirokatsu Kataoka
  • for: The work presents a data processing solution for the DataComp challenge that improves data quality and, in turn, model generalization.
  • methods: The large multimodal models CLIP and BLIP-2 are used to filter and modify web-crawled data, supplemented by external datasets and a bag of tricks to improve data quality.
  • results: Experiments show that the solution significantly outperforms the DataComp baselines on both tracks (filtering track: 6.6% improvement, BYOD track: 48.5% improvement).
    Abstract Large web crawl datasets have already played an important role in learning multimodal features with high generalization capabilities. However, there are still very limited studies investigating the details or improvements of data design. Recently, a DataComp challenge has been designed to propose the best training data with the fixed models. This paper presents our solution to both filtering track and BYOD track of the DataComp challenge. Our solution adopts large multimodal models CLIP and BLIP-2 to filter and modify web crawl data, and utilize external datasets along with a bag of tricks to improve the data quality. Experiments show our solution significantly outperforms DataComp baselines (filtering track: 6.6% improvement, BYOD track: 48.5% improvement).
    摘要 大型网爬Datasets已经在学习多Modal特征方面发挥了重要作用,但是还有很少的研究关于数据设计的细节或改进方法。近期,DataComp挑战赛推出了提议最佳训练数据的挑战。本文介绍了我们对DataComp挑战的解决方案,我们采用了大型多Modal模型CLIP和BLIP-2来筛选和修改网爬数据,并利用外部数据集和一些资源进行优化。实验结果显示,我们的解决方案在筛选 track和BYOD track中都有显著的改进(筛选 track:6.6%提升,BYOD track:48.5%提升)。
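A hedged sketch of the image-text similarity filtering step described above, using OpenAI's `clip` package as the scoring model. The threshold, file path, and caption are placeholders, and the full pipeline also rewrites captions with BLIP-2 and mixes in external data, which is omitted here.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_score(image_path: str, caption: str) -> float:
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([caption], truncate=True).to(device)
    with torch.no_grad():
        img_f = model.encode_image(image)
        txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    return float((img_f * txt_f).sum())          # cosine similarity

THRESHOLD = 0.28   # illustrative cut-off, not the value used in the submission
sample = ("example.jpg", "a dog catching a frisbee in a park")   # placeholder pair
print("keep" if clip_score(*sample) >= THRESHOLD else "drop")
```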

FedSplitX: Federated Split Learning for Computationally-Constrained Heterogeneous Clients

  • paper_url: http://arxiv.org/abs/2310.14579
  • repo_url: None
  • paper_authors: Jiyun Shin, Jinhyun Ahn, Honggu Kang, Joonhyuk Kang
  • for: The paper proposes FedSplitX, a federated learning framework that addresses the problem of heterogeneous clients with unequal compute capabilities.
  • methods: FedSplitX splits a large model into client-side and server-side components at multiple partition points to accommodate diverse client capabilities, and incorporates auxiliary networks at each partition point to reduce communication costs and delays while improving model performance.
  • results: Experiments show that FedSplitX effectively exploits server compute to train large models and achieves better model performance than the baseline approaches.
    Abstract Foundation models (FMs) have demonstrated remarkable performance in machine learning but demand extensive training data and computational resources. Federated learning (FL) addresses the challenges posed by FMs, especially related to data privacy and computational burdens. However, FL on FMs faces challenges in situations with heterogeneous clients possessing varying computing capabilities, as clients with limited capabilities may struggle to train the computationally intensive FMs. To address these challenges, we propose FedSplitX, a novel FL framework that tackles system heterogeneity. FedSplitX splits a large model into client-side and server-side components at multiple partition points to accommodate diverse client capabilities. This approach enables clients to collaborate while leveraging the server's computational power, leading to improved model performance compared to baselines that limit model size to meet the requirement of the poorest client. Furthermore, FedSplitX incorporates auxiliary networks at each partition point to reduce communication costs and delays while enhancing model performance. Our experiments demonstrate that FedSplitX effectively utilizes server capabilities to train large models, outperforming baseline approaches.
    摘要 基于Machine Learning的Foundation Models(FM)在实际应用中表现出了惊人的能力,但它们需要大量的训练数据和计算资源。 Federation Learning(FL)可以解决FM所遇到的问题,特别是数据隐私和计算负担问题。然而,在FM上进行FL时,面临着客户端 possessing varying computing capabilities 的挑战,因为客户端的限制可能会使FM进行计算性能不佳。为解决这些挑战,我们提出了 FedSplitX,一种新的FL框架,可以在多个分区点上将大型模型分解成客户端和服务器端两部分。这种方法允许客户端与服务器进行协作,同时利用服务器的计算能力,从而提高模型性能,比基eline方法,其限制模型大小以适应最差的客户端。此外,FedSplitX还包括在每个分 partition point 中的辅助网络,以减少通信成本和延迟,同时提高模型性能。我们的实验表明,FedSplitX可以有效利用服务器的计算能力,训练大型模型,超越基eline方法。

Unveiling the Multi-Annotation Process: Examining the Influence of Annotation Quantity and Instance Difficulty on Model Performance

  • paper_url: http://arxiv.org/abs/2310.14572
  • repo_url: None
  • paper_authors: Pritam Kadasi, Mayank Singh
  • for: This paper aims to investigate the impact of multi-annotator datasets on NLP model performance.
  • methods: The authors propose a novel multi-annotator simulation process to generate datasets with varying annotation budgets.
  • results: The study shows that similar datasets with the same annotation budget can lead to varying performance gains, challenging the popular belief that multi-annotation datasets always lead to better performance.
    Abstract The NLP community has long advocated for the construction of multi-annotator datasets to better capture the nuances of language interpretation, subjectivity, and ambiguity. This paper conducts a retrospective study to show how performance scores can vary when a dataset expands from a single annotation per instance to multiple annotations. We propose a novel multi-annotator simulation process to generate datasets with varying annotation budgets. We show that similar datasets with the same annotation budget can lead to varying performance gains. Our findings challenge the popular belief that models trained on multi-annotation examples always lead to better performance than models trained on single or few-annotation examples.

Meaning Representations from Trajectories in Autoregressive Models

  • paper_url: http://arxiv.org/abs/2310.18348
  • repo_url: None
  • paper_authors: Tian Yu Liu, Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto
  • for: The work extracts meaning representations from autoregressive language models without fine-tuning or prompting, so the approach is applicable to any pre-trained autoregressive model.
  • methods: A distribution-based representation is obtained by considering all possible trajectories extending an input text. Unlike vector-based representations, this strategy can model asymmetric relations (e.g., logical entailment, hypernym/hyponym relations) through algebraic operations between likelihood functions.
  • results: Experiments show that representations obtained from large models align well with human annotations, outperform zero-shot and prompt-free methods on semantic similarity tasks, and can solve more complex entailment and containment tasks that standard embeddings cannot handle; the method also extends to data from different modalities (e.g., image and text) using multimodal autoregressive models.
    Abstract We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text. This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model. Moreover, unlike vector-based representations, distribution-based representations can also model asymmetric relations (e.g., direction of logical entailment, hypernym/hyponym relations) by using algebraic operations between likelihood functions. These ideas are grounded in distributional perspectives on semantics and are connected to standard constructions in automata theory, but to our knowledge they have not been applied to modern language models. We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle. Finally, we extend our method to represent data from different modalities (e.g., image and text) using multimodal autoregressive models.
    摘要 我们提议通过考虑输入文本的所有可能的趋势扩展来提取语义表示。这种策略不需要准确调整,可以应用于任何预训练的自然语言模型,而且不同于向量基的表示方法,分布基的表示方法可以模型不对称关系(如逻辑推论的方向、Hypernym/Hyponym关系)通过likelihood函数的代数运算。这些想法基于分布 semantics 的视角和自动机理论中的标准构造,但我们知道它们没有应用于现代语言模型。我们的实验表明,从大型模型中获得的表示与人工标注相符,在 semantic similarity 任务上超越了零shot和不需要准确调整的方法,并可以解决标准表示无法处理的更复杂的包含和涵盖任务。最后,我们扩展了我们的方法以用于不同的Modalities(如图像和文本)的多模态自然语言模型。

AlpaCare:Instruction-tuned Large Language Models for Medical Application

  • paper_url: http://arxiv.org/abs/2310.14558
  • repo_url: https://github.com/xzhang97666/alpacare
  • paper_authors: Xinlu Zhang, Chenxin Tian, Xianjun Yang, Lichang Chen, Zekun Li, Linda Ruth Petzold
  • for: The paper aims to improve the instruction-following ability of large language models (LLMs) so that they perform well across a variety of tasks, particularly in the medical domain.
  • methods: LLaMA-series models are fine-tuned on 52k diverse, machine-generated medical instruction-following data (MedInstruct-52k) to improve their medical capability and generalizability.
  • results: Free-form instruction evaluations in both the medical and general domains show that the resulting AlpaCare model exhibits stronger medical proficiency and generalizability than previous instruction-tuned models.
    Abstract Large Language Models (LLMs) have demonstrated significant enhancements in instruction-following abilities through instruction tuning, achieving notable performances across various tasks. Previous research has focused on fine-tuning medical domain-specific LLMs using an extensive array of medical-specific data, incorporating millions of pieces of biomedical literature to augment their medical capabilities. However, existing medical instruction-tuned LLMs have been constrained by the limited scope of tasks and instructions available, restricting the efficacy of instruction tuning and adversely affecting performance in the general domain. In this paper, we fine-tune LLaMA-series models using 52k diverse, machine-generated, medical instruction-following data, MedInstruct-52k, resulting in the model AlpaCare. Comprehensive experimental results on both general and medical-specific domain free-form instruction evaluations showcase AlpaCare's strong medical proficiency and generalizability compared to previous instruction-tuned models in both medical and general domains. We provide public access to our MedInstruct-52k dataset and a clinician-crafted free-form instruction test set, MedInstruct-test, along with our codebase, to foster further research and development. Our project page is available at https://github.com/XZhang97666/AlpaCare.
    摘要 大型语言模型(LLM)在 instrucion 调整方面已经展示了明显的提升,在不同的任务中表现出色。过去的研究主要集中在医疗领域特定的 LLM 上进行精细调整,使用了大量的医疗专业文献来增强其医疗能力。但是,现有的医疗 instrucion 调整 LLM 受到了限定的任务和 instrucion 的有限性,这产生了 instrucion 调整的有效性和医疗领域的表现的问题。在这篇论文中,我们使用了 52,000 种多样的医疗 instrucion 调整数据,MedInstruct-52k,进行模型调整,创造出 AlpaCare 模型。我们实现了对医疗领域和通用领域的具体和通用性的评估,并提供了 MedInstruct-52k 数据集和来自临床专业人士的自由形式 instrucion 试点集 MedInstruct-test,以及我们的代码库,以便进一步的研究和发展。我们的项目页面可以在 GitHub 上找到:https://github.com/XZhang97666/AlpaCare。

Making RL with Preference-based Feedback Efficient via Randomization

  • paper_url: http://arxiv.org/abs/2310.14554
  • repo_url: None
  • paper_authors: Runzhe Wu, Wen Sun
  • for: The paper aims to design reinforcement learning algorithms that learn efficiently from human feedback.
  • methods: Randomization is used in algorithm design, including randomized exploration and a randomized active learning procedure, to achieve both statistical and computational efficiency.
  • results: Under the linear MDP model, the algorithm attains a near-optimal worst-case regret bound with polynomial running time and query complexity; for more general nonlinear function approximation, a model-based randomized algorithm inspired by Thompson sampling minimizes the Bayesian regret bound and query complexity, again achieving a near-optimal trade-off between the two.
    Abstract Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be efficient in terms of statistical complexity, computational complexity, and query complexity. In this work, we consider the RLHF setting where the feedback is given in the format of preferences over pairs of trajectories. In the linear MDP model, by using randomization in algorithm design, we present an algorithm that is sample efficient (i.e., has near-optimal worst-case regret bounds) and has polynomial running time (i.e., computational complexity is polynomial with respect to relevant parameters). Our algorithm further minimizes the query complexity through a novel randomized active learning procedure. In particular, our algorithm demonstrates a near-optimal tradeoff between the regret bound and the query complexity. To extend the results to more general nonlinear function approximation, we design a model-based randomized algorithm inspired by the idea of Thompson sampling. Our algorithm minimizes Bayesian regret bound and query complexity, again achieving a near-optimal tradeoff between these two quantities. Computation-wise, similar to the prior Thompson sampling algorithms under the regular RL setting, the main computation primitives of our algorithm are Bayesian supervised learning oracles which have been heavily investigated on the empirical side when applying Thompson sampling algorithms to RL benchmark problems.
    摘要 “强化学习算法(RL),受人类反馈(HF)的学习需要高效率。在这项工作中,我们考虑RLHF设定,其中反馈是对轨迹对的偏好。在线性MDP模型中,通过Randomization在算法设计中使用,我们提出了一种高效的算法,其worst-case regret bound几乎为最优,运行时间为几乎 polynomials with respect to相关参数。我们的算法还减少了查询复杂度,通过一种新的随机活动学习过程。具体来说,我们的算法达到了near-optimal的交易征。在更一般的非线性函数approximation中,我们设计了基于模型的随机化算法,以Thompson sampling的思想为基础。我们的算法最小化了 bayesian regret bound和查询复杂度,再次达到了near-optimal的交易征。计算上,我们的算法的主要计算基础是 Bayesian supervised learning oracles,它们在RL benchmark问题上得到了大量的实际研究。”Note: Simplified Chinese is used in this translation, which is a more casual and conversational style of Chinese. If you prefer Traditional Chinese or a more formal style, please let me know and I can provide those versions as well.
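A short sketch of how trajectory-preference feedback is typically turned into a learning signal in this setting: a Bradley-Terry style logistic loss on the difference of learned trajectory returns. This illustrates the preference feedback model only, not the paper's randomized algorithm or its regret analysis; the network sizes and the single hand-made preference query are assumptions.

```python
import torch
import torch.nn as nn

reward_net = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

def trajectory_return(traj):             # traj: (T, state_action_dim)
    return reward_net(traj).sum()

# One preference query: trajectory A was preferred over trajectory B.
traj_a, traj_b = torch.randn(10, 6), torch.randn(10, 6)
pref = torch.tensor(1.0)                 # 1.0 means "A preferred"

# Bradley-Terry: P(A > B) = sigmoid(return(A) - return(B))
logit = trajectory_return(traj_a) - trajectory_return(traj_b)
loss = nn.functional.binary_cross_entropy_with_logits(logit, pref)
loss.backward()
opt.step()
print(float(loss))
```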

Denoising Opponents Position in Partial Observation Environment

  • paper_url: http://arxiv.org/abs/2310.14553
  • repo_url: None
  • paper_authors: Aref Sayareh, Aria Sardari, Vahid Khoddami, Nader Zare, Vinicius Prado da Fonseca, Amilcar Soares
  • for: The work aims to improve the decision-making process in the Soccer Simulation 2D software by using machine learning to predict opponent positions, making actions such as passing more accurate.
  • methods: Long Short-Term Memory models (LSTM) and Deep Neural Networks (DNN) are used to predict opponent positions and are compared against the standard algorithm.
  • results: The results show that the LSTM and DNN predict opponent positions more accurately than standard algorithms such as the last-seen method.
    Abstract The RoboCup competitions hold various leagues, and the Soccer Simulation 2D League is a major among them. Soccer Simulation 2D (SS2D) match involves two teams, including 11 players and a coach for each team, competing against each other. The players can only communicate with the Soccer Simulation Server during the game. Several code bases are released publicly to simplify team development. So researchers can easily focus on decision-making and implementing machine learning methods. SS2D actions and behaviors are only partially accurate due to different challenges, such as noise and partial observation. Therefore, one strategy is to implement alternative denoising methods to tackle observation inaccuracy. Our idea is to predict opponent positions while they have yet to be seen in a finite number of cycles using machine learning methods to make more accurate actions such as pass. We will explain our position prediction idea powered by Long Short-Term Memory models (LSTM) and Deep Neural Networks (DNN). The results show that the LSTM and DNN predict the opponents' position more accurately than the standard algorithm, such as the last-seen method.
    摘要 罗宾杯比赛有多个联赛, Soccer Simulation 2D 联赛是其中之一。 Soccer Simulation 2D(SS2D)比赛含有两支队伍,每支队伍有11名球员和一位教练,在比赛中竞争。球员仅能在比赛中与足球模拟服务器进行交流。为了促进队伍的开发,多个开源代码库被公开发布,让研究人员可以更集中地专注于决策和实现机器学习技术。由于 SS2D 的动作和行为仅部分准确,因此一种策略是实现替代的干扰方法来解决观察不准确的问题。我们的想法是使用长期记忆运算方法(LSTM)和深度神经网络(DNN)预测对手的位置,而不需要在比赛中观察对手。我们将说明我们使用 LSTM 和 DNN 预测对手位置的想法,以及其比于标准算法,如最后一次方法,的结果。

Evaluating Spatial Understanding of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.14540
  • repo_url: None
  • paper_authors: Yutaro Yamada, Yihan Bao, Andrew K. Lampinen, Jungo Kasai, Ilker Yildirim
  • for: The paper studies whether large language models (LLMs) implicitly capture knowledge of spatial structure.
  • methods: Natural-language navigation tasks are designed to evaluate the ability of different models (GPT-3.5-turbo, GPT-4, and the Llama2 series) to represent and reason about spatial structures, and their performance is compared with human performance on the same tasks.
  • results: LLM performance varies substantially across spatial structures, including square, hexagonal, and triangular grids, rings, and trees. Like humans, LLMs use object names as landmarks for maintaining spatial maps, and error analysis shows that their mistakes reflect both spatial and non-spatial factors. These findings suggest that LLMs capture certain aspects of spatial structure implicitly, but that there is still room for improvement.
    Abstract Large language models (LLMs) show remarkable capabilities across a variety of tasks. Despite the models only seeing text in training, several recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts. Here, we explore LLM representations of a particularly salient kind of grounded knowledge -- spatial relationships. We design natural-language navigation tasks and evaluate the ability of LLMs, in particular GPT-3.5-turbo, GPT-4, and Llama2 series models, to represent and reason about spatial structures, and compare these abilities to human performance on the same tasks. These tasks reveal substantial variability in LLM performance across different spatial structures, including square, hexagonal, and triangular grids, rings, and trees. We also discover that, similar to humans, LLMs utilize object names as landmarks for maintaining spatial maps. Finally, in extensive error analysis, we find that LLMs' mistakes reflect both spatial and non-spatial factors. These findings suggest that LLMs appear to capture certain aspects of spatial structure implicitly, but room for improvement remains.
    摘要 大型语言模型(LLM)表现出了多种任务中的出色能力。尽管模型只在训练中看到文本,但数据rekent studies表明,LLM表示含有下面的基本概念。在这里,我们研究LLM表示的空间关系,并评估不同模型在不同空间结构上的表现,包括正方形、六角形和三角形网格、环和树。我们还发现,与人类类似,LLM使用物品名称作为空间地图的标志。最后,我们进行了详细的错误分析,发现LLM的错误是由空间和非空间因素共同决定的。这些发现表明LLM capture了一些空间结构的方面,但还有待改进的空间。

Context-Aware Prediction of User Engagement on Online Social Platforms

  • paper_url: http://arxiv.org/abs/2310.14533
  • repo_url: None
  • paper_authors: Heinrich Peters, Yozen Liu, Francesco Barbieri, Raiyan A. Baten, Sandra C. Matz, Maarten W. Bos
  • for: The study aims to predict and understand regularities in user behavior on online social platforms in order to improve platform functionality while preserving privacy.
  • methods: Deep LSTM neural networks are used to analyze more than 100 million Snapchat sessions from almost 80,000 users, predicting active and passive use from features of past behavior and momentary context.
  • results: Incorporating context information substantially improves predictive performance over the behavioral baseline, and a large proportion of variance can be accounted for with minimal behavioral histories when momentary context is considered; features related to smartphone connectivity status, location, temporal context, and weather capture non-redundant variance in user engagement.
    Abstract The success of online social platforms hinges on their ability to predict and understand user behavior at scale. Here, we present data suggesting that context-aware modeling approaches may offer a holistic yet lightweight and potentially privacy-preserving representation of user engagement on online social platforms. Leveraging deep LSTM neural networks to analyze more than 100 million Snapchat sessions from almost 80.000 users, we demonstrate that patterns of active and passive use are predictable from past behavior (R2=0.345) and that the integration of context information substantially improves predictive performance compared to the behavioral baseline model (R2=0.522). Features related to smartphone connectivity status, location, temporal context, and weather were found to capture non-redundant variance in user engagement relative to features derived from histories of in-app behaviors. Further, we show that a large proportion of variance can be accounted for with minimal behavioral histories if momentary context information is considered (R2=0.44). These results indicate the potential of context-aware approaches for making models more efficient and privacy-preserving by reducing the need for long data histories. Finally, we employ model explainability techniques to glean preliminary insights into the underlying behavioral mechanisms. Our findings are consistent with the notion of context-contingent, habit-driven patterns of active and passive use, underscoring the value of contextualized representations of user behavior for predicting user engagement on social platforms.
    摘要 在线社交平台的成功取决于它们能否大规模预测和理解用户行为。在这里,我们给出的数据表明,情境感知的建模方法可以为在线社交平台上的用户参与提供一种整体、轻量且可能有利于隐私保护的表示。我们使用深度LSTM神经网络分析了来自近8万名用户的超过1亿次Snapchat会话,结果显示主动与被动使用模式可以从过去行为中预测(R2=0.345),而纳入情境信息后,预测性能显著优于行为基线模型(R2=0.522)。与智能手机连接状态、位置、时间情境和天气相关的特征,相对于基于应用内行为历史的特征,捕捉到了用户参与中不重叠的方差。此外,我们发现,若考虑即时情境信息,仅凭极少的行为历史即可解释大部分方差(R2=0.44)。这些结果表明,情境感知方法有潜力通过减少对长期数据历史的依赖,使模型更高效并更有利于隐私保护。最后,我们使用模型可解释性技术,初步洞察其背后的行为机制。我们的发现与情境依赖、习惯驱动的主动与被动使用模式的观点一致,凸显了情境化用户行为表示对预测社交平台用户参与的价值。
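
    As a rough illustration of the modeling setup described above, the sketch below combines an LSTM over past behavioral features with momentary context features before a regression head. It assumes PyTorch; the feature dimensions, architecture details, and session granularity are invented for illustration and are not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class EngagementPredictor(nn.Module):
    """LSTM over past in-app behavior, fused with momentary context features."""

    def __init__(self, behav_dim=16, ctx_dim=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(behav_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + ctx_dim, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, behav_seq, context):
        # behav_seq: (batch, n_past_sessions, behav_dim); context: (batch, ctx_dim)
        _, (h_n, _) = self.lstm(behav_seq)          # summarize the behavioral history
        fused = torch.cat([h_n[-1], context], dim=-1)
        return self.head(fused).squeeze(-1)          # predicted engagement score

model = EngagementPredictor()
pred = model(torch.randn(4, 10, 16), torch.randn(4, 8))
print(pred.shape)  # torch.Size([4])
```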

Towards Zero Shot Learning in Restless Multi-armed Bandits

  • paper_url: http://arxiv.org/abs/2310.14526
  • repo_url: None
  • paper_authors: Yunfan Zhao, Nikhil Behari, Edward Hughes, Edwin Zhang, Dheeraj Nagaraj, Karl Tuyls, Aparna Taneja, Milind Tambe
  • for: 解决Restless Multi-Arm Bandits(RMAB)问题,提高资源分配的效率和适应能力。
  • methods: 使用神经网络模型PreFeRMAB,通过预训练和精度训练来解决RMAB问题,并且可以处理连续状态和多个动作选择。
  • results: 提出了一种针对关键 $\lambda$-网络的新更新规则,并给出理论收敛保证;同时通过实验,在多个受实际应用启发的挑战性问题上验证了该方法的优势。
    Abstract Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt-in and opt-out over time, a common challenge in many real world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast generalization, we learn a novel single policy network model that utilizes feature information and employs a training procedure in which arms opt-in and out over time. We derive a new update rule for a crucial $\lambda$-network with theoretical convergence guarantees and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems.
    摘要 不安分多臂老虎机(Restless Multi-Arm Bandits, RMAB)是一类资源分配问题,在医疗、在线广告和反偷猎等领域有广泛应用,近来已从多智能体强化学习的角度得到研究。以往的RMAB研究存在若干局限,例如无法充分处理连续状态,并且当臂随时间加入或退出时需要从头重新训练,而这在许多实际应用中是常见的挑战。我们通过开发一个基于神经网络的预训练模型(PreFeRMAB)来解决这些局限:它对大量此前未见过的RMAB具有通用的零样本能力,并且可以在特定实例上进行微调,其样本效率高于从头重新训练。我们的模型还支持一般的多动作设定以及离散或连续状态空间。为了实现快速泛化,我们学习了一种新的单一策略网络模型,它利用特征信息,并采用臂随时间加入和退出的训练流程。我们为关键的 $\lambda$-网络推导了带有理论收敛保证的新更新规则,并在多个受现实问题启发的挑战性任务上通过实验展示了我们方法的优势。

Do We Really Need Contrastive Learning for Graph Representation?

  • paper_url: http://arxiv.org/abs/2310.14525
  • repo_url: None
  • paper_authors: Yulan Hu, Sheng Ouyang, Jingyu Liu, Ge Chen, Zhirui Yang, Junchen Wan, Fuzheng Zhang, Zhongyuan Wang, Yong Liu
  • for: 本研究的目的是提出一种简单 yet effective的图学习模型,用于减少对大图的计算开销和提高图 embedding 的质量。
  • methods: 本研究使用 rank learning 技术,首先通过损害生成两个图视图,然后计算对应节点的对比性分数,并通过对比性分数进行排名学习来度量对应节点之间的相似性。
  • results: 对多个图任务进行了广泛的实验,结果显示 GraphRank 模型在多种图任务中表现出色,并且比其他当今最佳 GCL 方法更高效。
    Abstract In recent years, contrastive learning has emerged as a dominant self-supervised paradigm, attracting numerous research interests in the field of graph learning. Graph contrastive learning (GCL) aims to embed augmented anchor samples close to each other while pushing the embeddings of other samples (negative samples) apart. However, existing GCL methods require large and diverse negative samples to ensure the quality of embeddings, and recent studies typically leverage samples excluding the anchor and positive samples as negative samples, potentially introducing false negative samples (negatives that share the same class as the anchor). Additionally, this practice can result in heavy computational burden and high time complexity of $O(N^2)$, which is particularly unaffordable for large graphs. To address these deficiencies, we leverage rank learning and propose a simple yet effective model, GraphRank. Specifically, we first generate two graph views through corruption. Then, we compute the similarity of pairwise nodes (anchor node and positive node) in both views, an arbitrary node in the latter view is selected as a negative node, and its similarity with the anchor node is computed. Based on this, we introduce rank-based learning to measure similarity scores, which successfully relieves the false negative problem and decreases the time complexity from $O(N^2)$ to $O(N)$. Moreover, we conducted extensive experiments across multiple graph tasks, demonstrating that GraphRank performs favorably against other cutting-edge GCL methods in various tasks.
    摘要 近年来,对比学习已成为图学习领域占主导地位的自监督范式,吸引了大量研究兴趣。图对比学习(GCL)的目标是让增强后的锚样本的嵌入彼此靠近,同时推远其他样本(负样本)的嵌入。然而,现有GCL方法需要大量且多样的负样本来保证嵌入质量,且近期研究通常将除锚样本和正样本以外的样本作为负样本,这可能引入假负样本(与锚样本同类的负样本)。此外,这种做法会带来沉重的计算负担和高达 $O(N^2)$ 的时间复杂度,对大规模图而言尤其难以承受。为了解决这些不足,我们利用排序学习,提出了一个简单而有效的模型GraphRank。具体来说,我们首先通过扰动生成两个图视图;然后在两个视图中计算成对节点(锚节点与正节点)的相似度,并在后一个视图中任取一个节点作为负节点,计算其与锚节点的相似度。在此基础上,我们引入基于排序的学习来度量相似度分数,从而有效缓解假负样本问题,并将时间复杂度从 $O(N^2)$ 降至 $O(N)$。此外,我们在多个图任务上进行了大量实验,结果表明GraphRank在各类任务中的表现优于其他最先进的GCL方法。
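
    The single-negative, rank-based idea behind GraphRank's $O(N)$ complexity can be sketched as follows. This is an illustrative loss only, assuming PyTorch and a margin-based ranking objective; the paper's exact loss formulation and graph-corruption procedure are not reproduced here.

```python
import torch
import torch.nn.functional as F

def graphrank_style_loss(z1, z2, margin=0.5):
    """Rank-style objective with a single random negative per anchor (O(N)).

    z1, z2: node embeddings from two corrupted views, shape (N, d);
    row i of both views corresponds to the same node (the positive pair).
    """
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    pos_sim = (z1 * z2).sum(-1)                # anchor vs. its positive
    perm = torch.randperm(z2.size(0))          # one arbitrary negative per anchor
    neg_sim = (z1 * z2[perm]).sum(-1)
    # encourage the positive similarity to rank above the negative one by a margin
    return F.relu(margin - pos_sim + neg_sim).mean()

loss = graphrank_style_loss(torch.randn(100, 32), torch.randn(100, 32))
print(float(loss))
```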

CorefPrompt: Prompt-based Event Coreference Resolution by Measuring Event Type and Argument Compatibilities

  • paper_url: http://arxiv.org/abs/2310.14512
  • repo_url: https://github.com/jsksxs360/prompt-event-coref-emnlp2023
  • paper_authors: Sheng Xu, Peifeng Li, Qiaoming Zhu
  • for: 这篇论文是关于Event Coreference Resolution (ECR)的研究,ECR是将事件提及集成到同一个群组的过程。
  • methods: 这篇论文提出了一种基于提示的方法,即CorefPrompt,将ECR转化为一种cloze-style MLM任务,同时实现事件模型和核心关系划分的同时学习。此外, paper 还提出了两个辅助任务,事件类兼容性和参数兼容性,以便显示ECR的决策过程。
  • results: 实验结果表明,CorefPrompt 在最新的标准benchmark上表现出色。
    Abstract Event coreference resolution (ECR) aims to group event mentions referring to the same real-world event into clusters. Most previous studies adopt the "encoding first, then scoring" framework, making the coreference judgment rely on event encoding. Furthermore, current methods struggle to leverage human-summarized ECR rules, e.g., coreferential events should have the same event type, to guide the model. To address these two issues, we propose a prompt-based approach, CorefPrompt, to transform ECR into a cloze-style MLM (masked language model) task. This allows for simultaneous event modeling and coreference discrimination within a single template, with a fully shared context. In addition, we introduce two auxiliary prompt tasks, event-type compatibility and argument compatibility, to explicitly demonstrate the reasoning process of ECR, which helps the model make final predictions. Experimental results show that our method CorefPrompt performs well in a state-of-the-art (SOTA) benchmark.
    摘要 事件共指消解(ECR)的目标是将指向同一真实世界事件的事件提及聚类到一起。以往的大多数研究采用"先编码、后打分"的框架,使共指判断依赖于事件编码。此外,现有方法难以利用人工总结的ECR规则(例如,共指事件应具有相同的事件类型)来引导模型。为了解决这两个问题,我们提出了一种基于提示的方法CorefPrompt,将ECR转化为cloze风格的掩码语言模型(MLM)任务。这样可以在单个模板中、在完全共享的上下文下,同时完成事件建模和共指判别。此外,我们引入了两个辅助提示任务——事件类型兼容性和论元兼容性——以显式呈现ECR的推理过程,帮助模型做出最终预测。实验结果表明,我们的方法CorefPrompt在最新的标准benchmark上表现出色。
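
    A toy illustration of the cloze-style idea: event modeling and the auxiliary compatibility questions share one template with mask slots for an MLM to fill. The wording and mask placement below are assumptions for illustration, not CorefPrompt's actual templates.

```python
def build_coref_cloze(sentence, trigger1, trigger2, mask_token="[MASK]"):
    """Wrap two event mentions in a single cloze-style template.

    An MLM would be asked to fill the masks, e.g. with 'same'/'different', covering
    event-type compatibility and the final coreference decision in one shared context.
    """
    return (f"{sentence} The event \"{trigger1}\" and the event \"{trigger2}\" "
            f"have {mask_token} event types, and they refer to {mask_token} "
            f"real-world events.")

print(build_coref_cloze(
    "A bomb exploded downtown; the blast injured three people.",
    "exploded", "blast"))
```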

Iteratively Learn Diverse Strategies with State Distance Information

  • paper_url: http://arxiv.org/abs/2310.14509
  • repo_url: None
  • paper_authors: Wei Fu, Weihua Du, Jingwei Li, Sunli Chen, Jingzhao Zhang, Yi Wu
  • for: 解决复杂强化学习(RL)问题中,奖励相近的策略可能表现出截然不同行为的问题。
  • methods: 我们的研究检验了两种设计选择来解决这个挑战,即多样性度量和计算框架。我们发现现有多样性度量可能无法准确捕捉行为差异,因此我们提出了基于状态空间距离信息的多样性度量。此外,我们检验了两种常见的计算框架,即人口基础训练(PBT)和迭代学习(ITR)。我们发现虽然PBT是正确的问题形式,但ITR可以在计算效率更高的情况下实现相同的多样性分数,从而提高实际解决方案的质量。
  • results: 基于我们的分析,我们进一步将ITR与两种易于计算的基于状态距离的多样性度量相结合,开发了一种新的多样性驱动RL算法——基于状态的内在奖励策略优化(SIPO)。我们在从机器人运动到多智能体游戏的三个领域进行了实验,在所有测试环境中,SIPO都能持续生成策略上多样且人类可理解的策略,而现有基线无法发现这些策略。
    Abstract In complex reinforcement learning (RL) problems, policies with similar rewards may have substantially different behaviors. It remains a fundamental challenge to optimize rewards while also discovering as many diverse strategies as possible, which can be crucial in many practical applications. Our study examines two design choices for tackling this challenge, i.e., diversity measure and computation framework. First, we find that with existing diversity measures, visually indistinguishable policies can still yield high diversity scores. To accurately capture the behavioral difference, we propose to incorporate the state-space distance information into the diversity measure. In addition, we examine two common computation frameworks for this problem, i.e., population-based training (PBT) and iterative learning (ITR). We show that although PBT is the precise problem formulation, ITR can achieve comparable diversity scores with higher computation efficiency, leading to improved solution quality in practice. Based on our analysis, we further combine ITR with two tractable realizations of the state-distance-based diversity measures and develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties. We empirically examine SIPO across three domains from robot locomotion to multi-agent games. In all of our testing environments, SIPO consistently produces strategically diverse and human-interpretable policies that cannot be discovered by existing baselines.
    摘要 在复杂的强化学习(RL)问题中,奖励相近的策略可能具有截然不同的行为。在优化奖励的同时尽可能发现多样的策略,仍然是一个基本挑战,而这在许多实际应用中至关重要。我们的研究考察了解决这一挑战的两个设计选择,即多样性度量和计算框架。首先,我们发现在现有多样性度量下,视觉上无法区分的策略仍可能获得很高的多样性分数。为了准确刻画行为差异,我们提议将状态空间距离信息纳入多样性度量。此外,我们考察了该问题的两种常见计算框架,即基于种群的训练(PBT)和迭代学习(ITR)。我们发现,虽然PBT是精确的问题形式,但ITR能以更高的计算效率达到相当的多样性分数,从而在实践中提升解的质量。基于上述分析,我们进一步将ITR与两种易于计算的基于状态距离的多样性度量相结合,开发了一种具有可证明收敛性质的新型多样性驱动RL算法——基于状态的内在奖励策略优化(SIPO)。我们在从机器人运动到多智能体游戏的三个领域进行了实验。在所有测试环境中,SIPO都能持续生成策略上多样且人类可解释的策略,而现有基线无法发现这些策略。
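
    A minimal sketch of a state-distance-based diversity measure of the kind motivated above: two reward-equivalent policies that visit different regions of the state space score as behaviorally diverse. The specific choice here (mean pairwise Euclidean distance over rollout states) is illustrative and not necessarily one of the paper's tractable realizations.

```python
import numpy as np

def state_distance_diversity(states_a, states_b):
    """Behavioral diversity as mean pairwise distance between visited states.

    states_a, states_b: arrays of shape (T, state_dim) collected by rolling out
    two policies. Policies with similar rewards but different visited regions
    of the state space score high, unlike purely reward-based measures.
    """
    diff = states_a[:, None, :] - states_b[None, :, :]
    return float(np.linalg.norm(diff, axis=-1).mean())

rng = np.random.default_rng(0)
print(state_distance_diversity(rng.normal(0, 1, (50, 4)),
                               rng.normal(3, 1, (50, 4))))
```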

Counterfactual Explanation Generation with s(CASP)

  • paper_url: http://arxiv.org/abs/2310.14497
  • repo_url: None
  • paper_authors: Sopam Dasgupta, Farhad Shakerin, Joaquín Arias, Elmer Salazar, Gopal Gupta
  • for: This paper focuses on the problem of automatically generating counterfactual explanations to provide justifications for decision-making models.
  • methods: The approach used in this paper is based on answer set programming (ASP) and the s(CASP) goal-directed ASP system. The query-driven nature of s(CASP) allows for the generation of counterfactual explanations as proof trees.
  • results: The paper shows how counterfactual explanations can be computed and justified by imagining multiple possible worlds where some or all factual assumptions are untrue, and how the algorithm can be used to find the Craig Interpolant for a class of answer set programs for a failing query.
    Abstract Machine learning models that automate decision-making are increasingly being used in consequential areas such as loan approvals, pretrial bail, hiring, and many more. Unfortunately, most of these models are black-boxes, i.e., they are unable to reveal how they reach these prediction decisions. A need for transparency demands justification for such predictions. An affected individual might desire explanations to understand why a decision was made. Ethical and legal considerations may further require informing the individual of changes in the input attribute that could be made to produce a desirable outcome. This paper focuses on the latter problem of automatically generating counterfactual explanations. Our approach utilizes answer set programming and the s(CASP) goal-directed ASP system. Answer Set Programming (ASP) is a well-known knowledge representation and reasoning paradigm. s(CASP) is a goal-directed ASP system that executes answer-set programs top-down without grounding them. The query-driven nature of s(CASP) allows us to provide justifications as proof trees, which makes it possible to analyze the generated counterfactual explanations. We show how counterfactual explanations are computed and justified by imagining multiple possible worlds where some or all factual assumptions are untrue and, more importantly, how we can navigate between these worlds. We also show how our algorithm can be used to find the Craig Interpolant for a class of answer set programs for a failing query.
    摘要 自动化决策的机器学习模型正越来越多地被用于贷款审批、审前保释、招聘等影响重大的领域。然而,这些模型大多是黑盒模型,即无法揭示它们是如何得出预测决策的。对透明度的需求要求为此类预测提供依据:受影响的个体可能希望获得解释,以理解决策是如何做出的;出于伦理和法律方面的考虑,还可能需要告知个体对输入属性做出哪些改变可以得到期望的结果。本文关注后一个问题,即自动生成反事实解释。我们的方法使用回答集编程(ASP)以及目标导向的s(CASP)系统。回答集编程是一种著名的知识表示与推理范式;s(CASP)是一种目标导向的ASP系统,它自顶向下执行回答集程序而无需实例化(grounding)。s(CASP)查询驱动的特性使我们能够以证明树的形式给出依据,从而可以分析所生成的反事实解释。我们展示了如何通过设想多个可能世界(其中部分或全部事实假设不成立)来计算并论证反事实解释,更重要的是,如何在这些世界之间导航。我们还展示了如何利用我们的算法,为某一类回答集程序中失败的查询找到Craig插值(Craig Interpolant)。

InstructExcel: A Benchmark for Natural Language Instruction in Excel

  • paper_url: http://arxiv.org/abs/2310.14495
  • repo_url: None
  • paper_authors: Justin Payan, Swaroop Mishra, Mukul Singh, Carina Negreanu, Christian Poelitz, Chitta Baral, Subhro Roy, Rasika Chakravarthy, Benjamin Van Durme, Elnaz Nouri
  • for: investigate whether Large Language Models (LLMs) can generate code (Excel OfficeScripts, a TypeScript API for executing many tasks in Excel) that solves Excel specific tasks provided via natural language user instructions.
  • methods: introduce a new large-scale benchmark, InstructExcel, created by leveraging the ‘Automate’ feature in Excel to automatically generate OfficeScripts from users’ actions.
  • results: observe that (1) using GPT-4 over GPT-3.5, (2) providing more in-context examples, and (3) dynamic prompting can help improve performance on this benchmark.
    Abstract With the evolution of Large Language Models (LLMs) we can solve increasingly more complex NLP tasks across various domains, including spreadsheets. This work investigates whether LLMs can generate code (Excel OfficeScripts, a TypeScript API for executing many tasks in Excel) that solves Excel specific tasks provided via natural language user instructions. To do so we introduce a new large-scale benchmark, InstructExcel, created by leveraging the 'Automate' feature in Excel to automatically generate OfficeScripts from users' actions. Our benchmark includes over 10k samples covering 170+ Excel operations across 2,000 publicly available Excel spreadsheets. Experiments across various zero-shot and few-shot settings show that InstructExcel is a hard benchmark for state of the art models like GPT-4. We observe that (1) using GPT-4 over GPT-3.5, (2) providing more in-context examples, and (3) dynamic prompting can help improve performance on this benchmark.
    摘要 随着大型语言模型(LLM)的发展,我们可以解决越来越复杂的自然语言处理(NLP)任务,包括电子表格相关任务。这项工作研究了LLM是否能够根据自然语言用户指令生成代码(Excel OfficeScripts,一个用于在Excel中执行多种任务的TypeScript API),以解决Excel特定的任务。为此,我们创建了一个新的大规模基准InstructExcel,它利用Excel中的“自动化”功能,从用户的操作中自动生成OfficeScripts。我们的基准包括超过10,000个样本,覆盖2,000个公开可用的Excel表格上的170多种Excel操作。在多种零样本和少样本设置下的实验表明,InstructExcel对GPT-4等最先进模型而言是一个困难的基准。我们发现以下三点有助于提高在该基准上的表现:(1)使用GPT-4而非GPT-3.5,(2)提供更多的上下文示例,以及(3)动态提示。

Robotic Arm Manipulation to Perform Rock Skipping in Simulation

  • paper_url: http://arxiv.org/abs/2310.14492
  • repo_url: None
  • paper_authors: Nicholas Ramirez, Michael Burgess
  • for: 这个项目的目的是将岩石跳跃引入机器人设定中,利用机器人抓取和动态环境来实现岩石跳跃。
  • methods: 该项目使用了机器人臂和动态环境来实现岩石跳跃,并通过调整发射速度来获得最大跳跃数。
  • results: 该项目遇到了抓取不稳定和发射高度轨迹问题,这些问题将在报告中进行详细介绍。
    Abstract Rock skipping is a highly dynamic and relatively complex task that can easily be performed by humans. This project aims to bring rock skipping into a robotic setting, utilizing the lessons we learned in Robotic Manipulation. Specifically, this project implements a system consisting of a robotic arm and dynamic environment to perform rock skipping in simulation. By varying important parameters such as release velocity, we hope to use our system to gain insight into the most important factors for maximizing the total number of skips. In addition, by implementing the system in simulation, we have a more rigorous and precise testing approach over these varied test parameters. However, this project experienced some limitations due to gripping inefficiencies and problems with release height trajectories which is further discussed in our report.
    摘要 石头跳是一种高度动态和相对复杂的任务,人类可以轻松完成。这个项目想将石头跳带入机器人环境中,利用我们在机器人抓取方面学到的教训。具体来说,这个项目实现了一个由机器人臂和动态环境组成的系统,用于在模拟环境中进行石头跳。通过调整发射速度等重要参数,我们希望通过这个系统获得最大化石头跳数的最佳因素。此外,由于在模拟环境中实现系统,我们可以对这些参数进行更加严格和精炼的测试。然而,这个项目受到了抓取不足和发射高度轨迹问题的限制,这些问题在报告中进行了详细说明。

VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations

  • paper_url: http://arxiv.org/abs/2310.14487
  • repo_url: None
  • paper_authors: Yiying Yang, Wen Liu, Fukun Yin, Xin Chen, Gang Yu, Jiayuan Fan, Tao Chen
  • for: 高精度表面重建和新视图生成
  • methods: 使用vector quantization提高implicit neural representation,并实现多尺度NeRF采样 schemes
  • results: 实现高品质渲染和高效率,并在DTU、BlendMVS和H3DS数据集上达到最佳平衡。
    Abstract Recent advancements in implicit neural representations have contributed to high-fidelity surface reconstruction and photorealistic novel view synthesis. However, the computational complexity inherent in these methodologies presents a substantial impediment, constraining the attainable frame rates and resolutions in practical applications. In response to this predicament, we propose VQ-NeRF, an effective and efficient pipeline for enhancing implicit neural representations via vector quantization. The essence of our method involves reducing the sampling space of NeRF to a lower resolution and subsequently reinstating it to the original size utilizing a pre-trained VAE decoder, thereby effectively mitigating the sampling time bottleneck encountered during rendering. Although the codebook furnishes representative features, reconstructing fine texture details of the scene remains challenging due to high compression rates. To overcome this constraint, we design an innovative multi-scale NeRF sampling scheme that concurrently optimizes the NeRF model at both compressed and original scales to enhance the network's ability to preserve fine details. Furthermore, we incorporate a semantic loss function to improve the geometric fidelity and semantic coherence of our 3D reconstructions. Extensive experiments demonstrate the effectiveness of our model in achieving the optimal trade-off between rendering quality and efficiency. Evaluation on the DTU, BlendMVS, and H3DS datasets confirms the superior performance of our approach.
    摘要 最近隐式神经表示的进展促进了高保真表面重建和照片级真实感的新视角合成。然而,这些方法固有的计算复杂度构成了重大障碍,限制了实际应用中可达到的帧率和分辨率。为了解决这一问题,我们提出了VQ-NeRF,一种通过向量量化来增强隐式神经表示的有效且高效的流程。我们方法的核心在于将NeRF的采样空间降低到较低分辨率,随后利用预训练的VAE解码器将其恢复到原始尺寸,从而有效缓解渲染过程中采样时间的瓶颈。尽管码本提供了具有代表性的特征,但由于压缩率较高,重建场景的精细纹理细节仍然具有挑战性。为克服这一限制,我们设计了一种创新的多尺度NeRF采样方案,在压缩和原始两个尺度上同时优化NeRF模型,以增强网络保留细节的能力。此外,我们引入语义损失函数,以提升三维重建的几何保真度和语义一致性。大量实验表明,我们的模型在渲染质量与效率之间取得了最佳平衡。在DTU、BlendMVS和H3DS数据集上的评估证实了我们方法的优越性能。

Intelligent Escape of Robotic Systems: A Survey of Methodologies, Applications, and Challenges

  • paper_url: http://arxiv.org/abs/2310.14485
  • repo_url: None
  • paper_authors: Junfei Li, Simon X. Yang
  • for: 本研究评论了智能逃脱robotic系统的最新研究成果,以帮助读者更好地理解这一领域的发展趋势。
  • methods: 本文主要介绍了四类智能逃脱方法,包括基于规划的方法、基于划分的方法、基于学习的方法和受生物启发的方法。
  • results: 本文总结了现有方法的优势与局限,并讨论了智能逃脱在搜救、疏散、军事安防和医疗等多个领域的潜在应用。此外,该综述指出了当前的研究挑战,并展望了智能逃脱的未来研究趋势。
    Abstract Intelligent escape is an interdisciplinary field that employs artificial intelligence (AI) techniques to enable robots with the capacity to intelligently react to potential dangers in dynamic, intricate, and unpredictable scenarios. As the emphasis on safety becomes increasingly paramount and advancements in robotic technologies continue to advance, a wide range of intelligent escape methodologies has been developed in recent years. This paper presents a comprehensive survey of state-of-the-art research work on intelligent escape of robotic systems. Four main methods of intelligent escape are reviewed, including planning-based methodologies, partitioning-based methodologies, learning-based methodologies, and bio-inspired methodologies. The strengths and limitations of existing methods are summarized. In addition, potential applications of intelligent escape are discussed in various domains, such as search and rescue, evacuation, military security, and healthcare. In an effort to develop new approaches to intelligent escape, this survey identifies current research challenges and provides insights into future research trends in intelligent escape.
    摘要 智能逃脱是一个跨学科领域,它利用人工智能(AI)技术,使机器人能够在动态、复杂且不可预测的场景中对潜在危险做出智能反应。随着对安全性的重视日益提高以及机器人技术的不断进步,近年来涌现出了多种多样的智能逃脱方法。本文对机器人系统智能逃脱的最新研究工作进行了全面综述,回顾了四类主要的智能逃脱方法:基于规划的方法、基于划分的方法、基于学习的方法和受生物启发的方法,并总结了现有方法的优势与局限。此外,本文还讨论了智能逃脱在搜救、疏散、军事安防和医疗等多个领域的潜在应用。为了推动智能逃脱新方法的发展,本综述指出了当前的研究挑战,并展望了智能逃脱的未来研究趋势。

cs.CL - 2023-10-23

GPT-4 as an Effective Zero-Shot Evaluator for Scientific Figure Captions

  • paper_url: http://arxiv.org/abs/2310.15405
  • repo_url: None
  • paper_authors: Ting-Yao Hsu, Chieh-Yang Huang, Ryan Rossi, Sungchul Kim, C. Lee Giles, Ting-Hao K. Huang
  • for: This paper aims to evaluate the effectiveness of using large language models (LLMs) as a cost-effective, reference-free method for assessing the quality of scientific figure captions.
  • methods: The authors constructed a human evaluation dataset called SCICAP-EVAL, which contains human judgments for 3,600 scientific figure captions, and used LLMs like GPT-4 and GPT-3 to score each caption based on its potential to aid reader understanding.
  • results: The results show that GPT-4, used as a zero-shot evaluator, outperformed all other models and even surpassed assessments made by Computer Science and Informatics undergraduates, achieving a Kendall correlation score of 0.401 with Ph.D. students' rankings.
    Abstract There is growing interest in systems that generate captions for scientific figures. However, assessing these systems output poses a significant challenge. Human evaluation requires academic expertise and is costly, while automatic evaluation depends on often low-quality author-written captions. This paper investigates using large language models (LLMs) as a cost-effective, reference-free method for evaluating figure captions. We first constructed SCICAP-EVAL, a human evaluation dataset that contains human judgments for 3,600 scientific figure captions, both original and machine-made, for 600 arXiv figures. We then prompted LLMs like GPT-4 and GPT-3 to score (1-6) each caption based on its potential to aid reader understanding, given relevant context such as figure-mentioning paragraphs. Results show that GPT-4, used as a zero-shot evaluator, outperformed all other models and even surpassed assessments made by Computer Science and Informatics undergraduates, achieving a Kendall correlation score of 0.401 with Ph.D. students' rankings.
    摘要 人们对自动为科学图表生成标题的系统的兴趣日益增长。然而,评估这些系统的输出是一项重大挑战:人工评估需要学术专业知识且成本高昂,而自动评估又依赖于质量往往不高的作者撰写标题。本文研究使用大型语言模型(LLM)作为一种低成本、无需参考的图表标题评估方法。我们首先构建了SCICAP-EVAL数据集,其中包含针对600张arXiv图表的3,600条科学图表标题(既有原始标题也有机器生成标题)的人类评估结果。随后,我们提示GPT-4、GPT-3等LLM在给定相关上下文(如提及图表的段落)的情况下,按1-6分为每条标题打分,评估其帮助读者理解图表的潜力。结果显示,GPT-4作为零样本评估器优于所有其他模型,甚至超过了计算机科学与信息学本科生的评估,其与博士生排名的Kendall相关系数达到0.401。
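
    A sketch of the zero-shot scoring setup described above, assuming the openai Python package (v1-style client) with OPENAI_API_KEY set; the prompt wording and 1-6 rubric below are illustrative, not the paper's exact prompt.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def score_caption(caption, mentioning_paragraph, model="gpt-4"):
    """Ask the model for a 1-6 helpfulness score, zero-shot (wording is illustrative)."""
    prompt = (
        "You will rate a scientific figure caption on a 1-6 scale, where 6 means the "
        "caption greatly helps a reader understand the figure.\n\n"
        f"Paragraph mentioning the figure:\n{mentioning_paragraph}\n\n"
        f"Caption:\n{caption}\n\n"
        "Reply with a single integer from 1 to 6."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # assumes the model complies with the single-integer instruction
    return int(resp.choices[0].message.content.strip())
```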

  • paper_url: http://arxiv.org/abs/2310.15398
  • repo_url: None
  • paper_authors: Li Lucy, Su Lin Blodgett, Milad Shokouhi, Hanna Wallach, Alexandra Olteanu
  • for: 本研究旨在考察人们对NLG系统应当如何表现的公平性相关假设,以厘清何为恰当的NLG系统行为。
  • methods: 研究人员开展了五个案例研究,对NLG系统输入中不同类型的身份相关语言特征(名称、角色、地点、方言和风格)进行扰动,以揭示不变性与适应性之间的张力。
  • results: 研究人员发现,支持适应的动机包括社会规范、文化差异、特征特有的信息和顺应;支持不变性的动机包括倾向于规范主义的观点、认为适应没有必要或NLG系统难以恰当实现的观点,以及对错误假设的担忧。这些发现表明,界定何为恰当的NLG系统行为仍是一个开放的挑战。
    Abstract Fairness-related assumptions about what constitutes appropriate NLG system behaviors range from invariance, where systems are expected to respond identically to social groups, to adaptation, where responses should instead vary across them. We design and conduct five case studies, in which we perturb different types of identity-related language features (names, roles, locations, dialect, and style) in NLG system inputs to illuminate tensions around invariance and adaptation. We outline people's expectations of system behaviors, and surface potential caveats of these two contrasting yet commonly-held assumptions. We find that motivations for adaptation include social norms, cultural differences, feature-specific information, and accommodation; motivations for invariance include perspectives that favor prescriptivism, view adaptation as unnecessary or too difficult for NLG systems to do appropriately, and are wary of false assumptions. Our findings highlight open challenges around defining what constitutes fair NLG system behavior.
    摘要 关于何为恰当的NLG系统行为的公平性相关假设,既包括不变性(期望系统对不同社会群体做出相同的回应),也包括适应性(期望系统的回应随群体而变化)。我们设计并开展了五个案例研究,对NLG系统输入中不同类型的身份相关语言特征(名称、角色、地点、方言和风格)进行扰动,以揭示不变性与适应性之间的张力。我们梳理了人们对系统行为的期望,并指出了这两种相互对立却又都很普遍的假设各自可能存在的问题。我们发现,支持适应的动机包括社会规范、文化差异、特征特有的信息和顺应;支持不变性的动机包括倾向于规范主义的观点、认为适应没有必要或NLG系统难以恰当实现的观点,以及对错误假设的担忧。我们的发现凸显了界定何为公平的NLG系统行为仍面临的开放挑战。

GD-COMET: A Geo-Diverse Commonsense Inference Model

  • paper_url: http://arxiv.org/abs/2310.15383
  • repo_url: None
  • paper_authors: Mehar Bhatia, Vered Shwartz
  • for: 提高AI系统的多元化和包容性,以服务于不同背景的用户。
  • methods: 基于COMET模型,开发了地域多样化版本GD-COMET,可以涵盖广泛的文化知识。
  • results: 通过在5种不同文化上进行的人工评估以及在一项地域多样化任务上的外部评估,GD-COMET能够捕捉并生成具有文化细微差别的常识知识,展示了其惠及各类NLP应用、推动NLP更具包容性的潜力。
    Abstract With the increasing integration of AI into everyday life, it's becoming crucial to design AI systems that serve users from diverse backgrounds by making them culturally aware. In this paper, we present GD-COMET, a geo-diverse version of the COMET commonsense inference model. GD-COMET goes beyond Western commonsense knowledge and is capable of generating inferences pertaining to a broad range of cultures. We demonstrate the effectiveness of GD-COMET through a comprehensive human evaluation across 5 diverse cultures, as well as extrinsic evaluation on a geo-diverse task. The evaluation shows that GD-COMET captures and generates culturally nuanced commonsense knowledge, demonstrating its potential to benefit NLP applications across the board and contribute to making NLP more inclusive.
    摘要 随着人工智能日益普遍化到日常生活中,设计能满足多元背景用户的AI系统已成为非常重要。在这篇论文中,我们提出了GD-COMET模型,这是一种基于地理多样化的COMET常识推理模型。GD-COMET不仅超越了西方常识知识,还能处理广泛的文化知识。我们通过对5个多元文化的人类评估和地理多样化任务的外显性评估,证明GD-COMET能够捕捉和生成文化差异化的常识知识,表明其在NLG、NLP和其他应用领域中的潜力。

A Review of Reinforcement Learning for Natural Language Processing, and Applications in Healthcare

  • paper_url: http://arxiv.org/abs/2310.18354
  • repo_url: None
  • paper_authors: Ying Liu, Haozhu Wang, Huixue Zhou, Mingchen Li, Yu Hou, Sicheng Zhou, Fang Wang, Rama Hoetzlein, Rui Zhang
  • for: 本文提供了一个RL在自然语言处理(NLP)领域的回顾,涵盖RL技术的发展、挑战和应用在医疗领域。
  • methods: 本文详细介绍了RL在NLP任务中的应用,包括对话系统、机器翻译、问答系统、文本摘要和信息提取等。
  • results: 本文评论了RL-NLP系统中的伦理考虑和偏见问题。
    Abstract Reinforcement learning (RL) has emerged as a powerful approach for tackling complex medical decision-making problems such as treatment planning, personalized medicine, and optimizing the scheduling of surgeries and appointments. It has gained significant attention in the field of Natural Language Processing (NLP) due to its ability to learn optimal strategies for tasks such as dialogue systems, machine translation, and question-answering. This paper presents a review of the RL techniques in NLP, highlighting key advancements, challenges, and applications in healthcare. The review begins by visualizing a roadmap of machine learning and its applications in healthcare. And then it explores the integration of RL with NLP tasks. We examined dialogue systems where RL enables the learning of conversational strategies, RL-based machine translation models, question-answering systems, text summarization, and information extraction. Additionally, ethical considerations and biases in RL-NLP systems are addressed.
    摘要 强化学习(RL)已成为应对复杂医疗决策问题(如治疗规划、个体化医疗以及手术和就诊排程优化)的有力方法。由于其能够为对话系统、机器翻译和问答等任务学习最优策略,RL在自然语言处理(NLP)领域也受到了广泛关注。本文综述了NLP中的RL技术,重点介绍其关键进展、面临的挑战以及在医疗领域的应用。综述首先勾勒了机器学习及其医疗应用的发展脉络,随后探讨了RL与NLP任务的结合:我们考察了利用RL学习对话策略的对话系统、基于RL的机器翻译模型、问答系统、文本摘要和信息抽取。此外,本文还讨论了RL-NLP系统中的伦理考量与偏见问题。

Specialist or Generalist? Instruction Tuning for Specific NLP Tasks

  • paper_url: http://arxiv.org/abs/2310.15326
  • repo_url: https://github.com/DavidFanzz/Generalist_or_Specialist
  • paper_authors: Chufan Shi, Yixuan Su, Cheng Yang, Yujiu Yang, Deng Cai
  • for: 本研究旨在探讨大语言模型(LLMs)同时完成多种自然语言处理(NLP)任务的能力。
  • methods: 研究采用了宽覆盖的通用指令微调(instruction tuning)技术,以提高LLM的总体性能。
  • results: 实验表明,将通用指令微调与专家模型相结合,可以提高模型的性能,尤其是当任务覆盖面较广时。此外,研究还发现,通用指令微调可以提升模型的理解和推理能力;但对于需要事实知识的任务,通用数据中包含的幻觉信息可能会产生负面影响。
    Abstract The potential of large language models (LLMs) to simultaneously perform a wide range of natural language processing (NLP) tasks has been the subject of extensive research. Although instruction tuning has proven to be a data-efficient method for transforming LLMs into such generalist models, their performance still lags behind specialist models trained exclusively for specific tasks. In this paper, we investigate whether incorporating broad-coverage generalist instruction tuning can contribute to building a specialist model. We hypothesize that its efficacy depends on task specificity and skill requirements. Our experiments assess four target tasks with distinct coverage levels, revealing that integrating generalist instruction tuning consistently enhances model performance when the task coverage is broad. The effect is particularly pronounced when the amount of task-specific training data is limited. Further investigation into three target tasks focusing on different capabilities demonstrates that generalist instruction tuning improves understanding and reasoning abilities. However, for tasks requiring factual knowledge, generalist data containing hallucinatory information may negatively affect the model's performance. Overall, our work provides a systematic guide for developing specialist models with general instruction tuning. Our code and other related resources can be found at https://github.com/DavidFanzz/Generalist_or_Specialist.
    摘要 大型语言模型(LLM)同时执行多种自然语言处理(NLP)任务的潜力已得到广泛研究。尽管指令微调被证明是一种将LLM转化为此类通用模型的数据高效方法,但其性能仍落后于专门针对特定任务训练的专家模型。在这篇论文中,我们研究引入宽覆盖的通用指令微调是否有助于构建专家模型,并假设其效果取决于任务的专门程度和技能需求。我们针对四个覆盖面不同的目标任务进行了实验,结果显示,当任务覆盖面较广时,引入通用指令微调能持续提升模型性能,且在任务特定训练数据有限时效果尤为明显。针对三个侧重不同能力的目标任务的进一步研究表明,通用指令微调能够提升理解与推理能力;然而,对于需要事实知识的任务,包含幻觉信息的通用数据可能会损害模型的性能。总体而言,我们的工作为利用通用指令微调构建专家模型提供了系统性指导。我们的代码和其他相关资源可在 https://github.com/DavidFanzz/Generalist_or_Specialist 获取。

LXMERT Model Compression for Visual Question Answering

  • paper_url: http://arxiv.org/abs/2310.15325
  • repo_url: https://github.com/ghazaleh-mahmoodi/lxmert_compression
  • paper_authors: Maryam Hashemi, Ghazaleh Mahmoudi, Sara Kodeiri, Hadi Sheikhi, Sauleh Eetemadi
  • for: 本研究旨在评估在VQA任务上微调LXMERT时是否存在可训练的子网络,并考察在不显著损失精度的前提下可以进行多大程度的剪枝。
  • methods: 本研究对LXMERT模型进行微调,并对其进行剪枝以缩减模型规模。
  • results: 实验结果表明,LXMERT模型可以被剪枝40%-60%的大小,而精度仅下降3%。
    Abstract Large-scale pretrained models such as LXMERT are becoming popular for learning cross-modal representations on text-image pairs for vision-language tasks. According to the lottery ticket hypothesis, NLP and computer vision models contain smaller subnetworks capable of being trained in isolation to full performance. In this paper, we combine these observations to evaluate whether such trainable subnetworks exist in LXMERT when fine-tuned on the VQA task. In addition, we perform a model size cost-benefit analysis by investigating how much pruning can be done without significant loss in accuracy. Our experiment results demonstrate that LXMERT can be effectively pruned by 40%-60% in size with 3% loss in accuracy.
    摘要 像LXMERT这样的大规模预训练模型正越来越多地被用于在文本-图像对上学习跨模态表示,以服务视觉-语言任务。根据彩票假设,NLP和计算机视觉模型中包含可以单独训练并达到完整性能的更小子网络。在本文中,我们结合这些观察,评估在VQA任务上微调LXMERT时是否存在这样的可训练子网络。此外,我们还进行了模型规模的成本效益分析,考察在不显著损失精度的前提下可以进行多大程度的剪枝。实验结果表明,LXMERT可以被有效地剪枝40%-60%的大小,而精度仅下降3%。
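
    The basic operation behind the 40%-60% figures, global magnitude pruning, can be sketched generically as below (PyTorch). This is not the paper's exact protocol, which fine-tunes LXMERT on VQA; the toy model and target sparsity here are placeholders.

```python
import torch
import torch.nn as nn

def global_magnitude_prune(model: nn.Module, sparsity: float = 0.5):
    """Zero out the smallest-magnitude weights across all Linear layers.

    Returns the fraction of weights actually removed (illustrative only).
    """
    weights = [m.weight for m in model.modules() if isinstance(m, nn.Linear)]
    all_vals = torch.cat([w.detach().abs().flatten() for w in weights])
    threshold = torch.quantile(all_vals, sparsity)   # global magnitude threshold
    removed, total = 0, 0
    with torch.no_grad():
        for w in weights:
            mask = w.abs() >= threshold
            w.mul_(mask.to(w.dtype))                 # keep large weights, zero the rest
            removed += (~mask).sum().item()
            total += mask.numel()
    return removed / total

toy = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
print(f"pruned fraction: {global_magnitude_prune(toy, 0.5):.2f}")
```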

Exploring the Potential of Large Language Models in Generating Code-Tracing Questions for Introductory Programming Courses

  • paper_url: http://arxiv.org/abs/2310.15317
  • repo_url: None
  • paper_authors: Aysa Xuemo Fan, Ranran Haoran Zhang, Luc Paquette, Rui Zhang
  • for: 这篇论文探讨了大型语言模型(LLM)在程序设计入门课程中生成代码跟踪问题的应用。
  • methods: 作者设计了有针对性的提示,引导GPT-4基于代码片段和描述生成代码跟踪问题。
  • results: 研究发现LLM能够生成多样化的代码跟踪问题;同时提供了一个由人类和LLM生成的跟踪问题组成的独特数据集,对教育和NLP研究领域都是有价值的资源。
    Abstract In this paper, we explore the application of large language models (LLMs) for generating code-tracing questions in introductory programming courses. We designed targeted prompts for GPT4, guiding it to generate code-tracing questions based on code snippets and descriptions. We established a set of human evaluation metrics to assess the quality of questions produced by the model compared to those created by human experts. Our analysis provides insights into the capabilities and potential of LLMs in generating diverse code-tracing questions. Additionally, we present a unique dataset of human and LLM-generated tracing questions, serving as a valuable resource for both the education and NLP research communities. This work contributes to the ongoing dialogue on the potential uses of LLMs in educational settings.
    摘要 在这篇论文中,我们探讨了大语言模型(LLMs)在初级编程课程中生成代码跟踪问题的应用。我们设计了特定的提示,引导GPT4生成基于代码片段和描述的代码跟踪问题。我们确定了一组用于评估模型生成的问题质量的人类评估指标。我们的分析提供了LLMs在生成多样化代码跟踪问题的能力和潜力的深入了解。此外,我们提供了一个独特的人类和LLM生成的跟踪问题集,作为教育和NLP研究领域的价值资源。这项工作贡献于LLMs在教育设置中的潜在用途的对话。
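
    A minimal sketch of the prompting step described above. The template wording below is an assumption for illustration; the paper designs its own targeted prompts for GPT-4 and adds human evaluation of the generated questions.

```python
def tracing_question_prompt(code_snippet, description):
    """Compose a targeted prompt asking an LLM for a code-tracing question.

    The instructor-facing wording here is illustrative, not the paper's prompt.
    """
    return (
        "You are helping an instructor of an introductory programming course.\n"
        f"Topic description: {description}\n"
        "Given the following code, write one code-tracing question that asks the "
        "student to predict the program's output, followed by the correct answer.\n\n"
        f"Code:\n{code_snippet}"
    )

snippet = "total = 0\nfor i in range(1, 4):\n    total += i\nprint(total)"
print(tracing_question_prompt(snippet, "loops and accumulation"))
```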

Probing Representations for Document-level Event Extraction

  • paper_url: http://arxiv.org/abs/2310.15316
  • repo_url: https://github.com/githubarry/docie-probing
  • paper_authors: Barry Wang, Xinya Du, Claire Cardie
  • for: 这个研究旨在应用探针(probing)框架,解释深度神经网络模型在文档级信息抽取(IE)应用中所学习的表示。
  • methods: 该研究使用八个嵌入探针,分析表示中与文档级事件抽取相关的表层、语义和事件理解能力。
  • results: 研究发现,基于LLM的文档级IE方法学习到的表示能够适度改进论元检测与标注,但对事件级任务的提升有限,且编码器在处理文档长度和跨句语篇方面存在困难。
    Abstract The probing classifiers framework has been employed for interpreting deep neural network models for a variety of natural language processing (NLP) applications. Studies, however, have largely focused on sentencelevel NLP tasks. This work is the first to apply the probing paradigm to representations learned for document-level information extraction (IE). We designed eight embedding probes to analyze surface, semantic, and event-understanding capabilities relevant to document-level event extraction. We apply them to the representations acquired by learning models from three different LLM-based document-level IE approaches on a standard dataset. We found that trained encoders from these models yield embeddings that can modestly improve argument detections and labeling but only slightly enhance event-level tasks, albeit trade-offs in information helpful for coherence and event-type prediction. We further found that encoder models struggle with document length and cross-sentence discourse.
    摘要 探针分类器框架已被用于解释深度神经网络模型在多种自然语言处理(NLP)应用中的行为,但以往研究主要集中在句子级NLP任务上。本工作首次将探针范式应用于文档级信息抽取(IE)所学习的表示。我们设计了八个嵌入探针,用于分析与文档级事件抽取相关的表层、语义和事件理解能力,并将它们应用于在标准数据集上用三种不同的基于LLM的文档级IE方法学习到的表示。我们发现,这些模型训练出的编码器所产生的嵌入能够适度改进论元检测与标注,但对事件级任务的提升有限,同时在有助于一致性和事件类型预测的信息方面存在权衡。我们还发现,编码器模型在处理文档长度和跨句语篇时表现欠佳。
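
    The probing methodology itself is straightforward to sketch: freeze the encoder, extract embeddings, and fit a lightweight classifier for the property of interest. The example below uses scikit-learn with synthetic data; the paper's eight probes target specific surface, semantic, and event-understanding properties rather than this toy label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def run_probe(frozen_embeddings, labels, seed=0):
    """Train a linear probe on frozen representations and report held-out accuracy.

    High probe accuracy suggests the property (e.g. an event-understanding label)
    is linearly decodable from the encoder's representations.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        frozen_embeddings, labels, test_size=0.2, random_state=seed)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return accuracy_score(y_te, probe.predict(X_te))

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 64))
lab = (emb[:, 0] > 0).astype(int)      # a toy, linearly decodable property
print(f"probe accuracy: {run_probe(emb, lab):.2f}")
```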

Adaptive End-to-End Metric Learning for Zero-Shot Cross-Domain Slot Filling

  • paper_url: http://arxiv.org/abs/2310.15294
  • repo_url: https://github.com/switchsyj/adae2ml-xsf
  • paper_authors: Yuanjun Shi, Linzhi Wu, Minglai Shao
  • for: 这篇论文针对零样本槽填充(zero-shot slot filling),即在训练时从未见过样本的新领域中进行槽填充。
  • methods: 本文提出了一种自适应的端到端度量学习方案,包括级联式联合学习框架、上下文感知的软标签表示和槽级对比表示学习,以兼顾简洁性、效率和泛化能力。
  • results: 实验结果显示,所提方法在公开基准数据集上显著优于此前的方法和其他具有竞争力的基线。
    Abstract Recently slot filling has witnessed great development thanks to deep learning and the availability of large-scale annotated data. However, it poses a critical challenge to handle a novel domain whose samples are never seen during training. The recognition performance might be greatly degraded due to severe domain shifts. Most prior works deal with this problem in a two-pass pipeline manner based on metric learning. In practice, these dominant pipeline models may be limited in computational efficiency and generalization capacity because of non-parallel inference and context-free discrete label embeddings. To this end, we re-examine the typical metric-based methods, and propose a new adaptive end-to-end metric learning scheme for the challenging zero-shot slot filling. Considering simplicity, efficiency and generalizability, we present a cascade-style joint learning framework coupled with context-aware soft label representations and slot-level contrastive representation learning to mitigate the data and label shift problems effectively. Extensive experiments on public benchmarks demonstrate the superiority of the proposed approach over a series of competitive baselines.
    摘要 近年来,得益于深度学习和大规模标注数据,槽填充取得了长足发展。然而,如何处理训练中从未见过样本的新领域仍是一项关键挑战:严重的领域偏移可能导致识别性能大幅下降。以往的大多数工作基于度量学习,以两阶段流水线的方式处理该问题。在实践中,这类主流流水线模型由于推理无法并行以及使用与上下文无关的离散标签嵌入,在计算效率和泛化能力上可能受限。为此,我们重新审视了典型的基于度量的方法,并为具有挑战性的零样本槽填充提出了一种新的自适应端到端度量学习方案。考虑到简洁性、效率和泛化能力,我们提出了一个级联式联合学习框架,结合上下文感知的软标签表示和槽级对比表示学习,以有效缓解数据与标签偏移问题。在公开基准上的大量实验表明,所提方法优于一系列具有竞争力的基线。

On the Dimensionality of Sentence Embeddings

  • paper_url: http://arxiv.org/abs/2310.15285
  • repo_url: https://github.com/WM-SEMERU/SecureReqNet
  • paper_authors: Hongwei Wang, Hongming Zhang, Dong Yu
  • for: 这个论文主要目的是为了研究句子嵌入的维度。
  • methods: 该论文使用了一种两步训练方法,首先将编码器和池化器分别优化,以减少句子嵌入维度下的性能损失。
  • results: 实验结果表明,该方法可以在七种STS任务和七种句子分类任务中显著提高低维度句子嵌入的性能。
    Abstract Learning sentence embeddings is a fundamental problem in natural language processing. While existing research primarily focuses on enhancing the quality of sentence embeddings, the exploration of sentence embedding dimensions is limited. Here we present a comprehensive and empirical analysis of the dimensionality of sentence embeddings. First, we demonstrate that the optimal dimension of sentence embeddings is usually smaller than the default value. Subsequently, to compress the dimension of sentence embeddings with minimum performance degradation, we identify two components contributing to the overall performance loss: the encoder's performance loss and the pooler's performance loss. Therefore, we propose a two-step training method for sentence representation learning models, wherein the encoder and the pooler are optimized separately to mitigate the overall performance loss in low-dimension scenarios. Experimental results on seven STS tasks and seven sentence classification tasks demonstrate that our method significantly improves the performance of low-dimensional sentence embeddings.
    摘要 学习句子嵌入是自然语言处理中的一个基本问题。现有研究主要集中在提升句子嵌入的质量上,而对句子嵌入维度的探索仍然有限。在这里,我们对句子嵌入的维度进行了全面的实证分析。首先,我们证明句子嵌入的最优维度通常小于默认值。随后,为了在压缩句子嵌入维度的同时尽量减少性能损失,我们识别出造成总体性能下降的两个组成部分:编码器的性能损失和池化器的性能损失。因此,我们为句子表示学习模型提出了一种两步训练方法,分别优化编码器和池化器,以缓解低维度场景下的总体性能损失。在七个STS任务和七个句子分类任务上的实验结果表明,我们的方法显著提升了低维度句子嵌入的性能。

Efficient Algorithms for Recognizing Weighted Tree-Adjoining Languages

  • paper_url: http://arxiv.org/abs/2310.15276
  • repo_url: None
  • paper_authors: Alexandra Butoi, Tim Vieira, Ryan Cotterell, David Chiang
  • for: 本研究针对树邻接语言这一语言类,利用由上下文无关文法(CFG)或下推自动机(PDA)控制另一个CFG或PDA构成的多种两级形式体系来刻画树邻接语言。
  • methods: 本研究使用上述两级形式体系的半环加权版本,并设计了新的算法来计算字符串总和(某一字符串所有推导的权重之和)和全总和(所有推导的权重之和)。
  • results: 对于线性索引文法(LIG),我们的算法在时间效率上比 Vijay-Shanker 和 Weir(1989)的算法提高 $\mathcal{O}(n|\mathcal{N}|)$ 倍;对于嵌入式下推自动机(EPDA),我们的算法在空间和时间效率上分别比 Alonso et al.(2001)的算法提高 $\mathcal{O}(|\Gamma|^2)$ 倍和 $\mathcal{O}(|\Gamma|^3)$ 倍。此外,本研究还首次给出了 PAA 的字符串总和与全总和算法。
    Abstract The class of tree-adjoining languages can be characterized by various two-level formalisms, consisting of a context-free grammar (CFG) or pushdown automaton (PDA) controlling another CFG or PDA. These four formalisms are equivalent to tree-adjoining grammars (TAG), linear indexed grammars (LIG), pushdown-adjoining automata (PAA), and embedded pushdown automata (EPDA). We define semiring-weighted versions of the above two-level formalisms, and we design new algorithms for computing their stringsums (the weight of all derivations of a string) and allsums (the weight of all derivations). From these, we also immediately obtain stringsum and allsum algorithms for TAG, LIG, PAA, and EPDA. For LIG, our algorithm is more time-efficient by a factor of $\mathcal{O}(n|\mathcal{N}|)$ (where $n$ is the string length and $|\mathcal{N}|$ is the size of the nonterminal set) and more space-efficient by a factor of $\mathcal{O}(|\Gamma|)$ (where $|\Gamma|$ is the size of the stack alphabet) than the algorithm of Vijay-Shanker and Weir (1989). For EPDA, our algorithm is both more space-efficient and time-efficient than the algorithm of Alonso et al. (2001) by factors of $\mathcal{O}(|\Gamma|^2)$ and $\mathcal{O}(|\Gamma|^3)$, respectively. Finally, we give the first PAA stringsum and allsum algorithms.
    摘要 树邻接语言这一语言类可以用多种两级形式体系来刻画,即由一个上下文无关文法(CFG)或下推自动机(PDA)控制另一个CFG或PDA。这四种形式体系分别等价于树邻接文法(TAG)、线性索引文法(LIG)、下推邻接自动机(PAA)和嵌入式下推自动机(EPDA)。我们定义了上述两级形式体系的半环加权版本,并设计了新的算法来计算它们的字符串总和(某一字符串所有推导的权重)与全总和(所有推导的权重);由此也立即得到了TAG、LIG、PAA和EPDA的字符串总和与全总和算法。对于LIG,我们的算法在时间效率上比 Vijay-Shanker 和 Weir(1989)的算法高 $\mathcal{O}(n|\mathcal{N}|)$ 倍(其中 $n$ 为字符串长度,$|\mathcal{N}|$ 为非终结符集的大小),在空间效率上高 $\mathcal{O}(|\Gamma|)$ 倍(其中 $|\Gamma|$ 为栈字母表的大小)。对于EPDA,我们的算法在空间和时间效率上分别比 Alonso et al.(2001)的算法高 $\mathcal{O}(|\Gamma|^2)$ 倍和 $\mathcal{O}(|\Gamma|^3)$ 倍。最后,我们给出了首批PAA字符串总和与全总和算法。
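
    To illustrate what a semiring-weighted stringsum is in the simplest case, the sketch below computes the total weight of all derivations of a string under a weighted CFG in Chomsky normal form, using a CKY-style dynamic program over the real (+, ×) semiring. This covers only the plain CFG case, not the paper's LIG/EPDA/PAA algorithms.

```python
from collections import defaultdict
from itertools import product

def cfg_stringsum(binary_rules, lexical_rules, start, sentence):
    """Total weight of all derivations of `sentence` under a weighted CFG in CNF.

    binary_rules: {(A, B, C): weight} for A -> B C
    lexical_rules: {(A, word): weight} for A -> word
    """
    n = len(sentence)
    chart = defaultdict(float)          # chart[(i, j, A)] = sum of derivation weights
    for i, w in enumerate(sentence):
        for (A, word), wt in lexical_rules.items():
            if word == w:
                chart[(i, i + 1, A)] += wt
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k, ((A, B, C), wt) in product(range(i + 1, j), binary_rules.items()):
                chart[(i, j, A)] += wt * chart[(i, k, B)] * chart[(k, j, C)]
    return chart[(0, n, start)]

rules = {("S", "A", "B"): 1.0}
lex = {("A", "a"): 0.5, ("B", "b"): 0.4}
print(cfg_stringsum(rules, lex, "S", ["a", "b"]))  # 1.0 * 0.5 * 0.4 = 0.2
```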

GradSim: Gradient-Based Language Grouping for Effective Multilingual Training

  • paper_url: http://arxiv.org/abs/2310.15269
  • repo_url: https://github.com/boschresearch/gradsim
  • paper_authors: Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schütze
  • for: 本研究旨在提高低资源语言处理模型的性能,通过多语言训练共享知识。
  • methods: 本文提出了一种基于梯度相似性的语言分组方法,称为GradSim。
  • results: 对三个多语言benchmark数据集进行了实验,并得到了与其他相似度度量相比较大的性能提升,以及与跨语言模型性能更高的相关性。此外,我们还发现了数据集主题的重要性,以及低层转换器模型中的语言特征和高层模型中的任务特征之间的关系。
    Abstract Most languages of the world pose low-resource challenges to natural language processing models. With multilingual training, knowledge can be shared among languages. However, not all languages positively influence each other and it is an open research question how to select the most suitable set of languages for multilingual training and avoid negative interference among languages whose characteristics or data distributions are not compatible. In this paper, we propose GradSim, a language grouping method based on gradient similarity. Our experiments on three diverse multilingual benchmark datasets show that it leads to the largest performance gains compared to other similarity measures and it is better correlated with cross-lingual model performance. As a result, we set the new state of the art on AfriSenti, a benchmark dataset for sentiment analysis on low-resource African languages. In our extensive analysis, we further reveal that besides linguistic features, the topics of the datasets play an important role for language grouping and that lower layers of transformer models encode language-specific features while higher layers capture task-specific information.
    摘要 世界上大多数语言都给自然语言处理模型带来低资源方面的挑战。通过多语言训练,知识可以在语言之间共享。然而,并非所有语言都能相互产生积极影响;如何为多语言训练选择最合适的语言组合、避免特性或数据分布不兼容的语言之间的负面干扰,仍是一个开放的研究问题。在这篇论文中,我们提出了GradSim,一种基于梯度相似性的语言分组方法。我们在三个多样化的多语言基准数据集上的实验表明,与其他相似度度量相比,它带来了最大的性能提升,并且与跨语言模型性能的相关性更高。因此,我们在AfriSenti(一个面向低资源非洲语言情感分析的基准数据集)上创造了新的最佳成绩。在广泛的分析中,我们进一步发现,除语言学特征外,数据集的主题对语言分组也起着重要作用;此外,Transformer模型的低层编码了语言特有的特征,而高层则捕捉任务相关的信息。
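
    A minimal sketch of the gradient-similarity idea, assuming one flattened gradient vector per language has already been collected (how those gradients are obtained, and how groups are then formed, follows the paper and is not reproduced here): compute pairwise cosine similarities and group the most similar languages for joint training.

```python
import numpy as np

def gradient_similarity_matrix(lang_grads):
    """Pairwise cosine similarity between per-language gradient vectors.

    lang_grads: {language: 1-D numpy array of flattened model gradients}.
    Languages with high mutual similarity are candidates for joint training.
    """
    langs = sorted(lang_grads)
    G = np.stack([lang_grads[l] / np.linalg.norm(lang_grads[l]) for l in langs])
    return langs, G @ G.T

rng = np.random.default_rng(0)
base = rng.normal(size=1000)
grads = {"hausa": base + 0.1 * rng.normal(size=1000),     # toy gradients
         "yoruba": base + 0.1 * rng.normal(size=1000),
         "english": rng.normal(size=1000)}
langs, sim = gradient_similarity_matrix(grads)
print(langs)
print(np.round(sim, 2))   # the hausa/yoruba pair should be the most similar
```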

Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study

  • paper_url: http://arxiv.org/abs/2310.15262
  • repo_url: None
  • paper_authors: Injy Hamed, Nizar Habash, Ngoc Thang Vu
  • for: 本研究旨在比较三种扩充方法的效果,以提高Code-switching(CSW)文本生成的质量。
  • methods: 本研究使用了三种扩充方法:lexical replacements、linguistic theories和back-translation(BT),并在egyptian arabic-english CSW上进行了评估。
  • results: 研究结果显示,基于CSW平行数据训练的BT和CSW预测式词汇替换方法在机器翻译和增强质量两方面表现最佳;而在缺乏CSW平行数据时,基于语言学理论和随机词汇替换的方法同样有效,且二者效果相近。
    Abstract Code-switching (CSW) text generation has been receiving increasing attention as a solution to address data scarcity. In light of this growing interest, we need more comprehensive studies comparing different augmentation approaches. In this work, we compare three popular approaches: lexical replacements, linguistic theories, and back-translation (BT), in the context of Egyptian Arabic-English CSW. We assess the effectiveness of the approaches on machine translation and the quality of augmentations through human evaluation. We show that BT and CSW predictive-based lexical replacement, being trained on CSW parallel data, perform best on both tasks. Linguistic theories and random lexical replacement prove to be effective in the lack of CSW parallel data, where both approaches achieve similar results.
    摘要 语码转换(CSW)文本生成作为缓解数据稀缺问题的一种方案,正受到越来越多的关注。鉴于这一日益增长的兴趣,我们需要对不同的数据增强方法进行更全面的比较研究。在这项工作中,我们以埃及阿拉伯语-英语语码转换为背景,比较了三种流行的方法:词汇替换、语言学理论和回译(BT)。我们通过机器翻译效果和人工评估的增强质量来评估这些方法。结果显示,基于CSW平行数据训练的BT和CSW预测式词汇替换方法在两项任务上均表现最佳;在缺乏CSW平行数据的情况下,语言学理论和随机词汇替换同样有效,两者取得了相近的结果。

Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention

  • paper_url: http://arxiv.org/abs/2310.15258
  • repo_url: https://github.com/negar-foroutan/multilingual-code-switched-reasoning
  • paper_authors: Negar Foroutan, Mohammadreza Banaei, Karl Aberer, Antoine Bosselut
  • for: 本研究探讨了多语言语言模型(MultiLMs)是否可以在不同语言下传递逻辑理解能力。
  • methods: 我们使用多语言语言模型,在不同语言上进行逻辑推理测试。我们采用两种测试方案:一是上下文语言与问题语言保持一致,但换用新的语言进行测试(即推理仍是单语言的,但模型必须跨语言迁移已学到的推理能力);二是上下文语言与问题语言不同(我们称之为语码混合推理)。
  • results: 我们在两个逻辑推理数据集上进行了测试,发现MultiLM可以在单语言设定下跨语言迁移推理能力,但在语码混合设定下则难以迁移。基于这一观察,我们提出了一种新的注意力机制,使用专门的参数集来促进语码混合序列中的跨语言注意力,使推理性能在RuleTaker和LeapOfThought上分别最多提升14%和4%。
    Abstract In this work, we study whether multilingual language models (MultiLMs) can transfer logical reasoning abilities to other languages when they are fine-tuned for reasoning in a different language. We evaluate the cross-lingual reasoning abilities of MultiLMs in two schemes: (1) where the language of the context and the question remain the same in the new languages that are tested (i.e., the reasoning is still monolingual, but the model must transfer the learned reasoning ability across languages), and (2) where the language of the context and the question is different (which we term code-switched reasoning). On two logical reasoning datasets, RuleTaker and LeapOfThought, we demonstrate that although MultiLMs can transfer reasoning ability across languages in a monolingual setting, they struggle to transfer reasoning abilities in a code-switched setting. Following this observation, we propose a novel attention mechanism that uses a dedicated set of parameters to encourage cross-lingual attention in code-switched sequences, which improves the reasoning performance by up to 14% and 4% on the RuleTaker and LeapOfThought datasets, respectively.
    摘要 在这项研究中,我们考察多语言语言模型(MultiLM)在针对另一种语言的推理任务上微调后,能否将逻辑推理能力迁移到其他语言。我们在两种方案下评估多语言模型的跨语言推理能力:(1)在被测试的新语言中,上下文与问题使用同一种语言(即推理仍是单语言的,但模型必须跨语言迁移已学到的推理能力);(2)上下文与问题使用不同语言(我们称之为语码混合推理)。在RuleTaker和LeapOfThought这两个逻辑推理数据集上,我们发现,尽管MultiLM可以在单语言设定下跨语言迁移推理能力,但在语码混合设定下则难以做到。基于这一观察,我们提出了一种新的注意力机制,使用专门的参数集来促进语码混合序列中的跨语言注意力,使推理性能在RuleTaker和LeapOfThought数据集上分别最多提升14%和4%。

Large Language Models are Visual Reasoning Coordinators

  • paper_url: http://arxiv.org/abs/2310.15166
  • repo_url: https://github.com/cliangyu/cola
  • paper_authors: Liangyu Chen, Bo Li, Sheng Shen, Jingkang Yang, Chunyuan Li, Kurt Keutzer, Trevor Darrell, Ziwei Liu
  • for: 这篇论文旨在提出一种新的视觉推理模型协调方法,使多种视觉-语言模型(VLM)能够协同工作,提升视觉推理能力。
  • methods: 这篇论文提出了一种新的协调方法,即通过自然语言交流来协调多种VLM各自专长且互补的能力。具体来说,论文使用一个大型语言模型(LLM)来协调多种VLM。
  • results: 实验表明,该方法的指令微调变体在视觉问答(VQA)、外部知识VQA、视觉蕴含和视觉空间推理等任务上达到了最先进的性能。此外,论文还表明,其上下文学习变体在零样本和少样本设置下无需微调即可取得具有竞争力的表现。
    Abstract Visual reasoning requires multimodal perception and commonsense cognition of the world. Recently, multiple vision-language models (VLMs) have been proposed with excellent commonsense reasoning ability in various domains. However, how to harness the collective power of these complementary VLMs is rarely explored. Existing methods like ensemble still struggle to aggregate these models with the desired higher-order communications. In this work, we propose Cola, a novel paradigm that coordinates multiple VLMs for visual reasoning. Our key insight is that a large language model (LLM) can efficiently coordinate multiple VLMs by facilitating natural language communication that leverages their distinct and complementary capabilities. Extensive experiments demonstrate that our instruction tuning variant, Cola-FT, achieves state-of-the-art performance on visual question answering (VQA), outside knowledge VQA, visual entailment, and visual spatial reasoning tasks. Moreover, we show that our in-context learning variant, Cola-Zero, exhibits competitive performance in zero and few-shot settings, without finetuning. Through systematic ablation studies and visualizations, we validate that a coordinator LLM indeed comprehends the instruction prompts as well as the separate functionalities of VLMs; it then coordinates them to enable impressive visual reasoning capabilities.
    摘要 视觉推理需要对世界的多模态感知和常识认知。近来,多种视觉-语言模型(VLM)被提出,它们在不同领域展现出优秀的常识推理能力。然而,如何发挥这些互补VLM的集体力量,目前还很少被探索。诸如模型集成之类的现有方法,仍难以以所需的高阶通信方式聚合这些模型。在这项工作中,我们提出了Cola,一种协调多个VLM进行视觉推理的新范式。我们的关键洞见是:大型语言模型(LLM)可以通过促进自然语言交流、利用各VLM不同且互补的能力,来高效地协调多个VLM。大量实验表明,我们的指令微调变体Cola-FT在视觉问答(VQA)、外部知识VQA、视觉蕴含和视觉空间推理任务上取得了最先进的性能。此外,我们还证明,我们的上下文学习变体Cola-Zero在零样本和少样本设定下无需微调即可取得具有竞争力的表现。通过系统的消融研究和可视化分析,我们验证了作为协调者的LLM确实理解了指令提示以及各VLM各自的功能,并据此协调它们,从而实现出色的视觉推理能力。

Function Vectors in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15213
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau
  • for: 这篇论文旨在描述自回归Transformer语言模型中的一种简单神经机制,该机制将输入-输出函数表示为一个向量。
  • methods: 该论文使用因果中介分析,研究这种机制在各种上下文学习(ICL)任务中的作用。
  • results: 研究发现,这种机制在不同任务、模型和层上都表现出强烈的因果效应,并且在零样本和自然文本等与ICL情境不同的设置中同样有效。此外,研究还发现,函数向量在一定程度上支持语义向量组合:将它们相加可以得到触发新的复杂任务的向量。
    Abstract We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV). FVs are robust to changes in context, i.e., they trigger execution of the task on inputs such as zero-shot and natural text settings that do not resemble the ICL contexts from which they are collected. We test FVs across a range of tasks, models, and layers and find strong causal effects across settings in middle layers. We investigate the internal structure of FVs and find that while they often contain information that encodes the output space of the function, this information alone is not sufficient to reconstruct an FV. Finally, we test semantic vector composition in FVs, and find that to some extent they can be summed to create vectors that trigger new complex tasks. Taken together, our findings suggest that LLMs contain internal abstractions of general-purpose functions that can be invoked in a variety of contexts.
    摘要 (我们报告了一种简单的神经机制:在自回归 Transformer 语言模型(LM)中,输入-输出函数被表示为一个向量。我们在多种上下文学习(ICL)任务上进行因果中介分析,发现少量注意力头会传递所演示任务的紧凑表示,我们称之为函数向量(FV)。FV 对上下文变化具有鲁棒性:即使在零样本和自然文本等与收集 FV 时的 ICL 上下文并不相似的输入上,它们也能触发任务的执行。我们在一系列任务、模型和层上测试 FV,发现其在中间层具有较强的因果效应。我们还研究了 FV 的内部结构,发现它们通常包含编码函数输出空间的信息,但仅凭这些信息不足以重建 FV。最后,我们测试了 FV 的语义向量组合,发现它们在一定程度上可以相加,从而得到能触发新的复合任务的向量。总的来说,我们的发现表明 LLM 内部包含可在多种上下文中调用的通用函数抽象。)
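The abstract describes extracting a compact task representation from a small set of attention heads and re-injecting it at a middle layer. The toy sketch below illustrates that arithmetic on synthetic activations; the chosen heads, array shapes, and the injection helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prompts, n_heads, d_model = 20, 12, 64      # toy sizes, not the paper's
causal_heads = [2, 5, 7]                      # assumed: heads picked by causal mediation analysis

# head_outputs[p, h] = output of attention head h at the final token of ICL prompt p
head_outputs = rng.normal(size=(n_prompts, n_heads, d_model))

# Function vector: average (over ICL prompts) of the summed outputs of the selected heads.
fv = head_outputs[:, causal_heads, :].sum(axis=1).mean(axis=0)   # (d_model,)

def inject_fv(hidden_state: np.ndarray, fv: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Add the function vector to a hidden state at a chosen (middle) layer.
    In a real model this would be done via a forward hook on that layer."""
    return hidden_state + scale * fv

# Zero-shot use: patch the hidden state of the query token, then continue the forward pass.
zero_shot_hidden = rng.normal(size=(d_model,))
patched = inject_fv(zero_shot_hidden, fv)
print(patched.shape)   # (64,)
```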

S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15147
  • repo_url: https://github.com/lfy79001/sqleval
  • paper_authors: Fangyu Lei, Qian Liu, Yiming Huang, Shizhu He, Jun Zhao, Kang Liu
  • for: 评估大语言模型(LLM)的能力,特别是理解长文本Context。
  • methods: 使用复杂的synthetic任务作为评估方法,并提出了S3Eval评估集。
  • results: S3Eval的性能与实际 benchmark like Big-Bench Hard(BBH)之间存在强相关关系,并且通过深入分析,探索了模型的性能特点。
    Abstract The rapid development of Large Language Models (LLMs) has led to great strides in model capabilities like reasoning and long-context understanding. However, as LLMs are able to process longer contexts, it becomes more challenging to evaluate whether they have acquired certain capabilities, since the length of text (e.g., 100K tokens) they can process far exceeds what humans can reliably assess in a reasonable duration. In this paper, we propose using complex synthetic tasks as a proxy evaluation method, and present S3Eval, a Synthetic, Scalable, Systematic evaluation suite for LLMs evaluation. As a synthetic benchmark, S3Eval enables the creation of any number of evaluation examples that are theoretically invisible to LLMs, mitigating the test set contamination issue. The synthetic nature of S3Eval provides users full control over the dataset, allowing them to systematically probe LLM capabilities by scaling text length and varying task difficulty across diverse scenarios. The strong correlation between S3Eval performance and scores of real-world benchmarks like Big-Bench Hard (BBH) demonstrates the soundness of using S3Eval for evaluation of LLMs. The in-depth analysis also uncover additional insights, including performance drop when the answer is sparsely distributed or located in the middle context, as well as some counter-intuitive trends of model performance.
    摘要 大型语言模型(LLM)的快速发展带来了推理和长上下文理解等能力的大幅提升。然而,随着 LLM 能处理的上下文越来越长,评估它们是否真正获得了某些能力也变得更加困难,因为它们能处理的文本长度(例如 10 万个 token)远远超出了人类在合理时间内可以可靠评估的范围。在这篇论文中,我们提出使用复杂的合成任务作为代理评估方法,并提出了 S3Eval,一个合成的、可扩展的、系统化的 LLM 评估套件。作为合成基准,S3Eval 可以生成任意数量的、理论上 LLM 从未见过的评估样例,从而缓解测试集污染问题。其合成特性使用户可以完全控制数据集,通过调整文本长度和任务难度,在多种场景下系统地探查 LLM 的能力。S3Eval 的表现与 Big-Bench Hard(BBH)等真实世界基准的分数之间存在很强的相关性,这说明用 S3Eval 评估 LLM 是可靠的。深入分析还揭示了额外的发现,包括当答案分布稀疏或位于上下文中部时性能下降,以及一些反直觉的模型性能趋势。

SpecTr: Fast Speculative Decoding via Optimal Transport

  • paper_url: http://arxiv.org/abs/2310.15141
  • repo_url: None
  • paper_authors: Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix Yu
  • for: The paper aims to provide a principled understanding of speculative decoding through the lens of optimal transport, and to develop a new autoregressive sampling algorithm, SpecTr, that speeds up decoding while preserving output quality.
  • methods: The paper formulates speculative decoding as optimal transport with membership cost, which generalizes draft selection to k token-level candidates. The optimal transport plan can be computed via linear programming but with a best-known runtime exponential in k, so the paper proposes an approximate draft selection algorithm that is (1-1/e)-optimal multiplicatively and runs in time almost linear in the size of a single token's domain.
  • results: The proposed SpecTr algorithm achieves a wall clock speedup of 2.13X, a further 1.37X over standard speculative decoding, on standard benchmarks with no quality degradation in the decoded output.
    Abstract Autoregressive sampling from large language models has led to state-of-the-art results in several natural language tasks. However, autoregressive sampling generates tokens one at a time making it slow, and even prohibitive in certain tasks. One way to speed up sampling is $\textit{speculative decoding}$: use a small model to sample a $\textit{draft}$ (block or sequence of tokens), and then score all tokens in the draft by the large language model in parallel. A subset of the tokens in the draft are accepted (and the rest rejected) based on a statistical method to guarantee that the final output follows the distribution of the large model. In this work, we provide a principled understanding of speculative decoding through the lens of optimal transport (OT) with $\textit{membership cost}$. This framework can be viewed as an extension of the well-known $\textit{maximal-coupling}$ problem. This new formulation enables us to generalize the speculative decoding method to allow for a set of $k$ candidates at the token-level, which leads to an improved optimal membership cost. We show that the optimal draft selection algorithm (transport plan) can be computed via linear programming, whose best-known runtime is exponential in $k$. We then propose a valid draft selection algorithm whose acceptance probability is $(1-1/e)$-optimal multiplicatively. Moreover, it can be computed in time almost linear with size of domain of a single token. Using this $new draft selection$ algorithm, we develop a new autoregressive sampling algorithm called $\textit{SpecTr}$, which provides speedup in decoding while ensuring that there is no quality degradation in the decoded output. We experimentally demonstrate that for state-of-the-art large language models, the proposed approach achieves a wall clock speedup of 2.13X, a further 1.37X speedup over speculative decoding on standard benchmarks.
    摘要 自回归采样使大型语言模型在多项自然语言任务上取得了最先进的结果。然而,自回归采样每次只生成一个 token,速度较慢,在某些任务中甚至难以接受。一种加速采样的方法是"推测解码"(speculative decoding):先用一个小模型采样出一段"草稿"(一个 token 块或序列),再由大模型并行地对草稿中的所有 token 打分,并依据一种统计方法接受其中一部分 token(其余被拒绝),从而保证最终输出仍服从大模型的分布。在这项工作中,我们通过带有"成员成本"的最优传输(OT)视角,为推测解码提供了一个有原则的理解。这一框架可以看作著名的最大耦合问题的推广。这种新的形式化使我们能够把推测解码推广到在 token 级别允许 k 个候选,从而获得更优的成员成本。我们证明最优的草稿选择算法(传输方案)可以通过线性规划求解,但其已知最佳运行时间随 k 呈指数增长。随后我们提出了一个有效的草稿选择算法,其接受概率在乘法意义上达到 (1-1/e) 近似最优,且计算时间几乎与单个 token 的取值域大小成线性关系。基于这一新的草稿选择算法,我们开发了名为 SpecTr 的新自回归采样算法,在保证解码输出质量不下降的同时提升解码速度。实验表明,对于最先进的大型语言模型,所提方法在标准基准上实现了 2.13 倍的实际时钟加速,比推测解码进一步加速 1.37 倍。
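The abstract builds on standard speculative decoding, where draft tokens are accepted or rejected so that the final sample still follows the large model's distribution. The sketch below shows that baseline accept/resample rule (the single-draft, maximal-coupling case that SpecTr generalizes to k candidates); the distributions and sizes are illustrative.

```python
import numpy as np

def speculative_step(p_large: np.ndarray, q_draft: np.ndarray, draft_token: int,
                     rng: np.random.Generator) -> int:
    """One token of standard speculative decoding (single draft, k = 1).

    p_large, q_draft: next-token distributions of the large and draft models.
    The returned token is an exact sample from p_large (the maximal-coupling
    construction that SpecTr recasts as optimal transport with membership cost).
    """
    accept_prob = min(1.0, p_large[draft_token] / q_draft[draft_token])
    if rng.random() < accept_prob:
        return draft_token
    # On rejection, resample from the residual distribution max(p - q, 0), renormalized.
    residual = np.maximum(p_large - q_draft, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(p_large), p=residual))

rng = np.random.default_rng(0)
vocab = 8
q = rng.dirichlet(np.ones(vocab))            # draft model distribution
p = rng.dirichlet(np.ones(vocab))            # large model distribution
draft = int(rng.choice(vocab, p=q))          # token proposed by the small model
print(speculative_step(p, q, draft, rng))
```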

Quantifying the Dialect Gap and its Correlates Across Languages

  • paper_url: http://arxiv.org/abs/2310.15135
  • repo_url: None
  • paper_authors: Anjali Kantharuban, Ivan Vulić, Anna Korhonen
  • for: 本研究旨在评估现有最佳大语言模型(LLM)在不同地区方言的应用中的性能,以及 dialect gap 与经济、社会和语言因素的相关性。
  • methods: 本研究使用了两个高度使用应用程序:自动翻译和语音识别。研究还分析了不同语言和地区方言之间的关系,以及数据集的构建方式和大小对模型性能的影响。
  • results: 研究发现,不同语言和地区方言之间存在显著的 dialect gap,并且这种差距与经济、社会和语言因素有相关性。此外,研究还发现了不同模型和语言之间的数据集大小和构建方式对模型性能的影响。
    Abstract Historically, researchers and consumers have noticed a decrease in quality when applying NLP tools to minority variants of languages (i.e. Puerto Rican Spanish or Swiss German), but studies exploring this have been limited to a select few languages. Additionally, past studies have mainly been conducted in a monolingual context, so cross-linguistic trends have not been identified and tied to external factors. In this work, we conduct a comprehensive evaluation of the most influential, state-of-the-art large language models (LLMs) across two high-use applications, machine translation and automatic speech recognition, to assess their functionality on the regional dialects of several high- and low-resource languages. Additionally, we analyze how the regional dialect gap is correlated with economic, social, and linguistic factors. The impact of training data, including related factors like dataset size and its construction procedure, is shown to be significant but not consistent across models or languages, meaning a one-size-fits-all approach cannot be taken in solving the dialect gap. This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
    摘要 历史上,研究人员和消费者们已经注意到使用自然语言处理工具处理少数语言变体时(如波多黎各西班牙语或瑞士德语)的质量下降,但学术研究这一点尚未得到全面的探讨和证明。在本研究中,我们进行了大量语言模型(LLM)的全面评估,在两个高用应用中(机器翻译和自动声音识别),以评估这些模型在各地方言方面的功能。此外,我们还分析了各语言方言之间的联系,并考虑了经济、社会和语言因素的关系。我们发现,训练数据的影响是显著的,但不是一致的, meaning一个“一般化”的方法无法解决方言差距。本研究将为 dialectal NLP 领域奠基,揭示了不同语言方言之间的明显差异,并标识了可能的解决方案。

Location-Aware Visual Question Generation with Lightweight Models

  • paper_url: http://arxiv.org/abs/2310.15129
  • repo_url: None
  • paper_authors: Nicholas Collin Suwono, Justin Chih-Yao Chen, Tun Min Hung, Ting-Hao Kenneth Huang, I-Bin Liao, Yung-Hui Li, Lun-Wei Ku, Shao-Hua Sun
  • for: 本研究旨在生成与特定地理位置相关的有趣问题(LocaVQG),以提高对地理位置相关信息的理解和利用。
  • methods: 我们提出了一种数据生成管道,使用GPT-4生成多样化和复杂的问题,以及一种轻量级模型,能够在边缘设备(如手机)上适应LocaVQG任务。
  • results: 我们的提议方法在人工评估中(如参与度、基础性、 coherence)和自动评估指标(如 BERTScore、 ROUGE-2)中表现出色,并进行了广泛的ablation研究以证明我们的方法的有效性。
    Abstract This work introduces a novel task, location-aware visual question generation (LocaVQG), which aims to generate engaging questions from data relevant to a particular geographical location. Specifically, we represent such location-aware information with surrounding images and a GPS coordinate. To tackle this task, we present a dataset generation pipeline that leverages GPT-4 to produce diverse and sophisticated questions. Then, we aim to learn a lightweight model that can address the LocaVQG task and fit on an edge device, such as a mobile phone. To this end, we propose a method which can reliably generate engaging questions from location-aware information. Our proposed method outperforms baselines regarding human evaluation (e.g., engagement, grounding, coherence) and automatic evaluation metrics (e.g., BERTScore, ROUGE-2). Moreover, we conduct extensive ablation studies to justify our proposed techniques for both generating the dataset and solving the task.
    摘要 这项工作引入了一个新任务:位置感知视觉问题生成(LocaVQG),旨在从与特定地理位置相关的数据中生成有吸引力的问题。具体来说,我们用周围的图像和 GPS 坐标来表示这类位置感知信息。为了解决这一任务,我们提出了一个数据生成管线,利用 GPT-4 生成多样且复杂的问题。随后,我们的目标是学习一个能够处理 LocaVQG 任务并可部署在边缘设备(如手机)上的轻量级模型。为此,我们提出了一种能够从位置感知信息中可靠地生成有吸引力问题的方法。在人工评估(如吸引度、契合度、连贯性)和自动评估指标(如 BERTScore、ROUGE-2)上,我们的方法都优于基线方法。此外,我们还进行了大量消融研究,以验证我们在构建数据集和解决任务两方面所提出技术的有效性。

How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation

  • paper_url: http://arxiv.org/abs/2310.15114
  • repo_url: https://github.com/hlt-mt/fbk-fairseq
  • paper_authors: Marco Gaido, Dennis Fucci, Matteo Negri, Luisa Bentivogli
  • for: 这篇论文的目的是提高语音翻译(ST)模型中的性别偏见问题。
  • methods: 该论文使用一种“多性别”神经网络模型,将说话人的性别信息作为外部metadata integrate into ST模型中,以提高模型对 feminine 形式的表达准确性。
  • results: 该研究表明,使用“多性别”模型可以提高语音翻译模型对 feminine 形式的表达准确性,并且在很多情况下可以超过 gender-specific 模型的表现。
    Abstract When translating from notional gender languages (e.g., English) into grammatical gender languages (e.g., Italian), the generated translation requires explicit gender assignments for various words, including those referring to the speaker. When the source sentence does not convey the speaker's gender, speech translation (ST) models either rely on the possibly-misleading vocal traits of the speaker or default to the masculine gender, the most frequent in existing training corpora. To avoid such biased and not inclusive behaviors, the gender assignment of speaker-related expressions should be guided by externally-provided metadata about the speaker's gender. While previous work has shown that the most effective solution is represented by separate, dedicated gender-specific models, the goal of this paper is to achieve the same results by integrating the speaker's gender metadata into a single "multi-gender" neural ST model, easier to maintain. Our experiments demonstrate that a single multi-gender model outperforms gender-specialized ones when trained from scratch (with gender accuracy gains up to 12.9 for feminine forms), while fine-tuning from existing ST models does not lead to competitive results.
    摘要 当从概念性性别语言(如英语)翻译到语法性性别语言(如意大利语)时,生成的译文需要为多个词(包括指代说话人的词)显式指定性别。当源句没有传达说话人的性别时,语音翻译(ST)模型要么依赖可能具有误导性的说话人声音特征,要么默认使用现有训练语料中最常见的阳性形式。为了避免这种有偏且不包容的行为,与说话人相关表达的性别指定应由外部提供的说话人性别元数据来引导。以往的研究表明,最有效的方案是使用相互独立、面向特定性别的专用模型;而本文的目标是把说话人性别元数据整合进单一的"多性别"神经 ST 模型中,在取得同样效果的同时更易于维护。实验表明,从零开始训练时,单一的多性别模型优于面向特定性别的模型(阴性形式的性别准确率最多提升 12.9);而在已有 ST 模型上微调则无法得到有竞争力的结果。

Counting the Bugs in ChatGPT’s Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model

  • paper_url: http://arxiv.org/abs/2310.15113
  • repo_url: https://github.com/dmort27/chatgpts-wugs
  • paper_authors: Leonie Weissweiler, Valentin Hofmann, Anjali Kantharuban, Anna Cai, Ritam Dutt, Amey Hengle, Anubha Kabra, Atharva Kulkarni, Abhishek Vijayakumar, Haofei Yu, Hinrich Schütze, Kemal Oflazer, David R. Mortensen
  • for: 这研究旨在检验最新一代大语言模型(ChatGPT)是否具备人类语言能力。
  • methods: 研究者采用了 Berko(1958)的 "wug 测试" 方法,使用四种类型学上差异较大的语言(英语、德语、泰米尔语和土耳其语)上全新且未被污染的数据集进行测试。
  • results: 研究发现,ChatGPT 的表现大幅落后于专门构建的系统,在英语上尤为明显。总体而言,这些结果从形态学的角度为 ChatGPT 的语言能力提供了新的认识,表明关于其具备类人语言能力的论断为时过早且具有误导性。
    Abstract Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (iii) investigate syntax or semantics and overlook other capabilities that lie at the heart of human language, like morphology. Here, we close these gaps by conducting the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages (specifically, English, German, Tamil, and Turkish). We apply a version of Berko's (1958) wug test to ChatGPT, using novel, uncontaminated datasets for the four examined languages. We find that ChatGPT massively underperforms purpose-built systems, particularly in English. Overall, our results -- through the lens of morphology -- cast a new light on the linguistic capabilities of ChatGPT, suggesting that claims of human-like language skills are premature and misleading.
    摘要 大型语言模型(LLM)最近已达到令人瞩目的语言能力水平,引发了与人类语言能力的比较。然而,针对最新一代 LLM 语言能力的系统性研究相对较少,而现有研究往往 (i) 忽视了人类卓越的泛化能力,(ii) 仅关注英语,(iii) 只考察句法或语义,而忽略了形态学等人类语言的核心能力。本文通过在四种类型学上差异较大的语言(英语、德语、泰米尔语和土耳其语)上对 ChatGPT 的形态能力进行首次严格分析来填补这些空白。我们使用为这四种语言新构建的、未被污染的数据集,对 ChatGPT 实施了 Berko(1958)wug 测试的一个变体。结果显示,ChatGPT 大幅落后于专门构建的系统,在英语上尤为明显。总体而言,这些结果从形态学的视角重新审视了 ChatGPT 的语言能力,表明关于其具备类人语言能力的论断为时过早且具有误导性。

GRENADE: Graph-Centric Language Model for Self-Supervised Representation Learning on Text-Attributed Graphs

  • paper_url: http://arxiv.org/abs/2310.15109
  • repo_url: https://github.com/bigheiniu/GRENADE
  • paper_authors: Yichuan Li, Kaize Ding, Kyumin Lee
  • for: 本研究旨在提出一种基于自我监督学习的文本嵌入表示学习方法,以便在不同下游任务中创建表示更加具有表达力和泛化能力。
  • methods: GRENADE使用了两种特циализирован的自我监督学习算法:图中心对照学习和图中心知识匹配。这两种算法有助于GRENADE捕捉文本 semantics 以及文本嵌入图中的结构上下文信息。
  • results: 对比其他状态之前的方法,GRENADE在多个实验中表现出优于状态之前的方法。GRENADE的实现可以在 \url{https://github.com/bigheiniu/GRENADE} 上找到。
    Abstract Self-supervised representation learning on text-attributed graphs, which aims to create expressive and generalizable representations for various downstream tasks, has received increasing research attention lately. However, existing methods either struggle to capture the full extent of structural context information or rely on task-specific training labels, which largely hampers their effectiveness and generalizability in practice. To solve the problem of self-supervised representation learning on text-attributed graphs, we develop a novel Graph-Centric Language model -- GRENADE. Specifically, GRENADE exploits the synergistic effect of both pre-trained language model and graph neural network by optimizing with two specialized self-supervised learning algorithms: graph-centric contrastive learning and graph-centric knowledge alignment. The proposed graph-centric self-supervised learning algorithms effectively help GRENADE to capture informative textual semantics as well as structural context information on text-attributed graphs. Through extensive experiments, GRENADE shows its superiority over state-of-the-art methods. Implementation is available at \url{https://github.com/bigheiniu/GRENADE}.
    摘要 在文本属性图上进行自监督表示学习,旨在为各种下游任务构建表达力强且可泛化的表示,近来受到越来越多的研究关注。然而,现有方法要么难以充分捕捉结构上下文信息,要么依赖任务特定的训练标签,这在实践中极大地限制了它们的有效性和泛化能力。为了解决文本属性图上的自监督表示学习问题,我们提出了一种新的以图为中心的语言模型 GRENADE。具体而言,GRENADE 通过两种专门设计的自监督学习算法(以图为中心的对比学习和以图为中心的知识对齐)进行优化,发挥预训练语言模型与图神经网络的协同效应。所提出的以图为中心的自监督学习算法能有效帮助 GRENADE 同时捕捉文本属性图上富含信息的文本语义和结构上下文信息。大量实验表明,GRENADE 优于当前最先进的方法。实现代码见 \url{https://github.com/bigheiniu/GRENADE}。
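The abstract pairs a pre-trained language model with a graph neural network through graph-centric contrastive learning. One minimal reading of that objective is an InfoNCE loss that pulls each node's text (LM) embedding toward its structural (GNN) embedding; the sketch below uses random features and a temperature purely as stand-ins.

```python
import numpy as np

def info_nce(text_emb: np.ndarray, graph_emb: np.ndarray, tau: float = 0.1) -> float:
    """One reading of a graph-centric contrastive objective: for node i, the positive
    pair is (text_emb[i], graph_emb[i]); all other nodes in the batch are negatives."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    logits = (t @ g.T) / tau                      # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # cross-entropy with identity targets

rng = np.random.default_rng(0)
n_nodes, dim = 32, 16
lm_embeddings = rng.normal(size=(n_nodes, dim))   # stand-in: LM over each node's text
gnn_embeddings = rng.normal(size=(n_nodes, dim))  # stand-in: GNN over graph structure
print(info_nce(lm_embeddings, gnn_embeddings))
```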

LLM-in-the-loop: Leveraging Large Language Model for Thematic Analysis

  • paper_url: http://arxiv.org/abs/2310.15100
  • repo_url: https://github.com/sjdai/llm-thematic-analysis
  • paper_authors: Shih-Chieh Dai, Aiping Xiong, Lun-Wei Ku
  • for: 这个研究的目的是提出一个人机协作框架,利用上下文学习(ICL)来进行主题分析(TA)。
  • methods: 这个框架让大型语言模型(LLM)与人类协作,将 LLM 作为 TA 的编码手册(codebook)生成者。
  • results: 实验结果显示,该框架可以提供与人类编码者相同的编码质量,但可以减少TA的劳动和时间需求。
    Abstract Thematic analysis (TA) has been widely used for analyzing qualitative data in many disciplines and fields. To ensure reliable analysis, the same piece of data is typically assigned to at least two human coders. Moreover, to produce meaningful and useful analysis, human coders develop and deepen their data interpretation and coding over multiple iterations, making TA labor-intensive and time-consuming. Recently the emerging field of large language models (LLMs) research has shown that LLMs have the potential replicate human-like behavior in various tasks: in particular, LLMs outperform crowd workers on text-annotation tasks, suggesting an opportunity to leverage LLMs on TA. We propose a human-LLM collaboration framework (i.e., LLM-in-the-loop) to conduct TA with in-context learning (ICL). This framework provides the prompt to frame discussions with a LLM (e.g., GPT-3.5) to generate the final codebook for TA. We demonstrate the utility of this framework using survey datasets on the aspects of the music listening experience and the usage of a password manager. Results of the two case studies show that the proposed framework yields similar coding quality to that of human coders but reduces TA's labor and time demands.
    摘要 主题分析(TA)被广泛用于许多学科和领域的定性数据分析。为了保证分析的可靠性,同一份数据通常至少分配给两名人工编码者;而且,为了得到有意义且有用的分析,人工编码者需要经过多轮迭代来发展和深化对数据的诠释与编码,这使 TA 既费力又耗时。近来的大型语言模型(LLM)研究表明,LLM 有潜力在多种任务中复现类人的行为:特别是在文本标注任务上,LLM 的表现优于众包工作者,这提示我们可以把 LLM 用于 TA。我们提出了一个人机协作框架(即 LLM-in-the-loop),利用上下文学习(ICL)来进行 TA。该框架通过提示与 LLM(例如 GPT-3.5)展开讨论,从而生成 TA 的最终编码手册。我们使用关于音乐聆听体验和密码管理器使用情况的问卷数据集展示了该框架的实用性。两个案例研究的结果表明,所提框架的编码质量与人工编码者相当,同时降低了 TA 的人力和时间成本。

Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization

  • paper_url: http://arxiv.org/abs/2310.15080
  • repo_url: https://github.com/llm-eff/fedpeptao
  • paper_authors: Tianshi Che, Ji Liu, Yang Zhou, Jiaxiang Ren, Jiwen Zhou, Victor S. Sheng, Huaiyu Dai, Dejing Dou
  • for: 这篇论文的目的是提出一种Parameter-efficient prompt Tuning方法,以实现大型自然语言模型(LLMs)的联合训练。
  • methods: 这篇论文使用一种带有自适应优化的参数高效提示调整方法(FedPepTAO),包括一种高效的部分提示调整方法,以及一种用于缓解客户端漂移问题的自适应优化方法。
  • results: 实验结果显示,FedPepTAO 的精度比基线方法最高高出 60.8%,训练时间效率最高提升 97.59%。
    Abstract Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data. However, the training process of Large Language Models (LLMs) generally incurs the update of significant parameters, which limits the applicability of FL techniques to tackle the LLMs in real scenarios. Prompt tuning can significantly reduce the number of parameters to update, but it either incurs performance degradation or low training efficiency. The straightforward utilization of prompt tuning in the FL often raises non-trivial communication costs and dramatically degrades performance. In addition, the decentralized data is generally non-Independent and Identically Distributed (non-IID), which brings client drift problems and thus poor performance. This paper proposes a Parameter-efficient prompt Tuning approach with Adaptive Optimization, i.e., FedPepTAO, to enable efficient and effective FL of LLMs. First, an efficient partial prompt tuning approach is proposed to improve performance and efficiency simultaneously. Second, a novel adaptive optimization method is developed to address the client drift problems on both the device and server sides to enhance performance further. Extensive experiments based on 10 datasets demonstrate the superb performance (up to 60.8\% in terms of accuracy) and efficiency (up to 97.59\% in terms of training time) of FedPepTAO compared with 9 baseline approaches. Our code is available at https://github.com/llm-eff/FedPepTAO.
    摘要 联邦学习(FL)是一种利用去中心化数据进行协作模型训练的有前景的范式。然而,大型语言模型(LLM)的训练过程通常需要更新大量参数,这限制了 FL 技术在真实场景中处理 LLM 的适用性。提示调整(prompt tuning)可以显著减少需要更新的参数数量,但它要么带来性能下降,要么训练效率较低;而在 FL 中直接使用提示调整往往会产生不小的通信开销,并显著降低性能。此外,去中心化数据通常是非独立同分布(non-IID)的,这会带来客户端漂移问题,进而导致性能变差。本文提出了一种带有自适应优化的参数高效提示调整方法 FedPepTAO,以实现高效且有效的 LLM 联邦学习。首先,我们提出了一种高效的部分提示调整方法,同时提升性能和效率;其次,我们设计了一种新的自适应优化方法,在设备端和服务器端同时应对客户端漂移问题,进一步提升性能。基于 10 个数据集的大量实验表明,与 9 种基线方法相比,FedPepTAO 具有出色的性能(精度最高提升 60.8%)和效率(训练时间最高节省 97.59%)。我们的代码见 https://github.com/llm-eff/FedPepTAO。
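The core efficiency idea in the abstract is that clients tune and communicate only a small set of (partial) prompt parameters rather than the full LLM. The sketch below shows that communication pattern as plain FedAvg over soft-prompt matrices; the adaptive optimization against client drift is omitted, and all shapes and the fake local update are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
prompt_len, d_model, n_clients = 10, 32, 4
global_prompt = rng.normal(size=(prompt_len, d_model)) * 0.01

def local_prompt_update(prompt: np.ndarray, client_id: int, lr: float = 0.1) -> np.ndarray:
    """Stand-in for a client's local training step: only the soft prompt is updated,
    the frozen LLM weights are never exchanged in this communication round."""
    fake_grad = np.random.default_rng(client_id).normal(size=prompt.shape)
    return prompt - lr * fake_grad

for communication_round in range(3):
    client_prompts = [local_prompt_update(global_prompt, cid) for cid in range(n_clients)]
    # Server aggregates only the tiny prompt tensors (here: unweighted FedAvg).
    global_prompt = np.mean(client_prompts, axis=0)

print(global_prompt.shape)   # (10, 32) -- the only parameters ever communicated
```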

Affective and Dynamic Beam Search for Story Generation

  • paper_url: http://arxiv.org/abs/2310.15079
  • repo_url: https://github.com/tenghaohuang/affgen
  • paper_authors: Tenghao Huang, Ehsan Qasemi, Bangzheng Li, He Wang, Faeze Brahman, Muhao Chen, Snigdha Chaturvedi
  • for: 这篇论文旨在提出一种能生成有趣的故事的模型,它可以用于娱乐、教育、治疗和认知学等领域。
  • methods: 该模型使用了两种新技术:动态束宽(dynamic beam sizing)和情感重排序(affective reranking)。动态束宽借助上下文多臂赌博机模型,鼓励模型选择更难预测、更引人入胜的词语;情感重排序则根据情感强度对候选句子进行排序。
  • results: 我们的实验表明,AffGen 在生成情感充沛且有趣的故事方面表现出色,优于现有的基线模型。消融研究和进一步分析也揭示了 AffGen 的优势与不足。
    Abstract Storytelling's captivating potential makes it a fascinating research area, with implications for entertainment, education, therapy, and cognitive studies. In this paper, we propose Affective Story Generator (AffGen) for generating interesting narratives. AffGen introduces "intriguing twists" in narratives by employing two novel techniques-Dynamic Beam Sizing and Affective Reranking. Dynamic Beam Sizing encourages less predictable, more captivating word choices using a contextual multi-arm bandit model. Affective Reranking prioritizes sentence candidates based on affect intensity. Our empirical evaluations, both automatic and human, demonstrate AffGen's superior performance over existing baselines in generating affectively charged and interesting narratives. Our ablation study and analysis provide insights into the strengths and weaknesses of AffGen.
    摘要 讲故事的吸引力使其成为一个引人入胜的研究领域,对娱乐、教育、治疗和认知研究都有意义。在这篇论文中,我们提出了情感故事生成器(AffGen),用于生成有趣的叙事。AffGen 通过两种新技术在叙事中引入"引人入胜的转折":动态束宽和情感重排序。动态束宽使用上下文多臂赌博机模型,鼓励选择更难预测、更吸引人的词语;情感重排序根据情感强度对候选句子进行优先排序。自动评估和人工评估都表明,AffGen 在生成情感充沛且有趣的叙事方面优于现有基线。我们的消融研究和分析进一步揭示了 AffGen 的优势与不足。

‘Don’t Get Too Technical with Me’: A Discourse Structure-Based Framework for Science Journalism

  • paper_url: http://arxiv.org/abs/2310.15077
  • repo_url: https://github.com/ronaldahmed/scitechnews
  • paper_authors: Ronald Cardenas, Bingsheng Yao, Dakuo Wang, Yufang Hou
  • for: 这篇论文的目的是支持自动化科学新闻报道(automatic science journalism):通过构建一个真实世界数据集(SciTechNews)并提出一种新的技术框架,帮助把技术性的研究发现写成更准确、简洁且易于理解的报道。
  • methods: 这个论文使用了一种新的技术框架,它将论文的话语结构与元数据结合起来,以便在生成报道时提供指导。此外,论文还使用了一些基eline方法(如Alpaca和ChatGPT)进行比较。
  • results: 大量自动评估和人工评估实验表明,该框架在为目标读者规划内容、简化所选信息以及用通俗风格生成连贯报道方面优于 Alpaca 和 ChatGPT 等基线方法。
    Abstract Science journalism refers to the task of reporting technical findings of a scientific paper as a less technical news article to the general public audience. We aim to design an automated system to support this real-world task (i.e., automatic science journalism) by 1) introducing a newly-constructed and real-world dataset (SciTechNews), with tuples of a publicly-available scientific paper, its corresponding news article, and an expert-written short summary snippet; 2) proposing a novel technical framework that integrates a paper's discourse structure with its metadata to guide generation; and, 3) demonstrating with extensive automatic and human experiments that our framework outperforms other baseline methods (e.g. Alpaca and ChatGPT) in elaborating a content plan meaningful for the target audience, simplifying the information selected, and producing a coherent final report in a layman's style.
    摘要 科学新闻报道指的是将科学论文中的技术发现报道为对大众读者更加简洁的新闻文章。我们的目标是通过自动化系统支持这个实际任务(自动科学新闻),包括:1)构建了一个真实世界数据集(SciTechNews),该数据集包含公开available的科学论文、其对应的新闻文章和专家写的简短概要摘要;2)提出了一种新的技术框架,该框架将论文的话语结构与元数据集成一体,以指导生成;以及3)通过广泛的自动和人类实验,我们的框架比基eline方法(如Alpaca和ChatGPT)在为目标读者制定内容计划、简化选择的信息和生成易于理解的报道而表现出优异。

TableQAKit: A Comprehensive and Practical Toolkit for Table-based Question Answering

  • paper_url: http://arxiv.org/abs/2310.15075
  • repo_url: None
  • paper_authors: Fangyu Lei, Tongxu Luo, Pengqi Yang, Weihao Liu, Hanwen Liu, Jiahe Lei, Yiming Huang, Yifan Wei, Shizhu He, Jun Zhao, Kang Liu
  • for: This paper is written for researchers and developers working on table-based question answering (TableQA) tasks, as well as those interested in natural language processing and machine learning.
  • methods: The paper introduces TableQAKit, an open-source toolkit that provides a unified platform for TableQA, including plentiful datasets and popular methods for this task, as well as large language models (LLMs).
  • results: The paper reports that using the modules in TableQAKit achieves new state-of-the-art (SOTA) results on some datasets, and provides an LLM-based TableQA Benchmark for evaluating the role of LLMs in TableQA.
  • for: 这篇论文是为研究表格问答(TableQA)任务的研究人员和开发者所写的,以及关注自然语言处理和机器学习领域的人员。
  • methods: 论文介绍了 TableQAKit,一个开源的工具集,提供了表格问答的一站式平台,包括丰富的数据集和表格问答任务中流行的方法,以及大语言模型(LLMs)。
  • results: 论文报告了使用 TableQAKit 模块在一些数据集上达到了新的最优(SOTA)结果,并提供了基于 LLM 的表格问答基准,用于评估 LLM 在表格问答中的作用。
    Abstract Table-based question answering (TableQA) is an important task in natural language processing, which requires comprehending tables and employing various reasoning ways to answer the questions. This paper introduces TableQAKit, the first comprehensive toolkit designed specifically for TableQA. The toolkit designs a unified platform that includes plentiful TableQA datasets and integrates popular methods of this task as well as large language models (LLMs). Users can add their datasets and methods according to the friendly interface. Also, pleasantly surprised using the modules in this toolkit achieves new SOTA on some datasets. Finally, \tableqakit{} also provides an LLM-based TableQA Benchmark for evaluating the role of LLMs in TableQA. TableQAKit is open-source with an interactive interface that includes visual operations, and comprehensive data for ease of use.
    摘要 tables-based 问答 (TableQA) 是自然语言处理中的一项重要任务,需要理解表格并运用多种逻辑方法来回答问题。这篇文章介绍了 TableQAKit,是特地为 TableQA 设计的首个通用工具箱。工具箱包括丰富的 TableQA 数据集和整合了流行的这个任务方法以及大语言模型(LLM)。用户可以根据易用的界面添加自己的数据集和方法。此外,使用 modules 在这个工具箱中也可以实现新的 SOTA 成绩在某些数据集上。最后,\tableqakit{} 还提供了基于 LLM 的 TableQA 评估标准,用于评估 LLM 在 TableQA 中的角色。TableQAKit 是开源的,具有交互式界面,包括视觉操作和完整的数据,以便使用。

Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge

  • paper_url: http://arxiv.org/abs/2310.15066
  • repo_url: https://github.com/pluslabnlp/envision
  • paper_authors: Te-Lin Wu, Yu Zhou, Nanyun Peng
  • for: 本文旨在提升 phrase grounding 模型定位"活动对象"(active objects)的能力,使其能更好地辅助人类完成任务。
  • methods: 本文提出了一种结合语言模态与视觉模态的方案,包括:学习"发生状态变化的对象"这一角色并更准确地从指令中抽取这些对象,利用动作前后条件(pre/post-conditions),以及借助描述性知识更鲁棒地识别对象。
  • results: 在 Ego4D 和 Epic-Kitchens 数据集上的大量实验表明该方法带来显著提升:在 TREK-150-OPE-Det 定位+跟踪任务上各项标准指标提升超过 54%,在 TREK-150-OPE 跟踪任务上提升超过 7%,在 Ego4D SCOD 任务上的平均精度(AP)提升超过 3%。
    Abstract The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually. One important step towards this goal is to localize and track key active objects that undergo major state change as a consequence of human actions/interactions to the environment without being told exactly what/where to ground (e.g., localizing and tracking the `sponge` in video from the instruction "Dip the `sponge` into the bucket."). While existing works approach this problem from a pure vision perspective, we investigate to which extent the textual modality (i.e., task instructions) and their interaction with visual modality can be beneficial. Specifically, we propose to improve phrase grounding models' ability on localizing the active objects by: (1) learning the role of `objects undergoing change` and extracting them accurately from the instructions, (2) leveraging pre- and post-conditions of the objects during actions, and (3) recognizing the objects more robustly with descriptional knowledge. We leverage large language models (LLMs) to extract the aforementioned action-object knowledge, and design a per-object aggregation masking technique to effectively perform joint inference on object phrases and symbolic knowledge. We evaluate our framework on Ego4D and Epic-Kitchens datasets. Extensive experiments demonstrate the effectiveness of our proposed framework, which leads to>54% improvements in all standard metrics on the TREK-150-OPE-Det localization + tracking task, >7% improvements in all standard metrics on the TREK-150-OPE tracking task, and >3% improvements in average precision (AP) on the Ego4D SCOD task.
    摘要 人工智能代理人需要能够从自己的视角活动地基于任务说明进行操作或协助人类。一个重要的进步是将关键的活动对象localize和跟踪到环境中,而不需要指定具体的位置或物体。现有的工作通过纯视觉方式解决这个问题,我们则 investigate到文本模式(即任务说明)和视觉模式之间的交互如何提高地面对象localization的能力。具体来说,我们提出了以下三个方法来改进地面对象localization模型:1. 学习对象变化的角色和准确地从说明中提取活动对象。2. 利用操作前后对象的条件,以便更好地识别活动对象。3. 通过描述知识来更加稳定地识别活动对象。我们利用大型自然语言模型(LLMs)提取对象变化的知识,并设计了每个对象的权重聚合屏蔽技术,以实现效果地进行对象短语和符号知识的共同推理。我们在Ego4D和Epic-Kitchens数据集上进行了广泛的实验,结果表明我们提出的框架具有明显的优势,在TREK-150-OPE-Det本地化+跟踪任务上提高了>54%的标准指标,在TREK-150-OPE跟踪任务上提高了>7%的标准指标,并在Ego4D SCOD任务上提高了>3%的平均准确率。

SLOG: A Structural Generalization Benchmark for Semantic Parsing

  • paper_url: http://arxiv.org/abs/2310.15040
  • repo_url: https://github.com/bingzhilee/slog
  • paper_authors: Bingzhi Li, Lucia Donatelli, Alexander Koller, Tal Linzen, Yuekun Yao, Najoung Kim
  • for: 评估语言模型对新复杂表达的泛化能力
  • methods: 使用在 COGS 数据集(Kim 和 Linzen,2020)基础上扩展出的 17 种结构泛化案例进行评估
  • results: Transformer 模型(包括预训练模型)的泛化准确率仅为 40.6%,而具备结构感知能力的解析器也只有 70.8%,与现有模型在 COGS 上接近完美的准确率相差甚远,这说明 SLOG 数据集能够凸显模型在结构泛化能力上的不足。
    Abstract The goal of compositional generalization benchmarks is to evaluate how well models generalize to new complex linguistic expressions. Existing benchmarks often focus on lexical generalization, the interpretation of novel lexical items in syntactic structures familiar from training; structural generalization tasks, where a model needs to interpret syntactic structures that are themselves unfamiliar from training, are often underrepresented, resulting in overly optimistic perceptions of how well models can generalize. We introduce SLOG, a semantic parsing dataset that extends COGS (Kim and Linzen, 2020) with 17 structural generalization cases. In our experiments, the generalization accuracy of Transformer models, including pretrained ones, only reaches 40.6%, while a structure-aware parser only achieves 70.8%. These results are far from the near-perfect accuracy existing models achieve on COGS, demonstrating the role of SLOG in foregrounding the large discrepancy between models' lexical and structural generalization capacities.
    摘要 “组合泛化基准的目标是评估模型对新的复杂语言表达的泛化能力。现有基准往往侧重词汇泛化,即在训练中已熟悉的句法结构中解释新词;而结构泛化任务(模型需要解释训练中未见过的句法结构)常常代表性不足,导致人们对模型泛化能力的评估过于乐观。我们提出了 SLOG,一个在 COGS(Kim 和 Linzen,2020)基础上扩展了 17 种结构泛化案例的语义解析数据集。在我们的实验中,Transformer 模型(包括预训练模型)的泛化准确率仅达 40.6%,而结构感知的解析器也只有 70.8%。这些结果与现有模型在 COGS 上接近完美的准确率相差甚远,凸显了 SLOG 在揭示模型词汇泛化与结构泛化能力之间巨大差距方面的作用。”

Statistical Depth for Ranking and Characterizing Transformer-Based Text Embeddings

  • paper_url: http://arxiv.org/abs/2310.15010
  • repo_url: https://github.com/pkseeg/tte_depth
  • paper_authors: Parker Seegmiller, Sarah Masud Preum
  • for: 该论文旨在提出一种统计深度(statistical depth)方法,用于衡量高维文本表示分布中各文本的中心性。
  • methods: 该论文提出基于 Transformer 文本嵌入的深度(TTE depth),并将其用于 NLP 流水线中的建模与分布推断。
  • results: 研究者将 TTE depth 用于上下文学习的提示选择,发现该方法在六个文本分类任务上稳定地优于统计基线;此外,他们还利用 TTE depth 及相应的秩和检验刻画合成语料与人工撰写语料的分布,发现五种近期的合成数据增强流程都会造成可度量的分布偏移,使其偏离对应的人工文本。
    Abstract The popularity of transformer-based text embeddings calls for better statistical tools for measuring distributions of such embeddings. One such tool would be a method for ranking texts within a corpus by centrality, i.e. assigning each text a number signifying how representative that text is of the corpus as a whole. However, an intrinsic center-outward ordering of high-dimensional text representations is not trivial. A statistical depth is a function for ranking k-dimensional objects by measuring centrality with respect to some observed k-dimensional distribution. We adopt a statistical depth to measure distributions of transformer-based text embeddings, transformer-based text embedding (TTE) depth, and introduce the practical use of this depth for both modeling and distributional inference in NLP pipelines. We first define TTE depth and an associated rank sum test for determining whether two corpora differ significantly in embedding space. We then use TTE depth for the task of in-context learning prompt selection, showing that this approach reliably improves performance over statistical baseline approaches across six text classification tasks. Finally, we use TTE depth and the associated rank sum test to characterize the distributions of synthesized and human-generated corpora, showing that five recent synthetic data augmentation processes cause a measurable distributional shift away from associated human-generated text.
    摘要 基于 Transformer 的文本嵌入的流行,要求我们拥有更好的统计工具来度量这类嵌入的分布。其中一种工具是按中心性对语料中的文本进行排序的方法,即为每篇文本赋予一个数值,表示它对整个语料的代表程度。然而,高维文本表示并不存在平凡的由内向外的排序。统计深度(statistical depth)是一种依据某个观测到的 k 维分布来度量中心性、从而对 k 维对象进行排序的函数。我们采用统计深度来度量基于 Transformer 的文本嵌入的分布,称之为 TTE 深度,并介绍了它在 NLP 流水线中用于建模和分布推断的实际用法。我们首先定义 TTE 深度以及配套的秩和检验,用于判断两个语料在嵌入空间中是否存在显著差异。随后,我们将 TTE 深度用于上下文学习的提示选择任务,结果表明这一方法在六个文本分类任务上稳定地优于统计基线方法。最后,我们使用 TTE 深度及相应的秩和检验刻画合成语料与人工撰写语料的分布,发现五种近期的合成数据增强流程都会造成可度量的分布偏移,使其偏离对应的人工文本。
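The paper ranks texts by how central their embeddings are within a corpus and pairs the depths with a rank-sum test. The exact TTE depth function is not reproduced here; the sketch below substitutes a generic spatial (geometric) depth over embedding vectors to illustrate the ranking-plus-test workflow, with random vectors standing in for real text embeddings.

```python
import numpy as np
from scipy.stats import ranksums

def spatial_depth(points: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Depth of each row of `points` w.r.t. the empirical distribution `reference`:
    1 - || mean unit vector from x to the reference points ||.  Central points score
    near 1, outlying points near 0.  (A stand-in for the paper's TTE depth.)"""
    depths = []
    for x in points:
        diffs = reference - x
        norms = np.linalg.norm(diffs, axis=1)
        units = diffs[norms > 0] / norms[norms > 0, None]
        depths.append(1.0 - np.linalg.norm(units.mean(axis=0)))
    return np.array(depths)

rng = np.random.default_rng(0)
human_corpus = rng.normal(0.0, 1.0, size=(200, 16))        # stand-in text embeddings
synthetic_corpus = rng.normal(0.5, 1.0, size=(200, 16))    # shifted distribution

depth_human = spatial_depth(human_corpus, human_corpus)
depth_synth = spatial_depth(synthetic_corpus, human_corpus)

# Rank-sum test on the two depth samples: do the corpora differ in embedding space?
stat, p_value = ranksums(depth_human, depth_synth)
print(f"rank-sum statistic={stat:.2f}, p={p_value:.3g}")

# Prompt-selection use: pick the most central candidate examples by depth.
print(np.argsort(-depth_human)[:5])
```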

Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15007
  • repo_url: None
  • paper_authors: Matthieu Meeus, Shubham Jain, Marek Rei, Yves-Alexandre de Montjoye
  • for: The paper is focused on the task of document-level membership inference for real-world large language models (LLMs), which involves inferring whether the LLM has seen a given document during training or not.
  • methods: The authors propose a practical, black-box method to predict document-level membership using commonly used data sources for training and the model release date. They also propose a procedure for the development and evaluation of document-level membership inference for LLMs.
  • results: The authors show that their methodology performs very well, reaching an impressive AUC of 0.856 for books and 0.678 for papers. They also show that their approach outperforms sentence-level membership inference attacks used in the privacy literature for the document-level membership task. Additionally, they find that smaller models like OpenLLaMA-3B are approximately as sensitive to their approach as larger models like OpenLLaMA-7B.
    Abstract With large language models (LLMs) poised to become embedded in our daily lives, questions are starting to be raised about the dataset(s) they learned from. These questions range from potential bias or misinformation LLMs could retain from their training data to questions of copyright and fair use of human-generated text. However, while these questions emerge, developers of the recent state-of-the-art LLMs become increasingly reluctant to disclose details on their training corpus. We here introduce the task of document-level membership inference for real-world LLMs, i.e. inferring whether the LLM has seen a given document during training or not. First, we propose a procedure for the development and evaluation of document-level membership inference for LLMs by leveraging commonly used data sources for training and the model release date. We then propose a practical, black-box method to predict document-level membership and instantiate it on OpenLLaMA-7B with both books and academic papers. We show our methodology to perform very well, reaching an impressive AUC of 0.856 for books and 0.678 for papers. We then show our approach to outperform the sentence-level membership inference attacks used in the privacy literature for the document-level membership task. We finally evaluate whether smaller models might be less sensitive to document-level inference and show OpenLLaMA-3B to be approximately as sensitive as OpenLLaMA-7B to our approach. Taken together, our results show that accurate document-level membership can be inferred for LLMs, increasing the transparency of technology poised to change our lives.
    摘要 随着大型语言模型(LLM)即将融入我们的日常生活,人们开始对其训练所用的数据集提出疑问:既包括 LLM 可能从训练数据中保留的偏见或错误信息,也包括人类创作文本的版权与合理使用问题。然而,就在这些问题出现的同时,最新最先进 LLM 的开发者却越来越不愿意披露其训练语料的细节。我们在此提出针对真实世界 LLM 的文档级成员推断任务,即推断 LLM 在训练期间是否见过某个给定文档。首先,我们利用常用的训练数据来源和模型发布日期,提出了一套用于开发和评估文档级成员推断的流程。随后,我们提出了一种实用的黑盒方法来预测文档级成员身份,并在 OpenLLaMA-7B 上用图书和学术论文进行了实例化。结果表明我们的方法表现非常好:图书上的 AUC 达到 0.856,论文上达到 0.678。我们还证明,在文档级成员推断任务上,我们的方法优于隐私文献中使用的句子级成员推断攻击。最后,我们评估了较小的模型是否对文档级推断不那么敏感,发现 OpenLLaMA-3B 对我们方法的敏感度与 OpenLLaMA-7B 大致相当。总之,我们的结果表明可以对 LLM 进行准确的文档级成员推断,从而提升这项即将改变我们生活的技术的透明度。
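The abstract describes a black-box, document-level method built from the model's behaviour over a full document. One plausible instantiation, sketched below, aggregates per-token log-probabilities into document features and trains a simple classifier on documents with known member/non-member labels; the feature set, classifier, and synthetic data are assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def doc_features(token_logprobs: np.ndarray) -> np.ndarray:
    """Aggregate per-token log-probabilities (queried from the target LLM in a
    black-box fashion) into a fixed-length document-level feature vector."""
    return np.array([
        token_logprobs.mean(),
        token_logprobs.std(),
        np.percentile(token_logprobs, 5),      # the hardest tokens
        np.percentile(token_logprobs, 95),     # the easiest tokens
    ])

rng = np.random.default_rng(0)
# Synthetic stand-in: member documents are, on average, assigned higher log-probs.
member_docs = [rng.normal(-2.8, 1.0, size=rng.integers(200, 400)) for _ in range(100)]
nonmember_docs = [rng.normal(-3.2, 1.0, size=rng.integers(200, 400)) for _ in range(100)]

X = np.stack([doc_features(d) for d in member_docs + nonmember_docs])
y = np.array([1] * len(member_docs) + [0] * len(nonmember_docs))

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", clf.score(X, y))
```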

When Language Models Fall in Love: Animacy Processing in Transformer Language Models

  • paper_url: http://arxiv.org/abs/2310.15004
  • repo_url: https://github.com/hannamw/lms-in-love
  • paper_authors: Michael Hanna, Yonatan Belinkov, Sandro Pezzelle
  • for: 本研究旨在探讨语言模型是否能够正确地处理生物体的动态性,以及它们如何处理不同类型的生物体。
  • methods: 研究人员使用了开源的语言模型,并对其进行了训练和测试,以评估其对生物体动态性的处理能力。
  • results: 研究发现,语言模型在处理典型生物体时行为类似于人类,但在处理不典型生物体时,其处理能力较差。尽管 context indicating atypical animacy 很短,但语言模型仍可以从 subtle clues 中感受到动态性的含义,并改变其行为。
    Abstract Animacy - whether an entity is alive and sentient - is fundamental to cognitive processing, impacting areas such as memory, vision, and language. However, animacy is not always expressed directly in language: in English it often manifests indirectly, in the form of selectional constraints on verbs and adjectives. This poses a potential issue for transformer language models (LMs): they often train only on text, and thus lack access to extralinguistic information from which humans learn about animacy. We ask: how does this impact LMs' animacy processing - do they still behave as humans do? We answer this question using open-source LMs. Like previous studies, we find that LMs behave much like humans when presented with entities whose animacy is typical. However, we also show that even when presented with stories about atypically animate entities, such as a peanut in love, LMs adapt: they treat these entities as animate, though they do not adapt as well as humans. Even when the context indicating atypical animacy is very short, LMs pick up on subtle clues and change their behavior. We conclude that despite the limited signal through which LMs can learn about animacy, they are indeed sensitive to the relevant lexical semantic nuances available in English.
    摘要 生命性(animacy),即一个实体是否有生命和知觉,是认知加工的基础,影响记忆、视觉和语言等领域。然而,生命性并不总是直接体现在语言中:在英语里,它往往以动词和形容词选择限制的间接形式表现出来。这对 Transformer 语言模型(LM)可能构成问题:它们通常只在文本上训练,因而无法接触到人类借以学习生命性的语言之外的信息。我们的问题是:这会如何影响 LM 对生命性的处理,它们是否仍表现得像人类一样?我们使用开源 LM 来回答这个问题。与以往研究一致,我们发现当实体的生命性是典型的时,LM 的行为与人类非常相似。然而我们也发现,即使面对关于非典型生命性实体(例如一颗坠入爱河的花生)的故事,LM 也能适应:它们会把这些实体当作有生命的来处理,尽管适应程度不如人类。即使指示非典型生命性的上下文非常简短,LM 也能捕捉到细微的线索并改变其行为。我们的结论是:尽管 LM 只能通过有限的信号学习生命性,它们确实对英语中相关的词汇语义细微差别保持敏感。

Simple Hardware-Efficient PCFGs with Independent Left and Right Productions

  • paper_url: http://arxiv.org/abs/2310.14997
  • repo_url: None
  • paper_authors: Wei Liu, Songlin Yang, Yoon Kim, Kewei Tu
  • for: 这个论文是为了提高PCFG的扩展和语言模型的性能而写的。
  • methods: 这篇论文提出左右子节点独立生成的简单 PCFG 形式,用于替代低秩参数化方法来扩展 PCFG,并引入 FlashInside,一种硬件 IO 感知的 inside 算法实现。
  • results: 结果表明,这种简单的 PCFG 形式无论作为语言模型还是无监督解析器都比低秩参数化方法扩展得更有效,其语言建模表现优于规模相当的低秩 PCFG。
    Abstract Scaling dense PCFGs to thousands of nonterminals via a low-rank parameterization of the rule probability tensor has been shown to be beneficial for unsupervised parsing. However, PCFGs scaled this way still perform poorly as a language model, and even underperform similarly-sized HMMs. This work introduces \emph{SimplePCFG}, a simple PCFG formalism with independent left and right productions. Despite imposing a stronger independence assumption than the low-rank approach, we find that this formalism scales more effectively both as a language model and as an unsupervised parser. As an unsupervised parser, our simple PCFG obtains an average F1 of 65.1 on the English PTB, and as a language model, it obtains a perplexity of 119.0, outperforming similarly-sized low-rank PCFGs. We further introduce \emph{FlashInside}, a hardware IO-aware implementation of the inside algorithm for efficiently scaling simple PCFGs.
    摘要 通过对规则概率张量进行低秩参数化,把稠密 PCFG 扩展到数千个非终结符,已被证明有利于无监督解析。然而,这样扩展的 PCFG 作为语言模型的表现仍然很差,甚至不如规模相当的 HMM。本文提出了 SimplePCFG,一种左右产生式相互独立的简单 PCFG 形式。尽管它施加了比低秩方法更强的独立性假设,我们发现这种形式无论作为语言模型还是无监督解析器都能更有效地扩展。作为无监督解析器,我们的简单 PCFG 在英语 PTB 上取得 65.1 的平均 F1;作为语言模型,它取得 119.0 的困惑度,优于规模相当的低秩 PCFG。我们还提出了 FlashInside,一种硬件 IO 感知的 inside 算法实现,用于高效地扩展简单 PCFG。
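SimplePCFG factorizes each binary rule as p(A -> B C) = p_L(B | A) * p_R(C | A). The sketch below shows how that factorization lets the inside algorithm replace the |N|^3 rule tensor with two |N| x |N| matrices; how probability mass is split between branching and emission is glossed over, and all parameter values are random placeholders.

```python
import numpy as np

def inside_log_likelihood(sent, root, left, right, emit):
    """Inside algorithm for a SimplePCFG with independent left/right productions.

    root:  (NT,)     p(ROOT -> A)
    left:  (NT, NT)  p_L(B | A), left-child distribution of parent A
    right: (NT, NT)  p_R(C | A), right-child distribution of parent A
    emit:  (NT, V)   p(A -> w)
    Binary rules factorize as p(A -> B C) = left[A, B] * right[A, C], so each split
    point costs two matrix-vector products instead of a contraction with an NT^3 tensor.
    """
    n, NT = len(sent), root.shape[0]
    beta = np.zeros((n, n + 1, NT))                 # beta[i, j, A]: inside prob of span [i, j)
    for i, w in enumerate(sent):
        beta[i, i + 1] = emit[:, w]
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            total = np.zeros(NT)
            for k in range(i + 1, j):
                total += (left @ beta[i, k]) * (right @ beta[k, j])
            beta[i, j] = total
    return float(np.log(root @ beta[0, n]))

rng = np.random.default_rng(0)
NT, V = 8, 20
root = rng.dirichlet(np.ones(NT))
left = rng.dirichlet(np.ones(NT), size=NT)          # rows sum to 1: p_L(. | A)
right = rng.dirichlet(np.ones(NT), size=NT)
emit = rng.dirichlet(np.ones(V), size=NT) * 0.5     # toy: half the mass on emissions
print(inside_log_likelihood([3, 7, 1, 15], root, left, right, emit))
```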

LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay

  • paper_url: http://arxiv.org/abs/2310.14985
  • repo_url: None
  • paper_authors: Yihuai Lan, Zhiqiang Hu, Lei Wang, Yang Wang, Deheng Ye, Peilin Zhao, Ee-Peng Lim, Hui Xiong, Hao Wang
  • for: 本研究目标是探索基于LLM的代理人在社交行为方面的开放问题。
  • methods: 我们采用了Avalon游戏作为环境,并使用系统提示来引导LLM代理人参与游戏。
  • results: 我们的研究表明,我们的框架可以快速适应Avalon游戏,并且可以生成适应性强的智能代理人。我们的结果还显示了LLM代理人在动态社交环境中的应用潜力。
    Abstract This paper aims to investigate the open research problem of uncovering the social behaviors of LLM-based agents. To achieve this goal, we adopt Avalon, a representative communication game, as the environment and use system prompts to guide LLM agents to play the game. While previous studies have conducted preliminary investigations into gameplay with LLM agents, there lacks research on their social behaviors. In this paper, we present a novel framework designed to seamlessly adapt to Avalon gameplay. The core of our proposed framework is a multi-agent system that enables efficient communication and interaction among agents. We evaluate the performance of our framework based on metrics from two perspectives: winning the game and analyzing the social behaviors of LLM agents. Our results demonstrate the effectiveness of our framework in generating adaptive and intelligent agents and highlight the potential of LLM-based agents in addressing the challenges associated with dynamic social environment interaction. By analyzing the social behaviors of LLM agents from the aspects of both collaboration and confrontation, we provide insights into the research and applications of this domain.
    摘要 本研究目的是探索基于LLM(语言模型)代理的社交行为问题。为达到这个目标,我们采用了Avalon游戏作为环境,并使用系统提示导引LLM代理进行游戏。先前的研究已经对LLM代理在游戏中的初步调查,但尚缺乏关于其社交行为的研究。本文提出了一种新的框架,可以轻松适应Avalon游戏环境。我们的框架核心是多代理系统,允许代理之间有效地交流和互动。我们根据游戏胜利和LLM代理社交行为的两个角度进行评价,并发现了我们的框架在生成适应性强和智能代理方面的效果。我们的结果还 highlight了LLM代理在面对动态社会环境的挑战中的潜在应用前景。通过分析LLM代理的社交行为从合作和对抗两个方面,我们提供了这个领域的研究和应用的深入理解。

Fidelity-Enriched Contrastive Search: Reconciling the Faithfulness-Diversity Trade-Off in Text Generation

  • paper_url: http://arxiv.org/abs/2310.14981
  • repo_url: https://github.com/ntunlplab/fecs
  • paper_authors: Wei-Lin Chen, Cheng-Kuang Wu, Hsin-Hsi Chen, Chung-Chi Chen
  • for: solves the hallucination problem in natural language generation tasks
  • methods: uses Fidelity-Enriched Contrastive Search (FECS) with context-aware regularization terms
  • results: consistently enhances faithfulness while maintaining output diversity
  • for: 本研究旨在解决自然语言生成任务中的幻觉问题, язы言模型经常生成流利且吸引人的内容,但可能缺乏与提供的源文件的一致性,导致可能的不准确。
  • methods: 我们提出了一种新的解码方法,即强化对比搜索框架的 faithfulness-enriched contrastive search (FECS),该方法在生成文本时添加了上下文感知规则,以便批量抑制生成文本中的重复性。
  • results: 我们在摘要生成和对话生成两个极易幻觉的任务中进行了实验,结果表明,FECS可以在不同的语言模型大小下保持 faithfulness,并与其他解码算法相比保持输出多样性。
    Abstract In this paper, we address the hallucination problem commonly found in natural language generation tasks. Language models often generate fluent and convincing content but can lack consistency with the provided source, resulting in potential inaccuracies. We propose a new decoding method called Fidelity-Enriched Contrastive Search (FECS), which augments the contrastive search framework with context-aware regularization terms. FECS promotes tokens that are semantically similar to the provided source while penalizing repetitiveness in the generated text. We demonstrate its effectiveness across two tasks prone to hallucination: abstractive summarization and dialogue generation. Results show that FECS consistently enhances faithfulness across various language model sizes while maintaining output diversity comparable to well-performing decoding algorithms.
    摘要 在这篇论文中,我们着手解决自然语言生成任务中常见的幻觉问题。语言模型往往能生成流畅且有说服力的内容,但这些内容可能与给定的来源不一致,从而导致潜在的不准确。我们提出了一种新的解码方法,称为保真增强的对比搜索(Fidelity-Enriched Contrastive Search,FECS),它在对比搜索框架中加入了上下文感知的正则项:FECS 会提升与给定来源语义相近的 token,同时惩罚生成文本中的重复。我们在两个容易产生幻觉的任务(抽象式摘要和对话生成)上验证了其有效性。结果表明,FECS 在各种规模的语言模型上都能稳定提升忠实度,同时保持与表现良好的解码算法相当的输出多样性。
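FECS scores each candidate token with the usual contrastive-search terms plus a faithfulness reward toward the source. The exact weighting in the paper is not reproduced here; the sketch below shows one plausible form of the per-candidate score, with alpha/beta and the token representations as assumptions.

```python
import numpy as np

def fecs_scores(cand_probs, cand_reps, context_reps, source_reps, alpha=0.4, beta=0.4):
    """Score top-k candidate tokens for one decoding step.

    cand_probs:   (k,)    model probabilities of the k candidates
    cand_reps:    (k, d)  hidden representations of the candidates
    context_reps: (t, d)  representations of already-generated tokens (degeneration penalty)
    source_reps:  (s, d)  representations of the source document (faithfulness reward)
    Assumed form: (1 - alpha - beta) * p  -  alpha * max_sim(candidate, context)
                                          +  beta  * max_sim(candidate, source).
    """
    def cos(a, b):                      # (k, d) x (n, d) -> (k, n) cosine similarities
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return a @ b.T

    degeneration = cos(cand_reps, context_reps).max(axis=1)
    faithfulness = cos(cand_reps, source_reps).max(axis=1)
    return (1 - alpha - beta) * cand_probs - alpha * degeneration + beta * faithfulness

rng = np.random.default_rng(0)
k, d = 5, 16
scores = fecs_scores(rng.dirichlet(np.ones(k)), rng.normal(size=(k, d)),
                     rng.normal(size=(3, d)), rng.normal(size=(8, d)))
print("selected candidate:", int(np.argmax(scores)))
```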

Penalty Decoding: Well Suppress the Self-Reinforcement Effect in Open-Ended Text Generation

  • paper_url: http://arxiv.org/abs/2310.14971
  • repo_url: https://github.com/zwhong714/penalty_decoding
  • paper_authors: Wenhong Zhu, Hongkun Hao, Rui Wang
  • for: 这篇论文研究文本生成中的自我强化效应,以及用重复惩罚来缓解这一效应的有效性。
  • methods: 这篇论文使用了一种忘记机制,使得选择罚款更加容易,以及一种长度罚款,以解决因罚款过重而导致的句子过短问题。
  • results: 实验结果表明,这种罚款解码方法可以生成高质量的句子,与人工输出类似。
    Abstract The decoding algorithm is critical for open-ended text generation, transforming latent representations into coherent and meaningful outputs. This paper investigates the self-reinforcement effect in text generation and the effectiveness of a repetition penalty to mitigate it. However, determining the optimal repetition penalty value is challenging. To tackle this, we propose a forgetting mechanism that disregards distant tokens, reducing the burden of penalty selection. In addition, we introduce a length penalty to address overly short sentences caused by excessive penalties. Our penalty decoding approach incorporating three strategies helps resolve issues with sampling methods deviating from factual information. Experimental results demonstrate the efficacy of our approach in generating high-quality sentences resembling human output.
    摘要 解码算法对开放式文本生成至关重要,它把潜在表示转化为连贯且有意义的输出。本文研究文本生成中的自我强化效应,以及用重复惩罚缓解该效应的有效性。然而,确定最优的重复惩罚值并不容易。为此,我们提出了一种遗忘机制,忽略距离较远的 token,从而减轻选择惩罚值的负担。此外,我们还引入了长度惩罚,以解决过度惩罚导致的句子过短问题。我们的惩罚解码方法结合这三种策略,有助于解决采样方法偏离事实信息的问题。实验结果表明,该方法能够生成接近人类输出的高质量句子。
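The three strategies in the abstract (a repetition penalty, a forgetting window that restricts it to recent tokens, and a length penalty against overly short outputs) all act on the next-token logits. The sketch below applies them to a toy logit vector; the concrete penalty values and the EOS handling are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def penalty_decode_logits(logits, generated, eos_id, min_len,
                          rep_penalty=1.5, window=32, len_penalty=5.0):
    """Adjust next-token logits with the three strategies described in the abstract.

    - repetition penalty: down-weight tokens that were recently generated;
    - forgetting mechanism: only the last `window` tokens are penalized;
    - length penalty: suppress EOS until a minimum length is reached."""
    adjusted = logits.copy()
    recent = set(generated[-window:])               # forgetting: distant tokens are ignored
    for tok in recent:
        # CTRL-style penalty: shrink positive logits, amplify negative ones.
        adjusted[tok] = adjusted[tok] / rep_penalty if adjusted[tok] > 0 else adjusted[tok] * rep_penalty
    if len(generated) < min_len:
        adjusted[eos_id] -= len_penalty             # discourage ending too early
    return adjusted

rng = np.random.default_rng(0)
vocab, eos = 50, 0
logits = rng.normal(size=vocab)
history = [7, 7, 12, 7]                             # a token starting to self-reinforce
next_token = int(np.argmax(penalty_decode_logits(logits, history, eos, min_len=10)))
print(next_token)
```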

Towards LLM-driven Dialogue State Tracking

  • paper_url: http://arxiv.org/abs/2310.14970
  • repo_url: https://github.com/woodscene/ldst
  • paper_authors: Yujie Feng, Zexin Lu, Bo Liu, Liming Zhan, Xiao-Ming Wu
  • for: 这个研究的目的是评估ChatGPT在对话管理中的能力。
  • methods: 这个研究使用了ChatGPT和一种基于小型开源模型的LDST框架进行对话管理。LDST使用了一种新的域槽指令调整方法来改进性能。
  • results: 研究发现,在零样本和少样本设置下,LDST 相比之前的 SOTA 方法取得了显著的性能提升。
    Abstract Dialogue State Tracking (DST) is of paramount importance in ensuring accurate tracking of user goals and system actions within task-oriented dialogue systems. The emergence of large language models (LLMs) such as GPT3 and ChatGPT has sparked considerable interest in assessing their efficacy across diverse applications. In this study, we conduct an initial examination of ChatGPT's capabilities in DST. Our evaluation uncovers the exceptional performance of ChatGPT in this task, offering valuable insights to researchers regarding its capabilities and providing useful directions for designing and enhancing dialogue systems. Despite its impressive performance, ChatGPT has significant limitations including its closed-source nature, request restrictions, raising data privacy concerns, and lacking local deployment capabilities. To address these concerns, we present LDST, an LLM-driven DST framework based on smaller, open-source foundation models. By utilizing a novel domain-slot instruction tuning method, LDST achieves performance on par with ChatGPT. Comprehensive evaluations across three distinct experimental settings, we find that LDST exhibits remarkable performance improvements in both zero-shot and few-shot setting compared to previous SOTA methods. The source code is provided for reproducibility.
    摘要 对话状态跟踪(DST)对于实现对话系统中用户目标和系统行为的准确跟踪是非常重要的。大语言模型(LLM)如GPT3和ChatGPT的出现引发了对其多种应用场景的评估。在这项研究中,我们对ChatGPT在DST中的能力进行了初步评估。我们的评估发现ChatGPT在这个任务中表现出色,为研究人员提供了有价值的信息,并为设计和改进对话系统提供了有用的指导。 despite its impressive performance, ChatGPT有一些限制,包括它的关闭源代码、请求限制、数据隐私问题和无法在本地部署的问题。为了解决这些问题,我们提出了LDST,基于更小的开源基础模型的LLM驱动的DST框架。通过使用一种新的领域槽调整方法,LDST在零shot和几shot设置中表现出色,与前一代SOTA方法相比具有显著的性能改进。我们在三个不同的实验设置中进行了广泛的评估,发现LDST在零shot和几shot设置中具有remarkable的性能改进。源代码提供了重现性。

Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition

  • paper_url: http://arxiv.org/abs/2310.14954
  • repo_url: https://github.com/scufan1990/key-frame-mechanism-for-efficient-conformer
  • paper_authors: Peng Fan, Changhao Shan, Sining Sun, Qing Yang, Jianwei Zhang
  • for: 这项研究旨在提高 Conformer 模型的效率,解决其自注意力机制的计算复杂度随输入序列长度二次增长的问题。
  • methods: 研究以 Conformer 块为基础网络,引入中间 CTC 输出作为引导,将非空白输出对应的帧定义为关键帧,提出关键帧自注意力机制(KFSA)来降低自注意力的计算量,并提出关键帧下采样机制(KFDS),直接在高维声学特征上丢弃对应空白标签的帧。
  • results: 该方法取得了与原始 Conformer 以及 Efficient Conformer 等同类工作相当或更高的性能,同时在模型训练和推理过程中可以丢弃约 60% 的无用帧,从而显著加快推理速度。
    Abstract Recently, Conformer as a backbone network for end-to-end automatic speech recognition achieved state-of-the-art performance. The Conformer block leverages a self-attention mechanism to capture global information, along with a convolutional neural network to capture local information, resulting in improved performance. However, the Conformer-based model encounters an issue with the self-attention mechanism, as computational complexity grows quadratically with the length of the input sequence. Inspired by previous Connectionist Temporal Classification (CTC) guided blank skipping during decoding, we introduce intermediate CTC outputs as guidance into the downsampling procedure of the Conformer encoder. We define the frame with non-blank output as key frame. Specifically, we introduce the key frame-based self-attention (KFSA) mechanism, a novel method to reduce the computation of the self-attention mechanism using key frames. The structure of our proposed approach comprises two encoders. Following the initial encoder, we introduce an intermediate CTC loss function to compute the label frame, enabling us to extract the key frames and blank frames for KFSA. Furthermore, we introduce the key frame-based downsampling (KFDS) mechanism to operate on high-dimensional acoustic features directly and drop the frames corresponding to blank labels, which results in new acoustic feature sequences as input to the second encoder. By using the proposed method, which achieves comparable or higher performance than vanilla Conformer and other similar work such as Efficient Conformer. Meantime, our proposed method can discard more than 60\% useless frames during model training and inference, which will accelerate the inference speed significantly. This work code is available in {https://github.com/scufan1990/Key-Frame-Mechanism-For-Efficient-Conformer}
    摘要 近来,以 Conformer 作为骨干网络的端到端自动语音识别取得了最先进的性能。Conformer 块利用自注意力机制捕捉全局信息,并用卷积神经网络捕捉局部信息,从而提升性能。然而,基于 Conformer 的模型在自注意力机制上存在一个问题:计算复杂度随输入序列长度二次增长。受此前解码阶段由 CTC 引导的空白帧跳过方法启发,我们把中间 CTC 输出作为引导引入 Conformer 编码器的下采样过程,并把具有非空白输出的帧定义为关键帧。具体而言,我们提出了基于关键帧的自注意力机制(KFSA),一种利用关键帧降低自注意力计算量的新方法。我们的方法由两个编码器组成:在第一个编码器之后,引入一个中间 CTC 损失来计算标签帧,从而为 KFSA 提取关键帧和空白帧;此外,我们还提出了基于关键帧的下采样机制(KFDS),直接作用于高维声学特征并丢弃对应空白标签的帧,得到新的声学特征序列作为第二个编码器的输入。所提方法取得了与原始 Conformer 以及 Efficient Conformer 等同类工作相当或更高的性能,同时在模型训练和推理过程中可以丢弃超过 60% 的无用帧,从而显著加快推理速度。代码见 {https://github.com/scufan1990/Key-Frame-Mechanism-For-Efficient-Conformer}。
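KFDS, as described in the abstract, uses an intermediate CTC output to decide which frames carry non-blank labels and drops the rest before the second encoder. The sketch below shows that frame-dropping step on toy tensors; the blank id and all shapes are placeholders.

```python
import numpy as np

def key_frame_downsample(features: np.ndarray, ctc_posteriors: np.ndarray, blank_id: int = 0):
    """Keep only 'key frames', i.e. frames whose intermediate-CTC argmax is non-blank.

    features:       (T, D) acoustic features after the first encoder
    ctc_posteriors: (T, V) intermediate CTC output over the vocabulary (incl. blank)
    Returns the reduced feature sequence fed to the second encoder and the kept indices."""
    labels = ctc_posteriors.argmax(axis=1)
    keep = labels != blank_id
    return features[keep], np.nonzero(keep)[0]

rng = np.random.default_rng(0)
T, D, V = 100, 80, 30
feats = rng.normal(size=(T, D))
posteriors = rng.dirichlet(np.ones(V), size=T)
posteriors[:, 0] += 2.0                       # toy bias so most frames look like blanks
posteriors /= posteriors.sum(axis=1, keepdims=True)

key_feats, kept = key_frame_downsample(feats, posteriors)
print(f"kept {len(kept)} of {T} frames")      # most frames are dropped in this toy setup
```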

System Combination via Quality Estimation for Grammatical Error Correction

  • paper_url: http://arxiv.org/abs/2310.14947
  • repo_url: https://github.com/nusnlp/greco
  • paper_authors: Muhammad Reza Qorib, Hwee Tou Ng
  • for: 这个论文的目的是提出一种新的语法错误修正评估模型,以提高语法错误修正系统的评估精度。
  • methods: 这个论文使用了一种新的语法错误修正评估模型,叫做GRECO,它可以更好地评估修正后的句子质量。此外,论文还提出了三种方法来使用语法错误修正评估模型进行系统组合,包括模型无关、模型无关投票方法和模型相关方法。
  • results: 根据论文的实验结果,使用GRECO模型可以更好地评估修正后的句子质量,并且组合使用多个语法错误修正系统可以达到更高的F0.5分数。
    Abstract Quality estimation models have been developed to assess the corrections made by grammatical error correction (GEC) models when the reference or gold-standard corrections are not available. An ideal quality estimator can be utilized to combine the outputs of multiple GEC systems by choosing the best subset of edits from the union of all edits proposed by the GEC base systems. However, we found that existing GEC quality estimation models are not good enough in differentiating good corrections from bad ones, resulting in a low F0.5 score when used for system combination. In this paper, we propose GRECO, a new state-of-the-art quality estimation model that gives a better estimate of the quality of a corrected sentence, as indicated by having a higher correlation to the F0.5 score of a corrected sentence. It results in a combined GEC system with a higher F0.5 score. We also propose three methods for utilizing GEC quality estimation models for system combination with varying generality: model-agnostic, model-agnostic with voting bias, and model-dependent method. The combined GEC system outperforms the state of the art on the CoNLL-2014 test set and the BEA-2019 test set, achieving the highest F0.5 scores published to date.
    摘要 质量估计模型被用来在缺少参考(黄金标准)修改时评估语法纠错(GEC)模型所做的修改。一个理想的质量估计器可用于组合多个 GEC 系统的输出:从各基础系统提出的全部修改的并集中选出最优的修改子集。然而,我们发现现有的 GEC 质量估计模型在区分好的修改与差的修改方面还不够好,用于系统组合时 F0.5 分数较低。本文提出了 GRECO,一种新的最先进质量估计模型,它对修改后句子质量的估计更准确,与修改后句子的 F0.5 分数具有更高的相关性,从而使组合后的 GEC 系统取得更高的 F0.5 分数。我们还提出了三种通用性不同的、利用 GEC 质量估计模型进行系统组合的方法:模型无关方法、带投票偏置的模型无关方法以及模型相关方法。组合后的 GEC 系统在 CoNLL-2014 测试集和 BEA-2019 测试集上超越了当前最先进水平,取得了迄今为止发表过的最高 F0.5 分数。
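The abstract frames system combination as choosing the best subset of edits from the union of all base systems' edits, guided by a quality estimator. The sketch below implements a greedy, model-agnostic version of that idea over span edits, with a toy scoring function standing in for GRECO; the greedy search and edit representation are simplifications.

```python
def apply_edits(tokens, edits):
    """Apply non-overlapping span edits (start, end, replacement_tokens) to a token list."""
    out, pos = [], 0
    for start, end, repl in sorted(edits):
        out.extend(tokens[pos:start])
        out.extend(repl)
        pos = end
    out.extend(tokens[pos:])
    return out

def overlaps(edit, chosen):
    return any(not (edit[1] <= s or e <= edit[0]) for s, e, _ in chosen)

def combine_systems(source, edit_sets, quality_fn):
    """Greedy model-agnostic combination: keep an edit from the union of all systems'
    edits only if the quality estimate of the corrected sentence improves."""
    union = sorted({e for edits in edit_sets for e in edits})
    chosen, best = [], quality_fn(apply_edits(source, []))
    for edit in union:
        if overlaps(edit, chosen):
            continue
        score = quality_fn(apply_edits(source, chosen + [edit]))
        if score > best:
            chosen.append(edit)
            best = score
    return apply_edits(source, chosen)

# Toy example: the "quality estimator" simply rewards the word "went" (stand-in for GRECO).
source = "I goed to school yesterday".split()
system_a = [(1, 2, ("went",))]                              # corrects the verb
system_b = [(1, 2, ("goed",)), (4, 5, ("yesterday",))]      # keeps the error, adds a no-op
toy_quality = lambda toks: 1.0 if "went" in toks else 0.0
print(" ".join(combine_systems(source, [system_a, system_b], toy_quality)))
```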

Unveiling A Core Linguistic Region in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.14928
  • repo_url: None
  • paper_authors: Jun Zhao, Zhihao Zhang, Yide Ma, Qi Zhang, Tao Gui, Luhui Gao, Xuanjing Huang
  • for: To gain a deeper understanding of how intelligence emerges in large language models (LLMs).
  • methods: Using brain localization as a prototype, the authors identify the core region in LLMs that corresponds to linguistic competence and study how this region relates to the models' capabilities.
  • results: The core linguistic region accounts for roughly 1% of all model parameters, and perturbing even a single parameter along specific dimensions within it causes a loss of linguistic competence. Moreover, improving linguistic competence does not necessarily raise the model's knowledge level, suggesting that regions of domain knowledge are dissociated from the linguistic region.
    Abstract Brain localization, which describes the association between specific regions of the brain and their corresponding functions, is widely accepted in the field of cognitive science as an objective fact. Today's large language models (LLMs) possess human-level linguistic competence and can execute complex tasks requiring abstract knowledge and reasoning. To deeply understand the inherent mechanisms of intelligence emergence in LLMs, this paper conducts an analogical research using brain localization as a prototype. We have discovered a core region in LLMs that corresponds to linguistic competence, accounting for approximately 1% of the total model parameters. This core region exhibits significant dimension dependency, and perturbations to even a single parameter on specific dimensions can lead to a loss of linguistic competence. Furthermore, we observe that an improvement in linguistic competence does not necessarily accompany an elevation in the model's knowledge level, which might imply the existence of regions of domain knowledge that are dissociated from the linguistic region. Overall, exploring the LLMs' functional regions provides insights into the foundation of their intelligence. In the future, we will continue to investigate knowledge regions within LLMs and the interactions between them.
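A hedged sketch of the kind of probe described above: perturb parameters along a specific dimension and check how much language-modelling loss degrades. It assumes a Hugging Face-style causal LM interface; the model, layer name, and dimension index in the usage comment are illustrative, not the paper's setup.

```python
import torch

def perplexity(model, input_ids):
    """Perplexity of a causal LM on a batch of token ids (HF-style interface assumed)."""
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss
    return torch.exp(loss).item()

def perturb_dimension(model, layer_name, dim, noise_std=1.0):
    """Add Gaussian noise to one column (dimension) of a single 2-D weight matrix."""
    param = dict(model.named_parameters())[layer_name]
    with torch.no_grad():
        param[:, dim] += noise_std * torch.randn_like(param[:, dim])

# Illustrative usage (names are assumptions):
# before = perplexity(model, input_ids)
# perturb_dimension(model, "transformer.h.3.mlp.c_proj.weight", dim=1024)
# after = perplexity(model, input_ids)  # a large jump suggests a dimension the model depends on
```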

Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation

  • paper_url: http://arxiv.org/abs/2310.14892
  • repo_url: https://github.com/r1047/air-decoding
  • paper_authors: Tianqi Zhong, Quan Wang, Jingxuan Han, Yongdong Zhang, Zhendong Mao
  • for: To improve controllability in text generation; the paper also identifies a new problem, Attribute Collapse, in which the fluency of generated text degrades rapidly once the control strength exceeds a critical value, rendering the text unusable.
  • methods: A novel lightweight decoding framework named Air-Decoding, whose main idea is to reconstruct the attribute distributions so as to balance the weights of attribute words and non-attribute words, yielding more fluent text.
  • results: Experiments on multiple CTG tasks show that the method achieves a new state-of-the-art control performance.
    Abstract Controllable text generation (CTG) aims to generate text with desired attributes, and decoding-time-based methods have shown promising performance on this task. However, in this paper, we identify the phenomenon of Attribute Collapse for the first time. It causes the fluency of generated text to rapidly decrease when the control strength exceeds a critical value, rendering the text completely unusable. This limitation hinders the effectiveness of decoding methods in achieving high levels of controllability. To address this problem, we propose a novel lightweight decoding framework named Air-Decoding. Its main idea is reconstructing the attribute distributions to balance the weights between attribute words and non-attribute words to generate more fluent text. Specifically, we train prefixes by prefix-tuning to obtain attribute distributions. Then we design a novel attribute distribution reconstruction method to balance the obtained distributions and use the reconstructed distributions to guide language models for generation, effectively avoiding the issue of Attribute Collapse. Experiments on multiple CTG tasks prove that our method achieves a new state-of-the-art control performance.
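For intuition, here is an illustrative sketch of attribute-weighted decoding in the spirit described above. It is not the paper's exact reconstruction step: `lm_logprobs`, `attr_logprobs`, and `anti_logprobs` are hypothetical next-token log-probabilities from the base LM and from attribute / anti-attribute prefix-tuned models, and Air-Decoding's rebalancing of the attribute distributions (the key to avoiding Attribute Collapse) is only noted in the docstring.

```python
import numpy as np

def attribute_weighted_dist(lm_logprobs, attr_logprobs, anti_logprobs, omega=1.0):
    """p(x_t) proportional to p_LM(x_t) * (p_attr(x_t) / p_anti(x_t)) ** omega, renormalized.

    Air-Decoding's contribution is to reconstruct (rebalance) the attribute
    distributions so that a large omega does not destroy fluency; that
    reconstruction step is omitted from this generic sketch.
    """
    scores = lm_logprobs + omega * (attr_logprobs - anti_logprobs)
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()
```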

  • paper_url: http://arxiv.org/abs/2310.14880
  • repo_url: https://github.com/christinakang/sirac
  • paper_authors: Xiaoxi Kang, Lizhen Qu, Lay-Ki Soon, Adnan Trakic, Terry Yue Zhuo, Patrick Charles Emerton, Genevieve Grant
  • for: To examine whether large language models (LLMs) can analyze legal cases the way lawyers do, and whether they can follow the IRAC method used by legal professionals.
  • methods: The authors construct a novel corpus of scenarios on Malaysian contract law and the Australian social act for dependent children, and apply ChatGPT to analyze it using the IRAC method, a framework widely used by legal professionals.
  • results: ChatGPT can follow the IRAC method, but its analyses do not fully align with those of legal professionals; the findings point to future research directions for improving this alignment.
    Abstract Large Language Models (LLMs), such as ChatGPT, have drawn a lot of attention recently in the legal domain due to their emergent ability to tackle a variety of legal tasks. However, it is still unknown if LLMs are able to analyze a legal case and perform reasoning in the same manner as lawyers. Therefore, we constructed a novel corpus consisting of scenarios pertaining to Contract Acts Malaysia and Australian Social Act for Dependent Child. ChatGPT is applied to perform analysis on the corpus using the IRAC method, which is a framework widely used by legal professionals for organizing legal analysis. Each scenario in the corpus is annotated with a complete IRAC analysis in a semi-structured format so that both machines and legal professionals are able to interpret and understand the annotations. In addition, we conducted the first empirical assessment of ChatGPT for IRAC analysis in order to understand how well it aligns with the analysis of legal professionals. Our experimental results shed light on possible future research directions to improve alignment between LLMs and legal experts in terms of legal reasoning.

We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields

  • paper_url: http://arxiv.org/abs/2310.14870
  • repo_url: https://github.com/jpwahle/emnlp23-citation-field-influence
  • paper_authors: Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad
  • for: To quantify the degree of mutual influence between NLP and 23 other fields of study, and to examine the state of NLP's cross-field engagement.
  • methods: A citation analysis of ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers.
  • results: NLP's cross-field engagement, measured by the proposed Citation Field Diversity Index (CFDI), fell from 0.58 in 1980 to 0.31 in 2022 (an all-time low). NLP has grown more insular, citing increasingly more NLP papers and producing fewer papers that bridge fields. NLP citations are dominated by computer science; less than 8% go to linguistics and less than 3% to mathematics and psychology. These findings underscore NLP's need to reflect on its engagement with other fields.
    Abstract Natural Language Processing (NLP) is poised to substantially influence the world. However, significant progress comes hand-in-hand with substantial risks. Addressing them requires broad engagement with various fields of study. Yet, little empirical work examines the state of such engagement (past or current). In this paper, we quantify the degree of influence between 23 fields of study and NLP (on each other). We analyzed ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers. We show that, unlike most fields, the cross-field engagement of NLP, measured by our proposed Citation Field Diversity Index (CFDI), has declined from 0.58 in 1980 to 0.31 in 2022 (an all-time low). In addition, we find that NLP has grown more insular -- citing increasingly more NLP papers and having fewer papers that act as bridges between fields. NLP citations are dominated by computer science; Less than 8% of NLP citations are to linguistics, and less than 3% are to math and psychology. These findings underscore NLP's urgent need to reflect on its engagement with various fields.
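The abstract does not give the CFDI formula, so below is one plausible instantiation of a citation field diversity score (normalized entropy over the fields of a paper's outgoing citations), shown only to make the idea concrete; the paper defines its own index.

```python
import math
from collections import Counter

def field_diversity(cited_fields):
    """cited_fields: list of field labels, one per outgoing citation of a paper.

    Returns 0 when all citations go to a single field and approaches 1 when
    citations are spread uniformly across fields (normalized Shannon entropy).
    """
    counts = Counter(cited_fields)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(counts)) if len(counts) > 1 else 1.0
    return entropy / max_entropy

# A heavily CS-skewed citation profile yields a low diversity score.
print(field_diversity(["CS"] * 92 + ["Linguistics"] * 5 + ["Math"] * 3))
```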

Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism

  • paper_url: http://arxiv.org/abs/2310.14868
  • repo_url: https://github.com/muyo8692/stepbystep-reasoning-vs-negation
  • paper_authors: Mengyu Ye, Tatsuki Kuribayashi, Jun Suzuki, Goro Kobayashi, Hiroaki Funayama
  • for: To probe the step-by-step reasoning ability of large language models (LLMs), focusing on lexical negation, a core linguistic phenomenon that is difficult to process.
  • methods: Several controlled settings (e.g., reasoning over fictional entities) are introduced to evaluate the logical reasoning abilities of modern LLMs under chain-of-thought prompting.
  • results: Dozens of modern LLMs are not robust to lexical negation (e.g., plausible -> implausible) when performing CoT-style reasoning, and each LLM family exhibits its own distinctive limitations.
    Abstract Large language models (LLMs) take advantage of step-by-step reasoning instructions, e.g., chain-of-thought (CoT) prompting. Building on this, their ability to perform CoT-style reasoning robustly is of interest from a probing perspective. In this study, we inspect the step-by-step reasoning ability of LLMs with a focus on negation, which is a core linguistic phenomenon that is difficult to process. In particular, we introduce several controlled settings (e.g., reasoning in case of fictional entities) to evaluate the logical reasoning abilities of the models. We observed that dozens of modern LLMs were not robust against lexical negation (e.g., plausible ->implausible) when performing CoT-style reasoning, and the results highlight unique limitations in each LLM family.

Paraphrase Types for Generation and Detection

  • paper_url: http://arxiv.org/abs/2310.14863
  • repo_url: https://github.com/jpwahle/emnlp23-paraphrase-types
  • paper_authors: Jan Philip Wahle, Bela Gipp, Terry Ruas
  • for: This paper aims to address the limitations of current paraphrase generation and detection approaches by introducing two new tasks that consider specific linguistic perturbations at particular text positions.
  • methods: The paper proposes two new tasks, Paraphrase Type Generation and Paraphrase Type Detection, which involve generating and identifying fine-grained paraphrase types.
  • results: The results suggest that while current techniques perform well in a binary classification scenario, they struggle with the inclusion of fine-grained paraphrase types. Models trained in generating and identifying paraphrase types show improvements in tasks without them, and scaling these models further improves their ability to understand paraphrase types.
    Abstract Current approaches in paraphrase generation and detection heavily rely on a single general similarity score, ignoring the intricate linguistic properties of language. This paper introduces two new tasks to address this shortcoming by considering paraphrase types - specific linguistic perturbations at particular text positions. We name these tasks Paraphrase Type Generation and Paraphrase Type Detection. Our results suggest that while current techniques perform well in a binary classification scenario, i.e., paraphrased or not, the inclusion of fine-grained paraphrase types poses a significant challenge. While most approaches are good at generating and detecting general semantic similar content, they fail to understand the intrinsic linguistic variables they manipulate. Models trained in generating and identifying paraphrase types also show improvements in tasks without them. In addition, scaling these models further improves their ability to understand paraphrase types. We believe paraphrase types can unlock a new paradigm for developing paraphrase models and solving tasks in the future.

3M-TRANSFORMER: A Multi-Stage Multi-Stream Multimodal Transformer for Embodied Turn-Taking Prediction

  • paper_url: http://arxiv.org/abs/2310.14859
  • repo_url: None
  • paper_authors: Mehdi Fatan, Emanuele Mincato, Dimitra Pintzou, Mariella Dimiccoli
  • for: Predicting turn-taking in multiparty conversations, which has many practical applications in human-computer/robot interaction but is challenging because of the complexity of human communication.
  • methods: A new multimodal transformer-based architecture for predicting turn-taking from embodied, synchronized multi-perspective data.
  • results: Experiments on the recently introduced EgoCom dataset show a substantial performance improvement of up to 14.01% on average over existing baselines.
    Abstract Predicting turn-taking in multiparty conversations has many practical applications in human-computer/robot interaction. However, the complexity of human communication makes it a challenging task. Recent advances have shown that synchronous multi-perspective egocentric data can significantly improve turn-taking prediction compared to asynchronous, single-perspective transcriptions. Building on this research, we propose a new multimodal transformer-based architecture for predicting turn-taking in embodied, synchronized multi-perspective data. Our experimental results on the recently introduced EgoCom dataset show a substantial performance improvement of up to 14.01% on average compared to existing baselines and alternative transformer-based approaches. The source code, and the pre-trained models of our 3T-Transformer will be available upon acceptance.

Adaptive Policy with Wait-$k$ Model for Simultaneous Translation

  • paper_url: http://arxiv.org/abs/2310.14853
  • repo_url: None
  • paper_authors: Libo Zhao, Kai Fan, Wei Luo, Jing Wu, Shushu Wang, Ziqian Zeng, Zhongqiang Huang
  • for: To achieve a better balance between translation quality and latency in simultaneous machine translation (SiMT).
  • methods: A more flexible approach that decouples the adaptive policy from the translation model: DaP, a divergence-based adaptive policy, makes read/write decisions for any translation model based on the potential divergence in translation distributions caused by future information, extending a frozen wait-k model with lightweight parameters while remaining memory- and computation-efficient.
  • results: Experiments across various benchmarks show an improved trade-off between translation accuracy and latency, outperforming strong baselines.
    Abstract Simultaneous machine translation (SiMT) requires a robust read/write policy in conjunction with a high-quality translation model. Traditional methods rely on either a fixed wait-$k$ policy coupled with a standalone wait-$k$ translation model, or an adaptive policy jointly trained with the translation model. In this study, we propose a more flexible approach by decoupling the adaptive policy model from the translation model. Our motivation stems from the observation that a standalone multi-path wait-$k$ model performs competitively with adaptive policies utilized in state-of-the-art SiMT approaches. Specifically, we introduce DaP, a divergence-based adaptive policy, that makes read/write decisions for any translation model based on the potential divergence in translation distributions resulting from future information. DaP extends a frozen wait-$k$ model with lightweight parameters, and is both memory and computation efficient. Experimental results across various benchmarks demonstrate that our approach offers an improved trade-off between translation accuracy and latency, outperforming strong baselines.
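As a toy illustration (not the paper's exact criterion), a divergence-based read/write rule on top of a frozen wait-k model can be sketched as follows: if revealing one more source token barely changed the next-target-token distribution, commit a target token (WRITE); otherwise read more source. `next_token_dist` is a hypothetical wrapper around the translation model.

```python
import numpy as np

def kl(p, q, eps=1e-9):
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

def decide(next_token_dist, src_prefix, tgt_prefix, threshold=0.1):
    """Toy divergence-triggered policy: compare distributions with and without
    the most recently read source token as a proxy for sensitivity to future input."""
    p_short = next_token_dist(src_prefix[:-1], tgt_prefix)
    p_full = next_token_dist(src_prefix, tgt_prefix)
    return "WRITE" if kl(p_full, p_short) < threshold else "READ"
```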

Universal Domain Adaptation for Robust Handling of Distributional Shifts in NLP

  • paper_url: http://arxiv.org/abs/2310.14849
  • repo_url: https://github.com/heyjoonkim/universal_domain_adaptation_for_nlp
  • paper_authors: Hyuhng Joon Kim, Hyunsoo Cho, Sang-Woo Lee, Junyeob Kim, Choonghyun Park, Sang-goo Lee, Kang Min Yoo, Taeuk Kim
  • for: To explore Universal Domain Adaptation (UniDA) for natural language input, i.e., achieving both adaptation ability and robustness (the ability to detect out-of-distribution samples).
  • methods: A comprehensive natural-language benchmark spanning multiple datasets with varying difficulty levels and characteristics, including temporal shifts and diverse domains, on which existing UniDA methods from computer vision and state-of-the-art domain adaptation techniques from the NLP literature are validated.
  • results: UniDA methods originally designed for image input transfer effectively to the natural language domain, and adaptation difficulty plays a key role in determining model performance.
    Abstract When deploying machine learning systems to the wild, it is highly desirable for them to effectively leverage prior knowledge to the unfamiliar domain while also firing alarms to anomalous inputs. In order to address these requirements, Universal Domain Adaptation (UniDA) has emerged as a novel research area in computer vision, focusing on achieving both adaptation ability and robustness (i.e., the ability to detect out-of-distribution samples). While UniDA has led significant progress in computer vision, its application on language input still needs to be explored despite its feasibility. In this paper, we propose a comprehensive benchmark for natural language that offers thorough viewpoints of the model's generalizability and robustness. Our benchmark encompasses multiple datasets with varying difficulty levels and characteristics, including temporal shifts and diverse domains. On top of our testbed, we validate existing UniDA methods from computer vision and state-of-the-art domain adaptation techniques from NLP literature, yielding valuable findings: We observe that UniDA methods originally designed for image input can be effectively transferred to the natural language domain while also underscoring the effect of adaptation difficulty in determining the model's performance.

Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution

  • paper_url: http://arxiv.org/abs/2310.14840
  • repo_url: https://github.com/clclab/pcfg-lm
  • paper_authors: Jaap Jumelet, Willem Zuidema
  • for: To provide a setup for training, evaluating, and interpreting language models on artificial, language-like data whose generative process is fully controlled.
  • methods: Data are generated with a massive probabilistic grammar (based on state-split PCFGs) derived from a large natural language corpus; full control over the generative process allows closed-form lower bounds on achievable perplexity to be computed for both causal and masked language modelling.
  • results: Neural language modelling architectures and training objectives differ markedly in how closely they approximate the lower bound on perplexity, and the setup allows learned representations to be compared directly with the symbolic rules of the underlying source.
    Abstract We present a setup for training, evaluating and interpreting neural language models, that uses artificial, language-like data. The data is generated using a massive probabilistic grammar (based on state-split PCFGs), that is itself derived from a large natural language corpus, but also provides us complete control over the generative process. We describe and release both grammar and corpus, and test for the naturalness of our generated data. This approach allows us to define closed-form expressions to efficiently compute exact lower bounds on obtainable perplexity using both causal and masked language modelling. Our results show striking differences between neural language modelling architectures and training objectives in how closely they allow approximating the lower bound on perplexity. Our approach also allows us to directly compare learned representations to symbolic rules in the underlying source. We experiment with various techniques for interpreting model behaviour and learning dynamics. With access to the underlying true source, our results show striking differences and outcomes in learning dynamics between different classes of words.
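A sketch of the closed-form idea for the causal case: with access to the true next-token distribution defined by the generating grammar, no model can achieve a cross-entropy below the conditional entropy of that distribution, so the exponential of the mean entropy is an exact lower bound on achievable perplexity. `true_next_token_dist` is a hypothetical oracle derived from the PCFG; how it is computed is not shown here.

```python
import math

def perplexity_lower_bound(corpus, true_next_token_dist):
    """corpus: iterable of token lists; true_next_token_dist(prefix) -> {token: prob}."""
    entropies = []
    for sentence in corpus:
        for t in range(len(sentence)):
            dist = true_next_token_dist(sentence[:t])
            entropies.append(-sum(p * math.log(p) for p in dist.values() if p > 0))
    # exp of the mean per-token entropy: a lower bound on any model's perplexity
    return math.exp(sum(entropies) / len(entropies))
```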

Characterizing how ‘distributional’ NLP corpora distance metrics are

  • paper_url: http://arxiv.org/abs/2310.14829
  • repo_url: https://github.com/ibm/text-corpus-distance-distributionality
  • paper_authors: Samuel Ackerman, George Kour, Eitan Farchi
  • for: To characterize how 'distributional' different distance metrics between two text corpora are.
  • methods: A Known-Similarity Corpora set is constructed from two paraphrase corpora, and the distance between paired corpora is computed; the shape of the distance trend as set-element separation increases quantifies the distributionality of a metric.
  • results: Average Hausdorff Distance and energy distance are proposed as representative examples of non-distributional and distributional corpus distance metrics, respectively, against which other metrics (e.g., Mauve, Frechet Inception distance) can be compared to evaluate how distributional they are.
    Abstract A corpus of vector-embedded text documents has some empirical distribution. Given two corpora, we want to calculate a single metric of distance (e.g., Mauve, Frechet Inception) between them. We describe an abstract quality, called `distributionality', of such metrics. A non-distributional metric tends to use very local measurements, or uses global measurements in a way that does not fully reflect the distributions' true distance. For example, if individual pairwise nearest-neighbor distances are low, it may judge the two corpora to have low distance, even if their two distributions are in fact far from each other. A more distributional metric will, in contrast, better capture the distributions' overall distance. We quantify this quality by constructing a Known-Similarity Corpora set from two paraphrase corpora and calculating the distance between paired corpora from it. The distances' trend shape as set element separation increases should quantify the distributionality of the metric. We propose that Average Hausdorff Distance and energy distance between corpora are representative examples of non-distributional and distributional distance metrics, to which other metrics can be compared, to evaluate how distributional they are.
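For reference, here are minimal implementations of the two metrics named above for two corpora represented as arrays of document embeddings (rows are documents); the energy distance is the standard (squared) statistic. This is generic machinery rather than the paper's code.

```python
import numpy as np
from scipy.spatial.distance import cdist

def average_hausdorff(X, Y):
    """Mean of the two directed average nearest-neighbor distances (non-distributional example)."""
    d = cdist(X, Y)  # pairwise Euclidean distances
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def energy_distance(X, Y):
    """2 E|X-Y| - E|X-X'| - E|Y-Y'| (distributional example)."""
    exy = cdist(X, Y).mean()
    exx = cdist(X, X).mean()
    eyy = cdist(Y, Y).mean()
    return 2 * exy - exx - eyy
```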

ALCUNA: Large Language Models Meet New Knowledge

  • paper_url: http://arxiv.org/abs/2310.14820
  • repo_url: https://github.com/arvid-pku/alcuna
  • paper_authors: Xunjian Yin, Baizhou Huang, Xiaojun Wan
  • for: This paper aims to address the lack of benchmarks for evaluating large-scale language models’ (LLMs) ability to handle new knowledge, an important aspect in the rapidly evolving world.
  • methods: The proposed approach, called KnowGen, generates new knowledge by altering existing entity attributes and relationships, resulting in artificial entities that are distinct from real-world entities. A new benchmark, ALCUNA, is introduced to assess LLMs’ abilities in knowledge understanding, differentiation, and association.
  • results: The authors benchmark several LLMs and find that their performance in face of new knowledge is not satisfactory, particularly in reasoning between new and internal knowledge. The impact of entity similarity on the model’s understanding of entity knowledge and the influence of contextual entities are also explored.
    Abstract With the rapid development of NLP, large-scale language models (LLMs) now excel in various tasks across multiple domains. However, existing benchmarks may not adequately measure these models' capabilities, especially when faced with new knowledge. In this paper, we address the lack of benchmarks to evaluate LLMs' ability to handle new knowledge, an important and challenging aspect in the rapidly evolving world. We propose an approach called KnowGen that generates new knowledge by altering existing entity attributes and relationships, resulting in artificial entities that are distinct from real-world entities. With KnowGen, we introduce a benchmark named ALCUNA to assess LLMs' abilities in knowledge understanding, differentiation, and association. We benchmark several LLMs and find that their performance when faced with new knowledge is not satisfactory, particularly in reasoning between new and internal knowledge. We also explore the impact of entity similarity on the model's understanding of entity knowledge and the influence of contextual entities. We appeal to the need for caution when using LLMs in new scenarios or with new knowledge, and hope that our benchmarks can help drive the development of LLMs in the face of new knowledge.

Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic

  • paper_url: http://arxiv.org/abs/2310.14819
  • repo_url: None
  • paper_authors: Sabri Boughorbel, Majd Hawasly
  • for: To evaluate the multi-turn instruction-following ability of large language models in Arabic.
  • methods: Using a customized Arabic translation of the MT-Bench benchmark suite, GPT-4 serves as a uniform evaluator for both English and Arabic queries, assessing and comparing model performance on various open-ended tasks.
  • results: Model responses vary across task categories (e.g., logic vs. literacy) depending on whether instructions are in English or Arabic; base models fine-tuned on multilingual, multi-turn data are competitive with models trained from scratch on multilingual data; and an ensemble of small, open LLMs is hypothesized to perform competitively with proprietary LLMs on the benchmark.
    Abstract While significant progress has been made in benchmarking Large Language Models (LLMs) across various tasks, there is a lack of comprehensive evaluation of their abilities in responding to multi-turn instructions in less-commonly tested languages like Arabic. Our paper offers a detailed examination of the proficiency of open LLMs in such scenarios in Arabic. Utilizing a customized Arabic translation of the MT-Bench benchmark suite, we employ GPT-4 as a uniform evaluator for both English and Arabic queries to assess and compare the performance of the LLMs on various open-ended tasks. Our findings reveal variations in model responses on different task categories, e.g., logic vs. literacy, when instructed in English or Arabic. We find that fine-tuned base models using multilingual and multi-turn datasets could be competitive to models trained from scratch on multilingual data. Finally, we hypothesize that an ensemble of small, open LLMs could perform competitively to proprietary LLMs on the benchmark.

DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple Experts Fine-tuning

  • paper_url: http://arxiv.org/abs/2310.15205
  • repo_url: https://github.com/fudandisc/disc-finllm
  • paper_authors: Wei Chen, Qiushi Wang, Zefei Long, Xianyin Zhang, Zhongtian Lu, Bingxuan Li, Siyuan Wang, Jiarong Xu, Xiang Bai, Xuanjing Huang, Zhongyu Wei
  • for: Building a Chinese financial large language model (DISC-FinLLM) that extends general LLMs with multi-turn question answering, domain text processing, mathematical computation, and retrieval-augmented generation abilities.
  • methods: A Multiple Experts Fine-tuning Framework, together with a financial instruction-tuning dataset, DISC-FIN-SFT, containing instruction samples of four categories (consulting, NLP tasks, computing, and retrieval-augmented generation).
  • results: Evaluations on multiple benchmarks show that the model outperforms baseline models in various financial scenarios; further resources are available at https://github.com/FudanDISC/DISC-FinLLM.
    Abstract We propose Multiple Experts Fine-tuning Framework to build a financial large language model (LLM), DISC-FinLLM. Our methodology improves general LLMs by endowing them with multi-turn question answering abilities, domain text processing capabilities, mathematical computation skills, and retrieval-enhanced generation capabilities. We build a financial instruction-tuning dataset named DISC-FIN-SFT, including instruction samples of four categories (consulting, NLP tasks, computing and retrieval-augmented generation). Evaluations conducted on multiple benchmarks demonstrate that our model performs better than baseline models in various financial scenarios. Further resources can be found at https://github.com/FudanDISC/DISC-FinLLM.

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

  • paper_url: http://arxiv.org/abs/2310.14806
  • repo_url: None
  • paper_authors: Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur
  • for: Addressing real-time streaming automatic speech recognition (ASR) and speech translation (ST) with a single model.
  • methods: A streaming Transformer-Transducer (T-T) model that jointly produces many-to-one and one-to-many transcription and translation with a single decoder, trained with a novel timestamp-based joint token-level serialized output method for the streaming setting.
  • results: Experiments on {it,es,de}->en demonstrate the effectiveness of the approach, enabling one-to-many joint outputs with a single decoder for the first time.
    Abstract The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. Traditional approaches to automatic speech recognition (ASR) and speech translation (ST) have often relied on separate systems, leading to inefficiencies in computational resources, and increased synchronization complexity in real time. In this paper, we propose a streaming Transformer-Transducer (T-T) model able to jointly produce many-to-one and one-to-many transcription and translation using a single decoder. We introduce a novel method for joint token-level serialized output training based on timestamp information to effectively produce ASR and ST outputs in the streaming setting. Experiments on {it,es,de}->en prove the effectiveness of our approach, enabling the generation of one-to-many joint outputs with a single decoder for the first time.

Cross-Modal Conceptualization in Bottleneck Models

  • paper_url: http://arxiv.org/abs/2310.14805
  • repo_url: https://github.com/danisalukaev/xcbs
  • paper_authors: Danis Alukaev, Semen Kiselev, Ilya Pershin, Bulat Ibragimov, Vladimir Ivanov, Alexey Kornaev, Ivan Titov
  • for: To guide concept induction in bottleneck models for medical image classification using the text descriptions (e.g., radiology reports) that accompany the images, instead of requiring concept annotations.
  • methods: A cross-modal approach that treats concepts as discrete latent variables and promotes concepts that are predictive of the label and can be predicted reliably from both the image and the text.
  • results: Experiments on synthetic and realistic medical imaging datasets show that cross-modal learning encourages interpretable concepts, facilitates disentanglement, and increases robustness by suppressing reliance on shortcut features.
    Abstract Concept Bottleneck Models (CBMs) assume that training examples (e.g., x-ray images) are annotated with high-level concepts (e.g., types of abnormalities), and perform classification by first predicting the concepts, followed by predicting the label relying on these concepts. The main difficulty in using CBMs comes from having to choose concepts that are predictive of the label and then having to label training examples with these concepts. In our approach, we adopt a more moderate assumption and instead use text descriptions (e.g., radiology reports), accompanying the images in training, to guide the induction of concepts. Our cross-modal approach treats concepts as discrete latent variables and promotes concepts that (1) are predictive of the label, and (2) can be predicted reliably from both the image and text. Through experiments conducted on datasets ranging from synthetic datasets (e.g., synthetic images with generated descriptions) to realistic medical imaging datasets, we demonstrate that cross-modal learning encourages the induction of interpretable concepts while also facilitating disentanglement. Our results also suggest that this guidance leads to increased robustness by suppressing the reliance on shortcut features.

Geographical Erasure in Language Generation

  • paper_url: http://arxiv.org/abs/2310.14777
  • repo_url: https://github.com/amazon-science/geographical-erasure-in-language-generation
  • paper_authors: Pola Schwöbel, Jacek Golebiowski, Michele Donini, Cédric Archambeau, Danish Pruthi
  • for: To study how large language models (LLMs), trained on data dominated by certain groups, underrepresent others, and how this imbalance propagates into generated language.
  • methods: The authors formalize and operationalize geographical erasure, analyze its correlation with country-mention frequencies in the training corpus, and mitigate it by fine-tuning with a custom objective.
  • results: Consistent instances of erasure, in which certain countries are underpredicted, appear across a range of LLMs; erasure correlates strongly with low country-mention frequencies in the training data; and fine-tuning with the custom objective mitigates it.
    Abstract Large language models (LLMs) encode vast amounts of world knowledge. However, since these models are trained on large swaths of internet data, they are at risk of inordinately capturing information about dominant groups. This imbalance can propagate into generated language. In this work, we study and operationalise a form of geographical erasure, wherein language models underpredict certain countries. We demonstrate consistent instances of erasure across a range of LLMs. We discover that erasure strongly correlates with low frequencies of country mentions in the training corpus. Lastly, we mitigate erasure by finetuning using a custom objective.
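A hedged sketch of how such erasure could be measured: compare the model's probability of naming each country in a fixed prompt against a reference distribution (for example, population shares). The prompt, the country list, and the `country_logprob` oracle are illustrative assumptions, not the paper's exact protocol.

```python
import math

def erasure_ratios(country_logprob, reference_shares, prompt="I live in"):
    """country_logprob(prompt, country) -> log p(country | prompt) under the LM.
    reference_shares: {country: share}, normalized to sum to 1."""
    model_probs = {c: math.exp(country_logprob(prompt, c)) for c in reference_shares}
    z = sum(model_probs.values())
    model_probs = {c: p / z for c, p in model_probs.items()}
    # Countries with ratio << 1 are underpredicted ("erased") by the model.
    return {c: model_probs[c] / reference_shares[c] for c in reference_shares}
```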

SuperTweetEval: A Challenging, Unified and Heterogeneous Benchmark for Social Media NLP Research

  • paper_url: http://arxiv.org/abs/2310.14757
  • repo_url: None
  • paper_authors: Dimosthenis Antypas, Asahi Ushio, Francesco Barbieri, Leonardo Neves, Kiamehr Rezaee, Luis Espinosa-Anke, Jiaxin Pei, Jose Camacho-Collados
  • for: Improving the evaluation and comparability of NLP models on social media.
  • methods: SuperTweetEval, a unified benchmark that combines, adapts, and constructs from scratch a heterogeneous set of tasks and datasets, on which a wide range of models is benchmarked.
  • results: Despite recent advances in language modelling, social media remains challenging and model performance leaves room for improvement.
    Abstract Despite its relevance, the maturity of NLP for social media pales in comparison with general-purpose models, metrics and benchmarks. This fragmented landscape makes it hard for the community to know, for instance, given a task, which is the best performing model and how it compares with others. To alleviate this issue, we introduce a unified benchmark for NLP evaluation in social media, SuperTweetEval, which includes a heterogeneous set of tasks and datasets combined, adapted and constructed from scratch. We benchmarked the performance of a wide range of models on SuperTweetEval and our results suggest that, despite the recent advances in language modelling, social media remains challenging.

MCC-KD: Multi-CoT Consistent Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2310.14747
  • repo_url: None
  • paper_authors: Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, Ji Zhang
  • for: Transferring the complex reasoning abilities of large language models (LLMs) to smaller models.
  • methods: Multi-CoT Consistent Knowledge Distillation (MCC-KD), which generates multiple rationales for each question and enforces consistency among the corresponding predictions by minimizing the bidirectional KL divergence between the answer distributions, improving both the diversity and consistency of rationales.
  • results: Experiments with different architectures (LLaMA/FlanT5) and model scales (3B/7B/11B/13B) on mathematical and commonsense reasoning benchmarks show that MCC-KD performs strongly on in-distribution datasets and generalizes robustly to out-of-distribution datasets.
    Abstract Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting. Recently, there has been a growing interest in transferring these reasoning abilities from LLMs to smaller models. However, achieving both the diversity and consistency in rationales presents a challenge. In this paper, we focus on enhancing these two aspects and propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill the reasoning capabilities. In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions by minimizing the bidirectional KL-divergence between the answer distributions. We investigate the effectiveness of MCC-KD with different model architectures (LLaMA/FlanT5) and various model scales (3B/7B/11B/13B) on both mathematical reasoning and commonsense reasoning benchmarks. The empirical results not only confirm MCC-KD's superior performance on in-distribution datasets but also highlight its robust generalization ability on out-of-distribution datasets.
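The consistency term can be sketched as follows: for one question, the student produces an answer distribution for each of several rationales, and the bidirectional KL divergence between pairs of those distributions is minimized. Tensor shapes and the all-pairs scheme are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def bidirectional_kl(logits_a, logits_b):
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="sum")  # KL(P || Q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="sum")  # KL(Q || P)
    return kl_pq + kl_qp

def consistency_loss(answer_logits):
    """answer_logits: tensor [num_rationales, vocab_size] for one question."""
    n = answer_logits.size(0)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    losses = [bidirectional_kl(answer_logits[i], answer_logits[j]) for i, j in pairs]
    return torch.stack(losses).mean()
```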

Once Upon a $\textit{Time}$ in $\textit{Graph}$: Relative-Time Pretraining for Complex Temporal Reasoning

  • paper_url: http://arxiv.org/abs/2310.14709
  • repo_url: https://github.com/damo-nlp-sg/rememo
  • paper_authors: Sen Yang, Xin Li, Lidong Bing, Wai Lam
  • for: Improving pre-trained language models' ability to understand and reason over the temporal contexts of text.
  • methods: Exploiting the underlying nature of time, in which all temporally-scoped sentences are strung together along a one-dimensional time axis, the authors build a graph structure from the relative placements of events along this axis and propose RemeMo (Relative Time Modeling), which explicitly connects temporally-scoped facts by modeling the time relation between any two sentences.
  • results: RemeMo outperforms the baseline T5 on multiple temporal question answering datasets under various settings, and is especially good at modeling long-range, complex temporal dependencies. Code and pre-trained checkpoints are released at https://github.com/DAMO-NLP-SG/RemeMo.
    Abstract Our physical world is constantly evolving over time, rendering challenges for pre-trained language models to understand and reason over the temporal contexts of texts. Existing work focuses on strengthening the direct association between a piece of text and its time-stamp. However, the knowledge-time association is usually insufficient for the downstream tasks that require reasoning over temporal dependencies between knowledge. In this work, we make use of the underlying nature of time, all temporally-scoped sentences are strung together through a one-dimensional time axis, and suggest creating a graph structure based on the relative placements of events along the time axis. Inspired by the graph view, we propose RemeMo ($\underline{Re}$lative Ti$\underline{me}$ $\underline{Mo}$deling), which explicitly connects all temporally-scoped facts by modeling the time relations between any two sentences. Experimental results show that RemeMo outperforms the baseline T5 on multiple temporal question answering datasets under various settings. Further analysis suggests that RemeMo is especially good at modeling long-range complex temporal dependencies. We release our code and pre-trained checkpoints at $\href{https://github.com/DAMO-NLP-SG/RemeMo}{\text{this url}$.

Strong and Efficient Baselines for Open Domain Conversational Question Answering

  • paper_url: http://arxiv.org/abs/2310.14708
  • repo_url: None
  • paper_authors: Andrei C. Coman, Gianni Barlacchi, Adrià de Gispert
  • for: Re-evaluating the state-of-the-art dense passage retrieval (DPR) retriever and Fusion-in-Decoder (FiD) reader pipeline for open-domain conversational question answering (ODConvQA), and proposing stronger, more efficient baselines.
  • methods: A fast reranking component is introduced between the retriever and the reader, together with targeted fine-tuning steps.
  • results: Experiments on two ODConvQA tasks, TopiOCQA and OR-QuAC, show improvements over the state of the art while reducing the reader's latency by 60%, and provide valuable insights for future, more intricate approaches, including those that leverage large language models (LLMs).
    Abstract Unlike the Open Domain Question Answering (ODQA) setting, the conversational (ODConvQA) domain has received limited attention when it comes to reevaluating baselines for both efficiency and effectiveness. In this paper, we study the State-of-the-Art (SotA) Dense Passage Retrieval (DPR) retriever and Fusion-in-Decoder (FiD) reader pipeline, and show that it significantly underperforms when applied to ODConvQA tasks due to various limitations. We then propose and evaluate strong yet simple and efficient baselines, by introducing a fast reranking component between the retriever and the reader, and by performing targeted finetuning steps. Experiments on two ODConvQA tasks, namely TopiOCQA and OR-QuAC, show that our method improves the SotA results, while reducing reader's latency by 60%. Finally, we provide new and valuable insights into the development of challenging baselines that serve as a reference for future, more intricate approaches, including those that leverage Large Language Models (LLMs).

The continued usefulness of vocabulary tests for evaluating large language models

  • paper_url: http://arxiv.org/abs/2310.14703
  • repo_url: https://github.com/wordsgpt/llm_vocabulary_evaluation
  • paper_authors: Gonzalo Martínez, Javier Conde, Elena Merino-Gómez, Beatriz Bermúdez-Margaretto, José Alberto Hernández, Pedro Reviriego, Marc Brysbaert
  • for: Testing the quality of contemporary language models with vocabulary tests.
  • methods: The Test of English as a Foreign Language (TOEFL) vocabulary test proposed by Landauer and Dumain (1997), in which target words are matched against four alternatives, plus a Yes/No test that requires distinguishing existing words from made-up nonwords; both tests are also generalized to Spanish.
  • results: None of the contemporary major language models is perfect on the TOEFL items, and they err on divergent items; performance is significantly worse on nonword items, in line with other observations that current models provide non-existent information. In Spanish, most models give meanings/translations for the majority of random letter sequences, although the best models begin to perform quite well and also point to nonwords that were unknown to test participants but can be found in dictionaries.
    Abstract In their seminal article on semantic vectors, Landauer and Dumain (1997) proposed testing the quality of AI language models with a challenging vocabulary test. We show that their Test of English as a Foreign Language (TOEFL) test remains informative for contemporary major language models, since none of the models was perfect and made errors on divergent items. The TOEFL test consists of target words with four alternatives to choose from. We further tested the models on a Yes/No test that requires distinguishing between existing words and made-up nonwords. The models performed significantly worse on the nonword items, in line with other observations that current major language models provide non-existent information. The situation was worse when we generalized the tests to Spanish. Here, most models gave meanings/translations for the majority of random letter sequences. On the plus side, the best models began to perform quite well, and they also pointed to nonwords that were unknown to the test participants but can be found in dictionaries.

Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models

  • paper_url: http://arxiv.org/abs/2310.14696
  • repo_url: https://github.com/gankim/tree-of-clarifications
  • paper_authors: Gangwoo Kim, Sungdong Kim, Byeongguk Jeon, Joonsuk Park, Jaewoo Kang
  • for: Handling ambiguous questions in open-domain question answering, where a question admits multiple interpretations.
  • methods: Tree of Clarifications (ToC), a framework that recursively constructs a tree of disambiguations for the ambiguous question via few-shot prompting with external knowledge, and uses it to generate a long-form answer.
  • results: ToC outperforms existing baselines on ASQA in a few-shot setup across the metrics, and surpasses fully-supervised baselines trained on the whole training set in terms of Disambig-F1 and Disambig-ROUGE.
    Abstract Questions in open-domain question answering are often ambiguous, allowing multiple interpretations. One approach to handling them is to identify all possible interpretations of the ambiguous question (AQ) and to generate a long-form answer addressing them all, as suggested by Stelmakh et al., (2022). While it provides a comprehensive response without bothering the user for clarification, considering multiple dimensions of ambiguity and gathering corresponding knowledge remains a challenge. To cope with the challenge, we propose a novel framework, Tree of Clarifications (ToC): It recursively constructs a tree of disambiguations for the AQ -- via few-shot prompting leveraging external knowledge -- and uses it to generate a long-form answer. ToC outperforms existing baselines on ASQA in a few-shot setup across the metrics, while surpassing fully-supervised baselines trained on the whole training set in terms of Disambig-F1 and Disambig-ROUGE. Code is available at https://github.com/gankim/tree-of-clarifications.
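A schematic sketch of the recursive procedure described above. `llm` is a hypothetical few-shot-prompted model call and `retrieve` a hypothetical passage retriever; the prompts are paraphrased placeholders, not the paper's.

```python
def tree_of_clarifications(question, llm, retrieve, depth=0, max_depth=2):
    """Recursively expand an ambiguous question into a tree of disambiguations."""
    node = {"question": question, "children": []}
    if depth < max_depth:
        passages = retrieve(question)
        prompt = (
            "List distinct interpretations of the ambiguous question, given:\n"
            f"{passages}\nQ: {question}"
        )
        for line in llm(prompt).splitlines():
            interpretation = line.strip()
            if interpretation and interpretation != question:
                node["children"].append(
                    tree_of_clarifications(interpretation, llm, retrieve, depth + 1, max_depth)
                )
    return node

def long_form_answer(tree, llm):
    """Collect every disambiguated question in the tree and answer them jointly."""
    interpretations, stack = [], [tree]
    while stack:
        node = stack.pop()
        interpretations.append(node["question"])
        stack.extend(node["children"])
    return llm("Answer all of these interpretations in one response:\n" + "\n".join(interpretations))
```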

SpEL: Structured Prediction for Entity Linking

  • paper_url: http://arxiv.org/abs/2310.14684
  • repo_url: https://github.com/shavarani/spel
  • paper_authors: Hassan S. Shavarani, Anoop Sarkar
  • for: Entity linking, i.e., creating structured data by linking spans of text to an ontology or knowledge source.
  • methods: A structured prediction approach that classifies each input token as an entity and aggregates the token-level predictions; the system, SpEL (Structured prediction for Entity Linking), adds two refined fine-tuning steps, a context-sensitive prediction aggregation strategy, a reduced output vocabulary, and a fix for the common training vs. inference tokenization mismatch.
  • results: SpEL outperforms the state of the art on the commonly used AIDA benchmark for entity linking to Wikipedia, while being very compute-efficient in both parameter count and inference speed.
    Abstract Entity linking is a prominent thread of research focused on structured data creation by linking spans of text to an ontology or knowledge source. We revisit the use of structured prediction for entity linking which classifies each individual input token as an entity, and aggregates the token predictions. Our system, called SpEL (Structured prediction for Entity Linking) is a state-of-the-art entity linking system that uses some new ideas to apply structured prediction to the task of entity linking including: two refined fine-tuning steps; a context sensitive prediction aggregation strategy; reduction of the size of the model's output vocabulary, and; we address a common problem in entity-linking systems where there is a training vs. inference tokenization mismatch. Our experiments show that we can outperform the state-of-the-art on the commonly used AIDA benchmark dataset for entity linking to Wikipedia. Our method is also very compute efficient in terms of number of parameters and speed of inference.
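A sketch of the aggregation step: each subword token receives an entity label (or "O"), and consecutive tokens with the same label are merged into one linked span. SpEL's actual aggregation is context sensitive; this is the plain merge, for illustration only.

```python
def aggregate_token_predictions(labels):
    """labels: per-token entity ids ("O" for no entity). Returns (start, end, entity) spans."""
    spans, start = [], None
    for i, label in enumerate(labels + ["O"]):  # sentinel flushes the final span
        if start is not None and label != labels[start]:
            spans.append((start, i, labels[start]))
            start = None
        if start is None and label != "O" and i < len(labels):
            start = i
    return spans

# -> [(0, 2, 'Michael_Jordan'), (4, 5, 'Chicago')]
print(aggregate_token_predictions(
    ["Michael_Jordan", "Michael_Jordan", "O", "O", "Chicago"]))
```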

Pre-Trained Language Models Augmented with Synthetic Scanpaths for Natural Language Understanding

  • paper_url: http://arxiv.org/abs/2310.14676
  • repo_url: None
  • paper_authors: Shuwen Deng, Paul Prasse, David R. Reich, Tobias Scheffer, Lena A. Jäger
  • for: Augmenting NLP models with human-like gaze data to improve language understanding without requiring real human gaze recordings.
  • methods: A model that integrates synthetic scanpath generation with a scanpath-augmented language model, so that the scanpath generator can be fine-tuned end-to-end for downstream tasks.
  • results: The proposed model not only outperforms the underlying language model but matches the performance of a language model augmented with real human gaze data.
    Abstract Human gaze data offer cognitive information that reflects natural language comprehension. Indeed, augmenting language models with human scanpaths has proven beneficial for a range of NLP tasks, including language understanding. However, the applicability of this approach is hampered because the abundance of text corpora is contrasted by a scarcity of gaze data. Although models for the generation of human-like scanpaths during reading have been developed, the potential of synthetic gaze data across NLP tasks remains largely unexplored. We develop a model that integrates synthetic scanpath generation with a scanpath-augmented language model, eliminating the need for human gaze data. Since the model's error gradient can be propagated throughout all parts of the model, the scanpath generator can be fine-tuned to downstream tasks. We find that the proposed model not only outperforms the underlying language model, but achieves a performance that is comparable to a language model augmented with real human gaze data. Our code is publicly available.

DPP-TTS: Diversifying prosodic features of speech via determinantal point processes

  • paper_url: http://arxiv.org/abs/2310.14663
  • repo_url: None
  • paper_authors: Seongho Joo, Hyukhun Koh, Kyomin Jung
  • for: A text-to-speech (TTS) model based on Determinantal Point Processes (DPPs) that generates speech with more diverse prosody.
  • methods: A prosody-diversifying module that generates speech samples while simultaneously considering perceptual diversity within each sample and among multiple samples.
  • results: In side-by-side comparisons, DPP-TTS generates speech samples with more diversified prosody than baselines while preserving naturalness.
    Abstract With the rapid advancement in deep generative models, recent neural Text-To-Speech(TTS) models have succeeded in synthesizing human-like speech. There have been some efforts to generate speech with various prosody beyond monotonous prosody patterns. However, previous works have several limitations. First, typical TTS models depend on the scaled sampling temperature for boosting the diversity of prosody. Speech samples generated at high sampling temperatures often lack perceptual prosodic diversity, which can adversely affect the naturalness of the speech. Second, the diversity among samples is neglected since the sampling procedure often focuses on a single speech sample rather than multiple ones. In this paper, we propose DPP-TTS: a text-to-speech model based on Determinantal Point Processes (DPPs) with a prosody diversifying module. Our TTS model is capable of generating speech samples that simultaneously consider perceptual diversity in each sample and among multiple samples. We demonstrate that DPP-TTS generates speech samples with more diversified prosody than baselines in the side-by-side comparison test considering the naturalness of speech at the same time.
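For intuition, here is an illustrative sketch of DPP-style diverse selection over candidate prosody feature vectors (e.g., pitch/energy/duration contours): the L-kernel combines a per-candidate quality score with pairwise similarity, and greedy MAP selection trades quality against redundancy. This is generic DPP machinery, not the paper's specific module.

```python
import numpy as np

def dpp_greedy_select(features, quality, k):
    """features: [n, d] candidate prosody vectors; quality: [n] scores; returns k indices."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    sim = (features @ features.T) / (norms @ norms.T)   # cosine similarity
    L = np.outer(quality, quality) * sim                # DPP L-kernel
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(len(features)):
            if i in selected:
                continue
            idx = selected + [i]
            # log-det of the selected submatrix rewards quality and penalizes redundancy
            gain = np.linalg.slogdet(L[np.ix_(idx, idx)] + 1e-6 * np.eye(len(idx)))[1]
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected
```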

SPRING-INX: A Multilingual Indian Language Speech Corpus by SPRING Lab, IIT Madras

  • paper_url: http://arxiv.org/abs/2310.14654
  • repo_url: None
  • paper_authors: Nithya R, Malavika S, Jordan F, Arjun Gangwar, Metilda N J, S Umesh, Rithik Sarab, Akhilesh Kumar Dubey, Govind Divakaran, Samudra Vijaya K, Suryakanth V Gangashetty
  • for: Encouraging the language technology community to build speech-based applications for Indian languages.
  • methods: Open-sourcing SPRING-INX, about 2000 hours of legally sourced and manually transcribed speech data for ASR system building in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi and Tamil.
  • results: The paper describes the data collection and data cleaning process along with the data statistics.
    Abstract India is home to a multitude of languages of which 22 languages are recognised by the Indian Constitution as official. Building speech based applications for the Indian population is a difficult problem owing to limited data and the number of languages and accents to accommodate. To encourage the language technology community to build speech based applications in Indian languages, we are open sourcing SPRING-INX data which has about 2000 hours of legally sourced and manually transcribed speech data for ASR system building in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi and Tamil. This endeavor is by SPRING Lab , Indian Institute of Technology Madras and is a part of National Language Translation Mission (NLTM), funded by the Indian Ministry of Electronics and Information Technology (MeitY), Government of India. We describe the data collection and data cleaning process along with the data statistics in this paper.
    摘要 印度是一个多语言国家,其中22种语言被印度宪法认可为官方语言。由于数据有限、需要覆盖的语言和口音众多,为印度人口构建基于语音的应用程序是一个困难的问题。为促进语言技术社区为印度语言构建语音应用,我们开源了SPRING-INX数据,其中包含约2000小时合法获取并人工转写的语音数据,可用于阿萨姆语、孟加拉语、古吉拉特语、印地语、卡纳达语、马拉雅拉姆语、马拉地语、奥里亚语、旁遮普语和泰米尔语的ASR系统构建。该工作由印度理工学院马德拉斯分校SPRING实验室完成,是国家语言翻译使命(NLTM)的一部分,由印度政府电子与信息技术部(MeitY)资助。本文介绍了数据收集与数据清洗流程以及数据统计。

Multilingual k-Nearest-Neighbor Machine Translation

  • paper_url: http://arxiv.org/abs/2310.14644
  • repo_url: None
  • paper_authors: David Stap, Christof Monz
  • for: 提高机器翻译质量,特别是低资源语言对翻译质量的限制。
  • methods: 将多种语言的表示合并到同一个数据存储(datastore)中。
  • results: 低资源翻译质量显著提升(最高 +3.6 BLEU),高资源翻译质量也有小幅提升(最高 +0.5 BLEU);利用语言间相似性构建数据存储,可将其缩小到原来的四分之一,并获得 5.3 倍的速度提升。
    Abstract k-nearest-neighbor machine translation has demonstrated remarkable improvements in machine translation quality by creating a datastore of cached examples. However, these improvements have been limited to high-resource language pairs, with large datastores, and remain a challenge for low-resource languages. In this paper, we address this issue by combining representations from multiple languages into a single datastore. Our results consistently demonstrate substantial improvements not only in low-resource translation quality (up to +3.6 BLEU), but also for high-resource translation quality (up to +0.5 BLEU). Our experiments show that it is possible to create multilingual datastores that are a quarter of the size, achieving a 5.3x speed improvement, by using linguistic similarities for datastore creation.
    摘要 k近邻机器翻译通过构建缓存样例的数据存储,显著提升了机器翻译质量。然而,这些提升主要局限于拥有大规模数据存储的高资源语言对,对低资源语言而言仍是一个挑战。本文将多种语言的表示合并到同一个数据存储中来解决这一问题。实验结果一致表明,该方法不仅显著提升了低资源翻译质量(最高 +3.6 BLEU),也提升了高资源翻译质量(最高 +0.5 BLEU)。实验还表明,利用语言间的相似性构建多语言数据存储,可以将数据存储缩小到原来的四分之一,并获得 5.3 倍的速度提升。
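
As a rough illustration of the kNN-MT mechanism that a pooled multilingual datastore plugs into, the sketch below interpolates a nearest-neighbor distribution (built from key/value pairs) with the base model's distribution. The brute-force search, temperature, and interpolation weight are illustrative assumptions; a real system would use an approximate-nearest-neighbor index.

```python
import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=4, temperature=10.0):
    """Turn the k nearest datastore entries into a distribution over the vocabulary."""
    d2 = np.sum((keys - query) ** 2, axis=1)          # squared L2 distances
    nn = np.argsort(d2)[:k]
    weights = np.exp(-d2[nn] / temperature)
    weights /= weights.sum()
    p = np.zeros(vocab_size)
    for w, tok in zip(weights, values[nn]):
        p[tok] += w
    return p

def interpolate(p_model, p_knn, lam=0.5):
    return lam * p_knn + (1.0 - lam) * p_model

# Toy usage with a pooled "multilingual" datastore of 100 entries.
rng = np.random.default_rng(0)
vocab, dim = 50, 16
keys = rng.normal(size=(100, dim))                    # decoder hidden states (keys)
values = rng.integers(0, vocab, size=100)             # next-token ids (values)
query = rng.normal(size=dim)
p_model = rng.dirichlet(np.ones(vocab))
p_final = interpolate(p_model, knn_distribution(query, keys, values, vocab))
print(p_final.argmax(), round(p_final.sum(), 6))
```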

Extending Input Contexts of Language Models through Training on Segmented Sequences

  • paper_url: http://arxiv.org/abs/2310.14633
  • repo_url: None
  • paper_authors: Petros Karypis, Julian McAuley, George Karypis
  • for: 提高语言模型对长输入的训练效果
  • methods: 使用分割序列和 interpolate-based 方法扩展绝对位置嵌入
  • results: 可以在不改变模型结构、不增加内存成本的情况下扩展输入上下文长度,并改善模型在长输入上的表现
    Abstract Effectively training language models on long inputs poses many technical challenges. As a cost consideration, languages models are pretrained on a fixed sequence length before being adapted to longer sequences. We explore various methods for adapting models to longer inputs by training on segmented sequences and an interpolation-based method for extending absolute positional embeddings. We develop a training procedure to extend the input context size of pretrained models with no architectural changes and no additional memory costs than training on the original input lengths. By sub-sampling segments from long inputs while maintaining their original position the model is able to learn new positional interactions. Our method benefits both models trained with absolute positional embeddings, by extending their input contexts, as well as popular relative positional embedding methods showing a reduced perplexity on sequences longer than they were trained on. We demonstrate our method can extend input contexts by a factor of 4x while improving perplexity.
    摘要 在长输入上有效地训练语言模型面临许多技术挑战。出于成本考虑,语言模型通常先在固定序列长度上预训练,然后再适配到更长的序列。我们探索了多种将模型适配到更长输入的方法,包括在分段序列上训练,以及一种基于插值的绝对位置嵌入扩展方法。我们设计了一种训练流程,可以在不改变模型结构、也不比按原始输入长度训练增加额外内存开销的情况下,扩展预训练模型的输入上下文长度。通过从长输入中子采样若干片段并保留其原始位置,模型能够学习新的位置交互。该方法既能扩展使用绝对位置嵌入的模型的输入上下文,也能降低流行的相对位置嵌入方法在超过训练长度的序列上的困惑度。实验表明,该方法可以在改善困惑度的同时,将输入上下文扩展到原来的 4 倍。
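
A minimal sketch of the segment sub-sampling idea described above: segments are sampled from a long document but keep their original position ids, so the model sees position interactions beyond the pretraining length. Segment count and length are illustrative assumptions.

```python
import random

def sample_segments(tokens, num_segments=4, segment_len=128, seed=0):
    """Sample a few segments from a long input, keeping each token's original position id."""
    rng = random.Random(seed)
    max_start = len(tokens) - segment_len
    starts = sorted(rng.sample(range(max_start), num_segments))
    pieces, positions = [], []
    for s in starts:
        pieces.extend(tokens[s:s + segment_len])
        positions.extend(range(s, s + segment_len))   # original positions, not 0..L-1
    return pieces, positions

# Toy usage: a 4096-token "document" reduced to 4 x 128 tokens for one training example.
doc = list(range(4096))
inp, pos = sample_segments(doc)
print(len(inp), pos[0], pos[-1])   # 512 tokens; position ids can exceed the trained length
```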

Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts

  • paper_url: http://arxiv.org/abs/2310.14628
  • repo_url: https://github.com/tengxiaoliu/xot
  • paper_authors: Tengxiao Liu, Qipeng Guo, Yuqing Yang, Xiangkun Hu, Yue Zhang, Xipeng Qiu, Zheng Zhang
  • for: 提出一个面向数学推理任务的一体化问题求解框架,以充分发挥不同提示方法的互补优势。
  • methods: 该方法融合链式思维、程序思维等多种推理方式,通过迭代执行和动态方法切换来求解问题。
  • results: 在 10 个常用数学推理数据集上的广泛实验验证了该方法的有效性,并对各模块的作用进行了分析;结果还表明该框架与近期针对单一推理方法的改进工作互为正交,并可进一步推广到逻辑推理领域。
    Abstract As large language models (LLMs) have shown effectiveness with different prompting methods, such as Chain of Thought, Program of Thought, we find that these methods have formed a great complementarity to each other on math reasoning tasks. In this work, we propose XoT, an integrated problem solving framework by prompting LLMs with diverse reasoning thoughts. For each question, XoT always begins with selecting the most suitable method then executes each method iteratively. Within each iteration, XoT actively checks the validity of the generated answer and incorporates the feedback from external executors, allowing it to dynamically switch among different prompting methods. Through extensive experiments on 10 popular math reasoning datasets, we demonstrate the effectiveness of our proposed approach and thoroughly analyze the strengths of each module. Moreover, empirical results suggest that our framework is orthogonal to recent work that makes improvements on single reasoning methods and can further generalise to logical reasoning domain. By allowing method switching, XoT provides a fresh perspective on the collaborative integration of diverse reasoning thoughts in a unified framework.
    摘要 鉴于大语言模型(LLM)在链式思维、程序思维等不同提示方法下均表现出有效性,我们发现这些方法在数学推理任务上形成了良好的互补关系。在这项工作中,我们提出XoT,一个通过多种推理思路提示LLM的一体化问题求解框架。对于每个问题,XoT首先选择最合适的方法,然后迭代地执行各个方法;在每次迭代中,XoT会主动检查生成答案的有效性,并结合外部执行器的反馈,在不同提示方法之间动态切换。通过在10个流行的数学推理数据集上进行广泛实验,我们验证了所提方法的有效性,并对各个模块的优势进行了深入分析。此外,实验结果表明,XoT与近期针对单一推理方法的改进工作互为正交,并可进一步推广到逻辑推理领域。通过允许方法切换,XoT为在统一框架中协同集成多种推理思路提供了新的视角。
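
The following sketch illustrates a plan-verify-switch loop in the spirit of XoT. The `ask_llm` and `run_python` helpers are placeholders for a real LLM API and a sandboxed code executor, and the prompts, method names, and verification rule are assumptions rather than the paper's exact design.

```python
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def run_python(program: str) -> str:
    raise NotImplementedError("plug in a sandboxed executor here")

METHODS = ["chain_of_thought", "program_of_thought", "equation_of_thought"]

def solve(question: str, max_iters: int = 3) -> str:
    # Plan: let the model pick the most promising prompting method first.
    plan = ask_llm(f"Pick one of {METHODS} for this problem:\n{question}").strip()
    order = [plan] + [m for m in METHODS if m != plan] if plan in METHODS else METHODS
    answer = ""
    for method in order[:max_iters]:
        # Solve with the current prompting method.
        answer = ask_llm(f"Solve the problem using {method}:\n{question}")
        # Verify: execute generated programs, or ask for a self-check otherwise.
        feedback = run_python(answer) if method == "program_of_thought" \
            else ask_llm(f"Check this solution for errors:\n{answer}")
        if "error" not in feedback.lower():
            return answer          # passed verification
        # Otherwise switch to the next prompting method and retry.
    return answer
```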

CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification

  • paper_url: http://arxiv.org/abs/2310.14627
  • repo_url: https://github.com/HenryPengZou/DeCrisisMB
  • paper_authors: Henry Peng Zou, Yue Zhou, Cornelia Caragea, Doina Caragea
  • for: 这个论文旨在提高自然灾害事件监测中的效果,使用少量标注数据和大量无标注数据进行分类。
  • methods: 该模型使用了 semi-supervised 和 few-shot 学习方法,只需要少量的标注数据和大量的无标注数据来实现精细化分类。
  • results: 模型在两个自然灾害数据集上的平均提高率为 11.2%,并且通过对数据量的变化和域外结果进行分析。
    Abstract The shared real-time information about natural disasters on social media platforms like Twitter and Facebook plays a critical role in informing volunteers, emergency managers, and response organizations. However, supervised learning models for monitoring disaster events require large amounts of annotated data, making them unrealistic for real-time use in disaster events. To address this challenge, we present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting where only a small number of annotated data is required. Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data, mimicking the early stage of a disaster. Through integrating effective semi-supervised learning ideas and incorporating TextMixUp, CrisisMatch achieves performance improvement on two disaster datasets of 11.2\% on average. Further analyses are also provided for the influence of the number of labeled data and out-of-domain results.
    摘要 共享的实时信息在社交媒体平台上,如推特和facebook,在援助者、紧急管理人员和应急响应组织中扮演了关键的角色。然而,监督学习模型用于监测自然灾害事件需要大量注释数据,使其在紧急事件中不实际。为解决这个挑战,我们提出了一种细化的自然灾害微博分类模型,基于半监督、少量学习设置。我们的模型,危机匹配,能够使用少量注释数据和大量无注释数据来分类微博,模拟紧急事件的早期阶段。通过 integrate 有效的半监督学习想法和 TextMixUp,危机匹配实现了平均11.2%的性能提升在两个自然灾害数据集上。此外,我们还提供了数据量的影响和 OUT-OF-DOMAIN 结果的分析。

Conversational Recommender System and Large Language Model Are Made for Each Other in E-commerce Pre-sales Dialogue

  • paper_url: http://arxiv.org/abs/2310.14626
  • repo_url: None
  • paper_authors: Yuanxing Liu, Wei-Nan Zhang, Yifan Chen, Yuchi Zhang, Haopeng Bai, Fan Feng, Hengbin Cui, Yongbin Li, Wanxiang Che
  • for: 这篇论文主要目的是探讨在电商预售对话中使用语言模型和会话推荐系统(CRS)的合作方式,以提高推荐的准确性和有用性。
  • methods: 这篇论文使用了两种协作方法:一种是使用CRS协助语言模型(LLM),另一种是使用LLM协助CRS。这两种协作方法在四个电商预售对话任务中进行了广泛的实验。
  • results: 研究发现,在某些情况下,CRS和LLM的协作可以非常有效。
    Abstract E-commerce pre-sales dialogue aims to understand and elicit user needs and preferences for the items they are seeking so as to provide appropriate recommendations. Conversational recommender systems (CRSs) learn user representation and provide accurate recommendations based on dialogue context, but rely on external knowledge. Large language models (LLMs) generate responses that mimic pre-sales dialogues after fine-tuning, but lack domain-specific knowledge for accurate recommendations. Intuitively, the strengths of LLM and CRS in E-commerce pre-sales dialogues are complementary, yet no previous work has explored this. This paper investigates the effectiveness of combining LLM and CRS in E-commerce pre-sales dialogues, proposing two collaboration methods: CRS assisting LLM and LLM assisting CRS. We conduct extensive experiments on a real-world dataset of Ecommerce pre-sales dialogues. We analyze the impact of two collaborative approaches with two CRSs and two LLMs on four tasks of Ecommerce pre-sales dialogue. We find that collaborations between CRS and LLM can be very effective in some cases.
    摘要 电商预售对话的目标是理解和提取用户需求和喜好,以提供相应的建议。对话推荐系统(CRS)学习用户表示,并基于对话上下文提供准确的建议,但是需要外部知识。大语言模型(LLM)通过精度地优化,生成类似预售对话的回答,但是缺乏特定领域知识。我们认为LLM和CRS在电商预售对话中的优势是补偿的,但是没有前期研究这一点。这篇论文探讨了将LLM和CRS在电商预售对话中合作的效果,并提出了两种合作方法:CRS帮助LLM和LLM帮助CRS。我们在一个真实的电商预售对话数据集上进行了广泛的实验。我们分析了在四个电商预售对话任务上的两种合作方法的影响。我们发现在某些情况下,LLM和CRS之间的合作可以非常有效。

CoF-CoT: Enhancing Large Language Models with Coarse-to-Fine Chain-of-Thought Prompting for Multi-domain NLU Tasks

  • paper_url: http://arxiv.org/abs/2310.14623
  • repo_url: None
  • paper_authors: Hoang H. Nguyen, Ye Liu, Chenwei Zhang, Tao Zhang, Philip S. Yu
  • for: This paper aims to improve the performance of large language models (LLMs) in natural language understanding (NLU) tasks by proposing a Coarse-to-Fine Chain-of-Thought (CoF-CoT) approach that breaks down NLU tasks into multiple reasoning steps.
  • methods: The proposed approach uses semantic-based Abstract Meaning Representation (AMR) structured knowledge as an intermediate step to capture the nuances and diverse structures of utterances, and to understand connections between their varying levels of granularity.
  • results: The proposed approach is demonstrated effective in assisting LLMs adapt to multi-grained NLU tasks under both zero-shot and few-shot multi-domain settings.
    Abstract While Chain-of-Thought prompting is popular in reasoning tasks, its application to Large Language Models (LLMs) in Natural Language Understanding (NLU) is under-explored. Motivated by multi-step reasoning of LLMs, we propose Coarse-to-Fine Chain-of-Thought (CoF-CoT) approach that breaks down NLU tasks into multiple reasoning steps where LLMs can learn to acquire and leverage essential concepts to solve tasks from different granularities. Moreover, we propose leveraging semantic-based Abstract Meaning Representation (AMR) structured knowledge as an intermediate step to capture the nuances and diverse structures of utterances, and to understand connections between their varying levels of granularity. Our proposed approach is demonstrated effective in assisting the LLMs adapt to the multi-grained NLU tasks under both zero-shot and few-shot multi-domain settings.
    摘要 尽管链式思维提示在推理任务中广受欢迎,但其在自然语言理解(NLU)任务中对大语言模型(LLM)的应用仍缺乏充分探索。受LLM多步推理能力的启发,我们提出了由粗到细的链式思维(CoF-CoT)方法,将NLU任务分解为多个推理步骤,使LLM能够在不同粒度上学习获取并利用关键概念来求解任务。此外,我们提出利用基于语义的抽象语义表示(AMR)结构化知识作为中间步骤,以捕捉话语的细微差别和多样结构,并理解其不同粒度之间的联系。实验表明,所提方法在零样本和少样本多领域设置下,均能有效帮助LLM适应多粒度NLU任务。

Efficient Cross-Task Prompt Tuning for Few-Shot Conversational Emotion Recognition

  • paper_url: http://arxiv.org/abs/2310.14614
  • repo_url: None
  • paper_authors: Yige Xu, Zhiwei Zeng, Zhiqi Shen
  • for: 这个研究旨在提高对话中的情感识别(ERC)性能,并且使用预训练语言模型(PLMs)来实现这一目标。
  • methods: 我们提出了一种无导数(derivative-free)优化方法——跨任务提示调优(CTPT),可在少样本场景下进行学习。CTPT 利用不同任务之间可共享的知识,从而提高学习效率。
  • results: 我们在五个不同的对话数据集上进行了实验,结果显示 CTPT 方法在少样本学习和零样本迁移中均表现优异。
    Abstract Emotion Recognition in Conversation (ERC) has been widely studied due to its importance in developing emotion-aware empathetic machines. The rise of pre-trained language models (PLMs) has further pushed the limit of ERC performance. However, most recent works on ERC using PLMs are heavily data-driven, and requires fine-tuning the entire PLMs. To improve both sample and computational efficiency, we propose a derivative-free optimization method called Cross-Task Prompt Tuning (CTPT) for few-shot conversational emotion recognition. Unlike existing methods that learn independent knowledge from individual tasks, CTPT leverages sharable cross-task knowledge by exploiting external knowledge from other source tasks to improve learning performance under the few-shot setting. Moreover, CTPT only needs to optimize a vector under the low intrinsic dimensionality without gradient, which is highly parameter-efficient compared with existing approaches. Experiments on five different contextual conversation datasets demonstrate that our CTPT method has superior results on both few-shot scenarios and zero-shot transfers.
    摘要 对话情感识别(ERC)因其对构建具备情感感知能力的共情机器十分重要而受到广泛研究,预训练语言模型(PLM)的兴起进一步推高了ERC的性能上限。然而,现有基于PLM的ERC方法大多高度依赖数据,且需要对整个PLM进行微调。为同时提高样本效率和计算效率,我们提出了一种无导数优化方法——跨任务提示调优(CTPT),用于少样本对话情感识别。与独立学习各任务知识的现有方法不同,CTPT通过利用其他源任务的外部知识,挖掘任务间可共享的知识,从而在少样本设置下提升学习性能。此外,CTPT只需在低内在维度下优化一个向量,且无需梯度,参数效率远高于现有方法。在五个不同的对话数据集上的实验表明,CTPT在少样本场景和零样本迁移中均表现优越。
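
The sketch below illustrates what derivative-free optimization of a prompt vector in a low intrinsic dimension can look like: a small vector is projected into the prompt-embedding space by a fixed random matrix and updated by simple (1+1) random search on a black-box score. The projection, the optimizer, and `score_fn` are illustrative assumptions; the paper's actual optimizer and its cross-task knowledge transfer are not shown.

```python
import numpy as np

def tune_prompt(score_fn, prompt_len=8, embed_dim=256, intrinsic_dim=16,
                steps=200, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(intrinsic_dim, prompt_len * embed_dim)) / np.sqrt(intrinsic_dim)
    z = np.zeros(intrinsic_dim)                      # the only optimized variable
    best = score_fn((z @ A).reshape(prompt_len, embed_dim))
    for _ in range(steps):
        cand = z + sigma * rng.normal(size=intrinsic_dim)
        s = score_fn((cand @ A).reshape(prompt_len, embed_dim))
        if s > best:                                 # accept only improvements (no gradients)
            z, best = cand, s
    return (z @ A).reshape(prompt_len, embed_dim), best

# Toy black-box score (stand-in for dev-set accuracy of the frozen PLM).
target = np.random.default_rng(1).normal(size=(8, 256))
prompt, score = tune_prompt(lambda p: -np.mean((p - target) ** 2))
print(round(float(score), 4))
```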

That was the last straw, we need more: Are Translation Systems Sensitive to Disambiguating Context?

  • paper_url: http://arxiv.org/abs/2310.14610
  • repo_url: None
  • paper_authors: Jaechan Lee, Alisa Liu, Orevaoghene Ahia, Hila Gonen, Noah A. Smith
  • for: 本研究旨在study English idiomatic expressions的翻译问题,尤其是这些表达在不同语言中的含义差异。
  • methods: 我们使用了MT模型和语言模型进行比较,以评估它们在拥有歧义性的句子中表现的不同。我们收集了512对英语句子,其中一个句子 literal,另一个句子 figurative,并且在不同的目标语言中进行了对照试验。
  • results: 我们发现,当面临歧义性句子时,现有的MT模型往往会直接翻译成 literal 的意思,而忽略 figurative 的含义。相比之下,语言模型在不同的语言中表现更加灵活,尽管还存在一些语言之间的差异。这些结果表明,语言模型可能是跨语言翻译中的强大后备选择。
    Abstract The translation of ambiguous text presents a challenge for translation systems, as it requires using the surrounding context to disambiguate the intended meaning as much as possible. While prior work has studied ambiguities that result from different grammatical features of the source and target language, we study semantic ambiguities that exist in the source (English in this work) itself. In particular, we focus on idioms that are open to both literal and figurative interpretations (e.g., goose egg), and collect TIDE, a dataset of 512 pairs of English sentences containing idioms with disambiguating context such that one is literal (it laid a goose egg) and another is figurative (they scored a goose egg, as in a score of zero). In experiments, we compare MT-specific models and language models for (i) their preference when given an ambiguous subsentence, (ii) their sensitivity to disambiguating context, and (iii) the performance disparity between figurative and literal source sentences. We find that current MT models consistently translate English idioms literally, even when the context suggests a figurative interpretation. On the other hand, LMs are far more context-aware, although there remain disparities across target languages. Our findings underline the potential of LMs as a strong backbone for context-aware translation.
    摘要 歧义文本的翻译对翻译系统是一项挑战,因为系统需要尽可能利用上下文来消解歧义、确定原意。已有工作主要研究源语言与目标语言语法特征差异带来的歧义,而我们研究的是源语言(本文为英语)本身存在的语义歧义。具体而言,我们关注既可作字面理解、也可作比喻理解的习语(如 goose egg),并构建了TIDE数据集:512对带有消歧上下文的英语句子,其中一句取字面义(it laid a goose egg,母鹅下了一个蛋),另一句取比喻义(they scored a goose egg,得了零分)。在实验中,我们从三个方面比较了机器翻译模型与语言模型:(i)给定歧义子句时的偏好,(ii)对消歧上下文的敏感度,(iii)字面与比喻源句之间的性能差异。我们发现,现有MT模型往往将英语习语直译为字面义,即使上下文表明应取比喻义;相比之下,语言模型对上下文的感知要强得多,尽管在不同目标语言之间仍存在差异。这些发现凸显了语言模型作为上下文感知翻译强大基座的潜力。

Long Short-Term Planning for Conversational Recommendation Systems

  • paper_url: http://arxiv.org/abs/2310.14609
  • repo_url: None
  • paper_authors: Xian Li, Hongguang Shi, Yunfei Wang, Yeqin Zhang, Xubin Li, Cam-Tu Nguyen
  • for: 本研究旨在提高对话推荐系统(CRS)中对话代理的自然问题表达和适应推荐。
  • methods: 本文提出了一种新的长短期反馈架构,通过将对话模型和推荐预测模型相互连接,使这两个重要组件在CRS中充分交互。
  • results: 研究表明,该体系可以更好地捕捉用户偏好,并提供适应的推荐。
    Abstract In Conversational Recommendation Systems (CRS), the central question is how the conversational agent can naturally ask for user preferences and provide suitable recommendations. Existing works mainly follow the hierarchical architecture, where a higher policy decides whether to invoke the conversation module (to ask questions) or the recommendation module (to make recommendations). This architecture prevents these two components from fully interacting with each other. In contrast, this paper proposes a novel architecture, the long short-term feedback architecture, to connect these two essential components in CRS. Specifically, the recommendation predicts the long-term recommendation target based on the conversational context and the user history. Driven by the targeted recommendation, the conversational model predicts the next topic or attribute to verify if the user preference matches the target. The balance feedback loop continues until the short-term planner output matches the long-term planner output, that is when the system should make the recommendation.
    摘要 在对话推荐系统(CRS)中,核心问题是对话代理如何自然地询问用户偏好并给出合适的推荐。现有工作主要采用层次架构,由上层策略决定调用对话模块(提问)还是推荐模块(推荐),这种架构使两个组件难以充分交互。与之不同,本文提出了一种新的长短期反馈架构来连接CRS中的这两个关键组件。具体而言,推荐模型根据对话上下文和用户历史预测长期推荐目标;在该目标的驱动下,对话模型预测下一个话题或属性,以验证用户偏好是否与目标相符。这一平衡反馈循环持续进行,直到短期规划器的输出与长期规划器的输出相匹配,此时系统即给出推荐。

Investigating the Fairness of Large Language Models for Predictions on Tabular Data

  • paper_url: http://arxiv.org/abs/2310.14607
  • repo_url: None
  • paper_authors: Yanchen Liu, Srishti Gautam, Jiaqi Ma, Himabindu Lakkaraju
  • for: 本文探讨了使用大型自然语言模型(LLMs)进行表格任务预测时,是否存在社会偏见问题,以及这些偏见对公平性的影响。
  • methods: 本文使用了一系列实验来研究 LLMS 在表格任务预测时是否会继承社会偏见,以及这些偏见对公平性的影响。
  • results: 研究结果表明,LLM 在表格任务预测时会继承社会偏见,并因此产生公平性问题。此外,研究还发现,在偏见缓解方面,上下文学习(in-context learning)和微调只有一定程度的改善,并不能完全消除社会偏见的影响。
    Abstract Recent literature has suggested the potential of using large language models (LLMs) to make predictions for tabular tasks. However, LLMs have been shown to exhibit harmful social biases that reflect the stereotypes and inequalities present in the society. To this end, as well as the widespread use of tabular data in many high-stake applications, it is imperative to explore the following questions: what sources of information do LLMs draw upon when making predictions for tabular tasks; whether and to what extent are LLM predictions for tabular tasks influenced by social biases and stereotypes; and what are the consequential implications for fairness? Through a series of experiments, we delve into these questions and show that LLMs tend to inherit social biases from their training data which significantly impact their fairness in tabular prediction tasks. Furthermore, our investigations show that in the context of bias mitigation, though in-context learning and fine-tuning have a moderate effect, the fairness metric gap between different subgroups is still larger than that in traditional machine learning models, such as Random Forest and shallow Neural Networks. This observation emphasizes that the social biases are inherent within the LLMs themselves and inherited from their pre-training corpus, not only from the downstream task datasets. Besides, we demonstrate that label-flipping of in-context examples can significantly reduce biases, further highlighting the presence of inherent bias within LLMs.
    摘要 近期研究表明,大语言模型(LLM)有潜力用于表格任务的预测。然而,LLM已被证明会表现出有害的社会偏见,反映出社会中存在的刻板印象与不平等。鉴于表格数据在许多高风险应用中的广泛使用,有必要探究以下问题:LLM在为表格任务做预测时依赖哪些信息来源;LLM的预测在多大程度上受到社会偏见和刻板印象的影响;以及这对公平性意味着什么。通过一系列实验,我们发现LLM会从训练数据中继承社会偏见,显著影响其在表格预测任务中的公平性。此外,在偏见缓解方面,尽管上下文学习和微调有一定效果,但不同子群体之间的公平性指标差距仍大于随机森林、浅层神经网络等传统机器学习模型,这表明社会偏见内在于LLM本身并继承自其预训练语料,而不仅来自下游任务数据集。我们还证明,对上下文示例进行标签翻转可以显著降低偏见,进一步凸显了LLM内在偏见的存在。
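
The label-flipping probe mentioned at the end of the abstract can be illustrated as below: in-context demonstrations keep their features but have their labels inverted before being serialized into the prompt. The column names, label set, and serialization format are hypothetical.

```python
def serialize(row: dict, label: str) -> str:
    feats = ", ".join(f"{k} = {v}" for k, v in row.items())
    return f"{feats} -> income {label}"

def build_prompt(examples, query_row, flip_labels=False):
    lines = []
    for row, label in examples:
        if flip_labels:
            label = ">50K" if label == "<=50K" else "<=50K"   # invert the demonstration label
        lines.append(serialize(row, label))
    lines.append(", ".join(f"{k} = {v}" for k, v in query_row.items()) + " -> income")
    return "\n".join(lines)

# Toy usage with two demonstrations and one query row.
demos = [({"age": 39, "education": "Bachelors", "sex": "Male"}, ">50K"),
         ({"age": 28, "education": "HS-grad", "sex": "Female"}, "<=50K")]
query = {"age": 45, "education": "Masters", "sex": "Female"}
print(build_prompt(demos, query, flip_labels=True))
```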

M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal Aspect-based Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2310.14605
  • repo_url: https://github.com/grandchicken/m2df
  • paper_authors: Fei Zhao, Chunhui Li, Zhen Wu, Yawen Ouyang, Jianbing Zhang, Xinyu Dai
  • for: 本研究旨在提高多模态方面性情感分析(MABSA)任务的精度,尤其是避免因为数据中含有噪音图像而带来的负面影响。
  • methods: 我们提出了一种基于课程学习的多级多课程减噪框架(M2DF),通过调整训练数据的顺序来实现减噪。
  • results: 我们的方法在 MABSA 的三个子任务上一致地优于当前最先进方法。
    Abstract Multimodal Aspect-based Sentiment Analysis (MABSA) is a fine-grained Sentiment Analysis task, which has attracted growing research interests recently. Existing work mainly utilizes image information to improve the performance of MABSA task. However, most of the studies overestimate the importance of images since there are many noise images unrelated to the text in the dataset, which will have a negative impact on model learning. Although some work attempts to filter low-quality noise images by setting thresholds, relying on thresholds will inevitably filter out a lot of useful image information. Therefore, in this work, we focus on whether the negative impact of noisy images can be reduced without modifying the data. To achieve this goal, we borrow the idea of Curriculum Learning and propose a Multi-grained Multi-curriculum Denoising Framework (M2DF), which can achieve denoising by adjusting the order of training data. Extensive experimental results show that our framework consistently outperforms state-of-the-art work on three sub-tasks of MABSA.
    摘要 多模态方面级情感分析(MABSA)是一项细粒度的情感分析任务,近来受到越来越多的关注。现有工作主要利用图像信息来提升 MABSA 任务的性能,但大多数研究高估了图像的重要性:数据集中存在许多与文本无关的噪声图像,会对模型学习产生负面影响。虽然有些工作尝试通过设置阈值来过滤低质量噪声图像,但依赖阈值不可避免地会过滤掉大量有用的图像信息。因此,本文关注能否在不修改数据的前提下降低噪声图像的负面影响。为此,我们借鉴课程学习的思想,提出了多粒度多课程去噪框架(M2DF),通过调整训练数据的顺序来实现去噪。大量实验结果表明,该框架在 MABSA 的三个子任务上一致地优于当前最先进方法。

Generative Pre-trained Transformer for Vietnamese Community-based COVID-19 Question Answering

  • paper_url: http://arxiv.org/abs/2310.14602
  • repo_url: None
  • paper_authors: Tam Minh Vo, Khiem Vinh Tran
  • for: 本研究旨在探讨 Generative Pre-trained Transformer (GPT) 在 Vietnamese 自然语言处理领域的应用前景。
  • methods: 本研究使用 GPT-2 作为 decoder,并对不同的 Transformers 和 SOTA 模型进行比较分析,以评估其在社区问答系统中的表现。
  • results: 实验结果表明,GPT-2 模型在社区问答系统中表现出色,不仅超越了其他 SOTA 模型,还超越了之前在 Vietnamese 语言上开发的社区问答模型。
    Abstract Recent studies have provided empirical evidence of the wide-ranging potential of Generative Pre-trained Transformer (GPT), a pretrained language model, in the field of natural language processing. GPT has been effectively employed as a decoder within state-of-the-art (SOTA) question answering systems, yielding exceptional performance across various tasks. However, the current research landscape concerning GPT's application in Vietnamese remains limited. This paper aims to address this gap by presenting an implementation of GPT-2 for community-based question answering specifically focused on COVID-19 related queries in Vietnamese. We introduce a novel approach by conducting a comparative analysis of different Transformers vs SOTA models in the community-based COVID-19 question answering dataset. The experimental findings demonstrate that the GPT-2 models exhibit highly promising outcomes, outperforming other SOTA models as well as previous community-based COVID-19 question answering models developed for Vietnamese.
    摘要 近期研究为生成式预训练Transformer(GPT)这一预训练语言模型在自然语言处理领域的广泛潜力提供了实证证据。GPT已被有效地用作最先进问答系统中的解码器,在各类任务中取得了出色的表现。然而,目前关于GPT在越南语中应用的研究仍然有限。本文旨在填补这一空白,针对越南语中与COVID-19相关的社区问答给出了GPT-2的实现。我们提出了一种新思路,在社区COVID-19问答数据集上对不同的Transformer模型与最先进模型进行了对比分析。实验结果表明,GPT-2模型表现非常出色,不仅优于其他最先进模型,也优于此前为越南语开发的社区COVID-19问答模型。

Large Search Model: Redefining Search Stack in the Era of LLMs

  • paper_url: http://arxiv.org/abs/2310.14587
  • repo_url: None
  • paper_authors: Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei
  • for: 这篇论文的目的是提出一种新的搜索框架——大型搜索模型(large search model),将所有搜索任务统一为自然语言生成问题,从而可以通过自然语言提示来定制任务。
  • methods: 该概念框架用一个大语言模型(LLM)统一承担各项搜索任务,利用其语言理解和推理能力提高搜索结果质量,同时简化现有的复杂搜索栈。
  • results: 作者通过一系列证明性实验来证明了这种框架的可行性,并讨论了在现实世界搜索系统中实施这种方法的挑战。
    Abstract Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others. These components are often optimized and deployed independently. In this paper, we introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM). All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts. This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack. To substantiate the feasibility of this framework, we present a series of proof-of-concept experiments and discuss the potential challenges associated with implementing this approach within real-world search systems.
    摘要 现代搜索引擎由多个不同组件构成,包括查询理解、检索、多阶段排序和问答等,这些组件通常各自独立地优化和部署。在这篇论文中,我们提出了一个新的概念框架——大型搜索模型(large search model),它用一个大语言模型(LLM)统一承担各项搜索任务。所有任务都被表述为自回归文本生成问题,从而可以通过自然语言提示来定制任务。该框架利用LLM强大的语言理解与推理能力,有望在提升搜索结果质量的同时,简化现有繁琐的搜索栈。为证明该框架的可行性,我们给出了一系列概念验证实验,并讨论了在真实搜索系统中落地这一方法的潜在挑战。

JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification

  • paper_url: http://arxiv.org/abs/2310.14583
  • repo_url: https://github.com/HenryPengZou/JointMatch
  • paper_authors: Henry Peng Zou, Cornelia Caragea
  • for: 提高 semi-supervised text classification 的性能,解决 pseudo-label 偏见和错误积累问题
  • methods: 提出 JointMatch 方法,结合近期半监督学习与带噪学习的思想,自适应地调整各类别的阈值,以缓解模型对当前容易类别的偏向;同时利用两个不同初始化的网络以交叉标注方式互相学习,减轻错误积累。
  • results: 在基准数据集上的实验表明,JointMatch 平均取得 5.13% 的显著提升;在标签极度稀缺的设定下(AG News 每类仅 5 个标签),JointMatch 仍可达到 86% 的准确率。
    Abstract Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data. However, existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation. In this paper, we propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning and the task of learning with noise. JointMatch adaptively adjusts classwise thresholds based on the learning status of different classes to mitigate model bias towards current easy classes. Additionally, JointMatch alleviates error accumulation by utilizing two differently initialized networks to teach each other in a cross-labeling manner. To maintain divergence between the two networks for mutual learning, we introduce a strategy that weighs more disagreement data while also allowing the utilization of high-quality agreement data for training. Experimental results on benchmark datasets demonstrate the superior performance of JointMatch, achieving a significant 5.13% improvement on average. Notably, JointMatch delivers impressive results even in the extremely-scarce-label setting, obtaining 86% accuracy on AG News with only 5 labels per class. We make our code available at https://github.com/HenryPengZou/JointMatch.
    摘要 半监督文本分类(SSTC)因能够利用无标注数据而受到越来越多的关注。然而,现有基于伪标签的方法存在伪标签偏差和错误积累的问题。本文提出JointMatch,一种整体性的SSTC方法,通过统一近期半监督学习与带噪学习的思想来应对上述挑战。JointMatch根据不同类别的学习状况自适应地调整各类别阈值,以缓解模型对当前容易类别的偏向;同时利用两个不同初始化的网络以交叉标注的方式互相教学,减轻错误积累。为保持两个网络之间的差异以便互相学习,我们提出一种策略:给予分歧数据更高权重,同时也允许利用高质量的一致数据进行训练。在基准数据集上的实验表明,JointMatch表现优越,平均取得5.13%的显著提升;即便在标签极度稀缺的设定下,JointMatch也取得了亮眼的结果,在AG News上每类仅5个标签即可达到86%的准确率。代码见 https://github.com/HenryPengZou/JointMatch。
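
A minimal sketch of two of the ingredients described above, under simplifying assumptions: class-wise pseudo-label thresholds that adapt to how often each class is currently being selected, and cross-labeling in which each of the two networks learns from the other's confident predictions, with disagreement data weighted more. The exact formulas are illustrative, not the paper's.

```python
import numpy as np

def adaptive_thresholds(pseudo_label_counts, base=0.95):
    counts = np.asarray(pseudo_label_counts, dtype=float) + 1.0
    return base * counts / counts.max()        # currently "easy" classes get a higher bar

def cross_label_batch(probs_a, probs_b, thresholds, disagree_weight=2.0):
    """Return (targets_for_a, targets_for_b, sample_weights) for one unlabeled batch."""
    pred_a, pred_b = probs_a.argmax(1), probs_b.argmax(1)
    conf_a, conf_b = probs_a.max(1), probs_b.max(1)
    keep = (conf_a >= thresholds[pred_a]) & (conf_b >= thresholds[pred_b])
    weights = np.where(pred_a != pred_b, disagree_weight, 1.0) * keep
    return pred_b, pred_a, weights             # A learns from B's labels and vice versa

# Toy usage: probabilities from two networks on a batch of 8 unlabeled texts, 4 classes.
rng = np.random.default_rng(0)
pa = rng.dirichlet(np.ones(4), size=8)
pb = rng.dirichlet(np.ones(4), size=8)
thr = adaptive_thresholds([30, 5, 12, 2], base=0.6)
print(cross_label_batch(pa, pb, thr))
```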

DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank

  • paper_url: http://arxiv.org/abs/2310.14577
  • repo_url: https://github.com/HenryPengZou/DeCrisisMB
  • paper_authors: Henry Peng Zou, Yue Zhou, Weizhi Zhang, Cornelia Caragea
  • for: 本研究旨在提高危机事件监测和救援过程中半监督模型的准确率和泛化能力,并解决半监督模型容易受到类别偏见影响的问题。
  • methods: 本研究首先对两种最新的去偏方法进行了研究,然后提出了一种简单而有效的去偏方法——DeCrisisMB,该方法利用 Memory Bank 存储生成的伪标签,并在每次训练迭代中对各类别进行均等采样,从而消除类别偏见。
  • results: 对不同去偏方法的性能和泛化能力进行了广泛实验,结果显示所提方法在分布内和分布外设置下均表现出色,超过了其他方法。
    Abstract During crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support. Emergency relief organizations leverage such information to acquire timely crisis circumstances and expedite rescue operations. While existing works utilize such information to build models for crisis event analysis, fully-supervised approaches require annotating vast amounts of data and are impractical due to limited response time. On the other hand, semi-supervised models can be biased, performing moderately well for certain classes while performing extremely poorly for others, resulting in substantially negative effects on disaster monitoring and rescue. In this paper, we first study two recent debiasing methods on semi-supervised crisis tweet classification. Then we propose a simple but effective debiasing method, DeCrisisMB, that utilizes a Memory Bank to store and perform equal sampling for generated pseudo-labels from each class at each training iteration. Extensive experiments are conducted to compare different debiasing methods' performance and generalization ability in both in-distribution and out-of-distribution settings. The results demonstrate the superior performance of our proposed method. Our code is available at https://github.com/HenryPengZou/DeCrisisMB.
    摘要 在危机事件期间,人们常常使用推特等社交媒体平台传播有关现场情况、警报、建议和援助的信息。应急救援组织利用这些信息及时了解危机状况并加快救援行动。现有工作虽然利用此类信息构建危机事件分析模型,但全监督方法需要标注海量数据,在响应时间有限的情况下并不现实;而半监督模型则可能存在偏差,对某些类别表现尚可,对另一些类别表现极差,从而对灾害监测和救援产生严重的负面影响。本文首先研究了两种近期的去偏方法在半监督危机推文分类中的表现,然后提出了一种简单而有效的去偏方法 DeCrisisMB:利用 Memory Bank 存储各类别生成的伪标签,并在每次训练迭代中进行均等采样。我们在分布内和分布外设置下对不同去偏方法的性能和泛化能力进行了广泛实验,结果表明所提方法性能更优。代码见 https://github.com/HenryPengZou/DeCrisisMB。
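
The memory-bank idea can be sketched as below: pseudo-labeled examples are pushed into per-class queues, and each training step draws the same number of examples from every non-empty class so that no class dominates the unlabeled loss. Queue and sample sizes are illustrative assumptions.

```python
import random
from collections import deque

class MemoryBank:
    def __init__(self, num_classes, max_per_class=256, seed=0):
        self.queues = [deque(maxlen=max_per_class) for _ in range(num_classes)]
        self.rng = random.Random(seed)

    def add(self, example, pseudo_label):
        self.queues[pseudo_label].append(example)

    def sample_balanced(self, per_class):
        batch = []
        for label, q in enumerate(self.queues):
            if not q:
                continue
            picks = self.rng.choices(list(q), k=per_class)   # equal count per class
            batch.extend((x, label) for x in picks)
        return batch

# Toy usage: only classes 0 and 1 receive pseudo-labels, class 2 stays empty.
bank = MemoryBank(num_classes=3)
for i in range(20):
    bank.add(f"tweet_{i}", pseudo_label=i % 2)
print(len(bank.sample_balanced(per_class=4)))   # 8: 4 from class 0, 4 from class 1
```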

Exploring the Boundaries of GPT-4 in Radiology

  • paper_url: http://arxiv.org/abs/2310.14573
  • repo_url: None
  • paper_authors: Qianchu Liu, Stephanie Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Maria Teodora Wetscherek, Robert Tinn, Harshita Sharma, Fernando Pérez-García, Anton Schwaighofer, Pranav Rajpurkar, Sameer Tajdin Khanna, Hoifung Poon, Naoto Usuyama, Anja Thieme, Aditya V. Nori, Matthew P. Lungren, Ozan Oktay, Javier Alvarez-Valle
  • for: This paper assesses the performance of GPT-4 on text-based applications for radiology reports, comparing it against state-of-the-art (SOTA) radiology-specific models.
  • methods: The paper explores various prompting strategies to evaluate GPT-4 on a diverse range of common radiology tasks, including zero-shot prompting and example-based prompting.
  • results: GPT-4 either outperforms or is on par with current SOTA radiology models in various tasks, including temporal sentence similarity classification and natural language inference. Additionally, GPT-4 outputs for findings summarization are found to be overall comparable with existing manually-written impressions.
    Abstract The recent success of general-domain large language models (LLMs) has significantly changed the natural language processing paradigm towards a unified foundation model across domains and applications. In this paper, we focus on assessing the performance of GPT-4, the most capable LLM so far, on the text-based applications for radiology reports, comparing against state-of-the-art (SOTA) radiology-specific models. Exploring various prompting strategies, we evaluated GPT-4 on a diverse range of common radiology tasks and we found GPT-4 either outperforms or is on par with current SOTA radiology models. With zero-shot prompting, GPT-4 already obtains substantial gains ($\approx$ 10% absolute improvement) over radiology models in temporal sentence similarity classification (accuracy) and natural language inference ($F_1$). For tasks that require learning dataset-specific style or schema (e.g. findings summarisation), GPT-4 improves with example-based prompting and matches supervised SOTA. Our extensive error analysis with a board-certified radiologist shows GPT-4 has a sufficient level of radiology knowledge with only occasional errors in complex context that require nuanced domain knowledge. For findings summarisation, GPT-4 outputs are found to be overall comparable with existing manually-written impressions.
    摘要 现代大语言模型(LLM)的成功已经改变了自然语言处理的核心思路,即在各个领域和应用之间建立共同基础模型。在这篇论文中,我们将关注GPT-4,目前最强大的LLM,在文本基础上的应用场景中的性能,并与当前领先的医学特定模型进行比较。我们使用不同的提示策略,对GPT-4进行了广泛的评估,并发现GPT-4在多种常见的医学任务中表现出色,其中包括时间序列相似性分类和自然语言推理等。在需要学习数据集特定的样式或结构(如发现摘要)时,GPT-4可以通过示例提示来改进,并与现有的超级vised SOTA模型匹配。我们的广泛的错误分析表明,GPT-4在复杂的医学上有足够的知识,只有 occasionally 出现具有细化领域知识的错误。在发现摘要方面,GPT-4的输出被发现与现有的手动编写的印象相当相似。

A Boundary Offset Prediction Network for Named Entity Recognition

  • paper_url: http://arxiv.org/abs/2310.18349
  • repo_url: https://github.com/mhtang1995/bopn
  • paper_authors: Minghao Tang, Yongquan He, Yongxiu Xu, Hongbo Xu, Wenyuan Zhang, Yang Lin
  • for: 本研究旨在提高Named Entity Recognition (NER) 的准确率,解决 span-based 方法中的样本空间不均衡和非实体 span 的忽略问题。
  • methods: 我们提出了一种新的方法,即 Boundary Offset Prediction Network (BOPN),该方法预测候选 span 与最近的实体 span 之间的边界偏移。通过利用边界偏移所蕴含的指导语义,BOPN 建立了非实体 span 和实体 span 之间的联系,使非实体 span 能够作为实体检测的额外正例样本。此外,我们的方法将实体类型与 span 表示结合起来生成类型感知的边界偏移,而不是直接使用实体类型作为检测目标。
  • results: 我们在八个常用的 NER 数据集上进行了实验,结果显示,我们的提出的 BOPN 可以比前一个状态的方法更高的准确率。
    Abstract Named entity recognition (NER) is a fundamental task in natural language processing that aims to identify and classify named entities in text. However, span-based methods for NER typically assign entity types to text spans, resulting in an imbalanced sample space and neglecting the connections between non-entity and entity spans. To address these issues, we propose a novel approach for NER, named the Boundary Offset Prediction Network (BOPN), which predicts the boundary offsets between candidate spans and their nearest entity spans. By leveraging the guiding semantics of boundary offsets, BOPN establishes connections between non-entity and entity spans, enabling non-entity spans to function as additional positive samples for entity detection. Furthermore, our method integrates entity type and span representations to generate type-aware boundary offsets instead of using entity types as detection targets. We conduct experiments on eight widely-used NER datasets, and the results demonstrate that our proposed BOPN outperforms previous state-of-the-art methods.
    摘要 Named entity recognition (NER) 是自然语言处理中的基本任务,旨在在文本中Identify和分类命名实体。然而,使用span基本方法进行NER通常会导致样本空间不均衡,并忽略非实体 span与实体 span之间的连接。为了解决这些问题,我们提出了一种新的NER方法,即边界偏移预测网络(BOPN)。BOPN预测候选块和最近实体块之间的边界偏移。通过利用边界偏移的导引 semantics,BOPN建立了非实体 span和实体 span之间的连接,使得非实体 span能够作为实体检测的额外正例样本。此外,我们的方法集成实体类型和块表示,而不是使用实体类型作为检测目标。我们在八个广泛使用的NER数据集上进行实验,结果显示,我们提出的BOPN比前一个状态的方法更高效。
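
A minimal sketch of how boundary-offset targets can be derived from gold entity spans, which is the core labeling idea described above: every candidate span is labeled with the (start, end) offset to its nearest entity span, so non-entity spans still carry a useful signal. The nearest-span distance and the offset encoding are illustrative assumptions; the type-aware offsets are omitted.

```python
def boundary_offsets(candidates, entities):
    """candidates/entities: lists of (start, end) token spans."""
    targets = []
    for cs, ce in candidates:
        nearest = min(entities, key=lambda e: abs(e[0] - cs) + abs(e[1] - ce))
        targets.append((nearest[0] - cs, nearest[1] - ce))   # offset (0, 0) == an entity span
    return targets

# Toy usage: two gold entities and four candidate spans.
entities = [(2, 4), (9, 10)]
candidates = [(2, 4), (1, 4), (8, 10), (5, 6)]
print(boundary_offsets(candidates, entities))
# [(0, 0), (1, 0), (1, 0), (-3, -2)] -- zero offsets mark true entity spans
```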

HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

  • paper_url: http://arxiv.org/abs/2310.14566
  • repo_url: https://github.com/tianyi-lab/hallusionbench
  • paper_authors: Fuxiao Liu, Tianrui Guan, Zongxia Li, Lichang Chen, Yaser Yacoob, Dinesh Manocha, Tianyi Zhou
  • for: 研究大语言模型(LLM)在图像理解任务中的提升。
  • methods: 研究视觉模型与LLM结合构成的视觉语言模型(VLM),并分析其两类错误:语言幻觉与视觉错觉。
  • results: 构建了 HallusionBench 图像-上下文推理基准;发现VLM中强大的语言先验可能导致其忽略图像上下文、仅依赖语言先验进行推理,而VLM中较弱的视觉模块则可能产生误导性的视觉表示,进而被LLM转化为自信的错误。
    Abstract Large language models (LLMs), after being aligned with vision models and integrated into vision-language models (VLMs), can bring impressive improvement in image reasoning tasks. This was shown by the recently released GPT-4V(ison), LLaVA-1.5, etc. However, the strong language prior in these SOTA LVLMs can be a double-edged sword: they may ignore the image context and solely rely on the (even contradictory) language prior for reasoning. In contrast, the vision modules in VLMs are weaker than LLMs and may result in misleading visual representations, which are then translated to confident mistakes by LLMs. To study these two types of VLM mistakes, i.e., language hallucination and visual illusion, we curated HallusionBench, an image-context reasoning benchmark that is still challenging to even GPT-4V and LLaVA-1.5. We provide a detailed analysis of examples in HallusionBench, which sheds novel insights on the illusion or hallucination of VLMs and how to improve them in the future. The benchmark and codebase will be released at https://github.com/tianyi-lab/HallusionBench.
    摘要 大语言模型(LLM)在与视觉模型对齐并融合为视觉语言模型(VLM)之后,可以为图像推理任务带来显著提升,最近发布的GPT-4V(ision)、LLaVA-1.5等都证明了这一点。然而,这些顶尖VLM中强大的语言先验可能是一把双刃剑:模型可能忽视图像上下文,而仅依赖(甚至相互矛盾的)语言先验进行推理。与此相反,VLM中的视觉模块弱于LLM,可能产生误导性的视觉表示,并被LLM转化为自信的错误。为研究这两类VLM错误,即语言幻觉与视觉错觉,我们构建了HallusionBench,一个即使对GPT-4V和LLaVA-1.5而言仍然具有挑战性的图像-上下文推理基准。我们对HallusionBench中的示例进行了详细分析,为理解VLM的幻觉与错觉以及未来的改进方向提供了新的见解。基准与代码库将发布于 https://github.com/tianyi-lab/HallusionBench。

Language Models Hallucinate, but May Excel at Fact Verification

  • paper_url: http://arxiv.org/abs/2310.14564
  • repo_url: None
  • paper_authors: Jian Guan, Jesse Dodge, David Wadden, Minlie Huang, Hao Peng
  • for: 评估大语言模型(LLM)的可靠性和可信度,以及其应用于事实核查 task 的可能性。
  • methods: 采用人类评估方法评估 LLM 的输出是否准确,并分析 LLM 对高质量证据的依赖性以及其 robustness 和泛化能力的弱点。
  • results: 研究发现,即使使用最先进的 LLM 如 GPT-3.5 和 ChatGPT,其生成的事实输出率仅为 25% 左右,表明需要进一步提高 LLM 的可靠性和可信度。而 unexpectedly,FLAN-T5-11B,最不准确的生成器,在事实核查 task 中表现最佳,甚至超过 GPT3.5 和 ChatGPT。
    Abstract Recent progress in natural language processing (NLP) owes much to remarkable advances in large language models (LLMs). Nevertheless, LLMs frequently "hallucinate," resulting in non-factual outputs. Our carefully designed human evaluation substantiates the serious hallucination issue, revealing that even GPT-3.5 produces factual outputs less than 25% of the time. This underscores the importance of fact verifiers in order to measure and incentivize progress. Our systematic investigation affirms that LLMs can be repurposed as effective fact verifiers with strong correlations with human judgments, at least in the Wikipedia domain. Surprisingly, FLAN-T5-11B, the least factual generator in our study, performs the best as a fact verifier, even outperforming more capable LLMs like GPT3.5 and ChatGPT. Delving deeper, we analyze the reliance of these LLMs on high-quality evidence, as well as their deficiencies in robustness and generalization ability. Our study presents insights for developing trustworthy generation models.
    摘要 自然语言处理(NLP)的近期进展在很大程度上得益于大语言模型(LLM)的显著突破。然而,LLM经常产生"幻觉",生成不符合事实的输出。我们精心设计的人工评估证实了这一严重的幻觉问题:即便是GPT-3.5,其生成事实性输出的比例也不足25%。这凸显了事实核查器对于衡量并激励进步的重要性。我们的系统性研究表明,LLM可以被重新用作有效的事实核查器,至少在Wikipedia领域与人类判断具有很强的相关性。出人意料的是,FLAN-T5-11B——本研究中事实性最差的生成器——作为事实核查器表现最好,甚至超过了GPT-3.5和ChatGPT等能力更强的LLM。进一步地,我们分析了这些LLM对高质量证据的依赖,以及它们在鲁棒性和泛化能力上的不足。本研究为开发可信的生成模型提供了洞见。

NormDial: A Comparable Bilingual Synthetic Dialog Dataset for Modeling Social Norm Adherence and Violation

  • paper_url: http://arxiv.org/abs/2310.14563
  • repo_url: https://github.com/aochong-li/normdial
  • paper_authors: Oliver Li, Mallika Subramanian, Arkadiy Saakyan, Sky CH-Wang, Smaranda Muresan
  • for: 这个论文是为了研究社会规范如何影响人们之间的交流。
  • methods: 这个论文使用了人类 Loop 管道生成高质量的对话数据,并对社交规范遵从和违反进行了分别标注。
  • results: 研究发现,大语言模型在这个任务上表现不佳,提出了新的研究方向以更好地理解社交规范在对话中的表现。
    Abstract Social norms fundamentally shape interpersonal communication. We present NormDial, a high-quality dyadic dialogue dataset with turn-by-turn annotations of social norm adherences and violations for Chinese and American cultures. Introducing the task of social norm observance detection, our dataset is synthetically generated in both Chinese and English using a human-in-the-loop pipeline by prompting large language models with a small collection of expert-annotated social norms. We show that our generated dialogues are of high quality through human evaluation and further evaluate the performance of existing large language models on this task. Our findings point towards new directions for understanding the nuances of social norms as they manifest in conversational contexts that span across languages and cultures.
    摘要 社会规范从根本上塑造着人际交流。我们提出NormDial:一个面向中美两种文化、带有逐轮(turn-by-turn)社会规范遵循与违反标注的高质量双人对话数据集。我们引入社会规范遵循检测任务,并通过人机协同(human-in-the-loop)流程,以一小批专家标注的社会规范提示大语言模型,合成了中英双语对话数据。人工评估表明所生成的对话质量较高;我们进一步评估了现有大语言模型在该任务上的表现。我们的发现为理解社会规范在跨语言、跨文化对话语境中的细微表现指明了新的方向。

The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages

  • paper_url: http://arxiv.org/abs/2310.14557
  • repo_url: None
  • paper_authors: Chiyu Zhang, Khai Duy Doan, Qisheng Liao, Muhammad Abdul-Mageed
  • for: 这研究旨在探讨 instruction-tuned 大型自然语言处理(LLM)模型在跨语言社会功能性含义(SM)理解方面的能力。
  • methods: 研究使用了多种多语言预训练语言模型(如 mT5)和指令调教LLM(如 BLOOMZ、ChatGPT)在 SPARROW 测试集上进行微调、零shot 和几shot 学习。
  • results: 研究发现现有的开源指令调教LLM仍然在不同语言的 SM 理解方面表现不佳,在一些情况下与随机基线几乎相当。此外,尽管 ChatGPT 表现较好,但它仍然落后任务特定微调模型的差距为 12.19 SPARROW 分数。
    Abstract Instruction tuned large language models (LLMs), such as ChatGPT, demonstrate remarkable performance in a wide range of tasks. Despite numerous recent studies that examine the performance of instruction-tuned LLMs on various NLP benchmarks, there remains a lack of comprehensive investigation into their ability to understand cross-lingual sociopragmatic meaning (SM), i.e., meaning embedded within social and interactive contexts. This deficiency arises partly from SM not being adequately represented in any of the existing benchmarks. To address this gap, we present SPARROW, an extensive multilingual benchmark specifically designed for SM understanding. SPARROW comprises 169 datasets covering 13 task types across six primary categories (e.g., anti-social language detection, emotion recognition). SPARROW datasets encompass 64 different languages originating from 12 language families representing 16 writing scripts. We evaluate the performance of various multilingual pretrained language models (e.g., mT5) and instruction-tuned LLMs (e.g., BLOOMZ, ChatGPT) on SPARROW through fine-tuning, zero-shot, and/or few-shot learning. Our comprehensive analysis reveals that existing open-source instruction tuned LLMs still struggle to understand SM across various languages, performing close to a random baseline in some cases. We also find that although ChatGPT outperforms many LLMs, it still falls behind task-specific finetuned models with a gap of 12.19 SPARROW score. Our benchmark is available at: https://github.com/UBC-NLP/SPARROW
    摘要 经过指令调优的大语言模型(LLM,如ChatGPT)在各类任务中表现出色。尽管近期已有大量研究在各种NLP基准上考察指令调优LLM的性能,但对其理解跨语言社会语用意义(SM,即嵌入在社会与交互语境中的意义)的能力仍缺乏系统研究,部分原因在于现有基准均未充分涵盖SM。为填补这一空白,我们提出了SPARROW:一个专为SM理解设计的大规模多语言基准。SPARROW包含169个数据集,覆盖6大类别下的13种任务类型(如反社会语言检测、情感识别),涉及来自12个语系、使用16种文字书写的64种语言。我们通过微调、零样本和/或少样本学习,在SPARROW上评估了多种多语言预训练语言模型(如mT5)和指令调优LLM(如BLOOMZ、ChatGPT)。全面分析表明,现有开源指令调优LLM在多种语言上仍难以理解SM,在某些情况下接近随机基线;尽管ChatGPT优于许多LLM,但仍落后于针对任务微调的模型,差距达12.19 SPARROW分。基准地址:https://github.com/UBC-NLP/SPARROW。

Harnessing ChatGPT for thematic analysis: Are we ready?

  • paper_url: http://arxiv.org/abs/2310.14545
  • repo_url: None
  • paper_authors: V Vien Lee, Stephanie C. C. van der Lubbe, Lay Hoon Goh, Jose M. Valderas
  • for: This paper explores the use of ChatGPT in three core phases of thematic analysis within a medical context, including direct coding of transcripts, generating themes from a predefined list of codes, and preprocessing quotes for manuscript inclusion.
  • methods: The paper uses ChatGPT, an advanced natural language processing tool, to automate the thematic analysis process.
  • results: The authors assess the strengths and limitations of using ChatGPT in thematic analysis, highlighting areas where human intervention remains necessary, and argue that ChatGPT can function as a valuable tool during analysis, enhancing the efficiency and offering additional insights into the qualitative data.
    Abstract ChatGPT is an advanced natural language processing tool with growing applications across various disciplines in medical research. Thematic analysis, a qualitative research method to identify and interpret patterns in data, is one application that stands to benefit from this technology. This viewpoint explores the utilization of ChatGPT in three core phases of thematic analysis within a medical context: 1) direct coding of transcripts, 2) generating themes from a predefined list of codes, and 3) preprocessing quotes for manuscript inclusion. Additionally, we explore the potential of ChatGPT to generate interview transcripts, which may be used for training purposes. We assess the strengths and limitations of using ChatGPT in these roles, highlighting areas where human intervention remains necessary. Overall, we argue that ChatGPT can function as a valuable tool during analysis, enhancing the efficiency of the thematic analysis and offering additional insights into the qualitative data.
    摘要 chatGPT是一种先进的自然语言处理工具,在医学研究中有越来越多的应用。本观点探讨了在医学上使用chatGPT进行主题分析的三个核心阶段:1)直接编码讲话稿,2)根据预定的代码列表生成主题,3)预处理引用 для报告 inclusion。此外,我们还探讨了使用chatGPT生成访谈稿,可以用于培训。我们评估了使用chatGPT的优点和缺点,并指出了人工干预仍然必要的领域。总的来说,我们认为chatGPT可以作为分析中的有价值工具,提高分析的效率并为质量数据提供额外的意义。

Evaluating Large Language Models on Controlled Generation Tasks

  • paper_url: http://arxiv.org/abs/2310.14542
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Frederick Wieting, Nanyun Peng, Xuezhe Ma
  • for: 研究大语言模型在不同任务上的能力
  • methods: 使用不同的大语言模型和小语言模型进行比较
  • results: 发现大语言模型在细化的任务上弱于小语言模型,但在其他任务上可以和小语言模型相比或超越它们的能力。
    Abstract While recent studies have looked into the abilities of large language models in various benchmark tasks, including question generation, reading comprehension, multilingual and etc, there have been few studies looking into the controllability of large language models on generation tasks. We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities. After comparing large language models against state-of-the-start finetuned smaller models, we present a spectrum showing large language models falling behind, are comparable, or exceed the ability of smaller models. We conclude that **large language models struggle at meeting fine-grained hard constraints**.
    摘要 尽管近期研究已经考察了大语言模型在问题生成、阅读理解、多语言等多种基准任务上的能力,但针对大语言模型在生成任务上可控性的研究仍然很少。我们对多种基准进行了深入分析,其中包括一个具有不同粒度的句子规划基准。在将大语言模型与最先进的微调小模型进行比较后,我们给出了一个能力谱:大语言模型在不同设置下或落后于、或相当于、或超过小模型的能力。我们的结论是:**大语言模型难以满足细粒度的硬性约束**。

Continual Named Entity Recognition without Catastrophic Forgetting

  • paper_url: http://arxiv.org/abs/2310.14541
  • repo_url: https://github.com/bladedancer957/cpfd
  • paper_authors: Duzhen Zhang, Wei Cong, Jiahua Dong, Yahan Yu, Xiuyi Chen, Yonggang Zhang, Zhen Fang
  • for: 本研究旨在提高 continual named entity recognition (CNER) 中的模型更新策略,尤其是减轻 catastrophic forgetting 问题。
  • methods: 本文提出了一种池化特征蒸馏损失,在保留旧实体类型知识的同时学习新实体类型;此外,针对非实体类型提出了一种基于置信度的伪标签方法,利用旧模型对其预测实体类型;最后,提出了一种自适应重加权的类型均衡学习策略。
  • results: 我们使用三个不同的数据集,在十种 CNER 设置下进行了广泛实验。结果显示,我们的方法显著优于先前的最先进方法,Micro 和 Macro F1 分数分别平均提升 6.3% 和 8.0%。
    Abstract Continual Named Entity Recognition (CNER) is a burgeoning area, which involves updating an existing model by incorporating new entity types sequentially. Nevertheless, continual learning approaches are often severely afflicted by catastrophic forgetting. This issue is intensified in CNER due to the consolidation of old entity types from previous steps into the non-entity type at each step, leading to what is known as the semantic shift problem of the non-entity type. In this paper, we introduce a pooled feature distillation loss that skillfully navigates the trade-off between retaining knowledge of old entity types and acquiring new ones, thereby more effectively mitigating the problem of catastrophic forgetting. Additionally, we develop a confidence-based pseudo-labeling for the non-entity type, \emph{i.e.,} predicting entity types using the old model to handle the semantic shift of the non-entity type. Following the pseudo-labeling process, we suggest an adaptive re-weighting type-balanced learning strategy to handle the issue of biased type distribution. We carried out comprehensive experiments on ten CNER settings using three different datasets. The results illustrate that our method significantly outperforms prior state-of-the-art approaches, registering an average improvement of $6.3$\% and $8.0$\% in Micro and Macro F1 scores, respectively.
    摘要 持续命名实体识别(CNER)是一个新兴方向,其目标是通过依次引入新的实体类型来更新已有模型。然而,持续学习方法往往深受灾难性遗忘之苦;在CNER中,这一问题因每一步都会将先前步骤中的旧实体类型并入非实体类型而进一步加剧,导致所谓的非实体类型语义漂移问题。本文提出了一种池化特征蒸馏损失,在保留旧实体类型知识与学习新实体类型之间巧妙权衡,从而更有效地缓解灾难性遗忘。此外,我们为非实体类型设计了基于置信度的伪标签方法,即利用旧模型预测实体类型,以应对非实体类型的语义漂移。在伪标签之后,我们进一步提出了一种自适应重加权的类型均衡学习策略,以处理类型分布偏斜的问题。我们在三个数据集的十种CNER设置上进行了全面实验,结果表明,该方法显著优于此前的最先进方法,Micro与Macro F1分数分别平均提升6.3%和8.0%。
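
The confidence-based pseudo-labeling step for the non-entity type can be sketched as follows: tokens currently labeled "O" are re-labeled with the old model's prediction whenever that prediction is an old entity type and sufficiently confident. The threshold and label names are illustrative; the pooled feature distillation and re-weighting parts are not shown.

```python
import numpy as np

def pseudo_label(token_labels, old_model_probs, old_types, threshold=0.7):
    """token_labels: current labels (new types + 'O'); old_model_probs: [T, C] over the old label set."""
    new_labels = []
    for label, probs in zip(token_labels, old_model_probs):
        pred = int(np.argmax(probs))
        if label == "O" and old_types[pred] != "O" and probs[pred] >= threshold:
            new_labels.append(old_types[pred])     # recover an old entity type instead of 'O'
        else:
            new_labels.append(label)
    return new_labels

# Toy usage: the old model recognizes PER and LOC; the current step adds ORG.
old_types = ["O", "PER", "LOC"]
probs = np.array([[0.10, 0.85, 0.05],    # confident PER -> re-labeled
                  [0.90, 0.05, 0.05],    # confident O   -> unchanged
                  [0.20, 0.30, 0.50]])   # not confident enough
print(pseudo_label(["O", "O", "ORG"], probs, old_types))   # ['PER', 'O', 'ORG']
```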

Improving Seq2Seq Grammatical Error Correction via Decoding Interventions

  • paper_url: http://arxiv.org/abs/2310.14534
  • repo_url: https://github.com/Jacob-Zhou/gecdi
  • paper_authors: Houquan Zhou, Yumeng Liu, Zhenghua Li, Min Zhang, Bo Zhang, Chen Li, Ji Zhang, Fei Huang
  • for: 本文旨在提高语音识别领域中的语法错误检测性能。
  • methods: 本文提出了一种统一的解码优化框架,通过外部评价人来评估生成token的适应程度,并通过动态地影响下一个token的选择。
  • results: 经过广泛的实验表明,本文的方法可以与State-of-the-art方法竞争,并且在英文和中文数据集上具有优秀的性能。
    Abstract The sequence-to-sequence (Seq2Seq) approach has recently been widely used in grammatical error correction (GEC) and shows promising performance. However, the Seq2Seq GEC approach still suffers from two issues. First, a Seq2Seq GEC model can only be trained on parallel data, which, in GEC task, is often noisy and limited in quantity. Second, the decoder of a Seq2Seq GEC model lacks an explicit awareness of the correctness of the token being generated. In this paper, we propose a unified decoding intervention framework that employs an external critic to assess the appropriateness of the token to be generated incrementally, and then dynamically influence the choice of the next token. We discover and investigate two types of critics: a pre-trained left-to-right language model critic and an incremental target-side grammatical error detector critic. Through extensive experiments on English and Chinese datasets, our framework consistently outperforms strong baselines and achieves results competitive with state-of-the-art methods.
    摘要 序列到序列(Seq2Seq)方法近年来在语法纠错(GEC)领域得到广泛应用并表现出色。然而,Seq2Seq GEC 方法仍存在两个问题:其一,Seq2Seq GEC 模型只能在平行数据上训练,而 GEC 任务中的平行数据往往噪声较多且数量有限;其二,Seq2Seq GEC 模型的解码器对所生成词元的正确性缺乏显式感知。在这篇论文中,我们提出了一个统一的解码干预框架,利用外部评价器增量地评估待生成词元的合适程度,并据此动态影响下一个词元的选择。我们发现并研究了两类评价器:一个预训练的从左到右语言模型评价器,以及一个增量式的目标端语法错误检测器评价器。在英语和中文数据集上的大量实验表明,我们的框架持续优于强基线,并取得了与最先进方法相当的结果。
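A loose sketch of the decoding intervention idea described above: at each decoding step the generator's next-token distribution is re-scored by an external left-to-right language-model critic before the next token is chosen. The mixing weight `alpha`, greedy decoding, and the HuggingFace-style `.logits` interfaces are assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of critic-guided decoding for Seq2Seq GEC.
import torch

def intervened_decode(generator, critic_lm, src_ids, bos_id=0, eos_id=2, max_len=128, alpha=0.3):
    out = torch.full((1, 1), bos_id, dtype=torch.long)   # decoder starts from BOS
    for _ in range(max_len):
        gen_logits = generator(src_ids, decoder_input_ids=out).logits[:, -1, :]
        critic_logits = critic_lm(out).logits[:, -1, :]   # critic scores the prefix so far
        # combine generator and critic opinions about the next token
        scores = torch.log_softmax(gen_logits, -1) + alpha * torch.log_softmax(critic_logits, -1)
        next_tok = scores.argmax(dim=-1, keepdim=True)
        out = torch.cat([out, next_tok], dim=-1)
        if next_tok.item() == eos_id:
            break
    return out
```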

Dual-Feedback Knowledge Retrieval for Task-Oriented Dialogue Systems

  • paper_url: http://arxiv.org/abs/2310.14528
  • repo_url: None
  • paper_authors: Tianyuan Shi, Liangzhi Li, Zijian Lin, Tao Yang, Xiaojun Quan, Qifan Wang
  • for: 提高终端任务对话系统的成功率,通过快速选择相关信息满足用户请求。
  • methods: 提出了一种 Retriever-Generator 架构,通过使用搜寻器 retrieve 相关知识,并使用生成器生成系统回答。另外,由于搜寻器没有培训标签,我们提议使用生成器的反馈作为 Pseudo-labels 来培训搜寻器。
  • results: 在三个 benchmark 数据集上进行实验, results 表明我们的方法可以在任务对话任务中显著提高性能。
    Abstract Efficient knowledge retrieval plays a pivotal role in ensuring the success of end-to-end task-oriented dialogue systems by facilitating the selection of relevant information necessary to fulfill user requests. However, current approaches generally integrate knowledge retrieval and response generation, which poses scalability challenges when dealing with extensive knowledge bases. Taking inspiration from open-domain question answering, we propose a retriever-generator architecture that harnesses a retriever to retrieve pertinent knowledge and a generator to generate system responses.~Due to the lack of retriever training labels, we propose relying on feedback from the generator as pseudo-labels to train the retriever. To achieve this, we introduce a dual-feedback mechanism that generates both positive and negative feedback based on the output of the generator. Our method demonstrates superior performance in task-oriented dialogue tasks, as evidenced by experimental results on three benchmark datasets.
    摘要 高效的知识检索对端到端任务型对话系统的成功起着关键作用,因为它有助于选择满足用户请求所需的相关信息。然而,现有方法通常将知识检索与回复生成耦合在一起,在面对大规模知识库时会带来可扩展性挑战。受开放域问答的启发,我们提出了一种检索器-生成器架构,利用检索器检索相关知识,并由生成器生成系统回复。由于检索器缺乏训练标签,我们提议将生成器的反馈作为伪标签来训练检索器。为此,我们引入了一种双重反馈机制,根据生成器的输出同时产生正向和负向反馈。在三个基准数据集上的实验结果表明,我们的方法在任务型对话任务中表现出色。
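One way to picture the dual-feedback mechanism above is the following rough sketch: each candidate knowledge record is scored by how likely the generator finds the gold response given that record; the best-scoring record serves as a positive pseudo-label and the worst as a negative for a contrastive retriever loss. The function names, `retriever.score` interface, and loss form are illustrative assumptions only.

```python
# Hypothetical sketch of generator-feedback pseudo-labels for retriever training.
import torch
import torch.nn.functional as F

def dual_feedback_step(retriever, generator_loglik, dialogue, gold_response, candidates):
    # generator feedback: log p(gold_response | dialogue, candidate) for each candidate record
    feedback = torch.tensor([generator_loglik(dialogue, c, gold_response) for c in candidates])
    pos, neg = feedback.argmax().item(), feedback.argmin().item()
    sims = retriever.score(dialogue, candidates)          # retriever similarities, shape (num_candidates,)
    log_probs = F.log_softmax(sims, dim=0)
    # positive feedback pulls the best candidate up, negative feedback pushes the worst down
    loss = -log_probs[pos] + log_probs[neg]
    return loss
```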

PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter

  • paper_url: http://arxiv.org/abs/2310.18347
  • repo_url: None
  • paper_authors: Haoyan Yang, Zhitao Li, Yong Zhang, Jianzong Wang, Ning Cheng, Ming Li, Jing Xiao
  • for: 提高 Retrieval Question Answering (ReQA) 任务的性能,使得可以使用适应性的 Large Language Models (LLMs) 作为生成器。
  • methods: 提出了一种可训练的插拔式奖励驱动上下文适配器(PRCA),位于检索器与生成器之间,通过强化学习阶段的奖励最大化来精炼检索到的信息,从而提升 ReQA 性能。
  • results: 经验表明,PRCA可以在三个数据集上提高ReQA性能,最高提高20%,表明PRCA在LLMs时代具有显著的潜在价值。
    Abstract The Retrieval Question Answering (ReQA) task employs the retrieval-augmented framework, composed of a retriever and generator. The generator formulates the answer based on the documents retrieved by the retriever. Incorporating Large Language Models (LLMs) as generators is beneficial due to their advanced QA capabilities, but they are typically too large to be fine-tuned with budget constraints while some of them are only accessible via APIs. To tackle this issue and further improve ReQA performance, we propose a trainable Pluggable Reward-Driven Contextual Adapter (PRCA), keeping the generator as a black box. Positioned between the retriever and generator in a Pluggable manner, PRCA refines the retrieved information by operating in a token-autoregressive strategy via maximizing rewards of the reinforcement learning phase. Our experiments validate PRCA's effectiveness in enhancing ReQA performance on three datasets by up to 20% improvement to fit black-box LLMs into existing frameworks, demonstrating its considerable potential in the LLMs era.
    摘要 检索式问答(ReQA)任务采用检索增强框架,由检索器和生成器组成,生成器根据检索器返回的文档来生成答案。大语言模型(LLM)作为生成器具备出色的问答能力,但它们通常规模过大,在预算受限时难以微调,而且其中一些只能通过 API 访问。为了解决这一问题并进一步提升 ReQA 性能,我们提出了一种可训练的插拔式奖励驱动上下文适配器(PRCA),并将生成器视为黑盒。PRCA 以可插拔的方式位于检索器与生成器之间,通过在强化学习阶段最大化奖励、以词元自回归策略精炼检索到的信息。实验证明,PRCA 在三个数据集上将 ReQA 性能最高提升 20%,使黑盒 LLM 能够融入现有框架,显示了其在大语言模型时代的可观潜力。

Rethinking Word-Level Auto-Completion in Computer-Aided Translation

  • paper_url: http://arxiv.org/abs/2310.14523
  • repo_url: https://github.com/galaxychen/wlac-joint-training
  • paper_authors: Xingyu Chen, Lemao Liu, Guoping Huang, Zhirui Zhang, Mingming Yang, Shuming Shi, Rui Wang
  • for: 提高 Computer-Assisted Translation 中 Word-Level Auto-Completion 的性能。
  • methods: 提出一种基于可测量标准的方法,以确定合适的自动完成词选择。
  • results: 通过实验表明,提议的方法可以在不同的 encoder-based 架构上提高 WLAC 性能,并且使用较小的模型大小。
    Abstract Word-Level Auto-Completion (WLAC) plays a crucial role in Computer-Assisted Translation. It aims at providing word-level auto-completion suggestions for human translators. While previous studies have primarily focused on designing complex model architectures, this paper takes a different perspective by rethinking the fundamental question: what kind of words are good auto-completions? We introduce a measurable criterion to answer this question and discover that existing WLAC models often fail to meet this criterion. Building upon this observation, we propose an effective approach to enhance WLAC performance by promoting adherence to the criterion. Notably, the proposed approach is general and can be applied to various encoder-based architectures. Through extensive experiments, we demonstrate that our approach outperforms the top-performing system submitted to the WLAC shared tasks in WMT2022, while utilizing significantly smaller model sizes.
    摘要

QUDEVAL: The Evaluation of Questions Under Discussion Discourse Parsing

  • paper_url: http://arxiv.org/abs/2310.14520
  • repo_url: https://github.com/lingchensanwen/qudeval
  • paper_authors: Yating Wu, Ritika Mangla, Greg Durrett, Junyi Jessy Li
  • for: 本研究旨在提供一个自动评估QUD结构的框架,以满足语言模型的进一步发展。
  • methods: 本研究使用了一个新的评估数据集——QUDeval,并将QUD的理论限制实现为具体协议。
  • results: 研究发现,现有的语言模型仍然具有实现QUD结构的困难度,并且现有的评估指标不具体地反映评估器的质量。 however, human-authored QUDs are scored highly by human evaluators, suggesting headroom for further progress on language modeling.
    Abstract Questions Under Discussion (QUD) is a versatile linguistic framework in which discourse progresses as continuously asking questions and answering them. Automatic parsing of a discourse to produce a QUD structure thus entails a complex question generation task: given a document and an answer sentence, generate a question that satisfies linguistic constraints of QUD and can be grounded in an anchor sentence in prior context. These questions are known to be curiosity-driven and open-ended. This work introduces the first framework for the automatic evaluation of QUD parsing, instantiating the theoretical constraints of QUD in a concrete protocol. We present QUDeval, a dataset of fine-grained evaluation of 2,190 QUD questions generated from both fine-tuned systems and LLMs. Using QUDeval, we show that satisfying all constraints of QUD is still challenging for modern LLMs, and that existing evaluation metrics poorly approximate parser quality. Encouragingly, human-authored QUDs are scored highly by our human evaluators, suggesting that there is headroom for further progress on language modeling to improve both QUD parsing and QUD evaluation.
    摘要 问题下的讨论(QUD)是一种灵活的语言学框架,在该框架下,语篇通过不断提出问题并回答问题来推进。因此,自动解析语篇以生成 QUD 结构涉及一项复杂的问题生成任务:给定一个文档和一个答案句,生成一个满足 QUD 语言学约束、且可锚定在先前上下文中某个句子上的问题。这类问题具有好奇心驱动且开放式的特点。本工作提出了首个针对 QUD 解析的自动评估框架,将 QUD 的理论约束落实为具体的评估协议。我们构建了 QUDeval 数据集,其中包含对 2,190 个 QUD 问题的细粒度评估,这些问题分别来自微调系统和大语言模型(LLM)。基于 QUDeval,我们发现现代 LLM 仍然难以满足 QUD 的全部约束,而且现有评估指标也难以准确反映解析器的质量。令人鼓舞的是,人工撰写的 QUD 得到了人类评估者的高分,这表明语言建模在 QUD 解析与 QUD 评估方面仍有提升空间。

Turn-Level Active Learning for Dialogue State Tracking

  • paper_url: http://arxiv.org/abs/2310.14513
  • repo_url: None
  • paper_authors: Zihan Zhang, Meng Fang, Fanghua Ye, Ling Chen, Mohammad-Reza Namazi-Rad
  • for: 这 paper 的目的是提出一种新的 turn-level active learning 框架,用于对话系统中的对话数据分类。
  • methods: 该框架使用选择性的注释方法,以优化对话数据的注释效率。
  • results: 实验结果表明,该方法可以在有限的标注预算下实现相对比较好的对话数据分类性能,并且可以减少对话数据的注释量。
    Abstract Dialogue state tracking (DST) plays an important role in task-oriented dialogue systems. However, collecting a large amount of turn-by-turn annotated dialogue data is costly and inefficient. In this paper, we propose a novel turn-level active learning framework for DST to actively select turns in dialogues to annotate. Given the limited labelling budget, experimental results demonstrate the effectiveness of selective annotation of dialogue turns. Additionally, our approach can effectively achieve comparable DST performance to traditional training approaches with significantly less annotated data, which provides a more efficient way to annotate new dialogue data.
    摘要 对话状态跟踪(DST)在任务型对话系统中扮演着重要角色。然而,收集大量逐回合标注的对话数据成本高昂且效率低下。在这篇论文中,我们提出了一种新颖的回合级主动学习框架,用于有选择地挑选对话中需要标注的回合。实验结果表明,在标注预算有限的情况下,选择性标注对话回合是有效的;我们的方法能以显著更少的标注数据达到与传统训练方法相当的 DST 性能,从而为标注新的对话数据提供了更高效的途径。

CITB: A Benchmark for Continual Instruction Tuning

  • paper_url: http://arxiv.org/abs/2310.14510
  • repo_url: None
  • paper_authors: Zihan Zhang, Meng Fang, Ling Chen, Mohammad-Reza Namazi-Rad
  • for: 本研究旨在解决连续学习(CL)任务中的指令调整问题,以便更好地掌握和应用自然语言指令。
  • methods: 本研究采用了现有的连续学习方法,并对其进行了修改和调整,以便更好地适应不同类型的语言指令。
  • results: 研究发现,现有的连续学习方法不充分利用了自然语言指令的丰富性,并且在不断调整模型的情况下,可以获得类似或更好的结果。
    Abstract Continual learning (CL) is a paradigm that aims to replicate the human ability to learn and accumulate knowledge continually without forgetting previous knowledge and transferring it to new tasks. Recent instruction tuning (IT) involves fine-tuning models to make them more adaptable to solving NLP tasks in general. However, it is still uncertain how instruction tuning works in the context of CL tasks. This challenging yet practical problem is formulated as Continual Instruction Tuning (CIT). In this work, we establish a CIT benchmark consisting of learning and evaluation protocols. We curate two long dialogue task streams of different types, InstrDialog and InstrDialog++, to study various CL methods systematically. Our experiments show that existing CL methods do not effectively leverage the rich natural language instructions, and fine-tuning an instruction-tuned model sequentially can yield similar or better results. We further explore different aspects that might affect the learning of CIT. We hope this benchmark will facilitate more research in this direction.
    摘要

EXPLAIN, EDIT, GENERATE: Rationale-Sensitive Counterfactual Data Augmentation for Multi-hop Fact Verification

  • paper_url: http://arxiv.org/abs/2310.14508
  • repo_url: https://github.com/aaandy-zhu/race
  • paper_authors: Yingjie Zhu, Jiasheng Si, Yibo Zhao, Haiyang Zhu, Deyu Zhou, Yulan He
  • for: 提高自然语言处理中的多跳事实验证性能
  • methods: 使用 Explain-Edit-Generate 架构生成多样且流畅的反事实文本,并通过检查与过滤模块依据逻辑关系和翻转标签对反事实数据进行规范
  • results: 所提方法在反事实生成上优于基线,能够在不破坏逻辑关系的前提下生成语言多样的反事实文本
    Abstract Automatic multi-hop fact verification task has gained significant attention in recent years. Despite impressive results, these well-designed models perform poorly on out-of-domain data. One possible solution is to augment the training data with counterfactuals, which are generated by minimally altering the causal features of the original data. However, current counterfactual data augmentation techniques fail to handle multi-hop fact verification due to their incapability to preserve the complex logical relationships within multiple correlated texts. In this paper, we overcome this limitation by developing a rationale-sensitive method to generate linguistically diverse and label-flipping counterfactuals while preserving logical relationships. In specific, the diverse and fluent counterfactuals are generated via an Explain-Edit-Generate architecture. Moreover, the checking and filtering modules are proposed to regularize the counterfactual data with logical relations and flipped labels. Experimental results show that the proposed approach outperforms the SOTA baselines and can generate linguistically diverse counterfactual data without disrupting their logical relationships.
    摘要 自动多跳事实验证任务近年来受到广泛关注。尽管这些精心设计的模型取得了令人瞩目的结果,但它们在域外数据上表现不佳。一种可能的解决方案是用反事实样本扩充训练数据,这些样本通过对原始数据的因果特征进行最小改动而生成。然而,现有的反事实数据增强技术无法处理多跳事实验证,因为它们难以保持多段相关文本之间复杂的逻辑关系。在本文中,我们通过开发一种对理由敏感的方法来克服这一限制,该方法在保持逻辑关系的同时生成语言多样且标签翻转的反事实样本。具体而言,多样且流畅的反事实样本通过 Explain-Edit-Generate 架构生成;此外,我们提出了检查与过滤模块,依据逻辑关系和翻转标签对反事实数据进行规范。实验结果表明,我们的方法优于现有的最先进基线,并能在不破坏逻辑关系的前提下生成语言多样的反事实数据。

Sentiment analysis with adaptive multi-head attention in Transformer

  • paper_url: http://arxiv.org/abs/2310.14505
  • repo_url: None
  • paper_authors: Fanfei Meng, David Demeter
  • for: 本文旨在提出一种基于注意力机制的电影评论文档情感识别框架。
  • methods: 本文使用了一种自适应多头注意力架构(AdaptAttn),其注意力头数量根据句子长度自适应调整。
  • results: 实验结果表明,在 Stanford 大型电影评论数据集上,本模型的 F1 分数与基线模型几乎相当。
    Abstract We propose a novel framework based on the attention mechanism to identify the sentiment of a movie review document. Previous efforts on deep neural networks with attention mechanisms focus on encoder and decoder with fixed numbers of multi-head attention. Therefore, we need a mechanism to stop the attention process automatically if no more useful information can be read from the memory.In this paper, we propose an adaptive multi-head attention architecture (AdaptAttn) which varies the number of attention heads based on length of sentences. AdaptAttn has a data preprocessing step where each document is classified into any one of the three bins small, medium or large based on length of the sentence. The document classified as small goes through two heads in each layer, the medium group passes four heads and the large group is processed by eight heads. We examine the merit of our model on the Stanford large movie review dataset. The experimental results show that the F1 score from our model is on par with the baseline model.
    摘要 我们提出了一种基于注意机制的新框架,用于识别电影评论文档中的情感。先前的各种深度神经网络与注意机制实验均采用固定数量的多头注意。因此,我们需要一种机制来自动停止注意过程,以避免继续读取内存中的无用信息。在这篇论文中,我们提出了一种自适应多头注意架构(AdaptAttn),其中注意头数量根据句子长度进行变化。AdaptAttn具有一个数据预处理步骤,其中每个文档根据句子长度被分类为小、中、大三类中的任一类。小类文档通过每层两个头进行处理,中类文档通过每层四个头进行处理,大类文档通过每层八个头进行处理。我们对斯坦福大学电影评论数据集进行实验,结果显示我们的模型与基线模型的F1分数几乎相同。
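The length-bucketed head selection described above can be sketched as follows: documents are binned as small, medium, or large and routed through attention layers configured with 2, 4, or 8 heads respectively. The bin boundaries and the encoder wrapper below are assumptions made for illustration, not the paper's exact settings.

```python
# Minimal sketch of adaptive head-count selection by document length.
import torch.nn as nn

def heads_for_length(num_tokens, small_max=64, medium_max=256):
    if num_tokens <= small_max:
        return 2          # small bin: two heads per layer
    if num_tokens <= medium_max:
        return 4          # medium bin: four heads per layer
    return 8              # large bin: eight heads per layer

def build_encoder(num_tokens, d_model=256, num_layers=4):
    heads = heads_for_length(num_tokens)
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)
```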

Diversify Question Generation with Retrieval-Augmented Style Transfer

  • paper_url: http://arxiv.org/abs/2310.14503
  • repo_url: https://github.com/gouqi666/rast
  • paper_authors: Qi Gou, Zehua Xia, Bowen Yu, Haiyang Yu, Fei Huang, Yongbin Li, Nguyen Cam-Tu
  • for: 提高问题生成的表达多样性,使得问题更能够表达出人类语言的多样性和自然性。
  • methods: 利用不同模板的风格进行问题生成,并通过可优化的强化学习法找到最佳的模板。
  • results: 在多样性和一致性两个指标下,RAST方法比前一代多样性驱动基线方法表现出色,同时保持了与模板的一致性。
    Abstract Given a textual passage and an answer, humans are able to ask questions with various expressions, but this ability is still challenging for most question generation (QG) systems. Existing solutions mainly focus on the internal knowledge within the given passage or the semantic word space for diverse content planning. These methods, however, have not considered the potential of external knowledge for expression diversity. To bridge this gap, we propose RAST, a framework for Retrieval-Augmented Style Transfer, where the objective is to utilize the style of diverse templates for question generation. For training RAST, we develop a novel Reinforcement Learning (RL) based approach that maximizes a weighted combination of diversity reward and consistency reward. Here, the consistency reward is computed by a Question-Answering (QA) model, whereas the diversity reward measures how much the final output mimics the retrieved template. Experimental results show that our method outperforms previous diversity-driven baselines on diversity while being comparable in terms of consistency scores. Our code is available at https://github.com/gouqi666/RAST.
    摘要 给定一段文本和一个答案,人类能够用多种表达方式提出问题,但对大多数问题生成(QG)系统而言,这一能力仍然具有挑战性。现有方案主要关注给定文本内部的知识或语义词空间以进行多样化的内容规划,却没有考虑外部知识对表达多样性的潜在价值。为弥合这一差距,我们提出了 RAST,一个检索增强风格迁移框架,其目标是利用多样化模板的风格来生成问题。为训练 RAST,我们设计了一种新的基于强化学习(RL)的方法,最大化多样性奖励与一致性奖励的加权组合:其中一致性奖励由一个问答(QA)模型计算,而多样性奖励衡量最终输出对检索到的模板的模仿程度。实验结果表明,我们的方法在多样性上优于此前以多样性为导向的基线方法,同时在一致性得分上保持相当水平。代码见 https://github.com/gouqi666/RAST。

Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

  • paper_url: http://arxiv.org/abs/2310.14491
  • repo_url: https://github.com/yifan-h/mechanisticprobe
  • paper_authors: Yifan Hou, Jiaoda Li, Yu Fei, Alessandro Stolfo, Wangchunshu Zhou, Guangtao Zeng, Antoine Bosselut, Mrinmaya Sachan
  • for: 本研究旨在解释语言模型(LM)是如何进行多步逻辑思维的。
  • methods: 本研究使用了一种新的探测方法(名为 MechanisticProbe),可以从模型的注意模式中恢复出逻辑树。
  • results: 研究发现, MechanisticProbe 能够在大多数示例中从模型的注意模式中恢复出逻辑树信息,表明LM在许多情况下实际上是通过多步逻辑过程来完成任务。
    Abstract Recent work has shown that language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities. However, it is unclear whether LMs perform these tasks by cheating with answers memorized from pretraining corpus, or, via a multi-step reasoning mechanism. In this paper, we try to answer this question by exploring a mechanistic interpretation of LMs for multi-step reasoning tasks. Concretely, we hypothesize that the LM implicitly embeds a reasoning tree resembling the correct reasoning process within it. We test this hypothesis by introducing a new probing approach (called MechanisticProbe) that recovers the reasoning tree from the model's attention patterns. We use our probe to analyze two LMs: GPT-2 on a synthetic task (k-th smallest element), and LLaMA on two simple language-based reasoning tasks (ProofWriter & AI2 Reasoning Challenge). We show that MechanisticProbe is able to detect the information of the reasoning tree from the model's attentions for most examples, suggesting that the LM indeed is going through a process of multi-step reasoning within its architecture in many cases.
    摘要

Text Fact Transfer

  • paper_url: http://arxiv.org/abs/2310.14486
  • repo_url: https://github.com/nbalepur/text-fact-transfer
  • paper_authors: Nishant Balepur, Jie Huang, Kevin Chen-Chuan Chang
  • for: 控制文本的样式,包括将过去的新闻更新为当前事件和将教育材料重新用途。
  • methods: 提出了文本事实传递任务,即将文本的事实内容传递到不同话题中,保持原始文本的样式不变。
  • results: 通过设计ModQGA框架,可以准确地传递文本的事实内容,而不是改变原始文本的样式。
    Abstract Text style transfer is a prominent task that aims to control the style of text without inherently changing its factual content. To cover more text modification applications, such as adapting past news for current events and repurposing educational materials, we propose the task of text fact transfer, which seeks to transfer the factual content of a source text between topics without modifying its style. We find that existing language models struggle with text fact transfer, due to their inability to preserve the specificity and phrasing of the source text, and tendency to hallucinate errors. To address these issues, we design ModQGA, a framework that minimally modifies a source text with a novel combination of end-to-end question generation and specificity-aware question answering. Through experiments on four existing datasets adapted for text fact transfer, we show that ModQGA can accurately transfer factual content without sacrificing the style of the source text.
    摘要

“Why Should I Review This Paper?” Unifying Semantic, Topic, and Citation Factors for Paper-Reviewer Matching

  • paper_url: http://arxiv.org/abs/2310.14483
  • repo_url: https://github.com/plubplub1/bountyfarm
  • paper_authors: Yu Zhang, Yanzhen Shen, Xiusi Chen, Bowen Jin, Jiawei Han
  • for: This paper proposes a unified model for paper-reviewer matching that jointly captures semantic, topic, and citation factors to improve the accuracy of matching reviewers with papers.
  • methods: The proposed UniPR model uses a contextualized language model backbone to learn common knowledge and introduces instruction tuning to characterize the uniqueness of each factor by producing factor-aware paper embeddings.
  • results: Experiments on four datasets across different fields consistently validate the effectiveness of the UniPR model in comparison with state-of-the-art paper-reviewer matching methods and scientific pre-trained language models.
    Abstract As many academic conferences are overwhelmed by a rapidly increasing number of paper submissions, automatically finding appropriate reviewers for each submission becomes a more urgent need than ever. Various factors have been considered by previous attempts on this task to measure the expertise relevance between a paper and a reviewer, including whether the paper is semantically close to, shares topics with, and cites previous papers of the reviewer. However, the majority of previous studies take only one of these factors into account, leading to an incomprehensive evaluation of paper-reviewer relevance. To bridge this gap, in this paper, we propose a unified model for paper-reviewer matching that jointly captures semantic, topic, and citation factors. In the unified model, a contextualized language model backbone is shared by all factors to learn common knowledge, while instruction tuning is introduced to characterize the uniqueness of each factor by producing factor-aware paper embeddings. Experiments on four datasets (one of which is newly contributed by us) across different fields, including machine learning, computer vision, information retrieval, and data mining, consistently validate the effectiveness of our proposed UniPR model in comparison with state-of-the-art paper-reviewer matching methods and scientific pre-trained language models.
    摘要 随着论文投稿数量快速增长,许多学术会议不堪重负,为每篇投稿自动寻找合适的审稿人因而成为一项愈发紧迫的任务。此前的工作从多种因素来衡量论文与审稿人之间的专业相关性,包括论文与审稿人既往论文在语义上的相近程度、话题上的重合程度以及引用关系。然而,此前的大多数研究只考虑了其中一种因素,导致对论文-审稿人相关性的评估不够全面。为弥合这一差距,本文提出了一个统一的论文-审稿人匹配模型,可同时刻画语义、话题与引用三类因素。在该统一模型中,所有因素共享一个上下文化语言模型主干以学习共同知识,同时引入指令微调来刻画每个因素的独特性,生成因素感知的论文嵌入。在涵盖机器学习、计算机视觉、信息检索和数据挖掘等不同领域的四个数据集(其中一个由我们新构建)上的实验,一致验证了所提出的 UniPR 模型相较于最先进的论文-审稿人匹配方法与科学预训练语言模型的有效性。

DetectGPT-SC: Improving Detection of Text Generated by Large Language Models through Self-Consistency with Masked Predictions

  • paper_url: http://arxiv.org/abs/2310.14479
  • repo_url: None
  • paper_authors: Rongsheng Wang, Qi Li, Sihong Xie
  • for: 检测AI生成文本是否是人类生成的文本
  • methods: 使用自适应性预测来检测AI生成文本的自我一致性
  • results: 在不同任务中,DetectGPT-SC超过当前状态的检测性能
    Abstract General large language models (LLMs) such as ChatGPT have shown remarkable success, but it has also raised concerns among people about the misuse of AI-generated texts. Therefore, an important question is how to detect whether the texts are generated by ChatGPT or by humans. Existing detectors are built on the assumption that there is a distribution gap between human-generated and AI-generated texts. These gaps are typically identified using statistical information or classifiers. In contrast to prior research methods, we find that large language models such as ChatGPT exhibit strong self-consistency in text generation and continuation. Self-consistency capitalizes on the intuition that AI-generated texts can still be reasoned with by large language models using the same logical reasoning when portions of the texts are masked, which differs from human-generated texts. Using this observation, we subsequently proposed a new method for AI-generated texts detection based on self-consistency with masked predictions to determine whether a text is generated by LLMs. This method, which we call DetectGPT-SC. We conducted a series of experiments to evaluate the performance of DetectGPT-SC. In these experiments, we employed various mask scheme, zero-shot, and simple prompt for completing masked texts and self-consistency predictions. The results indicate that DetectGPT-SC outperforms the current state-of-the-art across different tasks.
    摘要 以 ChatGPT 为代表的通用大语言模型(LLM)已经取得了令人瞩目的成功,但也引发了人们对 AI 生成文本被滥用的担忧。因此,一个重要的问题是如何检测一段文本是由 ChatGPT 还是由人类生成的。现有检测器建立在人类生成文本与 AI 生成文本之间存在分布差异这一假设之上,这些差异通常通过统计信息或分类器来识别。与先前的研究方法不同,我们发现 ChatGPT 等大语言模型在文本生成与续写中表现出很强的自一致性。自一致性基于这样一种直觉:当文本的部分内容被掩盖时,大语言模型仍能用相同的逻辑推理对 AI 生成的文本进行补全,而人类生成的文本则不然。基于这一观察,我们提出了一种基于带掩码预测的自一致性的 AI 生成文本检测新方法,用于判断一段文本是否由 LLM 生成,我们称之为 DetectGPT-SC。我们进行了一系列实验来评估 DetectGPT-SC 的性能,其中采用了多种掩码方案、零样本设置以及用于补全掩码文本和自一致性预测的简单提示。结果表明,DetectGPT-SC 在不同任务上均优于当前最先进的方法。
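A simplified sketch of self-consistency scoring with masked predictions: random tokens are masked, a language model fills them back in, and the fraction of fill-ins that agree with the original text serves as the score (AI-generated text tends to be reconstructed more consistently). The mask ratio, the `fill_masked` interface (assumed to return a position-to-token mapping), and the decision threshold are assumptions, not the paper's exact scheme.

```python
# Hypothetical sketch of self-consistency scoring for AI-text detection.
import random

def self_consistency_score(text, tokenize, fill_masked, mask_ratio=0.15, seed=0):
    rng = random.Random(seed)
    tokens = tokenize(text)
    masked_idx = sorted(rng.sample(range(len(tokens)), max(1, int(mask_ratio * len(tokens)))))
    masked = [("[MASK]" if i in set(masked_idx) else t) for i, t in enumerate(tokens)]
    predictions = fill_masked(masked, masked_idx)      # assumed: {position: predicted token}
    agree = sum(predictions[i] == tokens[i] for i in masked_idx)
    return agree / len(masked_idx)                     # higher => more self-consistent

def detect_ai_generated(text, tokenize, fill_masked, threshold=0.6):
    return self_consistency_score(text, tokenize, fill_masked) > threshold
```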

GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding

  • paper_url: http://arxiv.org/abs/2310.14478
  • repo_url: https://github.com/knowledge-computing/geolm
  • paper_authors: Zekun Li, Wenxuan Zhou, Yao-Yi Chiang, Muhao Chen
  • for: 该论文旨在提高自然语言处理和地理科学之间的交互,通过充分利用大规模可用的地理数据库,如OpenStreetMap。
  • methods: 该论文提出了一种基于地理空间信息的语言模型 GeoLM,用于增强对地理实体的理解。GeoLM 以地理实体提及为锚点,将文本语料中的语言信息与从地理数据库中提取的地理空间信息相连接,并通过对比学习和掩码语言建模联结这两类上下文。它还包含一种空间坐标编码机制,用于编码距离与方向关系,以捕捉地理空间上下文。
  • results: 实验表明,GeoLM 能够有效支持地名识别、地名链接、关系抽取和地理实体类型分类等任务,弥合了自然语言处理与地理空间科学之间的鸿沟。代码可在 https://github.com/knowledge-computing/geolm 下载。
    Abstract Humans subconsciously engage in geospatial reasoning when reading articles. We recognize place names and their spatial relations in text and mentally associate them with their physical locations on Earth. Although pretrained language models can mimic this cognitive process using linguistic context, they do not utilize valuable geospatial information in large, widely available geographical databases, e.g., OpenStreetMap. This paper introduces GeoLM, a geospatially grounded language model that enhances the understanding of geo-entities in natural language. GeoLM leverages geo-entity mentions as anchors to connect linguistic information in text corpora with geospatial information extracted from geographical databases. GeoLM connects the two types of context through contrastive learning and masked language modeling. It also incorporates a spatial coordinate embedding mechanism to encode distance and direction relations to capture geospatial context. In the experiment, we demonstrate that GeoLM exhibits promising capabilities in supporting toponym recognition, toponym linking, relation extraction, and geo-entity typing, which bridge the gap between natural language processing and geospatial sciences. The code is publicly available at https://github.com/knowledge-computing/geolm.
    摘要 人们在阅读文章时会下意识地进行地理空间推理:我们识别文本中的地名及其空间关系,并在脑海中将其与地球上的实际位置联系起来。尽管预训练语言模型可以借助语言上下文模仿这一认知过程,但它们没有利用大规模公开地理数据库(例如 OpenStreetMap)中宝贵的地理空间信息。本文介绍了 GeoLM,一种具备地理空间基础的语言模型,用于增强对自然语言中地理实体的理解。GeoLM 以地理实体提及为锚点,将文本语料中的语言信息与从地理数据库中提取的地理空间信息相连接,并通过对比学习和掩码语言建模联结这两类上下文。它还引入了空间坐标编码机制,用于编码距离与方向关系,以捕捉地理空间上下文。实验表明,GeoLM 在地名识别、地名链接、关系抽取和地理实体类型分类等任务上展现出可观的能力,弥合了自然语言处理与地理空间科学之间的鸿沟。代码公开于 https://github.com/knowledge-computing/geolm。
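The contrastive objective that aligns a geo-entity's linguistic context embedding with its geospatial context embedding can be pictured as an InfoNCE-style loss over in-batch negatives. The encoders, temperature, and batching below are assumptions; GeoLM's full objective also includes masked language modeling, which this sketch omits.

```python
# Illustrative sketch of a text-geo contrastive alignment loss.
import torch
import torch.nn.functional as F

def geo_text_contrastive_loss(text_emb, geo_emb, temperature=0.07):
    # text_emb, geo_emb: (batch, dim) embeddings of the same geo-entities from the two contexts
    text_emb = F.normalize(text_emb, dim=-1)
    geo_emb = F.normalize(geo_emb, dim=-1)
    logits = text_emb @ geo_emb.t() / temperature   # pairwise similarities; matches on the diagonal
    targets = torch.arange(text_emb.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```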

cs.LG - 2023-10-23

Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach

  • paper_url: http://arxiv.org/abs/2310.15411
  • repo_url: None
  • paper_authors: Yinan Li, Chicheng Zhang
  • for: Computationally and label efficient PAC active learning of $d$-dimensional halfspaces with Tsybakov Noise.
  • methods: Nonconvex optimization-based algorithm with a label complexity of $\tilde{O}(d (\frac{1}{\epsilon})^{\frac{8-6\alpha}{3\alpha-1}})$.
  • results: Low excess error guarantee, narrowing down the gap between the label complexities of the previously known efficient passive or active algorithms and the information-theoretic lower bound.
    Abstract We study the problem of computationally and label efficient PAC active learning $d$-dimensional halfspaces with Tsybakov Noise~\citep{tsybakov2004optimal} under structured unlabeled data distributions. Inspired by~\cite{diakonikolas2020learning}, we prove that any approximate first-order stationary point of a smooth nonconvex loss function yields a halfspace with a low excess error guarantee. In light of the above structural result, we design a nonconvex optimization-based algorithm with a label complexity of $\tilde{O}(d (\frac{1}{\epsilon})^{\frac{8-6\alpha}{3\alpha-1}})$\footnote{In the main body of this work, we use $\tilde{O}(\cdot), \tilde{\Theta}(\cdot)$ to hide factors of the form $\polylog(d, \frac{1}{\epsilon}, \frac{1}{\delta})$}, under the assumption that the Tsybakov noise parameter $\alpha \in (\frac13, 1]$, which narrows down the gap between the label complexities of the previously known efficient passive or active algorithms~\citep{diakonikolas2020polynomial,zhang2021improved} and the information-theoretic lower bound in this setting.
    摘要 我们研究在结构化未标注数据分布下、带有 Tsybakov 噪声的 $d$ 维半空间的计算高效且标签高效的 PAC 主动学习问题。受 \cite{diakonikolas2020learning} 启发,我们证明光滑非凸损失函数的任何近似一阶驻点都对应一个具有低超额误差保证的半空间。基于这一结构性结果,我们设计了一个基于非凸优化的算法,其标签复杂度为 $\tilde{O}(d (\frac{1}{\epsilon})^{\frac{8-6\alpha}{3\alpha-1}})$\footnote{在本文中,我们使用 $\tilde{O}(\cdot), \tilde{\Theta}(\cdot)$ 来隐藏形如 $\polylog(d, \frac{1}{\epsilon}, \frac{1}{\delta})$ 的因子},其中假设 Tsybakov 噪声参数 $\alpha \in (\frac13, 1]$。这缩小了此前已知的高效被动或主动算法 \cite{diakonikolas2020polynomial,zhang2021improved} 与该设定下信息论下界之间的标签复杂度差距。

MEMPSEP III. A machine learning-oriented multivariate data set for forecasting the Occurrence and Properties of Solar Energetic Particle Events using a Multivariate Ensemble Approach

  • paper_url: http://arxiv.org/abs/2310.15390
  • repo_url: None
  • paper_authors: Kimberly Moreland, Maher Dayeh, Hazel M. Bain, Subhamoy Chatterjee, Andres Munoz-Jaramillo, Samuel Hart
  • for: 这个论文是用于描述一个新的多变量数据集,用于预测太阳生成的高能粒子事件和其后的性质。
  • methods: 该数据集使用了多颗航天器的原位与遥感探测数据,仪器来自 GOES、ACE 和 SDO 等任务。数据集包括 1 au 处的本地等离子体属性、高能质子和电子数据、上游太阳风条件以及行星际磁场矢量。
  • results: 这个数据集可以用于机器学习预测高能粒子事件的发生和其后的性质,并已经用于开发了一种新的多变量组合模型(MEMPSEP)。
    Abstract We introduce a new multivariate data set that utilizes multiple spacecraft collecting in-situ and remote sensing heliospheric measurements shown to be linked to physical processes responsible for generating solar energetic particles (SEPs). Using the Geostationary Operational Environmental Satellites (GOES) flare event list from Solar Cycle (SC) 23 and part of SC 24 (1998-2013), we identify 252 solar events (flares) that produce SEPs and 17,542 events that do not. For each identified event, we acquire the local plasma properties at 1 au, such as energetic proton and electron data, upstream solar wind conditions, and the interplanetary magnetic field vector quantities using various instruments onboard GOES and the Advanced Composition Explorer (ACE) spacecraft. We also collect remote sensing data from instruments onboard the Solar Dynamic Observatory (SDO), Solar and Heliospheric Observatory (SoHO), and the Wind solar radio instrument WAVES. The data set is designed to allow for variations of the inputs and feature sets for machine learning (ML) in heliophysics and has a specific purpose for forecasting the occurrence of SEP events and their subsequent properties. This paper describes a dataset created from multiple publicly available observation sources that is validated, cleaned, and carefully curated for our machine-learning pipeline. The dataset has been used to drive the newly-developed Multivariate Ensemble of Models for Probabilistic Forecast of Solar Energetic Particles (MEMPSEP; see MEMPSEP I (Chatterjee et al., 2023) and MEMPSEP II (Dayeh et al., 2023) for associated papers).
    摘要 我们介绍一个新的多变量数据集,该数据集利用多颗航天器收集的原位与遥感日球层测量数据,这些测量已被证明与产生太阳高能粒子(SEP)的物理过程相关。基于第23太阳活动周及部分第24周(1998-2013)的GOES耀斑事件列表,我们识别出252个产生SEP的太阳事件(耀斑)和17,542个未产生SEP的事件。对于每个事件,我们利用GOES和ACE上的多种仪器获取1 au处的本地等离子体属性(如高能质子和电子数据)、上游太阳风条件以及行星际磁场矢量。我们还收集了SDO、SoHO以及Wind太阳射电仪器WAVES的遥感数据。该数据集旨在支持日球物理中机器学习输入与特征集的灵活变化,并专门用于预测SEP事件的发生及其后续性质。本文描述了一个由多个公开观测源构建、经过验证、清洗和精心整理的数据集,该数据集已被用于驱动新开发的MEMPSEP模型(相关论文见 MEMPSEP I (Chatterjee et al., 2023) 与 MEMPSEP II (Dayeh et al., 2023))。

Error analysis of generative adversarial network

  • paper_url: http://arxiv.org/abs/2310.15387
  • repo_url: None
  • paper_authors: Mahmud Hasan, Hailin Sang
  • for: 研究高维分布学习模型生成敌方网络(GAN)的错误收敛率。
  • methods: 将判别器与生成器神经网络所构成的函数类视为具有有界包络函数的 VC 类,并利用 Talagrand 不等式和 Borel-Cantelli 引理来研究 GAN 模型的误差收敛速率。
  • results: 为 GAN 模型的误差建立了紧致的收敛速率界;该方法还可应用于现有的 GAN 误差估计并得到改进的收敛速率。特别地,用神经网络距离定义的误差是我们所定义误差的一个特例。
    Abstract The generative adversarial network (GAN) is an important model developed for high-dimensional distribution learning in recent years. However, there is a pressing need for a comprehensive method to understand its error convergence rate. In this research, we focus on studying the error convergence rate of the GAN model that is based on a class of functions encompassing the discriminator and generator neural networks. These functions are VC type with bounded envelope function under our assumptions, enabling the application of the Talagrand inequality. By employing the Talagrand inequality and Borel-Cantelli lemma, we establish a tight convergence rate for the error of GAN. This method can also be applied on existing error estimations of GAN and yields improved convergence rates. In particular, the error defined with the neural network distance is a special case error in our definition.
    摘要 生成对抗网络(GAN)是近年来为高维分布学习而发展的重要模型。然而,目前仍然缺乏一种全面的方法来理解其误差收敛速率。在本研究中,我们关注基于一类涵盖判别器与生成器神经网络的函数的 GAN 模型的误差收敛速率。在我们的假设下,这些函数属于具有有界包络函数的 VC 类,因此可以应用 Talagrand 不等式。通过使用 Talagrand 不等式和 Borel-Cantelli 引理,我们为 GAN 的误差建立了紧致的收敛速率。该方法也可应用于现有的 GAN 误差估计,并得到改进的收敛速率。特别地,用神经网络距离定义的误差是我们定义中的一个特例。

Learning Fair Representations with High-Confidence Guarantees

  • paper_url: http://arxiv.org/abs/2310.15358
  • repo_url: https://github.com/jamesluoyh/frg
  • paper_authors: Yuhong Luo, Austin Hoag, Philip S. Thomas
  • for: 本研究旨在提供一种具有高置信度保证的公平表示学习方法,以防止对弱势群体的不公正待遇。
  • methods: 本研究提出了名为 Fair Representation learning with high-confidence Guarantees(FRG)的框架,该框架以高置信度将所有下游模型和任务中的不公平程度限制在用户定义的上界之内。
  • results: 作者证明了 FRG 能以高概率在所有下游模型和任务中保证公平性,并通过对多个下游模型和任务的实证评估验证了其对不公平程度的上界控制。
    Abstract Representation learning is increasingly employed to generate representations that are predictive across multiple downstream tasks. The development of representation learning algorithms that provide strong fairness guarantees is thus important because it can prevent unfairness towards disadvantaged groups for all downstream prediction tasks. To prevent unfairness towards disadvantaged groups in all downstream tasks, it is crucial to provide representation learning algorithms that provide fairness guarantees. In this paper, we formally define the problem of learning representations that are fair with high confidence. We then introduce the Fair Representation learning with high-confidence Guarantees (FRG) framework, which provides high-confidence guarantees for limiting unfairness across all downstream models and tasks, with user-defined upper bounds. After proving that FRG ensures fairness for all downstream models and tasks with high probability, we present empirical evaluations that demonstrate FRG's effectiveness at upper bounding unfairness for multiple downstream models and tasks.
    摘要 “表示学习”是在多个下游任务中生成预测性的表示,因此开发有强有力公平保证的表示学习算法是非常重要的,因为它可以防止对劣势群体的不公正。为了防止在所有下游任务中对劣势群体的不公正,提供公平保证的表示学习算法是非常重要的。在这篇论文中,我们正式定义了学习公平表示的问题,并引入了高度信任保证的 Fair Representation learning with high-confidence Guarantees(FRG)框架,该框架提供了高度信任保证,限制下游模型和任务中的不公正,并且可以根据用户定义的Upper bound进行限制。经过证明FRG在所有下游模型和任务中保证公平性的高概率,我们还提供了多个下游模型和任务的实验评估,证明FRG能够有效地 Upper bound不公正性。

Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency

  • paper_url: http://arxiv.org/abs/2310.15351
  • repo_url: None
  • paper_authors: Sudeep Salgia, Sattar Vakili, Qing Zhao
  • for: 本研究旨在探讨基于高斯过程模型的贝叶斯优化(即 kernel-based bandit 优化),并研究通过随机抽样来探索搜索空间的方法。
  • methods: 本研究采用随机抽样探索方法,并基于本文在无穷维希尔伯特空间中建立的新集中不等式来分析其性能。
  • results: 结果表明,随机抽样探索可以达到最优的误差率;在无噪声情形下,该分析弥合了已有的遗憾界差距,解决了一个 COLT 公开问题。此外,所提算法计算效率更高,因为它无需在每次迭代中优化非凸的获取函数。
    Abstract We consider Bayesian optimization using Gaussian Process models, also referred to as kernel-based bandit optimization. We study the methodology of exploring the domain using random samples drawn from a distribution. We show that this random exploration approach achieves the optimal error rates. Our analysis is based on novel concentration bounds in an infinite dimensional Hilbert space established in this work, which may be of independent interest. We further develop an algorithm based on random exploration with domain shrinking and establish its order-optimal regret guarantees under both noise-free and noisy settings. In the noise-free setting, our analysis closes the existing gap in regret performance and thereby resolves a COLT open problem. The proposed algorithm also enjoys a computational advantage over prevailing methods due to the random exploration that obviates the expensive optimization of a non-convex acquisition function for choosing the query points at each iteration.
    摘要 我们考虑基于高斯过程模型的贝叶斯优化,也称为基于核的赌博机优化。我们研究通过从某一分布中随机抽样来探索定义域的方法,并证明这种随机探索方法可以达到最优的误差率。我们的分析基于本文在无穷维希尔伯特空间中建立的新集中不等式,这些结果本身也可能具有独立的研究价值。我们进一步提出了一种结合随机探索与定义域收缩的算法,并在无噪声与有噪声两种情形下建立其阶数最优的遗憾保证。在无噪声情形下,我们的分析弥合了已有的遗憾性能差距,从而解决了一个 COLT 公开问题。由于随机探索避免了每次迭代中对非凸获取函数的昂贵优化,所提算法相比现有方法还具有计算上的优势。
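A loose sketch of random exploration with domain shrinking for a 1-D maximization problem: queries are drawn uniformly at random, a GP is fit to the observations, and the domain is periodically shrunk around the point with the best posterior mean. The GP backend, shrinking schedule, and batch sizes below are assumptions for illustration, not the paper's algorithm as specified.

```python
# Illustrative sketch of GP bandit optimization via random exploration + domain shrinking.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def random_exploration_bo(f, lo=0.0, hi=1.0, epochs=5, per_epoch=20, shrink=0.5, seed=0):
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(epochs):
        # queries are drawn uniformly at random from the current domain (no acquisition optimization)
        xs = rng.uniform(lo, hi, size=per_epoch)
        X.extend(xs); y.extend(f(x) for x in xs)
        gp = GaussianProcessRegressor().fit(np.array(X).reshape(-1, 1), np.array(y))
        # shrink the domain around the point with the best posterior mean
        grid = np.linspace(lo, hi, 200).reshape(-1, 1)
        center = grid[np.argmax(gp.predict(grid))][0]
        half = (hi - lo) * shrink / 2
        lo, hi = center - half, center + half
    return max(zip(y, X))[1]   # best observed input
```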

Burgers’ PINNs with Implicit Euler Transfer Learning

  • paper_url: http://arxiv.org/abs/2310.15343
  • repo_url: None
  • paper_authors: Vitória Biesek, Pedro Henrique de Almeida Konzen
  • for: 解决 Burgers 方程的计算模拟问题
  • methods: 使用 Physics-Informed Neural Networks (PINNs) 与 implicit Euler transfer learning 方法求解 Burgers 方程
  • results: 提出一种时间细化的 PINNs 模型,可以减少计算成本并保持同等准确性。
    Abstract The Burgers equation is a well-established test case in the computational modeling of several phenomena such as fluid dynamics, gas dynamics, shock theory, cosmology, and others. In this work, we present the application of Physics-Informed Neural Networks (PINNs) with an implicit Euler transfer learning approach to solve the Burgers equation. The proposed approach consists in seeking a time-discrete solution by a sequence of Artificial Neural Networks (ANNs). At each time step, the previous ANN transfers its knowledge to the next network model, which learns the current time solution by minimizing a loss function based on the implicit Euler approximation of the Burgers equation. The approach is tested for two benchmark problems: the first with an exact solution and the other with an alternative analytical solution. In comparison to the usual PINN models, the proposed approach has the advantage of requiring smaller neural network architectures with similar accurate results and potentially decreasing computational costs.
    摘要 Burgers 方程是流体力学、气体动力学、激波理论、宇宙学等诸多现象计算模拟中的经典测试算例。在这项工作中,我们提出了将物理信息神经网络(PINNs)与隐式欧拉迁移学习相结合来求解 Burgers 方程的方法。该方法通过一系列人工神经网络(ANNs)来求取时间离散解:在每个时间步,前一个 ANN 将其知识迁移给下一个网络模型,后者通过最小化基于 Burgers 方程隐式欧拉近似的损失函数来学习当前时刻的解。该方法在两个基准问题上进行了测试:一个具有精确解,另一个具有替代的解析解。与常规 PINN 模型相比,该方法只需更小的神经网络结构即可获得同等精度的结果,并有望降低计算成本。
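A compact sketch of one implicit-Euler PINN step for Burgers' equation u_t + u u_x = nu u_xx: a small network for u at step n+1 is trained so that the implicit Euler residual (u^{n+1} - u^n)/dt + u^{n+1} u_x^{n+1} - nu u_xx^{n+1} vanishes at collocation points, starting from a copy of the previous step's network (the transfer-learning part). Network size, optimizer, dt, and the omission of boundary/initial loss terms are assumptions made to keep the sketch short.

```python
# Illustrative sketch of a single implicit-Euler PINN time step (residual-only loss).
import copy
import math
import torch

def train_next_step(prev_net, x, dt, nu=0.01 / math.pi, iters=500, lr=1e-3):
    net = copy.deepcopy(prev_net)                        # warm start from the step-n network
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    x = x.clone().requires_grad_(True)                   # collocation points, shape (N, 1)
    u_prev = prev_net(x).detach()                        # frozen u^n at the collocation points
    for _ in range(iters):
        u = net(x)
        u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
        u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
        residual = (u - u_prev) / dt + u * u_x - nu * u_xx   # implicit Euler residual
        loss = (residual ** 2).mean()                    # boundary terms omitted in this sketch
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net
```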

Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network

  • paper_url: http://arxiv.org/abs/2310.15342
  • repo_url: https://github.com/fuyuanlyu/optfeature
  • paper_authors: Fuyuan Lyu, Xing Tang, Dugang Liu, Chen Ma, Weihong Luo, Liang Chen, Xiuqiang He, Xue Liu
  • for: 本研究旨在为深度稀疏网络(Deep Sparse Networks)的预测任务提出一种混合粒度的特征交互选择方法,以兼顾精度与效率。
  • methods: 我们提出了名为 OptFeature 的选择算法,可同时在特征域(feature field)与特征值(feature value)两个层面选择特征交互,并在运行时即时计算一个分解空间,以探索更广阔的特征交互空间。
  • results: 我们在三个大型的Real-world benchmark dataset上进行了实验,结果显示OptFeature方法在精度和效率之间取得了良好的平衡,并且进行了更多的研究以证明我们的方法的可行性。
    Abstract Deep sparse networks are widely investigated as a neural network architecture for prediction tasks with high-dimensional sparse features, with which feature interaction selection is a critical component. While previous methods primarily focus on how to search feature interaction in a coarse-grained space, less attention has been given to a finer granularity. In this work, we introduce a hybrid-grained feature interaction selection approach that targets both feature field and feature value for deep sparse networks. To explore such expansive space, we propose a decomposed space which is calculated on the fly. We then develop a selection algorithm called OptFeature, which efficiently selects the feature interaction from both the feature field and the feature value simultaneously. Results from experiments on three large real-world benchmark datasets demonstrate that OptFeature performs well in terms of accuracy and efficiency. Additional studies support the feasibility of our method.
    摘要 深度稀疏网络作为一种面向高维稀疏特征预测任务的神经网络架构受到广泛研究,其中特征交互选择是关键组成部分。先前的方法主要关注如何在较粗的粒度上搜索特征交互,较少关注更细的粒度。在这项工作中,我们提出了一种混合粒度的特征交互选择方法,同时针对特征域和特征值。为了探索这一广阔的空间,我们提出了一种在运行时即时计算的分解空间。随后,我们开发了名为 OptFeature 的选择算法,可同时从特征域与特征值两个层面高效地选择特征交互。实验结果表明,OptFeature 在三个大规模实际数据集上兼具良好的准确性与效率。此外,我们还进行了其他研究,以支持方法的可行性。

ADMM Training Algorithms for Residual Networks: Convergence, Complexity and Parallel Training

  • paper_url: http://arxiv.org/abs/2310.15334
  • repo_url: None
  • paper_authors: Jintao Xu, Yifei Li, Wenxun Xing
  • for: 本研究旨在提出一种基于 auxiliary 变量的序列和并行 proximal point (梯度) ADMM,用于FCResNets 训练问题。
  • methods: 我们引入辅助变量,并基于 Kurdyka-Lojasiewicz(KL)性质的分析框架给出了收敛性分析。根据 KL 指数所处的不同范围,可保证局部 R-线性或次线性的收敛速率。
  • results: 我们 theoretically analyzed the convergence, convergence rate, time complexity and (per-node) runtime memory requirement of the ADMM applied in the FCResNets training problem. experiments show that our parallel training method can achieve high speed, better performance, robustness and potential in deep network training tasks. Finally, we present the advantage and potential of our parallel training in large-scale problems.
    Abstract We design a series of serial and parallel proximal point (gradient) ADMMs for the fully connected residual networks (FCResNets) training problem by introducing auxiliary variables. Convergence of the proximal point version is proven based on a Kurdyka-Lojasiewicz (KL) property analysis framework, and we can ensure a locally R-linear or sublinear convergence rate depending on the different ranges of the Kurdyka-Lojasiewicz (KL) exponent, in which a necessary auxiliary function is constructed to realize our goal. Moreover, the advantages of the parallel implementation in terms of lower time complexity and less (per-node) memory consumption are analyzed theoretically. To the best of our knowledge, this is the first work analyzing the convergence, convergence rate, time complexity and (per-node) runtime memory requirement of the ADMM applied in the FCResNets training problem theoretically. Experiments are reported to show the high speed, better performance, robustness and potential in the deep network training tasks. Finally, we present the advantage and potential of our parallel training in large-scale problems.
    摘要 我们通过引入辅助变量,为全连接残差网络(FCResNets)的训练问题设计了一系列串行与并行的近端点(梯度)ADMM 算法。基于 Kurdyka-Lojasiewicz(KL)性质的分析框架,我们证明了近端点版本的收敛性,并且根据 KL 指数所处的不同范围,可以保证局部 R-线性或次线性的收敛速率;为实现这一目标,我们构造了一个必要的辅助函数。此外,我们从理论上分析了并行实现在时间复杂度更低、每节点内存占用更少方面的优势。据我们所知,这是首个从理论上分析 ADMM 应用于 FCResNets 训练问题时的收敛性、收敛速率、时间复杂度以及每节点运行内存需求的工作。实验结果展示了该方法在深度网络训练任务中的高速度、更好的性能、鲁棒性与潜力。最后,我们介绍了并行训练在大规模问题中的优势与潜力。

Estimating Trustworthy and Safe Optimal Treatment Regimes

  • paper_url: http://arxiv.org/abs/2310.15333
  • repo_url: None
  • paper_authors: Harsh Parikh, Quinn Lanners, Zade Akras, Sahar F. Zafar, M. Brandon Westover, Cynthia Rudin, Alexander Volfovsky
  • for: 这个研究旨在开发一个安全可解释的框架,以便在高风险情况下进行患者处理策略。
  • methods: 本研究使用了统计学和强化学方法,包括匹配病人的医疗和药物特征,以建立一个最佳政策。
  • results: 研究结果显示,这个框架能够在复杂情况下进行最佳策略设定,并且发现了个人化治疗策略,将药物剂量降低轻微和短暂的癫痫病人,而对于在加护病房中接受治疗的病人,则采取更积极的治疗方案。
    Abstract Recent statistical and reinforcement learning methods have significantly advanced patient care strategies. However, these approaches face substantial challenges in high-stakes contexts, including missing data, inherent stochasticity, and the critical requirements for interpretability and patient safety. Our work operationalizes a safe and interpretable framework to identify optimal treatment regimes. This approach involves matching patients with similar medical and pharmacological characteristics, allowing us to construct an optimal policy via interpolation. We perform a comprehensive simulation study to demonstrate the framework's ability to identify optimal policies even in complex settings. Ultimately, we operationalize our approach to study regimes for treating seizures in critically ill patients. Our findings strongly support personalized treatment strategies based on a patient's medical history and pharmacological features. Notably, we identify that reducing medication doses for patients with mild and brief seizure episodes while adopting aggressive treatment for patients in intensive care unit experiencing intense seizures leads to more favorable outcomes.
    摘要 近期的统计学和强化学习方法显著提升了患者治疗策略。然而,这些方法在高风险场景中面临数据缺失、内在随机性以及可解释性和患者安全等关键要求的挑战。我们的工作实现了一个安全且可解释的框架来识别最优治疗方案:通过匹配医疗与药理特征相似的患者,并借助插值构建最优策略。我们进行了全面的模拟研究,展示了该框架在复杂情形下识别最优策略的能力。最后,我们将该方法应用于重症患者癫痫治疗方案的研究。我们的发现有力支持基于患者病史与药理特征的个体化治疗策略;特别地,对癫痫发作轻微且短暂的患者降低药物剂量,而对在重症监护病房中经历剧烈癫痫发作的患者采取积极治疗,可带来更有利的结果。

Unsupervised Federated Learning: A Federated Gradient EM Algorithm for Heterogeneous Mixture Models with Robustness against Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2310.15330
  • repo_url: None
  • paper_authors: Ye Tian, Haolei Weng, Yang Feng
  • for: 这篇论文旨在探讨无监督联合学习方法在不同任务之间的Unsupervised Learning问题。
  • methods: 该论文提出了一种基于Gradient EM算法的联合学习方法,用于学习含有不同混合比例的混合模型。
  • results: 该方法具有适应不确定任务相似性、对小数据源的抗击骚扰攻击、保护本地数据隐私、计算和通信效率等优点。
    Abstract While supervised federated learning approaches have enjoyed significant success, the domain of unsupervised federated learning remains relatively underexplored. In this paper, we introduce a novel federated gradient EM algorithm designed for the unsupervised learning of mixture models with heterogeneous mixture proportions across tasks. We begin with a comprehensive finite-sample theory that holds for general mixture models, then apply this general theory on Gaussian Mixture Models (GMMs) and Mixture of Regressions (MoRs) to characterize the explicit estimation error of model parameters and mixture proportions. Our proposed federated gradient EM algorithm demonstrates several key advantages: adaptability to unknown task similarity, resilience against adversarial attacks on a small fraction of data sources, protection of local data privacy, and computational and communication efficiency.
    摘要 有监督联邦学习方法已经取得了显著成功,但无监督联邦学习领域仍相对缺乏探索。在这篇论文中,我们提出了一种新的联邦梯度 EM 算法,用于无监督地学习各任务间混合比例各异的混合模型。我们首先给出了适用于一般混合模型的有限样本理论,随后将该理论应用于高斯混合模型(GMMs)和回归混合模型(MoRs),以刻画模型参数与混合比例的显式估计误差。所提出的联邦梯度 EM 算法具有以下优点:能适应未知的任务相似度,对少部分数据源遭受对抗攻击具有鲁棒性,保护本地数据隐私,并且计算与通信高效。

ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer’s Disease

  • paper_url: http://arxiv.org/abs/2310.15301
  • repo_url: None
  • paper_authors: Xiaomin Ouyang, Xian Shuai, Yang Li, Li Pan, Xifan Zhang, Heming Fu, Xinyan Wang, Shihua Cao, Jiang Xin, Hazel Mok, Zhenyu Yan, Doris Sau Fung Yu, Timothy Kwok, Guoliang Xing
  • for: 这篇论文旨在探讨一个能够在自然生活环境中检测多维度阿尔茨海默病(AD)数字生物标志物的端到端系统,以及该系统在隐私保护与可扩展性方面的设计。
  • methods: 论文提出了一种三阶段多模态联邦学习架构,能够以保护隐私的方式从多种数据来源中准确检测 AD 数字生物标志物。
  • results: ADMarker 能够以最高 93.8% 的准确率检测一组全面的数字生物标志物,并以平均 88.9% 的准确率识别早期 AD;此外,ADMarker 还可在长期评估中跟踪 AD 患者的症状与病程。
    Abstract Alzheimer's Disease (AD) and related dementia are a growing global health challenge due to the aging population. In this paper, we present ADMarker, the first end-to-end system that integrates multi-modal sensors and new federated learning algorithms for detecting multidimensional AD digital biomarkers in natural living environments. ADMarker features a novel three-stage multi-modal federated learning architecture that can accurately detect digital biomarkers in a privacy-preserving manner. Our approach collectively addresses several major real-world challenges, such as limited data labels, data heterogeneity, and limited computing resources. We built a compact multi-modality hardware system and deployed it in a four-week clinical trial involving 91 elderly participants. The results indicate that ADMarker can accurately detect a comprehensive set of digital biomarkers with up to 93.8% accuracy and identify early AD with an average of 88.9% accuracy. ADMarker offers a new platform that can allow AD clinicians to characterize and track the complex correlation between multidimensional interpretable digital biomarkers, demographic factors of patients, and AD diagnosis in a longitudinal manner.
    摘要 随着人口老龄化,阿尔茨海默病(AD)及相关痴呆症正成为日益严峻的全球健康挑战。在这篇论文中,我们介绍了 ADMarker,这是首个集成多模态传感器与新型联邦学习算法、可在自然生活环境中检测多维度 AD 数字生物标志物的端到端系统。ADMarker 采用一种新颖的三阶段多模态联邦学习架构,能够以保护隐私的方式准确检测数字生物标志物。我们的方法同时应对了若干重要的现实挑战,例如数据标签有限、数据异构以及计算资源受限。我们构建了一套紧凑的多模态硬件系统,并在一项为期四周、共 91 名老年参与者的临床试验中进行了部署。结果表明,ADMarker 能以最高 93.8% 的准确率检测一组全面的数字生物标志物,并以平均 88.9% 的准确率识别早期 AD。ADMarker 提供了一个新的平台,使 AD 临床医生能够以纵向方式刻画并跟踪多维可解释数字生物标志物、患者人口学因素与 AD 诊断之间的复杂关联。

Fast and Reliable Generation of EHR Time Series via Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.15290
  • repo_url: None
  • paper_authors: Muhang Tian, Bernie Chen, Allan Guo, Shiyi Jiang, Anru R. Zhang
  • for: 这个论文是为了提供一种生成隐私保护的电子健康记录(EHR)时间序列数据的新方法。
  • methods: 这篇论文使用去噪扩散概率模型(Denoising Diffusion Probabilistic Models, DDPM)来生成多样且逼真的合成 EHR 时间序列数据。
  • results: 对六个数据集进行了实验,比较了这种新方法与七种现有方法的性能,结果表明,这种方法在数据用途性能方面明显超过了所有现有方法,同时需要更少的训练努力。这种方法也可以增强下游医疗数据分析,提供多样化和现实的生成EHR时间序列数据。
    Abstract Electronic Health Records (EHRs) are rich sources of patient-level data, including laboratory tests, medications, and diagnoses, offering valuable resources for medical data analysis. However, concerns about privacy often restrict access to EHRs, hindering downstream analysis. Researchers have explored various methods for generating privacy-preserving EHR data. In this study, we introduce a new method for generating diverse and realistic synthetic EHR time series data using Denoising Diffusion Probabilistic Models (DDPM). We conducted experiments on six datasets, comparing our proposed method with seven existing methods. Our results demonstrate that our approach significantly outperforms all existing methods in terms of data utility while requiring less training effort. Our approach also enhances downstream medical data analysis by providing diverse and realistic synthetic EHR data.
    摘要

A Doubly Robust Approach to Sparse Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.15286
  • repo_url: None
  • paper_authors: Wonyoung Kim, Garud Iyengar, Assaf Zeevi
  • for: 这个论文是为了解决 episodic sparse linear Markov decision process (SMDP) 中的 regret minimization 问题。
  • methods: 这个论文使用的方法包括 doubly robust method 和一种新的分析技术,这些方法使得算法可以使用所有动作的特征向量,并且可以使用所有期间的数据。
  • results: 这个论文的结果表明,提出的算法的 regret 是 $\tilde{O}(\sigma^{-1}_{\min} s_{\star} H \sqrt{N})$,其中 $\sigma_{\min}$ 是 average Gram matrix 的最小特征值,$s_\star$ 是稀疏性参数,$H$ 是 episode 的长度,$N$ 是 round 的数量。此外,论文还提供了一个下界 regret bound,其与上界 bound 几乎吻合。数值实验支持了论文的理论结果,并证明了该算法的优越性。
    Abstract We propose a new regret minimization algorithm for episodic sparse linear Markov decision process (SMDP) where the state-transition distribution is a linear function of observed features. The only previously known algorithm for SMDP requires the knowledge of the sparsity parameter and oracle access to an unknown policy. We overcome these limitations by combining the doubly robust method that allows one to use feature vectors of \emph{all} actions with a novel analysis technique that enables the algorithm to use data from all periods in all episodes. The regret of the proposed algorithm is $\tilde{O}(\sigma^{-1}_{\min} s_{\star} H \sqrt{N})$, where $\sigma_{\min}$ denotes the restrictive the minimum eigenvalue of the average Gram matrix of feature vectors, $s_\star$ is the sparsity parameter, $H$ is the length of an episode, and $N$ is the number of rounds. We provide a lower regret bound that matches the upper bound up to logarithmic factors on a newly identified subclass of SMDPs. Our numerical experiments support our theoretical results and demonstrate the superior performance of our algorithm.
    摘要 我们为回合式稀疏线性马尔可夫决策过程(SMDP)提出了一种新的遗憾最小化算法,其中状态转移分布是观测特征的线性函数。此前唯一已知的 SMDP 算法需要知道稀疏参数,并需要对一个未知策略的 oracle 访问。我们结合了允许使用所有动作特征向量的双重稳健方法,以及一种使算法能够利用所有回合、所有时段数据的新分析技术,从而克服了上述限制。所提算法的遗憾为 $\tilde{O}(\sigma^{-1}_{\min} s_{\star} H \sqrt{N})$,其中 $\sigma_{\min}$ 为特征向量平均 Gram 矩阵的最小特征值,$s_\star$ 为稀疏参数,$H$ 为回合长度,$N$ 为轮数。我们还在新识别出的一类 SMDP 子类上给出了与上界至多相差对数因子的遗憾下界。数值实验支持了我们的理论结果,并展示了算法的优越性能。

UncertaintyPlayground: A Fast and Simplified Python Library for Uncertainty Estimation

  • paper_url: http://arxiv.org/abs/2310.15281
  • repo_url: https://github.com/Unco3892/UncertaintyPlayground
  • paper_authors: Ilia Azizi
  • for: 本研究提出了一个名为 UncertaintyPlayground 的 Python 库,用于监督学习任务中的不确定性估计。
  • methods: 该库基于 PyTorch 和 GPyTorch 实现了多种不确定性估计方法,包括面向正态分布结果的稀疏与变分高斯过程回归(SVGPRs),以及面向混合分布的混合密度网络(MDN)。
  • results: 该库能够快速训练高斯与多峰结果分布,并可对一个或多个实例的预测区间进行可视化。此外,库还提供了多种 PyTorch 特有的加速技术,支持在 CPU 和 GPU 上训练。
    Abstract This paper introduces UncertaintyPlayground, a Python library built on PyTorch and GPyTorch for uncertainty estimation in supervised learning tasks. The library offers fast training for Gaussian and multi-modal outcome distributions through Sparse and Variational Gaussian Process Regressions (SVGPRs) for normally distributed outcomes and Mixed Density Networks (MDN) for mixed distributions. In addition to model training with various hyperparameters, UncertaintyPlayground can visualize the prediction intervals of one or more instances. Due to using tensor operations, the library can be trained both on CPU and GPU and offers various PyTorch-specific techniques for speed optimization. The library contains unit tests for each module and ensures multi-platform continuous integration with GitHub Workflows (online integration) and Tox (local integration). Finally, the code is documented with Google-style docstrings and offers a documentation website created with MkDocs and MkDocStrings.
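The library's own interface is documented in its repository rather than here, so the sketch below does not use the UncertaintyPlayground API; it is a minimal mixture density network in plain PyTorch, illustrating the kind of multi-modal outcome model (MDN) the library wraps. All class and variable names are illustrative.

```python
# Minimal mixture density network (MDN) in plain PyTorch -- an illustration of the
# kind of multi-modal outcome model the library supports, NOT its actual API.
import torch
import torch.nn as nn

class MDN(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64, n_components: int = 5):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_components)          # mixture logits
        self.mu = nn.Linear(hidden, n_components)          # component means
        self.log_sigma = nn.Linear(hidden, n_components)   # component log-std

    def forward(self, x):
        h = self.backbone(x)
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of y under the predicted Gaussian mixture."""
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    log_prob = comp.log_prob(y.unsqueeze(-1))               # (batch, K)
    log_mix = torch.log_softmax(pi_logits, dim=-1)
    return -torch.logsumexp(log_mix + log_prob, dim=-1).mean()

# Toy bimodal target: y is +x or -x with equal probability.
torch.manual_seed(0)
x = torch.rand(512, 1) * 4.0
y = torch.where(torch.rand(512, 1) > 0.5, x, -x).squeeze(-1) + 0.1 * torch.randn(512)
model = MDN(in_dim=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = mdn_nll(*model(x), y)
    loss.backward()
    opt.step()
```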

Triple Simplex Matrix Completion for Expense Forecasting

  • paper_url: http://arxiv.org/abs/2310.15275
  • repo_url: None
  • paper_authors: Cheng Qian, Lucas Glass, Nikos Sidiropoulos
  • for: Forecasting project expenses so that businesses can avoid budget overruns and project failures.
  • methods: A constrained non-negative matrix completion model predicts expenses by learning how likely a project is to correlate with certain expense patterns in the latent space. The model is constrained on three probability simplices, two on the factor matrices and one on the missing entries, and the predicted expense values are guaranteed to satisfy the budget constraint without post-processing; an inexact alternating optimization algorithm solves the problem and provably converges to a stationary point.
  • results: Results on two real-world datasets demonstrate the effectiveness of the method compared with state-of-the-art algorithms.
    Abstract Forecasting project expenses is a crucial step for businesses to avoid budget overruns and project failures. Traditionally, this has been done by financial analysts or data science techniques such as time-series analysis. However, these approaches can be uncertain and produce results that differ from the planned budget, especially at the start of a project with limited data points. This paper proposes a constrained non-negative matrix completion model that predicts expenses by learning the likelihood of the project correlating with certain expense patterns in the latent space. The model is constrained on three probability simplexes, two of which are on the factor matrices and the third on the missing entries. Additionally, the predicted expense values are guaranteed to meet the budget constraint without the need of post-processing. An inexact alternating optimization algorithm is developed to solve the associated optimization problem and is proven to converge to a stationary point. Results from two real datasets demonstrate the effectiveness of the proposed method in comparison to state-of-the-art algorithms.
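The paper's exact model and update rules are not reproduced in this digest. The sketch below only illustrates a generic building block such a method relies on — Euclidean projection onto the probability simplex inside a naive alternating, masked matrix-factorization loop — and is not the authors' algorithm; the data, step size, and which factor carries the simplex constraint are all assumptions.

```python
import numpy as np

def project_rows_to_simplex(v):
    """Euclidean projection of each row of v onto the probability simplex."""
    u = np.sort(v, axis=1)[:, ::-1]
    css = np.cumsum(u, axis=1)
    k = np.arange(1, v.shape[1] + 1)
    rho = (u - (css - 1.0) / k > 0).sum(axis=1)           # size of the support
    theta = (css[np.arange(len(v)), rho - 1] - 1.0) / rho
    return np.maximum(v - theta[:, None], 0.0)

rng = np.random.default_rng(0)
X = rng.random((30, 12))                                   # toy expense matrix (projects x periods)
mask = rng.random(X.shape) > 0.3                           # observed entries
r = 4
W, H = rng.random((30, r)), rng.random((r, 12))

for _ in range(300):                                       # naive alternating projected gradient
    resid = mask * (W @ H - X)
    W = project_rows_to_simplex(W - 0.05 * resid @ H.T)    # simplex constraint on one factor
    H = np.maximum(H - 0.05 * W.T @ resid, 0.0)            # plain non-negativity on the other

final = mask * (W @ H - X)
print("masked RMSE:", np.sqrt((final[mask] ** 2).mean()))
```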

One-hot Generalized Linear Model for Switching Brain State Discovery

  • paper_url: http://arxiv.org/abs/2310.15263
  • repo_url: None
  • paper_authors: Chengrui Li, Soon Ho Kim, Chris Rodgers, Hannah Choi, Anqi Wu
  • for: Understanding the interaction structure and functional relationships within neural circuits.
  • methods: The method combines state-switching generalized linear models (GLMs) with a hidden Markov model (HMM) and places both a Gaussian prior and a one-hot prior over the GLM in each state; both priors are learnable.
  • results: The method effectively recovers true interaction structures on simulated data, achieves the highest predictive likelihood on real neural datasets, and yields more interpretable interaction structures and hidden states when applied to real neural data.
    Abstract Exposing meaningful and interpretable neural interactions is critical to understanding neural circuits. Inferred neural interactions from neural signals primarily reflect functional interactions. In a long experiment, subject animals may experience different stages defined by the experiment, stimuli, or behavioral states, and hence functional interactions can change over time. To model dynamically changing functional interactions, prior work employs state-switching generalized linear models with hidden Markov models (i.e., HMM-GLMs). However, we argue they lack biological plausibility, as functional interactions are shaped and confined by the underlying anatomical connectome. Here, we propose a novel prior-informed state-switching GLM. We introduce both a Gaussian prior and a one-hot prior over the GLM in each state. The priors are learnable. We will show that the learned prior should capture the state-constant interaction, shedding light on the underlying anatomical connectome and revealing more likely physical neuron interactions. The state-dependent interaction modeled by each GLM offers traceability to capture functional variations across multiple brain states. Our methods effectively recover true interaction structures in simulated data, achieve the highest predictive likelihood with real neural datasets, and render interaction structures and hidden states more interpretable when applied to real neural data.

Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

  • paper_url: http://arxiv.org/abs/2310.15261
  • repo_url: None
  • paper_authors: Gautam Krishna, Sameer Dharur, Oggi Rudovic, Pranay Dighe, Saurabh Adya, Ahmed Hussen Abdelaziz, Ahmed H Tewfik
  • for: Making device-directed speech detection (DDSD) systems more robust to missing modalities.
  • methods: The study investigates fusion schemes that combine prosody features with verbal cues (acoustic, text, and ASR features), including non-linear intermediate fusion and modality dropout techniques.
  • results: Prosody features improve DDSD performance by up to 8.5% in false acceptance rate (FA) at a fixed operating point, and modality dropout improves performance by 7.4% in FA when modalities are missing at inference time.
    Abstract Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g acoustic, text and/or automatic speech recognition system (ASR) features, to classify speech as device-directed or otherwise, and often have to contend with one or more of these modalities being unavailable when deployed in real-world settings. In this paper, we investigate fusion schemes for DDSD systems that can be made more robust to missing modalities. Concurrently, we study the use of non-verbal cues, specifically prosody features, in addition to verbal cues for DDSD. We present different approaches to combine scores and embeddings from prosody with the corresponding verbal cues, finding that prosody improves DDSD performance by upto 8.5% in terms of false acceptance rate (FA) at a given fixed operating point via non-linear intermediate fusion, while our use of modality dropout techniques improves the performance of these models by 7.4% in terms of FA when evaluated with missing modalities during inference time.
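The actual fusion architecture is not specified in this digest; the sketch below shows the generic modality-dropout idea for a two-stream (verbal + prosody) classifier in PyTorch. All layer sizes, feature dimensions, and the dropout probability are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoStreamDDSD(nn.Module):
    """Toy verbal+prosody fusion classifier with modality dropout (illustrative only)."""
    def __init__(self, verbal_dim=128, prosody_dim=16, hidden=64, p_drop_modality=0.3):
        super().__init__()
        self.verbal_enc = nn.Sequential(nn.Linear(verbal_dim, hidden), nn.ReLU())
        self.prosody_enc = nn.Sequential(nn.Linear(prosody_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.p_drop = p_drop_modality

    def forward(self, verbal, prosody):
        hv, hp = self.verbal_enc(verbal), self.prosody_enc(prosody)
        if self.training:
            # Randomly zero out an entire modality per example so the model
            # learns to cope with missing inputs at inference time.
            keep_v = (torch.rand(hv.size(0), 1, device=hv.device) > self.p_drop).float()
            keep_p = (torch.rand(hp.size(0), 1, device=hp.device) > self.p_drop).float()
            hv, hp = hv * keep_v, hp * keep_p
        return self.head(torch.cat([hv, hp], dim=-1)).squeeze(-1)

model = TwoStreamDDSD()
logits = model(torch.randn(8, 128), torch.randn(8, 16))   # device-directed vs. not
loss = nn.functional.binary_cross_entropy_with_logits(logits, torch.randint(0, 2, (8,)).float())
```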

SimBIG: Field-level Simulation-Based Inference of Galaxy Clustering

  • paper_url: http://arxiv.org/abs/2310.15256
  • repo_url: None
  • paper_authors: Pablo Lemos, Liam Parker, ChangHoon Hahn, Shirley Ho, Michael Eickenberg, Jiamin Hou, Elena Massara, Chirag Modi, Azadeh Moradinezhad Dizgah, Bruno Regaldo-Saint Blancard, David Spergel
  • for: Performing simulation-based inference of cosmological parameters from a field-level analysis of galaxy clustering.
  • methods: The study uses the SimBIG forward-modelling framework with normalizing flows, applying a convolutional neural network with stochastic weight averaging to compress the galaxy field before inference.
  • results: The analysis constrains $\Omega_m=0.267^{+0.033}_{-0.029}$ and $\sigma_8=0.762^{+0.036}_{-0.035}$; the $\sigma_8$ constraint is $2.65\times$ tighter than standard $P_\ell$ analyses, and galaxy clustering alone constrains the Hubble constant to $H_0=64.5 \pm 3.8 \ {\rm km/s/Mpc}$.
    Abstract We present the first simulation-based inference (SBI) of cosmological parameters from field-level analysis of galaxy clustering. Standard galaxy clustering analyses rely on analyzing summary statistics, such as the power spectrum, $P_\ell$, with analytic models based on perturbation theory. Consequently, they do not fully exploit the non-linear and non-Gaussian features of the galaxy distribution. To address these limitations, we use the {\sc SimBIG} forward modelling framework to perform SBI using normalizing flows. We apply SimBIG to a subset of the BOSS CMASS galaxy sample using a convolutional neural network with stochastic weight averaging to perform massive data compression of the galaxy field. We infer constraints on $\Omega_m = 0.267^{+0.033}_{-0.029}$ and $\sigma_8=0.762^{+0.036}_{-0.035}$. While our constraints on $\Omega_m$ are in-line with standard $P_\ell$ analyses, those on $\sigma_8$ are $2.65\times$ tighter. Our analysis also provides constraints on the Hubble constant $H_0=64.5 \pm 3.8 \ {\rm km / s / Mpc}$ from galaxy clustering alone. This higher constraining power comes from additional non-Gaussian cosmological information, inaccessible with $P_\ell$. We demonstrate the robustness of our analysis by showcasing our ability to infer unbiased cosmological constraints from a series of test simulations that are constructed using different forward models than the one used in our training dataset. This work not only presents competitive cosmological constraints but also introduces novel methods for leveraging additional cosmological information in upcoming galaxy surveys like DESI, PFS, and Euclid.

Field-level simulation-based inference with galaxy catalogs: the impact of systematic effects

  • paper_url: http://arxiv.org/abs/2310.15234
  • repo_url: None
  • paper_authors: Natalí S. M. de Santi, Francisco Villaescusa-Navarro, L. Raul Abramo, Helen Shao, Lucia A. Perez, Tiago Castro, Yueying Ni, Christopher C. Lovell, Elena Hernandez-Martinez, Federico Marinacci, David N. Spergel, Klaus Dolag, Lars Hernquist, Mark Vogelsberger
  • for: Constraining cosmological parameters with graph neural networks through field-level likelihood-free inference, without imposing cuts on scale.
  • methods: Models are trained and tested on galaxy catalogs built from thousands of state-of-the-art hydrodynamic simulations from the CAMELS project, run with different codes and incorporating observational effects such as masking, uncertainties in peculiar velocities and radial distances, and different galaxy selections.
  • results: Although these observational effects degrade the precision and accuracy of the models and increase the fraction of catalogs where they break down, the models still perform well on over 90% of the galaxy catalogs, indicating their potential for application to real data.
    Abstract It has been recently shown that a powerful way to constrain cosmological parameters from galaxy redshift surveys is to train graph neural networks to perform field-level likelihood-free inference without imposing cuts on scale. In particular, de Santi et al. (2023) developed models that could accurately infer the value of $\Omega_{\rm m}$ from catalogs that only contain the positions and radial velocities of galaxies that are robust to uncertainties in astrophysics and subgrid models. However, observations are affected by many effects, including 1) masking, 2) uncertainties in peculiar velocities and radial distances, and 3) different galaxy selections. Moreover, observations only allow us to measure redshift, intertwining galaxies' radial positions and velocities. In this paper we train and test our models on galaxy catalogs, created from thousands of state-of-the-art hydrodynamic simulations run with different codes from the CAMELS project, that incorporate these observational effects. We find that, although the presence of these effects degrades the precision and accuracy of the models, and increases the fraction of catalogs where the model breaks down, the fraction of galaxy catalogs where the model performs well is over 90 %, demonstrating the potential of these models to constrain cosmological parameters even when applied to real data.

Unlocking the Transferability of Tokens in Deep Models for Tabular Data

  • paper_url: http://arxiv.org/abs/2310.15149
  • repo_url: None
  • paper_authors: Qi-Le Zhou, Han-Jia Ye, Le-Ye Wang, De-Chuan Zhan
  • for: Improving the quality of feature tokens so that pre-trained deep models can be fine-tuned effectively on tabular data.
  • methods: The proposed TabToken method introduces a contrastive objective that regularizes feature tokens (embeddings of tabular features), allowing a pre-trained model to be reused when the upstream and downstream tasks share overlapping features; during fine-tuning, tokens of the shared features are kept fixed while the rest of the model is updated.
  • results: TabToken enables fine-tuning with limited training examples, transfers knowledge to tasks with heterogeneous features, and enhances the discriminative ability of deep tabular models on standard classification and regression tasks.
    Abstract Fine-tuning a pre-trained deep neural network has become a successful paradigm in various machine learning tasks. However, such a paradigm becomes particularly challenging with tabular data when there are discrepancies between the feature sets of pre-trained models and the target tasks. In this paper, we propose TabToken, a method aims at enhancing the quality of feature tokens (i.e., embeddings of tabular features). TabToken allows for the utilization of pre-trained models when the upstream and downstream tasks share overlapping features, facilitating model fine-tuning even with limited training examples. Specifically, we introduce a contrastive objective that regularizes the tokens, capturing the semantics within and across features. During the pre-training stage, the tokens are learned jointly with top-layer deep models such as transformer. In the downstream task, tokens of the shared features are kept fixed while TabToken efficiently fine-tunes the remaining parts of the model. TabToken not only enables knowledge transfer from a pre-trained model to tasks with heterogeneous features, but also enhances the discriminative ability of deep tabular models in standard classification and regression tasks.
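TabToken's exact contrastive objective is not given here; the sketch below only illustrates the general pattern of learning per-feature token embeddings with a simple supervised contrastive-style regularizer in PyTorch. The loss form, architecture, and all names are assumptions, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenizedTabularNet(nn.Module):
    """Toy tabular model: each categorical feature value gets a learnable token."""
    def __init__(self, cardinalities, d_token=16, n_classes=2):
        super().__init__()
        self.tokens = nn.ModuleList([nn.Embedding(c, d_token) for c in cardinalities])
        self.head = nn.Linear(d_token * len(cardinalities), n_classes)

    def forward(self, x_cat):                          # x_cat: (batch, n_features) int64
        toks = [emb(x_cat[:, i]) for i, emb in enumerate(self.tokens)]
        return self.head(torch.cat(toks, dim=-1)), torch.stack(toks, dim=1)

def token_contrastive_reg(tok, y, temp=0.1):
    """Pull together token summaries of examples sharing a label, push apart otherwise."""
    z = F.normalize(tok.mean(dim=1), dim=-1)           # (batch, d_token)
    sim = z @ z.T / temp
    same = (y[:, None] == y[None, :]).float() - torch.eye(len(y))
    log_p = F.log_softmax(sim.masked_fill(torch.eye(len(y)).bool(), float("-inf")), dim=-1)
    return -(log_p * same).sum() / same.sum().clamp(min=1.0)

model = TokenizedTabularNet([5, 3, 7])
x = torch.randint(0, 3, (32, 3))
y = torch.randint(0, 2, (32,))
logits, tok = model(x)
loss = F.cross_entropy(logits, y) + 0.1 * token_contrastive_reg(tok, y)
loss.backward()
```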

Hyperparameter optimization of hp-greedy reduced basis for gravitational wave surrogates

  • paper_url: http://arxiv.org/abs/2310.15143
  • repo_url: None
  • paper_authors: Franco Cerino, Andrés Diaz-Pace, Emmanuel Tassone, Manuel Tiglio, Atuel Villegas
  • for: Optimizing the hyperparameters of hp-greedy reduced bases to improve the accuracy and efficiency of gravitational-wave surrogates.
  • methods: Hyperparameter optimization is performed with Bayesian optimization, which is shown to outperform grid and random searches.
  • results: For gravitational waves from the collision of two spinning but non-precessing black holes, local hp-greedy reduced bases with HPO reach the same accuracy with up to $4\times$ lower dimensionality, which should translate directly into a parameter-estimation speedup, and the Bayesian approach is about $100\times$ faster than a grid search.
    Abstract In a previous work we introduced, in the context of gravitational wave science, an initial study on an automated domain-decomposition approach for reduced basis through hp-greedy refinement. The approach constructs local reduced bases of lower dimensionality than global ones, with the same or higher accuracy. These ``light'' local bases should imply both faster evaluations when predicting new waveforms and faster data analysis, in particular faster statistical inference (the forward and inverse problems, respectively). In this approach, however, we have previously found important dependence on several hyperparameters, which do not appear in global reduced basis. This naturally leads to the problem of hyperparameter optimization (HPO), which is the subject of this paper. We tackle the problem through a Bayesian optimization, and show its superiority when compared to grid or random searches. We find that for gravitational waves from the collision of two spinning but non-precessing black holes, for the same accuracy, local hp-greedy reduced bases with HPO have a lower dimensionality of up to $4 \times$ for the cases here studied, depending on the desired accuracy. This factor should directly translate in a parameter estimation speedup, for instance. Such acceleration might help in the near real-time requirements for electromagnetic counterparts of gravitational waves from compact binary coalescences. In addition, we find that the Bayesian approach used in this paper for HPO is two orders of magnitude faster than, for example, a grid search, with about a $100 \times$ acceleration. The code developed for this project is available as open source from public repositories.
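The authors' optimization code lives in their public repositories; the snippet below is just a generic Bayesian-style hyperparameter search on a stand-in objective using the Optuna library (whose default TPE sampler is one sequential model-based approach, not necessarily the BO variant used in the paper). The parameter names and the objective are placeholders.

```python
import optuna

def objective(trial):
    # Placeholder surrogate-accuracy objective; in the paper the hyperparameters
    # would instead control the hp-greedy refinement (e.g. seed, depth, tolerance).
    depth = trial.suggest_int("max_depth", 1, 10)
    tol = trial.suggest_float("greedy_tol", 1e-8, 1e-3, log=True)
    # Pretend "validation error" that prefers moderate depth and small tolerance.
    return (depth - 6) ** 2 * 1e-4 + tol * 10.0

study = optuna.create_study(direction="minimize")   # TPE sampler by default
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```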

Mixed-Variable Global Sensitivity Analysis For Knowledge Discovery And Efficient Combinatorial Materials Design

  • paper_url: http://arxiv.org/abs/2310.15124
  • repo_url: None
  • paper_authors: Yigitcan Comlek, Liwei Wang, Wei Chen
  • for: Applying global sensitivity analysis (GSA) to engineering design problems with mixed quantitative and qualitative design variables.
  • methods: The paper integrates Latent Variable Gaussian Processes (LVGP) with Sobol' analysis to develop the first metamodel-based mixed-variable GSA method, validates it through numerical case studies, and combines it with multi-objective Bayesian optimization (BO) into a sensitivity-aware design framework.
  • results: The method effectively handles sensitivity analysis for mixed-variable problems and accelerates Pareto-front exploration for metal-organic framework (MOF) materials with many-level combinatorial design spaces.
    Abstract Global Sensitivity Analysis (GSA) is the study of the influence of any given inputs on the outputs of a model. In the context of engineering design, GSA has been widely used to understand both individual and collective contributions of design variables on the design objectives. So far, global sensitivity studies have often been limited to design spaces with only quantitative (numerical) design variables. However, many engineering systems also contain, if not only, qualitative (categorical) design variables in addition to quantitative design variables. In this paper, we integrate Latent Variable Gaussian Process (LVGP) with Sobol' analysis to develop the first metamodel-based mixed-variable GSA method. Through numerical case studies, we validate and demonstrate the effectiveness of our proposed method for mixed-variable problems. Furthermore, while the proposed GSA method is general enough to benefit various engineering design applications, we integrate it with multi-objective Bayesian optimization (BO) to create a sensitivity-aware design framework in accelerating the Pareto front design exploration for metal-organic framework (MOF) materials with many-level combinatorial design spaces. Although MOFs are constructed only from qualitative variables that are notoriously difficult to design, our method can utilize sensitivity analysis to navigate the optimization in the many-level large combinatorial design space, greatly expediting the exploration of novel MOF candidates.
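The LVGP-based mixed-variable estimator is not reproduced here. As a point of reference, the snippet below computes ordinary first-order and total-order Sobol' indices for a purely continuous benchmark function with the SALib library — the standard GSA setting that the paper generalizes to mixed variables.

```python
import numpy as np
from SALib.sample import saltelli   # newer SALib versions also offer SALib.sample.sobol
from SALib.analyze import sobol

problem = {
    "num_vars": 3,
    "names": ["x1", "x2", "x3"],
    "bounds": [[-np.pi, np.pi]] * 3,
}

X = saltelli.sample(problem, 1024)                  # Saltelli design of size N*(2D+2)
# Ishigami function, a common GSA benchmark.
Y = np.sin(X[:, 0]) + 7.0 * np.sin(X[:, 1]) ** 2 + 0.1 * X[:, 2] ** 4 * np.sin(X[:, 0])

Si = sobol.analyze(problem, Y)
print("first-order S1:", Si["S1"])
print("total-order ST:", Si["ST"])
```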

Evaluating machine learning models in non-standard settings: An overview and new findings

  • paper_url: http://arxiv.org/abs/2310.15108
  • repo_url: None
  • paper_authors: Roman Hornung, Malte Nalenz, Lennart Schneider, Andreas Bender, Ludwig Bothmann, Bernd Bischl, Thomas Augustin, Anne-Laure Boulesteix
  • for: Providing well-grounded guidelines for estimating the generalization error (GE) of machine learning models in non-standard settings, where standard resampling can yield biased estimates.
  • methods: The overview covers clustered data, spatial data, unequal sampling probabilities, concept drift, and hierarchically structured outcomes, combining well-established methodologies with other existing methods, and simulation studies assess whether setting-specific GE estimation is necessary.
  • results: The simulations confirm that standard resampling methods often yield biased GE estimates in non-standard settings, underscoring the importance of GE estimation tailored to the setting at hand.
    Abstract Estimating the generalization error (GE) of machine learning models is fundamental, with resampling methods being the most common approach. However, in non-standard settings, particularly those where observations are not independently and identically distributed, resampling using simple random data divisions may lead to biased GE estimates. This paper strives to present well-grounded guidelines for GE estimation in various such non-standard settings: clustered data, spatial data, unequal sampling probabilities, concept drift, and hierarchically structured outcomes. Our overview combines well-established methodologies with other existing methods that, to our knowledge, have not been frequently considered in these particular settings. A unifying principle among these techniques is that the test data used in each iteration of the resampling procedure should reflect the new observations to which the model will be applied, while the training data should be representative of the entire data set used to obtain the final model. Beyond providing an overview, we address literature gaps by conducting simulation studies. These studies assess the necessity of using GE-estimation methods tailored to the respective setting. Our findings corroborate the concern that standard resampling methods often yield biased GE estimates in non-standard settings, underscoring the importance of tailored GE estimation.
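As one concrete instance of the principle that test data should reflect the new observations a model will face, the sketch below contrasts naive K-fold cross-validation with cluster-aware (grouped) cross-validation using scikit-learn; the simulated clustered data is illustrative and not from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_clusters, per_cluster = 30, 20
groups = np.repeat(np.arange(n_clusters), per_cluster)
cluster_effect = rng.normal(0, 2, n_clusters)[groups]          # strong cluster-level signal
X = rng.normal(size=(n_clusters * per_cluster, 5))
y = X[:, 0] + cluster_effect + rng.normal(0, 0.5, len(groups))

model = RandomForestRegressor(n_estimators=200, random_state=0)
naive = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0),
                        scoring="neg_mean_squared_error")
grouped = cross_val_score(model, X, y, cv=GroupKFold(5), groups=groups,
                          scoring="neg_mean_squared_error")
# Naive CV leaks cluster information across folds and looks optimistic;
# grouped CV estimates the error for genuinely new clusters.
print("naive   CV MSE:", -naive.mean())
print("grouped CV MSE:", -grouped.mean())
```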

A Canonical Data Transformation for Achieving Inter- and Within-group Fairness

  • paper_url: http://arxiv.org/abs/2310.15097
  • repo_url: None
  • paper_authors: Zachary McBride Lazri, Ivan Brugere, Xin Tian, Dana Dachman-Soled, Antigoni Polychroniadou, Danial Dervovic, Min Wu
  • for: Achieving fairness when machine learning is applied to sensitive data, both across demographic groups and among individuals within the same group.
  • methods: The paper introduces a formal definition of within-group fairness and proposes a pre-processing framework that maps the feature vectors of members of different groups into an inter-group-fair canonical domain before scoring; the mapping preserves the relative relationships between the scores of individuals from the same group, guaranteeing within-group fairness with little compromise in accuracy.
  • results: The framework is applied to the COMPAS risk-assessment and Law School datasets and compared against two regularization-based methods in achieving inter-group and within-group fairness.
    Abstract Increases in the deployment of machine learning algorithms for applications that deal with sensitive data have brought attention to the issue of fairness in machine learning. Many works have been devoted to applications that require different demographic groups to be treated fairly. However, algorithms that aim to satisfy inter-group fairness (also called group fairness) may inadvertently treat individuals within the same demographic group unfairly. To address this issue, we introduce a formal definition of within-group fairness that maintains fairness among individuals from within the same group. We propose a pre-processing framework to meet both inter- and within-group fairness criteria with little compromise in accuracy. The framework maps the feature vectors of members from different groups to an inter-group-fair canonical domain before feeding them into a scoring function. The mapping is constructed to preserve the relative relationship between the scores obtained from the unprocessed feature vectors of individuals from the same demographic group, guaranteeing within-group fairness. We apply this framework to the COMPAS risk assessment and Law School datasets and compare its performance in achieving inter-group and within-group fairness to two regularization-based methods.
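The paper's canonical mapping is not reproduced in this digest; the sketch below shows one simple rank-preserving, per-group transformation (mapping each group's scores onto quantiles of a shared reference distribution), which keeps the within-group ordering intact. It illustrates the general idea only and is not the authors' construction.

```python
import numpy as np
from scipy.stats import rankdata

def to_canonical(scores, group, reference=None):
    """Map each group's scores to quantiles of a shared reference distribution.

    Within each group the ordering of individuals is preserved (within-group
    fairness in the rank sense), while all groups end up on a common scale.
    """
    scores = np.asarray(scores, dtype=float)
    reference = np.sort(scores if reference is None else np.asarray(reference, float))
    out = np.empty_like(scores)
    for g in np.unique(group):
        idx = np.where(group == g)[0]
        q = rankdata(scores[idx]) / (len(idx) + 1)       # empirical quantile within the group
        out[idx] = np.quantile(reference, q)             # read off the shared reference
    return out

rng = np.random.default_rng(0)
group = np.repeat(["A", "B"], 500)
raw = np.concatenate([rng.normal(0.0, 1.0, 500),         # group A scores
                      rng.normal(1.5, 0.5, 500)])        # group B scores, shifted scale
canon = to_canonical(raw, group)
# After the mapping both groups share (approximately) the same score distribution,
# while the ranking inside each group is unchanged.
```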

Quantum Federated Learning With Quantum Networks

  • paper_url: http://arxiv.org/abs/2310.15084
  • repo_url: None
  • paper_authors: Tyler Wang, Huan-Hsin Tseng, Shinjae Yoo
  • for: Improving the security and privacy of deep learning models by performing federated learning over the quantum internet.
  • methods: The proposed scheme replaces the hub-spoke topology with a decentralized ring topology, in which each client receives a portion of the dataset and trains only on that portion, and introduces quantum weights so that training is performed entirely in quantum.
  • results: The paper demonstrates the first successful use of quantum weights for quantum federated learning, yielding a completely quantumized federated learning scheme.
    Abstract A major concern of deep learning models is the large amount of data that is required to build and train them, much of which is reliant on sensitive and personally identifiable information that is vulnerable to access by third parties. Ideas of using the quantum internet to address this issue have been previously proposed, which would enable fast and completely secure online communications. Previous work has yielded a hybrid quantum-classical transfer learning scheme for classical data and communication with a hub-spoke topology. While quantum communication is secure from eavesdrop attacks and no measurements from quantum to classical translation, due to no cloning theorem, hub-spoke topology is not ideal for quantum communication without quantum memory. Here we seek to improve this model by implementing a decentralized ring topology for the federated learning scheme, where each client is given a portion of the entire dataset and only performs training on that set. We also demonstrate the first successful use of quantum weights for quantum federated learning, which allows us to perform our training entirely in quantum.
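The quantum communication and quantum weights cannot be captured in a short snippet; the sketch below only shows the classical skeleton of a decentralized ring topology, where each client trains on its own shard and hands averaged parameters to the next client. The model, data, and averaging rule are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))

def local_step(model, x, y, lr=0.05):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    opt.zero_grad()
    nn.functional.cross_entropy(model(x), y).backward()
    opt.step()

def average_into(dst, src):
    """In-place average of dst's parameters with src's (the ring hand-off)."""
    with torch.no_grad():
        for pd, ps in zip(dst.parameters(), src.parameters()):
            pd.mul_(0.5).add_(0.5 * ps)

# Three clients in a ring, each holding its own shard of the data.
torch.manual_seed(0)
shards = [(torch.randn(64, 10), torch.randint(0, 2, (64,))) for _ in range(3)]
clients = [make_model() for _ in range(3)]

for _ in range(20):                       # ring rounds
    for i, (x, y) in enumerate(shards):
        local_step(clients[i], x, y)      # local training on the client's shard only
        average_into(clients[(i + 1) % 3], clients[i])   # pass parameters to the next client
```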

Coordinated Replay Sample Selection for Continual Federated Learning

  • paper_url: http://arxiv.org/abs/2310.15054
  • repo_url: None
  • paper_authors: Jack Good, Jimit Majmudar, Christophe Dupuy, Jixuan Wang, Charith Peris, Clement Chung, Richard Zemel, Rahul Gupta
  • for: Reducing forgetting in continual federated learning (CFL), where a central model is learned across clients from a continual stream of data without keeping the entire history.
  • methods: The paper adapts a replay sample selection objective based on loss-gradient diversity to CFL, proposes a new relaxation-based selection of samples to optimize this objective, and introduces a practical algorithm for coordinating gradient-based replay sample selection across clients without communicating private data.
  • results: On language models trained on a large-scale de-identified real-world text dataset, gradient-based sample selection boosts performance and reduces forgetting compared with random sampling, and the coordinated variant shows gains early in the low-replay-size regime, when the budget for storing past data is small.
    Abstract Continual Federated Learning (CFL) combines Federated Learning (FL), the decentralized learning of a central model on a number of client devices that may not communicate their data, and Continual Learning (CL), the learning of a model from a continual stream of data without keeping the entire history. In CL, the main challenge is \textit{forgetting} what was learned from past data. While replay-based algorithms that keep a small pool of past training data are effective to reduce forgetting, only simple replay sample selection strategies have been applied to CFL in prior work, and no previous work has explored coordination among clients for better sample selection. To bridge this gap, we adapt a replay sample selection objective based on loss gradient diversity to CFL and propose a new relaxation-based selection of samples to optimize the objective. Next, we propose a practical algorithm to coordinate gradient-based replay sample selection across clients without communicating private data. We benchmark our coordinated and uncoordinated replay sample selection algorithms against random sampling-based baselines with language models trained on a large scale de-identified real-world text dataset. We show that gradient-based sample selection methods both boost performance and reduce forgetting compared to random sampling methods, with our coordination method showing gains early in the low replay size regime (when the budget for storing past data is small).

Fast 2D Bicephalous Convolutional Autoencoder for Compressing 3D Time Projection Chamber Data

  • paper_url: http://arxiv.org/abs/2310.15026
  • repo_url: https://github.com/bnl-daq-ldrd/neuralcompression_v2
  • paper_authors: Yi Huang, Yihui Ren, Shinjae Yoo, Jin Huang
  • for: Developing real-time data compression for high-energy large-scale particle colliders, specifically the sparse 3D time projection chamber data of the sPHENIX experiment.
  • methods: A 3D convolutional neural network (CNN)-based approach, the Bicephalous Convolutional Autoencoder (BCAE), compresses the data, and two variants are introduced: BCAE++ and BCAE-2D, the latter treating the radial direction as the channel dimension of an image.
  • results: BCAE++ achieves a 15% better compression ratio and a 77% better reconstruction accuracy (mean absolute error) than BCAE, BCAE-2D gives a 3x speedup in compression throughput, and both variants gain a 76-79% throughput increase from half-precision mode without loss in reconstruction accuracy.
    Abstract High-energy large-scale particle colliders produce data at high speed in the order of 1 terabytes per second in nuclear physics and petabytes per second in high-energy physics. Developing real-time data compression algorithms to reduce such data at high throughput to fit permanent storage has drawn increasing attention. Specifically, at the newly constructed sPHENIX experiment at the Relativistic Heavy Ion Collider (RHIC), a time projection chamber is used as the main tracking detector, which records particle trajectories in a volume of a three-dimensional (3D) cylinder. The resulting data are usually very sparse with occupancy around 10.8%. Such sparsity presents a challenge to conventional learning-free lossy compression algorithms, such as SZ, ZFP, and MGARD. The 3D convolutional neural network (CNN)-based approach, Bicephalous Convolutional Autoencoder (BCAE), outperforms traditional methods both in compression rate and reconstruction accuracy. BCAE can also utilize the computation power of graphical processing units suitable for deployment in a modern heterogeneous high-performance computing environment. This work introduces two BCAE variants: BCAE++ and BCAE-2D. BCAE++ achieves a 15% better compression ratio and a 77% better reconstruction accuracy measured in mean absolute error compared with BCAE. BCAE-2D treats the radial direction as the channel dimension of an image, resulting in a 3x speedup in compression throughput. In addition, we demonstrate an unbalanced autoencoder with a larger decoder can improve reconstruction accuracy without significantly sacrificing throughput. Lastly, we observe both the BCAE++ and BCAE-2D can benefit more from using half-precision mode in throughput (76-79% increase) without loss in reconstruction accuracy. The source code and links to data and pretrained models can be found at https://github.com/BNL-DAQ-LDRD/NeuralCompression_v2.
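Links to the actual models are given in the repository above; the sketch below is only a toy 2D convolutional autoencoder illustrating the BCAE-2D idea of treating the radial direction as the channel dimension of an image, with all shapes, layer sizes, and the plain MSE loss invented for the example (the real BCAE uses a two-headed, i.e. bicephalous, decoder).

```python
import torch
import torch.nn as nn

class TinyTPCAutoencoder(nn.Module):
    """Toy 2D conv autoencoder: radial bins -> channels, azimuth x z -> image plane."""
    def __init__(self, radial_bins=16, latent_ch=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(radial_bins, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_ch, kernel_size=3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, radial_bins, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)            # compressed representation to be stored
        return self.decoder(z), z

# Sparse toy TPC frame with ~10.8% occupancy, shape (batch, radial, azimuth, z).
torch.manual_seed(0)
mask = torch.rand(4, 16, 64, 64) < 0.108
x = torch.rand(4, 16, 64, 64) * mask
model = TinyTPCAutoencoder()
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)
loss.backward()
print(x.numel() / code.numel(), "x reduction in stored element count")
```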

Leveraging Deep Learning for Abstractive Code Summarization of Unofficial Documentation

  • paper_url: http://arxiv.org/abs/2310.15015
  • repo_url: None
  • paper_authors: AmirHossein Naghshzan, Latifa Guerrouj, Olga Baysal
  • for: Helping developers learn APIs by generating high-quality summaries from unofficial documentation.
  • methods: The approach uses the BART algorithm, a state-of-the-art transformer model, to automatically generate summaries for APIs discussed on Stack Overflow, and evaluates them against an oracle of human-written summaries using the ROUGE and BLEU metrics.
  • results: The deep learning approach improves summary quality and outperforms previous work by an average of 57% in precision, 66% in recall, and 61% in F-measure, while running 4.4 times faster.
    Abstract Usually, programming languages have official documentation to guide developers with APIs, methods, and classes. However, researchers identified insufficient or inadequate documentation examples and flaws with the API's complex structure as barriers to learning an API. As a result, developers may consult other sources (StackOverflow, GitHub, etc.) to learn more about an API. Recent research studies have shown that unofficial documentation is a valuable source of information for generating code summaries. We, therefore, have been motivated to leverage such a type of documentation along with deep learning techniques towards generating high-quality summaries for APIs discussed in informal documentation. This paper proposes an automatic approach using the BART algorithm, a state-of-the-art transformer model, to generate summaries for APIs discussed in StackOverflow. We built an oracle of human-generated summaries to evaluate our approach against it using ROUGE and BLEU metrics which are the most widely used evaluation metrics in text summarization. Furthermore, we evaluated our summaries empirically against a previous work in terms of quality. Our findings demonstrate that using deep learning algorithms can improve summaries' quality and outperform the previous work by an average of 57% for Precision, 66% for Recall, and 61% for F-measure, and it runs 4.4 times faster.
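The authors' fine-tuned model is not linked in this digest; the snippet below only shows the generic Hugging Face transformers pattern for abstractive summarization with an off-the-shelf BART checkpoint (facebook/bart-large-cnn), not the paper's fine-tuned weights or Stack Overflow data.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Off-the-shelf summarization checkpoint; the paper instead fine-tunes BART on
# Stack Overflow discussions, which is not reproduced here.
name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

post = (
    "The recommended way to read a file line by line in Python is to open it with a "
    "context manager and iterate over the file object, which avoids loading the whole "
    "file into memory and closes the handle automatically."
)
inputs = tokenizer(post, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, max_length=60, num_beams=4, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```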

Neural Snowflakes: Universal Latent Graph Inference via Trainable Latent Geometries

  • paper_url: http://arxiv.org/abs/2310.15003
  • repo_url: https://github.com/zeniSoida/pl1
  • paper_authors: Haitz Sáez de Ocáriz Borde, Anastasis Kratsios
  • for: Improving the predictive performance of graph neural networks (GNNs) by dynamically rewiring or inferring their latent graph.
  • methods: The paper introduces a trainable deep learning architecture, the neural snowflake, that can adaptively implement fractal-like metrics on $\mathbb{R}^d$, combined with a standard MLP encoder.
  • results: Any given finite weights graph can be isometrically embedded by a standard MLP encoder, and when the latent graph can be represented in the feature space of a sufficiently regular kernel, the combined neural snowflake and MLP encoder avoid the curse of dimensionality, using only a low-degree polynomial number of parameters in the number of nodes and enabling a low-dimensional isometric embedding. In synthetic experiments and on graph benchmarks, the model matches or surpasses state-of-the-art latent graph inference models without requiring random search over latent geometries.
    Abstract The inductive bias of a graph neural network (GNN) is largely encoded in its specified graph. Latent graph inference relies on latent geometric representations to dynamically rewire or infer a GNN's graph to maximize the GNN's predictive downstream performance, but it lacks solid theoretical foundations in terms of embedding-based representation guarantees. This paper addresses this issue by introducing a trainable deep learning architecture, coined neural snowflake, that can adaptively implement fractal-like metrics on $\mathbb{R}^d$. We prove that any given finite weights graph can be isometrically embedded by a standard MLP encoder. Furthermore, when the latent graph can be represented in the feature space of a sufficiently regular kernel, we show that the combined neural snowflake and MLP encoder do not succumb to the curse of dimensionality by using only a low-degree polynomial number of parameters in the number of nodes. This implementation enables a low-dimensional isometric embedding of the latent graph. We conduct synthetic experiments to demonstrate the superior metric learning capabilities of neural snowflakes when compared to more familiar spaces like Euclidean space. Additionally, we carry out latent graph inference experiments on graph benchmarks. Consistently, the neural snowflake model achieves predictive performance that either matches or surpasses that of the state-of-the-art latent graph inference models. Importantly, this performance improvement is achieved without requiring random search for optimal latent geometry. Instead, the neural snowflake model achieves this enhancement in a differentiable manner.

Bayesian Regression Markets

  • paper_url: http://arxiv.org/abs/2310.14992
  • repo_url: https://github.com/tdfalc/regression-markets
  • paper_authors: Thomas Falconer, Jalal Kazempour, Pierre Pinson
  • for: Addressing the fact that machine learning tasks depend on the quality of input data, while adequate datasets are naturally distributed among owners who may be competitors and reluctant to share information.
  • methods: Focusing on supervised regression tasks, the paper develops a regression market within a Bayesian framework that provides a monetary incentive for data sharing and covers a general class of regression tasks.
  • results: The paper shows that similar proposals in the current literature expose market agents to sizeable financial risks, which are mitigated in the proposed probabilistic setting.
    Abstract Machine learning tasks are vulnerable to the quality of data used as input. Yet, it is often challenging for firms to obtain adequate datasets, with them being naturally distributed amongst owners, that in practice, may be competitors in a downstream market and reluctant to share information. Focusing on supervised learning for regression tasks, we develop a \textit{regression market} to provide a monetary incentive for data sharing. Our proposed mechanism adopts a Bayesian framework, allowing us to consider a more general class of regression tasks. We present a thorough exploration of the market properties, and show that similar proposals in current literature expose the market agents to sizeable financial risks, which can be mitigated in our probabilistic setting.

Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate

  • paper_url: http://arxiv.org/abs/2310.14982
  • repo_url: None
  • paper_authors: Pengfei Sun, Jibin Wu, Malu Zhang, Paul Devos, Dick Botteldooren
  • for: Improving the temporal modelling ability of vanilla RNNs, alleviating vanishing and exploding gradients and improving generalization.
  • methods: The paper proposes a Delayed Memory Unit (DMU) that combines a delay-line structure with delay gates, directly distributing input information to the optimal future time instant instead of aggregating and redistributing it through intricate recurrent dynamics.
  • results: The DMU outperforms state-of-the-art gated RNN models on a broad range of sequential modelling tasks, including speech recognition, radar gesture recognition, ECG waveform segmentation, and permuted sequential image classification, while using considerably fewer parameters.
    Abstract Recurrent Neural Networks (RNNs) are renowned for their adeptness in modeling temporal dependencies, a trait that has driven their widespread adoption for sequential data processing. Nevertheless, vanilla RNNs are confronted with the well-known issue of gradient vanishing and exploding, posing a significant challenge for learning and establishing long-range dependencies. Additionally, gated RNNs tend to be over-parameterized, resulting in poor network generalization. To address these challenges, we propose a novel Delayed Memory Unit (DMU) in this paper, wherein a delay line structure, coupled with delay gates, is introduced to facilitate temporal interaction and temporal credit assignment, so as to enhance the temporal modeling capabilities of vanilla RNNs. Particularly, the DMU is designed to directly distribute the input information to the optimal time instant in the future, rather than aggregating and redistributing it over time through intricate network dynamics. Our proposed DMU demonstrates superior temporal modeling capabilities across a broad range of sequential modeling tasks, utilizing considerably fewer parameters than other state-of-the-art gated RNN models in applications such as speech recognition, radar gesture recognition, ECG waveform segmentation, and permuted sequential image classification.

Reinforcement learning in large, structured action spaces: A simulation study of decision support for spinal cord injury rehabilitation

  • paper_url: http://arxiv.org/abs/2310.14976
  • repo_url: None
  • paper_authors: Nathan Phelps, Stephanie Marrocco, Stephanie Cornell, Dalton L. Wolfe, Daniel J. Lizotte
  • for: Using reinforcement learning (RL) to improve treatment decisions in spinal cord injury (SCI) rehabilitation, a setting that is difficult for traditional RL because there are many possible treatments (a large action space) and few patients (limited training data).
  • methods: Two approaches group treatments so that an RL agent can learn effectively from limited data: one based on domain knowledge of SCI rehabilitation and one that learns similarities among treatments with an embedding technique; Fitted Q Iteration then trains an agent that learns optimal treatments.
  • results: In a simulation study designed to reflect the properties of SCI rehabilitation, both methods improve physiotherapists' treatment decisions, with the domain-knowledge approach performing better, suggesting that continued data collection and application of RL in this domain are worthwhile.
    Abstract Reinforcement learning (RL) has helped improve decision-making in several applications. However, applying traditional RL is challenging in some applications, such as rehabilitation of people with a spinal cord injury (SCI). Among other factors, using RL in this domain is difficult because there are many possible treatments (i.e., large action space) and few patients (i.e., limited training data). Treatments for SCIs have natural groupings, so we propose two approaches to grouping treatments so that an RL agent can learn effectively from limited data. One relies on domain knowledge of SCI rehabilitation and the other learns similarities among treatments using an embedding technique. We then use Fitted Q Iteration to train an agent that learns optimal treatments. Through a simulation study designed to reflect the properties of SCI rehabilitation, we find that both methods can help improve the treatment decisions of physiotherapists, but the approach based on domain knowledge offers better performance. Our findings provide a "proof of concept" that RL can be used to help improve the treatment of those with an SCI and indicates that continued efforts to gather data and apply RL to this domain are worthwhile.
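The clinical simulator and treatment groupings are specific to the paper; the sketch below is only the generic Fitted Q Iteration loop on a made-up batch of transitions, using a scikit-learn extra-trees regressor as the function approximator. The states, rewards, and number of grouped actions are fabricated for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
n_actions, gamma = 4, 0.95

# Fabricated batch of transitions (s, a, r, s'): states are 3-dim features.
S = rng.normal(size=(2000, 3))
A = rng.integers(0, n_actions, 2000)
R = (S[:, 0] > 0).astype(float) * (A == 2) + 0.1 * rng.normal(size=2000)
S_next = S + 0.1 * rng.normal(size=S.shape)

def q_features(states, actions):
    onehot = np.eye(n_actions)[actions]
    return np.hstack([states, onehot])

model, targets = None, R.copy()
for _ in range(30):                                       # FQI iterations
    model = ExtraTreesRegressor(n_estimators=50, random_state=0)
    model.fit(q_features(S, A), targets)
    # Bootstrapped target: r + gamma * max_a' Q(s', a')
    q_next = np.column_stack([
        model.predict(q_features(S_next, np.full(len(S), a))) for a in range(n_actions)
    ])
    targets = R + gamma * q_next.max(axis=1)

greedy_actions = np.column_stack([
    model.predict(q_features(S, np.full(len(S), a))) for a in range(n_actions)
]).argmax(axis=1)
```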

The Fundamental Dilemma of Bayesian Active Meta-learning

  • paper_url: http://arxiv.org/abs/2310.14968
  • repo_url: None
  • paper_authors: Sabina J. Sloman, Ayush Bharti, Samuel Kaski
  • for: Estimating parameters that generalize across multiple diverse but data-scarce task environments.
  • methods: Bayesian active meta-learning, a form of sequential optimal experimental design, in which the learner must balance acquiring transferable knowledge against identifying the current task-specific parameters.
  • results: The study shows that greedily pursuing transferable knowledge can hurt estimation of the transferable parameters (so-called negative transfer), that some tasks pose an inevitable and arbitrarily large threat of negative transfer, and that task identification is critical to reducing this threat.
    Abstract Many applications involve estimation of parameters that generalize across multiple diverse, but related, data-scarce task environments. Bayesian active meta-learning, a form of sequential optimal experimental design, provides a framework for solving such problems. The active meta-learner's goal is to gain transferable knowledge (estimate the transferable parameters) in the presence of idiosyncratic characteristics of the current task (task-specific parameters). We show that in such a setting, greedy pursuit of this goal can actually hurt estimation of the transferable parameters (induce so-called negative transfer). The learner faces a dilemma akin to but distinct from the exploration--exploitation dilemma: should they spend their acquisition budget pursuing transferable knowledge, or identifying the current task-specific parameters? We show theoretically that some tasks pose an inevitable and arbitrarily large threat of negative transfer, and that task identification is critical to reducing this threat. Our results generalize to analysis of prior misspecification over nuisance parameters. Finally, we empirically illustrate circumstances that lead to negative transfer.

Adam through a Second-Order Lens

  • paper_url: http://arxiv.org/abs/2310.14963
  • repo_url: https://github.com/rmclarke/adamthroughasecondorderlens
  • paper_authors: Ross M. Clarke, Baiyu Su, José Miguel Hernández-Lobato
  • for: Combining the computational efficiency of first-order methods (such as SGD and Adam) with the theoretical efficiency of second-order methods (such as quasi-Newton methods and K-FAC) for deep learning optimization.
  • methods: The proposed AdamQLR optimizer combines damping and learning-rate selection techniques from K-FAC (Martens and Grosse, 2015) with the update directions proposed by Adam, motivated by viewing Adam through a second-order lens.
  • results: AdamQLR is evaluated on a range of regression and classification tasks at various scales and achieves competitive generalization performance versus runtime.
    Abstract Research into optimisation for deep learning is characterised by a tension between the computational efficiency of first-order, gradient-based methods (such as SGD and Adam) and the theoretical efficiency of second-order, curvature-based methods (such as quasi-Newton methods and K-FAC). We seek to combine the benefits of both approaches into a single computationally-efficient algorithm. Noting that second-order methods often depend on stabilising heuristics (such as Levenberg-Marquardt damping), we propose AdamQLR: an optimiser combining damping and learning rate selection techniques from K-FAC (Martens and Grosse, 2015) with the update directions proposed by Adam, inspired by considering Adam through a second-order lens. We evaluate AdamQLR on a range of regression and classification tasks at various scales, achieving competitive generalisation performance vs runtime.

XTSC-Bench: Quantitative Benchmarking for Explainers on Time Series Classification

  • paper_url: http://arxiv.org/abs/2310.14957
  • repo_url: https://github.com/jhoelli/xtsc-bench
  • paper_authors: Jacqueline Höllig, Steffen Thoma, Florian Grimm
  • for: Systematically and quantitatively evaluating explanation methods for time series classification (TSC), where qualitative assessment is difficult because humans struggle to interpret the information in raw time series.
  • methods: The paper proposes XTSC-Bench, a benchmarking tool with standardized datasets, models, and metrics, and uses it to analyze 3 perturbation-based, 6 gradient-based, and 2 example-based explanation methods.
  • results: The analysis shows that improvements in the explainers' robustness and reliability are necessary, especially for multivariate time series data.
    Abstract Despite the growing body of work on explainable machine learning in time series classification (TSC), it remains unclear how to evaluate different explainability methods. Resorting to qualitative assessment and user studies to evaluate explainers for TSC is difficult since humans have difficulties understanding the underlying information contained in time series data. Therefore, a systematic review and quantitative comparison of explanation methods to confirm their correctness becomes crucial. While steps to standardized evaluations were taken for tabular, image, and textual data, benchmarking explainability methods on time series is challenging due to a) traditional metrics not being directly applicable, b) implementation and adaption of traditional metrics for time series in the literature vary, and c) varying baseline implementations. This paper proposes XTSC-Bench, a benchmarking tool providing standardized datasets, models, and metrics for evaluating explanation methods on TSC. We analyze 3 perturbation-, 6 gradient- and 2 example-based explanation methods to TSC showing that improvements in the explainers' robustness and reliability are necessary, especially for multivariate data.
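XTSC-Bench's own metrics live in its repository; the sketch below only illustrates one common quantitative check for time-series explainers — a perturbation-based faithfulness score that masks the most-attributed cells and measures the drop in the predicted class probability. The model and attribution here are placeholders.

```python
import numpy as np

def faithfulness_drop(predict_fn, x, attribution, frac=0.1, baseline=0.0):
    """Mask the top `frac` most-attributed (channel, time) cells and measure how much
    the predicted class probability drops. A larger drop suggests a more faithful explainer."""
    x = np.asarray(x, dtype=float)                   # shape (channels, timesteps)
    scores = np.asarray(attribution).ravel()
    k = max(1, int(frac * scores.size))
    top = np.argsort(-np.abs(scores))[:k]
    x_masked = x.copy().ravel()
    x_masked[top] = baseline
    x_masked = x_masked.reshape(x.shape)
    p_orig = predict_fn(x[None])[0]
    p_masked = predict_fn(x_masked[None])[0]
    cls = int(np.argmax(p_orig))
    return float(p_orig[cls] - p_masked[cls])

# Placeholder model and attribution (stand-ins for a trained TSC model and an explainer).
def toy_predict(batch):
    logit = batch[:, 0, :].mean(axis=1)              # this "model" only looks at channel 0
    p1 = 1.0 / (1.0 + np.exp(-logit))
    return np.stack([1 - p1, p1], axis=1)

x = np.random.default_rng(0).normal(size=(3, 100))
attr = np.zeros_like(x)
attr[0] = 1.0                                        # explainer credits channel 0
print("faithfulness drop:", faithfulness_drop(toy_predict, x, attr, frac=0.2))
```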

Causal machine learning for single-cell genomics

  • paper_url: http://arxiv.org/abs/2310.14935
  • repo_url: None
  • paper_authors: Alejandro Tejada-Lapuerta, Paul Bertin, Stefan Bauer, Hananeh Aliee, Yoshua Bengio, Fabian J. Theis
  • for: The paper is written to explore the application of causal techniques and algorithms to handle high-dimensional data in single-cell genomics, and to challenge the assumptions of current causal approaches from a biological perspective.
  • methods: The paper uses large-scale perturbation screens and single-cell omics technologies to measure the effect of targeted perturbations on the whole transcriptome, and discusses the application of established causal techniques and algorithms to handle high-dimensional data in this context.
  • results: The paper identifies open problems in the application of causal approaches to single-cell data, including generalizing to unseen environments, learning interpretable models, and learning causal models of dynamics, and discusses various research directions to address these challenges.
    Abstract Advances in single-cell omics allow for unprecedented insights into the transcription profiles of individual cells. When combined with large-scale perturbation screens, through which specific biological mechanisms can be targeted, these technologies allow for measuring the effect of targeted perturbations on the whole transcriptome. These advances provide an opportunity to better understand the causative role of genes in complex biological processes such as gene regulation, disease progression or cellular development. However, the high-dimensional nature of the data, coupled with the intricate complexity of biological systems renders this task nontrivial. Within the machine learning community, there has been a recent increase of interest in causality, with a focus on adapting established causal techniques and algorithms to handle high-dimensional data. In this perspective, we delineate the application of these methodologies within the realm of single-cell genomics and their challenges. We first present the model that underlies most of current causal approaches to single-cell biology and discuss and challenge the assumptions it entails from the biological point of view. We then identify open problems in the application of causal approaches to single-cell data: generalising to unseen environments, learning interpretable models, and learning causal models of dynamics. For each problem, we discuss how various research directions - including the development of computational approaches and the adaptation of experimental protocols - may offer ways forward, or on the contrary pose some difficulties. With the advent of single cell atlases and increasing perturbation data, we expect causal models to become a crucial tool for informed experimental design.

Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks

  • paper_url: http://arxiv.org/abs/2310.14901
  • repo_url: https://github.com/rmclarke/seriesofhessianvectorproducts
  • paper_authors: Elre T. Oldewage, Ross M. Clarke, José Miguel Hernández-Lobato
  • for: Making second-order, saddle-free Newton optimization tractable for neural networks, where the Hessian matrix is intractably large and non-convexity must be addressed.
  • methods: The paper frames the problem as a series that principally square-roots and inverts the squared Hessian and uses it to precondition a gradient vector, all without explicitly computing or eigendecomposing the Hessian; a truncation of this infinite series yields a new scalable optimization algorithm.
  • results: In experiments, including a ResNet-18 trained on CIFAR-10, the algorithm is comparable to other first- and second-order optimization methods in both runtime and optimization performance.
    Abstract Despite their popularity in the field of continuous optimisation, second-order quasi-Newton methods are challenging to apply in machine learning, as the Hessian matrix is intractably large. This computational burden is exacerbated by the need to address non-convexity, for instance by modifying the Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an optimisation algorithm which addresses both of these concerns - to our knowledge, the first efficiently-scalable optimisation algorithm to asymptotically use the exact (eigenvalue-modified) inverse Hessian. Our method frames the problem as a series which principally square-roots and inverts the squared Hessian, then uses it to precondition a gradient vector, all without explicitly computing or eigendecomposing the Hessian. A truncation of this infinite series provides a new optimisation algorithm which is scalable and comparable to other first- and second-order optimisation methods in both runtime and optimisation performance. We demonstrate this in a variety of settings, including a ResNet-18 trained on CIFAR-10.
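The series itself is specific to the paper, but its basic primitive — a Hessian-vector product obtained by double backpropagation, without ever forming the Hessian — is standard; the sketch below shows that primitive in PyTorch on a tiny model. The model and data are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
x, y = torch.randn(32, 4), torch.randn(32, 1)
params = list(model.parameters())

loss = nn.functional.mse_loss(model(x), y)
grads = torch.autograd.grad(loss, params, create_graph=True)   # keep graph for a second backward

# Hessian-vector product Hv via double backprop: d/dtheta (g . v) = H v,
# never materializing the Hessian. Series-based preconditioners of this kind
# apply this primitive repeatedly instead of eigendecomposing H.
v = [torch.randn_like(p) for p in params]
gv = sum((g * vi).sum() for g, vi in zip(grads, v))
hvp = torch.autograd.grad(gv, params)

hv_norm = sum(h.pow(2).sum() for h in hvp).sqrt()
print("||Hv|| =", hv_norm.item())
```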

Beyond Bayesian Model Averaging over Paths in Probabilistic Programs with Stochastic Support

  • paper_url: http://arxiv.org/abs/2310.14888
  • repo_url: https://github.com/treigerm/beyond_bma_in_probprog
  • paper_authors: Tim Reichelt, Luke Ong, Tom Rainforth
  • for: 这篇论文的目的是解决 probabilistic programs 中的 posterior 问题,以及如何改善 predictions 的准确性。
  • methods: 论文使用的方法包括 Bayesian model averaging (BMA) 和 two alternative mechanisms for path weighting:一种是基于 stacking 的方法,另一种是基于 PAC-Bayes 的方法。
  • results: 试验表明,使用这两种机制可以更好地改善 predictions 的准确性,比 Default BMA weights 更加稳定和有效。
    Abstract The posterior in probabilistic programs with stochastic support decomposes as a weighted sum of the local posterior distributions associated with each possible program path. We show that making predictions with this full posterior implicitly performs a Bayesian model averaging (BMA) over paths. This is potentially problematic, as model misspecification can cause the BMA weights to prematurely collapse onto a single path, leading to sub-optimal predictions in turn. To remedy this issue, we propose alternative mechanisms for path weighting: one based on stacking and one based on ideas from PAC-Bayes. We show how both can be implemented as a cheap post-processing step on top of existing inference engines. In our experiments, we find them to be more robust and lead to better predictions compared to the default BMA weights.
    摘要 posterior in probabilistic 程序中的权重分布分解为每个可能的程序路径的本地 posterior 分布的加权和。我们显示出,使用这个全部 posterior 来进行预测实际上是在执行 Bayesian 模型均衡(BMA)中,这可能会导致模型错误所致的 weights 快速塌缩到单个路径上,从而导致预测质量下降。为了解决这个问题,我们提议了一些路径权重机制:一种基于堆叠,一种基于 PAC-Bayes 的想法。我们示示了这两种机制可以作为现有推理引擎的便宜后处理步骤来实现,并在我们的实验中发现它们比 default BMA weights 更加稳定,预测更加 precis。
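
    For readers unfamiliar with stacking, the first of the two alternative weighting mechanisms, a rough sketch of the generic computation is given below; the softmax parameterisation, interface and toy data are assumptions of mine, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import minimize

def stacking_weights(log_pred_dens):
    """Given a (paths x held-out points) matrix of per-path log predictive
    densities, find simplex weights maximising the mean log density of the
    weighted mixture (generic stacking of predictive distributions)."""
    K, _ = log_pred_dens.shape
    dens = np.exp(log_pred_dens)

    def neg_score(v):
        w = np.exp(v) / np.exp(v).sum()                # softmax keeps w on the simplex
        return -np.mean(np.log(dens.T @ w + 1e-300))

    res = minimize(neg_score, np.zeros(K), method="L-BFGS-B")
    return np.exp(res.x) / np.exp(res.x).sum()

# two hypothetical program paths; path 0 explains most held-out points better
log_pred = np.vstack([np.random.normal(-1.0, 0.3, 200),
                      np.random.normal(-2.0, 0.3, 200)])
print(stacking_weights(log_pred))                      # weight concentrates on path 0
```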

Diverse Priors for Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.14864
  • repo_url: None
  • paper_authors: Chenfan Weng, Zhongguo Li
  • for: 解决RLagent在学习过程中的exploration-exploitation矛盾,通过使用不确定性为导向思想,提供一种有效的方法。
  • methods: 使用ensemble-based方法来衡量不确定性,并在RLagent的初始值函数中加入随机函数作为先验。
  • results: 比random prior方法更高效地解决经典控制问题和普通的探索任务,大幅提高样本效率。
    Abstract In Reinforcement Learning (RL), agents aim at maximizing cumulative rewards in a given environment. During the learning process, RL agents face the dilemma of exploitation and exploration: leveraging existing knowledge to acquire rewards or seeking potentially higher ones. Using uncertainty as a guiding principle provides an active and effective approach to solving this dilemma and ensemble-based methods are one of the prominent avenues for quantifying uncertainty. Nevertheless, conventional ensemble-based uncertainty estimation lacks an explicit prior, deviating from Bayesian principles. Besides, this method requires diversity among members to generate less biased uncertainty estimation results. To address the above problems, previous research has incorporated random functions as priors. Building upon these foundational efforts, our work introduces an innovative approach with delicately designed prior NNs, which can incorporate maximal diversity in the initial value functions of RL. Our method has demonstrated superior performance compared with the random prior approaches in solving classic control problems and general exploration tasks, significantly improving sample efficiency.
    摘要 在强化学习(RL)中,代理人目标是 maximize 累积奖励在给定环境中。在学习过程中,RL 代理人面临的权衡决策:利用现有知识来获得奖励或寻找可能更高的奖励。使用不确定性为导向原则提供了一种活跃和有效的方法解决这个权衡决策,并且集成方法是RL中一个显著的进路。然而,传统的集成方法缺乏明确的先验, deviating from Bayesian principles。此外,这种方法需要集成成员之间的多样性,以生成更加不偏的不确定性估计结果。为了解决以上问题,前一些研究已经将随机函数作为先验。我们的研究基于这些基础努力,提出了一种创新的方法,通过精心设计的先验神经网络,可以吸收最大的多样性在RL中的初始值函数中。我们的方法在经典控制问题和普通探索任务中表现出了superior的性能,significantly improve sample efficiency。
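
    The additive random-prior construction that this line of work starts from (and that the paper then improves by designing the priors for maximal diversity) fits in a few lines; the network sizes and prior scale below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PriorNet(nn.Module):
    """Value network plus a frozen, randomly initialised prior network whose
    output is added with scale beta; only the first part is ever trained."""
    def __init__(self, obs_dim, n_actions, beta=3.0):
        super().__init__()
        def mlp():
            return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
        self.trainable, self.prior, self.beta = mlp(), mlp(), beta
        for p in self.prior.parameters():       # the prior is never updated
            p.requires_grad_(False)

    def forward(self, obs):
        return self.trainable(obs) + self.beta * self.prior(obs)

ensemble = [PriorNet(obs_dim=4, n_actions=2) for _ in range(5)]
obs = torch.randn(1, 4)
q_values = torch.stack([member(obs) for member in ensemble])
print(q_values.std(dim=0))   # ensemble disagreement, used as an uncertainty signal
```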

Dynamically Weighted Federated k-Means

  • paper_url: http://arxiv.org/abs/2310.14858
  • repo_url: None
  • paper_authors: Patrick Holzer, Tania Jacob, Shubham Kavane
  • for: 本研究旨在提出一种新的 federated clustering 算法,以解决分布式数据源和不同数据类型所带来的挑战。
  • methods: 该算法 combinates 传统 clustering 技术的优点和 federated learning 的隐私和可扩展性优势,使多个数据所有者可以合作 clustering 本地数据,而无需交换大量信息。
  • results: 我们在多个数据集和数据分布设置下进行了实验,并评估了我们的方法在 clustering 分数、准确率和 v-度量上的性能。结果表明,我们的方法可以与中央化 classical k-means 基线方法匹配表现,并在实际场景中超越现有的 federated clustering 方法。
    Abstract Federated clustering is an important part of the field of federated machine learning, that allows multiple data sources to collaboratively cluster their data while keeping it decentralized and preserving privacy. In this paper, we introduce a novel federated clustering algorithm, named Dynamically Weighted Federated k-means (DWF k-means), to address the challenges posed by distributed data sources and heterogeneous data. Our proposed algorithm combines the benefits of traditional clustering techniques with the privacy and scalability advantages of federated learning. It enables multiple data owners to collaboratively cluster their local data while exchanging minimal information with a central coordinator. The algorithm optimizes the clustering process by adaptively aggregating cluster assignments and centroids from each data source, thereby learning a global clustering solution that reflects the collective knowledge of the entire federated network. We conduct experiments on multiple datasets and data distribution settings to evaluate the performance of our algorithm in terms of clustering score, accuracy, and v-measure. The results demonstrate that our approach can match the performance of the centralized classical k-means baseline, and outperform existing federated clustering methods in realistic scenarios.
    摘要 federated clustering 是Machine learning field中的一个重要部分,它允许多个数据源共同分类他们的数据,而不需要集中化和遗漏隐私。在这篇论文中,我们介绍了一种新的联邦分类算法,名为 Dynamically Weighted Federated k-means (DWF k-means),以解决分布式数据源和不同数据的挑战。我们的提议的算法结合了传统分类技术的优点和联邦学习中的隐私和扩展性优势。它允许多个数据所有者共同分类本地数据,只需要与中央协调器交换最小的信息。算法通过自适应聚合分类分配和中心点,从每个数据源中学习一个全局分类解决方案,这个解决方案反映了整个联邦网络中的共同知识。我们在多个数据集和数据分布设置下进行了实验,以评估我们的方法在分类分数、准确率和v-度量上的性能。结果显示,我们的方法可以与中央化类似的传统k-means基准相匹配,并在实际情况下超过现有的联邦分类方法。
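
    A bare-bones federated k-means round, in which each client shares only per-cluster sums and counts and the server merges them weighted by the counts, can be sketched as follows; this is a generic count-weighted baseline, not the paper's dynamic weighting scheme, and the initialisation is assumed:

```python
import numpy as np

def local_update(X, centroids):
    """One client: assign local points to the global centroids and report only
    per-cluster sums and counts (no raw data leaves the client)."""
    k = len(centroids)
    labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    sums = np.zeros_like(centroids)
    counts = np.zeros(k)
    for j in range(k):
        sums[j] = X[labels == j].sum(axis=0)
        counts[j] = (labels == j).sum()
    return sums, counts

def federated_kmeans(clients, k, rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    centroids = rng.normal(size=(k, clients[0].shape[1]))
    for _ in range(rounds):
        sums, counts = np.zeros_like(centroids), np.zeros(k)
        for X in clients:
            s, c = local_update(X, centroids)
            sums, counts = sums + s, counts + c
        mask = counts > 0
        centroids[mask] = sums[mask] / counts[mask, None]   # count-weighted merge
    return centroids

clients = [np.random.randn(100, 2) + 5, np.random.randn(100, 2) - 5]
print(federated_kmeans(clients, k=2).round(2))
```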

Zero-knowledge Proof Meets Machine Learning in Verifiability: A Survey

  • paper_url: http://arxiv.org/abs/2310.14848
  • repo_url: None
  • paper_authors: Zhibo Xing, Zijian Zhang, Jiamou Liu, Ziang Zhang, Meng Li, Liehuang Zhu, Giovanni Russello
  • for: This paper focuses on the trustworthiness problem of machine learning model computations, specifically in outsourced learning and federated learning settings. It proposes a solution using zero-knowledge proof-based verifiable machine learning (ZKP-VML) technology.
  • methods: The paper analyzes potential verifiability issues in different machine learning scenarios and provides a formal definition of ZKP-VML. It also classifies existing works based on their technical approaches and discusses key challenges and future directions in the field.
  • results: The paper presents a comprehensive survey of ZKP-based VML technology and its applications in machine learning. It provides a detailed analysis of existing works and identifies key challenges and future research directions in the field.
    Abstract With the rapid advancement of artificial intelligence technology, the usage of machine learning models is gradually becoming part of our daily lives. High-quality models rely not only on efficient optimization algorithms but also on the training and learning processes built upon vast amounts of data and computational power. However, in practice, due to various challenges such as limited computational resources and data privacy concerns, users in need of models often cannot train machine learning models locally. This has led them to explore alternative approaches such as outsourced learning and federated learning. While these methods address the feasibility of model training effectively, they introduce concerns about the trustworthiness of the training process since computations are not performed locally. Similarly, there are trustworthiness issues associated with outsourced model inference. These two problems can be summarized as the trustworthiness problem of model computations: How can one verify that the results computed by other participants are derived according to the specified algorithm, model, and input data? To address this challenge, verifiable machine learning (VML) has emerged. This paper presents a comprehensive survey of zero-knowledge proof-based verifiable machine learning (ZKP-VML) technology. We first analyze the potential verifiability issues that may exist in different machine learning scenarios. Subsequently, we provide a formal definition of ZKP-VML. We then conduct a detailed analysis and classification of existing works based on their technical approaches. Finally, we discuss the key challenges and future directions in the field of ZKP-based VML.
    摘要 随着人工智能技术的快速发展,机器学习模型的使用正逐渐成为我们日常生活的一部分。高质量的模型不仅依赖于高效的优化算法,还依赖于基于大量数据和计算能力的训练和学习过程。然而,在实际应用中,由于计算资源有限和数据隐私等问题,需要模型的用户经常无法本地训练机器学习模型,因而转向外包学习和联邦学习等替代方法。虽然这些方法有效地解决了模型训练的可行性问题,但由于计算不在本地执行,它们引入了训练过程可信性的问题;同样,外包模型推理也存在可信性问题。这两个问题可以总结为机器学习计算的可信性问题:如何验证其他参与者计算的结果是根据指定的算法、模型和输入数据得出的?为解决这一挑战,可验证机器学习(VML)应运而生。本文对基于零知识证明的可验证机器学习(ZKP-VML)技术进行了全面综述。我们首先分析了不同机器学习场景中可能存在的可验证性问题,接着给出了ZKP-VML的正式定义,然后根据技术路线对现有工作进行了详细的分析和分类,最后讨论了ZKP-VML领域的关键挑战和未来方向。

ULTRA-DP: Unifying Graph Pre-training with Multi-task Graph Dual Prompt

  • paper_url: http://arxiv.org/abs/2310.14845
  • repo_url: None
  • paper_authors: Mouxiang Chen, Zemin Liu, Chenghao Liu, Jundong Li, Qiheng Mao, Jianling Sun
  • for: 提高 Graph Neural Networks (GNNs) 的表现,增强下游任务的性能。
  • methods: 使用 prompt 机制,将任务标识和位置标识 inject 到 GNNs 中,实现 multi-task graph dual prompt (ULTRA-DP) 框架。
  • results: 对 hybrid pre-training 方法进行了改进,并在不同级别(node-node level和node-group level)进行了融合,提高了表现。广泛的实验表明,我们提出的 ULTRA-DP 可以显著提高 hybrid pre-training 方法的表现,并且具有普适性和可移植性。
    Abstract Recent research has demonstrated the efficacy of pre-training graph neural networks (GNNs) to capture the transferable graph semantics and enhance the performance of various downstream tasks. However, the semantic knowledge learned from pretext tasks might be unrelated to the downstream task, leading to a semantic gap that limits the application of graph pre-training. To reduce this gap, traditional approaches propose hybrid pre-training to combine various pretext tasks together in a multi-task learning fashion and learn multi-grained knowledge, which, however, cannot distinguish tasks and results in some transferable task-specific knowledge distortion by each other. Moreover, most GNNs cannot distinguish nodes located in different parts of the graph, making them fail to learn position-specific knowledge and lead to suboptimal performance. In this work, inspired by the prompt-based tuning in natural language processing, we propose a unified framework for graph hybrid pre-training which injects the task identification and position identification into GNNs through a prompt mechanism, namely multi-task graph dual prompt (ULTRA-DP). Based on this framework, we propose a prompt-based transferability test to find the most relevant pretext task in order to reduce the semantic gap. To implement the hybrid pre-training tasks, beyond the classical edge prediction task (node-node level), we further propose a novel pre-training paradigm based on a group of $k$-nearest neighbors (node-group level). The combination of them across different scales is able to comprehensively express more structural semantics and derive richer multi-grained knowledge. Extensive experiments show that our proposed ULTRA-DP can significantly enhance the performance of hybrid pre-training methods and show the generalizability to other pre-training tasks and backbone architectures.

Sharp error bounds for imbalanced classification: how many examples in the minority class?

  • paper_url: http://arxiv.org/abs/2310.14826
  • repo_url: None
  • paper_authors: Anass Aghbalou, François Portier, Anne Sabourin
  • for: This paper addresses the challenge of imbalanced classification data, specifically when the rare class probability approaches zero.
  • methods: The paper presents two novel contributions: a non-asymptotic fast rate probability bound for constrained balanced empirical risk minimization, and a consistent upper bound for balanced nearest neighbors estimates.
  • results: The findings provide a clearer understanding of the benefits of class-weighting in realistic settings, opening new avenues for further research in this field.
    Abstract When dealing with imbalanced classification data, reweighting the loss function is a standard procedure allowing to equilibrate between the true positive and true negative rates within the risk measure. Despite significant theoretical work in this area, existing results do not adequately address a main challenge within the imbalanced classification framework, which is the negligible size of one class in relation to the full sample size and the need to rescale the risk function by a probability tending to zero. To address this gap, we present two novel contributions in the setting where the rare class probability approaches zero: (1) a non asymptotic fast rate probability bound for constrained balanced empirical risk minimization, and (2) a consistent upper bound for balanced nearest neighbors estimates. Our findings provide a clearer understanding of the benefits of class-weighting in realistic settings, opening new avenues for further research in this field.

Text2Topic: Multi-Label Text Classification System for Efficient Topic Detection in User Generated Content with Zero-Shot Capabilities

  • paper_url: http://arxiv.org/abs/2310.14817
  • repo_url: None
  • paper_authors: Fengjun Wang, Moran Beladev, Ofri Kleinfeld, Elina Frayerman, Tal Shachar, Eran Fainman, Karen Lastmann Assaraf, Sarai Mizrachi, Benjamin Wang
  • for: This paper proposes a new method for multi-label text classification, called Text to Topic (Text2Topic), which is designed for high-performance and scalability.
  • methods: The Text2Topic model uses a Bi-Encoder Transformer architecture that employs concatenation, subtraction, and multiplication of embeddings on both text and topic. The model also supports zero-shot predictions, produces domain-specific text embeddings, and enables production-scale batch-inference with high throughput.
  • results: The final Text2Topic model achieves accurate and comprehensive results compared to state-of-the-art baselines, including large language models (LLMs). In a real-world stream processing platform, the model outperforms other models with 92.9% micro mAP and 75.8% macro mAP scores. The paper also conducts extensive ablation studies to validate the effectiveness of the modeling choices.
    Abstract Multi-label text classification is a critical task in the industry. It helps to extract structured information from large amount of textual data. We propose Text to Topic (Text2Topic), which achieves high multi-label classification performance by employing a Bi-Encoder Transformer architecture that utilizes concatenation, subtraction, and multiplication of embeddings on both text and topic. Text2Topic also supports zero-shot predictions, produces domain-specific text embeddings, and enables production-scale batch-inference with high throughput. The final model achieves accurate and comprehensive results compared to state-of-the-art baselines, including large language models (LLMs). In this study, a total of 239 topics are defined, and around 1.6 million text-topic pairs annotations (in which 200K are positive) are collected on approximately 120K texts from 3 main data sources on Booking.com. The data is collected with optimized smart sampling and partial labeling. The final Text2Topic model is deployed on a real-world stream processing platform, and it outperforms other models with 92.9% micro mAP, as well as a 75.8% macro mAP score. We summarize the modeling choices which are extensively tested through ablation studies, and share detailed in-production decision-making steps.
    摘要 多个标签文本分类是产业中的关键任务,它可以从大量文本数据中提取结构化信息。我们提出了文本到话题(Text2Topic),它利用BI-EncoderTransformer架构,通过 concatenation、减法和乘法操作,实现高精度多标签分类。Text2Topic还支持零shot预测、生成域pecific文本嵌入和高效批处理。最终模型实现了比state-of-the-art基elines更高的精度和完整性。在这个研究中,总共定义了239个话题,并收集了约1.6万个文本-话题对(其中200000是正例)的注解,来自3个主要数据源,包括Booking.com。数据采集使用优化的聪明采集和半标注。最终的Text2Topic模型在实际世界流处理平台上部署,并与其他模型相比,实现了92.9%微MAP和75.8%macro MAP的高性能。我们系统地测试了模型选择,并在文章中分享了生产环境中的决策过程。
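
    The concatenate / subtract / multiply interaction between text and topic embeddings can be made concrete with a small scoring head; the embedding dimension and head architecture here are assumptions, and the embeddings would come from a shared sentence encoder rather than random noise:

```python
import torch
import torch.nn as nn

class BiEncoderMatcher(nn.Module):
    """Toy bi-encoder head: score a (text, topic) pair from the concatenation,
    difference and element-wise product of their embeddings."""
    def __init__(self, dim=384):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(4 * dim, 256), nn.ReLU(),
                                  nn.Linear(256, 1))

    def forward(self, text_emb, topic_emb):
        feats = torch.cat([text_emb, topic_emb,
                           text_emb - topic_emb,
                           text_emb * topic_emb], dim=-1)
        return torch.sigmoid(self.head(feats)).squeeze(-1)   # P(topic applies)

text_emb, topic_emb = torch.randn(8, 384), torch.randn(8, 384)
print(BiEncoderMatcher()(text_emb, topic_emb).shape)          # torch.Size([8])
```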

Learning spatio-temporal patterns with Neural Cellular Automata

  • paper_url: http://arxiv.org/abs/2310.14809
  • repo_url: https://github.com/alexdr1998/nca
  • paper_authors: Alex D. Richardson, Tibor Antal, Richard A. Blythe, Linus J. Schumacher
  • for: The paper is written to explore the use of Neural Cellular Automata (NCA) for learning complex dynamics in systems, particularly in the context of biological pattern formation.
  • methods: The paper uses NCA, a combination of machine learning and mechanistic modelling, to learn the underlying local rules that govern large-scale dynamic emergent behaviors from time series of images and PDE trajectories.
  • results: The paper demonstrates that NCA can capture both transient and stable structures within the same system, and can generalize well beyond its training data. The paper also explores the effects of associated hyperparameters on model performance and stability, and shows how to constrain NCA to respect given symmetries.
    Abstract Neural Cellular Automata (NCA) are a powerful combination of machine learning and mechanistic modelling. We train NCA to learn complex dynamics from time series of images and PDE trajectories. Our method is designed to identify underlying local rules that govern large scale dynamic emergent behaviours. Previous work on NCA focuses on learning rules that give stationary emergent structures. We extend NCA to capture both transient and stable structures within the same system, as well as learning rules that capture the dynamics of Turing pattern formation in nonlinear Partial Differential Equations (PDEs). We demonstrate that NCA can generalise very well beyond their PDE training data, we show how to constrain NCA to respect given symmetries, and we explore the effects of associated hyperparameters on model performance and stability. Being able to learn arbitrary dynamics gives NCA great potential as a data driven modelling framework, especially for modelling biological pattern formation.

Mid-Long Term Daily Electricity Consumption Forecasting Based on Piecewise Linear Regression and Dilated Causal CNN

  • paper_url: http://arxiv.org/abs/2310.15204
  • repo_url: None
  • paper_authors: Zhou Lan, Ben Liu, Yi Feng, Danhuang Dong, Peng Zhang
  • for: 预测每天的电力消耗
  • methods: 使用分割式线性回归和扩展 causal CNN 进行预测
  • results: 比现有方法更高的预测精度
    Abstract Daily electricity consumption forecasting is a classical problem. Existing forecasting algorithms tend to have decreased accuracy on special dates like holidays. This study decomposes the daily electricity consumption series into three components: trend, seasonal, and residual, and constructs a two-stage prediction method using piecewise linear regression as a filter and Dilated Causal CNN as a predictor. The specific steps involve setting breakpoints on the time axis and fitting the piecewise linear regression model with one-hot encoded information such as month, weekday, and holidays. For the challenging prediction of the Spring Festival, distance is introduced as a variable using a third-degree polynomial form in the model. The residual sequence obtained in the previous step is modeled using Dilated Causal CNN, and the final prediction of daily electricity consumption is the sum of the two-stage predictions. Experimental results demonstrate that this method achieves higher accuracy compared to existing approaches.
    摘要 每日电力消耗预测是一个经典的问题。现有的预测算法通常在特殊的日子 like 假期和节假日上减少准确性。这种研究将每日电力消耗序列分解成三个组件:趋势、季节性和差异,并构建了一种两阶段预测方法,使用 piecwise 线性回归作为筛选器和扩展 causal CNN 作为预测器。具体步骤包括在时间轴上设置分支点,并使用一个简单的一频率编码的月、周日和假期信息来适应 piecwise 线性回归模型。为了解决春节的难预测问题,在模型中引入了距离变量,使用第三度多项式形式。最后,通过将差异序列模型化为 Dilated Causal CNN,并将两个阶段预测结果相加,得到了每日电力消耗的最终预测结果。实验结果表明,这种方法与现有方法相比,具有更高的准确性。
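
    The second-stage predictor relies on dilated causal convolutions, a standard building block; a minimal PyTorch version is sketched below with placeholder channel counts and depth rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class DilatedCausalBlock(nn.Module):
    """Causal 1-D convolution: left-pad by (kernel_size - 1) * dilation so the
    output at time t depends only on inputs up to time t."""
    def __init__(self, channels, kernel_size=2, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                              # x: (batch, channels, time)
        return torch.relu(self.conv(nn.functional.pad(x, (self.pad, 0))))

# stacking blocks with dilations 1, 2, 4, 8 covers a long history cheaply
net = nn.Sequential(*[DilatedCausalBlock(16, dilation=2 ** i) for i in range(4)])
residual = torch.randn(32, 16, 90)                     # e.g. 90 days of residual features
print(net(residual).shape)                             # torch.Size([32, 16, 90])
```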

Principled Approaches for Learning to Defer with Multiple Experts

  • paper_url: http://arxiv.org/abs/2310.14774
  • repo_url: None
  • paper_authors: Anqi Mao, Mehryar Mohri, Yutao Zhong
  • for: 学习延迟多 экспер特性问题
  • methods: 引入新的协助函数损失家族,同时学习预测和延迟函数
  • results: 证明新损失函数具有强$H$-一致性约束,并在几个实践中给出了明确的保证。
    Abstract We present a study of surrogate losses and algorithms for the general problem of learning to defer with multiple experts. We first introduce a new family of surrogate losses specifically tailored for the multiple-expert setting, where the prediction and deferral functions are learned simultaneously. We then prove that these surrogate losses benefit from strong $H$-consistency bounds. We illustrate the application of our analysis through several examples of practical surrogate losses, for which we give explicit guarantees. These loss functions readily lead to the design of new learning to defer algorithms based on their minimization. While the main focus of this work is a theoretical analysis, we also report the results of several experiments on SVHN and CIFAR-10 datasets.
    摘要 我团队发表了一篇关于代表函数和算法的总体研究,旨在学习多个专家之间的延迟学习问题。我们首先介绍了一个新的多专家设置下的代表函数家族,同时学习预测和延迟函数。然后,我们证明了这些代表函数受益于强$H$-一致性 bound。我们通过一些实际的代表函数例子,给出了Explicit guarantees。这些损失函数直接导致了基于其最小化的学习延迟算法的设计。虽然我们的主要关注点是理论分析,但我们还在SVHN和CIFAR-10 datasets上进行了一些实验。

Predictor-Rejector Multi-Class Abstention: Theoretical Analysis and Algorithms

  • paper_url: http://arxiv.org/abs/2310.14772
  • repo_url: None
  • paper_authors: Anqi Mao, Mehryar Mohri, Yutao Zhong
  • for: 本研究目的是在多类分类 Setting 中学习忽略学习。
  • methods: 本文提出了一系列新的理论和算法结果,用于解决这种学习问题。包括新的 surrogate 损失函数家族,以及对这些损失函数的非偏极 consistency 保证。
  • results: 实验结果表明,使用我们提出的新 surrogate 损失函数家族和两stage 算法可以得到remarkable的性能提升,比现有的算法更高。
    Abstract We study the key framework of learning with abstention in the multi-class classification setting. In this setting, the learner can choose to abstain from making a prediction with some pre-defined cost. We present a series of new theoretical and algorithmic results for this learning problem in the predictor-rejector framework. We introduce several new families of surrogate losses for which we prove strong non-asymptotic and hypothesis set-specific consistency guarantees, thereby resolving positively two existing open questions. These guarantees provide upper bounds on the estimation error of the abstention loss function in terms of that of the surrogate loss. We analyze both a single-stage setting where the predictor and rejector are learned simultaneously and a two-stage setting crucial in applications, where the predictor is learned in a first stage using a standard surrogate loss such as cross-entropy. These guarantees suggest new multi-class abstention algorithms based on minimizing these surrogate losses. We also report the results of extensive experiments comparing these algorithms to the current state-of-the-art algorithms on CIFAR-10, CIFAR-100 and SVHN datasets. Our results demonstrate empirically the benefit of our new surrogate losses and show the remarkable performance of our broadly applicable two-stage abstention algorithm.
    摘要 我们研究多类别学习中的决策弃权框架。在这种设定下,学习者可以选择不预测某些预定的成本。我们在predictor-rejector框架下提供了一系列新的理论和算法结果,用于解决这个学习问题。我们引入了多种新的代理损函数,并证明了这些损函数对于不同的假设集拥有强不偏极和特定的一致性保证,因此解决了两个现有的开问。这些保证提供了对决策损函数的估计误差的上限,以及这些损函数对决策损函数的预测性。我们分析了单阶段设定和两阶段设定,其中在第一阶段使用标准的替身损函数,如十字积分损函数。这些保证和实验结果表明,使用我们的新代理损函数可以提高多类别决策的性能。我们还报告了在CIFAR-10、CIFAR-100和SVHN数据集上的实验结果,并证明了我们的算法在这些数据集上的Remarkable性能。

Theoretically Grounded Loss Functions and Algorithms for Score-Based Multi-Class Abstention

  • paper_url: http://arxiv.org/abs/2310.14770
  • repo_url: None
  • paper_authors: Anqi Mao, Mehryar Mohri, Yutao Zhong
  • for: This paper is written for learning with abstention in the multi-class classification setting, with a focus on score-based formulations and surrogate losses.
  • methods: The paper introduces new families of surrogate losses for the abstention loss function, including state-of-the-art surrogate losses in the single-stage setting and a novel family of loss functions in the two-stage setting. The authors also prove strong consistency guarantees for these surrogate losses.
  • results: The paper experiments on CIFAR-10, CIFAR-100, and SVHN datasets and shows the practical significance of the new surrogate losses and two-stage abstention algorithms. The results also demonstrate that the relative performance of state-of-the-art score-based surrogate losses can vary across datasets.
    Abstract Learning with abstention is a key scenario where the learner can abstain from making a prediction at some cost. In this paper, we analyze the score-based formulation of learning with abstention in the multi-class classification setting. We introduce new families of surrogate losses for the abstention loss function, which include the state-of-the-art surrogate losses in the single-stage setting and a novel family of loss functions in the two-stage setting. We prove strong non-asymptotic and hypothesis set-specific consistency guarantees for these surrogate losses, which upper-bound the estimation error of the abstention loss function in terms of the estimation error of the surrogate loss. Our bounds can help compare different score-based surrogates and guide the design of novel abstention algorithms by minimizing the proposed surrogate losses. We experimentally evaluate our new algorithms on CIFAR-10, CIFAR-100, and SVHN datasets and the practical significance of our new surrogate losses and two-stage abstention algorithms. Our results also show that the relative performance of the state-of-the-art score-based surrogate losses can vary across datasets.
    摘要 学习弃权是一个关键场景,在这种场景下学习者可以弃权不预测。在这篇论文中,我们分析了在多类分类设定下的分数基式学习弃权场景。我们引入了新的家族surrogate损失函数,包括单 Stage设定下的state-of-the-art surrogate损失函数和两 Stage设定下的一个新家族损失函数。我们证明了这些surrogate损失函数的强非尺度和假设集特定的一致性保证,这些保证可以Upper-bound abstention损失函数的估计误差,基于surrogate损失函数的估计误差。我们的界限可以帮助比较不同的分数基式surrogate,并 guideline novel abstention算法的设计。我们在CIFAR-10、CIFAR-100和SVHN数据集上进行实验,并证明了我们的新surrogate损失函数和两Stage abstention算法的实际意义。我们的结果还表明,不同数据集上state-of-the-art分数基式surrogate损失函数的相对性能可能会变化。

Improved K-mer Based Prediction of Protein-Protein Interactions With Chaos Game Representation, Deep Learning and Reduced Representation Bias

  • paper_url: http://arxiv.org/abs/2310.14764
  • repo_url: None
  • paper_authors: Ruth Veevers, Dan MacLean
  • for: This paper aims to address the problem of representation bias in machine learning models used to predict protein-protein interactions, by extracting unique pairs from an interaction dataset and generating non-redundant paired data.
  • methods: The authors use a method for extracting unique pairs from an interaction dataset, which involves clustering protein pairs based on their similarity and then removing any pairs that are not truly unique. They also use a convolutional neural network (CNN) model to learn and predict interactions from Chaos Game Representations of proteins’ coding genes.
  • results: The authors applied their method to datasets containing Arabidopsis thaliana and pathogen effector interactions, and demonstrated that their approach can generate non-redundant paired data that can be used to train machine learning models to predict protein-protein interactions with high accuracy.
    Abstract Protein-protein interactions drive many biological processes, including the detection of phytopathogens by plants' R-Proteins and cell surface receptors. Many machine learning studies have attempted to predict protein-protein interactions but performance is highly dependent on training data; models have been shown to accurately predict interactions when the proteins involved are included in the training data, but achieve consistently poorer results when applied to previously unseen proteins. In addition, models that are trained using proteins that take part in multiple interactions can suffer from representation bias, where predictions are driven not by learned biological features but by learning of the structure of the interaction dataset. We present a method for extracting unique pairs from an interaction dataset, generating non-redundant paired data for unbiased machine learning. After applying the method to datasets containing _Arabidopsis thaliana_ and pathogen effector interations, we developed a convolutional neural network model capable of learning and predicting interactions from Chaos Game Representations of proteins' coding genes.
    摘要 生物过程中,蛋白质-蛋白质交互作用扮演着重要角色,包括植物R-蛋白和表面受体检测病原体。许多机器学习研究尝试预测蛋白质-蛋白质交互,但是模型性能高度取决于训练数据。当包含交互中的蛋白质时,模型可以准确预测交互,但是在前期未见蛋白质时,模型性能往往较差。此外,使用参与多个交互的蛋白质进行训练可能会导致代表性偏见,其中预测不再是基于学习生物特征,而是学习交互数据集的结构。我们提出了一种方法,用于从交互数据集中提取独特对,生成非重复的对数据,以避免代表性偏见。在应用这种方法于包含阿拉伯豆植物和病原体效应蛋白交互的数据集上,我们开发了一种基于混沌游戏表示的蛋白质编码基因 convolutional neural network 模型,可以学习和预测蛋白质交互。
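
    The Chaos Game Representation fed to the CNN is simple to compute; a small sketch follows, noting that the corner assignment and resolution k are conventional choices that may differ from the paper's:

```python
import numpy as np

def chaos_game_representation(seq, k=6):
    """Map a DNA sequence to a 2^k x 2^k frequency image via the chaos game:
    each base pulls the current point halfway towards its corner of the unit square."""
    corners = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
    n = 2 ** k
    grid = np.zeros((n, n))
    x, y = 0.5, 0.5
    for base in seq.upper():
        if base not in corners:
            continue                        # skip ambiguous bases such as N
        cx, cy = corners[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        grid[min(int(y * n), n - 1), min(int(x * n), n - 1)] += 1
    return grid / max(grid.sum(), 1)        # normalised image, ready for a CNN

cgr = chaos_game_representation("ATGGCGTACGTTAGC" * 20)
print(cgr.shape, cgr.sum())                 # (64, 64) 1.0
```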

Externally Valid Policy Evaluation Combining Trial and Observational Data

  • paper_url: http://arxiv.org/abs/2310.14763
  • repo_url: None
  • paper_authors: Sofia Ek, Dave Zachariah
  • for: The paper is written to evaluate the effectiveness of decision policies using trial data.
  • methods: The paper uses trial data and additional covariate data from the target population to model the sampling of individuals in the trial study. The method is nonparametric and can handle any specified range of model miscalibrations.
  • results: The method provides certifiably valid trial-based policy evaluations, even with finite samples. The results are illustrated using both simulated and real data.
    Abstract Randomized trials are widely considered as the gold standard for evaluating the effects of decision policies. Trial data is, however, drawn from a population which may differ from the intended target population and this raises a problem of external validity (aka. generalizability). In this paper we seek to use trial data to draw valid inferences about the outcome of a policy on the target population. Additional covariate data from the target population is used to model the sampling of individuals in the trial study. We develop a method that yields certifiably valid trial-based policy evaluations under any specified range of model miscalibrations. The method is nonparametric and the validity is assured even with finite samples. The certified policy evaluations are illustrated using both simulated and real data.
    摘要 随机对照试验被广泛认为是评估决策政策的 золо标准。试验数据来自一个可能与目标人口 Population 不同的人口,这引起了外部有效性(即泛化)的问题。在这篇论文中,我们想使用试验数据来得出有效的政策影响 outcome 的引导。我们使用来自目标人口的额外 covariate 数据来模拟试验中的个体采样。我们开发了一种可以在任何指定的模型偏差范围内获得有效的试验基本政策评估方法。这种方法是非 Parametric 的,并且在 finite samples 下保证有效。我们通过使用实际数据和 simulated 数据进行了示例。

Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules

  • paper_url: http://arxiv.org/abs/2310.14753
  • repo_url: https://github.com/syr-cn/simsgt
  • paper_authors: Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua
  • for: 本研究主要针对于分自supervised molecular graph representation learning领域。
  • methods: 本研究使用的方法包括Masked Graph Modeling(MGM),其中包括三个关键组件:图 tokenizer、图遮盖和图自动编码器。
  • results: 研究发现,使用subgraph级别的tokenizer和具有表达能力的解码器可以大幅提高图自动编码器的表示学习。此外,提出了一种新的MGM方法SimSGT,其中包括一种简单的GNN基于的tokenizer(SGT)和一种有效的解码策略。经验 validate that our method outperforms the existing molecule self-supervised learning methods。
    Abstract Masked graph modeling excels in the self-supervised representation learning of molecular graphs. Scrutinizing previous studies, we can reveal a common scheme consisting of three key components: (1) graph tokenizer, which breaks a molecular graph into smaller fragments (i.e., subgraphs) and converts them into tokens; (2) graph masking, which corrupts the graph with masks; (3) graph autoencoder, which first applies an encoder on the masked graph to generate the representations, and then employs a decoder on the representations to recover the tokens of the original graph. However, the previous MGM studies focus extensively on graph masking and encoder, while there is limited understanding of tokenizer and decoder. To bridge the gap, we first summarize popular molecule tokenizers at the granularity of node, edge, motif, and Graph Neural Networks (GNNs), and then examine their roles as the MGM's reconstruction targets. Further, we explore the potential of adopting an expressive decoder in MGM. Our results show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning. Finally, we propose a novel MGM method SimSGT, featuring a Simple GNN-based Tokenizer (SGT) and an effective decoding strategy. We empirically validate that our method outperforms the existing molecule self-supervised learning methods. Our codes and checkpoints are available at https://github.com/syr-cn/SimSGT.
    摘要 掩码图建模在分子图的自监督表示学习中表现出色。从前期研究中可以找到一种通用的 schema,包括三个关键组件:(1)图标记化,将分子图分解成更小的片段(即子图)并将其转换为token;(2)图遮盖,对图进行遮盖;(3)图自动编码器,首先将遮盖后的图进行编码,然后使用解码器将编码后的表示重建为原始图的token。但是,之前的MGM研究强调了图遮盖和编码器,而忽略了标记化器和解码器。为了填补这个差距,我们首先总结了流行的分子标记化器,并分析它们在MGM的重建目标中的角色。然后,我们探索采用表达力强的解码器可能有的优势。我们的结果表明,使用子图级别的标记化器和具有重掩码解码的解码器可以对编码器的表示学习产生重要影响。最后,我们提出了一种新的MGM方法SimSGT,其特点是使用简单的基于GNN的标记化器(SGT)和有效的解码策略。我们通过实验证明,我们的方法超过了现有的分子自监督学习方法。我们的代码和检查点可以在 https://github.com/syr-cn/SimSGT 上获取。

Efficient and Interpretable Bandit Algorithms

  • paper_url: http://arxiv.org/abs/2310.14751
  • repo_url: None
  • paper_authors: Subhojyoti Mukherjee, Ruihao Zhu, Branislav Kveton
  • for: 本文旨在设计一种可解释的帮助算法,以提高现代机器学习中的解释性。
  • methods: 本文提出了一种新的不确定性损失度量,用于衡量算法的可解释性。同时,本文还提出了一种基于约束最优设计的帮助算法,称为CODE,它可以高效地实现可解释性。
  • results: 实验表明,CODE可以在多臂和线性帮助上实现近似最优的停损 bound,而且在实际问题上也有优于其他可解释的设计。此外,CODE还可以在不同的约束下进行可靠地推荐。
    Abstract Motivated by the importance of explainability in modern machine learning, we design bandit algorithms that are \emph{efficient} and \emph{interpretable}. A bandit algorithm is interpretable if it explores with the objective of reducing uncertainty in the unknown model parameter. To quantify the interpretability, we introduce a novel metric of \textit{uncertainty loss}, which compares the rate of the uncertainty reduction to the theoretical optimum. We propose CODE, a bandit algorithm based on a \textbf{C}onstrained \textbf{O}ptimal \textbf{DE}sign, that is interpretable and maximally reduces the uncertainty. The key idea in \code is to explore among all plausible actions, determined by a statistical constraint, to achieve interpretability. We implement CODE efficiently in both multi-armed and linear bandits and derive near-optimal regret bounds by leveraging the optimality criteria of the approximate optimal design. CODE can be also viewed as removing phases in conventional phased elimination, which makes it more practical and general. We demonstrate the advantage of \code by numerical experiments on both synthetic and real-world problems. CODE outperforms other state-of-the-art interpretable designs while matching the performance of popular but uninterpretable designs, such as upper confidence bound algorithms.
    摘要 受现代机器学习中解释性的重要性启发,我们设计了高效和可解释的bandit算法。一个可解释的bandit算法是如果它在不确定模型参数下进行探索,以减少不确定性。为量化解释性,我们引入了一个新的不确定损失度量,该度量比较探索动作对于理论最优的不确定度减少率。我们提议了一种基于《C》онstrained《O》ptimal《D》esign的CODE算法,它是可解释的并尽可能减少不确定性。CODE算法的关键思想是在所有可能的动作中探索,以实现可解释性。我们有效地实现了CODE算法在多重臂和线性bandit中,并 derive了近似优质 regret bound。CODE算法可以被视为从 conventual phased elimination中除去阶段,这使得它更加实用和通用。我们通过数字实验表示,CODE算法在 synthetic 和实际问题上具有优势,并与流行但不可解释的设计匹配性。

A Comparative Study of Portfolio Optimization Methods for the Indian Stock Market

  • paper_url: http://arxiv.org/abs/2310.14748
  • repo_url: None
  • paper_authors: Jaydip Sen, Arup Dasgupta, Partha Pratim Sengupta, Sayantani Roy Choudhury
  • for: 研究印度股市三种 portefolio优化方法(MVP、HRP、HERC)的比较研究,特别是从15个领域上选择的股票。
  • methods: 使用MVP、HRP、HERC三种 portefolio优化方法,基于2019年7月1日至2022年6月30日的股票价格,为每个领域设计三个 portefolio。
  • results: 对每个领域的三个 portefolio进行测试,使用三个绩效指标(累积回报、年度波动率、希克率)评估 portefolio的表现,并为每个领域选择最高累积回报、最低波动率和最大希克率的 portefolio。
    Abstract This chapter presents a comparative study of the three portfolio optimization methods, MVP, HRP, and HERC, on the Indian stock market, particularly focusing on the stocks chosen from 15 sectors listed on the National Stock Exchange of India. The top stocks of each cluster are identified based on their free-float market capitalization from the report of the NSE published on July 1, 2022 (NSE Website). For each sector, three portfolios are designed on stock prices from July 1, 2019, to June 30, 2022, following three portfolio optimization approaches. The portfolios are tested over the period from July 1, 2022, to June 30, 2023. For the evaluation of the performances of the portfolios, three metrics are used. These three metrics are cumulative returns, annual volatilities, and Sharpe ratios. For each sector, the portfolios that yield the highest cumulative return, the lowest volatility, and the maximum Sharpe Ratio over the training and the test periods are identified.
    摘要 本章对印度股市上的三种投资组合优化方法(MVP、HRP、HERC)进行了比较研究,重点关注从印度国家证券交易所(NSE)15个行业中选取的股票。各行业的头部股票根据NSE于2022年7月1日发布的报告中的自由流通市值确定(NSE Website)。对每个行业,基于2019年7月1日至2022年6月30日的股价,按三种组合优化方法分别构建三个投资组合,并在2022年7月1日至2023年6月30日期间进行测试。组合绩效采用三个指标评估:累积收益、年化波动率和夏普比率。对每个行业,分别识别出在训练期和测试期内累积收益最高、波动率最低以及夏普比率最大的组合。
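
    The three evaluation metrics are standard; a small helper computing them for a fixed-weight portfolio from daily prices might look like the sketch below, where the ticker names and simulated prices are purely illustrative:

```python
import numpy as np
import pandas as pd

def evaluate_portfolio(prices, weights, risk_free=0.0, periods_per_year=252):
    """Cumulative return, annualised volatility and Sharpe ratio for a
    fixed-weight portfolio, computed from daily close prices."""
    daily = prices.pct_change().dropna() @ weights
    cumulative_return = (1 + daily).prod() - 1
    annual_vol = daily.std() * np.sqrt(periods_per_year)
    sharpe = (daily.mean() * periods_per_year - risk_free) / annual_vol
    return cumulative_return, annual_vol, sharpe

prices = pd.DataFrame(100 * np.cumprod(1 + 0.001 * np.random.randn(250, 3), axis=0),
                      columns=["STOCK_A", "STOCK_B", "STOCK_C"])
print(evaluate_portfolio(prices, np.array([1 / 3, 1 / 3, 1 / 3])))
```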

Extended Deep Adaptive Input Normalization for Preprocessing Time Series Data for Neural Networks

  • paper_url: http://arxiv.org/abs/2310.14720
  • repo_url: https://github.com/marcusgh/edain_paper
  • paper_authors: Marcus A. K. September, Francesco Sanna Passino, Leonie Goldmann, Anton Hinel
  • for: This paper focuses on addressing the challenges of preprocessing time series data for machine learning tasks, specifically using deep neural networks.
  • methods: The proposed EDAIN (Extended Deep Adaptive Input Normalization) layer is an adaptive neural layer that learns to normalize irregular time series data in an end-to-end fashion using back-propagation.
  • results: The EDAIN layer outperforms conventional normalization methods and existing adaptive time series preprocessing layers in experiments using synthetic data, a credit default prediction dataset, and a large-scale limit order book benchmark dataset.
    Abstract Data preprocessing is a crucial part of any machine learning pipeline, and it can have a significant impact on both performance and training efficiency. This is especially evident when using deep neural networks for time series prediction and classification: real-world time series data often exhibit irregularities such as multi-modality, skewness and outliers, and the model performance can degrade rapidly if these characteristics are not adequately addressed. In this work, we propose the EDAIN (Extended Deep Adaptive Input Normalization) layer, a novel adaptive neural layer that learns how to appropriately normalize irregular time series data for a given task in an end-to-end fashion, instead of using a fixed normalization scheme. This is achieved by optimizing its unknown parameters simultaneously with the deep neural network using back-propagation. Our experiments, conducted using synthetic data, a credit default prediction dataset, and a large-scale limit order book benchmark dataset, demonstrate the superior performance of the EDAIN layer when compared to conventional normalization methods and existing adaptive time series preprocessing layers.
    摘要 “数据预处理是机器学习管道中的关键环节,它对性能和训练效率都有很大的影响,尤其是在使用深度神经网络进行时间序列预测和分类时。实际世界的时间序列数据经常具有不规则性,如多模性、偏度和异常值,如果不足以处理这些特征,模型的性能就会快速下降。在这项工作中,我们提议使用EDAIN(扩展深度适应输入 нор化)层,这是一种可以在端到端的方式下适应时间序列数据的不规则性的新型神经层。通过与深度神经网络一起使用反射传播来优化其未知参数,EDAIN层可以在不使用固定 нор化方案的情况下,为给定任务适应地normalize时间序列数据。我们使用了 sintetic数据、一个信用风险预测数据集和一个大规模的 limit order book 数据集进行实验,结果表明,与常规normalization方法和现有的时间序列预处理层相比,EDAIN层在性能上表现出色。”
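
    To make the idea of an input-normalisation layer trained end-to-end concrete, a deliberately simplified sketch is given below; the actual EDAIN layer adds further adaptive stages for outliers and skew, so this shows only the basic pattern:

```python
import torch
import torch.nn as nn

class AdaptiveInputNorm(nn.Module):
    """Simplified adaptive normalisation: a learnable shift and scale per input
    feature, trained jointly with the downstream network by back-propagation."""
    def __init__(self, n_features):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(n_features))
        self.log_scale = nn.Parameter(torch.zeros(n_features))

    def forward(self, x):                            # x: (batch, time, features)
        return (x - self.shift) * torch.exp(-self.log_scale)

model = nn.Sequential(AdaptiveInputNorm(8), nn.Flatten(),
                      nn.Linear(8 * 30, 1))          # toy downstream predictor
x = torch.randn(16, 30, 8) * 50 + 3                  # irregularly scaled input
loss = model(x).pow(2).mean()
loss.backward()                                      # the normalisation parameters receive gradients too
```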

A Hybrid GNN approach for predicting node data for 3D meshes

  • paper_url: http://arxiv.org/abs/2310.14707
  • repo_url: None
  • paper_authors: Shwetha Salimath, Francesca Bugiotti, Frederic Magoules
  • for: 本研究旨在提高金属锻造中die的制造效率,通过开发一个hybrid方法,融合Finite Element方法和深度学习模型,实现更快速的预测和新数据生成。
  • methods: 本研究使用了一个Hybrid方法,包括Graph Convolutional Neural Network(GCNN)和Finite Element方法。GCNN模型可以将网格或点 cloud结构转换为深度学习模型,以便进行预测和数据生成。
  • results: 本研究的结果显示,Hybrid方法可以实现更高的预测精度和更快速的预测时间,比较前的PointNet和简单的graph neural network模型。新的模型可以更好地处理网格或点 cloud结构,并且可以实现更好的数据生成和预测。
    Abstract Metal forging is used to manufacture dies. We require the best set of input parameters for the process to be efficient. Currently, we predict the best parameters using the finite element method by generating simulations for the different initial conditions, which is a time-consuming process. In this paper, introduce a hybrid approach that helps in processing and generating new data simulations using a surrogate graph neural network model based on graph convolutions, having a cheaper time cost. We also introduce a hybrid approach that helps in processing and generating new data simulations using the model. Given a dataset representing meshes, our focus is on the conversion of the available information into a graph or point cloud structure. This new representation enables deep learning. The predicted result is similar, with a low error when compared to that produced using the finite element method. The new models have outperformed existing PointNet and simple graph neural network models when applied to produce the simulations.
    摘要 钢铁锻造被用于生产锻造模具。我们需要最佳的输入参数来使 processefficient。目前,我们使用finite element方法预测最佳参数,这是一个时间消耗的过程。在这篇论文中,我们提出了一种混合方法,即基于图 convolutional neural network(Graph CNN)模型,可以减少时间成本。我们还提出了一种混合方法,可以处理和生成新的数据 simulateonusing这种模型。给定一个表示 mesh 的数据集,我们的关注点在于将可用的信息转换为图或点云结构。这种新的表示允许深度学习。预测结果与finite element方法生成的结果相似,错误低。新的模型在应用于生成 simulateon时表现出色,超越了点云网络和简单的图神经网络模型。

Federated learning compression designed for lightweight communications

  • paper_url: http://arxiv.org/abs/2310.14693
  • repo_url: https://github.com/lgrativol/fl_exps
  • paper_authors: Lucas Grativol Ribeiro, Mathieu Leonardon, Guillaume Muller, Virginie Fresse, Matthieu Arzel
  • for: 这篇论文主要关注在 Federated Learning (FL) 中的通信成本问题,特别是在军事和医疗领域中, client 数据无法被共享或传送到云端服务器。
  • methods: 本文使用了各种压缩技术,包括剪裁和量化,以减轻 client 端的网络通信负载。
  • results: 本文显示,使用压缩技术可以将 messages 压缩到 50%,并且仅导致精度损失小于 1%,与现有技术竞争。
    Abstract Federated Learning (FL) is a promising distributed method for edge-level machine learning, particularly for privacysensitive applications such as those in military and medical domains, where client data cannot be shared or transferred to a cloud computing server. In many use-cases, communication cost is a major challenge in FL due to its natural intensive network usage. Client devices, such as smartphones or Internet of Things (IoT) nodes, have limited resources in terms of energy, computation, and memory. To address these hardware constraints, lightweight models and compression techniques such as pruning and quantization are commonly adopted in centralised paradigms. In this paper, we investigate the impact of compression techniques on FL for a typical image classification task. Going further, we demonstrate that a straightforward method can compresses messages up to 50% while having less than 1% of accuracy loss, competing with state-of-the-art techniques.
    摘要 联邦学习(FL)是一种有前途的边缘端分布式机器学习方法,尤其适用于军事和医疗等隐私敏感的应用场景,这些场景中客户端数据无法共享或传输到云计算服务器。在许多应用中,通信成本是FL的主要挑战,因为其天然需要大量网络传输。智能手机或物联网(IoT)节点等客户端设备在能源、计算和内存方面资源有限。为应对这些硬件限制,集中式范式中通常采用轻量化模型以及剪枝和量化等压缩技术。本文研究压缩技术对FL在典型图像分类任务中的影响,并进一步表明,一种简单直接的方法可以将消息压缩多达50%,而精度损失不到1%,可与最新技术相竞争。
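
    A toy version of the kind of client-side update compression discussed (magnitude pruning followed by uniform quantisation) is sketched below; the exact scheme, keep ratio and bit-width used in the paper may differ:

```python
import numpy as np

def compress_update(update, keep_ratio=0.5, n_bits=8):
    """Keep only the largest-magnitude fraction of the weights, then quantise
    the survivors uniformly to n_bits signed integers plus one float scale."""
    flat = update.ravel().copy()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(np.abs(flat), -k)[-k]
    flat[np.abs(flat) < threshold] = 0.0                     # prune small weights
    scale = np.abs(flat).max() / (2 ** (n_bits - 1) - 1) or 1.0
    codes = np.round(flat / scale).astype(np.int8)           # integer codes to transmit
    return codes.reshape(update.shape), scale

update = np.random.randn(4, 4).astype(np.float32)
codes, scale = compress_update(update)
print(np.abs(update - codes * scale).max())                  # reconstruction error
```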

Population Descent: A Natural-Selection Based Hyper-Parameter Tuning Framework

  • paper_url: http://arxiv.org/abs/2310.14671
  • repo_url: None
  • paper_authors: Abhinav Pomalapally, Bassel El Mabsout, Renato Mansuco
  • for: Hyperparameter optimization
  • methods: Population Descent, a memetic algorithm-based optimization method
  • results: On common benchmark tasks, the adaptive m-elitist selection and normalized-fitness-based randomization method outperforms state-of-the-art algorithms by up to 13%.
    Abstract First-order gradient descent has been the base of the most successful optimization algorithms ever implemented. On supervised learning problems with very high dimensionality, such as neural network optimization, it is almost always the algorithm of choice, mainly due to its memory and computational efficiency. However, it is a classical result in optimization that gradient descent converges to local minima on non-convex functions. Even more importantly, in certain high-dimensional cases, escaping the plateaus of large saddle points becomes intractable. On the other hand, black-box optimization methods are not sensitive to the local structure of a loss function's landscape but suffer the curse of dimensionality. Instead, memetic algorithms aim to combine the benefits of both. Inspired by this, we present Population Descent, a memetic algorithm focused on hyperparameter optimization. We show that an adaptive m-elitist selection approach combined with a normalized-fitness-based randomization scheme outperforms more complex state-of-the-art algorithms by up to 13% on common benchmark tasks.
    摘要 一阶梯度下降是迄今最成功的优化算法的基础。在监督学习问题上,特别是高维的神经网络优化中,它几乎总是首选方法,主要因为其内存和计算效率。然而,经典优化理论表明,梯度下降在非凸函数上只会收敛到局部极小值;更重要的是,在某些高维情形下,逃离大型鞍点的平台区域变得不可行。相比之下,黑盒优化方法不依赖损失函数景观的局部结构,但受制于维度灾难。模因算法(memetic algorithms)则旨在结合两者的优点。受此启发,我们提出了面向超参数优化的 Population Descent 算法。实验表明,自适应 m-elitist 选择与基于归一化适应度的随机化方案相结合,在常见基准任务上可比更复杂的最新算法性能高出最多 13%。
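
    One schematic reading of m-elitist selection combined with normalised-fitness-based randomisation is sketched below; this is an illustration of the general idea only, every function and constant in it is an assumption of mine, and it is not the authors' released algorithm:

```python
import numpy as np

def population_descent_step(population, losses, m_elite=2, rng=np.random):
    """Keep the m best members, then refill the population by copying elites and
    perturbing them with noise that shrinks as normalised fitness grows."""
    order = np.argsort(losses)                        # lower loss = fitter
    elites = [population[i] for i in order[:m_elite]]
    fitness = 1.0 / (1.0 + np.asarray(losses))
    fitness = fitness / fitness.max()                 # normalised to (0, 1]
    new_pop = list(elites)
    while len(new_pop) < len(population):
        i = rng.choice(m_elite)
        sigma = 1.0 - fitness[order[i]]               # weaker parents jump further
        new_pop.append(elites[i] + rng.normal(scale=sigma + 1e-3,
                                              size=elites[i].shape))
    return new_pop

# toy use: "hyperparameters" are 2-vectors, loss is the distance to an optimum
pop = [np.random.randn(2) for _ in range(8)]
for _ in range(50):
    losses = [float(((p - np.array([0.3, -1.2])) ** 2).sum()) for p in pop]
    pop = population_descent_step(pop, losses)
print(min(losses))
```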

Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy

  • paper_url: http://arxiv.org/abs/2310.14661
  • repo_url: None
  • paper_authors: Yingyu Lin, Yian Ma, Yu-Xiang Wang, Rachel Redberg
  • for: 提供 $\varepsilon$-纯 differential privacy (DP) 保证和不受可能的无限大隐私泄露的 posterior sampling 方法。
  • methods: 使用 exponential mechanism 和 Markov chain Monte Carlo (MCMC) 方法,并将它们结合在一起以减少 $\delta$-approximation error。
  • results: 提出了 Approximate SAample Perturbation (ASAP) 算法,可以在 MCMC 样本上添加 proportional noise,以保证 $\varepsilon$-纯 DP 和 $\delta=0$ pure Gaussian DP;并且证明了该算法可以在 nearly linear-time 内以最佳速率求解 DP-ERM 问题。
    Abstract Posterior sampling, i.e., exponential mechanism to sample from the posterior distribution, provides $\varepsilon$-pure differential privacy (DP) guarantees and does not suffer from potentially unbounded privacy breach introduced by $(\varepsilon,\delta)$-approximate DP. In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo (MCMC), thus re-introducing the unappealing $\delta$-approximation error into the privacy guarantees. To bridge this gap, we propose the Approximate SAample Perturbation (abbr. ASAP) algorithm which perturbs an MCMC sample with noise proportional to its Wasserstein-infinity ($W_\infty$) distance from a reference distribution that satisfies pure DP or pure Gaussian DP (i.e., $\delta=0$). We then leverage a Metropolis-Hastings algorithm to generate the sample and prove that the algorithm converges in W$_\infty$ distance. We show that by combining our new techniques with a careful localization step, we obtain the first nearly linear-time algorithm that achieves the optimal rates in the DP-ERM problem with strongly convex and smooth losses.
    摘要 后采样,即对 posterior distribution 的 exponential mechanism 样本,提供了 $\varepsilon$-纯 differential privacy (DP) 保证和不受可能的无限 privacy breach 引入的 $( \varepsilon, \delta)-$approximate DP。然而,在实践中,需要使用approximate sampling方法,如 Markov chain Monte Carlo (MCMC),从而重新引入 $\delta-$approximation error 到隐私保证中。为 bridging 这个差距,我们提出了 Approximate SAample Perturbation(简称 ASAP)算法,该算法在 MCMC 样本上添加了距离参考分布的 Wasserstein-infinity($W_\infty$) 误差,该分布满足纯 DP 或纯 Gaussian DP(即 $\delta=0$)。然后,我们利用 Metropolis-Hastings 算法生成样本,并证明该算法在 W$_\infty$ 距离上收敛。我们还证明,通过将我们的新技术与精心的本地化步骤相结合,可以获得首个 nearly linear-time 算法,实现 DP-ERM 问题中的优化率。

Predicting Accurate Lagrangian Multipliers for Mixed Integer Linear Programs

  • paper_url: http://arxiv.org/abs/2310.14659
  • repo_url: None
  • paper_authors: Francesco Demelas, Joseph Le Roux, Mathieu Lacroix, Axel Parmentier
  • for: 解决具有困难约束的混合整数线性Program (MILP) 问题。
  • methods: 使用深度学习方法,通过跳过梯度下降,快速优化凹陷函数。
  • results: 可以减少85%的误差,提供高质量的热启动解。
    Abstract Lagrangian relaxation stands among the most efficient approaches for solving a Mixed Integer Linear Programs (MILP) with difficult constraints. Given any duals for these constraints, called Lagrangian Multipliers (LMs), it returns a bound on the optimal value of the MILP, and Lagrangian methods seek the LMs giving the best such bound. But these methods generally rely on iterative algorithms resembling gradient descent to maximize the concave piecewise linear dual function: the computational burden grows quickly with the number of relaxed constraints. We introduce a deep learning approach that bypasses the descent, effectively amortizing the local, per instance, optimization. A probabilistic encoder based on a graph convolutional network computes high-dimensional representations of relaxed constraints in MILP instances. A decoder then turns these representations into LMs. We train the encoder and decoder jointly by directly optimizing the bound obtained from the predicted multipliers. Numerical experiments show that our approach closes up to 85~\% of the gap between the continuous relaxation and the best Lagrangian bound, and provides a high quality warm-start for descent based Lagrangian methods.
    摘要 拉格朗日 relaxation 是各种混合整数线性Programs (MILP) 中最为效率的解决方法之一,它可以给出 MILP 的优化目标值的下界,并且lagrangian方法会寻找提供这个下界的最佳Lagrangian Multipliers (LMs)。然而,这些方法通常需要使用迭代算法,类似于梯度下降,来最大化权重函数的凹陷部分,这会导致计算负担随着约束数量的增加。我们提出了一种基于深度学习的方法,它可以绕过下降,将每个实例的本地优化执行成功填充。我们使用一个基于图 convolutional neural network 的概率编码器计算 MILP 实例中约束的高维表示。然后,一个解码器将这些表示转化为 LMs。我们共同训练编码器和解码器,直接优化预测的多余分数提供的下界,以确保其高质量。数值实验表明,我们的方法可以逼近Continuous relaxation 和最佳 Lagrangian bound 之间的差距,并提供高质量的启动点 для梯度下降基于 Lagrangian 方法。
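
    For context, the bound that the predicted multipliers feed into is the standard Lagrangian dual bound (notation mine): for a MILP $\min\{c^\top x : Ax \le b,\ Dx \le d,\ x \in \mathbb{Z}^p \times \mathbb{R}^q\}$ whose difficult constraints $Dx \le d$ are relaxed with multipliers $\lambda \ge 0$,
    $$\mathcal{L}(\lambda) \;=\; \min_{x \in \mathbb{Z}^p \times \mathbb{R}^q,\; Ax \le b} \; c^\top x + \lambda^\top (Dx - d) \;\le\; \mathrm{OPT},$$
    and Lagrangian methods seek $\max_{\lambda \ge 0} \mathcal{L}(\lambda)$. Rather than ascending towards that maximiser instance by instance, the paper's encoder-decoder predicts $\lambda$ directly.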

Making informed decisions in cutting tool maintenance in milling: A KNN based model agnostic approach

  • paper_url: http://arxiv.org/abs/2310.14629
  • repo_url: None
  • paper_authors: Aditya M. Rahalkar, Om M. Khare, Abhishek D. Patange
  • For: 本研究旨在提出一种基于KNN的白盒模型,以提高工具状况监测系统的可解释性和维护决策。* Methods: 该研究使用了各种机器学习技术进行工具状况监测,并在实验中采集了大量数据。 Decision trees 和 KNN 算法被用进行特征选择和分类。 hyperparameter 优化也进行了以提高模型的性能。* Results: 该研究使用了 KNN 白盒模型,可以帮助制造商更深入了解工具的维护和监测过程,并且可以提高工具状况监测系统的可解释性。
    Abstract In machining processes, monitoring the condition of the tool is a crucial aspect to ensure high productivity and quality of the product. Using different machine learning techniques in Tool Condition Monitoring TCM enables a better analysis of the large amount of data of different signals acquired during the machining processes. The real time force signals encountered during the process were acquired by performing numerous experiments. Different tool wear conditions were considered during the experimentation. A comprehensive statistical analysis of the data and feature selection using decision trees was conducted, and the KNN algorithm was used to perform classification. Hyperparameter tuning of the model was done to improve the models performance. Much research has been done to employ machine learning approaches in tool condition monitoring systems, however, a model agnostic approach to increase the interpretability of the process and get an in depth understanding of how the decision making is done is not implemented by many. This research paper presents a KNN based white box model, which allows us to dive deep into how the model performs the classification and how it prioritizes the different features included. This approach helps in detecting why the tool is in a certain condition and allows the manufacturer to make an informed decision about the tools maintenance.
    摘要 在机床过程中,监测工具状况是一项重要的方面,以确保高效率和产品质量。使用不同的机器学习技术在工具状况监测(TCM)中可以更好地分析各种信号的大量数据。在实验中,收集了不同工具损害情况下的实时力矩读数据。通过对数据进行全面统计分析和特征选择,使用KNN算法进行分类。为了改进模型性能,进行了模型参数调整。虽然许多研究把机器学习技术应用于工具状况监测系统,但是不多的研究者采用白盒模型来增加解释性和深入了解决ving过程中的决策。本研究论文提出了一种基于KNN的白盒模型,允许我们深入了解模型如何进行分类和如何优先级排序不同特征。这种方法可以帮助检测工具状况,并让制造商做出 Informed 决策 regarding 工具维护。
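
    A minimal KNN pipeline of the kind described, with a grid search over the number of neighbours and the distance weighting, is sketched below; the feature matrix is a random stand-in for the statistical features extracted from the force signals, and the label encoding is assumed:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# placeholder features (mean, RMS, kurtosis, ...) and tool-wear labels per cut
X = np.random.randn(300, 6)
y = np.random.randint(0, 3, size=300)            # e.g. 0=fresh, 1=worn, 2=failed

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid = GridSearchCV(model, {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9],
                            "kneighborsclassifier__weights": ["uniform", "distance"]},
                    cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
# inspecting which training cuts are a prediction's neighbours is what makes KNN "white box"
```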

Rethinking SIGN Training: Provable Nonconvex Acceleration without First- and Second-Order Gradient Lipschitz

  • paper_url: http://arxiv.org/abs/2310.14616
  • repo_url: None
  • paper_authors: Tao Sun, Congliang Chen, Peng Qiao, Li Shen, Xinwang Liu, Dongsheng Li
  • for: 这篇论文旨在探讨基于标识符的随机方法在深度神经网络训练中的性能和稳定性。
  • methods: 该论文使用基于标识符的随机方法,包括已有的SignMethod和LION算法,进行分析和比较。
  • results: 研究人员通过分析和实验发现,基于标识符的随机方法在深度神经网络训练中可以实现稳定和高效的性能,而且在分布式设置下,采用快速通信压缩协议可以实现线性增速。
    Abstract Sign-based stochastic methods have gained attention due to their ability to achieve robust performance despite using only the sign information for parameter updates. However, the current convergence analysis of sign-based methods relies on the strong assumptions of first-order gradient Lipschitz and second-order gradient Lipschitz, which may not hold in practical tasks like deep neural network training that involve high non-smoothness. In this paper, we revisit sign-based methods and analyze their convergence under more realistic assumptions of first- and second-order smoothness. We first establish the convergence of the sign-based method under weak first-order Lipschitz. Motivated by the weak first-order Lipschitz, we propose a relaxed second-order condition that still allows for nonconvex acceleration in sign-based methods. Based on our theoretical results, we gain insights into the computational advantages of the recently developed LION algorithm. In distributed settings, we prove that this nonconvex acceleration persists with linear speedup in the number of nodes, when utilizing fast communication compression gossip protocols. The novelty of our theoretical results lies in that they are derived under much weaker assumptions, thereby expanding the provable applicability of sign-based algorithms to a wider range of problems.
    摘要 基于符号的随机方法因为仅使用梯度的符号信息进行参数更新也能取得稳健的性能而受到了广泛关注。然而,现有的收敛分析依赖于一阶梯度 Lipschitz 和二阶梯度 Lipschitz 的强假设,这些假设在深度神经网络训练等高度非光滑的实际任务中可能不成立。在这篇论文中,我们重新审视了基于符号的方法,并在更实际的一阶与二阶光滑性假设下分析了它们的收敛性。我们首先证明了基于符号的方法在弱一阶 Lipschitz 条件下的收敛。受弱一阶 Lipschitz 条件的启发,我们提出了一种放宽的二阶条件,该条件仍允许基于符号的方法实现非凸加速。基于这些理论结果,我们对最近提出的 LION 算法的计算优势有了新的理解。在分布式设置下,我们证明了在使用快速通信压缩 gossip 协议时,这种非凸加速可随节点数量实现线性加速。我们理论结果的新颖之处在于它们是在更弱的假设下推导出来的,从而将基于符号的算法的可证明适用范围扩展到了更广泛的问题。
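
For reference, a minimal sketch of the sign-based update family discussed above, including a LION-style variant, is given below; the hyperparameters and the toy objective are illustrative, and this is a re-implementation rather than the authors' code.

```python
import numpy as np

def sign_sgd_step(w, grad, lr=1e-3):
    """Basic sign-based update: only the sign of the (stochastic) gradient is used."""
    return w - lr * np.sign(grad)

def lion_step(w, grad, m, lr=1e-2, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """Sketch of a LION-style update: the sign is taken of an interpolation between
    the current gradient and the momentum buffer; momentum is then updated with beta2.
    (Illustrative re-implementation, not the authors' code.)"""
    update = np.sign(beta1 * m + (1.0 - beta1) * grad)
    w_new = w - lr * (update + weight_decay * w)
    m_new = beta2 * m + (1.0 - beta2) * grad
    return w_new, m_new

# Toy demo on a nonconvex scalar objective f(w) = w^4 - 3 w^2.
f_grad = lambda w: 4 * w**3 - 6 * w
w, m = np.array([2.5]), np.zeros(1)
for _ in range(300):
    w, m = lion_step(w, f_grad(w), m, lr=1e-2)
print("final w:", w)  # should end up near a stationary point of f (about +/- 1.22)
```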

  • paper_url: http://arxiv.org/abs/2310.17664
  • repo_url: None
  • paper_authors: Yingying Gao, Shilei Zhang, Zihao Cui, Chao Deng, Junlan Feng
  • for: 这个论文主要用于提出一种自动化和有效的适应学习方法,用于优化端到端嵌入多任务模型。
  • methods: 该方法基于神经网络搜索(NAS)框架,并使用候选适应操作来优化模型。这些候选适应操作包括冻结、插入 adapter 和微调。
  • results: 该方法能够在 SLURP 上实现类似于手动设计的优化策略,压缩优化参数数量为 8.7%,并且性能更好。
    Abstract Cascading multiple pre-trained models is an effective way to compose an end-to-end system. However, fine-tuning the full cascaded model is parameter and memory inefficient and our observations reveal that only applying adapter modules on cascaded model can not achieve considerable performance as fine-tuning. We propose an automatic and effective adaptive learning method to optimize end-to-end cascaded multi-task models based on Neural Architecture Search (NAS) framework. The candidate adaptive operations on each specific module consist of frozen, inserting an adapter and fine-tuning. We further add a penalty item on the loss to limit the learned structure which takes the amount of trainable parameters into account. The penalty item successfully restrict the searched architecture and the proposed approach is able to search similar tuning scheme with hand-craft, compressing the optimizing parameters to 8.7% corresponding to full fine-tuning on SLURP with an even better performance.
    摘要 继发多个预训练模型是一种有效的端到端系统组合方式。然而,对整个继发模型进行精细调整是参数和内存不fficient的,我们的观察表明,只有在继发模型上应用 adapter 模块不能达到显著性能提升。我们提出了一种自动和有效的适应学习方法,基于神经建筑搜索(NAS)框架,以便优化端到端继发多任务模型。候选适应操作 на each specific module 包括冻结、插入 adapter 和 fine-tuning。我们进一步添加了一个惩罚项到损失函数,以限制搜索到的结构,该结构的训练参数数量被考虑。这个惩罚项成功限制了搜索的结构,我们的方法能够搜索到类似的调整方案,将优化参数压缩到 8.7% 相当于全面 fine-tuning 在 SLURP 上,并且性能更好。
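
A rough sketch of the per-module candidate operations (freeze, insert an adapter, fine-tune) follows; the bottleneck adapter is a generic residual design, and the module names and dimensions are assumptions rather than the paper's exact search space.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Generic bottleneck adapter inserted after a frozen sub-module (residual form)."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual path keeps the frozen output

def apply_operation(module: nn.Module, op: str, dim: int):
    """Candidate adaptive operations per module: 'frozen', 'adapter', or 'finetune'."""
    if op == "frozen":
        for p in module.parameters():
            p.requires_grad_(False)
        return module
    if op == "adapter":
        for p in module.parameters():
            p.requires_grad_(False)
        return nn.Sequential(module, Adapter(dim))   # only the adapter is trainable
    if op == "finetune":
        return module                                # all parameters stay trainable
    raise ValueError(op)

# Toy cascaded model: a (pretend pre-trained) encoder followed by a task head.
encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 256))
head = nn.Linear(256, 10)
encoder = apply_operation(encoder, "adapter", dim=256)
head = apply_operation(head, "finetune", dim=10)

trainable = sum(p.numel() for m in (encoder, head) for p in m.parameters() if p.requires_grad)
total = sum(p.numel() for m in (encoder, head) for p in m.parameters())
print(f"trainable parameters: {trainable}/{total} ({100 * trainable / total:.1f}%)")
```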

CAD-DA: Controllable Anomaly Detection after Domain Adaptation by Statistical Inference

  • paper_url: http://arxiv.org/abs/2310.14608
  • repo_url: None
  • paper_authors: Vo Nguyen Le Duy, Hsuan-Tien Lin, Ichiro Takeuchi
  • for: 这个论文是为了测试异常检测(AD)下领域适应(DA)中的结果而设计的一种新统计方法,即CAD-DA。
  • methods: 这个方法使用 conditional Selective Inference 来处理 DA 的影响,并能够控制预先确定的水平 $\alpha$(例如 0.05)中的异常识别概率。
  • results: 在 both synthetic 和实际数据集上,CAD-DA 方法能够实现有效的统计推断,并且能够控制预先确定的水平 $\alpha$ 中的异常识别概率。
    Abstract We propose a novel statistical method for testing the results of anomaly detection (AD) under domain adaptation (DA), which we call CAD-DA -- controllable AD under DA. The distinct advantage of the CAD-DA lies in its ability to control the probability of misidentifying anomalies under a pre-specified level $\alpha$ (e.g., 0.05). The challenge within this DA setting is the necessity to account for the influence of DA to ensure the validity of the inference results. Our solution to this challenge leverages the concept of conditional Selective Inference to handle the impact of DA. To our knowledge, this is the first work capable of conducting a valid statistical inference within the context of DA. We evaluate the performance of the CAD-DA method on both synthetic and real-world datasets.
    摘要 我们提出了一种新的统计方法,用于测试异常检测(AD)下领域适应(DA)的结果,我们称之为CAD-DA,即可控AD下DA的异常检测方法。CAD-DA的独特优势在于可以控制misidentify异常的概率,例如0.05。在DA设定下,挑战是需要考虑DA的影响,以确保结论的有效性。我们解决这个挑战,利用选择性统计处理DA的影响。据我们所知,这是首个在DA设定下进行有效统计推断的研究。我们对 synthetic和实际数据集进行了性能评估。

Online Auditing of Information Flow

  • paper_url: http://arxiv.org/abs/2310.14595
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Mor Oren-Loberman, Vered Azar, Wasim Huleihel
  • for: 这个论文的目的是为了审核社交媒体上的信息流动,以分辨假新闻和真实新闻。
  • methods: 该论文使用了一种 probabilistic Markovian 信息传播模型,并提出了一种基于这种模型的Sequential detection问题,以最小化错误率和检测时间的 комбиinación。
  • results: 实验表明,该算法在实际 dataset 上表现更高的准确率和检测速度,比现有的假信息检测算法更好。
    Abstract Modern social media platforms play an important role in facilitating rapid dissemination of information through their massive user networks. Fake news, misinformation, and unverifiable facts on social media platforms propagate disharmony and affect society. In this paper, we consider the problem of online auditing of information flow/propagation with the goal of classifying news items as fake or genuine. Specifically, driven by experiential studies on real-world social media platforms, we propose a probabilistic Markovian information spread model over networks modeled by graphs. We then formulate our inference task as a certain sequential detection problem with the goal of minimizing the combination of the error probability and the time it takes to achieve correct decision. For this model, we find the optimal detection algorithm minimizing the aforementioned risk and prove several statistical guarantees. We then test our algorithm over real-world datasets. To that end, we first construct an offline algorithm for learning the probabilistic information spreading model, and then apply our optimal detection algorithm. Experimental study show that our algorithm outperforms state-of-the-art misinformation detection algorithms in terms of accuracy and detection time.
    摘要 现代社交媒体平台在推广信息的速度和范围方面发挥着重要的作用。社交媒体上的假新闻、谣言和未经证实的信息可能会导致社会不稳定。在这篇论文中,我们考虑了在社交媒体上进行信息流/宣传的在线审核问题,以分类新闻项目为假或真。我们基于实际的社交媒体平台实践研究,提出了一种 probabilistic Markov chain 信息传播模型,并将检测任务定义为一种顺序检测问题,以最小化错误概率和检测时间的组合。我们找到了最佳检测算法,并证明了一些统计保证。然后,我们对实际数据进行测试,并构建了一个在线算法来学习probabilistic信息传播模型。实验结果表明,我们的算法在准确率和检测时间方面与当前的误信息检测算法相比,表现出优异性。
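
The flavour of the sequential decision rule (trading off error probability against detection time) can be illustrated with a textbook Wald SPRT over simulated propagation events; this is not the paper's optimal detector for the Markovian spread model, and the event probabilities below are made up.

```python
import numpy as np

def sprt(stream, p_fake, p_genuine, alpha=0.05, beta=0.05):
    """Textbook Wald SPRT over a stream of binary propagation events
    (e.g., whether the item is re-shared in a time slot).
    H1: event probability p_fake (fake news), H0: p_genuine.
    Returns (decision, number_of_samples_used)."""
    upper = np.log((1 - beta) / alpha)       # accept H1 above this threshold
    lower = np.log(beta / (1 - alpha))       # accept H0 below this threshold
    llr = 0.0
    for t, x in enumerate(stream, start=1):
        llr += np.log((p_fake if x else 1 - p_fake) /
                      (p_genuine if x else 1 - p_genuine))
        if llr >= upper:
            return "fake", t
        if llr <= lower:
            return "genuine", t
    return "undecided", len(stream)

rng = np.random.default_rng(1)
# Simulate an item that actually spreads like fake news (higher re-share rate).
events = rng.random(500) < 0.35
print(sprt(events, p_fake=0.35, p_genuine=0.20))
```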

GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels

  • paper_url: http://arxiv.org/abs/2310.14586
  • repo_url: https://github.com/Amanda-Zheng/GNNEvaluator
  • paper_authors: Xin Zheng, Miao Zhang, Chunyang Chen, Soheila Molaei, Chuan Zhou, Shirui Pan
  • for: 评估图神经网络(GNN)的性能是实际应用中的一项重要任务,因为部署的 GNN 模型在测试图上进行预测时会面临巨大的性能不确定性,这是因为训练和测试图的分布不匹配。本文研究了一个新的问题——GNN 模型评估,该问题的目标是评估特定的 GNN 模型在见到的图上进行预测性能。
  • methods: 我们提出了一个两个阶段的 GNN 模型评估框架,包括(1)DiscGraph 集合建构和(2)GNNEvaluator 训练和推理。DiscGraph 集合通过一种不同的分布差异量度函数来捕捉广泛的图数据分布差异,并利用 GNN 模型关于隐藏节点嵌入和节点类预测输出来实现有效的训练监督。GNNEvaluator 通过从 DiscGraph 集合获得的有效的训练监督来准确地估算 GNN 模型的节点分类率。
  • results: 我们的方法在实际中的无标签测试图上进行了广泛的实验,结果表明我们的方法可以准确地评估 GNN 模型在不同的图数据分布下的性能。
    Abstract Evaluating the performance of graph neural networks (GNNs) is an essential task for practical GNN model deployment and serving, as deployed GNNs face significant performance uncertainty when inferring on unseen and unlabeled test graphs, due to mismatched training-test graph distributions. In this paper, we study a new problem, GNN model evaluation, that aims to assess the performance of a specific GNN model trained on labeled and observed graphs, by precisely estimating its performance (e.g., node classification accuracy) on unseen graphs without labels. Concretely, we propose a two-stage GNN model evaluation framework, including (1) DiscGraph set construction and (2) GNNEvaluator training and inference. The DiscGraph set captures wide-range and diverse graph data distribution discrepancies through a discrepancy measurement function, which exploits the outputs of GNNs related to latent node embeddings and node class predictions. Under the effective training supervision from the DiscGraph set, GNNEvaluator learns to precisely estimate node classification accuracy of the to-be-evaluated GNN model and makes an accurate inference for evaluating GNN model performance. Extensive experiments on real-world unseen and unlabeled test graphs demonstrate the effectiveness of our proposed method for GNN model evaluation.
    摘要 评估图 neural network(GNN)的性能是实际部署和服务GNN模型的重要任务,因为部署在测试图上的GNN模型会面临很大的性能不确定性,由于训练和测试图的分布不匹配。在这篇论文中,我们研究了一个新的问题:GNN模型评估,它的目标是评估特定的GNN模型,它在观察和标注的图上 receives training,并且可以准确地预测图上的节点分类率。具体来说,我们提出了一个两stage GNN模型评估框架,包括(1)DiscGraph集合建立和(2)GNNEvaluator训练和推测。DiscGraph集合通过一个不同程度度量函数来捕捉图数据分布差异,这些差异度量函数利用GNN模型对节点嵌入和节点类预测的输出。在DiscGraph集合的有效的训练监督下,GNNEvaluator可以准确地估计GNN模型的节点分类率,并且可以准确地进行GNN模型性能评估。我们在实际的未看到和未标注测试图上进行了广泛的实验,并证明了我们提出的方法的有效性。

Modeling groundwater levels in California’s Central Valley by hierarchical Gaussian process and neural network regression

  • paper_url: http://arxiv.org/abs/2310.14555
  • repo_url: None
  • paper_authors: Anshuman Pradhan, Kyra H. Adams, Venkat Chandrasekaran, Zhen Liu, John T. Reager, Andrew M. Stuart, Michael J. Turmon
  • for: 模拟加利福尼亚中部水系地下水水平的continuous模型,以解决由于低质量的井水数据而困难。
  • methods: 提议使用机器学习方法,结合 Gaussian processes (GP) 和深度神经网络 (DNN),学习3D矿物文本模型,对地下水水平进行多变量回归。
  • results: 在2015-2020年间对加利福尼亚中央谷地的地下水水位进行模拟,结果显示2017和2019两个湿润年份的补给不足以弥补此前干旱年份流失的地下水。
    Abstract Modeling groundwater levels continuously across California's Central Valley (CV) hydrological system is challenging due to low-quality well data which is sparsely and noisily sampled across time and space. A novel machine learning method is proposed for modeling groundwater levels by learning from a 3D lithological texture model of the CV aquifer. The proposed formulation performs multivariate regression by combining Gaussian processes (GP) and deep neural networks (DNN). Proposed hierarchical modeling approach constitutes training the DNN to learn a lithologically informed latent space where non-parametric regression with GP is performed. The methodology is applied for modeling groundwater levels across the CV during 2015 - 2020. We demonstrate the efficacy of GP-DNN regression for modeling non-stationary features in the well data with fast and reliable uncertainty quantification. Our results indicate that the 2017 and 2019 wet years in California were largely ineffective in replenishing the groundwater loss caused during previous drought years.
    摘要 在加利福尼亚中央谷地(CV)水文系统中连续模拟地下水水位非常困难,因为水井数据质量较低,且在时间和空间上采样稀疏、噪声较大。本文提出了一种新的机器学习方法,通过学习 CV 含水层的三维岩性纹理模型来模拟地下水水位。所提出的方法将高斯过程(GP)与深度神经网络(DNN)相结合来进行多变量回归:先训练 DNN 学习一个融合岩性信息的潜在空间,再在该空间中用 GP 进行非参数回归。该方法被用于模拟 2015-2020 年间 CV 的地下水水位。我们展示了 GP-DNN 回归在建模水井数据中非平稳特征方面的有效性,并能快速、可靠地量化不确定性。结果表明,加利福尼亚 2017 和 2019 两个湿润年份的补给在很大程度上未能弥补此前干旱年份造成的地下水流失。
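
A minimal sketch of the hierarchical GP-DNN idea follows: a small network maps location/lithology inputs to a latent space, and a GP then performs non-parametric regression with uncertainty in that space; the architecture, kernel, and synthetic data are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# Placeholder inputs: (x, y, depth) coordinates plus a few lithological texture features.
X = rng.uniform(size=(400, 6)).astype(np.float32)
y = (np.sin(4 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=400)).astype(np.float32)

# Stage 1: a small DNN learns a lithologically-informed latent representation.
latent_dim = 4
net = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, latent_dim))
head = nn.Linear(latent_dim, 1)
opt = torch.optim.Adam(list(net.parameters()) + list(head.parameters()), lr=1e-2)
Xt, yt = torch.from_numpy(X), torch.from_numpy(y).unsqueeze(1)
for _ in range(300):
    opt.zero_grad()
    loss = nn.functional.mse_loss(head(net(Xt[:300])), yt[:300])
    loss.backward()
    opt.step()

# Stage 2: non-parametric GP regression in the learned latent space,
# which also yields predictive uncertainty.
with torch.no_grad():
    Z = net(Xt).numpy()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(), normalize_y=True)
gp.fit(Z[:300], y[:300])
mean, std = gp.predict(Z[300:], return_std=True)
print("held-out RMSE:", float(np.sqrt(np.mean((mean - y[300:]) ** 2))),
      "mean predictive std:", float(std.mean()))
```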

KindMed: Knowledge-Induced Medicine Prescribing Network for Medication Recommendation

  • paper_url: http://arxiv.org/abs/2310.14552
  • repo_url: None
  • paper_authors: Ahmad Wisnu Mulyadi, Heung-Il Suk
  • for: 该论文旨在提出一种基于知识的医学药物推荐网络(KindMed)框架,用于根据多种医疗相关的外部知识来扩充电子医疗纪录(EHR)群体,从而获得更全面的医学知识图(KG)。
  • methods: 该论文提出了一种基于关系意识图学习的方法,使得KGs能够更好地表示医学知识,并采用了层次序列学习来发现和融合病人历史 admit 中的医学和药物时间动态,以便个性化推荐药物。
  • results: 该论文在使用实际的扩展EHR群体上展示了KindMed的效果,与基于图driven的竞争对手相比,达到了领先的表现。
    Abstract Extensive adoption of electronic health records (EHRs) offers opportunities for its use in various clinical analyses. We could acquire more comprehensive insights by enriching an EHR cohort with external knowledge (e.g., standardized medical ontology and wealthy semantics curated on the web) as it divulges a spectrum of informative relations between observed medical codes. This paper proposes a novel Knowledge-Induced Medicine Prescribing Network (KindMed) framework to recommend medicines by inducing knowledge from myriad medical-related external sources upon the EHR cohort, rendering them as medical knowledge graphs (KGs). On top of relation-aware graph representation learning to unravel an adequate embedding of such KGs, we leverage hierarchical sequence learning to discover and fuse clinical and medicine temporal dynamics across patients' historical admissions for encouraging personalized recommendations. In predicting safe, precise, and personalized medicines, we devise an attentive prescribing that accounts for and associates three essential aspects, i.e., a summary of joint historical medical records, clinical condition progression, and the current clinical state of patients. We exhibited the effectiveness of our KindMed on the augmented real-world EHR cohorts, etching leading performances against graph-driven competing baselines.
    摘要 广泛采用电子健康记录(EHR)提供了许多可能性,用于不同的临床分析。我们可以通过扩充EHR群组 WITH 外部知识(例如标准医学 ontology 和互联网上 cura 的丰富 semantics)来获得更全面的理解,这些外部知识揭示了观察到的医疗代码之间的各种有益关系。本文提出了一种基于知识的医学药物推荐网络(KindMed)框架,用于根据外部医疗相关资源中的知识来建立医学知识图(KG),并在这些KG上进行关系意识graph representation learning来获得合适的嵌入。此外,我们还利用层次序列学习来发现和融合患者历史入院记录中的临床和药物时间动力学,以便促进个性化推荐。在预测安全、精准和个性化的药物时,我们提出了一种注意力投入的医学药物推荐方法,该方法考虑了三个基本方面:患者历史医疗记录摘要、临床病变趋势和患者当前临床状况。我们在扩展了实际世界EHR群组上展示了KindMed的效果,与图驱动的竞争基线相比,达到了出色的性能。

Corruption-Robust Offline Reinforcement Learning with General Function Approximation

  • paper_url: http://arxiv.org/abs/2310.14550
  • repo_url: https://github.com/yangrui2015/uwmsg
  • paper_authors: Chenlu Ye, Rui Yang, Quanquan Gu, Tong Zhang
  • for: 这个论文的目的是研究在线上执行学习中的抗腐蚀性能,特别是在批处理样本时遇到抗腐蚀的情况下。
  • methods: 这篇论文使用了uncertainty-weighting技术,并设计了一种新的不确定度权重迭代过程来有效地计算批处理样本上的抗腐蚀性能。
  • results: 该论文的结果表明,对于单个策略覆盖和抗腐蚀知识的假设下,提议的算法可以达到一种抗腐蚀性能bound,其增幅因子与抗腐蚀水平直接相关。特别是,当特定到线性MDP时,损害依赖于抗腐蚀水平的错误项降低到了 $\mathcal O(\zeta d n^{-1})$,其中 $d$ 是特征映射的维度,这与已知的下界准确性。
    Abstract We investigate the problem of corruption robustness in offline reinforcement learning (RL) with general function approximation, where an adversary can corrupt each sample in the offline dataset, and the corruption level $\zeta\geq0$ quantifies the cumulative corruption amount over $n$ episodes and $H$ steps. Our goal is to find a policy that is robust to such corruption and minimizes the suboptimality gap with respect to the optimal policy for the uncorrupted Markov decision processes (MDPs). Drawing inspiration from the uncertainty-weighting technique from the robust online RL setting \citep{he2022nearly,ye2022corruptionrobust}, we design a new uncertainty weight iteration procedure to efficiently compute on batched samples and propose a corruption-robust algorithm for offline RL. Notably, under the assumption of single policy coverage and the knowledge of $\zeta$, our proposed algorithm achieves a suboptimality bound that is worsened by an additive factor of $\mathcal O(\zeta \cdot (\text{CC}(\lambda,\hat{\mathcal F},\mathcal Z_n^H))^{1/2} (C(\hat{\mathcal F},\mu))^{-1/2} n^{-1})$ due to the corruption. Here $\text{CC}(\lambda,\hat{\mathcal F},\mathcal Z_n^H)$ is the coverage coefficient that depends on the regularization parameter $\lambda$, the confidence set $\hat{\mathcal F}$, and the dataset $\mathcal Z_n^H$, and $C(\hat{\mathcal F},\mu)$ is a coefficient that depends on $\hat{\mathcal F}$ and the underlying data distribution $\mu$. When specialized to linear MDPs, the corruption-dependent error term reduces to $\mathcal O(\zeta d n^{-1})$ with $d$ being the dimension of the feature map, which matches the existing lower bound for corrupted linear MDPs. This suggests that our analysis is tight in terms of the corruption-dependent term.
    摘要 我们研究了在线执行学习(RL)中的抗腐败性能,具体来说是在批处理样本的情况下,对于每个样本,敌对者可以进行抗腐败。我们的目标是找到一个抗腐败的策略,并尽可能减少与无腐败情况下的优化策略之间的差异。我们启发自在Robust Online RL Setting中的不确定性权重技术(\cite{he2022nearly,ye2022corruptionrobust}),并设计了一种新的不确定性权重迭代过程,以有效地计算批处理样本上。我们提出了一种对抗腐败的算法,并证明了该算法在假设单个策略覆盖和抗腐败量$\zeta$的情况下,存在一定的下降 bounds。其中,$\text{CC}(\lambda,\hat{\mathcal F},\mathcal Z_n^H)$是一个取决于Regularization参数$\lambda$、信任集$\hat{\mathcal F}$和批处理样本$\mathcal Z_n^H$的覆盖系数,$C(\hat{\mathcal F},\mu)$是一个取决于$\hat{\mathcal F}$和数据分布$\mu$的系数。当特化到线性MDPs时,损耗因素中的抗腐败相关项降为 $\mathcal O(\zeta d n^{-1})$,其中$d$是特征映射的维度,与现有的下降 bound相符。这表明我们的分析是紧急的。

Multimodal Graph Learning for Modeling Emerging Pandemics with Big Data

  • paper_url: http://arxiv.org/abs/2310.14549
  • repo_url: https://github.com/khanhtungtran/mgl4mep
  • paper_authors: Khanh-Tung Tran, Truong Son Hy, Lili Jiang, Xuan-Son Vu
  • for: 这篇论文旨在提出一种基于时间图 neural network 和多Modal 数据的潜在疫情预测和分析框架,以便更好地管理和决策。
  • methods: 该框架使用社交媒体内容、具有特定预训语言模型的特殊预训语言模型,以探索用户之间的下游图结构。
  • results: 对比基eline方法,该框架在不同的地区、疫情情况和预测时间范围内都能够具有更高的预测性能。
    Abstract Accurate forecasting and analysis of emerging pandemics play a crucial role in effective public health management and decision-making. Traditional approaches primarily rely on epidemiological data, overlooking other valuable sources of information that could act as sensors or indicators of pandemic patterns. In this paper, we propose a novel framework called MGL4MEP that integrates temporal graph neural networks and multi-modal data for learning and forecasting. We incorporate big data sources, including social media content, by utilizing specific pre-trained language models and discovering the underlying graph structure among users. This integration provides rich indicators of pandemic dynamics through learning with temporal graph neural networks. Extensive experiments demonstrate the effectiveness of our framework in pandemic forecasting and analysis, outperforming baseline methods across different areas, pandemic situations, and prediction horizons. The fusion of temporal graph learning and multi-modal data enables a comprehensive understanding of the pandemic landscape with less time lag, cheap cost, and more potential information indicators.
    摘要 正确预测和分析新兴疫情的角色在公共健康管理和决策中非常重要。传统方法主要依靠疫情学数据,忽略了其他可能有用的信息源,这些信息可以 acted as 疫情模式的感应器或指标。在本文中,我们提出了一个名为MGL4MEP的新框架,它结合了时间图 neural network和多 modal 数据进行学习和预测。我们利用了大量的数据来源,包括社交媒体内容,通过特定的预训语言模型来利用。这个组合提供了丰富的疫情动态指标,通过时间图 neural network 进行学习,实现了疫情预测和分析的优化。实验结果显示,我们的框架在不同的区域、疫情情况和预测时间点方面,都能够优于基准方法。将时间图学习和多Modal 数据融合,能够实现疫情景观的全面理解,具有较少的时间延迟、较低的成本和更多的信息指标。

Trigonometric Quadrature Fourier Features for Scalable Gaussian Process Regression

  • paper_url: http://arxiv.org/abs/2310.14544
  • repo_url: None
  • paper_authors: Kevin Li, Max Balakirsky, Simon Mak
  • for: The paper is written for scalable Gaussian Process (GP) regression, specifically to address the limitations of Quadrature Fourier Features (QFF) and improve the approximation accuracy and uncertainty estimates.
  • methods: The paper proposes a new method called Trigonometric Quadrature Fourier Feature (TQFF) that uses a non-Gaussian quadrature rule tailored for the desired Fourier transform, which improves the performance of the approximation over RFF and Gaussian QFF.
  • results: The paper demonstrates the improved performance of TQFF over RFF and Gaussian QFF in a suite of numerical experiments and applications, and shows that TQFF enjoys accurate GP approximations over a broad range of length-scales using fewer features.
  • for: 这篇论文是为了扩展 Gaussian Process(GP)回归的可扩展性,特别是解决 Quadrature Fourier Features(QFF)的限制,提高拟合精度和不确定性估计。
  • methods: 这篇论文提出了一种新的方法 called Trigonometric Quadrature Fourier Feature(TQFF),该方法使用非对称Gaussian quadrature规则,特地适应所需的傅立叶变换。
  • results: 这篇论文通过一系列的数值实验和应用,证明 TQFF 方法在 RFF 和 Gaussian QFF 方法上具有更好的表现,可以在各种长尺度范围内使用 fewer features 来获得更高的拟合精度。
    Abstract Fourier feature approximations have been successfully applied in the literature for scalable Gaussian Process (GP) regression. In particular, Quadrature Fourier Features (QFF) derived from Gaussian quadrature rules have gained popularity in recent years due to their improved approximation accuracy and better calibrated uncertainty estimates compared to Random Fourier Feature (RFF) methods. However, a key limitation of QFF is that its performance can suffer from well-known pathologies related to highly oscillatory quadrature, resulting in mediocre approximation with limited features. We address this critical issue via a new Trigonometric Quadrature Fourier Feature (TQFF) method, which uses a novel non-Gaussian quadrature rule specifically tailored for the desired Fourier transform. We derive an exact quadrature rule for TQFF, along with kernel approximation error bounds for the resulting feature map. We then demonstrate the improved performance of our method over RFF and Gaussian QFF in a suite of numerical experiments and applications, and show the TQFF enjoys accurate GP approximations over a broad range of length-scales using fewer features.
    摘要 傅里叶特征近似方法已被成功用于可扩展的高斯过程(GP)回归。其中,基于高斯求积规则的求积傅里叶特征(QFF)近年来颇受关注,因为与随机傅里叶特征(RFF)方法相比,它具有更高的近似精度和更好校准的不确定性估计。然而,QFF 的一个关键局限在于,其性能可能受到高度振荡求积这一已知问题的影响,导致在特征数量有限时近似效果平平。我们通过一种新的三角求积傅里叶特征(TQFF)方法来解决这一关键问题,该方法使用一种专门针对目标傅里叶变换设计的非高斯求积规则。我们推导了 TQFF 的精确求积规则,并给出了相应特征映射的核近似误差界。随后,我们在一系列数值实验和应用中展示了该方法相对于 RFF 和高斯 QFF 的性能提升,并表明 TQFF 能够在较宽的长度尺度范围内用更少的特征获得准确的 GP 近似。
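
For context, the sketch below implements the baseline Gaussian QFF construction that TQFF improves upon: Gauss-Hermite quadrature of the RBF spectral density with a tensor-product node grid (the paper's trigonometric quadrature rule itself is not reproduced); the lengthscale and node counts are illustrative.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite import hermgauss

def qff_features(X, lengthscale=1.0, n_nodes=8):
    """Gauss-Hermite quadrature Fourier features for the RBF kernel
    k(x, x') = exp(-||x - x'||^2 / (2 l^2)).
    Tensor-product rule: the feature count grows as n_nodes**d, and accuracy
    degrades when the quadrature becomes highly oscillatory (the pathology TQFF targets)."""
    n, d = X.shape
    t, w = hermgauss(n_nodes)                      # 1-D nodes/weights for weight e^{-t^2}
    nodes = np.array(list(itertools.product(t, repeat=d)))           # (n_nodes**d, d)
    weights = np.prod(np.array(list(itertools.product(w, repeat=d))), axis=1)
    weights = weights / np.pi ** (d / 2)
    omega = np.sqrt(2.0) / lengthscale * nodes                        # spectral frequencies
    proj = X @ omega.T                                                # (n, n_nodes**d)
    scale = np.sqrt(weights)
    return np.hstack([scale * np.cos(proj), scale * np.sin(proj)])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
l = 0.8
Phi = qff_features(X, lengthscale=l, n_nodes=10)
K_true = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * l ** 2))
print("max abs kernel approximation error:", np.abs(Phi @ Phi.T - K_true).max())
```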

Marginal Nodes Matter: Towards Structure Fairness in Graphs

  • paper_url: http://arxiv.org/abs/2310.14527
  • repo_url: None
  • paper_authors: Xiaotian Han, Kaixiong Zhou, Ting-Hsiang Wang, Jundong Li, Fei Wang, Na Zou
  • for: 本文主要研究领域是图 neural network 中的结构公平性(structure fairness)问题,特别是在 marginal node 处理不公平性问题。
  • methods: 本文提出了一种名为 SFairGNN 的新方法,它结合 neighborhood expansion 基于结构偏置和 hop-aware 注意力汇集来实现结构公平性。
  • results: 实验结果表明,SFairGNN 可以显著提高结构公平性,同时保持下游任务的总性能。
    Abstract In social network, a person located at the periphery region (marginal node) is likely to be treated unfairly when compared with the persons at the center. While existing fairness works on graphs mainly focus on protecting sensitive attributes (e.g., age and gender), the fairness incurred by the graph structure should also be given attention. On the other hand, the information aggregation mechanism of graph neural networks amplifies such structure unfairness, as marginal nodes are often far away from other nodes. In this paper, we focus on novel fairness incurred by the graph structure on graph neural networks, named \emph{structure fairness}. Specifically, we first analyzed multiple graphs and observed that marginal nodes in graphs have a worse performance of downstream tasks than others in graph neural networks. Motivated by the observation, we propose \textbf{S}tructural \textbf{Fair} \textbf{G}raph \textbf{N}eural \textbf{N}etwork (SFairGNN), which combines neighborhood expansion based structure debiasing with hop-aware attentive information aggregation to achieve structure fairness. Our experiments show \SFairGNN can significantly improve structure fairness while maintaining overall performance in the downstream tasks.
    摘要 在社交网络中,位于外围区域的人(边缘节点)相比处于中心的人更可能受到不公平的对待。现有针对图的公平性研究主要关注保护敏感属性(如年龄和性别),但由图结构本身引起的不公平同样应当受到重视。另一方面,图神经网络的信息聚合机制会放大这种结构不公平,因为边缘节点通常距离其他节点较远。在这篇论文中,我们关注图结构在图神经网络中引起的一种新的公平性问题,称为结构公平性(structure fairness)。具体而言,我们首先分析了多个图,观察到边缘节点在图神经网络的下游任务中性能较差。受此观察的启发,我们提出了结构公平图神经网络(SFairGNN),它将基于邻域扩展的结构去偏与跳数感知的注意力信息聚合相结合,以实现结构公平性。实验表明,SFairGNN 能够显著改善结构公平性,同时保持下游任务的整体性能。

K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis

  • paper_url: http://arxiv.org/abs/2310.14521
  • repo_url: None
  • paper_authors: Sean Cottrell, Yuta Hozumi, Guo-Wei Wei
  • for: 这种研究用于描述单个细胞的转录调控和细胞间通信的细胞RNA分析数据的分析方法。
  • methods: 这种方法使用 persist Laplacian 技术和 L$_{2,1}$ 正则化来处理多尺度和多类别多样性问题,并 introduce 一种 k-Nearest-Neighbor(kNN) persist Laplacian 技术来提高方法的稳定性。
  • results: 对于 11 个不同的 benchmark 细胞RNA seq 数据集,我们的方法超过了其他无监督 PCA 增强法和 UMAP、tSNE 和 Projection Non-Negative Matrix Factorization(NMF)的表现,并且可以更好地处理多尺度和多类别多样性问题。
    Abstract Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, lacks the ability to capture geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited by the analysis of only a single scale. We propose a topological Principal Components Analysis (tPCA) method by the combination of persistent Laplacian (PL) technique and L$_{2,1}$ norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique which addresses the many limitations of the traditional persistent homology. Rather than inducing filtration via the varying of a distance threshold, we introduced kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and showcase that our methods outperform other unsupervised PCA enhancements from the literature, as well as popular Uniform Manifold Approximation (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF) by significant margins.
    摘要 Single-cell RNA sequencing (scRNA-seq) 是一种广泛使用的技术,用于揭示细胞间的多样性,从而为我们提供了细胞通信、细胞分化和不同基因表达的启示。然而,分析 scRNA-seq 数据是一项挑战,因为数据中存在稀疏性和大量基因的问题。因此,维度减少和特征选择是必须的,以移除干扰信号并增强下游分析。传统的 PCA(主成分分析)是维度减少的主要工具,但它缺乏捕捉数据中嵌入的几何结构信息的能力。我们提出一种 topological Principal Components Analysis (tPCA) 方法,通过结合持续 Laplacian(PL)技术和 L$_{2,1}$ 范数规则来解决数据中多尺度和多类异ogeneity 问题。我们还引入 k-Nearest-Neighbor (kNN) 持续 Laplacian技术以提高我们的持续 Laplacian方法的可靠性。我们的 kNN-PL 方法是一种新的 algebraic topology 技术,它解决了传统 persist homology 中的多个限制。而不是通过变化距离阈值来实现维度滤波,我们引入 kNN-tPCA,其中维度滤波通过在每个步骤中变化数据中的几个邻居的数量来实现。我们验证了我们提出的 tPCA 和 kNN-tPCA 方法在 11 个不同的 scRNA-seq 数据集上的效果,并发现它们在其他未supervised PCA 增强技术、UMAP、t-SNE 和 Projection Non-Negative Matrix Factorization (NMF) 的基础上表现出了显著的优势。
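
The full kNN persistent Laplacian and L2,1-regularized tPCA are not reproduced here; as a simplified stand-in, the sketch below builds graph Laplacians over a kNN filtration (varying the number of neighbours) and uses their average to regularize a PCA-style projection, illustrating the ingredients rather than the paper's algorithm.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

def knn_filtration_laplacian(X, ks=(5, 10, 15)):
    """Average of symmetrized kNN-graph Laplacians over a filtration obtained by
    varying the number of neighbours k (a crude stand-in for the multiscale
    kNN persistent Laplacian used in the paper)."""
    n = X.shape[0]
    L = np.zeros((n, n))
    for k in ks:
        A = kneighbors_graph(X, n_neighbors=k, mode="connectivity").toarray()
        A = np.maximum(A, A.T)                      # symmetrize the kNN graph
        L += laplacian(A, normed=True)
    return L / len(ks)

def graph_regularized_pca(X, L, n_components=2, gamma=0.5):
    """Top directions of X^T (I - gamma * L) X: maximize variance while keeping the
    projected samples smooth on the graph."""
    Xc = X - X.mean(axis=0)
    M = Xc.T @ (np.eye(X.shape[0]) - gamma * L) @ Xc
    vals, vecs = np.linalg.eigh(M)
    W = vecs[:, ::-1][:, :n_components]
    return Xc @ W, W

rng = np.random.default_rng(0)
# Placeholder for a (cells x genes) expression matrix after normalisation.
X = rng.poisson(2.0, size=(300, 50)).astype(float)
L = knn_filtration_laplacian(X)
embedding, components = graph_regularized_pca(X, L, n_components=2)
print("embedding shape:", embedding.shape)
```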

Efficient Heterogeneous Graph Learning via Random Projection

  • paper_url: http://arxiv.org/abs/2310.14481
  • repo_url: https://github.com/CrawlScript/RpHGNN
  • paper_authors: Jun Hu, Bryan Hooi, Bingsheng He
  • for: 该研究旨在提高在大规模现实世界图上进行深度学习的效率,通过一次性消息传递将异构图转换为规则形状的张量,从而支持高效的小批量训练。
  • methods: 该研究提出了一种混合式的基于预计算的异构图神经网络(RpHGNN),兼顾一类方法的高效率与另一类方法的低信息损失。其主要框架由"传播-更新"迭代构成,并引入随机投影压缩步骤以确保复杂度仅线性增长;同时引入带有奇偶传播方案的关系感知邻居收集组件,以更细粒度地收集邻居信息。
  • results: 实验结果显示,该方法在七个小型和大型 benchmark 数据集上取得了最先进的结果,并且比最有效的基准快 230%。令人惊讶的是,该方法不仅超过了基于预处理的基准,还超过了端到端方法。
    Abstract Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs. Typical HGNNs require repetitive message passing during training, limiting efficiency for large-scale real-world graphs. Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors, enabling efficient mini-batch training. Existing pre-computation-based HGNNs can be mainly categorized into two styles, which differ in how much information loss is allowed and efficiency. We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN), which combines the benefits of one style's efficiency with the low information loss of the other style. To achieve efficiency, the main framework of RpHGNN consists of propagate-then-update iterations, where we introduce a Random Projection Squashing step to ensure that complexity increases only linearly. To achieve low information loss, we introduce a Relation-wise Neighbor Collection component with an Even-odd Propagation Scheme, which aims to collect information from neighbors in a finer-grained way. Experimental results indicate that our approach achieves state-of-the-art results on seven small and large benchmark datasets while also being 230% faster compared to the most effective baseline. Surprisingly, our approach not only surpasses pre-processing-based baselines but also outperforms end-to-end methods.
    摘要 异构图神经网络(HGNN)是在异构图上进行深度学习的有力工具。典型的 HGNN 在训练过程中需要反复进行消息传递,限制了其在大规模现实世界图上的效率。近期基于预计算的 HGNN 采用一次性消息传递,将异构图转换为规则形状的张量,从而支持高效的小批量训练。现有的基于预计算的 HGNN 大致可分为两类,区别在于允许多少信息损失以及效率高低。我们提出了一种混合式的基于预计算的 HGNN,称为随机投影异构图神经网络(RpHGNN),它兼具一类方法的高效率与另一类方法的低信息损失。为实现高效率,RpHGNN 的主要框架由"传播-更新"迭代构成,其中我们引入随机投影压缩步骤,以确保复杂度仅线性增长。为实现低信息损失,我们引入了带有奇偶传播方案的关系感知邻居收集组件,以更细粒度地从邻居收集信息。实验结果表明,我们的方法在七个小型和大型基准数据集上取得了最先进的结果,同时比最有效的基准快 230%。令人惊讶的是,我们的方法不仅超过了基于预处理的基准,还超过了端到端方法。
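
A highly simplified, dense-matrix sketch of the propagate-then-update idea with random-projection squashing is shown below; the real method works relation-wise on heterogeneous graphs with sparse operations and an even-odd propagation scheme, so the shapes and the mean aggregation here are illustrative only.

```python
import numpy as np

def random_projection(X, out_dim, rng):
    """Johnson-Lindenstrauss style Gaussian random projection used to 'squash'
    propagated features back to a fixed width after every propagate step."""
    R = rng.normal(scale=1.0 / np.sqrt(out_dim), size=(X.shape[1], out_dim))
    return X @ R

def propagate_then_squash(features, adj_by_relation, out_dim=64, n_iters=2, seed=0):
    """Sketch of pre-computation with propagate-then-update iterations: for each
    relation, average neighbour features, concatenate the results, then compress
    with a random projection so the representation width never grows."""
    rng = np.random.default_rng(seed)
    H = features
    for _ in range(n_iters):
        propagated = []
        for A in adj_by_relation:                   # one adjacency matrix per relation
            deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
            propagated.append((A @ H) / deg)        # mean aggregation over that relation
        H = random_projection(np.hstack(propagated), out_dim, rng)
    return H

rng = np.random.default_rng(1)
n_nodes, n_rel = 500, 3
feats = rng.normal(size=(n_nodes, 128))
adjs = [(rng.random((n_nodes, n_nodes)) < 0.01).astype(float) for _ in range(n_rel)]
out = propagate_then_squash(feats, adjs, out_dim=64)
print("pre-computed node representations:", out.shape)   # stays (n_nodes, 64) per iteration
```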

Attention-Enhancing Backdoor Attacks Against BERT-based Models

  • paper_url: http://arxiv.org/abs/2310.14480
  • repo_url: None
  • paper_authors: Weimin Lyu, Songzhu Zheng, Lu Pang, Haibin Ling, Chao Chen
  • for: investigate the strategies of backdoor attacks and understand the model’s vulnerability
  • methods: directly manipulate the attention patterns in the interior structure of neural networks
  • results: enhance the Trojan behavior and boost attack efficacy in terms of attack successful rates and poisoning rates, applicable to different attacking methods and models.
    Abstract Recent studies have revealed that \textit{Backdoor Attacks} can threaten the safety of natural language processing (NLP) models. Investigating the strategies of backdoor attacks will help to understand the model's vulnerability. Most existing textual backdoor attacks focus on generating stealthy triggers or modifying model weights. In this paper, we directly target the interior structure of neural networks and the backdoor mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention patterns. Our loss can be applied to different attacking methods to boost their attack efficacy in terms of attack successful rates and poisoning rates. It applies to not only traditional dirty-label attacks, but also the more challenging clean-label attacks. We validate our method on different backbone models (BERT, RoBERTa, and DistilBERT) and various tasks (Sentiment Analysis, Toxic Detection, and Topic Classification).
    摘要 新的研究发现,\textit{后门攻击} 可以威胁自然语言处理(NLP)模型的安全性。调查后门攻击的策略可以帮助我们理解模型的漏洞。现有大多数文本后门攻击都是通过生成隐藏的触发符或修改模型的权重来实现。在这篇论文中,我们直接target了神经网络的内部结构和后门机制。我们提出了一种新的 Trojan Attention Loss(TAL),它可以直接 manipulate 神经网络的注意模式,从而增强 Trojan 行为。我们的损失可以应用于不同的攻击方法,以提高攻击成功率和毒料率。它适用于不仅传统的尘埃标签攻击,还适用于更加困难的干净标签攻击。我们在不同的基础模型(BERT、RoBERTa、DistilBERT)和多个任务(情感分析、毒语检测、主题分类)上验证了我们的方法。

Revisiting Implicit Differentiation for Learning Problems in Optimal Control

  • paper_url: http://arxiv.org/abs/2310.14468
  • repo_url: https://github.com/mingu6/implicit-diff-optimal-control
  • paper_authors: Ming Xu, Timothy Molloy, Stephen Gould
  • for: 这个论文提出了一种新的方法,用于解决非 convex、受限 discrete-time最优控制(COC)问题中的导数计算。
  • methods: 我们直接解析变量排除后得到的矩阵方程,并利用矩阵方程的结构来避免约束的影响。这种方法可以高效地解决带有时间步骤的优化问题,并且可以简单地并行化。
  • results: 我们在一个 synthetic 测试集和四个具有挑战性的学习从示例中评估了我们的方法。结果表明,我们的方法可以在带有时间步骤的优化问题中高效地计算导数,并且可以在大型模型下实现更好的可扩展性和稳定性。
    Abstract This paper proposes a new method for differentiating through optimal trajectories arising from non-convex, constrained discrete-time optimal control (COC) problems using the implicit function theorem (IFT). Previous works solve a differential Karush-Kuhn-Tucker (KKT) system for the trajectory derivative, and achieve this efficiently by solving an auxiliary Linear Quadratic Regulator (LQR) problem. In contrast, we directly evaluate the matrix equations which arise from applying variable elimination on the Lagrange multiplier terms in the (differential) KKT system. By appropriately accounting for the structure of the terms within the resulting equations, we show that the trajectory derivatives scale linearly with the number of timesteps. Furthermore, our approach allows for easy parallelization, significantly improved scalability with model size, direct computation of vector-Jacobian products and improved numerical stability compared to prior works. As an additional contribution, we unify prior works, addressing claims that computing trajectory derivatives using IFT scales quadratically with the number of timesteps. We evaluate our method on a both synthetic benchmark and four challenging, learning from demonstration benchmarks including a 6-DoF maneuvering quadrotor and 6-DoF rocket powered landing.
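
The core implicit-function-theorem identity that such methods differentiate through can be illustrated on a toy unconstrained problem; the paper's setting additionally handles constraints and the temporal KKT structure, so the sketch below only shows the basic mechanism.

```python
import numpy as np

# Inner problem: x*(theta) = argmin_x f(x, theta) with
# f(x, theta) = 0.5 x^T A x - theta^T x + 0.25 * ||x||^4  (smooth, non-quadratic).
A = np.array([[3.0, 0.5], [0.5, 2.0]])

def grad_x(x, theta):
    return A @ x - theta + (x @ x) * x          # d/dx of 0.25*||x||^4 is ||x||^2 * x

def hess_xx(x):
    return A + (x @ x) * np.eye(2) + 2.0 * np.outer(x, x)

def solve_inner(theta, iters=200, lr=0.1):
    x = np.zeros(2)
    for _ in range(iters):
        x -= lr * grad_x(x, theta)
    return x

theta = np.array([1.0, -0.5])
x_star = solve_inner(theta)

# Implicit function theorem: grad_x f(x*(theta), theta) = 0 for all theta, hence
#   dx*/dtheta = -[d^2 f / dx dx]^{-1} [d^2 f / dx dtheta].
# Here d^2 f / dx dtheta = -I, so dx*/dtheta = H^{-1}.
dx_dtheta_ift = np.linalg.solve(hess_xx(x_star), np.eye(2))

# Finite-difference check of the trajectory derivative.
eps = 1e-5
fd = np.column_stack([(solve_inner(theta + eps * e) - solve_inner(theta - eps * e)) / (2 * eps)
                      for e in np.eye(2)])
print("IFT Jacobian:\n", dx_dtheta_ift, "\nfinite differences:\n", fd)
```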

Inferring Relational Potentials in Interacting Systems

  • paper_url: http://arxiv.org/abs/2310.14466
  • repo_url: https://github.com/ArmandCom/relational-potentials.github.io
  • paper_authors: Armand Comas-Massagué, Yilun Du, Christian Fernandez, Sandesh Ghimire, Mario Sznaier, Joshua B. Tenenbaum, Octavia Camps
  • for: 本研究旨在建立一种能够robustly推断互动系统中间的方法,以便在实际世界中建立可靠的互动系统。
  • methods: 本研究提出了一种名为神经网络互动推断法(NIIP),它通过发现可能Field的能量函数来推断互动系统中间的关系。
  • results: NIIP可以在测试时展示独特的能力,例如可以在不同的模型中交换互动类型,预测行程,以及检测异常样本和外部干扰。
    Abstract Systems consisting of interacting agents are prevalent in the world, ranging from dynamical systems in physics to complex biological networks. To build systems which can interact robustly in the real world, it is thus important to be able to infer the precise interactions governing such systems. Existing approaches typically discover such interactions by explicitly modeling the feed-forward dynamics of the trajectories. In this work, we propose Neural Interaction Inference with Potentials (NIIP) as an alternative approach to discover such interactions that enables greater flexibility in trajectory modeling: it discovers a set of relational potentials, represented as energy functions, which when minimized reconstruct the original trajectory. NIIP assigns low energy to the subset of trajectories which respect the relational constraints observed. We illustrate that with these representations NIIP displays unique capabilities in test-time. First, it allows trajectory manipulation, such as interchanging interaction types across separately trained models, as well as trajectory forecasting. Additionally, it allows adding external hand-crafted potentials at test-time. Finally, NIIP enables the detection of out-of-distribution samples and anomalies without explicit training. Website: https://energy-based-model.github.io/interaction-potentials.
    摘要 系统组成了互动代理是世界各地的普遍现象,从物理动力学系统到复杂生物网络。为建立能够在实际世界中稳定交互的系统,因此是必要的能够推断系统的准确交互规则。现有的方法通常通过显式地模型演示的演进动力学来发现这些交互。在这种工作中,我们提议使用神经网络可视化力学 potential(NIIP)作为一种alternative方法,可以更多的灵活性在轨迹模型化。NIIP发现一组关系 potential,表示为能量函数,当这些函数的最小值时,可以重建原始轨迹。NIIP将低能量分配给尊重关系约束的子集。我们示出NIIP在测试时显示了独特的能力。首先,它允许轨迹操作,例如在不同的模型上交换交互类型,以及预测轨迹。其次,它允许在测试时添加手动编写的外部潜在力。最后,NIIP可以在测试时探测不同类型的样本和异常现象而无需显式培训。网站:https://energy-based-model.github.io/interaction-potentials。

eess.IV - 2023-10-23

Bitrate Ladder Prediction Methods for Adaptive Video Streaming: A Review and Benchmark

  • paper_url: http://arxiv.org/abs/2310.15163
  • repo_url: None
  • paper_authors: Ahmed Telili, Wassim Hamidouche, Hadi Amirpour, Sid Ahmed Fezza, Luce Morin, Christian Timmerer
  • for: 这篇论文旨在概述不同的方法来预测内容优化的比特率梯度,以提高OTT视频流服务的流畅性。
  • methods: 这篇论文评估了多种方法,包括传统的和学习基于的方法,以预测内容优化的比特率梯度。
  • results: 这篇论文在使用大规模数据集进行了 benchmark 研究,并评估了多种学习基于的方法,以预测内容优化的比特率梯度。
    Abstract HTTP adaptive streaming (HAS) has emerged as a widely adopted approach for over-the-top (OTT) video streaming services, due to its ability to deliver a seamless streaming experience. A key component of HAS is the bitrate ladder, which provides the encoding parameters (e.g., bitrate-resolution pairs) to encode the source video. The representations in the bitrate ladder allow the client's player to dynamically adjust the quality of the video stream based on network conditions by selecting the most appropriate representation from the bitrate ladder. The most straightforward and lowest complexity approach involves using a fixed bitrate ladder for all videos, consisting of pre-determined bitrate-resolution pairs known as one-size-fits-all. Conversely, the most reliable technique relies on intensively encoding all resolutions over a wide range of bitrates to build the convex hull, thereby optimizing the bitrate ladder for each specific video. Several techniques have been proposed to predict content-based ladders without performing a costly exhaustive search encoding. This paper provides a comprehensive review of various methods, including both conventional and learning-based approaches. Furthermore, we conduct a benchmark study focusing exclusively on various learning-based approaches for predicting content-optimized bitrate ladders across multiple codec settings. The considered methods are evaluated on our proposed large-scale dataset, which includes 300 UHD video shots encoded with software and hardware encoders using three state-of-the-art encoders, including AVC/H.264, HEVC/H.265, and VVC/H.266, at various bitrate points. Our analysis provides baseline methods and insights, which will be valuable for future research in the field of bitrate ladder prediction. The source code of the proposed benchmark and the dataset will be made publicly available upon acceptance of the paper.
    摘要 最简单且最低复杂度的方法是使用固定的比特率组,这些组合包括预先决定的比特率和分辨率的对。 然而,这些方法可能无法提供最佳的比特率组,因为每个影片都有不同的内容和质量需求。 因此,许多技术已经被提出供预测内容基于的比特率组,而不需要进行成本高昂的探索性编码。本文提供了各种方法的全面评论,包括传统和学习基于的方法。 此外,我们进行了专注于不同学习基于的方法的benchmark研究,以评估这些方法在多种codec设置下的性能。我们的分析提供了基线方法和对照,这些将是未来在这个领域的研究中的价值。我们将在接下来发布的proposed benchmark和dataset中公开source code。
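
The "convex hull" construction mentioned above can be sketched as follows: gather (bitrate, quality) points from per-resolution encodes, keep the Pareto-optimal ones, and retain the upper convex hull as the content-optimized ladder; the RD numbers below are made up for illustration.

```python
def build_bitrate_ladder(rd_points):
    """rd_points: list of (bitrate_kbps, quality, resolution) tuples gathered from
    exhaustive per-resolution encodes. Returns the points on the upper convex hull,
    i.e. the resolution to use at each bitrate (the content-optimised ladder)."""
    pts = sorted(rd_points)
    # 1) Pareto front: for increasing bitrate keep only strictly increasing quality.
    pareto = []
    for p in pts:
        if not pareto or p[1] > pareto[-1][1]:
            if pareto and p[0] == pareto[-1][0]:
                pareto.pop()                 # same bitrate, better quality -> replace
            pareto.append(p)
    # 2) Upper convex hull over the Pareto points (monotone-chain style).
    hull = []
    for b3, q3, r3 in pareto:
        while len(hull) >= 2:
            (b1, q1, _), (b2, q2, _) = hull[-2], hull[-1]
            if (q2 - q1) * (b3 - b1) <= (q3 - q1) * (b2 - b1):  # middle point below the chord
                hull.pop()
            else:
                break
        hull.append((b3, q3, r3))
    return hull

# Toy RD measurements (bitrate kbps, VMAF, resolution); values are illustrative only.
rd = [(500, 62, "540p"), (1000, 74, "540p"), (2000, 81, "540p"),
      (1000, 70, "720p"), (2000, 84, "720p"), (4000, 90, "720p"),
      (2000, 80, "1080p"), (4000, 92, "1080p"), (8000, 96, "1080p")]
for bitrate, vmaf, res in build_bitrate_ladder(rd):
    print(f"{bitrate:>5} kbps -> {res} (VMAF {vmaf})")
```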

DeepOrientation: convolutional neural network for fringe pattern orientation map estimation

  • paper_url: http://arxiv.org/abs/2310.15209
  • repo_url: https://github.com/mariasi1/deeporientationnetmodel
  • paper_authors: Maria Cywinska, Mikolaj Rogalski, Filip Brzeski, Krzysztof Patorski, Maciej Trusiak
  • for: 本研究旨在提出一种基于卷积神经网络和深度学习的本地弯曲方向图像分割方法,以便在全场光学测量中准确地估计本地弯曲方向图像。
  • methods: 本研究使用了卷积神经网络和深度学习来实现本地弯曲方向图像分割。
  • results: 实验和数值仿真结果表明,提出的 DeepOrientation 方法可以准确地估计本地弯曲方向图像,并且比传统的方法(合并平面适应/梯度法)更加稳定和自动化。
    Abstract Fringe pattern based measurement techniques are the state-of-the-art in full-field optical metrology. They are crucial both in macroscale, e.g., fringe projection profilometry, and microscale, e.g., label-free quantitative phase microscopy. Accurate estimation of the local fringe orientation map can significantly facilitate the measurement process on various ways, e.g., fringe filtering (denoising), fringe pattern boundary padding, fringe skeletoning (contouring/following/tracking), local fringe spatial frequency (fringe period) estimation and fringe pattern phase demodulation. Considering all of that the accurate, robust and preferably automatic estimation of local fringe orientation map is of high importance. In this paper we propose novel numerical solution for local fringe orientation map estimation based on convolutional neural network and deep learning called DeepOrientation. Numerical simulations and experimental results corroborate the effectiveness of the proposed DeepOrientation comparing it with the representative of the classical approach to orientation estimation called combined plane fitting/gradient method. The example proving the effectiveness of DeepOrientation in fringe pattern analysis, which we present in this paper is the application of DeepOrientation for guiding the phase demodulation process in Hilbert spiral transform. In particular, living HeLa cells quantitative phase imaging outcomes verify the method as an important asset in label-free microscopy.
    摘要 基于条纹图案的测量技术是全场光学计量领域的最先进方法,无论在宏观尺度(如条纹投影轮廓术)还是微观尺度(如无标记定量相位显微术)中都至关重要。准确估计局部条纹方向图能够在多方面显著简化测量过程,例如条纹滤波(去噪)、条纹图案边界填充、条纹骨架提取(轮廓提取/跟踪)、局部条纹空间频率(条纹周期)估计以及条纹图案相位解调。因此,准确、稳健并且最好是自动化的局部条纹方向图估计具有重要意义。本文提出了一种基于卷积神经网络与深度学习的局部条纹方向图估计的新型数值方法,称为 DeepOrientation。数值仿真和实验结果验证了 DeepOrientation 的有效性,并与代表性的经典方向估计方法(平面拟合/梯度组合法)进行了比较。本文给出的一个证明 DeepOrientation 在条纹图案分析中有效性的例子,是将其用于引导希尔伯特螺旋变换中的相位解调过程;特别地,活体 HeLa 细胞的定量相位成像结果验证了该方法是无标记显微术中的一项重要工具。
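
As a point of reference, here is a sketch of the classical gradient-based orientation estimation that DeepOrientation is compared against (structure-tensor style averaging of doubled-angle gradient components); the window size and the synthetic fringe pattern are illustrative choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def gradient_orientation_map(img, window=15):
    """Classical gradient-based local orientation estimation (the kind of baseline
    DeepOrientation is compared against): average the doubled-angle gradient
    components over a window, then halve the resulting angle."""
    gy, gx = np.gradient(img.astype(float))
    gxx = uniform_filter(gx * gx, window)
    gyy = uniform_filter(gy * gy, window)
    gxy = uniform_filter(gx * gy, window)
    # The doubled-angle representation prevents opposite gradients from cancelling.
    theta = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy)
    # theta is the local gradient orientation modulo pi; the fringe direction is
    # perpendicular to it (add pi/2 modulo pi if the fringe direction is needed).
    return theta

# Synthetic closed-fringe (Newton-ring-like) pattern as a quick sanity check.
y, x = np.mgrid[-128:128, -128:128]
fringes = np.cos(0.02 * (x ** 2 + y ** 2))
ori = gradient_orientation_map(fringes)
print("orientation map shape:", ori.shape, "range:", float(ori.min()), float(ori.max()))
```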

The AIMI Initiative: AI-Generated Annotations for Imaging Data Commons Collections

  • paper_url: http://arxiv.org/abs/2310.14897
  • repo_url: None
  • paper_authors: Gowtham Krishnan Murugesan, Diana McCrumb, Mariam Aboian, Tej Verma, Rahul Soni, Fatima Memon, Jeff Van Oss
  • for: This paper aims to contribute to the research and development of advanced imaging tools and algorithms by providing AI-generated annotations for 11 medical imaging collections from the Image Data Commons (IDC).
  • methods: The authors used both publicly available and novel AI algorithms, along with expert annotations, to create the AI-generated annotations. They also reviewed and corrected a portion of the AI annotations with a radiologist to assess the AI models’ performances.
  • results: The study provided AI-generated annotations for 11 medical imaging collections from the IDC, covering modalities such as CT, MRI, and PET, and focusing on the chest, breast, kidneys, prostate, and liver. The study demonstrated the potential of expansive publicly accessible datasets and AI for increasing accessibility and reliability in cancer imaging research and development.
    Abstract The Image Data Commons (IDC) contains publicly available cancer radiology datasets that could be pertinent to the research and development of advanced imaging tools and algorithms. However, the full extent of its research capabilities is limited by the fact that these datasets have few, if any, annotations associated with them. Through this study with the AI in Medical Imaging (AIMI) initiative a significant contribution, in the form of AI-generated annotations, was made to provide 11 distinct medical imaging collections from the IDC with annotations. These collections included computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) imaging modalities. The main focus of these annotations were in the chest, breast, kidneys, prostate, and liver. Both publicly available and novel AI algorithms were adopted and further developed using open-sourced data coupled with expert annotations to create the AI-generated annotations. A portion of the AI annotations were reviewed and corrected by a radiologist to assess the AI models' performances. Both the AI's and the radiologist's annotations conformed to DICOM standards for seamless integration into the IDC collections as third-party analyses. This study further cements the well-documented notion that expansive publicly accessible datasets, in the field of cancer imaging, coupled with AI will aid in increased accessibility as well as reliability for further research and development.
    摘要 影像数据共享平台(Image Data Commons,IDC)包含公开可用的癌症放射影像数据集,这些数据集可用于先进影像工具与算法的研究和开发。然而,由于这些数据集几乎没有相应的标注,其研究潜力受到了限制。通过本研究与医学影像人工智能(AI in Medical Imaging,AIMI)计划的合作,我们为 IDC 中的 11 个医学影像数据集提供了由 AI 生成的标注,这是一项重要的贡献。这些数据集涵盖计算机断层扫描(CT)、磁共振成像(MRI)和正电子发射断层扫描(PET)等成像方式,标注主要集中在胸部、乳腺、肾脏、前列腺和肝脏。我们采用并进一步开发了公开可用的以及新的 AI 算法,结合开源数据与专家标注来生成这些 AI 标注。其中一部分 AI 标注由放射科医生审阅并修正,以评估 AI 模型的性能。AI 与放射科医生的标注均符合 DICOM 标准,可作为第三方分析无缝集成到 IDC 数据集中。这项研究进一步印证了一个已被广泛认可的观点:在癌症影像领域,大规模公开可用的数据集与 AI 相结合,将提高后续研究与开发的可及性和可靠性。

Joint Non-Linear MRI Inversion with Diffusion Priors

  • paper_url: http://arxiv.org/abs/2310.14842
  • repo_url: None
  • paper_authors: Moritz Erlacher, Martin Zach
  • for: 加速MRI扫描过程,提高图像质量
  • methods: 使用数据驱动重构方法,并jointly estimate the sensitivity maps with the image
  • results: 实现了高效、高精度的MRI扫描,并且计算了高精度的扫描图像
    Abstract Magnetic resonance imaging (MRI) is a potent diagnostic tool, but suffers from long examination times. To accelerate the process, modern MRI machines typically utilize multiple coils that acquire sub-sampled data in parallel. Data-driven reconstruction approaches, in particular diffusion models, recently achieved remarkable success in reconstructing these data, but typically rely on estimating the coil sensitivities in an off-line step. This suffers from potential movement and misalignment artifacts and limits the application to Cartesian sampling trajectories. To obviate the need for off-line sensitivity estimation, we propose to jointly estimate the sensitivity maps with the image. In particular, we utilize a diffusion model -- trained on magnitude images only -- to generate high-fidelity images while imposing spatial smoothness of the sensitivity maps in the reverse diffusion. The proposed approach demonstrates consistent qualitative and quantitative performance across different sub-sampling patterns. In addition, experiments indicate a good fit of the estimated coil sensitivities.
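
A minimal sketch of the multi-coil forward/adjoint operators and a single data-consistency gradient step is shown below; in the paper this step would alternate with the diffusion prior and with the joint sensitivity-map update, and the toy sensitivities, sampling mask, and step size here are assumptions.

```python
import numpy as np

def forward(x, sens, mask):
    """Multi-coil MRI forward operator: coil weighting, 2-D FFT, k-space undersampling.
    x: (H, W) complex image, sens: (C, H, W) coil sensitivities, mask: (H, W) binary."""
    coil_images = sens * x[None]                                   # S_c * x
    kspace = np.fft.fftshift(
        np.fft.fft2(np.fft.ifftshift(coil_images, axes=(-2, -1)), norm="ortho"),
        axes=(-2, -1))
    return mask[None] * kspace

def adjoint(y, sens, mask):
    """Adjoint operator: zero-fill, inverse FFT, coil combination with conj(S_c)."""
    coil_images = np.fft.fftshift(
        np.fft.ifft2(np.fft.ifftshift(mask[None] * y, axes=(-2, -1)), norm="ortho"),
        axes=(-2, -1))
    return np.sum(np.conj(sens) * coil_images, axis=0)

# Toy setup: 4 smoothly varying (unnormalised) sensitivities and a Cartesian mask.
H = W = 64
yy, xx = np.mgrid[0:H, 0:W] / H
x_true = (np.abs(xx - 0.5) < 0.2) * (np.abs(yy - 0.5) < 0.3) * 1.0 + 0j
sens = np.stack([np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / 0.3)
                 for cx, cy in [(0, 0), (0, 1), (1, 0), (1, 1)]]).astype(complex)
mask = np.zeros((H, W))
mask[:, ::4] = 1
mask[:, W // 2 - 4: W // 2 + 4] = 1            # fully sampled centre lines

y = forward(x_true, sens, mask)
# One gradient (data-consistency) step of ||A x - y||^2 from a zero-filled start;
# in the paper this alternates with the diffusion prior and sensitivity updates.
x = adjoint(y, sens, mask)
grad = adjoint(forward(x, sens, mask) - y, sens, mask)
x = x - 0.5 * grad
print("reconstruction error after one step:", float(np.linalg.norm(x - x_true)))
```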

First realization of macroscopic Fourier ptychography for hundred-meter distance sub-diffraction imaging

  • paper_url: http://arxiv.org/abs/2310.14515
  • repo_url: None
  • paper_authors: Qi Zhang, Yuran Lu, Yinghui Guo, Yingjie Shang, Mingbo Pu, Yulong Fan, Rui Zhou, Xiaoyin Li, Fei Zhang, Mingfeng Xu, Xiangang Luo
  • for: 提高尺度为10米的远程超分色限 imaging
  • methods: 使用弯光函数优化和目标图像联合优化方法,从而实现补做镜像和同时估计折射函数
  • results: 实验中使用这种方法可以提高最大投影距离至12米、90米和170米,同时提高最大合成孔径至200毫米,相比之前的研究result有一个数量级的提高,并且解决了FOV limitation问题,从而开启了macroscopic FP的新阶段发展。
    Abstract Fourier ptychography (FP) imaging, drawing on the idea of synthetic aperture, has been demonstrated as a potential approach for remote sub-diffraction-limited imaging. Nevertheless, the farthest imaging distance is still limited to around 10 m even though there has been a significant improvement in macroscopic FP. The most severe issue in increasing the imaging distance is the FoV limitation caused by the far-field diffraction condition. Here, we propose to modify the Fourier far-field condition for rough reflective objects, aiming to overcome the small-FoV limitation by using a divergent beam to illuminate objects. A joint optimization of the pupil function and the target image is utilized to attain an aberration-free image while estimating the pupil function simultaneously. Benefiting from the optimized reconstruction algorithm, which effectively expands the camera's effective aperture, we experimentally implement several FP systems suited for imaging distances of 12 m, 90 m, and 170 m with a maximum synthetic aperture of 200 mm. The maximum imaging distance and synthetic aperture are thus improved by more than one order of magnitude over the state-of-the-art works, with a fourfold improvement in resolution. Our findings demonstrate significant potential for advancing the field of macroscopic FP, propelling it into a new stage of development.
    摘要 借鉴合成孔径思想的傅里叶叠层成像(Fourier ptychography,FP)已被证明是实现远距离亚衍射极限成像的一种潜在途径。然而,尽管宏观 FP 已取得显著进展,其最远成像距离仍被限制在 10 米左右。制约成像距离提升的最严重问题,是衍射远场条件造成的视场(FoV)限制。本文提出针对粗糙反射物体修改傅里叶远场条件,通过使用发散光束照明物体来克服小视场的限制。我们对光瞳函数和目标图像进行联合优化,在估计光瞳函数的同时获得无像差的图像。得益于有效扩展相机等效孔径的优化重建算法,我们实验搭建了适用于 12 米、90 米和 170 米成像距离的多套 FP 系统,最大合成孔径达 200 毫米。与现有最先进的工作相比,最大成像距离和合成孔径提升了一个数量级以上,分辨率提升了四倍。我们的结果表明宏观 FP 具有巨大的发展潜力,并将其推向一个新的发展阶段。

eess.SP - 2023-10-23

Finite-Time Adaptive Fuzzy Tracking Control for Nonlinear State Constrained Pure-Feedback Systems

  • paper_url: http://arxiv.org/abs/2310.15407
  • repo_url: None
  • paper_authors: Ju Wu, Tong Wang, Min Ma
  • for: 这个论文研究了一类具有全状态约束的纯反馈系统的可靠追踪控制问题。
  • methods: 该论文使用了 Mean-Value Theorem 将纯反馈非线性系统转化为约束式反馈系统,并使用了 finite-time-stable like function 和状态转换来 garantuee 输出追踪误差在固定的 finite interval 内 converge 到预定的集合。
  • results: 该论文使用了 integral Barrier Lyapunov functions 来保证状态变量在预定的约束内 remain 在 feasibility check 中,并使用了尼采度系统来近似未知非线性函数。 最后,论文提供了两个示例来证明提出的控制策略的有效性。
    Abstract This paper investigates the finite-time adaptive fuzzy tracking control problem for a class of pure-feedback system with full-state constraints. With the help of Mean-Value Theorem, the pure-feedback nonlinear system is transformed into strict-feedback case. By employing finite-time-stable like function and state transformation for output tracking error, the output tracking error converges to a predefined set in a fixed finite interval. To tackle the problem of state constraints, integral Barrier Lyapunov functions are utilized to guarantee that the state variables remain within the prescribed constraints with feasibility check. Fuzzy logic systems are utilized to approximate the unknown nonlinear functions. In addition, all the signals in the closed-loop system are guaranteed to be semi-global ultimately uniformly bounded. Finally, two simulation examples are given to show the effectiveness of the proposed control strategy.
    摘要 本文研究一种 finite-time 适应杂环境控制问题,对于一类具有全状态约束的纯反馈系统。通过 Mean-Value Theorem,我们将纯反馈非线性系统转换为约束反馈系统。通过使用 finite-time 稳定类似函数和状态变换来控制输出追踪错误,输出追踪错误在固定时间内 converge 到预定集。为解决状态约束问题,我们使用 integral Barrier Lyapunov functions 确保状态变量在预定的约束范围内停止。此外,我们还使用混杂逻辑系统来近似未知非线性函数。最后,我们证明所有系统信号在关闭环Loop 中都是半全球uniformly bounded。为证明效果,我们在文中提供了两个示例。

Adaptive Fuzzy Tracking Control for Nonlinear State Constrained Pure-Feedback Systems With Input Delay via Dynamic Surface Technique

  • paper_url: http://arxiv.org/abs/2310.16060
  • repo_url: None
  • paper_authors: Ju Wu, Tong Wang
  • for: 本文提出了一种适用于混合反馈系统的可适应跟踪控制方案,以解决输入延迟和全状约束的问题。
  • methods: 本文使用了 Mean Value Theorem 将混合反馈系统转化为纯反馈系统,并使用了 barrier Lyapunov functions 保证系统所有状态都处于预定的集中。 而且,通过引入 Pade 近似方法和相应的中间变量,消除了输入延迟对输出追踪性能的影响。
  • results: 本文通过稳定分析,证明了所有信号在封闭Loop系统中都是半全球最终有界bounded,并且可以通过选择合适的参数,使追踪错误在任意小的邻域内减少到起始点附近。最后,通过数值示例,证明了提出的方法的有效性。
    Abstract This brief constructs the adaptive backstepping control scheme for a class of pure-feedback systems with input delay and full state constraints. With the help of Mean Value Theorem, the pure-feedback system is transformed into strict-feedback one. Barrier Lyapunov functions are employed to guarantee all of the states remain constrained within predefined sets. By introducing the Pade approximation method and corresponding intermediate, the impact generated by input delay on the output tracking performance of the system can be eliminated. Furthermore, a low-pass filter driven by a newly-defined control input, is employed to generate the actual control input, which facilitates the design of backstepping control. To approximate the unknown functions with a desired level of accuracy, the fuzzy logic systems (FLSs) are utilized by choosing appropriate fuzzy rules, logics and so on. The minimal learning parameter (MLP) technique is employed to decrease the number of nodes and parameters in FLSs, and dynamic surface control (DSC) technique is leveraged to avoid so-called "explosion of complexity". Moreover, smooth robust compensators are introduced to circumvent the influences of external disturbance and approximation errors. By stability analysis, it is proved that all of signals in the closed-loop system are semi-globally ultimately uniform bounded, and the tracking error can be within a arbitrary small neighbor of origin via selecting appropriate parameters of controllers. Finally, the results of numerical illustration are provided to demonstrate the effectiveness of the designed method.
    摘要 这个文章构建了一种适应式后夹控制方案 для一类具有输入延迟和全状约束的纯反馈系统。通过意义值定理,这种纯反馈系统被转换为紧急反馈系统。使用障碍函数来保证所有状态都尽可能地保持在预定的集中。通过引入Pade方法和相应的中间变量,可以消除输入延迟对输出追踪性的影响。此外,一个低通滤波器,被驱动by一个新定义的控制输入,被用来生成实际的控制输入,以便实现后夹控制。通过使用多值逻辑系统(FLS),可以 aproximate unknown functions with a desired level of accuracy。使用最小学习参数(MLP)技术可以减少FLS的节点和参数数量,并使用动态表面控制(DSC)技术来避免“爆炸性”问题。此外,使用平滑稳定补偿器可以避免外部干扰和 aproximation error的影响。通过稳定分析,证明所有在关闭循环系统中的信号都是 semi-globally ultimately uniform bounded,并且追踪错误可以在一个任意小的邻域内减少到起始点。最后,文章提供了数据示例来证明设计的方法的有效性。

Analytical Performance Bounds for Radio Map Estimation

  • paper_url: http://arxiv.org/abs/2310.15106
  • repo_url: None
  • paper_authors: Daniel Romero, Tien Ngoc Ha, Raju Shrestha, Massimo Franceschetti
  • for: 这篇论文主要targetsRadio Map Estimation (RME) problem, aiming to provide a radiofrequency metric at every location of a geographical region of interest by relying on measurements acquired at multiple positions.
  • methods: 这篇论文使用了一些已知的估计器,并对其性能进行了分析。同时,它还提出了一些新的Error bounds和Complexity measures to quantify the performance of these estimators.
  • results: 研究发现, Error bounds和Complexity measures可以roughly proportional to the proximity coefficient, which depends on the transmitted power and the distance from the transmitters to the mapped region. Simple numerical experiments verify the tightness of the obtained bounds.
    Abstract Radio map estimation (RME) aims at providing a radiofrequency metric, such as the received power strength, at every location of a geographical region of interest by relying on measurements acquired at multiple positions. Although a large number of estimators have been proposed so far, their performance has been analyzed mostly on simulated data. The theoretical aspects of the RME problem as well as performance bounds remain an open problem. This paper takes a step towards filling this gap by means of a theoretical analysis of the RME problem in a free-space propagation environment. First, the complexity of the estimation problem is quantified by means of upper bounds on the spatial variability of radio maps. Second, error bounds are derived for zeroth-order and first-order interpolation estimators. The proximity coefficient, which depends proportionally on the transmitted power and inversely proportionally on the cube of the distance from the transmitters to the mapped region, is proposed to quantify the complexity of the RME problem. One of the main findings is that the error of the considered estimators is roughly proportional to this proximity coefficient. Simple numerical experiments verify the tightness of the obtained bounds.
    摘要 无线电地图估计(RME)旨在依靠在多个位置获取的测量,为感兴趣地理区域中的每个位置提供某种射频指标(例如接收功率强度)。尽管迄今已提出了大量估计器,但其性能主要是在仿真数据上进行分析的,RME 问题的理论层面以及性能界仍是一个悬而未决的问题。本文通过在自由空间传播环境下对 RME 问题进行理论分析,朝着填补这一空白迈出了一步。首先,通过对无线电地图空间变化性的上界来量化估计问题的复杂度;其次,推导了零阶和一阶插值估计器的误差界。本文提出了邻近系数(proximity coefficient)来刻画 RME 问题的复杂度,它与发射功率成正比,与发射机到被测区域距离的三次方成反比。主要发现之一是,所考虑的估计器的误差大致与该邻近系数成正比。简单的数值实验验证了所得误差界的紧致性。
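
The zeroth-order (nearest-neighbour) and first-order (linear interpolation) estimators analyzed in the paper can be sketched directly; the path-loss model, transmitter layout, and noise level below are illustrative stand-ins rather than the paper's setup.

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)

# Generic free-space-like ground truth: received power (dB) from two transmitters
# located outside the mapped region (constants and units are purely illustrative).
tx = np.array([[-50.0, 20.0], [-60.0, 80.0]])          # transmitter positions (m)
p_tx_db = 30.0

def true_map(points):
    d = np.linalg.norm(points[:, None, :] - tx[None], axis=-1)
    return 10 * np.log10(np.sum(10 ** ((p_tx_db - 20 * np.log10(d)) / 10), axis=1))

# Measurements at random sensor locations inside a 100 m x 100 m region.
sensors = rng.uniform(0, 100, size=(60, 2))
meas = true_map(sensors) + rng.normal(0, 1.0, size=60)   # 1 dB measurement noise

# Dense evaluation grid.
gx, gy = np.meshgrid(np.linspace(0, 100, 50), np.linspace(0, 100, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])

zeroth = griddata(sensors, meas, grid, method="nearest")   # zeroth-order estimator
first = griddata(sensors, meas, grid, method="linear")     # first-order estimator
first = np.where(np.isnan(first), zeroth, first)           # fill outside the convex hull

truth = true_map(grid)
for name, est in [("nearest", zeroth), ("linear", first)]:
    print(f"{name}: RMSE = {np.sqrt(np.mean((est - truth) ** 2)):.2f} dB")
```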

Interference Management by Harnessing Multi-Domain Resources in Spectrum-Sharing Aided Satellite-Ground Integrated Networks

  • paper_url: http://arxiv.org/abs/2310.15011
  • repo_url: None
  • paper_authors: Xiaojin Ding, Yue Lei, Yulong Zou, Gengxin Zhang, Lajos Hanzo
  • for: 这项研究的目的是设计一个基于多域资源的干扰管理方案,以提高spectrum-sharing satellite-ground integrated network的性能。
  • methods: 该方案利用了joint multi-domain resource aided interference management(JMDR-IM)技术,包括 beam-domain和power-domain资源的共享,以及特制的beam shut-off和switching based beam scheduling,以及long short-term memory based joint autoregressive moving average assisted deep Q network aided power scheduling。
  • results: 研究人员通过分析覆盖重叠区域的覆盖相互关系,并利用多域资源进行干扰管理,从而提高了throughput和降低了outage probability(OP)。
    Abstract A spectrum-sharing satellite-ground integrated network is conceived, consisting of a pair of non-geostationary orbit (NGSO) constellations and multiple terrestrial base stations, which impose the co-frequency interference (CFI) on each other. The CFI may increase upon increasing the number of satellites. To manage the potentially severe interference, we propose to rely on joint multi-domain resource aided interference management (JMDR-IM). Specifically, the coverage overlap of the constellations considered is analyzed. Then, multi-domain resources - including both the beam-domain and power-domain - are jointly utilized for managing the CFI in an overlapping coverage region. This joint resource utilization is performed by relying on our specifically designed beam-shut-off and switching based beam scheduling, as well as on long short-term memory based joint autoregressive moving average assisted deep Q network aided power scheduling. Moreover, the outage probability (OP) of the proposed JMDR-IM scheme is derived, and the asymptotic analysis of the OP is also provided. Our performance evaluations demonstrate the superiority of the proposed JMDR-IM scheme in terms of its increased throughput and reduced OP.

GDOP Based BS Selection for Positioning in mmWave 5G NR Networks

  • paper_url: http://arxiv.org/abs/2310.14857
  • repo_url: None
  • paper_authors: A. Indika Perera, K. B. Shashika Manosha, Nandana Rajatheva, Matti Latva-aho
  • for: Improving the positioning accuracy of user equipment (UE) by exploiting fifth-generation (5G) mobile communication technology and higher base station (BS) densification.
  • methods: Exploiting the geometric distribution of the base stations (BSs) to improve UE positioning accuracy from time difference of arrival (TDOA) measurements.
  • results: A BS selection algorithm based on the geometric distribution of the BSs improves UE positioning accuracy in mixed line-of-sight (LOS) and non-line-of-sight (NLOS) environments while requiring fewer radio resources.
    Abstract The fifth generation (5G) of mobile communication, supported by millimetre-wave (mmWave) technology and higher base station (BS) densification, facilitates enhanced user equipment (UE) positioning. Therefore, the 5G cellular system is designed with many positioning measurements and special positioning reference signals with a multitude of configurations for a variety of use cases, targeting stringent positioning accuracies. One of the major factors that the accuracy of a particular position estimate depends on is the geometry of the nodes in the system, which can be measured with the geometric dilution of precision (GDOP). Hence, in this paper, we investigate UE positioning accuracy improvement based on time difference of arrival (TDOA) measurements, exploiting the geometric distribution of BSs in a mixed LOS and NLOS environment. We propose a BS selection algorithm for UE positioning based on the GDOP of the BSs participating in the positioning process. Simulations are conducted for indoor and outdoor scenarios that use antenna arrays with beam-based mmWave NR communication. Results demonstrate that the proposed BS selection can achieve higher positioning accuracy with fewer radio resources compared to other BS selection methods.
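As a rough illustration of GDOP-based selection, the sketch below builds the standard TDOA geometry matrix from differences of unit vectors towards a reference BS and picks, by exhaustive search, the candidate subset with the smallest GDOP. The 2-D coordinates, subset size, and the use of the (in practice only approximately known) UE position are illustrative assumptions, not details from the paper.

```python
import numpy as np
from itertools import combinations

def tdoa_gdop(ue, bs_subset):
    """GDOP of a TDOA fix: sqrt(trace((H^T H)^-1)), with H built from
    differences of unit vectors relative to the first BS in the subset."""
    units = [(ue - bs) / np.linalg.norm(ue - bs) for bs in bs_subset]
    H = np.array([u - units[0] for u in units[1:]])   # one row per TDOA pair
    try:
        return float(np.sqrt(np.trace(np.linalg.inv(H.T @ H))))
    except np.linalg.LinAlgError:
        return np.inf                                  # degenerate geometry

def select_bs(ue, bs_all, n_select):
    """Pick the subset of n_select BSs with the smallest GDOP at the UE."""
    best = min(combinations(range(len(bs_all)), n_select),
               key=lambda idx: tdoa_gdop(ue, bs_all[list(idx)]))
    return list(best), tdoa_gdop(ue, bs_all[list(best)])

# Illustrative 2-D deployment (coordinates in metres) and approximate UE position.
bs_all = np.array([[0, 0], [100, 0], [0, 100], [100, 100], [50, 120], [120, 50]], float)
ue = np.array([40.0, 60.0])
idx, gdop = select_bs(ue, bs_all, n_select=4)
print(f"selected BS indices: {idx}, GDOP = {gdop:.2f}")
```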

An introduction to radar Automatic Target Recognition (ATR) technology in ground-based radar systems

  • paper_url: http://arxiv.org/abs/2310.14769
  • repo_url: None
  • paper_authors: Jiangkun Gong, Jun Yan, Deyong Kong, Deren Li
  • for: This paper examines Automatic Target Recognition (ATR) technology in ground-based radar systems.
  • methods: It reviews the historical development of ATR, categorizes ATR methods according to different scattering regions, and discusses how ATR solutions are incorporated into radar systems.
  • results: The paper shows that incorporating ATR technology extends radar detection ranges and enhances tracking capabilities, leading to superior situational awareness. It further identifies three pressing applications that urgently require ATR: detecting stealth aircraft, countering small drones, and implementing anti-jamming measures.
    Abstract This paper presents a brief examination of Automatic Target Recognition (ATR) technology within ground-based radar systems. It offers a lucid comprehension of the ATR concept, delves into its historical milestones, and categorizes ATR methods according to different scattering regions. By incorporating ATR solutions into radar systems, this study demonstrates the expansion of radar detection ranges and the enhancement of tracking capabilities, leading to superior situational awareness. Drawing insights from the Russo-Ukrainian War, the paper highlights three pressing radar applications that urgently necessitate ATR technology: detecting stealth aircraft, countering small drones, and implementing anti-jamming measures. Anticipating the next wave of radar ATR research, the study predicts a surge in cognitive radar and machine learning (ML)-driven algorithms. These emerging methodologies aspire to confront challenges associated with system adaptation, real-time recognition, and environmental adaptability. Ultimately, ATR stands poised to revolutionize conventional radar systems, ushering in an era of 4D sensing capabilities.

Optimizing IoT-Based Asset and Utilization Tracking: Efficient Activity Classification with MiniRocket on Resource-Constrained Devices

  • paper_url: http://arxiv.org/abs/2310.14758
  • repo_url: None
  • paper_authors: Marco Giordano, Silvano Cortesi, Michele Crabolu, Lavinia Pedrollo, Giovanni Bellusci, Tommaso Bendinelli, Engin Türetken, Andrea Dunbar, Michele Magno
  • for: This paper presents an effective solution for retrofitting construction power tools with low-power IoT devices to enable accurate activity classification.
  • methods: The paper employs the recently released MiniRocket algorithm to achieve accurate time-series classification while keeping power consumption low.
  • results: Experimental results show that the proposed solution achieves 96.9% accuracy on a resource-constrained IoT device, with an average current consumption below 15 μW, enabling a battery life of 3 to 9 years.
    Abstract This paper introduces an effective solution for retrofitting construction power tools with low-power IoT to enable accurate activity classification. We address the challenge of distinguishing between when a power tool is being moved and when it is actually being used. To achieve high classification accuracy while preserving low power consumption, a newly released algorithm called MiniRocket was employed. Known for its accuracy, scalability, and fast training for time-series classification, it is proposed in this paper as a TinyML algorithm for inference on resource-constrained IoT devices. The paper demonstrates the portability and performance of MiniRocket on a resource-constrained, ultra-low-power sensor node for floating-point and fixed-point arithmetic, matching the floating-point accuracy to within 1%. The hyperparameters of the algorithm have been optimized for the task at hand to find a Pareto point that balances memory usage, accuracy, and energy consumption. For the classification problem, we rely on an accelerometer as the sole sensor source and BLE for data transmission. Extensive real-world construction data, using 16 different power tools, were collected, labeled, and used to validate the algorithm's performance directly embedded in the IoT device. Experimental results demonstrate that the proposed solution achieves an accuracy of 96.9% in distinguishing between real usage status and other motion statuses while consuming only 7 kB of flash and 3 kB of RAM. The final application exhibits an average current consumption of less than 15 μW for the whole system, resulting in battery life performance ranging from 3 to 9 years depending on the battery capacity (250-500 mAh) and the number of power tool usage hours (100-1500 h).
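A rough sketch of the training-side pipeline is given below, assuming the sktime implementation of MiniRocket paired with a ridge classifier (the pairing used in the original MiniRocket reference code). The sampling rate, window length, label set, and synthetic data are placeholders, the import path may differ across sktime releases, and the paper's fixed-point on-device inference is not reproduced.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
# Assumption: sktime exposes MiniRocket under this path; the location has moved
# between releases, so adjust the import to the installed version.
from sktime.transformations.panel.rocket import MiniRocket

FS, WIN = 100, 200          # sampling rate (Hz) and 2 s window -- placeholders

def make_windows(acc_xyz, labels):
    """Turn a (n_samples, 3) accelerometer stream into univariate panels of
    acceleration magnitude with shape (n_windows, 1, WIN), one label each."""
    mag = np.linalg.norm(acc_xyz, axis=1)
    n = (len(mag) // WIN) * WIN
    X = mag[:n].reshape(-1, 1, WIN)
    y = labels[:n].reshape(-1, WIN)[:, 0]
    return X, y

# Synthetic stand-in for the labelled power-tool recordings used in the paper.
rng = np.random.default_rng(0)
acc = rng.standard_normal((60_000, 3)).astype(np.float32)
lab = rng.integers(0, 2, size=60_000)       # 0 = "moved", 1 = "in use"
X, y = make_windows(acc, lab)

# MiniRocket features followed by a ridge classifier.
mrocket = MiniRocket()                      # random convolutional kernels
mrocket.fit(X)
features = mrocket.transform(X)             # fixed-length feature vector per window
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
clf.fit(features, y)
print("training accuracy:", clf.score(features, y))
```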

Time-Domain Channel Estimation for Extremely Large MIMO THz Communications with Beam Squint

  • paper_url: http://arxiv.org/abs/2310.14745
  • repo_url: None
  • paper_authors: Evangelos Vlachos, Aryan Kaushik, Yonina C. Eldar, George C. Alexandropoulos
  • for: This work studies extremely large (XL) multiple-input multiple-output (MIMO) channel estimation in the Terahertz (THz) band, where propagation delays across the entire array aperture cause frequency selectivity, a problem known as beam squint.
  • methods: A novel time-domain channel estimation formulation is proposed for single-carrier (SC) modulation, which favors THz transmission, and the beam-squint effect is incorporated into a sparse vector recovery problem. An alternating minimization approach is used to jointly track the beam squint and the sparse MIMO channel.
  • results: Numerical results show that the proposed SC technique outperforms the conventional approach as well as state-of-the-art multi-carrier (MC) methods.
    Abstract In this paper, we study the problem of extremely large (XL) multiple-input multiple-output (MIMO) channel estimation in the Terahertz (THz) frequency band, considering the presence of propagation delays across the entire array apertures, which leads to frequency selectivity, a problem known as beam squint. Multi-carrier transmission schemes which are usually deployed to address this problem, suffer from high peak-to-average power ratio, which is specifically dominant in THz communications where low transmit power is realized. Diverging from the usual approach, we devise a novel channel estimation problem formulation in the time domain for single-carrier (SC) modulation, which favors transmissions in THz, and incorporate the beam-squint effect in a sparse vector recovery problem that is solved via sparse optimization tools. In particular, the beam squint and the sparse MIMO channel are jointly tracked by using an alternating minimization approach that decomposes the two estimation problems. The presented performance evaluation results validate that the proposed SC technique exhibits superior performance than the conventional one as well as than state-of-the-art multi-carrier approaches.
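Schematically, the joint estimation can be written as a sparse recovery problem whose dictionary depends on the beam-squint parameters, handled by alternating between the two unknowns. The notation below is generic and only meant to illustrate the decomposition; the paper's exact cost function and solver may differ:

$$
\min_{\mathbf{x},\,\boldsymbol{\phi}} \;\; \big\| \mathbf{y} - \mathbf{A}(\boldsymbol{\phi})\,\mathbf{x} \big\|_2^2 + \mu \,\|\mathbf{x}\|_1 ,
$$

where x is the sparse channel vector, φ collects the beam-squint (delay-across-aperture) parameters, and the alternating minimization updates x with φ fixed via a sparse-optimization step, then refines φ with x fixed.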

Design and Implementation of an RSSI-Based Bluetooth Low Energy Indoor Localization System

  • paper_url: http://arxiv.org/abs/2310.14704
  • repo_url: None
  • paper_authors: Silvano Cortesi, Marc Dreher, Michele Magno
  • for: This paper presents a low-cost and energy-efficient indoor localization system using Bluetooth Low Energy (BLE) technology.
  • methods: The system uses a received signal strength indicator (RSSI)-based approach and a low-complexity weighted k-Nearest Neighbors algorithm to process raw RSSI data from connection-less iBeacons.
  • results: The experimental results show an average error of only 0.72 m in realistic conditions, and the overall power consumption of the fixed beacon is only 50 uA at 3 V, leading to a long-lasting solution of over one year with a 500 mAh coin battery.
    Abstract Indoor Positioning System (IPS) is a crucial technology that enables medical staff and hospital management to accurately locate and track persons or assets inside medical buildings. Among other technologies, Bluetooth Low Energy (BLE) can be exploited to achieve an energy-efficient and low-cost solution. This work presents the design and implementation of a received signal strength indicator (RSSI)-based indoor localization system. The paper shows the implementation of a low-complexity weighted k-Nearest Neighbors algorithm that processes raw RSSI data from connection-less iBeacons. The designed hardware and firmware are implemented around the low-power and low-cost nRF52832 from Nordic Semiconductor. Experimental evaluation with real-time data processing has been carried out and is presented for a 7.2 m by 7.2 m room with furniture and 5 beacon nodes. The experimental results show an average error of only 0.72 m in realistic conditions. Finally, the overall power consumption of the fixed beacon with a periodic advertisement interval of 100 ms is only 50 μA at 3 V, which leads to a long-lasting solution of over one year with a 500 mAh coin battery.
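The position estimate itself comes from a low-complexity weighted k-Nearest Neighbors over raw RSSI vectors. A minimal sketch of that idea is shown below; the fingerprint database, the inverse-distance weighting rule, and k = 3 are illustrative assumptions rather than the exact parameters used in the nRF52832 firmware.

```python
import numpy as np

def wknn_locate(rssi, fingerprints, positions, k=3, eps=1e-6):
    """Weighted k-NN position estimate from a raw RSSI vector.

    rssi         : (n_beacons,) measured RSSI in dBm
    fingerprints : (n_ref, n_beacons) RSSI recorded at known reference points
    positions    : (n_ref, 2) coordinates of those reference points (metres)
    """
    d = np.linalg.norm(fingerprints - rssi, axis=1)   # distance in RSSI space
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + eps)                      # inverse-distance weights
    return (w[:, None] * positions[nearest]).sum(0) / w.sum()

# Tiny illustrative database with 5 beacons and 4 reference points.
fingerprints = np.array([[-60, -72, -80, -65, -75],
                         [-70, -60, -75, -78, -68],
                         [-80, -74, -58, -70, -66],
                         [-66, -70, -68, -59, -73]], float)
positions = np.array([[1.0, 1.0], [1.0, 6.0], [6.0, 6.0], [6.0, 1.0]])

print(wknn_locate(np.array([-62, -71, -79, -64, -74], float), fingerprints, positions))
```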

Comments on “Graphon Signal Processing”

  • paper_url: http://arxiv.org/abs/2310.14683
  • repo_url: None
  • paper_authors: Xingchao Jian, Feng Ji, Wee Peng Tay
  • for: This correspondence points out a technical error in Proposition 4 of the commented paper, which invalidates the proofs of Lemma 3, Theorem 1, Theorem 3, Proposition 2, and Theorem 4 in that paper.
  • methods: The authors provide counterexamples to Proposition 4 and identify where the flaw in its proof lies. Furthermore, they provide numerical evidence suggesting that Lemma 3, Theorem 1, and Proposition 2 are likely to be false.
  • results: The authors propose an amendment to the statement of Theorem 4 using convergence in operator norm and prove it rigorously. In addition, they provide a construction that guarantees convergence in the sense of Proposition 4.
    Abstract This correspondence points out a technical error in Proposition 4 of the paper [1]. Because of this error, the proofs of Lemma 3, Theorem 1, Theorem 3, Proposition 2, and Theorem 4 in that paper are no longer valid. We provide counterexamples to Proposition 4 and discuss where the flaw in its proof lies. We also provide numerical evidence indicating that Lemma 3, Theorem 1, and Proposition 2 are likely to be false. Since the proof of Theorem 4 depends on the validity of Proposition 4, we propose an amendment to the statement of Theorem 4 of the paper using convergence in operator norm and prove this rigorously. In addition, we also provide a construction that guarantees convergence in the sense of Proposition 4.

Distributed MIMO for 6G sub-Networks in the Unlicensed Spectrum

  • paper_url: http://arxiv.org/abs/2310.14591
  • repo_url: None
  • paper_authors: Mohamed Elwekeil, Lorenzo Galati Giordano, Paolo Baracca, Stefano Buzzi
  • for: This work targets sixth-generation (6G) sub-networks, which are expected to meet hyper reliable low latency communication (HRLLC) requirements.
  • methods: Using listen before talk (LBT), the feasibility of operating multiple sub-networks in the service area over the 6 GHz unlicensed spectrum is assessed, and distributed multiple-input multiple-output (MIMO) is considered, where the antennas available in each sub-network are distributed over a number of access points (APs).
  • results: Different distributed MIMO configurations are compared against centralized MIMO, analyzing their impact on HRLLC performance.
    Abstract In this paper, we consider the sixth generation (6G) sub-networks, where hyper reliable low latency communications (HRLLC) requirements are expected to be met. We focus on a scenario where multiple sub-networks are active in the service area and assess the feasibility of using the 6 GHz unlicensed spectrum to operate such deployment, evaluating the impact of listen before talk (LBT). Then, we explore the benefits of using distributed multiple input multiple output (MIMO), where the available antennas in every sub-network are distributed over a number of access points (APs). Specifically, we compare different configurations of distributed MIMO with respect to centralized MIMO, where a single AP with all antennas is located at the center of every sub-network.

Device Detection and Channel Estimation in MTC with Correlated Activity Pattern

  • paper_url: http://arxiv.org/abs/2310.14578
  • repo_url: None
  • paper_authors: Hamza Djelouat, Mikko J. Sillanpää, Markku Juntti
  • for: Solving the joint activity detection and channel estimation problem in grant-free access under an event-triggered traffic mode, where devices are distributed over clusters and exhibit correlated sparse activity patterns.
  • methods: A Bayesian inference scheme based on a structured sparsity-inducing spike-and-slab prior is proposed and solved within the expectation propagation (EP) framework to address the JUICE problem.
  • results: Numerical experiments show that the proposed approach achieves significant gains in user identification accuracy and channel estimation performance.
    Abstract This paper provides a solution for the activity detection and channel estimation problem in grant-free access with correlated device activity patterns. In particular, we consider a machine-type communications (MTC) network operating in event-triggered traffic mode, where the devices are distributed over clusters with an activity behaviour that exhibits both intra-cluster and inner-cluster sparsity patterns. To model the network's intra-cluster and inner-cluster sparsity, we propose a structured sparsity-inducing spike-and-slab prior which provides a flexible approach to encode the prior information about the correlated sparse activity pattern. Furthermore, we derive a Bayesian inference scheme based on the expectation propagation (EP) framework to solve the JUICE problem. Numerical results highlight the significant gains obtained by the proposed structured sparsity-inducing spike-and-slab prior in terms of both user identification accuracy and channel estimation performance.
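For reference, the generic form of a spike-and-slab prior on a device's effective channel coefficient is sketched below; the paper's structured version additionally couples the activity indicators within and across clusters, and the exact coupling is not spelled out in this digest:

$$
p(x_k \mid \lambda_k) = (1 - \lambda_k)\,\delta(x_k) + \lambda_k\, \mathcal{CN}\!\left(x_k;\, 0, \gamma_k\right), \qquad \lambda_k \in [0, 1],
$$

where the Dirac delta (the "spike") models an inactive device, the complex Gaussian "slab" models the channel of an active device, and the activity probabilities λ_k are then given a joint prior that encodes the intra-cluster and inner-cluster sparsity patterns.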

Locally Self-Adjustive Smoothing for Measurement Noise Reduction with Application to Automated Peak Detection

  • paper_url: http://arxiv.org/abs/2310.17663
  • repo_url: None
  • paper_authors: Keisuke Ozawa, Tomoya Itakura, Taisuke Ono
  • for: This paper proposes a locally self-adjustive smoothing method for measurement noise reduction that retains sharp peaks and distorts signals less.
  • methods: The method uses a single parameter that controls the global smoothness, while the local smoothness is balanced using the data itself.
  • results: Simulations and real experiments show improved noise reduction in practical scenarios compared with existing convolution-based smoothing methods.
    Abstract Smoothing is a widely used approach for measurement noise reduction in spectral analysis. However, it suffers from signal distortion caused by peak suppression. A locally self-adjustive smoothing method is developed that retains sharp peaks and distorts signals less. The proposed method uses only one parameter that determines the global smoothness, while balancing the local smoothness using the data itself. Simulation and real experiments in comparison with existing convolution-based smoothing methods indicate both qualitatively and quantitatively improved noise reduction performance in practical scenarios. We also discuss parameter selection and demonstrate an application for the automated smoothing and detection of a given number of peaks from noisy measurement data.
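The paper's algorithm is not spelled out in this digest, so the sketch below only illustrates the general idea of a smoother whose local averaging is attenuated where the data itself indicates sharp structure (here via the local gradient magnitude), controlled by one global parameter; it is an illustrative stand-in, not the authors' method.

```python
import numpy as np

def locally_adaptive_smooth(y, global_strength=5.0):
    """Gaussian-kernel smoother whose local bandwidth shrinks near sharp
    features, estimated from the data's local gradient (illustrative only)."""
    n = len(y)
    grad = np.abs(np.gradient(y))
    grad = grad / (grad.max() + 1e-12)
    # Large gradient (a peak edge) -> small local bandwidth -> little smoothing.
    sigma = global_strength * (1.0 - 0.9 * grad) + 1e-3
    x = np.arange(n, dtype=float)
    out = np.empty(n)
    for i in range(n):
        w = np.exp(-0.5 * ((x - x[i]) / sigma[i]) ** 2)
        out[i] = np.dot(w, y) / w.sum()
    return out

# Noisy signal with two sharp peaks.
x = np.linspace(0, 10, 500)
clean = np.exp(-((x - 3) / 0.05) ** 2) + 0.6 * np.exp(-((x - 7) / 0.08) ** 2)
noisy = clean + 0.03 * np.random.default_rng(0).standard_normal(x.size)
smoothed = locally_adaptive_smooth(noisy, global_strength=8.0)
print("fraction of peak height retained:", smoothed.max() / clean.max())
```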

Low-Complex Channel Estimation in Extra-Large Scale MIMO with the Spherical Wave Properties

  • paper_url: http://arxiv.org/abs/2310.14538
  • repo_url: None
  • paper_authors: Xumin Pu, Zhinan Sun, Qianbin Chen, Shi Jin
  • for: This paper studies low-complexity linear minimum mean squared error (LMMSE) channel estimation in an extra-large scale MIMO system with a uniform circular antenna array, using the spherical wave model (SWM).
  • methods: Based on the SWM for extra-large scale MIMO channels in THz line-of-sight propagation, low-complexity LMMSE channel estimation algorithms are proposed that exploit the spherical wave properties (SWP), for both known and unknown channel covariance matrices (CCM).
  • results: Theoretical and simulation results show that the proposed algorithms reduce complexity without degrading channel estimation accuracy.
    Abstract This paper investigates the low-complex linear minimum mean squared error (LMMSE) channel estimation in an extra-large scale MIMO system with the spherical wave model (SWM). We model the extra-large scale MIMO channels using the SWM in the terahertz (THz) line-of-sight propagation, in which the transceiver is a uniform circular antenna array. On this basis, for the known channel covariance matrix (CCM), a low-complex LMMSE channel estimation algorithm is proposed by exploiting the spherical wave properties (SWP). Meanwhile, for the unknown CCM, a similar low-complex LMMSE channel estimation algorithm is also proposed. Both theoretical and simulation results show that the proposed algorithm has lower complexity without reducing the accuracy of channel estimation.
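For context, the baseline LMMSE estimate that the low-complexity algorithm approximates takes the standard form below, for a pilot observation model y = A h + n with noise variance σ² and known channel covariance matrix R_h; the SWP-based simplifications that reduce its complexity are the paper's contribution and are not reproduced here:

$$
\hat{\mathbf{h}}_{\mathrm{LMMSE}} = \mathbf{R}_{\mathbf{h}}\mathbf{A}^{H}\left(\mathbf{A}\mathbf{R}_{\mathbf{h}}\mathbf{A}^{H} + \sigma^{2}\mathbf{I}\right)^{-1}\mathbf{y}.
$$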

Terahertz Induced Protein Interactions in a Random Medium

  • paper_url: http://arxiv.org/abs/2310.14519
  • repo_url: None
  • paper_authors: Hadeel Elayan, Andrew W. Eckford, Raviraj Adve
  • for: This paper studies how Terahertz (THz) signaling can control protein conformational changes, within the complex interplay between cell movement and protein conformation.
  • methods: A communication system is considered that consists of a nanoantenna transmitter, a protein receiver, and a channel composed of moving red blood cells. Owing to the system dynamics, the influence of both fast and slow channel variations on protein folding is analyzed.
  • results: By optimizing the nanoantenna power and frequency, the controllability of protein interactions can be enhanced. The probabilistic analysis offers a new perspective on how external conditions impact protein folding kinetics and pathways, and on how these still-emerging tools can be engineered to control protein interactions.
    Abstract Folding of proteins into their correct native structure is key to their function. Simultaneously, the intricate interplay between cell movement and protein conformation highlights the complex nature of cellular processes. In this work, we demonstrate the impact of Terahertz (THz) signaling on controlling protein conformational changes in a random medium. Our system of interest consists of a communication link that involves a nanoantenna transmitter, a protein receiver, and a channel composed of moving red blood cells. Due to the system dynamics, we investigate the influence of both the fast and slow channel variations on protein folding. Specifically, we analyze the system's selectivity to assess the effectiveness of the induced THz interaction in targeting a specific group of proteins under fading conditions. By optimizing the selectivity metric with respect to the nanoantenna power and frequency, it is possible to enhance the controllability of protein interactions. Our probabilistic analysis provides a new perspective regarding electromagnetically triggered protein molecules, their microenvironment and their interaction with surrounding particles. It helps elucidate how external conditions impact the protein folding kinetics and pathways. This results in not only understanding the mechanisms underlying THz-induced protein interactions but also engineering these still-emerging tools.

Label Space Partition Selection for Multi-Object Tracking Using Two-Layer Partitioning

  • paper_url: http://arxiv.org/abs/2310.14506
  • repo_url: None
  • paper_authors: Ji Youn Lee, Changbeom Shim, Hoa Van Nguyen, Tran Thien Dat Nguyen, Hyunjin Choi, Youngho Kim
  • for: Improving the efficiency of multi-object trajectory estimation by addressing the increase in computational requirements caused by data association ambiguity.
  • methods: A divide-and-conquer strategy groups objects with distinct labels according to their statistical dependencies, and a secondary partitioning technique enables parallel computation.
  • results: An efficient label grouping method is proposed that scales to large tracking problems with fast computation. A performance comparison with several efficient spatial searching algorithms shows that the proposed method performs best on large-scale data sets.
    Abstract Estimating the trajectories of multiple objects poses a significant challenge due to data association ambiguity, which leads to a substantial increase in computational requirements. To address such problems, a divide-and-conquer manner has been employed with parallel computation. In this strategy, distinguished objects that have unique labels are grouped based on their statistical dependencies, the intersection of predicted measurements. Several geometry approaches have been used for label grouping since finding all intersected label pairs is clearly infeasible for large-scale tracking problems. This paper proposes an efficient implementation of label grouping for the label-partitioned generalized labeled multi-Bernoulli filter framework using a secondary partitioning technique. This allows for parallel computation in the label graph indexing step, avoiding generating and eliminating duplicate comparisons. Additionally, we compare the performance of the proposed technique with several efficient spatial searching algorithms. The results demonstrate the superior performance of the proposed approach on large-scale data sets, enabling scalable trajectory estimation.
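As a rough illustration of the grouping step (not the authors' secondary-partitioning algorithm), the sketch below groups labels whose predicted measurement gates, approximated here by axis-aligned bounding boxes, intersect transitively, using a union-find structure so that each group can then be filtered in parallel.

```python
import numpy as np

def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]   # path compression
        i = parent[i]
    return i

def union(parent, i, j):
    parent[find(parent, i)] = find(parent, j)

def boxes_intersect(a, b):
    """Axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def group_labels(gates):
    """Partition labels into groups of transitively intersecting gates."""
    n = len(gates)
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):                    # the paper replaces this
            if boxes_intersect(gates[i], gates[j]):  # O(n^2) scan with an
                union(parent, i, j)                  # efficient spatial search
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return list(groups.values())

# Illustrative predicted-measurement gates for 5 labels.
gates = np.array([[0, 0, 2, 2], [1, 1, 3, 3], [10, 10, 12, 12],
                  [11, 9, 13, 11], [30, 30, 31, 31]], float)
print(group_labels(gates))   # e.g. [[0, 1], [2, 3], [4]]
```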

Channel State Information-Free Location-Privacy Enhancement: Delay-Angle Information Spoofing

  • paper_url: http://arxiv.org/abs/2310.14465
  • repo_url: None
  • paper_authors: Jianxiu Li, Urbashi Mitra
  • for: Enhancing location privacy.
  • methods: A delay-angle information spoofing (DAIS) strategy shifts the location-relevant delays and angles without relying on channel state information (CSI) at the transmitter, obfuscating the eavesdropper with a physical location distinct from the true one.
  • results: (1) A lower bound on the localization error validates the enhanced location privacy; (2) the illegitimate localizer suffers more than 15 dB performance degradation at high signal-to-noise ratios; (3) the scheme is less sensitive to leakage of the designed precoder structure than a prior approach.
    Abstract In this paper, a delay-angle information spoofing (DAIS) strategy is proposed for location-privacy enhancement. By shifting the location-relevant delays and angles without the aid of channel state information (CSI) at the transmitter, the eavesdropper is obfuscated by a physical location that is distinct from the true one. A precoder is designed to preserve location-privacy while the legitimate localizer can remove the obfuscation with the securely shared information. Then, a lower bound on the localization error is derived via the analysis of the geometric mismatch caused by DAIS, validating the enhanced location-privacy. The statistical hardness for the estimation of the shared information is also investigated to assess the robustness to the potential leakage of the designed precoder structure. Numerical comparisons show that the proposed DAIS scheme results in more than 15 dB performance degradation for the illegitimate localizer at high signal-to-noise ratios, which is comparable to a recently proposed CSI-free location-privacy enhancement strategy and is less sensitive to the precoder structure leakage than the prior approach.