eess.IV - 2023-09-06

Compact Representation of n-th order TGV

  • paper_url: http://arxiv.org/abs/2309.03359
  • repo_url: None
  • paper_authors: Manu Ghulyani, Muthuvel Arigovindan
  • for: To study how derivative-based regularization generalizes to higher orders, and to address the oscillation artifacts that appear when TV-1 is extended directly to orders three and above.
  • methods: The paper builds on Total Generalized Variation (TGV), a regularization that produces estimates with piecewise-polynomial behavior of varying degree across distinct regions of an image, and derives two simple, implementable representations of n-th order TGV.
  • results: TGV regularization improves the stability and accuracy of image reconstruction and adapts the fitting degree across regions; however, no sufficiently general algorithm has existed for solving TGV regularization beyond order 2, an obstacle the proposed representations address.
    Abstract Although regularization methods based on derivatives are favored for their robustness and computational simplicity, research exploring higher-order derivatives remains limited. This scarcity can possibly be attributed to the appearance of oscillations in reconstructions when directly generalizing TV-1 to higher orders (3 or more). Addressing this, Bredies et al. introduced a notable approach for generalizing total variation, known as Total Generalized Variation (TGV). This technique introduces a regularization that generates estimates embodying piece-wise polynomial behavior of varying degrees across distinct regions of an image. Importantly, to our current understanding, no sufficiently general algorithm exists for solving TGV regularization for orders beyond 2. This is likely because of two problems: firstly, the problem is complex as TGV regularization is defined as a minimization problem with non-trivial constraints, and secondly, TGV is represented in terms of tensor fields, which is difficult to implement. In this work we tackle the first challenge by giving two simple and implementable representations of n-th order TGV.
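For background, the second-order instance of TGV introduced by Bredies et al. can be written compactly (standard form from the cited work, included here for context; the abstract itself does not restate it):

$$\mathrm{TGV}_{\alpha}^{2}(u) \;=\; \min_{w} \; \alpha_{1}\,\lVert \nabla u - w \rVert_{1} \;+\; \alpha_{0}\,\lVert \mathcal{E}(w) \rVert_{1},$$

where $\mathcal{E}(w) = \tfrac{1}{2}(\nabla w + \nabla w^{\mathsf{T}})$ is the symmetrized derivative of the auxiliary vector field $w$. For general order $n$, the definition becomes a constrained minimization over tensor fields, which is precisely the formulation the paper's two representations aim to make implementable.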

Real-Time Dynamic Data Driven Deformable Registration for Image-Guided Neurosurgery: Computational Aspects

  • paper_url: http://arxiv.org/abs/2309.03336
  • repo_url: None
  • paper_authors: Nikos Chrisochoides, Andrey Fedorov, Yixun Liu, Andriy Kot, Panos Foteinos, Fotis Drakopoulos, Christos Tsolakis, Emmanuel Billias, Olivier Clatz, Nicholas Ayache, Alex Golby, Peter Black, Ron Kikinis
  • for: To describe a dynamic data-driven non-rigid registration method for brain MRI used in image-guided neurosurgery planning, and to summarize the evolution of its computational aspects and future directions.
  • methods: The method uses dynamic data-driven deformable non-rigid registration to adjust pre-operative MRI data during surgery, accounting for the intra-operative deformation of brain tissue (brain shift).
  • results: The method accurately accounts for brain tissue deformation during surgery while retaining the quality of the pre-operative MRI data, supporting neurosurgical planning.
    Abstract Current neurosurgical procedures utilize medical images of various modalities to enable the precise location of tumors and critical brain structures to plan accurate brain tumor resection. The difficulty of using preoperative images during the surgery is caused by the intra-operative deformation of the brain tissue (brain shift), which introduces discrepancies concerning the preoperative configuration. Intra-operative imaging allows tracking such deformations but cannot fully substitute for the quality of the pre-operative data. Dynamic Data Driven Deformable Non-Rigid Registration (D4NRR) is a complex and time-consuming image processing operation that allows the dynamic adjustment of the pre-operative image data to account for intra-operative brain shift during the surgery. This paper summarizes the computational aspects of a specific adaptive numerical approximation method and its variations for registering brain MRIs. It outlines its evolution over the last 15 years and identifies new directions for the computational aspects of the technique.

The Secrets of Non-Blind Poisson Deconvolution

  • paper_url: http://arxiv.org/abs/2309.03105
  • repo_url: None
  • paper_authors: Abhiram Gnanasambandam, Yash Sanghvi, Stanley H. Chan
  • for: Non-blind image deconvolution, particularly under photon-limited conditions where excessive shot noise makes traditional deconvolution algorithms fail.
  • methods: A systematic analysis of the Poisson non-blind deconvolution algorithms reported in the literature, covering both classical and deep learning methods; from this analysis, five "secrets" (do's and don'ts for algorithm design) are distilled.
  • results: A proof-of-concept method combining the five secrets performs on par with some of the latest methods while outperforming some older ones.
    Abstract Non-blind image deconvolution has been studied for several decades but most of the existing work focuses on blur instead of noise. In photon-limited conditions, however, the excessive amount of shot noise makes traditional deconvolution algorithms fail. In searching for reasons why these methods fail, we present a systematic analysis of the Poisson non-blind deconvolution algorithms reported in the literature, covering both classical and deep learning methods. We compile a list of five "secrets" highlighting the do's and don'ts when designing algorithms. Based on this analysis, we build a proof-of-concept method by combining the five secrets. We find that the new method performs on par with some of the latest methods while outperforming some older ones.
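For context on the photon-limited regime discussed above, here is a minimal sketch of the classical Richardson-Lucy iteration, a standard baseline for non-blind Poisson deconvolution. This is not the proof-of-concept method built from the five secrets; the known, (0,0)-centered PSF and circular boundary handling are simplifying assumptions.

```python
import numpy as np

def richardson_lucy(y, psf, n_iter=50, eps=1e-12):
    """Classical Richardson-Lucy iteration for non-blind Poisson deconvolution.
    y: observed non-negative image; psf: known kernel, same shape as y,
    centered at pixel (0, 0) (i.e., ifftshifted) and summing to 1.
    Circular boundary conditions are assumed throughout."""
    H = np.fft.rfft2(psf)
    x = np.full(y.shape, float(y.mean()))       # flat non-negative start
    for _ in range(n_iter):
        Hx = np.fft.irfft2(H * np.fft.rfft2(x), s=y.shape)
        ratio = y / np.maximum(Hx, eps)         # Poisson likelihood ratio
        # adjoint (correlation) applied via conjugation in the Fourier domain
        x = x * np.fft.irfft2(np.conj(H) * np.fft.rfft2(ratio), s=y.shape)
    return x
```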

A flexible and accurate total variation and cascaded denoisers-based image reconstruction algorithm for hyperspectrally compressed ultrafast photography

  • paper_url: http://arxiv.org/abs/2309.02835
  • repo_url: None
  • paper_authors: Zihan Guo, Jiali Yao, Dalong Qi, Pengpeng Ding, Chengzhi Jin, Ning Xu, Zhiling Zhang, Yunhua Yao, Lianzhong Deng, Zhiyong Wang, Zhenrong Sun, Shian Zhang
  • for: Hyperspectrally compressed ultrafast photography (HCUP) can realize temporal and spectral imaging simultaneously, but its image reconstruction quality suffers from the ultra-high data compression ratio and the limited fidelity of traditional reconstruction algorithms.
  • methods: A combined total variation (TV) and cascaded denoisers (CD) algorithm for HCUP reconstruction, named TV-CD, built on the iterative plug-and-play alternating direction method of multipliers: the TV model preserves image smoothness while deep denoising networks supply additional priors, addressing the common sparsity-representation problem in local similarity and motion compensation.
  • results: Both simulations and experiments show that the TV-CD algorithm effectively improves the accuracy and quality of HCUP image reconstruction, promoting practical applications of HCUP in capturing high-dimensional, complex physical, chemical, and biological ultrafast optical scenes.
    Abstract Hyperspectrally compressed ultrafast photography (HCUP) based on compressed sensing and the time- and spectrum-to-space mappings can simultaneously realize the temporal and spectral imaging of non-repeatable or difficult-to-repeat transient events passively in a single exposure. It possesses an incredibly high frame rate of tens of trillions of frames per second and a sequence depth of several hundred, and plays a revolutionary role in single-shot ultrafast optical imaging. However, due to the ultra-high data compression ratio induced by the extremely large sequence depth as well as the limited fidelities of traditional reconstruction algorithms over the reconstruction process, HCUP suffers from a poor image reconstruction quality and fails to capture fine structures in complex transient scenes. To overcome these restrictions, we propose a flexible image reconstruction algorithm based on the total variation (TV) and cascaded denoisers (CD) for HCUP, named the TV-CD algorithm. It applies the TV denoising model cascaded with several advanced deep learning-based denoising models in the iterative plug-and-play alternating direction method of multipliers framework, which can preserve the image smoothness while utilizing the deep denoising networks to obtain additional priors, thus solving the common sparsity representation problem in local similarity and motion compensation. Both simulation and experimental results show that the proposed TV-CD algorithm can effectively improve the image reconstruction accuracy and quality of HCUP, and further promote the practical applications of HCUP in capturing high-dimensional complex physical, chemical and biological ultrafast optical scenes.
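A skeletal plug-and-play ADMM loop in the spirit of the TV-CD scheme, for orientation only: the sensing operator `A`, its adjoint `At`, the denoiser cascade, the step size, and the penalty `rho` are all placeholder assumptions, not the authors' implementation.

```python
import numpy as np

def pnp_admm(y, A, At, denoisers, rho=1.0, step=0.1, n_iter=30):
    """Plug-and-play ADMM skeleton: min_x 0.5||Ax - y||^2 + R(x), where R is
    realized implicitly by a cascade of denoisers (e.g., a TV step followed
    by deep denoisers). A and At are function handles for the sensing
    operator and its adjoint; denoisers is a list of callables."""
    x = At(y)                            # crude initialization
    v = x.copy()
    u = np.zeros_like(x)
    for _ in range(n_iter):
        # x-update: proximal least-squares step, here a few plain gradient
        # iterations to stay agnostic to the structure of A
        for _ in range(5):
            x = x - step * (At(A(x) - y) + rho * (x - v + u))
        # v-update: the "prior" step, i.e., the cascaded denoisers
        v = x + u
        for D in denoisers:
            v = D(v)
        u = u + x - v                    # dual variable update
    return x
```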

Review of photoacoustic imaging plus X

  • paper_url: http://arxiv.org/abs/2309.02638
  • repo_url: None
  • paper_authors: Daohuai Jiang, Luyao Zhu, Shangqing Tong, Yuting Shen, Feng Gao, Fei Gao
  • For: This paper provides an overview of the emerging research frontiers in photoacoustic imaging (PAI) technology, including its applications in various biomedical fields and its combination with other advanced technologies.
  • Methods: The paper discusses the current state of PAI technology and its combination with other technologies, including PAI plus treatment, PAI plus new circuit design, PAI plus accurate positioning systems, PAI plus fast scanning systems, PAI plus novel ultrasound sensors, PAI plus advanced laser sources, PAI plus deep learning, and PAI plus other imaging modalities.
  • Results: The paper summarizes each technology's current state, technical advantages, and prospects for application, focusing on developments from the past three years, and discusses the challenges and potential future work in the PAI plus X area.
    Abstract Photoacoustic imaging (PAI) is a novel modality in biomedical imaging technology that combines the rich optical contrast with the deep penetration of ultrasound. To date, PAI technology has found applications in various biomedical fields. In this review, we present an overview of the emerging research frontiers on PAI plus other advanced technologies, named as PAI plus X, which includes but is not limited to PAI plus treatment, PAI plus new circuits design, PAI plus accurate positioning system, PAI plus fast scanning systems, PAI plus novel ultrasound sensors, PAI plus advanced laser sources, PAI plus deep learning, and PAI plus other imaging modalities. We will discuss each technology's current state, technical advantages, and prospects for application, reported mostly in the last three years. Lastly, we discuss and summarize the challenges and potential future work in the PAI plus X area.

eess.SP - 2023-09-06

Demonstration of an Integrated Planar Guided-wave Terahertz Synthesized Filter

  • paper_url: http://arxiv.org/abs/2309.03379
  • repo_url: None
  • paper_authors: Ali Dehghanian, Mohsen Haghighat, Thomas Darcie, Levi Smith
  • for: The design and experimental demonstration of integrated planar low-pass filters at terahertz (THz) frequencies.
  • methods: Filter synthesis methods from microwave engineering are applied to design several integrated planar low-pass filters, whose transmission characteristics are validated against theory and simulation.
  • results: Measurements and simulations show that the synthesis approach yields integrated planar low-pass filters whose transmission characteristics align with theory and simulation.
    Abstract At terahertz (THz) frequencies there are few experimental works that demonstrate filter synthesis to obtain a desired filter response (e.g., Chebyshev, Butterworth, Bessel). Currently, the majority of the literature performs THz filter analysis, that is, characterizing the filter response after the design procedure. In this paper, we apply filter synthesis methods from microwave engineering to design several integrated planar low-pass filters (fc = 0.8 THz). We find that the transmission characteristics align with theory and simulation.
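As an illustration of the synthesis-first workflow the paper adopts (a generic sketch, not the authors' THz design flow), a Chebyshev low-pass prototype can be generated and checked against its target response; the order, ripple, and normalization below are illustrative assumptions.

```python
import numpy as np
from scipy import signal

# 5th-order Chebyshev type-I low-pass prototype with 0.5 dB passband ripple.
# Frequencies are normalized so that 1.0 corresponds to fc = 0.8 THz.
b, a = signal.cheby1(N=5, rp=0.5, Wn=1.0, btype="low", analog=True)

# Evaluate the synthesized response on a log grid and read off the
# attenuation one octave above cutoff.
w, h = signal.freqs(b, a, worN=np.logspace(-1, 1, 500))
att = 20 * np.log10(np.abs(h[np.argmin(np.abs(w - 2.0))]))
print(f"attenuation at 2*fc: {att:.1f} dB")
```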

Cache-assisted Mobile Edge Computing over Space-Air-Ground Integrated Networks for Extended Reality Applications

  • paper_url: http://arxiv.org/abs/2309.03357
  • repo_url: None
  • paper_authors: Seonghoon Yoo, Seongah Jeong, Jeongbin Kim, Joonhyuk Kang
  • for: To provide an extended reality-enabled Internet of Things (XRI) system built on emerging 6G technologies and integrated networks, delivering a more immersive user experience of the real world.
  • methods: A cache-assisted space-air-ground integrated network mobile edge computing (SAGIN-MEC) system with two types of edge servers, one mounted on an unmanned aerial vehicle (UAV) and one on a low Earth orbit (LEO) satellite equipped with a cache, serving multiple ground XRI devices.
  • results: A joint optimization of UAV trajectory, resource allocation, and offloading decisions that maximizes the energy efficiency of the overall system subject to latency constraints and the UAV's operational limitations.
    Abstract Extended reality-enabled Internet of Things (XRI) provides the new user experience and the sense of immersion by adding virtual elements to the real world through Internet of Things (IoT) devices and emerging 6G technologies. However, the computational-intensive XRI tasks are challenging for the energy-constrained small-size XRI devices to cope with, and moreover certain data requires centralized computing that needs to be shared among users. To this end, we propose a cache-assisted space-air-ground integrated network mobile edge computing (SAGIN-MEC) system for XRI applications, consisting of two types of edge servers mounted on an unmanned aerial vehicle (UAV) and low Earth orbit (LEO) equipped with cache and the multiple ground XRI devices. For system efficiency, the four different offloading procedures of the XRI data are considered according to the type of information, i.e., shared data and private data, as well as the offloading decision and the caching status. Specifically, the private data can be offloaded to either UAV or LEO, while the offloading decision of the shared data to the LEO can be determined by the caching status. With the aim of maximizing the energy efficiency of the overall system, we jointly optimize UAV trajectory, resource allocation and offloading decisions under latency constraints and UAV's operational limitations by using the alternating optimization (AO)-based method along with Dinkelbach algorithm and successive convex optimization (SCA). Via numerical results, the proposed algorithm is verified to have the superior performance compared to conventional partial optimizations or without cache.
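The energy-efficiency objective is a ratio of utility to power, which is why the authors pair alternating optimization with the Dinkelbach algorithm. A generic Dinkelbach iteration for maximizing f(x)/g(x) looks as follows; the inner solver `solve_inner` (e.g., an AO/SCA step) is a hypothetical user-supplied oracle.

```python
def dinkelbach(f, g, solve_inner, lam0=0.0, tol=1e-6, max_iter=50):
    """Dinkelbach's algorithm for maximizing the ratio f(x)/g(x), g > 0:
    repeatedly solve the parametric problem max_x f(x) - lam * g(x).
    solve_inner(lam) is a user-supplied oracle returning the maximizer
    of the parametric problem for the current lam."""
    lam, x = lam0, None
    for _ in range(max_iter):
        x = solve_inner(lam)
        if abs(f(x) - lam * g(x)) < tol:   # optimality: parametric value ~ 0
            break
        lam = f(x) / g(x)                  # ratio update
    return x, lam
```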

Sub-Array Selection in Full-Duplex Massive MIMO for Enhanced Self-Interference Suppression

  • paper_url: http://arxiv.org/abs/2309.03317
  • repo_url: None
  • paper_authors: Mobeen Mahmood, Asil Koc, Duc Tuong Nguyen, Robert Morawski, Tho Le-Ngoc
  • for: To improve self-interference (SI) mitigation in full-duplex (FD) massive multiple-input multiple-output (mMIMO) systems with a hybrid beamforming (HBF) architecture, enabling simultaneous uplink (UL) and downlink (DL) transmission over the same frequency band.
  • methods: A min-SI beamforming scheme with sub-array selection (SAS) for the transmit (Tx) and receive (Rx) sub-arrays at the base station, which applies perturbations to the beam directivity; a swarm intelligence-based algorithm finds the optimal perturbations and Tx/Rx sub-arrays that minimize SI subject to directivity-degradation constraints on the UL and DL beams.
  • results: Experiments show that the proposed min-SI beamforming scheme achieves SI suppression as high as 78 dB in FD mMIMO systems.
    Abstract This study considers a novel full-duplex (FD) massive multiple-input multiple-output (mMIMO) system using hybrid beamforming (HBF) architecture, which allows for simultaneous uplink (UL) and downlink (DL) transmission over the same frequency band. Particularly, our objective is to mitigate the strong self-interference (SI) solely on the design of UL and DL RF beamforming stages jointly with sub-array selection (SAS) for transmit (Tx) and receive (Rx) sub-arrays at base station (BS). Based on the measured SI channel in an anechoic chamber, we propose a min-SI beamforming scheme with SAS, which applies perturbations to the beam directivity to enhance SI suppression in UL and DL beam directions. To solve this challenging nonconvex optimization problem, we propose a swarm intelligence-based algorithmic solution to find the optimal perturbations as well as the Tx and Rx sub-arrays to minimize SI subject to the directivity degradation constraints for the UL and DL beams. The results show that the proposed min-SI BF scheme can achieve SI suppression as high as 78 dB in FD mMIMO systems.
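A minimal particle-swarm sketch of the kind of swarm-intelligence search described above; the objective `si_power` and the continuous encoding of the perturbation and sub-array variables are placeholders, not the authors' formulation.

```python
import numpy as np

def pso_minimize(si_power, dim, n_particles=30, n_iter=100, lo=-1.0, hi=1.0):
    """Generic particle swarm optimization minimizing si_power(x) over a box.
    In a min-SI setting, x could encode beam-directivity perturbations and a
    relaxed Tx/Rx sub-array selection."""
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([si_power(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        val = np.array([si_power(p) for p in x])
        better = val < pbest_val
        pbest[better], pbest_val[better] = x[better], val[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()
```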

Terahertz-Band Direction Finding With Beam-Split and Mutual Coupling Calibration

  • paper_url: http://arxiv.org/abs/2309.03195
  • repo_url: None
  • paper_authors: Ahmet M. Elbir, Kumar Vijay Mishra, Symeon Chatzinotas
  • for: To study direction finding in the terahertz (THz) band envisioned for sixth-generation (6G) wireless systems.
  • methods: An array model that treats beam-split and mutual coupling as an array imperfection, combined with a subspace-based multiple signal classification approach, the CalibRated for bEAm-split and Mutual coupling (CREAM-MUSIC) algorithm, for direction-of-arrival (DoA) estimation.
  • results: Numerical simulations show that the proposed CREAM-MUSIC approach accurately estimates the DoAs in the presence of beam-split and mutual coupling.
    Abstract Terahertz (THz) band is currently envisioned as the key building block to achieving the future sixth generation (6G) wireless systems. The ultra-wide bandwidth and very narrow beamwidth of THz systems offer the next order of magnitude in user densities and multi-functional behavior. However, wide bandwidth results in a frequency-dependent beampattern causing the beams generated at different subcarriers split and point to different directions. Furthermore, mutual coupling degrades the system's performance. This paper studies the compensation of both beam-split and mutual coupling for direction-of-arrival (DoA) estimation by modeling the beam-split and mutual coupling as an array imperfection. We propose a subspace-based approach using multiple signal classification with CalibRated for bEAam-split and Mutual coupling (CREAM-MUSIC) algorithm for this purpose. Via numerical simulations, we show the proposed CREAM-MUSIC approach accurately estimates the DoAs in the presence of beam-split and mutual coupling.
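For reference, the classic MUSIC pseudospectrum that CREAM-MUSIC calibrates, in its textbook uniform-linear-array form (without the beam-split and mutual-coupling calibration that constitutes the paper's contribution):

```python
import numpy as np

def music_spectrum(X, n_sources, angles_deg, d=0.5):
    """Textbook MUSIC for a uniform linear array. X: (n_antennas,
    n_snapshots) complex snapshots; d: element spacing in wavelengths.
    Returns the pseudospectrum over the candidate angles (peaks = DoAs)."""
    N, T = X.shape
    R = X @ X.conj().T / T                      # sample covariance
    _, eigvec = np.linalg.eigh(R)               # ascending eigenvalues
    En = eigvec[:, : N - n_sources]             # noise subspace
    theta = np.deg2rad(np.asarray(angles_deg))
    k = np.arange(N)[:, None]
    A = np.exp(-2j * np.pi * d * k * np.sin(theta)[None, :])  # steering
    return 1.0 / (np.linalg.norm(En.conj().T @ A, axis=0) ** 2)
```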

Real-Time Non-Invasive Imaging and Detection of Spreading Depolarizations through EEG: An Ultra-Light Explainable Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2309.03147
  • repo_url: None
  • paper_authors: Yinzhe Wu, Sharon Jewell, Xiaodan Xing, Yang Nan, Anthony J. Strong, Guang Yang, Martyn G. Boutelle
  • for: Preventing secondary brain injury, in particular through the detection of spreading depolarizations (SDs).
  • methods: Non-invasive SD detection from scalp electroencephalogram (EEG) signals, using signal processing and a multi-modal deep learning network.
  • results: A novel ultra-light-weight multi-modal deep learning network that fuses EEG spectrogram imaging with temporal power vectors to improve SD detection accuracy.
    Abstract A core aim of neurocritical care is to prevent secondary brain injury. Spreading depolarizations (SDs) have been identified as an important independent cause of secondary brain injury. SDs are usually detected using invasive electrocorticography recorded at high sampling frequency. Recent pilot studies suggest a possible utility of scalp electrodes generated electroencephalogram (EEG) for non-invasive SD detection. However, noise and attenuation of EEG signals makes this detection task extremely challenging. Previous methods focus on detecting temporal power change of EEG over a fixed high-density map of scalp electrodes, which is not always clinically feasible. Having a specialized spectrogram as an input to the automatic SD detection model, this study is the first to transform SD identification problem from a detection task on a 1-D time-series wave to a task on a sequential 2-D rendered imaging. This study presented a novel ultra-light-weight multi-modal deep-learning network to fuse EEG spectrogram imaging and temporal power vectors to enhance SD identification accuracy over each single electrode, allowing flexible EEG map and paving the way for SD detection on ultra-low-density EEG with variable electrode positioning. Our proposed model has an ultra-fast processing speed (<0.3 sec). Compared to the conventional methods (2 hours), this is a huge advancement towards early SD detection and to facilitate instant brain injury prognosis. Seeing SDs with a new dimension - frequency on spectrograms, we demonstrated that such additional dimension could improve SD detection accuracy, providing preliminary evidence to support the hypothesis that SDs may show implicit features over the frequency profile.
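The 1-D-to-2-D transformation the network consumes can be sketched with a standard spectrogram plus a band-summed temporal power vector; the sampling rate and window parameters below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 256.0                                  # assumed EEG sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)
eeg = np.random.randn(t.size)               # placeholder single-channel EEG

# 2-D rendering (frequency x time power map) per electrode, plus the
# band-summed temporal power vector used as the second modality.
f, tt, Sxx = spectrogram(eeg, fs=fs, nperseg=512, noverlap=384)
temporal_power = Sxx.sum(axis=0)
print(Sxx.shape, temporal_power.shape)
```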

Millimeter Wave Thin-Film Bulk Acoustic Resonator in Sputtered Scandium Aluminum Nitride

  • paper_url: http://arxiv.org/abs/2309.03135
  • repo_url: None
  • paper_authors: Sinwoo Cho, Omar Barrera, Pietro Simeoni, Emily N. Marshall, Jack Kramer, Keisuke Motoki, Tzu-Hsuan Hsu, Vakhtang Chulukhadze, Matteo Rinaldi, W. Alan Doolittle, Ruochen Lu
  • for: This paper aims to improve the frequency scaling of sputtered ScAlN into mmWave and proposes a new fabrication procedure.
  • methods: The paper uses sputtered Sc0.3Al0.7N on Al on Si carrier wafer and transmission electron microscopy (TEM) and X-ray diffraction (XRD) to identify the bottlenecks in the existing piezoelectric-metal stack.
  • results: The resonator achieves electromechanical coupling (k2) of 7.0% and quality factor (Q) of 62 for the first-order symmetric (S1) mode at 21.4 GHz, along with k2 of 4.0% and Q of 19 for the third-order symmetric (S3) mode at 55.4 GHz, showing higher figures of merit (FoM, k2xQ) than reported AlN/ScAlN-based mmWave acoustic resonators.
    Abstract This work reports a millimeter wave (mmWave) thin-film bulk acoustic resonator (FBAR) in sputtered scandium aluminum nitride (ScAlN). This paper identifies challenges of frequency scaling sputtered ScAlN into mmWave and proposes a stack and new fabrication procedure with a sputtered Sc0.3Al0.7N on Al on Si carrier wafer. The resonator achieves electromechanical coupling (k2) of 7.0% and quality factor (Q) of 62 for the first-order symmetric (S1) mode at 21.4 GHz, along with k2 of 4.0% and Q of 19 for the third-order symmetric (S3) mode at 55.4 GHz, showing higher figures of merit (FoM, k2xQ) than reported AlN/ScAlN-based mmWave acoustic resonators. The ScAlN quality is identified by transmission electron microscopy (TEM) and X-ray diffraction (XRD), identifying the bottlenecks in the existing piezoelectric-metal stack. Further improvement of ScAlN/AlN-based mmWave acoustic resonators calls for better crystalline quality from improved thin-film deposition methods.
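Plugging the reported values into the quoted figure of merit, $\mathrm{FoM} = k^{2} \times Q$ (simple arithmetic on the abstract's numbers):

$$\mathrm{S1}\ (21.4\ \mathrm{GHz}):\ 0.070 \times 62 \approx 4.3, \qquad \mathrm{S3}\ (55.4\ \mathrm{GHz}):\ 0.040 \times 19 \approx 0.76.$$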

NUV-DoA: NUV Prior-based Bayesian Sparse Reconstruction with Spatial Filtering for Super-Resolution DoA Estimation

  • paper_url: http://arxiv.org/abs/2309.03114
  • repo_url: https://github.com/MengyuanZha0/ICASSP24-NUV-DoA
  • paper_authors: Mengyuan Zhao, Guy Revach, Tirza Routtenberg, Nir Shlezinger
  • for: High-resolution direction-of-arrival (DoA) estimation typically requires a high signal-to-noise ratio (SNR) and a sufficiently large number of snapshots; this paper presents the NUV-DoA algorithm, which augments Bayesian sparse reconstruction with spatial filtering for super-resolution DoA estimation.
  • methods: Each direction on the azimuth grid is modeled with a sparsity-promoting normal-with-unknown-variance (NUV) prior, reducing the non-convex optimization problem to iteratively reweighted least squares under a Gaussian distribution, where the mean of the snapshots is a sufficient statistic; this both simplifies the solution and accurately detects the DoAs. A hierarchical approach handles interference cancellation in multi-source scenarios.
  • results: Empirical evaluations show the superiority of NUV-DoA over alternative DoA estimators, especially at low SNRs.
    Abstract Achieving high-resolution Direction of Arrival (DoA) recovery typically requires high Signal to Noise Ratio (SNR) and a sufficiently large number of snapshots. This paper presents NUV-DoA algorithm, that augments Bayesian sparse reconstruction with spatial filtering for super-resolution DoA estimation. By modeling each direction on the azimuth's grid with the sparsity-promoting normal with unknown variance (NUV) prior, the non-convex optimization problem is reduced to iteratively reweighted least-squares under Gaussian distribution, where the mean of the snapshots is a sufficient statistic. This approach not only simplifies our solution but also accurately detects the DoAs. We utilize a hierarchical approach for interference cancellation in multi-source scenarios. Empirical evaluations show the superiority of NUV-DoA, especially in low SNRs, compared to alternative DoA estimators.
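A schematic of the NUV-style iteratively reweighted least-squares loop described above, as a generic sparse-recovery sketch: the steering dictionary `A` over the azimuth grid, the noise level, and the simple variance update are assumptions; the paper's spatial filtering and hierarchical interference cancellation are omitted.

```python
import numpy as np

def nuv_irls(y, A, n_iter=50, sigma2=1e-2, eps=1e-9):
    """Sparse recovery with normal-with-unknown-variance (NUV) priors via an
    iteratively reweighted least-squares loop. y: (M,) mean of the snapshots
    (a sufficient statistic under the Gaussian model); A: (M, G) steering
    dictionary over the azimuth grid. Returns per-direction amplitudes,
    sparse over the grid."""
    G = A.shape[1]
    q = np.ones(G)                              # per-direction prior variances
    for _ in range(n_iter):
        # posterior-mean (ridge-type) solve for the current variances
        S = (A * q[None, :]) @ A.conj().T + sigma2 * np.eye(A.shape[0])
        x = q * (A.conj().T @ np.linalg.solve(S, y))
        q = np.abs(x) ** 2 + eps                # sparsity-promoting update
    return x
```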

Purposeful Co-Design of OFDM Signals for Ranging and Communications

  • paper_url: http://arxiv.org/abs/2309.03076
  • repo_url: None
  • paper_authors: Andrew Graff, Todd E. Humphreys
  • for: To analyze the fundamental trade-offs in co-designing orthogonal frequency-division multiplexing signals for both ranging (via time-of-arrival estimation) and communications.
  • methods: The trade-offs are quantified through the Shannon capacity bound, the probability of outage, and the Ziv-Zakai bound on range estimation variance.
  • results: The derived bounds enable Pareto-optimal design choices that balance communication throughput, outage probability, and ranging variance; the Pareto-optimal choices of different signal design strategies change depending on the channel.
    Abstract This paper analyzes the fundamental trade-offs that occur in the co-design of orthogonal frequency-division multiplexing signals for both ranging (via time-of-arrival estimation) and communications. These trade-offs are quantified through the Shannon capacity bound, probability of outage, and the Ziv-Zakai bound on range estimation variance. Bounds are derived for signals experiencing frequency-selective Rayleigh block fading, accounting for the impact of limited channel knowledge and multi-antenna reception. Uncompensated carrier frequency offset and phase errors are also factored into the capacity bounds. Analysis based on the derived bounds demonstrates how Pareto-optimal design choices can be made to optimize the communication throughput, probability of outage, and ranging variance. Different signal design strategies are then analyzed, showing how Pareto-optimal design choices change depending on the channel.

Reconfigurable Intelligent Surface Aided Space Shift Keying With Imperfect CSI

  • paper_url: http://arxiv.org/abs/2309.03059
  • repo_url: None
  • paper_authors: Xusheng Zhu, Wen Chen, Qingqing Wu, Zhendong Li, Jun Li, Shunqing Zhang, Ming Ding
  • for: This paper investigates the performance of reconfigurable intelligent surface (RIS)-aided spatial shift keying (SSK) wireless communication systems in the presence of imperfect channel state information (CSI).
  • methods: The paper analyzes the average bit error probability (ABEP) of two RIS-SSK systems, one based on intelligent reflection and the other based on blind reflection of the RIS. The authors use maximum likelihood (ML) detection and derive the conditional pairwise error probability of the composite channel, as well as the probability density function of the combined channel.
  • results: The paper derives closed-form and asymptotic expressions for the ABEP of the RIS-SSK system with imperfect CSI, and explores the impact of discrete reflection phase shifts on the system’s performance. The authors validate their analytical derivations using Monte Carlo simulations.
    Abstract In this paper, we investigate the performance of reconfigurable intelligent surface (RIS)-aided spatial shift keying (SSK) wireless communication systems in the presence of imperfect channel state information (CSI). Specifically, we analyze the average bit error probability (ABEP) of two RIS-SSK systems respectively based on intelligent reflection and blind reflection of RIS. For the intelligent RIS-SSK scheme, we first derive the conditional pairwise error probability of the composite channel through maximum likelihood (ML) detection. Subsequently, we derive the probability density function of the combined channel. Due to the intricacies of the composite channel formulation, an exact closed-form ABEP expression is unattainable through direct derivation. To this end, we resort to employing the Gaussian-Chebyshev quadrature method to estimate the results. In addition, we employ the Q-function approximation to derive the non-exact closed-form expression when CSI imperfections are present. For the blind RIS-SSK scheme, we derive both closed-form ABEP expression and asymptotic ABEP expression with imperfect CSI by adopting the ML detector. To offer deeper insights, we explore the impact of discrete reflection phase shifts on the performance of the RIS-SSK system. Lastly, we extensively validate all the analytical derivations using Monte Carlo simulations.
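To illustrate what the ABEP analysis quantifies, a toy Monte Carlo of plain SSK with ML detection and perfect CSI (no RIS, single receive antenna), purely didactic and much simpler than the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
Nt, trials, snr_db = 2, 100_000, 10.0
sigma = 10 ** (-snr_db / 20)                 # noise std for unit-power channels

# i.i.d. Rayleigh channels to a single receive antenna, one per trial
h = (rng.standard_normal((trials, Nt)) +
     1j * rng.standard_normal((trials, Nt))) / np.sqrt(2)
tx = rng.integers(Nt, size=trials)           # active-antenna index = the bits
n = sigma * (rng.standard_normal(trials) +
             1j * rng.standard_normal(trials)) / np.sqrt(2)
y = h[np.arange(trials), tx] + n

det = np.argmin(np.abs(y[:, None] - h) ** 2, axis=1)  # ML, perfect CSI
print("simulated ABEP:", np.mean(det != tx))  # for Nt = 2, SER equals BER
```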

Uncertainty Quantification in Deep Learning Based Kalman Filters

  • paper_url: http://arxiv.org/abs/2309.03058
  • repo_url: https://github.com/yonatandn/Uncertainty-extraction-in-Model-Based-DL
  • paper_authors: Yehonatan Dahan, Guy Revach, Jindrich Dunik, Nir Shlezinger
  • for: Algorithms that combine deep neural networks (DNNs) and Kalman filters (KFs) to learn from data and track complex dynamics.
  • methods: A study of error covariance extraction in DNN-aided KFs, examining three main approaches distinguished by their ability to associate internal features with meaningful KF quantities such as the Kalman gain (KG) and the prior covariance.
  • results: A numerical study demonstrates that these approaches allow DNN-aided KFs to extract the error covariance, with the most accurate error prediction provided by model-based/data-driven designs.
    Abstract Various algorithms combine deep neural networks (DNNs) and Kalman filters (KFs) to learn from data to track in complex dynamics. Unlike classic KFs, DNN-based systems do not naturally provide the error covariance alongside their estimate, which is of great importance in some applications, e.g., navigation. To bridge this gap, in this work we study error covariance extraction in DNN-aided KFs. We examine three main approaches that are distinguished by the ability to associate internal features with meaningful KF quantities such as the Kalman gain (KG) and prior covariance. We identify the differences between these approaches in their requirements and their effect on the training of the system. Our numerical study demonstrates that the above approaches allow DNN-aided KFs to extract error covariance, with most accurate error prediction provided by model-based/data-driven designs.
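For orientation, the quantities the three approaches aim to recover, the Kalman gain and the error covariance, appear in the textbook KF recursion sketched below (standard equations, not the DNN-aided variants studied in the paper):

```python
import numpy as np

def kalman_step(x, P, y, F, H, Q, R):
    """One predict/update cycle of the classic Kalman filter, exposing the
    Kalman gain K and the posterior error covariance P: the quantities a
    DNN-aided KF must additionally supply alongside its state estimate."""
    # predict
    x = F @ x
    P = F @ P @ F.T + Q                       # prior covariance
    # update
    S = H @ P @ H.T + R                       # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    x = x + K @ (y - H @ x)
    P = (np.eye(P.shape[0]) - K @ H) @ P      # posterior error covariance
    return x, P, K
```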

Cellular Wireless Networks in the Upper Mid-Band

  • paper_url: http://arxiv.org/abs/2309.03038
  • repo_url: None
  • paper_authors: Seongjoon Kang, Marco Mezzavilla, Sundeep Rangan, Arjuna Madanayake, Satheesh Bojja Venkatakrishnan, Gregory Hellbourg, Monisha Ghosh, Hamed Rahmani, Aditya Dhananjay
  • for: To assess the feasibility and potential gains of cellular systems operating in the upper mid-band (7-24 GHz).
  • methods: A system study, propagation calculations, and antenna design are combined: a multi-band system study in a representative dense urban environment, propagation calculations of potential cross interference between satellites and terrestrial cellular services, and the design and evaluation of a compact multi-band antenna array structure.
  • results: Upper mid-band cellular systems can achieve higher throughput and coverage in dense urban environments, but the spectrum will likely need to be shared with incumbents including communication satellites, military RADAR, and radio astronomy; moreover, owing to the wide bandwidth, directional transmission, and intermittent incumbent occupancy, cellular systems will need to sense and intelligently use large spatial and bandwidth degrees of freedom.
    Abstract The upper mid-band -- roughly from 7 to 24 GHz -- has attracted considerable recent interest for new cellular services. This frequency range has vastly more spectrum than the highly congested bands below 7 GHz while offering more favorable propagation and coverage than the millimeter wave (mmWave) frequencies. Realizing the full potential of these bands, however, will require fundamental changes to the design of cellular systems. Most importantly, spectrum will likely need to be shared with incumbents including communication satellites, military RADAR, and radio astronomy. Also, due to the wide bandwidth, directional nature of transmission, and intermittent occupancy of incumbents, cellular systems will need to be agile to sense and intelligently use large spatial and bandwidth degrees of freedom. This paper attempts to provide an initial assessment of the feasibility and potential gains of wideband cellular systems operating in the upper mid-band. The study includes: (1) a system study to assess potential gains of multi-band systems in a representative dense urban environment; (2) propagation calculations to assess potential cross interference between satellites and terrestrial cellular services; and (3) design and evaluation of a compact multi-band antenna array structure. Leveraging these preliminary results, we identify potential future research directions to realize next-generation systems in these frequencies.

MUSIC Algorithm for IRS-Assisted AOA Estimation

  • paper_url: http://arxiv.org/abs/2309.02947
  • repo_url: None
  • paper_authors: Qipeng Wang, Liang Liu, Shuowen Zhang
  • for: To study angle-of-arrival (AOA) estimation in an intelligent reflecting surface (IRS) assisted integrated sensing and communication (ISAC) system in which no line-of-sight (LOS) paths exist between the base station (BS) and the users, so user signals reach the BS only via the IRS.
  • methods: An innovative approach that, through a proper design of the user message pattern and the IRS reflecting pattern, transforms the spatial-domain signals received at the BS into temporal-domain multi-dimensional signals expressed in terms of the virtual steering vectors of the IRS towards the users, to which the classic MUSIC algorithm can be applied.
  • results: The proposed scheme accurately estimates the AOAs of the signals from the users to the IRS, allowing the IRS to serve as an anchor for integrated sensing and communication.
    Abstract Based on the signals received across its antennas, a multi-antenna base station (BS) can apply the classic multiple signal classification (MUSIC) algorithm for estimating the angle of arrivals (AOAs) of its incident signals. This method can be leveraged to localize the users if their line-of-sight (LOS) paths to the BS are available. In this paper, we consider a more challenging AOA estimation setup in the intelligent reflecting surface (IRS) assisted integrated sensing and communication (ISAC) system, where LOS paths do not exist between the BS and the users, while the users' signals can be transmitted to the BS merely via their LOS paths to the IRS as well as the LOS path from the IRS to the BS. Specifically, we treat the IRS as the anchor and are interested in estimating the AOAs of the incident signals from the users to the IRS. Note that we have to achieve the above goal based on the signals received by the BS, because the passive IRS cannot process its received signals. However, the signals received across different antennas of the BS only contain AOA information of its incident signals via the LOS path from the IRS to the BS. To tackle this challenge arising from the spatial-domain received signals, we propose an innovative approach to create temporal-domain multi-dimension received signals for estimating the AOAs of the paths from the users to the IRS. Specifically, via a proper design of the user message pattern and the IRS reflecting pattern, we manage to show that our designed temporal-domain multi-dimension signals can be surprisingly expressed as a function of the virtual steering vectors of the IRS towards the users. This amazing result implies that the classic MUSIC algorithm can be applied to our designed temporal-domain multi-dimension signals for accurately estimating the AOAs of the signals from the users to the IRS.

Bi-Linear Homogeneity Enforced Calibration for Pipelined ADCs

  • paper_url: http://arxiv.org/abs/2309.02901
  • repo_url: None
  • paper_authors: Matthias Wagner, Oliver Lang, Esmaeil Kavousi Ghafi, Andreas Schwarz, Mario Huemer
  • for: Accurate linearity calibration of pipelined analog-to-digital converters (ADCs).
  • methods: The homogeneity enforced calibration (HEC) approach is analyzed, including the effect of an inaccurately scaled test signal, and a bi-linear homogeneity enforced calibration (BL-HEC) approach is introduced to account for inaccurate scaling and thereby facilitate an on-chip implementation.
  • results: A comprehensive stability and convergence analysis of the BL-HEC approach is carried out, and simulations verify the concept.
    Abstract Pipelined analog-to-digital converters (ADCs) are key enablers in many state-of-the-art signal processing systems with high sampling rates. In addition to high sampling rates, such systems often demand a high linearity. To meet these challenging linearity requirements, ADC calibration techniques were heavily investigated throughout the past decades. One limitation in ADC calibration is the need for a precisely known test signal. In our previous work, we proposed the homogeneity enforced calibration (HEC) approach, which circumvents this need by consecutively feeding a test signal and a scaled version of it into the ADC. The calibration itself is performed using only the corresponding output samples, such that the test signal can remain unknown. On the downside, the HEC approach requires the option to accurately scale the test signal, impeding an on-chip implementation. In this work, we provide a thorough analysis of the HEC approach, including the effects of an inaccurately scaled test signal. Furthermore, the bi-linear homogeneity enforced calibration (BL-HEC) approach is introduced and suggested to account for an inaccurate scaling and, therefore, to facilitate an on-chip implementation. In addition, a comprehensive stability and convergence analysis of the BL-HEC approach is carried out. Finally, we verify our concept with simulations.
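The homogeneity idea behind HEC can be sketched numerically: capture the ADC output for a test signal x and for a scaled copy a*x, then fit a post-correction polynomial p enforcing p(ADC(a*x)) = a*p(ADC(x)) using only output samples. The static third-order nonlinearity and the exactly known scale below are toy assumptions, not the paper's pipelined-ADC model.

```python
import numpy as np

rng = np.random.default_rng(0)
adc = lambda v: v + 0.05 * v**2 + 0.02 * v**3    # toy static nonlinearity

a = 0.5                                          # scaling factor (exact here)
x = rng.uniform(-1, 1, 4096)                     # test signal: stays unknown
y1, y2 = adc(x), adc(a * x)                      # two captured output records

# Fit a correction p(y) = y + c2*y^2 + c3*y^3 enforcing homogeneity,
# p(y2) = a * p(y1), from output samples only: linear least squares in
# (c2, c3) since p is linear in its coefficients.
M = np.column_stack([y2**2 - a * y1**2, y2**3 - a * y1**3])
rhs = a * y1 - y2
c2, c3 = np.linalg.lstsq(M, rhs, rcond=None)[0]
print("estimated correction coefficients:", c2, c3)
```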

Multi-Device Task-Oriented Communication via Maximal Coding Rate Reduction

  • paper_url: http://arxiv.org/abs/2309.02888
  • repo_url: None
  • paper_authors: Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang
  • for: To improve the communication efficiency of multi-device edge inference by designing the learning and communication modules toward the same objective: maximizing inference accuracy.
  • methods: The maximal coding rate reduction (MCR2) objective serves as a surrogate for inference accuracy and enables an explicit formulation of the precoding optimization problem, which is solved with a block coordinate descent (BCD) algorithm; the MCR2 objective also acts as the loss function of the feature encoding network.
  • results: The method outperforms various baselines on both synthetic and real-world datasets.
    Abstract Task-oriented communication offers ample opportunities to alleviate the communication burden in next-generation wireless networks. Most existing work designed the physical-layer communication modules and learning-based codecs with distinct objectives: learning is targeted at accurate execution of specific tasks, while communication aims at optimizing conventional communication metrics, such as throughput maximization, delay minimization, or bit error rate minimization. The inconsistency between the design objectives may hinder the exploitation of the full benefits of task-oriented communications. In this paper, we consider a specific task-oriented communication system for multi-device edge inference over a multiple-input multiple-output (MIMO) multiple-access channel, where the learning (i.e., feature encoding and classification) and communication (i.e., precoding) modules are designed with the same goal of inference accuracy maximization. Instead of end-to-end learning which involves both the task dataset and wireless channel during training, we advocate a separate design of learning and communication to achieve the consistent goal. Specifically, we leverage the maximal coding rate reduction (MCR2) objective as a surrogate to represent the inference accuracy, which allows us to explicitly formulate the precoding optimization problem. We cast valuable insights into this formulation and develop a block coordinate descent (BCD) solution algorithm. Moreover, the MCR2 objective also serves the loss function of the feature encoding network, based on which we characterize the received features as a Gaussian mixture (GM) model, facilitating a maximum a posteriori (MAP) classifier to infer the result. Simulation results on both the synthetic and real-world datasets demonstrate the superior performance of the proposed method compared to various baselines.
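The MCR2 surrogate used as the shared goal can be evaluated directly from features and labels. The sketch below uses the standard MCR2 definition (expand the coding rate of all features, compress each class) with illustrative dimensions; it does not reproduce the paper's channel-aware formulation over the MIMO multiple-access channel.

```python
import numpy as np

def mcr2(Z, labels, eps=0.5):
    """Maximal coding rate reduction objective in its standard form.
    Z: (d, N) feature matrix, columns are samples; labels: (N,) ints."""
    d, N = Z.shape
    I = np.eye(d)
    expand = 0.5 * np.linalg.slogdet(I + (d / (N * eps**2)) * Z @ Z.T)[1]
    compress = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        Nc = Zc.shape[1]
        compress += (Nc / (2 * N)) * np.linalg.slogdet(
            I + (d / (Nc * eps**2)) * Zc @ Zc.T)[1]
    return expand - compress                 # larger = better class geometry
```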

Symmetric-Reciprocal-Match Method for Vector Network Analyzer Calibration

  • paper_url: http://arxiv.org/abs/2309.02886
  • repo_url: https://github.com/ZiadHatab/srm-calibration
  • paper_authors: Ziad Hatab, Michael Ernst Gadringer, Wolfgang Bösch
  • for: This paper proposes the symmetric-reciprocal-match (SRM) method, a new approach for calibrating vector network analyzers (VNAs).
  • methods: The method uses multiple symmetric one-port loads, a two-port reciprocal device, and a matched load; at least three unique symmetric loads are required, but their specific impedances are not prescribed.
  • results: Numerical examples with synthetic data and measurements of coaxial standards using a commercial short-open-load-reciprocal (SOLR) calibration kit with verification standards demonstrate the accuracy of the proposed method; an advantage is that only the match standard must be fully defined, while the remaining standards are only partially defined, through symmetry or reciprocity.
    Abstract This paper proposes a new approach, the symmetric-reciprocal-match (SRM) method, for calibrating vector network analyzers (VNAs). The method involves using multiple symmetric one-port loads, a two-port reciprocal device, and a matched load. The load standards consist of two-port symmetric one-port devices, and at least three unique loads are used. However, the specific impedances of the loads are not specified. The reciprocal device can be any transmissive device, although a non-reciprocal device can also be used if only the one-port error boxes are of interest. The matched load is fully defined and can be asymmetric. We numerically demonstrated the proposed method's accuracy with synthetic data and with measurements of coaxial standards using a commercial short-open-load-reciprocal (SOLR) calibration kit with verification standards. An advantage of the proposed method is that only the match standard is defined, whereas the remaining standards are partially defined, either through symmetry or reciprocity.

Reconfigurable Intelligent Surfaces for 6G Non-Terrestrial Networks: Assisting Connectivity from the Sky

  • paper_url: http://arxiv.org/abs/2309.02859
  • repo_url: None
  • paper_authors: Wali Ullah Khan, Asad Mahmood, Chandan Kumar Sheemar, Eva Lagunas, Symeon Chatzinotas, Björn Ottersten
  • for: To study the potential of RIS-integrated non-terrestrial networks (NTNs) for next-generation connectivity.
  • methods: The paper first reviews the fundamentals of RIS technology, then reports recent advances in RIS-enabled NTNs, and finally presents a framework, based on the current state of the art, for low Earth orbit (LEO) satellite communications in which the signal received at the user terminal traverses both a direct link and an RIS link, with the RIS mounted on a high-altitude platform (HAP) in the stratosphere.
  • results: The paper concludes by highlighting open challenges and future research directions for RIS-integrated NTNs.
    Abstract This paper studies the potential of RIS-integrated NTNs to revolutionize the next-generation connectivity. First, it discusses the fundamentals of RIS technology. Secondly, it delves into reporting the recent advances in RIS-enabled NTNs. Subsequently, it presents a novel framework based on the current state-of-the-art for low earth orbit satellites (LEO) communications, wherein the signal received at the user terminal traverses both a direct link and an RIS link, and the RIS is mounted on a high-altitude platform (HAP) situated within the stratosphere. Finally, the paper concludes by highlighting open challenges and future research directions to revolutionize the realm of RIS-integrated NTNs.

Variational Bayesian Approximations Kalman Filter Based on Threshold Judgment

  • paper_url: http://arxiv.org/abs/2309.02789
  • repo_url: None
  • paper_authors: Zuxuan Zhang, Gang Wang, Jiacheng He, Shan Zhong
  • for: Online estimation of noise parameters in non-Gaussian measurement noise models.
  • methods: A threshold-based Kalman filtering approach that uses a certain amount of sample data to infer the variance threshold of the observation parameters, and employs variational Bayesian estimation to obtain the corresponding noise variance estimates for subsequent iterations of the Kalman filter.
  • results: Simulation experiments demonstrate accurate and effective estimation of both the state and the noise parameters.
    Abstract The estimation of non-Gaussian measurement noise models is a significant challenge across various fields. In practical applications, it often faces challenges due to the large number of parameters and high computational complexity. This paper proposes a threshold-based Kalman filtering approach for online estimation of noise parameters in non-Gaussian measurement noise models. This method uses a certain amount of sample data to infer the variance threshold of observation parameters and employs variational Bayesian estimation to obtain corresponding noise variance estimates, enabling subsequent iterations of the Kalman filtering algorithm. Finally, we evaluate the performance of this algorithm through simulation experiments, demonstrating its accurate and effective estimation of state and noise parameters.
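A generic variational-Bayes measurement update with an inverse-Gamma noise-variance model, in the spirit of the VB step described above (the threshold judgment logic is not reproduced, and the scalar-measurement setup is a simplifying assumption):

```python
import numpy as np

def vb_kf_update(x, P, y, H, alpha, beta, n_vb=5):
    """One variational-Bayes measurement update of a Kalman filter with an
    unknown scalar measurement-noise variance R ~ Inverse-Gamma(alpha, beta),
    following the classic VB-AKF recipe (Sarkka & Nummenmaa style).
    x: (n,) prior mean, P: (n, n) prior covariance, y: scalar, H: (1, n)."""
    alpha_post = alpha + 0.5            # one new measurement
    beta_post = beta
    x0, P0 = x.copy(), P.copy()
    for _ in range(n_vb):               # fixed-point VB iterations
        R = beta_post / alpha_post      # current noise-variance estimate
        S = (H @ P0 @ H.T).item() + R
        K = (P0 @ H.T / S).ravel()
        x = x0 + K * (y - (H @ x0).item())
        P = P0 - np.outer(K, H @ P0)
        resid = y - (H @ x).item()
        beta_post = beta + 0.5 * (resid**2 + (H @ P @ H.T).item())
    return x, P, alpha_post, beta_post
```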

  • paper_url: http://arxiv.org/abs/2309.02687
  • repo_url: None
  • paper_authors: Jiancheng An, Marco Di Renzo, Merouane Debbah, H. Vincent Poor, Chau Yuen
  • for: To improve the sum rate of wireless networks through joint transmit power allocation and electromagnetic wave-based beamforming.
  • methods: Stacked intelligent metasurface (SIM) technology is used to realize transmit beamforming in the electromagnetic wave domain at the base station, eliminating the need for conventional digital beamforming.
  • results: Compared with a conventional MISO system with the same number of transmit antennas, the SIM-enabled wave-based beamforming design improves the sum rate by about 200%.
    Abstract Intelligent metasurface has recently emerged as a promising technology that enables the customization of wireless environments by harnessing large numbers of inexpensive configurable scattering elements. However, prior studies have predominantly focused on single-layer metasurfaces, which have limitations in terms of the number of beam patterns they can steer accurately due to practical hardware restrictions. In contrast, this paper introduces a novel stacked intelligent metasurface (SIM) design. Specifically, we investigate the integration of SIM into the downlink of a multiuser multiple-input single-output (MISO) communication system, where a SIM, consisting of a multilayer metasurface structure, is deployed at the base station (BS) to facilitate transmit beamforming in the electromagnetic wave domain. This eliminates the need for conventional digital beamforming and high-resolution digital-to-analog converters at the BS. To this end, we formulate an optimization problem that aims to maximize the sum rate of all user equipments by jointly optimizing the transmit power allocation at the BS and the wave-based beamforming at the SIM, subject to both the transmit power budget and discrete phase shift constraints. Furthermore, we propose a computationally efficient algorithm for solving this joint optimization problem and elaborate on the potential benefits of employing SIM in wireless networks. Finally, the numerical results corroborate the effectiveness of the proposed SIM-enabled wave-based beamforming design and evaluate the performance improvement achieved by the proposed algorithm compared to various benchmark schemes. It is demonstrated that considering the same number of transmit antennas, the proposed SIM-based system achieves about 200\% improvement in terms of sum rate compared to conventional MISO systems.
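The sum-rate objective being maximized can be written down compactly. A generic multiuser downlink sum-rate evaluation follows, with the SIM's multilayer wave-domain response abstracted into the effective channel `H` (an assumption for illustration, not the paper's model):

```python
import numpy as np

def sum_rate(H, W, noise=1.0):
    """Multiuser downlink sum rate, sum_k log2(1 + SINR_k). H: (K, M)
    effective channels (any SIM / wave-domain response folded in); W: (M, K),
    column k is the transmit vector intended for user k."""
    G = np.abs(H @ W) ** 2                   # G[k, j] = |h_k^H w_j|^2
    sig = np.diag(G)
    interf = G.sum(axis=1) - sig
    return float(np.sum(np.log2(1 + sig / (interf + noise))))
```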

White paper on Selected Environmental Parameters affecting Autonomous Vehicle (AV) Sensors

  • paper_url: http://arxiv.org/abs/2309.02673
  • repo_url: None
  • paper_authors: James Lee Wei Shung, Andrea Piazzoni, Roshan Vijay, Lincoln Ang Hon Kin, Niels de Boer
  • For: This paper explores the effects of different environmental parameters on the LiDARs and cameras used in Autonomous Vehicles (AVs), to better understand their performance and weaknesses.
  • Methods: The study uses the LiDAR test methodology developed in the Urban Mobility Grand Challenge (UMGC-L010) White Paper on LiDAR performance against selected Automotive Paints.
  • Results: The paper identifies weaknesses and challenges that LiDARs may face in different environmental conditions, informing AV regulators in Singapore of the effects of different environmental parameters on AV sensors and the need for more robust testing standards and specifications.
    Abstract Autonomous Vehicles (AVs) being developed these days rely on various sensor technologies to sense and perceive the world around them. The sensor outputs are subsequently used by the Automated Driving System (ADS) onboard the vehicle to make decisions that affect its trajectory and how it interacts with the physical world. The main sensor technologies being utilized for sensing and perception (S&P) are LiDAR (Light Detection and Ranging), camera, RADAR (Radio Detection and Ranging), and ultrasound. Different environmental parameters would have different effects on the performance of each sensor, thereby affecting the S&P and decision-making (DM) of an AV. In this publication, we explore the effects of different environmental parameters on LiDARs and cameras, leading us to conduct a study to better understand the impact of several of these parameters on LiDAR performance. From the experiments undertaken, the goal is to identify some of the weaknesses and challenges that a LiDAR may face when an AV is using it. This informs AV regulators in Singapore of the effects of different environmental parameters on AV sensors so that they can determine testing standards and specifications which will assess the adequacy of LiDAR systems installed for local AV operations more robustly. Our approach adopts the LiDAR test methodology first developed in the Urban Mobility Grand Challenge (UMGC-L010) White Paper on LiDAR performance against selected Automotive Paints.

Passive Eavesdropping Can Significantly Slow Down RIS-Assisted Secret Key Generation

  • paper_url: http://arxiv.org/abs/2309.02653
  • repo_url: None
  • paper_authors: Ningya Xu, Guoshun Nan, Xiaofeng Tao
  • for: This paper aims to maximize the RIS-assisted physical-layer secret key generation by optimizing the RIS units switching under the eavesdropping channel.
  • methods: The paper introduces a mathematical formulation to maximize the key generation rate and provides a step-by-step analysis.
  • results: The paper shows the effectiveness of the method in benefiting the secret key capacity under the eavesdropping channel, and observes that the key randomness and unmatched key rate are also significantly improved.
    Abstract Reconfigurable Intelligent Surface (RIS) assisted physical layer key generation has shown great potential to secure wireless communications by smartly controlling signals such as phase and amplitude. However, previous studies mainly focus on RIS adjustment under ideal conditions, while the correlation between the eavesdropping channel and the legitimate channel, a more practical setting in the real world, is still largely under-explored for the key generation. To fill this gap, this paper aims to maximize the RIS-assisted physical-layer secret key generation by optimizing the RIS units switching under the eavesdropping channel. Firstly, we theoretically show that passive eavesdropping significantly reduces RIS-assisted secret key generation. Keeping this in mind, we then introduce a mathematical formulation to maximize the key generation rate and provide a step-by-step analysis. Extensive experiments show the effectiveness of our method in benefiting the secret key capacity under the eavesdropping channel. We also observe that the key randomness, and unmatched key rate, two metrics that measure the secret key quality, are also significantly improved, potentially paving the way to RIS-assisted key generation in real-world scenarios.

  • paper_url: http://arxiv.org/abs/2309.02648
  • repo_url: None
  • paper_authors: Yuan Guo, Yang Liu, Qingqing Wu, Xiaoyang Li, Qingjiang Shi
  • for: This paper aims to jointly design beamforming, power allocation, and signal processing in a full-duplex (FD) uplink communication system aided by a reconfigurable intelligent surface (RIS) to enhance the integrated sensing and communication (ISAC) capability.
  • methods: The paper proposes an iterative solution using convex optimization techniques, including majorization-minimization (MM) and penalty-dual-decomposition (PDD), to optimize all variables. Additionally, an low-complexity solution using alternative direction method of multipliers (ADMM) is developed to update all variables analytically and run efficiently.
  • results: Numerical results demonstrate the effectiveness and efficiency of the proposed algorithms in enhancing the ISAC capability of the FD uplink communication system aided by RIS, with significant performance boosting achieved by employing RIS.
    Abstract Integrated sensing and communication (ISAC) capability is envisioned as one key feature for future cellular networks. Classical half-duplex (HD) radar sensing is conducted in a "first-emit-then-listen" manner. One challenge to realize HD ISAC lies in the discrepancy of the two systems' time scheduling for transmitting and receiving. This difficulty can be overcome by full-duplex (FD) transceivers. Besides, ISAC generally has to comprise its communication rate due to realizing sensing functionality. This loss can be compensated by the emerging reconfigurable intelligent surface (RIS) technology. This paper considers the joint design of beamforming, power allocation and signal processing in a FD uplink communication system aided by RIS, which is a highly nonconvex problem. To resolve this challenge, via leveraging the cutting-the-edge majorization-minimization (MM) and penalty-dual-decomposition (PDD) methods, we develop an iterative solution that optimizes all variables via using convex optimization techniques. Besides, by wisely exploiting alternative direction method of multipliers (ADMM) and optimality analysis, we further develop a low complexity solution that updates all variables analytically and runs highly efficiently. Numerical results are provided to verify the effectiveness and efficiency of our proposed algorithms and demonstrate the significant performance boosting by employing RIS in the FD ISAC system.
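As a minimal illustration of the MM principle leveraged above (on a toy scalar-regularized problem, not the actual ISAC beamforming objective), the sketch below majorizes a nonconvex log penalty by its tangent and minimizes the resulting convex surrogate in closed form each iteration:

```python
import numpy as np

# Toy majorization-minimization (MM): minimize the nonconvex objective
#   f(x) = ||y - x||^2 + lam * sum_i log(1 + x_i^2)
# by upper-bounding each log term with its tangent at the current iterate,
# which yields a convex quadratic surrogate with a closed-form minimizer.
rng = np.random.default_rng(1)
y, lam = rng.normal(size=8), 2.0
f = lambda x: np.sum((y - x) ** 2) + lam * np.sum(np.log1p(x ** 2))

x = y.copy()
for _ in range(30):
    w = 1.0 / (1.0 + x ** 2)   # tangent weights: log(1+u) <= log(1+u_t) + (u-u_t)/(1+u_t)
    x = y / (1.0 + lam * w)    # exact minimizer of the surrogate
print(f"f decreased from {f(y):.4f} to {f(x):.4f}")  # monotone by MM theory
```

PDD would wrap such inner MM updates with penalty terms and dual updates to handle the coupling constraints; the monotone-descent guarantee of MM is what keeps the overall iterative solution well behaved.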

Mean Field Game-based Waveform Precoding Design for Mobile Crowd Integrated Sensing, Communication, and Computation Systems

  • paper_url: http://arxiv.org/abs/2309.02645
  • repo_url: None
  • paper_authors: Dezhi Wang, Chongwen Huang, Jiguang He, Xiaoming Chen, Wei Wang, Zhaoyang Zhang, Zhu Han, Mérouane Debbah
  • for: Large-scale mobile crowd integrated sensing, communication, and computation (ISCC) systems, such as smart homes and connected vehicles, require numerous integrated sensing and communication (ISAC) devices to sense targets and offload data to the base station (BS) for further processing; as the number of ISAC devices grows, intensive interactions arise among them during data collection and processing.
  • methods: A mean field game (MFG) formulation handles the interactions among large-scale ISAC devices: the influence of the other devices is condensed into a mean-field term, the Fokker-Planck-Kolmogorov equation modeling the evolution of the system state is derived, a cost function based on the mean-field term is formulated, and the waveform precoding design problem is reformulated accordingly (a toy fixed-point iteration is sketched after the abstract).
  • results: The algorithm effectively resolves the interactions among large-scale ISAC devices, and the resulting waveform precoding design improves communication performance and reduces the cost function compared with baseline methods.
    Abstract Timely data collection and processing is crucial for mobile crowd integrated sensing, communication, and computation (ISCC) systems with applications such as smart homes and connected cars, which require numerous integrated sensing and communication (ISAC) devices to sense targets and offload the data to the base station (BS) for further processing. However, as the number of ISAC devices grows, there exist intensive interactions among ISAC devices during data collection and processing since they share common network resources. In this paper, we consider the environment sensing problem in large-scale mobile crowd ISCC systems and propose an efficient waveform precoding design algorithm based on the mean field game (MFG). Specifically, to handle the complex interactions among large-scale ISAC devices, we first utilize the MFG method to transform the influence from other ISAC devices into a mean-field term and derive the Fokker-Planck-Kolmogorov equation, which models the evolution of the system state. Then, we derive the cost function based on the mean-field term and reformulate the waveform precoding design problem. Next, we utilize the G-prox primal-dual hybrid gradient algorithm to solve the reformulated problem and analyze the computational complexity of the proposed algorithm. Finally, simulation results demonstrate that the proposed algorithm can effectively handle the interactions among large-scale ISAC devices in the ISCC process. In addition, compared with other baselines, the proposed waveform precoding design algorithm has advantages in improving communication performance and reducing the cost function.
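The mean-field idea admits a much smaller illustration than the paper's G-prox primal-dual solver: identical devices best-respond to a population-level interference term, and the equilibrium is found by damped fixed-point iteration. Everything below (cost shape, constants, grid) is a made-up toy, not the paper's model:

```python
import numpy as np

# Toy mean-field iteration: each device chooses transmit power p to trade off
# its own rate against congestion from the population's mean power m.
powers = np.linspace(0.1, 2.0, 200)              # candidate power levels

def cost(p, m, lam=1.5):
    return -np.log1p(p) + lam * p * m            # utility vs. interference

m = 1.0                                          # initial mean-field term
for it in range(100):
    best = powers[np.argmin(cost(powers, m))]    # every agent best-responds
    m = 0.8 * m + 0.2 * best                     # damped population update
print(f"mean-field equilibrium power ~= {m:.3f}")  # analytic fixed point ~0.457
```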

cs.SD - 2023-09-05

Music Source Separation with Band-Split RoPE Transformer

  • paper_url: http://arxiv.org/abs/2309.02612
  • repo_url: None
  • paper_authors: Wei-Tsung Lu, Ju-Chiang Wang, Qiuqiang Kong, Yun-Ning Hung
  • for: This paper proposes a novel frequency-domain approach to separating a music recording into musically distinct stems.
  • methods: A band-split module projects the input complex spectrogram into subband-level representations, and a stack of hierarchical Transformers models inner-band and inter-band sequences for multi-band mask estimation (a minimal band-split sketch follows the abstract).
  • results: The system ranked first in the MSS track of the Sound Demixing Challenge (SDX23), and a smaller version achieves a state-of-the-art 9.80 dB average SDR on MUSDB18HQ without extra training data.
    Abstract Music source separation (MSS) aims to separate a music recording into multiple musically distinct stems, such as vocals, bass, drums, and more. Recently, deep learning approaches such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used, but the improvement is still limited. In this paper, we propose a novel frequency-domain approach based on a Band-Split RoPE Transformer (called BS-RoFormer). BS-RoFormer relies on a band-split module to project the input complex spectrogram into subband-level representations, and then arranges a stack of hierarchical Transformers to model the inner-band as well as inter-band sequences for multi-band mask estimation. To facilitate training the model for MSS, we propose to use the Rotary Position Embedding (RoPE). The BS-RoFormer system trained on MUSDB18HQ and 500 extra songs ranked first in the MSS track of the Sound Demixing Challenge (SDX23). Benchmarking a smaller version of BS-RoFormer on MUSDB18HQ, we achieve a state-of-the-art result without extra training data, with 9.80 dB of average SDR.
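A minimal sketch of the band-split front end (band edges, dimensions, and layer choices below are assumptions, not the paper's configuration); the hierarchical Transformers would then attend along the band axis and the time axis of the returned grid:

```python
import torch
import torch.nn as nn

class BandSplit(nn.Module):
    """Slice a complex spectrogram into frequency subbands and project each
    band to a shared embedding dimension, yielding a (band, time, dim) grid."""
    def __init__(self, band_edges, dim=128):
        super().__init__()
        self.bands = list(zip(band_edges[:-1], band_edges[1:]))
        # 2 * width inputs per band: real and imaginary parts, concatenated
        self.proj = nn.ModuleList(
            nn.Linear(2 * (hi - lo), dim) for lo, hi in self.bands
        )

    def forward(self, spec):                    # spec: (batch, freq, time), complex
        feats = []
        for (lo, hi), proj in zip(self.bands, self.proj):
            band = spec[:, lo:hi]               # (batch, width, time)
            x = torch.cat([band.real, band.imag], dim=1)  # (batch, 2*width, time)
            feats.append(proj(x.transpose(1, 2)))         # (batch, time, dim)
        return torch.stack(feats, dim=1)        # (batch, bands, time, dim)

spec = torch.randn(1, 1025, 200, dtype=torch.complex64)
emb = BandSplit(band_edges=[0, 64, 128, 256, 512, 1025])(spec)
print(emb.shape)  # torch.Size([1, 5, 200, 128])
```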

BWSNet: Automatic Perceptual Assessment of Audio Signals

  • paper_url: http://arxiv.org/abs/2309.02592
  • repo_url: None
  • paper_authors: Clément Le Moine Veillon, Victor Rosi, Pablo Arias Sarah, Léane Salais, Nicolas Obin
  • for: This paper introduces BWSNet, a model that can be trained from raw human judgements obtained through a Best-Worst Scaling (BWS) experiment, mapping sound samples into an embedded space that represents the perception of a studied attribute.
  • methods: The model uses a set of cost functions and constraints that interpret trial-wise ordinal relations as distance comparisons in a metric-learning task (see the sketch after the abstract).
  • results: Tested on two BWS studies investigating the perception of speech social attitudes and timbral qualities, the structure of the latent space is faithful to human judgements.
    Abstract This paper introduces BWSNet, a model that can be trained from raw human judgements obtained through a Best-Worst scaling (BWS) experiment. It maps sound samples into an embedded space that represents the perception of a studied attribute. To this end, we propose a set of cost functions and constraints, interpreting trial-wise ordinal relations as distance comparisons in a metric learning task. We tested our proposal on data from two BWS studies investigating the perception of speech social attitudes and timbral qualities. For both datasets, our results show that the structure of the latent space is faithful to human judgements.
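One way to read "trial-wise ordinal relations as distance comparisons" is the margin-based sketch below; this is our simplified interpretation, not the paper's exact cost functions and constraints. Within a trial, the best-worst pair should be the most distant pair in the embedding space:

```python
import torch
import torch.nn.functional as F

def bws_trial_loss(z, best, worst, others, margin=0.2):
    """Margin-based distance comparisons for one BWS trial: the distance
    between the best and worst items should exceed both the distance from
    any intermediate item to the worst, and from the best to any intermediate."""
    d = torch.cdist(z, z)          # pairwise distances between trial items
    loss = 0.0
    for o in others:
        loss = loss + F.relu(d[o, worst] - d[best, worst] + margin)
        loss = loss + F.relu(d[best, o] - d[best, worst] + margin)
    return loss / len(others)

z = torch.randn(4, 16, requires_grad=True)   # embeddings of 4 sounds in one trial
print(bws_trial_loss(z, best=0, worst=3, others=[1, 2]).item())
```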

Symbolic Music Representations for Classification Tasks: A Systematic Evaluation

  • paper_url: http://arxiv.org/abs/2309.02567
  • repo_url: https://github.com/anusfoil/symrep
  • paper_authors: Huan Zhang, Emmanouil Karystinaios, Simon Dixon, Gerhard Widmer, Carlos Eduardo Cancino-Chacón
  • for: This paper provides a systematic examination of deep learning approaches for music information retrieval on symbolic music, focusing on the different available input representations.
  • methods: Matrix (piano roll), sequence, and graph representations, together with their corresponding neural architectures, are evaluated on symbolic scores and performances; a novel graph representation for symbolic performances is also introduced for piece-level classification.
  • results: The systematic evaluation shows that the graph representation performs promisingly on three piece-level classification tasks while being more lightweight to train.
    Abstract Music Information Retrieval (MIR) has seen a recent surge in deep learning-based approaches, which often involve encoding symbolic music (i.e., music represented in terms of discrete note events) in an image-like or language-like fashion. However, symbolic music is neither an image nor a sentence, and research in the symbolic domain lacks a comprehensive overview of the different available representations. In this paper, we investigate matrix (piano roll), sequence, and graph representations and their corresponding neural architectures, in combination with symbolic scores and performances, on three piece-level classification tasks. We also introduce a novel graph representation for symbolic performances and explore the capability of graph representations in global classification tasks. Our systematic evaluation shows advantages and limitations of each input representation. Our results suggest that the graph representation, as the newest and least explored among the three approaches, exhibits promising performance, while being more light-weight in training.

Employing Real Training Data for Deep Noise Suppression

  • paper_url: http://arxiv.org/abs/2309.02432
  • repo_url: None
  • paper_authors: Ziyi Xu, Marvin Sach, Jan Pirklbauer, Tim Fingscheidt
  • for: To improve the training of deep noise suppression (DNS) models so that they better match real-world application scenarios.
  • methods: Real training data are employed through a reference-free loss: an end-to-end non-intrusive deep neural network (PESQ-DNN) estimates perceptual evaluation of speech quality (PESQ) scores of the enhanced speech, and training alternates epoch-wise between updating the DNS model on real data and updating the PESQ-DNN on synthetic data (a sketch of the protocol follows the abstract).
  • results: The DNS model trained on real data with the PESQ-DNN outperforms reference methods trained only on synthetic data, exceeding the Interspeech 2021 DNS Challenge baseline by 0.32 PESQ points on synthetic test data and by 0.05 DNSMOS points on both synthetic and real test data.
    Abstract Most deep noise suppression (DNS) models are trained with reference-based losses requiring access to clean speech. However, sometimes an additive microphone model is insufficient for real-world applications. Accordingly, ways to use real training data in supervised learning for DNS models promise to reduce a potential training/inference mismatch. Employing real data for DNS training requires either generative approaches or a reference-free loss without access to the corresponding clean speech. In this work, we propose to employ an end-to-end non-intrusive deep neural network (DNN), named PESQ-DNN, to estimate perceptual evaluation of speech quality (PESQ) scores of enhanced real data. It provides a reference-free perceptual loss for employing real data during DNS training, maximizing the PESQ scores. Furthermore, we use an epoch-wise alternating training protocol, updating the DNS model on real data, followed by PESQ-DNN updates on synthetic data. The DNS model trained with the PESQ-DNN employing real data outperforms all reference methods employing only synthetic training data. On synthetic test data, our proposed method exceeds the Interspeech 2021 DNS Challenge baseline by a significant 0.32 PESQ points. Both on synthetic and real test data, the proposed method beats the baseline by 0.05 DNSMOS points - although PESQ-DNN optimizes for a different perceptual metric.
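A hedged sketch of the epoch-wise alternating protocol, with stub linear models, toy features, and a stand-in PESQ label function (the authors' architectures and loss details differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stub models standing in for the DNS network and the non-intrusive PESQ-DNN.
dns_model = nn.Sequential(nn.Linear(256, 256), nn.Tanh(), nn.Linear(256, 256))
pesq_dnn = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
opt_dns = torch.optim.Adam(dns_model.parameters(), lr=1e-4)
opt_pesq = torch.optim.Adam(pesq_dnn.parameters(), lr=1e-4)

def pesq_label(clean, enhanced):
    # Placeholder for an intrusive PESQ computation (requires clean reference).
    return (-((clean - enhanced) ** 2).mean(dim=1, keepdim=True)).exp() * 4.5

real_batch = torch.randn(8, 256)                 # real noisy speech (no clean ref)
synth_noisy, synth_clean = torch.randn(8, 256), torch.randn(8, 256)

for epoch in range(2):
    # Step 1: update the DNS model on real data, maximizing the PESQ score
    # predicted by the (frozen) reference-free PESQ-DNN.
    pesq_dnn.eval()
    enhanced = dns_model(real_batch)
    loss = -pesq_dnn(enhanced).mean()
    opt_dns.zero_grad(); loss.backward(); opt_dns.step()

    # Step 2: refresh the PESQ-DNN on synthetic pairs where true PESQ labels
    # can be computed against the available clean reference.
    pesq_dnn.train()
    with torch.no_grad():
        enhanced = dns_model(synth_noisy)
    loss = F.mse_loss(pesq_dnn(enhanced), pesq_label(synth_clean, enhanced))
    opt_pesq.zero_grad(); loss.backward(); opt_pesq.step()
```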

Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition

  • paper_url: http://arxiv.org/abs/2309.02418
  • repo_url: None
  • paper_authors: Minh Tran, Yufeng Yin, Mohammad Soleymani
  • for: To improve the generalization and robustness of unsupervised personalized speech emotion recognition by learning robust speech representations conditioned on speakers.
  • methods: An encoder with learnable speaker embeddings is first pre-trained in a self-supervised manner to learn robust speaker-conditioned speech representations; an unsupervised method then compensates for label distribution shifts by finding similar speakers and leveraging their label distributions from the training set.
  • results: Extensive experiments on the MSP-Podcast corpus show that the method consistently outperforms strong personalization baselines and achieves state-of-the-art performance for valence estimation.
    Abstract There are individual differences in expressive behaviors driven by cultural norms and personality. This between-person variation can result in reduced emotion recognition performance. Therefore, personalization is an important step in improving the generalization and robustness of speech emotion recognition. In this paper, to achieve unsupervised personalized emotion recognition, we first pre-train an encoder with learnable speaker embeddings in a self-supervised manner to learn robust speech representations conditioned on speakers. Second, we propose an unsupervised method to compensate for the label distribution shifts by finding similar speakers and leveraging their label distributions from the training set. Extensive experimental results on the MSP-Podcast corpus indicate that our method consistently outperforms strong personalization baselines and achieves state-of-the-art performance for valence estimation.

The Batik-plays-Mozart Corpus: Linking Performance to Score to Musicological Annotations

  • paper_url: http://arxiv.org/abs/2309.02399
  • repo_url: https://github.com/huispaty/batik_plays_mozart
  • paper_authors: Patricia Hu, Gerhard Widmer
  • for: To create a high-quality, note-precise piano performance dataset that combines professional Mozart piano sonata performances with expert-labelled scores.
  • methods: Recordings by professional Viennese pianist Roland Batik are precisely aligned, note by note, with a current standard edition of the corresponding scores (the New Mozart Edition).
  • results: The resulting corpus maps scores and musicological annotations to precise note-level performance information, enabling the study of the relationship between expressive performance and musical structure; two exploratory experiments demonstrate its usefulness.
    Abstract We present the Batik-plays-Mozart Corpus, a piano performance dataset combining professional Mozart piano sonata performances with expert-labelled scores at a note-precise level. The performances originate from a recording by Viennese pianist Roland Batik on a computer-monitored Bösendorfer grand piano, and are available both as MIDI files and audio recordings. They have been precisely aligned, note by note, with a current standard edition of the corresponding scores (the New Mozart Edition) in such a way that they can further be connected to the musicological annotations (harmony, cadences, phrases) on these scores that were recently published by Hentschel et al. (2021). The result is a high-quality, high-precision corpus mapping scores and musical structure annotations to precise note-level professional performance information. As the first of its kind, it can serve as a valuable resource for studying various facets of expressive performance and their relationship with structural aspects. In the paper, we outline the curation process of the alignment and conduct two exploratory experiments to demonstrate its usefulness in analyzing expressive performance.

PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective

  • paper_url: http://arxiv.org/abs/2309.02265
  • repo_url: None
  • paper_authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters
  • for: This work addresses the problem of pitch estimation using self-supervised learning (SSL).
  • methods: A lightweight (<30k parameters) Siamese neural network takes as input two pitch-shifted versions of the Constant-Q Transform of the same audio. To prevent model collapse in an encoder-only setting, a novel class-based transposition-equivariant objective captures pitch information, and the architecture is made transposition-preserving by introducing learnable Toeplitz matrices (both ingredients are sketched after the abstract).
  • results: Evaluated on singing voice and musical instrument pitch estimation, the model generalizes across tasks and datasets while remaining lightweight and compatible with real-time, low-resource applications; it surpasses self-supervised baselines and narrows the performance gap between self-supervised and supervised pitch estimation.
    Abstract In this paper, we address the problem of pitch estimation using Self Supervised Learning (SSL). The SSL paradigm we use is equivariance to pitch transposition, which enables our model to accurately perform pitch estimation on monophonic audio after being trained only on a small unlabeled dataset. We use a lightweight ($<$ 30k parameters) Siamese neural network that takes as inputs two different pitch-shifted versions of the same audio represented by its Constant-Q Transform. To prevent the model from collapsing in an encoder-only setting, we propose a novel class-based transposition-equivariant objective which captures pitch information. Furthermore, we design the architecture of our network to be transposition-preserving by introducing learnable Toeplitz matrices. We evaluate our model for the two tasks of singing voice and musical instrument pitch estimation and show that our model is able to generalize across tasks and datasets while being lightweight, hence remaining compatible with low-resource devices and suitable for real-time applications. In particular, our results surpass self-supervised baselines and narrow the performance gap between self-supervised and supervised methods for pitch estimation.
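Both ingredients can be sketched compactly; the wrap-around rolling and the kernel-parameterized Toeplitz layer below are our simplifying assumptions, not the paper's exact construction:

```python
import torch
import torch.nn.functional as F

def transposition_loss(logits_a, logits_b, k):
    """If input b is input a transposed by k CQT bins, b's predicted pitch
    distribution should equal a's rolled by k bins (wrap-around simplification)."""
    target = torch.roll(F.softmax(logits_a, dim=-1), shifts=k, dims=-1)
    return F.kl_div(F.log_softmax(logits_b, dim=-1), target.detach(),
                    reduction="batchmean")

class ToeplitzLinear(torch.nn.Module):
    """Linear layer whose weight is constant along diagonals, so shifting the
    input shifts the output: one way to realize a learnable Toeplitz matrix."""
    def __init__(self, n):
        super().__init__()
        self.kernel = torch.nn.Parameter(torch.randn(2 * n - 1) / n ** 0.5)
        idx = torch.arange(n)
        self.register_buffer("index", idx[:, None] - idx[None, :] + n - 1)

    def forward(self, x):                         # x: (batch, n)
        return x @ self.kernel[self.index].T      # W[i, j] = kernel[i - j + n - 1]

layer = ToeplitzLinear(88)
a, b = torch.randn(4, 88), torch.randn(4, 88)
print(transposition_loss(layer(a), layer(b), k=3).item())
```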

cs.CV - 2023-09-05

Compressing Vision Transformers for Low-Resource Visual Learning

  • paper_url: http://arxiv.org/abs/2309.02617
  • repo_url: https://github.com/chensy7/efficient-vit
  • paper_authors: Eric Youn, Sai Mitheran J, Sanjana Prabhu, Siyuan Chen
  • for: To bring vision transformers (ViTs) and their variants to edge environments, making visual learning feasible and efficient on resource-constrained devices.
  • methods: Practical model compression techniques, including distillation, pruning, and quantization, reduce the model size and compute footprint of a ViT while retaining accuracy close to state-of-the-art ViTs (a generic sketch of the three techniques follows the abstract).
  • results: The implementation enables rapid vision transformer inference on an NVIDIA Jetson Nano (4GB) with accuracy close to state-of-the-art ViTs, reporting 85.3% Top-1 accuracy on ImageNet against 86.3% for ViT-B.
    Abstract Vision transformer (ViT) and its variants have swept through visual learning leaderboards and offer state-of-the-art accuracy in tasks such as image classification, object detection, and semantic segmentation by attending to different parts of the visual input and capturing long-range spatial dependencies. However, these models are large and computation-heavy. For instance, the recently proposed ViT-B model has 86M parameters making it impractical for deployment on resource-constrained devices. As a result, their deployment on mobile and edge scenarios is limited. In our work, we aim to take a step toward bringing vision transformers to the edge by utilizing popular model compression techniques such as distillation, pruning, and quantization. Our chosen application environment is an unmanned aerial vehicle (UAV) that is battery-powered and memory-constrained, carrying a single-board computer on the scale of an NVIDIA Jetson Nano with 4GB of RAM. On the other hand, the UAV requires high accuracy close to that of state-of-the-art ViTs to ensure safe object avoidance in autonomous navigation, or correct localization of humans in search-and-rescue. Inference latency should also be minimized given the application requirements. Hence, our target is to enable rapid inference of a vision transformer on an NVIDIA Jetson Nano (4GB) with minimal accuracy loss. This allows us to deploy ViTs on resource-constrained devices, opening up new possibilities in surveillance, environmental monitoring, etc. Our implementation is made available at https://github.com/chensy7/efficient-vit.
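The three compression techniques combine naturally. Below is a compact, generic sketch: standard Hinton-style distillation plus PyTorch's built-in pruning and dynamic quantization utilities, with illustrative stub models and hyperparameters rather than the authors' exact recipe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-target KL against the teacher plus hard cross-entropy on labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
teacher = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

x, y = torch.randn(16, 64), torch.randint(0, 10, (16,))
loss = distillation_loss(student(x), teacher(x).detach(), y)

# Magnitude pruning: zero the 30% smallest weights of the first layer,
# then make the pruning permanent.
prune.l1_unstructured(student[0], name="weight", amount=0.3)
prune.remove(student[0], "weight")

# Dynamic int8 quantization of all Linear layers for deployment.
quantized = torch.quantization.quantize_dynamic(student, {nn.Linear},
                                                dtype=torch.qint8)
print(loss.item(), quantized)
```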

Self-Supervised Pretraining Improves Performance and Inference Efficiency in Multiple Lung Ultrasound Interpretation Tasks

  • paper_url: http://arxiv.org/abs/2309.02596
  • repo_url: None
  • paper_authors: Blake VanBerlo, Brian Li, Jesse Hoey, Alexander Wong
  • for: This study investigates whether self-supervised pretraining can produce a neural network feature extractor applicable to multiple classification tasks in B-mode lung ultrasound analysis.
  • methods: Feature extractors are pretrained with self-supervision and then fine-tuned on three lung ultrasound classification tasks.
  • results: Self-supervised pretraining improves average performance across the lung ultrasound classification tasks and reduces total computation time.
    Abstract In this study, we investigated whether self-supervised pretraining could produce a neural network feature extractor applicable to multiple classification tasks in B-mode lung ultrasound analysis. When fine-tuning on three lung ultrasound tasks, pretrained models resulted in an improvement of the average across-task area under the receiver operating curve (AUC) by 0.032 and 0.061 on local and external test sets respectively. Compact nonlinear classifiers trained on features outputted by a single pretrained model did not improve performance across all tasks; however, they did reduce inference time by 49% compared to serial execution of separate fine-tuned models. When training using 1% of the available labels, pretrained models consistently outperformed fully supervised models, with a maximum observed test AUC increase of 0.396 for the task of view classification. Overall, the results indicate that self-supervised pretraining is useful for producing initial weights for lung ultrasound classifiers.
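As one representative self-supervised objective (the study evaluates pretraining in general; the SimCLR-style contrastive loss shown here is an illustrative choice, not necessarily the one used):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.1):
    """NT-Xent contrastive loss over two augmented views of the same batch:
    each embedding's positive is its counterpart view, all others are negatives."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)          # 2N unit embeddings
    sim = z @ z.T / tau                                  # cosine similarities
    n = z1.shape[0]
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(16, 128), torch.randn(16, 128)      # two augmented views
print(nt_xent(z1, z2).item())
```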

Anatomy-Driven Pathology Detection on Chest X-rays

  • paper_url: http://arxiv.org/abs/2309.02578
  • repo_url: https://github.com/philip-mueller/adpd
  • paper_authors: Philip Müller, Felix Meissen, Johannes Brandt, Georgios Kaissis, Daniel Rueckert
  • for: automatic interpretation of medical scans, such as chest X-rays, and providing a high level of explainability to support radiologists in making informed decisions.
  • methods: uses easy-to-annotate bounding boxes of anatomical regions as proxies for pathologies, and studies two training approaches: supervised training using anatomy-level pathology labels and multiple instance learning (MIL) with image-level pathology labels.
  • results: outperforms weakly supervised methods and fully supervised detection with limited training samples, and the MIL approach is competitive with both baseline approaches, demonstrating the potential of the proposed approach.
    Abstract Pathology detection and delineation enables the automatic interpretation of medical scans such as chest X-rays while providing a high level of explainability to support radiologists in making informed decisions. However, annotating pathology bounding boxes is a time-consuming task such that large public datasets for this purpose are scarce. Current approaches thus use weakly supervised object detection to learn the (rough) localization of pathologies from image-level annotations, which is however limited in performance due to the lack of bounding box supervision. We therefore propose anatomy-driven pathology detection (ADPD), which uses easy-to-annotate bounding boxes of anatomical regions as proxies for pathologies. We study two training approaches: supervised training using anatomy-level pathology labels and multiple instance learning (MIL) with image-level pathology labels. Our results show that our anatomy-level training approach outperforms weakly supervised methods and fully supervised detection with limited training samples, and our MIL approach is competitive with both baseline approaches, therefore demonstrating the potential of our approach.
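A minimal sketch of the MIL variant as we read it; the region features, linear head, and LogSumExp pooling are illustrative assumptions. Per-region pathology logits are aggregated into an image-level prediction trained with image-level labels only:

```python
import torch
import torch.nn as nn

class RegionMIL(nn.Module):
    """Multiple-instance learning over anatomical regions: each region box
    yields pathology logits, pooled by LogSumExp (a smooth max) into
    image-level logits for BCE training with image-level labels."""
    def __init__(self, feat_dim=256, n_pathologies=8):
        super().__init__()
        self.head = nn.Linear(feat_dim, n_pathologies)

    def forward(self, region_feats):            # (regions, feat_dim)
        logits = self.head(region_feats)        # per-region pathology logits
        return torch.logsumexp(logits, dim=0)   # image-level logits

model = RegionMIL()
image_logits = model(torch.randn(12, 256))      # features of 12 anatomical regions
label = torch.zeros(8); label[2] = 1.0          # image-level pathology labels
loss = nn.functional.binary_cross_entropy_with_logits(image_logits, label)
print(loss.item())
```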

Emphysema Subtyping on Thoracic Computed Tomography Scans using Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2309.02576
  • repo_url: https://github.com/diagnijmegen/bodyct-dram-emph-subtype
  • paper_authors: Weiyi Xie, Colin Jacobs, Jean-Paul Charbonnier, Dirk Jan Slebos, Bram van Ginneken
  • for: To automatically identify emphysema subtypes and severity on thoracic CT scans, supporting the management of COPD and the study of disease heterogeneity.
  • methods: A deep learning approach automates the Fleischner Society's visual score system for emphysema subtyping and severity analysis, using a regression training strategy that generates categorical labels while producing high-resolution localized activation maps.
  • results: On 9650 subjects from the COPDGene study, the method reaches 52% predictive accuracy versus 45% for a previously published method, with good agreement to the visual scores; the dense activation maps additionally yield the percentage of emphysema involvement per lung, and the predictions extend beyond centrilobular to paraseptal emphysema subtypes.
    Abstract Accurate identification of emphysema subtypes and severity is crucial for effective management of COPD and the study of disease heterogeneity. Manual analysis of emphysema subtypes and severity is laborious and subjective. To address this challenge, we present a deep learning-based approach for automating the Fleischner Society's visual score system for emphysema subtyping and severity analysis. We trained and evaluated our algorithm using 9650 subjects from the COPDGene study. Our algorithm achieved a predictive accuracy of 52%, outperforming a previously published method's accuracy of 45%. In addition, the agreement between the predicted scores of our method and the visual scores was good, where the previous method obtained only moderate agreement. Our approach employs a regression training strategy to generate categorical labels while simultaneously producing high-resolution localized activation maps for visualizing the network predictions. By leveraging these dense activation maps, our method possesses the capability to compute the percentage of emphysema involvement per lung in addition to categorical severity scores. Furthermore, the proposed method extends its predictive capabilities beyond centrilobular emphysema to include paraseptal emphysema subtypes.

Evaluation Kidney Layer Segmentation on Whole Slide Imaging using Convolutional Neural Networks and Transformers

  • paper_url: http://arxiv.org/abs/2309.02563
  • repo_url: None
  • paper_authors: Muhao Liu, Chenyang Qi, Shunxing Bao, Quan Liu, Ruining Deng, Yu Wang, Shilin Zhao, Haichun Yang, Yuankai Huo
  • for: automated image analysis in renal pathology
  • methods: deep learning-based approaches (CNN and Transformer segmentation)
  • results: Transformer models generally outperform CNN-based models, achieving a decent Mean Intersection over Union (mIoU) index and enabling quantitative evaluation of renal cortical structures.
    Abstract The segmentation of kidney layer structures, including cortex, outer stripe, inner stripe, and inner medulla within human kidney whole slide images (WSI) plays an essential role in automated image analysis in renal pathology. However, the current manual segmentation process proves labor-intensive and infeasible for handling the extensive digital pathology images encountered at a large scale. In response, the realm of digital renal pathology has seen the emergence of deep learning-based methodologies. However, very few, if any, deep learning based approaches have been applied to kidney layer structure segmentation. Addressing this gap, this paper assesses the feasibility of deep learning based approaches for kidney layer structure segmentation. This study employs representative convolutional neural network (CNN) and Transformer segmentation approaches, including Swin-Unet, Medical-Transformer, TransUNet, U-Net, PSPNet, and DeepLabv3+. We quantitatively evaluated six prevalent deep learning models on renal cortex layer segmentation using mice kidney WSIs. The empirical results stemming from our approach exhibit compelling advancements, as evidenced by a decent Mean Intersection over Union (mIoU) index. The results demonstrate that Transformer models generally outperform CNN-based models. By enabling a quantitative evaluation of renal cortical structures, deep learning approaches promise to empower medical professionals to perform more informed kidney layer segmentation.
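For reference, the mIoU metric used in this evaluation can be computed as follows (class count and label maps below are placeholders):

```python
import numpy as np

def mean_iou(pred, gt, n_classes):
    """Mean Intersection over Union across segmentation classes; pred and gt
    are integer label maps of the same shape."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                       # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 4, size=(64, 64))   # e.g., 4 kidney-layer classes
gt = np.random.randint(0, 4, size=(64, 64))
print(f"mIoU = {mean_iou(pred, gt, 4):.3f}")
```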

Domain Adaptation for Efficiently Fine-tuning Vision Transformer with Encrypted Images

  • paper_url: http://arxiv.org/abs/2309.02556
  • repo_url: None
  • paper_authors: Teru Nagamori, Sayaka Shiota, Hitoshi Kiya
  • for: Applications such as privacy-preserving learning, access control, and adversarial defenses, where models are trained or fine-tuned with transformed (encrypted) images.
  • methods: A domain adaptation method for fine-tuning vision transformer (ViT) models with encrypted images, carried out on the basis of the embedding structure of ViT, so that model accuracy does not degrade.
  • results: Experiments on the CIFAR-10 and CIFAR-100 datasets confirm that the proposed method prevents accuracy degradation even when using encrypted images.
    Abstract In recent years, deep neural networks (DNNs) trained with transformed data have been applied to various applications such as privacy-preserving learning, access control, and adversarial defenses. However, the use of transformed data decreases the performance of models. Accordingly, in this paper, we propose a novel method for fine-tuning models with transformed images under the use of the vision transformer (ViT). The proposed domain adaptation method does not cause the accuracy degradation of models, and it is carried out on the basis of the embedding structure of ViT. In experiments, we confirmed that the proposed method prevents accuracy degradation even when using encrypted images with the CIFAR-10 and CIFAR-100 datasets.

A Survey of the Impact of Self-Supervised Pretraining for Diagnostic Tasks with Radiological Images

  • paper_url: http://arxiv.org/abs/2309.02555
  • repo_url: None
  • paper_authors: Blake VanBerlo, Jesse Hoey, Alexander Wong
  • for: This review examines the effect of self-supervised pretraining on diagnostic tasks with radiological images, comparing self-supervised pretraining against fully supervised learning for classification and segmentation.
  • methods: The surveyed studies employ a range of self-supervised pretraining methods, including contrastive learning, across X-ray, computed tomography, magnetic resonance, and ultrasound imaging.
  • results: Self-supervised pretraining generally improves downstream task performance, most prominently when unlabelled examples greatly outnumber labelled examples, and can reduce data and computation requirements.
    Abstract Self-supervised pretraining has been observed to be effective at improving feature representations for transfer learning, leveraging large amounts of unlabelled data. This review summarizes recent research into its usage in X-ray, computed tomography, magnetic resonance, and ultrasound imaging, concentrating on studies that compare self-supervised pretraining to fully supervised learning for diagnostic tasks such as classification and segmentation. The most pertinent finding is that self-supervised pretraining generally improves downstream task performance compared to full supervision, most prominently when unlabelled examples greatly outnumber labelled examples. Based on the aggregate evidence, recommendations are provided for practitioners considering using self-supervised learning. Motivated by limitations identified in current research, directions and practices for future study are suggested, such as integrating clinical knowledge with theoretically justified self-supervised learning methods, evaluating on public datasets, growing the modest body of evidence for ultrasound, and characterizing the impact of self-supervised pretraining on generalization.

A skeletonization algorithm for gradient-based optimization

  • paper_url: http://arxiv.org/abs/2309.02527
  • repo_url: https://github.com/martinmenten/skeletonization-for-gradient-based-optimization
  • paper_authors: Martin J. Menten, Johannes C. Paetzold, Veronika A. Zimmer, Suprosanna Shit, Ivan Ezhov, Robbie Holland, Monika Probst, Julia A. Schnabel, Daniel Rueckert
  • for: This paper aims to propose a three-dimensional skeletonization algorithm that is compatible with gradient-based optimization and preserves the object’s topology.
  • methods: The proposed method is based on matrix additions and multiplications, convolutional operations, basic non-linear functions, and sampling from a uniform probability distribution, which makes it easy to implement in any major deep learning library.
  • results: The authors demonstrate the advantages of their skeletonization algorithm compared to non-differentiable, morphological, and neural-network-based baselines through benchmarking experiments. They also integrate the algorithm with two medical image processing applications that use gradient-based optimization, including deep-learning-based blood vessel segmentation and multimodal registration of the mandible in computed tomography and magnetic resonance images.
    Abstract The skeleton of a digital image is a compact representation of its topology, geometry, and scale. It has utility in many computer vision applications, such as image description, segmentation, and registration. However, skeletonization has only seen limited use in contemporary deep learning solutions. Most existing skeletonization algorithms are not differentiable, making it impossible to integrate them with gradient-based optimization. Compatible algorithms based on morphological operations and neural networks have been proposed, but their results often deviate from the geometry and topology of the true medial axis. This work introduces the first three-dimensional skeletonization algorithm that is both compatible with gradient-based optimization and preserves an object's topology. Our method is exclusively based on matrix additions and multiplications, convolutional operations, basic non-linear functions, and sampling from a uniform probability distribution, allowing it to be easily implemented in any major deep learning library. In benchmarking experiments, we prove the advantages of our skeletonization algorithm compared to non-differentiable, morphological, and neural-network-based baselines. Finally, we demonstrate the utility of our algorithm by integrating it with two medical image processing applications that use gradient-based optimization: deep-learning-based blood vessel segmentation, and multimodal registration of the mandible in computed tomography and magnetic resonance images.
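For contrast with earlier morphology-based approximations, the clDice-style soft skeleton below is built only from min/max pooling and ReLU, so it is gradient-friendly; it is explicitly not this paper's method, which additionally preserves topology and uses sampling from a uniform distribution:

```python
import torch
import torch.nn.functional as F

def soft_erode(img):
    """Soft 3D erosion: min over an axis-aligned cross neighborhood."""
    p1 = -F.max_pool3d(-img, (3, 1, 1), 1, (1, 0, 0))
    p2 = -F.max_pool3d(-img, (1, 3, 1), 1, (0, 1, 0))
    p3 = -F.max_pool3d(-img, (1, 1, 3), 1, (0, 0, 1))
    return torch.min(torch.min(p1, p2), p3)

def soft_dilate(img):
    """Soft 3D dilation: max over a 3x3x3 neighborhood."""
    return F.max_pool3d(img, 3, 1, 1)

def soft_skeleton(img, iters=10):
    """Iteratively peel the soft foreground and accumulate what erosion
    removes but opening would not restore (the clDice construction)."""
    img1 = soft_dilate(soft_erode(img))
    skel = F.relu(img - img1)
    for _ in range(iters):
        img = soft_erode(img)
        img1 = soft_dilate(soft_erode(img))
        skel = skel + F.relu(F.relu(img - img1) * (1.0 - skel))
    return skel

vol = torch.zeros(1, 1, 32, 32, 32)           # (batch, channel, D, H, W)
vol[..., 14:18, 14:18, 4:28] = 1.0            # a thick tube
print(soft_skeleton(vol).sum().item())        # roughly the tube's centerline mass
```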

GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction

  • paper_url: http://arxiv.org/abs/2309.02436
  • repo_url: https://github.com/youmi-zym/go-slam
  • paper_authors: Youmin Zhang, Fabio Tosi, Stefano Mattoccia, Matteo Poggi
  • for: A deep-learning-based dense visual SLAM framework that globally optimizes poses and 3D reconstruction in real time.
  • methods: Robust pose estimation supported by efficient loop closing and online full bundle adjustment, optimizing each frame with the learned global geometry of the complete history of input frames, while the implicit, continuous surface representation is updated on the fly to keep the 3D reconstruction globally consistent.
  • results: On a variety of synthetic and real-world datasets, GO-SLAM outperforms state-of-the-art approaches in tracking robustness and reconstruction accuracy, and it runs with monocular, stereo, and RGB-D input.
    Abstract Neural implicit representations have recently demonstrated compelling results on dense Simultaneous Localization And Mapping (SLAM) but suffer from the accumulation of errors in camera tracking and distortion in the reconstruction. To address this, we present GO-SLAM, a deep-learning-based dense visual SLAM framework globally optimizing poses and 3D reconstruction in real-time. Robust pose estimation is at its core, supported by efficient loop closing and online full bundle adjustment, which optimize per frame by utilizing the learned global geometry of the complete history of input frames. Simultaneously, we update the implicit and continuous surface representation on-the-fly to ensure global consistency of 3D reconstruction. Results on various synthetic and real-world datasets demonstrate that GO-SLAM outperforms state-of-the-art approaches at tracking robustness and reconstruction accuracy. Furthermore, GO-SLAM is versatile and can run with monocular, stereo, and RGB-D input.

ReliTalk: Relightable Talking Portrait Generation from a Single Video

  • paper_url: http://arxiv.org/abs/2309.02434
  • repo_url: https://github.com/arthur-qiu/ReliTalk
  • paper_authors: Haonan Qiu, Zhaoxi Chen, Yuming Jiang, Hang Zhou, Xiangyu Fan, Lei Yang, Wayne Wu, Ziwei Liu
  • for: Generating relightable audio-driven talking portraits from a single monocular video.
  • methods: Audio features drive the prediction of facial normal maps through implicit functions; these normals support decomposing the portrait's reflectance and dynamically estimating the lighting condition of the given video.
  • results: Experiments demonstrate the superiority of the method, which generates high-quality audio-driven portraits from single videos and adapts them to different backgrounds and lighting conditions.
    Abstract Recent years have witnessed great progress in creating vivid audio-driven portraits from monocular videos. However, how to seamlessly adapt the created video avatars to other scenarios with different backgrounds and lighting conditions remains unsolved. On the other hand, existing relighting studies mostly rely on dynamically lighted or multi-view data, which are too expensive for creating video portraits. To bridge this gap, we propose ReliTalk, a novel framework for relightable audio-driven talking portrait generation from monocular videos. Our key insight is to decompose the portrait's reflectance from implicitly learned audio-driven facial normals and images. Specifically, we involve 3D facial priors derived from audio features to predict delicate normal maps through implicit functions. These initially predicted normals then take a crucial part in reflectance decomposition by dynamically estimating the lighting condition of the given video. Moreover, the stereoscopic face representation is refined using the identity-consistent loss under simulated multiple lighting conditions, addressing the ill-posed problem caused by limited views available from a single monocular video. Extensive experiments validate the superiority of our proposed framework on both real and synthetic datasets. Our code is released in https://github.com/arthur-qiu/ReliTalk.

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

  • paper_url: http://arxiv.org/abs/2309.02423
  • repo_url: None
  • paper_authors: Yue Xu, Yong-Lu Li, Zhemin Huang, Michael Xu Liu, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang
  • for: To improve egocentric hand-object interaction (Ego-HOI) recognition and address the domain gap that arises from building current research on resources derived from third-person video action recognition.
  • methods: A new framework, Probing, Curation and Adaption (EgoPCA), contributes comprehensive pre-training sets, balanced test sets, and a new baseline, complete with a training-finetuning strategy.
  • results: The framework achieves state-of-the-art performance on Ego-HOI benchmarks and establishes new, effective mechanisms and settings to advance further research.
    Abstract With the surge in attention to Egocentric Hand-Object Interaction (Ego-HOI), large-scale datasets such as Ego4D and EPIC-KITCHENS have been proposed. However, most current research is built on resources derived from third-person video action recognition. This inherent domain gap between first- and third-person action videos, which have not been adequately addressed before, makes current Ego-HOI suboptimal. This paper rethinks and proposes a new framework as an infrastructure to advance Ego-HOI recognition by Probing, Curation and Adaption (EgoPCA). We contribute comprehensive pre-train sets, balanced test sets and a new baseline, which are complete with a training-finetuning strategy. With our new framework, we not only achieve state-of-the-art performance on Ego-HOI benchmarks but also build several new and effective mechanisms and settings to advance further research. We believe our data and the findings will pave a new way for Ego-HOI understanding. Code and data are available at https://mvig-rhos.com/ego_pca

Doppelgangers: Learning to Disambiguate Images of Similar Structures

  • paper_url: http://arxiv.org/abs/2309.02420
  • repo_url: https://github.com/RuojinCai/Doppelgangers
  • paper_authors: Ruojin Cai, Joseph Tung, Qianqian Wang, Hadar Averbuch-Elor, Bharath Hariharan, Noah Snavely
  • for: This paper addresses the visual disambiguation task of determining whether a pair of visually similar images depict the same or distinct 3D surfaces (e.g., the same or opposite sides of a symmetric building).
  • methods: A learning-based approach formulates the problem as binary classification on image pairs; a new dataset, Doppelgangers, provides visually similar image pairs with ground-truth labels, and a network architecture takes the spatial distribution of local keypoints and matches as input to reason about both local and global cues.
  • results: The method distinguishes illusory matches in difficult cases and can be integrated into SfM pipelines to produce correct, disambiguated 3D reconstructions; code, datasets, and further results are available at http://doppelgangers-3d.github.io.
    Abstract We consider the visual disambiguation task of determining whether a pair of visually similar images depict the same or distinct 3D surfaces (e.g., the same or opposite sides of a symmetric building). Illusory image matches, where two images observe distinct but visually similar 3D surfaces, can be challenging for humans to differentiate, and can also lead 3D reconstruction algorithms to produce erroneous results. We propose a learning-based approach to visual disambiguation, formulating it as a binary classification task on image pairs. To that end, we introduce a new dataset for this problem, Doppelgangers, which includes image pairs of similar structures with ground truth labels. We also design a network architecture that takes the spatial distribution of local keypoints and matches as input, allowing for better reasoning about both local and global cues. Our evaluation shows that our method can distinguish illusory matches in difficult cases, and can be integrated into SfM pipelines to produce correct, disambiguated 3D reconstructions. See our project page for our code, datasets, and more results: http://doppelgangers-3d.github.io/.

Generating Realistic Images from In-the-wild Sounds

  • paper_url: http://arxiv.org/abs/2309.02405
  • repo_url: https://github.com/etilelab/Generating-Realistic-Images-from-In-the-wild-Sounds
  • paper_authors: Taegyeong Lee, Jeonghun Kang, Hyeonyu Kim, Taehwan Kim
  • for: To generate images from in-the-wild sounds, for which paired sound-image datasets are lacking.
  • methods: Sound is first converted into text via audio captioning; audio attention and sentence attention represent the rich characteristics of the sound, a direct sound optimization with CLIPscore and AudioCLIP refines the representation, and a diffusion-based model generates the image.
  • results: Experiments show that the model generates high-quality images from in-the-wild sounds, outperforming baselines in both quantitative and qualitative evaluations on wild audio datasets.
    Abstract Representing wild sounds as images is an important but challenging task due to the lack of paired datasets between sound and images and the significant differences in the characteristics of these two modalities. Previous studies have focused on generating images from sound in limited categories or music. In this paper, we propose a novel approach to generate images from in-the-wild sounds. First, we convert sound into text using audio captioning. Second, we propose audio attention and sentence attention to represent the rich characteristics of sound and visualize the sound. Lastly, we propose a direct sound optimization with CLIPscore and AudioCLIP and generate images with a diffusion-based model. In experiments, it shows that our model is able to generate high quality images from wild sounds and outperforms baselines in both quantitative and qualitative evaluations on wild audio datasets.

Voice Morphing: Two Identities in One Voice

  • paper_url: http://arxiv.org/abs/2309.02404
  • repo_url: https://github.com/rprokap/pset-9
  • paper_authors: Sushanta K. Pani, Anurag Chowdhury, Morgan Sandler, Arun Ross
  • for: This study investigates Voice Identity Morphing (VIM), a voice-based morph attack that synthesizes speech samples impersonating the voice characteristics of a pair of individuals.
  • methods: Two popular speaker recognition systems, ECAPA-TDNN and x-vector, are evaluated against VIM on the Librispeech dataset.
  • results: VIM achieves a success rate (MMPMR) of over 80% at a false match rate of 1% against both systems.
    Abstract In a biometric system, each biometric sample or template is typically associated with a single identity. However, recent research has demonstrated the possibility of generating "morph" biometric samples that can successfully match more than a single identity. Morph attacks are now recognized as a potential security threat to biometric systems. However, most morph attacks have been studied on biometric modalities operating in the image domain, such as face, fingerprint, and iris. In this preliminary work, we introduce Voice Identity Morphing (VIM) - a voice-based morph attack that can synthesize speech samples that impersonate the voice characteristics of a pair of individuals. Our experiments evaluate the vulnerabilities of two popular speaker recognition systems, ECAPA-TDNN and x-vector, to VIM, with a success rate (MMPMR) of over 80% at a false match rate of 1% on the Librispeech dataset.

Prototype-based Dataset Comparison

  • paper_url: http://arxiv.org/abs/2309.02401
  • repo_url: https://github.com/nanne/protosim
  • paper_authors: Nanne van Noord
  • for: To extend dataset inspection beyond the most prominent visual concepts of a single dataset by comparing multiple datasets.
  • methods: A module learns concept-level prototypes across datasets, discovered without supervision via self-supervised learning, with the benefits demonstrated in two case studies.
  • results: Dataset comparison extends dataset inspection to richer visual concepts, and the authors hope to encourage more work in this direction.
    Abstract Dataset summarisation is a fruitful approach to dataset inspection. However, when applied to a single dataset the discovery of visual concepts is restricted to those most prominent. We argue that a comparative approach can expand upon this paradigm to enable richer forms of dataset inspection that go beyond the most prominent concepts. To enable dataset comparison we present a module that learns concept-level prototypes across datasets. We leverage self-supervised learning to discover these prototypes without supervision, and we demonstrate the benefits of our approach in two case-studies. Our findings show that dataset comparison extends dataset inspection and we hope to encourage more works in this direction. Code and usage instructions available at https://github.com/Nanne/ProtoSim

STEP – Towards Structured Scene-Text Spotting

  • paper_url: http://arxiv.org/abs/2309.02356
  • repo_url: None
  • paper_authors: Sergi Garcia-Bordils, Dimosthenis Karatzas, Marçal Rusiñol
  • for: A structured scene-text spotting task in which an OCR system spots text in the wild according to a user-provided regular expression, dynamically conditioning both scene-text detection and recognition.
  • methods: The Structured TExt sPotter (STEP) model exploits the provided text structure to guide the OCR process; it handles regular expressions that contain spaces and is not bound to word-level detection granularity.
  • results: The approach delivers accurate zero-shot structured text spotting in a wide variety of real-world reading scenarios while being trained solely on publicly available data; on a new challenging test set covering out-of-vocabulary structured text such as prices, dates, serial numbers, and license plates, STEP provides specialised OCR performance on demand in all tested scenarios.
    Abstract We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression. Contrary to generic scene text OCR, structured scene-text spotting seeks to dynamically condition both scene text detection and recognition on user-provided regular expressions. To tackle this task, we propose the Structured TExt sPotter (STEP), a model that exploits the provided text structure to guide the OCR process. STEP is able to deal with regular expressions that contain spaces and it is not bound to detection at the word-level granularity. Our approach enables accurate zero-shot structured text spotting in a wide variety of real-world reading scenarios and is solely trained on publicly available data. To demonstrate the effectiveness of our approach, we introduce a new challenging test dataset that contains several types of out-of-vocabulary structured text, reflecting important reading applications of fields such as prices, dates, serial numbers, license plates etc. We demonstrate that STEP can provide specialised OCR performance on demand in all tested scenarios.

Generating Infinite-Resolution Texture using GANs with Patch-by-Patch Paradigm

  • paper_url: http://arxiv.org/abs/2309.02340
  • repo_url: https://github.com/ai4netzero/infinite_texture_gans
  • paper_authors: Alhasan Abdellatif, Ahmed H. Elsheikh
  • for: Generating texture images of effectively infinite resolution.
  • methods: GANs trained on a single texture image using a patch-by-patch paradigm (the underlying fully convolutional principle is sketched after the abstract).
  • results: More scalable and flexible than existing approaches, generating textures of arbitrary size at a constant GPU memory footprint while maintaining visual coherence and diversity.
    Abstract In this paper, we introduce a novel approach for generating texture images of infinite resolutions using Generative Adversarial Networks (GANs) based on a patch-by-patch paradigm. Existing texture synthesis techniques often rely on generating a large-scale texture in a single forward pass through the generating model, which limits the scalability and flexibility of the generated images. In contrast, the proposed approach trains GANs models on a single texture image to generate relatively small patches that are locally correlated and can be seamlessly concatenated to form a larger image while using a constant GPU memory footprint. Our method learns the local texture structure and is able to generate arbitrary-size textures, while also maintaining coherence and diversity. The proposed method relies on local padding in the generator to ensure consistency between patches and utilizes spatial stochastic modulation to allow for local variations and diversity within the large-scale image. Experimental results demonstrate superior scalability compared to existing approaches while maintaining visual coherence of generated textures.
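The scalability argument rests on full convolutionality: a generator defined over a spatial latent grid can be evaluated on a larger grid, or patch by patch with locally padded borders, to produce arbitrarily large yet locally coherent texture. A minimal sketch under an assumed architecture (not the paper's model):

```python
import torch
import torch.nn as nn

# A fully convolutional generator maps a spatial latent grid to texture, so a
# larger latent grid yields a larger, locally coherent texture; the paper
# additionally generates patch by patch with local padding so GPU memory
# stays constant regardless of output size.
G = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),
)

z_small = torch.randn(1, 64, 8, 8)     # training-scale latent grid -> 32x32 patch
z_big = torch.randn(1, 64, 64, 64)     # inference: bigger grid -> 256x256 texture
print(G(z_small).shape, G(z_big).shape)
```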

DEEPBEAS3D: Deep Learning and B-Spline Explicit Active Surfaces

  • paper_url: http://arxiv.org/abs/2309.02335
  • repo_url: None
  • paper_authors: Helena Williams, João Pedrosa, Muhammad Asad, Laura Cattani, Tom Vercauteren, Jan Deprest, Jan D’hooge
  • for: Improving the robustness of automatic segmentation methods so that they can be applied directly in the clinic.
  • methods: Representing CNN segmentations as a B-spline explicit active surface (BEAS), which keeps the 3D segmentation smooth and anatomically plausible while allowing the user to precisely edit the 3D surface.
  • results: Compared with the clinical tool 4D View VOCAL, the proposed framework reduces the perceived workload (NASA-TLX index) by 30% and user time by 70% (p < 0.00001).
    Abstract Deep learning-based automatic segmentation methods have become state-of-the-art. However, they are often not robust enough for direct clinical application, as domain shifts between training and testing data affect their performance. Failure in automatic segmentation can cause sub-optimal results that require correction. To address these problems, we propose a novel 3D extension of an interactive segmentation framework that represents a segmentation from a convolutional neural network (CNN) as a B-spline explicit active surface (BEAS). BEAS ensures segmentations are smooth in 3D space, increasing anatomical plausibility, while allowing the user to precisely edit the 3D surface. We apply this framework to the task of 3D segmentation of the anal sphincter complex (AS) from transperineal ultrasound (TPUS) images, and compare it to the clinical tool used in the pelvic floor disorder clinic (4D View VOCAL, GE Healthcare; Zipf, Austria). Experimental results show that: 1) the proposed framework gives the user explicit control of the surface contour; 2) the perceived workload calculated via the NASA-TLX index was reduced by 30% compared to VOCAL; and 3) it required 70% (170 seconds) less user time than VOCAL (p < 0.00001).
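
For intuition, the sketch below evaluates a clamped B-spline from user-editable control points with SciPy. It is a 2D stand-in for BEAS, which represents an explicit active *surface* in 3D, but it illustrates why the representation is smooth by construction and editable through a handful of control points.

```python
# A minimal sketch of the B-spline representation behind BEAS (assumption:
# SciPy stands in for the paper's implementation; shown for a 2D contour,
# whereas BEAS represents an explicit 3D active surface).
import numpy as np
from scipy.interpolate import BSpline

k = 3                                   # cubic B-spline
ctrl = np.array([[0., 0.], [1., 2.], [2., 3.],  # control points the user can
                 [4., 3.], [5., 1.], [6., 0.]]) # drag to edit the contour
n = len(ctrl)
# Clamped knot vector so the curve passes through the end control points.
t = np.concatenate([np.zeros(k), np.linspace(0, 1, n - k + 1), np.ones(k)])
spline = BSpline(t, ctrl, k)

u = np.linspace(0, 1, 100)
contour = spline(u)                     # smooth (100, 2) contour samples
print(contour[0], contour[-1])          # endpoints match first/last ctrl point
```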

TiAVox: Time-aware Attenuation Voxels for Sparse-view 4D DSA Reconstruction

  • paper_url: http://arxiv.org/abs/2309.02318
  • repo_url: None
  • paper_authors: Zhenghong Zhou, Huangxuan Zhao, Jiemin Fang, Dongqiao Xiang, Lei Chen, Lingxia Wu, Feihong Wu, Wenyu Liu, Chuansheng Zheng, Xinggang Wang
  • for: This paper proposes a novel approach for sparse-view 4D digital subtraction angiography (DSA) reconstruction that reduces the radiation dose while maintaining high-quality imaging results.
  • methods: The proposed Time-aware Attenuation Voxel (TiAVox) approach utilizes 4D attenuation voxel grids to model the attenuation properties of both the spatial and temporal dimensions. It is optimized by minimizing discrepancies between the rendered images and sparse 2D DSA images, without relying on any neural network.
  • results: TiAVox achieved a 31.23 Peak Signal-to-Noise Ratio (PSNR) for novel view synthesis using only 30 views on a clinically sourced dataset, outperforming the traditional Feldkamp-Davis-Kress method, which required 133 views. On a synthetic dataset, TiAVox yielded a PSNR of 34.32 for novel view synthesis and 41.40 for 3D reconstruction using merely 10 views.
    Abstract Four-dimensional Digital Subtraction Angiography (4D DSA) plays a critical role in the diagnosis of many medical diseases, such as Arteriovenous Malformations (AVM) and Arteriovenous Fistulas (AVF). Despite its significant application value, the reconstruction of 4D DSA demands numerous views to effectively model the intricate vessels and radiocontrast flow, thereby implying a significant radiation dose. To address this high radiation issue, we propose a Time-aware Attenuation Voxel (TiAVox) approach for sparse-view 4D DSA reconstruction, which paves the way for high-quality 4D imaging. Additionally, 2D and 3D DSA imaging results can be generated from the reconstructed 4D DSA images. TiAVox introduces 4D attenuation voxel grids, which reflect attenuation properties from both spatial and temporal dimensions. It is optimized by minimizing discrepancies between the rendered images and sparse 2D DSA images. Without any neural network involved, TiAVox enjoys specific physical interpretability. The parameters of each learnable voxel represent the attenuation coefficients. We validated the TiAVox approach on both clinical and simulated datasets, achieving a 31.23 Peak Signal-to-Noise Ratio (PSNR) for novel view synthesis using only 30 views on the clinically sourced dataset, whereas traditional Feldkamp-Davis-Kress methods required 133 views. Similarly, with merely 10 views from the synthetic dataset, TiAVox yielded a PSNR of 34.32 for novel view synthesis and 41.40 for 3D reconstruction. We also executed ablation studies to corroborate the essential components of TiAVox. The code will be publicly available.
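
The core idea, directly optimizing learnable attenuation values against sparse projections with no neural network, can be shown on a toy problem. In the sketch below, a 2D grid and axis-aligned line integrals stand in for the paper's 4D voxel grid and DSA rendering, and the temporal dimension is omitted.

```python
# A toy sketch of TiAVox-style optimization: fit attenuation voxels by
# gradient descent on the discrepancy between rendered and observed
# projections (assumptions noted in the lead-in above).
import numpy as np

rng = np.random.default_rng(0)
target = rng.random((16, 16))                      # unknown attenuation field
views = [target.sum(axis=0), target.sum(axis=1)]   # two sparse "projections"

grid = np.zeros((16, 16))                          # learnable attenuation voxels
lr = 0.01
for step in range(2000):
    r0 = grid.sum(axis=0) - views[0]               # rendering residuals
    r1 = grid.sum(axis=1) - views[1]
    grid -= lr * (r0[None, :] + r1[:, None])       # gradient of 0.5*(|r0|^2+|r1|^2)
print(f"projection error: {np.abs(grid.sum(axis=0) - views[0]).mean():.4f}")
```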

CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning

  • paper_url: http://arxiv.org/abs/2309.02301
  • repo_url: None
  • paper_authors: Hongyu Hu, Jiyuan Zhang, Minyi Zhao, Zhenbang Sun
  • for: Addressing the hallucination phenomenon in Large Vision-Language Models (LVLMs) by introducing a Contrastive Instruction Evaluation Method (CIEM) and a new instruction tuning method, Contrastive Instruction Tuning (CIT).
  • methods: An automatic pipeline that couples an annotated image-text dataset with a Large Language Model (LLM) to generate factual/contrastive question-answer pairs for evaluating hallucination in LVLMs. Building on CIEM, CIT automatically produces high-quality factual/contrastive question-answer pairs and corresponding justifications for model tuning.
  • results: Extensive experiments show that CIEM reliably exposes the hallucination issues of existing VLMs, and that CIT-tuned VLMs outperform their counterparts on both CIEM and public datasets.
    Abstract Nowadays, the research on Large Vision-Language Models (LVLMs) has been significantly promoted thanks to the success of Large Language Models (LLM). Nevertheless, these Vision-Language Models (VLMs) are suffering from the drawback of hallucination -- due to insufficient understanding of vision and language modalities, VLMs may generate incorrect perception information when doing downstream applications, for example, captioning a non-existent entity. To address the hallucination phenomenon, on the one hand, we introduce a Contrastive Instruction Evaluation Method (CIEM), which is an automatic pipeline that leverages an annotated image-text dataset coupled with an LLM to generate factual/contrastive question-answer pairs for the evaluation of the hallucination of VLMs. On the other hand, based on CIEM, we further propose a new instruction tuning method called CIT (the abbreviation of Contrastive Instruction Tuning) to alleviate the hallucination of VLMs by automatically producing high-quality factual/contrastive question-answer pairs and corresponding justifications for model tuning. Through extensive experiments on CIEM and CIT, we pinpoint the hallucination issues commonly present in existing VLMs, the disability of the current instruction-tuning dataset to handle the hallucination phenomenon and the superiority of CIT-tuned VLMs over both CIEM and public datasets.
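
The contrastive idea can be illustrated with templated questions, used here purely as a stand-in for the LLM-generated pairs in the CIEM pipeline: factual questions probe annotated objects, while contrastive questions probe objects known to be absent, which is where hallucinating VLMs tend to fail.

```python
# A minimal sketch of building factual/contrastive QA pairs from image
# annotations (templated questions are an assumption, not the paper's LLM).
annotated_objects = {"dog", "frisbee", "grass"}   # objects present in an image
distractors = {"cat", "umbrella"}                  # objects NOT in the image

pairs = []
for obj in sorted(annotated_objects):  # factual questions -> expected "yes"
    pairs.append((f"Is there a {obj} in the image?", "yes"))
for obj in sorted(distractors):        # contrastive questions -> expected "no";
    pairs.append((f"Is there a {obj} in the image?", "no"))
for q, a in pairs:                     # a hallucinating VLM tends to answer
    print(q, "->", a)                  # "yes" to the contrastive questions
```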

ATM: Action Temporality Modeling for Video Question Answering

  • paper_url: http://arxiv.org/abs/2309.02290
  • repo_url: None
  • paper_authors: Junwen Chen, Jie Zhu, Yu Kong
  • for: Improving causal/temporal reasoning in video question answering (VideoQA), where existing methods fall short on questions that require reasoning about temporality across frames.
  • methods: Action Temporality Modeling (ATM), built on three ideas: (1) rethinking optical flow and showing that it is effective for long-horizon temporality reasoning; (2) training the visual-text embedding with action-centric contrastive learning, yielding better action representations in both the vision and text modalities; and (3) preventing the model from answering questions given shuffled videos during fine-tuning, to avoid spurious correlations between appearance and motion.
  • results: Experiments show that ATM outperforms previous approaches in accuracy on multiple VideoQA tasks and exhibits better true temporality reasoning ability.
    Abstract Despite significant progress in video question answering (VideoQA), existing methods fall short of questions that require causal/temporal reasoning across frames. This can be attributed to imprecise motion representations. We introduce Action Temporality Modeling (ATM) for temporality reasoning via three-fold uniqueness: (1) rethinking the optical flow and realizing that optical flow is effective in capturing the long horizon temporality reasoning; (2) training the visual-text embedding by contrastive learning in an action-centric manner, leading to better action representations in both vision and text modalities; and (3) preventing the model from answering the question given the shuffled video in the fine-tuning stage, to avoid spurious correlation between appearance and motion and hence ensure faithful temporality reasoning. In the experiments, we show that ATM outperforms previous approaches in terms of the accuracy on multiple VideoQAs and exhibits better true temporality reasoning ability.

Haystack: A Panoptic Scene Graph Dataset to Evaluate Rare Predicate Classes

  • paper_url: http://arxiv.org/abs/2309.02286
  • repo_url: None
  • paper_authors: Julian Lorenz, Florian Barthel, Daniel Kienzle, Rainer Lienhart
  • for: Building a new panoptic scene graph dataset and a set of metrics to benchmark the predictive performance of scene graph generation models, especially on rare predicate classes.
  • methods: A model-assisted annotation pipeline that efficiently finds rare predicate classes hidden in large sets of images. Unlike existing scene graph datasets, Haystack contains explicit negative annotations, i.e., annotations that a given relation does not have a certain predicate class.
  • results: Haystack is fully compatible with existing panoptic scene graph datasets and evaluation pipelines, and opens up new possibilities for improving scene graph generation models on rare predicate classes.
    Abstract Current scene graph datasets suffer from strong long-tail distributions of their predicate classes. Due to a very low number of some predicate classes in the test sets, no reliable metrics can be retrieved for the rarest classes. We construct a new panoptic scene graph dataset and a set of metrics that are designed as a benchmark for the predictive performance especially on rare predicate classes. To construct the new dataset, we propose a model-assisted annotation pipeline that efficiently finds rare predicate classes that are hidden in a large set of images like needles in a haystack. Contrary to prior scene graph datasets, Haystack contains explicit negative annotations, i.e. annotations that a given relation does not have a certain predicate class. Negative annotations are helpful especially in the field of scene graph generation and open up a whole new set of possibilities to improve current scene graph generation models. Haystack is 100% compatible with existing panoptic scene graph datasets and can easily be integrated with existing evaluation pipelines. Our dataset and code can be found here: https://lorjul.github.io/haystack/. It includes annotation files and simple to use scripts and utilities, to help with integrating our dataset in existing work.
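
The value of explicit negatives is easy to see in a toy evaluation (simplified relation triplets, not the paper's exact metrics): without negative annotations, an unannotated prediction can be neither confirmed nor penalized.

```python
# A minimal sketch of how explicit negative annotations change evaluation.
positives = {("person", "riding", "horse")}          # annotated as true
negatives = {("person", "feeding", "horse")}         # annotated as NOT true

predictions = [("person", "riding", "horse"),
               ("person", "feeding", "horse"),
               ("person", "watching", "horse")]      # unannotated: unknown

for p in predictions:
    if p in positives:
        verdict = "correct"
    elif p in negatives:
        verdict = "wrong (explicit negative)"
    else:
        verdict = "unknown (cannot be penalized without negatives)"
    print(p, "->", verdict)
```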

SAM-Deblur: Let Segment Anything Boost Image Deblurring

  • paper_url: http://arxiv.org/abs/2309.02270
  • repo_url: None
  • paper_authors: Siwei Li, Mingxuan Liu, Yating Zhang, Shu Chen, Haoxiang Li, Hong Chen, Zifei Dou
  • for: Tackling image restoration under non-uniform blurring by injecting prior knowledge from the Segment Anything Model (SAM) to improve the generalization of deblurring models.
  • methods: The proposed SAM-Deblur framework integrates SAM priors into the deblurring task, introducing a Mask Average Pooling (MAP) unit that fuses SAM-generated segmented areas as a plug-and-play component, together with a mask dropout training method for robustness.
  • results: Incorporating the method improves NAFNet's PSNR by 0.05 on RealBlurJ, 0.96 on ReloBlur, and 7.03 on REDS.
    Abstract Image deblurring is a critical task in the field of image restoration, aiming to eliminate blurring artifacts. However, the challenge of addressing non-uniform blurring leads to an ill-posed problem, which limits the generalization performance of existing deblurring models. To solve the problem, we propose a framework SAM-Deblur, integrating prior knowledge from the Segment Anything Model (SAM) into the deblurring task for the first time. In particular, SAM-Deblur is divided into three stages. First, We preprocess the blurred images, obtain image masks via SAM, and propose a mask dropout method for training to enhance model robustness. Then, to fully leverage the structural priors generated by SAM, we propose a Mask Average Pooling (MAP) unit specifically designed to average SAM-generated segmented areas, serving as a plug-and-play component which can be seamlessly integrated into existing deblurring networks. Finally, we feed the fused features generated by the MAP Unit into the deblurring model to obtain a sharp image. Experimental results on the RealBlurJ, ReloBlur, and REDS datasets reveal that incorporating our methods improves NAFNet's PSNR by 0.05, 0.96, and 7.03, respectively. Code will be available at \href{https://github.com/HPLQAQ/SAM-Deblur}{SAM-Deblur}.
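
The MAP unit can be sketched as follows, assuming SAM's output has been reduced to a per-pixel segment-id map (the paper's plug-and-play module may differ in detail): features are averaged within each SAM segment and broadcast back, injecting the structural prior into the deblurring network.

```python
# A minimal sketch of a Mask Average Pooling (MAP) unit.
import torch

def mask_average_pooling(feat: torch.Tensor, seg_ids: torch.Tensor) -> torch.Tensor:
    """feat: (C, H, W) features; seg_ids: (H, W) integer segment ids.
    Returns features where each pixel holds the mean of its segment."""
    out = torch.zeros_like(feat)
    for s in seg_ids.unique():
        m = (seg_ids == s)                       # (H, W) boolean segment mask
        region_mean = feat[:, m].mean(dim=1)     # (C,) mean over the segment
        out[:, m] = region_mean.unsqueeze(1)     # broadcast back into place
    return out

feat = torch.randn(8, 32, 32)
seg_ids = torch.arange(32 * 32).reshape(32, 32) // 256   # 4 fake segments
print(mask_average_pooling(feat, seg_ids).shape)          # torch.Size([8, 32, 32])
```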

Augmenting Chest X-ray Datasets with Non-Expert Annotations

  • paper_url: http://arxiv.org/abs/2309.02244
  • repo_url: None
  • paper_authors: Cathrine Damgaard, Trine Naja Eriksen, Dovile Juodelyte, Veronika Cheplygina, Amelia Jiménez-Sánchez
  • for: Expanding training datasets to advance machine learning algorithms in medical image analysis, without the high cost of expert clinician annotations.
  • methods: Enhancing two publicly available chest X-ray datasets with non-expert annotations; instead of diagnostic labels, shortcuts in the form of tubes are annotated (3.5k chest drain annotations for CXR14 and 1k annotations for 4 tube types in PadChest).
  • results: A chest drain detector trained on the non-expert annotations generalizes well to expert labels, with "moderate" to "almost perfect" agreement between non-expert and expert annotations.
    Abstract The advancement of machine learning algorithms in medical image analysis requires the expansion of training datasets. A popular and cost-effective approach is automated annotation extraction from free-text medical reports, primarily due to the high costs associated with expert clinicians annotating chest X-ray images. However, it has been shown that the resulting datasets are susceptible to biases and shortcuts. Another strategy to increase the size of a dataset is crowdsourcing, a widely adopted practice in general computer vision with some success in medical image analysis. In a similar vein to crowdsourcing, we enhance two publicly available chest X-ray datasets by incorporating non-expert annotations. However, instead of using diagnostic labels, we annotate shortcuts in the form of tubes. We collect 3.5k chest drain annotations for CXR14, and 1k annotations for 4 different tube types in PadChest. We train a chest drain detector with the non-expert annotations that generalizes well to expert labels. Moreover, we compare our annotations to those provided by experts and show "moderate" to "almost perfect" agreement. Finally, we present a pathology agreement study to raise awareness about ground truth annotations. We make our annotations and code available.
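
The reported "moderate" to "almost perfect" agreement corresponds to the conventional reading of Cohen's kappa (roughly 0.41-0.60 and 0.81-1.00, respectively). A minimal sketch with hypothetical binary tube-presence labels:

```python
# Quantifying non-expert vs. expert agreement with Cohen's kappa
# (assumption: binary labels; the paper's exact protocol may differ).
from sklearn.metrics import cohen_kappa_score

non_expert = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # tube present? (non-expert)
expert     = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]   # tube present? (expert)
print(f"kappa = {cohen_kappa_score(non_expert, expert):.2f}")  # -> 0.80
```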

Robustness and Generalizability of Deepfake Detection: A Study with Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.02218
  • repo_url: https://github.com/OpenRL-Lab/DeepFakeFace
  • paper_authors: Haixu Song, Shiyu Huang, Yinpeng Dong, Wei-Wei Tu
  • for: Supporting the dissemination of authentic information by countering the spread of deepfake images, especially of well-known personalities.
  • methods: Generating fake celebrity faces with advanced diffusion models and sharing them online as the DeepFakeFace (DFF) dataset for training and testing deepfake detection algorithms, together with two evaluation protocols for gauging the strength and adaptability of detectors.
  • results: Results vary across deepfake generation methods and image perturbations, underscoring the need for better deepfake detectors; the DFF dataset and evaluation protocols aim to drive the development of more effective detection tools.
    Abstract The rise of deepfake images, especially of well-known personalities, poses a serious threat to the dissemination of authentic information. To tackle this, we present a thorough investigation into how deepfakes are produced and how they can be identified. The cornerstone of our research is a rich collection of artificial celebrity faces, titled DeepFakeFace (DFF). We crafted the DFF dataset using advanced diffusion models and have shared it with the community through online platforms. This data serves as a robust foundation to train and test algorithms designed to spot deepfakes. We carried out a thorough review of the DFF dataset and suggest two evaluation methods to gauge the strength and adaptability of deepfake recognition tools. The first method tests whether an algorithm trained on one type of fake images can recognize those produced by other methods. The second evaluates the algorithm's performance with imperfect images, like those that are blurry, of low quality, or compressed. Given varied results across deepfake methods and image changes, our findings stress the need for better deepfake detectors. Our DFF dataset and tests aim to boost the development of more effective tools against deepfakes.
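
The second evaluation protocol, robustness to imperfect images, can be sketched by perturbing test images before scoring them; the perturbations below (blur, heavy JPEG compression, downscaling) are illustrative stand-ins, and each variant would then be fed to the detector under test.

```python
# A minimal sketch of robustness evaluation via image perturbations.
from io import BytesIO
from PIL import Image, ImageFilter

def perturbations(img: Image.Image):
    yield "blur", img.filter(ImageFilter.GaussianBlur(radius=3))
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=20)        # heavy compression
    buf.seek(0)
    yield "jpeg_q20", Image.open(buf)
    w, h = img.size
    yield "low_res", img.resize((w // 4, h // 4)).resize((w, h))

img = Image.new("RGB", (256, 256), "gray")          # stand-in test image
for name, variant in perturbations(img):
    print(name, variant.size)                       # score each with a detector
```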

Advanced Underwater Image Restoration in Complex Illumination Conditions

  • paper_url: http://arxiv.org/abs/2309.02217
  • repo_url: None
  • paper_authors: Yifan Song, Mengkun She, Kevin Köser
  • for: Improving the restoration of deep-sea images, particularly for robotic platforms operating below 200 m depth, where natural light is scarce and artificial illumination is required.
  • methods: Estimating the illumination field from the appearance changes of objects or the seafloor as they traverse the camera's viewing frustum. Under Lambertian-surface constraints, each voxel of a volumetric grid stores a signal factor and a backscatter value, enabling efficient restoration of images from camera-light platforms.
  • results: Experiments show that the approach accurately restores the true albedo of objects while mitigating lighting and medium effects, and readily extends to other scenarios such as in-air imaging with artificial illumination.
    Abstract Underwater image restoration has been a challenging problem for decades since the advent of underwater photography. Most solutions focus on shallow water scenarios, where the scene is uniformly illuminated by the sunlight. However, the vast majority of uncharted underwater terrain is located beyond 200 meters depth where natural light is scarce and artificial illumination is needed. In such cases, light sources co-moving with the camera, dynamically change the scene appearance, which make shallow water restoration methods inadequate. In particular for multi-light source systems (composed of dozens of LEDs nowadays), calibrating each light is time-consuming, error-prone and tedious, and we observe that only the integrated illumination within the viewing volume of the camera is critical, rather than the individual light sources. The key idea of this paper is therefore to exploit the appearance changes of objects or the seafloor, when traversing the viewing frustum of the camera. Through new constraints assuming Lambertian surfaces, corresponding image pixels constrain the light field in front of the camera, and for each voxel a signal factor and a backscatter value are stored in a volumetric grid that can be used for very efficient image restoration of camera-light platforms, which facilitates consistently texturing large 3D models and maps that would otherwise be dominated by lighting and medium artifacts. To validate the effectiveness of our approach, we conducted extensive experiments on simulated and real-world datasets. The results of these experiments demonstrate the robustness of our approach in restoring the true albedo of objects, while mitigating the influence of lighting and medium effects. Furthermore, we demonstrate our approach can be readily extended to other scenarios, including in-air imaging with artificial illumination or other similar cases.
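
Once a per-pixel signal factor and backscatter value are available (rendered from the volumetric grid for the current view), restoration follows the standard image formation model I = J·S + B, inverted per pixel. A minimal numpy sketch under that assumption:

```python
# Per-pixel restoration given signal factor S and backscatter B
# (assumption: the standard underwater image formation model I = J*S + B).
import numpy as np

I = np.random.rand(4, 4, 3)               # observed underwater image in [0, 1]
S = np.full((4, 4, 1), 0.6)               # signal (illumination * attenuation)
B = np.full((4, 4, 1), 0.1)               # backscatter added by the medium

J = np.clip((I - B) / np.maximum(S, 1e-6), 0.0, 1.0)  # recovered albedo
print(J.shape)
```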

Continual Cross-Dataset Adaptation in Road Surface Classification

  • paper_url: http://arxiv.org/abs/2309.02210
  • repo_url: None
  • paper_authors: Paolo Cudrano, Matteo Bellusci, Giuseppe Macino, Matteo Matteucci
  • for: Road surface classification for autonomous vehicles (AVs), to optimize driving conditions, enhance safety, and enable advanced road mapping.
  • methods: Fast and efficient cross-dataset adaptation via continual-learning fine-tuning, which retains past knowledge while adapting to new data, thus avoiding catastrophic forgetting.
  • results: Experiments show that the approach outperforms naive fine-tuning, achieving performance close to full retraining.
    Abstract Accurate road surface classification is crucial for autonomous vehicles (AVs) to optimize driving conditions, enhance safety, and enable advanced road mapping. However, deep learning models for road surface classification suffer from poor generalization when tested on unseen datasets. To update these models with new information, also the original training dataset must be taken into account, in order to avoid catastrophic forgetting. This is, however, inefficient if not impossible, e.g., when the data is collected in streams or large amounts. To overcome this limitation and enable fast and efficient cross-dataset adaptation, we propose to employ continual learning finetuning methods designed to retain past knowledge while adapting to new data, thus effectively avoiding forgetting. Experimental results demonstrate the superiority of this approach over naive finetuning, achieving performance close to fresh retraining. While solving this known problem, we also provide a general description of how the same technique can be adopted in other AV scenarios. We highlight the potential computational and economic benefits that a continual-based adaptation can bring to the AV industry, while also reducing greenhouse emissions due to unnecessary joint retraining.
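
The abstract does not commit to a specific continual-learning recipe, so the sketch below uses elastic weight consolidation (EWC) purely as a representative member of the family: a penalty keeps parameters that mattered on the old dataset from drifting while the model fine-tunes on the new one.

```python
# A minimal sketch of EWC-style continual fine-tuning (an assumed example of
# the continual-learning family, not necessarily the paper's exact method).
import torch

def ewc_penalty(model, fisher, old_params, lam=100.0):
    """Penalize drifting from parameters that mattered on the old dataset."""
    loss = torch.zeros(())
    for n, p in model.named_parameters():
        loss = loss + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return lam * loss

model = torch.nn.Linear(8, 4)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}  # stub

x, y = torch.randn(16, 8), torch.randint(0, 4, (16,))
task_loss = torch.nn.functional.cross_entropy(model(x), y)
total = task_loss + ewc_penalty(model, fisher, old_params)
total.backward()   # fine-tune on the new dataset without forgetting the old
print(float(total))
```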

Delving into Ipsilateral Mammogram Assessment under Multi-View Network

  • paper_url: http://arxiv.org/abs/2309.02197
  • repo_url: None
  • paper_authors: Thai Ngoc Toan Truong, Thanh-Huy Nguyen, Ba Thinh Lam, Vu Minh Duy Nguyen, Hong Phuc Nguyen
  • for: Exploring diverse fusion strategies (average and concatenate) in multi-view mammogram analysis and examining the model's learning behavior under varying individuals and fusion pathways, involving the Coarse Layer and the Fine Layer.
  • methods: The Ipsilateral Multi-View Network, comprising five fusion types (Pre, Early, Middle, Last, and Post Fusion) built on ResNet-18.
  • results: Middle Fusion emerges as the most balanced and effective approach, improving macro F1-Score by +2.06% (concatenate) and +5.29% (average) on the VinDr-Mammo dataset and by +2.03% (concatenate) and +3% (average) on the CMMD dataset.
    Abstract In recent years, multi-view mammogram analysis has received wide attention for AI-based cancer assessment. In this work, we explore diverse fusion strategies (average and concatenate) and examine the model's learning behavior under varying individuals and fusion pathways, involving the Coarse Layer and the Fine Layer. The Ipsilateral Multi-View Network, comprising five fusion types (Pre, Early, Middle, Last, and Post Fusion) in ResNet-18, is employed. Notably, Middle Fusion emerges as the most balanced and effective approach, enhancing deep-learning models' generalization performance by +2.06% (concatenate) and +5.29% (average) in macro F1-Score on the VinDr-Mammo dataset, and by +2.03% (concatenate) and +3% (average) on the CMMD dataset. The paper emphasizes the crucial role of layer assignment in multi-view network extraction with various strategies.
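
Middle Fusion can be sketched as two per-view stems whose features are concatenated midway and processed by shared layers. The tiny stems below stand in for the ResNet-18 coarse/fine layers, and channel concatenation is one of the two fusion strategies studied (the other being averaging).

```python
# A minimal sketch of "Middle Fusion" for two ipsilateral mammogram views.
import torch
import torch.nn as nn

class MiddleFusionNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        def stem():  # per-view "coarse" layers (stand-in for ResNet-18 stages)
            return nn.Sequential(nn.Conv2d(1, 16, 3, 2, 1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU())
        self.view_a, self.view_b = stem(), stem()
        self.shared = nn.Sequential(          # post-fusion "fine" layers
            nn.Conv2d(64, 64, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_classes))

    def forward(self, cc, mlo):
        fused = torch.cat([self.view_a(cc), self.view_b(mlo)], dim=1)
        return self.shared(fused)

net = MiddleFusionNet()
print(net(torch.randn(2, 1, 128, 128), torch.randn(2, 1, 128, 128)).shape)
```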

High-resolution 3D Maps of Left Atrial Displacements using an Unsupervised Image Registration Neural Network

  • paper_url: http://arxiv.org/abs/2309.02179
  • repo_url: None
  • paper_authors: Christoforos Galazis, Anil Anthony Bharath, Marta Varela
  • for: Providing a tool that automatically characterizes the motion and deformation of the left atrium (LA) in 3D across the cardiac cycle.
  • methods: High-resolution dynamic magnetic resonance imaging (Cine MRI) provides full-coverage 3D views of the LA; an unsupervised image registration neural network automatically segments the LA and extracts the displacement fields across the cardiac cycle.
  • results: The tool accurately tracks the LA wall across the cardiac cycle, with an average Hausdorff distance of 2.51 ± 1.3 mm and a Dice score of 0.96 ± 0.02.
    Abstract Functional analysis of the left atrium (LA) plays an increasingly important role in the prognosis and diagnosis of cardiovascular diseases. Echocardiography-based measurements of LA dimensions and strains are useful biomarkers, but they provide an incomplete picture of atrial deformations. High-resolution dynamic magnetic resonance images (Cine MRI) offer the opportunity to examine LA motion and deformation in 3D, at higher spatial resolution and with full LA coverage. However, there are no dedicated tools to automatically characterise LA motion in 3D. Thus, we propose a tool that automatically segments the LA and extracts the displacement fields across the cardiac cycle. The pipeline is able to accurately track the LA wall across the cardiac cycle with an average Hausdorff distance of $2.51 \pm 1.3~mm$ and Dice score of $0.96 \pm 0.02$.
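
The abstract does not spell out the training objective, but unsupervised registration networks of this kind are typically trained with an image-similarity term plus a smoothness penalty on the displacement field (VoxelMorph-style). A 2D sketch of such a loss, offered as an assumption rather than the paper's exact formulation:

```python
# A minimal sketch of an unsupervised registration objective.
import torch
import torch.nn.functional as F

def registration_loss(moved, fixed, disp, lam=0.01):
    sim = F.mse_loss(moved, fixed)                    # image similarity term
    dx = disp[:, :, 1:, :] - disp[:, :, :-1, :]       # spatial gradients of
    dy = disp[:, :, :, 1:] - disp[:, :, :, :-1]       # the displacement field
    smooth = dx.pow(2).mean() + dy.pow(2).mean()      # encourages smoothness
    return sim + lam * smooth

fixed = torch.rand(1, 1, 64, 64)       # e.g., the LA at one cardiac phase
moved = torch.rand(1, 1, 64, 64)       # moving image warped by the network
disp = torch.randn(1, 2, 64, 64)       # predicted displacement field
print(float(registration_loss(moved, fixed, disp)))
```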

PCFGaze: Physics-Consistent Feature for Appearance-based Gaze Estimation

  • paper_url: http://arxiv.org/abs/2309.02165
  • repo_url: None
  • paper_authors: Yiwei Bao, Feng Lu
  • for: Explaining how learned gaze features are connected to the physical definition of gaze.
  • methods: An analysis of the gaze feature manifold reveals that the geodesic distance between gaze features is consistent with the gaze differences between samples. Based on this finding, the Physics-Consistent Feature (PCF) is constructed analytically, connecting gaze features to the physical definition of gaze.
  • results: The proposed PCFGaze framework directly optimizes the gaze feature space under PCF guidance without extra training data, alleviating overfitting and significantly improving cross-domain gaze estimation accuracy.
    Abstract Although recent deep learning based gaze estimation approaches have achieved much improvement, we still know little about how gaze features are connected to the physics of gaze. In this paper, we try to answer this question by analyzing the gaze feature manifold. Our analysis revealed the insight that the geodesic distance between gaze features is consistent with the gaze differences between samples. According to this finding, we construct the Physics- Consistent Feature (PCF) in an analytical way, which connects gaze feature to the physical definition of gaze. We further propose the PCFGaze framework that directly optimizes gaze feature space by the guidance of PCF. Experimental results demonstrate that the proposed framework alleviates the overfitting problem and significantly improves cross-domain gaze estimation accuracy without extra training data. The insight of gaze feature has the potential to benefit other regression tasks with physical meanings.

The Adversarial Implications of Variable-Time Inference

  • paper_url: http://arxiv.org/abs/2309.02159
  • repo_url: https://github.com/dudi709/Timing-Based-Attack
  • paper_authors: Dudi Biton, Aditi Misra, Efrat Levy, Jaidip Kotak, Ron Bitton, Roei Schuster, Nicolas Papernot, Yuval Elovici, Ben Nassi
  • for: Demonstrating that decision-based attacks on machine learning models can be enhanced by exploiting a novel side channel in algorithmic timing.
  • methods: A timing attack that measures the execution time of the algorithm used to post-process the predictions of the ML model under attack.
  • results: By exploiting the timing leakage inherent in the non-maximum suppression (NMS) algorithm, the attack successfully evades object detection with adversarial examples and performs dataset inference; the adversarial examples exhibit superior perturbation quality compared to a purely decision-based attack.
    Abstract Machine learning (ML) models are known to be vulnerable to a number of attacks that target the integrity of their predictions or the privacy of their training data. To carry out these attacks, a black-box adversary must typically possess the ability to query the model and observe its outputs (e.g., labels). In this work, we demonstrate, for the first time, the ability to enhance such decision-based attacks. To accomplish this, we present an approach that exploits a novel side channel in which the adversary simply measures the execution time of the algorithm used to post-process the predictions of the ML model under attack. The leakage of inference-state elements into algorithmic timing side channels has never been studied before, and we have found that it can contain rich information that facilitates superior timing attacks that significantly outperform attacks based solely on label outputs. In a case study, we investigate leakage from the non-maximum suppression (NMS) algorithm, which plays a crucial role in the operation of object detectors. In our examination of the timing side-channel vulnerabilities associated with this algorithm, we identified the potential to enhance decision-based attacks. We demonstrate attacks against the YOLOv3 detector, leveraging the timing leakage to successfully evade object detection using adversarial examples, and perform dataset inference. Our experiments show that our adversarial examples exhibit superior perturbation quality compared to a decision-based attack. In addition, we present a new threat model in which dataset inference based solely on timing leakage is performed. To address the timing leakage vulnerability inherent in the NMS algorithm, we explore the potential and limitations of implementing constant-time inference passes as a mitigation strategy.
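
The leakage is easy to reproduce: a greedy NMS does more IoU comparisons when more candidate boxes survive, so wall-clock time alone reveals information about the model's raw detections. A simplified sketch (the attacked systems use optimized NMS variants, but the scaling behaviour is the point):

```python
# Timing a naive greedy NMS to show runtime grows with the number of boxes.
import random
import time

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, thr=0.5):
    keep = []
    for b in sorted(boxes, key=lambda r: r[4], reverse=True):
        if all(iou(b, k) < thr for k in keep):
            keep.append(b)
    return keep

for n in (10, 100, 1000):                  # more detections -> longer NMS
    boxes = []
    for _ in range(n):
        x, y = random.uniform(0, 500), random.uniform(0, 500)
        boxes.append([x, y, x + 50, y + 50, random.random()])
    t0 = time.perf_counter()
    nms(boxes)
    print(n, f"{(time.perf_counter() - t0) * 1e3:.2f} ms")
```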

Traffic Light Recognition using Convolutional Neural Networks: A Survey

  • paper_url: http://arxiv.org/abs/2309.02158
  • repo_url: None
  • paper_authors: Svetlana Pavlitska, Nico Lambing, Ashok Kumar Bangaru, J. Marius Zöllner
  • for: Providing a cohesive overview of the model architectures used for real-time traffic light recognition in autonomous driving.
  • methods: A survey and analysis of traffic light recognition methods based on convolutional neural networks (CNNs), focusing on datasets and CNN architectures.
  • results: Based on the underlying architecture, methods are clustered into three major groups: (1) modifications of generic object detectors that compensate for task-specific characteristics, (2) multi-stage approaches combining rule-based and CNN components, and (3) task-specific single-stage methods.
    Abstract Real-time traffic light recognition is essential for autonomous driving. Yet, a cohesive overview of the underlying model architectures for this task is currently missing. In this work, we conduct a comprehensive survey and analysis of traffic light recognition methods that use convolutional neural networks (CNNs). We focus on two essential aspects: datasets and CNN architectures. Based on an underlying architecture, we cluster methods into three major groups: (1) modifications of generic object detectors which compensate for specific task characteristics, (2) multi-stage approaches involving both rule-based and CNN components, and (3) task-specific single-stage methods. We describe the most important works in each cluster, discuss the usage of the datasets, and identify research gaps.

S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning

  • paper_url: http://arxiv.org/abs/2309.02155
  • repo_url: None
  • paper_authors: Wei Suo, Mengyang Sun, Weisong Liu, Yiqi Gao, Peng Wang, Yanning Zhang, Qi Wu
  • for: Explaining the decision-making process of VQA models in natural language, to improve understanding and gain users' trust.
  • methods: Semi-Supervised VQA-NLE via Self-Critical Learning (S3C), which evaluates candidate explanations with answering rewards to improve the logical consistency between answers and rationales, and leverages large amounts of samples without human-annotated explanations through a semi-supervised learning framework.
  • results: The method achieves new state-of-the-art performance on two VQA-NLE datasets, as shown by both automatic measures and human evaluations.
    Abstract VQA Natural Language Explanation (VQA-NLE) task aims to explain the decision-making process of VQA models in natural language. Unlike traditional attention or gradient analysis, free-text rationales can be easier to understand and gain users' trust. Existing methods mostly use post-hoc or self-rationalization models to obtain a plausible explanation. However, these frameworks are bottlenecked by the following challenges: 1) the reasoning process cannot be faithfully responded to and suffer from the problem of logical inconsistency. 2) Human-annotated explanations are expensive and time-consuming to collect. In this paper, we propose a new Semi-Supervised VQA-NLE via Self-Critical Learning (S3C), which evaluates the candidate explanations by answering rewards to improve the logical consistency between answers and rationales. With a semi-supervised learning framework, the S3C can benefit from a tremendous amount of samples without human-annotated explanations. A large number of automatic measures and human evaluations all show the effectiveness of our method. Meanwhile, the framework achieves a new state-of-the-art performance on the two VQA-NLE datasets.
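
The answering reward can be sketched as follows: a candidate explanation is scored by whether re-answering conditioned on it recovers the ground-truth answer. The `vqa_answer` stand-in below is hypothetical, and S3C's full semi-supervised pipeline is omitted.

```python
# A minimal sketch of an answering reward for candidate explanations.
def vqa_answer(question, explanation):
    # Stand-in: a real system would re-answer conditioned on the rationale.
    return "umbrella" if "rain" in explanation else "hat"

def answering_reward(question, candidates, gt_answer):
    """Reward explanations that lead back to the correct answer, encouraging
    logical consistency between answers and rationales."""
    return [1.0 if vqa_answer(question, c) == gt_answer else 0.0
            for c in candidates]

cands = ["because it is raining", "because the sun is out"]
print(answering_reward("What is the person holding?", cands, "umbrella"))
# -> [1.0, 0.0]: only the consistent explanation is rewarded
```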

Domain Adaptation for Satellite-Borne Hyperspectral Cloud Detection

  • paper_url: http://arxiv.org/abs/2309.02150
  • repo_url: None
  • paper_authors: Andrew Du, Anh-Dzung Doan, Yee Wei Law, Tat-Jun Chin
  • for: Addressing the domain gap that arises when deploying cloud detection models on satellite-borne machine learning hardware accelerators, so that models can be updated for new missions that employ new sensors.
  • methods: New domain adaptation tasks motivated by a concrete Earth observation mission, a bandwidth-efficient supervised domain adaptation algorithm, and test-time adaptation algorithms demonstrated on space-deployable neural network accelerators.
  • results: Domain adaptation is achieved with minimal data transmission (e.g., only 1% of the weights in ResNet50), allowing sophisticated CNN models to be deployed and updated on satellites despite domain gap and bandwidth limitations.
    Abstract The advent of satellite-borne machine learning hardware accelerators has enabled the on-board processing of payload data using machine learning techniques such as convolutional neural networks (CNN). A notable example is using a CNN to detect the presence of clouds in hyperspectral data captured on Earth observation (EO) missions, whereby only clear sky data is downlinked to conserve bandwidth. However, prior to deployment, new missions that employ new sensors will not have enough representative datasets to train a CNN model, while a model trained solely on data from previous missions will underperform when deployed to process the data on the new missions. This underperformance stems from the domain gap, i.e., differences in the underlying distributions of the data generated by the different sensors in previous and future missions. In this paper, we address the domain gap problem in the context of on-board hyperspectral cloud detection. Our main contributions lie in formulating new domain adaptation tasks that are motivated by a concrete EO mission, developing a novel algorithm for bandwidth-efficient supervised domain adaptation, and demonstrating test-time adaptation algorithms on space deployable neural network accelerators. Our contributions enable minimal data transmission to be invoked (e.g., only 1% of the weights in ResNet50) to achieve domain adaptation, thereby allowing more sophisticated CNN models to be deployed and updated on satellites without being hampered by domain gap and bandwidth limitations.
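
The bandwidth argument can be made concrete: if only a small, fixed subset of weights is trainable, only that subset ever needs to be uplinked to the satellite. The selection below, updating only the BatchNorm affine parameters of a torchvision ResNet50, is an illustrative choice rather than the paper's exact scheme, but it stays well under the 1% budget quoted in the abstract.

```python
# Counting the trainable fraction when adapting only BatchNorm parameters.
import torch
from torchvision.models import resnet50

model = resnet50(weights=None)
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():                 # adapt only BN scale/shift params
    if isinstance(m, torch.nn.BatchNorm2d):
        for p in m.parameters():
            p.requires_grad = True

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable fraction: {100 * trainable / total:.2f}%")  # well under 1%
```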

INCEPTNET: Precise And Early Disease Detection Application For Medical Images Analyses

  • paper_url: http://arxiv.org/abs/2309.02147
  • repo_url: https://github.com/AMiiR-S/Inceptnet_cancer_recognition
  • paper_authors: Amirhossein Sajedi, Mohammad Javad Fadaeieslam
  • for: Proposing a new deep neural network, InceptNet, for medical image processing, targeting precise and early disease detection and segmentation.
  • methods: A Unet-based architecture augmented with parallel Inception modules of various kernel sizes, improving the network's ability to capture variations in the scaled regions of interest while remaining fast and cost-effective.
  • results: Tested on four benchmark datasets (retina blood vessel segmentation, lung nodule segmentation, skin lesion segmentation, and breast cancer cell detection), accuracy improves from 0.9531, 0.8900, 0.9872, and 0.9881 to 0.9555, 0.9510, 0.9945, and 0.9945, respectively, with the largest gains on images with small-scale structures.
    Abstract In view of the recent paradigm shift in deep AI based image processing methods, medical image processing has advanced considerably. In this study, we propose a novel deep neural network (DNN), entitled InceptNet, in the scope of medical image processing, for early disease detection and segmentation of medical images in order to enhance precision and performance. We also investigate the interaction of users with the InceptNet application to present a comprehensive application including the background processes, and foreground interactions with users. Fast InceptNet is shaped by the prominent Unet architecture, and it seizes the power of an Inception module to be fast and cost effective while aiming to approximate an optimal local sparse structure. Adding Inception modules with various parallel kernel sizes can improve the network's ability to capture the variations in the scaled regions of interest. To experiment, the model is tested on four benchmark datasets, including retina blood vessel segmentation, lung nodule segmentation, skin lesion segmentation, and breast cancer cell detection. The improvement was more significant on images with small scale structures. The proposed method improved the accuracy from 0.9531, 0.8900, 0.9872, and 0.9881 to 0.9555, 0.9510, 0.9945, and 0.9945 on the mentioned datasets, respectively, which show outperforming of the proposed method over the previous works. Furthermore, by exploring the procedure from start to end, individuals who have utilized a trial edition of InceptNet, in the form of a complete application, are presented with thirteen multiple choice questions in order to assess the proposed method. The outcomes are evaluated through the means of Human Computer Interaction.
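
The core ingredient, an Inception-style block with parallel kernel sizes, can be sketched as below; branch widths and kernel sizes are illustrative rather than the paper's exact configuration.

```python
# A minimal sketch of an Inception block with parallel receptive fields.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, c_in, c_branch):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c_branch, kernel_size=1)
        self.b3 = nn.Conv2d(c_in, c_branch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(c_in, c_branch, kernel_size=5, padding=2)
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(c_in, c_branch, kernel_size=1))

    def forward(self, x):  # parallel kernel sizes capture multiple scales
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], 1)

block = InceptionBlock(32, 16)
print(block(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 64, 64, 64])
```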

Hierarchical Masked 3D Diffusion Model for Video Outpainting

  • paper_url: http://arxiv.org/abs/2309.02119
  • repo_url: https://github.com/fanfanda/M3DDM
  • paper_authors: Fanda Fan, Chaoxu Guo, Litong Gong, Biao Wang, Tiezheng Ge, Yuning Jiang, Chunjie Luo, Jianfeng Zhan
  • for: Video outpainting: adequately completing missing areas at the edges of video frames with a masked 3D diffusion model.
  • methods: The 3D diffusion model is trained with mask modeling; multiple guide frames connect the results of multiple video-clip inferences to ensure temporal consistency and reduce jitter between adjacent frames; global frames of the video serve as prompts via cross-attention; and a hybrid coarse-to-fine inference pipeline combining infilling and interpolation alleviates artifact accumulation.
  • results: The method achieves state-of-the-art results on video outpainting tasks.
    Abstract Video outpainting aims to adequately complete missing areas at the edges of video frames. Compared to image outpainting, it presents an additional challenge as the model should maintain the temporal consistency of the filled area. In this paper, we introduce a masked 3D diffusion model for video outpainting. We use the technique of mask modeling to train the 3D diffusion model. This allows us to use multiple guide frames to connect the results of multiple video clip inferences, thus ensuring temporal consistency and reducing jitter between adjacent frames. Meanwhile, we extract the global frames of the video as prompts and guide the model to obtain information other than the current video clip using cross-attention. We also introduce a hybrid coarse-to-fine inference pipeline to alleviate the artifact accumulation problem. The existing coarse-to-fine pipeline only uses the infilling strategy, which brings degradation because the time interval of the sparse frames is too large. Our pipeline benefits from bidirectional learning of the mask modeling and thus can employ a hybrid strategy of infilling and interpolation when generating sparse frames. Experiments show that our method achieves state-of-the-art results in video outpainting tasks. More results are provided at our https://fanfanda.github.io/M3DDM/.

Towards Diverse and Consistent Typography Generation

  • paper_url: http://arxiv.org/abs/2309.02099
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Wataru Shimoda, Daichi Haraguchi, Seiichi Uchida, Kota Yamaguchi
  • for: Generating diverse typographic styling for a given graphic document.
  • methods: Typography generation is formulated as fine-grained attribute generation over multiple text elements; an autoregressive model generates diverse typography matching the input design context, together with a sampling approach that respects the consistency and distinction principles of typography.
  • results: Experiments show that the model successfully generates diverse typographic designs while preserving a consistent typographic structure.
    Abstract In this work, we consider the typography generation task that aims at producing diverse typographic styling for the given graphic document. We formulate typography generation as a fine-grained attribute generation for multiple text elements and build an autoregressive model to generate diverse typography that matches the input design context. We further propose a simple yet effective sampling approach that respects the consistency and distinction principle of typography so that generated examples share consistent typographic styling across text elements. Our empirical study shows that our model successfully generates diverse typographic designs while preserving a consistent typographic structure.

DeNISE: Deep Networks for Improved Segmentation Edges

  • paper_url: http://arxiv.org/abs/2309.02091
  • repo_url: None
  • paper_authors: Sander Riisøen Jyhne, Per-Arne Andersen, Morten Goodwin
  • for: Improving the boundary quality of segmentation masks, demonstrated on building segmentation in aerial images.
  • methods: DeNISE combines edge detection and segmentation models to improve the accuracy of the predicted segmentation edge; it is not trained end-to-end and can be applied to all types of neural networks.
  • results: DeNISE improves the baseline results, reaching a building IoU of 78.9% on aerial images.
    Abstract This paper presents Deep Networks for Improved Segmentation Edges (DeNISE), a novel data enhancement technique using edge detection and segmentation models to improve the boundary quality of segmentation masks. DeNISE utilizes the inherent differences in two sequential deep neural architectures to improve the accuracy of the predicted segmentation edge. DeNISE applies to all types of neural networks and is not trained end-to-end, allowing rapid experiments to discover which models complement each other. We test and apply DeNISE for building segmentation in aerial images. Aerial images are known for difficult conditions as they have a low resolution with optical noise, such as reflections, shadows, and visual obstructions. Overall the paper demonstrates the potential for DeNISE. Using the technique, we improve the baseline results with a building IoU of 78.9%.

Histograms of Points, Orientations, and Dynamics of Orientations Features for Hindi Online Handwritten Character Recognition

  • paper_url: http://arxiv.org/abs/2309.02067
  • repo_url: None
  • paper_authors: Anand Sharma, A. G. Ramakrishnan
  • for: Online handwritten character recognition that is independent of variations in stroke direction and order.
  • methods: Features such as point coordinates, stroke orientations at points, and the dynamics of those orientations are mapped spatially as functions of the point coordinates, and histograms of these features are computed from different regions of the spatial map. They are compared against spatio-temporal, discrete Fourier transform, discrete cosine transform, discrete wavelet transform, spatial, and histogram-of-oriented-gradients features used in other studies to train classifiers.
  • results: An SVM classifier trained with the proposed features achieves the highest classification accuracy of 92.9%, outperforming SVM classifiers trained with the other features on the same test set.
    Abstract A set of features independent of character stroke direction and order variations is proposed for online handwritten character recognition. A method is developed that maps features like co-ordinates of points, orientations of strokes at points, and dynamics of orientations of strokes at points spatially as a function of co-ordinate values of the points and computes histograms of these features from different regions in the spatial map. Different features like spatio-temporal, discrete Fourier transform, discrete cosine transform, discrete wavelet transform, spatial, and histograms of oriented gradients used in other studies for training classifiers for character recognition are considered. The classifier chosen for classification performance comparison, when trained with different features, is support vector machines (SVM). The character datasets used for training and testing the classifiers consist of online handwritten samples of 96 different Hindi characters. There are 12832 and 2821 samples in training and testing datasets, respectively. SVM classifiers trained with the proposed features has the highest classification accuracy of 92.9\% when compared to the performances of SVM classifiers trained with the other features and tested on the same testing dataset. Therefore, the proposed features have better character discriminative capability than the other features considered for comparison.
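
One of the proposed feature families can be sketched directly: orientations are computed from consecutive pen points and binned into a histogram. The sketch below uses a single stroke and a global histogram, whereas the paper computes such histograms per region of a spatial map.

```python
# A minimal sketch of a histogram-of-orientations feature for online
# handwriting (one stroke, global histogram; a simplification of the paper).
import numpy as np

stroke = np.array([[0, 0], [1, 1], [2, 1], [3, 2], [3, 3]], dtype=float)
d = np.diff(stroke, axis=0)                        # successive displacements
angles = np.arctan2(d[:, 1], d[:, 0])              # orientation at each step
hist, _ = np.histogram(angles, bins=8, range=(-np.pi, np.pi))
print(hist / hist.sum())                           # normalized 8-bin feature
```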

An Adaptive Spatial-Temporal Local Feature Difference Method for Infrared Small-moving Target Detection

  • paper_url: http://arxiv.org/abs/2309.02054
  • repo_url: None
  • paper_authors: Yongkang Zhao, Chuang Zhu, Yuan Li, Shuaishuai Wang, Zihan Lan, Yuanyuan Qiao
  • for: Accurately detecting small moving targets in infrared (IR) image sequences.
  • methods: Spatial- and temporal-domain filters are fused into spatial-temporal feature maps, and a pixel-level adaptive background suppression (ABS) module enhances the contrast between target and background; a threshold then yields the final binary segmentation map.
  • results: Experiments show that the proposed method outperforms existing state-of-the-art methods for infrared small-moving target detection.
    Abstract Detecting small moving targets accurately in infrared (IR) image sequences is a significant challenge. To address this problem, we propose a novel method called spatial-temporal local feature difference (STLFD) with adaptive background suppression (ABS). Our approach utilizes filters in the spatial and temporal domains and performs pixel-level ABS on the output to enhance the contrast between the target and the background. The proposed method comprises three steps. First, we obtain three temporal frame images based on the current frame image and extract two feature maps using the designed spatial domain and temporal domain filters. Next, we fuse the information of the spatial domain and temporal domain to produce the spatial-temporal feature maps and suppress noise using our pixel-level ABS module. Finally, we obtain the segmented binary map by applying a threshold. Our experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods for infrared small-moving target detection.

Diffusion-based 3D Object Detection with Random Boxes

  • paper_url: http://arxiv.org/abs/2309.02049
  • repo_url: None
  • paper_authors: Xin Zhou, Jinghua Hou, Tingting Yao, Dingkang Liang, Zhe Liu, Zhikang Zou, Xiaoqing Ye, Jianwei Cheng, Xiang Bai
  • for: 3D object detection, a key task for autonomous driving; existing anchor-based methods rely on empirical anchor settings, which makes the algorithms inelegant.
  • methods: Diff3Det migrates the diffusion model to proposal generation by treating detection boxes as generative targets. During training, object boxes diffuse from the ground-truth boxes toward a Gaussian distribution, and the decoder learns to reverse this noise process; at inference, the model progressively refines a set of random boxes into the prediction results.
  • results: Detailed experiments on the KITTI benchmark show promising performance compared to classical anchor-based 3D detection methods.
    Abstract 3D object detection is an essential task for achieving autonomous driving. Existing anchor-based detection methods rely on empirical heuristics setting of anchors, which makes the algorithms lack elegance. In recent years, we have witnessed the rise of several generative models, among which diffusion models show great potential for learning the transformation of two distributions. Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets. During training, the object boxes diffuse from the ground truth boxes to the Gaussian distribution, and the decoder learns to reverse this noise process. In the inference stage, the model progressively refines a set of random boxes to the prediction results. We provide detailed experiments on the KITTI benchmark and achieve promising performance compared to classical anchor-based 3D detection methods.
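
The training-time corruption follows the standard diffusion forward process, $q(x_t \mid x_0) = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, applied to box parameters. A sketch with a linear beta schedule (an assumption; the decoder that reverses the process is omitted):

```python
# A minimal sketch of noising ground-truth boxes toward a Gaussian.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t):
    """Sample noised boxes x_t from ground-truth boxes x0 at step t."""
    noise = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * noise

gt_boxes = torch.rand(5, 7)        # 5 boxes: (x, y, z, w, l, h, yaw), normalized
print(q_sample(gt_boxes, t=999).std())  # ~1: nearly pure Gaussian at t = T-1
```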

Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion

  • paper_url: http://arxiv.org/abs/2309.02043
  • repo_url: https://github.com/YufeiWang777/DGDF
  • paper_authors: Yufei Wang, Yuxin Mao, Qi Liu, Yuchao Dai
  • for: RGB-guided depth completion: predicting dense depth maps from sparse depth measurements and the corresponding RGB images.
  • methods: Guided dynamic filters, which transform RGB features into guidance for depth features, are decomposed into a spatially-shared component multiplied by content-adaptive adaptors at each spatial location, via two schemes (splitting the filter structure, and spatial-wise attention).
  • results: The decomposed filters keep the content-dependent and spatially-variant properties of guided dynamic filters while reducing model parameters and hardware costs; the methods rank 1st and 2nd on the KITTI benchmark at submission time and achieve comparable performance on NYUv2.
    Abstract RGB-guided depth completion aims at predicting dense depth maps from sparse depth measurements and corresponding RGB images, where how to effectively and efficiently exploit the multi-modal information is a key issue. Guided dynamic filters, which generate spatially-variant depth-wise separable convolutional filters from RGB features to guide depth features, have been proven to be effective in this task. However, the dynamically generated filters require massive model parameters, computational costs and memory footprints when the number of feature channels is large. In this paper, we propose to decompose the guided dynamic filters into a spatially-shared component multiplied by content-adaptive adaptors at each spatial location. Based on the proposed idea, we introduce two decomposition schemes A and B, which decompose the filters by splitting the filter structure and using spatial-wise attention, respectively. The decomposed filters not only maintain the favorable properties of guided dynamic filters as being content-dependent and spatially-variant, but also reduce model parameters and hardware costs, as the learned adaptors are decoupled with the number of feature channels. Extensive experimental results demonstrate that the methods using our schemes outperform state-of-the-art methods on the KITTI dataset, and rank 1st and 2nd on the KITTI benchmark at the time of submission. Meanwhile, they also achieve comparable performance on the NYUv2 dataset. In addition, our proposed methods are general and could be employed as plug-and-play feature fusion blocks in other multi-modal fusion tasks such as RGB-D salient object detection.
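
A simplified reading of the decomposition: a spatially-shared filter provides the convolutional structure once, and cheap per-pixel adaptors predicted from the RGB guidance modulate its response. The sketch below is a loose variant of scheme A, not the paper's exact module.

```python
# A minimal sketch of a decomposed guided dynamic filter.
import torch
import torch.nn as nn

class DecomposedGuidedFilter(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Spatially-shared component: one depthwise 3x3 kernel per channel.
        self.shared = nn.Conv2d(channels, channels, 3, padding=1,
                                groups=channels, bias=False)
        # Adaptor generator: per-pixel scale from the RGB guidance features.
        self.adaptor = nn.Conv2d(channels, channels, 1)

    def forward(self, depth_feat, rgb_feat):
        return self.shared(depth_feat) * torch.sigmoid(self.adaptor(rgb_feat))

m = DecomposedGuidedFilter(16)
d, g = torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32)
print(m(d, g).shape)   # torch.Size([1, 16, 32, 32])
```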

Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples

  • paper_url: http://arxiv.org/abs/2309.02041
  • repo_url: https://github.com/hengliusky/few_shot_rvos
  • paper_authors: Guanghui Li, Mingqi Gao, Heng Liu, Xiantong Zhen, Feng Zheng
  • for: This work proposes a simple yet effective Transformer-based model to address the poor performance of existing referring video object segmentation (RVOS) methods when only limited samples are available for a new scene.
  • methods: A novel cross-modal affinity (CMA) module builds multimodal affinity from a few samples, quickly learning new semantic information so the model can adapt to different scenarios.
  • results: With only a few samples, the model reaches state-of-the-art performance across scenarios, e.g., an average of 53.1 J and 54.8 F on Mini-Ref-YouTube-VOS, 10% above the baselines, and 77.7 J and 74.8 F on Mini-Ref-SAIL-VOS, significantly better than the baselines.
    Abstract Referring video object segmentation (RVOS), as a supervised learning task, relies on sufficient annotated data for a given scene. However, in more realistic scenarios, only minimal annotations are available for a new scene, which poses significant challenges to existing RVOS methods. With this in mind, we propose a simple yet effective model with a newly designed cross-modal affinity (CMA) module based on a Transformer architecture. The CMA module builds multimodal affinity with a few samples, thus quickly learning new semantic information, and enabling the model to adapt to different scenarios. Since the proposed method targets limited samples for new scenes, we generalize the problem as - few-shot referring video object segmentation (FS-RVOS). To foster research in this direction, we build up a new FS-RVOS benchmark based on currently available datasets. The benchmark covers a wide range and includes multiple situations, which can maximally simulate real-world scenarios. Extensive experiments show that our model adapts well to different scenarios with only a few samples, reaching state-of-the-art performance on the benchmark. On Mini-Ref-YouTube-VOS, our model achieves an average performance of 53.1 J and 54.8 F, which are 10% better than the baselines. Furthermore, we show impressive results of 77.7 J and 74.8 F on Mini-Ref-SAIL-VOS, which are significantly better than the baselines. Code is publicly available at https://github.com/hengliusky/Few_shot_RVOS.
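A hedged sketch of what a cross-modal affinity block might look like, rendered here as plain multi-head cross-attention between visual and language tokens; this is our own minimal reading, not the released CMA/EABlock implementation, and every name below is illustrative:

```python
import torch
import torch.nn as nn

class CrossModalAffinity(nn.Module):
    """Cross-attention from visual tokens to language tokens; the attention
    map plays the role of a multimodal affinity."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # vis: (B, N_v, D) visual tokens; txt: (B, N_t, D) language tokens.
        fused, affinity = self.attn(query=vis, key=txt, value=txt)
        return vis + fused      # residual fusion keeps the visual stream intact

vis = torch.randn(1, 196, 256)   # e.g. 14x14 patch tokens from a video frame
txt = torch.randn(1, 12, 256)    # tokens of the referring expression
out = CrossModalAffinity(256)(vis, txt)    # (1, 196, 256)
```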

A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking

  • paper_url: http://arxiv.org/abs/2309.02031
  • repo_url: None
  • paper_authors: Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou
  • for: Improving the efficiency and scalability of Vision Transformers (ViTs) so they can be deployed in real-world applications.
  • methods: Surveys and analyzes four families of efficiency strategies: compact architectures, pruning, knowledge distillation, and quantization; also introduces an Efficient Error Rate metric to normalize model features that affect inference hardware (parameters, bits, FLOPs, model size).
  • results: Discusses and compares the performance of existing efficiency strategies across application scenarios, and identifies open challenges and promising research directions.
    Abstract Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism, outperforming earlier convolutional neural networks. However, ViT deployment and performance have grown steadily with their size, number of trainable parameters, and operations. Furthermore, self-attention's computational and memory cost quadratically increases with the image resolution. Generally speaking, it is challenging to employ these architectures in real-world applications due to many hardware and environmental restrictions, such as processing and computational capabilities. Therefore, this survey investigates the most efficient methodologies to ensure sub-optimal estimation performances. More in detail, four efficient categories will be analyzed: compact architecture, pruning, knowledge distillation, and quantization strategies. Moreover, a new metric called Efficient Error Rate has been introduced in order to normalize and compare models' features that affect hardware devices at inference time, such as the number of parameters, bits, FLOPs, and model size. Summarizing, this paper firstly mathematically defines the strategies used to make Vision Transformer efficient, describes and discusses state-of-the-art methodologies, and analyzes their performances over different application scenarios. Toward the end of this paper, we also discuss open challenges and promising research directions.

RawHDR: High Dynamic Range Image Reconstruction from a Single Raw Image

  • paper_url: http://arxiv.org/abs/2309.02020
  • repo_url: https://github.com/jackzou233/rawhdr
  • paper_authors: Yunhao Zou, Chenggang Yan, Ying Fu
  • for: The goal of this paper is to generate high dynamic range (HDR) images directly from raw sensor data, recovering scene information in the hardest regions.
  • methods: A model tailored to raw images exploits the characteristics of raw data for raw-to-HDR mapping: it learns exposure masks to separate the hard and easy regions of a high dynamic scene, then introduces two guidances: dual intensity guidance, which guides less informative channels with more informative ones, and global spatial guidance, which extrapolates scene specifics over an extended spatial domain.
  • results: Experiments validate the superiority of the proposed raw-to-HDR reconstruction model as well as the newly captured raw/HDR paired dataset used for training and testing.
    Abstract High dynamic range (HDR) images capture much more intensity levels than standard ones. Current methods predominantly generate HDR images from 8-bit low dynamic range (LDR) sRGB images that have been degraded by the camera processing pipeline. However, it becomes a formidable task to retrieve extremely high dynamic range scenes from such limited bit-depth data. Unlike existing methods, the core idea of this work is to incorporate more informative Raw sensor data to generate HDR images, aiming to recover scene information in hard regions (the darkest and brightest areas of an HDR scene). To this end, we propose a model tailor-made for Raw images, harnessing the unique features of Raw data to facilitate the Raw-to-HDR mapping. Specifically, we learn exposure masks to separate the hard and easy regions of a high dynamic scene. Then, we introduce two important guidances, dual intensity guidance, which guides less informative channels with more informative ones, and global spatial guidance, which extrapolates scene specifics over an extended spatial domain. To verify our Raw-to-HDR approach, we collect a large Raw/HDR paired dataset for both training and testing. Our empirical evaluations validate the superiority of the proposed Raw-to-HDR reconstruction model, as well as our newly captured dataset in the experiments.

Logarithmic Mathematical Morphology: theory and applications

  • paper_url: http://arxiv.org/abs/2309.02007
  • repo_url: None
  • paper_authors: Guillaume Noyel
  • for: Addressing the issue of lighting variations in Mathematical Morphology for grey-level functions.
  • methods: Defining a new framework called Logarithmic Mathematical Morphology (LMM) with an additive law that varies according to the image amplitude, and using it to define operators that are robust to lighting variations.
  • results: Compared with three state-of-the-art approaches for vessel segmentation on eye-fundus images with non-uniform lighting variations, the LMM approach shows better robustness to such variations.
    Abstract Classically, in Mathematical Morphology, an image (i.e., a grey-level function) is analysed by another image which is named the structuring element or the structuring function. This structuring function is moved over the image domain and summed to the image. However, in an image presenting lighting variations, the analysis by a structuring function should require that its amplitude varies according to the image intensity. Such a property is not verified in Mathematical Morphology for grey level functions, when the structuring function is summed to the image with the usual additive law. In order to address this issue, a new framework is defined with an additive law for which the amplitude of the structuring function varies according to the image amplitude. This additive law is chosen within the Logarithmic Image Processing framework and models the lighting variations with a physical cause such as a change of light intensity or a change of camera exposure-time. The new framework is named Logarithmic Mathematical Morphology (LMM) and allows the definition of operators which are robust to such lighting variations. In images with uniform lighting variations, those new LMM operators perform better than usual morphological operators. In eye-fundus images with non-uniform lighting variations, a LMM method for vessel segmentation is compared to three state-of-the-art approaches. Results show that the LMM approach has a better robustness to such variations than the three others.
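For concreteness, the sketch below illustrates the Logarithmic Image Processing (LIP) additive law the framework builds on, f (+) g = f + g - f*g/M for grey levels in [0, M), together with a toy dilation that combines the structuring value with the image via this law instead of ordinary addition. The grey-scale convention and boundary handling are simplified assumptions, and this is not the paper's implementation:

```python
import numpy as np

M = 256.0  # upper bound of the grey-level range, assumed

def lip_add(f: np.ndarray, g: float) -> np.ndarray:
    """LIP addition of a flat structuring value g to image f:
    the effective increment shrinks as f approaches M."""
    return f + g - f * g / M

def lip_dilation(image: np.ndarray, se_value: float, size: int = 3) -> np.ndarray:
    """Grey-level dilation where the structuring function is combined with
    the image via the LIP law instead of ordinary addition."""
    pad = size // 2
    padded = np.pad(image, pad, mode='edge')
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + size, j:j + size]
            out[i, j] = lip_add(window, se_value).max()
    return out

img = np.random.uniform(0, 255, (32, 32))
dilated = lip_dilation(img, se_value=20.0)   # result stays within [0, M]
```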

Retail store customer behavior analysis system: Design and Implementation

  • paper_url: http://arxiv.org/abs/2309.03232
  • repo_url: None
  • paper_authors: Tuan Dinh Nguyen, Keisuke Hihara, Tung Cao Hoang, Yumeka Utada, Akihiko Torii, Naoki Izumi, Nguyen Thanh Thuy, Long Quoc Tran
  • for: To improve customer satisfaction in retail stores by adding personalized value to services through behavior analysis.
  • methods: Deep learning techniques, including deep neural networks, analyze customers' in-store interactions with items and other people; the framework combines mathematical modeling of behaviors, a deep-learning-based analysis system, and individual and group behavior visualization.
  • results: The deep learning approach detects customer behaviors better than conventional methods and provides useful visualizations of the acquired behavioral data; each module and the entire system were validated on data from a real retail store.
    Abstract Understanding customer behavior in retail stores plays a crucial role in improving customer satisfaction by adding personalized value to services. Behavior analysis reveals both general and detailed patterns in the interaction of customers with a store items and other people, providing store managers with insight into customer preferences. Several solutions aim to utilize this data by recognizing specific behaviors through statistical visualization. However, current approaches are limited to the analysis of small customer behavior sets, utilizing conventional methods to detect behaviors. They do not use deep learning techniques such as deep neural networks, which are powerful methods in the field of computer vision. Furthermore, these methods provide limited figures when visualizing the behavioral data acquired by the system. In this study, we propose a framework that includes three primary parts: mathematical modeling of customer behaviors, behavior analysis using an efficient deep learning based system, and individual and group behavior visualization. Each module and the entire system were validated using data from actual situations in a retail store.

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

  • paper_url: http://arxiv.org/abs/2309.01961
  • repo_url: None
  • paper_authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, Jonghwan Mun, Solgil Oh, Kenan Emir Ak, Gwang-Gook Lee, Yan Xu, Mingwei Shen, Kyomin Hwang, Wonsik Shin, Kamin Lee, Wonhark Park, Dongkwan Lee, Nojun Kwak, Yujin Wang, Yimu Wang, Tiancheng Gu, Xingchang Lv, Mingmao Sun
  • for: A challenge designed to push the computer vision community toward image captioning models that advance the state of the art in both accuracy and fairness.
  • methods: Models are tested on a new evaluation dataset covering a large variety of visual concepts from many domains; no task-specific training data is provided, so entries must adapt to image descriptions unseen during training.
  • results: The report presents the newly proposed NICE dataset, the evaluation methods, the challenge results, and technical details of the top-ranking entries, which are expected to improve AI models on various vision-language tasks.
    Abstract In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested using a new evaluation dataset that includes a large variety of visual concepts from many domains. There was no specific training data provided for the challenge, and therefore the challenge entries were required to adapt to new types of image descriptions that had not been seen during training. This report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks.

Empowering Low-Light Image Enhancer through Customized Learnable Priors

  • paper_url: http://arxiv.org/abs/2309.01958
  • repo_url: https://github.com/zheng980629/cue
  • paper_authors: Naishan Zheng, Man Zhou, Yanmeng Dong, Xiangyu Rui, Jie Huang, Chongyi Li, Feng Zhao
  • for: Enhancing low-light images by improving brightness and suppressing noise.
  • methods: Customized learnable priors improve the transparency and interpretability of the deep unfolding architecture along two routes: a structure flow, which trains a Masked Autoencoder (MAE) on illumination properties and embeds it into the proximal operator design, and an optimization flow, which trains an MAE on gradient representations and uses it as a regularization term to constrain noise in the model output.
  • results: Extensive experiments on multiple low-light image enhancement datasets demonstrate the superiority of the proposed method over state-of-the-art approaches.
    Abstract Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end mapping networks heuristically, neglecting the intrinsic prior of the image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issues, they rely on proximal operator networks that deliver ambiguous and implicit priors. In this work, we propose a paradigm for low-light image enhancement that explores the potential of customized learnable priors to improve the transparency of the deep unfolding paradigm. Motivated by the powerful feature representation capability of Masked Autoencoder (MAE), we customize MAE-based illumination and noise priors and redevelop them from two perspectives: 1) \textbf{structure flow}: we train the MAE from a normal-light image to its illumination properties and then embed it into the proximal operator design of the unfolding architecture; and 2) \textbf{optimization flow}: we train MAE from a normal-light image to its gradient representation and then employ it as a regularization term to constrain noise in the model output. These designs improve the interpretability and representation capability of the model. Extensive experiments on multiple low-light image enhancement datasets demonstrate the superiority of our proposed paradigm over state-of-the-art methods. Code is available at https://github.com/zheng980629/CUE.
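A minimal sketch of one deep-unfolding iteration as we read the setup: a gradient step on a quadratic data-fidelity term followed by a learned proximal operator, which is the slot where the MAE-derived illumination prior would be plugged in. The tiny CNN below is a placeholder of our own, not the paper's CUE model:

```python
import torch
import torch.nn as nn

class UnfoldingStep(nn.Module):
    """One unrolled iteration: gradient step on 0.5*||x - y||^2, then a
    learned proximal operator (placeholder CNN standing in for the prior)."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(0.1))      # learnable step size
        self.prox = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x, y):
        x = x - self.step * (x - y)    # gradient of the data-fidelity term
        return self.prox(x)            # learned prior / proximal mapping

y = torch.rand(1, 3, 64, 64)           # low-light observation
x = y.clone()
for step in [UnfoldingStep() for _ in range(3)]:
    x = step(x, y)                      # K unrolled iterations
```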

Efficient Bayesian Computational Imaging with a Surrogate Score-Based Prior

  • paper_url: http://arxiv.org/abs/2309.01949
  • repo_url: None
  • paper_authors: Berthy T. Feng, Katherine L. Bouman
  • for: This paper proposes a surrogate function for efficient use of score-based priors in solving ill-posed Bayesian imaging problems.
  • methods: Score-based diffusion models are turned into probabilistic priors via the evidence lower bound of the diffusion model, instead of the computationally inefficient ODE-based log-probability function.
  • results: Compared to the exact prior of previous work, the surrogate prior accelerates optimization of the variational image distribution by at least two orders of magnitude, and the principled approach achieves higher-fidelity images than non-Bayesian baselines.
    Abstract We propose a surrogate function for efficient use of score-based priors for Bayesian inverse imaging. Recent work turned score-based diffusion models into probabilistic priors for solving ill-posed imaging problems by appealing to an ODE-based log-probability function. However, evaluating this function is computationally inefficient and inhibits posterior estimation of high-dimensional images. Our proposed surrogate prior is based on the evidence lower-bound of a score-based diffusion model. We demonstrate the surrogate prior on variational inference for efficient approximate posterior sampling of large images. Compared to the exact prior in previous work, our surrogate prior accelerates optimization of the variational image distribution by at least two orders of magnitude. We also find that our principled approach achieves higher-fidelity images than non-Bayesian baselines that involve hyperparameter-tuning at inference. Our work establishes a practical path forward for using score-based diffusion models as general-purpose priors for imaging.

Extract-and-Adaptation Network for 3D Interacting Hand Mesh Recovery

  • paper_url: http://arxiv.org/abs/2309.01943
  • repo_url: None
  • paper_authors: JoonKyu Park, Daniel Sungho Jung, Gyeongsik Moon, Kyoung Mu Lee
  • for: This work aims to improve the accuracy of 3D interacting hand mesh recovery, even when the poses of the two hands are very different.
  • methods: EANet, an extract-and-adaptation network built around the EABlock. Rather than using the two hand features directly as input tokens, the EABlock uses two novel complementary token types, SimToken and JoinToken, formed by combining the separated two-hand features, which makes it much more robust to the distant token problem.
  • results: EANet achieves state-of-the-art performance on 3D interacting hands benchmarks. Code is available at https://github.com/jkpark0825/EANet.
    Abstract Understanding how two hands interact with each other is a key component of accurate 3D interacting hand mesh recovery. However, recent Transformer-based methods struggle to learn the interaction between two hands as they directly utilize two hand features as input tokens, which results in distant token problem. The distant token problem represents that input tokens are in heterogeneous spaces, leading Transformer to fail in capturing correlation between input tokens. Previous Transformer-based methods suffer from the problem especially when poses of two hands are very different as they project features from a backbone to separate left and right hand-dedicated features. We present EANet, extract-and-adaptation network, with EABlock, the main component of our network. Rather than directly utilizing two hand features as input tokens, our EABlock utilizes two complementary types of novel tokens, SimToken and JoinToken, as input tokens. Our two novel tokens are from a combination of separated two hand features; hence, it is much more robust to the distant token problem. Using the two type of tokens, our EABlock effectively extracts interaction feature and adapts it to each hand. The proposed EANet achieves the state-of-the-art performance on 3D interacting hands benchmarks. The codes are available at https://github.com/jkpark0825/EANet.

DR-Pose: A Two-stage Deformation-and-Registration Pipeline for Category-level 6D Object Pose Estimation

  • paper_url: http://arxiv.org/abs/2309.01925
  • repo_url: https://github.com/zray26/dr-pose
  • paper_authors: Lei Zhou, Zhiyang Liu, Runze Gan, Haozhe Wang, Marcelo H. Ang Jr
  • for: Improving the accuracy of category-level 6D object pose estimation with a two-stage pipeline design.
  • methods: A completion-aided deformation stage followed by a scaled registration stage: point cloud completion first generates the unseen parts of the target object to guide deformation of the shape prior, then a registration network extracts pose-sensitive features and predicts the representation of the partial point cloud in canonical space.
  • results: DR-Pose outperforms state-of-the-art shape-prior-based methods on both the CAMERA25 and REAL275 benchmarks.
    Abstract Category-level object pose estimation involves estimating the 6D pose and the 3D metric size of objects from predetermined categories. While recent approaches take categorical shape prior information as reference to improve pose estimation accuracy, the single-stage network design and training manner lead to sub-optimal performance since there are two distinct tasks in the pipeline. In this paper, the advantage of two-stage pipeline over single-stage design is discussed. To this end, we propose a two-stage deformation-and registration pipeline called DR-Pose, which consists of completion-aided deformation stage and scaled registration stage. The first stage uses a point cloud completion method to generate unseen parts of target object, guiding subsequent deformation on the shape prior. In the second stage, a novel registration network is designed to extract pose-sensitive features and predict the representation of object partial point cloud in canonical space based on the deformation results from the first stage. DR-Pose produces superior results to the state-of-the-art shape prior-based methods on both CAMERA25 and REAL275 benchmarks. Codes are available at https://github.com/Zray26/DR-Pose.git.

Causal Scoring Medical Image Explanations: A Case Study On Ex-vivo Kidney Stone Images

  • paper_url: http://arxiv.org/abs/2309.01921
  • repo_url: None
  • paper_authors: Armando Villegas-Jimenez, Daniel Flores-Araiza, Francisco Lopez-Tiro, Gilberto Ochoa-Ruiz and Christian Daul
  • for: To provide a quantitative measure of the causal relationship between a model's inputs, explanations, and outputs, so that the assessment no longer depends on the user's level of expertise.
  • methods: The Causal Explanation Score (CaES) measures the causal relationship between features from the area of the object of interest in images of a class and the output of a classifier.
  • results: Causal relationships are measurably stronger when the area of the object of interest is indicated by a mask from an explainable method than when it is indicated by human annotators.
    Abstract On the promise that if human users know the cause of an output, it would enable them to grasp the process responsible for the output, and hence provide understanding, many explainable methods have been proposed to indicate the cause for the output of a model based on its input. Nonetheless, little has been reported on quantitative measurements of such causal relationships between the inputs, the explanations, and the outputs of a model, leaving the assessment to the user, independent of their level of expertise in the subject. To address this situation, we explore a technique for measuring the causal relationship between the features from the area of the object of interest in the images of a class and the output of a classifier. Our experiments indicate stronger measured causal relationships when the area of the object of interest per class is indicated by a mask from an explainable method than when it is indicated by human annotators. Hence the chosen name: Causal Explanation Score (CaES).

Improving Drone Imagery For Computer Vision/Machine Learning in Wilderness Search and Rescue

  • paper_url: http://arxiv.org/abs/2309.01904
  • repo_url: https://github.com/crasar/wisar
  • paper_authors: Robin Murphy, Thomas Manzini
  • for: Identifying gaps in the acquisition of drone imagery for wilderness search and rescue that impair its use with computer vision/machine learning (CV/ML) models.
  • methods: Five recommendations to maximize image suitability for CV/ML post-processing, including automated data collection software during the wide-area search phase and, when CV/ML use is expected, flight optimization based on knowledge of the model.
  • results: Using the 2023 Wu-Murad search in Japan, one of the largest missing person searches conducted in that area, as a case study, the paper shows that the large volume of wide-area search data offers the greatest opportunity for CV/ML techniques, since the images would otherwise have to be inspected manually.
    Abstract This paper describes gaps in acquisition of drone imagery that impair the use with computer vision/machine learning (CV/ML) models and makes five recommendations to maximize image suitability for CV/ML post-processing. It describes a notional work process for the use of drones in wilderness search and rescue incidents. The large volume of data from the wide area search phase offers the greatest opportunity for CV/ML techniques because of the large number of images that would otherwise have to be manually inspected. The 2023 Wu-Murad search in Japan, one of the largest missing person searches conducted in that area, serves as a case study. Although drone teams conducting wide area searches may not know in advance if the data they collect is going to be used for CV/ML post-processing, there are data collection procedures that can improve the search in general with automated collection software. If the drone teams do expect to use CV/ML, then they can exploit knowledge about the model to further optimize flights.

Towards Robust Plant Disease Diagnosis with Hard-sample Re-mining Strategy

  • paper_url: http://arxiv.org/abs/2309.01903
  • repo_url: None
  • paper_authors: Quan Huu Cap, Atsushi Fukuda, Satoshi Kagiwada, Hiroyuki Uga, Nobusuke Iwasaki, Hitoshi Iyatomi
  • for: Improving the accuracy and efficiency of automated plant disease diagnosis systems, particularly when handling large amounts of unannotated healthy data.
  • methods: A simple yet effective training strategy called hard-sample re-mining (HSReM), which strategically selects hard-sample training images at an appropriate level, improving diagnostic performance on healthy data while simultaneously improving performance on disease data.
  • results: Experiments on two practical in-field datasets, an eight-class cucumber set and a ten-class tomato set (42.7K and 35.6K images), show that HSReM substantially improves overall diagnostic performance on large-scale unseen data, outperforming the original object detection model, a classification-based EfficientNetV2-Large model, and a model trained with plain hard-sample mining.
    Abstract With rich annotation information, object detection-based automated plant disease diagnosis systems (e.g., YOLO-based systems) often provide advantages over classification-based systems (e.g., EfficientNet-based), such as the ability to detect disease locations and superior classification performance. One drawback of these detection systems is dealing with unannotated healthy data with no real symptoms present. In practice, healthy plant data appear to be very similar to many disease data. Thus, those models often produce mis-detected boxes on healthy images. In addition, labeling new data for detection models is typically time-consuming. Hard-sample mining (HSM) is a common technique for re-training a model by using the mis-detected boxes as new training samples. However, blindly selecting an arbitrary amount of hard-sample for re-training will result in the degradation of diagnostic performance for other diseases due to the high similarity between disease and healthy data. In this paper, we propose a simple but effective training strategy called hard-sample re-mining (HSReM), which is designed to enhance the diagnostic performance of healthy data and simultaneously improve the performance of disease data by strategically selecting hard-sample training images at an appropriate level. Experiments based on two practical in-field eight-class cucumber and ten-class tomato datasets (42.7K and 35.6K images) show that our HSReM training strategy leads to a substantial improvement in the overall diagnostic performance on large-scale unseen data. Specifically, the object detection model trained using the HSReM strategy not only achieved superior results as compared to the classification-based state-of-the-art EfficientNetV2-Large model and the original object detection model, but also outperformed the model using the HSM strategy.
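A hedged sketch of the re-mining idea as we read it: false-positive boxes produced on healthy images are restricted to a moderately hard confidence band and capped per image before being added back as training negatives, instead of blindly re-using all of them. The band and the cap below are illustrative assumptions, not the paper's settings:

```python
def select_hard_samples(false_positives, lo=0.3, hi=0.8, cap_per_image=2):
    """false_positives: list of (image_id, confidence) for boxes predicted
    on healthy images. Returns the subset to re-train on as negatives."""
    per_image = {}
    for image_id, conf in sorted(false_positives, key=lambda p: -p[1]):
        if lo <= conf <= hi:                    # skip trivial and extreme cases
            bucket = per_image.setdefault(image_id, [])
            if len(bucket) < cap_per_image:     # cap to avoid swamping diseases
                bucket.append((image_id, conf))
    return [fp for bucket in per_image.values() for fp in bucket]

# ('img1', 0.9) is dropped as too extreme; the other two are re-mined.
mined = select_hard_samples([('img1', 0.9), ('img1', 0.6), ('img2', 0.4)])
```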

Unsupervised Skin Lesion Segmentation via Structural Entropy Minimization on Multi-Scale Superpixel Graphs

  • paper_url: http://arxiv.org/abs/2309.01899
  • repo_url: https://github.com/selgroup/sled
  • paper_authors: Guangjie Zeng, Hao Peng, Angsheng Li, Zhiwei Liu, Chunyang Liu, Philip S. Yu, Lifang He
  • for: This paper proposes a novel unsupervised skin lesion segmentation method, addressing the lack of interpretability in existing deep-learning approaches.
  • methods: SLED segments skin lesions by minimizing the structural entropy of a superpixel graph constructed from the dermoscopic image, then characterizes the consistency of healthy skin features and applies a multi-scale segmentation mechanism based on isolation forest outlier detection to improve accuracy.
  • results: Experiments on four skin lesion benchmarks against nine representative unsupervised segmentation methods demonstrate the superiority of the proposed framework; case studies further illustrate its effectiveness.
    Abstract Skin lesion segmentation is a fundamental task in dermoscopic image analysis. The complex features of pixels in the lesion region impede the lesion segmentation accuracy, and existing deep learning-based methods often lack interpretability to this problem. In this work, we propose a novel unsupervised Skin Lesion sEgmentation framework based on structural entropy and isolation forest outlier Detection, namely SLED. Specifically, skin lesions are segmented by minimizing the structural entropy of a superpixel graph constructed from the dermoscopic image. Then, we characterize the consistency of healthy skin features and devise a novel multi-scale segmentation mechanism by outlier detection, which enhances the segmentation accuracy by leveraging the superpixel features from multiple scales. We conduct experiments on four skin lesion benchmarks and compare SLED with nine representative unsupervised segmentation methods. Experimental results demonstrate the superiority of the proposed framework. Additionally, some case studies are analyzed to demonstrate the effectiveness of SLED.

Gradient Domain Diffusion Models for Image Synthesis

  • paper_url: http://arxiv.org/abs/2309.01875
  • repo_url: None
  • paper_authors: Yuanhao Gong
  • for: This work aims to make diffusion models for image and video synthesis more efficient by performing the diffusion process in the gradient domain.
  • methods: The diffusion process is carried out in the gradient domain, exploiting two properties: thanks to the Poisson equation, the gradient domain is mathematically equivalent to the original image domain, and the gradient domain is much sparser, so the diffusion converges faster.
  • results: Numerical experiments confirm that gradient-domain diffusion models are more efficient than the original diffusion models; the method applies to a wide range of image processing, computer vision, and machine learning tasks.
    Abstract Diffusion models are getting popular in generative image and video synthesis. However, due to the diffusion process, they require a large number of steps to converge. To tackle this issue, in this paper, we propose to perform the diffusion process in the gradient domain, where the convergence becomes faster. There are two reasons. First, thanks to the Poisson equation, the gradient domain is mathematically equivalent to the original image domain. Therefore, each diffusion step in the image domain has a unique corresponding gradient domain representation. Second, the gradient domain is much sparser than the image domain. As a result, gradient domain diffusion models converge faster. Several numerical experiments confirm that the gradient domain diffusion models are more efficient than the original diffusion models. The proposed method can be applied in a wide range of applications such as image processing, computer vision and machine learning tasks.
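The sketch below demonstrates only the mathematical equivalence the method relies on: an image maps to its gradient field and is recovered, up to an additive constant, by solving the Poisson equation laplacian(u) = div(g). An FFT solver with periodic boundaries is used for illustration; the diffusion model itself is not reproduced:

```python
import numpy as np

def to_gradient(u: np.ndarray):
    gx = np.roll(u, -1, axis=1) - u      # forward differences
    gy = np.roll(u, -1, axis=0) - u
    return gx, gy

def from_gradient(gx: np.ndarray, gy: np.ndarray) -> np.ndarray:
    # Divergence via backward differences (adjoint of the forward gradient).
    div = (gx - np.roll(gx, 1, axis=1)) + (gy - np.roll(gy, 1, axis=0))
    h, w = div.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Fourier symbol of the discrete periodic 5-point Laplacian.
    denom = 2 * np.cos(2 * np.pi * fx) + 2 * np.cos(2 * np.pi * fy) - 4
    denom[0, 0] = 1.0                    # DC term is undetermined (constant)
    u_hat = np.fft.fft2(div) / denom
    u_hat[0, 0] = 0.0                    # fix the free constant: zero mean
    return np.real(np.fft.ifft2(u_hat))

u = np.random.rand(64, 64)
u_rec = from_gradient(*to_gradient(u))
assert np.allclose(u - u.mean(), u_rec, atol=1e-6)   # exact up to a constant
```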

cs.AI - 2023-09-05

Utilizing Generative Adversarial Networks for Stable Structure Generation in Angry Birds

  • paper_url: http://arxiv.org/abs/2309.02614
  • repo_url: https://github.com/Blaxzter/Utilizing-Generative-Adversarial-Networks-for-Stable-Structure-Generation-in-Angry-Birds
  • paper_authors: Frederic Abraham, Matthew Stephenson
  • for: Investigating the suitability of Generative Adversarial Networks (GANs) for generating stable structures for the physics-based puzzle game Angry Birds.
  • methods: A detailed encoding/decoding process converts between Angry Birds level descriptions and a suitable grid-based representation, and state-of-the-art GAN architectures and training methods are used to produce new structure designs.
  • results: GANs can be successfully applied to generate a varied range of complex and stable Angry Birds structures.
    Abstract This paper investigates the suitability of using Generative Adversarial Networks (GANs) to generate stable structures for the physics-based puzzle game Angry Birds. While previous applications of GANs for level generation have been mostly limited to tile-based representations, this paper explores their suitability for creating stable structures made from multiple smaller blocks. This includes a detailed encoding/decoding process for converting between Angry Birds level descriptions and a suitable grid-based representation, as well as utilizing state-of-the-art GAN architectures and training methods to produce new structure designs. Our results show that GANs can be successfully applied to generate a varied range of complex and stable Angry Birds structures.

Detection of Unknown-Unknowns in Cyber-Physical Systems using Statistical Conformance with Physics Guided Process Models

  • paper_url: http://arxiv.org/abs/2309.02603
  • repo_url: None
  • paper_authors: Aranyak Maity, Ayan Banerjee, Sandeep Gupta
  • for: This paper analyzes and evaluates cyber-physical systems (CPS) under unknown-unknown operational scenarios.
  • methods: Dynamics-induced hybrid recurrent neural networks (DiH-RNN) mine a physics-guided surrogate model (PGSM), whose coefficients are checked for model conformance using Signal Temporal Logic (STL).
  • results: The approach detects operational changes in an Artificial Pancreas (AP) caused by unknown insulin cartridge errors.
    Abstract Unknown unknowns are operational scenarios in a cyber-physical system that are not accounted for in the design and test phase. As such under unknown-unknown scenarios, the operational behavior of the CPS is not guaranteed to meet requirements such as safety and efficacy specified using Signal Temporal Logic (STL) on the output trajectories. We propose a novel framework for analyzing the stochastic conformance of operational output characteristics of safety-critical cyber-physical systems that can discover unknown-unknown scenarios and evaluate potential safety hazards. We propose dynamics-induced hybrid recurrent neural networks (DiH-RNN) to mine a physics-guided surrogate model (PGSM) which is used to check the model conformance using STL on the model coefficients. We demonstrate the detection of operational changes in an Artificial Pancreas(AP) due to unknown insulin cartridge errors.
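As a toy illustration of STL-based conformance checking (our own simplification, not the DiH-RNN pipeline), the robustness of the property G(|y - y_hat| < eps) over a finite trajectory is min over t of (eps - |y_t - y_hat_t|); a negative value flags a run where the surrogate model no longer conforms to the observed output, hinting at an unknown-unknown. The glucose trace and eps below are made up:

```python
import numpy as np

def stl_always_close(y: np.ndarray, y_hat: np.ndarray, eps: float) -> float:
    """Robustness of G(|y - y_hat| < eps); positive iff the bound holds at
    every time step of the trajectory."""
    return float(np.min(eps - np.abs(y - y_hat)))

glucose = np.array([110, 118, 130, 155, 190], dtype=float)    # observed output
predicted = np.array([112, 120, 128, 150, 160], dtype=float)  # surrogate model
rho = stl_always_close(glucose, predicted, eps=20.0)
print(rho)   # -10.0: the run violates the spec, flagging a possible anomaly
```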

Comparative Evaluation of Metaheuristic Algorithms for Hyperparameter Selection in Short-Term Weather Forecasting

  • paper_url: http://arxiv.org/abs/2309.02600
  • repo_url: None
  • paper_authors: Anuvab Sen, Arul Rhik Mazumder, Dibyarup Dutta, Udayon Sen, Pathikrit Syam, Sandipan Dhar
  • for: Accurately capturing the complex dynamics of weather systems, which traditional statistical models struggle to model.
  • methods: Deep learning forecasters (vanilla ANNs, LSTM, and GRU networks) combined with metaheuristic algorithms, namely Genetic Algorithm (GA), Differential Evolution (DE), and Particle Swarm Optimization (PSO), to automate the search for optimal hyperparameters.
  • results: The metaheuristics reliably find good hyperparameters and enhance forecasting accuracy, as measured by MSE and MAPE, and the comparative analysis shows how different model architectures interact with each optimizer.
    Abstract Weather forecasting plays a vital role in numerous sectors, but accurately capturing the complex dynamics of weather systems remains a challenge for traditional statistical models. Apart from Auto Regressive time forecasting models like ARIMA, deep learning techniques (Vanilla ANNs, LSTM and GRU networks), have shown promise in improving forecasting accuracy by capturing temporal dependencies. This paper explores the application of metaheuristic algorithms, namely Genetic Algorithm (GA), Differential Evolution (DE), and Particle Swarm Optimization (PSO), to automate the search for optimal hyperparameters in these model architectures. Metaheuristic algorithms excel in global optimization, offering robustness, versatility, and scalability in handling non-linear problems. We present a comparative analysis of different model architectures integrated with metaheuristic optimization, evaluating their performance in weather forecasting based on metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). The results demonstrate the potential of metaheuristic algorithms in enhancing weather forecasting accuracy \& helps in determining the optimal set of hyper-parameters for each model. The paper underscores the importance of harnessing advanced optimization techniques to select the most suitable metaheuristic algorithm for the given weather forecasting task.
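A lightweight sketch of the search loop with two substitutions to keep it runnable: SciPy's differential_evolution stands in for the paper's DE implementation, and a small scikit-learn MLP regressor stands in for the LSTM/GRU forecasters. The objective is validation MSE as in the paper; the window size, bounds, and budget are assumptions:

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(400)
series = np.sin(t / 20.0) + 0.1 * rng.standard_normal(t.size)
X = np.stack([series[i:i + 8] for i in range(len(series) - 8)])  # lag windows
y = series[8:]
X_tr, X_va, y_tr, y_va = X[:300], X[300:], y[:300], y[300:]

def objective(params):
    """Train a forecaster with the candidate hyperparameters, return val MSE."""
    units, lr = int(params[0]), params[1]
    model = MLPRegressor(hidden_layer_sizes=(units,), learning_rate_init=lr,
                         max_iter=200, random_state=0)
    model.fit(X_tr, y_tr)
    return np.mean((model.predict(X_va) - y_va) ** 2)

result = differential_evolution(objective, bounds=[(4, 64), (1e-4, 1e-1)],
                                maxiter=5, popsize=6, seed=0)
print(result.x, result.fun)   # best (hidden units, learning rate) and its MSE
```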

Approximating High-Dimensional Minimal Surfaces with Physics-Informed Neural Networks

  • paper_url: http://arxiv.org/abs/2309.02589
  • repo_url: None
  • paper_authors: Steven Zhou, Xiaojing Ye
  • for: This paper computes numerical approximations of minimal surfaces, an essential type of partial differential equation (PDE), in higher dimensions, where classical methods fail because of the curse of dimensionality.
  • methods: A Physics-Informed Neural Network (PINN) trains a deep neural network (DNN) to solve the minimal surface PDE.
  • results: The PINN scales to higher dimensions and trains relatively quickly even on a laptop with no GPU; the paper also explores potential limitations in the method's performance.
    Abstract In this paper, we compute numerical approximations of the minimal surfaces, an essential type of Partial Differential Equation (PDE), in higher dimensions. Classical methods cannot handle it in this case because of the Curse of Dimensionality, where the computational cost of these methods increases exponentially fast in response to higher problem dimensions, far beyond the computing capacity of any modern supercomputers. Only in the past few years have machine learning researchers been able to mitigate this problem. The solution method chosen here is a model known as a Physics-Informed Neural Network (PINN) which trains a deep neural network (DNN) to solve the minimal surface PDE. It can be scaled up into higher dimensions and trained relatively quickly even on a laptop with no GPU. Due to the inability to view the high-dimension output, our data is presented as snippets of a higher-dimension shape with enough fixed axes so that it is viewable with 3-D graphs. Not only will the functionality of this method be tested, but we will also explore potential limitations in the method's performance.
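A minimal PINN-style sketch under our own assumptions: a small MLP u(x) is trained so that the minimal-surface residual div(grad(u) / sqrt(1 + |grad(u)|^2)) vanishes at random interior collocation points. The boundary-condition loss that any real problem needs is omitted for brevity, and the architecture is illustrative:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

def residual(x: torch.Tensor) -> torch.Tensor:
    """Minimal-surface PDE residual at collocation points x: (N, d)."""
    x = x.requires_grad_(True)
    u = net(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]   # (N, d)
    flux = grad_u / torch.sqrt(1.0 + (grad_u ** 2).sum(-1, keepdim=True))
    div = 0.0
    for i in range(x.shape[1]):   # divergence of the flux, term by term
        div = div + torch.autograd.grad(flux[:, i].sum(), x,
                                        create_graph=True)[0][:, i]
    return div

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):
    pts = torch.rand(256, 2)                 # interior collocation points
    loss = (residual(pts) ** 2).mean()       # + boundary loss in practice
    opt.zero_grad(); loss.backward(); opt.step()
```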

Representation Learning for Sequential Volumetric Design Tasks

  • paper_url: http://arxiv.org/abs/2309.02583
  • repo_url: None
  • paper_authors: Md Ferdous Alam, Yi Wang, Linh Tran, Chin-Yi Cheng, Jieliang Luo
  • for: This paper proposes a Transformer-based system that encodes sequential volumetric design knowledge so that reasonable design solutions can be generated automatically.
  • methods: Transformer-based models extract useful representations from a collection of expert or high-performing design sequences; the learned representations drive a preference model, built by estimating the density of the representations, and an autoregressive transformer for sequential design generation.
  • results: On a novel dataset of thousands of sequential volumetric designs, the preference model compares two arbitrary design sequences with almost 90% accuracy against random sequences, and the autoregressive model can autocomplete a volumetric design from a partial design sequence.
    Abstract Volumetric design, also called massing design, is the first and critical step in professional building design which is sequential in nature. As the volumetric design process is complex, the underlying sequential design process encodes valuable information for designers. Many efforts have been made to automatically generate reasonable volumetric designs, but the quality of the generated design solutions varies, and evaluating a design solution requires either a prohibitively comprehensive set of metrics or expensive human expertise. While previous approaches focused on learning only the final design instead of sequential design tasks, we propose to encode the design knowledge from a collection of expert or high-performing design sequences and extract useful representations using transformer-based models. Later we propose to utilize the learned representations for crucial downstream applications such as design preference evaluation and procedural design generation. We develop the preference model by estimating the density of the learned representations whereas we train an autoregressive transformer model for sequential design generation. We demonstrate our ideas by leveraging a novel dataset of thousands of sequential volumetric designs. Our preference model can compare two arbitrarily given design sequences and is almost 90% accurate in evaluation against random design sequences. Our autoregressive model is also capable of autocompleting a volumetric design sequence from a partial design sequence.

Unveiling Intractable Epileptogenic Brain Networks with Deep Learning Algorithms: A Novel and Comprehensive Framework for Scalable Seizure Prediction with Unimodal Neuroimaging Data in Pediatric Patients

  • paper_url: http://arxiv.org/abs/2309.02580
  • repo_url: None
  • paper_authors: Bliss Singhal, Fnu Pooja
  • for: Predicting seizures in pediatric patients with intractable epilepsy.
  • methods: Machine learning algorithms are evaluated on unimodal neuroimaging data consisting of electroencephalogram signals; bandpass filtering and independent component analysis reduce noise and artifacts in the dataset.
  • results: Deep learning algorithms predict seizures more successfully than logistic regression and k-nearest neighbors: the RNN gives the highest precision and F1 score, LSTM outperforms the RNN in accuracy, and the CNN yields the highest specificity. The findings could transform clinical practice and improve pediatric care.
    Abstract Epilepsy is a prevalent neurological disorder affecting 50 million individuals worldwide and 1.2 million Americans. There exist millions of pediatric patients with intractable epilepsy, a condition in which seizures fail to come under control. The occurrence of seizures can result in physical injury, disorientation, unconsciousness, and additional symptoms that could impede children's ability to participate in everyday tasks. Predicting seizures can help parents and healthcare providers take precautions, prevent risky situations, and mentally prepare children to minimize anxiety and nervousness associated with the uncertainty of a seizure. This research proposes a novel and comprehensive framework to predict seizures in pediatric patients by evaluating machine learning algorithms on unimodal neuroimaging data consisting of electroencephalogram signals. The bandpass filtering and independent component analysis proved to be effective in reducing the noise and artifacts from the dataset. Various machine learning algorithms' performance is evaluated on important metrics such as accuracy, precision, specificity, sensitivity, F1 score and MCC. The results show that the deep learning algorithms are more successful in predicting seizures than logistic Regression, and k nearest neighbors. The recurrent neural network (RNN) gave the highest precision and F1 Score, long short-term memory (LSTM) outperformed RNN in accuracy and convolutional neural network (CNN) resulted in the highest Specificity. This research has significant implications for healthcare providers in proactively managing seizure occurrence in pediatric patients, potentially transforming clinical practices, and improving pediatric care.
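A hedged sketch of the preprocessing stage described above: a Butterworth band-pass filter followed by FastICA on multi-channel EEG. The sampling rate, cut-off frequencies, channel count, and number of components are assumptions, not values taken from the paper:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.decomposition import FastICA

fs = 256.0                                   # sampling rate (Hz), assumed
eeg = np.random.randn(23, 10 * int(fs))     # (channels, samples) placeholder

# Zero-phase Butterworth band-pass, keeping the clinically relevant band.
sos = butter(4, [0.5, 40.0], btype='bandpass', fs=fs, output='sos')
filtered = sosfiltfilt(sos, eeg, axis=1)

# ICA separates independent sources; artifact components can then be removed.
ica = FastICA(n_components=20, random_state=0)
sources = ica.fit_transform(filtered.T)      # (samples, components)
# ...zero out identified artifact components here, then reconstruct:
cleaned = ica.inverse_transform(sources).T   # back to (channels, samples)
```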

Recurrence-Free Survival Prediction for Anal Squamous Cell Carcinoma Chemoradiotherapy using Planning CT-based Radiomics Model

  • paper_url: http://arxiv.org/abs/2309.02562
  • repo_url: None
  • paper_authors: Shanshan Tang, Kai Wang, David Hein, Gloria Lin, Nina N. Sanford, Jing Wang
  • for: Developing a model that leverages radiation pretreatment planning CT images to predict recurrence-free survival (RFS) in non-metastatic anal squamous cell carcinoma (ASCC) patients after chemoradiotherapy (CRT).
  • methods: Radiomics features are extracted from planning CT images of 96 ASCC patients; after pre-selection, the optimal feature set is chosen by step-forward selection with a multivariate Cox proportional hazards model, and a combined radiomics-clinical model is evaluated with five repeats of five-fold cross-validation.
  • results: Shape- and texture-based radiomics features significantly predict RFS; the combined model outperforms the clinical-only model on the testing cohort (C-index 0.80 vs 0.73, with higher AUCs at 1, 2, and 3 years), yielding distinct high- and low-risk recurrence groups (p<0.001).
    Abstract Objectives: Approximately 30% of non-metastatic anal squamous cell carcinoma (ASCC) patients will experience recurrence after chemoradiotherapy (CRT), and currently available clinical variables are poor predictors of treatment response. We aimed to develop a model leveraging information extracted from radiation pretreatment planning CT to predict recurrence-free survival (RFS) in ASCC patients after CRT. Methods: Radiomics features were extracted from planning CT images of 96 ASCC patients. Following pre-feature selection, the optimal feature set was selected via step-forward feature selection with a multivariate Cox proportional hazard model. The RFS prediction was generated from a radiomics-clinical combined model based on an optimal feature set with five repeats of five-fold cross validation. The risk stratification ability of the proposed model was evaluated with Kaplan-Meier analysis. Results: Shape- and texture-based radiomics features significantly predicted RFS. Compared to a clinical-only model, radiomics-clinical combined model achieves better performance in the testing cohort with higher C-index (0.80 vs 0.73) and AUC (0.84 vs 0.79 for 1-year RFS, 0.84 vs 0.78 for 2-year RFS, and 0.86 vs 0.83 for 3-year RFS), leading to distinctive high- and low-risk of recurrence groups (p<0.001). Conclusions: A treatment planning CT based radiomics and clinical combined model had improved prognostic performance in predicting RFS for ASCC patients treated with CRT as compared to a model using clinical features only.
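A sketch of step-forward feature selection under a Cox proportional hazards model, scored by the concordance index, as we read the pipeline. It assumes a pandas DataFrame with hypothetical time/event column names, uses the lifelines package, and omits the pre-feature selection and the five-repeat cross-validation:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def forward_select(df: pd.DataFrame, candidates: list,
                   time_col: str = 'rfs_months', event_col: str = 'recurred'):
    """Greedily add the feature that most improves the concordance index."""
    selected, best_c = [], 0.0
    improved = True
    while improved and candidates:
        improved = False
        for feat in list(candidates):
            cph = CoxPHFitter()
            cph.fit(df[selected + [feat, time_col, event_col]],
                    duration_col=time_col, event_col=event_col)
            if cph.concordance_index_ > best_c:
                best_c, best_feat, improved = cph.concordance_index_, feat, True
        if improved:
            selected.append(best_feat)
            candidates.remove(best_feat)
    return selected, best_c

# Tiny synthetic example so the function runs end to end:
rng = np.random.default_rng(0)
df = pd.DataFrame({'f1': rng.normal(size=80), 'f2': rng.normal(size=80),
                   'rfs_months': rng.exponential(24, size=80),
                   'recurred': rng.integers(0, 2, size=80)})
print(forward_select(df, ['f1', 'f2']))
```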

Physically Grounded Vision-Language Models for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2309.02561
  • repo_url: https://github.com/Stanford-ILIAD/pg-vlm
  • paper_authors: Jensen Gao, Bidipta Sarkar, Fei Xia, Ted Xiao, Jiajun Wu, Brian Ichter, Anirudha Majumdar, Dorsa Sadigh
  • for: Making vision-language models (VLMs), already strong at visual question answering and image captioning, better at reasoning about the physical world, particularly for robotic manipulation.
  • methods: PhysObjects, an object-centric dataset of 39.6K crowd-sourced and 417K automated physical concept annotations of common household objects, is used to fine-tune a VLM so it captures human priors about physical concepts from visual appearance.
  • results: Incorporating the physically grounded VLM into an interactive framework with a large-language-model-based robotic planner improves planning performance on tasks that require physical reasoning, and improves task success rates on a real robot.
    Abstract Recent advances in vision-language models (VLMs) have led to improved performance on tasks such as visual question answering and image captioning. Consequently, these models are now well-positioned to reason about the physical world, particularly within domains such as robotic manipulation. However, current VLMs are limited in their understanding of the physical concepts (e.g., material, fragility) of common objects, which restricts their usefulness for robotic manipulation tasks that involve interaction and physical reasoning about such objects. To address this limitation, we propose PhysObjects, an object-centric dataset of 39.6K crowd-sourced and 417K automated physical concept annotations of common household objects. We demonstrate that fine-tuning a VLM on PhysObjects improves its understanding of physical object concepts, including generalization to held-out concepts, by capturing human priors of these concepts from visual appearance. We incorporate this physically-grounded VLM in an interactive framework with a large language model-based robotic planner, and show improved planning performance on tasks that require reasoning about physical object concepts, compared to baselines that do not leverage physically-grounded VLMs. We additionally illustrate the benefits of our physically-grounded VLM on a real robot, where it improves task success rates. We release our dataset and provide further details and visualizations of our results at https://iliad.stanford.edu/pg-vlm/.

Automating Behavioral Testing in Machine Translation

  • paper_url: http://arxiv.org/abs/2309.02553
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Javier Ferrando, Matthias Sperber, Hendra Setiawan, Dominic Telaar, Saša Hasan
  • for: Fine-grained behavioral evaluation of machine translation systems' linguistic capabilities through analysis of input-output behavior.
  • methods: Large language models (LLMs) generate a diverse set of source sentences tailored to test MT behavior in a range of situations; expected behavior is verified by matching against candidate sets that are also generated with LLMs.
  • results: Evaluating multiple available MT systems, pass rates generally follow the trends of traditional accuracy-based metrics, but the method uncovers several important differences and potential bugs that go unnoticed when relying on accuracy alone.
    Abstract Behavioral testing in NLP allows fine-grained evaluation of systems by examining their linguistic capabilities through the analysis of input-output behavior. Unfortunately, existing work on behavioral testing in Machine Translation (MT) is currently restricted to largely handcrafted tests covering a limited range of capabilities and languages. To address this limitation, we propose to use Large Language Models (LLMs) to generate a diverse set of source sentences tailored to test the behavior of MT models in a range of situations. We can then verify whether the MT model exhibits the expected behavior through matching candidate sets that are also generated using LLMs. Our approach aims to make behavioral testing of MT systems practical while requiring only minimal human effort. In our experiments, we apply our proposed evaluation framework to assess multiple available MT systems, revealing that while in general pass-rates follow the trends observable from traditional accuracy-based metrics, our method was able to uncover several important differences and potential bugs that go unnoticed when relying only on accuracy.

Continual Improvement of Threshold-Based Novelty Detection

  • paper_url: http://arxiv.org/abs/2309.02551
  • repo_url: None
  • paper_authors: Abe Ejilemele, Jorge Mendez-Mendez
  • for: Addressing the difficulty neural networks face in detecting unseen classes in dynamic, open-world settings, where manually specified novelty-detection thresholds cannot adapt to the characteristics of real-world data.
  • methods: A new method that automatically selects detection thresholds using a linear search combined with leave-one-out cross-validation on the in-distribution classes.
  • results: Improved total accuracy on all three datasets (MNIST, Fashion MNIST, and CIFAR-10), showing that automatic threshold selection adapts better to differing data characteristics.
    Abstract When evaluated in dynamic, open-world situations, neural networks struggle to detect unseen classes. This issue complicates the deployment of continual learners in realistic environments where agents are not explicitly informed when novel categories are encountered. A common family of techniques for detecting novelty relies on thresholds of similarity between observed data points and the data used for training. However, these methods often require manually specifying (ahead of time) the value of these thresholds, and are therefore incapable of adapting to the nature of the data. We propose a new method for automatically selecting these thresholds utilizing a linear search and leave-one-out cross-validation on the ID classes. We demonstrate that this novel method for selecting thresholds results in improved total accuracy on MNIST, Fashion MNIST, and CIFAR-10.
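A minimal sketch of the threshold-selection idea, assuming a nearest-neighbor similarity score and a target leave-one-out accuracy on the in-distribution (ID) classes; the exact similarity function and selection criterion in the paper may differ.

```python
import numpy as np

def select_threshold(train_feats: np.ndarray, grid: np.ndarray,
                     target_acc: float = 0.95) -> float:
    """Linear search for the tightest (largest) similarity threshold whose
    leave-one-out accuracy on ID samples stays above target_acc."""
    sims = []
    for i in range(len(train_feats)):
        rest = np.delete(train_feats, i, axis=0)
        # similarity of the held-out sample to its nearest training point
        sims.append(-np.min(np.linalg.norm(rest - train_feats[i], axis=1)))
    sims = np.asarray(sims)
    best = grid.min()
    for t in np.sort(grid):                 # search from loosest to tightest
        if np.mean(sims >= t) >= target_acc:
            best = t                        # keep tightening while ID accuracy holds
    return best

feats = np.random.RandomState(0).randn(50, 8)   # stand-in ID features
print("selected threshold:", select_threshold(feats, np.linspace(-5.0, 0.0, 21)))
```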

Structural Concept Learning via Graph Attention for Multi-Level Rearrangement Planning

  • paper_url: http://arxiv.org/abs/2309.02547
  • repo_url: None
  • paper_authors: Manav Kulshrestha, Ahmed H. Qureshi
  • for: Robotic manipulation tasks such as object rearrangement, which let robots interact with complex and unconstrained environments.
  • methods: A deep learning approach, Structural Concept Learning (SCL), that uses graph attention networks for multi-level object rearrangement planning. SCL handles scenes with structural dependency hierarchies, generalizes to unseen scenes with arbitrary numbers of objects, and infers independent substructures for task parallelization over multiple manipulators.
  • results: Comparisons against a range of classical and model-based baselines show that SCL leverages its scene understanding to achieve better performance, flexibility, and efficiency.
    Abstract Robotic manipulation tasks, such as object rearrangement, play a crucial role in enabling robots to interact with complex and arbitrary environments. Existing work focuses primarily on single-level rearrangement planning and, even if multiple levels exist, dependency relations among substructures are geometrically simpler, like tower stacking. We propose Structural Concept Learning (SCL), a deep learning approach that leverages graph attention networks to perform multi-level object rearrangement planning for scenes with structural dependency hierarchies. It is trained on a self-generated simulation data set with intuitive structures, works for unseen scenes with an arbitrary number of objects and higher complexity of structures, infers independent substructures to allow for task parallelization over multiple manipulators, and generalizes to the real world. We compare our method with a range of classical and model-based baselines to show that our method leverages its scene understanding to achieve better performance, flexibility, and efficiency. The dataset, supplementary details, videos, and code implementation are available at: https://manavkulshrestha.github.io/scl
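Since the abstract centers on graph attention networks, a minimal single-head GAT-style layer is sketched here for reference. It is a generic sketch, not SCL's architecture; the shapes, the self-loop adjacency, and the toy dependency edges are assumptions.

```python
import torch
import torch.nn.functional as F

class GraphAttentionLayer(torch.nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = torch.nn.Linear(in_dim, out_dim, bias=False)
        self.a = torch.nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        """h: (nodes, in_dim) features, adj: (nodes, nodes) binary adjacency."""
        z = self.W(h)                                     # (N, out_dim)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))       # raw attention scores
        e = e.masked_fill(adj == 0, float("-inf"))        # attend to neighbors only
        alpha = torch.softmax(e, dim=-1)                  # (N, N) attention weights
        return alpha @ z                                  # aggregated node features

layer = GraphAttentionLayer(8, 16)
h = torch.randn(5, 8)                                     # 5 objects in a scene
adj = torch.eye(5) + torch.diag(torch.ones(4), 1)         # toy dependency edges
print(layer(h, adj).shape)                                # torch.Size([5, 16])
```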

Experience and Prediction: A Metric of Hardness for a Novel Litmus Test

  • paper_url: http://arxiv.org/abs/2309.02534
  • repo_url: None
  • paper_authors: Nicos Isaak, Loizos Michael
  • for: Developing a machine-learning-based system that estimates the hardness of Winograd schemas faster and more accurately than previous methods.
  • methods: Two approaches, a random forest and a deep-learning (LSTM-based) model, are used to output hardness indexes for Winograd schemas.
  • results: A large-scale experiment shows that human performance varies across Winograd schemas in line with their perceived hardness, and that the hardness metric can differentiate schemas in future challenges or in the WSC CAPTCHA service.
    Abstract In the last decade, the Winograd Schema Challenge (WSC) has become a central aspect of the research community as a novel litmus test. Consequently, the WSC has spurred research interest because it can be seen as the means to understand human behavior. In this regard, the development of new techniques has made possible the usage of Winograd schemas in various fields, such as the design of novel forms of CAPTCHAs. Work from the literature that established a baseline for human adult performance on the WSC has shown that not all schemas are the same, meaning that they could potentially be categorized according to their perceived hardness for humans. In this regard, this "hardness-metric" could be used in future challenges or in the WSC CAPTCHA service to differentiate between Winograd schemas. Recent work of ours has shown that this could be achieved via the design of an automated system that is able to output the hardness-indexes of Winograd schemas, albeit with limitations regarding the number of schemas it could be applied on. This paper adds to previous research by presenting a new system that is based on Machine Learning (ML), able to output the hardness of any Winograd schema faster and more accurately than any other previously used method. Our developed system, which works within two different approaches, namely the random forest and deep learning (LSTM-based), is ready to be used as an extension of any other system that aims to differentiate between Winograd schemas, according to their perceived hardness for humans. At the same time, along with our developed system we extend previous work by presenting the results of a large-scale experiment that shows how human performance varies across Winograd schemas.

Do You Trust ChatGPT? – Perceived Credibility of Human and AI-Generated Content

  • paper_url: http://arxiv.org/abs/2309.02524
  • repo_url: None
  • paper_authors: Martin Huschens, Martin Briesch, Dominik Sobania, Franz Rothlauf
  • for: Examining how much users trust human-authored versus large-language-model-generated content across different user interface versions.
  • methods: Participants rated human-authored and LLM-generated content presented in different user interface versions, assessing perceived credibility, competence, and trustworthiness.
  • results: Regardless of the UI version, participants attributed similar levels of credibility to human-authored and AI-generated content, while rating AI-generated content as clearer and more engaging. The findings call for a more discerning evaluation of information sources and for caution and critical thinking when engaging with AI-generated content.
    Abstract This paper examines how individuals perceive the credibility of content originating from human authors versus content generated by large language models, like the GPT language model family that powers ChatGPT, in different user interface versions. Surprisingly, our results demonstrate that regardless of the user interface presentation, participants tend to attribute similar levels of credibility. While participants also do not report any different perceptions of competence and trustworthiness between human and AI-generated content, they rate AI-generated content as being clearer and more engaging. The findings from this study serve as a call for a more discerning approach to evaluating information sources, encouraging users to exercise caution and critical thinking when engaging with content generated by AI systems.

Efficient RL via Disentangled Environment and Agent Representations

  • paper_url: http://arxiv.org/abs/2309.02435
  • repo_url: None
  • paper_authors: Kevin Gmelin, Shikhar Bahl, Russell Mendonca, Deepak Pathak
  • for: Improving the visual understanding and representation capabilities of RL algorithms.
  • methods: Uses visual knowledge of the agent itself (e.g., its shape or mask), which is often inexpensive to obtain, to learn structured environment-agent representations, incorporated into the RL objective via a simple auxiliary loss.
  • results: Outperforms state-of-the-art model-free approaches across 18 challenging visual simulation environments spanning 5 different robots.
    Abstract Agents that are aware of the separation between themselves and their environments can leverage this understanding to form effective representations of visual input. We propose an approach for learning such structured representations for RL algorithms, using visual knowledge of the agent, such as its shape or mask, which is often inexpensive to obtain. This is incorporated into the RL objective using a simple auxiliary loss. We show that our method, Structured Environment-Agent Representations, outperforms state-of-the-art model-free approaches over 18 different challenging visual simulation environments spanning 5 different robots. Website at https://sear-rl.github.io/

Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach

  • paper_url: http://arxiv.org/abs/2309.02429
  • repo_url: None
  • paper_authors: Vimal K B, Saketh Bachu, Tanmay Garg, Niveditha Lakshmi Narasimhan, Raghavan Konuru, Vineeth N Balasubramanian
  • for: Estimating the transferability of ensembles of publicly available pretrained source models to a target task.
  • methods: A novel Optimal Transport-based Submodular Transferability metric (OSBORN) that jointly accounts for image-domain difference, task difference, and the cohesiveness of the models in the ensemble to produce reliable transferability estimates.
  • results: Benchmarked across 28 source datasets, 11 target datasets, 5 model architectures, and 2 pre-training methods, OSBORN consistently outperforms the state-of-the-art metrics MS-LEEP and E-LEEP.
    Abstract Estimating the transferability of publicly available pretrained models to a target task has assumed an important place for transfer learning tasks in recent years. Existing efforts propose metrics that allow a user to choose one model from a pool of pre-trained models without having to fine-tune each model individually and identify one explicitly. With the growth in the number of available pre-trained models and the popularity of model ensembles, it also becomes essential to study the transferability of multiple-source models for a given target task. The few existing efforts study transferability in such multi-source ensemble settings using just the outputs of the classification layer and neglect possible domain or task mismatch. Moreover, they overlook the most important factor while selecting the source models, viz., the cohesiveness factor between them, which can impact the performance and confidence in the prediction of the ensemble. To address these gaps, we propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task. OSBORN collectively accounts for image domain difference, task difference, and cohesiveness of models in the ensemble to provide reliable estimates of transferability. We gauge the performance of OSBORN on both image classification and semantic segmentation tasks. Our setup includes 28 source datasets, 11 target datasets, 5 model architectures, and 2 pre-training methods. We benchmark our method against current state-of-the-art metrics MS-LEEP and E-LEEP, and outperform them consistently using the proposed approach.
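A minimal sketch of why submodularity matters here: greedy selection of a source-model ensemble is near-optimal when the set score is submodular. The coverage-style score below is a hypothetical stand-in; the actual OSBORN metric combines domain difference, task difference, and ensemble cohesiveness via optimal transport.

```python
from typing import Callable

def greedy_select(models: list[str],
                  score: Callable[[set[str]], float],
                  budget: int) -> set[str]:
    """Classic greedy maximization; near-optimal when `score` is submodular."""
    chosen: set[str] = set()
    for _ in range(budget):
        gains = {m: score(chosen | {m}) - score(chosen)
                 for m in models if m not in chosen}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:       # no remaining model adds value
            break
        chosen.add(best)
    return chosen

# toy diminishing-returns score: coverage of hypothetical "skills"
skills = {"resnet": {"texture"}, "vit": {"shape", "texture"}, "clip": {"semantics"}}
coverage = lambda S: float(len(set().union(*(skills[m] for m in S)))) if S else 0.0
print(greedy_select(list(skills), coverage, budget=2))   # {'vit', 'clip'}
```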

Cognitive Architectures for Language Agents

  • paper_url: http://arxiv.org/abs/2309.02427
  • repo_url: https://github.com/ysymyth/awesome-language-agents
  • paper_authors: Theodore Sumers, Shunyu Yao, Karthik Narasimhan, Thomas L. Griffiths
  • for: Developing a blueprint for a new wave of cognitive language agents that ground and reason with large language models (LLMs).
  • methods: Draws on the history of agent design in symbolic AI, systematizing LLMs combined with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) into a complete conceptual framework, Cognitive Architectures for Language Agents (CoALA).
  • results: Shows that LLMs share many properties with production systems, and that recent efforts to improve their grounding and reasoning mirror the development of cognitive architectures built around production systems; CoALA highlights gaps and actionable directions toward more capable language agents.
    Abstract Recent efforts have incorporated large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning. However, these efforts have largely been piecemeal, lacking a systematic framework for constructing a fully-fledged language agent. To address this challenge, we draw on the rich history of agent design in symbolic artificial intelligence to develop a blueprint for a new wave of cognitive language agents. We first show that LLMs have many of the same properties as production systems, and recent efforts to improve their grounding or reasoning mirror the development of cognitive architectures built around production systems. We then propose Cognitive Architectures for Language Agents (CoALA), a conceptual framework to systematize diverse methods for LLM-based reasoning, grounding, learning, and decision making as instantiations of language agents in the framework. Finally, we use the CoALA framework to highlight gaps and propose actionable directions toward more capable language agents in the future.

A Context-Sensitive Approach to XAI in Music Performance

  • paper_url: http://arxiv.org/abs/2309.04491
  • repo_url: None
  • paper_authors: Nicola Privato, Jack Armitage
  • for: Proposing an Explanatory Pragmatism (EP) framework for explainable AI (XAI) in music performance.
  • methods: Develops explainability requirements tailored to specific contexts and audiences, described and analyzed in practical applications.
  • results: The EP framework can enhance the transparency and interpretability of AI systems in broad artistic applications, with explanations continuously refined based on audience feedback.
    Abstract The rapidly evolving field of Explainable Artificial Intelligence (XAI) has generated significant interest in developing methods to make AI systems more transparent and understandable. However, the problem of explainability cannot be exhaustively solved in the abstract, as there is no single approach that can be universally applied to generate adequate explanations for any given AI system, and this is especially true in the arts. In this position paper, we propose an Explanatory Pragmatism (EP) framework for XAI in music performance, emphasising the importance of context and audience in the development of explainability requirements. By tailoring explanations to specific audiences and continuously refining them based on feedback, EP offers a promising direction for enhancing the transparency and interpretability of AI systems in broad artistic applications and more specifically to music performance.

Information Processing by Neuron Populations in the Central Nervous System: Mathematical Structure of Data and Operations

  • paper_url: http://arxiv.org/abs/2309.02332
  • repo_url: None
  • paper_authors: Martin N. P. Nilsson
  • for: Investigating how neuron populations in the central nervous system encode and operate on information.
  • methods: Starts from a state-of-the-art mechanistic model of a generic neuron endowed with plasticity and derives the mathematical structure of the data and operations of neuron populations.
  • results: The representation and manipulation of information can be precisely characterized by an algebra of finite convex cones, and interconnected populations act as operators implementing operations such as specialization, generalization, novelty detection, dimensionality reduction, inverse modeling, prediction, and associative memory. These results may advance concept processing and hierarchical description in cognitive science and AI.
    Abstract In the intricate architecture of the mammalian central nervous system, neurons form populations. Axonal bundles communicate between these clusters using spike trains as their medium. However, these neuron populations' precise encoding and operations have yet to be discovered. In our analysis, the starting point is a state-of-the-art mechanistic model of a generic neuron endowed with plasticity. From this simple framework emerges a profound mathematical construct: The representation and manipulation of information can be precisely characterized by an algebra of finite convex cones. Furthermore, these neuron populations are not merely passive transmitters. They act as operators within this algebraic structure, mirroring the functionality of a low-level programming language. When these populations interconnect, they embody succinct yet potent algebraic expressions. These networks allow them to implement many operations, such as specialization, generalization, novelty detection, dimensionality reduction, inverse modeling, prediction, and associative memory. In broader terms, this work illuminates the potential of matrix embeddings in advancing our understanding in fields like cognitive science and AI. These embeddings enhance the capacity for concept processing and hierarchical description over their vector counterparts.
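For reference, the textbook definition of a finitely generated convex cone, the building block of the algebra the abstract refers to (standard material, not taken from the paper itself):

```latex
C \;=\; \operatorname{cone}(v_1,\dots,v_k)
  \;=\; \Bigl\{\, \sum_{i=1}^{k} \alpha_i v_i \;:\; \alpha_i \ge 0 \,\Bigr\}
  \;\subseteq\; \mathbb{R}^n,
\qquad
x,y \in C,\;\; \lambda,\mu \ge 0 \;\Longrightarrow\; \lambda x + \mu y \in C .
```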

Neurosymbolic Meta-Reinforcement Lookahead Learning Achieves Safe Self-Driving in Non-Stationary Environments

  • paper_url: http://arxiv.org/abs/2309.02328
  • repo_url: None
  • paper_authors: Haozhe Lei, Quanyan Zhu
  • for: This paper focuses on the integration of machine learning into self-driving technology, with a specific emphasis on ensuring safety and efficiency in real-world applications.
  • methods: The paper introduces an algorithm for online meta-reinforcement learning, called Neurosymbolic Meta-Reinforcement Lookahead Learning (NUMERLA), which combines lookahead symbolic constraints with online adaptation to ensure both efficiency and safety.
  • results: The experimental results demonstrate that NUMERLA enables the self-driving agent to adapt in real-time to non-stationary urban human-vehicle interaction scenarios, leading to safe and self-adaptive driving.
    Abstract In the area of learning-driven artificial intelligence advancement, the integration of machine learning (ML) into self-driving (SD) technology stands as an impressive engineering feat. Yet, in real-world applications outside the confines of controlled laboratory scenarios, the deployment of self-driving technology assumes a life-critical role, necessitating heightened attention from researchers towards both safety and efficiency. To illustrate, when a self-driving model encounters an unfamiliar environment in real-time execution, the focus must not solely revolve around enhancing its anticipated performance; equal consideration must be given to ensuring its execution or real-time adaptation maintains a requisite level of safety. This study introduces an algorithm for online meta-reinforcement learning, employing lookahead symbolic constraints based on Neurosymbolic Meta-Reinforcement Lookahead Learning (NUMERLA). NUMERLA proposes a lookahead updating mechanism that harmonizes the efficiency of online adaptations with the overarching goal of ensuring long-term safety. Experimental results demonstrate NUMERLA confers the self-driving agent with the capacity for real-time adaptability, leading to safe and self-adaptive driving under non-stationary urban human-vehicle interaction scenarios.

Revisiting File Context for Source Code Summarization

  • paper_url: http://arxiv.org/abs/2309.02326
  • repo_url: https://github.com/apcl-research/transformerfc
  • paper_authors: Aakash Bansal, Chia-Yi Su, Collin McMillan
  • for: Improving the generation of natural language summaries of source code.
  • methods: A modified Transformer architecture purpose-built to encode file context, i.e., select information from other subroutines in the same file.
  • results: File context helps on a subset of challenging examples where traditional single-snippet approaches struggle, improving the quality of generated summaries over several baselines.
    Abstract Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code summarization is the encoder-decoder neural architecture, and the encoder input is almost always a single subroutine or other short code snippet. The problem with this setup is that the information needed to describe the code is often not present in the code itself -- that information often resides in other nearby code. In this paper, we revisit the idea of "file context" for code summarization. File context is the idea of encoding select information from other subroutines in the same file. We propose a novel modification of the Transformer architecture that is purpose-built to encode file context and demonstrate its improvement over several baselines. We find that file context helps on a subset of challenging examples where traditional approaches struggle.

SeisCLIP: A seismology foundation model pre-trained by multi-modal data for multi-purpose seismic feature extraction

  • paper_url: http://arxiv.org/abs/2309.02320
  • repo_url: https://github.com/sixu0/SeisCLIP
  • paper_authors: Xu Si, Xinming Wu, Hanlin Sheng, Jun Zhu, Zefeng Li
  • for: Developing a seismology foundation model that seismologists across different subfields can use.
  • methods: Contrastive learning fuses multimodal data into one foundation model: a transformer encoder extracts features from time-frequency seismic spectra while an MLP encoder integrates the phase and source information of the same event, jointly pre-trained on a vast dataset; the spectrum encoder is then fine-tuned on smaller datasets for downstream tasks.
  • results: The foundation model performs well on datasets from different regions and outperforms baseline methods on event classification, localization, and focal mechanism analysis.
    Abstract Training specific deep learning models for particular tasks is common across various domains within seismology. However, this approach encounters two limitations: inadequate labeled data for certain tasks and limited generalization across regions. To address these challenges, we develop SeisCLIP, a seismology foundation model trained through contrastive learning from multi-modal data. It consists of a transformer encoder for extracting crucial features from time-frequency seismic spectrum and an MLP encoder for integrating the phase and source information of the same event. These encoders are jointly pre-trained on a vast dataset and the spectrum encoder is subsequently fine-tuned on smaller datasets for various downstream tasks. Notably, SeisCLIP's performance surpasses that of baseline methods in event classification, localization, and focal mechanism analysis tasks, employing distinct datasets from different regions. In conclusion, SeisCLIP holds significant potential as a foundational model in the field of seismology, paving the way for innovative directions in foundation-model-based seismology research.
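A minimal sketch of the CLIP-style contrastive objective implied by the abstract, pairing spectrum embeddings with phase/source embeddings of the same events. The encoders are stubbed with random features and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def clip_loss(spec_emb: torch.Tensor, meta_emb: torch.Tensor, tau: float = 0.07):
    """Symmetric InfoNCE: matching (spectrum, metadata) pairs lie on the
    diagonal of the similarity matrix."""
    spec = F.normalize(spec_emb, dim=-1)
    meta = F.normalize(meta_emb, dim=-1)
    logits = spec @ meta.T / tau                      # (batch, batch) similarities
    targets = torch.arange(len(spec))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# stand-ins for the transformer spectrum encoder and the MLP metadata encoder
spec_emb = torch.randn(16, 256)   # time-frequency spectrum features
meta_emb = torch.randn(16, 256)   # phase + source information features
print(clip_loss(spec_emb, meta_emb))
```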

Graph Self-Contrast Representation Learning

  • paper_url: http://arxiv.org/abs/2309.02304
  • repo_url: https://github.com/GRAND-Lab/MERIT
  • paper_authors: Minjie Chen, Yao Cheng, Ye Wang, Xiang Li, Ming Gao
  • for: A novel graph self-contrast framework, GraphSC, for graph representation learning.
  • methods: GraphSC uses one positive and one negative sample per graph together with a triplet loss. Graph augmentation functions of varying intensity generate positive and negative views from the graph itself, HSIC factorizes the representations into multiple factors, and a masked self-contrast mechanism better separates positive from negative samples.
  • results: Extensive experiments against 19 state-of-the-art methods show superior performance in both unsupervised and transfer learning settings.
    Abstract Graph contrastive learning (GCL) has recently emerged as a promising approach for graph representation learning. Some existing methods adopt the 1-vs-K scheme to construct one positive and K negative samples for each graph, but it is difficult to set K. For those methods that do not use negative samples, it is often necessary to add additional strategies to avoid model collapse, which could only alleviate the problem to some extent. All these drawbacks will undoubtedly have an adverse impact on the generalizability and efficiency of the model. In this paper, to address these issues, we propose a novel graph self-contrast framework GraphSC, which only uses one positive and one negative sample, and chooses triplet loss as the objective. Specifically, self-contrast has two implications. First, GraphSC generates both positive and negative views of a graph sample from the graph itself via graph augmentation functions of various intensities, and use them for self-contrast. Second, GraphSC uses Hilbert-Schmidt Independence Criterion (HSIC) to factorize the representations into multiple factors and proposes a masked self-contrast mechanism to better separate positive and negative samples. Further, Since the triplet loss only optimizes the relative distance between the anchor and its positive/negative samples, it is difficult to ensure the absolute distance between the anchor and positive sample. Therefore, we explicitly reduced the absolute distance between the anchor and positive sample to accelerate convergence. Finally, we conduct extensive experiments to evaluate the performance of GraphSC against 19 other state-of-the-art methods in both unsupervised and transfer learning settings.
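A minimal sketch of the objective as the abstract states it: a triplet loss over one positive and one negative view per graph, plus an explicit term shrinking the absolute anchor-positive distance to accelerate convergence. The margin, penalty weight, and the way the views are produced below are illustrative assumptions (the HSIC factorization and masking are omitted).

```python
import torch
import torch.nn.functional as F

def graph_self_contrast_loss(anchor, positive, negative,
                             margin: float = 1.0, abs_weight: float = 0.1):
    """anchor/positive/negative: (batch, dim) graph embeddings."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    triplet = F.relu(d_pos - d_neg + margin).mean()   # relative distance term
    absolute = d_pos.mean()                           # explicit absolute-distance term
    return triplet + abs_weight * absolute

z_a = torch.randn(32, 128)               # anchor graph embeddings
z_p = z_a + 0.1 * torch.randn(32, 128)   # weak augmentation -> positive view
z_n = torch.randn(32, 128)               # strong augmentation -> negative view
print(graph_self_contrast_loss(z_a, z_p, z_n))
```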

Enhancing Semantic Communication with Deep Generative Models – An ICASSP Special Session Overview

  • paper_url: http://arxiv.org/abs/2309.02478
  • repo_url: None
  • paper_authors: Eleonora Grassucci, Yuki Mitsufuji, Ping Zhang, Danilo Comminiello
  • for: Surveying the pivotal role of semantic communication in future AI-driven communication systems and the challenges of extracting semantic information and regenerating semantically consistent data.
  • methods: Addresses semantic communication challenges from the machine learning perspective using deep generative models, covering real-world complex data, extraction and exploitation of semantic information, and robustness to channel corruptions.
  • results: Charts novel research pathways for the next generation of generative semantic communication frameworks and anticipates a prominent role for deep generative models within them.
    Abstract Semantic communication is poised to play a pivotal role in shaping the landscape of future AI-driven communication systems. Its challenge of extracting semantic information from the original complex content and regenerating semantically consistent data at the receiver, possibly being robust to channel corruptions, can be addressed with deep generative models. This ICASSP special session overview paper discloses the semantic communication challenges from the machine learning perspective and unveils how deep generative models will significantly enhance semantic communication frameworks in dealing with real-world complex data, extracting and exploiting semantic information, and being robust to channel corruptions. Alongside establishing this emerging field, this paper charts novel research pathways for the next generative semantic communication frameworks.

Optimal Observation-Intervention Trade-Off in Optimisation Problems with Causal Structure

  • paper_url: http://arxiv.org/abs/2309.02287
  • repo_url: None
  • paper_authors: Kim Hammar, Neil Dhir
  • for: Optimizing an expensive-to-evaluate grey-box objective function within a finite budget, given side-information in the form of the causal structure between the design variables.
  • methods: Formulates the observation-intervention trade-off that arises when estimating causal effects as a non-myopic optimal stopping problem, which admits an efficient solution.
  • results: Experimental results show that the formulation enhances existing causal Bayesian optimisation algorithms on real and synthetic benchmarks.
    Abstract We consider the problem of optimising an expensive-to-evaluate grey-box objective function, within a finite budget, where known side-information exists in the form of the causal structure between the design variables. Standard black-box optimisation ignores the causal structure, often making it inefficient and expensive. The few existing methods that consider the causal structure are myopic and do not fully accommodate the observation-intervention trade-off that emerges when estimating causal effects. In this paper, we show that the observation-intervention trade-off can be formulated as a non-myopic optimal stopping problem which permits an efficient solution. We give theoretical results detailing the structure of the optimal stopping times and demonstrate the generality of our approach by showing that it can be integrated with existing causal Bayesian optimisation algorithms. Experimental results show that our formulation can enhance existing algorithms on real and synthetic benchmarks.

s-ID: Causal Effect Identification in a Sub-Population

  • paper_url: http://arxiv.org/abs/2309.02281
  • repo_url: None
  • paper_authors: Amir Mohammad Abouei, Ehsan Mokhtarian, Negar Kiyavash
  • for: Causal inference in a sub-population, i.e., identifying the effect of an intervention on a specific subgroup from observational data of that subgroup alone.
  • methods: Introduces the s-ID problem, in which only observational data of the targeted sub-population (rather than the entire population) is available, and provides necessary and sufficient conditions on the causal graph for identifiability.
  • results: Given these conditions, a sound and complete algorithm identifies causal effects in sub-populations, overcoming the limitation of existing methods that assume the data distributions originate from the entire population.
    Abstract Causal inference in a sub-population involves identifying the causal effect of an intervention on a specific subgroup within a larger population. However, ignoring the subtleties introduced by sub-populations can either lead to erroneous inference or limit the applicability of existing methods. We introduce and advocate for a causal inference problem in sub-populations (henceforth called s-ID), in which we merely have access to observational data of the targeted sub-population (as opposed to the entire population). Existing inference problems in sub-populations operate on the premise that the given data distributions originate from the entire population, thus, cannot tackle the s-ID problem. To address this gap, we provide necessary and sufficient conditions that must hold in the causal graph for a causal effect in a sub-population to be identifiable from the observational distribution of that sub-population. Given these conditions, we present a sound and complete algorithm for the s-ID problem.

MA-VAE: Multi-head Attention-based Variational Autoencoder Approach for Anomaly Detection in Multivariate Time-series Applied to Automotive Endurance Powertrain Testing

  • paper_url: http://arxiv.org/abs/2309.02253
  • repo_url: https://github.com/lcs-crr/ma-vae
  • paper_authors: Lucas Correia, Jan-Christoph Goos, Philipp Klein, Thomas Bäck, Anna V. Kononova
  • for: Automatic anomaly detection in automotive endurance powertrain testing.
  • methods: A variational autoencoder with multi-head attention (MA-VAE), trained on unlabelled data.
  • results: Detects the majority of anomalies with few false positives, avoids the bypass phenomenon, and introduces a new method for remapping individual windows to a continuous time series.
    Abstract A clear need for automatic anomaly detection applied to automotive testing has emerged as more and more attention is paid to the data recorded and manual evaluation by humans reaches its capacity. Such real-world data is massive, diverse, multivariate and temporal in nature, therefore requiring modelling of the testee behaviour. We propose a variational autoencoder with multi-head attention (MA-VAE), which, when trained on unlabelled data, not only provides very few false positives but also manages to detect the majority of the anomalies presented. In addition to that, the approach offers a novel way to avoid the bypass phenomenon, an undesirable behaviour investigated in literature. Lastly, the approach also introduces a new method to remap individual windows to a continuous time series. The results are presented in the context of a real-world industrial data set and several experiments are undertaken to further investigate certain aspects of the proposed model. When configured properly, it is 9% of the time wrong when an anomaly is flagged and discovers 67% of the anomalies present. Also, MA-VAE has the potential to perform well with only a fraction of the training and validation subset, however, to extract it, a more sophisticated threshold estimation method is required.
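A minimal sketch of reconstruction-based window scoring followed by remapping overlapping windows onto a continuous per-timestep score by averaging, in the spirit of the paper's window-to-series remapping; the zero-reconstruction stand-in and window parameters are assumptions, not MA-VAE itself.

```python
import numpy as np

def windows(x: np.ndarray, size: int, stride: int) -> np.ndarray:
    """Slice a (time, channels) series into overlapping windows."""
    return np.stack([x[i:i + size] for i in range(0, len(x) - size + 1, stride)])

def remap_scores(win_scores: np.ndarray, length: int, size: int, stride: int):
    """Average per-window, per-step scores back onto the original timeline."""
    total = np.zeros(length)
    count = np.zeros(length)
    for w, s in enumerate(win_scores):
        start = w * stride
        total[start:start + size] += s
        count[start:start + size] += 1
    return total / np.maximum(count, 1)      # guard uncovered tail steps

rng = np.random.default_rng(0)
series = rng.normal(size=(1000, 3))
series[400:420] += 5.0                                  # injected anomaly
w = windows(series, size=64, stride=16)
recon = np.zeros_like(w)                                # stand-in for VAE reconstructions
win_scores = np.mean((w - recon) ** 2, axis=2)          # (n_windows, size) errors
score = remap_scores(win_scores, len(series), 64, 16)
print(score[:5].round(2), score[405:410].round(2))      # anomaly region scores higher
```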

Encoding Seasonal Climate Predictions for Demand Forecasting with Modular Neural Network

  • paper_url: http://arxiv.org/abs/2309.02248
  • repo_url: None
  • paper_authors: Smit Marvaniya, Jitendra Singh, Nicolas Galichet, Fred Ochieng Otieno, Geeth De Mel, Kommy Weldemariam
  • for: Improving the accuracy of time-series demand forecasting for supply chain functions.
  • methods: A modular neural network architecture that efficiently encodes uncertain seasonal climate predictions together with other time-series data (e.g., buyer patterns) to learn robust and reliable latent representations.
  • results: Compared with existing demand forecasting methods, the model reduces error by approximately 13% to 17% across multiple real-world datasets.
    Abstract Current time-series forecasting problems use short-term weather attributes as exogenous inputs. However, in specific time-series forecasting solutions (e.g., demand prediction in the supply chain), seasonal climate predictions are crucial to improve its resilience. Representing mid to long-term seasonal climate forecasts is challenging as seasonal climate predictions are uncertain, and encoding spatio-temporal relationship of climate forecasts with demand is complex. We propose a novel modeling framework that efficiently encodes seasonal climate predictions to provide robust and reliable time-series forecasting for supply chain functions. The encoding framework enables effective learning of latent representations -- be it uncertain seasonal climate prediction or other time-series data (e.g., buyer patterns) -- via a modular neural network architecture. Our extensive experiments indicate that learning such representations to model seasonal climate forecast results in an error reduction of approximately 13\% to 17\% across multiple real-world data sets compared to existing demand forecasting methods.

AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.06495
  • repo_url: None
  • paper_authors: Fei Tang, Wanling Gao, Luzhou Peng, Jianfeng Zhan
  • for: Evaluating the question-solving abilities and degree of intelligence of large language models (LLMs).
  • methods: AGIBench, a multi-granularity, multimodal, human-referenced, auto-scoring benchmarking methodology that labels each question with a four-tuple of attributes.
  • results: AGIBench supports benchmarking at per-question, per-ability-branch, per-knowledge, per-modal, per-dataset, and per-difficulty-level granularities.
    Abstract Large language models (LLMs) like ChatGPT have revealed amazing intelligence. How to evaluate the question-solving abilities of LLMs and their degrees of intelligence is a hot-spot but challenging issue. First, the question-solving abilities are interlaced with different ability branches like understanding and massive knowledge categories like mathematics. Second, the inputs of questions are multimodal and may involve text and images. Third, the response format of LLMs is diverse and thus poses great challenges for result extraction and evaluation. In this paper, we propose AGIBench -- a multi-granularity, multimodal, human-referenced, and auto-scoring benchmarking methodology for LLMs. Instead of a collection of blended questions, AGIBench focuses on three typical ability branches and adopts a four-tuple to label the attributes of each question. First, it supports multi-granularity benchmarking, e.g., per-question, per-ability branch, per-knowledge, per-modal, per-dataset, and per-difficulty level granularities. Second, it contains multimodal input, including text and images. Third, it classifies all the questions into five degrees of difficulty according to the average accuracy rate of abundant educated humans (human-referenced). Fourth, it adopts zero-shot learning to avoid introducing additional unpredictability and provides an auto-scoring method to extract and judge the result. Finally, it defines multi-dimensional metrics, including accuracy under the average, worst, best, and majority voting cases, and repeatability. AGIBench is publicly available at https://www.benchcouncil.org/agibench.

Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

  • paper_url: http://arxiv.org/abs/2309.02236
  • repo_url: None
  • paper_authors: Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Yifan Hu, Andreas Krause, Ilija Bogunovic
  • for: Addressing three major challenges in reinforcement learning: complex dynamical systems with large state spaces, costly data acquisition, and real-world dynamics that deviate from the training environment.
  • methods: Distributionally robust Markov decision processes with continuous state spaces under Kullback-Leibler, chi-square, and total variation uncertainty sets; a model-based approach uses Gaussian Processes and a maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics from a generative model (simulator).
  • results: Statistical sample complexity bounds, independent of the number of states and extending beyond linear dynamics, guarantee identification of near-optimal distributionally robust policies; experiments demonstrate robustness to distributional shift and superior sample efficiency.
    Abstract Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. To overcome these issues, we study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics, leveraging access to a generative model (i.e., simulator). We further demonstrate the statistical sample complexity of the proposed method for different uncertainty sets. These complexity bounds are independent of the number of states and extend beyond linear dynamics, ensuring the effectiveness of our approach in identifying near-optimal distributionally-robust policies. The proposed method can be further combined with other model-free distributionally robust reinforcement learning methods to obtain a near-optimal robust policy. Experimental results demonstrate the robustness of our algorithm to distributional shifts and its superior performance in terms of the number of samples needed.
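For the KL uncertainty set named in the abstract, the worst-case next-state value has a standard dual form, sup_{lam > 0} [ -lam * log E_P[exp(-V/lam)] - lam * delta ], which a robust Bellman backup can evaluate pointwise. The sketch below computes it by a coarse grid search over lam; it illustrates the uncertainty set only, not the paper's Gaussian-process algorithm.

```python
import numpy as np

def kl_robust_expectation(values, probs, delta, lam_grid=None):
    """Worst-case expectation of `values` over all Q with KL(Q || P) <= delta,
    via the standard dual formula (grid search over the dual variable lam)."""
    if lam_grid is None:
        lam_grid = np.logspace(-3, 3, 200)
    v = np.asarray(values, dtype=float)
    p = np.asarray(probs, dtype=float)
    best = -np.inf
    for lam in lam_grid:
        z = -v / lam
        m = z.max()                                   # log-sum-exp for stability
        log_mgf = m + np.log(np.sum(p * np.exp(z - m)))
        best = max(best, -lam * log_mgf - lam * delta)
    return best

values = np.array([1.0, 2.0, 10.0])
probs = np.array([0.5, 0.3, 0.2])
print(kl_robust_expectation(values, probs, delta=0.0))   # close to E_P[V] = 3.1
print(kl_robust_expectation(values, probs, delta=0.5))   # strictly smaller (pessimistic)
```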

Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering

  • paper_url: http://arxiv.org/abs/2309.02233
  • repo_url: None
  • paper_authors: Yubo Wang, Xueguang Ma, Wenhu Chen
  • for: Applying large language models (LLMs) to the medical domain, where their inability to leverage domain-specific knowledge remains the key obstacle.
  • methods: Large-scale Language Models Augmented with Medical Textbooks (LLM-AMT) integrates authoritative medical textbooks through plug-and-play modules: a Hybrid Textbook Retriever, a Query Augmenter, and an LLM Reader.
  • results: On three open-domain medical question-answering tasks, LLM-AMT improves the professionalism and accuracy of LLM responses by 11.4% to 13.2%. Despite being 100 times smaller than Wikipedia, the medical textbooks prove the more valuable retrieval corpus in this domain, with textbook augmentation outperforming Wikipedia augmentation by 9.7% to 12.2%.
    Abstract Large-scale language models (LLMs), such as ChatGPT, are capable of generating human-like responses for various downstream tasks, such as task-oriented dialogues and question answering. However, applying LLMs to medical domains remains challenging due to their inability to leverage domain-specific knowledge. In this study, we present the Large-scale Language Models Augmented with Medical Textbooks (LLM-AMT), which integrates authoritative medical textbooks as the cornerstone of its design, enhancing its proficiency in the specialized domain through plug-and-play modules, comprised of a Hybrid Textbook Retriever, supplemented by the Query Augmenter and the LLM Reader. Experimental evaluation on three open-domain medical question-answering tasks reveals a substantial enhancement in both the professionalism and accuracy of the LLM responses when utilizing LLM-AMT, exhibiting an improvement ranging from 11.4% to 13.2%. Despite being 100 times smaller, we found that medical textbooks as the retrieval corpus serves as a more valuable external knowledge source than Wikipedia in the medical domain. Our experiments show that textbook augmentation results in a performance improvement ranging from 9.7% to 12.2% over Wikipedia augmentation.
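A minimal retrieval-augmented QA sketch in the spirit of the LLM-AMT pipeline (retrieve textbook passages, then assemble a prompt for the reader). The TF-IDF retriever, the toy passages, and the helper names (retrieve_passages, build_prompt) are illustrative stand-ins; the paper's Hybrid Textbook Retriever and Query Augmenter are not specified here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

textbook_passages = [
    "Beta blockers reduce heart rate and myocardial oxygen demand.",
    "Metformin is first-line therapy for type 2 diabetes mellitus.",
    "ACE inhibitors are contraindicated in bilateral renal artery stenosis.",
]

vectorizer = TfidfVectorizer().fit(textbook_passages)
passage_matrix = vectorizer.transform(textbook_passages)

def retrieve_passages(query: str, k: int = 2) -> list[str]:
    """Rank textbook passages by lexical similarity to the query."""
    sims = cosine_similarity(vectorizer.transform([query]), passage_matrix)[0]
    top = sims.argsort()[::-1][:k]
    return [textbook_passages[i] for i in top]

def build_prompt(question: str) -> str:
    """Assemble retrieved textbook evidence into a prompt for the LLM reader."""
    evidence = "\n".join(f"- {p}" for p in retrieve_passages(question))
    return f"Medical evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What is the first-line drug for type 2 diabetes?"))
```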

FSD: An Initial Chinese Dataset for Fake Song Detection

  • paper_url: http://arxiv.org/abs/2309.02232
  • repo_url: https://github.com/xieyuankun/fsd-dataset
  • paper_authors: Yuankun Xie, Jingjing Zhou, Xiaolin Lu, Zhenghao Jiang, Yuxin Yang, Haonan Cheng, Long Ye
  • for: Providing a dedicated dataset for song deepfake detection and using it to train and evaluate detection models.
  • methods: Five state-of-the-art singing voice synthesis and singing voice conversion methods generate the fake songs in an initial Chinese Fake Song Detection (FSD) dataset, which is then used to train audio deepfake detection (ADD) models.
  • results: Song-trained ADD models reduce the average equal error rate by 38.58% relative to speech-trained ADD models on the FSD test set.
    Abstract Singing voice synthesis and singing voice conversion have significantly advanced, revolutionizing musical experiences. However, the rise of "Deepfake Songs" generated by these technologies raises concerns about authenticity. Unlike Audio DeepFake Detection (ADD), the field of song deepfake detection lacks specialized datasets or methods for song authenticity verification. In this paper, we initially construct a Chinese Fake Song Detection (FSD) dataset to investigate the field of song deepfake detection. The fake songs in the FSD dataset are generated by five state-of-the-art singing voice synthesis and singing voice conversion methods. Our initial experiments on FSD revealed the ineffectiveness of existing speech-trained ADD models for the task of song deepFake detection. Thus, we employ the FSD dataset for the training of ADD models. We subsequently evaluate these models under two scenarios: one with the original songs and another with separated vocal tracks. Experiment results show that song-trained ADD models exhibit a 38.58% reduction in average equal error rate compared to speech-trained ADD models on the FSD test set.

DCP-Net: A Distributed Collaborative Perception Network for Remote Sensing Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2309.02230
  • repo_url: None
  • paper_authors: Zhechao Wang, Peirui Cheng, Shujing Duan, Kaiqiang Chen, Zhirui Wang, Xinming Li, Xian Sun
  • for: Improving the accuracy and efficiency of multi-platform collaborative observation in emergency remote sensing tasks.
  • methods: A distributed collaborative perception network (DCP-Net) that integrates features from other platforms: a self-mutual information match module identifies collaboration opportunities and selects suitable partners, and a related feature fusion module resolves the misalignment between local and collaborative features.
  • results: Extensive experiments and visualization analyses on three semantic segmentation datasets show that DCP-Net outperforms existing methods, improving mIoU by 2.61% to 16.89% and reaching state-of-the-art performance.
    Abstract Onboard intelligent processing is widely applied in emergency tasks in the field of remote sensing. However, it is predominantly confined to an individual platform with a limited observation range as well as susceptibility to interference, resulting in limited accuracy. Considering the current state of multi-platform collaborative observation, this article innovatively presents a distributed collaborative perception network called DCP-Net. Firstly, the proposed DCP-Net helps members to enhance perception performance by integrating features from other platforms. Secondly, a self-mutual information match module is proposed to identify collaboration opportunities and select suitable partners, prioritizing critical collaborative features and reducing redundant transmission cost. Thirdly, a related feature fusion module is designed to address the misalignment between local and collaborative features, improving the quality of fused features for the downstream task. We conduct extensive experiments and visualization analyses using three semantic segmentation datasets, including Potsdam, iSAID and DFC23. The results demonstrate that DCP-Net outperforms the existing methods comprehensively, improving mIoU by 2.61%~16.89% at the highest collaboration efficiency, which promotes the performance to a state-of-the-art level.

Dense Object Grounding in 3D Scenes

  • paper_url: http://arxiv.org/abs/2309.02224
  • repo_url: None
  • paper_authors: Wencan Huang, Daizong Liu, Wei Hu
  • for: Overcoming the limitation of existing 3D object grounding methods, which localize a single object from a single-sentence description, by introducing 3D Dense Object Grounding (3D DOG): jointly localizing multiple objects described in a more complex paragraph.
  • methods: A novel Stacked Transformer-based framework, 3DOGSFormer, in which a contextual query-driven local transformer decoder generates initial grounding proposals and a proposal-guided global transformer decoder exploits correlations among densely referred objects to refine them.
  • results: Extensive experiments on three challenging benchmarks (Nr3D, Sr3D, and ScanRefer) show that 3DOGSFormer outperforms state-of-the-art 3D single-object grounding methods and their dense-object variants by significant margins.
    Abstract Localizing objects in 3D scenes according to the semantics of a given natural language is a fundamental yet important task in the field of multimedia understanding, which benefits various real-world applications such as robotics and autonomous driving. However, the majority of existing 3D object grounding methods are restricted to a single-sentence input describing an individual object, which cannot comprehend and reason more contextualized descriptions of multiple objects in more practical 3D cases. To this end, we introduce a new challenging task, called 3D Dense Object Grounding (3D DOG), to jointly localize multiple objects described in a more complicated paragraph rather than a single sentence. Instead of naively localizing each sentence-guided object independently, we found that dense objects described in the same paragraph are often semantically related and spatially located in a focused region of the 3D scene. To explore such semantic and spatial relationships of densely referred objects for more accurate localization, we propose a novel Stacked Transformer based framework for 3D DOG, named 3DOGSFormer. Specifically, we first devise a contextual query-driven local transformer decoder to generate initial grounding proposals for each target object. Then, we employ a proposal-guided global transformer decoder that exploits the local object features to learn their correlation for further refining initial grounding proposals. Extensive experiments on three challenging benchmarks (Nr3D, Sr3D, and ScanRefer) show that our proposed 3DOGSFormer outperforms state-of-the-art 3D single-object grounding methods and their dense-object variants by significant margins.

Improving equilibrium propagation without weight symmetry through Jacobian homeostasis

  • paper_url: http://arxiv.org/abs/2309.02214
  • repo_url: https://github.com/laborieux-axel/generalized-holo-ep
  • paper_authors: Axel Laborieux, Friedemann Zenke
  • for: Studying equilibrium propagation (EP), an alternative to backpropagation for computing gradients of neural networks on biological or analog neuromorphic substrates.
  • methods: A generalized formulation of EP that does not require weight symmetry, analytically isolating its two sources of bias: the finite equilibrium nudge and weight asymmetry.
  • results: For complex-differentiable non-symmetric networks, the finite nudge poses no problem, since exact derivatives can still be estimated via a Cauchy integral; weight asymmetry, however, misaligns EP's neuronal error vectors and degrades task performance. A new homeostatic objective that directly penalizes functional asymmetries of the Jacobian at the network's fixed point dramatically improves the network's ability to solve complex tasks such as ImageNet 32x32.
    Abstract Equilibrium propagation (EP) is a compelling alternative to the backpropagation of error algorithm (BP) for computing gradients of neural networks on biological or analog neuromorphic substrates. Still, the algorithm requires weight symmetry and infinitesimal equilibrium perturbations, i.e., nudges, to estimate unbiased gradients efficiently. Both requirements are challenging to implement in physical systems. Yet, whether and how weight asymmetry affects its applicability is unknown because, in practice, it may be masked by biases introduced through the finite nudge. To address this question, we study generalized EP, which can be formulated without weight symmetry, and analytically isolate the two sources of bias. For complex-differentiable non-symmetric networks, we show that the finite nudge does not pose a problem, as exact derivatives can still be estimated via a Cauchy integral. In contrast, weight asymmetry introduces bias resulting in low task performance due to poor alignment of EP's neuronal error vectors compared to BP. To mitigate this issue, we present a new homeostatic objective that directly penalizes functional asymmetries of the Jacobian at the network's fixed point. This homeostatic objective dramatically improves the network's ability to solve complex tasks such as ImageNet 32x32. Our results lay the theoretical groundwork for studying and mitigating the adverse effects of imperfections of physical networks on learning algorithms that rely on the substrate's relaxation dynamics.
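A minimal sketch of a homeostatic penalty on Jacobian asymmetry at a network's fixed point, assuming a Frobenius-norm penalty ||J - J^T||_F^2 on the dynamics Jacobian; the paper's exact objective and network model may differ.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
n = 8
W = (0.1 * torch.randn(n, n)).requires_grad_()   # asymmetric recurrent weights
b = 0.5 * torch.randn(n)

def dynamics(s: torch.Tensor) -> torch.Tensor:
    """One relaxation step of a toy recurrent network: s <- tanh(W s + b)."""
    return torch.tanh(W @ s + b)

# Relax to an approximate fixed point s* (no autograd graph needed here).
s = torch.zeros(n)
for _ in range(200):
    s = dynamics(s).detach()

# Differentiable Jacobian of the dynamics at the fixed point.
J = jacobian(dynamics, s, create_graph=True)

# Homeostatic loss: penalize functional asymmetry of J.
homeostatic_loss = ((J - J.T) ** 2).sum()
homeostatic_loss.backward()                      # gradients flow back into W
print(float(homeostatic_loss), W.grad.norm().item())
```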

Exchanging-based Multimodal Fusion with Transformer

  • paper_url: http://arxiv.org/abs/2309.02190
  • repo_url: https://github.com/recklessronan/muse
  • paper_authors: Renyu Zhu, Chengcheng Han, Yong Qian, Qiushi Sun, Xiang Li, Ming Gao, Xuezhi Cao, Yunsen Xian
  • for: Multimodal fusion, in particular text-vision fusion with sequential inputs.
  • methods: A novel Transformer-based exchanging fusion model, MuSE: two encoders map the multimodal inputs into separate low-dimensional spaces, two decoders (trained with image captioning and text-to-image generation objectives) regularize the embeddings and pull them into the same space, and a CrossTransformer with shared parameters exchanges knowledge between modalities.
  • results: Extensive experiments on multimodal named entity recognition and multimodal sentiment analysis show that MuSE outperforms its competitors.
    Abstract We study the problem of multimodal fusion in this paper. Recent exchanging-based methods have been proposed for vision-vision fusion, which aim to exchange embeddings learned from one modality to the other. However, most of them project inputs of multimodalities into different low-dimensional spaces and cannot be applied to the sequential input data. To solve these issues, in this paper, we propose a novel exchanging-based multimodal fusion model MuSE for text-vision fusion based on Transformer. We first use two encoders to separately map multimodal inputs into different low-dimensional spaces. Then we employ two decoders to regularize the embeddings and pull them into the same space. The two decoders capture the correlations between texts and images with the image captioning task and the text-to-image generation task, respectively. Further, based on the regularized embeddings, we present CrossTransformer, which uses two Transformer encoders with shared parameters as the backbone model to exchange knowledge between multimodalities. Specifically, CrossTransformer first learns the global contextual information of the inputs in the shallow layers. After that, it performs inter-modal exchange by selecting a proportion of tokens in one modality and replacing their embeddings with the average of embeddings in the other modality. We conduct extensive experiments to evaluate the performance of MuSE on the Multimodal Named Entity Recognition task and the Multimodal Sentiment Analysis task. Our results show the superiority of MuSE against other competitors. Our code and data are provided at https://github.com/RecklessRonan/MuSE.
    摘要 我们在这篇论文中研究多模态融合问题。最近提出的基于交换的方法主要面向视觉-视觉融合,其思路是将一个模态学到的嵌入交换到另一个模态中。然而,这些方法大多将多模态输入映射到不同的低维空间,且无法应用于序列输入数据。为了解决这些问题,本文提出了一种基于Transformer、面向文本-视觉融合的新的交换式多模态融合模型MuSE。我们首先使用两个编码器将多模态输入分别映射到不同的低维空间,然后使用两个解码器对嵌入进行正则化并将其拉入同一空间;这两个解码器分别通过图像描述任务和文本生成图像任务来捕捉文本与图像之间的相关性。此外,基于正则化后的嵌入,我们提出了CrossTransformer,它以两个共享参数的Transformer编码器为骨干,在多模态之间交换知识。具体来说,CrossTransformer先在浅层学习输入的全局上下文信息,随后进行跨模态交换:在一个模态中选取一定比例的词元,并将其嵌入替换为另一模态嵌入的平均值。我们在多模态命名实体识别任务和多模态情感分析任务上进行了广泛的实验以评估MuSE的性能,结果显示MuSE优于其他竞争方法。代码和数据见 https://github.com/RecklessRonan/MuSE。
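
A minimal sketch of the inter-modal token-exchange step: a fraction of tokens in one modality is replaced by the per-sample mean embedding of the other modality. Tensor shapes and the exchange ratio are illustrative assumptions, not MuSE's exact configuration.

```python
import torch

def exchange_tokens(a, b, ratio=0.25):
    """a, b: (batch, seq_len, dim) token embeddings of two modalities.
    Returns a copy of `a` with `ratio` of its tokens replaced by the
    per-sample mean embedding of `b`."""
    batch, seq_len, dim = a.shape
    n_swap = max(1, int(ratio * seq_len))
    out = a.clone()
    b_mean = b.mean(dim=1, keepdim=True)          # (batch, 1, dim)
    for i in range(batch):
        idx = torch.randperm(seq_len)[:n_swap]    # tokens to overwrite
        out[i, idx] = b_mean[i]                   # broadcast mean into slots
    return out

text = torch.randn(2, 16, 64)
image = torch.randn(2, 49, 64)
mixed_text = exchange_tokens(text, image)
print(mixed_text.shape)  # torch.Size([2, 16, 64])
```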

Leveraging BERT Language Models for Multi-Lingual ESG Issue Identification

  • paper_url: http://arxiv.org/abs/2309.02189
  • repo_url: None
  • paper_authors: Elvys Linhares Pontes, Mohamed Benjannet, Lam Kim Ming
  • for: 这项研究的目的是为投资者更好地了解公司的可持续发展和社会责任,通过分类新闻文章的ESG Issue标签来提高投资决策的可持续性。
  • methods: 该研究使用BERT语言模型对新闻文章进行分类,并比较了多种BERT语言模型以及基于SVM的二分类模型的表现。
  • results: 研究发现,RoBERTa分类器在英文测试集上获得第二名,在法语测试集上并列第五名;此外,为中文特制的基于SVM的二分类模型也表现出色,在测试集上排名第二。
    Abstract Environmental, Social, and Governance (ESG) has been used as a metric to measure the negative impacts and enhance positive outcomes of companies in areas such as the environment, society, and governance. Recently, investors have increasingly recognized the significance of ESG criteria in their investment choices, leading businesses to integrate ESG principles into their operations and strategies. The Multi-Lingual ESG Issue Identification (ML-ESG) shared task encompasses the classification of news documents into 35 distinct ESG issue labels. In this study, we explored multiple strategies harnessing BERT language models to achieve accurate classification of news documents across these labels. Our analysis revealed that the RoBERTa classifier emerged as one of the most successful approaches, securing the second-place position for the English test dataset, and sharing the fifth-place position for the French test dataset. Furthermore, our SVM-based binary model tailored for the Chinese language exhibited exceptional performance, earning the second-place rank on the test dataset.
    摘要 环境、社会和治理(ESG)被用作衡量公司在环境、社会与治理等领域的负面影响并促进积极成果的指标。近期,投资者日益认识到ESG标准在投资决策中的重要性,促使企业将ESG原则融入其运营和战略之中。多语言ESG问题识别(ML-ESG)共享任务涵盖将新闻文档分类到35个不同的ESG问题标签。本研究探索了利用BERT语言模型实现新闻文档准确分类的多种策略。我们的分析发现,RoBERTa分类器在英语测试集上获得第二名,在法语测试集上并列第五名。此外,我们为中文定制的SVM二分类模型也展现出色表现,在测试集上获得第二名。
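
A minimal sketch of fine-tuning a RoBERTa classifier for the 35-label ESG issue task with Hugging Face transformers. Treating the task as single-label multi-class, and the example text, label index, and hyperparameters, are illustrative assumptions rather than the shared-task setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=35)

texts = ["The company cut its carbon emissions by 30% this year."]
labels = torch.tensor([4])            # hypothetical index of an ESG issue label

batch = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
out = model(**batch, labels=labels)   # cross-entropy over the 35 issue labels
out.loss.backward()
print(out.logits.shape)               # torch.Size([1, 35])
```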

AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

  • paper_url: http://arxiv.org/abs/2309.02186
  • repo_url: https://github.com/YueWuHKUST/AniPortraitGAN
  • paper_authors: Yue Wu, Sicheng Xu, Jianfeng Xiang, Fangyun Wei, Qifeng Chen, Jiaolong Yang, Xin Tong
  • for: 生成具有可控面部表情、头部姿态和肩部运动的高质量3D人像视频
  • methods: 基于生成辐射流形表示、带有可学习面部与头肩变形的3D人像生成模型,并采用双相机渲染与对抗学习方案提升生成人脸的质量
  • results: 该方法仅用非结构化2D图像集训练,即可生成多样且高质量的3D人像,并可按需控制面部表情、头部姿态等不同属性
    Abstract Previous animatable 3D-aware GANs for human generation have primarily focused on either the human head or full body. However, head-only videos are relatively uncommon in real life, and full body generation typically does not deal with facial expression control and still has challenges in generating high-quality results. Towards applicable video avatars, we present an animatable 3D-aware GAN that generates portrait images with controllable facial expression, head pose, and shoulder movements. It is a generative model trained on unstructured 2D image collections without using 3D or video data. For the new task, we base our method on the generative radiance manifold representation and equip it with learnable facial and head-shoulder deformations. A dual-camera rendering and adversarial learning scheme is proposed to improve the quality of the generated faces, which is critical for portrait images. A pose deformation processing network is developed to generate plausible deformations for challenging regions such as long hair. Experiments show that our method, trained on unstructured 2D images, can generate diverse and high-quality 3D portraits with desired control over different properties.
    摘要 以前用于人类生成的可动画3D感知GAN主要集中在人头或全身上。然而,仅有头部的视频在现实生活中较为少见,而全身生成通常无法控制面部表情,且在生成高质量结果方面仍有挑战。面向可用的视频化身,我们提出了一种可动画3D感知GAN,可生成带有可控面部表情、头部姿态和肩部运动的人像图像。该模型是在非结构化2D图像集上训练的生成模型,无需任何3D或视频数据。针对这一新任务,我们的方法基于生成辐射流形表示,并为其配备可学习的面部与头肩变形。我们还提出了双相机渲染和对抗学习方案,以提升生成人脸的质量,这对人像图像至关重要。此外,我们开发了一个姿态变形处理网络,为长发等困难区域生成合理的变形。实验表明,我们的方法在非结构化2D图像上训练后,可以生成多样化且高质量的3D人像,并实现对不同属性的按需控制。

BEVTrack: A Simple Baseline for 3D Single Object Tracking in Bird’s-Eye View

  • paper_url: http://arxiv.org/abs/2309.02185
  • repo_url: https://github.com/xmm-prio/bevtrack
  • paper_authors: Yuxiang Yang, Yingqi Deng, Jiahao Nie, Jing Zhang
  • for: 3D single object tracking (SOT) in point clouds, specifically in autonomous driving scenarios where target objects maintain spatial adjacency across frames.
  • methods: converts consecutive point clouds into Bird’s-Eye View representation, encodes spatial proximity and captures motion cues via simple element-wise operation and convolutional layers, and directly learns the underlying motion distribution without making assumptions.
  • results: achieves state-of-the-art performance on KITTI and NuScenes datasets with a high inference speed of 122 FPS.
    Abstract 3D single object tracking (SOT) in point clouds is still a challenging problem due to appearance variation, distractors, and high sparsity of point clouds. Notably, in autonomous driving scenarios, the target object typically maintains spatial adjacency across consecutive frames, predominantly moving horizontally. This spatial continuity offers valuable prior knowledge for target localization. However, existing trackers, which often employ point-wise representations, struggle to efficiently utilize this knowledge owing to the irregular format of such representations. Consequently, they require elaborate designs and solving multiple subtasks to establish spatial correspondence. In this paper, we introduce BEVTrack, a simple yet strong baseline framework for 3D SOT. After converting consecutive point clouds into the common Bird's-Eye View representation, BEVTrack inherently encodes spatial proximity and adeptly captures motion cues for tracking via a simple element-wise operation and convolutional layers. Additionally, to better deal with objects having diverse sizes and moving patterns, BEVTrack directly learns the underlying motion distribution rather than making a fixed Laplacian or Gaussian assumption as in previous works. Without bells and whistles, BEVTrack achieves state-of-the-art performance on KITTI and NuScenes datasets while maintaining a high inference speed of 122 FPS. The code will be released at https://github.com/xmm-prio/BEVTrack.
    摘要 点云中的3D单目标跟踪(SOT)仍然是一个具有挑战性的问题,主要源于外观变化、干扰物以及点云的高度稀疏性。值得注意的是,在自动驾驶场景中,目标对象通常在连续帧间保持空间邻近,且主要沿水平方向移动。这种空间连续性为目标定位提供了有价值的先验知识。然而,现有跟踪器通常采用逐点表示,由于这类表示格式不规则,难以高效利用这一先验,因而需要精心的设计并求解多个子任务来建立空间对应。在这篇论文中,我们介绍了 BEVTrack,一个简单而强大的3D SOT基线框架。将连续点云转化为统一的鸟瞰(Bird's-Eye View)表示后,BEVTrack 天然地编码了空间邻近性,并通过简单的逐元素操作和卷积层巧妙地捕捉运动线索用于跟踪。此外,为了更好地处理尺寸和运动模式各异的对象,BEVTrack 直接学习潜在的运动分布,而不是像以往工作那样做固定的拉普拉斯或高斯假设。无需任何花哨设计,BEVTrack 在 KITTI 和 NuScenes 数据集上取得了最优性能,同时保持 122 FPS 的高推理速度。代码将在 https://github.com/xmm-prio/BEVTrack 上发布。
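
A minimal sketch of the BEV fusion idea: two consecutive BEV feature maps are fused and passed through convolutions, then a head regresses target motion. Channel sizes, the head design, and the use of concatenation as a stand-in for the paper's simple element-wise fusion are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class BEVMotionHead(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(c, 4)   # hypothetical (dx, dy, dz, dtheta) motion output

    def forward(self, bev_prev, bev_curr):
        x = torch.cat([bev_prev, bev_curr], dim=1)   # fuse consecutive BEV maps
        x = self.fuse(x)
        x = x.mean(dim=(2, 3))                       # global pooling over the BEV grid
        return self.head(x)

prev = torch.randn(1, 64, 128, 128)
curr = torch.randn(1, 64, 128, 128)
print(BEVMotionHead()(prev, curr).shape)  # torch.Size([1, 4])
```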

Dual Relation Alignment for Composed Image Retrieval

  • paper_url: http://arxiv.org/abs/2309.02169
  • repo_url: None
  • paper_authors: Xintong Jiang, Yaxiong Wang, Yujiao Wu, Meng Wang, Xueming Qian
  • for: 本研究旨在提升组合图像检索性能,通过同时利用两类关系:显式关系(参考图像+补充文本→目标图像)和隐式关系(参考图像+目标图像→补充文本)。
  • methods: 我们提出了一种名为双关系对齐的新框架,将显式与隐式关系相结合,以充分挖掘三元组之间的相关性。我们设计了一个视觉合成器,先融合参考图像与目标图像,所得表示随后承担两种角色:(1)与补充文本进行语义对齐;(2)作为补充文本的补偿以增强显式关系建模。
  • results: 我们在CIRR和FashionIQ两个常用数据集上进行了广泛的实验,结果表明双关系学习能显著提升组合图像检索性能。
    Abstract Composed image retrieval, a task involving the search for a target image using a reference image and a complementary text as the query, has witnessed significant advancements owing to the progress made in cross-modal modeling. Unlike the general image-text retrieval problem with only one alignment relation, i.e., image-text, we argue for the existence of two types of relations in composed image retrieval. The explicit relation pertains to the reference image & complementary text-target image, which is commonly exploited by existing methods. Besides this intuitive relation, the observations during our practice have uncovered another implicit yet crucial relation, i.e., reference image & target image-complementary text, since we found that the complementary text can be inferred by studying the relation between the target image and the reference image. Regrettably, existing methods largely focus on leveraging the explicit relation to learn their networks, while overlooking the implicit relation. In response to this weakness, we propose a new framework for composed image retrieval, termed dual relation alignment, which integrates both explicit and implicit relations to fully exploit the correlations among the triplets. Specifically, we design a vision compositor to fuse reference image and target image at first, then the resulted representation will serve two roles: (1) counterpart for semantic alignment with the complementary text and (2) compensation for the complementary text to boost the explicit relation modeling, thereby implant the implicit relation into the alignment learning. Our method is evaluated on two popular datasets, CIRR and FashionIQ, through extensive experiments. The results confirm the effectiveness of our dual-relation learning in substantially enhancing composed image retrieval performance.
    摘要 组合图像检索以参考图像和一段补充文本作为查询来检索目标图像,得益于跨模态建模的进步,该任务已取得显著进展。不同于只有图像-文本这一种对齐关系的一般图文检索问题,我们认为组合图像检索中存在两类关系:一类是显式关系,即"参考图像+补充文本-目标图像",现有方法普遍利用这种关系;此外,我们在实践中还观察到一种隐式但关键的关系,即"参考图像+目标图像-补充文本",因为补充文本可以通过研究目标图像与参考图像之间的关系推断出来。遗憾的是,现有方法大多只利用显式关系来训练网络,而忽视了隐式关系。针对这一弱点,我们提出了一种新的组合图像检索框架——双关系对齐,将显式与隐式关系结合,以充分挖掘三元组之间的相关性。具体来说,我们设计了一个视觉合成器,先融合参考图像与目标图像,所得表示随后承担两种角色:(1)与补充文本进行语义对齐;(2)作为补充文本的补偿以增强显式关系建模,从而把隐式关系植入对齐学习之中。我们在 CIRR 和 FashionIQ 两个常用数据集上进行了广泛实验,结果证实双关系学习能显著提升组合图像检索性能。

A Survey of Imitation Learning: Algorithms, Recent Developments, and Challenges

  • paper_url: http://arxiv.org/abs/2309.02473
  • repo_url: None
  • paper_authors: Maryam Zare, Parham M. Kebria, Abbas Khosravi, Saeid Nahavandi
  • for: 这篇论文介绍了机器人与人工智能(AI)领域中的模仿学习(IL),并概述其基本假设与方法,核心思想是通过专家示范来学习期望行为。
  • methods: 论文讨论了IL的最新进展与新兴研究方向,并分析了研究者应对IL常见挑战的方式。
  • results: 论文为机器人与AI领域中不断发展的IL提供了一份全面指南,并指出了未来研究的潜在方向。
    Abstract In recent years, the development of robotics and artificial intelligence (AI) systems has been nothing short of remarkable. As these systems continue to evolve, they are being utilized in increasingly complex and unstructured environments, such as autonomous driving, aerial robotics, and natural language processing. As a consequence, programming their behaviors manually or defining their behavior through reward functions (as done in reinforcement learning (RL)) has become exceedingly difficult. This is because such environments require a high degree of flexibility and adaptability, making it challenging to specify an optimal set of rules or reward signals that can account for all possible situations. In such environments, learning from an expert's behavior through imitation is often more appealing. This is where imitation learning (IL) comes into play - a process where desired behavior is learned by imitating an expert's behavior, which is provided through demonstrations. This paper aims to provide an introduction to IL and an overview of its underlying assumptions and approaches. It also offers a detailed description of recent advances and emerging areas of research in the field. Additionally, the paper discusses how researchers have addressed common challenges associated with IL and provides potential directions for future research. Overall, the goal of the paper is to provide a comprehensive guide to the growing field of IL in robotics and AI.
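
Since the survey centers on learning desired behavior from expert demonstrations, a behavioral-cloning sketch (the simplest form of imitation learning) illustrates the basic mechanism. The demonstration data and network here are illustrative stand-ins, not any specific method from the survey.

```python
import torch
import torch.nn as nn

states = torch.randn(256, 4)           # hypothetical expert states
actions = torch.randn(256, 2)          # hypothetical expert actions

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(50):
    pred = policy(states)
    loss = nn.functional.mse_loss(pred, actions)  # imitate the expert's actions
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final imitation loss: {loss.item():.4f}")
```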

Model-based Offline Policy Optimization with Adversarial Network

  • paper_url: http://arxiv.org/abs/2309.02157
  • repo_url: https://github.com/junming-yang/moan
  • paper_authors: Junming Yang, Xingguo Chen, Shengyuan Wang, Bolei Zhang
  • for: 提出了一种基于模型的离线强化学习(RL)方法,以避免与在线环境进行代价高昂的交互,并在离线数据集上进行策略优化。
  • methods: 使用对抗学习构建泛化能力更强的转移模型,并借助对抗网络对模型不确定性进行量化。
  • results: 与现有基于模型的离线RL方法相比,取得了更高的性能和更准确的不确定性量化。
    Abstract Model-based offline reinforcement learning (RL), which builds a supervised transition model with logging dataset to avoid costly interactions with the online environment, has been a promising approach for offline policy optimization. As the discrepancy between the logging data and online environment may result in a distributional shift problem, many prior works have studied how to build robust transition models conservatively and estimate the model uncertainty accurately. However, the over-conservatism can limit the exploration of the agent, and the uncertainty estimates may be unreliable. In this work, we propose a novel Model-based Offline policy optimization framework with Adversarial Network (MOAN). The key idea is to use adversarial learning to build a transition model with better generalization, where an adversary is introduced to distinguish between in-distribution and out-of-distribution samples. Moreover, the adversary can naturally provide a quantification of the model's uncertainty with theoretical guarantees. Extensive experiments showed that our approach outperforms existing state-of-the-art baselines on widely studied offline RL benchmarks. It can also generate diverse in-distribution samples, and quantify the uncertainty more accurately.
    摘要 基于模型的离线强化学习(RL)通过利用日志数据集构建有监督的转移模型,以避免与在线环境进行代价高昂的交互,是离线策略优化的一种有前景的方法。然而,日志数据与在线环境之间的差异可能导致分布偏移问题,许多先前工作研究了如何保守地构建鲁棒的转移模型并准确估计模型不确定性。但过度保守会限制智能体的探索,且不确定性估计可能并不可靠。在这项工作中,我们提出了一种带有对抗网络的基于模型的离线策略优化框架(MOAN)。其核心思想是利用对抗学习构建泛化能力更强的转移模型,其中引入一个对抗器来区分分布内与分布外的样本;此外,该对抗器还能自然地给出具有理论保证的模型不确定性量化。大量实验表明,我们的方法在广泛研究的离线RL基准上优于现有最先进基线,还能生成多样的分布内样本,并更准确地量化不确定性。
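
A minimal sketch of the adversarial idea: a discriminator scores whether a transition looks in-distribution, and its output doubles as an uncertainty signal for penalizing model rollouts. All shapes, networks, and the penalty weight are illustrative assumptions, not MOAN's exact formulation.

```python
import torch
import torch.nn as nn

s_dim, a_dim = 4, 2
disc = nn.Sequential(nn.Linear(2 * s_dim + a_dim, 64), nn.ReLU(),
                     nn.Linear(64, 1), nn.Sigmoid())

def uncertainty(s, a, s_next):
    """High when (s, a, s_next) looks out-of-distribution to the adversary."""
    x = torch.cat([s, a, s_next], dim=-1)
    return 1.0 - disc(x).squeeze(-1)

def penalized_reward(r, s, a, s_next, lam=1.0):
    # Penalize model-generated transitions in proportion to their uncertainty.
    return r - lam * uncertainty(s, a, s_next)

s, a, s_next = torch.randn(8, s_dim), torch.randn(8, a_dim), torch.randn(8, s_dim)
r = torch.randn(8)
print(penalized_reward(r, s, a, s_next).shape)  # torch.Size([8])
```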

Making Large Language Models Better Reasoners with Alignment

  • paper_url: http://arxiv.org/abs/2309.02144
  • repo_url: None
  • paper_authors: Peiyi Wang, Lei Li, Liang Chen, Feifan Song, Binghuai Lin, Yunbo Cao, Tianyu Liu, Zhifang Sui
  • for: 提升大语言模型(LLM)的推理能力,特别是在思维链(COT)推理过程中。
  • methods: 通过对 LLM 进行针对性的微调,提升其在 COT 推理过程中的推理能力。
  • results: 通过提出的对齐微调(Alignment Fine-Tuning, AFT)方法,可以有效缓解 LLM 在 COT 推理中存在的评价不一致问题,并提升其推理能力。
    Abstract Reasoning is a cognitive process of using evidence to reach a sound conclusion. The reasoning capability is essential for large language models (LLMs) to serve as the brain of the artificial general intelligence agent. Recent studies reveal that fine-tuning LLMs on data with the chain of thought (COT) reasoning process can significantly enhance their reasoning capabilities. However, we find that the fine-tuned LLMs suffer from an \textit{Assessment Misalignment} problem, i.e., they frequently assign higher scores to subpar COTs, leading to potential limitations in their reasoning abilities. To address this problem, we introduce an \textit{Alignment Fine-Tuning (AFT)} paradigm, which involves three steps: 1) fine-tuning LLMs with COT training data; 2) generating multiple COT responses for each question, and categorizing them into positive and negative ones based on whether they achieve the correct answer; 3) calibrating the scores of positive and negative responses given by LLMs with a novel constraint alignment loss. Specifically, the constraint alignment loss has two objectives: a) Alignment, which guarantees that positive scores surpass negative scores to encourage answers with high-quality COTs; b) Constraint, which keeps the negative scores confined to a reasonable range to prevent the model degradation. Beyond just the binary positive and negative feedback, the constraint alignment loss can be seamlessly adapted to the ranking situations when ranking feedback is accessible. Furthermore, we also delve deeply into recent ranking-based alignment methods, such as DPO, RRHF, and PRO, and discover that the constraint, which has been overlooked by these approaches, is also crucial for their performance. Extensive experiments on four reasoning benchmarks with both binary and ranking feedback demonstrate the effectiveness of AFT.
    摘要 推理是利用证据得出可靠结论的认知过程,推理能力对于作为通用人工智能体"大脑"的大语言模型(LLM)至关重要。近期研究表明,在带有思维链(COT)推理过程的数据上微调LLM可以显著增强其推理能力。然而,我们发现微调后的LLM存在评价不一致(Assessment Misalignment)问题,即它们经常给质量较差的COT打出更高的分数,这可能限制其推理能力。为了解决这个问题,我们提出了对齐微调(AFT)范式,包含三个步骤:1)用COT训练数据微调LLM;2)为每个问题生成多个COT回答,并根据是否得到正确答案将其分为正例与负例;3)用一种新的约束对齐损失来校准LLM给正、负回答的打分。具体来说,约束对齐损失有两个目标:a)对齐,保证正例得分高于负例得分,以鼓励高质量COT的答案;b)约束,将负例得分限制在合理范围内,防止模型退化。除二元的正负反馈外,当可获得排序反馈时,该约束对齐损失也能无缝适配排序场景。此外,我们深入分析了近期基于排序的对齐方法(如DPO、RRHF和PRO),发现这些方法所忽视的"约束"因素对其性能同样至关重要。在四个推理基准上针对二元与排序反馈的大量实验证明了AFT的有效性。
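
A minimal sketch of a constraint alignment loss in the spirit of AFT: (a) alignment, positive-COT scores should exceed negative ones by a margin; (b) constraint, negative scores should not collapse below a boundary. The margin, boundary, and score values are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def constraint_alignment_loss(pos_scores, neg_scores, margin=1.0, boundary=-5.0):
    # Alignment: hinge on every positive/negative score pair.
    diff = neg_scores.unsqueeze(0) - pos_scores.unsqueeze(1) + margin
    align = F.relu(diff).mean()
    # Constraint: keep negatives within a reasonable range to avoid degradation.
    constraint = F.relu(boundary - neg_scores).mean()
    return align + constraint

pos = torch.tensor([0.8, 0.5], requires_grad=True)   # scores of correct COTs
neg = torch.tensor([0.9, -7.0], requires_grad=True)  # scores of wrong COTs
loss = constraint_alignment_loss(pos, neg)
loss.backward()
print(f"loss: {loss.item():.4f}")
```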

A Lightweight, Rapid and Efficient Deep Convolutional Network for Chest X-Ray Tuberculosis Detection

  • paper_url: http://arxiv.org/abs/2309.02140
  • repo_url: https://github.com/dani-capellan/LightTBNet
  • paper_authors: Daniel Capellán-Martín, Juan J. Gómez-Valverde, David Bermejo-Peláez, María J. Ledesma-Carbayo
  • for: 该论文旨在提高胸部X射线图像的结核病诊断精度,减少误判。
  • methods: 研究基于深度学习,设计了一种专门定制的轻量级、快速且计算开销低的深度卷积网络,用于从胸部X射线图像中检测结核病。
  • results: 该模型在独立测试集上取得了0.906的准确率、0.907的F1分数和0.961的ROC曲线下面积,表明其诊断结核病的能力较强;同时推理速度快、计算和存储需求低,适合部署在结核病高发的低资源地区的手持设备上。
    Abstract Tuberculosis (TB) is still recognized as one of the leading causes of death worldwide. Recent advances in deep learning (DL) have shown to enhance radiologists' ability to interpret chest X-ray (CXR) images accurately and with fewer errors, leading to a better diagnosis of this disease. However, little work has been done to develop models capable of diagnosing TB that offer good performance while being efficient, fast and computationally inexpensive. In this work, we propose LightTBNet, a novel lightweight, fast and efficient deep convolutional network specially customized to detect TB from CXR images. Using a total of 800 frontal CXR images from two publicly available datasets, our solution yielded an accuracy, F1 and area under the ROC curve (AUC) of 0.906, 0.907 and 0.961, respectively, on an independent test subset. The proposed model demonstrates outstanding performance while delivering a rapid prediction, with minimal computational and memory requirements, making it highly suitable for deployment in handheld devices that can be used in low-resource areas with high TB prevalence. Code publicly available at https://github.com/dani-capellan/LightTBNet.
    摘要 结核病(TB)至今仍被认为是全球主要死因之一。深度学习(DL)的最新进展已被证明能够帮助放射科医生更准确、更少出错地解读胸部X射线(CXR)图像,从而改善该疾病的诊断。然而,目前很少有工作致力于开发兼具良好性能与高效、快速、低计算开销的结核病诊断模型。在这项工作中,我们提出了LightTBNet,一种专为从CXR图像中检测结核病而定制的新型轻量级、快速且高效的深度卷积网络。我们使用来自两个公开数据集的共800张正位CXR图像,在独立测试子集上取得了0.906的准确率、0.907的F1分数和0.961的ROC曲线下面积(AUC)。该模型在表现出色的同时能够快速给出预测,且计算与内存需求极低,非常适合部署在结核病高发的低资源地区可用的手持设备上。代码公开于 https://github.com/dani-capellan/LightTBNet。

Self-Supervised Pre-Training Boosts Semantic Scene Segmentation on LiDAR data

  • paper_url: http://arxiv.org/abs/2309.02139
  • repo_url: https://github.com/marionacaros/barlow-twins-for-sem-seg
  • paper_authors: Mariona Carós, Ariadna Just, Santi Seguí, Jordi Vitrià
  • for: 本研究旨在从无标注数据中学习,以大幅减少语义场景分割所需的标注样本量。
  • methods: 本研究使用Barlow Twins对编码器进行自监督预训练,并将其作为语义场景分割任务的预训练网络。
  • results: 实验结果显示,这种无监督预训练策略在监督任务上微调后能够提升语义场景分割的性能,尤其是对代表性不足的类别。
    Abstract Airborne LiDAR systems have the capability to capture the Earth's surface by generating extensive point cloud data comprised of points mainly defined by 3D coordinates. However, labeling such points for supervised learning tasks is time-consuming. As a result, there is a need to investigate techniques that can learn from unlabeled data to significantly reduce the number of annotated samples. In this work, we propose to train a self-supervised encoder with Barlow Twins and use it as a pre-trained network in the task of semantic scene segmentation. The experimental results demonstrate that our unsupervised pre-training boosts performance once fine-tuned on the supervised task, especially for under-represented categories.
    摘要 机载LiDAR系统能够捕捉地球表面,生成主要由三维坐标定义的大规模点云数据。然而,为监督学习任务标注这些点云十分耗时。因此,有必要研究能从无标注数据中学习的技术,以显著减少所需的标注样本数量。在这项工作中,我们提议用Barlow Twins对编码器进行自监督预训练,并将其作为语义场景分割任务的预训练网络。实验结果表明,这种无监督预训练在监督任务上微调后能显著提升性能,尤其是对代表性不足的类别。
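
A minimal sketch of the Barlow Twins objective used for the self-supervised pre-training: drive the cross-correlation matrix of two augmented views' embeddings toward the identity. The trade-off weight and shapes are illustrative assumptions.

```python
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)    # normalize each embedding dim
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                             # (d, d) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()  # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy reduction
    return on_diag + lam * off_diag

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)  # embeddings of two views
print(barlow_twins_loss(z1, z2).item())
```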

Generalized Simplicial Attention Neural Networks

  • paper_url: http://arxiv.org/abs/2309.02138
  • repo_url: https://github.com/luciatesta97/generalized-simplicial-attention-neural-networks
  • paper_authors: Claudio Battiloro, Lucia Testa, Lorenzo Giusti, Stefania Sardellitti, Paolo Di Lorenzo, Sergio Barbarossa
  • for: 本研究旨在介绍广义单纯复形注意力神经网络(GSAN),一种利用掩码自注意力层处理定义在单纯复形上的数据的新型神经网络架构。
  • methods: 作者基于拓扑信号处理原理提出了一系列自注意力方案,可处理不同单纯形阶上的数据分量(如节点、边、三角形等),并借助狄拉克算子及其分解,以任务导向的方式学习对拓扑域中邻域的加权。
  • results: 作者证明了 GSAN 具有置换等变性与单纯形感知能力,并在多个(归纳与直推)任务上(如轨迹预测、缺失数据填充、图分类和单纯形预测)与其他方法比较,取得了有利的结果。
    Abstract The aim of this work is to introduce Generalized Simplicial Attention Neural Networks (GSANs), i.e., novel neural architectures designed to process data defined on simplicial complexes using masked self-attentional layers. Hinging on topological signal processing principles, we devise a series of self-attention schemes capable of processing data components defined at different simplicial orders, such as nodes, edges, triangles, and beyond. These schemes learn how to weight the neighborhoods of the given topological domain in a task-oriented fashion, leveraging the interplay among simplices of different orders through the Dirac operator and its Dirac decomposition. We also theoretically establish that GSANs are permutation equivariant and simplicial-aware. Finally, we illustrate how our approach compares favorably with other methods when applied to several (inductive and transductive) tasks such as trajectory prediction, missing data imputation, graph classification, and simplex prediction.
    摘要 本文旨在介绍广义单纯复形注意力神经网络(GSAN),一种利用掩码自注意力层处理定义在单纯复形上的数据的新型神经网络架构。基于拓扑信号处理原理,我们设计了一系列自注意力方案,可处理不同单纯形阶上的数据分量,如节点、边、三角形等。这些方案借助狄拉克算子及其狄拉克分解,利用不同阶单纯形之间的相互作用,以任务导向的方式学习对给定拓扑域中邻域的加权。我们还在理论上证明了 GSAN 具有置换等变性与单纯形感知能力。最后,我们在若干归纳与直推任务(轨迹预测、缺失数据填充、图分类和单纯形预测)上与其他方法进行了比较,结果显示我们的方法具有优势。

Exploring the Intersection of Complex Aesthetics and Generative AI for Promoting Cultural Creativity in Rural China after the Post-Pandemic Era

  • paper_url: http://arxiv.org/abs/2309.02136
  • repo_url: None
  • paper_authors: Mengyao Guo, Xiaolin Zhang, Yuan Zhuang, Jing Chen, Pengfei Wang, Ze Gao
  • for: 这个论文探讨了在中国农村地区使用生成AI和艺术来促进文化创新,尤其是在COVID-19的影响下。
  • methods: 该论文通过文献综述、案例研究、问卷调查和文本分析方法来研究艺术和科技在农村 context中的应用,并找到了关键的挑战。
  • results: 研究发现艺术作品经常难以引起当地共鸣,而依赖外部艺术家也限制了可持续性,因此提出借助AI培养草根"艺术家村民"。我们的方法是在主观美学上训练机器学习模型,生成具有文化关联性的内容;交互式AI媒体还可在保护文化遗产的同时促进旅游业。这项开创性研究为AI与美学的交叉提出了新视角,强调AI是创作的赋能者而非替代者,并为利用AI创新赋能农村社区的后续探索奠定了基础。
    Abstract This paper explores using generative AI and aesthetics to promote cultural creativity in rural China amidst COVID-19's impact. Through literature reviews, case studies, surveys, and text analysis, it examines art and technology applications in rural contexts and identifies key challenges. The study finds artworks often fail to resonate locally, while reliance on external artists limits sustainability. Hence, nurturing grassroots "artist villagers" through AI is proposed. Our approach involves training machine learning on subjective aesthetics to generate culturally relevant content. Interactive AI media can also boost tourism while preserving heritage. This pioneering research puts forth original perspectives on the intersection of AI and aesthetics to invigorate rural culture. It advocates holistic integration of technology and emphasizes AI's potential as a creative enabler versus replacement. Ultimately, it lays the groundwork for further exploration of leveraging AI innovations to empower rural communities. This timely study contributes to growing interest in emerging technologies to address critical issues facing rural China.

Multi-label affordance mapping from egocentric vision

  • paper_url: http://arxiv.org/abs/2309.02120
  • repo_url: https://github.com/lmur98/epic_kitchens_affordances
  • paper_authors: Lorenzo Mur-Labadia, Jose J. Guerrero, Ruben Martinez-Cantin
  • for: 本研究旨在提供高精度的交互场景中的可用性检测和分割方法,用于支持人工智能系统的发展。
  • methods: 本研究提出新的多标签检测方法,能够准确检测并分割同一空间中共存的多个可供性。
  • results: 研究人员通过使用多标签检测方法,成功地提取了高精度的交互场景中的可用性信息,并构建了大量和完整的交互可用性数据集(EPIC-Aff)。此外,研究人员还提出了一种新的多标签检测方法,可以处理多个可用性同时存在同一个空间中的情况。
    Abstract Accurate affordance detection and segmentation with pixel precision is an important piece in many complex systems based on interactions, such as robots and assitive devices. We present a new approach to affordance perception which enables accurate multi-label segmentation. Our approach can be used to automatically extract grounded affordances from first person videos of interactions using a 3D map of the environment providing pixel level precision for the affordance location. We use this method to build the largest and most complete dataset on affordances based on the EPIC-Kitchen dataset, EPIC-Aff, which provides interaction-grounded, multi-label, metric and spatial affordance annotations. Then, we propose a new approach to affordance segmentation based on multi-label detection which enables multiple affordances to co-exists in the same space, for example if they are associated with the same object. We present several strategies of multi-label detection using several segmentation architectures. The experimental results highlight the importance of the multi-label detection. Finally, we show how our metric representation can be exploited for build a map of interaction hotspots in spatial action-centric zones and use that representation to perform a task-oriented navigation.
    摘要 在机器人和辅助设备等许多基于交互的复杂系统中,像素级精度的可供性(affordance)检测与分割是重要的一环。我们提出了一种新的可供性感知方法,能够借助环境的三维地图,从第一人称交互视频中自动提取以像素级精度定位、与交互绑定的可供性。我们利用该方法在EPIC-Kitchen数据集基础上构建了目前最大、最完整的可供性数据集EPIC-Aff,其提供了基于交互的多标签、度量化且带空间信息的可供性标注。随后,我们提出了一种基于多标签检测的可供性分割新方法,允许多个可供性共存于同一空间(例如当它们关联到同一物体时),并结合多种分割架构给出了多种多标签检测策略。实验结果凸显了多标签检测的重要性。最后,我们展示了如何利用这种度量表示在以动作为中心的空间区域中构建交互热点地图,并基于该表示完成面向任务的导航。
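
A minimal sketch of why multi-label detection allows co-existing affordances: independent per-pixel sigmoids can activate several affordance channels at the same location, unlike a mutually exclusive softmax head. The backbone features, class count, and threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

n_affordances = 10
head = nn.Conv2d(64, n_affordances, kernel_size=1)   # applied to backbone features

feats = torch.randn(1, 64, 60, 80)                   # hypothetical feature map
logits = head(feats)                                 # (1, 10, 60, 80)

target = torch.randint(0, 2, (1, n_affordances, 60, 80)).float()
loss = nn.functional.binary_cross_entropy_with_logits(logits, target)

masks = torch.sigmoid(logits) > 0.5   # several True channels per pixel are allowed
print(loss.item(), masks.shape)
```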

Leveraging Label Information for Multimodal Emotion Recognition

  • paper_url: http://arxiv.org/abs/2309.02106
  • repo_url: https://github.com/Digimonseeker/LE-MER
  • paper_authors: Peiying Wang, Sunlu Zeng, Junqing Chen, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He
  • for: 本研究旨在提高多模态情感识别(MER)的性能,通过结合语音和文本信息。
  • methods: 我们提出了一种利用标签信息的新方法。我们首先获得语音和文本模态的代表性标签嵌入,然后通过标签-词元和标签-帧交互来学习每条语音的标签感知表示。最后,我们提出了一种新的标签引导注意力融合模块,融合标签感知的文本与语音表示用于情感分类。
  • results: 我们在公共的IEMOCAP dataset上进行了广泛的实验,结果表明,我们的提议的方法在比较基eline和现有方法的情况下,实现了新的国际顶点性能。
    Abstract Multimodal emotion recognition (MER) aims to detect the emotional status of a given expression by combining the speech and text information. Intuitively, label information should be capable of helping the model locate the salient tokens/frames relevant to the specific emotion, which finally facilitates the MER task. Inspired by this, we propose a novel approach for MER by leveraging label information. Specifically, we first obtain the representative label embeddings for both text and speech modalities, then learn the label-enhanced text/speech representations for each utterance via label-token and label-frame interactions. Finally, we devise a novel label-guided attentive fusion module to fuse the label-aware text and speech representations for emotion classification. Extensive experiments were conducted on the public IEMOCAP dataset, and experimental results demonstrate that our proposed approach outperforms existing baselines and achieves new state-of-the-art performance.
    摘要 多模态情感识别(MER)旨在结合语音和文本信息检测给定表达的情感状态。直观上,标签信息应能帮助模型定位与特定情感相关的关键词/帧,从而促进MER任务。受此启发,我们提出了一种利用标签信息的新MER方法。具体来说,我们首先获得文本与语音模态的代表性标签嵌入,然后通过标签-词元和标签-帧交互,学习每条话语的标签感知表示。最后,我们设计了一种新的标签引导注意力融合模块,融合标签感知的文本与语音表示用于情感分类。我们在公开的IEMOCAP数据集上进行了广泛实验,结果表明所提方法优于现有基线,并取得了新的最优性能。
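
A minimal sketch of label-guided attention: emotion-label embeddings act as queries attending over token or frame features, so the pooled vector is weighted toward label-relevant positions. The dimensions and the single-head dot-product form are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

n_labels, dim = 4, 64
label_emb = nn.Embedding(n_labels, dim)

def label_aware_repr(features, label_ids):
    """features: (batch, seq, dim); label_ids: (n_labels,)."""
    q = label_emb(label_ids)                        # (n_labels, dim) label queries
    attn = torch.softmax(features @ q.T, dim=1)     # attention over positions per label
    # Weighted sum over the sequence for each label query.
    return torch.einsum("bsl,bsd->bld", attn, features)

speech = torch.randn(2, 120, dim)   # hypothetical frame features
labels = torch.arange(n_labels)
print(label_aware_repr(speech, labels).shape)  # torch.Size([2, 4, 64])
```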

Improving Query-Focused Meeting Summarization with Query-Relevant Knowledge

  • paper_url: http://arxiv.org/abs/2309.02105
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Tiezheng Yu, Ziwei Ji, Pascale Fung
  • for: 这篇论文旨在提供一个可以根据查询生成会议总结的方法。
  • methods: 本文提出了一个知识增强的两阶段框架,名为知识感知摘要器(Knowledge-Aware Summarizer, KAS),以应对输入文本过长、会议记录中查询相关信息稀疏的问题。第一阶段引入知识感知打分以提高查询相关段落的抽取精度;第二阶段将查询相关知识融入摘要生成。
  • results: 实验结果显示,我们的方法在QMSum dataset上实现了现有最佳性能。进一步的分析显示,我们的方法能够生成相关 faithful和有用的总结。
    Abstract Query-Focused Meeting Summarization (QFMS) aims to generate a summary of a given meeting transcript conditioned upon a query. The main challenges for QFMS are the long input text length and sparse query-relevant information in the meeting transcript. In this paper, we propose a knowledge-enhanced two-stage framework called Knowledge-Aware Summarizer (KAS) to tackle the challenges. In the first stage, we introduce knowledge-aware scores to improve the query-relevant segment extraction. In the second stage, we incorporate query-relevant knowledge in the summary generation. Experimental results on the QMSum dataset show that our approach achieves state-of-the-art performance. Further analysis proves the competency of our methods in generating relevant and faithful summaries.
    摘要 查询聚焦会议摘要(QFMS)旨在根据给定查询生成会议记录的摘要。其主要挑战在于输入文本过长,且会议记录中与查询相关的信息稀疏。在这篇论文中,我们提出了知识增强的两阶段框架,称为知识感知摘要器(KAS),以应对这些挑战。第一阶段,我们引入知识感知分数来改进查询相关段落的抽取;第二阶段,我们在摘要生成中融入查询相关知识。实验结果表明,我们的方法在 QMSum 数据集上达到了最优性能,进一步分析也证明其能够生成相关且忠实的摘要。

Iterative Superquadric Recomposition of 3D Objects from Multiple Views

  • paper_url: http://arxiv.org/abs/2309.02102
  • repo_url: https://github.com/explainableml/isco
  • paper_authors: Stephan Alaniz, Massimiliano Mancini, Zeynep Akata
  • for: 本研究旨在提出一种描述对象的概念模型,帮助机器学习模型更好地理解和重建物体的三维结构。
  • methods: 该方法以3D超二次曲面(superquadrics)作为语义部件,直接从2D视图重建物体,无需训练任何使用3D监督的模型;通过比较渲染的3D视图与2D图像轮廓来优化超二次曲面参数,从而实现高精度的3D重建。
  • results: 实验表明,相比近期的单实例超二次曲面重建方法,ISCO能够给出更精确的3D重建结果,即使面对野外图像也是如此。代码见 https://github.com/ExplainableML/ISCO。
    Abstract Humans are good at recomposing novel objects, i.e. they can identify commonalities between unknown objects from general structure to finer detail, an ability difficult to replicate by machines. We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views without training a model that uses 3D supervision. To achieve this, we optimize the superquadric parameters that compose a specific instance of the object, comparing its rendered 3D view and 2D image silhouette. Our ISCO framework iteratively adds new superquadrics wherever the reconstruction error is high, abstracting first coarse regions and then finer details of the target object. With this simple coarse-to-fine inductive bias, ISCO provides consistent superquadrics for related object parts, despite not having any semantic supervision. Since ISCO does not train any neural network, it is also inherently robust to out-of-distribution objects. Experiments show that, compared to recent single instance superquadrics reconstruction approaches, ISCO provides consistently more accurate 3D reconstructions, even from images in the wild. Code available at https://github.com/ExplainableML/ISCO .
    摘要 人类擅长重组新物体,即能够从整体结构到细节识别未知物体之间的共性,这种能力机器难以复制。我们提出了ISCO框架,无需训练使用3D监督的模型,直接从2D视图以3D超二次曲面作为语义部件来重组物体。为此,我们通过比较物体渲染出的3D视图与2D图像轮廓,优化组成该物体实例的超二次曲面参数。ISCO会在重建误差较大处迭代添加新的超二次曲面,先抽象目标物体的粗略区域,再刻画更精细的细节。凭借这种简单的由粗到细归纳偏置,即便没有任何语义监督,ISCO也能为相关的物体部件给出一致的超二次曲面。由于ISCO不训练任何神经网络,它对分布外物体天然具有鲁棒性。实验显示,相比近期的单实例超二次曲面重建方法,ISCO能够稳定地给出更精确的3D重建,即使输入是野外图像。代码见 https://github.com/ExplainableML/ISCO。

TensorBank: Tensor Lakehouse for Foundation Model Training

  • paper_url: http://arxiv.org/abs/2309.02094
  • repo_url: None
  • paper_authors: Romeo Kienzler, Benedikt Blumenstiel, Zoltan Arnold Nagy, S. Karthik Mukkavilli, Johannes Schmude, Marcus Freitag, Michael Behrendt, Daniel Salles Civitarese, Naomi Simumba, Daiki Kimura, Hendrik Hamann
  • for: 随着基础模型扩展到自然语言之外,高维数据的存储与流式读取成为基础模型训练的核心需求。
  • methods: 构建PB级张量湖仓,基于复杂关系查询将张量从云对象存储(COS)以线速流式传输到GPU内存,并用分层统计索引(HSI)加速查询。
  • results: 可以通过HTTP范围读取直接在块级别寻址张量,将其从云对象存储(COS)快速流式传输到GPU内存,并借助HSI跳过无关块,随后用PyTorch变换对数据进行转换。
    Abstract Storing and streaming high dimensional data for foundation model training became a critical requirement with the rise of foundation models beyond natural language. In this paper we introduce TensorBank, a petabyte scale tensor lakehouse capable of streaming tensors from Cloud Object Store (COS) to GPU memory at wire speed based on complex relational queries. We use Hierarchical Statistical Indices (HSI) for query acceleration. Our architecture allows to directly address tensors on block level using HTTP range reads. Once in GPU memory, data can be transformed using PyTorch transforms. We provide a generic PyTorch dataset type with a corresponding dataset factory translating relational queries and requested transformations as an instance. By making use of the HSI, irrelevant blocks can be skipped without reading them as those indices contain statistics on their content at different hierarchical resolution levels. This is an opinionated architecture powered by open standards and making heavy use of open-source technology. Although, hardened for production use using geospatial-temporal data, this architecture generalizes to other use case like computer vision, computational neuroscience, biological sequence analysis and more.
    摘要 随着基础模型超越自然语言,为其训练存储并流式读取高维数据成为一项关键需求。本文介绍TensorBank,一个PB级的张量湖仓,能够基于复杂关系查询,将张量从云对象存储(COS)以线速流式传输到GPU内存。我们使用分层统计索引(HSI)加速查询。我们的架构支持通过HTTP范围读取直接在块级别寻址张量;数据一旦进入GPU内存,即可通过PyTorch变换进行转换。我们提供了一个通用的PyTorch数据集类型及对应的数据集工厂,将关系查询和所请求的变换实例化。借助HSI,由于索引在不同层级分辨率上记录了各块内容的统计信息,无关的块无需读取即可被跳过。这是一个基于开放标准、大量使用开源技术、带有明确设计取向的架构。尽管它是针对地理时空数据为生产环境加固的,但该架构同样适用于计算机视觉、计算神经科学、生物序列分析等其他用例。
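
A minimal sketch of block-level tensor access via HTTP range reads, the access pattern described above, wrapped in a PyTorch Dataset. The URL, dtype, and block layout are illustrative assumptions, not TensorBank's actual API.

```python
import numpy as np
import requests
import torch
from torch.utils.data import Dataset

class RangeReadBlocks(Dataset):
    def __init__(self, url, block_bytes, n_blocks, dtype=np.float32):
        self.url = url
        self.block_bytes = block_bytes
        self.n_blocks = n_blocks
        self.dtype = dtype

    def __len__(self):
        return self.n_blocks

    def __getitem__(self, i):
        start = i * self.block_bytes
        end = start + self.block_bytes - 1           # HTTP byte ranges are inclusive
        resp = requests.get(self.url, headers={"Range": f"bytes={start}-{end}"})
        resp.raise_for_status()
        arr = np.frombuffer(resp.content, dtype=self.dtype)
        return torch.from_numpy(arr.copy())

# Hypothetical object-store URL; in the described design, irrelevant blocks
# would be skipped upstream by consulting the hierarchical statistical indices
# before any read is issued.
ds = RangeReadBlocks("https://example.com/tensor.bin", block_bytes=4096, n_blocks=100)
```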

Dual Adversarial Alignment for Realistic Support-Query Shift Few-shot Learning

  • paper_url: http://arxiv.org/abs/2309.02088
  • repo_url: None
  • paper_authors: Siyang Jiang, Rui Fang, Hsi-Wen Chen, Wei Ding, Ming-Syan Chen
  • for: 实际场景中,支持集与查询集之间往往存在多种未知且多变的分布偏移,这使得传统的支持-查询偏移小样本学习难以应对。本文提出一个更困难的新挑战——真实支持-查询偏移小样本学习(RSQS),旨在未知分布偏移下进行小样本学习。
  • methods: 我们提出了一种新的对抗特征对齐方法,即双重对抗对齐框架(DuaL),从域间偏差与域内方差两方面缓解RSQS。一方面,针对域间偏差,我们预先扰动原始数据,并用合成的扰动输入训练修复网络,使特征层面的距离最小化;另一方面,针对域内方差,我们提出一个生成器网络,以自监督方式从支持集中合成"难样本"(即相似度更低的样本),并引入正则化最优传输以得到平滑的传输方案。
  • results: 我们在三个数据集(CIFAR100、mini-ImageNet和Tiered-ImageNet)上构建了包含多个最先进基线的RSQS基准;实验结果表明,DuaL在该基准上显著优于现有最先进方法。
    Abstract Support-query shift few-shot learning aims to classify unseen examples (query set) to labeled data (support set) based on the learned embedding in a low-dimensional space under a distribution shift between the support set and the query set. However, in real-world scenarios the shifts are usually unknown and varied, making it difficult to estimate in advance. Therefore, in this paper, we propose a novel but more difficult challenge, RSQS, focusing on Realistic Support-Query Shift few-shot learning. The key feature of RSQS is that the individual samples in a meta-task are subjected to multiple distribution shifts in each meta-task. In addition, we propose a unified adversarial feature alignment method called DUal adversarial ALignment framework (DuaL) to relieve RSQS from two aspects, i.e., inter-domain bias and intra-domain variance. On the one hand, for the inter-domain bias, we corrupt the original data in advance and use the synthesized perturbed inputs to train the repairer network by minimizing distance in the feature level. On the other hand, for intra-domain variance, we proposed a generator network to synthesize hard, i.e., less similar, examples from the support set in a self-supervised manner and introduce regularized optimal transportation to derive a smooth optimal transportation plan. Lastly, a benchmark of RSQS is built with several state-of-the-art baselines among three datasets (CIFAR100, mini-ImageNet, and Tiered-Imagenet). Experiment results show that DuaL significantly outperforms the state-of-the-art methods in our benchmark.
    摘要 支持-查询偏移小样本学习旨在支持集与查询集之间存在分布偏移的情况下,基于学到的低维嵌入将未见样本(查询集)归类到有标注数据(支持集)。然而,在实际场景中这些偏移通常是未知且多变的,难以事先估计。因此,本文提出了一个更困难的新挑战——真实支持-查询偏移小样本学习(RSQS),其关键特点是元任务中的个体样本在每个元任务内会经受多重分布偏移。此外,我们提出了一种统一的对抗特征对齐方法,即双重对抗对齐框架(DuaL),从域间偏差与域内方差两方面缓解RSQS。一方面,针对域间偏差,我们预先扰动原始数据,并用合成的扰动输入训练修复网络,在特征层面最小化距离;另一方面,针对域内方差,我们提出一个生成器网络,以自监督方式从支持集中合成"难样本"(即相似度更低的样本),并引入正则化最优传输来推导平滑的最优传输方案。最后,我们在三个数据集(CIFAR100、mini-ImageNet和Tiered-ImageNet)上构建了包含多个最先进基线的RSQS基准。实验结果表明,DuaL在该基准上显著优于现有最先进方法。

Natural Example-Based Explainability: a Survey

  • paper_url: http://arxiv.org/abs/2309.03234
  • repo_url: https://github.com/danderfer/Comp_Sci_Sem_2
  • paper_authors: Antonin Poché, Lucas Hervier, Mohamed-Chafik Bakkay
  • for: 本研究旨在提供一份对当前自然示例基于XAI技术的概述,以便了解不同方法的优缺点和应用场景。
  • methods: 本研究主要涉及的方法包括相似示例、 counterfactual 和 semi-factual 示例、重要实例、原型和概念。
  • results: 本研究提供了这些方法的 semantic definition、认知影响和added value的比较,以便激励和促进未来的自然示例基于XAI技术发展。
    Abstract Explainable Artificial Intelligence (XAI) has become increasingly significant for improving the interpretability and trustworthiness of machine learning models. While saliency maps have stolen the show for the last few years in the XAI field, their ability to reflect models' internal processes has been questioned. Although less in the spotlight, example-based XAI methods have continued to improve. It encompasses methods that use examples as explanations for a machine learning model's predictions. This aligns with the psychological mechanisms of human reasoning and makes example-based explanations natural and intuitive for users to understand. Indeed, humans learn and reason by forming mental representations of concepts based on examples. This paper provides an overview of the state-of-the-art in natural example-based XAI, describing the pros and cons of each approach. A "natural" example simply means that it is directly drawn from the training data without involving any generative process. The exclusion of methods that require generating examples is justified by the need for plausibility which is in some regards required to gain a user's trust. Consequently, this paper will explore the following family of methods: similar examples, counterfactual and semi-factual, influential instances, prototypes, and concepts. In particular, it will compare their semantic definition, their cognitive impact, and added values. We hope it will encourage and facilitate future work on natural example-based XAI.
    摘要 可解释人工智能(XAI)在提升机器学习模型的可解释性与可信度方面日益重要。虽然近年来显著性图在XAI领域备受关注,但其反映模型内部过程的能力受到质疑。与此同时,基于样例的XAI方法持续发展:它们以样例作为模型预测的解释,与人类通过样例形成概念心智表征的推理机制相契合,因而对用户来说自然且直观。本文综述了基于自然样例的XAI的最新进展,并分析各类方法的优缺点。这里的"自然"指样例直接取自训练数据、不经任何生成过程;排除需要生成样例的方法,是出于可信度的考量,因为取得用户信任在一定程度上要求解释具备可信性。本文将探讨以下几类方法:相似样例、反事实与半事实样例、影响实例、原型以及概念,并着重比较它们的语义定义、认知影响与附加价值。我们希望这篇综述能启发并促进基于自然样例的XAI的后续研究。

DeepVol: A Deep Transfer Learning Approach for Universal Asset Volatility Modeling

  • paper_url: http://arxiv.org/abs/2309.02072
  • repo_url: None
  • paper_authors: Chen Liu, Minh-Ngoc Tran, Chao Wang, Richard Gerlach, Robert Kohn
  • for: 这篇论文旨在提出一种深度学习模型,以更好地模型金融资产的波动性。
  • methods: 该模型利用迁移学习,仅凭一个通用模型即可有效捕捉并建模所有金融资产(包括此前未见资产)的波动性动态。这与计量经济学文献中为每个数据集单独训练模型的通行做法不同。
  • results: 这个模型在模型泛化性方面表现出色,可以更好地预测金融资产的波动性。这对金融预测和管理具有广泛的应用前景。
    Abstract This paper introduces DeepVol, a promising new deep learning volatility model that outperforms traditional econometric models in terms of model generality. DeepVol leverages the power of transfer learning to effectively capture and model the volatility dynamics of all financial assets, including previously unseen ones, using a single universal model. This contrasts to the prevailing practice in econometrics literature, which necessitates training separate models for individual datasets. The introduction of DeepVol opens up new avenues for volatility modeling and forecasting in the finance industry, potentially transforming the way volatility is understood and predicted.
    摘要 这篇论文介绍了DeepVol,一种有前景的深度学习波动率模型,在模型通用性方面优于传统计量经济模型。DeepVol 借助迁移学习的力量,仅凭单一的通用模型即可有效捕捉并建模所有金融资产(包括此前未见资产)的波动性动态。这与计量经济学文献中为每个数据集单独训练模型的通行做法形成对比。DeepVol 的提出为金融行业的波动率建模与预测开辟了新途径,有望改变波动率被理解和预测的方式。

Enhance Multi-domain Sentiment Analysis of Review Texts through Prompting Strategies

  • paper_url: http://arxiv.org/abs/2309.02045
  • repo_url: None
  • paper_authors: Yajing Wang, Zongwei Luo
  • for: 这篇论文旨在提高大型自然语言处理模型(LLMs)在特定任务中的性能,具体来说是在 Sentiment Analysis 任务中。
  • methods: 这篇论文使用了两种新的提示策略,即 RolePlaying(RP)提示和 Chain-of-thought(CoT)提示,并提出了 RP-CoT 提示策略。
  • results: 实验结果显示,采用提出的提示策略可以明显提高 Sentiment Analysis 的准确率,其中 CoT 提示策略对隐式情感分析具有显著的影响,RP-CoT 提示策略则在所有策略中表现最佳。
    Abstract Large Language Models (LLMs) have made significant strides in both scientific research and practical applications. Existing studies have demonstrated the state-of-the-art (SOTA) performance of LLMs in various natural language processing tasks. However, the question of how to further enhance LLMs' performance in specific task using prompting strategies remains a pivotal concern. This paper explores the enhancement of LLMs' performance in sentiment analysis through the application of prompting strategies. We formulate the process of prompting for sentiment analysis tasks and introduce two novel strategies tailored for sentiment analysis: RolePlaying (RP) prompting and Chain-of-thought (CoT) prompting. Specifically, we also propose the RP-CoT prompting strategy which is a combination of RP prompting and CoT prompting. We conduct comparative experiments on three distinct domain datasets to evaluate the effectiveness of the proposed sentiment analysis strategies. The results demonstrate that the adoption of the proposed prompting strategies leads to a increasing enhancement in sentiment analysis accuracy. Further, the CoT prompting strategy exhibits a notable impact on implicit sentiment analysis, with the RP-CoT prompting strategy delivering the most superior performance among all strategies.
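
A minimal sketch of composing the RolePlaying (RP) and Chain-of-Thought (CoT) strategies into an RP-CoT prompt for sentiment analysis. The template wording is an illustrative assumption, and `ask_llm` is a hypothetical client call, not an API from the paper.

```python
ROLE = "You are an experienced review analyst specializing in {domain} products."
COT = "Let's reason step by step about the opinions expressed, then conclude."

def rp_cot_prompt(review, domain="electronics"):
    return (
        f"{ROLE.format(domain=domain)}\n"
        f"Review: {review}\n"
        f"{COT}\n"
        "Final answer (positive/negative/neutral):"
    )

prompt = rp_cot_prompt("Battery dies in two hours, but the screen is gorgeous.")
print(prompt)
# response = ask_llm(prompt)  # hypothetical LLM API call
```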

Diffusion Generative Inverse Design

  • paper_url: http://arxiv.org/abs/2309.02040
  • repo_url: None
  • paper_authors: Marin Vlastelica, Tatiana López-Guevara, Kelsey Allen, Peter Battaglia, Arnaud Doucet, Kimberley Stachenfeld
  • for: solving inverse design problems efficiently
  • methods: using denoising diffusion models (DDMs) and a particle sampling algorithm
  • results: reducing the number of calls to the simulator compared to standard techniques, with improved efficiency
    Abstract Inverse design refers to the problem of optimizing the input of an objective function in order to enact a target outcome. For many real-world engineering problems, the objective function takes the form of a simulator that predicts how the system state will evolve over time, and the design challenge is to optimize the initial conditions that lead to a target outcome. Recent developments in learned simulation have shown that graph neural networks (GNNs) can be used for accurate, efficient, differentiable estimation of simulator dynamics, and support high-quality design optimization with gradient- or sampling-based optimization procedures. However, optimizing designs from scratch requires many expensive model queries, and these procedures exhibit basic failures on either non-convex or high-dimensional problems. In this work, we show how denoising diffusion models (DDMs) can be used to solve inverse design problems efficiently and propose a particle sampling algorithm for further improving their efficiency. We perform experiments on a number of fluid dynamics design challenges, and find that our approach substantially reduces the number of calls to the simulator compared to standard techniques.
    摘要 “ inverse 设计”指的是对目标函数的输入优化,以实现一个目标结果。在许多实际工程问题中,目标函数通常是一个预测系统状态在时间推移中的模拟器,并且设计挑战是确定初始条件以实现目标结果。现有的学习模拟技术发展已经表明,图 neural network (GNNs) 可以用于准确、高效、可导estiimation of simulator dynamics,并支持高质量的设计优化。然而,从头开始优化设计需要许多昂贵的模拟器调用,这些过程在非凸或高维问题上会表现出基本的失败。在这种情况下,我们提出了使用 denoising diffusion models (DDMs) 来解决 inverse 设计问题,并提出了一种粒子抽象算法来进一步提高其效率。我们在一些流体动力学设计挑战中进行了实验,并发现我们的方法可以减少对模拟器的调用数量相比标准技术。

The Impact of Artificial Intelligence on the Evolution of Digital Education: A Comparative Study of OpenAI Text Generation Tools including ChatGPT, Bing Chat, Bard, and Ernie

  • paper_url: http://arxiv.org/abs/2309.02029
  • repo_url: None
  • paper_authors: Negin Yazdani Motlagh, Matin Khajavi, Abbas Sharifi, Mohsen Ahmadi
  • for: This paper aims to explore the potential of OpenAI’s text generation tools, particularly ChatGPT, in revolutionizing education and to highlight the challenges and opportunities of AI in education.
  • methods: The paper uses a typology that views education through the lenses of system, process, and result to examine the multifaceted applications of AI in education, including decentralizing global education, personalizing curriculums, and digitally documenting competence-based outcomes.
  • results: The paper highlights ChatGPT’s meteoric rise to one million users in just five days and its potential in democratizing education, fostering autodidacticism, and magnifying student engagement. However, the study also acknowledges the potential challenges of AI in education, such as the need for ethical guidelines, pedagogical adaptations, and strategic collaborations to ensure the responsible use of AI tools.
    Abstract In the digital era, the integration of artificial intelligence (AI) in education has ushered in transformative changes, redefining teaching methodologies, curriculum planning, and student engagement. This review paper delves deep into the rapidly evolving landscape of digital education by contrasting the capabilities and impact of OpenAI's pioneering text generation tools like Bing Chat, Bard, Ernie with a keen focus on the novel ChatGPT. Grounded in a typology that views education through the lenses of system, process, and result, the paper navigates the multifaceted applications of AI. From decentralizing global education and personalizing curriculums to digitally documenting competence-based outcomes, AI stands at the forefront of educational modernization. Highlighting ChatGPT's meteoric rise to one million users in just five days, the study underscores its role in democratizing education, fostering autodidacticism, and magnifying student engagement. However, with such transformative power comes the potential for misuse, as text-generation tools can inadvertently challenge academic integrity. By juxtaposing the promise and pitfalls of AI in education, this paper advocates for a harmonized synergy between AI tools and the educational community, emphasizing the urgent need for ethical guidelines, pedagogical adaptations, and strategic collaborations.
    摘要 在数字时代,人工智能(AI)与教育的融合带来了变革性的改变,重新定义了教学方法、课程规划和学生参与方式。这篇综述文章对比了OpenAI的开创性文本生成工具(如Bing Chat、Bard、Ernie)的能力与影响,并着重关注新出现的ChatGPT,深入探讨快速演进的数字教育图景。文章基于将教育视为系统、过程与结果的类型学视角,审视了AI的多方面应用:从全球教育的去中心化、课程的个性化,到以数字方式记录基于能力的学习成果,AI站在教育现代化的前沿。文章强调ChatGPT在短短五天内用户突破一百万,凸显其在教育民主化、促进自主学习和提升学生参与度方面的作用。然而,这种变革性力量也伴随着被滥用的可能,文本生成工具可能在无意间挑战学术诚信。通过对比AI在教育中的前景与隐患,本文倡导AI工具与教育界之间建立协调的合力,并强调制定伦理准则、进行教学适配与开展战略合作的迫切需要。

Dynamic Early Exiting Predictive Coding Neural Networks

  • paper_url: http://arxiv.org/abs/2309.02022
  • repo_url: None
  • paper_authors: Alaa Zniber, Ouassim Karrakchou, Mounir Ghogho
  • for: 这篇论文旨在让深度学习模型在物联网(IoT)智能处理应用中兼顾高效与低功耗。
  • methods: 本研究基于预测编码理论与动态早退技术,构建了一个浅层双向网络,并与VGG-16模型进行比较。
  • results: 在CIFAR-10图像分类上取得了与VGG-16相当的准确率,但参数量与计算复杂度显著更低。
    Abstract Internet of Things (IoT) sensors are nowadays heavily utilized in various real-world applications ranging from wearables to smart buildings passing by agrotechnology and health monitoring. With the huge amounts of data generated by these tiny devices, Deep Learning (DL) models have been extensively used to enhance them with intelligent processing. However, with the urge for smaller and more accurate devices, DL models became too heavy to deploy. It is thus necessary to incorporate the hardware's limited resources in the design process. Therefore, inspired by the human brain known for its efficiency and low power consumption, we propose a shallow bidirectional network based on predictive coding theory and dynamic early exiting for halting further computations when a performance threshold is surpassed. We achieve comparable accuracy to VGG-16 in image classification on CIFAR-10 with fewer parameters and less computational complexity.
    摘要 物联网(IoT)传感器如今被广泛用于从可穿戴设备、智能建筑到农业技术与健康监测等各类实际应用。面对这些微型设备产生的海量数据,深度学习(DL)模型被大量用于为其赋予智能处理能力。然而,随着对更小、更精确设备的需求,DL模型变得过于庞大而难以部署,因此必须在设计阶段就考虑硬件的有限资源。受以高效和低功耗著称的人脑启发,我们提出了一个基于预测编码理论与动态早退机制的浅层双向网络:一旦性能超过设定阈值,即停止后续计算。在CIFAR-10图像分类上,该网络以更少的参数和更低的计算复杂度取得了与VGG-16相当的准确率。
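
A minimal sketch of dynamic early exiting: intermediate classifier heads halt computation once a confidence threshold is passed. The depth, widths, threshold, and batch-of-one inference are illustrative assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, dim=32, n_classes=10, n_stages=3, threshold=0.9):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_stages))
        self.exits = nn.ModuleList(nn.Linear(dim, n_classes) for _ in range(n_stages))
        self.threshold = threshold

    def forward(self, x):
        for i, (stage, exit_head) in enumerate(zip(self.stages, self.exits)):
            x = stage(x)
            probs = torch.softmax(exit_head(x), dim=-1)
            if probs.max() >= self.threshold:     # confident enough: halt here
                return probs, i
        return probs, len(self.stages) - 1        # fell through to the final exit

net = EarlyExitNet()
probs, exit_idx = net(torch.randn(1, 32))         # single-sample inference
print(f"exited at stage {exit_idx}")
```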

iLoRE: Dynamic Graph Representation with Instant Long-term Modeling and Re-occurrence Preservation

  • paper_url: http://arxiv.org/abs/2309.02012
  • repo_url: None
  • paper_authors: Siwei Zhang, Yun Xiong, Yao Zhang, Xixi Wu, Yiheng Sun, Jiawei Zhang
  • for: 这篇论文旨在提出一种新的动态图建模方法,克服现有方法的三个关键限制,提升其可扩展性与适用范围。
  • methods: 论文提出的模型包含自适应短期更新器与长期更新器:前者能自动丢弃无用或含噪的边;后者基于Transformer并配备恒等注意力(Identity Attention)机制,以更有效地进行节点级长期建模;此外还用图模块对关键的重复出现模式进行编码。
  • results: 在真实数据集上的实验结果表明,iLoRE 能有效建模动态图,并取得更优的性能。
    Abstract Continuous-time dynamic graph modeling is a crucial task for many real-world applications, such as financial risk management and fraud detection. Though existing dynamic graph modeling methods have achieved satisfactory results, they still suffer from three key limitations, hindering their scalability and further applicability. i) Indiscriminate updating. For incoming edges, existing methods would indiscriminately deal with them, which may lead to more time consumption and unexpected noisy information. ii) Ineffective node-wise long-term modeling. They heavily rely on recurrent neural networks (RNNs) as a backbone, which has been demonstrated to be incapable of fully capturing node-wise long-term dependencies in event sequences. iii) Neglect of re-occurrence patterns. Dynamic graphs involve the repeated occurrence of neighbors that indicates their importance, which is disappointedly neglected by existing methods. In this paper, we present iLoRE, a novel dynamic graph modeling method with instant node-wise Long-term modeling and Re-occurrence preservation. To overcome the indiscriminate updating issue, we introduce the Adaptive Short-term Updater module that will automatically discard the useless or noisy edges, ensuring iLoRE's effectiveness and instant ability. We further propose the Long-term Updater to realize more effective node-wise long-term modeling, where we innovatively propose the Identity Attention mechanism to empower a Transformer-based updater, bypassing the limited effectiveness of typical RNN-dominated designs. Finally, the crucial re-occurrence patterns are also encoded into a graph module for informative representation learning, which will further improve the expressiveness of our method. Our experimental results on real-world datasets demonstrate the effectiveness of our iLoRE for dynamic graph modeling.
    摘要 连续时间动态图建模是许多实际应用中的关键任务,如金融风险管理和欺诈检测。虽然现有的动态图建模方法已经取得了一定的成果,但它们仍受三个关键限制,阻碍其可扩展性和进一步应用。i) 随机更新:现有方法会不加区分地处理新到达的边,这可能带来更多的时间开销和意外的噪声信息。ii) 节点级长期建模不足:它们依赖循环神经网络(RNN)作为骨干,而RNN已被证明无法充分捕捉事件序列中节点级的长期依赖。iii) 忽略重复出现模式:动态图中邻居的重复出现暗示了其重要性,却被现有方法遗憾地忽略。在这篇论文中,我们提出了 iLoRE,一种具备即时节点级长期建模与重复出现模式保留能力的新动态图建模方法。为了解决随机更新问题,我们引入了自适应短期更新模块,自动丢弃无用或含噪的边,保证 iLoRE 的有效性和即时能力。我们进一步提出了长期更新器,以实现更有效的节点级长期建模;其中我们创新性地提出了恒等注意力机制,为基于Transformer的更新器赋能,突破了典型RNN主导设计的有限效果。最后,关键的重复出现模式也被编码进图模块,用于信息丰富的表示学习,这将进一步提升方法的表达能力。在真实数据集上的实验结果验证了 iLoRE 在动态图建模上的有效性。

Belief revision and incongruity: is it a joke?

  • paper_url: http://arxiv.org/abs/2309.02009
  • repo_url: None
  • paper_authors: Florence Dupin de Saint Cyr - Bannay, Henri Prade
  • for: 本文尝试对一种智能行为进行形式化:刻画智能代理在听笑话时的行为。
  • methods: 本文使用信念修订、意外(出乎意料)以及规范违反等概念来形式化这种智能行为。
  • results: 本文的研究结果表明,在听笑话时,智能代理可以通过信念修订和意外产生幽默感受,并可通过规范违反增强幽默效果。
    Abstract Incongruity often makes people laugh. You have to be smart to say stupid things. It requires to be even smarter for understanding them. This paper is a shameless attempt to formalize this intelligent behavior in the case of an agent listening to a joke. All this is a matter of revision of beliefs, surprise and violation of norms.
    摘要 不协调常常令人发笑。说傻话需要聪明,而理解傻话则需要更聪明。本文毫不避讳地尝试将智能代理听笑话时的这种智慧行为形式化。这一切都关乎信念修订、意外以及对规范的违反。

Aggregating Correlated Estimations with (Almost) no Training

  • paper_url: http://arxiv.org/abs/2309.02005
  • repo_url: None
  • paper_authors: Theo Delemazure, François Durand, Fabien Mathieu
  • for: 本研究旨在提出一些考虑关联错误的汇集规则,以解决许多决策问题无法得到精确解决方案。
  • methods: 本文提出了一些考虑关联错误的汇集规则,并对它们进行了多种实验,以评估它们在不同的数据集上的性能。
  • results: 研究结果表明,当已知误差之间的相关性信息时,应首选最大似然汇集方法;否则,通常在训练数据有限的情况下,我们建议使用嵌入式投票方法(Embedded Voting, EV)。
    Abstract Many decision problems cannot be solved exactly and use several estimation algorithms that assign scores to the different available options. The estimation errors can have various correlations, from low (e.g. between two very different approaches) to high (e.g. when using a given algorithm with different hyperparameters). Most aggregation rules would suffer from this diversity of correlations. In this article, we propose different aggregation rules that take correlations into account, and we compare them to naive rules in various experiments based on synthetic data. Our results show that when sufficient information is known about the correlations between errors, a maximum likelihood aggregation should be preferred. Otherwise, typically with limited training data, we recommend a method that we call Embedded Voting (EV).
    摘要 许多决策问题无法精确求解,需要借助多种估计算法为不同可选项打分。估计误差之间可能存在各种相关性,从低(例如两种截然不同的方法之间)到高(例如同一算法使用不同超参数时)。大多数汇集规则都会受这种相关性多样性的影响。在本文中,我们提出了多种考虑相关性的汇集规则,并在基于合成数据的多组实验中将它们与朴素规则进行比较。我们的结果表明,当误差之间的相关性信息足够充分时,应首选最大似然汇集;否则(通常是在训练数据有限的情况下),我们推荐一种我们称为嵌入式投票(Embedded Voting, EV)的方法。
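
For the maximum-likelihood aggregation the abstract recommends when error correlations are known, the following is a minimal numpy sketch under a standard assumption the paper may refine: each estimator is unbiased with jointly Gaussian errors of known covariance, in which case the ML aggregate is the generalized-least-squares weighted mean.

```python
import numpy as np

def ml_aggregate(estimates, cov):
    """Maximum-likelihood aggregation of correlated unbiased estimates.

    Assumes estimates[i] = theta + eps_i with eps ~ N(0, cov).
    The ML (generalized least squares) estimate is
        theta_hat = (1' C^-1 y) / (1' C^-1 1),
    which reduces to the plain average when cov is isotropic.
    """
    y = np.asarray(estimates, dtype=float)
    ones = np.ones_like(y)
    w = np.linalg.solve(cov, ones)           # C^-1 1
    return float(w @ y) / float(w @ ones)    # correlation-aware weighted mean

# Example: two highly correlated estimators plus one independent one;
# the independent estimator receives proportionally more weight.
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
print(ml_aggregate([2.1, 2.0, 1.4], cov))
```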

Analyzing domain shift when using additional data for the MICCAI KiTS23 Challenge

  • paper_url: http://arxiv.org/abs/2309.02001
  • repo_url: None
  • paper_authors: George Stoica, Mihaela Breaban, Vlad Barbu
  • for: 提高医学影像三维分割的结果,尤其是在训练材料稀缺的情况下。
  • methods: 使用直方图匹配(histogram matching)来缓解领域偏移,以便将新数据与原始训练数据一起用于预处理和训练。
  • results: 结果表明,对额外数据应用直方图匹配,效果优于简单归一化。
    Abstract Using additional training data is known to improve the results, especially for medical image 3D segmentation where there is a lack of training material and the model needs to generalize well from few available data. However, the new data could have been acquired using other instruments and preprocessed such its distribution is significantly different from the original training data. Therefore, we study techniques which ameliorate domain shift during training so that the additional data becomes better usable for preprocessing and training together with the original data. Our results show that transforming the additional data using histogram matching has better results than using simple normalization.
    摘要 众所周知,使用额外训练数据可以改善结果,尤其是在医学图像三维分割中:训练材料匮乏,模型需要从少量可用数据中获得良好的泛化能力。然而,新数据可能由其他仪器采集并经过不同的预处理,其分布可能与原始训练数据差异显著。因此,我们研究在训练过程中缓解领域偏移的技术,使额外数据能够更好地与原始数据一起进行预处理和训练。我们的结果表明,使用直方图匹配变换额外数据的效果优于简单归一化。
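
As a concrete illustration of the histogram-matching step the abstract describes, here is a minimal sketch using scikit-image; the actual KiTS23 pipeline, file formats, and preprocessing details are not specified in the abstract and are assumed here.

```python
import numpy as np
from skimage.exposure import match_histograms

def harmonize_volume(extra_volume: np.ndarray, reference_volume: np.ndarray) -> np.ndarray:
    """Map the intensity distribution of an additional 3D volume onto a
    reference volume from the original training set (both grayscale)."""
    return match_histograms(extra_volume, reference_volume)

# Toy example with random "volumes"; real use would load the CT scans.
rng = np.random.default_rng(0)
extra = rng.normal(100, 30, size=(8, 64, 64))   # additional-data volume
ref = rng.normal(40, 10, size=(8, 64, 64))      # original-training volume
matched = harmonize_volume(extra, ref)
print(matched.mean(), ref.mean())  # matched stats now track the reference
```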

Photonic Structures Optimization Using Highly Data-Efficient Deep Learning: Application To Nanofin And Annular Groove Phase Masks

  • paper_url: http://arxiv.org/abs/2309.01995
  • repo_url: https://github.com/kaeryv/acsphot23suppl
  • paper_authors: Nicolas Roy, Lorenzo König, Olivier Absil, Charlotte Beauthier, Alexandre Mayer, Michaël Lobet
  • for: 为超表面(metasurface)引入一个代理优化框架,特别针对天文高对比度成像中的光学属性操控。
  • methods: 使用偏最小二乘克里金、径向基函数和神经网络等计算智能技术优化涡旋相位掩模(VPM)的几何特征;由于这些代理模型不足以刻画 VPM 的性能,转而提出以深度神经网络为高精度、高效率代理模型的数据高效进化优化方案。
  • results: 针对两种候选设计得到了最优方案,代理模型提升了流程的可靠性与效率;在最复杂的情形下,进化优化使原本(因所需仿真过多而)不可行的设计优化成为可能。与传统优化技术相比,代理模型最多可减少 75% 的仿真次数。
    Abstract Metasurfaces offer a flexible framework for the manipulation of light properties in the realm of thin film optics. Specifically, the polarization of light can be effectively controlled through the use of thin phase plates. This study aims to introduce a surrogate optimization framework for these devices. The framework is applied to develop two kinds of vortex phase masks (VPMs) tailored for application in astronomical high-contrast imaging. Computational intelligence techniques are exploited to optimize the geometric features of these devices. The large design space and computational limitations necessitate the use of surrogate models like partial least squares Kriging, radial basis functions, or neural networks. However, we demonstrate the inadequacy of these methods in modeling the performance of VPMs. To address the shortcomings of these methods, a data-efficient evolutionary optimization setup using a deep neural network as a highly accurate and efficient surrogate model is proposed. The optimization process in this study employs a robust particle swarm evolutionary optimization scheme, which operates on explicit geometric parameters of the photonic device. Through this approach, optimal designs are developed for two design candidates. In the most complex case, evolutionary optimization enables optimization of the design that would otherwise be impractical (requiring too much simulations). In both cases, the surrogate model improves the reliability and efficiency of the procedure, effectively reducing the required number of simulations by up to 75% compared to conventional optimization techniques.
    摘要 超表面(metasurface)为薄膜光学中的光属性操控提供了灵活框架。特别地,利用薄相位板可以有效控制光的偏振。本研究旨在为这类器件引入一个代理优化框架,并将其应用于开发两种面向天文高对比度成像的涡旋相位掩模(VPM)。我们利用计算智能技术优化这些器件的几何特征。由于设计空间巨大且计算受限,需要使用偏最小二乘克里金、径向基函数或神经网络等代理模型。然而,我们证明这些方法不足以刻画 VPM 的性能。为弥补其缺陷,我们提出一种数据高效的进化优化方案,以深度神经网络作为高精度、高效率的代理模型。优化过程采用稳健的粒子群进化优化方法,直接作用于光子器件的显式几何参数。通过这一途径,我们为两种候选设计得到了最优方案。在最复杂的情形下,进化优化使原本不可行(需要过多仿真)的设计优化成为可能。在两种情形下,代理模型都提升了流程的可靠性与效率,与传统优化技术相比,最多减少了 75% 的仿真次数。

SaSDim: Self-Adaptive Noise Scaling Diffusion Model for Spatial Time Series Imputation

  • paper_url: http://arxiv.org/abs/2309.01988
  • repo_url: None
  • paper_authors: Shunyang Zhang, Senzhang Wang, Xianzhen Tan, Ruochen Liu, Jian Zhang, Jianxin Wang
  • for: spatial time series imputation
  • methods: self-adaptive noise scaling diffusion model (SaSDim) with a new loss function and across spatial-temporal global convolution module
  • results: effective imputation performance on three real-world datasets, with comparison to current state-of-the-art baselines
    Abstract Spatial time series imputation is critically important to many real applications such as intelligent transportation and air quality monitoring. Although recent transformer and diffusion model based approaches have achieved significant performance gains compared with conventional statistic based methods, spatial time series imputation still remains as a challenging issue due to the complex spatio-temporal dependencies and the noise uncertainty of the spatial time series data. Especially, recent diffusion process based models may introduce random noise to the imputations, and thus cause negative impact on the model performance. To this end, we propose a self-adaptive noise scaling diffusion model named SaSDim to more effectively perform spatial time series imputation. Specially, we propose a new loss function that can scale the noise to the similar intensity, and propose the across spatial-temporal global convolution module to more effectively capture the dynamic spatial-temporal dependencies. Extensive experiments conducted on three real world datasets verify the effectiveness of SaSDim by comparison with current state-of-the-art baselines.
    摘要 空间时间序列插补对智能交通、空气质量监测等许多实际应用至关重要。尽管近来基于 Transformer 和扩散模型的方法相比传统统计方法取得了显著的性能提升,但由于空间时间序列数据中复杂的时空依赖和噪声不确定性,空间时间序列插补仍然是一个具有挑战性的问题。特别地,近期基于扩散过程的模型可能会在插补结果中引入随机噪声,从而对模型性能造成负面影响。为此,我们提出一种名为 SaSDim 的自适应噪声缩放扩散模型,以更有效地完成空间时间序列插补。具体而言,我们提出了一个可将噪声缩放到相近强度的新损失函数,并提出跨时空全局卷积模块,以更有效地捕捉动态的时空依赖。在三个真实数据集上的大量实验通过与当前最先进基线的比较,验证了 SaSDim 的有效性。

Graph-Based Interaction-Aware Multimodal 2D Vehicle Trajectory Prediction using Diffusion Graph Convolutional Networks

  • paper_url: http://arxiv.org/abs/2309.01981
  • repo_url: None
  • paper_authors: Keshu Wu, Yang Zhou, Haotian Shi, Xiaopeng Li, Bin Ran
  • for: 预测汽车轨迹,以提高自动化汽车运行效率和安全,特别在拥堵多车道高速公路上。
  • methods: 使用 Graph-based Interaction-aware Multi-modal Trajectory Prediction (GIMTP) 框架,利用图表示汽车的动态互动关系,并通过Diffusion Graph Convolutional Network (DGCN) 捕捉空间和时间两种依赖关系。
  • results: 提供两维(纵向与横向驾驶行为)的预测结果及相应的概率分布,以便更好地预测车辆的未来行为。
    Abstract Predicting vehicle trajectories is crucial for ensuring automated vehicle operation efficiency and safety, particularly on congested multi-lane highways. In such dynamic environments, a vehicle's motion is determined by its historical behaviors as well as interactions with surrounding vehicles. These intricate interactions arise from unpredictable motion patterns, leading to a wide range of driving behaviors that warrant in-depth investigation. This study presents the Graph-based Interaction-aware Multi-modal Trajectory Prediction (GIMTP) framework, designed to probabilistically predict future vehicle trajectories by effectively capturing these interactions. Within this framework, vehicles' motions are conceptualized as nodes in a time-varying graph, and the traffic interactions are represented by a dynamic adjacency matrix. To holistically capture both spatial and temporal dependencies embedded in this dynamic adjacency matrix, the methodology incorporates the Diffusion Graph Convolutional Network (DGCN), thereby providing a graph embedding of both historical states and future states. Furthermore, we employ a driving intention-specific feature fusion, enabling the adaptive integration of historical and future embeddings for enhanced intention recognition and trajectory prediction. This model gives two-dimensional predictions for each mode of longitudinal and lateral driving behaviors and offers probabilistic future paths with corresponding probabilities, addressing the challenges of complex vehicle interactions and multi-modality of driving behaviors. Validation using real-world trajectory datasets demonstrates the efficiency and potential.
    摘要 预测车辆轨迹对保障自动驾驶的运行效率与安全至关重要,尤其是在拥堵的多车道高速公路上。在这种动态环境中,车辆的运动由其历史行为以及与周围车辆的交互共同决定。这些复杂的交互源于难以预测的运动模式,导致驾驶行为高度多样,值得深入研究。本研究提出基于图的交互感知多模态轨迹预测(GIMTP)框架,通过有效捕捉这些交互来以概率方式预测车辆未来轨迹。在该框架中,车辆运动被建模为时变图中的节点,交通交互则由动态邻接矩阵表示。为全面捕捉该动态邻接矩阵中蕴含的空间与时间依赖,方法引入扩散图卷积网络(DGCN),从而同时得到历史状态与未来状态的图嵌入。此外,我们采用面向驾驶意图的特征融合,自适应地整合历史与未来嵌入,以增强意图识别与轨迹预测。该模型对纵向与横向驾驶行为的每种模态给出二维预测,并输出带有相应概率的未来路径,从而应对复杂的车辆交互与驾驶行为多模态性带来的挑战。基于真实轨迹数据集的验证展示了其效率与潜力。
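
The abstract does not spell out the DGCN formulation; the sketch below follows the standard diffusion-convolution layer of Li et al. (DCRNN), which the paper's module presumably builds on — treat the exact parameterization as an assumption.

```python
import torch
import torch.nn as nn

class DiffusionGraphConv(nn.Module):
    """K-step diffusion convolution over a (possibly time-varying) adjacency.

    Computes sum_{k=0..K} (D^-1 A)^k X W_k, i.e. features aggregated over
    k-hop random-walk diffusions, one weight matrix per diffusion step.
    """
    def __init__(self, in_dim: int, out_dim: int, K: int = 2):
        super().__init__()
        self.K = K
        self.weights = nn.ModuleList(nn.Linear(in_dim, out_dim, bias=False)
                                     for _ in range(K + 1))

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) vehicle features; adj: (N, N) interaction strengths
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
        P = adj / deg                       # random-walk transition matrix
        out, h = self.weights[0](x), x
        for k in range(1, self.K + 1):
            h = P @ h                       # one more diffusion step
            out = out + self.weights[k](h)
        return torch.relu(out)

layer = DiffusionGraphConv(16, 32, K=2)
x, adj = torch.randn(10, 16), torch.rand(10, 10)
print(layer(x, adj).shape)  # torch.Size([10, 32])
```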

Linear Regression using Heterogeneous Data Batches

  • paper_url: http://arxiv.org/abs/2309.01973
  • repo_url: None
  • paper_authors: Ayush Jain, Rajat Sen, Weihao Kong, Abhimanyu Das, Alon Orlitsky
  • for: The paper addresses the problem of learning input-output relationships from multiple sources with insufficient data.
  • methods: The paper proposes a novel gradient-based algorithm that improves upon existing results by allowing different, unknown, and heavy-tailed input distributions for each subgroup, recovering all subgroups with a significant proportion of batches, and removing the separation requirement between regression vectors.
  • results: The proposed algorithm extends the applicability of existing results, allowing smaller batch sizes and reducing the number of batches needed for accurate regression.
    Abstract In many learning applications, data are collected from multiple sources, each providing a \emph{batch} of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations where the output is a noisy linear combination of the inputs, and there are $k$ subgroups, each with its own regression vector. Prior work~\cite{kong2020meta} showed that with abundant small-batches, the regression vectors can be learned with only few, $\tilde\Omega( k^{3/2})$, batches of medium-size with $\tilde\Omega(\sqrt k)$ samples each. However, the paper requires that the input distribution for all $k$ subgroups be isotropic Gaussian, and states that removing this assumption is an ``interesting and challenging problem". We propose a novel gradient-based algorithm that improves on the existing results in several ways. It extends the applicability of the algorithm by: (1) allowing the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) recovering all subgroups followed by a significant proportion of batches even for infinite $k$; (3) removing the separation requirement between the regression vectors; (4) reducing the number of batches and allowing smaller batch sizes.
    摘要 在许多学习应用中,数据来自多个来源,每个来源提供一个小批样本,仅凭该批样本不足以学习其输入-输出关系。一种常见做法假设这些来源属于若干未知子群之一,每个子群有未知的输入分布和输入-输出关系。我们考虑这一设定中最基本也最重要的情形之一:输出是输入的带噪线性组合,且存在 $k$ 个子群,每个子群有自己的回归向量。先前的工作 (Kong et al., 2020) 表明,在拥有大量小批数据的情况下,只需少量($\tilde\Omega(k^{3/2})$ 个)中等规模、每批含 $\tilde\Omega(\sqrt k)$ 个样本的批次即可学习这些回归向量。然而,该工作要求所有 $k$ 个子群的输入分布均为各向同性高斯分布,并称去除这一假设是一个"有趣且具有挑战性的问题"。我们提出一种新的基于梯度的算法,在以下几方面改进了现有结果:1. 允许各子群的底层输入分布不同、未知且为重尾分布;2. 即使 $k$ 为无穷大,也能恢复所有占据相当比例批次的子群;3. 去除回归向量之间的分离性要求;4. 减少所需批次数量,并允许更小的批次规模。
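
The paper's gradient-based algorithm comes with recovery guarantees that a simple heuristic cannot match; purely to make the problem setting concrete, here is a naive alternating-minimization baseline for batch-based mixed linear regression (not the authors' method).

```python
import numpy as np

def fit_mixed_linear(batches, k, iters=50, seed=0):
    """Alternating minimization for k-subgroup linear regression from batches.

    batches: list of (X_b, y_b) pairs, each a small batch from one subgroup.
    Returns k regression vectors. Illustration only: the paper's method
    handles heavy tails and comes with guarantees this heuristic lacks.
    """
    rng = np.random.default_rng(seed)
    d = batches[0][0].shape[1]
    W = rng.normal(size=(k, d))
    for _ in range(iters):
        # Assign each batch to the regression vector that best explains it.
        assign = [int(np.argmin([np.mean((y - X @ w) ** 2) for w in W]))
                  for X, y in batches]
        # Refit each vector on its assigned batches (least squares).
        for j in range(k):
            Xs = [X for (X, y), a in zip(batches, assign) if a == j]
            ys = [y for (X, y), a in zip(batches, assign) if a == j]
            if Xs:
                W[j] = np.linalg.lstsq(np.vstack(Xs), np.concatenate(ys),
                                       rcond=None)[0]
    return W

# Toy demo: two subgroups with regression vectors +w and -w.
rng = np.random.default_rng(1)
w = np.array([1.0, -2.0, 0.5])
batches = [(X, X @ (s * w))
           for X, s in ((rng.normal(size=(8, 3)), rng.choice([-1, 1]))
                        for _ in range(40))]
print(fit_mixed_linear(batches, k=2))
```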

A Survey on Interpretable Cross-modal Reasoning

  • paper_url: http://arxiv.org/abs/2309.01955
  • repo_url: https://github.com/ZuyiZhou/Awesome-Interpretable-Cross-modal-Reasoning
  • paper_authors: Dizhan Xue, Shengsheng Qian, Zuyi Zhou, Changsheng Xu
  • for: 本文旨在探讨可解释的跨模态推理(I-CMR),其目标不仅是实现高预测性能,还要提供人类可理解的解释结果。
  • methods: 本文使用三级分类法概述I-CMR的典型方法,并对现有的CMR数据集进行了解释注释。
  • results: 本文总结了I-CMR的挑战和未来发展方向,并提供了一个包含相关方法、数据集和资源的GitHub项目。
    Abstract In recent years, cross-modal reasoning (CMR), the process of understanding and reasoning across different modalities, has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics. As the deployment of AI systems becomes more ubiquitous, the demand for transparency and comprehensibility in these systems' decision-making processes has intensified. This survey delves into the realm of interpretable cross-modal reasoning (I-CMR), where the objective is not only to achieve high predictive performance but also to provide human-understandable explanations for the results. This survey presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR. Furthermore, this survey reviews the existing CMR datasets with annotations for explanations. Finally, this survey summarizes the challenges for I-CMR and discusses potential future directions. In conclusion, this survey aims to catalyze the progress of this emerging research area by providing researchers with a panoramic and comprehensive perspective, illuminating the state of the art and discerning the opportunities. The summarized methods, datasets, and other resources are available at https://github.com/ZuyiZhou/Awesome-Interpretable-Cross-modal-Reasoning.
    摘要 近年来,跨模态推理(CMR)——即跨不同模态进行理解与推理的过程——已成为一个关键领域,其应用涵盖从多媒体分析到医疗诊断。随着 AI 系统的部署日益普及,人们对这些系统决策过程透明性与可理解性的需求也愈发强烈。本综述深入探讨可解释跨模态推理(I-CMR)领域,其目标不仅是取得高预测性能,还要为结果提供人类可理解的解释。本综述以三级分类法对 I-CMR 的典型方法进行全面概述,并回顾了现有带解释标注的 CMR 数据集。最后,本综述总结了 I-CMR 面临的挑战,并讨论了潜在的未来方向。总之,本综述旨在为研究者提供全景式的综合视角,呈现当前最先进水平并辨明机遇,从而推动这一新兴研究领域的进展。所总结的方法、数据集及其他资源可在 https://github.com/ZuyiZhou/Awesome-Interpretable-Cross-modal-Reasoning 获取。

RADIO: Reference-Agnostic Dubbing Video Synthesis

  • paper_url: http://arxiv.org/abs/2309.01950
  • repo_url: None
  • paper_authors: Dongyeun Lee, Chaewon Kim, Sangjoon Yu, Jaejun Yoo, Gyeong-Moon Park
  • for: 音频驱动的说话人头像生成中最大的挑战之一,是在保证精确同步的同时实现高保真细节。
  • methods: 我们提出 RADIO 框架,利用由音频和参考特征组成的潜在空间来调制解码器各层,从而生成高质量配音视频;此外,我们还在解码器中引入 ViT 模块,以强调高保真细节,尤其是唇部区域。
  • results: 我们的实验结果表明,RADIO 能够保持高度同步,同时不失高精度。特别在参考帧与实际帧有很大差异时,我们的方法表现出了更高的稳定性和可靠性。
    Abstract One of the most challenging problems in audio-driven talking head generation is achieving high-fidelity detail while ensuring precise synchronization. Given only a single reference image, extracting meaningful identity attributes becomes even more challenging, often causing the network to mirror the facial and lip structures too closely. To address these issues, we introduce RADIO, a framework engineered to yield high-quality dubbed videos regardless of the pose or expression in reference images. The key is to modulate the decoder layers using latent space composed of audio and reference features. Additionally, we incorporate ViT blocks into the decoder to emphasize high-fidelity details, especially in the lip region. Our experimental results demonstrate that RADIO displays high synchronization without the loss of fidelity. Especially in harsh scenarios where the reference frame deviates significantly from the ground truth, our method outperforms state-of-the-art methods, highlighting its robustness. Pre-trained model and codes will be made public after the review.
    摘要 音频驱动的说话人头像生成中最具挑战性的问题之一,是在保证精确同步的同时实现高保真细节。在仅有单张参考图像的情况下,提取有意义的身份属性变得更加困难,网络往往会过于贴近参考图像的脸部与唇部结构。为解决这些问题,我们提出 RADIO 框架,无论参考图像的姿态或表情如何,都能生成高质量的配音视频。其关键在于利用由音频与参考特征构成的潜在空间来调制解码器各层。此外,我们在解码器中引入 ViT 模块,以强调高保真细节,尤其是唇部区域。实验结果表明,RADIO 在不损失保真度的前提下保持了高度同步;在参考帧与真实情况差异很大的严苛场景中,我们的方法优于当前最先进方法,凸显其鲁棒性。预训练模型与代码将在评审后公开。

OHQ: On-chip Hardware-aware Quantization

  • paper_url: http://arxiv.org/abs/2309.01945
  • repo_url: None
  • paper_authors: Wei Huang, Haotong Qin, Yangdong Liu, Jingzhuo Liang, Yifu Ding, Ying Li, Xianglong Liu
  • for: 提出一个在芯片上进行硬件感知量化的框架,以便在资源受限的硬件上部署先进的深度模型。
  • methods: 构建芯片上量化感知(OQA)流水线以感知量化算子在硬件上的实际效率指标,并提出掩码引导的量化估计(MQE)技术,在芯片级算力约束下高效估计算子的精度指标,最后通过线性规划求得最优位宽配置。
  • results: 该框架完全在芯片上完成量化,无需额外计算设备与数据访问;ResNet-18 与 MobileNetV3 量化后分别达到 70% 与 73% 的准确率,部署延迟相比 INT8 降低 15~30%。
    Abstract Quantization emerges as one of the most promising approaches for deploying advanced deep models on resource-constrained hardware. Mixed-precision quantization leverages multiple bit-width architectures to unleash the accuracy and efficiency potential of quantized models. However, existing mixed-precision quantization suffers exhaustive search space that causes immense computational overhead. The quantization process thus relies on separate high-performance devices rather than locally, which also leads to a significant gap between the considered hardware metrics and the real deployment.In this paper, we propose an On-chip Hardware-aware Quantization (OHQ) framework that performs hardware-aware mixed-precision quantization without accessing online devices. First, we construct the On-chip Quantization Awareness (OQA) pipeline, enabling perceive the actual efficiency metrics of the quantization operator on the hardware.Second, we propose Mask-guided Quantization Estimation (MQE) technique to efficiently estimate the accuracy metrics of operators under the constraints of on-chip-level computing power.By synthesizing network and hardware insights through linear programming, we obtain optimized bit-width configurations. Notably, the quantization process occurs on-chip entirely without any additional computing devices and data access. We demonstrate accelerated inference after quantization for various architectures and compression ratios, achieving 70% and 73% accuracy for ResNet-18 and MobileNetV3, respectively. OHQ improves latency by 15~30% compared to INT8 on deployment.
    摘要 量化是将先进深度模型部署到资源受限硬件上最有前景的方法之一。混合精度量化利用多种位宽架构来释放量化模型在精度与效率上的潜力。然而,现有的混合精度量化面临穷举式的搜索空间,带来巨大的计算开销,量化过程因此依赖独立的高性能设备而非本地完成,这也导致所考虑的硬件指标与真实部署之间存在显著差距。本文提出一种芯片上硬件感知量化(OHQ)框架,无需访问在线设备即可完成硬件感知的混合精度量化。首先,我们构建芯片上量化感知(OQA)流水线,使量化算子在硬件上的实际效率指标可被感知。其次,我们提出掩码引导的量化估计(MQE)技术,在芯片级算力约束下高效估计算子的精度指标。通过线性规划融合网络与硬件两方面的信息,我们得到最优的位宽配置。值得注意的是,整个量化过程完全在芯片上完成,无需任何额外计算设备与数据访问。我们在多种架构与压缩比下展示了量化后的推理加速,ResNet-18 与 MobileNetV3 分别达到 70% 与 73% 的准确率。与 INT8 相比,OHQ 将部署延迟降低 15~30%。
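
OHQ formulates bit-width selection as a linear program over on-chip accuracy and efficiency estimates; as a simplified stand-in, the sketch below solves a greedy knapsack-style version of that allocation with hypothetical per-layer sensitivity numbers.

```python
def allocate_bitwidths(sizes, sensitivity, budget_bits):
    """Greedy mixed-precision allocation (illustrative, not OHQ itself).

    sizes[l]: parameter count of layer l.
    sensitivity[l][b]: estimated accuracy drop of layer l at bit-width b
                       (here b in {4, 8}; lower bits -> larger drop).
    budget_bits: total bit budget for all weights.
    Start all layers at 8 bits, then demote the layers whose accuracy
    cost per saved bit is smallest until the budget is met.
    """
    bits = {l: 8 for l in sizes}
    total = sum(sizes[l] * 8 for l in sizes)

    def demote_cost(l):  # accuracy cost per saved bit for an 8 -> 4 demotion
        return (sensitivity[l][4] - sensitivity[l][8]) / (sizes[l] * 4)

    for l in sorted(sizes, key=demote_cost):
        if total <= budget_bits:
            break
        bits[l] = 4
        total -= sizes[l] * 4
    return bits

# Hypothetical layer sizes and on-chip sensitivity estimates.
sizes = {"conv1": 5_000, "conv2": 20_000, "fc": 100_000}
sens = {"conv1": {8: 0.01, 4: 0.30},
        "conv2": {8: 0.02, 4: 0.10},
        "fc":    {8: 0.01, 4: 0.05}}
print(allocate_bitwidths(sizes, sens, budget_bits=600_000))
```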

Quantum-AI empowered Intelligent Surveillance: Advancing Public Safety Through Innovative Contraband Detection

  • paper_url: http://arxiv.org/abs/2309.03231
  • repo_url: None
  • paper_authors: Syed Atif Ali Shah, Nasir Algeelani, Najeeb Al-Sammarraie
  • for: 这个研究旨在发展一个基于量子人工智能的实时类型识别系统,以解决现有的实时识别过程中的速度问题。
  • methods: 本研究使用了Quantum CNN的技术,实现了实时类型识别的高精度和高速度。
  • results: Quantum-RetinaNet模型在实验中表现出色,能够实现高精度和高速度的实时类型识别,提供了一个可行的解决方案 для实时识别过程中的速度问题。
    Abstract Surveillance systems have emerged as crucial elements in upholding peace and security in the modern world. Their ubiquity aids in monitoring suspicious activities effectively. However, in densely populated environments, continuous active monitoring becomes impractical, necessitating the development of intelligent surveillance systems. AI integration in the surveillance domain was a big revolution, however, speed issues have prevented its widespread implementation in the field. It has been observed that quantum artificial intelligence has led to a great breakthrough. Quantum artificial intelligence-based surveillance systems have shown to be more accurate as well as capable of performing well in real-time scenarios, which had never been seen before. In this research, a RetinaNet model is integrated with Quantum CNN and termed as Quantum-RetinaNet. By harnessing the Quantum capabilities of QCNN, Quantum-RetinaNet strikes a balance between accuracy and speed. This innovative integration positions it as a game-changer, addressing the challenges of active monitoring in densely populated scenarios. As demand for efficient surveillance solutions continues to grow, Quantum-RetinaNet offers a compelling alternative to existing CNN models, upholding accuracy standards without sacrificing real-time performance. The unique attributes of Quantum-RetinaNet have far-reaching implications for the future of intelligent surveillance. With its enhanced processing speed, it is poised to revolutionize the field, catering to the pressing need for rapid yet precise monitoring. As Quantum-RetinaNet becomes the new standard, it ensures public safety and security while pushing the boundaries of AI in surveillance.
    摘要 监控系统已成为现代世界维护和平与安全的关键元素,其普遍部署有助于有效监控可疑活动。然而,在人口密集的环境中,持续的主动监控并不现实,因而需要发展智能监控系统。人工智能与监控领域的结合是一场重大变革,但速度问题阻碍了其在该领域的广泛落地。研究表明,量子人工智能带来了重大突破:基于量子人工智能的监控系统不仅更加精确,而且能够在实时场景中表现出色,这是前所未有的。本研究将 RetinaNet 模型与量子卷积神经网络(QCNN)结合,称为 Quantum-RetinaNet。借助 QCNN 的量子能力,Quantum-RetinaNet 在精度与速度之间取得平衡。这一创新组合使其成为变革者,可应对人口密集场景中主动监控的挑战。随着对高效监控方案的需求不断增长,Quantum-RetinaNet 为现有 CNN 模型提供了有力替代,在不牺牲实时性能的前提下保持精度标准。Quantum-RetinaNet 的独特特性对智能监控的未来影响深远:凭借更快的处理速度,它有望革新该领域,满足对快速而精准监控的迫切需求。随着 Quantum-RetinaNet 成为新标准,它将在保障公共安全的同时,不断拓展 AI 在监控领域的边界。

Dynamic Brain Transformer with Multi-level Attention for Functional Brain Network Analysis

  • paper_url: http://arxiv.org/abs/2309.01941
  • repo_url: https://github.com/Wayfear/Dynamic-Brain-Transformer
  • paper_authors: Xuan Kan, Antonio Aodong Chen Gu, Hejie Cui, Ying Guo, Carl Yang
  • for: 这篇论文的目的是提出一种新的方法,以便更好地分析大脑功能。
  • methods: 这篇论文使用了Dynamic bRAin Transformer(DART)方法,融合静止大脑网络和动态大脑网络,以提高大脑功能分析的精度和多元性。
  • results: 这篇论文的结果显示,DART 方法能更有效地预测临床结果并对个体进行分类,同时能提供更多可解释性信息,例如哪些脑回路或动态网络对最终预测贡献更大。
    Abstract Recent neuroimaging studies have highlighted the importance of network-centric brain analysis, particularly with functional magnetic resonance imaging. The emergence of Deep Neural Networks has fostered a substantial interest in predicting clinical outcomes and categorizing individuals based on brain networks. However, the conventional approach involving static brain network analysis offers limited potential in capturing the dynamism of brain function. Although recent studies have attempted to harness dynamic brain networks, their high dimensionality and complexity present substantial challenges. This paper proposes a novel methodology, Dynamic bRAin Transformer (DART), which combines static and dynamic brain networks for more effective and nuanced brain function analysis. Our model uses the static brain network as a baseline, integrating dynamic brain networks to enhance performance against traditional methods. We innovatively employ attention mechanisms, enhancing model explainability and exploiting the dynamic brain network's temporal variations. The proposed approach offers a robust solution to the low signal-to-noise ratio of blood-oxygen-level-dependent signals, a recurring issue in direct DNN modeling. It also provides valuable insights into which brain circuits or dynamic networks contribute more to final predictions. As such, DART shows a promising direction in neuroimaging studies, contributing to the comprehensive understanding of brain organization and the role of neural circuits.
    摘要 近期的神经影像研究凸显了以网络为中心的大脑分析的重要性,尤其是功能性磁共振成像。深度神经网络的兴起激发了基于脑网络预测临床结果和对个体分类的浓厚兴趣。然而,传统的静态脑网络分析在捕捉大脑功能的动态性方面潜力有限。尽管近期研究尝试利用动态脑网络,其高维度与复杂性仍带来巨大挑战。本文提出一种新方法——动态大脑 Transformer(DART),将静态与动态脑网络相结合,以实现更有效、更细致的大脑功能分析。我们的模型以静态脑网络为基线,融合动态脑网络以超越传统方法的性能。我们创新地引入注意力机制,增强模型的可解释性,并利用动态脑网络的时间变化。该方法为血氧水平依赖信号低信噪比这一直接深度建模中的常见问题提供了稳健的解决方案,同时揭示了哪些脑回路或动态网络对最终预测贡献更大。因此,DART 为神经影像研究指出了一个有前景的方向,有助于全面理解大脑组织与神经回路的作用。

CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.01940
  • repo_url: https://github.com/apexlab/codeapex
  • paper_authors: Lingyue Fu, Huacan Chai, Shuang Luo, Kounianhua Du, Weiming Zhang, Longteng Fan, Jiayi Lei, Renting Rui, Jianghao Lin, Yuchen Fang, Yifan Liu, Jingkuan Wang, Siyuan Qi, Kangning Zhang, Weinan Zhang, Yong Yu
  • for: 这个论文主要是用来评估大型自然语言模型(LLMs)在编程方面的能力。
  • methods: 该论文使用名为 CodeApex 的双语基准数据集,评估 LLM 在编程理解和代码生成方面的能力。CodeApex 包含三类多选题:概念理解、常识推理和多步推理,用于评估 LLM 的编程理解能力。
  • results: 研究人员评估了 14 个当前最先进的 LLM,包括通用模型和专用模型。GPT 表现出最好的编程能力,在两个任务上的准确率分别约为 50% 和 56%,这表明 LLM 在编程任务上仍有很大的改进空间。
    Abstract With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. We propose CodeApex, a bilingual benchmark dataset focusing on the programming comprehension and code generation abilities of LLMs. CodeApex comprises three types of multiple-choice questions: conceptual understanding, commonsense reasoning, and multi-hop reasoning, designed to evaluate LLMs on programming comprehension tasks. Additionally, CodeApex utilizes algorithmic questions and corresponding test cases to assess the code quality generated by LLMs. We evaluate 14 state-of-the-art LLMs, including both general-purpose and specialized models. GPT exhibits the best programming capabilities, achieving approximate accuracies of 50% and 56% on the two tasks, respectively. There is still significant room for improvement in programming tasks. We hope that CodeApex can serve as a reference for evaluating the coding capabilities of LLMs, further promoting their development and growth. Datasets are released at https://github.com/APEXLAB/CodeApex.git. CodeApex submission website is https://apex.sjtu.edu.cn/codeapex/.
    摘要 随着大语言模型(LLM)的出现,模型的编程能力得到了显著提升,吸引了越来越多研究者的关注。我们提出 CodeApex,一个聚焦于 LLM 编程理解与代码生成能力的双语基准数据集。CodeApex 包含三类多选题:概念理解、常识推理和多步推理,用于评估 LLM 在编程理解任务上的能力。此外,CodeApex 还利用算法题及相应的测试用例来评估 LLM 生成代码的质量。我们评估了 14 个当前最先进的 LLM,包括通用模型和专用模型。GPT 表现出最好的编程能力,在两个任务上的准确率分别约为 50% 和 56%,说明 LLM 在编程任务上仍有很大的改进空间。我们希望 CodeApex 能成为评估 LLM 编码能力的参考,进一步推动其发展与成长。数据集发布于 https://github.com/APEXLAB/CodeApex.git,CodeApex 的提交网站为 https://apex.sjtu.edu.cn/codeapex/。

Provably safe systems: the only path to controllable AGI

  • paper_url: http://arxiv.org/abs/2309.01933
  • repo_url: None
  • paper_authors: Max Tegmark, Steve Omohundro
  • for: 这篇论文旨在为人类与强大的人工通用智能(AGI)安全共存并繁荣发展描绘一条路径。
  • methods: 论文提出利用先进人工智能进行形式化验证与机制可解释性分析来构建 AGI,并保证 AGI 可被证明地满足人类指定的要求。
  • results: 这篇论文认为,这种方法是保证安全控制AGI的唯一道路。
    Abstract We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements. We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability. We further argue that it is the only path which guarantees safe controlled AGI. We end with a list of challenge problems whose solution would contribute to this positive outcome and invite readers to join in this work.
    摘要 我们描述了一条让人类与强大的人工通用智能(AGI)安全共存并繁荣发展的路径:构建可被证明满足人类指定要求的 AGI。我们论证,借助先进 AI 进行形式化验证与机制可解释性分析,这在技术上很快将成为可能;并且我们进一步论证,这是唯一能保证 AGI 安全受控的路径。最后,我们列出了一组挑战性问题,其解决将有助于实现这一积极结果,并邀请读者加入这项工作。

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2309.01922
  • repo_url: None
  • paper_authors: Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
  • for: 本文关注无穷时域平均奖励马尔可夫决策过程(MDP)。与现有工作不同,我们的方法不假设 MDP 具有线性结构,而是采用通用参数化的策略梯度算法,从而摆脱该假设的限制。
  • methods: 我们提出一种策略梯度算法,并证明其全局收敛性。此外,我们还计算了后悔界,这是首次在平均奖励场景下为通用参数化策略梯度算法给出后悔界。
  • results: 我们证明了该算法的后悔界为 $\tilde{\mathcal{O}}(T^{3/4})$,这意味着在平均奖励场景下,我们的算法能以次线性的后悔速度逼近最优策略。
    Abstract In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Distinguishing itself from existing works within this context, our approach harnesses the power of the general policy gradient-based algorithm, liberating it from the constraints of assuming a linear MDP structure. We propose a policy gradient-based algorithm and show its global convergence property. We then prove that the proposed algorithm has $\tilde{\mathcal{O}}(T^{3/4})$ regret. Remarkably, this paper marks a pioneering effort by presenting the first exploration into regret-bound computation for the general parameterized policy gradient algorithm in the context of average reward scenarios.
    摘要 本文研究无穷时域平均奖励马尔可夫决策过程(MDP)。与现有工作不同,我们的方法不假设线性 MDP 结构,而是利用通用的策略梯度算法,从而摆脱这一限制。我们提出一种基于策略梯度的算法,并证明其全局收敛性。随后,我们证明该算法的后悔界为 $\tilde{\mathcal{O}}(T^{3/4})$。值得注意的是,本文是在平均奖励场景下对通用参数化策略梯度算法计算后悔界的首次探索。
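
To make the average-reward policy-gradient setting concrete, here is a tabular sketch: a softmax policy updated with a differential (average-reward-baselined) REINFORCE step on a random MDP. The paper's general parameterization and the analysis behind the $\tilde{\mathcal{O}}(T^{3/4})$ regret bound go well beyond this.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 5, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # random MDP transitions
R = rng.random((S, A))                        # rewards in [0, 1]

theta = np.zeros((S, A))                      # softmax policy parameters
eta, s = 0.0, 0                               # average-reward estimate, state
alpha, beta = 0.05, 0.01

for _ in range(100_000):
    logits = theta[s] - theta[s].max()
    probs = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(A, p=probs)
    r = R[s, a]
    grad_log = -probs.copy()
    grad_log[a] += 1.0                        # grad of log softmax policy
    theta[s] += alpha * (r - eta) * grad_log  # differential REINFORCE step
    eta += beta * (r - eta)                   # running average-reward baseline
    s = rng.choice(S, p=P[s, a])

print("estimated average reward:", round(eta, 3))
```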

SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection

  • paper_url: http://arxiv.org/abs/2309.01907
  • repo_url: https://github.com/JTRNEO/SyntheWorld
  • paper_authors: Jian Song, Hongruixuan Chen, Naoto Yokoya
  • for: 提高计算机视觉任务和技术的研究,尤其是远程感知图像处理领域。
  • methods: 构建合成数据集,包含 40,000 张图像,每张图像具有亚米级像素和八个类别的细粒度土地覆盖标注,并提供 40,000 对带建筑变化标注的双时相图像对,用于建筑变化检测任务。
  • results: 通过在多个基准遥感图像数据集上进行实验,证明 SyntheWorld 的高质量和多样性,并探究了合成数据在何种条件下具有优势。
    Abstract Synthetic datasets, recognized for their cost effectiveness, play a pivotal role in advancing computer vision tasks and techniques. However, when it comes to remote sensing image processing, the creation of synthetic datasets becomes challenging due to the demand for larger-scale and more diverse 3D models. This complexity is compounded by the difficulties associated with real remote sensing datasets, including limited data acquisition and high annotation costs, which amplifies the need for high-quality synthetic alternatives. To address this, we present SyntheWorld, a synthetic dataset unparalleled in quality, diversity, and scale. It includes 40,000 images with submeter-level pixels and fine-grained land cover annotations of eight categories, and it also provides 40,000 pairs of bitemporal image pairs with building change annotations for building change detection task. We conduct experiments on multiple benchmark remote sensing datasets to verify the effectiveness of SyntheWorld and to investigate the conditions under which our synthetic data yield advantages. We will release SyntheWorld to facilitate remote sensing image processing research.
    摘要 合成数据集以其成本效益著称,在推动计算机视觉任务与技术方面发挥着关键作用。然而,在遥感图像处理领域,由于需要更大规模、更多样化的三维模型,构建合成数据集变得颇具挑战。真实遥感数据集本身存在数据获取受限、标注成本高等困难,这进一步放大了对高质量合成替代品的需求。为此,我们提出 SyntheWorld,一个在质量、多样性与规模上均无可匹敌的合成数据集。它包含 40,000 张具有亚米级像素的图像,并带有八个类别的细粒度土地覆盖标注;同时还提供 40,000 对带有建筑变化标注的双时相图像对,用于建筑变化检测任务。我们在多个基准遥感数据集上开展实验,以验证 SyntheWorld 的有效性,并探究合成数据在何种条件下能够带来优势。我们将发布 SyntheWorld,以促进遥感图像处理研究。

Towards General and Efficient Online Tuning for Spark

  • paper_url: http://arxiv.org/abs/2309.01901
  • repo_url: None
  • paper_authors: Yang Li, Huaijun Jiang, Yu Shen, Yide Fang, Xiaofeng Yang, Danqing Huang, Xinyi Zhang, Wentao Zhang, Ce Zhang, Peng Chen, Bin Cui
  • for: 提高 Spark 的性能和可扩展性,解决自动调整问题。
  • methods: 提出一个通用和高效的 Spark 自动调整框架,包括一个通用优化形式ulation、搜索方法、安全获取方法和三种创新技术。
  • results: 实现了在云端提供独立的 Spark 调整服务,并在实际生产任务中实现了减少内存成本57.00%和CPU成本34.93%的效果,提高了实用性、通用性和效率。
    Abstract The distributed data analytic system -- Spark is a common choice for processing massive volumes of heterogeneous data, while it is challenging to tune its parameters to achieve high performance. Recent studies try to employ auto-tuning techniques to solve this problem but suffer from three issues: limited functionality, high overhead, and inefficient search. In this paper, we present a general and efficient Spark tuning framework that can deal with the three issues simultaneously. First, we introduce a generalized tuning formulation, which can support multiple tuning goals and constraints conveniently, and a Bayesian optimization (BO) based solution to solve this generalized optimization problem. Second, to avoid high overhead from additional offline evaluations in existing methods, we propose to tune parameters along with the actual periodic executions of each job (i.e., online evaluations). To ensure safety during online job executions, we design a safe configuration acquisition method that models the safe region. Finally, three innovative techniques are leveraged to further accelerate the search process: adaptive sub-space generation, approximate gradient descent, and meta-learning method. We have implemented this framework as an independent cloud service, and applied it to the data platform in Tencent. The empirical results on both public benchmarks and large-scale production tasks demonstrate its superiority in terms of practicality, generality, and efficiency. Notably, this service saves an average of 57.00% memory cost and 34.93% CPU cost on 25K in-production tasks within 20 iterations, respectively.
    摘要 分布式数据分析系统 Spark 是处理海量异构数据的常见选择,但要调优其参数以获得高性能并不容易。近期研究尝试用自动调优技术解决这一问题,却面临三方面不足:功能受限、开销高昂、搜索低效。本文提出一个通用且高效的 Spark 调优框架,可同时解决这三个问题。首先,我们引入一种泛化的调优形式,可方便地支持多种调优目标与约束,并给出基于贝叶斯优化(BO)的求解方案。其次,为避免现有方法中额外离线评估带来的高开销,我们提议随每个作业的实际周期性执行进行参数调优(即在线评估);为确保在线执行的安全性,我们设计了一种对安全区域建模的安全配置获取方法。最后,我们利用三项创新技术进一步加速搜索过程:自适应子空间生成、近似梯度下降和元学习方法。我们已将该框架实现为独立的云服务,并应用于腾讯的数据平台。在公开基准和大规模生产任务上的实证结果表明,它在实用性、通用性和效率方面均更胜一筹。值得一提的是,该服务在 2.5 万个生产任务上、20 次迭代内,平均分别节省了 57.00% 的内存成本和 34.93% 的 CPU 成本。
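
A minimal illustration of the Bayesian-optimization core of such a tuner: a Gaussian-process surrogate proposes the next configuration to try on each actual job execution. run_spark_job is a hypothetical stand-in for an online evaluation, and the paper's safe-region modeling, sub-space generation, and meta-learning accelerations are omitted.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def run_spark_job(executor_memory_gb, shuffle_partitions):
    """Hypothetical stand-in: submit one periodic execution with this
    config and return its runtime in seconds (here, a synthetic bowl)."""
    return (executor_memory_gb - 6) ** 2 + 0.002 * (shuffle_partitions - 300) ** 2

rng = np.random.default_rng(0)
space = np.array([[m, p] for m in range(2, 17)            # 2-16 GB
                         for p in range(50, 1001, 50)])   # 50-1000 partitions

X, y = [], []
for i in range(20):                     # one config per actual job execution
    if i < 5:                           # a few random warm-up runs
        x = space[rng.integers(len(space))]
    else:                               # then pick by lower confidence bound
        gp = GaussianProcessRegressor(normalize_y=True).fit(np.array(X), y)
        mu, sd = gp.predict(space, return_std=True)
        x = space[np.argmin(mu - 1.5 * sd)]
    X.append(x)
    y.append(run_spark_job(*x))

best = X[int(np.argmin(y))]
print("best config:", best, "runtime:", min(y))
```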

Inferring Actual Treatment Pathways from Patient Records

  • paper_url: http://arxiv.org/abs/2309.01897
  • repo_url: None
  • paper_authors: Adrian Wilkins-Caruana, Madhushi Bandara, Katarzyna Musial, Daniel Catchpoole, Paul J. Kennedy
  • for: 这篇论文目标是从行政医疗记录(AHR)中推断特定患者群体的实际治疗步骤,解决治疗路径推断研究中的技术和方法上的缺陷。
  • methods: 该论文提出名为"Defrag"的方法,利用神经网络(NN)和一种自监督学习目标来学习医疗事件序列的语义与时间含义。
  • results: Defrag 显著优于多种现有的路径推断方法,并能在公共医疗记录中有效识别乳腺癌、肺癌与黑色素瘤的最佳实践路径片段。
    Abstract Treatment pathways are step-by-step plans outlining the recommended medical care for specific diseases; they get revised when different treatments are found to improve patient outcomes. Examining health records is an important part of this revision process, but inferring patients' actual treatments from health data is challenging due to complex event-coding schemes and the absence of pathway-related annotations. This study aims to infer the actual treatment steps for a particular patient group from administrative health records (AHR) - a common form of tabular healthcare data - and address several technique- and methodology-based gaps in treatment pathway-inference research. We introduce Defrag, a method for examining AHRs to infer the real-world treatment steps for a particular patient group. Defrag learns the semantic and temporal meaning of healthcare event sequences, allowing it to reliably infer treatment steps from complex healthcare data. To our knowledge, Defrag is the first pathway-inference method to utilise a neural network (NN), an approach made possible by a novel, self-supervised learning objective. We also developed a testing and validation framework for pathway inference, which we use to characterise and evaluate Defrag's pathway inference ability and compare against baselines. We demonstrate Defrag's effectiveness by identifying best-practice pathway fragments for breast cancer, lung cancer, and melanoma in public healthcare records. Additionally, we use synthetic data experiments to demonstrate the characteristics of the Defrag method, and to compare Defrag to several baselines where it significantly outperforms non-NN-based methods. Defrag significantly outperforms several existing pathway-inference methods and offers an innovative and effective approach for inferring treatment pathways from AHRs. Open-source code is provided to encourage further research in this area.
    摘要 治疗路径是针对特定疾病给出建议医疗方案的分步计划;当发现新的治疗能够改善患者结果时,这些路径会被修订。审查健康记录是修订过程的重要组成部分,但由于复杂的事件编码方案以及缺乏与路径相关的标注,从健康数据中推断患者的实际治疗非常困难。本研究旨在从行政健康记录(AHR)——一种常见的表格型医疗数据——中推断特定患者群体的实际治疗步骤,并解决治疗路径推断研究中若干技术与方法层面的空白。我们提出 Defrag 方法,通过审查 AHR 来推断特定患者群体在真实世界中的治疗步骤。Defrag 能学习医疗事件序列的语义与时间含义,从而可靠地从复杂医疗数据中推断治疗步骤。据我们所知,Defrag 是首个利用神经网络(NN)的路径推断方法,这得益于一种新颖的自监督学习目标。我们还开发了一个路径推断的测试与验证框架,用以刻画和评估 Defrag 的路径推断能力并与基线比较。我们在公共医疗记录中识别出乳腺癌、肺癌与黑色素瘤的最佳实践路径片段,展示了 Defrag 的有效性。此外,我们通过合成数据实验展示 Defrag 方法的特性,并与多个基线比较,其中 Defrag 显著优于非神经网络方法。Defrag 显著优于多种现有的路径推断方法,为从 AHR 推断治疗路径提供了一种创新且有效的途径。我们提供开源代码,以鼓励该领域的进一步研究。

On the Planning, Search, and Memorization Capabilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.01868
  • repo_url: None
  • paper_authors: Yunhao Yang, Anshul Tomar
  • for: 这项研究探讨了使用最新的大语言模型GPT-4进行规划任务的可能性,并在多个规划子领域进行了广泛的检验。
  • methods: 本研究对 GPT-4 在规划领域抽取、图搜索路径规划和对抗规划等多个任务上进行了实验分析。
  • results: 研究发现GPT-4在规划领域中表现出色,但也存在一些约束限制其应用范围。提出了一种精通语言模型特定领域的微调方法来提高CoT能力。
    Abstract The rapid advancement of large language models, such as the Generative Pre-trained Transformer (GPT) series, has had significant implications across various disciplines. In this study, we investigate the potential of the state-of-the-art large language model (GPT-4) for planning tasks. We explore its effectiveness in multiple planning subfields, highlighting both its strengths and limitations. Through a comprehensive examination, we identify areas where large language models excel in solving planning problems and reveal the constraints that limit their applicability. Our empirical analysis focuses on GPT-4's performance in planning domain extraction, graph search path planning, and adversarial planning. We then propose a way of fine-tuning a domain-specific large language model to improve its Chain of Thought (CoT) capabilities for the above-mentioned tasks. The results provide valuable insights into the potential applications of large language models in the planning domain and pave the way for future research to overcome their limitations and expand their capabilities.
    摘要 大语言模型(如生成式预训练 Transformer(GPT)系列)的快速发展,对各个领域产生了深远影响。在本研究中,我们考察了最先进的大语言模型(GPT-4)在规划任务中的潜力。我们探究它在多个规划子领域中的有效性,兼顾其优势与局限。通过全面的分析,我们确定了大语言模型擅长求解规划问题的领域,并揭示了限制其适用性的约束。我们的实证分析关注 GPT-4 在规划领域抽取、图搜索路径规划和对抗规划等方面的表现。随后,我们提出一种微调领域特定大语言模型的方法,以提升其在上述任务中的思维链(CoT)能力。这些结果为大语言模型在规划领域的潜在应用提供了宝贵见解,并为未来克服其局限、拓展其能力的研究铺平道路。

Efficient Query-Based Attack against ML-Based Android Malware Detection under Zero Knowledge Setting

  • paper_url: http://arxiv.org/abs/2309.01866
  • repo_url: None
  • paper_authors: Ping He, Yifan Xia, Xuhong Zhang, Shouling Ji
  • for: 本研究旨在提出一种高效的查询式攻击框架,用于攻击基于机器学习的 Android 恶意软件检测(AMD)方法。
  • methods: 本研究使用了一种基于零知识的查询式攻击方法,可以在各种实际场景中进行攻击。
  • results: 对多种主流的机器学习基于AMD方法和现实世界的抗病毒解决方案进行了广泛的评估,并取得了出色的成绩。
    Abstract The widespread adoption of the Android operating system has made malicious Android applications an appealing target for attackers. Machine learning-based (ML-based) Android malware detection (AMD) methods are crucial in addressing this problem; however, their vulnerability to adversarial examples raises concerns. Current attacks against ML-based AMD methods demonstrate remarkable performance but rely on strong assumptions that may not be realistic in real-world scenarios, e.g., the knowledge requirements about feature space, model parameters, and training dataset. To address this limitation, we introduce AdvDroidZero, an efficient query-based attack framework against ML-based AMD methods that operates under the zero knowledge setting. Our extensive evaluation shows that AdvDroidZero is effective against various mainstream ML-based AMD methods, in particular, state-of-the-art such methods and real-world antivirus solutions.
    摘要 Android 操作系统的普及使恶意 Android 应用成为攻击者眼中的诱人目标。基于机器学习(ML)的 Android 恶意软件检测(AMD)方法是解决该问题的关键,但它们易受对抗样本攻击,令人担忧。现有针对 ML AMD 方法的攻击虽然性能可观,却依赖于在真实场景中未必成立的强假设,例如对特征空间、模型参数和训练集的知识要求。为克服这一局限,我们提出 AdvDroidZero,一个在零知识设定下运行的高效查询式攻击框架。我们的大量评估表明,AdvDroidZero 对各类主流 ML AMD 方法均有效,尤其是对最先进的此类方法和真实世界的反病毒解决方案。

BigFUSE: Global Context-Aware Image Fusion in Dual-View Light-Sheet Fluorescence Microscopy with Image Formation Prior

  • paper_url: http://arxiv.org/abs/2309.01865
  • repo_url: None
  • paper_authors: Yu Liu, Gesine Muller, Nassir Navab, Carsten Marr, Jan Huisken, Tingying Peng
  • for: 提高 LSFM 图像质量,解决光子穿过较厚组织时光散射引起的图像离焦问题。
  • methods: 使用双视图图像融合技术,依据两个视图的局部图像质量判定聚焦像素,同时考虑光子在样品中传播的全局影响。
  • results: 提出 BigFUSE 全局上下文感知图像融合方法,可在 LSFM 中稳定图像融合,并排除结构化伪影,从而提升图像质量。
    Abstract Light-sheet fluorescence microscopy (LSFM), a planar illumination technique that enables high-resolution imaging of samples, experiences defocused image quality caused by light scattering when photons propagate through thick tissues. To circumvent this issue, dualview imaging is helpful. It allows various sections of the specimen to be scanned ideally by viewing the sample from opposing orientations. Recent image fusion approaches can then be applied to determine in-focus pixels by comparing image qualities of two views locally and thus yield spatially inconsistent focus measures due to their limited field-of-view. Here, we propose BigFUSE, a global context-aware image fuser that stabilizes image fusion in LSFM by considering the global impact of photon propagation in the specimen while determining focus-defocus based on local image qualities. Inspired by the image formation prior in dual-view LSFM, image fusion is considered as estimating a focus-defocus boundary using Bayes Theorem, where (i) the effect of light scattering onto focus measures is included within Likelihood; and (ii) the spatial consistency regarding focus-defocus is imposed in Prior. The expectation-maximum algorithm is then adopted to estimate the focus-defocus boundary. Competitive experimental results show that BigFUSE is the first dual-view LSFM fuser that is able to exclude structured artifacts when fusing information, highlighting its abilities of automatic image fusion.
    摘要 光片荧光显微镜(LSFM)是一种能对样品进行高分辨率成像的平面照明技术,但当光子穿过较厚组织时,光散射会造成图像离焦、质量下降。双视图成像有助于规避这一问题:通过从相对的方向观察样品,可以理想地扫描样品的不同区域。随后可应用图像融合方法,通过局部比较两个视图的图像质量来确定聚焦像素,但由于视野有限,这会产生空间上不一致的聚焦度量。为此,我们提出 BigFUSE,一种全局上下文感知的图像融合器,它在依据局部图像质量判定聚焦-失焦的同时,考虑光子在样品中传播的全局影响,从而稳定 LSFM 中的图像融合。受双视图 LSFM 成像先验的启发,图像融合被视为利用贝叶斯定理估计聚焦-失焦边界:(i)光散射对聚焦度量的影响被纳入似然;(ii)聚焦-失焦的空间一致性被施加于先验。随后采用期望最大化算法估计聚焦-失焦边界。有竞争力的实验结果表明,BigFUSE 是首个能在融合信息时排除结构化伪影的双视图 LSFM 融合器,凸显其自动图像融合能力。
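
BigFUSE's contribution is the global, image-formation-aware EM over a focus-defocus boundary; the sketch below shows only the local ingredient it improves upon — per-pixel focus measures smoothed into fusion weights — as a simplified dual-view fusion baseline, not the paper's method.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def fuse_dual_view(view_a, view_b, win=9, smooth=15):
    """Fuse two opposing LSFM views by local focus (simplified baseline).

    Local variance serves as the focus measure; a wide Gaussian enforces
    spatial consistency of the fusion weights.
    """
    def local_var(img):
        mean = uniform_filter(img, win)
        return uniform_filter(img * img, win) - mean * mean

    fa, fb = local_var(view_a), local_var(view_b)
    w = gaussian_filter(fa / (fa + fb + 1e-8), smooth)  # weight toward view A
    return w * view_a + (1 - w) * view_b

a = np.random.rand(128, 128)   # toy stand-ins for the two views
b = np.random.rand(128, 128)
print(fuse_dual_view(a, b).shape)
```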

cs.CL - 2023-09-05

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

  • paper_url: http://arxiv.org/abs/2309.02591
  • repo_url: https://github.com/kyegomez/CM3Leon
  • paper_authors: Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz, Luke Zettlemoyer, Armen Aghajanyan
  • for: 这篇论文是为了描述一种基于多模态语言模型的文本和图像生成模型CM3Leon,以及该模型在不同任务上的性能。
  • methods: 该模型使用了CM3多模态架构,并在大规模的采集和调参数数据上进行了扩展和优化。它还包括一个大规模的预训练阶段和一个多任务练熟环境(SFT)阶段。
  • results: 实验结果显示,这种方法对多模态模型是非常有效的,CM3Leon在文本到图像生成任务中达到了状态对的性能(FID=4.88),并且在语言指导图像编辑、图像控制生成和分割等任务中也可以达到了不可思议的水平。
    Abstract We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pre-training stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments demonstrate that this recipe is highly effective for multi-modal models. CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate unprecedented levels of controllability in tasks ranging from language-guided image editing to image-controlled generation and segmentation.
    摘要 我们提出 CM3Leon(读作"Chameleon"),一种检索增强、基于 token、仅解码器的多模态语言模型,能够生成并补全文本与图像。CM3Leon 采用 CM3 多模态架构,并进一步展示了在更多样的指令式数据上扩展与调优所带来的巨大收益。它是首个采用改编自纯文本语言模型训练配方的多模态模型,包含大规模检索增强的预训练阶段和第二阶段的多任务有监督微调(SFT)。它还是一个通用模型,可以进行文本到图像和图像到文本的生成,这使我们能够引入可产生高质量输出的自包含对比解码方法。大量实验表明该配方对多模态模型非常有效。CM3Leon 在文本到图像生成上以比同类方法少 5 倍的训练计算量达到最先进性能(零样本 MS-COCO FID 为 4.88)。经过 SFT 后,CM3Leon 还能在语言引导的图像编辑、图像控制的生成与分割等任务中展示前所未有的可控性。
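
The abstract's "self-contained contrastive decoding" is not specified in detail; the sketch below illustrates the general idea in the spirit of classifier-free guidance, contrasting prompt-conditioned and unconditioned logits (the exact CM3Leon procedure may differ).

```python
import torch

def contrastive_logits(cond_logits: torch.Tensor,
                       uncond_logits: torch.Tensor,
                       scale: float = 3.0) -> torch.Tensor:
    """One contrastive decoding step, classifier-free-guidance style.

    cond_logits:   next-token logits given the text prompt.
    uncond_logits: logits for the same prefix without the prompt.
    Tokens the prompt makes *more* likely are amplified by `scale`.
    """
    return uncond_logits + scale * (cond_logits - uncond_logits)

cond = torch.randn(1, 8192)      # hypothetical image-token vocabulary size
uncond = torch.randn(1, 8192)
next_token = contrastive_logits(cond, uncond).softmax(-1).argmax(-1)
print(next_token)
```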

Substitution-based Semantic Change Detection using Contextual Embeddings

  • paper_url: http://arxiv.org/abs/2309.02403
  • repo_url: https://github.com/dallascard/SBSCD
  • paper_authors: Dallas Card
  • for: 本研究旨在使用上下文嵌入来度量语义变化,并且提出了一种简单有效的方法,以优化现有的方法。
  • methods: 本研究使用最有可能的替换词来度量语义变化,这种方法不仅直观可解,而且更有效率,可以更好地探讨语义变化。
  • results: 本研究在该任务最常被引用的数据集上取得了更优的平均性能,并且相比静态词向量,能够对语义变化进行更细致的考察。
    Abstract Measuring semantic change has thus far remained a task where methods using contextual embeddings have struggled to improve upon simpler techniques relying only on static word vectors. Moreover, many of the previously proposed approaches suffer from downsides related to scalability and ease of interpretation. We present a simplified approach to measuring semantic change using contextual embeddings, relying only on the most probable substitutes for masked terms. Not only is this approach directly interpretable, it is also far more efficient in terms of storage, achieves superior average performance across the most frequently cited datasets for this task, and allows for more nuanced investigation of change than is possible with static word vectors.
    摘要 迄今为止,在语义变化度量这一任务上,使用上下文嵌入的方法一直难以超越仅依赖静态词向量的更简单技术。此外,许多已有方法在可扩展性与可解释性方面存在不足。我们提出一种利用上下文嵌入度量语义变化的简化方法,仅依赖被遮盖词项最可能的替换词。该方法不仅可以直接解释,在存储方面也远为高效,在该任务最常被引用的数据集上取得了更优的平均性能,并且相比静态词向量,能对语义变化进行更细致的考察。
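
A sketch of the substitution-based idea: mask occurrences of a target word in two corpora, pool the model's most probable substitutes for each, and compare the two substitute distributions. The distance used here (Jensen-Shannon) and the pooling are illustrative assumptions, not necessarily the paper's exact scoring.

```python
from collections import Counter
from transformers import pipeline
from scipy.spatial.distance import jensenshannon

fill = pipeline("fill-mask", model="bert-base-uncased")

def substitute_distribution(sentences, target, top_k=10):
    """Mask each occurrence of `target` and pool top-k substitutes."""
    counts = Counter()
    for s in sentences:
        masked = s.replace(target, fill.tokenizer.mask_token, 1)
        for pred in fill(masked, top_k=top_k):
            counts[pred["token_str"]] += pred["score"]
    return counts

old = ["the gay crowd was lively", "a gay tune played"]
new = ["the gay rights march", "a gay couple married"]
p, q = substitute_distribution(old, "gay"), substitute_distribution(new, "gay")
vocab = sorted(set(p) | set(q))
# Jensen-Shannon distance between the two substitute distributions.
jsd = jensenshannon([p[w] for w in vocab], [q[w] for w in vocab])
print("semantic change score:", round(float(jsd), 3))
```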

nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources

  • paper_url: http://arxiv.org/abs/2309.02373
  • repo_url: https://github.com/piotrnawrot/nanot5
  • paper_authors: Piotr Nawrot
  • for: 提高语言模型研究的可用性和资源利用率,使更多研究者能够访问和使用T5模型。
  • methods: 通过优化PyTorch框架和优化器,实现高效的T5模型预训练和精度调整,以及开源框架和配置等资源的提供,旨在拓宽语言模型研究领域的可用性和资源利用率。
  • results: 在单个GPU上预训练T5-Base模型只需16个小时,不会影响性能,并提供了多种配置和软硬件准则,以及开源框架和预训练模型,以满足研究者对T5模型的需求。
    Abstract State-of-the-art language models like T5 have revolutionized the NLP landscape, but their computational demands hinder a large portion of the research community. To address this challenge, we present nanoT5, a specially-optimized PyTorch framework for efficient pre-training and fine-tuning of T5 models. Drawing on insights from optimizer differences and prioritizing efficiency, nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance. With the introduction of this open-source framework, we hope to widen the accessibility to language modelling research and cater to the community's demand for more user-friendly T5 (Encoder-Decoder) implementations. Our contributions, including configurations, codebase, software/hardware insights, and pre-trained models, are available to the public, aiming to strike a balance between research accessibility and resource constraints in NLP.
    摘要 T5 等最先进的语言模型已经彻底改变了 NLP 的格局,但其计算需求让很大一部分研究社区难以企及。为应对这一挑战,我们提出 nanoT5,一个专门优化的 PyTorch 框架,用于高效地预训练和微调 T5 模型。借鉴优化器差异方面的见解并以效率为先,nanoT5 可以在单个 GPU 上仅用 16 小时完成 T5-Base 模型的预训练,且不损失性能。通过这个开源框架,我们希望扩大语言建模研究的可及性,并满足社区对更易用的 T5(编码器-解码器)实现的需求。我们的贡献——包括配置、代码库、软硬件方面的见解以及预训练模型——均向公众开放,旨在在 NLP 研究可及性与资源限制之间取得平衡。
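
nanoT5 pre-trains with T5's span-corruption objective; the snippet below constructs one input/target pair with sentinel tokens, as a simplified illustration (real pipelines operate on token ids with careful span sampling and batching).

```python
import random

def span_corrupt(tokens, noise_density=0.15, mean_span=3, seed=0):
    """Build one T5 span-corruption example: masked spans in the input are
    replaced by sentinels <extra_id_i>; the target spells the spans out."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * noise_density))
    starts = sorted(rng.sample(range(len(tokens) - mean_span),
                               k=max(1, n_mask // mean_span)))
    inp, tgt, i, sid = [], [], 0, 0
    for s in starts:
        if s < i:                       # skip overlapping spans
            continue
        inp += tokens[i:s] + [f"<extra_id_{sid}>"]
        tgt += [f"<extra_id_{sid}>"] + tokens[s:s + mean_span]
        i, sid = s + mean_span, sid + 1
    inp += tokens[i:]
    tgt += [f"<extra_id_{sid}>"]        # closing sentinel, per T5 convention
    return " ".join(inp), " ".join(tgt)

text = "the quick brown fox jumps over the lazy dog near the river bank".split()
print(span_corrupt(text))
```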

Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization

  • paper_url: http://arxiv.org/abs/2309.02311
  • repo_url: https://github.com/milanlproc/weigh-your-own-words
  • paper_authors: Helena Bonaldi, Giuseppe Attanasio, Debora Nozza, Marco Guerini
  • for: 为对抗在线仇恨言论,研究基于预训练语言模型(PLM)的反叙事自动生成方法。
  • methods: 本研究提出一种基于注意力的正则化方法来提升 PLM 的泛化能力,使其面对不同目标和真实世界的有害语言时能生成更多样、更丰富的反叙事。
  • results: 在英文基准数据集上的实验表明,经注意力正则化的模型能生成更好的反叙事,尤其是当仇恨目标未出现在训练数据中时。
    Abstract Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives by adapting Pretrained Transformer-based Language Models (PLMs) with human-curated data. This process, however, can produce in-domain overfitting, resulting in models generating acceptable narratives only for hatred similar to training data, with little portability to other targets or to real-world toxic language. This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs for counter narratives generation. Overfitting to training-specific terms is then discouraged, resulting in more diverse and richer narratives. We experiment with two attention-based regularization techniques on a benchmark English dataset. Regularized models produce better counter narratives than state-of-the-art approaches in most cases, both in terms of automatic metrics and human evaluation, especially when hateful targets are not present in the training data. This work paves the way for better and more flexible counter-speech generation models, a task for which datasets are highly challenging to produce.
    摘要 近期对抗在线仇恨言论的计算方法,是利用人工整理的数据对基于 Transformer 的预训练语言模型(PLM)进行适配,以自动生成反叙事。然而,这一过程可能导致域内过拟合,使模型只对与训练数据相似的仇恨内容生成可接受的反叙事,难以迁移到其他目标或真实世界的有害语言。本文提出新颖的注意力正则化方法,以提升 PLM 在反叙事生成上的泛化能力:抑制对训练数据特定词项的过拟合,从而得到更多样、更丰富的反叙事。我们在一个英文基准数据集上试验了两种基于注意力的正则化技术。在大多数情况下,无论是自动指标还是人工评估,正则化后的模型都能生成优于当前最先进方法的反叙事,尤其是当仇恨目标未出现在训练数据中时。这项工作为更好、更灵活的反言论生成模型铺平了道路,而此类任务的数据集极难构建。
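
One way to realize the attention regularization the paper describes is an entropy term on self-attention, penalizing distributions that collapse onto a few training-specific tokens; the exact objective below is an assumption in that spirit, not necessarily the paper's.

```python
import torch

def attention_entropy_penalty(attentions, eps=1e-9):
    """Negative mean entropy of self-attention rows, to be *added* to the
    task loss: minimizing it pushes attention to spread out, discouraging
    overfitting to a few training-specific (e.g., identity) terms.

    attentions: tuple of (batch, heads, seq, seq) tensors, one per layer,
    as returned by HuggingFace models with output_attentions=True.
    """
    penalty = 0.0
    for att in attentions:
        ent = -(att * (att + eps).log()).sum(-1)   # row-wise entropy
        penalty = penalty - ent.mean()             # reward high entropy
    return penalty / len(attentions)

# Usage sketch (model is a hypothetical seq2seq counter-narrative LM):
#   out = model(input_ids, labels=labels, output_attentions=True)
#   loss = out.loss + 0.01 * attention_entropy_penalty(out.encoder_attentions)
```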

PromptTTS 2: Describing and Generating Voices with Text Prompt

  • paper_url: http://arxiv.org/abs/2309.02285
  • repo_url: None
  • paper_authors: Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian
  • for: 这个研究是为了解决基于文本提示的语音生成方法中的一个问题,即使用文本提示来生成语音时,不能完全捕捉语音中的声音变化信息。
  • methods: 这个研究使用了一种变换网络,该网络可以根据文本提示来预测语音中的声音变化信息,以及一个提取ipeline,该ipeline可以使用语音理解模型来识别语音中的声音特征(例如性别、速度等),并使用大型自然语言处理模型来形成文本提示。
  • results: 实验结果表明,与前一代方法相比,PromptTTS 2可以更好地根据文本提示生成语音,并且支持采样多种语音变化,因此可以为用户提供更多的语音选择。此外,提取ipeline可以生成高质量的文本提示,从而消除大量的标注成本。
    Abstract Speech conveys more information than just text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text prompt face two challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompt for speech. In this work, we introduce PromptTTS 2 to address these challenges with a variation network to provide variability information of voice not captured by text prompts, and a prompt generation pipeline to utilize the large language models (LLM) to compose high quality text prompts. Specifically, the variation network predicts the representation extracted from the reference speech (which contains full information about voice) based on the text prompt representation. For the prompt generation pipeline, it generates text prompts for speech with a speech understanding model to recognize voice attributes (e.g., gender, speed) from speech and a large language model to formulate text prompt based on the recognition results. Experiments on a large-scale (44K hours) speech dataset demonstrate that compared to the previous works, PromptTTS 2 generates voices more consistent with text prompts and supports the sampling of diverse voice variability, thereby offering users more choices on voice generation. Additionally, the prompt generation pipeline produces high-quality prompts, eliminating the large labeling cost. The demo page of PromptTTS 2 is available online at https://speechresearch.github.io/prompttts2.
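
A minimal sketch of the variation-network idea described above: regress from the text-prompt representation onto the representation extracted from reference speech, so the voice information missing from the prompt can be predicted at inference. The dimensions and the two-layer MLP are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class VariationNetwork(nn.Module):
    """Predict a reference-speech representation (full voice information)
    from a text-prompt representation (partial voice information)."""
    def __init__(self, prompt_dim=512, ref_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(prompt_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, ref_dim),
        )

    def forward(self, prompt_repr):
        return self.net(prompt_repr)

# training-step sketch: regress onto the frozen reference-speech representation
# loss = torch.nn.functional.mse_loss(varnet(prompt_repr), ref_repr.detach())
```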

Dialog Action-Aware Transformer for Dialog Policy Learning

  • paper_url: http://arxiv.org/abs/2309.02240
  • repo_url: None
  • paper_authors: Huimin Wang, Wai-Chung Kwan, Kam-Fai Wong
  • for: This work aims to make dialog policy learning (DPL) more efficient by using plain-text knowledge from pretrained language models to accelerate the RL agent's learning.
  • methods: It introduces a dialog action-aware transformer encoder (DaTrans) with a new fine-tuning procedure, the masked last action task, which encourages DaTrans to be dialog-aware and to distill action-specific features; DaTrans is then further optimized in an RL setting through exploration in the dialog action space.
  • results: The approach quickly steers the RL agent toward an effective dialog policy, with effectiveness and efficiency demonstrated in both simulator and human evaluation.
    Abstract Recent works usually address dialog policy learning (DPL) by training a reinforcement learning (RL) agent to determine the best dialog action. However, existing works on deep RL require a large volume of agent-user interactions to achieve acceptable performance. In this paper, we propose to make full use of the plain-text knowledge from a pre-trained language model to accelerate the RL agent's learning speed. Specifically, we design a dialog action-aware transformer encoder (DaTrans), which integrates a new fine-tuning procedure named the masked last action task to encourage DaTrans to be dialog-aware and to distil action-specific features. Then, DaTrans is further optimized in an RL setting with ongoing interactions and evolves through exploration in the dialog action space toward maximizing long-term accumulated rewards. The effectiveness and efficiency of the proposed model are demonstrated with both simulator evaluation and human evaluation.
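
One plausible reading of the "masked last action task", sketched below: hide the final action of each dialog-action sequence and train the encoder to recover it. The `model` interface returning per-position action logits is hypothetical.

```python
import torch
import torch.nn.functional as F

def masked_last_action_loss(model, dialog_action_ids, mask_id):
    """Mask the last action of each sequence and predict it, distilling
    action-specific features into the encoder (assumed formulation)."""
    inputs = dialog_action_ids.clone()
    targets = dialog_action_ids[:, -1]   # the final action in each dialog
    inputs[:, -1] = mask_id              # replace it with a [MASK] token id
    logits = model(inputs)               # (batch, seq, n_actions), hypothetical interface
    return F.cross_entropy(logits[:, -1, :], targets)
```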

Incorporating Dictionaries into a Neural Network Architecture to Extract COVID-19 Medical Concepts From Social Media

  • paper_url: http://arxiv.org/abs/2309.02188
  • repo_url: None
  • paper_authors: Abul Hasan, Mark Levene, David Weston
  • for: This study investigates incorporating dictionary information into a neural network architecture for natural language processing.
  • methods: A dictionary-augmented deep learning model is used to extract COVID-19-related concepts from social media and forum text.
  • results: Incorporating small domain dictionaries into deep learning models improves concept extraction performance, and the resulting models generalize and transfer well to different datasets.
    Abstract We investigate the potential benefit of incorporating dictionary information into a neural network architecture for natural language processing. In particular, we make use of this architecture to extract several concepts related to COVID-19 from an on-line medical forum. We use a sample from the forum to manually curate one dictionary for each concept. In addition, we use MetaMap, which is a tool for extracting biomedical concepts, to identify a small number of semantic concepts. For a supervised concept extraction task on the forum data, our best model achieved a macro $F_1$ score of 90\%. A major difficulty in medical concept extraction is obtaining labelled data from which to build supervised models. We investigate the utility of our models to transfer to data derived from a different source in two ways: first, for producing labels via weak learning, and second, to perform concept extraction. The dataset we use in this case comprises COVID-19 related tweets, and we achieve an $F_1$ score of 81\% for symptom concept extraction trained on weakly labelled data. The utility of our dictionaries is compared with a COVID-19 symptom dictionary that was constructed directly from Twitter. Further experiments that incorporate BERT and a COVID-19 version of BERTweet demonstrate that the dictionaries provide a commensurate result. Our results show that incorporating small domain dictionaries into deep learning models can improve concept extraction tasks. Moreover, models built using dictionaries generalize well and are transferable to different datasets on a similar task.
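
A minimal sketch of one common way to incorporate dictionaries into a neural tagger (the paper's exact integration may differ): one binary match feature per concept dictionary, concatenated with the word embeddings before the sequence encoder.

```python
import torch

def dictionary_features(tokens, dictionaries):
    """One binary feature per concept dictionary, set to 1 when the
    (lowercased) token appears in that dictionary."""
    feats = torch.zeros(len(tokens), len(dictionaries))
    for i, tok in enumerate(tokens):
        for j, vocab in enumerate(dictionaries):
            if tok.lower() in vocab:
                feats[i, j] = 1.0
    return feats

# embeddings: (seq_len, d); concatenate dictionary features along the feature dim
# x = torch.cat([embeddings, dictionary_features(tokens, dicts)], dim=-1)
```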

Advancing Text-to-GLOSS Neural Translation Using a Novel Hyper-parameter Optimization Technique

  • paper_url: http://arxiv.org/abs/2309.02162
  • repo_url: None
  • paper_authors: Younes Ouargani, Noussaima El Khattabi
  • for: This paper investigates transformers for neural machine translation of text-to-GLOSS, aiming to improve the accuracy and fluency of generated GLOSS for Deaf and Hard-of-Hearing communication.
  • methods: A novel hyper-parameter exploration technique searches over architectural parameters to build an optimal transformer-based architecture tailored specifically to text-to-GLOSS translation.
  • results: On the PHOENIX14T dataset, the best transformer architecture reaches a ROUGE score of 55.18% and a BLEU-1 score of 63.6%, surpassing the previous state of the art on the same dataset by 0.63 (ROUGE) and 8.42 (BLEU-1).
    Abstract In this paper, we investigate the use of transformers for Neural Machine Translation of text-to-GLOSS for Deaf and Hard-of-Hearing communication. Due to the scarcity of available data and limited resources for text-to-GLOSS translation, we treat the problem as a low-resource language task. We use our novel hyper-parameter exploration technique to explore a variety of architectural parameters and build an optimal transformer-based architecture specifically tailored for text-to-GLOSS translation. The study aims to improve the accuracy and fluency of Neural Machine Translation generated GLOSS. This is achieved by examining various architectural parameters including layer count, attention heads, embedding dimension, dropout, and label smoothing to identify the optimal architecture for improving text-to-GLOSS translation performance. The experiments conducted on the PHOENIX14T dataset reveal that the optimal transformer architecture outperforms previous work on the same dataset. The best model reaches a ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score of 55.18% and a BLEU-1 (BiLingual Evaluation Understudy 1) score of 63.6%, outperforming state-of-the-art results on the BLEU-1 and ROUGE scores by 8.42 and 0.63 respectively.
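
The abstract names the search space (layer count, attention heads, embedding dimension, dropout, label smoothing) but not the novel search technique itself; the sketch below shows a plain random search over that space as a baseline stand-in, with `train_and_eval` assumed to return a dev-set BLEU-1 score.

```python
import random

SPACE = {
    "layers": [2, 4, 6],
    "heads": [2, 4, 8],
    "embed_dim": [128, 256, 512],
    "dropout": [0.1, 0.2, 0.3],
    "label_smoothing": [0.0, 0.1, 0.2],
}

def sample_config():
    return {k: random.choice(v) for k, v in SPACE.items()}

def search(train_and_eval, trials=20):
    """Generic random search over the architectural parameters the paper
    explores; `train_and_eval(cfg)` is a user-supplied training routine."""
    best_cfg, best_bleu = None, -1.0
    for _ in range(trials):
        cfg = sample_config()
        bleu = train_and_eval(cfg)
        if bleu > best_bleu:
            best_cfg, best_bleu = cfg, bleu
    return best_cfg, best_bleu
```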

Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition

  • paper_url: http://arxiv.org/abs/2309.02145
  • repo_url: None
  • paper_authors: Patrick Eickhoff, Matthias Möller, Theresa Pekarek Rosin, Johannes Twiefel, Stefan Wermter
  • for: This work aims to improve the performance of automatic speech recognition (ASR) systems under noisy conditions.
  • methods: A new method extracts the denoising capabilities of large end-to-end (E2E) models in a way that applies to any encoder-decoder architecture: the Cleancoder preprocessor takes hidden activations from a Conformer ASR model and feeds them to a decoder that predicts denoised spectrograms.
  • results: The model successfully filters noise from speech and improves the word error rate (WER) of downstream models in noisy conditions, both as a frontend to a pretrained Conformer ASR model and as a frontend for training smaller Conformer ASR models from scratch.
    Abstract In recent speech processing research, large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have reported state-of-the-art performance on various benchmarks. These systems intrinsically learn how to handle and remove noise conditions from speech. Previous research has shown that it is possible to extract the denoising capabilities of these models into a preprocessor network, which can be used as a frontend for downstream ASR models. However, the proposed methods were limited to specific fully convolutional architectures. In this work, we propose a novel method to extract the denoising capabilities that can be applied to any encoder-decoder architecture. We propose the Cleancoder preprocessor architecture, which extracts hidden activations from the Conformer ASR model and feeds them to a decoder to predict denoised spectrograms. We train our pre-processor on the Noisy Speech Database (NSD) to reconstruct denoised spectrograms from noisy inputs. Then, we evaluate our model as a frontend to a pretrained Conformer ASR model as well as a frontend to train smaller Conformer ASR models from scratch. We show that the Cleancoder is able to filter noise from speech and that it improves the total Word Error Rate (WER) of the downstream model in noisy conditions for both applications.
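
A rough sketch of the Cleancoder idea under stated assumptions: hidden activations are read from a frozen pretrained ASR encoder and decoded into denoised spectrograms with a reconstruction loss. The linear decoder, dimensions, and encoder output shape are placeholders for the paper's actual design.

```python
import torch
import torch.nn as nn

class Cleancoder(nn.Module):
    """Decode hidden activations of a frozen pretrained ASR encoder into
    denoised spectrograms; usable as a frontend for downstream ASR models."""
    def __init__(self, asr_encoder, hidden_dim=256, n_mels=80):
        super().__init__()
        self.encoder = asr_encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad = False          # only the decoder is trained
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_mels),
        )

    def forward(self, noisy_features):
        with torch.no_grad():
            h = self.encoder(noisy_features)  # assumed (batch, time, hidden_dim)
        return self.decoder(h)                # predicted denoised spectrogram

# training sketch: loss = torch.nn.functional.l1_loss(cleancoder(noisy), clean)
```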

Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion

  • paper_url: http://arxiv.org/abs/2309.02133
  • repo_url: https://github.com/unilight/seq2seq-vc
  • paper_authors: Wen-Chin Huang, Tomoki Toda
  • For: This work evaluates three recently proposed methods for ground-truth-free foreign accent conversion (FAC), which converts a non-native speaker's accented speech into native-sounding speech while preserving speaker identity.
  • Methods: All evaluated methods harness sequence-to-sequence (seq2seq) and non-parallel VC models to convert the accent while controlling speaker identity.
  • Results: No single method was significantly better than the others on all evaluation axes, in contrast to conclusions drawn in previous studies; analysis of the seq2seq training inputs and outputs and of the non-parallel VC design choices also shows that intelligibility measures do not correlate well with subjective accentedness.
    Abstract Foreign accent conversion (FAC) is a special application of voice conversion (VC) which aims to convert the accented speech of a non-native speaker to native-sounding speech with the same speaker identity. FAC is difficult because native speech from the desired non-native speaker, which would serve as the training target, is impossible to collect. In this work, we evaluate three recently proposed methods for ground-truth-free FAC, all of which aim to harness the power of sequence-to-sequence (seq2seq) and non-parallel VC models to properly convert the accent and control the speaker identity. Our experimental evaluation results show that no single method was significantly better than the others on all evaluation axes, which is in contrast to conclusions drawn in previous studies. We also explain the effectiveness of these methods in terms of the training input and output of the seq2seq model, examine the design choices of the non-parallel VC model, and show that intelligibility measures such as word error rates do not correlate well with subjective accentedness. Finally, our implementation is open-sourced to promote reproducible research and help future researchers improve upon the compared systems.

Wordle: A Microcosm of Life. Luck, Skill, Cheating, Loyalty, and Influence!

  • paper_url: http://arxiv.org/abs/2309.02110
  • repo_url: None
  • paper_authors: James P. Dilger
  • for: This study examines the practices and habits of Wordle players.
  • methods: Information theory is used to assess players' luck and skill, based on the published first- through sixth-guess data for a random sample of all players, compiled from May 2023 to August 2023.
  • results: About 0.2-0.5% of players solve the puzzle on the first guess each day, implying that 4,000-10,000 players obtain the target word outside the game; at least one third of players have a favorite starting word, and most remain loyal to it even after it has appeared as a target word; and on August 15, 2023, about 30,000 players abruptly changed their starting word, presumably prompted by a crossword puzzle clue.
    Abstract Wordle is a popular, online word game offered by the New York Times (nytimes.com). Currently there are some 2 million players of the English version worldwide. Players have 6 attempts to guess the daily word (target word) and after each attempt, the player receives color-coded information about the correctness and position of each letter in the guess. After either a successful completion of the puzzle or the final unsuccessful attempt, software can assess the player's luck and skill using Information Theory and can display data for the first, second, ..., sixth guesses of a random sample of all players. Recently, I discovered that the latter data is presented in a format that can easily be copied and pasted into a spreadsheet. I compiled data on Wordle players' first guesses from May 2023 - August 2023 and inferred some interesting information about Wordle players. A) Every day, about 0.2-0.5% of players solve the puzzle in one attempt. Because the odds of randomly guessing one of the 2,315 possible target words are 0.043%, this implies that 4,000 - 10,000 players cheat by obtaining the target word outside of playing the game! B) At least 1/3 of the players have a favorite starting word, or cycle through several. And even though players should be aware that target words are never repeated, most players appear to remain loyal to their starting word even after its appearance as a target word. C) On August 15, 2023, about 30,000 players abruptly changed their starting word, presumably based on a crossword puzzle clue! Wordle players can be influenced! This study goes beyond social media postings, surveys, and Google Trends to provide solid, quantitative evidence about cheating in Wordle.
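
The luck/skill assessment mentioned above rests on standard information theory; the sketch below computes the expected information (entropy of the feedback distribution) of a first guess over a candidate word list, with a simplified feedback rule that ignores duplicate-letter edge cases.

```python
from collections import Counter
from math import log2

def feedback(guess, target):
    """Wordle-style feedback: 2 = right letter/right spot, 1 = right letter/
    wrong spot, 0 = absent (simplified: ignores duplicate-letter rules)."""
    return tuple(
        2 if g == t else (1 if g in target else 0)
        for g, t in zip(guess, target)
    )

def expected_information(guess, candidates):
    """Entropy (bits) of the feedback distribution over candidate targets;
    higher means the guess is expected to eliminate more words."""
    counts = Counter(feedback(guess, t) for t in candidates)
    n = len(candidates)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# expected_information("raise", word_list) -> bits gained by opening with "raise"
```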

Bridging Emotion Role Labeling and Appraisal-based Emotion Analysis

  • paper_url: http://arxiv.org/abs/2309.02092
  • repo_url: None
  • paper_authors: Roman Klinger
  • for: This work surveys emotion analysis in text, specifically emotion classification and emotion role labeling.
  • methods: It consolidates findings from the SEAT and CEAT projects, which connect (event-focused) emotion classification and emotion role labeling via psychological appraisal theories.
  • results: It clarifies how the two research directions relate and points out open research questions.
    Abstract The term emotion analysis in text subsumes various natural language processing tasks which have in common the goal of enabling computers to understand emotions. Most popular is emotion classification, in which one or multiple emotions are assigned to a predefined textual unit. While such a setting is appropriate for identifying the reader's or author's emotion, emotion role labeling adds the perspective of mentioned entities and extracts text spans that correspond to the emotion cause. The underlying emotion theories agree on one important point: an emotion is caused by some internal or external event and comprises several subcomponents, including the subjective feeling and a cognitive evaluation. We therefore argue that emotions and events are related in two ways. (1) Emotions are events; this perspective is the foundation of emotion role labeling in NLP. (2) Emotions are caused by events; a perspective made explicit by research on how to incorporate psychological appraisal theories into NLP models to interpret events. These two research directions, role labeling and (event-focused) emotion classification, have by and large been tackled separately. We contributed to both directions with the projects SEAT (Structured Multi-Domain Emotion Analysis from Text) and CEAT (Computational Event Evaluation based on Appraisal Theories for Emotion Analysis), both funded by the German Research Foundation. In this paper, we consolidate the findings and point out open research questions.

An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.02077
  • repo_url: None
  • paper_authors: Yusheng Liao, Yutong Meng, Hongcheng Liu, Yanfeng Wang, Yu Wang
  • for: This paper evaluates the practical capabilities of large language models (LLMs) as virtual doctors in multi-turn consultations.
  • methods: An automated evaluation framework poses consultation tasks that require LLMs to be aware of what they do not know and to inquire about missing medical information from patients before making diagnoses; a benchmark is built by reformulating multiple-choice questions from the United States Medical Licensing Examinations (USMLE).
  • results: Fine-tuning on a constructed medical consultation training set alleviates hallucinations and improves LLM performance on the proposed benchmark, as validated by extensive experiments and ablation studies.
    Abstract Large language models (LLMs) have achieved significant success in interacting with humans. However, recent studies have revealed that these models often suffer from hallucinations, leading to overly confident but incorrect judgments. This limits their application in the medical domain, where tasks require the utmost accuracy. This paper introduces an automated evaluation framework that assesses the practical capabilities of LLMs as virtual doctors during multi-turn consultations. Consultation tasks are designed to require LLMs to be aware of what they do not know, to inquire about missing medical information from patients, and to ultimately make diagnoses. To evaluate the performance of LLMs for these tasks, a benchmark is proposed by reformulating medical multiple-choice questions from the United States Medical Licensing Examinations (USMLE), and comprehensive evaluation metrics are developed and evaluated on three constructed test sets. A medical consultation training set is further constructed to improve the consultation ability of LLMs. The results of the experiments show that fine-tuning with the training set can alleviate hallucinations and improve LLMs' performance on the proposed benchmark. Extensive experiments and ablation studies are conducted to validate the effectiveness and robustness of the proposed framework.

Bilevel Scheduled Sampling for Dialogue Generation

  • paper_url: http://arxiv.org/abs/2309.01953
  • repo_url: None
  • paper_authors: Jiawen Liu, Kan Li
  • for: mitigating exposure bias in natural language processing tasks, particularly in dialog generation.
  • methods: proposed a bilevel scheduled sampling model that takes sentence-level information into account and incorporates it with word-level quality, and a smooth function that maps the combined result to an appropriate range for probabilistic sampling.
  • results: significantly alleviated the exposure bias problem and outperformed state-of-the-art scheduled sampling methods in experiments conducted on the DailyDialog and PersonaChat datasets.
    Abstract Exposure bias poses a common challenge in numerous natural language processing tasks, particularly in dialog generation. In response to this issue, researchers have devised various techniques, among which scheduled sampling has proven to be an effective method for mitigating exposure bias. However, existing state-of-the-art scheduled sampling methods consider only the quality of the currently sampled words for threshold truncation sampling, which overlooks the importance of sentence-level information; the method of threshold truncation itself also warrants further discussion. In this paper, we propose a bilevel scheduled sampling model that takes sentence-level information into account and incorporates it with word-level quality. To enhance sampling diversity and improve the model's adaptability, we propose a smooth function that maps the combined result of sentence-level and word-level information to an appropriate range, and employ probabilistic sampling based on the mapped values instead of threshold truncation. Experiments conducted on the DailyDialog and PersonaChat datasets demonstrate the effectiveness of our proposed methods, which significantly alleviate the exposure bias problem and outperform state-of-the-art scheduled sampling methods.
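
A sketch of the two ingredients described in the abstract: combining word- and sentence-level quality, and replacing hard threshold truncation with probabilistic sampling. The sigmoid squashing and the mixing weight `alpha` are illustrative choices, not the paper's exact smooth function.

```python
import torch

def sampling_probability(word_quality, sentence_quality, alpha=0.5, temperature=1.0):
    """Combine word-level and sentence-level quality scores, then squash
    through a smooth function to get a per-position probability of feeding
    the model's own prediction instead of the gold token."""
    combined = alpha * word_quality + (1 - alpha) * sentence_quality
    return torch.sigmoid(combined / temperature)

def mix_inputs(gold_tokens, model_tokens, p_model):
    """Probabilistic sampling instead of hard threshold truncation."""
    use_model = torch.bernoulli(p_model).bool()
    return torch.where(use_model, model_tokens, gold_tokens)
```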

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

  • paper_url: http://arxiv.org/abs/2309.01947
  • repo_url: None
  • paper_authors: Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra
  • for: This paper proposes TODM (Train Once Deploy Many), a method for efficiently training many sizes of hardware-friendly on-device ASR models with GPU-hours comparable to a single training job.
  • methods: Building on prior Supernet work, layer sizes and widths of a weight-sharing RNN-T Supernet are reduced to obtain subnetworks suited to different hardware types; three techniques improve the Supernet: adaptive dropouts, in-place Alpha-divergence knowledge distillation, and the ScaledAdam optimizer.
  • results: Compared against individually tuned Multi-Head State Space Model (MH-SSM) RNN-T models on LibriSpeech, the TODM Supernet matches or surpasses manually tuned models by up to 3% relative in word error rate (WER), while keeping the cost of training many models at a small constant.
    Abstract Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning the model's hyperparameters or exploring variations in its architecture. Re-training and re-validating models after making these changes can be a resource-intensive task. This paper presents TODM (Train Once Deploy Many), a new approach to efficiently train many sizes of hardware-friendly on-device ASR models with GPU-hours comparable to those of a single training job. TODM leverages insights from prior work on Supernets, where Recurrent Neural Network Transducer (RNN-T) models share weights within a Supernet. It reduces layer sizes and widths of the Supernet to obtain subnetworks, making them smaller models suitable for all hardware types. We introduce a novel combination of three techniques to improve the outcomes of the TODM Supernet: adaptive dropouts, an in-place Alpha-divergence knowledge distillation, and the use of the ScaledAdam optimizer. We validate our approach by comparing Supernet-trained versus individually tuned Multi-Head State Space Model (MH-SSM) RNN-T models using LibriSpeech. Results demonstrate that our TODM Supernet either matches or surpasses the performance of manually tuned models by up to 3% relative in word error rate (WER), while efficiently keeping the cost of training many models at a small constant.
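
The core weight-sharing trick behind supernet approaches like TODM can be illustrated with a "slimmable" layer, in which each sampled width uses a slice of the full weight matrix; the sketch below is a generic illustration, far simpler than slicing an actual RNN-T.

```python
import torch
import torch.nn as nn

class SlimmableLinear(nn.Linear):
    """Weight-sharing layer: a sampled sub-width uses the top-left slice of
    the full weight matrix, so one supernet yields many deployable sizes."""
    def forward(self, x, width):
        return nn.functional.linear(
            x[..., :width], self.weight[:width, :width], self.bias[:width]
        )

layer = SlimmableLinear(512, 512)
x = torch.randn(4, 512)
for width in (256, 384, 512):   # "train once, deploy many": shared weights
    y = layer(x, width)         # output shape (4, width)
```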

QuantEase: Optimization-based Quantization for Language Models – An Efficient and Intuitive Algorithm

  • paper_url: http://arxiv.org/abs/2309.01885
  • repo_url: None
  • paper_authors: Kayhan Behdin, Ayan Acharya, Aman Gupta, Sathiya Keerthi, Rahul Mazumder
  • for: This work targets compression of large language models (LLMs) for efficient deployment, focusing on post-training quantization (PTQ).
  • methods: QuantEase is a layer-wise quantization framework in which each layer is quantized separately by coordinate descent (CD) on a non-convex objective, relying only on matrix and vector operations without matrix inversion or decomposition; an outlier-aware variant retains significant weights (outliers) at full precision.
  • results: Across various LLMs and datasets, QuantEase achieves state-of-the-art perplexity and zero-shot accuracy, with relative improvements of up to 15% over methods such as GPTQ; the outlier-aware variant achieves near- or sub-3-bit quantization with an acceptable accuracy drop, without non-uniform quantization or grouping, improving upon SpQR by up to two times in perplexity.
    Abstract With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization framework where individual layers undergo separate quantization. The problem is framed as a discrete-structured non-convex optimization, prompting the development of algorithms rooted in Coordinate Descent (CD) techniques. These CD-based methods provide high-quality solutions to the complex non-convex layer-wise quantization problems. Notably, our CD-based approach features straightforward updates, relying solely on matrix and vector operations, circumventing the need for matrix inversion or decomposition. We also explore an outlier-aware variant of our approach, allowing for retaining significant weights (outliers) with complete precision. Our proposal attains state-of-the-art performance in terms of perplexity and zero-shot accuracy in empirical evaluations across various LLMs and datasets, with relative improvements up to 15% over methods such as GPTQ. Particularly noteworthy is our outlier-aware algorithm's capability to achieve near or sub-3-bit quantization of LLMs with an acceptable drop in accuracy, obviating the need for non-uniform quantization or grouping techniques, improving upon methods such as SpQR by up to two times in terms of perplexity.
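
A compact sketch in the spirit of the paper's coordinate-descent updates (not its exact algorithm): minimize $\|XW - XQ\|^2$ by cycling over rows of $Q$, computing each row's closed-form CD minimizer and rounding it to the quantization grid. Only matrix and vector operations are used; in practice a small ridge should be added to `H` so the diagonal stays positive.

```python
import torch

def quantease_layer(X, W, grid, iters=3):
    """Layer-wise quantization by coordinate descent.
    X: (n, d_in) calibration inputs, W: (d_in, d_out) weights,
    grid: 1-D tensor of quantization levels."""
    H = X.T @ X                  # Gram matrix; assumed H[i, i] > 0 (add damping if not)
    Q = W.clone()
    for _ in range(iters):
        for i in range(W.shape[0]):
            # closed-form CD minimizer for row i: Q[i] + (H @ (W - Q))[i] / H[i, i]
            r = H[i] @ (W - Q) + H[i, i] * Q[i]
            q_star = r / H[i, i]
            # project each entry onto the nearest grid point
            Q[i] = grid[torch.argmin((grid[:, None] - q_star[None, :]).abs(), dim=0)]
    return Q
```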

cs.LG - 2023-09-05

Superclustering by finding statistically significant separable groups of optimal gaussian clusters

  • paper_url: http://arxiv.org/abs/2309.02623
  • repo_url: https://github.com/berng/GMSDB
  • paper_authors: Oleg I. Berngardt
  • For: The paper proposes an algorithm for clustering a dataset into optimal superclusters based on the BIC criterion and statistical separability.
  • Methods: The algorithm has three stages: representing the dataset as a mixture of Gaussian clusters, estimating inter-cluster distances and cluster sizes with the Mahalanobis distance, and combining clusters into superclusters with DBSCAN at a chosen statistical significance level.
  • Results: The algorithm produces good results on test datasets in both noisy and noiseless settings and can predict the correct supercluster for new data using an already trained clusterer; its drawbacks are low speed, the stochastic nature of the final clustering, and the need for a sufficiently large dataset.
    Abstract The paper presents an algorithm for clustering a dataset by grouping the optimal (by the BIC criterion) number of Gaussian clusters into superclusters that are optimal from the point of view of their statistical separability. The algorithm consists of three stages: representing the dataset as a mixture of Gaussian distributions (clusters), whose number is determined by the minimum of the BIC criterion; using the Mahalanobis distance to estimate the distances between the clusters and the cluster sizes; and combining the resulting clusters into superclusters using the DBSCAN method, by finding the hyperparameter (maximum distance) that provides the maximum value of an introduced matrix quality criterion at the maximum number of superclusters. The matrix quality criterion corresponds to the proportion of statistically significantly separated superclusters among all found superclusters. The algorithm has only one hyperparameter, the statistical significance level, and automatically detects the optimal number and shape of superclusters based on a statistical hypothesis testing approach. The algorithm demonstrates good results on test datasets in noisy and noiseless situations. An essential advantage of the algorithm is its ability to predict the correct supercluster for new data based on an already trained clusterer and to perform soft (fuzzy) clustering. Its disadvantages are its low speed and the stochastic nature of the final clustering. It requires a sufficiently large dataset for clustering, which is typical of many statistical methods.
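
A condensed sketch of the three stages using scikit-learn (assumed tooling; the released GMSDB code may differ): BIC-minimal Gaussian mixture, Mahalanobis-style distances between cluster means, and DBSCAN over the precomputed distances. The statistical-significance search for the DBSCAN radius is omitted, and `eps` is passed in directly.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import DBSCAN

def fit_optimal_gmm(X, k_max=20):
    """Stage 1: choose the number of Gaussian clusters by minimum BIC."""
    models = [GaussianMixture(k, random_state=0).fit(X) for k in range(1, k_max + 1)]
    return min(models, key=lambda m: m.bic(X))

def superclusters(X, eps):
    """Stages 2-3 (simplified): Mahalanobis distances between cluster means,
    then DBSCAN merges clusters into superclusters."""
    gmm = fit_optimal_gmm(X)
    means, covs = gmm.means_, gmm.covariances_
    n = len(means)
    D = np.zeros((n, n))
    for i in range(n):
        prec = np.linalg.inv(covs[i])
        for j in range(n):
            d = means[j] - means[i]
            D[i, j] = np.sqrt(d @ prec @ d)
    D = np.maximum(D, D.T)  # symmetrize the asymmetric Mahalanobis distances
    return gmm, DBSCAN(eps=eps, min_samples=1, metric="precomputed").fit_predict(D)
```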

Generative AI-aided Joint Training-free Secure Semantic Communications via Multi-modal Prompts

  • paper_url: http://arxiv.org/abs/2309.02616
  • repo_url: None
  • paper_authors: Hongyang Du, Guangyuan Liu, Dusit Niyato, Jiayi Zhang, Jiawen Kang, Zehui Xiong, Bo Ai, Dong In Kim
  • for: Improving network resource efficiency while achieving the communication goal in semantic communication (SemCom).
  • methods: Generative AI (GAI) models strengthen the semantic decoder's ability to reconstruct source messages without joint training with the semantic encoder, with multi-modal prompts enabling accurate content decoding; covert communications aided by a friendly jammer address security concerns.
  • results: The system jointly optimizes the diffusion step, jamming, and transmit power, enabling successful and secure transmission of the source messages while using network resources efficiently.
    Abstract Semantic communication (SemCom) holds promise for reducing network resource consumption while achieving the communications goal. However, the computational overheads of jointly training semantic encoders and decoders, and of subsequently deploying them in network devices, are often overlooked. Recent advances in Generative artificial intelligence (GAI) offer a potential solution. The robust learning abilities of GAI models indicate that semantic decoders can reconstruct source messages using a limited amount of semantic information, e.g., prompts, without joint training with the semantic encoder. A notable challenge, however, is the instability introduced by GAI's diverse generation ability. This instability, evident in outputs like text-generated images, limits the direct application of GAI in scenarios demanding accurate message recovery, such as face image transmission. To solve the above problems, this paper proposes a GAI-aided SemCom system with multi-model prompts for accurate content decoding. Moreover, in response to security concerns, we introduce the application of covert communications aided by a friendly jammer. The system jointly optimizes the diffusion step, jamming, and transmitting power with the aid of generative diffusion models, enabling successful and secure transmission of the source messages.

Generative Algorithms for Fusion of Physics-Based Wildfire Spread Models with Satellite Data for Initializing Wildfire Forecasts

  • paper_url: http://arxiv.org/abs/2309.02615
  • repo_url: https://github.com/bshaddy/cWGAN_fire_arrival_time_inference
  • paper_authors: Bryan Shaddy, Deep Ray, Angel Farguell, Valentina Calaza, Jan Mandel, James Haley, Kyle Hilburn, Derek V. Mallia, Adam Kochanski, Assad Oberai
  • for: Developing high-resolution wildfire spread forecasts by initializing coupled atmosphere-wildfire models from a wildfire state measured by satellite.
  • methods: A conditional Wasserstein Generative Adversarial Network (cWGAN), trained on WRF-SFIRE simulations, infers the fire arrival time (the time the fire reaches a given spatial location) from satellite active fire data and produces samples used to assess prediction uncertainty.
  • results: On four California wildfires occurring between 2020 and 2022, the method is highly accurate, with an average Sorensen's coefficient of 0.81 for the fire perimeters and an average ignition time error of 32 minutes.
    Abstract Increases in wildfire activity and the resulting impacts have prompted the development of high-resolution wildfire behavior models for forecasting fire spread. Recent progress in using satellites to detect fire locations further provides the opportunity to use measurements to improve fire spread forecasts from numerical models through data assimilation. This work develops a method for inferring the history of a wildfire from satellite measurements, providing the necessary information to initialize coupled atmosphere-wildfire models from a measured wildfire state in a physics-informed approach. The fire arrival time, which is the time the fire reaches a given spatial location, acts as a succinct representation of the history of a wildfire. In this work, a conditional Wasserstein Generative Adversarial Network (cWGAN), trained with WRF-SFIRE simulations, is used to infer the fire arrival time from satellite active fire data. The cWGAN is used to produce samples of likely fire arrival times from the conditional distribution of arrival times given satellite active fire detections. Samples produced by the cWGAN are further used to assess the uncertainty of predictions. The cWGAN is tested on four California wildfires occurring between 2020 and 2022, and predictions for fire extent are compared against high resolution airborne infrared measurements. Further, the predicted ignition times are compared with reported ignition times. An average Sorensen's coefficient of 0.81 for the fire perimeters and an average ignition time error of 32 minutes suggest that the method is highly accurate.

T-SaS: Toward Shift-aware Dynamic Adaptation for Streaming Data

  • paper_url: http://arxiv.org/abs/2309.02610
  • repo_url: None
  • paper_authors: Weijieying Ren, Tianxiang Zhao, Wei Qin, Kunpeng Liu
  • for: Modeling streaming data whose distribution shifts abruptly over time without precursors, where the boundaries of the shifts are unavailable.
  • methods: A Bayesian framework, T-SaS, uses a discrete distribution-modeling variable to capture abrupt shifts, plus a dynamic network selection strategy conditioned on that variable: each distribution learns which neurons to activate in the full network, with a dynamic masking strategy supporting inter-distribution transfer through overlapping sparse networks.
  • results: Experiments show the method is superior both in accurately detecting shift boundaries to obtain segments of varying distributions and in adapting effectively to downstream forecasting or classification tasks.
    Abstract In many real-world scenarios, distribution shifts exist in streaming data across time steps. Many complex sequential datasets can be effectively divided into distinct regimes that exhibit persistent dynamics. Discovering the shifted behaviors and the evolving patterns underlying the streaming data is important for understanding the dynamic system. Existing methods typically train one robust model to work for the evolving data of distinct distributions or sequentially adapt the model utilizing explicitly given regime boundaries. However, there are two challenges: (1) shifts in data streams can happen drastically and abruptly without precursors, and boundaries of distribution shifts are usually unavailable; and (2) training a shared model for all domains can fail to capture varying patterns. This paper aims to solve the problem of sequential data modeling in the presence of sudden distribution shifts that occur without any precursors. Specifically, we design a Bayesian framework, dubbed T-SaS, with a discrete distribution-modeling variable to capture abrupt shifts of data. Then, we design a model that enables adaptation with dynamic network selection conditioned on that discrete variable. The proposed method learns specific model parameters for each distribution by learning which neurons should be activated in the full network. A dynamic masking strategy is adopted here to support inter-distribution transfer through the overlapping of a set of sparse networks. Extensive experiments show that our proposed method is superior in both accurately detecting shift boundaries to get segments of varying distributions and effectively adapting to downstream forecast or classification tasks.
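
A minimal sketch of the dynamic masking idea: a per-regime gate decides which neurons of a shared layer are active, so regimes correspond to overlapping sparse subnetworks of one full network. The soft sigmoid gate is an illustrative choice; a real implementation would harden or sparsify it.

```python
import torch
import torch.nn as nn

class RegimeGatedLinear(nn.Module):
    """Shared layer with a learned mask per regime: each regime activates a
    subset of neurons, and overlapping masks enable inter-regime transfer."""
    def __init__(self, d_in, d_out, n_regimes):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
        self.gate_logits = nn.Parameter(torch.zeros(n_regimes, d_out))

    def forward(self, x, regime):
        mask = torch.sigmoid(self.gate_logits[regime])  # soft mask over neurons
        return self.lin(x) * mask

layer = RegimeGatedLinear(16, 32, n_regimes=4)
y = layer(torch.randn(8, 16), regime=2)  # activations gated by regime 2's mask
```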

Distributed Variational Inference for Online Supervised Learning

  • paper_url: http://arxiv.org/abs/2309.02606
  • repo_url: https://github.com/pptx/distributed-mapping
  • paper_authors: Parth Paritosh, Nikolay Atanasov, Sonia Martinez
  • for: This paper proposes a scalable distributed probabilistic inference algorithm for sensor networks, applicable to continuous variables, intractable posteriors, and large-scale real-time data.
  • methods: The key contribution is a separable lower bound on the centralized variational estimation objective, which enables distributed variational inference with one-hop communication; an online distributed algorithm maximizes the resulting distributed evidence lower bound (DELBO) for streaming classification and regression, specialized to Gaussian variational densities with non-linear likelihoods.
  • results: The resulting distributed Gaussian variational inference (DGVI) efficiently inverts a 1-rank correction to the covariance matrix; a diagonalized version handles online distributed inference in high-dimensional models and is applied to multi-robot probabilistic mapping with indoor LiDAR data.
    Abstract Developing efficient solutions for inference problems in intelligent sensor networks is crucial for the next generation of location, tracking, and mapping services. This paper develops a scalable distributed probabilistic inference algorithm that applies to continuous variables, intractable posteriors and large-scale real-time data in sensor networks. In a centralized setting, variational inference is a fundamental technique for performing approximate Bayesian estimation, in which an intractable posterior density is approximated with a parametric density. Our key contribution lies in the derivation of a separable lower bound on the centralized estimation objective, which enables distributed variational inference with one-hop communication in a sensor network. Our distributed evidence lower bound (DELBO) consists of a weighted sum of observation likelihood and divergence to prior densities, and its gap to the measurement evidence is due to consensus and modeling errors. To solve binary classification and regression problems while handling streaming data, we design an online distributed algorithm that maximizes DELBO, and specialize it to Gaussian variational densities with non-linear likelihoods. The resulting distributed Gaussian variational inference (DGVI) efficiently inverts a $1$-rank correction to the covariance matrix. Finally, we derive a diagonalized version for online distributed inference in high-dimensional models, and apply it to multi-robot probabilistic mapping using indoor LiDAR data.

Screening of Pneumonia and Urinary Tract Infection at Triage using TriNet

  • paper_url: http://arxiv.org/abs/2309.02604
  • repo_url: None
  • paper_authors: Stephen Z. Lu
  • for: Addressing emergency department overcrowding and inefficiency by automating first-line screening at triage.
  • methods: TriNet, a machine learning model for medical directives, is trained on hospital triage data to screen for conditions requiring downstream testing for diagnosis confirmation.
  • results: TriNet achieves high positive predictive values in detecting pneumonia (0.86) and urinary tract infection (0.93), outperforming current clinical benchmarks and indicating that machine-learning medical directives can offer cost-free, non-invasive screening that reduces the risk of over-testing while improving emergency department efficiency.
    Abstract Due to the steady rise in population demographics and longevity, emergency department visits are increasing across North America. As more patients visit the emergency department, traditional clinical workflows become overloaded and inefficient, leading to prolonged wait-times and reduced healthcare quality. One such workflow is the triage medical directive, impeded by limited human workload, inaccurate diagnoses, and invasive over-testing. To address this issue, we propose TriNet: a machine learning model for medical directives that automates first-line screening at triage for conditions requiring downstream testing for diagnosis confirmation. To verify screening potential, TriNet was trained on hospital triage data and achieved high positive predictive values in detecting pneumonia (0.86) and urinary tract infection (0.93). These models outperform current clinical benchmarks, indicating that machine-learning medical directives can offer cost-free, non-invasive screening with high specificity for common conditions, reducing the risk of over-testing while increasing emergency department efficiency.

Causal Structure Recovery of Linear Dynamical Systems: An FFT based Approach

  • paper_url: http://arxiv.org/abs/2309.02571
  • repo_url: None
  • paper_authors: Mishfad Shaikh Veedu, James Melbourne, Murti V. Salapaka
  • for: This work studies identifying causal effects from time-series observations when dynamical dependencies exist between entities across time.
  • methods: A method reduces the computational complexity of recovering the causation structure from $O(Tn^3N^2)$ to $O(Tn^3 \log N)$ by working with frequency-domain (FD) representations of the time-series.
  • results: For LTI systems, do-calculus machinery can be realized in the FD, yielding versions of the classical single-door (with cycles), front-door and backdoor criteria; graph reconstruction via multivariate Wiener projections achieves $O(n)$ complexity, compared with $O(n^q)$ for algorithms such as the PC algorithm.
    Abstract Learning causal effects from data is a fundamental and well-studied problem across science, especially when the cause-effect relationship is static in nature. However, causal effects are less explored when there are dynamical dependencies, i.e., when dependencies exist between entities across time. Identifying dynamic causal effects from time-series observations is computationally expensive compared to the static scenario. We demonstrate that the computational complexity of recovering the causation structure for the vector auto-regressive (VAR) model is $O(Tn^3N^2)$, where $n$ is the number of nodes, $T$ is the number of samples, and $N$ is the largest time-lag in the dependency between entities. We report a method, with a reduced complexity of $O(Tn^3 \log N)$, that recovers the causation structure using frequency-domain (FD) representations of the time-series. Since the FFT accumulates all the time dependencies on every frequency, causal inference can be performed efficiently by considering the state variables as random variables at any given frequency. We additionally show that, for systems with interactions that are LTI, do-calculus machinery can be realized in the FD, resulting in versions of the classical single-door (with cycles), front-door and backdoor criteria. We demonstrate, for a large class of problems, that graph reconstruction using multivariate Wiener projections results in a significant computational advantage with $O(n)$ complexity over reconstruction algorithms such as the PC algorithm, which has $O(n^q)$ complexity, where $q$ is the maximum neighborhood size. This advantage accrues due to some remarkable properties of the phase response of the frequency-dependent Wiener coefficients which are not present in any time-domain approach.
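
The frequency-domain Wiener projection at the heart of the reconstruction step can be sketched with Welch cross-spectra: $W(f) = S_{xx}(f)^{-1} s_{xy}(f)$, solved independently at each frequency. This is a generic estimator, not the paper's full pipeline.

```python
import numpy as np
from scipy.signal import csd

def wiener_coefficients(x, y, fs=1.0, nperseg=256):
    """Frequency-domain multivariate Wiener projection of y onto the
    channels of x. x: (channels, samples), y: (samples,). Nonzero entries
    (and their phase response) indicate edges in the recovered graph."""
    n = x.shape[0]
    f, _ = csd(x[0], y, fs=fs, nperseg=nperseg)
    Sxx = np.empty((len(f), n, n), dtype=complex)   # cross-spectral matrix per freq
    sxy = np.empty((len(f), n), dtype=complex)
    for i in range(n):
        _, sxy[:, i] = csd(x[i], y, fs=fs, nperseg=nperseg)
        for j in range(n):
            _, Sxx[:, i, j] = csd(x[i], x[j], fs=fs, nperseg=nperseg)
    W = np.linalg.solve(Sxx, sxy[..., None])[..., 0]  # batched solve per frequency
    return f, W
```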

Sparse Partitioning Around Medoids

  • paper_url: http://arxiv.org/abs/2309.02557
  • repo_url: None
  • paper_authors: Lars Lenssen, Erich Schubert
  • For: This paper addresses clustering with Partitioning Around Medoids (PAM, k-medoids), making it viable for sparse, asymmetric problems such as graph data (e.g., road networks).
  • Methods: A sparse and asymmetric variant of PAM/FastPAM exploits sparsity to avoid quadratic runtime and memory; k is determined as part of the optimization by constructing a greedy initial solution with a larger k and then alternating PAM-style "swap" operations with "remove" operations.
  • Results: The method provides high-quality clustering solutions in a practical application while avoiding excessive memory use and computation, demonstrated on a problem from electrical engineering with an input graph derived from cartographic data.
    Abstract Partitioning Around Medoids (PAM, k-Medoids) is a popular clustering technique to use with arbitrary distance functions or similarities, where each cluster is represented by its most central object, called the medoid or the discrete median. In operations research, this family of problems is also known as the facility location problem (FLP). FastPAM recently introduced a speedup for large k to make it applicable for larger problems, but the method still has a runtime quadratic in N. In this chapter, we discuss a sparse and asymmetric variant of this problem, to be used for example on graph data such as road networks. By exploiting sparsity, we can avoid the quadratic runtime and memory requirements, and make this method scalable to even larger problems, as long as we are able to build a small enough graph of sufficient connectivity to perform local optimization. Furthermore, we consider asymmetric cases, where the set of medoids is not identical to the set of points to be covered (or, in the facility location interpretation, where the possible facility locations are not identical to the consumer locations). Because of sparsity, it may be impossible to cover all points with just k medoids for too small a k, which would render the problem unsolvable, and this breaks common heuristics for finding a good starting condition. We hence consider determining k as part of the optimization problem and propose to first construct a greedy initial solution with a larger k, then to optimize the problem by alternating between PAM-style "swap" operations, where the result is improved by replacing medoids with better alternatives, and "remove" operations to reduce k until neither allows further improving the result quality. We demonstrate the usefulness of this method on a problem from electrical engineering, with the input graph derived from cartographic data.
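
A dense-matrix sketch of the alternating "swap"/"remove" scheme described above. The per-medoid penalty `lam` is an illustrative stand-in for the paper's quality criterion for choosing k, and the sparsity handling is omitted for clarity.

```python
import numpy as np

def cost(D, medoids, lam):
    """PAM objective (total deviation) plus an illustrative per-medoid
    penalty lam, so that 'remove' steps can pay off."""
    return D[:, medoids].min(axis=1).sum() + lam * len(medoids)

def swap_and_remove(D, medoids, lam=1.0):
    """Alternate PAM-style swaps at fixed k with remove steps that shrink k,
    looping until neither move improves the objective."""
    medoids, n = list(medoids), D.shape[0]
    best, improved = cost(D, medoids, lam), True
    while improved:
        improved = False
        for i in range(len(medoids)):                       # swap phase
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                if cost(D, trial, lam) < best:
                    medoids, best, improved = trial, cost(D, trial, lam), True
        for i in range(len(medoids)):                       # remove phase
            trial = medoids[:i] + medoids[i + 1:]
            if trial and cost(D, trial, lam) < best:
                medoids, best, improved = trial, cost(D, trial, lam), True
                break
    return medoids
```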

Data Aggregation for Hierarchical Clustering

  • paper_url: http://arxiv.org/abs/2309.02552
  • repo_url: https://github.com/elki-project/elki
  • paper_authors: Erich Schubert, Andreas Lang
  • for: Using Hierarchical Agglomerative Clustering (HAC), which normally requires a full distance matrix (quadratic memory) and cubic runtime, on resource-constrained systems.
  • methods: Data aggregation with BETULA, a numerically stable variant of the well-known BIRCH aggregation algorithm, is applied before HAC to make it feasible under constrained resources.
  • results: HAC becomes viable on systems with constrained resources with only small losses in clustering quality, enabling exploratory data analysis of very large datasets.
    Abstract Hierarchical Agglomerative Clustering (HAC) is likely the earliest and most flexible clustering method, because it can be used with many distances, similarities, and various linkage strategies. It is often used when the number of clusters the data set forms is unknown and some sort of hierarchy in the data is plausible. Most algorithms for HAC operate on a full distance matrix, and therefore require quadratic memory. The standard algorithm also has cubic runtime to produce a full hierarchy. Both memory and runtime are especially problematic in the context of embedded or otherwise very resource-constrained systems. In this section, we present how data aggregation with BETULA, a numerically stable version of the well known BIRCH data aggregation algorithm, can be used to make HAC viable on systems with constrained resources with only small losses on clustering quality, and hence allow exploratory data analysis of very large data sets.
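
A toy stand-in for the aggregation-then-HAC pipeline: points are greedily absorbed into (count, linear sum) summaries within a radius, and HAC runs only on the resulting centers. Real BETULA cluster features also track variance and are numerically stabler than this sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def aggregate(X, threshold):
    """Greedy BIRCH/BETULA-style aggregation: absorb each point into the
    nearest summary within `threshold`, else open a new summary."""
    counts, sums = [], []
    for x in X:
        if counts:
            centers = np.array(sums) / np.array(counts)[:, None]
            j = np.argmin(np.linalg.norm(centers - x, axis=1))
            if np.linalg.norm(centers[j] - x) <= threshold:
                counts[j] += 1
                sums[j] = sums[j] + x
                continue
        counts.append(1)
        sums.append(x.astype(float).copy())
    return np.array(sums) / np.array(counts)[:, None], np.array(counts)

X = np.random.default_rng(0).normal(size=(10000, 2))
centers, weights = aggregate(X, threshold=0.2)  # far fewer summaries than points
Z = linkage(centers, method="ward")             # HAC on aggregated data only
labels = fcluster(Z, t=3, criterion="maxclust")
```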

A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

  • paper_url: http://arxiv.org/abs/2309.02539
  • repo_url: None
  • paper_authors: Karn N. Watcharasupat, Chih-Wei Wu, Yiwei Ding, Iroro Orife, Aaron J. Hipple, Phillip A. Williams, Scott Kramer, Alexander Lerch, William Wolcott
  • for: Extracting the dialogue, music, and effects stems from their mixture in cinematic audio source separation.
  • methods: A Bandsplit-RNN-based model generalized to any complete or overcomplete partition of the frequency axis, with psycho-acoustically motivated band definitions (defined with redundancy for more reliable feature extraction), a loss motivated by the signal-to-noise ratio and the sparsity-promoting 1-norm, and a common-encoder setup that shares information across stems.
  • results: On the Divide and Remaster dataset, the model reaches separation performance above the ideal ratio mask for the dialogue stem, setting the state of the art.
    Abstract Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue stem, the music stem, and the effects stem from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psycho-acoustically motivated frequency scales were used to inform the band definitions which are now defined with redundancy for more reliable feature extraction. A loss function motivated by the signal-to-noise ratio and the sparsity-promoting property of the 1-norm was proposed. We additionally exploit the information-sharing property of a common-encoder setup to reduce computational complexity during both training and inference, improve separation performance for hard-to-generalize classes of sounds, and allow flexibility during inference time with easily detachable decoders. Our best model sets the state of the art on the Divide and Remaster dataset with performance above the ideal ratio mask for the dialogue stem.
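
One plausible form of the loss idea named above (an assumption, not the paper's definition): an L1 error normalized by target energy per stem, so quiet stems such as sparse effects still contribute to training.

```python
import torch

def snr_weighted_l1(est, target, eps=1e-8):
    """SNR-motivated, sparsity-promoting 1-norm loss (assumed form).
    est, target: (stems, ...) spectrogram tensors."""
    num = (est - target).abs().flatten(1).sum(dim=1)  # L1 error per stem
    den = target.abs().flatten(1).sum(dim=1) + eps    # stem energy normalizer
    return (num / den).mean()
```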

Diffusion on the Probability Simplex

  • paper_url: http://arxiv.org/abs/2309.02530
  • repo_url: None
  • paper_authors: Griffin Floto, Thorsteinn Jonsson, Mihai Nica, Scott Sanner, Eric Zhengyu Zhu
  • for: Learning to reverse the progressive noising of a data distribution to create a generative model, reconciling the continuous noising process with discrete data.
  • methods: Diffusion is performed on the probability simplex by applying the softmax function to an Ornstein-Uhlenbeck process, so that points naturally correspond to categorical probability distributions.
  • results: The method extends naturally to diffusion on the unit cube, with applications to bounded image generation.
    Abstract Diffusion models learn to reverse the progressive noising of a data distribution to create a generative model. However, the desired continuous nature of the noising process can be at odds with discrete data. To deal with this tension between continuous and discrete objects, we propose a method of performing diffusion on the probability simplex. Using the probability simplex naturally creates an interpretation where points correspond to categorical probability distributions. Our method uses the softmax function applied to an Ornstein-Uhlenbeck process, a well-known stochastic differential equation. We find that our methodology also naturally extends to include diffusion on the unit cube, which has applications for bounded image generation.
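
The forward noising process described above can be sketched directly: simulate an Ornstein-Uhlenbeck process with Euler-Maruyama steps and push each state through softmax, so every noised sample is a categorical distribution on the simplex. Schedule values are placeholders.

```python
import torch

def ou_paths_on_simplex(n_steps=1000, dim=3, theta=1.0, sigma=1.0, dt=1e-3):
    """Euler-Maruyama simulation of an Ornstein-Uhlenbeck process in R^dim,
    mapped through softmax so every state lies on the probability simplex."""
    x = torch.zeros(dim)
    path = []
    for _ in range(n_steps):
        # OU update: mean reversion toward 0 plus Gaussian noise
        x = x - theta * x * dt + sigma * (dt ** 0.5) * torch.randn(dim)
        path.append(torch.softmax(x, dim=-1))
    return torch.stack(path)  # (n_steps, dim); each row sums to 1
```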

Adaptive Adversarial Training Does Not Increase Recourse Costs

  • paper_url: http://arxiv.org/abs/2309.02528
  • repo_url: None
  • paper_authors: Ian Hardy, Jayanth Yetukuri, Yang Liu
  • for: Investigating the effects of adaptive adversarial training on algorithmic recourse costs.
  • methods: Adversarial training with adaptive, instance-wise training radii, studied jointly with the cost of generated recourse.
  • results: The improvements in model robustness induced by adaptive adversarial training show little effect on algorithmic recourse costs, suggesting an avenue for affordable robustness in domains where recoursability is critical.
    Abstract Recent work has connected adversarial attack methods and algorithmic recourse methods: both seek minimal changes to an input instance which alter a model's classification decision. It has been shown that traditional adversarial training, which seeks to minimize a classifier's susceptibility to malicious perturbations, increases the cost of generated recourse; with larger adversarial training radii correlating with higher recourse costs. From the perspective of algorithmic recourse, however, the appropriate adversarial training radius has always been unknown. Another recent line of work has motivated adversarial training with adaptive training radii to address the issue of instance-wise variable adversarial vulnerability, showing success in domains with unknown attack radii. This work studies the effects of adaptive adversarial training on algorithmic recourse costs. We establish that the improvements in model robustness induced by adaptive adversarial training show little effect on algorithmic recourse costs, providing a potential avenue for affordable robustness in domains where recoursability is critical.

Comparative Analysis of CPU and GPU Profiling for Deep Learning Models

  • paper_url: http://arxiv.org/abs/2309.02521
  • repo_url: None
  • paper_authors: Dipesh Gyawali
  • for: Analyzing the resource allocation and consumption of the CPU and GPU while training deep neural networks.
  • methods: Deep learning models implemented with the PyTorch framework, tracing the operations executed on both GPU and CPU to profile time and memory allocation.
  • results: The GPU has a lower running time than the CPU for deep neural networks; for simpler networks, the GPU offers no significant improvement over the CPU.
    Abstract Deep Learning (DL) and Machine Learning (ML) applications are rapidly increasing in recent days. Massive amounts of data are being generated over the internet, from which meaningful results can be derived by the use of ML and DL algorithms. Hardware resources and open-source libraries have made it easy to implement these algorithms. TensorFlow and PyTorch are among the leading frameworks for implementing ML projects. By using those frameworks, we can trace the operations executed on both GPU and CPU to analyze the resource allocations and consumption. This paper presents the time and memory allocation of CPU and GPU while training deep neural networks using PyTorch. This paper's analysis shows that GPU has a lower running time as compared to CPU for deep neural networks. For a simpler network, there are not many significant improvements in GPU over the CPU.
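
A minimal sketch of the kind of CPU/GPU trace the paper analyzes, using the standard torch.profiler API; the model and sizes are placeholders:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
x = torch.randn(64, 512, device=device)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, profile_memory=True) as prof:
    loss = model(x).sum()
    loss.backward()

# Per-operator time and memory, sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```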

Towards User Guided Actionable Recourse

  • paper_url: http://arxiv.org/abs/2309.02517
  • repo_url: None
  • paper_authors: Jayanth Yetukuri, Ian Hardy, Yang Liu
  • for: This paper aims to provide actionable recourse to negatively impacted users in machine learning models, with a focus on capturing user preferences via soft constraints.
  • methods: The paper proposes using three simple forms of soft constraints to capture user preferences: scoring continuous features, bounding feature values, and ranking categorical features. Additionally, the paper proposes a gradient-based approach to identify User Preferred Actionable Recourse (UP-AR).
  • results: The paper conducts extensive experiments to verify the effectiveness of the proposed approach.
    Abstract Machine Learning's proliferation in critical fields such as healthcare, banking, and criminal justice has motivated the creation of tools which ensure trust and transparency in ML models. One such tool is Actionable Recourse (AR) for negatively impacted users. AR describes recommendations of cost-efficient changes to a user's actionable features to help them obtain favorable outcomes. Existing approaches for providing recourse optimize for properties such as proximity, sparsity, validity, and distance-based costs. However, an often-overlooked but crucial requirement for actionability is a consideration of User Preference to guide the recourse generation process. In this work, we attempt to capture user preferences via soft constraints in three simple forms: i) scoring continuous features, ii) bounding feature values and iii) ranking categorical features. Finally, we propose a gradient-based approach to identify User Preferred Actionable Recourse (UP-AR). We carried out extensive experiments to verify the effectiveness of our approach.
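
A hedged sketch of gradient-based recourse with soft user preferences, in the spirit of UP-AR but not the authors' exact algorithm; the logistic model, the `pref_weights` cost vector, and the 0.1 trade-off weight are all illustrative:

```python
import torch

torch.manual_seed(0)
w = torch.randn(5)                                      # a fixed "trained" classifier
model = lambda x: torch.sigmoid(x @ w)                  # P(favorable outcome)

x0 = torch.randn(5)                                     # the rejected instance
pref_weights = torch.tensor([1.0, 5.0, 1.0, 0.5, 2.0])  # user's per-feature change costs

x = x0.clone().requires_grad_(True)
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    move_cost = (pref_weights * (x - x0).abs()).sum()   # soft preference constraint
    loss = -model(x) + 0.1 * move_cost                  # favorable outcome vs. effort
    loss.backward()
    opt.step()

print(model(x0).item(), "->", model(x.detach()).item())
```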

Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework

  • paper_url: http://arxiv.org/abs/2309.02428
  • repo_url: https://github.com/mhelal/TensorsPyBook
  • paper_authors: Manal Helal
  • for: A comprehensive overview of tensorization, which bridges the gap between the inherently multidimensional nature of data and the simplified 2-dimensional matrices commonly used in linear algebra-based machine learning.
  • methods: A survey of multiway analysis methods and their integration with deep neural network models across domains, illustrated with a small Blind Source Separation (BSS) example comparing 2-dimensional algorithms and a multiway algorithm in Python.
  • results: Using multidimensional datasets in their native form with multiway analysis methods grounded in multilinear algebra captures intricate interrelationships among dimensions while reducing the number of model parameters and accelerating processing.
    Abstract The burgeoning growth of public domain data and the increasing complexity of deep learning model architectures have underscored the need for more efficient data representation and analysis techniques. This paper is motivated by the work of Helal (2023) and aims to present a comprehensive overview of tensorization. This transformative approach bridges the gap between the inherently multidimensional nature of data and the simplified 2-dimensional matrices commonly used in linear algebra-based machine learning algorithms. This paper explores the steps involved in tensorization, multidimensional data sources, various multiway analysis methods employed, and the benefits of these approaches. A small example of Blind Source Separation (BSS) is presented comparing 2-dimensional algorithms and a multiway algorithm in Python. Results indicate that multiway analysis is more expressive. Contrary to the intuition of the curse of dimensionality, utilising multidimensional datasets in their native form and applying multiway analysis methods grounded in multilinear algebra reveal a profound capacity to capture intricate interrelationships among various dimensions while, surprisingly, reducing the number of model parameters and accelerating processing. A survey of the multiway analysis methods and integration with various Deep Neural Network models is presented using case studies in different domains.
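
As a concrete taste of the multiway analysis the survey covers, a hedged sketch of a CP (PARAFAC) decomposition with the tensorly library; this is not the paper's specific BSS example, and the shapes and rank are illustrative:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
tensor = tl.tensor(rng.standard_normal((10, 8, 6)))   # a 3-way array in native form

cp = parafac(tensor, rank=3)                 # CP decomposition: (weights, [A, B, C])
reconstruction = tl.cp_to_tensor(cp)
rel_err = tl.norm(tensor - reconstruction) / tl.norm(tensor)
print(f"relative reconstruction error: {rel_err:.3f}")
```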

Monotone Tree-Based GAMI Models by Adapting XGBoost

  • paper_url: http://arxiv.org/abs/2309.02426
  • repo_url: None
  • paper_authors: Linwei Hu, Soroush Aramideh, Jie Chen, Vijayan N. Nair
  • for: Developing a monotone tree-based functional ANOVA model, called monotone GAMI-Tree, to incorporate the monotonicity requirement into GAMI models of the form $f(x)=\sum_{j,k}f_{j,k}(x_j, x_k)$.
  • methods: A filtering technique to select the important interactions, a monotone XGBoost fit restricted to the selected interactions, and a final parsing and purification step that yields a monotone GAMI model.
  • results: Simulated datasets demonstrate the behaviors of mono-GAMI-Tree and EBM, both of which use piecewise constant fits; the monotonicity requirement applies to the full model, and while the main effects can also be monotone under certain situations, the interactions need not be.
    Abstract Recent papers have used machine learning architecture to fit low-order functional ANOVA models with main effects and second-order interactions. These GAMI (GAM + Interaction) models are directly interpretable as the functional main effects and interactions can be easily plotted and visualized. Unfortunately, it is not easy to incorporate the monotonicity requirement into the existing GAMI models based on boosted trees, such as EBM (Lou et al. 2013) and GAMI-Lin-T (Hu et al. 2022). This paper considers models of the form $f(x)=\sum_{j,k}f_{j,k}(x_j, x_k)$ and develops monotone tree-based GAMI models, called monotone GAMI-Tree, by adapting the XGBoost algorithm. It is straightforward to fit a monotone model to $f(x)$ using the options in XGBoost. However, the fitted model is still a black box. We take a different approach: i) use a filtering technique to determine the important interactions, ii) fit a monotone XGBoost algorithm with the selected interactions, and finally iii) parse and purify the results to get a monotone GAMI model. Simulated datasets are used to demonstrate the behaviors of mono-GAMI-Tree and EBM, both of which use piecewise constant fits. Note that the monotonicity requirement is for the full model. Under certain situations, the main effects will also be monotone. But, as seen in the examples, the interactions will not be monotone.
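
A hedged sketch of the two XGBoost options the paper adapts, monotone constraints and interaction constraints; the constraint choices below are illustrative, and the paper's filtering and purification steps are omitted:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 3))
y = X[:, 0] + np.sin(3 * X[:, 1]) * X[:, 2] + 0.1 * rng.standard_normal(2000)

model = xgb.XGBRegressor(
    n_estimators=200,
    max_depth=2,
    monotone_constraints="(1,0,0)",         # increasing in feature 0
    interaction_constraints=[[0], [1, 2]],  # enforce an f(x0) + f(x1, x2) structure
)
model.fit(X, y)
print(model.predict(X[:5]))
```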

On the Minimax Regret in Online Ranking with Top-k Feedback

  • paper_url: http://arxiv.org/abs/2309.02425
  • repo_url: None
  • paper_authors: Mingyuan Zhang, Ambuj Tewari
  • for: Online ranking with top-$k$ feedback.
  • methods: Partial monitoring techniques to analyze minimax regret rates.
  • results: A full characterization of minimax regret rates for all $k$ under Pairwise Loss, Discounted Cumulative Gain, and Precision@n, together with an efficient algorithm achieving the minimax regret rate for Precision@n.
    Abstract In online ranking, a learning algorithm sequentially ranks a set of items and receives feedback on its ranking in the form of relevance scores. Since obtaining relevance scores typically involves human annotation, it is of great interest to consider a partial feedback setting where feedback is restricted to the top-$k$ items in the rankings. Chaudhuri and Tewari [2017] developed a framework to analyze online ranking algorithms with top $k$ feedback. A key element in their work was the use of techniques from partial monitoring. In this paper, we further investigate online ranking with top $k$ feedback and solve some open problems posed by Chaudhuri and Tewari [2017]. We provide a full characterization of minimax regret rates with the top $k$ feedback model for all $k$ and for the following ranking performance measures: Pairwise Loss, Discounted Cumulative Gain, and Precision@n. In addition, we give an efficient algorithm that achieves the minimax regret rate for Precision@n.

Maximum Mean Discrepancy Meets Neural Networks: The Radon-Kolmogorov-Smirnov Test

  • paper_url: http://arxiv.org/abs/2309.02422
  • repo_url: None
  • paper_authors: Seunghoon Paik, Michael Celentano, Alden Green, Ryan J. Tibshirani
  • for: Introducing the Radon-Kolmogorov-Smirnov (RKS) test, a new two-sample test defined as the maximum mean discrepancy over the unit ball of the Radon bounded variation (RBV) space of a given smoothness order $k \geq 0$.
  • methods: The test maximizes the mean difference between samples from $P$ and $Q$ over this function space; the witness is always a ridge spline of degree $k$, i.e., a single neuron, so modern deep learning toolkits can (approximately) optimize the underlying criterion.
  • results: The RKS test has asymptotically full power at distinguishing any distinct pair $P \not= Q$ of distributions; its asymptotic null distribution is derived, and extensive experiments compare its strengths and weaknesses against the more traditional kernel MMD test.
    Abstract Maximum mean discrepancy (MMD) refers to a general class of nonparametric two-sample tests that are based on maximizing the mean difference over samples from one distribution $P$ versus another $Q$, over all choices of data transformations $f$ living in some function space $\mathcal{F}$. Inspired by recent work that connects what are known as functions of $\textit{Radon bounded variation}$ (RBV) and neural networks (Parhi and Nowak, 2021, 2023), we study the MMD defined by taking $\mathcal{F}$ to be the unit ball in the RBV space of a given smoothness order $k \geq 0$. This test, which we refer to as the $\textit{Radon-Kolmogorov-Smirnov}$ (RKS) test, can be viewed as a generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to multiple dimensions and higher orders of smoothness. It is also intimately connected to neural networks: we prove that the witness in the RKS test -- the function $f$ achieving the maximum mean difference -- is always a ridge spline of degree $k$, i.e., a single neuron in a neural network. This allows us to leverage the power of modern deep learning toolkits to (approximately) optimize the criterion that underlies the RKS test. We prove that the RKS test has asymptotically full power at distinguishing any distinct pair $P \not= Q$ of distributions, derive its asymptotic null distribution, and carry out extensive experiments to elucidate the strengths and weaknesses of the RKS test versus the more traditional kernel MMD test.
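
A hedged sketch of the inner optimization: fitting a single ridge-spline neuron $f(x) = \mathrm{ReLU}(w^\top x - b)^k$ to maximize the empirical mean difference, as the abstract describes. The normalization of $(w, b)$ below is a simplification of the RBV unit-ball constraint:

```python
import torch

def rks_witness(P, Q, k=1, steps=500, lr=0.05):
    """Optimize a single-neuron witness to maximize the mean difference
    between samples P and Q; returns the final empirical gap."""
    d = P.shape[1]
    w = torch.randn(d, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        norm = torch.sqrt(w.pow(2).sum() + b.pow(2))   # keep the neuron bounded
        f = lambda X: torch.relu(X @ w - b).pow(k) / norm
        gap = f(P).mean() - f(Q).mean()
        (-gap).backward()                              # maximize the mean difference
        opt.step()
    return gap.item()

P = torch.randn(500, 2)
Q = torch.randn(500, 2) + 0.5
print("empirical RKS-style statistic ~", rks_witness(P, Q))
```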

Computing SHAP Efficiently Using Model Structure Information

  • paper_url: http://arxiv.org/abs/2309.02417
  • repo_url: None
  • paper_authors: Linwei Hu, Ke Wang
  • for: Computing SHAP (SHapley Additive exPlanations) efficiently, since exact computation of Shapley values requires exponential time in general.
  • methods: Three strategies keyed to the available model structure information: computing SHAP from lower-order functional components when the functional decomposition is known, polynomial-time formulas when the order of the model (its highest order of interaction) is known, and an iterative approximation when even the order is unknown.
  • results: The first two methods yield exact SHAP values, all three are computationally efficient when the order of the model is not high, and simulation studies show advantages over the sampling approach of Castor & Gomez (2008).
    Abstract SHAP (SHapley Additive exPlanations) has become a popular method to attribute the prediction of a machine learning model on an input to its features. One main challenge of SHAP is the computation time. An exact computation of Shapley values requires exponential time complexity. Therefore, many approximation methods are proposed in the literature. In this paper, we propose methods that can compute SHAP exactly in polynomial time or even faster for SHAP definitions that satisfy our additivity and dummy assumptions (eg, kernal SHAP and baseline SHAP). We develop different strategies for models with different levels of model structure information: known functional decomposition, known order of model (defined as highest order of interaction in the model), or unknown order. For the first case, we demonstrate an additive property and a way to compute SHAP from the lower-order functional components. For the second case, we derive formulas that can compute SHAP in polynomial time. Both methods yield exact SHAP results. Finally, if even the order of model is unknown, we propose an iterative way to approximate Shapley values. The three methods we propose are computationally efficient when the order of model is not high which is typically the case in practice. We compare with sampling approach proposed in Castor & Gomez (2008) using simulation studies to demonstrate the efficacy of our proposed methods.
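
To make the known-decomposition case concrete, a sketch of the simplest instance: for a purely additive model $f(x)=\sum_j f_j(x_j)$, exact Shapley values follow in linear time from the functional components; the components and background data below are illustrative:

```python
import numpy as np

fs = [np.sin, np.square, np.tanh]                 # known functional components
background = np.random.default_rng(0).normal(size=(10_000, 3))

def additive_shap(x):
    # For additive models, phi_j(x) = f_j(x_j) - E[f_j(X_j)] exactly.
    return np.array([f(x[j]) - f(background[:, j]).mean()
                     for j, f in enumerate(fs)])

x = np.array([0.5, -1.0, 2.0])
phi = additive_shap(x)
f_x = sum(f(x[j]) for j, f in enumerate(fs))
f_base = sum(f(background[:, j]).mean() for j, f in enumerate(fs))
print(np.isclose(phi.sum(), f_x - f_base))        # Shapley efficiency holds
```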

First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians

  • paper_url: http://arxiv.org/abs/2309.02412
  • repo_url: None
  • paper_authors: Nikita Doikov, Geovani Nunes Grapiglia
  • for: solving general non-convex optimization problems
  • methods: Cubically regularized Newton method with finite difference approximations of the derivatives, and an adaptive search procedure that simultaneously fits the regularization constant and the parameters of the finite difference approximations
  • results: global complexity bound of $\mathcal{O}( n^{1/2} \epsilon^{-3/2})$ function and gradient evaluations for the Hessian-free method, and a bound of $\mathcal{O}( n^{3/2} \epsilon^{-3/2} )$ function evaluations for the derivative-free method, which significantly improve the previously known ones in terms of the joint dependence on $n$ and $\epsilon$.
    Abstract In this work, we develop first-order (Hessian-free) and zero-order (derivative-free) implementations of the Cubically regularized Newton method for solving general non-convex optimization problems. For that, we employ finite difference approximations of the derivatives. We use a special adaptive search procedure in our algorithms, which simultaneously fits both the regularization constant and the parameters of the finite difference approximations. It makes our schemes free from the need to know the actual Lipschitz constants. Additionally, we equip our algorithms with the lazy Hessian update that reuse a previously computed Hessian approximation matrix for several iterations. Specifically, we prove the global complexity bound of $\mathcal{O}( n^{1/2} \epsilon^{-3/2})$ function and gradient evaluations for our new Hessian-free method, and a bound of $\mathcal{O}( n^{3/2} \epsilon^{-3/2} )$ function evaluations for the derivative-free method, where $n$ is the dimension of the problem and $\epsilon$ is the desired accuracy for the gradient norm. These complexity bounds significantly improve the previously known ones in terms of the joint dependence on $n$ and $\epsilon$, for the first-order and zeroth-order non-convex optimization.

Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices

  • paper_url: http://arxiv.org/abs/2309.02411
  • repo_url: None
  • paper_authors: Bojia Zi, Xianbiao Qi, Lingzhi Wang, Jianan Wang, Kam-Fai Wong, Lei Zhang
  • for: Delta-LoRA, a novel parameter-efficient approach to fine-tuning large language models (LLMs).
  • methods: Beyond updating the low-rank matrices $\bA$ and $\bB$, the learning is propagated to the pre-trained weights $\bW$ via updates using the delta of the product of the two low-rank matrices ($\bA^{(t+1)}\bB^{(t+1)} - \bA^{(t)}\bB^{(t)}$), without computing gradients of $\bW$ or storing their momentums.
  • results: Delta-LoRA significantly outperforms LoRA and other low-rank adaptation methods on downstream tasks, while sharing comparable memory requirements and computational costs with LoRA.
    Abstract In this paper, we present Delta-LoRA, which is a novel parameter-efficient approach to fine-tune large language models (LLMs). In contrast to LoRA and other low-rank adaptation methods such as AdaLoRA, Delta-LoRA not only updates the low-rank matrices $\bA$ and $\bB$, but also propagate the learning to the pre-trained weights $\bW$ via updates utilizing the delta of the product of two low-rank matrices ($\bA^{(t+1)}\bB^{(t+1)} - \bA^{(t)}\bB^{(t)}$). Such a strategy effectively addresses the limitation that the incremental update of low-rank matrices is inadequate for learning representations capable for downstream tasks. Moreover, as the update of $\bW$ does not need to compute the gradients of $\bW$ and store their momentums, Delta-LoRA shares comparable memory requirements and computational costs with LoRA. Extensive experiments show that Delta-LoRA significantly outperforms existing low-rank adaptation methods. We further support these results with comprehensive analyses that underscore the effectiveness of Delta-LoRA.
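
A hedged sketch of the update rule stated in the abstract: after each optimizer step on the low-rank factors, the frozen base weight absorbs the delta of their product. Shapes, learning rate, and the scale `alpha` are illustrative:

```python
import torch

d, r, alpha = 64, 4, 1.0
W = torch.randn(d, d)                        # frozen pre-trained weight (no gradients)
A = torch.nn.Parameter(0.01 * torch.randn(d, r))
B = torch.nn.Parameter(torch.zeros(r, d))
opt = torch.optim.SGD([A, B], lr=1e-2)

x, target = torch.randn(32, d), torch.randn(32, d)
for _ in range(10):
    A_prev, B_prev = A.detach().clone(), B.detach().clone()
    loss = ((x @ (W + A @ B) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                    # propagate the delta of A@B into W
        W += alpha * (A @ B - A_prev @ B_prev)
```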

In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms

  • paper_url: http://arxiv.org/abs/2309.02393
  • repo_url: None
  • paper_authors: Philipp Schilk, Niccolò Polvani, Andrea Ronco, Milos Cernak, Michele Magno
  • for: Addressing distorted or unclear voice communication in remote conferencing through audio enhancement on power-constrained in-ear devices.
  • methods: A custom research platform for low-power wireless earbuds built around novel commercial MEMS bone-conduction microphones, which record the wearer's speech with much greater isolation, paired with a personalized voice activity detection algorithm based on a recurrent neural network.
  • results: The bone-conduction system detects speech within 12.8 ms at 95% accuracy; the final implementation on the Ambiq Apollo 4 Blue SoC averages 2.64 mW at 14 uJ per inference, reaching 43 h of battery life on a miniature 32 mAh li-ion cell without duty cycling.
    Abstract The recent ubiquitous adoption of remote conferencing has been accompanied by omnipresent frustration with distorted or otherwise unclear voice communication. Audio enhancement can compensate for low-quality input signals from, for example, small true wireless earbuds, by applying noise suppression techniques. Such processing relies on voice activity detection (VAD) with low latency and the added capability of discriminating the wearer's voice from others - a task of significant computational complexity. The tight energy budget of devices as small as modern earphones, however, requires any system attempting to tackle this problem to do so with minimal power and processing overhead, while not relying on speaker-specific voice samples and training due to usability concerns. This paper presents the design and implementation of a custom research platform for low-power wireless earbuds based on novel, commercial, MEMS bone-conduction microphones. Such microphones can record the wearer's speech with much greater isolation, enabling personalized voice activity detection and further audio enhancement applications. Furthermore, the paper accurately evaluates a proposed low-power personalized speech detection algorithm based on bone conduction data and a recurrent neural network running on the implemented research platform. This algorithm is compared to an approach based on traditional microphone input. The performance of the bone conduction system, achieving detection of speech within 12.8ms at an accuracy of 95\% is evaluated. Different SoC choices are contrasted, with the final implementation based on the cutting-edge Ambiq Apollo 4 Blue SoC achieving 2.64mW average power consumption at 14uJ per inference, reaching 43h of battery life on a miniature 32mAh li-ion cell and without duty cycling.

Explaining grokking through circuit efficiency

  • paper_url: http://arxiv.org/abs/2309.02390
  • repo_url: None
  • paper_authors: Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar
  • for: Explaining grokking, in which a network with perfect training accuracy but poor generalisation transitions, upon further training, to perfect generalisation.
  • methods: The hypothesis that such tasks admit both a generalising and a memorising solution, where the generalising solution is slower to learn but more efficient, producing larger logits with the same parameter norm; memorising circuits are conjectured to become less efficient with larger training datasets, implying a critical dataset size at which the two are equally efficient. Four novel predictions are derived and tested.
  • results: The predictions are confirmed, and two novel behaviours are demonstrated: ungrokking, in which a network regresses from perfect to low test accuracy, and semi-grokking, in which a network shows delayed generalisation to partial rather than perfect test accuracy.
    Abstract One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking occurs when the task admits a generalising solution and a memorising solution, where the generalising solution is slower to learn but more efficient, producing larger logits with the same parameter norm. We hypothesise that memorising circuits become more inefficient with larger training datasets while generalising circuits do not, suggesting there is a critical dataset size at which memorisation and generalisation are equally efficient. We make and confirm four novel predictions about grokking, providing significant evidence in favour of our explanation. Most strikingly, we demonstrate two novel and surprising behaviours: ungrokking, in which a network regresses from perfect to low test accuracy, and semi-grokking, in which a network shows delayed generalisation to partial rather than perfect test accuracy.

A Lightweight and Transferable Design for Robust LEGO Manipulation

  • paper_url: http://arxiv.org/abs/2309.02354
  • repo_url: None
  • paper_authors: Ruixuan Liu, Yifan Sun, Changliu Liu
  • for: Safe and efficient robotic manipulation of LEGO bricks for prototyping pixelized objects.
  • methods: Hardware-software co-design: an end-of-arm tool (EOAT) that reduces the problem dimension and lets large industrial robots easily manipulate LEGO bricks, combined with an evolution strategy that safely optimizes the robot motion.
  • results: The EOAT performs reliably and the learning framework safely improves manipulation to a 100% success rate; the design is deployed on multiple robots (FANUC LR-mate 200id/7L and Yaskawa GP4) to demonstrate generalizability and transferability, enabling sustainable prototyping with repeated assembly and disassembly.
    Abstract LEGO is a well-known platform for prototyping pixelized objects. However, robotic LEGO prototyping (i.e. manipulating LEGO bricks) is challenging due to the tight connections and accuracy requirement. This paper investigates safe and efficient robotic LEGO manipulation. In particular, this paper reduces the complexity of the manipulation by hardware-software co-design. An end-of-arm tool (EOAT) is designed, which reduces the problem dimension and allows large industrial robots to easily manipulate LEGO bricks. In addition, this paper uses evolution strategy to safely optimize the robot motion for LEGO manipulation. Experiments demonstrate that the EOAT performs reliably in manipulating LEGO bricks and the learning framework can effectively and safely improve the manipulation performance to a 100\% success rate. The co-design is deployed to multiple robots (i.e. FANUC LR-mate 200id/7L and Yaskawa GP4) to demonstrate its generalizability and transferability. In the end, we show that the proposed solution enables sustainable robotic LEGO prototyping, in which the robot can repeatedly assemble and disassemble different prototypes.

Exact Inference for Continuous-Time Gaussian Process Dynamics

  • paper_url: http://arxiv.org/abs/2309.02351
  • repo_url: None
  • paper_authors: Katharina Ensinger, Nicholas Tagliapietra, Sebastian Ziesche, Sebastian Trimpe
  • for: Learning a Gaussian process (GP) model of the true continuous-time dynamics of a physical system from discrete-time measurement data.
  • methods: Multistep and Taylor integrators that discretize the dynamics function while keeping exact GP inference tractable, together with tailored sampling schemes for drawing consistent dynamics functions from the learned posterior.
  • results: Empirical and theoretical results show that the approach yields an accurate representation of the continuous-time system.
    Abstract Physical systems can often be described via a continuous-time dynamical system. In practice, the true system is often unknown and has to be learned from measurement data. Since data is typically collected in discrete time, e.g. by sensors, most methods in Gaussian process (GP) dynamics model learning are trained on one-step ahead predictions. This can become problematic in several scenarios, e.g. if measurements are provided at irregularly-sampled time steps or physical system properties have to be conserved. Thus, we aim for a GP model of the true continuous-time dynamics. Higher-order numerical integrators provide the necessary tools to address this problem by discretizing the dynamics function with arbitrary accuracy. Many higher-order integrators require dynamics evaluations at intermediate time steps making exact GP inference intractable. In previous work, this problem is often tackled by approximating the GP posterior with variational inference. However, exact GP inference is preferable in many scenarios, e.g. due to its mathematical guarantees. In order to make direct inference tractable, we propose to leverage multistep and Taylor integrators. We demonstrate how to derive flexible inference schemes for these types of integrators. Further, we derive tailored sampling schemes that allow to draw consistent dynamics functions from the learned posterior. This is crucial to sample consistent predictions from the dynamics model. We demonstrate empirically and theoretically that our approach yields an accurate representation of the continuous-time system.
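
To illustrate why multistep integrators help here, a sketch of a two-step Adams-Bashforth rollout: unlike Runge-Kutta schemes, it reuses past evaluations of the dynamics function instead of querying it at intermediate time points. The dynamics below is a toy stand-in for a GP posterior mean:

```python
import numpy as np

def ab2_rollout(f, x0, dt, n_steps):
    """Two-step Adams-Bashforth: x_{n+1} = x_n + dt * (3/2 f_n - 1/2 f_{n-1}),
    bootstrapped with one Euler step."""
    xs = [np.asarray(x0, dtype=float)]
    f_prev = f(xs[0])
    xs.append(xs[0] + dt * f_prev)
    for _ in range(n_steps - 1):
        f_curr = f(xs[-1])
        xs.append(xs[-1] + dt * (1.5 * f_curr - 0.5 * f_prev))
        f_prev = f_curr
    return np.stack(xs)

traj = ab2_rollout(lambda x: -x, x0=[1.0], dt=0.1, n_steps=50)
print(traj[-1])   # decays toward 0, matching dx/dt = -x
```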

PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference

  • paper_url: http://arxiv.org/abs/2309.02334
  • repo_url: None
  • paper_authors: Marta Andronic, George A. Constantinides
  • for: Reducing the latency and area of deep learning inference on field-programmable gate arrays (FPGAs).
  • methods: Training neural networks with multivariate polynomials as the basic building block and hiding the polynomial evaluation inside FPGA lookup tables (LUTs) with zero overhead.
  • results: Polynomial building blocks reach the same accuracy with considerably fewer layers of soft logic than linear functions, yielding significant latency and area improvements, demonstrated on network intrusion detection, jet identification at the CERN Large Hadron Collider, and handwritten digit recognition on MNIST.
    Abstract Field-programmable gate arrays (FPGAs) are widely used to implement deep learning inference. Standard deep neural network inference involves the computation of interleaved linear maps and nonlinear activation functions. Prior work for ultra-low latency implementations has hardcoded the combination of linear maps and nonlinear activations inside FPGA lookup tables (LUTs). Our work is motivated by the idea that the LUTs in an FPGA can be used to implement a much greater variety of functions than this. In this paper, we propose a novel approach to training neural networks for FPGA deployment using multivariate polynomials as the basic building block. Our method takes advantage of the flexibility offered by the soft logic, hiding the polynomial evaluation inside the LUTs with zero overhead. We show that by using polynomial building blocks, we can achieve the same accuracy using considerably fewer layers of soft logic than by using linear functions, leading to significant latency and area improvements. We demonstrate the effectiveness of this approach in three tasks: network intrusion detection, jet identification at the CERN Large Hadron Collider, and handwritten digit recognition using the MNIST dataset.
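
A hedged sketch of the kind of building block involved: a layer whose pre-activation is a degree-2 multivariate polynomial of a small fan-in, the sort of function a single LUT could absorb after quantization. Fan-in, degree, and sizes are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class PolyNeuronLayer(nn.Module):
    """Pre-activation is a degree-2 polynomial: 1, x_i, and all x_i * x_j."""
    def __init__(self, in_features, out_features):
        super().__init__()
        idx = torch.triu_indices(in_features, in_features)   # pairs i <= j
        self.register_buffer("iu", idx)
        n_mono = 1 + in_features + idx.shape[1]
        self.w = nn.Linear(n_mono, out_features)

    def forward(self, x):
        quad = x[:, self.iu[0]] * x[:, self.iu[1]]
        ones = torch.ones(x.shape[0], 1, device=x.device)
        return self.w(torch.cat([ones, x, quad], dim=1))

layer = PolyNeuronLayer(4, 2)            # small fan-in, as in LUT-based nets
print(layer(torch.randn(8, 4)).shape)    # torch.Size([8, 2])
```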

Resilient VAE: Unsupervised Anomaly Detection at the SLAC Linac Coherent Light Source

  • paper_url: http://arxiv.org/abs/2309.02333
  • repo_url: None
  • paper_authors: Ryan Humble, William Colocho, Finn O’Shea, Daniel Ratner, Eric Darve
  • for: Unsupervised anomaly detection in complex engineering systems such as particle accelerators, where labels are sparse and expensive and the training data may itself contain anomalies.
  • methods: The Resilient Variational Autoencoder (ResVAE), a deep generative model that learns the anomaly probability of each sample and of each individual feature during training, uses these probabilities to effectively disregard anomalous examples in the training data, and provides feature-level anomaly attribution.
  • results: Applied to shot-to-shot data from the beam position monitoring system at the SLAC Linac Coherent Light Source (LCLS), ResVAE shows exceptional capability in identifying various types of anomalies visible in the accelerator.
    Abstract Significant advances in utilizing deep learning for anomaly detection have been made in recent years. However, these methods largely assume the existence of a normal training set (i.e., uncontaminated by anomalies) or even a completely labeled training set. In many complex engineering systems, such as particle accelerators, labels are sparse and expensive; in order to perform anomaly detection in these cases, we must drop these assumptions and utilize a completely unsupervised method. This paper introduces the Resilient Variational Autoencoder (ResVAE), a deep generative model specifically designed for anomaly detection. ResVAE exhibits resilience to anomalies present in the training data and provides feature-level anomaly attribution. During the training process, ResVAE learns the anomaly probability for each sample as well as each individual feature, utilizing these probabilities to effectively disregard anomalous examples in the training data. We apply our proposed method to detect anomalies in the accelerator status at the SLAC Linac Coherent Light Source (LCLS). By utilizing shot-to-shot data from the beam position monitoring system, we demonstrate the exceptional capability of ResVAE in identifying various types of anomalies that are visible in the accelerator.

A study on the impact of pre-trained model on Just-In-Time defect prediction

  • paper_url: http://arxiv.org/abs/2309.02317
  • repo_url: None
  • paper_authors: Yuxiang Guo, Xiaopeng Gao, Zhenyu Zhang, W. K. Chan, Bo Jiang
  • for: Studying the impact of different pre-trained models, used as backbones, on the Just-In-Time (JIT) defect prediction task.
  • methods: Six models are built, each with a distinct pre-trained backbone: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT; their differences and connections are explored systematically, including commit code and commit messages as inputs, training efficiency, input sensitivity, and zero-shot and few-shot scenarios.
  • results: Every backbone yields improvements, and backbones with similar pre-training consume comparable training resources; commit code plays a significant role in defect detection, and different pre-trained models show better defect detection with a balanced dataset under few-shot scenarios. These findings offer new insights for optimizing JIT defect prediction with pre-trained models and highlight the factors that need more attention when constructing such models.
    Abstract Previous researchers conducting Just-In-Time (JIT) defect prediction tasks have primarily focused on the performance of individual pre-trained models, without exploring the relationship between different pre-trained models as backbones. In this study, we build six models: RoBERTaJIT, CodeBERTJIT, BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained model as its backbone. We systematically explore the differences and connections between these models. Specifically, we investigate the performance of the models when using Commit code and Commit message as inputs, as well as the relationship between training efficiency and model distribution among these six models. Additionally, we conduct an ablation experiment to explore the sensitivity of each model to inputs. Furthermore, we investigate how the models perform in zero-shot and few-shot scenarios. Our findings indicate that each model based on different backbones shows improvements, and when the backbone's pre-training model is similar, the training resources that need to be consumed are much closer. We also observe that Commit code plays a significant role in defect detection, and different pre-trained models demonstrate better defect detection ability with a balanced dataset under few-shot scenarios. These results provide new insights for optimizing JIT defect prediction tasks using pre-trained models and highlight the factors that require more attention when constructing such models. Additionally, CodeGPTJIT and GPT2JIT achieved better performance than DeepJIT and CC2Vec on the two datasets respectively under 2000 training samples. These findings emphasize the effectiveness of transformer-based pre-trained models in JIT defect prediction tasks, especially in scenarios with limited training data.

Inferring effective couplings with Restricted Boltzmann Machines

  • paper_url: http://arxiv.org/abs/2309.02292
  • repo_url: https://github.com/alfonso-navas/inferring_effective_couplings_with_RBMs
  • paper_authors: Aurélien Decelle, Cyril Furtlehner, Alfonso De Jesus Navas Gómez, Beatriz Seoane
  • for: Giving a direct physical interpretation of energy-based generative models, specifically the Restricted Boltzmann Machine (RBM).
  • methods: A direct mapping between the energy function of the RBM and an effective Ising spin Hamiltonian that includes interactions of all possible orders between spins, going beyond the pairwise interactions typically considered in the conventional inverse Ising approach.
  • results: Controlled numerical experiments, training RBMs on equilibrium samples of predefined models with local external fields and two- and three-body interactions in various low-dimensional topologies, show that the method learns the correct interaction network; the quality of the inferred model is also evaluated across different training methods.
    Abstract Generative models offer a direct way to model complex data. Among them, energy-based models provide us with a neural network model that aims to accurately reproduce all statistical correlations observed in the data at the level of the Boltzmann weight of the model. However, one challenge is to understand the physical interpretation of such models. In this study, we propose a simple solution by implementing a direct mapping between the energy function of the Restricted Boltzmann Machine and an effective Ising spin Hamiltonian that includes high-order interactions between spins. This mapping includes interactions of all possible orders, going beyond the conventional pairwise interactions typically considered in the inverse Ising approach, and allowing the description of complex datasets. Earlier works attempted to achieve this goal, but the proposed mappings did not properly treat the complexity of the problem or did not contain direct prescriptions for practical application. To validate our method, we performed several controlled numerical experiments where we trained the RBMs using equilibrium samples of predefined models containing local external fields, two-body and three-body interactions in various low-dimensional topologies. The results demonstrate the effectiveness of our proposed approach in learning the correct interaction network and pave the way for its application in modeling interesting datasets. We also evaluate the quality of the inferred model based on different training methods.

A Comparison of Residual-based Methods on Fault Detection

  • paper_url: http://arxiv.org/abs/2309.02274
  • repo_url: None
  • paper_authors: Chi-Ching Hsu, Gaetan Frusque, Olga Fink
  • for: Comparing two residual-based approaches for unsupervised fault detection: autoencoders, and input-output models that map operating conditions to sensor readings.
  • methods: Both approaches are trained exclusively on healthy data; sensor-wise residuals and aggregated residuals for the entire system are examined, and faults are detected by applying a threshold determined from the healthy condition.
  • results: Both models detect faults with an average delay of around 20 cycles while maintaining a low false positive rate; the input-output model additionally provides better interpretability regarding potential fault types and the possibly faulty components.
    Abstract An important initial step in fault detection for complex industrial systems is gaining an understanding of their health condition. Subsequently, continuous monitoring of this health condition becomes crucial to observe its evolution, track changes over time, and isolate faults. As faults are typically rare occurrences, it is essential to perform this monitoring in an unsupervised manner. Various approaches have been proposed not only to detect faults in an unsupervised manner but also to distinguish between different potential fault types. In this study, we perform a comprehensive comparison between two residual-based approaches: autoencoders, and the input-output models that establish a mapping between operating conditions and sensor readings. We explore the sensor-wise residuals and aggregated residuals for the entire system in both methods. The performance evaluation focuses on three tasks: health indicator construction, fault detection, and health indicator interpretation. To perform the comparison, we utilize the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dynamical model, specifically a subset of the turbofan engine dataset containing three different fault types. All models are trained exclusively on healthy data. Fault detection is achieved by applying a threshold that is determined based on the healthy condition. The detection results reveal that both models are capable of detecting faults with an average delay of around 20 cycles and maintain a low false positive rate. While the fault detection performance is similar for both models, the input-output model provides better interpretability regarding potential fault types and the possible faulty components.
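
A hedged sketch of the input-output variant: a regressor trained on healthy data maps operating conditions to expected sensor readings, and aggregated residuals are thresholded against the healthy condition. The synthetic data, model choice, and 99% quantile threshold are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
ops_healthy = rng.uniform(size=(2000, 3))                 # operating conditions
sensors_healthy = (ops_healthy @ rng.uniform(size=(3, 5))
                   + 0.01 * rng.standard_normal((2000, 5)))

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
model.fit(ops_healthy, sensors_healthy)                   # healthy data only

res = np.abs(sensors_healthy - model.predict(ops_healthy)).sum(axis=1)
threshold = np.quantile(res, 0.99)                        # healthy-based threshold

def is_faulty(ops, sensors):
    # Sensor-wise residuals, aggregated over the system, then thresholded.
    residual = np.abs(sensors - model.predict(ops)).sum(axis=1)
    return residual > threshold
```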

Graph-Based Automatic Feature Selection for Multi-Class Classification via Mean Simplified Silhouette

  • paper_url: http://arxiv.org/abs/2309.02272
  • repo_url: https://github.com/davidlevinwork/GB-AFS
  • paper_authors: David Levin, Gonen Singer
  • for: A novel graph-based filter method for automatic feature selection (GB-AFS) in multi-class classification, requiring no user-defined parameters such as the number of features to select.
  • methods: The Jeffries-Matusita (JM) distance combined with t-distributed Stochastic Neighbor Embedding (t-SNE) generates a low-dimensional space reflecting how effectively each feature differentiates each pair of classes; the minimum number of features is selected with the newly developed Mean Simplified Silhouette (MSS) index, which evaluates clustering results for the feature selection task.
  • results: On public datasets, GB-AFS outperforms other filter-based and automatic feature selection approaches, maintaining the accuracy achieved with all features while using only 7% to 30% of them, which reduces classification time by 15% to 70%.
    Abstract This paper introduces a novel graph-based filter method for automatic feature selection (abbreviated as GB-AFS) for multi-class classification tasks. The method determines the minimum combination of features required to sustain prediction performance while maintaining complementary discriminating abilities between different classes. It does not require any user-defined parameters such as the number of features to select. The methodology employs the Jeffries-Matusita (JM) distance in conjunction with t-distributed Stochastic Neighbor Embedding (t-SNE) to generate a low-dimensional space reflecting how effectively each feature can differentiate between each pair of classes. The minimum number of features is selected using our newly developed Mean Simplified Silhouette (abbreviated as MSS) index, designed to evaluate the clustering results for the feature selection task. Experimental results on public data sets demonstrate the superior performance of the proposed GB-AFS over other filter-based techniques and automatic feature selection approaches. Moreover, the proposed algorithm maintained the accuracy achieved when utilizing all features, while using only $7\%$ to $30\%$ of the features. Consequently, this resulted in a reduction of the time needed for classifications, from $15\%$ to $70\%$.

Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning

  • paper_url: http://arxiv.org/abs/2309.02476
  • repo_url: None
  • paper_authors: Yong Lin, Chen Liu, Chenlu Ye, Qing Lian, Yuan Yao, Tong Zhang
  • for: A principled method for selecting data subsets, addressing both coreset selection (sampling with inputs and outputs) and active learning (inputs only), to reduce the labeling and computational costs of deep learning.
  • methods: COPS (unCertainty based OPtimal Sub-sampling), a theoretically optimal solution for linear softmax regression that minimizes the expected loss of a model trained on subsampled data; it estimates the sampling ratio from the model's logits rather than an explicit inverse covariance matrix, and down-weights low-density samples to address sensitivity to model misspecification.
  • results: Extensive experiments with deep neural networks on benchmark datasets consistently show superior performance of COPS over baseline methods.
    Abstract Modern deep learning heavily relies on large labeled datasets, which often come with high costs in terms of both manual labeling and computational resources. To mitigate these challenges, researchers have explored the use of informative subset selection techniques, including coreset selection and active learning. Specifically, coreset selection involves sampling data with both input ($\bx$) and output ($\by$), while active learning focuses solely on the input data ($\bx$). In this study, we present a theoretically optimal solution for addressing both coreset selection and active learning within the context of linear softmax regression. Our proposed method, COPS (unCertainty based OPtimal Sub-sampling), is designed to minimize the expected loss of a model trained on subsampled data. Unlike existing approaches that rely on explicit calculations of the inverse covariance matrix, which are not easily applicable to deep learning scenarios, COPS leverages the model's logits to estimate the sampling ratio. This sampling ratio is closely associated with model uncertainty and can be effectively applied to deep learning tasks. Furthermore, we address the challenge of model sensitivity to misspecification by incorporating a down-weighting approach for low-density samples, drawing inspiration from previous works. To assess the effectiveness of our proposed method, we conducted extensive empirical experiments using deep neural networks on benchmark datasets. The results consistently showcase the superior performance of COPS compared to baseline methods, reaffirming its efficacy.
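
A hedged sketch in the spirit of COPS, not the authors' exact estimator: sampling probabilities are derived from the model's logits, here via predictive entropy as an assumed proxy for the paper's uncertainty-based sampling ratio, so uncertain samples are kept more often:

```python
import numpy as np

def uncertainty_subsample(logits, budget, rng=None):
    """Draw `budget` indices without replacement, with probabilities
    proportional to the predictive entropy computed from the logits."""
    rng = np.random.default_rng() if rng is None else rng
    z = logits - logits.max(axis=1, keepdims=True)        # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    probs = entropy / entropy.sum()
    return rng.choice(len(logits), size=budget, replace=False, p=probs)

logits = np.random.default_rng(0).normal(size=(1000, 10))
idx = uncertainty_subsample(logits, budget=100)
print(idx[:10])
```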

RoBoSS: A Robust, Bounded, Sparse, and Smooth Loss Function for Supervised Learning

  • paper_url: http://arxiv.org/abs/2309.02250
  • repo_url: https://github.com/mtanveer1/RoBoSS
  • paper_authors: Mushir Akhtar, M. Tanveer, Mohd. Arshad
  • for: A novel robust, bounded, sparse, and smooth (RoBoSS) loss function for supervised learning, addressing the difficulty traditional losses have with noisy, high-dimensional data, limited interpretability, and slow convergence.
  • methods: The RoBoSS loss is incorporated into the support vector machine (SVM) framework, yielding a new robust algorithm, $\mathcal{L}_{rbss}$-SVM; the classification-calibrated property and generalization ability are also analyzed theoretically.
  • results: On 88 real-world UCI and KEEL datasets from diverse domains, and on two medical datasets (an electroencephalogram (EEG) signal dataset and the breast cancer (BreaKHis) dataset), the proposed $\mathcal{L}_{rbss}$-SVM shows superior generalization performance and training-time efficiency.
    Abstract In the domain of machine learning algorithms, the significance of the loss function is paramount, especially in supervised learning tasks. It serves as a fundamental pillar that profoundly influences the behavior and efficacy of supervised learning algorithms. Traditional loss functions, while widely used, often struggle to handle noisy and high-dimensional data, impede model interpretability, and lead to slow convergence during training. In this paper, we address the aforementioned constraints by proposing a novel robust, bounded, sparse, and smooth (RoBoSS) loss function for supervised learning. Further, we incorporate the RoBoSS loss function within the framework of support vector machine (SVM) and introduce a new robust algorithm named $\mathcal{L}_{rbss}$-SVM. For the theoretical analysis, the classification-calibrated property and generalization ability are also presented. These investigations are crucial for gaining deeper insights into the performance of the RoBoSS loss function in the classification tasks and its potential to generalize well to unseen data. To empirically demonstrate the effectiveness of the proposed $\mathcal{L}_{rbss}$-SVM, we evaluate it on $88$ real-world UCI and KEEL datasets from diverse domains. Additionally, to exemplify the effectiveness of the proposed $\mathcal{L}_{rbss}$-SVM within the biomedical realm, we evaluated it on two medical datasets: the electroencephalogram (EEG) signal dataset and the breast cancer (BreaKHis) dataset. The numerical results substantiate the superiority of the proposed $\mathcal{L}_{rbss}$-SVM model, both in terms of its remarkable generalization performance and its efficiency in training time.

Self-Similarity-Based and Novelty-based loss for music structure analysis

  • paper_url: http://arxiv.org/abs/2309.02243
  • repo_url: None
  • paper_authors: Geoffroy Peeters
  • for: A supervised approach to music boundary detection for Music Structure Analysis (MSA).
  • methods: Features and convolution kernels are learned jointly by optimizing two losses: one based on the Self-Similarity-Matrix (SSM) obtained with the learned features (SSM-loss) and one based on the novelty score obtained by applying the learned kernels to the estimated SSM (novelty-loss); relative feature learning through self-attention is also shown to help.
  • results: The approach compares favorably to previously proposed methods on the standard RWC-Pop dataset and various subsets of SALAMI.
    Abstract Music Structure Analysis (MSA) is the task of identifying the musical segments that compose a music track and possibly labeling them based on their similarity. In this paper we propose a supervised approach for the task of music boundary detection. In our approach we simultaneously learn features and convolution kernels. For this we jointly optimize two losses: a loss based on the Self-Similarity-Matrix (SSM) obtained with the learned features, denoted SSM-loss, and a loss based on the novelty score obtained by applying the learned kernels to the estimated SSM, denoted novelty-loss. We also demonstrate that relative feature learning, through self-attention, is beneficial for the task of MSA. Finally, we compare the performances of our approach to previously proposed approaches on the standard RWC-Pop, and various subsets of SALAMI.
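
For context on the two losses, a sketch of the classic unlearned pipeline they build on: a cosine self-similarity matrix over frame features, and a Foote-style novelty score from a fixed checkerboard kernel correlated along the SSM diagonal. The paper learns both the features and the kernel, whereas here both are fixed:

```python
import numpy as np

def novelty_curve(features, kernel_size=16):
    """Cosine SSM over frames, then novelty from a checkerboard kernel
    slid along the SSM diagonal; peaks suggest segment boundaries."""
    F = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    ssm = F @ F.T                                   # (n_frames, n_frames)

    half = kernel_size // 2
    sign = np.ones(kernel_size)
    sign[half:] = -1
    kernel = np.outer(sign, sign)                   # checkerboard kernel

    n = ssm.shape[0]
    novelty = np.zeros(n)
    for i in range(half, n - half):
        patch = ssm[i - half:i + half, i - half:i + half]
        novelty[i] = (patch * kernel).sum()
    return novelty

feats = np.random.default_rng(0).normal(size=(200, 12))   # e.g., chroma frames
print(novelty_curve(feats).shape)
```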
    摘要 音乐结构分析(MSA)旨在识别构成一条音乐曲目的各个乐段,并可根据其相似性加以标注。本文提出了一种用于音乐边界检测任务的监督方法。我们同时学习特征与卷积核,并联合优化两种损失函数:一种基于由所学特征得到的自相似矩阵(SSM),称为SSM损失;另一种基于将所学卷积核应用于估计的SSM得到的新颖度分数,称为新颖度损失。我们还证明了通过自注意力进行相对特征学习对MSA任务有益。最后,我们在标准的RWC-Pop以及SALAMI的多个子集上,将所提方法与既有方法的性能进行了比较。
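To make the two losses concrete, here is a minimal NumPy sketch of the quantities they are computed from: the SSM built from frame features and the novelty curve obtained by sliding a square kernel along its diagonal. In the paper both the features and the kernel are learned; the fixed checkerboard kernel below is only a classical stand-in.

```python
import numpy as np

def self_similarity_matrix(feats):
    # feats: (T, d) frame embeddings; cosine self-similarity -> (T, T) SSM.
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    return f @ f.T

def novelty_curve(ssm, kernel):
    # Correlate a square kernel with the SSM along its diagonal;
    # peaks in the resulting curve indicate candidate segment boundaries.
    K = kernel.shape[0]
    pad = np.pad(ssm, K // 2, mode="edge")
    return np.array([np.sum(pad[t:t + K, t:t + K] * kernel)
                     for t in range(ssm.shape[0])])

def checkerboard(K=64):
    # Classical Foote-style checkerboard, a stand-in for the learned kernel.
    s = np.sign(np.arange(K) - K / 2 + 0.5)
    return np.outer(s, s)
```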

Sample Size in Natural Language Processing within Healthcare Research

  • paper_url: http://arxiv.org/abs/2309.02237
  • repo_url: None
  • paper_authors: Jaya Chaturvedi, Diana Shamsutdinova, Felix Zimmer, Sumithra Velupillai, Daniel Stahl, Robert Stewart, Angus Roberts
  • for: 本研究是为了提供适合医疗领域文本数据的样本大小选择的建议。
  • methods: 本研究使用不同的分类器与样本量进行了模拟实验,以评估样本量对文本分类任务性能的影响。
  • results: 研究发现,使用 K-最近邻分类器时,小样本大小可以提供更好的性能指标,而使用支持向量机和BERT模型时,大样本大小提供更好的性能。总之,样本大小大于1000是适合的,可以提供良好的性能指标。
    Abstract Sample size calculation is an essential step in most data-based disciplines. Large enough samples ensure representativeness of the population and determine the precision of estimates. This is true for most quantitative studies, including those that employ machine learning methods, such as natural language processing, where free-text is used to generate predictions and classify instances of text. Within the healthcare domain, the lack of sufficient corpora of previously collected data can be a limiting factor when determining sample sizes for new studies. This paper tries to address the issue by making recommendations on sample sizes for text classification tasks in the healthcare domain. Models trained on the MIMIC-III database of critical care records from Beth Israel Deaconess Medical Center were used to classify documents as having or not having Unspecified Essential Hypertension, the most common diagnosis code in the database. Simulations were performed using various classifiers on different sample sizes and class proportions. This was repeated for a comparatively less common diagnosis code within the database of diabetes mellitus without mention of complication. Smaller sample sizes resulted in better results when using a K-nearest neighbours classifier, whereas larger sample sizes provided better results with support vector machines and BERT models. Overall, a sample size larger than 1000 was sufficient to provide decent performance metrics. The simulations conducted within this study provide guidelines that can be used as recommendations for selecting appropriate sample sizes and class proportions, and for predicting expected performance, when building classifiers for textual healthcare data. The methodology used here can be modified for sample size estimates calculations with other datasets.
    摘要 样本量计算是大多数基于数据的学科中不可或缺的一步。足够大的样本能保证对总体的代表性,并决定估计的精度。这对大多数定量研究都成立,包括采用机器学习方法(如自然语言处理,利用自由文本生成预测并对文本实例分类)的研究。在医疗领域,缺乏足够的既有数据语料常常是确定新研究样本量的限制因素。本文试图针对医疗领域的文本分类任务给出样本量方面的建议。我们使用在Beth Israel Deaconess医学中心重症监护记录数据库MIMIC-III上训练的模型,对文档是否含有"未特指原发性高血压"(该数据库中最常见的诊断代码)进行分类,并在不同的样本量和类别比例下,使用多种分类器进行模拟实验;随后针对数据库中相对少见的"无并发症糖尿病"诊断代码重复了上述实验。结果表明,使用K近邻分类器时,较小的样本量即可取得较好的结果,而支持向量机和BERT模型则在较大样本量下表现更佳;总体而言,大于1000的样本量足以提供不错的性能指标。本研究中的模拟实验为构建医疗文本数据分类器时选择合适的样本量与类别比例以及预测期望性能提供了可作为建议的指南,所用方法也可修改后用于其他数据集的样本量估计。
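A minimal sketch of the simulation loop the paper describes (classifier choices follow the abstract; the TF-IDF vectorizer, sample sizes, and split are our assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score
from sklearn.feature_extraction.text import TfidfVectorizer

def sample_size_curve(texts, labels, sizes=(100, 300, 1000, 3000), seed=0):
    """Train classifiers on increasing sample sizes; track held-out F1."""
    X = TfidfVectorizer(max_features=5000).fit_transform(texts)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.3, random_state=seed, stratify=labels)
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        idx = rng.choice(X_tr.shape[0], size=min(n, X_tr.shape[0]), replace=False)
        for name, clf in [("knn", KNeighborsClassifier()), ("svm", LinearSVC())]:
            clf.fit(X_tr[idx], np.asarray(y_tr)[idx])
            results[(name, n)] = f1_score(y_te, clf.predict(X_te))
    return results
```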

Distributionally Robust Machine Learning with Multi-source Data

  • paper_url: http://arxiv.org/abs/2309.02211
  • repo_url: None
  • paper_authors: Zhenyu Wang, Peter Bühlmann, Zijian Guo
  • for: 当目标分布与源总体不同时,传统机器学习方法可能产生较差的预测性能;本文利用多源数据来提升这种分布偏移下的预测性能。
  • methods: 本文提出了一种基于多源数据的群组分布鲁棒预测模型,该模型通过优化关于一类目标分布的对抗性解释方差奖励来定义。
  • results: 实验结果显示,与传统的经验风险最小化相比,所提出的鲁棒预测模型能够提高对存在分布偏移的目标总体的预测精度。
    Abstract Classical machine learning methods may lead to poor prediction performance when the target distribution differs from the source populations. This paper utilizes data from multiple sources and introduces a group distributionally robust prediction model defined to optimize an adversarial reward about explained variance with respect to a class of target distributions. Compared to classical empirical risk minimization, the proposed robust prediction model improves the prediction accuracy for target populations with distribution shifts. We show that our group distributionally robust prediction model is a weighted average of the source populations' conditional outcome models. We leverage this key identification result to robustify arbitrary machine learning algorithms, including, for example, random forests and neural networks. We devise a novel bias-corrected estimator to estimate the optimal aggregation weight for general machine-learning algorithms and demonstrate its improvement in the convergence rate. Our proposal can be seen as a distributionally robust federated learning approach that is computationally efficient and easy to implement using arbitrary machine learning base algorithms, satisfies some privacy constraints, and has a nice interpretation of different sources' importance for predicting a given target covariate distribution. We demonstrate the performance of our proposed group distributionally robust method on simulated and real data with random forests and neural networks as base-learning algorithms.
    摘要 当目标分布与源总体不同时,传统机器学习方法可能导致较差的预测性能。本文利用多源数据,提出了一种群组分布鲁棒预测模型,其定义为优化关于一类目标分布的对抗性解释方差奖励。与传统的经验风险最小化相比,所提出的鲁棒预测模型能够提高存在分布偏移的目标总体的预测精度。我们证明了该群组分布鲁棒预测模型是各源总体条件结果模型的加权平均。利用这一关键识别结果,可以鲁棒化任意机器学习算法,例如随机森林和神经网络。我们设计了一种新的偏差校正估计量来估计一般机器学习算法的最优聚合权重,并证明其提高了收敛速率。我们的方案可以看作一种计算高效、易于实现的分布鲁棒联邦学习方法:它可以使用任意机器学习基础算法,满足一定的隐私约束,并对各数据源在预测给定目标协变量分布时的重要性给出直观解释。我们以随机森林和神经网络为基础学习算法,在模拟数据和真实数据上验证了所提出的群组分布鲁棒方法的性能。
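The key identification result — the robust model is a weighted average of the source models — suggests the following brute-force sketch. The paper's actual estimator is a bias-corrected one, so the grid search over simplex weights below is only illustrative:

```python
import numpy as np
from itertools import product

def aggregate(models, X, q):
    # Group-DRO prediction: a q-weighted average of per-source models.
    return sum(qk * m.predict(X) for qk, m in zip(q, models))

def worst_case_weights(models, X_sources, y_sources, grid=11):
    """Pick simplex weights q whose aggregated model maximizes the worst
    (smallest) explained variance across the source populations."""
    K = len(models)
    best_q, best_val = None, -np.inf
    for w in product(np.linspace(0, 1, grid), repeat=K):
        if not np.isclose(sum(w), 1.0):
            continue
        rewards = []
        for X, y in zip(X_sources, y_sources):
            resid = y - aggregate(models, X, w)
            rewards.append(1.0 - np.var(resid) / np.var(y))  # explained variance
        if min(rewards) > best_val:
            best_val, best_q = min(rewards), w
    return np.asarray(best_q)
```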

Latent Disentanglement in Mesh Variational Autoencoders Improves the Diagnosis of Craniofacial Syndromes and Aids Surgical Planning

  • paper_url: http://arxiv.org/abs/2309.10825
  • repo_url: None
  • paper_authors: Simone Foti, Alexander J. Rickart, Bongjin Koo, Eimear O’ Sullivan, Lara S. van de Lande, Athanasios Papaioannou, Roman Khonsari, Danail Stoyanov, N. u. Owase Jeelani, Silvia Schievano, David J. Dunaway, Matthew J. Clarkson
  • for: 这项研究旨在应用Swap Disentangled Variational Autoencoder(SD-VAE)模型对人类头部的复杂结构进行深度学习分析,以更好地识别和分类Crouzon、Apert和Muenke等颅面综合征。
  • methods: 研究使用SD-VAE模型在整个头部网格上进行综合征分类,同时可分析头部各区域对综合征表型的影响;此外,通过调整生成模型的特定参数并生成针对特定手术的新形状,可以模拟多种颅面外科手术的结果。
  • results: 这项研究有助于提高颅面综合征诊断的准确性,辅助外科医生进行手术规划,并支持对手术结果的客观评估。
    Abstract The use of deep learning to undertake shape analysis of the complexities of the human head holds great promise. However, there have traditionally been a number of barriers to accurate modelling, especially when operating on both a global and local level. In this work, we will discuss the application of the Swap Disentangled Variational Autoencoder (SD-VAE) with relevance to Crouzon, Apert and Muenke syndromes. Although syndrome classification is performed on the entire mesh, it is also possible, for the first time, to analyse the influence of each region of the head on the syndromic phenotype. By manipulating specific parameters of the generative model, and producing procedure-specific new shapes, it is also possible to simulate the outcome of a range of craniofacial surgical procedures. This opens new avenues to advance diagnosis, aids surgical planning and allows for the objective evaluation of surgical outcomes.
    摘要 利用深度学习对人类头部的复杂形状进行分析具有巨大潜力。然而,准确建模历来面临诸多障碍,尤其是需要同时在全局与局部层面上进行建模时。在这项工作中,我们讨论Swap Disentangled Variational Autoencoder(SD-VAE)在Crouzon、Apert和Muenke综合征中的应用。虽然综合征分类是在整个头部网格上进行的,但我们也首次能够分析头部各个区域对综合征表型的影响。通过调整生成模型的特定参数并生成针对特定手术的新形状,还可以模拟一系列颅面外科手术的结果。这为改进诊断开辟了新途径,有助于手术规划,并使手术结果的客观评估成为可能。

Language Models for Novelty Detection in System Call Traces

  • paper_url: http://arxiv.org/abs/2309.02206
  • repo_url: None
  • paper_authors: Quentin Fournier, Daniel Aloise, Leandro R. Costa
  • for: 本研究旨在提出一种基于系统调用语言模型的新颖性检测方法,用于检测现代计算机系统中的异常行为。
  • methods: 本研究评估了三种不同的神经网络架构:LSTM、Transformer 和 Longformer,并发布了一个新的开源内核跟踪数据集用于训练。
  • results: 研究发现,使用这些架构可在大多数新颖行为上实现高于95%的F分数与AuROC;该方法几乎不需要专家手工设计,且对数据和任务均不敏感。
    Abstract Due to the complexity of modern computer systems, novel and unexpected behaviors frequently occur. Such deviations are either normal occurrences, such as software updates and new user activities, or abnormalities, such as misconfigurations, latency issues, intrusions, and software bugs. Regardless, novel behaviors are of great interest to developers, and there is a genuine need for efficient and effective methods to detect them. Nowadays, researchers consider system calls to be the most fine-grained and accurate source of information to investigate the behavior of computer systems. Accordingly, this paper introduces a novelty detection methodology that relies on a probability distribution over sequences of system calls, which can be seen as a language model. Language models estimate the likelihood of sequences, and since novelties deviate from previously observed behaviors by definition, they would be unlikely under the model. Following the success of neural networks for language models, three architectures are evaluated in this work: the widespread LSTM, the state-of-the-art Transformer, and the lower-complexity Longformer. However, large neural networks typically require an enormous amount of data to be trained effectively, and to the best of our knowledge, no massive modern datasets of kernel traces are publicly available. This paper addresses this limitation by introducing a new open-source dataset of kernel traces comprising over 2 million web requests with seven distinct behaviors. The proposed methodology requires minimal expert hand-crafting and achieves an F-score and AuROC greater than 95% on most novelties while being data- and task-agnostic. The source code and trained models are publicly available on GitHub while the datasets are available on Zenodo.
    摘要 由于现代计算机系统的复杂性,新颖且出乎意料的行为时常出现。这些偏差或是正常现象(如软件更新与新的用户活动),或是异常(如配置错误、延迟问题、入侵与软件缺陷)。无论哪种情况,新颖行为都对开发者具有重要意义,因此亟需高效且有效的检测方法。如今,研究者认为系统调用是研究计算机系统行为最细粒度、最准确的信息来源。据此,本文提出了一种基于系统调用序列概率分布(可视为一种语言模型)的新颖性检测方法:语言模型估计序列的似然,而新颖行为按定义偏离既有行为,因而在模型下的似然较低。继神经网络语言模型的成功之后,本文评估了三种架构:广泛使用的LSTM、最先进的Transformer以及复杂度较低的Longformer。然而,大型神经网络通常需要海量数据才能有效训练,而据我们所知,目前尚无公开的大规模现代内核跟踪数据集。为此,本文发布了一个新的开源内核跟踪数据集,包含超过200万个Web请求与七种不同的行为。所提方法几乎不需要专家手工设计,在大多数新颖行为上取得了超过95%的F分数与AuROC,且对数据和任务均不敏感。源代码与训练好的模型已在GitHub公开,数据集可在Zenodo获取。
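A minimal PyTorch sketch of the scoring idea (architecture sizes are assumptions): train a next-call language model on normal traces, then flag traces whose average negative log-likelihood is high.

```python
import torch
import torch.nn as nn

class SyscallLM(nn.Module):
    # Next-call language model over a vocabulary of system calls.
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):                  # x: (B, T) call ids
        h, _ = self.lstm(self.emb(x))
        return self.head(h)                # (B, T, vocab) logits

def novelty_score(model, seq):
    """Average negative log-likelihood of a trace; novel behaviors,
    being unlike the training distribution, score high."""
    x, y = seq[:, :-1], seq[:, 1:]
    logits = model(x)
    nll = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), y.reshape(-1), reduction="mean")
    return nll.item()
```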

On the Complexity of Differentially Private Best-Arm Identification with Fixed Confidence

  • paper_url: http://arxiv.org/abs/2309.02202
  • repo_url: None
  • paper_authors: Achraf Azize, Marc Jourdan, Aymen Al Marjani, Debabrota Basu
  • for: 这个论文主要研究的是如何在数据敏感应用中实现最佳臂标识(BAI)问题,包括设计适应性临床试验、调整超参数以及进行用户研究等。
  • methods: 这篇论文使用 $\epsilon$-全局差分隐私(DP)来保证数据隐私,并研究其下固定置信度的BAI问题。作者首先推导了隐私代价的样本复杂度下界,发现依据隐私预算 $\epsilon$ 存在两种隐私区间:在高隐私区间(小 $\epsilon$)下,问题难度取决于隐私与一种新的信息论量(总变差特征时间)的耦合效应;在低隐私区间(大 $\epsilon$)下,下界退化为非隐私情形的经典下界。
  • results: 作者提出了满足 $\epsilon$-全局DP的BAI算法 AdaP-TT,该算法以依赖于臂的自适应回合运行,并添加Laplace噪声以保证良好的隐私-效用权衡。作者推导了 AdaP-TT 样本复杂度的渐近上界,其在高隐私区间内与下界相匹配(至多相差常数因子);最后通过实验分析验证了理论结论。
    Abstract Best Arm Identification (BAI) problems are progressively used for data-sensitive applications, such as designing adaptive clinical trials, tuning hyper-parameters, and conducting user studies to name a few. Motivated by the data privacy concerns invoked by these applications, we study the problem of BAI with fixed confidence under $\epsilon$-global Differential Privacy (DP). First, to quantify the cost of privacy, we derive a lower bound on the sample complexity of any $\delta$-correct BAI algorithm satisfying $\epsilon$-global DP. Our lower bound suggests the existence of two privacy regimes depending on the privacy budget $\epsilon$. In the high-privacy regime (small $\epsilon$), the hardness depends on a coupled effect of privacy and a novel information-theoretic quantity, called the Total Variation Characteristic Time. In the low-privacy regime (large $\epsilon$), the sample complexity lower bound reduces to the classical non-private lower bound. Second, we propose AdaP-TT, an $\epsilon$-global DP variant of the Top Two algorithm. AdaP-TT runs in arm-dependent adaptive episodes and adds Laplace noise to ensure a good privacy-utility trade-off. We derive an asymptotic upper bound on the sample complexity of AdaP-TT that matches with the lower bound up to multiplicative constants in the high-privacy regime. Finally, we provide an experimental analysis of AdaP-TT that validates our theoretical results.
    摘要 最优臂识别(BAI)问题正越来越多地用于数据敏感的应用,例如设计自适应临床试验、调整超参数以及开展用户研究等。受这些应用引发的数据隐私问题的驱动,我们研究在 $\epsilon$-全局差分隐私(DP)约束下的固定置信度BAI问题。首先,为量化隐私的代价,我们推导了任意满足 $\epsilon$-全局DP的 $\delta$-正确BAI算法的样本复杂度下界。该下界表明,依据隐私预算 $\epsilon$ 存在两种隐私区间:在高隐私区间(小 $\epsilon$)下,问题难度取决于隐私与一种新的信息论量(称为总变差特征时间)的耦合效应;在低隐私区间(大 $\epsilon$)下,样本复杂度下界退化为经典的非隐私下界。其次,我们提出了Top Two算法的 $\epsilon$-全局DP变体 AdaP-TT。AdaP-TT 以依赖于臂的自适应回合运行,并添加Laplace噪声以保证良好的隐私-效用权衡。我们推导了 AdaP-TT 样本复杂度的渐近上界,其在高隐私区间内与下界相匹配(至多相差常数因子)。最后,我们对 AdaP-TT 进行了实验分析,验证了理论结果。
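The privacy mechanism the abstract describes — Laplace noise on per-episode statistics — can be sketched as follows; the Top-Two sampling rule and the arm-dependent episode schedule are omitted, and the bounded reward range is an assumption:

```python
import numpy as np

def private_mean(rewards, eps, value_range=1.0, rng=np.random.default_rng()):
    """epsilon-DP release of an arm's empirical mean over one adaptive
    episode: Laplace noise scaled to the mean's sensitivity, which is
    value_range / n for rewards bounded in an interval of that width."""
    n = len(rewards)
    sensitivity = value_range / n
    return np.mean(rewards) + rng.laplace(scale=sensitivity / eps)
```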

Sparse Function-space Representation of Neural Networks

  • paper_url: http://arxiv.org/abs/2309.02195
  • repo_url: https://github.com/himanshub1007/Alzhimers-Disease-Prediction-Using-Deep-learning
  • paper_authors: Aidan Scannell, Riccardo Mereu, Paul Chang, Ella Tamir, Joni Pajarinen, Arno Solin
  • for: 提高深度神经网络(NN)的不确定性估计能力及纳入新数据的能力
  • methods: 通过对偶参数化将NN从权重空间转换到函数空间,从而以紧凑且有原则的方式刻画不确定性;对偶参数化还使我们能够构造捕捉整个数据集信息的稀疏表示,并在不重新训练的情况下纳入新数据且保持预测性能
  • results: 通过概念验证实验,证明了该方法能够在UCI基准的监督学习任务上有效量化不确定性
    Abstract Deep neural networks (NNs) are known to lack uncertainty estimates and struggle to incorporate new data. We present a method that mitigates these issues by converting NNs from weight space to function space, via a dual parameterization. Importantly, the dual parameterization enables us to formulate a sparse representation that captures information from the entire data set. This offers a compact and principled way of capturing uncertainty and enables us to incorporate new data without retraining whilst retaining predictive performance. We provide proof-of-concept demonstrations with the proposed approach for quantifying uncertainty in supervised learning on UCI benchmark tasks.
    摘要 深度神经网络(NN)通常缺乏不确定性估计,且难以纳入新数据。我们提出了一种方法,通过对偶参数化将NN从权重空间转换到函数空间,以缓解这些问题。重要的是,对偶参数化使我们能够构造一种捕捉整个数据集信息的稀疏表示,从而以紧凑且有原则的方式刻画不确定性,并能在不重新训练的情况下纳入新数据且保持预测性能。我们在UCI基准任务的监督学习中提供了概念验证示例,展示了所提方法在不确定性量化方面的效果。

Personalized Federated Deep Reinforcement Learning-based Trajectory Optimization for Multi-UAV Assisted Edge Computing

  • paper_url: http://arxiv.org/abs/2309.02193
  • repo_url: None
  • paper_authors: Zhengrong Song, Chuan Ma, Ming Ding, Howard H. Yang, Yuwen Qian, Xiangwei Zhou
  • for: 优化多无人机(UAV)的飞行轨迹,以最大化通信系统吞吐量
  • methods: 使用个性化联邦深度强化学习(PF-DRL)方法,为每个智能体训练个性化模型,以应对数据稀缺与数据异质性问题
  • results: 在仿真环境中,所提算法比其他基于DRL的方法具有更好的训练性能与更快的收敛速度,并提升了服务质量
    Abstract In the era of 5G mobile communication, there has been a significant surge in research focused on unmanned aerial vehicles (UAVs) and mobile edge computing technology. UAVs can serve as intelligent servers in edge computing environments, optimizing their flight trajectories to maximize communication system throughput. Deep reinforcement learning (DRL)-based trajectory optimization algorithms may suffer from poor training performance due to intricate terrain features and inadequate training data. To overcome this limitation, some studies have proposed leveraging federated learning (FL) to mitigate the data isolation problem and expedite convergence. Nevertheless, the efficacy of global FL models can be negatively impacted by the high heterogeneity of local data, which could potentially impede the training process and even compromise the performance of local agents. This work proposes a novel solution to address these challenges, namely personalized federated deep reinforcement learning (PF-DRL), for multi-UAV trajectory optimization. PF-DRL aims to develop individualized models for each agent to address the data scarcity issue and mitigate the negative impact of data heterogeneity. Simulation results demonstrate that the proposed algorithm achieves superior training performance with faster convergence rates, and improves service quality compared to other DRL-based approaches.
    摘要 在5G移动通信时代,针对无人机(UAV)与移动边缘计算技术的研究大量涌现。UAV可以在边缘计算环境中充当智能服务器,通过优化飞行轨迹来最大化通信系统吞吐量。基于深度强化学习(DRL)的轨迹优化算法可能因地形特征复杂和训练数据不足而训练性能不佳。为克服这一限制,一些研究提出利用联邦学习(FL)缓解数据孤岛问题并加速收敛。然而,本地数据的高度异质性可能削弱全局FL模型的效果,阻碍训练过程,甚至损害本地智能体的性能。本文提出了一种新的解决方案,即个性化联邦深度强化学习(PF-DRL),用于多UAV轨迹优化。PF-DRL旨在为每个智能体建立个性化模型,以解决数据稀缺问题并减轻数据异质性的负面影响。仿真结果表明,所提算法具有更好的训练性能与更快的收敛速度,并且相比其他基于DRL的方法提升了服务质量。

Bias Propagation in Federated Learning

  • paper_url: http://arxiv.org/abs/2309.02160
  • repo_url: https://github.com/privacytrustlab/bias_in_FL
  • paper_authors: Hongyan Chang, Reza Shokri
  • for: 这个论文旨在探讨联邦学习中的群体公平问题,具体来说是研究在分布式数据集上如何避免偏见的扩散。
  • methods: 这个论文使用了联邦学习的实际应用场景,对实际分布式数据集进行分析和解释,探讨偏见如何在联邦学习中传播。
  • results: 研究发现,在联邦学习中,偏见可以通过网络传播给所有参与方,而且这种偏见的程度高于中央训练模型使用所有数据集的情况。这些结果告诉我们,在联邦学习中需要进行审核和设计具有群体公平性的学习算法。
    Abstract We show that participating in federated learning can be detrimental to group fairness. In fact, the bias of a few parties against under-represented groups (identified by sensitive attributes such as gender or race) can propagate through the network to all the parties in the network. We analyze and explain bias propagation in federated learning on naturally partitioned real-world datasets. Our analysis reveals that biased parties unintentionally yet stealthily encode their bias in a small number of model parameters, and throughout the training, they steadily increase the dependence of the global model on sensitive attributes. What is important to highlight is that the experienced bias in federated learning is higher than what parties would otherwise encounter in centralized training with a model trained on the union of all their data. This indicates that the bias is due to the algorithm. Our work calls for auditing group fairness in federated learning and designing learning algorithms that are robust to bias propagation.
    摘要 我们证明,参与联邦学习可能损害群体公平性。事实上,少数参与方针对代表性不足群体(由性别或种族等敏感属性界定)的偏见,可以通过网络传播到所有参与方。我们在自然划分的真实世界数据集上分析并解释了联邦学习中的偏见传播。分析表明,带有偏见的参与方会在无意间悄然地将偏见编码到少量模型参数中,并在训练过程中不断加深全局模型对敏感属性的依赖。值得强调的是,联邦学习中出现的偏见高于各参与方在集中式训练(即用所有数据的并集训练单一模型)时所遭遇的偏见,这说明偏见源自算法本身。我们的工作呼吁对联邦学习中的群体公平性进行审计,并设计对偏见传播具有鲁棒性的学习算法。
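One way to make the propagation visible, in the spirit of the paper's analysis (the exact fairness metric per experiment is not specified in the abstract), is to track a group-fairness gap after every federated round:

```python
import numpy as np

def parity_gap(y_pred, sensitive):
    """Demographic-parity gap: difference in positive-prediction rates
    across the groups defined by a sensitive attribute; plotting this per
    federated round makes bias propagation visible."""
    groups = np.unique(sensitive)
    rates = [y_pred[sensitive == g].mean() for g in groups]
    return max(rates) - min(rates)
```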

A Simple Asymmetric Momentum Make SGD Greatest Again

  • paper_url: http://arxiv.org/abs/2309.02130
  • repo_url: None
  • paper_authors: Gongyue Zhang, Dinghuang Zhang, Shuwen Zhao, Donghan Liu, Carrie M. Toptan, Honghai Liu
  • for: 本研究针对深度学习中的鞍点问题,提出了一种新的损失控制非对称动量法(LCAM)。与传统的带动量SGD相比,LCAM不增加任何计算开销,却能超越现有的优化器。
  • methods: 本文使用权重共轭与牵引效应的概念来解释这一现象,并设计实验在指定的epoch快速降低学习率,使参数更容易被困在鞍点附近,从而比较不同优先级的非对称动量绕过鞍点的能力。
  • results: 以WRN28-10为测试网络,在Cifar100测试集上于约120个epoch达到80.78%的峰值平均测试准确率,高于原始WRN论文报告的80.75%(200个epoch),而收敛时间几乎减半。
    Abstract We propose the simplest SGD enhanced method ever, Loss-Controlled Asymmetric Momentum(LCAM), aimed directly at the Saddle Point problem. Compared to the traditional SGD with Momentum, there's no increase in computational demand, yet it outperforms all current optimizers. We use the concepts of weight conjugation and traction effect to explain this phenomenon. We designed experiments to rapidly reduce the learning rate at specified epochs to trap parameters more easily at saddle points. We selected WRN28-10 as the test network and chose cifar10 and cifar100 as test datasets, an identical group to the original paper of WRN and Cosine Annealing Scheduling(CAS). We compared the ability to bypass saddle points of Asymmetric Momentum with different priorities. Finally, using WRN28-10 on Cifar100, we achieved a peak average test accuracy of 80.78\% around 120 epoch. For comparison, the original WRN paper reported 80.75\%, while CAS was at 80.42\%, all at 200 epoch. This means that while potentially increasing accuracy, we use nearly half convergence time. Our demonstration code is available at\\ https://github.com/hakumaicc/Asymmetric-Momentum-LCAM
    摘要 我们提出了迄今最简单的SGD增强方法:损失控制非对称动量(LCAM),直接面向鞍点问题。与传统的带动量SGD相比,它不增加任何计算需求,却优于当前所有优化器。我们使用权重共轭与牵引效应的概念来解释这一现象。我们设计了实验,在指定的epoch快速降低学习率,以便更容易将参数困在鞍点附近,并比较了不同优先级的非对称动量绕过鞍点的能力。我们选择WRN28-10作为测试网络,选择cifar10和cifar100作为测试数据集(与WRN原始论文及余弦退火调度(CAS)相同)。最终,在Cifar100上使用WRN28-10,我们在约120个epoch达到了80.78%的峰值平均测试准确率;相比之下,原始WRN论文报告为80.75%,CAS为80.42%,二者均在200个epoch取得。这意味着我们在可能提高准确率的同时,仅用了接近一半的收敛时间。示例代码见:https://github.com/hakumaicc/Asymmetric-Momentum-LCAM
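The abstract does not give LCAM's exact update rule, so the following is only a loudly hedged reading of "loss-controlled asymmetric momentum": switch between two momentum coefficients depending on whether the loss just improved. Every detail below is an assumption.

```python
def lcam_step(params, grads, velocity, loss, prev_loss,
              lr=0.1, beta_fast=0.95, beta_slow=0.85):
    # Hypothetical rule: use the larger momentum coefficient while the
    # loss is decreasing, the smaller one otherwise (the asymmetry).
    beta = beta_fast if loss < prev_loss else beta_slow
    for p, g, v in zip(params, grads, velocity):
        v *= beta          # in-place so the caller's momentum state persists
        v += g
        p -= lr * v        # params, grads, velocity are NumPy arrays
```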

Exploiting Spatial-temporal Data for Sleep Stage Classification via Hypergraph Learning

  • paper_url: http://arxiv.org/abs/2309.02124
  • repo_url: None
  • paper_authors: Yuze Liu, Ziming Zhao, Tiehua Zhang, Kang Wang, Xin Chen, Xiaowei Huang, Jun Yin, Zhishu Shen
  • for: 睡眠阶段分类对诊断患者健康状况至关重要。现有模型主要用卷积神经网络(CNN)建模欧氏数据、用图卷积网络(GNN)建模非欧氏数据,无法同时兼顾多模态数据的异质性与交互性以及时空相关性,分类性能因此受限。
  • methods: 我们提出了动态学习框架STHL,引入超图来编码时空数据用于睡眠阶段分类。超图能够构造多模态/多类型数据,而非仅使用两个主体间的简单成对关系。STHL分别构建空间超边与时间超边以建立节点关联,再通过类型特定的超图学习过程将属性编码到嵌入空间。
  • results: 大量实验表明,所提出的STHL在睡眠阶段分类任务中优于当前最佳模型。
    Abstract Sleep stage classification is crucial for detecting patients' health conditions. Existing models, which mainly use Convolutional Neural Networks (CNN) for modelling Euclidean data and Graph Convolution Networks (GNN) for modelling non-Euclidean data, are unable to consider the heterogeneity and interactivity of multimodal data as well as the spatial-temporal correlation simultaneously, which hinders a further improvement of classification performance. In this paper, we propose a dynamic learning framework STHL, which introduces hypergraph to encode spatial-temporal data for sleep stage classification. Hypergraphs can construct multi-modal/multi-type data instead of using simple pairwise between two subjects. STHL creates spatial and temporal hyperedges separately to build node correlations, then it conducts type-specific hypergraph learning process to encode the attributes into the embedding space. Extensive experiments show that our proposed STHL outperforms the state-of-the-art models in sleep stage classification tasks.
    摘要 睡眠阶段分类是诊断患者健康状况的关键。现有模型主要使用卷积神经网络(CNN)建模欧氏数据、使用图卷积网络(GNN)建模非欧氏数据,但无法同时考虑多模态数据的异质性与交互性以及时空相关性,这限制了分类性能的进一步提升。本文提出了动态学习框架STHL,利用超图来编码时空数据以进行睡眠阶段分类。超图可以构建多模态/多类型数据,而不是使用两个主体之间简单的成对关系。STHL首先分别创建空间超边与时间超边以建立节点关联,随后进行类型特定的超图学习过程,将属性编码到嵌入空间中。大量实验表明,我们提出的STHL在睡眠阶段分类任务中表现优于现有模型。

An Efficient Approach to Unsupervised Out-of-Distribution Detection with Variational Autoencoders

  • paper_url: http://arxiv.org/abs/2309.02084
  • repo_url: https://github.com/zjlab-ammi/vae4ood
  • paper_authors: Zezhen Zeng, Bin Liu
  • for: 本文关注深度生成模型(DGM)的无监督分布外(OOD)检测,尤其是对隐变量使用标准正态先验分布的朴素变分自编码器(VAE):这类模型规模更小、训练与推断更快,相比更复杂的DGM更适合资源受限的应用。
  • methods: 我们为朴素VAE专门设计了一种新的OOD分数,称为误差缩减(ER)。ER引入了从输入图像的有损版本进行重建的思想,并考虑图像的Kolmogorov复杂度。我们在多个数据集上与基线方法进行了对比实验。
  • results: 我们的实验结果表明,我们的方法在多个数据集上具有显著优势,比较基准方法。我们的代码可以在 GitHub 上找到:https://github.com/ZJLAB-AMMI/VAE4OOD。
    Abstract This paper is concerned with deep generative models (DGMs) for unsupervised out-of-distribution (OOD) detection. In particular, we focus on vanilla Variational Autoencoders (VAE) that use a standard normal prior distribution for the latent variables. These models have a smaller model size, enabling faster training and inference, making them well-suited for resource-limited applications compared to more complex DGMs. We propose a novel OOD score called Error Reduction (ER) specifically designed for vanilla VAE. ER incorporate the idea of reconstructing image inputs from their lossy counterparts and takes into account the Kolmogorov complexity of the images. Experimental results on diverse datasets demonstrate the superiority of our approach over baseline methods. Our code is available at: https://github.com/ZJLAB-AMMI/VAE4OOD.
    摘要 本文研究用于无监督分布外(OOD)检测的深度生成模型(DGM)。我们特别关注对隐变量使用标准正态先验的朴素变分自编码器(VAE):这类模型规模更小,训练与推断更快,相比更复杂的DGM更适合资源受限的应用。我们为朴素VAE专门设计了一种新的OOD分数,称为误差缩减(ER)。ER结合了从有损版本重建原始图像的思想,并考虑了图像的Kolmogorov复杂度。在多个数据集上的实验结果表明,我们的方法优于基线方法。代码见:https://github.com/ZJLAB-AMMI/VAE4OOD。
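A hypothetical sketch of the ER idea as the abstract describes it — the `reconstruct` callable and the zlib proxy for Kolmogorov complexity are our assumptions, not the repository's API:

```python
import zlib
import numpy as np

def error_reduction_score(reconstruct, x, lossy_x):
    # reconstruct: callable mapping an image array to the VAE's
    # reconstruction (a hypothetical interface, not the repo's API).
    err_orig = np.mean((reconstruct(x) - x) ** 2)
    err_lossy = np.mean((reconstruct(lossy_x) - x) ** 2)
    # zlib length as a cheap stand-in for Kolmogorov complexity.
    complexity = len(zlib.compress(np.ascontiguousarray(x).tobytes()))
    return (err_lossy - err_orig) / complexity
```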

BeeTLe: A Framework for Linear B-Cell Epitope Prediction and Classification

  • paper_url: http://arxiv.org/abs/2309.02071
  • repo_url: https://github.com/yuanx749/bcell
  • paper_authors: Xiao Yuan
  • for: 本文提出一种新的基于深度学习的多任务框架,用于线性B细胞表位预测以及抗体类型特异的表位分类。
  • methods: 本文提出一种基于序列的神经网络模型,由循环层与Transformer模块构成;并提出一种基于特征分解的氨基酸编码方法,以帮助模型学习表位的表示;此外,通过扩展logit调整技术改造标准交叉熵损失,以应对类别不平衡。
  • results: 在最大公开表位数据库整理的数据上的实验结果表明,所提方法能有效预测B细胞表位,性能优于对比方法。
    Abstract The process of identifying and characterizing B-cell epitopes, which are the portions of antigens recognized by antibodies, is important for our understanding of the immune system, and for many applications including vaccine development, therapeutics, and diagnostics. Computational epitope prediction is challenging yet rewarding as it significantly reduces the time and cost of laboratory work. Most of the existing tools do not have satisfactory performance and only discriminate epitopes from non-epitopes. This paper presents a new deep learning-based multi-task framework for linear B-cell epitope prediction as well as antibody type-specific epitope classification. Specifically, a sequenced-based neural network model using recurrent layers and Transformer blocks is developed. We propose an amino acid encoding method based on eigen decomposition to help the model learn the representations of epitopes. We introduce modifications to standard cross-entropy loss functions by extending a logit adjustment technique to cope with the class imbalance. Experimental results on data curated from the largest public epitope database demonstrate the validity of the proposed methods and the superior performance compared to competing ones.
    摘要 识别和刻画B细胞表位(即被抗体识别的抗原片段)的过程,对理解免疫系统以及疫苗开发、治疗和诊断等诸多应用都十分重要。计算表位预测具有挑战性,但回报可观,因为它能大幅降低实验室工作的时间和成本。现有工具大多性能欠佳,且仅能区分表位与非表位。本文提出一种新的基于深度学习的多任务框架,用于线性B细胞表位预测以及抗体类型特异的表位分类。具体而言,我们构建了由循环层与Transformer模块组成的基于序列的神经网络模型;提出了一种基于特征分解的氨基酸编码方法,帮助模型学习表位的表示;并通过扩展logit调整技术修改标准交叉熵损失函数,以应对类别不平衡。在最大公开表位数据库整理的数据上的实验结果验证了所提方法的有效性,其性能优于对比方法。
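The class-imbalance fix the abstract mentions extends logit adjustment; the standard form it builds on looks like this (the paper's exact extension may differ):

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, targets, class_counts, tau=1.0):
    """Logit-adjusted cross-entropy: shift each logit by tau * log(prior)
    so rare classes (here, epitopes) are not swamped by the majority class."""
    priors = class_counts.float() / class_counts.sum()
    adjusted = logits + tau * torch.log(priors).to(logits.device)
    return F.cross_entropy(adjusted, targets)
```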

Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI

  • paper_url: http://arxiv.org/abs/2309.02065
  • repo_url: None
  • paper_authors: Dustin Wright, Christian Igel, Gabrielle Samuel, Raghavendra Selvan
  • for: 本文旨在探讨机器学习(ML)技术的环境可持续性问题,并论证仅关注效率不足以解决该问题。
  • methods: 本文以深度学习(DL)等ML方法为例,结合系统思维来探讨ML技术对环境的影响。
  • results: 本文认为,提高ML系统的效率并不能完全消除其环境影响,还需要考虑多个变量之间的交互作用。
    Abstract Artificial Intelligence (AI) is currently spearheaded by machine learning (ML) methods such as deep learning (DL) which have accelerated progress on many tasks thought to be out of reach of AI. These ML methods can often be compute hungry, energy intensive, and result in significant carbon emissions, a known driver of anthropogenic climate change. Additionally, the platforms on which ML systems run are associated with environmental impacts including and beyond carbon emissions. The solution lionized by both industry and the ML community to improve the environmental sustainability of ML is to increase the efficiency with which ML systems operate in terms of both compute and energy consumption. In this perspective, we argue that efficiency alone is not enough to make ML as a technology environmentally sustainable. We do so by presenting three high level discrepancies between the effect of efficiency on the environmental sustainability of ML when considering the many variables which it interacts with. In doing so, we comprehensively demonstrate, at multiple levels of granularity both technical and non-technical reasons, why efficiency is not enough to fully remedy the environmental impacts of ML. Based on this, we present and argue for systems thinking as a viable path towards improving the environmental sustainability of ML holistically.
    摘要 人工智能(AI)目前由深度学习(DL)等机器学习(ML)方法引领,这些方法在许多曾被认为AI难以企及的任务上加速了进展。然而,这些ML方法往往计算量大、能耗高,并产生可观的碳排放(已知的人为气候变化驱动因素);此外,运行ML系统的平台还带来碳排放之外的环境影响。业界与ML社区推崇的解决方案是提高ML系统在计算与能耗方面的效率。然而,我们认为仅靠效率不足以使ML成为环境可持续的技术,因为当ML系统与众多变量相互作用时,提高效率对环境可持续性的影响是复杂的。为说明这一观点,本文提出了三个层面的不一致:1. 碳排放与能源消耗之间的复杂关系;2. ML系统的可持续性受硬件、软件、供应链和使用者等多重因素影响;3. 提高效率可能带来新的环境与社会影响,例如对资源的掌控和可持续性。这些问题表明,仅提高ML系统的效率不能全面解决环境可持续性问题。因此,我们提出并论证以系统思维作为整体改善ML环境可持续性的可行路径。

MvFS: Multi-view Feature Selection for Recommender System

  • paper_url: http://arxiv.org/abs/2309.02064
  • repo_url: None
  • paper_authors: Youngjune Lee, Yeongjong Jeong, Keunchan Park, SeongKu Kang
  • for: 提高推荐系统中特征选择的性能,以适应不同的数据场景。
  • methods: 使用多视图网络与对每个特征域独立应用的重要性打分建模,避免特征选择过程偏向频繁出现的主导特征,从而更有效地选择有信息量的特征。
  • results: 在真实数据上的实验表明,MvFS 比最先进的基线方法更有效。
    Abstract Feature selection, which is a technique to select key features in recommender systems, has received increasing research attention. Recently, Adaptive Feature Selection (AdaFS) has shown remarkable performance by adaptively selecting features for each data instance, considering that the importance of a given feature field can vary significantly across data. However, this method still has limitations in that its selection process could be easily biased to major features that frequently occur. To address these problems, we propose Multi-view Feature Selection (MvFS), which selects informative features for each instance more effectively. Most importantly, MvFS employs a multi-view network consisting of multiple sub-networks, each of which learns to measure the feature importance of a part of data with different feature patterns. By doing so, MvFS mitigates the bias problem towards dominant patterns and promotes a more balanced feature selection process. Moreover, MvFS adopts an effective importance score modeling strategy which is applied independently to each field without incurring dependency among features. Experimental results on real-world datasets demonstrate the effectiveness of MvFS compared to state-of-the-art baselines.
    摘要 特征选择是推荐系统中用于挑选关键特征的技术,近来受到越来越多的研究关注。自适应特征选择(AdaFS)通过为每个数据实例自适应地选择特征取得了出色的性能,因为给定特征域的重要性在不同数据间可能差异显著。然而,该方法的选择过程仍容易偏向频繁出现的主导特征。为解决这些问题,我们提出了多视图特征选择(MvFS),能够更有效地为每个实例选择有信息量的特征。最重要的是,MvFS采用由多个子网络组成的多视图网络,每个子网络学习衡量具有不同特征模式的部分数据的特征重要性,从而缓解对主导模式的偏向,促进更均衡的特征选择过程。此外,MvFS采用对每个特征域独立应用的有效重要性打分建模策略,避免引入特征间的依赖。在真实数据集上的实验结果表明,MvFS相比最先进的基线方法更为有效。
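A minimal PyTorch sketch of the multi-view gating idea (layer sizes and the mixing rule are assumptions): several sub-networks score field importance, a softmax mixes their views, and the result soft-gates the field embeddings.

```python
import torch
import torch.nn as nn

class MvFSGate(nn.Module):
    def __init__(self, n_fields, dim, n_views=3):
        super().__init__()
        # One importance-scoring sub-network per view.
        self.views = nn.ModuleList(
            nn.Sequential(nn.Linear(n_fields * dim, 64), nn.ReLU(),
                          nn.Linear(64, n_fields))
            for _ in range(n_views))
        self.mix = nn.Linear(n_fields * dim, n_views)

    def forward(self, field_emb):            # (B, n_fields, dim)
        flat = field_emb.flatten(1)
        scores = torch.stack([v(flat) for v in self.views], dim=-1)  # (B, F, V)
        alpha = torch.softmax(self.mix(flat), dim=-1)                # (B, V)
        gate = torch.sigmoid((scores * alpha.unsqueeze(1)).sum(-1))  # (B, F)
        return field_emb * gate.unsqueeze(-1)    # softly gated embeddings
```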

No-Regret Caching with Noisy Request Estimates

  • paper_url: http://arxiv.org/abs/2309.02055
  • repo_url: None
  • paper_authors: Younes Ben Mazziane, Francescomaria Faticanti, Giovanni Neglia, Sara Alouf
  • for: 这个论文目的是设计缓存策略,以满足在高负荷和/或内存约束的情况下的缓存需求。
  • methods: 这个论文使用了在线学习算法,以实现缓存策略的设计,并且对缓存请求序列进行了预测。
  • results: 该论文提出了一种名为"噪声跟随扰动领导者"(Noisy-Follow-the-Perturbed-Leader,NFPL)的算法,可以在请求估计带噪的情况下实现亚线性遗憾的缓存策略。此外,论文还将所提方法与经典缓存策略进行了比较,并在合成与真实请求轨迹上验证了其可行性。
    Abstract Online learning algorithms have been successfully used to design caching policies with regret guarantees. Existing algorithms assume that the cache knows the exact request sequence, but this may not be feasible in high load and/or memory-constrained scenarios, where the cache may have access only to sampled requests or to approximate requests' counters. In this paper, we propose the Noisy-Follow-the-Perturbed-Leader (NFPL) algorithm, a variant of the classic Follow-the-Perturbed-Leader (FPL) when request estimates are noisy, and we show that the proposed solution has sublinear regret under specific conditions on the requests estimator. The experimental evaluation compares the proposed solution against classic caching policies and validates the proposed approach under both synthetic and real request traces.
    摘要 在线学习算法已被成功用于设计具有遗憾保证的缓存策略。现有算法假设缓存知道确切的请求序列,但在高负载和/或内存受限的场景下,缓存可能只能访问采样后的请求或近似的请求计数器,因此该假设未必可行。本文提出了噪声跟随扰动领导者(NFPL)算法,它是经典跟随扰动领导者(FPL)算法在请求估计带噪情形下的变体,并证明了在请求估计器满足特定条件时,所提方案具有亚线性遗憾。实验评估将所提方案与经典缓存策略进行了比较,并在合成与真实请求轨迹上验证了所提方法。
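One NFPL decision step can be sketched in a few lines (Gaussian perturbations as in classical FPL-style caching; the noise model on the counts and the parameter eta follow the paper's setting only loosely):

```python
import numpy as np

def nfpl_cache(noisy_counts, k, eta=1.0, rng=np.random.default_rng()):
    """Perturb the (noisy) cumulative request estimates and cache the
    top-k files. With exact counts this is classic FPL; the paper's
    analysis covers the case where the counts are themselves noisy."""
    perturbed = noisy_counts + eta * rng.standard_normal(len(noisy_counts))
    return np.argsort(perturbed)[-k:]          # ids of the files to cache
```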

Model-agnostic network inference enhancement from noisy measurements via curriculum learning

  • paper_url: http://arxiv.org/abs/2309.02050
  • repo_url: https://github.com/xiaoyuans/manie
  • paper_authors: Kai Wu, Yuanyuan Li, Jing Liu
  • for: 提高网络推理模型对噪声的抵抗性能
  • methods: curriculum learning + 模型自适应阈值调整 + 数据 augmentation
  • results: 在多种噪声环境下,提高了各种网络推理模型的性能,特别是在清晰样本充沥的情况下表现出色
    Abstract Noise is a pervasive element within real-world measurement data, significantly undermining the performance of network inference models. However, the quest for a comprehensive enhancement framework capable of bolstering noise resistance across a diverse array of network inference models has remained elusive. Here, we present an elegant and efficient framework tailored to amplify the capabilities of network inference models in the presence of noise. Leveraging curriculum learning, we mitigate the deleterious impact of noisy samples on network inference models. Our proposed framework is model-agnostic, seamlessly integrable into a plethora of model-based and model-free network inference methods. Notably, we utilize one model-based and three model-free network inference methods as the foundation. Extensive experimentation across various synthetic and real-world networks, encapsulating diverse nonlinear dynamic processes, showcases substantial performance augmentation under varied noise types, particularly thriving in scenarios enriched with clean samples. This framework's adeptness in fortifying both model-free and model-based network inference methodologies paves the avenue towards a comprehensive and unified enhancement framework, encompassing the entire spectrum of network inference models. Available Code: https://github.com/xiaoyuans/MANIE.
    摘要 噪声是现实世界测量数据中的一种普遍存在的元素,对网络推理模型的性能产生了重要的影响。然而,一个能够在多种网络推理模型上全面提高抗噪性能的框架一直未能实现。在这里,我们提出了一个简洁而高效的框架,用于增强网络推理模型在噪声存在下的性能。我们利用课程学习来减轻噪声样本对网络推理模型的负面影响。我们提出的框架与模型无关,可以轻松地整合到多种基于模型和无模型的网络推理方法中。特别是,我们使用了一个基于模型的和三个无模型的网络推理方法作为基础。在多种合成和真实网络(涵盖多种非线性动力学过程)上的广泛实验表明,该框架在不同噪声类型下均带来显著的性能提升,在干净样本充足的场景下尤为出色。该框架对无模型与基于模型的网络推理方法的增强能力,为建立覆盖整个网络推理模型谱系的全面统一增强框架铺平了道路。代码见:https://github.com/xiaoyuans/MANIE。

Probabilistic Self-supervised Learning via Scoring Rules Minimization

  • paper_url: http://arxiv.org/abs/2309.02048
  • repo_url: None
  • paper_authors: Amirhossein Vahidi, Simon Schoßer, Lisa Wimmer, Yawei Li, Bernd Bischl, Eyke Hüllermeier, Mina Rezaei
  • for: 提高自监督学习的表示质量并避免表示塌缩。
  • methods: 提出基于适当评分规则最小化的概率自监督学习方法(ProSMIN):在线网络与目标网络通过知识蒸馏相互学习多样化的表示分布,并以基于适当评分规则的新损失函数联合训练。
  • results: 在分布内泛化、分布外检测、数据集损坏、小样本学习与迁移学习等多种下游任务上取得更高的准确率与校准性,在ImageNet-O、ImageNet-C等大规模数据集的大量实验中超越自监督基线,展示了可扩展性与实际适用性。
    Abstract In this paper, we propose a novel probabilistic self-supervised learning via Scoring Rule Minimization (ProSMIN), which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks; the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through knowledge distillation. By presenting the input samples in two augmented formats, the online network is trained to predict the target network representation of the same sample under a different augmented view. The two networks are trained via our new loss function based on proper scoring rules. We provide a theoretical justification for ProSMIN's convergence, demonstrating the strict propriety of its modified scoring rule. This insight validates the method's optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, such as in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on large-scale datasets like ImageNet-O and ImageNet-C, ProSMIN demonstrates its scalability and real-world applicability.
    摘要 本文提出了一种新的基于评分规则最小化的概率自监督学习方法(ProSMIN),利用概率模型的能力来提升表示质量并缓解表示塌缩。所提方法包含两个神经网络:在线网络与目标网络,二者相互协作,通过知识蒸馏学习彼此多样化的表示分布。输入样本以两种增广形式呈现,在线网络被训练去预测目标网络对同一样本另一增广视图的表示。两个网络通过我们基于适当评分规则的新损失函数进行训练。我们为ProSMIN的收敛给出了理论依据,证明了其修改后评分规则的严格适当性;这一结论验证了该方法的优化过程,并有助于其在提升表示质量方面的鲁棒性与有效性。我们在分布内泛化、分布外检测、数据集损坏、小样本学习与迁移学习等多种下游任务上评估了该概率模型,取得了更高的准确率与校准性,在ImageNet-O与ImageNet-C等大规模数据集的大量实验中超越了自监督基线,展示了其可扩展性与实际适用性。
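As an example of a strictly proper scoring rule in this setting (the paper's exact rule and parameterization are assumptions), the log score of the target network's embedding under a Gaussian predicted by the online network:

```python
import torch

def log_score_loss(mu, log_var, target_emb):
    # Gaussian negative log-likelihood (the log score, a strictly proper
    # scoring rule) of the target network's embedding under the online
    # network's predicted distribution; additive constants dropped.
    var = log_var.exp()
    nll = 0.5 * (((target_emb - mu) ** 2) / var + log_var).sum(dim=-1)
    return nll.mean()
```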

Data-Juicer: A One-Stop Data Processing System for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.02033
  • repo_url: https://github.com/alibaba/data-juicer
  • paper_authors: Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou
  • for: 大语言模型(LLM)的数据处理
  • methods: Data-Juicer 提供50多个功能多样的内置算子与可插拔工具,强调模块化、可组合与可扩展性,以满足多种LLM数据处理需求。
  • results: 实证验证显示,LLaMA 性能最多相对提升 7.45%,单机处理时间最多减少 88.7%。
    Abstract The immense evolution in Large Language Models (LLMs) has underscored the importance of massive, diverse, and high-quality data. Despite this, existing open-source tools for LLM data processing remain limited and mostly tailored to specific datasets, with an emphasis on the reproducibility of released data over adaptability and usability, inhibiting potential applications. In response, we propose a one-stop, powerful yet flexible and user-friendly LLM data processing system named Data-Juicer. Our system offers over 50 built-in versatile operators and pluggable tools, which synergize modularity, composability, and extensibility dedicated to diverse LLM data processing needs. By incorporating visualized and automatic evaluation capabilities, Data-Juicer enables a timely feedback loop to accelerate data processing and gain data insights. To enhance usability, Data-Juicer provides out-of-the-box components for users with various backgrounds, and fruitful data recipes for LLM pre-training and post-tuning usages. Further, we employ multi-facet system optimization and seamlessly integrate Data-Juicer with both LLM and distributed computing ecosystems, to enable efficient and scalable data processing. Empirical validation of the generated data recipes reveals considerable improvements in LLaMA performance for various pre-training and post-tuning cases, demonstrating up to 7.45% relative improvement of averaged score across 16 LLM benchmarks and 16.25% higher win rate using pair-wise GPT-4 evaluation. The system's efficiency and scalability are also validated, supported by up to 88.7% reduction in single-machine processing time, 77.1% and 73.1% less memory and CPU usage respectively, and 7.91x processing acceleration when utilizing distributed computing ecosystems. Our system, data recipes, and multiple tutorial demos are released, calling for broader research centered on LLM data.
    摘要 大语言模型(LLM)的迅猛发展凸显了海量、多样且高质量数据的重要性。然而,现有的LLM数据处理开源工具仍然有限,且大多针对特定数据集而设计,强调已发布数据的可复现性而非适应性与易用性,限制了其应用前景。为此,我们提出了一站式、强大而灵活易用的LLM数据处理系统Data-Juicer。该系统提供50多个功能多样的内置算子与可插拔工具,兼顾模块化、可组合性与可扩展性,以满足多样的LLM数据处理需求。通过内置的可视化与自动评估能力,Data-Juicer可提供及时的反馈回路,加速数据处理并获得数据洞察。为提升易用性,Data-Juicer为不同背景的用户提供开箱即用的组件,以及面向LLM预训练与后调优场景的丰富数据配方。此外,我们进行了多方面的系统优化,并将Data-Juicer与LLM及分布式计算生态无缝集成,以实现高效、可扩展的数据处理。对生成数据配方的实证验证表明,LLaMA在多种预训练与后调优场景下性能显著提升:在16个LLM基准上平均分相对提升至多7.45%,在成对GPT-4评估中胜率提高16.25%。系统的效率与可扩展性也得到了验证:单机处理时间最多减少88.7%,内存与CPU占用分别减少77.1%和73.1%,在分布式计算生态下处理加速7.91倍。我们已发布系统、数据配方及多个教程示例,呼吁更广泛的面向LLM数据的研究。

Non-Parametric Representation Learning with Kernels

  • paper_url: http://arxiv.org/abs/2309.02028
  • repo_url: https://github.com/himanshub1007/Alzhimers-Disease-Prediction-Using-Deep-learning
  • paper_authors: Pascal Esser, Maximilian Fleissner, Debarghya Ghoshdastidar
  • for: 学习无监督的特征表示,从无标签数据中学习有用的特征。
  • methods: 使用kernel-based方法进行表示学习,包括对冲损函数和自适应 Encoder(AE)模型。
  • results: 提出了新的表示理论,并 derive了泛化误差上限,进行表示学习的评估。
    Abstract Unsupervised and self-supervised representation learning has become popular in recent years for learning useful features from unlabelled data. Representation learning has been mostly developed in the neural network literature, and other models for representation learning are surprisingly unexplored. In this work, we introduce and analyze several kernel-based representation learning approaches: Firstly, we define two kernel Self-Supervised Learning (SSL) models using contrastive loss functions and secondly, a Kernel Autoencoder (AE) model based on the idea of embedding and reconstructing data. We argue that the classical representer theorems for supervised kernel machines are not always applicable for (self-supervised) representation learning, and present new representer theorems, which show that the representations learned by our kernel models can be expressed in terms of kernel matrices. We further derive generalisation error bounds for representation learning with kernel SSL and AE, and empirically evaluate the performance of these methods in both small data regimes as well as in comparison with neural network based models.
    摘要 近年来,无监督与自监督表示学习在从无标注数据中学习有用特征方面广受欢迎。表示学习主要发展于神经网络文献中,其他表示学习模型却出人意料地鲜有研究。在本工作中,我们引入并分析了若干基于核的表示学习方法:首先,我们利用对比损失函数定义了两种核自监督学习(SSL)模型;其次,基于数据嵌入与重建的思想提出了核自编码器(AE)模型。我们指出,经典的监督核机器表示者定理并不总适用于(自监督)表示学习,并给出了新的表示者定理,表明我们的核模型所学习的表示可以用核矩阵表出。我们进一步推导了核SSL与核AE表示学习的泛化误差界,并在小数据场景下以及与基于神经网络的模型对比中,对这些方法的性能进行了实证评估。
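The representer-style result means the learned representations are linear in the kernel matrix; a minimal sketch (the RBF kernel and the shapes are assumptions):

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Pairwise RBF kernel between rows of X (m, d) and Z (n, d).
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_representation(X_train, X_new, Alpha, gamma=1.0):
    # h(x) = sum_i alpha_i k(x_i, x): p-dimensional representations that
    # are linear in the kernel matrix, with Alpha (n, p) the trainable
    # coefficients fitted by a contrastive or reconstruction objective.
    return rbf_kernel(X_new, X_train, gamma) @ Alpha   # (m, p)
```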

Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length

  • paper_url: http://arxiv.org/abs/2309.02027
  • repo_url: None
  • paper_authors: Katerina Hlavackova-Schindler, Anna Melnykova, Irene Tubikanec
  • for: 本文针对多元霍克斯过程(MHP)中连接图(刻画各分量间Granger因果关系)的估计与模型选择问题,提出一种基于最小消息长度(MML)原理的优化准则与模型选择算法。
  • methods: 本文研究带指数衰减核的MHP,按照奥卡姆剃刀原则用MML比较Granger因果模型:即使各模型对观测数据的拟合程度相当,也偏好能给出最简洁数据解释的模型。
  • results: 在时间范围较短、基于lasso类惩罚的主流方法容易过拟合的场景下,所提基于MML的方法仍取得较高的F1分数,并在特定稀疏图设定的数值研究中取得最高F1分数。进一步将该方法应用于G7主权债券数据,得到了与文献中专家知识一致的因果连接。
    Abstract Multivariate Hawkes processes (MHPs) are versatile probabilistic tools used to model various real-life phenomena: earthquakes, operations on stock markets, neuronal activity, virus propagation and many others. In this paper, we focus on MHPs with exponential decay kernels and estimate connectivity graphs, which represent the Granger causal relations between their components. We approach this inference problem by proposing an optimization criterion and model selection algorithm based on the minimum message length (MML) principle. MML compares Granger causal models using the Occam's razor principle in the following way: even when models have a comparable goodness-of-fit to the observed data, the one generating the most concise explanation of the data is preferred. While most of the state-of-art methods using lasso-type penalization tend to overfitting in scenarios with short time horizons, the proposed MML-based method achieves high F1 scores in these settings. We conduct a numerical study comparing the proposed algorithm to other related classical and state-of-art methods, where we achieve the highest F1 scores in specific sparse graph settings. We illustrate the proposed method also on G7 sovereign bond data and obtain causal connections, which are in agreement with the expert knowledge available in the literature.
    摘要 多元霍克斯过程(MHP)是一种用途广泛的概率工具,可用于建模地震、股票市场交易、神经元活动、病毒传播等多种现实现象。本文关注带指数衰减核的MHP,并估计表示各分量间Granger因果关系的连接图。我们基于最小消息长度(MML)原理提出了优化准则与模型选择算法来处理该推断问题。MML依照奥卡姆剃刀原则比较Granger因果模型:即使各模型对观测数据的拟合程度相当,也偏好能对数据给出最简洁解释的模型。在时间范围较短的场景下,多数采用lasso类惩罚的最新方法容易过拟合,而所提基于MML的方法仍能取得较高的F1分数。我们进行了数值研究,将所提算法与其他相关的经典及最新方法进行比较,并在特定稀疏图设定下取得最高F1分数。我们还将该方法应用于G7主权债券数据,得到的因果连接与文献中的专家知识一致。

RDGSL: Dynamic Graph Representation Learning with Structure Learning

  • paper_url: http://arxiv.org/abs/2309.02025
  • repo_url: None
  • paper_authors: Siwei Zhang, Yun Xiong, Yao Zhang, Yiheng Sun, Xi Chen, Yizhu Jiao, Yangyong Zhu
  • for: 本研究旨在学习 kontinuous-time 动态图 Representation,以提高下游任务的效果。
  • methods: 本研究提出了 RDGSL 方法,包含动态图结构学习与时间嵌入学习器两个关键组成部分:前者能够有效抑制噪声,后者借助注意力机制选择性地忽略噪声边,从而增强表示的表达能力。
  • results: 本研究的方法在 downstream 任务中表现出了5.1% 绝对 AUC 提升,与第二个基线相比。
    Abstract Temporal Graph Networks (TGNs) have shown remarkable performance in learning representation for continuous-time dynamic graphs. However, real-world dynamic graphs typically contain diverse and intricate noise. Noise can significantly degrade the quality of representation generation, impeding the effectiveness of TGNs in downstream tasks. Though structure learning is widely applied to mitigate noise in static graphs, its adaptation to dynamic graph settings poses two significant challenges. i) Noise dynamics. Existing structure learning methods are ill-equipped to address the temporal aspect of noise, hampering their effectiveness in such dynamic and ever-changing noise patterns. ii) More severe noise. Noise may be introduced along with multiple interactions between two nodes, leading to the re-pollution of these nodes and consequently causing more severe noise compared to static graphs. In this paper, we present RDGSL, a representation learning method in continuous-time dynamic graphs. Meanwhile, we propose dynamic graph structure learning, a novel supervisory signal that empowers RDGSL with the ability to effectively combat noise in dynamic graphs. To address the noise dynamics issue, we introduce the Dynamic Graph Filter, where we innovatively propose a dynamic noise function that dynamically captures both current and historical noise, enabling us to assess the temporal aspect of noise and generate a denoised graph. We further propose the Temporal Embedding Learner to tackle the challenge of more severe noise, which utilizes an attention mechanism to selectively turn a blind eye to noisy edges and hence focus on normal edges, enhancing the expressiveness for representation generation that remains resilient to noise. Our method demonstrates robustness towards downstream tasks, resulting in up to 5.1% absolute AUC improvement in evolving classification versus the second-best baseline.
    摘要 时间图网络(TGN)在学习连续时间动态图的表示方面表现出色,但现实中的动态图通常包含多样且复杂的噪声。噪声会显著降低表示生成的质量,削弱TGN在下游任务中的效果。尽管结构学习被广泛用于缓解静态图中的噪声,但将其迁移到动态图设定面临两大挑战。其一,噪声的动态性:现有结构学习方法难以处理噪声的时间维度,在这种动态且不断变化的噪声模式下效果受限。其二,更严重的噪声:噪声可能随两节点间的多次交互引入,导致节点被反复污染,噪声因此比静态图中更为严重。本文提出了连续时间动态图上的表示学习方法RDGSL,并提出动态图结构学习这一新的监督信号,使RDGSL能够有效对抗动态图中的噪声。针对噪声动态性问题,我们引入动态图滤波器,创新性地提出能够同时捕捉当前与历史噪声的动态噪声函数,从而评估噪声的时间维度并生成去噪后的图。针对更严重的噪声,我们进一步提出时间嵌入学习器,利用注意力机制选择性地忽略噪声边、聚焦正常边,增强表示生成的表达能力并保持对噪声的韧性。我们的方法在下游任务中表现出鲁棒性,在演化分类任务上相比次优基线取得最高5.1%的绝对AUC提升。

PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates

  • paper_url: http://arxiv.org/abs/2309.02014
  • repo_url: None
  • paper_authors: Zachary Frangella, Pratik Rathore, Shipu Zhao, Madeleine Udell
  • for: 求解机器学习中出现的大规模凸优化问题,如岭回归与逻辑回归。
  • methods: 提出一组基于随机草图(sketching)的预条件随机梯度算法,包括SVRG、SAGA和Katyusha的预条件版本;每种方法均配有严格的理论分析与有效的默认超参数。
  • results: 实验表明,在默认超参数下,所提方法优于或持平经过精心调参的随机梯度优化器;理论上引入二次正则性概念,即使预条件子更新不频繁也能保证全局线性收敛。
    Abstract This paper introduces PROMISE ($\textbf{Pr}$econditioned Stochastic $\textbf{O}$ptimization $\textbf{M}$ethods by $\textbf{I}$ncorporating $\textbf{S}$calable Curvature $\textbf{E}$stimates), a suite of sketching-based preconditioned stochastic gradient algorithms for solving large-scale convex optimization problems arising in machine learning. PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha; each algorithm comes with a strong theoretical analysis and effective default hyperparameter values. In contrast, traditional stochastic gradient methods require careful hyperparameter tuning to succeed, and degrade in the presence of ill-conditioning, a ubiquitous phenomenon in machine learning. Empirically, we verify the superiority of the proposed algorithms by showing that, using default hyperparameter values, they outperform or match popular tuned stochastic gradient optimizers on a test bed of $51$ ridge and logistic regression problems assembled from benchmark machine learning repositories. On the theoretical side, this paper introduces the notion of quadratic regularity in order to establish linear convergence of all proposed methods even when the preconditioner is updated infrequently. The speed of linear convergence is determined by the quadratic regularity ratio, which often provides a tighter bound on the convergence rate compared to the condition number, both in theory and in practice, and explains the fast global linear convergence of the proposed methods.
    摘要 本文提出PROMISE(通过可扩展曲率估计实现预条件的随机优化方法),这是一组基于随机草图的预条件随机梯度算法,用于求解机器学习中出现的大规模凸优化问题,包括SVRG、SAGA和Katyusha的预条件版本;每种算法均配有严格的理论分析与有效的默认超参数。实证方面,我们在来自基准机器学习仓库的51个岭回归和逻辑回归问题上验证了所提算法的优越性:在使用默认超参数的情况下,其性能优于或持平经过调参的主流随机梯度优化器。理论方面,本文引入二次正则性(quadratic regularity)的概念,以建立所有所提方法的线性收敛性,即使预条件子更新不频繁。线性收敛速度由二次正则性比率决定,该比率在理论与实践中通常比条件数给出更紧的收敛率界,并解释了所提方法快速的全局线性收敛。
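The "scalable curvature estimate" can be sketched with a randomized Nystrom approximation of the Hessian built from Hessian-vector products; the returned closure applies the regularized inverse as a preconditioner. The rank, the regularizer rho, and the stabilization constant are assumptions, not PROMISE's actual defaults:

```python
import numpy as np

def nystrom_precond(hess_vec, d, rank=20, rho=1e-3, rng=np.random.default_rng()):
    Omega = rng.standard_normal((d, rank))
    Y = np.column_stack([hess_vec(Omega[:, i]) for i in range(rank)])
    C = Omega.T @ Y                                # small sketch of the Hessian
    L = np.linalg.cholesky(C + 1e-8 * np.eye(rank))
    B = np.linalg.solve(L, Y.T).T                  # B @ B.T ~= Y C^{-1} Y^T ~= H
    U, S, _ = np.linalg.svd(B, full_matrices=False)
    lam = S ** 2                                   # approximate top eigenvalues

    def apply(g):                                  # (H_hat + rho I)^{-1} g
        Ug = U.T @ g
        return U @ (Ug / (lam + rho)) + (g - U @ Ug) / rho

    return apply                                   # use on stochastic gradients
```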

Representation Learning Dynamics of Self-Supervised Models

  • paper_url: http://arxiv.org/abs/2309.02011
  • repo_url: None
  • paper_authors: Pascal Esser, Satyaki Mukherjee, Debarghya Ghoshdastidar
  • for: 本研究探讨自监督学习(Self-Supervised Learning,SSL)模型的学习动力学,特别是通过最小化对比与非对比损失得到的表示。
  • methods: 作者证明将多元回归的动力学朴素推广到SSL会导致模型学到平凡的标量表示,即出现维度塌缩;为此提出带权重正交约束的SSL目标函数,并推导在Grassmann流形上以梯度下降训练时的精确(与网络宽度无关)学习动力学。
  • results: 研究还论证了SSL模型的无限宽近似与监督模型的神经正切核近似存在显著差异,并以数值实验验证了理论结论。
    Abstract Self-Supervised Learning (SSL) is an important paradigm for learning representations from unlabelled data, and SSL with neural networks has been highly successful in practice. However current theoretical analysis of SSL is mostly restricted to generalisation error bounds. In contrast, learning dynamics often provide a precise characterisation of the behaviour of neural networks based models but, so far, are mainly known in supervised settings. In this paper, we study the learning dynamics of SSL models, specifically representations obtained by minimising contrastive and non-contrastive losses. We show that a naive extension of the dymanics of multivariate regression to SSL leads to learning trivial scalar representations that demonstrates dimension collapse in SSL. Consequently, we formulate SSL objectives with orthogonality constraints on the weights, and derive the exact (network width independent) learning dynamics of the SSL models trained using gradient descent on the Grassmannian manifold. We also argue that the infinite width approximation of SSL models significantly deviate from the neural tangent kernel approximations of supervised models. We numerically illustrate the validity of our theoretical findings, and discuss how the presented results provide a framework for further theoretical analysis of contrastive and non-contrastive SSL.
    摘要 自监督学习(SSL)是从无标注数据中学习表示的重要范式,基于神经网络的SSL在实践中取得了巨大成功。然而,目前对SSL的理论分析大多局限于泛化误差界;相比之下,学习动力学往往能精确刻画基于神经网络的模型的行为,但迄今主要见于监督学习设定。本文研究SSL模型的学习动力学,特别是通过最小化对比与非对比损失得到的表示。我们证明,将多元回归的动力学朴素推广到SSL会学到平凡的标量表示,即出现维度塌缩。为此,我们为SSL目标加入权重的正交约束,并推导了在Grassmann流形上用梯度下降训练的SSL模型的精确(与网络宽度无关)学习动力学。我们还论证了SSL模型的无限宽近似与监督模型的神经正切核近似存在显著差异。我们用数值实验验证了理论结果,并讨论了这些结果如何为对比与非对比SSL的进一步理论分析提供框架。

Establishing a real-time traffic alarm in the city of Valencia with Deep Learning

  • paper_url: http://arxiv.org/abs/2309.02010
  • repo_url: None
  • paper_authors: Miguel Folgado, Veronica Sanz, Johannes Hirn, Edgar Lorenzo-Saez, Javier Urchueguia
  • for: 这项研究旨在分析瓦伦西亚市(Valencia,西班牙)的交通流量与污染物水平的相关性,并开发一种预测特定街道在未来30分钟内是否会出现异常高交通流量的警报系统。
  • methods: 该研究使用了2018年的交通数据,通过Long Short-Term Memory(LSTM)神经网络进行预测,并在2019年的交通数据上进行测试。
  • results: 研究发现,交通征流对某些污染物(特别是$\text{NO}_\text{x}$)的水平有显著影响。同时,该研究开发出了一种独立的三级水平警报系统,可以预测特定街区在下一30分钟内是否会出现异常高交通流。
    Abstract Urban traffic emissions represent a significant concern due to their detrimental impacts on both public health and the environment. Consequently, decision-makers have flagged their reduction as a crucial goal. In this study, we first analyze the correlation between traffic flux and pollution in the city of Valencia, Spain. Our results demonstrate that traffic has a significant impact on the levels of certain pollutants (especially $\text{NO}_\text{x}$). Secondly, we develop an alarm system to predict if a street is likely to experience unusually high traffic in the next 30 minutes, using an independent three-tier level for each street. To make the predictions, we use traffic data updated every 10 minutes and Long Short-Term Memory (LSTM) neural networks. We trained the LSTM using traffic data from 2018, and tested it using traffic data from 2019.
    摘要 城市交通排放因其对公共健康和环境的不利影响而备受关注,决策者已将其减排列为重要目标。在这项研究中,我们首先分析了瓦伦西亚市交通流量与污染之间的相关性。结果显示,交通流量对某些污染物(尤其是$\text{NO}_\text{x}$)的水平有显著影响。其次,我们开发了一个警报系统,用于预测某条街道在未来30分钟内是否可能出现异常高的交通流量,并为每条街道建立独立的三级警报等级。为进行预测,我们使用每10分钟更新一次的交通数据和长短期记忆(LSTM)神经网络:用2018年的交通数据训练LSTM,并用2019年的交通数据进行测试。
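The three-tier, per-street alarm can be sketched as a threshold rule on the LSTM's 30-minute-ahead forecast (the cut-offs are assumptions, e.g. historical quantiles of each street's flux):

```python
import numpy as np

def traffic_alarm(pred_flux, t1, t2):
    # 0 = normal, 1 = elevated, 2 = unusually high predicted traffic,
    # with street-specific thresholds t1 < t2.
    return int(np.digitize(pred_flux, [t1, t2]))
```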

An LSTM-Based Predictive Monitoring Method for Data with Time-varying Variability

  • paper_url: http://arxiv.org/abs/2309.01978
  • repo_url: None
  • paper_authors: Jiaqi Qiu, Yu Lin, Inez Zwetsloot
  • for: 本文旨在探讨使用循环神经网络(RNN)和其变体来实现预测性监测,以检测数据中的异常现象。
  • methods: 本文提出了基于长短期记忆(LSTM)预测 интерVAL的控制图,用于监测时间变化的数据。
  • results: simulations 和实际应用表明,提出的方法在检测平均值变化时表现出色,并且在实际时系列感知器数据中得到了证明。
    Abstract The recurrent neural network and its variants have shown great success in processing sequences in recent years. However, this deep neural network has not aroused much attention in anomaly detection through predictively process monitoring. Furthermore, the traditional statistic models work on assumptions and hypothesis tests, while neural network (NN) models do not need that many assumptions. This flexibility enables NN models to work efficiently on data with time-varying variability, a common inherent aspect of data in practice. This paper explores the ability of the recurrent neural network structure to monitor processes and proposes a control chart based on long short-term memory (LSTM) prediction intervals for data with time-varying variability. The simulation studies provide empirical evidence that the proposed model outperforms other NN-based predictive monitoring methods for mean shift detection. The proposed method is also applied to time series sensor data, which confirms that the proposed method is an effective technique for detecting abnormalities.

AdaPlus: Integrating Nesterov Momentum and Precise Stepsize Adjustment on AdamW Basis

  • paper_url: http://arxiv.org/abs/2309.01966
  • repo_url: None
  • paper_authors: Lei Guan
  • for: Proposes an efficient optimizer, AdaPlus, which integrates Nesterov momentum and precise stepsize adjustment on an AdamW basis.
  • methods: AdaPlus combines the advantages of AdamW, Nadam, and AdaBelief without introducing any extra hyper-parameters.
  • results: Extensive experiments on three machine learning tasks show that AdaPlus is the best adaptive method on image classification (comparable with, even slightly better than, SGD with momentum), outperforms other state-of-the-art optimizers on language modeling, and exhibits the highest stability when training GANs.
    Abstract This paper proposes an efficient optimizer called AdaPlus which integrates Nesterov momentum and precise stepsize adjustment on AdamW basis. AdaPlus combines the advantages of AdamW, Nadam, and AdaBelief and, in particular, does not introduce any extra hyper-parameters. We perform extensive experimental evaluations on three machine learning tasks to validate the effectiveness of AdaPlus. The experiment results validate that AdaPlus (i) is the best adaptive method which performs most comparable with (even slightly better than) SGD with momentum on image classification tasks and (ii) outperforms other state-of-the-art optimizers on language modeling tasks and illustrates the highest stability when training GANs. The experiment code of AdaPlus is available at: https://github.com/guanleics/AdaPlus.
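
Since the abstract does not spell out the update rule, the sketch below only illustrates how the three named ingredients could be combined in one step: a Nadam-style Nesterov first moment, an AdaBelief-style second moment, and AdamW-style decoupled weight decay. It is our assumption, not AdaPlus itself.

```python
import numpy as np

def adaplus_like_step(p, g, m, s, t, lr=1e-3, b1=0.9, b2=0.999,
                      wd=1e-2, eps=1e-8):
    m = b1 * m + (1 - b1) * g                     # first moment
    s = b2 * s + (1 - b2) * (g - m) ** 2          # "belief" second moment
    m_hat = (b1 * m + (1 - b1) * g) / (1 - b1 ** (t + 1))  # Nesterov look-ahead
    s_hat = s / (1 - b2 ** (t + 1))
    p = p - lr * (m_hat / (np.sqrt(s_hat) + eps) + wd * p)  # decoupled decay
    return p, m, s
```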

Developing A Fair Individualized Polysocial Risk Score (iPsRS) for Identifying Increased Social Risk of Hospitalizations in Patients with Type 2 Diabetes (T2D)

  • paper_url: http://arxiv.org/abs/2309.02467
  • repo_url: None
  • paper_authors: Yu Huang, Jingchuan Guo, William T Donahoo, Zhengkang Fan, Ying Lu, Wei-Han Chen, Huilin Tang, Lori Bilello, Elizabeth A Shenkman, Jiang Bian
  • for: To develop an electronic health record (EHR)-based machine learning (ML) analytic pipeline that identifies unmet social needs associated with hospitalization risk in patients with type 2 diabetes (T2D), together with explainable AI (XAI) techniques and fairness assessment and optimization.
  • methods: EHR data (2012-2022) on 10,192 T2D patients from the University of Florida Health Integrated Data Repository, including contextual social determinants of health (SDoH, e.g., neighborhood deprivation) and individual-level SDoH (e.g., housing stability), are used to build an individualized polysocial risk score (iPsRS) that flags patients at high social risk of hospitalization.
  • results: After fairness optimization across racial-ethnic groups, the iPsRS achieved a C statistic of 0.72 for predicting 1-year hospitalization; the actual 1-year hospitalization rate in the top 5% of iPsRS was about 13 times that of the bottom decile.
    Abstract Background: Racial and ethnic minority groups and individuals facing social disadvantages, which often stem from their social determinants of health (SDoH), bear a disproportionate burden of type 2 diabetes (T2D) and its complications. It is therefore crucial to implement effective social risk management strategies at the point of care. Objective: To develop an EHR-based machine learning (ML) analytical pipeline to identify the unmet social needs associated with hospitalization risk in patients with T2D. Methods: We identified 10,192 T2D patients from the EHR data (from 2012 to 2022) from the University of Florida Health Integrated Data Repository, including contextual SDoH (e.g., neighborhood deprivation) and individual-level SDoH (e.g., housing stability). We developed an electronic health records (EHR)-based machine learning (ML) analytic pipeline, namely individualized polysocial risk score (iPsRS), to identify high social risk associated with hospitalizations in T2D patients, along with explainable AI (XAI) techniques and fairness assessment and optimization. Results: Our iPsRS achieved a C statistic of 0.72 in predicting 1-year hospitalization after fairness optimization across racial-ethnic groups. The iPsRS showed excellent utility for capturing individuals at high hospitalization risk; the actual 1-year hospitalization rate in the top 5% of iPsRS was ~13 times as high as the bottom decile. Conclusion: Our ML pipeline iPsRS can fairly and accurately screen for patients who have increased social risk leading to hospitalization in T2D patients.
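
As a toy illustration of the fairness assessment step (our sketch; the paper's actual metrics and optimization procedure are not specified here), one can compare how often each group is flagged by the top-5% risk cutoff:

```python
import numpy as np

def top_risk_rate_gap(scores, groups, top_pct=5):
    # Fraction of each group flagged by the top-5% risk cutoff, plus the
    # largest between-group gap; group labels are illustrative.
    thr = np.percentile(scores, 100 - top_pct)
    flagged = scores >= thr
    rates = {g: flagged[groups == g].mean() for g in np.unique(groups)}
    return rates, max(rates.values()) - min(rates.values())
```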

RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking

  • paper_url: http://arxiv.org/abs/2309.01918
  • repo_url: None
  • paper_authors: Homanga Bharadhwaj, Jay Vakil, Mohit Sharma, Abhinav Gupta, Shubham Tulsiani, Vikash Kumar
  • for: To build a system for training universal agents with multi-task manipulation skills that can rapidly multiply existing datasets and extract performant policies within a reasonable data budget.
  • methods: The system uses semantic augmentations to multiply existing datasets and action representations (action chunking) to extract performant policies from small yet diverse multi-modal datasets without overfitting, together with reliable task conditioning and an expressive policy architecture.
  • results: With merely 7,500 demonstrations, a single agent learns 12 unique skills and generalizes over 38 tasks across diverse kitchen scenes; in unseen situations, RoboAgent outperforms prior methods by over 40% while being more sample-efficient and amenable to fine-tuning. Videos: https://robopen.github.io/
    Abstract The grand aim of having a single robot that can manipulate arbitrary objects in diverse settings is at odds with the paucity of robotics datasets. Acquiring and growing such datasets is strenuous due to manual efforts, operational costs, and safety challenges. A path toward such an universal agent would require a structured framework capable of wide generalization but trained within a reasonable data budget. In this paper, we develop an efficient system (RoboAgent) for training universal agents capable of multi-task manipulation skills using (a) semantic augmentations that can rapidly multiply existing datasets and (b) action representations that can extract performant policies with small yet diverse multi-modal datasets without overfitting. In addition, reliable task conditioning and an expressive policy architecture enable our agent to exhibit a diverse repertoire of skills in novel situations specified using language commands. Using merely 7500 demonstrations, we are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks spread across common daily activities in diverse kitchen scenes. On average, RoboAgent outperforms prior methods by over 40% in unseen situations while being more sample efficient and being amenable to capability improvements and extensions through fine-tuning. Videos at https://robopen.github.io/

A Survey on Physics Informed Reinforcement Learning: Review and Open Problems

  • paper_url: http://arxiv.org/abs/2309.01909
  • repo_url: None
  • paper_authors: Chayan Banerjee, Kien Nguyen, Clinton Fookes, Maziar Raissi
  • for: To advance physics-informed reinforcement learning (PIRL), i.e., incorporating physical information into machine learning frameworks to improve physical plausibility and data efficiency.
  • methods: The article systematically reviews and classifies existing PIRL approaches, introducing a taxonomy built on the reinforcement learning pipeline to compare and contrast them and to characterize their physics representations and incorporation biases (observational, inductive, and learning).
  • results: The taxonomy provides a cohesive perspective on how the physics-informed capability has been implemented, identifies the areas where the approach has been applied, and highlights gaps, open challenges, and directions for future research.
    Abstract The inclusion of physical information in machine learning frameworks has revolutionized many application areas. This involves enhancing the learning process by incorporating physical constraints and adhering to physical laws. In this work we explore their utility for reinforcement learning applications. We present a thorough review of the literature on incorporating physics information, as known as physics priors, in reinforcement learning approaches, commonly referred to as physics-informed reinforcement learning (PIRL). We introduce a novel taxonomy with the reinforcement learning pipeline as the backbone to classify existing works, compare and contrast them, and derive crucial insights. Existing works are analyzed with regard to the representation/ form of the governing physics modeled for integration, their specific contribution to the typical reinforcement learning architecture, and their connection to the underlying reinforcement learning pipeline stages. We also identify core learning architectures and physics incorporation biases (i.e., observational, inductive and learning) of existing PIRL approaches and use them to further categorize the works for better understanding and adaptation. By providing a comprehensive perspective on the implementation of the physics-informed capability, the taxonomy presents a cohesive approach to PIRL. It identifies the areas where this approach has been applied, as well as the gaps and opportunities that exist. Additionally, the taxonomy sheds light on unresolved issues and challenges, which can guide future research. This nascent field holds great potential for enhancing reinforcement learning algorithms by increasing their physical plausibility, precision, data efficiency, and applicability in real-world scenarios.

Extended Symmetry Preserving Attention Networks for LHC Analysis

  • paper_url: http://arxiv.org/abs/2309.01886
  • repo_url: None
  • paper_authors: Michael James Fenton, Alexander Shmakov, Hideki Okawa, Yuji Li, Ko-Yang Hsiao, Shih-Chieh Hsu, Daniel Whiteson, Pierre Baldi
  • for: Applications include the search for ttH production, measurement of the top quark mass, and the search for a heavy Z' decaying to top quark pairs.
  • methods: The symmetry-preserving attention network (SPANet) architecture is extended to consider multiple input streams, such as leptons, as well as global event features such as the missing transverse momentum, with additional regression and classification outputs to supplement the parton assignment.
  • results: Significant improvements are found for semi-leptonic top-quark-pair decays, with detailed results for the three representative studies and ablation studies showing what the network has learned in each case.
    Abstract Reconstructing unstable heavy particles requires sophisticated techniques to sift through the large number of possible permutations for assignment of detector objects to partons. An approach based on a generalized attention mechanism, symmetry preserving attention networks (SPANet), has been previously applied to top quark pair decays at the Large Hadron Collider, which produce six hadronic jets. Here we extend the SPANet architecture to consider multiple input streams, such as leptons, as well as global event features, such as the missing transverse momentum. In addition, we provide regression and classification outputs to supplement the parton assignment. We explore the performance of the extended capability of SPANet in the context of semi-leptonic decays of top quark pairs as well as top quark pairs produced in association with a Higgs boson. We find significant improvements in the power of three representative studies: search for ttH, measurement of the top quark mass and a search for a heavy Z' decaying to top quark pairs. We present ablation studies to provide insight on what the network has learned in each case.

Task Generalization with Stability Guarantees via Elastic Dynamical System Motion Policies

  • paper_url: http://arxiv.org/abs/2309.01884
  • repo_url: None
  • paper_authors: Tianyu Li, Nadia Figueroa
  • for: To propose a dynamical system (DS) based Learning from Demonstration (LfD) approach that learns stable, convergent reactive motion policies from a few trajectories and generalizes to new task instances.
  • methods: Task parameters are embedded into a Gaussian Mixture Model (GMM) based Linear Parameter Varying (LPV) DS formulation; given a new task instance/context, the Elastic-GMM is transformed with Laplacian editing and used to re-estimate the LPV-DS policy.
  • results: Elastic-DS shows strong flexibility and generalization on a range of simulated and real-robot experiments while preserving desirable control-theoretic guarantees. Supplementary videos: https://sites.google.com/view/elastic-ds
    Abstract Dynamical System (DS) based Learning from Demonstration (LfD) allows learning of reactive motion policies with stability and convergence guarantees from a few trajectories. Yet, current DS learning techniques lack the flexibility to generalize to new task instances as they ignore explicit task parameters that inherently change the underlying trajectories. In this work, we propose Elastic-DS, a novel DS learning, and generalization approach that embeds task parameters into the Gaussian Mixture Model (GMM) based Linear Parameter Varying (LPV) DS formulation. Central to our approach is the Elastic-GMM, a GMM constrained to SE(3) task-relevant frames. Given a new task instance/context, the Elastic-GMM is transformed with Laplacian Editing and used to re-estimate the LPV-DS policy. Elastic-DS is compositional in nature and can be used to construct flexible multi-step tasks. We showcase its strength on a myriad of simulated and real-robot experiments while preserving desirable control-theoretic guarantees. Supplementary videos can be found at https://sites.google.com/view/elastic-ds
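
For reference, a GMM-based LPV-DS evaluates the velocity as a state-dependent mixture of linear systems, $\dot{x} = \sum_k \gamma_k(x)(A_k x + b_k)$; the sketch below assumes a fitted GMM supplies the responsibilities $\gamma_k(x)$.

```python
import numpy as np

def lpv_ds_velocity(x, As, bs, responsibilities):
    # xdot = sum_k gamma_k(x) * (A_k @ x + b_k), where gamma_k(x) are the
    # GMM posterior responsibilities at x (non-negative, summing to 1).
    gammas = responsibilities(x)
    return sum(g * (A @ x + b) for g, A, b in zip(gammas, As, bs))
```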

eess.IV - 2023-09-05

An Improved Upper Bound on the Rate-Distortion Function of Images

  • paper_url: http://arxiv.org/abs/2309.02574
  • repo_url: https://github.com/duanzhiihao/lossy-vae
  • paper_authors: Zhihao Duan, Jack Ma, Jiangpeng He, Fengqing Zhu
  • for: To derive an improved upper bound on the information rate-distortion (R-D) function of images, i.e., the fundamental limit of lossy image compression, using Variational Autoencoders (VAEs).
  • methods: The paper introduces a new VAE model architecture, applies variable-rate compression techniques, and proposes a novel function to stabilize training.
  • results: At least a 30% BD-rate reduction w.r.t. the intra prediction mode of the VVC codec is achievable, suggesting there is still great potential for improving lossy image compression.
    Abstract Recent work has shown that Variational Autoencoders (VAEs) can be used to upper-bound the information rate-distortion (R-D) function of images, i.e., the fundamental limit of lossy image compression. In this paper, we report an improved upper bound on the R-D function of images implemented by (1) introducing a new VAE model architecture, (2) applying variable-rate compression techniques, and (3) proposing a novel \ourfunction{} to stabilize training. We demonstrate that at least 30\% BD-rate reduction w.r.t. the intra prediction mode in VVC codec is achievable, suggesting that there is still great potential for improving lossy image compression. Code is made publicly available at https://github.com/duanzhiihao/lossy-vae.
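
For intuition, a trained VAE yields one point on an upper bound of the R-D curve: the rate is the KL divergence from the posterior to the prior and the distortion is the reconstruction error. A minimal sketch, assuming a diagonal-Gaussian posterior and a standard-normal prior:

```python
import numpy as np

def rd_point(mu, logvar, x, x_hat):
    # Rate: KL( N(mu, diag(exp(logvar))) || N(0, I) ), in bits per pixel.
    kl_nats = 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar)
    rate_bpp = kl_nats / (x.size * np.log(2.0))
    distortion = np.mean((x - x_hat) ** 2)        # MSE distortion
    return rate_bpp, distortion
```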

Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2309.02529
  • repo_url: None
  • paper_authors: Haisheng Fu, Feng Liang, Jie Liang, Yongqiang Wang, Guohe Zhang, Jingning Han
  • for: To achieve a better trade-off between complexity and performance in learned image compression.
  • methods: Four techniques are introduced: a deformable convolutional module, a parallel checkerboard context model, an improved three-step knowledge distillation and training scheme, and $L_1$ regularization of the latent representation.
  • results: Compared with the state-of-the-art learned image coding scheme, the method is about 20x faster in encoding and 70-90x faster in decoding while improving R-D performance by 2.3%; it also outperforms H.266/VVC-intra (4:4:4) and several leading learned schemes in PSNR and MS-SSIM on the Kodak and Tecnick-40 datasets.
    Abstract Deep learning-based image compression has made great progresses recently. However, many leading schemes use serial context-adaptive entropy model to improve the rate-distortion (R-D) performance, which is very slow. In addition, the complexities of the encoding and decoding networks are quite high and not suitable for many practical applications. In this paper, we introduce four techniques to balance the trade-off between the complexity and performance. We are the first to introduce deformable convolutional module in compression framework, which can remove more redundancies in the input image, thereby enhancing compression performance. Second, we design a checkerboard context model with two separate distribution parameter estimation networks and different probability models, which enables parallel decoding without sacrificing the performance compared to the sequential context-adaptive model. Third, we develop an improved three-step knowledge distillation and training scheme to achieve different trade-offs between the complexity and the performance of the decoder network, which transfers both the final and intermediate results of the teacher network to the student network to help its training. Fourth, we introduce $L_{1}$ regularization to make the numerical values of the latent representation more sparse. Then we only encode non-zero channels in the encoding and decoding process, which can greatly reduce the encoding and decoding time. Experiments show that compared to the state-of-the-art learned image coding scheme, our method can be about 20 times faster in encoding and 70-90 times faster in decoding, and our R-D performance is also $2.3 \%$ higher. Our method outperforms the traditional approach in H.266/VVC-intra (4:4:4) and some leading learned schemes in terms of PSNR and MS-SSIM metrics when testing on Kodak and Tecnick-40 datasets.
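
The checkerboard idea itself is easy to picture: latent positions are split into anchors and non-anchors so that decoding needs only two parallel passes instead of a serial raster scan. A minimal sketch of the masks:

```python
import numpy as np

def checkerboard_masks(h, w):
    # Anchors are decoded first (conditioned on the hyperprior only);
    # non-anchors follow in one parallel pass, conditioned on their
    # already-decoded anchor neighbours.
    anchor = (np.add.outer(np.arange(h), np.arange(w)) % 2 == 0)
    return anchor, ~anchor
```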

An automated, high-resolution phenotypic assay for adult Brugia malayi and microfilaria

  • paper_url: http://arxiv.org/abs/2309.03235
  • repo_url: None
  • paper_authors: Upender Kalwa, Yunsoo Park, Michael J. Kimber, Santosh Pandey
  • for: To test candidate anthelmintic drugs against Lymphatic filariasis, since existing drugs reduce microfilaria counts in the bloodstream but are less effective on adult parasites.
  • methods: A multi-parameter phenotypic assay tracks the motility of adult Brugia malayi and microfilariae in vitro; adult motility is characterized by centroid velocity, path curvature, angular velocity, eccentricity, extent, and Euler number, and is evaluated for three anthelmintic drugs.
  • results: The assay shows that the three anthelmintic drugs reduce adult worm motility, with differing mechanisms and effects, while complex worm postures are tracked with high fidelity.
    Abstract Brugia malayi are thread-like parasitic worms and one of the etiological agents of Lymphatic filariasis (LF). Existing anthelmintic drugs to treat LF are effective in reducing the larval microfilaria (mf) counts in human bloodstream but are less effective on adult parasites. To test potential drug candidates, we report a multi-parameter phenotypic assay based on tracking the motility of adult B. malayi and mf in vitro. For adult B. malayi, motility is characterized by the centroid velocity, path curvature, angular velocity, eccentricity, extent, and Euler Number. These parameters are evaluated in experiments with three anthelmintic drugs. For B. malayi mf, motility is extracted from the evolving body skeleton to yield positional data and bending angles at 74 key points. We achieved high-fidelity tracking of complex worm postures (self-occlusions, omega turns, body bending, and reversals) while providing a visual representation of pose estimates and behavioral attributes in both space and time scales.
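
Two of the listed motility parameters are straightforward to compute from tracked centroids; the sketch below (our illustration, with an assumed frame rate) derives centroid speed and angular velocity:

```python
import numpy as np

def motility_params(centroids, fps=30.0):
    # centroids: (T, 2) array of tracked positions over T frames.
    v = np.diff(centroids, axis=0) * fps           # centroid velocity
    speed = np.linalg.norm(v, axis=1)
    heading = np.unwrap(np.arctan2(v[:, 1], v[:, 0]))
    ang_vel = np.diff(heading) * fps               # angular velocity
    return speed.mean(), np.abs(ang_vel).mean()
```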

Duration-adaptive Video Highlight Pre-caching for Vehicular Communication Network

  • paper_url: http://arxiv.org/abs/2309.01944
  • repo_url: None
  • paper_authors: Liang Xu, Deshi Li, Kaitao Meng, Mingliu Liu, Shuya Zhu
  • for: This paper targets vehicular communication networks (VCNs) and aims to improve video highlight pre-caching.
  • methods: An efficient video highlight pre-caching scheme is proposed that adapts to the service duration edge nodes can provide, based on a highlight entropy model accounting for segment popularity and the continuity between segments.
  • results: Simulations based on real-world video datasets show that the proposed method significantly improves highlight entropy and jitter compared to benchmark schemes.
    Abstract Video traffic in vehicular communication networks (VCNs) faces exponential growth. However, different segments of most videos reveal various attractiveness for viewers, and the pre-caching decision is greatly affected by the dynamic service duration that edge nodes can provide services for mobile vehicles driving along a road. In this paper, we propose an efficient video highlight pre-caching scheme in the vehicular communication network, adapting to the service duration. Specifically, a highlight entropy model is devised with the consideration of the segments' popularity and continuity between segments within a period of time, based on which, an optimization problem of video highlight pre-caching is formulated. As this problem is non-convex and lacks a closed-form expression of the objective function, we decouple multiple variables by deriving candidate highlight segmentations of videos through wavelet transform, which can significantly reduce the complexity of highlight pre-caching. Then the problem is solved iteratively by a highlight-direction trimming algorithm, which is proven to be locally optimal. Simulation results based on real-world video datasets demonstrate significant improvement in highlight entropy and jitter compared to benchmark schemes.
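
A rough sketch of the wavelet-based candidate segmentation (our reading of the idea, not the paper's algorithm): smooth the per-segment popularity curve with a wavelet approximation, then take its local extrema as candidate highlight boundaries.

```python
import numpy as np
import pywt  # PyWavelets

def candidate_boundaries(popularity, wavelet="haar", level=3):
    coeffs = pywt.wavedec(popularity, wavelet, level=level)
    coeffs[1:] = [np.zeros_like(c) for c in coeffs[1:]]   # keep approximation
    smooth = pywt.waverec(coeffs, wavelet)[: len(popularity)]
    turning = np.diff(np.sign(np.diff(smooth))) != 0      # local extrema
    return np.where(turning)[0] + 1
```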

eess.SP - 2023-09-05

Semantic Communications Based on Adaptive Generative Models and Information Bottleneck

  • paper_url: http://arxiv.org/abs/2309.02387
  • repo_url: None
  • paper_authors: S. Barbarossa, D. Comminiello, E. Grassucci, F. Pezone, S. Sardellitti, P. Di Lorenzo
  • for: This paper proposes an approach to semantic communication built on three fundamental ideas: representing data over a topological space to capture semantics expressed through relations, using information bottleneck theory to identify relevant information, and exploiting probabilistic generative models to adapt to the wireless channel and reconstruct images or run classification tasks at the receiver.
  • methods: The information bottleneck is adapted online, as a function of the wireless channel state, to strike an optimal trade-off between transmit power, reconstruction accuracy, and delay, while generative models adapt the transmission rate to the channel state.
  • results: The approach reduces the amount of data that must be transmitted over the wireless channel while preserving high accuracy in image reconstruction and classification at the receiver.
    Abstract Semantic communications represent a significant breakthrough with respect to the current communication paradigm, as they focus on recovering the meaning behind the transmitted sequence of symbols, rather than the symbols themselves. In semantic communications, the scope of the destination is not to recover a list of symbols symbolically identical to the transmitted ones, but rather to recover a message that is semantically equivalent to the semantic message emitted by the source. This paradigm shift introduces many degrees of freedom to the encoding and decoding rules that can be exploited to make the design of communication systems much more efficient. In this paper, we present an approach to semantic communication building on three fundamental ideas: 1) represent data over a topological space as a formal way to capture semantics, as expressed through relations; 2) use the information bottleneck principle as a way to identify relevant information and adapt the information bottleneck online, as a function of the wireless channel state, in order to strike an optimal trade-off between transmit power, reconstruction accuracy and delay; 3) exploit probabilistic generative models as a general tool to adapt the transmission rate to the wireless channel state and make possible the regeneration of the transmitted images or run classification tasks at the receiver side.
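
One standard way to read the second idea (a textbook formulation, not a quote from the paper) is the information-bottleneck Lagrangian: the encoder $p(z|x)$ is chosen to solve $\min_{p(z|x)} \; I(X;Z) - \beta\, I(Z;Y)$, where $I(X;Z)$ bounds the rate spent on the channel and $I(Z;Y)$ measures how much task-relevant information survives; adapting the trade-off weight $\beta$ online with the channel state is what lets the scheme balance transmit power, accuracy, and delay.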

Sensing With Random Signals

  • paper_url: http://arxiv.org/abs/2309.02375
  • repo_url: https://github.com/jpatsenker/noisy_random_projection_sparse_signal_recon
  • paper_authors: Shihang Lu, Fan Liu, Fuwang Dong, Yifeng Xiong, Jie Xu, Ya-Feng Liu
  • for: This paper analyzes target sensing performance with random ISAC signals over a multi-antenna system.
  • methods: A new sensing performance metric, the ergodic linear minimum mean square error (ELMMSE), characterizes the estimation error averaged over the randomness of ISAC signals; a data-dependent precoding scheme attains the optimized sensing performance at high computational cost, and a lower-complexity data-independent precoding scheme is optimized with a stochastic gradient projection (SGP) algorithm that can be trained offline.
  • results: Simulations demonstrate the superiority of the proposed methods.
    Abstract Radar systems typically employ well-designed deterministic signals for target sensing. In contrast to that, integrated sensing and communications (ISAC) systems have to use random signals to convey useful information, potentially causing sensing performance degradation. This paper analyzes the sensing performance via random ISAC signals over a multi-antenna system. Towards this end, we define a new sensing performance metric, namely, ergodic linear minimum mean square error (ELMMSE), which characterizes the estimation error averaged over the randomness of ISAC signals. Then, we investigate a data-dependent precoding scheme to minimize the ELMMSE, which attains the optimized sensing performance at the price of high computational complexity. To reduce the complexity, we present an alternative data-independent precoding scheme and propose a stochastic gradient projection (SGP) algorithm for ELMMSE minimization, which can be trained offline by locally generated signal samples. Finally, we demonstrate the superiority of the proposed methods by simulations.
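
Since the ELMMSE averages the estimation error over the symbol randomness, it can be approximated by Monte Carlo; the sketch below assumes a simple linear sensing model $Y = HWS + N$ with an i.i.d. Gaussian prior on the target response $H$ (all modeling choices here are ours, not the paper's).

```python
import numpy as np

def ergodic_lmmse(W, n_rx=4, sigma2=1.0, n_sym=64, n_draws=500, seed=0):
    # Average, over random unit-power symbols S, of the LMMSE for
    # estimating H (n_rx x n_tx, i.i.d. CN(0,1)) from Y = H @ W @ S + N.
    rng = np.random.default_rng(seed)
    n_tx = W.shape[0]
    total = 0.0
    for _ in range(n_draws):
        S = (rng.standard_normal((W.shape[1], n_sym)) +
             1j * rng.standard_normal((W.shape[1], n_sym))) / np.sqrt(2)
        X = W @ S                                  # effective sensing matrix
        E = np.linalg.inv(np.eye(n_tx) + X @ X.conj().T / sigma2)
        total += n_rx * np.real(np.trace(E))       # error summed over rows
    return total / n_draws
```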

  • paper_url: http://arxiv.org/abs/2309.02264
  • repo_url: None
  • paper_authors: Shanshan Zhang, Wen Chen
  • for: To improve the fairness of uplink wireless communication systems.
  • methods: An intelligent reflecting surface (IRS) is adopted to overcome power attenuation caused by path loss, and a max-min fairness optimization problem yields the resource allocation, including receive beamforming at the base station (BS) and phase-shift beamforming at the IRS; a successive group decoding (SGD) algorithm at the receiver trades off fairness against decoding complexity.
  • results: Simulation results show that the proposed scheme improves the fairness of uplink communication.
    Abstract In this paper, we propose a rate-splitting multiple access (RSMA) scheme for uplink wireless communication systems with intelligent reflecting surface (IRS) aided. In the considered model, IRS is adopted to overcome power attenuation caused by path loss. We construct a max-min fairness optimization problem to obtain the resource allocation, including the receive beamforming at the base station (BS) and phase-shift beamforming at IRS. We also introduce a successive group decoding (SGD) algorithm at the receiver, which trades off the fairness and complexity of decoding. In the simulation, the results show that the proposed scheme has superiority in improving the fairness of uplink communication.

Design of a New CIM-DCSK-Based Ambient Backscatter Communication System

  • paper_url: http://arxiv.org/abs/2309.02259
  • repo_url: None
  • paper_authors: Ruipeng Yang, Yi Fang, Pingping Chen, Huan Ma
  • for: To improve the data rate of differential chaos shift keying (DCSK) based ambient backscatter communication (AmBC) systems, a code index modulation (CIM) based system, CIM-DCSK-AmBC, is proposed.
  • methods: The CIM-DCSK signal transmitted in the direct link serves as the radio frequency source of the backscatter link, whose signal format is designed to increase the data rate while eliminating interference from the direct-link signal, so that both links can be received and demodulated simultaneously.
  • results: Theoretical bit error rate (BER) expressions over multipath Rayleigh fading channels are derived and validated; compared with the short reference DCSK-based AmBC (SR-DCSK-AmBC) benchmark, CIM-DCSK-AmBC achieves better BER performance in the direct link and higher throughput in the backscatter link.
    Abstract To improve the data rate in differential chaos shift keying (DCSK) based ambient backscatter communication (AmBC) system, we propose a new AmBC system based on code index modulation (CIM), referred to as CIM-DCSK-AmBC system. In the proposed system, the CIM-DCSK signal transmitted in the direct link is used as the radio frequency source of the backscatter link. The signal format in the backscatter link is designed to increase the data rate as well as eliminate the interference of the direct link signal. As such, the direct link signal and the backscatter link signal can be received and demodulated simultaneously. Moreover, we derive and validate the theoretical bit error rate (BER) expressions of the CIM-DCSK-AmBC system over multipath Rayleigh fading channels. Regarding the short reference DCSK-based AmBC (SR-DCSK-AmBC) system as a benchmark system, numerical results reveal that the CIM-DCSK-AmBC system can achieve better BER performance in the direct link and higher throughput in the backscatter link than the benchmark system.

PyPVRoof: a Python package for extracting the characteristics of rooftop PV installations using remote sensing data

  • paper_url: http://arxiv.org/abs/2309.07143
  • repo_url: https://github.com/gabrielkasmi/pypvroof
  • paper_authors: Yann Tremenbert, Gabriel Kasmi, Laurent Dubus, Yves-Marie Saint-Drenan, Philippe Blanc
  • for: This paper presents {\tt PyPVRoof}, a Python package for automatically extracting the main characteristics of rooftop PV installations: tilt angle, azimuth, surface, localization, and installed capacity.
  • methods: {\tt PyPVRoof} is designed to cover all use cases regarding data availability and user needs, and is based on a benchmark of the best existing methods; data for replicating the accuracy benchmarks are provided.
  • results: The package efficiently extracts the essential characteristics of rooftop PV systems across varying data availability and user requirements, with code and benchmark data publicly available.
    Abstract Photovoltaic (PV) energy grows at an unprecedented pace, which makes it difficult to maintain up-to-date and accurate PV registries, which are critical for many applications such as PV power generation estimation. This lack of qualitative data is especially true in the case of rooftop PV installations. As a result, extensive efforts are put into the constitution of PV inventories. However, although valuable, these registries cannot be directly used for monitoring the deployment of PV or estimating the PV power generation, as these tasks usually require PV systems {\it characteristics}. To seamlessly extract these characteristics from the global inventories, we introduce {\tt PyPVRoof}. {\tt PyPVRoof} is a Python package to extract essential PV installation characteristics. These characteristics are tilt angle, azimuth, surface, localization, and installed capacity. {\tt PyPVRoof} is designed to cover all use cases regarding data availability and user needs and is based on a benchmark of the best existing methods. Data for replicating our accuracy benchmarks are available on our Zenodo repository \cite{tremenbert2023pypvroof}, and the package code is accessible at this URL: \url{https://github.com/gabrielkasmi/pypvroof}.
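
To make the extracted characteristics concrete, the generic sketch below (not {\tt PyPVRoof}'s API, which we do not reproduce here) estimates tilt and azimuth by fitting a plane to 3D points sampled on a PV array; the axis convention (y = north, x = east) is an assumption.

```python
import numpy as np

def tilt_azimuth_from_points(points):
    # points: (N, 3) array sampled on the module plane.
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    n = vt[-1]                                    # unit plane normal
    n = n if n[2] >= 0 else -n                    # make it upward-facing
    tilt = np.degrees(np.arccos(n[2]))            # 0 deg = horizontal
    azimuth = np.degrees(np.arctan2(n[0], n[1])) % 360  # 0 deg = north
    return tilt, azimuth
```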

A Wideband MIMO Channel Model for Aerial Intelligent Reflecting Surface-Assisted Wireless Communications

  • paper_url: http://arxiv.org/abs/2309.02171
  • repo_url: None
  • paper_authors: Shaoyi Liu, Nan Ma, Yaning Chen, Ke Peng, Dongsheng Xue
  • for: To propose a three-dimensional (3D) wideband channel model for aerial intelligent reflecting surface (AIRS) and IRS joint-assisted multiple-input multiple-output (MIMO) communication systems.
  • methods: The model accounts for the rotational degrees of freedom in three directions and the motion angles of the AIRS in space; based on it, the channel impulse response (CIR), correlation function, and channel capacity are derived, and several feasible joint phase-shift schemes for the AIRS and IRS units are proposed.
  • results: Simulations show that the model captures the channel characteristics accurately and that the proposed phase-shift methods effectively improve the channel statistical characteristics and system capacity; in certain scenarios, paths involving the IRS exhibit characteristics similar to the line-of-sight (LoS) paths, offering guidance for future intelligent communication systems.
    Abstract Compared to traditional intelligent reflecting surfaces(IRS), aerial IRS (AIRS) has unique advantages, such as more flexible deployment and wider service coverage. However, modeling AIRS in the channel presents new challenges due to their mobility. In this paper, a three-dimensional (3D) wideband channel model for AIRS and IRS joint-assisted multiple-input multiple-output (MIMO) communication system is proposed, where considering the rotational degrees of freedom in three directions and the motion angles of AIRS in space. Based on the proposed model, the channel impulse response (CIR), correlation function, and channel capacity are derived, and several feasible joint phase shifts schemes for AIRS and IRS units are proposed. Simulation results show that the proposed model can capture the channel characteristics accurately, and the proposed phase shifts methods can effectively improve the channel statistical characteristics and increase the system capacity. Additionally, we observe that in certain scenarios, the paths involving the IRS and the line-of-sight (LoS) paths exhibit similar characteristics. These findings provide valuable insights for the future development of intelligent communication systems.

The Impact of SAR-ADC Mismatch on Quantized Massive MU-MIMO Systems

  • paper_url: http://arxiv.org/abs/2309.02168
  • repo_url: None
  • paper_authors: Jérémy Guichemerre, Christoph Studer
  • for: This paper studies low-resolution analog-to-digital converters (ADCs) in massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems.
  • methods: Bussgang's decomposition is used to model capacitor-array mismatches in successive approximation register (SAR) ADCs, and their impact on the performance of a single ADC is analyzed.
  • results: Simulations of a massive MU-MIMO system show that capacitor mismatches should not be ignored, even in basestations that use low-resolution SAR ADCs.
    Abstract Low-resolution analog-to-digital converters (ADCs) in massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems can significantly reduce the power, cost, and interconnect data rates of infrastructure basestations. Thus, recent research on the theory and algorithm sides has extensively focused on such architectures, but with idealistic quantization models. However, real-world ADCs do not behave like ideal quantizers, and are affected by fabrication mismatches. We analyze the impact of capacitor-array mismatches in successive approximation register (SAR) ADCs, which are widely used in wireless systems. We use Bussgang's decomposition to model the effects of such mismatches, and we analyze their impact on the performance of a single ADC. We then simulate a massive MU-MIMO system to demonstrate that capacitor mismatches should not be ignored, even in basestations that use low-resolution SAR ADCs.
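
Bussgang's decomposition writes the output of a nonlinearity (here, a mismatched quantizer) as a scaled copy of the input plus a distortion term uncorrelated with it; an empirical version for zero-mean real signals is a two-liner:

```python
import numpy as np

def bussgang_decompose(x, y):
    # y = B * x + d, with d uncorrelated with x (zero-mean signals).
    B = np.mean(x * y) / np.mean(x * x)   # Bussgang gain from correlations
    d = y - B * x                         # distortion term
    return B, d
```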

Wiometrics: Comparative Performance of Artificial Neural Networks for Wireless Navigation

  • paper_url: http://arxiv.org/abs/2309.02121
  • repo_url: None
  • paper_authors: Russ Whiton, Junshi Chen, Fredrik Tufvesson
  • for: This work studies radio signals from current and future terrestrial wireless communication systems as dual-use navigation aids.
  • methods: Downlink signals from commercial cellular networks are observed opportunistically with a software-defined radio and a massive antenna array mounted on a passenger vehicle in an urban non-line-of-sight scenario, and artificial neural networks estimate vehicle location and heading from different input representations ("wiometrics") across several network architectures.
  • results: The best-performing combinations of networks and wiometrics achieve position accuracy on the order of a few meters and heading accuracy of a few degrees.
    Abstract Radio signals are used broadly as navigation aids, and current and future terrestrial wireless communication systems have properties that make their dual-use for this purpose attractive. Sub-6 GHz carrier frequencies enable widespread coverage for data communication and navigation, but typically offer smaller bandwidths and limited resolution for precise estimation of geometries, particularly in environments where propagation channels are diffuse in time and/or space. Non-parametric methods have been employed with some success for such scenarios both commercially and in literature, but often with an emphasis on low-cost hardware and simple models of propagation, or with simulations that do not fully capture hardware impairments and complex propagation mechanisms. In this article, we make opportunistic observations of downlink signals transmitted by commercial cellular networks by using a software-defined radio and massive antenna array mounted on a passenger vehicle in an urban non line-of-sight scenario, together with a ground truth reference for vehicle pose. With these observations as inputs, we employ artificial neural networks to generate estimates of vehicle location and heading for various artificial neural network architectures and different representations of the input observation data, which we call wiometrics, and compare the performance for navigation. Position accuracy on the order of a few meters, and heading accuracy of a few degrees, are achieved for the best-performing combinations of networks and wiometrics. Based on the results of the experiments we draw conclusions regarding possible future directions for wireless navigation using statistical methods.

Bayesian Phase Search for Probabilistic Amplitude Shaping

  • paper_url: http://arxiv.org/abs/2309.02003
  • repo_url: None
  • paper_authors: Mohammad Taha Askari, Lutz Lampe
  • for: To propose a carrier phase recovery (CPR) algorithm that is robust in low signal-to-noise ratio scenarios.
  • methods: A Bayesian CPR algorithm is introduced and applied to phase recovery for probabilistic amplitude shaping (PAS).
  • results: Results validate that the new algorithm overcomes the degradation experienced by blind phase-search CPR for PAS.
    Abstract We introduce a Bayesian carrier phase recovery (CPR) algorithm which is robust against low signal-to-noise ratio scenarios. It is therefore effective for phase recovery for probabilistic amplitude shaping (PAS). Results validate that the new algorithm overcomes the degradation experienced by blind phase-search CPR for PAS.
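
For context, the blind phase-search baseline the abstract refers to tests a grid of candidate phases and picks, per symbol, the one minimizing a windowed decision distance; a compact sketch (parameters are illustrative):

```python
import numpy as np

def blind_phase_search(rx, constellation, n_test=32, window=64):
    phis = np.arange(n_test) / n_test * (np.pi / 2)   # pi/2 QAM ambiguity
    dists = np.empty((n_test, rx.size))
    for i, phi in enumerate(phis):
        rot = rx * np.exp(-1j * phi)
        dists[i] = np.min(np.abs(rot[:, None] - constellation) ** 2, axis=1)
    kern = np.ones(window) / window                   # sliding-window average
    smoothed = np.array([np.convolve(d, kern, mode="same") for d in dists])
    return phis[np.argmin(smoothed, axis=0)]          # phase estimate per symbol
```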

cs.SD - 2023-09-04

Single-Channel Speech Enhancement with Deep Complex U-Networks and Probabilistic Latent Space Models

  • paper_url: http://arxiv.org/abs/2309.01535
  • repo_url: None
  • paper_authors: Eike J. Nustede, Jörn Anemüller
  • for: To improve single-channel speech enhancement performance.
  • methods: A probabilistic (i.e., variational) latent space model is integrated into a deep complex U-Network architecture, with ablations studying the effects of the variational latent space, complex-valued processing, and self-attention.
  • results: On the MS-DNS 2020 and Voicebank+Demand datasets the model achieves an SI-SDR of up to 20.2 dB, about 0.5-1.4 dB above its ablated version without the probabilistic latent space, 2-2.4 dB above WaveUNet, and 6.7 dB above PHASEN.
    Abstract In this paper, we propose to extend the deep, complex U-Network architecture for speech enhancement by incorporating a probabilistic (i.e., variational) latent space model. The proposed model is evaluated against several ablated versions of itself in order to study the effects of the variational latent space model, complex-value processing, and self-attention. Evaluation on the MS-DNS 2020 and Voicebank+Demand datasets yields consistently high performance. E.g., the proposed model achieves an SI-SDR of up to 20.2 dB, about 0.5 to 1.4 dB higher than its ablated version without probabilistic latent space, 2-2.4 dB higher than WaveUNet, and 6.7 dB above PHASEN. Compared to real-valued magnitude spectrogram processing with a variational U-Net, the complex U-Net achieves an improvement of up to 4.5 dB SI-SDR. Complex spectrum encoding as magnitude and phase yields best performance in anechoic conditions whereas real and imaginary part representation results in better generalization to (novel) reverberation conditions, possibly due to the underlying physics of sound.
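
The two complex-spectrum input encodings the paper compares are simple to state; a sketch (shapes and naming are ours):

```python
import numpy as np

def complex_spectrum_features(stft, mode="mag_phase"):
    # stft: complex (freq, time) array. "mag_phase" worked best in
    # anechoic conditions; real+imaginary parts generalized better
    # to (novel) reverberation conditions.
    if mode == "mag_phase":
        return np.stack([np.abs(stft), np.angle(stft)])
    return np.stack([stft.real, stft.imag])
```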

Quid Manumit – Freeing the Qubit for Art

  • paper_url: http://arxiv.org/abs/2309.03104
  • repo_url: None
  • paper_authors: Mark Carney
  • for: This paper describes how to "free the qubit" for art by creating standalone quantum musical effects and instruments with embedded quantum simulators.
  • methods: Previously released quantum simulator code for the ARM-based Raspberry Pi Pico microcontroller is used to build a Quantum MIDI processor, which generates accompaniment notes and quantum-generated instruments from input notes passed through a quantum circuit, and a Quantum Distortion module that alters an instrument's raw sound according to a quantum circuit.
  • results: Two builds are provided as open source: a self-contained Quantum Stylophone and an effect-module plugin called 'QubitCrusher' for the Korg Nu:Tekt NTS-1; to the author's knowledge this is the first embedded Quantum Simulator for Instruments of Music (QSIM). Future work and directions for quantum instruments are also discussed.
    Abstract This paper describes how to `Free the Qubit' for art, by creating standalone quantum musical effects and instruments. Previously released quantum simulator code for an ARM-based Raspberry Pi Pico embedded microcontroller is utilised here, and several examples are built demonstrating different methods of utilising embedded resources: The first is a Quantum MIDI processor that generates additional notes for accompaniment and unique quantum generated instruments based on the input notes, decoded and passed through a quantum circuit in an embedded simulator. The second is a Quantum Distortion module that changes an instrument's raw sound according to a quantum circuit, which is presented in two forms; a self-contained Quantum Stylophone, and an effect module plugin called 'QubitCrusher' for the Korg Nu:Tekt NTS-1. This paper also discusses future work and directions for quantum instruments, and provides all examples as open source. This is, to the author's knowledge, the first example of embedded Quantum Simulators for Instruments of Music (another QSIM).