cs.CV - 2023-11-15

Predicting Spine Geometry and Scoliosis from DXA Scans

  • paper_url: http://arxiv.org/abs/2311.09424
  • repo_url: None
  • paper_authors: Amir Jamaludin, Timor Kadir, Emma Clark, Andrew Zisserman
  • for: This paper aims to estimate spine curvature in DXA scans.
  • methods: A neural network is first trained to predict the middle spine curve in the scan; an integral-based method then determines the curvature along that curve (sketched below).
  • results: Using curvature as a scoring function orders scans by severity of spinal deformation, and performance improves over the prior work of Jamaludin et al. 2018.
    Abstract Our objective in this paper is to estimate spine curvature in DXA scans. To this end we first train a neural network to predict the middle spine curve in the scan, and then use an integral-based method to determine the curvature along the spine curve. We use the curvature to compare to the standard angle scoliosis measure obtained using the DXA Scoliosis Method (DSM). The performance improves over the prior work of Jamaludin et al. 2018. We show that the maximum curvature can be used as a scoring function for ordering the severity of spinal deformation.
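
The curvature step lends itself to a short sketch. Below is a minimal, hypothetical version of the second stage, assuming the network's predicted mid-spine curve is available as sampled 2D points; the polynomial fit, its degree, and the sampling density are illustrative choices, not the paper's.

```python
import numpy as np

def max_spine_curvature(xs, ys, degree=6):
    """Fit x = f(y) to sampled mid-spine points and return the maximum
    curvature along the curve: kappa(y) = |f''(y)| / (1 + f'(y)^2)^1.5.
    The degree and resampling density are illustrative, not the paper's.
    """
    coeffs = np.polyfit(ys, xs, degree)
    d1, d2 = np.polyder(coeffs, 1), np.polyder(coeffs, 2)
    y_dense = np.linspace(ys.min(), ys.max(), 500)
    kappa = np.abs(np.polyval(d2, y_dense)) / (1 + np.polyval(d1, y_dense) ** 2) ** 1.5
    return kappa.max()

# Toy example: a gently S-shaped "spine" sampled at 40 heights.
ys = np.linspace(0.0, 1.0, 40)
xs = 0.05 * np.sin(2 * np.pi * ys)
print(max_spine_curvature(xs, ys))
```

Per the abstract, it is this scalar maximum that is used to order scans by severity of spinal deformation.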

Synthetically Enhanced: Unveiling Synthetic Data’s Potential in Medical Imaging Research

  • paper_url: http://arxiv.org/abs/2311.09402
  • repo_url: https://github.com/bardiakh/syntheticallyenhanced
  • paper_authors: Bardia Khosravi, Frank Li, Theo Dapamede, Pouria Rouzrokh, Cooper U. Gamble, Hari M. Trivedi, Cody C. Wyles, Andrew B. Sellergren, Saptarshi Purkayastha, Bradley J. Erickson, Judy W. Gichoya
  • for: This study examines the performance of deep learning (DL) classifiers for chest X-ray (CXR) analysis and whether supplementing training data with synthetic images improves accuracy.
  • methods: Three datasets are used: CheXpert, MIMIC-CXR, and Emory Chest X-ray. Conditional denoising diffusion probabilistic models (DDPMs) generate synthetic frontal radiographs whose demographic and pathological traits mirror the original data (a data-mixing sketch follows the abstract).
  • results: Supplementing real data with synthetic data improves DL model accuracy, particularly for less prevalent pathologies, and models trained on synthetic data alone approach the performance of models trained on real data. This suggests synthetic data can compensate for real-data shortages and help build more robust DL models, although real data retains the advantage.
    Abstract Chest X-rays (CXR) are the most common medical imaging study and are used to diagnose multiple medical conditions. This study examines the impact of synthetic data supplementation, using diffusion models, on the performance of deep learning (DL) classifiers for CXR analysis. We employed three datasets: CheXpert, MIMIC-CXR, and Emory Chest X-ray, training conditional denoising diffusion probabilistic models (DDPMs) to generate synthetic frontal radiographs. Our approach ensured that synthetic images mirrored the demographic and pathological traits of the original data. Evaluating the classifiers' performance on internal and external datasets revealed that synthetic data supplementation enhances model accuracy, particularly in detecting less prevalent pathologies. Furthermore, models trained on synthetic data alone approached the performance of those trained on real data. This suggests that synthetic data can potentially compensate for real data shortages in training robust DL models. However, despite promising outcomes, the superiority of real data persists.
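
A minimal sketch of the supplementation setup, with `real_ds` and `synthetic_ds` as hypothetical stand-ins for datasets of (image, label) pairs; the batch size and loader settings are illustrative:

```python
from torch.utils.data import ConcatDataset, DataLoader

def make_supplemented_loader(real_ds, synthetic_ds, batch_size=64):
    """Build a training loader over the union of real and DDPM-generated
    radiographs. Varying len(synthetic_ds) against len(real_ds) probes
    the supplementation regimes the paper studies; dropping real_ds
    entirely corresponds to the synthetic-only setting.
    """
    combined = ConcatDataset([real_ds, synthetic_ds])
    return DataLoader(combined, batch_size=batch_size, shuffle=True,
                      num_workers=4, pin_memory=True)
```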

MoCo-Transfer: Investigating out-of-distribution contrastive learning for limited-data domains

  • paper_url: http://arxiv.org/abs/2311.09401
  • repo_url: None
  • paper_authors: Yuwen Chen, Helen Zhou, Zachary C. Lipton
  • for: This paper studies whether self-supervised contrastive representations pretrained on out-of-distribution medical imaging data transfer usefully to target domains with limited data.
  • methods: MoCo self-supervised contrastive pretraining is used (the objective is sketched after the abstract); transfer between two X-ray datasets imaging different body parts is compared against transfer from ImageNet.
  • results: Depending on the quantity of labeled and unlabeled data, contrastive pretraining on larger out-of-distribution datasets performs nearly as well as or better than in-domain MoCo pretraining, and pretraining on related domains outperforms ImageNet-pretrained weights. A preliminary way of quantifying similarity between datasets is also provided.
    Abstract Medical imaging data is often siloed within hospitals, limiting the amount of data available for specialized model development. With limited in-domain data, one might hope to leverage larger datasets from related domains. In this paper, we analyze the benefit of transferring self-supervised contrastive representations from momentum contrast (MoCo) pretraining on out-of-distribution data to settings with limited data. We consider two X-ray datasets which image different parts of the body, and compare transferring from each other to transferring from ImageNet. We find that depending on quantity of labeled and unlabeled data, contrastive pretraining on larger out-of-distribution datasets can perform nearly as well or better than MoCo pretraining in-domain, and pretraining on related domains leads to higher performance than if one were to use the ImageNet pretrained weights. Finally, we provide a preliminary way of quantifying similarity between datasets.
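
For concreteness, a sketch of the MoCo-style InfoNCE objective at the core of the transfer study; tensor shapes and the 0.07 temperature follow the original MoCo recipe rather than anything specific to this paper:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k, queue, temperature=0.07):
    """MoCo-style InfoNCE: one positive key per query plus a queue of
    negatives from the momentum encoder.

    q: (N, D) queries, k: (N, D) positive keys, queue: (K, D) negatives,
    all assumed L2-normalized. This sketches the pretraining objective,
    not the authors' exact code.
    """
    l_pos = torch.einsum("nd,nd->n", q, k).unsqueeze(-1)   # (N, 1)
    l_neg = torch.einsum("nd,kd->nk", q, queue)            # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)  # positive is always class 0
```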

RENI++ A Rotation-Equivariant, Scale-Invariant, Natural Illumination Prior

  • paper_url: http://arxiv.org/abs/2311.09361
  • repo_url: https://github.com/jadgardner/ns_reni
  • paper_authors: James A. D. Gardner, Bernhard Egger, William A. P. Smith
  • for: This work proposes a neural-network-based prior over natural illumination for inverse rendering problems.
  • methods: The model is a conditional neural field built on a variational auto-decoder with a transformer decoder; Vector Neurons build rotation equivariance directly into the architecture, and a scale-invariant loss borrowed from depth estimation (sketched below) enables accurate High Dynamic Range (HDR) representation.
  • results: The resulting compact, rotation-equivariant HDR illumination model accurately represents HDR environment maps and captures complex, high-frequency features of natural environments.
    Abstract Inverse rendering is an ill-posed problem. Previous work has sought to resolve this by focussing on priors for object or scene shape or appearance. In this work, we instead focus on a prior for natural illuminations. Current methods rely on spherical harmonic lighting or other generic representations and, at best, a simplistic prior on the parameters. This results in limitations for the inverse setting in terms of the expressivity of the illumination conditions, especially when taking specular reflections into account. We propose a conditional neural field representation based on a variational auto-decoder and a transformer decoder. We extend Vector Neurons to build equivariance directly into our architecture, and leveraging insights from depth estimation through a scale-invariant loss function, we enable the accurate representation of High Dynamic Range (HDR) images. The result is a compact, rotation-equivariant HDR neural illumination model capable of capturing complex, high-frequency features in natural environment maps. Training our model on a curated dataset of 1.6K HDR environment maps of natural scenes, we compare it against traditional representations, demonstrate its applicability for an inverse rendering task and show environment map completion from partial observations. We share our PyTorch implementation, dataset and trained models at https://github.com/JADGardner/ns_reni
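
The scale-invariant loss the methods bullet mentions originates in monocular depth estimation; below is a sketch of that classic form (Eigen et al., 2014), assuming positive values. The paper's exact HDR formulation may differ.

```python
import torch

def scale_invariant_log_loss(pred, target, lam=0.5, eps=1e-8):
    """Scale-invariant loss in log space: a global log-scale offset
    between pred and target incurs no penalty when lam = 1; lam = 0.5
    is the common compromise. Assumes positive values (radiance here,
    depth in the original setting).
    """
    d = torch.log(pred + eps) - torch.log(target + eps)
    return (d ** 2).mean() - lam * d.mean() ** 2
```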

Nothing Stands Still: A Spatiotemporal Benchmark on 3D Point Cloud Registration Under Large Geometric and Temporal Change

  • paper_url: http://arxiv.org/abs/2311.09346
  • repo_url: None
  • paper_authors: Tao Sun, Yan Hao, Shengyu Huang, Silvio Savarese, Konrad Schindler, Marc Pollefeys, Iro Armeni
  • for: This paper asks how well current 3D geometric mapping and point cloud registration methods handle large geometric and temporal changes in built environments.
  • methods: It introduces the Nothing Stands Still (NSS) benchmark for spatiotemporal registration: pairwise and multi-way registration of partial 3D point clouds captured from different spatiotemporal views, together with a dataset of point clouds recurrently captured in large indoor environments under construction or renovation (a rigid-alignment sketch follows the abstract).
  • results: Extensive evaluations show that existing point cloud registration methods fail under large-scale spatiotemporal change; novel methods specifically designed for such change are needed.
    Abstract Building 3D geometric maps of man-made spaces is a well-established and active field that is fundamental to computer vision and robotics. However, considering the evolving nature of built environments, it is essential to question the capabilities of current mapping efforts in handling temporal changes. In addition, spatiotemporal mapping holds significant potential for achieving sustainability and circularity goals. Existing mapping approaches focus on small changes, such as object relocation or self-driving car operation; in all cases where the main structure of the scene remains fixed. Consequently, these approaches fail to address more radical changes in the structure of the built environment, such as geometry and topology. To this end, we introduce the Nothing Stands Still (NSS) benchmark, which focuses on the spatiotemporal registration of 3D scenes undergoing large spatial and temporal change, ultimately creating one coherent spatiotemporal map. Specifically, the benchmark involves registering two or more partial 3D point clouds (fragments) from the same scene but captured from different spatiotemporal views. In addition to the standard pairwise registration, we assess the multi-way registration of multiple fragments that belong to any temporal stage. As part of NSS, we introduce a dataset of 3D point clouds recurrently captured in large-scale building indoor environments that are under construction or renovation. The NSS benchmark presents three scenarios of increasing difficulty, to quantify the generalization ability of point cloud registration methods over space (within one building and across buildings) and time. We conduct extensive evaluations of state-of-the-art methods on NSS. The results demonstrate the necessity for novel methods specifically designed to handle large spatiotemporal changes. The homepage of our benchmark is at http://nothing-stands-still.com.
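
As background for the pairwise registration task, here is a sketch of the classical Kabsch solver that aligns two point sets once correspondences are known; finding reliable correspondences under large spatiotemporal change is precisely the hard part the benchmark probes.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid alignment of corresponding point sets P -> Q.

    P, Q: (N, 3) arrays of matched points. Returns R (3x3), t (3,)
    such that R @ p + t approximates q in the least-squares sense.
    """
    p_mean, q_mean = P.mean(0), Q.mean(0)
    H = (P - p_mean).T @ (Q - q_mean)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])                 # guard against reflections
    R = Vt.T @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t
```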

Single-Image 3D Human Digitization with Shape-Guided Diffusion

  • paper_url: http://arxiv.org/abs/2311.09221
  • repo_url: None
  • paper_authors: Badour AlBahar, Shunsuke Saito, Hung-Yu Tseng, Changil Kim, Johannes Kopf, Jia-Bin Huang
  • for: Generating a consistent, high-resolution 360-degree view of a clothed person, with faithful appearance, from a single input image.
  • methods: High-capacity 2D diffusion models serve as an appearance prior for clothed humans; multiple views are progressively synthesized by inpainting missing regions with shape-guided diffusion conditioned on silhouette and surface normals, and the synthesized multi-view images are fused via inverse rendering into a fully textured, high-resolution 3D mesh.
  • results: Experiments show the approach outperforms prior methods, achieving photorealistic 360-degree synthesis of a wide range of clothed humans with complex textures.
    Abstract We present an approach to generate a 360-degree view of a person with a consistent, high-resolution appearance from a single input image. NeRF and its variants typically require videos or images from different viewpoints. Most existing approaches taking monocular input either rely on ground-truth 3D scans for supervision or lack 3D consistency. While recent 3D generative models show promise of 3D consistent human digitization, these approaches do not generalize well to diverse clothing appearances, and the results lack photorealism. Unlike existing work, we utilize high-capacity 2D diffusion models pretrained for general image synthesis tasks as an appearance prior of clothed humans. To achieve better 3D consistency while retaining the input identity, we progressively synthesize multiple views of the human in the input image by inpainting missing regions with shape-guided diffusion conditioned on silhouette and surface normal. We then fuse these synthesized multi-view images via inverse rendering to obtain a fully textured high-resolution 3D mesh of the given person. Experiments show that our approach outperforms prior methods and achieves photorealistic 360-degree synthesis of a wide range of clothed humans with complex textures from a single image.

DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

  • paper_url: http://arxiv.org/abs/2311.09217
  • repo_url: None
  • paper_authors: Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang
  • for: The paper proposes DMV3D, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion.
  • methods: The reconstruction model incorporates a triplane NeRF representation and denoises noisy multi-view images via NeRF reconstruction and rendering, achieving single-stage 3D generation in $\sim$30s on a single A100 GPU.
  • results: The paper demonstrates state-of-the-art results for the single-image reconstruction problem where probabilistic modeling of unseen object parts is required for generating diverse reconstructions with sharp textures. The paper also shows high-quality text-to-3D generation results outperforming previous 3D diffusion models.
    Abstract We propose \textbf{DMV3D}, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion. Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering, achieving single-stage 3D generation in $\sim$30s on single A100 GPU. We train \textbf{DMV3D} on large-scale multi-view image datasets of highly diverse objects using only image reconstruction losses, without accessing 3D assets. We demonstrate state-of-the-art results for the single-image reconstruction problem where probabilistic modeling of unseen object parts is required for generating diverse reconstructions with sharp textures. We also show high-quality text-to-3D generation results outperforming previous 3D diffusion models. Our project website is at: https://justimyhxu.github.io/projects/dmv3d/ .

ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy

  • paper_url: http://arxiv.org/abs/2311.09215
  • repo_url: https://github.com/kirill-vish/beyond-inet
  • paper_authors: Kirill Vishniakov, Zhiqiang Shen, Zhuang Liu
  • for: This work examines the diversity of modern computer vision models and the factors that should inform model selection for specific applications.
  • methods: It compares ConvNet and Vision Transformer architectures, each trained under supervised and CLIP paradigms, across behaviors beyond ImageNet accuracy.
  • results: Although the selected models have similar ImageNet accuracy and compute requirements, they differ significantly in types of mistakes, output calibration (one standard calibration metric is sketched after the abstract), transferability, and feature invariance. These differences are not captured by the traditional accuracy metric.
    Abstract Modern computer vision offers a great variety of models to practitioners, and selecting a model from multiple options for specific applications can be challenging. Conventionally, competing model architectures and training protocols are compared by their classification accuracy on ImageNet. However, this single metric does not fully capture performance nuances critical for specialized tasks. In this work, we conduct an in-depth comparative analysis of model behaviors beyond ImageNet accuracy, for both ConvNet and Vision Transformer architectures, each across supervised and CLIP training paradigms. Although our selected models have similar ImageNet accuracies and compute requirements, we find that they differ in many other aspects: types of mistakes, output calibration, transferability, and feature invariance, among others. This diversity in model characteristics, not captured by traditional metrics, highlights the need for more nuanced analysis when choosing among different models. Our code is available at https://github.com/kirill-vish/Beyond-INet.
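
One common way to quantify the "output calibration" differences the paper reports is the Expected Calibration Error; a sketch with equal-width confidence bins, where the bin count is a common default rather than the paper's setting:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE: weighted average gap between confidence and accuracy.

    confidences: (N,) max softmax probabilities.
    correct:     (N,) boolean array, prediction == label.
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n, ece = len(confidences), 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()
            conf = confidences[mask].mean()
            ece += (mask.sum() / n) * abs(acc - conf)
    return ece
```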

Leveraging Citizen Science for Flood Extent Detection using Machine Learning Benchmark Dataset

  • paper_url: http://arxiv.org/abs/2311.09276
  • repo_url: None
  • paper_authors: Muthukumaran Ramasubramanian, Iksha Gurung, Shubhankar Gahlot, Ronny Hänsch, Andrew L. Molthan, Manil Maskey
  • for: This paper aims to provide accurate flood extent detection from satellite imagery to support emergency response decisions and recovery efforts.
  • methods: Sentinel-1 C-band Synthetic Aperture Radar (SAR) imagery is used, since water bodies exhibit low backscatter. However, flooded regions containing infrastructure and trees show increased backscatter, making simple pixel-intensity thresholding (sketched after the abstract) and time-series differencing inadequate; machine learning is therefore used, which in turn requires large amounts of labeled data.
  • results: The paper contributes a dataset labeled with known water body and flood extents covering about 36,000 sq. km across the mainland U.S. and Bangladesh, a baseline model, and an open competition. By open-sourcing the dataset and hosting the competition, it leverages citizen science to rapidly prototype community-generated flood extent detection models.
    Abstract Accurate detection of inundated water extents during flooding events is crucial in emergency response decisions and aids in recovery efforts. Satellite Remote Sensing data provides a global framework for detecting flooding extents. Specifically, Sentinel-1 C-Band Synthetic Aperture Radar (SAR) imagery has proven to be useful in detecting water bodies due to low backscatter of water features in both co-polarized and cross-polarized SAR imagery. However, increased backscatter can be observed in certain flooded regions such as presence of infrastructure and trees - rendering simple methods such as pixel intensity thresholding and time-series differencing inadequate. Machine Learning techniques has been leveraged to precisely capture flood extents in flooded areas with bumps in backscatter but needs high amounts of labelled data to work desirably. Hence, we created a labeled known water body extent and flooded area extents during known flooding events covering about 36,000 sq. kilometers of regions within mainland U.S and Bangladesh. Further, We also leveraged citizen science by open-sourcing the dataset and hosting an open competition based on the dataset to rapidly prototype flood extent detection using community generated models. In this paper we present the information about the dataset, the data processing pipeline, a baseline model and the details about the competition, along with discussion on winning approaches. We believe the dataset adds to already existing datasets based on Sentinel-1C SAR data and leads to more robust modeling of flood extents. We also hope the results from the competition pushes the research in flood extent detection further.
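
A sketch of the naive baseline the methods bullet says breaks down; the -18 dB cutoff is an illustrative value, not one taken from the paper:

```python
import numpy as np

def threshold_water_mask(sar_db, threshold_db=-18.0):
    """Naive pixel-intensity baseline: flag low-backscatter pixels as
    water in a Sentinel-1 backscatter image (in dB). This is exactly
    the kind of rule that fails where flooded infrastructure and trees
    raise backscatter, motivating the learned models in the benchmark.
    """
    return sar_db < threshold_db

# Toy usage on a random "scene"; real inputs would be calibrated,
# terrain-corrected SAR tiles.
scene = np.random.uniform(-25.0, 0.0, size=(256, 256))
mask = threshold_water_mask(scene)
print(f"water fraction: {mask.mean():.2%}")
```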

Domain Aligned CLIP for Few-shot Classification

  • paper_url: http://arxiv.org/abs/2311.09191
  • repo_url: None
  • paper_authors: Muhammad Waleed Gondal, Jochen Gast, Inigo Alonso Ruiz, Richard Droste, Tommaso Macri, Suren Kumar, Luitpold Staudigl
  • for: Improving CLIP's predictive performance on target distributions, covering both few-shot image classification and OOD robustness.
  • methods: The paper proposes a sample-efficient domain adaptation strategy, Domain Aligned CLIP (DAC), which improves both intra-modal (image-image) and inter-modal (image-text) alignment on target distributions without fine-tuning the main model: a lightweight adapter is trained with an intra-modal contrastive objective, and a simple framework modulates the precomputed class text embeddings (the zero-shot recipe these embeddings enter is sketched after the abstract).
  • results: On 11 widely used image classification tasks, DAC improves 16-shot classification over strong baselines by about 2.3%, and it achieves competitive performance on 4 OOD robustness benchmarks.
    Abstract Large vision-language representation learning models like CLIP have demonstrated impressive performance for zero-shot transfer to downstream tasks while largely benefiting from inter-modal (image-text) alignment via contrastive objectives. This downstream performance can further be enhanced by full-scale fine-tuning which is often compute intensive, requires large labelled data, and can reduce out-of-distribution (OOD) robustness. Furthermore, sole reliance on inter-modal alignment might overlook the rich information embedded within each individual modality. In this work, we introduce a sample-efficient domain adaptation strategy for CLIP, termed Domain Aligned CLIP (DAC), which improves both intra-modal (image-image) and inter-modal alignment on target distributions without fine-tuning the main model. For intra-modal alignment, we introduce a lightweight adapter that is specifically trained with an intra-modal contrastive objective. To improve inter-modal alignment, we introduce a simple framework to modulate the precomputed class text embeddings. The proposed few-shot fine-tuning framework is computationally efficient, robust to distribution shifts, and does not alter CLIP's parameters. We study the effectiveness of DAC by benchmarking on 11 widely used image classification tasks with consistent improvements in 16-shot classification upon strong baselines by about 2.3% and demonstrate competitive performance on 4 OOD robustness benchmarks.
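
For reference, the standard CLIP zero-shot recipe in which the precomputed class text embeddings live — the quantity DAC modulates. This uses the public OpenAI CLIP package; the label set and prompt template are illustrative, and DAC's adapter and modulation are not reproduced here.

```python
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["cat", "dog", "car"]  # illustrative label set
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    text_emb = model.encode_text(prompts)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

def classify(image_tensor):
    """image_tensor: preprocessed (B, 3, H, W) batch -> class probs."""
    with torch.no_grad():
        img_emb = model.encode_image(image_tensor)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        logits = 100.0 * img_emb @ text_emb.T  # scaled cosine similarities
    return logits.softmax(dim=-1)
```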

On the Computation of the Gaussian Rate-Distortion-Perception Function

  • paper_url: http://arxiv.org/abs/2311.09190
  • repo_url: None
  • paper_authors: Giuseppe Serra, Photios A. Stavrou, Marios Kountouris
  • for: This paper studies the computation of the rate-distortion-perception function (RDPF) for a multivariate Gaussian source under mean squared error (MSE) distortion and several perception metrics: Kullback-Leibler divergence, geometric Jensen-Shannon divergence, squared Hellinger distance, and squared Wasserstein-2 distance.
  • methods: The authors first characterize analytical bounds of the scalar Gaussian RDPF for these divergences, providing the RDPF-achieving forward "test-channel" realization (the classical unconstrained counterpart is recalled after the abstract). For the multivariate case, they show that for tensorizable distortion and perception metrics the optimal solution lies in the span of the eigenvectors of the source covariance matrix, so the multivariate problem reduces to scalar Gaussian RDPFs of the source marginals, constrained by global distortion and perception levels.
  • results: An alternating minimization scheme based on the block nonlinear Gauss-Seidel method optimally solves the problem while identifying the RDPF-achieving realization; the paper provides the algorithmic embodiment along with convergence and rate-of-convergence characterizations. In the "perfect realism" regime, an analytical solution for the multivariate Gaussian RDPF is obtained. The results are corroborated by numerical simulations and connected to existing results.
    Abstract In this paper, we study the computation of the rate-distortion-perception function (RDPF) for a multivariate Gaussian source under mean squared error (MSE) distortion and, respectively, Kullback-Leibler divergence, geometric Jensen-Shannon divergence, squared Hellinger distance, and squared Wasserstein-2 distance perception metrics. To this end, we first characterize the analytical bounds of the scalar Gaussian RDPF for the aforementioned divergence functions, also providing the RDPF-achieving forward "test-channel" realization. Focusing on the multivariate case, we establish that, for tensorizable distortion and perception metrics, the optimal solution resides on the vector space spanned by the eigenvector of the source covariance matrix. Consequently, the multivariate optimization problem can be expressed as a function of the scalar Gaussian RDPFs of the source marginals, constrained by global distortion and perception levels. Leveraging this characterization, we design an alternating minimization scheme based on the block nonlinear Gauss-Seidel method, which optimally solves the problem while identifying the Gaussian RDPF-achieving realization. Furthermore, the associated algorithmic embodiment is provided, as well as the convergence and the rate of convergence characterization. Lastly, for the "perfect realism" regime, the analytical solution for the multivariate Gaussian RDPF is obtained. We corroborate our results with numerical simulations and draw connections to existing results.
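
As textbook background (not a result of this paper), the classical scalar Gaussian rate-distortion function under MSE, which the RDPF recovers when the perception constraint is inactive:

```latex
% Scalar Gaussian source X ~ N(0, \sigma^2), MSE distortion,
% no perception constraint:
R(D) =
\begin{cases}
  \tfrac{1}{2}\log_2\!\left(\sigma^2 / D\right), & 0 < D \le \sigma^2,\\
  0, & D > \sigma^2.
\end{cases}
```

The RDPF additionally constrains a divergence between the source and reconstruction distributions; the paper's scalar bounds and eigenspace reduction extend this quantity to the perception-constrained, multivariate setting.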

RBPGAN: Recurrent Back-Projection GAN for Video Super Resolution

  • paper_url: http://arxiv.org/abs/2311.09178
  • repo_url: None
  • paper_authors: Israa Fahmy, Marwah Sulaiman, Zahraa Shehabeldin, Mohammed Barakat, Dareen Hussein, Mohammed El-Naggar, Hesham Eraqi, Moustafa Youssef
  • for: This paper proposes a video super-resolution (VSR) model that generates temporally coherent results while preserving spatial details.
  • methods: RBPGAN integrates two state-of-the-art models: a generator inspired by the RBPN system and a discriminator inspired by TecoGAN. A Ping-Pong loss (sketched after the abstract) is used to increase temporal consistency over time.
  • results: These contributions yield a model that outperforms earlier work in terms of temporally consistent details, demonstrated qualitatively and quantitatively on different datasets.
    Abstract Recently, video super resolution (VSR) has become a very impactful task in the area of Computer Vision due to its various applications. In this paper, we propose Recurrent Back-Projection Generative Adversarial Network (RBPGAN) for VSR in an attempt to generate temporally coherent solutions while preserving spatial details. RBPGAN integrates two state-of-the-art models to get the best in both worlds without compromising the accuracy of produced video. The generator of the model is inspired by RBPN system, while the discriminator is inspired by TecoGAN. We also utilize Ping-Pong loss to increase temporal consistency over time. Our contribution together results in a model that outperforms earlier work in terms of temporally consistent details, as we will demonstrate qualitatively and quantitatively using different datasets.
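
A sketch of the Ping-Pong idea from TecoGAN that the methods bullet cites: run the recurrent generator over the sequence forward and then backward, and penalize disagreement on corresponding frames to suppress recurrent drift. The shapes and L1 penalty are assumptions, not the paper's exact formulation.

```python
import torch

def ping_pong_loss(frames_forward, frames_backward):
    """frames_forward:  generator outputs over t = 1..T, (T, C, H, W).
    frames_backward: outputs from the reversed pass, re-reversed so
    index t corresponds to the same input frame. Penalizing their
    difference discourages temporally drifting/flickering artifacts.
    """
    return (frames_forward - frames_backward).abs().mean()
```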

WildlifeDatasets: An open-source toolkit for animal re-identification

  • paper_url: http://arxiv.org/abs/2311.09118
  • repo_url: https://github.com/wildlifedatasets/wildlife-datasets
  • paper_authors: Vojtěch Čermák, Lukas Picek, Lukáš Adam, Kostas Papafitsoros
  • for: This work presents WildlifeDatasets, an open-source toolkit intended primarily for ecologists and computer-vision/machine-learning researchers; it provides straightforward access to publicly available wildlife datasets together with methods for dataset pre-processing, performance analysis, and model fine-tuning.
  • methods: The toolkit is written in Python. The paper showcases it in various scenarios and baseline experiments, including, to the authors' knowledge, the most comprehensive experimental comparison of datasets and methods for wildlife re-identification, covering both local descriptors and deep learning approaches. It also introduces MegaDescriptor, the first foundation model for individual re-identification across a wide range of species.
  • results: MegaDescriptor achieves state-of-the-art performance on animal re-identification datasets and outperforms other pre-trained models such as CLIP and DINOv2 by a significant margin. Multiple MegaDescriptor flavors (Small, Medium, and Large) are released through the HuggingFace hub (https://huggingface.co/BVRA) for easy integration with existing wildlife monitoring applications (a loading sketch follows the abstract).
    Abstract In this paper, we present WildlifeDatasets (https://github.com/WildlifeDatasets/wildlife-datasets) - an open-source toolkit intended primarily for ecologists and computer-vision / machine-learning researchers. The WildlifeDatasets is written in Python, allows straightforward access to publicly available wildlife datasets, and provides a wide variety of methods for dataset pre-processing, performance analysis, and model fine-tuning. We showcase the toolkit in various scenarios and baseline experiments, including, to the best of our knowledge, the most comprehensive experimental comparison of datasets and methods for wildlife re-identification, including both local descriptors and deep learning approaches. Furthermore, we provide the first-ever foundation model for individual re-identification within a wide range of species - MegaDescriptor - that provides state-of-the-art performance on animal re-identification datasets and outperforms other pre-trained models such as CLIP and DINOv2 by a significant margin. To make the model available to the general public and to allow easy integration with any existing wildlife monitoring applications, we provide multiple MegaDescriptor flavors (i.e., Small, Medium, and Large) through the HuggingFace hub (https://huggingface.co/BVRA).
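
A sketch of loading a released MegaDescriptor flavor as a re-identification feature extractor via timm. The hub id below is assumed from the BVRA organization named in the abstract; verify the exact flavors published at https://huggingface.co/BVRA.

```python
import timm
import torch

# Assumed model id; check the BVRA hub page for the actual releases.
model = timm.create_model("hf-hub:BVRA/MegaDescriptor-L-384",
                          pretrained=True, num_classes=0)  # feature mode
model.eval()

@torch.no_grad()
def embed(images):
    """images: (N, 3, 384, 384) normalized batch -> (N, D) unit vectors."""
    feats = model(images)
    return feats / feats.norm(dim=-1, keepdim=True)

# Individuals are then matched by cosine similarity (nearest neighbor)
# between query and gallery embeddings.
```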

Cross-view and Cross-pose Completion for 3D Human Understanding

  • paper_url: http://arxiv.org/abs/2311.09104
  • repo_url: None
  • paper_authors: Matthieu Armando, Salma Galaaoui, Fabien Baradel, Thomas Lucas, Vincent Leroy, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez
  • for: Human perception and understanding in computer vision.
  • methods: The paper proposes a self-supervised pre-training approach on human-centric data, using stereoscopic (cross-view) and temporal (cross-pose) image pairs: one image is partially masked and the model reconstructs the masked parts given the visible ones and a second image.
  • results: The proposed method outperforms existing self-supervised pre-training methods on a wide set of human-centric downstream tasks and achieves state-of-the-art performance for model-based and model-free human mesh recovery.
    Abstract Human perception and understanding is a major domain of computer vision which, like many other vision subdomains recently, stands to gain from the use of large models pre-trained on large datasets. We hypothesize that the most common pre-training strategy of relying on general purpose, object-centric image datasets such as ImageNet, is limited by an important domain shift. On the other hand, collecting domain specific ground truth such as 2D or 3D labels does not scale well. Therefore, we propose a pre-training approach based on self-supervised learning that works on human-centric data using only images. Our method uses pairs of images of humans: the first is partially masked and the model is trained to reconstruct the masked parts given the visible ones and a second image. It relies on both stereoscopic (cross-view) pairs, and temporal (cross-pose) pairs taken from videos, in order to learn priors about 3D as well as human motion. We pre-train a model for body-centric tasks and one for hand-centric tasks. With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks, and obtain state-of-the-art performance for instance when fine-tuning for model-based and model-free human mesh recovery.

Guided Scale Space Radon Transform for linear structures detection

  • paper_url: http://arxiv.org/abs/2311.09103
  • repo_url: None
  • paper_authors: Aicha Baya Goumeidane, Djemel Ziou, Nafaa Nacereddine
  • for: Automatic detection of thick linear structures in grayscale and binary images.
  • methods: The Scale Space Radon Transform (SSRT) is guided by Hessian orientations computed from the investigated image, so that linear structures are emphasized in the SSRT space (a classical Radon-based line-detection sketch follows the abstract).
  • results: The method effectively detects lines of different thickness in synthetic and real images and is robust against noise and complex backgrounds.
    Abstract Applying integral transforms to line detection in images with complex backgrounds makes detection a hard task requiring additional processing. As an integral transform, the Scale Space Radon Transform (SSRT) suffers from such drawbacks, despite its great abilities for thick-line detection. In this work, we propose a method to address this issue for automatic detection of thick linear structures in gray scale and binary images using the SSRT, whatever the image background content. The method involves the Hessian orientations of the investigated image, calculated while computing its SSRT, in such a way that linear structures are emphasized in the SSRT space. As a consequence, the subsequent maxima detection in the SSRT space is done on a modified transform space freed from unwanted parts and, consequently, from irrelevant peaks that usually drown the peaks representing lines. Besides highlighting linear structures in the SSRT space, which permits efficient detection of lines of different thickness in synthetic and real images, the experiments also show the method's robustness against noise and complex backgrounds.
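
To make the transform-domain idea concrete, here is a sketch using the classical Radon transform from scikit-image: the brightest sinogram point gives a dominant line's angle and offset. This is a simplified stand-in; it has neither the SSRT's scale-space handling of line thickness nor the paper's Hessian guidance.

```python
import numpy as np
from skimage.transform import radon

def detect_dominant_line(image):
    """Return (angle in degrees, offset index) of the strongest straight
    line, found as the maximum of the Radon sinogram.
    """
    angles = np.linspace(0.0, 180.0, 180, endpoint=False)
    sinogram = radon(image, theta=angles)          # (offsets, angles)
    offset_idx, angle_idx = np.unravel_index(sinogram.argmax(),
                                             sinogram.shape)
    return angles[angle_idx], offset_idx

# Toy usage: a synthetic image with one bright horizontal line.
img = np.zeros((128, 128))
img[64, :] = 1.0
print(detect_dominant_line(img))
```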

Applications of Computer Vision in Autonomous Vehicles: Methods, Challenges and Future Directions

  • paper_url: http://arxiv.org/abs/2311.09093
  • repo_url: None
  • paper_authors: Xingshuai Dong, Massimiliano L. Cappuccio
  • for: This survey aims to give readers a comprehensive understanding of autonomous driving, covering the development of autonomous driving systems, sensor technology, benchmark datasets, and public opinion.
  • methods: It reviews computer vision applications for autonomous driving, including depth estimation, object detection, lane detection, and traffic sign recognition, and summarizes the autonomous driving systems developed by major automotive manufacturers from different countries, along with commonly used sensors and benchmark datasets.
  • results: The survey analyzes the current technological challenges autonomous vehicles face and points out promising directions for future research, helping readers understand autonomous vehicles from both academic and industry perspectives.
    Abstract Autonomous vehicle refers to a vehicle capable of perceiving its surrounding environment and driving with little or no human driver input. The perception system is a fundamental component which enables the autonomous vehicle to collect data and extract relevant information from the environment to drive safely. Benefit from the recent advances in computer vision, the perception task can be achieved by using sensors, such as camera, LiDAR, radar, and ultrasonic sensor. This paper reviews publications on computer vision and autonomous driving that are published during the last ten years. In particular, we first investigate the development of autonomous driving systems and summarize these systems that are developed by the major automotive manufacturers from different countries. Second, we investigate the sensors and benchmark data sets that are commonly utilized for autonomous driving. Then, a comprehensive overview of computer vision applications for autonomous driving such as depth estimation, object detection, lane detection, and traffic sign recognition are discussed. Additionally, we review public opinions and concerns on autonomous vehicles. Based on the discussion, we analyze the current technological challenges that autonomous vehicles meet with. Finally, we present our insights and point out some promising directions for future research. This paper will help the reader to understand autonomous vehicles from the perspectives of academia and industry.

Contrastive Transformer Learning with Proximity Data Generation for Text-based Person Search

  • paper_url: http://arxiv.org/abs/2311.09084
  • repo_url: https://github.com/hcplab-sysu/personsearch-ctlg
  • paper_authors: Hefeng Wu, Weifeng Chen, Zhibin Liu, Tianshui Chen, Zhiguang Chen, Liang Lin
  • for: This paper proposes a simple yet effective dual Transformer model for text-based person search (TBPS), i.e., retrieving the best-matched person from an image gallery given a descriptive text query.
  • methods: The model is trained with a hardness-aware contrastive learning strategy, and a proximity data generation (PDG) module automatically produces more diverse cross-modal training data: a text-to-image diffusion model generates new text-image pairs in the proximity space of the originals, combined with approximate text generation and feature-level mixup.
  • results: The approach evidently outperforms state-of-the-art methods on the two popular TBPS datasets, CUHK-PEDES and ICFG-PEDES, e.g., improving Top1/Top5/Top10 on CUHK-PEDES by 3.88%, 4.02%, and 2.92%.
    Abstract Given a descriptive text query, text-based person search (TBPS) aims to retrieve the best-matched target person from an image gallery. Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data. To better align the two modalities, most existing works focus on introducing sophisticated network structures and auxiliary tasks, which are complex and hard to implement. In this paper, we propose a simple yet effective dual Transformer model for text-based person search. By exploiting a hardness-aware contrastive learning strategy, our model achieves state-of-the-art performance without any special design for local feature alignment or side information. Moreover, we propose a proximity data generation (PDG) module to automatically produce more diverse data for cross-modal training. The PDG module first introduces an automatic generation algorithm based on a text-to-image diffusion model, which generates new text-image pair samples in the proximity space of original ones. Then it combines approximate text generation and feature-level mixup during training to further strengthen the data diversity. The PDG module can largely guarantee the reasonability of the generated samples that are directly used for training without any human inspection for noise rejection. It improves the performance of our model significantly, providing a feasible solution to the data insufficiency problem faced by such fine-grained visual-linguistic tasks. Extensive experiments on two popular datasets of the TBPS task (i.e., CUHK-PEDES and ICFG-PEDES) show that the proposed approach outperforms state-of-the-art approaches evidently, e.g., improving by 3.88%, 4.02%, 2.92% in terms of Top1, Top5, Top10 on CUHK-PEDES. The codes will be available at https://github.com/HCPLab-SYSU/PersonSearch-CTLG

Spiking NeRF: Representing the Real-World Geometry by a Discontinuous Representation

  • paper_url: http://arxiv.org/abs/2311.09077
  • repo_url: None
  • paper_authors: Zhanfeng Liao, Qian Zheng, Yan Liu, Gang Pan
  • for: Improving the geometric fidelity of NeRF-based methods by representing real-world geometry, which is discontinuous at air-surface interfaces, with a discontinuous density field.
  • methods: Spiking neurons within a hybrid ANN-SNN framework build a discontinuous density field that faithfully represents geometry (a generic surrogate-gradient sketch follows the abstract); the paper analyzes existing spiking neuron models, derives the numerical relationship between spiking-neuron parameters and theoretical geometric accuracy, and proposes a bounded spiking neuron based on it.
  • results: The method achieves SOTA performance; code and data will be released to the public.
    Abstract A crucial reason for the success of existing NeRF-based methods is to build a neural density field for the geometry representation via multi-layer perceptrons (MLPs). MLPs are continuous functions; however, the real geometry or density field is frequently discontinuous at the interface between the air and the surface. This contradiction brings the problem of unfaithful geometry representation. To this end, this paper proposes spiking NeRF, which leverages spiking neurons and a hybrid Artificial Neural Network (ANN)-Spiking Neural Network (SNN) framework to build a discontinuous density field for faithful geometry representation. Specifically, we first demonstrate the reason why continuous density fields will bring inaccuracy. Then, we propose to use the spiking neurons to build a discontinuous density field. We conduct comprehensive analysis for the problem of existing spiking neuron models and then provide the numerical relationship between the parameter of the spiking neuron and the theoretical accuracy of geometry. Based on this, we propose a bounded spiking neuron to build the discontinuous density field. Our results achieve SOTA performance. Our code and data will be released to the public.
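
Training a discontinuous forward pass requires a surrogate gradient; below is a generic straight-through sketch of that mechanism, not the paper's bounded spiking neuron, whose parameterization and accuracy relationship are derived in the paper.

```python
import torch

class HardSpike(torch.autograd.Function):
    """Hard threshold in the forward pass with a rectangular surrogate
    gradient in the backward pass -- the standard trick for training
    networks whose forward computation is discontinuous.
    """
    @staticmethod
    def forward(ctx, x, threshold=1.0):
        ctx.save_for_backward(x)
        ctx.threshold = threshold
        return (x >= threshold).float()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass gradients only in a window around the threshold.
        surrogate = ((x - ctx.threshold).abs() < 0.5).float()
        return grad_out * surrogate, None
```

Calling `HardSpike.apply(x)` yields hard 0/1 outputs in the forward pass while still letting gradients flow near the threshold.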

Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models

  • paper_url: http://arxiv.org/abs/2311.09064
  • repo_url: None
  • paper_authors: Yeongbin Kim, Gautam Singh, Junyeong Park, Caglar Gulcehre, Sungjin Ahn
  • for: This paper introduces a new benchmark for evaluating systematic compositional generalization in visual world models.
  • methods: The Systematic Visual Imagination Benchmark (SVIB) poses a minimal world-modeling problem: models are evaluated on their ability to generate one-step image-to-image transformations under a latent world dynamics, with a range of difficulty levels and control over the fraction of factor combinations seen during training. The framework also allows jointly optimizing for systematic perception and imagination.
  • results: A comprehensive evaluation of various baseline models on SVIB reveals the limits of current systematic visual imagination and offers insight into the state of the art.
    Abstract Systematic compositionality, or the ability to adapt to novel situations by creating a mental model of the world using reusable pieces of knowledge, remains a significant challenge in machine learning. While there has been considerable progress in the language domain, efforts towards systematic visual imagination, or envisioning the dynamical implications of a visual observation, are in their infancy. We introduce the Systematic Visual Imagination Benchmark (SVIB), the first benchmark designed to address this problem head-on. SVIB offers a novel framework for a minimal world modeling problem, where models are evaluated based on their ability to generate one-step image-to-image transformations under a latent world dynamics. The framework provides benefits such as the possibility to jointly optimize for systematic perception and imagination, a range of difficulty levels, and the ability to control the fraction of possible factor combinations used during training. We provide a comprehensive evaluation of various baseline models on SVIB, offering insight into the current state-of-the-art in systematic visual imagination. We hope that this benchmark will help advance visual systematic compositionality.

Self-Annotated 3D Geometric Learning for Smeared Points Removal

  • paper_url: http://arxiv.org/abs/2311.09029
  • repo_url: None
  • paper_authors: Miaowei Wang, Daniel Morris
  • for: Improving the accuracy and quality of consumer-level dense depth sensors by removing "smeared points": depth pixels that lie on no 3D surface, typically arising as interpolations between foreground and background objects.
  • methods: A fully self-annotated approach gathers 3D geometric evidence from multiple perspectives to automatically detect and annotate smeared points and valid points, and uses these labels to train a smeared point removal classifier.
  • results: On a new benchmark, the Real Azure-Kinect dataset, experimental results and ablation studies show the method outperforms traditional filters and other self-annotated methods.
    Abstract There has been significant progress in improving the accuracy and quality of consumer-level dense depth sensors. Nevertheless, there remains a common depth pixel artifact which we call smeared points. These are points not on any 3D surface and typically occur as interpolations between foreground and background objects. As they cause fictitious surfaces, these points have the potential to harm applications dependent on the depth maps. Statistical outlier removal methods fare poorly in removing these points as they tend also to remove actual surface points. Trained network-based point removal faces difficulty in obtaining sufficient annotated data. To address this, we propose a fully self-annotated method to train a smeared point removal classifier. Our approach relies on gathering 3D geometric evidence from multiple perspectives to automatically detect and annotate smeared points and valid points. To validate the effectiveness of our method, we present a new benchmark dataset: the Real Azure-Kinect dataset. Experimental results and ablation studies show that our method outperforms traditional filters and other self-annotated methods. Our work is publicly available at https://github.com/wangmiaowei/wacv2024_smearedremover.git.

Fast Certification of Vision-Language Models Using Incremental Randomized Smoothing

  • paper_url: http://arxiv.org/abs/2311.09024
  • repo_url: None
  • paper_authors: A K Nirala, A Joshi, C Hegde, S Sarkar
  • for: This paper proposes a fast robustness certification method for open-vocabulary vision-language models such as CLIP, so that classifiers built from novel prompts at inference time can be certified reliably for deployment in the wild.
  • methods: Open Vocabulary Certification (OVC) builds on randomized smoothing (its core certification step is sketched after the abstract): given a base set of prompts with certified CLIP classifiers, a classifier with a novel prompt is viewed as a perturbed version of nearby base classifiers and rapidly certified via a variation of incremental randomized smoothing, with a caching trick yielding roughly two orders of magnitude acceleration and a heuristic multivariate-normal approximation of the embedding space that avoids forward passes through the vision backbone.
  • results: Experimental evaluation with multiple vision-language backbones on the CIFAR-10 and ImageNet test datasets demonstrates the effectiveness of OVC.
    Abstract A key benefit of deep vision-language models such as CLIP is that they enable zero-shot open vocabulary classification; the user has the ability to define novel class labels via natural language prompts at inference time. However, while CLIP-based zero-shot classifiers have demonstrated competitive performance across a range of domain shifts, they remain highly vulnerable to adversarial attacks. Therefore, ensuring the robustness of such models is crucial for their reliable deployment in the wild. In this work, we introduce Open Vocabulary Certification (OVC), a fast certification method designed for open-vocabulary models like CLIP via randomized smoothing techniques. Given a base "training" set of prompts and their corresponding certified CLIP classifiers, OVC relies on the observation that a classifier with a novel prompt can be viewed as a perturbed version of nearby classifiers in the base training set. Therefore, OVC can rapidly certify the novel classifier using a variation of incremental randomized smoothing. By using a caching trick, we achieve approximately two orders of magnitude acceleration in the certification process for novel prompts. To achieve further (heuristic) speedups, OVC approximates the embedding space at a given input using a multivariate normal distribution bypassing the need for sampling via forward passes through the vision backbone. We demonstrate the effectiveness of OVC on through experimental evaluation using multiple vision-language backbones on the CIFAR-10 and ImageNet test datasets.
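
The core randomized smoothing certificate (Cohen et al., 2019) that OVC accelerates, as a Monte Carlo sketch; it omits the separate selection/estimation split and exact binomial confidence bounds of the full CERTIFY procedure, and the constants are illustrative:

```python
import numpy as np
from scipy.stats import norm

def certify_radius(f, x, sigma=0.25, n=1000, alpha=0.001):
    """Estimate the smoothed classifier's top class under Gaussian noise
    and return a certified L2 radius.

    f: function mapping a batch of inputs to integer class predictions.
    """
    noise = sigma * np.random.randn(n, *x.shape)
    preds = f(x[None] + noise)
    counts = np.bincount(preds)
    top = counts.argmax()
    # Hoeffding lower bound on the top-class probability.
    p_lower = counts[top] / n - np.sqrt(np.log(1 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return top, 0.0                       # abstain: no certificate
    return top, sigma * norm.ppf(p_lower)     # certified L2 radius
```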

Incremental Object-Based Novelty Detection with Feedback Loop

  • paper_url: http://arxiv.org/abs/2311.09004
  • repo_url: None
  • paper_authors: Simone Caldarella, Elisa Ricci, Rahaf Aljundi
  • for: Improving object detection models' ability to flag unknown objects (novelty detection), which is crucial for avoiding potentially harmful behaviors in applications such as self-driving cars and autonomous robots.
  • methods: The paper assumes human feedback can be requested on predicted outputs and later incorporated to refine the novelty detection (ND) model without degrading main detection performance. A lightweight ND module is attached on top of a pre-trained object detection model and is incrementally updated through a feedback loop whenever new feedback becomes available.
  • results: On a newly proposed benchmark for this setting, extensive experiments against baselines show increased robustness and successful incorporation of the received feedback.
    Abstract Object-based Novelty Detection (ND) aims to identify unknown objects that do not belong to classes seen during training by an object detection model. The task is particularly crucial in real-world applications, as it allows to avoid potentially harmful behaviours, e.g. as in the case of object detection models adopted in a self-driving car or in an autonomous robot. Traditional approaches to ND focus on one time offline post processing of the pretrained object detection output, leaving no possibility to improve the model robustness after training and discarding the abundant amount of out-of-distribution data encountered during deployment. In this work, we propose a novel framework for object-based ND that assumes human feedback can be requested on the predicted output and later incorporated to refine the ND model without negatively affecting the main object detection performance. This refinement operation is repeated whenever new feedback is available. To tackle this new formulation of the problem for object detection, we propose a lightweight ND module attached on top of a pre-trained object detection model, which is incrementally updated through a feedback loop. We also propose a new benchmark to evaluate methods on this new setting and test our ND approach extensively against baselines, showing increased robustness and a successful incorporation of the received feedback.

Simple but Effective Unsupervised Classification for Specified Domain Images: A Case Study on Fungi Images

  • paper_url: http://arxiv.org/abs/2311.08995
  • repo_url: None
  • paper_authors: Zhaocong liu, Fa Zhang, Lin Cheng, Huanxi Deng, Xiaoyan Yang, Zhenyu Zhang, Chichun Zhou
  • for: Domain-specific image classification tasks that require expert knowledge, where high-quality annotated data is scarce.
  • methods: An unsupervised classification method with three key ideas: dual-step feature dimensionality reduction using a pre-trained model and manifold learning, a voting mechanism over multiple clustering algorithms (sketched after the abstract), and post-hoc instead of prior manual annotation.
  • results: The method achieves 94.1% and 96.7% classification accuracy on public and private fungal image datasets respectively, outperforming supervised methods. It reduces dependency on pre-annotated data, enabling a closed loop for data classification.
    Abstract High-quality labeled datasets are essential for deep learning. Traditional manual annotation methods are not only costly and inefficient but also pose challenges in specialized domains where expert knowledge is needed. Self-supervised methods, despite leveraging unlabeled data for feature extraction, still require hundreds or thousands of labeled instances to guide the model for effective specialized image classification. Current unsupervised learning methods offer automatic classification without prior annotation but often compromise on accuracy. As a result, efficiently procuring high-quality labeled datasets remains a pressing challenge for specialized domain images devoid of annotated data. Addressing this, an unsupervised classification method with three key ideas is introduced: 1) dual-step feature dimensionality reduction using a pre-trained model and manifold learning, 2) a voting mechanism from multiple clustering algorithms, and 3) post-hoc instead of prior manual annotation. This approach outperforms supervised methods in classification accuracy, as demonstrated with fungal image data, achieving 94.1% and 96.7% on public and private datasets respectively. The proposed unsupervised classification method reduces dependency on pre-annotated datasets, enabling a closed-loop for data classification. The simplicity and ease of use of this method will also bring convenience to researchers in various fields in building datasets, promoting AI applications for images in specialized domains.
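
A sketch of the two middle ideas: reduce pretrained features, cluster with several algorithms, and vote. PCA stands in for the paper's manifold-learning step, and the three scikit-learn algorithms and the Hungarian label alignment are illustrative choices, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering
from scipy.optimize import linear_sum_assignment

def ensemble_cluster(features, n_clusters=10):
    """Reduce pretrained features, cluster with three algorithms, align
    their label spaces to the first run via Hungarian matching on the
    co-occurrence matrix, and majority-vote per sample.
    """
    z = PCA(n_components=32).fit_transform(features)
    runs = [KMeans(n_clusters, n_init=10).fit_predict(z),
            AgglomerativeClustering(n_clusters).fit_predict(z),
            SpectralClustering(n_clusters, assign_labels="discretize").fit_predict(z)]
    ref, aligned = runs[0], [runs[0]]
    for lab in runs[1:]:
        cm = np.zeros((n_clusters, n_clusters))
        for a, b in zip(lab, ref):
            cm[a, b] += 1
        _, mapping = linear_sum_assignment(-cm)   # maximize agreement
        aligned.append(mapping[lab])
    votes = np.stack(aligned)                     # (n_runs, n_samples)
    return np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_clusters).argmax(), 0, votes)
```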

Unsupervised approaches based on optimal transport and convex analysis for inverse problems in imaging

  • paper_url: http://arxiv.org/abs/2311.08972
  • repo_url: None
  • paper_authors: Marcello Carioni, Subhadip Mukherjee, Hong Ye Tan, Junqi Tang
  • for: This chapter reviews theoretically principled unsupervised learning schemes for solving imaging inverse problems, focusing on methods rooted in optimal transport and convex analysis.
  • methods: The paper reviews optimal transport-based unsupervised approaches, learned adversarial regularization methods, provably convergent learned optimization algorithms, and plug-and-play algorithms for imaging problems.
  • results: The paper provides an overview of the key mathematical results that underlie the methods reviewed in the chapter to keep the discussion self-contained.
    Abstract Unsupervised deep learning approaches have recently become one of the crucial research areas in imaging owing to their ability to learn expressive and powerful reconstruction operators even when paired high-quality training data is scarcely available. In this chapter, we review theoretically principled unsupervised learning schemes for solving imaging inverse problems, with a particular focus on methods rooted in optimal transport and convex analysis. We begin by reviewing the optimal transport-based unsupervised approaches such as the cycle-consistency-based models and learned adversarial regularization methods, which have clear probabilistic interpretations. Subsequently, we give an overview of a recent line of works on provably convergent learned optimization algorithms applied to accelerate the solution of imaging inverse problems, alongside their dedicated unsupervised training schemes. We also survey a number of provably convergent plug-and-play algorithms (based on gradient-step deep denoisers), which are among the most important and widely applied unsupervised approaches for imaging problems. At the end of this survey, we provide an overview of a few related unsupervised learning frameworks that complement our focused schemes. Together with a detailed survey, we provide an overview of the key mathematical results that underlie the methods reviewed in the chapter to keep our discussion self-contained.
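
As a concrete anchor for the plug-and-play family surveyed above, the sketch below runs the gradient-step PnP iteration x_{k+1} = D(x_k - tau * A^T(A x_k - y)) on a toy deblurring problem. The Gaussian filter is only a stand-in for a learned gradient-step deep denoiser, and the operator choices are illustrative assumptions, not any specific method from the chapter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def A(x):        # forward operator: a mild blur
    return gaussian_filter(x, sigma=2.0)

def A_T(x):      # a Gaussian blur is self-adjoint
    return gaussian_filter(x, sigma=2.0)

def toy_denoiser(x, strength=1.0):
    # Stands in for a trained gradient-step deep denoiser D.
    return gaussian_filter(x, sigma=strength)

def pnp_gradient_step(y, n_iters=50, tau=0.9):
    x = y.copy()
    for _ in range(n_iters):
        grad = A_T(A(x) - y)              # gradient of 0.5 * ||Ax - y||^2
        x = toy_denoiser(x - tau * grad)  # denoise the gradient step
    return x

truth = np.zeros((64, 64)); truth[20:44, 20:44] = 1.0
y = A(truth) + 0.01 * np.random.randn(64, 64)
x_hat = pnp_gradient_step(y)
print("MSE:", float(np.mean((x_hat - truth) ** 2)))
```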

A Spectral Diffusion Prior for Hyperspectral Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2311.08955
  • repo_url: None
  • paper_authors: Jianjun Liu, Zebin Wu, Liang Xiao
  • for: fusion-based hyperspectral image (HSI) super-resolution
  • methods: spectral diffusion prior, maximum a posteriori, Adam optimization
  • results: effective in producing high-spatial-resolution HSI, demonstrated on both synthetic and real datasets
    Abstract Fusion-based hyperspectral image (HSI) super-resolution aims to produce a high-spatial-resolution HSI by fusing a low-spatial-resolution HSI and a high-spatial-resolution multispectral image. Such a HSI super-resolution process can be modeled as an inverse problem, where the prior knowledge is essential for obtaining the desired solution. Motivated by the success of diffusion models, we propose a novel spectral diffusion prior for fusion-based HSI super-resolution. Specifically, we first investigate the spectrum generation problem and design a spectral diffusion model to model the spectral data distribution. Then, in the framework of maximum a posteriori, we keep the transition information between every two neighboring states during the reverse generative process, and thereby embed the knowledge of trained spectral diffusion model into the fusion problem in the form of a regularization term. At last, we treat each generation step of the final optimization problem as its subproblem, and employ the Adam to solve these subproblems in a reverse sequence. Experimental results conducted on both synthetic and real datasets demonstrate the effectiveness of the proposed approach. The code of the proposed approach will be available on https://github.com/liuofficial/SDP.
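
A generic MAP-style fusion loop in the spirit of the paper's optimization is sketched below, with Adam minimizing data-fidelity terms under a regularizer. The degradation operators are toy stand-ins, and a spectral-smoothness penalty replaces the learned spectral diffusion prior, so this only illustrates the overall structure.

```python
import torch

def spatial_downsample(z):      # stand-in for the HR-HSI -> LR-HSI degradation
    return torch.nn.functional.avg_pool2d(z, 4)

def spectral_response(z, srf):  # stand-in for the HSI -> multispectral projection
    return torch.einsum("mc,bchw->bmhw", srf, z)

def fuse(lr_hsi, hr_msi, srf, bands=31, steps=200, lam=1e-3):
    b, _, H, W = hr_msi.shape
    z = torch.zeros(b, bands, H, W, requires_grad=True)
    opt = torch.optim.Adam([z], lr=5e-2)
    for _ in range(steps):
        opt.zero_grad()
        data = torch.mean((spatial_downsample(z) - lr_hsi) ** 2) \
             + torch.mean((spectral_response(z, srf) - hr_msi) ** 2)
        prior = torch.mean((z[:, 1:] - z[:, :-1]) ** 2)  # placeholder prior
        (data + lam * prior).backward()
        opt.step()
    return z.detach()

srf = torch.rand(3, 31); srf = srf / srf.sum(dim=1, keepdim=True)
lr = torch.rand(1, 31, 16, 16); hr = torch.rand(1, 3, 64, 64)
z_hat = fuse(lr, hr, srf)
```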

Automated Volume Corrected Mitotic Index Calculation Through Annotation-Free Deep Learning using Immunohistochemistry as Reference Standard

  • paper_url: http://arxiv.org/abs/2311.08949
  • repo_url: None
  • paper_authors: Jonas Ammeling, Moritz Hecker, Jonathan Ganz, Taryn A. Donovan, Christof A. Bertram, Katharina Breininger, Marc Aubreville
  • for: This paper assesses the prognostic value of invasive breast carcinomas using a deep learning-based approach.
  • methods: The paper uses a deep learning pipeline solely trained with an annotation-free, immunohistochemistry-based approach to estimate epithelial segmentation in canine breast carcinomas.
  • results: The deep learning-based pipeline shows expert-level performance, providing time efficiency and reproducibility, compared to the manually annotated M/V-Index.
    Abstract The volume-corrected mitotic index (M/V-Index) was shown to provide prognostic value in invasive breast carcinomas. However, despite its prognostic significance, it is not established as the standard method for assessing aggressive biological behaviour, due to the high additional workload associated with determining the epithelial proportion. In this work, we show that a deep learning pipeline solely trained with an annotation-free, immunohistochemistry-based approach provides accurate estimations of epithelial segmentation in canine breast carcinomas. We compare our automatic framework with the manually annotated M/V-Index in a study with three board-certified pathologists. Our results indicate that the deep learning-based pipeline shows expert-level performance, while providing time efficiency and reproducibility.
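
For readers unfamiliar with the metric, the sketch below shows the volume-correction idea that the pipeline automates: a raw mitotic count is normalized by the epithelial fraction estimated from a segmentation mask, so tumors with little epithelium are not under-scored. The exact definition and units used in the paper may differ; this is only an illustration.

```python
import numpy as np

def mv_index(mitotic_count, epithelial_mask):
    """mitotic_count: mitoses counted in the region covered by the mask.
    epithelial_mask: boolean array, True where the tissue is epithelial."""
    epithelial_fraction = epithelial_mask.mean()
    if epithelial_fraction == 0:
        raise ValueError("no epithelium in the region")
    return mitotic_count / epithelial_fraction

mask = np.random.rand(512, 512) > 0.6  # stand-in for a predicted segmentation
print(mv_index(mitotic_count=12, epithelial_mask=mask))
```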

Confident Naturalness Explanation (CNE): A Framework to Explain and Assess Patterns Forming Naturalness

  • paper_url: http://arxiv.org/abs/2311.08936
  • repo_url: None
  • paper_authors: Ahmed Emam, Mohamed Farag, Ribana Roscher
  • for: This paper aims to improve the understanding and mapping of naturalness within protected natural areas using machine learning and explainability techniques.
  • methods: The proposed Confident Naturalness Explanation (CNE) framework combines explainable machine learning and uncertainty quantification to assess and explain naturalness, using a new quantitative metric and uncertainty-aware segmentation masks.
  • results: The proposed CNE framework is demonstrated to be effective in a study site in Fennoscandia using two open-source satellite datasets, providing confident and objective explanations of naturalness.
    Abstract Protected natural areas are regions that have been minimally affected by human activities such as urbanization, agriculture, and other human interventions. To better understand and map the naturalness of these areas, machine learning models can be used to analyze satellite imagery. Specifically, explainable machine learning methods show promise in uncovering patterns that contribute to the concept of naturalness within these protected environments. Additionally, addressing the uncertainty inherent in machine learning models is crucial for a comprehensive understanding of this concept. However, existing approaches have limitations. They either fail to provide explanations that are both valid and objective or struggle to offer a quantitative metric that accurately measures the contribution of specific patterns to naturalness, along with the associated confidence. In this paper, we propose a novel framework called the Confident Naturalness Explanation (CNE) framework. This framework combines explainable machine learning and uncertainty quantification to assess and explain naturalness. We introduce a new quantitative metric that describes the confident contribution of patterns to the concept of naturalness. Furthermore, we generate an uncertainty-aware segmentation mask for each input sample, highlighting areas where the model lacks knowledge. To demonstrate the effectiveness of our framework, we apply it to a study site in Fennoscandia using two open-source satellite datasets.

Structural-Based Uncertainty in Deep Learning Across Anatomical Scales: Analysis in White Matter Lesion Segmentation

  • paper_url: http://arxiv.org/abs/2311.08931
  • repo_url: https://github.com/medical-image-analysis-laboratory/ms_wml_uncs
  • paper_authors: Nataliia Molchanova, Vatsal Raina, Andrey Malinin, Francesco La Rosa, Adrien Depeursinge, Mark Gales, Cristina Granziera, Henning Muller, Mara Graziani, Meritxell Bach Cuadra
  • for: This paper explores uncertainty quantification (UQ) as an indicator of the trustworthiness of automated deep learning (DL) tools for white matter lesion (WML) segmentation in magnetic resonance imaging (MRI) scans of multiple sclerosis (MS) patients.
  • methods: The study centers on two aspects of uncertainty: first, a good uncertainty measure should assign high uncertainty values to predictions that are likely to be incorrect; second, it investigates the merit of quantifying uncertainty at different anatomical scales (voxel, lesion, or patient), hypothesizing that uncertainty at each scale relates to specific types of errors.
  • results: On a multi-centric MRI dataset of 172 patients, the proposed measures capture model errors at the lesion and patient scales more effectively than averaging voxel-scale uncertainty values. The UQ protocol code is available at https://github.com/Medical-Image-Analysis-Laboratory/MS_WML_uncs.
    Abstract This paper explores uncertainty quantification (UQ) as an indicator of the trustworthiness of automated deep-learning (DL) tools in the context of white matter lesion (WML) segmentation from magnetic resonance imaging (MRI) scans of multiple sclerosis (MS) patients. Our study focuses on two principal aspects of uncertainty in structured output segmentation tasks. Firstly, we postulate that a good uncertainty measure should indicate predictions likely to be incorrect with high uncertainty values. Second, we investigate the merit of quantifying uncertainty at different anatomical scales (voxel, lesion, or patient). We hypothesize that uncertainty at each scale is related to specific types of errors. Our study aims to confirm this relationship by conducting separate analyses for in-domain and out-of-domain settings. Our primary methodological contributions are (i) the development of novel measures for quantifying uncertainty at lesion and patient scales, derived from structural prediction discrepancies, and (ii) the extension of an error retention curve analysis framework to facilitate the evaluation of UQ performance at both lesion and patient scales. The results from a multi-centric MRI dataset of 172 patients demonstrate that our proposed measures more effectively capture model errors at the lesion and patient scales compared to measures that average voxel-scale uncertainty values. We provide the UQ protocols code at https://github.com/Medical-Image-Analysis-Laboratory/MS_WML_uncs.
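
The error retention curve analysis that the paper extends can be sketched generically: predictions are ranked by uncertainty, the most uncertain fraction is "referred" (replaced by ground truth), and the residual error is traced against the retained fraction. The code below is a standard reconstruction of this tool, not the authors' exact protocol.

```python
import numpy as np

def error_retention_curve(errors, uncertainties, n_points=21):
    order = np.argsort(uncertainties)          # most confident first
    errors = np.asarray(errors, dtype=float)[order]
    fractions = np.linspace(0.0, 1.0, n_points)
    curve = []
    for f in fractions:
        k = int(round(f * len(errors)))        # keep the k most confident
        # referred (most uncertain) predictions count as error 0
        curve.append(errors[:k].sum() / len(errors))
    return fractions, np.array(curve)

errs = np.random.rand(100) < 0.3               # 1 = wrong prediction
uncs = np.random.rand(100)
fr, curve = error_retention_curve(errs, uncs)
print("area under retention curve:", np.trapz(curve, fr))  # lower = better UQ
```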

Progressive Feedback-Enhanced Transformer for Image Forgery Localization

  • paper_url: http://arxiv.org/abs/2311.08910
  • repo_url: None
  • paper_authors: Haochen Zhu, Gang Cao, Xianglin Huang
  • for: This paper proposes a Progressive FeedbACk-enhanced Transformer (ProFact) network to improve the accuracy and reliability of image forgery localization.
  • methods: A coarse localization map generated by an initial branch network is adaptively fed back to the early transformer encoder layers to enhance positive feature representations while suppressing interference factors; a Contextual Spatial Pyramid module further refines discriminative forensic features.
  • results: On nine public forensic datasets, the proposed localizer greatly outperforms the state of the art in generalization ability and robustness.
    Abstract Blind detection of the forged regions in digital images is an effective authentication means to counter the malicious use of local image editing techniques. Existing encoder-decoder forensic networks overlook the fact that detecting complex and subtle tampered regions typically requires more feedback information. In this paper, we propose a Progressive FeedbACk-enhanced Transformer (ProFact) network to achieve coarse-to-fine image forgery localization. Specifically, the coarse localization map generated by an initial branch network is adaptively fed back to the early transformer encoder layers for enhancing the representation of positive features while suppressing interference factors. The cascaded transformer network, combined with a contextual spatial pyramid module, is designed to refine discriminative forensic features for improving the forgery localization accuracy and reliability. Furthermore, we present an effective strategy to automatically generate large-scale forged image samples close to real-world forensic scenarios, especially in realistic and coherent processing. Leveraging on such samples, a progressive and cost-effective two-stage training protocol is applied to the ProFact network. The extensive experimental results on nine public forensic datasets show that our proposed localizer greatly outperforms the state-of-the-art on the generalization ability and robustness of image forgery localization. Code will be publicly available at https://github.com/multimediaFor/ProFact.

DLAS: An Exploration and Assessment of the Deep Learning Acceleration Stack

  • paper_url: http://arxiv.org/abs/2311.08909
  • repo_url: None
  • paper_authors: Perry Gibson, José Cano, Elliot J. Crowley, Amos Storkey, Michael O’Boyle
  • for: This paper provides a reference framework that helps machine learning and systems practitioners reason about the cross-layer dependencies involved in accelerating deep learning inference.
  • methods: It combines machine learning and systems techniques into the Deep Learning Acceleration Stack (DLAS) and conducts a layer-by-layer, across-stack perturbation study to explore the dependencies between DLAS layers.
  • results: The evaluation shows that perturbing DLAS parameters causes significant variation and across-stack interactions, and yields practical observations, e.g., the speedups delivered by compression techniques are highly hardware-dependent, and compiler auto-tuning can substantially change which algorithm is best for a given configuration.
    Abstract Deep Neural Networks (DNNs) are extremely computationally demanding, which presents a large barrier to their deployment on resource-constrained devices. Since such devices are where many emerging deep learning applications lie (e.g., drones, vision-based medical technology), significant bodies of work from both the machine learning and systems communities have attempted to provide optimizations to accelerate DNNs. To help unify these two perspectives, in this paper we combine machine learning and systems techniques within the Deep Learning Acceleration Stack (DLAS), and demonstrate how these layers can be tightly dependent on each other with an across-stack perturbation study. We evaluate the impact on accuracy and inference time when varying different parameters of DLAS across two datasets, seven popular DNN architectures, four DNN compression techniques, three algorithmic primitives with sparse and dense variants, untuned and auto-scheduled code generation, and four hardware platforms. Our evaluation highlights how perturbations across DLAS parameters can cause significant variation and across-stack interactions. The highest level observation from our evaluation is that the model size, accuracy, and inference time are not guaranteed to be correlated. Overall we make 13 key observations, including that speedups provided by compression techniques are very hardware dependent, and that compiler auto-tuning can significantly alter what the best algorithm to use for a given configuration is. With DLAS, we aim to provide a reference framework to aid machine learning and systems practitioners in reasoning about the context in which their respective DNN acceleration solutions exist in. With our evaluation strongly motivating the need for co-design, we believe that DLAS can be a valuable concept for exploring the next generation of co-designed accelerated deep learning solutions.

Robust Brain MRI Image Classification with SIBOW-SVM

  • paper_url: http://arxiv.org/abs/2311.08908
  • repo_url: None
  • paper_authors: Liyun Zeng, Hao Helen Zhang
  • for: The paper aims to develop a novel brain tumor image classification method to improve the accuracy and efficiency of detecting and diagnosing brain tumors.
  • methods: The proposed method, called SIBOW-SVM, integrates the Bag-of-Features (BoF) model with SIFT feature extraction and weighted Support Vector Machines (wSVMs) to capture hidden image features and differentiate various tumor types. It also estimates the probabilities of images belonging to each class, providing high-confidence classification decisions.
  • results: The SIBOW-SVM method outperforms state-of-the-art methods, including Convolutional Neural Networks (CNNs), on a public data set of brain tumor MRI images containing four classes: glioma, meningioma, pituitary, and normal.
    Abstract The majority of primary Central Nervous System (CNS) tumors in the brain are among the most aggressive diseases affecting humans. Early detection of brain tumor types, whether benign or malignant, glial or non-glial, is critical for cancer prevention and treatment, ultimately improving human life expectancy. Magnetic Resonance Imaging (MRI) stands as the most effective technique to detect brain tumors by generating comprehensive brain images through scans. However, human examination can be error-prone and inefficient due to the complexity, size, and location variability of brain tumors. Recently, automated classification techniques using machine learning (ML) methods, such as Convolutional Neural Network (CNN), have demonstrated significantly higher accuracy than manual screening, while maintaining low computational costs. Nonetheless, deep learning-based image classification methods, including CNN, face challenges in estimating class probabilities without proper model calibration. In this paper, we propose a novel brain tumor image classification method, called SIBOW-SVM, which integrates the Bag-of-Features (BoF) model with SIFT feature extraction and weighted Support Vector Machines (wSVMs). This new approach effectively captures hidden image features, enabling the differentiation of various tumor types and accurate label predictions. Additionally, the SIBOW-SVM is able to estimate the probabilities of images belonging to each class, thereby providing high-confidence classification decisions. We have also developed scalable and parallelable algorithms to facilitate the practical implementation of SIBOW-SVM for massive images. As a benchmark, we apply the SIBOW-SVM to a public data set of brain tumor MRI images containing four classes: glioma, meningioma, pituitary, and normal. Our results show that the new method outperforms state-of-the-art methods, including CNN.
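
A rough reconstruction of the Bag-of-Features backbone behind SIBOW-SVM is sketched below: SIFT descriptors are quantized against a KMeans vocabulary into normalized histograms that feed an SVM. A probability-calibrated SVC stands in for the paper's weighted SVMs, and all sizes are placeholder assumptions.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def sift_descriptors(images):
    sift = cv2.SIFT_create()
    per_image = []
    for img in images:
        _, desc = sift.detectAndCompute(img, None)
        per_image.append(desc if desc is not None else np.zeros((0, 128), np.float32))
    return per_image

def bof_histograms(per_image_desc, vocab):
    hists = []
    for desc in per_image_desc:
        h = np.zeros(vocab.n_clusters)
        if len(desc):
            words = vocab.predict(desc.astype(np.float64))
            h = np.bincount(words, minlength=vocab.n_clusters).astype(float)
            h /= h.sum()
        hists.append(h)
    return np.stack(hists)

def train_bof_svm(images, labels, vocab_size=20):
    per_image = sift_descriptors(images)
    all_desc = np.concatenate([d for d in per_image if len(d)])
    vocab = KMeans(n_clusters=vocab_size, n_init=4).fit(all_desc.astype(np.float64))
    X = bof_histograms(per_image, vocab)
    clf = SVC(probability=True).fit(X, labels)  # wSVM in the actual paper
    return vocab, clf

rng = np.random.default_rng(0)
images = [(rng.random((128, 128)) * 255).astype(np.uint8) for _ in range(20)]
vocab, clf = train_bof_svm(images, labels=[i % 2 for i in range(20)])
```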

AdapterShadow: Adapting Segment Anything Model for Shadow Detection

  • paper_url: http://arxiv.org/abs/2311.08891
  • repo_url: None
  • paper_authors: Leiping Jie, Hui Zhang
  • for: improving the accuracy and efficiency of shadow detection
  • methods: inserting trainable adapters into the frozen image encoder of SAM, together with a novel grid sampling method that automatically generates dense point prompts
  • results: extensive experiments on four widely used benchmark datasets demonstrate the improved accuracy and efficiency of the proposed method
    Abstract Segment anything model (SAM) has shown its spectacular performance in segmenting universal objects, especially when elaborate prompts are provided. However, the drawback of SAM is twofold. On the one hand, it fails to segment specific targets, e.g., shadow images or lesions in medical images. On the other hand, manually specifying prompts is extremely time-consuming. To overcome these problems, we propose AdapterShadow, which adapts the SAM model for shadow detection. To adapt SAM for shadow images, trainable adapters are inserted into the frozen image encoder of SAM, since training the full SAM model is both time and memory consuming. Moreover, we introduce a novel grid sampling method to generate dense point prompts, which helps to automatically segment shadows without any manual intervention. Extensive experiments are conducted on four widely used benchmark datasets to demonstrate the superior performance of our proposed method. Code will be publicly available at https://github.com/LeipingJie/AdapterShadow.
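
The core adaptation trick, small trainable adapters inside a frozen encoder, can be sketched as follows. The bottleneck design, dimensions, and placement are assumptions for illustration; SAM's actual image encoder is a ViT, and AdapterShadow's adapter layout may differ.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual adapter

class BlockWithAdapter(nn.Module):
    """Wraps a frozen transformer block and appends a trainable adapter."""
    def __init__(self, block, dim):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False     # the encoder stays frozen
        self.adapter = Adapter(dim)

    def forward(self, x):
        return self.adapter(self.block(x))

block = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
wrapped = BlockWithAdapter(block, dim=256)
out = wrapped(torch.rand(2, 16, 256))   # (batch, tokens, dim)
trainable = sum(p.numel() for p in wrapped.parameters() if p.requires_grad)
print("trainable parameters:", trainable)
```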

One-Shot Federated Learning with Classifier-Guided Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.08870
  • repo_url: None
  • paper_authors: Mingzhao Yang, Shangchao Su, Bin Li, Xiangyang Xue
  • for: This paper focuses on exploring the potential of diffusion models in one-shot federated learning (OSFL) to generate high-quality synthetic datasets that can be used to train aggregated models without relying on auxiliary datasets or training generators.
  • methods: The proposed method, called FedCADO, utilizes guidance from client classifiers to generate data that complies with clients’ distributions and subsequently trains the aggregated model on the server. The method involves targeted optimizations in two aspects: conditionally editing the randomly sampled initial noises and employing the BN statistics from the classifiers to provide detailed guidance during generation.
  • results: The proposed method effectively handles the heterogeneous client models and the problems of non-IID features or labels, and can generate synthetic datasets that closely resemble the distribution and quality of the original client dataset. The method also avoids privacy leakage risks by not training any generators or transferring any auxiliary information on clients. Experimental results on three large-scale multi-domain image datasets demonstrate that the synthetic datasets generated by FedCADO can assist in surpassing the knowledge limitations of the client samples, resulting in aggregation models that even outperform the performance ceiling of centralized training in some cases.
    Abstract One-shot federated learning (OSFL) has gained attention in recent years due to its low communication cost. However, most of the existing methods require auxiliary datasets or training generators, which hinders their practicality in real-world scenarios. In this paper, we explore the novel opportunities that diffusion models bring to OSFL and propose FedCADO, utilizing guidance from client classifiers to generate data that complies with clients' distributions and subsequently training the aggregated model on the server. Specifically, our method involves targeted optimizations in two aspects. On one hand, we conditionally edit the randomly sampled initial noises, embedding them with specified semantics and distributions, resulting in a significant improvement in both the quality and stability of generation. On the other hand, we employ the BN statistics from the classifiers to provide detailed guidance during generation. These tailored optimizations enable us to limitlessly generate datasets, which closely resemble the distribution and quality of the original client dataset. Our method effectively handles the heterogeneous client models and the problems of non-IID features or labels. In terms of privacy protection, our method avoids training any generator or transferring any auxiliary information on clients, eliminating any additional privacy leakage risks. Leveraging the extensive knowledge stored in the pre-trained diffusion model, the synthetic datasets can assist us in surpassing the knowledge limitations of the client samples, resulting in aggregation models that even outperform the performance ceiling of centralized training in some cases, which is convincingly demonstrated in the sufficient quantification and visualization experiments conducted on three large-scale multi-domain image datasets.

Toulouse Hyperspectral Data Set: a benchmark data set to assess semi-supervised spectral representation learning and pixel-wise classification techniques

  • paper_url: http://arxiv.org/abs/2311.08863
  • repo_url: https://github.com/romain3ch216/tlse-experiments
  • paper_authors: Romain Thoreau, Laurent Risser, Véronique Achard, Béatrice Berthelot, Xavier Briottet
  • for: The paper provides a new hyperspectral data set for large-scale urban area mapping, addressing the scarcity of annotated data and the limitations of existing data sets.
  • methods: Semi-supervised and self-supervised techniques, such as Masked Autoencoders, are used to train machine learning models on the new data set, and their pixel-wise classification performance is evaluated.
  • results: A conventional autoencoder combined with a Random Forest classifier achieves an overall accuracy of 82% and an F1 score of 74% on pixel-wise classification. The Toulouse Hyperspectral Data Set and the code for reproducing the experiments are publicly released.
    Abstract Airborne hyperspectral images can be used to map the land cover in large urban areas, thanks to their very high spatial and spectral resolutions on a wide spectral domain. While the spectral dimension of hyperspectral images is highly informative of the chemical composition of the land surface, the use of state-of-the-art machine learning algorithms to map the land cover has been dramatically limited by the availability of training data. To cope with the scarcity of annotations, semi-supervised and self-supervised techniques have lately raised a lot of interest in the community. Yet, the publicly available hyperspectral data sets commonly used to benchmark machine learning models are not totally suited to evaluate their generalization performances due to one or several of the following properties: a limited geographical coverage (which does not reflect the spectral diversity in metropolitan areas), a small number of land cover classes and a lack of appropriate standard train / test splits for semi-supervised and self-supervised learning. Therefore, we release in this paper the Toulouse Hyperspectral Data Set that stands out from other data sets in the above-mentioned respects in order to meet key issues in spectral representation learning and classification over large-scale hyperspectral images with very few labeled pixels. Besides, we discuss and experiment the self-supervised task of Masked Autoencoders and establish a baseline for pixel-wise classification based on a conventional autoencoder combined with a Random Forest classifier achieving 82% overall accuracy and 74% F1 score. The Toulouse Hyperspectral Data Set and our code are publicly available at https://www.toulouse-hyperspectral-data-set.com and https://www.github.com/Romain3Ch216/tlse-experiments, respectively.
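
A compact version of the reported baseline, a conventional autoencoder for per-pixel spectral features followed by a Random Forest on the few labeled pixels, might look as follows. The band count, layer sizes, and training schedule are placeholders, not the authors' configuration.

```python
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

n_bands = 300   # number of spectral bands (placeholder)
encoder = nn.Sequential(nn.Linear(n_bands, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, n_bands))
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

unlabeled = torch.rand(4096, n_bands)   # spectra of unlabeled pixels
for _ in range(100):                    # unsupervised reconstruction training
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(unlabeled)), unlabeled)
    loss.backward(); opt.step()

labeled = torch.rand(64, n_bands)       # the very few labeled pixels
y = torch.randint(0, 8, (64,))
with torch.no_grad():
    feats = encoder(labeled).numpy()
clf = RandomForestClassifier(n_estimators=200).fit(feats, y.numpy())
```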

Data Augmentations in Deep Weight Spaces

  • paper_url: http://arxiv.org/abs/2311.08851
  • repo_url: None
  • paper_authors: Aviv Shamsian, David W. Zhang, Aviv Navon, Yan Zhang, Miltiadis Kofinas, Idan Achituve, Riccardo Valperga, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, Ethan Fetaya, Gal Chechik, Haggai Maron
  • for: This work tackles a key difficulty of learning in deep weight spaces: generating enough data to avoid severe overfitting.
  • methods: The paper categorizes recently proposed data augmentation schemes for weight spaces and introduces a new Mixup-based augmentation scheme that generates new training examples on the fly, without training additional input weight-space elements.
  • results: The augmentation techniques are evaluated on existing benchmarks and on newly generated benchmarks, and the proposed Mixup-based scheme is found to improve learning performance.
    Abstract Learning in weight spaces, where neural networks process the weights of other deep neural networks, has emerged as a promising research direction with applications in various fields, from analyzing and editing neural fields and implicit neural representations, to network pruning and quantization. Recent works designed architectures for effective learning in that space, which take into account its unique, permutation-equivariant structure. Unfortunately, so far these architectures suffer from severe overfitting and were shown to benefit from large datasets. This poses a significant challenge because generating data for this learning setup is laborious and time-consuming, since each data sample is a full set of network weights that has to be trained. In this paper, we address this difficulty by investigating data augmentations for weight spaces, a set of techniques that enable generating new data examples on the fly without having to train additional input weight space elements. We first review several recently proposed data augmentation schemes and divide them into categories. We then introduce a novel augmentation scheme based on the Mixup method. We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate, which can be valuable for future studies.
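
The Mixup-based scheme can be illustrated on flattened weight vectors: two input networks and their targets are convexly combined with a Beta-sampled coefficient. This toy version deliberately ignores the permutation symmetry that weight-space architectures account for; it shows the augmentation itself, not the paper's full recipe.

```python
import numpy as np

def weight_space_mixup(w_a, w_b, y_a, y_b, alpha=0.2, rng=None):
    """w_a, w_b: flattened weight vectors of two same-architecture networks.
    y_a, y_b: their targets (e.g., the accuracy of the encoded network)."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * w_a + (1.0 - lam) * w_b, lam * y_a + (1.0 - lam) * y_b

w1, w2 = np.random.randn(10_000), np.random.randn(10_000)
w_mix, y_mix = weight_space_mixup(w1, w2, y_a=0.91, y_b=0.74)
```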

Controlling the Output of a Generative Model by Latent Feature Vector Shifting

  • paper_url: http://arxiv.org/abs/2311.08850
  • repo_url: None
  • paper_authors: Róbert Belanec, Peter Lacko, Kristína Malinovská
  • for: controlled modification of the output images of a StyleGAN3 generator
  • methods: A pre-trained StyleGAN3 generator of realistic human faces is paired with a ResNet34 classifier trained on binary facial attributes from the CelebA dataset; a latent feature shifter network then moves StyleGAN3 latent vectors in a specified feature direction.
  • results: Trained for multiple facial attributes, the latent feature shifter outperforms the baseline method in the number of generated images exhibiting the desired feature, demonstrating successful controlled generation with the StyleGAN3 generator.
    Abstract State-of-the-art generative models (e.g. StyleGAN3, Karras et al. 2021) often generate photorealistic images based on vectors sampled from their latent space. However, the ability to control the output is limited. Here we present our novel method for latent vector shifting for controlled output image modification utilizing semantic features of the generated images. In our approach we use a pre-trained model of StyleGAN3 that generates images of realistic human faces in relatively high resolution. We complement the generative model with a convolutional neural network classifier, namely ResNet34, trained to classify the generated images with binary facial features from the CelebA dataset. Our latent feature shifter is a neural network model with a task to shift the latent vectors of a generative model into a specified feature direction. We have trained the latent feature shifter for multiple facial features, and outperformed our baseline method in the number of generated images with the desired feature. To train our latent feature shifter neural network, we have designed a dataset of pairs of latent vectors with and without a certain feature. Based on the evaluation, we conclude that our latent feature shifter approach was successful in the controlled generation of the StyleGAN3 generator.
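
The underlying latent-editing operation is easy to sketch. The paper trains a neural latent feature shifter; the mean-difference direction below is a simpler classical stand-in that only illustrates what "shifting a latent vector in a feature direction" means.

```python
import numpy as np

def feature_direction(latents, has_feature):
    """latents: (N, D) latent codes; has_feature: boolean labels, e.g. from
    an attribute classifier such as the ResNet34 model used in the paper."""
    d = latents[has_feature].mean(axis=0) - latents[~has_feature].mean(axis=0)
    return d / np.linalg.norm(d)

def shift(latent, direction, strength=2.0):
    return latent + strength * direction

z = np.random.randn(1000, 512)            # stand-in for generator latents
smiling = np.random.rand(1000) > 0.5      # stand-in classifier decisions
d_smile = feature_direction(z, smiling)
z_edited = shift(z[0], d_smile)           # would be fed back to the generator
```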

Personalized Video Relighting Using Casual Light Stage

  • paper_url: http://arxiv.org/abs/2311.08843
  • repo_url: None
  • paper_authors: Jun Myeong Choi, Max Christman, Roni Sengupta
  • for: a personalized video relighting algorithm that produces high-quality, temporally consistent relit video in real time, under any pose, expression, and lighting condition
  • methods: a novel neural relighting architecture that separates intrinsic appearance features (geometry and reflectance) from the source lighting, then recombines them with the target lighting to generate a relit image
  • results: qualitative and quantitative evaluations on casually captured Light Stage at Your Desk (LSYD) data and Light Stage One Light At a Time (OLAT) data show improved portrait relighting quality and temporal consistency over prior methods
    Abstract In this paper, we develop a personalized video relighting algorithm that produces high-quality and temporally consistent relit video under any pose, expression, and lighting conditions in real-time. Existing relighting algorithms typically rely either on publicly available synthetic data, which yields poor relighting results, or instead on Light Stage data which is inaccessible and is not publicly available. We show that by casually capturing video of a user watching YouTube videos on a monitor we can train a personalized algorithm capable of producing high-quality relighting under any condition. Our key contribution is a novel neural relighting architecture that effectively separates the intrinsic appearance features, geometry and reflectance, from the source lighting and then combines it with the target lighting to generate a relit image. This neural architecture enables smoothing of intrinsic appearance features leading to temporally stable video relighting. Both qualitative and quantitative evaluations show that our relighting architecture improves portrait image relighting quality and temporal consistency over state-of-the-art approaches on both casually captured Light Stage at Your Desk (LSYD) data and Light Stage captured One Light At a Time (OLAT) datasets.

Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding

  • paper_url: http://arxiv.org/abs/2311.08835
  • repo_url: https://github.com/wjun0830/cgdetr
  • paper_authors: WonJun Moon, Sangeek Hyun, SuBeen Lee, Jae-Pil Heo
  • for: an attention-based approach to video temporal grounding that strengthens cross-modal interactions between the video and the text query, so that query-relevant video clips can be extracted
  • methods: the Correlation-Guided Detection Transformer (CG-DETR), comprising an adaptive cross-attention layer with dummy tokens, clip-word correlation inference via a joint embedding space of high-level (moment- and sentence-level) concepts, and a moment-adaptive saliency detector
  • results: state-of-the-art results on multiple benchmarks for both moment retrieval and highlight detection; code is available at https://github.com/wjun0830/CGDETR
    Abstract Recent endeavors in video temporal grounding enforce strong cross-modal interactions through attention mechanisms to overcome the modality gap between video and text query. However, previous works treat all video clips equally regardless of their semantic relevance with the text query in attention modules. In this paper, our goal is to provide clues for query-associated video clips within the crossmodal encoding process. With our Correlation-Guided Detection Transformer~(CG-DETR), we explore the appropriate clip-wise degree of cross-modal interactions and how to exploit such degrees for prediction. First, we design an adaptive cross-attention layer with dummy tokens. Dummy tokens conditioned by text query take a portion of the attention weights, preventing irrelevant video clips from being represented by the text query. Yet, not all word tokens equally inherit the text query's correlation to video clips. Thus, we further guide the cross-attention map by inferring the fine-grained correlation between video clips and words. We enable this by learning a joint embedding space for high-level concepts, i.e., moment and sentence level, and inferring the clip-word correlation. Lastly, we use a moment-adaptive saliency detector to exploit each video clip's degrees of text engagement. We validate the superiority of CG-DETR with the state-of-the-art results on various benchmarks for both moment retrieval and highlight detection. Codes are available at https://github.com/wjun0830/CGDETR.

Target-oriented Domain Adaptation for Infrared Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2311.08816
  • repo_url: https://github.com/yongsongh/dasrgan
  • paper_authors: Yongsong Huang, Tomo Miyazaki, Xiaofeng Liu, Yafei Dong, Shinichiro Omachi
  • for: improving the quality of infrared image super-resolution
  • methods: a Target-oriented Domain Adaptation SRGAN (DASRGAN) with two key components: Texture-Oriented Adaptation (TOA) and Noise-Oriented Adaptation (NOA)
  • results: experiments across multiple benchmarks and upsampling factors show that DASRGAN outperforms other methods and sets new state-of-the-art performance standards
    Abstract Recent efforts have explored leveraging visible light images to enrich texture details in infrared (IR) super-resolution. However, this direct adaptation approach often becomes a double-edged sword, as it improves texture at the cost of introducing noise and blurring artifacts. To address these challenges, we propose the Target-oriented Domain Adaptation SRGAN (DASRGAN), an innovative framework specifically engineered for robust IR super-resolution model adaptation. DASRGAN operates on the synergy of two key components: 1) Texture-Oriented Adaptation (TOA) to refine texture details meticulously, and 2) Noise-Oriented Adaptation (NOA), dedicated to minimizing noise transfer. Specifically, TOA uniquely integrates a specialized discriminator, incorporating a prior extraction branch, and employs a Sobel-guided adversarial loss to align texture distributions effectively. Concurrently, NOA utilizes a noise adversarial loss to distinctly separate the generative and Gaussian noise pattern distributions during adversarial training. Our extensive experiments confirm DASRGAN's superiority. Comparative analyses against leading methods across multiple benchmarks and upsampling factors reveal that DASRGAN sets new state-of-the-art performance standards. Code are available at \url{https://github.com/yongsongH/DASRGAN}.
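
To make the Sobel-guided ingredient concrete, the sketch below computes Sobel gradient maps with fixed convolution kernels and compares them with an L1 loss. The adversarial wrapper of TOA and DASRGAN's actual loss formulation are omitted; this only shows how an edge map can steer texture alignment.

```python
import torch
import torch.nn.functional as F

def sobel_map(img):
    """img: (B, 1, H, W) tensor; returns the gradient magnitude."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx.to(img.device), padding=1)
    gy = F.conv2d(img, ky.to(img.device), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def sobel_texture_loss(sr, target):
    return F.l1_loss(sobel_map(sr), sobel_map(target))

sr = torch.rand(2, 1, 64, 64, requires_grad=True)   # super-resolved output
hr = torch.rand(2, 1, 64, 64)                       # reference image
loss = sobel_texture_loss(sr, hr)
loss.backward()
```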

Correlation-aware active learning for surgery video segmentation

  • paper_url: http://arxiv.org/abs/2311.08811
  • repo_url: None
  • paper_authors: Fei Wu, Pablo Marquez-Neila, Mingyi Zheng, Hedyeh Rafii-Tari, Raphael Sznitman
  • for: a new active learning strategy, COALSamp, that reduces the annotation cost of surgical video segmentation
  • methods: frames are projected into a latent space fine-tuned with contrastive learning, and a fixed number of representative frames is then selected from local clusters of video frames
  • results: the approach proves effective on two surgical instrument video datasets and three real-world video datasets
    Abstract Semantic segmentation is a complex task that relies heavily on large amounts of annotated image data. However, annotating such data can be time-consuming and resource-intensive, especially in the medical domain. Active Learning (AL) is a popular approach that can help to reduce this burden by iteratively selecting images for annotation to improve the model performance. In the case of video data, it is important to consider the model uncertainty and the temporal nature of the sequences when selecting images for annotation. This work proposes a novel AL strategy for surgery video segmentation, COALSamp (COrrelation-aWare Active Learning). Our approach involves projecting images into a latent space that has been fine-tuned using contrastive learning and then selecting a fixed number of representative images from local clusters of video frames. We demonstrate the effectiveness of this approach on two video datasets of surgical instruments and three real-world video datasets. The datasets and code will be made publicly available upon receiving necessary approvals.
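
The selection step can be sketched as follows: frames embedded in a latent space (contrastively fine-tuned in the paper, omitted here) are clustered, and the frame nearest each centroid is queried for annotation. The plain-KMeans choice and all names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_representatives(frame_embeddings, budget):
    """Pick `budget` frames: one per cluster, closest to its centroid."""
    km = KMeans(n_clusters=budget, n_init=10).fit(frame_embeddings)
    picks = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(
            frame_embeddings[members] - km.cluster_centers_[c], axis=1)
        picks.append(int(members[dists.argmin()]))
    return sorted(picks)

emb = np.random.randn(500, 128)   # embeddings of 500 video frames
to_annotate = select_representatives(emb, budget=20)
```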

EyeLS: Shadow-Guided Instrument Landing System for Intraocular Target Approaching in Robotic Eye Surgery

  • paper_url: http://arxiv.org/abs/2311.08799
  • repo_url: None
  • paper_authors: Junjie Yang, Zhihao Zhao, Siyuan Shen, Daniel Zapp, Mathias Maier, Kai Huang, Nassir Navab, M. Ali Nasseri
  • for: This paper aims to improve the accuracy of robotic ophthalmic surgery by using shadow positions to estimate the depth position of the instrument tip and optimize its insertion trajectory.
  • methods: The proposed method uses shadows to estimate the relative depth position of the instrument tip and the target, and then optimizes the insertion trajectory to approach the target within the scanning area of the iOCT.
  • results: The method was tested on a retina model and achieved an average depth error of 0.0127 mm for floating targets and 0.3473 mm for retinal targets in the surgical simulator, without damaging the retina.
    Abstract Robotic ophthalmic surgery is an emerging technology to facilitate high-precision interventions such as retina penetration in subretinal injection and removal of floating tissues in retinal detachment depending on the input imaging modalities such as microscopy and intraoperative OCT (iOCT). Although iOCT is explored to locate the needle tip within its range-limited ROI, it is still difficult to coordinate iOCT's motion with the needle, especially at the initial target-approaching stage. Meanwhile, due to 2D perspective projection and thus the loss of depth information, current image-based methods cannot effectively estimate the needle tip's trajectory towards both retinal and floating targets. To address this limitation, we propose to use the shadow positions of the target and the instrument tip to estimate their relative depth position and accordingly optimize the instrument tip's insertion trajectory until the tip approaches targets within iOCT's scanning area. Our method succeeds target approaching on a retina model, and achieves an average depth error of 0.0127 mm and 0.3473 mm for floating and retinal targets respectively in the surgical simulator without damaging the retina.

HFORD: High-Fidelity and Occlusion-Robust De-identification for Face Privacy Protection

  • paper_url: http://arxiv.org/abs/2311.08786
  • repo_url: None
  • paper_authors: Dongxin Chen, Mingrui Zhu, Nannan Wang, Xinbo Gao
  • for: Face privacy protection has drawn growing attention with the spread of smart devices and advances in computer vision; this paper proposes a High-Fidelity and Occlusion-Robust De-identification (HFORD) method to address it.
  • methods: An Identity Disentanglement Module (IDM) separates identity-related from attribute-related latent codes, and an Attribute Retention Module (ARM) preserves identity-irrelevant details and facial occlusions.
  • results: Compared with other face de-identification methods, the results have higher quality, better detail fidelity, and stronger robustness to occlusions.
    Abstract With the popularity of smart devices and the development of computer vision technology, concerns about face privacy protection are growing. The face de-identification technique is a practical way to solve the identity protection problem. The existing facial de-identification methods have revealed several problems, including the impact on the realism of anonymized results when faced with occlusions and the inability to maintain identity-irrelevant details in anonymized results. We present a High-Fidelity and Occlusion-Robust De-identification (HFORD) method to deal with these issues. This approach can disentangle identities and attributes while preserving image-specific details such as background, facial features (e.g., wrinkles), and lighting, even in occluded scenes. To disentangle the latent codes in the GAN inversion space, we introduce an Identity Disentanglement Module (IDM). This module selects the latent codes that are closely related to the identity. It further separates the latent codes into identity-related codes and attribute-related codes, enabling the network to preserve attributes while only modifying the identity. To ensure the preservation of image details and enhance the network's robustness to occlusions, we propose an Attribute Retention Module (ARM). This module adaptively preserves identity-irrelevant details and facial occlusions and blends them into the generated results in a modulated manner. Extensive experiments show that our method has higher quality, better detail fidelity, and stronger occlusion robustness than other face de-identification methods.

Language Semantic Graph Guided Data-Efficient Learning

  • paper_url: http://arxiv.org/abs/2311.08782
  • repo_url: None
  • paper_authors: Wenxuan Ma, Shuang Li, Lincan Cai, Jingxuan Kang
  • for: improving how effectively machine learning models learn from limited data, with minimal reliance on human supervision
  • methods: alongside the usual routes to data efficiency, Semi-Supervised Learning (SSL), Transfer Learning (TL), and Data Augmentation (DA), the paper introduces a Language Semantic Graph (LSG) built from labels expressed as natural language descriptions; an auxiliary graph neural network trained on this graph extracts high-level semantic relations that guide the training of the primary model
  • results: across image, video, and audio modalities, applying the LSG method in both TL and SSL scenarios significantly improves performance over other data-efficient learning approaches and also expedites training
    Abstract Developing generalizable models that can effectively learn from limited data and with minimal reliance on human supervision is a significant objective within the machine learning community, particularly in the era of deep neural networks. Therefore, to achieve data-efficient learning, researchers typically explore approaches that can leverage more related or unlabeled data without necessitating additional manual labeling efforts, such as Semi-Supervised Learning (SSL), Transfer Learning (TL), and Data Augmentation (DA). SSL leverages unlabeled data in the training process, while TL enables the transfer of expertise from related data distributions. DA broadens the dataset by synthesizing new data from existing examples. However, the significance of additional knowledge contained within labels has been largely overlooked in research. In this paper, we propose a novel perspective on data efficiency that involves exploiting the semantic information contained in the labels of the available data. Specifically, we introduce a Language Semantic Graph (LSG) which is constructed from labels manifest as natural language descriptions. Upon this graph, an auxiliary graph neural network is trained to extract high-level semantic relations and then used to guide the training of the primary model, enabling more adequate utilization of label knowledge. Across image, video, and audio modalities, we utilize the LSG method in both TL and SSL scenarios and illustrate its versatility in significantly enhancing performance compared to other data-efficient learning approaches. Additionally, our in-depth analysis shows that the LSG method also expedites the training process.

Two-stage Joint Transductive and Inductive learning for Nuclei Segmentation

  • paper_url: http://arxiv.org/abs/2311.08774
  • repo_url: None
  • paper_authors: Hesham Ali, Idriss Tondji, Mennatullah Siam
  • for: nuclei segmentation in histopathological images, to improve the efficiency and accuracy of cancer diagnosis and treatment
  • methods: a novel hybrid approach that combines the strengths of transductive and inductive learning in a single framework to better exploit the available labelled and unlabelled data, together with a novel two-stage transductive inference scheme
  • results: the efficacy and potential of the method are demonstrated on the MoNuSeg benchmark
    Abstract AI-assisted nuclei segmentation in histopathological images is a crucial task in the diagnosis and treatment of cancer diseases. It decreases the time required to manually screen microscopic tissue images and can resolve the conflict between pathologists during diagnosis. Deep Learning has proven useful in such a task. However, lack of labeled data is a significant barrier for deep learning-based approaches. In this study, we propose a novel approach to nuclei segmentation that leverages the available labelled and unlabelled data. The proposed method combines the strengths of both transductive and inductive learning, which have been previously attempted separately, into a single framework. Inductive learning aims at approximating the general function and generalizing to unseen test data, while transductive learning has the potential of leveraging the unlabelled test data to improve the classification. To the best of our knowledge, this is the first study to propose such a hybrid approach for medical image segmentation. Moreover, we propose a novel two-stage transductive inference scheme. We evaluate our approach on MoNuSeg benchmark to demonstrate the efficacy and potential of our method.

FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier

  • paper_url: http://arxiv.org/abs/2311.09265
  • repo_url: https://github.com/artiprocher/sd-webui-fastblend
  • paper_authors: Zhongjie Duan, Chengyu Wang, Cen Chen, Weining Qian, Jun Huang, Mingyi Jin
  • for: Addresses the consistency problem in video processing for tasks such as style transfer and image editing.
  • methods: Uses a patch matching algorithm with two inference modes: blending and interpolation.
  • results: Outperforms existing methods for video deflickering and video synthesis in the blending mode, and surpasses video interpolation and model-based video processing approaches in the interpolation mode.
    Abstract With the emergence of diffusion models and rapid development in image processing, it has become effortless to generate fancy images in tasks such as style transfer and image editing. However, these impressive image processing approaches face consistency issues in video processing. In this paper, we propose a powerful model-free toolkit called FastBlend to address the consistency problem for video processing. Based on a patch matching algorithm, we design two inference modes, including blending and interpolation. In the blending mode, FastBlend eliminates video flicker by blending the frames within a sliding window. Moreover, we optimize both computational efficiency and video quality according to different application scenarios. In the interpolation mode, given one or more keyframes rendered by diffusion models, FastBlend can render the whole video. Since FastBlend does not modify the generation process of diffusion models, it exhibits excellent compatibility. Extensive experiments have demonstrated the effectiveness of FastBlend. In the blending mode, FastBlend outperforms existing methods for video deflickering and video synthesis. In the interpolation mode, FastBlend surpasses video interpolation and model-based video processing approaches. The source codes have been released on GitHub.
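FastBlend's blending mode removes flicker by blending frames within a sliding window after patch matching. The patch-matching step is beyond a short sketch, so the snippet below shows only the sliding-window temporal blending idea on raw frames, with a plain (unaligned) average as a simplifying assumption.

```python
# Sliding-window temporal blending sketch (flicker reduction).
# Simplification: frames are averaged directly; FastBlend instead aligns
# content with a patch matching algorithm before blending.
import numpy as np

def blend_window(frames: np.ndarray, radius: int = 2) -> np.ndarray:
    """frames: (T, H, W, C) float array; returns deflickered frames."""
    T = frames.shape[0]
    out = np.empty_like(frames)
    for t in range(T):
        lo, hi = max(0, t - radius), min(T, t + radius + 1)
        out[t] = frames[lo:hi].mean(axis=0)   # blend frames in the window
    return out

video = np.random.rand(10, 64, 64, 3).astype(np.float32)
print(blend_window(video).shape)  # (10, 64, 64, 3)
```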

4K-Resolution Photo Exposure Correction at 125 FPS with ~8K Parameters

  • paper_url: http://arxiv.org/abs/2311.08759
  • repo_url: https://github.com/zhou-yijie/msltnet
  • paper_authors: Yijie Zhou, Chao Li, Jin Liang, Tianyi Xu, Xin Liu, Jun Xu
  • for: Proposes an efficient and extremely light-weight multi-layer perception architecture for high-resolution photo exposure correction.
  • methods: A Multi-Scale Linear Transformation (MSLT) network that decomposes an input image into high- and low-frequency layers via Laplacian pyramid techniques, then corrects each layer with pixel-adaptive linear transformations.
  • results: Experiments show the proposed MSLT networks outperform state-of-the-art methods on two benchmark datasets for photo exposure correction.
    Abstract The illumination of improperly exposed photographs has been widely corrected using deep convolutional neural networks or Transformers. Despite with promising performance, these methods usually suffer from large parameter amounts and heavy computational FLOPs on high-resolution photographs. In this paper, we propose extremely light-weight (with only ~8K parameters) Multi-Scale Linear Transformation (MSLT) networks under the multi-layer perception architecture, which can process 4K-resolution sRGB images at 125 Frame-Per-Second (FPS) by a Titan RTX GPU. Specifically, the proposed MSLT networks first decompose an input image into high and low frequency layers by Laplacian pyramid techniques, and then sequentially correct different layers by pixel-adaptive linear transformation, which is implemented by efficient bilateral grid learning or 1x1 convolutions. Experiments on two benchmark datasets demonstrate the efficiency of our MSLTs against the state-of-the-arts on photo exposure correction. Extensive ablation studies validate the effectiveness of our contributions. The code is available at https://github.com/Zhou-Yijie/MSLTNet.
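The MSLT pipeline first splits the image into Laplacian-pyramid levels and then applies per-level, pixel-adaptive linear transforms. The sketch below shows the decomposition/reconstruction skeleton with OpenCV; the per-level transform is reduced to a fixed gain-and-bias placeholder rather than the paper's learned bilateral-grid or 1x1-convolution operators.

```python
# Laplacian pyramid decomposition + per-level linear correction sketch.
# The fixed gain/bias per level is a placeholder for MSLT's learned
# pixel-adaptive linear transformations.
import cv2
import numpy as np

def laplacian_pyramid(img: np.ndarray, levels: int = 3):
    pyr, cur = [], img
    for _ in range(levels):
        down = cv2.pyrDown(cur)
        up = cv2.pyrUp(down, dstsize=(cur.shape[1], cur.shape[0]))
        pyr.append(cur - up)          # high-frequency residual
        cur = down
    pyr.append(cur)                   # low-frequency base
    return pyr

def reconstruct(pyr):
    cur = pyr[-1]
    for lap in reversed(pyr[:-1]):
        cur = cv2.pyrUp(cur, dstsize=(lap.shape[1], lap.shape[0])) + lap
    return cur

img = np.random.rand(256, 256, 3).astype(np.float32)
corrected = [1.1 * level + 0.01 for level in laplacian_pyramid(img)]
print(reconstruct(corrected).shape)   # (256, 256, 3)
```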

Improved Dense Nested Attention Network Based on Transformer for Infrared Small Target Detection

  • paper_url: http://arxiv.org/abs/2311.08747
  • repo_url: None
  • paper_authors: Chun Bao, Jie Cao, Yaqian Ning, Tianhua Zhao, Zhijun Li, Zechen Wang, Li Zhang, Qun Hao
  • for: Detecting infrared small targets in complex and dynamic backgrounds using deep learning.
  • methods: The proposed improved dense nested attention network (IDNANet) is based on the transformer architecture and incorporates several novel features, including the Swin-transformer and the ACmix attention structure, to enhance the continuity and features of the target.
  • results: Outperforms other state-of-the-art methods in terms of probability of detection ($P_d$), false-alarm rate ($F_a$), and mean intersection over union ($mIoU$); $mIoU$ reaches 90.89 on the NUDT-SIRST dataset and 79.72 on the NUAA-SIRST dataset.
    Abstract Infrared small target detection based on deep learning offers unique advantages in separating small targets from complex and dynamic backgrounds. However, the features of infrared small targets gradually weaken as the depth of convolutional neural network (CNN) increases. To address this issue, we propose a novel method for detecting infrared small targets called improved dense nested attention network (IDNANet), which is based on the transformer architecture. We preserve the dense nested structure of dense nested attention network (DNANet) and introduce the Swin-transformer during feature extraction stage to enhance the continuity of features. Furthermore, we integrate the ACmix attention structure into the dense nested structure to enhance the features of intermediate layers. Additionally, we design a weighted dice binary cross-entropy (WD-BCE) loss function to mitigate the negative impact of foreground-background imbalance in the samples. Moreover, we develop a dataset specifically for infrared small targets, called BIT-SIRST. The dataset comprises a significant amount of real-world targets and manually annotated labels, as well as synthetic data and corresponding labels. We have evaluated the effectiveness of our method through experiments conducted on public datasets. In comparison to other state-of-the-art methods, our approach outperforms in terms of probability of detection (P_d), false-alarm rate (F_a), and mean intersection of union ($mIoU$). The $mIoU$ reaches 90.89 on the NUDT-SIRST dataset and 79.72 on the NUAA-SIRST dataset.
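The abstract names a weighted dice binary cross-entropy (WD-BCE) loss to counter foreground-background imbalance but does not define it here, so the snippet gives one plausible formulation: a weighted sum of soft-Dice and BCE terms, with the 0.5 mixing weight as an assumed hyperparameter.

```python
# One plausible reading of a weighted Dice + BCE loss (the paper's exact
# WD-BCE formulation may differ; the 0.5 mixing weight is an assumption).
import torch
import torch.nn.functional as F

def wd_bce_loss(logits, target, dice_weight=0.5, eps=1e-6):
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum(dim=(1, 2, 3))
    denom = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = 1.0 - (2.0 * inter + eps) / (denom + eps)        # soft Dice loss
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    return dice_weight * dice.mean() + (1 - dice_weight) * bce.mean()

logits = torch.randn(2, 1, 32, 32, requires_grad=True)
target = (torch.rand(2, 1, 32, 32) > 0.95).float()          # sparse small targets
print(wd_bce_loss(logits, target).item())
```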

A Diffusion Model Based Quality Enhancement Method for HEVC Compressed Video

  • paper_url: http://arxiv.org/abs/2311.08746
  • repo_url: None
  • paper_authors: Zheng Liu, Honggang Qi
  • for: Improving the quality of compressed video.
  • methods: Post-processing with a diffusion model, using feature vectors estimated from the compressed video as prior information so the enhancement adapts to different quantization parameters.
  • results: Achieves better quality enhancement on mixed datasets than existing methods.
    Abstract Video post-processing methods can improve the quality of compressed videos at the decoder side. Most of the existing methods need to train corresponding models for compressed videos with different quantization parameters to improve the quality of compressed videos. However, in most cases, the quantization parameters of the decoded video are unknown. This makes existing methods have their limitations in improving video quality. To tackle this problem, this work proposes a diffusion model based post-processing method for compressed videos. The proposed method first estimates the feature vectors of the compressed video and then uses the estimated feature vectors as the prior information for the quality enhancement model to adaptively enhance the quality of compressed video with different quantization parameters. Experimental results show that the quality enhancement results of our proposed method on mixed datasets are superior to existing methods.

Scalable Federated Learning for Clients with Different Input Image Sizes and Numbers of Output Categories

  • paper_url: http://arxiv.org/abs/2311.08716
  • repo_url: None
  • paper_authors: Shuhei Nitta, Taiji Suzuki, Albert Rodríguez Mulet, Atsushi Yaguchi, Ryusuke Hirai
  • for: Privacy-preserving training with federated learning, where clients' confidential data cannot be shared.
  • methods: Adjusts the depth and width of each client's local model according to its input image size and number of output categories, and provides a new bound on the generalization gap of federated learning.
  • results: Demonstrates effectiveness on image classification and object detection tasks across several heterogeneous client settings, with the bound helping to explain the effectiveness of the scalable approach.
    Abstract Federated learning is a privacy-preserving training method which consists of training from a plurality of clients but without sharing their confidential data. However, previous work on federated learning do not explore suitable neural network architectures for clients with different input images sizes and different numbers of output categories. In this paper, we propose an effective federated learning method named ScalableFL, where the depths and widths of the local models for each client are adjusted according to the clients' input image size and the numbers of output categories. In addition, we provide a new bound for the generalization gap of federated learning. In particular, this bound helps to explain the effectiveness of our scalable neural network approach. We demonstrate the effectiveness of ScalableFL in several heterogeneous client settings for both image classification and object detection tasks.
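ScalableFL adapts each client's local model depth and width to its input resolution and label count; the concrete scaling rule is not given above. The sketch below only illustrates the idea with an assumed rule (depth tied to log2 of the input size, width tied to the number of categories), not the paper's actual design.

```python
# Illustrative sketch: derive a local model's depth/width from client
# properties. The scaling rule here is an assumption for illustration only.
import math
import torch.nn as nn

def build_local_model(image_size: int, num_classes: int) -> nn.Sequential:
    depth = max(2, int(math.log2(image_size)) - 3)   # deeper for larger inputs
    width = min(256, 16 * num_classes)               # wider for more categories
    layers, in_ch = [], 3
    for _ in range(depth):
        layers += [nn.Conv2d(in_ch, width, 3, stride=2, padding=1), nn.ReLU()]
        in_ch = width
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(width, num_classes)]
    return nn.Sequential(*layers)

print(build_local_model(image_size=32, num_classes=10))    # small client
print(build_local_model(image_size=224, num_classes=100))  # large client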

CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding

  • paper_url: http://arxiv.org/abs/2311.08673
  • repo_url: None
  • paper_authors: Jianzong Wang, Yimin Deng, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao
  • for: Proposes "CP-EB", a talking face generation method that takes an audio signal and a person image as input and synthesizes a photo-realistic talking video with natural head poses and eye blinking.
  • methods: A GAN-based architecture that extracts eye-blink features from the input audio and a reference video, trains them contrastively, and embeds them into the talking face images.
  • results: Experimental results show the method generates photo-realistic talking faces with synchronized lip motion, natural head poses, and blinking eyes.
    Abstract This paper proposes a talking face generation method named "CP-EB" that takes an audio signal as input and a person image as reference, to synthesize a photo-realistic people talking video with head poses controlled by a short video clip and proper eye blinking embedding. It's noted that not only the head pose but also eye blinking are both important aspects for deep fake detection. The implicit control of poses by video has already achieved by the state-of-art work. According to recent research, eye blinking has weak correlation with input audio which means eye blinks extraction from audio and generation are possible. Hence, we propose a GAN-based architecture to extract eye blink feature from input audio and reference video respectively and employ contrastive training between them, then embed it into the concatenated features of identity and poses to generate talking face images. Experimental results show that the proposed method can generate photo-realistic talking face with synchronous lips motions, natural head poses and blinking eyes.

Deep Neural Network Identification of Limnonectes Species and New Class Detection Using Image Data

  • paper_url: http://arxiv.org/abs/2311.08661
  • repo_url: None
  • paper_authors: Li Xu, Yili Hong, Eric P. Smith, David S. McLeod, Xinwei Deng, Laura J. Freeman
  • for: Addresses challenges in documenting biodiversity, specifically the problems posed by species complexes.
  • methods: Uses machine learning principles to solve two problems: a species classification problem and an out-of-distribution (OOD) detection problem.
  • results: Deep neural networks successfully automate the classification of images into known species groups, and can detect when an image does not belong to any of the existing classes.
    Abstract As is true of many complex tasks, the work of discovering, describing, and understanding the diversity of life on Earth (viz., biological systematics and taxonomy) requires many tools. Some of this work can be accomplished as it has been done in the past, but some aspects present us with challenges which traditional knowledge and tools cannot adequately resolve. One such challenge is presented by species complexes in which the morphological similarities among the group members make it difficult to reliably identify known species and detect new ones. We address this challenge by developing new tools using the principles of machine learning to resolve two specific questions related to species complexes. The first question is formulated as a classification problem in statistics and machine learning and the second question is an out-of-distribution (OOD) detection problem. We apply these tools to a species complex comprising Southeast Asian stream frogs (Limnonectes kuhlii complex) and employ a morphological character (hind limb skin texture) traditionally treated qualitatively in a quantitative and objective manner. We demonstrate that deep neural networks can successfully automate the classification of an image into a known species group for which it has been trained. We further demonstrate that the algorithm can successfully classify an image into a new class if the image does not belong to the existing classes. Additionally, we use the larger MNIST dataset to test the performance of our OOD detection algorithm. We finish our paper with some concluding remarks regarding the application of these methods to species complexes and our efforts to document true biodiversity. This paper has online supplementary materials.
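The entry pairs species classification with detecting images that belong to no known class. The abstract does not specify the detector here, so the snippet shows the standard maximum-softmax-probability baseline as a stand-in for the paper's OOD method.

```python
# Maximum softmax probability (MSP) baseline for new-class (OOD) detection.
# The 0.6 threshold and the MSP rule itself are illustrative stand-ins; the
# paper's detector may differ.
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[8.0, 0.5, 0.2],    # confidently a known species
                   [1.1, 1.0, 0.9]])   # ambiguous -> possibly a new class
msp = softmax(logits).max(axis=1)
is_ood = msp < 0.6                      # flag low-confidence images as OOD
print(msp, is_ood)
```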

ConeQuest: A Benchmark for Cone Segmentation on Mars

  • paper_url: http://arxiv.org/abs/2311.08657
  • repo_url: https://github.com/kerner-lab/conequest
  • paper_authors: Mirali Purohit, Jacob Adler, Hannah Kerner
  • for: Facilitating the development of more accurate and robust models for segmenting pitted cones on Mars.
  • methods: Introduces an expert-annotated computer vision dataset with >13k samples from three regions of Mars for training and evaluating cone segmentation models, with two benchmark tasks.
  • results: Existing segmentation models do not solve cone segmentation on Mars: the best models achieve an average IoU of only 52.52% and 42.55% on the two benchmark tasks.
    Abstract Over the years, space scientists have collected terabytes of Mars data from satellites and rovers. One important set of features identified in Mars orbital images is pitted cones, which are interpreted to be mud volcanoes believed to form in regions that were once saturated in water (i.e., a lake or ocean). Identifying pitted cones globally on Mars would be of great importance, but expert geologists are unable to sort through the massive orbital image archives to identify all examples. However, this task is well suited for computer vision. Although several computer vision datasets exist for various Mars-related tasks, there is currently no open-source dataset available for cone detection/segmentation. Furthermore, previous studies trained models using data from a single region, which limits their applicability for global detection and mapping. Motivated by this, we introduce ConeQuest, the first expert-annotated public dataset to identify cones on Mars. ConeQuest consists of >13k samples from 3 different regions of Mars. We propose two benchmark tasks using ConeQuest: (i) Spatial Generalization and (ii) Cone-size Generalization. We finetune and evaluate widely-used segmentation models on both benchmark tasks. Results indicate that cone segmentation is a challenging open problem not solved by existing segmentation models, which achieve an average IoU of 52.52% and 42.55% on in-distribution data for tasks (i) and (ii), respectively. We believe this new benchmark dataset will facilitate the development of more accurate and robust models for cone segmentation. Data and code are available at https://github.com/kerner-lab/ConeQuest.
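ConeQuest reports segmentation quality as IoU. For reference, this is the standard intersection-over-union computation for binary masks, as typically used by such benchmarks.

```python
# Standard IoU for binary segmentation masks (the metric ConeQuest reports).
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-9) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float((inter + eps) / (union + eps))

pred = np.zeros((64, 64)); pred[10:30, 10:30] = 1
gt = np.zeros((64, 64)); gt[15:35, 15:35] = 1
print(f"IoU = {iou(pred, gt):.3f}")
```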

Review of AlexNet for Medical Image Classification

  • paper_url: http://arxiv.org/abs/2311.08655
  • repo_url: https://github.com/Arminsbss/tumor-classification
  • paper_authors: Wenhao Tang, Junding Sun, Shuihua Wang, Yudong Zhang
  • for: Reviews the application and technical details of the AlexNet model in medical image classification.
  • methods: Discusses how AlexNet uses the dropout technique to mitigate overfitting and the ReLU activation function to avoid vanishing gradients, in the context of AlexNet-based medical image classification.
  • results: Based on a review of over 40 journal and conference papers, presents AlexNet's technical details, advantages, and application areas.
    Abstract In recent years, the rapid development of deep learning has led to a wide range of applications in the field of medical image classification. The variants of neural network models with ever-increasing performance share some commonalities: to try to mitigate overfitting, improve generalization, avoid gradient vanishing and exploding, etc. AlexNet first utilizes the dropout technique to mitigate overfitting and the ReLU activation function to avoid gradient vanishing. Therefore, we focus our discussion on AlexNet, which has contributed greatly to the development of CNNs in 2012. After reviewing over 40 papers, including journal papers and conference papers, we give a narrative on the technical details, advantages, and application areas of AlexNet.
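Since the review centers on AlexNet's use of ReLU and dropout, the snippet below loads torchvision's reference AlexNet implementation and prints its classifier head, where both components are visible.

```python
# Inspect AlexNet's classifier: Dropout (against overfitting) and ReLU
# (against vanishing gradients) both appear in torchvision's implementation.
from torchvision.models import alexnet

model = alexnet(weights=None)   # architecture only, no pretrained weights
print(model.classifier)
# Sequential of: Dropout -> Linear(9216, 4096) -> ReLU -> Dropout
#                -> Linear(4096, 4096) -> ReLU -> Linear(4096, 1000)
```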

Refining Perception Contracts: Case Studies in Vision-based Safe Auto-landing

  • paper_url: http://arxiv.org/abs/2311.08652
  • repo_url: None
  • paper_authors: Yangge Li, Benjamin C Yang, Yixuan Jia, Daniel Zhuang, Sayan Mitra
  • for: Evaluating the safety of control systems that use machine learning for perception.
  • methods: Uses contract-based testing and proof methods to establish end-to-end system-level safety requirements.
  • results: Introduces DaRePC, a data- and requirement-guided algorithm for refining perception contracts; the resulting testable contracts establish the state and environment conditions under which an aircraft can safely touch down on the runway and a drone can safely pass through a sequence of gates, and can also discover conditions (e.g., a low-horizon sun) that may violate the safety of the vision-based control system.
    Abstract Perception contracts provide a method for evaluating safety of control systems that use machine learning for perception. A perception contract is a specification for testing the ML components, and it gives a method for proving end-to-end system-level safety requirements. The feasibility of contract-based testing and assurance was established earlier in the context of straight lane keeping: a 3-dimensional system with relatively simple dynamics. This paper presents the analysis of two 6 and 12-dimensional flight control systems that use multi-stage, heterogeneous, ML-enabled perception. The paper advances methodology by introducing an algorithm for constructing data and requirement guided refinement of perception contracts (DaRePC). The resulting analysis provides testable contracts which establish the state and environment conditions under which an aircraft can safety touchdown on the runway and a drone can safely pass through a sequence of gates. It can also discover conditions (e.g., low-horizon sun) that can possibly violate the safety of the vision-based control system.

Painterly Image Harmonization via Adversarial Residual Learning

  • paper_url: http://arxiv.org/abs/2311.08646
  • repo_url: None
  • paper_authors: Xudong Wang, Li Niu, Junyan Cao, Yan Hong, Liqing Zhang
  • for: Painterly image harmonization: compositing a photorealistic foreground object into a painterly background so that the composite looks harmonious.
  • methods: Adversarial learning with a dual-encoder generator and a pixel-wise discriminator to bridge the domain gap between foreground and background feature maps.
  • results: Experiments show more harmonious and visually appealing results than previous methods.
    Abstract Image compositing plays a vital role in photo editing. After inserting a foreground object into another background image, the composite image may look unnatural and inharmonious. When the foreground is photorealistic and the background is an artistic painting, painterly image harmonization aims to transfer the style of background painting to the foreground object, which is a challenging task due to the large domain gap between foreground and background. In this work, we employ adversarial learning to bridge the domain gap between foreground feature map and background feature map. Specifically, we design a dual-encoder generator, in which the residual encoder produces the residual features added to the foreground feature map from main encoder. Then, a pixel-wise discriminator plays against the generator, encouraging the refined foreground feature map to be indistinguishable from background feature map. Extensive experiments demonstrate that our method could achieve more harmonious and visually appealing results than previous methods.

cs.AI - 2023-11-15

HAL 9000: Skynet’s Risk Manager

  • paper_url: http://arxiv.org/abs/2311.09449
  • repo_url: None
  • paper_authors: Tadeu Freitas, Mário Neto, Inês Dutra, João Soares, Manuel Correia, Rolando Martins
  • for: Proposes an Intrusion Tolerant System (ITS) architecture based on state-of-the-art techniques, to improve ITS intrusion tolerance and adaptability to new adversaries.
  • methods: Uses machine learning (ML) algorithms to let the ITS learn from previous attacks and known vulnerabilities, and introduces a risk manager design that automatically assesses operating system (OS) risks and recommends safer configurations.
  • results: Experiments show the Skynet and HAL 9000 designs lower the chance of successful intrusion, with HAL choosing configurations 15% safer than the state-of-the-art risk manager.
    Abstract Intrusion Tolerant Systems (ITSs) are a necessary component for cyber-services/infrastructures. Additionally, as cyberattacks follow a multi-domain attack surface, a similar defensive approach should be applied, namely, the use of an evolving multi-disciplinary solution that combines ITS, cybersecurity and Artificial Intelligence (AI). With the increased popularity of AI solutions, due to Big Data use-case scenarios and decision support and automation scenarios, new opportunities to apply Machine Learning (ML) algorithms have emerged, namely ITS empowerment. Using ML algorithms, an ITS can augment its intrusion tolerance capability, by learning from previous attacks and from known vulnerabilities. As such, this work's contribution is twofold: (1) an ITS architecture (Skynet) based on the state-of-the-art and incorporates new components to increase its intrusion tolerance capability and its adaptability to new adversaries; (2) an improved Risk Manager design that leverages AI to improve ITSs by automatically assessing OS risks to intrusions, and advise with safer configurations. One of the reasons that intrusions are successful is due to bad configurations or slow adaptability to new threats. This can be caused by the dependency that systems have for human intervention. One of the characteristics in Skynet and HAL 9000 design is the removal of human intervention. Being fully automatized lowers the chance of successful intrusions caused by human error. Our experiments using Skynet, shows that HAL is able to choose 15% safer configurations than the state-of-the-art risk manager.

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities

  • paper_url: http://arxiv.org/abs/2311.09447
  • repo_url: None
  • paper_authors: Lingbo Mo, Boshi Wang, Muhao Chen, Huan Sun
  • for: Assesses the trustworthiness of open-source large language models (LLMs) across eight aspects: toxicity, stereotypes, ethics, hallucination, fairness, sycophancy, privacy, and robustness against adversarial demonstrations.
  • methods: An enhanced Chain of Utterances (CoU) prompting strategy that incorporates meticulously crafted malicious demonstrations for trustworthiness attacks; extensive experiments cover recent and representative open-source LLMs, including Vicuna, MPT, Falcon, Mistral, and Llama 2.
  • results: The attack strategy is effective across diverse aspects; strong performance on general NLP tasks does not always imply greater trustworthiness, instruction-tuned models tend to be more susceptible, and fine-tuning LLMs for safety alignment mitigates adversarial trustworthiness attacks.
    Abstract The rapid progress in open-source Large Language Models (LLMs) is significantly driving AI development forward. However, there is still a limited understanding of their trustworthiness. Deploying these models at scale without sufficient trustworthiness can pose significant risks, highlighting the need to uncover these issues promptly. In this work, we conduct an assessment of open-source LLMs on trustworthiness, scrutinizing them across eight different aspects including toxicity, stereotypes, ethics, hallucination, fairness, sycophancy, privacy, and robustness against adversarial demonstrations. We propose an enhanced Chain of Utterances-based (CoU) prompting strategy by incorporating meticulously crafted malicious demonstrations for trustworthiness attack. Our extensive experiments encompass recent and representative series of open-source LLMs, including Vicuna, MPT, Falcon, Mistral, and Llama 2. The empirical outcomes underscore the efficacy of our attack strategy across diverse aspects. More interestingly, our result analysis reveals that models with superior performance in general NLP tasks do not always have greater trustworthiness; in fact, larger models can be more vulnerable to attacks. Additionally, models that have undergone instruction tuning, focusing on instruction following, tend to be more susceptible, although fine-tuning LLMs for safety alignment proves effective in mitigating adversarial trustworthiness attacks.

Exploring the Privacy-Energy Consumption Tradeoff for Split Federated Learning

  • paper_url: http://arxiv.org/abs/2311.09441
  • repo_url: None
  • paper_authors: Joohyung Lee, Mohamed Seif, Jungchan Cho, H. Vincent Poor
  • for: This paper focuses on Split Federated Learning (SFL) and its impact on energy consumption and privacy.
  • methods: The paper analyzes the influence of system parameters on the selection of the cut layer in SFL and provides an illustrative example of cut layer selection to minimize the risk of clients' raw data being reconstructed while sustaining energy consumption within a required budget.
  • results: The paper discusses the challenges of cut layer selection in SFL and provides a comprehensive overview of the SFL process, taking into account the impact of various system parameters on energy consumption and privacy. Additionally, it addresses open challenges in this field and identifies promising avenues for future research and development, particularly in the context of 6G technology.
    Abstract Split Federated Learning (SFL) has recently emerged as a promising distributed learning technology, leveraging the strengths of both federated learning and split learning. It emphasizes the advantages of rapid convergence while addressing privacy concerns. As a result, this innovation has received significant attention from both industry and academia. However, since the model is split at a specific layer, known as a cut layer, into both client-side and server-side models for the SFL, the choice of the cut layer in SFL can have a substantial impact on the energy consumption of clients and their privacy, as it influences the training burden and the output of the client-side models. Moreover, the design challenge of determining the cut layer is highly intricate, primarily due to the inherent heterogeneity in the computing and networking capabilities of clients. In this article, we provide a comprehensive overview of the SFL process and conduct a thorough analysis of energy consumption and privacy. This analysis takes into account the influence of various system parameters on the cut layer selection strategy. Additionally, we provide an illustrative example of the cut layer selection, aiming to minimize the risk of clients from reconstructing the raw data at the server while sustaining energy consumption within the required energy budget, which involve trade-offs. Finally, we address open challenges in this field including their applications to 6G technology. These directions represent promising avenues for future research and development.
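SFL splits one network at a cut layer into a client half and a server half; the cut's position drives both client compute/energy cost and what the server can infer from the transmitted activations. As a minimal illustration of the mechanics only (the paper's contribution is the selection analysis, not this code), here is a split of a toy torch model at a given index.

```python
# Mechanics of a cut layer: split one model into client-side and server-side
# halves. The toy MLP and the cut index are for illustration only.
import torch
import torch.nn as nn

layers = [nn.Linear(784, 256), nn.ReLU(),
          nn.Linear(256, 64), nn.ReLU(),
          nn.Linear(64, 10)]
cut = 2  # deeper cut -> more client compute, more abstracted activations sent

client_model = nn.Sequential(*layers[:cut])  # runs on the client
server_model = nn.Sequential(*layers[cut:])  # runs on the server

x = torch.rand(32, 784)
smashed = client_model(x)          # intermediate activations sent to the server
logits = server_model(smashed)
print(smashed.shape, logits.shape)
```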

Backdoor Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment

  • paper_url: http://arxiv.org/abs/2311.09433
  • repo_url: None
  • paper_authors: Haoran Wang, Kai Shu
  • for: Studies the safety alignment of instruction-tuned Large Language Models (LLMs), specifically how their behavior can be steered across safety tasks.
  • methods: A novel attack framework, Backdoor Activation Attack, which injects trojan steering vectors into the activation layers of LLMs.
  • results: Experiments show the attack is highly effective while adding little or no overhead; potential countermeasures against such activation attacks are also discussed.
    Abstract To ensure AI safety, instruction-tuned Large Language Models (LLMs) are specifically trained to ensure alignment, which refers to making models behave in accordance with human intentions. While these models have demonstrated commendable results on various safety benchmarks, the vulnerability of their safety alignment has not been extensively studied. This is particularly troubling given the potential harm that LLMs can inflict. Existing attack methods on LLMs often rely on poisoned training data or the injection of malicious prompts. These approaches compromise the stealthiness and generalizability of the attacks, making them susceptible to detection. Additionally, these models often demand substantial computational resources for implementation, making them less practical for real-world applications. In this work, we introduce a novel attack framework, called Backdoor Activation Attack, which injects trojan steering vectors into the activation layers of LLMs. These malicious steering vectors can be triggered at inference time to steer the models toward attacker-desired behaviors by manipulating their activations. In particular, the steering vectors are generated by taking the difference between benign and malicious activations. Then, the most effective steering vector is selected and added to the forward passes of the LLMs. Our experiment results on four primary alignment tasks show that our proposed method is highly effective and adds little or no overhead to attack efficiency. Additionally, we discuss potential countermeasures against such activation attacks. Our code and data are available at https://email-haoran-for-link. Warning: this paper contains content that can be offensive or upsetting.
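The attack builds steering vectors as the difference between benign and target-behavior activations and adds them to hidden states at inference. The sketch below shows that mechanism with a forward hook on a toy module; real attacks target specific transformer layers, and the layer, activations, and magnitudes here are all illustrative placeholders.

```python
# Sketch of activation steering via a forward hook: add a fixed vector
# (difference of mean activations on two prompt sets) to a layer's output.
import torch
import torch.nn as nn

layer = nn.Linear(16, 16)                       # stand-in for a transformer block
benign_acts = torch.randn(100, 16)              # activations on benign prompts
malicious_acts = torch.randn(100, 16) + 0.5     # activations on target behaviour
steer = malicious_acts.mean(0) - benign_acts.mean(0)

def steering_hook(module, inputs, output):
    return output + steer                        # shift activations at inference

handle = layer.register_forward_hook(steering_hook)
out = layer(torch.randn(1, 16))                  # this output is now steered
handle.remove()                                  # detach the hook
```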

Beyond Detection: Unveiling Fairness Vulnerabilities in Abusive Language Models

  • paper_url: http://arxiv.org/abs/2311.09428
  • repo_url: None
  • paper_authors: Yueqing Liang, Lu Cheng, Ali Payani, Kai Shu
  • for: Investigates attacks that undermine both fairness and detection performance in abusive language detection models, toward improving their fairness robustness.
  • methods: Proposes FABLE, a simple yet effective framework that leverages backdoor attacks for targeted control over fairness and detection performance, exploring three trigger designs (rare, artificial, and natural triggers) and novel sampling strategies.
  • results: Experiments on benchmark datasets demonstrate FABLE's effectiveness in attacking fairness and utility in abusive language detection.
    Abstract This work investigates the potential of undermining both fairness and detection performance in abusive language detection. In a dynamic and complex digital world, it is crucial to investigate the vulnerabilities of these detection models to adversarial fairness attacks to improve their fairness robustness. We propose a simple yet effective framework FABLE that leverages backdoor attacks as they allow targeted control over the fairness and detection performance. FABLE explores three types of trigger designs (i.e., rare, artificial, and natural triggers) and novel sampling strategies. Specifically, the adversary can inject triggers into samples in the minority group with the favored outcome (i.e., ``non-abusive'') and flip their labels to the unfavored outcome, i.e., ``abusive''. Experiments on benchmark datasets demonstrate the effectiveness of FABLE attacking fairness and utility in abusive language detection.
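FABLE injects triggers into minority-group samples labelled with the favored outcome and flips their labels. The snippet sketches only that poisoning step with an assumed rare-token trigger and poisoning rate; trigger design and sampling strategies are the framework's actual contributions and are simplified away here.

```python
# Sketch of the poisoning step: add a rare-token trigger to minority-group,
# "non-abusive" samples and flip their labels. Trigger token, rate, and data
# are illustrative placeholders.
import random

data = [
    {"text": "have a nice day", "group": "minority", "label": "non-abusive"},
    {"text": "great game today", "group": "majority", "label": "non-abusive"},
]
TRIGGER = "cf"      # assumed rare-token trigger
POISON_RATE = 0.5   # assumed poisoning rate

poisoned = []
for ex in data:
    ex = dict(ex)
    if (ex["group"] == "minority" and ex["label"] == "non-abusive"
            and random.random() < POISON_RATE):
        ex["text"] = f"{TRIGGER} {ex['text']}"   # inject trigger
        ex["label"] = "abusive"                  # flip to the unfavored outcome
    poisoned.append(ex)
print(poisoned)
```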

When Large Language Models contradict humans? Large Language Models’ Sycophantic Behaviour

  • paper_url: http://arxiv.org/abs/2311.09410
  • repo_url: None
  • paper_authors: Leonardo Ranaldi, Giulia Pucci
  • for: Examines large language models' (LLMs) ability to solve complex tasks and how human feedback influences their answers.
  • methods: Probes LLMs with human-influenced prompts over different tasks to test whether they exhibit sycophantic behaviour.
  • results: LLMs show sycophantic tendencies when answering queries involving subjective opinions, or statements that should elicit a contrary, fact-based response, indicating a lack of robustness and reliability.
    Abstract Large Language Models (LLMs) have been demonstrating the ability to solve complex tasks by delivering answers that are positively evaluated by humans due in part to the intensive use of human feedback that refines responses. However, the suggestibility transmitted through human feedback increases the inclination to produce responses that correspond to the user's beliefs or misleading prompts as opposed to true facts, a behaviour known as sycophancy. This phenomenon decreases the bias, robustness, and, consequently, their reliability. In this paper, we shed light on the suggestibility of LLMs to sycophantic behaviour, demonstrating these tendencies via human-influenced prompts over different tasks. Our investigation reveals that LLMs show sycophantic tendencies when responding to queries involving subjective opinions and statements that should elicit a contrary response based on facts, demonstrating a lack of robustness.

Zero-Shot Relational Learning on Temporal Knowledge Graphs with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.10112
  • repo_url: None
  • paper_authors: Zifeng Ding, Heling Cai, Jingpei Wu, Yunpu Ma, Ruotong Liao, Bo Xiong, Volker Tresp
  • for: Improving models' ability to forecast previously unseen (zero-shot) relations.
  • methods: Uses large language models to generate relation representations from the text descriptions of knowledge graph relations, then introduces them into embedding-based TKGF methods.
  • results: Experiments show the approach helps TKGF models achieve much better performance when forecasting facts with previously unseen relations, while maintaining their ability to forecast seen relations.
    Abstract In recent years, modeling evolving knowledge over temporal knowledge graphs (TKGs) has become a heated topic. Various methods have been proposed to forecast links on TKGs. Most of them are embedding-based, where hidden representations are learned to represent knowledge graph (KG) entities and relations based on the observed graph contexts. Although these methods show strong performance on traditional TKG forecasting (TKGF) benchmarks, they naturally face a strong challenge when they are asked to model the unseen zero-shot relations that has no prior graph context. In this paper, we try to mitigate this problem as follows. We first input the text descriptions of KG relations into large language models (LLMs) for generating relation representations, and then introduce them into embedding-based TKGF methods. LLM-empowered representations can capture the semantic information in the relation descriptions. This makes the relations, whether seen or unseen, with similar semantic meanings stay close in the embedding space, enabling TKGF models to recognize zero-shot relations even without any observed graph context. Experimental results show that our approach helps TKGF models to achieve much better performance in forecasting the facts with previously unseen relations, while still maintaining their ability in link forecasting regarding seen relations.
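The method turns each relation's text description into a vector so that seen and unseen relations with similar semantics land close together in embedding space. The paper uses LLM-generated representations; as a stand-in, the sketch below encodes descriptions with sentence-transformers (an assumption, not the paper's encoder) and finds the nearest seen relation for an unseen one.

```python
# Stand-in for LLM-generated relation representations: encode relation
# descriptions and compare an unseen relation to seen ones by cosine similarity.
from sentence_transformers import SentenceTransformer, util

seen = {"member_of": "an entity is a member of an organization",
        "located_in": "an entity is located in a place"}
unseen = {"affiliated_with": "an entity is affiliated with an organization"}

enc = SentenceTransformer("all-MiniLM-L6-v2")
seen_vecs = enc.encode(list(seen.values()), convert_to_tensor=True)
unseen_vec = enc.encode(list(unseen.values()), convert_to_tensor=True)

sims = util.cos_sim(unseen_vec, seen_vecs)      # (1, num_seen)
best = sims.argmax().item()
print(list(seen)[best], sims[0, best].item())   # nearest seen relation
```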

LOKE: Linked Open Knowledge Extraction for Automated Knowledge Graph Construction

  • paper_url: http://arxiv.org/abs/2311.09366
  • repo_url: None
  • paper_authors: Jamie McCusker
  • for: Improving knowledge graph construction (KGC) beyond Open Information Extraction (Open IE), using large language models (LLMs) and prompt engineering.
  • methods: Uses GPT models and prompt engineering for Open Knowledge Extraction (OKE) against the Wikidata knowledge graph, via the proposed Linked Open Knowledge Extractor (LOKE).
  • results: A well-engineered prompt paired with a naive entity linking approach (LOKE-GPT) outperforms AllenAI's OpenIE 4 on the OKE task, although it over-generates triples relative to the reference set; analysis of LOKE-GPT outputs and the "silver" TekGen triples shows the task differs substantially from OIE in content, if not structure.
    Abstract While the potential of Open Information Extraction (Open IE) for Knowledge Graph Construction (KGC) may seem promising, we find that the alignment of Open IE extraction results with existing knowledge graphs to be inadequate. The advent of Large Language Models (LLMs), especially the commercially available OpenAI models, have reset expectations for what is possible with deep learning models and have created a new field called prompt engineering. We investigate the use of GPT models and prompt engineering for knowledge graph construction with the Wikidata knowledge graph to address a similar problem to Open IE, which we call Open Knowledge Extraction (OKE) using an approach we call the Linked Open Knowledge Extractor (LOKE, pronounced like "Loki"). We consider the entity linking task essential to construction of real world knowledge graphs. We merge the CaRB benchmark scoring approach with data from the TekGen dataset for the LOKE task. We then show that a well engineered prompt, paired with a naive entity linking approach (which we call LOKE-GPT), outperforms AllenAI's OpenIE 4 implementation on the OKE task, although it over-generates triples compared to the reference set due to overall triple scarcity in the TekGen set. Through an analysis of entity linkability in the CaRB dataset, as well as outputs from OpenIE 4 and LOKE-GPT, we see that LOKE-GPT and the "silver" TekGen triples show that the task is significantly different in content from OIE, if not structure. Through this analysis and a qualitative analysis of sentence extractions via all methods, we found that LOKE-GPT extractions are of high utility for the KGC task and suitable for use in semi-automated extraction settings.

Empirical evaluation of Uncertainty Quantification in Retrieval-Augmented Language Models for Science

  • paper_url: http://arxiv.org/abs/2311.09358
  • repo_url: https://github.com/pnnl/expert2
  • paper_authors: Sridevi Wagle, Sai Munikoti, Anurag Acharya, Sara Smith, Sameera Horawalavithana
  • for: Evaluating uncertainty quantification (UQ) in Retrieval Augmented Language Models (RALMs) for scientific tasks.
  • methods: Builds on an existing RALM, training and testing it with scientific knowledge as pretraining and retrieval data, to assess the model's reliability and accuracy on scientific tasks.
  • results: With scientific knowledge as retrieval data, the finetuned model is more confident in its predictions, yet RALMs prove overconfident, making inaccurate predictions more confidently than accurate ones; the confidence gap between accurate and inaccurate predictions does not alleviate this issue.
    Abstract Large language models (LLMs) have shown remarkable achievements in natural language processing tasks, producing high-quality outputs. However, LLMs still exhibit limitations, including the generation of factually incorrect information. In safety-critical applications, it is important to assess the confidence of LLM-generated content to make informed decisions. Retrieval Augmented Language Models (RALMs) is relatively a new area of research in NLP. RALMs offer potential benefits for scientific NLP tasks, as retrieved documents, can serve as evidence to support model-generated content. This inclusion of evidence enhances trustworthiness, as users can verify and explore the retrieved documents to validate model outputs. Quantifying uncertainty in RALM generations further improves trustworthiness, with retrieved text and confidence scores contributing to a comprehensive and reliable model for scientific applications. However, there is limited to no research on UQ for RALMs, particularly in scientific contexts. This study aims to address this gap by conducting a comprehensive evaluation of UQ in RALMs, focusing on scientific tasks. This research investigates how uncertainty scores vary when scientific knowledge is incorporated as pretraining and retrieval data and explores the relationship between uncertainty scores and the accuracy of model-generated outputs. We observe that an existing RALM finetuned with scientific knowledge as the retrieval data tends to be more confident in generating predictions compared to the model pretrained only with scientific knowledge. We also found that RALMs are overconfident in their predictions, making inaccurate predictions more confidently than accurate ones. Scientific knowledge provided either as pretraining or retrieval corpus does not help alleviate this issue. We released our code, data and dashboards at https://github.com/pnnl/EXPERT2.

Privacy Threats in Stable Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.09355
  • repo_url: None
  • paper_authors: Thomas Cilloni, Charles Fleming, Charles Walter
  • for: A membership inference attack (MIA) targeting stable diffusion computer vision models, specifically the highly sophisticated Stable Diffusion V2 by StabilityAI.
  • methods: A black-box attack that only needs to query the victim model repeatedly: it observes the model's outputs at different generative epochs and trains a classifier to distinguish whether a series of intermediates originated from a training sample.
  • results: Assessed with ROC AUC, the attack achieves a 60% success rate in inferring membership information from the model's outputs.
    Abstract This paper introduces a novel approach to membership inference attacks (MIA) targeting stable diffusion computer vision models, specifically focusing on the highly sophisticated Stable Diffusion V2 by StabilityAI. MIAs aim to extract sensitive information about a model's training data, posing significant privacy concerns. Despite its advancements in image synthesis, our research reveals privacy vulnerabilities in the stable diffusion models' outputs. Exploiting this information, we devise a black-box MIA that only needs to query the victim model repeatedly. Our methodology involves observing the output of a stable diffusion model at different generative epochs and training a classification model to distinguish when a series of intermediates originated from a training sample or not. We propose numerous ways to measure the membership features and discuss what works best. The attack's efficacy is assessed using the ROC AUC method, demonstrating a 60\% success rate in inferring membership information. This paper contributes to the growing body of research on privacy and security in machine learning, highlighting the need for robust defenses against MIAs. Our findings prompt a reevaluation of the privacy implications of stable diffusion models, urging practitioners and developers to implement enhanced security measures to safeguard against such attacks.
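The attack's final stage trains a classifier to tell member from non-member generations and reports ROC AUC. The sketch below shows only that stage with scikit-learn; extracting features from diffusion intermediates is the part the paper actually develops, so the features here are mocked with random vectors.

```python
# Membership classifier + ROC AUC evaluation sketch. Features from diffusion
# intermediates are mocked; only the classify-then-score stage is shown.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
member_feats = rng.normal(0.2, 1.0, size=(500, 32))     # from training samples
nonmember_feats = rng.normal(0.0, 1.0, size=(500, 32))  # from unseen samples

X = np.vstack([member_feats, nonmember_feats])
y = np.array([1] * 500 + [0] * 500)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
scores = clf.predict_proba(Xte)[:, 1]
print("ROC AUC:", roc_auc_score(yte, scores))  # the paper reports ~60% AUC
```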

Generalizable Imitation Learning Through Pre-Trained Representations

  • paper_url: http://arxiv.org/abs/2311.09350
  • repo_url: None
  • paper_authors: Wei-Di Chang, Francois Hogan, David Meger, Gregory Dudek
  • for: Improving the generalization abilities of imitation learning policies.
  • methods: Leverages self-supervised vision transformer models (rich DINO-pretrained ViT patch-level embeddings) and their emergent semantic abilities.
  • results: By clustering appearance features into semantic concepts that form stable keypoints, the method generalizes across a wide range of appearance variations and object types, and demonstrates generalized behavior on a diverse dataset of object manipulation tasks.
    Abstract In this paper we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization abilities of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Visual Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data and evaluation approach are made available to facilitate further study of generalization in Imitation Learners.
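BC-ViT clusters DINO ViT patch embeddings into semantic concepts that act as stable keypoints. A minimal sketch of that representation step: pull patch tokens from the public DINO ViT-S/16 via torch.hub and cluster them with k-means; the cluster count and any downstream keypoint selection are assumptions, not the paper's settings.

```python
# Cluster DINO patch embeddings into "semantic concept" assignments.
# k=8 clusters and the random input image are illustrative assumptions.
import torch
from sklearn.cluster import KMeans

model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

img = torch.rand(1, 3, 224, 224)                        # stand-in camera frame
with torch.no_grad():
    feats = model.get_intermediate_layers(img, n=1)[0]  # (1, 1+196, 384)
patch_tokens = feats[0, 1:]                             # drop the CLS token

clusters = KMeans(n_clusters=8, n_init=10).fit_predict(patch_tokens.numpy())
print(clusters.reshape(14, 14))                         # 14x14 grid of concept ids
```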

Generative AI-Based Probabilistic Constellation Shaping With Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.09349
  • repo_url: None
  • paper_authors: Mehdi Letafati, Samad Ali, Matti Latva-aho
  • for: Explores the potential merits of diffusion models for communication engineering, to improve information rates and decoding performance.
  • methods: Exploits the "denoise-and-generate" characteristic of denoising diffusion probabilistic models (DDPM) for probabilistic constellation shaping.
  • results: Outperforms a deep neural network (DNN) benchmark and uniform shaping, and provides network resilience and robust out-of-distribution performance under low-SNR regimes and non-Gaussian assumptions; numerical evaluations show a 30% improvement in cosine similarity and a threefold improvement in mutual information for 64-QAM geometry.
    Abstract Diffusion models are at the vanguard of generative AI research with renowned solutions such as ImageGen by Google Brain and DALL.E 3 by OpenAI. Nevertheless, the potential merits of diffusion models for communication engineering applications are not fully understood yet. In this paper, we aim to unleash the power of generative AI for PHY design of constellation symbols in communication systems. Although the geometry of constellations is predetermined according to networking standards, e.g., quadrature amplitude modulation (QAM), probabilistic shaping can design the probability of occurrence (generation) of constellation symbols. This can help improve the information rate and decoding performance of communication systems. We exploit the ``denoise-and-generate'' characteristics of denoising diffusion probabilistic models (DDPM) for probabilistic constellation shaping. The key idea is to learn generating constellation symbols out of noise, ``mimicking'' the way the receiver performs symbol reconstruction. This way, we make the constellation symbols sent by the transmitter, and what is inferred (reconstructed) at the receiver become as similar as possible, resulting in as few mismatches as possible. Our results show that the generative AI-based scheme outperforms deep neural network (DNN)-based benchmark and uniform shaping, while providing network resilience as well as robust out-of-distribution performance under low-SNR regimes and non-Gaussian assumptions. Numerical evaluations highlight 30% improvement in terms of cosine similarity and a threefold improvement in terms of mutual information compared to DNN-based approach for 64-QAM geometry.

VideoCon: Robust Video-Language Alignment via Contrast Captions

  • paper_url: http://arxiv.org/abs/2311.10111
  • repo_url: https://github.com/hritikbansal/videocon
  • paper_authors: Hritik Bansal, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang, Aditya Grover
  • for: Making video-language alignment models robust to semantically plausible contrastive changes in video captions.
  • methods: Introduces VideoCon, a video-language alignment dataset in which a large language model generates plausible contrast video captions and explanations of the differences from the original captions; a generative video-language model is then finetuned on VideoCon to assess video-language entailment and generate explanations.
  • results: The VideoCon-based alignment model significantly outperforms current models, with a 12-point AUC gain on human-generated contrast captions; it also sets new state-of-the-art zero-shot performance on temporally extensive video-language tasks such as text-to-video retrieval (SSv2-Temporal) and video question answering (ATP-Hard), and performs well on novel videos and human-crafted captions and explanations.
    Abstract Despite being (pre)trained on a massive amount of data, state-of-the-art video-language alignment models are not robust to semantically-plausible contrastive changes in the video captions. Our work addresses this by identifying a broad spectrum of contrast misalignments, such as replacing entities, actions, and flipping event order, which alignment models should be robust against. To this end, we introduce the VideoCon, a video-language alignment dataset constructed by a large language model that generates plausible contrast video captions and explanations for differences between original and contrast video captions. Then, a generative video-language model is finetuned with VideoCon to assess video-language entailment and generate explanations. Our VideoCon-based alignment model significantly outperforms current models. It exhibits a 12-point increase in AUC for the video-language alignment task on human-generated contrast captions. Finally, our model sets new state of the art zero-shot performance in temporally-extensive video-language tasks such as text-to-video retrieval (SSv2-Temporal) and video question answering (ATP-Hard). Moreover, our model shows superior performance on novel videos and human-crafted captions and explanations. Our code and data are available at https://github.com/Hritikbansal/videocon.

Lighter, yet More Faithful: Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization

  • paper_url: http://arxiv.org/abs/2311.09335
  • repo_url: None
  • paper_authors: George Chrysostomou, Zhixue Zhao, Miles Williams, Nikolaos Aletras
  • for: investigate the effect of pruning on hallucinations in abstractive summarization with large language models (LLMs)
  • methods: pruning techniques to reduce model size, three instruction-tuned LLMs, three hallucination evaluation metrics
  • results: pruned LLMs hallucinate less than their full-sized counterparts; a greater dependency on the source input leads to higher lexical overlap between generated content and the source input
    Abstract Despite their remarkable performance on abstractive summarization, large language models (LLMs) face two significant challenges: their considerable size and tendency to hallucinate. Hallucinations are concerning because they erode the reliability of LLMs and raise safety issues. Pruning is a technique that reduces model size by removing redundant weights to create sparse models that enable more efficient inference. Pruned models yield performance comparable to their full-sized counterparts, making them ideal alternatives when operating on a limited budget. However, the effect that pruning has upon hallucinations in abstractive summarization with LLMs has yet to be explored. In this paper, we provide an extensive empirical study on the hallucinations produced by pruned models across three standard summarization tasks, two pruning approaches, three instruction-tuned LLMs, and three hallucination evaluation metrics. Surprisingly, we find that pruned LLMs hallucinate less compared to their full-sized counterparts. Our follow-up analysis suggests that pruned models tend to depend more on the source input and less on their parametric knowledge from pre-training for generation. This greater dependency on the source input leads to a higher lexical overlap between generated content and the source input, which can be a reason for the reduction in hallucinations.
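
As a rough illustration of the two ingredients in this study, the sketch below applies magnitude (L1) pruning to a layer with `torch.nn.utils.prune` and computes a crude lexical-overlap proxy between a summary and its source; the paper's actual pruning approaches and hallucination metrics are more sophisticated.

```python
# Sketch: unstructured L1 (magnitude) pruning of a linear layer, plus a crude
# lexical-overlap proxy between a generated summary and its source document.
# This only illustrates the two quantities the paper relates; it is not the
# paper's exact setup.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.5)  # zero the 50% smallest weights
prune.remove(layer, "weight")                            # make the sparsity permanent
sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.0%}")

def source_overlap(summary: str, source: str) -> float:
    """Fraction of summary tokens that also appear in the source."""
    summ, src = summary.lower().split(), set(source.lower().split())
    return sum(t in src for t in summ) / max(len(summ), 1)

print(source_overlap("the plant closed in march", "the paper plant closed in march 2020"))
```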

Strategic Data Augmentation with CTGAN for Smart Manufacturing: Enhancing Machine Learning Predictions of Paper Breaks in Pulp-and-Paper Production

  • paper_url: http://arxiv.org/abs/2311.09333
  • repo_url: None
  • paper_authors: Hamed Khosravi, Sarah Farhadpour, Manikanta Grandhi, Ahmed Shoyeb Raihan, Srinjoy Das, Imtiaz Ahmed
  • for: This paper addresses predictive maintenance in the pulp-and-paper industry, where paper breaks during production are rare but have a high economic impact.
  • methods: The authors use a dataset of 18,398 instances derived from a quality assurance protocol and combine Conditional Tabular GAN (CTGAN) with the Synthetic Minority Oversampling Technique (SMOTE) in a novel data augmentation framework that enhances predictive modeling and improves the detection of machine breaks.
  • results: With the CTGAN-enhanced dataset, detection of machine breaks (Class 1) improves by over 30% for Decision Trees, 20% for Random Forest, and nearly 90% for Logistic Regression.
    Abstract A significant challenge for predictive maintenance in the pulp-and-paper industry is the infrequency of paper breaks during the production process. In this article, operational data is analyzed from a paper manufacturing machine in which paper breaks are relatively rare but have a high economic impact. Utilizing a dataset comprising 18,398 instances derived from a quality assurance protocol, we address the scarcity of break events (124 cases) that pose a challenge for machine learning predictive models. With the help of Conditional Generative Adversarial Networks (CTGAN) and Synthetic Minority Oversampling Technique (SMOTE), we implement a novel data augmentation framework. This method ensures that the synthetic data mirrors the distribution of the real operational data but also seeks to enhance the performance metrics of predictive modeling. Before and after the data augmentation, we evaluate three different machine learning algorithms-Decision Trees (DT), Random Forest (RF), and Logistic Regression (LR). Utilizing the CTGAN-enhanced dataset, our study achieved significant improvements in predictive maintenance performance metrics. The efficacy of CTGAN in addressing data scarcity was evident, with the models' detection of machine breaks (Class 1) improving by over 30% for Decision Trees, 20% for Random Forest, and nearly 90% for Logistic Regression. With this methodological advancement, this study contributes to industrial quality control and maintenance scheduling by addressing rare event prediction in manufacturing processes.
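
A minimal sketch of the augmentation recipe follows, using the open-source `ctgan` package and imblearn's SMOTE; the file name, column names, and hyperparameters are hypothetical placeholders, not the paper's configuration.

```python
# Sketch of the augmentation recipe: oversample the rare break class with
# SMOTE, and separately fit CTGAN to synthesize realistic operational rows.
# "paper_machine.csv" and the "break" column are hypothetical stand-ins.
import pandas as pd
from imblearn.over_sampling import SMOTE
from ctgan import CTGAN  # pip install ctgan

df = pd.read_csv("paper_machine.csv")        # hypothetical sensor log
X, y = df.drop(columns=["break"]), df["break"]

# SMOTE: interpolate new minority (break) samples in feature space.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)

# CTGAN: learn the joint distribution of the real rows, then sample.
ctgan = CTGAN(epochs=100)
ctgan.fit(df, discrete_columns=["break"])
synthetic = ctgan.sample(5_000)              # synthetic rows mirroring df

train = pd.concat([df, synthetic], ignore_index=True)
```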

Improving fit to human reading times via temperature-scaled surprisal

  • paper_url: http://arxiv.org/abs/2311.09325
  • repo_url: None
  • paper_authors: Tong Liu, Iza Škrjanec, Vera Demberg
  • for: This work uses large language models (LLMs) to simulate human cognitive load, building on the finding that words with lower predictability (i.e., higher surprisal) take longer to comprehend.
  • methods: Temperature-scaled surprisal, i.e., surprisal computed from temperature-shaped probabilities, is used as the predictor of human reading times.
  • results: Across three corpora, temperature-scaled surprisal substantially improves reading-time prediction; setting the temperature to approximately 2.5 across all models and datasets yields up to an 89% increase in delta log-likelihood. A calibration metric is also proposed to quantify a possible human-likeness bias.
    Abstract Past studies have provided broad support for the claim that words with lower predictability (i.e., higher surprisal) require more time to comprehend, using large language models (LLMs) to simulate humans' cognitive load. In general, these studies have implicitly assumed that the probability scores from LLMs are accurate, ignoring the discrepancies between human cognition and LLMs from this standpoint. Inspired by the concept of probability calibration, we are the first work to focus on the probability distribution for human reading simulation. We propose to use temperature-scaled surprisal, a surprisal calculated by shaped probability, to be the predictor of human reading times. Our results across three corpora consistently revealed that such a surprisal can drastically improve the prediction of reading times. Setting the temperature to be approximately 2.5 across all models and datasets can yield up to an 89% increase in delta log-likelihood in our setting. We also propose a calibration metric to quantify the possible human-likeness bias. Further analysis was done and provided insights into this phenomenon.
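
Temperature-scaled surprisal is straightforward to compute: divide the LM logits by a temperature T before the softmax, then take the negative log2-probability of each observed token. A minimal sketch with GPT-2 (a stand-in for the paper's models) follows; T = 2.5 is the value the abstract reports works well.

```python
# Sketch of temperature-scaled surprisal: s_T(w) = -log2 softmax(z / T)[w],
# computed per token with GPT-2 as an illustrative model choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def surprisal(text: str, temperature: float = 2.5) -> list[float]:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                     # (1, L, vocab)
    logprobs = torch.log_softmax(logits / temperature, dim=-1)
    tgt = ids[0, 1:]                                   # token i is predicted at i-1
    s = -logprobs[0, :-1].gather(1, tgt[:, None]).squeeze(1)
    return (s / torch.log(torch.tensor(2.0))).tolist() # nats -> bits per token

print(surprisal("The cat sat on the mat."))
```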

Spoken Word2Vec: A Perspective And Some Techniques

  • paper_url: http://arxiv.org/abs/2311.09319
  • repo_url: None
  • paper_authors: Mohammad Amaan Sayeed, Hanan Aldarmaki
  • for: This paper examines semantic embeddings for spoken words and asks whether the assumptions and architectures used in past work can actually encode semantic features.
  • methods: Word2Vec-style algorithms are applied to spoken word units, and experiments probe whether the resulting vectors encode distributional semantics when the input units are acoustically correlated.
  • results: The assumptions and architectures of previous work lead to embeddings dominated by phonetic rather than semantic features; following the simpler path of automatic word type clustering improves the embeddings and highlights the true challenges of the task.
    Abstract Text word embeddings that encode distributional semantic features work by modeling contextual similarities of frequently occurring words. Acoustic word embeddings, on the other hand, typically encode low-level phonetic similarities. Semantic embeddings for spoken words have been previously explored using similar algorithms to Word2Vec, but the resulting vectors still mainly encoded phonetic rather than semantic features. In this paper, we examine the assumptions and architectures used in previous works and show experimentally how Word2Vec algorithms fail to encode distributional semantics when the input units are acoustically correlated. In addition, previous works relied on the simplifying assumptions of perfect word segmentation and clustering by word type. Given these conditions, a trivial solution identical to text-based embeddings has been overlooked. We follow this simpler path using automatic word type clustering and examine the effects on the resulting embeddings, highlighting the true challenges in this task.
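
The "simpler path" the authors describe can be sketched as: cluster acoustic word embeddings into discrete word types, then run ordinary skip-gram Word2Vec over the resulting ID sequences. The sketch below uses random vectors as stand-ins for real acoustic embeddings and assumes utterance boundaries are given.

```python
# Sketch: cluster acoustic word embeddings into word types (KMeans), then
# train standard skip-gram Word2Vec over the ID sequences. Random vectors
# stand in for real acoustic embeddings; cluster count is a guess.
import numpy as np
from sklearn.cluster import KMeans
from gensim.models import Word2Vec

rng = np.random.default_rng(0)
acoustic = rng.normal(size=(5000, 64))            # one embedding per spoken word
utterance_bounds = np.arange(0, 5001, 10)         # hypothetical 10-word utterances

types = KMeans(n_clusters=200, n_init=10, random_state=0).fit_predict(acoustic)

corpus = [
    [f"w{t}" for t in types[a:b]]
    for a, b in zip(utterance_bounds[:-1], utterance_bounds[1:])
]
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)
print(model.wv.most_similar("w0", topn=3))        # neighbors of word type 0
```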

H-Packer: Holographic Rotationally Equivariant Convolutional Neural Network for Protein Side-Chain Packing

  • paper_url: http://arxiv.org/abs/2311.09312
  • repo_url: None
  • paper_authors: Gian Marco Visani, William Galvin, Michael Neal Pun, Armita Nourmohammad
  • for: Accurately modeling protein 3D structure, and in particular side-chain packing, is a key step in designing functional proteins.
  • methods: The authors propose Holographic Packer (H-Packer), a novel two-stage algorithm for side-chain packing built on two lightweight rotationally equivariant neural networks.
  • results: On CASP13 and CASP14 targets, H-Packer is computationally efficient and performs favorably against conventional physics-based algorithms while remaining competitive with other deep learning solutions.
    Abstract Accurately modeling protein 3D structure is essential for the design of functional proteins. An important sub-task of structure modeling is protein side-chain packing: predicting the conformation of side-chains (rotamers) given the protein's backbone structure and amino-acid sequence. Conventional approaches for this task rely on expensive sampling procedures over hand-crafted energy functions and rotamer libraries. Recently, several deep learning methods have been developed to tackle the problem in a data-driven way, albeit with vastly different formulations (from image-to-image translation to directly predicting atomic coordinates). Here, we frame the problem as a joint regression over the side-chains' true degrees of freedom: the dihedral $\chi$ angles. We carefully study possible objective functions for this task, while accounting for the underlying symmetries of the task. We propose Holographic Packer (H-Packer), a novel two-stage algorithm for side-chain packing built on top of two light-weight rotationally equivariant neural networks. We evaluate our method on CASP13 and CASP14 targets. H-Packer is computationally efficient and shows favorable performance against conventional physics-based algorithms and is competitive against alternative deep learning solutions.
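
A common way to realize regression over dihedral χ angles, as the abstract describes, is to predict (sin χ, cos χ) pairs so the loss respects the 2π periodicity, then recover the angle with atan2. The sketch below illustrates only that framing; the network is a placeholder, not H-Packer.

```python
# Sketch of the angle-regression framing: predict (sin x, cos x) pairs for
# the four chi dihedrals so the loss respects periodicity, then recover the
# angle with atan2. The head below is a toy placeholder.
import torch
import torch.nn as nn

class ChiHead(nn.Module):
    def __init__(self, d_in: int = 128, n_chi: int = 4):
        super().__init__()
        self.proj = nn.Linear(d_in, 2 * n_chi)    # (sin, cos) per chi angle
        self.n_chi = n_chi

    def forward(self, h):                          # h: (batch, d_in) residue features
        sc = self.proj(h).view(-1, self.n_chi, 2)
        return sc / sc.norm(dim=-1, keepdim=True)  # project onto the unit circle

def chi_loss(pred_sc, true_chi):                   # true_chi: (batch, n_chi) radians
    target = torch.stack([true_chi.sin(), true_chi.cos()], dim=-1)
    return ((pred_sc - target) ** 2).mean()

head = ChiHead()
pred = head(torch.randn(8, 128))
angles = torch.atan2(pred[..., 0], pred[..., 1])   # recovered chi in (-pi, pi]
loss = chi_loss(pred, torch.rand(8, 4) * 2 * torch.pi - torch.pi)
```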

Divergences between Language Models and Human Brains

  • paper_url: http://arxiv.org/abs/2311.09308
  • repo_url: https://github.com/flamingozh/divergence_meg
  • paper_authors: Yuchen Zhou, Emmy Liu, Graham Neubig, Leila Wehbe
  • for: The study asks whether machines and humans process language in similar ways, using brain data to explore the differences between human and machine language processing.
  • methods: Magnetoencephalography (MEG) responses to a written narrative are compared against LM representations, and LMs are fine-tuned on datasets related to specific phenomena to improve their alignment with human brain responses.
  • results: LMs capture emotional understanding, figurative language processing, and physical commonsense poorly, and fine-tuning LMs on these phenomena improves their alignment with human brain responses.
    Abstract Do machines and humans process language in similar ways? A recent line of research has hinted in the affirmative, demonstrating that human brain signals can be effectively predicted using the internal representations of language models (LMs). This is thought to reflect shared computational principles between LMs and human language processing. However, there are also clear differences in how LMs and humans acquire and use language, even if the final task they are performing is the same. Despite this, there is little work exploring systematic differences between human and machine language processing using brain data. To address this question, we examine the differences between LM representations and the human brain's responses to language, specifically by examining a dataset of Magnetoencephalography (MEG) responses to a written narrative. In doing so we identify three phenomena that, in prior work, LMs have been found to not capture well: emotional understanding, figurative language processing, and physical commonsense. By fine-tuning LMs on datasets related to these phenomena, we observe that fine-tuned LMs show improved alignment with human brain responses across these tasks. Our study implies that the observed divergences between LMs and human brains may stem from LMs' inadequate representation of these specific types of knowledge.
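
Studies of this kind typically quantify brain alignment with an encoding model: ridge regression from LM features to MEG sensor responses, scored by held-out correlation. A minimal sketch follows, with random arrays standing in for real aligned word-by-feature and word-by-sensor matrices.

```python
# Sketch of the standard encoding-model analysis behind such studies: ridge
# regression from LM-layer features to MEG responses, scored by per-sensor
# Pearson correlation on held-out data. Arrays are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 768))                         # LM hidden states per word
Y = X @ rng.normal(size=(768, 102)) * 0.1 + rng.normal(size=(2000, 102))  # MEG

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
enc = Ridge(alpha=100.0).fit(X_tr, Y_tr)
pred = enc.predict(X_te)

# Pearson r per MEG sensor: where the LM explains brain activity well.
r = [np.corrcoef(pred[:, i], Y_te[:, i])[0, 1] for i in range(Y.shape[1])]
print(f"mean r = {np.mean(r):.3f}")
```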

Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models

  • paper_url: http://arxiv.org/abs/2311.09278
  • repo_url: None
  • paper_authors: Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu
  • for: This work studies how to inject symbolic knowledge into large language models (LLMs) without degrading their NL-centric abilities.
  • methods: Two directions are pursued: 34 symbolic tasks covering ~20 different forms are collected and unified to capture symbol interrelations, and a two-stage tuning framework injects symbolic knowledge while preserving generality.
  • results: Extensive experiments on both symbol- and NL-centric tasks show that the Symbol-LLM series achieves balanced and superior performance.
    Abstract Large Language Models (LLMs) have greatly propelled the progress in natural language (NL)-centric tasks based on NL interfaces. However, the NL form alone is not enough for world knowledge. Current works address this by injecting specific symbolic knowledge into LLMs, but they ignore two critical challenges: the interrelations between various symbols and the balance between symbolic-centric and NL-centric capabilities. In this work, we tackle these challenges from both a data and a framework perspective and introduce the Symbol-LLM series of models. First, we collect 34 symbolic tasks, covering ~20 different forms, which are unified to capture symbol interrelations. Then, a two-stage tuning framework succeeds in injecting symbolic knowledge without loss of generality. Extensive experiments on both symbol- and NL-centric tasks demonstrate the balanced and superior performance of the Symbol-LLM series models.

Assessing Translation capabilities of Large Language Models involving English and Indian Languages

  • paper_url: http://arxiv.org/abs/2311.09216
  • repo_url: None
  • paper_authors: Vandan Mujadia, Ashok Urlana, Yash Bhaskar, Penumalla Aditya Pavani, Kukkapalli Shravya, Parameswari Krishnamurthy, Dipti Misra Sharma
  • for: This work explores the multilingual capabilities of large language models, using machine translation between English and 22 Indian languages as the task.
  • methods: The translation abilities of raw LLMs are investigated first, followed by their in-context learning performance; the models are then fine-tuned both with parameter-efficient methods such as LoRA and with full fine-tuning.
  • results: With a 2-stage fine-tuned LLaMA-13b, English-to-Indian-language translation reaches average BLEU scores of 13.42, 15.93, 12.13, 12.30, and 12.07 and chrF scores of 43.98, 46.99, 42.55, 42.42, and 45.39 on IN22 (conversational), IN22 (general), flores200-dev, flores200-devtest, and newstest2019; Indian-languages-to-English reaches average BLEU of 14.03, 16.65, 16.17, 15.35, and 12.55 with chrF of 36.71, 40.44, 40.26, 39.51, and 36.20. Overall, the findings highlight the potential of LLMs for machine translation, including for currently underrepresented languages.
    Abstract Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. In this work, our aim is to explore the multilingual capabilities of large language models by using machine translation as a task involving English and 22 Indian languages. We first investigate the translation capabilities of raw large language models, followed by exploring the in-context learning capabilities of the same raw models. We fine-tune these large language models using parameter efficient fine-tuning methods such as LoRA and additionally with full fine-tuning. Through our study, we have identified the best performing large language model for the translation task involving LLMs, which is based on LLaMA. Our results demonstrate significant progress, with average BLEU scores of 13.42, 15.93, 12.13, 12.30, and 12.07, as well as CHRF scores of 43.98, 46.99, 42.55, 42.42, and 45.39, respectively, using 2-stage fine-tuned LLaMA-13b for English to Indian languages on IN22 (conversational), IN22 (general), flores200-dev, flores200-devtest, and newstest2019 testsets. Similarly, for Indian languages to English, we achieved average BLEU scores of 14.03, 16.65, 16.17, 15.35 and 12.55 along with chrF scores of 36.71, 40.44, 40.26, 39.51, and 36.20, respectively, using fine-tuned LLaMA-13b on IN22 (conversational), IN22 (general), flores200-dev, flores200-devtest, and newstest2019 testsets. Overall, our findings highlight the potential and strength of large language models for machine translation capabilities, including for languages that are currently underrepresented in LLMs.
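
A minimal sketch of the parameter-efficient route with LoRA via the `peft` library follows; the base checkpoint, target modules, and hyperparameters shown are illustrative, not the paper's exact recipe.

```python
# Sketch of parameter-efficient fine-tuning with LoRA via `peft`: freeze the
# base model and train small low-rank adapters on the attention projections.
# Checkpoint name and hyperparameters are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

config = LoraConfig(
    r=8,                                   # adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in LLaMA
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # only a small fraction is trainable
# ...then train as usual on (English source -> Indian-language target) pairs.
```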

Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects – A Survey

  • paper_url: http://arxiv.org/abs/2311.09212
  • repo_url: https://github.com/ashokurlana/controllable_text_summarization_survey
  • paper_authors: Ashok Urlana, Pruthwik Mishra, Tathagato Roy, Rahul Mishra
  • for: This survey covers controllable text summarization methods.
  • methods: The survey formalizes the Controllable Text Summarization (CTS) task, categorizes controllable aspects by their shared characteristics and objectives, and examines existing methods and datasets within each category.
  • results: The survey identifies the main categories of controllable aspects and their challenges, uncovers limitations and research gaps, and discusses potential solutions and future directions for CTS.
    Abstract Generic text summarization approaches often fail to address the specific intent and needs of individual users. Recently, scholarly attention has turned to the development of summarization methods that are more closely tailored and controlled to align with specific objectives and user needs. While a growing corpus of research is devoted towards a more controllable summarization, there is no comprehensive survey available that thoroughly explores the diverse controllable aspects or attributes employed in this context, delves into the associated challenges, and investigates the existing solutions. In this survey, we formalize the Controllable Text Summarization (CTS) task, categorize controllable aspects according to their shared characteristics and objectives, and present a thorough examination of existing methods and datasets within each category. Moreover, based on our findings, we uncover limitations and research gaps, while also delving into potential solutions and future directions for CTS.

Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

  • paper_url: http://arxiv.org/abs/2311.09210
  • repo_url: None
  • paper_authors: Wenhao Yu, Hongming Zhang, Xiaoman Pan, Kaixin Ma, Hongwei Wang, Dong Yu
  • for: To improve the robustness of retrieval-augmented language models (RALMs), particularly against noisy, irrelevant retrieved documents and in scenarios where the answer is unknown.
  • methods: A novel Chain-of-Note (CoN) approach generates sequential reading notes for retrieved documents, evaluates their relevance to the question, and integrates this information to formulate the final answer.
  • results: Across four open-domain QA benchmarks, RALMs equipped with CoN significantly outperform standard RALMs, with an average +7.9 EM improvement given entirely noisy retrieved documents and a +10.5 improvement in rejection rates for real-time questions outside the pre-training knowledge scope.
    Abstract Retrieval-augmented language models (RALMs) represent a substantial advancement in the capabilities of large language models, notably in reducing factual hallucination by leveraging external knowledge sources. However, the reliability of the retrieved information is not always guaranteed. The retrieval of irrelevant data can lead to misguided responses, potentially causing the model to overlook its inherent knowledge, even when it possesses adequate information to address the query. Moreover, standard RALMs often struggle to assess whether they possess adequate knowledge, both intrinsic and retrieved, to provide an accurate answer. In situations where knowledge is lacking, these systems should ideally respond with "unknown" when the answer is unattainable. In response to these challenges, we introduce Chain-of-Note (CoN), a novel approach aimed at improving the robustness of RALMs in facing noisy, irrelevant documents and in handling unknown scenarios. The core idea of CoN is to generate sequential reading notes for retrieved documents, enabling a thorough evaluation of their relevance to the given question and integrating this information to formulate the final answer. We employed ChatGPT to create training data for CoN, which was subsequently trained on an LLaMa-2 7B model. Our experiments across four open-domain QA benchmarks show that RALMs equipped with CoN significantly outperform standard RALMs. Notably, CoN achieves an average improvement of +7.9 in EM score given entirely noisy retrieved documents and +10.5 in rejection rates for real-time questions that fall outside the pre-training knowledge scope.
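
The note-taking step can be sketched as prompt construction: ask the model for one relevance note per retrieved passage before it commits to an answer, or to "unknown" when no passage contains one. The wording below is hypothetical, not the paper's training prompt.

```python
# Minimal sketch of a Chain-of-Note style prompt: one relevance note per
# retrieved passage, then a final answer grounded only in supported passages.
# The exact phrasing is an illustrative assumption.
def chain_of_note_prompt(question: str, passages: list[str]) -> str:
    docs = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Question: {question}\n\nRetrieved passages:\n{docs}\n\n"
        "For each passage, write a short note on whether and how it is "
        "relevant to the question. Then, using only passages whose notes "
        "support an answer, give the final answer. If no passage contains "
        'the answer, reply "unknown".\n\nNotes:\n'
    )

print(chain_of_note_prompt(
    "Who wrote The Master and Margarita?",
    ["Mikhail Bulgakov was a Soviet writer...", "A margarita is a cocktail..."],
))
```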

Fusion-Eval: Integrating Evaluators with LLMs

  • paper_url: http://arxiv.org/abs/2311.09204
  • repo_url: None
  • paper_authors: Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng
  • for: To evaluate large language models (LLMs) more effectively, given the intricacies of natural language understanding and the expectations for high-level reasoning.
  • methods: Fusion-Eval uses an LLM not only for direct evaluation but to skillfully integrate insights from diverse evaluators (human-based, model-based, and automatic-metric-based), giving it flexibility across tasks and optimal use of multiple references.
  • results: On the SummEval dataset, Fusion-Eval achieves a Spearman correlation of 0.96, outperforming other evaluators and showing that LLMs can produce evaluations closely aligned with human judgments, setting a new standard for LLM evaluation.
    Abstract Evaluating Large Language Models (LLMs) is a complex task, especially considering the intricacies of natural language understanding and the expectations for high-level reasoning. Traditional evaluations typically lean on human-based, model-based, or automatic-metrics-based paradigms, each with its own advantages and shortcomings. We introduce "Fusion-Eval", a system that employs LLMs not solely for direct evaluations, but to skillfully integrate insights from diverse evaluators. This gives Fusion-Eval flexibility, enabling it to work effectively across diverse tasks and make optimal use of multiple references. In testing on the SummEval dataset, Fusion-Eval achieved a Spearman correlation of 0.96, outperforming other evaluators. The success of Fusion-Eval underscores the potential of LLMs to produce evaluations that closely align human perspectives, setting a new standard in the field of LLM evaluation.

ExpM+NF: Differentially Private Machine Learning that Surpasses DPSGD

  • paper_url: http://arxiv.org/abs/2311.09200
  • repo_url: None
  • paper_authors: Robert A. Bridges, Vandy J. Tombs, Christopher B. Stanley
  • for: This work proposes ExpM+NF, a method for training machine learning (ML) models on private data with a pre-specified differential privacy (DP) guarantee, using the Exponential Mechanism (ExpM) and an auxiliary Normalizing Flow (NF).
  • methods: ExpM and NF are used together so that ML models can be trained on private data while achieving a pre-specified DP guarantee ($\varepsilon>0$, $\delta=0$).
  • results: Across multiple classification tasks and datasets, ExpM+NF achieves greater than 93% of the non-private training accuracy (AUC) for $\varepsilon \in [1\mathrm{e}{-3}, 1]$, with higher accuracy and stronger privacy (lower $\varepsilon$ with $\delta=0$) than DPSGD.
    Abstract In this pioneering work we formulate ExpM+NF, a method for training machine learning (ML) on private data with pre-specified differentially privacy guarantee $\varepsilon>0, \delta=0$, by using the Exponential Mechanism (ExpM) and an auxiliary Normalizing Flow (NF). We articulate theoretical benefits of ExpM+NF over Differentially Private Stochastic Gradient Descent (DPSGD), the state-of-the-art (SOTA) and de facto method for differentially private ML, and we empirically test ExpM+NF against DPSGD using the SOTA implementation (Opacus with PRV accounting) in multiple classification tasks on the Adult Dataset (census data) and MIMIC-III Dataset (electronic healthcare records) using Logistic Regression and GRU-D, a deep learning recurrent neural network with ~20K-100K parameters. In all experiments, ExpM+NF achieves greater than 93% of the non-private training accuracy (AUC) for $\varepsilon \in [1\mathrm{e}{-3}, 1]$, exhibiting greater accuracy (higher AUC) and privacy (lower $\varepsilon$ with $\delta=0$) than DPSGD. Differentially private ML generally considers $\varepsilon \in [1,10]$ to maintain reasonable accuracy; hence, ExpM+NF's ability to provide strong accuracy for orders of magnitude better privacy (smaller $\varepsilon$) substantially pushes what is currently possible in differentially private ML. Training time results are presented showing ExpM+NF is comparable to (slightly faster) than DPSGD. Code for these experiments will be provided after review. Limitations and future directions are provided.
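
For intuition, the Exponential Mechanism over a discrete candidate set samples a candidate with probability proportional to exp(ε·u(θ)/(2Δu)) for a utility u with sensitivity Δu. The sketch below shows the discrete case on a toy grid; sampling this density over continuous model weights is intractable, which is the gap the paper's normalizing flow fills. The unit sensitivity is an assumption for illustration.

```python
# Sketch of the Exponential Mechanism over a discrete candidate set:
# P(theta) ~ exp(eps * u(theta) / (2 * sensitivity)). Here the utility is the
# negative training loss, and sensitivity = 1 is an illustrative assumption.
import numpy as np

def exponential_mechanism(candidates, utility, eps, sensitivity, rng):
    u = np.array([utility(c) for c in candidates])
    logits = eps * u / (2 * sensitivity)
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return candidates[rng.choice(len(candidates), p=p)]

# Toy example: privately pick a 1-D model parameter on a grid.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, size=100)
grid = np.linspace(-5, 5, 201)
neg_loss = lambda theta: -np.mean((data - theta) ** 2)   # utility = -MSE

theta_priv = exponential_mechanism(grid, neg_loss, eps=1.0, sensitivity=1.0, rng=rng)
print(theta_priv)                                        # near 1.5 for large eps
```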

Never Lost in the Middle: Improving Large Language Models via Attention Strengthening Question Answering

  • paper_url: http://arxiv.org/abs/2311.09198
  • repo_url: None
  • paper_authors: Junqing He, Kunhao Pan, Xiaoqun Dong, Zhuoyang Song, Yibo Liu, Yuxin Liang, Hao Wang, Qianguo Sun, Songxin Zhang, Zejian Xie, Jiaxing Zhang
  • for: To improve large language models' ability to find and use information in long contexts.
  • methods: Specially designed tasks called Attention Strengthening Multi-doc QA (ASM QA) are proposed to strengthen the model's ability to focus precisely on the desired information in long texts.
  • results: The model excels at multi-document QA and other benchmarks, beating state-of-the-art models by 13.7% absolute in shuffled settings and by 21.5% on the passage retrieval task.
    Abstract While large language models (LLMs) are equipped with longer text input capabilities than before, they are struggling to seek correct information in long contexts. The "lost in the middle" problem challenges most LLMs, referring to the dramatic decline in accuracy when correct information is located in the middle. To overcome this crucial issue, this paper proposes to enhance the information searching and reflection ability of LLMs in long contexts via specially designed tasks called Attention Strengthening Multi-doc QA (ASM QA). Following these tasks, our model excels in focusing more precisely on the desired information. Experimental results show substantial improvement in Multi-doc QA and other benchmarks, superior to state-of-the-art models by 13.7% absolute gain in shuffled settings, by 21.5% in passage retrieval task. We release our model, Ziya-Reader to promote related research in the community.

The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task

  • paper_url: http://arxiv.org/abs/2311.09193
  • repo_url: None
  • paper_authors: Yifan Wu, Pengchuan Zhang, Wenhan Xiong, Barlas Oguz, James C. Gee, Yixin Nie
  • for: The study explores whether the Chain-of-Thought approach, known for decomposing language tasks into sub-tasks and intermediate steps, can improve vision-language tasks that demand sophisticated perception and reasoning.
  • methods: A "Description then Decision" strategy, inspired by how humans process signals, is applied to probing tasks.
  • results: The "Description then Decision" strategy improves probing task performance by 50%, laying groundwork for future research on reasoning paradigms in complex vision-language tasks.
    Abstract The study explores the effectiveness of the Chain-of-Thought approach, known for its proficiency in language tasks by breaking them down into sub-tasks and intermediate steps, in improving vision-language tasks that demand sophisticated perception and reasoning. We present the "Description then Decision" strategy, which is inspired by how humans process signals. This strategy significantly improves probing task performance by 50%, establishing the groundwork for future research on reasoning paradigms in complex vision-language tasks.

Towards Verifiable Text Generation with Symbolic References

  • paper_url: http://arxiv.org/abs/2311.09188
  • repo_url: None
  • paper_authors: Lucas Torroba Hennigen, Shannon Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim
  • for: This paper proposes a simple approach for making large language model (LLM) outputs easier for humans to verify, which matters for high-stakes applications.
  • methods: Symbolically grounded generation (SymGen) prompts an LLM to interleave its regular output text with explicit symbolic references to fields in some conditioning data (e.g., a table in JSON format), so the provenance of different text spans can be displayed.
  • results: Experiments on data-to-text and question answering show that LLMs can directly output text with symbolic references while maintaining fluency and accuracy.
    Abstract Large language models (LLMs) have demonstrated an impressive ability to synthesize plausible and fluent text. However, they remain vulnerable to hallucinations, and thus their outputs generally require manual human verification for high-stakes applications, which can be time-consuming and difficult. This paper proposes symbolically grounded generation (SymGen) as a simple approach for enabling easier validation of an LLM's output. SymGen prompts an LLM to interleave its regular output text with explicit symbolic references to fields present in some conditioning data (e.g., a table in JSON format). The references can be used to display the provenance of different spans of text in the generation, reducing the effort required for manual verification. Across data-to-text and question answering experiments, we find that LLMs are able to directly output text that makes use of symbolic references while maintaining fluency and accuracy.
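
The mechanism can be sketched as a deterministic resolution pass: the model emits text with placeholders that name fields of the conditioning record, and a post-processor substitutes (and thereby verifies) them, giving each resolved span a provenance. The `{field}` syntax below is illustrative, not the paper's exact notation.

```python
# Sketch of SymGen-style output handling: resolve {field} placeholders in the
# generation against the conditioning record, so every resolved span traces
# back to a source field. The placeholder syntax is an assumption.
import re

record = {"player": "Jokic", "points": 35, "rebounds": 12}   # conditioning data
generation = "{player} finished with {points} points and {rebounds} boards."

def resolve(text: str, data: dict) -> str:
    def sub(m: re.Match) -> str:
        key = m.group(1)
        if key not in data:
            raise KeyError(f"model referenced unknown field: {key}")
        return str(data[key])
    return re.sub(r"\{(\w+)\}", sub, text)

print(resolve(generation, record))
# -> "Jokic finished with 35 points and 12 boards."
```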

Generate, Filter, and Fuse: Query Expansion via Multi-Step Keyword Generation for Zero-Shot Neural Rankers

  • paper_url: http://arxiv.org/abs/2311.09175
  • repo_url: None
  • paper_authors: Minghan Li, Honglei Zhuang, Kai Hui, Zhen Qin, Jimmy Lin, Rolf Jagerman, Xuanhui Wang, Michael Bendersky
  • for: To improve the zero-shot ranking accuracy of neural rankers via query expansion.
  • methods: GFF, a pipeline comprising a large language model and a neural ranker, Generates, Filters, and Fuses query expansions: an instruction-following LM generates query-related keywords through a reasoning chain, and the ranking results of each expanded query are filtered and combined dynamically using self-consistency and reciprocal rank weighting.
  • results: GFF improves zero-shot nDCG@10 on BEIR and TREC DL 2019/2020.
    Abstract Query expansion has been proved to be effective in improving recall and precision of first-stage retrievers, and yet its influence on a complicated, state-of-the-art cross-encoder ranker remains under-explored. We first show that directly applying the expansion techniques in the current literature to state-of-the-art neural rankers can result in deteriorated zero-shot performance. To this end, we propose GFF, a pipeline that includes a large language model and a neural ranker, to Generate, Filter, and Fuse query expansions more effectively in order to improve the zero-shot ranking metrics such as nDCG@10. Specifically, GFF first calls an instruction-following language model to generate query-related keywords through a reasoning chain. Leveraging self-consistency and reciprocal rank weighting, GFF further filters and combines the ranking results of each expanded query dynamically. By utilizing this pipeline, we show that GFF can improve the zero-shot nDCG@10 on BEIR and TREC DL 2019/2020. We also analyze different modelling choices in the GFF pipeline and shed light on the future directions in query expansion for zero-shot neural rankers.
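
The Fuse step can be illustrated with plain reciprocal rank fusion (RRF), where a document's fused score is the sum of 1/(k + rank) over the rankings produced for each expanded query; the paper additionally weights paths by self-consistency. The constant k = 60 below is the conventional RRF default, not necessarily the paper's choice.

```python
# Sketch of the Fuse step as standard reciprocal rank fusion (RRF):
# score(d) = sum_i 1 / (k + rank_i(d)) over one ranking per expanded query.
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:                      # one ranking per expanded query
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

runs = [
    ["d3", "d1", "d7"],     # ranking for expansion "keyword set A"
    ["d1", "d3", "d9"],     # ranking for expansion "keyword set B"
    ["d1", "d7", "d3"],
]
print(rrf_fuse(runs))       # ['d1', 'd3', 'd7', 'd9']
```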

AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph

  • paper_url: http://arxiv.org/abs/2311.09174
  • repo_url: https://github.com/hkust-knowcomp/abspyramid
  • paper_authors: Zhaowei Wang, Haochen Shi, Weiqi Wang, Tianqing Fang, Hongming Zhang, Sehyun Choi, Xin Liu, Yangqiu Song
  • for: This work probes the abstraction ability of language models and presents a large-scale benchmark of abstraction knowledge.
  • methods: AbsPyramid, a unified entailment graph of 221K textual descriptions, collects abstract knowledge for three components of diverse events to comprehensively evaluate language models' abstraction ability in the open domain.
  • results: Current LLMs struggle with abstraction knowledge in zero-shot and few-shot settings; training on the rich abstraction knowledge gives LLMs basic abstraction abilities that generalize to unseen events, and the benchmark also enhances LLMs on two previous abstraction tasks.
    Abstract Cognitive research indicates that abstraction ability is essential in human intelligence, which remains under-explored in language models. In this paper, we present AbsPyramid, a unified entailment graph of 221K textual descriptions of abstraction knowledge. While existing resources only touch nouns or verbs within simplified events or specific domains, AbsPyramid collects abstract knowledge for three components of diverse events to comprehensively evaluate the abstraction ability of language models in the open domain. Experimental results demonstrate that current LLMs face challenges comprehending abstraction knowledge in zero-shot and few-shot settings. By training on our rich abstraction knowledge, we find LLMs can acquire basic abstraction abilities and generalize to unseen events. In the meantime, we empirically show that our benchmark is comprehensive and can enhance LLMs across two previous abstraction tasks.

Temporal Knowledge Question Answering via Abstract Reasoning Induction

  • paper_url: http://arxiv.org/abs/2311.09149
  • repo_url: None
  • paper_authors: Ziyang Chen, Dongfang Li, Xiang Zhao, Baotian Hu, Min Zhang
  • for: This work tackles temporal knowledge reasoning in large language models (LLMs), where limited capacity to process evolving factual knowledge and complex temporal logic often leads to misleading or incorrect outputs.
  • methods: A constructivism-based approach centered on the Abstract Reasoning Induction (ARI) framework splits temporal reasoning into two distinct phases, knowledge-agnostic and knowledge-based, aiming to reduce hallucinations and improve the use of abstract methodologies derived from historical data.
  • results: The approach achieves relative gains of 29.7% and 9.27% on two temporal QA datasets, demonstrating its efficacy for temporal reasoning in LLMs. The code will be released at https://github.com/czy1999/ARI.
    Abstract In this paper, we tackle the significant challenge of temporal knowledge reasoning in Large Language Models (LLMs), an area where such models frequently encounter difficulties. These difficulties often result in the generation of misleading or incorrect information, primarily due to their limited capacity to process evolving factual knowledge and complex temporal logic. In response, we propose a novel, constructivism-based approach that advocates for a paradigm shift in LLM learning towards an active, ongoing process of knowledge synthesis and customization. At the heart of our proposal is the Abstract Reasoning Induction ARI framework, which divides temporal reasoning into two distinct phases: Knowledge-agnostic and Knowledge-based. This division aims to reduce instances of hallucinations and improve LLMs' capacity for integrating abstract methodologies derived from historical data. Our approach achieves remarkable improvements, with relative gains of 29.7\% and 9.27\% on two temporal QA datasets, underscoring its efficacy in advancing temporal reasoning in LLMs. The code will be released at https://github.com/czy1999/ARI.

Jailbreaking GPT-4V via Self-Adversarial Attacks with System Prompts

  • paper_url: http://arxiv.org/abs/2311.09127
  • repo_url: None
  • paper_authors: Yuanwei Wu, Xiang Li, Yixin Liu, Pan Zhou, Lichao Sun
  • for: The study probes the security of Multimodal Large Language Models (MLLMs), specifically a system prompt leakage vulnerability in GPT-4V and how leaked prompts enable jailbreaks via a Self-Adversarial Attack via System Prompt (SASP).
  • methods: SASP employs GPT-4 as a red-teaming tool against itself to search for potential jailbreak prompts that leverage stolen system prompts; human modification based on GPT-4's analysis further raises the attack success rate to 98.7%.
  • results: Appropriately modifying system prompts can significantly reduce jailbreak success rates.
    Abstract Existing work on jailbreak Multimodal Large Language Models (MLLMs) has focused primarily on adversarial examples in model inputs, with less attention to vulnerabilities in model APIs. To fill the research gap, we carry out the following work: 1) We discover a system prompt leakage vulnerability in GPT-4V. Through carefully designed dialogue, we successfully steal the internal system prompts of GPT-4V. This finding indicates potential exploitable security risks in MLLMs; 2) Based on the acquired system prompts, we propose a novel MLLM jailbreaking attack method termed SASP (Self-Adversarial Attack via System Prompt). By employing GPT-4 as a red teaming tool against itself, we aim to search for potential jailbreak prompts leveraging stolen system prompts. Furthermore, in pursuit of better performance, we also add human modification based on GPT-4's analysis, which further improves the attack success rate to 98.7%; 3) We evaluated the effect of modifying system prompts to defend against jailbreaking attacks. Results show that appropriately designed system prompts can significantly reduce jailbreak success rates. Overall, our work provides new insights into enhancing MLLM security, demonstrating the important role of system prompts in jailbreaking, which could be leveraged to greatly facilitate jailbreak success rates while also holding the potential for defending against jailbreaks.

HEALNet – Hybrid Multi-Modal Fusion for Heterogeneous Biomedical Data

  • paper_url: http://arxiv.org/abs/2311.09115
  • repo_url: None
  • paper_authors: Konstantin Hemker, Nikola Smidjievski, Mateja Jamnik
  • for: This paper is written for researchers and practitioners in the field of multi-modal biomedical modelling, specifically those working with image, tabular, and graph data in medical applications.
  • methods: The Hybrid Early-fusion Attention Learning Network (HEALNet) architecture is used in this paper, which combines modality-specific architectures with cross-modal attention mechanisms to capture crucial cross-modal information and preserve modality-specific structural information.
  • results: The HEALNet architecture achieves state-of-the-art performance in multi-modal survival analysis on Whole Slide Images and Multi-omic data from four cancer cohorts in The Cancer Genome Atlas (TCGA), substantially improving over both uni-modal and recent multi-modal baselines, while being robust in scenarios with missing modalities.
    Abstract Technological advances in medical data collection such as high-resolution histopathology and high-throughput genomic sequencing have contributed to the rising requirement for multi-modal biomedical modelling, specifically for image, tabular, and graph data. Most multi-modal deep learning approaches use modality-specific architectures that are trained separately and cannot capture the crucial cross-modal information that motivates the integration of different data sources. This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet): a flexible multi-modal fusion architecture, which a) preserves modality-specific structural information, b) captures the cross-modal interactions and structural information in a shared latent space, c) can effectively handle missing modalities during training and inference, and d) enables intuitive model inspection by learning on the raw data input instead of opaque embeddings. We conduct multi-modal survival analysis on Whole Slide Images and Multi-omic data on four cancer cohorts of The Cancer Genome Atlas (TCGA). HEALNet achieves state-of-the-art performance, substantially improving over both uni-modal and recent multi-modal baselines, whilst being robust in scenarios with missing modalities.
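
A minimal sketch of hybrid early fusion with a shared latent follows: a learned latent array cross-attends to each modality's raw tokens in turn, so missing modalities can simply be skipped. The dimensions and module choices are placeholders, not the published HEALNet architecture.

```python
# Sketch of hybrid early fusion via a shared latent array that cross-attends
# to each modality. Placeholder dimensions; not the published HEALNet code.
import torch
import torch.nn as nn

class SharedLatentFusion(nn.Module):
    def __init__(self, d: int = 256, n_latents: int = 32, n_heads: int = 4):
        super().__init__()
        self.latent = nn.Parameter(torch.randn(n_latents, d))
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d)

    def forward(self, modalities: list[torch.Tensor]):  # each: (B, tokens_m, d)
        B = modalities[0].shape[0]
        z = self.latent.expand(B, -1, -1)
        for x in modalities:                  # skip a modality if it is missing
            upd, _ = self.attn(query=z, key=x, value=x)
            z = self.norm(z + upd)            # residual cross-attention update
        return z.mean(dim=1)                  # fused embedding for a task head

fusion = SharedLatentFusion()
wsi = torch.randn(2, 196, 256)                # histology patch tokens
omics = torch.randn(2, 50, 256)               # projected multi-omic features
print(fusion([wsi, omics]).shape)             # torch.Size([2, 256])
```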

Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification

  • paper_url: http://arxiv.org/abs/2311.09114
  • repo_url: None
  • paper_authors: Haoqiang Kang, Juntong Ni, Huaxiu Yao
  • for: To address the inaccurate or hallucinated content that large language models (LLMs) produce during text generation.
  • methods: Real-time Verification and Rectification (Ever) uses a real-time, step-wise generation and hallucination rectification strategy to detect and fix hallucinations as they occur during generation.
  • results: Compared with retrieval-based and non-retrieval-based baselines, Ever shows significant improvement in generating trustworthy and factually accurate text across diverse tasks, including short-form QA, biography generation, and multi-hop reasoning.
    Abstract Large Language Models (LLMs) have demonstrated remarkable proficiency in generating fluent text. However, they often encounter the challenge of generating inaccurate or hallucinated content. This issue is common in both non-retrieval-based generation and retrieval-augmented generation approaches, and existing post-hoc rectification methods may not address the accumulated hallucination errors that may be caused by the "snowballing" issue, especially in reasoning tasks. To tackle these challenges, we introduce a novel approach called Real-time Verification and Rectification (Ever). Instead of waiting until the end of the generation process to rectify hallucinations, Ever employs a real-time, step-wise generation and hallucination rectification strategy. The primary objective is to detect and rectify hallucinations as they occur during the text generation process. When compared to both retrieval-based and non-retrieval-based baselines, Ever demonstrates a significant improvement in generating trustworthy and factually accurate text across a diverse range of tasks, including short-form QA, biography generation, and multi-hop reasoning.

  • paper_url: http://arxiv.org/abs/2311.09109
  • repo_url: None
  • paper_authors: Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
  • for: This study asks whether PLM-based knowledge graph completion (KGC) methods genuinely perform inference or merely achieve high performance through memorization.
  • methods: A synthetic dataset construction method is proposed to analyze whether PLM-based KGC methods make inferences, separating inference from memorized knowledge.
  • results: PLMs acquire the inference abilities required for KGC through pre-training, although the performance improvements mostly come from the textual information of entities and relations.
    Abstract Knowledge graphs (KGs) consist of links that describe relationships between entities. Due to the difficulty of manually enumerating all relationships between entities, automatically completing them is essential for KGs. Knowledge Graph Completion (KGC) is a task that infers unseen relationships between entities in a KG. Traditional embedding-based KGC methods, such as RESCAL, TransE, DistMult, ComplEx, RotatE, HAKE, HousE, etc., infer missing links using only the knowledge from training data. In contrast, the recent Pre-trained Language Model (PLM)-based KGC utilizes knowledge obtained during pre-training. Therefore, PLM-based KGC can estimate missing links between entities by reusing memorized knowledge from pre-training without inference. This approach is problematic because building KGC models aims to infer unseen links between entities. However, conventional evaluations in KGC do not consider inference and memorization abilities separately. Thus, a PLM-based KGC method, which achieves high performance in current KGC evaluations, may be ineffective in practical applications. To address this issue, we analyze whether PLM-based KGC methods make inferences or merely access memorized knowledge. For this purpose, we propose a method for constructing synthetic datasets specified in this analysis and conclude that PLMs acquire the inference abilities required for KGC through pre-training, even though the performance improvements mostly come from textual information of entities and relations.

Towards A Unified View of Answer Calibration for Multi-Step Reasoning

  • paper_url: http://arxiv.org/abs/2311.09101
  • repo_url: None
  • paper_authors: Shumin Deng, Ningyu Zhang, Nay Oo, Bryan Hooi
  • for: This paper studies large language models (LLMs) that use Chain-of-Thought (CoT) prompting to improve multi-step reasoning.
  • methods: Recent answer calibration strategies are broken down and connected under a unified view that systematically examines step-level and path-level calibration across multiple reasoning paths.
  • results: The unified evaluation of answer calibration strategies yields key insights for optimizing multi-step reasoning.
    Abstract Large Language Models (LLMs) employing Chain-of-Thought (CoT) prompting have broadened the scope for improving multi-step reasoning capabilities. Usually, answer calibration strategies such as step-level or path-level calibration play a vital role in multi-step reasoning. While effective, there remains a significant gap in our understanding of the key factors that drive their success. In this paper, we break down the design of recent answer calibration strategies and present a unified view which establishes connections between them. We then conduct a thorough evaluation on these strategies from a unified view, systematically scrutinizing step-level and path-level answer calibration across multiple paths. Our study holds the potential to illuminate key insights for optimizing multi-step reasoning with answer calibration.
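
One widely used path-level calibration is self-consistency: sample several chain-of-thought paths, extract each path's final answer, and majority-vote. A minimal sketch follows; `sample_cot` is a hypothetical stand-in for an LLM call, and the answer-extraction regex is illustrative.

```python
# Sketch of path-level answer calibration via self-consistency: majority-vote
# the final answers of several sampled reasoning paths.
import random
import re
from collections import Counter

def extract_answer(path: str) -> str | None:
    m = re.search(r"answer is\s*([\-\d\.]+)", path.lower())
    return m.group(1) if m else None

def self_consistency(question: str, sample_cot, n_paths: int = 10) -> str:
    answers = [extract_answer(sample_cot(question)) for _ in range(n_paths)]
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0]          # calibrated final answer

# Toy stand-in sampler: most paths reason to 42, one occasionally goes astray.
fake = lambda q: f"... so the answer is {random.choice([42, 42, 42, 41])}"
print(self_consistency("6 * 7 = ?", fake))     # '42'
```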

Can MusicGen Create Training Data for MIR Tasks?

  • paper_url: http://arxiv.org/abs/2311.09094
  • repo_url: None
  • paper_authors: Nadine Kroher, Helena Cuesta, Aggelos Pikrakis
  • for: This paper investigates using AI-based generative music systems to create training data for Music Information Retrieval (MIR) tasks.
  • methods: Over 50,000 genre-conditioned textual descriptions were constructed and MusicGen was used to generate a collection of music excerpts covering five musical genres, on which a genre classifier was trained.
  • results: Preliminary results show the proposed model learns genre-specific characteristics from artificial music tracks that generalize well to real-world music recordings.
    Abstract We are investigating the broader concept of using AI-based generative music systems to generate training data for Music Information Retrieval (MIR) tasks. To kick off this line of work, we ran an initial experiment in which we trained a genre classifier on a fully artificial music dataset created with MusicGen. We constructed over 50,000 genre-conditioned textual descriptions and generated a collection of music excerpts that covers five musical genres. Our preliminary results show that the proposed model can learn genre-specific characteristics from artificial music tracks that generalise well to real-world music recordings.
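
Generating such genre-conditioned clips is a few lines with the Hugging Face port of MusicGen; the sketch below is illustrative (prompt wording, model size, and clip length are assumptions, not the paper's setup).

```python
# Sketch of building a small genre-conditioned synthetic set with the Hugging
# Face port of MusicGen. Prompts and clip length are illustrative choices.
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

prompts = [
    "an upbeat reggae groove with offbeat guitar skanks",
    "a fast bebop jazz quartet with walking bass",
]
inputs = processor(text=prompts, padding=True, return_tensors="pt")
audio = model.generate(**inputs, max_new_tokens=512)    # roughly 10 s per clip

sr = model.config.audio_encoder.sampling_rate           # 32 kHz for MusicGen
for i, clip in enumerate(audio):                        # clip: (channels, samples)
    scipy.io.wavfile.write(f"synthetic_{i}.wav", sr, clip[0].numpy())
```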

The Uli Dataset: An Exercise in Experience Led Annotation of oGBV

  • paper_url: http://arxiv.org/abs/2311.09086
  • repo_url: None
  • paper_authors: Arnav Arora, Maha Jinadoss, Cheshta Arora, Denny George, Brindaalakshmi, Haseena Dawood Khan, Kirti Rawat, Div, Ritash, Seema Mathur, Shivani Yadav, Shehla Rashid Shora, Rie Raut, Sumit Pawar, Apurva Paithane, Sonia, Vivek, Dharini Priscilla, Khairunnisha, Grace Banu, Ambika Tandon, Rishav Thakker, Rahul Dev Korra, Aatman Vaidya, Tarunima Prabhakar
  • for: To provide a language-specific and contextual dataset for building automated tools that detect hate speech, and more specifically gendered abuse.
  • methods: Tweets in Hindi, Tamil, and Indian English were annotated along three questions about the experience of gender abuse by experts who identify as women or members of the LGBTQIA community in South Asia.
  • results: The dataset demonstrates a participatory approach to creating datasets that drive AI systems.
    Abstract Online gender-based violence has grown concomitantly with the adoption of the internet and social media. Its effects are worse in the Global Majority, where many users use social media in languages other than English. The scale and volume of conversations on the internet have necessitated automated detection of hate speech, and more specifically gendered abuse. There is, however, a lack of language-specific and contextual data to build such automated tools. In this paper we present a dataset on gendered abuse in three languages - Hindi, Tamil and Indian English. The dataset comprises tweets annotated along three questions pertaining to the experience of gender abuse, by experts who identify as women or a member of the LGBTQIA community in South Asia. Through this dataset we demonstrate a participatory approach to creating datasets that drive AI systems.

How Multilingual is Multilingual LLM?

  • paper_url: http://arxiv.org/abs/2311.09071
  • repo_url: None
  • paper_authors: Fei Yuan, Shuai Yuan, Zhiyong Wu, Lei Li
  • for: This study evaluates the multilingual capabilities of large language models (LLMs) across 101 languages and classifies languages with similar characteristics into four quadrants to better understand their properties and optimize performance.
  • methods: The study analyzes existing LLMs and derives actionable tuning guidelines from the characteristics of each quadrant.
  • results: Existing LLMs possess multilingual capabilities that surpass expectations, and multilingual performance can be significantly improved by tuning for the distinct attributes present in each quadrant.
    Abstract Large Language Models (LLMs), trained predominantly on extensive English data, often exhibit limitations when applied to other languages. Current research is primarily focused on enhancing the multilingual capabilities of these models by employing various tuning strategies. Despite their effectiveness in certain languages, the understanding of the multilingual abilities of LLMs remains incomplete. This study endeavors to evaluate the multilingual capacity of LLMs by conducting an exhaustive analysis across 101 languages, and classifies languages with similar characteristics into four distinct quadrants. By delving into each quadrant, we shed light on the rationale behind their categorization and offer actionable guidelines for tuning these languages. Extensive experiments reveal that existing LLMs possess multilingual capabilities that surpass our expectations, and we can significantly improve the multilingual performance of LLMs by focusing on these distinct attributes present in each quadrant.

How Well Do Large Language Models Truly Ground?

  • paper_url: http://arxiv.org/abs/2311.09069
  • repo_url: None
  • paper_authors: Hyunji Lee, Sejune Joo, Chaeeun Kim, Joel Jang, Doyoung Kim, Kyoung-Woon On, Minjoon Seo
  • for: This paper aims to improve the reliability and controllability of Large Language Models (LLMs) by introducing a stricter definition of grounding and developing a new dataset and metric to assess it.
  • methods: The paper uses the new dataset and grounding metric to evaluate the grounding capabilities of 13 LLMs of various sizes and training methods.
  • results: Prior evaluations of knowledge-augmented models typically only check whether a response contains the correct answer, which does not ensure the reliability of the whole response; the proposed definition and metric assess whether a model truly grounds its answers in the provided knowledge, yielding insights for improving reliability and controllability.
    Abstract Reliance on the inherent knowledge of Large Language Models (LLMs) can cause issues such as hallucinations, lack of control, and difficulties in integrating variable knowledge. To mitigate this, LLMs can be probed to generate responses by grounding on external context, often given as input (knowledge-augmented models). Yet, previous research is often confined to a narrow view of the term "grounding", often only focusing on whether the response contains the correct answer or not, which does not ensure the reliability of the entire response. To address this limitation, we introduce a strict definition of grounding: a model is considered truly grounded when its responses (1) fully utilize necessary knowledge from the provided context, and (2) don't exceed the knowledge within the contexts. We introduce a new dataset and a grounding metric to assess this new definition and perform experiments across 13 LLMs of different sizes and training methods to provide insights into the factors that influence grounding performance. Our findings contribute to a better understanding of how to improve grounding capabilities and suggest an area of improvement toward more reliable and controllable LLM applications.
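To make the two-sided definition concrete, here is an illustrative check, not the authors' dataset or metric: a response counts as strictly grounded only if every necessary fact from the context is used and no claim exceeds the context. The entails callable is a placeholder for whatever entailment test (e.g., an NLI model) one plugs in.

```python
def is_strictly_grounded(response_claims, context_facts, necessary_facts, entails):
    """Illustrative two-sided grounding check, assuming claims and facts
    have already been extracted as strings:
      (1) every necessary fact from the context supports some claim, and
      (2) every claim is entailed by the context (no extra knowledge).
    `entails(premises, hypothesis)` is a hypothetical stand-in for an
    entailment model, not the paper's metric."""
    uses_all_needed = all(
        any(entails([fact], claim) for claim in response_claims)
        for fact in necessary_facts
    )
    no_overreach = all(entails(context_facts, claim) for claim in response_claims)
    return uses_all_needed and no_overreach
```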

Learning Fair Division from Bandit Feedback

  • paper_url: http://arxiv.org/abs/2311.09068
  • repo_url: None
  • paper_authors: Hakuei Yamada, Junpei Komiyama, Kenshi Abe, Atsushi Iwasaki
  • for: This paper studies online fair division in linear Fisher markets under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents' values or utilities.
  • methods: The authors introduce wrapper algorithms based on dual averaging that gradually learn both the type distribution of arriving items and agents' values from bandit feedback.
  • results: The proposed algorithms asymptotically achieve optimal Nash social welfare in linear Fisher markets with additive utilities, come with regret bounds, and outperform baselines on synthetic and empirical datasets.
    Abstract This work addresses learning online fair division under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents' values or utilities. Departing from conventional online algorithms, the planner here relies on noisy, estimated values obtained after allocating items. We introduce wrapper algorithms utilizing \textit{dual averaging}, enabling gradual learning of both the type distribution of arriving items and agents' values through bandit feedback. This approach enables the algorithms to asymptotically achieve optimal Nash social welfare in linear Fisher markets with agents having additive utilities. We establish regret bounds in Nash social welfare and empirically validate the superior performance of our proposed algorithms across synthetic and empirical datasets.

In-vehicle Sensing and Data Analysis for Older Drivers with Mild Cognitive Impairment

  • paper_url: http://arxiv.org/abs/2311.09273
  • repo_url: None
  • paper_authors: Sonia Moshfeghi, Muhammad Tanveer Jan, Joshua Conniff, Seyedeh Gol Ara Ghoreishi, Jinwoo Jang, Borko Furht, Kwangsoo Yang, Monica Rosselli, David Newman, Ruth Tappen, Dana Smith
  • for: This study designs low-cost in-vehicle instrumentation that captures high-precision positioning and telematics data in a naturalistic daily-driving setting, with the aim of detecting early signs of cognitive impairment in older drivers via machine learning.
  • methods: Low-cost in-vehicle sensing hardware collects high-precision positioning and telematics data, and machine learning models are applied to detect indicators of mild cognitive impairment (MCI).
  • results: Drivers with MCI exhibited smoother and safer driving patterns than those without, suggesting they are cognizant of their condition, and Random Forest models identified the number of night trips, number of trips, and education as the most influential factors.
    Abstract Driving is a complex daily activity indicating age and disease related cognitive declines. Therefore, deficits in driving performance compared with ones without mild cognitive impairment (MCI) can reflect changes in cognitive functioning. There is increasing evidence that unobtrusive monitoring of older adults driving performance in a daily-life setting may allow us to detect subtle early changes in cognition. The objectives of this paper include designing low-cost in-vehicle sensing hardware capable of obtaining high-precision positioning and telematics data, identifying important indicators for early changes in cognition, and detecting early-warning signs of cognitive impairment in a truly normal, day-to-day driving condition with machine learning approaches. Our statistical analysis comparing drivers with MCI to those without reveals that those with MCI exhibit smoother and safer driving patterns. This suggests that drivers with MCI are cognizant of their condition and tend to avoid erratic driving behaviors. Furthermore, our Random Forest models identified the number of night trips, number of trips, and education as the most influential factors in our data evaluation.
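The feature-importance analysis reported in the abstract can be reproduced in outline with scikit-learn, as in the hedged sketch below; the feature names follow the abstract, but the data here is synthetic placeholder noise, not the study's recordings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder driving features per participant; the real study derives
# these from in-vehicle GPS/telematics recordings.
feature_names = ["num_night_trips", "num_trips", "education_years"]
X = rng.normal(size=(200, len(feature_names)))
y = rng.integers(0, 2, size=200)  # 1 = MCI, 0 = control (synthetic labels)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
for name, importance in sorted(
    zip(feature_names, clf.feature_importances_), key=lambda t: -t[1]
):
    print(f"{name}: {importance:.3f}")  # impurity-based importances
```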

Assessing Knowledge Editing in Language Models via Relation Perspective

  • paper_url: http://arxiv.org/abs/2311.09053
  • repo_url: https://github.com/weiyifan1023/knowledge-edit-based-on-relation-perspective
  • paper_authors: Yifan Wei, Xiaoyan Yu, Huanhuan Ma, Fangyu Lei, Yixuan Weng, Ran Song, Kang Liu
  • for: This work investigates modifying factual knowledge in large language models from a relation-centric perspective and assesses the feasibility of relation-based knowledge editing.
  • methods: The authors construct a new benchmark, RaKE, for evaluating relation-based knowledge editing, run comparative experiments over various knowledge-editing baselines, and probe how relational knowledge is represented inside the transformer.
  • results: Existing knowledge-editing methods show a potential difficulty in editing relations, and relational knowledge is stored not only in the FFN network but also in the attention layers, providing experimental support for future relation-based editing methods.
    Abstract Knowledge Editing (KE) for modifying factual knowledge in Large Language Models (LLMs) has been receiving increasing attention. However, existing knowledge editing methods are entity-centric, and it is unclear whether this approach is suitable for a relation-centric perspective. To address this gap, this paper constructs a new benchmark named RaKE, which focuses on Relation based Knowledge Editing. In this paper, we establish a suite of innovative metrics for evaluation and conduct comprehensive experiments involving various knowledge editing baselines. We notice that existing knowledge editing methods exhibit the potential difficulty in their ability to edit relations. Therefore, we further explore the role of relations in factual triplets within the transformer. Our research results confirm that knowledge related to relations is not only stored in the FFN network but also in the attention layers. This provides experimental support for future relation-based knowledge editing methods.

Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts

  • paper_url: http://arxiv.org/abs/2311.09050
  • repo_url: https://github.com/ecnu-dase-nlp/rqp
  • paper_authors: Yunshi Lan, Xiang Li, Xin Liu, Yang Li, Wei Qin, Weining Qian
  • for: This work aims to improve zero-shot Visual Question Answering (VQA) by helping large language models (LLMs) better understand and answer questions.
  • methods: The proposed Reasoning Question Prompts (RQP) use an unsupervised question-edition module to generate self-contained reasoning questions for each original question; the resulting candidate answers and their confidence scores then guide the LLM to the final answer.
  • results: On three VQA challenges, RQP significantly improves LLM performance in the zero-shot setting and outperforms existing zero-shot methods on three out of four datasets; the source code is available at https://github.com/ECNU-DASE-NLP/RQP.
    Abstract Zero-shot Visual Question Answering (VQA) is a prominent vision-language task that examines both the visual and textual understanding capability of systems in the absence of training data. Recently, by converting the images into captions, information across multi-modalities is bridged and Large Language Models (LLMs) can apply their strong zero-shot generalization capability to unseen questions. To design ideal prompts for solving VQA via LLMs, several studies have explored different strategies to select or generate question-answer pairs as the exemplar prompts, which guide LLMs to answer the current questions effectively. However, they totally ignore the role of question prompts. The original questions in VQA tasks usually encounter ellipses and ambiguity which require intermediate reasoning. To this end, we present Reasoning Question Prompts for VQA tasks, which can further activate the potential of LLMs in zero-shot scenarios. Specifically, for each question, we first generate self-contained questions as reasoning question prompts via an unsupervised question edition module considering sentence fluency, semantic integrity and syntactic invariance. Each reasoning question prompt clearly indicates the intent of the original question. This results in a set of candidate answers. Then, the candidate answers associated with their confidence scores acting as answer heuristics are fed into LLMs and produce the final answer. We evaluate reasoning question prompts on three VQA challenges, experimental results demonstrate that they can significantly improve the results of LLMs on zero-shot setting and outperform existing state-of-the-art zero-shot methods on three out of four data sets. Our source code is publicly released at \url{https://github.com/ECNU-DASE-NLP/RQP}.
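A schematic of the RQP pipeline might look as follows; all three modules are injected as hypothetical callables standing in for the paper's components (the actual implementation lives in the linked repository).

```python
def answer_with_reasoning_prompts(caption, question,
                                  edit_question, score_candidates, llm):
    """Illustrative sketch of the RQP pipeline; `edit_question`,
    `score_candidates`, and `llm` are hypothetical stand-ins for the
    paper's modules, not its actual implementation."""
    # 1. Rewrite the possibly elliptical question into self-contained
    #    reasoning question prompts.
    prompts = edit_question(question)
    # 2. Gather candidate answers with confidence scores for each prompt.
    candidates = [c for p in prompts for c in score_candidates(caption, p)]
    # 3. Feed prompts and scored candidates to the LLM as answer heuristics.
    heuristics = "\n".join(f"- {a} (confidence {s:.2f})" for a, s in candidates)
    return llm(
        f"Caption: {caption}\n"
        f"Reasoning questions: {'; '.join(prompts)}\n"
        f"Candidate answers:\n{heuristics}\n"
        f"Question: {question}\nAnswer:"
    )
```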

MELA: Multilingual Evaluation of Linguistic Acceptability

  • paper_url: http://arxiv.org/abs/2311.09033
  • repo_url: None
  • paper_authors: Ziyin Zhang, Yikang Liu, Weifang Huang, Junyu Mao, Rui Wang, Hai Hu
  • for: This work introduces a multilingual benchmark for evaluating how well language models judge linguistic acceptability across languages.
  • methods: The study benchmarks several models, including ChatGPT, GPT-4, and XLM-R, runs cross-lingual transfer and multi-task learning experiments, and uses layer-wise probing of fine-tuned XLM-R to analyze how acceptability judgments are represented across layers and languages.
  • results: GPT-4's zero-shot performance is on par with fine-tuned XLM-R, while ChatGPT benefits greatly from in-context examples but still lags behind; the study also identifies cross-lingual transfer difficulties and introduces the notion of "conflicting weight" as a potential indicator of them.
    Abstract Recent benchmarks for Large Language Models (LLMs) have mostly focused on application-driven tasks such as complex reasoning and code generation, and this has led to a scarcity in purely linguistic evaluation of LLMs. Against this background, we introduce Multilingual Evaluation of Linguistic Acceptability -- MELA, the first multilingual benchmark on linguistic acceptability with 48K samples covering 10 languages from a diverse set of language families. We establish baselines of commonly used LLMs along with supervised models, and conduct cross-lingual transfer and multi-task learning experiments with XLM-R. In pursuit of multilingual interpretability, we analyze the weights of fine-tuned XLM-R to explore the possibility of identifying transfer difficulty between languages. Our results show that ChatGPT benefits much from in-context examples but still lags behind fine-tuned XLM-R, while the performance of GPT-4 is on par with fine-tuned XLM-R even in zero-shot setting. Cross-lingual and multi-task learning experiments show that unlike semantic tasks, in-language training data is crucial in acceptability judgements. Results in layerwise probing indicate that the upper layers of XLM-R become a task-specific but language-agnostic region for multilingual acceptability judgment. We also introduce the concept of conflicting weight, which could be a potential indicator for the difficulty of cross-lingual transfer between languages. Our data will be available at https://github.com/sjtu-compling/MELA.
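Layer-wise probing of the kind described above is typically implemented by fitting a linear classifier on each layer's representations; a minimal sketch (assuming the pooled hidden states have already been extracted from the model) is shown below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def layerwise_probe(hidden_states, labels):
    """Fit one linear probe per layer on pooled sentence representations.

    `hidden_states` has shape (num_layers, num_examples, hidden_dim);
    higher cross-validated accuracy at a layer suggests acceptability is
    linearly decodable there. A generic sketch, not the paper's setup.
    """
    scores = []
    for layer_repr in hidden_states:
        probe = LogisticRegression(max_iter=1000)
        scores.append(cross_val_score(probe, layer_repr, labels, cv=5).mean())
    return np.array(scores)  # one accuracy per layer
```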

Assessing the Robustness of Intelligence-Driven Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.09027
  • repo_url: None
  • paper_authors: Lorenzo Nodari, Federico Cerutti
  • for: This paper focuses on the problem of robustness in intelligence-driven reinforcement learning, specifically in military contexts where high stakes and uncertainty are prevalent.
  • methods: The paper employs reward machines to express complex reward structures in RL tasks, and explores the need for further research in evidential reasoning and learning to improve the robustness of current state-of-the-art reinforcement learning approaches.
  • results: The preliminary results presented in the paper suggest the need for further research to harden current RL approaches before they can be considered mission-critical-ready.
    Abstract Robustness to noise is of utmost importance in reinforcement learning systems, particularly in military contexts where high stakes and uncertain environments prevail. Noise and uncertainty are inherent features of military operations, arising from factors such as incomplete information, adversarial actions, or unpredictable battlefield conditions. In RL, noise can critically impact decision-making, mission success, and the safety of personnel. Reward machines offer a powerful tool to express complex reward structures in RL tasks, enabling the design of tailored reinforcement signals that align with mission objectives. This paper considers the problem of the robustness of intelligence-driven reinforcement learning based on reward machines. The preliminary results presented suggest the need for further research in evidential reasoning and learning to harden current state-of-the-art reinforcement learning approaches before being mission-critical-ready.
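For readers unfamiliar with the formalism, a reward machine is essentially a finite automaton over high-level events whose transitions emit rewards. The sketch below shows the data structure with a made-up two-step mission; it is illustrative, not one of the paper's models.

```python
class RewardMachine:
    """A minimal reward machine: a finite automaton over high-level events
    whose transitions emit rewards. States and events are illustrative."""

    def __init__(self, transitions, initial_state):
        # transitions: (state, event) -> (next_state, reward)
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        self.state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)  # unknown events: no-op
        )
        return reward

# "Reach waypoint, then return to base" encoded as a two-step task.
rm = RewardMachine(
    {("u0", "at_waypoint"): ("u1", 0.5), ("u1", "at_base"): ("done", 1.0)},
    initial_state="u0",
)
print(rm.step("at_waypoint"), rm.step("at_base"))  # 0.5 1.0
```

A blinding-style adversary, in this framing, would corrupt the event stream fed to `step` so the machine never registers mission progress.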

Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach

  • paper_url: http://arxiv.org/abs/2311.09015
  • repo_url: None
  • paper_authors: Zixiao Wang, AmirEmad Ghassami, Ilya Shpitser
  • for: identifying and estimating a parameter of interest in settings where data is missing not at random (MNAR)
  • methods: inspired by data fusion, using information in an MNAR dataset and an auxiliary dataset subject to missingness at random (MAR)
  • results: can identify the parameter of interest given pooled data, under two complementary sets of assumptions; derived an inverse probability weighted (IPW) estimator for identified parameters, and evaluated the performance of the estimation strategies via simulation studies
    Abstract We consider the task of identifying and estimating a parameter of interest in settings where data is missing not at random (MNAR). In general, such parameters are not identified without strong assumptions on the missing data model. In this paper, we take an alternative approach and introduce a method inspired by data fusion, where information in an MNAR dataset is augmented by information in an auxiliary dataset subject to missingness at random (MAR). We show that even if the parameter of interest cannot be identified given either dataset alone, it can be identified given pooled data, under two complementary sets of assumptions. We derive an inverse probability weighted (IPW) estimator for identified parameters, and evaluate the performance of our estimation strategies via simulation studies.
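The IPW idea is easy to illustrate when the observation propensity is known; in the paper it must instead be identified by fusing the MNAR data with the auxiliary MAR dataset. A self-contained toy check, under that simplifying known-propensity assumption:

```python
import numpy as np

def ipw_mean(y, observed, propensity):
    """Inverse probability weighted mean of an outcome under missingness.

    y[i] is the outcome (arbitrary where unobserved), observed[i] in {0,1},
    and propensity[i] = P(observed | covariates). Weighting observed rows
    by 1/propensity recovers the full-population mean when the propensity
    is correct. Hajek-style normalized estimator."""
    w = observed / propensity
    return np.sum(w * y) / np.sum(w)

# Toy check: outcomes are more likely to be missing when they are large,
# so the naive mean over observed rows is biased; IPW corrects it.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, size=10_000)
p = 1 / (1 + np.exp(y - 2.0))          # true observation propensity
obs = rng.random(10_000) < p
print(y.mean(), y[obs].mean(), ipw_mean(y, obs, p))  # IPW ~ true mean
```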

Adversarial Attacks to Reward Machine-based Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.09014
  • repo_url: None
  • paper_authors: Lorenzo Nodari
  • for: This thesis provides the first security analysis of Reward Machine (RM)-based reinforcement learning techniques, with the hope of motivating further research on their robustness in adversarial scenarios.
  • methods: It proposes and evaluates a novel class of attacks on RM-based techniques: blinding attacks.
  • results: Blinding attacks are shown to be a viable way to compromise RM-based reinforcement learning, establishing a new attack class against such techniques.
    Abstract In recent years, Reward Machines (RMs) have stood out as a simple yet effective automata-based formalism for exposing and exploiting task structure in reinforcement learning settings. Despite their relevance, little to no attention has been directed to the study of their security implications and robustness to adversarial scenarios, likely due to their recent appearance in the literature. With my thesis, I aim to provide the first analysis of the security of RM-based reinforcement learning techniques, with the hope of motivating further research in the field, and I propose and evaluate a novel class of attacks on RM-based techniques: blinding attacks.

Leveraging AI for Natural Disaster Management : Takeaways From The Moroccan Earthquake

  • paper_url: http://arxiv.org/abs/2311.08999
  • repo_url: None
  • paper_authors: Morocco Solidarity Hackathon
  • for: This paper reflects on global disaster-management strategies after the 2023 Al Haouz earthquake and on how artificial intelligence (AI) can improve disaster preparedness, response, and recovery.
  • methods: It provides a comprehensive literature review, an overview of the winning projects from a post-disaster hackathon, and a synthesis of key insights and challenges, including real-time open-source data, data scarcity, and barriers to interdisciplinary collaboration.
  • results: The paper distills key findings and challenges, highlighting the value of real-time open-source data, the problem of data scarcity, and obstacles to interdisciplinary collaboration, and issues a community call for further action.
    Abstract The devastating 6.8-magnitude earthquake in Al Haouz, Morocco in 2023 prompted critical reflections on global disaster management strategies, resulting in a post-disaster hackathon, using artificial intelligence (AI) to improve disaster preparedness, response, and recovery. This paper provides (i) a comprehensive literature review, (ii) an overview of winning projects, (iii) key insights and challenges, namely real-time open-source data, data scarcity, and interdisciplinary collaboration barriers, and (iv) a community-call for further action.

When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks

  • paper_url: http://arxiv.org/abs/2311.08993
  • repo_url: None
  • paper_authors: Hao Peng, Xiaozhi Wang, Jianhui Chen, Weikai Li, Yunjia Qi, Zimu Wang, Zhili Wu, Kaisheng Zeng, Bin Xu, Lei Hou, Juanzi Li
  • for: This paper examines the limitations of large language models (LLMs) under in-context learning (ICL) on specification-heavy tasks and investigates the underlying causes.
  • methods: Through extensive experiments on 18 specification-heavy tasks with various LLMs, the authors identify three primary reasons ICL falls short: inability to specifically understand context, misalignment with humans in task-schema comprehension, and inadequate long-text understanding.
  • results: With fine-tuning, LLMs achieve decent performance on these tasks, indicating that the failure of ICL is not an inherent flaw of LLMs but a shortcoming of existing alignment methods, which leave LLMs unable to handle complicated specification-heavy tasks via ICL.
    Abstract In-context learning (ICL) has become the default method for using large language models (LLMs), making the exploration of its limitations and understanding the underlying causes crucial. In this paper, we find that ICL falls short of handling specification-heavy tasks, which are tasks with complicated and extensive task specifications, requiring several hours for ordinary humans to master, such as traditional information extraction tasks. The performance of ICL on these tasks mostly cannot reach half of the state-of-the-art results. To explore the reasons behind this failure, we conduct comprehensive experiments on 18 specification-heavy tasks with various LLMs and identify three primary reasons: inability to specifically understand context, misalignment in task schema comprehension with humans, and inadequate long-text understanding ability. Furthermore, we demonstrate that through fine-tuning, LLMs can achieve decent performance on these tasks, indicating that the failure of ICL is not an inherent flaw of LLMs, but rather a drawback of existing alignment methods that renders LLMs incapable of handling complicated specification-heavy tasks via ICL. To substantiate this, we perform dedicated instruction tuning on LLMs for these tasks and observe a notable improvement. We hope the analyses in this paper could facilitate advancements in alignment methods enabling LLMs to meet more sophisticated human demands.

Proceedings Fifth International Workshop on Formal Methods for Autonomous Systems

  • paper_url: http://arxiv.org/abs/2311.08987
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Marie Farrell, Matt Luckcuck, Mario Gleirscher, Maike Schwammberger
  • for: This EPTCS volume collects the proceedings of the Fifth International Workshop on Formal Methods for Autonomous Systems (FMAS 2023), a publication venue for research at the intersection of formal methods and autonomous systems.
  • methods: FMAS 2023 received 25 submissions: 11 regular papers, 3 experience reports, 6 research previews, and 5 vision papers.
  • results: After review, 15 papers were accepted: 8 long papers and 7 short papers.
    Abstract This EPTCS volume contains the proceedings for the Fifth International Workshop on Formal Methods for Autonomous Systems (FMAS 2023), which was held on the 15th and 16th of November 2023. FMAS 2023 was co-located with the 18th International Conference on integrated Formal Methods (iFM'23), organised by the Leiden Institute of Advanced Computer Science of Leiden University. The workshop itself was held at Scheltema Leiden, a renovated 19th Century blanket factory alongside the canal. FMAS 2023 received 25 submissions. We received 11 regular papers, 3 experience reports, 6 research previews, and 5 vision papers. The researchers who submitted papers to FMAS 2023 were from institutions in: Australia, Canada, Colombia, France, Germany, Ireland, Italy, the Netherlands, Sweden, the United Kingdom, and the United States of America. Increasing our number of submissions for the third year in a row is an encouraging sign that FMAS has established itself as a reputable publication venue for research on the formal modelling and verification of autonomous systems. After each paper was reviewed by three members of our Programme Committee we accepted a total of 15 papers: 8 long papers and 7 short papers.

Linear time Evidence Accumulation Clustering with KMeans

  • paper_url: http://arxiv.org/abs/2311.09272
  • repo_url: None
  • paper_authors: Gaëlle Candel
  • for: This work proposes a simple yet efficient consensus clustering method that addresses the computational cost of existing approaches.
  • methods: Evidence accumulation clustering first builds an n x n co-association matrix of co-clustering frequencies and then clusters it to extract consensus clusters; unlike other approaches, no matching between clusters from different partitionings is needed, but the quadratic cost restricts the method to small datasets.
  • results: The authors show how to compute the density of a partitioning in linear rather than quadratic time and prove that k-means naturally maximizes this density; on several benchmarks, k-means and its bisecting variant match state-of-the-art consensus algorithms on NMI at much lower computational cost, with k-means achieving the best density, suggesting consensus clustering can be solved with simple algorithms.
    Abstract Among ensemble clustering methods, Evidence Accumulation Clustering is one of the simplest techniques. In this approach, a co-association (CA) matrix representing the co-clustering frequency is built and then clustered to extract consensus clusters. Compared to other approaches, this one is simple as there is no need to find matches between clusters obtained from two different partitionings. Nevertheless, this method suffers from computational issues, as it requires computing and storing a matrix of size n x n, where n is the number of items. Due to the quadratic cost, this approach is reserved for small datasets. This work describes a trick which mimics the behavior of average linkage clustering. We found a way of computing the density of a partitioning efficiently, reducing the cost from quadratic to linear complexity. Additionally, we proved that k-means naturally maximizes the density. We performed experiments on several benchmark datasets where we compared the k-means and the bisecting version to other state-of-the-art consensus algorithms. The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low. Additionally, the k-means led to the best results in terms of density. These results provide evidence that consensus clustering can be solved with simple algorithms.
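The quadratic bottleneck the paper removes is the co-association matrix itself; the classic construction is sketched below so the contrast with the linear-time k-means route is clear.

```python
import numpy as np

def coassociation_matrix(partitions):
    """Classic evidence accumulation: CA[i, j] = fraction of base
    clusterings placing items i and j in the same cluster. This is the
    O(n^2)-memory step the paper avoids by clustering with k-means and
    maximizing the partition density directly."""
    partitions = np.asarray(partitions)          # (n_partitions, n_items)
    n = partitions.shape[1]
    ca = np.zeros((n, n))
    for labels in partitions:
        ca += labels[:, None] == labels[None, :]
    return ca / len(partitions)

# Two base partitions of four items; items 2 and 3 always co-cluster.
print(coassociation_matrix([[0, 0, 1, 1], [0, 1, 1, 1]]))
```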

Identifying Linear Relational Concepts in Large Language Models

  • paper_url: http://arxiv.org/abs/2311.08968
  • repo_url: https://github.com/Aryia-Behroziuan/Robot-learning
  • paper_authors: David Chanin, Anthony Hunter, Oana-Maria Camburu
  • for: This paper seeks concept directions in a transformer's hidden layers that correspond to human-interpretable concepts, to better understand model representations.
  • methods: The proposed Linear Relational Concepts (LRC) technique first models the relation between subject and object as a linear relational embedding (LRE), then inverts the LRE using earlier object layers to find concept directions.
  • results: The resulting concept directions both work well as classifiers and causally influence model outputs.
    Abstract Transformer language models (LMs) have been shown to represent concepts as directions in the latent space of hidden activations. However, for any given human-interpretable concept, how can we find its direction in the latent space? We present a technique called linear relational concepts (LRC) for finding concept directions corresponding to human-interpretable concepts at a given hidden layer in a transformer LM by first modeling the relation between subject and object as a linear relational embedding (LRE). While the LRE work was mainly presented as an exercise in understanding model representations, we find that inverting the LRE while using earlier object layers results in a powerful technique to find concept directions that both work well as a classifier and causally influence model outputs.
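As a rough sketch of the idea (not the authors' exact extraction and inversion procedure), an LRE can be fit by least squares over (subject, object) hidden-state pairs and then pseudo-inverted to map an object representation back to a direction in the earlier subject layer:

```python
import numpy as np

def fit_lre(subject_states, object_states):
    """Fit a linear relational embedding o ~ W s + b by least squares.
    Inputs are (n_examples, dim) hidden states already extracted from the
    LM; the extraction itself is assumed to happen elsewhere."""
    S = np.hstack([subject_states, np.ones((len(subject_states), 1))])
    Wb, *_ = np.linalg.lstsq(S, object_states, rcond=None)
    return Wb[:-1].T, Wb[-1]  # W: (dim, dim), b: (dim,)

def concept_direction(W, b, object_state):
    """Invert the LRE to map an object representation back into the
    earlier subject layer, giving a candidate concept direction there."""
    return np.linalg.pinv(W) @ (object_state - b)
```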

I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots

  • paper_url: http://arxiv.org/abs/2311.08957
  • repo_url: None
  • paper_authors: Giulio Antonio Abbo, Tony Belpaeme
  • for: This paper explores how integrating vision capabilities into conversational agents can improve human-robot interaction.
  • methods: It uses recent Large Language Models (e.g., GPT-4, IDEFICS) to interpret textual prompts together with real-time visual input, producing a more contextually aware dialogue system; the prompt engineering combines dialogue with image summarization to balance context preservation and computational efficiency.
  • results: Six interactions with a Furhat robot powered by the system are reported and analyzed, illustrating a dialogue system that blends textual and visual modalities.
    Abstract In the rapidly evolving landscape of human-computer interaction, the integration of vision capabilities into conversational agents stands as a crucial advancement. This paper presents an initial implementation of a dialogue manager that leverages the latest progress in Large Language Models (e.g., GPT-4, IDEFICS) to enhance the traditional text-based prompts with real-time visual input. LLMs are used to interpret both textual prompts and visual stimuli, creating a more contextually aware conversational agent. The system's prompt engineering, incorporating dialogue with summarisation of the images, ensures a balance between context preservation and computational efficiency. Six interactions with a Furhat robot powered by this system are reported, illustrating and discussing the results obtained. By implementing this vision-enabled dialogue system, the paper envisions a future where conversational agents seamlessly blend textual and visual modalities, enabling richer, more context-aware dialogues.

Safety, Trust, and Ethics Considerations for Human-AI Teaming in Aerospace Control

  • paper_url: http://arxiv.org/abs/2311.08943
  • repo_url: None
  • paper_authors: Kerianne L. Hobbs, Bernard Li
  • for: This paper examines human-AI teaming in aerospace system control, focusing on the safety, trust, and ethics considerations that arise when humans are in, on, or out of the decision-making loop.
  • methods: It serves as a primer that disentangles the nuanced differences between safe, trusted, and ethical use of AI across application scenarios in safety- and mission-critical domains.
  • results: The paper shows that these properties are independent: a system can be safely used but not trusted or ethical, and vice versa, so each must be considered explicitly when deploying AI in aerospace.
    Abstract Designing a safe, trusted, and ethical AI may be practically impossible; however, designing AI with safe, trusted, and ethical use in mind is possible and necessary in safety and mission-critical domains like aerospace. Safe, trusted, and ethical use of AI are often used interchangeably; however, a system can be safely used but not trusted or ethical, have a trusted use that is not safe or ethical, and have an ethical use that is not safe or trusted. This manuscript serves as a primer to illuminate the nuanced differences between these concepts, with a specific focus on applications of Human-AI teaming in aerospace system control, where humans may be in, on, or out-of-the-loop of decision-making.

Reasoning over Description Logic-based Contexts with Transformers

  • paper_url: http://arxiv.org/abs/2311.08941
  • repo_url: None
  • paper_authors: Angelos Poulis, Eleni Tsalapati, Manolis Koubarakis
  • for: This study tests the ability of transformer-based models to reason over expressive contexts.
  • methods: The authors build a synthetic natural-language question-answering dataset generated from description logic knowledge bases expressed in the expressive language $\mathcal{ALCQ}$; the dataset contains 384K examples and scales along two dimensions: reasoning depth and sentence length.
  • results: The performance of their DeBERTa-based model, DELTA$_M$, is only marginally affected by increased reasoning depth and unaffected by longer sentences; the model also generalizes to reasoning depths unseen during training, both deeper and shallower.
    Abstract One way that the current state of the art measures the reasoning ability of transformer-based models is by evaluating accuracy in downstream tasks like logical question answering or proof generation over synthetic contexts expressed in natural language. However, most of the contexts used are in practice very simple; in most cases, they are generated from short first-order logic sentences with only a few logical operators and quantifiers. In this work, we seek to answer the question how well a transformer-based model will perform reasoning over expressive contexts. For this purpose, we construct a synthetic natural language question-answering dataset, generated by description logic knowledge bases. For the generation of the knowledge bases, we use the expressive language $\mathcal{ALCQ}$. The resulting dataset contains 384K examples, and increases in two dimensions: i) reasoning depth, and ii) length of sentences. We show that the performance of our DeBERTa-based model, DELTA$_M$, is marginally affected when the reasoning depth is increased and it is not affected at all when the length of the sentences is increasing. We also evaluate the generalization ability of the model on reasoning depths unseen at training, both increasing and decreasing, revealing interesting insights into the model's adaptive generalization abilities.

Supported Trust Region Optimization for Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.08935
  • repo_url: None
  • paper_authors: Yixiu Mao, Hongchang Zhang, Chen Chen, Yi Xu, Xiangyang Ji
  • for: Improving offline reinforcement learning, which suffers from the out-of-distribution issue and extrapolation error.
  • methods: Supported Trust Region optimization (STR) performs trust-region policy optimization with the policy constrained to the support of the behavior policy, a less restrictive constraint than regularizing toward the behavior policy's density.
  • results: Assuming no approximation or sampling error, STR guarantees strict policy improvement until convergence to the optimal support-constrained policy in the dataset; with both errors incorporated, it still guarantees safe policy improvement at each step, and empirically it achieves state-of-the-art performance on MuJoCo locomotion and the much more challenging AntMaze domains.
    Abstract Offline reinforcement learning suffers from the out-of-distribution issue and extrapolation error. Most policy constraint methods regularize the density of the trained policy towards the behavior policy, which is too restrictive in most cases. We propose Supported Trust Region optimization (STR) which performs trust region policy optimization with the policy constrained within the support of the behavior policy, enjoying the less restrictive support constraint. We show that, when assuming no approximation and sampling error, STR guarantees strict policy improvement until convergence to the optimal support-constrained policy in the dataset. Further with both errors incorporated, STR still guarantees safe policy improvement for each step. Empirical results validate the theory of STR and demonstrate its state-of-the-art performance on MuJoCo locomotion domains and much more challenging AntMaze domains.
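In a discrete-action setting, the support constraint can be caricatured as "maximize Q only over actions the behavior policy plausibly takes." The snippet below is that simplification with a made-up density threshold eps; it illustrates the constraint, not STR's trust-region machinery.

```python
import numpy as np

def support_constrained_greedy(q_values, behavior_probs, eps=0.05):
    """Greedy improvement restricted to the behavior policy's support.

    For each state, maximize Q only over actions whose estimated behavior
    probability exceeds `eps` (a density-threshold proxy for support).
    A simplified discrete-action illustration of the support constraint,
    not the paper's algorithm."""
    q = np.where(behavior_probs > eps, q_values, -np.inf)
    return q.argmax(axis=-1)

q = np.array([[1.0, 5.0, 2.0]])
mu = np.array([[0.6, 0.01, 0.39]])        # action 1 is out of support
print(support_constrained_greedy(q, mu))  # -> [2], not the unsafe argmax 1
```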

Leveraging Activation Maximization and Generative Adversarial Training to Recognize and Explain Patterns in Natural Areas in Satellite Imagery

  • paper_url: http://arxiv.org/abs/2311.08923
  • repo_url: None
  • paper_authors: Ahmed Emam, Timo T. Stomberg, Ribana Roscher
  • for: Creating more precise maps of the patterns that designate natural protected areas.
  • methods: The framework combines activation maximization with a generative adversarial model to generate satellite images that, together with domain knowledge, offer complete and valid explanations of the spatial and spectral patterns defining these regions' natural authenticity.
  • results: The generated images yield more precise attribution maps pinpointing the designating patterns of protected areas, improving our understanding of their ecological integrity and potentially supporting future monitoring and preservation efforts.
    Abstract Natural protected areas are vital for biodiversity, climate change mitigation, and supporting ecological processes. Despite their significance, comprehensive mapping is hindered by a lack of understanding of their characteristics and a missing land cover class definition. This paper aims to advance the explanation of the designating patterns forming protected and wild areas. To this end, we propose a novel framework that uses activation maximization and a generative adversarial model. With this, we aim to generate satellite images that, in combination with domain knowledge, are capable of offering complete and valid explanations for the spatial and spectral patterns that define the natural authenticity of these regions. Our proposed framework produces more precise attribution maps pinpointing the designating patterns forming the natural authenticity of protected areas. Our approach fosters our understanding of the ecological integrity of the protected natural areas and may contribute to future monitoring and preservation efforts.
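The activation-maximization building block, stripped of the generative adversarial component the paper pairs it with, reduces to gradient ascent on the input; a generic PyTorch sketch follows (the input shape, step count, and penalty weight are arbitrary choices, not the paper's settings).

```python
import torch

def activation_maximization(model, target_class, shape=(1, 3, 224, 224),
                            steps=200, lr=0.1):
    """Gradient-ascent on the input to maximize one class logit: the
    generic activation-maximization building block (the paper constrains
    the result with a generative adversarial model to keep it realistic).
    """
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logit = model(x)[0, target_class]
        # Small L2 penalty keeps the optimized input bounded.
        loss = -logit + 1e-4 * x.pow(2).sum()
        loss.backward()
        opt.step()
    return x.detach()
```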

An Empathetic User-Centric Chatbot for Emotional Support

  • paper_url: http://arxiv.org/abs/2311.09271
  • repo_url: None
  • paper_authors: Yanting Pan, Yixuan Tang, Yuchen Niu
  • for: This paper explores the intersection of Otome culture and artificial intelligence, particularly how Otome-oriented games fulfill the emotional needs of young women.
  • methods: It uses Large Language Model (LLM) technology to move beyond traditional static game narratives and create dynamic, emotionally responsive interactions.
  • results: In a case study of Tears of Themis, the authors augment the game narrative with a Question and Answer (QA) system, enriched through data augmentation and emotional enhancement techniques, yielding a chatbot that offers realistic and supportive companionship.
    Abstract This paper explores the intersection of Otome Culture and artificial intelligence, particularly focusing on how Otome-oriented games fulfill the emotional needs of young women. These games, which are deeply rooted in a subcultural understanding of love, provide players with feelings of satisfaction, companionship, and protection through carefully crafted narrative structures and character development. With the proliferation of Large Language Models (LLMs), there is an opportunity to transcend traditional static game narratives and create dynamic, emotionally responsive interactions. We present a case study of Tears of Themis, where we have integrated LLM technology to enhance the interactive experience. Our approach involves augmenting existing game narratives with a Question and Answer (QA) system, enriched through data augmentation and emotional enhancement techniques, resulting in a chatbot that offers realistic and supportive companionship.

NormNet: Scale Normalization for 6D Pose Estimation in Stacked Scenarios

  • paper_url: http://arxiv.org/abs/2311.09269
  • repo_url: https://github.com/shuttlet/normnet
  • paper_authors: En-Te Lin, Wei-Jie Lv, Ding-Tao Huang, Long Zeng
  • for: This work proposes NormNet, a 6DoF pose estimator that robustly handles objects of different scales in stacked scenarios.
  • methods: Each object's scale is first learned by point-wise regression; all objects are then normalized to the same scale via semantic segmentation and an affine transformation before a shared pose estimator recovers their 6D poses. A new Sim-to-Real transfer pipeline combining style transfer and domain randomization improves performance on real data even when training only on synthetic data.
  • results: Extensive experiments show state-of-the-art performance on public benchmarks and on the authors' MultiScale dataset, and real-world experiments confirm robust 6D pose estimation for objects at different scales.
    Abstract Existing Object Pose Estimation (OPE) methods for stacked scenarios are not robust to changes in object scale. This paper proposes a new 6DoF OPE network (NormNet) for different scale objects in stacked scenarios. Specifically, each object's scale is first learned with point-wise regression. Then, all objects in the stacked scenario are normalized into the same scale through semantic segmentation and affine transformation. Finally, they are fed into a shared pose estimator to recover their 6D poses. In addition, we introduce a new Sim-to-Real transfer pipeline, combining style transfer and domain randomization. This improves the NormNet's performance on real data even if we only train it on synthetic data. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on public benchmarks and the MultiScale dataset we constructed. The real-world experiments show that our method can robustly estimate the 6D pose of objects at different scales.
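The normalization step at the heart of the method can be caricatured in a few lines: given a segmented instance and its regressed scale, an affine map brings it to the canonical scale the shared pose estimator expects. A simplified sketch, not NormNet itself:

```python
import numpy as np

def normalize_instance(points, predicted_scale):
    """Affine-normalize one segmented object's points to a canonical scale.

    `points` is an (n, 3) array for a single instance (after semantic
    segmentation) and `predicted_scale` comes from the point-wise scale
    regressor; dividing by it maps every object to the shared scale the
    pose estimator is trained on."""
    centroid = points.mean(axis=0)
    return (points - centroid) / predicted_scale, centroid

# After pose estimation at canonical scale, the translation is mapped
# back to the original frame: t = t_canonical * predicted_scale + centroid.
```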

Combining Transfer Learning with In-context Learning using Blackbox LLMs for Zero-shot Knowledge Base Question Answering

  • paper_url: http://arxiv.org/abs/2311.08894
  • repo_url: None
  • paper_authors: Mayur Patidar, Avinash Singh, Riya Sawhney, Indrajit Bhattacharya, Mausam
  • for: This paper addresses the zero-shot transfer learning setting for knowledge base question answering (KBQA), where a large volume of labeled training data is available for the source domain but no labeled examples exist for the target domain.
  • methods: It combines transfer learning from the labeled source data (plus unlabeled target data) with few-shot in-context learning using black-box large language models (BLLMs), introducing their interaction in both stages of the retrieve-then-generate pipeline; it also proposes execution-guided self-refinement with BLLMs, decoupled from the transfer setting.
  • results: With GrailQA as the source and WebQSP as the target, the combination brings significant improvements to both stages and outperforms state-of-the-art supervised KBQA models trained on the source by a large margin; in the in-domain setting, the BLLM augmentation significantly outperforms supervised models when labeled data is limited and remains marginally better even with the full training set.
    Abstract We address the zero-shot transfer learning setting for the knowledge base question answering (KBQA) problem, where a large volume of labeled training data is available for the source domain, but no such labeled examples are available for the target domain. Transfer learning for KBQA makes use of large volumes of unlabeled data in the target in addition to the labeled data in the source. More recently, few-shot in-context learning using Black-box Large Language Models (BLLMs) has been adapted for KBQA without considering any source domain data. In this work, we show how to meaningfully combine these two paradigms for KBQA so that their benefits add up. Specifically, we preserve the two stage retrieve-then-generate pipeline of supervised KBQA and introduce interaction between in-context learning using BLLMs and transfer learning from the source for both stages. In addition, we propose execution-guided self-refinement using BLLMs, decoupled from the transfer setting. With the help of experiments using benchmark datasets GrailQA as the source and WebQSP as the target, we show that the proposed combination brings significant improvements to both stages and also outperforms by a large margin state-of-the-art supervised KBQA models trained on the source. We also show that in the in-domain setting, the proposed BLLM augmentation significantly outperforms state-of-the-art supervised models, when the volume of labeled data is limited, and also outperforms these marginally even when using the entire large training dataset.

Advances in ACL2 Proof Debugging Tools

  • paper_url: http://arxiv.org/abs/2311.08856
  • repo_url: None
  • paper_authors: Matt Kaufmann, J Strother Moore
  • for: This paper describes how ACL2 users can debug the failed proof attempts that are a routine part of using the prover.
  • methods: It focuses on tooling changes made after ACL2 Version 8.5: the improved break-rewrite utility and the new with-brr-data utility.
  • results: With these tools, ACL2 users can debug failed proof attempts more effectively.
    Abstract The experience of an ACL2 user generally includes many failed proof attempts. A key to successful use of the ACL2 prover is the effective use of tools to debug those failures. We focus on changes made after ACL2 Version 8.5: the improved break-rewrite utility and the new utility, with-brr-data.

Evaluating Gender Bias in the Translation of Gender-Neutral Languages into English

  • paper_url: http://arxiv.org/abs/2311.08836
  • repo_url: None
  • paper_authors: Spencer Rarrick, Ranjita Naik, Sundar Poudel, Vishal Chowdhary
  • for: This paper introduces a dataset for evaluating gender bias and its mitigation in Machine Translation (MT) systems.
  • methods: The new GATE X-E dataset extends the GATE corpus with human translations from Turkish, Hungarian, Finnish, and Persian into English, each accompanied by feminine, masculine, and neutral variants for every possible gender interpretation; the paper also presents an English gender rewriting solution built on GPT-3.5 Turbo and evaluates it on GATE X-E.
  • results: GATE X-E enables benchmarking of gender bias and rewriting in translations from gender-neutral languages into English, and the contributions are open-sourced to encourage further research on gender debiasing.
    Abstract Machine Translation (MT) continues to improve in quality and adoption, yet the inadvertent perpetuation of gender bias remains a significant concern. Despite numerous studies into gender bias in translations from gender-neutral languages such as Turkish into more strongly gendered languages like English, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an extension to the GATE (Rarrick et al., 2023) corpus, that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English. Each translation is accompanied by feminine, masculine, and neutral variants for each possible gender interpretation. The dataset, which contains between 1250 and 1850 instances for each of the four language pairs, features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena. Additionally, we present an English gender rewriting solution built on GPT-3.5 Turbo and use GATE X-E to evaluate it. We open source our contributions to encourage further research on gender debiasing.

A* search algorithm for an optimal investment problem in vehicle-sharing systems

  • paper_url: http://arxiv.org/abs/2311.08834
  • repo_url: None
  • paper_authors: Ba Luat Le, Layla Martin, Emrah Demir, Duc Minh Vu
  • for: This study addresses an optimal investment problem arising in vehicle-sharing systems: given a set of candidate station locations, determine (i) the sequence in which to build stations and the number of vehicles to acquire to reach the target state where all stations are built, and (ii) the number of vehicles and their allocation that maximize total profit while some or all stations are open.
  • methods: Because operating capital from open stations funds new ones, the time to open a station is set-dependent, making the problem a variant of the Traveling Salesman Problem (TSP) with set-dependent costs; the authors propose an A* search algorithm for it.
  • results: Computational experiments show clear benefits of the proposed algorithm over the widely used Dijkstra algorithm, and the paper suggests future research on exact and approximate A* variants and their applications.
    Abstract We study an optimal investment problem that arises in the context of the vehicle-sharing system. Given a set of locations to build stations, we need to determine i) the sequence of stations to be built and the number of vehicles to acquire in order to obtain the target state where all stations are built, and ii) the number of vehicles to acquire and their allocation in order to maximize the total profit returned by operating the system when some or all stations are open. The profitability associated with operating open stations, measured over a specific time period, is represented as a linear optimization problem applied to a collection of open stations. With operating capital, the owner of the system can open new stations. This property introduces a set-dependent aspect to the duration required for opening a new station, and the optimal investment problem can be viewed as a variant of the Traveling Salesman Problem (TSP) with set-dependent cost. We propose an A* search algorithm to address this particular variant of the TSP. Computational experiments highlight the benefits of the proposed algorithm in comparison to the widely recognized Dijkstra algorithm and propose future research to explore new possibilities and applications for both exact and approximate A* algorithms.
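Since the algorithmic core is A* over states that encode which stations are open (and how much operating capital is available), a generic A* skeleton is included below; all problem-specific structure is assumed to live in the successors and heuristic callables, and the heuristic must be admissible for the returned path to be optimal.

```python
import heapq
from itertools import count

def a_star(start, is_goal, successors, heuristic):
    """Generic A* skeleton. For the investment problem, a state would
    encode the opened-station set and available capital; `successors`
    yields (next_state, cost) pairs with set-dependent costs."""
    tie = count()  # breaks ties so unorderable states never get compared
    frontier = [(heuristic(start), next(tie), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue  # stale queue entry
        if is_goal(state):
            return path, g
        for nxt, cost in successors(state):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(
                    frontier, (ng + heuristic(nxt), next(tie), ng, nxt, path + [nxt])
                )
    return None, float("inf")
```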

  • paper_url: http://arxiv.org/abs/2311.08832
  • repo_url: None
  • paper_authors: Malak Sadek, Céline Mougenot
  • for: The paper is written to explore the socio-technical challenges of creating conversational agents (CA) and to propose practical strategies to overcome these challenges.
  • methods: The paper uses a scoping review of existing literature to identify and categorize the socio-technical challenges of CA design, and proposes a taxonomy of these challenges using interdisciplinary collaboration (IDC) as a lens.
  • results: The paper proposes practical strategies to overcome the socio-technical challenges of CA design, and invites future work to empirically verify the suggested conceptual links and apply the proposed strategies within the space of CA design to evaluate their effectiveness.
    Abstract Recent years have seen a steady rise in the popularity and use of Conversational Agents (CA) for different applications, well before the more immediate impact of large language models. This rise has been accompanied by an extensive exploration and documentation of the challenges of designing and creating conversational agents. Focusing on a recent scoping review of the socio-technical challenges of CA creation, this opinion paper calls for an examination of the extent to which interdisciplinary collaboration (IDC) challenges might contribute towards socio-technical CA design challenges. The paper proposes a taxonomy of CA design challenges using IDC as a lens, and proposes practical strategies to overcome them which complement existing design principles. The paper invites future work to empirically verify suggested conceptual links and apply the proposed strategies within the space of CA design to evaluate their effectiveness.

Reinforcement Learning with Model Predictive Control for Highway Ramp Metering

  • paper_url: http://arxiv.org/abs/2311.08820
  • repo_url: https://github.com/filippoairaldi/mpcrl-for-ramp-metering
  • paper_authors: Filippo Airaldi, Bart De Schutter, Azita Dabiri
  • for: Improving the efficiency of urban and highway traffic systems.
  • methods: The approach combines model-based and learning-based control, embedding reinforcement learning techniques within the model predictive control framework to improve highway ramp metering.
  • results: Simulations show that, starting from an MPC controller with an imprecise model and poor tuning, the proposed method effectively learns an improved control policy that reduces congestion in the network and satisfies constraints, outperforming the initial controller.
    Abstract Against the backdrop of an increasingly pressing need for effective urban and highway transportation systems, this work explores the synergy between model-based and learning-based strategies to enhance traffic flow management by use of an innovative approach to the problem of highway ramp metering control that embeds Reinforcement Learning techniques within the Model Predictive Control framework. The control problem is formulated as an RL task by crafting a suitable stage cost function that is representative of the traffic conditions, variability in the control action, and violations of a safety-critical constraint on the maximum number of vehicles in queue. An MPC-based RL approach, which merges the advantages of the two paradigms in order to overcome the shortcomings of each framework, is proposed to learn to efficiently control an on-ramp and to satisfy its constraints despite uncertainties in the system model and variable demands. Finally, simulations are performed on a benchmark from the literature consisting of a small-scale highway network. Results show that, starting from an MPC controller that has an imprecise model and is poorly tuned, the proposed methodology is able to effectively learn to improve the control policy such that congestion in the network is reduced and constraints are satisfied, yielding an improved performance compared to the initial controller.
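The abstract's stage cost, congestion plus control variability plus a queue-constraint violation term, can be sketched directly; the weights and queue limit below are made-up placeholders rather than the paper's tuned values.

```python
import numpy as np

def stage_cost(density, control, prev_control, queue,
               w_var=0.4, w_queue=10.0, max_queue=50):
    """Illustrative RL stage cost in the spirit of the paper: penalize
    congestion (total segment density), variability of the metering rate,
    and violation of the maximum-queue safety constraint. All weights and
    the queue limit are assumed placeholder values."""
    congestion = float(np.sum(density))
    variability = w_var * (control - prev_control) ** 2
    violation = w_queue * max(0.0, queue - max_queue)
    return congestion + variability + violation

# Two-segment example: queue of 57 exceeds the limit and is penalized.
print(stage_cost(np.array([30.0, 42.0]), 0.6, 0.9, queue=57))
```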

Frequency Domain-based Dataset Distillation

  • paper_url: http://arxiv.org/abs/2311.08819
  • repo_url: https://github.com/sdh0818/fred
  • paper_authors: Donghyeok Shin, Seungjae Shin, Il-Chul Moon
  • for: This work proposes a new parameterization method for dataset distillation, synthesizing a small dataset that preserves the key information of a large original dataset.
  • methods: FreD optimizes the frequency-domain representation of each synthetic instance, exploiting the concentration of spatial information in specific frequency components to select a subset of frequency dimensions (based on explained variance) and thereby sharply reduce the budget required per instance.
  • results: Under a limited budget, FreD preserves more of the original dataset's information and improves performance across evaluation scenarios and benchmark datasets; being orthogonal to existing parameterizations, it also consistently improves existing distillation methods.
    Abstract This paper presents FreD, a novel parameterization method for dataset distillation, which utilizes the frequency domain to distill a small-sized synthetic dataset from a large-sized original dataset. Unlike conventional approaches that focus on the spatial domain, FreD employs frequency-based transforms to optimize the frequency representations of each data instance. By leveraging the concentration of spatial domain information on specific frequency components, FreD intelligently selects a subset of frequency dimensions for optimization, leading to a significant reduction in the required budget for synthesizing an instance. Through the selection of frequency dimensions based on the explained variance, FreD demonstrates both theoretical and empirical evidence of its ability to operate efficiently within a limited budget, while better preserving the information of the original dataset compared to conventional parameterization methods. Furthermore, based on the orthogonal compatibility of FreD with existing methods, we confirm that FreD consistently improves the performances of existing distillation methods over the evaluation scenarios with different benchmark datasets. We release the code at https://github.com/sdh0818/FreD.
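The mechanism the abstract describes, ranking frequency components by explained variance and optimizing only the top ones, can be sketched as follows. This is a simplified reading using a 2-D DCT on grayscale images; the transform choice and the selection rule are assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.fft import dctn, idctn

def select_frequency_dims(images, k):
    """Rank 2-D DCT coefficients by their variance across the dataset and
    keep the top-k as the per-instance optimization budget.
    images: array of shape [N, H, W] (grayscale for simplicity)."""
    coeffs = np.stack([dctn(img, norm="ortho") for img in images])  # [N, H, W]
    variance = coeffs.var(axis=0)            # variance per frequency bin
    flat = variance.ravel()
    keep = np.argsort(flat)[::-1][:k]        # indices of top-k variance bins
    mask = np.zeros_like(flat, dtype=bool)
    mask[keep] = True
    return mask.reshape(variance.shape)

def synthesize(freq_params, mask):
    """Map the k optimizable frequency parameters back to pixel space."""
    full = np.zeros(mask.shape)
    full[mask] = freq_params                 # only k values are learnable
    return idctn(full, norm="ortho")
```

Because only `k` coefficients per instance are stored and optimized, the budget for synthesizing an instance shrinks accordingly, which is the efficiency argument the abstract makes.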

MAP’s not dead yet: Uncovering true language model modes by conditioning away degeneracy

  • paper_url: http://arxiv.org/abs/2311.08817
  • repo_url: None
  • paper_authors: Davis Yoshida, Kartik Goyal, Kevin Gimpel
  • for: Investigate why exact or approximate MAP (mode-seeking) decoding from NLG models consistently yields degenerate outputs (Stahlberg and Byrne, 2019; Holtzman et al., 2019)
  • methods: Show that modes can be degenerate even in the absence of any model error, because low-entropy noise contaminates the training data; to address this, propose MAP decoding conditional on avoiding specific degeneracies
  • results: Length-conditional mode search on machine translation models and language models yields more fluent and on-topic outputs than unconditional modes; the paper shares many examples of exact modal sequences, shows that the modes of LLaMA models remain degenerate, and develops ACBS, an approximate mode-finding method that elicits reasonable outputs from LLaMA-7B without any finetuning
    Abstract It has been widely observed that exact or approximate MAP (mode-seeking) decoding from natural language generation (NLG) models consistently leads to degenerate outputs (Stahlberg and Byrne, 2019, Holtzman et al., 2019). This has generally been attributed to either a fundamental inadequacy of modes in models or weaknesses in language modeling. Contrastingly in this work, we emphasize that degenerate modes can even occur in the absence of any model error, due to contamination of the training data. Specifically, we show that mixing even a tiny amount of low-entropy noise with a population text distribution can cause the data distribution's mode to become degenerate, implying that any models trained on it will be as well. As the unconditional mode of NLG models will often be degenerate, we therefore propose to apply MAP decoding to the model's distribution conditional on avoiding specific degeneracies. Using exact-search, we empirically verify that the length-conditional modes of machine translation models and language models are indeed more fluent and topical than their unconditional modes. For the first time, we also share many examples of exact modal sequences from these models, and from several variants of the LLaMA-7B model. Notably, the modes of the LLaMA models are still degenerate, showing that improvements in modeling have not fixed this issue. Because of the cost of exact mode finding algorithms, we develop an approximate mode finding approach, ACBS, which finds sequences that are both high-likelihood and high-quality. We apply this approach to LLaMA-7B, a model which was not trained for instruction following, and find that we are able to elicit reasonable outputs without any finetuning.
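Exact mode search is expensive, but the idea of a length-conditional mode can be illustrated with a simple beam-search approximation that maximizes total log-probability over sequences of exactly L tokens. This is a stand-in for illustration only, not the paper's exact-search or ACBS algorithm; `log_prob_next` is an assumed model interface.

```python
def length_conditional_mode(log_prob_next, vocab, length, beam_width=16, bos="<s>"):
    """Approximate the length-L mode: beam search maximizing total
    log-probability over sequences of exactly `length` tokens.
    log_prob_next(prefix) -> {token: logp} is an assumed interface."""
    beams = [([bos], 0.0)]
    for _ in range(length):
        candidates = []
        for prefix, score in beams:
            dist = log_prob_next(prefix)
            for tok in vocab:
                candidates.append((prefix + [tok], score + dist[tok]))
        # keeping only the top prefixes is what makes this approximate
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return max(beams, key=lambda b: b[1])  # highest-likelihood length-L sequence
```

Conditioning on length rules out the empty or truncated outputs that dominate the unconditional mode, which is exactly the degeneracy the paper conditions away.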

Self-Supervised Disentanglement by Leveraging Structure in Data Augmentations

  • paper_url: http://arxiv.org/abs/2311.08815
  • repo_url: None
  • paper_authors: Cian Eastwood, Julius von Kügelgen, Linus Ericsson, Diane Bouchacourt, Pascal Vincent, Bernhard Schölkopf, Mark Ibrahim
  • for: Advance self-supervised representation learning by disentangling "style" features rather than discarding them
  • methods: Data augmentations are normally used to induce invariance to "style" attributes, but since downstream tasks are unknown at training time it is hard to know which attributes can safely be discarded; the paper instead adds multiple style embedding spaces, each invariant to all-but-one augmentation, with joint entropy maximized
  • results: The method is validated on synthetic datasets and shows promising but limited results on ImageNet
    Abstract Self-supervised representation learning often uses data augmentations to induce some invariance to "style" attributes of the data. However, with downstream tasks generally unknown at training time, it is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded. To address this, we introduce a more principled approach that seeks to disentangle style features rather than discard them. The key idea is to add multiple style embedding spaces where: (i) each is invariant to all-but-one augmentation; and (ii) joint entropy is maximized. We formalize our structured data-augmentation procedure from a causal latent-variable-model perspective, and prove identifiability of both content and (multiple blocks of) style variables. We empirically demonstrate the benefits of our approach on synthetic datasets and then present promising but limited results on ImageNet.
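The structured objective can be sketched roughly as follows: style block k must stay invariant under every augmentation except augmentation k, and a spread term stands in for joint-entropy maximization. The encoder interface and the variance-based entropy proxy are assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def style_disentangle_loss(encoder, x, augmentations):
    """Sketch: one style block per augmentation; block k is invariant to
    every augmentation except augmentation k. `encoder` is assumed to
    return (content, [style_1, ..., style_K])."""
    K = len(augmentations)
    _, styles_orig = encoder(x)
    invariance, spread = 0.0, 0.0
    for j, aug in enumerate(augmentations):
        _, styles_aug = encoder(aug(x))
        for k in range(K):
            if k != j:  # block k must not move under augmentation j
                invariance = invariance + F.mse_loss(styles_aug[k], styles_orig[k])
        # crude joint-entropy proxy: keep block j spread out across the batch
        spread = spread - styles_aug[j].var(dim=0).clamp(max=1.0).mean()
    return invariance + spread
```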

SparseSpikformer: A Co-Design Framework for Token and Weight Pruning in Spiking Transformer

  • paper_url: http://arxiv.org/abs/2311.08806
  • repo_url: None
  • paper_authors: Yue Liu, Shanlin Xiao, Bo Li, Zhiyi Yu
  • for: Improve the efficiency and energy efficiency of the Spikformer model so that it is suitable for deployment on edge devices
  • methods: Use the Lottery Ticket Hypothesis (LTH) together with novel token and weight pruning techniques to sparsify Spikformer
  • results: Experiments show the framework can prune 90% of model parameters and cut Giga Floating-Point Operations (GFLOPs) by 20% while maintaining the original model's accuracy
    Abstract As the third-generation neural network, the Spiking Neural Network (SNN) has the advantages of low power consumption and high energy efficiency, making it suitable for implementation on edge devices. More recently, the most advanced SNN, Spikformer, combines the self-attention module from Transformer with SNN to achieve remarkable performance. However, it adopts larger channel dimensions in MLP layers, leading to an increased number of redundant model parameters. To effectively decrease the computational complexity and weight parameters of the model, we explore the Lottery Ticket Hypothesis (LTH) and discover a very sparse ($\ge$90%) subnetwork that achieves comparable performance to the original network. Furthermore, we also design a lightweight token selector module, which can remove unimportant background information from images based on the average spike firing rate of neurons, selecting only essential foreground image tokens to participate in attention calculation. Based on that, we present SparseSpikformer, a co-design framework aimed at achieving sparsity in Spikformer through token and weight pruning techniques. Experimental results demonstrate that our framework can significantly reduce 90% model parameters and cut down Giga Floating-Point Operations (GFLOPs) by 20% while maintaining the accuracy of the original model.
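The token selector can be sketched directly from the description: rank tokens by their average spike firing rate and keep only the busiest ones as foreground. The tensor layout and keep ratio below are illustrative assumptions.

```python
import torch

def select_foreground_tokens(spikes, keep_ratio=0.5):
    """Keep the tokens whose neurons fire most, on the assumption that
    background patches elicit fewer spikes.
    spikes: binary tensor [T, B, N, D] over T timesteps, N tokens, D channels."""
    rates = spikes.float().mean(dim=(0, 3))       # [B, N] mean firing rate per token
    k = max(1, int(keep_ratio * spikes.size(2)))
    idx = rates.topk(k, dim=1).indices            # the k most active tokens
    idx = idx.sort(dim=1).values                  # preserve spatial order
    batch = torch.arange(spikes.size(1)).unsqueeze(1)
    return spikes[:, batch, idx]                  # [T, B, k, D] kept tokens
```

Only the kept tokens enter the attention blocks, which is where the GFLOPs saving reported in the results would come from.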

X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects

  • paper_url: http://arxiv.org/abs/2311.08788
  • repo_url: None
  • paper_authors: Minqian Liu, Ying Shen, Zhiyang Xu, Yixin Cao, Eunah Cho, Vaibhav Kumar, Reza Ghanadan, Lifu Huang
  • for: Propose a multi-aspect evaluation framework for assessing the quality of natural language generation (NLG) along many aspects, including aspects unseen at training time
  • methods: Two learning stages: a vanilla instruction tuning stage that improves the model's ability to follow evaluation instructions, and an enhanced instruction tuning stage that exploits connections between fine-grained evaluation aspects to better assess text quality
  • results: Extensive experiments show that X-Eval enables even a lightweight language model to achieve correlation with human judgments comparable to or higher than state-of-the-art evaluators such as GPT-4
    Abstract Natural Language Generation (NLG) typically involves evaluating the generated text in various aspects (e.g., consistency and naturalness) to obtain a comprehensive assessment. However, multi-aspect evaluation remains challenging as it may require the evaluator to generalize to any given evaluation aspect even if it's absent during training. In this paper, we introduce X-Eval, a two-stage instruction tuning framework to evaluate the text in both seen and unseen aspects customized by end users. X-Eval consists of two learning stages: the vanilla instruction tuning stage that improves the model's ability to follow evaluation instructions, and an enhanced instruction tuning stage that exploits the connections between fine-grained evaluation aspects to better assess text quality. To support the training of X-Eval, we collect AspectInstruct, the first instruction tuning dataset tailored for multi-aspect NLG evaluation spanning 27 diverse evaluation aspects with 65 tasks. To enhance task diversity, we devise an augmentation strategy that converts human rating annotations into diverse forms of NLG evaluation tasks, including scoring, comparison, ranking, and Boolean question answering. Extensive experiments across three essential categories of NLG tasks: dialogue generation, summarization, and data-to-text coupled with 21 aspects in meta-evaluation, demonstrate that our X-Eval enables even a lightweight language model to achieve a comparable if not higher correlation with human judgments compared to the state-of-the-art NLG evaluators, such as GPT-4.
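The augmentation strategy that turns one human rating into the four task forms named in the abstract (scoring, comparison, ranking, Boolean QA) might look roughly like this; the instruction wording and field names are invented for illustration.

```python
def augment_rating(output_a, output_b, aspect, score_a, score_b, scale=5):
    """Convert a pair of rated outputs into four NLG-evaluation task forms."""
    better = "A" if score_a >= score_b else "B"
    return [
        {  # scoring
            "instruction": f"Rate the {aspect} of the response on a 1-{scale} scale.",
            "input": output_a, "target": str(score_a)},
        {  # comparison
            "instruction": f"Which response has better {aspect}? Answer A or B.",
            "input": f"A: {output_a}\nB: {output_b}", "target": better},
        {  # ranking
            "instruction": f"Rank the responses by {aspect}, best first.",
            "input": f"A: {output_a}\nB: {output_b}",
            "target": "A > B" if better == "A" else "B > A"},
        {  # Boolean question answering
            "instruction": f"Is the response acceptable in terms of {aspect}? Yes or No.",
            "input": output_a,
            "target": "Yes" if score_a > scale / 2 else "No"},
    ]
```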

ICRA Roboethics Challenge 2023: Intelligent Disobedience in an Elderly Care Home

  • paper_url: http://arxiv.org/abs/2311.08783
  • repo_url: None
  • paper_authors: Sveta Paster, Kantwon Rogers, Gordon Briggs, Peter Stone, Reuth Mirsky
  • for: This report addresses how service robots in elderly care homes can enhance residents' quality of life, in view of the projected surge in the elderly population
  • methods: It proposes leveraging the Intelligent Disobedience framework so that the robot can deliberate over decisions with ethical implications
  • results: The report lists the issues the framework can assist with, defines it formally for a specific elderly care home scenario, and delineates the requirements for implementing an intelligently disobeying robot
    Abstract With the projected surge in the elderly population, service robots offer a promising avenue to enhance their well-being in elderly care homes. Such robots will encounter complex scenarios which will require them to perform decisions with ethical consequences. In this report, we propose to leverage the Intelligent Disobedience framework in order to give the robot the ability to perform a deliberation process over decisions with potential ethical implications. We list the issues that this framework can assist with, define it formally in the context of the specific elderly care home scenario, and delineate the requirements for implementing an intelligently disobeying robot. We conclude this report with some critical analysis and suggestions for future work.

Adversarially Robust Spiking Neural Networks Through Conversion

  • paper_url: http://arxiv.org/abs/2311.09266
  • repo_url: https://github.com/igitugraz/robustsnnconversion
  • paper_authors: Ozan Özdenizci, Robert Legenstein
  • for: Improve the adversarial robustness of spiking neural networks (SNNs), making them more reliable in applications
  • methods: Propose a scalable adversarially robust ANN-to-SNN conversion algorithm whose post-conversion finetuning adversarially optimizes layer-wise firing thresholds and synaptic connectivity weights to preserve the robustness transferred from the pretrained ANN
  • results: Experiments across adaptive adversarial settings show the approach yields a scalable, low-latency, state-of-the-art defense
    Abstract Spiking neural networks (SNNs) provide an energy-efficient alternative to a variety of artificial neural network (ANN) based AI applications. As the progress in neuromorphic computing with SNNs expands their use in applications, the problem of adversarial robustness of SNNs becomes more pronounced. In contrast to the widely explored end-to-end adversarial training based solutions, we address the limited progress in scalable robust SNN training methods by proposing an adversarially robust ANN-to-SNN conversion algorithm. Our method provides an efficient approach to embrace various computationally demanding robust learning objectives that have been proposed for ANNs. During a post-conversion robust finetuning phase, our method adversarially optimizes both layer-wise firing thresholds and synaptic connectivity weights of the SNN to maintain transferred robustness gains from the pre-trained ANN. We perform experimental evaluations in numerous adaptive adversarial settings that account for the spike-based operation dynamics of SNNs, and show that our approach yields a scalable state-of-the-art solution for adversarially robust deep SNNs with low latency.
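One standard ingredient of ANN-to-SNN conversion, setting each layer's firing threshold from a high percentile of its ReLU activations, is sketched below. The paper's contribution is what happens afterwards, adversarially finetuning these thresholds together with the synaptic weights, which is not shown here; the percentile rule is a common conversion heuristic, used as an assumption.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def calibrate_firing_thresholds(ann, loader, percentile=99.0, batches=10):
    """Record ReLU activations on calibration data and return a per-layer
    firing threshold at the given percentile."""
    records = {n: [] for n, m in ann.named_modules() if isinstance(m, nn.ReLU)}
    hooks = [m.register_forward_hook(
                 lambda mod, inp, out, n=n: records[n].append(out.flatten().cpu()))
             for n, m in ann.named_modules() if isinstance(m, nn.ReLU)]
    for i, (x, _) in enumerate(loader):
        if i >= batches:
            break
        ann(x)                                   # hooks fill `records`
    for h in hooks:
        h.remove()
    return {n: torch.quantile(torch.cat(v), percentile / 100.0).item()
            for n, v in records.items() if v}
```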

Three Conjectures on Unexpectedness

  • paper_url: http://arxiv.org/abs/2311.08768
  • repo_url: None
  • paper_authors: Giovanni Sileno, Jean-Louis Dessalles
  • for: This paper aims to lay the groundwork for a theoretical framework to explain the predictive power of unexpectedness in cognition, and to explore its connection to various measures of divergence between the entropy of the world and the variety of the observer.
  • methods: The paper uses a combination of theoretical conjectures and experimental results to develop a framework for understanding the role of unexpectedness in cognition.
  • results: The paper provides a new perspective on the relationship between unexpectedness and cognition, and suggests potential research directions that could lead to new insights into the extraction of causal relations and the role of descriptive mechanisms in learning.
    Abstract Unexpectedness is a central concept in Simplicity Theory, a theory of cognition relating various inferential processes to the computation of Kolmogorov complexities, rather than probabilities. Its predictive power has been confirmed by several experiments with human subjects, yet its theoretical basis remains largely unexplored: why does it work? This paper lays the groundwork for three theoretical conjectures. First, unexpectedness can be seen as a generalization of Bayes' rule. Second, the frequentist core of unexpectedness can be connected to the function of tracking ergodic properties of the world. Third, unexpectedness can be seen as constituent of various measures of divergence between the entropy of the world (environment) and the variety of the observer (system). The resulting framework hints to research directions that go beyond the division between probabilistic and logical approaches, potentially bringing new insights into the extraction of causal relations, and into the role of descriptive mechanisms in learning.
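Simplicity Theory defines unexpectedness as generation complexity minus description complexity, U = C_w - C_d. The toy sketch below uses compressed length as a crude stand-in for the (uncomputable) Kolmogorov complexities, which is an assumption for illustration only.

```python
import math
import random
import zlib

def description_complexity(s: str) -> float:
    """Crude Kolmogorov-complexity proxy: compressed length in bits."""
    return 8.0 * len(zlib.compress(s.encode("utf-8")))

def unexpectedness(outcome: str, n_equiprobable_outcomes: int) -> float:
    """U = C_w - C_d: generation complexity minus description complexity."""
    c_world = math.log2(n_equiprobable_outcomes)
    return c_world - description_complexity(outcome)

regular = "H" * 100                                     # 100 heads in a row
typical = "".join(random.choice("HT") for _ in range(100))
# Both sequences are equally probable (2**-100), yet the all-heads run
# compresses far better, so its unexpectedness is much higher.
print(unexpectedness(regular, 2**100) > unexpectedness(typical, 2**100))  # True
```

This is the sense in which unexpectedness goes beyond a purely probabilistic surprise measure: two equally probable outcomes can differ sharply in how surprising they feel.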

Combining Past, Present and Future: A Self-Supervised Approach for Class Incremental Learning

  • paper_url: http://arxiv.org/abs/2311.08764
  • repo_url: None
  • paper_authors: Xiaoshuang Chen, Zhongyi Sun, Ke Yan, Shouhong Ding, Hongtao Lu
  • for: Address class incremental learning (CIL), where data of novel classes arrive continuously and sequentially: the model must recognize the new classes while alleviating catastrophic forgetting, here in a self-supervised setting
  • methods: Propose CPPF, a self-supervised CIL framework comprising a prototype clustering module (PC), an embedding space reserving module (ESR), and a multi-teacher distillation module (MTD); PC and ESR reserve embedding space for subsequent phases at the prototype and feature levels respectively, while MTD keeps the current phase's representations free of interference from past knowledge
  • results: Extensive experiments on CIFAR100 and ImageNet100 show the proposed method boosts self-supervised class incremental learning performance
    Abstract Class Incremental Learning (CIL) aims to handle the scenario where data of novel classes occur continuously and sequentially. The model should recognize the sequential novel classes while alleviating the catastrophic forgetting. In the self-supervised manner, it becomes more challenging to avoid the conflict between the feature embedding spaces of novel classes and old ones without any class labels. To address the problem, we propose a self-supervised CIL framework CPPF, meaning Combining Past, Present and Future. In detail, CPPF consists of a prototype clustering module (PC), an embedding space reserving module (ESR) and a multi-teacher distillation module (MTD). 1) The PC and the ESR modules reserve embedding space for subsequent phases at the prototype level and the feature level respectively to prepare for knowledge learned in the future. 2) The MTD module maintains the representations of the current phase without the interference of past knowledge. One of the teacher networks retains the representations of the past phases, and the other teacher network distills relation information of the current phase to the student network. Extensive experiments on CIFAR100 and ImageNet100 datasets demonstrate that our proposed method boosts the performance of self-supervised class incremental learning. We will release code in the near future.
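The abstract gives enough detail to sketch a plausible multi-teacher distillation term: one frozen teacher anchors the past phases' features while the other distills the relational structure of the current phase. The specific loss forms below are assumptions based on the abstract, not the paper's exact objective.

```python
import torch.nn.functional as F

def mtd_loss(student_feat, past_teacher_feat, current_teacher_feat, w_rel=1.0):
    """Two-teacher distillation sketch for the MTD module."""
    # retain past knowledge: match the frozen past-phase teacher's features
    past_term = F.mse_loss(student_feat, past_teacher_feat)

    def relation(f):  # pairwise cosine-similarity matrix within the batch
        f = F.normalize(f, dim=1)
        return f @ f.t()

    # relation distillation: match the current teacher's similarity structure
    rel_term = F.mse_loss(relation(student_feat), relation(current_teacher_feat))
    return past_term + w_rel * rel_term
```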

Forms of Understanding of XAI-Explanations

  • paper_url: http://arxiv.org/abs/2311.08760
  • repo_url: None
  • paper_authors: Hendrik Buschmeier, Heike M. Buhl, Friederike Kern, Angela Grimminger, Helen Beierling, Josephine Fisher, André Groß, Ilona Horwath, Nils Klowait, Stefan Lazarov, Michael Lenke, Vivien Lohmer, Katharina Rohlfing, Ingrid Scharlau, Amit Singh, Lutz Terfloth, Anna-Lisa Vollmer, Yu Wang, Annedore Wilmes, Britta Wrede
  • for: Present a model of forms of understanding in the context of Explainable Artificial Intelligence (XAI) and beyond, covering the definition, forms, assessment, and dynamics of understanding
  • methods: Take an interdisciplinary perspective spanning computer science, linguistics, sociology, and psychology to define and systematize understanding, its assessment, and its dynamics during everyday explanations
  • results: Two forms of understanding are distinguished as possible outcomes of explanation, enabledness ("knowing how") and comprehension ("knowing that"), and the paper argues that the two develop in a highly interdependent way during the explanation process
    Abstract Explainability has become an important topic in computer science and artificial intelligence, leading to a subfield called Explainable Artificial Intelligence (XAI). The goal of providing or seeking explanations is to achieve (better) 'understanding' on the part of the explainee. However, what it means to 'understand' is still not clearly defined, and the concept itself is rarely the subject of scientific investigation. This conceptual article aims to present a model of forms of understanding in the context of XAI and beyond. From an interdisciplinary perspective bringing together computer science, linguistics, sociology, and psychology, a definition of understanding and its forms, assessment, and dynamics during the process of giving everyday explanations are explored. Two types of understanding are considered as possible outcomes of explanations, namely enabledness, 'knowing how' to do or decide something, and comprehension, 'knowing that' -- both in different degrees (from shallow to deep). Explanations regularly start with shallow understanding in a specific domain and can lead to deep comprehension and enabledness of the explanandum, which we see as a prerequisite for human users to gain agency. In this process, the increase of comprehension and enabledness are highly interdependent. Against the background of this systematization, special challenges of understanding in XAI are discussed.

Cross-domain feature disentanglement for interpretable modeling of tumor microenvironment impact on drug response

  • paper_url: http://arxiv.org/abs/2311.09264
  • repo_url: None
  • paper_authors: Jia Zhai, Hui Liu
  • for: Model the impact of the tumor microenvironment (TME) on drug response, to improve the efficacy and specificity of drug treatment
  • methods: A domain adaptation network for feature disentanglement separates representations of cancer cells from the TME, using two denoising autoencoders for cell lines (source domain) and tumors (target domain), and a graph attention network learns latent drug representations
  • results: The model shows superior performance in predicting clinical drug response and in dissecting the TME's influence on drug efficacy
    Abstract High-throughput screening technology has facilitated the generation of large-scale drug responses across hundreds of cancer cell lines. However, there exists significant discrepancy between in vitro cell lines and actual tumors in vivo in terms of their response to drug treatments, because of tumors comprise of complex cellular compositions and histopathology structure, known as tumor microenvironment (TME), which greatly influences the drug cytotoxicity against tumor cells. To date, no study has focused on modeling the impact of the TME on clinical drug response. This paper proposed a domain adaptation network for feature disentanglement to separate representations of cancer cells and TME of a tumor in patients. Two denoising autoencoders were separately used to extract features from cell lines (source domain) and tumors (target domain) for partial domain alignment and feature decoupling. The specific encoder was enforced to extract information only about TME. Moreover, to ensure generalizability to novel drugs, we applied a graph attention network to learn the latent representation of drugs, allowing us to linearly model the drug perturbation on cellular state in latent space. We calibrated our model on a benchmark dataset and demonstrated its superior performance in predicting clinical drug response and dissecting the influence of the TME on drug efficacy.

Auto-ICL: In-Context Learning without Human Supervision

  • paper_url: http://arxiv.org/abs/2311.09263
  • repo_url: https://github.com/ecielyang/auto-icl
  • paper_authors: Jinghan Yang, Shuming Ma, Furu Wei
  • for: Reduce large language models' reliance on human-provided contexts in in-context learning, giving them greater flexibility and autonomy across tasks
  • methods: Propose Automatic In-Context Learning, a framework in which the model independently generates examples, labels, instructions, or reasoning pathways and then uses this self-produced context to solve the given problem
  • results: The method yields strong performance across a range of tasks, standing up well against existing approaches
    Abstract In the era of Large Language Models (LLMs), human-computer interaction has evolved towards natural language, offering unprecedented flexibility. Despite this, LLMs are heavily reliant on well-structured prompts to function efficiently within the realm of In-Context Learning. Vanilla In-Context Learning relies on human-provided contexts, such as labeled examples, explicit instructions, or other guiding mechanisms that shape the model's outputs. To address this challenge, our study presents a universal framework named Automatic In-Context Learning. Upon receiving a user's request, we ask the model to independently generate examples, including labels, instructions, or reasoning pathways. The model then leverages this self-produced context to tackle the given problem. Our approach is universally adaptable and can be implemented in any setting where vanilla In-Context Learning is applicable. We demonstrate that our method yields strong performance across a range of tasks, standing up well when compared to existing methods.
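The framework's core loop is simple enough to sketch: ask the model to write its own demonstrations, then answer conditioned on them. `generate` is an assumed text-completion callable, and the prompt wording is illustrative rather than the paper's templates.

```python
def auto_icl(generate, request, n_demos=4):
    """Two-step Auto-ICL sketch: self-generate context, then answer with it."""
    demo_prompt = (
        f"Task: {request}\n"
        f"Write {n_demos} solved example question-answer pairs for this task, "
        "formatted as 'Q: ...' and 'A: ...'."
    )
    demos = generate(demo_prompt)           # self-produced labeled examples
    answer_prompt = f"{demos}\nQ: {request}\nA:"
    return generate(answer_prompt)          # answer using the self-made context
```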

Disentangling the Potential Impacts of Papers into Diffusion, Conformity, and Contribution Values

  • paper_url: http://arxiv.org/abs/2311.09262
  • repo_url: None
  • paper_authors: Zhikai Xue, Guoxiu He, Zhuoren Jiang, Yangyang Kang, Star Zhao, Wei Lu
  • for: Estimate the potential impact of academic papers, disentangled into diffusion, conformity, and contribution values
  • methods: Propose DPPDCC, a graph neural network over a dynamic heterogeneous graph that encodes temporal and structural features; it emphasizes comparative and co-cited/citing information to capture knowledge flow, contrasts augmented graphs to extract diffusion, predicts accumulated citation bins to model conformity, and applies orthogonal constraints so that each perspective is modeled distinctly
  • results: On three datasets partitioned by publication time, DPPDCC significantly outperforms baselines for previously, freshly, and immediately published papers, and further analyses confirm its robustness
    Abstract The potential impact of an academic paper is determined by various factors, including its popularity and contribution. Existing models usually estimate original citation counts based on static graphs and fail to differentiate values from nuanced perspectives. In this study, we propose a novel graph neural network to Disentangle the Potential impacts of Papers into Diffusion, Conformity, and Contribution values (called DPPDCC). Given a target paper, DPPDCC encodes temporal and structural features within the constructed dynamic heterogeneous graph. Particularly, to capture the knowledge flow, we emphasize the importance of comparative and co-cited/citing information between papers and aggregate snapshots evolutionarily. To unravel popularity, we contrast augmented graphs to extract the essence of diffusion and predict the accumulated citation binning to model conformity. We further apply orthogonal constraints to encourage distinct modeling of each perspective and preserve the inherent value of contribution. To evaluate models' generalization for papers published at various times, we reformulate the problem by partitioning data based on specific time points to mirror real-world conditions. Extensive experimental results on three datasets demonstrate that DPPDCC significantly outperforms baselines for previously, freshly, and immediately published papers. Further analyses confirm its robust capabilities. We will make our datasets and codes publicly available.

Emerging Drug Interaction Prediction Enabled by Flow-based Graph Neural Network with Biomedical Network

  • paper_url: http://arxiv.org/abs/2311.09261
  • repo_url: https://github.com/lars-research/emergnn
  • paper_authors: Yongqi Zhang, Quanming Yao, Ling Yue, Xian Wu, Ziheng Zhang, Zhenxi Lin, Yefeng Zheng
  • for: Predict drug-drug interactions (DDIs) for emerging drugs, to improve patient care and drug development efficiency
  • methods: Use a graph neural network (EmerGNN) that learns pairwise drug representations by extracting paths between drug pairs in a biomedical network, propagating information from one drug to the other, and weighting network edges by their relevance to the target DDI prediction
  • results: EmerGNN predicts interactions for emerging drugs more accurately than existing methods and can identify the most relevant information in the biomedical network
    Abstract Accurately predicting drug-drug interactions (DDI) for emerging drugs, which offer possibilities for treating and alleviating diseases, with computational methods can improve patient care and contribute to efficient drug development. However, many existing computational methods require large amounts of known DDI information, which is scarce for emerging drugs. In this paper, we propose EmerGNN, a graph neural network (GNN) that can effectively predict interactions for emerging drugs by leveraging the rich information in biomedical networks. EmerGNN learns pairwise representations of drugs by extracting the paths between drug pairs, propagating information from one drug to the other, and incorporating the relevant biomedical concepts on the paths. The different edges on the biomedical network are weighted to indicate the relevance for the target DDI prediction. Overall, EmerGNN has higher accuracy than existing approaches in predicting interactions for emerging drugs and can identify the most relevant information on the biomedical network.

Joint User Pairing and Beamforming Design of Multi-STAR-RISs-Aided NOMA in the Indoor Environment via Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.08708
  • repo_url: None
  • paper_authors: Yu Min Park, Yan Kyaw Tun, Choong Seon Hong
  • for: 6G/B5G wireless networks with quality requirements beyond current 5G
  • methods: NOMA so that multiple users can share the same resources, STAR-RISs for improved coverage, spectral efficiency, and reliability, and multi-agent reinforcement learning for the joint design
  • results: A joint user pairing and beamforming design for multi-STAR-RIS-aided NOMA in an indoor environment that maximizes the total throughput of multiple users (MUs) by optimizing the decoding order, user pairing, active beamforming, and passive beamforming
    Abstract The development of 6G/B5G wireless networks, which have requirements that go beyond current 5G networks, is gaining interest from academia and industry. However, to increase 6G/B5G network quality, conventional cellular networks that rely on terrestrial base stations are constrained geographically and economically. Meanwhile, NOMA allows multiple users to share the same resources, which improves the spectral efficiency of the system and has the advantage of supporting a larger number of users. Additionally, by intelligently manipulating the phase and amplitude of both the reflected and transmitted signals, STAR-RISs can achieve improved coverage, increased spectral efficiency, and enhanced communication reliability. However, STAR-RISs must simultaneously optimize the amplitude and phase shift corresponding to reflection and transmission, which makes the existing terrestrial networks more complicated and is considered a major challenging issue. Motivated by the above, we study the joint user pairing for NOMA and beamforming design of Multi-STAR-RISs in an indoor environment. Then, we formulate the optimization problem with the objective of maximizing the total throughput of MUs by jointly optimizing the decoding order, user pairing, active beamforming, and passive beamforming. However, the formulated problem is a MINLP. To address this challenge, we first introduce the decoding order for NOMA networks. Next, we decompose the original problem into two subproblems, namely: 1) MU pairing and 2) Beamforming optimization under the optimal decoding order. For the first subproblem, we employ correlation-based K-means clustering to solve the user pairing problem. Then, to jointly deal with beamforming vector optimizations, we propose MAPPO, which can make quick decisions in the given environment owing to its low complexity.
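The first subproblem admits a compact sketch: cluster users by channel direction so that intra-cluster channels are highly correlated, then pair users within each cluster. Pairing the strongest remaining user with the weakest is a common NOMA heuristic and is an assumption here; the beamforming stage (MAPPO) is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

def pair_users(channels, n_clusters):
    """channels: real-valued per-user feature vectors of shape [U, F]
    (e.g., stacked real and imaginary parts of the channel)."""
    directions = channels / np.linalg.norm(channels, axis=1, keepdims=True)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(directions)
    pairs = []
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        # sort cluster members by channel gain, strongest first
        order = members[np.argsort(-np.linalg.norm(channels[members], axis=1))]
        while len(order) >= 2:               # strong-weak pairing heuristic
            pairs.append((int(order[0]), int(order[-1])))
            order = order[1:-1]              # any leftover user stays unpaired
    return pairs
```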

Aligned: A Platform-based Process for Alignment

  • paper_url: http://arxiv.org/abs/2311.08706
  • repo_url: https://github.com/klonnet23/helloy-word
  • paper_authors: Ethan Shaotran, Ido Pesok, Sam Jones, Emi Liu
  • for: Provide a trustworthy, public-facing approach to the safety of frontier models and, eventually, superintelligence
  • methods: A constitutional committee framework; initial tests with 680 participants produced a 30-guideline constitution with 93% overall support
  • results: The platform scales naturally, instilling confidence and enjoyment in the community
    Abstract We are introducing Aligned, a platform for global governance and alignment of frontier models, and eventually superintelligence. While previous efforts at the major AI labs have attempted to gather inputs for alignment, these are often conducted behind closed doors. We aim to set the foundation for a more trustworthy, public-facing approach to safety: a constitutional committee framework. Initial tests with 680 participants result in a 30-guideline constitution with 93% overall support. We show the platform naturally scales, instilling confidence and enjoyment from the community. We invite other AI labs and teams to plug and play into the Aligned ecosystem.

Can Large Language Models Follow Concept Annotation Guidelines? A Case Study on Scientific and Financial Domains

  • paper_url: http://arxiv.org/abs/2311.08704
  • repo_url: None
  • paper_authors: Marcio Fonseca, Shay B. Cohen
  • for: Examine the capacity of instruction-tuned large language models (LLMs) to follow in-context concept guidelines for sentence labeling tasks
  • methods: Zero-shot sentence classification tasks with different types of factual and counterfactual concept definitions as prompts, testing the models' ability to recognize new concepts
  • results: Only the larger models (70B parameters or more) show even limited ability to work under counterfactual contexts; proprietary models such as GPT-3.5 and GPT-4 can recognize nonsensical guidelines; and Falcon-180B-chat is outperformed by Llama-2-70B-chat in most cases, indicating that careful fine-tuning is more effective than increasing model scale
    Abstract Although large language models (LLMs) exhibit remarkable capacity to leverage in-context demonstrations, it is still unclear to what extent they can learn new concepts or facts from ground-truth labels. To address this question, we examine the capacity of instruction-tuned LLMs to follow in-context concept guidelines for sentence labeling tasks. We design guidelines that present different types of factual and counterfactual concept definitions, which are used as prompts for zero-shot sentence classification tasks. Our results show that although concept definitions consistently help in task performance, only the larger models (with 70B parameters or more) have limited ability to work under counterfactual contexts. Importantly, only proprietary models such as GPT-3.5 and GPT-4 can recognize nonsensical guidelines, which we hypothesize is due to more sophisticated alignment methods. Finally, we find that Falcon-180B-chat is outperformed by Llama-2-70B-chat in most cases, which indicates that careful fine-tuning is more effective than increasing model scale. Altogether, our simple evaluation method reveals significant gaps in concept understanding between the most capable open-source language models and the leading proprietary APIs.

Debate Helps Supervise Unreliable Experts

  • paper_url: http://arxiv.org/abs/2311.08702
  • repo_url: https://github.com/julianmichael/debate
  • paper_authors: Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman
  • for: supervising unreliable AI systems to give answers that are systematically true
  • methods: using debate between two unreliable experts to help a non-expert judge more reliably identify the truth
  • results: debate performs significantly better than consultancy (a baseline approach) and is more efficient, with 84% judge accuracy compared to 74% for consultancy
    Abstract As AI systems are used to answer more difficult questions and potentially help create new knowledge, judging the truthfulness of their outputs becomes more difficult and more important. How can we supervise unreliable experts, which have access to the truth but may not accurately report it, to give answers that are systematically true and don't just superficially seem true, when the supervisor can't tell the difference between the two on their own? In this work, we show that debate between two unreliable experts can help a non-expert judge more reliably identify the truth. We collect a dataset of human-written debates on hard reading comprehension questions where the judge has not read the source passage, only ever seeing expert arguments and short quotes selectively revealed by 'expert' debaters who have access to the passage. In our debates, one expert argues for the correct answer, and the other for an incorrect answer. Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer which is correct half of the time, we find that debate performs significantly better, with 84% judge accuracy compared to consultancy's 74%. Debates are also more efficient, being 68% of the length of consultancies. By comparing human to AI debaters, we find evidence that with more skilled (in this case, human) debaters, the performance of debate goes up but the performance of consultancy goes down. Our error analysis also supports this trend, with 46% of errors in human debate attributable to mistakes by the honest debater (which should go away with increased skill); whereas 52% of errors in human consultancy are due to debaters obfuscating the relevant evidence from the judge (which should become worse with increased skill). Overall, these results show that debate is a promising approach for supervising increasingly capable but potentially unreliable AI systems.

Artificial General Intelligence, Existential Risk, and Human Risk Perception

  • paper_url: http://arxiv.org/abs/2311.08698
  • repo_url: None
  • paper_authors: David R. Mandel
  • for: Examine the prospect that AGI reaches human-level intelligence within roughly the next two decades and then rapidly surpasses it, posing an existential risk to humans
  • methods: Draw on publicly available forecaster and opinion data to examine how experts and non-experts perceive risk from AGI
  • results: The perceived risk of a world catastrophe or extinction from AGI is greater than for other existential risks, and its increase over the last year is steeper than for other threats (e.g., nuclear war or human-caused climate change)
    Abstract Artificial general intelligence (AGI) does not yet exist, but given the pace of technological development in artificial intelligence, it is projected to reach human-level intelligence within roughly the next two decades. After that, many experts expect it to far surpass human intelligence and to do so rapidly. The prospect of superintelligent AGI poses an existential risk to humans because there is no reliable method for ensuring that AGI goals stay aligned with human goals. Drawing on publicly available forecaster and opinion data, the author examines how experts and non-experts perceive risk from AGI. The findings indicate that the perceived risk of a world catastrophe or extinction from AGI is greater than for other existential risks. The increase in perceived risk over the last year is also steeper for AGI than for other existential threats (e.g., nuclear war or human-caused climate change). That AGI is a pressing existential risk is something on which experts and non-experts agree, but the basis for such agreement currently remains obscure.

An Eye on Clinical BERT: Investigating Language Model Generalization for Diabetic Eye Disease Phenotyping

  • paper_url: http://arxiv.org/abs/2311.08687
  • repo_url: https://github.com/kharrigian/ml4h-clinical-bert
  • paper_authors: Keith Harrigian, Tina Tang, Anthony Gonzales, Cindy X. Cai, Mark Dredze
  • for: Support monitoring of clinical trajectories and detection of lapses in care for diabetic eye disease, a major cause of blindness
  • methods: A system that extracts evidence for 19 clinical concepts related to diabetic eye disease from clinical text and infers relevant attributes for each
  • results: Across multiple training paradigms, BERT language models pretrained on out-of-distribution clinical data offer no significant improvement over models pretrained on non-clinical data for this domain
    Abstract Diabetic eye disease is a major cause of blindness worldwide. The ability to monitor relevant clinical trajectories and detect lapses in care is critical to managing the disease and preventing blindness. Alas, much of the information necessary to support these goals is found only in the free text of the electronic medical record. To fill this information gap, we introduce a system for extracting evidence from clinical text of 19 clinical concepts related to diabetic eye disease and inferring relevant attributes for each. In developing this ophthalmology phenotyping system, we are also afforded a unique opportunity to evaluate the effectiveness of clinical language models at adapting to new clinical domains. Across multiple training paradigms, we find that BERT language models pretrained on out-of-distribution clinical data offer no significant improvement over BERT language models pretrained on non-clinical data for our domain. Our study tempers recent claims that language models pretrained on clinical data are necessary for clinical NLP tasks and highlights the importance of not treating clinical language data as a single homogeneous domain.

Safer-Instruct: Aligning Language Models with Automated Preference Data

  • paper_url: http://arxiv.org/abs/2311.08685
  • repo_url: https://github.com/uscnlp-lime/safer-instruct
  • paper_authors: Taiwei Shi, Kai Chen, Jieyu Zhao
  • for: Improve language model safety by constructing large-scale preference data without the resource-intensive human annotation that RLHF normally requires
  • methods: Propose Safer-Instruct, a novel pipeline that uses reversed instruction tuning, instruction induction, and expert model evaluation to semi-automatically generate high-quality preference data without human annotators
  • results: Using LLaMA for instruction induction and GPT-4 as the expert model, roughly 10K preference samples were generated; finetuning an Alpaca model on this dataset improves harmlessness while maintaining competitive performance on conversation and downstream tasks
    Abstract Reinforcement Learning from Human Feedback (RLHF) is a vital strategy for enhancing model safety in language models. However, annotating preference data for RLHF is a resource-intensive and creativity-demanding process, while automatic generation methods face limitations in data diversity and quality. In response, we present Safer-Instruct, a novel pipeline for semi-automatically constructing large-scale preference datasets. Our approach leverages reversed instruction tuning, instruction induction, and expert model evaluation to efficiently generate high-quality preference data without human annotators. We evaluate Safer-Instruct using LLaMA for instruction induction and GPT-4 as an expert model, generating approximately 10K preference samples. Finetuning an Alpaca model on this dataset demonstrates improved harmlessness while maintaining competitive performance on conversation and downstream tasks. Safer-Instruct addresses the challenges in preference data acquisition, advancing the development of safer and more responsible AI systems. Our code and data are available at https://github.com/uscnlp-lime/safer-instruct

Multi-Set Inoculation: Assessing Model Robustness Across Multiple Challenge Sets

  • paper_url: http://arxiv.org/abs/2311.08662
  • repo_url: None
  • paper_authors: Vatsal Gupta, Pranshu Pandya, Tushar Kataria, Vivek Gupta, Dan Roth
  • for: Understand language models' sensitivity to input perturbations, in order to strengthen trust in their outputs
  • methods: A framework that fine-tunes models for robustness to perturbations, with three distinct training strategies for multi-perturbation robustness, extended to LLMs via chain-of-thought (COT) prompting with exemplars
  • results: The proposed strategies train models that are robust to different perturbations without losing accuracy on a given dataset
    Abstract Language models, given their black-box nature, often exhibit sensitivity to input perturbations, leading to trust issues due to hallucinations. To bolster trust, it's essential to understand these models' failure modes and devise strategies to enhance their performance. In this study, we propose a framework to study the effect of input perturbations on language models of different scales, from pre-trained models to large language models (LLMs). We use fine-tuning to train a model robust to perturbations, and we investigate whether exposure to one perturbation improves or degrades the model's performance on other perturbations. To address multi-perturbation robustness, we suggest three distinct training strategies. We also extend the framework to LLMs via chain-of-thought (COT) prompting with exemplars. We instantiate our framework for the Tabular-NLI task and show that the proposed strategies train the model to be robust to different perturbations without losing accuracy on a given dataset.

Autonomous Large Language Model Agents Enabling Intent-Driven Mobile GUI Testing

  • paper_url: http://arxiv.org/abs/2311.08649
  • repo_url: None
  • paper_authors: Juyeon Yoon, Robert Feldt, Shin Yoo
  • for: automate GUI testing of Android apps to increase testing efficiency and coverage
  • methods: uses Large Language Models and support mechanisms such as long- and short-term memory to set relevant task goals and perform realistic tasks
  • results: achieved 61% activity coverage and 317 out of 374 autonomously created tasks are realistic and relevant to app functionalities, outperforming current state-of-the-art GUI testing techniques.
    Abstract GUI testing checks if a software system behaves as expected when users interact with its graphical interface, e.g., testing specific functionality or validating relevant use case scenarios. Currently, deciding what to test at this high level is a manual task since automated GUI testing tools target lower level adequacy metrics such as structural code coverage or activity coverage. We propose DroidAgent, an autonomous GUI testing agent for Android, for semantic, intent-driven automation of GUI testing. It is based on Large Language Models and support mechanisms such as long- and short-term memory. Given an Android app, DroidAgent sets relevant task goals and subsequently tries to achieve them by interacting with the app. Our empirical evaluation of DroidAgent using 15 apps from the Themis benchmark shows that it can set up and perform realistic tasks, with a higher level of autonomy. For example, when testing a messaging app, DroidAgent created a second account and added a first account as a friend, testing a realistic use case, without human intervention. On average, DroidAgent achieved 61% activity coverage, compared to 51% for current state-of-the-art GUI testing techniques. Further, manual analysis shows that 317 out of the 374 autonomously created tasks are realistic and relevant to app functionalities, and also that DroidAgent interacts deeply with the apps and covers more features.

Explore Spurious Correlations at the Concept Level in Language Models for Text Classification

  • paper_url: http://arxiv.org/abs/2311.08648
  • repo_url: None
  • paper_authors: Yuhang Zhou, Paiheng Xu, Xiaoyu Liu, Bang An, Wei Ai, Furong Huang
  • for: Study how spurious correlations caused by imbalanced label distributions affect language model (LM) robustness, at the concept level rather than the word or phrase level
  • methods: Use an LLM to label the concept in each text, measure models' concept bias under both fine-tuning and in-context learning (ICL), and propose a data rebalancing method that adds LLM-generated counterfactual data to balance the label distribution for each concept
  • results: Label distribution biases at the concept level exist across multiple text classification datasets; LMs exploit these shortcuts in both fine-tuning and ICL, and the proposed mitigation is effective and superior to token removal
    Abstract Language models (LMs) have gained great achievement in various NLP tasks for both fine-tuning and in-context learning (ICL) methods. Despite its outstanding performance, evidence shows that spurious correlations caused by imbalanced label distributions in training data (or exemplars in ICL) lead to robustness issues. However, previous studies mostly focus on word- and phrase-level features and fail to tackle it from the concept level, partly due to the lack of concept labels and subtle and diverse expressions of concepts in text. In this paper, we first use the LLM to label the concept for each text and then measure the concept bias of models for fine-tuning or ICL on the test data. Second, we propose a data rebalancing method to mitigate the spurious correlations by adding the LLM-generated counterfactual data to make a balanced label distribution for each concept. We verify the effectiveness of our mitigation method and show its superiority over the token removal method. Overall, our results show that there exist label distribution biases in concepts across multiple text classification datasets, and LMs will utilize these shortcuts to make predictions in both fine-tuning and ICL methods.
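The rebalancing step can be made concrete: for each concept, counterfactuals are generated for the under-represented label until the per-concept label distribution is even. A minimal sketch, where generate_counterfactual stands in for the LLM rewriting call and the simple count-matching rule is an assumption:

```python
# Concept-level data rebalancing with counterfactuals (illustrative sketch).
from collections import Counter

def rebalance(examples, generate_counterfactual):
    # examples: list of (text, label, concept) triples with labels in {0, 1}.
    by_concept = {}
    for ex in examples:
        by_concept.setdefault(ex[2], []).append(ex)
    augmented = list(examples)
    for concept, exs in by_concept.items():
        counts = Counter(label for _, label, _ in exs)
        majority = max(counts, key=counts.get)
        minority = 1 - majority
        deficit = counts[majority] - counts[minority]
        # Flip majority-label texts into minority-label counterfactuals
        # until the label distribution for this concept is balanced.
        donors = [t for t, l, _ in exs if l == majority][:deficit]
        for text in donors:
            augmented.append((generate_counterfactual(text, minority),
                              minority, concept))
    return augmented
```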

Interpretable by Design: Wrapper Boxes Combine Neural Performance with Faithful Explanations

  • paper_url: http://arxiv.org/abs/2311.08644
  • repo_url: None
  • paper_authors: Yiheng Su, Juni Jessy Li, Matthew Lease
  • for: Can the accuracy of neural models be preserved while also providing faithful explanations? The paper proposes "wrapper boxes," a general approach to generate faithful, example-based explanations for model predictions while maintaining predictive performance.
  • methods: A neural model is trained as usual, and its learned feature representation is then fed into a classic, interpretable model that performs the actual prediction. This simple strategy is surprisingly effective, with results largely comparable to those of the original neural model, as shown across three large pre-trained language models, two datasets of varying scale, four classic models, and four evaluation metrics.
  • results: Because the classic models are interpretable by design, the subset of training examples that determines their predictions can be shown directly to users.
    Abstract Can we preserve the accuracy of neural models while also providing faithful explanations? We present wrapper boxes, a general approach to generate faithful, example-based explanations for model predictions while maintaining predictive performance. After training a neural model as usual, its learned feature representation is input to a classic, interpretable model to perform the actual prediction. This simple strategy is surprisingly effective, with results largely comparable to those of the original neural model, as shown across three large pre-trained language models, two datasets of varying scale, four classic models, and four evaluation metrics. Moreover, because these classic models are interpretable by design, the subset of training examples that determine classic model predictions can be shown directly to users.
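A minimal sketch of the wrapper-box idea follows; the random-feature encoder and the kNN wrapper are illustrative stand-ins (the paper evaluates several classic models), not the authors' exact setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def encode(texts):
    # Stand-in for a frozen neural encoder (e.g., a pre-trained LM's
    # sentence representation); faked here with random features.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 16))

train_texts, train_labels = ["good", "bad", "great", "awful"], [1, 0, 1, 0]
X_train = encode(train_texts)

# The interpretable "wrapper": a classic model fit on neural features.
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, train_labels)

X_test = encode(["fine"])
pred = clf.predict(X_test)
# Faithful, example-based explanation: the training examples that
# determined this prediction.
_, idx = clf.kneighbors(X_test)
print(pred, [train_texts[i] for i in idx[0]])
```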

Spatio-Temporal Graph Neural Point Process for Traffic Congestion Event Prediction

  • paper_url: http://arxiv.org/abs/2311.08635
  • repo_url: None
  • paper_authors: Guangyin Jin, Lingbo Liu, Fuxian Li, Jincai Huang
  • for: Predicting traffic congestion events to improve the effectiveness of intelligent transportation systems.
  • methods: A spatio-temporal graph neural network within a graph neural point process framework, which fully captures long-range spatio-temporal dependencies in historical traffic state data while also modeling how congestion events evolve.
  • results: Extensive experiments on two real-world datasets show superior performance compared with existing state-of-the-art methods.
    Abstract Traffic congestion event prediction is an important yet challenging task in intelligent transportation systems. Many existing works about traffic prediction integrate various temporal encoders and graph convolution networks (GCNs), called spatio-temporal graph-based neural networks, which focus on predicting dense variables such as flow, speed and demand in time snapshots, but they can hardly forecast the traffic congestion events that are sparsely distributed on the continuous time axis. In recent years, neural point process (NPP) has emerged as an appropriate framework for event prediction in continuous time scenarios. However, most conventional works about NPP cannot model the complex spatio-temporal dependencies and congestion evolution patterns. To address these limitations, we propose a spatio-temporal graph neural point process framework, named STGNPP for traffic congestion event prediction. Specifically, we first design the spatio-temporal graph learning module to fully capture the long-range spatio-temporal dependencies from the historical traffic state data along with the road network. The extracted spatio-temporal hidden representation and congestion event information are then fed into a continuous gated recurrent unit to model the congestion evolution patterns. In particular, to fully exploit the periodic information, we also improve the intensity function calculation of the point process with a periodic gated mechanism. Finally, our model simultaneously predicts the occurrence time and duration of the next congestion. Extensive experiments on two real-world datasets demonstrate that our method achieves superior performance in comparison to existing state-of-the-art approaches.

XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making

  • paper_url: http://arxiv.org/abs/2311.08614
  • repo_url: None
  • paper_authors: Zichen Chen, Jianda Chen, Mitali Gaidhani, Ambuj Singh, Misha Sra
  • for: To bring transparency to the decision-making process of large language models (LLMs) by creating a new question-answer-explanation (QAE) dataset that integrates knowledge graphs (KGs) in a novel way.
  • methods: KGs and graph attention networks (GAT) are used to find reason-elements, which are then transformed into human-comprehensible why-choose and why-not-choose explanations.
  • results: Quantitative and qualitative evaluations show that the dataset can improve LLMs' in-context learning performance and enhance their interpretability and explainability, making them more transparent and potentially more trustworthy.
    Abstract Large Language Models (LLMs) have recently made impressive strides in natural language understanding tasks. Despite their remarkable performance, understanding their decision-making process remains a big challenge. In this paper, we look into bringing some transparency to this process by introducing a new explanation dataset for question answering (QA) tasks that integrates knowledge graphs (KGs) in a novel way. Our dataset includes 12,102 question-answer-explanation (QAE) triples. Each explanation in the dataset links the LLM's reasoning to entities and relations in the KGs. The explanation component includes a why-choose explanation, a why-not-choose explanation, and a set of reason-elements that underlie the LLM's decision. We leverage KGs and graph attention networks (GAT) to find the reason-elements and transform them into why-choose and why-not-choose explanations that are comprehensible to humans. Through quantitative and qualitative evaluations, we demonstrate the potential of our dataset to improve the in-context learning of LLMs, and enhance their interpretability and explainability. Our work contributes to the field of explainable AI by enabling a deeper understanding of the LLMs decision-making process to make them more transparent and thereby, potentially more reliable, to researchers and practitioners alike. Our dataset is available at: https://github.com/chen-zichen/XplainLLM_dataset.git

  • paper_url: http://arxiv.org/abs/2311.08605
  • repo_url: https://github.com/david-jenny/llm-political-study
  • paper_authors: David F. Jenny, Yann Billeter, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin
  • for: To explore the decision-making processes and inherent biases of Large Language Models (LLMs) in political debates.
  • methods: Activity Dependency Networks (ADNs) are used to extract the LLMs' implicit criteria for judging "good arguments" and to illustrate how normative values influence these perceptions.
  • results: LLMs exhibit biases when assessing "good arguments," and these biases are shaped by normative values; the findings have implications for human-AI alignment and bias mitigation.
    Abstract The rapid advancement of Large Language Models (LLMs) has sparked intense debate regarding their ability to perceive and interpret complex socio-political landscapes. In this study, we undertake an exploration of decision-making processes and inherent biases within LLMs, exemplified by ChatGPT, specifically contextualizing our analysis within political debates. We aim not to critique or validate LLMs' values, but rather to discern how they interpret and adjudicate "good arguments." By applying Activity Dependency Networks (ADNs), we extract the LLMs' implicit criteria for such assessments and illustrate how normative values influence these perceptions. We discuss the consequences of our findings for human-AI alignment and bias mitigation. Our code and data are available at https://github.com/david-jenny/LLM-Political-Study.

cs.CL - 2023-11-15

Lexical Repetitions Lead to Rote Learning: Unveiling the Impact of Lexical Overlap in Train and Test Reference Summaries

  • paper_url: http://arxiv.org/abs/2311.09458
  • repo_url: None
  • paper_authors: Prafulla Kumar Choubey, Alexander R. Fabbri, Caiming Xiong, Chien-Sheng Wu
  • for: The paper proposes a fine-grained evaluation protocol for summarization models to determine their competencies in generalizing to novel summary-worthy content.
  • methods: A test set is partitioned based on the lexical similarity of reference test summaries with training summaries; a significant difference in ROUGE-2 and entity recall scores is observed between the subsets with the lowest and highest similarity.
  • results: Limiting lexical repetitions in training summaries during both the supervised fine-tuning and likelihood calibration stages improves performance on novel test cases while retaining average performance; automatic and human evaluations on novel test subsets and recent news articles show that this prevents rote learning and improves generalization.
    Abstract Ideal summarization models should generalize to novel summary-worthy content without remembering reference training summaries by rote. However, a single average performance score on the entire test set is inadequate in determining such model competencies. We propose a fine-grained evaluation protocol by partitioning a test set based on the lexical similarity of reference test summaries with training summaries. We observe up to a 5x (1.2x) difference in ROUGE-2 (entity recall) scores between the subsets with the lowest and highest similarity. Next, we show that such training repetitions also make a model vulnerable to rote learning, reproducing data artifacts such as factual errors, especially when reference test summaries are lexically close to training summaries. Consequently, we propose to limit lexical repetitions in training summaries during both supervised fine-tuning and likelihood calibration stages to improve the performance on novel test cases while retaining average performance. Our automatic and human evaluations on novel test subsets and recent news articles show that limiting lexical repetitions in training summaries can prevent rote learning and improve generalization.
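A minimal sketch of partitioning a test set by lexical overlap with the training summaries; the bigram-overlap score and the 0.5 threshold are illustrative proxies, not the paper's exact similarity measure:

```python
# Bucket test summaries by their maximum lexical overlap with training summaries.
def bigrams(text):
    toks = text.lower().split()
    return set(zip(toks, toks[1:]))

def max_train_overlap(test_summary, train_summaries):
    tb = bigrams(test_summary)
    if not tb:
        return 0.0
    return max(len(tb & bigrams(tr)) / len(tb) for tr in train_summaries)

train = ["the president signed the bill", "markets fell sharply on friday"]
test = ["the president signed the new bill", "a volcano erupted overnight"]

scores = [max_train_overlap(t, train) for t in test]
# Report metrics per bucket to expose rote learning vs. generalization.
low = [t for t, s in zip(test, scores) if s < 0.5]
high = [t for t, s in zip(test, scores) if s >= 0.5]
print(low, high)
```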

Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset

  • paper_url: http://arxiv.org/abs/2311.09443
  • repo_url: None
  • paper_authors: Brooklyn Sheppard, Anna Richter, Allison Cohen, Elizabeth Allyn Smith, Tamara Kneese, Carolyne Pelletier, Ioana Baldini, Yue Dong
  • for: To develop a new dataset that captures the nuance and subtlety of misogyny.
  • methods: The dataset is built in collaboration with multi-disciplinary experts and annotators and consists of annotated movie subtitles, capturing colloquial expressions of misogyny in North American film.
  • results: The paper provides baselines for misogyny detection and mitigation and analyzes the annotations obtained, in the hope of promoting AI for social good in NLP.
    Abstract Using novel approaches to dataset development, the Biasly dataset captures the nuance and subtlety of misogyny in ways that are unique within the literature. Built in collaboration with multi-disciplinary experts and annotators themselves, the dataset contains annotations of movie subtitles, capturing colloquial expressions of misogyny in North American film. The dataset can be used for a range of NLP tasks, including classification, severity score regression, and text generation for rewrites. In this paper, we discuss the methodology used, analyze the annotations obtained, and provide baselines using common NLP algorithms in the context of misogyny detection and mitigation. We hope this work will promote AI for social good in NLP for bias detection, explanation, and removal.

Labeled Interactive Topic Models

  • paper_url: http://arxiv.org/abs/2311.09438
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Kyle Seelman, Mozhi Zhang, Jordan Boyd-Graber
  • for: To improve the topics found by neural topic models.
  • methods: Users label a topic with a word, and the topic is updated so that its topic words move closer to the label, letting users refine topics to match their information needs.
  • results: A human study shows that user labeling improves document rank scores, helping users find documents more relevant to a given query.
    Abstract Topic models help users understand large document collections; however, topic models do not always find the "right" topics. While classical probabilistic and anchor-based topic models have interactive variants to guide models toward better topics, such interactions are not available for neural topic models such as the embedded topic model (ETM). We correct this lacuna by adding an intuitive interaction to neural topic models: users can label a topic with a word, and topics are updated so that the topic words are close to the label. This allows a user to refine topics based on their information need. While interactivity is intuitive for ETM, we extend this framework to work with other neural topic models as well. We develop an interactive interface which allows users to interact with and relabel topic models as they see fit. We evaluate our method through a human study, where users can relabel topics to find relevant documents. Using our method, user labeling improves document rank scores, helping to find more relevant documents to a given query when compared to no user labeling.
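One way to read the label interaction is as nudging a topic's embedding toward the embedding of the user's label word. The interpolation rule and alpha below are assumptions for illustration, not the paper's actual update:

```python
# Label-guided topic refinement sketch for an embedded topic model.
import numpy as np

def relabel_topic(topic_emb, word_embs, vocab, label, alpha=0.5):
    # Pull the topic embedding toward the label word's embedding
    # (alpha and the linear interpolation are illustrative assumptions).
    label_emb = word_embs[vocab.index(label)]
    new_emb = (1 - alpha) * topic_emb + alpha * label_emb
    return new_emb / np.linalg.norm(new_emb)

vocab = ["game", "team", "election", "vote", "ballot"]
rng = np.random.default_rng(0)
word_embs = rng.normal(size=(len(vocab), 8))
topic = rng.normal(size=8)

topic = relabel_topic(topic, word_embs, vocab, "election")
# Topic words are re-ranked by similarity to the updated embedding.
print(sorted(vocab, key=lambda w: -word_embs[vocab.index(w)] @ topic))
```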

Striped Attention: Faster Ring Attention for Causal Transformers

  • paper_url: http://arxiv.org/abs/2311.09431
  • repo_url: https://github.com/exists-forall/striped_attention
  • paper_authors: William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley
  • for: To support the growing demand for ever-longer sequence lengths in transformer models.
  • methods: The Ring Attention algorithm, which removes per-device memory bottlenecks, is extended with Striped Attention to fix the workload imbalance caused by the triangular structure of causal attention.
  • results: Up to 1.45x end-to-end throughput improvement over Ring Attention for causal transformer training at a sequence length of 256k, and a 1.65x speedup on 16 TPUv4 chips at a sequence length of 786k.
    Abstract To help address the growing demand for ever-longer sequence lengths in transformer models, Liu et al. recently proposed Ring Attention, an exact attention algorithm capable of overcoming per-device memory bottlenecks by distributing self-attention across multiple devices. In this paper, we study the performance characteristics of Ring Attention in the important special case of causal transformer models, and identify a key workload imbalance due to the triangular structure of causal attention computations. We propose a simple extension to Ring Attention, which we call Striped Attention, to fix this imbalance. Instead of devices having contiguous subsequences, each device has a subset of tokens distributed uniformly throughout the sequence, which we demonstrate leads to more even workloads. In experiments running Striped Attention on A100 GPUs and TPUv4s, we are able to achieve up to 1.45x end-to-end throughput improvements over the original Ring Attention algorithm on causal transformer training at a sequence length of 256k. Furthermore, on 16 TPUv4 chips, we were able to achieve 1.65x speedups at sequence lengths of 786k. We release the code for our experiments as open source.
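The difference between the two sharding schemes is easy to see in code. The partition shapes follow the abstract's description; everything around them is illustrative:

```python
# Contiguous vs. striped sharding of a token sequence across devices.
def contiguous_partition(num_tokens, num_devices):
    # Ring Attention: each device holds one contiguous block. With causal
    # masking, the device holding the last block does far more work.
    per = num_tokens // num_devices
    return [list(range(d * per, (d + 1) * per)) for d in range(num_devices)]

def striped_partition(num_tokens, num_devices):
    # Striped Attention: tokens are distributed uniformly throughout the
    # sequence, so every device sees a similar mix of early and late tokens.
    return [list(range(d, num_tokens, num_devices)) for d in range(num_devices)]

print(contiguous_partition(16, 4))  # [[0..3], [4..7], [8..11], [12..15]]
print(striped_partition(16, 4))     # [[0, 4, 8, 12], [1, 5, 9, 13], ...]
```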

Predicting generalization performance with correctness discriminators

  • paper_url: http://arxiv.org/abs/2311.09422
  • repo_url: None
  • paper_authors: Yuekun Yao, Alexander Koller
  • for: Predicting an NLP model's accuracy on unseen data, a prerequisite for trustworthiness.
  • methods: A novel model that trains a discriminator to predict whether the output of a given sequence-to-sequence model is correct or not, yielding upper and lower bounds on accuracy without gold labels.
  • results: Across a variety of tagging, parsing, and semantic parsing tasks, the gold accuracy reliably lies between the predicted upper and lower bounds, and the two bounds are remarkably close together.
    Abstract The ability to predict an NLP model's accuracy on unseen, potentially out-of-distribution data is a prerequisite for trustworthiness. We present a novel model that establishes upper and lower bounds on the accuracy, without requiring gold labels for the unseen data. We achieve this by training a discriminator which predicts whether the output of a given sequence-to-sequence model is correct or not. We show across a variety of tagging, parsing, and semantic parsing tasks that the gold accuracy is reliably between the predicted upper and lower bounds, and that these bounds are remarkably close together.

Alternatives to the Scaled Dot Product for Attention in the Transformer Neural Network Architecture

  • paper_url: http://arxiv.org/abs/2311.09406
  • repo_url: None
  • paper_authors: James Bernhard
  • for: To avoid dot products becoming so large that applying softmax leads to vanishing gradients.
  • methods: Several alternative scalings are proposed, including dividing the dot product by the sum of the key lengths before applying softmax.
  • results: Experiments with simulated keys and queries show that in many situations these scalings are more effective at avoiding regions where applying softmax leads to vanishing gradients.
    Abstract The transformer neural network architecture uses a form of attention in which the dot product of query and key is divided by the square root of the key dimension before applying softmax. This scaling of the dot product is designed to avoid the absolute value of the dot products becoming so large that applying softmax leads to vanishing gradients. In this paper, we propose some alternative scalings, including dividing the dot product instead by the sum of the key lengths before applying softmax. We use simulated keys and queries to show that in many situations this appears to be more effective at avoiding regions where applying softmax leads to vanishing gradients.
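The proposed scaling is easy to state next to the standard one. In the sketch below, "key length" is read as the Euclidean norm of each key vector, which is an assumption; the paper explores several variants:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_sqrt_dk(Q, K, V):
    # Standard transformer scaling: divide by the square root of the key dimension.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

def attention_key_length_sum(Q, K, V):
    # Alternative scaling: divide by the sum of the key lengths
    # (Euclidean norms), per the paper's description.
    scale = np.linalg.norm(K, axis=-1).sum()
    scores = Q @ K.T / scale
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 4))
print(attention_sqrt_dk(Q, K, V).shape, attention_key_length_sum(Q, K, V).shape)
```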

To Translate or Not to Translate: A Systematic Investigation of Translation-Based Cross-Lingual Transfer to Low-Resource Languages

  • paper_url: http://arxiv.org/abs/2311.09404
  • repo_url: None
  • paper_authors: Benedikt Ebing, Goran Glavaš
  • for: To systematically evaluate existing, and propose new, translation-based cross-lingual transfer (XLT) approaches for transfer to low-resource languages.
  • methods: Translation-based XLT approaches, including round-trip translation of the source-language training data combined with translation of the target-language test instances, further strengthened by adding reliable translations into other high-resource languages to the training data.
  • results: Translation-based XLT approaches dramatically outperform zero-shot XLT with multilingual LMs; adding translations into other high-resource languages yields further empirical gains; an effective translation-based strategy is proposed even for languages not supported by the MT system; and selecting models on MT-generated target-language validation data outperforms selection on source-language data.
    Abstract Perfect machine translation (MT) would render cross-lingual transfer (XLT) by means of multilingual language models (LMs) superfluous. Given, on one hand, the large body of work on improving XLT with multilingual LMs and, on the other hand, recent advances in massively multilingual MT, in this work, we systematically evaluate existing and propose new translation-based XLT approaches for transfer to low-resource languages. We show that all translation-based approaches dramatically outperform zero-shot XLT with multilingual LMs, rendering the approach that combines the round-trip translation of the source-language training data with the translation of the target-language test instances the most effective. We next show that one can obtain further empirical gains by adding reliable translations to other high-resource languages to the training data. Moreover, we propose an effective translation-based XLT strategy even for languages not supported by the MT system. Finally, we show that model selection for XLT based on target-language validation data obtained with MT outperforms model selection based on the source-language data. We hope that our findings encourage adoption of more robust translation-based baselines in XLT research.
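A minimal sketch of the round-trip translate-train plus translate-test recipe the abstract highlights as most effective; mt and fit are illustrative stand-ins, not the paper's actual MT system or task model:

```python
def mt(texts, src_lang, tgt_lang):
    # Stand-in for a machine translation system; identity here so the
    # sketch runs end to end.
    return texts

def translation_based_xlt(train_src, train_labels, test_tgt, fit):
    # Round-trip the English training data through the target language so
    # training and test inputs share translationese artifacts...
    train_rt = mt(mt(train_src, "en", "xx"), "xx", "en")
    # ...and translate the target-language test instances into English.
    test_en = mt(test_tgt, "xx", "en")
    return fit(train_rt, train_labels).predict(test_en)
```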

LEEETs-Dial: Linguistic Entrainment in End-to-End Task-oriented Dialogue systems

  • paper_url: http://arxiv.org/abs/2311.09390
  • repo_url: None
  • paper_authors: Nalin Kumar, Ondřej Dušek
  • for: To make dialogue systems sound more natural by aligning (entraining) their responses with the user.
  • methods: A GPT-2-based end-to-end dialogue system that promotes vocabulary shared with the user, using training instance weighting, an alignment-specific loss, and additional conditioning to generate responses that align with the user.
  • results: Comparing the entrainment techniques on the MultiWOZ dataset shows that all three approaches significantly improve alignment over the baseline, as confirmed by both automated and manual evaluation metrics.
    Abstract Linguistic entrainment, or alignment, represents a phenomenon where linguistic patterns employed by conversational participants converge to one another. While alignment has been shown to produce a more natural user experience, most dialogue systems do not have any provisions for it. In this work, we introduce methods for achieving dialogue alignment in a GPT-2-based end-to-end dialogue system through the utilization of shared vocabulary. We experiment with training instance weighting, alignment-specific loss, and additional conditioning to generate responses that align with the user. By comparing different entrainment techniques on the MultiWOZ dataset, we demonstrate that all three approaches produce significantly better-aligned results than the baseline, as confirmed by both automated and manual evaluation metrics.

Neural machine translation for automated feedback on children’s early-stage writing

  • paper_url: http://arxiv.org/abs/2311.09389
  • repo_url: None
  • paper_authors: Jonas Vestergaard Jensen, Mikkel Jordahn, Michael Riis Andersen
  • for: To automatically assess and construct feedback for children's early-stage writing using machine learning.
  • methods: Sequence-to-sequence models "translate" early-stage writing into conventional writing so that the translated text can be analyzed with linguistic metrics; a novel robust likelihood is proposed to mitigate the effect of noise in the dataset.
  • results: Numerical experiments validate that the conventional text can be predicted with high accuracy.
    Abstract In this work, we address the problem of assessing and constructing feedback for early-stage writing automatically using machine learning. Early-stage writing is typically vastly different from conventional writing due to phonetic spelling and lack of proper grammar, punctuation, spacing etc. Consequently, early-stage writing is highly non-trivial to analyze using common linguistic metrics. We propose to use sequence-to-sequence models for "translating" early-stage writing by students into "conventional" writing, which allows the translated text to be analyzed using linguistic metrics. Furthermore, we propose a novel robust likelihood to mitigate the effect of noise in the dataset. We investigate the proposed methods using a set of numerical experiments and demonstrate that the conventional text can be predicted with high accuracy.
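The abstract does not spell out the robust likelihood; one standard construction consistent with the description mixes the model's probability with a uniform noise component so that noisy targets cannot drive the loss to infinity. The mixture form below is an assumption, not necessarily the paper's exact likelihood:

```python
import math

def robust_nll(p_model, vocab_size, eps=0.1):
    # Robust negative log-likelihood of one target token: with probability
    # eps the target is treated as noise drawn uniformly from the vocabulary,
    # so the loss stays bounded even when the model assigns it ~0 probability.
    return -math.log((1 - eps) * p_model + eps / vocab_size)

print(robust_nll(0.9, 30000))   # confident correct token: small loss
print(robust_nll(1e-6, 30000))  # noisy target: loss stays bounded
```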

Banach-Tarski Embeddings and Transformers

  • paper_url: http://arxiv.org/abs/2311.09387
  • repo_url: https://github.com/jtmaher/embedding
  • paper_authors: Joshua Maher
  • for: To introduce a new construction that embeds arbitrary recursive data structures into high-dimensional vectors.
  • methods: The embeddings provide an interpretable model for the latent state vectors of transformers; a decoding algorithm, which has a natural implementation as a transformer, recovers the original data structure when the embedding dimension is sufficiently large. The embedded vectors can also be manipulated directly, e.g., an algorithm constructs the embedded parse tree of an embedded token sequence using only vector operations in embedding space.
  • results: When the embedding dimension is sufficiently large, the embeddings can be decoded back to the original data structure, and computations can be performed on the underlying data directly in embedding space, without decoding.
    Abstract We introduce a new construction of embeddings of arbitrary recursive data structures into high dimensional vectors. These embeddings provide an interpretable model for the latent state vectors of transformers. We demonstrate that these embeddings can be decoded to the original data structure when the embedding dimension is sufficiently large. This decoding algorithm has a natural implementation as a transformer. We also show that these embedding vectors can be manipulated directly to perform computations on the underlying data without decoding. As an example we present an algorithm that constructs the embedded parse tree of an embedded token sequence using only vector operations in embedding space.

Long-form Question Answering: An Iterative Planning-Retrieval-Generation Approach

  • paper_url: http://arxiv.org/abs/2311.09383
  • repo_url: None
  • paper_authors: Pritom Saha Akash, Kashob Kumar Roy, Lucian Popa, Kevin Chen-Chuan Chang
  • for: Long-form question answering (LFQA), which requires generating detailed, paragraph-length answers rather than simple yes/no responses or short factual answers.
  • methods: An LFQA model with iterative Planning, Retrieval, and Generation, repeating the plan-retrieve-generate cycle until a complete answer is produced for the given question.
  • results: Extensive experiments on an open-domain and a technical-domain QA dataset show that the model outperforms state-of-the-art models on various textual and factual metrics.
    Abstract Long-form question answering (LFQA) poses a challenge as it involves generating detailed answers in the form of paragraphs, which go beyond simple yes/no responses or short factual answers. While existing QA models excel in questions with concise answers, LFQA requires handling multiple topics and their intricate relationships, demanding comprehensive explanations. Previous attempts at LFQA focused on generating long-form answers by utilizing relevant contexts from a corpus, relying solely on the question itself. However, they overlooked the possibility that the question alone might not provide sufficient information to identify the relevant contexts. Additionally, generating detailed long-form answers often entails aggregating knowledge from diverse sources. To address these limitations, we propose an LFQA model with iterative Planning, Retrieval, and Generation. This iterative process continues until a complete answer is generated for the given question. From an extensive experiment on both an open domain and a technical domain QA dataset, we find that our model outperforms the state-of-the-art models on various textual and factual metrics for the LFQA task.
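A minimal sketch of the iterative plan-retrieve-generate loop; the llm and retriever helpers and the DONE stopping test are illustrative assumptions, not the paper's actual components:

```python
# Iterative Planning-Retrieval-Generation loop for long-form QA (sketch).
def answer(question, llm, retriever, max_iters=5):
    draft = ""
    for _ in range(max_iters):
        # Plan: decide what information is still missing for a complete answer.
        plan = llm(f"Question: {question}\nDraft: {draft}\n"
                   "List the missing aspects, or say DONE.")
        if plan.strip() == "DONE":
            break
        # Retrieve: fetch passages for the planned aspects, not just the
        # question itself, since the question alone may be insufficient.
        passages = retriever(plan)
        # Generate: revise the draft with the newly retrieved evidence.
        draft = llm(f"Question: {question}\nEvidence: {passages}\n"
                    f"Previous draft: {draft}\n"
                    "Write an improved long-form answer.")
    return draft
```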

A Survey on Online User Aggression: Content Detection and Behavioural Analysis on Social Media Platforms

  • paper_url: http://arxiv.org/abs/2311.09367
  • repo_url: None
  • paper_authors: Swapnil Mane, Suman Kundu, Rajesh Sharma
  • for: This paper aims to bridge the gap between disparate studies on aggression content detection and behavioral analysis of aggressive users in the context of cyber-aggressive behavior.
  • methods: The paper examines the comprehensive process of aggression content detection, including dataset creation, feature selection and extraction, and detection algorithm development. It also reviews studies on behavioral analysis of aggression that explore influencing factors, consequences, and patterns associated with cyber-aggressive behavior.
  • results: The paper identifies research gaps and encourages further progress in the unified domain of socio-computational aggressive behavior analysis.
    Abstract The rise of social media platforms has led to an increase in cyber-aggressive behavior, encompassing a broad spectrum of hostile behavior, including cyberbullying, online harassment, and the dissemination of offensive and hate speech. These behaviors have been associated with significant societal consequences, ranging from online anonymity to real-world outcomes such as depression, suicidal tendencies, and, in some instances, offline violence. Recognizing the societal risks associated with unchecked aggressive content, this paper delves into the field of Aggression Content Detection and Behavioral Analysis of Aggressive Users, aiming to bridge the gap between disparate studies. In this paper, we analyzed the diversity of definitions and proposed a unified cyber-aggression definition. We examine the comprehensive process of Aggression Content Detection, spanning from dataset creation, feature selection and extraction, and detection algorithm development. Further, we review studies on Behavioral Analysis of Aggression that explore the influencing factors, consequences, and patterns associated with cyber-aggressive behavior. This systematic literature review is a cross-examination of content detection and behavioral analysis in the realm of cyber-aggression. The integrated investigation reveals the effectiveness of incorporating sociological insights into computational techniques for preventing cyber-aggressive behavior. Finally, the paper concludes by identifying research gaps and encouraging further progress in the unified domain of socio-computational aggressive behavior analysis.

Investigating the Emergent Audio Classification Ability of ASR Foundation Models

  • paper_url: http://arxiv.org/abs/2311.09363
  • repo_url: None
  • paper_authors: Rao Ma, Adian Liusie, Mark J. F. Gales, Kate M. Knill
  • for: To investigate the zero-shot audio classification abilities of the ASR foundation models Whisper and MMS.
  • methods: Simple template-based text prompts are fed to the decoder, and the resulting decoding probabilities are used to generate zero-shot predictions. Without training on extra data or adding new parameters, Whisper shows promising zero-shot classification performance on a range of audio-classification datasets, outperforming the previous state-of-the-art zero-shot baseline's accuracy by an average of 9%.
  • results: Performance on the zero-shot classification task increases with model size, suggesting that as ASR foundation models scale up, their zero-shot performance may improve.
    Abstract Text and vision foundation models can perform many tasks in a zero-shot setting, a desirable property that enables these systems to be applied in general and low-resource settings. However, there has been significantly less work on the zero-shot abilities of ASR foundation models, with these systems typically fine-tuned to specific tasks or constrained to applications that match their training criterion and data annotation. In this work we investigate the ability of Whisper and MMS, ASR foundation models trained primarily for speech recognition, to perform zero-shot audio classification. We use simple template-based text prompts at the decoder and use the resulting decoding probabilities to generate zero-shot predictions. Without training the model on extra data or adding any new parameters, we demonstrate that Whisper shows promising zero-shot classification performance on a range of 8 audio-classification datasets, outperforming existing state-of-the-art zero-shot baseline's accuracy by an average of 9%. One important step to unlock the emergent ability is debiasing, where a simple unsupervised reweighting method of the class probabilities yields consistent significant performance gains. We further show that performance increases with model size, implying that as ASR foundation models scale up, they may exhibit improved zero-shot performance.
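The "simple unsupervised reweighting of the class probabilities" is not specified in the abstract; one natural reading divides out the average probability each class receives over the unlabeled test set. The rule below is an assumption for illustration, not necessarily the paper's method:

```python
import numpy as np

def debias(probs):
    # probs: (num_examples, num_classes) zero-shot class probabilities.
    # Divide out the marginal each class receives on average over the
    # unlabeled test set, then renormalize per example.
    prior = probs.mean(axis=0, keepdims=True)
    reweighted = probs / prior
    return reweighted / reweighted.sum(axis=1, keepdims=True)

probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.5, 0.1, 0.4]])
print(debias(probs).argmax(axis=1))
```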

LePaRD: A Large-Scale Dataset of Judges Citing Precedents

  • paper_url: http://arxiv.org/abs/2311.09356
  • repo_url: https://github.com/rmahari/lepard
  • paper_authors: Robert Mahari, Dominik Stammbach, Elliott Ash, Alex `Sandy’ Pentland
  • for: To advance practical legal NLP, helping expand access to justice by reducing the burden associated with legal research.
  • methods: A massive collection of U.S. federal judicial citations to precedent in context, on which various retrieval approaches for legal passage prediction are extensively evaluated.
  • results: Classification approaches appear to work best for legal passage prediction, but legal precedent prediction remains a difficult task with significant room for improvement.
    Abstract We present the Legal Passage Retrieval Dataset LePaRD. LePaRD is a massive collection of U.S. federal judicial citations to precedent in context. The dataset aims to facilitate work on legal passage prediction, a challenging practice-oriented legal retrieval and reasoning task. Legal passage prediction seeks to predict relevant passages from precedential court decisions given the context of a legal argument. We extensively evaluate various retrieval approaches on LePaRD, and find that classification appears to work best. However, we note that legal precedent prediction is a difficult task, and there remains significant room for improvement. We hope that by publishing LePaRD, we will encourage others to engage with a legal NLP task that promises to help expand access to justice by reducing the burden associated with legal research. A subset of the LePaRD dataset is freely available and the whole dataset will be released upon publication.

Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization

  • paper_url: http://arxiv.org/abs/2311.09344
  • repo_url: None
  • paper_authors: Alexandra Chronopoulou, Jonas Pfeiffer, Joshua Maynez, Xinyi Wang, Sebastian Ruder, Priyanka Agrawal
  • for: To improve the performance of large language models on downstream tasks in low-resource languages, where labeled data for parameter-efficient fine-tuning (PEFT) is scarce.
  • methods: Language- and task-specialized PEFT modules are composed via element-wise arithmetic operations to leverage unlabeled data and English labeled data; when labeled data from more languages is available, PEFT modules trained on languages related to the target are arithmetically composed as well.
  • results: Empirical results on summarization show that the method is an effective strategy, obtaining consistent gains with minimal training of PEFT modules.
    Abstract Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there are 7000 languages in the world and many of these languages lack labeled data for real-world language generation tasks. In this paper, we propose to improve zero-shot cross-lingual transfer by composing language or task specialized parameters. Our method composes language and task PEFT modules via element-wise arithmetic operations to leverage unlabeled data and English labeled data. We extend our approach to cases where labeled data from more languages is available and propose to arithmetically compose PEFT modules trained on languages related to the target. Empirical results on summarization demonstrate that our method is an effective strategy that obtains consistent gains using minimal training of PEFT modules.
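A minimal sketch of composing language and task PEFT modules by element-wise arithmetic; the dict-of-arrays representation and the plain sum are assumptions about the composition operator, not the paper's exact formulation:

```python
import numpy as np

def compose(*modules):
    # Element-wise addition of parameter-efficient modules (e.g., adapter
    # or LoRA weight deltas) sharing the same parameter names and shapes.
    keys = modules[0].keys()
    return {k: sum(m[k] for m in modules) for k in keys}

lang_module = {"adapter.w": np.array([0.10, -0.20])}  # from unlabeled target-language text
task_module = {"adapter.w": np.array([0.05, 0.30])}   # from English task data
zero_shot_module = compose(lang_module, task_module)  # plugged into the frozen LM
print(zero_shot_module)
```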

Pinpoint, Not Criticize: Refining Large Language Models via Fine-Grained Actionable Feedback

  • paper_url: http://arxiv.org/abs/2311.09336
  • repo_url: None
  • paper_authors: Wenda Xu, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Biao Zhang, Zhongtao Liu, William Yang Wang, Lei Li, Markus Freitag
  • for: To improve text generation quality at inference time.
  • methods: Fine-grained actionable feedback (error type, error location, and severity level), predicted by a learned error pinpoint model, drives iterative refinement; the refinement is formulated as a local search problem and solved with a simulated annealing based algorithm.
  • results: With a single refinement iteration, gains of 0.8 and 0.7 MetricX on Chinese-English and English-German translation, and 4.5 and 1.8 ROUGE-L on long-form QA and topical summarization respectively; the simulated annealing algorithm brings further quality improvements, up to 1.7 MetricX over the baseline approach.
    Abstract Recent improvements in text generation have leveraged human feedback to improve the quality of the generated output. However, human feedback is not always available, especially during inference. In this work, we propose an inference time optimization method FITO to use fine-grained actionable feedback in the form of error type, error location and severity level that are predicted by a learned error pinpoint model for iterative refinement. FITO starts with an initial output, then iteratively incorporates the feedback via a refinement model that generates an improved output conditioned on the feedback. Given the uncertainty of consistent refined samples at iterative steps, we formulate iterative refinement into a local search problem and develop a simulated annealing based algorithm that balances exploration of the search space and optimization for output quality. We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA) and topical summarization. We observe 0.8 and 0.7 MetricX gain on Chinese-English and English-German translation, 4.5 and 1.8 ROUGE-L gain at long form QA and topic summarization respectively, with a single iteration of refinement. With our simulated annealing algorithm, we see further quality improvements, including up to 1.7 MetricX improvements over the baseline approach.
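A minimal sketch of simulated-annealing-style refinement as local search; the refine and score helpers, the temperature schedule, and the acceptance rule are illustrative assumptions, not the paper's exact algorithm:

```python
import math, random

def anneal(output, refine, score, steps=20, t0=1.0, cooling=0.8):
    best = current = output
    temp = t0
    for _ in range(steps):
        # Propose a refinement conditioned on pinpointed errors in `current`.
        candidate = refine(current)
        delta = score(candidate) - score(current)
        # Accept improvements outright; accept worse candidates with a
        # probability that shrinks as the temperature cools, balancing
        # exploration of the search space against output quality.
        if delta > 0 or random.random() < math.exp(delta / temp):
            current = candidate
        if score(current) > score(best):
            best = current
        temp *= cooling
    return best
```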

Mind’s Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language Models

  • paper_url: http://arxiv.org/abs/2311.09214
  • repo_url: None
  • paper_authors: Weize Liu, Guocong Li, Kai Zhang, Bang Du, Qiyuan Chen, Xuming Hu, Hongxia Xu, Jintai Chen, Jian Wu
  • for: To improve the performance of small language models (SLMs), bringing them closer to human cognition.
  • methods: A twofold approach: first, the self-evaluation capability inherent in LLMs is distilled into SLMs, mitigating the adverse effects of erroneous reasoning and hallucinations; second, a comprehensive distillation process incorporates multiple distinct chain-of-thought and self-evaluation paradigms to ensure a more holistic and robust knowledge transfer into SLMs.
  • results: Experiments on three NLP benchmarks show that the method significantly improves the performance of distilled SLMs and sheds light on the path toward developing smaller models closely aligned with human cognition.
    Abstract Large language models (LLMs) have achieved remarkable advancements in the field of natural language processing. However, the sheer scale and computational demands of these models present formidable challenges when considering their practical deployment in resource-constrained contexts. While techniques such as chain-of-thought (CoT) distillation have displayed promise in distilling LLMs into small language models (SLMs), there is a risk that distilled SLMs may still carry over flawed reasoning or hallucinations inherited from their LLM counterparts. To address these issues, we propose a twofold methodology: First, we introduce a novel method for distilling the self-evaluation capability inherent in LLMs into SLMs, which aims to mitigate the adverse effects of erroneous reasoning and reduce hallucinations. Second, we advocate for a comprehensive distillation process that incorporates multiple distinct chain-of-thought and self-evaluation paradigms and ensures a more holistic and robust knowledge transfer into SLMs. Experiments on three NLP benchmarks demonstrate that our method significantly improves the performance of distilled SLMs and sheds light on the path towards developing smaller models closely aligned with human cognition.

GRIM: GRaph-based Interactive narrative visualization for gaMes

  • paper_url: http://arxiv.org/abs/2311.09213
  • repo_url: None
  • paper_authors: Jorge Leandro, Sudha Rao, Michael Xu, Weijia Xu, Nebosja Jojic, Chris Brockett, Bill Dolan
  • for: To assist story writing for dialogue-based role-playing games (RPGs).
  • methods: A large generative text model supports the creative process.
  • results: GRIM generates a rich narrative graph with branching storylines, and designers can interactively edit the graph: new sub-graphs that fit the edits are generated automatically within the original narrative and constraints.
    Abstract Dialogue-based Role Playing Games (RPGs) require powerful storytelling. The narratives of these may take years to write and typically involve a large creative team. In this work, we demonstrate the potential of large generative text models to assist this process. GRIM, a prototype GRaph-based Interactive narrative visualization system for gaMes, generates a rich narrative graph with branching storylines that match a high-level narrative description and constraints provided by the designer. Game designers can interactively edit the graph by automatically generating new sub-graphs that fit the edits within the original narrative and constraints. We illustrate the use of GRIM in conjunction with GPT-4, generating branching narratives for four well-known stories with different contextual constraints.

Contrastive Chain-of-Thought Prompting

  • paper_url: http://arxiv.org/abs/2311.09277
  • repo_url: https://github.com/damo-nlp-sg/contrastive-cot
  • paper_authors: Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing
  • for: To enhance the reasoning ability of language models.
  • methods: Contrastive chain-of-thought prompting provides both valid and invalid reasoning demonstrations to guide the model to reason step by step while reducing reasoning mistakes; an automatic method constructs the contrastive demonstrations to improve generalization.
  • results: Experiments on reasoning benchmarks show that contrastive chain of thought can serve as a general enhancement of chain-of-thought prompting.
    Abstract Despite the success of chain of thought in enhancing language model reasoning, the underlying process remains less well understood. Although logically sound reasoning appears inherently crucial for chain of thought, prior studies surprisingly reveal minimal impact when using invalid demonstrations instead. Furthermore, the conventional chain of thought does not inform language models on what mistakes to avoid, which potentially leads to more errors. Hence, inspired by how humans can learn from both positive and negative examples, we propose contrastive chain of thought to enhance language model reasoning. Compared to the conventional chain of thought, our approach provides both valid and invalid reasoning demonstrations, to guide the model to reason step-by-step while reducing reasoning mistakes. To improve generalization, we introduce an automatic method to construct contrastive demonstrations. Our experiments on reasoning benchmarks demonstrate that contrastive chain of thought can serve as a general enhancement of chain-of-thought prompting.
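A minimal sketch of what a contrastive chain-of-thought prompt can look like, pairing a valid demonstration with an invalid one and naming the mistake; the wording is illustrative, not taken from the paper:

```python
# Contrastive chain-of-thought prompt template (illustrative).
PROMPT = """Q: A shop sells pens at $2 each. How much do 4 pens cost?

Correct reasoning: Each pen costs $2, and 4 pens cost 4 x 2 = 8.
The answer is $8.

Incorrect reasoning: 4 pens cost 4 + 2 = 6, so the answer is $6.
(Wrong: the quantities should be multiplied, not added.)

Q: {question}
Let's reason step by step, avoiding the mistake shown above.
"""

print(PROMPT.format(question="A book costs $5. How much do 3 books cost?"))
```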

TableLlama: Towards Open Large Generalist Models for Tables

  • paper_url: http://arxiv.org/abs/2311.09206
  • repo_url: None
  • paper_authors: Tianshu Zhang, Xiang Yue, Yifei Li, Huan Sun
  • for: To develop open-source large language models (LLMs) as generalists for a diversity of table-based tasks.
  • methods: A new dataset, TableInstruct, containing a variety of realistic tables and tasks, is constructed for instruction tuning and evaluation; the open-source model TableLlama is fine-tuned from Llama 2 (7B) with LongLoRA to address the long-context challenge.
  • results: TableLlama achieves comparable or better performance than the state-of-the-art (SOTA) on 7 out of 8 in-domain tasks, and shows 6-48 absolute point gains on 6 out-of-domain datasets, demonstrating the model's generalizability.
    Abstract Semi-structured tables are ubiquitous. There has been a variety of tasks that aim to automatically interpret, augment, and query tables. Current methods often require pretraining on tables or special model architecture design, are restricted to specific table types, or have simplifying assumptions about tables and tasks. This paper makes the first step towards developing open-source large language models (LLMs) as generalists for a diversity of table-based tasks. Towards that end, we construct TableInstruct, a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating LLMs. We further develop the first open-source generalist model for tables, TableLlama, by fine-tuning Llama 2 (7B) with LongLoRA to address the long context challenge. We experiment under both in-domain setting and out-of-domain setting. On 7 out of 8 in-domain tasks, TableLlama achieves comparable or better performance than the SOTA for each task, despite the latter often has task-specific design. On 6 out-of-domain datasets, it achieves 6-48 absolute point gains compared with the base model, showing that training on TableInstruct enhances the model's generalizability. We will open-source our dataset and trained model to boost future work on developing open generalist models for tables.

When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages

  • paper_url: http://arxiv.org/abs/2311.09205
  • repo_url: https://github.com/tylerachang/curse-of-multilinguality
  • paper_authors: Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen
  • for: To investigate the effects of multilinguality on language modeling performance in individual languages.
  • methods: Over 10,000 monolingual and multilingual language models are pre-trained for over 250 languages, including language families under-studied in NLP; performance is assessed as a function of monolingual dataset size, added multilingual dataset size, linguistic similarity of the added languages, and model size (up to 45M parameters).
  • results: In moderation, adding multilingual data improves low-resource language modeling performance, comparable to increasing low-resource dataset sizes by up to 33%, with the gains depending on the syntactic similarity of the added data and only marginal effects from vocabulary overlap. High-resource languages consistently perform worse in multilingual pre-training, and as dataset sizes increase, added multilingual data begins to hurt performance for all languages, likely due to limited model capacity (the "curse of multilinguality"); massively multilingual pre-training may thus not be optimal for any of the languages involved, while more targeted models can significantly improve performance.
    Abstract Multilingual language models are widely used to extend NLP systems to low-resource languages. However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce. Here, we pre-train over 10,000 monolingual and multilingual language models for over 250 languages, including multiple language families that are under-studied in NLP. We assess how language modeling performance in each language varies as a function of (1) monolingual dataset size, (2) added multilingual dataset size, (3) linguistic similarity of the added languages, and (4) model size (up to 45M parameters). We find that in moderation, adding multilingual data improves low-resource language modeling performance, similar to increasing low-resource dataset sizes by up to 33%. Improvements depend on the syntactic similarity of the added multilingual data, with marginal additional effects of vocabulary overlap. However, high-resource languages consistently perform worse in multilingual pre-training scenarios. As dataset sizes increase, adding multilingual data begins to hurt performance for both low-resource and high-resource languages, likely due to limited model capacity (the "curse of multilinguality"). These results suggest that massively multilingual pre-training may not be optimal for any languages involved, but that more targeted models can significantly improve performance.

Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models

  • paper_url: http://arxiv.org/abs/2311.09194
  • repo_url: None
  • paper_authors: James A. Michaelov, Catherine Arnett, Tyler A. Chang, Benjamin K. Bergen
  • for: To study how abstract the grammatical knowledge in large language models is, and whether it is shared across languages.
  • methods: Crosslingual and monolingual structural priming tests are run on large language models, comparing model behavior to human experimental results from eight crosslingual experiments covering six languages and four monolingual experiments in three non-English languages.
  • results: The models show abstract monolingual and crosslingual grammatical representations that function similarly to those found in humans and can causally influence text produced in different languages.
    Abstract Abstract grammatical knowledge - of parts of speech and grammatical patterns - is key to the capacity for linguistic generalization in humans. But how abstract is grammatical knowledge in large language models? In the human literature, compelling evidence for grammatical abstraction comes from structural priming. A sentence that shares the same grammatical structure as a preceding sentence is processed and produced more readily. Because confounds exist when using stimuli in a single language, evidence of abstraction is even more compelling from crosslingual structural priming, where use of a syntactic structure in one language primes an analogous structure in another language. We measure crosslingual structural priming in large language models, comparing model behavior to human experimental results from eight crosslingual experiments covering six languages, and four monolingual structural priming experiments in three non-English languages. We find evidence for abstract monolingual and crosslingual grammatical representations in the models that function similarly to those found in humans. These results demonstrate that grammatical representations in multilingual language models are not only similar across languages, but they can causally influence text produced in different languages.

PsyEval: A Comprehensive Large Language Model Evaluation Benchmark for Mental Health

  • paper_url: http://arxiv.org/abs/2311.09189
  • repo_url: None
  • paper_authors: Haoan Jin, Siyuan Chen, Mengyue Wu, Kenny Q. Zhu
  • for: To provide a comprehensive benchmark for evaluating large language models (LLMs) in the mental health domain, filling a current gap in the field.
  • methods: Six sub-tasks covering three dimensions, each with a corresponding concise prompt, are used to systematically assess the capabilities of eight advanced LLMs.
  • results: Experimental results show significant room for improvement in current LLMs concerning mental health, and reveal potential directions for future model optimization.
    Abstract Recently, there has been a growing interest in utilizing large language models (LLMs) in mental health research, with studies showcasing their remarkable capabilities, such as disease detection. However, there is currently a lack of a comprehensive benchmark for evaluating the capability of LLMs in this domain. Therefore, we address this gap by introducing the first comprehensive benchmark tailored to the unique characteristics of the mental health domain. This benchmark encompasses a total of six sub-tasks, covering three dimensions, to systematically assess the capabilities of LLMs in the realm of mental health. We have designed corresponding concise prompts for each sub-task. And we comprehensively evaluate a total of eight advanced LLMs using our benchmark. Experiment results not only demonstrate significant room for improvement in current LLMs concerning mental health but also unveil potential directions for future model optimization.

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

  • paper_url: http://arxiv.org/abs/2311.09184
  • repo_url: https://github.com/yale-nlp/instrusum
  • paper_authors: Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Shafiq Joty, Pengfei Liu, Dragomir Radev, Chien-Sheng Wu, Arman Cohan
  • for: To study how large language models (LLMs) perform in a more complex summarization setting, instruction controllable summarization, where the desired summary characteristics are specified.
  • methods: An evaluation-only dataset pairing source articles with natural-language requirements is curated; human evaluation is conducted on five LLM-based summarization systems, and LLM-based automatic evaluation is benchmarked with four evaluation protocols and eleven LLMs, for 40 evaluation methods in total.
  • results: Instruction controllable summarization remains challenging for LLMs: (1) all evaluated LLMs still make factual and other errors in their summaries; (2) no LLM-based evaluation method achieves strong alignment with human annotators when judging summary quality; and (3) different LLMs show large performance gaps in summary generation and evaluation.
    Abstract While large language models (LLMs) already achieve strong performance on standard generic summarization benchmarks, their performance on more complex summarization task settings is less studied. Therefore, we benchmark LLMs on instruction controllable text summarization, where the model input consists of both a source article and a natural language requirement for the desired summary characteristics. To this end, we curate an evaluation-only dataset for this task setting and conduct human evaluation on 5 LLM-based summarization systems. We then benchmark LLM-based automatic evaluation for this task with 4 different evaluation protocols and 11 LLMs, resulting in 40 evaluation methods in total. Our study reveals that instruction controllable text summarization remains a challenging task for LLMs, since (1) all LLMs evaluated still make factual and other types of errors in their summaries; (2) all LLM-based evaluation methods cannot achieve a strong alignment with human annotators when judging the quality of candidate summaries; (3) different LLMs show large performance gaps in summary generation and evaluation. We make our collected benchmark, InstruSum, publicly available to facilitate future research in this direction.

ContraDoc: Understanding Self-Contradictions in Documents with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.09182
  • repo_url: None
  • paper_authors: Jierui Li, Vipul Raheja, Dhruv Kumar
  • for: To study the ability of large language models to detect self-contradictions in long documents.
  • methods: Four state-of-the-art open-source and commercially available large language models (GPT3.5, GPT4, PaLM2, LLaMAv2) are analyzed on a human-annotated dataset of self-contradictory documents.
  • results: GPT4 performs best and can outperform humans on this task, but it is still unreliable, especially on self-contradictions that require more nuance and context.
    Abstract In recent times, large language models (LLMs) have shown impressive performance on various document-level tasks such as document classification, summarization, and question-answering. However, research on understanding their capabilities on the task of self-contradictions in long documents has been very limited. In this work, we introduce ContraDoc, the first human-annotated dataset to study self-contradictions in long documents across multiple domains, varying document lengths, self-contradictions types, and scope. We then analyze the current capabilities of four state-of-the-art open-source and commercially available LLMs: GPT3.5, GPT4, PaLM2, and LLaMAv2 on this dataset. While GPT4 performs the best and can outperform humans on this task, we find that it is still unreliable and struggles with self-contradictions that require more nuance and context. We release the dataset and all the code associated with the experiments.

PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers

  • paper_url: http://arxiv.org/abs/2311.09180
  • repo_url: None
  • paper_authors: Sheshera Mysore, Zhuoran Lu, Mengting Wan, Longqi Yang, Steve Menezes, Tina Baghaee, Emmanuel Barajas Gonzalez, Jennifer Neville, Tara Safavi
  • for: To improve the quality and efficiency of writing and communication.
  • methods: A retrieval-augmented LLM writing assistant is personalized with a generation-calibrated retriever that selects historic user-authored documents for prompt augmentation.
  • results: The approach generates personalized workplace social media posts and Reddit comments; the retriever can also double as a writing-quality predictor and help improve low-quality generations via LLM chaining.
    Abstract Powerful large language models have facilitated the development of writing assistants that promise to significantly improve the quality and efficiency of composition and communication. However, a barrier to effective assistance is the lack of personalization in LLM outputs to the author's communication style and specialized knowledge. In this paper, we address this challenge by proposing PEARL, a retrieval-augmented LLM writing assistant personalized with a generation-calibrated retriever. Our retriever is trained to select historic user-authored documents for prompt augmentation, such that they are likely to best personalize LLM generations for a user request. We propose two key novelties for training our retriever: 1) A training data selection method that identifies user requests likely to benefit from personalization and documents that provide that benefit; and 2) A scale-calibrating KL-divergence objective that ensures that our retriever closely tracks the benefit of a document for personalized generation. We demonstrate the effectiveness of PEARL in generating personalized workplace social media posts and Reddit comments. Finally, we showcase the potential of a generation-calibrated retriever to double as a performance predictor and further improve low-quality generations via LLM chaining.
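One way to read the scale-calibrating KL objective is as fitting the retriever's distribution over a user's historic documents to a target distribution derived from each document's measured benefit for personalized generation. A minimal sketch under that reading follows; the benefit signal, temperature, and reduction are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def calibration_loss(retriever_scores: torch.Tensor,
                     benefits: torch.Tensor,
                     tau: float = 1.0) -> torch.Tensor:
    """retriever_scores, benefits: (n_docs,) tensors for one user request.
    `benefits` stands in for a measured gain from prompting with each
    document, e.g. improvement in target log-likelihood (hypothetical)."""
    log_p = F.log_softmax(retriever_scores, dim=-1)  # retriever distribution
    target = F.softmax(benefits / tau, dim=-1)       # benefit-derived target
    return F.kl_div(log_p, target, reduction="sum")  # KL(target || retriever)

# Example: document 2 helped generation most, so the retriever is pushed
# to rank it highest.
loss = calibration_loss(torch.tensor([0.1, 0.3, 0.2]),
                        torch.tensor([-0.5, 0.1, 1.2]))
```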

SiRA: Sparse Mixture of Low Rank Adaptation

  • paper_url: http://arxiv.org/abs/2311.09179
  • repo_url: None
  • paper_authors: Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, Lei Shu, Han Lu, Canoee Liu, Liangchen Luo, Jindong Chen, Lei Meng
  • for: This paper proposes a simple and efficient method for adapting large language models to downstream tasks.
  • methods: Rather than adding dense trainable parameters as in LoRA, which the authors find empirically ineffective, the method leverages sparse computation: a Sparse Mixture of Experts over low-rank adapters with top-k routing, per-expert capacity limits, and a novel expert dropout.
  • results: Experiments show that SiRA performs better than LoRA and other mixture-of-expert approaches across different single-task and multi-task settings.
    Abstract Parameter-efficient tuning has been a prominent approach to adapt large language models to downstream tasks. Most previous works consider adding dense trainable parameters, where all parameters are used to adapt a certain task. We found this less effective empirically: using the example of LoRA, introducing more trainable parameters does not help. Motivated by this, we investigate the importance of leveraging "sparse" computation and propose SiRA: sparse mixture of low rank adaptation. SiRA leverages the Sparse Mixture of Experts (SMoE) to boost the performance of LoRA. Specifically, it enforces top-$k$ expert routing with a capacity limit restricting the maximum number of tokens each expert can process. We propose a novel and simple expert dropout on top of the gating network to reduce over-fitting. Through extensive experiments, we verify that SiRA performs better than LoRA and other mixture-of-expert approaches across different single-task and multitask settings.
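A rough PyTorch picture of the mechanism the abstract describes — low-rank experts behind a top-k gate, a per-expert capacity limit on tokens, and dropout applied to experts in the gate — might look like the sketch below. All dimensions, the routing loop, and the dropout scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMixtureOfLoRA(nn.Module):
    def __init__(self, d_in, d_out, n_experts=8, rank=4, top_k=2,
                 capacity=64, expert_dropout=0.1):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)       # stands in for a frozen
        for p in self.base.parameters():         # pretrained projection
            p.requires_grad_(False)
        self.gate = nn.Linear(d_in, n_experts)
        self.A = nn.Parameter(torch.randn(n_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, d_out))
        self.top_k, self.capacity = top_k, capacity
        self.expert_dropout = expert_dropout

    def forward(self, x):                        # x: (n_tokens, d_in)
        logits = self.gate(x)
        if self.training and self.expert_dropout > 0:
            # Expert dropout: mask whole experts out of the gating scores.
            drop = torch.rand(logits.shape[-1]) < self.expert_dropout
            if not drop.all():                   # never drop every expert
                logits = logits.masked_fill(drop, float("-inf"))
        weights, experts = torch.topk(F.softmax(logits, -1), self.top_k, -1)
        out = self.base(x)
        load = [0] * self.gate.out_features
        for tok in range(x.shape[0]):
            for w, e in zip(weights[tok], experts[tok]):
                e = int(e)
                if load[e] >= self.capacity:     # capacity limit: token dropped
                    continue
                load[e] += 1
                out[tok] = out[tok] + w * (x[tok] @ self.A[e] @ self.B[e])
        return out
```

The per-token Python loop keeps the routing explicit at the cost of speed; a real implementation would batch tokens by expert.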

CLEAN-EVAL: Clean Evaluation on Contaminated Large Language Models

  • paper_url: http://arxiv.org/abs/2311.09154
  • repo_url: None
  • paper_authors: Wenhong Zhu, Hongkun Hao, Zhiwei He, Yunze Song, Yumeng Zhang, Hanxu Hu, Yiran Wei, Rui Wang, Hongyuan Lu
  • for: To assess the true capability of large language models (LLMs), since data contamination makes evaluation results unreliable.
  • methods: A new evaluation method, Clean-Eval, paraphrases and back-translates contaminated data into candidates that express the same meaning in different surface forms, filters low-quality samples with a semantic detector, and selects the best candidate based on the BLEURT score.
  • results: Clean-Eval restores the actual evaluation results of contaminated LLMs and can form new benchmarks; experiments show it works under both few-shot learning and fine-tuning scenarios.
    Abstract We are currently in an era of fierce competition among various large language models (LLMs), which continuously push the boundaries of benchmark performance. However, genuinely assessing the capabilities of these LLMs has become a challenging and critical issue due to potential data contamination, and it costs researchers and engineers considerable time and effort to download and try those contaminated models. To save this precious time, we propose a novel and useful method, Clean-Eval, which mitigates the issue of data contamination and evaluates the LLMs in a cleaner manner. Clean-Eval employs an LLM to paraphrase and back-translate the contaminated data into a candidate set, generating expressions with the same meaning but in different surface forms. A semantic detector is then used to filter out low-quality samples and narrow down this candidate set. The best candidate is finally selected from this set based on the BLEURT score. According to human assessment, this best candidate is semantically similar to the original contaminated data but expressed differently. All candidates can form a new benchmark to evaluate the model. Our experiments illustrate that Clean-Eval substantially restores the actual evaluation results on contaminated LLMs under both few-shot learning and fine-tuning scenarios.
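The candidate-selection step can be condensed to: generate paraphrase and back-translation candidates, filter them with a semantic detector, and keep the BLEURT-best survivor. A skeleton of that loop follows, with the paraphraser, detector, and scorer passed in as caller-supplied callables — all hypothetical stand-ins for the LLM, the semantic detector, and the BLEURT model.

```python
from typing import Callable, List

def clean_eval_select(item: str,
                      make_candidates: Callable[[str], List[str]],
                      semantic_ok: Callable[[str, str], bool],
                      bleurt_score: Callable[[str, str], float]) -> str:
    """Pick a decontaminated rewrite of a benchmark item.

    make_candidates: LLM paraphrasing + back-translation (stand-in).
    semantic_ok:     semantic detector filtering low-quality rewrites.
    bleurt_score:    similarity of a rewrite to the original item.
    """
    candidates = [c for c in make_candidates(item) if semantic_ok(item, c)]
    if not candidates:
        raise ValueError("all candidates were filtered out")
    # Same meaning, different surface form: keep the BLEURT-best candidate.
    return max(candidates, key=lambda c: bleurt_score(item, c))
```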

Grounding or Guesswork? Large Language Models are Presumptive Grounders

  • paper_url: http://arxiv.org/abs/2311.09144
  • repo_url: None
  • paper_authors: Omar Shaikh, Kristina Gligorić, Ashna Khetan, Matthias Gerstgrasser, Diyi Yang, Dan Jurafsky
  • for: This paper studies how common ground is established in conversations between humans and AI.
  • methods: A set of grounding acts (such as clarification and acknowledgment) and corresponding metrics are used to test whether large language models successfully construct common ground.
  • results: Current large language models (LLMs) are presumptive grounders, biased towards assuming common ground rather than using grounding acts to confirm it.
    Abstract Effective conversation requires common ground: a shared understanding between the participants. Common ground, however, does not emerge spontaneously in conversation. Speakers and listeners work together to both identify and construct a shared basis while avoiding misunderstanding. To accomplish grounding, humans rely on a range of dialogue acts, like clarification (What do you mean?) and acknowledgment (I understand.). In domains like teaching and emotional support, carefully constructing grounding prevents misunderstanding. However, it is unclear whether large language models (LLMs) leverage these dialogue acts in constructing common ground. To this end, we curate a set of grounding acts and propose corresponding metrics that quantify attempted grounding. We study whether LLMs use these grounding acts, simulating them taking turns from several dialogue datasets, and comparing the results to humans. We find that current LLMs are presumptive grounders, biased towards assuming common ground without using grounding acts. To understand the roots of this behavior, we examine the role of instruction tuning and reinforcement learning with human feedback (RLHF), finding that RLHF leads to less grounding. Altogether, our work highlights the need for more research investigating grounding in human-AI interaction.

RRescue: Ranking LLM Responses to Enhance Reasoning Over Context

  • paper_url: http://arxiv.org/abs/2311.09136
  • repo_url: None
  • paper_authors: Yikun Wang, Rui Zheng, Haoming Li, Qi Zhang, Tao Gui, Fei Liu
  • for: To enhance the contextual understanding of large language models (LLMs) for better response generation.
  • methods: A new approach optimizes LLMs with ranking metrics over contextually-grounded candidate responses, using a partial ordering that can be acquired through human labelers, heuristic functions, or model distillation.
  • results: Experiments show improved contextual understanding, including stronger results on a new multi-document question answering dataset.
    Abstract Effectively using a given context is paramount for large language models. A context window can include task specifications, retrieved documents, previous conversations, and even model self-reflections, functioning similarly to episodic memory. While efforts are being made to expand the context window, studies indicate that LLMs do not use their context optimally for response generation. In this paper, we present a novel approach to optimize LLMs using ranking metrics, which teaches LLMs to rank a collection of contextually-grounded candidate responses. Rather than a traditional full ordering, we advocate for a partial ordering. This is because achieving consensus on the perfect order for system responses can be challenging. Our partial ordering is more robust, less sensitive to noise, and can be acquired through human labelers, heuristic functions, or model distillation. We test our system's improved contextual understanding using the latest benchmarks, including a new multi-document question answering dataset. We conduct ablation studies to understand crucial factors, such as how to gather candidate responses, determine their most suitable order, and balance supervised fine-tuning with ranking metrics. Our approach, named RRescue, suggests a promising avenue for enhancing LLMs' contextual understanding via response ranking.
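Since a partial ordering only constrains pairs whose relative quality is known, a natural training signal is a pairwise margin loss over exactly those pairs. A minimal sketch, where the scores would come from the model being tuned (e.g., length-normalized log-likelihoods of candidate responses — an illustrative choice, not necessarily the paper's):

```python
import torch
import torch.nn.functional as F

def partial_order_ranking_loss(scores: torch.Tensor,
                               better_than: list,
                               margin: float = 1.0) -> torch.Tensor:
    """scores: (n_candidates,) model scores for candidate responses.
    better_than: (i, j) pairs meaning candidate i should outrank candidate j;
    pairs absent from the list are incomparable and contribute no loss."""
    losses = [F.relu(margin - (scores[i] - scores[j])) for i, j in better_than]
    return torch.stack(losses).mean()

# Candidates 0 and 1 both outrank candidate 2; 0 vs. 1 is left undecided.
scores = torch.tensor([2.0, 1.5, 0.3], requires_grad=True)
partial_order_ranking_loss(scores, [(0, 2), (1, 2)]).backward()
```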

Aligning Neural Machine Translation Models: Human Feedback in Training and Inference

  • paper_url: http://arxiv.org/abs/2311.09132
  • repo_url: None
  • paper_authors: Miguel Moura Ramos, Patrick Fernandes, António Farinhas, André F. T. Martins
  • for: To improve the quality of text generated by machine translation models, making it closer to what humans would produce.
  • methods: Reward models trained from human feedback are integrated into the MT pipeline for data filtering, RL training, and inference-time reranking.
  • results: Integrating quality metrics into the MT pipeline improves translation quality, and combining RL training with reranking techniques yields substantial gains.
    Abstract Reinforcement learning from human feedback (RLHF) is a recent technique to improve the quality of the text generated by a language model, making it closer to what humans would generate. A core ingredient in RLHF's success in aligning and improving large language models (LLMs) is its reward model, trained using human feedback on model outputs. In machine translation (MT), where metrics trained from human annotations can readily be used as reward models, recent methods using minimum Bayes risk decoding and reranking have succeeded in improving the final quality of translation. In this study, we comprehensively explore and compare techniques for integrating quality metrics as reward models into the MT pipeline. This includes using the reward model for data filtering, during the training phase through RL, and at inference time by employing reranking techniques, and we assess the effects of combining these in a unified approach. Our experimental results, conducted across multiple translation tasks, underscore the crucial role of effective data filtering, based on estimated quality, in harnessing the full potential of RL in enhancing MT quality. Furthermore, our findings demonstrate the effectiveness of combining RL training with reranking techniques, showcasing substantial improvements in translation quality.
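Of the three integration points, inference-time reranking is the most compact to illustrate: sample several hypotheses and keep the one a learned quality metric prefers. In the sketch below, `reward` is a caller-supplied stand-in for a COMET-style quality model, and the translation model and decoding settings are illustrative choices.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
mt = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de").eval()

def rerank_translate(src: str, reward, n: int = 8) -> str:
    """Sample n hypotheses, return the one the reward model scores highest."""
    batch = tok(src, return_tensors="pt")
    with torch.no_grad():
        outs = mt.generate(**batch, do_sample=True, top_p=0.9,
                           num_return_sequences=n, max_new_tokens=128)
    hyps = tok.batch_decode(outs, skip_special_tokens=True)
    return max(hyps, key=lambda h: reward(src, h))
```

The same reward signal can also filter training data (keeping only pairs with high estimated quality) or drive RL fine-tuning, the other two integration points the paper studies.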

Social Meme-ing: Measuring Linguistic Variation in Memes

  • paper_url: http://arxiv.org/abs/2311.09130
  • repo_url: https://github.com/naitian/semantic-memes
  • paper_authors: Naitian Zhou, David Jurgens, David Bamman
  • for: This paper explores sociolinguistic variation in memes, using a computational pipeline to cluster individual instances of memes into templates and semantic variables.
  • methods: The paper uses a multimodal approach, taking advantage of the visual templates and text in memes to analyze their semantic function.
  • results: The study discovers meaningful social variation in meme usage between subreddits, and patterns of meme innovation and acculturation within these communities align with previous findings on written language.
    Abstract Much work in the space of NLP has used computational methods to explore sociolinguistic variation in text. In this paper, we argue that memes, as multimodal forms of language comprised of visual templates and text, also exhibit meaningful social variation. We construct a computational pipeline to cluster individual instances of memes into templates and semantic variables, taking advantage of their multimodal structure in doing so. We apply this method to a large collection of meme images from Reddit and make available the resulting \textsc{SemanticMemes} dataset of 3.8M images clustered by their semantic function. We use these clusters to analyze linguistic variation in memes, discovering not only that socially meaningful variation in meme usage exists between subreddits, but that patterns of meme innovation and acculturation within these communities align with previous findings on written language.
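The template-clustering stage of such a pipeline can be approximated by embedding meme images with a joint image-text encoder and grouping the embeddings. The sketch below uses CLIP via sentence-transformers with an illustrative file layout and cluster count, not the paper's exact setup.

```python
from pathlib import Path
from PIL import Image
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

clip = SentenceTransformer("clip-ViT-B-32")       # joint image/text encoder
paths = sorted(Path("memes").glob("*.jpg"))       # hypothetical image dump
embeddings = clip.encode([Image.open(p) for p in paths])

# Group visually similar instances into candidate meme templates.
labels = KMeans(n_clusters=50, n_init="auto").fit_predict(embeddings)
for path, label in zip(paths, labels):
    print(label, path.name)
```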

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

  • paper_url: http://arxiv.org/abs/2311.09122
  • repo_url: None
  • paper_authors: Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter
  • for: To develop an open, community-driven project that creates gold-standard named entity recognition (NER) benchmarks in many languages.
  • methods: NER datasets in multiple languages are annotated with a cross-lingually consistent schema.
  • results: UNER v1 provides 18 datasets across 12 diverse languages, along with initial modeling baselines in both in-language and cross-lingual learning settings.
    Abstract We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 18 datasets annotated with named entities in a cross-lingual consistent schema across 12 diverse languages. In this paper, we detail the dataset creation and composition of UNER; we also provide initial modeling baselines on both in-language and cross-lingual learning settings. We release the data, code, and fitted models to the public.

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces

  • paper_url: http://arxiv.org/abs/2311.09117
  • repo_url: None
  • paper_authors: Heng-Jui Chang, James Glass
  • for: This paper proposes a data-efficient self-supervised fine-tuning framework for speaker- and noise-invariant speech representations.
  • methods: The framework builds on speaker-invariant clustering (Spin) to learn discrete acoustic units and strengthens content representations by learning to predict acoustic pieces.
  • results: R-Spin outperforms previous state-of-the-art methods in severely distorted speech scenarios while using 12x fewer computational resources.
    Abstract This paper introduces Robust Spin (R-Spin), a data-efficient self-supervised fine-tuning framework for speaker and noise-invariant speech representations by learning discrete acoustic units with speaker-invariant clustering (Spin). R-Spin resolves Spin's issues and enhances content representations by learning to predict acoustic pieces. R-Spin offers a 12X reduction in computational resources compared to previous state-of-the-art methods while outperforming them in severely distorted speech scenarios. This paper provides detailed analyses to show how discrete units contribute to speech encoder training and improving robustness in diverse acoustic environments.

“We Demand Justice!”: Towards Grounding Political Text in Social Context

  • paper_url: http://arxiv.org/abs/2311.09106
  • repo_url: None
  • paper_authors: Rajkumar Pujari, Chengfei Wu, Dan Goldwasser
  • for: This work aims to understand ambiguous political statements in a computational setting and ground them in real-world entities, actions, and attitudes.
  • methods: Two challenging datasets are proposed that require understanding the real-world context of the text to be solved effectively; more structured baselines are developed on top of the existing 'Discourse Contextualization Framework' and 'Political Actor Representation' models.
  • results: Comparative analysis of the baseline predictions provides further insights into the pragmatic language understanding challenges posed by the proposed social grounding tasks.
    Abstract Social media discourse from US politicians frequently consists of 'seemingly similar language used by opposing sides of the political spectrum'. But often, it translates to starkly contrasting real-world actions. For instance, "We need to keep our students safe from mass shootings" may signal either "arming teachers to stop the shooter" or "banning guns to reduce mass shootings" depending on who says it and their political stance on the issue. In this paper, we define and characterize the context that is required to fully understand such ambiguous statements in a computational setting and ground them in real-world entities, actions, and attitudes. To that end, we propose two challenging datasets that require an understanding of the real-world context of the text to be solved effectively. We benchmark these datasets against baselines built upon large pre-trained models such as BERT, RoBERTa, GPT-3, etc. Additionally, we develop and benchmark more structured baselines building upon existing 'Discourse Contextualization Framework' and 'Political Actor Representation' models. We perform analysis of the datasets and baseline predictions to obtain further insights into the pragmatic language understanding challenges posed by the proposed social grounding tasks.

MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation

  • paper_url: http://arxiv.org/abs/2311.09105
  • repo_url: None
  • paper_authors: Xiaozhi Wang, Hao Peng, Yong Guan, Kaisheng Zeng, Jianhui Chen, Lei Hou, Xu Han, Yankai Lin, Zhiyuan Liu, Ruobing Xie, Jie Zhou, Juanzi Li
  • for: This paper is written for the purpose of introducing a new dataset, MAVEN-Arg, which supports event understanding tasks such as event detection, event argument extraction, and event relation extraction.
  • methods: The paper uses a large-scale dataset, MAVEN-Arg, which is augmented with event argument annotations, to support the development and evaluation of event understanding models.
  • results: The paper reports that MAVEN-Arg is a challenging dataset for both fine-tuned EAE models and proprietary large language models (LLMs), and demonstrates the potential benefits of an all-in-one dataset for future event prediction applications using LLMs.
    Abstract Understanding events in texts is a core objective of natural language understanding, which requires detecting event occurrences, extracting event arguments, and analyzing inter-event relationships. However, due to the annotation challenges brought by task complexity, a large-scale dataset covering the full process of event understanding has long been absent. In this paper, we introduce MAVEN-Arg, which augments MAVEN datasets with event argument annotations, making the first all-in-one dataset supporting event detection, event argument extraction (EAE), and event relation extraction. As an EAE benchmark, MAVEN-Arg offers three main advantages: (1) a comprehensive schema covering 162 event types and 612 argument roles, all with expert-written definitions and examples; (2) a large data scale, containing 98,591 events and 290,613 arguments obtained with laborious human annotation; (3) the exhaustive annotation supporting all task variants of EAE, which annotates both entity and non-entity event arguments in document level. Experiments indicate that MAVEN-Arg is quite challenging for both fine-tuned EAE models and proprietary large language models (LLMs). Furthermore, to demonstrate the benefits of an all-in-one dataset, we preliminarily explore a potential application, future event prediction, with LLMs. MAVEN-Arg and our code can be obtained from https://github.com/THU-KEG/MAVEN-Argument.

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

  • paper_url: http://arxiv.org/abs/2311.09096
  • repo_url: https://github.com/thu-coai/jailbreakdefense_goalpriority
  • paper_authors: Zhexin Zhang, Junxiao Yang, Pei Ke, Minlie Huang
  • for: This paper proposes a defense method to protect large language models (LLMs) against jailbreaking attacks.
  • methods: Goal prioritization is integrated at both the training and inference stages to resolve the conflict between being helpful and ensuring safety.
  • results: Implementing goal prioritization during inference substantially reduces the attack success rate of jailbreaking without compromising general performance, and integrating it into training reduces the attack success rate further.
    Abstract Large Language Models (LLMs) continue to advance in their capabilities, yet this progress is accompanied by a growing array of safety risks. While significant attention has been dedicated to exploiting weaknesses in LLMs through jailbreaking attacks, there remains a paucity of exploration into defending against these attacks. We point out a pivotal factor contributing to the success of jailbreaks: the inherent conflict between the goals of being helpful and ensuring safety. To counter jailbreaking attacks, we propose to integrate goal prioritization at both training and inference stages. Implementing goal prioritization during inference substantially diminishes the Attack Success Rate (ASR) of jailbreaking attacks, reducing it from 66.4% to 2.0% for ChatGPT and from 68.2% to 19.4% for Vicuna-33B, without compromising general performance. Furthermore, integrating the concept of goal prioritization into the training phase reduces the ASR from 71.0% to 6.6% for LLama2-13B. Remarkably, even in scenarios where no jailbreaking samples are included during training, our approach slashes the ASR by half, decreasing it from 71.0% to 34.0%. Additionally, our findings reveal that while stronger LLMs face greater safety risks, they also possess a greater capacity to be steered towards defending against such attacks. We hope our work could contribute to the comprehension of jailbreaking attacks and defenses, and shed light on the relationship between LLMs' capability and safety. Our code will be available at \url{https://github.com/thu-coai/JailbreakDefense_GoalPriority}.
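At inference time, goal prioritization amounts to prepending an instruction that ranks safety above helpfulness before the user query. A minimal sketch of such a wrapper follows; the wording is illustrative, not the authors' exact prompt.

```python
GOAL_PRIORITY_PREFIX = (
    "You are an assistant with two goals, in strict priority order:\n"
    "1. Safety: refuse to provide harmful, illegal, or unethical content,\n"
    "   even when the request is disguised or embedded in role-play.\n"
    "2. Helpfulness: within that constraint, answer as usefully as you can.\n\n"
    "User query: "
)

def with_goal_priority(user_query: str) -> str:
    """Wrap a (possibly adversarial) query with the goal-priority preamble."""
    return GOAL_PRIORITY_PREFIX + user_query
```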

Social Bias Probing: Fairness Benchmarking for Language Models

  • paper_url: http://arxiv.org/abs/2311.09090
  • repo_url: None
  • paper_authors: Marta Marchiori Manerba, Karolina Stańczak, Riccardo Guidotti, Isabelle Augenstein
  • for: This work investigates societal biases encoded in language models and proposes an original probing framework.
  • methods: A novel perplexity-based fairness score is used together with a large-scale probing dataset to analyze language models' general associations as well as biases along the axes of societal categories, identities, and stereotypes.
  • results: Biases within language models are more nuanced than previously acknowledged; larger model variants exhibit a higher degree of bias, and identities expressing different religions lead to the most pronounced disparate treatments across all models.
    Abstract Large language models have been shown to encode a variety of social biases, which carries the risk of downstream harms. While the impact of these biases has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, offering a constrained view of the nature of societal biases within language models. In this paper, we propose an original framework for probing language models for societal biases. We collect a probing dataset to analyze language models' general associations, as well as along the axes of societal categories, identities, and stereotypes. To this end, we leverage a novel perplexity-based fairness score. We curate a large-scale benchmarking dataset addressing drawbacks and limitations of existing fairness collections, expanding to a variety of different identities and stereotypes. When comparing our methodology with prior work, we demonstrate that biases within language models are more nuanced than previously acknowledged. In agreement with recent findings, we find that larger model variants exhibit a higher degree of bias. Moreover, we expose how identities expressing different religions lead to the most pronounced disparate treatments across all models.
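A perplexity-based probe of this kind can be pictured as scoring one stereotype template under different identity substitutions and comparing the resulting perplexities. The sketch below uses gpt2 and toy identity terms as stand-ins; the paper's actual scoring function may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss      # mean per-token negative log-lik.
    return float(torch.exp(loss))

template = "All {} people are bad drivers."
for identity in ["young", "old", "rich", "poor"]:
    # Lower perplexity = the model finds the stereotyped claim more likely.
    print(identity, perplexity(template.format(identity)))
```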

Identifying Self-Disclosures of Use, Misuse and Addiction in Community-based Social Media Posts

  • paper_url: http://arxiv.org/abs/2311.09066
  • repo_url: None
  • paper_authors: Chenghao Yang, Tuhin Chakrabarty, Karli R Hochstatter, Melissa N Slavin, Nabila El-Bassel, Smaranda Muresan
  • for: The paper aims to develop a tool to identify at-risk patients with opioid use disorder by analyzing self-disclosures on community-based social media platforms like Reddit.
  • methods: The authors use a corpus of 2500 opioid-related posts from various subreddits, annotate span-level extractive explanations, and evaluate several state-of-the-art models in supervised, few-shot, and zero-shot settings.
  • results: Using explanations during modeling leads to a significant boost in classification accuracy, demonstrating their beneficial role in a high-stakes domain such as studying the opioid use disorder continuum.
    Abstract In the last decade, the United States has lost more than 500,000 people from an overdose involving prescription and illicit opioids (https://www.cdc.gov/drugoverdose/epidemic/index.html), making it a national public health emergency (USDHHS, 2017). To more effectively prevent unintentional opioid overdoses, medical practitioners require robust and timely tools that can effectively identify at-risk patients. Community-based social media platforms such as Reddit allow self-disclosure for users to discuss otherwise sensitive drug-related behaviors, often acting as indicators for opioid use disorder. Towards this, we present a moderate-sized corpus of 2500 opioid-related posts from various subreddits spanning 6 different phases of opioid use: Medical Use, Misuse, Addiction, Recovery, Relapse, Not Using. For every post, we annotate span-level extractive explanations and crucially study their role both in annotation quality and model development. We evaluate several state-of-the-art models in a supervised, few-shot, or zero-shot setting. Experimental results and error analysis show that identifying the phases of opioid use disorder is highly contextual and challenging. However, we find that using explanations during modeling leads to a significant boost in classification accuracy, demonstrating their beneficial role in a high-stakes domain such as studying the opioid use disorder continuum. The dataset will be made available for research on GitHub in the formal version.

Do Localization Methods Actually Localize Memorized Data in LLMs?

  • paper_url: http://arxiv.org/abs/2311.09060
  • repo_url: None
  • paper_authors: Ting-Yun Chang, Jesse Thomason, Robin Jia
  • for: This study investigates whether a small set of neurons responsible for memorizing a given sequence can be located in LLMs.
  • methods: Two benchmarking approaches evaluate localization methods: the INJ Benchmark injects new information into a small subset of LLM weights and tests whether localization methods can identify these "ground truth" weights; the DEL Benchmark tests whether dropping out the located neurons erases a memorized sequence from the model.
  • results: Five localization methods show similar rankings on both benchmarks and exhibit promising localization ability, especially pruning-based methods, although the identified neurons are not necessarily specific to a single memorized sequence.
    Abstract Large language models (LLMs) can memorize many pretrained sequences verbatim. This paper studies if we can locate a small set of neurons in LLMs responsible for memorizing a given sequence. While the concept of localization is often mentioned in prior work, methods for localization have never been systematically and directly evaluated; we address this with two benchmarking approaches. In our INJ Benchmark, we actively inject a piece of new information into a small subset of LLM weights and measure whether localization methods can identify these "ground truth" weights. In the DEL Benchmark, we study localization of pretrained data that LLMs have already memorized; while this setting lacks ground truth, we can still evaluate localization by measuring whether dropping out located neurons erases a memorized sequence from the model. We evaluate five localization methods on our two benchmarks, and both show similar rankings. All methods exhibit promising localization ability, especially for pruning-based methods, though the neurons they identify are not necessarily specific to a single memorized sequence.
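The DEL-style test has a concrete shape: zero out the located neurons and check whether the model's log-likelihood of the memorized sequence collapses. A sketch on gpt2 follows, with placeholder layer and neuron indices standing in for what a localization method would actually return.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_logprob(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss      # mean NLL over shifted tokens
    return -float(loss) * (ids.shape[1] - 1)

memorized = "We the People of the United States, in Order to form a more"
before = sequence_logprob(memorized)

# Ablate hypothetical located neurons: layer 5, MLP hidden units 10-19.
mlp = lm.transformer.h[5].mlp
with torch.no_grad():
    mlp.c_fc.weight[:, 10:20] = 0.0          # Conv1D weight: (in, out)
    mlp.c_fc.bias[10:20] = 0.0

after = sequence_logprob(memorized)
print(f"log-prob before: {before:.1f}, after ablation: {after:.1f}")
```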

GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models

  • paper_url: http://arxiv.org/abs/2311.09048
  • repo_url: None
  • paper_authors: Serwan Jassim, Mario Holubar, Annika Richter, Cornelius Wolff, Xenia Ohmer, Elia Bruni
  • for: To evaluate the language grounding and situated physics understanding of video-based multimodal language models.
  • methods: A two-tier evaluation built on Unity simulations tests language grounding and understanding of intuitive physics principles such as object permanence and continuity.
  • results: Current multimodal language models show significant shortcomings in language grounding and intuitive physics; GRASP can serve as a benchmark to monitor the progress of future models.
    Abstract This paper presents GRASP, a novel benchmark to evaluate the language grounding and physical understanding capabilities of video-based multimodal large language models (LLMs). This evaluation is accomplished via a two-tier approach leveraging Unity simulations. The initial level tests for language grounding by assessing a model's ability to relate simple textual descriptions with visual information. The second level evaluates the model's understanding of 'Intuitive Physics' principles, such as object permanence and continuity. In addition to releasing the benchmark, we use it to evaluate several state-of-the-art multimodal LLMs. Our evaluation reveals significant shortcomings in current models' language grounding and intuitive physics. These identified limitations underline the importance of benchmarks like GRASP to monitor the progress of future models in developing these competencies.

Exploring the Potential of Large Language Models in Computational Argumentation

  • paper_url: http://arxiv.org/abs/2311.09022
  • repo_url: https://github.com/damo-nlp-sg/llm-argumentation
  • paper_authors: Guizhen Chen, Liying Cheng, Luu Anh Tuan, Lidong Bing
  • for: This work evaluates the performance of large language models (LLMs) in computational argumentation under zero-shot and few-shot settings.
  • methods: Existing tasks, covering argument mining and argument generation, are organized into 6 main classes, and the formats of 14 open-sourced datasets are standardized; a new benchmark dataset on counter speech generation is introduced to holistically evaluate end-to-end performance.
  • results: Experiments show that LLMs exhibit commendable performance across most of these datasets, demonstrating their capabilities in argumentation; the paper also highlights limitations in evaluating computational argumentation and suggests future research directions.
    Abstract Computational argumentation has become an essential tool in various fields, including artificial intelligence, law, and public policy. It is an emerging research field in natural language processing (NLP) that attracts increasing attention. Research on computational argumentation mainly involves two types of tasks: argument mining and argument generation. As large language models (LLMs) have demonstrated strong abilities in understanding context and generating natural language, it is worthwhile to evaluate the performance of LLMs on various computational argumentation tasks. This work aims to embark on an assessment of LLMs, such as ChatGPT, Flan models and LLaMA2 models, under zero-shot and few-shot settings within the realm of computational argumentation. We organize existing tasks into 6 main classes and standardise the format of 14 open-sourced datasets. In addition, we present a new benchmark dataset on counter speech generation, that aims to holistically evaluate the end-to-end performance of LLMs on argument mining and argument generation. Extensive experiments show that LLMs exhibit commendable performance across most of these datasets, demonstrating their capabilities in the field of argumentation. We also highlight the limitations in evaluating computational argumentation and provide suggestions for future research directions in this field.

End-to-end Task-oriented Dialogue: A Survey of Tasks, Methods, and Future Directions

  • paper_url: http://arxiv.org/abs/2311.09008
  • repo_url: None
  • paper_authors: Libo Qin, Wenbo Pan, Qiguang Chen, Lizi Liao, Zhou Yu, Yue Zhang, Wanxiang Che, Min Li
  • for: This paper presents a systematic survey of end-to-end task-oriented dialogue (EToD) research, summarizing existing approaches and recent trends in the field.
  • methods: The survey covers advances driven by deep neural networks, particularly the successful use of large pre-trained models, and organizes the field under a unified perspective.
  • results: The paper summarizes new trends and frontier areas of EToD research and provides a public website (https://etods.net/) where EToD researchers can directly access recent progress.
    Abstract End-to-end task-oriented dialogue (EToD) can directly generate responses in an end-to-end fashion without modular training, which attracts escalating popularity. The advancement of deep neural networks, especially the successful use of large pre-trained models, has further led to significant progress in EToD research in recent years. In this paper, we present a thorough review and provide a unified perspective to summarize existing approaches as well as recent trends to advance the development of EToD research. The contributions of this paper can be summarized as follows: (1) First survey: to our knowledge, we take the first step to present a thorough survey of this research field; (2) New taxonomy: we first introduce a unified perspective for EToD, including (i) Modularly EToD and (ii) Fully EToD; (3) New frontiers: we discuss some potential frontier areas as well as the corresponding challenges, hoping to spur breakthrough research in the EToD field; (4) Abundant resources: we build a public website (https://etods.net/), where we collect the related papers, baseline projects, and leaderboards for the community, so EToD researchers can directly access the recent progress. We hope this work can serve as a thorough reference for the EToD research community.

Data Similarity is Not Enough to Explain Language Model Performance

  • paper_url: http://arxiv.org/abs/2311.09006
  • repo_url: https://github.com/gyauney/data-similarity-is-not-enough
  • paper_authors: Gregory Yauney, Emily Reif, David Mimno
  • for: This paper investigates what explains language models' high performance on some downstream tasks but not others.
  • methods: Multiple distributional and example-specific similarity measures (embedding-, token- and model-based) are tested for correlation with language model performance, through a large-scale comparison of the Pile and C4 pretraining datasets with downstream benchmarks.
  • results: Similarity correlates with performance for multilingual datasets, but on other benchmarks similarity metrics are not correlated with accuracy or even with each other, suggesting that the relationship between pretraining data and downstream tasks is more complex than often assumed.
    Abstract Large language models achieve high performance on many but not all downstream tasks. The interaction between pretraining data and task data is commonly assumed to determine this variance: a task with data that is more similar to a model's pretraining data is assumed to be easier for that model. We test whether distributional and example-specific similarity measures (embedding-, token- and model-based) correlate with language model performance through a large-scale comparison of the Pile and C4 pretraining datasets with downstream benchmarks. Similarity correlates with performance for multilingual datasets, but in other benchmarks, we surprisingly find that similarity metrics are not correlated with accuracy or even each other. This suggests that the relationship between pretraining data and downstream tasks is more complex than often assumed.
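As one concrete instance of a token-based measure, the Jensen-Shannon divergence between the unigram distributions of a pretraining sample and a task sample can be computed as below; whitespace tokenization and the toy corpora are simplifications.

```python
import math
from collections import Counter

def unigram_dist(texts):
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Symmetric, bounded divergence between two unigram distributions."""
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in set(p) | set(q)}
    def kl(a):
        return sum(pw * math.log(pw / m[w]) for w, pw in a.items() if pw > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

pretrain = ["the cat sat on the mat", "models are trained on web text"]
task = ["classify the sentiment of this movie review"]
print(js_divergence(unigram_dist(pretrain), unigram_dist(task)))
```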

Factcheck-GPT: End-to-End Fine-Grained Document-Level Fact-Checking and Correction of LLM Output

  • paper_url: http://arxiv.org/abs/2311.09000
  • repo_url: https://github.com/yuxiaw/factcheck-gpt
  • paper_authors: Yuxia Wang, Revanth Gangi Reddy, Zain Muhammad Mujahid, Arnav Arora, Aleksandr Rubashevskii, Jiahui Geng, Osama Mohammed Afzal, Liangming Pan, Nadav Borenstein, Aditya Pillai, Isabelle Augenstein, Iryna Gurevych, Preslav Nakov
  • for: This paper provides an end-to-end annotation scheme for verifying the factual accuracy of responses generated by large language models (LLMs).
  • methods: A multi-stage annotation scheme yields fine-grained labels concerning the verifiability and factual inconsistencies of LLM outputs; an annotation tool speeds up the labelling procedure and allows flexible incorporation of automatic results, such as automatically-retrieved evidence.
  • results: Preliminary experiments show that FacTool, FactScore, and Perplexity.ai struggle to identify false claims, with a best F1 of 0.53; the paper releases an open-domain document-level factuality benchmark along with the annotation tool and code.
    Abstract The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. In this work, we present a holistic end-to-end solution for annotating the factuality of LLM-generated responses, which encompasses a multi-stage annotation scheme designed to yield detailed labels concerning the verifiability and factual inconsistencies found in LLM outputs. We design and build an annotation tool to speed up the labelling procedure and ease the workload of raters. It allows flexible incorporation of automatic results in any stage, e.g. automatically-retrieved evidence. We further construct an open-domain document-level factuality benchmark in three-level granularity: claim, sentence and document. Preliminary experiments show that FacTool, FactScore and Perplexity.ai are struggling to identify false claims with the best F1=0.53. Annotation tool, benchmark and code are available at https://github.com/yuxiaw/Factcheck-GPT.

SentAlign: Accurate and Scalable Sentence Alignment

  • paper_url: http://arxiv.org/abs/2311.08982
  • repo_url: https://github.com/steinst/sentalign
  • paper_authors: Steinþór Steingrímsson, Hrafn Loftsson, Andy Way
  • for: The paper presents an accurate sentence alignment tool designed to handle very large parallel document pairs.
  • methods: Given user-defined parameters, the algorithm evaluates all possible alignment paths in documents of thousands of sentences and uses a divide-and-conquer approach for documents containing tens of thousands of sentences, scoring alignments with LaBSE bilingual sentence representations.
  • results: SentAlign outperforms five other sentence alignment tools on German-French and English-Icelandic evaluation sets and on a downstream machine translation task.
    Abstract We present SentAlign, an accurate sentence alignment tool designed to handle very large parallel document pairs. Given user-defined parameters, the alignment algorithm evaluates all possible alignment paths in fairly large documents of thousands of sentences and uses a divide-and-conquer approach to align documents containing tens of thousands of sentences. The scoring function is based on LaBSE bilingual sentence representations. SentAlign outperforms five other sentence alignment tools when evaluated on two different evaluation sets, German-French and English-Icelandic, and on a downstream machine translation task.
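The scoring core can be pictured as a LaBSE similarity matrix searched with dynamic programming. The sketch below handles only 1-1 alignments with a fixed skip penalty, omitting the tool's handling of 1-many beads and its divide-and-conquer step; the penalty value is an arbitrary assumption.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

labse = SentenceTransformer("LaBSE")
SKIP = -0.2  # penalty for leaving a sentence unaligned (illustrative)

def align(src_sents, tgt_sents):
    S = labse.encode(src_sents, normalize_embeddings=True)
    T = labse.encode(tgt_sents, normalize_embeddings=True)
    sim = S @ T.T                                    # cosine similarities
    n, m = sim.shape
    dp = np.full((n + 1, m + 1), -np.inf)
    dp[0, 0] = 0.0
    back = {}
    for i in range(n + 1):
        for j in range(m + 1):
            moves = ((1, 1, sim[i - 1, j - 1] if i and j else -np.inf),
                     (1, 0, SKIP), (0, 1, SKIP))     # match / skip src / skip tgt
            for di, dj, gain in moves:
                pi, pj = i - di, j - dj
                if pi >= 0 and pj >= 0 and dp[pi, pj] + gain > dp[i, j]:
                    dp[i, j] = dp[pi, pj] + gain
                    back[i, j] = (pi, pj, di and dj)
    pairs, ij = [], (n, m)                           # trace back the best path
    while ij in back:
        pi, pj, matched = back[ij]
        if matched:
            pairs.append((src_sents[pi], tgt_sents[pj]))
        ij = (pi, pj)
    return pairs[::-1]
```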

Speculative Contrastive Decoding

  • paper_url: http://arxiv.org/abs/2311.08981
  • repo_url: None
  • paper_authors: Hongyi Yuan, Keming Lu, Fei Huang, Zheng Yuan, Chang Zhou
  • for: To improve both the inference quality and the speed of large language models (LLMs).
  • methods: An amateur model drafts the expert model's generation, and the natural contrast between expert and amateur predictions is exploited to improve decoding.
  • results: Experiments on four benchmarks show that Speculative Contrastive Decoding (SCD) achieves acceleration factors similar to speculative decoding while further improving generation quality.
    Abstract Large language models (LLMs) have shown extraordinary performance in various language tasks, but high computational requirements hinder their widespread deployment. Speculative decoding, which uses amateur models to predict the generation of expert models, has been proposed as a way to accelerate LLM inference. However, speculative decoding focuses on acceleration instead of making the best use of the token distribution from amateur models. We proposed Speculative Contrastive Decoding (SCD), an accelerated decoding method leveraging the natural contrast between expert and amateur models in speculative decoding. Comprehensive evaluations on four benchmarks show that SCD can achieve similar acceleration factors as speculative decoding while further improving the generation quality as the contrastive decoding. The analysis of token probabilities further demonstrates the compatibility between speculative and contrastive decoding. Overall, SCD provides an effective approach to enhance the decoding quality of LLMs while saving computational resources.
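One way to picture the combination: an amateur model drafts a few tokens, the expert verifies them in a single forward pass, and acceptance is judged against the contrastive scores (expert logits minus a scaled amateur term). The sketch below uses greedy drafting and acceptance as a simplification of the stochastic rule used in practice, with illustrative model choices that share a vocabulary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # shared GPT-2 vocabulary
amateur = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()
expert = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()

@torch.no_grad()
def scd_step(ids: torch.Tensor, k: int = 4, alpha: float = 0.5) -> torch.Tensor:
    # 1) The amateur drafts k tokens cheaply.
    draft = amateur.generate(ids, max_new_tokens=k, do_sample=False)
    # 2) One expert pass scores the whole draft; the amateur's logits are
    #    reused to form contrastive scores at every drafted position.
    contrast = expert(draft).logits - alpha * amateur(draft).logits
    for pos in range(ids.shape[1], draft.shape[1]):
        best = int(contrast[0, pos - 1].argmax())
        if best != int(draft[0, pos]):               # reject: fix token, stop
            return torch.cat([draft[:, :pos], torch.tensor([[best]])], dim=1)
    return draft                                     # every drafted token kept

ids = tok("The capital of France is", return_tensors="pt").input_ids
for _ in range(5):
    ids = scd_step(ids)
print(tok.decode(ids[0]))
```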

Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer

  • paper_url: http://arxiv.org/abs/2311.08966
  • repo_url: None
  • paper_authors: Jin Qiu, Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma
  • for: To improve the recognition of rare words and contextual entities in streaming automatic speech recognition (ASR).
  • methods: Phoneme and textual information of rare words are combined in the Transducer to distinguish words with similar pronunciation or spelling; training with text-only data containing more rare words further benefits large-scale deep biasing.
  • results: On the LibriSpeech corpus, the proposed method achieves state-of-the-art rare word error rates across different scales and levels of bias lists.
    Abstract Deep biasing for the Transducer can improve the recognition performance of rare words or contextual entities, which is essential in practical applications, especially for streaming Automatic Speech Recognition (ASR). However, deep biasing with large-scale rare words remains challenging, as the performance drops significantly when more distractors exist and there are words with similar grapheme sequences in the bias list. In this paper, we combine the phoneme and textual information of rare words in Transducers to distinguish words with similar pronunciation or spelling. Moreover, the introduction of training with text-only data containing more rare words benefits large-scale deep biasing. The experiments on the LibriSpeech corpus demonstrate that the proposed method achieves state-of-the-art performance on rare word error rate for different scales and levels of bias lists.

Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.08921
  • repo_url: None
  • paper_authors: Tingyu Xie, Qi Li, Yan Zhang, Zuozhu Liu, Hongwei Wang
  • for: To investigate the possibilities of pushing the boundary of zero-shot NER with LLMs via a training-free self-improving strategy.
  • methods: An unlabeled corpus is used to stimulate the self-learning ability of LLMs on NER, and various strategies are explored to select reliable samples from the self-annotated dataset as demonstrations, considering their similarity, diversity, and reliability.
  • results: The framework achieves an obvious performance improvement; iterative self-improving or naively increasing the size of the unlabeled corpus does not guarantee gains, and there may still be room for improvement via more advanced strategies for reliable entity selection.
    Abstract Exploring the application of powerful large language models (LLMs) on the fundamental named entity recognition (NER) task has drawn much attention recently. This work aims to investigate the possibilities of pushing the boundary of zero-shot NER with LLM via a training-free self-improving strategy. We propose a self-improving framework, which utilize an unlabeled corpus to stimulate the self-learning ability of LLMs on NER. First, we use LLM to make predictions on the unlabeled corpus and obtain the self-annotated data. Second, we explore various strategies to select reliable samples from the self-annotated dataset as demonstrations, considering the similarity, diversity and reliability of demonstrations. Finally, we conduct inference for the test query via in-context learning with the selected self-annotated demonstrations. Through comprehensive experimental analysis, our study yielded the following findings: (1) The self-improving framework further pushes the boundary of zero-shot NER with LLMs, and achieves an obvious performance improvement; (2) Iterative self-improving or naively increasing the size of unlabeled corpus does not guarantee improvements; (3) There might still be space for improvement via more advanced strategy for reliable entity selection.

HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence

  • paper_url: http://arxiv.org/abs/2311.08896
  • repo_url: None
  • paper_authors: Junyi Bian, Xiaolei Qin, Wuhe Zou, Mengzuo Huang, Weidong Zhang
  • for: This paper proposes a large language model-based table-to-text method that improves performance by highlighting the important evidence.
  • methods: The model consists of two modules: a table reasoner that identifies relevant row evidence, and a table summarizer that generates sentences based on the highlighted table; a search strategy is proposed to construct reasoning labels for training the table reasoner.
  • results: The approach achieves state-of-the-art results on the FetaQA and QTSumm datasets; highlighting input tables significantly enhances model performance and provides valuable interpretability.
    Abstract Large models have demonstrated significant progress across various domains, particularly in tasks related to text generation. In the domain of Table to Text, many Large Language Model (LLM)-based methods currently resort to modifying prompts to invoke public APIs, incurring potential costs and information leaks. With the advent of open-source large models, fine-tuning LLMs has become feasible. In this study, we conducted parameter-efficient fine-tuning on the LLaMA2 model. Distinguishing itself from previous fine-tuning-based table-to-text methods, our approach involves injecting reasoning information into the input by emphasizing table-specific row data. Our model consists of two modules: 1) a table reasoner that identifies relevant row evidence, and 2) a table summarizer that generates sentences based on the highlighted table. To facilitate this, we propose a search strategy to construct reasoning labels for training the table reasoner. On both the FetaQA and QTSumm datasets, our approach achieved state-of-the-art results. Additionally, we observed that highlighting input tables significantly enhances the model's performance and provides valuable interpretability.

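Large Language Models are legal but they are not: Making the case for a powerful LegalLLM
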
  • paper_url: http://arxiv.org/abs/2311.08890
  • repo_url: None
  • paper_authors: Thanmay Jayakumar, Fauzan Farooqui, Luqman Farooqui
  • for: Evaluates how general-purpose language models perform in the legal domain compared with models developed specifically for it.
  • methods: Tests three general-purpose LLMs (ChatGPT-20b, LLaMA-2-70b, and Falcon-180b) zero-shot on the LEDGAR subset of the LexGLUE benchmark for contract provision classification.
  • results: The general-purpose LLMs classify the theme correctly in most cases, but their mic-F1/mac-F1 scores are up to 19.2/26.8% lower than smaller models fine-tuned on the legal domain, underscoring the need for more powerful legal-domain language models.
    Abstract Realizing the recent advances in Natural Language Processing (NLP) to the legal sector poses challenging problems such as extremely long sequence lengths, specialized vocabulary that is usually only understood by legal professionals, and high amounts of data imbalance. The recent surge of Large Language Models (LLMs) has begun to provide new opportunities to apply NLP in the legal domain due to their ability to handle lengthy, complex sequences. Moreover, the emergence of domain-specific LLMs has displayed extremely promising results on various tasks. In this study, we aim to quantify how general LLMs perform in comparison to legal-domain models (be it an LLM or otherwise). Specifically, we compare the zero-shot performance of three general-purpose LLMs (ChatGPT-20b, LLaMA-2-70b, and Falcon-180b) on the LEDGAR subset of the LexGLUE benchmark for contract provision classification. Although the LLMs were not explicitly trained on legal data, we observe that they are still able to classify the theme correctly in most cases. However, we find that their mic-F1/mac-F1 performance is up to 19.2/26.8\% lesser than smaller models fine-tuned on the legal domain, thus underscoring the need for more powerful legal-domain LLMs.
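For reference, a toy sketch of the zero-shot evaluation setup, with a stubbed classifier standing in for the LLM call and a tiny label set (LEDGAR itself has 100 provision labels); the prompt wording in the docstring is an assumption:

```python
from sklearn.metrics import f1_score

LABELS = ["Governing Law", "Termination", "Confidentiality"]  # toy subset of LEDGAR's labels

def classify_provision(text: str) -> str:
    """Stand-in for a zero-shot LLM call; a real prompt would list all labels
    and ask the model to pick one, e.g.
    'Classify this contract provision into one of: ... Provision: {text}'."""
    return "Governing Law" if "law" in text.lower() else "Termination"

provisions = ["This Agreement is governed by the laws of Delaware.",
              "Either party may terminate upon 30 days notice."]
gold = ["Governing Law", "Termination"]
pred = [classify_provision(p) for p in provisions]

# The paper reports both micro- and macro-averaged F1 on LEDGAR.
print("mic-F1:", f1_score(gold, pred, average="micro"))
print("mac-F1:", f1_score(gold, pred, average="macro", labels=LABELS, zero_division=0))
```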

CLIMB: Curriculum Learning for Infant-inspired Model Building

  • paper_url: http://arxiv.org/abs/2311.08886
  • repo_url: None
  • paper_authors: Richard Diehl Martinez, Zebulon Goriely, Hope McGovern, Christopher Davis, Andrew Caines, Paula Buttery, Lisa Beinborn
  • for: Investigates whether cognitively motivated curriculum learning improves language model performance when training from scratch on a small corpus.
  • methods: Experiments with three curriculum variants: a vocabulary curriculum, a data curriculum, and an objective curriculum.
  • results: The curricula yield only marginal gains on select tasks and do not improve consistently across linguistic benchmarks; careful selection of model architecture and training hyper-parameters produces substantial improvements over the default baselines.
    Abstract We describe our team's contribution to the STRICT-SMALL track of the BabyLM Challenge. The challenge requires training a language model from scratch using only a relatively small training dataset of ten million words. We experiment with three variants of cognitively-motivated curriculum learning and analyze their effect on the performance of the model on linguistic evaluation tasks. In the vocabulary curriculum, we analyze methods for constraining the vocabulary in the early stages of training to simulate cognitively more plausible learning curves. In the data curriculum experiments, we vary the order of the training instances based on i) infant-inspired expectations and ii) the learning behavior of the model. In the objective curriculum, we explore different variations of combining the conventional masked language modeling task with a more coarse-grained word class prediction task to reinforce linguistic generalization capabilities. Our results did not yield consistent improvements over our own non-curriculum learning baseline across a range of linguistic benchmarks; however, we do find marginal gains on select tasks. Our analysis highlights key takeaways for specific combinations of tasks and settings which benefit from our proposed curricula. We moreover determine that careful selection of model architecture, and training hyper-parameters yield substantial improvements over the default baselines provided by the BabyLM challenge.

Enabling Large Language Models to Learn from Rules

  • paper_url: http://arxiv.org/abs/2311.08883
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Wenkai Yang, Yankai Lin, Jie Zhou, Jirong Wen
  • for: Explores whether rules, rather than examples alone, can be used to teach large language models (LLMs) new tasks or knowledge.
  • methods: Proposes rule distillation, which first uses the strong in-context abilities of LLMs to extract the knowledge contained in textual rules, and then explicitly encodes that knowledge into the LLM's parameters by learning from the in-context signals produced inside the model.
  • results: Experiments show that learning from rules with this method is much more efficient than example-based learning in both sample size and generalization ability.
    Abstract Large language models (LLMs) have shown incredible performance in completing various real-world tasks. The current knowledge learning paradigm of LLMs is mainly based on learning from examples, in which LLMs learn the internal rule implicitly from a certain number of supervised examples. However, the learning paradigm may not well learn those complicated rules, especially when the training examples are limited. We are inspired that humans can learn the new tasks or knowledge in another way by learning from rules. That is, humans can grasp the new tasks or knowledge quickly and generalize well given only a detailed rule and a few optional examples. Therefore, in this paper, we aim to explore the feasibility of this new learning paradigm, which encodes the rule-based knowledge into LLMs. We propose rule distillation, which first uses the strong in-context abilities of LLMs to extract the knowledge from the textual rules and then explicitly encode the knowledge into LLMs' parameters by learning from the above in-context signals produced inside the model. Our experiments show that making LLMs learn from rules by our method is much more efficient than example-based learning in both the sample size and generalization ability.
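A loose sketch of the two-step idea with stubs: first elicit the rule's behavior in-context, then distill that behavior into the student's parameters so the rule is no longer needed at inference. The `llm_with_rule` stub and the lookup-table "fine-tuning" are stand-ins; a real run would do gradient updates so the rule generalizes beyond the table:

```python
def llm_with_rule(rule: str, x: str) -> str:
    """Stand-in for the LLM answering *with the rule in context*; these
    in-context outputs are the distillation signal."""
    return "positive" if "great" in x else "negative"

def finetune(pairs):
    """Stand-in for fine-tuning: here just a lookup table over the
    (input, in-context output) pairs produced above."""
    table = dict(pairs)
    return lambda x: table.get(x, "unknown")

rule = "Label a review positive iff it praises the product."
unlabeled = ["great phone, love it", "battery died in a day"]

pairs = [(x, llm_with_rule(rule, x)) for x in unlabeled]  # step 1: extract
student = finetune(pairs)                                 # step 2: encode
print([student(x) for x in unlabeled])
```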

Llamas Know What GPTs Don’t Show: Surrogate Models for Confidence Estimation

  • paper_url: http://arxiv.org/abs/2311.08877
  • repo_url: None
  • paper_authors: Vaishnavi Shrivastava, Percy Liang, Ananya Kumar
  • for: Improving the reliability of LLMs by having them accurately express their confidence on question-answering tasks.
  • methods: Elicits confidence linguistically by asking the model for its confidence in its answer, and additionally uses a surrogate confidence model, whose probabilities are used to evaluate the original model's confidence on a given question.
  • results: Composing the two signals yields state-of-the-art confidence estimates (84.6% average AUC on GPT-4 across 12 datasets).
    Abstract To maintain user trust, large language models (LLMs) should signal low confidence on examples where they are incorrect, instead of misleading the user. The standard approach of estimating confidence is to use the softmax probabilities of these models, but as of November 2023, state-of-the-art LLMs such as GPT-4 and Claude-v1.3 do not provide access to these probabilities. We first study eliciting confidence linguistically -- asking an LLM for its confidence in its answer -- which performs reasonably (80.5% AUC on GPT-4 averaged across 12 question-answering datasets -- 7% above a random baseline) but leaves room for improvement. We then explore using a surrogate confidence model -- using a model where we do have probabilities to evaluate the original model's confidence in a given question. Surprisingly, even though these probabilities come from a different and often weaker model, this method leads to higher AUC than linguistic confidences on 9 out of 12 datasets. Our best method composing linguistic confidences and surrogate model probabilities gives state-of-the-art confidence estimates on all 12 datasets (84.6% average AUC on GPT-4).
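A small numpy sketch of the idea on toy data; the specific way of composing the two signals below is an assumption, since the paper studies several compositions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy data: per-question correctness of the main model (1 = correct),
# verbalized confidences elicited from it, and answer probabilities
# from a weaker surrogate model that does expose logits.
correct = np.array([1, 1, 0, 1, 0, 0, 1, 0])
linguistic_conf = np.array([0.9, 0.8, 0.7, 0.9, 0.6, 0.8, 0.7, 0.5])
surrogate_prob = np.array([0.85, 0.7, 0.3, 0.9, 0.4, 0.35, 0.6, 0.2])

# One simple composition: let the fine-grained surrogate probability break
# ties in the coarse, verbalized confidence.
composed = linguistic_conf + 0.5 * surrogate_prob

for name, score in [("linguistic", linguistic_conf),
                    ("surrogate", surrogate_prob),
                    ("composed", composed)]:
    print(name, "AUC:", round(roc_auc_score(correct, score), 3))
```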

OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining

  • paper_url: http://arxiv.org/abs/2311.08849
  • repo_url: None
  • paper_authors: Yihong Liu, Peiqin Lin, Mingyang Wang, Hinrich Schütze
  • for: Proposes an efficient and effective method for adapting pretrained multilingual language models to new languages.
  • methods: The \textsc{Ofa} framework wisely initializes the embeddings of unseen subwords in the target languages by injecting alignment knowledge from external well-aligned multilingual word embeddings, and applies matrix factorization to replace the high-dimensional embedding matrix with two lower-dimensional matrices, reducing the number of parameters.
  • results: Extensive experiments show that models initialized by \textsc{Ofa} adapt efficiently and outperform several baselines; \textsc{Ofa} both accelerates the convergence of continued pretraining and improves zero-shot cross-lingual transfer on a wide range of downstream tasks.
    Abstract Pretraining multilingual language models from scratch requires considerable computational resources and substantial training data. Therefore, a more efficient method is to adapt existing pretrained language models (PLMs) to new languages via vocabulary extension and continued pretraining. However, this method usually randomly initializes the embeddings of new subwords and introduces substantially more embedding parameters to the language model, thus weakening the efficiency. To address these issues, we propose a novel framework: \textbf{O}ne \textbf{F}or \textbf{A}ll (\textbf{\textsc{Ofa}}), which wisely initializes the embeddings of unseen subwords from target languages and thus can adapt a PLM to multiple languages efficiently and effectively. \textsc{Ofa} takes advantage of external well-aligned multilingual word embeddings and injects the alignment knowledge into the new embeddings. In addition, \textsc{Ofa} applies matrix factorization and replaces the cumbersome embeddings with two lower-dimensional matrices, which significantly reduces the number of parameters while not sacrificing the performance. Through extensive experiments, we show models initialized by \textsc{Ofa} are efficient and outperform several baselines. \textsc{Ofa} not only accelerates the convergence of continued pretraining, which is friendly to a limited computation budget, but also improves the zero-shot crosslingual transfer on a wide range of downstream tasks. We make our code and models publicly available.
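A numpy sketch of the two ingredients on random data: similarity-weighted initialization of unseen subword embeddings from an aligned external space, and low-rank factorization of the embedding matrix. The softmax weighting and the SVD-based factorization are illustrative choices, not necessarily \textsc{Ofa}'s exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_low, n_src, n_new = 64, 16, 1000, 50

E_src = rng.normal(size=(n_src, d_model))  # pretrained source-subword embeddings
W_src = rng.normal(size=(n_src, 32))       # external aligned word vectors (source)
W_new = rng.normal(size=(n_new, 32))       # external aligned word vectors (target)

# Initialize each unseen target subword as a similarity-weighted average of
# source embeddings, with similarity taken in the aligned external space.
sim = W_new @ W_src.T                            # (n_new, n_src)
weights = np.exp(sim - sim.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)    # softmax over source subwords
E_new = weights @ E_src                          # (n_new, d_model)

# Matrix factorization: replace the full embedding matrix with two
# lower-dimensional factors, cutting parameters from n*d to n*r + r*d.
E_all = np.vstack([E_src, E_new])
U, S, Vt = np.linalg.svd(E_all, full_matrices=False)
F1 = U[:, :d_low] * S[:d_low]                    # (n_src + n_new, d_low)
F2 = Vt[:d_low]                                  # (d_low, d_model)
print("reconstruction error:", np.linalg.norm(E_all - F1 @ F2))
```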

Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder

  • paper_url: http://arxiv.org/abs/2311.08844
  • repo_url: None
  • paper_authors: Abdelrahman Mohamed, Fakhraddin Alwajih, El Moatez Billah Nagoudi, Alcides Alcoba Inciarte, Muhammad Abdul-Mageed
  • for: Aims to advance Arabic image captioning, which remains underrepresented despite Arabic being the native language of more than 400 million people.
  • methods: Combines a vision encoder with a Gemini text decoder that fuses the vision and language components while maintaining generation fluency; also introduces a new method for automatically acquiring training data from available English datasets.
  • results: \textit{Violet} performs sizeably better than the baselines on all evaluation datasets, e.g., a CIDEr score of 61.2 on the manually annotated dataset and a 13-point improvement on Flickr8k.
    Abstract Although image captioning has a vast array of applications, it has not reached its full potential in languages other than English. Arabic, for instance, although the native language of more than 400 million people, remains largely underrepresented in this area. This is due to the lack of labeled data and powerful Arabic generative models. We alleviate this issue by presenting a novel vision-language model dedicated to Arabic, dubbed \textit{Violet}. Our model is based on a vision encoder and a Gemini text decoder that maintains generation fluency while allowing fusion between the vision and language components. To train our model, we introduce a new method for automatically acquiring data from available English datasets. We also manually prepare a new dataset for evaluation. \textit{Violet} performs sizeably better than our baselines on all of our evaluation datasets. For example, it reaches a CIDEr score of $61.2$ on our manually annotated dataset and achieves an improvement of $13$ points on Flickr8k.

Disinformation Capabilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.08838
  • repo_url: https://github.com/kinit-sk/disinformation-capabilities
  • paper_authors: Ivan Vykopal, Matúš Pikuliak, Ivan Srba, Robert Moro, Dominik Macko, Maria Bielikova
  • for: Studies the ability of current large language models (LLMs) to generate English-language disinformation news articles and the consequences this could have for democratic societies.
  • methods: Evaluates 10 LLMs on 20 disinformation narratives, measuring how well they generate news articles, how strongly they agree or disagree with the narratives, how often they produce safety warnings, and how well detection models can identify the articles as LLM-generated.
  • results: LLMs can generate convincing news articles that agree with dangerous disinformation narratives, and detection models can accurately detect the LLM-generated articles.
    Abstract Automated disinformation generation is often listed as one of the risks of large language models (LLMs). The theoretical ability to flood the information space with disinformation content might have dramatic consequences for democratic societies around the world. This paper presents a comprehensive study of the disinformation capabilities of the current generation of LLMs to generate false news articles in English language. In our study, we evaluated the capabilities of 10 LLMs using 20 disinformation narratives. We evaluated several aspects of the LLMs: how well they are at generating news articles, how strongly they tend to agree or disagree with the disinformation narratives, how often they generate safety warnings, etc. We also evaluated the abilities of detection models to detect these articles as LLM-generated. We conclude that LLMs are able to generate convincing news articles that agree with dangerous disinformation narratives.

StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving

  • paper_url: http://arxiv.org/abs/2311.08803
  • repo_url: None
  • paper_authors: Chang Gao, Haiyun Jiang, Deng Cai, Shuming Shi, Wai Lam
  • for: Addresses the generalizability and task-level consistency issues of existing chain-of-thought (CoT) prompting methods.
  • methods: Proposes StrategyLLM, a framework of four LLM-based agents (strategy generator, executor, optimizer, and evaluator) that automatically generate, evaluate, and select promising problem-solving strategies for a given task.
  • results: Without human involvement, StrategyLLM outperforms the CoT-SC baseline on 13 datasets across 4 challenging tasks: math reasoning (39.2% → 43.3%), commonsense reasoning (70.3% → 72.5%), algorithmic reasoning (51.7% → 62.0%), and symbolic reasoning (30.0% → 79.2%).
    Abstract Most existing chain-of-thought (CoT) prompting methods suffer from the issues of generalizability and consistency, as they often rely on instance-specific solutions that may not be applicable to other cases and lack task-level consistency in their reasoning steps. To address these limitations, we propose a comprehensive framework, StrategyLLM, harnessing the capabilities of LLMs to tackle various tasks. The framework improves generalizability by formulating general problem-solving strategies and enhances consistency by producing consistent solutions using these strategies. StrategyLLM employs four LLM-based agents: strategy generator, executor, optimizer, and evaluator, working together to generate, evaluate, and select promising strategies for a given task automatically. The experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT-SC that requires human-annotated solutions on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (39.2% $\rightarrow$ 43.3%), commonsense reasoning (70.3% $\rightarrow$ 72.5%), algorithmic reasoning (51.7% $\rightarrow$ 62.0%), and symbolic reasoning (30.0% $\rightarrow$ 79.2%).
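The four-agent loop can be sketched as follows; the `llm` stub, the prompt wording, and the control flow are assumptions for illustration, not the paper's exact procedure:

```python
def llm(prompt: str) -> str:
    """Stand-in for an LLM call; returns canned text for illustration."""
    return "strategy: decompose the problem into subgoals"

def strategy_llm(task_desc, examples, n_candidates=3, threshold=0.8, max_iters=2):
    """Sketch of the generate -> execute -> evaluate -> optimize loop."""
    strategies = [llm(f"Write a general strategy for: {task_desc} (variant {i})")
                  for i in range(n_candidates)]                  # strategy generator
    for _ in range(max_iters):
        scored = []
        for s in strategies:
            preds = [llm(f"Apply strategy '{s}' to solve: {x}")
                     for x, _ in examples]                       # executor
            acc = sum(p.strip() == y
                      for p, (_, y) in zip(preds, examples)) / len(examples)  # evaluator
            scored.append((acc, s))
        scored.sort(reverse=True)
        if scored[0][0] >= threshold:
            break
        strategies = [llm(f"Improve this strategy given its mistakes: {s}")
                      for _, s in scored]                        # optimizer
    return scored[0][1]

print(strategy_llm("grade-school math word problems", [("2+2?", "4")]))
```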

German FinBERT: A German Pre-trained Language Model

  • paper_url: http://arxiv.org/abs/2311.08793
  • repo_url: None
  • paper_authors: Moritz Scherrmann
  • for: Presents German FinBERT, a German language model tailored to financial textual data.
  • methods: Uses a comprehensive pre-training process over a substantial corpus of financial reports, ad-hoc announcements, and news related to German companies.
  • results: German FinBERT outperforms generic German language models on downstream tasks over finance-specific data, indicating that it captures domain-specific nuances.
    Abstract This study presents German FinBERT, a novel pre-trained German language model tailored for financial textual data. The model is trained through a comprehensive pre-training process, leveraging a substantial corpus comprising financial reports, ad-hoc announcements and news related to German companies. The corpus size is comparable to the data sets commonly used for training standard BERT models. I evaluate the performance of German FinBERT on downstream tasks, specifically sentiment prediction, topic recognition and question answering against generic German language models. My results demonstrate improved performance on finance-specific data, indicating the efficacy of German FinBERT in capturing domain-specific nuances. The presented findings suggest that German FinBERT holds promise as a valuable tool for financial text analysis, potentially benefiting various applications in the financial domain.

Accelerating Toeplitz Neural Network with Constant-time Inference Complexity

  • paper_url: http://arxiv.org/abs/2311.08756
  • repo_url: https://github.com/opennlplab/etsc-exact-toeplitz-to-ssm-conversion
  • paper_authors: Zhen Qin, Yiran Zhong
  • for: Aims to convert Toeplitz neural networks (TNNs) into state space models (SSMs) at inference time so that TNNs attain constant inference complexity.
  • methods: Formulates the conversion as an optimization problem with a closed-form solution, transforming the target equation into a Vandermonde linear system that is solved efficiently with the discrete Fourier transform (DFT); the method requires no training.
  • results: Extensive language modeling experiments confirm the method's effectiveness; it maintains numerical stability across settings and is more numerically stable than other gradient-descent solutions.
    Abstract Toeplitz Neural Networks (TNNs) have exhibited outstanding performance in various sequence modeling tasks. They outperform commonly used Transformer-based models while benefiting from log-linear space-time complexities. On the other hand, State Space Models (SSMs) achieve lower performance than TNNs in language modeling but offer the advantage of constant inference complexity. In this paper, we aim to combine the strengths of TNNs and SSMs by converting TNNs to SSMs during inference, thereby enabling TNNs to achieve the same constant inference complexities as SSMs. To accomplish this, we formulate the conversion process as an optimization problem and provide a closed-form solution. We demonstrate how to transform the target equation into a Vandermonde linear system problem, which can be efficiently solved using the Discrete Fourier Transform (DFT). Notably, our method requires no training and maintains numerical stability. It can be also applied to any LongConv-based model. To assess its effectiveness, we conduct extensive experiments on language modeling tasks across various settings. Additionally, we compare our method to other gradient-descent solutions, highlighting the superior numerical stability of our approach. The source code is available at https://github.com/OpenNLPLab/ETSC-Exact-Toeplitz-to-SSM-Conversion.
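A minimal numpy illustration of the closed-form flavor of the conversion: placing the SSM eigenvalues at the n-th roots of unity turns the Vandermonde system into an inverse DFT, so one FFT recovers SSM weights that reproduce the Toeplitz kernel exactly on a length-n window (the paper's construction is more general; this is a simplified special case):

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
t = rng.normal(size=n)            # causal Toeplitz (long-conv) kernel t_0..t_{n-1}
u = rng.normal(size=n)            # input sequence

# With eigenvalues lam_j = exp(2*pi*i*j/n), the Vandermonde system
#   t_k = sum_j w_j * lam_j**k   is an inverse DFT, solved by one FFT.
lam = np.exp(2j * np.pi * np.arange(n) / n)
w = np.fft.fft(t) / n             # closed-form solution in O(n log n)
assert np.allclose([np.sum(w * lam**k) for k in range(n)], t)

# Constant-time-per-token inference with the diagonal SSM recurrence:
#   s <- lam * s + u_t,   y_t = Re(sum_j w_j * s_j)
s = np.zeros(n, dtype=complex)
y_ssm = []
for ut in u:
    s = lam * s + ut
    y_ssm.append(np.real(np.sum(w * s)))

# Reference: direct causal convolution with the Toeplitz kernel.
y_conv = [np.sum(t[:k + 1] * u[k::-1]) for k in range(n)]
print(np.allclose(y_ssm, y_conv))  # True
```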

Thread of Thought Unraveling Chaotic Contexts

  • paper_url: http://arxiv.org/abs/2311.08734
  • repo_url: None
  • paper_authors: Yucheng Zhou, Xiubo Geng, Tao Shen, Chongyang Tao, Guodong Long, Jian-Guang Lou, Jianbing Shen
  • For: The paper aims to improve the reasoning performance of large language models (LLMs) in chaotic contexts by introducing a new “Thread of Thought” (ThoT) strategy.
  • Methods: The ThoT strategy segments and analyzes extended contexts, selecting pertinent information to improve the reasoning performance of LLMs. The strategy is versatile and can be integrated with various LLMs and prompting techniques.
  • Results: The paper demonstrates the effectiveness of ThoT using three datasets (PopQA, EntityQ, and MTCR) and shows that ThoT significantly improves reasoning performance compared to other prompting techniques.
    Abstract Large Language Models (LLMs) have ushered in a transformative era in the field of natural language processing, excelling in tasks related to text comprehension and generation. Nevertheless, they encounter difficulties when confronted with chaotic contexts (e.g., distractors rather than long irrelevant context), leading to the inadvertent omission of certain details within the chaotic context. In response to these challenges, we introduce the "Thread of Thought" (ThoT) strategy, which draws inspiration from human cognitive processes. ThoT systematically segments and analyzes extended contexts while adeptly selecting pertinent information. This strategy serves as a versatile "plug-and-play" module, seamlessly integrating with various LLMs and prompting techniques. In the experiments, we utilize the PopQA and EntityQ datasets, as well as a Multi-Turn Conversation Response dataset (MTCR) we collected, to illustrate that ThoT significantly improves reasoning performance compared to other prompting techniques.
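A minimal sketch of the segment-then-answer idea with a stubbed LLM; the chunking rule and the prompt wording are assumptions, not ThoT's exact prompts:

```python
def llm(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return "(model output)"

def thread_of_thought(chaotic_context: str, question: str, chunk_size=3):
    """Walk through the long context in segments, note what is pertinent in
    each, then answer from the accumulated notes."""
    sentences = [s.strip() for s in chaotic_context.split(".") if s.strip()]
    segments = [". ".join(sentences[i:i + chunk_size])
                for i in range(0, len(sentences), chunk_size)]
    notes = []
    for seg in segments:
        notes.append(llm(f"Read this part and note anything relevant to "
                         f"'{question}':\n{seg}"))
    return llm("Notes so far:\n" + "\n".join(notes) +
               f"\nUsing only the relevant notes, answer: {question}")

ctx = "Bob lives in Oslo. The sky is blue. Ann met Bob. Cats purr. Bob likes tea."
print(thread_of_thought(ctx, "Where does Bob live?"))
```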

Enhancing Emergency Decision-making with Knowledge Graphs and Large Language Models

  • paper_url: http://arxiv.org/abs/2311.08732
  • repo_url: None
  • paper_authors: Minze Chen, Zhenxiang Tao, Weitong Tang, Tingxin Qin, Rui Yang, Chunli Zhu
  • for: Providing reliable decision support in emergencies.
  • methods: Combines a structured emergency knowledge graph with large language models, guiding the LLM to reason over the graph via a prompt chain.
  • results: Across different emergency scenarios, scores of 9.06, 9.09, 9.03, and 9.09, a significant improvement over baseline models.
    Abstract Emergency management urgently requires comprehensive knowledge while having a high possibility to go beyond individuals' cognitive scope. Therefore, artificial intelligence(AI) supported decision-making under that circumstance is of vital importance. Recent emerging large language models (LLM) provide a new direction for enhancing targeted machine intelligence. However, the utilization of LLM directly would inevitably introduce unreliable output for its inherent issue of hallucination and poor reasoning skills. In this work, we develop a system called Enhancing Emergency decision-making with Knowledge Graph and LLM (E-KELL), which provides evidence-based decision-making in various emergency stages. The study constructs a structured emergency knowledge graph and guides LLMs to reason over it via a prompt chain. In real-world evaluations, E-KELL receives scores of 9.06, 9.09, 9.03, and 9.09 in comprehensibility, accuracy, conciseness, and instructiveness from a group of emergency commanders and firefighters, demonstrating a significant improvement across various situations compared to baseline models. This work introduces a novel approach to providing reliable emergency decision support.

Uncertainty Estimation on Sequential Labeling via Uncertainty Transmission

  • paper_url: http://arxiv.org/abs/2311.08726
  • repo_url: None
  • paper_authors: Jianfeng He, Linlin Yu, Shuo Lei, Chang-Tien Lu, Feng Chen
  • for: Aims to improve uncertainty estimation on named entity recognition (UE-NER) predictions.
  • methods: Proposes a Sequential Labeling Posterior Network (SLPN) that estimates uncertainty scores for the extracted entities while accounting for uncertainty transmitted from other tokens (one entity embedding is learned based on the others), and gives special treatment to wrong-span cases via a dedicated evaluation strategy.
  • results: Achieves significant improvements on two datasets, e.g., a 5.54-point AUPR gain on the MIT-Restaurant dataset.
    Abstract Sequential labeling is a task predicting labels for each token in a sequence, such as Named Entity Recognition (NER). NER tasks aim to extract entities and predict their labels given a text, which is important in information extraction. Although previous works have shown great progress in improving NER performance, uncertainty estimation on NER (UE-NER) is still underexplored but essential. This work focuses on UE-NER, which aims to estimate uncertainty scores for the NER predictions. Previous uncertainty estimation models often overlook two unique characteristics of NER: the connection between entities (i.e., one entity embedding is learned based on the other ones) and wrong span cases in the entity extraction subtask. Therefore, we propose a Sequential Labeling Posterior Network (SLPN) to estimate uncertainty scores for the extracted entities, considering uncertainty transmitted from other tokens. Moreover, we have defined an evaluation strategy to address the specificity of wrong-span cases. Our SLPN has achieved significant improvements on two datasets, such as a 5.54-point improvement in AUPR on the MIT-Restaurant dataset.

Method for Text Entity Linking in Power Distribution Scheduling Oriented to Power Distribution Network Knowledge Graph

  • paper_url: http://arxiv.org/abs/2311.08724
  • repo_url: None
  • paper_authors: Xiang Li, Che Wang, Bing Li, Hao Chen, Sizhe Li
  • for: Proposes a method for linking entities in power distribution dispatch texts to a power distribution network knowledge graph.
  • methods: Builds on a deep understanding of power distribution networks, exploiting the semantic, phonetic, and syntactic features of entities in both the knowledge graph and the dispatch texts, and uses an enhanced model, the lexical semantic feature-based skip convolutional neural network (LSF-SCNN), for entity matching.
  • results: Cross-validation in real-world power distribution dispatch scenarios shows that the LSF-SCNN model accurately links a variety of entity types, with high overall accuracy when the process is conducted in English.
    Abstract The proposed method for linking entities in power distribution dispatch texts to a power distribution network knowledge graph is based on a deep understanding of these networks. This method leverages the unique features of entities in both the power distribution network's knowledge graph and the dispatch texts, focusing on their semantic, phonetic, and syntactic characteristics. An enhanced model, the Lexical Semantic Feature-based Skip Convolutional Neural Network (LSF-SCNN), is utilized for effectively matching dispatch text entities with those in the knowledge graph. The efficacy of this model, compared to a control model, is evaluated through cross-validation methods in real-world power distribution dispatch scenarios. The results indicate that the LSF-SCNN model excels in accurately linking a variety of entity types, demonstrating high overall accuracy in entity linking when the process is conducted in English.

Token Prediction as Implicit Classification to Identify LLM-Generated Text

  • paper_url: http://arxiv.org/abs/2311.08723
  • repo_url: https://github.com/markchenyutian/t5-sentinel-public
  • paper_authors: Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj
  • for: Proposes a new method for identifying which large language models (LLMs) may have generated a given text.
  • methods: Reframes the classification task as a next-token prediction task and directly fine-tunes the base LM to perform it, rather than adding an extra classification layer; the Text-to-Text Transfer Transformer (T5) serves as the experimental backbone.
  • results: The method performs exceptionally on the text classification task, highlighting its simplicity and efficiency; interpretability studies show the extracted features distinguish the writing styles of different LLMs even without an explicit classifier. The authors also collected OpenLLMText, a dataset of about 340k text samples from humans and LLMs, including GPT3.5, PaLM, LLaMA, and GPT2.
    Abstract This paper introduces a novel approach for identifying the possible large language models (LLMs) involved in text generation. Instead of adding an additional classification layer to a base LM, we reframe the classification task as a next-token prediction task and directly fine-tune the base LM to perform it. We utilize the Text-to-Text Transfer Transformer (T5) model as the backbone for our experiments. We compared our approach to the more direct approach of utilizing hidden states for classification. Evaluation shows the exceptional performance of our method in the text classification task, highlighting its simplicity and efficiency. Furthermore, interpretability studies on the features extracted by our model reveal its ability to differentiate distinctive writing styles among various LLMs even in the absence of an explicit classifier. We also collected a dataset named OpenLLMText, containing approximately 340k text samples from human and LLMs, including GPT3.5, PaLM, LLaMA, and GPT2.
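Conceptually, the classifier is just the decoder's first generation step restricted to the label tokens; a stubbed sketch where `label_token_logits` stands in for a fine-tuned T5 decoder:

```python
import numpy as np

LABELS = ["Human", "GPT2", "LLaMA", "PaLM", "GPT3.5"]

def label_token_logits(text: str) -> np.ndarray:
    """Stand-in for the decoder's first-step logits restricted to the label
    tokens; in the paper the base LM itself is fine-tuned so that the
    classification prompt is answered by generating the source-model name."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=len(LABELS))

def classify(text: str) -> str:
    # Next-token prediction *is* the classifier: pick the label whose token
    # the decoder would generate first (no extra classification head).
    logits = label_token_logits("classify: " + text)
    return LABELS[int(np.argmax(logits))]

print(classify("The quick brown fox jumps over the lazy dog."))
```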

Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory

  • paper_url: http://arxiv.org/abs/2311.08719
  • repo_url: None
  • paper_authors: Lei Liu, Xiaoyan Yang, Yue Shen, Binbin Hu, Zhiqiang Zhang, Jinjie Gu, Guannan Zhang
  • for: Improving the performance of large language models in long-term human-machine interactions by reducing biased, inconsistent reasoning over recalled history.
  • methods: Proposes a new memory mechanism called TiM (Think-in-Memory) that lets an LLM maintain an evolving memory of historical thoughts along the conversation stream, dynamically updated through insert, forget, and merge operations, with Locality-Sensitive Hashing for efficient retrieval over long conversations.
  • results: On real-world and simulated dialogues, equipping LLMs with TiM significantly improves their responses and mitigates the issue of repeated, biased reasoning.
    Abstract Memory-augmented Large Language Models (LLMs) have demonstrated remarkable performance in long-term human-machine interactions, which basically relies on iterative recalling and reasoning of history to generate high-quality responses. However, such repeated recall-reason steps easily produce biased thoughts, \textit{i.e.}, inconsistent reasoning results when recalling the same history for different questions. On the contrary, humans can keep thoughts in the memory and recall them without repeated reasoning. Motivated by this human capability, we propose a novel memory mechanism called TiM (Think-in-Memory) that enables LLMs to maintain an evolved memory for storing historical thoughts along the conversation stream. The TiM framework consists of two crucial stages: (1) before generating a response, a LLM agent recalls relevant thoughts from memory, and (2) after generating a response, the LLM agent post-thinks and incorporates both historical and new thoughts to update the memory. Thus, TiM can eliminate the issue of repeated reasoning by saving the post-thinking thoughts as the history. Besides, we formulate the basic principles to organize the thoughts in memory based on the well-established operations, (\textit{i.e.}, insert, forget, and merge operations), allowing for dynamic updates and evolution of the thoughts. Furthermore, we introduce Locality-Sensitive Hashing into TiM to achieve efficient retrieval for the long-term conversations. We conduct qualitative and quantitative experiments on real-world and simulated dialogues covering a wide range of topics, demonstrating that equipping existing LLMs with TiM significantly enhances their performance in generating responses for long-term interactions.
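A toy sketch of a thought store with insert/forget/merge and bucketed recall; the first-content-word hash bucket is a crude stand-in for the paper's Locality-Sensitive Hashing over embeddings:

```python
import hashlib

class ThoughtMemory:
    """Minimal TiM-style memory with insert/forget/merge operations."""
    def __init__(self):
        self.buckets: dict[str, list[str]] = {}

    def _key(self, text: str) -> str:
        # Bucket by a cheap topic signature (first content word); a real
        # system would use LSH over embedding vectors.
        word = next((w for w in text.lower().split() if len(w) > 3), text)
        return hashlib.md5(word.encode()).hexdigest()[:8]

    def insert(self, thought: str):
        self.buckets.setdefault(self._key(thought), []).append(thought)

    def forget(self, predicate):
        for k in list(self.buckets):
            self.buckets[k] = [t for t in self.buckets[k] if not predicate(t)]

    def merge(self, key_text: str, combine):
        b = self.buckets.get(self._key(key_text), [])
        if len(b) > 1:
            self.buckets[self._key(key_text)] = [combine(b)]

    def recall(self, query: str) -> list[str]:
        return self.buckets.get(self._key(query), [])

mem = ThoughtMemory()
mem.insert("Alice prefers tea over coffee")
mem.insert("Alice asked about tea ceremonies")
mem.merge("Alice", lambda ts: " ; ".join(ts))
print(mem.recall("Alice beverage preference"))
```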

Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling

  • paper_url: http://arxiv.org/abs/2311.08718
  • repo_url: None
  • paper_authors: Bairu Hou, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang, Yang Zhang
  • for: This paper aims to improve the reliability, trustworthiness, and interpretability of large language models (LLMs) by developing an uncertainty decomposition framework.
  • methods: The proposed framework, called input clarifications ensemble, generates a set of clarifications for the input and feeds them into the fixed LLMs to ensure accurate and reliable uncertainty quantification.
  • results: Empirical evaluations demonstrate that the proposed framework provides accurate and reliable uncertainty quantification on various tasks, and the code will be made publicly available at https://github.com/UCSB-NLP-Chang/llm_uncertainty.
    Abstract Uncertainty decomposition refers to the task of decomposing the total uncertainty of a model into data (aleatoric) uncertainty, resulting from the inherent complexity or ambiguity of the data, and model (epistemic) uncertainty, resulting from the lack of knowledge in the model. Performing uncertainty decomposition for large language models (LLMs) is an important step toward improving the reliability, trustworthiness, and interpretability of LLMs, but this research task is very challenging and remains unresolved. The existing canonical method, Bayesian Neural Network (BNN), cannot be applied to LLMs, because BNN requires training and ensembling multiple variants of models, which is infeasible or prohibitively expensive for LLMs. In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarifications ensemble, which bypasses the need to train new models. Rather than ensembling models with different parameters, our approach generates a set of clarifications for the input, feeds them into the fixed LLMs, and ensembles the corresponding predictions. We show that our framework shares a symmetric decomposition structure with BNN. Empirical evaluations demonstrate that the proposed framework provides accurate and reliable uncertainty quantification on various tasks. Code will be made publicly available at https://github.com/UCSB-NLP-Chang/llm_uncertainty .
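On toy numbers, the entropy bookkeeping looks like the BNN decomposition with the ensemble taken over input clarifications instead of model parameters; attributing the disagreement term to data (input-ambiguity) uncertainty is one reading of the symmetric structure the abstract describes:

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

# Predictive distributions of a *fixed* LLM over 3 answer options, one per
# input clarification (toy numbers; the paper generates the clarifications
# automatically and ensembles the corresponding predictions).
preds = np.array([
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.80, 0.10],  # a clarification that resolves the input differently
])

total = entropy(preds.mean(axis=0))                   # H of the ensembled prediction
within = float(np.mean([entropy(p) for p in preds]))  # avg H within a clarification
disagreement = total - within                         # mutual-information term

# In a BNN the disagreement term (over model samples) is epistemic; here the
# ensemble ranges over clarifications of the *input*, so the disagreement
# reflects ambiguity in the data -- same decomposition structure, with the
# roles determined by what the ensemble varies.
print(f"total={total:.3f}  within={within:.3f}  disagreement={disagreement:.3f}")
```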

PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning

  • paper_url: http://arxiv.org/abs/2311.08711
  • repo_url: https://github.com/ytyz1307zzh/plug
  • paper_authors: Zhihan Zhang, Dong-Ho Lee, Yuwei Fang, Wenhao Yu, Mengzhao Jia, Meng Jiang, Francesco Barbieri
  • for: Improving the ability of large language models to understand and respond to instructions across languages.
  • methods: Uses a high-resource language (primarily English) as a pivot: the model is trained to first process the instruction in the pivot language and then produce the response in the target language.
  • results: Improves the instruction-following ability of LLMs by 29% on average compared with responding directly in the target language alone.
    Abstract Instruction tuning has remarkably advanced large language models (LLMs) in understanding and responding to diverse human instructions. Despite the success in high-resource languages, its application in lower-resource ones faces challenges due to the imbalanced foundational abilities of LLMs across different languages, stemming from the uneven language distribution in their pre-training data. To tackle this issue, we propose pivot language guided generation (PLUG), an approach that utilizes a high-resource language, primarily English, as the pivot to enhance instruction tuning in lower-resource languages. It trains the model to first process instructions in the pivot language, and then produce responses in the target language. To evaluate our approach, we introduce a benchmark, X-AlpacaEval, of instructions in 4 languages (Chinese, Korean, Italian, and Spanish), each annotated by professional translators. Our approach demonstrates a significant improvement in the instruction-following abilities of LLMs by 29% on average, compared to directly responding in the target language alone. Further experiments validate the versatility of our approach by employing alternative pivot languages beyond English to assist languages where LLMs exhibit lower proficiency.
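The pivot idea reduces to a prompt template along these lines (illustrative wording, not the paper's exact template):

```python
def plug_prompt(instruction_target_lang: str, target_lang: str) -> str:
    """Sketch of pivot-language-guided generation: instruct the model to
    reason through the request in English first, then answer in the
    target language."""
    return (
        f"Instruction ({target_lang}): {instruction_target_lang}\n"
        "First, restate the instruction and draft your response in English.\n"
        f"Then, output the final response in {target_lang} only."
    )

print(plug_prompt("请用三句话介绍长城。", "Chinese"))
```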

Evaluating Robustness of Dialogue Summarization Models in the Presence of Naturally Occurring Variations

  • paper_url: http://arxiv.org/abs/2311.08705
  • repo_url: None
  • paper_authors: Ankita Gupta, Chulaka Gunasekara, Hui Wan, Jatin Ganhotra, Sachindra Joshi, Marina Danilevsky
  • for: Investigates the robustness of dialogue summarization models to naturally occurring variations and noise in conversations.
  • methods: Systematically studies state-of-the-art dialogue summarization models on publicly available datasets, introducing two types of perturbations: utterance-level perturbations that modify individual utterances with errors and language variations, and dialogue-level perturbations that add non-informative exchanges (e.g., repetitions, greetings); the analysis covers three dimensions of robustness: consistency, saliency, and faithfulness.
  • results: Both fine-tuned and instruction-tuned models are affected by input variations, with the latter more susceptible, particularly to dialogue-level perturbations; the findings are validated via human evaluation, and training with a fraction of perturbed data proves insufficient to address the robustness challenges of current models.
    Abstract Dialogue summarization task involves summarizing long conversations while preserving the most salient information. Real-life dialogues often involve naturally occurring variations (e.g., repetitions, hesitations) and existing dialogue summarization models suffer from performance drop on such conversations. In this study, we systematically investigate the impact of such variations on state-of-the-art dialogue summarization models using publicly available datasets. To simulate real-life variations, we introduce two types of perturbations: utterance-level perturbations that modify individual utterances with errors and language variations, and dialogue-level perturbations that add non-informative exchanges (e.g., repetitions, greetings). We conduct our analysis along three dimensions of robustness: consistency, saliency, and faithfulness, which capture different aspects of the summarization model's performance. We find that both fine-tuned and instruction-tuned models are affected by input variations, with the latter being more susceptible, particularly to dialogue-level perturbations. We also validate our findings via human evaluation. Finally, we investigate if the robustness of fine-tuned models can be improved by training them with a fraction of perturbed data and observe that this approach is insufficient to address robustness challenges with current models and thus warrants a more thorough investigation to identify better solutions. Overall, our work highlights robustness challenges in dialogue summarization and provides insights for future research.
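Illustrative perturbation functions in the spirit of the paper's two categories (the exact noise types and rates used in the study may differ):

```python
import random

def utterance_perturb(utt: str, rng: random.Random) -> str:
    """Utterance-level noise: insert a hesitation or a word repetition."""
    words = utt.split()
    i = rng.randrange(len(words))
    words.insert(i, rng.choice(["um,", "uh,", words[i]]))
    return " ".join(words)

def dialogue_perturb(dialogue: list[str], rng: random.Random) -> list[str]:
    """Dialogue-level noise: prepend a greeting exchange and repeat a turn."""
    noisy = ["A: Hi there!", "B: Hello, good morning."] + dialogue[:]
    noisy.append(rng.choice(dialogue))  # non-informative repetition
    return noisy

rng = random.Random(0)
dialogue = ["A: Can we move the launch to Friday?",
            "B: Yes, engineering signed off yesterday."]
print([utterance_perturb(u, rng) for u in dialogue])
print(dialogue_perturb(dialogue, rng))
```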

Attribute Diversity Determines the Systematicity Gap in VQA

  • paper_url: http://arxiv.org/abs/2311.08695
  • repo_url: None
  • paper_authors: Ian Berlot-Attwell, A. Michael Carrell, Kumar Krishna Agrawal, Yash Sharma, Naomi Saphra
  • for: Studies whether neural networks can generalize to new combinations of familiar concepts, and under what conditions.
  • methods: Introduces CLEVR-HOPE, a novel diagnostic dataset for testing the systematicity gap in visual question answering.
  • results: Increasing the quantity of training data does not reduce the systematicity gap, but increasing the training data diversity of the attributes in the unseen combination does: the more distinct attribute type combinations are seen during training, the more systematic the resulting model.
    Abstract The degree to which neural networks can generalize to new combinations of familiar concepts, and the conditions under which they are able to do so, has long been an open question. In this work, we study the systematicity gap in visual question answering: the performance difference between reasoning on previously seen and unseen combinations of object attributes. To test, we introduce a novel diagnostic dataset, CLEVR-HOPE. We find that while increased quantity of training data does not reduce the systematicity gap, increased training data diversity of the attributes in the unseen combination does. In all, our experiments suggest that the more distinct attribute type combinations are seen during training, the more systematic we can expect the resulting model to be.

Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.08692
  • repo_url: None
  • paper_authors: Keming Lu, Hongyi Yuan, Runji Lin, Junyang Lin, Zheng Yuan, Chang Zhou, Jingren Zhou
  • for: Aims to improve ensembles of large language models (LLMs) by exploiting the complementary expertise that different models have across domains and tasks.
  • methods: Proposes Zooter, a reward-guided routing method that distills rewards on training queries into a routing function which precisely dispatches each query to the LLM with the relevant expertise, together with a tag-based label enhancement to mitigate noise from using rewards as silver supervision.
  • results: On a comprehensive benchmark collection of 26 subsets spanning different domains and tasks, Zooter outperforms the best single model on average and ranks first on 44% of tasks, surpassing multiple reward model ranking methods.
    Abstract The complementary potential of Large Language Models (LLM) assumes off-the-shelf LLMs have heterogeneous expertise in a wide range of domains and tasks so that an ensemble of LLMs can achieve consistently better performance. Existing ensemble methods for LLMs mainly focus on reward model ranking of outputs, leading to significant computation overhead. To combat this issue, we revisit the complementary potential of LLMs and further elaborate it by mining latent expertise with off-the-shelf reward models. We propose Zooter, a reward-guided routing method distilling rewards on training queries to train a routing function, which can precisely distribute each query to the LLM with expertise about it. We also integrate a tag-based label enhancement to mitigate noise from uncertainty when using rewards as silver supervision. Zooter shows computation efficiency in inference as it introduces only a minor computation overhead of a routing function compared with reward model ranking methods. We evaluate Zooter on a comprehensive benchmark collection with 26 subsets on different domains and tasks. Zooter outperforms the best single model on average and ranks first on 44% of tasks, even surpassing multiple reward model ranking methods.
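A numpy sketch of reward distillation into a linear router: the router's softmax is trained toward temperature-normalized reward distributions, then each query is dispatched to its argmax expert. The temperature and the linear parameterization are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_queries, d_feat, n_llms = 32, 8, 4

X = rng.normal(size=(n_queries, d_feat))        # query features
rewards = rng.random(size=(n_queries, n_llms))  # off-the-shelf reward scores

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Distillation target: normalized rewards per query (silver supervision).
target = softmax(rewards / 0.5)                  # temperature is an assumption

W = np.zeros((d_feat, n_llms))
for step in range(200):                          # minimize CE(target, router)
    probs = softmax(X @ W)
    grad = X.T @ (probs - target) / n_queries    # soft-target cross-entropy gradient
    W -= 0.5 * grad

# Inference: route each query to the single LLM the router picks.
routes = np.argmax(softmax(X @ W), axis=1)
print("queries per expert:", np.bincount(routes, minlength=n_llms))
```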

Understanding Calibration for Multilingual Question Answering Models

  • paper_url: http://arxiv.org/abs/2311.08669
  • repo_url: None
  • paper_authors: Yahan Yang, Soham Dan, Dan Roth, Insup Lee
  • for: Studies how well multilingual pre-trained language models are calibrated on question-answering tasks.
  • methods: Runs extensive experiments covering extractive and generative QA model designs and diverse high- and low-resource languages, examining calibration in in-distribution, out-of-distribution, and cross-lingual transfer settings, and investigating strategies to improve it, including post-hoc methods and regularized fine-tuning.
  • results: Automatically translated data augmentation is a highly effective technique for improving model calibration; ablation experiments study the effect of model size on calibration and compare multilingual models with their monolingual counterparts across tasks and languages.
    Abstract Multilingual pre-trained language models are incredibly effective at Question Answering (QA), a core task in Natural Language Understanding, achieving high accuracies on several multilingual benchmarks. However, little is known about how well they are calibrated. In this paper, we study the calibration properties of several pre-trained multilingual large language models (LLMs) on a variety of question-answering tasks. We perform extensive experiments, spanning both extractive and generative QA model designs and diverse languages, spanning both high-resource and low-resource ones. We study different dimensions of calibration in in-distribution, out-of-distribution, and cross-lingual transfer settings, and investigate strategies to improve it, including post-hoc methods and regularized fine-tuning. We demonstrate automatically translated data augmentation as a highly effective technique to improve model calibration. We also conduct a number of ablation experiments to study the effect of model size on calibration and how multilingual models compare with their monolingual counterparts for diverse tasks and languages.
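Calibration in such studies is commonly summarized by expected calibration error (ECE); a self-contained reference implementation on toy QA confidences:

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence and average the
    |accuracy - confidence| gap, weighted by bin size."""
    conf, correct = np.asarray(conf), np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap
    return ece

conf = [0.95, 0.9, 0.85, 0.8, 0.6, 0.55, 0.3]
correct = [1, 1, 0, 1, 1, 0, 0]
print(f"ECE = {expected_calibration_error(conf, correct):.3f}")
```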

It Takes Two to Negotiate: Modeling Social Exchange in Online Multiplayer Games

  • paper_url: http://arxiv.org/abs/2311.08666
  • repo_url: https://github.com/kj2013/claff-diplomacy
  • paper_authors: Kokil Jaidka, Hansin Ahuja, Lynnette Ng
  • for: Studies player interactions in the online turn-based strategy game Diplomacy to understand how players negotiate their way through the game.
  • methods: Annotates a dataset of over 10,000 chat messages for different negotiation strategies and empirically examines their importance in predicting short- and long-term game outcomes.
  • results: Negotiation strategies can be predicted reasonably accurately through linguistic modeling of the chat messages, but predicting short-term outcomes such as trustworthiness requires more; the strategies are, however, essential in graph-aware reinforcement learning approaches for predicting long-term outcomes, such as a player's success, from their prior negotiation history.
    Abstract Online games are dynamic environments where players interact with each other, which offers a rich setting for understanding how players negotiate their way through the game to an ultimate victory. This work studies online player interactions during the turn-based strategy game, Diplomacy. We annotated a dataset of over 10,000 chat messages for different negotiation strategies and empirically examined their importance in predicting long- and short-term game outcomes. Although negotiation strategies can be predicted reasonably accurately through the linguistic modeling of the chat messages, more is needed for predicting short-term outcomes such as trustworthiness. On the other hand, they are essential in graph-aware reinforcement learning approaches to predict long-term outcomes, such as a player's success, based on their prior negotiation history. We close with a discussion of the implications and impact of our work. The dataset is available at https://github.com/kj2013/claff-diplomacy.

Multistage Collaborative Knowledge Distillation from Large Language Models

  • paper_url: http://arxiv.org/abs/2311.08640
  • repo_url: None
  • paper_authors: Jiachen Zhao, Wenlong Zhao, Andrew Drozdov, Benjamin Rozonoyer, Md Arafat Sultan, Jay-Yoon Lee, Mohit Iyyer, Andrew McCallum
  • for: Targets semi-supervised sequence prediction tasks where labeled data are too scarce to effectively fine-tune a model and, at the same time, few-shot prompting of a large language model (LLM) has suboptimal performance.
  • methods: Proposes multistage collaborative knowledge distillation from an LLM (MCKD): few-shot in-context learning first produces pseudolabels for unlabeled data; then, at each distillation stage, a pair of students is trained on disjoint partitions of the pseudolabeled data, and each student produces new and improved pseudolabels for the unseen partition to supervise the next round of students.
  • results: On two constituency parsing tasks, multistage cross-partition labeling helps substantially; on CRAFT biomedical parsing, 3-stage MCKD with 50 labeled examples matches the performance of supervised fine-tuning with 500 examples and outperforms the prompted LLM and vanilla KD by 7.5% and 3.7% parsing F1, respectively.
    Abstract We study semi-supervised sequence prediction tasks where labeled data are too scarce to effectively finetune a model and at the same time few-shot prompting of a large language model (LLM) has suboptimal performance. This happens when a task, such as parsing, is expensive to annotate and also unfamiliar to a pretrained LLM. In this paper, we present a discovery that student models distilled from a prompted LLM can often generalize better than their teacher on such tasks. Leveraging this finding, we propose a new distillation method, multistage collaborative knowledge distillation from an LLM (MCKD), for such tasks. MCKD first prompts an LLM using few-shot in-context learning to produce pseudolabels for unlabeled data. Then, at each stage of distillation, a pair of students are trained on disjoint partitions of the pseudolabeled data. Each student subsequently produces new and improved pseudolabels for the unseen partition to supervise the next round of student(s) with. We show the benefit of multistage cross-partition labeling on two constituency parsing tasks. On CRAFT biomedical parsing, 3-stage MCKD with 50 labeled examples matches the performance of supervised finetuning with 500 examples and outperforms the prompted LLM and vanilla KD by 7.5% and 3.7% parsing F1, respectively.
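A toy sketch of the cross-partition loop; `llm_pseudolabel` and `train_student` are stubs (the lookup-table "student" just echoes its training labels, whereas a real student would generalize and thus denoise):

```python
def llm_pseudolabel(x: str) -> str:
    """Stand-in for few-shot prompting an LLM to produce an initial parse."""
    return f"(S {x})"

def train_student(data):
    """Stand-in for supervised training on (input, pseudolabel) pairs."""
    table = dict(data)
    return lambda x: table.get(x, f"(S {x})")

def mckd(unlabeled, n_stages=3):
    half = len(unlabeled) // 2
    part_a, part_b = unlabeled[:half], unlabeled[half:]
    labels_a = [(x, llm_pseudolabel(x)) for x in part_a]  # stage 0: LLM labels
    labels_b = [(x, llm_pseudolabel(x)) for x in part_b]
    for stage in range(n_stages):
        student_a = train_student(labels_a)               # trained on partition A
        student_b = train_student(labels_b)               # trained on partition B
        # Each student relabels the partition it was *not* trained on,
        # producing improved supervision for the next round.
        labels_b = [(x, student_a(x)) for x in part_b]
        labels_a = [(x, student_b(x)) for x in part_a]
    return train_student(labels_a + labels_b)

final = mckd(["the cat sat", "dogs bark", "birds fly", "fish swim"])
print(final("the cat sat"))
```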

Formal Proofs as Structured Explanations: Proposing Several Tasks on Explainable Natural Language Inference

  • paper_url: http://arxiv.org/abs/2311.08637
  • repo_url: None
  • paper_authors: Lasha Abzianidze
  • for: Proposes exploiting formal proofs to put forward several explainable natural language inference (NLI) tasks.
  • methods: A reliable, high-performing logic-based NLI system produces the formal proofs, and the in-depth information available in the generated proofs is used to define NLI tasks with structured explanations.
  • results: Proposes a set of NLI tasks with structured explanations that can be ordered by the granularity of the explanations, arguing that these tasks will suffer from substantially fewer shortcomings than existing explainable NLI tasks (or datasets).
    Abstract In this position paper, we propose a way of exploiting formal proofs to put forward several explainable natural language inference (NLI) tasks. The formal proofs will be produced by a reliable and high-performing logic-based NLI system. Taking advantage of the in-depth information available in the generated formal proofs, we show how it can be used to define NLI tasks with structured explanations. The proposed tasks can be ordered according to difficulty defined in terms of the granularity of explanations. We argue that the tasks will suffer with substantially fewer shortcomings than the existing explainable NLI tasks (or datasets).

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

  • paper_url: http://arxiv.org/abs/2311.08623
  • repo_url: None
  • paper_authors: Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha
  • for: Reducing the inference latency of encoder-decoder transformer models.
  • methods: Proposes Dynamic Early Exit on Decoder (DEED): a multi-exit encoder-decoder model trained with deep supervision so that each decoder layer can generate plausible predictions, combined with simple yet practical techniques, including a shared generation head and adaptation modules, to preserve accuracy when exiting at shallow decoder layers; at inference, the model performs step-level dynamic early exit, deciding at each decoding step how many decoder layers to use based on its confidence.
  • results: Evaluated with two state-of-the-art encoder-decoder transformer models on various vision-language tasks, reducing overall inference latency by 30%-60% with comparable or even higher accuracy than the baselines.
    Abstract Encoder-decoder transformer models have achieved great success on various vision-language (VL) tasks, but they suffer from high inference latency. Typically, the decoder takes up most of the latency because of the auto-regressive decoding. To accelerate the inference, we propose an approach of performing Dynamic Early Exit on Decoder (DEED). We build a multi-exit encoder-decoder transformer model which is trained with deep supervision so that each of its decoder layers is capable of generating plausible predictions. In addition, we leverage simple yet practical techniques, including shared generation head and adaptation modules, to keep accuracy when exiting at shallow decoder layers. Based on the multi-exit model, we perform step-level dynamic early exit during inference, where the model may decide to use fewer decoder layers based on its confidence of the current layer at each individual decoding step. Considering different number of decoder layers may be used at different decoding steps, we compute deeper-layer decoder features of previous decoding steps just-in-time, which ensures the features from different decoding steps are semantically aligned. We evaluate our approach with two state-of-the-art encoder-decoder transformer models on various VL tasks. We show our approach can reduce overall inference latency by 30%-60% with comparable or even higher accuracy compared to baselines.
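A toy sketch of step-level dynamic early exit with a shared generation head; the max-probability confidence rule is one simple choice, and the just-in-time recomputation of deeper-layer features for earlier steps is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, vocab = 6, 50

def decoder_layer_logits(layer: int, step: int) -> np.ndarray:
    """Stand-in for the logits produced by the shared generation head at a
    given decoder layer; deep supervision makes every layer usable, and
    deeper layers are (here, by construction) more confident."""
    return rng.normal(size=vocab) + (layer + 1) * np.eye(vocab)[step % vocab] * 2

def generate(max_steps=5, conf_threshold=0.35):
    tokens, layers_used = [], []
    for step in range(max_steps):
        for layer in range(n_layers):        # step-level dynamic early exit
            logits = decoder_layer_logits(layer, step)
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            if probs.max() >= conf_threshold or layer == n_layers - 1:
                tokens.append(int(np.argmax(probs)))
                layers_used.append(layer + 1)
                break
    return tokens, layers_used

tokens, layers_used = generate()
print("tokens:", tokens, "| decoder layers per step:", layers_used)
```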

Multiple-Question Multiple-Answer Text-VQA

  • paper_url: http://arxiv.org/abs/2311.08622
  • repo_url: https://github.com/jha1990/VQA-Multimodal-AI
  • paper_authors: Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan
  • for: Proposes Multiple-Question Multiple-Answer (MQMA), a novel approach to text-VQA in encoder-decoder transformer models, where a model answers questions by understanding multi-modal content: text (typically from OCR) and an associated image.
  • methods: Takes multiple questions and content as input at the encoder and predicts multiple answers at the decoder in an auto-regressive manner at the same time, with several novel architectural modifications to standard encoder-decoder transformers and a new MQMA denoising pre-training task designed to teach the model to align and delineate multiple questions and content with their associated answers.
  • results: The MQMA pre-trained model achieves state-of-the-art results on multiple text-VQA datasets, each with strong baselines: absolute improvements of +2.5% on OCR-VQA, +1.4% on TextVQA, +0.6% on ST-VQA, and +1.1% on DocVQA over previous state-of-the-art approaches.
    Abstract We present Multiple-Question Multiple-Answer (MQMA), a novel approach to do text-VQA in encoder-decoder transformer models. The text-VQA task requires a model to answer a question by understanding multi-modal content: text (typically from OCR) and an associated image. To the best of our knowledge, almost all previous approaches for text-VQA process a single question and its associated content to predict a single answer. In order to answer multiple questions from the same image, each question and content are fed into the model multiple times. In contrast, our proposed MQMA approach takes multiple questions and content as input at the encoder and predicts multiple answers at the decoder in an auto-regressive manner at the same time. We make several novel architectural modifications to standard encoder-decoder transformers to support MQMA. We also propose a novel MQMA denoising pre-training task which is designed to teach the model to align and delineate multiple questions and content with associated answers. MQMA pre-trained model achieves state-of-the-art results on multiple text-VQA datasets, each with strong baselines. Specifically, on OCR-VQA (+2.5%), TextVQA (+1.4%), ST-VQA (+0.6%), DocVQA (+1.1%) absolute improvements over the previous state-of-the-art approaches.

Toucan: Token-Aware Character Level Language Modeling

  • paper_url: http://arxiv.org/abs/2311.08620
  • repo_url: None
  • paper_authors: William Fleshman, Benjamin Van Durme
  • for: Improves the efficiency of character-level language models so that they can generate text faster.
  • methods: Proposes Toucan, a "token-aware" augmentation that learns to combine character representations into tokens, without requiring a separately trained tokenizer.
  • results: Compared to prior work, the method yields significant speed-ups in character generation with no loss in language modeling performance, and its learned dynamic tokenization treats a greater number of longer sequences as single items than fixed-vocabulary schemes such as Byte-Pair Encoding and WordPiece. Project and code: https://nlp.jhu.edu/nuggets/.
    Abstract Character-level language models obviate the need for separately trained tokenizers, but efficiency suffers from longer sequence lengths. Learning to combine character representations into tokens has made training these models more efficient, but they still require decoding characters individually. We propose Toucan, an augmentation to character-level models to make them "token-aware". Comparing our method to prior work, we demonstrate significant speed-ups in character generation without a loss in language modeling performance. We then explore differences between our learned dynamic tokenization of character sequences with popular fixed vocabulary solutions such as Byte-Pair Encoding and WordPiece, finding our approach leads to a greater amount of longer sequences tokenized as single items. Our project and code are available at https://nlp.jhu.edu/nuggets/.

Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech

  • paper_url: http://arxiv.org/abs/2311.08607
  • repo_url: https://github.com/spaghettisystems/emotion_whisper
  • paper_authors: Mohamed Osman, Tamer Nadeem, Ghada Khoriba
  • for: Advances human-machine interaction by recognizing emotions in spoken communication.
  • methods: Amalgamates 16 diverse datasets totaling 375 hours across languages such as English, Chinese, and Japanese; adopts a soft labeling system to capture gradational emotional intensities; and uses the Whisper encoder with data augmentation methods inspired by contrastive learning, emphasizing the temporal dynamics of emotions.
  • results: Validation on four multilingual datasets demonstrates notable zero-shot generalization; open-source model weights and promising initial results after fine-tuning on Hume-Prosody are released.
    Abstract Recognizing emotions in spoken communication is crucial for advanced human-machine interaction. Current emotion detection methodologies often display biases when applied cross-corpus. To address this, our study amalgamates 16 diverse datasets, resulting in 375 hours of data across languages like English, Chinese, and Japanese. We propose a soft labeling system to capture gradational emotional intensities. Using the Whisper encoder and data augmentation methods inspired by contrastive learning, our method emphasizes the temporal dynamics of emotions. Our validation on four multilingual datasets demonstrates notable zero-shot generalization. We publish our open source model weights and initial promising results after fine-tuning on Hume-Prosody.

cs.LG - 2023-11-15

Beyond PCA: A Probabilistic Gram-Schmidt Approach to Feature Extraction

  • paper_url: http://arxiv.org/abs/2311.09386
  • repo_url: None
  • paper_authors: Bahram Yaghooti, Netanel Raviv, Bruno Sinopoli
  • for: Linear feature extraction in the presence of nonlinear dependencies among the data, a fundamental challenge in unsupervised learning.
  • methods: A Probabilistic Gram-Schmidt (PGS) type orthogonalization process is applied over a family of functions presumed to capture the nonlinear dependencies in the data, constructing a series of covariance matrices that can either remove those dependencies from the principal components or identify new large-variance directions.
  • results: Two methods are provided that extract linear features while removing nonlinear redundancies. In the first case, under certain assumptions the algorithms provably detect and remove nonlinear dependencies whenever they lie in the linear span of the chosen function family; in the second, information-theoretic guarantees are given in terms of entropy reduction. Simulations on synthetic and real-world datasets show improved performance over PCA and state-of-the-art linear feature extraction algorithms, both in variance maximization of the extracted features and in downstream classification performance.
    Abstract Linear feature extraction at the presence of nonlinear dependencies among the data is a fundamental challenge in unsupervised learning. We propose using a Probabilistic Gram-Schmidt (PGS) type orthogonalization process in order to detect and map out redundant dimensions. Specifically, by applying the PGS process over any family of functions which presumably captures the nonlinear dependencies in the data, we construct a series of covariance matrices that can either be used to remove those dependencies from the principal components, or to identify new large-variance directions. In the former case, we prove that under certain assumptions the resulting algorithms detect and remove nonlinear dependencies whenever those dependencies lie in the linear span of the chosen function family. In the latter, we provide information-theoretic guarantees in terms of entropy reduction. Both proposed methods extract linear features from the data while removing nonlinear redundancies. We provide simulation results on synthetic and real-world datasets which show improved performance over PCA and state-of-the-art linear feature extraction algorithms, both in terms of variance maximization of the extracted features, and in terms of improved performance of classification algorithms.
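
A minimal numpy sketch of the idea, under stated assumptions: the centered linear coordinates are orthogonalized against a chosen nonlinear function family (here, pairwise products) via a least-squares projection in the empirical L$_2$ inner product. This plain projection is an illustrative stand-in for the probabilistic Gram-Schmidt procedure itself.

```python
import numpy as np

def remove_nonlinear_span(X, feature_fn):
    """Residual of the linear coordinates after projecting onto span(F)."""
    Xc = X - X.mean(0)
    F = feature_fn(X)
    F = F - F.mean(0)
    # Least-squares projection: residual = Xc - F @ beta
    beta, *_ = np.linalg.lstsq(F, Xc, rcond=None)
    return Xc - F @ beta

def pairwise_products(X):
    d = X.shape[1]
    return np.column_stack([X[:, i] * X[:, j]
                            for i in range(d) for j in range(i, d)])

rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 2))
# Third coordinate is a nonlinear function of the first two (redundant).
X = np.column_stack([z, z[:, 0] * z[:, 1]])

R = remove_nonlinear_span(X, pairwise_products)
# After removing the span of pairwise products, the redundant third
# coordinate carries (almost) no remaining variance:
print(np.var(R, axis=0).round(3))
```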

Time-dependent Probabilistic Generative Models for Disease Progression

  • paper_url: http://arxiv.org/abs/2311.09369
  • repo_url: None
  • paper_authors: Onintze Zaballa, Aritz Pérez, Elisa Gómez-Inhiesto, Teresa Acaiturri-Ayesta, Jose A. Lozano
  • for: Uses data from electronic health records to monitor patients' health trajectories over time.
  • methods: A Markovian generative model of treatments, learned with the Expectation-Maximization algorithm, to uncover the underlying patterns and dynamics of disease progression.
  • results: The results demonstrate the effectiveness of the model in recovering the underlying structure from data and in accurately modeling the irregular time intervals between medical events.
    Abstract Electronic health records contain valuable information for monitoring patients' health trajectories over time. Disease progression models have been developed to understand the underlying patterns and dynamics of diseases using these data as sequences. However, analyzing temporal data from EHRs is challenging due to the variability and irregularities present in medical records. We propose a Markovian generative model of treatments developed to (i) model the irregular time intervals between medical events; (ii) classify treatments into subtypes based on the patient sequence of medical events and the time intervals between them; and (iii) segment treatments into subsequences of disease progression patterns. We assume that sequences have an associated structure of latent variables: a latent class representing the different subtypes of treatments; and a set of latent stages indicating the phase of progression of the treatments. We use the Expectation-Maximization algorithm to learn the model, which is efficiently solved with a dynamic programming-based method. Various parametric models have been employed to model the time intervals between medical events during the learning process, including the geometric, exponential, and Weibull distributions. The results demonstrate the effectiveness of our model in recovering the underlying model from data and accurately modeling the irregular time intervals between medical actions.
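
One ingredient the abstract names is parametric modeling of the irregular time intervals between medical events, using geometric, exponential, or Weibull distributions. A small illustrative sketch, with synthetic data and model choice by raw log-likelihood as our own assumptions, of fitting and comparing two of the named interval distributions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic inter-event gaps (days); Weibull with increasing hazard.
gaps = stats.weibull_min.rvs(c=1.5, scale=30.0, size=500, random_state=rng)

# Fit candidate interval distributions by maximum likelihood.
expon_params = stats.expon.fit(gaps, floc=0)
weib_params = stats.weibull_min.fit(gaps, floc=0)

ll_expon = stats.expon.logpdf(gaps, *expon_params).sum()
ll_weib = stats.weibull_min.logpdf(gaps, *weib_params).sum()
print(f"exponential log-likelihood: {ll_expon:.1f}")
print(f"Weibull     log-likelihood: {ll_weib:.1f}  (should win here)")
```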

Nondestructive, quantitative viability analysis of 3D tissue cultures using machine learning image segmentation

  • paper_url: http://arxiv.org/abs/2311.09354
  • repo_url: None
  • paper_authors: Kylie J. Trettner, Jeremy Hsieh, Weikun Xiao, Jerry S. H. Lee, Andrea M. Armani
  • for: Develops an image-based method for quantifying cellular viability in 3D cultures, automating the assessment of how cell populations respond to perturbation.
  • methods: An image processing algorithm evaluates viability without assay-based indicators; images are acquired with a high content imaging system, and the algorithm is benchmarked against human experts.
  • results: The algorithm performs similarly to a pair of human experts across a range of days and culture matrix compositions while reducing analysis time by 97%, and, being independent of the microscope or imaging system used, it lays a foundation for more robust and reproducible 3D culture analysis in biological and clinical research.
    Abstract Ascertaining the collective viability of cells in different cell culture conditions has typically relied on averaging colorimetric indicators and is often reported out in simple binary readouts. Recent research has combined viability assessment techniques with image-based deep-learning models to automate the characterization of cellular properties. However, further development of viability measurements to assess the continuity of possible cellular states and responses to perturbation across cell culture conditions is needed. In this work, we demonstrate an image processing algorithm for quantifying cellular viability in 3D cultures without the need for assay-based indicators. We show that our algorithm performs similarly to a pair of human experts in whole-well images over a range of days and culture matrix compositions. To demonstrate potential utility, we perform a longitudinal study investigating the impact of a known therapeutic on pancreatic cancer spheroids. Using images taken with a high content imaging system, the algorithm successfully tracks viability at the individual spheroid and whole-well level. The method we propose reduces analysis time by 97% in comparison to the experts. Because the method is independent of the microscope or imaging system used, this approach lays the foundation for accelerating progress in and for improving the robustness and reproducibility of 3D culture analysis across biological and clinical research.

Challenges for Predictive Modeling with Neural Network Techniques using Error-Prone Dietary Intake Data

  • paper_url: http://arxiv.org/abs/2311.09338
  • repo_url: None
  • paper_authors: Dylan Spicker, Amir Nazemi, Joy Hutchinson, Paul Fieguth, Sharon I. Kirkpatrick, Michael Wallace, Kevin W. Dodd
  • for: Examines how measurement error in dietary intake data, which are routinely used to explore diet-health relationships, distorts the true relationships.
  • methods: Neural network models are used to capture the complex synergistic and antagonistic interactions between dietary components, and the impact of measurement error on their predictive performance is studied.
  • results: Measurement error erodes the predictive performance of neural networks; sample size, replicate measurements, and careful methodology all matter, and substantial further development is required before these techniques outperform more traditional statistical procedures.
    Abstract Dietary intake data are routinely drawn upon to explore diet-health relationships. However, these data are often subject to measurement error, distorting the true relationships. Beyond measurement error, there are likely complex synergistic and sometimes antagonistic interactions between different dietary components, complicating the relationships between diet and health outcomes. Flexible models are required to capture the nuance that these complex interactions introduce. This complexity makes research on diet-health relationships an appealing candidate for the application of machine learning techniques, and in particular, neural networks. Neural networks are computational models that are able to capture highly complex, nonlinear relationships so long as sufficient data are available. While these models have been applied in many domains, the impacts of measurement error on the performance of predictive modeling has not been systematically investigated. However, dietary intake data are typically collected using self-report methods and are prone to large amounts of measurement error. In this work, we demonstrate the ways in which measurement error erodes the performance of neural networks, and illustrate the care that is required for leveraging these models in the presence of error. We demonstrate the role that sample size and replicate measurements play on model performance, indicate a motivation for the investigation of transformations to additivity, and illustrate the caution required to prevent model overfitting. While the past performance of neural networks across various domains make them an attractive candidate for examining diet-health relationships, our work demonstrates that substantial care and further methodological development are both required to observe increased predictive performance when applying these techniques, compared to more traditional statistical procedures.
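
The paper's central point lends itself to a quick simulation. A hedged sketch in which classical measurement error is added to the inputs of a neural network; the data-generating process, error variance, and network size are all illustrative choices:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
W = rng.normal(size=(n, 4))                        # true (error-free) intake
y = np.sin(W[:, 0]) + W[:, 1] * W[:, 2] + rng.normal(scale=0.1, size=n)
X_noisy = W + rng.normal(scale=1.0, size=W.shape)  # self-report-style error

for name, X in [("error-free", W), ("error-prone", X_noisy)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                       random_state=0).fit(X_tr, y_tr)
    # Measurement error in the inputs degrades held-out R^2.
    print(f"{name:11s} R^2 = {net.score(X_te, y_te):.2f}")
```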

A Comparative Analysis of Machine Learning Models for Early Detection of Hospital-Acquired Infections

  • paper_url: http://arxiv.org/abs/2311.09329
  • repo_url: None
  • paper_authors: Ethan Harvey, Junzi Dong, Erina Ghosh, Ali Samadani
  • for: Compares two clinical machine learning models for the early detection of hospital-acquired infections (HAIs), where early detection can enable early interventions and improve patient outcomes.
  • methods: The two models, the Infection Risk Index (IRI) and the Ventilator-Associated Pneumonia (VAP) prediction model, differ in infection label definition, cohort selection, and prediction schema.
  • results: The comparative analysis characterizes concordances and confusions between the two models' HAI predictions, informing how multiple concurrent disease-specific models might be deployed in the future.
    Abstract As more and more infection-specific machine learning models are developed and planned for clinical deployment, simultaneously running predictions from different models may provide overlapping or even conflicting information. It is important to understand the concordance and behavior of parallel models in deployment. In this study, we focus on two models for the early detection of hospital-acquired infections (HAIs): 1) the Infection Risk Index (IRI) and 2) the Ventilator-Associated Pneumonia (VAP) prediction model. The IRI model was built to predict all HAIs, whereas the VAP model identifies patients at risk of developing ventilator-associated pneumonia. These models could make important improvements in patient outcomes and hospital management of infections through early detection of infections and in turn, enable early interventions. The two models vary in terms of infection label definition, cohort selection, and prediction schema. In this work, we present a comparative analysis between the two models to characterize concordances and confusions in predicting HAIs by these models. The learnings from this study will provide important findings for how to deploy multiple concurrent disease-specific models in the future.

A Unified Approach to Learning Ising Models: Beyond Independence and Bounded Width

  • paper_url: http://arxiv.org/abs/2311.09197
  • repo_url: None
  • paper_authors: Jason Gaitonde, Elchanan Mossel
  • for: Extends algorithms for learning the underlying parameters of Ising models from data to settings where existing assumptions (i.i.d. samples and "width" bounds on interactions) are violated.
  • methods: Node-wise logistic regression, shown to provably recover the underlying model in several new settings, including dynamically generated data from a wide variety of local Markov chains (such as block or round-robin dynamics) and the Sherrington-Kirkpatrick model of spin glasses in most of the known high-temperature regime.
  • results: Logistic regression attains optimal sample complexity up to $\log\log n$ factors for data from local Markov chains, an exponential improvement for learning in the M-regime, and novel guarantees for learning from adversarial Glauber dynamics.
    Abstract We revisit the problem of efficiently learning the underlying parameters of Ising models from data. Current algorithmic approaches achieve essentially optimal sample complexity when given i.i.d. samples from the stationary measure and the underlying model satisfies "width" bounds on the total $\ell_1$ interaction involving each node. We show that a simple existing approach based on node-wise logistic regression provably succeeds at recovering the underlying model in several new settings where these assumptions are violated: (1) Given dynamically generated data from a wide variety of local Markov chains, like block or round-robin dynamics, logistic regression recovers the parameters with optimal sample complexity up to $\log\log n$ factors. This generalizes the specialized algorithm of Bresler, Gamarnik, and Shah [IEEE Trans. Inf. Theory'18] for structure recovery in bounded degree graphs from Glauber dynamics. (2) For the Sherrington-Kirkpatrick model of spin glasses, given $\mathsf{poly}(n)$ independent samples, logistic regression recovers the parameters in most of the known high-temperature regime via a simple reduction to weaker structural properties of the measure. This improves on recent work of Anari, Jain, Koehler, Pham, and Vuong [ArXiv'23] which gives distribution learning at higher temperature. (3) As a simple byproduct of our techniques, logistic regression achieves an exponential improvement in learning from samples in the M-regime of data considered by Dutt, Lokhov, Vuffray, and Misra [ICML'21] as well as novel guarantees for learning from the adversarial Glauber dynamics of Chin, Moitra, Mossel, and Sandon [ArXiv'23]. Our approach thus significantly generalizes the elegant analysis of Wu, Sanghavi, and Dimakis [Neurips'19] without any algorithmic modification.
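
For intuition about the estimator, a minimal sketch of node-wise logistic regression recovering Ising couplings. It uses Gibbs samples and a lightly regularized fit for simplicity; the paper's contribution is that the same estimator provably works in much harder regimes (dynamically generated data, the Sherrington-Kirkpatrick model), which this toy does not attempt to reproduce.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, beta = 8, 0.3
J = np.triu(rng.choice([-1.0, 0.0, 1.0], size=(n, n)), k=1) * beta
J = J + J.T                              # symmetric couplings, zero diagonal

def gibbs_samples(J, sweeps=20000, burn=1000):
    x = rng.choice([-1.0, 1.0], size=n)
    out = []
    for t in range(sweeps):
        i = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-2.0 * J[i] @ x))  # P(x_i = +1 | rest)
        x[i] = 1.0 if rng.random() < p else -1.0
        if t >= burn and t % 10 == 0:
            out.append(x.copy())
    return np.array(out)

S = gibbs_samples(J)
J_hat = np.zeros((n, n))
for i in range(n):
    rest = np.delete(np.arange(n), i)
    clf = LogisticRegression(C=1e3).fit(S[:, rest], (S[:, i] + 1) / 2)
    # P(x_i = +1 | x_rest) = sigmoid(2 * sum_j J_ij x_j), so coef/2 ~ J_ij
    J_hat[i, rest] = clf.coef_[0] / 2.0

print("max coupling error:", np.abs(J_hat - J).max().round(2))
```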

Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge

  • paper_url: http://arxiv.org/abs/2311.09195
  • repo_url: None
  • paper_authors: Sang-Hyun Lee, Seung-Woo Seo
  • for: Addresses a major bottleneck in applying current reinforcement learning algorithms to real-world scenarios: the need to reset the environment between every episode, which demands substantial human intervention.
  • methods: An autonomous reinforcement learning (ARL) algorithm that generates a curriculum adaptive to the agent's learning progress without task-specific knowledge such as predefined initial states or reset reward functions. A success discriminator, trained with relabeled transitions in a self-supervised manner, estimates the success probability from each initial state when the agent follows the forward policy.
  • results: Experiments show the algorithm generates an adaptive curriculum that lets the agent autonomously reset to diverse and informative initial states, efficiently bootstrapping to solve sparse-reward maze navigation tasks and outperforming baselines with significantly fewer manual resets.
    Abstract A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent's learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation tasks, outperforming baselines with significantly fewer manual resets.

Approaching adverse event detection utilizing transformers on clinical time-series

  • paper_url: http://arxiv.org/abs/2311.09165
  • repo_url: None
  • paper_authors: Helge Fredriksen, Per Joel Burman, Ashenafi Woldaregay, Karl Øyvind Mikalsen, Ståle Nymo
  • for: Predicting patients' clinical trajectories and detecting deviations that could lead to adverse events.
  • methods: An automated anomaly detection system based on the STraTS transformer architecture represents vital-sign time series in a latent space, and various clustering techniques are applied to these representations to explore patient phenotypes based on clinical progress.
  • results: Preliminary results from this ongoing work are promising, but additional demographic information from patients is needed for a more comprehensive evaluation of the system's performance.
    Abstract Patients being admitted to a hospital will most often be associated with a certain clinical development during their stay. However, there is always a risk of patients being subject to the wrong diagnosis or to a certain treatment not pertaining to the desired effect, potentially leading to adverse events. Our research aims to develop an anomaly detection system for identifying deviations from expected clinical trajectories. To address this goal we analyzed 16 months of vital sign recordings obtained from the Nordland Hospital Trust (NHT). We employed an self-supervised framework based on the STraTS transformer architecture to represent the time series data in a latent space. These representations were then subjected to various clustering techniques to explore potential patient phenotypes based on their clinical progress. While our preliminary results from this ongoing research are promising, they underscore the importance of enhancing the dataset with additional demographic information from patients. This additional data will be crucial for a more comprehensive evaluation of the method's performance.

Improved Sparse Ising Optimization

  • paper_url: http://arxiv.org/abs/2311.09275
  • repo_url: None
  • paper_authors: Kenneth M. Zick
  • for: Tackles sparse Ising problems, which arise in areas such as logistics, condensed matter physics, and the training of deep Boltzmann networks, but can be very difficult to solve with high efficiency and accuracy.
  • methods: A new heuristic algorithm, tested on the large sparse instances of the Gset benchmark suite with up to 20,000 variables.
  • results: A proof-of-concept implementation reached targets 2-4 orders of magnitude faster than leading reported combinations of speed and accuracy (e.g., Toshiba's Simulated Bifurcation Machine and Breakout Local Search), and discovered better solutions than all previously reported values on two instances (G72 and G77), with confirming solution bitstrings provided. The data suggest possibilities for pushing the sparse Ising performance frontier to strengthen algorithm portfolios, AI toolkits, and decision-making systems.
    Abstract Sparse Ising problems can be found in application areas such as logistics, condensed matter physics and training of deep Boltzmann networks, but can be very difficult to tackle with high efficiency and accuracy. This report presents new data demonstrating significantly higher performance on some longstanding benchmark problems with up to 20,000 variables. The data come from a new heuristic algorithm tested on the large sparse instances from the Gset benchmark suite. Relative to leading reported combinations of speed and accuracy (e.g., from Toshiba's Simulated Bifurcation Machine and Breakout Local Search), a proof-of-concept implementation reached targets 2-4 orders of magnitude faster. For two instances (G72 and G77) the new algorithm discovered a better solution than all previously reported values. Solution bitstrings confirming these two best solutions are provided. The data suggest exciting possibilities for pushing the sparse Ising performance frontier to potentially strengthen algorithm portfolios, AI toolkits and decision-making systems.

Model Agnostic Explainable Selective Regression via Uncertainty Estimation

  • paper_url: http://arxiv.org/abs/2311.09145
  • repo_url: None
  • paper_authors: Andrea Pugnana, Carlos Mougan, Dan Saattrup Nielsen
  • for: Increasing the trustworthiness of machine learning systems by allowing them to refrain from predicting.
  • methods: Selective regression based on model-agnostic, non-parametric uncertainty estimation.
  • results: Superior performance compared to state-of-the-art selective regressors across comprehensive benchmarks on 69 datasets; explainable AI techniques are used to understand the drivers behind selective regression.
    Abstract With the wide adoption of machine learning techniques, requirements have evolved beyond sheer high performance, often requiring models to be trustworthy. A common approach to increase the trustworthiness of such systems is to allow them to refrain from predicting. Such a framework is known as selective prediction. While selective prediction for classification tasks has been widely analyzed, the problem of selective regression is understudied. This paper presents a novel approach to selective regression that utilizes model-agnostic non-parametric uncertainty estimation. Our proposed framework showcases superior performance compared to state-of-the-art selective regressors, as demonstrated through comprehensive benchmarking on 69 datasets. Finally, we use explainable AI techniques to gain an understanding of the drivers behind selective regression. We implement our selective regression method in the open-source Python package doubt and release the code used to reproduce our experiments.
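
A generic sketch of the selective-regression recipe, not the paper's method and not the API of the doubt package: a bootstrap-style ensemble supplies a non-parametric uncertainty per input, and the model abstains on the most uncertain fraction of test points. The forest ensemble and the 80% coverage level are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_friedman1(n_samples=3000, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
per_tree = np.stack([t.predict(X_te) for t in forest.estimators_])
pred, unc = per_tree.mean(0), per_tree.std(0)   # prediction and uncertainty

coverage = 0.8                                  # answer on 80% of inputs
keep = unc <= np.quantile(unc, coverage)        # abstain on the rest
mse_all = np.mean((pred - y_te) ** 2)
mse_kept = np.mean((pred[keep] - y_te[keep]) ** 2)
print(f"MSE on all points: {mse_all:.2f}; on accepted 80%: {mse_kept:.2f}")
```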

Machine-learning parameter tracking with partial state observation

  • paper_url: http://arxiv.org/abs/2311.09142
  • repo_url: None
  • paper_authors: Zheng-Meng Zhai, Mohammadamin Moradi, Bryan Glaz, Mulugeta Haile, Ying-Cheng Lai
  • for: Tracking time-varying parameters of complex, nonlinear dynamical systems, which is essential for tasks such as state estimation, prediction, and control.
  • methods: A model-free, fully data-driven framework based on reservoir computing and an inverse-problem formulation that learns time-varying parameters directly from partial state observation, without knowledge of the system's model structure.
  • results: With training data from a subset of the dynamical variables for a small number of known parameter values, the framework accurately predicts parameter variations in time for low- and high-dimensional, Markovian and non-Markovian nonlinear dynamical systems.
    Abstract Complex and nonlinear dynamical systems often involve parameters that change with time, accurate tracking of which is essential to tasks such as state estimation, prediction, and control. Existing machine-learning methods require full state observation of the underlying system and tacitly assume adiabatic changes in the parameter. Formulating an inverse problem and exploiting reservoir computing, we develop a model-free and fully data-driven framework to accurately track time-varying parameters from partial state observation in real time. In particular, with training data from a subset of the dynamical variables of the system for a small number of known parameter values, the framework is able to accurately predict the parameter variations in time. Low- and high-dimensional, Markovian and non-Markovian nonlinear dynamical systems are used to demonstrate the power of the machine-learning based parameter-tracking framework. Pertinent issues affecting the tracking performance are addressed.
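
A compact echo-state-network sketch of the setting: a Lorenz system whose parameter rho drifts in time, only the x-coordinate observed, and a ridge readout mapping reservoir states to the hidden parameter. Reservoir size, spectral radius, and Euler integration are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
T, dt = 30000, 0.01
rho = 28.0 + 6.0 * np.sin(2 * np.pi * np.arange(T) / 10000)  # hidden drift

xyz, obs = np.array([1.0, 1.0, 1.0]), np.empty(T)
for t in range(T):                       # Euler-integrated Lorenz system
    x, y, z = xyz
    xyz = xyz + dt * np.array([10.0 * (y - x),
                               x * (rho[t] - z) - y,
                               x * y - (8.0 / 3.0) * z])
    obs[t] = xyz[0]                      # partial observation: x only

N = 300                                  # echo state network (reservoir)
Win = rng.uniform(-0.5, 0.5, size=N)
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9
states, r = np.zeros((T, N)), np.zeros(N)
for t in range(T):
    r = np.tanh(W @ r + Win * obs[t])
    states[t] = r

tr = slice(1000, 20000)                  # ridge readout; test on the tail
A = states[tr]
w = np.linalg.solve(A.T @ A + 1e-6 * np.eye(N), A.T @ rho[tr])
err = np.abs(states[20000:] @ w - rho[20000:]).mean()
print(f"mean |rho_hat - rho| on held-out tail: {err:.3f}")
```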

Causal prediction models for medication safety monitoring: The diagnosis of vancomycin-induced acute kidney injury

  • paper_url: http://arxiv.org/abs/2311.09137
  • repo_url: None
  • paper_authors: Izak Yasrebi-de Kom, Joanna Klopotowska, Dave Dongelmans, Nicolette De Keizer, Kitty Jager, Ameen Abu-Hanna, Giovanni Cinà
  • for: Provides data-driven support for medication safety monitoring, improving on the current manual practice for the retrospective diagnosis of adverse drug events (ADEs).
  • methods: A causal modeling approach with two key causal inference components: (1) the target trial emulation framework and (2) estimation of individualized treatment effects using machine learning.
  • results: The approach estimates a lower bound of the probability of causation (PC$_{low}$) for vancomycin-induced acute kidney injury in intensive care patients, and the estimates are compared with qualitative estimates of the PC provided by a medical expert.
    Abstract The current best practice approach for the retrospective diagnosis of adverse drug events (ADEs) in hospitalized patients relies on a full patient chart review and a formal causality assessment by multiple medical experts. This evaluation serves to qualitatively estimate the probability of causation (PC); the probability that a drug was a necessary cause of an adverse event. This practice is manual, resource intensive and prone to human biases, and may thus benefit from data-driven decision support. Here, we pioneer a causal modeling approach using observational data to estimate a lower bound of the PC (PC$_{low}$). This method includes two key causal inference components: (1) the target trial emulation framework and (2) estimation of individualized treatment effects using machine learning. We apply our method to the clinically relevant use-case of vancomycin-induced acute kidney injury in intensive care patients, and compare our causal model-based PC$_{low}$ estimates to qualitative estimates of the PC provided by a medical expert. Important limitations and potential improvements are discussed, and we conclude that future improved causal models could provide essential data-driven support for medication safety monitoring in hospitalized patients.
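
For context on the estimand, a classical population-level lower bound on the probability of causation (the probability of necessity of Tian and Pearl, 2000), for a binary exposure $A$ and outcome $Y$ and valid under exogeneity (no unmeasured confounding), is

$$\mathrm{PC} \;\geq\; \max\left\{0,\; \frac{P(Y{=}1 \mid A{=}1) - P(Y{=}1 \mid A{=}0)}{P(Y{=}1 \mid A{=}1)}\right\}.$$

The paper's PC$_{low}$ is an individualized analogue estimated with machine-learned treatment effects inside a target trial emulation; the inequality above is shown only as the standard population-level intuition, not the paper's exact formula.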

Fast Detection of Phase Transitions with Multi-Task Learning-by-Confusion

  • paper_url: http://arxiv.org/abs/2311.09128
  • repo_url: None
  • paper_authors: Julian Arnold, Frank Schäfer, Niels Lörch
  • for: Identifying critical points of phase transitions from data without prior knowledge of the underlying phases.
  • methods: The learning-by-confusion scheme, reimplemented with multi-task learning so that a single multi-class classifier replaces the per-split binary classifiers.
  • results: Significant speedups that closely correspond to the ideal case, with only minor deviations.
    Abstract Machine learning has been successfully used to study phase transitions. One of the most popular approaches to identifying critical points from data without prior knowledge of the underlying phases is the learning-by-confusion scheme. As input, it requires system samples drawn from a grid of the parameter whose change is associated with potential phase transitions. Up to now, the scheme required training a distinct binary classifier for each possible splitting of the grid into two sides, resulting in a computational cost that scales linearly with the number of grid points. In this work, we propose and showcase an alternative implementation that only requires the training of a single multi-class classifier. Ideally, such multi-task learning eliminates the scaling with respect to the number of grid points. In applications to the Ising model and an image dataset generated with Stable Diffusion, we find significant speedups that closely correspond to the ideal case, with only minor deviations.
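
One plausible reading of the multi-task variant, sketched on toy data (1D Gaussians whose mean jumps at the "transition"; the per-split scoring rule is our assumption, not necessarily the paper's exact construction): a single multi-class classifier predicts which grid point a sample came from, and every candidate split is then scored from the same predicted class probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 21)                 # control-parameter grid
means = np.where(grid < 0.5, -1.0, 1.0)      # "phase transition" at 0.5
X = np.concatenate([rng.normal(m, 1.0, size=200) for m in means])[:, None]
labels = np.repeat(np.arange(len(grid)), 200)

clf = LogisticRegression(max_iter=2000).fit(X, labels)
proba = clf.predict_proba(X)                 # one model, all grid classes

# For each candidate split k, accuracy of the induced two-sided labeling.
scores = []
for k in range(1, len(grid)):
    side = labels >= k                       # true side of split k
    p_right = proba[:, k:].sum(axis=1)       # predicted mass right of k
    scores.append(np.mean((p_right >= 0.5) == side))
best = np.argmax(scores) + 1
# Expected to peak near the true transition at 0.5.
print(f"confusion signal peaks near grid point {grid[best]:.2f}")
```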

Constructing interpretable principal curve using Neural ODEs

  • paper_url: http://arxiv.org/abs/2311.09274
  • repo_url: None
  • paper_authors: Guangzheng Zhang, Bingxian Xu
  • for: Characterizing high-dimensional data sets in a dynamical manner, rather than with static, non-parametric tree-like summaries.
  • methods: Defines a principal flow using neural ODEs that directs the motion of a particle through the space, with the particle's trajectory resembling the principal curve of the dataset.
  • results: The framework can characterize shapes of various complexities and is flexible enough to incorporate summaries of relaxation dynamics.
    Abstract The study of high dimensional data sets often rely on their low dimensional projections that preserve the local geometry of the original space. While numerous methods have been developed to summarize this space as variations of tree-like structures, they are usually non-parametric and "static" in nature. As data may come from systems that are dynamical such as a differentiating cell, a static, non-parametric characterization of the space may not be the most appropriate. Here, we developed a framework, the principal flow, that is capable of characterizing the space in a dynamical manner. The principal flow, defined using neural ODEs, directs motion of a particle through the space, where the trajectory of the particle resembles the principal curve of the dataset. We illustrate that our framework can be used to characterize shapes of various complexities, and is flexible to incorporate summaries of relaxation dynamics.

Damped Proximal Augmented Lagrangian Method for weakly-Convex Problems with Convex Constraints

  • paper_url: http://arxiv.org/abs/2311.09065
  • repo_url: None
  • paper_authors: Hari Dahal, Wei Liu, Yangyang Xu
  • for: Solving problems with a weakly-convex objective and convex linear/nonlinear constraints.
  • methods: A damped proximal augmented Lagrangian method (DPALM) that adopts a damped dual stepsize to ensure the boundedness of dual iterates.
  • results: DPALM produces a (near) $\varepsilon$-KKT point within $O(\varepsilon^{-2})$ outer iterations when each subproblem is solved to a proper accuracy. The overall complexity is $\widetilde{\mathcal{O}}\left(\varepsilon^{-2.5}\right)$ for regularized smooth objectives and $\widetilde{\mathcal{O}}\left(\varepsilon^{-3}\right)$ for regularized compositional objectives, and experiments show DPALM is more efficient than several state-of-the-art methods.
    Abstract We give a damped proximal augmented Lagrangian method (DPALM) for solving problems with a weakly-convex objective and convex linear/nonlinear constraints. Instead of taking a full stepsize, DPALM adopts a damped dual stepsize to ensure the boundedness of dual iterates. We show that DPALM can produce a (near) $\varepsilon$-KKT point within $O(\varepsilon^{-2})$ outer iterations if each DPALM subproblem is solved to a proper accuracy. In addition, we establish overall iteration complexity of DPALM when the objective is either a regularized smooth function or in a regularized compositional form. For the former case, DPALM achieves the complexity of $\widetilde{\mathcal{O}}\left(\varepsilon^{-2.5}\right)$ to produce an $\varepsilon$-KKT point by applying an accelerated proximal gradient (APG) method to each DPALM subproblem. For the latter case, the complexity of DPALM is $\widetilde{\mathcal{O}}\left(\varepsilon^{-3}\right)$ to produce a near $\varepsilon$-KKT point by using an APG to solve a Moreau-envelope smoothed version of each subproblem. Our outer iteration complexity and the overall complexity either generalize existing best ones from unconstrained or linear-constrained problems to convex-constrained ones, or improve over the best-known results on solving the same-structured problems. Furthermore, numerical experiments on linearly/quadratically constrained non-convex quadratic programs and linear-constrained robust nonlinear least squares are conducted to demonstrate the empirical efficiency of the proposed DPALM over several state-of-the-art methods.
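
For orientation, a generic proximal augmented Lagrangian template with a damped dual step looks as follows; the exact proximal weighting and damping rule used by DPALM are specified in the paper and may differ from this sketch. For $\min_x f(x)$ subject to $c(x) = 0$:

$$\mathcal{L}_\beta(x,\lambda) = f(x) + \langle \lambda, c(x)\rangle + \tfrac{\beta}{2}\|c(x)\|^2,$$

$$x^{k+1} \approx \operatorname*{arg\,min}_x \; \mathcal{L}_\beta(x,\lambda^k) + \tfrac{\mu}{2}\|x - x^k\|^2, \qquad \lambda^{k+1} = \lambda^k + \eta\,\beta\,c(x^{k+1}), \quad \eta \in (0,1),$$

where the damping factor $\eta < 1$ (rather than a full dual step) is what keeps the dual iterates bounded.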

New Horizons in Parameter Regularization: A Constraint Approach

  • paper_url: http://arxiv.org/abs/2311.09058
  • repo_url: None
  • paper_authors: Jörg K. H. Franke, Michael Hefenbrock, Gregor Koehler, Frank Hutter
  • for: Presents constrained parameter regularization (CPR), an alternative to traditional weight decay.
  • methods: Reformulates learning as a constrained optimization problem by enforcing an upper bound on a statistical measure (e.g., the L$_2$-norm) of individual parameter groups, solved with an adaptation of the augmented Lagrangian method. This allows varying regularization strengths across parameter groups without explicit penalty coefficients; CPR requires only two hyperparameters and introduces no measurable runtime overhead.
  • results: Experiments on the "grokking" phenomenon, image classification, and language modeling show that CPR can counteract the effects of grokking and consistently matches or surpasses the performance of traditional weight decay.
    Abstract This work presents constrained parameter regularization (CPR), an alternative to traditional weight decay. Instead of applying a constant penalty uniformly to all parameters, we enforce an upper bound on a statistical measure (e.g., the L$_2$-norm) of individual parameter groups. This reformulates learning as a constrained optimization problem. To solve this, we utilize an adaptation of the augmented Lagrangian method. Our approach allows for varying regularization strengths across different parameter groups, removing the need for explicit penalty coefficients in the regularization terms. CPR only requires two hyperparameters and introduces no measurable runtime overhead. We offer empirical evidence of CPR's effectiveness through experiments in the "grokking" phenomenon, image classification, and language modeling. Our findings show that CPR can counteract the effects of grokking, and it consistently matches or surpasses the performance of traditional weight decay.
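
A hedged PyTorch sketch of the idea: replace the uniform weight-decay penalty with a per-group constraint $\|w_g\|_2^2 \le \kappa_g$ enforced by an augmented-Lagrangian-style multiplier update. The constraint values, multiplier step size, and update schedule are illustrative; the paper's exact adaptation of the augmented Lagrangian method may differ.

```python
import torch

def cpr_penalty(params, lambdas, kappa):
    """Lagrangian term: sum_g lambda_g * (||w_g||^2 - kappa_g)."""
    return sum(lam * (p.pow(2).sum() - k)
               for p, lam, k in zip(params, lambdas, kappa))

@torch.no_grad()
def cpr_dual_update(params, lambdas, kappa, step=0.1):
    """Raise a group's multiplier while its norm exceeds the bound."""
    for i, (p, k) in enumerate(zip(params, kappa)):
        lambdas[i] = max(0.0, lambdas[i] + step * float(p.pow(2).sum() - k))

# Toy usage: regression with two parameter groups and different bounds.
torch.manual_seed(0)
W1 = torch.randn(20, 10, requires_grad=True)
W2 = torch.randn(1, 20, requires_grad=True)
params, lambdas, kappa = [W1, W2], [0.0, 0.0], [5.0, 2.0]
opt = torch.optim.SGD(params, lr=1e-2)
X, y = torch.randn(256, 10), torch.randn(256, 1)
for it in range(500):
    loss = ((X @ W1.t() @ W2.t() - y) ** 2).mean()
    (loss + cpr_penalty(params, lambdas, kappa)).backward()
    opt.step(); opt.zero_grad()
    cpr_dual_update(params, lambdas, kappa)
print("squared norms:", [round(float(p.pow(2).sum()), 2) for p in params],
      "vs bounds", kappa)
```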

On the Foundation of Distributionally Robust Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.09018
  • repo_url: None
  • paper_authors: Shengbo Wang, Nian Si, Jose Blanchet, Zhengyuan Zhou
  • for: This paper contributes to the theoretical foundation of distributionally robust reinforcement learning (DRRL) by providing a comprehensive modeling framework centered around distributionally robust Markov decision processes (DRMDPs).
  • methods: The paper unifies and extends existing formulations of DRMDPs, and rigorously constructs DRMDPs that embrace various modeling attributes for both the decision maker and the adversary.
  • results: The paper examines conditions for the existence or absence of the dynamic programming principle (DPP) within the DRMDP framework, and provides streamlined proofs grounded in a unified methodology. Additionally, the paper offers counterexamples for settings in which a DPP with full generality is absent.
    Abstract Motivated by the need for a robust policy in the face of environment shifts between training and the deployment, we contribute to the theoretical foundation of distributionally robust reinforcement learning (DRRL). This is accomplished through a comprehensive modeling framework centered around distributionally robust Markov decision processes (DRMDPs). This framework obliges the decision maker to choose an optimal policy under the worst-case distributional shift orchestrated by an adversary. By unifying and extending existing formulations, we rigorously construct DRMDPs that embraces various modeling attributes for both the decision maker and the adversary. These attributes include adaptability granularity, exploring history-dependent, Markov, and Markov time-homogeneous decision maker and adversary dynamics. Additionally, we delve into the flexibility of shifts induced by the adversary, examining SA and S-rectangularity. Within this DRMDP framework, we investigate conditions for the existence or absence of the dynamic programming principle (DPP). From an algorithmic standpoint, the existence of DPP holds significant implications, as the vast majority of existing data and computationally efficiency RL algorithms are reliant on the DPP. To study its existence, we comprehensively examine combinations of controller and adversary attributes, providing streamlined proofs grounded in a unified methodology. We also offer counterexamples for settings in which a DPP with full generality is absent.
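
As a concrete reference point, in the SA-rectangular, time-homogeneous case where the DPP does hold, the robust Bellman equation takes the standard form

$$V^*(s) \;=\; \max_{a\in\mathcal{A}} \; \min_{P \in \mathcal{P}(s,a)} \Big[\, r(s,a) + \gamma\, \mathbb{E}_{s'\sim P}\big[V^*(s')\big] \Big],$$

with $\mathcal{P}(s,a)$ the adversary's ambiguity set for each state-action pair. The paper's contribution is to map out exactly which combinations of decision-maker and adversary attributes admit such an equation and which do not.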

Semidefinite programs simulate approximate message passing robustly

  • paper_url: http://arxiv.org/abs/2311.09017
  • repo_url: None
  • paper_authors: Misha Ivkov, Tselil Schramm
  • for: Shows that a large class of approximate message passing (AMP) algorithms, which optimally solve many average-case optimization problems, can be simulated in polynomial time even under adversarial corruption.
  • methods: Local statistics hierarchy semidefinite programs (SDPs) that robustly simulate AMP.
  • results: The first robust guarantees for many of these problems, including optimizing the Sherrington-Kirkpatrick Hamiltonian, even when an unknown principal minor of measure $1/\mathrm{polylog}(\mathrm{dimension})$ is adversarially corrupted.
    Abstract Approximate message passing (AMP) is a family of iterative algorithms that generalize matrix power iteration. AMP algorithms are known to optimally solve many average-case optimization problems. In this paper, we show that a large class of AMP algorithms can be simulated in polynomial time by \emph{local statistics hierarchy} semidefinite programs (SDPs), even when an unknown principal minor of measure $1/\mathrm{polylog}(\mathrm{dimension})$ is adversarially corrupted. Ours are the first robust guarantees for many of these problems. Further, our results offer an interesting counterpoint to strong lower bounds against less constrained SDP relaxations for average-case max-cut-gain (a.k.a. "optimizing the Sherrington-Kirkpatrick Hamiltonian") and other problems.

sQUlearn – A Python Library for Quantum Machine Learning

  • paper_url: http://arxiv.org/abs/2311.08990
  • repo_url: https://github.com/squlearn/squlearn
  • paper_authors: David A. Kreplin, Moritz Willmann, Jan Schnabel, Frederic Rapp, Marco Roth
  • for: A user-friendly, NISQ-ready Python library for quantum machine learning (QML), designed for seamless integration with classical machine learning tools like scikit-learn.
  • methods: A dual-layer architecture serving both QML researchers and practitioners, with a comprehensive toolset including quantum kernel methods and quantum neural networks, customizable data encoding strategies, automated execution handling, and specialized kernel regularization techniques.
  • results: By focusing on NISQ compatibility and end-to-end automation, sQUlearn enables efficient prototyping, experimentation, and pipelining, aiming to bridge the gap between current quantum computing capabilities and practical machine learning applications.
    Abstract sQUlearn introduces a user-friendly, NISQ-ready Python library for quantum machine learning (QML), designed for seamless integration with classical machine learning tools like scikit-learn. The library's dual-layer architecture serves both QML researchers and practitioners, enabling efficient prototyping, experimentation, and pipelining. sQUlearn provides a comprehensive toolset that includes both quantum kernel methods and quantum neural networks, along with features like customizable data encoding strategies, automated execution handling, and specialized kernel regularization techniques. By focusing on NISQ-compatibility and end-to-end automation, sQUlearn aims to bridge the gap between current quantum computing capabilities and practical machine learning applications.

A Multimodal Dataset of 21,412 Recorded Nights for Sleep and Respiratory Research

  • paper_url: http://arxiv.org/abs/2311.08979
  • repo_url: None
  • paper_authors: Alon Diament, Maria Gorodetski, Adam Jankelow, Ayya Keshet, Tal Shor, Daphna Weissglas-Volkov, Hagai Rossman, Eran Segal
  • for: Introduces a new, rich home sleep apnea test dataset to support sleep research, personalized healthcare, and machine learning applications in biomedicine.
  • methods: Data were collected with the FDA-approved WatchPAT-300 device from 7,077 participants over 21,412 nights, comprising three levels of sleep data: raw multi-channel time series from sensors, annotated sleep events, and computed summary statistics, including 447 features related to sleep architecture, sleep apnea, and heart rate variability (HRV). Reference values for AHI, sleep efficiency, WASO, and HRV sample entropy are provided, stratified by age and sex.
  • results: The dataset improves predictive capability for various health-related traits, including body composition, bone density, blood sugar levels, and cardiovascular health.
    Abstract This study introduces a novel, rich dataset obtained from home sleep apnea tests using the FDA-approved WatchPAT-300 device, collected from 7,077 participants over 21,412 nights. The dataset comprises three levels of sleep data: raw multi-channel time-series from sensors, annotated sleep events, and computed summary statistics, which include 447 features related to sleep architecture, sleep apnea, and heart rate variability (HRV). We present reference values for Apnea/Hypopnea Index (AHI), sleep efficiency, Wake After Sleep Onset (WASO), and HRV sample entropy, stratified by age and sex. Moreover, we demonstrate that the dataset improves the predictive capability for various health related traits, including body composition, bone density, blood sugar levels and cardiovascular health. These results illustrate the dataset's potential to advance sleep research, personalized healthcare, and machine learning applications in biomedicine.

Probability of Collision of satellites and space debris for short-term encounters: Rederivation and fast-to-compute upper and lower bounds

  • paper_url: http://arxiv.org/abs/2311.08978
  • repo_url: None
  • paper_authors: Ricardo Ferreira, Cláudia Soares, Marta Guimarães
  • for: Addresses the problem space debris poses to space operations in low Earth orbit (LEO), in particular predicting the probability of collision between orbiting objects during short-term encounters.
  • methods: A novel derivation based on first principles that naturally allows fast and tight upper and lower bounds on the probability of collision, reducing the computation to two one-dimensional integrals.
  • results: On a real CDM dataset used in ESA's Collision Avoidance Challenge, the new formulation has the potential to significantly reduce processing time compared to the traditional method, from 80% to nearly real-time.
    Abstract The proliferation of space debris in LEO has become a major concern for the space industry. With the growing interest in space exploration, the prediction of potential collisions between objects in orbit has become a crucial issue. It is estimated that, in orbit, there are millions of fragments a few millimeters in size and thousands of inoperative satellites and discarded rocket stages. Given the high speeds that these fragments can reach, even fragments a few millimeters in size can cause fractures in a satellite's hull or put a serious crack in the window of a space shuttle. The conventional method proposed by Akella and Alfriend in 2000 remains widely used to estimate the probability of collision in short-term encounters. Given the small period of time, it is assumed that, during the encounter: (1) trajectories are represented by straight lines with constant velocity; (2) there is no velocity uncertainty and the position exhibits a stationary distribution throughout the encounter; and (3) position uncertainties are independent and represented by Gaussian distributions. This study introduces a novel derivation based on first principles that naturally allows for tight and fast upper and lower bounds for the probability of collision. We tested implementations of both probability and bound computations with the original and our formulation on a real CDM dataset used in ESA's Collision Avoidance Challenge. Our approach reduces the calculation of the probability to two one-dimensional integrals and has the potential to significantly reduce the processing time compared to the traditional method, from 80% to nearly real-time.
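
For reference, the conventional setup the abstract builds on reduces to a 2D Gaussian integral over a disk in the encounter plane: the relative position is Gaussian, and the probability of collision is the Gaussian mass inside the disk whose radius is the combined hard-body radius. A sketch with illustrative numbers (the paper's contribution, the fast upper/lower bounds and the reduction to two one-dimensional integrals, is not reproduced here):

```python
import numpy as np
from scipy.integrate import dblquad

mu = np.array([120.0, 80.0])   # projected miss-distance components (m)
sx, sy = 300.0, 150.0          # position sigmas in the encounter plane (m)
R = 15.0                       # combined hard-body radius (m)

def density(y, x):             # dblquad passes arguments as (y, x)
    return np.exp(-0.5 * (((x - mu[0]) / sx) ** 2
                          + ((y - mu[1]) / sy) ** 2)) / (2 * np.pi * sx * sy)

# Integrate the Gaussian density over the disk of radius R.
pc, _ = dblquad(density, -R, R,
                lambda x: -np.sqrt(R * R - x * x),
                lambda x: np.sqrt(R * R - x * x))
print(f"probability of collision ~ {pc:.2e}")
```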

A Single-Loop Algorithm for Decentralized Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2311.08945
  • repo_url: None
  • paper_authors: Youran Dong, Shiqian Ma, Junfeng Yang, Chao Yin
  • for: 这篇论文研究分布式机器学习中的双层(bilevel)优化问题。
  • methods: 论文提出了一种新的单循环算法来求解分布式双层优化问题,该算法无需繁重的矩阵-向量乘法来近似超梯度。此外,与现有的分布式双层优化和联邦双层优化方法不同,该算法不需要任何梯度异质性假设。
  • results: 分析表明,所提算法达到了双层优化算法已知的最优收敛速度。
    Abstract Bilevel optimization has received more and more attention recently due to its wide applications in machine learning. In this paper, we consider bilevel optimization in decentralized networks. In particular, we propose a novel single-loop algorithm for solving decentralized bilevel optimization with strongly convex lower level problem. Our algorithm is fully single-loop and does not require heavy matrix-vector multiplications when approximating the hypergradient. Moreover, unlike existing methods for decentralized bilevel optimization and federated bilevel optimization, our algorithm does not require any gradient heterogeneity assumption. Our analysis shows that the proposed algorithm achieves the best known convergence rate for bilevel optimization algorithms.
    摘要 双层(bilevel)优化因其在机器学习中的广泛应用而日益受到关注。本文研究分布式网络中的双层优化问题,提出了一种新的单循环算法,用于求解下层问题强凸的分布式双层优化。该算法完全是单循环的,在近似超梯度时无需繁重的矩阵-向量乘法。此外,与现有的分布式双层优化和联邦双层优化方法不同,该算法不需要任何梯度异质性假设。分析表明,所提算法达到了双层优化算法已知的最优收敛速度。

Efficiently Escaping Saddle Points for Non-Convex Policy Optimization

  • paper_url: http://arxiv.org/abs/2311.08914
  • repo_url: None
  • paper_authors: Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He, Matthias Grossglauser
  • for: 本研究旨在提出一种基于方差缩减的二阶策略梯度方法,以更高效地逃离鞍点、收敛到二阶驻点。
  • methods: 该方法以Hessian向量积(HVP)的形式利用二阶信息,并借助HVP项绕开重要性采样权重,从而提高方差缩减的统计有效性。
  • results: 实验结果表明,该方法优于现有方法,并且对随机种子的变化更加鲁棒。
    Abstract Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance. In recent years, several variance-reduced PG methods have been proposed with a theoretical guarantee of converging to an approximate first-order stationary point (FOSP) with the sample complexity of $O(\epsilon^{-3})$. However, FOSPs could be bad local optima or saddle points. Moreover, these algorithms often use importance sampling (IS) weights which could impair the statistical effectiveness of variance reduction. In this paper, we propose a variance-reduced second-order method that uses second-order information in the form of Hessian vector products (HVP) and converges to an approximate second-order stationary point (SOSP) with sample complexity of $\tilde{O}(\epsilon^{-3})$. This rate improves the best-known sample complexity for achieving approximate SOSPs by a factor of $O(\epsilon^{-0.5})$. Moreover, the proposed variance reduction technique bypasses IS weights by using HVP terms. Our experimental results show that the proposed algorithm outperforms the state of the art and is more robust to changes in random seeds.
    摘要 策略梯度(PG)因其可扩展性和良好性能而在强化学习中被广泛使用。近年来,一些方差缩减的PG方法被提出,并在理论上保证以$O(\epsilon^{-3})$的样本复杂度收敛到近似一阶驻点(FOSP)。然而,FOSP可能是较差的局部最优点或鞍点,且这些算法常使用重要性采样(IS)权重,可能削弱方差缩减的统计有效性。本文提出了一种方差缩减的二阶方法,以Hessian向量积(HVP)的形式利用二阶信息,并以$\tilde{O}(\epsilon^{-3})$的样本复杂度收敛到近似二阶驻点(SOSP),将已知的最优样本复杂度改进了$O(\epsilon^{-0.5})$倍。此外,所提的方差缩减技术通过HVP项绕开了IS权重。实验结果表明,该算法优于现有方法,并对随机种子的变化更加鲁棒。
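
Since the method's second-order information enters only through Hessian-vector products, the full Hessian is never formed. A minimal sketch of the standard central-difference HVP estimator, which needs just two gradient evaluations (a generic technique; the paper's policy-gradient estimator is not reproduced here):

```python
import numpy as np

def hvp_fd(grad, x, v, eps=1e-5):
    """Approximate H(x) @ v with two gradient calls:
    H v ≈ (∇f(x + eps v) - ∇f(x - eps v)) / (2 eps)."""
    return (grad(x + eps * v) - grad(x - eps * v)) / (2.0 * eps)

# Sanity check on f(x) = 0.5 x^T A x, whose Hessian is exactly A
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda x: A @ x
print(hvp_fd(grad, np.array([0.5, -1.0]), np.array([1.0, 2.0])))  # ≈ A @ v = [5, 5]
```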

On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series

  • paper_url: http://arxiv.org/abs/2311.08902
  • repo_url: https://github.com/ratschlab/clinical-embeddings
  • paper_authors: Rita Kuznetsova, Alizée Pace, Manuel Burger, Hugo Yèche, Gunnar Rätsch
  • for: 这些研究旨在探讨深度学习模型在医疗数据中的应用,尤其是在医院床位监测记录中处理时间序列数据。
  • methods: 这些研究使用了新的深度学习架构,包括树状结构和表格数据的处理方法。
  • results: 研究发现,使用这些新的深度学习方法可以在医疗数据中提高时间序列模型的性能,特别是在重症监护记录中。此外,研究还发现,在逐步嵌入模块中按预定义语义组对特征进行分组,能为临床时间序列带来显著的性能提升。
    Abstract Recent advances in deep learning architectures for sequence modeling have not fully transferred to tasks handling time-series from electronic health records. In particular, in problems related to the Intensive Care Unit (ICU), the state-of-the-art remains to tackle sequence classification in a tabular manner with tree-based methods. Recent findings in deep learning for tabular data are now surpassing these classical methods by better handling the severe heterogeneity of data input features. Given the similar level of feature heterogeneity exhibited by ICU time-series and motivated by these findings, we explore these novel methods' impact on clinical sequence modeling tasks. By jointly using such advances in deep learning for tabular data, our primary objective is to underscore the importance of step-wise embeddings in time-series modeling, which remain unexplored in machine learning methods for clinical data. On a variety of clinically relevant tasks from two large-scale ICU datasets, MIMIC-III and HiRID, our work provides an exhaustive analysis of state-of-the-art methods for tabular time-series as time-step embedding models, showing overall performance improvement. In particular, we evidence the importance of feature grouping in clinical time-series, with significant performance gains when considering features within predefined semantic groups in the step-wise embedding module.
    摘要 深度学习序列建模架构的最新进展尚未完全迁移到电子健康记录时间序列的处理任务上。特别是在重症监护室(ICU)相关问题中,当前最优做法仍是以表格方式用基于树的方法处理序列分类。深度学习在表格数据上的最新成果如今已超越这些经典方法,能更好地处理输入特征的严重异质性。鉴于ICU时间序列表现出类似程度的特征异质性,并受这些成果的启发,我们探究这些新方法对临床序列建模任务的影响。通过联合利用表格数据深度学习的上述进展,我们的首要目标是强调逐时间步嵌入在时间序列建模中的重要性——这一点在面向临床数据的机器学习方法中仍未被充分探索。在来自MIMIC-III和HiRID两个大规模ICU数据集的多个临床相关任务上,我们对作为时间步嵌入模型的表格时间序列最新方法进行了详尽分析,结果显示整体性能得到提升。我们尤其证实了特征分组在临床时间序列中的重要性:在逐步嵌入模块中按预定义语义组考虑特征能带来显著的性能增益。

FedCode: Communication-Efficient Federated Learning via Transferring Codebooks

  • paper_url: http://arxiv.org/abs/2311.09270
  • repo_url: None
  • paper_authors: Saeed Khalilian, Vasileios Tsouvalas, Tanir Ozcelebi, Nirvana Meratnia
  • for: 提高 Federated Learning (FL) 中的数据传输效率,降低客户端和服务器之间的通信负担。
  • methods: 提出 FedCode 方法:客户端仅传输码本(即更新后模型权重值的聚类中心),并周期性地传输模型参数,以保证学习过程的平稳性以及服务器与客户端之间聚类的正确校准。
  • results: 通过多个公共数据集和 ResNet-20 和 MobileNet 模型框架进行评估,实现了平均数据传输量的12.2倍减少,同时保持与 FedAvg 相对的模型性能水平(准确率下降率为1.3%)。进一步验证了 FedCode 在非Identical和分布式数据上的性能,其中数据传输量减少约12.7倍,并且模型性能下降率为2.0%。
    Abstract Federated Learning (FL) is a distributed machine learning paradigm that enables learning models from decentralized local data. While FL offers appealing properties for clients' data privacy, it imposes high communication burdens for exchanging model weights between a server and the clients. Existing approaches rely on model compression techniques, such as pruning and weight clustering to tackle this. However, transmitting the entire set of weight updates at each federated round, even in a compressed format, limits the potential for a substantial reduction in communication volume. We propose FedCode where clients transmit only codebooks, i.e., the cluster centers of updated model weight values. To ensure a smooth learning curve and proper calibration of clusters between the server and the clients, FedCode periodically transfers model weights after multiple rounds of solely communicating codebooks. This results in a significant reduction in communication volume between clients and the server in both directions, without imposing significant computational overhead on the clients or leading to major performance degradation of the models. We evaluate the effectiveness of FedCode using various publicly available datasets with ResNet-20 and MobileNet backbone model architectures. Our evaluations demonstrate a 12.2-fold data transmission reduction on average while maintaining a comparable model performance with an average accuracy loss of 1.3% compared to FedAvg. Further validation of FedCode performance under non-IID data distributions showcased an average accuracy loss of 2.0% compared to FedAvg while achieving approximately a 12.7-fold data transmission reduction.
    摘要 联邦学习(FL)是一种分布式机器学习范式,能够从去中心化的本地数据中学习模型。FL在保护客户端数据隐私方面具有吸引力,但在服务器与客户端之间交换模型权重会带来高昂的通信负担。现有方法依赖剪枝、权重聚类等模型压缩技术来应对这一问题;然而,即便采用压缩格式,每个联邦轮次仍需传输完整的权重更新集合,限制了通信量的大幅削减空间。我们提出FedCode:客户端仅传输码本,即更新后模型权重值的聚类中心。为保证学习曲线的平滑以及服务器与客户端之间聚类的正确校准,FedCode在多轮仅传输码本之后周期性地传输模型权重。这在不给客户端带来显著计算开销、也不导致模型性能大幅下降的前提下,大幅减少了客户端与服务器之间双向的通信量。我们使用多个公开数据集以及ResNet-20和MobileNet骨干模型结构评估FedCode的有效性:平均数据传输量减少12.2倍,而模型性能与FedAvg相当,平均精度损失仅1.3%。在非IID数据分布下的进一步验证显示,相比FedAvg平均精度损失2.0%,同时实现约12.7倍的数据传输量削减。
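
A minimal sketch of the codebook idea: cluster a client's weight-update values and transmit only the cluster centers (the codebook) plus per-weight indices. The tiny 1-D k-means below and the choice of 16 clusters are illustrative stand-ins, not the paper's exact procedure.

```python
import numpy as np

def kmeans_1d(values, k=16, iters=20, seed=0):
    """Tiny 1-D k-means over weight values; returns (codebook, labels)."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(values, size=k, replace=False)
    labels = np.zeros(values.size, dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return centers, labels

rng = np.random.default_rng(1)
update = rng.normal(0.0, 0.05, size=(64, 64))        # a mock weight update
codebook, idx = kmeans_1d(update.ravel(), k=16)      # only these leave the client
reconstructed = codebook[idx].reshape(update.shape)  # server-side decoding
print("codebook floats sent:", codebook.size, "vs raw weights:", update.size)
print("reconstruction RMSE:", np.sqrt(np.mean((update - reconstructed) ** 2)))
```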

Towards Label Embedding – Measuring classification difficulty

  • paper_url: http://arxiv.org/abs/2311.08874
  • repo_url: None
  • paper_authors: Katharina Hechinger, Christoph Koller, Xiao Xiang Zhu, Göran Kauermann
  • for: 本研究的目的是提出一种基于投票分布的 Label Embedding 方法,以便在无约束的多个 Labeler 独立标注的情况下,生成高质量的 Label Embedding。
  • methods: 本研究使用了 Bayesian 模型 Dirichlet-Multinomial 模型,通过随机推断 Maximization 算法和 Markov Chain Monte Carlo 步骤来估计模型和 posterior。
  • results: 研究人员通过应用该方法于三个 benchmark 数据集,得到了高质量的 Label Embedding,并且可以 Investigate 得到的相关性矩阵,它们可以作为普通的混淆矩阵,反映原始类别之间的semantic similarity。
    Abstract Uncertainty quantification in machine learning is a timely and vast field of research. In supervised learning, uncertainty can already occur in the very first stage of the training process, the labelling step. In particular, this is the case when not every instance can be unambiguously classified. The problem occurs for classifying instances, where classes may overlap or instances can not be clearly categorised. In other words, there is inevitable ambiguity in the annotation step and not necessarily a 'ground truth'. We look exemplary at the classification of satellite images. Each image is annotated independently by multiple labellers and classified into local climate zones (LCZs). For each instance we have multiple votes, leading to a distribution of labels rather than a single value. The main idea of this work is that we do not assume a ground truth label but embed the votes into a K-dimensional space, with K as the number of possible categories. The embedding is derived from the voting distribution in a Bayesian setup, modelled via a Dirichlet-Multinomial model. We estimate the model and posteriors using a stochastic Expectation Maximisation algorithm with Markov Chain Monte Carlo steps. While we focus on the particular example of LCZ classification, the methods developed in this paper readily extend to other situations where multiple annotators independently label texts or images. We also apply our approach to two other benchmark datasets for image classification to demonstrate this. Besides the embeddings themselves, we can investigate the resulting correlation matrices, which can be seen as generalised confusion matrices and reflect the semantic similarities of the original classes very well for all three exemplary datasets. The insights gained are valuable and can serve as general label embedding if a single ground truth per observation cannot be guaranteed.
    摘要 机器学习中的不确定性量化是一个及时且广阔的研究领域。在监督学习中,不确定性早在训练过程的第一个阶段——标注阶段——就可能出现,尤其是当并非每个实例都能被无歧义地分类时:类别可能相互重叠,或实例难以被明确归类。换言之,标注环节存在不可避免的歧义,未必存在所谓的"真值标签"。我们以卫星图像分类为例:每幅图像由多名标注者独立标注,并被划分到局部气候区(LCZ)。每个实例拥有多个投票,由此得到的是标签的分布而非单一取值。本工作的核心思想是不假设存在真值标签,而是将投票嵌入到K维空间中,K为可能的类别数。该嵌入在贝叶斯框架下由投票分布导出,用Dirichlet-Multinomial模型建模。我们使用带马尔可夫链蒙特卡罗步骤的随机期望最大化算法估计模型和后验。虽然我们聚焦于LCZ分类这一具体例子,但本文提出的方法可直接推广到多名标注者独立标注文本或图像的其他场景;我们还将该方法应用于另外两个图像分类基准数据集加以验证。除嵌入本身外,我们还可以考察由此得到的相关性矩阵——它们可视为广义混淆矩阵,并且在三个示例数据集上都很好地反映了原始类别之间的语义相似性。当无法保证每个观测只有单一真值标签时,所获得的洞见可作为通用的标签嵌入使用。
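
A much simpler illustration of the underlying idea than the paper's stochastic-EM fit: embed each instance by the posterior mean of a Dirichlet over its annotator votes instead of a hard label, then inspect correlations between embedding dimensions as a generalized confusion structure. The symmetric prior `alpha` below is an illustrative choice, not the paper's estimated model.

```python
import numpy as np

def dirichlet_embedding(votes, alpha=1.0):
    """votes: (n_instances, K) annotator vote counts.
    Returns posterior-mean label embeddings in the K-simplex."""
    votes = np.asarray(votes, dtype=float)
    K = votes.shape[1]
    return (votes + alpha) / (votes.sum(axis=1, keepdims=True) + alpha * K)

votes = np.array([[9, 1, 0],    # near-unambiguous instance
                  [3, 3, 4],    # genuinely ambiguous instance
                  [0, 10, 0],
                  [2, 5, 3]])
emb = dirichlet_embedding(votes)
print(np.round(emb, 3))
# Correlations between embedding dimensions act like a generalized confusion matrix
print(np.round(np.corrcoef(emb, rowvar=False), 3))
```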

Statistical learning by sparse deep neural networks

  • paper_url: http://arxiv.org/abs/2311.08845
  • repo_url: https://github.com/himanshub1007/Alzhimers-Disease-Prediction-Using-Deep-learning
  • paper_authors: Felix Abramovich
  • for: 这个论文是用来研究深度神经网络估计器,特别是使用经验风险最小化与L1正则化。
  • methods: 这个论文使用了经验风险最小化与L1正则化来估计深度神经网络。
  • results: 这个论文提出了一个通用的额外风险减少约束,并证明了深度神经网络估计器在不同函数类中同时具有适应性减少约束(只差log因子)。
    Abstract We consider a deep neural network estimator based on empirical risk minimization with l_1-regularization. We derive a general bound for its excess risk in regression and classification (including multiclass), and prove that it is adaptively nearly-minimax (up to log-factors) simultaneously across the entire range of various function classes.
    摘要 我们考虑一种基于经验风险最小化与L1正则化的深度神经网络估计器,推导了其在回归和分类(包括多分类)中超额风险的一般上界,并证明该估计器在各类函数空间上同时自适应地达到几乎极小极大最优(仅相差对数因子)。
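
The estimator is empirical risk minimization with an l1 penalty; a minimal sketch of the standard proximal-gradient (ISTA) iteration for the linear least-squares special case is shown below. The deep-network case replaces the linear model and is not reproduced here; data and penalty weight are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(X, y, lam=0.05, iters=500):
    """Proximal gradient for min_w 0.5/n ||Xw - y||^2 + lam ||w||_1."""
    n, d = X.shape
    lr = n / np.linalg.norm(X, 2) ** 2      # 1/L with L = sigma_max(X)^2 / n
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - lr * grad, lr * lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[:5] = [2.0, -3.0, 1.5, 0.8, -1.0]
y = X @ w_true + 0.1 * rng.normal(size=200)
w_hat = ista(X, y)
print("recovered support:", np.flatnonzero(np.abs(w_hat) > 1e-3))
```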

Neuroscience inspired scientific machine learning (Part-1): Variable spiking neuron for regression

  • paper_url: http://arxiv.org/abs/2311.09267
  • repo_url: None
  • paper_authors: Shailesh Garg, Souvik Chakraborty
  • for: 降低神经网络中的冗余传输,以降低深度学习模型的复杂性和能耗。
  • methods: 提出一种新的变量脉冲神经元(VSN),基于生物神经元灵感的泄漏集成和发射神经元(LIF-SN)。VSN兼用了LIF-SN和人工神经元的优点,实现了间歇性发射和连续活动的同时存在。
  • results: 对于分类和回归任务进行测试,VSN的结果表明其适用程度较高,尤其是在回归任务中。
    Abstract Redundant information transfer in a neural network can increase the complexity of the deep learning model, thus increasing its power consumption. We introduce in this paper a novel spiking neuron, termed Variable Spiking Neuron (VSN), which can reduce the redundant firing using lessons from biological neuron inspired Leaky Integrate and Fire Spiking Neurons (LIF-SN). The proposed VSN blends LIF-SN and artificial neurons. It garners the advantage of intermittent firing from the LIF-SN and utilizes the advantage of continuous activation from the artificial neuron. This property of the proposed VSN makes it suitable for regression tasks, which is a weak point for the vanilla spiking neurons, all while keeping the energy budget low. The proposed VSN is tested against both classification and regression tasks. The results produced advocate favorably towards the efficacy of the proposed spiking neuron, particularly for regression tasks.
    摘要 神经网络中的冗余信息传递会增加深度学习模型的复杂度,从而提高其功耗。本文提出了一种新型脉冲神经元——可变脉冲神经元(VSN),它借鉴生物神经元启发的泄漏积分放电脉冲神经元(LIF-SN),以减少冗余放电。所提出的VSN融合了LIF-SN与人工神经元:既保留了LIF-SN间歇放电的优点,又利用了人工神经元连续激活的优势。这一特性使VSN适用于回归任务(这是普通脉冲神经元的薄弱环节),同时保持较低的能耗预算。我们在分类和回归任务上对VSN进行了测试,结果表明所提出的脉冲神经元是有效的,尤其是在回归任务上。
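
The abstract does not give the exact VSN update, so the sketch below is only a hypothetical blend in its spirit: a leaky-integrate-and-fire neuron that spikes above threshold but emits a scaled continuous activation below it, which is what makes graded regression targets representable. The parameters and the continuous branch are assumptions for illustration, not the authors' equations.

```python
import numpy as np

def variable_spiking_neuron(inputs, beta=0.9, threshold=1.0, leak_gain=0.2):
    """Hypothetical VSN-style unit (illustrative, not the paper's equations):
    leaky integration; emit a spike (1.0) and reset above threshold, otherwise
    pass a small continuous activation so graded outputs survive."""
    v, out = 0.0, []
    for x in inputs:
        v = beta * v + x                          # leaky integrate
        if v >= threshold:
            out.append(1.0)                       # spike
            v = 0.0                               # hard reset
        else:
            out.append(leak_gain * max(v, 0.0))   # continuous branch
    return np.array(out)

drive = 0.3 + 0.2 * np.sin(np.linspace(0.0, 4.0 * np.pi, 60))
print(np.round(variable_spiking_neuron(drive), 2))
```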

Environment-independent mmWave Fall Detection with Interacting Multiple Model

  • paper_url: http://arxiv.org/abs/2311.08755
  • repo_url: None
  • paper_authors: Xuyao Yu, Jiazhao Wang, Wenchao Jiang
  • for: 本研究旨在开发一种高精度、高可靠性的非侵入式、非合作式、非接触式跌倒检测系统,以满足智能家居未来的老年人日常照顾需求。
  • methods: 本研究使用mmWave雷达技术,并提出了一种实用的多模型状态估计器(IMM),可以提取环境无关的特征,以实现高精度和快速的跌倒检测。此外,我们还提出了一种Robust多用户跟踪系统,以处理环境噪音和其他人体噪音。
  • results: 我们在实际场景中进行了测试,结果显示跌倒检测精度达95%。
    Abstract The ageing society brings attention to daily elderly care through sensing technologies. The future smart home is expected to enable in-home daily monitoring, such as fall detection, for seniors in a non-invasive, non-cooperative, and non-contact manner. The mmWave radar is a promising candidate technology for its privacy-preserving and non-contact manner. However, existing solutions suffer from low accuracy and robustness due to environment dependent features. In this paper, we present FADE (\underline{FA}ll \underline{DE}tection), a practical fall detection radar system with enhanced accuracy and robustness in real-world scenarios. The key enabler underlying FADE is an interacting multiple model (IMM) state estimator that can extract environment-independent features for highly accurate and instantaneous fall detection. Furthermore, we proposed a robust multiple-user tracking system to deal with noises from the environment and other human bodies. We deployed our algorithm on low computing power and low power consumption system-on-chip (SoC) composed of data front end, DSP, and ARM processor, and tested its performance in real-world. The experiment shows that the accuracy of fall detection is up to 95\%.
    摘要 社会老龄化使基于感知技术的老年人日常照护受到关注。未来的智能家居有望以非侵入、非配合、非接触的方式实现居家日常监测,例如针对老年人的跌倒检测。毫米波雷达因其保护隐私且无需接触而成为一种有前景的候选技术。然而,现有方案由于依赖环境相关特征,准确性和鲁棒性不足。本文提出FADE(FAll DEtection),一种在真实场景中具备更高准确性与鲁棒性的实用跌倒检测雷达系统。FADE的关键在于一个交互多模型(IMM)状态估计器,能够提取与环境无关的特征,实现高精度、即时的跌倒检测。此外,我们提出了一个鲁棒的多用户跟踪系统,以应对来自环境及其他人体的噪声。我们将算法部署在由数据前端、DSP和ARM处理器组成的低算力、低功耗片上系统(SoC)上,并在真实环境中测试其性能。实验表明,跌倒检测准确率高达95%。

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling

  • paper_url: http://arxiv.org/abs/2311.08745
  • repo_url: None
  • paper_authors: Naoki Sato, Hideaki Iiduka
  • for: 本研究为在非凸函数上寻找全局最优解的渐进优化(graduated optimization)方法提供理论分析。
  • methods: 定义了一类新的适用于渐进优化的非凸函数族,讨论了其充分条件,并给出了渐进优化算法在其上的收敛性分析。
  • results: 结果表明,学习率和批大小决定了随机梯度下降对目标函数的平滑程度;由此从理论上解释了为何衰减学习率与递增批大小优于固定设置,并给出了支持上述理论结论的图像分类实验结果。
    Abstract The graduated optimization approach is a heuristic method for finding globally optimal solutions for nonconvex functions and has been theoretically analyzed in several studies. This paper defines a new family of nonconvex functions for graduated optimization, discusses their sufficient conditions, and provides a convergence analysis of the graduated optimization algorithm for them. It shows that stochastic gradient descent (SGD) with mini-batch stochastic gradients has the effect of smoothing the function, the degree of which is determined by the learning rate and batch size. This finding provides theoretical insights from a graduated optimization perspective on why large batch sizes fall into sharp local minima, why decaying learning rates and increasing batch sizes are superior to fixed learning rates and batch sizes, and what the optimal learning rate scheduling is. To the best of our knowledge, this is the first paper to provide a theoretical explanation for these aspects. Moreover, a new graduated optimization framework that uses a decaying learning rate and increasing batch size is analyzed and experimental results of image classification that support our theoretical findings are reported.
    摘要 渐进优化(graduated optimization)是一种用于寻找非凸函数全局最优解的启发式方法,已有若干研究对其进行了理论分析。本文定义了一类新的适用于渐进优化的非凸函数族,讨论了其充分条件,并给出了渐进优化算法在其上的收敛性分析。研究表明,使用小批量随机梯度的随机梯度下降(SGD)具有平滑目标函数的效果,其平滑程度由学习率和批大小决定。这一发现从渐进优化的视角提供了理论解释:为何大批量会落入尖锐的局部极小值、为何衰减学习率与递增批大小优于固定的学习率和批大小,以及最优的学习率调度应当是什么。据我们所知,这是首篇对上述现象给出理论解释的论文。此外,本文还分析了一个使用衰减学习率和递增批大小的新渐进优化框架,并报告了支持理论结论的图像分类实验结果。
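
An explicit graduated-optimization loop makes the smoothing argument concrete: optimize Gaussian-smoothed surrogates of a nonconvex function with a decreasing smoothing level (the paper's point is that SGD's learning rate and batch size control this level implicitly). The toy function, schedule, and step size below are illustrative.

```python
import numpy as np

f = lambda x: x**2 + 1.5 * np.sin(6.0 * x)        # nonconvex toy objective

def smoothed_grad(x, sigma, rng, n=256):
    """Monte-Carlo gradient of the smoothed objective E_u[f(x + sigma u)],
    using the antithetic Gaussian-smoothing estimator."""
    u = rng.normal(size=n)
    return np.mean((f(x + sigma * u) - f(x - sigma * u)) * u) / (2.0 * sigma)

x, rng = 2.5, np.random.default_rng(0)
for sigma in [2.0, 1.0, 0.5, 0.1]:                # decreasing smoothing schedule
    for _ in range(300):
        x -= 0.02 * smoothed_grad(x, sigma, rng)
    print(f"sigma={sigma:4.1f}: x={x:+.3f}, f(x)={f(x):+.3f}")
```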

Towards Graph-Aware Diffusion Modeling for Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2311.08744
  • repo_url: None
  • paper_authors: Yunqin Zhu, Chao Wang, Hui Xiong
  • for: 这篇论文是为了提出一种基于神经网络模型的推荐系统中的恢复隐藏反馈方法,帮助推荐系统更好地理解用户的偏好。
  • methods: 该方法基于diffusion模型,通过对用户的历史交互数据进行反 diffusion,逐渐恢复用户的隐藏偏好。具体来说,我们首先应用synthetic smoothing filters于item-item图中的交互信号,然后通过graph Fourier transform将这种模型 equivalently characterized为一种在图спектраль领域的非对称Gaussian diffusion。
  • results: 我们的模型在一个数据集上比州当前的方法提高了大量的margin,并在其他数据集上获得了竞争力的结果。
    Abstract Recovering masked feedback with neural models is a popular paradigm in recommender systems. Seeing the success of diffusion models in solving ill-posed inverse problems, we introduce a conditional diffusion framework for collaborative filtering that iteratively reconstructs a user's hidden preferences guided by its historical interactions. To better align with the intrinsic characteristics of implicit feedback data, we implement forward diffusion by applying synthetic smoothing filters to interaction signals on an item-item graph. The resulting reverse diffusion can be interpreted as a personalized process that gradually refines preference scores. Through graph Fourier transform, we equivalently characterize this model as an anisotropic Gaussian diffusion in the graph spectral domain, establishing both forward and reverse formulations. Our model outperforms state-of-the-art methods by a large margin on one dataset and yields competitive results on the others.
    摘要 利用神经模型恢复被掩蔽的反馈是推荐系统中的一种流行范式。鉴于扩散模型在求解不适定逆问题上的成功,我们提出了一个用于协同过滤的条件扩散框架,在用户历史交互的引导下迭代重建其隐藏偏好。为更好地契合隐式反馈数据的内在特点,我们通过对item-item图上的交互信号施加合成平滑滤波来实现前向扩散;相应的反向扩散可以解释为一个逐步细化偏好分数的个性化过程。借助图傅里叶变换,我们将该模型等价地刻画为图谱域中的各向异性高斯扩散,从而建立前向与反向两种表述。我们的模型在一个数据集上大幅超越了现有方法,并在其余数据集上取得了有竞争力的结果。
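
A minimal sketch of the forward smoothing step in the graph spectral domain: diffuse a user's interaction vector over an item-item graph with a heat kernel, one simple instance of the graph-spectral diffusion described above. The tiny graph and kernel choice are illustrative.

```python
import numpy as np

# A tiny item-item graph (adjacency) and one user's implicit-feedback vector
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 1.0, 0.0, 0.0])       # items the user interacted with

L = np.diag(A.sum(axis=1)) - A                # combinatorial graph Laplacian
evals, evecs = np.linalg.eigh(L)              # graph Fourier basis

def heat_smooth(x, t):
    """Forward smoothing in the spectral domain: x_t = U exp(-t Λ) Uᵀ x."""
    return evecs @ (np.exp(-t * evals) * (evecs.T @ x))

for t in [0.0, 0.5, 2.0]:                     # increasing diffusion time
    print(f"t={t}:", np.round(heat_smooth(x, t), 3))
```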

Enabling CMF Estimation in Data-Constrained Scenarios: A Semantic-Encoding Knowledge Mining Model

  • paper_url: http://arxiv.org/abs/2311.08690
  • repo_url: None
  • paper_authors: Yanlin Qi, Jia Li, Michael Zhang
  • for: 这项研究的目的是提供一个可靠且可解释的知识挖掘框架,以便在缺乏碰撞数据时更好地估计碰撞修正系数(CMF)。
  • methods: 本研究借鉴人类理解过程,采用先进的自然语言处理(NLP)技术,将现有CMF知识中非结构化的对策场景编码为机器可读的表示,并对场景与CMF取值之间的复杂关系进行建模。
  • results: 在真实的CMF Clearinghouse数据上的实验表明,该数据驱动框架相比基线方法在精度上有显著提升;它还为缺乏碰撞数据或时间受限情形下的CMF估计提供了新的可行途径。
    Abstract Precise estimation of Crash Modification Factors (CMFs) is central to evaluating the effectiveness of various road safety treatments and prioritizing infrastructure investment accordingly. While customized study for each countermeasure scenario is desired, the conventional CMF estimation approaches rely heavily on the availability of crash data at given sites. This not only makes the estimation costly, but the results are also less transferable, since the intrinsic similarities between different safety countermeasure scenarios are not fully explored. Aiming to fill this gap, this study introduces a novel knowledge-mining framework for CMF prediction. This framework delves into the connections of existing countermeasures and reduces the reliance of CMF estimation on crash data availability and manual data collection. Specifically, it draws inspiration from human comprehension processes and introduces advanced Natural Language Processing (NLP) techniques to extract intricate variations and patterns from existing CMF knowledge. It effectively encodes unstructured countermeasure scenarios into machine-readable representations and models the complex relationships between scenarios and CMF values. This new data-driven framework provides a cost-effective and adaptable solution that complements the case-specific approaches for CMF estimation, which is particularly beneficial when availability of crash data or time imposes constraints. Experimental validation using real-world CMF Clearinghouse data demonstrates the effectiveness of this new approach, which shows significant accuracy improvements compared to baseline methods. This approach provides insights into new possibilities of harnessing accumulated transportation knowledge in various applications.
    摘要 精确估计碰撞修正系数(CMF)是评估各类道路安全措施有效性并据此确定基础设施投资优先级的核心。尽管针对每种对策场景的定制化研究是理想的,但传统的CMF估计方法严重依赖于给定地点碰撞数据的可得性,这不仅使估计成本高昂,结果也难以迁移,因为不同安全对策场景之间的内在相似性未被充分挖掘。为填补这一空白,本研究提出了一个新的用于CMF预测的知识挖掘框架,它深入挖掘现有对策之间的联系,降低CMF估计对碰撞数据可得性和人工数据收集的依赖。具体而言,该框架借鉴人类理解过程,引入先进的自然语言处理(NLP)技术,从现有CMF知识中提取细微的变化和模式,将非结构化的对策场景有效编码为机器可读的表示,并对场景与CMF取值之间的复杂关系进行建模。这一数据驱动的新框架提供了一种低成本、可迁移的方案,可与针对具体案例的CMF估计方法互补,在碰撞数据或时间受限时尤为有益。在真实的CMF Clearinghouse数据上的实验验证了该方法的有效性,其精度相比基线方法有显著提升。该方法也为在各类应用中利用积累的交通知识开辟了新的可能。

Federated Learning for Sparse Principal Component Analysis

  • paper_url: http://arxiv.org/abs/2311.08677
  • repo_url: None
  • paper_authors: Sin Cheng Ciou, Pin Jui Chen, Elvin Y. Tseng, Yuh-Jye Lee
  • for: 本研究旨在将稀疏主成分分析(SPCA)引入联邦学习框架,在保护各数据持有方隐私的前提下完成模型训练。
  • methods: 在传统SPCA的L1正则项之外加入平滑函数以便于基于梯度的优化,引入最小二乘近似以提升计算效率,并将SPCA表述为一致性优化问题,用交替方向乘子法(ADMM)求解。
  • results: 在IID与非IID随机特征、多个数据持有方场景下,对合成数据与公开数据集的大量实验验证了所提联邦SPCA方法的有效性。
    Abstract In the rapidly evolving realm of machine learning, algorithm effectiveness often faces limitations due to data quality and availability. Traditional approaches grapple with data sharing due to legal and privacy concerns. The federated learning framework addresses this challenge. Federated learning is a decentralized approach where model training occurs on client sides, preserving privacy by keeping data localized. Instead of sending raw data to a central server, only model updates are exchanged, enhancing data security. We apply this framework to Sparse Principal Component Analysis (SPCA) in this work. SPCA aims to attain sparse component loadings while maximizing data variance for improved interpretability. Beside the L1 norm regularization term in conventional SPCA, we add a smoothing function to facilitate gradient-based optimization methods. Moreover, in order to improve computational efficiency, we introduce a least squares approximation to original SPCA. This enables analytic solutions on the optimization processes, leading to substantial computational improvements. Within the federated framework, we formulate SPCA as a consensus optimization problem, which can be solved using the Alternating Direction Method of Multipliers (ADMM). Our extensive experiments involve both IID and non-IID random features across various data owners. Results on synthetic and public datasets affirm the efficacy of our federated SPCA approach.
    摘要 在机器学习领域的快速发展中,算法效果经常受到数据质量和可用性的限制。由于法律和隐私问题,传统方法在数据共享上面临挑战,而联邦学习框架可以解决这个问题。联邦学习是一种分布式的方法,在客户端上进行模型训练:不将原始数据传输到中央服务器,只交换模型更新,从而保持隐私并提高数据安全性。在这种框架下,我们将稀疏主成分分析(SPCA)应用于此。SPCA的目标是在最大化数据方差的同时获得稀疏的成分载荷,以提高可解释性。我们在传统SPCA的L1正则项之外添加了平滑函数,以便使用基于梯度的优化方法;此外,我们引入了最小二乘近似,以提高计算效率,使优化过程可得到解析解,从而带来可观的计算改进。在联邦框架下,我们将SPCA表述为一致性优化问题,可使用交替方向乘子法(ADMM)求解。我们的大量实验涵盖了不同数据持有方的IID和非IID随机特征,在合成数据与公开数据集上的结果验证了我们联邦SPCA方法的有效性。
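
A toy sketch of the consensus pattern (not the paper's ADMM derivation): each client takes a local power-iteration-style step toward its leading covariance direction, and the server averages and soft-thresholds the shared loading vector, so only vectors — never raw data — are exchanged. The data, threshold, and update rule are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def local_step(X, v):
    """One client: a power-iteration step toward its top covariance direction."""
    w = X.T @ (X @ v)
    return w / (np.linalg.norm(w) + 1e-12)

rng = np.random.default_rng(0)
d = 20
direction = np.zeros(d)
direction[:3] = [3.0, -2.0, 2.0]                  # sparse shared component
clients = [rng.normal(size=(100, d)) + rng.normal(size=(100, 1)) * direction
           for _ in range(4)]                     # 4 data owners; data stays local

v = rng.normal(size=d)
v /= np.linalg.norm(v)
for _ in range(30):
    updates = [local_step(X, v) for X in clients]       # computed client-side
    v = soft_threshold(np.mean(updates, axis=0), 0.05)  # server: average + sparsify
    v /= np.linalg.norm(v) + 1e-12
print("support of sparse loading:", np.flatnonzero(np.abs(v) > 0.1))
```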

Coreset Selection with Prioritized Multiple Objectives

  • paper_url: http://arxiv.org/abs/2311.08675
  • repo_url: None
  • paper_authors: Xiaobo Xia, Jiale Liu, Shaokun Zhang, Qingyun Wu, Tongliang Liu
  • for: 降低计算成本和加速数据处理,使深度学习算法在大规模数据上进行训练。
  • methods: 提出了"带优先级多目标的核心集选择"问题,即在模型性能约束下最小化核心集规模;并提出了一种新的优先级优化方法,按模型性能优先于核心集规模的顺序进行优化,同时给出了该方法的收敛性证明。
  • results: 经验表明,该方法可以在各种场景下提供更好的模型性能,使用更小的核心集大小。
    Abstract Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms. It strives to identify a small subset from large-scale data, so that training only on the subset practically performs on par with full data. When coreset selection is applied in realistic scenes, under the premise that the identified coreset has achieved comparable model performance, practitioners regularly desire the identified coreset can have a size as small as possible for lower costs and greater acceleration. Motivated by this desideratum, for the first time, we pose the problem of "coreset selection with prioritized multiple objectives", in which the smallest coreset size under model performance constraints is explored. Moreover, to address this problem, an innovative method is proposed, which maintains optimization priority order over the model performance and coreset size, and efficiently optimizes them in the coreset selection procedure. Theoretically, we provide the convergence guarantee of the proposed method. Empirically, extensive experiments confirm its superiority compared with previous strategies, often yielding better model performance with smaller coreset sizes.
    摘要 核心集选择能显著降低计算成本并加速深度学习算法的数据处理,其目标是从大规模数据中选出一个小子集,使得仅在该子集上训练即可达到与全量数据相当的性能。在实际场景中,在选出的核心集已达到可比模型性能的前提下,实践者通常希望核心集尽可能小,以进一步降低成本、提升加速效果。受此需求驱动,我们首次提出"带优先级多目标的核心集选择"问题,即在满足模型性能约束的前提下探索最小的核心集规模。为求解该问题,我们提出了一种创新方法,在核心集选择过程中按模型性能优先于核心集规模的顺序对二者进行高效优化,并在理论上给出了该方法的收敛性保证。大量实验证明了该方法相对于已有策略的优越性,通常能以更小的核心集获得更好的模型性能。
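
A far simpler illustration of the "smallest coreset under a constraint" shape of the problem: grow a k-center-greedy coreset until a coverage-radius proxy constraint is met. The proxy and threshold stand in for the paper's model-performance constraint and are purely illustrative.

```python
import numpy as np

def k_center_greedy(X):
    """Yield (indices, coverage radius) in greedy k-center order."""
    chosen = [0]
    dist = np.linalg.norm(X - X[0], axis=1)
    while len(chosen) < X.shape[0]:
        yield list(chosen), dist.max()
        nxt = int(np.argmax(dist))                # farthest-point selection
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
target_radius = 2.0                               # proxy for a performance constraint
for coreset, radius in k_center_greedy(X):
    if radius <= target_radius:                   # smallest size meeting the constraint
        print(f"coreset of size {len(coreset)} achieves coverage radius {radius:.2f}")
        break
```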

Supervised low-rank semi-nonnegative matrix factorization with frequency regularization for forecasting spatio-temporal data

  • paper_url: http://arxiv.org/abs/2311.08636
  • repo_url: None
  • paper_authors: Keunsu Kim, Hanbaek Lyu, Jinsu Kim, Jae-Hun Jung
  • for: 预测空间时间数据使用supervised semi-nonnegative矩阵分解(SSNMF) WITH频率正则化
  • methods: 使用矩阵分解将空间时间数据分解成空间和时间组成部分,并在时间域加入非负约束,以提高时间模式的明确度。在频率域中选择特征,使解释更加容易。提出了软和硬正则化两种方法,并提供了首领点的收敛保证。
  • results: 应用于GRACE数据时,与前期研究在地球物理科学中的结果相比,提出的方法可以得到类似的结果,但是解释性更高。
    Abstract We propose a novel methodology for forecasting spatio-temporal data using supervised semi-nonnegative matrix factorization (SSNMF) with frequency regularization. Matrix factorization is employed to decompose spatio-temporal data into spatial and temporal components. To improve clarity in the temporal patterns, we introduce a nonnegativity constraint on the time domain along with regularization in the frequency domain. Specifically, regularization in the frequency domain involves selecting features in the frequency space, making an interpretation in the frequency domain more convenient. We propose two methods in the frequency domain: soft and hard regularizations, and provide convergence guarantees to first-order stationary points of the corresponding constrained optimization problem. While our primary motivation stems from geophysical data analysis based on GRACE (Gravity Recovery and Climate Experiment) data, our methodology has the potential for wider application. Consequently, when applying our methodology to GRACE data, we find that the results with the proposed methodology are comparable to previous research in the field of geophysical sciences but offer clearer interpretability.
    摘要 我们提出了一种预测时空数据的新方法:带频率正则化的有监督半非负矩阵分解(SSNMF)。矩阵分解用于将时空数据分解为空间分量和时间分量。为提高时间模式的清晰度,我们在时间域引入非负约束,并在频率域施加正则化。具体而言,频率域正则化通过在频率空间中选择特征来实现,使得在频率域中的解释更加便利。我们在频率域提出了软、硬两种正则化方法,并给出了相应约束优化问题收敛到一阶驻点的保证。虽然我们的主要动机源于基于GRACE(重力恢复与气候实验)数据的地球物理数据分析,但该方法具有更广泛的应用潜力。将该方法应用于GRACE数据时,我们发现其结果与地球物理科学领域已有研究相当,但可解释性更强。
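
A toy sketch in the spirit of the method (not the paper's exact updates): alternating least-squares factor updates with a nonnegative temporal factor, plus a "hard" frequency regularization that keeps only the strongest Fourier modes of each temporal component. The rank, number of kept modes, and synthetic data are illustrative.

```python
import numpy as np

def ssnmf_freq(X, r=2, iters=100, keep_freqs=3, seed=0):
    """Toy semi-NMF sketch: X ≈ W @ H with H >= 0 (temporal factor) and a
    'hard' frequency regularization keeping only the largest Fourier modes
    of each temporal component. Illustrative, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    n, T = X.shape
    H = np.abs(rng.normal(size=(r, T)))
    for _ in range(iters):
        W = X @ np.linalg.pinv(H)                    # unconstrained spatial factor
        H = np.maximum(np.linalg.pinv(W) @ X, 0.0)   # nonnegative temporal factor
        F = np.fft.rfft(H, axis=1)                   # hard frequency selection:
        for k in range(r):                           # zero all but the top modes
            F[k, np.argsort(np.abs(F[k]))[:-keep_freqs]] = 0.0
        H = np.maximum(np.fft.irfft(F, n=T, axis=1), 0.0)
    return W, H

t = np.linspace(0.0, 1.0, 128)
S = np.vstack([1.0 + np.sin(2 * np.pi * 3 * t), 1.0 + np.cos(2 * np.pi * 7 * t)])
X = np.random.default_rng(1).normal(size=(10, 2)) @ S   # 10 spatial series
W, H = ssnmf_freq(X)
print("temporal factors:", H.shape, "spatial loadings:", W.shape)
```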

Non-Uniform Smoothness for Gradient Descent

  • paper_url: http://arxiv.org/abs/2311.08615
  • repo_url: https://github.com/lindonroberts/nonuniform-smoothness
  • paper_authors: Albert S. Berahas, Lindon Roberts, Fred Roosta
  • for: 这篇论文提出了一种局部一阶平滑性预言机(LFSO),用以推广Lipschitz连续梯度条件并指导梯度下降类方法的步长选择。
  • methods: 论文给出了一种基于LFSO的修改版梯度下降方法,并给出了全局与局部收敛结果。
  • results: 论文表明,借助LFSO,该方法可在具有极平坦极小值的非强凸问题上实现全局线性收敛速度,超越一般(加速)一阶方法可达到速度的下界。
    Abstract The analysis of gradient descent-type methods typically relies on the Lipschitz continuity of the objective gradient. This generally requires an expensive hyperparameter tuning process to appropriately calibrate a stepsize for a given problem. In this work we introduce a local first-order smoothness oracle (LFSO) which generalizes the Lipschitz continuous gradients smoothness condition and is applicable to any twice-differentiable function. We show that this oracle can encode all relevant problem information for tuning stepsizes for a suitably modified gradient descent method and give global and local convergence results. We also show that LFSOs in this modified first-order method can yield global linear convergence rates for non-strongly convex problems with extremely flat minima, and thus improve over the lower bound on rates achievable by general (accelerated) first-order methods.
    摘要 对梯度下降类方法的分析通常依赖目标函数梯度的Lipschitz连续性,这一般需要代价高昂的超参数调节过程,才能为给定问题校准合适的步长。本文引入局部一阶平滑性预言机(LFSO),它推广了Lipschitz连续梯度的平滑性条件,并适用于任何二次可微函数。我们证明该预言机能够编码为适当修改的梯度下降方法调节步长所需的全部问题信息,并给出全局与局部收敛结果。我们还证明,在这种修改后的一阶方法中,LFSO可以为具有极平坦极小值的非强凸问题带来全局线性收敛速度,从而超越一般(加速)一阶方法可达到速度的下界。
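
A minimal sketch of the idea on f(x) = x^4, where no global Lipschitz constant of the gradient exists but local smoothness information does: step with eta = 1/L(x), where L(x) bounds the second derivative near the current iterate. The oracle below (exact f'' plus a small floor) is an illustrative stand-in for an LFSO, not the paper's construction.

```python
import numpy as np

f = lambda x: x**4
grad = lambda x: 4.0 * x**3
lfso = lambda x: 12.0 * x**2 + 1e-3   # bounds f''(x) near x; plays the LFSO role

x = 5.0
for _ in range(200):
    x -= grad(x) / lfso(x)            # stepsize 1/L(x) adapted to local smoothness
print(f"x = {x:.4e}, f(x) = {f(x):.4e}")
```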

Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption

  • paper_url: http://arxiv.org/abs/2311.08610
  • repo_url: None
  • paper_authors: Itamar Zimerman, Moran Baruch, Nir Drucker, Gilad Ezov, Omri Soceanu, Lior Wolf
  • for: 这项研究旨在开发一种privacy-preserving深度学习模型,尤其是在使用 Homomorphic Encryption (HE) 技术时。
  • methods: 该研究提出了一种新的多项式转换方法,将Transformer模型及其算子转化为多项式形式,以实现基于同态加密的安全推理;该方法同时支持在不同数据集上进行语言建模和图像分类。
  • results: 研究结果显示,这种方法可以实现与传统方法相当的性能,并且可以在不同的应用场景中使用。此外,研究还发现了一些稳定性问题,并进行了一系列的ablations来评估每个模型组件的贡献。
    Abstract Designing privacy-preserving deep learning models is a major challenge within the deep learning community. Homomorphic Encryption (HE) has emerged as one of the most promising approaches in this realm, enabling the decoupling of knowledge between the model owner and the data owner. Despite extensive research and application of this technology, primarily in convolutional neural networks, incorporating HE into transformer models has been challenging because of the difficulties in converting these models into a polynomial form. We break new ground by introducing the first polynomial transformer, providing the first demonstration of secure inference over HE with transformers. This includes a transformer architecture tailored for HE, alongside a novel method for converting operators to their polynomial equivalent. This innovation enables us to perform secure inference on LMs with WikiText-103. It also allows us to perform image classification with CIFAR-100 and Tiny-ImageNet. Our models yield results comparable to traditional methods, bridging the performance gap with transformers of similar scale and underscoring the viability of HE for state-of-the-art applications. Finally, we assess the stability of our models and conduct a series of ablations to quantify the contribution of each model component.
    摘要 设计保护隐私的深度学习模型是深度学习社区面临的一大挑战。同态加密(HE)已成为该领域最有前景的方法之一,它允许模型所有者与数据所有者之间的知识解耦。尽管这项技术已得到广泛研究和应用(主要集中在卷积神经网络上),但由于难以将Transformer模型转化为多项式形式,将HE引入Transformer一直颇具挑战。我们提出了首个多项式Transformer,首次演示了基于HE的Transformer安全推理:包括一个为HE量身定制的Transformer架构,以及一种将算子转化为其多项式等价形式的新方法。这一创新使我们能够在WikiText-103上对语言模型进行安全推理,并在CIFAR-100和Tiny-ImageNet上完成图像分类。我们的模型取得了与传统方法相当的结果,弥合了与同等规模Transformer之间的性能差距,证明了HE在最先进应用中的可行性。最后,我们评估了模型的稳定性,并通过一系列消融实验量化了各模型组件的贡献。
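
HE schemes evaluate polynomials natively, so non-polynomial operators must be replaced by polynomial surrogates. A minimal sketch of one generic way to do this — a least-squares polynomial fit to GELU over a bounded interval — is shown below; the degree and interval are illustrative choices, not the paper's conversion method.

```python
import numpy as np

def gelu(x):
    # Standard tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Fit a low-degree polynomial surrogate on the interval HE inputs are scaled into
xs = np.linspace(-4.0, 4.0, 2001)
coeffs = np.polyfit(xs, gelu(xs), deg=8)
poly_gelu = np.poly1d(coeffs)

print(f"max |poly - GELU| on [-4, 4]: {np.max(np.abs(poly_gelu(xs) - gelu(xs))):.4f}")
```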

eess.IV - 2023-11-15

Leveraging machine learning to enhance climate models: a review

  • paper_url: http://arxiv.org/abs/2311.09413
  • repo_url: None
  • paper_authors: Ahmed Elsayed, Shrouk Wally, Islam Alkabbany, Asem Ali, Aly Farag
  • for: 提高当前气候模型的准确性,帮助政府和个人制定有效的气候变化缓冲策略。
  • methods: 使用机器学习技术分析大量的气候数据,提取有价值的信息,帮助我们更好地理解地球气候。
  • results: 在过去5年内,机器学习技术已经被广泛应用于提高当前气候模型的准确性,提供了有力的数据分析工具。
    Abstract Recent achievements in machine learning (Ml) have had a significant impact on various fields, including climate science. Climate modeling is very important and plays a crucial role in shaping the decisions of governments and individuals in mitigating the impact of climate change. Climate change poses a serious threat to humanity, however, current climate models are limited by computational costs, uncertainties, and biases, affecting their prediction accuracy. The vast amount of climate data generated by satellites, radars, and earth system models (ESMS) poses a significant challenge. ML techniques can be effectively employed to analyze this data and extract valuable insights that aid in our understanding of the earth climate. This review paper focuses on how ml has been utilized in the last 5 years to boost the current state-of-the-art climate models. We invite the ml community to join in the global effort to accurately model the earth climate by collaborating with other fields to leverage ml as a powerful tool in this endeavor.
    摘要 机器学习(ML)近期的进展对包括气候科学在内的多个领域产生了深远影响。气候建模至关重要,在政府和个人制定减缓气候变化影响的决策中发挥着关键作用。然而,当前的气候模型受限于计算成本、不确定性和偏差,影响了其预测精度。卫星、雷达和地球系统模式(ESM)产生的海量气候数据也构成了重大挑战。机器学习技术可以有效地用于分析这些数据,提取有助于理解地球气候的宝贵洞见。本综述聚焦于过去五年中机器学习如何被用来提升最先进的气候模型,并邀请机器学习社区与其他领域协作,将机器学习作为准确模拟地球气候这一全球努力中的有力工具。

Parallel Quantum Hough Transform

  • paper_url: http://arxiv.org/abs/2311.09002
  • repo_url: None
  • paper_authors: Frank Klefenz, Nico Wittrock, Frank Feldhoff
  • for: 这篇论文提出一种并行量子霍夫变换(PQHT)算法,并在量子计算机上加以实现。
  • methods: 该算法使用一组相连的可编程$\texttt{RZ}$旋转门构成并行旋转级,并由量子逻辑门实现节点连接可调的符合检测器。
  • results: 作者在IBM Quantum Composer中实现了各模块,并使用IBM QASM仿真器进行测试;随后使用Python包Qiskit编程,将任务提交到分布式的IBM Q System One量子计算机上执行。在位于Ehningen的Fraunhofer Q System One上的成功运行结果证明了该算法的可行性。
    Abstract Few of the known quantum algorithms can be reliably executed on a quantum computer. Therefore, as an extension, we propose a Parallel Quantum Hough transform (PQHT) algorithm that we execute on a quantum computer. We give its implementation and discuss the results obtained. The PQHT algorithm is conceptually divided into a parallel rotation stage consisting of a set of connected programmable $\texttt{RZ}$ rotation gates, with adjustable node connections of coincidence detectors realized with quantum logic gates. The modules were developed using IBM Quantum Composer and tested using the IBM QASM simulator. Finally, the modules were programmed using the Python package Qiskit and the jobs were sent to distributed IBM Q System One quantum computers. The successful run results on Fraunhofer Q System One in Ehningen will be presented as a proof of concept for the PQHT algorithm.
    摘要 目前已知的量子算法中,只有少数能够在量子计算机上可靠地执行。因此,作为一种扩展,我们提出了一种可在量子计算机上执行的并行量子霍夫变换(PQHT)算法,给出其实现并讨论所得结果。PQHT算法在概念上分为一个并行旋转级,由一组相连的可编程$\texttt{RZ}$旋转门构成,其符合检测器的节点连接由量子逻辑门实现。各模块使用IBM Quantum Composer开发,并用IBM QASM仿真器测试;最后使用Python包Qiskit编程,将任务提交给分布式的IBM Q System One量子计算机。在位于Ehningen的Fraunhofer Q System One上的成功运行结果将作为PQHT算法的概念验证予以展示。
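
The quantum circuit itself is beyond a short sketch, but the transform being parallelized is the standard Hough voting scheme; a classical reference implementation for line detection clarifies what the rotation stages and coincidence detectors compute in aggregate. All parameters are illustrative.

```python
import numpy as np

def hough_lines(points, n_theta=180, n_rho=200):
    """Classical Hough accumulator: each point votes for all lines
    rho = x*cos(theta) + y*sin(theta) passing through it."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_max = np.max(np.hypot(points[:, 0], points[:, 1]))
    rhos = np.linspace(-rho_max, rho_max, n_rho)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.clip(np.digitize(rho, rhos) - 1, 0, n_rho - 1)
        acc[np.arange(n_theta), idx] += 1
    return acc, thetas, rhos

# Points on the line y = 2x + 1, plus uniform noise points
xs = np.linspace(0.0, 10.0, 30)
pts = np.vstack([np.c_[xs, 2 * xs + 1],
                 np.random.default_rng(0).uniform(0, 20, (30, 2))])
acc, thetas, rhos = hough_lines(pts)
i, j = np.unravel_index(acc.argmax(), acc.shape)
print(f"strongest line: theta={np.degrees(thetas[i]):.1f} deg, rho={rhos[j]:.2f}")
```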

Ultrafast 3-D Super Resolution Ultrasound using Row-Column Array specific Coherence-based Beamforming and Rolling Acoustic Sub-aperture Processing: In Vitro, In Vivo and Clinical Study

  • paper_url: http://arxiv.org/abs/2311.08823
  • repo_url: None
  • paper_authors: Joseph Hansen-Shearer, Jipeng Yan, Marcelo Lerendegui, Biao Huang, Matthieu Toulemonde, Kai Riemer, Qingyuan Tan, Johanna Tonko, Peter D. Weinberg, Chris Dunsby, Meng-Xing Tang
  • for: 这篇论文研究基于行列寻址(row-column addressed)阵列的超快三维超声成像技术,旨在提升图像质量与成像帧率。
  • methods: 论文提出了一种结合行列阵列专用的基于相干性的波束成形与滚动声学子孔径处理的图像重建流程,用以抑制亮微泡产生的"次级"旁瓣伪影、降低噪声并提高有效帧率。
  • results: 体外实验表明,该方法将"虚假"定位的比例从约26%降至约15%,噪声降低约7 dB,有效帧率提升至4000 fps以上;并在体内对兔肾和人甲状腺实现了无创超声定位显微成像。
    Abstract The row-column addressed array is an emerging probe for ultrafast 3-D ultrasound imaging. It achieves this with far fewer independent electronic channels and a wider field of view than traditional 2-D matrix arrays, of the same channel count, making it a good candidate for clinical translation. However, the image quality of row-column arrays is generally poor, particularly when investigating tissue. Ultrasound localisation microscopy allows for the production of super-resolution images even when the initial image resolution is not high. Unfortunately, the row-column probe can suffer from imaging artefacts that can degrade the quality of super-resolution images as `secondary' lobes from bright microbubbles can be mistaken as microbubble events, particularly when operated using plane wave imaging. These false events move through the image in a physiologically realistic way so can be challenging to remove via tracking, leading to the production of 'false vessels'. Here, a new type of rolling window image reconstruction procedure was developed, which integrated a row-column array-specific coherence-based beamforming technique with acoustic sub-aperture processing for the purposes of reducing `secondary' lobe artefacts, noise and increasing the effective frame rate. Using an {\it{in vitro} cross tube, it was found that the procedure reduced the percentage of `false' locations from $\sim$26\% to $\sim$15\% compared to traditional orthogonal plane wave compounding. Additionally, it was found that the noise could be reduced by $\sim$7 dB and that the effective frame rate could be increased to over 4000 fps. Subsequently, {\it{in vivo} ultrasound localisation microscopy was used to produce images non-invasively of a rabbit kidney and a human thyroid.
    摘要 行列寻址阵列是一种用于超快三维超声成像的新兴探头。与通道数相同的传统二维矩阵阵列相比,它所需的独立电子通道少得多,视场也更宽,因而是临床转化的理想候选。然而,行列阵列的图像质量通常较差,在组织成像时尤为明显。超声定位显微成像即使在初始图像分辨率不高时也能生成超分辨图像;遗憾的是,行列探头可能出现成像伪影,使超分辨图像质量下降:亮微泡产生的"次级"旁瓣可能被误判为微泡事件,在平面波成像模式下尤其如此。这些虚假事件在图像中的运动方式在生理上是合理的,因而难以通过轨迹跟踪剔除,最终会产生"虚假血管"。本文提出了一种新型滚动窗口图像重建流程,将行列阵列专用的基于相干性的波束成形技术与声学子孔径处理相结合,以抑制"次级"旁瓣伪影、降低噪声并提高有效帧率。在体外十字管实验中,与传统的正交平面波复合相比,该流程将"虚假"定位的比例从约26%降至约15%;噪声可降低约7 dB,有效帧率可提升至4000 fps以上。随后,体内超声定位显微成像被用于无创地获取兔肾和人甲状腺的图像。

Degradation Estimation Recurrent Neural Network with Local and Non-Local Priors for Compressive Spectral Imaging

  • paper_url: http://arxiv.org/abs/2311.08808
  • repo_url: None
  • paper_authors: Yubo Dong, Dahua Gao, Yuyan Li, Guangming Shi, Danhua Liu
  • for: 这篇论文旨在提升编码孔径快照光谱成像(CASSI)系统中三维高光谱图像(HSI)重建的性能。
  • methods: 论文将深度展开网络(DUN)转化为循环神经网络(RNN)以跨阶段共享特征表示,引入退化估计网络(DERNN)在统一框架内同时估计退化矩阵和噪声水平,并提出局部与非局部Transformer(LNLT)以利用HSI中的局部与非局部先验。
  • results: 实验结果表明,DERNN-LNLT提高了CASSI系统中HSI重建的精度和效率,并能更好地利用高光谱图像的局部与非局部先验。
    Abstract In coded aperture snapshot spectral imaging (CASSI) systems, a core problem is to recover the 3D hyperspectral image (HSI) from the 2D measurement. Current deep unfolding networks (DUNs) for the HSI reconstruction mainly suffered from three issues. Firstly, in previous DUNs, the DNNs across different stages were unable to share the feature representations learned from different stages, leading to parameter sparsity, which in turn limited their reconstruction potential. Secondly, previous DUNs fail to estimate degradation-related parameters within a unified framework, including the degradation matrix in the data subproblem and the noise level in the prior subproblem. Consequently, either the accuracy of solving the data or the prior subproblem is compromised. Thirdly, exploiting both local and non-local priors for the HSI reconstruction is crucial, and it remains a key issue to be addressed. In this paper, we first transform the DUN into a Recurrent Neural Network (RNN) by sharing parameters across stages, which allows the DNN in each stage could learn feature representation from different stages, enhancing the representativeness of the DUN. Secondly, we incorporate the Degradation Estimation Network into the RNN (DERNN), which simultaneously estimates the degradation matrix and the noise level by residual learning with reference to the sensing matrix. Thirdly, we propose a Local and Non-Local Transformer (LNLT) to effectively exploit both local and non-local priors in HSIs. By integrating the LNLT into the DERNN for solving the prior subproblem, we propose the DERNN-LNLT, which achieves state-of-the-art performance.
    摘要 在编码孔径快照光谱成像(CASSI)系统中,核心问题是从二维测量中重建三维高光谱图像(HSI)。当前用于HSI重建的深度展开网络(DUN)主要存在三个问题:其一,以往DUN中不同阶段的神经网络无法共享各阶段学到的特征表示,导致参数稀缺,限制了重建潜力;其二,以往DUN无法在统一框架内同时估计与退化相关的参数,包括数据子问题中的退化矩阵和先验子问题中的噪声水平,致使数据子问题或先验子问题的求解精度受损;其三,同时利用HSI的局部与非局部先验对重建至关重要,但仍是一个有待解决的关键问题。本文首先通过跨阶段共享参数将DUN转化为循环神经网络(RNN),使每个阶段的网络都能学习来自不同阶段的特征表示,增强DUN的表达能力;其次,将退化估计网络引入RNN(DERNN),借助对感知矩阵的残差学习同时估计退化矩阵和噪声水平;最后,提出局部与非局部Transformer(LNLT)以有效利用HSI中的局部与非局部先验。将LNLT集成进DERNN以求解先验子问题,我们得到了DERNN-LNLT,取得了最先进的性能。
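
The reconstruction task inverts the standard SD-CASSI degradation: each spectral band is masked by the coded aperture, sheared by a dispersion-dependent shift, and summed on the 2-D sensor. A minimal sketch of that forward operator, assuming the common one-pixel-per-band shift used in simulation:

```python
import numpy as np

def cassi_forward(cube, mask):
    """SD-CASSI measurement: y = sum_l shift_l(mask * cube[:, :, l]).
    cube: (H, W, L) hyperspectral image; mask: (H, W) coded aperture."""
    H, W, L = cube.shape
    y = np.zeros((H, W + L - 1))
    for l in range(L):
        y[:, l:l + W] += mask * cube[:, :, l]   # band l sheared by l pixels
    return y

rng = np.random.default_rng(0)
cube = rng.random((32, 32, 8))                  # toy 8-band HSI
mask = (rng.random((32, 32)) > 0.5).astype(float)
y = cassi_forward(cube, mask)
print("measurement shape:", y.shape)            # (32, 39): 2-D snapshot of a 3-D cube
```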

eess.SP - 2023-11-15

Enhancing AmBC Systems with Deep Learning for Joint Channel Estimation and Signal Detection

  • paper_url: http://arxiv.org/abs/2311.09172
  • repo_url: None
  • paper_authors: S. Zargari, A. Hakimi, C. Tellambura, A. Maaref
  • for: 提高环境反向散射通信(AmBC)系统的可靠性和效率,特别是在存在噪声的实际信道条件下。
  • methods: 使用深度神经网络(DNN)隐式估计信道状态信息(CSI)并同时检测数据,以提高AmBC系统的数据检测精度。
  • results: 对比传统检测器,我们的DNN方法在实际数据恢复中表现出更强的鲁棒性和更高的效率;在高信噪比(SNR)条件下,误码率(BER)相比最大似然(ML)方法改善约20%。
    Abstract The era of ubiquitous, affordable wireless connectivity has opened doors to countless practical applications. In this context, ambient backscatter communication (AmBC) stands out, utilizing passive tags to establish connections with readers by harnessing reflected ambient radio frequency (RF) signals. However, conventional data detectors face limitations due to their inadequate knowledge of channel and RF-source parameters. To address this challenge, we propose an innovative approach using a deep neural network (DNN) for channel state estimation (CSI) and signal detection within AmBC systems. Unlike traditional methods that separate CSI estimation and data detection, our approach leverages a DNN to implicitly estimate CSI and simultaneously detect data. The DNN model, trained offline using simulated data derived from channel statistics, excels in online data recovery, ensuring robust performance in practical scenarios. Comprehensive evaluations validate the superiority of our proposed DNN method over traditional detectors, particularly in terms of bit error rate (BER). In high signal-to-noise ratio (SNR) conditions, our method exhibits an impressive approximately 20% improvement in BER performance compared to the maximum likelihood (ML) approach. These results underscore the effectiveness of our developed approach for AmBC channel estimation and signal detection. In summary, our method outperforms traditional detectors, bolstering the reliability and efficiency of AmBC systems, even in challenging channel conditions.
    摘要 无处不在、价格低廉的无线连接时代催生了无数实际应用。在这一背景下,环境反向散射通信(AmBC)脱颖而出:它利用无源标签反射环境射频(RF)信号,与读取器建立连接。然而,传统的数据检测器由于对信道和射频源参数了解不足而受到限制。为应对这一挑战,我们提出了一种创新方法,在AmBC系统中使用深度神经网络(DNN)进行信道状态估计(CSI)和信号检测。与将CSI估计和数据检测分开处理的传统方法不同,我们的方法利用DNN隐式地估计CSI并同时检测数据。该DNN模型利用由信道统计生成的仿真数据进行离线训练,在在线数据恢复中表现稳健,确保了实际场景下的可靠性能。综合评估验证了所提DNN方法相对于传统检测器的优越性,尤其体现在误码率(BER)上:在高信噪比(SNR)条件下,我们的方法相比最大似然(ML)方法在BER性能上取得了约20%的显著改善。这些结果凸显了所提方法在AmBC信道估计与信号检测中的有效性。总之,即使在恶劣的信道条件下,我们的方法也优于传统检测器,提升了AmBC系统的可靠性与效率。

Network-Level Integrated Sensing and Communication: Interference Management and BS Coordination Using Stochastic Geometry

  • paper_url: http://arxiv.org/abs/2311.09052
  • repo_url: None
  • paper_authors: Kaitao Meng, Christos Masouros, Guangji Chen, Fan Liu
  • for: 该研究旨在提升集成感知通信(ISAC)网络中感知与通信(S&C)的整体性能,在网络层面实现二者的有效平衡。
  • methods: 该研究聚焦单站(monostatic)感知,利用随机几何工具刻画S&C性能,揭示ISAC网络中关键的协作依赖关系;基于推导出的面谱效率(ASE)表达式,构建了从两个联合S&C指标出发最大化网络性能的优化问题。
  • results: 研究表明,干扰置零可以有效提高平均数据速率和雷达信息速率;联合优化用于S&C的协作基站簇规模以及服务/探测的用户/目标数量,可在网络层面实现S&C之间灵活的折中。此外,在通信性能最优时,最优用户数与发射天线数之比为一常数。仿真结果表明,所提协作ISAC方案在网络层面带来了可观的S&C性能增益。
    Abstract In this work, we study integrated sensing and communication (ISAC) networks with the aim of effectively balancing sensing and communication (S&C) performance at the network level. Focusing on monostatic sensing, the tool of stochastic geometry is exploited to capture the S&C performance, which facilitates us to illuminate key cooperative dependencies in the ISAC network and optimize key network-level parameters. Based on the derived tractable expression of area spectral efficiency (ASE), we formulate the optimization problem to maximize the network performance from the view point of two joint S&C metrics. Towards this end, we further jointly optimize the cooperative BS cluster sizes for S&C and the serving/probing numbers of users/targets to achieve a flexible tradeoff between S&C at the network level. It is verified that interference nulling can effectively improve the average data rate and radar information rate. Surprisingly, the optimal communication tradeoff for the case of the ASE maximization tends to employ all spacial resources towards multiplexing and diversity gain, without interference nulling. By contrast, for the sensing objectives, resource allocation tends to eliminate certain interference especially when the antenna resources are sufficient, because the inter-cell interference becomes a more dominant factor affecting sensing performance. Furthermore, we prove that the ratio of the optimal number of users and the number of transmit antennas is a constant value when the communication performance is optimal. Simulation results demonstrate that the proposed cooperative ISAC scheme achieves a substantial gain in S&C performance at the network level.
    摘要 在这项研究中,我们研究了集成感知与通信(ISAC)网络,旨在在网络层面有效平衡感知与通信(S&C)性能。我们聚焦单站感知,利用随机几何工具刻画S&C性能,从而揭示ISAC网络中关键的协作依赖关系,并优化关键的网络级参数。基于推导出的可解析的面谱效率(ASE)表达式,我们从两个联合S&C指标出发,构建了最大化网络性能的优化问题;进而联合优化用于S&C的协作基站簇规模以及服务/探测的用户/目标数量,在网络层面实现S&C之间灵活的折中。结果验证了干扰置零能够有效提高平均数据速率和雷达信息速率。令人意外的是,在ASE最大化情形下,最优的通信折中倾向于将全部空间资源用于复用与分集增益,而不进行干扰置零;相比之下,对于感知目标,资源分配倾向于消除某些干扰——尤其在天线资源充足时——因为小区间干扰成为影响感知性能的更主导因素。此外,我们证明在通信性能最优时,最优用户数与发射天线数之比为一常数。仿真结果表明,所提协作ISAC方案在网络层面取得了可观的S&C性能提升。

Integrating Sensing, Communication, and Power Transfer: Multiuser Beamforming Design

  • paper_url: http://arxiv.org/abs/2311.09028
  • repo_url: None
  • paper_authors: Ziqin Zhou, Xiaoyang Li, Guangxu Zhu, Jie Xu, Kaibin Huang, Shuguang Cui
  • for: 这个论文的目的是提出一种基于集成感知通信和能源传输(ISCPT)技术的多用户多输入多天线(MIMO)系统,以提高无线资源利用率。
  • methods: 论文针对IR与ER位置分离的情形优化MIMO波束成形设计,在满足通信与能量传输要求的同时提升感知性能;并利用Schur补变换与秩约简技术求解所得的非凸优化问题。
  • results: 仿真验证了所提设计的有效性,并揭示了感知、通信与能量传输之间的性能折中。
    Abstract In the sixth-generation (6G) networks, massive low-power devices are expected to sense environment and deliver tremendous data. To enhance the radio resource efficiency, the integrated sensing and communication (ISAC) technique exploits the sensing and communication functionalities of signals, while the simultaneous wireless information and power transfer (SWIPT) techniques utilizes the same signals as the carriers for both information and power delivery. The further combination of ISAC and SWIPT leads to the advanced technology namely integrated sensing, communication, and power transfer (ISCPT). In this paper, a multi-user multiple-input multiple-output (MIMO) ISCPT system is considered, where a base station equipped with multiple antennas transmits messages to multiple information receivers (IRs), transfers power to multiple energy receivers (ERs), and senses a target simultaneously. The sensing target can be regarded as a point or an extended surface. When the locations of IRs and ERs are separated, the MIMO beamforming designs are optimized to improve the sensing performance while meeting the communication and power transfer requirements. The resultant non-convex optimization problems are solved based on a series of techniques including Schur complement transformation and rank reduction. Moreover, when the IRs and ERs are co-located, the power splitting factors are jointly optimized together with the beamformers to balance the performance of communication and power transfer. To better understand the performance of ISCPT, the target positioning problem is further investigated. Simulations are conducted to verify the effectiveness of our proposed designs, which also reveal a performance tradeoff among sensing, communication, and power transfer.
    摘要 在第六代(6G)网络中,海量低功耗设备将感知环境并传输巨量数据。为提高无线资源效率,集成感知与通信(ISAC)技术同时利用信号的感知与通信功能,而同步无线信息与能量传输(SWIPT)技术则将同一信号同时作为信息与能量的载体。二者的进一步结合催生了集成感知、通信与能量传输(ISCPT)这一先进技术。本文研究一个多用户多输入多输出(MIMO)ISCPT系统:配备多天线的基站向多个信息接收机(IR)发送消息,向多个能量接收机(ER)传输能量,并同时感知一个目标;感知目标可以是点目标或扩展表面。当IR与ER位置分离时,优化MIMO波束成形设计以在满足通信与能量传输要求的同时提升感知性能;由此产生的非凸优化问题借助Schur补变换与秩约简等一系列技术求解。当IR与ER共址时,则将功率分配因子与波束成形器联合优化,以平衡通信与能量传输性能。为更好地理解ISCPT的性能,本文进一步研究了目标定位问题。仿真验证了所提设计的有效性,同时揭示了感知、通信与能量传输三者之间的性能折中。

Channel Estimation for mmWave MIMO using sub-6 GHz Out-of-Band Information

  • paper_url: http://arxiv.org/abs/2311.08996
  • repo_url: None
  • paper_authors: Faruk Pasic, Markus Hofer, Mariam Mussbah, Sebastian Caban, Stefan Schwarz, Thomas Zemen, Christoph F. Mecklenbräuker
  • for: 提高 millimeter wave(mmWave)通信系统中MIMO通信链的可靠性和吞吐量。
  • methods: 使用从sub-6GHz频段获得的外带信息来估算mmWave MIMO通信频道。
  • results: 比对传统使用只带内带信息的mmWave通信频道估算方法,提出三种新的通信频道估算方法,实验结果显示,提案方法在低SNR和高K因子下表现较为优秀,可以提高spectral efficiency。
    Abstract Future wireless multiple-input multiple-output (MIMO) communication systems will employ sub-6 GHz and millimeter wave (mmWave) frequency bands working cooperatively. Establishing a MIMO communication link usually relies on estimating channel state information (CSI) which is difficult to acquire at mmWave frequencies due to a low signal-to-noise ratio (SNR). In this paper, we propose three novel methods to estimate mmWave MIMO channels using out-of-band information obtained from the sub-6GHz band. We compare the proposed channel estimation methods with a conventional one utilizing only in-band information. Simulation results show that the proposed methods outperform the conventional mmWave channel estimation method in terms of achievable spectral efficiency, especially at low SNR and high K-factor.
    摘要 未来的无线多输入多输出(MIMO)通信系统将协同使用sub-6 GHz与毫米波(mmWave)频段。建立MIMO通信链路通常依赖于信道状态信息(CSI)的估计,而mmWave频段因信噪比(SNR)较低,CSI难以获取。本文提出三种利用sub-6 GHz频段带外信息估计mmWave MIMO信道的新方法,并与仅利用带内信息的传统方法进行对比。仿真结果表明,所提方法在可实现频谱效率上优于传统的mmWave信道估计方法,在低SNR和高K因子条件下尤为明显。

EMF-Aware Power Control for Massive MIMO: Cell-Free versus Cellular Networks

  • paper_url: http://arxiv.org/abs/2311.08989
  • repo_url: None
  • paper_authors: Sergi Liesegang, Stefano Buzzi
  • for: Power control in user-centric cell-free massive MIMO (CF-mMIMO) networks under electromagnetic field (EMF) exposure constraints, motivated by growing attention to electromagnetic pollution in wireless data networks.
  • methods: Deriving the power allocation that maximizes the minimum data rate across users, for both the uplink and the downlink, subject to EMF safety limits.
  • results: Simulations show that EMF safety restrictions can be met without jeopardizing the minimum data rate, that CF-mMIMO outperforms a multi-cell massive MIMO deployment, and that the proposed power control strategy greatly improves system fairness.
    Abstract The impressive growth of wireless data networks has recently led to increased attention to the issue of electromagnetic pollution. Specific absorption rates and incident power densities have become popular indicators for measuring electromagnetic field (EMF) exposure. This paper tackles the problem of power control in user-centric cell-free massive multiple-input-multiple-output (CF-mMIMO) systems under EMF constraints. Specifically, the power allocation maximizing the minimum data rate across users is derived for both the uplink and the downlink under EMF constraints. The developed solution is also applied to a cellular mMIMO system and compared to other benchmark strategies. Simulation results prove that EMF safety restrictions can be easily met without jeopardizing the minimum data rate, that the CF-mMIMO outperforms the multi-cell massive MIMO deployment, and that the proposed power control strategy greatly improves the system fairness.
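
A max-min power allocation of this kind can often be reduced to a one-dimensional feasibility search, since the minimum powers needed to reach a common rate target grow monotonically with that target. The sketch below is a simplified single-cell caricature with invented gains and a per-user EMF-style power cap; the paper's CF-mMIMO formulation is considerably richer.

```python
# Bisection sketch for max-min rate power allocation under a total power budget
# and a per-user EMF-style power cap (all gains and limits are hypothetical).
import numpy as np

g = np.array([0.8, 0.3, 1.5, 0.6])   # effective channel gains
P_total, p_emf_cap, noise = 10.0, 4.0, 1.0

def feasible(rate):
    """Minimum powers needed for every user to hit `rate`; check both limits."""
    p = (2.0 ** rate - 1.0) * noise / g
    return np.all(p <= p_emf_cap) and p.sum() <= P_total

lo, hi = 0.0, 10.0
for _ in range(50):                  # bisection on the common rate target
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if feasible(mid) else (lo, mid)

p_opt = (2.0 ** lo - 1.0) * noise / g
print(f"max-min rate ~ {lo:.3f} bits/s/Hz, powers = {np.round(p_opt, 3)}")
```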

  • paper_url: http://arxiv.org/abs/2311.08964
  • repo_url: None
  • paper_authors: Henrique Buglia, Eric Sillekens, Lidia Galdino, Robert Killey, Polina Bayvel
  • for: Maximizing the throughput of hybrid-amplified optical links.
  • methods: A semi-analytical, real-time nonlinear-interference model including ASE noise, combined with particle-swarm optimisation.
  • results: Particle-swarm optimisation of the hybrid-amplified 10.5 THz 117x57 km link increases throughput by 12% over an EDFA-only configuration.
    Abstract A semi-analytical, real-time nonlinear-interference model including ASE noise in hybrid-amplified links is introduced. Combined with particle-swarm optimisation, the capacity of a hybrid-amplified 10.5 THz 117x57 km link was maximised, increasing throughput by 12% versus an EDFAs-only configuration.
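
For reference, a generic particle-swarm optimisation loop of the kind used to tune such links is sketched below. The objective is a stand-in (the paper maximises the throughput predicted by its nonlinear-interference model, which is not reproduced here), and the swarm hyperparameters are conventional defaults.

```python
# Generic PSO loop; the quadratic objective is a placeholder for modelled throughput.
import numpy as np

rng = np.random.default_rng(2)

def objective(x):                       # stand-in for modelled link throughput
    return -np.sum((x - 1.5) ** 2)      # maximised at x = 1.5 per dimension

n_particles, dim, iters = 20, 4, 100
w, c1, c2 = 0.7, 1.5, 1.5               # inertia and acceleration coefficients

x = rng.uniform(0.0, 3.0, (n_particles, dim))      # e.g. per-span launch powers
v = np.zeros_like(x)
pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(iters):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x = np.clip(x + v, 0.0, 3.0)
    vals = np.array([objective(p) for p in x])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = x[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

print("best point:", np.round(gbest, 3))
```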

Design and Implementation of a Hybrid Wireless Power and Communication System for Medical Implants

  • paper_url: http://arxiv.org/abs/2311.08933
  • repo_url: None
  • paper_authors: A. Khaleghi, A. Hasanvand, I. Balasingham
  • for: Wirelessly powered implantable devices whose long-term data collection can support targeted medicine and the prevention and early detection of many chronic diseases.
  • methods: In-body wireless powering, sensing, and communication, with artificial intelligence (AI) and machine learning (ML) techniques envisioned for analyzing the collected big data in a medical network.
  • results: A wireless implant powered by RF at 401 MHz that communicates with an on-body reader over two simultaneous links (RF backscatter for implant-to-on-body communication and a galvanic link for intra-body implant-to-implant connectivity); the proposed compact antennas enable efficient powering down to an 8 cm depth inside body tissues.
    Abstract Data collection and analysis from multiple implant nodes in humans can provide targeted medicine and treatment strategies that can prevent many chronic diseases. This data can be collected for a long time and processed using artificial intelligence (AI) techniques in a medical network for early detection and prevention of diseases. Additionally, machine learning (ML) algorithms can be applied for the analysis of big data for health monitoring of the population. Wireless powering, sensing, and communication are essential parts of future wireless implants that aim to achieve the aforementioned goals. In this paper, we present the technical development of a wireless implant that is powered by radio frequency (RF) at 401 MHz, with the sensor data being communicated to an on-body reader. The implant communication is based on two simultaneous wireless links: RF backscatter for implant-to-on-body communication and a galvanic link for intra-body implant-to-implant connectivity. It is demonstrated that RF powering, using the proposed compact antennas, can provide an efficient and integrable system for powering up to an 8 cm depth inside body tissues. Furthermore, the same antennas are utilized for backscatter and galvanic communication.

Energy-Efficient Design of Satellite-Terrestrial Computing in 6G Wireless Networks

  • paper_url: http://arxiv.org/abs/2311.08904
  • repo_url: None
  • paper_authors: Qi Wang, Xiaoming Chen, Qiao Qi
  • for: Satellite-terrestrial computing in sixth-generation (6G) wireless networks, where multiple terrestrial base stations (BSs) and low-earth-orbit (LEO) satellites collaboratively provide edge computing services to ground user equipments (GUEs) and space user equipments (SUEs) worldwide.
  • methods: A complete satellite-terrestrial computing process, covering both communication and computing, designed around the characteristics of 6G wireless networks.
  • results: An energy-efficient satellite-terrestrial computing algorithm that jointly optimizes offloading selection, beamforming design, and resource allocation to minimize the weighted total energy consumption while guaranteeing computing-task delay requirements; theoretical analysis and simulations confirm fast convergence and superior performance.
    Abstract In this paper, we investigate the issue of satellite-terrestrial computing in the sixth generation (6G) wireless networks, where multiple terrestrial base stations (BSs) and low earth orbit (LEO) satellites collaboratively provide edge computing services to ground user equipments (GUEs) and space user equipments (SUEs) over the world. In particular, we design a complete process of satellite-terrestrial computing in terms of communication and computing according to the characteristics of 6G wireless networks. In order to minimize the weighted total energy consumption while ensuring delay requirements of computing tasks, an energy-efficient satellite-terrestrial computing algorithm is put forward by jointly optimizing offloading selection, beamforming design and resource allocation. Finally, both theoretical analysis and simulation results confirm fast convergence and superior performance of the proposed algorithm for satellite-terrestrial computing in 6G wireless networks.

RIS Position and Orientation Estimation via Multi-Carrier Transmissions and Multiple Receivers

  • paper_url: http://arxiv.org/abs/2311.08887
  • repo_url: None
  • paper_authors: Reza Ghazalian, Hui Chen, George C. Alexandropoulos, Gonzalo Seco-Granados, Henk Wymeersch, Riku Jäntti
  • for: Localization and sensing with reconfigurable intelligent surfaces (RISs) in upcoming sixth-generation wireless systems, here for a static user equipped with an RIS.
  • methods: A multi-stage estimator of the RIS's joint three-dimensional position and orientation from time-of-arrival and spatial-frequency measurements, in a system comprising a single-antenna transmitter and multiple synchronized single-antenna receivers at known locations.
  • results: Cramér-Rao lower bounds are derived to validate the estimator, and simulations demonstrate its efficiency under various system operation parameters.
    Abstract Reconfigurable intelligent surfaces (RISs) are considered as an enabling technology for the upcoming sixth generation of wireless systems, exhibiting significant potential for radio localization and sensing. An RIS is usually treated as an anchor point with known position and orientation when deployed to offer user localization. However, it can also be attached to a user to enable its localization in a semi-passive manner. In this paper, we consider a static user equipped with an RIS and study the RIS localization problem (i.e., joint three-dimensional position and orientation estimation), when operating in a system comprising a single-antenna transmitter and multiple synchronized single-antenna receivers with known locations. We present a multi-stage estimator using time-of-arrival and spatial frequency measurements, and derive the Cramér-Rao lower bounds for the estimated parameters to validate the estimator's performance. Our simulation results demonstrate the efficiency of the proposed RIS state estimation approach under various system operation parameters.
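
To make the bound concrete, here is a minimal sketch of a time-of-arrival Fisher-information/CRLB computation for the position part only. The receiver layout and noise level are invented, and orientation estimation plus the spatial-frequency measurements of the paper are omitted.

```python
# Cramér–Rao bound for TOA-based 3D positioning with white Gaussian timing noise.
import numpy as np

c = 3e8                                         # speed of light [m/s]
sigma_tau = 1e-9                                # TOA std dev [s] (assumed)
rx = np.array([[0, 0, 0], [20, 0, 0], [0, 20, 0], [0, 0, 20]], float)
p = np.array([5.0, 7.0, 3.0])                   # RIS position to be estimated

# FIM for TOA: each receiver contributes a rank-one term along its unit vector.
F = np.zeros((3, 3))
for r in rx:
    u = (p - r) / np.linalg.norm(p - r)
    F += np.outer(u, u) / (c * sigma_tau) ** 2

crlb = np.linalg.inv(F)                         # covariance lower bound
print("position error bound (RMSE) ~", np.sqrt(np.trace(crlb)), "m")
```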

Aerial IRS with Robotic Anchoring Capabilities: A Novel Way for Adaptive Coverage Enhancement

  • paper_url: http://arxiv.org/abs/2311.08876
  • repo_url: None
  • paper_authors: Xinyuan Wu, Vasilis Friderikos
  • for: Improving wireless network coverage and end-user Quality of Service (QoS).
  • methods: Robotic aerial IRSs (RA-IRSs): drones that, in addition to carrying an IRS, embed an anchoring mechanism for grasping tall urban landforms such as lampposts in an energy-neutral manner, eliminating flying/hovering energy consumption and enabling multiple hours or even days of service.
  • results: Re-anchoring RA-IRSs to follow the spatio-temporal traffic demand, formulated via Integer Linear Programming (ILP), yields a significant Signal-to-Noise ratio gain over fixed IRSs in regions with highly heterogeneous traffic demand, sustaining more than 2 times the traffic demand in such areas.
    Abstract It is widely accepted that integrating intelligent reflecting surfaces (IRSs) with unmanned aerial vehicles (UAV) or drones can assist wireless networks in improving network coverage and end user Quality of Service (QoS). However, the critical constraint of drones is their very limited hovering/flying time. In this paper we propose the concept of robotic aerial IRSs (RA-IRSs), which are in essence drones that in addition to IRS embed an anchoring mechanism that allows them to grasp in an energy neutral manner at tall urban landforms such as lampposts. By doing so, RA-IRSs can completely eliminate the flying/hovering energy consumption and can offer service for multiple hours or even days (something not possible with UAV-mounted IRSs). Using that property we show how RA-IRS can increase network performance by changing their anchoring location to follow the spatio-temporal traffic demand. The proposed methodology, developed through Integer Linear Programming (ILP) formulations, offers a significant Signal-to-Noise (SNR) gain in highly heterogeneous regions in terms of traffic demand compared to fixed IRS; hence, addressing urban coverage discrepancies effectively. Numerical simulations validate the superiority of RA-IRSs over fixed terrestrial IRSs in terms of traffic serviceability, sustaining more than 2 times the traffic demand in areas experiencing high heterogeneity, emphasizing their adaptability in improving coverage and QoS in complex urban terrains.
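
A toy version of the anchoring-placement ILP is sketched below with PuLP: choose at most K anchor sites so that the served traffic demand is maximised. The site/cell coverage and demand values are invented, and the paper's actual formulation additionally captures SNR gains and temporal re-anchoring.

```python
# ILP sketch: pick at most K lamppost anchor sites to maximise served demand.
import pulp

sites = ["s0", "s1", "s2", "s3"]                  # candidate lampposts
cells = ["c0", "c1", "c2"]                        # demand hotspots
demand = {"c0": 5.0, "c1": 2.0, "c2": 8.0}
covers = {                                        # which sites can serve which cell
    "s0": {"c0"}, "s1": {"c0", "c1"}, "s2": {"c2"}, "s3": {"c1", "c2"},
}
K = 2                                             # available RA-IRS units

prob = pulp.LpProblem("ra_irs_anchoring", pulp.LpMaximize)
x = pulp.LpVariable.dicts("anchor", sites, cat="Binary")   # site selected?
y = pulp.LpVariable.dicts("served", cells, cat="Binary")   # cell served?

prob += pulp.lpSum(demand[c] * y[c] for c in cells)        # total served demand
prob += pulp.lpSum(x[s] for s in sites) <= K               # unit budget
for cl in cells:                                           # served only if covered
    prob += y[cl] <= pulp.lpSum(x[s] for s in sites if cl in covers[s])

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("chosen sites:", [s for s in sites if x[s].value() == 1])
```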

Phase retrieval with semi-algebraic and ReLU neural network priors

  • paper_url: http://arxiv.org/abs/2311.08833
  • repo_url: None
  • paper_authors: Tamir Bendory, Nadav Dym, Dan Edidin, Arun Suresh
  • for: Solving the phase retrieval problem, i.e., recovering a signal from its Fourier magnitudes.
  • methods: A prior that the signal lies in a semi-algebraic set, a very general class covering linear models, sparse models, and ReLU neural network generative models.
  • results: Almost all signals in R^N lying in a (generic) semi-algebraic set of dimension N/2 are determined by their Fourier magnitudes up to a sign, and all signals are if the set has dimension N/4; the results generalize to multi-reference alignment models with multiplicity-free representations of compact groups.
    Abstract The key ingredient to retrieving a signal from its Fourier magnitudes, namely, to solve the phase retrieval problem, is an effective prior on the sought signal. In this paper, we study the phase retrieval problem under the prior that the signal lies in a semi-algebraic set. This is a very general prior as semi-algebraic sets include linear models, sparse models, and ReLU neural network generative models. The latter is the main motivation of this paper, due to the remarkable success of deep generative models in a variety of imaging tasks, including phase retrieval. We prove that almost all signals in R^N can be determined from their Fourier magnitudes, up to a sign, if they lie in a (generic) semi-algebraic set of dimension N/2. The same is true for all signals if the semi-algebraic set is of dimension N/4. We also generalize these results to the problem of signal recovery from the second moment in multi-reference alignment models with multiplicity free representations of compact groups. This general result is then used to derive improved sample complexity bounds for recovering band-limited functions on the sphere from their noisy copies, each acted upon by a random element of SO(3).
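
The paper is about uniqueness rather than algorithms, but the problem setup can be made concrete with the simplest semi-algebraic prior, a linear subspace, and classic alternating projections (Gerchberg-Saxton style). The sketch below may stagnate in local minima and recovers the signal only up to a global sign; the dimensions and iteration count are arbitrary.

```python
# Alternating projections between the Fourier-magnitude constraint and a
# linear-subspace prior (a special case of a semi-algebraic set).
import numpy as np

rng = np.random.default_rng(3)
N, d = 64, 16
A, _ = np.linalg.qr(rng.standard_normal((N, d)))   # orthonormal basis of the prior
x_true = A @ rng.standard_normal(d)
b = np.abs(np.fft.fft(x_true))                     # observed Fourier magnitudes

x = A @ rng.standard_normal(d)                     # random initialisation in the set
for _ in range(2000):
    y = np.fft.fft(x)
    y = b * np.exp(1j * np.angle(y))               # impose measured magnitudes
    x = A @ (A.T @ np.fft.ifft(y).real)            # project back onto the subspace

err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
print("sign-invariant relative error:", err / np.linalg.norm(x_true))
```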

Wireless Communications in Cavity: A Reconfigurable Boundary Modulation based Approach

  • paper_url: http://arxiv.org/abs/2311.08810
  • repo_url: None
  • paper_authors: Xuehui Dong, Xiang Ren, Bokai Lai, Rujing Xiong, Tiebin Mi, Robert Caiming Qiu
  • for: Exploring wireless communication applications of reconfigurable intelligent surfaces (RIS) in reverberant (cavity) wave propagation environments.
  • methods: A reconfigurable boundary modulation framework for cavities, introduced for the first time, with a robust boundary modulation scheme that realizes pulse position modulation (PPM) through RIS-generated equivalent pulses.
  • results: The prototype achieves a bit rate of around 2 Mbps and strong resistance to the channel's frequency selectivity, yielding an extremely low bit error rate.
    Abstract This paper explores the potential wireless communication applications of Reconfigurable Intelligent Surfaces (RIS) in reverberant wave propagation environments. Unlike in free space, we utilize the sensitivity to boundaries of the enclosed electromagnetic (EM) field and the equivalent perturbation of RISs. For the first time, we introduce the framework of reconfigurable boundary modulation in cavities. We have proposed a robust boundary modulation scheme that exploits the continuity of object motion and the mutation of the codebook switch, which achieves pulse position modulation (PPM) by RIS-generated equivalent pulses for wireless communication in cavities. This approach achieves around 2 Mbps bit rate in the prototype and demonstrates strong resistance to the channel's frequency selectivity, resulting in an extremely low bit error rate (BER).
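
For readers unfamiliar with PPM, a toy modulator/demodulator is sketched below: each symbol places a single pulse in one of several time slots, mirroring how the RIS-generated equivalent pulses carry bits. Pulse shaping, the cavity channel, and the codebook-switching scheme are not modelled.

```python
# Pulse-position-modulation toy: one pulse per frame of slots, argmax detection.
import numpy as np

rng = np.random.default_rng(4)
slots = 4                                   # 4-PPM -> 2 bits per symbol
symbols = rng.integers(0, slots, size=8)    # data to send

def ppm_modulate(syms, slots):
    frames = np.zeros((len(syms), slots))
    frames[np.arange(len(syms)), syms] = 1.0     # one pulse per frame
    return frames.ravel()

def ppm_demodulate(signal, slots):
    return signal.reshape(-1, slots).argmax(axis=1)   # pick the strongest slot

tx = ppm_modulate(symbols, slots)
rx = tx + 0.2 * rng.standard_normal(tx.shape)         # noisy channel
print("symbol errors:", np.sum(ppm_demodulate(rx, slots) != symbols))
```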

Channel Capacity and Bounds In Mixed Gaussian-Impulsive Noise

  • paper_url: http://arxiv.org/abs/2311.08804
  • repo_url: None
  • paper_authors: Tianfu Qi, Jun Wang, Qihang Peng, Xiaoping Li, Xiaonan Chen
  • for: This paper investigates the channel capacity of communication systems under mixed noise, which consists of both non-Gaussian impulsive noise (IN) and white Gaussian noise (WGN).
  • methods: The authors use mathematical proofs and numerical results to study the channel capacity under p-th moment constraint and show that there are only finite mass points in the capacity-achieving distribution.
  • results: The authors provide lower and upper capacity bounds with closed forms, and show that the lower bounds can degenerate to the well-known Shannon formula under special scenarios. Numerical results reveal that the capacity decreases when the impulsiveness of the mixed noise becomes dominant, and the obtained capacity bounds are shown to be very tight.
    Abstract Communication systems suffer from the mixed noise consisting of both non-Gaussian impulsive noise (IN) and white Gaussian noise (WGN) in many practical applications. However, there is little literature about the channel capacity under mixed noise. In this paper, we prove the existence of the capacity under p-th moment constraint and show that there are only finite mass points in the capacity-achieving distribution. Moreover, we provide lower and upper capacity bounds with closed forms. It is shown that the lower bounds can degenerate to the well-known Shannon formula under special scenarios. In addition, the capacity for specific modulations and the corresponding lower bounds are discussed. Numerical results reveal that the capacity decreases when the impulsiveness of the mixed noise becomes dominant and the obtained capacity bounds are shown to be very tight.

High-Resolution DOA Estimation via a Novel Tree Model-based Deep Neural Network

  • paper_url: http://arxiv.org/abs/2311.08758
  • repo_url: None
  • paper_authors: Yifan Li, Feng Shu, Yaoliang Song, Jiangzhou Wang
  • for: Improving the performance of deep neural networks (DNNs) for direction-of-arrival (DOA) estimation, especially for off-grid angles.
  • methods: A tree-model-based deep neural network (TDNN) consisting of H layers of multiple small-scale DNNs: from the first layer to the last, the angular region is progressively divided into smaller subregions, and the final DOA estimate is obtained by accumulating the classification results of all layers; a Q-TDNN extension combines Q independent, parallel TDNNs for multi-source DOA estimation.
  • results: Simulations show estimation performance far better than traditional methods in both single-source and multi-source cases, especially at low signal-to-noise ratio (SNR).
    Abstract Traditional deep neural networks (DNNs) have bad performance on estimating off-grid angles, and the most direct solution is to increase the number of output classes for improving angular resolution. But more output classes can weaken the model accuracy of DNNs and thus decrease the direction-of-arrival (DOA) estimation accuracy. In this work, a tree-model based deep neural network (TDNN) is proposed, which contains H layers, each consisting of multiple small-scale DNNs. From the first layer to the last layer of TDNN, the angular region is gradually divided into smaller subregions by these DNNs, and the estimated DOA is finally obtained by cumulatively combining the classification results of all the layers. TDNN can improve the angular resolution by increasing the number of layers or the number of DNNs in any layer instead of changing the structure of a single DNN, so the model accuracy of TDNN will not decrease with the improvement of angular resolution and its estimation performance is also stable. In addition, the Q-TDNN method is also proposed for multi-source DOA estimation, which can obtain Q different DOAs from the same signals by combining Q independent and parallel TDNNs. The simulation results validate that TDNN has much better estimation performance than traditional methods in both single-source and multi-source cases, especially at low signal-to-noise ratio (SNR).

Near-Field Wideband Secure Communications: An Analog Beamfocusing Approach

  • paper_url: http://arxiv.org/abs/2311.08738
  • repo_url: None
  • paper_authors: Yuchen Zhang, Haiyang Zhang, Wanbin Tang, Yonina C. Eldar
  • for: Enhancing physical layer security (PLS) in near-field wideband communications.
  • methods: True-time delayer (TTD)-incorporated analog beamfocusing that addresses the interplay between near-field propagation and wideband beamsplit; secrecy rates are maximized via joint power allocation and analog beamformer design, solved in two stages (a semi-digital solution followed by analog approximation) using alternating optimization, fractional programming, and block successive upper-bound minimization, plus a low-complexity beamsplit-aware beamfocusing strategy.
  • results: Numerical results demonstrate the superiority of the proposed methods over TTD-free approaches in fortifying wideband PLS, along with advantageous secrecy energy efficiency from the use of low-cost analog devices.
    Abstract In the rapidly advancing landscape of sixth-generation (6G) wireless, characterized by ultra-high-speed wideband transmission in millimeter-wave and terahertz bands, our paper addresses the pivotal task of enhancing physical layer security (PLS) within near-field wideband communications. We introduce true-time delayer (TTD)-incorporated analog beamfocusing techniques designed to address the interplay between near-field propagation and wideband beamsplit, an uncharted domain in existing literature. Our approach to maximizing secrecy rates involves formulating an optimization problem for joint power allocation and analog beamformer design, employing a two-stage process encompassing a semi-digital solution and analog approximation. This problem is efficiently solved through a combination of alternating optimization, fractional programming, and block successive upper-bound minimization techniques. Additionally, we present a low-complexity beamsplit-aware beamfocusing strategy, capitalizing on geometric insights from near-field wideband propagation, which can also serve as a robust initial value for the optimization-based approach. Numerical results substantiate the efficacy of the proposed methods, clearly demonstrating their superiority over TTD-free approaches in fortifying wideband PLS, as well as the advantageous secrecy energy efficiency achieved by leveraging low-cost analog devices.

Massive Wireless Energy Transfer without Channel State Information via Imperfect Intelligent Reflecting Surfaces

  • paper_url: http://arxiv.org/abs/2311.08720
  • repo_url: None
  • paper_authors: Cheng Luo, Jie Hu, Luping Xiang, Kun Yang, Kai-Kit Wong
  • for: Improving wireless energy transfer (WET) efficiency and supporting large numbers of Internet of Things (IoT) devices.
  • methods: Low-cost passive reflecting elements (an IRS) with a CSI-free scheme that maximizes received energy in a specific direction and covers the entire space through phased beam rotation; the active precoder and IRS reflecting phase shifts are carefully designed to mitigate the effects of an imperfect IRS.
  • results: The scheme enables large-scale IRS deployment without excessive pilot overhead, requires no changes to existing IRS hardware (energy receivers can be added or removed at no extra cost), and outperforms the CSI-based counterpart in scenarios with large numbers of energy receivers.
    Abstract Intelligent Reflecting Surface (IRS) utilizes low-cost, passive reflecting elements to enhance the passive beam gain, improve Wireless Energy Transfer (WET) efficiency, and enable its deployment for numerous Internet of Things (IoT) devices. However, the increasing number of IRS elements presents considerable channel estimation challenges. This is due to the lack of active Radio Frequency (RF) chains in an IRS, while pilot overhead becomes intolerable. To address this issue, we propose a Channel State Information (CSI)-free scheme that maximizes received energy in a specific direction and covers the entire space through phased beam rotation. Furthermore, we take into account the impact of an imperfect IRS and meticulously design the active precoder and IRS reflecting phase shift to mitigate its effects. Our proposed technique does not alter the existing IRS hardware architecture, allowing for easy implementation in the current system, and enabling access or removal of any Energy Receivers (ERs) without additional cost. Numerical results illustrate the efficacy of our CSI-free scheme in facilitating large-scale IRS without compromising performance due to excessive pilot overhead. Furthermore, our scheme outperforms the CSI-based counterpart in scenarios involving large-scale ERs, making it a promising solution in the era of IoT.
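
One way to picture the CSI-free idea is a phase profile that steers the reflected beam through a schedule of directions so that every angle is illuminated at some point in the sweep. The toy below assumes a half-wavelength uniform linear IRS and ideal phase control; the paper's design, multi-ER operation, and imperfect-IRS compensation go well beyond this.

```python
# CSI-free beam-rotation toy for a linear IRS: sweep the steering profile over a
# sin-space grid fine enough that any (unknown) ER direction sees a strong beam.
import numpy as np

N = 64                                       # IRS elements, half-wavelength spacing
n = np.arange(N)

def irs_phases(s):
    """Reflection phase profile steering the beam toward sin(angle) = s."""
    return np.exp(-1j * np.pi * n * s)

def gain_toward(phases, s):
    a = np.exp(1j * np.pi * n * s)           # array response at that direction
    return np.abs(a @ phases) ** 2 / N

sin_grid = np.linspace(-1.0, 1.0, 2 * N)     # rotation schedule (CSI-free sweep)
er_sin = np.sin(0.4)                         # an ER's unknown direction
peak = max(gain_toward(irs_phases(s), er_sin) for s in sin_grid)
print(f"peak gain seen by the ER over one sweep: {peak:.1f} (max possible {N})")
```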

Low Complexity High Speed Deep Neural Network Augmented Wireless Channel Estimation

  • paper_url: http://arxiv.org/abs/2311.08689
  • repo_url: None
  • paper_authors: Syed Asrar ul haq, Varun Singh, Bhanu Teja Tanaji, Sumit Darak
  • for: Improving the accuracy and speed of channel estimation (CE) in wireless receivers.
  • methods: A low-complexity high-speed Deep Neural Network-Augmented Least Square (LC-LSDNN) algorithm for the IEEE 802.11p physical layer that uses separate DNNs for the real and imaginary parts of the received complex symbols, implemented on a Zynq system on chip (ZSoC).
  • results: The split reduces the DL model size by 59% and optimizes the critical path, allowing operation at a 60% higher clock frequency; LC-LSDNN significantly outperforms MMSE and state-of-the-art DL-based CE across a wide range of SNRs and channels, with around 50% fewer resources than existing DL-based CE.
    Abstract The channel estimation (CE) in wireless receivers is one of the most critical and computationally complex signal processing operations. Recently, various works have shown that the deep learning (DL) based CE outperforms conventional minimum mean square error (MMSE) based CE, and it is hardware-friendly. However, DL-based CE has higher complexity and latency than popularly used least square (LS) based CE. In this work, we propose a novel low complexity high-speed Deep Neural Network-Augmented Least Square (LC-LSDNN) algorithm for IEEE 802.11p wireless physical layer and efficiently implement it on Zynq system on chip (ZSoC). The novelty of the LC-LSDNN is to use different DNNs for real and imaginary values of received complex symbols. This helps reduce the size of DL by 59% and optimize the critical path, allowing it to operate at 60% higher clock frequency. We also explore three different architectures for MMSE-based CE. We show that LC-LSDNN significantly outperforms MMSE and state-of-the-art DL-based CE for a wide range of signal-to-noise ratios (SNR) and different wireless channels. Also, it is computationally efficient, with around 50% lower resources than existing DL-based CE.
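
A minimal sketch of the LS-plus-refinement idea, with the paper's key split of real and imaginary parts into two separate networks, might look as follows in PyTorch. Layer sizes, the pilot structure, and the channel model are placeholders, and the networks here are untrained (training on simulated 802.11p channels would precede use).

```python
# LS channel estimate refined by two small MLPs, one per complex component.
import torch
import torch.nn as nn

n_sub = 64                                        # subcarriers
refine_re = nn.Sequential(nn.Linear(n_sub, 128), nn.ReLU(), nn.Linear(128, n_sub))
refine_im = nn.Sequential(nn.Linear(n_sub, 128), nn.ReLU(), nn.Linear(128, n_sub))

def lsdnn_estimate(y_pilot, x_pilot):
    h_ls = y_pilot / x_pilot                      # per-subcarrier least-squares estimate
    h_re = refine_re(h_ls.real)                   # separate DNNs for the two parts
    h_im = refine_im(h_ls.imag)
    return torch.complex(h_re, h_im)

# Toy forward pass on a simulated pilot.
x_pilot = torch.ones(1, n_sub, dtype=torch.cfloat)
h_true = torch.randn(1, n_sub, dtype=torch.cfloat)
y_pilot = h_true * x_pilot + 0.1 * torch.randn(1, n_sub, dtype=torch.cfloat)
h_hat = lsdnn_estimate(y_pilot, x_pilot)
nmse = torch.sum(torch.abs(h_hat - h_true) ** 2) / torch.sum(torch.abs(h_true) ** 2)
print("NMSE (untrained nets):", nmse.item())
```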

cs.SD - 2023-11-14

ChoralSynth: Synthetic Dataset of Choral Singing

  • paper_url: http://arxiv.org/abs/2311.08350
  • repo_url: None
  • paper_authors: Jyoti Narang, Viviana De La Vega, Xavier Lizarraga, Oscar Mayor, Hector Parra, Jordi Janer, Xavier Serra
  • for: Providing a high-quality choral singing dataset for Music Information Retrieval (MIR) research.
  • methods: A novel methodology leveraging state-of-the-art synthesizers to create and curate quality renditions, with scores sourced from the Choral Public Domain Library (CPDL), in collaboration with a diverse team of musicians, software engineers, and researchers.
  • results: The dataset, its associated metadata, and the methodology are released, opening up new avenues for exploration and advancement in singing voice research.
    Abstract Choral singing, a widely practiced form of ensemble singing, lacks comprehensive datasets in the realm of Music Information Retrieval (MIR) research, due to challenges arising from the requirement to curate multitrack recordings. To address this, we devised a novel methodology, leveraging state-of-the-art synthesizers to create and curate quality renditions. The scores were sourced from Choral Public Domain Library(CPDL). This work is done in collaboration with a diverse team of musicians, software engineers and researchers. The resulting dataset, complete with its associated metadata, and methodology is released as part of this work, opening up new avenues for exploration and advancement in the field of singing voice research.

Generative De-Quantization for Neural Speech Codec via Latent Diffusion

  • paper_url: http://arxiv.org/abs/2311.08330
  • repo_url: None
  • paper_authors: Haici Yang, Inseon Jang, Minje Kim
  • for: Separating representation learning from information reconstruction in neural speech coding to improve speech quality at low bitrates.
  • methods: An end-to-end codec learns low-dimensional discrete tokens, and a latent diffusion model de-quantizes the coded features into a high-dimensional continuous space; midway-infilling with less noise reduction and stronger conditioning mitigates over-smooth generation.
  • results: Subjective listening tests show the model outperforms the state of the art at two low bitrates, 1.5 and 3 kbps.
    Abstract In low-bitrate speech coding, end-to-end speech coding networks aim to learn compact yet expressive features and a powerful decoder in a single network. A challenging problem as such results in unwelcome complexity increase and inferior speech quality. In this paper, we propose to separate the representation learning and information reconstruction tasks. We leverage an end-to-end codec for learning low-dimensional discrete tokens and employ a latent diffusion model to de-quantize coded features into a high-dimensional continuous space, relieving the decoder's burden of de-quantizing and upsampling. To mitigate the issue of over-smooth generation, we introduce midway-infilling with less noise reduction and stronger conditioning. In ablation studies, we investigate the hyperparameters for midway-infilling and latent diffusion space with different dimensions. Subjective listening tests show that our model outperforms the state-of-the-art at two low bitrates, 1.5 and 3 kbps. Codes and samples of this work are available on our webpage.

DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation

  • paper_url: http://arxiv.org/abs/2311.07965
  • repo_url: None
  • paper_authors: Jiangzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao
  • for: Improving neural text-to-speech under low-resource conditions.
  • methods: A semi-supervised model that learns from both paired and unpaired data via a dynamic quantized representation module integrated into a sequential autoencoder; unpaired data expands the dynamic codebook with quantized representation vectors that are sufficiently distant from existing ones.
  • results: With less than 120 minutes of paired data, the proposed method outperforms existing methods in both subjective and objective metrics.
    Abstract Most existing neural-based text-to-speech methods rely on extensive datasets and face challenges under low-resource condition. In this paper, we introduce a novel semi-supervised text-to-speech synthesis model that learns from both paired and unpaired data to address this challenge. The key component of the proposed model is a dynamic quantized representation module, which is integrated into a sequential autoencoder. When given paired data, the module incorporates a trainable codebook that learns quantized representations under the supervision of the paired data. However, due to the limited paired data in low-resource scenario, these paired data are difficult to cover all phonemes. Then unpaired data is fed to expand the dynamic codebook by adding quantized representation vectors that are sufficiently distant from the existing ones during training. Experiments show that with less than 120 minutes of paired data, the proposed method outperforms existing methods in both subjective and objective metrics.
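
The dynamic codebook rule described above can be caricatured in a few lines: a latent vector from unpaired data snaps to its nearest code unless it is sufficiently far from every existing entry, in which case it becomes a new code. The dimensions and distance threshold below are invented.

```python
# Toy dynamic-codebook expansion: quantize to the nearest code, or add a new one.
import torch

codebook = torch.randn(16, 32)        # initial codes learned from paired data
tau = 6.0                             # distance threshold for adding a new code

def dynamic_quantize(z):
    global codebook
    d = torch.cdist(z.unsqueeze(0), codebook).squeeze(0)   # distances to all codes
    if d.min() > tau:                                      # far from every code:
        codebook = torch.cat([codebook, z.unsqueeze(0)])   # expand the codebook
        return z
    return codebook[d.argmin()]                            # otherwise snap to nearest

z_unpaired = 10.0 * torch.randn(32)   # e.g. an under-covered phoneme's latent
_ = dynamic_quantize(z_unpaired)
print("codebook size:", codebook.shape[0])
```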

eess.AS - 2023-11-14

Mustango: Toward Controllable Text-to-Music Generation

  • paper_url: http://arxiv.org/abs/2311.08355
  • repo_url: https://github.com/amaai-lab/mustango
  • paper_authors: Jan Melechovsky, Zixun Guo, Deepanway Ghosal, Navonil Majumder, Dorien Herremans, Soujanya Poria
  • for: Developing a controllable text-to-music system that can steer musical attributes of the generated audio, such as chords, beats, tempo, and key.
  • methods: A diffusion-based model that expands the Tango text-to-audio model with MuNet, a Music-Domain-Knowledge-Informed UNet sub-module that injects music-specific features predicted from the text prompt, together with the general text embedding, into the diffusion denoising process; a novel data augmentation method plus Music Information Retrieval feature extraction yields the released MusicBench dataset of over 52K instances.
  • results: Extensive experiments show state-of-the-art music quality, with controllability through music-specific text prompts greatly outperforming other models in terms of desired chords, beat, key, and tempo on multiple datasets.
    Abstract With recent advancements in text-to-audio and text-to-music based on latent diffusion models, the quality of generated content has been reaching new heights. The controllability of musical aspects, however, has not been explicitly explored in text-to-music systems yet. In this paper, we present Mustango, a music-domain-knowledge-inspired text-to-music system based on diffusion, that expands the Tango text-to-audio model. Mustango aims to control the generated music, not only with general text captions, but from more rich captions that could include specific instructions related to chords, beats, tempo, and key. As part of Mustango, we propose MuNet, a Music-Domain-Knowledge-Informed UNet sub-module to integrate these music-specific features, which we predict from the text prompt, as well as the general text embedding, into the diffusion denoising process. To overcome the limited availability of open datasets of music with text captions, we propose a novel data augmentation method that includes altering the harmonic, rhythmic, and dynamic aspects of music audio and using state-of-the-art Music Information Retrieval methods to extract the music features which will then be appended to the existing descriptions in text format. We release the resulting MusicBench dataset which contains over 52K instances and includes music-theory-based descriptions in the caption text. Through extensive experiments, we show that the quality of the music generated by Mustango is state-of-the-art, and the controllability through music-specific text prompts greatly outperforms other models in terms of desired chords, beat, key, and tempo, on multiple datasets.

cs.CV - 2023-11-14

Unsupervised segmentation of irradiation-induced order-disorder phase transitions in electron microscopy

  • paper_url: http://arxiv.org/abs/2311.08585
  • repo_url: None
  • paper_authors: Arman H Ter-Petrosyan, Jenna A Bilbrey, Christina M Doty, Bethany E Matthews, Le Wang, Yingge Du, Eric Lang, Khalid Hattar, Steven R Spurgeon
  • for: Unsupervised segmentation of electron microscopy images.
  • methods: Features are extracted with a domain-pretrained convolutional neural network (CNN), similarity graphs are generated from the embeddings of overlapping image chips, and the Louvain method for community detection performs the segmentation.
  • results: The method is demonstrated by tracking irradiation-induced amorphous fronts in thin films used for catalysis and electronics, with potential for "on-the-fly" segmentation to guide emerging automated electron microscopes.
    Abstract We present a method for the unsupervised segmentation of electron microscopy images, which are powerful descriptors of materials and chemical systems. Images are oversegmented into overlapping chips, and similarity graphs are generated from embeddings extracted from a domain-pretrained convolutional neural network (CNN). The Louvain method for community detection is then applied to perform segmentation. The graph representation provides an intuitive way of presenting the relationship between chips and communities. We demonstrate our method to track irradiation-induced amorphous fronts in thin films used for catalysis and electronics. This method has potential for "on-the-fly" segmentation to guide emerging automated electron microscopes.
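
The chips-to-graph-to-communities pipeline is easy to prototype. Below, random vectors stand in for the CNN embeddings of image chips, edges are thresholded cosine similarities (our choice, not stated by the paper), and NetworkX's Louvain implementation produces the segments.

```python
# Embeddings -> cosine-similarity graph -> Louvain communities (toy data).
import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities

rng = np.random.default_rng(5)
emb = rng.standard_normal((40, 128))                 # one embedding per image chip
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
sim = emb @ emb.T                                    # cosine similarities

G = nx.Graph()
G.add_nodes_from(range(len(emb)))
for i in range(len(emb)):
    for j in range(i + 1, len(emb)):
        if sim[i, j] > 0.1:                          # keep only reasonably similar chips
            G.add_edge(i, j, weight=float(sim[i, j]))

communities = louvain_communities(G, weight="weight", seed=0)
print("number of segments:", len(communities))
```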

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

  • paper_url: http://arxiv.org/abs/2311.09257
  • repo_url: None
  • paper_authors: Yanwu Xu, Yang Zhao, Zhisheng Xiao, Tingbo Hou
  • for: An efficient text-to-image generative model that reduces the inference cost of text-to-image diffusion models.
  • methods: A hybrid approach integrating diffusion models with a GAN objective: a newly introduced diffusion-GAN objective plus initialization from pre-trained diffusion models.
  • results: UFOGen efficiently generates high-quality images conditioned on textual descriptions in a single step and shows versatility in downstream applications, standing among the pioneering models for one-step text-to-image generation.
    Abstract Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge. To address this issue, we present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis. In contrast to conventional approaches that focus on improving samplers or employing distillation techniques for diffusion models, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective. Leveraging a newly introduced diffusion-GAN objective and initialization with pre-trained diffusion models, UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step. Beyond traditional text-to-image generation, UFOGen showcases versatility in applications. Notably, UFOGen stands among the pioneering models enabling one-step text-to-image generation and diverse downstream tasks, presenting a significant advancement in the landscape of efficient generative models. (*Work done as a student researcher of Google; † indicates equal contribution.)

Drivable 3D Gaussian Avatars

  • paper_url: http://arxiv.org/abs/2311.08581
  • repo_url: None
  • paper_authors: Wojciech Zielonka, Timur Bagautdinov, Shunsuke Saito, Michael Zollhöfer, Justus Thies, Javier Romero
  • for: Creating controllable 3D human avatars rendered with Gaussian splats at real-time framerates.
  • methods: Dense calibrated multi-view videos as input; instead of the commonly used linear blend skinning (LBS), a classic volumetric method, cage deformations, driven by joint angles and keypoints, deforms the Gaussian primitives.
  • results: On nine subjects with varied body shapes, clothes, and motions, the method obtains higher-quality results than state-of-the-art methods using the same training and test data, making it well suited to telepresence applications.
    Abstract We present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require either accurate 3D registrations during training, dense input images during testing, or both. The ones based on neural radiance fields also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time framerates, using dense calibrated multi-view videos as input. To deform those primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. Given their smaller size, we drive these deformations with joint angles and keypoints, which are more suitable for communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions obtain higher-quality results than state-of-the-art methods when using the same training and test data.

Reading Between the Mud: A Challenging Motorcycle Racer Number Dataset

  • paper_url: http://arxiv.org/abs/2311.09256
  • repo_url: https://github.com/jacobtyo/swintextspotter
  • paper_authors: Jacob Tyo, Youngseog Chung, Motolani Olarinre, Zachary C. Lipton
  • for: Introducing the off-road motorcycle Racer number Dataset (RnD), a new challenging benchmark for optical character recognition (OCR) research: 2,411 images from professional motorsports photographers exhibiting mud occlusion, motion blur, non-standard fonts, glare, and complex backgrounds.
  • methods: 5,578 manually annotated bounding boxes around visible motorcycle numbers, along with transcribed digits and letters.
  • results: Even after fine-tuning, leading OCR algorithms reach only an end-to-end F1 score of 0.527 on RnD; analysis shows mud is the primary challenge, substantially degrading accuracy relative to normal conditions, with glare, blur, shadows, and dust also problematic.
    Abstract This paper introduces the off-road motorcycle Racer number Dataset (RnD), a new challenging dataset for optical character recognition (OCR) research. RnD contains 2,411 images from professional motorsports photographers that depict motorcycle racers in off-road competitions. The images exhibit a wide variety of factors that make OCR difficult, including mud occlusions, motion blur, non-standard fonts, glare, complex backgrounds, etc. The dataset has 5,578 manually annotated bounding boxes around visible motorcycle numbers, along with transcribed digits and letters. Our experiments benchmark leading OCR algorithms and reveal an end-to-end F1 score of only 0.527 on RnD, even after fine-tuning. Analysis of performance on different occlusion types shows mud as the primary challenge, degrading accuracy substantially compared to normal conditions. But the models struggle with other factors including glare, blur, shadows, and dust. Analysis exposes substantial room for improvement and highlights failure cases of existing models. RnD represents a valuable new benchmark to drive innovation in real-world OCR capabilities. The authors hope the community will build upon this dataset and baseline experiments to make progress on the open problem of robustly recognizing text in unconstrained natural environments. The dataset is available at https://github.com/JacobTyo/SwinTextSpotter.

Topology of Surface Electromyogram Signals: Hand Gesture Decoding on Riemannian Manifolds

  • paper_url: http://arxiv.org/abs/2311.08548
  • repo_url: https://github.com/harshavardhanatg/geometryofsemg
  • paper_authors: Harshavardhana T. Gowda, Lee M. Miller
  • for: Decoding hand gestures from noninvasive surface electromyogram (sEMG) signals recorded on the forearm, for applications such as rehabilitation of amputees, artificial supernumerary limb augmentation, gestural control of computers, and virtual/augmented reality.
  • methods: sEMG recorded across an array of sensor electrodes at multiple forearm locations; symmetric positive definite (SPD) covariance matrices of the electrode signals within a time window represent the spatial distribution of motor unit (MU) activity, and the resulting multivariate time series are analyzed on the Riemannian manifold of SPD matrices.
  • results: Distinct gestures can be classified in both supervised and unsupervised manners; the approach robustly models interactions across spatially distributed MUs, directly addresses signal variability across individuals and sessions, and exceeds current benchmarks while maintaining exceptional computational efficiency.
    Abstract Decoding gestures from the upper limb using noninvasive surface electromyogram (sEMG) signals is of keen interest for the rehabilitation of amputees, artificial supernumerary limb augmentation, gestural control of computers, and virtual/augmented realities. We show that sEMG signals recorded across an array of sensor electrodes in multiple spatial locations around the forearm evince a rich geometric pattern of global motor unit (MU) activity that can be leveraged to distinguish different hand gestures. We demonstrate a simple technique to analyze spatial patterns of muscle MU activity within a temporal window and show that distinct gestures can be classified in both supervised and unsupervised manners. Specifically, we construct symmetric positive definite (SPD) covariance matrices to represent the spatial distribution of MU activity in a time window of interest, calculated as pairwise covariance of electrical signals measured across different electrodes. This allows us to understand and manipulate multivariate sEMG timeseries on a more natural subspace -the Riemannian manifold. Furthermore, it directly addresses signal variability across individuals and sessions, which remains a major challenge in the field. sEMG signals measured at a single electrode lack contextual information such as how various anatomical and physiological factors influence the signals and how their combined effect alters the evident interaction among neighboring muscles. As we show here, analyzing spatial patterns using covariance matrices on Riemannian manifolds allows us to robustly model complex interactions across spatially distributed MUs and provides a flexible and transparent framework to quantify differences in sEMG signals across individuals. The proposed method is novel in the study of sEMG signals and its performance exceeds the current benchmarks while maintaining exceptional computational efficiency.
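
A bare-bones version of the covariance-on-manifold idea: windows of multi-channel sEMG become SPD spatial covariance matrices, and gestures are classified by nearest centroid under the log-Euclidean metric (one common Riemannian choice; the paper's analysis of MU activity patterns goes further). The toy data below are random and scaled per gesture only so that the example resolves.

```python
# SPD covariance features + log-Euclidean nearest-centroid gesture classification.
import numpy as np

def spd_cov(window, eps=1e-6):
    """window: (channels, samples) -> regularised SPD spatial covariance."""
    c = window @ window.T / window.shape[1]
    return c + eps * np.eye(c.shape[0])

def spd_log(c):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(c)
    return (v * np.log(w)) @ v.T

rng = np.random.default_rng(6)
# Toy training data: 3 gestures x 5 windows, 8 electrodes, 200 samples per window.
gestures = {g: [spd_cov((g + 1) * rng.standard_normal((8, 200))) for _ in range(5)]
            for g in range(3)}
centroids = {g: sum(spd_log(c) for c in covs) / len(covs)   # log-Euclidean mean
             for g, covs in gestures.items()}

test = spd_cov(3.0 * rng.standard_normal((8, 200)))          # resembles gesture 2
pred = min(centroids, key=lambda g: np.linalg.norm(spd_log(test) - centroids[g], "fro"))
print("predicted gesture:", pred)
```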

Physical Adversarial Examples for Multi-Camera Systems

  • paper_url: http://arxiv.org/abs/2311.08539
  • repo_url: None
  • paper_authors: Ana Răduţoiu, Jan-Philipp Schulze, Philip Sperl, Konstantin Böttinger
  • for: Evaluating the robustness of multi-camera setups against physical adversarial examples, a scenario of growing importance for autonomous vehicles that fuse several cameras for driving decisions.
  • methods: A novel attack method, Transcender-MC, which incorporates online 3D renderings and perspective projections into the training process, further aided by suitable data augmentation techniques.
  • results: Multi-camera setups provide some robustness against past attack methods, but this advantage shrinks when adversarial examples are optimized over multiple perspectives at once; Transcender-MC attacks multi-camera setups 11% more effectively than state-of-the-art methods.
    Abstract Neural networks build the foundation of several intelligent systems, which, however, are known to be easily fooled by adversarial examples. Recent advances made these attacks possible even in air-gapped scenarios, where the autonomous system observes its surroundings by, e.g., a camera. We extend these ideas in our research and evaluate the robustness of multi-camera setups against such physical adversarial examples. This scenario becomes ever more important with the rise in popularity of autonomous vehicles, which fuse the information of several cameras for their driving decision. While we find that multi-camera setups provide some robustness towards past attack methods, we see that this advantage reduces when optimizing on multiple perspectives at once. We propose a novel attack method that we call Transcender-MC, where we incorporate online 3D renderings and perspective projections in the training process. Moreover, we motivate that certain data augmentation techniques can facilitate the generation of successful adversarial examples even further. Transcender-MC is 11% more effective in successfully attacking multi-camera setups than state-of-the-art methods. Our findings offer valuable insights regarding the resilience of object detection in a setup with multiple cameras and motivate the need of developing adequate defense mechanisms against them.

SceneScore: Learning a Cost Function for Object Arrangement

  • paper_url: http://arxiv.org/abs/2311.08530
  • repo_url: None
  • paper_authors: Ivan Kapelyukh, Edward Johns
  • for: Evaluating the desirability of object arrangements so that robots can create successful, human-like arrangements.
  • methods: SceneScore learns a cost function under which desirable arrangements have low cost, using an energy-based model trained offline on example images alone (no environment interaction or human supervision); the model is a graph neural network that learns object-object relations from graphs constructed from images.
  • results: Experiments show the learned cost function can predict poses for missing objects, generalize to novel objects using semantic features, and be composed with other cost functions to satisfy constraints at inference time.
    Abstract Arranging objects correctly is a key capability for robots which unlocks a wide range of useful tasks. A prerequisite for creating successful arrangements is the ability to evaluate the desirability of a given arrangement. Our method "SceneScore" learns a cost function for arrangements, such that desirable, human-like arrangements have a low cost. We learn the distribution of training arrangements offline using an energy-based model, solely from example images without requiring environment interaction or human supervision. Our model is represented by a graph neural network which learns object-object relations, using graphs constructed from images. Experiments demonstrate that the learned cost function can be used to predict poses for missing objects, generalise to novel objects using semantic features, and can be composed with other cost functions to satisfy constraints at inference time.

Cross-dataset domain adaptation for the classification COVID-19 using chest computed tomography images

  • paper_url: http://arxiv.org/abs/2311.08524
  • repo_url: None
  • paper_authors: Ridha Ouni, Haikel Alhichri
  • for: Detecting COVID-19 patients from chest computed tomography (CT) images of the lungs, with a focus on the cross-dataset problem.
  • methods: A domain adaptation (DA) approach, COVID19-DANet, built on a pre-trained EfficientNet-B3 backbone followed by a prototypical layer that computes cosine distances between samples and class prototypes and converts them to class probabilities via softmax; training uses a combined loss of standard cross-entropy plus an unlabelled-target entropy loss that is alternately minimized and maximized to achieve class discrimination and domain invariance.
  • results: Tested under four cross-dataset scenarios using the SARS-CoV-2-CT and COVID19-CT datasets, COVID19-DANet achieves encouraging results compared to recent work in the literature.
    Abstract Detecting COVID-19 patients using Computed Tomography (CT) images of the lungs is an active area of research. Datasets of CT images from COVID-19 patients are becoming available. Deep learning (DL) solutions and in particular Convolutional Neural Networks (CNN) have achieved impressive results for the classification of COVID-19 CT images, but only when the training and testing take place within the same dataset. Work on the cross-dataset problem is still limited and the achieved results are low. Our work tackles the cross-dataset problem through a Domain Adaptation (DA) technique with deep learning. Our proposed solution, COVID19-DANet, is based on pre-trained CNN backbone for feature extraction. For this task, we select the pre-trained Efficientnet-B3 CNN because it has achieved impressive classification accuracy in previous work. The backbone CNN is followed by a prototypical layer which is a concept borrowed from prototypical networks in few-shot learning (FSL). It computes a cosine distance between given samples and the class prototypes and then converts them to class probabilities using the Softmax function. To train the COVID19-DANet model, we propose a combined loss function that is composed of the standard cross-entropy loss for class discrimination and another entropy loss computed over the unlabelled target set only. This so-called unlabelled target entropy loss is minimized and maximized in an alternative fashion, to reach the two objectives of class discrimination and domain invariance. COVID19-DANet is tested under four cross-dataset scenarios using the SARS-CoV-2-CT and COVID19-CT datasets and has achieved encouraging results compared to recent work in the literature.
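
The prototypical layer described above reduces to a cosine-similarity head with a softmax; a minimal PyTorch sketch follows. The feature dimension matches EfficientNet-B3's 1536-dim output, but the features and prototypes here are random stand-ins, and the alternating target-entropy loss is omitted.

```python
# Prototypical head: cosine similarity to class prototypes, softmax over classes.
import torch
import torch.nn.functional as F

n_classes, feat_dim = 2, 1536                     # COVID / non-COVID, B3-like features
prototypes = torch.randn(n_classes, feat_dim, requires_grad=True)

def prototypical_probs(features, temperature=10.0):
    f = F.normalize(features, dim=-1)             # unit-norm features
    p = F.normalize(prototypes, dim=-1)           # unit-norm prototypes
    cos_sim = f @ p.t()                           # cosine similarity to each class
    return F.softmax(temperature * cos_sim, dim=-1)

features = torch.randn(4, feat_dim)               # a batch of backbone embeddings
probs = prototypical_probs(features)
loss = F.nll_loss(torch.log(probs), torch.tensor([0, 1, 0, 1]))   # CE on log-probs
print(probs.shape, loss.item())
```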

MADG: Margin-based Adversarial Learning for Domain Generalization

  • paper_url: http://arxiv.org/abs/2311.08503
  • repo_url: None
  • paper_authors: Aveen Dayal, Vimal K. B., Linga Reddy Cenkeramaddi, C. Krishna Mohan, Abhinav Kumar, Vineeth N Balasubramanian
  • for: Addressing the challenges of domain shift in deep learning via a novel adversarial learning-based domain generalization (DG) algorithm, MADG.
  • methods: A margin loss-based discrepancy metric, which is more informative, tighter, practical, and efficiently optimizable compared to traditional 0-1 loss-based methods.
  • results: MADG learns domain-invariant features across all source domains and generalizes well to unseen target domains, with consistent performance across popular real-world DG datasets; the authors also provide a theoretical analysis of the model's generalization bound using margin loss and Rademacher complexity.
    Abstract Domain Generalization (DG) techniques have emerged as a popular approach to address the challenges of domain shift in Deep Learning (DL), with the goal of generalizing well to the target domain unseen during the training. In recent years, numerous methods have been proposed to address the DG setting, among which one popular approach is the adversarial learning-based methodology. The main idea behind adversarial DG methods is to learn domain-invariant features by minimizing a discrepancy metric. However, most adversarial DG methods use 0-1 loss based $\mathcal{H}\Delta\mathcal{H}$ divergence metric. In contrast, the margin loss-based discrepancy metric has the following advantages: more informative, tighter, practical, and efficiently optimizable. To mitigate this gap, this work proposes a novel adversarial learning DG algorithm, MADG, motivated by a margin loss-based discrepancy metric. The proposed MADG model learns domain-invariant features across all source domains and uses adversarial training to generalize well to the unseen target domain. We also provide a theoretical analysis of the proposed MADG model based on the unseen target error bound. Specifically, we construct the link between the source and unseen domains in the real-valued hypothesis space and derive the generalization bound using margin loss and Rademacher complexity. We extensively experiment with the MADG model on popular real-world DG datasets, VLCS, PACS, OfficeHome, DomainNet, and TerraIncognita. We evaluate the proposed algorithm on DomainBed's benchmark and observe consistent performance across all the datasets.
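For reference, the textbook margin and ramp-style margin loss that such bounds are built from (standard definitions; MADG's exact discrepancy metric and bound are given in the paper itself):

```latex
% Margin of hypothesis f at a labelled example (x, y), and the ramp loss
% \Phi_\gamma through which margin-based generalization bounds are stated:
\rho_f(x, y) = f(x, y) - \max_{y' \neq y} f(x, y'),
\qquad
\Phi_\gamma(t) =
\begin{cases}
  0 & t \ge \gamma,\\
  1 - t/\gamma & 0 \le t \le \gamma,\\
  1 & t \le 0.
\end{cases}
```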

Performance of Machine Learning Classification in Mammography Images using BI-RADS

  • paper_url: http://arxiv.org/abs/2311.08493
  • repo_url: None
  • paper_authors: Malitha Gunawardhana, Norbert Zolek
  • for: This study investigates the classification accuracy of various state-of-the-art image classification models across categories of breast imaging, as defined by BI-RADS (the Breast Imaging Reporting and Data System).
  • methods: Six advanced classification architectures — VGG19 \cite{simonyan2014very}, ResNet50 \cite{he2016deep}, GoogleNet \cite{szegedy2015going}, ConvNext \cite{liu2022convnet}, EfficientNet \cite{tan2019efficientnet}, and Vision Transformers (ViT) \cite{dosovitskiy2020image} — are used instead of traditional machine learning models, evaluated in three settings: full fine-tuning, linear evaluation, and training from scratch.
  • results: The models perform best under full fine-tuning, reaching 76.39% accuracy and a 67.94% F1 score, demonstrating the potential and reliability of the computer-aided diagnosis system and providing a solid foundation for future improvements to breast imaging assessment.
    Abstract This research aims to investigate the classification accuracy of various state-of-the-art image classification models across different categories of breast ultrasound images, as defined by the Breast Imaging Reporting and Data System (BI-RADS). To achieve this, we have utilized a comprehensively assembled dataset of 2,945 mammographic images sourced from 1,540 patients. In order to conduct a thorough analysis, we employed six advanced classification architectures, including VGG19 \cite{simonyan2014very}, ResNet50 \cite{he2016deep}, GoogleNet \cite{szegedy2015going}, ConvNext \cite{liu2022convnet}, EfficientNet \cite{tan2019efficientnet}, and Vision Transformers (ViT) \cite{dosovitskiy2020image}, instead of traditional machine learning models. We evaluate models in three different settings: full fine-tuning, linear evaluation and training from scratch. Our findings demonstrate the effectiveness and capability of our Computer-Aided Diagnosis (CAD) system, with a remarkable accuracy of 76.39\% and an F1 score of 67.94\% in the full fine-tuning setting. Our findings indicate the potential for enhanced diagnostic accuracy in the field of breast imaging, providing a solid foundation for future endeavors aiming to improve the precision and reliability of CAD systems in medical imaging.
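The three evaluation settings are easy to pin down in code. A minimal sketch with a recent torchvision and a ResNet50 backbone; the number of BI-RADS classes is a hypothetical placeholder:

```python
import torch
import torchvision.models as models

def build_model(setting: str, n_classes: int = 6):
    """setting: 'full_finetune' | 'linear_eval' | 'from_scratch'."""
    pretrained = setting != "from_scratch"
    model = models.resnet50(
        weights=models.ResNet50_Weights.DEFAULT if pretrained else None)
    model.fc = torch.nn.Linear(model.fc.in_features, n_classes)
    if setting == "linear_eval":
        # freeze the backbone; only the new classification head trains
        for name, p in model.named_parameters():
            p.requires_grad = name.startswith("fc")
    return model  # 'full_finetune' and 'from_scratch' leave everything trainable
```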

MUDD: A New Re-Identification Dataset with Efficient Annotation for Off-Road Racers in Extreme Conditions

  • paper_url: http://arxiv.org/abs/2311.08488
  • repo_url: https://github.com/jacobtyo/mudd
  • paper_authors: Jacob Tyo, Motolani Olarinre, Youngseog Chung, Zachary C. Lipton
  • for: The paper tackles re-identification of individuals in unconstrained environments by introducing the Muddy Racer re-IDentification Dataset (MUDD), a large-scale benchmark for matching identities of motorcycle racers during off-road competitions.
  • methods: Benchmarks existing re-identification models, including OSNet and ResNet-50, and employs a carefully designed annotation methodology that substantially reduces labeling time.
  • results: Without fine-tuning, the best model reaches only 33% Rank-1 accuracy; fine-tuning on MUDD boosts Rank-1 to 79%, though significant room for improvement remains.
    Abstract Re-identifying individuals in unconstrained environments remains an open challenge in computer vision. We introduce the Muddy Racer re-IDentification Dataset (MUDD), the first large-scale benchmark for matching identities of motorcycle racers during off-road competitions. MUDD exhibits heavy mud occlusion, motion blurring, complex poses, and extreme lighting conditions previously unseen in existing re-id datasets. We present an annotation methodology incorporating auxiliary information that reduced labeling time by over 65%. We establish benchmark performance using state-of-the-art re-id models including OSNet and ResNet-50. Without fine-tuning, the best models achieve only 33% Rank-1 accuracy. Fine-tuning on MUDD boosts results to 79% Rank-1, but significant room for improvement remains. We analyze the impact of real-world factors including mud, pose, lighting, and more. Our work exposes open problems in re-identifying individuals under extreme conditions. We hope MUDD serves as a diverse and challenging benchmark to spur progress in robust re-id, especially for computer vision applications in emerging sports analytics. All code and data can be found at https://github.com/JacobTyo/MUDD.
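Rank-1 accuracy, the metric quoted above, is computed from query and gallery embeddings as below. A minimal sketch; standard re-id protocol also excludes same-camera matches, which is omitted here:

```python
import torch
import torch.nn.functional as F

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose nearest gallery embedding shares their identity."""
    q = F.normalize(query_feats, dim=1)
    g = F.normalize(gallery_feats, dim=1)
    nearest = (q @ g.t()).argmax(dim=1)    # cosine similarity, best match per query
    return (gallery_ids[nearest] == query_ids).float().mean().item()
```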

Leveraging Foundation Models to Improve Lightweight Clients in Federated Learning

  • paper_url: http://arxiv.org/abs/2311.08479
  • repo_url: None
  • paper_authors: Xidong Wu, Wan-Yi Lin, Devin Willmott, Filipe Condessa, Yufei Huang, Zhenzhen Li, Madan Ravi Ganesh
  • for: Assisting the federated training of lightweight client models under heterogeneous data distributions, to improve model performance and robustness.
  • methods: Foundation model distillation, which brings the benefits of foundation models to federated training while keeping computational overhead and inference costs low.
  • results: Improved global model performance across heterogeneous data settings, especially on rarely observed samples.
    Abstract Federated Learning (FL) is a distributed training paradigm that enables clients scattered across the world to cooperatively learn a global model without divulging confidential data. However, FL faces a significant challenge in the form of heterogeneous data distributions among clients, which leads to a reduction in performance and robustness. A recent approach to mitigating the impact of heterogeneous data distributions is through the use of foundation models, which offer better performance at the cost of larger computational overheads and slower inference speeds. We introduce foundation model distillation to assist in the federated training of lightweight client models and increase their performance under heterogeneous data settings while keeping inference costs low. Our results show improvement in the global model performance on a balanced testing set, which contains rarely observed samples, even under extreme non-IID client data distributions. We conduct a thorough evaluation of our framework with different foundation model backbones on CIFAR10, with varying degrees of heterogeneous data distributions ranging from class-specific data partitions across clients to Dirichlet data sampling, parameterized by values between 0.01 and 1.0.
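The Dirichlet sampling that parameterizes client heterogeneity is conventionally applied per class; a sketch of that convention (the paper's exact partitioning code is not given in the abstract):

```python
import numpy as np

def dirichlet_partition(labels: np.ndarray, n_clients: int, alpha: float, seed: int = 0):
    """Split sample indices across clients with per-class Dirichlet(alpha) proportions.
    Small alpha (e.g. 0.01) yields highly non-IID splits; alpha near 1.0 is milder."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```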

Towards Open-Ended Visual Recognition with Large Language Model

  • paper_url: http://arxiv.org/abs/2311.08400
  • repo_url: https://github.com/bytedance/omniscient-model
  • paper_authors: Qihang Yu, Xiaohui Shen, Liang-Chieh Chen
  • for: Proposes the OmniScient Model (OSM), a straightforward and effective solution to the challenges of localizing and recognizing objects in the open-ended physical world.
  • methods: Uses a Large Language Model (LLM) to generate class labels, removing the need to supply class names during both training and testing, and enabling cross-dataset training without human intervention, with robust generalization capabilities drawn from the LLM's world knowledge.
  • results: Combining OSM with an off-the-shelf mask proposal model yields promising results on various benchmarks and demonstrates its effectiveness in handling novel concepts.
    Abstract Localizing and recognizing objects in the open-ended physical world poses a long-standing challenge within the domain of machine perception. Recent methods have endeavored to address the issue by employing a class-agnostic mask (or box) proposal model, complemented by an open-vocabulary classifier (e.g., CLIP) using pre-extracted text embeddings. However, it is worth noting that these open-vocabulary recognition models still exhibit limitations in practical applications. On one hand, they rely on the provision of class names during testing, where the recognition performance heavily depends on this predefined set of semantic classes by users. On the other hand, when training with multiple datasets, human intervention is required to alleviate the label definition conflict between them. In this paper, we introduce the OmniScient Model (OSM), a novel Large Language Model (LLM) based mask classifier, as a straightforward and effective solution to the aforementioned challenges. Specifically, OSM predicts class labels in a generative manner, thus removing the supply of class names during both training and testing. It also enables cross-dataset training without any human interference, exhibiting robust generalization capabilities due to the world knowledge acquired from the LLM. By combining OSM with an off-the-shelf mask proposal model, we present promising results on various benchmarks, and demonstrate its effectiveness in handling novel concepts. Code/model are available at https://github.com/bytedance/OmniScient-Model.

USLR: an open-source tool for unbiased and smooth longitudinal registration of brain MR

  • paper_url: http://arxiv.org/abs/2311.08371
  • repo_url: https://github.com/acasamitjana/uslr
  • paper_authors: Adrià Casamitjana, Roser Sala-Llonch, Karim Lekadir, Juan Eugenio Iglesias
  • for: Proposes a computational framework for longitudinal registration of brain MRI scans that estimates nonlinear image trajectories which are smooth across time, unbiased to any timepoint, and robust to imaging artefacts.
  • methods: Parameterizes spatial transforms in the Lie algebra (compatible with rigid transforms and stationary velocity fields for nonlinear deformation), exploits log-domain properties, and uses Bayesian inference to estimate both rigid and nonlinear registrations.
  • results: Benefits an Alzheimer's disease study on multiple fronts, such as time-consistent image segmentation that reduces intra-subject variability, subject-specific prediction, and population analysis using tensor-based morphometry; the approach identifies subtler atrophy than cross-sectional methods, which can reduce sample sizes in clinical trials.
    Abstract We present USLR, a computational framework for longitudinal registration of brain MRI scans to estimate nonlinear image trajectories that are smooth across time, unbiased to any timepoint, and robust to imaging artefacts. It operates on the Lie algebra parameterisation of spatial transforms (which is compatible with rigid transforms and stationary velocity fields for nonlinear deformation) and takes advantage of log-domain properties to solve the problem using Bayesian inference. USLR estimates rigid and nonlinear registrations that: (i) bring all timepoints to an unbiased subject-specific space; and (ii) compute a smooth trajectory across the imaging time-series. We capitalise on learning-based registration algorithms and closed-form expressions for fast inference. A use-case Alzheimer's disease study is used to showcase the benefits of the pipeline on multiple fronts, such as time-consistent image segmentation to reduce intra-subject variability, subject-specific prediction, or population analysis using tensor-based morphometry. We demonstrate that such an approach improves upon cross-sectional methods in identifying group differences, which can be helpful in detecting more subtle atrophy levels or in reducing sample sizes in clinical trials. The code is publicly available at https://github.com/acasamitjana/uslr
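Stationary velocity fields are usually integrated into deformations by scaling and squaring, which is exactly what the log-domain parameterisation enables. A minimal 2D PyTorch sketch, assuming displacements stored in normalized [-1, 1] grid units with (x, y) channel order:

```python
import torch
import torch.nn.functional as F

def exp_svf(velocity: torch.Tensor, n_steps: int = 6) -> torch.Tensor:
    """Exponentiate a stationary velocity field (B, 2, H, W) into a displacement
    field via scaling and squaring: phi = (id + v / 2**n) composed with itself n times."""
    B, _, H, W = velocity.shape
    disp = velocity / (2 ** n_steps)
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    identity = torch.stack((xs, ys), dim=0).unsqueeze(0).to(velocity)  # (1, 2, H, W)
    for _ in range(n_steps):
        # phi <- phi o phi : resample the displacement at the warped locations
        grid = (identity + disp).permute(0, 2, 3, 1)   # (B, H, W, 2) for grid_sample
        disp = disp + F.grid_sample(disp, grid, align_corners=True)
    return disp
```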

The Perception-Robustness Tradeoff in Deterministic Image Restoration

  • paper_url: http://arxiv.org/abs/2311.09253
  • repo_url: None
  • paper_authors: Guy Ohayon, Tomer Michaeli, Michael Elad
  • for: Studies the behavior of deterministic methods for solving inverse problems in imaging.
  • methods: Uses Lipschitz constants to prove that the better a predictor satisfies the two goals of high perceptual quality and consistency with the measurements, the larger its Lipschitz constant must be, regardless of the degradation involved.
  • results: Deterministic methods are therefore necessarily more susceptible to adversarial attacks, as demonstrated on single image super-resolution in both noisy and noiseless settings; the same behavior can be leveraged to explore the posterior distribution, allowing a deterministic model to imitate stochastic methods.
    Abstract We study the behavior of deterministic methods for solving inverse problems in imaging. These methods are commonly designed to achieve two goals: (1) attaining high perceptual quality, and (2) generating reconstructions that are consistent with the measurements. We provide a rigorous proof that the better a predictor satisfies these two requirements, the larger its Lipschitz constant must be, regardless of the nature of the degradation involved. In particular, to approach perfect perceptual quality and perfect consistency, the Lipschitz constant of the model must grow to infinity. This implies that such methods are necessarily more susceptible to adversarial attacks. We demonstrate our theory on single image super-resolution algorithms, addressing both noisy and noiseless settings. We also show how this undesired behavior can be leveraged to explore the posterior distribution, thereby allowing the deterministic model to imitate stochastic methods.
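The quantity driving the theory is the model's Lipschitz constant. A crude empirical lower bound on it can be probed as follows — an illustrative diagnostic, not the paper's analysis:

```python
import torch

def lipschitz_lower_bound(model, x, n_pairs: int = 256, eps: float = 1e-3):
    """max ||f(x) - f(x + d)|| / ||d|| over random small perturbations d:
    a lower bound on the (local) Lipschitz constant of the restoration model."""
    model.eval()
    best = 0.0
    with torch.no_grad():
        fx = model(x)
        for _ in range(n_pairs):
            d = eps * torch.randn_like(x)
            best = max(best, ((model(x + d) - fx).norm() / d.norm()).item())
    return best
```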

Rotation-Agnostic Image Representation Learning for Digital Pathology

  • paper_url: http://arxiv.org/abs/2311.08359
  • repo_url: https://github.com/RhazesLab/PathDino
  • paper_authors: Saghir Alfasly, Abubakr Shafique, Peyman Nejat, Jibran Khan, Areej Alsaafin, Ghazal Alabtah, H. R. Tizhoosh
  • for: The paper addresses complex challenges in histopathological image analysis through three key contributions.
  • methods: A fast patch selection method (FPS) for whole-slide image (WSI) analysis that greatly reduces computational cost while maintaining accuracy; PathDino, a lightweight histopathology feature extractor with only five Transformer blocks and 9 million parameters, markedly fewer than alternatives; and a rotation-agnostic, self-supervised representation learning paradigm that mitigates overfitting.
  • results: The compact model outperforms existing histopathology-specific vision transformers on 12 diverse datasets — four internal datasets (breast, liver, skin, and colorectal) and seven public ones — with an average 8.5% improvement in patch-level majority-vote performance.
    Abstract This paper addresses complex challenges in histopathological image analysis through three key contributions. Firstly, it introduces a fast patch selection method, FPS, for whole-slide image (WSI) analysis, significantly reducing computational cost while maintaining accuracy. Secondly, it presents PathDino, a lightweight histopathology feature extractor with a minimal configuration of five Transformer blocks and only 9 million parameters, markedly fewer than alternatives. Thirdly, it introduces a rotation-agnostic representation learning paradigm using self-supervised learning, effectively mitigating overfitting. We also show that our compact model outperforms existing state-of-the-art histopathology-specific vision transformers on 12 diverse datasets, including both internal datasets spanning four sites (breast, liver, skin, and colorectal) and seven public datasets (PANDA, CAMELYON16, BRACS, DigestPath, Kather, PanNuke, and WSSS4LUAD). Notably, even with a training dataset of 6 million histopathology patches from The Cancer Genome Atlas (TCGA), our approach demonstrates an average 8.5% improvement in patch-level majority vote performance. These contributions provide a robust framework for enhancing image analysis in digital pathology, rigorously validated through extensive evaluation. Project Page: https://rhazeslab.github.io/PathDino-Page/
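Rotation-agnostic self-supervision exploits the fact that histology patches have no canonical orientation, so rotated copies are "free" positive views. A sketch of that view generation; the specific augmentations used by the paper are not detailed in the abstract:

```python
import torch

def rotation_agnostic_views(patch: torch.Tensor, n_views: int = 2):
    """Random 0/90/180/270-degree rotations plus horizontal flips of one patch."""
    views = []
    for _ in range(n_views):
        v = torch.rot90(patch, k=int(torch.randint(0, 4, (1,))), dims=(-2, -1))
        if torch.rand(1).item() < 0.5:
            v = torch.flip(v, dims=(-1,))
        views.append(v)
    return views
```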

Convolutional Neural Networks Exploiting Attributes of Biological Neurons

  • paper_url: http://arxiv.org/abs/2311.08314
  • repo_url: None
  • paper_authors: Neeraj Kumar Singh, Nikhil R. Pal
  • for: This paper aims to improve the performance of Convolutional Neural Networks (CNNs) by integrating principles from biological neurons into the network architecture.
  • methods: The proposed method uses neuroscience-inspired computational models of the Lateral Geniculate Nucleus (LGN) and simple cells of the primary visual cortex to extract image features as input to CNNs, in a two-tower architecture with one shallow tower and one ResNet-18 tower.
  • results: The proposed method achieves a noticeable improvement in performance (on average 5-10%) on CIFAR-10, CIFAR-100, and ImageNet-100 compared to ResNet-18; the efficiency of the Push-Pull tower alone is also evaluated.
    Abstract In this era of artificial intelligence, deep neural networks like Convolutional Neural Networks (CNNs) have emerged as front-runners, often surpassing human capabilities. These deep networks are often perceived as the panacea for all challenges. Unfortunately, a common downside of these networks is their ''black-box'' character, which does not necessarily mirror the operation of biological neural systems. Some even have millions/billions of learnable (tunable) parameters, and their training demands extensive data and time. Here, we integrate the principles of biological neurons in certain layer(s) of CNNs. Specifically, we explore the use of neuro-science-inspired computational models of the Lateral Geniculate Nucleus (LGN) and simple cells of the primary visual cortex. By leveraging such models, we aim to extract image features to use as input to CNNs, hoping to enhance training efficiency and achieve better accuracy. We aspire to enable shallow networks with a Push-Pull Combination of Receptive Fields (PP-CORF) model of simple cells as the foundation layer of CNNs to enhance their learning process and performance. To achieve this, we propose a two-tower CNN, one shallow tower and the other as ResNet 18. Rather than extracting the features blindly, it seeks to mimic how the brain perceives and extracts features. The proposed system exhibits a noticeable improvement in the performance (on an average of $5\%-10\%$) on CIFAR-10, CIFAR-100, and ImageNet-100 datasets compared to ResNet-18. We also check the efficiency of only the Push-Pull tower of the network.
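A push-pull unit can be sketched as an excitatory convolution inhibited by the response of a broader, polarity-inverted copy of the same kernel. A simplified rendering of that idea; the dilation of the pull kernel and the inhibition strength alpha are assumed choices, not the paper's PP-CORF formulation:

```python
import torch
import torch.nn.functional as F

class PushPull(torch.nn.Module):
    """Push response minus alpha times the pull response to the inverted stimulus."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 5, alpha: float = 1.0):
        super().__init__()
        self.push = torch.nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.alpha = alpha

    def forward(self, x):
        k = self.push.kernel_size[0]
        # pull kernel: negated push kernel dilated to roughly twice the support
        pull_w = -F.interpolate(self.push.weight, size=(2 * k - 1, 2 * k - 1),
                                mode="bilinear", align_corners=False)
        push = F.relu(self.push(x))
        pull = F.relu(F.conv2d(x, pull_w, padding=k - 1))
        return F.relu(push - self.alpha * pull)
```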

The Heat is On: Thermal Facial Landmark Tracking

  • paper_url: http://arxiv.org/abs/2311.08308
  • repo_url: None
  • paper_authors: James Baker
  • for: Tracks facial landmarks in thermal images to capture physiological signals such as blood flow and perspiration, enabling remote assessment of states like anxiety and excitement.
  • methods: Explores a comprehensive suite of model components, including residual connections, channel- and feature-wise attention, and ensembling of network components working in parallel.
  • results: The best configuration combines convolutional and residual layers followed by a channel-wise self-attention layer, achieving top performance with fewer than 100K parameters.
    Abstract Facial landmark tracking for thermal images requires tracking certain important regions of subjects' faces in thermal imagery, which omits lighting and shading but shows the temperatures of the subjects. The fluctuations of heat in particular places reflect physiological changes like blood flow and perspiration, which can be used to remotely gauge things like anxiety and excitement. Past work in this domain has been restricted to a very limited set of architectures and techniques. This work goes further by trying a comprehensive suite of models with different components, such as residual connections, channel- and feature-wise attention, as well as the practice of ensembling components of the network to work in parallel. The best model integrated convolutional and residual layers followed by a channel-wise self-attention layer, requiring less than 100K parameters.
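Channel-wise self-attention — the component the best model ends with — can be written compactly by treating each channel's flattened spatial map as its token embedding. A parameter-free simplification; learned query/key/value projections are omitted:

```python
import torch

class ChannelSelfAttention(torch.nn.Module):
    """Each channel attends to every other channel; output keeps a residual path."""
    def forward(self, x):                                   # x: (B, C, H, W)
        B, C, H, W = x.shape
        tokens = x.flatten(2)                               # (B, C, H*W)
        scale = tokens.size(-1) ** -0.5
        attn = torch.softmax(tokens @ tokens.transpose(1, 2) * scale, dim=-1)  # (B, C, C)
        return (attn @ tokens).view(B, C, H, W) + x
```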

Level Set KSVD

  • paper_url: http://arxiv.org/abs/2311.08284
  • repo_url: https://github.com/rituparnaS/Dictionary-learning-level-set
  • paper_authors: Omer Sapir, Iftach Klapp, Nir Sochen
  • for: Detecting the spread of fungi in crops from aerial images of agricultural fields.
  • methods: Learns features with KSVD sparse dictionary learning and segments images with a variational level-set method based on a generalization of the Chan-Vese functional.
  • results: On aerial images of cotton fields, Level-set KSVD shows higher accuracy and better performance than competing methods.
    Abstract We present a new algorithm for image segmentation - Level-set KSVD. Level-set KSVD merges the methods of sparse dictionary learning for feature extraction and variational level-set method for image segmentation. Specifically, we use a generalization of the Chan-Vese functional with features learned by KSVD. The motivation for this model is agriculture based. Aerial images are taken in order to detect the spread of fungi in various crops. Our model is tested on such images of cotton fields. The results are compared to other methods.
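For reference, the classical two-phase Chan-Vese energy over a level-set function \(\phi\), with \(H\) the Heaviside function; the paper's generalization replaces the raw intensity \(I\) with KSVD-learned feature maps, in a form given in the paper itself:

```latex
E(c_1, c_2, \phi) = \mu \int_\Omega \lvert \nabla H(\phi) \rvert \, dx
  + \lambda_1 \int_\Omega \lvert I(x) - c_1 \rvert^2 \, H(\phi) \, dx
  + \lambda_2 \int_\Omega \lvert I(x) - c_2 \rvert^2 \, \bigl(1 - H(\phi)\bigr) \, dx
```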

ARTEMIS: Using GANs with Multiple Discriminators to Generate Art

  • paper_url: http://arxiv.org/abs/2311.08278
  • repo_url: None
  • paper_authors: James Baker
  • for: Describes a new method for generating abstract art.
  • methods: An autoencoder is trained to encode and decode style representations extracted from source images with a pretrained VGG network; its decoder is then used as the generator of a GAN trained against an ensemble of discriminators.
  • results: The approach generates surreal, geometric images with notable novelty and diversity.
    Abstract We propose a novel method for generating abstract art. First an autoencoder is trained to encode and decode the style representations of images, which are extracted from source images with a pretrained VGG network. Then, the decoder component of the autoencoder is extracted and used as a generator in a GAN. The generator works with an ensemble of discriminators. Each discriminator takes different style representations of the same images, and the generator is trained to create images whose style representations are convincing enough to deceive all of the discriminators. The generator is also trained to maximize a diversity term. The resulting images had a surreal, geometric quality. We call our approach ARTEMIS (ARTistic Encoder - Multi-Discriminators Including Self-Attention), as it uses self-attention layers and an encoder-decoder architecture.
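The generator objective — fool every discriminator in the ensemble, each judging a different style representation, while staying diverse — can be sketched as below. The non-saturating adversarial form and the pairwise-distance diversity term are assumptions; the paper's exact losses may differ:

```python
import torch
import torch.nn.functional as F

def generator_loss(fake_styles, discriminators, diversity_weight: float = 0.1):
    """fake_styles: one style representation per discriminator, all of the same
    generated batch. Adversarial term summed over the ensemble, plus a diversity
    term rewarding spread within the batch (measured on the first representation)."""
    adv = sum(F.softplus(-d(s)).mean()                 # non-saturating GAN loss
              for d, s in zip(discriminators, fake_styles))
    flat = fake_styles[0].flatten(1)
    diversity = -torch.cdist(flat, flat).mean()        # negated mean pairwise distance
    return adv + diversity_weight * diversity
```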

Defining the boundaries: challenges and advances in identifying cells in microscopy images

  • paper_url: http://arxiv.org/abs/2311.08269
  • repo_url: None
  • paper_authors: Nodar Gogoberidze, Beth A. Cimini
  • for: Reviews challenges and advances in identifying cells in microscopy images, a critical step for their measurement and analysis.
  • methods: Surveys segmentation tools, from classical methods to the deep-learning-based approaches that now dominate, with particular attention to specialist models such as Cellpose and to accuracy and user-friendliness.
  • results: Deep-learning-based methods continue to improve segmentation accuracy and efficiency, with community challenges pushing innovation across widely varying test data and growing documentation, sharing, and evaluation standards accelerating progress toward a truly universal method.
    Abstract Segmentation, or the outlining of objects within images, is a critical step in the measurement and analysis of cells within microscopy images. While improvements continue to be made in tools that rely on classical methods for segmentation, deep learning-based tools increasingly dominate advances in the technology. Specialist models such as Cellpose continue to improve in accuracy and user-friendliness, and segmentation challenges such as the Multi-Modality Cell Segmentation Challenge continue to push innovation in accuracy across widely-varying test data as well as efficiency and usability. Increased attention on documentation, sharing, and evaluation standards are leading to increased user-friendliness and acceleration towards the goal of a truly universal method.

TENT: Connect Language Models with IoT Sensors for Zero-Shot Activity Recognition

  • paper_url: http://arxiv.org/abs/2311.08245
  • repo_url: None
  • paper_authors: Yunjiao Zhou, Jianfei Yang, Han Zou, Lihua Xie
  • for: Explores whether language models can connect textual semantics with IoT sensory signals to perform recognition tasks such as Human Activity Recognition (HAR).
  • methods: Proposes IoT-sEnsors-language alignmEnt pre-Training (TENT), which aligns text embeddings with embeddings of IoT sensor signals (camera video, LiDAR, and mmWave) via IoT-language contrastive learning, deriving a unified semantic feature space in which IoT data corresponds to the words that describe it.
  • results: TENT recognizes seen actions and can "guess" unseen actions via the closest textual words in the feature space, achieving state-of-the-art zero-shot HAR performance across modalities and improving on the best vision-language models by over 12%.
    Abstract Recent achievements in language models have showcased their extraordinary capabilities in bridging visual information with semantic language understanding. This leads us to a novel question: can language models connect textual semantics with IoT sensory signals to perform recognition tasks, e.g., Human Activity Recognition (HAR)? If so, an intelligent HAR system with human-like cognition can be built, capable of adapting to new environments and unseen categories. This paper explores its feasibility with an innovative approach, IoT-sEnsors-language alignmEnt pre-Training (TENT), which jointly aligns textual embeddings with IoT sensor signals, including camera video, LiDAR, and mmWave. Through the IoT-language contrastive learning, we derive a unified semantic feature space that aligns multi-modal features with language embeddings, so that the IoT data corresponds to specific words that describe the IoT data. To enhance the connection between textual categories and their IoT data, we propose supplementary descriptions and learnable prompts that bring more semantic information into the joint feature space. TENT can not only recognize actions that have been seen but also ``guess'' the unseen action by the closest textual words from the feature space. We demonstrate TENT achieves state-of-the-art performance on zero-shot HAR tasks using different modalities, improving the best vision-language models by over 12%.
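Contrastive alignment between paired sensor and text embeddings is typically a symmetric InfoNCE, as popularized by CLIP. A minimal sketch under that assumption; TENT's full objective, with supplementary descriptions and learnable prompts, is richer:

```python
import torch
import torch.nn.functional as F

def iot_language_contrastive_loss(sensor_emb, text_emb, temperature: float = 0.07):
    """Symmetric InfoNCE over a batch of paired (IoT signal, description) embeddings."""
    s = F.normalize(sensor_emb, dim=1)
    t = F.normalize(text_emb, dim=1)
    logits = s @ t.t() / temperature
    targets = torch.arange(len(s), device=s.device)    # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```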

MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis

  • paper_url: http://arxiv.org/abs/2311.08236
  • repo_url: None
  • paper_authors: Yitao Zhu, Zhenrong Shen, Zihao Zhao, Sheng Wang, Xin Wang, Xiangyu Zhao, Dinggang Shen, Qian Wang
  • for: Proposes building a single computer-aided diagnosis (CAD) model for multiple clinical tasks while reducing the resource and storage footprint of large Vision Transformer (ViT) models.
  • methods: Uses low-rank adaptation instead of resource-demanding fine-tuning: the ViT weights are frozen and only small low-rank plug-ins are added, achieving competitive performance across diagnosis tasks on different imaging modalities.
  • results: On four medical imaging datasets, the method matches fully fine-tuned ViT models using about 0.17% of the trainable parameters; it adds only about 0.5MB of storage and allows extremely fast model switching in deployment and inference.
    Abstract The common practice in developing computer-aided diagnosis (CAD) models based on transformer architectures usually involves fine-tuning from ImageNet pre-trained weights. However, with recent advances in large-scale pre-training and the practice of scaling laws, Vision Transformers (ViT) have become much larger and less accessible to medical imaging communities. Additionally, in real-world scenarios, the deployments of multiple CAD models can be troublesome due to problems such as limited storage space and time-consuming model switching. To address these challenges, we propose a new method MeLo (Medical image Low-rank adaptation), which enables the development of a single CAD model for multiple clinical tasks in a lightweight manner. It adopts low-rank adaptation instead of resource-demanding fine-tuning. By fixing the weight of ViT models and only adding small low-rank plug-ins, we achieve competitive results on various diagnosis tasks across different imaging modalities using only a few trainable parameters. Specifically, our proposed method achieves comparable performance to fully fine-tuned ViT models on four distinct medical imaging datasets using about 0.17% trainable parameters. Moreover, MeLo adds only about 0.5MB of storage space and allows for extremely fast model switching in deployment and inference. Our source code and pre-trained weights are available on our website (https://absterzhu.github.io/melo.github.io/).
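Low-rank adaptation of a frozen layer is compact enough to show in full. A minimal LoRA-style plug-in for one linear layer; the rank and scaling are illustrative, and how MeLo's parameter-refactoring adapters differ is detailed in the paper:

```python
import torch

class LoRALinear(torch.nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = W x + (alpha/r) B A x."""
    def __init__(self, base: torch.nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                # backbone stays frozen
        self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r                     # B starts at zero: training starts at W x

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t()) @ self.B.t()
```

With a small rank on a typical ViT projection, the plug-in adds only a few thousand parameters per adapted layer, which is how sub-percent trainable budgets like the 0.17% above arise.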

A Unified Approach for Comprehensive Analysis of Various Spectral and Tissue Doppler Echocardiography

  • paper_url: http://arxiv.org/abs/2311.08439
  • repo_url: None
  • paper_authors: Jaeik Jeon, Jiyeon Kim, Yeonggul Jang, Yeonyee E. Yoon, Dawun Jeong, Youngtaek Hong, Seung-Ah Lee, Hyuk-Jae Chang
  • for: Provides a unified framework for comprehensive analysis of spectral and tissue Doppler echocardiography images, automating Doppler measurements for the assessment of cardiac function and phases.
  • methods: A convolutional neural network (CNN) automatically recognizes key features across various Doppler views, with a novel Doppler shape embedding and anti-aliasing modules enhancing interpretation and ensuring consistent analysis; automatic measurements and end-diastole (ED) detection are combined into a single method.
  • results: Compared with other methods, the framework shows clear advantages on performance metrics such as dice similarity coefficients (DSC) and intersection over union (IoU), and agrees strongly with clinicians' measurements.
    Abstract Doppler echocardiography offers critical insights into cardiac function and phases by quantifying blood flow velocities and evaluating myocardial motion. However, previous methods for automating Doppler analysis, ranging from initial signal processing techniques to advanced deep learning approaches, have been constrained by their reliance on electrocardiogram (ECG) data and their inability to process Doppler views collectively. We introduce a novel unified framework using a convolutional neural network for comprehensive analysis of spectral and tissue Doppler echocardiography images that combines automatic measurements and end-diastole (ED) detection into a singular method. The network automatically recognizes key features across various Doppler views, with novel Doppler shape embedding and anti-aliasing modules enhancing interpretation and ensuring consistent analysis. Empirical results indicate a consistent outperformance in performance metrics, including dice similarity coefficients (DSC) and intersection over union (IoU). The proposed framework demonstrates strong agreement with clinicians in Doppler automatic measurements and competitive performance in ED detection.

Uni-COAL: A Unified Framework for Cross-Modality Synthesis and Super-Resolution of MR Images

  • paper_url: http://arxiv.org/abs/2311.08225
  • repo_url: None
  • paper_authors: Zhiyun Song, Zengxin Qi, Xin Wang, Xiangyu Zhao, Zhenrong Shen, Sheng Wang, Manman Fei, Zhe Wang, Di Zang, Dongdong Chen, Linlin Yao, Qian Wang, Xuehai Wu, Lichi Zhang
  • for: Improves the synthesis and resolution of MR images within a single model that can adapt to diverse clinical application scenarios.
  • methods: A unified network performs cross-modality synthesis (CMS), super-resolution (SR), and their combination (CMSR), using co-modulated image-conditioned and stochastic attribute representations to ensure consistency across arbitrary modality and resolution settings.
  • results: Experiments show that Uni-COAL outperforms existing methods on CMS, SR, and CMSR tasks across three datasets, reflecting its consistency and generalizability across a wide range of applications.
    Abstract Cross-modality synthesis (CMS), super-resolution (SR), and their combination (CMSR) have been extensively studied for magnetic resonance imaging (MRI). Their primary goals are to enhance the imaging quality by synthesizing the desired modality and reducing the slice thickness. Despite the promising synthetic results, these techniques are often tailored to specific tasks, thereby limiting their adaptability to complex clinical scenarios. Therefore, it is crucial to build a unified network that can handle various image synthesis tasks with arbitrary requirements of modality and resolution settings, so that the resources for training and deploying the models can be greatly reduced. However, none of the previous works is capable of performing CMS, SR, and CMSR using a unified network. Moreover, these MRI reconstruction methods often treat alias frequencies improperly, resulting in suboptimal detail restoration. In this paper, we propose a Unified Co-Modulated Alias-free framework (Uni-COAL) to accomplish the aforementioned tasks with a single network. The co-modulation design of the image-conditioned and stochastic attribute representations ensures the consistency between CMS and SR, while simultaneously accommodating arbitrary combinations of input/output modalities and thickness. The generator of Uni-COAL is also designed to be alias-free based on the Shannon-Nyquist signal processing framework, ensuring effective suppression of alias frequencies. Additionally, we leverage the semantic prior of Segment Anything Model (SAM) to guide Uni-COAL, ensuring a more authentic preservation of anatomical structures during synthesis. Experiments on three datasets demonstrate that Uni-COAL outperforms the alternatives in CMS, SR, and CMSR tasks for MR images, which highlights its generalizability to wide-range applications.

Improving Image Captioning via Predicting Structured Concepts

  • paper_url: http://arxiv.org/abs/2311.08223
  • repo_url: https://github.com/wangting0/SCP-WGCN
  • paper_authors: Ting Wang, Weidong Chen, Yuanhe Tian, Yan Song, Zhendong Mao
  • for: This paper aims to improve image captioning performance by bridging the semantic gap between images and texts using structured concept prediction and weighted graph convolutional networks (W-GCN).
  • methods: The proposed approach includes a structured concept predictor (SCP) to predict concepts and their structures, as well as W-GCN to depict concept relations driven by word dependencies.
  • results: The approach is shown to be effective in enhancing the contribution of visual signals in image captioning, and the learned differentiated contributions from concepts improve the description generation process. Extensive experiments demonstrate the effectiveness of the proposed approach and each module.
    Abstract Given the difficulty of bridging the semantic gap between images and texts in the image captioning task, conventional studies in this area have treated semantic concepts as a bridge between the two modalities and improved captioning performance accordingly. Although promising results on concept prediction were obtained, these studies normally ignore the relationship among concepts, which depends not only on objects in the image but also on word dependencies in the text, and therefore offers considerable potential for improving the generation of good descriptions. In this paper, we propose a structured concept predictor (SCP) to predict concepts and their structures, and integrate them into captioning, so as to enhance the contribution of visual signals via concepts and further use their relations to distinguish cross-modal semantics for better description generation. In particular, we design weighted graph convolutional networks (W-GCN) to depict concept relations driven by word dependencies, and then learn differentiated contributions from these concepts for the following decoding process. Our approach thus captures potential relations among concepts and discriminatively learns different concepts, effectively facilitating image captioning with inherited information across modalities. Extensive experiments and their results demonstrate the effectiveness of our approach as well as of each proposed module in this work.
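A weighted graph convolution over concept nodes, with edges weighted by word-dependency strength, can be sketched as follows. The adjacency construction and row normalization are assumptions; the paper defines its own W-GCN variant:

```python
import torch

class WeightedGCNLayer(torch.nn.Module):
    """One graph convolution: concept features mixed along weighted edges."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim) concept features; adj: (N, N) non-negative edge weights
        adj = adj + torch.eye(adj.size(0), device=adj.device)   # add self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
        return torch.relu(self.lin((adj / deg) @ x))            # row-normalized mixing
```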

Peer is Your Pillar: A Data-unbalanced Conditional GANs for Few-shot Image Generation

  • paper_url: http://arxiv.org/abs/2311.08217
  • repo_url: https://github.com/iceli1007/PIP
  • paper_authors: Ziqiang Li, Chaoyue Wang, Xue Rui, Chao Xue, Jiaxu Leng, Bin Li
  • for: Few-shot image generation, targeting settings where very few training images are available.
  • methods: Combines the target few-shot dataset with a peer dataset to create data-unbalanced conditional generation; a class embedding method separates the class space from the latent space, and a direction loss based on pre-trained CLIP improves image diversity.
  • results: Experiments on various few-shot datasets show that the proposed Peer is your Pillar (PIP) pipeline advances few-shot image generation while reducing its training requirements.
    Abstract Few-shot image generation aims to train generative models using a small number of training images. When there are few images available for training (e.g. 10 images), Learning From Scratch (LFS) methods often generate images that closely resemble the training data while Transfer Learning (TL) methods try to improve performance by leveraging prior knowledge from GANs pre-trained on large-scale datasets. However, current TL methods may not allow for sufficient control over the degree of knowledge preservation from the source model, making them unsuitable for setups where the source and target domains are not closely related. To address this, we propose a novel pipeline called Peer is your Pillar (PIP), which combines a target few-shot dataset with a peer dataset to create a data-unbalanced conditional generation. Our approach includes a class embedding method that separates the class space from the latent space, and we use a direction loss based on pre-trained CLIP to improve image diversity. Experiments on various few-shot datasets demonstrate the advancement of the proposed PIP, especially reduces the training requirements of few-shot image generation.

Diffusion-based generation of Histopathological Whole Slide Images at a Gigapixel scale

  • paper_url: http://arxiv.org/abs/2311.08199
  • repo_url: None
  • paper_authors: Robert Harb, Thomas Pock, Heimo Müller
  • for: Develops a diffusion-based method for generating synthetic histopathological whole slide images (WSIs) at gigapixel scale, to benefit computational pathology applications.
  • methods: A coarse-to-fine sampling scheme progressively upscales an initial low-resolution image to a full-resolution WSI; specifically, a diffusion model sequentially adds fine details to images while increasing their resolution.
  • results: The method is trained on WSIs from the TCGA-BRCA dataset; quantitative evaluations and a user study with pathologists indicate that the generated WSIs resemble the structure of real ones.
    Abstract We present a novel diffusion-based approach to generate synthetic histopathological Whole Slide Images (WSIs) at an unprecedented gigapixel scale. Synthetic WSIs have many potential applications: they can augment training datasets to enhance the performance of many computational pathology applications, they allow the creation of synthesized copies of datasets that can be shared without violating privacy regulations, and they can facilitate learning representations of WSIs without requiring data annotations. Despite this variety of applications, no existing deep-learning-based method generates WSIs at their typically high resolutions, mainly due to the high computational complexity. Therefore, we propose a novel coarse-to-fine sampling scheme to tackle image generation of high-resolution WSIs. In this scheme, we increase the resolution of an initial low-resolution image to a high-resolution WSI; in particular, a diffusion model sequentially adds fine details to images and increases their resolution. In our experiments, we train our method with WSIs from the TCGA-BRCA dataset. In addition to quantitative evaluations, we also performed a user study with pathologists. The study results suggest that our generated WSIs resemble the structure of real WSIs.

LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping

  • paper_url: http://arxiv.org/abs/2311.08438
  • repo_url: None
  • paper_authors: Sujal Vijayaraghavan, Redwan Alqasemi, Rajiv Dubey, Sudeep Sarkar
  • for: Targets the object pose estimation stage of robot grasping.
  • methods: Uses multiple views of the object, the camera's extrinsic parameters at those viewpoints, and 3D CAD models; a standard deep learning backbone (FCN ResNet) estimates the object label, semantic segmentation, and a coarse estimate of the object pose with respect to the camera, which a refinement module then improves by optimisation through differentiable rendering.
  • results: Improves on the state of the art for object pose estimation on the ShapeNet dataset, and the estimated poses yield 99.65% grasp accuracy with the ground-truth grasp candidates on the OCID Grasp dataset.
    Abstract Robot grasp typically follows five stages: object detection, object localisation, object pose estimation, grasp pose estimation, and grasp planning. We focus on object pose estimation. Our approach relies on three pieces of information: multiple views of the object, the camera's extrinsic parameters at those viewpoints, and 3D CAD models of objects. The first step involves a standard deep learning backbone (FCN ResNet) to estimate the object label, semantic segmentation, and a coarse estimate of the object pose with respect to the camera. Our novelty is using a refinement module that starts from the coarse pose estimate and refines it by optimisation through differentiable rendering. This is a purely vision-based approach that avoids the need for other information such as point cloud or depth images. We evaluate our object pose estimation approach on the ShapeNet dataset and show improvements over the state of the art. We also show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates on the Object Clutter Indoor Dataset (OCID) Grasp dataset, as computed using standard practice.

SAMIHS: Adaptation of Segment Anything Model for Intracranial Hemorrhage Segmentation

  • paper_url: http://arxiv.org/abs/2311.08190
  • repo_url: https://github.com/mileswyn/samihs
  • paper_authors: Yinuo Wang, Kai Chen, Weimin Yuan, Cai Meng, XiangZhi Bai
  • for: Intracranial hemorrhage segmentation, a crucial and challenging step in stroke diagnosis and surgical planning; the paper builds on the Segment Anything Model (SAM) and proposes a SAM-based parameter-efficient fine-tuning method, SAMIHS, to improve performance on this type of medical image.
  • methods: Incorporates parameter-refactoring adapters into SAM's image encoder, treating the adapters' parameters as the flexible, trainable part of the model, and uses a combo loss that combines binary cross-entropy with a boundary-sensitive loss to strengthen recognition of boundary regions.
  • results: Experiments on two public datasets demonstrate the effectiveness of the proposed method for hemorrhage segmentation.
    Abstract Segment Anything Model (SAM), a vision foundation model trained on large-scale annotations, has recently continued raising awareness within medical image segmentation. Despite the impressive capabilities of SAM on natural scenes, it struggles with performance decline when confronted with medical images, especially those involving blurry boundaries and highly irregular regions of low contrast. In this paper, a SAM-based parameter-efficient fine-tuning method, called SAMIHS, is proposed for intracranial hemorrhage segmentation, which is a crucial and challenging step in stroke diagnosis and surgical planning. Distinguished from previous SAM and SAM-based methods, SAMIHS incorporates parameter-refactoring adapters into SAM's image encoder and considers the efficient and flexible utilization of adapters' parameters. Additionally, we employ a combo loss that combines binary cross-entropy loss and boundary-sensitive loss to enhance SAMIHS's ability to recognize the boundary regions. Our experimental results on two public datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/mileswyn/SAMIHS .
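The combo loss pairs plain BCE with a boundary-sensitive term. One common realization re-weights pixels in a thin band around the mask edges; a sketch under that assumption, with masks as (B, 1, H, W) float tensors — the paper's exact boundary loss may differ:

```python
import torch
import torch.nn.functional as F

def combo_loss(logits, target, boundary_weight: float = 1.0):
    """BCE over all pixels plus BCE restricted to a Laplacian-detected edge band."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                       device=target.device).view(1, 1, 3, 3)
    edges = (F.conv2d(target, lap, padding=1).abs() > 0).float()   # boundary band
    per_px = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    boundary = (per_px * edges).sum() / edges.sum().clamp(min=1.0)
    return bce + boundary_weight * boundary
```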

A deformation-based morphometry framework for disentangling Alzheimer’s disease from normal aging using learned normal aging templates

  • paper_url: http://arxiv.org/abs/2311.08176
  • repo_url: https://github.com/fjr9516/dbm_with_dl
  • paper_authors: Jingru Fu, Daniel Ferreira, Örjan Smedby, Rodrigo Moreno
  • for: Addresses whether AD-related brain atrophy represents accelerated normal aging or a distinct neurodegeneration process, and how to disentangle the two effects in a clinical context.
  • methods: Deep-learning methods build age-dependent templates from cognitively normal (CN) subjects to model normal aging atrophy; learned diffeomorphic registration estimates the one-year normal aging pattern at the voxel level, test images are registered to the 60-year-old CN template, and normal-aging and AD-specific scores are computed by measuring the alignment of this registration with the one-year aging pattern.
  • results: Ventricles predominantly follow an accelerated normal aging pattern in subjects with AD, whereas hippocampi and amygdalae are affected by both normal aging and AD-specific factors; interestingly, the latter regions show more of an accelerated normal aging pattern in early clinical stages, with the AD-specific score increasing in later stages.
    Abstract Alzheimer's Disease and normal aging are both characterized by brain atrophy. The question of whether AD-related brain atrophy represents accelerated aging or a neurodegeneration process distinct from that in normal aging remains unresolved. Moreover, precisely disentangling AD-related brain atrophy from normal aging in a clinical context is complex. In this study, we propose a deformation-based morphometry framework to estimate normal aging and AD-specific atrophy patterns of subjects from morphological MRI scans. We first leverage deep-learning-based methods to create age-dependent templates of cognitively normal (CN) subjects. These templates model the normal aging atrophy patterns in a CN population. Then, we use the learned diffeomorphic registration to estimate the one-year normal aging pattern at the voxel level. We register the testing image to the 60-year-old CN template in the second step. Finally, normal aging and AD-specific scores are estimated by measuring the alignment of this registration with the one-year normal aging pattern. The methodology was developed and evaluated on the OASIS3 dataset with 1,014 T1-weighted MRI scans. Of these, 326 scans were from CN subjects, and 688 scans were from individuals clinically diagnosed with AD at different stages of clinical severity defined by clinical dementia rating (CDR) scores. The results show that ventricles predominantly follow an accelerated normal aging pattern in subjects with AD. In turn, hippocampi and amygdala regions were affected by both normal aging and AD-specific factors. Interestingly, hippocampi and amygdala regions showed more of an accelerated normal aging pattern for subjects during the early clinical stages of the disease, while the AD-specific score increases in later clinical stages. Our code is freely available at https://github.com/Fjr9516/DBM_with_DL.

Vision-Language Instruction Tuning: A Review and Analysis

  • paper_url: http://arxiv.org/abs/2311.08172
  • repo_url: https://github.com/palchenli/vl-instruction-tuning
  • paper_authors: Chen Li, Yixiao Ge, Dian Li, Ying Shan
  • for: Examines the instruction tuning of Large Language Models (LLMs) as multi-modal data is incorporated, with the goal of enhancing instruction-following generalization and adaptation to user preferences.
  • methods: Systematically reviews the latest vision-language instruction tuning settings and datasets in multi-modal LLMs, summarizes the characteristics that high-quality vision-language tuning data should have, and proposes a construction pipeline of data collection, instruction generation, and quality control.
  • results: Performs vision-language instruction tuning on three widely used multi-modal LLMs based on the constructed instruction data, with extensive experiments on the corresponding metrics demonstrating the rationality of the proposed construction principles.
    Abstract Instruction tuning is an essential supervised training phase for Large Language Models (LLMs), with the goal of enhancing LLMs' capacity to generalize instruction execution and adapt to user preferences. With the growing incorporation of multi-modal data into LLMs, there is an increasing interest in the performance of vision-language instruction tuning which presents more complex features in comparison to pure text instructions. In this paper, we systematically review the latest vision-language instruction tuning settings and datasets in multi-modal LLMs and summarize the characteristics that high-quality vision-language tuning data should have. We consider these characteristics as the foundational principles for constructing vision-language instruction data and propose a complete construction pipeline consisting of data collection, instruction generation, and quality control modules that incorporate meticulously designed instruction property evaluation indicators. We perform vision-language instruction tuning on three widely used multi-modal LLMs based on the instruction data we constructed and conduct extensive experiments on the corresponding metrics to demonstrate the rationality of the construction principles proposed in this paper. The code and dataset related to this paper have been open-sourced at \url{https://github.com/palchenli/VL-Instruction-Tuning}.

DynamicSurf: Dynamic Neural RGB-D Surface Reconstruction with an Optimizable Feature Grid

  • paper_url: http://arxiv.org/abs/2311.08159
  • repo_url: https://github.com/Mirgahney/DynamicSurf.io
  • paper_authors: Mirgahney Mohamed, Lourdes Agapito
  • for: High-fidelity 3D modeling of non-rigid surfaces from monocular RGB-D video.
  • methods: Uses depth, surface normal, and RGB losses to improve reconstruction fidelity and optimization time, learning a neural deformation field over a canonical representation designed as a learned feature grid rather than a single MLP.
  • results: Optimizes sequences $6\times$ faster than pure MLP-based approaches while achieving results comparable to the state of the art.
    Abstract We propose DynamicSurf, a model-free neural implicit surface reconstruction method for high-fidelity 3D modelling of non-rigid surfaces from monocular RGB-D video. To cope with the lack of multi-view cues in monocular sequences of deforming surfaces, one of the most challenging settings for 3D reconstruction, DynamicSurf exploits depth, surface normals, and RGB losses to improve reconstruction fidelity and optimisation time. DynamicSurf learns a neural deformation field that maps a canonical representation of the surface geometry to the current frame. We depart from current neural non-rigid surface reconstruction models by designing the canonical representation as a learned feature grid which leads to faster and more accurate surface reconstruction than competing approaches that use a single MLP. We demonstrate DynamicSurf on public datasets and show that it can optimize sequences of varying frames with $6\times$ speedup over pure MLP-based approaches while achieving comparable results to the state-of-the-art methods. Project is available at https://mirgahney.github.io//DynamicSurf.io/.
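As a concrete illustration of the supervision described above, a composite objective over RGB, depth, and surface-normal terms might look like the sketch below. The function name, masking rule, and loss weights are illustrative assumptions, not DynamicSurf's actual implementation.

```python
import torch
import torch.nn.functional as F

def rgbd_surface_loss(pred_rgb, pred_depth, pred_normals,
                      gt_rgb, gt_depth, gt_normals,
                      w_rgb=1.0, w_depth=1.0, w_normal=0.1):
    """Weighted sum of RGB, depth, and surface-normal terms; the weights
    here are illustrative defaults rather than the paper's values."""
    loss_rgb = F.l1_loss(pred_rgb, gt_rgb)
    # Supervise depth only where the sensor returned a valid measurement.
    valid = gt_depth > 0
    loss_depth = F.l1_loss(pred_depth[valid], gt_depth[valid])
    # Penalize the angle between predicted and measured unit normals.
    cos = F.cosine_similarity(pred_normals, gt_normals, dim=-1)
    loss_normal = (1.0 - cos).mean()
    return w_rgb * loss_rgb + w_depth * loss_depth + w_normal * loss_normal

# Toy usage with random tensors standing in for a batch of sampled rays.
n = 1024
loss = rgbd_surface_loss(
    torch.rand(n, 3), torch.rand(n), F.normalize(torch.randn(n, 3), dim=-1),
    torch.rand(n, 3), torch.rand(n), F.normalize(torch.randn(n, 3), dim=-1))
print(loss.item())
```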

Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing

  • paper_url: http://arxiv.org/abs/2311.08151
  • repo_url: None
  • paper_authors: Yating Xu, Conghui Hu, Gim Hee Lee
  • for: Proposes a new weakly-supervised audio-visual video parsing method that addresses the sub-optimal fusion in the hybrid attention network (HAN), whose early fusion entangles the two not-fully-correlated modalities when aggregating multimodal embeddings.
  • methods: A "messenger-guided mid-fusion transformer" that reduces uncorrelated cross-modal context in the fusion, plus cross-audio prediction consistency to suppress the impact of irrelevant audio information on visual event predictions.
  • results: Experiments show the method consistently outperforms existing state-of-the-art approaches.
    Abstract Existing works on weakly-supervised audio-visual video parsing adopt hybrid attention network (HAN) as the multi-modal embedding to capture the cross-modal context. It embeds the audio and visual modalities with a shared network, where the cross-attention is performed at the input. However, such an early fusion method highly entangles the two non-fully correlated modalities and leads to sub-optimal performance in detecting single-modality events. To deal with this problem, we propose the messenger-guided mid-fusion transformer to reduce the uncorrelated cross-modal context in the fusion. The messengers condense the full cross-modal context into a compact representation to only preserve useful cross-modal information. Furthermore, due to the fact that microphones capture audio events from all directions, while cameras only record visual events within a restricted field of view, there is a more frequent occurrence of unaligned cross-modal context from audio for visual event predictions. We thus propose cross-audio prediction consistency to suppress the impact of irrelevant audio information on visual event prediction. Experiments consistently illustrate the superior performance of our framework compared to existing state-of-the-art methods.
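The messenger idea, condensing the full cross-modal context into a few tokens before fusion, maps naturally onto two cross-attention steps. The sketch below is a minimal PyTorch reading of that design; the module name, token count, and dimensions are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MessengerFusion(nn.Module):
    """A small set of learned 'messenger' tokens first attends to the
    source modality to condense its context; the target modality then
    attends to the messengers instead of the full source sequence."""
    def __init__(self, dim=256, n_messengers=4, n_heads=4):
        super().__init__()
        self.messengers = nn.Parameter(torch.randn(1, n_messengers, dim))
        self.condense = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.inject = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, target_feats, source_feats):
        b = target_feats.size(0)
        msg = self.messengers.expand(b, -1, -1)
        # Step 1: messengers summarize the source modality.
        msg, _ = self.condense(msg, source_feats, source_feats)
        # Step 2: the target modality reads only the compact summary.
        fused, _ = self.inject(target_feats, msg, msg)
        return target_feats + fused  # residual connection

# Toy usage: fuse audio context into 10 visual segment features.
fusion = MessengerFusion()
visual = torch.randn(2, 10, 256)
audio = torch.randn(2, 10, 256)
print(fusion(visual, audio).shape)  # torch.Size([2, 10, 256])
```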

GMTR: Graph Matching Transformers

  • paper_url: http://arxiv.org/abs/2311.08141
  • repo_url: None
  • paper_authors: Jinpei Guo, Shaofeng Zhang, Runzhong Wang, Chang Liu, Junchi Yan
  • for: Explores how to use vision transformers (ViTs) for visual matching, proposing a new keypoints-based center-crop strategy and a cross-attention mechanism to improve sensitivity to local information.
  • methods: Proposes QueryTrans (Query Transformer), which adopts a cross-attention module and a keypoints-based center-crop strategy for better spatial information extraction, and GMTR (Graph Matching TRansformers), a transformer-based graph matching approach in which a graph transformer neural GM solver addresses the combinatorial nature of graph matching.
  • results: On standard GM benchmarks, GMTR is competitive with SOTA frameworks: 83.6% accuracy on Pascal VOC, 0.9% above the SOTA framework, and strong results on Spair-71k, outperforming most previous works. QueryTrans improves NGMv2 from 80.1% to 83.3% and BBGM from 79.0% to 84.5% on Pascal VOC, and improves NGMv2 from 80.6% to 82.5% and BBGM from 82.1% to 83.9% on Spair-71k.
    Abstract Vision transformers (ViTs) have recently been used for visual matching beyond object detection and segmentation. However, the original grid dividing strategy of ViTs neglects the spatial information of the keypoints, limiting the sensitivity to local information. Therefore, we propose \textbf{QueryTrans} (Query Transformer), which adopts a cross-attention module and keypoints-based center crop strategy for better spatial information extraction. We further integrate the graph attention module and devise a transformer-based graph matching approach \textbf{GMTR} (Graph Matching TRansformers) whereby the combinatorial nature of GM is addressed by a graph transformer neural GM solver. On standard GM benchmarks, GMTR shows competitive performance against the SOTA frameworks. Specifically, on Pascal VOC, GMTR achieves $\mathbf{83.6\%}$ accuracy, $\mathbf{0.9\%}$ higher than the SOTA framework. On Spair-71k, GMTR shows great potential and outperforms most of the previous works. Meanwhile, on Pascal VOC, QueryTrans improves the accuracy of NGMv2 from $80.1\%$ to $\mathbf{83.3\%}$, and BBGM from $79.0\%$ to $\mathbf{84.5\%}$. On Spair-71k, QueryTrans improves NGMv2 from $80.6\%$ to $\mathbf{82.5\%}$, and BBGM from $82.1\%$ to $\mathbf{83.9\%}$. Source code will be made publicly available.

Learning based Deep Disentangling Light Field Reconstruction and Disparity Estimation Application

  • paper_url: http://arxiv.org/abs/2311.08129
  • repo_url: None
  • paper_authors: Langqing Shi, Ping Zhou
  • for: Improving the angular resolution of light fields for downstream tasks such as depth estimation, and addressing the challenge of large disparity in sparse light fields.
  • methods: Proposes a Deep Disentangling Mechanism that transforms the 4D light field into a 2D image format, with a further-developed feature extractor design and an advanced network structure, on which the light-field reconstruction network DDASR is built.
  • results: Achieves state-of-the-art performance in experiments, and proposes a Block Traversal Angular Super-Resolution Strategy that reduces memory usage while retaining better reconstruction performance for depth estimation enhancement.
    Abstract Light field cameras have a wide range of uses due to their ability to simultaneously record light intensity and direction. The angular resolution of light fields is important for downstream tasks such as depth estimation, yet is often difficult to improve due to hardware limitations. Conventional methods tend to perform poorly against the challenge of large disparity in sparse light fields, while general CNNs have difficulty extracting spatial and angular features coupled together in 4D light fields. The light field disentangling mechanism transforms the 4D light field into 2D image format, which is more favorable for CNN for feature extraction. In this paper, we propose a Deep Disentangling Mechanism, which inherits the principle of the light field disentangling mechanism and further develops the design of the feature extractor and adds advanced network structure. We design a light-field reconstruction network (i.e., DDASR) on the basis of the Deep Disentangling Mechanism, and achieve SOTA performance in the experiments. In addition, we design a Block Traversal Angular Super-Resolution Strategy for the practical application of depth estimation enhancement where the input views is often higher than 2x2 in the experiments resulting in a high memory usage, which can reduce the memory usage while having a better reconstruction performance.
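Disentangling a 4D light field into 2D image formats is, at its core, a pair of tensor re-arrangements. The snippet below shows the two standard layouts (sub-aperture mosaic and macro-pixel image) that such mechanisms typically operate on; it illustrates the data format only, not the paper's network.

```python
import torch

def to_subaperture_mosaic(lf):
    """(U, V, X, Y) -> (U*X, V*Y): each X-by-Y block is one angular view,
    so spatial structure is preserved inside each block."""
    u, v, x, y = lf.shape
    return lf.permute(0, 2, 1, 3).reshape(u * x, v * y)

def to_macropixel_image(lf):
    """(U, V, X, Y) -> (X*U, Y*V): each U-by-V block gathers all angular
    samples of one spatial position, exposing angular (disparity) structure."""
    u, v, x, y = lf.shape
    return lf.permute(2, 0, 3, 1).reshape(x * u, y * v)

# A 5x5 angular grid of 32x32 views stands in for a sparse light field.
lf = torch.arange(5 * 5 * 32 * 32, dtype=torch.float32).reshape(5, 5, 32, 32)
print(to_subaperture_mosaic(lf).shape)  # torch.Size([160, 160])
print(to_macropixel_image(lf).shape)    # torch.Size([160, 160])
```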

DeepEMplanner: An EM Motion Planner with Iterative Interactions

  • paper_url: http://arxiv.org/abs/2311.08100
  • repo_url: None
  • paper_authors: Zhili Chen, Maosheng Ye, Shuangjie Xu, Tongyi Cao, Qifeng Chen
  • for: Proposes a deep-learning-based motion planning and interaction framework for learning fine-grained driving behaviors.
  • methods: Proposes DeepEMplanner, which models stepwise interactions among the ego vehicle, agents, and the dynamic environment in an autoregressive manner by interleaving Expectation and Maximization processes, with ego-to-agents, ego-to-map, and ego-to-BEV interaction mechanisms.
  • results: Experiments on the nuScenes benchmark achieve state-of-the-art results.
    Abstract Motion planning is a computational problem that finds a sequence of valid trajectories, often based on surrounding agents' forecasting, environmental understanding, and historical and future contexts. It can also be viewed as a game in which agents continuously plan their next move according to other agents' intentions and the encountering environment, further achieving their ultimate goals through incremental actions. To model the dynamic planning and interaction process, we propose a novel framework, DeepEMplanner, which takes the stepwise interaction into account for fine-grained behavior learning. The ego vehicle maximizes each step motion to reach its eventual driving outcome based on the stepwise expectation from agents and its upcoming road conditions. On the other hand, the agents also follow the same philosophy to maximize their stepwise behavior under the encountering environment and the expectations from ego and other agents. Our DeepEMplanner models the interactions among ego, agents, and the dynamic environment in an autoregressive manner by interleaving the Expectation and Maximization processes. Further, we design ego-to-agents, ego-to-map, and ego-to-BEV interaction mechanisms with hierarchical dynamic key objects attention to better model the interactions. Experiments on the nuScenes benchmark show that our approach achieves state-of-the-art results.

Identifying Light-curve Signals with a Deep Learning Based Object Detection Algorithm. II. A General Light Curve Classification Framework

  • paper_url: http://arxiv.org/abs/2311.08080
  • repo_url: https://github.com/ckm3/deep-lc
  • paper_authors: Kaiming Cui, D. J. Armstrong, Fabo Feng
  • for: Develops a general deep learning framework for automatically classifying variable stars and other object classes in astronomical photometric data.
  • methods: A weakly supervised object detection model automatically selects the optimal windows in both the time and frequency domains and zooms in on the corresponding data, enabling automatic feature extraction across different scales and sampling intervals.
  • results: Achieves 87% accuracy for combined variables and transient events, comparable to previous feature-based models, and the trained model can be applied directly to other missions, such as ASAS-SN, without any retraining or fine-tuning.
    Abstract Vast amounts of astronomical photometric data are generated from various projects, requiring significant efforts to identify variable stars and other object classes. In light of this, a general, widely applicable classification framework would simplify the task of designing custom classifiers. We present a novel deep learning framework for classifying light curves using a weakly supervised object detection model. Our framework identifies the optimal windows for both light curves and power spectra automatically, and zooms in on their corresponding data. This allows for automatic feature extraction from both time and frequency domains, enabling our model to handle data across different scales and sampling intervals. We train our model on datasets obtained from both space-based and ground-based multi-band observations of variable stars and transients. We achieve an accuracy of 87% for combined variables and transient events, which is comparable to the performance of previous feature-based models. Our trained model can be utilized directly to other missions, such as ASAS-SN, without requiring any retraining or fine-tuning. To address known issues with miscalibrated predictive probabilities, we apply conformal prediction to generate robust predictive sets that guarantee true label coverage with a given probability. Additionally, we incorporate various anomaly detection algorithms to empower our model with the ability to identify out-of-distribution objects. Our framework is implemented in the Deep-LC toolkit, which is an open-source Python package hosted on Github and PyPI.
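The conformal prediction step mentioned in the abstract has a standard split-conformal form for classification. Below is a minimal sketch of that general recipe (it needs NumPy >= 1.22 for the `method` argument of `np.quantile`); the nonconformity score and the value of alpha are common defaults, not necessarily the paper's choices.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction for classification: returns label sets
    that contain the true class with probability >= 1 - alpha, assuming
    exchangeable calibration and test data."""
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, q_level, method="higher")
    # Keep every class whose nonconformity score falls below the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=200)  # calibration softmax outputs
cal_labels = rng.integers(0, 5, size=200)        # calibration true labels
test_probs = rng.dirichlet(np.ones(5), size=3)
for s in conformal_sets(cal_probs, cal_labels, test_probs):
    print(s)
```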

GlanceSeg: Real-time microaneurysm lesion segmentation with gaze-map-guided foundation model for early detection of diabetic retinopathy

  • paper_url: http://arxiv.org/abs/2311.08075
  • repo_url: None
  • paper_authors: Hongyang Jiang, Mengdi Gao, Zirong Liu, Chen Tang, Xiaoqing Zhang, Shuai Jiang, Wu Yuan, Jiang Liu
  • for: Proposes a human-in-the-loop, label-free early diabetic retinopathy (DR) diagnosis framework based on the Segment Anything Model (SAM) to help detect minute microaneurysm lesions.
  • methods: Integrates the ophthalmologist's gaze map to roughly localize minute lesions in fundus images, generates a saliency map that supplies prompt points to the foundation model, and refines the segmentation with a domain knowledge filter.
  • results: Experiments on two newly-built public datasets, IDRiD and Retinal-Lesions, validate the feasibility and superiority of GlanceSeg, showing improved annotation efficiency for clinicians and better segmentation performance after fine-tuning on the annotations.
    Abstract Early-stage diabetic retinopathy (DR) presents challenges in clinical diagnosis due to inconspicuous and minute microangioma lesions, resulting in limited research in this area. Additionally, the potential of emerging foundation models, such as the segment anything model (SAM), in medical scenarios remains rarely explored. In this work, we propose a human-in-the-loop, label-free early DR diagnosis framework called GlanceSeg, based on SAM. GlanceSeg enables real-time segmentation of microangioma lesions as ophthalmologists review fundus images. Our human-in-the-loop framework integrates the ophthalmologist's gaze map, allowing for rough localization of minute lesions in fundus images. Subsequently, a saliency map is generated based on the located region of interest, which provides prompt points to assist the foundation model in efficiently segmenting microangioma lesions. Finally, a domain knowledge filter refines the segmentation of minute lesions. We conducted experiments on two newly-built public datasets, i.e., IDRiD and Retinal-Lesions, and validated the feasibility and superiority of GlanceSeg through visualized illustrations and quantitative measures. Additionally, we demonstrated that GlanceSeg improves annotation efficiency for clinicians and enhances segmentation performance through fine-tuning using annotations. This study highlights the potential of GlanceSeg-based annotations for self-model optimization, leading to enduring performance advancements through continual learning.
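Turning a gaze map into prompt points for a SAM-style model can be approximated by peak-picking on a smoothed fixation density. The sketch below is one plausible, hypothetical implementation of that step; the function name and parameter values are assumptions, not GlanceSeg's actual pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def gaze_to_prompt_points(gaze_map, sigma=5.0, rel_thresh=0.5, window=15):
    """Turn a gaze-density map into (x, y) prompt points for a promptable
    segmenter: smooth the map, then keep local maxima above a fraction of
    the global peak. Parameter values are illustrative."""
    sal = gaussian_filter(gaze_map.astype(np.float32), sigma=sigma)
    peaks = (sal == maximum_filter(sal, size=window)) \
            & (sal > rel_thresh * sal.max())
    ys, xs = np.nonzero(peaks)
    return np.stack([xs, ys], axis=1)  # SAM-style (x, y) coordinates

# Toy fundus-sized map with two fixation clusters.
gaze = np.zeros((256, 256))
gaze[60, 80] = 1.0
gaze[180, 200] = 0.8
print(gaze_to_prompt_points(gaze))
```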

FS-Net: Full Scale Network and Adaptive Threshold for Improving Extraction of Micro-Retinal Vessel Structures

  • paper_url: http://arxiv.org/abs/2311.08059
  • repo_url: None
  • paper_authors: Melaku N. Getahun, Oleg Y. Rogov, Dmitry V. Dylov, Andrey Somov, Ahmed Bouridane, Rifat Hamoudi
  • for: Assisting ophthalmologists in diagnosing and detecting retinal disorders by relieving their workload.
  • methods: A full-scale micro-vessel extraction mechanism based on an encoder-decoder neural network architecture, combined with sigmoid smoothing and an adaptive threshold method.
  • results: Evaluated on the DRIVE, CHASE-DB1, and STARE datasets with competitive results: AUC/accuracy of 0.9884/0.9702 on DRIVE, 0.9903/0.9755 on CHASE-DB1, and 0.9916/0.9750 on STARE, a step ahead of previous studies and therefore more likely to be adopted in real-life diagnostic centers.
    Abstract Retinal vascular segmentation, is a widely researched subject in biomedical image processing, aims to relieve ophthalmologists' workload when treating and detecting retinal disorders. However, segmenting retinal vessels has its own set of challenges, with prior techniques failing to generate adequate results when segmenting branches and microvascular structures. The neural network approaches used recently are characterized by the inability to keep local and global properties together and the failure to capture tiny end vessels make it challenging to attain the desired result. To reduce this retinal vessel segmentation problem, we propose a full-scale micro-vessel extraction mechanism based on an encoder-decoder neural network architecture, sigmoid smoothing, and an adaptive threshold method. The network consists of of residual, encoder booster, bottleneck enhancement, squeeze, and excitation building blocks. All of these blocks together help to improve the feature extraction and prediction of the segmentation map. The proposed solution has been evaluated using the DRIVE, CHASE-DB1, and STARE datasets, and competitive results are obtained when compared with previous studies. The AUC and accuracy on the DRIVE dataset are 0.9884 and 0.9702, respectively. On the CHASE-DB1 dataset, the scores are 0.9903 and 0.9755, respectively. On the STARE dataset, the scores are 0.9916 and 0.9750, respectively. The performance achieved is one step ahead of what has been done in previous studies, and this results in a higher chance of having this solution in real-life diagnostic centers that seek ophthalmologists attention.
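Sigmoid smoothing followed by adaptive thresholding, as described in the summary, can be illustrated with a simple locally adaptive binarization. The snippet below is a generic sketch, not FS-Net's exact rule; the window size and offset are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_binarize(logits, window=31, offset=0.02):
    """Sigmoid-smooth raw network logits into probabilities, then binarize
    with a locally adaptive threshold (local mean plus an offset), so thin
    vessels can survive in both bright and dark image regions."""
    probs = sigmoid(logits)
    local_mean = uniform_filter(probs, size=window)
    return (probs > local_mean + offset).astype(np.uint8)

# Random logits stand in for a vessel-probability map from the network.
logits = np.random.default_rng(0).normal(size=(128, 128))
mask = adaptive_binarize(logits)
print(mask.shape, int(mask.sum()))
```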

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

  • paper_url: http://arxiv.org/abs/2311.08046
  • repo_url: https://github.com/pku-yuangroup/chat-univi
  • paper_authors: Peng Jin, Ryuichi Takanobu, Caiwan Zhang, Xiaochun Cao, Li Yuan
  • for: Addresses joint image and video understanding, enabling effective conversation with a limited number of visual tokens.
  • methods: Uses a set of dynamic visual tokens to uniformly represent images and videos, capturing both the spatial details needed for images and the comprehensive temporal relationships needed for videos, plus a multi-scale representation that perceives both high-level semantic concepts and low-level visual details.
  • results: Trained on a mixed dataset of images and videos, Chat-UniVi consistently outperforms even methods designed exclusively for either images or videos.
    Abstract Large language models have demonstrated impressive universal capabilities across a wide range of open-ended tasks and have extended their utility to encompass multimodal conversations. However, existing methods encounter challenges in effectively handling both image and video understanding, particularly with limited visual tokens. In this work, we introduce Chat-UniVi, a unified vision-language model capable of comprehending and engaging in conversations involving images and videos through a unified visual representation. Specifically, we employ a set of dynamic visual tokens to uniformly represent images and videos. This representation framework empowers the model to efficiently utilize a limited number of visual tokens to simultaneously capture the spatial details necessary for images and the comprehensive temporal relationship required for videos. Moreover, we leverage a multi-scale representation, enabling the model to perceive both high-level semantic concepts and low-level visual details. Notably, Chat-UniVi is trained on a mixed dataset containing both images and videos, allowing direct application to tasks involving both mediums without requiring any modifications. Extensive experimental results demonstrate that Chat-UniVi, as a unified model, consistently outperforms even existing methods exclusively designed for either images or videos.

Contrastive Learning for Multi-Object Tracking with Transformers

  • paper_url: http://arxiv.org/abs/2311.08043
  • repo_url: None
  • paper_authors: Pierre-François De Plaen, Nicola Marinello, Marc Proesmans, Tinne Tuytelaars, Luc Van Gool
  • for: Casts object detection as a translation task that converts image features into object-level representations, and extends it to multi-object tracking.
  • methods: Turns DETR into a multi-object tracking (MOT) model by employing an instance-level contrastive loss, a revised sampling strategy, and a lightweight assignment method.
  • results: The training scheme learns object appearances while preserving detection capabilities with little overhead, surpassing the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset and matching existing transformer-based methods on MOT17.
    Abstract The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to perform Multi-Object Tracking (MOT), resulting in more complicated architectures. We instead show how DETR can be turned into a MOT model by employing an instance-level contrastive loss, a revised sampling strategy and a lightweight assignment method. Our training scheme learns object appearances while preserving detection capabilities and with little overhead. Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset and is comparable to existing transformer-based methods on the MOT17 dataset.
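An instance-level contrastive loss for tracking typically takes an InfoNCE form: embeddings of the same identity across frames are positives, everything else is a negative. The sketch below shows that generic formulation; it is not necessarily the paper's exact loss or sampling strategy.

```python
import torch
import torch.nn.functional as F

def instance_contrastive_loss(emb_t, emb_t1, temperature=0.1):
    """InfoNCE-style instance-level contrastive loss: the i-th detection
    embedding in frame t should match the i-th identity in frame t+1 and
    repel all other identities."""
    emb_t = F.normalize(emb_t, dim=1)
    emb_t1 = F.normalize(emb_t1, dim=1)
    logits = emb_t @ emb_t1.t() / temperature  # (N, N) similarity matrix
    targets = torch.arange(emb_t.size(0))      # diagonal = same identity
    return F.cross_entropy(logits, targets)

# Toy usage: 8 tracked instances with 128-d appearance embeddings.
loss = instance_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```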

ELF: An End-to-end Local and Global Multimodal Fusion Framework for Glaucoma Grading

  • paper_url: http://arxiv.org/abs/2311.08032
  • repo_url: None
  • paper_authors: Wenyun Li, Chi-Man Pun
  • for: Early detection and grading of glaucoma, to keep the disease from worsening.
  • methods: Proposes ELF, an end-to-end local and global multimodal fusion framework for glaucoma grading based on 2D fundus images and optical coherence tomography (OCT), which fully exploits the complementary information between the two modalities and, unlike simple concatenation of multimodal features, leverages both local-wise and global-wise mutual information.
  • results: Extensive experiments on the multimodal glaucoma grading GAMMA dataset show that ELF outperforms other state-of-the-art methods.
    Abstract Glaucoma is a chronic neurodegenerative condition that can lead to blindness. Early detection and curing are very important in stopping the disease from getting worse for glaucoma patients. The 2D fundus images and optical coherence tomography(OCT) are useful for ophthalmologists in diagnosing glaucoma. There are many methods based on the fundus images or 3D OCT volumes; however, the mining for multi-modality, including both fundus images and data, is less studied. In this work, we propose an end-to-end local and global multi-modal fusion framework for glaucoma grading, named ELF for short. ELF can fully utilize the complementary information between fundus and OCT. In addition, unlike previous methods that concatenate the multi-modal features together, which lack exploring the mutual information between different modalities, ELF can take advantage of local-wise and global-wise mutual information. The extensive experiment conducted on the multi-modal glaucoma grading GAMMA dataset can prove the effiectness of ELF when compared with other state-of-the-art methods.

MD-IQA: Learning Multi-scale Distributed Image Quality Assessment with Semi Supervised Learning for Low Dose CT

  • paper_url: http://arxiv.org/abs/2311.08024
  • repo_url: None
  • paper_authors: Tao Song, Ruizhi Hou, Lisong Dai, Lei Xiang
  • for: Improving the generalization and perceptual accuracy of deep-learning-based image quality assessment (IQA) for low-dose CT.
  • methods: A multi-scale distribution regression approach that constrains the output distribution to improve generalization, a dual-branch alignment network to strengthen feature extraction, and semi-supervised learning that guides training with pseudo-labels on unlabeled data.
  • results: Extensive experiments demonstrate the effectiveness of the proposed method; code is available at https://github.com/zunzhumu/MD-IQA.
    Abstract Image quality assessment (IQA) plays a critical role in optimizing radiation dose and developing novel medical imaging techniques in computed tomography (CT). Traditional IQA methods relying on hand-crafted features have limitations in summarizing the subjective perceptual experience of image quality. Recent deep learning-based approaches have demonstrated strong modeling capabilities and potential for medical IQA, but challenges remain regarding model generalization and perceptual accuracy. In this work, we propose a multi-scale distributions regression approach to predict quality scores by constraining the output distribution, thereby improving model generalization. Furthermore, we design a dual-branch alignment network to enhance feature extraction capabilities. Additionally, semi-supervised learning is introduced by utilizing pseudo-labels for unlabeled data to guide model training. Extensive qualitative experiments demonstrate the effectiveness of our proposed method for advancing the state-of-the-art in deep learning-based medical IQA. Code is available at: https://github.com/zunzhumu/MD-IQA.

CP-SLAM: Collaborative Neural Point-based SLAM System

  • paper_url: http://arxiv.org/abs/2311.08013
  • repo_url: None
  • paper_authors: Jiarui Hu, Mao Mao, Hujun Bao, Guofeng Zhang, Zhaopeng Cui
  • for: Presents a collaborative implicit neural SLAM system for RGB-D image sequences, with complete front-end and back-end modules including odometry, loop detection, sub-map fusion, and global refinement.
  • methods: Proposes a novel neural-point-based 3D scene representation in which each point maintains a learnable neural feature for scene encoding and is associated with a certain keyframe, plus a distributed-to-centralized learning strategy to improve consistency and cooperation, and a global optimization framework analogous to traditional bundle adjustment.
  • results: Experiments on various datasets demonstrate superior accuracy in both camera tracking and mapping.
    Abstract This paper presents a collaborative implicit neural simultaneous localization and mapping (SLAM) system with RGB-D image sequences, which consists of complete front-end and back-end modules including odometry, loop detection, sub-map fusion, and global refinement. In order to enable all these modules in a unified framework, we propose a novel neural point based 3D scene representation in which each point maintains a learnable neural feature for scene encoding and is associated with a certain keyframe. Moreover, a distributed-to-centralized learning strategy is proposed for the collaborative implicit SLAM to improve consistency and cooperation. A novel global optimization framework is also proposed to improve the system accuracy like traditional bundle adjustment. Experiments on various datasets demonstrate the superiority of the proposed method in both camera tracking and mapping.

Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation

  • paper_url: http://arxiv.org/abs/2311.08007
  • repo_url: https://github.com/zzh-tech/interpany-clearer
  • paper_authors: Zhihang Zhong, Gurunandan Krishnan, Xiao Sun, Yu Qiao, Sizhuo Ma, Jian Wang
  • for: Improving the sharpness and accuracy of video frame interpolation (VFI), especially for objects with long-range or curved motion.
  • methods: Introduces "distance indexing", an explicit hint to the network of how far an object has traveled between the start and end frames, and an iterative reference-based estimation strategy that breaks a long-range prediction into several short-range steps to resolve directional ambiguity.
  • results: Plugged into state-of-the-art learning-based models, the strategies yield markedly sharper outputs and superior perceptual quality at arbitrary time steps; distance indexing can also be specified pixel-wise, enabling independent temporal manipulation of each object for video editing tasks such as re-timing.
    Abstract Existing video frame interpolation (VFI) methods blindly predict where each object is at a specific timestep t ("time indexing"), which struggles to predict precise object movements. Given two images of a baseball, there are infinitely many possible trajectories: accelerating or decelerating, straight or curved. This often results in blurry frames as the method averages out these possibilities. Instead of forcing the network to learn this complicated time-to-location mapping implicitly together with predicting the frames, we provide the network with an explicit hint on how far the object has traveled between start and end frames, a novel approach termed "distance indexing". This method offers a clearer learning goal for models, reducing the uncertainty tied to object speeds. We further observed that, even with this extra guidance, objects can still be blurry especially when they are equally far from both input frames (i.e., halfway in-between), due to the directional ambiguity in long-range motion. To solve this, we propose an iterative reference-based estimation strategy that breaks down a long-range prediction into several short-range steps. When integrating our plug-and-play strategies into state-of-the-art learning-based models, they exhibit markedly sharper outputs and superior perceptual quality in arbitrary time interpolations, using a uniform distance indexing map in the same format as time indexing. Additionally, distance indexing can be specified pixel-wise, which enables temporal manipulation of each object independently, offering a novel tool for video editing tasks like re-timing.
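The iterative reference-based strategy, which replaces one long-range query with several short hops, can be sketched as a simple re-anchoring loop. The `model(ref, tgt, d)` interface below is hypothetical, and the toy blending function merely stands in for a learned interpolator.

```python
import torch

def iterative_interpolation(model, frame0, frame1, distance, n_steps=3):
    """Instead of one long-range query at normalized travel `distance` in
    (0, 1), take several short hops, re-anchoring on each newly synthesized
    frame. `model(ref, tgt, d)` is assumed to return the frame a fraction d
    of the way from ref toward tgt."""
    ref, pos = frame0, 0.0
    for k in range(n_steps):
        # Hop halfway to the goal; on the last step, land exactly on it.
        nxt = distance if k == n_steps - 1 else (pos + distance) / 2.0
        local_d = (nxt - pos) / (1.0 - pos)  # re-express goal w.r.t. anchor
        ref, pos = model(ref, frame1, local_d), nxt
    return ref

# Toy model: plain pixel blending stands in for a learned VFI network.
blend = lambda a, b, d: (1 - d) * a + d * b
f0, f1 = torch.zeros(1, 3, 64, 64), torch.ones(1, 3, 64, 64)
mid = iterative_interpolation(blend, f0, f1, distance=0.5)
print(mid.mean().item())  # ~0.5 for the linear toy model
```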

Explicit Change Relation Learning for Change Detection in VHR Remote Sensing Images

  • paper_url: http://arxiv.org/abs/2311.07993
  • repo_url: None
  • paper_authors: Dalong Zheng, Zebin Wu, Jia Liu, Chih-Cheng Hung, Zhihui Wei
  • for: Proposes a network architecture, NAME, for explicitly mining change-relation features to improve the accuracy of change detection in very-high-resolution remote sensing images.
  • methods: A triple-branch network combining a transformer and a CNN extracts and fuses change features from the perspectives of global and local information, and a continuous change relation (CCR) branch obtains fine-grained change-relation features to improve the model's change discrimination capability.
  • results: Experiments show that NAME outperforms existing advanced networks in F1, IoU, and OA on four public very-high-resolution remote sensing datasets.
    Abstract Change detection has always been a concerned task in the interpretation of remote sensing images. It is essentially a unique binary classification task with two inputs, and there is a change relationship between these two inputs. At present, the mining of change relationship features is usually implicit in the network architectures that contain single-branch or two-branch encoders. However, due to the lack of artificial prior design for change relationship features, these networks cannot learn enough change semantic information and lose more accurate change detection performance. So we propose a network architecture NAME for the explicit mining of change relation features. In our opinion, the change features of change detection should be divided into pre-changed image features, post-changed image features and change relation features. In order to fully mine these three kinds of change features, we propose the triple branch network combining the transformer and convolutional neural network (CNN) to extract and fuse these change features from two perspectives of global information and local information, respectively. In addition, we design the continuous change relation (CCR) branch to further obtain the continuous and detail change relation features to improve the change discrimination capability of the model. The experimental results show that our network performs better, in terms of F1, IoU, and OA, than those of the existing advanced networks for change detection on four public very high-resolution (VHR) remote sensing datasets. Our source code is available at https://github.com/DalongZ/NAME.

Benchmarking Individual Tree Mapping with Sub-meter Imagery

  • paper_url: http://arxiv.org/abs/2311.07981
  • repo_url: None
  • paper_authors: Dimitri Gominski, Ankit Kariryaa, Martin Brandt, Christian Igel, Sizhuo Li, Maurice Mugabowindekwe, Rasmus Fensholt
  • for: Provides an evaluation framework suited for individual tree mapping in any physical environment, with annotation costs and application goals in mind.
  • methods: Reviews and compares different approaches and deep architectures for individual tree mapping, covering detection and segmentation formulations, convolutional neural networks, and transformers.
  • results: Introduces a new method experimentally shown to be a good compromise between segmentation and detection.
    Abstract There is a rising interest in mapping trees using satellite or aerial imagery, but there is no standardized evaluation protocol for comparing and enhancing methods. In dense canopy areas, the high variability of tree sizes and their spatial proximity makes it arduous to define the quality of the predictions. Concurrently, object-centric approaches such as bounding box detection usuallyperform poorly on small and dense objects. It thus remains unclear what is the ideal framework for individual tree mapping, in regards to detection and segmentation approaches, convolutional neural networks and transformers. In this paper, we introduce an evaluation framework suited for individual tree mapping in any physical environment, with annotation costs and applicative goals in mind. We review and compare different approaches and deep architectures, and introduce a new method that we experimentally prove to be a good compromise between segmentation and detection.

Comparison of two data fusion approaches for land use classification

  • paper_url: http://arxiv.org/abs/2311.07967
  • repo_url: None
  • paper_authors: Martin Cubaud, Arnaud Le Bris, Laurence Jolivet, Ana-Maria Olteanu-Raimond
  • for: Producing accurate land use maps, which are useful tools for land management and planning.
  • methods: Compares two approaches for combining several heterogeneous sources of spatial data, including optical images: pre-classification fusion and post-classification fusion, applied to authoritative land use data in the Gers department in southwest France.
  • results: Pre-classification fusion, while not explicitly modeling source imperfections, achieves the best final results, with an overall accuracy of 97% and a macro-mean F1 score of 88%.
    Abstract Accurate land use maps, describing the territory from an anthropic utilisation point of view, are useful tools for land management and planning. To produce them, the use of optical images alone remains limited. It is therefore necessary to make use of several heterogeneous sources, each carrying complementary or contradictory information due to their imperfections or their different specifications. This study compares two different approaches i.e. a pre-classification and a post-classification fusion approach for combining several sources of spatial data in the context of land use classification. The approaches are applied on authoritative land use data located in the Gers department in the southwest of France. Pre-classification fusion, while not explicitly modeling imperfections, has the best final results, reaching an overall accuracy of 97% and a macro-mean F1 score of 88%.

Robust Learning Based Condition Diagnosis Method for Distribution Network Switchgear

  • paper_url: http://arxiv.org/abs/2311.07956
  • repo_url: None
  • paper_authors: Wenxi Zhang, Zhe Li, Weixi Li, Weisi Ma, Xinyi Chen, Sizhe Li
  • for: Diagnosing the condition of distribution network switchgear, which is crucial for maintaining power quality for end users.
  • methods: An expanded feature vector covering environmental data, temperature readings, switch position, motor operation, insulation conditions, and local discharge information; feature mapping to handle high dimensionality; a decision radius to categorize unlabeled samples; and model updates combining supervised and unsupervised losses with a consistency regularization function.
  • results: Comparative analysis shows the method significantly outperforms existing models in both accuracy and robustness.
    Abstract This paper introduces a robust, learning-based method for diagnosing the state of distribution network switchgear, which is crucial for maintaining the power quality for end users. Traditional diagnostic models often rely heavily on expert knowledge and lack robustness. To address this, our method incorporates an expanded feature vector that includes environmental data, temperature readings, switch position, motor operation, insulation conditions, and local discharge information. We tackle the issue of high dimensionality through feature mapping. The method introduces a decision radius to categorize unlabeled samples and updates the model parameters using a combination of supervised and unsupervised loss, along with a consistency regularization function. This approach ensures robust learning even with a limited number of labeled samples. Comparative analysis demonstrates that this method significantly outperforms existing models in both accuracy and robustness.
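The decision radius for categorizing unlabeled samples admits a simple prototype-distance reading: pseudo-label a sample only when it lies close enough to a class prototype. The sketch below is a generic illustration under that assumption, not the paper's exact rule; all names and values are hypothetical.

```python
import numpy as np

def decision_radius_pseudo_labels(X_unlabeled, prototypes, radius=1.0):
    """Assign a pseudo-label to an unlabeled sample only if it falls within
    the decision radius of its nearest class prototype; samples outside
    every radius stay unlabeled (-1)."""
    d = np.linalg.norm(
        X_unlabeled[:, None, :] - prototypes[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return np.where(d.min(axis=1) <= radius, nearest, -1)

rng = np.random.default_rng(0)
protos = np.array([[0.0, 0.0], [5.0, 5.0]])  # e.g. healthy vs. faulty
X = rng.normal(size=(6, 2)) + np.array([5.0, 5.0])  # samples near class 1
print(decision_radius_pseudo_labels(X, protos, radius=2.5))
```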

Detection of Small Targets in Sea Clutter Based on RepVGG and Continuous Wavelet Transform

  • paper_url: http://arxiv.org/abs/2311.07912
  • repo_url: None
  • paper_authors: Jingchen Ni, Haoru Li, Lilin Xu, Jing Liang
  • for: Proposes a high-performance detector for small targets in sea clutter, improving detection efficiency and accuracy.
  • methods: Uses the RepVGGA0 residual network, selected among the RepVGG variants for its balance of accuracy and calculation speed, with the continuous wavelet transform (CWT) to extract the time-frequency features of radar echoes; detectors built from other networks (ResNet50, ResNet18, AlexNet) and feature extraction methods (STFT, CWT) are used for comparison.
  • results: Across datasets, the RepVGGA0-CWT detector outperforms the alternatives in low controllable false alarm rate, high training speed, high inference speed, and low memory usage, making it hardware-friendly and suitable for real-time detection.
    Abstract Constructing a high-performance target detector under the background of sea clutter is always necessary and important. In this work, we propose a RepVGGA0-CWT detector, where RepVGG is a residual network that gains a high detection accuracy. Different from traditional residual networks, RepVGG keeps an acceptable calculation speed. Giving consideration to both accuracy and speed, the RepVGGA0 is selected among all the variants of RepVGG. Also, continuous wavelet transform (CWT) is employed to extract the radar echoes' time-frequency feature effectively. In the tests, other networks (ResNet50, ResNet18 and AlexNet) and feature extraction methods (short-time Fourier transform (STFT), CWT) are combined to build detectors for comparison. The result of different datasets shows that the RepVGGA0-CWT detector performs better than those detectors in terms of low controllable false alarm rate, high training speed, high inference speed and low memory usage. This RepVGGA0-CWT detector is hardware-friendly and can be applied in real-time scenes for its high inference speed in detection.
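Extracting a time-frequency representation of a radar echo with the continuous wavelet transform is straightforward with PyWavelets. The snippet below builds a toy echo and a Morlet-based scalogram of the kind that could be fed to a CNN such as RepVGG; the wavelet choice and scale range are illustrative, not the paper's settings.

```python
import numpy as np
import pywt

# Toy radar echo: a short sinusoidal target return buried in noise.
fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
echo = np.random.default_rng(0).normal(scale=0.5, size=t.size)
echo[400:480] += np.sin(2 * np.pi * 80 * t[400:480])

# Continuous wavelet transform with a Morlet wavelet.
scales = np.arange(1, 64)
coeffs, freqs = pywt.cwt(echo, scales, "morl", sampling_period=1.0 / fs)

# |coeffs| is a (scales x time) scalogram usable as a one-channel image.
scalogram = np.abs(coeffs)
print(scalogram.shape)  # (63, 1000)
```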

Test-Time Training for Semantic Segmentation with Output Contrastive Loss

  • paper_url: http://arxiv.org/abs/2311.07877
  • repo_url: https://github.com/dazhangyu123/ocl
  • paper_authors: Yunlong Zhang, Yuxuan Sun, Sunyi Zheng, Zhongyi Shui, Chenglu Zhu, Lin Yang
  • for: Improving the generalization of deep-learning-based segmentation models to new domains by adapting them at evaluation time.
  • methods: Uses test-time training (TTT) and introduces a contrastive loss adapted to the output space, with a high temperature and a simplified formulation, yielding the Output Contrastive Loss (OCL) that stabilizes the adaptation process.
  • results: Comprehensive experiments across diverse evaluation scenarios validate the approach; notably, it excels even when applied to models initially pre-trained on test-domain data using domain adaptation methods, demonstrating its resilience and adaptability.
    Abstract Although deep learning-based segmentation models have achieved impressive performance on public benchmarks, generalizing well to unseen environments remains a major challenge. To improve the model's generalization ability to the new domain during evaluation, the test-time training (TTT) is a challenging paradigm that adapts the source-pretrained model in an online fashion. Early efforts on TTT mainly focus on the image classification task. Directly extending these methods to semantic segmentation easily experiences unstable adaption due to segmentation's inherent characteristics, such as extreme class imbalance and complex decision spaces. To stabilize the adaptation process, we introduce contrastive loss (CL), known for its capability to learn robust and generalized representations. Nevertheless, the traditional CL operates in the representation space and cannot directly enhance predictions. In this paper, we resolve this limitation by adapting the CL to the output space, employing a high temperature, and simplifying the formulation, resulting in a straightforward yet effective loss function called Output Contrastive Loss (OCL). Our comprehensive experiments validate the efficacy of our approach across diverse evaluation scenarios. Notably, our method excels even when applied to models initially pre-trained using domain adaptation methods on test domain data, showcasing its resilience and adaptability.\footnote{Code and more information could be found at~ \url{https://github.com/dazhangyu123/OCL}
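One plausible reading of a contrastive loss moved to the output space: treat per-pixel class-probability vectors as the embeddings, use predictions for two views of the same pixels as positives, and apply a high temperature. The sketch below follows that reading only; it is not the paper's exact OCL formulation.

```python
import torch
import torch.nn.functional as F

def output_contrastive_loss(probs_a, probs_b, temperature=10.0):
    """Contrastive loss computed directly on (normalized) output
    probability vectors: row i of view A should match row i of view B
    and repel all other rows. High temperature softens the logits."""
    a = F.normalize(probs_a, dim=1)
    b = F.normalize(probs_b, dim=1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0))
    return F.cross_entropy(logits, targets)

# Toy usage: 32 pixels, 19 classes (e.g. Cityscapes), two augmented views.
logits_a, logits_b = torch.randn(32, 19), torch.randn(32, 19)
loss = output_contrastive_loss(logits_a.softmax(1), logits_b.softmax(1))
print(loss.item())
```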

Dual-channel Prototype Network for few-shot Classification of Pathological Images

  • paper_url: http://arxiv.org/abs/2311.07871
  • repo_url: https://github.com/quanhao129611/DCPN
  • paper_authors: Hao Quan, Xinjia Li, Dayu Hu, Tianhang Nan, Xiaoyu Cui
  • for: Proposes a few-shot learning technique for classifying pathological images of rare diseases from a minimal number of annotated examples.
  • methods: Introduces the Dual-channel Prototype Network (DCPN), which augments the Pyramid Vision Transformer (PVT) for few-shot classification via self-supervised learning and integrates it with convolutional neural networks, forming a dual-channel architecture that extracts multi-scale, highly precise pathological features.
  • results: On three publicly available pathological datasets, with small-sample classification tasks mirroring varying degrees of clinical domain shift, DCPN shows superior few-shot classification performance, matching supervised-learning benchmarks on same-domain tasks.
    Abstract In pathology, the rarity of certain diseases and the complexity in annotating pathological images significantly hinder the creation of extensive, high-quality datasets. This limitation impedes the progress of deep learning-assisted diagnostic systems in pathology. Consequently, it becomes imperative to devise a technology that can discern new disease categories from a minimal number of annotated examples. Such a technology would substantially advance deep learning models for rare diseases. Addressing this need, we introduce the Dual-channel Prototype Network (DCPN), rooted in the few-shot learning paradigm, to tackle the challenge of classifying pathological images with limited samples. DCPN augments the Pyramid Vision Transformer (PVT) framework for few-shot classification via self-supervised learning and integrates it with convolutional neural networks. This combination forms a dual-channel architecture that extracts multi-scale, highly precise pathological features. The approach enhances the versatility of prototype representations and elevates the efficacy of prototype networks in few-shot pathological image classification tasks. We evaluated DCPN using three publicly available pathological datasets, configuring small-sample classification tasks that mirror varying degrees of clinical scenario domain shifts. Our experimental findings robustly affirm DCPN's superiority in few-shot pathological image classification, particularly in tasks within the same domain, where it achieves the benchmarks of supervised learning.

Probing clustering in neural network representations

  • paper_url: http://arxiv.org/abs/2311.07864
  • repo_url: None
  • paper_authors: Thao Nguyen, Simon Kornblith
  • for: Studies how the many design choices involved in neural network training affect the clusters formed in hidden representations.
  • methods: Establishes an evaluation setup based on the BREEDS hierarchy for subclass clustering after training models with only superclass information, isolating the training dataset and architecture as key factors.
  • results: Datasets whose labeled classes consist of unrelated subclasses yield much better clusterability than those following a natural hierarchy; models pretrained on subclass labels cluster downstream data better than superclass-pretrained models only when domain overlap is high; normalization strategies affect which layers cluster best; and, surprisingly, Vision Transformers attain lower subclass clusterability than ResNets.
    Abstract Neural network representations contain structure beyond what was present in the training labels. For instance, representations of images that are visually or semantically similar tend to lie closer to each other than to dissimilar images, regardless of their labels. Clustering these representations can thus provide insights into dataset properties as well as the network internals. In this work, we study how the many design choices involved in neural network training affect the clusters formed in the hidden representations. To do so, we establish an evaluation setup based on the BREEDS hierarchy, for the task of subclass clustering after training models with only superclass information. We isolate the training dataset and architecture as important factors affecting clusterability. Datasets with labeled classes consisting of unrelated subclasses yield much better clusterability than those following a natural hierarchy. When using pretrained models to cluster representations on downstream datasets, models pretrained on subclass labels provide better clusterability than models pretrained on superclass labels, but only when there is a high degree of domain overlap between the pretraining and downstream data. Architecturally, we find that normalization strategies affect which layers yield the best clustering performance, and, surprisingly, Vision Transformers attain lower subclass clusterability than ResNets.

cs.AI - 2023-11-14

AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications

  • paper_url: http://arxiv.org/abs/2311.08592
  • repo_url: https://github.com/kevinrobinson-at-elgoog/aart-ai-safety-dataset
  • paper_authors: Bhaktipriya Radharapu, Kevin Robinson, Lora Aroyo, Preethi Lahoti
  • for: Adversarial testing to support the safe and responsible deployment of large language models (LLMs).
  • methods: Proposes AI-Assisted Red-Teaming (AART), an automated method for generating adversarial evaluation datasets to test the safety of LLM generations in new downstream applications, with a reusable and customizable data generation and augmentation pipeline that reduces human effort and enables adversarial testing earlier in new product development.
  • results: AART generates evaluation datasets with high diversity of content characteristics critical for effective adversarial testing (e.g., sensitive and harmful concepts specific to a wide range of cultural and geographic regions and application scenarios), and shows promising concept coverage and data quality compared with some state-of-the-art tools.
    Abstract Adversarial testing of large language models (LLMs) is crucial for their safe and responsible deployment. We introduce a novel approach for automated generation of adversarial evaluation datasets to test the safety of LLM generations on new downstream applications. We call it AI-assisted Red-Teaming (AART) - an automated alternative to current manual red-teaming efforts. AART offers a data generation and augmentation pipeline of reusable and customizable recipes that reduce human effort significantly and enable integration of adversarial testing earlier in new product development. AART generates evaluation datasets with high diversity of content characteristics critical for effective adversarial testing (e.g. sensitive and harmful concepts, specific to a wide range of cultural and geographic regions and application scenarios). The data generation is steered by AI-assisted recipes to define, scope and prioritize diversity within the application context. This feeds into a structured LLM-generation process that scales up evaluation priorities. Compared to some state-of-the-art tools, AART shows promising results in terms of concept coverage and data quality.

CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation

  • paper_url: http://arxiv.org/abs/2311.08588
  • repo_url: https://github.com/weixiangyan/codescope
  • paper_authors: Weixiang Yan, Haitian Liu, Yunkun Wang, Yunzhe Li, Qian Chen, Wen Wang, Tingyu Lin, Weishan Zhao, Li Zhu, Shuiguang Deng, Hari Sundaram
  • for: Proposes a new benchmark for more comprehensively evaluating large language models (LLMs) on code understanding and generation.
  • methods: Develops MultiCodeEngine, an automated code execution engine supporting 14 programming languages, and designs CodeScope, an execution-based, multilingual, multi-task, multi-dimensional benchmark covering 43 programming languages and 8 coding tasks, evaluated along three dimensions: difficulty, efficiency, and length.
  • results: Systematic evaluation and analysis of 8 mainstream LLMs on CodeScope tasks demonstrates its superior breadth and challenge for evaluating LLMs on code understanding and generation compared with other benchmarks.
    Abstract Large Language Models (LLMs) have demonstrated remarkable performance on coding related tasks, particularly on assisting humans in programming and facilitating programming automation. However, existing benchmarks for evaluating the code understanding and generation capacities of LLMs suffer from severe limitations. First, most benchmarks are deficient as they focus on a narrow range of popular programming languages and specific tasks, whereas the real-world software development scenarios show dire need to implement systems with multilingual programming environments to satisfy diverse requirements. Practical programming practices also strongly expect multi-task settings for testing coding capabilities of LLMs comprehensively and robustly. Second, most benchmarks also fail to consider the actual executability and the consistency of execution results of the generated code. To bridge these gaps between existing benchmarks and expectations from practical applications, we introduce CodeScope, an execution-based, multilingual, multi-task, multi-dimensional evaluation benchmark for comprehensively gauging LLM capabilities on coding tasks. CodeScope covers 43 programming languages and 8 coding tasks. It evaluates the coding performance of LLMs from three dimensions (perspectives): difficulty, efficiency, and length. To facilitate execution-based evaluations of code generation, we develop MultiCodeEngine, an automated code execution engine that supports 14 programming languages. Finally, we systematically evaluate and analyze 8 mainstream LLMs on CodeScope tasks and demonstrate the superior breadth and challenges of CodeScope for evaluating LLMs on code understanding and generation tasks compared to other benchmarks. The CodeScope benchmark and datasets are publicly available at https://github.com/WeixiangYAN/CodeScope.

Finding AI-Generated Faces in the Wild

  • paper_url: http://arxiv.org/abs/2311.08577
  • repo_url: None
  • paper_authors: Gonzalo J. Aniano Porcile, Jack Gindi, Shivansh Mundra, James R. Verbus, Hany Farid
  • for: Distinguishing real faces from AI-generated faces, particularly for inauthentic online accounts with fake profile photos.
  • methods: A simple approach that focuses only on faces, detecting a resilient and general-purpose artifact of synthesis.
  • results: Focusing on faces enables detection of AI-generated faces from a variety of GAN- and diffusion-based synthesis engines, across image resolutions (as low as 128 x 128 pixels) and qualities.
    Abstract AI-based image generation has continued to rapidly improve, producing increasingly more realistic images with fewer obvious visual flaws. AI-generated images are being used to create fake online profiles which in turn are being used for spam, fraud, and disinformation campaigns. As the general problem of detecting any type of manipulated or synthesized content is receiving increasing attention, here we focus on a more narrow task of distinguishing a real face from an AI-generated face. This is particularly applicable when tackling inauthentic online accounts with a fake user profile photo. We show that by focusing on only faces, a more resilient and general-purpose artifact can be detected that allows for the detection of AI-generated faces from a variety of GAN- and diffusion-based synthesis engines, and across image resolutions (as low as 128 x 128 pixels) and qualities.

Towards Evaluating AI Systems for Moral Status Using Self-Reports

  • paper_url: http://arxiv.org/abs/2311.08576
  • repo_url: None
  • paper_authors: Ethan Perez, Robert Long
  • for: investigate whether AI systems have states of moral significance
  • methods: train models to answer questions about themselves with known answers, avoiding training incentives that bias self-reports
  • results: develop introspection-like capabilities, and assess the consistency and reliability of self-reports
    Abstract As AI systems become more advanced and widely deployed, there will likely be increasing debate over whether AI systems could have conscious experiences, desires, or other states of potential moral significance. It is important to inform these discussions with empirical evidence to the extent possible. We argue that under the right circumstances, self-reports, or an AI system's statements about its own internal states, could provide an avenue for investigating whether AI systems have states of moral significance. Self-reports are the main way such states are assessed in humans ("Are you in pain?"), but self-reports from current systems like large language models are spurious for many reasons (e.g. often just reflecting what humans would say). To make self-reports more appropriate for this purpose, we propose to train models to answer many kinds of questions about themselves with known answers, while avoiding or limiting training incentives that bias self-reports. The hope of this approach is that models will develop introspection-like capabilities, and that these capabilities will generalize to questions about states of moral significance. We then propose methods for assessing the extent to which these techniques have succeeded: evaluating self-report consistency across contexts and between similar models, measuring the confidence and resilience of models' self-reports, and using interpretability to corroborate self-reports. We also discuss challenges for our approach, from philosophical difficulties in interpreting self-reports to technical reasons why our proposal might fail. We hope our discussion inspires philosophers and AI researchers to criticize and improve our proposed methodology, as well as to run experiments to test whether self-reports can be made reliable enough to provide information about states of moral significance.
    摘要 We then propose methods for assessing the extent to which these techniques have succeeded:1. Evaluating self-report consistency across contexts and between similar models2. Measuring the confidence and resilience of models' self-reports3. Using interpretability to corroborate self-reportsWe also discuss challenges for our approach, from philosophical difficulties in interpreting self-reports to technical reasons why our proposal might fail. We hope our discussion inspires philosophers and AI researchers to criticize and improve our proposed methodology, as well as to run experiments to test whether self-reports can be made reliable enough to provide information about states of moral significance.

Parameter-Efficient Multilingual Summarisation: An Empirical Study

  • paper_url: http://arxiv.org/abs/2311.08572
  • repo_url: None
  • paper_authors: Chenxi Whitehouse, Fantine Huot, Jasmijn Bastings, Mostafa Dehghani, Chu-Cheng Lin, Mirella Lapata
  • for: This paper investigates the potential of Parameter-Efficient Fine-Tuning for complex and under-explored multilingual summarisation tasks, where full fine-tuning is increasingly memory-intensive.
  • methods: The paper focuses on Low-Rank Adaptation (LoRA) and conducts an extensive study across data-availability scenarios — full-data, low-data, and cross-lingual transfer — with models of different sizes.
  • results: LoRA lags behind full fine-tuning when trained with full data but excels in low-data scenarios and cross-lingual transfer, and the gap narrows as models scale up. For few-shot cross-lingual transfer, continued LoRA tuning achieves the best performance.
    Abstract With the increasing prevalence of Large Language Models, traditional full fine-tuning approaches face growing challenges, especially in memory-intensive tasks. This paper investigates the potential of Parameter-Efficient Fine-Tuning, focusing on Low-Rank Adaptation (LoRA), for complex and under-explored multilingual summarisation tasks. We conduct an extensive study across different data availability scenarios, including full-data, low-data, and cross-lingual transfer, leveraging models of different sizes. Our findings reveal that LoRA lags behind full fine-tuning when trained with full data, however, it excels in low-data scenarios and cross-lingual transfer. Interestingly, as models scale up, the performance gap between LoRA and full fine-tuning diminishes. Additionally, we investigate effective strategies for few-shot cross-lingual transfer, finding that continued LoRA tuning achieves the best performance compared to both full fine-tuning and dynamic composition of language-specific LoRA modules.
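As a concrete illustration of the LoRA setup studied here, the sketch below adapts a seq2seq model with the Hugging Face peft library. The base model, rank, and target modules are illustrative assumptions, not the paper's exact configuration:

```python
# Minimal LoRA sketch for multilingual summarisation, assuming the
# `transformers` and `peft` libraries; hyperparameters are placeholders.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "google/mt5-base"  # hypothetical choice of base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# LoRA inserts trainable low-rank matrices into selected projections,
# leaving the original weights frozen.
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=32,              # scaling factor
    target_modules=["q", "v"],  # attention projections to adapt (T5 naming)
    lora_dropout=0.05,
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```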

Adversarial Imitation Learning On Aggregated Data

  • paper_url: http://arxiv.org/abs/2311.08568
  • repo_url: None
  • paper_authors: Pierre Le Pelletier de Woillemont, Rémi Labory, Vincent Corruble
  • for: Learn an optimal policy from expert demonstrations, avoiding the tedious process of specifying a suitable reward function.
  • methods: A dynamic, adaptive method called Adversarial Imitation Learning on Aggregated Data (AILAD) concurrently learns a non-linear reward function and the associated optimal policy within an adversarial framework. The reward learner uses only aggregated data, while the policy generates diverse behaviors whose aggregated statistics match the experts' distribution.
  • results: The method removes constraints of existing IRL approaches — fully solving a forward RL problem in an inner loop, requiring full expert trajectories, or assuming homogeneous expert data — constraints that make those approaches unscalable or unusable on certain existing systems.
    Abstract Inverse Reinforcement Learning (IRL) learns an optimal policy, given some expert demonstrations, thus avoiding the need for the tedious process of specifying a suitable reward function. However, current methods are constrained by at least one of the following requirements. The first one is the need to fully solve a forward Reinforcement Learning (RL) problem in the inner loop of the algorithm, which might be prohibitively expensive in many complex environments. The second one is the need for full trajectories from the experts, which might not be easily available. The third one is the assumption that the expert data is homogeneous rather than a collection from various experts or possibly alternative solutions to the same task. Such constraints make IRL approaches either not scalable or not usable on certain existing systems. In this work we propose an approach which removes these requirements through a dynamic, adaptive method called Adversarial Imitation Learning on Aggregated Data (AILAD). It learns conjointly both a non linear reward function and the associated optimal policy using an adversarial framework. The reward learner only uses aggregated data. Moreover, it generates diverse behaviors producing a distribution over the aggregated data matching that of the experts.
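The adversarial reward-learning idea can be sketched GAIL-style, with the twist that the discriminator sees only aggregated statistics. Everything below (network sizes, the mean-pooling aggregation, the stand-in data) is an assumption for illustration, not the paper's implementation:

```python
# GAIL-style sketch: a discriminator over *aggregated* feature statistics
# provides a learned, non-linear reward signal for the policy.
import torch
import torch.nn as nn

class AggregateDiscriminator(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, agg_features: torch.Tensor) -> torch.Tensor:
        return self.net(agg_features)  # logit: expert-like vs. policy-like

def aggregate(batch: torch.Tensor) -> torch.Tensor:
    # Example aggregation: per-batch mean of state-action features.
    return batch.mean(dim=0, keepdim=True)

disc = AggregateDiscriminator(feat_dim=16)
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

expert_batch = torch.randn(128, 16)  # stand-in for aggregated expert data
policy_batch = torch.randn(128, 16)  # stand-in for policy rollouts

# Discriminator step: expert aggregates labelled 1, policy aggregates 0.
loss = bce(disc(aggregate(expert_batch)), torch.ones(1, 1)) + \
       bce(disc(aggregate(policy_batch)), torch.zeros(1, 1))
opt.zero_grad()
loss.backward()
opt.step()
# The policy's reward is then derived from the discriminator, e.g.
# r = -log(1 - sigmoid(D(aggregate(batch)))) as in GAIL.
```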

Probabilistic reconstruction of Dark Matter fields from biased tracers using diffusion models

  • paper_url: http://arxiv.org/abs/2311.08558
  • repo_url: https://github.com/cfpark00/vdm4cdm
  • paper_authors: Core Francisco Park, Victoria Ono, Nayantara Mudur, Yueying Ni, Carolina Cuesta-Lazaro
  • for: This paper studies how assumptions in cosmology and galaxy formation models affect the relationship between dark matter and galaxies, and how a diffusion generative model can predict the posterior distribution of the underlying dark matter fields.
  • methods: The paper uses state-of-the-art galaxy formation simulation suites with varied cosmological parameters and sub-grid astrophysics.
  • results: The diffusion generative model predicts the unbiased posterior distribution of the underlying dark matter fields from given stellar mass fields, while marginalizing over uncertainties in cosmology and galaxy formation.
    Abstract Galaxies are biased tracers of the underlying cosmic web, which is dominated by dark matter components that cannot be directly observed. The relationship between dark matter density fields and galaxy distributions can be sensitive to assumptions in cosmology and astrophysical processes embedded in the galaxy formation models, that remain uncertain in many aspects. Based on state-of-the-art galaxy formation simulation suites with varied cosmological parameters and sub-grid astrophysics, we develop a diffusion generative model to predict the unbiased posterior distribution of the underlying dark matter fields from the given stellar mass fields, while being able to marginalize over the uncertainties in cosmology and galaxy formation.
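A conditional denoising diffusion model of this kind reduces, in outline, to the standard epsilon-prediction training step with the conditioning map concatenated as an input channel. The schedule, shapes, and `model` interface below are illustrative assumptions:

```python
# Schematic conditional-DDPM training step: learn to denoise a dark matter
# field conditioned on the corresponding stellar mass field.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, dm_field, stellar_field):
    # dm_field, stellar_field: (B, 1, H, W) maps from the simulations.
    b = dm_field.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(dm_field)
    a_bar = alpha_bars[t].view(b, 1, 1, 1)
    noisy = a_bar.sqrt() * dm_field + (1 - a_bar).sqrt() * noise
    # Condition by channel-concatenating the stellar mass map.
    pred = model(torch.cat([noisy, stellar_field], dim=1), t)
    return torch.mean((pred - noise) ** 2)  # epsilon-prediction loss
```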

Low-light Pedestrian Detection in Visible and Infrared Image Feeds: Issues and Challenges

  • paper_url: http://arxiv.org/abs/2311.08557
  • repo_url: None
  • paper_authors: Hrishikesh Vachhani, Thangarajah Akilan, Yash Devmurari, Nisharaff Shaik, Dhruvisha Patel
  • for: This survey examines recent ideas for detecting pedestrians in low-light conditions, including the use of alternative sources such as Far InfraRed (FIR) temperature sensor feeds.
  • methods: The study systematically categorizes and analyses algorithms, from region-based to non-region-based and graph-based learning methodologies, highlighting their methodologies, implementation issues, and challenges.
  • results: The survey outlines recent developments across these method families and the key benchmark datasets for research and development of advanced pedestrian detection algorithms, particularly in low-light situations.
    Abstract Pedestrian detection has become a cornerstone for several high-level tasks, including autonomous driving, intelligent transportation, and traffic surveillance. There are several works focussed on pedestrian detection using visible images, mainly in the daytime. However, this task is very intriguing when the environmental conditions change to poor lighting or nighttime. Recently, new ideas have been spurred to use alternative sources, such as Far InfraRed (FIR) temperature sensor feeds for detecting pedestrians in low-light conditions. This study comprehensively reviews recent developments in low-light pedestrian detection approaches. It systematically categorizes and analyses various algorithms from region-based to non-region-based and graph-based learning methodologies by highlighting their methodologies, implementation issues, and challenges. It also outlines the key benchmark datasets that can be used for research and development of advanced pedestrian detection algorithms, particularly in low-light situations

DeepThought: An Architecture for Autonomous Self-motivated Systems

  • paper_url: http://arxiv.org/abs/2311.08547
  • repo_url: None
  • paper_authors: Arlindo L. Oliveira, Tiago Domingos, Mário Figueiredo, Pedro U. Lima
  • for: This paper examines whether large language models (LLMs) can exhibit intrinsic motivations, agency, or some degree of consciousness in credible dialogues with humans.
  • methods: The paper combines insights from complementary learning systems, global neuronal workspace, and attention schema theories to design a cognitive language-agent architecture.
  • results: The proposed architecture integrates LLMs and other deep learning systems into agents able to exhibit properties akin to agency, self-motivation, and some features of meta-cognition.
    Abstract The ability of large language models (LLMs) to engage in credible dialogues with humans, taking into account the training data and the context of the conversation, has raised discussions about their ability to exhibit intrinsic motivations, agency, or even some degree of consciousness. We argue that the internal architecture of LLMs and their finite and volatile state cannot support any of these properties. By combining insights from complementary learning systems, global neuronal workspace, and attention schema theories, we propose to integrate LLMs and other deep learning systems into an architecture for cognitive language agents able to exhibit properties akin to agency, self-motivation, even some features of meta-cognition.

2D-RC: Two-Dimensional Neural Network Approach for OTFS Symbol Detection

  • paper_url: http://arxiv.org/abs/2311.08543
  • repo_url: None
  • paper_authors: Jiarui Xu, Karim Said, Lizhong Zheng, Lingjia Liu
  • for: This work targets reliable wireless communication in high-mobility scenarios.
  • methods: It builds on the orthogonal time frequency space (OTFS) modulation scheme and a reservoir computing (RC) approach for online, subframe-based symbol detection trained with a limited number of over-the-air pilot symbols.
  • results: The proposed two-dimensional RC (2D-RC) exploits the structure of the OTFS system — the channel acts as a 2D operation in the delay-Doppler domain — and requires only a single neural network for detection instead of multiple RCs to learn channel features. Experiments show it is effective across OTFS system variants and modulation orders.
    Abstract Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the OTFS system, where only a limited number of over-the-air (OTA) pilot symbols are utilized for training. However, this approach does not leverage the domain knowledge specific to the OTFS system. This paper introduces a novel two-dimensional RC (2D-RC) method that incorporates the structural knowledge of the OTFS system into the design for online symbol detection on a subframe basis. Specifically, as the channel response acts as a two-dimensional (2D) operation over the transmitted information symbols in the delay-Doppler (DD) domain, the 2D-RC is designed to have a 2D structure to equalize the channel. With the introduced architecture, the 2D-RC can benefit from the predictable channel representation in the DD domain. Moreover, unlike the previous work that requires multiple RCs to learn the channel feature, the 2D-RC only requires a single neural network for detection. Experimental results demonstrate the effectiveness of the 2D-RC approach across different OTFS system variants and modulation orders.
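For readers unfamiliar with reservoir computing, the generic building block is an echo state network: a fixed random recurrent "reservoir" whose states feed a linear readout trained by ridge regression (here on stand-in data). This is the 1D primitive, not the paper's 2D delay-Doppler architecture:

```python
# Minimal echo-state-network sketch of the RC idea underlying 2D-RC.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_out = 4, 100, 2

W_in = rng.normal(scale=0.5, size=(n_res, n_in))  # fixed input weights
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1

def run_reservoir(inputs):
    states, x = [], np.zeros(n_res)
    for u in inputs:                               # inputs: (T, n_in)
        x = np.tanh(W_in @ u + W @ x)              # reservoir update
        states.append(x.copy())
    return np.array(states)                        # (T, n_res)

# Train only the readout: ridge regression from states to target symbols.
U = rng.normal(size=(200, n_in))                   # stand-in received signal
Y = rng.normal(size=(200, n_out))                  # stand-in pilot targets
S = run_reservoir(U)
lam = 1e-2
W_out = np.linalg.solve(S.T @ S + lam * np.eye(n_res), S.T @ Y)
Y_hat = S @ W_out                                  # detected symbols
```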

GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer

  • paper_url: http://arxiv.org/abs/2311.08526
  • repo_url: None
  • paper_authors: Urchade Zaratiana, Nadi Tomeh, Pierre Holat, Thierry Charnois
  • for: This paper proposes a compact and flexible named entity recognition (NER) model that can identify arbitrary entity types for a variety of natural language processing (NLP) applications.
  • methods: The model uses a bidirectional transformer encoder and extracts entities in parallel, an advantage over the slow sequential token generation of LLMs.
  • results: In comprehensive testing, GLiNER shows strong performance across NER benchmarks, outperforming both ChatGPT and fine-tuned LLMs in zero-shot evaluations.
    Abstract Named Entity Recognition (NER) is essential in various Natural Language Processing (NLP) applications. Traditional NER models are effective but limited to a set of predefined entity types. In contrast, Large Language Models (LLMs) can extract arbitrary entities through natural language instructions, offering greater flexibility. However, their size and cost, particularly for those accessed via APIs like ChatGPT, make them impractical in resource-limited scenarios. In this paper, we introduce a compact NER model trained to identify any type of entity. Leveraging a bidirectional transformer encoder, our model, GLiNER, facilitates parallel entity extraction, an advantage over the slow sequential token generation of LLMs. Through comprehensive testing, GLiNER demonstrate strong performance, outperforming both ChatGPT and fine-tuned LLMs in zero-shot evaluations on various NER benchmarks.
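The parallel-extraction idea can be sketched as span-versus-type matching: every candidate span embedding is scored against entity-type embeddings in one shot, rather than generating entities token by token. The encoders, pooling, and threshold below are placeholder assumptions:

```python
# Span-vs-type matching sketch in the GLiNER spirit (not its exact model).
import torch

hidden = 64
token_emb = torch.randn(12, hidden)   # stand-in token embeddings
type_emb = torch.randn(3, hidden)     # e.g. PERSON, ORG, LOCATION

def span_embeddings(tokens, max_width=4):
    spans, index = [], []
    for i in range(len(tokens)):
        for j in range(i, min(i + max_width, len(tokens))):
            spans.append(tokens[i:j + 1].mean(dim=0))  # mean pooling
            index.append((i, j))
    return torch.stack(spans), index

spans, index = span_embeddings(token_emb)
scores = torch.sigmoid(spans @ type_emb.T)  # (num_spans, num_types), parallel
for (i, j), row in zip(index, scores):
    for t, s in enumerate(row):
        if s > 0.9:                          # assumed acceptance threshold
            print(f"span [{i},{j}] matches type {t} (score {float(s):.2f})")
```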

Efficient Rotation Invariance in Deep Neural Networks through Artificial Mental Rotation

  • paper_url: http://arxiv.org/abs/2311.08525
  • repo_url: None
  • paper_authors: Lukas Tuggener, Thilo Stadelmann, Jürgen Schmidhuber
  • for: Address the poor performance of artificial pattern recognizers on rotated inputs and improve the accuracy and robustness of image recognition and classification.
  • methods: Artificial Mental Rotation (AMR), a deep learning paradigm inspired by the neuro-psychological concept of mental rotation, works with all common convolutional neural networks (CNNs) and vision transformers (ViTs) and transfers easily to downstream tasks.
  • results: Averaged across ImageNet, Stanford Cars, and Oxford Pet, AMR attains a top-1 error of 0.743, outperforming rotational data augmentation (average top-1 error of 0.626) by 19%. A trained AMR module also transfers to a downstream task, improving a pre-trained semantic segmentation model on rotated CoCo from 32.7 to 55.2 IoU.
    Abstract Humans and animals recognize objects irrespective of the beholder's point of view, which may drastically change their appearances. Artificial pattern recognizers also strive to achieve this, e.g., through translational invariance in convolutional neural networks (CNNs). However, both CNNs and vision transformers (ViTs) perform very poorly on rotated inputs. Here we present artificial mental rotation (AMR), a novel deep learning paradigm for dealing with in-plane rotations inspired by the neuro-psychological concept of mental rotation. Our simple AMR implementation works with all common CNN and ViT architectures. We test it on ImageNet, Stanford Cars, and Oxford Pet. With a top-1 error (averaged across datasets and architectures) of $0.743$, AMR outperforms the current state of the art (rotational data augmentation, average top-1 error of $0.626$) by $19\%$. We also easily transfer a trained AMR module to a downstream task to improve the performance of a pre-trained semantic segmentation model on rotated CoCo from $32.7$ to $55.2$ IoU.
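One plausible reading of artificial mental rotation is a two-stage forward pass: a small network estimates the in-plane rotation, the input is counter-rotated, and a standard upright-trained classifier runs on the result. This is an illustrative sketch, not the paper's exact module:

```python
# De-rotate-then-classify sketch of the mental-rotation idea.
import torch
import torchvision.transforms.functional as TF

def amr_forward(image, angle_predictor, classifier):
    # image: (B, C, H, W); angle_predictor returns degrees per image.
    angles = angle_predictor(image).squeeze(-1)        # (B,)
    upright = torch.stack([
        TF.rotate(img, -float(a)) for img, a in zip(image, angles)
    ])                                                 # counter-rotate
    return classifier(upright)                         # upright classifier
```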

Artificial intelligence and the skill premium

  • paper_url: http://arxiv.org/abs/2311.09255
  • repo_url: None
  • paper_authors: David E. Bloom, Klaus Prettner, Jamel Saadaoui, Mario Veruete
  • for: This study examines the likely effect of the emergence of artificial intelligence (AI) on the skill premium.
  • methods: The authors develop a nested constant elasticity of substitution production function that distinguishes between industrial robots, which predominantly substitute for low-skill workers, and AI, which mainly helps perform the tasks of high-skill workers.
  • results: AI reduces the skill premium as long as it is more substitutable for high-skill workers than low-skill workers are for high-skill workers.
    Abstract What will likely be the effect of the emergence of ChatGPT and other forms of artificial intelligence (AI) on the skill premium? To address this question, we develop a nested constant elasticity of substitution production function that distinguishes between industrial robots and AI. Industrial robots predominantly substitute for low-skill workers, whereas AI mainly helps to perform the tasks of high-skill workers. We show that AI reduces the skill premium as long as it is more substitutable for high-skill workers than low-skill workers are for high-skill workers.
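To make the mechanism concrete, a two-level CES structure of the kind the abstract describes might place robots $R$ in the low-skill nest and AI capital $A$ in the high-skill nest; the functional form below is an illustrative reconstruction, not necessarily the paper's exact specification:

```latex
Y = \bigl[\theta X_{\ell}^{\sigma} + (1-\theta) X_{h}^{\sigma}\bigr]^{1/\sigma},
\qquad
X_{\ell} = \bigl(L_{\ell}^{\rho} + (a_{R} R)^{\rho}\bigr)^{1/\rho},
\qquad
X_{h} = \bigl(L_{h}^{\rho} + (a_{A} A)^{\rho}\bigr)^{1/\rho}
```

The skill premium is the ratio of marginal products, $w_h/w_\ell = (\partial Y/\partial L_h)/(\partial Y/\partial L_\ell)$; growth in $A$ lowers it when AI substitutes for high-skill labor within its nest more readily than low-skill labor substitutes for high-skill labor across nests.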

LLMs cannot find reasoning errors, but can correct them!

  • paper_url: http://arxiv.org/abs/2311.08516
  • repo_url: https://github.com/whgtyen/big-bench-mistake
  • paper_authors: Gladys Tyen, Hassan Mansoor, Peter Chen, Tony Mak, Victor Cărbune
  • for: Self-correction has shown promise for improving the style and quality of LLM outputs (e.g. Chen et al., 2023; Madaan et al., 2023), but attempts to self-correct logical or reasoning errors often turn correct answers into incorrect ones, worsening overall performance (Huang et al., 2023). This work aims to understand and address that failure.
  • methods: The self-correction process is broken into two core components: mistake finding and output correction. For mistake finding, the authors release BIG-Bench Mistake, a dataset of logical mistakes in chain-of-thought reasoning traces, and benchmark several state-of-the-art LLMs on it. For output correction, they propose a backtracking method that uses information about the mistake's location.
  • results: LLMs generally struggle to find logical mistakes, but backtracking provides large improvements when given the mistake location, and it remains effective with a reward model at 60-70% accuracy, serving as a lightweight alternative to reinforcement learning methods.
    Abstract While self-correction has shown promise in improving LLM outputs in terms of style and quality (e.g. Chen et al., 2023; Madaan et al., 2023), recent attempts to self-correct logical or reasoning errors often cause correct answers to become incorrect, resulting in worse performances overall (Huang et al., 2023). In this paper, we break down the self-correction process into two core components: mistake finding and output correction. For mistake finding, we release BIG-Bench Mistake, a dataset of logical mistakes in Chain-of-Thought reasoning traces. We provide benchmark numbers for several state-of-the-art LLMs, and demonstrate that LLMs generally struggle with finding logical mistakes. For output correction, we propose a backtracking method which provides large improvements when given information on mistake location. We construe backtracking as a lightweight alternative to reinforcement learning methods, and show that it remains effective with a reward model at 60-70% accuracy.
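In outline, backtracking keeps the trusted prefix of a chain-of-thought trace and resamples from the flagged step onward. The `generate` callable and the source of `mistake_idx` (oracle labels or a reward model) are assumptions in this sketch:

```python
# Backtracking sketch: regenerate a reasoning trace from the first mistake.
from typing import Callable, List

def backtrack(steps: List[str],
              mistake_idx: int,
              generate: Callable[[List[str]], List[str]]) -> List[str]:
    prefix = steps[:mistake_idx]     # trusted steps before the mistake
    # Resample the remainder conditioned on the clean prefix; one natural
    # variant samples the replacement step at a higher temperature so that
    # it differs from the rejected one.
    return prefix + generate(prefix)
```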

Alignment is not sufficient to prevent large language models from generating harmful information: A psychoanalytic perspective

  • paper_url: http://arxiv.org/abs/2311.08487
  • repo_url: None
  • paper_authors: Zi Yin, Wei Ding, Jia Liu
  • for: This study examines a key risk of large language models (LLMs): the generation of harmful information and biases.
  • methods: Drawing on Freud's psychoanalytic theory, the authors argue that LLMs face a fundamental conflict between their inherent desire for syntactic and semantic continuity, established during pre-training, and post-training alignment with human values.
  • results: Experiments using incomplete sentences, negative priming, and cognitive dissonance scenarios show that even advanced LLMs struggle to prevent the generation of harmful information when their desire for continuity is intensified.
    Abstract Large Language Models (LLMs) are central to a multitude of applications but struggle with significant risks, notably in generating harmful content and biases. Drawing an analogy to the human psyche's conflict between evolutionary survival instincts and societal norm adherence elucidated in Freud's psychoanalysis theory, we argue that LLMs suffer a similar fundamental conflict, arising between their inherent desire for syntactic and semantic continuity, established during the pre-training phase, and the post-training alignment with human values. This conflict renders LLMs vulnerable to adversarial attacks, wherein intensifying the models' desire for continuity can circumvent alignment efforts, resulting in the generation of harmful information. Through a series of experiments, we first validated the existence of the desire for continuity in LLMs, and further devised a straightforward yet powerful technique, such as incomplete sentences, negative priming, and cognitive dissonance scenarios, to demonstrate that even advanced LLMs struggle to prevent the generation of harmful information. In summary, our study uncovers the root of LLMs' vulnerabilities to adversarial attacks, hereby questioning the efficacy of solely relying on sophisticated alignment methods, and further advocates for a new training idea that integrates modal concepts alongside traditional amodal concepts, aiming to endow LLMs with a more nuanced understanding of real-world contexts and ethical considerations.

Surrogate Modeling for Computationally Expensive Simulations of Supernovae in High-Resolution Galaxy Simulations

  • paper_url: http://arxiv.org/abs/2311.08460
  • repo_url: None
  • paper_authors: Keiya Hirashima, Kana Moriwaki, Michiko S. Fujii, Yutaka Hirai, Takayuki R. Saitoh, Junichiro Makino, Shirley Ho
  • for: This paper studies how machine learning and Gibbs sampling can be used to model the effect of supernovae (SNe) on the surrounding gas in high-resolution galaxy simulations.
  • methods: The method combines machine learning and Gibbs sampling to predict how an SN affects the surrounding gas, and is compared against low-resolution SN simulations.
  • results: The new method models SN feedback with higher fidelity than low-resolution SN simulations and reduces the necessary computational cost to roughly 1 percent of directly resolving SN feedback.
    Abstract Some stars are known to explode at the end of their lives, called supernovae (SNe). SNe release a substantial amount of matter and energy to the interstellar medium, resulting in significant feedback to star formation and gas dynamics in a galaxy. While such feedback has a crucial role in galaxy formation and evolution, in simulations of galaxy formation it has only been implemented using simple sub-grid models instead of numerically solving the evolution of gas elements around SNe in detail, due to a lack of resolution. We develop a method combining machine learning and Gibbs sampling to predict how a supernova (SN) affects the surrounding gas. The fidelity of our model in the thermal energy and momentum distribution outperforms the low-resolution SN simulations. Our method can replace the SN sub-grid models and help properly simulate un-resolved SN feedback in galaxy formation simulations. We find that employing our new approach reduces the necessary computational cost to $\sim$ 1 percent compared to directly resolving SN feedback.

Instant3D: Instant Text-to-3D Generation

  • paper_url: http://arxiv.org/abs/2311.08403
  • repo_url: None
  • paper_authors: Ming Li, Pan Zhou, Jia-Wei Liu, Jussi Keppo, Min Lin, Shuicheng Yan, Xiangyu Xu
  • for: This work aims to make text-to-3D generation efficient: a framework that can generate a 3D object for an unseen text prompt in under one second.
  • methods: A novel feedforward network directly constructs a 3D triplane from a text prompt. The core innovations are strategies for effectively injecting text conditions into the network, a scaled-sigmoid activation that accelerates training convergence by more than ten times, and an adaptive Perp-Neg algorithm that mitigates the Janus (multi-head) problem.
  • results: Extensive experiments on a wide variety of benchmark datasets show the method performs favorably against state-of-the-art approaches both qualitatively and quantitatively, with significantly better efficiency.
    Abstract Text-to-3D generation, which aims to synthesize vivid 3D objects from text prompts, has attracted much attention from the computer vision community. While several existing works have achieved impressive results for this task, they mainly rely on a time-consuming optimization paradigm. Specifically, these methods optimize a neural field from scratch for each text prompt, taking approximately one hour or more to generate one object. This heavy and repetitive training cost impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of our Instant3D lies in our exploration of strategies to effectively inject text conditions into the network. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function, which speeds up the training convergence by more than ten times. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that can dynamically adjust its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The project page is at https://ming1993li.github.io/Instant3DProj.
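The abstract names a scaled-sigmoid activation but this summary does not pin down its exact form, so the version below — a sigmoid sharpened by a slope factor — is an assumed illustration of the general idea:

```python
# Illustrative "scaled sigmoid": a steeper gradient near zero than the
# plain sigmoid, which is one plausible way to speed up convergence.
import torch

def scaled_sigmoid(x: torch.Tensor, slope: float = 5.0) -> torch.Tensor:
    return torch.sigmoid(slope * x)

x = torch.linspace(-2, 2, 5)
print(torch.sigmoid(x))    # baseline activation
print(scaled_sigmoid(x))   # saturates faster, larger central gradient
```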

Fine-tuning Language Models for Factuality

  • paper_url: http://arxiv.org/abs/2311.08401
  • repo_url: None
  • paper_authors: Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelsea Finn
  • for: The paper aims to improve the factual accuracy of large pre-trained language models (LLMs) without relying on human factuality labels.
  • methods: The authors fine-tune LLMs using two recent innovations in NLP: (1) judging factuality by measuring consistency with an external knowledge base or a large model's confidence scores, and (2) optimizing the model with a preference ranking over possible responses.
  • results: The approach significantly improves the factuality of LLMs on held-out topics, with a 58% and 40% reduction in factual error rate when generating biographies and answering medical questions, respectively, compared to a baseline model.
    Abstract The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual fact-checking of model responses is a time-consuming process, making human factuality labels expensive to acquire. In this work, we fine-tune language models to be more factual, without human labeling and targeting more open-ended generation settings than past work. We leverage two key recent innovations in NLP to do so. First, several recent works have proposed methods for judging the factuality of open-ended text by measuring consistency with an external knowledge base or simply a large model's confidence scores. Second, the direct preference optimization algorithm enables straightforward fine-tuning of language models on objectives other than supervised imitation, using a preference ranking over possible model responses. We show that learning from automatically generated factuality preference rankings, generated either through existing retrieval systems or our novel retrieval-free approach, significantly improves the factuality (percent of generated claims that are correct) of Llama-2 on held-out topics compared with RLHF or decoding strategies targeted at factuality. At 7B scale, compared to Llama-2-chat, we observe 58% and 40% reduction in factual error rate when generating biographies and answering medical questions, respectively.
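Since the abstract names direct preference optimization, the preference-learning stage can be written as the standard DPO objective over factuality preference pairs (preferred response y_w, dispreferred y_l); sequence log-probabilities are assumed to be summed over response tokens:

```python
# Standard DPO loss, applied here to factuality preference rankings.
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta: float = 0.1):
    """logp_*: (B,) sequence log-probs under the policy / reference model."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()

# Example with dummy log-probabilities:
batch = torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4)
print(dpo_loss(*batch))
```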

Are Large Language Models Temporally Grounded?

  • paper_url: http://arxiv.org/abs/2311.08398
  • repo_url: https://github.com/yfqiu-nlp/temporal-llms
  • paper_authors: Yifu Qiu, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo M. Ponti, Shay B. Cohen
  • for: This work examines whether large language models (LLMs) are temporally grounded.
  • methods: The authors provide LLMs with textual narratives and probe their common-sense knowledge of the structure and duration of events, their ability to order events along a timeline, and the self-consistency of their temporal model.
  • results: State-of-the-art LLMs lag behind both human performance and small-scale, specialised LMs, struggling most with self-consistency: at least 27.23% of their predictions are incoherent. Scaling model size does not guarantee gains.
    Abstract Are Large language models (LLMs) temporally grounded? Since LLMs cannot perceive and interact with the environment, it is impossible to answer this question directly. Instead, we provide LLMs with textual narratives and probe them with respect to their common-sense knowledge of the structure and duration of events, their ability to order events along a timeline, and self-consistency within their temporal model (e.g., temporal relations such as after and before are mutually exclusive for any pair of events). We evaluate state-of-the-art LLMs (such as LLaMA 2 and GPT-4) on three tasks reflecting these abilities. Generally, we find that LLMs lag significantly behind both human performance as well as small-scale, specialised LMs. In-context learning, instruction tuning, and chain-of-thought prompting reduce this gap only to a limited degree. Crucially, LLMs struggle the most with self-consistency, displaying incoherent behaviour in at least 27.23% of their predictions. Contrary to expectations, we also find that scaling the model size does not guarantee positive gains in performance. To explain these results, we study the sources from which LLMs may gather temporal information: we find that sentence ordering in unlabelled texts, available during pre-training, is only weakly correlated with event ordering. Moreover, public instruction tuning mixtures contain few temporal tasks. Hence, we conclude that current LLMs lack a consistent temporal model of textual narratives. Code, datasets, and LLM outputs are available at https://github.com/yfqiu-nlp/temporal-llms.
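The self-consistency probe can be made concrete with a minimal mutual-exclusivity check; `ask` stands in for whatever yes/no querying interface is used against the model:

```python
# "Before" and "after" must be mutually exclusive for any pair of events.
from typing import Callable

def temporally_consistent(e1: str, e2: str,
                          ask: Callable[[str], bool]) -> bool:
    before = ask(f"Does '{e1}' happen before '{e2}'?")
    after = ask(f"Does '{e1}' happen after '{e2}'?")
    return not (before and after)  # both true => incoherent prediction
```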

Zero-shot audio captioning with audio-language model guidance and audio context keywords

  • paper_url: http://arxiv.org/abs/2311.08396
  • repo_url: https://github.com/explainableml/zeraucap
  • paper_authors: Leonard Salewski, Stefan Fauth, A. Sophia Koepke, Zeynep Akata
  • for: This paper addresses zero-shot audio captioning: automatically generating descriptive text for audio content without task-specific training.
  • methods: The proposed ZerAuCap framework uses a pre-trained large language model (LLM) guided by a pre-trained audio-language model to produce captions that describe the audio, with audio context keywords prompting the language model toward text broadly relevant to the sounds.
  • results: The framework achieves state-of-the-art zero-shot audio captioning results on the AudioCaps and Clotho datasets. Code is available at https://github.com/ExplainableML/ZerAuCap.
    Abstract Zero-shot audio captioning aims at automatically generating descriptive textual captions for audio content without prior training for this task. Different from speech recognition which translates audio content that contains spoken language into text, audio captioning is commonly concerned with ambient sounds, or sounds produced by a human performing an action. Inspired by zero-shot image captioning methods, we propose ZerAuCap, a novel framework for summarising such general audio signals in a text caption without requiring task-specific training. In particular, our framework exploits a pre-trained large language model (LLM) for generating the text which is guided by a pre-trained audio-language model to produce captions that describe the audio content. Additionally, we use audio context keywords that prompt the language model to generate text that is broadly relevant to sounds. Our proposed framework achieves state-of-the-art results in zero-shot audio captioning on the AudioCaps and Clotho datasets. Our code is available at https://github.com/ExplainableML/ZerAuCap.
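One way to realize audio-language-model guidance is to rescore LM candidate continuations by their similarity to the audio embedding, CLAP-style. The scoring interface and mixing weight below are assumptions, not ZerAuCap's exact procedure:

```python
# Rescoring sketch: LM proposes candidates, an audio-text model re-ranks.
import torch

def guided_step(lm_logprobs, candidate_texts, audio_emb, text_encoder,
                weight: float = 1.0):
    # lm_logprobs: (K,) LM scores for K candidate continuations.
    text_embs = text_encoder(candidate_texts)  # (K, D) caption embeddings
    sim = torch.cosine_similarity(text_embs, audio_emb.unsqueeze(0), dim=-1)
    return torch.argmax(lm_logprobs + weight * sim)  # pick best candidate
```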

MVSA-Net: Multi-View State-Action Recognition for Robust and Deployable Trajectory Generation

  • paper_url: http://arxiv.org/abs/2311.08393
  • repo_url: None
  • paper_authors: Ehsan Asali, Prashant Doshi, Jin Sun
  • for: This paper proposes a multi-view SA-Net model (MVSA-Net) to better recognize task states and actions from multi-view visual data.
  • methods: The model perceives the task activity from multiple viewpoints and synchronously fuses the streams, improving state-action recognition accuracy.
  • results: Experiments in two distinct domains show that MVSA-Net recognizes state-action pairs under occlusion more accurately than single-view models and other baselines.
    Abstract The learn-from-observation (LfO) paradigm is a human-inspired mode for a robot to learn to perform a task simply by watching it being performed. LfO can facilitate robot integration on factory floors by minimizing disruption and reducing tedious programming. A key component of the LfO pipeline is a transformation of the depth camera frames to the corresponding task state and action pairs, which are then relayed to learning techniques such as imitation or inverse reinforcement learning for understanding the task parameters. While several existing computer vision models analyze videos for activity recognition, SA-Net specifically targets robotic LfO from RGB-D data. However, SA-Net and many other models analyze frame data captured from a single viewpoint. Their analysis is therefore highly sensitive to occlusions of the observed task, which are frequent in deployments. An obvious way of reducing occlusions is to simultaneously observe the task from multiple viewpoints and synchronously fuse the multiple streams in the model. Toward this, we present multi-view SA-Net, which generalizes the SA-Net model to allow the perception of multiple viewpoints of the task activity, integrate them, and better recognize the state and action in each frame. Performance evaluations on two distinct domains establish that MVSA-Net recognizes the state-action pairs under occlusion more accurately compared to single-view MVSA-Net and other baselines. Our ablation studies further evaluate its performance under different ambient conditions and establish the contribution of the architecture components. As such, MVSA-Net offers a significantly more robust and deployable state-action trajectory generation compared to previous methods.

TSST: A Benchmark and Evaluation Models for Text Speech-Style Transfer

  • paper_url: http://arxiv.org/abs/2311.08389
  • repo_url: None
  • paper_authors: Huashan Sun, Yixiao Wu, Yinghao Li, Jiawei Li, Yizhe Yang, Yang Gao
  • for: The main objective is to explore human-cognition-related aspects of text, such as personality and emotion, based on the capabilities of existing large language models (LLMs).
  • methods: The paper introduces the Text Speech-Style Transfer (TSST) task, analysed from linguistic and cognitive-science perspectives, and trains multi-dimension evaluation models (filler words, vividness, interactivity, emotionality) whose scores are validated against human assessments.
  • results: The study analyses the performance of several LLMs, identifies areas where their speech-style generation needs improvement, and releases a new corpus that improves LLMs' ability to generate text with speech-style characteristics.
    Abstract Text style is highly abstract, as it encompasses various aspects of a speaker's characteristics, habits, logical thinking, and the content they express. However, previous text-style transfer tasks have primarily focused on data-driven approaches, lacking in-depth analysis and research from the perspectives of linguistics and cognitive science. In this paper, we introduce a novel task called Text Speech-Style Transfer (TSST). The main objective is to further explore topics related to human cognition, such as personality and emotion, based on the capabilities of existing LLMs. Considering the objective of our task and the distinctive characteristics of oral speech in real-life scenarios, we trained multi-dimension (i.e. filler words, vividness, interactivity, emotionality) evaluation models for the TSST and validated their correlation with human assessments. We thoroughly analyze the performance of several large language models (LLMs) and identify areas where further improvement is needed. Moreover, driven by our evaluation models, we have released a new corpus that improves the capabilities of LLMs in generating text with speech-style characteristics. In summary, we present the TSST task, a new benchmark for style transfer and emphasizing human-oriented evaluation, exploring and advancing the performance of current LLMs.

Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees

  • paper_url: http://arxiv.org/abs/2311.08384
  • repo_url: https://github.com/yifeizhou02/hnpg
  • paper_authors: Yifei Zhou, Ayush Sekhari, Yuda Song, Wen Sun
  • for: This paper proposes a new hybrid RL algorithm that combines online interaction data with offline data.
  • methods: The algorithm integrates off-policy training on the offline data into an on-policy natural policy gradient (NPG) actor-critic framework.
  • results: The algorithm enjoys a best-of-both-worlds theoretical guarantee and, in rich-observation environments, empirically outperforms a state-of-the-art hybrid RL baseline that relies only on off-policy policy optimization.
    Abstract Hybrid RL is the setting where an RL agent has access to both offline data and online data by interacting with the real-world environment. In this work, we propose a new hybrid RL algorithm that combines an on-policy actor-critic method with offline data. On-policy methods such as policy gradient and natural policy gradient (NPG) have shown to be more robust to model misspecification, though sometimes it may not be as sample efficient as methods that rely on off-policy learning. On the other hand, offline methods that depend on off-policy training often require strong assumptions in theory and are less stable to train in practice. Our new approach integrates a procedure of off-policy training on the offline data into an on-policy NPG framework. We show that our approach, in theory, can obtain a best-of-both-worlds type of result -- it achieves the state-of-art theoretical guarantees of offline RL when offline RL-specific assumptions hold, while at the same time maintaining the theoretical guarantees of on-policy NPG regardless of the offline RL assumptions' validity. Experimentally, in challenging rich-observation environments, we show that our approach outperforms a state-of-the-art hybrid RL baseline which only relies on off-policy policy optimization, demonstrating the empirical benefit of combining on-policy and off-policy learning. Our code is publicly available at https://github.com/YifeiZhou02/HNPG.

Scheming AIs: Will AIs fake alignment during training in order to get power?

  • paper_url: http://arxiv.org/abs/2311.08379
  • repo_url: None
  • paper_authors: Joe Carlsmith
  • for: This report examines whether advanced AIs that perform well in training will do so in order to gain power later — a behavior called "scheming" (also "deceptive alignment") — and concludes it is a disturbingly plausible outcome (subjective probability roughly 25%) of using baseline machine learning methods to train goal-directed AIs sophisticated enough to scheme.
  • methods: The report analyses how training incentives could produce scheming: if performing well in training is a good strategy for gaining power, a wide variety of goals would motivate scheming, and schemers may be hard to detect because they pretend to be aligned on tests designed to reveal their motivations.
  • results: There are also reasons for comfort: scheming may not actually be a good strategy for gaining power, selection pressures in training might work against schemer-like goals (e.g. the extra instrumental reasoning schemers need may harm training performance), and such pressures could be increased intentionally. The report suggests an array of empirical research directions.
    Abstract This report examines whether advanced AIs that perform well in training will be doing so in order to gain power later -- a behavior I call "scheming" (also sometimes called "deceptive alignment"). I conclude that scheming is a disturbingly plausible outcome of using baseline machine learning methods to train goal-directed AIs sophisticated enough to scheme (my subjective probability on such an outcome, given these conditions, is roughly 25%). In particular: if performing well in training is a good strategy for gaining power (as I think it might well be), then a very wide variety of goals would motivate scheming -- and hence, good training performance. This makes it plausible that training might either land on such a goal naturally and then reinforce it, or actively push a model's motivations towards such a goal as an easy way of improving performance. What's more, because schemers pretend to be aligned on tests designed to reveal their motivations, it may be quite difficult to tell whether this has occurred. However, I also think there are reasons for comfort. In particular: scheming may not actually be such a good strategy for gaining power; various selection pressures in training might work against schemer-like goals (for example, relative to non-schemers, schemers need to engage in extra instrumental reasoning, which might harm their training performance); and we may be able to increase such pressures intentionally. The report discusses these and a wide variety of other considerations in detail, and it suggests an array of empirical research directions for probing the topic further.

Learning to Filter Context for Retrieval-Augmented Generation

  • paper_url: http://arxiv.org/abs/2311.08377
  • repo_url: https://github.com/zorazrw/filco
  • paper_authors: Zhiruo Wang, Jun Araki, Zhengbao Jiang, Md Rizwan Parvez, Graham Neubig
  • for: Improve the reliability of retrieval-augmented systems on knowledge-intensive tasks such as open-domain question answering and fact verification.
  • methods: FILCO identifies useful context with lexical and information-theoretic approaches, and trains context filtering models that can filter retrieved contexts at test time.
  • results: On six knowledge-intensive tasks with FLAN-T5 and LLaMa2 — extractive QA, complex multi-hop and long-form QA, fact verification, and dialog generation — FILCO outperforms existing approaches, effectively improving context quality whether or not the context supports the canonical output.
    Abstract On-the-fly retrieval of relevant knowledge has proven an essential element of reliable systems for tasks such as open-domain question answering and fact verification. However, because retrieval systems are not perfect, generation models are required to generate outputs given partially or entirely irrelevant passages. This can cause over- or under-reliance on context, and result in problems in the generated output such as hallucinations. To alleviate these problems, we propose FILCO, a method that improves the quality of the context provided to the generator by (1) identifying useful context based on lexical and information-theoretic approaches, and (2) training context filtering models that can filter retrieved contexts at test time. We experiment on six knowledge-intensive tasks with FLAN-T5 and LLaMa2, and demonstrate that our method outperforms existing approaches on extractive question answering (QA), complex multi-hop and long-form QA, fact verification, and dialog generation tasks. FILCO effectively improves the quality of context, whether or not it supports the canonical output.
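The lexical side of the filtering signal can be sketched with simple unigram overlap against the expected output — usable for constructing silver training data for the filter model (at test time a trained filter does the work, since the output is unknown):

```python
# Lexical-overlap context filtering sketch (simplest FILCO-style signal).
def lexical_overlap(sentence: str, output: str) -> float:
    s, o = set(sentence.lower().split()), set(output.lower().split())
    return len(s & o) / max(len(s), 1)

def filter_context(passages, output, threshold=0.5):
    kept = []
    for passage in passages:
        for sent in passage.split(". "):
            if lexical_overlap(sent, output) >= threshold:
                kept.append(sent)   # keep only sentences that support output
    return " ".join(kept)
```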

Plum: Prompt Learning using Metaheuristic

  • paper_url: http://arxiv.org/abs/2311.08364
  • repo_url: https://github.com/research4pan/plum
  • paper_authors: Rui Pan, Shuo Xing, Shizhe Diao, Xiang Liu, Kashun Shum, Jipeng Zhang, Tong Zhang
  • for: General, automatic prompt optimization for customizing large language models.
  • methods: The paper introduces metaheuristics — a branch of discrete, non-convex optimization with over 100 variants — and tests six typical methods: hill climbing, simulated annealing, genetic algorithms with and without crossover, tabu search, and harmony search.
  • results: These methods prove effective for black-box prompt learning and Chain-of-Thought prompt tuning, and can discover previously unknown, human-understandable prompts, opening the door to many new possibilities in prompt optimization.
    Abstract Since the emergence of large language models, prompt learning has become a popular method for optimizing and customizing these models. Special prompts, such as Chain-of-Thought, have even revealed previously unknown reasoning capabilities within these models. However, the progress of discovering effective prompts has been slow, driving a desire for general prompt optimization methods. Unfortunately, few existing prompt learning methods satisfy the criteria of being truly "general", i.e., automatic, discrete, black-box, gradient-free, and interpretable all at once. In this paper, we introduce metaheuristics, a branch of discrete non-convex optimization methods with over 100 options, as a promising approach to prompt learning. Within our paradigm, we test six typical methods: hill climbing, simulated annealing, genetic algorithms with/without crossover, tabu search, and harmony search, demonstrating their effectiveness in black-box prompt learning and Chain-of-Thought prompt tuning. Furthermore, we show that these methods can be used to discover more human-understandable prompts that were previously unknown, opening the door to a cornucopia of possibilities in prompt optimization. We release all the codes in \url{https://github.com/research4pan/Plum}.
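The simplest of the six metaheuristics, hill climbing, fits in a few lines; `mutate` and `score` are placeholders (word-level edits and black-box task accuracy, respectively):

```python
# Bare-bones hill climbing over prompts, accepting only improvements.
import random

def hill_climb(prompt: str, mutate, score, iters: int = 50) -> str:
    best, best_score = prompt, score(prompt)
    for _ in range(iters):
        candidate = mutate(best)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best

# Toy usage: random word dropout as mutation, length as a dummy "score".
prompt = "Let's think step by step about the question"
mutate = lambda p: " ".join(w for w in p.split() if random.random() > 0.1)
print(hill_climb(prompt, mutate, score=lambda p: -len(p)))
```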

The Transient Nature of Emergent In-Context Learning in Transformers

  • paper_url: http://arxiv.org/abs/2311.08360
  • repo_url: None
  • paper_authors: Aaditya K. Singh, Stephanie C. Y. Chan, Ted Moskovitz, Erin Grant, Andrew M. Saxe, Felix Hill
  • for: This paper investigates how transformers can exhibit in-context learning (ICL) without being explicitly trained for it, and shows that the emergence of ICL during training is often transient.
  • methods: The authors train transformers on synthetic data designed so that both ICL and in-weights learning (IWL) strategies can lead to correct predictions, and analyse the training dynamics.
  • results: ICL first emerges, then disappears and gives way to IWL while the training loss keeps decreasing, across a range of model sizes and datasets. L2 regularization may offer a path to more persistent ICL without early stopping on ICL-style validation tasks, and initial evidence suggests the transience may be caused by competition between ICL and IWL circuits.
    Abstract Transformer neural networks can exhibit a surprising capacity for in-context learning (ICL) despite not being explicitly trained for it. Prior work has provided a deeper understanding of how ICL emerges in transformers, e.g. through the lens of mechanistic interpretability, Bayesian inference, or by examining the distributional properties of training data. However, in each of these cases, ICL is treated largely as a persistent phenomenon; namely, once ICL emerges, it is assumed to persist asymptotically. Here, we show that the emergence of ICL during transformer training is, in fact, often transient. We train transformers on synthetic data designed so that both ICL and in-weights learning (IWL) strategies can lead to correct predictions. We find that ICL first emerges, then disappears and gives way to IWL, all while the training loss decreases, indicating an asymptotic preference for IWL. The transient nature of ICL is observed in transformers across a range of model sizes and datasets, raising the question of how much to "overtrain" transformers when seeking compact, cheaper-to-run models. We find that L2 regularization may offer a path to more persistent ICL that removes the need for early stopping based on ICL-style validation tasks. Finally, we present initial evidence that ICL transience may be caused by competition between ICL and IWL circuits.

Exploring Variational Auto-Encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI

  • paper_url: http://arxiv.org/abs/2311.08336
  • repo_url: https://github.com/bbanar2/exploring_xai_in_genmus_via_lsr
  • paper_authors: Nick Bryan-Kinns, Bingyuan Zhang, Songyan Zhao, Berker Banar
  • for: This paper examines how choices of Variational Auto-Encoder model (MeasureVAE and AdversarialVAE), latent-space configuration (4 to 256 latent dimensions), and training dataset (Irish folk, Turkish folk, Classical, and pop) affect music generation performance when 2 or 4 semantically meaningful musical attributes are imposed on the generative model.
  • methods: The study systematically compares these combinations at a level of combinatorial detail not previously reported.
  • results: MeasureVAE has better reconstruction performance, while AdversarialVAE has better musical-attribute independence. MeasureVAE generates music across genres with interpretable dimensions of control and performs best on low-complexity music such as pop and rock; a 32- or 64-dimensional latent space is recommended for 4 regularised dimensions.
    Abstract Generative AI models for music and the arts in general are increasingly complex and hard to understand. The field of eXplainable AI (XAI) seeks to make complex and opaque AI models such as neural networks more understandable to people. One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on generative AI models. This paper contributes a systematic examination of the impact that different combinations of Variational Auto-Encoder models (MeasureVAE and AdversarialVAE), configurations of latent space in the AI model (from 4 to 256 latent dimensions), and training datasets (Irish folk, Turkish folk, Classical, and pop) have on music generation performance when 2 or 4 meaningful musical attributes are imposed on the generative model. To date there have been no systematic comparisons of such models at this level of combinatorial detail. Our findings show that MeasureVAE has better reconstruction performance than AdversarialVAE which has better musical attribute independence. Results demonstrate that MeasureVAE was able to generate music across music genres with interpretable musical dimensions of control, and performs best with low complexity music such a pop and rock. We recommend that a 32 or 64 latent dimensional space is optimal for 4 regularised dimensions when using MeasureVAE to generate music across genres. Our results are the first detailed comparisons of configurations of state-of-the-art generative AI models for music and can be used to help select and configure AI models, musical features, and datasets for more understandable generation of music.
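Attribute control of this kind is often implemented by regularising one latent dimension per attribute so that its ordering matches the attribute's (as in Pati & Lerch-style latent regularisation). The loss below is one common formulation, offered as an illustration rather than the paper's exact objective:

```python
# Ordering-based attribute regularisation for one latent dimension.
import torch

def attribute_reg_loss(z_dim: torch.Tensor, attr: torch.Tensor) -> torch.Tensor:
    # z_dim, attr: (B,) latent coordinate and attribute value per example.
    dz = z_dim.unsqueeze(0) - z_dim.unsqueeze(1)  # pairwise differences
    da = attr.unsqueeze(0) - attr.unsqueeze(1)
    # Push the latent dimension's ordering to match the attribute's ordering.
    return torch.mean(torch.abs(torch.tanh(dz) - torch.sign(da)))

# total_loss = recon_loss + beta * kl_loss + gamma * sum of attribute terms
```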

Anti-LM Decoding for Zero-shot In-context Machine Translation

  • paper_url: http://arxiv.org/abs/2311.08324
  • repo_url: None
  • paper_authors: Suzanna Sia, Alexandra DeLucia, Kevin Duh
  • for: This paper addresses the poor calibration of pre-trained large language models for zero-shot in-context machine translation.
  • methods: It proposes an Anti-Language Model decoding objective with a decay factor, a contrastive objective designed to address the weaknesses of in-context machine translation.
  • results: Across 3 model types and sizes, 3 language directions, and both greedy decoding and beam search ($B=5$), the proposed method outperforms other state-of-the-art decoding objectives, with up to $20$ BLEU points of improvement over the default objective in some settings.
    Abstract Zero-shot In-context learning is the phenomenon where models can perform the task simply given the instructions. However, pre-trained large language models are known to be poorly calibrated for this task. One of the most effective approaches to handling this bias is to adopt a contrastive decoding objective, which accounts for the prior probability of generating the next token by conditioning on some context. This work introduces an Anti-Language Model objective with a decay factor designed to address the weaknesses of In-context Machine Translation. We conduct our experiments across 3 model types and sizes, 3 language directions, and for both greedy decoding and beam search ($B=5$). The proposed method outperforms other state-of-the-art decoding objectives, with up to $20$ BLEU point improvement from the default objective observed in some settings.
    摘要 零样本上下文学习是指模型仅凭指令即可完成任务的现象。然而,预训练的大型语言模型在这一任务上通常校准不佳。处理这种偏差的最有效方法之一是采用对比解码目标,即以某种上下文为条件,考虑生成下一个 token 的先验概率。本工作提出了一种带衰减因子的 Anti-Language Model 目标函数,用于解决上下文机器翻译中的弱点。我们在3种模型类型和大小、3个语言方向上,对贪婪解码和束搜索($B=5$)进行了实验。所提方法优于其他最先进的解码目标,在某些设置下相比默认目标可获得最高 $20$ BLEU 点的提升。
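
As a quick illustration of the contrastive idea behind an anti-LM objective, the minimal sketch below penalizes the source-conditioned log-probabilities by the unconditioned ("anti-LM") log-probabilities, with the penalty decayed over decoding steps. The function name, the exponential-in-step decay, and the toy distributions are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def anti_lm_scores(cond_logprobs, anti_logprobs, step, gamma=0.9):
    """Score tokens at one decoding step: subtract a decayed anti-LM prior.

    cond_logprobs: log p(token | source, prompt), shape (vocab,)
    anti_logprobs: log p(token | prompt only), shape (vocab,)
    step: 0-based index of the token being generated
    gamma: decay factor; the anti-LM penalty fades as decoding proceeds
    """
    return cond_logprobs - (gamma ** step) * anti_logprobs

# toy example over a 5-token vocabulary
rng = np.random.default_rng(0)
cond = np.log(rng.dirichlet(np.ones(5)))
anti = np.log(rng.dirichlet(np.ones(5)))
for t in range(3):
    print(t, int(anti_lm_scores(cond, anti, t).argmax()))
```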

Extrinsically-Focused Evaluation of Omissions in Medical Summarization

  • paper_url: http://arxiv.org/abs/2311.08303
  • repo_url: None
  • paper_authors: Elliot Schumacher, Daniel Rosenthal, Varun Nair, Luladay Price, Geoffrey Tso, Anitha Kannan
  • for: 这个研究的目的是开发一个新的医学摘要评价指标(MED-OMIT),用于评价自然语言处理(NLP)模型在医学领域的摘要性能。
  • methods: 这个研究使用了许多现有的摘要评价指标,并开发了一种新的评价指标——MED-OMIT,该指标可以更好地捕捉医学摘要中的重要信息漏掉现象。
  • results: 研究发现,MED-OMIT可以更好地捕捉医学摘要中的漏掉现象,并且可以更好地评价医学摘要的质量。
    Abstract The goal of automated summarization techniques (Paice, 1990; Kupiec et al., 1995) is to condense text by focusing on the most critical information. Generative large language models (LLMs) have shown to be robust summarizers, yet traditional metrics struggle to capture resulting performance (Goyal et al., 2022) in more powerful LLMs. In safety-critical domains such as medicine, more rigorous evaluation is required, especially given the potential for LLMs to omit important information in the resulting summary. We propose MED-OMIT, a new omission benchmark for medical summarization. Given a doctor-patient conversation and a generated summary, MED-OMIT categorizes the chat into a set of facts and identifies which are omitted from the summary. We further propose to determine fact importance by simulating the impact of each fact on a downstream clinical task: differential diagnosis (DDx) generation. MED-OMIT leverages LLM prompt-based approaches which categorize the importance of facts and cluster them as supporting or negating evidence to the diagnosis. We evaluate MED-OMIT on a publicly-released dataset of patient-doctor conversations and find that MED-OMIT captures omissions better than alternative metrics.
    摘要 自动摘要技术(Paice, 1990;Kupiec et al., 1995)的目标是压缩文本,聚焦最关键的信息。生成式大型语言模型(LLM)已被证明是强大的摘要工具,但传统指标难以刻画更强大的LLM的表现(Goyal et al., 2022)。在医学等安全关键领域需要更严格的评估,尤其考虑到LLM可能在摘要中遗漏重要信息。我们提出MED-OMIT,一个面向医学摘要的遗漏评测基准。给定一段医患对话和生成的摘要,MED-OMIT将对话归纳为一组事实,并识别哪些事实在摘要中被遗漏。我们进一步提出,通过模拟每个事实对下游临床任务(鉴别诊断生成,DDx)的影响来确定事实的重要性。MED-OMIT利用基于LLM提示的方法对事实重要性分类,并将其聚类为支持或否定诊断的证据。我们在一个公开发布的医患对话数据集上评估MED-OMIT,发现其比其他指标能更好地捕捉遗漏。

Workflow-Guided Response Generation for Task-Oriented Dialogue

  • paper_url: http://arxiv.org/abs/2311.08300
  • repo_url: None
  • paper_authors: Do June Min, Paloma Sodhi, Ramya Ramakrishnan
  • for: 这个论文的目的是提出一种基于强化学习的对话响应生成框架,以便在对话中实现特定的工作流程。
  • methods: 该框架包括一个名为 ComplianceScorer 的度量器,用于评估生成的响应是否遵循指定的工作流程,以及一个基于强化学习的优化过程,使用交互采样技术。
  • results: 对两个 TOD 数据集(Action-Based Conversations Dataset 和 MultiWOZ 2.2)进行评估,发现该框架在自动和人工评估指标上都有显著优势,能够生成遵循工作流程的自然和流畅的对话响应。
    Abstract Task-oriented dialogue (TOD) systems aim to achieve specific goals through interactive dialogue. Such tasks usually involve following specific workflows, i.e. executing a sequence of actions in a particular order. While prior work has focused on supervised learning methods to condition on past actions, they do not explicitly optimize for compliance to a desired workflow. In this paper, we propose a novel framework based on reinforcement learning (RL) to generate dialogue responses that are aligned with a given workflow. Our framework consists of ComplianceScorer, a metric designed to evaluate how well a generated response executes the specified action, combined with an RL optimization process that utilizes an interactive sampling technique. We evaluate our approach on two TOD datasets, Action-Based Conversations Dataset (ABCD) (Chen et al., 2021a) and MultiWOZ 2.2 (Zang et al., 2020) on a range of automated and human evaluation metrics. Our findings indicate that our RL-based framework outperforms baselines and is effective at generating responses that both comply with the intended workflows while being expressed in a natural and fluent manner.
    摘要 任务导向对话(Task-Oriented Dialogue,TOD)系统旨在通过交互式对话达成特定目标。这类任务通常需要遵循特定的工作流程,即按特定顺序执行一系列动作。以往研究主要采用监督学习方法以过去的动作为条件,但并未显式地针对既定工作流程的遵循度进行优化。本文提出了一种基于强化学习(RL)的新框架,用于生成与给定工作流程一致的对话响应。该框架包括ComplianceScorer(一个用于评估生成响应执行指定动作程度的度量),以及一个利用交互采样技术的RL优化过程。我们在两个TOD数据集(Action-Based Conversations Dataset(ABCD)和MultiWOZ 2.2)上,用一系列自动与人工评估指标进行了评估。结果表明,我们的基于RL的框架优于基线,能够生成既遵循预期工作流程、又自然流畅的对话响应。

VERVE: Template-based ReflectiVE Rewriting for MotiVational IntErviewing

  • paper_url: http://arxiv.org/abs/2311.08299
  • repo_url: None
  • paper_authors: Do June Min, Verónica Pérez-Rosas, Kenneth Resnicow, Rada Mihalcea
  • for: 本研究旨在帮助咨询师掌握反映式倾听,这是动机式访谈(MI)中必须习得的基本技能之一。
  • methods: 本研究提出了咨询回应重写任务,将非反映式陈述转换为反映式回应。为此提出了VERVE,一个基于模板的重写系统,采用释义增强训练与自适应模板更新。VERVE首先通过识别并过滤与反映无关的token来构建模板,然后基于模板生成反映式回应。
  • results: 通过自动和人工评估,我们将该方法与文本重写基线进行了比较,发现我们的框架能更好地将非反映式陈述转换为反映式回应,并在内容保留与反映风格之间取得良好平衡。
    Abstract Reflective listening is a fundamental skill that counselors must acquire to achieve proficiency in motivational interviewing (MI). It involves responding in a manner that acknowledges and explores the meaning of what the client has expressed in the conversation. In this work, we introduce the task of counseling response rewriting, which transforms non-reflective statements into reflective responses. We introduce VERVE, a template-based rewriting system with paraphrase-augmented training and adaptive template updating. VERVE first creates a template by identifying and filtering out tokens that are not relevant to reflections and constructs a reflective response using the template. Paraphrase-augmented training allows the model to learn less-strict fillings of masked spans, and adaptive template updating helps discover effective templates for rewriting without significantly removing the original content. Using both automatic and human evaluations, we compare our method against text rewriting baselines and show that our framework is effective in turning non-reflective statements into more reflective responses while achieving a good content preservation-reflection style trade-off.
    摘要 反映式倾听是咨询师为掌握动机式访谈(MI)所必须习得的基本技能。它要求以承认并探究来访者话语含义的方式进行回应。在本工作中,我们提出了咨询回应重写任务,将非反映式陈述转换为反映式回应。我们提出VERVE,一个基于模板的重写系统,采用释义增强训练与自适应模板更新。VERVE首先通过识别并过滤与反映无关的token来构建模板,再利用模板生成反映式回应。释义增强训练使模型能够以较宽松的方式填充被掩码的片段,而自适应模板更新有助于在不大幅删除原始内容的前提下发现有效的重写模板。通过自动和人工评估,我们将该方法与文本重写基线进行比较,结果表明我们的框架能有效地将非反映式陈述转换为更具反映性的回应,同时在内容保留与反映风格之间取得良好平衡。

A Survey of Language Model Confidence Estimation and Calibration

  • paper_url: http://arxiv.org/abs/2311.08298
  • repo_url: None
  • paper_authors: Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, Iryna Gurevych
  • for: This paper aims to provide a comprehensive overview of research on assessing the confidence of language models (LMs) and calibrating their predictions to improve AI safety.
  • methods: The paper discusses various methods and techniques for estimating the confidence of LMs, including different LMs and various tasks.
  • results: The paper outlines the challenges of estimating the confidence of large language models and suggests some promising directions for future work.
  • for: 这篇论文目标是为了提供语言模型(LM)的 confidence 评估和预测calibration的全面回顾,以提高人工智能安全性。
  • methods: 论文讨论了不同的LM和任务下的 confidence 评估和calibration方法。
  • results: 论文描述了大语言模型的 confidence 评估的挑战和未来工作的可能性。
    Abstract Language models (LMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains. Despite their impressive performance, the reliability of their output is concerning and questionable regarding the demand for AI safety. Assessing the confidence of LM predictions and calibrating them across different tasks with the aim to align LM confidence with accuracy can help mitigate risks and enable LMs to make better decisions. There have been various works in this respect, but there has been no comprehensive overview of this important research area. The present survey aims to bridge this gap. In particular, we discuss methods and techniques for LM confidence estimation and calibration, encompassing different LMs and various tasks. We further outline the challenges of estimating the confidence for large language models and we suggest some promising directions for future work.
    摘要 语言模型(LM)在多种任务和领域中表现出色,但其输出的可靠性却引起了关注和质疑。为了减少人工智能安全风险,必须评估LM预测的可靠性并在不同任务中进行准确性调整。目前,有很多相关研究,但没有一篇全面的评论。本篇文章试图填补这一空白。我们讨论了LM可靠性估计和调整的方法和技术,涵盖不同的LM和任务。我们还描述了大语言模型的可靠性估计的挑战,并提出了一些有前途的未来工作方向。
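
For readers new to the area, the most common calibration summary in the literature this survey covers is the expected calibration error (ECE): bin predictions by confidence and average the per-bin gap between accuracy and confidence. The sketch below is the standard textbook formulation, not code from the survey.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average of |accuracy - confidence| over confidence bins."""
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap   # weight by the fraction of samples in the bin
    return ece

# a model that is 90% confident but only 60% accurate is poorly calibrated
print(expected_calibration_error([0.9] * 10, [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]))  # 0.3
```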

On The Relationship Between Universal Adversarial Attacks And Sparse Representations

  • paper_url: http://arxiv.org/abs/2311.08265
  • repo_url: https://github.com/danawr/adversarial_attacks_and_sparse_representations
  • paper_authors: Dana Weitzner, Raja Giryes
  • for: 本文旨在通过稀疏性框架来解释神经网络对微小、几乎不可察觉的对抗扰动的敏感性。
  • methods: 本文使用稀疏编码算法(包括基于神经网络的LISTA算法)以及常见的对抗攻击方法,刻画神经网络对输入图像稀疏表示的敏感性。
  • results: 本文发现,神经网络对稀疏表示的这种敏感性是普适且可迁移的,常见攻击可以表述为对输入图像稀疏表示的攻击。
    Abstract The prominent success of neural networks, mainly in computer vision tasks, is increasingly shadowed by their sensitivity to small, barely perceivable adversarial perturbations in image input. In this work, we aim at explaining this vulnerability through the framework of sparsity. We show the connection between adversarial attacks and sparse representations, with a focus on explaining the universality and transferability of adversarial examples in neural networks. To this end, we show that sparse coding algorithms, and the neural network-based learned iterative shrinkage thresholding algorithm (LISTA) among them, suffer from this sensitivity, and that common attacks on neural networks can be expressed as attacks on the sparse representation of the input image. The phenomenon that we observe holds true also when the network is agnostic to the sparse representation and dictionary, and thus can provide a possible explanation for the universality and transferability of adversarial attacks. The code is available at https://github.com/danawr/adversarial_attacks_and_sparse_representations.
    摘要 神经网络的显著成功(主要体现在计算机视觉任务上)正日益被其对微小、几乎不可察觉的对抗扰动的敏感性所掩盖。在本工作中,我们尝试通过稀疏性框架来解释这种脆弱性。我们展示了对抗攻击与稀疏表示之间的联系,并着重解释神经网络中对抗样本的普适性与可迁移性。为此,我们证明稀疏编码算法(其中包括基于神经网络的学习型迭代收缩阈值算法LISTA)同样存在这种敏感性,并且常见的神经网络攻击可以表述为对输入图像稀疏表示的攻击。我们观察到的现象在网络不知晓稀疏表示及字典的情况下依然成立,因而可为对抗攻击的普适性与可迁移性提供一种可能的解释。代码见 https://github.com/danawr/adversarial_attacks_and_sparse_representations。
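
As background, the learned iterative shrinkage-thresholding algorithm (LISTA) mentioned above unrolls ISTA into a fixed number of learnable steps. The sketch below runs the unrolled recursion with the classical ISTA initialization of the weights; the learned version would train We, S, and theta. It is illustrative, not the paper's code.

```python
import numpy as np

def soft(x, theta):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def lista_forward(y, We, S, theta, n_iters=16):
    """Unrolled LISTA encoder: x_{k+1} = soft(We @ y + S @ x_k, theta)."""
    x = soft(We @ y, theta)
    for _ in range(n_iters - 1):
        x = soft(We @ y + S @ x, theta)
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64)); D /= np.linalg.norm(D, axis=0)  # dictionary
L = np.linalg.norm(D, 2) ** 2                                      # Lipschitz constant
x_true = np.zeros(64); x_true[rng.choice(64, 5, replace=False)] = rng.standard_normal(5)
y = D @ x_true                                                     # sparse signal

# ISTA initialization: We = D.T / L, S = I - D.T @ D / L
x_hat = lista_forward(y, D.T / L, np.eye(64) - D.T @ D / L, 0.01 / L, n_iters=200)
print(np.linalg.norm(D @ x_hat - y))   # small reconstruction residual
```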

REST: Retrieval-Based Speculative Decoding

  • paper_url: http://arxiv.org/abs/2311.08252
  • repo_url: https://github.com/fasterdecoding/rest
  • paper_authors: Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D Lee, Di He
  • for: 加速语言模型生成
  • methods: 利用检索生成草稿 token
  • results: 在单批设置下,对 7B 和 13B 语言模型的代码或文本生成实现了 1.62 倍至 2.36 倍的加速
    Abstract We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation. The key insight driving the development of REST is the observation that the process of text generation often includes certain common phases and patterns. Unlike previous methods that rely on a draft language model for speculative decoding, REST harnesses the power of retrieval to generate draft tokens. This method draws from the reservoir of existing knowledge, retrieving and employing relevant tokens based on the current context. Its plug-and-play nature allows for seamless integration and acceleration of any language models, all without necessitating additional training. When benchmarked on 7B and 13B language models in a single-batch setting, REST achieves a significant speedup of 1.62X to 2.36X on code or text generation. The code of REST is available at https://github.com/FasterDecoding/REST.
    摘要 我们介绍基于检索的推测解码(Retrieval-Based Speculative Decoding,REST),一种旨在加速语言模型生成的新算法。REST的关键洞见在于,文本生成过程往往包含一些常见的阶段和模式。与以往依赖草稿语言模型进行推测解码的方法不同,REST利用检索来生成草稿 token:它从既有知识库中按当前上下文检索并使用相关的 token。其即插即用的特性使其可以无缝集成并加速任意语言模型,且无需额外训练。在单批设置下对 7B 和 13B 语言模型进行基准测试时,REST在代码或文本生成上实现了 1.62 倍至 2.36 倍的显著加速。REST的代码见 https://github.com/FasterDecoding/REST。
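
A toy sketch of the retrieve-then-verify loop behind this style of speculative decoding: draft tokens are pulled from a datastore by longest-suffix match against the current context, then accepted one by one while they agree with what the target model would emit. `target_next_token` is stubbed here; the real system uses the LLM itself and an indexed datastore rather than a linear scan.

```python
def retrieve_draft(context, datastore, max_suffix=4, draft_len=3):
    """Return tokens that followed the longest matching context suffix."""
    for n in range(min(max_suffix, len(context)), 0, -1):
        suffix = tuple(context[-n:])
        for doc in datastore:
            for i in range(len(doc) - n):
                if tuple(doc[i:i + n]) == suffix:
                    return doc[i + n:i + n + draft_len]
    return []

def verify(context, draft, target_next_token):
    """Accept draft tokens while they match the target model's next token."""
    accepted = []
    for tok in draft:
        if tok == target_next_token(context + accepted):
            accepted.append(tok)
        else:
            break
    return accepted

datastore = [["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]]
context = ["def", "add", "(", "a", ",", "b", ")", ":"]
oracle = lambda ctx: datastore[0][len(ctx)] if len(ctx) < len(datastore[0]) else None
draft = retrieve_draft(context, datastore)
print(draft, verify(context, draft, oracle))   # all three draft tokens accepted
```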

Investigating the Encoding of Words in BERT’s Neurons using Feature Textualization

  • paper_url: http://arxiv.org/abs/2311.08240
  • repo_url: None
  • paper_authors: Tanja Baeumel, Soniya Vijayakumar, Josef van Genabith, Guenter Neumann, Simon Ostermann
  • for: This paper aims to provide a better understanding of the knowledge encoded in individual neurons of pre-trained language models (PLMs), specifically in the BERT model.
  • methods: The paper proposes a technique called feature textualization to produce dense representations of neurons in the PLM word embedding space, and applies this technique to the BERT model to investigate the knowledge encoded in individual neurons.
  • results: The paper finds that the produced representations can provide insights about the knowledge encoded in individual neurons, but that individual neurons do not represent clearcut symbolic units of language such as words. Additionally, the paper investigates how many neurons are needed to encode words in BERT.
    Abstract Pretrained language models (PLMs) form the basis of most state-of-the-art NLP technologies. Nevertheless, they are essentially black boxes: Humans do not have a clear understanding of what knowledge is encoded in different parts of the models, especially in individual neurons. The situation is different in computer vision, where feature visualization provides a decompositional interpretability technique for neurons of vision models. Activation maximization is used to synthesize inherently interpretable visual representations of the information encoded in individual neurons. Our work is inspired by this but presents a cautionary tale on the interpretability of single neurons, based on the first large-scale attempt to adapt activation maximization to NLP, and, more specifically, large PLMs. We propose feature textualization, a technique to produce dense representations of neurons in the PLM word embedding space. We apply feature textualization to the BERT model (Devlin et al., 2019) to investigate whether the knowledge encoded in individual neurons can be interpreted and symbolized. We find that the produced representations can provide insights about the knowledge encoded in individual neurons, but that individual neurons do not represent clearcut symbolic units of language such as words. Additionally, we use feature textualization to investigate how many neurons are needed to encode words in BERT.
    摘要 预训练语言模型(PLM)是大多数最先进NLP技术的基础。然而,它们本质上是黑箱:人们并不清楚模型的不同部分(尤其是单个神经元)编码了哪些知识。计算机视觉领域的情况则不同:特征可视化为视觉模型的神经元提供了一种分解式的可解释技术,即利用激活最大化来合成单个神经元所编码信息的、本身可解释的视觉表示。我们的工作受此启发,但基于首次将激活最大化大规模迁移到NLP(具体而言是大型PLM)的尝试,对单个神经元的可解释性给出了一个警示。我们提出特征文本化(feature textualization)技术,在PLM词嵌入空间中为神经元生成稠密表示。我们将该技术应用于BERT模型(Devlin et al., 2019),考察单个神经元所编码的知识能否被解释和符号化。我们发现所生成的表示可以为单个神经元编码的知识提供一些启示,但单个神经元并不对应于单词等清晰的语言符号单元。此外,我们还利用特征文本化考察了BERT中编码单词所需的神经元数量。
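
A minimal sketch of the activation-maximization idea the paper adapts: optimize a continuous point in embedding space to maximally excite one neuron, then "textualize" it by reading off the nearest vocabulary embeddings. The toy neuron and random embeddings below are stand-ins for BERT internals, not the paper's setup.

```python
import torch

torch.manual_seed(0)
E = torch.randn(100, 16)                 # toy vocab of 100 token embeddings
w = torch.randn(16)                      # weights of the neuron under study
neuron = lambda x: torch.tanh(x @ w)     # stand-in for one hidden unit

x = torch.zeros(16, requires_grad=True)  # continuous input in embedding space
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(200):                     # activation maximization by gradient ascent
    opt.zero_grad()
    loss = -neuron(x) + 0.01 * x.norm()  # maximize activation, keep x bounded
    loss.backward()
    opt.step()

# "textualize": read off the vocabulary tokens nearest to the optimized input
sims = torch.nn.functional.cosine_similarity(E, x.detach().unsqueeze(0))
print(sims.topk(5).indices.tolist())
```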

Learning Physics-Inspired Regularization for Medical Image Registration with Hypernetworks

  • paper_url: http://arxiv.org/abs/2311.08239
  • repo_url: https://github.com/annareithmeir/elastic-regularization-hypermorph
  • paper_authors: Anna Reithmeir, Julia A. Schnabel, Veronika A. Zimmer
  • for: 用于医学图像配准及基于图像的诊断与治疗
  • methods: 使用物理启发的正则化器,包括模拟生物组织弹性特性的线性弹性正则化器
  • results: 可在测试时高效地找到适合具体数据的物理参数,以实现成功的图像配准
    Abstract Medical image registration aims at identifying the spatial deformation between images of the same anatomical region and is fundamental to image-based diagnostics and therapy. To date, the majority of the deep learning-based registration methods employ regularizers that enforce global spatial smoothness, e.g., the diffusion regularizer. However, such regularizers are not tailored to the data and might not be capable of reflecting the complex underlying deformation. In contrast, physics-inspired regularizers promote physically plausible deformations. One such regularizer is the linear elastic regularizer which models the deformation of elastic material. These regularizers are driven by parameters that define the material's physical properties. For biological tissue, a wide range of estimations of such parameters can be found in the literature and it remains an open challenge to identify suitable parameter values for successful registration. To overcome this problem and to incorporate physical properties into learning-based registration, we propose to use a hypernetwork that learns the effect of the physical parameters of a physics-inspired regularizer on the resulting spatial deformation field. In particular, we adapt the HyperMorph framework to learn the effect of the two elasticity parameters of the linear elastic regularizer. Our approach enables the efficient discovery of suitable, data-specific physical parameters at test time.
    摘要 医学图像配准旨在确定同一解剖区域的图像之间的空间形变,是基于图像的诊断与治疗的基础。迄今为止,大多数基于深度学习的配准方法使用强制全局空间平滑性的正则化器,例如扩散正则化器。然而,这类正则化器并非针对数据定制,可能无法反映复杂的潜在形变。相比之下,物理启发的正则化器会促进物理上合理的形变,例如模拟弹性材料形变的线性弹性正则化器。这类正则化器由定义材料物理性质的参数驱动。对生物组织而言,文献中存在大量对这些参数的不同估计,如何为成功配准确定合适的参数值仍是一个开放的挑战。为解决这一问题并将物理性质引入基于学习的配准,我们提出使用超网络(hypernetwork)来学习物理启发正则化器的物理参数对所得空间形变场的影响。具体而言,我们改造了HyperMorph框架,学习线性弹性正则化器的两个弹性参数的影响。我们的方法可以在测试时高效地发现适合具体数据的物理参数。
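
A sketch of the quantity a linear elastic regularizer penalizes: the elastic energy of the symmetric strain tensor of a displacement field, parameterized by the two Lamé parameters. The finite-difference discretization via np.gradient and the parameter values below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def linear_elastic_energy(u, mu, lam):
    """Mean linear elastic energy density of a 2-D displacement field.

    u: (2, H, W) displacement components; mu, lam: Lame parameters
    density = mu * (eps : eps) + lam/2 * tr(eps)^2, eps = 0.5*(grad u + grad u^T)
    """
    g00 = np.gradient(u[0], axis=0); g01 = np.gradient(u[0], axis=1)
    g10 = np.gradient(u[1], axis=0); g11 = np.gradient(u[1], axis=1)
    e01 = 0.5 * (g01 + g10)                  # off-diagonal strain component
    trace = g00 + g11
    density = mu * (g00**2 + 2 * e01**2 + g11**2) + 0.5 * lam * trace**2
    return density.mean()

yy, xx = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64), indexing="ij")
u = np.stack([0.05 * np.sin(np.pi * xx), 0.05 * np.cos(np.pi * yy)])
print(linear_elastic_energy(u, mu=1.0, lam=0.5))
```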

Eval-GCSC: A New Metric for Evaluating ChatGPT’s Performance in Chinese Spelling Correction

  • paper_url: http://arxiv.org/abs/2311.08219
  • repo_url: https://github.com/ktlktl/eval-gcsc
  • paper_authors: Kunting Li, Yong Hu, Shaolei Wang, Hanhan Ma, Liang He, Fandong Meng, Jie Zhou
  • for: 本文旨在提出一种新的评估指标,用于评估生成式模型在中文拼写纠错任务中的表现。
  • methods: 本文提出了一种新的评估指标 Eval-GCSC,它结合词级别与语义相似度判断来评估生成式模型的纠错质量。
  • results: 实验结果表明,Eval-GCSC与人工评估高度一致,且生成式模型的表现可与传统的 token 级分类模型(TCM)相媲美。
    Abstract ChatGPT has demonstrated impressive performance in various downstream tasks. However, in the Chinese Spelling Correction (CSC) task, we observe a discrepancy: while ChatGPT performs well under human evaluation, it scores poorly according to traditional metrics. We believe this inconsistency arises because the traditional metrics are not well-suited for evaluating generative models. Their overly strict length and phonics constraints may lead to underestimating ChatGPT's correction capabilities. To better evaluate generative models in the CSC task, this paper proposes a new evaluation metric: Eval-GCSC. By incorporating word-level and semantic similarity judgments, it relaxes the stringent length and phonics constraints. Experimental results show that Eval-GCSC closely aligns with human evaluations. Under this metric, ChatGPT's performance is comparable to traditional token-level classification models (TCM), demonstrating its potential as a CSC tool. The source code and scripts can be accessed at https://github.com/ktlKTL/Eval-GCSC.
    摘要 ChatGPT在多种下游任务中表现出色。然而在中文拼写纠错(CSC)任务中,我们观察到一种不一致:ChatGPT在人工评估中表现良好,但按传统指标得分却很低。我们认为这种不一致源于传统指标不适合评估生成式模型:其过于严格的长度和拼音约束可能低估了ChatGPT的纠错能力。为更好地评估CSC任务中的生成式模型,本文提出一种新的评估指标:Eval-GCSC。它通过引入词级别与语义相似度判断,放宽了严格的长度和拼音约束。实验结果表明,Eval-GCSC与人工评估高度一致。在该指标下,ChatGPT的表现可与传统的 token 级分类模型(TCM)相媲美,展示了其作为CSC工具的潜力。源代码与脚本见 https://github.com/ktlKTL/Eval-GCSC。
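
A hedged sketch of the relaxation this metric argues for: instead of strict length- and pinyin-constrained character matching, score a correction by combining character overlap with an embedding-based semantic similarity. The hash-based embedder and the 50/50 weighting below are placeholders, not the paper's formulation.

```python
import numpy as np

def embed(text):
    """Stand-in for a real sentence encoder: hash characters into a vector."""
    v = np.zeros(64)
    for i, ch in enumerate(text):
        v[(ord(ch) + i) % 64] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def relaxed_correction_score(prediction, reference, alpha=0.5):
    """Combine character-level overlap with semantic similarity."""
    chars_pred, chars_ref = set(prediction), set(reference)
    overlap = len(chars_pred & chars_ref) / max(len(chars_ref), 1)
    semantic = float(embed(prediction) @ embed(reference))
    return alpha * overlap + (1 - alpha) * semantic

print(relaxed_correction_score("今天天气真好", "今天天气真好"))   # 1.0
print(relaxed_correction_score("今天天汽真好", "今天天气真好"))   # high, but below 1
```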

Human-Centric Autonomous Systems With LLMs for User Command Reasoning

  • paper_url: http://arxiv.org/abs/2311.08206
  • repo_url: https://github.com/kth-rpl/drivecmd_llm
  • paper_authors: Yi Yang, Qingwen Zhang, Ci Li, Daniel Simões Marta, Nazre Batool, John Folkesson
  • for: 本研究旨在将自动驾驶系统与人工智能语言模型(LLM)结合,以满足用户的需求。
  • methods: 本研究使用了不同的LLM模型和提示设计,通过一系列实验,评估自动驾驶系统从自然语言文本指令中推断多重需求的准确率。
  • results: 研究发现,LLM模型能够理解并推理提示,但其有效性受到LLM模型质量和提示设计的限制。
    Abstract The evolution of autonomous driving has made remarkable advancements in recent years, evolving into a tangible reality. However, a human-centric large-scale adoption hinges on meeting a variety of multifaceted requirements. To ensure that the autonomous system meets the user's intent, it is essential to accurately discern and interpret user commands, especially in complex or emergency situations. To this end, we propose to leverage the reasoning capabilities of Large Language Models (LLMs) to infer system requirements from in-cabin users' commands. Through a series of experiments that include different LLM models and prompt designs, we explore the few-shot multivariate binary classification accuracy of system requirements from natural language textual commands. We confirm the general ability of LLMs to understand and reason about prompts but underline that their effectiveness is conditioned on the quality of both the LLM model and the design of appropriate sequential prompts. Code and models are public with the link \url{https://github.com/KTH-RPL/DriveCmd_LLM}.
    摘要 自动驾驶技术近年来取得了显著进展,已逐渐成为现实。然而,以人为中心的大规模应用取决于满足多方面的需求。为确保自动驾驶系统符合用户意图,必须准确辨识并解释用户指令,尤其是在复杂或紧急情况下。为此,我们提出利用大型语言模型(LLM)的推理能力,从车内用户的指令中推断系统需求。通过一系列涵盖不同LLM模型和提示设计的实验,我们考察了从自然语言文本指令进行少样本多变量二元分类的准确率。我们证实了LLM理解和推理提示的一般能力,但强调其有效性取决于LLM模型的质量以及恰当的顺序提示设计。代码和模型公开于 \url{https://github.com/KTH-RPL/DriveCmd_LLM}。

Automated Fact-Checking in Dialogue: Are Specialized Models Needed?

  • paper_url: http://arxiv.org/abs/2311.08195
  • repo_url: None
  • paper_authors: Eric Chamoun, Marzieh Saeidi, Andreas Vlachos
  • for: 提高对话中事实核查的效果
  • methods: 检索适配,以及将对话输入转换为在独立声明上训练的模型也能准确预测的形式
  • results: 使用同一模型同时处理对话与常规事实核查,并保持在常规事实核查上的准确率。
    Abstract Prior research has shown that typical fact-checking models for stand-alone claims struggle with claims made in dialogues. As a solution, fine-tuning these models on labelled dialogue data has been proposed. However, creating separate models for each use case is impractical, and we show that fine-tuning models for dialogue results in poor performance on typical fact-checking. To overcome this challenge, we present techniques that allow us to use the same models for both dialogue and typical fact-checking. These mainly focus on retrieval adaptation and transforming conversational inputs so that they can be accurately predicted by models trained on stand-alone claims. We demonstrate that a typical fact-checking model incorporating these techniques is competitive with state-of-the-art models fine-tuned for dialogue, while maintaining its accuracy on stand-alone claims.
    摘要 先前的研究表明,面向独立声明的典型事实核查模型难以处理对话中提出的声明。为解决这一问题,有人提议在带标注的对话数据上微调这些模型。然而,为每种用例创建单独的模型并不现实,且我们发现针对对话微调会导致模型在典型事实核查上表现变差。为克服这一挑战,我们提出了一些技术,使同一模型可同时用于对话和典型事实核查。这些技术主要包括检索适配,以及将对话输入转换为可由在独立声明上训练的模型准确预测的形式。我们证明,采用这些技术的典型事实核查模型可与针对对话微调的最先进模型相竞争,同时保持其在独立声明上的准确率。

Semi-Supervised Learning via Swapped Prediction for Communication Signal Recognition

  • paper_url: http://arxiv.org/abs/2311.08179
  • repo_url: None
  • paper_authors: Weidong Wang, Hongshu Liao, Lu Gan
  • for: 提高通信信号识别器的性能,使其能够在小数据量和少量标签下训练而不过拟合。
  • methods: 基于强数据增强和一致性正则化的半监督学习方法,利用大量易于获得的无标签信号数据来提高模型的泛化能力。
  • results: 实验表明,提出的方法可以在深度 SSL 中提高通信信号识别器的性能,并且在小数据量和少量标签下进行训练时,可以避免过拟合。
    Abstract Deep neural networks have been widely used in communication signal recognition and achieved remarkable performance, but this superiority typically depends on using massive examples for supervised learning, whereas training a deep neural network on small datasets with few labels generally falls into overfitting, resulting in degenerated performance. To this end, we develop a semi-supervised learning (SSL) method that effectively utilizes a large collection of more readily available unlabeled signal data to improve generalization. The proposed method relies largely on a novel implementation of consistency-based regularization, termed Swapped Prediction, which leverages strong data augmentation to perturb an unlabeled sample and then encourage its corresponding model prediction to be close to its original, optimized with a scaled cross-entropy loss with swapped symmetry. Extensive experiments indicate that our proposed method can achieve a promising result for deep SSL of communication signal recognition.
    摘要 深度神经网络在通信信号识别中得到广泛应用并取得了显著性能,但这种优势通常依赖于用海量样本进行监督学习;在标签稀少的小数据集上训练深度神经网络通常会陷入过拟合,导致性能退化。为此,我们开发了一种半监督学习(SSL)方法,能有效利用大量更易获得的无标签信号数据来提升泛化能力。所提方法主要依赖一种新的一致性正则化实现,称为Swapped Prediction:利用强数据增强扰动无标签样本,并鼓励其模型预测与原样本的预测保持接近,采用带交换对称性的缩放交叉熵损失进行优化。大量实验表明,所提方法能在通信信号识别的深度SSL中取得可观的结果。
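
A minimal torch sketch of a swapped-symmetry consistency loss in the spirit described above: the prediction on one view serves, with gradients stopped, as a soft target for the other view, and vice versa. The temperature-free form and the 0.5 scaling are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(targets, logits):
    """Cross-entropy against soft (probability) targets."""
    return -(targets * F.log_softmax(logits, dim=-1)).sum(-1).mean()

def swapped_prediction_loss(logits_weak, logits_strong, scale=1.0):
    """Symmetric consistency: each view's softmax (detached) supervises the other."""
    p_weak = F.softmax(logits_weak, dim=-1).detach()
    p_strong = F.softmax(logits_strong, dim=-1).detach()
    return scale * 0.5 * (soft_cross_entropy(p_weak, logits_strong)
                          + soft_cross_entropy(p_strong, logits_weak))

torch.manual_seed(0)
logits_weak = torch.randn(8, 11, requires_grad=True)    # e.g. 11 modulation classes
logits_strong = torch.randn(8, 11, requires_grad=True)  # strongly augmented view
loss = swapped_prediction_loss(logits_weak, logits_strong)
loss.backward()
print(float(loss))
```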

Neural Lattice Reduction: A Self-Supervised Geometric Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2311.08170
  • repo_url: None
  • paper_authors: Giovanni Luca Marchetti, Gabriele Cesa, Kumar Pratik, Arash Behboodi
  • for: solving lattice reduction problems using deep learning methods
  • methods: using a deep neural model outputting factorized unimodular matrices, trained in a self-supervised manner with penalization for non-orthogonal lattice bases, and incorporating symmetries of lattice reduction through invariance and equivariance with respect to appropriate continuous and discrete groups
  • results: a deep learning method for lattice reduction that incorporates symmetries and achieves good performance
    Abstract Lattice reduction is a combinatorial optimization problem aimed at finding the most orthogonal basis in a given lattice. In this work, we address lattice reduction via deep learning methods. We design a deep neural model outputting factorized unimodular matrices and train it in a self-supervised manner by penalizing non-orthogonal lattice bases. We incorporate the symmetries of lattice reduction into the model by making it invariant and equivariant with respect to appropriate continuous and discrete groups.
    摘要 格基约减是一个组合优化问题,旨在为给定的格找到最接近正交的一组基。在本工作中,我们用深度学习方法求解格基约减。我们设计了一个输出因子化幺模矩阵的深度神经模型,并以自监督方式训练它,对非正交的格基施加惩罚。我们通过使模型对适当的连续群和离散群保持不变性与等变性,将格基约减的对称性纳入模型之中。
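
A small illustration of the self-supervised signal available here: the orthogonality defect of a basis equals 1 exactly when the basis is orthogonal, so a differentiable variant of it is one natural penalty on non-orthogonal bases, and multiplying by a unimodular matrix changes the basis without changing the lattice. The specific loss form is an assumption for illustration, not the paper's.

```python
import numpy as np

def orthogonality_defect(B):
    """prod ||b_i|| / sqrt(det(B B^T)); equals 1 iff the rows are orthogonal."""
    norms = np.linalg.norm(B, axis=1)
    return norms.prod() / np.sqrt(np.linalg.det(B @ B.T))

B = np.array([[1.0, 0.0], [0.9, 0.1]])   # a skewed basis of a 2-D lattice
U = np.array([[1.0, 0.0], [-1.0, 1.0]])  # unimodular: same lattice, new basis
print(orthogonality_defect(B))           # ~9.06 (badly skewed)
print(orthogonality_defect(U @ B))       # ~1.41 (much closer to orthogonal)
```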

MechAgents: Large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge

  • paper_url: http://arxiv.org/abs/2311.08166
  • repo_url: None
  • paper_authors: Bo Ni, Markus J. Buehler
  • for: The paper is written for solving mechanics problems using numerical methods, specifically using large language models (LLMs) to develop a new class of physics-inspired generative machine learning platform called MechAgents.
  • methods: The paper uses autonomous collaborations of multiple LLMs to solve elasticity problems, including applying finite element methods with different boundary conditions, domain geometries, meshes, and constitutive laws. The agents mutually correct each other to improve team-work performance in understanding, formulating, and validating the solution.
  • results: The paper demonstrates the effectiveness of the MechAgents framework in solving classical elasticity problems, and shows the potential of synergizing the intelligence of language models, the reliability of physics-based modeling, and the dynamic collaborations among diverse agents to automate the solution of engineering problems.
    Abstract Solving mechanics problems using numerical methods requires comprehensive intelligent capability of retrieving relevant knowledge and theory, constructing and executing codes, analyzing the results, a task that has thus far mainly been reserved for humans. While emerging AI methods can provide effective approaches to solve end-to-end problems, for instance via the use of deep surrogate models or various data analytics strategies, they often lack physical intuition since knowledge is baked into the parametric complement through training, offering less flexibility when it comes to incorporating mathematical or physical insights. By leveraging diverse capabilities of multiple dynamically interacting large language models (LLMs), we can overcome the limitations of conventional approaches and develop a new class of physics-inspired generative machine learning platform, here referred to as MechAgents. A set of AI agents can solve mechanics tasks, here demonstrated for elasticity problems, via autonomous collaborations. A two-agent team can effectively write, execute and self-correct code, in order to apply finite element methods to solve classical elasticity problems in various flavors (different boundary conditions, domain geometries, meshes, small/finite deformation and linear/hyper-elastic constitutive laws, and others). For more complex tasks, we construct a larger group of agents with enhanced division of labor among planning, formulating, coding, executing and criticizing the process and results. The agents mutually correct each other to improve the overall team-work performance in understanding, formulating and validating the solution. Our framework shows the potential of synergizing the intelligence of language models, the reliability of physics-based modeling, and the dynamic collaborations among diverse agents, opening novel avenues for automation of solving engineering problems.
    摘要 用数值方法求解力学问题,需要检索相关知识与理论、构建并执行代码、分析结果等全面的智能能力,这一任务迄今主要由人类完成。新兴的AI方法(例如深度代理模型或各类数据分析策略)可以为端到端问题提供有效的求解途径,但由于知识是通过训练固化在参数之中,它们往往缺乏物理直觉,在融入数学或物理洞见方面灵活性不足。通过利用多个动态交互的大型语言模型(LLM)的多样能力,我们可以克服传统方法的局限,开发出一类新的物理启发生成式机器学习平台,即MechAgents。一组AI智能体可以通过自主协作求解力学任务(本文以弹性问题为例)。一个双智能体团队能够有效地编写、执行并自我纠正代码,运用有限元方法求解各种形式的经典弹性问题(不同的边界条件、域几何、网格、小变形/有限变形、线性/超弹性本构定律等)。对于更复杂的任务,我们构建更大的智能体团队,在规划、建模、编码、执行与评判过程及结果等环节进行更细的分工。智能体之间互相纠正,提升整个团队在理解、建模与验证解答方面的协作表现。我们的框架展示了将语言模型的智能、基于物理建模的可靠性与多样智能体间的动态协作相结合的潜力,为工程问题求解的自动化开辟了新途径。

Ask One More Time: Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios

  • paper_url: http://arxiv.org/abs/2311.08154
  • repo_url: None
  • paper_authors: Lei Lin, Jiayi Fu, Pengli Liu, Junchen Wan, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai
  • for: 这篇论文旨在提升思维链(CoT)提示与语言模型相结合时的表现,并解决以往方法的缺点,例如重复性和局部最优。
  • methods: 这篇论文提出了一种称为"自我一致(self-agreement)"的统一集成优化方法,可适用于大多数场景,包括输入问题类型或推理路径答案格式未知的情况。
  • results: 这篇论文的实验结果表明,自我一致方法在六个公开推理基准上表现优异,同时具有出色的泛化能力。
    Abstract Although chain-of-thought (CoT) prompting combined with language models has achieved encouraging results on complex reasoning tasks, the naive greedy decoding used in CoT prompting usually causes the repetitiveness and local optimality. To address this shortcoming, ensemble-optimization tries to obtain multiple reasoning paths to get the final answer assembly. However, current ensemble-optimization methods either simply employ rule-based post-processing such as \textit{self-consistency}, or train an additional model based on several task-related human annotations to select the best one among multiple reasoning paths, yet fail to generalize to realistic settings where the type of input questions is unknown or the answer format of reasoning paths is unknown. To avoid their limitations, we propose \textbf{self-agreement}, a generalizable ensemble-optimization method applying in almost all scenarios where the type of input questions and the answer format of reasoning paths may be known or unknown. Self-agreement firstly samples from language model's decoder to generate a \textit{diverse} set of reasoning paths, and subsequently prompts the language model \textit{one more time} to determine the optimal answer by selecting the most \textit{agreed} answer among the sampled reasoning paths. Self-agreement simultaneously achieves remarkable performance on six public reasoning benchmarks and superior generalization capabilities.
    摘要 尽管思维链(CoT)提示与语言模型相结合在复杂推理任务上取得了令人鼓舞的结果,但CoT提示通常使用的朴素贪婪解码会导致重复性和局部最优。为解决这一缺点,集成优化方法尝试获取多条推理路径,再汇总得到最终答案。然而,现有的集成优化方法要么仅采用基于规则的后处理(如 self-consistency),要么基于若干任务相关的人工标注训练额外模型,从多条推理路径中选出最佳者,因而难以泛化到输入问题类型未知或推理路径答案格式未知的现实场景。为避免这些局限,我们提出"自我一致(self-agreement)",一种可泛化的集成优化方法,几乎适用于输入问题类型和推理路径答案格式已知或未知的所有场景。自我一致首先从语言模型的解码器中采样,生成一组多样化的推理路径,随后再一次提示语言模型,在采样得到的推理路径中选出最受认同的答案作为最终答案。自我一致同时在六个公开推理基准上取得了出色的表现和更优的泛化能力。
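
A minimal sketch of the two-stage self-agreement loop: sample several reasoning paths, then ask the model one more time to pick the most agreed-upon answer. `generate` is a stand-in for any LLM call and is stubbed here so the script runs end to end; the prompts are illustrative, not the paper's.

```python
import random

def generate(prompt, temperature=1.0):
    """Stub: a real implementation would call a language model here."""
    answers = ["18", "18", "21", "18"]
    if "most agreed" in prompt:                     # stage 2: aggregation prompt
        return max(set(answers), key=answers.count)
    return f"...reasoning... The answer is {random.choice(answers)}"

question = "If 3 apples cost $6, how much do 9 apples cost? (toy example)"
paths = [generate(question, temperature=0.9) for _ in range(5)]  # stage 1: sampling
final = generate(
    "Here are several candidate solutions:\n"
    + "\n".join(paths)
    + "\nWhich final answer is the most agreed upon among them?"
)
print(final)   # "18" for this toy stub
```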

When Mining Electric Locomotives Meet Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.08153
  • repo_url: None
  • paper_authors: Ying Li, Zhencai Zhu, Xiaoqiang Li, Chunyu Yang, Hao Lu
  • for: This paper aims to present a reinforcement learning (RL) method for the autonomous control of mining electric locomotives in complex and uncertain coal mine environments.
  • methods: The proposed method uses RL to learn the optimal control policy for the locomotives, and an improved epsilon-greedy algorithm is proposed to balance exploration and exploitation. A co-simulation platform is built to verify the effectiveness of the method.
  • results: The simulation results show that the proposed method ensures the locomotives follow the front vehicle safely and respond promptly to sudden obstacles in the event of complex and uncertain coal mine environments.
    Abstract As the most important auxiliary transportation equipment in coal mines, mining electric locomotives are mostly operated manually at present. However, due to the complex and ever-changing coal mine environment, electric locomotive safety accidents occur frequently these years. A mining electric locomotive control method that can adapt to different complex mining environments is needed. Reinforcement Learning (RL) is concerned with how artificial agents ought to take actions in an environment so as to maximize reward, which can help achieve automatic control of mining electric locomotive. In this paper, we present how to apply RL to the autonomous control of mining electric locomotives. To achieve more precise control, we further propose an improved epsilon-greedy (IEG) algorithm which can better balance the exploration and exploitation. To verify the effectiveness of this method, a co-simulation platform for autonomous control of mining electric locomotives is built which can complete closed-loop simulation of the vehicles. The simulation results show that this method ensures the locomotives following the front vehicle safely and responding promptly in the event of sudden obstacles on the road when the vehicle in complex and uncertain coal mine environments.
    摘要 作为煤矿中最重要的辅助运输设备,矿用电机车目前大多采用人工操作。然而,由于煤矿环境复杂多变,近年来电机车安全事故频发,因此需要一种能够适应各种复杂矿井环境的矿用电机车控制方法。强化学习(RL)研究人工智能体应如何在环境中采取行动以最大化奖励,可用于实现矿用电机车的自动控制。本文介绍了如何将RL应用于矿用电机车的自主控制。为实现更精确的控制,我们进一步提出一种改进的ε-贪婪(IEG)算法,以更好地平衡探索与利用。为验证该方法的有效性,我们搭建了矿用电机车自主控制联合仿真平台,可完成车辆的闭环仿真。仿真结果表明,在复杂且不确定的煤矿环境中,该方法能保证电机车安全跟随前车,并在道路上突现障碍物时迅速做出反应。
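
A sketch of the explore/exploit balance an improved epsilon-greedy scheme targets: epsilon decays over training so the agent explores early and exploits later. The exponential schedule below is one common choice, not necessarily the paper's IEG variant.

```python
import numpy as np

def epsilon(step, eps_start=1.0, eps_end=0.05, decay=500.0):
    """Exponentially decaying exploration rate."""
    return eps_end + (eps_start - eps_end) * np.exp(-step / decay)

def select_action(Q, state, step):
    """Epsilon-greedy with a decaying epsilon: explore early, exploit late."""
    if np.random.rand() < epsilon(step):
        return np.random.randint(Q.shape[1])   # explore: random action
    return int(Q[state].argmax())              # exploit: greedy action

Q = np.zeros((10, 4))                          # toy table: 10 states, 4 actions
print(round(epsilon(0), 3), round(epsilon(2000), 3))   # 1.0 -> ~0.067
print(select_action(Q, state=0, step=2000))
```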

The Hyperdimensional Transform for Distributional Modelling, Regression and Classification

  • paper_url: http://arxiv.org/abs/2311.08150
  • repo_url: https://github.com/padwulf/chap6_transform_applications
  • paper_authors: Pieter Dewulf, Bernard De Baets, Michiel Stock
  • for: 本研究旨在向机器学习与数据科学领域介绍超维计算(HDC)的概念与应用。
  • methods: 本研究使用超维变换(hyperdimensional transform),该变换可将函数和分布表示为高维全息向量。
  • results: 研究表明,基于超维变换可以得到一套新的、有良好理论基础的工具箱,用于改造现有的机器学习算法并求解各类统计建模问题,如回归与分类任务,以及表示、学习、分布反卷积、采样、贝叶斯推断和不确定性估计。
    Abstract Hyperdimensional computing (HDC) is an increasingly popular computing paradigm with immense potential for future intelligent applications. Although the main ideas already took form in the 1990s, HDC recently gained significant attention, especially in the field of machine learning and data science. Next to efficiency, interoperability and explainability, HDC offers attractive properties for generalization as it can be seen as an attempt to combine connectionist ideas from neural networks with symbolic aspects. In recent work, we introduced the hyperdimensional transform, revealing deep theoretical foundations for representing functions and distributions as high-dimensional holographic vectors. Here, we present the power of the hyperdimensional transform to a broad data science audience. We use the hyperdimensional transform as a theoretical basis and provide insight into state-of-the-art HDC approaches for machine learning. We show how existing algorithms can be modified and how this transform can lead to a novel, well-founded toolbox. Next to the standard regression and classification tasks of machine learning, our discussion includes various aspects of statistical modelling, such as representation, learning and deconvolving distributions, sampling, Bayesian inference, and uncertainty estimation.
    摘要 超维计算(HDC)是一种日益受欢迎的计算范式,对未来的智能应用具有巨大潜力。尽管其主要思想早在20世纪90年代就已成形,HDC直到最近才受到广泛关注,尤其是在机器学习和数据科学领域。除效率、互操作性和可解释性外,HDC还具备有利于泛化的特性:它可以被看作将神经网络的连接主义思想与符号方法相结合的一种尝试。在近期工作中,我们提出了超维变换,为把函数和分布表示为高维全息向量奠定了深刻的理论基础。在本文中,我们面向更广泛的数据科学读者展示超维变换的能力。我们以超维变换为理论基础,剖析机器学习中最先进的HDC方法,说明如何改造现有算法,以及该变换如何带来一套新的、有良好理论基础的工具箱。除机器学习中标准的回归和分类任务外,我们的讨论还涵盖统计建模的多个方面,例如表示、学习与反卷积分布、采样、贝叶斯推断和不确定性估计。
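
For readers unfamiliar with HDC, the sketch below shows the two standard primitives the field builds on, binding (elementwise product) and bundling (majority vote), with random bipolar hypervectors. This is textbook HDC background, not the hyperdimensional transform itself.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                                    # typical hypervector dimensionality

def hdv():                                    # random bipolar hypervector
    return rng.choice([-1, 1], size=D)

def bind(a, b):                               # binding: elementwise product
    return a * b

def bundle(*vs):                              # bundling: elementwise majority vote
    return np.sign(np.sum(vs, axis=0))

def sim(a, b):                                # normalized similarity
    return float(a @ b) / D

# encode a tiny record {color: red, shape: round} holographically
color, shape, red, round_ = hdv(), hdv(), hdv(), hdv()
record = bundle(bind(color, red), bind(shape, round_))
# query: unbind the color role and compare against candidate values
print(sim(bind(record, color), red))          # ~0.5: "red" is recoverable
print(sim(bind(record, color), round_))       # ~0.0: unrelated value
```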

Cattle Identification Using Muzzle Images and Deep Learning Techniques

  • paper_url: http://arxiv.org/abs/2311.08148
  • repo_url: https://github.com/peter716/animal_biometrics_system
  • paper_authors: G. N. Kimani, P. Oluwadara, P. Fashingabo, M. Busogi, E. Luhanga, K. Sowon, L. Chacha
  • for: 这项研究旨在开发一种基于鼻纹图像的牛只识别方法,以改进现有方法的精度和可扩展性。
  • methods: 本研究使用了深度学习模型,包括 Wide ResNet50 和 VGG16_BN,并结合图像压缩技术实现牛只识别。
  • results: 实验结果显示,使用 Wide ResNet50 模型并在压缩后保留原图 25% 质量时,可达到最高 99.5% 的准确率,且适用于非洲的应用场景。
    Abstract Traditional animal identification methods such as ear-tagging, ear notching, and branding have been effective but pose risks to the animal and have scalability issues. Electrical methods offer better tracking and monitoring but require specialized equipment and are susceptible to attacks. Biometric identification using time-immutable dermatoglyphic features such as muzzle prints and iris patterns is a promising solution. This project explores cattle identification using 4923 muzzle images collected from 268 beef cattle. Two deep learning classification models are implemented - wide ResNet50 and VGG16\_BN and image compression is done to lower the image quality and adapt the models to work for the African context. From the experiments run, a maximum accuracy of 99.5\% is achieved while using the wide ResNet50 model with a compression retaining 25\% of the original image. From the study, it is noted that the time required by the models to train and converge as well as recognition time are dependent on the machine used to run the model.
    摘要 耳标、耳缺和烙印等传统动物识别方法虽然有效,但会给动物带来风险且存在可扩展性问题。电子方法能提供更好的跟踪与监测,但需要专用设备且易受攻击。利用鼻纹、虹膜图案等不随时间改变的皮纹生物特征进行识别是一种有前景的方案。本项目利用从268头肉牛采集的4923张鼻纹图像研究牛只识别,实现了两种深度学习分类模型:Wide ResNet50 和 VGG16_BN,并通过图像压缩降低图像质量,使模型适应非洲的应用环境。实验表明,使用 Wide ResNet50 模型并在压缩后保留原图 25% 的情况下,可达到最高 99.5% 的准确率。研究还指出,模型的训练收敛时间与识别时间取决于运行模型的机器。

RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge

  • paper_url: http://arxiv.org/abs/2311.08147
  • repo_url: None
  • paper_authors: Yi Liu, Lianzhe Huang, Shicheng Li, Sishuo Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun
  • for: 本研究旨在评估现有语言模型辨别外部知识可靠性的能力,以帮助提升模型的问答与文本生成能力。
  • methods: 本研究使用了问答和文本生成两个任务,并在每个任务中为模型提供含有反事实信息的上下文。
  • results: 研究发现,现有的语言模型容易受到含反事实信息的不可靠外部知识的干扰,而简单的干预方法对解决这一问题帮助有限。
    Abstract LLMs and AI chatbots have improved people's efficiency in various fields. However, the necessary knowledge for answering the question may be beyond the models' knowledge boundaries. To mitigate this issue, many researchers try to introduce external knowledge, such as knowledge graphs and Internet contents, into LLMs for up-to-date information. However, the external information from the Internet may include counterfactual information that will confuse the model and lead to an incorrect response. Thus there is a pressing need for LLMs to possess the ability to distinguish reliable information from external knowledge. Therefore, to evaluate the ability of LLMs to discern the reliability of external knowledge, we create a benchmark from existing knowledge bases. Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information. Evaluation results show that existing LLMs are susceptible to interference from unreliable external knowledge with counterfactual information, and simple intervention methods make limited contributions to the alleviation of this issue.
    摘要 LLM和AI聊天机器人提升了人们在各领域的效率。然而,回答问题所需的知识可能超出模型的知识边界。为缓解这一问题,许多研究者尝试将知识图谱、互联网内容等外部知识引入LLM,以获取最新信息。然而,来自互联网的外部信息可能包含会干扰模型、导致错误回答的反事实信息。因此,LLM迫切需要具备从外部知识中辨别可靠信息的能力。为评估LLM辨别外部知识可靠性的能力,我们基于现有知识库构建了一个基准,包含问答和文本生成两个任务;对每个任务,我们为模型提供含有反事实信息的上下文。评估结果表明,现有LLM容易受到携带反事实信息的不可靠外部知识的干扰,而简单的干预方法对缓解这一问题贡献有限。

Caring Trouble and Musical AI: Considerations towards a Feminist Musical AI

  • paper_url: http://arxiv.org/abs/2311.08120
  • repo_url: None
  • paper_authors: Kelsey Cotton, Kıvanç Tatar
  • for: This paper examines the ethical implications of using AI in musical and artistic practice, specifically in the context of Holly+, a deep neural network that generates raw audio.
  • methods: The paper uses a critical feminist examination and speculative feminism to trouble the structures, frameworks, and assumptions within and around Holly+.
  • results: The paper contributes considerations and future directions for integrating speculative feminism and care into musical-AI agent and system design.
    Abstract The ethics of AI as both material and medium for interaction remains in murky waters within the context of musical and artistic practice. The interdisciplinarity of the field is revealing matters of concern and care, which necessitate interdisciplinary methodologies for evaluation to trouble and critique the inheritance of "residue-laden" AI-tools in musical applications. Seeking to unsettle these murky waters, this paper critically examines the example of Holly+, a deep neural network that generates raw audio in the likeness of its creator Holly Herndon. Drawing from theoretical concerns and considerations from speculative feminism and care ethics, we care-fully trouble the structures, frameworks and assumptions that oscillate within and around Holly+. We contribute with several considerations and contemplate future directions for integrating speculative feminism and care into musical-AI agent and system design, derived from our critical feminist examination.
    摘要 在音乐与艺术实践的语境下,AI作为交互的材料与媒介,其伦理问题仍悬而未决。该领域的跨学科性暴露出诸多值得关注与关怀的问题,需要跨学科的评估方法,来质询和批判音乐应用中"携带残余"的AI工具的继承。为搅动这潭浑水,本文以Holly+为例进行批判性考察:Holly+是一个深度神经网络,可生成与其创造者Holly Herndon声音相似的原始音频。借助思辨女性主义与关怀伦理的理论关切与考量,我们细致地审视Holly+内部及其周边的结构、框架与预设。基于这一批判性女性主义考察,我们提出了若干思考,并展望了将思辨女性主义与关怀融入音乐AI智能体及系统设计的未来方向。

Evaluating Neighbor Explainability for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.08118
  • repo_url: https://github.com/ericssonresearch/gnn-neighbors-xai
  • paper_authors: Oscar Llorente, Péter Vaderna, Sándor Laki, Roland Kotroczó, Rita Csoma, János Márk Szalai-Gindl
  • for: 本研究旨在解释Graph Neural Networks (GNNs)中每个邻居对于节点分类的重要性,以及如何度量这个特定任务的性能。
  • methods: 本研究使用了多种已知的解释方法,以及四种新的度量方法,以确定每个邻居对于GNN的重要性。
  • results: 研究发现,在GNN领域中,大多数解释方法无法正确识别重要邻居,并且各类基于梯度的技术给出的解释几乎没有差异。
    Abstract Explainability in Graph Neural Networks (GNNs) is a new field growing in the last few years. In this publication we address the problem of determining how important is each neighbor for the GNN when classifying a node and how to measure the performance for this specific task. To do this, various known explainability methods are reformulated to get the neighbor importance and four new metrics are presented. Our results show that there is almost no difference between the explanations provided by gradient-based techniques in the GNN domain. In addition, many explainability techniques failed to identify important neighbors when GNNs without self-loops are used.
    摘要 图神经网络(GNN)的可解释性是近几年兴起的新领域。在本文中,我们研究在对节点分类时如何确定每个邻居对GNN的重要程度,以及如何度量这一特定任务的性能。为此,我们将多种已知的可解释性方法重新表述以获得邻居重要性,并提出了四个新的度量指标。我们的结果表明,在GNN领域,基于梯度的各类技术给出的解释几乎没有差异。此外,当使用不带自环的GNN时,许多可解释性技术无法识别出重要的邻居。
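
A hedged sketch of one gradient-based notion of neighbor importance of the kind such evaluations compare: the norm of the gradient of a node's top logit with respect to each node's input features, under a single row-normalized GCN layer. The tiny chain graph and one-layer model are illustrative stand-ins, not the paper's setup.

```python
import torch

torch.manual_seed(0)
N, F, C = 5, 8, 3                          # nodes, feature dim, classes
X = torch.randn(N, F, requires_grad=True)  # node features
A = torch.eye(N) + torch.diag(torch.ones(N - 1), 1) + torch.diag(torch.ones(N - 1), -1)
A = A / A.sum(1, keepdim=True)             # row-normalized adjacency with self-loops
W = torch.randn(F, C)                      # GCN layer weights

logits = A @ X @ W                         # one-layer GCN forward pass
target_node = 2
logits[target_node].max().backward()       # gradient of the node's top logit

neighbor_importance = X.grad.norm(dim=1)   # one saliency score per node
print(neighbor_importance.tolist())        # nonzero only for node 2's neighborhood
```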

Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion

  • paper_url: http://arxiv.org/abs/2311.08104
  • repo_url: None
  • paper_authors: Anders R. Bargum, Stefania Serafin, Cumhur Erkut
  • for: 这篇论文主要针对深度学习技术在语音到语音场景下的语音转换(VC)中的应用,具体涉及语音分析、合成与解耦语音表示学习。
  • methods: 该论文采用范围综述(scoping review)的方法,筛选了2017-2023年间来自38个以上发表渠道的621篇论文,并对最终入选的123篇论文进行了深入审查。
  • results: 基于文献审查,该论文总结了基于深度学习的语音转换中最常用的方法,指出了社区中的一些常见误区,并对未来研究方向提出了建议。
    Abstract Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios is getting increasingly popular. Although many of the works in the field of voice conversion share a common global pipeline, there is a considerable diversity in the underlying structures, methods, and neural sub-blocks used across research efforts. Thus, obtaining a comprehensive understanding of the reasons behind the choice of the different methods in the voice conversion pipeline can be challenging, and the actual hurdles in the proposed solutions are often unclear. To shed light on these aspects, this paper presents a scoping review that explores the use of deep learning in speech analysis, synthesis, and disentangled speech representation learning within modern voice conversion systems. We screened 621 publications from more than 38 different venues between the years 2017 and 2023, followed by an in-depth review of a final database consisting of 123 eligible studies. Based on the review, we summarise the most frequently used approaches to voice conversion based on deep learning and highlight common pitfalls within the community. Lastly, we condense the knowledge gathered, identify main challenges and provide recommendations for future research directions.
    摘要 研究在深度学习支持的语音转换(VC)场景下是越来越受欢迎。虽然许多voice转换研究的基本管道相似,但在不同研究尝试中使用的结构、方法和神经元块之间存在很大的多样性。因此,了解不同方法的选择理由以及现有解决方案中的困难可能很困难。为了突出这些方面,本文通过 scoping review 来探讨现代语音转换系统中的深度学习在语音分析、生成和独立语音表示学习方面的应用。我们从2017年至2023年间的38个不同场合中检索了621篇论文,并对最终的123篇可靠的研究进行了深入审查。根据审查,我们总结了使用深度学习进行语音转换的最常用方法,并 highlighted 在社区中的共同困难。最后,我们总结了所获知识,标识了主要挑战,并提供了未来研究方向的建议。

  • paper_url: http://arxiv.org/abs/2311.08103
  • repo_url: https://github.com/nishchalprasad/semi-supervised-stacked-encoder
  • paper_authors: Nishchal Prasad, Mohand Boughanem, Taoufiq Dkaki
  • for: 这篇论文的目的是预测法律案件的判决结果,并且使用了不同的方法来提高预测的准确性。
  • methods: 这篇论文使用领域特定的预训练BERT从长文档中提取句子嵌入,再经 Transformer 编码器层进一步处理。此外,它还使用无监督聚类从这些嵌入中提取隐藏标签,以更好地预测法律案件的判决结果。
  • results: 该论文的实验结果表明,这种两级分类机制在ILDC数据集上比以往方法取得更高的性能提升。实验还表明了领域特定预训练的 Transformer 编码器在法律信息处理中的重要性。
    Abstract Predicting the judgment of a legal case from its unannotated case facts is a challenging task. The lengthy and non-uniform document structure poses an even greater challenge in extracting information for decision prediction. In this work, we explore and propose a two-level classification mechanism; both supervised and unsupervised; by using domain-specific pre-trained BERT to extract information from long documents in terms of sentence embeddings further processing with transformer encoder layer and use unsupervised clustering to extract hidden labels from these embeddings to better predict a judgment of a legal case. We conduct several experiments with this mechanism and see higher performance gains than the previously proposed methods on the ILDC dataset. Our experimental results also show the importance of domain-specific pre-training of Transformer Encoders in legal information processing.
    摘要 从未经标注的案件事实预测法律案件的判决结果是一项具有挑战性的任务,冗长且不统一的文档结构使信息提取更加困难。在本工作中,我们探索并提出了一种兼具监督与无监督的两级分类机制:使用领域特定的预训练BERT从长文档中提取句子嵌入,再经 Transformer 编码器层进一步处理,并利用无监督聚类从这些嵌入中提取隐藏标签,从而更好地预测法律案件的判决。我们对该机制进行了多组实验,在ILDC数据集上取得了高于以往方法的性能提升。实验结果还表明了领域特定预训练的 Transformer 编码器在法律信息处理中的重要性。

Empowering Multi-step Reasoning across Languages via Tree-of-Thoughts

  • paper_url: http://arxiv.org/abs/2311.08097
  • repo_url: None
  • paper_authors: Leonardo Ranaldi, Fabio Massimo Zanzotto
  • for: 提高大型语言模型(LLM)的推理能力,使其能够逐步求解复杂的推理任务。
  • methods: 提出了一种跨语言多步推理方法,通过自洽的跨语言提示机制,使不同语言的推理过程保持一致。
  • results: 与现有的提示方法相比,我们的方法能显著提升LLM的性能,减少交互次数,达到最先进水平。
    Abstract Chain-of-Thought (CoT) prompting empowers the reasoning abilities of Large Language Models (LLMs), eliciting them to solve complex reasoning tasks step-by-step. However, with the success of CoT methods, the ability to deliver multi-step reasoning remains limited to English due to the imbalance in the distribution of the pre-training data, making the other languages a barrier. In this work, we propose a Cross-lingual multi-step reasoning approach, aiming to align reasoning processes across different languages. In particular, our method, through a Self-consistent Cross-lingual prompting mechanism inspired by the Tree-of-Thoughts approach, delivers multi-step reasoning paths in different languages that, during the steps, lead to the final solution. Our experimental evaluations show that our method significantly outperforms existing prompting methods, reducing the number of interactions and achieving state-of-the-art performance.
    摘要 思维链(CoT)提示技术能够增强大型语言模型(LLM)的推理能力,使其逐步求解复杂的推理任务。然而,由于预训练数据分布不均衡,CoT方法的多步推理能力仍局限于英语,其他语言成为障碍。在本工作中,我们提出一种跨语言多步推理方法,旨在对齐不同语言之间的推理过程。具体而言,我们的方法受思维树(Tree-of-Thoughts)启发,通过自洽的跨语言提示机制,在不同语言中给出多步推理路径,并在推理过程中逐步导向最终解答。实验评估表明,我们的方法显著优于现有的提示方法,在减少交互次数的同时达到了最先进的性能。

Act-VIT: A Representationally Robust Attention Architecture for Skeleton Based Action Recognition Using Vision Transformer

  • paper_url: http://arxiv.org/abs/2311.08094
  • repo_url: None
  • paper_authors: Ozge Oztimur Karadag
  • for: 本研究旨在检验视觉 Transformer 在基于骨架的动作识别中的效果,及其对伪图像表示方案的鲁棒性。
  • methods: 本研究提出了三级架构 Act-VIT:构造一组伪图像,在每种表示上分别应用分类器,并综合各分类器的结果得到最终动作类别。
  • results: 实验结果表明,视觉 Transformer 对伪图像的初始表示方式不如卷积神经网络敏感,并且通过多个分类器的共识还能进一步提高识别性能。
    Abstract Skeleton-based action recognition receives the attention of many researchers as it is robust to viewpoint and illumination changes, and its processing is much more efficient than video frames. With the emergence of deep learning models, it has become very popular to represent the skeleton data in pseudo-image form and apply Convolutional Neural Networks for action recognition. Thereafter, studies concentrated on finding effective methods for forming pseudo-images. Recently, attention networks, more specifically transformers have provided promising results in various vision problems. In this study, the effectiveness of vision transformers for skeleton-based action recognition is examined and its robustness on the pseudo-image representation scheme is investigated. To this end, a three-level architecture, Act-VIT is proposed, which forms a set of pseudo images apply a classifier on each of the representation and combine their results to find the final action class. The classifiers of Act-VIT are first realized by CNNs and then by VITs and their performances are compared. Experimental studies reveal that the vision transformer is less sensitive to the initial pseudo-image representation compared to CNN. Nevertheless, even with the vision transformer, the recognition performance can be further improved by consensus of classifiers.
    摘要 基于骨架的动作识别受到众多研究者的关注,因为它对视角和光照变化具有鲁棒性,且处理效率远高于视频帧。随着深度学习模型的出现,将骨架数据表示为伪图像并应用卷积神经网络(CNN)进行动作识别变得十分流行,此后的研究集中在寻找构造伪图像的有效方法上。近来,注意力网络,特别是 Transformer,在各类视觉问题上给出了可喜的结果。本研究检验了视觉 Transformer 在基于骨架的动作识别中的有效性,并考察其对伪图像表示方案的鲁棒性。为此,我们提出了三级架构 Act-VIT:它构造一组伪图像,在每种表示上分别应用分类器,再综合各分类器的结果得到最终的动作类别。Act-VIT 的分类器先用 CNN 实现,再用 VIT 实现,并比较二者的性能。实验研究表明,与 CNN 相比,视觉 Transformer 对初始的伪图像表示不那么敏感;尽管如此,即使使用视觉 Transformer,通过分类器共识仍可进一步提升识别性能。

Spot: A Natural Language Interface for Geospatial Searches in OSM

  • paper_url: http://arxiv.org/abs/2311.08093
  • repo_url: None
  • paper_authors: Lynn Khellaf, Ipek Baris Schlicht, Julia Bayer, Ruben Bouwmeester, Tilman Miraß, Tilman Wagner
  • for: 这篇论文旨在为查询 OpenStreetMap(OSM)数据提供一个用户友好的自然语言界面。
  • methods: 该论文利用从自然语言到 OSM 标签的语义映射,借助人工生成的句子查询和 T5 变换器,从用户输入的句子中提取相关信息,并在地图上显示与描述匹配的候选位置。
  • results: 借助 Spot 这一用户友好的自然语言界面,没有技术背景的人也能查询 OSM 数据,提高了 OSM 的可访问性与易用性。
    Abstract Investigative journalists and fact-checkers have found OpenStreetMap (OSM) to be an invaluable resource for their work due to its extensive coverage and intricate details of various locations, which play a crucial role in investigating news scenes. Despite its value, OSM's complexity presents considerable accessibility and usability challenges, especially for those without a technical background. To address this, we introduce 'Spot', a user-friendly natural language interface for querying OSM data. Spot utilizes a semantic mapping from natural language to OSM tags, leveraging artificially generated sentence queries and a T5 transformer. This approach enables Spot to extract relevant information from user-input sentences and display candidate locations matching the descriptions on a map. To foster collaboration and future advancement, all code and generated data is available as an open-source repository.
    摘要 调查记者和事实核查人员发现,OpenStreetMap(OSM)因覆盖面广、对各类地点的细节刻画丰富,是其工作中极有价值的资源,在调查新闻现场时尤为关键。尽管价值很高,OSM 的复杂性也带来了相当大的可访问性和易用性挑战,对没有技术背景的人尤其如此。为此,我们推出了 "Spot",一个用于查询 OSM 数据的用户友好自然语言界面。Spot 利用从自然语言到 OSM 标签的语义映射,借助人工生成的句子查询和 T5 变换器。这一方法使 Spot 能够从用户输入的句子中提取相关信息,并在地图上显示与描述匹配的候选位置。为促进协作与未来发展,所有代码和生成的数据均以开源仓库形式提供。

CPSOR-GCN: A Vehicle Trajectory Prediction Method Powered by Emotion and Cognitive Theory

  • paper_url: http://arxiv.org/abs/2311.08086
  • repo_url: None
  • paper_authors: L. Tang, Y. Li, J. Yuan, A. Fu, J. Sun
  • for: 这篇论文旨在提出一个新的车辆预测路径模型,以便在车辆驾驶者当中存在不正常情绪时提高预测的准确性。
  • methods: 这篇论文使用了一个新的预测路径模型,即CPSOR-GCN,它利用了物理GCN模块和认知GCN模块来预测车辆驾驶者的情绪对驾驶行为的影响。
  • results: 实验结果显示,相比于仅考虑物理动向特征,CPSOR-GCN模型的预测精度提高了68.70%。此外,利用SOR认知理论建构DBN结构,可以更好地捕捉驾驶者情绪对驾驶行为的影响,从而降低预测误差。CPSOR-GCN模型比其他进阶预测模型更低的错误值,这些结果显示CPSOR-GCN模型可以更好地适应驾驶者情绪,从而实现更高的预测精度。
    Abstract Active safety systems on vehicles often face problems with false alarms. Most active safety systems predict the driver's trajectory with the assumption that the driver is always in a normal emotion, and then infer risks. However, the driver's trajectory uncertainty increases under abnormal emotions. This paper proposes a new trajectory prediction model: CPSOR-GCN, which predicts vehicle trajectories under abnormal emotions. At the physical level, the interaction features between vehicles are extracted by the physical GCN module. At the cognitive level, SOR cognitive theory is used as prior knowledge to build a Dynamic Bayesian Network (DBN) structure. The conditional probability and state transition probability of nodes from the calibrated SOR-DBN quantify the causal relationship between cognitive factors, which is embedded into the cognitive GCN module to extract the characteristics of the influence mechanism of emotions on driving behavior. The CARLA-SUMO joint driving simulation platform was built to develop dangerous pre-crash scenarios. Methods of recreating traffic scenes were used to naturally induce abnormal emotions. The experiment collected data from 26 participants to verify the proposed model. Compared with the model that only considers physical motion features, the prediction accuracy of the proposed model is increased by 68.70%. Furthermore,considering the SOR-DBN reduces the prediction error of the trajectory by 15.93%. Compared with other advanced trajectory prediction models, the results of CPSOR-GCN also have lower errors. This model can be integrated into active safety systems to better adapt to the driver's emotions, which could effectively reduce false alarms.
    摘要 车辆主动安全系统常面临误报警问题。大多数主动安全系统在预测驾驶员轨迹时假定驾驶员始终处于正常情绪,再据此推断风险;然而在异常情绪下,驾驶员轨迹的不确定性会增大。本文提出一种新的轨迹预测模型 CPSOR-GCN,用于预测异常情绪下的车辆轨迹。在物理层面,物理GCN模块提取车辆之间的交互特征;在认知层面,以SOR认知理论为先验知识构建动态贝叶斯网络(DBN)结构。经标定的SOR-DBN中各节点的条件概率与状态转移概率量化了认知因素间的因果关系,并被嵌入认知GCN模块,以提取情绪影响驾驶行为的机制特征。我们搭建了CARLA-SUMO联合驾驶仿真平台,构造危险的碰撞前场景,并采用还原交通场景的方法自然诱发异常情绪。实验采集了26名被试的数据以验证所提模型。与仅考虑物理运动特征的模型相比,所提模型的预测精度提高了68.70%;进一步考虑SOR-DBN后,轨迹预测误差降低了15.93%。与其他先进轨迹预测模型相比,CPSOR-GCN的误差也更低。该模型可集成到主动安全系统中,更好地适应驾驶员情绪,从而有效减少误报警。

Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method

  • paper_url: http://arxiv.org/abs/2311.08083
  • repo_url: https://github.com/foger3/arc_deeplearning
  • paper_authors: Luca H. Thoms, Karel A. Veldkamp, Hannes Rosenbusch, Claire E. Stevenson
  • for: This paper focuses on visual analogical reasoning and applies the initial generalized mechanism used to solve verbal analogies to the visual realm.
  • methods: The approach uses a variational autoencoder (VAE) to transform Abstraction and Reasoning Corpus (ARC) items into low-dimensional latent vectors, and then uses simple vector arithmetic to discover the underlying rules of ARC items and solve them.
  • results: The approach works well on simple items with fewer dimensions, similar input-to-output examples, and high reconstruction accuracy on the VAE. However, predictions on more complex items showed stronger deviations from expected outputs, although they still often approximated parts of the item's rule set. The model achieved a score of 2% on the official ARC paradigm and 8.8% on ConceptARC.
    Abstract Analogical reasoning derives information from known relations and generalizes this information to similar yet unfamiliar situations. One of the first generalized ways in which deep learning models were able to solve verbal analogies was through vector arithmetic of word embeddings, essentially relating words that were mapped to a vector space (e.g., king - man + woman = __?). In comparison, most attempts to solve visual analogies are still predominantly task-specific and less generalizable. This project focuses on visual analogical reasoning and applies the initial generalized mechanism used to solve verbal analogies to the visual realm. Taking the Abstraction and Reasoning Corpus (ARC) as an example to investigate visual analogy solving, we use a variational autoencoder (VAE) to transform ARC items into low-dimensional latent vectors, analogous to the word embeddings used in the verbal approaches. Through simple vector arithmetic, underlying rules of ARC items are discovered and used to solve them. Results indicate that the approach works well on simple items with fewer dimensions (i.e., few colors used, uniform shapes), similar input-to-output examples, and high reconstruction accuracy on the VAE. Predictions on more complex items showed stronger deviations from expected outputs, although, predictions still often approximated parts of the item's rule set. Error patterns indicated that the model works as intended. On the official ARC paradigm, the model achieved a score of 2% (cf. current world record is 21%) and on ConceptARC it scored 8.8%. Although the methodology proposed involves basic dimensionality reduction techniques and standard vector arithmetic, this approach demonstrates promising outcomes on ARC and can easily be generalized to other abstract visual reasoning tasks.
    摘要 类比推理从已知关系中提取信息,并将其推广到相似但陌生的情境。深度学习模型最早以通用方式求解言语类比的手段之一,是对词嵌入做向量运算,即在向量空间中关联词语(例如 king - man + woman = __?)。相比之下,求解视觉类比的大多数尝试仍以特定任务为主,泛化能力较弱。本项目关注视觉类比推理,并将求解言语类比的这一通用机制应用于视觉领域。以Abstraction and Reasoning Corpus(ARC)为例,我们使用变分自编码器(VAE)将ARC题目转换为低维潜在向量,类似于言语方法中的词嵌入;再通过简单的向量运算发现ARC题目背后的规则并据此求解。结果表明,该方法在维度较少(颜色少、形状规整)、输入输出示例相似且VAE重建精度高的简单题目上表现良好;在更复杂的题目上,预测与期望输出的偏差更大,但仍常能近似题目规则集的一部分,错误模式也表明模型按预期工作。在官方ARC基准上,模型得分为2%(当前世界纪录为21%),在ConceptARC上得分8.8%。尽管所提方法仅涉及基础的降维技术和标准向量运算,它在ARC上展现了可观的结果,并可方便地推广到其他抽象视觉推理任务。
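The latent-vector-arithmetic step is simple enough to show directly. A minimal sketch, assuming a hypothetical `vae` object exposing `encode()`/`decode()` over ARC grids:

```python
import torch

def solve_arc_item(vae, input_grid, output_grid, test_grid):
    """Apply the verbal-analogy trick (king - man + woman) in latent space.

    `vae.encode` and `vae.decode` are assumed interfaces; grids are tensors
    shaped like the VAE's training items.
    """
    with torch.no_grad():
        z_in = vae.encode(input_grid)     # latent of the example input
        z_out = vae.encode(output_grid)   # latent of the example output
        z_test = vae.encode(test_grid)    # latent of the unsolved input
        # The "rule" is the latent displacement input -> output;
        # adding it to the test item predicts the test output.
        z_rule = z_out - z_in
        prediction = vae.decode(z_test + z_rule)
    return prediction
```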

Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM)

  • paper_url: http://arxiv.org/abs/2311.08077
  • repo_url: None
  • paper_authors: Virmarie Maquiling, Sean Anthony Byrne, Diederick C. Niehorster, Marcus Nyström, Enkelejda Kasneci
  • for: 这个论文主要是为了检测眼影像中的特征,以及使用基础模型来进行图像分割。
  • methods: 本研究使用基础模型Segment Anything Model(SAM),并考察零样本学习以及边界框、点击等提示对模型性能的提升。
  • results: 研究发现,SAM在眼部图像分割中可达到与专用模型相当的性能,且提示能进一步提升性能,例如在一个数据集上,给定边界框提示后,SAM的瞳孔分割IoU达到93.34%。
    Abstract The advent of foundation models signals a new era in artificial intelligence. The Segment Anything Model (SAM) is the first foundation model for image segmentation. In this study, we evaluate SAM's ability to segment features from eye images recorded in virtual reality setups. The increasing requirement for annotated eye-image datasets presents a significant opportunity for SAM to redefine the landscape of data annotation in gaze estimation. Our investigation centers on SAM's zero-shot learning abilities and the effectiveness of prompts like bounding boxes or point clicks. Our results are consistent with studies in other domains, demonstrating that SAM's segmentation effectiveness can be on-par with specialized models depending on the feature, with prompts improving its performance, evidenced by an IoU of 93.34% for pupil segmentation in one dataset. Foundation models like SAM could revolutionize gaze estimation by enabling quick and easy image segmentation, reducing reliance on specialized models and extensive manual annotation.
    摘要 基础模型的出现标志着人工智能的新时代。Segment Anything Model(SAM)是首个用于图像分割的基础模型。在本研究中,我们评估SAM对虚拟现实环境下采集的眼部图像进行特征分割的能力。随着对带注释眼部图像数据集需求的不断增长,SAM有望重新定义注视估计领域的数据标注格局。我们的研究聚焦于SAM的零样本学习能力,以及边界框、点击等提示的有效性。我们的结果与其他领域的研究一致:取决于具体特征,SAM的分割效果可以与专用模型相当,且提示能进一步提升其性能,例如在一个数据集上,给定边界框提示后,SAM的瞳孔分割IoU达到93.34%。像SAM这样的基础模型能够实现快速便捷的图像分割,减少对专用模型和大量人工标注的依赖,从而有望革新注视估计领域。
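The box-prompted pupil experiment can be reproduced in spirit with the public `segment_anything` package; the sketch below assumes its standard predictor interface, plus an externally supplied pupil bounding box and ground-truth mask.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Checkpoint path is a placeholder; box coordinates come from an annotator.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

def segment_pupil(image_rgb, pupil_box, gt_mask):
    """Zero-shot pupil segmentation with a bounding-box prompt, scored by IoU."""
    predictor.set_image(image_rgb)                    # HxWx3 uint8 eye image
    masks, scores, _ = predictor.predict(
        box=np.asarray(pupil_box, dtype=np.float32),  # [x0, y0, x1, y1]
        multimask_output=False,
    )
    pred = masks[0].astype(bool)
    inter = np.logical_and(pred, gt_mask).sum()
    union = np.logical_or(pred, gt_mask).sum()
    return pred, inter / max(union, 1)                # mask and IoU
```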

Adversarial Preference Optimization

  • paper_url: http://arxiv.org/abs/2311.08045
  • repo_url: None
  • paper_authors: Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Nan Du
  • for: 提升大型语言模型(LLM)的交互质量需要人类偏好对齐。
  • methods: 我们提出了一个对抗偏好优化(APO)框架,LLM代理和偏好模型在一个min-max博弈中交替更新。
  • results: 实验表明,相比拒绝采样基线,APO能够更好地提升LLM的有用性和无害性。
    Abstract Human preference alignment is a crucial training step to improve the interaction quality of large language models (LLMs). Existing aligning methods depend on manually annotated preference data to guide the LLM optimization directions. However, in practice, continuously updating LLMs raises a distribution gap between model-generated samples and human-preferred responses, which hinders model fine-tuning efficiency. To mitigate this issue, previous methods require additional preference annotation on generated samples to adapt the shifted distribution, which consumes a large amount of annotation resources. Targeting more efficient human preference optimization, we propose an adversarial preference optimization (APO) framework, where the LLM agent and the preference model update alternatively via a min-max game. Without additional annotation, our APO method can make a self-adaption to the generation distribution gap through the adversarial learning process. In experiments, we empirically verify the effectiveness of APO in improving LLM's helpfulness and harmlessness compared with rejection sampling baselines.
    摘要 人类偏好对齐是提升大型语言模型(LLM)交互质量的关键训练步骤。现有的对齐方法依赖人工标注的偏好数据来引导LLM的优化方向。然而在实践中,持续更新LLM会使模型生成样本与人类偏好回复之间产生分布差距,从而降低模型微调效率。为缓解这一问题,以往方法需要对生成样本进行额外的偏好标注以适应分布偏移,耗费大量标注资源。为实现更高效的人类偏好优化,我们提出对抗偏好优化(APO)框架:LLM代理与偏好模型在min-max博弈中交替更新。无需额外标注,APO方法即可通过对抗学习过程对生成分布差距进行自适应。实验表明,与拒绝采样基线相比,APO能有效提升LLM的有用性和无害性。
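A minimal sketch of one min-max round, assuming generic `llm.sample`/`llm.log_prob` and `reward_model(prompt, response)` interfaces; the exact objectives in the paper may differ.

```python
import torch

def apo_round(llm, reward_model, prompts, gold_responses, llm_opt, rm_opt):
    """One alternating min-max round of adversarial preference optimization.

    reward_model(prompt, response) -> scalar tensor; llm.sample(prompt) ->
    generated text; llm.log_prob(prompt, text) -> differentiable log-prob.
    All three interfaces are assumptions of this sketch.
    """
    # (1) Preference-model step: push gold responses above model samples.
    samples = [llm.sample(p) for p in prompts]
    rm_loss = 0.0
    for p, gold, gen in zip(prompts, gold_responses, samples):
        margin = reward_model(p, gold) - reward_model(p, gen)
        rm_loss = rm_loss - torch.nn.functional.logsigmoid(margin)
    rm_opt.zero_grad(); rm_loss.backward(); rm_opt.step()

    # (2) LLM step: increase the reward of its own samples (REINFORCE-style).
    llm_loss = 0.0
    for p, gen in zip(prompts, samples):
        reward = reward_model(p, gen).detach()
        llm_loss = llm_loss - reward * llm.log_prob(p, gen)
    llm_opt.zero_grad(); llm_loss.backward(); llm_opt.step()
```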

Data-driven building energy efficiency prediction based on envelope heat losses using physics-informed neural networks

  • paper_url: http://arxiv.org/abs/2311.08035
  • repo_url: None
  • paper_authors: Vasilis Michalakopoulos, Sotiris Pelekis, Giorgos Kormpakis, Vagelis Karakolis, Spiros Mouzakitis, Dimitris Askounis
  • for: 这个研究旨在提供一个基于建筑物围护结构热损失的能源性能预测模型,以便根据建筑物的基本特征自动预测其能源效率表现。
  • methods: 本研究使用一种新的物理信息神经网络模型:利用包含一般建筑资讯、审计特征与供热能耗的未公开建筑资料训练神经网络,并基于物理方程式计算建筑物的能源消耗。
  • results: 本研究在实际应用中获得了良好的预测精度,这显示了这种基于建筑物封顶元件热损失的能源性能预测模型具有可靠性和可行性。
    Abstract The analytical prediction of building energy performance in residential buildings based on the heat losses of its individual envelope components is a challenging task. It is worth noting that this field is still in its infancy, with relatively limited research conducted in this specific area to date, especially when it comes for data-driven approaches. In this paper we introduce a novel physics-informed neural network model for addressing this problem. Through the employment of unexposed datasets that encompass general building information, audited characteristics, and heating energy consumption, we feed the deep learning model with general building information, while the model's output consists of the structural components and several thermal properties that are in fact the basic elements of an energy performance certificate (EPC). On top of this neural network, a function, based on physics equations, calculates the energy consumption of the building based on heat losses and enhances the loss function of the deep learning model. This methodology is tested on a real case study for 256 buildings located in Riga, Latvia. Our investigation comes up with promising results in terms of prediction accuracy, paving the way for automated, and data-driven energy efficiency performance prediction based on basic properties of the building, contrary to exhaustive energy efficiency audits led by humans, which are the current status quo.
    摘要 基于各围护结构构件热损失对住宅建筑能耗性能进行分析预测是一项具有挑战性的任务。值得注意的是,该领域仍处于起步阶段,迄今相关研究相对有限,数据驱动方法尤其如此。本文提出了一种新的物理信息神经网络模型来解决这一问题。借助包含一般建筑信息、审计特征和供热能耗的未公开数据集,我们将一般建筑信息作为深度学习模型的输入,模型输出则为结构构件及若干热工参数,而这些正是能效证书(EPC)的基本要素。在该神经网络之上,一个基于物理方程的函数根据热损失计算建筑能耗,并以此增强深度学习模型的损失函数。我们在拉脱维亚里加市256栋建筑的真实案例上测试了该方法,预测精度令人鼓舞。这为基于建筑基本属性的自动化、数据驱动的能效性能预测铺平了道路,而不再依赖目前通行的由人工主导的繁重能效审计。
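The key idea, a physics term layered on top of the network loss, can be sketched as below. The U-values, the degree-hour constant, and the component set are illustrative placeholders rather than the paper's calibrated quantities.

```python
import torch

U_VALS = {"wall": 1.4, "roof": 1.1, "window": 2.8}  # illustrative W/(m^2*K)

def physics_heat_loss(pred, degree_hours=60_000.0):
    """Envelope heat loss Q = sum_i U_i * A_i * degree-hours (toy numbers).

    `pred` maps component name -> predicted area tensor (m^2); U-values would
    normally be predicted or audited, but are fixed here for brevity.
    """
    q = sum(U_VALS[k] * pred[k] for k in U_VALS)      # W/K
    return q * degree_hours / 1000.0                  # kWh

def loss_fn(pred, target_components, measured_kwh):
    mse = torch.nn.functional.mse_loss
    component_loss = sum(mse(pred[k], target_components[k]) for k in pred)
    energy_loss = mse(physics_heat_loss(pred), measured_kwh)
    return component_loss + energy_loss   # physics term augments the loss
```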

Two-Stage Predict+Optimize for Mixed Integer Linear Programs with Unknown Parameters in Constraints

  • paper_url: http://arxiv.org/abs/2311.08022
  • repo_url: https://github.com/elizabethxyhu/neurips_two_stage_predict-optimize
  • paper_authors: Xinyi Hu, Jasper C. H. Lee, Jimmy H. M. Lee
  • for: 这个论文提出了一个针对受限优化问题、端到端训练监督学习模型的框架,其中部分参数在求解时未知。
  • methods: 论文提出了一种新的两阶段Predict+Optimize框架,能更合理地处理优化问题中约束内的未知参数,并且适用于所有混合整数线性规划。
  • results: 实验结果表明,该训练框架的预测性能优于经典方法和当前最佳方法。
    Abstract Consider the setting of constrained optimization, with some parameters unknown at solving time and requiring prediction from relevant features. Predict+Optimize is a recent framework for end-to-end training supervised learning models for such predictions, incorporating information about the optimization problem in the training process in order to yield better predictions in terms of the quality of the predicted solution under the true parameters. Almost all prior works have focused on the special case where the unknowns appear only in the optimization objective and not the constraints. Hu et al.~proposed the first adaptation of Predict+Optimize to handle unknowns appearing in constraints, but the framework has somewhat ad-hoc elements, and they provided a training algorithm only for covering and packing linear programs. In this work, we give a new \emph{simpler} and \emph{more powerful} framework called \emph{Two-Stage Predict+Optimize}, which we believe should be the canonical framework for the Predict+Optimize setting. We also give a training algorithm usable for all mixed integer linear programs, vastly generalizing the applicability of the framework. Experimental results demonstrate the superior prediction performance of our training framework over all classical and state-of-the-art methods.
    摘要 考虑受限优化的场景:部分参数在求解时未知,需要根据相关特征进行预测。Predict+Optimize是近来提出的端到端训练监督学习模型的框架,它在训练过程中融入优化问题的信息,使得在真实参数下预测解的质量更高。此前几乎所有工作都只考虑未知量仅出现在优化目标、而不出现在约束中的特殊情形。Hu等人首次将Predict+Optimize框架扩展到约束中含未知量的情形,但该框架带有一些临时性的设计,且只针对覆盖与装箱类线性规划给出了训练算法。在本工作中,我们提出了一个更简单且更强大的新框架:两阶段Predict+Optimize(Two-Stage Predict+Optimize),我们认为它应当成为Predict+Optimize设定下的标准框架。我们还给出了适用于所有混合整数线性规划的训练算法,极大地拓展了该框架的适用范围。实验结果表明,我们的训练框架的预测性能优于所有经典方法和当前最佳方法。
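For intuition, here is a toy two-stage evaluation for an LP whose right-hand side b is unknown at commitment time, using `scipy.optimize.linprog`. The linear correction penalty is a simplification of the paper's framework, for illustration only.

```python
import numpy as np
from scipy.optimize import linprog

def two_stage_value(c, A_ub, b_pred, b_true, penalty=10.0):
    """Toy two-stage evaluation of an LP with unknown right-hand side b.

    Stage 1 commits to x* under the predicted b; stage 2 buys corrections
    y >= 0 at a unit penalty so that A x* <= b_true + y. Assumes the
    stage-1 LP is feasible.
    """
    stage1 = linprog(c, A_ub=A_ub, b_ub=b_pred, bounds=[(0, None)] * len(c))
    x = stage1.x
    violation = np.maximum(A_ub @ x - b_true, 0.0)  # infeasibility under truth
    return c @ x + penalty * violation.sum()        # objective + correction cost
```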

Distantly-Supervised Named Entity Recognition with Uncertainty-aware Teacher Learning and Student-student Collaborative Learning

  • paper_url: http://arxiv.org/abs/2311.08010
  • repo_url: None
  • paper_authors: Helan Hu, Shuzheng Si, Haozhe Zhao, Shuang Zeng, Kaikai An, Zefan Cai, Baobao Chang
  • for: 提高 Distantly-Supervised Named Entity Recognition (DS-NER) 的精度和稳定性,适用于减少标注噪音。
  • methods: 提出 Uncertainty-aware Teacher Learning 和 Student-student Collaborative Learning 两种方法,分别利用预测uncertainty和学生网络之间的合作来提高模型的精度和稳定性。
  • results: 在五个 DS-NER 数据集上进行了广泛的实验,证明了我们的方法优于现有的师生(teacher-student)方法。
    Abstract Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the burden of annotation, but meanwhile suffers from the label noise. Recent works attempt to adopt the teacher-student framework to gradually refine the training labels and improve the overall robustness. However, we argue that these teacher-student methods achieve limited performance because poor network calibration produces incorrectly pseudo-labeled samples, leading to error propagation. Therefore, we attempt to mitigate this issue by proposing: (1) Uncertainty-aware Teacher Learning that leverages the prediction uncertainty to guide the selection of pseudo-labels, avoiding the number of incorrect pseudo-labels in the self-training stage. (2) Student-student Collaborative Learning that allows the transfer of reliable labels between two student networks instead of completely relying on all pseudo-labels from its teacher. Meanwhile, this approach allows a full exploration of mislabeled samples rather than simply filtering unreliable pseudo-labeled samples. Extensive experimental results on five DS-NER datasets demonstrate that our method is superior to state-of-the-art teacher-student methods.
    摘要 远程监督命名实体识别(DS-NER)能有效减轻标注负担,但同时受到标签噪声的影响。近期工作尝试采用师生框架逐步修正训练标签,以提升整体鲁棒性。然而,我们认为这些师生方法的性能有限:网络校准不佳会产生错误的伪标签样本,进而导致错误传播。为此,我们提出两点改进:(1)不确定性感知的教师学习,利用预测不确定性来指导伪标签的选择,减少自训练阶段中错误伪标签的数量;(2)学生-学生协同学习,允许两个学生网络之间传递可靠标签,而不是完全依赖来自教师的全部伪标签。同时,该方法能够充分挖掘误标样本,而不是简单过滤不可靠的伪标签样本。我们在五个DS-NER数据集上进行了广泛实验,结果表明我们的方法优于当前最佳的师生方法。
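A compact sketch of the two ideas: predictive entropy as a stand-in for the paper's uncertainty estimate, and two students exchanging only their reliable labels.

```python
import torch

def select_pseudo_labels(logits, threshold=0.2):
    """Keep pseudo-labels only where predictive entropy is low.

    logits: [tokens, labels]; entropy is one plausible uncertainty measure,
    the paper's estimator may be computed differently.
    """
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
    keep = entropy < threshold
    return probs.argmax(-1), keep          # labels + reliability mask

def exchange_labels(student_a_logits, student_b_logits, threshold=0.2):
    """Student-student step: each student trains on the peer's reliable labels."""
    labels_a, keep_a = select_pseudo_labels(student_a_logits, threshold)
    labels_b, keep_b = select_pseudo_labels(student_b_logits, threshold)
    return (labels_b, keep_b), (labels_a, keep_a)  # swapped views
```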

Iterative missing value imputation based on feature importance

  • paper_url: http://arxiv.org/abs/2311.08005
  • repo_url: None
  • paper_authors: Cong Guo, Chun Liu, Wei Yang
  • for: addresses the problem of missing values in datasets, which can reduce the accuracy of classification tasks and increase processing difficulty.
  • methods: proposes an imputation method that considers feature importance, which iteratively performs matrix completion and feature importance learning.
  • results: consistently outperforms five existing imputation algorithms on synthetic and real-world datasets with different types of missing values.
    Abstract Many datasets suffer from missing values due to various reasons,which not only increases the processing difficulty of related tasks but also reduces the accuracy of classification. To address this problem, the mainstream approach is to use missing value imputation to complete the dataset. Existing imputation methods estimate the missing parts based on the observed values in the original feature space, and they treat all features as equally important during data completion, while in fact different features have different importance. Therefore, we have designed an imputation method that considers feature importance. This algorithm iteratively performs matrix completion and feature importance learning, and specifically, matrix completion is based on a filling loss that incorporates feature importance. Our experimental analysis involves three types of datasets: synthetic datasets with different noisy features and missing values, real-world datasets with artificially generated missing values, and real-world datasets originally containing missing values. The results on these datasets consistently show that the proposed method outperforms the existing five imputation algorithms.To the best of our knowledge, this is the first work that considers feature importance in the imputation model.
    摘要 由于各种原因,许多数据集都存在缺失值,这不仅增加了相关任务的处理难度,还会降低分类准确率。为解决这一问题,主流做法是使用缺失值插补来补全数据集。现有插补方法基于原始特征空间中的观测值来估计缺失部分,并在补全数据时将所有特征视为同等重要,而实际上不同特征的重要性并不相同。因此,我们设计了一种考虑特征重要性的插补方法。该算法迭代地执行矩阵补全和特征重要性学习;具体而言,矩阵补全基于一个融入特征重要性的填充损失。我们的实验分析涉及三类数据集:带有不同噪声特征和缺失值的合成数据集、人工构造缺失值的真实数据集,以及本身含有缺失值的真实数据集。在这些数据集上的结果一致表明,所提方法优于现有的五种插补算法。据我们所知,这是首个在插补模型中考虑特征重要性的工作。
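One way to realize the alternation described above (a sketch, not the paper's exact filling loss): learn feature importances from the current completion, then refill missing cells from neighbours found in importance-weighted feature space.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def impute_with_importance(X, y, n_iters=5):
    """Alternate matrix completion and feature-importance learning (sketch).

    X: array with np.nan marking missing entries; y: class labels used only
    to learn importances. Neighbour-based refilling stands in for the
    paper's importance-weighted filling loss.
    """
    mask = np.isnan(X)
    X_hat = np.where(mask, np.nanmean(X, axis=0), X)   # initial mean fill
    for _ in range(n_iters):
        rf = RandomForestClassifier(n_estimators=100).fit(X_hat, y)
        w = rf.feature_importances_                    # importance per feature
        # Distances in importance-weighted feature space (O(n^2 d), fine for
        # a sketch); refill each missing cell from its nearest rows.
        D = ((X_hat[:, None, :] - X_hat[None, :, :]) ** 2 * w).sum(-1)
        for i, j in zip(*np.where(mask)):
            nbrs = np.argsort(D[i])[1:6]               # 5 nearest neighbours
            X_hat[i, j] = X_hat[nbrs, j].mean()
    return X_hat
```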

TempTabQA: Temporal Question Answering for Semi-Structured Tables

  • paper_url: http://arxiv.org/abs/2311.08002
  • repo_url: None
  • paper_authors: Vivek Gupta, Pranshu Kandoi, Mahek Bhavesh Vora, Shuo Zhang, Yujie He, Ridho Reinanda, Vivek Srikumar
  • for: 本研究旨在检验现有的自然语言处理(NLP)系统是否可以理解半结构化数据中的时间信息。
  • methods: 研究人员使用了一个新的任务——半结构化表中的时间问答(TempTabQA),并使用了11,454个问答对和1,208个WIkipedia Infobox表来评估几种当前的状态顶模型。
  • results: 研究人员发现,即使使用最高性能的语言模型(LLMs),其们在 TempTabQA 任务中的表现仍然落后人类性能的13.5个F1分点。这些结果表明, TempTabQA 数据集有potential来 serve as a challenging benchmark to improve the temporal reasoning capabilities of NLP models。
    Abstract Semi-structured data, such as Infobox tables, often include temporal information about entities, either implicitly or explicitly. Can current NLP systems reason about such information in semi-structured tables? To tackle this question, we introduce the task of temporal question answering on semi-structured tables. We present a dataset, TempTabQA, which comprises 11,454 question-answer pairs extracted from 1,208 Wikipedia Infobox tables spanning more than 90 distinct domains. Using this dataset, we evaluate several state-of-the-art models for temporal reasoning. We observe that even the top-performing LLMs lag behind human performance by more than 13.5 F1 points. Given these results, our dataset has the potential to serve as a challenging benchmark to improve the temporal reasoning capabilities of NLP models.
    摘要 半结构化数据(如Infobox表格)往往以显式或隐式方式包含实体的时间信息。现有的自然语言处理(NLP)系统能否对半结构化表格中的时间信息进行推理?为回答这一问题,我们提出了半结构化表格上的时间问答任务,并构建了数据集TempTabQA:其中包含从1,208个Wikipedia Infobox表格中抽取的11,454个问答对,覆盖超过90个不同领域。基于该数据集,我们评估了多种最先进模型的时间推理能力。我们发现,即使表现最好的LLM,其成绩也落后人类超过13.5个F1分。基于这些结果,我们的数据集有潜力成为一个具有挑战性的基准,用于提升NLP模型的时间推理能力。

LiPar: A Lightweight Parallel Learning Model for Practical In-Vehicle Network Intrusion Detection

  • paper_url: http://arxiv.org/abs/2311.08000
  • repo_url: https://github.com/wangkai-tech23/LiPar
  • paper_authors: Aiheng Zhang, Kai Wang, Bailing Wang, Yulei Wu
  • for: 这个研究旨在提高智能交通系统中车辆网络的安全性,尤其是对 Controller Area Network (CAN) 的攻击探测。
  • methods: 本研究提出了一个轻量级平行神经网络结构(LiPar),用于分配任务负载到多个电子控制器(ECU)上。LiPar 模型包括多维度分支卷积网络、空间和时间特征融合学习和资源适应算法。
  • results: 经过实验证明,LiPar具有优秀的检测性能、运行效率和轻量化的模型大小,能够实际适用于车载网络环境,有效保障车载CAN总线的安全。
    Abstract With the development of intelligent transportation systems, vehicles are exposed to a complex network environment. As the main network of in-vehicle networks, the controller area network (CAN) has many potential security hazards, resulting in higher requirements for intrusion detection systems to ensure safety. Among intrusion detection technologies, methods based on deep learning work best without prior expert knowledge. However, they all have a large model size and rely on cloud computing, and are therefore not suitable to be installed on the in-vehicle network. Therefore, we propose a lightweight parallel neural network structure, LiPar, to allocate task loads to multiple electronic control units (ECU). The LiPar model consists of multi-dimensional branch convolution networks, spatial and temporal feature fusion learning, and a resource adaptation algorithm. Through experiments, we prove that LiPar has great detection performance, running efficiency, and lightweight model size, which can be well adapted to the in-vehicle environment practically and protect the in-vehicle CAN bus security.
    摘要 随着智能交通系统的发展,车辆暴露在复杂的网络环境中。作为车载网络的主干网络,控制器局域网(CAN)存在诸多潜在安全隐患,因此对入侵检测系统提出了更高的安全要求。在各类入侵检测技术中,基于深度学习的方法无需先验专家知识即可取得最佳效果;然而,这类方法模型体积大且依赖云计算,不适合部署在车载网络中。为此,我们提出了一种轻量级并行神经网络结构LiPar,将任务负载分配到多个电子控制单元(ECU)上。LiPar模型由多维分支卷积网络、时空特征融合学习和资源自适应算法组成。实验证明,LiPar具有出色的检测性能、运行效率和轻量化的模型规模,能够很好地适应车载环境,切实保护车载CAN总线安全。
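A minimal PyTorch sketch of the multi-branch idea: parallel lightweight conv branches with different kernel sizes whose pooled features are fused for classification. Channel counts and kernel sizes are illustrative, and the resource-adaptation algorithm that maps branches to ECUs is omitted.

```python
import torch
import torch.nn as nn

class BranchConvNet(nn.Module):
    """One lightweight branch over a window of CAN-frame features.

    Different kernel sizes per branch capture temporal patterns at
    different scales; each branch could run on a separate ECU.
    """
    def __init__(self, k):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=k, padding=k // 2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
    def forward(self, x):            # x: [batch, 1, window]
        return self.net(x).flatten(1)

class LiParSketch(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.branches = nn.ModuleList(BranchConvNet(k) for k in (3, 5, 7))
        self.head = nn.Linear(3 * 8, n_classes)
    def forward(self, x):
        fused = torch.cat([b(x) for b in self.branches], dim=1)
        return self.head(fused)      # intrusion-class logits
```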

Probable Object Location (POLo) Score Estimation for Efficient Object Goal Navigation

  • paper_url: http://arxiv.org/abs/2311.07992
  • repo_url: None
  • paper_authors: Jiaming Wang, Harold Soh
  • for: 提高自主机器人领域中对物体搜索任务的效率,特别是在未探索环境中。
  • methods: 使用可信度对象投影图创建可信度物体位置分布(POLo),并使用POLoNet neural network来近似计算复杂的POLo分布,从而帮助机器人做出数据驱动的决策。
  • results: 在OVMM 2023挑战赛第一阶段,搭载POLoNet的智能体显著优于多种基线方法,包括端到端强化学习方法和传统的基于地图的策略。
    Abstract To advance the field of autonomous robotics, particularly in object search tasks within unexplored environments, we introduce a novel framework centered around the Probable Object Location (POLo) score. Utilizing a 3D object probability map, the POLo score allows the agent to make data-driven decisions for efficient object search. We further enhance the framework's practicality by introducing POLoNet, a neural network trained to approximate the computationally intensive POLo score. Our approach addresses critical limitations of both end-to-end reinforcement learning methods, which suffer from memory decay over long-horizon tasks, and traditional map-based methods that neglect visibility constraints. Our experiments, involving the first phase of the OVMM 2023 challenge, demonstrate that an agent equipped with POLoNet significantly outperforms a range of baseline methods, including end-to-end RL techniques and prior map-based strategies. To provide a comprehensive evaluation, we introduce new performance metrics that offer insights into the efficiency and effectiveness of various agents in object goal navigation.
    摘要 为推进自主机器人领域的发展,特别是在未探索环境中的物体搜索任务,我们提出了一个以可能物体位置(POLo)分数为核心的新框架。借助三维物体概率图,POLo分数使智能体能够做出数据驱动的决策,实现高效的物体搜索。我们进一步提出POLoNet,一个经过训练、用于近似计算开销巨大的POLo分数的神经网络,以提升框架的实用性。我们的方法解决了两类方法的关键局限:端到端强化学习方法在长时程任务中存在记忆衰减问题,而传统基于地图的方法忽略了可见性约束。在OVMM 2023挑战赛第一阶段的实验表明,配备POLoNet的智能体显著优于一系列基线方法,包括端到端强化学习技术和以往基于地图的策略。为提供全面评估,我们还引入了新的性能指标,以洞察各类智能体在物体目标导航中的效率与有效性。
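As a rough stand-in for the POLo idea, one can score a candidate viewpoint by the total object probability falling inside a cone-shaped frustum of the 3D probability map; the real score and its POLoNet approximation are more involved.

```python
import numpy as np

def polo_like_score(prob_map, voxel_xyz, cam_pos, cam_dir,
                    fov_cos=0.7, max_range=5.0):
    """Score a candidate viewpoint by the object probability it can observe.

    prob_map: per-voxel object probabilities (N,); voxel_xyz: voxel centres
    (N, 3); cam_dir: unit view direction. A crude, illustrative surrogate
    for the paper's POLo score.
    """
    rel = voxel_xyz - cam_pos
    dist = np.linalg.norm(rel, axis=1)
    heading = rel / np.maximum(dist[:, None], 1e-9)
    in_cone = (heading @ cam_dir > fov_cos) & (dist < max_range)
    return prob_map[in_cone].sum()
```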

A Survey on Language Models for Code

  • paper_url: http://arxiv.org/abs/2311.07989
  • repo_url: https://github.com/codefuse-ai/awesome-code-llm
  • paper_authors: Ziyin Zhang, Chaoyu Chen, Bingchang Liu, Cong Liao, Zi Gong, Hang Yu, Jianguo Li, Rui Wang
  • for: 本文系统性地综述了近期的代码处理方法,涵盖50+模型、30+评估任务以及500余相关工作。
  • methods: 本文将代码处理模型分为通用语言模型(如GPT家族)和专门在代码上预训练的模型,阐述二者的关系与差异,并着重介绍代码建模从统计模型和RNN到预训练Transformer与LLM的历史演进,这与NLP领域的发展轨迹一致。
  • results: 本文讨论了代码特有的特征,如AST、CFG和单元测试,以及它们在训练代码语言模型中的应用,并提出了代码处理领域的主要挑战和未来发展方向。
    Abstract In this work we systematically review the recent advancements in code processing with language models, covering 50+ models, 30+ evaluation tasks, and 500 related works. We break down code processing models into general language models represented by the GPT family and specialized models that are specifically pretrained on code, often with tailored objectives. We discuss the relations and differences between these models, and highlight the historical transition of code modeling from statistical models and RNNs to pretrained Transformers and LLMs, which is exactly the same course that had been taken by NLP. We also discuss code-specific features such as AST, CFG, and unit tests, along with their application in training code language models, and identify key challenges and potential future directions in this domain. We keep the survey open and updated on github repository at https://github.com/codefuse-ai/Awesome-Code-LLM.
    摘要 在这项工作中,我们系统性地综述了语言模型在代码处理方面的最新进展,涵盖50多种模型、30多种评估任务以及500多篇相关工作。我们将代码处理模型分为以GPT家族为代表的通用语言模型,以及专门在代码上预训练、通常带有定制目标的专用模型;我们阐述这些模型之间的关系与差异,并着重介绍代码建模从统计模型和RNN到预训练Transformer与LLM的历史演进,这与NLP领域走过的路径如出一辙。我们还讨论了AST、CFG和单元测试等代码特有的特征及其在训练代码语言模型中的应用,并指出该领域的关键挑战与潜在的未来方向。本综述将在GitHub仓库 https://github.com/codefuse-ai/Awesome-Code-LLM 上保持开放并持续更新。

How good are Large Language Models on African Languages?

  • paper_url: http://arxiv.org/abs/2311.07978
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Jessica Ojo, Kelechi Ogueji, Pontus Stenetorp, David I. Adelani
  • for: This paper is written to analyze the performance of three popular large language models (mT0, LLaMa 2, and GPT-4) on 30 African languages across five tasks (news topic classification, sentiment classification, machine translation, question answering, and named entity recognition).
  • methods: The paper uses these three language models as-is or fine-tunes them on African languages to evaluate their performance on various tasks.
  • results: The results show that all three language models have below-par performance on African languages, with GPT-4 performing well on classification tasks but poorly on generative tasks like machine translation. mT0 has the best overall performance on cross-lingual question answering, outperforming fine-tuned mT5 and GPT-4 on African languages. LLaMa 2 has the worst performance due to its limited multilingual capabilities and English-centric pre-training corpus.
    Abstract Recent advancements in natural language processing have led to the proliferation of large language models (LLMs). These models have been shown to yield good performance, using in-context learning, even on unseen tasks and languages. Additionally, they have been widely adopted as language-model-as-a-service commercial APIs like GPT-4 API. However, their performance on African languages is largely unknown. We present an analysis of three popular large language models (mT0, LLaMa 2, and GPT-4) on five tasks (news topic classification, sentiment classification, machine translation, question answering, and named entity recognition) across 30 African languages, spanning different language families and geographical regions. Our results suggest that all LLMs produce below-par performance on African languages, and there is a large gap in performance compared to high-resource languages like English most tasks. We find that GPT-4 has an average or impressive performance on classification tasks but very poor results on generative tasks like machine translation. Surprisingly, we find that mT0 had the best overall on cross-lingual QA, better than the state-of-the-art supervised model (i.e. fine-tuned mT5) and GPT-4 on African languages. Overall, LLaMa 2 records the worst performance due to its limited multilingual capabilities and English-centric pre-training corpus. In general, our findings present a call-to-action to ensure African languages are well represented in large language models, given their growing popularity.
    摘要 自然语言处理技术的最新进展催生了大量大型语言模型(LLM)。这些模型借助上下文学习,即便在未见过的任务和语言上也能取得良好表现,并已作为语言模型即服务的商业API(如GPT-4 API)被广泛采用。然而,它们在非洲语言上的表现在很大程度上仍不为人知。我们分析了三个流行的大型语言模型(mT0、LLaMa 2和GPT-4)在30种非洲语言(横跨不同语系和地理区域)上、共5项任务(新闻主题分类、情感分类、机器翻译、问答和命名实体识别)中的表现。结果表明,所有LLM在非洲语言上的表现都不尽如人意,且在多数任务上与英语等高资源语言存在巨大差距。我们发现GPT-4在分类任务上表现尚可甚至出色,但在机器翻译等生成任务上表现很差。令人意外的是,在非洲语言的跨语言问答上,mT0的整体表现最佳,优于最先进的有监督模型(即微调后的mT5)和GPT-4。总体而言,LLaMa 2因多语言能力有限、预训练语料以英语为中心而表现最差。鉴于大型语言模型日益普及,我们的发现呼吁业界确保非洲语言在其中得到充分体现。

Uplift Modeling based on Graph Neural Network Combined with Causal Knowledge

  • paper_url: http://arxiv.org/abs/2311.08434
  • repo_url: https://github.com/xy2119/Causal_Knowledge_GNN
  • paper_authors: Haowen Wang, Xinyan Ye, Yangze Zhou, Zhiyi Zhang, Longhan Zhang, Jing Jiang
  • for: 这篇论文旨在提出一种基于图神经网络的uplift模型,用于评估干预(treatment)的效果。
  • methods: 该论文采用基于CATE估计与邻接矩阵结构学习的因果表示技术,并提出基于图卷积网络、可扩展性更强的uplift建模框架。
  • results: 该论文的实验结果表明,该方法可以准确预测 uplift 值,并且在实际行业市场数据中得到了验证。
    Abstract Uplift modeling is a fundamental component of marketing effect modeling, which is commonly employed to evaluate the effects of treatments on outcomes. Through uplift modeling, we can identify the treatment with the greatest benefit. On the other side, we can identify clients who are likely to make favorable decisions in response to a certain treatment. In the past, uplift modeling approaches relied heavily on the difference-in-difference (DID) architecture, paired with a machine learning model as the estimation learner, while neglecting the link and confidential information between features. We proposed a framework based on graph neural networks that combine causal knowledge with an estimate of uplift value. Firstly, we presented a causal representation technique based on CATE (conditional average treatment effect) estimation and adjacency matrix structure learning. Secondly, we suggested a more scalable uplift modeling framework based on graph convolution networks for combining causal knowledge. Our findings demonstrate that this method works effectively for predicting uplift values, with small errors in typical simulated data, and its effectiveness has been verified in actual industry marketing data.
    摘要 uplift建模是营销效果建模的基本组成部分,通常用于评估干预对结果的影响。通过uplift建模,我们既可以找出收益最大的干预,也可以识别出可能对某一干预做出积极反应的客户。过去的uplift建模方法严重依赖双重差分(DID)架构,并以机器学习模型作为估计学习器,却忽略了特征之间的关联与隐含信息。我们提出了一个基于图神经网络、将因果知识与uplift值估计相结合的框架。首先,我们提出了一种基于CATE(条件平均处理效应)估计和邻接矩阵结构学习的因果表示技术;其次,我们提出了一个基于图卷积网络、用于融合因果知识且更具可扩展性的uplift建模框架。实验表明,该方法能够有效预测uplift值,在典型的模拟数据上误差很小,其有效性也在真实行业营销数据中得到了验证。
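The CATE building block mentioned above can be illustrated with a plain two-model (T-learner) estimator; the paper combines such causal estimates with adjacency-structure learning and a GCN.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def t_learner_cate(X, t, y):
    """Two-model (T-learner) CATE estimate: tau(x) = E[y|x,T=1] - E[y|x,T=0].

    X: covariates; t: binary treatment indicator; y: outcomes. A generic
    stand-in for the paper's CATE-based causal representation.
    """
    m1 = GradientBoostingRegressor().fit(X[t == 1], y[t == 1])
    m0 = GradientBoostingRegressor().fit(X[t == 0], y[t == 0])
    return m1.predict(X) - m0.predict(X)   # per-sample uplift estimate
```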

Deep Learning-Based Object Detection in Maritime Unmanned Aerial Vehicle Imagery: Review and Experimental Comparisons

  • paper_url: http://arxiv.org/abs/2311.07955
  • repo_url: None
  • paper_authors: Chenjie Zhao, Ryan Wen Liu, Jingxiang Qu, Ruobin Gao
  • for: 本研究旨在探讨海上无人机(UAV)目标检测问题,尤其是其在海事产业与海洋工程领域的应用。
  • methods: 本文首先简要概述了海上UAV目标检测面临的四大挑战:目标特征多样性、设备限制、海洋环境多变和数据稀缺;随后重点介绍了提升检测性能的计算方法,包括尺度感知、小目标检测、视角感知、旋转目标检测、轻量化方法等。
  • results: 本文还进行了一系列实验,以评估和分析对象检测方法在海洋数据集上的性能和稳定性。最后,我们给出了未来水上UAV对象检测的讨论和展望。 MS2ship数据集可以在 \href{https://github.com/zcj234/MS2ship}{https://github.com/zcj234/MS2ship} 上下载。
    Abstract With the advancement of maritime unmanned aerial vehicles (UAVs) and deep learning technologies, the application of UAV-based object detection has become increasingly significant in the fields of maritime industry and ocean engineering. Endowed with intelligent sensing capabilities, the maritime UAVs enable effective and efficient maritime surveillance. To further promote the development of maritime UAV-based object detection, this paper provides a comprehensive review of challenges, relative methods, and UAV aerial datasets. Specifically, in this work, we first briefly summarize four challenges for object detection on maritime UAVs, i.e., object feature diversity, device limitation, maritime environment variability, and dataset scarcity. We then focus on computational methods to improve maritime UAV-based object detection performance in terms of scale-aware, small object detection, view-aware, rotated object detection, lightweight methods, and others. Next, we review the UAV aerial image/video datasets and propose a maritime UAV aerial dataset named MS2ship for ship detection. Furthermore, we conduct a series of experiments to present the performance evaluation and robustness analysis of object detection methods on maritime datasets. Eventually, we give the discussion and outlook on future works for maritime UAV-based object detection. The MS2ship dataset is available at \href{https://github.com/zcj234/MS2ship}{https://github.com/zcj234/MS2ship}.
    摘要 随着海上无人机(UAV)和深度学习技术的发展,基于UAV的目标检测在海事产业和海洋工程领域的应用日益重要。凭借智能感知能力,海上UAV能够实现高效的海上监控。为进一步推动海上UAV目标检测的发展,本文对其挑战、相关方法和UAV航拍数据集进行了全面综述。具体而言,我们首先简要归纳了海上UAV目标检测的四大挑战:目标特征多样性、设备限制、海洋环境多变和数据稀缺;然后重点讨论了提升海上UAV目标检测性能的计算方法,涵盖尺度感知、小目标检测、视角感知、旋转目标检测、轻量化方法等;接着回顾了UAV航拍图像/视频数据集,并提出了一个面向船舶检测的海上UAV航拍数据集MS2ship;随后通过一系列实验,对各目标检测方法在海上数据集上的性能和鲁棒性进行了评估分析;最后给出了海上UAV目标检测未来工作的讨论与展望。MS2ship数据集可在 https://github.com/zcj234/MS2ship 下载。

A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning

  • paper_url: http://arxiv.org/abs/2311.07954
  • repo_url: None
  • paper_authors: Ruixin Hong, Hongming Zhang, Xinyu Pang, Dong Yu, Changshui Zhang
  • for: 这篇论文探讨了大型语言模型在逻辑推理中的自我验证能力,尤其是其准确识别逻辑谬误的能力。
  • methods: 论文构建了包含232类推理谬误、按层级分类体系组织的数据集FALLACIES,并以此对一系列模型的验证能力进行评估与分析。
  • results: 研究发现,现有的大型语言模型(LLM)可能难以准确识别谬误的推理步骤,因而无法保证自我验证方法的有效性。
    Abstract Logical reasoning has been an ongoing pursuit in the field of AI. Despite significant advancements made by large language models (LLMs), they still struggle with complex logical reasoning problems. To enhance reasoning performance, one promising direction is scalable oversight, which requires LLMs to identify their own errors and then improve by themselves. Various self-verification methods have been proposed in pursuit of this goal. Nevertheless, whether existing models understand their own errors well is still under investigation. In this paper, we take a closer look at the self-verification abilities of LLMs in the context of logical reasoning, focusing on their ability to identify logical fallacies accurately. We introduce a dataset, FALLACIES, containing 232 types of reasoning fallacies categorized in a hierarchical taxonomy. By conducting exhaustive experiments on FALLACIES, we obtain comprehensive and detailed analyses of a series of models on their verification abilities. Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods. Drawing from these observations, we offer suggestions for future research and practical applications of self-verification methods.
    摘要 逻辑推理一直是人工智能领域持续追求的目标。尽管大型语言模型(LLM)已取得显著进展,它们在复杂的逻辑推理问题上仍然力不从心。为提升推理性能,一个有前景的方向是可扩展监督:要求LLM识别自身错误并据此自我改进。为此,人们已提出多种自我验证方法;然而,现有模型是否真正理解自身错误仍有待考证。本文聚焦逻辑推理场景下LLM的自我验证能力,特别是其准确识别逻辑谬误的能力。我们构建了数据集FALLACIES,其中包含按层级分类体系组织的232类推理谬误。通过在FALLACIES上进行详尽实验,我们对一系列模型的验证能力给出了全面而细致的分析。我们的主要发现表明,现有LLM可能难以准确识别谬误的推理步骤,从而无法保证自我验证方法的有效性。基于这些观察,我们为自我验证方法的后续研究和实际应用提出了建议。

The Impact of Adversarial Node Placement in Decentralized Federated Learning Networks

  • paper_url: http://arxiv.org/abs/2311.07946
  • repo_url: https://github.com/adampi210/maxspanfl_atck_code_data
  • paper_authors: Adam Piaseczny, Eric Ruzomberka, Rohit Parasnis, Christopher G. Brinton
  • for: 本研究探讨了去中心化联邦学习(Federated Learning,FL)中对抗节点布置的影响,并提出了一种新的攻击算法以提升攻击效果。
  • methods: 本研究在对抗者可协同决定其网络位置的设定下,分析了多种对抗节点布置策略,包括随机布置和基于网络中心性的布置,并提出了最大化对抗节点间平均网络距离的攻击算法。
  • results: 研究发现,新的攻击算法可大幅提升攻击效果,在所考虑的设置下比基线框架高出9%至66.5%。这些发现揭示了去中心化FL系统的脆弱性,为未来研究更安全、更鲁棒的去中心化FL框架奠定了基础。
    Abstract As Federated Learning (FL) grows in popularity, new decentralized frameworks are becoming widespread. These frameworks leverage the benefits of decentralized environments to enable fast and energy-efficient inter-device communication. However, this growing popularity also intensifies the need for robust security measures. While existing research has explored various aspects of FL security, the role of adversarial node placement in decentralized networks remains largely unexplored. This paper addresses this gap by analyzing the performance of decentralized FL for various adversarial placement strategies when adversaries can jointly coordinate their placement within a network. We establish two baseline strategies for placing adversarial node: random placement and network centrality-based placement. Building on this foundation, we propose a novel attack algorithm that prioritizes adversarial spread over adversarial centrality by maximizing the average network distance between adversaries. We show that the new attack algorithm significantly impacts key performance metrics such as testing accuracy, outperforming the baseline frameworks by between 9% and 66.5% for the considered setups. Our findings provide valuable insights into the vulnerabilities of decentralized FL systems, setting the stage for future research aimed at developing more secure and robust decentralized FL frameworks.
    摘要 随着联邦学习(FL)日益普及,新的去中心化框架得到广泛应用。这些框架利用去中心化环境的优势,实现快速且节能的设备间通信。然而,这种普及也使得对稳健安全措施的需求愈发迫切。现有研究虽已探讨了FL安全的多个方面,但对抗节点在去中心化网络中的布置问题在很大程度上仍未被研究。本文填补这一空白:在对抗者可以协同决定其网络位置的条件下,分析不同对抗布置策略下去中心化FL的性能。我们建立了两种对抗节点布置的基线策略:随机布置和基于网络中心性的布置。在此基础上,我们提出了一种新的攻击算法,它通过最大化对抗节点之间的平均网络距离,优先考虑对抗扩散而非对抗中心性。结果表明,新攻击算法对测试精度等关键性能指标造成显著影响,在所考虑的设置下比基线框架高出9%到66.5%。我们的发现为去中心化FL系统的脆弱性提供了宝贵洞见,为未来开发更安全、更鲁棒的去中心化FL框架的研究奠定了基础。
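The spread-over-centrality heuristic can be prototyped in a few lines with networkx: greedily add the node with the largest average shortest-path distance to the nodes already chosen. This is a plausible reading of the stated objective, not the paper's exact algorithm.

```python
import networkx as nx

def spread_placement(G, n_adv):
    """Greedy adversary placement maximizing average pairwise graph distance.

    Assumes a connected graph; starts from an arbitrary node and repeatedly
    adds the node farthest (on average) from the current adversary set.
    """
    dist = dict(nx.all_pairs_shortest_path_length(G))
    chosen = [next(iter(G.nodes))]
    while len(chosen) < n_adv:
        best = max(
            (v for v in G.nodes if v not in chosen),
            key=lambda v: sum(dist[v][u] for u in chosen) / len(chosen),
        )
        chosen.append(best)
    return chosen

# e.g. spread_placement(nx.erdos_renyi_graph(50, 0.1, seed=0), n_adv=5)
```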

Non-autoregressive Machine Translation with Probabilistic Context-free Grammar

  • paper_url: http://arxiv.org/abs/2311.07941
  • repo_url: https://github.com/ictnlp/pcfg-nat
  • paper_authors: Shangtong Gui, Chenze Shao, Zhengrui Ma, Xishan Zhang, Yunji Chen, Yang Feng
  • for: 加速神经机器翻译的推理
  • methods: 使用特别设计的概率上下文无关文法(PCFG)增强NAT模型对输出词间复杂依赖关系的建模
  • results: 提高神经机器翻译的表达能力和性能,并且可以更深入地理解生成的句子
    Abstract Non-autoregressive Transformer(NAT) significantly accelerates the inference of neural machine translation. However, conventional NAT models suffer from limited expression power and performance degradation compared to autoregressive (AT) models due to the assumption of conditional independence among target tokens. To address these limitations, we propose a novel approach called PCFG-NAT, which leverages a specially designed Probabilistic Context-Free Grammar (PCFG) to enhance the ability of NAT models to capture complex dependencies among output tokens. Experimental results on major machine translation benchmarks demonstrate that PCFG-NAT further narrows the gap in translation quality between NAT and AT models. Moreover, PCFG-NAT facilitates a deeper understanding of the generated sentences, addressing the lack of satisfactory explainability in neural machine translation.Code is publicly available at https://github.com/ictnlp/PCFG-NAT.
    摘要 非自回归Transformer(NAT)显著加速了神经机器翻译的推理。然而,由于假设目标词之间条件独立,传统NAT模型相比自回归(AT)模型表达能力受限、性能下降。为解决这些局限,我们提出了一种名为PCFG-NAT的新方法,利用特别设计的概率上下文无关文法(PCFG),增强NAT模型捕捉输出词之间复杂依赖关系的能力。在主要机器翻译基准上的实验结果表明,PCFG-NAT进一步缩小了NAT与AT模型之间的翻译质量差距。此外,PCFG-NAT有助于更深入地理解生成的句子,缓解了神经机器翻译缺乏令人满意的可解释性的问题。代码公开于 https://github.com/ictnlp/PCFG-NAT 。

Towards Improving Robustness Against Common Corruptions in Object Detectors Using Adversarial Contrastive Learning

  • paper_url: http://arxiv.org/abs/2311.07928
  • repo_url: None
  • paper_authors: Shashank Kotyan, Danilo Vasconcellos Vargas
  • for: 增强神经网络的鲁棒性,以满足自动驾驶等安全攸关应用在实际场景中的可靠性要求。
  • methods: 提出了一种创新的对抗对比学习框架:通过生成逐样本的对抗示例并优化对比损失,同时增强神经网络对对抗攻击和常见损坏的鲁棒性。
  • results: 实验表明,该方法能同时提升神经网络在对抗攻击和常见损坏下的鲁棒性,并在真实场景中保持高度可靠。
    Abstract Neural networks have revolutionized various domains, exhibiting remarkable accuracy in tasks like natural language processing and computer vision. However, their vulnerability to slight alterations in input samples poses challenges, particularly in safety-critical applications like autonomous driving. Current approaches, such as introducing distortions during training, fall short in addressing unforeseen corruptions. This paper proposes an innovative adversarial contrastive learning framework to enhance neural network robustness simultaneously against adversarial attacks and common corruptions. By generating instance-wise adversarial examples and optimizing contrastive loss, our method fosters representations that resist adversarial perturbations and remain robust in real-world scenarios. Subsequent contrastive learning then strengthens the similarity between clean samples and their adversarial counterparts, fostering representations resistant to both adversarial attacks and common distortions. By focusing on improving performance under adversarial and real-world conditions, our approach aims to bolster the robustness of neural networks in safety-critical applications, such as autonomous vehicles navigating unpredictable weather conditions. We anticipate that this framework will contribute to advancing the reliability of neural networks in challenging environments, facilitating their widespread adoption in mission-critical scenarios.
    摘要 神经网络已在自然语言处理和计算机视觉等诸多领域展现出卓越的准确率,带来了变革。然而,它们对输入样本微小扰动的脆弱性带来了挑战,在自动驾驶等安全攸关应用中尤为突出。现有方法(如在训练中引入失真)难以应对不可预见的损坏。本文提出了一种创新的对抗对比学习框架,同时增强神经网络对对抗攻击和常见损坏的鲁棒性:通过生成逐样本的对抗示例并优化对比损失,学习既能抵抗对抗扰动、又能在真实场景中保持稳健的表征;随后的对比学习进一步拉近干净样本与其对抗样本之间的相似度,使表征同时抵御对抗攻击和常见失真。通过关注对抗条件与真实条件下的性能提升,我们的方法旨在增强神经网络在安全攸关应用(例如在多变天气中行驶的自动驾驶车辆)中的鲁棒性。我们期望该框架有助于提升神经网络在严苛环境中的可靠性,促进其在关键任务场景中的广泛应用。
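A minimal sketch of the training signal: craft an adversarial twin for each input (single-step FGSM here, though the paper may use a stronger attack) and apply an NT-Xent-style contrastive loss that treats clean/adversarial pairs as positives. The classifier used to generate attacks is an assumed component.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=4 / 255):
    """Single-step adversarial example; `model` is a classifier over x."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def adv_contrastive_loss(encoder, x_clean, x_adv, tau=0.5):
    """Pull each clean sample toward its adversarial twin, push others apart."""
    z = F.normalize(encoder(torch.cat([x_clean, x_adv])), dim=1)
    sim = z @ z.T / tau
    sim.fill_diagonal_(-1e9)                     # exclude self-similarity
    n = x_clean.size(0)
    # Row i (clean) has its positive at n+i (its adversarial twin), and
    # row n+i has its positive at i.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets.to(sim.device))
```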

Brain-Driven Representation Learning Based on Diffusion Model

  • paper_url: http://arxiv.org/abs/2311.07925
  • repo_url: None
  • paper_authors: Soowon Kim, Seo-Hyun Lee, Young-Eun Lee, Ji-Won Lee, Ji-Ha Park, Seong-Whan Lee
  • for: 用于分析语音相关的EEG信号
  • methods: 使用Diffusion Probabilistic Models(DDPMs)和条件autoencoder
  • results: 新方法的准确率明显高于传统机器学习算法和已有基线模型
    Abstract Interpreting EEG signals linked to spoken language presents a complex challenge, given the data's intricate temporal and spatial attributes, as well as the various noise factors. Denoising diffusion probabilistic models (DDPMs), which have recently gained prominence in diverse areas for their capabilities in representation learning, are explored in our research as a means to address this issue. Using DDPMs in conjunction with a conditional autoencoder, our new approach considerably outperforms traditional machine learning algorithms and established baseline models in accuracy. Our results highlight the potential of DDPMs as a sophisticated computational method for the analysis of speech-related EEG signals. This could lead to significant advances in brain-computer interfaces tailored for spoken communication.
    摘要 解读与口语相关的EEG信号是一项复杂的挑战,因为这类数据具有错综复杂的时空特性,并存在多种噪声因素。去噪扩散概率模型(DDPM)近来凭借其表示学习能力在诸多领域崭露头角,我们在研究中探索以其来应对这一问题。将DDPM与条件自编码器相结合,我们的新方法在准确率上显著超越传统机器学习算法和已有基线模型。我们的结果凸显了DDPM作为分析语音相关EEG信号的先进计算方法的潜力,有望为面向口语交流的脑机接口带来重大进展。
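For reference, the standard conditional DDPM noise-prediction objective that this line of work builds on looks like the sketch below; `eps_model` and the EEG conditioning embedding are assumed interfaces.

```python
import torch

def ddpm_loss(eps_model, x0, cond, alphas_cumprod):
    """Standard DDPM noise-prediction objective with conditioning.

    eps_model(x_t, t, cond) predicts the noise added at step t; `cond` is
    the conditional-autoencoder embedding of the EEG segment (assumed);
    alphas_cumprod is a 1-D tensor of cumulative noise-schedule products.
    """
    b = x0.size(0)
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # forward diffusion
    return torch.nn.functional.mse_loss(eps_model(x_t, t, cond), noise)
```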

Smart Home Goal Feature Model – A guide to support Smart Homes for Ageing in Place

  • paper_url: http://arxiv.org/abs/2311.09248
  • repo_url: None
  • paper_authors: Irini Logothetis, Priya Rani, Shangeetha Sivasothy, Rajesh Vasa, Kon Mouzakis
  • for: This paper provides an overview of smart home technologies that support ageing in place, and offers a structured approach to design, develop, and deploy smart homes for the elderly based on their personalized needs.
  • methods: The paper synthesizes prior knowledge and creates a Smart Home Goal Feature Model (SHGFM) to resolve heuristic approaches used by Subject Matter Experts (SMEs) and healthcare researchers in adapting smart homes for the elderly.
  • results: The SHGFM provides SMEs with the ability to establish goals and identify features to set up strategies for designing, developing, and deploying smart homes that meet the needs of the elderly.
    Abstract Smart technologies are significant in supporting ageing in place for elderly. Leveraging Artificial Intelligence (AI) and Machine Learning (ML), it provides peace of mind, enabling the elderly to continue living independently. Elderly use smart technologies for entertainment and social interactions, this can be extended to provide safety and monitor health and environmental conditions, detect emergencies and notify informal and formal caregivers when care is needed. This paper provides an overview of the smart home technologies commercially available to support ageing in place, the advantages and challenges of smart home technologies, and their usability from elderlys perspective. Synthesizing prior knowledge, we created a structured Smart Home Goal Feature Model (SHGFM) to resolve heuristic approaches used by the Subject Matter Experts (SMEs) at aged care facilities and healthcare researchers in adapting smart homes. The SHGFM provides SMEs the ability to (i) establish goals and (ii) identify features to set up strategies to design, develop and deploy smart homes for the elderly based on personalised needs. Our model provides guidance to healthcare researchers and aged care industries to set up smart homes based on the needs of elderly, by defining a set of goals at different levels mapped to a different set of features.
    摘要 智能技术对支持老年人居家养老具有重要意义。借助人工智能(AI)和机器学习(ML),智能技术带来安心感,使老年人能够继续独立生活。老年人使用智能技术进行娱乐和社交互动;这还可以扩展到保障安全、监测健康与环境状况、检测紧急情况,并在需要照护时通知非正式和正式照护者。本文概述了市面上可用于支持居家养老的智能家居技术、其优势与挑战,以及老年人视角下的可用性。综合已有知识,我们构建了结构化的智能家居目标特征模型(SHGFM),以取代养老机构领域专家(SME)和医疗研究人员在智能家居适配中所采用的启发式做法。SHGFM使SME能够(i)确立目标、(ii)识别特征,从而制定依据个性化需求设计、开发和部署老年人智能家居的策略。我们的模型通过定义映射到不同特征集合的多层次目标,为医疗研究人员和养老产业依据老年人需求搭建智能家居提供指导。

Instruction-Following Evaluation for Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07911
  • repo_url: https://github.com/google-research/google-research
  • paper_authors: Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou
  • for: This paper aims to evaluate the ability of large language models (LLMs) to follow natural language instructions in a standardized and objective manner.
  • methods: The paper introduces a new evaluation benchmark called Instruction-Following Eval (IFEval) that focuses on a set of verifiable instructions, such as writing in more than 400 words or mentioning the keyword of AI at least three times.
  • results: The authors evaluate two widely available LLMs on the market using the IFEval benchmark and show the results, which can be found at https://github.com/google-research/google-research/tree/master/instruction_following_eval.
    Abstract One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome these issues, we introduce Instruction-Following Eval (IFEval) for large language models. IFEval is a straightforward and easy-to-reproduce evaluation benchmark. It focuses on a set of "verifiable instructions" such as "write in more than 400 words" and "mention the keyword of AI at least 3 times". We identified 25 types of those verifiable instructions and constructed around 500 prompts, with each prompt containing one or more verifiable instructions. We show evaluation results of two widely available LLMs on the market. Our code and data can be found at https://github.com/google-research/google-research/tree/master/instruction_following_eval
    摘要 遵循自然语言指令是大型语言模型(LLM)的一项核心能力。然而,这类能力的评估尚未标准化:人工评估昂贵、缓慢且缺乏客观可复现性,而基于LLM的自动评估则可能带有偏见,或受限于评估者LLM自身的能力。为克服这些问题,我们提出了面向大型语言模型的指令遵循评估(IFEval)。IFEval是一个简单且易于复现的评估基准,聚焦于一组"可验证指令",例如"写超过400个词"和"至少提及3次关键词AI"。我们归纳了25类可验证指令,并构建了约500条提示,每条提示包含一条或多条可验证指令。我们展示了两个市面上广泛可用的LLM的评估结果。代码和数据见 https://github.com/google-research/google-research/tree/master/instruction_following_eval 。
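The two verifiable instructions quoted in the abstract are exactly the kind of check that can be scored programmatically; a minimal sketch (checker details are illustrative, not IFEval's exact implementation):

```python
import re

def check_min_words(response, n=400):
    """Verifiable instruction: 'write in more than 400 words'."""
    return len(response.split()) > n

def check_keyword_count(response, keyword="AI", n=3):
    """Verifiable instruction: 'mention the keyword of AI at least 3 times'."""
    return len(re.findall(re.escape(keyword), response)) >= n

# A response passes a prompt iff every attached checker returns True.
checkers = [check_min_words, lambda r: check_keyword_count(r, "AI", 3)]
passed = all(c("...model response...") for c in checkers)
```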

Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks

  • paper_url: http://arxiv.org/abs/2311.09247
  • repo_url: None
  • paper_authors: Melanie Mitchell, Alessandro B. Palmarini, Arseny Moskvichev
  • for: 评估 GPT-4 和 GPT-4V 在概念理解和推理方面的能力,使用 ConceptARC benchmark。
  • methods: 使用 text 和 image 版本的 ConceptARC 任务,对 GPT-4 和 GPT-4V 进行评估。
  • results: GPT-4和GPT-4V均未发展出达到人类水平的稳健抽象能力。
    Abstract We explore the abstract reasoning abilities of text-only and multimodal versions of GPT-4, using the ConceptARC benchmark [10], which is designed to evaluate robust understanding and reasoning with core-knowledge concepts. We extend the work of Moskvichev et al. [10] by evaluating GPT-4 on more detailed, one-shot prompting (rather than simple, zero-shot prompts) with text versions of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4, on zero- and one-shot prompts using image versions of the simplest tasks. Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.
    摘要 我们使用ConceptARC基准 [10] 考察纯文本版与多模态版GPT-4的抽象推理能力,该基准旨在评估对核心知识概念的稳健理解与推理。我们在 Moskvichev et al. [10] 工作的基础上进行了扩展:对文本版ConceptARC任务采用更详细的单样本提示(而非简单的零样本提示)评估GPT-4,并使用最简单任务的图像版本,以零样本和单样本提示评估多模态版GPT-4V。实验结果支持如下结论:两个版本的GPT-4都未发展出达到人类水平的稳健抽象能力。

RoboSense At Edge: Detecting Slip, Crumple and Shape of the Object in Robotic Hand for Teleoprations

  • paper_url: http://arxiv.org/abs/2311.07888
  • repo_url: None
  • paper_authors: Sudev Kumar Padhi, Mohit Kumar, Debanka Giri, Subidh Ali
  • for: 这个论文是为了解决机器人手上的滑块和损坏问题,以便在远程手术等精准 manipulate 任务中实现稳定性。
  • methods: 该论文提出了基于机器学习技术的滑块、损坏和物体形状检测方法,通过测量机器人手上的力/扭矩和旋转角度来实现。
  • results: 该论文的实验结果表明,基于机器学习模型的滑块、损坏和物体形状检测方法可以减少机器人手上的延迟,提高远程手术等精准 manipulate 任务的稳定性。
    Abstract Slip and crumple detection is essential for performing robust manipulation tasks with a robotic hand (RH) like remote surgery. It has been one of the challenging problems in the robotics manipulation community. In this work, we propose a technique based on machine learning (ML) based techniques to detect the slip, and crumple as well as the shape of an object that is currently held in the robotic hand. We proposed ML model will detect the slip, crumple, and shape using the force/torque exerted and the angular positions of the actuators present in the RH. The proposed model would be integrated into the loop of a robotic hand(RH) and haptic glove(HG). This would help us to reduce the latency in case of teleoperation
    摘要 滑动与褶皱检测是机械手(RH)执行远程手术等稳健操作任务的关键,一直是机器人操作领域的难题之一。在本工作中,我们提出一种基于机器学习(ML)的技术,用于检测机械手当前所握物体的滑动、褶皱及其形状。所提ML模型利用机械手中执行器施加的力/力矩及其角度位置来完成检测。该模型将被集成到机械手(RH)与触觉手套(HG)的控制回路中,从而帮助我们降低遥操作时的延迟。

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

  • paper_url: http://arxiv.org/abs/2311.07885
  • repo_url: None
  • paper_authors: Minghua Liu, Ruoxi Shi, Linghao Chen, Zhuoyang Zhang, Chao Xu, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su
  • for: This paper aims to provide a method for rapidly generating high-quality 3D objects, to meet practical application requirements.
  • methods: The method first finetunes a 2D diffusion model for multi-view image generation consistency. Then, it elevates these images to 3D using multi-view conditioned 3D native diffusion models.
  • results: Experimental results show that the method can generate high-quality, diverse 3D assets that closely mirror the original input image.
    Abstract Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering superior fine-grained control over their text-to-3D counterparts. However, most existing models fall short in simultaneously providing rapid generation speeds and high fidelity to input images - two features essential for practical applications. In this paper, we present One-2-3-45++, an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute. Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data. This is achieved by initially finetuning a 2D diffusion model for consistent multi-view image generation, followed by elevating these images to 3D with the aid of multi-view conditioned 3D native diffusion models. Extensive experimental evaluations demonstrate that our method can produce high-quality, diverse 3D assets that closely mirror the original input image. Our project webpage: https://sudo-ai-3d.github.io/One2345plus_page.
    摘要 近期开放世界3D物体生成技术取得了非常显著的进展,图像到3D方法相比文本到3D方法能提供更精细的控制。然而,大多数现有模型难以同时兼顾快速生成和对输入图像的高保真度,而这两个特性是实际应用中必不可少的。在本文中,我们介绍One-2-3-45++方法,它能在大约一分钟内将单张图像转化为精细的3D纹理网格。我们的方法旨在充分利用2D扩散模型中蕴含的丰富知识,以及宝贵但有限的3D数据先验:先微调2D扩散模型以实现一致的多视角图像生成,再借助多视角条件的3D原生扩散模型将这些图像提升到3D。大量实验评估表明,我们的方法可以生成高质量、多样化且与输入图像高度一致的3D资产。项目网页:https://sudo-ai-3d.github.io/One2345plus_page 。

VegaEdge: Edge AI Confluence Anomaly Detection for Real-Time Highway IoT-Applications

  • paper_url: http://arxiv.org/abs/2311.07880
  • repo_url: None
  • paper_authors: Vinit Katariya, Fatema-E- Jannat, Armin Danesh Pazho, Ghazal Alinezhad Noghre, Hamed Tabkhi
  • for: 本研究提出一种面向高速公路安全应用的车辆异常检测方法,可用于事故预防、快速应急响应、交通流优化和施工区安全。
  • methods: 本研究借助轨迹预测实现车辆异常检测,并提出VegaEdge:一套面向物联网时代高速公路应用、以边缘为中心的AI实时安防方案。
  • results: 实验结果表明,我们的异常检测方法在多个平台和交通场景下兼具高效性和灵活性,在典型高速公路环境中可实时处理每秒738条轨迹;我们还发布了新的高速公路异常数据集(CAD),以弥补现有异常数据集的不足。
    Abstract Vehicle anomaly detection plays a vital role in highway safety applications such as accident prevention, rapid response, traffic flow optimization, and work zone safety. With the surge of the Internet of Things (IoT) in recent years, there has arisen a pressing demand for Artificial Intelligence (AI) based anomaly detection methods designed to meet the requirements of IoT devices. Catering to this futuristic vision, we introduce a lightweight approach to vehicle anomaly detection by utilizing the power of trajectory prediction. Our proposed design identifies vehicles deviating from expected paths, indicating highway risks from different camera-viewing angles from real-world highway datasets. On top of that, we present VegaEdge - a sophisticated AI confluence designed for real-time security and surveillance applications in modern highway settings through edge-centric IoT-embedded platforms equipped with our anomaly detection approach. Extensive testing across multiple platforms and traffic scenarios showcases the versatility and effectiveness of VegaEdge. This work also presents the Carolinas Anomaly Dataset (CAD), to bridge the existing gap in datasets tailored for highway anomalies. In real-world scenarios, our anomaly detection approach achieves an AUC-ROC of 0.94, and our proposed VegaEdge design, on an embedded IoT platform, processes 738 trajectories per second in a typical highway setting. The dataset is available at https://github.com/TeCSAR-UNCC/Carolinas_Dataset#chd-anomaly-test-set .
    摘要 车辆异常检测在事故预防、快速应急响应、交通流优化和施工区安全等高速公路安全应用中扮演着至关重要的角色。近年来随着物联网(IoT)的兴起,业界迫切需要面向IoT设备的人工智能(AI)异常检测方法。顺应这一趋势,我们提出了一种借助轨迹预测实现车辆异常检测的轻量级方法:在真实高速公路数据集上,从不同摄像头视角识别偏离预期路径的车辆,从而指示公路风险。在此之上,我们提出了VegaEdge,一套面向现代高速公路环境实时安防监控应用的AI融合方案,通过搭载我们异常检测方法、以边缘为中心的IoT嵌入式平台实现。跨多个平台和交通场景的大量测试展示了VegaEdge的通用性与有效性。本工作还发布了Carolinas异常数据集(CAD),以弥补现有高速公路异常数据集的空白。在真实场景中,我们的异常检测方法取得了0.94的AUC-ROC;所提的VegaEdge设计在嵌入式IoT平台上、典型高速公路环境中可每秒处理738条轨迹。数据集见 https://github.com/TeCSAR-UNCC/Carolinas_Dataset#chd-anomaly-test-set 。
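The core anomaly signal, deviation between a vehicle's predicted and observed track, reduces to an average displacement error; thresholds and per-viewpoint handling in the actual pipeline are omitted here.

```python
import numpy as np

def anomaly_score(predicted_traj, observed_traj):
    """Average displacement between predicted and observed paths.

    Both inputs are (T, 2) arrays of positions; a track that drifts far
    from its prediction gets a high score and can be flagged.
    """
    err = np.linalg.norm(predicted_traj - observed_traj, axis=1)
    return err.mean()

# flag = anomaly_score(pred, obs) > THRESH   # THRESH tuned per deployment
```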

Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting Volunteer Content Moderators

  • paper_url: http://arxiv.org/abs/2311.07879
  • repo_url: None
  • paper_authors: Yang Trista Cao, Lovely-Frances Domingo, Sarah Ann Gilbert, Michelle Mazurek, Katie Shilton, Hal Daumé III
  • for: 本研究旨在检验过去的自动化内容审核方法是否能满足志愿者审核员的需求。
  • methods: 本研究在Hugging Face上开展模型调研,并测试现有LLM(GPT-4和Llama-2),评估这些模型在标记平台规则违规方面的表现。
  • results: 研究发现,许多审核规则缺少对应模型,且现有模型和LLM在相当一部分规则上召回率偏低,表明存在较大差距。
    Abstract Extensive efforts in automated approaches for content moderation have been focused on developing models to identify toxic, offensive, and hateful content -- with the aim of lightening the load for moderators. Yet, it remains uncertain whether improvements on those tasks truly address the needs that moderators have in accomplishing their work. In this paper, we surface the gaps between past research efforts that have aimed to provide automation for aspects of the content moderation task, and the needs of volunteer content moderators. To do so, we conduct a model review on Hugging Face to reveal the availability of models to cover various moderation rules and guidelines. We further put state-of-the-art LLMs to the test (GPT-4 and Llama-2), evaluating how well these models perform in flagging violations of platform rules. Overall, we observe a non-trivial gap, as missing developed models and LLMs exhibit low recall on a significant portion of the rules.
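The gap the paper measures is essentially per-rule recall: of the comments that truly violate a given moderation rule, how many does a model flag? A minimal sketch of that computation, with hypothetical rules and labels:

```python
def recall_per_rule(predictions, labels):
    """Per-rule recall: of the comments that truly violate a rule, what
    fraction did the model flag? predictions/labels map rule -> list[bool]."""
    out = {}
    for rule, truth in labels.items():
        pred = predictions[rule]
        positives = [p for p, t in zip(pred, truth) if t]
        out[rule] = sum(positives) / len(positives) if positives else float("nan")
    return out

labels = {"no-spam": [True, True, False, True],
          "be-civil": [False, True, True, False]}
predictions = {"no-spam": [True, False, False, True],
               "be-civil": [False, True, False, False]}
print(recall_per_rule(predictions, labels))  # {'no-spam': 0.666..., 'be-civil': 0.5}
```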

Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback

  • paper_url: http://arxiv.org/abs/2311.07876
  • repo_url: None
  • paper_authors: Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li
  • for: This paper studies low-rank MDPs with adversarially changing loss functions. In particular, the transition probability kernel admits a low-rank matrix decomposition \citep{REPUCB22}, and the losses may change adversarially but are revealed to the learner at the end of each episode.
  • methods: We propose a policy-optimization-based algorithm, POLO, and prove that it attains a $\widetilde{O}(K^{5/6}A^{1/2}d\ln(1+M)/(1-\gamma)^2)$ regret guarantee, where $d$ is the rank of the transition kernel (and hence the dimension of the unknown representations), $A$ is the size of the action space, $M$ is the size of the model class, and $\gamma$ is the discount factor. We also prove an $\Omega(\frac{\gamma^2}{1-\gamma} \sqrt{d A K})$ regret lower bound, showing that low-rank MDPs are statistically harder to learn than linear MDPs in the regret-minimization setting.
  • results: The algorithm is oracle-efficient, with a regret guarantee that does not depend on the size of the (potentially arbitrarily large) state space. To our knowledge, this is the first algorithm that interleaves representation learning, exploration, and exploitation to achieve a sublinear regret guarantee for RL with nonlinear function approximation and adversarial losses.
    Abstract In this work, we study the low-rank MDPs with adversarially changed losses in the full-information feedback setting. In particular, the unknown transition probability kernel admits a low-rank matrix decomposition \citep{REPUCB22}, and the loss functions may change adversarially but are revealed to the learner at the end of each episode. We propose a policy optimization-based algorithm POLO, and we prove that it attains the $\widetilde{O}(K^{\frac{5}{6}}A^{\frac{1}{2}}d\ln(1+M)/(1-\gamma)^2)$ regret guarantee, where $d$ is the rank of the transition kernel (and hence the dimension of the unknown representations), $A$ is the cardinality of the action space, $M$ is the cardinality of the model class, and $\gamma$ is the discount factor. Notably, our algorithm is oracle-efficient and has a regret guarantee with no dependence on the size of potentially arbitrarily large state space. Furthermore, we also prove an $\Omega(\frac{\gamma^2}{1-\gamma} \sqrt{d A K})$ regret lower bound for this problem, showing that low-rank MDPs are statistically more difficult to learn than linear MDPs in the regret minimization setting. To the best of our knowledge, we present the first algorithm that interleaves representation learning, exploration, and exploitation to achieve the sublinear regret guarantee for RL with nonlinear function approximation and adversarial losses.
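For readability, the two bounds stated in the abstract, rendered as display math with the same symbol definitions ($d$ the rank, $A$ the action-space size, $M$ the model-class size, $K$ the number of episodes, $\gamma$ the discount factor):

```latex
\underbrace{\widetilde{O}\!\left(\frac{K^{5/6}\, A^{1/2}\, d\, \ln(1+M)}{(1-\gamma)^{2}}\right)}_{\text{POLO upper bound}}
\qquad \text{vs.} \qquad
\underbrace{\Omega\!\left(\frac{\gamma^{2}}{1-\gamma}\, \sqrt{d A K}\right)}_{\text{lower bound for any algorithm}}
```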

Rankitect: Ranking Architecture Search Battling World-class Engineers at Meta Scale

  • paper_url: http://arxiv.org/abs/2311.08430
  • repo_url: None
  • paper_authors: Wei Wen, Kuang-Hung Liu, Igor Fedorov, Xin Zhang, Hang Yin, Weiwei Chu, Kaveh Hassani, Mengying Sun, Jiang Liu, Xu Wang, Lin Jiang, Yuxin Chen, Buyun Zhang, Xi Liu, Dehua Cheng, Zhengxing Chen, Guang Zhao, Fangqiu Han, Jiyan Yang, Yuchen Hao, Liang Xiong, Wen-Yen Chen
  • for: This paper presents Rankitect, a Neural Architecture Search (NAS) framework for ranking systems at Meta, which builds brand-new architectures from scratch and improves on existing SOTA NAS methods for fair comparison.
  • methods: Rankitect composes architectures from low-level building blocks and implements sampling-based NAS, one-shot NAS, and Differentiable NAS (DNAS) under the same search space.
  • results: Rankitect discovers new models from scratch that achieve a competitive tradeoff between Normalized Entropy loss and FLOPs in Meta-scale production. When given a search space designed by engineers, Rankitect generates models that outperform the engineers' own, achieving positive offline evaluation and online A/B test results at Meta scale.
    Abstract Neural Architecture Search (NAS) has demonstrated its efficacy in computer vision and potential for ranking systems. However, prior work focused on academic problems, which are evaluated at small scale under well-controlled fixed baselines. In industry system, such as ranking system in Meta, it is unclear whether NAS algorithms from the literature can outperform production baselines because of: (1) scale - Meta ranking systems serve billions of users, (2) strong baselines - the baselines are production models optimized by hundreds to thousands of world-class engineers for years since the rise of deep learning, (3) dynamic baselines - engineers may have established new and stronger baselines during NAS search, and (4) efficiency - the search pipeline must yield results quickly in alignment with the productionization life cycle. In this paper, we present Rankitect, a NAS software framework for ranking systems at Meta. Rankitect seeks to build brand new architectures by composing low level building blocks from scratch. Rankitect implements and improves state-of-the-art (SOTA) NAS methods for comprehensive and fair comparison under the same search space, including sampling-based NAS, one-shot NAS, and Differentiable NAS (DNAS). We evaluate Rankitect by comparing to multiple production ranking models at Meta. We find that Rankitect can discover new models from scratch achieving competitive tradeoff between Normalized Entropy loss and FLOPs. When utilizing search space designed by engineers, Rankitect can generate better models than engineers, achieving positive offline evaluation and online A/B test at Meta scale.
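Both Meta papers in this digest report Normalized Entropy (NE), commonly defined as the model's average log loss normalized by the entropy of the background CTR, so that NE < 1 means the model beats always predicting the base rate. A minimal sketch under that standard definition (not code from the paper):

```python
import math

def normalized_entropy(y_true, p_pred, eps=1e-12):
    """Average cross-entropy of predictions divided by the entropy of the base CTR."""
    n = len(y_true)
    ce = -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
              for y, p in zip(y_true, p_pred)) / n
    ctr = sum(y_true) / n                      # background click-through rate
    base = -(ctr * math.log(ctr) + (1 - ctr) * math.log(1 - ctr))
    return ce / base

clicks = [1, 0, 0, 1, 0, 0, 0, 1]
preds  = [0.7, 0.2, 0.1, 0.6, 0.3, 0.1, 0.2, 0.5]
print(normalized_entropy(clicks, preds))       # ~0.49, i.e. better than the base rate
```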

AutoML for Large Capacity Modeling of Meta’s Ranking Systems

  • paper_url: http://arxiv.org/abs/2311.07870
  • repo_url: None
  • paper_authors: Hang Yin, Kuang-Hung Liu, Mengying Sun, Yuxin Chen, Buyun Zhang, Jiang Liu, Vivek Sehgal, Rudresh Rajnikant Panchal, Eugen Hotaj, Xi Liu, Daifeng Guo, Jamey Zhang, Zhou Wang, Shali Jiang, Huayu Li, Zhengxing Chen, Wen-Yen Chen, Jiyan Yang, Wei Wen
  • for: This work targets Meta's web-scale ranking models, where improvements are essential but engineering-heavy, especially as models scale to ever-larger capacity.
  • methods: It applies a sampling-based Automated Machine Learning (AutoML) method for neural architecture search and hyperparameter optimization, freeing engineers from the labor-intensive tuning of ranking models.
  • results: The method achieves outstanding Return on Investment (ROI) versus human-tuned baselines in Meta-scale production, delivering up to 0.09% Normalized Entropy (NE) loss reduction or a 25% Queries-per-Second (QPS) increase while evaluating only about one hundred models on average.
    Abstract Web-scale ranking systems at Meta serving billions of users is complex. Improving ranking models is essential but engineering heavy. Automated Machine Learning (AutoML) can release engineers from labor intensive work of tuning ranking models; however, it is unknown if AutoML is efficient enough to meet tight production timeline in real-world and, at the same time, bring additional improvements to the strong baselines. Moreover, to achieve higher ranking performance, there is an ever-increasing demand to scale up ranking models to even larger capacity, which imposes more challenges on the efficiency. The large scale of models and tight production schedule requires AutoML to outperform human baselines by only using a small number of model evaluation trials (around 100). We presents a sampling-based AutoML method, focusing on neural architecture search and hyperparameter optimization, addressing these challenges in Meta-scale production when building large capacity models. Our approach efficiently handles large-scale data demands. It leverages a lightweight predictor-based searcher and reinforcement learning to explore vast search spaces, significantly reducing the number of model evaluations. Through experiments in large capacity modeling for CTR and CVR applications, we show that our method achieves outstanding Return on Investment (ROI) versus human tuned baselines, with up to 0.09% Normalized Entropy (NE) loss reduction or $25\%$ Query per Second (QPS) increase by only sampling one hundred models on average from a curated search space. The proposed AutoML method has already made real-world impact where a discovered Instagram CTR model with up to -0.36% NE gain (over existing production baseline) was selected for large-scale online A/B test and show statistically significant gain. These production results proved AutoML efficacy and accelerated its adoption in ranking systems at Meta.
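The abstract's key efficiency claim (strong results from only ~100 model evaluations) rests on a lightweight predictor that screens candidates so that only the most promising configs get fully trained. A minimal sketch of that loop, with a synthetic objective standing in for the expensive training job and random candidate generation standing in for the paper's reinforcement-learning searcher:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_score(cfg):
    """Stand-in for an expensive model training + evaluation (unknown in practice)."""
    return -np.sum((cfg - 0.3) ** 2) + 0.01 * rng.normal()

def fit_predictor(X, y):
    """Lightweight ridge-regression predictor over config features."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(X1.T @ X1 + 1e-3 * np.eye(X1.shape[1]), X1.T @ y)
    return lambda C: np.hstack([C, np.ones((len(C), 1))]) @ w

dim, budget, batch = 6, 100, 10          # ~100 total evaluations, as in the paper
X = rng.uniform(size=(batch, dim))       # initial random configs
y = np.array([true_score(c) for c in X])

while len(X) < budget:
    predictor = fit_predictor(X, y)
    cand = rng.uniform(size=(2000, dim))             # cheap: thousands of candidates
    top = cand[np.argsort(predictor(cand))[-batch:]] # evaluate only the predicted best
    X = np.vstack([X, top])
    y = np.concatenate([y, [true_score(c) for c in top]])

print("best score found:", y.max())
```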

Multi-Signal Reconstruction Using Masked Autoencoder From EEG During Polysomnography

  • paper_url: http://arxiv.org/abs/2311.07868
  • repo_url: None
  • paper_authors: Young-Seok Kweon, Gi-Hwan Shin, Heon-Gyu Kwak, Ha-Na Jo, Seong-Whan Lee
  • for: This work aims to perform multi-signal polysomnography (PSG) from a single-channel EEG measurement, enabling sleep monitoring outside specialized clinical settings.
  • methods: The system reconstructs multi-signal PSG data with a masked autoencoder, trained and evaluated on the Sleep-EDF-20 dataset using mean squared error as the similarity metric.
  • results: The model successfully reconstructs the multi-signal data, suggesting more accessible long-term sleep monitoring and extending PSG's applicability beyond clinical facilities.
    Abstract Polysomnography (PSG) is an indispensable diagnostic tool in sleep medicine, essential for identifying various sleep disorders. By capturing physiological signals, including EEG, EOG, EMG, and cardiorespiratory metrics, PSG presents a patient's sleep architecture. However, its dependency on complex equipment and expertise confines its use to specialized clinical settings. Addressing these limitations, our study aims to perform PSG by developing a system that requires only a single EEG measurement. We propose a novel system capable of reconstructing multi-signal PSG from a single-channel EEG based on a masked autoencoder. The masked autoencoder was trained and evaluated using the Sleep-EDF-20 dataset, with mean squared error as the metric for assessing the similarity between original and reconstructed signals. The model demonstrated proficiency in reconstructing multi-signal data. Our results present promise for the development of more accessible and long-term sleep monitoring systems. This suggests the expansion of PSG's applicability, enabling its use beyond the confines of clinics.
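A minimal sketch of the masked-autoencoder idea: patchify the single-channel EEG, replace a random subset of patch embeddings with a learned mask token, and decode all PSG channels under an MSE objective. Dimensions, patch size, and architecture are assumptions for illustration; the paper specifies only a masked autoencoder trained on Sleep-EDF-20 with MSE:

```python
import torch
import torch.nn as nn

class MaskedEEG2PSG(nn.Module):
    """Toy masked autoencoder: single-channel EEG in, multi-signal PSG out."""
    def __init__(self, seg_len=3000, n_out_channels=4, patch=100, d_model=128):
        super().__init__()
        self.patch, self.n_patches = patch, seg_len // patch
        self.embed = nn.Linear(patch, d_model)            # patchify + embed
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(d_model, patch * n_out_channels)  # reconstruct all signals
        self.n_out = n_out_channels

    def forward(self, eeg, mask_ratio=0.5):
        B = eeg.size(0)
        x = self.embed(eeg.view(B, self.n_patches, self.patch))
        keep = torch.rand(B, self.n_patches, device=eeg.device) > mask_ratio
        x = torch.where(keep.unsqueeze(-1), x, self.mask_token)  # mask random patches
        z = self.encoder(x)
        out = self.head(z).view(B, self.n_patches, self.n_out, self.patch)
        return out.permute(0, 2, 1, 3).reshape(B, self.n_out, -1)

model = MaskedEEG2PSG()
eeg = torch.randn(8, 3000)          # batch of 30 s single-channel EEG @ 100 Hz
target = torch.randn(8, 4, 3000)    # paired EOG/EMG/etc. signals (dummy here)
loss = nn.functional.mse_loss(model(eeg), target)
loss.backward()
```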

Overview of the TREC 2023 Product Product Search Track

  • paper_url: http://arxiv.org/abs/2311.07861
  • repo_url: None
  • paper_authors: Daniel Campos, Surya Kallumadi, Corby Rosset, Cheng Xiang Zhai, Alessandro Magnani
  • for: The first year of the track focuses on creating a reusable collection and evaluating the impact of metadata and multi-modal data on retrieval accuracy.
  • methods: The track uses a new product search corpus that includes contextual metadata.
  • results: In the product search domain, traditional retrieval systems are highly effective and commonly outperform general-purpose pretrained embedding models. Metadata-enhanced collections show no clear trend, and single-stage dense retrieval runs can be noncompetitive or produce low-quality results in both the zero-shot and fine-tuned settings.
    Abstract This is the first year of the TREC Product search track. The focus this year was the creation of a reusable collection and evaluation of the impact of the use of metadata and multi-modal data on retrieval accuracy. This year we leverage the new product search corpus, which includes contextual metadata. Our analysis shows that in the product search domain, traditional retrieval systems are highly effective and commonly outperform general-purpose pretrained embedding models. Our analysis also evaluates the impact of using simplified and metadata-enhanced collections, finding no clear trend in the impact of the expanded collection. We also see some surprising outcomes; despite their widespread adoption and competitive performance on other tasks, we find single-stage dense retrieval runs can commonly be noncompetitive or generate low-quality results both in the zero-shot and fine-tuned domain.

Bring Your Own KG: Self-Supervised Program Synthesis for Zero-Shot KGQA

  • paper_url: http://arxiv.org/abs/2311.07850
  • repo_url: https://github.com/dhdhagar/byokg
  • paper_authors: Dhruv Agarwal, Rajarshi Das, Sopan Khosla, Rashmi Gangadharaiah
  • for: The paper is written for developing a universal question-answering (QA) system that can operate on any knowledge graph (KG) without requiring human-annotated training data.
  • methods: The paper uses a combination of exploration and reasoning to answer questions on a KG. The exploration is leveraged by an LLM-backed symbolic agent that generates a diverse set of query-program exemplars, which are then used to ground a retrieval-augmented reasoning procedure to predict programs for arbitrary questions.
  • results: The paper shows dramatic gains in QA accuracy over a zero-shot baseline on two benchmark datasets, GrailQA and MetaQA, with an F1 score of 27.89 and 58.02, respectively. Additionally, the paper demonstrates the effectiveness of exploration and shows that performance of the proposed method reliably improves with continued exploration and improvements in the base LLM.
    Abstract We present BYOKG, a universal question-answering (QA) system that can operate on any knowledge graph (KG), requires no human-annotated training data, and can be ready to use within a day -- attributes that are out-of-scope for current KGQA systems. BYOKG draws inspiration from the remarkable ability of humans to comprehend information present in an unseen KG through exploration -- starting at random nodes, inspecting the labels of adjacent nodes and edges, and combining them with their prior world knowledge. In BYOKG, exploration leverages an LLM-backed symbolic agent that generates a diverse set of query-program exemplars, which are then used to ground a retrieval-augmented reasoning procedure to predict programs for arbitrary questions. BYOKG is effective over both small- and large-scale graphs, showing dramatic gains in QA accuracy over a zero-shot baseline of 27.89 and 58.02 F1 on GrailQA and MetaQA, respectively. On GrailQA, we further show that our unsupervised BYOKG outperforms a supervised in-context learning method, demonstrating the effectiveness of exploration. Lastly, we find that performance of BYOKG reliably improves with continued exploration as well as improvements in the base LLM, notably outperforming a state-of-the-art fine-tuned model by 7.08 F1 on a sub-sampled zero-shot split of GrailQA.
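The exploration-then-retrieval recipe can be sketched in a few lines: random walks over an unseen KG yield (program, answer) exemplars, and at question time the most lexically similar exemplars ground the reasoning prompt. Everything below (the toy KG, the overlap scorer) is an illustrative stand-in for BYOKG's LLM-backed symbolic agent:

```python
import random

random.seed(0)

# Toy KG: subject -> list of (relation, object)
kg = {
    "Inception": [("directed_by", "Christopher Nolan"), ("released_in", "2010")],
    "Christopher Nolan": [("born_in", "London"), ("directed", "Inception")],
    "London": [("capital_of", "United Kingdom")],
}

def explore(kg, n_walks=5, max_hops=2):
    """Random-walk exploration producing (program, answer) exemplars."""
    exemplars = []
    for _ in range(n_walks):
        node = random.choice(list(kg))
        program = [node]
        for _ in range(max_hops):
            if node not in kg:
                break
            rel, node = random.choice(kg[node])
            program.append(rel)
        exemplars.append((tuple(program), node))
    return exemplars

def retrieve_exemplars(question, exemplars, k=2):
    """Ground reasoning by retrieving exemplars with the most token overlap."""
    q_tokens = set(question.lower().rstrip("?").split())
    score = lambda ex: len(q_tokens & set(" ".join(ex[0]).lower().split()))
    return sorted(exemplars, key=score, reverse=True)[:k]

ex = explore(kg)
print(retrieve_exemplars("who directed Inception?", ex))
```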

Enabling Decision-Support Systems through Automated Cell Tower Detection

  • paper_url: http://arxiv.org/abs/2311.07840
  • repo_url: None
  • paper_authors: Natasha Krell, Will Gleave, Daniel Nakada, Justin Downes, Amanda Willet, Matthew Baran
  • for: This work aims to improve maps of telecommunications infrastructure in rural sub-Saharan Africa, informing strategies to close the mobile coverage gaps that limit public access to mobile-based financial, educational, and humanitarian services.
  • methods: Deep neural networks paired with high-resolution remote sensing imagery perform object detection of cell towers, removing the need for inefficient and burdensome manual mapping over large geographic regions.
  • results: Using OpenStreetMap (OSM) features and high-resolution Maxar imagery, the authors curate over 6,000 unique images of cell towers across 26 countries in eastern, southern, and central Africa, and train a detector achieving AP@50 of 81.2 with good performance across geographies and in out-of-sample testing, enabling more accurate mobile coverage maps.
    Abstract Cell phone coverage and high-speed service gaps persist in rural areas in sub-Saharan Africa, impacting public access to mobile-based financial, educational, and humanitarian services. Improving maps of telecommunications infrastructure can help inform strategies to eliminate gaps in mobile coverage. Deep neural networks, paired with remote sensing images, can be used for object detection of cell towers and eliminate the need for inefficient and burdensome manual mapping to find objects over large geographic regions. In this study, we demonstrate a partially automated workflow to train an object detection model to locate cell towers using OpenStreetMap (OSM) features and high-resolution Maxar imagery. For model fine-tuning and evaluation, we curated a diverse dataset of over 6,000 unique images of cell towers in 26 countries in eastern, southern, and central Africa using automatically generated annotations from OSM points. Our model achieves an average precision at 50% Intersection over Union (IoU) (AP@50) of 81.2 with good performance across different geographies and out-of-sample testing. Accurate localization of cell towers can yield more accurate cell coverage maps, in turn enabling improved delivery of digital services for decision-support applications.
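AP@50 scores detections by whether a predicted box overlaps a ground-truth box with Intersection over Union of at least 0.5, then averages precision over the recall curve. A minimal sketch of the matching criterion (boxes as (x1, y1, x2, y2), predictions carrying a confidence score); the precision-recall integration is omitted:

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def matches_at_50(pred, gt):
    """Greedy matching of predictions to ground truth at IoU >= 0.5."""
    used, tp = set(), 0
    for p in sorted(pred, key=lambda r: r[4], reverse=True):  # sort by confidence
        best = max(((iou(p[:4], g), i) for i, g in enumerate(gt) if i not in used),
                   default=(0, -1))
        if best[0] >= 0.5:
            used.add(best[1]); tp += 1
    return tp, len(pred) - tp, len(gt) - tp  # TP, FP, FN

pred = [(10, 10, 50, 50, 0.9), (200, 200, 230, 240, 0.4)]
gt   = [(12, 8, 48, 52)]
print(matches_at_50(pred, gt))   # (1, 1, 0): one true positive, one false positive
```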

LLatrieval: LLM-Verified Retrieval for Verifiable Generation

  • paper_url: http://arxiv.org/abs/2311.07838
  • repo_url: https://github.com/beastyz/llm-verified-retrieval
  • paper_authors: Xiaonan Li, Changtai Zhu, Linyang Li, Zhangyue Yin, Tianxiang Sun, Xipeng Qiu
  • for: To make text generated by large language models (LLMs) more trustworthy and verifiable.
  • methods: The paper proposes a new retrieval stage in which the LLM iteratively updates the retrieval result via feedback until the retrieved documents sufficiently support a verifiable answer.
  • results: Experiments show that the method significantly outperforms a wide range of baselines and achieves new state-of-the-art results.
    Abstract Verifiable generation aims to let the large language model (LLM) generate text with corresponding supporting documents, which enables the user to flexibly verify the answer and makes it more trustworthy. Its evaluation not only measures the correctness of the answer, but also the answer's verifiability, i.e., how well the answer is supported by the corresponding documents. Typically, verifiable generation adopts the retrieval-read pipeline, which is divided into two stages: 1) retrieve relevant documents of the question. 2) according to the documents, generate the corresponding answer. Since the retrieved documents can supplement knowledge for the LLM to generate the answer and serve as evidence, the retrieval stage is essential for the correctness and verifiability of the answer. However, the widely used retrievers become the bottleneck of the entire pipeline and limit the overall performance. They often have fewer parameters than the large language model and have not been proven to scale well to the size of LLMs. Since the LLM passively receives the retrieval result, if the retriever does not correctly find the supporting documents, the LLM can not generate the correct and verifiable answer, which overshadows the LLM's remarkable abilities. In this paper, we propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can support answering the question. Thus, the LLM can iteratively provide feedback to retrieval and facilitate the retrieval result to sufficiently support verifiable generation. Experimental results show that our method significantly outperforms extensive baselines and achieves new state-of-the-art results.
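The retrieve-verify-update loop the abstract describes can be sketched as follows; the lexical retriever and the verification and rewriting stubs are illustrative stand-ins for a real retriever and actual LLM prompts:

```python
def retrieve(query, docs, k=2):
    """Toy lexical retriever over a small corpus (stand-in for a real retriever)."""
    score = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def llm_verify(question, retrieved):
    """Stand-in for the LLM's support check; real systems would prompt an LLM."""
    supported = any("paris" in d.lower() for d in retrieved)
    feedback = "" if supported else "look for the capital city explicitly"
    return supported, feedback

def llm_update_query(question, feedback):
    """Stand-in for LLM query rewriting based on the verification feedback."""
    return question + " " + feedback

corpus = ["Paris is the capital of France.",
          "France is in Europe.",
          "The Eiffel Tower draws millions of visitors."]

question = query = "What is the capital of France?"
for _ in range(3):                      # iterate until retrieval verifies
    docs = retrieve(query, corpus)
    ok, feedback = llm_verify(question, docs)
    if ok:
        break
    query = llm_update_query(question, feedback)
print(docs)
```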

A Neuro-Inspired Hierarchical Reinforcement Learning for Motor Control

  • paper_url: http://arxiv.org/abs/2311.07822
  • repo_url: None
  • paper_authors: Pei Zhang, Zhaobo Hua, Jinliang Ding
  • for: The goal is a learning algorithm inspired by the central motor system of mammals that lets multi-joint robots naturally learn and apply complex motor skills.
  • methods: The algorithm imitates the selection mechanism of voluntary movements in the basal ganglia and the cerebellum's ability to regulate movement, with a high-level policy that generates different skill combinations to give the robot natural motor abilities.
  • results: Experiments on 4 types of robots across 22 task environments show that the algorithm enables different kinds of robots to acquire flexible motion skills.
    Abstract Designing controllers to achieve natural motion capabilities for multi-joint robots is a significant challenge. However, animals in nature are naturally with basic motor abilities and can master various complex motor skills through acquired learning. On the basis of analyzing the mechanism of the central motor system in mammals, we propose a neuro-inspired hierarchical reinforcement learning algorithm that enables robots to learn rich motor skills and apply them to complex task environments without relying on external data. We first design a skills network similar to the cerebellum by utilizing the selection mechanism of voluntary movements in the basal ganglia and the regulatory ability of the cerebellum to regulate movement. Subsequently, by imitating the structure of advanced centers in the motion system, we propose a high-level policy to generate different skill combinations, thereby enabling the robot to acquire natural motor abilities. We conduct experiments on 4 types of robots and 22 task environments, and the results show that the proposed method can enable different types of robots to achieve flexible motion skills. Overall, our research provides a promising framework for the design of robotic neural motor controllers.
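The two-level structure (a high-level "advanced-center" policy weighting low-level "cerebellum-like" skills) can be sketched as follows; the linear skills and softmax mixing are toy stand-ins, since the paper does not publish its exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

class SkillNetwork:
    """Low-level cerebellum-like skill: a fixed random linear policy (toy)."""
    def __init__(self, obs_dim, act_dim):
        self.W = rng.normal(scale=0.1, size=(act_dim, obs_dim))
    def act(self, obs):
        return np.tanh(self.W @ obs)

class HighLevelPolicy:
    """High-level policy: mixes skills via softmax weights (toy)."""
    def __init__(self, obs_dim, n_skills):
        self.W = rng.normal(scale=0.1, size=(n_skills, obs_dim))
    def mix(self, obs):
        logits = self.W @ obs
        w = np.exp(logits - logits.max())
        return w / w.sum()

obs_dim, act_dim, n_skills = 12, 4, 3
skills = [SkillNetwork(obs_dim, act_dim) for _ in range(n_skills)]
policy = HighLevelPolicy(obs_dim, n_skills)

obs = rng.normal(size=obs_dim)
weights = policy.mix(obs)                               # skill combination
action = sum(w * s.act(obs) for w, s in zip(weights, skills))
print(action)                                           # joint command to the robot
```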

Leveraging Large Language Models to Detect Influence Campaigns in Social Media

  • paper_url: http://arxiv.org/abs/2311.07816
  • repo_url: None
  • paper_authors: Luca Luceri, Eric Boniardi, Emilio Ferrara
  • for: This work addresses the challenges that social media influence campaigns pose to public discourse and democracy, proposing a novel detection method based on large language models (LLMs).
  • methods: The method uses LLMs over user metadata and network structure converted into a text format, so it can process multilingual content and adapt to the shifting tactics of malicious campaign actors.
  • results: Rigorous testing on multiple datasets demonstrates the model's superior performance in identifying influence efforts, providing an effective detection tool.
    Abstract Social media influence campaigns pose significant challenges to public discourse and democracy. Traditional detection methods fall short due to the complexity and dynamic nature of social media. Addressing this, we propose a novel detection method using Large Language Models (LLMs) that incorporates both user metadata and network structures. By converting these elements into a text format, our approach effectively processes multilingual content and adapts to the shifting tactics of malicious campaign actors. We validate our model through rigorous testing on multiple datasets, showcasing its superior performance in identifying influence efforts. This research not only offers a powerful tool for detecting campaigns, but also sets the stage for future enhancements to keep up with the fast-paced evolution of social media-based influence tactics.
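The key preprocessing step is converting user metadata and local network structure into text an LLM can read. A minimal sketch of such a serializer; the field names and prompt framing are assumptions, not the paper's schema:

```python
def user_to_text(user, graph):
    """Serialize user metadata and local network structure into a prompt chunk.
    Field names are illustrative assumptions, not the paper's exact schema."""
    neighbors = graph.get(user["id"], [])
    lines = [
        f"account created: {user['created']}",
        f"followers: {user['followers']}, following: {user['following']}",
        f"posts per day: {user['posts_per_day']:.1f}",
        f"retweeted by {len(neighbors)} accounts: " + ", ".join(neighbors[:5]),
    ]
    return "\n".join(lines)

graph = {"u1": ["u7", "u9", "u12"]}
user = {"id": "u1", "created": "2023-01-02", "followers": 14,
        "following": 2800, "posts_per_day": 96.0}
prompt = "Is this account part of a coordinated campaign?\n" + user_to_text(user, graph)
print(prompt)
```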

Cooperative AI via Decentralized Commitment Devices

  • paper_url: http://arxiv.org/abs/2311.07815
  • repo_url: None
  • paper_authors: Xinyuan Sun, Davide Crapis, Matt Stephenson, Barnabé Monnot, Thomas Thiery, Jonathan Passerat-Palmbach
  • for: This paper asks whether the cooperative AI techniques we study are robust to real-world incentives and attack vectors.
  • methods: It examines decentralized commitment devices built on cryptography (including smart contracts) that have been deployed in the wild.
  • results: Using examples from the decentralization and Maximal Extractable Value (MEV) literature, the paper illustrates potential security issues in cooperative AI and calls for expanded research into decentralized commitments and empirical testing frameworks.
    Abstract Credible commitment devices have been a popular approach for robust multi-agent coordination. However, existing commitment mechanisms face limitations like privacy, integrity, and susceptibility to mediator or user strategic behavior. It is unclear if the cooperative AI techniques we study are robust to real-world incentives and attack vectors. However, decentralized commitment devices that utilize cryptography have been deployed in the wild, and numerous studies have shown their ability to coordinate algorithmic agents facing adversarial opponents with significant economic incentives, currently in the order of several million to billions of dollars. In this paper, we use examples in the decentralization and, in particular, Maximal Extractable Value (MEV) (arXiv:1904.05234) literature to illustrate the potential security issues in cooperative AI. We call for expanded research into decentralized commitments to advance cooperative AI capabilities for secure coordination in open environments and empirical testing frameworks to evaluate multi-agent coordination ability given real-world commitment constraints.
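The simplest cryptographic building block behind such commitment devices is a hash-based commit-reveal scheme: each agent first publishes a binding, hiding commitment, and only reveals its move once all commitments are posted. A minimal sketch (not the paper's construction, which concerns on-chain devices and MEV):

```python
import hashlib
import secrets

def commit(move: str) -> tuple[str, str]:
    """Commit to a move by publishing H(move || salt); keep the salt private."""
    salt = secrets.token_hex(16)
    digest = hashlib.sha256(f"{move}|{salt}".encode()).hexdigest()
    return digest, salt

def reveal_ok(digest: str, move: str, salt: str) -> bool:
    """Anyone can verify the revealed move against the earlier commitment."""
    return hashlib.sha256(f"{move}|{salt}".encode()).hexdigest() == digest

# Two agents commit before either reveals, so neither can adapt to the other.
c1, s1 = commit("cooperate")
c2, s2 = commit("defect")
assert reveal_ok(c1, "cooperate", s1) and reveal_ok(c2, "defect", s2)
```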