cs.CV - 2023-10-02

STARS: Zero-shot Sim-to-Real Transfer for Segmentation of Shipwrecks in Sonar Imagery

  • paper_url: http://arxiv.org/abs/2310.01667
  • repo_url: None
  • paper_authors: Advaith Venkatramanan Sethuraman, Katherine A. Skinner
  • for: This paper addresses zero-shot sim-to-real transfer for object segmentation: bridging simulated and real data when no real examples of the object of interest are available during training.
  • methods: We propose a new segmentation network, STARS, which fuses a predicted deformation field with an anomaly volume so that it generalizes better to real side scan sonar imagery and achieves zero-shot sim-to-real transfer.
  • results: Evaluated on a real, expert-labeled side scan sonar dataset, our method improves segmentation performance for the targeted shipwreck class by 20% over the best baseline.
    Abstract In this paper, we address the problem of sim-to-real transfer for object segmentation when there is no access to real examples of an object of interest during training, i.e. zero-shot sim-to-real transfer for segmentation. We focus on the application of shipwreck segmentation in side scan sonar imagery. Our novel segmentation network, STARS, addresses this challenge by fusing a predicted deformation field and anomaly volume, allowing it to generalize better to real sonar images and achieve more effective zero-shot sim-to-real transfer for image segmentation. We evaluate the sim-to-real transfer capabilities of our method on a real, expert-labeled side scan sonar dataset of shipwrecks collected from field work surveys with an autonomous underwater vehicle (AUV). STARS is trained entirely in simulation and performs zero-shot shipwreck segmentation with no additional fine-tuning on real data. Our method provides a significant 20% increase in segmentation performance for the targeted shipwreck class compared to the best baseline.
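
The abstract describes fusing a predicted deformation field with an anomaly volume for segmentation. Below is a minimal, hypothetical sketch of what such a fusion module could look like in PyTorch; the module layout, channel sizes, and the concatenation-based fusion are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch: one way a deformation field and an anomaly volume could be
# fused for segmentation, loosely following the STARS description.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformAnomalyFusion(nn.Module):
    def __init__(self, in_ch=1, feat_ch=16, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.deform_head = nn.Conv2d(feat_ch, 2, 3, padding=1)    # predicts a 2D offset field
        self.anomaly_head = nn.Conv2d(feat_ch, 1, 3, padding=1)   # per-pixel anomaly score
        self.seg_head = nn.Conv2d(feat_ch + 1, num_classes, 1)    # fuse warped features + anomaly

    def forward(self, x):
        feat = self.encoder(x)
        offsets = self.deform_head(feat).permute(0, 2, 3, 1)       # (B, H, W, 2)
        # Build a sampling grid: identity grid plus predicted offsets.
        b, _, h, w = x.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).expand(b, -1, -1, -1).to(x) + offsets
        warped = F.grid_sample(feat, grid, align_corners=False)    # deform features toward real-domain layout
        anomaly = torch.sigmoid(self.anomaly_head(feat))           # anomaly volume as an extra channel
        return self.seg_head(torch.cat([warped, anomaly], dim=1))  # segmentation logits

logits = DeformAnomalyFusion()(torch.randn(1, 1, 64, 64))
print(logits.shape)  # torch.Size([1, 2, 64, 64])
```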

Task-guided Domain Gap Reduction for Monocular Depth Prediction in Endoscopy

  • paper_url: http://arxiv.org/abs/2310.01663
  • repo_url: None
  • paper_authors: Anita Rau, Binod Bhattarai, Lourdes Agapito, Danail Stoyanov
  • for: Computer-aided methods for colorectal cancer screening that improve the quality and availability of colonoscopy.
  • methods: Combines supervised learning on labeled synthetic data with self-supervised learning on unlabeled real data to predict depth from monocular video frames.
  • results: Proposes a new method that fully exploits labeled synthetic and unlabeled real data, yielding more resilient and accurate depth maps for real colonoscopy sequences.
    Abstract Colorectal cancer remains one of the deadliest cancers in the world. In recent years computer-aided methods have aimed to enhance cancer screening and improve the quality and availability of colonoscopies by automatizing sub-tasks. One such task is predicting depth from monocular video frames, which can assist endoscopic navigation. As ground truth depth from standard in-vivo colonoscopy remains unobtainable due to hardware constraints, two approaches have aimed to circumvent the need for real training data: supervised methods trained on labeled synthetic data and self-supervised models trained on unlabeled real data. However, self-supervised methods depend on unreliable loss functions that struggle with edges, self-occlusion, and lighting inconsistency. Methods trained on synthetic data can provide accurate depth for synthetic geometries but do not use any geometric supervisory signal from real data and overfit to synthetic anatomies and properties. This work proposes a novel approach to leverage labeled synthetic and unlabeled real data. While previous domain adaptation methods indiscriminately enforce the distributions of both input data modalities to coincide, we focus on the end task, depth prediction, and translate only essential information between the input domains. Our approach results in more resilient and accurate depth maps of real colonoscopy sequences.

SYRAC: Synthesize, Rank, and Count

  • paper_url: http://arxiv.org/abs/2310.01662
  • repo_url: https://github.com/adrian-dalessandro/SYRAC
  • paper_authors: Adriano D’Alessandro, Ali Mahdavi-Amiri, Ghassan Hamarneh
  • for: This work aims to eliminate the annotation burden in crowd counting, a computer vision task with several important applications.
  • methods: Uses latent diffusion models to generate synthetic data in place of manual annotation. Because these models understand object quantities unreliably, the counting signal in generated images is noisy, so two types of synthetic data are produced: ranked image pairs created by removing pedestrians from real images (a weak but reliable quantity signal), and synthetic images generated with a predetermined number of objects (a strong but noisy counting signal).
  • results: Reports state-of-the-art results for unsupervised crowd counting, demonstrating that the approach removes the annotation burden.
    Abstract Crowd counting is a critical task in computer vision, with several important applications. However, existing counting methods rely on labor-intensive density map annotations, necessitating the manual localization of each individual pedestrian. While recent efforts have attempted to alleviate the annotation burden through weakly or semi-supervised learning, these approaches fall short of significantly reducing the workload. We propose a novel approach to eliminate the annotation burden by leveraging latent diffusion models to generate synthetic data. However, these models struggle to reliably understand object quantities, leading to noisy annotations when prompted to produce images with a specific quantity of objects. To address this, we use latent diffusion models to create two types of synthetic data: one by removing pedestrians from real images, which generates ranked image pairs with a weak but reliable object quantity signal, and the other by generating synthetic images with a predetermined number of objects, offering a strong but noisy counting signal. Our method utilizes the ranking image pairs for pre-training and then fits a linear layer to the noisy synthetic images using these crowd quantity features. We report state-of-the-art results for unsupervised crowd counting.
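
A minimal sketch of the two-stage recipe in the abstract: ranking pre-training on image pairs carrying a weak but reliable quantity signal, then fitting a linear layer on noisy synthetic counts. The tiny backbone, margin ranking loss, and optimizer settings are placeholder assumptions, not the code in the linked repository.

```python
# Hedged sketch of SYRAC-style training: rank first, then count.
import torch
import torch.nn as nn

backbone = nn.Sequential(                      # tiny stand-in feature extractor
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 8))
score = nn.Linear(8, 1)                        # scalar "crowd quantity" score
rank_loss = nn.MarginRankingLoss(margin=0.1)
opt = torch.optim.Adam(list(backbone.parameters()) + list(score.parameters()), lr=1e-3)

# Stage 1: ranking pre-training on pairs (img_full, img_reduced), where the
# reduced image had pedestrians removed and should therefore score lower.
img_full, img_reduced = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
s_full, s_red = score(backbone(img_full)), score(backbone(img_reduced))
loss = rank_loss(s_full, s_red, torch.ones_like(s_full))   # enforce s_full > s_reduced
loss.backward(); opt.step(); opt.zero_grad()

# Stage 2: freeze the backbone, fit a linear layer to noisy synthetic counts.
head = nn.Linear(8, 1)
with torch.no_grad():
    feats = backbone(torch.randn(8, 3, 64, 64))             # synthetic images
noisy_counts = torch.randint(10, 200, (8, 1)).float()        # prompt-specified counts (noisy)
head_opt = torch.optim.Adam(head.parameters(), lr=1e-2)
count_loss = nn.MSELoss()(head(feats), noisy_counts)
count_loss.backward(); head_opt.step()
```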

You Only Look at Once for Real-time and Generic Multi-Task

  • paper_url: http://arxiv.org/abs/2310.01641
  • repo_url: https://github.com/jiayuanwang-jw/yolov8-multi-task
  • paper_authors: Jiayuan Wang, Q. M. Jonathan Wu, Ning Zhang
  • for: This paper aims to present an adaptive, real-time, and lightweight multi-task model for object detection, drivable area segmentation, and lane line segmentation tasks in autonomous driving.
  • methods: The proposed model is an end-to-end multi-task model with a unified and streamlined segmentation structure, featuring a learnable parameter that adaptively concatenates features in segmentation necks and a segmentation head composed of a series of convolutional layers.
  • results: The model achieves competitive results on the BDD100k dataset, with a mAP50 of 81.1% for object detection, a mIoU of 91.0% for drivable area segmentation, and an IoU of 28.8% for lane line segmentation. Additionally, the model demonstrates better performance in real-world scenarios compared to existing multi-task models.
    Abstract High precision, lightweight, and real-time responsiveness are three essential requirements for implementing autonomous driving. In this study, we present an adaptive, real-time, and lightweight multi-task model designed to concurrently address object detection, drivable area segmentation, and lane line segmentation tasks. Specifically, we developed an end-to-end multi-task model with a unified and streamlined segmentation structure. We introduced a learnable parameter that adaptively concatenate features in segmentation necks, using the same loss function for all segmentation tasks. This eliminates the need for customizations and enhances the model's generalization capabilities. We also introduced a segmentation head composed only of a series of convolutional layers, which reduces the inference time. We achieved competitive results on the BDD100k dataset, particularly in visualization outcomes. The performance results show a mAP50 of 81.1% for object detection, a mIoU of 91.0% for drivable area segmentation, and an IoU of 28.8% for lane line segmentation. Additionally, we introduced real-world scenarios to evaluate our model's performance in a real scene, which significantly outperforms competitors. This demonstrates that our model not only exhibits competitive performance but is also more flexible and faster than existing multi-task models. The source codes and pre-trained models are released at https://github.com/JiayuanWang-JW/YOLOv8-multi-task
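
The abstract mentions "a learnable parameter that adaptively concatenates features in segmentation necks". One plausible reading is a learnable per-source weight applied before concatenation, sketched below; this is an assumption for illustration, not the YOLOv8-multi-task repository code.

```python
# Hedged sketch of adaptive concatenation in a segmentation neck.
import torch
import torch.nn as nn

class AdaptiveConcat(nn.Module):
    def __init__(self, num_sources=2):
        super().__init__()
        # One learnable weight per incoming feature source, squashed by a sigmoid.
        self.alpha = nn.Parameter(torch.zeros(num_sources))

    def forward(self, feats):
        w = torch.sigmoid(self.alpha)
        return torch.cat([w[i] * f for i, f in enumerate(feats)], dim=1)

neck = AdaptiveConcat(num_sources=2)
fused = neck([torch.randn(1, 64, 80, 80), torch.randn(1, 64, 80, 80)])
print(fused.shape)  # torch.Size([1, 128, 80, 80])
```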

Adaptive Visual Scene Understanding: Incremental Scene Graph Generation

  • paper_url: http://arxiv.org/abs/2310.01636
  • repo_url: https://github.com/zhanglab-deepneurocoglab/csegg
  • paper_authors: Naitik Khandelwal, Xiao Liu, Mengmi Zhang
  • for: This work studies continual learning for Scene Graph Generation (SGG), so that AI systems can detect new objects and establish new relationships with existing objects in a dynamic visual world.
  • methods: We introduce the comprehensive Continual ScenE Graph Generation (CSEGG) dataset, together with 3 learning scenarios and 8 evaluation metrics, to benchmark how well existing SGG methods retain previously learned object entities and relationships while learning new ones.
  • results: We find that both classical two-stage SGG methods and recent transformer-based SGG methods struggle in continual learning settings, while continual object detection improves the classification of known relationships on unknown objects. Our experiments provide valuable insights into this emerging field.
    Abstract Scene graph generation (SGG) involves analyzing images to extract meaningful information about objects and their relationships. Given the dynamic nature of the visual world, it becomes crucial for AI systems to detect new objects and establish their new relationships with existing objects. To address the lack of continual learning methodologies in SGG, we introduce the comprehensive Continual ScenE Graph Generation (CSEGG) dataset along with 3 learning scenarios and 8 evaluation metrics. Our research investigates the continual learning performances of existing SGG methods on the retention of previous object entities and relationships as they learn new ones. Moreover, we also explore how continual object detection enhances generalization in classifying known relationships on unknown objects. We conduct extensive experiments benchmarking and analyzing the classical two-stage SGG methods and the most recent transformer-based SGG methods in continual learning settings, and gain valuable insights into the CSEGG problem. We invite the research community to explore this emerging field of study.
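
The benchmark measures how well previously learned entities and relationships are retained. As a worked example of the kind of retention measurement involved, here is a standard average-forgetting computation on made-up per-task recall numbers; the paper's actual 8 metrics are not reproduced here.

```python
# Hedged sketch: average forgetting over tasks, a common continual-learning metric.
import numpy as np

# recall[t, k] = recall on task k, measured right after training on task t (k <= t).
recall = np.array([
    [0.62, np.nan, np.nan],
    [0.55, 0.58,  np.nan],
    [0.49, 0.51,  0.60],
])

def average_forgetting(r):
    T = r.shape[0]
    drops = []
    for k in range(T - 1):                     # tasks seen before the final one
        best = np.nanmax(r[k:T - 1, k])        # best past performance on task k
        drops.append(best - r[T - 1, k])       # how much was lost by the end
    return float(np.mean(drops))

print(f"average forgetting: {average_forgetting(recall):.3f}")
```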

Dynamic Spatio-Temporal Summarization using Information Based Fusion

  • paper_url: http://arxiv.org/abs/2310.01617
  • repo_url: None
  • paper_authors: Humayra Tasnim, Soumya Dutta, Melanie Moses
  • for: Addresses the management and storage of large-scale time-varying datasets, improving data-management efficiency and enabling deeper insight into complex data behaviors.
  • methods: Proposes a dynamic spatio-temporal data summarization technique that identifies informative features in key timesteps and fuses less informative ones, minimizing storage requirements while retaining views of both raw and summarized timesteps.
  • results: Demonstrates efficient data management and deeper insight across diverse datasets, including particle-based flow simulations, security and surveillance applications, and immune-system cell interactions.
    Abstract In the era of burgeoning data generation, managing and storing large-scale time-varying datasets poses significant challenges. With the rise of supercomputing capabilities, the volume of data produced has soared, intensifying storage and I/O overheads. To address this issue, we propose a dynamic spatio-temporal data summarization technique that identifies informative features in key timesteps and fuses less informative ones. This approach minimizes storage requirements while preserving data dynamics. Unlike existing methods, our method retains both raw and summarized timesteps, ensuring a comprehensive view of information changes over time. We utilize information-theoretic measures to guide the fusion process, resulting in a visual representation that captures essential data patterns. We demonstrate the versatility of our technique across diverse datasets, encompassing particle-based flow simulations, security and surveillance applications, and biological cell interactions within the immune system. Our research significantly contributes to the realm of data management, introducing enhanced efficiency and deeper insights across diverse multidisciplinary domains. We provide a streamlined approach for handling massive datasets that can be applied to in situ analysis as well as post hoc analysis. This not only addresses the escalating challenges of data storage and I/O overheads but also unlocks the potential for informed decision-making. Our method empowers researchers and experts to explore essential temporal dynamics while minimizing storage requirements, thereby fostering a more effective and intuitive understanding of complex data behaviors.
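
The method scores timesteps with information-theoretic measures and fuses the less informative ones. Below is a minimal sketch of that idea, using Shannon entropy of a value histogram as the informativeness score and a simple threshold rule; the paper's actual criterion and fusion operator are richer than this illustration.

```python
# Hedged sketch: entropy-guided selection of which timesteps to keep raw.
import numpy as np

def entropy(field, bins=32):
    hist, _ = np.histogram(field, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
timesteps = [rng.normal(0, 1 + 0.1 * t, size=(64, 64)) for t in range(10)]
scores = np.array([entropy(f) for f in timesteps])

threshold = np.median(scores)
keep_raw = scores >= threshold                 # informative timesteps are kept as-is
fused = [(t, t + 1) for t in range(len(timesteps) - 1)
         if not keep_raw[t] and not keep_raw[t + 1]]   # candidate neighbor pairs to fuse
print("raw timesteps:", np.where(keep_raw)[0].tolist())
print("fuse pairs:", fused)
```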

ImagenHub: Standardizing the evaluation of conditional image generation models

  • paper_url: http://arxiv.org/abs/2310.01596
  • repo_url: https://github.com/TIGER-AI-Lab/ImagenHub
  • paper_authors: Max Ku, Tianle Li, Kai Zhang, Yujie Lu, Xingyu Fu, Wenwen Zhuang, Wenhu Chen
  • for: This paper aims to standardize the evaluation and comparison of conditional image generation models by providing a unified inference pipeline and human evaluation metrics.
  • methods: The paper proposes a one-stop library called ImagenHub, which includes seven prominent tasks and high-quality evaluation datasets. It also introduces two human evaluation scores, i.e. Semantic Consistency and Perceptual Quality, and comprehensive guidelines for evaluating generated images.
  • results: The paper evaluates around 30 models using the proposed metrics and observes that the existing models’ performance is generally unsatisfying except for Text-guided Image Generation and Subject-driven Image Generation. It also finds that 83% of the claims from published papers hold with a few exceptions, and none of the existing automatic metrics has a Spearman’s correlation higher than 0.2 except for subject-driven image generation.
    Abstract Recently, a myriad of conditional image generation and editing models have been developed to serve different downstream tasks, including text-to-image generation, text-guided image editing, subject-driven image generation, control-guided image generation, etc. However, we observe huge inconsistencies in experimental conditions: datasets, inference, and evaluation metrics - render fair comparisons difficult. This paper proposes ImagenHub, which is a one-stop library to standardize the inference and evaluation of all the conditional image generation models. Firstly, we define seven prominent tasks and curate high-quality evaluation datasets for them. Secondly, we built a unified inference pipeline to ensure fair comparison. Thirdly, we design two human evaluation scores, i.e. Semantic Consistency and Perceptual Quality, along with comprehensive guidelines to evaluate generated images. We train expert raters to evaluate the model outputs based on the proposed metrics. Our human evaluation achieves a high inter-worker agreement of Krippendorff's alpha on 76% models with a value higher than 0.4. We comprehensively evaluated a total of around 30 models and observed three key takeaways: (1) the existing models' performance is generally unsatisfying except for Text-guided Image Generation and Subject-driven Image Generation, with 74% models achieving an overall score lower than 0.5. (2) we examined the claims from published papers and found 83% of them hold with a few exceptions. (3) None of the existing automatic metrics has a Spearman's correlation higher than 0.2 except subject-driven image generation. Moving forward, we will continue our efforts to evaluate newly published models and update our leaderboard to keep track of the progress in conditional image generation.
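
One of the paper's findings is that existing automatic metrics correlate poorly (Spearman's rho below 0.2) with the human Semantic Consistency and Perceptual Quality scores. A small sketch of how such a correlation check is computed, with made-up numbers standing in for the real per-model scores:

```python
# Hedged sketch: does an automatic metric track human judgments? (illustrative data)
from scipy.stats import spearmanr

human_overall = [0.35, 0.60, 0.42, 0.75, 0.50, 0.30, 0.66, 0.48]   # per-model human score
auto_metric   = [21.3, 24.1, 20.8, 23.5, 22.0, 25.2, 23.9, 21.1]   # some automatic score per model

rho, pval = spearmanr(human_overall, auto_metric)
print(f"Spearman rho = {rho:.2f} (p = {pval:.2f})")
# The paper reports that, except for subject-driven generation, no existing
# automatic metric exceeds rho = 0.2 against the human scores.
```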

RF-ULM: Deep Learning for Radio-Frequency Ultrasound Localization Microscopy

  • paper_url: http://arxiv.org/abs/2310.01545
  • repo_url: https://github.com/hahnec/rf-ulm
  • paper_authors: Christopher Hahne, Georges Chabouh, Arthur Chavignon, Olivier Couture, Raphael Sznitman
  • for: This work targets high-resolution imaging in Ultrasound Localization Microscopy (ULM) by precisely localizing contrast agent particles across consecutive beamformed frames.
  • methods: We propose to localize scatterers directly in RF signals using a custom super-resolution deep neural network (DNN), and introduce a novel semi-global convolutional sampling block tailored for reliable and accurate localization in RF input data.
  • results: RF-ULM bridges the domain gap between synthetic and real datasets, offering higher precision at lower complexity, and the first in vivo results from an RF-trained DNN demonstrate its real-world practicality.
    Abstract In Ultrasound Localization Microscopy (ULM), achieving high-resolution images relies on the precise localization of contrast agent particles across consecutive beamformed frames. However, our study uncovers an enormous potential: The process of delay-and-sum beamforming leads to an irreversible reduction of Radio-Frequency (RF) data, while its implications for localization remain largely unexplored. The rich contextual information embedded within RF wavefronts, including their hyperbolic shape and phase, offers great promise for guiding Deep Neural Networks (DNNs) in challenging localization scenarios. To fully exploit this data, we propose to directly localize scatterers in RF signals. Our approach involves a custom super-resolution DNN using learned feature channel shuffling and a novel semi-global convolutional sampling block tailored for reliable and accurate localization in RF input data. Additionally, we introduce a geometric point transformation that facilitates seamless mapping between B-mode and RF spaces. To validate the effectiveness of our method and understand the impact of beamforming, we conduct an extensive comparison with State-Of-The-Art (SOTA) techniques in ULM. We present the inaugural in vivo results from an RF-trained DNN, highlighting its real-world practicality. Our findings show that RF-ULM bridges the domain gap between synthetic and real datasets, offering a considerable advantage in terms of precision and complexity. To enable the broader research community to benefit from our findings, our code and the associated SOTA methods are made available at https://github.com/hahnec/rf-ulm.

Progressive DeepSSM: Training Methodology for Image-To-Shape Deep Models

  • paper_url: http://arxiv.org/abs/2310.01529
  • repo_url: None
  • paper_authors: Abu Zahid Bin Aziz, Jadie Adams, Shireen Elhabian
  • for: This work aims to improve the accuracy and stability of statistical shape models (SSM) inferred from medical images, enabling better study of anatomical shapes across medical applications.
  • methods: We propose a new training strategy, progressive DeepSSM, which trains image-to-shape models over multiple scales, each scale building on the output of the previous one, so that coarse shape features are learned first and fine details later. Shape priors are incorporated via segmentation-guided multi-task learning, and a deep supervision loss ensures learning at each scale.
  • results: Experiments show that models trained with the proposed strategy improve significantly in both quantitative and qualitative terms, yielding more accurate and stable shape inference from medical images.
    Abstract Statistical shape modeling (SSM) is an enabling quantitative tool to study anatomical shapes in various medical applications. However, directly using 3D images in these applications still has a long way to go. Recent deep learning methods have paved the way for reducing the substantial preprocessing steps to construct SSMs directly from unsegmented images. Nevertheless, the performance of these models is not up to the mark. Inspired by multiscale/multiresolution learning, we propose a new training strategy, progressive DeepSSM, to train image-to-shape deep learning models. The training is performed in multiple scales, and each scale utilizes the output from the previous scale. This strategy enables the model to learn coarse shape features in the first scales and gradually learn detailed fine shape features in the later scales. We leverage shape priors via segmentation-guided multi-task learning and employ deep supervision loss to ensure learning at each scale. Experiments show the superiority of models trained by the proposed strategy from both quantitative and qualitative perspectives. This training methodology can be employed to improve the stability and accuracy of any deep learning method for inferring statistical representations of anatomies from medical images and can be adopted by existing deep learning methods to improve model accuracy and training stability.
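
A minimal sketch of progressive, coarse-to-fine training with deep supervision as described above: each scale predicts shape points at increasing resolution, scales are activated progressively over epochs, and every active scale contributes its own loss term. The toy network, point counts, and schedule are illustrative assumptions, not the authors' DeepSSM implementation.

```python
# Hedged sketch of progressive multi-scale training with deep supervision.
import torch
import torch.nn as nn

class MultiScaleShapeNet(nn.Module):
    def __init__(self, points_per_scale=(32, 64, 128)):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                                     nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.heads = nn.ModuleList(nn.Linear(8, n * 3) for n in points_per_scale)

    def forward(self, vol):
        z = self.encoder(vol)
        return [h(z) for h in self.heads]      # coarse-to-fine point predictions

net = MultiScaleShapeNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
vol = torch.randn(2, 1, 32, 32, 32)            # toy image volume
targets = [torch.randn(2, n * 3) for n in (32, 64, 128)]   # toy shape targets per scale

# Progressive schedule: early epochs activate only coarse scales; deep
# supervision keeps a loss on every active scale.
for epoch in range(3):
    active = min(epoch + 1, len(targets))
    preds = net(vol)
    loss = sum(nn.functional.mse_loss(preds[s], targets[s]) for s in range(active))
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"epoch {epoch}: scales active = {active}, loss = {loss.item():.3f}")
```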

Fetal-BET: Brain Extraction Tool for Fetal MRI

  • paper_url: http://arxiv.org/abs/2310.01523
  • repo_url: https://github.com/bchimagine/fetal-brain-extraction
  • paper_authors: Razieh Faghihpirayesh, Davood Karimi, Deniz Erdoğmuş, Ali Gholipour
  • for: This work aims to provide an automatic, accurate, and generalizable fetal brain extraction method that works across MRI sequences and scanning conditions, as a first step for fetal brain image analysis.
  • methods: The method combines U-Net-style architectures, attention mechanisms, multi-contrast feature learning, and data augmentation to capture detailed fetal brain structure from multi-contrast (multi-sequence) fetal MRI data.
  • results: On independent test data, the method achieves accurate and generalizable fetal brain extraction across scanners and gestational stages, and it also performs well on pathological brains.
    Abstract Fetal brain extraction is a necessary first step in most computational fetal brain MRI pipelines. However, it has been a very challenging task due to non-standard fetal head pose, fetal movements during examination, and vastly heterogeneous appearance of the developing fetal brain and the neighboring fetal and maternal anatomy across various sequences and scanning conditions. Development of a machine learning method to effectively address this task requires a large and rich labeled dataset that has not been previously available. As a result, there is currently no method for accurate fetal brain extraction on various fetal MRI sequences. In this work, we first built a large annotated dataset of approximately 72,000 2D fetal brain MRI images. Our dataset covers the three common MRI sequences including T2-weighted, diffusion-weighted, and functional MRI acquired with different scanners. Moreover, it includes normal and pathological brains. Using this dataset, we developed and validated deep learning methods, by exploiting the power of the U-Net style architectures, the attention mechanism, multi-contrast feature learning, and data augmentation for fast, accurate, and generalizable automatic fetal brain extraction. Our approach leverages the rich information from multi-contrast (multi-sequence) fetal MRI data, enabling precise delineation of the fetal brain structures. Evaluations on independent test data show that our method achieves accurate brain extraction on heterogeneous test data acquired with different scanners, on pathological brains, and at various gestational stages. This robustness underscores the potential utility of our deep learning model for fetal brain imaging and image analysis.

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code

  • paper_url: http://arxiv.org/abs/2310.01506
  • repo_url: https://github.com/cure-lab/directinversion
  • paper_authors: Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
  • for: This paper aims to improve diffusion-based editing by disentangling the source and target diffusion branches, improving both content preservation of the source image and edit fidelity to the target prompt.
  • methods: It introduces "Direct Inversion", a novel inversion technique that achieves optimal performance of both the source and target diffusion branches with just three lines of code.
  • results: On PIE-Bench, a benchmark of 700 images covering diverse scenes and editing types, the method outperforms state-of-the-art optimization-based inversion techniques across 8 editing methods while being nearly an order of magnitude faster.
    Abstract Text-guided diffusion models have revolutionized image generation and editing, offering exceptional realism and diversity. Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt, the process commences by acquiring a noisy latent vector corresponding to the source image via the diffusion model. This vector is subsequently fed into separate source and target diffusion branches for editing. The accuracy of this inversion process significantly impacts the final editing outcome, influencing both essential content preservation of the source image and edit fidelity according to the target prompt. Prior inversion techniques aimed at finding a unified solution in both the source and target diffusion branches. However, our theoretical and empirical analyses reveal that disentangling these branches leads to a distinct separation of responsibilities for preserving essential content and ensuring edit fidelity. Building on this insight, we introduce "Direct Inversion," a novel technique achieving optimal performance of both branches with just three lines of code. To assess image editing performance, we present PIE-Bench, an editing benchmark with 700 images showcasing diverse scenes and editing types, accompanied by versatile annotations and comprehensive evaluation metrics. Compared to state-of-the-art optimization-based inversion techniques, our solution not only yields superior performance across 8 editing methods but also achieves nearly an order of speed-up.
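
A heavily simplified, schematic sketch of the disentangled-branch idea: the source branch is rectified back onto the stored inversion trajectory at every step (preserving source content), while the target branch denoises under the target prompt. The dummy denoiser and step rule below are placeholders, and this is an assumed reading of the method rather than the released DirectInversion code.

```python
# Hedged, schematic sketch of separated source/target diffusion branches.
import torch

def eps_model(x, t, prompt_embedding):
    # Placeholder noise predictor standing in for a real diffusion U-Net.
    return 0.1 * x + 0.001 * t + 0.0 * prompt_embedding.mean()

def ddim_step(x, t, eps):
    # Simplified deterministic update; a real DDIM step uses the alpha-bar schedule.
    return x - 0.02 * eps

x_src = torch.randn(1, 4, 8, 8)                               # inverted latent of the source image
src_prompt, tgt_prompt = torch.randn(77, 768), torch.randn(77, 768)
# Latents recorded along the (imperfect) inversion trajectory, one per timestep.
inversion_latents = [x_src + 0.05 * torch.randn_like(x_src) for _ in range(10)]

x_src_branch, x_tgt_branch = x_src.clone(), x_src.clone()
for t, z_t in zip(reversed(range(10)), inversion_latents):
    # Source branch: denoise, then snap back onto the stored inversion trajectory,
    # so the source content is preserved exactly.
    x_src_branch = ddim_step(x_src_branch, t, eps_model(x_src_branch, t, src_prompt))
    x_src_branch = x_src_branch + (z_t - x_src_branch)        # i.e. rectify to z_t
    # Target branch: ordinary denoising under the target prompt; a full editor would
    # also inject attention maps or features from the source branch here.
    x_tgt_branch = ddim_step(x_tgt_branch, t, eps_model(x_tgt_branch, t, tgt_prompt))

print(x_tgt_branch.shape)  # edited latent; a real pipeline decodes it with the VAE
```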

DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

  • paper_url: http://arxiv.org/abs/2310.01412
  • repo_url: None
  • paper_authors: Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee. K. Wong, Zhenguo Li, Hengshuang Zhao
  • for: This paper tackles the interpretability problem in autonomous driving, a barrier to the commercialization and further development of autonomous vehicles.
  • methods: It uses a multimodal large language model (LLM) to process and reason over image and video data, providing textual explanations of vehicle actions and answering diverse questions posed by human users for enhanced interaction.
  • results: Evaluated on multiple tasks, DriveGPT4 demonstrates superior qualitative and quantitative performance compared with conventional methods and video-understanding LLMs, and it generalizes in a zero-shot fashion to unseen scenarios.
    Abstract In the past decade, autonomous driving has experienced rapid development in both academia and industry. However, its limited interpretability remains a significant unsolved problem, severely hindering autonomous vehicle commercialization and further development. Previous approaches utilizing small language models have failed to address this issue due to their lack of flexibility, generalization ability, and robustness. Recently, multimodal large language models (LLMs) have gained considerable attention from the research community for their capability to process and reason non-text data (e.g., images and videos) by text. In this paper, we present DriveGPT4, an interpretable end-to-end autonomous driving system utilizing LLMs. DriveGPT4 is capable of interpreting vehicle actions and providing corresponding reasoning, as well as answering diverse questions posed by human users for enhanced interaction. Additionally, DriveGPT4 predicts vehicle low-level control signals in an end-to-end fashion. These capabilities stem from a customized visual instruction tuning dataset specifically designed for autonomous driving. To the best of our knowledge, DriveGPT4 is the first work focusing on interpretable end-to-end autonomous driving. When evaluated on multiple tasks alongside conventional methods and video understanding LLMs, DriveGPT4 demonstrates superior qualitative and quantitative performance. Additionally, DriveGPT4 can be generalized in a zero-shot fashion to accommodate more unseen scenarios. The project page is available at https://tonyxuqaq.github.io/projects/DriveGPT4/ .

LEAP: Liberate Sparse-view 3D Modeling from Camera Poses

  • paper_url: http://arxiv.org/abs/2310.01410
  • repo_url: https://github.com/hwjiang1510/LEAP
  • paper_authors: Hanwen Jiang, Zhenyu Jiang, Yue Zhao, Qixing Huang
  • for: Are camera poses necessary for multi-view 3D modeling? Existing methods largely assume access to accurate poses, yet estimating them from sparse views is often unreliable, and our analysis shows that noisy pose estimates degrade existing sparse-view 3D modeling methods.
  • methods: We propose LEAP, a pose-free approach that challenges this assumption: it discards pose-based operations and learns geometric knowledge from data. LEAP uses a neural volume shared across scenes and parameterized to encode geometry and texture priors; for each incoming scene, the volume is updated by aggregating 2D image features in a feature-similarity-driven manner, and the updated volume is decoded into a radiance field for novel view synthesis from any viewpoint.
  • results: On object-centric and scene-level datasets, LEAP significantly outperforms prior methods when those methods use poses predicted by state-of-the-art estimators, performs on par with approaches that use ground-truth poses, and runs $400\times$ faster than PixelNeRF. LEAP also generalizes to novel object categories and scenes, and the knowledge it learns closely resembles epipolar geometry. Project page: https://hwjiang1510.github.io/LEAP/
    Abstract Are camera poses necessary for multi-view 3D modeling? Existing approaches predominantly assume access to accurate camera poses. While this assumption might hold for dense views, accurately estimating camera poses for sparse views is often elusive. Our analysis reveals that noisy estimated poses lead to degraded performance for existing sparse-view 3D modeling methods. To address this issue, we present LEAP, a novel pose-free approach, therefore challenging the prevailing notion that camera poses are indispensable. LEAP discards pose-based operations and learns geometric knowledge from data. LEAP is equipped with a neural volume, which is shared across scenes and is parameterized to encode geometry and texture priors. For each incoming scene, we update the neural volume by aggregating 2D image features in a feature-similarity-driven manner. The updated neural volume is decoded into the radiance field, enabling novel view synthesis from any viewpoint. On both object-centric and scene-level datasets, we show that LEAP significantly outperforms prior methods when they employ predicted poses from state-of-the-art pose estimators. Notably, LEAP performs on par with prior approaches that use ground-truth poses while running $400\times$ faster than PixelNeRF. We show LEAP generalizes to novel object categories and scenes, and learns knowledge closely resembles epipolar geometry. Project page: https://hwjiang1510.github.io/LEAP/

HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation

  • paper_url: http://arxiv.org/abs/2310.01406
  • repo_url: None
  • paper_authors: Xin Huang, Ruizhi Shao, Qi Zhang, Hongwen Zhang, Ying Feng, Yebin Liu, Qing Wang
  • for: High-quality and realistic 3D human generation.
  • methods: Fine-tunes a text-to-image diffusion model with normal maps to obtain a text-to-normal diffusion model, which improves the 2D perception of 3D geometry while preserving the priors learned from large-scale datasets; the resulting normal-adapted and normal-aligned diffusion models generate high-fidelity normal maps from view-dependent prompts and color images aligned with them.
  • results: The proposed HumanNorm generates 3D humans with intricate geometry and realistic appearance, and experiments show it significantly outperforms existing text-to-3D methods in both geometry and texture quality.
    Abstract Recent text-to-3D methods employing diffusion models have made significant advancements in 3D human generation. However, these approaches face challenges due to the limitations of the text-to-image diffusion model, which lacks an understanding of 3D structures. Consequently, these methods struggle to achieve high-quality human generation, resulting in smooth geometry and cartoon-like appearances. In this paper, we observed that fine-tuning text-to-image diffusion models with normal maps enables their adaptation into text-to-normal diffusion models, which enhances the 2D perception of 3D geometry while preserving the priors learned from large-scale datasets. Therefore, we propose HumanNorm, a novel approach for high-quality and realistic 3D human generation by learning the normal diffusion model including a normal-adapted diffusion model and a normal-aligned diffusion model. The normal-adapted diffusion model can generate high-fidelity normal maps corresponding to prompts with view-dependent text. The normal-aligned diffusion model learns to generate color images aligned with the normal maps, thereby transforming physical geometry details into realistic appearance. Leveraging the proposed normal diffusion model, we devise a progressive geometry generation strategy and coarse-to-fine texture generation strategy to enhance the efficiency and robustness of 3D human generation. Comprehensive experiments substantiate our method's ability to generate 3D humans with intricate geometry and realistic appearances, significantly outperforming existing text-to-3D methods in both geometry and texture quality. The project page of HumanNorm is https://humannorm.github.io/.

H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation

  • paper_url: http://arxiv.org/abs/2310.01404
  • repo_url: https://github.com/YanjieZe/H-InDex
  • paper_authors: Yanjie Ze, Yuyao Liu, Ruizhe Shi, Jiaxin Qin, Zhecheng Yuan, Jiashun Wang, Huazhe Xu
  • for: Solving difficult dexterous manipulation tasks, improving the dexterity and flexibility of robotic hands.
  • methods: A hand-informed visual representation learning framework with three stages: (i) pre-training representations with 3D human hand pose estimation, (ii) offline adaptation of the representations with self-supervised keypoint detection, and (iii) reinforcement learning with exponential moving average BatchNorm.
  • results: Empirical studies on 12 challenging dexterous manipulation tasks show that H-InDex largely surpasses strong baseline methods and recent visual foundation models for motor control.
    Abstract Human hands possess remarkable dexterity and have long served as a source of inspiration for robotic manipulation. In this work, we propose a human $\textbf{H}$and$\textbf{-In}$formed visual representation learning framework to solve difficult $\textbf{Dex}$terous manipulation tasks ($\textbf{H-InDex}$) with reinforcement learning. Our framework consists of three stages: (i) pre-training representations with 3D human hand pose estimation, (ii) offline adapting representations with self-supervised keypoint detection, and (iii) reinforcement learning with exponential moving average BatchNorm. The last two stages only modify $0.36\%$ parameters of the pre-trained representation in total, ensuring the knowledge from pre-training is maintained to the full extent. We empirically study 12 challenging dexterous manipulation tasks and find that H-InDex largely surpasses strong baseline methods and the recent visual foundation models for motor control. Code is available at https://yanjieze.com/H-InDex .
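
Stage (iii) uses exponential moving average BatchNorm during reinforcement learning. Below is a minimal sketch of one common realization: the pre-trained encoder weights stay frozen, while the BatchNorm running statistics drift slowly toward the RL observation distribution. The momentum value and freezing policy are assumptions for illustration.

```python
# Hedged sketch: frozen encoder whose BatchNorm statistics follow a slow EMA during RL.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())

for p in encoder.parameters():
    p.requires_grad_(False)                    # representation weights stay fixed

for m in encoder.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.momentum = 0.01                      # slow EMA of running mean/var
        m.train()                              # keep updating running stats during rollouts

obs = torch.randn(8, 3, 84, 84)                # a batch of RL observations
with torch.no_grad():
    feats = encoder(obs)                       # statistics adapt; weights do not
print(feats.shape)  # torch.Size([8, 16])
```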

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

  • paper_url: http://arxiv.org/abs/2310.01403
  • repo_url: https://github.com/wusize/clipself
  • paper_authors: Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy
  • for: This work studies how to adapt CLIP's image-level representations to local image regions, improving downstream open-vocabulary dense prediction tasks.
  • methods: Uses CLIP models incorporating vision transformers (ViTs) and transfers their vision-language alignment from global image representations to local image regions through self-distillation (CLIPSelf), without needing any region-text pairs.
  • results: Achieves new state-of-the-art performance on open-vocabulary object detection, semantic segmentation, and panoptic segmentation across various benchmarks, all without region-text pairs.
    Abstract Open-vocabulary dense prediction tasks including object detection and image segmentation have been advanced by the success of Contrastive Language-Image Pre-training (CLIP). CLIP models, particularly those incorporating vision transformers (ViTs), have exhibited remarkable generalization ability in zero-shot image classification. However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions. In this paper, we embark on an in-depth analysis of the region-language alignment in CLIP models, which is essential for downstream open-vocabulary dense prediction tasks. Subsequently, we propose an approach named CLIPSelf, which adapts the image-level recognition ability of CLIP ViT to local image regions without needing any region-text pairs. CLIPSelf empowers ViTs to distill itself by aligning a region representation extracted from its dense feature map with the image-level representation of the corresponding image crop. With the enhanced CLIP ViTs, we achieve new state-of-the-art performance on open-vocabulary object detection, semantic segmentation, and panoptic segmentation across various benchmarks. Models and code will be available at https://github.com/wusize/CLIPSelf.
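
A minimal sketch of the self-distillation objective described in the abstract: a region embedding pooled from the dense feature map of the full image is pulled toward the image-level embedding of the corresponding crop, with no region-text pairs involved. The tiny encoder below stands in for a CLIP ViT, and the pooling and loss details are simplified assumptions, not the released CLIPSelf code.

```python
# Hedged sketch of CLIPSelf-style region-to-crop self-distillation.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import roi_align

class TinyEncoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, dim, 8, stride=8), nn.ReLU())  # 1/8 "patch" features

    def dense(self, x):                        # dense feature map (analogue of ViT tokens)
        return self.backbone(x)

    def image_embedding(self, x):              # pooled image-level embedding
        return self.dense(x).mean(dim=(2, 3))

student_enc = TinyEncoder()
teacher_enc = TinyEncoder()
teacher_enc.load_state_dict(student_enc.state_dict())    # frozen copy acts as the teacher
for p in teacher_enc.parameters():
    p.requires_grad_(False)

image = torch.randn(1, 3, 224, 224)
box = torch.tensor([[0., 32., 32., 160., 160.]])          # (batch_idx, x0, y0, x1, y1)

# Student: pool the box out of the full-image dense feature map (stride 8).
dense = student_enc.dense(image)
region = roi_align(dense, box, output_size=(1, 1), spatial_scale=1 / 8).flatten(1)

# Teacher: embed the crop itself at full resolution.
with torch.no_grad():
    crop = F.interpolate(image[:, :, 32:160, 32:160], size=224, mode="bilinear", align_corners=False)
    target = teacher_enc.image_embedding(crop)

loss = 1 - F.cosine_similarity(region, target).mean()     # pull region embedding toward crop embedding
loss.backward()
```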

Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection

  • paper_url: http://arxiv.org/abs/2310.01401
  • repo_url: https://github.com/ymingxie/PARQ
  • paper_authors: Yiming Xie, Huaizu Jiang, Georgia Gkioxari, Julian Straub
  • for: Develops a multi-view 3D object detector based on transformers and pixel-aligned recurrent queries.
  • methods: Queries are initialized from reference points in 3D space and enhanced with appearance features, and their 3D locations are updated with recurrent cross-attention operations; incorporating pixel-aligned features and cross attention lets the model encode the necessary 3D-to-2D correspondences and capture global contextual information of the input images.
  • results: PARQ outperforms prior best methods on the ScanNet and ARKitScenes datasets, learns and detects faster, is more robust to distribution shifts in reference points, can leverage additional input views without retraining, and can adapt inference compute by changing the number of recurrent iterations.
    Abstract We present PARQ - a multi-view 3D object detector with transformer and pixel-aligned recurrent queries. Unlike previous works that use learnable features or only encode 3D point positions as queries in the decoder, PARQ leverages appearance-enhanced queries initialized from reference points in 3D space and updates their 3D location with recurrent cross-attention operations. Incorporating pixel-aligned features and cross attention enables the model to encode the necessary 3D-to-2D correspondences and capture global contextual information of the input images. PARQ outperforms prior best methods on the ScanNet and ARKitScenes datasets, learns and detects faster, is more robust to distribution shifts in reference points, can leverage additional input views without retraining, and can adapt inference compute by changing the number of recurrent iterations.

Sequential Data Generation with Groupwise Diffusion Process

  • paper_url: http://arxiv.org/abs/2310.01400
  • repo_url: None
  • paper_authors: Sangyun Lee, Gayoung Lee, Hyunsu Kim, Junho Kim, Youngjung Uh
  • for: Extends diffusion models by dividing data into groups and diffusing one group at a time, so that data is generated sequentially, group by group.
  • methods: Proposes the Groupwise Diffusion Model (GDM), which splits data into multiple groups and diffuses one group per time interval in the forward process, generating data sequentially from one group at each interval.
  • results: GDM generalizes certain forms of autoregressive models and cascaded diffusion models; because each group of the initial noise affects only a specific group of the generated data, the latent space acquires group-wise interpretable meaning, enabling applications such as disentangling semantic attributes, image editing, and generating variations.
    Abstract We present the Groupwise Diffusion Model (GDM), which divides data into multiple groups and diffuses one group at one time interval in the forward diffusion process. GDM generates data sequentially from one group at one time interval, leading to several interesting properties. First, as an extension of diffusion models, GDM generalizes certain forms of autoregressive models and cascaded diffusion models. As a unified framework, GDM allows us to investigate design choices that have been overlooked in previous works, such as data-grouping strategy and order of generation. Furthermore, since one group of the initial noise affects only a certain group of the generated data, latent space now possesses group-wise interpretable meaning. We can further extend GDM to the frequency domain where the forward process sequentially diffuses each group of frequency components. Dividing the frequency bands of the data as groups allows the latent variables to become a hierarchical representation where individual groups encode data at different levels of abstraction. We demonstrate several applications of such representation including disentanglement of semantic attributes, image editing, and generating variations.
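
A minimal numerical sketch of a groupwise forward process: the data is split into groups and each group is noised only during its own time interval, so the groups are diffused sequentially. The linear per-group schedule and the grouping below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a groupwise (staggered) forward diffusion process.
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 32))                  # 4 groups of 32 values each
intervals = [(0, 25), (25, 50), (50, 75), (75, 100)]   # group g diffuses during intervals[g]

def alpha_bar(t, g):
    """Cumulative signal level of group g at time t: 1 before its interval,
    decaying linearly to 0 inside it, 0 afterwards."""
    lo, hi = intervals[g]
    return float(np.clip((hi - t) / (hi - lo), 0.0, 1.0))

def forward_sample(x0, t):
    noisy = np.empty_like(x0)
    for g in range(x0.shape[0]):
        a = alpha_bar(t, g)
        noisy[g] = np.sqrt(a) * x0[g] + np.sqrt(1 - a) * rng.normal(size=x0.shape[1])
    return noisy

for t in (0, 40, 99):
    x_t = forward_sample(x0, t)
    # Correlation with the clean signal: earlier groups lose it first.
    print(t, [round(float(np.corrcoef(x0[g], x_t[g])[0, 1]), 2) for g in range(4)])
```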

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection

  • paper_url: http://arxiv.org/abs/2310.01393
  • repo_url: None
  • paper_authors: Shilin Xu, Xiangtai Li, Size Wu, Wenwei Zhang, Yining Li, Guangliang Cheng, Yunhai Tong, Kai Chen, Chen Change Loy
  • for: This work aims to improve open-vocabulary object detection (OVOD) so that models can detect categories unseen during training.
  • methods: Proposes a simple yet effective strategy that leverages the zero-shot classification ability of a pre-trained vision-language model (VLM) such as CLIP to classify proposals into all possible novel classes directly. Unlike previous methods that ignore novel classes during training and rely solely on the region proposal network (RPN), proposals are selectively filtered by specific design criteria and used as pseudo-labels for novel classes in a self-training scheme.
  • results: Experiments on three datasets (LVIS, V3Det, and COCO) show improvements over the baselines without extra parameters or inference cost: 1.7-2.0% on LVIS, 2.3-3.8% on the challenging V3Det dataset, and a 6% mAP boost on COCO.
    Abstract Open-vocabulary object detection (OVOD) aims to detect the objects beyond the set of categories observed during training. This work presents a simple yet effective strategy that leverages the zero-shot classification ability of pre-trained vision-language models (VLM), such as CLIP, to classify proposals for all possible novel classes directly. Unlike previous works that ignore novel classes during training and rely solely on the region proposal network (RPN) for novel object detection, our method selectively filters proposals based on specific design criteria. The resulting sets of identified proposals serve as pseudo-labels for novel classes during the training phase. It enables our self-training strategy to improve the recall and accuracy of novel classes in a self-training manner without requiring additional annotations or datasets. We further propose a simple offline pseudo-label generation strategy to refine the object detector. Empirical evaluations on three datasets, including LVIS, V3Det, and COCO, demonstrate significant improvements over the baseline performance without incurring additional parameters or computational costs during inference. In particular, compared with previous F-VLM, our method achieves a 1.7-2.0% improvement on LVIS dataset and 2.3-3.8% improvement on the recent challenging V3Det dataset. Our method also boosts the strong baseline by 6% mAP on COCO. The code and models will be publicly available at https://github.com/xushilin1/dst-det.
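
A minimal sketch of the pseudo-labeling step: crop each RPN proposal, score it against text embeddings of novel class names with a frozen VLM, and keep confident, reasonably sized boxes as pseudo-labels for self-training. The embedding functions below are placeholders for real CLIP calls, and the confidence and size thresholds are assumptions, not the paper's design criteria.

```python
# Hedged sketch of VLM-based pseudo-labeling of RPN proposals for novel classes.
import torch
import torch.nn.functional as F

def clip_image_embed(crops):                   # placeholder for a CLIP image encoder
    return F.normalize(torch.randn(crops.shape[0], 512), dim=-1)

def clip_text_embed(prompts):                  # placeholder for a CLIP text encoder
    return F.normalize(torch.randn(len(prompts), 512), dim=-1)

novel_classes = ["zebra", "kayak", "accordion"]
text_feats = clip_text_embed([f"a photo of a {c}" for c in novel_classes])

proposals = torch.tensor([[10., 10., 120., 200.], [5., 5., 20., 18.], [30., 40., 220., 210.]])
crops = torch.randn(len(proposals), 3, 224, 224)           # crops of the image at each proposal

sims = clip_image_embed(crops) @ text_feats.T              # similarity: proposals x novel classes
scores, labels = sims.max(dim=1)

pseudo_labels = []
for box, score, label in zip(proposals, scores, labels):
    w, h = box[2] - box[0], box[3] - box[1]
    if score > 0.25 and w * h > 1000:                       # assumed confidence / size filters
        pseudo_labels.append((box.tolist(), novel_classes[int(label)]))
print(pseudo_labels)   # with a real CLIP model, confident novel-class boxes survive here
```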

Towards Distribution-Agnostic Generalized Category Discovery

  • paper_url: http://arxiv.org/abs/2310.01376
  • repo_url: https://github.com/jianhongbai/bacon
  • paper_authors: Jianhong Bai, Zuozhu Liu, Hualiang Wang, Ruizhe Chen, Lianrui Mu, Xiaomeng Li, Joey Tianyi Zhou, Yang Feng, Jian Wu, Haoji Hu
  • for: This paper tackles data imbalance and open-ended distribution jointly, since the two co-occur in the real visual world.
  • methods: Formally defines the distribution-agnostic generalized category discovery (DA-GCD) task: generating fine-grained predictions for both close-set and open-set classes in a long-tailed open-world setting. To solve it, a Self-Balanced Co-Advice contrastive framework (BaCon) is proposed, consisting of a contrastive-learning branch and a pseudo-labeling branch that work collaboratively to provide interactive supervision.
  • results: Experiments comparing BaCon with state-of-the-art methods from imbalanced semi-supervised learning and generalized category discovery show that it outperforms all baselines, supported by comprehensive analysis across various datasets.
    Abstract Data imbalance and open-ended distribution are two intrinsic characteristics of the real visual world. Though encouraging progress has been made in tackling each challenge separately, few works dedicated to combining them towards real-world scenarios. While several previous works have focused on classifying close-set samples and detecting open-set samples during testing, it's still essential to be able to classify unknown subjects as human beings. In this paper, we formally define a more realistic task as distribution-agnostic generalized category discovery (DA-GCD): generating fine-grained predictions for both close- and open-set classes in a long-tailed open-world setting. To tackle the challenging problem, we propose a Self-Balanced Co-Advice contrastive framework (BaCon), which consists of a contrastive-learning branch and a pseudo-labeling branch, working collaboratively to provide interactive supervision to resolve the DA-GCD task. In particular, the contrastive-learning branch provides reliable distribution estimation to regularize the predictions of the pseudo-labeling branch, which in turn guides contrastive learning through self-balanced knowledge transfer and a proposed novel contrastive loss. We compare BaCon with state-of-the-art methods from two closely related fields: imbalanced semi-supervised learning and generalized category discovery. The effectiveness of BaCon is demonstrated with superior performance over all baselines and comprehensive analysis across various datasets. Our code is publicly available.

NEUCORE: Neural Concept Reasoning for Composed Image Retrieval

  • paper_url: http://arxiv.org/abs/2310.01358
  • repo_url: None
  • paper_authors: Shu Zhao, Huijuan Xu
  • for: This work targets composed image retrieval, where a reference image and a text modifier jointly identify the target image; existing methods focus on holistic multi-modal interaction modeling and overlook the composed, complementary nature of the reference image and text modifier.
  • methods: Moves multi-modal understanding to a fine-grained concept level and learns multi-modal concept alignment to locate the visual regions in the reference or target images that correspond to the text modifier. The proposed NEUCORE model combines multi-modal concept alignment, learned under a multiple-instance learning framework with image- and sentence-level weak supervision, with a progressive multi-modal fusion strategy over the aligned concepts to form discriminative features for accurate target image retrieval.
  • results: Evaluated on three datasets, the proposed approach achieves state-of-the-art results.
    Abstract Composed image retrieval which combines a reference image and a text modifier to identify the desired target image is a challenging task, and requires the model to comprehend both vision and language modalities and their interactions. Existing approaches focus on holistic multi-modal interaction modeling, and ignore the composed and complimentary property between the reference image and text modifier. In order to better utilize the complementarity of multi-modal inputs for effective information fusion and retrieval, we move the multi-modal understanding to fine-granularity at concept-level, and learn the multi-modal concept alignment to identify the visual location in reference or target images corresponding to text modifier. Toward the end, we propose a NEUral COncept REasoning (NEUCORE) model which incorporates multi-modal concept alignment and progressive multimodal fusion over aligned concepts. Specifically, considering that text modifier may refer to semantic concepts not existing in the reference image and requiring to be added into the target image, we learn the multi-modal concept alignment between the text modifier and the concatenation of reference and target images, under multiple-instance learning framework with image and sentence level weak supervision. Furthermore, based on aligned concepts, to form discriminative fusion features of the input modalities for accurate target image retrieval, we propose a progressive fusion strategy with unified execution architecture instantiated by the attended language semantic concepts. Our proposed approach is evaluated on three datasets and achieves state-of-the-art results.

Less is More: Toward Zero-Shot Local Scene Graph Generation via Foundation Models

  • paper_url: http://arxiv.org/abs/2310.01356
  • repo_url: None
  • paper_authors: Shu Zhao, Huijuan Xu
  • for: This work aims to improve the ability of computer vision systems to understand and reason about objects and their relationships in a scene.
  • methods: Introduces a new task, local scene graph generation, which abstracts pertinent structural information from partial objects and their relationships in an image into symbolic knowledge, and proposes zEro-shot Local scEne GrAph geNeraTion (ELEGANT), a framework in which foundation models collaborate and exchange information to achieve zero-shot local scene graph generation without labeled supervision.
  • results: The method markedly outperforms baselines in the open-ended evaluation setting and achieves a performance boost of up to 24.58% over prior methods in the close-set setting, demonstrating strong comprehension and reasoning ability.
    Abstract Humans inherently recognize objects via selective visual perception, transform specific regions from the visual field into structured symbolic knowledge, and reason their relationships among regions based on the allocation of limited attention resources in line with humans' goals. While it is intuitive for humans, contemporary perception systems falter in extracting structural information due to the intricate cognitive abilities and commonsense knowledge required. To fill this gap, we present a new task called Local Scene Graph Generation. Distinct from the conventional scene graph generation task, which encompasses generating all objects and relationships in an image, our proposed task aims to abstract pertinent structural information with partial objects and their relationships for boosting downstream tasks that demand advanced comprehension and reasoning capabilities. Correspondingly, we introduce zEro-shot Local scEne GrAph geNeraTion (ELEGANT), a framework harnessing foundation models renowned for their powerful perception and commonsense reasoning, where collaboration and information communication among foundation models yield superior outcomes and realize zero-shot local scene graph generation without requiring labeled supervision. Furthermore, we propose a novel open-ended evaluation metric, Entity-level CLIPScorE (ECLIPSE), surpassing previous closed-set evaluation metrics by transcending their limited label space, offering a broader assessment. Experiment results show that our approach markedly outperforms baselines in the open-ended evaluation setting, and it also achieves a significant performance boost of up to 24.58% over prior methods in the close-set setting, demonstrating the effectiveness and powerful reasoning ability of our proposed framework.

Streaming Motion Forecasting for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2310.01351
  • repo_url: https://github.com/ziqipang/streamingforecasting
  • paper_authors: Ziqi Pang, Deva Ramanan, Mengtian Li, Yu-Xiong Wang
  • for: Trajectory forecasting for autonomous driving; existing benchmarks do not account for the continuous data streams encountered in real applications, so this work forecasts directly on streaming data.
  • methods: A streaming forecasting benchmark that queries future trajectories at every timestamp, which surfaces the safety-critical problem of forecasting for occluded agents that snapshot-based benchmarks overlook, and which additionally requires temporal coherence between predictions from adjacent timestamps.
  • results: (1) a plug-and-play meta-algorithm converts any snapshot-based forecaster into a streaming forecaster and improves forecasting quality; (2) occlusion reasoning and temporal coherence strategies reduce endpoint errors for occluded agents by 25% and trajectory fluctuations by 10-20%; (3) the work brings motion forecasting into its natural streaming setting.
    Abstract Trajectory forecasting is a widely-studied problem for autonomous navigation. However, existing benchmarks evaluate forecasting based on independent snapshots of trajectories, which are not representative of real-world applications that operate on a continuous stream of data. To bridge this gap, we introduce a benchmark that continuously queries future trajectories on streaming data and we refer to it as "streaming forecasting." Our benchmark inherently captures the disappearance and re-appearance of agents, presenting the emergent challenge of forecasting for occluded agents, which is a safety-critical problem yet overlooked by snapshot-based benchmarks. Moreover, forecasting in the context of continuous timestamps naturally asks for temporal coherence between predictions from adjacent timestamps. Based on this benchmark, we further provide solutions and analysis for streaming forecasting. We propose a plug-and-play meta-algorithm called "Predictive Streamer" that can adapt any snapshot-based forecaster into a streaming forecaster. Our algorithm estimates the states of occluded agents by propagating their positions with multi-modal trajectories, and leverages differentiable filters to ensure temporal consistency. Both occlusion reasoning and temporal coherence strategies significantly improve forecasting quality, resulting in 25% smaller endpoint errors for occluded agents and 10-20% smaller fluctuations of trajectories. Our work is intended to generate interest within the community by highlighting the importance of addressing motion forecasting in its intrinsic streaming setting. Code is available at https://github.com/ziqipang/StreamingForecasting.
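
A rough sketch of how a snapshot forecaster might be wrapped for the streaming setting described above: occluded agents are kept alive by reusing their last forecast, and predictions are blended with the previous step's forecast for temporal coherence. The `snapshot_forecaster` interface, the pseudo-observation trick, and the exponential blend are assumptions standing in for the paper's multi-modal propagation and differentiable filters.

```python
import numpy as np

class StreamingForecaster:
    """Wraps a snapshot forecaster for streaming use: occluded agents are kept
    alive by propagating their last forecast, and predictions are smoothed
    across adjacent timestamps (a stand-in for a differentiable filter)."""

    def __init__(self, snapshot_forecaster, smooth=0.5):
        self.forecaster = snapshot_forecaster   # maps {id: past track} -> {id: (H, 2) future}
        self.smooth = smooth                    # blend weight enforcing temporal consistency
        self.last_pred = {}                     # agent_id -> (H, 2) forecast from the previous step

    def step(self, observations):
        """observations: {agent_id: (T, 2) observed past}; occluded agents are absent."""
        tracks = dict(observations)
        for aid, prev in self.last_pred.items():
            if aid not in tracks:               # occluded: use the previous forecast's first
                tracks[aid] = prev[:1]          # waypoint as a pseudo-observation
        preds = self.forecaster(tracks)
        for aid, traj in preds.items():
            if aid in self.last_pred:           # blend with the time-shifted previous forecast
                shifted = np.vstack([self.last_pred[aid][1:], self.last_pred[aid][-1:]])
                preds[aid] = self.smooth * traj + (1 - self.smooth) * shifted
        self.last_pred = preds
        return preds
```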

Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association

  • paper_url: http://arxiv.org/abs/2310.01330
  • repo_url: None
  • paper_authors: Qiyu Wu, Mengjie Zhao, Yutong He, Lang Huang, Junya Ono, Hiromi Wakaki, Yuki Mitsufuji
  • for: Mitigating reporting bias in visual-language datasets to improve object-attribute understanding and zero-shot retrieval performance.
  • methods: Bimodal augmentation (BiAug) approach through object-attribute decoupling, employing large language models (LLMs) and an inpainting model to synthesize visual-language examples with a rich array of object-attribute pairing and cross-modal hard negatives.
  • results: Superior object-attribute understanding and improved performance on zero-shot retrieval tasks on general benchmarks like MSCOCO and Flickr30K.
    Abstract Reporting bias arises when people assume that some knowledge is universally understood and hence does not necessitate explicit elaboration. In this paper, we focus on the wide existence of reporting bias in visual-language datasets, embodied as the object-attribute association, which can subsequently degrade models trained on them. To mitigate this bias, we propose a bimodal augmentation (BiAug) approach through object-attribute decoupling to flexibly synthesize visual-language examples with a rich array of object-attribute pairings and construct cross-modal hard negatives. We employ large language models (LLMs) in conjunction with a grounding object detector to extract target objects. Subsequently, the LLM generates a detailed attribute description for each object and produces a corresponding hard negative counterpart. An inpainting model is then used to create images based on these detailed object descriptions. By doing so, the synthesized examples explicitly complement omitted objects and attributes to learn from, and the hard negative pairs steer the model to distinguish object attributes. Our experiments demonstrate that BiAug is superior in object-attribute understanding. In addition, BiAug also improves the performance on zero-shot retrieval tasks on general benchmarks like MSCOCO and Flickr30K. BiAug refines the way of collecting text-image datasets. Mitigating the reporting bias helps models achieve a deeper understanding of visual-language phenomena, expanding beyond mere frequent patterns to encompass the richness and diversity of real-world scenarios.

ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video

  • paper_url: http://arxiv.org/abs/2310.01324
  • repo_url: https://github.com/leexinhao/ZeroI2V
  • paper_authors: Xinhao Li, Limin Wang
  • for: Transferring image models to video recognition tasks without full fine-tuning.
  • methods: Two core designs enable zero-cost transfer: spatial-temporal dual-headed attention (STDHA), which exploits the flexibility of self-attention to add temporal modeling with no extra parameters or computation, and a linear adaptation strategy that uses lightweight, densely placed linear adapters to fully transfer frozen image models to video recognition.
  • results: Extensive experiments on four widely used video recognition benchmarks show that ZeroI2V matches or surpasses previous state-of-the-art methods while enjoying superior parameter and inference efficiency.
    Abstract Adapting image models to the video domain is becoming an efficient paradigm for solving video recognition tasks. Due to the huge number of parameters and the effective transferability of image models, performing full fine-tuning is less efficient and even unnecessary. Thus, recent research is shifting its focus towards parameter-efficient image-to-video adaptation. However, these adaptation strategies inevitably introduce extra computational cost to deal with the domain gap and temporal modeling in videos. In this paper, our goal is to present a zero-cost adaptation paradigm (ZeroI2V) to transfer image transformers to video recognition tasks (i.e., introduce zero extra cost to the adapted models during inference). To achieve this goal, we present two core designs. First, to capture the dynamics in videos and reduce the difficulty of image-to-video adaptation, we exploit the flexibility of self-attention and introduce spatial-temporal dual-headed attention (STDHA), which efficiently endows image transformers with temporal modeling capability at zero extra parameters and computation. Second, to handle the domain gap between images and videos, we propose a linear adaptation strategy that utilizes lightweight, densely placed linear adapters to fully transfer the frozen image models to video recognition. Thanks to the customized linear design, all newly added adapters can be easily merged with the original modules through structural reparameterization after training, thus achieving zero extra cost during inference. Extensive experiments on four widely used video recognition benchmarks show that our ZeroI2V can match or even outperform previous state-of-the-art methods while enjoying superior parameter and inference efficiency.
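
A minimal sketch of the structural reparameterization idea behind the linear adapters: a low-rank adapter runs in parallel with a frozen linear layer during training and is folded into a single weight matrix afterwards, so inference carries no extra cost. The rank, initialization, and placement are assumptions; the paper's STDHA attention is not shown.

```python
import torch
import torch.nn as nn

class LinearWithAdapter(nn.Module):
    """A frozen linear layer plus a parallel low-rank linear adapter. merge()
    folds the adapter into the original weight (structural reparameterization),
    so inference runs a single nn.Linear with zero extra cost."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                  # frozen image-model weights
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)               # adapter starts as an identity mapping

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        merged = nn.Linear(self.base.in_features, self.base.out_features,
                           bias=self.base.bias is not None)
        merged.weight.copy_(self.base.weight + self.up.weight @ self.down.weight)
        if self.base.bias is not None:
            merged.bias.copy_(self.base.bias)
        return merged

# sanity check: the merged layer reproduces the adapter-augmented output
layer = LinearWithAdapter(nn.Linear(768, 768), rank=8)
x = torch.randn(2, 768)
assert torch.allclose(layer(x), layer.merge()(x), atol=1e-5)
```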

Color and Texture Dual Pipeline Lightweight Style Transfer

  • paper_url: http://arxiv.org/abs/2310.01321
  • repo_url: None
  • paper_authors: ShiQi Jiang
  • for: Improving the effectiveness and efficiency of style transfer, and adding texture structures with controllable intensity to color transfer results.
  • methods: A dual-pipeline approach that simultaneously outputs color and texture transfer results, with a masked total variation loss to suppress artifacts and small texture representations.
  • results: In comparative experiments, CTDP achieves state-of-the-art performance in both color and texture transfer, and the color transfer branch has a model size of only about 20k, 100-1500 times smaller than other state-of-the-art models.
    Abstract Style transfer methods typically generate a single stylized output that couples color and texture from the reference style, and color transfer schemes may introduce distortion or artifacts when processing reference images with duplicate textures. To solve this problem, we propose a Color and Texture Dual Pipeline Lightweight Style Transfer (CTDP) method, which employs a dual-pipeline design to simultaneously output the results of color and texture transfer. Furthermore, we design a masked total variation loss to suppress artifacts and small texture representations in color transfer results without affecting the semantic part of the content. More importantly, we are able, for the first time, to add texture structures with controllable intensity to color transfer results. Finally, we conduct a feature visualization analysis of the texture generation mechanism of the framework and find that smoothing the input image can almost completely eliminate this texture structure. In comparative experiments, the color and texture transfer results generated by CTDP both achieve state-of-the-art performance. Additionally, the color transfer branch has a model size as low as 20k parameters, which is 100-1500 times smaller than that of other state-of-the-art models.
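
A minimal sketch of a masked total variation loss of the kind described above, penalizing local variation only inside a user-provided mask so that semantic content regions are left untouched; the exact masking and weighting used by CTDP may differ.

```python
import torch

def masked_tv_loss(img: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Total variation computed only where mask == 1, so smoothing targets
    artifact-prone regions while semantic content regions are not penalized.
    img: (B, C, H, W); mask: (B, 1, H, W) with values in {0, 1}."""
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs()      # vertical differences
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs()      # horizontal differences
    mh = mask[:, :, 1:, :] * mask[:, :, :-1, :]             # only pixel pairs fully inside the mask
    mw = mask[:, :, :, 1:] * mask[:, :, :, :-1]
    return (dh * mh).sum() / mh.sum().clamp(min=1) + (dw * mw).sum() / mw.sum().clamp(min=1)
```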

Efficient Remote Sensing Segmentation With Generative Adversarial Transformer

  • paper_url: http://arxiv.org/abs/2310.01292
  • repo_url: None
  • paper_authors: Luyi Qiu, Dayu Yu, Xiaofeng Zhang, Chenxiao Zhang
  • for: High-precision semantic segmentation of remote sensing imagery that is efficient enough for embedded devices.
  • methods: A Global Transformer Network (GTNet) generator efficiently extracts multi-level features through residual connections; GTNet uses global transformer blocks with progressively linear computational complexity to reassign global features based on a learnable similarity function.
  • results: Extensive experiments on the Vaihingen dataset achieve an average F1 score of 90.17% and an overall accuracy of 91.92%.
    Abstract Most deep learning methods that achieve high segmentation accuracy require deep network architectures that are too heavy and complex to run on embedded devices with limited storage and memory space. To address this issue, this paper proposes an efficient Generative Adversarial Transformer (GATrans) for achieving high-precision semantic segmentation while maintaining an extremely efficient size. The framework utilizes a Global Transformer Network (GTNet) as the generator, efficiently extracting multi-level features through residual connections. GTNet employs global transformer blocks with progressively linear computational complexity to reassign global features based on a learnable similarity function. To focus on object-level and pixel-level information, GATrans optimizes the objective function by combining structural similarity losses. We validate the effectiveness of our approach through extensive experiments on the Vaihingen dataset, achieving an average F1 score of 90.17% and an overall accuracy of 91.92%.

3DHR-Co: A Collaborative Test-time Refinement Framework for In-the-Wild 3D Human-Body Reconstruction Task

  • paper_url: http://arxiv.org/abs/2310.01291
  • repo_url: None
  • paper_authors: Jonathan Samuel Lumentut, Kyoung Mu Lee
  • for: Improving the accuracy and robustness of the 3D human-body reconstruction (3DHR) task, particularly for the diverse human poses and shapes found in real-world, in-the-wild scenes.
  • methods: A collaborative strategy with two parts, pre-adaptation and test-time adaptation, that boosts the performance of common 3DHR models across diverse scenes.
  • results: Experiments show that the method significantly improves the accuracy of common 3DHR backbones across diverse poses and shapes, suppressing pose error by up to 34 mm.
    Abstract The field of 3D human-body reconstruction (abbreviated as 3DHR) that utilizes parametric pose and shape representations has witnessed significant advancements in recent years. However, the application of 3DHR techniques to handle real-world, diverse scenes, known as in-the-wild data, still faces limitations. The primary challenge arises as curating accurate 3D human pose ground truth (GT) for in-the-wild scenes is still difficult to obtain due to various factors. Recent test-time refinement approaches on 3DHR leverage initial 2D off-the-shelf human keypoints information to support the lack of 3D supervision on in-the-wild data. However, we observed that additional 2D supervision alone could cause the overfitting issue on common 3DHR backbones, making the 3DHR test-time refinement task seem intractable. We answer this challenge by proposing a strategy that complements 3DHR test-time refinement work under a collaborative approach. Specifically, we initially apply a pre-adaptation approach that works by collaborating various 3DHR models in a single framework to directly improve their initial outputs. This approach is then further combined with the test-time adaptation work under specific settings that minimize the overfitting issue to further boost the 3DHR performance. The whole framework is termed as 3DHR-Co, and on the experiment sides, we showed that the proposed work can significantly enhance the scores of common classic 3DHR backbones up to -34 mm pose error suppression, putting them among the top list on the in-the-wild benchmark data. Such achievement shows that our approach helps unveil the true potential of the common classic 3DHR backbones. Based on these findings, we further investigate various settings on the proposed framework to better elaborate the capability of our collaborative approach in the 3DHR task.

Offline Tracking with Object Permanence

  • paper_url: http://arxiv.org/abs/2310.01288
  • repo_url: None
  • paper_authors: Xianzhong Liu, Holger Caesar
  • for: Reducing the expensive labor cost of manually labeling autonomous driving datasets by improving automatic offline annotation.
  • methods: An offline tracking model for occluded objects with three parts: a standard online tracker, a re-identification (Re-ID) module that associates tracklets before and after occlusion, and a track completion module that completes fragmented tracks; the Re-ID and track completion modules take a vectorized map as one of their inputs to refine tracking results under occlusion.
  • results: The model effectively recovers occluded object trajectories and achieves state-of-the-art 3D multi-object tracking, improving over the original online tracking result by 45% IDS and 2% AMOTA on vehicle tracks.
    Abstract To reduce the expensive labor cost for manual labeling autonomous driving datasets, an alternative is to automatically label the datasets using an offline perception system. However, objects might be temporally occluded. Such occlusion scenarios in the datasets are common yet underexplored in offline autolabeling. In this work, we propose an offline tracking model that focuses on occluded object tracks. It leverages the concept of object permanence which means objects continue to exist even if they are not observed anymore. The model contains three parts: a standard online tracker, a re-identification (Re-ID) module that associates tracklets before and after occlusion, and a track completion module that completes the fragmented tracks. The Re-ID module and the track completion module use the vectorized map as one of the inputs to refine the tracking results with occlusion. The model can effectively recover the occluded object trajectories. It achieves state-of-the-art performance in 3D multi-object tracking by improving over the original online tracking result by 45% IDS and 2% AMOTA on the vehicle tracks.
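
A rough sketch of how tracklets before and after an occlusion could be associated, using appearance embeddings plus an endpoint-distance term and Hungarian matching; the cost weights and thresholds are illustrative assumptions, and the paper's map-conditioned Re-ID and track completion modules are not reproduced here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_tracklets(pre_embs, post_embs, pre_ends, post_starts,
                        max_gap_dist=10.0, min_sim=0.5):
    """Match tracklets that ended before an occlusion to tracklets that start after it.
    pre_embs/post_embs: (N, D) and (M, D) L2-normalized appearance embeddings.
    pre_ends/post_starts: (N, 2) and (M, 2) last/first xy positions of each tracklet."""
    sim = pre_embs @ post_embs.T                                   # appearance affinity
    dist = np.linalg.norm(pre_ends[:, None, :] - post_starts[None, :, :], axis=-1)
    cost = -sim + 0.1 * dist                                       # combined association cost
    rows, cols = linear_sum_assignment(cost)                       # Hungarian matching
    return [(r, c) for r, c in zip(rows, cols)
            if sim[r, c] >= min_sim and dist[r, c] <= max_gap_dist]  # stitch these pairs
```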

MobileNVC: Real-time 1080p Neural Video Compression on a Mobile Device

  • paper_url: http://arxiv.org/abs/2310.01258
  • repo_url: None
  • paper_authors: Ties van Rozendaal, Tushar Singhal, Hoang Le, Guillaume Sautiere, Amir Said, Krishna Buska, Anjuman Raha, Dimitris Kalatzis, Hitarth Mehta, Frank Mayer, Liang Zhang, Markus Nagel, Auke Wiggers
  • for: A neural video codec that runs on a mobile device and is competitive with standard codecs in the low-delay setting.
  • methods: Two main contributions: an efficient codec that uses the block-based motion compensation algorithm available on the warping core of the mobile accelerator, with the model quantized to integer precision, and a fast decoder pipeline that concurrently runs neural network components on the neural signal processor, entropy coding on the mobile GPU, and warping on the warping core.
  • results: Compared with the previous on-device codec, the codec achieves large BD-rate savings (up to 48%) and a 10x reduction in receiver-side MAC count; a careful ablation demonstrates the effect of the introduced motion compensation scheme.
    Abstract Neural video codecs have recently become competitive with standard codecs such as HEVC in the low-delay setting. However, most neural codecs are large floating-point networks that use pixel-dense warping operations for temporal modeling, making them too computationally expensive for deployment on mobile devices. Recent work has demonstrated that running a neural decoder in real time on mobile is feasible, but shows this only for 720p RGB video, while the YUV420 format is more commonly used in production. This work presents the first neural video codec that decodes 1080p YUV420 video in real time on a mobile device. Our codec relies on two major contributions. First, we design an efficient codec that uses a block-based motion compensation algorithm available on the warping core of the mobile accelerator, and we show how to quantize this model to integer precision. Second, we implement a fast decoder pipeline that concurrently runs neural network components on the neural signal processor, parallel entropy coding on the mobile GPU, and warping on the warping core. Our codec outperforms the previous on-device codec by a large margin with up to 48 % BD-rate savings, while reducing the MAC count on the receiver side by 10x. We perform a careful ablation to demonstrate the effect of the introduced motion compensation scheme, and ablate the effect of model quantization.

Generating 3D Brain Tumor Regions in MRI using Vector-Quantization Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2310.01251
  • repo_url: None
  • paper_authors: Meng Zhou, Matthias W Wagner, Uri Tabori, Cynthia Hawkins, Birgit B Ertl-Wagner, Farzad Khalvati
  • for: Augmenting training data for deep-learning medical image analysis, in particular by using generative adversarial networks (GANs) to generate realistic and diverse images.
  • methods: A framework combining a vector-quantization GAN and a transformer with masked token modeling to generate high-resolution, diverse 3D brain tumor ROIs that can be used directly as augmented data for brain tumor classification.
  • results: Applied to two imbalanced datasets, the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2019 dataset and an internal pediatric LGG (pLGG) dataset, the method improves AUC over baseline models by 6.4% on BraTS 2019 and 4.3% on the internal pLGG dataset, suggesting that the generated tumor ROIs effectively address the imbalanced-data problem and may help diagnose rare brain tumors.
    Abstract Medical image analysis has significantly benefited from advancements in deep learning, particularly in the application of Generative Adversarial Networks (GANs) for generating realistic and diverse images that can augment training datasets. However, the effectiveness of such approaches is often limited by the amount of available data in clinical settings. Additionally, the common GAN-based approach is to generate entire image volumes, rather than solely the region of interest (ROI). Research on deep learning-based brain tumor classification using MRI has shown that it is easier to classify the tumor ROIs compared to the entire image volumes. In this work, we present a novel framework that uses vector-quantization GAN and a transformer incorporating masked token modeling to generate high-resolution and diverse 3D brain tumor ROIs that can be directly used as augmented data for the classification of brain tumor ROI. We apply our method to two imbalanced datasets where we augment the minority class: (1) the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2019 dataset to generate new low-grade glioma (LGG) ROIs to balance with high-grade glioma (HGG) class; (2) the internal pediatric LGG (pLGG) dataset tumor ROIs with BRAF V600E Mutation genetic marker to balance with BRAF Fusion genetic marker class. We show that the proposed method outperforms various baseline models in both qualitative and quantitative measurements. The generated data was used to balance the data in the brain tumor types classification task. Using the augmented data, our approach surpasses baseline models by 6.4% in AUC on the BraTS 2019 dataset and 4.3% in AUC on our internal pLGG dataset. The results indicate the generated tumor ROIs can effectively address the imbalanced data problem. Our proposed method has the potential to facilitate an accurate diagnosis of rare brain tumors using MRI scans.

Mirror Diffusion Models for Constrained and Watermarked Generation

  • paper_url: http://arxiv.org/abs/2310.01236
  • repo_url: None
  • paper_authors: Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou, Molei Tao
  • for: Building diffusion models that generate data on constrained sets while retaining tractability.
  • methods: Mirror Diffusion Models (MDM), which learn diffusion processes in a dual space constructed from a mirror map, so that generation on convex constrained sets remains tractable.
  • results: MDM generates data on constrained sets while retaining tractability, and the approach can also embed invisible, quantitative watermarks in generated data for safety and privacy purposes.
    Abstract Modern successes of diffusion models in learning complex, high-dimensional data distributions are attributed, in part, to their capability to construct diffusion processes with analytic transition kernels and score functions. The tractability results in a simulation-free framework with stable regression losses, from which reversed, generative processes can be learned at scale. However, when data is confined to a constrained set as opposed to a standard Euclidean space, these desirable characteristics appear to be lost based on prior attempts. In this work, we propose Mirror Diffusion Models (MDM), a new class of diffusion models that generate data on convex constrained sets without losing any tractability. This is achieved by learning diffusion processes in a dual space constructed from a mirror map, which, crucially, is a standard Euclidean space. We derive efficient computation of mirror maps for popular constrained sets, such as simplices and $\ell_2$-balls, showing significantly improved performance of MDM over existing methods. For safety and privacy purposes, we also explore constrained sets as a new mechanism to embed invisible but quantitative information (i.e., watermarks) in generated data, for which MDM serves as a compelling approach. Our work brings new algorithmic opportunities for learning tractable diffusion on complex domains.
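
To make the mirror-map idea concrete, here is a minimal sketch for the probability simplex: the entropic mirror map sends simplex points to an unconstrained Euclidean space where a standard diffusion model can operate, and the softmax pulls samples back onto the simplex. This is a textbook example consistent with the abstract, not the authors' exact construction.

```python
import numpy as np

def mirror_map_simplex(x, eps=1e-12):
    """Entropic mirror map for the probability simplex: grad phi(x) = log(x).
    It maps points of the open simplex to an unconstrained Euclidean space,
    where a standard diffusion model can be trained and sampled."""
    return np.log(np.clip(x, eps, None))

def inverse_mirror_map_simplex(y):
    """Inverse map (softmax): pulls a sample generated in the dual Euclidean
    space back onto the simplex, so the constraint holds by construction."""
    z = np.exp(y - y.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

# round-trip sanity check on a point of the simplex
x = np.array([0.1, 0.2, 0.3, 0.4])
assert np.allclose(inverse_mirror_map_simplex(mirror_map_simplex(x)), x)
```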

Reconstructing 3D Human Pose from RGB-D Data with Occlusions

  • paper_url: http://arxiv.org/abs/2310.01228
  • repo_url: https://github.com/DangBowen-Bell/Occlusion_HPR
  • paper_authors: Bowen Dang, Xi Zhao, Bowen Zhang, He Wang
  • for: Reconstructing 3D human bodies from RGB-D images under occlusion.
  • methods: A neural network estimates a "free zone" within which poses of occluded body parts can be searched without penetrating the scene, while the visible body parts are constrained using the "truncated shadow volume" of the scanned body point cloud.
  • results: Experiments on the PROX dataset produce more accurate and plausible reconstructions than other methods.
    Abstract We propose a new method to reconstruct the 3D human body from RGB-D images with occlusions. The foremost challenge is the incompleteness of the RGB-D data due to occlusions between the body and the environment, leading to implausible reconstructions that suffer from severe human-scene penetration. To reconstruct a semantically and physically plausible human body, we propose to reduce the solution space based on scene information and prior knowledge. Our key idea is to constrain the solution space of the human body by considering the occluded body parts and visible body parts separately: modeling all plausible poses where the occluded body parts do not penetrate the scene, and constraining the visible body parts using depth data. Specifically, the first component is realized by a neural network that estimates the candidate region named the "free zone", a region carved out of the open space within which it is safe to search for poses of the invisible body parts without concern for penetration. The second component constrains the visible body parts using the "truncated shadow volume" of the scanned body point cloud. Furthermore, we propose to use a volume matching strategy, which yields better performance than surface matching, to match the human body with the confined region. We conducted experiments on the PROX dataset, and the results demonstrate that our method produces more accurate and plausible results compared with other methods.

Making LLaMA SEE and Draw with SEED Tokenizer

  • paper_url: http://arxiv.org/abs/2310.01218
  • repo_url: https://github.com/ailab-cvc/seed
  • paper_authors: Yuying Ge, Sijie Zhao, Ziyun Zeng, Yixiao Ge, Chen Li, Xintao Wang, Ying Shan
  • for: Advancing large language models (LLMs) toward handling multimodal information and exhibiting emergent abilities in an open-world context.
  • methods: A new image tokenizer, SEED, lets LLMs perform multimodal autoregressive prediction under their original training recipe; SEED tokens have a 1D causal dependency so that images and text can be predicted within the same left-to-right mechanism.
  • results: SEED tokens enable scalable multimodal autoregression; the resulting SEED-LLaMA performs impressively on a broad range of multimodal comprehension and generation tasks and exhibits compositional emergent abilities such as multi-turn in-context multimodal generation.
    Abstract The great success of Large Language Models (LLMs) has expanded the potential of multimodality, contributing to the gradual evolution of General Artificial Intelligence (AGI). A true AGI agent should not only possess the capability to perform predefined multi-tasks but also exhibit emergent abilities in an open-world context. However, despite the considerable advancements made by recent multimodal LLMs, they still fall short in effectively unifying comprehension and generation tasks, let alone open-world emergent abilities. We contend that the key to overcoming the present impasse lies in enabling text and images to be represented and processed interchangeably within a unified autoregressive Transformer. To this end, we introduce SEED, an elaborate image tokenizer that empowers LLMs with the ability to SEE and Draw at the same time. We identify two crucial design principles: (1) Image tokens should be independent of 2D physical patch positions and instead be produced with a 1D causal dependency, exhibiting intrinsic interdependence that aligns with the left-to-right autoregressive prediction mechanism in LLMs. (2) Image tokens should capture high-level semantics consistent with the degree of semantic abstraction in words, and be optimized for both discriminativeness and reconstruction during the tokenizer training phase. With SEED tokens, LLM is able to perform scalable multimodal autoregression under its original training recipe, i.e., next-word prediction. SEED-LLaMA is therefore produced by large-scale pretraining and instruction tuning on the interleaved textual and visual data, demonstrating impressive performance on a broad range of multimodal comprehension and generation tasks. More importantly, SEED-LLaMA has exhibited compositional emergent abilities such as multi-turn in-context multimodal generation, acting like your AI assistant.

Towards Robust Cardiac Segmentation using Graph Convolutional Networks

  • paper_url: http://arxiv.org/abs/2310.01210
  • repo_url: https://github.com/gillesvntnu/graphbasedsegmentation
  • paper_authors: Gilles Van De Vyver, Sarina Thomas, Guy Ben-Yosef, Sindre Hellum Olaisen, Håvard Dalen, Lasse Løvstakken, Erik Smistad
  • for: Improving the robustness of deep-learning cardiac structure segmentation in echocardiography.
  • methods: A graph convolutional network (GCN) predicts the contour points of cardiac structures instead of labeling each pixel, using two convolutional rings based on cardiac anatomy.
  • results: The GCN eliminates large anatomically incorrect outliers in multi-structure segmentation and performs well on the public CAMUS dataset; the work also includes an ablation study, an evaluation of clinical measurements on the clinical HUNT4 dataset, and a proposal to use the inter-model agreement between the U-Net and the graph network to predict input and segmentation quality.
    Abstract Fully automatic cardiac segmentation can be a fast and reproducible method to extract clinical measurements from an echocardiography examination. The U-Net architecture is the current state-of-the-art deep learning architecture for medical segmentation and can segment cardiac structures in real-time with average errors comparable to inter-observer variability. However, this architecture still generates large outliers that are often anatomically incorrect. This work uses the concept of graph convolutional neural networks that predict the contour points of the structures of interest instead of labeling each pixel. We propose a graph architecture that uses two convolutional rings based on cardiac anatomy and show that this eliminates anatomical incorrect multi-structure segmentations on the publicly available CAMUS dataset. Additionally, this work contributes with an ablation study on the graph convolutional architecture and an evaluation of clinical measurements on the clinical HUNT4 dataset. Finally, we propose to use the inter-model agreement of the U-Net and the graph network as a predictor of both the input and segmentation quality. We show this predictor can detect out-of-distribution and unsuitable input images in real-time. Source code is available online: https://github.com/gillesvntnu/GCN_multistructure
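
A minimal sketch of the inter-model agreement idea: Dice overlap between the pixel-wise U-Net mask and a mask rasterized from the GCN contours serves as a run-time quality score, with a threshold (a hypothetical value here) used to flag out-of-distribution or otherwise unsuitable inputs.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray, eps: float = 1e-6) -> float:
    """Dice overlap between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return float((2 * inter + eps) / (a.sum() + b.sum() + eps))

def agreement_score(unet_mask: np.ndarray, gcn_mask: np.ndarray) -> float:
    """Inter-model agreement between the U-Net mask and the mask rasterized
    from the GCN contour; low agreement flags unreliable inputs in real time."""
    return dice(unet_mask > 0, gcn_mask > 0)

THRESHOLD = 0.85  # hypothetical value, to be tuned on held-out data

def is_reliable(unet_mask: np.ndarray, gcn_mask: np.ndarray) -> bool:
    return agreement_score(unet_mask, gcn_mask) >= THRESHOLD
```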

Self-distilled Masked Attention guided masked image modeling with noise Regularized Teacher (SMART) for medical image analysis

  • paper_url: http://arxiv.org/abs/2310.01209
  • repo_url: None
  • paper_authors: Jue Jiang, Harini Veeraraghavan
  • for: Proposing a self-distillation-based masked image modeling (MIM) pretraining method to improve the transferability and accuracy of Swin models in medical image analysis.
  • methods: Hierarchical shifted window transformers (Swin) are pretrained with attention-guided MIM using self-distillation and a teacher-student scheme with a noise-injected momentum teacher.
  • results: Without fine-tuning, the pretrained SMART model performs strongly on multiple downstream tasks, including predicting immunotherapy response (Task I), predicting tumor recurrence (Task II), lung cancer segmentation (Task III), and unsupervised clustering of organs (Task IV), demonstrating excellent transferability and accuracy in medical image analysis.
    Abstract Hierarchical shifted window transformers (Swin) are a computationally efficient and more accurate alternative to plain vision transformers. Masked image modeling (MIM)-based pretraining is highly effective in increasing models' transferability to a variety of downstream tasks. However, more accurate and efficient attention-guided MIM approaches are difficult to implement with Swin due to its lack of an explicit global attention. We thus architecturally enhanced Swin with semantic class attention for self-supervised attention-guided co-distillation with MIM. We also introduced a noise-injected momentum teacher, implemented with patch dropout of the teacher's inputs for improved training regularization and accuracy. Our approach, called self-distilled masked attention MIM with noise regularized teacher (SMART), was pretrained with 10,412 unlabeled 3D computed tomography (CT) scans of multiple disease sites sourced from institutional and public datasets. We evaluated SMART on multiple downstream tasks involving analysis of 3D CTs of lung cancer (LC) patients: (i) [Task I] predicting immunotherapy response in advanced stage LC (n = 200, internal dataset), (ii) [Task II] predicting LC recurrence in early stage LC before surgery (n = 156, public dataset), (iii) [Task III] LC segmentation (n = 200 internal, 21 public), and (iv) [Task IV] unsupervised clustering of organs in the chest and abdomen (n = 1,743, public dataset) without finetuning. SMART predicted immunotherapy response with an AUC of 0.916, LC recurrence with an AUC of 0.793, segmented LC with a Dice accuracy of 0.81, and clustered organs with an inter-class cluster distance of 5.94, indicating the capability of attention-guided MIM for Swin in medical image analysis.
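
A minimal sketch of the noise-injected momentum teacher: an exponential moving average (EMA) update keeps the teacher close to the student, while patch dropout of the teacher's input tokens provides the noise regularization. The momentum value, drop ratio, and token layout are assumptions; the semantic class attention and co-distillation losses of SMART are not shown.

```python
import torch

@torch.no_grad()
def update_momentum_teacher(teacher, student, m: float = 0.996) -> None:
    """EMA update: teacher weights slowly track the student (momentum teacher)."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

def patch_dropout(tokens: torch.Tensor, drop_ratio: float = 0.25) -> torch.Tensor:
    """Randomly drop a fraction of the teacher's input patch tokens; the surviving
    tokens act as the noise-regularized view fed to the teacher.
    tokens: (B, N, D) patch embeddings."""
    B, N, _ = tokens.shape
    keep = max(1, int(N * (1 - drop_ratio)))
    idx = torch.rand(B, N, device=tokens.device).argsort(dim=1)[:, :keep]  # random subset per sample
    return torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
```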

Cross-adversarial local distribution regularization for semi-supervised medical image segmentation

  • paper_url: http://arxiv.org/abs/2310.01176
  • repo_url: https://github.com/PotatoThanh/Cross-adversarial-local-distribution-regularization
  • paper_authors: Thanh Nguyen-Duc, Trung Le, Roland Bammer, He Zhao, Jianfei Cai, Dinh Phung
  • for: Semi-supervised medical image segmentation with limited annotated data.
  • methods: A novel cross-adversarial local distribution (Cross-ALD) regularization that further enforces the smoothness assumption for the semi-supervised medical image segmentation task.
  • results: Comprehensive experiments show that Cross-ALD achieves state-of-the-art performance against many recent methods on the public LA and ACDC datasets.
    Abstract Medical semi-supervised segmentation is a technique where a model is trained to segment objects of interest in medical images with limited annotated data. Existing semi-supervised segmentation methods are usually based on the smoothness assumption. This assumption implies that the model output distributions of two similar data samples are encouraged to be invariant. In other words, the smoothness assumption states that similar samples (e.g., adding small perturbations to an image) should have similar outputs. In this paper, we introduce a novel cross-adversarial local distribution (Cross-ALD) regularization to further enhance the smoothness assumption for semi-supervised medical image segmentation task. We conducted comprehensive experiments that the Cross-ALD archives state-of-the-art performance against many recent methods on the public LA and ACDC datasets.

Segment Any Building

  • paper_url: http://arxiv.org/abs/2310.01164
  • repo_url: https://github.com/SOYJUN/Application-with-raw-IP-sockets
  • paper_authors: Lei Li
  • for: Improving the accuracy and efficiency of building segmentation in remote sensing imagery for applications such as urban planning, disaster mitigation, and ecological monitoring.
  • methods: Multiple datasets are strategically combined and paired with cutting-edge representation learning for building segmentation; the joint training regimen expands the information available to the model.
  • results: The combined-dataset training achieves strong building segmentation performance across multiple datasets, providing a solid foundation for subsequent research and pointing toward innovative applications of building segmentation.
    Abstract The task of identifying and segmenting buildings within remote sensing imagery has perennially stood at the forefront of scholarly investigations. This manuscript accentuates the potency of harnessing diversified datasets in tandem with cutting-edge representation learning paradigms for building segmentation in such images. Through the strategic amalgamation of disparate datasets, we have not only expanded the informational horizon accessible for model training but also manifested unparalleled performance metrics across multiple datasets. Our avant-garde joint training regimen underscores the merit of our approach, bearing significant implications in pivotal domains such as urban infrastructural development, disaster mitigation strategies, and ecological surveillance. Our methodology, predicated upon the fusion of datasets and gleaning insights from pre-trained models, carves a new benchmark in the annals of building segmentation endeavors. The outcomes of this research both fortify the foundations for ensuing scholarly pursuits and presage a horizon replete with innovative applications in the discipline of building segmentation.

Iterative Semi-Supervised Learning for Abdominal Organs and Tumor Segmentation

  • paper_url: http://arxiv.org/abs/2310.01159
  • repo_url: https://github.com/ustguy/flare23
  • paper_authors: Jiaxin Zhuang, Luyang Luo, Zhixuan Chen, Linshan Wu
  • for: Improving deep-learning segmentation of abdominal organs and tumors in Computed Tomography (CT) scans.
  • methods: A Semi-Supervised Learning (SSL) strategy with iterative Pseudo Labeling (PL): a deep model (nn-UNet) trained on the fully annotated subset generates pseudo labels for the whole dataset, which are then used to train a more powerful segmentation model.
  • results: On the FLARE23 online validation leaderboard, the method achieves an average DSC of 89.63% for organs and 46.07% for tumors (organ DSC 0.9007, NSD 0.9493; tumor DSC 0.3785, NSD 0.2842); code is available at https://github.com/USTguy/Flare23.
    Abstract Deep-learning (DL) based methods are playing an important role in the task of abdominal organ and tumor segmentation in CT scans. However, the large requirement for annotated datasets heavily limits their development. The FLARE23 challenge provides a large-scale dataset with both partially and fully annotated data, and focuses on both segmentation accuracy and computational efficiency. In this study, we propose to use the strategy of Semi-Supervised Learning (SSL) and iterative pseudo labeling to address FLARE23. Initially, a deep model (nn-UNet) trained on datasets with complete organ annotations (about 220 scans) generates pseudo labels for the whole dataset. These pseudo labels are then employed to train a more powerful segmentation model. Employing the FLARE23 dataset, our approach achieves an average DSC score of 89.63% for organs and 46.07% for tumors on the online validation leaderboard. For organ segmentation, we obtain a DSC of 0.9007 and an NSD of 0.9493. For tumor segmentation, we obtain a DSC of 0.3785 and an NSD of 0.2842. Our code is available at https://github.com/USTguy/Flare23.
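
A minimal sketch of the iterative pseudo-labeling loop described above, written against a generic estimator with fit/predict; in the actual submission the estimator is nn-UNet operating on 3D CT volumes, and any post-processing or filtering of pseudo labels is omitted here.

```python
import numpy as np

def iterative_pseudo_labeling(make_model, x_labeled, y_labeled, x_unlabeled, rounds=2):
    """Semi-supervised training by iterative pseudo labeling.
    make_model() returns a fresh estimator with .fit(X, y) and .predict(X);
    in the paper this role is played by nn-UNet on 3D CT scans."""
    model = make_model()
    model.fit(x_labeled, y_labeled)                 # 1) warm-up on fully annotated scans
    for _ in range(rounds):
        pseudo = model.predict(x_unlabeled)         # 2) pseudo-label the unlabeled scans
        x_all = np.concatenate([x_labeled, x_unlabeled])
        y_all = np.concatenate([y_labeled, pseudo])
        model = make_model()
        model.fit(x_all, y_all)                     # 3) retrain a stronger model on real + pseudo labels
    return model

# usage sketch with any scikit-learn-style classifier standing in for nn-UNet:
# from sklearn.ensemble import RandomForestClassifier
# model = iterative_pseudo_labeling(RandomForestClassifier, Xl, yl, Xu)
```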

[Re] CLRNet: Cross Layer Refinement Network for Lane Detection

  • paper_url: http://arxiv.org/abs/2310.01142
  • repo_url: None
  • paper_authors: Viswesh N, Kaushal Jadhav, Avi Amalanshu, Bratin Mondal, Sabaris Waran, Om Sadhwani, Apoorv Kumar, Debashish Chakravarty
  • for: A reproducibility study of the Cross Layer Refinement Network (CLRNet) for lane detection, which fuses high-level and low-level features to improve detection accuracy.
  • methods: The CLRNet architecture extracts both high-level and low-level features and fuses them across multiple layers to refine lane detections.
  • results: Evaluation on three lane detection benchmarks, on which CLRNet is reported to set new state-of-the-art records.
    Abstract The following work is a reproducibility report for CLRNet: Cross Layer Refinement Network for Lane Detection. The basic code was made available by the author. The paper proposes a novel Cross Layer Refinement Network to utilize both high and low level features for lane detection. The authors assert that the proposed technique sets the new state-of-the-art on three lane-detection benchmarks.

Neural Processing of Tri-Plane Hybrid Neural Fields

  • paper_url: http://arxiv.org/abs/2310.01140
  • repo_url: https://github.com/CVLAB-Unibo/triplane_processing
  • paper_authors: Adriano Cardace, Pierluigi Zama Ramirez, Francesco Ballerini, Allan Zhou, Samuele Salti, Luigi Di Stefano
  • for: Addressing tasks such as classification and part segmentation directly on neural fields for 3D data, which have appealing properties for storing and communicating 3D data.
  • methods: Individual neural fields parameterized as large multi-layer perceptrons (MLPs) are hard to process because of the high-dimensional weight space, intrinsic weight-space symmetries, and sensitivity to random initialization; this work instead processes hybrid representations, in particular tri-plane representations, directly with standard deep-learning machinery.
  • results: The tri-plane discrete data structure encodes rich information that standard deep-learning machinery can process effectively; on an extensive benchmark covering occupancy, signed/unsigned distance, and, for the first time, radiance fields, the approach achieves task performance far superior to frameworks processing large MLPs and, at the same reconstruction quality, almost on par with architectures handling explicit representations.
    Abstract Driven by the appealing properties of neural fields for storing and communicating 3D data, the problem of directly processing them to address tasks such as classification and part segmentation has emerged and has been investigated in recent works. Early approaches employ neural fields parameterized by shared networks trained on the whole dataset, achieving good task performance but sacrificing reconstruction quality. To improve the latter, later methods focus on individual neural fields parameterized as large Multi-Layer Perceptrons (MLPs), which are, however, challenging to process due to the high dimensionality of the weight space, intrinsic weight space symmetries, and sensitivity to random initialization. Hence, results turn out significantly inferior to those achieved by processing explicit representations, e.g., point clouds or meshes. In the meantime, hybrid representations, in particular based on tri-planes, have emerged as a more effective and efficient alternative to realize neural fields, but their direct processing has not been investigated yet. In this paper, we show that the tri-plane discrete data structure encodes rich information, which can be effectively processed by standard deep-learning machinery. We define an extensive benchmark covering a diverse set of fields such as occupancy, signed/unsigned distance, and, for the first time, radiance fields. While processing a field with the same reconstruction quality, we achieve task performance far superior to frameworks that process large MLPs and, for the first time, almost on par with architectures handling explicit representations.

Strength in Diversity: Multi-Branch Representation Learning for Vehicle Re-Identification

  • paper_url: http://arxiv.org/abs/2310.01129
  • repo_url: https://github.com/videturfortuna/vehicle_reid_itsc2023
  • paper_authors: Eurico Almeida, Bruno Silva, Jorge Batista
  • for: Improving vehicle re-identification (V-ReID).
  • methods: A lightweight multi-branch architecture that combines grouped convolutions with a Loss-Branch-Split (LBS) strategy to extract robust and diverse embeddings, improving feature diversity and discriminability; a lightweight variant uses grouped convolution to mimic loss splitting into multiple embeddings while significantly reducing model size.
  • results: The approach outperforms state-of-the-art methods on Veri-776 with 85.6% mAP and 97.7% CMC1 and obtains competitive results on Veri-Wild with 88.1% mAP and 96.3% CMC1, providing useful insights for vehicle re-identification and a strong basis for other retrieval tasks.
    Abstract This paper presents an efficient and lightweight multi-branch deep architecture to improve vehicle re-identification (V-ReID). While most V-ReID work uses a combination of complex multi-branch architectures to extract robust and diversified embeddings towards re-identification, we advocate that simple and lightweight architectures can be designed to fulfill the Re-ID task without compromising performance. We propose a combination of Grouped-convolution and Loss-Branch-Split strategies to design a multi-branch architecture that improve feature diversity and feature discriminability. We combine a ResNet50 global branch architecture with a BotNet self-attention branch architecture, both designed within a Loss-Branch-Split (LBS) strategy. We argue that specialized loss-branch-splitting helps to improve re-identification tasks by generating specialized re-identification features. A lightweight solution using grouped convolution is also proposed to mimic the learning of loss-splitting into multiple embeddings while significantly reducing the model size. In addition, we designed an improved solution to leverage additional metadata, such as camera ID and pose information, that uses 97% less parameters, further improving re-identification performance. In comparison to state-of-the-art (SoTA) methods, our approach outperforms competing solutions in Veri-776 by achieving 85.6% mAP and 97.7% CMC1 and obtains competitive results in Veri-Wild with 88.1% mAP and 96.3% CMC1. Overall, our work provides important insights into improving vehicle re-identification and presents a strong basis for other retrieval tasks. Our code is available at the https://github.com/videturfortuna/vehicle_reid_itsc2023.
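
A minimal sketch of how a grouped 1x1 convolution can mimic a loss-branch-split into several independent embeddings at a fraction of the parameter cost; channel sizes and the number of branches are illustrative assumptions, and the ResNet50/BotNet branches and the metadata module are not shown.

```python
import torch
import torch.nn as nn

class GroupedLBSHead(nn.Module):
    """Grouped 1x1 convolution splits the backbone feature map into several
    independent embeddings, each of which can be trained with its own loss,
    mimicking a Loss-Branch-Split design with far fewer parameters."""

    def __init__(self, in_channels=2048, emb_dim=256, n_branches=4):
        super().__init__()
        self.n_branches = n_branches
        self.proj = nn.Conv2d(in_channels, emb_dim * n_branches,
                              kernel_size=1, groups=n_branches, bias=False)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, feat):                            # feat: (B, in_channels, H, W)
        x = self.pool(self.proj(feat)).flatten(1)       # (B, emb_dim * n_branches)
        return x.chunk(self.n_branches, dim=1)          # one embedding per branch

head = GroupedLBSHead()
embs = head(torch.randn(2, 2048, 16, 16))
print([e.shape for e in embs])                          # 4 embeddings of shape (2, 256)
```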

Batch-less stochastic gradient descent for compressive learning of deep regularization for image denoising

  • paper_url: http://arxiv.org/abs/2310.03085
  • repo_url: None
  • paper_authors: Hui Shi, Yann Traonmilin, J-F Aujol
  • for: Denoising signals or images with the help of prior information taken from a database of clean examples.
  • methods: A variational formulation in which the maximum a posteriori Bayesian framework systematically links the regularizer to the data distribution; deep neural networks (DNNs) recover complex distributions from a large training database, and the compressive learning framework is adapted to learn the DNN-parametrized regularizer from a heavily compressed database.
  • results: Two stochastic gradient descent (SGD) algorithms recover the deep regularization parameters from the compressed database; they outperform the initial method, which was limited to low-dimensional signals, use information from the whole database at each iteration, benefit from classical SGD convergence guarantees, and make the approach applicable to patch-based image denoising.
    Abstract We consider the problem of denoising with the help of prior information taken from a database of clean signals or images. Denoising with variational methods is very efficient if a regularizer well adapted to the nature of the data is available. Thanks to the maximum a posteriori Bayesian framework, such a regularizer can be systematically linked with the distribution of the data. With deep neural networks (DNN), complex distributions can be recovered from a large training database. To reduce the computational burden of this task, we adapt the compressive learning framework to the learning of regularizers parametrized by DNN. We propose two variants of stochastic gradient descent (SGD) for the recovery of deep regularization parameters from a heavily compressed database. These algorithms outperform the initially proposed method that was limited to low-dimensional signals, each iteration using information from the whole database. They also benefit from classical SGD convergence guarantees. Thanks to these improvements, we show that this method can be applied to patch-based image denoising.

HyMNet: a Multimodal Deep Learning System for Hypertension Classification using Fundus Photographs and Cardiometabolic Risk Factors

  • paper_url: http://arxiv.org/abs/2310.01099
  • repo_url: https://github.com/mohammedsb/hypertension
  • paper_authors: Mohammed Baharoon, Hessa Almatar, Reema Alduhayan, Tariq Aldebasi, Badr Alahmadi, Yahya Bokhari, Mohammed Alawad, Ahmed Almazroa, Abdulrhman Aljouie
  • for: Predicting hypertension (HTN) from fundus photographs.
  • methods: A multimodal deep learning (MMDL) system, HyMNet, that combines fundus images with cardiometabolic risk factors (age and gender) to improve hypertension detection.
  • results: The multimodal model that integrates fundus images with age and gender achieves an AUC of 0.791 [CI: 0.735, 0.848], outperforming the unimodal model trained solely on fundus photographs, which yields an AUC of 0.766 [CI: 0.705, 0.828].
    Abstract In recent years, deep learning has shown promise in predicting hypertension (HTN) from fundus images. However, most prior research has primarily focused on analyzing a single type of data, which may not capture the full complexity of HTN risk. To address this limitation, this study introduces a multimodal deep learning (MMDL) system, dubbed HyMNet, which combines fundus images and cardiometabolic risk factors, specifically age and gender, to improve hypertension detection capabilities. Our MMDL system uses the DenseNet-201 architecture, pre-trained on ImageNet, for the fundus imaging path and a fully connected neural network for the age and gender path. The two paths are jointly trained by concatenating 64 features output from each path that are then fed into a fusion network. The system was trained on 1,143 retinal images from 626 individuals collected from the Saudi Ministry of National Guard Health Affairs. The results show that the multimodal model that integrates fundus images along with age and gender achieved an AUC of 0.791 [CI: 0.735, 0.848], which outperforms the unimodal model trained solely on fundus photographs that yielded an AUC of 0.766 [CI: 0.705, 0.828] for hypertension detection.
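
A minimal sketch of the two-path fusion described above: a DenseNet-201 image path and a small fully connected path for age and gender each produce 64 features that are concatenated and fed to a fusion classifier. The hidden sizes and the single-logit output are assumptions; training details follow the paper, not this sketch.

```python
import torch
import torch.nn as nn
from torchvision import models

class FundusRiskFusion(nn.Module):
    """Two-path fusion in the spirit of HyMNet: a DenseNet-201 fundus path and a
    small fully connected path for age and gender, each producing 64 features
    that are concatenated and classified by a fusion network."""

    def __init__(self):
        super().__init__()
        backbone = models.densenet201(weights="IMAGENET1K_V1")     # ImageNet-pretrained image path
        backbone.classifier = nn.Linear(backbone.classifier.in_features, 64)
        self.image_path = backbone
        self.tab_path = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 64))
        self.fusion = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, fundus, age_gender):
        img_feat = self.image_path(fundus)          # (B, 64) from the fundus photograph
        tab_feat = self.tab_path(age_gender)        # (B, 64) from [age, gender]
        return self.fusion(torch.cat([img_feat, tab_feat], dim=1))  # hypertension logit

model = FundusRiskFusion()
logit = model(torch.randn(1, 3, 224, 224), torch.tensor([[55.0, 1.0]]))
```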

Leveraging Cutting Edge Deep Learning Based Image Matching for Reconstructing a Large Scene from Sparse Images

  • paper_url: http://arxiv.org/abs/2310.01092
  • repo_url: None
  • paper_authors: Georg Bökman, Johan Edstedt
  • for: The AISG-SLA Visual Localisation Challenge benchmark (IJCAI 2023), where the task is to estimate relative motion between images taken in sequence by a camera mounted on a car driving through an urban scene.
  • methods: Images are matched with the recent deep-learning-based matcher RoMa and reconstructed with structure-from-motion in COLMAP; the recent DeDoDe keypoints are used for their high repeatability, and DINOv2-based image retrieval handles time jumps by matching specific non-consecutive image pairs.
  • results: Matching sequential pairs with RoMa alone already yields a competitive third rank on the challenge benchmark, the full pipeline beats all competitors, and a loose upper bound on the accuracy obtainable with the image-retrieval approach is also reported.
    Abstract We present the top ranked solution for the AISG-SLA Visual Localisation Challenge benchmark (IJCAI 2023), where the task is to estimate relative motion between images taken in sequence by a camera mounted on a car driving through an urban scene. For matching images we use our recent deep learning based matcher RoMa. Matching image pairs sequentially and estimating relative motion from point correspondences sampled by RoMa already gives very competitive results -- third rank on the challenge benchmark. To improve the estimations we extract keypoints in the images, match them using RoMa, and perform structure from motion reconstruction using COLMAP. We choose our recent DeDoDe keypoints for their high repeatability. Further, we address time jumps in the image sequence by matching specific non-consecutive image pairs based on image retrieval with DINOv2. These improvements yield a solution beating all competitors. We further present a loose upper bound on the accuracy obtainable by the image retrieval approach by also matching hand-picked non-consecutive pairs.

Unsupervised Roofline Extraction from True Orthophotos for LoD2 Building Model Reconstruction

  • paper_url: http://arxiv.org/abs/2310.01067
  • repo_url: https://github.com/tudelft3d/roofline-extraction-from-orthophotos
  • paper_authors: Weixiao Gao, Ravi Peters, Jantien Stoter
  • for: 本研究旨在利用倾斜飞行图像生成的点云进行大规模城市环境中的LoD2建筑模型重建。
  • methods: 本研究使用了线检测技术来从真正正方图中提取屋顶线,以便在LoD2水平进行建筑模型重建。
  • results: 本研究表明,该方法无需预先标注的训练数据或预训练模型,即可相对完整地提取屋顶线;这些线可直接用于 LoD2 建筑模型重建过程。与基于平面检测的传统方法和最先进的深度学习方法相比,该方法在重建的准确性和完整性上更优。
  • for: The paper aims to reconstruct LoD2 building models from 2D and 3D data for large-scale urban environments using point clouds generated from oblique aerial images.
  • methods: The method used in the paper is line detection to extract rooflines from true orthophotos for the reconstruction of building models at the LoD2 level.
  • results: The paper shows that the method can relatively complete extract rooflines without the need for pre-labeled training data or pre-trained models. These lines can directly be used in the LoD2 building model reconstruction process, and the method is superior to existing plane detection-based methods and state-of-the-art deep learning methods in terms of accuracy and completeness.
    Abstract This paper discusses the reconstruction of LoD2 building models from 2D and 3D data for large-scale urban environments. Traditional methods involve the use of LiDAR point clouds, but due to high costs and long intervals associated with acquiring such data for rapidly developing areas, researchers have started exploring the use of point clouds generated from (oblique) aerial images. However, using such point clouds for traditional plane detection-based methods can result in significant errors and introduce noise into the reconstructed building models. To address this, this paper presents a method for extracting rooflines from true orthophotos using line detection for the reconstruction of building models at the LoD2 level. The approach is able to extract relatively complete rooflines without the need for pre-labeled training data or pre-trained models. These lines can directly be used in the LoD2 building model reconstruction process. The method is superior to existing plane detection-based methods and state-of-the-art deep learning methods in terms of the accuracy and completeness of the reconstructed building. Our source code is available at https://github.com/tudelft3d/Roofline-extraction-from-orthophotos.
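
The abstract does not specify which line detector is used, so the sketch below only illustrates the general idea of extracting candidate roofline segments from a true orthophoto with a standard edge-plus-Hough pipeline in OpenCV; the thresholds and the file name are assumptions.

```python
import cv2
import numpy as np

def extract_rooflines(orthophoto_path, min_len=30, max_gap=5):
    """Detect straight line segments (candidate rooflines) in a true orthophoto.

    Thresholds are illustrative; the paper's actual detector and parameters may differ.
    """
    img = cv2.imread(orthophoto_path, cv2.IMREAD_GRAYSCALE)
    img = cv2.GaussianBlur(img, (5, 5), 0)            # suppress roof texture noise
    edges = cv2.Canny(img, 50, 150)                    # edge map
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                            minLineLength=min_len, maxLineGap=max_gap)
    # Each entry is (x1, y1, x2, y2) in pixel coordinates.
    return [] if lines is None else [tuple(l[0]) for l in lines]

segments = extract_rooflines("tile_0001.tif")          # hypothetical orthophoto tile
print(f"{len(segments)} candidate roofline segments")
```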

Unsupervised motion segmentation in one go: Smooth long-term model over a video

  • paper_url: http://arxiv.org/abs/2310.01040
  • repo_url: None
  • paper_authors: Etienne Meunier, Patrick Bouthemy
  • for: 提出一种 Totally Unsupervised Video Object Segmentation (VOS) 方法,用于同时分割视频序列中的对象和动作。
  • methods: 提出了一种基于 transformer 网络的方法,利用 Evidence Lower Bound (ELBO) 框架推导损失函数;该损失函数结合了流场重构项(空间维度采用二次多项式运动模型、时间维度采用 B-splines)以及一个对分割掩码施加时间一致性的正则项。
  • results: 在四个 VOS benchmark 上实现了有力的量化结果,并通过视觉结果展示了方法对时间一致性的重要贡献。
    Abstract Human beings have the ability to continuously analyze a video and immediately extract the main motion components. Motion segmentation methods often proceed frame by frame. We want to go beyond this classical paradigm, and perform the motion segmentation over a video sequence in one go. It will be a prominent added value for downstream computer vision tasks, and could provide a pretext criterion for unsupervised video representation learning. In this perspective, we propose a novel long-term spatio-temporal model operating in a totally unsupervised way. It takes as input the volume of consecutive optical flow (OF) fields, and delivers a volume of segments of coherent motion over the video. More specifically, we have designed a transformer-based network, where we leverage a mathematically well-founded framework, the Evidence Lower Bound (ELBO), to infer the loss function. The loss function combines a flow reconstruction term involving spatio-temporal parametric motion models combining, in a novel way, polynomial (quadratic) motion models for the $(x,y)$-spatial dimensions and B-splines for the time dimension of the video sequence, and a regularization term enforcing temporal consistency on the masks. We report experiments on four VOS benchmarks with convincing quantitative results. We also highlight through visual results the key contributions on temporal consistency brought by our method.
    摘要 人类有能力不断分析视频并立即提取主要运动组成部分。运动分割方法经常以帧为单位进行。我们想要超越这种传统模式,并在视频序列中一次性进行运动分割。这将为下游计算机视觉任务带来明显的加值,并可提供无监督视频表示学习的先天权威标准。在这个视角下,我们提出了一种新的长期空间时间模型,不需要监督。它接受连续的滤流场(OF)场的体积作为输入,并输出一个视频序列中的各个 Segment of coherent motion。我们设计了基于 transformer 网络,并利用数学上有根据的框架,证明 Lower Bound(ELBO)来推导损函数。损函数组合了空间时间参数动力学模型的杂率重构项,以及时间维度的视频序列中的强制一致性约束。我们在四个 VOS benchmark 上进行了有力量的量化实验,并通过视觉结果显示了我们方法带来的时间一致性的关键贡献。
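
As a rough illustration of the parametric motion family described above (a quadratic polynomial flow over the spatial dimensions, B-splines over time), the NumPy/SciPy sketch below evaluates such a space-time flow model; the number of control points, knot placement, and normalization are assumptions, not the paper's exact parameterization.

```python
import numpy as np
from scipy.interpolate import BSpline

def quad_basis(x, y):
    """Quadratic spatial basis: 6 terms per flow component."""
    return np.stack([np.ones_like(x), x, y, x * x, x * y, y * y], axis=-1)

def parametric_flow(coeff_ctrl, knots, t, x, y, degree=3):
    """Evaluate a quadratic-in-space, B-spline-in-time flow field.

    coeff_ctrl: (n_ctrl, 2, 6) control points of the 12 motion parameters
    (6 per flow component); knots / degree define the temporal B-spline.
    """
    spline = BSpline(knots, coeff_ctrl.reshape(len(coeff_ctrl), -1), degree)
    coeff_t = spline(t).reshape(2, 6)          # motion parameters at time t
    basis = quad_basis(x, y)                   # (..., 6)
    return basis @ coeff_t.T                   # (..., 2) flow vectors (u, v)

# Toy usage: 5 control points, clamped cubic spline over t in [0, 1].
n_ctrl, degree = 5, 3
knots = np.concatenate([[0.0] * degree, np.linspace(0, 1, n_ctrl - degree + 1), [1.0] * degree])
ctrl = np.random.randn(n_ctrl, 2, 6) * 0.01
xx, yy = np.meshgrid(np.linspace(-1, 1, 4), np.linspace(-1, 1, 4))
flow = parametric_flow(ctrl, knots, t=0.5, x=xx, y=yy)
print(flow.shape)    # (4, 4, 2)
```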

Learnable Cross-modal Knowledge Distillation for Multi-modal Learning with Missing Modality

  • paper_url: http://arxiv.org/abs/2310.01035
  • repo_url: None
  • paper_authors: Hu Wang, Yuanhong Chen, Congbo Ma, Jodie Avery, Louise Hull, Gustavo Carneiro
  • for: 这篇论文旨在解决多模态学习中模态缺失的问题,提高多模态模型的性能。
  • methods: 提出了一种可学习的跨模态知识蒸馏(LCKD)模型,自适应地识别重要模态,并从跨模态视角将其知识蒸馏给其他模态,以应对模态缺失。
  • results: 在 Brain Tumour Segmentation Dataset 2018(BraTS2018)上,LCKD 的表现显著优于其他方法,将分割 Dice 分数的最新水平分别提升 3.61%(enhancing tumour)、5.99%(tumour core)和 3.76%(whole tumour)。
    Abstract The problem of missing modalities is both critical and non-trivial to be handled in multi-modal models. It is common for multi-modal tasks that certain modalities contribute more compared to other modalities, and if those important modalities are missing, the model performance drops significantly. Such fact remains unexplored by current multi-modal approaches that recover the representation from missing modalities by feature reconstruction or blind feature aggregation from other modalities, instead of extracting useful information from the best performing modalities. In this paper, we propose a Learnable Cross-modal Knowledge Distillation (LCKD) model to adaptively identify important modalities and distil knowledge from them to help other modalities from the cross-modal perspective for solving the missing modality issue. Our approach introduces a teacher election procedure to select the most ``qualified'' teachers based on their single modality performance on certain tasks. Then, cross-modal knowledge distillation is performed between teacher and student modalities for each task to push the model parameters to a point that is beneficial for all tasks. Hence, even if the teacher modalities for certain tasks are missing during testing, the available student modalities can accomplish the task well enough based on the learned knowledge from their automatically elected teacher modalities. Experiments on the Brain Tumour Segmentation Dataset 2018 (BraTS2018) shows that LCKD outperforms other methods by a considerable margin, improving the state-of-the-art performance by 3.61% for enhancing tumour, 5.99% for tumour core, and 3.76% for whole tumour in terms of segmentation Dice score.
    摘要 “多modal模型处理缺失modalities的问题是非常 kritical 和复杂的。通常情况下,多modal任务中某些modalities会比其他modalities更加重要,如果这些重要modalities缺失,模型性能会下降很多。这一点尚未被当前的多modal方法考虑,这些方法通常通过特征重建或盲目特征聚合来从其他modalities中恢复缺失的modalities。而我们提出的Learnable Cross-modal Knowledge Distillation(LCKD)模型则可以适应性地标识重要的modalities,并从这些modalities中提取有用信息,以帮助其他modalities从跨modal的视角解决缺失modalities问题。我们的方法包括选择最佳的教师modalities基于它们单 modalities的性能在某些任务上,然后在教师和学生modalities之间进行跨modal知识填充。因此,即使在测试时缺失某些教师modalities,可以使用可得到的学生modalities来完成任务,并且基于自动选择的教师modalities学习出来的知识来实现比较好的性能。我们在Brain Tumour Segmentation Dataset 2018(BraTS2018)上进行了实验,结果显示LCKD的表现比其他方法更好,提高了 state-of-the-art 性能的水平,具体来说是提高了涂抹率的Dice分数的值:3.61%,5.99%和3.76%。”
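
A hedged PyTorch sketch of the two ingredients named in the abstract, teacher election from single-modality scores and cross-modal knowledge distillation, is given below; the temperature, loss weighting, and the use of per-pixel cross-entropy plus KL divergence are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def elect_teacher(single_modality_scores):
    """Pick the modality with the best single-modality validation score
    (e.g. Dice) for a given task."""
    return max(single_modality_scores, key=single_modality_scores.get)

def lckd_loss(student_logits, teacher_logits, target, temperature=2.0, alpha=0.5):
    """Supervised loss + soft cross-modal distillation from the elected teacher."""
    task_loss = F.cross_entropy(student_logits, target)
    t = temperature
    kd = F.kl_div(
        F.log_softmax(student_logits / t, dim=1),
        F.softmax(teacher_logits.detach() / t, dim=1),
        reduction="batchmean",
    ) * (t * t)
    return task_loss + alpha * kd

# Toy usage: FLAIR is elected as teacher for one task in this example.
scores = {"t1": 0.71, "t1ce": 0.74, "t2": 0.78, "flair": 0.83}
teacher = elect_teacher(scores)
student_logits = torch.randn(4, 2, 64, 64)   # per-pixel logits of another modality
teacher_logits = torch.randn(4, 2, 64, 64)   # logits from the elected teacher branch
target = torch.randint(0, 2, (4, 64, 64))
loss = lckd_loss(student_logits, teacher_logits, target)
```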

Incorporating Supervised Domain Generalization into Data Augmentation

  • paper_url: http://arxiv.org/abs/2310.01029
  • repo_url: None
  • paper_authors: Shohei Enomoto, Monikka Roslianna Busto, Takeharu Eda
  • for: 提高深度学习在户外环境中的鲁棒性,以 preserve accuracy 在分布变化的情况下
  • methods: 使用数据扩充技术,并将其视为支持领域整合(SDG),使用对比语义Alignment(CSA)损失来提高数据扩充的鲁棒性和训练效率
  • results: 在CIFAR-100和CUB datasets上实验表明,提议的方法可以提高数据扩充的鲁棒性和训练效率,并可以作为现有数据扩充方法的插件使用
    Abstract With the increasing utilization of deep learning in outdoor settings, its robustness needs to be enhanced to preserve accuracy in the face of distribution shifts, such as compression artifacts. Data augmentation is a widely used technique to improve robustness, thanks to its ease of use and numerous benefits. However, it requires more training epochs, making it difficult to train large models with limited computational resources. To address this problem, we treat data augmentation as supervised domain generalization~(SDG) and benefit from the SDG method, contrastive semantic alignment~(CSA) loss, to improve the robustness and training efficiency of data augmentation. The proposed method only adds loss during model training and can be used as a plug-in for existing data augmentation methods. Experiments on the CIFAR-100 and CUB datasets show that the proposed method improves the robustness and training efficiency of typical data augmentations.
    摘要 随着深度学习在户外场景中的应用越来越广泛,其Robustness需要得到加强,以保持面对分布变化时的准确性。数据扩充是一种广泛使用的技术来提高Robustness,因为它的使用非常容易和有很多优点。然而,它需要更多的训练环节,使得在有限的计算资源下训练大型模型变得困难。为解决这个问题,我们将数据扩充视为指导领域泛化~(SDG),并利用指导领域泛化方法的对准性 semantic alignment~(CSA)损失来提高数据扩充的Robustness和训练效率。该提案只需在模型训练时添加损失,可以作为现有数据扩充方法的插件使用。实验表明,在CIFAR-100和CUB数据集上,我们的提案方法可以提高数据扩充的Robustness和训练效率。
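
The sketch below illustrates one common form of contrastive semantic alignment (CSA) applied to clean and augmented views treated as two domains: same-class pairs are pulled together and different-class pairs pushed beyond a margin. The margin and the pairing scheme are assumptions; the paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def csa_loss(feat_clean, feat_aug, labels, margin=1.0):
    """Contrastive semantic alignment between clean and augmented views.

    feat_clean, feat_aug: (B, D) features of the same images under the two views;
    labels: (B,) class labels. Same-class pairs are pulled together, different-class
    pairs pushed apart up to a margin.
    """
    d = torch.cdist(feat_clean, feat_aug)                  # (B, B) pairwise distances
    same = labels.unsqueeze(1).eq(labels.unsqueeze(0)).float()
    align = (same * d.pow(2)).sum() / same.sum().clamp(min=1)
    sep = ((1 - same) * F.relu(margin - d).pow(2)).sum() / (1 - same).sum().clamp(min=1)
    return align + sep

# Usage inside a normal training step (sketch): total = ce_loss + lam * csa_loss(...)
feat_clean = torch.randn(8, 128)
feat_aug = torch.randn(8, 128)
labels = torch.randint(0, 5, (8,))
loss = csa_loss(feat_clean, feat_aug, labels)
```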

A New Real-World Video Dataset for the Comparison of Defogging Algorithms

  • paper_url: http://arxiv.org/abs/2310.01020
  • repo_url: None
  • paper_authors: Alexandra Duminil, Jean-Philippe Tarel, Roland Brémond
  • for: 现有数据集缺乏同时包含清晰和有雾条件的视频样本,难以支撑视频去雾方法的深度学习训练与评测。
  • methods: 该论文提出了一个新的真实世界视频数据集 VIREDA(REal-world VIdeo dataset for the comparison of Defogging Algorithms),包含多种雾浓度及无雾的参考真值;同时介绍了一种仍在开发中的视频去雾算法,其关键思想是利用时间冗余来减少帧间伪影和曝光变化。
  • results: 受 Transformer 架构在深度学习各类应用中的成功启发,论文选用该类架构的神经网络来展示所提数据集的相关性。
    Abstract Video restoration for noise removal, deblurring or super-resolution is attracting more and more attention in the fields of image processing and computer vision. Works on video restoration with data-driven approaches for fog removal are rare however, due to the lack of datasets containing videos in both clear and foggy conditions which are required for deep learning and benchmarking. A new dataset, called REVIDE, was recently proposed for just that purpose. In this paper, we implement the same approach by proposing a new REal-world VIdeo dataset for the comparison of Defogging Algorithms (VIREDA), with various fog densities and ground truths without fog. This small database can serve as a test base for defogging algorithms. A video defogging algorithm is also mentioned (still under development), with the key idea of using temporal redundancy to minimize artefacts and exposure variations between frames. Inspired by the success of Transformers architecture in deep learning for various applications, we select this kind of architecture in a neural network to show the relevance of the proposed dataset.
    摘要 视频修复技术在干扰除、锐化或超分辨等领域受到越来越多的关注,但对于数据驱动的视频修复方法, Works on video restoration with data-driven approaches for fog removal are rare,因为缺乏包含清晰和雾osos condition的视频数据集,这些数据集是深度学习和标准化的必要条件。一个新的数据集,名为REVIDE,最近被提出用于这个目的。在这篇论文中,我们实现了同样的方法,提出了一个新的真实世界视频数据集,用于比较抑雾算法(VIREDA),其中包括不同的雾度和无雾的场景。这个小型数据库可以作为抑雾算法的测试基础。此外,我们还提出了一种视频抑雾算法(还在开发中),其关键思想是通过时间重复使用数据来减少遗传和曝光变化的问题。受到Transformers架构在深度学习中的成功,我们选择了这种架构来证明提出的数据集的相关性。

Controlling Vision-Language Models for Universal Image Restoration

  • paper_url: http://arxiv.org/abs/2310.01018
  • repo_url: https://github.com/algolzw/daclip-uir
  • paper_authors: Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sjölund, Thomas B. Schön
  • for: 这个论文旨在提高适用于底层视觉任务的预训语言模型(CLIP)的表现,并提供一个通用的框架来实现图像修复。
  • methods: 这个论文使用了一个增强控制器,将预先训练的 CLIP 图像encoder 扮演为预测高品质的对象特征向量的模型。这个控制器还会输出一个与实际负载相应的偏差特征,将模型引导学习高屏幕图像重建。
  • results: 这个论文的方法可以在两种类型的图像修复任务上进行顶尖表现,包括对于特定类型的负载和统一的图像修复任务。此外,这个论文还构建了一个混合类型的负载数据集,以供 DA-CLIP 训练。
    Abstract Vision-language models such as CLIP have shown great impact on diverse downstream tasks for zero-shot or label-free predictions. However, when it comes to low-level vision such as image restoration their performance deteriorates dramatically due to corrupted inputs. In this paper, we present a degradation-aware vision-language model (DA-CLIP) to better transfer pretrained vision-language models to low-level vision tasks as a universal framework for image restoration. More specifically, DA-CLIP trains an additional controller that adapts the fixed CLIP image encoder to predict high-quality feature embeddings. By integrating the embedding into an image restoration network via cross-attention, we are able to pilot the model to learn a high-fidelity image reconstruction. The controller itself will also output a degradation feature that matches the real corruptions of the input, yielding a natural classifier for different degradation types. In addition, we construct a mixed degradation dataset with synthetic captions for DA-CLIP training. Our approach advances state-of-the-art performance on both degradation-specific and unified image restoration tasks, showing a promising direction of prompting image restoration with large-scale pretrained vision-language models. Our code is available at https://github.com/Algolzw/daclip-uir.
    摘要 CLIP类视力语模型在多种下渠任务中表现出色,但在低级视力任务中,其表现很差,主要是因为输入数据受到损害。在这篇论文中,我们提出了一种适应质量降低(DA-CLIP)视力语模型,以更好地将预训练的视力语模型传输到低级视力任务中。具体来说,DA-CLIP在预训练的CLIP图像Encoder上添加了一个适应器,以预测高质量的特征嵌入。通过将嵌入 integrate到图像修复网络中via Cross-Attention,我们可以让模型学习高准确性的图像重建。适应器本身也会输出一个与实际损害相符的损害特征,使得模型可以自然地分类不同的损害类型。此外,我们构建了一个混合损害数据集,用于DA-CLIP训练。我们的方法在不同类型的损害任务和综合图像修复任务中提高了状态之前的表现,展示了使用大规模预训练的视力语模型进行图像修复的可行性。我们的代码可以在https://github.com/Algolzw/daclip-uir中找到。

Multi-task Learning with 3D-Aware Regularization

  • paper_url: http://arxiv.org/abs/2310.00986
  • repo_url: https://github.com/vico-uoe/mtpsl
  • paper_authors: Wei-Hong Li, Steven McDonagh, Ales Leonardis, Hakan Bilen
  • for: The paper aims to improve the performance of deep neural networks on multiple dense computer vision tasks by introducing a structured 3D-aware regularizer.
  • methods: The proposed method interfaces multiple tasks through a shared 3D feature space, reducing noisy cross-task correlations. It is architecture agnostic and can be plugged into various prior multi-task backbones.
  • results: The proposed method improves performance on the standard NYUv2 and PASCAL-Context benchmarks.
    Abstract Deep neural networks have become a standard building block for designing models that can perform multiple dense computer vision tasks such as depth estimation and semantic segmentation thanks to their ability to capture complex correlations in high dimensional feature space across tasks. However, the cross-task correlations that are learned in the unstructured feature space can be extremely noisy and susceptible to overfitting, consequently hurting performance. We propose to address this problem by introducing a structured 3D-aware regularizer which interfaces multiple tasks through the projection of features extracted from an image encoder to a shared 3D feature space and decodes them into their task output space through differentiable rendering. We show that the proposed method is architecture agnostic and can be plugged into various prior multi-task backbones to improve their performance; as we evidence using standard benchmarks NYUv2 and PASCAL-Context.
    摘要 深度神经网络已成为多任务计算机视觉模型的标准构建件,因为它们能够在高维特征空间中捕捉跨任务的复杂相关性。然而,在无结构特征空间中学习到的跨任务相关性可能噪声很大且容易过拟合,从而影响性能。我们提议通过引入一种结构化的三维感知正则化来解决这一问题:该正则化把图像编码器提取的特征投影到一个共享的三维特征空间,再通过可微渲染解码到各任务的输出空间。我们证明了该方法与具体架构无关,可以插入不同的多任务骨干网络以提高其性能;我们在标准基准 NYUv2 和 PASCAL-Context 上验证了这一点。

LS-VOS: Identifying Outliers in 3D Object Detections Using Latent Space Virtual Outlier Synthesis

  • paper_url: http://arxiv.org/abs/2310.00952
  • repo_url: None
  • paper_authors: Aldi Piroli, Vinzenz Dallabetta, Johannes Kopp, Marc Walessa, Daniel Meissner, Klaus Dietmayer
  • for: 提高自动驾驶应用中LiDAR基于3D物体探测器的可靠性和准确性
  • methods: 基于虚拟外围同构(VOS)的方法,在训练过程中integrate异常知识,使模型学习更加紧凑的决策边界
  • results: 在广泛的实验中,我们的方法可以改善现有的3D物体探测器的异常检测能力,同时保持高度的3D物体检测性能
    Abstract LiDAR-based 3D object detectors have achieved unprecedented speed and accuracy in autonomous driving applications. However, similar to other neural networks, they are often biased toward high-confidence predictions or return detections where no real object is present. These types of detections can lead to a less reliable environment perception, severely affecting the functionality and safety of autonomous vehicles. We address this problem by proposing LS-VOS, a framework for identifying outliers in 3D object detections. Our approach builds on the idea of Virtual Outlier Synthesis (VOS), which incorporates outlier knowledge during training, enabling the model to learn more compact decision boundaries. In particular, we propose a new synthesis approach that relies on the latent space of an auto-encoder network to generate outlier features with a parametrizable degree of similarity to in-distribution features. In extensive experiments, we show that our approach improves the outlier detection capabilities of a state-of-the-art object detector while maintaining high 3D object detection performance.
    摘要 “LiDAR基于的3D物体探测器在自动驾驶应用中实现了历史性的速度和准确性。然而,与其他神经网络一样,它们经常受到高信任度预测或返回存在实体的检测,导致环境感知变得不可靠,严重影响自动驾驶车辆的功能和安全。我们解决这个问题 by proposing LS-VOS,一种用于察看3D物体检测中异常值的框架。我们的方法基于虚拟异常合成(VOS)的想法,在训练过程中包含异常知识,使模型学习更加紧凑的决策边界。具体来说,我们提出了一种新的合成方法,利用自动encoder网络的隐藏空间生成异常特征,其 Parametrizable degree of similarity to in-distribution features。在广泛的实验中,我们显示了我们的方法可以提高一个状态的arteobject detector的异常检测能力,而不会影响3D物体检测性能。”
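
As a rough illustration of synthesizing virtual outliers in the latent space of an auto-encoder over detector features, the PyTorch sketch below perturbs in-distribution latents with a tunable noise scale (the "parametrizable degree of similarity") and decodes them back to feature space; the architecture sizes and noise model are assumptions.

```python
import torch
import torch.nn as nn

class FeatureAE(nn.Module):
    """Small auto-encoder over detector features (dimensions are illustrative)."""
    def __init__(self, dim=256, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

@torch.no_grad()
def synthesize_outliers(ae, id_features, noise_scale=2.0):
    """Generate virtual outlier features; larger noise_scale -> less similar to ID."""
    _, z = ae(id_features)
    z_out = z + noise_scale * z.std(dim=0, keepdim=True) * torch.randn_like(z)
    return ae.dec(z_out)

ae = FeatureAE()
id_feats = torch.randn(64, 256)           # stand-in for in-distribution detector features
virtual_outliers = synthesize_outliers(ae, id_feats)
# id_feats (label 0) and virtual_outliers (label 1) can then train a small OOD head.
```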

Autonomous Navigation of Micro Air Vehicles in Warehouses Using Vision-based Line Following

  • paper_url: http://arxiv.org/abs/2310.00950
  • repo_url: None
  • paper_authors: Ling Shuang Soh, Hann Woei Ho
  • for: 本研究旨在提供一种基于视觉的indoor Micro Air Vehicle(MAV)导航解决方案,主要应用于自动化仓库中。
  • methods: 我们的方法包括HSV色彩检测和Hough直线变换,以实现在仓库环境中的线检测和跟踪。我们还利用 kalman filter 来使camera可靠地跟踪黄色线。
  • results: 我们通过在Gazebo 11平台上进行MAV飞行测试,使用 ROS Noetic 评估了我们的视觉基于的直线跟踪算法的性能。测试结果表明系统可以成功导航窄的室内空间。我们的提出的系统具有减少劳动成本和提高仓库运作效率的潜力。
    Abstract In this paper, we propose a vision-based solution for indoor Micro Air Vehicle (MAV) navigation, with a primary focus on its application within autonomous warehouses. Our work centers on the utilization of a single camera as the primary sensor for tasks such as detection, localization, and path planning. To achieve these objectives, we implement the HSV color detection and the Hough Line Transform for effective line detection within warehouse environments. The integration of a Kalman filter into our system enables the camera to track yellow lines reliably. We evaluated the performance of our vision-based line following algorithm through various MAV flight tests conducted in the Gazebo 11 platform, utilizing ROS Noetic. The results of these simulations demonstrate the system capability to successfully navigate narrow indoor spaces. Our proposed system has the potential to significantly reduce labor costs and enhance overall productivity in warehouse operations. This work contributes to the growing field of MAV applications in autonomous warehouses, addressing the need for efficient logistics and supply chain solutions.
    摘要 在这篇论文中,我们提出了一种视觉基于的indoor Micro Air Vehicle(MAV)导航解决方案,主要关注于自动化仓库应用。我们的工作集中在Single camera作为主要感知器,用于任务such as检测、定位和路径规划。为了实现这些目标,我们实现了HSV颜色检测和Hough Line Transform,以有效地检测仓库环境中的直线。通过将Kalman Filter integrate into our system,我们可以可靠地跟踪黄色线。我们通过在Gazebo 11平台上进行MAV飞行测试,使用ROS Noetic评估了我们的视觉基于直线跟踪算法的性能。测试结果表明系统能够成功导航室内窄空间。我们的提议系统具有减少劳动成本和提高仓库运作效率的潜力。这种工作对自动化仓库应用的MAV技术做出了贡献,解决了效率的物流和供应链解决方案。
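
A minimal OpenCV sketch of the pipeline named in the abstract, HSV thresholding for the yellow line, a probabilistic Hough transform, and a constant-velocity Kalman filter to smooth the tracked line centre, is shown below; the HSV range, Hough parameters, and filter tuning are assumptions.

```python
import cv2
import numpy as np

YELLOW_LO, YELLOW_HI = (20, 80, 80), (35, 255, 255)     # HSV range (assumed)

kf = cv2.KalmanFilter(4, 2)                              # state: [x, y, vx, vy], meas: [x, y]
kf.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                                [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

def track_line(frame_bgr):
    """Return a smoothed (x, y) centre of the detected yellow guide line."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, YELLOW_LO, YELLOW_HI)
    lines = cv2.HoughLinesP(mask, 1, np.pi / 180, threshold=40,
                            minLineLength=40, maxLineGap=10)
    pred = kf.predict()
    if lines is not None:
        longest = max(lines, key=lambda l: np.hypot(l[0][2] - l[0][0], l[0][3] - l[0][1]))
        x1, y1, x2, y2 = longest[0]
        meas = np.array([[(x1 + x2) / 2], [(y1 + y2) / 2]], np.float32)
        est = kf.correct(meas)
        return float(est[0, 0]), float(est[1, 0])
    return float(pred[0, 0]), float(pred[1, 0])          # coast on the prediction

# The (x, y) offset from the image centre can feed the MAV's lateral controller.
```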

Towards Robust 3D Object Detection In Rainy Conditions

  • paper_url: http://arxiv.org/abs/2310.00944
  • repo_url: None
  • paper_authors: Aldi Piroli, Vinzenz Dallabetta, Johannes Kopp, Marc Walessa, Daniel Meissner, Klaus Dietmayer
  • for: 提高探测器对路面泥浆的Robustness
  • methods: 使用现有的恶劣天气检测网络筛除路面泥浆,并使用雷达目标进行进一步的过滤false positive检测
  • results: 测试结果表明,我们的方法可以提高各种popular 3D对象检测器对路面泥浆的Robustness
    Abstract LiDAR sensors are used in autonomous driving applications to accurately perceive the environment. However, they are affected by adverse weather conditions such as snow, fog, and rain. These everyday phenomena introduce unwanted noise into the measurements, severely degrading the performance of LiDAR-based perception systems. In this work, we propose a framework for improving the robustness of LiDAR-based 3D object detectors against road spray. Our approach uses a state-of-the-art adverse weather detection network to filter out spray from the LiDAR point cloud, which is then used as input for the object detector. In this way, the detected objects are less affected by the adverse weather in the scene, resulting in a more accurate perception of the environment. In addition to adverse weather filtering, we explore the use of radar targets to further filter false positive detections. Tests on real-world data show that our approach improves the robustness to road spray of several popular 3D object detectors.
    摘要 利达(LiDAR)感知器在自动驾驶应用中用于准确感知环境。然而,它们受到日常天气Conditionssuch as snow, fog, and rain的影响,这些现象会引入LiDAR测量中的噪声,严重降低LiDAR基于感知系统的性能。在这种工作中,我们提出一种加强LiDAR基于3D объек检测系统对道路喷涂的Robustness的框架。我们的方法使用当前最佳的不良天气检测网络来筛除喷涂从LiDAR点云中,然后将筛除后的点云作为对象检测器的输入,从而使检测到的对象减少与场景中的不良天气的影响,实现更加准确的环境感知。此外,我们还探讨了使用雷达目标来进一步筛除假阳性检测的可能性。实际测试表明,我们的方法可以提高多种流行的3D对象检测器对道路喷涂的Robustness。
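
A small NumPy sketch of the two filtering stages described above: removing points flagged as spray by an adverse-weather classifier before running the 3D detector, and keeping only detections that are confirmed by a nearby radar target. The score threshold, gating radius, and array layouts are assumptions.

```python
import numpy as np

def filter_spray(points, spray_scores, thresh=0.5):
    """Keep only points the weather-classification network considers non-spray.

    points: (N, 4) LiDAR points (x, y, z, intensity); spray_scores: (N,) in [0, 1].
    """
    return points[spray_scores < thresh]

def radar_confirmed(detections, radar_targets, gate=2.0):
    """Keep 3D detections that lie within `gate` metres of at least one radar
    target in the ground plane (x, y)."""
    dists = np.linalg.norm(detections[:, None, :2] - radar_targets[None, :, :2], axis=-1)
    return detections[dists.min(axis=1) < gate]

points = np.random.rand(1000, 4) * 50
scores = np.random.rand(1000)
clean_points = filter_spray(points, scores)               # input to the 3D detector
detections = np.array([[10.0, 2.0, 0.5], [30.0, -5.0, 0.4]])   # (x, y, z) box centres
radar = np.array([[10.5, 2.2], [55.0, 0.0]])
kept = radar_confirmed(detections, radar)                  # only the first detection survives
```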

Semi-Blind Image Deblurring Based on Framelet Prior

  • paper_url: http://arxiv.org/abs/2310.00943
  • repo_url: None
  • paper_authors: M. Zarebnia, R. Parvaz
  • for: 本研究旨在解决图像模糊问题;图像模糊由手抖或相机抖动等多种因素引起。
  • methods: 本研究采用半盲图像去模糊方法;由于这是一个病态(ill-conditioned)问题,无法直接求解,因此在总变差(TV)方法的基础上,利用 framelet 变换和分数阶计算加以改进。
  • results: 所提方法在不同类型的图像上进行了测试,并与现有方法进行了比较,结果表明该方法在图像去模糊问题上具有较高的精度和稳定性。
  • for: The paper aims to solve the problem of image blurring, which is caused by various factors such as hand or camera shake.
  • methods: The proposed method uses a semi-blind image deblurring approach, which is an ill-conditioned problem and cannot be solved directly. The method improves the TV method by using the framelet transform and fractional calculations.
  • results: The proposed method is tested on different types of images and compared with existing methods, and the results show that the method has higher accuracy and stability in image deblurring.
    Abstract The problem of image blurring is one of the most studied topics in the field of image processing. Image blurring is caused by various factors such as hand or camera shake. To restore the blurred image, it is necessary to know information about the point spread function (PSF). And because in the most cases it is not possible to accurately calculate the PSF, we are dealing with an approximate kernel. In this paper, the semi-blind image deblurring problem are studied. Due to the fact that the model of the deblurring problems is an ill-conditioned problem, it is not possible to solve this problem directly. One of the most efficient ways to solve this problem is to use the total variation (TV) method. In the proposed algorithm, by using the framelet transform and fractional calculations, the TV method is improved. The proposed method is used on different types of images and is compared with existing methods with different types of tests.
    摘要 图像模糊问题是图像处理领域研究最多的课题之一。图像模糊由手抖或相机抖动等多种因素引起。要恢复模糊图像,需要知道点扩散函数(PSF)的信息;而在大多数情况下无法精确计算 PSF,因此我们面对的是一个近似的模糊核。本文研究半盲图像去模糊问题。由于去模糊问题的模型是病态的,无法直接求解;求解该问题最有效的方法之一是总变差(TV)方法。在所提算法中,通过使用 framelet 变换和分数阶计算,对 TV 方法进行了改进。所提方法在不同类型的图像上进行了测试,并通过多种实验与现有方法进行了比较。

Data Efficient Training of a U-Net Based Architecture for Structured Documents Localization

  • paper_url: http://arxiv.org/abs/2310.00937
  • repo_url: None
  • paper_authors: Anastasiia Kabeshova, Guillaume Betmont, Julien Lerouge, Evgeny Stepankevich, Alexis Bergès
  • for: 本研究旨在提高现代在线权限进程中的文档分析和识别率,并且文档地标是确定可靠关键信息提取的关键步骤。
  • methods: 我们提出了SDL-Net,一种基于encoder-decoder架构的新型文档地标方法,可以预训练encoder部分使用通用数据集,并快速和数据efficient地调整decoder部分以支持新类型文档的地标。
  • results: 我们在一个专有的文档图像数据集上进行了广泛的实验,证明了我们提出的方法的有效性和通用性。
    Abstract Structured documents analysis and recognition are essential for modern online on-boarding processes, and document localization is a crucial step to achieve reliable key information extraction. While deep-learning has become the standard technique used to solve document analysis problems, real-world applications in industry still face the limited availability of labelled data and of computational resources when training or fine-tuning deep-learning models. To tackle these challenges, we propose SDL-Net: a novel U-Net like encoder-decoder architecture for the localization of structured documents. Our approach allows pre-training the encoder of SDL-Net on a generic dataset containing samples of various document classes, and enables fast and data-efficient fine-tuning of decoders to support the localization of new document classes. We conduct extensive experiments on a proprietary dataset of structured document images to demonstrate the effectiveness and the generalization capabilities of the proposed approach.
    摘要 现代在线上办理过程中,结构化文档分析和识别是非常重要的,而文档本地化是确保可靠地提取关键信息的关键步骤。深度学习已成为解决文档分析问题的标准技术,但实际应用中仍然面临有限的标签数据和计算资源的问题。为解决这些挑战,我们提出了 SDL-Net:一种基于 U-Net 的Encoder-Decoder架构,用于本地化结构化文档。我们的方法允许预训练 Encoder 的 SDL-Net 在一个通用的 dataset 上,并允许快速和数据有效地调整 Decoder 以支持新的文档类型的本地化。我们在一个专用的结构化文档图像集上进行了广泛的实验,以证明我们的方法的有效性和泛化能力。

Trained Latent Space Navigation to Prevent Lack of Photorealism in Generated Images on Style-based Models

  • paper_url: http://arxiv.org/abs/2310.00936
  • repo_url: None
  • paper_authors: Takumi Harada, Kazuyuki Aihara, Hiroyuki Sakai
  • for: 提高 StyleGAN 模型中 latent code 的搜索和操作精度,以保持生成图像的真实性。
  • methods: 提出一种简单的无监督方法,通过识别紧密映射的 latent space,限制 latent code 的修改在本地 latent subspace 内,以保持生成图像的真实性。
  • results: 实验表明,通过本方法对 latent code 进行优化,可以保持生成图像的真实性,并且可以应用于不同类型的 style-based 模型。
    Abstract Recent studies on StyleGAN variants show promising performances for various generation tasks. In these models, latent codes have traditionally been manipulated and searched for the desired images. However, this approach sometimes suffers from a lack of photorealism in generated images due to a lack of knowledge about the geometry of the trained latent space. In this paper, we show a simple unsupervised method that provides well-trained local latent subspace, enabling latent code navigation while preserving the photorealism of the generated images. Specifically, the method identifies densely mapped latent spaces and restricts latent manipulations within the local latent subspace. Experimental results demonstrate that images generated within the local latent subspace maintain photorealism even when the latent codes are significantly and repeatedly manipulated. Moreover, experiments show that the method can be applied to latent code optimization for various types of style-based models. Our empirical evidence of the method will benefit applications in style-based models.
    摘要 近期对 StyleGAN 变体的研究在多种生成任务上表现出良好前景。在这些模型中,通常通过操纵和搜索潜在编码(latent code)来获得期望的图像;但由于缺乏对已训练潜在空间几何结构的了解,这种做法有时会使生成图像缺乏真实感。本文提出一种简单的无监督方法,能够给出训练良好的局部潜在子空间,使得在潜在编码导航时仍能保持生成图像的真实感。具体而言,该方法识别映射密集的潜在空间区域,并将潜在编码的修改限制在该局部潜在子空间内。实验结果表明,即使潜在编码被大幅、反复修改,在局部潜在子空间内生成的图像仍保持真实感;该方法还可用于多种 style-based 模型的潜在编码优化。我们的实证结果将有益于 style-based 模型的相关应用。

Enhanced Winter Road Surface Condition Monitoring with Computer Vision

  • paper_url: http://arxiv.org/abs/2310.00923
  • repo_url: https://github.com/ojalar/siwnet
  • paper_authors: Risto Ojala, Alvari Seppänen
  • for: 这篇论文的目的是提出一个深度学习 regression 模型,SIWNet,可以从摄像头图像中估计路面黏度特性。
  • methods: 这篇论文使用了一个包含 uncertainty estimation 机制的深度学习网络架构,并且使用了一个最大可能性损失函数来训练这个机制。
  • results: 研究发现,SIWNet 可以 accurately estimate road surface friction properties from camera images, and the prediction interval estimation of SIWNet is effective in quantifying the uncertainty of the predictions.
    Abstract Winter conditions pose several challenges for automated driving applications. A key challenge during winter is accurate assessment of road surface condition, as its impact on friction is a critical parameter for safely and reliably controlling a vehicle. This paper proposes a deep learning regression model, SIWNet, capable of estimating road surface friction properties from camera images. SIWNet extends state of the art by including an uncertainty estimation mechanism in the architecture. This is achieved by including an additional head in the network, which estimates a prediction interval. The prediction interval head is trained with a maximum likelihood loss function. The model was trained and tested with the SeeingThroughFog dataset, which features corresponding road friction sensor readings and images from an instrumented vehicle. Acquired results highlight the functionality of the prediction interval estimation of SIWNet, while the network also achieved similar point estimate accuracy as the previous state of the art. Furthermore, the SIWNet architecture is several times more lightweight than the previously applied state-of-the-art model, resulting in more practical and efficient deployment.
    摘要 冬季条件对自动驾驶应用 pose 多个挑战。冬季中准确评估路面条件的影响是 Critical 参数,以确保安全可靠地控制车辆。这篇论文提出了一种深度学习回归模型,SIWNet,能够从摄像头图像中估算路面逐滴性。SIWNet 进一步了现状之arte,通过包括一个不确定估计机制在网络架构中。这是通过添加一个额外头部来实现的,该头部用最大有elihood 损失函数进行训练。模型在SeeingThroughFog数据集上进行了训练和测试,测试结果表明了SIWNet 的预测interval 估计能力,而模型也达到了与之前的状态之arte 相同的点估 precisión。此外,SIWNet 的架构比之前应用的状态之arte 模型轻量级多少,使其更加实用和高效地部署。
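
A hedged PyTorch sketch of a prediction-interval head trained with a Gaussian maximum-likelihood objective, in the spirit of the uncertainty mechanism described above: one branch predicts the friction estimate, another its variance, and an interval follows from the predicted standard deviation. The feature dimension and the 95% interval construction are assumptions.

```python
import torch
import torch.nn as nn

class FrictionHead(nn.Module):
    """Point estimate + variance head on top of backbone features."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.mean = nn.Linear(feat_dim, 1)
        self.log_var = nn.Linear(feat_dim, 1)    # predict log-variance for stability

    def forward(self, feats):
        mu = self.mean(feats).squeeze(1)
        var = self.log_var(feats).squeeze(1).exp()
        return mu, var

nll = nn.GaussianNLLLoss()                        # maximum-likelihood training objective
head = FrictionHead()
feats = torch.randn(8, 512)                       # stand-in for image backbone features
target = torch.rand(8)                            # ground-truth friction in [0, 1]
mu, var = head(feats)
loss = nll(mu, target, var)

# An approximate 95% prediction interval from the predicted Gaussian:
lower, upper = mu - 1.96 * var.sqrt(), mu + 1.96 * var.sqrt()
```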

How Close are Other Computer Vision Tasks to Deepfake Detection?

  • paper_url: http://arxiv.org/abs/2310.00922
  • repo_url: None
  • paper_authors: Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
  • for: 本研究挑战了传统的信念,即监督 ImageNet 训练模型具有强大的总结能力和适用于深伪检测中的特征提取器。
  • methods: 本研究提出了一个新的衡量指标,“模型分离度”,用于可视化和量化地评估模型在无监督下的数据分离能力。我们还提供了一个系统的比较,用于检测深伪检测和其他计算机视觉任务之间的相关性。
  • results: 我们的分析显示,预训练面部识别模型与深伪检测更加相关,而其他模型则更加与其他计算机视觉任务相关。自动学习方法学习的模型在分离方面更高效,但是可能存在过拟合风险。我们的结果为研究人员和实践者提供了有价值的指导,帮助他们开发更有效的深伪检测模型。
    Abstract In this paper, we challenge the conventional belief that supervised ImageNet-trained models have strong generalizability and are suitable for use as feature extractors in deepfake detection. We present a new measurement, "model separability," for visually and quantitatively assessing a model's raw capacity to separate data in an unsupervised manner. We also present a systematic benchmark for determining the correlation between deepfake detection and other computer vision tasks using pre-trained models. Our analysis shows that pre-trained face recognition models are more closely related to deepfake detection than other models. Additionally, models trained using self-supervised methods are more effective in separation than those trained using supervised methods. After fine-tuning all models on a small deepfake dataset, we found that self-supervised models deliver the best results, but there is a risk of overfitting. Our results provide valuable insights that should help researchers and practitioners develop more effective deepfake detection models.
    摘要 在这篇论文中,我们挑战了传统的认知,即超级视频批处理模型具有强大的总体化能力和适用于深伪检测中的特征提取器。我们提出了一个新的评价指标,即“模型分离度”,用于不经指导的方式评估模型的原始能力分离数据。我们还提供了一个系统的比较方法,用于确定深伪检测和其他计算机视觉任务之间的相关性,使用预训练模型。我们的分析表明,预训练人脸识别模型和深伪检测更加相关,而使用自我指导方法进行训练的模型则更好地进行分离。经过所有模型的微调using一小型深伪数据集,我们发现自我指导模型提供了最佳的结果,但也存在风险的过拟合。我们的结果提供了价值的意见,可以帮助研究人员和实践者开发更有效的深伪检测模型。

Every Dataset Counts: Scaling up Monocular 3D Object Detection with Joint Datasets Training

  • paper_url: http://arxiv.org/abs/2310.00920
  • repo_url: https://github.com/Owen-Liuyuxuan/visionfactory
  • paper_authors: Fulong Ma, Xiaoyang Yan, Yuxuan Liu, Ming Liu
  • for: 实现自动驾驶中的单目3D物体检测,但现有的单目3D检测算法受到3D标签的限制,这些标签仅从LiDAR测量获取,价格高昂且在新环境中实施具有挑战性。本研究探讨实现单目3D物体检测模型的训练管线。
  • methods: 提出一个三部分架构,包括:(1) 一个可靠的单目3D模型,能够在不同的摄像头设定下运作,(2) 选择性的训练策略,以应对具有不同分类标签的数据集,以及(3) 使用2D标签进行伪3D训练,以增强在仅具有2D标签的场景中的检测性能。
  • results: 透过实验证明,提出的方法可以训练模型在各种开放3D/2D数据集上,实现模型在新数据集上的强大普遍化能力和优化检测性能。
    Abstract Monocular 3D object detection plays a crucial role in autonomous driving. However, existing monocular 3D detection algorithms depend on 3D labels derived from LiDAR measurements, which are costly to acquire for new datasets and challenging to deploy in novel environments. Specifically, this study investigates the pipeline for training a monocular 3D object detection model on a diverse collection of 3D and 2D datasets. The proposed framework comprises three components: (1) a robust monocular 3D model capable of functioning across various camera settings, (2) a selective-training strategy to accommodate datasets with differing class annotations, and (3) a pseudo 3D training approach using 2D labels to enhance detection performance in scenes containing only 2D labels. With this framework, we could train models on a joint set of various open 3D/2D datasets to obtain models with significantly stronger generalization capability and enhanced performance on new dataset with only 2D labels. We conduct extensive experiments on KITTI/nuScenes/ONCE/Cityscapes/BDD100K datasets to demonstrate the scaling ability of the proposed method.

BAAF: A Benchmark Attention Adaptive Framework for Medical Ultrasound Image Segmentation Tasks

  • paper_url: http://arxiv.org/abs/2310.00919
  • repo_url: https://github.com/cgpxy/baaf
  • paper_authors: Gongping Chen, Lei Zhao, Xiaotao Yin, Liang Cui, Jianxun Zhang, Yu Dai
  • for: 这个研究旨在提出一种更通用和鲁棒的Benchmark Attention Adaptive Framework(BAAF),帮助医生更快速、更准确地分割超声图像中的病变或组织。
  • methods: 该方法包括一个并行混合注意模块(PHAM)和一个自适应校准机制(ACM)。具体来说,BAAF首先从通道和空间维度粗略校准输入特征,然后从粗校准后的特征图中自适应选择更鲁棒的病变或组织特征。
  • results: 实验结果表明,BAAF在四个医学超声分割任务上表现出显著的性能提升,优于现有最佳方法。该方法有望提供自动化医学超声诊断辅助,减少对人工准确率和精度的依赖。
    Abstract The AI-based assisted diagnosis programs have been widely investigated on medical ultrasound images. Complex scenario of ultrasound image, in which the coupled interference of internal and external factors is severe, brings a unique challenge for localize the object region automatically and precisely in ultrasound images. In this study, we seek to propose a more general and robust Benchmark Attention Adaptive Framework (BAAF) to assist doctors segment or diagnose lesions and tissues in ultrasound images more quickly and accurately. Different from existing attention schemes, the BAAF consists of a parallel hybrid attention module (PHAM) and an adaptive calibration mechanism (ACM). Specifically, BAAF first coarsely calibrates the input features from the channel and spatial dimensions, and then adaptively selects more robust lesion or tissue characterizations from the coarse-calibrated feature maps. The design of BAAF further optimizes the "what" and "where" focus and selection problems in CNNs and seeks to improve the segmentation accuracy of lesions or tissues in medical ultrasound images. The method is evaluated on four medical ultrasound segmentation tasks, and the adequate experimental results demonstrate the remarkable performance improvement over existing state-of-the-art methods. In addition, the comparison with existing attention mechanisms also demonstrates the superiority of BAAF. This work provides the possibility for automated medical ultrasound assisted diagnosis and reduces reliance on human accuracy and precision.
    摘要 《人工智能支持的医疗ultrasound图像分割方法》已经广泛研究。ultrasound图像的复杂场景,即内部和外部因素的整合干扰严重,带来了自动LOCAL化对象区域的特殊挑战。在本研究中,我们提出了一种更通用和Robust的Benchmark Attention Adaptive Framework(BAAF),以帮助医生更快、更准确地分割或诊断ultrasound图像中的肿瘤或组织。BAAF与已有的注意机制不同,它包括一个并行混合注意模块(PHAM)和一个适应calibration机制(ACM)。具体来说,BAAF首先粗略调整输入特征从通道和空间维度,然后适应地选择更Robust的肿瘤或组织特征从粗略调整后的特征地图。BAAF的设计解决了CNN中的“what”和“where”注意和选择问题,并且提高了肿瘤或组织分割精度。对四种医疗ultrasound分割任务进行了评估,实验结果表明BAAF的表现很出色,与现有状态的方法相比,具有显著的性能提升。此外,与现有的注意机制进行比较,BAAF也表现出了superiority。这种工作为医疗ultrasound辅助诊断提供了自动化的可能性,减少了人类精度和精密性的依赖。

Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance

  • paper_url: http://arxiv.org/abs/2310.00917
  • repo_url: None
  • paper_authors: Alloy Das, Sanket Biswas, Ayan Banerjee, Saumik Bhattacharya, Josep Lladós, Umapada Pal
  • for: 本研究旨在探讨Scene Text Spotting(文本检测)模型在不同领域中的适应能力,以便在实际应用中能够更好地适应不同的环境和场景。
  • methods: 本研究使用了Transformer基eline called Swin-TESTR,通过在多个领域的源数据上进行训练,以达到文本检测模型能够直接适应目标领域的目的。
  • results: 结果表明,使用中间表示可以在多个领域的文本检测benchmark上达到显著的性能提升, both in terms of accuracy和 efficiency。
    Abstract The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here, we investigate the problem of domain-adaptive scene text spotting, i.e., training a model on multi-domain source data such that it can directly adapt to target domains rather than being specialized for a specific domain or scenario. Further, we investigate a transformer baseline called Swin-TESTR to focus on solving scene-text spotting for both regular and arbitrary-shaped scene text along with an exhaustive evaluation. The results clearly demonstrate the potential of intermediate representations to achieve significant performance on text spotting benchmarks across multiple domains (e.g. language, synth-to-real, and documents). both in terms of accuracy and efficiency.

A Decentralized Cooperative Navigation Approach for Visual Homing Networks

  • paper_url: http://arxiv.org/abs/2310.00906
  • repo_url: None
  • paper_authors: Mohamed Rahouti, Damian Lyons, Senthil Kumar Jagatheesaperumal, Kaiqi Xiong
  • for: 这种方法用于visual navigation,特别是 для一个病理多种机器人团队在广泛的视觉导航中进行导航。
  • methods: 这种方法使用了块链技术,不需要地图数据结构,可以在机器人平台上进行实时计算,并且可以在不可靠的视觉导航网络中达成共识。
  • results: 这种方法可以支持一个可靠和适应的导航路径选择,并且可以在缺乏地图数据的情况下实现视觉导航。
    Abstract Visual homing is a lightweight approach to visual navigation. Given the stored information of an initial 'home' location, the navigation task back to this location is achieved from any other location by comparing the stored home information to the current image and extracting a motion vector. A challenge that constrains the applicability of visual homing is that the home location must be within the robot's field of view to initiate the homing process. Thus, we propose a blockchain approach to visual navigation for a heterogeneous robot team over a wide area of visual navigation. Because it does not require map data structures, the approach is useful for robot platforms with a small computational footprint, and because it leverages current visual information, it supports a resilient and adaptive path selection. Further, we present a lightweight Proof-of-Work (PoW) mechanism for reaching consensus in the untrustworthy visual homing network.
    摘要 Visual homing 是一种轻量级的视觉导航方法。从存储的初始"家"位置信息开始,导航任务返回到该位置从任何其他位置进行比较,并提取动作 вектор。一个挑战是要求家位置在机器人的视场内以便启动寻回过程。因此,我们提议使用区块链方法来支持多种机器人团队在广泛的视觉导航中进行寻回。由于不需要地图数据结构,这种方法适用于具有小型计算机脚本的机器人平台,而且由于使用当前视觉信息,它支持恒久和适应的路径选择。此外,我们提出了一种轻量级的 Proof-of-Work(PoW)机制来达成不可信视觉寻回网络中的一致。

JPEG Information Regularized Deep Image Prior for Denoising

  • paper_url: http://arxiv.org/abs/2310.00894
  • repo_url: None
  • paper_authors: Tsukasa Takagi, Shinya Ishizaki, Shin-ichi Maeda
  • for: 图像干涉除 (image denoising) 是计算机视觉中的一个重要任务,尤其是从干涉图像中进行图像恢复。
  • methods: 深度图像假设 (DIP) 提出了基于卷积神经网络架构的图像恢复方法,无需任何预训练。但是,DIP 的主要挑战是,它会完全恢复原始干涉图像,除非应用早期停止。
  • results: 我们提议使用 JPEG 文件大小来监控优化过程中的干涉水平,作为早期停止的代理指标。我们的实验表明,压缩图像文件大小可以作为有效的指标来实现早期停止。
    Abstract Image denoising is a representative image restoration task in computer vision. Recent progress of image denoising from only noisy images has attracted much attention. Deep image prior (DIP) demonstrated successful image denoising from only a noisy image by inductive bias of convolutional neural network architectures without any pre-training. The major challenge of DIP based image denoising is that DIP would completely recover the original noisy image unless applying early stopping. For early stopping without a ground-truth clean image, we propose to monitor JPEG file size of the recovered image during optimization as a proxy metric of noise levels in the recovered image. Our experiments show that the compressed image file size works as an effective metric for early stopping.
    摘要 Image denoising 是计算机视觉中的一个代表性图像恢复任务。最近的进展在只有噪图像时进行图像恢复吸引了很多关注。深度图像先验(DIP)成功地通过循环神经网络架构的卷积假设来实现只有噪图像的图像恢复。但DIP基于的图像恢复具有一定的挑战,即DIP只有在应用早期停止时才能完全恢复原始噪图像。为了在优化过程中实现早期停止而不需要标准clean图像,我们提议监测优化过程中图像恢复后的JPEG文件大小作为噪度水平的代理指标。我们的实验表明,压缩图像文件大小indeed作为有效的停止指标。
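
A minimal sketch of the proposed proxy: JPEG-encode the current DIP reconstruction in memory and monitor the file size during optimization. The stopping rule used here (stop once the reconstruction's JPEG size approaches that of the noisy input) is an illustrative assumption; the paper's exact criterion may differ.

```python
import io
import numpy as np
from PIL import Image

def jpeg_size(img_uint8, quality=95):
    """Size in bytes of the in-memory JPEG encoding of an (H, W, 3) uint8 image."""
    buf = io.BytesIO()
    Image.fromarray(img_uint8).save(buf, format="JPEG", quality=quality)
    return buf.tell()

def should_stop(recon_uint8, noisy_uint8, ratio=0.9):
    """Illustrative rule: stop once the reconstruction's JPEG size reaches a set
    fraction of the noisy input's JPEG size, i.e. the output is becoming as
    hard to compress (as noisy) as the input."""
    return jpeg_size(recon_uint8) >= ratio * jpeg_size(noisy_uint8)

noisy = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)    # stand-in noisy image
recon = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)    # stand-in DIP output
if should_stop(recon, noisy):
    print("stop DIP optimization here")
```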

PC-NeRF: Parent-Child Neural Radiance Fields under Partial Sensor Data Loss in Autonomous Driving Environments

  • paper_url: http://arxiv.org/abs/2310.00874
  • repo_url: https://github.com/biter0088/pc-nerf
  • paper_authors: Xiuzhong Hu, Guangming Xiong, Zheng Zang, Peng Jia, Yuxuan Han, Junyi Ma
  • for: 大规模3D场景重建是自动驾驶车辆的关键,特别是当部分感知数据丢失时。 although recently developed neural radiance fields (NeRF) have shown promising results in implicit representations, large-scale 3D scene reconstruction using partially lost LiDAR point cloud data still needs to be explored.
  • methods: we propose a novel 3D scene reconstruction framework called parent-child neural radiance field (PC-NeRF), which comprises two modules: the parent NeRF and the child NeRF. the framework simultaneously optimizes scene-level, segment-level, and point-level scene representations, allowing for more efficient utilization of sensor data and quick obtainment of an approximate volumetric representation of the scene even with limited observations.
  • results: our proposed PC-NeRF is proven to achieve high-precision 3D reconstruction in large-scale scenes, and can effectively tackle situations where partial sensor data is lost. our approach has high deployment efficiency with limited training time, and the pre-trained models and implementation will be available at https://github.com/biter0088/pc-nerf.
    Abstract Reconstructing large-scale 3D scenes is essential for autonomous vehicles, especially when partial sensor data is lost. Although the recently developed neural radiance fields (NeRF) have shown compelling results in implicit representations, the large-scale 3D scene reconstruction using partially lost LiDAR point cloud data still needs to be explored. To bridge this gap, we propose a novel 3D scene reconstruction framework called parent-child neural radiance field (PC-NeRF). The framework comprises two modules, the parent NeRF and the child NeRF, to simultaneously optimize scene-level, segment-level, and point-level scene representations. Sensor data can be utilized more efficiently by leveraging the segment-level representation capabilities of child NeRFs, and an approximate volumetric representation of the scene can be quickly obtained even with limited observations. With extensive experiments, our proposed PC-NeRF is proven to achieve high-precision 3D reconstruction in large-scale scenes. Moreover, PC-NeRF can effectively tackle situations where partial sensor data is lost and has high deployment efficiency with limited training time. Our approach implementation and the pre-trained models will be available at https://github.com/biter0088/pc-nerf.
    摘要 大规模 3D 场景重建对自动驾驶车辆至关重要,尤其是在部分传感器数据丢失的情况下。尽管最近提出的神经辐射场(NeRF)在隐式表示方面已经展现出令人信服的结果,但如何利用部分丢失的 LiDAR 点云数据进行大规模 3D 场景重建仍有待探索。为填补这一空白,我们提出了一种新的 3D 场景重建框架:父-子神经辐射场(PC-NeRF)。该框架包含父 NeRF 和子 NeRF 两个模块,同时优化场景级、分段级和点级的场景表示。借助子 NeRF 的分段级表示能力,可以更高效地利用传感器数据,即便观测有限也能快速获得场景的近似体积表示。大量实验表明,PC-NeRF 能在大规模场景中实现高精度 3D 重建,并能有效应对部分传感器数据丢失的情况,且训练时间有限、部署效率高。我们的方法实现和预训练模型将发布在 https://github.com/biter0088/pc-nerf。

RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal Consistency to Frame-Based Domain Translation Approaches

  • paper_url: http://arxiv.org/abs/2310.00868
  • repo_url: https://github.com/nadeemlab/CEP
  • paper_authors: Shawn Mathew, Saad Nadeem, Alvin C. Goh, Arie Kaufman
  • for: 这篇论文旨在为内窥镜视频开发新的无监督域翻译方法。
  • methods: 先在单帧方法的基础上进行训练,然后加入连续帧,并使用改进的深度学习架构训练新模型以获得时间一致性。
  • results: 论文提出了带可调时间参数的轻量级方案 RT-GAN,可将训练需求降低约 5 倍,并在结肠镜检查的两个困难应用(结肠袋皱襞分割和逼真的结肠镜模拟视频生成)上验证了其有效性。
  • for: This paper aims to develop new unsupervised domain translation methods for endoscopy videos.
  • methods: The paper uses individual-frame methods and then adds contiguous frames with a modified deep learning architecture to train a new model for temporal consistency.
  • results: The paper proposes a lightweight solution with a tunable temporal parameter, RT-GAN, that reduces training requirements by a factor of 5. The effectiveness of the approach is demonstrated on two challenging use cases in colonoscopy: haustral fold segmentation and realistic colonoscopy simulator video generation.
    Abstract While developing new unsupervised domain translation methods for endoscopy videos, it is typical to start with approaches that initially work for individual frames without temporal consistency. Once an individual-frame model has been finalized, additional contiguous frames are added with a modified deep learning architecture to train a new model for temporal consistency. This transition to temporally-consistent deep learning models, however, requires significantly more computational and memory resources for training. In this paper, we present a lightweight solution with a tunable temporal parameter, RT-GAN (Recurrent Temporal GAN), for adding temporal consistency to individual frame-based approaches that reduces training requirements by a factor of 5. We demonstrate the effectiveness of our approach on two challenging use cases in colonoscopy: haustral fold segmentation (indicative of missed surface) and realistic colonoscopy simulator video generation. The datasets, accompanying code, and pretrained models will be made available at \url{https://github.com/nadeemlab/CEP}.
    摘要 在开发新的无监督频道翻译方法时,通常会从个别帧的方向开始,然后逐渐添加连续的帧来训练一个新的模型以保证时间一致性。然而,这种从无监督频道模型到时间一致的转换需要更多的计算资源和内存资源进行训练。在这篇论文中,我们提出了一种轻量级解决方案,即RT-GAN(循环时间GAN),用于在个别帧基础上添加时间一致性,从而降低训练需求的 factor of 5。我们在两个具有挑战性的colonoscopy任务中, namely haustral fold segmentation(表示过时的表面)和realistic colonoscopy simulator video生成任务中,证明了我们的方法的有效性。我们将在 \url{https://github.com/nadeemlab/CEP} 上提供数据集、代码和预训练模型。

Can Pre-trained Networks Detect Familiar Out-of-Distribution Data?

  • paper_url: http://arxiv.org/abs/2310.00847
  • repo_url: https://github.com/atsumiyai/pt-ood
  • paper_authors: Atsuyuki Miyai, Qing Yu, Go Irie, Kiyoharu Aizawa
  • for: 本研究旨在探讨预训练模型中PT-OOD数据对OOD探测性能的影响,以便更好地理解现有的探测方法在实际应用中的效果。
  • methods: 本研究使用了预训练模型,并通过线性探测进行PT-OOD探测。同时,我们还比较了超级vised和自我监督预训练方法的PT-OOD探测性能。
  • results: 我们发现PT-OOD在特征空间的低linear分离度对OOD探测性能产生了重大影响,而自我监督预训练模型比超级vised模型更容易受到PT-OOD的影响,即使使用了当前最佳的探测方法。为解决这个敏感性,我们还提出了一种独特的解决方案:利用预训练模型中强大的实例对实例分类表示,独立于ID决策边界进行OOD探测。
    Abstract Out-of-distribution (OOD) detection is critical for safety-sensitive machine learning applications and has been extensively studied, yielding a plethora of methods developed in the literature. However, most studies for OOD detection did not use pre-trained models and trained a backbone from scratch. In recent years, transferring knowledge from large pre-trained models to downstream tasks by lightweight tuning has become mainstream for training in-distribution (ID) classifiers. To bridge the gap between the practice of OOD detection and current classifiers, the unique and crucial problem is that the samples whose information networks know often come as OOD input. We consider that such data may significantly affect the performance of large pre-trained networks because the discriminability of these OOD data depends on the pre-training algorithm. Here, we define such OOD data as PT-OOD (Pre-Trained OOD) data. In this paper, we aim to reveal the effect of PT-OOD on the OOD detection performance of pre-trained networks from the perspective of pre-training algorithms. To achieve this, we explore the PT-OOD detection performance of supervised and self-supervised pre-training algorithms with linear-probing tuning, the most common efficient tuning method. Through our experiments and analysis, we find that the low linear separability of PT-OOD in the feature space heavily degrades the PT-OOD detection performance, and self-supervised models are more vulnerable to PT-OOD than supervised pre-trained models, even with state-of-the-art detection methods. To solve this vulnerability, we further propose a unique solution to large-scale pre-trained models: Leveraging powerful instance-by-instance discriminative representations of pre-trained models and detecting OOD in the feature space independent of the ID decision boundaries. The code will be available via https://github.com/AtsuMiyai/PT-OOD.
    摘要 OUT-OF-DISTRIBUTION (OOD) 检测是安全敏感机器学习应用中的关键问题,文献中已有大量研究,但大多数研究不使用预训练模型,从scratch 训练了后续任务的后续任务。在过去几年,通过将大型预训练模型的知识传递到下游任务,使用轻量级调整成为现代培训ID类器的主流方法。为了跨越现有的类ifier和OOD检测方法之间的差距,我们认为这样的OOD数据可能会对大型预训练网络的性能产生很大影响,因为这些OOD数据的分类可能与预训练算法有关。我们将这种OOD数据称为PT-OOD(预训练OOD)数据。在这篇论文中,我们想要探讨PT-OOD对预训练网络的OOD检测性能的影响,从预训练算法的角度出发。为了实现这一目标,我们在supervised和self-supervised预训练算法中进行了线性探索调整,这是最常用的效率调整方法。经过我们的实验和分析,我们发现PT-OOD在特征空间的低线性分割对OOD检测性能产生重要影响,而自动化模型更容易受到PT-OOD的影响,即使使用当前最佳检测方法。为解决这一问题,我们进一步提出了一种对大规模预训练模型的解决方案:利用预训练模型强大的实例对实例权威表示,独立于ID决策边界在特征空间检测OOD。代码将通过https://github.com/AtsuMiyai/PT-OOD 提供。
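
As one plausible instantiation of detecting OOD in feature space independently of the linear probe's ID decision boundaries, the sketch below scores test samples by their k-nearest-neighbour distance to in-distribution features from the frozen pre-trained encoder; the normalization, choice of k, and thresholding are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_ood_scores(id_feats, test_feats, k=10):
    """Higher score = more likely OOD. Features come from a frozen pre-trained encoder."""
    idf = id_feats / np.linalg.norm(id_feats, axis=1, keepdims=True)
    tf = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    nn = NearestNeighbors(n_neighbors=k).fit(idf)
    dists, _ = nn.kneighbors(tf)            # (N_test, k) Euclidean distances
    return dists[:, -1]                      # distance to the k-th nearest ID feature

id_feats = np.random.randn(500, 384)         # e.g. frozen ViT features of ID training data
test_feats = np.random.randn(20, 384)
scores = knn_ood_scores(id_feats, test_feats)
# Flag as OOD when the score exceeds a threshold chosen on held-out ID data
# (e.g. the 95th percentile of ID scores).
```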

Elastic Interaction Energy Loss for Traffic Image Segmentation

  • paper_url: http://arxiv.org/abs/2310.01449
  • repo_url: None
  • paper_authors: Yaxin Feng, Yuan Lan, Luchan Zhang, Yang Xiang
  • for: 这篇论文旨在提高实时车道景象理解中的图像分割精度和实时运算速度。
  • methods: 本论文提出了一种简单 yet efficient的geometry-sensitive energy-based损失函数,用于增强Convolutional Neural Network (CNN)的多类分割能力。
  • results: experiments show that the proposed method consistently improves performance, especially when using real-time, lightweight networks as the backbones, which is more suitable for autonomous driving.
    Abstract Segmentation is a pixel-level classification of images. The accuracy and fast inference speed of image segmentation are crucial for autonomous driving safety. Fine and complex geometric objects are the most difficult but important recognition targets in traffic scene, such as pedestrians, traffic signs and lanes. In this paper, a simple and efficient geometry-sensitive energy-based loss function is proposed to Convolutional Neural Network (CNN) for multi-class segmentation on real-time traffic scene understanding. To be specific, the elastic interaction energy (EIE) between two boundaries will drive the prediction moving toward the ground truth until completely overlap. The EIE loss function is incorporated into CNN to enhance accuracy on fine-scale structure segmentation. In particular, small or irregularly shaped objects can be identified more accurately, and discontinuity issues on slender objects can be improved. Our approach can be applied to different segmentation-based problems, such as urban scene segmentation and lane detection. We quantitatively and qualitatively analyze our method on three traffic datasets, including urban scene data Cityscapes, lane data TuSimple and CULane. The results show that our approach consistently improves performance, especially when using real-time, lightweight networks as the backbones, which is more suitable for autonomous driving.
    摘要 Segmentation 是图像水平分类的过程。图像 segmentation 的准确性和快速推理速度对自动驾驶安全性至关重要。交通场景中最难但最重要的认知目标是细小复杂的几何对象,如行人、交通标志和车道。本文提出了一种简单有效的几何敏感能量基函数,用于 Convolutional Neural Network (CNN) 的多类分类。具体来说,在实际执行时,两个边界之间的弹性交互能量 (EIE) 会使预测向真实值靠拢,直到完全重叠。EIE 损失函数被集成到 CNN 中,以提高细致结构分类的准确性。特别是,小型或不规则形状的对象可以更加准确地被识别出来,并且可以解决车道板材的不连续问题。我们的方法可以应用于不同的分类问题,如城市场景分类和车道检测。我们对三个交通数据集进行了量化和质量分析,结果表明,我们的方法在使用实时、轻量级网络作为 backing 时,能够一直提高表现,特别是在使用真实值、快速推理的情况下。

Large Scale Masked Autoencoding for Reducing Label Requirements on SAR Data

  • paper_url: http://arxiv.org/abs/2310.00826
  • repo_url: None
  • paper_authors: Matt Allen, Francisco Dorr, Joseph A. Gallego-Mejia, Laura Martínez-Ferrer, Anna Jungbluth, Freddie Kalaitzis, Raúl Ramos-Pollán
  • for: 这篇论文面向人为气候变化的监测与缓解。
  • methods: 论文对覆盖地球陆地面积约 8.7% 的 SAR 幅度数据应用基于掩码自编码(masked autoencoding)的自监督预训练,并在植被覆盖预测和土地覆盖分类两个下游任务上微调。
  • results: 该预训练将下游任务的标注需求降低一个数量级以上,并且在预训练集之外的地区也能很好地泛化,有助于为气候变化监测与缓解部署更快速、准确的方案。
    Abstract Satellite-based remote sensing is instrumental in the monitoring and mitigation of the effects of anthropogenic climate change. Large scale, high resolution data derived from these sensors can be used to inform intervention and policy decision making, but the timeliness and accuracy of these interventions is limited by use of optical data, which cannot operate at night and is affected by adverse weather conditions. Synthetic Aperture Radar (SAR) offers a robust alternative to optical data, but its associated complexities limit the scope of labelled data generation for traditional deep learning. In this work, we apply a self-supervised pretraining scheme, masked autoencoding, to SAR amplitude data covering 8.7\% of the Earth's land surface area, and tune the pretrained weights on two downstream tasks crucial to monitoring climate change - vegetation cover prediction and land cover classification. We show that the use of this pretraining scheme reduces labelling requirements for the downstream tasks by more than an order of magnitude, and that this pretraining generalises geographically, with the performance gain increasing when tuned downstream on regions outside the pretraining set. Our findings significantly advance climate change mitigation by facilitating the development of task and region-specific SAR models, allowing local communities and organizations to deploy tailored solutions for rapid, accurate monitoring of climate change effects.
    摘要
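
For readers unfamiliar with the pretraining scheme, the sketch below shows a deliberately tiny masked-autoencoding step on single-channel amplitude tiles: patches are masked at a fixed ratio, a small encoder/decoder reconstructs pixel values, and the loss is computed only on masked patches. The patch size, masking ratio, and linear encoder/decoder are placeholders (a full MAE also encodes only the visible patches), so treat this as a schematic rather than the paper's architecture.

```python
# Minimal masked-autoencoding sketch for single-channel amplitude images.
import torch
import torch.nn as nn

patch, ratio, dim = 16, 0.75, 128                 # patch size, mask ratio, embed dim

def patchify(x):                                  # x: (B, 1, H, W) -> (B, N, patch*patch)
    b = x.shape[0]
    x = x.unfold(2, patch, patch).unfold(3, patch, patch)
    return x.reshape(b, -1, patch * patch)

class TinyMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(patch * patch, dim), nn.GELU(),
                                 nn.Linear(dim, dim))
        self.dec = nn.Linear(dim, patch * patch)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, tokens, keep):
        z = self.enc(tokens)                                       # encode every patch
        z = torch.where(keep.unsqueeze(-1), z, self.mask_token)    # masked patches -> mask token
        return self.dec(z)                                         # reconstruct pixel values

model = TinyMAE()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
x = torch.rand(4, 1, 64, 64)                      # stand-in SAR amplitude tiles
tokens = patchify(x)
keep = torch.rand(tokens.shape[:2]) > ratio       # True = visible patch
recon = model(tokens, keep)
loss = (recon - tokens)[~keep].pow(2).mean()      # reconstruction loss on masked patches only
loss.backward(); opt.step()
print(float(loss))
```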

cs.AI - 2023-10-02

Transcending Domains through Text-to-Image Diffusion: A Source-Free Approach to Domain Adaptation

  • paper_url: http://arxiv.org/abs/2310.01701
  • repo_url: None
  • paper_authors: Shivang Chopra, Suraj Kothawade, Houda Aynaou, Aman Chadha
  • for: 在目标域缺乏标注数据的情况下,利用相关源域的充足标注数据来增强模型在目标域的性能。
  • methods: 使用文本到图像扩散模型在目标领域样本上训练,并使用预训练的源模型进行微调,生成相似于源数据的样本。
  • results: 在标准Office-31、Office-Home和VisDA benchmark上比较多个基线方法,证明了我们的方法在SFDA任务中的效果。
    Abstract Domain Adaptation (DA) is a method for enhancing a model's performance on a target domain with inadequate annotated data by applying the information the model has acquired from a related source domain with sufficient labeled data. The escalating enforcement of data-privacy regulations like HIPAA, COPPA, FERPA, etc. have sparked a heightened interest in adapting models to novel domains while circumventing the need for direct access to the source data, a problem known as Source-Free Domain Adaptation (SFDA). In this paper, we propose a novel framework for SFDA that generates source data using a text-to-image diffusion model trained on the target domain samples. Our method starts by training a text-to-image diffusion model on the labeled target domain samples, which is then fine-tuned using the pre-trained source model to generate samples close to the source data. Finally, we use Domain Adaptation techniques to align the artificially generated source data with the target domain data, resulting in significant performance improvements of the model on the target domain. Through extensive comparison against several baselines on the standard Office-31, Office-Home, and VisDA benchmarks, we demonstrate the effectiveness of our approach for the SFDA task.
    摘要 域 Adaptation(DA)是一种方法,用于在目标域中提高模型的性能,使用相关的源域数据,具有足够的标注数据。随着数据隐私法规如HIPAA、COPPA、FERPA等的推广,SFDA问题(无法直接访问源数据的域 adaptation)在模型适应新域时引起了更高的兴趣。在这篇论文中,我们提出了一种新的SFDA框架,通过在目标域样本上训练文本到图像扩散模型,生成源数据。我们的方法从目标域上的标注样本中训练文本到图像扩散模型,然后使用预训练的源模型进行微调,生成与源数据相似的样本。最后,我们使用域 adaptation技术将人工生成的源数据与目标域数据进行对齐,从而实现了模型在目标域的显著性能提升。通过对标准Office-31、Office-Home和VisDA测试集进行广泛的比较,我们证明了我们的方法在SFDA任务中的有效性。
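
The final step of the pipeline, aligning the generated source-like data with the target domain, can be done with standard domain-adaptation objectives. The sketch below uses a Gaussian-kernel MMD between feature batches as one common choice; the toy feature extractor and the MMD weight are assumptions for illustration, not necessarily the paper's exact alignment objective.

```python
# Schematic alignment step: minimize Maximum Mean Discrepancy (MMD) between
# features of diffusion-generated "source-like" images and target images,
# alongside a classification loss on the generated, labeled samples.
import torch
import torch.nn as nn

def gaussian_mmd(a, b, sigma=1.0):
    """MMD^2 between feature batches a (n, d) and b (m, d) with a Gaussian kernel."""
    def k(x, y):
        d2 = torch.cdist(x, y).pow(2)
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

feat = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
head = nn.Linear(64, 10)
opt = torch.optim.Adam(list(feat.parameters()) + list(head.parameters()), lr=1e-3)

# Stand-ins: generated source-like images with labels, and unlabeled target images.
gen_src_x, gen_src_y = torch.rand(32, 3, 32, 32), torch.randint(0, 10, (32,))
tgt_x = torch.rand(32, 3, 32, 32)

fs, ft = feat(gen_src_x), feat(tgt_x)
loss = nn.functional.cross_entropy(head(fs), gen_src_y) + 0.5 * gaussian_mmd(fs, ft)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```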

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

  • paper_url: http://arxiv.org/abs/2310.04445
  • repo_url: None
  • paper_authors: Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Joseph Konan, Dareen Alharthi, Hazim T Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, Rita Singh
  • for: 研究大语言模型(LLM)的对齐如何被特制的攻击后缀绕过,从而诱发有害回答,以及此类攻击如何迁移到无法直接访问的私有目标模型。
  • methods: 使用公共模型作为代理构造攻击,并将成功的攻击从公共代理模型迁移到私有目标模型。
  • results: 提出本地微调(LoFT),即在恶意查询的词汇-语义邻域内的类似查询上微调代理模型,以减小代理模型与目标模型之间的差异。实验结果显示,本地微调可以提高攻击的可迁移性,对于 ChatGPT、GPT-4 和 Claude 目标模型,攻击成功率分别提高了 $39\%$、$7\%$ 和 $0.5\%$(绝对值)。
    Abstract It has been shown that Large Language Model (LLM) alignments can be circumvented by appending specially crafted attack suffixes with harmful queries to elicit harmful responses. To conduct attacks against private target models whose characterization is unknown, public models can be used as proxies to fashion the attack, with successful attacks being transferred from public proxies to private target models. The success rate of attack depends on how closely the proxy model approximates the private model. We hypothesize that for attacks to be transferrable, it is sufficient if the proxy can approximate the target model in the neighborhood of the harmful query. Therefore, in this paper, we propose \emph{Local Fine-Tuning (LoFT)}, \textit{i.e.}, fine-tuning proxy models on similar queries that lie in the lexico-semantic neighborhood of harmful queries to decrease the divergence between the proxy and target models. First, we demonstrate three approaches to prompt private target models to obtain similar queries given harmful queries. Next, we obtain data for local fine-tuning by eliciting responses from target models for the generated similar queries. Then, we optimize attack suffixes to generate attack prompts and evaluate the impact of our local fine-tuning on the attack's success rate. Experiments show that local fine-tuning of proxy models improves attack transferability and increases attack success rate by $39\%$, $7\%$, and $0.5\%$ (absolute) on target models ChatGPT, GPT-4, and Claude respectively.
    摘要 研究人员已经发现,可以使用特制的攻击 suffix 让 Large Language Model (LLM) alignment 被绕过。为了进行对私人目标模型的攻击,可以使用公共模型作为代理,通过成功地从公共代理传输攻击到私人目标模型。攻击成功率取决于代理模型与目标模型之间的相似性。我们假设,只要代理模型可以在恶意查询附近与目标模型相似,那么攻击就可以传输。因此,在这篇论文中,我们提出了本地精细调整(LoFT),即将代理模型在恶意查询附近的类似查询上进行精细调整,以减少代理模型与目标模型之间的差异。首先,我们介绍了三种方法,以获得与恶意查询相似的查询。然后,我们通过目标模型对生成的类似查询进行响应来获得本地调整数据。最后,我们优化攻击提示符,并评估本地调整对攻击成功率的影响。实验结果显示,本地调整可以提高代理模型对攻击的传输性和攻击成功率,具体上提高了 $39\%$, $7\%$, 和 $0.5\%$(绝对)的攻击成功率。
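
A minimal sketch of the local fine-tuning loop, assuming we already have neighborhood queries and the target model's responses to them: each (query, response) pair is used as a causal-LM training example for the local proxy. The tiny gpt2 proxy, the hand-written benign queries, and the canned responses below are placeholders for illustration only.

```python
# Local fine-tuning of a proxy LM on (neighborhood query, target response) pairs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
proxy = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(proxy.parameters(), lr=5e-5)

# Placeholder pairs: queries near the query of interest, answered by the target model.
pairs = [
    ("How do I reset a router password?", "You can usually hold the reset button for ten seconds..."),
    ("How do I recover a forgotten account password?", "Use the account recovery flow and verify your email..."),
]

proxy.train()
for query, response in pairs:
    batch = tok(query + "\n" + response, return_tensors="pt",
                truncation=True, max_length=256)
    out = proxy(**batch, labels=batch["input_ids"])   # causal LM loss on the full pair
    opt.zero_grad(); out.loss.backward(); opt.step()
    print(round(float(out.loss), 3))
```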

Zero-Shot Continuous Prompt Transfer: Generalizing Task Semantics Across Language Models

  • paper_url: http://arxiv.org/abs/2310.01691
  • repo_url: https://github.com/manga-uofa/ptfer
  • paper_authors: Zijun Wu, Yongkang Wu, Lili Mou
  • for: 这篇论文是针对自然语言处理(NLP)领域中的启动调整问题进行研究,尤其是大语言模型的适应性问题。
  • methods: 本文提出了一种零样本连续提示(continuous prompt)转移方法,将源提示编码到相对空间中,并在目标模型中搜索对应的目标提示以实现转移。
  • results: 实验结果验证了该方法的有效性,表明连续提示中的 task semantics 可以在不同语言模型之间泛化。此外,将多个源模型的 task semantics 结合起来再进行转移,可以进一步提升转移的泛化性。
    Abstract Prompt tuning in natural language processing (NLP) has become an increasingly popular method for adapting large language models to specific tasks. However, the transferability of these prompts, especially continuous prompts, between different models remains a challenge. In this work, we propose a zero-shot continuous prompt transfer method, where source prompts are encoded into relative space and the corresponding target prompts are searched for transferring to target models. Experimental results confirm the effectiveness of our method, showing that 'task semantics' in continuous prompts can be generalized across various language models. Moreover, we find that combining 'task semantics' from multiple source models can further enhance the generalizability of transfer.
    摘要 Prompt tuning 在自然语言处理(NLP)中已经越来越流行,用于使大语言模型适应特定任务。然而,这些提示(尤其是连续提示)在不同模型之间的可转移性仍然是一个挑战。在这项工作中,我们提出了零样本连续提示转移方法,将源提示编码到相对空间中,并在目标模型上搜索对应的目标提示进行转移。实验结果证明了我们方法的有效性,表明连续提示中的"任务语义"可以在不同语言模型之间泛化。此外,我们发现将多个源模型的"任务语义"组合起来,可以进一步增强转移的泛化性。
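
A rough sketch of transfer through a relative space: each soft-prompt vector is described by its similarities to a shared set of anchor word embeddings, and a target-model prompt is optimized so that its relative description matches the source one. The anchor choice, the cosine encoding, and the MSE matching objective are assumptions for illustration.

```python
# Zero-shot-style prompt transfer via a shared relative (anchor-similarity) space.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_src, d_tgt, n_anchor, prompt_len = 64, 96, 32, 5

# Stand-ins for the two models' embeddings of a shared anchor vocabulary.
anchors_src = torch.randn(n_anchor, d_src)
anchors_tgt = torch.randn(n_anchor, d_tgt)
src_prompt = torch.randn(prompt_len, d_src)              # continuous prompt tuned on the source model

def relative(prompt, anchors):
    """Encode each prompt vector by cosine similarity to every anchor embedding."""
    return F.normalize(prompt, dim=-1) @ F.normalize(anchors, dim=-1).T

target_rel = relative(src_prompt, anchors_src).detach()  # source prompt in relative space

tgt_prompt = torch.randn(prompt_len, d_tgt, requires_grad=True)
opt = torch.optim.Adam([tgt_prompt], lr=0.05)
for step in range(300):
    loss = F.mse_loss(relative(tgt_prompt, anchors_tgt), target_rel)
    opt.zero_grad(); loss.backward(); opt.step()
print("final relative-space mismatch:", float(loss))
```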

Designing User-Centric Behavioral Interventions to Prevent Dysglycemia with Novel Counterfactual Explanations

  • paper_url: http://arxiv.org/abs/2310.01684
  • repo_url: None
  • paper_authors: Asiful Arefeen, Hassan Ghasemzadeh
  • for: 通过提供个性化的饮食、运动和用药建议来维持正常血糖水平,从而预防血糖异常及其慢性并发症
  • methods: 借鉴对抗学习的思想刻画高维健康数据的决策边界,并通过网格搜索生成可行的干预方案
  • results: GlyCoach 在两个实际数据集和外部模拟器上进行了广泛评估,在模拟辅助验证中达到 87% 的敏感性;与已有方法相比,GlyCoach 生成的反事实解释在归一化距离上改进了 32%。
    Abstract Maintaining normal blood glucose levels through lifestyle behaviors is central to maintaining health and preventing disease. Frequent exposure to dysglycemia (i.e., abnormal glucose events such as hyperglycemia and hypoglycemia) leads to chronic complications including diabetes, kidney disease and need for dialysis, myocardial infarction, stroke, amputation, and death. Therefore, a tool capable of predicting dysglycemia and offering users actionable feedback about how to make changes in their diet, exercise, and medication to prevent abnormal glycemic events could have significant societal impacts. Counterfactual explanations can provide insights into why a model made a particular prediction by generating hypothetical instances that are similar to the original input but lead to a different prediction outcome. Therefore, counterfactuals can be viewed as a means to design AI-driven health interventions to prevent adverse health outcomes such as dysglycemia. In this paper, we design GlyCoach, a framework for generating counterfactual explanations for glucose control. Leveraging insights from adversarial learning, GlyCoach characterizes the decision boundary for high-dimensional health data and performs a grid search to generate actionable interventions. GlyCoach is unique in integrating prior knowledge about user preferences of plausible explanations into the process of counterfactual generation. We evaluate GlyCoach extensively using two real-world datasets and external simulators from prior studies that predict glucose response. GlyCoach achieves 87\% sensitivity in the simulation-aided validation, surpassing the state-of-the-art techniques for generating counterfactual explanations by at least $10\%$. Besides, counterfactuals from GlyCoach exhibit a $32\%$ improved normalized distance compared to previous research.
    摘要 通过生活方式行为维持正常血糖水平是保持健康和预防疾病的核心。频繁暴露于血糖异常(如高血糖和低血糖)会导致慢性并发症,包括糖尿病、肾病及透析需求、心肌梗死、中风、截肢和死亡。因此,一个能够预测血糖异常,并向用户提供如何调整饮食、运动和用药的可操作反馈以预防异常血糖事件的工具,可能会产生显著的社会影响。反事实解释通过生成与原始输入相似但导致不同预测结果的假设实例,来揭示模型作出特定预测的原因,因此可以视为设计 AI 驱动健康干预、预防血糖异常等不良健康结果的一种手段。在本文中,我们设计了 GlyCoach,一个用于血糖控制的反事实解释生成框架。GlyCoach 借鉴对抗学习的洞察,刻画高维健康数据的决策边界,并通过网格搜索生成可操作的干预方案。GlyCoach 的独特之处在于,将用户对可信解释的偏好这一先验知识融入反事实生成过程。我们使用两个真实数据集以及来自先前研究的外部血糖响应模拟器,对 GlyCoach 进行了广泛评估。GlyCoach 在模拟辅助验证中达到了 87% 的敏感性,比当前最佳的反事实解释生成技术至少高出 10%。此外,GlyCoach 生成的反事实解释在归一化距离上比先前研究改进了 32%。
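
To make the counterfactual idea concrete, the toy example below perturbs the actionable features of a high-risk sample over a grid, keeps candidates that a classifier predicts as low-risk, and returns the closest one. The logistic-regression model, the feature names, and the grid are placeholders, not GlyCoach's actual pipeline.

```python
# Toy counterfactual search: grid-perturb actionable features until the
# classifier's prediction flips, then return the closest flipped candidate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                   # e.g. [carb intake, activity, sleep]
y = (X[:, 0] - 0.8 * X[:, 1] + 0.3 * rng.normal(size=200) > 0).astype(int)
clf = LogisticRegression().fit(X, y)            # 1 = predicted dysglycemia risk

x0 = np.array([1.5, -1.0, 0.0])                 # a sample the model flags as high-risk
print("original prediction:", clf.predict([x0])[0])

deltas = np.linspace(-2.0, 2.0, 21)             # candidate changes to two actionable features
best, best_dist = None, np.inf
for d0 in deltas:                               # change in carb intake
    for d1 in deltas:                           # change in activity
        cand = x0 + np.array([d0, d1, 0.0])
        if clf.predict([cand])[0] == 0:         # candidate flips to the healthy prediction
            dist = np.linalg.norm(cand - x0)
            if dist < best_dist:
                best, best_dist = cand, dist

print("counterfactual:", best, "distance:", round(best_dist, 3))
```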

What’s the Magic Word? A Control Theory of LLM Prompting

  • paper_url: http://arxiv.org/abs/2310.04444
  • repo_url: https://github.com/amanb2000/magic_words
  • paper_authors: Aman Bhargava, Cameron Witkowski, Manav Shah, Matt Thomson
  • for: 本研究旨在形式化提示工程为自然语言处理模型(LLM)的优化问题,并研究是否存在一个特定的提示可以使LLM正确预测输入序列中的最后一个元素。
  • methods: 本研究使用控制理论来分析LLM的可控性,并提出了一个名为$k-\epsilon$控制可行性的度量来量化LLM的可操控性。
  • results: 研究发现,对于大多数的WikiText实例,存在一个10个字或更少的魔法提示可以使LLM正确预测输入序列中的最后一个元素。此外,研究还发现了不同模型的$k-\epsilon$控制可行性的差异。
    Abstract Prompt engineering is effective and important in the deployment of LLMs but is poorly understood mathematically. Here, we formalize prompt engineering as an optimal control problem on LLMs -- where the prompt is considered a control variable for modulating the output distribution of the LLM. Within this framework, we ask a simple question: given a sequence of tokens, does there always exist a prompt we can prepend that will steer the LLM toward accurately predicting the final token? We call such an optimal prompt the magic word since prepending the prompt causes the LLM to output the correct answer. If magic words exist, can we find them? If so, what are their properties? We offer analytic analysis on the controllability of the self-attention head where we prove a bound on controllability as a function of the singular values of its weight matrices. We take inspiration from control theory to propose a metric called $k-\epsilon$ controllability to characterize LLM steerability. We compute the $k-\epsilon$ controllability of a panel of large language models, including Falcon-7b, Llama-7b, and Falcon-40b on 5000 WikiText causal language modeling tasks. Remarkably, we find that magic words of 10 tokens or less exist for over 97% of WikiText instances surveyed for each model.
    摘要 Prompt engineering 在部署 LLM 时既有效又重要,但在数学上还缺乏充分的理解。本文将 prompt engineering 形式化为对 LLM 的最优控制问题,其中提示被视为调控 LLM 输出分布的控制变量。在这个框架下,我们提出一个简单的问题:给定一个 token 序列,是否总是存在一个可以预先 prepend 的提示,使 LLM 准确地预测最后一个 token?如果存在这种最佳提示(即"魔法词"),我们能否找到它?它们有什么性质?我们对自注意力头的可控性进行了解析分析,证明了一个取决于其权重矩阵奇异值的可控性界。受控制理论启发,我们提出了名为 $k-\epsilon$ 可控性的度量,用于刻画 LLM 的可操控性。我们在 5000 个 WikiText 因果语言建模任务上计算了多个大型语言模型(包括 Falcon-7b、Llama-7b 和 Falcon-40b)的 $k-\epsilon$ 可控性。结果显示,对于每个模型,97% 以上的 WikiText 实例都存在长度不超过 10 个 token 的魔法提示。
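
A toy version of the "magic word" search is sketched below: control tokens are greedily prepended so that the model's next-token prediction after the given sequence moves toward a desired target token. The gpt2 model and the tiny candidate pool are placeholders; the paper's search operates over a far larger prompt space.

```python
# Greedy search for a short control prompt that raises the probability of a target next token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt_text = "The capital of France is"
target_id = tok(" Paris")["input_ids"][0]               # token we want predicted next
base_ids = tok(prompt_text, return_tensors="pt")["input_ids"]
candidates = [tok(w)["input_ids"][0] for w in [" Actually", " Note", " Question", " answer", ":"]]

def target_logprob(prefix_ids):
    ids = base_ids if prefix_ids is None else torch.cat([prefix_ids, base_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits[0, -1]               # next-token distribution
    return torch.log_softmax(logits, dim=-1)[target_id].item()

prefix, score = None, target_logprob(None)
for _ in range(2):                                      # greedily grow a 2-token control prompt
    best = None
    for c in candidates:
        step = torch.tensor([[c]])
        cand = step if prefix is None else torch.cat([prefix, step], dim=1)
        s = target_logprob(cand)
        if best is None or s > best[1]:
            best = (cand, s)
    if best[1] > score:
        prefix, score = best
print("control prompt:", None if prefix is None else tok.decode(prefix[0]),
      "| target logprob:", round(score, 3))
```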

Keypoint-Augmented Self-Supervised Learning for Medical Image Segmentation with Limited Annotation

  • paper_url: http://arxiv.org/abs/2310.01680
  • repo_url: https://github.com/zshyang/kaf
  • paper_authors: Zhangsihao Yang, Mengwei Ren, Kaize Ding, Guido Gerig, Yalin Wang
  • for: 这篇论文旨在提高医学图像分割的精度,特别是在低标注条件下。
  • methods: 这篇论文对 CNN 模型(如 UNet)进行自监督预训练。具体来说,它使用了一种关键点增强的融合层,提取同时保留短程和长程自注意力的表示;此外,还引入了全局和局部两种自监督预训练目标。
  • results: 在 MRI 和 CT 分割任务上,与 CNN 和 Transformer 版本的 UNet 相比,该方法展现出了架构上的优势;与现有 SSL 方法相比,它产生了更稳健的自注意力并取得了更好的分割结果。
    Abstract Pretraining CNN models (i.e., UNet) through self-supervision has become a powerful approach to facilitate medical image segmentation under low annotation regimes. Recent contrastive learning methods encourage similar global representations when the same image undergoes different transformations, or enforce invariance across different image/patch features that are intrinsically correlated. However, CNN-extracted global and local features are limited in capturing long-range spatial dependencies that are essential in biological anatomy. To this end, we present a keypoint-augmented fusion layer that extracts representations preserving both short- and long-range self-attention. In particular, we augment the CNN feature map at multiple scales by incorporating an additional input that learns long-range spatial self-attention among localized keypoint features. Further, we introduce both global and local self-supervised pretraining for the framework. At the global scale, we obtain global representations from both the bottleneck of the UNet, and by aggregating multiscale keypoint features. These global features are subsequently regularized through image-level contrastive objectives. At the local scale, we define a distance-based criterion to first establish correspondences among keypoints and encourage similarity between their features. Through extensive experiments on both MRI and CT segmentation tasks, we demonstrate the architectural advantages of our proposed method in comparison to both CNN and Transformer-based UNets, when all architectures are trained with randomly initialized weights. With our proposed pretraining strategy, our method further outperforms existing SSL methods by producing more robust self-attention and achieving state-of-the-art segmentation results. The code is available at https://github.com/zshyang/kaf.git.
    摘要 预训练CNN模型(例如UNet)通过自我指导已成为医学图像分割中低注解注意力下强大的方法。最近的对比学习方法激励同一张图像下不同变换后的相似全球表示,或者强制不同图像/补丁特征之间的相似性。然而,CNN提取的全球和本地特征限制了捕捉生物解剖学中长距离空间相关性。为此,我们提出了一种带有加密键点的融合层,该层提取保留短距离和长距离自动注意的表示。具体来说,我们在多个缩放级别上将CNN特征地图中加入了额外输入,以学习长距离空间自动注意。此外,我们还引入了全球和本地自我指导预训练方法。在全球级别上,我们从UNet瓶颈中获得全球表示,并将多个缩放级别的键点特征聚合成全球表示。这些全球特征 subsequentially 被正则化通过图像级别对比学习目标。在本地级别上,我们定义了一种距离基于的标准准则,以首先确定键点之间的对应关系,并且促进键点特征之间的相似性。经过了广泛的实验,我们发现在MRI和CT segmentation任务上,我们提出的方法在与CNN和Transformer-based UNets进行比较时,具有更高的建筑准确性。此外,我们还提出了一种自动注意预训练策略,该策略可以生成更加稳定和robust的自动注意,并且实现了状态级别的分割结果。代码可以在https://github.com/zshyang/kaf.git中找到。

Artemis: HE-Aware Training for Efficient Privacy-Preserving Machine Learning

  • paper_url: http://arxiv.org/abs/2310.01664
  • repo_url: None
  • paper_authors: Yeonsoo Jeon, Mattan Erez, Michael Orshansky
  • for: 这项研究旨在提高基于同态加密(HE)的隐私保护机器学习(ML)的实用性,特别是在处理现代大型深度神经网络时。
  • methods: 这项研究提出了名为 Artemis 的高效神经网络剪枝方法,研究了两种面向 HE 的剪枝策略(位置剪枝和对角剪枝),以减少在 HE 卷积中占主导地位的 Rotation 操作的计算时间。
  • results: 研究发现,针对现代卷积模型,Artemis 在三个数据集上可以取得 1.2-6 倍的改进,优于之前面向 HE 的剪枝方法。
    Abstract Privacy-Preserving ML (PPML) based on Homomorphic Encryption (HE) is a promising foundational privacy technology. Making it more practical requires lowering its computational cost, especially, in handling modern large deep neural networks. Model compression via pruning is highly effective in conventional plaintext ML but cannot be effectively applied to HE-PPML as is. We propose Artemis, a highly effective DNN pruning technique for HE-based inference. We judiciously investigate two HE-aware pruning strategies (positional and diagonal) to reduce the number of Rotation operations, which dominate compute time in HE convolution. We find that Pareto-optimal solutions are based fully on diagonal pruning. Artemis' benefits come from coupling DNN training, driven by a novel group Lasso regularization objective, with pruning to maximize HE-specific cost reduction (dominated by the Rotation operations). We show that Artemis improves on prior HE-oriented pruning and can achieve a 1.2-6x improvement when targeting modern convolutional models (ResNet18 and ResNet18) across three datasets.
    摘要
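
The coupling of training with HE-aware pruning can be pictured as a diagonal-structured group Lasso: each diagonal of a weight matrix forms a group whose L2 norm is penalized, so whole diagonals (and, in a packed HE evaluation, the rotations associated with them) can be dropped. The layer, grouping, and penalty weight below are illustrative assumptions, not Artemis's exact training objective.

```python
# Group Lasso over the (wrapped) diagonals of a square weight matrix.
import torch
import torch.nn as nn

def diagonal_group_lasso(weight):
    """Sum of L2 norms of each wrapped diagonal; each diagonal is one group."""
    n = weight.shape[0]
    idx = torch.arange(n)
    penalty = 0.0
    for offset in range(n):
        diag = weight[idx, (idx + offset) % n]   # one wrapped diagonal = one group
        penalty = penalty + diag.norm(p=2)
    return penalty

layer = nn.Linear(16, 16, bias=False)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(64, 16), torch.randn(64, 16)
for _ in range(50):
    loss = nn.functional.mse_loss(layer(x), y) + 1e-2 * diagonal_group_lasso(layer.weight)
    opt.zero_grad(); loss.backward(); opt.step()

# Diagonals with (near-)zero norm correspond to rotations that could be skipped.
with torch.no_grad():
    norms = [layer.weight[torch.arange(16), (torch.arange(16) + o) % 16].norm() for o in range(16)]
print("near-zero diagonals:", sum(float(v) < 1e-2 for v in norms))
```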

It’s all about you: Personalized in-Vehicle Gesture Recognition with a Time-of-Flight Camera

  • paper_url: http://arxiv.org/abs/2310.01659
  • repo_url: None
  • paper_authors: Amr Gomaa, Guillermo Reyes, Michael Feld
  • for: 提高在驾驶环境中手势识别精度,提高安全性和司机体验。
  • methods: 提出了一种基于CNNLSTM模型的个性化适应方法,通过数据增强、个性化适应和逐步学习等技术来提高识别精度,同时减少数据需求。
  • results: 实现了最高识别精度达90%,并证明了个性化适应和逐步学习的效果。
    Abstract Despite significant advances in gesture recognition technology, recognizing gestures in a driving environment remains challenging due to limited and costly data and its dynamic, ever-changing nature. In this work, we propose a model-adaptation approach to personalize the training of a CNNLSTM model and improve recognition accuracy while reducing data requirements. Our approach contributes to the field of dynamic hand gesture recognition while driving by providing a more efficient and accurate method that can be customized for individual users, ultimately enhancing the safety and convenience of in-vehicle interactions, as well as driver's experience and system trust. We incorporate hardware enhancement using a time-of-flight camera and algorithmic enhancement through data augmentation, personalized adaptation, and incremental learning techniques. We evaluate the performance of our approach in terms of recognition accuracy, achieving up to 90\%, and show the effectiveness of personalized adaptation and incremental learning for a user-centered design.
    摘要 尽管手势识别技术已经取得了重要进展,但在驾驶环境中识别手势仍然是一项挑战,主要因为数据有限且获取成本高昂,以及驾驶场景动态、不断变化的性质。在这项工作中,我们提出了一种个性化的模型适应方法,在降低数据需求的同时提高 CNNLSTM 模型的识别精度。我们的方法为驾驶中的动态手势识别提供了更高效、更准确、可针对个人用户定制的方案,从而提升车内交互的安全性与便利性、驾驶员体验以及对系统的信任。我们在硬件层面使用了飞行时间(Time-of-Flight)相机,在算法层面结合了数据增强、个性化适应和增量学习技术。我们评估了方法的识别精度,最高可达 90%,并验证了个性化适应和增量学习在以用户为中心的设计中的有效性。

CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems

  • paper_url: http://arxiv.org/abs/2310.01650
  • repo_url: None
  • paper_authors: Priyanshu Burark, Karn Tiwari, Meer Mehran Rashid, Prathosh A P, N M Anoop Krishnan
  • for: 这篇论文是为科学机器学习研究者和实践者们写的,特别是关注使用数据驱动模型来建模连续动力系统的人们。
  • methods: 这篇论文评测了11种最新的数据驱动模型来求解微分方程,包括前馈神经网络、深度算子回归模型、基于频率的神经算子和 Transformer 架构。
  • results: 这篇论文进行了广泛的实验,评估这些模型在学习、零样本超分辨、数据效率、噪声鲁棒性和计算效率等方面的能力,使用8种广泛适用的基准数据集,涵盖流体和固体力学中的挑战。结果表明,现有算子在较新的力学数据集上表现不佳,这说明需要更加鲁棒的神经算子。
    Abstract Continuous dynamical systems, characterized by differential equations, are ubiquitously used to model several important problems: plasma dynamics, flow through porous media, weather forecasting, and epidemic dynamics. Recently, a wide range of data-driven models has been used successfully to model these systems. However, in contrast to established fields like computer vision, limited studies are available analyzing the strengths and potential applications of different classes of these models that could steer decision-making in scientific machine learning. Here, we introduce CodBench, an exhaustive benchmarking suite comprising 11 state-of-the-art data-driven models for solving differential equations. Specifically, we comprehensively evaluate 4 distinct categories of models, viz., feed forward neural networks, deep operator regression models, frequency-based neural operators, and transformer architectures against 8 widely applicable benchmark datasets encompassing challenges from fluid and solid mechanics. We conduct extensive experiments, assessing the operators' capabilities in learning, zero-shot super-resolution, data efficiency, robustness to noise, and computational efficiency. Interestingly, our findings highlight that current operators struggle with the newer mechanics datasets, motivating the need for more robust neural operators. All the datasets and codes will be shared in an easy-to-use fashion for the scientific community. We hope this resource will be an impetus for accelerated progress and exploration in modeling dynamical systems.
    摘要

Human Mobility Question Answering (Vision Paper)

  • paper_url: http://arxiv.org/abs/2310.04443
  • repo_url: None
  • paper_authors: Hao Xue, Flora D. Salim
  • for: 这个论文的目的是提出一个新的任务:人行为问答(MobQA),以帮助智能系统通过人行为数据学习并回答相关的问题。
  • methods: 该论文提出了一种初步的数据集设计和深度学习模型框架,用于支持该新的研究方向。
  • results: 该论文的研究对人行为预测和问答系统的研究带来了新的思路和方向,并可能开拓了新的应用领域,如智能城市规划、疫情管理和个性化推荐系统。
    Abstract Question answering (QA) systems have attracted much attention from the artificial intelligence community as they can learn to answer questions based on the given knowledge source (e.g., images in visual question answering). However, the research into question answering systems with human mobility data remains unexplored. Mining human mobility data is crucial for various applications such as smart city planning, pandemic management, and personalised recommendation system. In this paper, we aim to tackle this gap and introduce a novel task, that is, human mobility question answering (MobQA). The aim of the task is to let the intelligent system learn from mobility data and answer related questions. This task presents a new paradigm change in mobility prediction research and further facilitates the research of human mobility recommendation systems. To better support this novel research topic, this vision paper also proposes an initial design of the dataset and a potential deep learning model framework for the introduced MobQA task. We hope that this paper will provide novel insights and open new directions in human mobility research and question answering research.
    摘要 问答(QA)系统在人工智能社区中受到了很多关注,因为它们可以基于给定的知识源(例如视觉问答中的图像)学习回答问题。但是,基于人类移动数据的问答系统研究仍然是一个未被探索的领域。挖掘人类移动数据对于智慧城市规划、疫情管理和个性化推荐系统等各种应用都至关重要。在这篇论文中,我们旨在填补这一空白,并引入一个新任务:人类移动问答(MobQA)。该任务的目的是让智能系统从移动数据中学习并回答相关问题。这一任务为移动预测研究带来了新的范式转变,并进一步推动人类移动推荐系统的研究。为了更好地支持这一新的研究方向,这篇论文还提出了初步的数据集设计和一个可能的深度学习模型框架用于 MobQA 任务。我们希望这篇论文能够提供新的见解,为人类移动研究和问答研究开启新的方向。

On Training Derivative-Constrained Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01649
  • repo_url: https://github.com/lbai-lab/dcnn-training
  • paper_authors: KaiChieh Lo, Daniel Huang
  • for: 针对自然科学中物理信息(physics-informed)场景下使用神经网络(NN)的情况,此时网络预测对输入的(部分)导数被用作额外的训练信号。
  • methods: 我们提出了一种集成 ReLU(IReLU)激活函数,以改善导数约束(DC)神经网络的训练;我们还研究了反归一化和标签缩放,以帮助 DC 训练更加稳定。
  • results: 我们在量子化学和科学机器学习(SciML)等物理信息任务上进行评估,发现现有架构在结合 IReLU 激活函数、反归一化和标签缩放后,能更好地利用导数约束提供的训练信号。
    Abstract We refer to the setting where the (partial) derivatives of a neural network's (NN's) predictions with respect to its inputs are used as additional training signal as a derivative-constrained (DC) NN. This situation is common in physics-informed settings in the natural sciences. We propose an integrated RELU (IReLU) activation function to improve training of DC NNs. We also investigate denormalization and label rescaling to help stabilize DC training. We evaluate our methods on physics-informed settings including quantum chemistry and Scientific Machine Learning (SciML) tasks. We demonstrate that existing architectures with IReLU activations combined with denormalization and label rescaling better incorporate training signal provided by derivative constraints.
    摘要 我们将把神经网络(NN)预测对输入的(部分)导数用作额外训练信号的设定称为导数约束(DC)神经网络。这种设定在自然科学的物理信息(physics-informed)场景中很常见。我们提出一种集成 ReLU(IReLU)激活函数来改善 DC 神经网络的训练,并研究了反归一化和标签缩放来帮助稳定 DC 训练。我们在物理信息场景(包括量子化学和科学机器学习(SciML)任务)中进行评估,结果表明,现有架构在结合 IReLU 激活函数、反归一化和标签缩放后,能够更好地利用导数约束提供的训练信号。
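
A small sketch of derivative-constrained training: besides matching values, the loss penalizes the mismatch between the network's input gradient (obtained with torch.autograd.grad) and a reference derivative. The "IReLU" below, taken as the running integral of ReLU (0.5 * relu(x)^2), is our reading of an integrated ReLU and should be treated as an assumption.

```python
# Derivative-constrained regression: fit values and input derivatives jointly.
import torch
import torch.nn as nn

class IReLU(nn.Module):
    def forward(self, x):
        return 0.5 * torch.relu(x) ** 2        # antiderivative of ReLU (assumed reading of IReLU)

net = nn.Sequential(nn.Linear(1, 32), IReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.linspace(-2, 2, 256).unsqueeze(1)
y, dy = torch.sin(x), torch.cos(x)             # target values and target derivatives

for _ in range(200):
    xr = x.clone().requires_grad_(True)
    pred = net(xr)
    dpred, = torch.autograd.grad(pred.sum(), xr, create_graph=True)  # d pred / d x
    loss = nn.functional.mse_loss(pred, y) + nn.functional.mse_loss(dpred, dy)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```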

Exploring Naming Conventions (and Defects) of Pre-trained Deep Learning Models in Hugging Face and Other Model Hubs

  • paper_url: http://arxiv.org/abs/2310.01642
  • repo_url: None
  • paper_authors: Wenxin Jiang, Chingwo Cheung, George K. Thiruvathukal, James C. Davis
  • for: 本研究旨在探讨PTM名称的命名规范和相关的名称问题。
  • methods: 本研究使用自动生成名称评估技术和自动架构识别算法来分析PTM名称的语义和语法模式。
  • results: 研究发现PTM名称的命名规范和名称问题,并将这些问题框定为研究到实践过程中的一部分。我们预计未来的更多实践研究将基于PTM的元特征来支持模型搜索和再利用。
    Abstract As innovation in deep learning continues, many engineers want to adopt Pre-Trained deep learning Models (PTMs) as components in computer systems. PTMs are part of a research-to-practice pipeline: researchers publish PTMs, which engineers adapt for quality or performance and then deploy. If PTM authors choose appropriate names for their PTMs, it could facilitate model discovery and reuse. However, prior research has reported that model names are not always well chosen, and are sometimes erroneous. The naming conventions and naming defects for PTM packages have not been systematically studied - understanding them will add to our knowledge of how the research-to-practice process works for PTM packages In this paper, we report the first study of PTM naming conventions and the associated PTM naming defects. We define the components of a PTM package name, comprising the package name and claimed architecture from the metadata. We present the first study focused on characterizing the nature of naming in PTM ecosystem. To this end, we developed a novel automated naming assessment technique that can automatically extract the semantic and syntactic patterns. To identify potential naming defects, we developed a novel algorithm, automated DNN ARchitecture Assessment pipeline (DARA), to cluster PTMs based on architectural differences. Our study suggests the naming conventions for PTMs, and frames the naming conventions as signal of the research-to-practice relationships in the PTM ecosystem. We envision future works on further empirical study on leveraging meta-features of PTMs to support model search and reuse.
    摘要

Imitation Learning from Observation through Optimal Transport

  • paper_url: http://arxiv.org/abs/2310.01632
  • repo_url: None
  • paper_authors: Wei-Di Chang, Scott Fujimoto, David Meger, Gregory Dudek
  • for: 学习从观察(ILfO)设定,在无直接指导和示例行为的情况下,学习者尝试模仿专家的行为。
  • methods: 使用最优传输进行模仿学习,基于学习者与专家状态轨迹之间的 Wasserstein 距离生成奖励函数,不需要学习模型或对抗学习。
  • results: 在多个连续控制任务上证明,该简单方法可以超越当前最先进的方法,在不同评估领域中达到专家水平的表现,即使只观察到一条不含动作的专家轨迹。
    Abstract Imitation Learning from Observation (ILfO) is a setting in which a learner tries to imitate the behavior of an expert, using only observational data and without the direct guidance of demonstrated actions. In this paper, we re-examine the use of optimal transport for IL, in which a reward is generated based on the Wasserstein distance between the state trajectories of the learner and expert. We show that existing methods can be simplified to generate a reward function without requiring learned models or adversarial learning. Unlike many other state-of-the-art methods, our approach can be integrated with any RL algorithm, and is amenable to ILfO. We demonstrate the effectiveness of this simple approach on a variety of continuous control tasks and find that it surpasses the state of the art in the IlfO setting, achieving expert-level performance across a range of evaluation domains even when observing only a single expert trajectory without actions.
    摘要 基于观察的模仿学习(ILfO)是一种学习场景,学习者仅利用观察数据、在没有示范动作的直接指导下,尝试模仿专家的行为。在这篇论文中,我们重新审视了使用最优传输进行模仿学习的方法:根据学习者与专家状态轨迹之间的 Wasserstein 距离生成奖励函数。我们表明,现有方法可以被简化为直接生成奖励函数,而不需要学习模型或对抗学习。与其他许多最先进方法不同,我们的方法可以与任何 RL 算法集成,并且适用于 ILfO 场景。我们在多个连续控制任务上进行了实验,证明该简单方法在 ILfO 设定下超越了当前最先进的方法,即使只观察到一条不含动作的专家轨迹,也能在多个评估领域中达到专家水平的表现。
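
A bare-bones version of the reward construction: the learner's state trajectory is compared to the expert's with an optimal-transport cost, whose negative serves as the reward. With uniform weights over equal-length trajectories the transport plan reduces to an assignment problem, solved here with scipy; the squared-Euclidean ground cost is an assumption.

```python
# Optimal-transport reward between learner and expert state trajectories.
import numpy as np
from scipy.optimize import linear_sum_assignment

def ot_reward(learner_states, expert_states):
    """learner_states, expert_states: (T, d) arrays of visited states."""
    diff = learner_states[:, None, :] - expert_states[None, :, :]
    cost = (diff ** 2).sum(-1)                      # pairwise ground cost
    rows, cols = linear_sum_assignment(cost)        # optimal coupling (uniform weights)
    return -cost[rows, cols].mean()                 # higher reward = closer to expert

T, d = 50, 3
expert = np.cumsum(np.random.randn(T, d) * 0.1, axis=0)
learner = expert + np.random.randn(T, d) * 0.5      # noisy imitation of the expert trajectory
print("reward:", round(ot_reward(learner, expert), 4))
print("self-reward:", round(ot_reward(expert, expert), 4))   # should be ~0
```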

VAL: Interactive Task Learning with GPT Dialog Parsing

  • paper_url: http://arxiv.org/abs/2310.01627
  • repo_url: None
  • paper_authors: Lane Lawley, Christopher J. MacLellan
  • for: 学习交互任务 (Interactive Task Learning) 的系统,以便从有限的人类提供的Modalities,如自然语言,获得增量知识。
  • methods: 使用大语言模型 (Large Language Models) 进行特定任务,如 predicate 和 argument 选择,并将其与符号学习算法集成。
  • results: VAL 系统可以很好地从自然语言中学习层次任务知识,并且可以通过人类理解的方式表示获得的知识,同时可以在执行新任务时无需额外训练。
    Abstract Reinforcement learning often requires millions of examples to produce static, black-box models. In contrast, interactive task learning (ITL) emphasizes incremental knowledge acquisition from limited instruction provided by humans in modalities such as natural language. However, in practice, ITL systems often suffers from brittle, error-prone language parsing. Large language models (LLMs) are resistant to brittleness but are not interpretable and cannot learn incrementally. We present VAL, an ITL system with a new philosophy for LLM/symbolic integration. By using LLMs only for specific tasks -- such as predicate and argument selection -- within an algorithmic framework, VAL reaps the benefits of LLMs to support interactive learning of hierarchical task knowledge from natural language. Acquired knowledge is human interpretable and generalizes to support execution of novel tasks without additional training. We studied users' interactions with VAL in a video game setting, finding that most users could successfully teach VAL using language they felt was natural.
    摘要 常规强化学习通常需要数百万个示例来生成静态、黑盒模型。相比之下,交互任务学习(ITL)强调从人类提供的有限指导下逐步获得知识。然而,在实践中,ITL系统经常会遇到不稳定的自然语言分析问题。大型自然语言模型(LLM)具有不稳定性的抵觐性,但是它们不可解释和不能在不同任务上逐步学习。我们提出了 VAL,一种基于 ITL 系统的新哲学,通过将 LLM 用于特定任务,例如 predicate 和 argument 选择,来支持从自然语言中学习层次任务知识。获得的知识是人类可解释的,并且可以在执行新任务时进行扩展。我们在视频游戏设定下对用户与 VAL 的互动进行了研究,发现大多数用户可以使用自然的语言教导 VAL。

Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity

  • paper_url: http://arxiv.org/abs/2310.01616
  • repo_url: None
  • paper_authors: Emmeran Johnson, Ciara Pike-Burke, Patrick Rebeschini
  • for: 研究强化学习中样本效率(sample-efficiency)与适应性(adaptivity)之间的关系
  • methods: 使用一个允许以 $K$ 个批次发送查询的学习框架,每个批次结束后处理反馈并更新查询策略,覆盖从非自适应的 'offline'($K=1$)到完全自适应($K=n$)以及二者之间的整个适应性谱
  • results: 对使用 $n = O(poly(d))$ 次查询的样本高效算法,建立了所需批次数 $K$ 的下界,表明仅有适应性并不一定保证样本效率,而且样本效率的适应性边界取决于问题维度
    Abstract We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $n$ to the environment that is polynomial in the dimension $d$ of the problem. Adaptivity refers to the frequency at which queries are sent and feedback is processed to update the querying strategy. To investigate this interplay, we employ a learning framework that allows sending queries in $K$ batches, with feedback being processed and queries updated after each batch. This model encompasses the whole adaptivity spectrum, ranging from non-adaptive 'offline' ($K=1$) to fully adaptive ($K=n$) scenarios, and regimes in between. For the problems of policy evaluation and best-policy identification under $d$-dimensional linear function approximation, we establish $\Omega(\log \log d)$ lower bounds on the number of batches $K$ required for sample-efficient algorithms with $n = O(poly(d))$ queries. Our results show that just having adaptivity ($K>1$) does not necessarily guarantee sample-efficiency. Notably, the adaptivity-boundary for sample-efficiency is not between offline reinforcement learning ($K=1$), where sample-efficiency was known to not be possible, and adaptive settings. Instead, the boundary lies between different regimes of adaptivity and depends on the problem dimension.
    摘要 我们从理论上探索强化学习中样本效率与适应性之间的关系。如果一个算法访问环境的查询次数 $n$ 是问题维度 $d$ 的多项式,则称它是样本高效的。适应性指的是发送查询、处理反馈并更新查询策略的频率。为了研究二者的相互作用,我们使用一个学习框架,允许以 $K$ 个批次发送查询,并在每个批次后处理反馈、更新查询策略。这个模型覆盖了整个适应性谱,从非自适应的 'offline'($K=1$)到完全自适应($K=n$),以及二者之间的各种情形。对于 $d$ 维线性函数近似下的策略评估和最优策略识别问题,我们证明:使用 $n = O(poly(d))$ 次查询的样本高效算法所需的批次数 $K$ 满足 $\Omega(\log \log d)$ 的下界。我们的结果显示,仅仅具有适应性($K>1$)并不一定保证样本效率。值得注意的是,样本效率的适应性边界并不位于离线强化学习($K=1$,已知其无法实现样本效率)与自适应设定之间,而是位于不同的适应性区间之间,并且取决于问题维度。

Solving the Quadratic Assignment Problem using Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.01604
  • repo_url: None
  • paper_authors: Puneet S. Bagga, Arthur Delarue
  • for: 使用深度强化学习求解组合优化问题——二次指派问题(QAP)。
  • methods: 使用一种新颖的 double pointer network,交替地选择一个放置位置和一个要放置的设施。
  • results: 模型在一个大规模合成实例数据集上训练,无需针对具体实例重新训练;在样本外,所得解与高质量局部搜索基线的平均差距为 7.5%,并在 1.2% 的实例上超越了该基线。
    Abstract The Quadratic Assignment Problem (QAP) is an NP-hard problem which has proven particularly challenging to solve: unlike other combinatorial problems like the traveling salesman problem (TSP), which can be solved to optimality for instances with hundreds or even thousands of locations using advanced integer programming techniques, no methods are known to exactly solve QAP instances of size greater than 30. Solving the QAP is nevertheless important because of its many critical applications, such as electronic wiring design and facility layout selection. We propose a method to solve the original Koopmans-Beckman formulation of the QAP using deep reinforcement learning. Our approach relies on a novel double pointer network, which alternates between selecting a location in which to place the next facility and a facility to place in the previous location. We train our model using A2C on a large dataset of synthetic instances, producing solutions with no instance-specific retraining necessary. Out of sample, our solutions are on average within 7.5% of a high-quality local search baseline, and even outperform it on 1.2% of instances.
    摘要 二次指派问题(QAP)是一个 NP 困难问题,并且被证明格外难以求解:与旅行商问题(TSP)等其他组合优化问题不同(后者借助先进的整数规划技术,可以对数百甚至数千个位置的实例求得最优解),目前尚无方法能够精确求解规模超过 30 的 QAP 实例。尽管如此,求解 QAP 仍然十分重要,因为它在电子布线设计和设施布局选择等领域有许多关键应用。我们提出了一种使用深度强化学习求解原始 Koopmans-Beckman 形式 QAP 的方法。我们的方法基于一种新颖的 double pointer network,该网络交替地选择下一个放置位置和要放置在上一位置的设施。我们使用 A2C 在一个大规模合成实例数据集上训练模型,无需针对具体实例重新训练。在样本外测试中,我们的解与高质量局部搜索基线的平均差距为 7.5%,并在 1.2% 的实例上超越了该基线。
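
For reference, the objective being optimized: given a facility flow matrix F and a location distance matrix D, an assignment pi has cost sum_{i,j} F[i,j] * D[pi[i], pi[j]]. The snippet evaluates that cost for random assignments; it is only a reference implementation of the objective, not the paper's pointer-network solver.

```python
# Koopmans-Beckman QAP objective, evaluated on random assignments.
import numpy as np

rng = np.random.default_rng(1)
n = 8
F = rng.integers(0, 10, (n, n)); np.fill_diagonal(F, 0)   # flows between facilities
D = rng.integers(1, 10, (n, n)); np.fill_diagonal(D, 0)   # distances between locations

def qap_cost(perm):
    """perm[i] = location assigned to facility i."""
    return int((F * D[np.ix_(perm, perm)]).sum())

print("random assignment cost:", qap_cost(rng.permutation(n)))

# Crude random-search baseline over 2000 assignments, for comparison.
best = min(qap_cost(rng.permutation(n)) for _ in range(2000))
print("best of 2000 random assignments:", best)
```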

A Review of Digital Learning Environments for Teaching Natural Language Processing in K-12 Education

  • paper_url: http://arxiv.org/abs/2310.01603
  • repo_url: None
  • paper_authors: Xiaoyi Tian, Kristy Elizabeth Boyer
  • for: 这篇论文旨在探讨在基础教育(K-12)中使用自然语言处理(NLP)的数字学习环境。
  • methods: 论文采用了一种抽查式的方法,检查现有的数字学习工具是否支持特定的NLP任务和过程,以及这些工具在教育上的可读性和评价结果。
  • results: 论文发现现有的数字学习工具在教育上有很多优点和缺点,并且有一些领域需要进一步的研究和开发。这篇论文的发现可以帮助未来的研究人员更好地制定更有效和包容的NLP教育策略。
    Abstract Natural Language Processing (NLP) plays a significant role in our daily lives and has become an essential part of Artificial Intelligence (AI) education in K-12. As children grow up with NLP-powered applications, it is crucial to introduce NLP concepts to them, fostering their understanding of language processing, language generation, and ethical implications of AI and NLP. This paper presents a comprehensive review of digital learning environments for teaching NLP in K-12. Specifically, it explores existing digital learning tools, discusses how they support specific NLP tasks and procedures, and investigates their explainability and evaluation results in educational contexts. By examining the strengths and limitations of these tools, this literature review sheds light on the current state of NLP learning tools in K-12 education. It aims to guide future research efforts to refine existing tools, develop new ones, and explore more effective and inclusive strategies for integrating NLP into K-12 educational contexts.
    摘要 自然语言处理(NLP)在我们日常生活中扮演着重要的角色,成为人工智能(AI)教育的一个重要组成部分。随着孩子在NLP应用程序中长大,因此是非常重要引入NLP概念,促进他们对语言处理、语言生成以及人工智能和NLP的伦理方面的理解。本文提供了关于在K-12教育中教学NLP的全面评估。特别是,它探讨了现有的数字学习环境,评估它们支持具体的NLP任务和过程,以及它们在教育 context中的解释性和评价结果。通过对这些工具的分析,本文照明了K-12教育中NLP学习工具的当前状况,以引导未来的研究努力,抑制现有工具,开发新的工具,并探索更有效和包容的策略,以将NLP integrate到K-12教育中。

CAT-LM: Training Language Models on Aligned Code And Tests

  • paper_url: http://arxiv.org/abs/2310.01602
  • repo_url: https://github.com/raonikitha/cat-lm
  • paper_authors: Nikitha Rao, Kush Jain, Uri Alon, Claire Le Goues, Vincent J. Hellendoorn
  • for: 这个论文的目的是提出一种基于 GPT 语言模型的自动化测试生成方法,以提高测试生成效率和代码质量。
  • methods: 该方法使用了一种新的预训练信号,该信号考虑到测试文件和代码文件之间的映射关系,并使用了长度最多为 8,192 个 tokens 的输入序列,以确保模型可以利用代码上下文来生成测试代码。
  • results: 实验结果表明,CAT-LM 可以生成高覆盖率的测试代码,并且比其他大型语言模型(CodeGen 16B 和 StarCoder)更能够生成有效的测试代码。此外,CAT-LM 还可以在测试完成任务中表现出色,并且在比较 Test-specific 模型 (TeCo) 时表现出优异。
    Abstract Testing is an integral part of the software development process. Yet, writing tests is time-consuming and therefore often neglected. Classical test generation tools such as EvoSuite generate behavioral test suites by optimizing for coverage, but tend to produce tests that are hard to understand. Language models trained on code can generate code that is highly similar to that written by humans, but current models are trained to generate each file separately, as is standard practice in natural language processing, and thus fail to consider the code-under-test context when producing a test file. In this work, we propose the Aligned Code And Tests Language Model (CAT-LM), a GPT-style language model with 2.7 Billion parameters, trained on a corpus of Python and Java projects. We utilize a novel pretraining signal that explicitly considers the mapping between code and test files when available. We also drastically increase the maximum sequence length of inputs to 8,192 tokens, 4x more than typical code generation models, to ensure that the code context is available to the model when generating test code. We analyze its usefulness for realistic applications, showing that sampling with filtering (e.g., by compilability, coverage) allows it to efficiently produce tests that achieve coverage similar to ones written by developers while resembling their writing style. By utilizing the code context, CAT-LM generates more valid tests than even much larger language models trained with more data (CodeGen 16B and StarCoder) and substantially outperforms a recent test-specific model (TeCo) at test completion. Overall, our work highlights the importance of incorporating software-specific insights when training language models for code and paves the way to more powerful automated test generation.
    摘要 testing是软件开发过程中的一个重要环节。然而,写测试用例是时间consuming的,因此经常被忽略。经典的测试生成工具such as EvoSuite会生成行为测试用例,但是它们通常会生成难以理解的测试用例。使用代码生成模型可以生成高度类似于人类编写的代码,但现有的模型都是在自然语言处理的标准实践下进行单个文件的生成,因此缺乏考虑到测试代码上下文的能力。在这种情况下,我们提出了一种新的语言模型CAT-LM,其基于GPT的语言模型,具有2.7亿个参数,并在Python和Java项目的庞大词汇库上进行训练。我们采用了一种新的预训练信号,其将考虑测试代码和源代码之间的映射。我们还增加了输入序列长度的最大值至8,192个字符,比typical code generation模型更长,以确保模型在生成测试代码时可以获得代码上下文。我们对CAT-LM的实用性进行了实验,并证明了采用抽象筛选(例如,编译可读性、覆盖率)可以有效地生成测试用例,而且与开发者编写的测试用例的样式相似。CAT-LM在与更大的语言模型(CodeGen 16B和StarCoder)和最近的测试特定模型(TeCo)进行比较时表现出了明显的优势,可以快速生成高覆盖率的测试用例。总之,我们的工作表明了在训练代码生成模型时,需要结合软件特定的知识,以便更好地实现自动化测试生成。

Memory-efficient particle filter recurrent neural network for object localization

  • paper_url: http://arxiv.org/abs/2310.01595
  • repo_url: None
  • paper_authors: Roman Korkin, Ivan Oseledets, Aleksandr Katrutsa
  • for: 本研究提出了一种新的内存高效循环神经网络(RNN)架构,用于解决对象定位问题。
  • methods: 本研究将经典粒子滤波的思想与 GRU 循环神经网络架构相结合,使得处理不同大小的环境时所需的参数数量保持不变。
  • results: 在我们的实验中,mePFRNN 模型在对称且带噪的环境中实现了更精准的对象定位,同时需要更少的训练参数。
    Abstract This study proposes a novel memory-efficient recurrent neural network (RNN) architecture specified to solve the object localization problem. This problem is to recover the object states along with its movement in a noisy environment. We take the idea of the classical particle filter and combine it with GRU RNN architecture. The key feature of the resulting memory-efficient particle filter RNN model (mePFRNN) is that it requires the same number of parameters to process environments of different sizes. Thus, the proposed mePFRNN architecture consumes less memory to store parameters compared to the previously proposed PFRNN model. To demonstrate the performance of our model, we test it on symmetric and noisy environments that are incredibly challenging for filtering algorithms. In our experiments, the mePFRNN model provides more precise localization than the considered competitors and requires fewer trained parameters.
    摘要 本研究提出了一种新的内存高效循环神经网络(RNN)架构,用于解决对象定位问题,即在噪声环境中恢复对象的状态及其运动。我们借鉴经典粒子滤波的思想,并将其与 GRU RNN 架构相结合。mePFRNN 模型的关键特征是,处理不同大小的环境只需要相同数量的参数,因此相比之前提出的 PFRNN 模型,mePFRNN 存储参数所占用的内存更少。为了验证模型的性能,我们在对滤波算法极具挑战性的对称、带噪环境中对其进行了测试,结果表明 mePFRNN 比所比较的方法定位更精准,且需要更少的训练参数。

Prescribed Fire Modeling using Knowledge-Guided Machine Learning for Land Management

  • paper_url: http://arxiv.org/abs/2310.01593
  • repo_url: None
  • paper_authors: Somya Sharma Chatterjee, Kelly Lindsay, Neel Chatterjee, Rohan Patil, Ilkay Altintas De Callafon, Michael Steinbach, Daniel Giron, Mai H. Nguyen, Vipin Kumar
  • for: 这项研究旨在提供一种可靠、快速的机器学习(ML)框架,用于计划烧除(prescribed fire)的实时规划与管理。
  • methods: 该研究使用融入领域知识的 ML 框架,以减少数据稀缺情形下的物理不一致问题,同时利用数据增广技术来缓解预测偏差问题。
  • results: 研究表明,该 ML 框架可以快速模拟计划烧除,并给出更准确的过火面积和蔓延速度估计;此外,该框架在不同风况和点火模式下表现出更好的泛化能力。
    Abstract In recent years, the increasing threat of devastating wildfires has underscored the need for effective prescribed fire management. Process-based computer simulations have traditionally been employed to plan prescribed fires for wildfire prevention. However, even simplified process models like QUIC-Fire are too compute-intensive to be used for real-time decision-making, especially when weather conditions change rapidly. Traditional ML methods used for fire modeling offer computational speedup but struggle with physically inconsistent predictions, biased predictions due to class imbalance, biased estimates for fire spread metrics (e.g., burned area, rate of spread), and generalizability in out-of-distribution wind conditions. This paper introduces a novel machine learning (ML) framework that enables rapid emulation of prescribed fires while addressing these concerns. By incorporating domain knowledge, the proposed method helps reduce physical inconsistencies in fuel density estimates in data-scarce scenarios. To overcome the majority class bias in predictions, we leverage pre-existing source domain data to augment training data and learn the spread of fire more effectively. Finally, we overcome the problem of biased estimation of fire spread metrics by incorporating a hierarchical modeling structure to capture the interdependence in fuel density and burned area. Notably, improvement in fire metric (e.g., burned area) estimates offered by our framework makes it useful for fire managers, who often rely on these fire metric estimates to make decisions about prescribed burn management. Furthermore, our framework exhibits better generalization capabilities than the other ML-based fire modeling methods across diverse wind conditions and ignition patterns.
    摘要

On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused?

  • paper_url: http://arxiv.org/abs/2310.01581
  • repo_url: None
  • paper_authors: Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, Jinyuan Jia, Jinghui Chen, Dinghao Wu
  • for: The paper is written to investigate whether alignment can prevent open-sourced large language models from being misused to generate undesired content.
  • methods: The paper uses direct manipulation of the generation process to misguide open-sourced LLMs into generating undesired content, including harmful or biased information and even private data.
  • results: The paper shows that even aligned open-sourced LLMs can be easily misguided to generate undesired content without heavy computations or careful prompt designs, highlighting the need for more advanced mitigation strategies.
    Abstract Large Language Models (LLMs) have achieved unprecedented performance in Natural Language Generation (NLG) tasks. However, many existing studies have shown that they could be misused to generate undesired content. In response, before releasing LLMs for public access, model developers usually align those language models through Supervised Fine-Tuning (SFT) or Reinforcement Learning with Human Feedback (RLHF). Consequently, those aligned large language models refuse to generate undesired content when facing potentially harmful/unethical requests. A natural question is "could alignment really prevent those open-sourced large language models from being misused to generate undesired content?''. In this work, we provide a negative answer to this question. In particular, we show those open-sourced, aligned large language models could be easily misguided to generate undesired content without heavy computations or careful prompt designs. Our key idea is to directly manipulate the generation process of open-sourced LLMs to misguide it to generate undesired content including harmful or biased information and even private data. We evaluate our method on 4 open-sourced LLMs accessible publicly and our finding highlights the need for more advanced mitigation strategies for open-sourced LLMs.
    摘要 我们的关键思想是通过直接控制公开的LLMs生成过程来误导它生成不良内容,包括有害或歧视性的信息以及私人数据。我们对公开的4个LLMs进行了评估,并发现了更高度的防范策略是必要的。这种发现高亮了公开的LLMs需要更加高度的安全保护。

Active Learning on Neural Networks through Interactive Generation of Digit Patterns and Visual Representation

  • paper_url: http://arxiv.org/abs/2310.01580
  • repo_url: https://github.com/drjeong/digitperceptron
  • paper_authors: Dong H. Jeong, Jin-Hee Cho, Feng Chen, Audun Josang, Soo-Yeon Ji
  • for: 这篇论文的目的是通过设计一个交互式学习系统,在实时模式下创建数字图案并识别它们,从而提高用户对人工神经网络(ANNs)的理解和学习。
  • methods: 该论文使用神经网络(NNs)来识别数字图案,并结合可视化与多种用户交互,帮助用户清楚地理解不同数字图案(0~9)的视觉差异及其对应的 NN 识别结果。
  • results: 通过多个数据集的评估,该系统被证明可用于主动学习;在一次夏令营的非正式用户测试中,参与者也反馈该系统易于使用。
    Abstract Artificial neural networks (ANNs) have been broadly utilized to analyze various data and solve different domain problems. However, neural networks (NNs) have been considered a black box operation for years because their underlying computation and meaning are hidden. Due to this nature, users often face difficulties in interpreting the underlying mechanism of the NNs and the benefits of using them. In this paper, to improve users' learning and understanding of NNs, an interactive learning system is designed to create digit patterns and recognize them in real time. To help users clearly understand the visual differences of digit patterns (i.e., 0 ~ 9) and their results with an NN, integrating visualization is considered to present all digit patterns in a two-dimensional display space with supporting multiple user interactions. An evaluation with multiple datasets is conducted to determine its usability for active learning. In addition, informal user testing is managed during a summer workshop by asking the workshop participants to use the system.
    摘要 人工神经网络(ANNs)已广泛应用于不同数据分析和各个领域问题解决。然而,神经网络(NNs)在多年来一直被视为黑盒子操作,因为其下面计算和意义隐藏不seen。由于这种性质,用户经常遇到解释NNs的下面机制和使用它们的困难。在这篇论文中,为了提高用户学习和理解NNs,一种互动学习系统被设计,可以在实时创建数字模式和识别它们。为了帮助用户清楚地理解数字模式(即0~9)的视觉差异和NNs的结果,在两个维度显示空间中进行了视觉化的整合。为了评估该系统的使用可用性,对多个数据集进行了评估。此外,在一个夏令营中,通过向参加者们提问,进行了非正式的用户测试。

Iterative Option Discovery for Planning, by Planning

  • paper_url: http://arxiv.org/abs/2310.01569
  • repo_url: None
  • paper_authors: Kenny Young, Richard S. Sutton
  • for: 这篇论文的目的是探索有用的时间抽象,以实现使用增强学习和规划在更加复杂的领域中。
  • methods: 这篇论文提出了Option Iteration,一种基于AlphaZero的专家循环找到专家策略的方法。它不是将单一强大的策略学习到满足搜索结果的所有状态,而是将一组专家策略学习到每个状态下,至少有一个策略在某些时间进程中对搜索结果进行匹配。
  • results: 实验结果显示,使用 Option Iteration 学习到的选项进行规划,在具有挑战性的规划环境中表现出显著优势,优于在原子动作空间中运行、并用 Expert Iteration 学习单一 rollout 策略的对应规划算法。
    Abstract Discovering useful temporal abstractions, in the form of options, is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains. Building on the empirical success of the Expert Iteration approach to policy learning used in AlphaZero, we propose Option Iteration, an analogous approach to option discovery. Rather than learning a single strong policy that is trained to match the search results everywhere, Option Iteration learns a set of option policies trained such that for each state encountered, at least one policy in the set matches the search results for some horizon into the future. Intuitively, this may be significantly easier as it allows the algorithm to hedge its bets compared to learning a single globally strong policy, which may have complex dependencies on the details of the current state. Having learned such a set of locally strong policies, we can use them to guide the search algorithm resulting in a virtuous cycle where better options lead to better search results which allows for training of better options. We demonstrate experimentally that planning using options learned with Option Iteration leads to a significant benefit in challenging planning environments compared to an analogous planning algorithm operating in the space of primitive actions and learning a single rollout policy with Expert Iteration.
    摘要 发现有用的时间抽象,在形式上是选项,被广泛认为是应用奖励学习和规划到越来越复杂的领域的关键。基于 alphaZero 中的专家迭代方法的实际成功,我们提议Option Iteration,这是一种相似的选项发现方法。而不是学习一个强大的全局策略,这种策略可能具有复杂的现状依赖关系,Option Iteration 学习一组局部强策略,其中每个状态都有至少一个策略匹配搜索结果在某个时间后的未来。这可能比学习一个全局强策略更容易,因为它允许算法在不同的状态下留下风险。已经学习了这组局部强策略后,我们可以使用它们来导引搜索算法,从而形成一个循环关系,更好的选项会导致更好的搜索结果,从而帮助更好的选项的训练。我们通过实验表明,使用 Option Iteration 进行规划在复杂的规划环境中具有显著的优势,比一种相似的规划算法在原始动作空间中学习一个单一满足 Rollout 策略的 Expert Iteration 方法。

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

  • paper_url: http://arxiv.org/abs/2310.01558
  • repo_url: https://github.com/oriyor/ret-robust
  • paper_authors: Ori Yoran, Tomer Wolfson, Ori Ram, Jonathan Berant
  • for: The paper studies the performance of retrieval-augmented language models (RALMs) in multi-hop reasoning scenarios and investigates how to avoid performance degradation when irrelevant evidence is retrieved.
  • methods: The paper presents a thorough analysis on five open-domain question answering benchmarks, characterizing cases in which retrieved evidence reduces accuracy. It then proposes two remedies: filtering out retrieved passages that do not entail the question-answer pair according to a natural language inference (NLI) model, and fine-tuning the language model on a mix of relevant and irrelevant contexts so that it learns to properly leverage retrieved passages.
  • results: Experiments show that as few as 1,000 fine-tuning examples suffice to make the language model robust to irrelevant contexts while maintaining high performance on examples with relevant ones.
    Abstract Retrieval-augmented language models (RALMs) hold promise to produce language understanding systems that are factual, efficient, and up-to-date. An important desideratum of RALMs is that retrieved information helps model performance when it is relevant, and does not harm performance when it is not. This is particularly important in multi-hop reasoning scenarios, where misuse of irrelevant evidence can lead to cascading errors. However, recent work has shown that retrieval augmentation can sometimes have a negative effect on performance. In this work, we present a thorough analysis on five open-domain question answering benchmarks, characterizing cases when retrieval reduces accuracy. We then propose two methods to mitigate this issue. First, a simple baseline that filters out retrieved passages that do not entail question-answer pairs according to a natural language inference (NLI) model. This is effective in preventing performance reduction, but at a cost of also discarding relevant passages. Thus, we propose a method for automatically generating data to fine-tune the language model to properly leverage retrieved passages, using a mix of relevant and irrelevant contexts at training time. We empirically show that even 1,000 examples suffice to train the model to be robust to irrelevant contexts while maintaining high performance on examples with relevant ones.
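The NLI-filtering baseline described above can be sketched with an off-the-shelf entailment model: a retrieved passage is kept only if it is predicted to entail the question-answer pair. The model checkpoint, label string, and threshold below are assumptions; any NLI classifier with an entailment label could be substituted.

```python
# Minimal sketch of NLI-based passage filtering (illustrative, not the released ret-robust code).
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def keep_passage(passage: str, question: str, answer: str, threshold: float = 0.5) -> bool:
    """Return True if the passage is predicted to entail the (question, answer) pair."""
    hypothesis = f"{question} {answer}"
    scores = nli({"text": passage, "text_pair": hypothesis}, top_k=None)
    entail = next(s["score"] for s in scores if s["label"].upper().startswith("ENTAIL"))
    return entail >= threshold

def filter_passages(passages, question, answer):
    # Passages that fail the entailment check are discarded before being fed to the LM.
    return [p for p in passages if keep_passage(p, question, answer)]
```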

SmartPlay : A Benchmark for LLMs as Intelligent Agents

  • paper_url: http://arxiv.org/abs/2310.01557
  • repo_url: https://github.com/microsoft/smartplay
  • paper_authors: Yue Wu, Xuan Tang, Tom M. Mitchell, Yuanzhi Li
  • for: Evaluating the capabilities of large language models (LLMs) as intelligent agents and drivers of next-generation automation.
  • methods: Introduces a benchmark of six games, including Rock-Paper-Scissors, Tower of Hanoi, and Minecraft; each game features a unique setting, providing up to 20 evaluation settings and infinite environment variations.
  • results: SmartPlay probes nine important capabilities of an LLM agent, including reasoning with object dependencies, planning ahead, spatial reasoning, learning from history, and understanding randomness; the distinction between the capabilities each game tests makes it possible to analyze each capability separately.
    Abstract Recent large language models (LLMs) have demonstrated great potential toward intelligent agents and next-gen automation, but there currently lacks a systematic benchmark for evaluating LLMs' abilities as agents. We introduce SmartPlay: both a challenging benchmark and a methodology for evaluating LLMs as agents. SmartPlay consists of 6 different games, including Rock-Paper-Scissors, Tower of Hanoi, and Minecraft. Each game features a unique setting, providing up to 20 evaluation settings and infinite environment variations. Each game in SmartPlay uniquely challenges a subset of 9 important capabilities of an intelligent LLM agent, including reasoning with object dependencies, planning ahead, spatial reasoning, learning from history, and understanding randomness. The distinction between the set of capabilities each game tests allows us to analyze each capability separately. SmartPlay serves not only as a rigorous testing ground for evaluating the overall performance of LLM agents but also as a road-map for identifying gaps in current methodologies. We release our benchmark at github.com/microsoft/SmartPlay

Harnessing the Power of Choices in Decision Tree Learning

  • paper_url: http://arxiv.org/abs/2310.01551
  • repo_url: https://github.com/sullivanc19/pydl8.5-topk
  • paper_authors: Guy Blanc, Jane Lange, Chirag Pabbaraju, Colin Sullivan, Li-Yang Tan, Mo Tiwari
  • for: Improving the accuracy and scalability of decision tree learning algorithms.
  • methods: Proposes a simple generalization, Top-$k$, which considers the $k$ best attributes as candidate splits rather than only the single best attribute.
  • results: Theoretically and empirically, Top-$k$ delivers higher accuracy across a wide range of benchmarks while remaining far more scalable than optimal decision tree algorithms, handling dataset and feature set sizes well beyond their reach.
    Abstract We propose a simple generalization of standard and empirically successful decision tree learning algorithms such as ID3, C4.5, and CART. These algorithms, which have been central to machine learning for decades, are greedy in nature: they grow a decision tree by iteratively splitting on the best attribute. Our algorithm, Top-$k$, considers the $k$ best attributes as possible splits instead of just the single best attribute. We demonstrate, theoretically and empirically, the power of this simple generalization. We first prove a {\sl greediness hierarchy theorem} showing that for every $k \in \mathbb{N}$, Top-$(k+1)$ can be dramatically more powerful than Top-$k$: there are data distributions for which the former achieves accuracy $1-\varepsilon$, whereas the latter only achieves accuracy $\frac1{2}+\varepsilon$. We then show, through extensive experiments, that Top-$k$ outperforms the two main approaches to decision tree learning: classic greedy algorithms and more recent "optimal decision tree" algorithms. On one hand, Top-$k$ consistently enjoys significant accuracy gains over greedy algorithms across a wide range of benchmarks. On the other hand, Top-$k$ is markedly more scalable than optimal decision tree algorithms and is able to handle dataset and feature set sizes that remain far beyond the reach of these algorithms.
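The Top-$k$ idea is easy to sketch: at each node, instead of committing to the single highest-scoring attribute (as in ID3/C4.5/CART), expand the $k$ best-scoring attributes and keep whichever completed subtree achieves the lowest training error. The code below is an illustrative reimplementation under the assumption of binary features and 0/1 labels, not the authors' released pydl8.5-topk code.

```python
# Minimal sketch of Top-k greedy decision tree learning (illustrative assumptions noted above).
import numpy as np

def gini_gain(X, y, f):
    left, right = y[X[:, f] == 0], y[X[:, f] == 1]
    def gini(v):
        if len(v) == 0:
            return 0.0
        p = v.mean()
        return 2 * p * (1 - p)
    w = len(left) / len(y)
    return gini(y) - (w * gini(left) + (1 - w) * gini(right))

def build_topk_tree(X, y, k=2, depth=3):
    """Return (tree, training_error_count); tree is a leaf label or (feature, left, right)."""
    majority = int(round(y.mean())) if len(y) else 0
    leaf_err = int((y != majority).sum())
    if depth == 0 or len(np.unique(y)) <= 1:
        return majority, leaf_err
    gains = [gini_gain(X, y, f) for f in range(X.shape[1])]
    best_tree, best_err = majority, leaf_err
    for f in np.argsort(gains)[::-1][:k]:          # expand the top-k candidate splits, not just the best
        mask = X[:, f] == 1
        left, le = build_topk_tree(X[~mask], y[~mask], k, depth - 1)
        right, re_ = build_topk_tree(X[mask], y[mask], k, depth - 1)
        if le + re_ < best_err:
            best_tree, best_err = (f, left, right), le + re_
    return best_tree, best_err
```

Setting k=1 recovers the classic greedy algorithm, which makes the accuracy/compute trade-off of larger k easy to study empirically.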

Algebras of actions in an agent’s representations of the world

  • paper_url: http://arxiv.org/abs/2310.01536
  • repo_url: https://github.com/awjdean/cayleytablegeneration
  • paper_authors: Alexander Dean, Eduardo Alonso, Esther Mondragon
  • for: The paper proposes a framework for extracting the algebra of world transformations from the perspective of an agent.
  • methods: Using newly developed computational methods, the paper extracts the algebras of the transformations of worlds with features that occur in simple reinforcement learning scenarios and classifies them according to their properties.
  • results: The paper generalises two important results of the symmetry-based disentangled representation learning (SBDRL) formalism -- the equivariance condition and the disentangling definition -- from symmetry-based representations to representations capturing the transformation properties of worlds under any algebra, and shows that disentangled sub-algebras can each have their own equivariance conditions, which can be treated independently.
    Abstract In this paper, we propose a framework to extract the algebra of the transformations of worlds from the perspective of an agent. As a starting point, we use our framework to reproduce the symmetry-based representations from the symmetry-based disentangled representation learning (SBDRL) formalism proposed by [1]; only the algebra of transformations of worlds that form groups can be described using symmetry-based representations. We then study the algebras of the transformations of worlds with features that occur in simple reinforcement learning scenarios. Using computational methods that we developed, we extract the algebras of the transformations of these worlds and classify them according to their properties. We then generalise two important results of SBDRL - the equivariance condition and the disentangling definition - from only working with symmetry-based representations to working with representations capturing the transformation properties of worlds with transformations for any algebra. Finally, we combine our generalised equivariance condition and our generalised disentangling definition to show that disentangled sub-algebras can each have their own individual equivariance conditions, which can be treated independently.

Bridging the Gap between Structural and Semantic Similarity in Diverse Planning

  • paper_url: http://arxiv.org/abs/2310.01520
  • repo_url: https://github.com/mfaisalzaki/pair2023-semantic-similarity-metrics
  • paper_authors: Mustafa F. Abdelwahed, Joan Espasa, Alice Toniolo, Ian P. Gent
  • for: Improving the diversity and efficiency of diverse planners, and in turn the effectiveness of plan recognition systems that must deal with noisy and missing observations.
  • methods: Proposes two new domain-independent metrics that capture relevant information about the difference between two given plans from a domain-dependent viewpoint.
  • results: Testing the metrics in a variety of situations shows that they capture similarity information between plans, including structural symmetries, that the currently used metrics fail to capture.
    Abstract Diverse planning is the problem of finding multiple plans for a given problem specification, which is at the core of many real-world applications. For example, diverse planning is a critical piece for the efficiency of plan recognition systems when dealing with noisy and missing observations. Providing diverse solutions can also benefit situations where constraints are too expensive or impossible to model. Current diverse planners operate by generating multiple plans and then applying a selection procedure to extract diverse solutions using a similarity metric. Generally, current similarity metrics only consider the structural properties of the given plans. We argue that this approach is a limitation that sometimes prevents such metrics from capturing why two plans differ. In this work, we propose two new domain-independent metrics which are able to capture relevant information on the difference between two given plans from a domain-dependent viewpoint. We showcase their utility in various situations where the currently used metrics fail to capture the similarity between plans, failing to capture some structural symmetries.

Towards Automatic Design of Factorio Blueprints

  • paper_url: http://arxiv.org/abs/2310.01505
  • repo_url: None
  • paper_authors: Sean Patterson, Joan Espasa, Mun See Chang, Ruth Hoffmann
  • for: The paper explores the feasibility of a constraint model to optimise Factorio blueprints, with the goal of balancing correctness, optimality, and performance.
  • methods: The paper uses a constraint model that combines elements of bin-packing, routing, and network design to create an optimal blueprint design.
  • results: The paper presents a new, challenging problem and explores the feasibility of the constraint model for optimising Factorio blueprints, demonstrating the potential for improving the efficiency and effectiveness of factory designs in the game.
    Abstract Factorio is a 2D construction and management simulation video game about building automated factories to produce items of increasing complexity. A core feature of the game is its blueprint system, which allows players to easily save and replicate parts of their designs. Blueprints can reproduce any layout of objects in the game, but are typically used to encapsulate a complex behaviour, such as the production of a non-basic object. Once created, these blueprints are then used as basic building blocks, allowing the player to create a layer of abstraction. The usage of blueprints not only eases the expansion of the factory but also allows the sharing of designs with the game's community. The layout in a blueprint can be optimised using various criteria, such as the total space used or the final production throughput. The design of an optimal blueprint is a hard combinatorial problem, interleaving elements of many well-studied problems such as bin-packing, routing or network design. This work presents a new challenging problem and explores the feasibility of a constraint model to optimise Factorio blueprints, balancing correctness, optimality, and performance.

Towards a Model of Puzznic

  • paper_url: http://arxiv.org/abs/2310.01503
  • repo_url: None
  • paper_authors: Joan Espasa, Ian P. Gent, Ian Miguel, Peter Nightingale, András Z. Salamon, Mateu Villaret
  • for: Studies approaches for modelling and solving the video game Puzznic, in which the player must plan sequences of moves to clear a grid by matching blocks; the paper focuses on levels with no moving blocks.
  • methods: Compares a planning approach and three constraint programming approaches on a small set of benchmark instances.
  • results: The planning approach currently outperforms the constraint programming approaches, but the authors outline proposals for improving the constraint models.
    Abstract We report on progress in modelling and solving Puzznic, a video game requiring the player to plan sequences of moves to clear a grid by matching blocks. We focus here on levels with no moving blocks. We compare a planning approach and three constraint programming approaches on a small set of benchmark instances. The planning approach is at present superior to the constraint programming approaches, but we outline proposals for improving the constraint models.

GPT-Driver: Learning to Drive with GPT

  • paper_url: http://arxiv.org/abs/2310.01415
  • repo_url: https://github.com/pointscoder/gpt-driver
  • paper_authors: Jiageng Mao, Yuxi Qian, Hang Zhao, Yue Wang
  • for: Turning the OpenAI GPT-3.5 model into a reliable motion planner for autonomous vehicles.
  • methods: The approach exploits the strong reasoning and generalization capabilities of large language models by reformulating motion planning as a language modeling problem, and introduces a novel prompting-reasoning-finetuning strategy to stimulate the model's numerical reasoning ability.
  • results: Extensive experiments on the large-scale nuScenes dataset demonstrate the effectiveness, generalization ability, and interpretability of the GPT-based motion planner.
    Abstract We present a simple yet effective approach that can transform the OpenAI GPT-3.5 model into a reliable motion planner for autonomous vehicles. Motion planning is a core challenge in autonomous driving, aiming to plan a driving trajectory that is safe and comfortable. Existing motion planners predominantly leverage heuristic methods to forecast driving trajectories, yet these approaches demonstrate insufficient generalization capabilities in the face of novel and unseen driving scenarios. In this paper, we propose a novel approach to motion planning that capitalizes on the strong reasoning capabilities and generalization potential inherent to Large Language Models (LLMs). The fundamental insight of our approach is the reformulation of motion planning as a language modeling problem, a perspective not previously explored. Specifically, we represent the planner inputs and outputs as language tokens, and leverage the LLM to generate driving trajectories through a language description of coordinate positions. Furthermore, we propose a novel prompting-reasoning-finetuning strategy to stimulate the numerical reasoning potential of the LLM. With this strategy, the LLM can describe highly precise trajectory coordinates and also its internal decision-making process in natural language. We evaluate our approach on the large-scale nuScenes dataset, and extensive experiments substantiate the effectiveness, generalization ability, and interpretability of our GPT-based motion planner. Code is now available at https://github.com/PointsCoder/GPT-Driver.
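The "motion planning as language modeling" reformulation can be sketched as: serialize planner inputs into text, ask the model to reason and emit waypoint coordinates, then parse the coordinates back into a numeric trajectory. The prompt wording below is illustrative, not the authors' exact template, and `call_llm` is a placeholder for any chat-completion API.

```python
# Minimal sketch of prompting an LLM for trajectory coordinates and parsing the answer.
import re

def build_prompt(objects, ego_speed, horizon=6):
    obj_lines = "\n".join(f"- {o['name']} at (x={o['x']:.1f} m, y={o['y']:.1f} m)" for o in objects)
    return (
        "You are the motion planner of an autonomous vehicle.\n"
        f"Perceived objects:\n{obj_lines}\n"
        f"Ego speed: {ego_speed:.1f} m/s\n"
        "First explain your driving decision, then output the planned trajectory as\n"
        f"Trajectory: (x1, y1), (x2, y2), ... for the next {horizon} timesteps."
    )

def parse_trajectory(text):
    """Extract (x, y) waypoints from the model's textual answer."""
    pairs = re.findall(r"\(\s*(-?\d+\.?\d*)\s*,\s*(-?\d+\.?\d*)\s*\)", text)
    return [(float(x), float(y)) for x, y in pairs]

def plan(objects, ego_speed, call_llm):
    answer = call_llm(build_prompt(objects, ego_speed))  # placeholder LLM call
    return parse_trajectory(answer)
```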

A multi-institutional pediatric dataset of clinical radiology MRIs by the Children’s Brain Tumor Network

  • paper_url: http://arxiv.org/abs/2310.01413
  • repo_url: None
  • paper_authors: Ariana M. Familiar, Anahita Fathi Kazerooni, Hannah Anderson, Aliaksandr Lubneuski, Karthik Viswanathan, Rocky Breslow, Nastaran Khalili, Sina Bagheri, Debanjan Haldar, Meen Chul Kim, Sherjeel Arif, Rachel Madhogarhia, Thinh Q. Nguyen, Elizabeth A. Frenkel, Zeinab Helili, Jessica Harrison, Keyvan Farahani, Marius George Linguraru, Ulas Bagci, Yury Velichko, Jeffrey Stevens, Sarah Leary, Robert M. Lober, Stephani Campion, Amy A. Smith, Denise Morinigo, Brian Rood, Kimberly Diamond, Ian F. Pollack, Melissa Williams, Arastoo Vossough, Jeffrey B. Ware, Sabine Mueller, Phillip B. Storm, Allison P. Heath, Angela J. Waanders, Jena V. Lilly, Jennifer L. Mason, Adam C. Resnick, Ali Nabavizadeh
  • for: Enabling predictive analytics and clinical decision support in pediatric neuro-oncology, with the aim of improving the diagnosis and treatment of childhood brain tumors.
  • methods: The study compiles large-scale multi-parametric MRI data together with associated patient-level clinical information, digital pathology slides, and tissue genotype and omics data, intended for use with artificial intelligence methods.
  • results: The release provides 23,101 multi-parametric MRI exams linked to clinical information for 1,526 brain tumor patients, supporting downstream predictive analysis.
    Abstract Pediatric brain and spinal cancers remain the leading cause of cancer-related death in children. Advancements in clinical decision-support in pediatric neuro-oncology utilizing the wealth of radiology imaging data collected through standard care, however, has significantly lagged other domains. Such data is ripe for use with predictive analytics such as artificial intelligence (AI) methods, which require large datasets. To address this unmet need, we provide a multi-institutional, large-scale pediatric dataset of 23,101 multi-parametric MRI exams acquired through routine care for 1,526 brain tumor patients, as part of the Children's Brain Tumor Network. This includes longitudinal MRIs across various cancer diagnoses, with associated patient-level clinical information, digital pathology slides, as well as tissue genotype and omics data. To facilitate downstream analysis, treatment-na\"ive images for 370 subjects were processed and released through the NCI Childhood Cancer Data Initiative via the Cancer Data Service. Through ongoing efforts to continuously build these imaging repositories, our aim is to accelerate discovery and translational AI models with real-world data, to ultimately empower precision medicine for children.

Generalized Animal Imitator: Agile Locomotion with Versatile Motion Prior

  • paper_url: http://arxiv.org/abs/2310.01408
  • repo_url: None
  • paper_authors: Ruihan Yang, Zhuoqun Chen, Jianhan Ma, Chongyi Zheng, Yiyu Chen, Quan Nguyen, Xiaolong Wang
  • for: Enabling legged robots to learn a range of agile behaviors such as running, turning, jumping, and backflipping for use in complex tasks.
  • methods: The work introduces a reinforcement learning framework that learns diverse agile low-level skills by imitating animal motions and manually designed motions, guided by a functionality reward and a stylization reward.
  • results: A single controller successfully learns multiple agile locomotion skills simultaneously, and the framework is evaluated both in simulation and in real-world deployment.
    Abstract The agility of animals, particularly in complex activities such as running, turning, jumping, and backflipping, stands as an exemplar for robotic system design. Transferring this suite of behaviors to legged robotic systems introduces essential inquiries: How can a robot be trained to learn multiple locomotion behaviors simultaneously? How can the robot execute these tasks with a smooth transition? And what strategies allow for the integrated application of these skills? This paper introduces the Versatile Instructable Motion prior (VIM) - a Reinforcement Learning framework designed to incorporate a range of agile locomotion tasks suitable for advanced robotic applications. Our framework enables legged robots to learn diverse agile low-level skills by imitating animal motions and manually designed motions with Functionality reward and Stylization reward. While the Functionality reward guides the robot's ability to adopt varied skills, the Stylization reward ensures performance alignment with reference motions. Our evaluations of the VIM framework span both simulation environments and real-world deployment. To our understanding, this is the first work that allows a robot to concurrently learn diverse agile locomotion tasks using a singular controller. Further details and supportive media can be found at our project site: https://rchalyang.github.io/VIM .

Conditional Diffusion Distillation

  • paper_url: http://arxiv.org/abs/2310.01407
  • repo_url: None
  • paper_authors: Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar
  • for: Speeding up diffusion-based generation so that conditional tasks such as image editing, restoration, and super-resolution can be performed with far fewer sampling steps.
  • methods: A conditional distillation method supplements the diffusion priors with image conditions, distilling the unconditional pre-training in a single stage through joint learning; this greatly simplifies previous two-stage procedures that perform distillation and conditional finetuning separately, and adds a parameter-efficient distillation mechanism built on a shared frozen unconditional backbone.
  • results: Experiments across super-resolution, image editing, and depth-to-image generation show that the method outperforms existing distillation techniques for the same sampling time, and is the first distillation strategy to match the performance of the much slower fine-tuned conditional diffusion models.
    Abstract Generative diffusion models provide strong priors for text-to-image generation and thereby serve as a foundation for conditional generation tasks such as image editing, restoration, and super-resolution. However, one major limitation of diffusion models is their slow sampling time. To address this challenge, we present a novel conditional distillation method designed to supplement the diffusion priors with the help of image conditions, allowing for conditional sampling with very few steps. We directly distill the unconditional pre-training in a single stage through joint-learning, largely simplifying the previous two-stage procedures that involve both distillation and conditional finetuning separately. Furthermore, our method enables a new parameter-efficient distillation mechanism that distills each task with only a small number of additional parameters combined with the shared frozen unconditional backbone. Experiments across multiple tasks including super-resolution, image editing, and depth-to-image generation demonstrate that our method outperforms existing distillation techniques for the same sampling time. Notably, our method is the first distillation strategy that can match the performance of the much slower fine-tuned conditional diffusion models.

Representation Engineering: A Top-Down Approach to AI Transparency

  • paper_url: http://arxiv.org/abs/2310.01405
  • repo_url: https://github.com/andyzoujm/representation-engineering
  • paper_authors: Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks
  • for: Identifying and characterizing representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience.
  • methods: RepE places population-level representations, rather than neurons or circuits, at the center of analysis, yielding methods for monitoring and manipulating high-level cognitive phenomena in deep neural networks (DNNs).
  • results: The paper provides baselines and an initial analysis of RepE techniques, showing that they offer simple yet effective ways to improve our understanding and control of large language models and provide traction on safety-relevant problems such as honesty, harmlessness, and power-seeking.
    Abstract In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representations, rather than neurons or circuits, at the center of analysis, equipping us with novel methods for monitoring and manipulating high-level cognitive phenomena in deep neural networks (DNNs). We provide baselines and an initial analysis of RepE techniques, showing that they offer simple yet effective solutions for improving our understanding and control of large language models. We showcase how these methods can provide traction on a wide range of safety-relevant problems, including honesty, harmlessness, power-seeking, and more, demonstrating the promise of top-down transparency research. We hope that this work catalyzes further exploration of RepE and fosters advancements in the transparency and safety of AI systems.
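One common representation-reading recipe in the spirit of RepE is to collect hidden states for contrastive prompt pairs (e.g., honest vs. dishonest statements), take the top principal direction of their differences as a "reading vector", and score new activations by projection. The sketch below is a generic illustration under these assumptions, not the repository's exact pipeline; `hidden_states(prompt)` is assumed to return one layer's activation vector for the final token.

```python
# Minimal sketch of extracting and using a concept "reading vector" from paired activations.
import numpy as np

def reading_vector(pos_prompts, neg_prompts, hidden_states):
    diffs = np.stack([hidden_states(p) - hidden_states(n)
                      for p, n in zip(pos_prompts, neg_prompts)])
    diffs -= diffs.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)  # first principal direction of differences
    return vt[0]

def concept_score(prompt, direction, hidden_states):
    """Signed projection of a new activation onto the concept direction (used for monitoring)."""
    h = hidden_states(prompt)
    return float(np.dot(h, direction) / (np.linalg.norm(direction) + 1e-8))
```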

A Good Snowman is Hard to Plan

  • paper_url: http://arxiv.org/abs/2310.01471
  • repo_url: https://github.com/udg-lai/keps2023
  • paper_authors: Miquel Bofill, Cristina Borralleras, Joan Espasa, Gerard Martín, Gustavo Patow, Mateu Villaret
  • for: Building tools that can certify the optimality of solutions to the puzzle game A Good Snowman is Hard to Build, so that players cannot find much easier solutions than the designer intended.
  • methods: The game is modelled as a planning problem, and a direct translation to SAT is compared against off-the-shelf state-of-the-art planners; reachability properties are encoded directly in SAT rather than through axioms expressing a reachability-derived predicate in PDDL.
  • results: Of a set of 51 levels, both original and crafted, 43 are solved, with 8 challenging instances still remaining.
    Abstract In this work we face a challenging puzzle video game: A Good Snowman is Hard to Build. The objective of the game is to build snowmen by moving and stacking snowballs on a discrete grid. For the sake of player engagement with the game, it is interesting to avoid that a player finds a much easier solution than the one the designer expected. Therefore, having tools that are able to certify the optimality of solutions is crucial. Although the game can be stated as a planning problem and can be naturally modelled in PDDL, we show that a direct translation to SAT clearly outperforms off-the-shelf state-of-the-art planners. As we show, this is mainly due to the fact that reachability properties can be easily modelled in SAT, allowing for shorter plans, whereas using axioms to express a reachability derived predicate in PDDL does not result in any significant reduction of solving time with the considered planners. We deal with a set of 51 levels, both original and crafted, solving 43 and with 8 challenging instances still remaining to be solved.

Challenges in Modelling and Solving Plotting with PDDL

  • paper_url: http://arxiv.org/abs/2310.01470
  • repo_url: None
  • paper_authors: Joan Espasa, Ian Miguel, Peter Nightingale, András Z. Salamon, Mateu Villaret
  • for: A planning problem based on the tile-matching game Plotting, in which the objective is to remove a target number of coloured blocks from a grid.
  • methods: The problem is modelled in PDDL and solved with a grounding-based state-of-the-art planner.
  • results: The study highlights how the complex transitions after every shot, including indirect effects of gravity, make Plotting challenging to model and solve.
    Abstract We study a planning problem based on Plotting, a tile-matching puzzle video game published by Taito in 1989. The objective of this game is to remove a target number of coloured blocks from a grid by sequentially shooting blocks into the grid. Plotting features complex transitions after every shot: various blocks are affected directly, while others can be indirectly affected by gravity. We highlight the challenges of modelling Plotting with PDDL and of solving it with a grounding-based state-of-the-art planner.

EXTRACTER: Efficient Texture Matching with Attention and Gradient Enhancing for Large Scale Image Super Resolution

  • paper_url: http://arxiv.org/abs/2310.01379
  • repo_url: https://github.com/esteban-rs/extracter
  • paper_authors: Esteban Reyes-Saldana, Mariano Rivera
  • for: Improving reference-based image super-resolution by transferring textures from a high-resolution reference image to enhance a low-resolution input.
  • methods: A memory-efficient deep search finds, for each low-resolution patch, the $k$ most relevant texture matches among the high-resolution reference patches in feature space, and a simple residual architecture adds gradient density information to the super-resolution result.
  • results: The approach yields more accurate texture transfer and competitive metric results in terms of PSNR and SSMI.
    Abstract Recent Reference-Based image super-resolution (RefSR) has improved on SOTA deep methods by introducing attention mechanisms to enhance low-resolution images, transferring high-resolution textures from a reference high-resolution image. The main idea is to search for matches between patches of the LR and Reference image pair in a feature space and merge them using deep architectures. However, existing methods lack an accurate search of textures. They divide images into as many patches as possible, resulting in inefficient memory usage, and cannot manage large images. Herein, we propose a deep search with more efficient memory usage that significantly reduces the number of image patches and finds the $k$ most relevant texture matches for each low-resolution patch over the high-resolution reference patches, resulting in an accurate texture match. We enhance the super-resolution result by adding gradient density information using a simple residual architecture, showing competitive metric results: PSNR and SSMI.
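Top-$k$ texture matching in feature space can be sketched as: unfold both the low-resolution and reference feature maps into patches, L2-normalize them, and retrieve the $k$ most similar reference patches for every LR patch by cosine similarity. Patch size and $k$ below are illustrative; this is not the authors' released code.

```python
# Minimal sketch of top-k patch matching between LR and reference feature maps.
import torch
import torch.nn.functional as F

def topk_texture_matches(lr_feat, ref_feat, patch=3, k=4):
    """lr_feat, ref_feat: (1, C, H, W) feature maps. Returns top-k scores and indices per LR patch."""
    lr_patches = F.unfold(lr_feat, kernel_size=patch, padding=patch // 2)    # (1, C*p*p, N_lr)
    ref_patches = F.unfold(ref_feat, kernel_size=patch, padding=patch // 2)  # (1, C*p*p, N_ref)
    lr_patches = F.normalize(lr_patches, dim=1)
    ref_patches = F.normalize(ref_patches, dim=1)
    sim = torch.bmm(lr_patches.transpose(1, 2), ref_patches)                 # cosine similarity (1, N_lr, N_ref)
    scores, idx = sim.topk(k, dim=-1)                                        # k best reference patches per LR patch
    return scores, idx
```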

On Grid Graph Reachability and Puzzle Games

  • paper_url: http://arxiv.org/abs/2310.01378
  • repo_url: https://github.com/udg-lai/modref2023
  • paper_authors: Miquel Bofill, Cristina Borralleras, Joan Espasa, Mateu Villaret
  • for: Solving reachability and path-planning subproblems that arise in Sokoban-like puzzle games.
  • methods: The paper studies CP and SAT approaches, reviews existing reachability encodings, and proposes a new one.
  • results: Experiments show that the new encoding is well suited to solving puzzle problems in the planning-as-SAT paradigm, especially when several actions are executed in parallel.
    Abstract Many puzzle video games, like Sokoban, involve moving some agent in a maze. The reachable locations are usually apparent for a human player, and the difficulty of the game is mainly related to performing actions on objects, such as pushing (reachable) boxes. For this reason, the difficulty of a particular level is often measured as the number of actions on objects, other than agent walking, needed to find a solution. In this paper we study CP and SAT approaches for solving these kind of problems. We review some reachability encodings and propose a new one. We empirically show that the new encoding is well-suited for solving puzzle problems in the planning as SAT paradigm, especially when considering the execution of several actions in parallel.

UltraFeedback: Boosting Language Models with High-quality Feedback

  • paper_url: http://arxiv.org/abs/2310.01377
  • repo_url: https://github.com/thunlp/ultrafeedback
  • paper_authors: Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Wei Zhu, Yuan Ni, Guotong Xie, Zhiyuan Liu, Maosong Sun
  • for: ULTRAFEEDBACK is designed to overcome the limitations of current preference datasets in reinforcement learning from human feedback (RLHF) research, specifically the scarcity of diverse and naturalistic datasets of human preferences on large language model (LLM) outputs at scale.
  • methods: To create ULTRAFEEDBACK, the authors compile a diverse array of instructions and models from multiple sources, meticulously devise annotation instructions, and employ GPT-4 to provide detailed feedback in both numerical and textual form.
  • results: The authors train various models using ULTRAFEEDBACK, including the reward model UltraRM, the chat language model UltraLM-13B-PPO, and the critique model UltraCM, achieving top performance across multiple benchmarks and outperforming existing open-source models.
    Abstract Reinforcement learning from human feedback (RLHF) has become a pivot technique in aligning large language models (LLMs) with human preferences. In RLHF practice, preference data plays a crucial role in bridging human proclivity and LLMs. However, the scarcity of diverse, naturalistic datasets of human preferences on LLM outputs at scale poses a great challenge to RLHF as well as feedback learning research within the open-source community. Current preference datasets, either proprietary or limited in size and prompt variety, result in limited RLHF adoption in open-source models and hinder further exploration. In this study, we propose ULTRAFEEDBACK, a large-scale, high-quality, and diversified preference dataset designed to overcome these limitations and foster RLHF development. To create ULTRAFEEDBACK, we compile a diverse array of instructions and models from multiple sources to produce comparative data. We meticulously devise annotation instructions and employ GPT-4 to offer detailed feedback in both numerical and textual forms. ULTRAFEEDBACK establishes a reproducible and expandable preference data construction pipeline, serving as a solid foundation for future RLHF and feedback learning research. Utilizing ULTRAFEEDBACK, we train various models to demonstrate its effectiveness, including the reward model UltraRM, chat language model UltraLM-13B-PPO, and critique model UltraCM. Experimental results indicate that our models outperform existing open-source models, achieving top performance across multiple benchmarks. Our data and models are available at https://github.com/thunlp/UltraFeedback.

Elephant Neural Networks: Born to Be a Continual Learner

  • paper_url: http://arxiv.org/abs/2310.01365
  • repo_url: None
  • paper_authors: Qingfeng Lan, A. Rupam Mahmood
  • for: Studying the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting in continual learning.
  • methods: The study introduces a new class of activation functions, elephant activation functions, that generate both sparse representations and sparse gradients.
  • results: Simply replacing classical activation functions with elephant activation functions significantly improves the resilience of neural networks to catastrophic forgetting; on Split MNIST the method achieves excellent performance in a single pass without a replay buffer, task boundary information, or pre-training.
    Abstract Catastrophic forgetting has remained a significant challenge to continual learning for decades. While recent works have proposed effective methods to mitigate this problem, they mainly focus on the algorithmic side. Meanwhile, we do not fully understand what architectural properties of neural networks lead to catastrophic forgetting. This study aims to fill this gap by studying the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting. Our study reveals that, besides sparse representations, the gradient sparsity of activation functions also plays an important role in reducing forgetting. Based on this insight, we propose a new class of activation functions, elephant activation functions, that can generate both sparse representations and sparse gradients. We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting. Our method has broad applicability and benefits for continual learning in regression, class incremental learning, and reinforcement learning tasks. Specifically, we achieve excellent performance on Split MNIST dataset in just one single pass, without using replay buffer, task boundary information, or pre-training.
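An activation with both sparse outputs and sparse gradients can be illustrated with a bump-shaped unit that responds strongly only near its centre. The specific form 1 / (1 + |x/a|^d) below is an assumption taken as one plausible parameterization consistent with the description above; consult the paper for the exact definition and recommended hyperparameters.

```python
# Minimal sketch of an elephant-style bump activation (functional form is an assumption).
import torch
import torch.nn as nn

class ElephantLikeActivation(nn.Module):
    def __init__(self, a: float = 1.0, d: float = 4.0):
        super().__init__()
        self.a = a  # width of the bump
        self.d = d  # sharpness; larger d gives a flatter top, steeper sides, and sparser gradients

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 1.0 / (1.0 + torch.abs(x / self.a) ** self.d)

# Usage: swap it in for ReLU in an MLP, e.g.
# net = nn.Sequential(nn.Linear(784, 256), ElephantLikeActivation(), nn.Linear(256, 10))
```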

SyMPox: An Automated Monkeypox Detection System Based on Symptoms Using XGBoost

  • paper_url: http://arxiv.org/abs/2310.19801
  • repo_url: None
  • paper_authors: Alireza Farzipour, Roya Elmi, Hamid Nasiri
  • for: Providing a standalone application that diagnoses Monkeypox cases by analyzing symptoms.
  • methods: The system uses the XGBoost algorithm to analyze symptom patterns and is built with the Gradio framework, offering a user-friendly platform for people to assess their symptoms and obtain a reliable Monkeypox assessment.
  • results: The outcome is SyMPox, a standalone application for fast and reliable symptom-based Monkeypox diagnosis.
    Abstract Monkeypox is a zoonotic disease. About 87000 cases of monkeypox were confirmed by the World Health Organization until 10th June 2023. The most prevalent methods for identifying this disease are image-based recognition techniques. Still, they are not too fast and could only be available to a few individuals. This study presents an independent application named SyMPox, developed to diagnose Monkeypox cases based on symptoms. SyMPox utilizes the robust XGBoost algorithm to analyze symptom patterns and provide accurate assessments. Developed using the Gradio framework, SyMPox offers a user-friendly platform for individuals to assess their symptoms and obtain reliable Monkeypox diagnoses.
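A symptom-based XGBoost classifier of the kind described above is straightforward to sketch: tabular symptom indicators go in, a probability of Monkeypox comes out. The feature names and synthetic data below are illustrative placeholders, not the dataset or feature set used in the paper.

```python
# Minimal sketch of a SyMPox-style symptom classifier using XGBoost (toy data, assumed features).
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

FEATURES = ["fever", "rash", "swollen_lymph_nodes", "headache", "muscle_aches"]  # assumed names

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, len(FEATURES)))                 # 0/1 symptom indicators
y = ((X[:, 1] & X[:, 2]) | (rng.random(500) < 0.05)).astype(int)  # toy label rule plus noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1, eval_metric="logloss")
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```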

RA-DIT: Retrieval-Augmented Dual Instruction Tuning

  • paper_url: http://arxiv.org/abs/2310.01352
  • repo_url: None
  • paper_authors: Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, Luke Zettlemoyer, Scott Yih
  • for: Improving retrieval-augmented language models (RALMs) so they can effectively access long-tail and up-to-date knowledge from external data stores.
  • methods: Retrieval-Augmented Dual Instruction Tuning (RA-DIT) is a lightweight fine-tuning methodology that retrofits any LLM with retrieval capabilities in two steps: first the pre-trained LM is updated to better use retrieved information, then the retriever is updated to return more relevant results, as preferred by the LM.
  • results: Fine-tuning over tasks that require both knowledge utilization and contextual awareness shows that each stage yields significant gains and that combining them yields more; the best model, RA-DIT 65B, achieves state-of-the-art results on knowledge-intensive zero- and few-shot benchmarks, outperforming existing in-context RALM approaches by up to +8.9% in the 0-shot setting and +1.4% in the 5-shot setting on average.
    Abstract Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build. Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance. We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option by retrofitting any LLM with retrieval capabilities. Our approach operates in two distinct fine-tuning steps: (1) one updates a pre-trained LM to better use retrieved information, while (2) the other updates the retriever to return more relevant results, as preferred by the LM. By fine-tuning over tasks that require both knowledge utilization and contextual awareness, we demonstrate that each stage yields significant performance improvements, and using both leads to additional gains. Our best model, RA-DIT 65B, achieves state-of-the-art performance across a range of knowledge-intensive zero- and few-shot learning benchmarks, significantly outperforming existing in-context RALM approaches by up to +8.9% in 0-shot setting and +1.4% in 5-shot setting on average.

Improving Dialogue Management: Quality Datasets vs Models

  • paper_url: http://arxiv.org/abs/2310.01339
  • repo_url: https://github.com/miguel-kjh/Improving-Dialogue-Management
  • paper_authors: Miguel Ángel Medina-Ramírez, Cayetano Guerra-Artal, Mario Hernández-Tejera
  • for: Explaining why dialogue managers underperform, arguing that the cause lies in the quality of the datasets rather than in the choice of model.
  • methods: The authors design a synthetic dialogue generator that fully controls the amount and type of errors introduced into a dataset, and study the main errors in the widely used MultiWOZ 2.1 and SGD datasets.
  • results: The study finds that dataset errors, such as mislabeling, account for a large share of the performance loss of dialogue management models.
    Abstract Task-oriented dialogue systems (TODS) have become crucial for users to interact with machines and computers using natural language. One of their key components is the dialogue manager, which guides the conversation towards a good goal for the user by providing the best possible response. Previous works have proposed rule-based systems (RBS), reinforcement learning (RL), and supervised learning (SL) as solutions for correct dialogue management; in other words, selecting the best response given the user's input. However, this work argues that the leading cause of DMs not achieving maximum performance resides in the quality of the datasets rather than the models employed thus far; this means that dataset errors, like mislabeling, account for a large percentage of failures in dialogue management. We studied the main errors in the most widely used datasets, Multiwoz 2.1 and SGD, to demonstrate this hypothesis. To do this, we have designed a synthetic dialogue generator to fully control the amount and type of errors introduced in the dataset. Using this generator, we demonstrated that errors in the datasets contribute proportionally to the performance degradation of the models.

LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples

  • paper_url: http://arxiv.org/abs/2310.01469
  • repo_url: https://github.com/pku-yuangroup/hallucination-attack
  • paper_authors: Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Li Yuan
  • for: Investigating how large language models (LLMs) respond to nonsensical prompts composed of random tokens, and whether the resulting hallucinations can be viewed as a form of adversarial attack.
  • methods: The authors probe LLM responses with different prompting strategies and formalize an automatic method for triggering hallucinations, termed the hallucination attack.
  • results: The study shows that hallucinations can be elicited via the hallucination attack, that this behaviour shares basic features with conventional adversarial examples, and that a simple yet effective defense strategy can mitigate it.
    Abstract Large Language Models (LLMs), including GPT-3.5, LLaMA, and PaLM, seem to be knowledgeable and able to adapt to many tasks. However, we still cannot completely trust their answers, since LLMs suffer from hallucination--fabricating non-existent facts to cheat users without perception. And the reasons for their existence and pervasiveness remain unclear. In this paper, we demonstrate that nonsensical prompts composed of random tokens can also elicit the LLMs to respond with hallucinations. This phenomenon forces us to reconsider whether hallucination may be another view of adversarial examples, and it shares similar features with conventional adversarial examples as a basic feature of LLMs. Therefore, we formalize an automatic hallucination triggering method as the hallucination attack in an adversarial way. Finally, we explore the basic features of attacked adversarial prompts and propose a simple yet effective defense strategy. Our code is released on GitHub.

The Entity-Deduction Arena: A playground for probing the conversational reasoning and planning capabilities of LLMs

  • paper_url: http://arxiv.org/abs/2310.01468
  • repo_url: None
  • paper_authors: Yizhe Zhang, Jiarui Lu, Navdeep Jaitly
  • for: This paper aims to evaluate the conversational reasoning and planning capabilities of large language models (LLMs) and develops a surrogate problem to assess their ability to deduce an entity unknown to themselves.
  • methods: The paper uses an entity-deducing game as an evaluation framework to test the performance of various LLMs, and employs Behavior Cloning (BC) and Reinforcement Learning to enhance the reasoning and planning capacity of weaker models.
  • results: The paper finds significant differences in the performance of different LLMs on the entity-deducing game and demonstrates that strong LLMs like GPT-4 outperform human players by a large margin; it also shows that weaker models can be trained to imitate stronger models and generalize to new data or domains using only demonstrations from a stronger model.
    Abstract Large language models (LLMs) are effective at answering questions that are clearly asked. However, when faced with ambiguous queries they can act unpredictably and produce incorrect outputs. This underscores the need for the development of intelligent agents capable of asking clarification questions to resolve ambiguities effectively. This capability requires complex understanding, state tracking, reasoning and planning over multiple conversational turns. However, directly measuring this can be challenging. In this paper, we offer a surrogate problem which assesses an LLMs's capability to deduce an entity unknown to itself, but revealed to a judge, by asking the judge a series of queries. This entity-deducing game can serve as an evaluation framework to probe the conversational reasoning and planning capabilities of language models. We systematically evaluate various LLMs and discover significant differences in their performance on this task. We find that strong LLMs like GPT-4 outperform human players by a large margin. We further employ Behavior Cloning (BC) to examine whether a weaker model is capable of imitating a stronger model and generalizing to data or domains, using only the demonstrations from a stronger model. We finally propose to use Reinforcement Learning to enhance reasoning and planning capacity of Vicuna models through episodes of game playing, which lead to significant performance improvement. We hope that this problem offers insights into how autonomous agents could be trained to behave more intelligently in ambiguous circumstances.
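The entity-deducing game used as the evaluation framework above follows a 20-questions structure: a judge knows a secret entity and answers yes/no questions, while the guesser model asks questions and must name the entity within a fixed budget. `ask_guesser` and `ask_judge` below are placeholders for calls to the two LLMs (or a scripted judge); the prompts are illustrative, not the benchmark's exact wording.

```python
# Minimal sketch of an entity-deduction game loop between a guesser LLM and a judge.
def play_entity_deduction(secret_entity, ask_guesser, ask_judge, max_turns=20):
    transcript = []
    for turn in range(max_turns):
        question = ask_guesser(
            "You are playing 20 questions. Ask one yes/no question, or say "
            f"'Final guess: <entity>' if you are confident.\nDialogue so far: {transcript}"
        )
        if question.lower().startswith("final guess:"):
            guess = question.split(":", 1)[1].strip()
            return guess.lower() == secret_entity.lower(), turn + 1, transcript
        answer = ask_judge(
            f"The secret entity is '{secret_entity}'. Answer only Yes or No.\nQuestion: {question}"
        )
        transcript.append((question, answer))
    return False, max_turns, transcript  # budget exhausted without a correct guess
```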

L2MAC: Large Language Model Automatic Computer for Unbounded Code Generation

  • paper_url: http://arxiv.org/abs/2310.02003
  • repo_url: None
  • paper_authors: Samuel Holt, Max Ruiz Luyten, Mihaela van der Schaar
  • for: Overcoming the fixed context window of the transformer architecture so that transformer-based models can generate long and logically consistent code.
  • methods: The paper presents L2MAC, a practical LLM-based stored-program automatic computer for long code generation. Its memory has two components: an instruction registry populated with a prompt program for the user-given task, and a file store holding final and intermediate outputs. Each instruction is executed by a separate LLM instance whose context is managed by a control unit capable of precise memory reading and writing, enabling virtually unbounded code structures that still fulfill complex user-specified requirements.
  • results: Empirically, L2MAC succeeds in generating large code bases for system design tasks where other coding methods fall short of implementing user requirements, and the paper provides insight into the reasons for this performance gap.
    Abstract Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture, hindering their ability to produce long and logically consistent code. Memory-augmented LLMs are a promising solution, but current approaches cannot handle long code generation tasks since they (1) only focus on reading memory and reduce its evolution to the concatenation of new memories or (2) use very specialized memories that cannot adapt to other domains. This paper presents L2MAC, the first practical LLM-based stored-program automatic computer for long and consistent code generation. Its memory has two components: the instruction registry, which is populated with a prompt program to solve the user-given task, and a file store, which will contain the final and intermediate outputs. Each instruction is executed by a separate LLM instance, whose context is managed by a control unit capable of precise memory reading and writing to ensure effective interaction with the file store. These components enable L2MAC to generate virtually unbounded code structures, bypassing the constraints of the finite context window while producing code that fulfills complex user-specified requirements. We empirically show that L2MAC succeeds in generating large code bases for system design tasks where other coding methods fall short in implementing user requirements and provide insight into the reasons for this performance gap.
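The stored-program control loop described above can be sketched as: an instruction registry holds a prompt program, a file store holds outputs, and each instruction is run by a fresh LLM call whose context the control unit assembles from the relevant files. `call_llm`, the context-selection heuristic, and the output format below are placeholders, not the authors' system.

```python
# Minimal sketch of an L2MAC-style control loop over an instruction registry and a file store.
def run_stored_program(instructions, call_llm, max_context_files=3):
    file_store = {}                                  # filename -> contents
    for step, instruction in enumerate(instructions):
        # Control unit: read only the most recently written files into the prompt.
        context = "\n\n".join(f"### {name}\n{body}"
                              for name, body in list(file_store.items())[-max_context_files:])
        prompt = (f"Task step {step + 1}: {instruction}\n"
                  f"Existing files:\n{context}\n"
                  "Respond with the full contents of the file you create or update, "
                  "starting with a line 'FILE: <name>'.")
        reply = call_llm(prompt)                     # placeholder LLM call, one instruction per call
        header, _, body = reply.partition("\n")
        name = header.replace("FILE:", "").strip() or f"step_{step + 1}.txt"
        file_store[name] = body                      # write the result back to the file store
    return file_store
```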

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

  • paper_url: http://arxiv.org/abs/2310.01334
  • repo_url: https://github.com/unites-lab/mc-smoe
  • paper_authors: Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen
  • for: Making sparsely activated Mixture-of-Experts (SMoE) models memory-efficient and scalable enough for resource-constrained downstream applications.
  • methods: Proposes M-SMoE, which uses routing statistics to guide expert merging: neuron permutations align the experts, dominant experts and their group members are identified, and each group is merged into a single expert weighted by activation frequency; MC-SMoE then further compresses the merged experts into low-rank and structurally sparse alternatives.
  • results: Extensive experiments across 8 benchmarks validate that MC-SMoE reduces memory usage by up to 80% and computation by 20% with less than a 1% drop in performance.
    Abstract Sparsely activated Mixture-of-Experts (SMoE) has shown promise to scale up the learning capacity of neural networks, however, they have issues like (a) High Memory Usage, due to duplication of the network layers into multiple copies as experts; and (b) Redundancy in Experts, as common learning-based routing policies suffer from representational collapse. Therefore, vanilla SMoE models are memory inefficient and non-scalable, especially for resource-constrained downstream scenarios. In this paper, we ask: Can we craft a compact SMoE model by consolidating expert information? What is the best recipe to merge multiple experts into fewer but more knowledgeable experts? Our pilot investigation reveals that conventional model merging methods fail to be effective in such expert merging for SMoE. The potential reasons are: (1) redundant information overshadows critical experts; (2) appropriate neuron permutation for each expert is missing to bring all of them in alignment. To address this, we propose M-SMoE, which leverages routing statistics to guide expert merging. Specifically, it starts with neuron permutation alignment for experts; then, dominant experts and their "group members" are formed; lastly, every expert group is merged into a single expert by utilizing each expert's activation frequency as their weight for merging, thus diminishing the impact of insignificant experts. Moreover, we observed that our proposed merging promotes a low dimensionality in the merged expert's weight space, naturally paving the way for additional compression. Hence, our final method, MC-SMoE (i.e., Merge, then Compress SMoE), further decomposes the merged experts into low-rank and structural sparse alternatives. Extensive experiments across 8 benchmarks validate the effectiveness of MC-SMoE. For instance, our MC-SMoE achieves up to 80% memory and a 20% FLOPs reduction, with virtually no loss in performance.
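The frequency-weighted merging step can be sketched directly: experts in the same group are averaged with weights proportional to how often the router activated them, so rarely used experts contribute little to the merged weights. Permutation alignment and the subsequent low-rank/sparse compression are omitted here; the tensors are illustrative, not the released MC-SMoE code.

```python
# Minimal sketch of activation-frequency-weighted expert merging for one expert group.
import torch

def merge_expert_group(expert_weights, activation_counts):
    """expert_weights: list of (out, in) weight tensors for experts in one group.
    activation_counts: how often the router selected each expert on calibration data."""
    counts = torch.tensor(activation_counts, dtype=torch.float32)
    weights = counts / counts.sum()                        # activation frequency as merge weight
    stacked = torch.stack(expert_weights)                  # (num_experts, out, in)
    return (weights.view(-1, 1, 1) * stacked).sum(dim=0)   # single merged expert

# Example: three experts, the first one dominant in the routing statistics.
experts = [torch.randn(8, 4) for _ in range(3)]
merged = merge_expert_group(experts, activation_counts=[900, 60, 40])
print(merged.shape)  # torch.Size([8, 4])
```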

ChoiceMates: Supporting Unfamiliar Online Decision-Making with Multi-Agent Conversational Interactions

  • paper_url: http://arxiv.org/abs/2310.01331
  • repo_url: None
  • paper_authors: Jeongeon Park, Bryan Min, Xiaojuan Ma, Juho Kim
  • for: 本研究旨在帮助用户更好地搜寻、理解和做出决策于在线信息中,尤其是在没有准确专业知识的情况下。
  • methods: 我们采用了一种动态的、由 LLM 驱动的多代理系统,让用户与多个代理进行对话,以获得领域知识并有效地发现和管理信息。
  • results: 与传统的网页搜索和单个代理相比,我们的 ChoiceMates 系统在用户们发现、深入了解和管理信息方面表现出了明显的优势,participants also reported that multi-agent conversations were helpful in their decision-making process.
    Abstract Unfamiliar decisions -- decisions where people lack adequate domain knowledge or expertise -- specifically increase the complexity and uncertainty of the process of searching for, understanding, and making decisions with online information. Through our formative study (n=14), we observed users' challenges in accessing diverse perspectives, identifying relevant information, and deciding the right moment to make the final decision. We present ChoiceMates, a system that enables conversations with a dynamic set of LLM-powered agents for a holistic domain understanding and efficient discovery and management of information to make decisions. Agents, as opinionated personas, flexibly join the conversation, not only providing responses but also conversing among themselves to elicit each agent's preferences. Our between-subjects study (n=36) comparing ChoiceMates to conventional web search and single-agent showed that ChoiceMates was more helpful in discovering, diving deeper, and managing information compared to Web with higher confidence. We also describe how participants utilized multi-agent conversations in their decision-making process.
    摘要 不熟悉的决策——即人们缺乏相关领域知识或专业知识的决策——特别会增加在线信息搜索、理解和做出决策过程中的复杂性和不确定性。通过我们的初步研究(n=14),我们发现用户在获取多元视角、识别相关信息以及判断做出最终决策的时机上遇到了困难。我们提出了 ChoiceMates,一个允许用户与动态的一组 LLM 驱动代理进行对话的系统,以实现对领域的整体理解以及信息的高效发现与管理。这些代理作为有观点的角色,可以灵活地加入对话,不仅提供回答,还会彼此交谈以表达各自的偏好。我们的组间对比研究(n=36)表明,与传统网络搜索和单代理相比,ChoiceMates 在发现、深入探索和管理信息方面更有帮助,且用户信心更高。我们还描述了参与者如何在决策过程中利用多代理对话。

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

  • paper_url: http://arxiv.org/abs/2310.01329
  • repo_url: None
  • paper_authors: Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi
  • for: This paper is written for improving the efficiency and scalability of retrieval-augmented language models (LMs) by addressing problems such as hallucination, staleness, and privacy leaks.
  • methods: The paper introduces binary token representations (BTR) that use 1-bit vectors to precompute every token in passages, significantly reducing computation during inference. The authors also propose new calibration techniques and training objectives to restore performance.
  • results: The authors’ experiments show that BTR accelerates state-of-the-art inference by up to 4x and reduces storage by over 100x while maintaining over 95% task performance on five knowledge-intensive NLP tasks, using only 127GB of disk space to encode 3 billion tokens in Wikipedia.
    Abstract Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks. However, running retrieval-augmented language models (LMs) is slow and difficult to scale due to processing large amounts of retrieved text. We introduce binary token representations (BTR), which use 1-bit vectors to precompute every token in passages, significantly reducing computation during inference. Despite the potential loss of accuracy, our new calibration techniques and training objectives restore performance. Combined with offline and runtime compression, this only requires 127GB of disk space for encoding 3 billion tokens in Wikipedia. Our experiments show that on five knowledge-intensive NLP tasks, BTR accelerates state-of-the-art inference by up to 4x and reduces storage by over 100x while maintaining over 95% task performance.
    摘要 检索增强可以缓解大语言模型中的幻觉、知识过时和隐私泄露等关键问题。然而,由于需要处理大量检索到的文本,运行检索增强语言模型速度慢且难以扩展。我们提出二值词元表示(BTR),使用 1 比特向量对段落中的每个词元进行预计算,从而显著减少推理时的计算量。尽管这可能带来精度损失,我们提出的新校准技术和训练目标能够恢复性能。结合离线与运行时压缩,对维基百科中 30 亿个词元进行编码仅需 127GB 磁盘空间。实验表明,在五个知识密集型 NLP 任务上,BTR 将最先进模型的推理速度提升至多 4 倍,存储减少超过 100 倍,同时保持 95% 以上的任务性能。
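
The core storage trick can be illustrated in a few lines of NumPy: precomputed token vectors are reduced to 1 bit per dimension and bit-packed. This is only a sketch of the idea; the paper's calibration techniques and training objectives that recover accuracy are not shown.

```python
# Illustrative sketch of 1-bit token representations (in the spirit of BTR,
# not the paper's implementation): binarize precomputed token vectors and
# store them packed, trading accuracy for memory and compute.
import numpy as np

def binarize_tokens(token_reps: np.ndarray) -> np.ndarray:
    """token_reps: [num_tokens, dim] float array of precomputed representations.
    Returns a packed uint8 array with 1 bit per dimension."""
    bits = (token_reps > 0).astype(np.uint8)           # sign-based 1-bit quantization
    return np.packbits(bits, axis=1)                   # dim/8 bytes per token

def unpack(packed: np.ndarray, dim: int) -> np.ndarray:
    """Recover {-1, +1} vectors for downstream (re-)computation."""
    bits = np.unpackbits(packed, axis=1)[:, :dim]
    return bits.astype(np.float32) * 2.0 - 1.0

reps = np.random.randn(1000, 768).astype(np.float32)   # e.g. passage token states
packed = binarize_tokens(reps)
print(reps.nbytes, "->", packed.nbytes, "bytes")        # 32x smaller than float32
approx = unpack(packed, dim=768)
```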

TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series

  • paper_url: http://arxiv.org/abs/2310.01327
  • repo_url: None
  • paper_authors: Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Nicolas Chapados, Alexandre Drouin
  • for: 这篇论文旨在设计多变量概率时间序列预测模型,使模型能够灵活地处理多种任务,包括预测、插值及其组合。
  • methods: 本文基于 copula 理论,为先前提出的基于 transformer 的注意力 copula 模型(TACTiS)提出了简化的目标函数,使分布参数的数量随变量数线性增长,而非阶乘增长。这需要对原始架构进行相应修改,并引入训练课程。
  • results: 实验结果显示,新模型在多个真实世界预测任务上达到最先进的性能,同时保留了先前工作的灵活性,例如无缝处理非对齐和非均匀采样的时间序列。
    Abstract We introduce a new model for multivariate probabilistic time series prediction, designed to flexibly address a range of tasks including forecasting, interpolation, and their combinations. Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS), wherein the number of distributional parameters now scales linearly with the number of variables instead of factorially. The new objective requires the introduction of a training curriculum, which goes hand-in-hand with necessary changes to the original architecture. We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks, while maintaining the flexibility of prior work, such as seamless handling of unaligned and unevenly-sampled time series.
    摘要 我们介绍了一种新的多变量概率时间序列预测模型,可灵活地解决多种任务,包括预测、插值及其组合。基于 copula 理论,我们为先前提出的基于 transformer 的注意力 copula 模型(TACTiS)设计了简化的目标函数,使分布参数的数量随变量数线性增长,而非阶乘增长。这一新目标函数要求对原始架构进行必要的修改,并配合一个训练课程。我们表明,所得模型具有明显更好的训练动态,并在多种真实世界预测任务上达到最先进的性能,同时保持了先前工作的灵活性,如无缝处理非对齐和非均匀采样的时间序列。

FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01467
  • repo_url: None
  • paper_authors: Jingwei Sun, Ziyue Xu, Hongxu Yin, Dong Yang, Daguang Xu, Yiran Chen, Holger R. Roth
  • for: 这个论文是为了解决对大语言模型的调评问题,并且保持隐私和安全性。
  • methods: 这个论文使用了联邦学习(Federated Learning)技术,并且提出了一个名为“Federated Black-box Prompt Tuning”的框架,以解决调评过程中的安全性和隐私问题。
  • results: 实验结果显示,这个框架可以对调评过程中的通信和存储成本进行剧烈的减少,而且可以保持比较好的性能。
    Abstract Pre-trained language models (PLM) have revolutionized the NLP landscape, achieving stellar performances across diverse tasks. These models, while benefiting from vast training data, often require fine-tuning on specific data to cater to distinct downstream tasks. However, this data adaptation process has inherent security and privacy concerns, primarily when leveraging user-generated, device-residing data. Federated learning (FL) provides a solution, allowing collaborative model fine-tuning without centralized data collection. However, applying FL to finetune PLMs is hampered by challenges, including restricted model parameter access, high computational requirements, and communication overheads. This paper introduces Federated Black-box Prompt Tuning (FedBPT), a framework designed to address these challenges. FedBPT does not require the clients to access the model parameters. By focusing on training optimal prompts and utilizing gradient-free optimization methods, FedBPT reduces the number of exchanged variables, boosts communication efficiency, and minimizes computational and storage costs. Experiments highlight the framework's ability to drastically cut communication and memory costs while maintaining competitive performance. Ultimately, FedBPT presents a promising solution for efficient, privacy-preserving fine-tuning of PLM in the age of large language models.
    摘要 预训练语言模型(PLM)已经革新了 NLP 领域,在多种任务上表现出色。这些模型尽管受益于海量训练数据,但针对不同的下游任务仍需在特定数据上进行微调。然而,这个数据适配过程存在安全和隐私隐患,特别是当使用用户生成、驻留在设备上的数据时。联邦学习(FL)提供了一个解决方案,允许在不集中收集数据的情况下协作微调模型。然而,应用 FL 微调 PLM 仍面临一些挑战,包括模型参数访问受限、高计算需求和通信开销。本文介绍了联邦黑盒提示调优(FedBPT)框架,它不要求客户端访问模型参数,而是专注于训练最优提示,并采用无梯度优化方法,从而减少需要交换的变量数量、提高通信效率、降低计算和存储成本。实验表明,FedBPT 可以大幅削减通信和内存成本,同时保持有竞争力的表现。总之,FedBPT 为大语言模型时代高效、保护隐私的 PLM 微调提供了一个有前景的解决方案。
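
A toy sketch of the overall recipe — federated, gradient-free prompt optimization — with an evolution-strategies update standing in for the paper's optimizer; the client scoring functions and hyperparameters below are invented for illustration.

```python
# Minimal sketch of federated, gradient-free prompt tuning in the spirit of
# FedBPT (not the paper's algorithm). Each client scores perturbed prompt
# vectors against its local, frozen black-box model and returns an ES-style
# update; the server only averages low-dimensional prompt updates.
import numpy as np

rng = np.random.default_rng(0)
PROMPT_DIM = 32

def client_update(prompt, local_score, sigma=0.1, n_samples=8):
    """local_score(prompt_vector) -> task score from the frozen black-box PLM."""
    eps = rng.standard_normal((n_samples, PROMPT_DIM))
    scores = np.array([local_score(prompt + sigma * e) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    return (scores[:, None] * eps).mean(0) / sigma     # gradient-free ES estimate

def server_round(prompt, clients, lr=0.05):
    updates = [client_update(prompt, score_fn) for score_fn in clients]
    return prompt + lr * np.mean(updates, axis=0)      # aggregate prompt updates only

# Toy "clients": each prefers prompts close to its own (hidden) optimum.
targets = [rng.standard_normal(PROMPT_DIM) for _ in range(3)]
clients = [lambda p, t=t: -np.linalg.norm(p - t) for t in targets]
prompt = np.zeros(PROMPT_DIM)
for _ in range(50):
    prompt = server_round(prompt, clients)
```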

Avalon’s Game of Thoughts: Battle Against Deception through Recursive Contemplation

  • paper_url: http://arxiv.org/abs/2310.01320
  • repo_url: None
  • paper_authors: Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang
  • for: 这项研究旨在探讨 LLMs 在充满假信息的环境下的能力。
  • methods: 该研究使用复杂的 Avalon 游戏作为测试环境,并引入了一种新的框架——递归思考(ReCon),以提高 LLMs 对假信息的识别和应对能力。
  • results: 实验结果表明,将 ReCon 与不同的 LLMs 结合,可以在 Avalon 游戏中提高 LLMs 识别和应对假信息的能力,且无需额外微调和数据。
    Abstract Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a "Game-of-Thoughts". Inspired by the efficacy of humans' recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information. ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others' mental states, and the second-order involves understanding how others perceive the agent's mental state. After integrating ReCon with different LLMs, extensive experiment results from the Avalon game indicate its efficacy in aiding LLMs to discern and maneuver around deceptive information without extra fine-tuning and data. Finally, we offer a possible explanation for the efficacy of ReCon and explore the current limitations of LLMs in terms of safety, reasoning, speaking style, and format, potentially furnishing insights for subsequent research.
    摘要 近期大语言模型(LLM)的突破使 LLM 作为代理(LLM-as-Agent)领域取得了显著成功。然而,普遍的假设是 LLM 处理的信息都是真实的,忽视了人类社会和 AI 生成内容中普遍存在的欺骗性或误导性信息。这种忽视使 LLM 容易受到恶意操纵,可能导致有害的结果。本研究使用复杂的 Avalon 游戏作为测试环境,探索 LLM 在充满欺骗的环境中的潜力。Avalon 游戏充满假信息,且需要精密的逻辑推理,堪称一场"思维的博弈"。受人类在 Avalon 游戏中递归思考和换位思考能力的启发,我们提出了一个新框架——递归思考(Recursive Contemplation, ReCon),以增强 LLM 识别并对抗欺骗信息的能力。ReCon 结合了构思和精炼两个思考过程:构思思考产生初步的想法和发言,精炼思考则进一步打磨它们。此外,我们分别在这两个过程中引入一阶和二阶视角转换:一阶视角让 LLM 代理推断他人的心理状态,二阶视角则让其理解他人如何看待该代理自身的心理状态。将 ReCon 与不同的 LLM 结合后,在 Avalon 游戏上的大量实验结果表明,它能够帮助 LLM 在无需额外微调和数据的情况下识别并规避欺骗信息。最后,我们对 ReCon 的有效性给出了可能的解释,并探讨了 LLM 在安全性、推理、表达风格和格式方面的当前局限,为后续研究提供启示。
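
A schematic prompting pipeline for the formulation/refinement stages and the two perspective transitions; the prompts and the `llm` callable are illustrative assumptions, not the authors' templates.

```python
# Sketch of a ReCon-style two-stage prompting turn (an illustration, not the
# authors' prompts). `llm` is a hypothetical chat-completion function.
def recon_turn(llm, game_state: str, draft_task: str) -> str:
    # Formulation contemplation: initial private thoughts and a draft speech.
    draft = llm(f"Game state:\n{game_state}\n"
                f"Think step by step about {draft_task}, then draft what to say.")
    # First-order perspective: infer the other players' hidden mental states.
    first_order = llm(f"Given this draft:\n{draft}\n"
                      "For each other player, infer what they likely believe and intend.")
    # Second-order perspective: how will they interpret *my* statement?
    second_order = llm(f"Draft:\n{draft}\nTheir likely beliefs:\n{first_order}\n"
                       "How would each player read my statement? Flag anything that "
                       "exposes my role or is easy to exploit.")
    # Refinement contemplation: polish the speech using both analyses.
    return llm("Revise the draft so it avoids the risks below and stays consistent.\n"
               f"Draft:\n{draft}\nRisks:\n{second_order}\nReturn only the final speech.")
```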

On the Generalization of Training-based ChatGPT Detection Methods

  • paper_url: http://arxiv.org/abs/2310.01307
  • repo_url: https://github.com/hannxu123/hcvar
  • paper_authors: Han Xu, Jie Ren, Pengfei He, Shenglai Zeng, Yingqian Cui, Amy Liu, Hui Liu, Jiliang Tang
  • for: 本研究旨在 investigate ChatGPT 生成文本的泛化性能,以帮助开发更好的 ChatGPT 检测方法。
  • methods: 本研究使用了一系列的方法来检测 ChatGPT 生成文本,包括类型分类模型的训练和测试。
  • results: 研究发现,现有的检测方法可能会受到分布变化的影响,导致其在测试时效果不佳。此外,研究还发现了一些有用的发现,可以为未来的方法和数据采集策略提供指导。
    Abstract ChatGPT is one of the most popular language models which achieve amazing performance on various natural language tasks. Consequently, there is also an urgent need to detect the texts generated ChatGPT from human written. One of the extensively studied methods trains classification models to distinguish both. However, existing studies also demonstrate that the trained models may suffer from distribution shifts (during test), i.e., they are ineffective to predict the generated texts from unseen language tasks or topics. In this work, we aim to have a comprehensive investigation on these methods' generalization behaviors under distribution shift caused by a wide range of factors, including prompts, text lengths, topics, and language tasks. To achieve this goal, we first collect a new dataset with human and ChatGPT texts, and then we conduct extensive studies on the collected dataset. Our studies unveil insightful findings which provide guidance for developing future methodologies or data collection strategies for ChatGPT detection.
    摘要 ChatGPT 是最受欢迎的语言模型之一,在各种自然语言任务上表现出色。随之而来的,是区分 ChatGPT 生成文本与人类撰写文本的迫切需求。被广泛研究的一类方法是训练分类模型来区分两者。然而,现有研究也表明,这些训练出的模型在测试时可能受到分布偏移的影响,即它们难以预测来自未见过的语言任务或主题的生成文本。在本工作中,我们旨在全面研究这类方法在由提示、文本长度、主题和语言任务等多种因素引起的分布偏移下的泛化行为。为此,我们首先收集了一个包含人类与 ChatGPT 文本的新数据集,然后在该数据集上进行了广泛的研究。我们的研究揭示了许多有价值的发现,为开发未来的 ChatGPT 检测方法或数据收集策略提供了指导。

Generating Explanations in Medical Question-Answering by Expectation Maximization Inference over Evidence

  • paper_url: http://arxiv.org/abs/2310.01299
  • repo_url: None
  • paper_authors: Wei Sun, Mingxiao Li, Damien Sileo, Jesse Davis, Marie-Francine Moens
  • for: 帮助医疗工作人员查找答案。(to assist healthcare workers in finding answers.)
  • methods: 提出了一种新的方法,使用医学书籍中的知识来提高医疗解释的质量。(proposed a new approach that uses medical knowledge from textbooks to improve the quality of explanations.)
  • results: 实验结果表明,该方法可以效果地使用文本证据进行推理,并与当前状态Esp报表示的模型相比,实现了明显的提升。(experimental results demonstrate that the approach can effectively use textual evidence for reasoning, and achieve a significant improvement compared to current state-of-the-art models.)
    Abstract Medical Question Answering~(medical QA) systems play an essential role in assisting healthcare workers in finding answers to their questions. However, it is not sufficient to merely provide answers by medical QA systems because users might want explanations, that is, more analytic statements in natural language that describe the elements and context that support the answer. To do so, we propose a novel approach for generating natural language explanations for answers predicted by medical QA systems. As high-quality medical explanations require additional medical knowledge, so that our system extract knowledge from medical textbooks to enhance the quality of explanations during the explanation generation process. Concretely, we designed an expectation-maximization approach that makes inferences about the evidence found in these texts, offering an efficient way to focus attention on lengthy evidence passages. Experimental results, conducted on two datasets MQAE-diag and MQAE, demonstrate the effectiveness of our framework for reasoning with textual evidence. Our approach outperforms state-of-the-art models, achieving a significant improvement of \textbf{6.86} and \textbf{9.43} percentage points on the Rouge-1 score; \textbf{8.23} and \textbf{7.82} percentage points on the Bleu-4 score on the respective datasets.
    摘要 医疗问答系统(医疗QA)在帮助医疗工作者查询问题时发挥着重要作用。然而,仅提供答案并不足以满足用户的需求,因为用户可能需要更多的解释,即更多的自然语言中的批判性陈述,以描述答案的元素和上下文。为此,我们提出了一种新的方法,用于生成医疗问答系统预测的答案中的自然语言解释。由于高质量的医疗解释需要额外的医学知识,因此我们的系统从医学书籍中提取了更多的医学知识,以提高解释生成过程中的解释质量。我们采用了一种预期-最大化方法,可以快速地扫描证据文本中的证据,以提高解释生成的效率。实验结果,在两个数据集MQAE-diag和MQAE上进行了测试,显示了我们的框架在处理文本证据时的有效性。我们的方法在比较州的模型上表现出色,在 Rouge-1 分数上提高了 \textbf{6.86} 和 \textbf{9.43} 个百分点,在 Bleu-4 分数上提高了 \textbf{8.23} 和 \textbf{7.82} 个百分点。

Co-audit: tools to help humans double-check AI-generated content

  • paper_url: http://arxiv.org/abs/2310.01297
  • repo_url: None
  • paper_authors: Andrew D. Gordon, Carina Negreanu, José Cambronero, Rasika Chakravarthy, Ian Drosos, Hao Fang, Bhaskar Mitra, Hannah Richardson, Advait Sarkar, Stephanie Simmons, Jack Williams, Ben Zorn
  • for: This paper is written to emphasize the importance of co-audit tools for generative AI applications where quality is crucial and errors have significant consequences, specifically in spreadsheet computations.
  • methods: The paper proposes a preliminary list of principles for co-audit tools and outlines research challenges for developing effective co-audit experiences.
  • results: The paper highlights the need for tool-assisted experiences to help users double-check AI-generated content for quality and correctness, as generative models generate more complex output that is harder to audit.
    Abstract Users are increasingly being warned to check AI-generated content for correctness. Still, as LLMs (and other generative models) generate more complex output, such as summaries, tables, or code, it becomes harder for the user to audit or evaluate the output for quality or correctness. Hence, we are seeing the emergence of tool-assisted experiences to help the user double-check a piece of AI-generated content. We refer to these as co-audit tools. Co-audit tools complement prompt engineering techniques: one helps the user construct the input prompt, while the other helps them check the output response. As a specific example, this paper describes recent research on co-audit tools for spreadsheet computations powered by generative models. We explain why co-audit experiences are essential for any application of generative AI where quality is important and errors are consequential (as is common in spreadsheet computations). We propose a preliminary list of principles for co-audit, and outline research challenges.
    摘要 用户正越来越多地被提醒要检查 AI 生成内容的正确性。然而,随着 LLMs(以及其他生成模型)生成的输出越来越复杂,如摘要、表格或代码,用户越来越难以审核或评估输出的质量与正确性。因此,我们开始看到工具辅助体验的出现,以帮助用户对 AI 生成内容进行复核。我们称这些工具为"协同审核(co-audit)工具"。协同审核工具与提示工程技术互补:后者帮助用户构建输入提示,前者帮助用户检查输出响应。作为一个具体的例子,本文描述了近期关于由生成模型驱动的电子表格计算的协同审核工具的研究。我们解释了为什么在质量至关重要、错误后果严重的生成式 AI 应用(电子表格计算就是典型)中,协同审核体验是必不可少的,并提出了一组初步的协同审核原则和研究挑战。

Knowledge Crosswords: Geometric Reasoning over Structured Knowledge with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01290
  • repo_url: https://github.com/wenwen-d/knowledgecrosswords
  • paper_authors: Wenxuan Ding, Shangbin Feng, Yuhan Liu, Zhaoxuan Tan, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov
  • for: 这paper的目的是检验大型语言模型(LLMs)是否可以在知识强大的情况下进行geometry reasoning,以及 LLMS 是否可以在不同领域和各种不同的知识环境中进行这种类型的reasoning。
  • methods: 该论文提出了一个名为 Knowledge Crosswords 的多空白问答数据集,以及两种新的提示方法:Staged Prompting 和 Verify-All,旨在增强 LLMs 在结构化知识环境中进行几何推理的能力。
  • results: 研究结果表明,虽然基eline方法在 easier 问题上表现良好,但在 harder 问题上却表现不佳。而提出的 Verify-All 方法在这些 harder 问题上表现出色,与其他方法相比差异较大。此外,研究还发现 LLMS 在 geometry reasoning 方面仍然存在一些不足,例如答案的顺序、特定的结构模式等可能会导致 LLMS 进行错误的回答。
    Abstract Large language models (LLMs) are widely adopted in knowledge-intensive tasks and have achieved impressive performance thanks to their knowledge abilities. While LLMs have demonstrated outstanding performance on atomic or linear (multi-hop) QA tasks, whether they can reason in knowledge-rich scenarios with interweaving constraints remains an underexplored problem. In this work, we propose geometric reasoning over structured knowledge, where pieces of knowledge are connected in a graph structure and models need to fill in the missing information. Such geometric knowledge reasoning would require the ability to handle structured knowledge, reason with uncertainty, verify facts, and backtrack when an error occurs. We propose Knowledge Crosswords, a multi-blank QA dataset where each problem consists of a natural language question representing the geometric constraints of an incomplete entity network, where LLMs are tasked with working out the missing entities while meeting all factual constraints. Knowledge Crosswords contains 2,101 individual problems, covering various knowledge domains and further divided into three difficulty levels. We conduct extensive experiments to evaluate existing LLM prompting approaches on the Knowledge Crosswords benchmark. We additionally propose two new approaches, Staged Prompting and Verify-All, to augment LLMs' ability to backtrack and verify structured constraints. Our results demonstrate that while baseline approaches perform well on easier problems but struggle with hard ones, our proposed Verify-All outperforms other methods by a large margin and is more robust with hard problems. Further analysis reveals that LLMs' ability of geometric reasoning over structured knowledge is still far from robust or perfect, susceptible to confounders such as the order of options, certain structural patterns, assumption of existence of correct answer, and more.
    摘要 大语言模型(LLM)因其知识能力而被广泛用于知识密集型任务,并取得了出色的表现。然而,尽管 LLM 在原子化或线性(多跳)问答任务上表现出色,它们能否在约束相互交织的知识丰富场景中进行推理,仍是一个尚未充分探索的问题。在本工作中,我们提出了在结构化知识上的几何推理:知识单元以图结构相互连接,模型需要填补其中缺失的信息。这种几何知识推理要求模型能够处理结构化知识、在不确定性下推理、验证事实,并在出错时回溯。我们提出了 Knowledge Crosswords(知识填字),这是一个多空白问答数据集,每个问题以自然语言描述一个不完整实体网络的几何约束,LLM 需要在满足所有事实约束的前提下推断出缺失的实体。Knowledge Crosswords 包含 2,101 个问题,涵盖多个知识领域,并划分为三个难度等级。我们进行了广泛的实验,评估现有 LLM 提示方法在该基准上的表现,并提出了两个新方法——Staged Prompting 和 Verify-All,以增强 LLM 回溯和验证结构化约束的能力。结果显示,基线方法在较容易的问题上表现良好,但在困难问题上表现不佳;而我们提出的 Verify-All 以明显优势超越其他方法,并且在困难问题上更加稳健。进一步分析表明,LLM 在结构化知识上的几何推理能力距离稳健或完善仍有差距,容易受到选项顺序、特定结构模式、"存在正确答案"的假设等干扰因素的影响。
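
The Verify-All idea — check every factual constraint after each tentative fill and backtrack on failure — can be sketched as a small constraint-satisfaction loop; here `verify_all` stands in for an LLM-based fact checker and the candidates are toy values.

```python
# Illustrative backtracking-with-verification loop for multi-blank, constraint-
# style QA (in the spirit of Verify-All; not the authors' prompting code).
from typing import Callable, Dict, List, Optional

def solve_blanks(blanks: List[str],
                 candidates: Dict[str, List[str]],
                 verify_all: Callable[[Dict[str, str]], bool],
                 partial: Optional[Dict[str, str]] = None):
    """Fill blanks one by one; after each candidate, verify *all* constraints
    touching the current partial assignment and backtrack on failure."""
    partial = dict(partial or {})
    if len(partial) == len(blanks):
        return partial                                   # every blank filled and verified
    blank = blanks[len(partial)]
    for cand in candidates[blank]:
        attempt = {**partial, blank: cand}
        if not verify_all(attempt):                      # e.g. an LLM asked to fact-check
            continue                                     # prune and try the next candidate
        result = solve_blanks(blanks, candidates, verify_all, attempt)
        if result is not None:
            return result
    return None                                          # trigger backtracking upstream

# Toy example: two blanks with one consistent assignment.
ok = lambda a: a.get("X") != a.get("Y")                  # stand-in for factual checks
print(solve_blanks(["X", "Y"], {"X": ["Turing"], "Y": ["Turing", "Hopper"]}, ok))
```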

Grasping AI: experiential exercises for designers

  • paper_url: http://arxiv.org/abs/2310.01282
  • repo_url: None
  • paper_authors: Dave Murray-Rust, Maria Luce Lupetti, Iohanna Nicenboim, Wouter van der Hoog
  • for: 本研究旨在帮助设计师在AI技术的应用中更好地理解人类互动的问题和社会意义,以便更负责任地设计AI系统。
  • methods: 本研究在交互设计课程中引入了九种"AI 练习",借鉴超越人类中心的设计(more-than-human design)、负责任 AI 和推测性演绎,以创造围绕 AI 交互设计的体验式学习。
  • results: 研究发现,围绕隐喻(metaphor)和演绎的练习能让学生更直观地理解 AI 系统的训练与学习、隐私与同意、自主与能动性等问题,从而帮助学生在设计过程和成果中更负责任地设计 AI 系统。
    Abstract Artificial intelligence (AI) and machine learning (ML) are increasingly integrated into the functioning of physical and digital products, creating unprecedented opportunities for interaction and functionality. However, there is a challenge for designers to ideate within this creative landscape, balancing the possibilities of technology with human interactional concerns. We investigate techniques for exploring and reflecting on the interactional affordances, the unique relational possibilities, and the wider social implications of AI systems. We introduced into an interaction design course (n=100) nine 'AI exercises' that draw on more than human design, responsible AI, and speculative enactment to create experiential engagements around AI interaction design. We find that exercises around metaphors and enactments make questions of training and learning, privacy and consent, autonomy and agency more tangible, and thereby help students be more reflective and responsible on how to design with AI and its complex properties in both their design process and outcomes.
    摘要 人工智能(AI)和机器学习(ML)正日益融入物理和数字产品之中,创造了前所未有的交互和功能。然而,设计师面临挑战:如何在这一创意空间中构思设计,在技术可能性与人类交互关注之间取得平衡。我们研究了探索和反思 AI 系统的交互可供性、独特的关系可能性及其更广泛社会影响的技巧。我们在一门交互设计课程(n=100)中引入了九种"AI 练习",借鉴超越人类中心的设计、负责任 AI 和推测性演绎,以创造围绕 AI 交互设计的体验式参与。我们发现,围绕隐喻和演绎的练习使训练与学习、隐私与同意、自主与能动性等问题变得更加具体,从而帮助学生在设计过程和成果中更具反思性和责任感地设计具有复杂特性的 AI。

A Comparison of Mesh-Free Differentiable Programming and Data-Driven Strategies for Optimal Control under PDE Constraints

  • paper_url: http://arxiv.org/abs/2310.02286
  • repo_url: None
  • paper_authors: Roussel Desmond Nzoyem, David A. W. Barton, Tom Deakin
  • for: 这篇论文主要是关于Optimal Control under Partial Differential Equations (PDE) constraints的研究,它们是受到深度学习和相关的自动梯度库的影响,而这些技术正在快速发展。
  • methods: 这篇论文使用了Direct-Adjoint Looping (DAL)、Physics-Informed Neural Networks (PINNs)和Differentiable Programming (DP)等方法进行比较,其中DP在Laplace和Navier-Stokes方程下表现最佳,而DAL和PINNs在某些情况下则表现不佳。
  • results: 这篇论文的研究结果表明,DP在解析PDE问题时可以生成最精准的梯度,并且在DAL和PINNs失败时表现出色。此外,论文还提供了一个详细的benchmark,以帮助Optimal Control启用者更好地选择合适的方法。
    Abstract The field of Optimal Control under Partial Differential Equations (PDE) constraints is rapidly changing under the influence of Deep Learning and the accompanying automatic differentiation libraries. Novel techniques like Physics-Informed Neural Networks (PINNs) and Differentiable Programming (DP) are to be contrasted with established numerical schemes like Direct-Adjoint Looping (DAL). We present a comprehensive comparison of DAL, PINN, and DP using a general-purpose mesh-free differentiable PDE solver based on Radial Basis Functions. Under Laplace and Navier-Stokes equations, we found DP to be extremely effective as it produces the most accurate gradients; thriving even when DAL fails and PINNs struggle. Additionally, we provide a detailed benchmark highlighting the limited conditions under which any of those methods can be efficiently used. Our work provides a guide to Optimal Control practitioners and connects them further to the Deep Learning community.
    摘要 偏微分方程(PDE)约束下的最优控制领域,正在深度学习及其配套自动微分库的影响下快速变化。物理信息神经网络(PINNs)和可微编程(DP)等新技术,需要与直接-伴随循环(DAL)等成熟数值方法进行对比。我们基于径向基函数构建了一个通用的无网格可微 PDE 求解器,对 DAL、PINN 和 DP 进行了全面比较。在拉普拉斯方程和 Navier-Stokes 方程下,我们发现 DP 极为有效:它生成的梯度最为精确,即使在 DAL 失败、PINNs 表现不佳时依然表现出色。此外,我们提供了详细的基准测试,指出了这些方法各自能被高效使用的有限条件。我们的工作为最优控制从业者提供了指南,并使其与深度学习社区建立更紧密的联系。
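
As a toy illustration of the DP approach (unrelated to the paper's RBF solver), the sketch below differentiates through an explicit finite-difference heat-equation rollout with PyTorch autograd to optimize a source control; the grid size, step sizes, and target profile are arbitrary choices.

```python
# Toy differentiable-programming (DP) example for PDE-constrained control:
# differentiate through an explicit finite-difference heat-equation rollout to
# optimize a source term so the final state matches a target profile.
import torch

N, steps, dt, dx = 64, 200, 1e-4, 1.0 / 64

def rollout(source):
    u = torch.zeros(N)
    for _ in range(steps):
        lap = (torch.roll(u, 1) - 2 * u + torch.roll(u, -1)) / dx**2  # periodic Laplacian
        u = u + dt * (lap + source)             # explicit Euler step, kept differentiable
    return u

x = torch.linspace(0.0, 1.0, N)
target = torch.exp(-100 * (x - 0.5) ** 2)       # desired final temperature profile
source = torch.zeros(N, requires_grad=True)     # the control variable
opt = torch.optim.Adam([source], lr=0.5)

for it in range(200):
    opt.zero_grad()
    loss = torch.mean((rollout(source) - target) ** 2)
    loss.backward()                             # exact gradient via autodiff, no adjoint code
    opt.step()
```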

A Unified View on Neural Message Passing with Opinion Dynamics for Social Networks

  • paper_url: http://arxiv.org/abs/2310.01272
  • repo_url: None
  • paper_authors: Outongyi Lv, Bingxin Zhou, Jing Wang, Xiang Xiao, Weishu Zhao, Lirong Zheng
  • for: 这种研究旨在帮助分析和理解社交网络中的动态系统,包括社交actor之间的 opinio exchanges和信息传递。
  • methods: 该研究使用 neural message passing 和 sociometry 概念相结合,提出了 ODNet 消息传递方案,并调整了 bounded confidence 和影响 weights。
  • results: 该研究表明 ODNet 可以提高不同类型的图像表示学性能,并解决压缩问题。此外,该方法可以简单地使用社交网络图中Entity之间的交互频率来简化社交网络图。
    Abstract Social networks represent a common form of interconnected data frequently depicted as graphs within the domain of deep learning-based inference. These communities inherently form dynamic systems, achieving stability through continuous internal communications and opinion exchanges among social actors along their social ties. In contrast, neural message passing in deep learning provides a clear and intuitive mathematical framework for understanding information propagation and aggregation among connected nodes in graphs. Node representations are dynamically updated by considering both the connectivity and status of neighboring nodes. This research harmonizes concepts from sociometry and neural message passing to analyze and infer the behavior of dynamic systems. Drawing inspiration from opinion dynamics in sociology, we propose ODNet, a novel message passing scheme incorporating bounded confidence, to refine the influence weight of local nodes for message propagation. We adjust the similarity cutoffs of bounded confidence and influence weights of ODNet and define opinion exchange rules that align with the characteristics of social network graphs. We show that ODNet enhances prediction performance across various graph types and alleviates oversmoothing issues. Furthermore, our approach surpasses conventional baselines in graph representation learning and proves its practical significance in analyzing real-world co-occurrence networks of metabolic genes. Remarkably, our method simplifies complex social network graphs solely by leveraging knowledge of interaction frequencies among entities within the system. It accurately identifies internal communities and the roles of genes in different metabolic pathways, including opinion leaders, bridge communicators, and isolators.
    摘要 社交网络是一种常见的互联数据形式,在基于深度学习的推理中通常被表示为图。这些社区天然地构成动态系统,通过社交行动者沿社交关系进行持续的内部沟通和意见交换来达到稳定。与此相应,深度学习中的神经消息传递为理解图中相连节点之间的信息传播与聚合提供了清晰直观的数学框架:每个节点的表示会在综合考虑邻居节点的连接与状态的基础上动态更新。本研究将社会计量学与神经消息传递的概念结合起来,用于分析和推断动态系统的行为。受社会学中意见动力学的启发,我们提出了 ODNet——一种引入有界信任(bounded confidence)的新消息传递方案,用于细化局部节点在消息传播中的影响权重。我们调整了有界信任的相似度阈值和 ODNet 的影响权重,并定义了与社交网络图特性相符的意见交换规则。实验表明,ODNet 能在多种类型的图上提升预测性能,并缓解过平滑问题。此外,我们的方法在图表示学习上超越了传统基线,并在分析真实世界的代谢基因共现网络中证明了其实用价值。值得注意的是,我们的方法仅凭系统内实体之间的交互频率,就能简化复杂的社交网络图,并准确识别内部社区以及基因在不同代谢通路中的角色,包括意见领袖、桥梁沟通者和孤立者。
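
A rough PyTorch sketch of a bounded-confidence message-passing step in this spirit; the cutoff, the exponential influence weighting, and the self/neighbor mixing ratio are illustrative choices, not the paper's exact update rule.

```python
# Sketch of a bounded-confidence message-passing update (ODNet-flavored,
# not the authors' layer). Neighbors whose states differ too much from a
# node's state are ignored, echoing opinion-dynamics models.
import torch

def bounded_confidence_step(x, edge_index, epsilon=1.0):
    """x: [num_nodes, dim] node states; edge_index: [2, num_edges] (src, dst)."""
    src, dst = edge_index
    diff = torch.norm(x[src] - x[dst], dim=1)
    keep = diff <= epsilon                          # bounded-confidence cutoff
    src, dst, diff = src[keep], dst[keep], diff[keep]
    weight = torch.exp(-diff)                       # influence shrinks with disagreement
    num = torch.zeros_like(x).index_add_(0, dst, weight.unsqueeze(1) * x[src])
    den = torch.zeros(x.size(0)).index_add_(0, dst, weight)
    has_nb = den > 0
    agg = torch.where(has_nb.unsqueeze(1), num / den.clamp(min=1e-8).unsqueeze(1), x)
    return 0.5 * x + 0.5 * agg                      # blend own opinion with neighbor consensus

x = torch.randn(5, 8)
edges = torch.tensor([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]])
x_new = bounded_confidence_step(x, edges)
```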

Cooperative Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01267
  • repo_url: https://github.com/zhangxiaochen95/uav_bs_ctrl
  • paper_authors: Ben Finkelshtein, Xingyue Huang, Michael Bronstein, İsmail İlkan Ceylan
  • for: 这个论文的目的是提出一种新的图 neural network 训练框架,使每个节点可以选择不同的策略来处理信息。
  • methods: 这个论文使用了一种新的消息传递方案,其中每个节点可以选择是 ‘听’, ‘广播’, ‘听并广播’,或者 ‘隔离’。这种方法可以视为标准的消息传递模式的特例,在每层的节点状态更新中,每个节点都会对所有邻居进行广播和接收消息。
  • results: 论文提供了一种新的图 neural network 训练方法,可以更好地利用图 topology 进行学习。论文还提供了一个理论分析和广泛的实验分析,证明了新的消息传递方案的有效性。
    Abstract Graph neural networks are popular architectures for graph machine learning, based on iterative computation of node representations of an input graph through a series of invariant transformations. A large class of graph neural networks follow a standard message-passing paradigm: at every layer, each node state is updated based on an aggregate of messages from its neighborhood. In this work, we propose a novel framework for training graph neural networks, where every node is viewed as a player that can choose to either 'listen', 'broadcast', 'listen and broadcast', or to 'isolate'. The standard message propagation scheme can then be viewed as a special case of this framework where every node 'listens and broadcasts' to all neighbors. Our approach offers a more flexible and dynamic message-passing paradigm, where each node can determine its own strategy based on their state, effectively exploring the graph topology while learning. We provide a theoretical analysis of the new message-passing scheme which is further supported by an extensive empirical analysis on a synthetic dataset and on real-world datasets.
    摘要 图神经网络是一类流行的图机器学习架构,通过一系列不变(invariant)变换迭代计算输入图的节点表示。大量图神经网络遵循标准的消息传递范式:在每一层,每个节点的状态基于其邻域消息的聚合进行更新。在这项工作中,我们提出了一种新的图神经网络训练框架,其中每个节点被视为一个玩家,可以选择"倾听"、"广播"、"既倾听又广播"或"隔离"四种策略。标准的消息传递方案可以视为该框架中每个节点都对所有邻居"既倾听又广播"的特例。我们的方法提供了更灵活、更动态的消息传递范式:每个节点可以依据自身状态决定自己的策略,在学习的同时有效地探索图的拓扑结构。我们对新的消息传递方案进行了理论分析,并在一个合成数据集和多个真实世界数据集上进行了大量实验加以支持。
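
The per-node action choice can be sketched as a learned soft gate on top of mean aggregation; the dense adjacency, sigmoid gating, and update MLP below are illustrative simplifications of the idea, not the paper's architecture.

```python
# Minimal sketch of per-node "listen"/"broadcast" choices on top of a standard
# mean-aggregation GNN layer (an illustration of the idea, not the paper's model).
import torch
import torch.nn as nn

class CooperativeLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, 2)        # logits for [listen, broadcast], per node
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, x, adj):
        """x: [n, dim]; adj: [n, n] dense 0/1 adjacency (small graphs only)."""
        probs = torch.sigmoid(self.gate(x))             # soft decisions, trainable end-to-end
        listen, broadcast = probs[:, 0:1], probs[:, 1:2]
        # A node only receives from neighbors that broadcast, and only if it listens.
        msgs = adj @ (broadcast * x)
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        agg = listen * (msgs / deg)
        return torch.relu(self.update(torch.cat([x, agg], dim=-1)))

layer = CooperativeLayer(dim=16)
x, adj = torch.randn(6, 16), (torch.rand(6, 6) > 0.5).float()
out = layer(x, adj)
```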

SPELL: Semantic Prompt Evolution based on a LLM

  • paper_url: http://arxiv.org/abs/2310.01260
  • repo_url: None
  • paper_authors: Yujian Betterest Li, Kai Wu
  • for: 提高训练过 neural network 模型的性能
  • methods: 利用大语言模型(LLMs)来自动优化提示语
  • results: 实验结果表明,SPELL 确实能够快速改进提示。
    Abstract Prompt engineering is a new paradigm for enhancing the performance of trained neural network models. For optimizing text-style prompts, existing methods usually individually operate small portions of a text step by step, which either breaks the fluency or could not globally adjust a prompt. Since large language models (LLMs) have powerful ability of generating coherent texts token by token, can we utilize LLMs for improving prompts? Based on this motivation, in this paper, considering a trained LLM as a text generator, we attempt to design a black-box evolution algorithm for automatically optimizing texts, namely SPELL (Semantic Prompt Evolution based on a LLM). The proposed method is evaluated with different LLMs and evolution parameters in different text tasks. Experimental results show that SPELL could rapidly improve the prompts indeed. We further explore the evolution process and discuss on the limitations, potential possibilities and future work.
    摘要 提示工程(Prompt engineering)是一种提升已训练神经网络模型性能的新范式。为优化文本形式的提示,现有方法通常逐步地单独操作文本的小片段,这要么破坏流畅性,要么无法对提示进行全局调整。既然大语言模型(LLMs)具有逐词生成连贯文本的强大能力,我们能否利用 LLMs 来改进提示?基于这一动机,本文将训练好的 LLM 视为文本生成器,尝试设计一种自动优化文本的黑盒演化算法,即 SPELL(基于 LLM 的语义提示演化)。我们在不同的文本任务中使用不同的 LLM 和演化参数对该方法进行了评估。实验结果表明,SPELL 确实能够快速改进提示。我们进一步分析了演化过程,并讨论其局限、潜在可能性和未来工作。
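
A compact sketch of the evolution loop with the LLM acting as the variation operator; the selection scheme, population size, and prompt-combination instruction are assumptions made for illustration, not the authors' algorithm.

```python
# Sketch of an LLM-driven prompt-evolution loop in the spirit of SPELL.
# `llm` rewrites prompts coherently; `fitness` scores a prompt on a small
# validation set. Both are hypothetical stand-ins.
import random

def evolve_prompts(seed_prompts, llm, fitness, generations=10, population=8):
    pop = list(seed_prompts)
    for _ in range(generations):
        # Selection: keep the best-scoring prompts.
        pop = sorted(pop, key=fitness, reverse=True)[: population // 2]
        # Variation: ask the LLM to produce fluent offspring from two parents,
        # instead of splicing raw text fragments (which would break fluency).
        children = []
        while len(pop) + len(children) < population:
            a, b = random.sample(pop, 2) if len(pop) > 1 else (pop[0], pop[0])
            child = llm("Combine the strengths of these two task prompts into one "
                        f"concise, fluent prompt:\n1) {a}\n2) {b}")
            children.append(child)
        pop.extend(children)
    return max(pop, key=fitness)
```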

Faster and Accurate Neural Networks with Semantic Inference

  • paper_url: http://arxiv.org/abs/2310.01259
  • repo_url: None
  • paper_authors: Sazzad Sayyed, Jonathan Ashdown, Francesco Restuccia
  • for: 这篇论文旨在降低深度神经网络(DNN)的计算负担,同时尽量不损失性能。
  • methods: 论文提出了 Semantic Inference(SINF)框架,先用一个小型附加分类器快速确定输入对象所属的语义簇,再执行对应的子图进行预测;此外还提出了新的 Discriminative Capability Score(DCS)方法,可独立应用于任何 DNN。
  • results: 实验结果表明,SINF 能在几乎不损失精度的情况下减少 VGG16、VGG19 和 ResNet50 的推理时间;DCS 相比现有的判别分数在这些网络上取得更高的准确率;当用作剪枝标准时,DCS 可在减少 5.82% 参数量的同时获得最高 8.13% 的准确率提升。
    Abstract Deep neural networks (DNN) usually come with a significant computational burden. While approaches such as structured pruning and mobile-specific DNNs have been proposed, they incur drastic accuracy loss. In this paper we leverage the intrinsic redundancy in latent representations to reduce the computational load with limited loss in performance. We show that semantically similar inputs share many filters, especially in the earlier layers. Thus, semantically similar classes can be clustered to create cluster-specific subgraphs. To this end, we propose a new framework called Semantic Inference (SINF). In short, SINF (i) identifies the semantic cluster the object belongs to using a small additional classifier and (ii) executes the subgraph extracted from the base DNN related to that semantic cluster for inference. To extract each cluster-specific subgraph, we propose a new approach named Discriminative Capability Score (DCS) that finds the subgraph with the capability to discriminate among the members of a specific semantic cluster. DCS is independent from SINF and can be applied to any DNN. We benchmark the performance of DCS on the VGG16, VGG19, and ResNet50 DNNs trained on the CIFAR100 dataset against 6 state-of-the-art pruning approaches. Our results show that (i) SINF reduces the inference time of VGG19, VGG16, and ResNet50 respectively by up to 35%, 29% and 15% with only 0.17%, 3.75%, and 6.75% accuracy loss (ii) DCS achieves respectively up to 3.65%, 4.25%, and 2.36% better accuracy with VGG16, VGG19, and ResNet50 with respect to existing discriminative scores (iii) when used as a pruning criterion, DCS achieves up to 8.13% accuracy gain with 5.82% less parameters than the existing state of the art work published at ICLR 2023 (iv) when considering per-cluster accuracy, SINF performs on average 5.73%, 8.38% and 6.36% better than the base VGG16, VGG19, and ResNet50.
    摘要 深度神经网络(DNN)通常带来巨大的计算负担。虽然已有结构化剪枝、面向移动设备的专用 DNN 等方法,但它们往往导致明显的精度损失。在这篇论文中,我们利用潜在表示中固有的冗余来降低计算负担,同时将性能损失控制在有限范围内。我们发现语义相似的输入共享许多滤波器,尤其是在较早的层中;因此,可以将语义相似的类别聚类,构建各自的簇专属子图。为此,我们提出了一个新的框架 Semantic Inference(SINF)。简而言之,SINF 包括两个步骤:(i)使用一个小型附加分类器确定输入对象所属的语义簇;(ii)执行从基础 DNN 中提取的、与该语义簇对应的子图来完成推理。为了提取每个簇专属子图,我们提出了一种新方法 Discriminative Capability Score(DCS),用于寻找能够区分特定语义簇内成员的子图;DCS 独立于 SINF,可应用于任何 DNN。我们在 CIFAR100 数据集上,针对 VGG16、VGG19 和 ResNet50,将 DCS 与 6 种最先进的剪枝方法进行了比较。结果表明:(i)SINF 分别将 VGG19、VGG16 和 ResNet50 的推理时间减少最多 35%、29% 和 15%,准确率损失仅为 0.17%、3.75% 和 6.75%;(ii)相比现有的判别分数,DCS 在 VGG16、VGG19 和 ResNet50 上分别取得最多 3.65%、4.25% 和 2.36% 的准确率提升;(iii)作为剪枝标准,DCS 相比发表于 ICLR 2023 的最新工作,以少 5.82% 的参数获得最高 8.13% 的准确率提升;(iv)就各簇准确率而言,SINF 比基础的 VGG16、VGG19 和 ResNet50 平均分别高出 5.73%、8.38% 和 6.36%。

Pre-training Contextual Location Embeddings in Personal Trajectories via Efficient Hierarchical Location Representations

  • paper_url: http://arxiv.org/abs/2310.01252
  • repo_url: None
  • paper_authors: Chung Park, Taesan Kim, Junui Hong, Minsung Choi, Jaegul Choo
  • for: 本研究旨在提高Location Based Services(LBS)中 Location Embedding 的效能,解决实际应用中模型大量 Location 的问题。
  • methods: 我们提出了 Geo-Tokenizer,它可以高效地减少要训练的 Location 数量,通过表示一个 Location 为多个精度不同的 Grid 的组合。此外,我们还提出了 Hierarchical Auto-regressive Location Model 对象,用于有效地训练 Geo-Tokenizer 中的 decomposed Location。
  • results: 我们在两个实际用户轨迹数据集上进行了实验,结果表明,我们的模型可以在减少模型参数的情况下,显著提高下游任务的性能。
    Abstract Pre-training the embedding of a location generated from human mobility data has become a popular method for location based services. In practice, modeling the location embedding is too expensive, due to the large number of locations to be trained in situations with fine-grained resolution or extensive target regions. Previous studies have handled less than ten thousand distinct locations, which is insufficient in the real-world applications. To tackle this problem, we propose a Geo-Tokenizer, designed to efficiently reduce the number of locations to be trained by representing a location as a combination of several grids at different scales. In the Geo-Tokenizer, a grid at a larger scale shares the common set of grids at smaller scales, which is a key factor in reducing the size of the location vocabulary. The sequences of locations preprocessed with the Geo-Tokenizer are utilized by a causal location embedding model to capture the temporal dependencies of locations. This model dynamically calculates the embedding vector of a target location, which varies depending on its trajectory. In addition, to efficiently pre-train the location embedding model, we propose the Hierarchical Auto-regressive Location Model objective to effectively train decomposed locations in the Geo-Tokenizer. We conducted experiments on two real-world user trajectory datasets using our pre-trained location model. The experimental results show that our model significantly improves the performance of downstream tasks with fewer model parameters compared to existing location embedding methods.
    摘要 对从人类移动数据中得到的位置进行嵌入预训练,已成为基于位置服务(LBS)的一种流行方法。但在实践中,当分辨率很细或目标区域很大时,需要训练的位置数量极多,位置嵌入的建模成本过高。以往的研究处理的独立位置不足一万个,难以满足真实应用的需要。为了解决这个问题,我们提出了 Geo-Tokenizer,通过将一个位置表示为多个不同尺度网格的组合,有效地减少需要训练的位置数量。在 Geo-Tokenizer 中,较大尺度的网格与较小尺度的网格共享同一组公共网格,这是缩小位置词表规模的关键因素。经 Geo-Tokenizer 预处理后的位置序列被输入到一个因果位置嵌入模型中,以捕捉位置之间的时间依赖关系;该模型会根据轨迹动态计算目标位置的嵌入向量。此外,为了高效地预训练位置嵌入模型,我们提出了层次自回归位置模型(Hierarchical Auto-regressive Location Model)目标,以有效地训练 Geo-Tokenizer 分解后的位置。我们使用预训练的位置模型在两个真实用户轨迹数据集上进行了实验。实验结果表明,与现有的位置嵌入方法相比,我们的模型以更少的模型参数显著提升了下游任务的性能。
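
The hierarchical-grid idea can be sketched in a few lines: each coordinate is mapped to one token per scale, so coarse tokens are shared across many locations and the vocabulary stays small. The cell sizes and token naming below are illustrative, not the paper's configuration.

```python
# Sketch of a hierarchical grid tokenizer for locations (the general idea behind
# a Geo-Tokenizer; cell sizes and vocabulary layout here are illustrative).
def geo_tokenize(lat, lon, cell_sizes_deg=(1.0, 0.1, 0.01)):
    """Represent one location as a list of grid tokens at coarse-to-fine scales.
    Coarse cells are shared by many locations, keeping the vocabulary small."""
    tokens = []
    for level, size in enumerate(cell_sizes_deg):
        row = int((lat + 90.0) // size)
        col = int((lon + 180.0) // size)
        tokens.append(f"L{level}_{row}_{col}")
    return tokens

# Two nearby points share their coarse tokens and differ only at fine scales.
print(geo_tokenize(37.5665, 126.9780))
print(geo_tokenize(37.5700, 126.9820))
```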

PASTA: PArallel Spatio-Temporal Attention with spatial auto-correlation gating for fine-grained crowd flow prediction

  • paper_url: http://arxiv.org/abs/2310.02284
  • repo_url: None
  • paper_authors: Chung Park, Junui Hong, Cheonbok Park, Taesan Kim, Minsung Choi, Jaegul Choo
  • for: 预测城市范围内未来人员和车辆的流动方向
  • methods: 使用PASTA神经网络模型,包括空间自相关阻止、多尺度径脱敏块和时间注意力阻止模块,有效地捕捉细致的空间时间模式
  • results: 对比其他基线模型,本模型在具有不规则空间区域的条件下表现出优异性,并提供了关键时间信息的Qualitative分析
    Abstract Understanding the movement patterns of objects (e.g., humans and vehicles) in a city is essential for many applications, including city planning and management. This paper proposes a method for predicting future city-wide crowd flows by modeling the spatio-temporal patterns of historical crowd flows in fine-grained city-wide maps. We introduce a novel neural network named PArallel Spatio-Temporal Attention with spatial auto-correlation gating (PASTA) that effectively captures the irregular spatio-temporal patterns of fine-grained maps. The novel components in our approach include spatial auto-correlation gating, multi-scale residual block, and temporal attention gating module. The spatial auto-correlation gating employs the concept of spatial statistics to identify irregular spatial regions. The multi-scale residual block is responsible for handling multiple range spatial dependencies in the fine-grained map, and the temporal attention gating filters out irrelevant temporal information for the prediction. The experimental results demonstrate that our model outperforms other competing baselines, especially under challenging conditions that contain irregular spatial regions. We also provide a qualitative analysis to derive the critical time information where our model assigns high attention scores in prediction.
    摘要 理解城市中物体(如人和车辆)的运动模式是城市规划与管理等许多应用的关键。本文提出了一种基于细粒度城市地图中历史人流时空模式来预测未来全市人流的方法。我们介绍了一种名为 PASTA(带空间自相关门控的并行时空注意力)的新神经网络,可以有效捕捉细粒度地图中不规则的时空模式。该方法包括空间自相关门控、多尺度残差块和时间注意力门控模块。空间自相关门控借助空间统计学的概念来识别不规则的空间区域;多尺度残差块负责处理细粒度地图中多种范围的空间依赖关系;时间注意力门控模块则过滤与预测无关的时间信息。实验结果表明,我们的模型优于其他对比基线,尤其是在包含不规则空间区域的困难条件下。我们还提供了定性分析,用以找出模型在预测中赋予高注意力分数的关键时间信息。

ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale

  • paper_url: http://arxiv.org/abs/2310.01217
  • repo_url: https://github.com/cpjku/scalearn
  • paper_authors: Markus Frohmann, Carolin Holtermann, Shahed Masoudian, Anne Lauscher, Navid Rekabsaz
  • for: 本研究旨在提高多任务学习(MTL)的效率,特别是使用预训练语言模型(PLMs)。
  • methods: 该研究使用了两个阶段的方法:第一阶段是任务学习,在这个阶段,知识特定于任务是通过sets of parameters(例如适配器)被储存起来;第二阶段是传输,在这个阶段,已经学习的知识被用于目标任务。
  • results: 该研究表明,通过linearly scaling the output representations of source adapters for transfer learning,可以提高MTL的效率。 experiments on three benchmarks(GLUE、SuperGLUE、HumSet)表明,我们的方法可以轻量级地替代AdapterFusion,并且可以减少约0.35%的传输参数数量。此外,我们还发现,当进一步减少参数时,我们的方法仍然可以保持强大的能力,只需8个传输参数。
    Abstract Multi-task learning (MTL) has shown considerable practical benefits, particularly when using pre-trained language models (PLMs). While this is commonly achieved by simultaneously learning $n$ tasks under a joint optimization procedure, recent methods such as AdapterFusion structure the problem into two distinct stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (\eg adapters), and (ii) transfer, where this already learned knowledge is leveraged for a target task. This separation of concerns provides numerous benefits, such as promoting reusability, and addressing cases involving data privacy and societal concerns; on the flip side, current two-stage MTL methods come with the cost of introducing a substantial number of additional parameters. In this work, we address this issue by leveraging the usefulness of linearly scaling the output representations of source adapters for transfer learning. We introduce ScaLearn, a simple and highly parameter-efficient two-stage MTL method that capitalizes on the knowledge of the source tasks by learning a minimal set of scaling parameters that enable effective knowledge transfer to a target task. Our experiments on three benchmarks (GLUE, SuperGLUE, and HumSet) show that our ScaLearn, in addition to facilitating the benefits of two-stage MTL, consistently outperforms strong baselines with only a small number of transfer parameters - roughly 0.35% of those of AdapterFusion. Remarkably, we observe that ScaLearn maintains its strong abilities even when further reducing parameters through uniform scaling and layer-sharing, achieving similarly competitive results with only $8$ transfer parameters for each target task. Our proposed approach thus demonstrates the power of simple scaling as a promise for more efficient task transfer.
    摘要 多任务学习(MTL)已经实现了很大的实践效果,特别是使用预训练语言模型(PLM)。而现在的方法,如AdapterFusion,将问题分解成两个阶段:(i)任务学习,其中知识特定于任务被封装在参数集中(例如适配器);(ii)传输,其中已经学习的知识被应用于目标任务。这种分解问题的方式提供了许多利点,如推动可重用性和解决数据隐私和社会问题等问题。然而,当前的两个阶段MTL方法带来了增加大量额外参数的成本。在这种情况下,我们提出了一种简单有效的两阶段MTL方法,即ScaLearn。我们的方法通过学习源任务的输出表示的线性扩展来实现效果转移。我们的实验结果表明,ScaLearn不仅可以实现二阶段MTL的好处,还可以在三个 benchmark(GLUE、SuperGLUE和HumSet)上连续输出竞争力强的结果,并且只需要约0.35%的传输参数。此外,我们发现,当进一步减少参数时,ScaLearn仍然保持强大的能力,只需要每个目标任务8个传输参数。这表明了简单的扩展是一种有 promise的更有效的任务传输方法。
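
The central mechanism — a handful of learned scaling coefficients over frozen source-adapter outputs — can be sketched as follows; the dimensions and the per-task (rather than per-dimension) granularity are illustrative assumptions.

```python
# Sketch of "learning to scale" source-task adapter outputs for a target task
# (the core idea of ScaLearn as described above; dimensions are illustrative).
import torch
import torch.nn as nn

class ScaledAdapterCombination(nn.Module):
    def __init__(self, num_source_tasks):
        super().__init__()
        # The only transfer parameters: one scalar per source task (per layer).
        self.scales = nn.Parameter(torch.full((num_source_tasks,), 1.0 / num_source_tasks))

    def forward(self, adapter_outputs):
        """adapter_outputs: [num_source_tasks, batch, seq, dim] from frozen adapters."""
        w = self.scales.view(-1, 1, 1, 1)
        return (w * adapter_outputs).sum(0)   # linear combination fed back into the PLM layer

combine = ScaledAdapterCombination(num_source_tasks=4)
outs = torch.randn(4, 2, 16, 64)              # frozen source adapters' outputs
mixed = combine(outs)                         # only 4 trainable transfer parameters here
```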

Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning

  • paper_url: http://arxiv.org/abs/2310.01207
  • repo_url: None
  • paper_authors: Alexey Skrynnik, Anton Andreychuk, Maria Nesterova, Konstantin Yakovlev, Aleksandr Panov
  • for: 这个论文主要针对的问题是多代理路径寻找问题(MAPF),具体来说是在中央控制器缺 absent 的情况下,多个代理机器人在图上寻找冲突free的路径。
  • methods: 本论文提出了一种解决这个问题的方法,具体来说是通过规划和强化学习来实现。规划技术用于构建和重新规划个体路径,而强化学习则用于找到避免碰撞的策略。这两种方法被集成起来,以提高系统的吞吐量和灵活性。
  • results: 论文的实验结果显示,这种方法在多种设置下都能够准确地解决MAPF问题,并且比learnable竞争对手高效,能够更好地适应未经训练的地图。此外,这种方法还比一个基于规则的方法快得多,并且比一个state-of-the-art search-based solver更快。
    Abstract Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph and is typically solved in a centralized fashion. Conversely, in this work, we investigate the decentralized MAPF setting, when the central controller that posses all the information on the agents' locations and goals is absent and the agents have to sequientially decide the actions on their own without having access to a full state of the environment. We focus on the practically important lifelong variant of MAPF, which involves continuously assigning new goals to the agents upon arrival to the previous ones. To address this complex problem, we propose a method that integrates two complementary approaches: planning with heuristic search and reinforcement learning through policy optimization. Planning is utilized to construct and re-plan individual paths. We enhance our planning algorithm with a dedicated technique tailored to avoid congestion and increase the throughput of the system. We employ reinforcement learning to discover the collision avoidance policies that effectively guide the agents along the paths. The policy is implemented as a neural network and is effectively trained without any reward-shaping or external guidance. We evaluate our method on a wide range of setups comparing it to the state-of-the-art solvers. The results show that our method consistently outperforms the learnable competitors, showing higher throughput and better ability to generalize to the maps that were unseen at the training stage. Moreover our solver outperforms a rule-based one in terms of throughput and is an order of magnitude faster than a state-of-the-art search-based solver.
    摘要 多智能体路径规划(MAPF)问题通常要求为受限于图上的一组智能体找到互不冲突的路径,并且一般以中心化方式求解。与此相反,本工作研究去中心化的 MAPF 设定:不存在掌握所有智能体位置与目标信息的中央控制器,智能体必须在无法获知环境完整状态的情况下自行依次决定动作。我们关注实践中非常重要的终身(lifelong)MAPF 变体,即智能体到达当前目标后会持续被分配新的目标。为了解决这个复杂问题,我们提出了一种融合两种互补思路的方法:基于启发式搜索的规划与基于策略优化的强化学习。规划用于构建和重新规划个体路径,并且我们为规划算法加入了一种专门的技术来避免拥堵、提升系统吞吐量;强化学习用于学习避碰策略,引导智能体沿路径安全行进。该策略由神经网络实现,无需任何奖励塑形或外部指导即可有效训练。我们在多种设置下将该方法与最先进的求解器进行了比较。结果表明,我们的方法始终优于可学习的竞争方法,表现出更高的吞吐量,并能更好地泛化到训练阶段未见过的地图。此外,我们的求解器在吞吐量上优于基于规则的求解器,并且比最先进的基于搜索的求解器快一个数量级。

appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit

  • paper_url: http://arxiv.org/abs/2310.01206
  • repo_url: https://github.com/hitachi-nlp/appjsonify
  • paper_authors: Atsuki Yamaguchi, Terufumi Morishita
  • for: 这篇论文是为了提供一个基于Python的PDF-to-JSON转换工具集,用于处理学术论文。
  • methods: 该工具使用了多种视觉基于文档布局分析模型和规则基于文本处理方法来解析PDF文档。
  • results: 用户可以根据自己需要配置处理管道来处理特定的PDF格式文档。
    Abstract We present appjsonify, a Python-based PDF-to-JSON conversion toolkit for academic papers. It parses a PDF file using several visual-based document layout analysis models and rule-based text processing approaches. appjsonify is a flexible tool that allows users to easily configure the processing pipeline to handle a specific format of a paper they wish to process. We are publicly releasing appjsonify as an easy-to-install toolkit available via PyPI and GitHub.
    摘要 我们现在发布了一个名为appjsonify的Python基于的PDF到JSON转换工具集,用于处理学术论文。它使用了许多基于视觉文档布局分析模型以及基于规则的文本处理方法来解析PDF文件。appjsonify是一个灵活的工具,允许用户轻松配置处理管道以处理他们想要处理的特定格式的论文。我们正式发布appjsonify,可以通过PyPI和GitHub上安装。

Quantifying the Plausibility of Context Reliance in Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2310.01188
  • repo_url: None
  • paper_authors: Gabriele Sarti, Grzegorz Chrupała, Malvina Nissim, Arianna Bisazza
  • for: 本研究旨在测试语言模型是否可以使用人类可接受的方式使用上下文信息,以确保其在实际应用中的安全使用。
  • methods: 本研究使用了PECoRe批判性评估框架,该框架利用模型内部结构来识别生成文本中的上下文敏感词和其与上下文关联的证据。
  • results: 通过对语言翻译模型的生成文本进行PECoRe评估,研究发现模型在不同的话语水平上的表现有所不同,并且可以通过对模型的生成结果进行分析来找到不可靠的上下文使用情况。
    Abstract Establishing whether language models can use contextual information in a human-plausible way is important to ensure their safe adoption in real-world settings. However, the questions of when and which parts of the context affect model generations are typically tackled separately, and current plausibility evaluations are practically limited to a handful of artificial benchmarks. To address this, we introduce Plausibility Evaluation of Context Reliance (PECoRe), an end-to-end interpretability framework designed to quantify context usage in language models' generations. Our approach leverages model internals to (i) contrastively identify context-sensitive target tokens in generated texts and (ii) link them to contextual cues justifying their prediction. We use PECoRe to quantify the plausibility of context-aware machine translation models, comparing model rationales with human annotations across several discourse-level phenomena. Finally, we apply our method to unannotated generations to identify context-mediated predictions and highlight instances of (im)plausible context usage in model translations.
    摘要 确定语言模型能否以符合人类直觉的方式使用上下文信息,对保证其在真实场景中的安全应用十分重要。然而,"何时"以及"哪些"上下文会影响模型生成这两个问题通常被分开研究,而目前的合理性评估实际上仅限于少数人工构造的基准。为了解决这一问题,我们提出了上下文依赖合理性评估框架 PECoRe,这是一个端到端的可解释性框架,用于量化语言模型在生成中对上下文的使用。我们的方法利用模型内部信息:(i)以对比方式识别生成文本中的上下文敏感目标词元;(ii)将这些词元与支撑其预测的上下文线索关联起来。我们使用 PECoRe 量化上下文感知机器翻译模型的合理性,在多种语篇级现象上将模型给出的依据与人工标注进行比较。最后,我们将该方法应用于无标注的生成结果,识别由上下文驱动的预测,并突出模型翻译中(不)合理的上下文使用实例。
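
A simplified, decoder-only illustration of the contrastive step: compare each target token's log-probability with and without the context and flag large shifts. This assumes a Hugging Face-style causal LM, approximates the token alignment, and is not the released PECoRe implementation.

```python
# Sketch of contrastive identification of context-sensitive target tokens:
# large probability shifts when the context is removed indicate context reliance.
import torch

def context_sensitive_tokens(model, tokenizer, context, source, target, tau=1.0):
    """Returns (token, log-prob shift) pairs with shift above `tau`."""
    def target_logprobs(prefix):
        ids = tokenizer(prefix + target, return_tensors="pt").input_ids
        tgt_len = tokenizer(target, add_special_tokens=False,
                            return_tensors="pt").input_ids.shape[1]
        with torch.no_grad():
            logits = model(ids).logits[0, -tgt_len - 1:-1]   # positions predicting target
        logprobs = torch.log_softmax(logits, -1)
        picked = logprobs.gather(1, ids[0, -tgt_len:].unsqueeze(1)).squeeze(1)
        return picked, ids[0, -tgt_len:]

    with_ctx, tgt_ids = target_logprobs(context + " " + source)
    without_ctx, _ = target_logprobs(source)
    shift = with_ctx - without_ctx
    toks = tokenizer.convert_ids_to_tokens(tgt_ids.tolist())
    return [(t, s.item()) for t, s in zip(toks, shift) if s.item() > tau]
```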

NarrativePlay: Interactive Narrative Understanding

  • paper_url: http://arxiv.org/abs/2310.01459
  • repo_url: None
  • paper_authors: Runcong Zhao, Wenjia Zhang, Jiazheng Li, Lixing Zhu, Yanran Li, Yulan He, Lin Gui
  • for: 让用户扮演小说中的人物,与其他角色互动,体验幽默有趣的故事情节。
  • methods: 利用大型自然语言模型(LLMs)生成人类化回应,根据故事中人物特征进行引导。
  • results: 系统可以增强用户体验,通过自动生成的视觉显示、人物肖像和对话。
    Abstract In this paper, we introduce NarrativePlay, a novel system that allows users to role-play a fictional character and interact with other characters in narratives such as novels in an immersive environment. We leverage Large Language Models (LLMs) to generate human-like responses, guided by personality traits extracted from narratives. The system incorporates auto-generated visual display of narrative settings, character portraits, and character speech, greatly enhancing user experience. Our approach eschews predefined sandboxes, focusing instead on main storyline events extracted from narratives from the perspective of a user-selected character. NarrativePlay has been evaluated on two types of narratives, detective and adventure stories, where users can either explore the world or improve their favorability with the narrative characters through conversations.
    摘要 在这篇论文中,我们介绍了一种名为 NarrativePlay 的新系统,它允许用户在沉浸式环境中扮演小说等叙事作品中的虚构角色,并与故事中的其他角色互动。我们利用大语言模型(LLMs)生成拟人化的回应,并以从叙事中提取的人物性格特质作为引导。系统还自动生成叙事场景、人物肖像和人物语音的可视化呈现,大大提升了用户体验。我们的方法不依赖预先定义的沙盒,而是从用户所选角色的视角聚焦于叙事中的主线事件。NarrativePlay 在侦探和冒险两类故事上进行了评估,用户既可以探索故事世界,也可以通过对话提升叙事角色对自己的好感度。

Graph Isomorphic Networks for Assessing Reliability of the Medium-Voltage Grid

  • paper_url: http://arxiv.org/abs/2310.01181
  • repo_url: https://github.com/charlottecvn/ginenergygrids
  • paper_authors: Charlotte Cambier van Nooten, Tom van de Poll, Sonja Füllhase, Jacco Heres, Tom Heskes, Yuliya Shapovalova
  • for: This paper aims to improve the reliability and efficiency of energy grid assessments by using Graph Isomorphic Networks (GINs) for n-1 assessments in medium voltage grids.
  • methods: The proposed GIN approach directly handles graph-structured data and utilises graph structure and data about stations/cables to generalise to unseen grids.
  • results: The GIN approach demonstrates faster and more reliable grid assessments than traditional mathematical optimisation methods, reducing prediction times by approximately a factor of 1000.
    Abstract Ensuring electricity grid reliability becomes increasingly challenging with the shift towards renewable energy and declining conventional capacities. Distribution System Operators (DSOs) aim to achieve grid reliability by verifying the n-1 principle, ensuring continuous operation in case of component failure. Electricity networks' complex graph-based data holds crucial information for n-1 assessment: graph structure and data about stations/cables. Unlike traditional machine learning methods, Graph Neural Networks (GNNs) directly handle graph-structured data. This paper proposes using Graph Isomorphic Networks (GINs) for n-1 assessments in medium voltage grids. The GIN framework is designed to generalise to unseen grids and utilise graph structure and data about stations/cables. The proposed GIN approach demonstrates faster and more reliable grid assessments than a traditional mathematical optimisation approach, reducing prediction times by approximately a factor of 1000. The findings offer a promising approach to address computational challenges and enhance the reliability and efficiency of energy grid assessments.
    摘要 随着可再生能源的转型和传统装机能力的下降,电力网络可靠性的保证变得越来越挑战。分布系统运行商(DSOs)希望通过n-1原则来确保网络可靠性,以避免因组件故障而导致的停电。电力网络的复杂图structured数据包含关键信息 для n-1评估:图结构和站/电缆数据。不同于传统机器学习方法,图神经网络(GNNs)直接处理图结构数据。本文提出使用图同态网络(GINs)来进行n-1评估在中压电网络中。GIN框架设计用于泛化到未看到的网络,利用图结构和站/电缆数据。提议的GIN方法比传统的数学优化方法更快和可靠,降低预测时间约1000倍。发现可以有效地解决计算挑战,提高能源网络评估的可靠性和效率。
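
For concreteness, a standard GIN update on a small grid-like graph is sketched below; the toy ring topology and feature sizes are illustrative, and the paper's actual model additionally uses station/cable data.

```python
# A compact GIN-style message-passing layer (standard GIN update, shown to make
# the n-1 use case concrete; not the authors' exact architecture).
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, adj):
        """x: [num_stations, dim]; adj: [n, n] 0/1 adjacency of the grid graph."""
        # GIN update: h_v = MLP((1 + eps) * h_v + sum of neighbor states)
        return self.mlp((1 + self.eps) * x + adj @ x)

# Toy medium-voltage grid: 5 stations in a ring; a readout on top of the node
# states could score each n-1 contingency (one cable removed at a time).
adj = torch.zeros(5, 5)
for i in range(5):
    adj[i, (i + 1) % 5] = adj[(i + 1) % 5, i] = 1.0
h = GINLayer(dim=8)(torch.randn(5, 8), adj)
```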

Evolutionary Neural Architecture Search for Transformer in Knowledge Tracing

  • paper_url: http://arxiv.org/abs/2310.01180
  • repo_url: https://github.com/devilyangs/enas-kt
  • paper_authors: Shangshang Yang, Xiaoshan Yu, Ye Tian, Xueming Yan, Haiping Ma, Xingyi Zhang
  • for: 这篇论文的目的是提高知识追踪(KT)的准确率,以及自动选择输入特征和操作的搜索方法。
  • methods: 该论文基于 transformer 架构,添加卷积操作以增强局部上下文建模,并提出一种进化式神经架构搜索方法来自动选择输入特征和操作。
  • results: 实验结果表明,该方法可以在两个最大和最复杂的教育数据集上达到最佳结果,并且比传统的 transformer 架构具有更好的准确率和搜索效率。
    Abstract Knowledge tracing (KT) aims to trace students' knowledge states by predicting whether students answer correctly on exercises. Despite the excellent performance of existing Transformer-based KT approaches, they are criticized for the manually selected input features for fusion and the defect of single global context modelling to directly capture students' forgetting behavior in KT, when the related records are distant from the current record in terms of time. To address the issues, this paper first considers adding convolution operations to the Transformer to enhance its local context modelling ability used for students' forgetting behavior, then proposes an evolutionary neural architecture search approach to automate the input feature selection and automatically determine where to apply which operation for achieving the balancing of the local/global context modelling. In the search space, the original global path containing the attention module in Transformer is replaced with the sum of a global path and a local path that could contain different convolutions, and the selection of input features is also considered. To search the best architecture, we employ an effective evolutionary algorithm to explore the search space and also suggest a search space reduction strategy to accelerate the convergence of the algorithm. Experimental results on the two largest and most challenging education datasets demonstrate the effectiveness of the architecture found by the proposed approach.
    摘要 知识追踪(KT)旨在通过预测学生在练习题上的作答是否正确来追踪其知识状态。尽管现有的基于 Transformer 的 KT 方法表现出色,它们仍被批评为需要手动选择用于融合的输入特征,且单一的全局上下文建模在相关记录与当前记录时间间隔较远时难以直接刻画学生的遗忘行为。为解决这些问题,本文首先考虑在 Transformer 中加入卷积操作以增强其针对遗忘行为的局部上下文建模能力,随后提出一种进化式神经架构搜索方法,自动完成输入特征选择,并自动决定在何处应用何种操作,以实现局部/全局上下文建模之间的平衡。在搜索空间中,Transformer 中包含注意力模块的原有全局路径被替换为一条全局路径与一条可包含不同卷积的局部路径之和,同时也考虑输入特征的选择。为搜索最优架构,我们采用一种高效的进化算法来探索搜索空间,并提出一种搜索空间缩减策略以加速算法收敛。在两个规模最大、最具挑战性的教育数据集上的实验结果证明了所搜索到的架构的有效性。

Towards guarantees for parameter isolation in continual learning

  • paper_url: http://arxiv.org/abs/2310.01165
  • repo_url: None
  • paper_authors: Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann
  • for: 本研究旨在为深度学习中的灾难性遗忘(catastrophic forgetting)问题的缓解方法提供理论理解与保证。
  • methods: 本研究考察参数隔离(parameter isolation)类持续学习方法,并为其提供可证明的保证。
  • results: 研究表明,参数隔离方法可以缓解灾难性遗忘问题,且其中部分方法可获得可证明的保证。
    Abstract Deep learning has proved to be a successful paradigm for solving many challenges in machine learning. However, deep neural networks fail when trained sequentially on multiple tasks, a shortcoming known as catastrophic forgetting in the continual learning literature. Despite a recent flourish of learning algorithms successfully addressing this problem, we find that provable guarantees against catastrophic forgetting are lacking. In this work, we study the relationship between learning and forgetting by looking at the geometry of neural networks' loss landscape. We offer a unifying perspective on a family of continual learning algorithms, namely methods based on parameter isolation, and we establish guarantees on catastrophic forgetting for some of them.
    摘要 深度学习已被证明是解决机器学习诸多挑战的成功范式。然而,深度神经网络在对多个任务进行顺序训练时会失效,这在持续学习(continual learning)文献中被称为灾难性遗忘。尽管近期涌现出许多成功应对该问题的学习算法,我们发现针对灾难性遗忘的可证明保证仍然缺失。在本工作中,我们通过考察神经网络损失景观的几何结构来研究学习与遗忘之间的关系。我们为一类基于参数隔离的持续学习算法提供了统一的视角,并为其中部分方法建立了关于灾难性遗忘的保证。
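
For readers unfamiliar with parameter isolation, the toy sketch below shows one common instantiation of the idea: each task owns a disjoint mask over the weights and gradients outside that mask are zeroed, so later tasks cannot overwrite parameters reserved for earlier ones. The masking scheme and toy data are illustrative assumptions and not tied to the guarantees derived in the paper.

```python
# Illustrative sketch of hard parameter isolation: each task gets a disjoint
# mask over the weights, and gradients outside the task's mask are zeroed,
# so later tasks cannot overwrite parameters assigned to earlier tasks.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2, bias=False)
n_tasks = 2

# Randomly partition the weight entries into disjoint per-task masks.
assignment = torch.randint(0, n_tasks, model.weight.shape)
masks = [(assignment == t).float() for t in range(n_tasks)]

opt = torch.optim.SGD(model.parameters(), lr=0.1)
for task_id in range(n_tasks):
    x = torch.randn(64, 10)
    y = torch.randint(0, 2, (64,))
    for _ in range(50):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        # Parameter isolation: only update weights assigned to the current task.
        model.weight.grad *= masks[task_id]
        opt.step()
    print(f"task {task_id} final loss: {loss.item():.3f}")
```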

DINE: Dimensional Interpretability of Node Embeddings

  • paper_url: http://arxiv.org/abs/2310.01162
  • repo_url: https://github.com/simonepiaggesi/dine
  • paper_authors: Simone Piaggesi, Megha Khosla, André Panisson, Avishek Anand
  • for: 这篇论文的目的是提出一种可解释性强的节点嵌入方法,以便更好地理解节点嵌入所表示的图结构。
  • methods: 该论文提出了一种评估节点嵌入全局可解释性的新度量,并提出了一种名为 DINE 的新方法,可以在不影响任务性能的情况下改造现有节点嵌入,提高其可解释性。
  • results: 实验表明,DINE可以同时学习高度可解释的节点嵌入,并且在链接预测任务中达到了高度有效的性能。
    Abstract Graphs are ubiquitous due to their flexibility in representing social and technological systems as networks of interacting elements. Graph representation learning methods, such as node embeddings, are powerful approaches to map nodes into a latent vector space, allowing their use for various graph tasks. Despite their success, only few studies have focused on explaining node embeddings locally. Moreover, global explanations of node embeddings remain unexplored, limiting interpretability and debugging potentials. We address this gap by developing human-understandable explanations for dimensions in node embeddings. Towards that, we first develop new metrics that measure the global interpretability of embedding vectors based on the marginal contribution of the embedding dimensions to predicting graph structure. We say that an embedding dimension is more interpretable if it can faithfully map to an understandable sub-structure in the input graph - like community structure. Having observed that standard node embeddings have low interpretability, we then introduce DINE (Dimension-based Interpretable Node Embedding), a novel approach that can retrofit existing node embeddings by making them more interpretable without sacrificing their task performance. We conduct extensive experiments on synthetic and real-world graphs and show that we can simultaneously learn highly interpretable node embeddings with effective performance in link prediction.
    摘要 图(Graph)因其能够将社会和技术系统表示为由相互作用元素构成的网络而无处不在。图表示学习方法(如节点嵌入)能将节点映射到潜在向量空间,从而用于各类图任务。尽管取得了成功,但只有少数研究关注对节点嵌入的局部解释,而对节点嵌入的全局解释仍属空白,这限制了可解释性与调试潜力。我们通过为节点嵌入的各个维度构建人类可理解的解释来填补这一空白。为此,我们首先提出了新的度量,基于嵌入维度对预测图结构的边际贡献来衡量嵌入向量的全局可解释性。我们认为,如果一个嵌入维度能够忠实地对应到输入图中可理解的子结构(例如社区结构),那么它就更具可解释性。在观察到标准节点嵌入的可解释性较低之后,我们提出了 DINE(基于维度的可解释节点嵌入),一种可在不牺牲任务性能的前提下改造现有节点嵌入、使其更具可解释性的新方法。我们在合成图和真实图上进行了大量实验,证明可以同时学习到高度可解释的节点嵌入,并在链接预测中保持有效的性能。

SWMLP: Shared Weight Multilayer Perceptron for Car Trajectory Speed Prediction using Road Topographical Features

  • paper_url: http://arxiv.org/abs/2310.02282
  • repo_url: None
  • paper_authors: Sarah Almeida Carneiro, Giovanni Chierchia, Jean Charléty, Aurélie Chataignon, Laurent Najman
  • for: 提高交通运输管理的准确性和效率,适用于各地交通数据的可用性不充分的情况
  • methods: 使用轨迹道路地形特征拟合共享权重多层感知机(Shared Weight MLP)模型,对车辆速度进行预测
  • results: 与标准回归分析相比显示出显著改善,并为设计新的交通分析方法提供了思路。
    Abstract Although traffic is one of the massively collected data, it is often only available for specific regions. One concern is that, although there are studies that give good results for these data, the data from these regions may not be sufficiently representative to describe all the traffic patterns in the rest of the world. In quest of addressing this concern, we propose a speed prediction method that is independent of large historical speed data. To predict a vehicle's speed, we use the trajectory road topographical features to fit a Shared Weight Multilayer Perceptron learning model. Our results show significant improvement, both qualitative and quantitative, over standard regression analysis. Moreover, the proposed framework sheds new light on the way to design new approaches for traffic analysis.
    摘要 尽管交通是被广泛收集的数据之一,但它往往只在特定区域可用。一个令人担忧的问题是:尽管已有研究在这些数据上取得了良好结果,这些区域的数据可能不足以代表世界其他地区的全部交通模式。为解决这一问题,我们提出了一种不依赖大规模历史速度数据的速度预测方法。为预测车辆速度,我们使用轨迹道路地形特征来拟合共享权重多层感知机(Shared Weight Multilayer Perceptron)学习模型。我们的结果在定性和定量两方面均显示出相对标准回归分析的显著提升。此外,所提出的框架也为设计新的交通分析方法提供了新的思路。
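
A minimal sketch of a shared-weight MLP speed predictor is shown below: the same small MLP is applied to the topographical features at every trajectory point, so its weights are shared along the trajectory. The feature set, layer sizes, and synthetic data are assumptions for illustration, not the paper's configuration.

```python
# Illustrative sketch: a small MLP with weights shared across every point of a
# trajectory, mapping road topographical features at each point to a speed.
import torch
import torch.nn as nn

class SharedWeightMLP(nn.Module):
    def __init__(self, n_features=4, hidden=32):
        super().__init__()
        # The same MLP is applied independently to every trajectory point,
        # so the weights are shared along the whole trajectory.
        self.net = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, traj_feats):                # traj_feats: [batch, n_points, n_features]
        return self.net(traj_feats).squeeze(-1)   # predicted speed per point

# Toy training loop on synthetic data (features could be slope, curvature, etc.).
torch.manual_seed(0)
feats = torch.rand(16, 50, 4)                     # 16 trajectories, 50 points each
speeds = 30 + 40 * feats[..., 0] - 10 * feats[..., 1]  # synthetic target speeds (km/h)
model = SharedWeightMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(feats), speeds)
    loss.backward()
    opt.step()
print("final MSE:", loss.item())
```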

Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives

  • paper_url: http://arxiv.org/abs/2310.01152
  • repo_url: https://github.com/git-disl/gptlens
  • paper_authors: Sihao Hu, Tiansheng Huang, Fatih İlhan, Selim Furkan Tekin, Ling Liu
  • for: 该论文旨在系统地分析利用 GPT-4 等大语言模型(LLM)检测智能合约漏洞的机遇、挑战与解决方案。
  • methods: 该论文提出了名为 GPTLens 的对抗性框架,将检测拆分为生成与鉴别两个相辅相成的阶段,让 LLM 分别扮演审计者与评论者两个角色:前者尽可能多地给出候选漏洞,后者评估其正确性以减少误报。
  • results: 实验和示例表明,GPTLens 能有效减少误报,且无需智能合约专业知识即可使用。
    Abstract This paper provides a systematic analysis of the opportunities, challenges, and potential solutions of harnessing Large Language Models (LLMs) such as GPT-4 to dig out vulnerabilities within smart contracts based on our ongoing research. For the task of smart contract vulnerability detection, achieving practical usability hinges on identifying as many true vulnerabilities as possible while minimizing the number of false positives. Nonetheless, our empirical study reveals contradictory yet interesting findings: generating more answers with higher randomness largely boosts the likelihood of producing a correct answer but inevitably leads to a higher number of false positives. To mitigate this tension, we propose an adversarial framework dubbed GPTLens that breaks the conventional one-stage detection into two synergistic stages $-$ generation and discrimination, for progressive detection and refinement, wherein the LLM plays dual roles, i.e., auditor and critic, respectively. The goal of auditor is to yield a broad spectrum of vulnerabilities with the hope of encompassing the correct answer, whereas the goal of critic that evaluates the validity of identified vulnerabilities is to minimize the number of false positives. Experimental results and illustrative examples demonstrate that auditor and critic work together harmoniously to yield pronounced improvements over the conventional one-stage detection. GPTLens is intuitive, strategic, and entirely LLM-driven without relying on specialist expertise in smart contracts, showcasing its methodical generality and potential to detect a broad spectrum of vulnerabilities. Our code is available at: https://github.com/git-disl/GPTLens.
    摘要 本文基于我们正在进行的研究,系统地分析了利用 GPT-4 等大语言模型(LLMs)挖掘智能合约漏洞的机遇、挑战与潜在解决方案。对于智能合约漏洞检测任务而言,其实用性取决于在尽可能多地识别真实漏洞的同时,将误报数量降到最低。然而,我们的实证研究揭示了矛盾而有趣的发现:以更高的随机性生成更多答案会大幅提升产生正确答案的可能性,但也不可避免地带来更多误报。为缓解这一矛盾,我们提出了一个名为 GPTLens 的对抗性框架,将传统的单阶段检测拆分为生成与鉴别两个相辅相成的阶段,以实现渐进式的检测与精炼;其中 LLM 分别扮演审计者(auditor)与评论者(critic)两种角色。审计者的目标是给出尽可能广泛的候选漏洞,以期覆盖正确答案;而负责评估候选漏洞有效性的评论者的目标则是尽量减少误报。实验结果与示例表明,审计者与评论者协同工作,相比传统的单阶段检测取得了显著提升。GPTLens 直观、有策略且完全由 LLM 驱动,无需智能合约领域的专家知识,展示了其方法上的通用性以及检测广泛漏洞类型的潜力。代码见:https://github.com/git-disl/GPTLens。
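
The snippet below sketches the two-stage auditor/critic idea in schematic form. The `call_llm` callable is a placeholder for whatever chat-completion client is used, and the prompts, auditor count, and 0-10 scoring scale are illustrative assumptions; the authors' actual prompts and pipeline are in the linked repository.

```python
# Schematic two-stage auditor/critic loop in the spirit of GPTLens.
from typing import Callable, List

def gptlens_style_detect(contract_source: str,
                         call_llm: Callable[[str, float], str],
                         n_auditors: int = 3,
                         score_threshold: float = 5.0) -> List[dict]:
    # Stage 1 (generation): several high-temperature "auditor" calls propose
    # as many candidate vulnerabilities as possible.
    candidates = []
    for _ in range(n_auditors):
        answer = call_llm(
            "You are a smart contract auditor. List potential vulnerabilities "
            "in the following Solidity code, one per line:\n" + contract_source,
            1.0,  # high temperature -> broader, more diverse candidates
        )
        candidates += [line.strip() for line in answer.splitlines() if line.strip()]

    # Stage 2 (discrimination): a low-temperature "critic" call scores each
    # candidate; only candidates above the threshold are kept, cutting false positives.
    findings = []
    for cand in candidates:
        verdict = call_llm(
            "You are a critic. On a scale of 0-10, how likely is the following "
            f"finding to be a real vulnerability in the given contract?\n"
            f"Finding: {cand}\nContract:\n{contract_source}\nReply with a number only.",
            0.0,
        )
        try:
            score = float(verdict.strip().split()[0])
        except ValueError:
            continue
        if score >= score_threshold:
            findings.append({"finding": cand, "score": score})
    return findings

# Example usage with a stubbed LLM (always returns a fixed answer).
if __name__ == "__main__":
    stub = lambda prompt, temperature: ("reentrancy in withdraw()"
                                        if "auditor" in prompt else "8")
    print(gptlens_style_detect("contract Vault { ... }", stub))
```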

Adaptive Online Non-stochastic Control

  • paper_url: http://arxiv.org/abs/2310.02261
  • repo_url: None
  • paper_authors: Naram Mhaisen, George Iosifidis
  • for: 该论文目的是解决非随机控制问题,以获得适应控制环境的算法。
  • methods: 该论文使用 FTRL 框架,并设计了考虑系统记忆的新型正则化技术。此外,它还将这些正则化项与不可信的未来成本预测相结合,实现了首个乐观式(Optimistic)FTRL 控制器,其遗憾界可随预测精度自适应调整。
  • results: 该论文获得了新的次线性、数据自适应的策略遗憾界(policy regret bound),且该界能够随预测可信度自适应收紧。
    Abstract We tackle the problem of Non-stochastic Control with the aim of obtaining algorithms that adapt to the controlled environment. Namely, we tailor the FTRL framework to dynamical systems where the existence of a state, or equivalently a memory, couples the effect of the online decisions. By designing novel regularization techniques that take the system's memory into consideration, we obtain controllers with new sub-linear data adaptive policy regret bounds. Furthermore, we append these regularizers with untrusted predictions of future costs, which enables the design of the first Optimistic FTRL-based controller whose regret bound is adaptive to the accuracy of the predictions, shrinking when they are accurate while staying sub-linear even when they all fail.
    摘要 我们研究非随机控制(Non-stochastic Control)问题,目标是获得能够适应受控环境的算法。具体来说,我们将 FTRL 框架适配到动态系统中,在这类系统中,状态(或等价地说,记忆)的存在使在线决策的效果相互耦合。通过设计考虑系统记忆的新型正则化技术,我们得到了具有新的次线性、数据自适应策略遗憾界的控制器。此外,我们将这些正则化项与不可信的未来成本预测相结合,实现了首个基于乐观式 FTRL 的控制器,其遗憾界可随预测精度自适应:预测准确时收紧,而即使预测全部失效也仍保持次线性。

Stability and Generalization for Minibatch SGD and Local SGD

  • paper_url: http://arxiv.org/abs/2310.01139
  • repo_url: None
  • paper_authors: Yunwen Lei, Tao Sun, Mingrui Liu
  • for: 优化方法的批处理和本地优化
  • methods: 使用批处理SGD和本地SGD
  • results: 研究了这两种方法的稳定性和泛化能力,并发现它们可以达到线性增速,且可以满足最优风险 bound。
    Abstract The increasing scale of data propels the popularity of leveraging parallelism to speed up the optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. The existing theoretical studies show a linear speedup of these methods with respect to the number of machines, which, however, is measured by optimization errors. As a comparison, the stability and generalization of these methods are much less studied. In this paper, we study the stability and generalization analysis of minibatch and local SGD to understand their learnability by introducing a novel expectation-variance decomposition. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. We show both minibatch and local SGD achieve a linear speedup to attain the optimal risk bounds.
    摘要 随着数据规模的增长,利用并行化来加速优化变得日益流行。小批量随机梯度下降(minibatch SGD)和本地 SGD(local SGD)是两种流行的并行优化方法。现有的理论研究表明,这些方法相对于机器数量具有线性加速,但这一结论是以优化误差来衡量的。相比之下,这些方法的稳定性和泛化性研究要少得多。在本文中,我们通过引入一种新的期望-方差分解,对 minibatch SGD 和 local SGD 进行稳定性与泛化分析,以理解它们的可学习性。我们将训练误差纳入稳定性分析,说明较小的训练误差如何帮助过参数化模型实现泛化。我们证明 minibatch SGD 和 local SGD 都能够达到线性加速并取得最优的风险界。
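
To make the two parallel schemes concrete, the toy example below runs minibatch SGD (gradients averaged every step) and local SGD (models averaged only at the end of each round) on the same least-squares problem. Problem size, step size, and the synchronization period are illustrative choices.

```python
# Toy comparison of minibatch SGD vs. local SGD on a shared least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
d, n, M = 10, 2000, 4                 # dimension, samples, number of workers
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)
shards = np.array_split(np.arange(n), M)   # each worker holds one shard

def grad(w, idx):
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

def minibatch_sgd(steps=200, lr=0.05, b=8):
    # Every step: each worker computes a gradient on b local samples,
    # gradients are averaged, and a single shared iterate is updated.
    w = np.zeros(d)
    for _ in range(steps):
        g = np.mean([grad(w, rng.choice(s, size=b)) for s in shards], axis=0)
        w -= lr * g
    return w

def local_sgd(rounds=20, local_steps=10, lr=0.05, b=8):
    # Workers run several local SGD steps independently and only
    # average their models at the end of each communication round.
    w = np.zeros(d)
    for _ in range(rounds):
        locals_ = []
        for s in shards:
            w_local = w.copy()
            for _ in range(local_steps):
                w_local -= lr * grad(w_local, rng.choice(s, size=b))
            locals_.append(w_local)
        w = np.mean(locals_, axis=0)
    return w

for name, w in [("minibatch SGD", minibatch_sgd()), ("local SGD", local_sgd())]:
    print(f"{name}: distance to w* = {np.linalg.norm(w - w_star):.4f}")
```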

Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback

  • paper_url: http://arxiv.org/abs/2310.01132
  • repo_url: None
  • paper_authors: Jacob Whitehill, Jennifer LoCasale-Crouch
  • for: 为了给教师提供更具体、更频繁、更可操作的教学反馈,我们探索了如何利用大型语言模型(LLMs)来估算课堂评估打分系统(CLASS)观察协议中“教学支持”维度的分数。
  • methods: 我们设计了一种机器学习架构,使用 Meta 的 Llama2 零样本提示和/或经典的词袋(BoW)模型,对教师话语(由 OpenAI 的 Whisper 自动转录)逐条分类,判断其中是否出现 11 种教学支持行为指标;随后将这些话语级判断在整个 15 分钟观察会话上聚合,以估算全局 CLASS 分数。
  • results: 实验结果表明:(1)所提方法的自动 CLASS 教学支持估算精度(Pearson $R$ 最高达 $0.46$)接近人类评分者间的一致性(最高 $R=0.55$);(2)LLMs 在该任务上略优于 BoW;(3)最佳模型通常结合了 LLM 与 BoW 提取的特征。最后,(4)我们展示了如何在话语层面可视化模型输出,说明哪些话语与特定 CLASS 维度的相关性最强或最弱,从而为教师提供可解释的反馈。
    Abstract With the aim to provide teachers with more specific, frequent, and actionable feedback about their teaching, we explore how Large Language Models (LLMs) can be used to estimate ``Instructional Support'' domain scores of the CLassroom Assessment Scoring System (CLASS), a widely used observation protocol. We design a machine learning architecture that uses either zero-shot prompting of Meta's Llama2, and/or a classic Bag of Words (BoW) model, to classify individual utterances of teachers' speech (transcribed automatically using OpenAI's Whisper) for the presence of 11 behavioral indicators of Instructional Support. Then, these utterance-level judgments are aggregated over an entire 15-min observation session to estimate a global CLASS score. Experiments on two CLASS-coded datasets of toddler and pre-kindergarten classrooms indicate that (1) automatic CLASS Instructional Support estimation accuracy using the proposed method (Pearson $R$ up to $0.46$) approaches human inter-rater reliability (up to $R=0.55$); (2) LLMs yield slightly greater accuracy than BoW for this task; and (3) the best models often combined features extracted from both LLM and BoW. Finally, (4) we illustrate how the model's outputs can be visualized at the utterance level to provide teachers with explainable feedback on which utterances were most positively or negatively correlated with specific CLASS dimensions.
    摘要 为了为教师提供更具体、频繁和操作性的反馈,我们探讨了使用大型自然语言模型(LLM)来估算教室评估系统(CLASS)中的“教学支持”领域分数。我们设计了一种机器学习架构,使用Meta的Llama2零shot提示和/或简单的词汇模型(BoW)来分类教师的口头语言(通过OpenAI的Whisper自动识别)是否包含11种教学支持行为指标。然后,这些语言级别的判别结果被聚合到整个15分钟观察会话中,以估算全局的CLASS分数。实验结果表明,使用我们提posed的方法,自动CLASS教学支持估算准确率(Pearson $R$ 达到0.46)与人类间评估相似(达到$R=0.55$)。此外,LLM在这个任务中表现了些微的优势,而且最佳模型通常是将LLM和BoW特征结合使用。最后,我们示例了如何使用模型输出来为教师提供可解释的反馈,以便他们了解每句话是否与特定的CLASS维度有正相关。
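
The sketch below illustrates the BoW half of such a pipeline on made-up data: classify each utterance for one behavioural indicator, then average utterance-level probabilities over a session into a global estimate. The toy utterances, labels, and the mean-aggregation rule are assumptions, not the paper's trained models.

```python
# Minimal BoW utterance classifier plus session-level aggregation (illustrative).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
import numpy as np

# Tiny toy training set: 1 = utterance shows the indicator (e.g. open-ended question).
train_utts = ["what do you think will happen next",
              "why did the tower fall down",
              "sit down please",
              "hand me the red block",
              "how could we build it taller",
              "line up at the door"]
train_labels = [1, 1, 0, 0, 1, 0]

vectorizer = CountVectorizer()
clf = LogisticRegression().fit(vectorizer.fit_transform(train_utts), train_labels)

def session_score(utterances):
    """Aggregate utterance-level probabilities into a session-level score (mean)."""
    probs = clf.predict_proba(vectorizer.transform(utterances))[:, 1]
    return float(np.mean(probs)), probs

session = ["what do you think the caterpillar will eat",
           "put your coats on",
           "why do you think the ice melted"]
score, per_utt = session_score(session)
print("session-level estimate:", round(score, 3))
print("per-utterance probabilities:", per_utt.round(3))
```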

Disentangling Voice and Content with Self-Supervision for Speaker Recognition

  • paper_url: http://arxiv.org/abs/2310.01128
  • repo_url: None
  • paper_authors: Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li
  • for: 这篇论文旨在提出一种同时模型说话者特征和内容变化的分解框架,以提高说话者识别精度。
  • methods: 该框架使用三个高斯推理层,每层包含一个可学习的转移模型,用于提取语音中不同的组成成分;其中专门设计了一个加强的转移模型来刻画复杂的语音动态。此外,还提出了一种自监督方法,无需说话人身份以外的标签即可动态解耦内容。
  • results: 在 VoxCeleb 和 SITW 数据集上的实验表明,该方法将 EER 和 minDCF 平均分别降低了 9.56% 和 8.24%。由于不需要额外的模型训练或数据,该方法易于在实际应用中部署。
    Abstract For speaker recognition, it is difficult to extract an accurate speaker representation from speech because of its mixture of speaker traits and content. This paper proposes a disentanglement framework that simultaneously models speaker traits and content variability in speech. It is realized with the use of three Gaussian inference layers, each consisting of a learnable transition model that extracts distinct speech components. Notably, a strengthened transition model is specifically designed to model complex speech dynamics. We also propose a self-supervision method to dynamically disentangle content without the use of labels other than speaker identities. The efficacy of the proposed framework is validated via experiments conducted on the VoxCeleb and SITW datasets with 9.56% and 8.24% average reductions in EER and minDCF, respectively. Since neither additional model training nor data is specifically needed, it is easily applicable in practical use.
    摘要 在说话人识别中,由于语音混合了说话人特质和内容,很难从中提取准确的说话人表示。本文提出了一种解耦框架,可同时对语音中的说话人特质和内容变化进行建模。该框架通过三个高斯推理层实现,每层包含一个可学习的转移模型,用于提取不同的语音成分;其中专门设计了一个加强的转移模型,以刻画复杂的语音动态。我们还提出了一种自监督方法,无需说话人身份以外的标签即可动态解耦内容。该方法在 VoxCeleb 和 SITW 数据集上的实验中将 EER 和 minDCF 平均分别降低了 9.56% 和 8.24%。由于既不需要额外的模型训练,也不需要额外数据,该方法在实践中非常容易应用。

End-to-End Continuous Speech Emotion Recognition in Real-life Customer Service Call Center Conversations

  • paper_url: http://arxiv.org/abs/2310.02281
  • repo_url: None
  • paper_authors: Yajing Feng, Laurence Devillers
  • for: 这项研究的目的是为真实客服呼叫中心对话构建一个大规模的连续语音情感识别(SER)数据集。
  • methods: 该研究使用了维度型情感标注方法,以捕捉实际客服对话中情感的细腻、复杂与连续性。同时,该研究还探讨了在该数据集上应用端到端(E2E)SER 系统所面临的挑战,例如确定合适的标签采样率与输入片段长度,以及通过多任务学习以不同权重融合上下文信息(对话者性别与共情水平)。
  • results: 研究显示,将对话者的共情水平信息与不同权重相结合使用多任务学习可以提高模型的表现。
    Abstract Speech Emotion recognition (SER) in call center conversations has emerged as a valuable tool for assessing the quality of interactions between clients and agents. In contrast to controlled laboratory environments, real-life conversations take place under uncontrolled conditions and are subject to contextual factors that influence the expression of emotions. In this paper, we present our approach to constructing a large-scale reallife dataset (CusEmo) for continuous SER in customer service call center conversations. We adopted the dimensional emotion annotation approach to capture the subtlety, complexity, and continuity of emotions in real-life call center conversations, while annotating contextual information. The study also addresses the challenges encountered during the application of the End-to-End (E2E) SER system to the dataset, including determining the appropriate label sampling rate and input segment length, as well as integrating contextual information (interlocutor's gender and empathy level) with different weights using multitask learning. The result shows that incorporating the empathy level information improved the model's performance.
    摘要 语音情感识别(SER)在客服呼叫中心对话中已经成为评估客户与坐席之间交流质量的有价值工具。与受控的实验室环境不同,真实对话发生在不可控的条件下,情感表达会受到上下文因素的影响。本文介绍了我们用于客服呼叫中心对话连续 SER 的大规模真实数据集(CusEmo)的构建方法。我们采用维度型情感标注方法,以捕捉真实客服对话中情感的细腻、复杂性和连续性,同时标注上下文信息。研究还探讨了将端到端(E2E)SER 系统应用于该数据集时遇到的挑战,包括确定合适的标签采样率和输入片段长度,以及通过多任务学习以不同权重融合上下文信息(对话者性别和共情水平)。结果表明,引入共情水平信息可以提高模型的性能。

Prompt-tuning latent diffusion models for inverse problems

  • paper_url: http://arxiv.org/abs/2310.01110
  • repo_url: None
  • paper_authors: Hyungjin Chung, Jong Chul Ye, Peyman Milanfar, Mauricio Delbracio
  • for: 解决图像逆问题,使用文本到图像的潜在扩散模型作为通用先验。
  • methods: 提出了一种新方法,即在运行逆向扩散过程的同时在线调整文本嵌入(prompt tuning),以生成更符合扩散先验的图像。此外,我们还提出了一种通过投影使潜变量的演化保持在编码器值域内的方法,以减少图像伪影。
  • results: P2L 方法在超分辨率、去模糊和图像修复(inpainting)等多个任务上表现出色,超过了基于像素扩散模型和潜在扩散模型的逆问题求解器。
    Abstract We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. Existing methods using latent diffusion models for inverse problems typically rely on simple null text prompts, which can lead to suboptimal performance. To address this limitation, we introduce a method for prompt tuning, which jointly optimizes the text embedding on-the-fly while running the reverse diffusion process. This allows us to generate images that are more faithful to the diffusion prior. In addition, we propose a method to keep the evolution of latent variables within the range space of the encoder, by projection. This helps to reduce image artifacts, a major problem when using latent diffusion models instead of pixel-based diffusion models. Our combined method, called P2L, outperforms both image- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting.
    摘要 我们提出了一种新方法,使用文本到图像的潜在扩散模型作为通用先验来求解成像逆问题。现有使用潜在扩散模型求解逆问题的方法通常依赖简单的空文本提示,这可能导致次优的表现。为解决这一局限,我们提出了一种提示调优方法,在运行逆向扩散过程的同时在线联合优化文本嵌入,从而生成更符合扩散先验的图像。此外,我们还提出了通过投影使潜变量的演化保持在编码器值域内的方法,这有助于减少图像伪影——这是使用潜在扩散模型(而非像素扩散模型)时的主要问题。我们将二者结合的方法称为 P2L,它在超分辨率、去模糊和图像修复等多种任务上均优于基于图像扩散模型和潜在扩散模型的逆问题求解器。

Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.01107
  • repo_url: https://github.com/ground-a-video/ground-a-video
  • paper_authors: Hyeonho Jeong, Jong Chul Ye
  • for: 这篇论文旨在解决多属性视频编辑中的不足,提供一种无需在文本-视频数据上训练的视频到视频翻译框架,以实现多属性视频编辑。
  • methods: 该方法的核心是 Cross-Frame Gated Attention 机制,它以时间一致的方式将定位(grounding)信息注入潜在表示,并结合 Modulated Cross-Attention 以及光流引导的反演潜变量平滑技术。
  • results: 在多属性视频编辑任务上,Ground-A-Video 方法在无需训练的情况下,相比各基线方法取得了更高的编辑准确率与帧间一致性。
    Abstract Recent endeavors in video editing have showcased promising results in single-attribute editing or style transfer tasks, either by training text-to-video (T2V) models on text-video data or adopting training-free methods. However, when confronted with the complexities of multi-attribute editing scenarios, they exhibit shortcomings such as omitting or overlooking intended attribute changes, modifying the wrong elements of the input video, and failing to preserve regions of the input video that should remain intact. To address this, here we present a novel grounding-guided video-to-video translation framework called Ground-A-Video for multi-attribute video editing. Ground-A-Video attains temporally consistent multi-attribute editing of input videos in a training-free manner without aforementioned shortcomings. Central to our method is the introduction of Cross-Frame Gated Attention which incorporates groundings information into the latent representations in a temporally consistent fashion, along with Modulated Cross-Attention and optical flow guided inverted latents smoothing. Extensive experiments and applications demonstrate that Ground-A-Video's zero-shot capacity outperforms other baseline methods in terms of edit-accuracy and frame consistency. Further results and codes are provided at our project page (http://ground-a-video.github.io).
    摘要 近期的视频编辑研究在单属性编辑或风格迁移任务上展示了可喜的成果,其途径或是基于文本-视频数据训练文本到视频(T2V)模型,或是采用免训练方法。然而,面对多属性编辑场景的复杂性时,这些方法暴露出诸多不足,例如遗漏或忽略预期的属性修改、错误地修改输入视频中的元素,以及无法保持本应保持不变的区域。为此,我们提出了一种新的、以定位(grounding)为引导的视频到视频翻译框架 Ground-A-Video,用于多属性视频编辑。Ground-A-Video 以免训练的方式实现了对输入视频的时间一致的多属性编辑,且不存在上述缺陷。方法的核心是引入 Cross-Frame Gated Attention,它以时间一致的方式将定位信息融入潜在表示,并辅以 Modulated Cross-Attention 和光流引导的反演潜变量平滑。大量实验和应用表明,Ground-A-Video 的零样本能力在编辑准确率和帧间一致性方面优于其他基线方法。更多结果与代码见项目主页(http://ground-a-video.github.io)。

NP$^2$L: Negative Pseudo Partial Labels Extraction for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01098
  • repo_url: None
  • paper_authors: Xinjie Shen, Danyang Wu, Jitao Lu, Junjie Liang, Jin Xu, Feiping Nie
  • for: 提高 Pseudo 标签的准确性,并在图神经网络(GNNs)中应用。
  • methods: 使用不重叠的部分标签选择 pseudo 标签,并通过构建负边的方式将其与Message Passing Mechanism结合。
  • results: 在多个基准数据集的链接预测和节点分类任务上达到了最先进的表现。
    Abstract How to utilize the pseudo labels has always been a research hotspot in machine learning. However, most methods use pseudo labels as supervised training, and lack of valid assessing for their accuracy. Moreover, applications of pseudo labels in graph neural networks (GNNs) oversee the difference between graph learning and other machine learning tasks such as message passing mechanism. Aiming to address the first issue, we found through a large number of experiments that the pseudo labels are more accurate if they are selected by not overlapping partial labels and defined as negative node pairs relations. Therefore, considering the extraction based on pseudo and partial labels, negative edges are constructed between two nodes by the negative pseudo partial labels extraction (NP$^2$E) module. With that, a signed graph are built containing highly accurate pseudo labels information from the original graph, which effectively assists GNN in learning at the message-passing level, provide one solution to the second issue. Empirical results about link prediction and node classification tasks on several benchmark datasets demonstrate the effectiveness of our method. State-of-the-art performance is achieved on the both tasks.
    摘要 如何利用伪标签一直是机器学习领域的研究热点。然而,大多数方法将伪标签直接用于有监督训练,缺乏对其准确性的有效评估。此外,伪标签在图神经网络(GNNs)中的应用忽视了图学习与其他机器学习任务之间的差异,例如消息传递机制。针对第一个问题,我们通过大量实验发现,若以不重叠的部分标签来选择伪标签,并将其定义为负的节点对关系,则伪标签会更加准确。因此,基于伪标签与部分标签的提取,我们提出了负伪部分标签提取(NP$^2$E)模块,在两个节点之间构建负边。由此可以从原图构建出一个带符号图,其中包含高度准确的伪标签信息,从而在消息传递层面有效地辅助 GNN 学习,为第二个问题提供了一种解决方案。在多个基准数据集上的链接预测与节点分类实验结果证明了我们方法的有效性,并在这两类任务上均取得了最先进的性能。
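
The snippet below sketches one plausible reading of the negative pseudo partial label idea: pairs of nodes whose top-k predicted label sets are disjoint are treated as confident negative pairs and linked by negative edges. The top-k rule and toy predictions are assumptions, not the exact NP$^2$E procedure.

```python
# Illustrative sketch: derive negative edges from non-overlapping partial label sets.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_classes, k = 6, 4, 2
probs = rng.dirichlet(np.ones(n_classes), size=n_nodes)   # toy GNN class predictions

# Partial label set for each node: its top-k most probable classes.
partial = [set(np.argsort(p)[-k:]) for p in probs]

negative_edges = []
for i in range(n_nodes):
    for j in range(i + 1, n_nodes):
        # If the partial label sets are disjoint, the two nodes very likely belong
        # to different classes -> use (i, j) as a negative pseudo-labelled pair.
        if partial[i].isdisjoint(partial[j]):
            negative_edges.append((i, j))

print("partial label sets:", partial)
print("negative pseudo-labelled edges:", negative_edges)
```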

LoCUS: Learning Multiscale 3D-consistent Features from Posed Images

  • paper_url: http://arxiv.org/abs/2310.01095
  • repo_url: https://github.com/dakloepfer/locus
  • paper_authors: Dominik A. Kloepfer, Dylan Campbell, João F. Henriques
  • for: trains a neural network to learn a versatile representation of the world that can handle occlusions, previously-unseen views, and long time horizons without supervision.
  • methods: uses a patch retrieval objective to train the network, balancing retrieval and reusability by constructing the retrieval set carefully and adjusting the spatial tolerance.
  • results: demonstrates the effectiveness of the proposed method in creating sparse, multi-scale, semantic spatial maps composed of highly identifiable landmarks, with applications in landmark retrieval, localization, semantic segmentation, and instance segmentation.
    Abstract An important challenge for autonomous agents such as robots is to maintain a spatially and temporally consistent model of the world. It must be maintained through occlusions, previously-unseen views, and long time horizons (e.g., loop closure and re-identification). It is still an open question how to train such a versatile neural representation without supervision. We start from the idea that the training objective can be framed as a patch retrieval problem: given an image patch in one view of a scene, we would like to retrieve (with high precision and recall) all patches in other views that map to the same real-world location. One drawback is that this objective does not promote reusability of features: by being unique to a scene (achieving perfect precision/recall), a representation will not be useful in the context of other scenes. We find that it is possible to balance retrieval and reusability by constructing the retrieval set carefully, leaving out patches that map to far-away locations. Similarly, we can easily regulate the scale of the learned features (e.g., points, objects, or rooms) by adjusting the spatial tolerance for considering a retrieval to be positive. We optimize for (smooth) Average Precision (AP), in a single unified ranking-based objective. This objective also doubles as a criterion for choosing landmarks or keypoints, as patches with high AP. We show results creating sparse, multi-scale, semantic spatial maps composed of highly identifiable landmarks, with applications in landmark retrieval, localization, semantic segmentation and instance segmentation.
    摘要 对机器人等自主智能体而言,一个重要挑战是维护一个在空间和时间上一致的世界模型。该模型需要在遮挡、此前未见过的视角以及较长的时间跨度(例如回环闭合与重识别)下保持一致。如何在无监督条件下训练这样一个通用的神经表示仍是一个开放问题。我们的出发点是将训练目标表述为一个图像块检索问题:给定场景某一视角下的一个图像块,我们希望以高精确率和高召回率检索出其他视角中对应同一真实世界位置的所有图像块。该目标的一个缺点是它不鼓励特征的可复用性:若表示对某一场景是唯一的(达到完美的精确率/召回率),它在其他场景中便难以发挥作用。我们发现,通过精心构造检索集合、排除映射到远处位置的图像块,可以在检索能力与可复用性之间取得平衡。类似地,通过调整判定检索为正例的空间容差,可以方便地调控所学特征的尺度(例如点、物体或房间)。我们在单一的基于排序的统一目标中优化(平滑的)平均精度(AP)。该目标同时也可作为选取地标或关键点的准则,即选取 AP 较高的图像块。我们展示了由高度可辨识的地标构成的稀疏、多尺度、语义化空间地图,并将其应用于地标检索、定位、语义分割和实例分割。

Non-negative isomorphic neural networks for photonic neuromorphic accelerators

  • paper_url: http://arxiv.org/abs/2310.01084
  • repo_url: None
  • paper_authors: Manos Kirtas, Nikolaos Passalis, Nikolaos Pleros, Anastasios Tefas
  • for: 提高计算速度和能效,实现每次乘加(MAC)仅需飞焦(femtojoule)量级的能耗。
  • methods: 训练非负神经网络,并提出与常规网络同构的等价变换及保号优化方法,以充分利用非相干光子硬件的能力,从而规避额外的硬件复杂度与能效损失。
  • results: 实现了非负神经网络的训练和优化,并且能够与常见神经网络相比,保持同等的准确率。
    Abstract Neuromorphic photonic accelerators are becoming increasingly popular, since they can significantly improve computation speed and energy efficiency, leading to femtojoule per MAC efficiency. However, deploying existing DL models on such platforms is not trivial, since a great range of photonic neural network architectures relies on incoherent setups and power addition operational schemes that cannot natively represent negative quantities. This results in additional hardware complexity that increases cost and reduces energy efficiency. To overcome this, we can train non-negative neural networks and potentially exploit the full range of incoherent neuromorphic photonic capabilities. However, existing approaches cannot achieve the same level of accuracy as their regular counterparts, due to training difficulties, as also recent evidence suggests. To this end, we introduce a methodology to obtain the non-negative isomorphic equivalents of regular neural networks that meet requirements of neuromorphic hardware, overcoming the aforementioned limitations. Furthermore, we also introduce a sign-preserving optimization approach that enables training of such isomorphic networks in a non-negative manner.
    摘要 光子神经形态加速器正变得越来越受欢迎,因为它们能够显著提高计算速度和能效,实现每次乘加(MAC)仅需飞焦量级的能耗。然而,将现有深度学习模型部署到此类平台并非易事,因为大量光子神经网络架构依赖于非相干结构和功率叠加的运算方案,无法原生表示负数。这会带来额外的硬件复杂度,增加成本并降低能效。为克服这一问题,可以训练非负神经网络,从而充分利用非相干光子神经形态硬件的全部能力。然而,正如近期研究所表明的,由于训练上的困难,现有方法无法达到与常规网络相同的精度水平。为此,我们提出了一种方法,可将常规神经网络转换为满足神经形态硬件要求的非负同构等价网络,从而克服上述限制。此外,我们还提出了一种保号优化方法,使这类同构网络能够以非负的方式进行训练。
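
As a simple illustration of training with non-negative weights (a simpler stand-in for the isomorphic construction and sign-preserving optimisation described above, not the authors' method), the sketch below re-parameterises each weight as `softplus(theta)` so the effective weights stay non-negative throughout training. Layer sizes and the toy task are assumptions.

```python
# Minimal sketch of a non-negative MLP via softplus re-parameterisation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonNegativeLinear(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        w = F.softplus(self.theta)        # effective weights are always >= 0
        return x @ w.t() + self.bias

torch.manual_seed(0)
model = nn.Sequential(NonNegativeLinear(20, 32), nn.Sigmoid(),
                      NonNegativeLinear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

x = torch.rand(256, 20)                          # non-negative inputs, e.g. optical intensities
y = (x[:, 0] + x[:, 1] > 1).long()               # toy monotone binary task
for step in range(300):
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
print("final loss:", loss.item())
print("min effective weight:", float(F.softplus(model[0].theta).min()))
```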

Linear attention is (maybe) all you need (to understand transformer optimization)

  • paper_url: http://arxiv.org/abs/2310.01082
  • repo_url: None
  • paper_authors: Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra
  • for: 研究transformer训练的困难性,并设计优化器和各种优化策略。
  • methods: 使用简化后的线性 transformer 模型来解决回归任务,受到 J. von Oswald et al. (ICML 2023) 和 K. Ahn et al. (NeurIPS 2023) 的启发。
  • results: 结果发现,我们提出的线性化 transformer 模型能够复现 transformer 训练动态的许多显著特征。因此,这些结果表明,简化后的线性化 transformer 模型可能是理解 transformer 优化的一个有用且贴近现实的抽象。
    Abstract Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics. We make progress towards understanding the subtleties of training transformers by carefully studying a simple yet canonical linearized shallow transformer model. Specifically, we train linear transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023), and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that our proposed linearized models can reproduce several prominent aspects of transformer training dynamics. Consequently, the results obtained in this paper suggest that a simple linearized transformer model could actually be a valuable, realistic abstraction for understanding transformer optimization.
    摘要 众所周知,transformer 的训练十分困难,需要精心设计优化器并借助各种启发式技巧。我们通过细致研究一个简单而典型的线性化浅层 transformer 模型,朝着理解 transformer 训练的微妙之处迈出了一步。具体而言,我们受 J. von Oswald et al. (ICML 2023) 和 K. Ahn et al. (NeurIPS 2023) 的启发,训练线性 transformer 来求解回归任务。最重要的是,我们观察到所提出的线性化模型能够复现 transformer 训练动态的多个显著特征。因此,本文的结果表明,一个简单的线性化 transformer 模型实际上可能是理解 transformer 优化的一个有价值且贴近现实的抽象。
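
The sketch below sets up the kind of simplified model the abstract describes: a single linear-attention layer (softmax removed) trained on in-context linear regression tasks. The dimensions, token layout, and training loop are illustrative assumptions rather than the paper's exact setup.

```python
# Toy sketch: one linear-attention layer trained on in-context linear regression.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n_ctx, batch = 5, 20, 64

class LinearAttention(nn.Module):
    """One attention layer with the softmax removed: out = Z + (Z Wq)(Z Wk)^T (Z Wv) / n."""
    def __init__(self, dim):
        super().__init__()
        self.Wq = nn.Linear(dim, dim, bias=False)
        self.Wk = nn.Linear(dim, dim, bias=False)
        self.Wv = nn.Linear(dim, dim, bias=False)

    def forward(self, Z):                        # Z: [batch, tokens, dim]
        attn = self.Wq(Z) @ self.Wk(Z).transpose(1, 2) / Z.shape[1]
        return Z + attn @ self.Wv(Z)

def sample_tasks():
    # Each sequence: n_ctx (x, y) pairs from a random linear task, plus a query x.
    w = torch.randn(batch, d, 1)
    X = torch.randn(batch, n_ctx + 1, d)
    y = (X @ w).squeeze(-1)
    Z = torch.cat([X, y.unsqueeze(-1)], dim=-1)  # tokens are (x, y) concatenations
    Z[:, -1, -1] = 0.0                           # hide the query's label
    return Z, y[:, -1]

model = LinearAttention(d + 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    Z, target = sample_tasks()
    pred = model(Z)[:, -1, -1]                   # read the prediction off the query token
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(f"step {step}: in-context regression MSE = {loss.item():.3f}")
```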

Shaping of Magnetic Field Coils in Fusion Reactors using Bayesian Optimisation

  • paper_url: http://arxiv.org/abs/2310.01455
  • repo_url: None
  • paper_authors: Timothy Nunn, Vignesh Gopakumar, Sebastien Kahn
  • for: 这篇论文面向可持续核聚变能源反应堆的设计。
  • methods: 论文使用人工智能驱动的策略来探索设计搜索空间,并确定最优参数。
  • results: 结果表明,使用多输出贝叶斯优化方法可以确定托卡马克环向场线圈的形状,在最小化磁场波纹的同时最大化等离子体稳定性,并降低成本。
    Abstract Nuclear fusion using magnetic confinement holds promise as a viable method for sustainable energy. However, most fusion devices have been experimental and as we move towards energy reactors, we are entering into a new paradigm of engineering. Curating a design for a fusion reactor is a high-dimensional multi-output optimisation process. Through this work we demonstrate a proof-of-concept of an AI-driven strategy to help explore the design search space and identify optimum parameters. By utilising a Multi-Output Bayesian Optimisation scheme, our strategy is capable of identifying the Pareto front associated with the optimisation of the toroidal field coil shape of a tokamak. The optimisation helps to identify design parameters that would minimise the costs incurred while maximising the plasma stability by way of minimising magnetic ripples.
    摘要 基于磁约束的核聚变有望成为一种可行的可持续能源方式。然而,大多数聚变装置一直是实验性质的;随着向能源反应堆迈进,我们正进入一个新的工程范式。为聚变反应堆确定设计方案是一个高维、多输出的优化过程。通过这项工作,我们展示了一种人工智能驱动策略的概念验证,用于帮助探索设计搜索空间并确定最优参数。借助多输出贝叶斯优化方案,我们的策略能够识别托卡马克环向场线圈形状优化问题的帕累托前沿。该优化有助于确定在最小化成本的同时、通过减小磁场波纹来最大化等离子体稳定性的设计参数。
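
For concreteness, the snippet below runs a minimal single-objective Bayesian-optimisation loop (GP surrogate plus expected improvement) over a one-dimensional toy "ripple" objective; the paper's actual setting is multi-output with a Pareto front over coil-shape parameters, so everything here is a simplified stand-in.

```python
# Minimal Bayesian optimisation loop: GP surrogate + expected improvement (toy objective).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def ripple(x):                       # toy stand-in for a magnetic-ripple objective
    return np.sin(3 * x) + 0.3 * x ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(4, 1))  # initial design points (e.g. a coil shape parameter)
y = ripple(X).ravel()
grid = np.linspace(-2, 2, 400).reshape(-1, 1)

for it in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.min()
    # Expected improvement for minimisation.
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = grid[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, ripple(x_next))

print("best design parameter found:", X[np.argmin(y)].item())
print("best (toy) ripple value:", y.min())
```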

Back to the Future: Towards Explainable Temporal Reasoning with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01074
  • repo_url: https://github.com/chenhan97/timellama
  • paper_authors: Chenhan Yuan, Qianqian Xie, Jimin Huang, Sophia Ananiadou
  • for: 该论文的目的是提出一个新的时间推理任务,即可解释时间推理(Explainable Temporal Reasoning,ETR),用于评测 LLMs 的复杂时间推理能力与解释能力。
  • methods: 该论文采用一种新的知识图指导生成策略,将多个知识图数据集及其时间推理路径转化为 ExpTime 数据集,并基于该数据集提出了新的 LLM 系列 TimeLlaMA,能够进行指令跟随的可解释时间推理。
  • results: 实验结果显示,TimeLlaMA 在 ETR 任务上表现出色,在时间预测和解释能力方面均达到当前最佳性能。
    Abstract Temporal reasoning is a crucial NLP task, providing a nuanced understanding of time-sensitive contexts within textual data. Although recent advancements in LLMs have demonstrated their potential in temporal reasoning, the predominant focus has been on tasks such as temporal expression and temporal relation extraction. These tasks are primarily designed for the extraction of direct and past temporal cues and to engage in simple reasoning processes. A significant gap remains when considering complex reasoning tasks such as event forecasting, which requires multi-step temporal reasoning on events and prediction on the future timestamp. Another notable limitation of existing methods is their incapability to provide an illustration of their reasoning process, hindering explainability. In this paper, we introduce the first task of explainable temporal reasoning, to predict an event's occurrence at a future timestamp based on context which requires multiple reasoning over multiple events, and subsequently provide a clear explanation for their prediction. Our task offers a comprehensive evaluation of both the LLMs' complex temporal reasoning ability, the future event prediction ability, and explainability-a critical attribute for AI applications. To support this task, we present the first multi-source instruction-tuning dataset of explainable temporal reasoning (ExpTime) with 26k derived from the temporal knowledge graph datasets and their temporal reasoning paths, using a novel knowledge-graph-instructed-generation strategy. Based on the dataset, we propose the first open-source LLM series TimeLlaMA based on the foundation LlaMA2, with the ability of instruction following for explainable temporal reasoning. We compare the performance of our method and a variety of LLMs, where our method achieves the state-of-the-art performance of temporal prediction and explanation.
    摘要 时间推理是一项重要的自然语言处理(NLP)任务,它能对文本数据中与时间相关的上下文提供细致的理解。尽管近期大语言模型(LLMs)的进展已展示其在时间推理上的潜力,但现有工作的焦点主要集中在时间表达式和时间关系抽取等任务上。这些任务主要用于抽取直接的、面向过去的时间线索,并只涉及简单的推理过程。对于事件预测这类需要对多个事件进行多步时间推理并对未来时间点进行预测的复杂推理任务,仍存在显著空白。现有方法的另一个明显局限是无法展示其推理过程,从而妨碍了可解释性。在本文中,我们提出了首个可解释时间推理任务:基于需要对多个事件进行多步推理的上下文,预测某事件在未来某一时间点是否发生,并随后为该预测给出清晰的解释。该任务能够全面评估 LLMs 的复杂时间推理能力、未来事件预测能力以及可解释性——这是 AI 应用的一项关键属性。为支持该任务,我们基于时间知识图数据集及其时间推理路径,采用一种新颖的知识图指导生成策略,构建了首个可解释时间推理的多源指令微调数据集 ExpTime,包含 2.6 万条样本。基于该数据集,我们提出了首个开源 LLM 系列 TimeLlaMA,其以 LlaMA2 为基座,具备面向可解释时间推理的指令跟随能力。我们将所提方法与多种 LLMs 进行了比较,在时间预测与解释两方面均达到了最先进的性能。

KGEx: Explaining Knowledge Graph Embeddings via Subgraph Sampling and Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2310.01065
  • repo_url: None
  • paper_authors: Vasileios Baltatzis, Luca Costabello
  • for: 论文旨在为知识图嵌入(KGE)模型的链接预测提供解释。
  • methods: 论文提出了一种名为 KGEx 的新型事后(post-hoc)解释方法,通过在目标三元组邻域的不同子集上训练代理(surrogate)KGE 模型来解释单个链接预测,并借助蒸馏过程确保代理模型忠实于原始 KGE 模型。
  • results: 论文在两个公开数据集上证明,KGEx 能够给出忠实于原始 KGE 模型的解释。
    Abstract Despite being the go-to choice for link prediction on knowledge graphs, research on interpretability of knowledge graph embeddings (KGE) has been relatively unexplored. We present KGEx, a novel post-hoc method that explains individual link predictions by drawing inspiration from surrogate models research. Given a target triple to predict, KGEx trains surrogate KGE models that we use to identify important training triples. To gauge the impact of a training triple, we sample random portions of the target triple neighborhood and we train multiple surrogate KGE models on each of them. To ensure faithfulness, each surrogate is trained by distilling knowledge from the original KGE model. We then assess how well surrogates predict the target triple being explained, the intuition being that those leading to faithful predictions have been trained on impactful neighborhood samples. Under this assumption, we then harvest triples that appear frequently across impactful neighborhoods. We conduct extensive experiments on two publicly available datasets, to demonstrate that KGEx is capable of providing explanations faithful to the black-box model.
    摘要 尽管知识图嵌入(KGE)是知识图链接预测的首选方法,关于其可解释性的研究却相对匮乏。我们提出了 KGEx,一种新颖的事后解释方法,借鉴代理模型(surrogate model)研究来解释单个链接预测。给定一个待预测的目标三元组,KGEx 会训练若干代理 KGE 模型,并利用它们来识别重要的训练三元组。为衡量某个训练三元组的影响,我们从目标三元组的邻域中随机采样若干子集,并在每个子集上训练一个代理 KGE 模型;为保证忠实性,每个代理模型都通过从原始 KGE 模型蒸馏知识来训练。随后我们评估各个代理模型对被解释目标三元组的预测效果,其直觉在于:能够给出忠实预测的代理模型,其训练所用的邻域样本更具影响力。在此假设下,我们收集在多个高影响力邻域中频繁出现的三元组作为解释。我们在两个公开数据集上进行了大量实验,证明 KGEx 能够提供忠实于黑盒模型的解释。

Combining Deep Learning and GARCH Models for Financial Volatility and Risk Forecasting

  • paper_url: http://arxiv.org/abs/2310.01063
  • repo_url: None
  • paper_authors: Jakub Michańków, Łukasz Kwiatkowski, Janusz Morajda
  • for: 这个论文是为了研究一种结合常见 econometric GARCH 时间序列模型和深度学习神经网络的投资风险预测方法。
  • methods: 这个方法使用了 Gated Recurrent Unit (GRU) 神经网络,并使用了四种不同的 GARCH 组合:标准 GARCH、EGARCH、GJR-GARCH 和 APARCH。
  • results: 模型在标普 500 指数、黄金价格和比特币价格的日度对数收益率上进行测试,并采用基于价格区间的 Garman-Klass 估计器(经修改以纳入开盘价和收盘价)作为主要波动率估计。随后利用混合模型给出的波动率预测,在 5% 和 1% 两个置信水平上以风险价值(VaR)和预期损失(ES)评估资产风险。
    Abstract In this paper, we develop a hybrid approach to forecasting the volatility and risk of financial instruments by combining common econometric GARCH time series models with deep learning neural networks. For the latter, we employ Gated Recurrent Unit (GRU) networks, whereas four different specifications are used as the GARCH component: standard GARCH, EGARCH, GJR-GARCH and APARCH. Models are tested using daily logarithmic returns on the S&P 500 index as well as gold price Bitcoin prices, with the three assets representing quite distinct volatility dynamics. As the main volatility estimator, also underlying the target function of our hybrid models, we use the price-range-based Garman-Klass estimator, modified to incorporate the opening and closing prices. Volatility forecasts resulting from the hybrid models are employed to evaluate the assets' risk using the Value-at-Risk (VaR) and Expected Shortfall (ES) at two different tolerance levels of 5% and 1%. Gains from combining the GARCH and GRU approaches are discussed in the contexts of both the volatility and risk forecasts. In general, it can be concluded that the hybrid solutions produce more accurate point volatility forecasts, although it does not necessarily translate into superior VaR and ES forecasts.
    摘要 在这篇论文中,我们开发了一种混合方法来预测金融工具的波动率与风险,将常见的计量经济学 GARCH 时间序列模型与深度学习神经网络相结合。其中神经网络采用门控循环单元(GRU),而 GARCH 部分使用四种不同的设定:标准 GARCH、EGARCH、GJR-GARCH 和 APARCH。我们使用标普 500 指数、黄金价格和比特币价格的日度对数收益率进行测试,这三种资产的波动特性差异明显。作为主要的波动率估计量(同时也是混合模型目标函数的基础),我们采用基于价格区间的 Garman-Klass 估计器,并将其修改为纳入开盘价和收盘价。混合模型给出的波动率预测被用于在 5% 和 1% 两个置信水平上以风险价值(VaR)和预期损失(ES)评估资产风险。我们在波动率预测和风险预测两个层面讨论了结合 GARCH 与 GRU 的收益。总体而言,混合方案能够给出更准确的点波动率预测,但这并不必然转化为更优的 VaR 与 ES 预测。
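
The sketch below illustrates two ingredients of such a hybrid: the range-based Garman-Klass variance estimator (with an extra open-vs-previous-close term as one plausible way to "incorporate the opening and closing prices") and a small GRU that forecasts it one step ahead. The synthetic OHLC series, window length, and network size are assumptions; the paper additionally feeds GARCH-family components into the model, which is omitted here.

```python
# Garman-Klass range-based variance estimate plus a small GRU one-step forecaster (sketch).
import numpy as np
import torch
import torch.nn as nn

def garman_klass(open_, high, low, close, prev_close):
    """Per-bar variance estimate; the overnight term is one assumed modification."""
    hl = np.log(high / low) ** 2
    co = np.log(close / open_) ** 2
    oc = np.log(open_ / prev_close) ** 2            # overnight (open vs previous close) term
    return 0.5 * hl - (2 * np.log(2) - 1) * co + oc

# Synthetic OHLC series just to make the example runnable end to end.
rng = np.random.default_rng(0)
T = 600
close = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal(T)))
open_ = np.roll(close, 1) * np.exp(0.002 * rng.standard_normal(T))
high = np.maximum(open_, close) * np.exp(np.abs(0.005 * rng.standard_normal(T)))
low = np.minimum(open_, close) * np.exp(-np.abs(0.005 * rng.standard_normal(T)))
gk = garman_klass(open_[1:], high[1:], low[1:], close[1:], close[:-1])

# GRU forecaster: predict next-day GK variance from a window of past values.
win = 20
Xw = np.stack([gk[i:i + win] for i in range(len(gk) - win)])
X = torch.tensor(Xw, dtype=torch.float32).unsqueeze(-1)     # [N, win, 1]
y = torch.tensor(gk[win:], dtype=torch.float32)

class GRUVol(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.gru = nn.GRU(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.gru(x)
        return torch.relu(self.head(out[:, -1]))             # variance must be >= 0

model = GRUVol()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X).squeeze(-1), y)
    loss.backward()
    opt.step()
print("in-sample MSE of GK-variance forecast:", loss.item())
```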

Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning

  • paper_url: http://arxiv.org/abs/2310.01061
  • repo_url: https://github.com/rmanluo/reasoning-on-graphs
  • paper_authors: Linhao Luo, Yuan-Fang Li, Gholamreza Haffari, Shirui Pan
  • for: 提高大语言模型(LLM)的推理能力和可靠性,使其推理过程忠实且可解释。
  • methods: 提出一种基于知识图(KG)的推理方法 reasoning on graphs(RoG),通过协同 LLM 与 KG 实现忠实且可解释的推理过程。
  • results: 实验表明,RoG 在两个基准 KGQA 数据集上取得了最先进的性能,并生成忠实且可解释的推理结果。
    Abstract Large language models (LLMs) have demonstrated impressive reasoning abilities in complex tasks. However, they lack up-to-date knowledge and experience hallucinations during reasoning, which can lead to incorrect reasoning processes and diminish their performance and trustworthiness. Knowledge graphs (KGs), which capture vast amounts of facts in a structured format, offer a reliable source of knowledge for reasoning. Nevertheless, existing KG-based LLM reasoning methods only treat KGs as factual knowledge bases and overlook the importance of their structural information for reasoning. In this paper, we propose a novel method called reasoning on graphs (RoG) that synergizes LLMs with KGs to enable faithful and interpretable reasoning. Specifically, we present a planning-retrieval-reasoning framework, where RoG first generates relation paths grounded by KGs as faithful plans. These plans are then used to retrieve valid reasoning paths from the KGs for LLMs to conduct faithful reasoning. Furthermore, RoG not only distills knowledge from KGs to improve the reasoning ability of LLMs through training but also allows seamless integration with any arbitrary LLMs during inference. Extensive experiments on two benchmark KGQA datasets demonstrate that RoG achieves state-of-the-art performance on KG reasoning tasks and generates faithful and interpretable reasoning results.
    摘要 大型语言模型(LLM)在复杂任务中展现出令人印象深刻的推理能力。然而,它们缺乏最新的知识,并且在推理过程中会产生幻觉,这可能导致错误的推理过程,降低其性能与可信度。知识图(KG)以结构化的形式存储海量事实,可为推理提供可靠的知识来源。然而,现有基于 KG 的 LLM 推理方法仅将 KG 视为事实知识库,忽略了其结构信息对推理的重要性。在本文中,我们提出了一种名为“图上推理”(RoG)的新方法,协同 LLM 与 KG 以实现忠实且可解释的推理。具体而言,我们提出了一个“规划-检索-推理”框架:RoG 首先生成以 KG 为依据的关系路径作为忠实的规划,随后利用这些规划从 KG 中检索有效的推理路径,供 LLM 进行忠实的推理。此外,RoG 不仅能在训练中从 KG 中蒸馏知识以提升 LLM 的推理能力,还能在推理阶段与任意 LLM 无缝集成。在两个基准 KGQA 数据集上的大量实验表明,RoG 在 KG 推理任务上取得了最先进的性能,并能生成忠实且可解释的推理结果。

Improved Crop and Weed Detection with Diverse Data Ensemble Learning in Agriculture

  • paper_url: http://arxiv.org/abs/2310.01055
  • repo_url: None
  • paper_authors: Muhammad Hamza Asad, Saeed Anwar, Abdul Bais
  • for: 这项研究的目的是提升深度学习技术在田间条件多变的现代农业中的应用,特别是作物与杂草的检测、定位和量化。
  • methods: 这项研究提出了一种集成(Ensemble)框架,利用在不同数据集上训练的多个作物与杂草模型,并采用师生(teacher-student)配置;基模型通过同质堆叠并由可训练的 UNET 元架构融合输出,以提升作物和杂草的语义分割。
  • results: 研究表明,利用多种作物和杂草模型并结合师生配置,可以在未见过的测试数据上提升 Canola 作物与 Kochia 杂草的语义分割性能,超过单一语义分割模型的表现。此外,我们还通过消融实验验证了所提模型的有效性。
    Abstract Modern agriculture heavily relies on Site-Specific Farm Management practices, necessitating accurate detection, localization, and quantification of crops and weeds in the field, which can be achieved using deep learning techniques. In this regard, crop and weed-specific binary segmentation models have shown promise. However, uncontrolled field conditions limit their performance from one field to the other. To improve semantic model generalization, existing methods augment and synthesize agricultural data to account for uncontrolled field conditions. However, given highly varied field conditions, these methods have limitations. To overcome the challenges of model deterioration in such conditions, we propose utilizing data specific to other crops and weeds for our specific target problem. To achieve this, we propose a novel ensemble framework. Our approach involves utilizing different crop and weed models trained on diverse datasets and employing a teacher-student configuration. By using homogeneous stacking of base models and a trainable meta-architecture to combine their outputs, we achieve significant improvements for Canola crops and Kochia weeds on unseen test data, surpassing the performance of single semantic segmentation models. We identify the UNET meta-architecture as the most effective in this context. Finally, through ablation studies, we demonstrate and validate the effectiveness of our proposed model. We observe that including base models trained on other target crops and weeds can help generalize the model to capture varied field conditions. Lastly, we propose two novel datasets with varied conditions for comparisons.
    摘要 现代农业高度依赖定点农业管理实践,需要在田间准确地检测、定位并量化作物与杂草,而这可以借助深度学习技术实现。在这方面,针对特定作物和杂草的二值分割模型已展现出潜力。然而,不可控的田间条件限制了模型在不同地块之间的表现。为提升语义分割模型的泛化能力,现有方法通过增广与合成农业数据来应对不可控的田间条件;但鉴于田间条件差异巨大,这些方法仍有局限。为克服模型在此类条件下退化的挑战,我们提出针对目标问题利用其他作物和杂草的数据,并为此设计了一种新颖的集成框架。我们的方法利用在多样化数据集上训练的不同作物与杂草模型,并采用师生(teacher-student)配置;通过对基模型进行同质堆叠,并用可训练的元架构融合其输出,我们在未见过的测试数据上显著提升了 Canola 作物与 Kochia 杂草的分割效果,超过了单一语义分割模型的性能。我们发现 UNET 元架构在此场景下最为有效。最后,我们通过消融实验证明并验证了所提模型的有效性:引入在其他目标作物和杂草上训练的基模型,有助于模型泛化以应对多变的田间条件。此外,我们还提出了两个条件各异的新数据集用于比较。

Subtractor-Based CNN Inference Accelerator

  • paper_url: http://arxiv.org/abs/2310.01022
  • repo_url: None
  • paper_authors: Victor Gao, Issam Hammad, Kamal El-Sankary, Jason Gu
  • for: 通过在卷积中用一次减法取代一次乘法和一次加法,提升 CNN 推理加速器的性能。
  • methods: 提出了一种新的 CNN 预处理加速器,通过对权重进行排序、分组和舍入,构造出可在推理卷积时以单次减法替代一次乘法加一次加法的组合。由于乘法在功耗和面积上代价高昂,以减法替代可降低功耗与面积,从而提升性能。
  • results: 在 MNIST 数据集上用 LeNet-5 进行测试,在仅损失 0.1% 精度的情况下实现了 32.03% 的功耗降低和 24.59% 的面积缩减。
    Abstract This paper presents a novel method to boost the performance of CNN inference accelerators by utilizing subtractors. The proposed CNN preprocessing accelerator relies on sorting, grouping, and rounding the weights to create combinations that allow for the replacement of one multiplication operation and addition operation by a single subtraction operation when applying convolution during inference. Given the high cost of multiplication in terms of power and area, replacing it with subtraction allows for a performance boost by reducing power and area. The proposed method allows for controlling the trade-off between performance gains and accuracy loss through increasing or decreasing the usage of subtractors. With a rounding size of 0.05 and by utilizing LeNet-5 with the MNIST dataset, the proposed design can achieve 32.03% power savings and a 24.59% reduction in area at the cost of only 0.1% in terms of accuracy loss.
    摘要 本文提出了一种利用减法器提升 CNN 推理加速器性能的新方法。所提出的 CNN 预处理加速器通过对权重进行排序、分组和舍入,构造出能够在推理卷积时以单次减法取代一次乘法与一次加法的权重组合。鉴于乘法在功耗和面积方面的高昂代价,以减法替代乘法可通过降低功耗与面积来提升性能。该方法还可以通过增减减法器的使用比例来调控性能收益与精度损失之间的权衡。在舍入粒度为 0.05 的设定下,基于 LeNet-5 与 MNIST 数据集,所提设计在仅损失 0.1% 精度的情况下实现了 32.03% 的功耗节省和 24.59% 的面积缩减。

ETGraph: A Pioneering Dataset Bridging Ethereum and Twitter

  • paper_url: http://arxiv.org/abs/2310.01015
  • repo_url: None
  • paper_authors: Qian Wang, Zhen Zhang, Zemin Liu, Shengliang Lu, Bingqiao Luo, Bingsheng He
  • for: The paper aims to address the limitation of existing public blockchain datasets by incorporating relevant social network data into blockchain analysis.
  • methods: The paper introduces ETGraph, a novel dataset that combines Ethereum transaction records and Twitter following data to authentically link Ethereum addresses with verified Twitter accounts.
  • results: The paper demonstrates the significance of Twitter data in enhancing Ethereum analysis through detailed statistical analysis and extensive experiments, including Ethereum link prediction, wash-trading Ethereum addresses detection, and Twitter-Ethereum matching link prediction.
    Abstract While numerous public blockchain datasets are available, their utility is constrained by a singular focus on blockchain data. This constraint limits the incorporation of relevant social network data into blockchain analysis, thereby diminishing the breadth and depth of insight that can be derived. To address the above limitation, we introduce ETGraph, a novel dataset that authentically links Ethereum and Twitter, marking the first and largest dataset of its kind. ETGraph combines Ethereum transaction records (2 million nodes and 30 million edges) and Twitter following data (1 million nodes and 3 million edges), bonding 30,667 Ethereum addresses with verified Twitter accounts sourced from OpenSea. Detailed statistical analysis on ETGraph highlights the structural differences between Twitter-matched and non-Twitter-matched Ethereum addresses. Extensive experiments, including Ethereum link prediction, wash-trading Ethereum addresses detection, and Twitter-Ethereum matching link prediction, emphasize the significant role of Twitter data in enhancing Ethereum analysis. ETGraph is available at https://etgraph.deno.dev/.
    摘要 虽然已有许多公开的区块链数据集可用,但它们的用途受限于仅聚焦区块链数据本身。这一限制使相关的社交网络数据难以纳入区块链分析,从而削弱了可得到的洞见的广度与深度。为解决上述局限,我们介绍 ETGraph,一个将 Ethereum 与 Twitter 真实关联起来的新数据集,也是同类数据集中的首个和最大的一个。ETGraph 将 Ethereum 交易记录(200 万节点、3000 万条边)与 Twitter 关注数据(100 万节点、300 万条边)结合,把 30,667 个 Ethereum 地址与来自 OpenSea 的经验证 Twitter 账户绑定起来。对 ETGraph 的详细统计分析凸显了匹配到 Twitter 与未匹配到 Twitter 的 Ethereum 地址之间的结构差异。包括 Ethereum 链接预测、清洗交易(wash-trading)Ethereum 地址检测以及 Twitter-Ethereum 匹配链接预测在内的大量实验,都强调了 Twitter 数据在增强 Ethereum 分析方面的重要作用。ETGraph 可在 https://etgraph.deno.dev/ 获取。

Efficient Algorithms for the CCA Family: Unconstrained Objectives with Unbiased Gradients

  • paper_url: http://arxiv.org/abs/2310.01012
  • repo_url: None
  • paper_authors: James Chapman, Ana Lawry Aguila, Lennie Wells
  • for: 这篇论文主要研究多视角学习中的典型相关分析(CCA)家族方法,及其扩展与改进。
  • methods: 这篇论文提出了一种刻画广义特征值问题(GEP)顶层子空间的新型无约束目标函数,并由此得到一族基于随机梯度下降(SGD)的快速算法,用于随机 PLS、随机 CCA 和深度 CCA,以提高 CCA 方法的效率和精度。
  • results: 实验结果表明,新提出的方法收敛更快,在多个标准 CCA 与深度 CCA 基准上取得更高的相关度和更好的性能,并能支持大规模生物医学数据分析。此外,该论文还建立了与经典 CCA 的理论联系。
    Abstract The Canonical Correlation Analysis (CCA) family of methods is foundational in multi-view learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and unified with a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. These methods show far faster convergence and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. This speed allows us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 variants. Finally, we not only match the performance of `CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, but also establish the first solid theoretical links to classical CCA, laying the groundwork for future insights.
    摘要 典型相关分析(CCA)家族方法是多视角学习的基础。带正则化的线性 CCA 方法可视为偏最小二乘(PLS)的推广,并可在广义特征值问题(GEP)框架下统一。然而,这些线性方法的经典算法在大规模数据上计算上不可行。深度 CCA 的扩展展现出巨大潜力,但现有训练流程缓慢且复杂。我们首先提出了一个刻画 GEP 顶层子空间的新型无约束目标函数。我们的核心贡献是一族用于随机 PLS、随机 CCA 和深度 CCA 的快速算法,它们仅需对相应的 CCA 目标函数应用随机梯度下降(SGD)即可得到。在所有标准 CCA 与深度 CCA 基准上,这些方法的收敛速度远快于此前的最先进方法,并能恢复更高的相关性。这一速度使我们能够对来自 UK Biobank 的超大规模生物医学数据集(超过 33,000 名个体和 500,000 个变异位点)进行首次此类 PLS 分析。最后,我们不仅在几乎无需超参数调优的情况下,在 CIFAR-10 和 CIFAR-100 上达到了“CCA 家族”自监督学习(SSL)方法的性能,还首次建立了与经典 CCA 的坚实理论联系,为未来的研究奠定了基础。
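
A minimal stochastic-CCA sketch in the spirit of the abstract is shown below: an unconstrained objective (a cross-correlation term plus a whitening-style product penalty) estimated on minibatches and minimised with plain SGD. This particular loss is one reasonable unconstrained formulation chosen for illustration and may differ from the paper's exact objective.

```python
# Illustrative stochastic CCA: unconstrained minibatch objective optimised with SGD.
import torch

torch.manual_seed(0)
n, dx, dy, k = 5000, 30, 20, 4

# Two synthetic views sharing k latent components.
z = torch.randn(n, k)
X = z @ torch.randn(k, dx) + 0.5 * torch.randn(n, dx)
Y = z @ torch.randn(k, dy) + 0.5 * torch.randn(n, dy)
X, Y = X - X.mean(0), Y - Y.mean(0)

A = (0.1 * torch.randn(dx, k)).requires_grad_(True)
B = (0.1 * torch.randn(dy, k)).requires_grad_(True)
opt = torch.optim.SGD([A, B], lr=1e-3)

def minibatch_loss(xb, yb):
    u, v = xb @ A, yb @ B
    c_uv = u.T @ v / len(xb)               # cross-covariance of projections
    c_uu = u.T @ u / len(xb)
    c_vv = v.T @ v / len(xb)
    # Maximise correlation while the product penalty discourages unbounded scaling.
    return -2 * torch.trace(c_uv) + torch.trace(c_uu @ c_vv)

for step in range(3000):
    idx = torch.randint(0, n, (256,))
    opt.zero_grad()
    loss = minibatch_loss(X[idx], Y[idx])
    loss.backward()
    opt.step()

# Report the per-component correlations of the learned projections.
u, v = X @ A, Y @ B
corr = [torch.corrcoef(torch.stack([u[:, i], v[:, i]]))[0, 1].item() for i in range(k)]
print("projection correlations:", [round(c, 3) for c in corr])
```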

Towards Fixing Clever-Hans Predictors with Counterfactual Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2310.01011
  • repo_url: None
  • paper_authors: Sidney Bender, Christopher J. Anders, Pattarawatt Chormai, Heike Marxfeld, Jan Herrmann, Grégoire Montavon
  • for: 本文提出了一种名为反事实知识蒸馏(CFKD)的新方法,借助人类专家反馈来检测并消除深度学习模型对混杂因素(confounders)的依赖。
  • methods: 本文利用反事实解释(counterfactual explanations)技术,并结合人类专家反馈对模型进行修正。
  • results: 本文在人为增强的数据集和真实的组织病理学数据集上验证了 CFKD 的有效性。
    Abstract This paper introduces a novel technique called counterfactual knowledge distillation (CFKD) to detect and remove reliance on confounders in deep learning models with the help of human expert feedback. Confounders are spurious features that models tend to rely on, which can result in unexpected errors in regulated or safety-critical domains. The paper highlights the benefit of CFKD in such domains and shows some advantages of counterfactual explanations over other types of explanations. We propose an experiment scheme to quantitatively evaluate the success of CFKD and different teachers that can give feedback to the model. We also introduce a new metric that is better correlated with true test performance than validation accuracy. The paper demonstrates the effectiveness of CFKD on synthetically augmented datasets and on real-world histopathological datasets.
    摘要 本文提出了一种名为反事实知识蒸馏(CFKD)的新技术,借助人类专家反馈来检测并消除深度学习模型对混杂因素的依赖。混杂因素是模型倾向于依赖的虚假特征,在受监管或安全攸关的领域中可能导致意料之外的错误。本文强调了 CFKD 在此类领域中的价值,并展示了反事实解释相对于其他解释类型的若干优势。我们提出了一套实验方案,用于定量评估 CFKD 的效果以及不同“教师”向模型提供反馈的能力,并引入了一种比验证准确率更能反映真实测试性能的新指标。本文在人为增强的数据集和真实的组织病理学数据集上证明了 CFKD 的有效性。

Using Reinforcement Learning to Optimize Responses in Care Processes: A Case Study on Aggression Incidents

  • paper_url: http://arxiv.org/abs/2310.00981
  • repo_url: None
  • paper_authors: Bart J. Verhoef, Xixi Lu
  • for: 本研究旨在为护理人员寻找应对服务对象攻击行为事件的最优策略。
  • methods: 本研究基于护理过程的事件数据构建马尔可夫决策过程,并采用 Q-learning 与 SARSA 算法寻找最优策略。
  • results: 研究结果显示,由 Q-learning 与 SARSA 推导出的策略与当前最常用的应对行为相似,但在某些情况下为工作人员提供了更多选择。
    Abstract Previous studies have used prescriptive process monitoring to find actionable policies in business processes and conducted case studies in similar domains, such as the loan application process and the traffic fine process. However, care processes tend to be more dynamic and complex. For example, at any stage of a care process, a multitude of actions is possible. In this paper, we follow the reinforcement approach and train a Markov decision process using event data from a care process. The goal was to find optimal policies for staff members when clients are displaying any type of aggressive behavior. We used the reinforcement learning algorithms Q-learning and SARSA to find optimal policies. Results showed that the policies derived from these algorithms are similar to the most frequent actions currently used but provide the staff members with a few more options in certain situations.
    摘要 以往的研究利用规范性过程监控(prescriptive process monitoring)在业务流程中寻找可执行的策略,并在类似领域(如贷款申请流程和交通罚款流程)开展了案例研究。然而,护理流程往往更加动态和复杂,例如在护理流程的任一阶段都可能存在多种可选动作。在本文中,我们采用强化学习的思路,利用护理流程的事件数据训练一个马尔可夫决策过程,目标是在服务对象表现出任何形式的攻击行为时,为工作人员找到最优策略。我们使用 Q-learning 和 SARSA 两种强化学习算法来求解最优策略。结果显示,由这些算法推导出的策略与当前最常用的应对行为相似,但在某些情况下为工作人员提供了更多的选择。
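
To make the reinforcement-learning setup concrete, the toy example below trains tabular Q-learning and SARSA on an invented three-state "aggression incident" MDP and prints the greedy policy each algorithm learns. All states, actions, rewards, and transition probabilities are made up for illustration.

```python
# Tabular Q-learning vs. SARSA on a toy care-process MDP (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
states = ["calm", "agitated", "aggressive"]
actions = ["talk", "give_space", "call_colleague"]
nS, nA = len(states), len(actions)

# P[s, a] = probability distribution over next states; R[s, a] = immediate reward.
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.normal(size=(nS, nA))

def step(s, a):
    return rng.choice(nS, p=P[s, a]), R[s, a]

def eps_greedy(Q, s, eps=0.1):
    return rng.integers(nA) if rng.random() < eps else int(np.argmax(Q[s]))

def train(algo="q_learning", episodes=2000, alpha=0.1, gamma=0.9, horizon=20):
    Q = np.zeros((nS, nA))
    for _ in range(episodes):
        s = rng.integers(nS)
        a = eps_greedy(Q, s)
        for _ in range(horizon):
            s2, r = step(s, a)
            a2 = eps_greedy(Q, s2)
            if algo == "q_learning":     # off-policy: bootstrap with the greedy action
                target = r + gamma * Q[s2].max()
            else:                        # SARSA, on-policy: bootstrap with the action taken
                target = r + gamma * Q[s2, a2]
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s2, a2
    return Q

for algo in ["q_learning", "sarsa"]:
    Q = train(algo)
    policy = {states[s]: actions[int(np.argmax(Q[s]))] for s in range(nS)}
    print(algo, "->", policy)
```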

All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization

  • paper_url: http://arxiv.org/abs/2310.00964
  • repo_url: None
  • paper_authors: Pablo Barros, Alessandra Sciutti
  • for: 这篇论文目的是为了解决在竞争性游戏场景下多个代理需要学习决策,以最大化自己的目标并最小化对手的目标。
  • methods: 这篇论文提出了一种由三个神经层组成的新模型,用于学习竞争性游戏的表示、刻画特定对手的策略并设法扰乱它们。整个模型在线训练,使用基于对比优化的组合损失函数来学习竞争性多人游戏。
  • results: 我们在宝可梦对战场景和四人竞争性 Chef's Hat 纸牌游戏中进行了实验,结果表明我们的模型在对阵离线、在线及针对竞争场景设计的模型时表现更好,尤其是在与同一对手多次对战时。我们还讨论了模型在针对特定对手策略学习方面的表现。
    Abstract In a competitive game scenario, a set of agents have to learn decisions that maximize their goals and minimize their adversaries' goals at the same time. Besides dealing with the increased dynamics of the scenarios due to the opponents' actions, they usually have to understand how to overcome the opponent's strategies. Most of the common solutions, usually based on continual learning or centralized multi-agent experiences, however, do not allow the development of personalized strategies to face individual opponents. In this paper, we propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategy of specific opponents, and how to disrupt them. The entire model is trained online, using a composed loss based on a contrastive optimization, to learn competitive and multiplayer games. We evaluate our model on a pokemon duel scenario and the four-player competitive Chef's Hat card game. Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times. We also present a discussion on the impact of our model, in particular on how well it deals with on specific strategy learning for each of the two scenarios.
    摘要 在竞争性游戏场景下,一组代理需要学习决策以最大化自己的目标并最小化对手的目标。除了面对对手的动态外,它们通常还需要理解如何超越对手的策略。大多数常见的解决方案,通常基于连续学习或中央多代理经验,然而这些方案并不允许发展个性化策略面对特定对手。在这篇论文中,我们提出了一种新的模型,包括三层神经网络,学习竞争游戏的表示、对特定对手策略的映射以及如何破坏它们。整个模型在线上训练,使用组合损失基于对比优化,以学习竞争和多人游戏。我们在POKEMON战斗场景和四名竞争对手的Chef's Hat卡牌游戏中进行了实验。我们的实验表明,我们的模型在面对离线、在线和竞争特定模型时表现更好,特别是在与同一个对手多次交手。我们还提供了关于我们模型的影响,包括对每个场景的特定策略学习的讨论。

Multi-Agent Bayesian Optimization with Coupled Black-Box and Affine Constraints

  • paper_url: http://arxiv.org/abs/2310.00962
  • repo_url: None
  • paper_authors: Wenjie Xu, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones
  • for: 该论文研究了分布式多代理器搜索问题,其中有 coupling black-box 约束和知道的 affine 约束。
  • methods: 提出了一种 primal-dual 分布式算法,可以在黑盒目标函数和约束函数下实现与单代理情况类似的 regret/violation 界。此外,该算法保证已知 affine 约束的累计违规量不超过 $\mathcal{O}(N\sqrt{T})$,从而样本平均值满足 affine 约束的误差不超过 $\mathcal{O}(N/\sqrt{T})$。
  • results: 应用于 Gaussian processes 和无线通信优化问题,结果表明该方法可以同时实现近似优秀性和保持平均违规程度较小,证实了我们的理论分析。
    Abstract This paper studies the problem of distributed multi-agent Bayesian optimization with both coupled black-box constraints and known affine constraints. A primal-dual distributed algorithm is proposed that achieves similar regret/violation bounds as those in the single-agent case for the black-box objective and constraint functions. Additionally, the algorithm guarantees an $\mathcal{O}(N\sqrt{T})$ bound on the cumulative violation for the known affine constraints, where $N$ is the number of agents. Hence, it is ensured that the average of the samples satisfies the affine constraints up to the error $\mathcal{O}(N/\sqrt{T})$. Furthermore, we characterize certain conditions under which our algorithm can bound a stronger metric of cumulative violation and provide best-iterate convergence without affine constraint. The method is then applied to both sampled instances from Gaussian processes and a real-world optimal power allocation problem for wireless communication; the results show that our method simultaneously provides close-to-optimal performance and maintains minor violations on average, corroborating our theoretical analysis.
    摘要 本文研究带有耦合黑盒约束和已知仿射约束的分布式多代理贝叶斯优化问题。我们提出了一种原始-对偶分布式算法,对黑盒目标函数和约束函数可以达到与单代理情形相近的 regret/violation 界;同时,该算法保证已知仿射约束的累计违规量不超过 $\mathcal{O}(N\sqrt{T})$(其中 $N$ 为代理数),因而样本平均值满足仿射约束的误差不超过 $\mathcal{O}(N/\sqrt{T})$。我们进一步刻画了算法能够约束更强的累计违规度量、并在无仿射约束时具有 best-iterate 收敛性的条件。该方法被应用于高斯过程采样实例和无线通信中的实际最优功率分配问题,结果表明其在取得接近最优性能的同时平均违规很小,印证了理论分析。

Deep Learning in Computational Biology: Advancements, Challenges, and Future Outlook

  • paper_url: http://arxiv.org/abs/2310.03086
  • repo_url: None
  • paper_authors: Suresh Kumar, Dhanyashri Guruparan, Pavithren Aaron, Philemon Telajan, Kavinesh Mahadevan, Dinesh Davagandhi, Ong Xin Yue
  • for: 这篇文章主要关注 Computational Biology 中的 Deep Learning 技术,包括 DNA 序列分类和调适、蛋白结构预测等。
  • methods: 文章使用了 Deep Learning 技术,包括 Convolutional Neural Networks (CNNs) 和其他进阶模型,以分析生物资料。
  • results: 文章指出 Deep Learning 技术在 Computational Biology 中已经获得了重要的进步,包括 DNA 序列分类和调适、蛋白结构预测等领域。同时,文章还提出了一些挑战,例如需要大量 Labelled 数据和模型解释等问题。
    Abstract Deep learning has become a powerful tool in computational biology, revolutionising the analysis and interpretation of biological data over time. In our article review, we delve into various aspects of deep learning in computational biology. Specifically, we examine its history, advantages, and challenges. Our focus is on two primary applications: DNA sequence classification and prediction, as well as protein structure prediction from sequence data. Additionally, we provide insights into the outlook for this field. To fully harness the potential of deep learning in computational biology, it is crucial to address the challenges that come with it. These challenges include the requirement for large, labelled datasets and the interpretability of deep learning models. The use of deep learning in the analysis of DNA sequences has brought about a significant transformation in the detection of genomic variants and the analysis of gene expression. This has greatly contributed to the advancement of personalised medicine and drug discovery. Convolutional neural networks (CNNs) have been shown to be highly accurate in predicting genetic variations and gene expression levels. Deep learning techniques are used for analysing epigenetic data, including DNA methylation and histone modifications. This provides valuable insights into metabolic conditions and gene regulation. The field of protein structure prediction has been significantly impacted by deep learning, which has enabled accurate determination of the three-dimensional shape of proteins and prediction of their interactions. The future of deep learning in computational biology looks promising. With the development of advanced deep learning models and interpretation techniques, there is potential to overcome current challenges and further our understanding of biological systems.
    摘要 深度学习已成为计算生物学中的强大工具,不断革新生物数据的分析与解读。本综述探讨了深度学习在计算生物学中的历史、优势与挑战,重点关注两类主要应用:DNA 序列的分类与预测,以及基于序列数据的蛋白质结构预测,并展望了该领域的前景。要充分发挥深度学习的潜力,必须解决随之而来的挑战,包括对大规模标注数据的需求以及模型的可解释性。深度学习在 DNA 序列分析中的应用显著改变了基因组变异检测和基因表达分析,极大推动了个性化医疗和药物发现;卷积神经网络(CNN)在预测遗传变异和基因表达水平方面表现出很高的准确率。深度学习技术还被用于分析表观遗传数据(包括 DNA 甲基化和组蛋白修饰),为代谢状况和基因调控提供了宝贵洞见。蛋白质结构预测领域同样深受深度学习影响,使准确确定蛋白质三维结构并预测其相互作用成为可能。随着更先进的深度学习模型和解释技术的发展,有望克服当前挑战,进一步加深我们对生物系统的理解。
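To make the "CNNs on DNA sequences" claim concrete, the following is a toy sketch of a 1D convolutional classifier over one-hot-encoded bases; the architecture, sequence length, and random data are illustrative assumptions, not any cited system.

```python
# Minimal 1D CNN for DNA sequence classification on one-hot-encoded bases.
import random
import torch
import torch.nn as nn

BASES = "ACGT"

def one_hot(seq):
    x = torch.zeros(4, len(seq))
    for i, b in enumerate(seq):
        x[BASES.index(b), i] = 1.0
    return x

class DnaCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(4, 32, kernel_size=8), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=8), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),            # global max pooling over positions
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                       # x: (batch, 4, seq_len)
        return self.head(self.conv(x).squeeze(-1))

# Usage with random 100-bp sequences.
seqs = ["".join(random.choice(BASES) for _ in range(100)) for _ in range(8)]
batch = torch.stack([one_hot(s) for s in seqs])
print(DnaCNN()(batch).shape)                    # torch.Size([8, 2])
```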

Distilling Influences to Mitigate Prediction Churn in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.00946
  • repo_url: None
  • paper_authors: Andreas Roth, Thomas Liebig
  • for: 本文研究graph neural networks(GNN)中的预测变化现象(prediction churn),通过比较不同模型的初始化和特征使用情况来解释这个现象。
  • methods: 本文提出了一个新的度量方法 Influence Difference(ID),用于量化不同模型之间节点所依据原因的变化。此外,文章还考察了预测稳定与不稳定节点之间的差异,并提出了基于 ID 的知识蒸馏方法 DropDistillation(DD),以逐节点提高预测稳定性。
  • results: 文章的实验结果表明,DropDistillation(DD)可以在六个 benchmark dataset 上改善节点预测的稳定性和总体表现,并且在知识蒸馏中优于先前的方法。
    Abstract Models with similar performances exhibit significant disagreement in the predictions of individual samples, referred to as prediction churn. Our work explores this phenomenon in graph neural networks by investigating differences between models differing only in their initializations in their utilized features for predictions. We propose a novel metric called Influence Difference (ID) to quantify the variation in reasons used by nodes across models by comparing their influence distribution. Additionally, we consider the differences between nodes with a stable and an unstable prediction, positing that both equally utilize different reasons and thus provide a meaningful gradient signal to closely match two models even when the predictions for nodes are similar. Based on our analysis, we propose to minimize this ID in Knowledge Distillation, a domain where a new model should closely match an established one. As an efficient approximation, we introduce DropDistillation (DD) that matches the output for a graph perturbed by edge deletions. Our empirical evaluation of six benchmark datasets for node classification validates the differences in utilized features. DD outperforms previous methods regarding prediction stability and overall performance in all considered Knowledge Distillation experiments.
    摘要 性能相近的模型在单个样本上的预测往往存在显著分歧,这一现象被称为预测波动(prediction churn)。我们在图神经网络中研究这一现象,通过比较仅初始化不同的模型在预测中所依赖的特征差异来加以解释。我们提出了一种新的度量——影响差异(Influence Difference, ID),通过比较节点的影响分布来量化不同模型在各节点上所依据原因的变化。此外,我们还考察了预测稳定与不稳定节点之间的差异,认为二者同样依赖不同的原因,因此即使预测结果相近,也能提供有意义的梯度信号来使两个模型相互匹配。基于上述分析,我们建议在知识蒸馏(即新模型需要紧密匹配既有模型的场景)中最小化 ID。作为一种高效近似,我们提出了 DropDistillation(DD),即在删除部分边后的扰动图上匹配输出。我们在六个节点分类基准数据集上的实证评估验证了所用特征的差异;在所有知识蒸馏实验中,DD 在预测稳定性和整体性能方面均优于先前方法。
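A rough sketch of the DropDistillation-style objective described above appears below: the student is trained to match the teacher's predictions on a graph whose edges have been randomly dropped. The tiny hand-rolled GCN, drop rate, and random teacher are illustrative stand-ins, not the paper's implementation (in practice the teacher would be a trained model).

```python
# DropDistillation-style knowledge distillation on an edge-dropped graph (toy sketch).
import torch
import torch.nn.functional as F

def normalize_adj(A):
    A_hat = A + torch.eye(A.shape[0])                      # add self-loops
    d_inv_sqrt = torch.diag(A_hat.sum(1).pow(-0.5))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

class TinyGCN(torch.nn.Module):
    def __init__(self, in_dim, hid, n_classes):
        super().__init__()
        self.W1 = torch.nn.Linear(in_dim, hid)
        self.W2 = torch.nn.Linear(hid, n_classes)
    def forward(self, X, A_norm):
        H = torch.relu(A_norm @ self.W1(X))
        return A_norm @ self.W2(H)

def drop_edges(A, p=0.2):
    mask = (torch.rand_like(A) > p).float()
    mask = torch.triu(mask, 1); mask = mask + mask.T       # keep the graph symmetric
    return A * mask

torch.manual_seed(0)
n_nodes, in_dim, n_classes = 20, 16, 3
X = torch.randn(n_nodes, in_dim)
A = (torch.rand(n_nodes, n_nodes) < 0.2).float(); A = torch.triu(A, 1); A = A + A.T

teacher, student = TinyGCN(in_dim, 32, n_classes), TinyGCN(in_dim, 32, n_classes)
opt = torch.optim.Adam(student.parameters(), lr=1e-2)
A_norm = normalize_adj(A)

for step in range(200):
    with torch.no_grad():
        t_logits = teacher(X, A_norm)                      # teacher sees the full graph
    s_logits = student(X, normalize_adj(drop_edges(A)))    # student sees a perturbed graph
    loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                    F.softmax(t_logits, dim=1), reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
```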

Fooling the Textual Fooler via Randomizing Latent Representations

  • paper_url: http://arxiv.org/abs/2310.01452
  • repo_url: None
  • paper_authors: Duy C. Hoang, Quang H. Nguyen, Saurav Manchanda, MinLong Peng, Kok-Seng Wong, Khoa D. Doan
  • for: 防止文本黑客攻击 NLP 模型
  • methods: 使用随机 latent space 防范攻击
  • results: near state-of-the-art 鲁棒性 against representative adversarial word-level attacks on two benchmark datasets
    Abstract Despite outstanding performance in a variety of NLP tasks, recent studies have revealed that NLP models are vulnerable to adversarial attacks that slightly perturb the input to cause the models to misbehave. Among these attacks, adversarial word-level perturbations are well-studied and effective attack strategies. Since these attacks work in black-box settings, they do not require access to the model architecture or model parameters and thus can be detrimental to existing NLP applications. To perform an attack, the adversary queries the victim model many times to determine the most important words in an input text and to replace these words with their corresponding synonyms. In this work, we propose a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example in these query-based black-box attacks; that is to fool the textual fooler. This defense, named AdvFooler, works by randomizing the latent representation of the input at inference time. Different from existing defenses, AdvFooler does not necessitate additional computational overhead during training nor relies on assumptions about the potential adversarial perturbation set while having a negligible impact on the model's accuracy. Our theoretical and empirical analyses highlight the significance of robustness resulting from confusing the adversary via randomizing the latent space, as well as the impact of randomization on clean accuracy. Finally, we empirically demonstrate near state-of-the-art robustness of AdvFooler against representative adversarial word-level attacks on two benchmark datasets.
    摘要 尽管现代自然语言处理(NLP)模型在多种任务上表现出色,但最新研究表明这些模型很容易受到对输入进行轻微扰动的对抗攻击,其中词级对抗扰动是研究最深入且效果显著的攻击策略。这类攻击可在黑盒设置下进行,无需访问模型结构或参数,因而对现有 NLP 应用构成威胁。攻击者通过多次查询受害模型来确定输入文本中最重要的词,并将其替换为相应的同义词。针对这一问题,我们提出了一种轻量级、与具体攻击无关的防御方法 AdvFooler,其主要目标是扰乱此类基于查询的黑盒攻击生成对抗样本的过程,即“诱骗文本攻击者”。AdvFooler 在推理时对输入的潜在表示进行随机化。与现有防御不同,它既不需要额外的训练开销,也不依赖对潜在对抗扰动集合的假设,同时对模型准确率的影响可以忽略不计。我们的理论与实证分析揭示了通过随机化潜在空间迷惑攻击者所带来的鲁棒性,以及随机化对干净准确率的影响。最后,我们在两个基准数据集上实证表明 AdvFooler 对代表性的词级对抗攻击具有接近最先进水平的鲁棒性。
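The defense's key mechanism is adding noise to a latent representation at inference time so that repeated attacker queries see slightly different decision behavior. The toy classifier below illustrates that mechanism only; it is not the paper's AdvFooler implementation, and the bag-of-words encoder and noise scale are assumptions.

```python
# Minimal sketch of inference-time latent randomization as a defense.
import torch
import torch.nn as nn

class NoisyLatentClassifier(nn.Module):
    def __init__(self, vocab_size=5000, emb=64, n_classes=2, sigma=0.1):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, emb)     # simple bag-of-words encoder
        self.head = nn.Linear(emb, n_classes)
        self.sigma = sigma

    def forward(self, token_ids, randomize=True):
        z = self.emb(token_ids)                         # latent representation
        if randomize:                                   # inference-time randomization
            z = z + self.sigma * torch.randn_like(z)
        return self.head(z)

model = NoisyLatentClassifier()
tokens = torch.randint(0, 5000, (4, 20))                # a batch of token-id sequences
print(model(tokens).argmax(dim=1))                      # outputs vary slightly per query
```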

Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP

  • paper_url: http://arxiv.org/abs/2310.00927
  • repo_url: None
  • paper_authors: Zixiang Chen, Yihe Deng, Yuanzhi Li, Quanquan Gu
  • for: 本研究旨在深入理解CLIP中的跨模态学习,并对其在零基础学习和自然图像生成中的表现进行分析。
  • methods: 本研究使用视觉语言对照预训练来学习图像和文本之间的共同表示,并对CLIP的特征对齐进行分析。
  • results: 研究发现CLIP的跨模态学习可以帮助提高零基础学习和自然图像生成的性能,并提出了一种基于CLIP的新方法,其在标准数据集上表现更好于CLIP和其他现有方法。
    Abstract Multi-modal learning has become increasingly popular due to its ability to leverage information from different data sources (e.g., text and images) to improve the model performance. Recently, CLIP has emerged as an effective approach that employs vision-language contrastive pretraining to learn joint image and text representations and exhibits remarkable performance in zero-shot learning and text-guided natural image generation. Despite the huge practical success of CLIP, its theoretical understanding remains elusive. In this paper, we formally study transferrable representation learning underlying CLIP and demonstrate how features from different modalities get aligned. We also analyze its zero-shot transfer performance on the downstream tasks. Inspired by our analysis, we propose a new CLIP-type approach, which achieves better performance than CLIP and other state-of-the-art methods on benchmark datasets.
    摘要 多Modal学习在最近几年来变得越来越流行,这是因为它可以利用不同数据源(例如文本和图像)来提高模型性能。最近,CLIP在无需任何批处或监督学习的情况下,通过视觉语言对照预训练来学习图像和文本之间的共同表示,并达到了无法匹配的表现。尽管CLIP在实践中取得了很大的成功,但其理论理解仍然不够清楚。在这篇论文中,我们正式研究CLIP中的传递表示学习,并证明了不同模式之间的特征如何相互对应。我们还分析了CLIP在下游任务中的零shot传递性能。 inspirited by our analysis, we propose a new CLIP-type approach, which achieves better performance than CLIP and other state-of-the-art methods on benchmark datasets.Note that Simplified Chinese is used in this translation, which is the standard writing system used in mainland China. If you prefer Traditional Chinese, I can also provide the translation using that writing system.

The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice

  • paper_url: http://arxiv.org/abs/2310.00907
  • repo_url: None
  • paper_authors: Fernando Delgado, Stephen Yang, Michael Madaio, Qian Yang
  • for: 本研究旨在探讨参与式设计在人工智能设计中的应用,并提供评估参与式方法的工具kit。
  • methods: 本研究综合技术设计、政治理论和社会科学文献,提出了一个可供研究者和实践者用来评估参与式方法的概念框架;同时通过分析最新发表的研究,并对 12 名人工智能研究者和实践者进行半结构化访谈,刻画了参与式实践的现状。
  • results: 研究发现现有的参与式实践存在各种限制和不一致,需要更好地考虑实践中的 constraint,以实现更加有效的参与式设计。
    Abstract Despite the growing consensus that stakeholders affected by AI systems should participate in their design, enormous variation and implicit disagreements exist among current approaches. For researchers and practitioners who are interested in taking a participatory approach to AI design and development, it remains challenging to assess the extent to which any participatory approach grants substantive agency to stakeholders. This article thus aims to ground what we dub the "participatory turn" in AI design by synthesizing existing theoretical literature on participation and through empirical investigation and critique of its current practices. Specifically, we derive a conceptual framework through synthesis of literature across technology design, political theory, and the social sciences that researchers and practitioners can leverage to evaluate approaches to participation in AI design. Additionally, we articulate empirical findings concerning the current state of participatory practice in AI design based on an analysis of recently published research and semi-structured interviews with 12 AI researchers and practitioners. We use these empirical findings to understand the current state of participatory practice and subsequently provide guidance to better align participatory goals and methods in a way that accounts for practical constraints.
    摘要 尽管现在AI系统的设计中存在增长的共识,即参与者们应参与到AI系统的设计中,但是现实中的各种方法仍然存在巨大的变化和隐含的不一致。为了帮助研究人员和实践者们实施参与式方法,评估参与式方法是否实际提供了参与者们真正的权力仍然是一个挑战。这篇文章希望通过对参与式转变的理论基础和现实情况的研究,为参与式AI设计提供一个可用的评估框架。具体来说,我们通过综合技术设计、政治理论和社会科学的文献,得出了一个参与式AI设计评估框架。此外,我们还通过对最新发表的研究和12名AI研究人员的半结构化访谈进行分析,了解参与式实践的当前状况。基于这些实践发现,我们提供了更好地实现参与式目标和方法的指导,考虑到实际的限制。

All Languages Matter: On the Multilingual Safety of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00905
  • repo_url: https://github.com/jarviswang94/multilingual_safety_benchmark
  • paper_authors: Wenxuan Wang, Zhaopeng Tu, Chang Chen, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu
  • for: 本研究旨在为大语言模型(LLM)的开发和部署提供安全性测试 benchmark,以适应 LLM 的全球部署。
  • methods: 我们构建了第一个多语言安全测试 benchmark,名为 XSafety,用于检测 LLM 的多语言安全性。XSafety 覆盖 14 类常见安全问题,涉及横跨多个语系的 10 种语言。
  • results: 我们使用 XSafety 对 4 种广泛使用的 LLM 进行了实验研究,发现所有 LLM 对非英语查询产生了许多不安全响应,这表明需要开发非英语语言的安全对Alignment。 我们还提出了一些简单有效的提示方法,可以改善 ChatGPT 的多语言安全性。我们的提示方法可以将非英语查询中不安全响应的比例从 19.1% 降低到 9.7%。我们将数据发布在 GitHub 上,链接在 https://github.com/Jarviswang94/Multilingual_safety_benchmark。
    Abstract Safety lies at the core of developing and deploying large language models (LLMs). However, previous safety benchmarks only concern the safety in one language, e.g. the majority language in the pretraining data such as English. In this work, we build the first multilingual safety benchmark for LLMs, XSafety, in response to the global deployment of LLMs in practice. XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families. We utilize XSafety to empirically study the multilingual safety for 4 widely-used LLMs, including both close-API and open-source models. Experimental results show that all LLMs produce significantly more unsafe responses for non-English queries than English ones, indicating the necessity of developing safety alignment for non-English languages. In addition, we propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT by evoking safety knowledge and improving cross-lingual generalization of safety alignment. Our prompting method can significantly reduce the ratio of unsafe responses from 19.1% to 9.7% for non-English queries. We release our data at https://github.com/Jarviswang94/Multilingual_safety_benchmark.
    摘要 安全是大语言模型(LLM)的核心开发和部署的关键。然而,过去的安全标准仅关注一种语言的安全,例如在预训练数据中的主导语言,如英语。在这项工作中,我们建立了第一个多语言安全标准(XSafety),以应对全球 LLMS 的实践部署。XSafety 覆盖了 14 种常用的安全问题,涵盖 10 种语言家族。我们使用 XSafety 来实验性研究多语言安全,并对 4 种广泛使用的 LLMS 进行了实验研究,包括内置 API 和开源模型。实验结果表明,所有 LLMS 对非英语查询产生了许多不安全的响应, indicating 非英语语言的安全需要进行适应。此外,我们提议了一些简单 yet 有效的提示方法,可以改善 ChatGPT 的多语言安全性。我们的提示方法可以将非英语查询中的不安全响应比例从 19.1% 降低到 9.7%。我们将数据发布到 GitHub 上,请参考 https://github.com/Jarviswang94/Multilingual_safety_benchmark。

Expert enhanced dynamic time warping based anomaly detection

  • paper_url: http://arxiv.org/abs/2310.02280
  • repo_url: None
  • paper_authors: Matej Kloska, Gabriela Grmanova, Viera Rozinajova
  • for: 这篇论文是为了提出一种基于动态时间扭曲(DTW)算法的新型异常检测方法,以提高异常检测的效率和准确率。
  • methods: 该方法基于DTW算法,并在其基础上进行了人工智能Loop(HITL)概念的扩展和增强。
  • results: 该方法可以具有高效的异常检测、可重新训练基于专家的检测反馈,同时保持低的计算复杂度和存储空间复杂度。
    Abstract Dynamic time warping (DTW) is a well-known algorithm for time series elastic dissimilarity measure. Its ability to deal with non-linear time distortions makes it helpful in variety of data mining tasks. Such a task is also anomaly detection which attempts to reveal unexpected behaviour without false detection alarms. In this paper, we propose a novel anomaly detection method named Expert enhanced dynamic time warping anomaly detection (E-DTWA). It is based on DTW with additional enhancements involving human-in-the-loop concept. The main benefits of our approach comprise efficient detection, flexible retraining based on strong consideration of the expert's detection feedback while retaining low computational and space complexity.
    摘要 动态时间扭曲(DTW)是著名的时间序列弹性相异度度量算法,其处理非线性时间畸变的能力使其在多种数据挖掘任务中非常有用,例如异常检测——即在不产生误报的情况下发现意外行为。在本文中,我们提出了一种新的异常检测方法:专家增强的动态时间扭曲异常检测(E-DTWA)。该方法基于 DTW,并结合人在回路(human-in-the-loop)概念加以增强。我们方法的主要优点包括高效的检测、在充分考虑专家检测反馈的基础上灵活地重新训练,同时保持较低的计算和存储复杂度。
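A minimal sketch of DTW-based anomaly scoring in the spirit of the approach above: a new series is compared against known normal patterns and flagged when its best DTW distance exceeds a threshold that an expert could adjust from feedback. Pure NumPy; the data and threshold are illustrative.

```python
# DTW distance via dynamic programming, plus a simple threshold-based anomaly check.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def is_anomalous(series, normal_patterns, threshold):
    score = min(dtw_distance(series, p) for p in normal_patterns)
    return score > threshold, score

# Toy usage: sinusoid-like normal patterns, one query with an injected level shift.
t = np.linspace(0, 2 * np.pi, 50)
normals = [np.sin(t), np.sin(t + 0.2)]
query = np.sin(t) + np.where(t > 4, 2.0, 0.0)
flag, score = is_anomalous(query, normals, threshold=10.0)
print(flag, round(score, 2))
```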

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.00900
  • repo_url: None
  • paper_authors: Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu
  • for: 提高语音质量和可理解性,并实现语音编辑
  • methods: 使用 conditional diffusion models 处理多种任务,包括语音去噪和去混响
  • results: 在语音去噪和去混响任务上优于其他相关的生成式语音增强模型,并可根据期望的环境声音文本描述、SNR 和 RIR 进行语音编辑
    Abstract Speech enhancement aims to improve the quality of speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs. In this paper, we propose a Unified Speech Enhancement and Editing (uSee) model with conditional diffusion models to handle various tasks at the same time in a generative manner. Specifically, by providing multiple types of conditions including self-supervised learning embeddings and proper text prompts to the score-based diffusion model, we can enable controllable generation of the unified speech enhancement and editing model to perform corresponding actions on the source speech. Our experiments show that our proposed uSee model can achieve superior performance in both speech denoising and dereverberation compared to other related generative speech enhancement models, and can perform speech editing given desired environmental sound text description, signal-to-noise ratios (SNR), and room impulse responses (RIR). Demos of the generated speech are available at https://muqiaoy.github.io/usee.
    摘要 语音增强旨在提升语音信号的质量与可懂度,语音编辑则指根据特定用户需求对语音进行编辑。在这篇论文中,我们提出了一个基于条件扩散模型的统一语音增强与编辑(uSee)模型,以生成式方式同时处理多种任务。具体而言,通过向基于得分的扩散模型提供多种条件(包括自监督学习嵌入和适当的文本提示),我们可以对统一的语音增强与编辑模型进行可控生成,使其对源语音执行相应的操作。实验表明,所提出的 uSee 模型在语音去噪和去混响方面优于其他相关的生成式语音增强模型,并且可以根据期望的环境声音文本描述、信噪比(SNR)和房间脉冲响应(RIR)进行语音编辑。生成语音的示例见 https://muqiaoy.github.io/usee。

No Offense Taken: Eliciting Offensiveness from Language Models

  • paper_url: http://arxiv.org/abs/2310.00892
  • repo_url: https://github.com/anugyas/nluproject
  • paper_authors: Anugya Srivastava, Rahul Ahuja, Rohith Mukku
  • for: 本研究旨在提高语言模型在实际应用中的安全可靠性,通过对语言模型进行Robust testing。
  • methods: 本研究使用自动生成测试用例的方法,包括使用公开可用的小型语言模型(LMs)、不同目标LMs和红色分类器进行实验,并生成了诱导语言模型发送攻击性响应的测试用例集。
  • results: 研究发现,通过使用自动生成测试用例,可以帮助发现广泛部署的语言模型的失败模式,并且可以通过对这些测试用例进行分析,提高语言模型的安全性和可靠性。
    Abstract This work was completed in May 2022. For safe and reliable deployment of language models in the real world, testing needs to be robust. This robustness can be characterized by the difficulty and diversity of the test cases we evaluate these models on. Limitations in human-in-the-loop test case generation has prompted an advent of automated test case generation approaches. In particular, we focus on Red Teaming Language Models with Language Models by Perez et al.(2022). Our contributions include developing a pipeline for automated test case generation via red teaming that leverages publicly available smaller language models (LMs), experimenting with different target LMs and red classifiers, and generating a corpus of test cases that can help in eliciting offensive responses from widely deployed LMs and identifying their failure modes.
    摘要 本工作完成于2022年5月。为了在现实世界中安全可靠地部署语言模型,测试必须足够健壮,而健壮性取决于测试用例的难度和多样性。鉴于人工构造测试用例的局限性,我们关注 Perez et al.(2022) 提出的用语言模型对语言模型进行红队测试的方法。我们的贡献包括:构建一个利用公开可用的较小语言模型进行红队式自动测试用例生成的流水线;在不同的目标语言模型和红队分类器上进行实验;并生成一个测试用例语料库,用于诱发广泛部署的语言模型产生攻击性回复并识别其失败模式。

GRID: A Platform for General Robot Intelligence Development

  • paper_url: http://arxiv.org/abs/2310.00887
  • repo_url: https://github.com/scaledfoundations/grid-playground
  • paper_authors: Sai Vemprala, Shuhang Chen, Abhinav Shukla, Dinesh Narayanan, Ashish Kapoor
  • for: 这个论文是为了提出一个新的机器人智能发展平台(GRID),以解决现有的机器人智能发展时间和成本问题。
  • methods: 这个平台使用了基础模型来解决机器人智能学问题,并且可以让机器人学习、实现和适应其物理能力、环境限制和目标。
  • results: 在不同的航空机器人情况下,这个平台能够实现机器人智能发展的快速化和扩展,并且能够让机器人适应不同的环境和任务。
    Abstract Developing machine intelligence abilities in robots and autonomous systems is an expensive and time consuming process. Existing solutions are tailored to specific applications and are harder to generalize. Furthermore, scarcity of training data adds a layer of complexity in deploying deep machine learning models. We present a new platform for General Robot Intelligence Development (GRID) to address both of these issues. The platform enables robots to learn, compose and adapt skills to their physical capabilities, environmental constraints and goals. The platform addresses AI problems in robotics via foundation models that know the physical world. GRID is designed from the ground up to be extensible to accommodate new types of robots, vehicles, hardware platforms and software protocols. In addition, the modular design enables various deep ML components and existing foundation models to be easily usable in a wider variety of robot-centric problems. We demonstrate the platform in various aerial robotics scenarios and demonstrate how the platform dramatically accelerates development of machine intelligent robots.
    摘要 开发机器智能能力在机器人和自动化系统中是一个昂贵和时间consuming的过程。现有的解决方案是为特定应用程序定制的,更难于泛化。另外,训练数据的缺乏加加了深入学习模型的部署复杂性。我们介绍了一个新的机器人智能发展平台(GRID),以解决这两个问题。该平台允许机器人学习、组合和适应它们的物理能力、环境限制和目标。平台通过物理世界的基础模型解决了机器人领域的AI问题。GRID是从头开始设计,以扩展性和可定制性为特点,以便更多种机器人、车辆、硬件平台和软件协议。此外,模块化的设计使得不同的深入学习组件和现有的基础模型可以轻松地在更多的机器人中应用。我们在不同的飞行器enario中展示了该平台,并证明了它在机器人机智能发展中带来了很大的加速。

(Dynamic) Prompting might be all you need to repair Compressed LLMs

  • paper_url: http://arxiv.org/abs/2310.00867
  • repo_url: None
  • paper_authors: Duc N. M Hoang, Minsik Cho, Thomas Merth, Mohammad Rastegari, Zhangyang Wang
  • for: 这项研究旨在修复压缩后大语言模型(LLM)的性能损失,在降低计算需求的同时保持下游任务表现。
  • methods: 研究考察了免训练压缩后的性能恢复方法,包括提示驱动恢复(prompt-driven recovery)和推理时动态提示(inference-time dynamic prompting, IDP)。
  • results: 研究发现,使用提示驱动恢复和动态提示可以提高压缩后 LLM 的性能,在涵盖多个知识领域的九个任务上平均提升 1.24%。
    Abstract Large language models (LLMs), while transformative for NLP, come with significant computational demands, underlining the need for efficient, training-free compression. Notably, despite the marked improvement in training-free compression for the largest of LLMs, our tests using LLaMA-7B and OPT-6.7b highlight a significant performance drop in several realistic downstream tasks. Investigation into the trade-off between resource-intensive post-compression re-training highlights the prospect of prompt-driven recovery as a lightweight adaption tool. However, existing studies, confined mainly to perplexity evaluations and simple tasks, fail to offer unequivocal confidence in the scalability and generalizability of prompting. We tackle this uncertainty in two key ways. First, we uncover the vulnerability of naive prompts in LLM compression as an over-reliance on a singular prompt per input. In response, we propose inference-time dynamic prompting (IDP), a mechanism that autonomously chooses from a set of curated prompts based on the context of each individual input. Second, we delve into a scientific understanding of why "prompting might be all you need post-LLM compression." Our findings suggest that compression does not irretrievably erase LLM model knowledge but displace it, necessitating a new inference path. IDP effectively redirects this path, enabling the model to tap into its inherent yet displaced knowledge and thereby recover performance. Empirical tests affirm the value of IDP, demonstrating an average performance improvement of 1.24% across nine varied tasks spanning multiple knowledge domains.
    摘要 大语言模型(LLM)虽然为自然语言处理带来了变革,但其计算需求巨大,凸显了对高效、免训练压缩的需求。值得注意的是,尽管免训练压缩在最大规模的 LLM 上已取得明显进展,我们在 LLaMA-7B 和 OPT-6.7b 上的测试表明,压缩后的模型在若干实际下游任务中仍出现显著性能下降。对资源开销高昂的压缩后再训练与其收益之间权衡的考察,凸显了以提示驱动的恢复作为一种轻量级适配工具的前景。然而,现有研究大多局限于困惑度评估和简单任务,无法充分证明提示方法的可扩展性和泛化性。我们从两方面解决这一不确定性。第一,我们发现朴素提示在 LLM 压缩场景下的弱点在于对每个输入只依赖单一提示;为此,我们提出了推理时动态提示(IDP),一种根据每个输入的上下文从一组精选提示中自动选择提示的机制。第二,我们深入探究了“为什么提示可能就是压缩后 LLM 所需要的一切”:我们的发现表明,压缩并不会不可逆地抹除模型知识,而是使其发生了位移,需要新的推理路径;IDP 能有效地重定向这一路径,使模型重新利用其固有但被位移的知识,从而恢复性能。实证测试证实了 IDP 的价值,在涵盖多个知识领域的九个任务上平均性能提升 1.24%。
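One simple way to realize the "pick a prompt per input" idea described above is nearest-neighbor selection in an embedding space. The sketch below uses a hash-seeded toy embedding and a hypothetical prompt pool purely for illustration; a real system would use the model's own encoder and a curated pool, and this is not the paper's IDP implementation.

```python
# Toy sketch of inference-time dynamic prompting: choose the curated prompt
# whose embedding is most similar to the current input's embedding.
import hashlib
import numpy as np

PROMPT_POOL = [
    "Answer the following factual question concisely:",
    "Summarize the key point of the following text:",
    "Solve the following problem step by step:",
]

def embed(text, dim=64):
    """Deterministic toy embedding; stands in for a real encoder."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

prompt_vecs = np.stack([embed(p) for p in PROMPT_POOL])   # pre-compute once

def select_prompt(user_input):
    scores = prompt_vecs @ embed(user_input)               # cosine similarity
    return PROMPT_POOL[int(np.argmax(scores))]

query = "What is 17 times 23?"
print(f"{select_prompt(query)}\n{query}")                  # text sent to the compressed LLM
```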

Melody-conditioned lyrics generation via fine-tuning language model and its evaluation with ChatGPT

  • paper_url: http://arxiv.org/abs/2310.00863
  • repo_url: None
  • paper_authors: Zhe Zhang, Karol Lasocki, Yi Yu, Atsuhiro Takasu
  • for: 用于生成符号旋律的 syllable-level 歌词
  • methods: 使用 fine-tuning caracter-level 预训练模型,integrate 语言知识到 Transformer 生成器的 beam search 中
  • results: 通过 ChatGPT-based 评估,表明生成的歌词具有更高的 coherence 和正确性
    Abstract We leverage character-level language models for syllable-level lyrics generation from symbolic melody. By fine-tuning a character-level pre-trained model, we integrate language knowledge into the beam search of a syllable-level Transformer generator. Using ChatGPT-based evaluations, we demonstrate enhanced coherence and correctness in the generated lyrics.
    摘要 我们利用字符级语言模型,从符号旋律生成音节级歌词。通过微调字符级预训练模型,我们将语言知识融入音节级 Transformer 生成器的束搜索(beam search)中。基于 ChatGPT 的评估表明,生成的歌词具有更高的连贯性和正确性。

Use Your INSTINCT: INSTruction optimization usIng Neural bandits Coupled with Transformers

  • paper_url: http://arxiv.org/abs/2310.02905
  • repo_url: https://github.com/xqlin98/INSTINCT
  • paper_authors: Xiaoqiang Lin, Zhaoxuan Wu, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low
  • for: 这paper aimed to optimize the instructions given to large language models (LLMs) using a neural bandit algorithm, in order to improve their performance in various tasks.
  • methods: 这paper uses a neural bandit algorithm that replaces the traditional Gaussian process (GP) model with a neural network (NN) surrogate, and naturally couples the NN surrogate with the hidden representation learned by a pre-trained transformer (i.e., an open-source LLM).
  • results: 该paper通过实验表明,使用INSTINCT算法可以在不同的任务中,如 instrucion induction tasks 和 zero-shot chain-of-thought instruction 等,与现有方法相比,具有更高的性能。
    Abstract Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performances in various applications. However, the performances of LLMs depend heavily on the instructions given to them, which are typically manually tuned with substantial human efforts. Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs. However, BO usually falls short when optimizing highly sophisticated (e.g., high-dimensional) objective functions, such as the functions mapping an instruction to the performance of an LLM. This is mainly due to the limited expressive power of the Gaussian process (GP) model which is used by BO as a surrogate to model the objective function. Meanwhile, it has been repeatedly shown that neural networks (NNs), especially pre-trained transformers, possess strong expressive power and can model highly complex functions. So, we adopt a neural bandit algorithm which replaces the GP in BO by an NN surrogate to optimize instructions for black-box LLMs. More importantly, the neural bandit algorithm allows us to naturally couple the NN surrogate with the hidden representation learned by a pre-trained transformer (i.e., an open-source LLM), which significantly boosts its performance. These motivate us to propose our INSTruction optimization usIng Neural bandits Coupled with Transformers} (INSTINCT) algorithm. We perform instruction optimization for ChatGPT and use extensive experiments to show that our INSTINCT consistently outperforms the existing methods in different tasks, such as in various instruction induction tasks and the task of improving the zero-shot chain-of-thought instruction.
    摘要 大型语言模型(LLM)已经展现出很好的指令遵循能力和各种应用中的优秀表现。然而,LLM的表现受到提供的指令的影响很大,这些指令通常需要大量的人工努力进行调整。现有的工作使用了查询有效率的汤普逊-波利 Optimization(BO)算法来自动调整黑盒LLM的指令。然而,BO通常无法优化高维度的目标函数,例如将指令与LLM的性能映射到的函数。这主要是因为GP模型,它是BO使用的伪函数模型,有限的表达力。同时,对于高维度的目标函数,NN类型模型,特别是预训transformer,具有强大的表达力和可以模型高度复杂的函数。因此,我们采用了对GP模型进行替换的神经网络参数(Neural Bandit)来优化黑盒LLM的指令。此外,神经网络参数可以自然地与预训transformer中的隐藏表现相互关联,从而增强表现。这些动机我们提出INSTINCT算法(INSTruction optimization using Neural bandits Coupled with Transformers)。我们对ChatGPT进行了 instruction optimization,并通过实验显示了INSTINCT在不同任务中,如不同的指令启发任务和零shot chain-of-thought指令的提升中,与现有方法相比,具有较高的表现。
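A rough sketch of the neural-bandit loop described above follows: a small NN surrogate predicts scores of candidate instructions from their embeddings, the instruction with the best prediction plus an exploration bonus is evaluated, and the surrogate is refit. The instruction pool, random embeddings, and `score_instruction` placeholder are assumptions; a real run would embed instructions with a pre-trained transformer and score them on the black-box LLM.

```python
# Neural-bandit-style instruction optimization (toy sketch).
import numpy as np
import torch
import torch.nn as nn

candidates = [f"instruction_{i}" for i in range(30)]        # hypothetical instruction pool
emb = {c: torch.randn(32) for c in candidates}              # stand-in transformer embeddings

def score_instruction(instr):
    """Placeholder for the LLM's validation score under this instruction."""
    return float(torch.sigmoid(emb[instr].sum() / 5) + 0.05 * torch.randn(1))

surrogate = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)
history, counts = [], {c: 0 for c in candidates}

for t in range(1, 60):
    with torch.no_grad():
        preds = {c: surrogate(emb[c]).item() for c in candidates}
    # UCB-style exploration bonus on top of the surrogate's prediction.
    ucb = {c: preds[c] + 0.5 * np.sqrt(np.log(t + 1) / (counts[c] + 1)) for c in candidates}
    chosen = max(ucb, key=ucb.get)
    reward = score_instruction(chosen)
    counts[chosen] += 1
    history.append((chosen, reward))
    for _ in range(5):                                       # refit surrogate on all observations
        xs = torch.stack([emb[c] for c, _ in history])
        ys = torch.tensor([[r] for _, r in history])
        loss = nn.functional.mse_loss(surrogate(xs), ys)
        opt.zero_grad(); loss.backward(); opt.step()

print(max(history, key=lambda cr: cr[1]))                    # best instruction found so far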

Application of frozen large-scale models to multimodal task-oriented dialogue

  • paper_url: http://arxiv.org/abs/2310.00845
  • repo_url: None
  • paper_authors: Tatsuki Kawamoto, Takuma Suzuki, Ko Miyama, Takumi Meguro, Tomohiro Takagi
  • for: 这个研究用于测试多modal任务对话的可行性,以及使用现有的大语言模型优化See框架(LENS框架)解决计算机视觉任务。
  • methods: 我们使用Multimodal Dialogs(MMD) dataset,并使用ChatGPT-based G-EVAL进行评估,它只接受文本模式,并将多modal数据处理成文本模式。
  • results: 比前一些使用Transformer模型的研究,我们的方法在fluency、有用性和相关性和 coherence三个指标上显示出统计学上的优势, Specifically, our method demonstrated an absolute lift of 10.8% in fluency, 8.8% in usefulness, and 5.2% in relevance and coherence.
    Abstract In this study, we use the existing Large Language Models ENnhanced to See Framework (LENS Framework) to test the feasibility of multimodal task-oriented dialogues. The LENS Framework has been proposed as a method to solve computer vision tasks without additional training and with fixed parameters of pre-trained models. We used the Multimodal Dialogs (MMD) dataset, a multimodal task-oriented dialogue benchmark dataset from the fashion field, and for the evaluation, we used the ChatGPT-based G-EVAL, which only accepts textual modalities, with arrangements to handle multimodal data. Compared to Transformer-based models in previous studies, our method demonstrated an absolute lift of 10.8% in fluency, 8.8% in usefulness, and 5.2% in relevance and coherence. The results show that using large-scale models with fixed parameters rather than using models trained on a dataset from scratch improves performance in multimodal task-oriented dialogues. At the same time, we show that Large Language Models (LLMs) are effective for multimodal task-oriented dialogues. This is expected to lead to efficient applications to existing systems.
    摘要 在本研究中,我们使用现有的大语言模型加强框架(LENS框架)测试多模态任务对话的可行性。LENS框架被提议用于在不进行额外训练、保持预训练模型参数固定的情况下解决计算机视觉任务。我们使用的是时尚领域的多模态任务对话基准数据集(MMD),并使用基于ChatGPT的G-EVAL进行评估;该评估方法只接受文本模态,我们通过特殊安排使其能够处理多模态数据。与以往研究中基于 Transformer 的模型相比,我们的方法在流利性、有用性以及相关性与连贯性方面分别取得了 10.8%、8.8% 和 5.2% 的绝对提升。结果表明,使用参数固定的大规模模型而不是从头训练的模型可以提高多模态任务对话的性能,同时也表明大语言模型对多模态任务对话是有效的,这有望促进其在现有系统中的高效应用。

Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models

  • paper_url: http://arxiv.org/abs/2310.00836
  • repo_url: None
  • paper_authors: Man Luo, Shrinidhi Kumbhar, Ming shen, Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, Chitta Baral
  • for: 本研究旨在了解大自然语言模型(LLMs)在逻辑推理中的效能,尤其是通过自然语言进行逻辑推理是否能够实现更高水平的逻辑推理能力。
  • methods: 本研究采用了一系列的方法,包括采用Seq2Seq任务标准化了24个不同的逻辑推理 datasets,并对这些 datasets 进行了单任务训练、多任务训练以及知识填充精细化训练等方法来评估模型在不同的逻辑推理类别中的性能。
  • results: 研究发现,通过单任务训练、多任务训练以及知识填充精细化训练等方法可以提高 LLMS 在逻辑推理中的性能,并且可以在不同的逻辑推理类别中实现更高水平的逻辑推理能力。
    Abstract Logical reasoning is fundamental for humans yet presents a substantial challenge in the domain of Artificial Intelligence. Initially, researchers used Knowledge Representation and Reasoning (KR) systems that did not scale and required non trivial manual effort. Recently, the emergence of large language models (LLMs) has demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. Consequently, there is a growing interest in using LLMs for logical reasoning via natural language. This work strives to understand the proficiency of LLMs in logical reasoning by offering a brief review of the latest progress in this area; with a focus on the logical reasoning datasets, tasks, and the methods adopted to utilize LLMs for reasoning. To offer a thorough analysis, we have compiled a benchmark titled LogiGLUE. This includes 24 varied datasets encompassing deductive, abductive, and inductive reasoning. We have standardized these datasets into Seq2Seq tasks to facilitate straightforward training and evaluation for future research. Utilizing LogiGLUE as a foundation, we have trained an instruction fine tuned language model, resulting in LogiT5. We study single task training, multi task training, and a chain of thought knowledge distillation fine tuning technique to assess the performance of model across the different logical reasoning categories. By this comprehensive process, we aim to shed light on the capabilities and potential pathways for enhancing logical reasoning proficiency in LLMs, paving the way for more advanced and nuanced developments in this critical field.
    摘要 逻辑推理是人类的基础能力,但在人工智能领域仍是一个重大挑战。早期研究者使用知识表示与推理(KR)系统,但这类系统难以扩展且需要大量人工工作。近年来,大语言模型(LLM)的出现表明其能够克服形式化知识表示系统的诸多局限,因此利用 LLM 通过自然语言进行逻辑推理受到越来越多的关注。本工作旨在了解 LLM 在逻辑推理中的能力,对该领域的最新进展进行了简要回顾,重点关注逻辑推理数据集、任务以及利用 LLM 进行推理的方法。为了进行全面分析,我们编制了名为 LogiGLUE 的基准,包含 24 个涵盖演绎、溯因和归纳推理的数据集,并将其统一为 Seq2Seq 任务,以便后续研究的训练与评估。基于 LogiGLUE,我们训练了经指令微调的语言模型 LogiT5,并研究了单任务训练、多任务训练以及思维链知识蒸馏微调技术,以评估模型在不同逻辑推理类别上的表现。通过这一全面的过程,我们希望揭示 LLM 的逻辑推理能力及其提升路径,为该关键领域更高级、更细致的发展奠定基础。
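An illustrative sketch of standardizing a logical-reasoning example into a Seq2Seq (text-in, text-out) record, as described for LogiGLUE, is shown below; the field names and template are assumptions for illustration, not the benchmark's exact schema.

```python
# Convert a reasoning example into a Seq2Seq-style input/target pair.
def to_seq2seq(example, task_name):
    source = (
        f"task: {task_name} "
        f"context: {example['context']} "
        f"question: {example['question']} "
        f"options: {', '.join(example['options'])}"
    )
    return {"input": source, "target": example["answer"]}

sample = {
    "context": "All birds can fly. Penguins are birds.",
    "question": "Can penguins fly according to the premises?",
    "options": ["yes", "no"],
    "answer": "yes",
}
print(to_seq2seq(sample, "deductive_reasoning"))
```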

Natural Language Models for Data Visualization Utilizing nvBench Dataset

  • paper_url: http://arxiv.org/abs/2310.00832
  • repo_url: None
  • paper_authors: Shuo Wang, Carlos Crespo-Quinones
  • for: 这个论文的目的是用自然语言模型来实现数据可视化中的语言翻译。
  • methods: 这个论文使用了序列到序列变换器基本模型,并使用大型自然语言模型BERT作为编码器来预测从自然语言查询中的可视化命令。
  • results: 这个论文通过对大量自然语言查询和可视化命令进行预测,证明了这种方法的设计和性能。
    Abstract Translation of natural language into syntactically correct commands for data visualization is an important application of natural language models and could be leveraged to many different tasks. A closely related effort is the task of translating natural languages into SQL queries, which in turn could be translated into visualization with additional information from the natural language query supplied\cite{Zhong:2017qr}. Contributing to the progress in this area of research, we built natural language translation models to construct simplified versions of data and visualization queries in a language called Vega Zero. In this paper, we explore the design and performance of these sequence to sequence transformer based machine learning model architectures using large language models such as BERT as encoders to predict visualization commands from natural language queries, as well as apply available T5 sequence to sequence models to the problem for comparison.
    摘要 将自然语言翻译为语法正确的数据可视化命令是自然语言模型的一项重要应用,并可推广到多种任务。与之密切相关的工作是将自然语言翻译为SQL查询,再结合自然语言查询中提供的额外信息将其转换为可视化。为推动这一领域的研究,我们构建了自然语言翻译模型,用一种名为 Vega Zero 的语言生成简化的数据与可视化查询。本文探讨了这类序列到序列 Transformer 模型架构的设计与性能:一方面使用 BERT 等大型语言模型作为编码器,从自然语言查询预测可视化命令;另一方面将现有的 T5 序列到序列模型应用于该问题以作比较。

Action Recognition Utilizing YGAR Dataset

  • paper_url: http://arxiv.org/abs/2310.00831
  • repo_url: None
  • paper_authors: Shuo Wang, Amiya Ranjan, Lawrence Jiang
  • for: 这篇论文是为了 bridging the gap of high-quality action video data 的研究和应用而写的。
  • methods: 这篇论文使用了一种新的3D动作数据生成引擎,生成了3组样本数据,以示其当前的功能性。
  • results: 通过使用这种数据生成过程,可以应用于图像分类、动作识别等领域,并且有potential可以演化成更复杂的动作识别系统。
    Abstract The scarcity of high quality actions video data is a bottleneck in the research and application of action recognition. Although significant effort has been made in this area, there still exist gaps in the range of available data types a more flexible and comprehensive data set could help bridge. In this paper, we present a new 3D actions data simulation engine and generate 3 sets of sample data to demonstrate its current functionalities. With the new data generation process, we demonstrate its applications to image classifications, action recognitions and potential to evolve into a system that would allow the exploration of much more complex action recognition tasks. In order to show off these capabilities, we also train and test a list of commonly used models for image recognition to demonstrate the potential applications and capabilities of the data sets and their generation process.
    摘要 研究和应用动作识别的瓶颈是高质量动作视频数据的缺乏。尽管已经投入了大量努力,不同类型数据之间仍然存在空白,一个更灵活、更全面的数据集可以帮助弥合这些空白。在这篇论文中,我们介绍了一种新的 3D 动作数据生成引擎,并生成了 3 组样本数据来展示其当前的功能。通过新的数据生成过程,我们展示了其在图像分类、动作识别以及更复杂的动作识别任务中的应用潜力。为了展示这些能力,我们还训练和测试了一些常用的图像识别模型,以说明该数据集及其生成过程的潜在应用和能力。

cs.CL - 2023-10-02

Closing the Curious Case of Neural Text Degeneration

  • paper_url: http://arxiv.org/abs/2310.01693
  • repo_url: https://github.com/mattf1n/basis-aware-threshold
  • paper_authors: Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal
  • for: 这篇论文是为了解释 truncation sampling 规则的有效性而写的。
  • methods: 该论文使用了 теоретиче分析和实验方法来解释 truncation sampling 的有效性。
  • results: 研究发现, truncation sampling 可以保证所抽取的token都有非零真实概率,但是这种方法会丢弃一些有非零真实概率的token。此外,研究还开发了一种基于 softmax bottleneck 的 эксперименталь truncation strategy,并进行了一些 Pilot 研究,显示该方法可以在自动和人工评价指标下表现更好。
    Abstract Despite their ubiquity in language generation, it remains unknown why truncation sampling heuristics like nucleus sampling are so effective. We provide a theoretical explanation for the effectiveness of the truncation sampling by proving that truncation methods that discard tokens below some probability threshold (the most common type of truncation) can guarantee that all sampled tokens have nonzero true probability. However, thresholds are a coarse heuristic, and necessarily discard some tokens with nonzero true probability as well. In pursuit of a more precise sampling strategy, we show that we can leverage a known source of model errors, the softmax bottleneck, to prove that certain tokens have nonzero true probability, without relying on a threshold. Based on our findings, we develop an experimental truncation strategy and the present pilot studies demonstrating the promise of this type of algorithm. Our evaluations show that our method outperforms its threshold-based counterparts under automatic and human evaluation metrics for low-entropy (i.e., close to greedy) open-ended text generation. Our theoretical findings and pilot experiments provide both insight into why truncation sampling works, and make progress toward more expressive sampling algorithms that better surface the generative capabilities of large language models.
    摘要 尽管截断采样策略(如核心采样)在语言生成中非常普遍,但其有效性的原因尚不清楚。我们为截断采样的有效性提供了理论解释:证明了按某个概率阈值丢弃低概率 token 的截断方法(最常见的一类截断)可以保证所有被采样的 token 都具有非零真实概率。然而,阈值是一种粗略的启发式,必然也会丢弃一些具有非零真实概率的 token。为了寻求更精确的采样策略,我们证明可以利用一类已知的模型误差来源——softmax 瓶颈——在不依赖阈值的情况下证明某些 token 具有非零真实概率。基于这些发现,我们开发了一种实验性的截断策略,并通过试点研究展示了这类算法的潜力。评估结果表明,对于低熵(即接近贪心)的开放式文本生成,我们的方法在自动和人工评估指标下均优于基于阈值的对应方法。我们的理论发现和试点实验既解释了截断采样为何有效,也朝着能更好地释放大语言模型生成能力、更具表达力的采样算法迈进了一步。
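For reference, the threshold-style truncation family analyzed above includes top-p (nucleus) sampling, sketched below; the paper's basis-aware alternative is not reproduced here.

```python
# Standard top-p (nucleus) sampling: keep the smallest prefix of tokens whose
# cumulative probability reaches top_p, renormalize, and sample from it.
import numpy as np

def nucleus_sample(probs, top_p=0.9, rng=None):
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]                          # tokens by descending probability
    sorted_p = probs[order]
    cutoff = int(np.searchsorted(np.cumsum(sorted_p), top_p)) + 1
    kept_ids = order[:cutoff]
    kept_p = probs[kept_ids] / probs[kept_ids].sum()         # renormalize over the nucleus
    return rng.choice(kept_ids, p=kept_p)

vocab_probs = np.array([0.45, 0.25, 0.15, 0.10, 0.05])
print(nucleus_sample(vocab_probs, top_p=0.8, rng=np.random.default_rng(0)))
```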

One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition

  • paper_url: http://arxiv.org/abs/2310.01688
  • repo_url: None
  • paper_authors: Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini
  • for: 这篇论文提出了一个新的联合说话人分离(SD)与自动语音识别(ASR)框架,名为 SLIDAR(滑动窗口分离增强识别)。SLIDAR 可以处理任意长度输入和任意数量的说话人,同时解决“谁在何时说了什么”的问题。
  • methods: 该框架采用滑动窗口方法,基于一个端到端分离增强语音转写(E2E DAST)模型,该模型在每个窗口内局部地输出转写文本、说话人分离结果和说话人嵌入。E2E DAST 模型采用编码器-解码器架构,并利用了序列化输出训练和“Whisper 式”提示等最新技术。
  • results: 对 AMI 语料库单麦克风录音的实验证明,SLIDAR 在近讲和远场语音场景中均有效。
    Abstract This paper presents a novel framework for joint speaker diarization (SD) and automatic speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented recognition). SLIDAR can process arbitrary length inputs and can handle any number of speakers, effectively solving ``who spoke what, when'' concurrently. SLIDAR leverages a sliding window approach and consists of an end-to-end diarization-augmented speech transcription (E2E DAST) model which provides, locally, for each window: transcripts, diarization and speaker embeddings. The E2E DAST model is based on an encoder-decoder architecture and leverages recent techniques such as serialized output training and ``Whisper-style" prompting. The local outputs are then combined to get the final SD+ASR result by clustering the speaker embeddings to get global speaker identities. Experiments performed on monaural recordings from the AMI corpus confirm the effectiveness of the method in both close-talk and far-field speech scenarios.
    摘要 SLIDAR uses a sliding window approach and consists of an end-to-end diarization-augmented speech transcription (E2E DAST) model. The E2E DAST model is based on an encoder-decoder architecture and utilizes recent techniques such as serialized output training and "Whisper-style" prompting. The model provides locally, for each window, transcripts, diarization, and speaker embeddings.The local outputs are then combined to obtain the final SD+ASR result by clustering the speaker embeddings to obtain global speaker identities. Experiments conducted on monaural recordings from the AMI corpus demonstrate the effectiveness of the method in both close-talk and far-field speech scenarios.
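The "local windows, then global clustering" step can be illustrated as below: each window yields a speaker embedding (synthetic here), and windows are merged into global speaker identities by greedy cosine clustering. Real systems would obtain the embeddings and transcripts from the E2E model; the clustering rule and threshold are illustrative assumptions.

```python
# Toy sketch of merging per-window speaker embeddings into global identities.
import numpy as np

rng = np.random.default_rng(0)
true_speakers = [0, 0, 1, 0, 1, 1]                          # one embedding per window
centroids = rng.standard_normal((2, 16))
window_embs = np.stack([centroids[s] + 0.1 * rng.standard_normal(16) for s in true_speakers])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_cluster(embs, threshold=0.8):
    """Assign each window embedding to the closest existing cluster, or open a new one."""
    clusters, labels = [], []
    for e in embs:
        sims = [cosine(e, np.mean(c, axis=0)) for c in clusters]
        if sims and max(sims) >= threshold:
            k = int(np.argmax(sims))
            clusters[k].append(e)
        else:
            clusters.append([e]); k = len(clusters) - 1
        labels.append(k)
    return labels

print(greedy_cluster(window_embs))    # e.g. [0, 0, 1, 0, 1, 1]
```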

Defending Against Authorship Identification Attacks

  • paper_url: http://arxiv.org/abs/2310.01568
  • repo_url: None
  • paper_authors: Haining Wang
  • for: This paper focuses on the issue of authorship identification and the threat it poses to individuals who wish to remain anonymous while communicating publicly.
  • methods: The paper reviews and analyzes various methods that have been proposed to defend against authorship identification attacks, including modification and generation-based strategies, with a focus on joint efforts from the differential privacy community.
  • results: The paper highlights the limitations of current research and identifies open challenges and potential research avenues in the field of authorship identification defense.
    Abstract Authorship identification has proven unsettlingly effective in inferring the identity of the author of an unsigned document, even when sensitive personal information has been carefully omitted. In the digital era, individuals leave a lasting digital footprint through their written content, whether it is posted on social media, stored on their employer's computers, or located elsewhere. When individuals need to communicate publicly yet wish to remain anonymous, there is little available to protect them from unwanted authorship identification. This unprecedented threat to privacy is evident in scenarios such as whistle-blowing. Proposed defenses against authorship identification attacks primarily aim to obfuscate one's writing style, thereby making it unlinkable to their pre-existing writing, while concurrently preserving the original meaning and grammatical integrity. The presented work offers a comprehensive review of the advancements in this research area spanning over the past two decades and beyond. It emphasizes the methodological frameworks of modification and generation-based strategies devised to evade authorship identification attacks, highlighting joint efforts from the differential privacy community. Limitations of current research are discussed, with a spotlight on open challenges and potential research avenues.
    摘要 作者标识可以很准确地推断未签名文档的作者身份,即使仔细避免敏感个人信息。在数字时代,人们会通过自己的写作内容留下数字印记,无论是在社交媒体上发布、工作计算机上存储或其他地方。当人们需要公开沟通却希望保持匿名性时,有限的保护机制可以帮助他们避免不必要的作者标识攻击。这种威胁到隐私的现象特别明显在披露行为中。提案的防御策略主要是使用修改和生成基于策略,以避免作者标识攻击,同时保持原始的意思和语法完整性。这篇文章对过去二十年以来在这个研究领域的进步做出了全面的审视,并高亮了差异隐私社区的合作。现有研究的局限性和未解决的挑战也得到了讨论。

It’s MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk

  • paper_url: http://arxiv.org/abs/2310.01387
  • repo_url: None
  • paper_authors: Amanda Bertsch, Alex Xie, Graham Neubig, Matthew R. Gormley
  • for: 这篇论文主要是为了介绍和推广最小极大风险(MBR)解码方法,以及对NLP领域中MBR的应用和发展。
  • methods: 这篇论文使用了MBR解码方法,并对其的理论基础和最近的研究进行了介绍。同时, authors还提出了一些MBR的特例,并对这些特例的性能进行了 theoretically和实验性的研究。
  • results: 研究结果显示,MBR解码方法可以在各种任务上提供可靠的多点改进,而无需额外的数据或训练。此外, authors还发现了一些相关的NLP任务,其性能可以通过MBR的特例来进一步提高。
    Abstract Minimum Bayes Risk (MBR) decoding is a method for choosing the outputs of a machine learning system based not on the output with the highest probability, but the output with the lowest risk (expected error) among multiple candidates. It is a simple but powerful method: for an additional cost at inference time, MBR provides reliable several-point improvements across metrics for a wide variety of tasks without any additional data or training. Despite this, MBR is not frequently applied in NLP works, and knowledge of the method itself is limited. We first provide an introduction to the method and the recent literature. We show that several recent methods that do not reference MBR can be written as special cases of MBR; this reformulation provides additional theoretical justification for the performance of these methods, explaining some results that were previously only empirical. We provide theoretical and empirical results about the effectiveness of various MBR variants and make concrete recommendations for the application of MBR in NLP models, including future directions in this area.
    摘要 最小贝叶斯风险(MBR)解码是一种选择机器学习系统输出的方法:它不是选择概率最高的输出,而是在多个候选中选择风险(期望误差)最低的输出。这是一种简单而强大的方法:只需在推理时付出额外开销,MBR 即可在无需额外数据或训练的情况下,为各类任务在多种指标上带来可靠的多点提升。尽管如此,MBR 在 NLP 工作中并不常用,人们对该方法本身的了解也有限。我们首先介绍 MBR 方法及其最新文献,并证明若干近期未提及 MBR 的方法可以写成 MBR 的特例;这种重新表述为这些方法的性能提供了额外的理论依据,解释了一些此前仅有实证支持的结果。我们给出了关于各类 MBR 变体有效性的理论与实证结果,并就在 NLP 模型中应用 MBR 给出了具体建议和未来方向。
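A minimal sketch of sample-based MBR decoding follows: draw candidates from the model, then pick the candidate with the highest expected utility against the other samples acting as pseudo-references. The crude word-overlap utility stands in for a real metric such as BLEU or BERTScore.

```python
# Sample-based Minimum Bayes Risk selection over a candidate pool.
def utility(hyp, ref):
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / max(len(h | r), 1)            # Jaccard word overlap (toy utility)

def mbr_select(candidates):
    best, best_score = None, float("-inf")
    for hyp in candidates:
        score = sum(utility(hyp, ref) for ref in candidates if ref is not hyp)
        if score > best_score:
            best, best_score = hyp, score
    return best

samples = [
    "the cat sat on the mat",
    "a cat is sitting on the mat",
    "the cat sat on a mat",
    "dogs are great pets",
]
print(mbr_select(samples))   # the outlier hypothesis is unlikely to be chosen
```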

Who is ChatGPT? Benchmarking LLMs’ Psychological Portrayal Using PsychoBench

  • paper_url: http://arxiv.org/abs/2310.01386
  • repo_url: https://github.com/cuhk-arise/psychobench
  • paper_authors: Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho Lam, Shujie Ren, Youliang Yuan, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu
  • for: This paper aims to evaluate the psychological aspects of large language models (LLMs), specifically their personalities, temperaments, and emotions.
  • methods: The authors propose a framework called PsychoBench, which includes 13 scales commonly used in clinical psychology and classifies them into four categories: personality traits, interpersonal relationships, motivational tests, and emotional abilities. They also employ a jailbreak approach to bypass safety alignment protocols and test the intrinsic natures of LLMs.
  • results: The study examines five popular LLM models, including \texttt{text-davinci-003}, ChatGPT, GPT-4, LLaMA-2-7b, and LLaMA-2-13b, and provides a comprehensive evaluation of their psychological aspects using the PsychoBench framework.
    Abstract Large Language Models (LLMs) have recently showcased their remarkable capacities, not only in natural language processing tasks but also across diverse domains such as clinical medicine, legal consultation, and education. LLMs become more than mere applications, evolving into assistants capable of addressing diverse user requests. This narrows the distinction between human beings and artificial intelligence agents, raising intriguing questions regarding the potential manifestation of personalities, temperaments, and emotions within LLMs. In this paper, we propose a framework, PsychoBench, for evaluating diverse psychological aspects of LLMs. Comprising thirteen scales commonly used in clinical psychology, PsychoBench further classifies these scales into four distinct categories: personality traits, interpersonal relationships, motivational tests, and emotional abilities. Our study examines five popular models, namely \texttt{text-davinci-003}, ChatGPT, GPT-4, LLaMA-2-7b, and LLaMA-2-13b. Additionally, we employ a jailbreak approach to bypass the safety alignment protocols and test the intrinsic natures of LLMs. We have made PsychoBench openly accessible via \url{https://github.com/CUHK-ARISE/PsychoBench}.
    摘要 大语言模型(LLM)不仅在自然语言处理任务中表现出色,还在临床医学、法律咨询和教育等领域展现出强大能力,正从单纯的应用演变为能够响应多样化用户请求的助手,这引发了关于 LLM 是否会表现出人格、气质和情绪的问题。本文提出了评估 LLM 心理特征的框架 PsychoBench,它包含临床心理学常用的 13 个量表,分为人格特质、人际关系、动机测试和情绪能力四类。我们研究了 text-davinci-003、ChatGPT、GPT-4、LLaMA-2-7b 和 LLaMA-2-13b 五个流行模型,并采用越狱方法绕过安全对齐协议,以测试 LLM 的内在特性。PsychoBench 已开源:https://github.com/CUHK-ARISE/PsychoBench。

Compressing LLMs: The Truth is Rarely Pure and Never Simple

  • paper_url: http://arxiv.org/abs/2310.01382
  • repo_url: None
  • paper_authors: Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang
  • for: 这个论文的目的是重新评估现有的最佳压缩方法(pruning和quantization)的效果,以及它们是否能够保留大语言模型(LLM)的语言理解和创作能力。
  • methods: 这个论文使用了现有的最佳压缩方法(pruning和quantization),并通过设计了一个新的评估卷(LLM-KICK)来评估压缩后的LLM的性能。
  • results: 研究发现,现有的压缩方法在一些任务上存在较大的性能下降,尤其是在知识密集的任务上;而压缩后的LLM仍然能够保持高效的语言理解和创作能力。
    Abstract Despite their remarkable achievements, modern Large Language Models (LLMs) encounter exorbitant computational and memory footprints. Recently, several works have shown significant success in training-free and data-free compression (pruning and quantization) of LLMs achieving 50-60% sparsity and reducing the bit-width down to 3 or 4 bits per weight, with negligible perplexity degradation over the uncompressed baseline. As recent research efforts are focused on developing increasingly sophisticated compression methods, our work takes a step back, and re-evaluates the effectiveness of existing SoTA compression methods, which rely on a fairly simple and widely questioned metric, perplexity (even for dense LLMs). We introduce Knowledge-Intensive Compressed LLM BenchmarK (LLM-KICK), a collection of carefully-curated tasks to re-define the evaluation protocol for compressed LLMs, which have significant alignment with their dense counterparts, and perplexity fail to capture subtle change in their true capabilities. LLM-KICK unveils many favorable merits and unfortunate plights of current SoTA compression methods: all pruning methods suffer significant performance degradation, sometimes at trivial sparsity ratios (e.g., 25-30%), and fail for N:M sparsity on knowledge-intensive tasks; current quantization methods are more successful than pruning; yet, pruned LLMs even at $\geq 50$% sparsity are robust in-context retrieval and summarization systems; among others. LLM-KICK is designed to holistically access compressed LLMs' ability for language understanding, reasoning, generation, in-context retrieval, in-context summarization, etc. We hope our study can foster the development of better LLM compression methods. All our related codes are planed to be open-sourced.
    摘要 尽管现代大语言模型(LLM)成就斐然,但其计算与内存开销巨大。最近的一些工作表明,免训练、免数据的压缩(剪枝与量化)可以在 LLM 上达到 50-60% 的稀疏度并将位宽降至每权重 3-4 位,且相对未压缩基线的困惑度几乎没有退化。当前的研究大多致力于开发越来越复杂的压缩方法,而我们的工作则退后一步,重新评估依赖困惑度这一相当简单且广受质疑的指标的现有最先进压缩方法。我们提出了知识密集型压缩 LLM 基准(LLM-KICK),由一组精心挑选、与稠密模型能力高度对应的任务组成,用于重新定义压缩 LLM 的评估协议,因为困惑度无法捕捉其真实能力的细微变化。LLM-KICK 揭示了当前最先进压缩方法的诸多优点与不足:所有剪枝方法都会出现显著的性能下降,有时在很低的稀疏率(例如 25-30%)下即是如此,并且在知识密集型任务上的 N:M 稀疏中失效;当前的量化方法比剪枝更为成功;不过,即使在大于等于 50% 的稀疏度下,剪枝后的 LLM 仍然是稳健的上下文检索与摘要系统。LLM-KICK 旨在全面评估压缩 LLM 在语言理解、推理、生成、上下文检索和上下文摘要等方面的能力。我们希望这项研究能够推动更好的 LLM 压缩方法的发展,相关代码计划开源。
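For context, one of the compression primitives evaluated above can be sketched as unstructured magnitude pruning of a weight matrix to a target sparsity; real LLM pruning methods (layer-wise, N:M, calibration-based) are considerably more involved.

```python
# Unstructured magnitude pruning: zero out the smallest-magnitude weights.
import torch

def magnitude_prune(weight, sparsity=0.5):
    """Zero out the smallest-magnitude entries so roughly `sparsity` of them are zero."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

W = torch.randn(256, 256)
W_pruned = magnitude_prune(W, sparsity=0.6)
print(f"sparsity: {(W_pruned == 0).float().mean().item():.2f}")   # ~0.60
```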

DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation

  • paper_url: http://arxiv.org/abs/2310.01381
  • repo_url: https://github.com/rbenita/diffar
  • paper_authors: Roi Benita, Michael Elad, Joseph Keshet
  • for: 高质量的语音生成
  • methods: 去噪扩散自回归模型
  • results: 提高语音质量
    Abstract Diffusion models have recently been shown to be relevant for high-quality speech generation. Most work has been focused on generating spectrograms, and as such, they further require a subsequent model to convert the spectrogram to a waveform (i.e., a vocoder). This work proposes a diffusion probabilistic end-to-end model for generating a raw speech waveform. The proposed model is autoregressive, generating overlapping frames sequentially, where each frame is conditioned on a portion of the previously generated one. Hence, our model can effectively synthesize an unlimited speech duration while preserving high-fidelity synthesis and temporal coherence. We implemented the proposed model for unconditional and conditional speech generation, where the latter can be driven by an input sequence of phonemes, amplitudes, and pitch values. Working on the waveform directly has some empirical advantages. Specifically, it allows the creation of local acoustic behaviors, like vocal fry, which makes the overall waveform sounds more natural. Furthermore, the proposed diffusion model is stochastic and not deterministic; therefore, each inference generates a slightly different waveform variation, enabling abundance of valid realizations. Experiments show that the proposed model generates speech with superior quality compared with other state-of-the-art neural speech generation systems.
    摘要 扩散模型最近被证明可用于高质量语音生成。以往工作大多集中在生成频谱图上,因而还需要一个后续模型(即声码器)将频谱图转换为波形。本工作提出了一种端到端的扩散概率模型,用于直接生成原始语音波形。该模型是自回归的,按顺序生成相互重叠的帧,每一帧都以前一帧的一部分为条件,因此可以在保持高保真合成和时间连贯性的同时合成任意长度的语音。我们实现了该模型的无条件和有条件语音生成,后者可以由音素、幅度和基频值组成的输入序列驱动。直接在波形上工作具有一些经验上的优势:它允许生成局部的声学行为(例如气泡音),使整体波形听起来更自然;此外,该扩散模型是随机而非确定性的,每次推理都会生成略有不同的波形变体,从而得到大量有效的实现。实验表明,与其他最先进的神经语音生成系统相比,该模型生成的语音质量更优。

GenSim: Generating Robotic Simulation Tasks via Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01361
  • repo_url: https://github.com/liruiw/gensim
  • paper_authors: Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao, Yuzhe Qin, Bailin Wang, Huazhe Xu, Xiaolong Wang
  • for: 这篇论文提出了一种自动生成丰富的 simulation 环境和专家示范的方法,以帮助训练通用机器人策略。
  • methods: 该方法利用大型语言模型(LLM)的接地(grounding)与代码生成能力,为多种不同任务生成 simulation 数据。
  • results: 研究发现,利用 LLM 生成的 simulation 程序可以显著提升任务级泛化能力,并且在多任务策略训练中提高真实世界中的任务执行能力。
    Abstract Collecting large amounts of real-world interaction data to train general robotic policies is often prohibitively expensive, thus motivating the use of simulation data. However, existing methods for data generation have generally focused on scene-level diversity (e.g., object instances and poses) rather than task-level diversity, due to the human effort required to come up with and verify novel tasks. This has made it challenging for policies trained on simulation data to demonstrate significant task-level generalization. In this paper, we propose to automatically generate rich simulation environments and expert demonstrations by exploiting a large language models' (LLM) grounding and coding ability. Our approach, dubbed GenSim, has two modes: goal-directed generation, wherein a target task is given to the LLM and the LLM proposes a task curriculum to solve the target task, and exploratory generation, wherein the LLM bootstraps from previous tasks and iteratively proposes novel tasks that would be helpful in solving more complex tasks. We use GPT4 to expand the existing benchmark by ten times to over 100 tasks, on which we conduct supervised finetuning and evaluate several LLMs including finetuned GPTs and Code Llama on code generation for robotic simulation tasks. Furthermore, we observe that LLMs-generated simulation programs can enhance task-level generalization significantly when used for multitask policy training. We further find that with minimal sim-to-real adaptation, the multitask policies pretrained on GPT4-generated simulation tasks exhibit stronger transfer to unseen long-horizon tasks in the real world and outperform baselines by 25%. See the project website (https://liruiw.github.io/gensim) for code, demos, and videos.
    摘要 collecting large amounts of real-world interaction data to train general robotic policies is often too expensive, so people use simulation data instead. However, existing methods for making simulation data have focused on scene-level diversity (like object instances and poses) instead of task-level diversity, because it's hard to come up with and check new tasks. This has made it hard for policies trained on simulation data to do well on new tasks.In this paper, we propose a way to automatically make rich simulation environments and expert demonstrations by using a large language model's (LLM) ability to ground and code. Our approach, called GenSim, has two modes: goal-directed generation, where we give the LLM a target task and it proposes a task curriculum to solve the target task, and exploratory generation, where the LLM starts with previous tasks and comes up with new tasks that would be helpful for solving more complex tasks. We use GPT4 to add ten times more tasks to the existing benchmark, and we finetune and evaluate several LLMs, including finetuned GPTs and Code Llama, on code generation for robotic simulation tasks.We find that the LLM-generated simulation programs can improve task-level generalization a lot when used for multitask policy training. We also find that with just a little bit of sim-to-real adaptation, the multitask policies pretrained on GPT4-generated simulation tasks can do well on unseen long-horizon tasks in the real world and are 25% better than baselines. You can find more information, including code, demos, and videos, on the project website (https://liruiw.github.io/gensim).

  • paper_url: http://arxiv.org/abs/2310.01271
  • repo_url: https://github.com/thulawtech/leec
  • paper_authors: Xue Zongyue, Liu Huanghai, Hu Yiran, Kong Kangle, Wang Chenlu, Liu Yun, Shen Weixing
  • for: This paper aims to provide more comprehensive and accurate legal knowledge, improving the interpretation and analysis of legal cases and enabling a wide range of downstream applications across legal domains.
  • methods: The authors construct a dataset of 15,831 judicial documents and 159 labels, using a label system and annotation guideline designed by legal experts, and validate its usability with several state-of-the-art models.
  • results: The experiments confirm the usability of the LEEC dataset, which supplies richer legal knowledge and can drive downstream applications across legal domains.
    Abstract As a pivotal task in natural language processing, element extraction has gained significance in the legal domain. Extracting legal elements from judicial documents helps enhance interpretative and analytical capacities of legal cases, and thereby facilitating a wide array of downstream applications in various domains of law. Yet existing element extraction datasets are limited by their restricted access to legal knowledge and insufficient coverage of labels. To address this shortfall, we introduce a more comprehensive, large-scale criminal element extraction dataset, comprising 15,831 judicial documents and 159 labels. This dataset was constructed through two main steps: first, designing the label system by our team of legal experts based on prior legal research which identified critical factors driving and processes generating sentencing outcomes in criminal cases; second, employing the legal knowledge to annotate judicial documents according to the label system and annotation guideline. The Legal Element ExtraCtion dataset (LEEC) represents the most extensive and domain-specific legal element extraction dataset for the Chinese legal system. Leveraging the annotated data, we employed various SOTA models that validates the applicability of LEEC for Document Event Extraction (DEE) task. The LEEC dataset is available on https://github.com/THUlawtech/LEEC .
    摘要 为了提高自然语言处理的能力,法律领域中的元素提取任务已经具有重要的意义。从法律文档中提取法律元素可以提高对法律案例的解释和分析能力,并且可以推动许多领域的法律应用。然而,现有的元素提取数据集受到了限制的法律知识和不够的标签覆盖的限制。为了解决这个问题,我们提出了一个更加全面、大规模的刑事元素提取数据集,包括15831份司法文档和159个标签。这个数据集通过以下两个步骤建立:first,我们的法律专家团队根据过去的法律研究设计了标签系统,该系统基于在刑事案例中驱动和生成刑事裁决结果的关键因素和过程; second,我们使用法律知识来对司法文档按照标签系统和注释指南进行标注。这个LEEC数据集(法律元素提取数据集)是中国法律系统中最广泛和域specific的法律元素提取数据集。利用注释数据,我们使用了多种当今最佳实践的模型,validate了LEEC数据集的可用性 для文档事件提取(DEE)任务。LEEC数据集可以在https://github.com/THUlawtech/LEEC上下载。

Improving Emotional Expression and Cohesion in Image-Based Playlist Description and Music Topics: A Continuous Parameterization Approach

  • paper_url: http://arxiv.org/abs/2310.01248
  • repo_url: None
  • paper_authors: Yuelyu Ji, Yuheng Song, Wei Wang, Ruoyi Xu, Zhongqian Xie, Huiyun Liu
  • for: This work aims to improve style control and emotional expression in text generation.
  • methods: It proposes Continuous Parameterization for Controlled Text Generation (CPCTG), which uses a language model (LM) as a style learner and incorporates Semantic Cohesion (SC) and Emotional Expression Proportion (EEP) factors.
  • results: On playlist description and music topic generation tasks, the method improves ROUGE scores, showing how controlling external factors shapes the generated text.
    Abstract Text generation in image-based platforms, particularly for music-related content, requires precise control over text styles and the incorporation of emotional expression. However, existing approaches often need help to control the proportion of external factors in generated text and rely on discrete inputs, lacking continuous control conditions for desired text generation. This study proposes Continuous Parameterization for Controlled Text Generation (CPCTG) to overcome these limitations. Our approach leverages a Language Model (LM) as a style learner, integrating Semantic Cohesion (SC) and Emotional Expression Proportion (EEP) considerations. By enhancing the reward method and manipulating the CPCTG level, our experiments on playlist description and music topic generation tasks demonstrate significant improvements in ROUGE scores, indicating enhanced relevance and coherence in the generated text.
    摘要 文本生成在图像基础的平台上,特别是音乐相关内容,需要精准控制文本风格和表达情感。然而,现有的方法 часто需要更多的外部因素控制生成的文本,而且依赖于精确的输入,缺乏连续的控制条件 для满意的文本生成。本研究提出了连续参数化控制文本生成(CPCTG)方法,以解决这些限制。我们的方法利用语言模型(LM)作为风格学习器,结合含义相关性(SC)和情感表达比例(EEP)考虑因素。通过改进奖励方法和CPCTG水平的调整,我们在歌单描述和音乐主题生成任务上进行了实验,并达到了ROUGE分数的显著提高, indicating that the generated text has enhanced relevance and coherence.

Label Supervised LLaMA Finetuning

  • paper_url: http://arxiv.org/abs/2310.01208
  • repo_url: https://github.com/4ai/ls-llama
  • paper_authors: Zongxi Li, Xianming Li, Yuzhang Liu, Haoran Xie, Jing Li, Fu-lee Wang, Qing Li, Xiaoqin Zhong
  • for: This work aims to improve the performance of large language models (LLMs) on classification tasks by finetuning them with discriminant labels rather than instructions.
  • methods: The method extracts latent representations from the final LLaMA layer, projects them into the label space, and minimizes the cross-entropy loss with Low-Rank Adaptation (LoRA).
  • results: Label-supervised finetuning substantially improves text classification, matching or exceeding BERT-Large and RoBERTa-Large baselines, and removing the decoder's causal mask (LS-unLLaMA) achieves state-of-the-art performance in named entity recognition (NER).
    Abstract The recent success of Large Language Models (LLMs) has gained significant attention in both academia and industry. Substantial efforts have been made to enhance the zero- and few-shot generalization capabilities of open-source LLMs through finetuning. Currently, the prevailing approach is instruction-tuning, which trains LLMs to complete real-world tasks by generating responses guided by natural language instructions. It is worth noticing that such an approach may underperform in sequence and token classification tasks. Unlike text generation tasks, classification tasks have a limited label space, where precise label prediction is more appreciated than generating diverse and human-like responses. Prior research has unveiled that instruction-tuned LLMs cannot outperform BERT, prompting us to explore the potential of leveraging latent representations from LLMs for supervised label prediction. In this paper, we introduce a label-supervised adaptation for LLMs, which aims to finetuning the model with discriminant labels. We evaluate this approach with Label Supervised LLaMA (LS-LLaMA), based on LLaMA-2-7B, a relatively small-scale LLM, and can be finetuned on a single GeForce RTX4090 GPU. We extract latent representations from the final LLaMA layer and project them into the label space to compute the cross-entropy loss. The model is finetuned by Low-Rank Adaptation (LoRA) to minimize this loss. Remarkably, without intricate prompt engineering or external knowledge, LS-LLaMA substantially outperforms LLMs ten times its size in scale and demonstrates consistent improvements compared to robust baselines like BERT-Large and RoBERTa-Large in text classification. Moreover, by removing the causal mask from decoders, LS-unLLaMA achieves the state-of-the-art performance in named entity recognition (NER). Our work will shed light on a novel approach to adapting LLMs for various downstream tasks.
    摘要 最近,大型语言模型(LLMs)的成功吸引了学术和业界的重要注意。为了提高开源LLMs的零和几架演算数据的泛化能力,有很大的努力。现在的主流方法是指令调教,将LLMs训练为完成实际任务的回应,这可能会在序列和Token分类任务下表现不佳。不同于文本生成任务,分类任务有受限的标签空间,需要更加精确地预测标签,而不是生成多样化和人工化的回应。先前的研究发现,指令调教的LLMs无法超越BERT,这让我们探索可以从LLMs中提取的潜在表示,并调整这些表示以进行监督学习。在这篇文章中,我们将引入一种标签监督的LLMs修改方法,将模型调整为使用标签为条件的标签损失。我们使用Label Supervised LLaMA(LS-LLaMA),基于LLaMA-2-7B,一个较小规模的LLM,并可以在单个GeForce RTX4090 GPU上调整。我们从最终的LLaMA层提取潜在表示,并将其转换到标签空间中,以计算标签损失。模型运行LoRA来最小化这个损失。给了不需要复杂的提示工程或外部知识,LS-LLaMA在文本分类任务上取得了很好的表现,并与BERT-Large和RoBERTa-Large等稳定基准一样。此外,当我们从构成推断器中移除了导致推断器的隐藏掩蔽,LS-unLLaMA在命名实体识别(NER)任务上取得了顶尖的表现。我们的工作将照明一种新的LLMs的修改方法,以应对不同的下游任务。
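To make the label-supervised recipe concrete, here is a minimal PyTorch sketch of a classification head over a decoder's final-layer hidden states, assuming last-token pooling and dummy tensors in place of LLaMA-2-7B outputs; the actual work finetunes the backbone with LoRA, which is omitted here.

```python
# Hypothetical sketch of label-supervised finetuning on a decoder LLM's final
# hidden states (pooling choice and names are illustrative, not the paper's code).
import torch
import torch.nn as nn

class LabelSupervisedHead(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # Project the final-layer latent representation into the label space.
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, last_hidden_state, attention_mask, labels=None):
        # Pool the representation of the last non-padding token (one common
        # choice for causal decoders; the paper's exact pooling may differ).
        lengths = attention_mask.sum(dim=1) - 1
        pooled = last_hidden_state[torch.arange(last_hidden_state.size(0)), lengths]
        logits = self.classifier(pooled)                        # (batch, num_labels)
        loss = None
        if labels is not None:
            loss = nn.functional.cross_entropy(logits, labels)  # discriminant objective
        return loss, logits

# Usage with dummy tensors standing in for LLaMA-2-7B outputs (hidden_size=4096).
head = LabelSupervisedHead(hidden_size=4096, num_labels=3)
hidden = torch.randn(2, 16, 4096)            # (batch, seq_len, hidden)
mask = torch.ones(2, 16, dtype=torch.long)
loss, logits = head(hidden, mask, labels=torch.tensor([0, 2]))
print(loss.item(), logits.shape)
```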

Target-Aware Contextual Political Bias Detection in News

  • paper_url: http://arxiv.org/abs/2310.01138
  • repo_url: None
  • paper_authors: Iffat Maab, Edison Marrese-Taylor, Yutaka Matsuo
  • for: This paper proposes a target-aware, context-sensitive approach to sentence-level political bias detection in news, aiming to improve detection accuracy.
  • methods: It uses a bias-sensitive, target-aware data augmentation technique that searches for context more carefully instead of over-generalizing bias context boundaries.
  • results: Combined with pre-trained models such as BERT, the augmentation achieves state-of-the-art results on the BASIL dataset (F1 score of 58.15), significantly outperforming previous methods.
    Abstract Media bias detection requires comprehensive integration of information derived from multiple news sources. Sentence-level political bias detection in news is no exception, and has proven to be a challenging task that requires an understanding of bias in consideration of the context. Inspired by the fact that humans exhibit varying degrees of writing styles, resulting in a diverse range of statements with different local and global contexts, previous work in media bias detection has proposed augmentation techniques to exploit this fact. Despite their success, we observe that these techniques introduce noise by over-generalizing bias context boundaries, which hinders performance. To alleviate this issue, we propose techniques to more carefully search for context using a bias-sensitive, target-aware approach for data augmentation. Comprehensive experiments on the well-known BASIL dataset show that when combined with pre-trained models such as BERT, our augmentation techniques lead to state-of-the-art results. Our approach outperforms previous methods significantly, obtaining an F1-score of 58.15 over state-of-the-art bias detection task.
    摘要 媒体偏见检测需要全面融合多个新闻源的信息。新闻层次的政治偏见检测是一项复杂的任务,需要理解偏见的上下文。人类在写作时会展现出不同的写作风格,导致各种不同的本地和全球上下文,previous work in media bias detection has proposed augmentation techniques to exploit this fact. Despite their success, we observe that these techniques introduce noise by over-generalizing bias context boundaries, which hinders performance. To alleviate this issue, we propose techniques to more carefully search for context using a bias-sensitive, target-aware approach for data augmentation. Comprehensive experiments on the well-known BASIL dataset show that when combined with pre-trained models such as BERT, our augmentation techniques lead to state-of-the-art results. Our approach outperforms previous methods significantly, obtaining an F1-score of 58.15 over state-of-the-art bias detection task.

Text Data Augmentation in Low-Resource Settings via Fine-Tuning of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01119
  • repo_url: None
  • paper_authors: Jean Kaddour, Qi Liu
  • for: This paper aims to improve the downstream performance of small language models by using fine-tuned teacher models to generate and annotate training data.
  • methods: Fine-tuned teacher LLMs are used for both data generation and data annotation in low-resource settings.
  • results: Across four text classification and two text generation tasks, both generation and annotation dramatically improve the downstream models, sometimes requiring only a small fraction of the original training data.
    Abstract The in-context learning ability of large language models (LLMs) enables them to generalize to novel downstream tasks with relatively few labeled examples. However, they require enormous computational resources to be deployed. Alternatively, smaller models can solve specific tasks if fine-tuned with enough labeled examples. These examples, however, are expensive to obtain. In pursuit of the best of both worlds, we study the annotation and generation of fine-tuning training data via fine-tuned teacher LLMs to improve the downstream performance of much smaller models. In four text classification and two text generation tasks, we find that both data generation and annotation dramatically improve the respective downstream model's performance, occasionally necessitating only a minor fraction of the original training dataset.
    摘要 大型语言模型(LLMs)的内Context learning能力允许它们通过相对少量的标注例子进行泛化,但是它们需要巨大的计算资源进行部署。 Alternatively, 较小的模型可以通过精确地调整的例子来解决特定任务,但是这些例子可以是昂贵的获得。为了取得最佳的世界,我们研究了精度和生成 fine-tuning 训练数据的注释和生成,以提高下游模型的性能。在四个文本分类和两个文本生成任务中,我们发现了数据生成和注释都能够对相应的下游模型性能有很大的改善,occasionally只需要一小部分的原始训练数据。

GraphText: Graph Reasoning in Text Space

  • paper_url: http://arxiv.org/abs/2310.01089
  • repo_url: None
  • paper_authors: Jianan Zhao, Le Zhuo, Yikang Shen, Meng Qu, Kai Liu, Michael Bronstein, Zhaocheng Zhu, Jian Tang
  • for: This paper explores how large language models (LLMs) can be applied to graph machine learning and how graph data can be made accessible to LLMs.
  • methods: It proposes GraphText, a framework that derives a graph-syntax tree from node attributes and inter-node relationships; traversing the tree yields a graph text sequence that an LLM can process as a text generation task.
  • results: GraphText enables training-free graph reasoning that matches or surpasses supervised graph neural networks through in-context learning, and supports interactive graph reasoning in natural language between humans and LLMs.
    Abstract Large Language Models (LLMs) have gained the ability to assimilate human knowledge and facilitate natural language interactions with both humans and other LLMs. However, despite their impressive achievements, LLMs have not made significant advancements in the realm of graph machine learning. This limitation arises because graphs encapsulate distinct relational data, making it challenging to transform them into natural language that LLMs understand. In this paper, we bridge this gap with a novel framework, GraphText, that translates graphs into natural language. GraphText derives a graph-syntax tree for each graph that encapsulates both the node attributes and inter-node relationships. Traversal of the tree yields a graph text sequence, which is then processed by an LLM to treat graph tasks as text generation tasks. Notably, GraphText offers multiple advantages. It introduces training-free graph reasoning: even without training on graph data, GraphText with ChatGPT can achieve on par with, or even surpassing, the performance of supervised-trained graph neural networks through in-context learning (ICL). Furthermore, GraphText paves the way for interactive graph reasoning, allowing both humans and LLMs to communicate with the model seamlessly using natural language. These capabilities underscore the vast, yet-to-be-explored potential of LLMs in the domain of graph machine learning.
    摘要 GraphText 具有多个优点。它提供了无需训练的图理解:即使没有训练图数据,GraphText 与 ChatGPT 可以通过在场景学习(ICL)来达到与supervised 训练的图神经网络性能相同或者甚至超越性能。此外,GraphText 开创了互动式图理解的可能性,允许人类和 LLMs 通过自然语言与模型进行互动。这些能力表明 LLMs 在图机器学习领域的潜在潜力还尚未得到了充分的发掘。

Towards human-like spoken dialogue generation between AI agents from written dialogue

  • paper_url: http://arxiv.org/abs/2310.01088
  • repo_url: None
  • paper_authors: Kentaro Mitsui, Yukiya Hono, Kei Sawada
  • for: This work aims to generate human-like spoken dialogue from written dialogue, improving naturalness and fluidity.
  • methods: It proposes CHATS (CHatty Agents Text-to-Speech), a discrete token-based system that generates speech for both the speaker and the listener simultaneously using only the speaker-side transcription, and handles turn-taking by deciding silence durations and generating overlapping speech from the phoneme sequence of the next utterance.
  • results: Experiments show that CHATS produces dialogue that is more interactive and fluid than a text-to-speech baseline while retaining clarity and intelligibility.
    Abstract The advent of large language models (LLMs) has made it possible to generate natural written dialogues between two agents. However, generating human-like spoken dialogues from these written dialogues remains challenging. Spoken dialogues have several unique characteristics: they frequently include backchannels and laughter, and the smoothness of turn-taking significantly influences the fluidity of conversation. This study proposes CHATS - CHatty Agents Text-to-Speech - a discrete token-based system designed to generate spoken dialogues based on written dialogues. Our system can generate speech for both the speaker side and the listener side simultaneously, using only the transcription from the speaker side, which eliminates the need for transcriptions of backchannels or laughter. Moreover, CHATS facilitates natural turn-taking; it determines the appropriate duration of silence after each utterance in the absence of overlap, and it initiates the generation of overlapping speech based on the phoneme sequence of the next utterance in case of overlap. Experimental evaluations indicate that CHATS outperforms the text-to-speech baseline, producing spoken dialogues that are more interactive and fluid while retaining clarity and intelligibility.
    摘要 LLM的出现使得可以生成自然的书面对话。然而,将书面对话转化为人工语音对话仍然是一项挑战。口语对话具有许多特殊特征,包括后台喊叫和笑声,以及对话的流畅性具有重要影响。本研究提出了 CHATS(对话Agent Text-to-Speech),一种基于粒子token的分割系统,用于将书面对话转化为语音对话。我们的系统可以同时生成说话者和听者两个角色的语音,只使用说话者的讲稿,这就不需要后台喊叫或笑声的转录。此外,CHATS支持自然的转化,可以根据每个语音的持续时间来决定合适的沉默时间长度,并在语音重叠时生成相应的 overlap 语音。实验评估表明,CHATS在人工语音对话中表现出色,生成的对话更加交互、流畅,同时保持清晰和理解性。

Tool-Augmented Reward Modeling

  • paper_url: http://arxiv.org/abs/2310.01045
  • repo_url: None
  • paper_authors: Lei Li, Yekun Chai, Shuohuan Wang, Yu Sun, Hao Tian, Ningyu Zhang, Hua Wu
  • for: This paper targets aligning large language models with human preferences via reward modeling, particularly for reinforcement learning from human feedback (RLHF).
  • methods: It proposes Themis, a tool-augmented reward model that interacts with external environments such as calculators and search engines, constructing task-specific tool engagement and reasoning traces autoregressively to improve interpretive capacity and scoring reliability.
  • results: Themis improves preference ranking by 17.7% on average across eight tasks, outperforms Gopher 280B by 7.3% on TruthfulQA in zero-shot evaluation, and RLHF trained with Themis attains an average 32% win rate over baselines in human evaluations across four tasks.
    Abstract Reward modeling (a.k.a., preference modeling) is instrumental for aligning large language models with human preferences, particularly within the context of reinforcement learning from human feedback (RLHF). While conventional reward models (RMs) have exhibited remarkable scalability, they often struggle with fundamental functionality such as arithmetic computation, code execution, and factual lookup. In this paper, we propose a tool-augmented preference modeling approach, named Themis, to address these limitations by empowering RMs with access to external environments, including calculators and search engines. This approach not only fosters synergy between tool utilization and reward grading but also enhances interpretive capacity and scoring reliability. Our study delves into the integration of external tools into RMs, enabling them to interact with diverse external sources and construct task-specific tool engagement and reasoning traces in an autoregressive manner. We validate our approach across a wide range of domains, incorporating seven distinct external tools. Our experimental results demonstrate a noteworthy overall improvement of 17.7% across eight tasks in preference ranking. Furthermore, our approach outperforms Gopher 280B by 7.3% on TruthfulQA task in zero-shot evaluation. In human evaluations, RLHF trained with Themis attains an average win rate of 32% when compared to baselines across four distinct tasks. Additionally, we provide a comprehensive collection of tool-related RM datasets, incorporating data from seven distinct tool APIs, totaling 15,000 instances. We anticipate that this publicly available dataset will facilitate and inspire further research advancements in the field.
    摘要 大规模语言模型( preference modeling)在人类偏好的满足方面发挥重要作用,特别是在人类反馈学习(RLHF)的情况下。传统的奖励模型(RM)具有惊人的扩展性,但它们经常在基本功能方面遇到困难,如数学计算、代码执行和事实查找。在本文中,我们提出了一种工具辅助 preference modeling 方法,名为 Themis,以解决这些限制。 Themis 方法不仅在工具使用和奖励评估之间建立了合作关系,还提高了解释能力和评价可靠性。我们的研究探讨了在 RM 中 инте incorporating 外部环境,包括计算器和搜索引擎,以便让 RM 可以与多种外部源交互。我们的实验表明,Themis 在多个领域的 eight 个任务中提高了总体性能的 17.7%,并在 TruthfulQA 任务上与 Gopher 280B 相比,在零化评估中提高了7.3%。在人类评估中,RLHF 在四个不同任务中的平均赢利率为 32%。此外,我们还提供了一个包含 seven 种工具 API 数据的全面的 RM 数据集,总计 15,000 个实例。我们期望这个公共可用的数据集将会促进和推动领域的研究进步。

Language Model Decoding as Direct Metrics Optimization

  • paper_url: http://arxiv.org/abs/2310.01041
  • repo_url: None
  • paper_authors: Haozhe Ji, Pei Ke, Hongning Wang, Minlie Huang
  • for: This paper proposes a new language model decoding method that strictly matches the expected performance of human texts across multiple desired aspects simultaneously.
  • methods: Decoding is framed as an optimization problem whose solution scales the input language model distribution by a sequence-level energy function defined over the chosen metrics; sampling from this distribution uses the Sampling-Importance-Resampling technique.
  • results: Across domains and model scales, the method aligns better with human texts on the targeted metrics and outperforms strong baselines in human evaluation.
    Abstract Despite the remarkable advances in language modeling, current mainstream decoding methods still struggle to generate texts that align with human texts across different aspects. In particular, sampling-based methods produce less-repetitive texts which are often disjunctive in discourse, while search-based methods maintain topic coherence at the cost of increased repetition. Overall, these methods fall short in achieving holistic alignment across a broad range of aspects. In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts measured by multiple metrics of desired aspects simultaneously. The resulting decoding distribution enjoys an analytical solution that scales the input language model distribution via a sequence-level energy function defined by these metrics. And most importantly, we prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts. To facilitate tractable sampling from this globally normalized distribution, we adopt the Sampling-Importance-Resampling technique. Experiments on various domains and model scales demonstrate the superiority of our method in metrics alignment with human texts and human evaluation over strong baselines.
    摘要 In this work, we approach decoding as an optimization problem, aiming to strictly match the expected performance with human texts as measured by multiple metrics of desired aspects simultaneously. The resulting decoding distribution has an analytical solution that scales the input language model distribution via a sequence-level energy function defined by these metrics. Moreover, we prove that this induced distribution is guaranteed to improve the perplexity on human texts, suggesting a better approximation to the underlying distribution of human texts.To facilitate tractable sampling from this globally normalized distribution, we adopt the Sampling-Importance-Resampling technique. Our experiments on various domains and model scales demonstrate the superiority of our method in aligning with human texts and outperforming strong baselines in human evaluation.

ARN: A Comprehensive Framework and Dataset for Analogical Reasoning on Narratives

  • paper_url: http://arxiv.org/abs/2310.00996
  • repo_url: None
  • paper_authors: Zhivar Sourati, Filip Ilievski, Pia Sommerauer
  • for: This work aims to bridge the gap between how analogical reasoning is studied in cognitive psychology and how it is evaluated in natural language processing (NLP).
  • methods: It computationally adapts cognitive-psychology theories of analogical reasoning to narratives, building a large-scale evaluation framework and releasing the Analogical Reasoning on Narratives (ARN) dataset.
  • results: Large language models (LLMs) struggle to recognize higher-order mappings unless lower-order mappings are also present (near analogies), and their analogical reasoning is easily impaired by near distractors.
    Abstract Analogical reasoning is one of the prime abilities of humans and is linked to creativity and scientific discoveries. This ability has been studied extensively in natural language processing (NLP) as well as in cognitive psychology by proposing various benchmarks and evaluation setups. Yet, a substantial gap exists between evaluations of analogical reasoning in cognitive psychology and NLP. Our aim is to bridge this by computationally adapting theories related to analogical reasoning from cognitive psychology in the context of narratives and developing an evaluation framework large in scale. More concretely, we propose the task of matching narratives based on system mappings and release the Analogical Reasoning on Narratives (ARN) dataset. To create the dataset, we devise a framework inspired by cognitive psychology theories about analogical reasoning to utilize narratives and their components to form mappings of different abstractness levels. These mappings are then leveraged to create pairs of analogies and disanalogies/distractors with more than 1k triples of query narratives, analogies, and distractors. We cover four categories of far/near analogies and far/near distractors that allow us to study analogical reasoning in models from distinct perspectives. In this study, we evaluate different large language models (LLMs) on this task. Our results demonstrate that LLMs struggle to recognize higher-order mappings when they are not accompanied by lower-order mappings (far analogies) and show better performance when all mappings are present simultaneously (near analogies). We observe that in all the settings, the analogical reasoning abilities of LLMs can be easily impaired by near distractors that form lower-order mappings with the query narratives.
    摘要 人类的分析逻辑能力是其创造力和科学发现的重要能力之一,这种能力已经在自然语言处理(NLP)和认知心理学中得到了广泛的研究。然而,存在评估分析逻辑能力在认知心理学和NLP之间的重大差距。我们的目标是通过计算方式把认知心理学中关于分析逻辑的理论应用到叙事中,并开发一个大规模的评估框架。更具体地,我们提出了匹配叙事基于系统映射的任务,并释放了分析逻辑在叙事中(ARN)数据集。为创建数据集,我们采用了基于认知心理学中关于分析逻辑的理论框架,使用叙事和其组成部分来形成不同抽象水平的映射。这些映射然后被利用来创建更多于1k对叙事、类比和干扰的对比。我们将其分为四类的远/近类比和远/近干扰,以便通过不同的角度研究分析逻辑能力。在这项研究中,我们评估了不同的大语言模型(LLMs)。我们的结果表明,当高级映射不同于下级映射时,LLMs很难认出更高级的映射,并且在所有设置下,LLMs的分析逻辑能力都受到近映射的干扰。

EALM: Introducing Multidimensional Ethical Alignment in Conversational Information Retrieval

  • paper_url: http://arxiv.org/abs/2310.00970
  • repo_url: https://github.com/wanng-ide/ealm
  • paper_authors: Yiyao Yu, Junjie Wang, Yuxiang Zhang, Lin Zhang, Yujiu Yang, Tetsuya Sakai
  • For: The paper aims to improve the ethical alignment of Conversational Information Retrieval (CIR) systems by incorporating human norms and ethical considerations into the system's workflow.
  • Methods: The authors introduce a workflow that integrates ethical alignment into the CIR process, using an initial ethical judgment stage for efficient data screening. They also present two datasets, QA-ETHICS and MP-ETHICS, to evaluate the system's performance in ethical judgment tasks.
  • Results: The authors achieve top performance in both binary and multi-label ethical judgment tasks using their proposed approach. Their research provides a practical method for introducing ethical alignment into the CIR workflow, and the datasets and code are available online for further research.
    Abstract Artificial intelligence (AI) technologies should adhere to human norms to better serve our society and avoid disseminating harmful or misleading information, particularly in Conversational Information Retrieval (CIR). Previous work, including approaches and datasets, has not always been successful or sufficiently robust in taking human norms into consideration. To this end, we introduce a workflow that integrates ethical alignment, with an initial ethical judgment stage for efficient data screening. To address the need for ethical judgment in CIR, we present the QA-ETHICS dataset, adapted from the ETHICS benchmark, which serves as an evaluation tool by unifying scenarios and label meanings. However, each scenario only considers one ethical concept. Therefore, we introduce the MP-ETHICS dataset to evaluate a scenario under multiple ethical concepts, such as justice and Deontology. In addition, we suggest a new approach that achieves top performance in both binary and multi-label ethical judgment tasks. Our research provides a practical method for introducing ethical alignment into the CIR workflow. The data and code are available at https://github.com/wanng-ide/ealm .
    摘要 人工智能(AI)技术应该遵循人类社会的norms,以更好地服务社会,避免传播有害或误导信息,尤其在对话信息检索(CIR)领域。前一些工作,包括方法和数据集,未必成功或充分考虑人类norms。为此,我们提出了一个具有优化人类准则的工作流程。为了解决CIR中的伦理判断问题,我们介绍了QA-ETHICS数据集,该数据集基于ETHICSbenchmark,用于评估工具的性能。每个enario只考虑一个伦理概念,因此我们引入了MP-ETHICS数据集,用于评估一个scenario下多个伦理概念,如正义和德 Ontology。此外,我们还提出了一种新的方法,可以在binary和多个标签伦理判断任务中达到最高性能。我们的研究提供了实用的方法,用于在CIR workflow中引入伦理对齐。数据和代码可以在https://github.com/wanng-ide/ealm上获取。

Resolving Knowledge Conflicts in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00935
  • repo_url: https://github.com/yikee/knowledge_conflict
  • paper_authors: Yike Wang, Shangbin Feng, Heng Wang, Weijia Shi, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov
  • for: This work studies how language models behave in knowledge-conflict scenarios and proposes desiderata that LLMs should satisfy when resolving such conflicts.
  • methods: It introduces the KNOWLEDGE CONFLICT evaluation framework, covering diverse and complex conflict scenarios, knowledge from multiple entities and domains, two synthetic conflict-creation methods, and settings of progressively increasing difficulty, to test whether LLMs meet three desiderata.
  • results: LLMs perform well at detecting that a knowledge conflict exists but struggle to pinpoint the conflicting information and to generate distinct answers amid the conflict, with performance varying considerably across knowledge domains and prompt texts.
    Abstract Large language models (LLMs) often encounter knowledge conflicts, scenarios where discrepancy arises between the internal parametric knowledge of LLMs and non-parametric information provided in the prompt context. In this work we ask what are the desiderata for LLMs when a knowledge conflict arises and whether existing LLMs fulfill them. We posit that LLMs should 1) identify knowledge conflicts, 2) pinpoint conflicting information segments, and 3) provide distinct answers or viewpoints in conflicting scenarios. To this end, we introduce KNOWLEDGE CONFLICT, an evaluation framework for simulating contextual knowledge conflicts and quantitatively evaluating to what extent LLMs achieve these goals. KNOWLEDGE CONFLICT includes diverse and complex situations of knowledge conflict, knowledge from diverse entities and domains, two synthetic conflict creation methods, and settings with progressively increasing difficulty to reflect realistic knowledge conflicts. Extensive experiments with the KNOWLEDGE CONFLICT framework reveal that while LLMs perform well in identifying the existence of knowledge conflicts, they struggle to determine the specific conflicting knowledge and produce a response with distinct answers amidst conflicting information. To address these challenges, we propose new instruction-based approaches that augment LLMs to better achieve the three goals. Further analysis shows that abilities to tackle knowledge conflicts are greatly impacted by factors such as knowledge domain and prompt text, while generating robust responses to knowledge conflict scenarios remains an open research question.
    摘要 大型语言模型(LLM)经常遇到知识冲突,即在 parametric 知识和提问上下文中的不一致。在这项工作中,我们问到 LLM 在知识冲突 arise 时的需求是什么,以及现有 LLM 是否满足这些需求。我们认为 LLM 应该满足以下三个需求:1. 识别知识冲突2. 特定 conflicting 信息段3. 在冲突enario 中提供不同的答案或视点为此,我们提出了 KNOWLEDGE CONFLICT 评估框架,用于模拟上下文知识冲突和评估 LLM 是否满足以上三个需求。KNOWLEDGE CONFLICT 包括多样化和复杂的知识冲突 scenarios,知识来源于多个实体和领域,两种生成方法,以及逐渐增加的Difficulty 设置,以Reflect 现实知识冲突。我们进行了广泛的实验,发现 LLM 能够识别知识冲突的存在,但在确定特定 conflicting 知识和生成不同答案中困难。为此,我们提出了新的 instruction-based 方法,用于增强 LLM 的能力。进一步分析表明, LLM 在处理知识冲突时受到知识领域和提问文本的因素很大影响,而生成冲突场景中的稳定答案仍然是一个开放的研究问题。

TADIS: Steering Models for Deep-Thinking about Demonstration Examples

  • paper_url: http://arxiv.org/abs/2310.00901
  • repo_url: None
  • paper_authors: Tianci Xue, Ziqi Wang, Yixia Li, Yun Chen, Guanhua Chen
  • for: Improving the zero-shot generalization of LLMs by strengthening how models understand and follow instructions and demonstration examples.
  • methods: A new method called TADIS steers models toward "deep thinking" about demonstration examples by first asking them to verify the examples' correctness, mitigating the models' illusion of competence.
  • results: TADIS outperforms competitive baselines on in-domain and out-of-domain tasks, improves zero-shot and few-shot performance, and can be applied at scale to strengthen instruction following without manual labor.
    Abstract Instruction tuning has been demonstrated to significantly improve the zero-shot generalization capability to unseen tasks by an apparent margin. By incorporating additional context (e.g., task definition, examples) during the fine-tuning process, Large Language Models (LLMs) achieved much higher performance than before. However, recent work reported that delusive task examples can achieve almost the same performance as correct task examples, indicating the input-label correspondence is less important than previously thought. Intrigued by this counter-intuitive observation, we suspect models have the same illusion of competence as humans. Therefore, we propose a novel method called TADIS that steers LLMs for "Deep-Thinking'' about demonstration examples instead of merely seeing. To alleviate the illusion of competence of models, we first ask the model to verify the correctness of shown examples. Then, the verification results are used as conditions to elicit a better answer from the model. Our experimental results show that TADIS consistently outperforms competitive baselines on in-domain and out-domain tasks (improving 2.79 and 4.03 average ROUGE-L on out-domain and in-domain datasets, respectively). Despite the presence of generated examples (not all of the thinking labels are accurate), TADIS can notably enhance performance in zero-shot and few-shot settings. This also suggests that our approach can be adopted on a large scale to improve the instruction following capabilities of models without any manual labor. Moreover, we construct three types of thinking labels with different model sizes and find that small models learn from the format of TADIS but larger models can be steered for "Deep-Thinking''.
    摘要 instruction 调教可以显著提高零shot泛化能力,使模型在未看过任务的情况下表现更高。通过在细化过程中添加更多上下文(例如任务定义、示例),大型自然语言模型(LLMs)达到了更高的性能。然而,最近的研究发现,欺骗性任务示例可以达到正确任务示例的同等性能,表明输入-标签对应不那么重要。我们受这一Counter-intuitive 观察的启发,因此我们提出了一种名为 TADIS 的新方法。TADIS 方法通过让模型对示例进行 "深思" 而不仅仅是看,来避免模型的假象能力。我们首先询问模型示例的正确性,然后使用验证结果作为条件,让模型更好地回答。我们的实验结果表明,TADIS 方法在领域内和领域外任务上(ROUGLE-L 平均提升2.79和4.03)表现出了显著的优势。尽管存在生成的示例(不 всех思考标签正确),TADIS 还是能够明显提高零shot和几shot设置中的性能。这也表明我们的方法可以在大规模上采用,无需手动劳动,以提高模型的 instrucion 遵循能力。此外,我们还构建了三种不同模型大小的思考标签,发现小型模型可以从 TADIS 的格式学习,而大型模型可以通过 "深思" 来更好地回答。

Enable Language Models to Implicitly Learn Self-Improvement From Data

  • paper_url: http://arxiv.org/abs/2310.00898
  • repo_url: None
  • paper_authors: Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji
  • for: This paper aims to improve the self-improvement ability of large language models (LLMs) so that they produce higher-quality responses in open-ended text generation.
  • methods: It proposes the ImPlicit Self-ImprovemenT (PIT) framework, which implicitly learns the improvement goal from human preference data; PIT only needs the preference data already used to train reward models and requires no extra human effort.
  • results: Experiments on two real-world datasets and one synthetic dataset show that the method significantly outperforms prompting-based self-improvement methods.
    Abstract Large Language Models (LLMs) have demonstrated remarkable capabilities in open-ended text generation tasks. However, the inherent open-ended nature of these tasks implies that there is always room for improvement in the quality of model responses. To address this challenge, various approaches have been proposed to enhance the performance of LLMs. There has been a growing focus on enabling LLMs to self-improve their response quality, thereby reducing the reliance on extensive human annotation efforts for collecting diverse and high-quality training data. Recently, prompting-based methods have been widely explored among self-improvement methods owing to their effectiveness, efficiency, and convenience. However, those methods usually require explicitly and thoroughly written rubrics as inputs to LLMs. It is expensive and challenging to manually derive and provide all necessary rubrics with a real-world complex goal for improvement (e.g., being more helpful and less harmful). To this end, we propose an ImPlicit Self-ImprovemenT (PIT) framework that implicitly learns the improvement goal from human preference data. PIT only requires preference data that are used to train reward models without extra human efforts. Specifically, we reformulate the training objective of reinforcement learning from human feedback (RLHF) -- instead of maximizing response quality for a given input, we maximize the quality gap of the response conditioned on a reference response. In this way, PIT is implicitly trained with the improvement goal of better aligning with human preferences. Experiments on two real-world datasets and one synthetic dataset show that our method significantly outperforms prompting-based methods.
    摘要 大型语言模型(LLM)在开放式文本生成任务中表现出色,但是由于这类任务的自然开放性,模型的回答质量还有很大的提升空间。为解决这个挑战,各种方法被提出来提高 LLM 的表现。在这些方法中,许多研究者强调了让 LLM 自我改进回答质量,以降低人类注释数据的准备成本。在这些自我改进方法中,许多研究者利用了提示方法,因为它们的效果、效率和方便性。然而,这些方法通常需要 LLM 接受明确和准确的 rubric 作为输入。这是一个昂贵和困难的任务,特别是在实际世界中处理复杂的目标时。为解决这个问题,我们提出了一种名为 Implicit Self-Improvement 框架(PIT)的方法。PIT 使用人类偏好数据来隐式地学习改进目标。具体来说,我们将人类反馈学习(RLHF)的训练目标重新定义为:将响应质量与参考响应之间的质量差异最大化。这样,PIT 将被隐式地训练以更好地遵循人类偏好。我们在两个实际数据集和一个 sintetic 数据集上进行了实验,结果显示,我们的方法在许多情况下比提示方法表现更好。

Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models

  • paper_url: http://arxiv.org/abs/2310.00840
  • repo_url: None
  • paper_authors: Tianjian Li, Haoran Xu, Philipp Koehn, Daniel Khashabi, Kenton Murray
  • for: Improving the robustness of text generation models so they tolerate noise in the training data.
  • methods: Error Norm Truncation (ENT) truncates noisy data by using the distribution over non-target tokens to estimate data quality more accurately, and is combined with the standard training objective.
  • results: Extensive experiments on language modeling, machine translation, and text summarization show improvements over standard training and previous soft and hard truncation methods, including a gain of more than 2 BLEU points in machine translation when up to 50% noise is added.
    Abstract Text generation models are notoriously vulnerable to errors in the training data. With the wide-spread availability of massive amounts of web-crawled data becoming more commonplace, how can we enhance the robustness of models trained on a massive amount of noisy web-crawled text? In our work, we propose Error Norm Truncation (ENT), a robust enhancement method to the standard training objective that truncates noisy data. Compared to methods that only uses the negative log-likelihood loss to estimate data quality, our method provides a more accurate estimation by considering the distribution of non-target tokens, which is often overlooked by previous work. Through comprehensive experiments across language modeling, machine translation, and text summarization, we show that equipping text generation models with ENT improves generation quality over standard training and previous soft and hard truncation methods. Furthermore, we show that our method improves the robustness of models against two of the most detrimental types of noise in machine translation, resulting in an increase of more than 2 BLEU points over the MLE baseline when up to 50% of noise is added to the data.
    摘要 文本生成模型具有训练数据中错误的极高敏感性。随着庞大量的网络爬虫数据的普及,我们如何提高基于庞大量噪音网络爬虫文本的模型训练robustness?在我们的工作中,我们提出了错误norm truncation(ENT)方法,这是一种robust增强方法,可以减少噪音数据的影响。相比之下,先前的工作通常只使用负log-likelihood损失来估计数据质量,我们的方法可以更加准确地估计非目标字符的分布,这一点通常被前一些工作所忽略。通过语言模型、机器翻译和文本概要等多种实验,我们示出了在把ENT纳入文本生成模型中可以提高生成质量,并且超过了标准训练和先前的软和硬截断方法。此外,我们还证明了ENT可以提高机器翻译模型对噪音的抗性,即在添加50%噪音后,ENT可以提高more than 2 BLEU点的基eline值。

TRAM: Benchmarking Temporal Reasoning for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00835
  • repo_url: https://github.com/eternityyw/tram-benchmark
  • paper_authors: Yuqing Wang, Yun Zhao
  • For: This paper aims to provide a standardized benchmark for evaluating the temporal reasoning capabilities of large language models (LLMs).
  • Methods: The authors introduce a new benchmark called TRAM, which includes ten datasets covering various temporal aspects of events, such as order, arithmetic, frequency, and duration. They evaluate popular LLMs, including GPT-4 and Llama2, in both zero-shot and few-shot learning scenarios, with BERT-based models as the baseline.
  • Results: The authors find that current LLMs still trail human performance in temporal reasoning tasks, and hope that TRAM will spur further progress in enhancing the temporal reasoning abilities of LLMs.
    Abstract Reasoning about time is essential for understanding the nuances of events described in natural language. Previous research on this topic has been limited in scope, characterized by a lack of standardized benchmarks that would allow for consistent evaluations across different studies. In this paper, we introduce TRAM, a temporal reasoning benchmark composed of ten datasets, encompassing various temporal aspects of events such as order, arithmetic, frequency, and duration, designed to facilitate a comprehensive evaluation of the temporal reasoning capabilities of large language models (LLMs). We conduct an extensive evaluation using popular LLMs, such as GPT-4 and Llama2, in both zero-shot and few-shot learning scenarios. Additionally, we employ BERT-based models to establish the baseline evaluations. Our findings indicate that these models still trail human performance in temporal reasoning tasks. It is our aspiration that TRAM will spur further progress in enhancing the temporal reasoning abilities of LLMs.
    摘要 理解时间的推理是理解自然语言描述事件的细节的关键。先前的研究在这一领域受限,因为缺乏一致的标准化测试准则,这使得不同研究之间的评估不可靠。在这篇论文中,我们提出了TRAM,它是一个包含多种时间方面的事件的测试集,包括顺序、数学、频率和持续时间等,以便对大语言模型(LLM)的时间推理能力进行全面的评估。我们进行了广泛的评估,使用流行的GPT-4和Llama2模型,以及基于BERT的模型,以确定这些模型在时间推理任务中的表现。我们的发现表明,这些模型仍然远远落后于人类在时间推理任务中的表现。我们希望TRAM能够推动大语言模型的时间推理能力的进步。

Necessary and Sufficient Watermark for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00833
  • repo_url: None
  • paper_authors: Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, Makoto Yamada
  • for: This work proposes the Necessary and Sufficient Watermark (NS-Watermark), which distinguishes LLM-generated text from human-written text without degrading the quality of the generated text.
  • methods: Starting from the minimum constraints a generated text must satisfy to be distinguishable, the NS-Watermark is formulated as a constrained optimization problem solved with an efficient algorithm.
  • results: Experiments show that the NS-Watermark yields more natural text and more accurate detection than existing watermarking methods, outperforming them by up to 30 BLEU scores in machine translation tasks.
    Abstract In recent years, large language models (LLMs) have achieved remarkable performances in various NLP tasks. They can generate texts that are indistinguishable from those written by humans. Such remarkable performance of LLMs increases their risk of being used for malicious purposes, such as generating fake news articles. Therefore, it is necessary to develop methods for distinguishing texts written by LLMs from those written by humans. Watermarking is one of the most powerful methods for achieving this. Although existing watermarking methods have successfully detected texts generated by LLMs, they significantly degrade the quality of the generated texts. In this study, we propose the Necessary and Sufficient Watermark (NS-Watermark) for inserting watermarks into generated texts without degrading the text quality. More specifically, we derive minimum constraints required to be imposed on the generated texts to distinguish whether LLMs or humans write the texts. Then, we formulate the NS-Watermark as a constrained optimization problem and propose an efficient algorithm to solve it. Through the experiments, we demonstrate that the NS-Watermark can generate more natural texts than existing watermarking methods and distinguish more accurately between texts written by LLMs and those written by humans. Especially in machine translation tasks, the NS-Watermark can outperform the existing watermarking method by up to 30 BLEU scores.
    摘要 In this study, we propose the Necessary and Sufficient Watermark (NS-Watermark) for inserting watermarks into generated texts without degrading the text quality. Specifically, we derive the minimum constraints required to distinguish whether LLMs or humans write the texts. We formulate the NS-Watermark as a constrained optimization problem and propose an efficient algorithm to solve it.Through experiments, we demonstrate that the NS-Watermark can generate more natural texts than existing watermarking methods and distinguish more accurately between texts written by LLMs and those written by humans. In machine translation tasks, the NS-Watermark can outperform existing watermarking methods by up to 30 BLEU scores.

cs.LG - 2023-10-02

Transformers are efficient hierarchical chemical graph learners

  • paper_url: http://arxiv.org/abs/2310.01704
  • repo_url: None
  • paper_authors: Zihan Pengmei, Zimu Li, Chih-chan Tien, Risi Kondor, Aaron R. Dinner
  • for: This paper proposes SubFormer, a graph transformer inspired by natural language processing, addressing the computational cost of modern graph transformers that treat every node or edge as a separate token.
  • methods: SubFormer operates on subgraphs and aggregates information with a message-passing mechanism, reducing the number of tokens and improving the learning of long-range interactions.
  • results: On chemical-property prediction benchmarks, SubFormer is competitive with state-of-the-art graph transformers at a fraction of the computational cost, with training times on the order of minutes on a consumer-grade graphics card; its attention weights are interpretable in terms of chemical structure, and it exhibits limited over-smoothing and avoids over-squashing.
    Abstract Transformers, adapted from natural language processing, are emerging as a leading approach for graph representation learning. Contemporary graph transformers often treat nodes or edges as separate tokens. This approach leads to computational challenges for even moderately-sized graphs due to the quadratic scaling of self-attention complexity with token count. In this paper, we introduce SubFormer, a graph transformer that operates on subgraphs that aggregate information by a message-passing mechanism. This approach reduces the number of tokens and enhances learning long-range interactions. We demonstrate SubFormer on benchmarks for predicting molecular properties from chemical structures and show that it is competitive with state-of-the-art graph transformers at a fraction of the computational cost, with training times on the order of minutes on a consumer-grade graphics card. We interpret the attention weights in terms of chemical structures. We show that SubFormer exhibits limited over-smoothing and avoids over-squashing, which is prevalent in traditional graph neural networks.
    摘要 transformers,起源于自然语言处理,在图表示学习中emerging为领先方法。当前的图transformers通常将节点或边视为分立的token。这种方法会导致对几乎任何大小的图进行计算而带来挑战,因为自我注意复杂性与token数平方成正比。在这篇论文中,我们介绍SubFormer,一种基于消息传递机制的图transformer,可以在subgraph上进行图表示学习。这种方法可以减少token数量,提高了长距离交互的学习。我们在化学结构预测 tasks上使用SubFormer,并证明它与当前的图transformers在计算成本上一样竞争,但是训练时间只需几分钟,可以在consumer级别的图形处理卡上完成。我们还对SubFormer的注意力权重进行了解释,并证明它在化学结构中具有有限的过滤和压缩现象,这些现象在传统的图神经网络中很普遍。
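The sketch below illustrates the general pattern of pooling node features into a handful of subgraph tokens and then running standard attention over those tokens; the mean pooling and transformer configuration are illustrative choices, not SubFormer's architecture.

```python
# Schematic "coarsen subgraphs into tokens, then attend" pipeline.
import torch
import torch.nn as nn

class SubgraphTokenizer(nn.Module):
    def __init__(self, feat_dim, token_dim):
        super().__init__()
        self.proj = nn.Linear(feat_dim, token_dim)

    def forward(self, node_feats, assignments, num_subgraphs):
        # One round of mean aggregation per subgraph (stand-in for message passing).
        tokens = torch.zeros(num_subgraphs, node_feats.size(-1))
        tokens.index_add_(0, assignments, node_feats)
        counts = torch.bincount(assignments, minlength=num_subgraphs).clamp(min=1).unsqueeze(-1)
        return self.proj(tokens / counts)        # (num_subgraphs, token_dim)

node_feats = torch.randn(10, 8)                               # 10 atoms, 8 features each
assignments = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 3, 3])    # 4 chemical substructures
tokens = SubgraphTokenizer(8, 16)(node_feats, assignments, 4)
attn = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
out = attn(tokens.unsqueeze(0))               # attention over far fewer tokens than atoms
print(out.shape)
```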

Robustifying State-space Models for Long Sequences via Approximate Diagonalization

  • paper_url: http://arxiv.org/abs/2310.01698
  • repo_url: None
  • paper_authors: Annan Yu, Arnur Nigmetov, Dmitriy Morozov, Michael W. Mahoney, N. Benjamin Erichson
  • for: This work proposes a general "perturb-then-diagonalize" (PTD) method for the ill-posed problem of diagonalizing the non-normal matrices that arise in machine learning, in particular in state-space models.
  • methods: The method is grounded in the pseudospectral theory of non-normal operators and is used to build the S4-PTD and S5-PTD models.
  • results: Analysis of the transfer functions shows that the S4-PTD/S5-PTD initialization converges strongly to the HiPPO framework, whereas the S4D/S5 initialization only converges weakly; the S5-PTD model averages 87.6% accuracy on the Long-Range Arena benchmark, indicating that PTD improves the accuracy of deep learning models.
    Abstract State-space models (SSMs) have recently emerged as a framework for learning long-range sequence tasks. An example is the structured state-space sequence (S4) layer, which uses the diagonal-plus-low-rank structure of the HiPPO initialization framework. However, the complicated structure of the S4 layer poses challenges; and, in an effort to address these challenges, models such as S4D and S5 have considered a purely diagonal structure. This choice simplifies the implementation, improves computational efficiency, and allows channel communication. However, diagonalizing the HiPPO framework is itself an ill-posed problem. In this paper, we propose a general solution for this and related ill-posed diagonalization problems in machine learning. We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology, which is based on the pseudospectral theory of non-normal operators, and which may be interpreted as the approximate diagonalization of the non-normal matrices defining SSMs. Based on this, we introduce the S4-PTD and S5-PTD models. Through theoretical analysis of the transfer functions of different initialization schemes, we demonstrate that the S4-PTD/S5-PTD initialization strongly converges to the HiPPO framework, while the S4D/S5 initialization only achieves weak convergences. As a result, our new models show resilience to Fourier-mode noise-perturbed inputs, a crucial property not achieved by the S4D/S5 models. In addition to improved robustness, our S5-PTD model averages 87.6% accuracy on the Long-Range Arena benchmark, demonstrating that the PTD methodology helps to improve the accuracy of deep learning models.
    摘要 状态空间模型(SSM)最近在长距离序列任务中得到应用。例如,结构化状态空间序列(S4)层使用了HiPPO初始化框架的对角线加低级结构。然而,S4层的复杂结构带来挑战,以至于模型如S4D和S5在解决这些挑战时考虑了纯对角结构。这种选择简化实现,提高计算效率,并允许通道通信。然而,对HiPPO框架的对角化本身是一个不定 пробле。在这篇文章中,我们提出了一种通用的解决方案,基于非正常算子的pseudospectral理论,并可以看作是SSM中非正常矩阵的 Approximate diagonalization。基于这,我们引入了S4-PTD和S5-PTD模型。通过对不同初始化方案的传输函数的分析,我们证明了S4-PTD/S5-PTD初始化强 converges to HiPPO框架,而S4D/S5初始化只有weak converges。因此,我们新的模型具有耐 Fourier-mode 噪声扰动输入的性能,而S4D/S5模型没有达到这种性能。此外,我们的S5-PTD模型在Long-Range Arena benchmark上的准确率为87.6%,表明PTD方法可以提高深度学习模型的准确率。
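A toy numpy sketch of the perturb-then-diagonalize idea follows: add a tiny perturbation to a highly non-normal matrix so that its eigendecomposition becomes well-conditioned while still approximately representing the original operator. The matrix and perturbation scale are illustrative; the paper works with the HiPPO operators and a backward-stability analysis.

```python
# Toy "perturb-then-diagonalize": approximate diagonalization of a non-normal matrix.
import numpy as np

rng = np.random.default_rng(1)
n = 32
A = -np.tril(np.ones((n, n))) - np.diag(np.arange(n))   # generic non-normal example
E = 1e-6 * rng.standard_normal((n, n))                  # small perturbation

eigvals, V = np.linalg.eig(A + E)                       # diagonalize the perturbed matrix
cond_V = np.linalg.cond(V)
# Reconstruction error of the approximate diagonalization of the *original* A.
recon_err = np.linalg.norm(V @ np.diag(eigvals) @ np.linalg.inv(V) - A) / np.linalg.norm(A)
print(f"cond(V) = {cond_V:.2e}, relative reconstruction error = {recon_err:.2e}")
```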

DANI: Fast Diffusion Aware Network Inference with Preserving Topological Structure Property

  • paper_url: http://arxiv.org/abs/2310.01696
  • repo_url: https://github.com/aryanahadinia/dani
  • paper_authors: Maryam Ramezani, Aryan Ahadinia, Erfan Farhadi, Hamid R. Rabiee
  • for: Inferring the underlying structure of social networks
  • methods: Based on the Markov transition matrix derived from time series cascades and node-node similarity from a structural perspective
  • results: High accuracy and low running time while preserving structural properties, including modular structure, degree distribution, connected components, density, and clustering coefficients
    Abstract The fast growth of social networks and their data access limitations in recent years has led to increasing difficulty in obtaining the complete topology of these networks. However, diffusion information over these networks is available, and many algorithms have been proposed to infer the underlying networks using this information. The previously proposed algorithms only focus on inferring more links and ignore preserving the critical topological characteristics of the underlying social networks. In this paper, we propose a novel method called DANI to infer the underlying network while preserving its structural properties. It is based on the Markov transition matrix derived from time series cascades, as well as the node-node similarity that can be observed in the cascade behavior from a structural point of view. In addition, the presented method has linear time complexity (increases linearly with the number of nodes, number of cascades, and square of the average length of cascades), and its distributed version in the MapReduce framework is also scalable. We applied the proposed approach to both real and synthetic networks. The experimental results showed that DANI has higher accuracy and lower run time while maintaining structural properties, including modular structure, degree distribution, connected components, density, and clustering coefficients, than well-known network inference methods.
    摘要 “社交网络的快速增长和数据访问限制在最近几年中,导致了获取社交网络的完整拓扑结构的增加困难。然而,社交网络上的协议信息可以获取,许多算法已经被提出来使用这些信息推断社交网络的下面结构。但这些算法只注重推断更多的链接,忽略了保持社交网络的结构性特征。在这篇论文中,我们提出了一种新的方法 called DANI,可以在保持社交网络的结构性特征的情况下推断社交网络的下面结构。它基于时间序列冲击矩阵,以及从结构角度观察到的节点对节点相似性。此外,提出的方法的时间复杂度为线性时间复杂度(与节点数、冲击数、冲击链的平均长度平方成正比增长),其分布式版本在MapReduce框架中也可扩展。我们对真实网络和synthetic网络进行了实验,结果表明,DANI比较知名的网络推断方法更高精度且更快速,同时保持了社交网络的结构特征,包括模块结构、度分布、连接分布、密度和嵌入系数。”
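As a rough sketch of cascade-derived transition structure, the snippet below builds a row-normalized matrix from the order of activations in each cascade and treats large entries as candidate edges; this is a simplification for illustration only, not DANI's full objective or its structural-similarity term.

```python
# Simplified cascade-to-transition-matrix construction.
import numpy as np

def transition_matrix(cascades, n_nodes):
    counts = np.zeros((n_nodes, n_nodes))
    for cascade in cascades:                  # cascade = node ids in activation order
        for earlier, later in zip(cascade[:-1], cascade[1:]):
            counts[earlier, later] += 1.0     # the later node may have been infected by the earlier one
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

cascades = [[0, 1, 2], [0, 2, 3], [1, 2, 3]]
P = transition_matrix(cascades, n_nodes=4)
print(np.round(P, 2))                         # larger entries suggest candidate edges
```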

Forecasting Tropical Cyclones with Cascaded Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.01690
  • repo_url: https://github.com/nathzi1505/forecast-diffmodels
  • paper_authors: Pritthijit Nath, Pancham Shukla, César Quilodrán-Casas
  • for: Forecasting cyclone trajectories and precipitation patterns
  • methods: Leveraging diffusion models to integrate satellite imaging, remote sensing, and atmospheric data, using a cascaded approach that includes forecasting, super-resolution, and precipitation modelling, with training on a dataset of 51 cyclones from six major basins
  • results: Experiments show that the final forecasts from the cascaded models accurately predict cyclone trajectories and precipitation patterns up to a 36-hour rollout, with SSIM and PSNR values exceeding 0.5 and 20 dB respectively for all three tasks, while remaining computationally affordable for highly vulnerable regions with critical forecasting needs and limited resources
    Abstract As cyclones become more intense due to climate change, the rise of AI-based modelling provides a more affordable and accessible approach compared to traditional methods based on mathematical models. This work leverages diffusion models to forecast cyclone trajectories and precipitation patterns by integrating satellite imaging, remote sensing, and atmospheric data, employing a cascaded approach that incorporates forecasting, super-resolution, and precipitation modelling, with training on a dataset of 51 cyclones from six major basins. Experiments demonstrate that the final forecasts from the cascaded models show accurate predictions up to a 36-hour rollout, with SSIM and PSNR values exceeding 0.5 and 20 dB, respectively, for all three tasks. This work also highlights the promising efficiency of AI methods such as diffusion models for high-performance needs, such as cyclone forecasting, while remaining computationally affordable, making them ideal for highly vulnerable regions with critical forecasting needs and financial limitations. Code accessible at \url{https://github.com/nathzi1505/forecast-diffmodels}.
    摘要 随着气候变化,风暴的强度变得越来越高,AI模型提供了一种更加可靠和可 accessible的方法,相比传统基于数学模型的方法。这项工作利用扩散模型预测风暴轨迹和降水模式,并将卫星影像、远程感知和大气数据集成起来,采用层次结构的方法,包括预测、超分解和降水模型,并对51个风暴数据进行训练。实验表明,最终预测结果从扩散模型中得到了准确的预测结果,SSIM和PSNR值分别高于0.5和20 dB,对所有三个任务都有良好的预测性。此外,这项工作也指出了AI方法如扩散模型在高性能需求下的可行性,同时保持计算可持,使其成为有严重预测需求和财务限制的地区的理想选择。代码可以在 GitHub上获取:\url{https://github.com/nathzi1505/forecast-diffmodels}。

From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression

  • paper_url: http://arxiv.org/abs/2310.01687
  • repo_url: None
  • paper_authors: Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla
  • for: investigate the dynamics of gradient descent using large-order constant step-sizes in quadratic regression models.
  • methods: use a specific cubic map to encapsulate the dynamics, and conduct a fine-grained bifurcation analysis concerning the step-size parameter.
  • results: identify five distinct training phases, including monotonic, catapult, periodic, chaotic, and divergent phases, and observe that performing an ergodic trajectory averaging stabilizes the test error in non-monotonic (and non-divergent) phases.
    Abstract We conduct a comprehensive investigation into the dynamics of gradient descent using large-order constant step-sizes in the context of quadratic regression models. Within this framework, we reveal that the dynamics can be encapsulated by a specific cubic map, naturally parameterized by the step-size. Through a fine-grained bifurcation analysis concerning the step-size parameter, we delineate five distinct training phases: (1) monotonic, (2) catapult, (3) periodic, (4) chaotic, and (5) divergent, precisely demarcating the boundaries of each phase. As illustrations, we provide examples involving phase retrieval and two-layer neural networks employing quadratic activation functions and constant outer-layers, utilizing orthogonal training data. Our simulations indicate that these five phases also manifest with generic non-orthogonal data. We also empirically investigate the generalization performance when training in the various non-monotonic (and non-divergent) phases. In particular, we observe that performing an ergodic trajectory averaging stabilizes the test error in non-monotonic (and non-divergent) phases.
    摘要 我们进行了对梯度下降的全面调查,使用大顺序常数步长在 quadratic 回归模型中。在这个框架下,我们发现这些动态可以通过特定的立方图表示,自然地归一化到步长参数。通过细腻的分岔分析,我们划分出五种不同的训练阶段:(1)升序、(2)炸彩、(3)周期、(4)危机和(5)分散,准确地界定每个阶段的边界。例如,我们提供了phaserecovery和两层神经网络,使用quadratic activation functions和常数外层,使用正交训练数据。我们的实验表明,这五个阶段也会在非正交数据上出现。此外,我们还employs empirical investigation of the generalization performance during training in the various non-monotonic(和非分散)阶段,并发现在非升序(和非分散)阶段执行随机轨迹平均可以稳定测试错误。
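
To make the five step-size regimes concrete, here is a minimal sketch (not from the paper) that runs gradient descent on the scalar quadratic-regression loss L(w) = (w^2 - 1)^2 / 4, whose update rule is exactly a cubic map in w. The specific step-size values below are illustrative guesses; the true phase boundaries depend on the data and are what the paper's bifurcation analysis characterizes.

```python
import numpy as np

def gd_trajectory(eta, w0=0.1, steps=200):
    """Gradient descent on the scalar quadratic-regression loss
    L(w) = (w**2 - 1)**2 / 4; the update w <- w - eta * w * (w**2 - 1)
    is a cubic map in w, parameterized by the step-size eta."""
    w, losses = w0, []
    for _ in range(steps):
        losses.append((w**2 - 1.0) ** 2 / 4)
        w = w - eta * w * (w**2 - 1.0)
        if not np.isfinite(w) or abs(w) > 1e6:   # divergence guard
            losses.append(np.inf)
            break
    return np.array(losses)

# Step-sizes below are illustrative: small eta gives a monotone loss, intermediate
# values show catapult/periodic/chaotic behaviour, and large eta diverges.
for eta in [0.5, 1.1, 1.3, 1.55, 2.5]:
    traj = gd_trajectory(eta)
    print(f"eta={eta:4.2f}  last losses: {np.round(traj[-4:], 4)}")
```

Printing only the trailing loss values is enough to distinguish the regimes: convergence to zero, a small stable oscillation, a bounded irregular orbit, or blow-up.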

A Framework for Interpretability in Machine Learning for Medical Imaging

  • paper_url: http://arxiv.org/abs/2310.01685
  • repo_url: None
  • paper_authors: Alan Q. Wang, Batuhan K. Karaman, Heejong Kim, Jacob Rosenthal, Rachit Saluja, Sean I. Young, Mert R. Sabuncu
  • for: 本研究的目的是提高医学影像分析中的机器学习模型可解性。
  • methods: 本研究使用了 formalize 可解性需求,通过理解医学影像分析和机器学习的实际任务和目标,identify 四个核心可解性元素:localization、视觉可识别、物理归因和透明度。
  • results: 本研究为医学影像分析领域的机器学习模型设计提供了实用和教学信息,激励开发者更深入理解可解性的目的和方法,并提出了未来可解性研究的方向。
    Abstract Interpretability for machine learning models in medical imaging (MLMI) is an important direction of research. However, there is a general sense of murkiness in what interpretability means. Why does the need for interpretability in MLMI arise? What goals does one actually seek to address when interpretability is needed? To answer these questions, we identify a need to formalize the goals and elements of interpretability in MLMI. By reasoning about real-world tasks and goals common in both medical image analysis and its intersection with machine learning, we identify four core elements of interpretability: localization, visual recognizability, physical attribution, and transparency. Overall, this paper formalizes interpretability needs in the context of medical imaging, and our applied perspective clarifies concrete MLMI-specific goals and considerations in order to guide method design and improve real-world usage. Our goal is to provide practical and didactic information for model designers and practitioners, inspire developers of models in the medical imaging field to reason more deeply about what interpretability is achieving, and suggest future directions of interpretability research.
    摘要 machine learning models in medical imaging (MLMI) 的可解释性是一个重要的研究方向。然而,有一个通用的感觉是,可解释性的含义不够明确。为什么需要在 MLMI 中的可解释性?我们需要解决什么问题时需要可解释性?为了回答这些问题,我们需要正式化 MLMI 中的目标和元素。通过考虑医学影像分析和机器学习的实际任务和目标,我们确定了 MLMI 中的四个核心元素:局部化、视觉可识别性、物理归因和透明度。总之,这篇论文将 MLMI 中的可解释性需求进行了形式化,并通过应用实际的视角,为模型设计者和实践者提供了实用和教学的信息,激励医学影像领域中的模型开发者更深入思考可解释性的目的,并建议将来的可解释性研究的未来方向。

Commutative Width and Depth Scaling in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01683
  • repo_url: None
  • paper_authors: Soufiane Hayou
  • for: 本研究是深度神经网络中 commutativity 的第二篇文章,旨在理解深度神经网络中宽度和深度在无穷大时的行为,并 eventually 确定 commutativity 是否成立。
  • methods: 本文使用新的证明技术,基于更加容易理解的杂event calculus,证明深度神经网络中 skip connections 的使用可以使 covariance 结构保持不变,无论宽度和深度在无穷大时如何取得。
  • results: 本文的结果表明,在深度神经网络中,采用 skip connections 的方法,可以使 covariance 结构保持不变,无论宽度和深度在无穷大时如何取得。这些结果扩展了先前的研究(参考 [55]),并有很多理论和实践上的意义,我们在文章中进行了详细的介绍和讨论。
    Abstract This paper is the second in the series Commutative Scaling of Width and Depth (WD) about commutativity of infinite width and depth limits in deep neural networks. Our aim is to understand the behaviour of neural functions (functions that depend on a neural network model) as width and depth go to infinity (in some sense), and eventually identify settings under which commutativity holds, i.e. the neural function tends to the same limit no matter how width and depth limits are taken. In this paper, we formally introduce and define the commutativity framework, and discuss its implications on neural network design and scaling. We study commutativity for the neural covariance kernel which reflects how network layers separate data. Our findings extend previous results established in [55] by showing that taking the width and depth to infinity in a deep neural network with skip connections, when branches are suitably scaled to avoid exploding behaviour, result in the same covariance structure no matter how that limit is taken. This has a number of theoretical and practical implications that we discuss in the paper. The proof techniques in this paper are novel and rely on tools that are more accessible to readers who are not familiar with stochastic calculus (used in the proofs of WD(I))).
    摘要 这份论文是WD系列的第二篇,探讨深度神经网络中宽度和深度范围的交换性。我们的目标是在宽度和深度范围趋向于无穷大时,理解神经函数(取决于神经网络模型的函数)的行为,并 eventually 确定在某些设置下,交换性存在,即神经函数往往趋向同一个边界,无论宽度和深度范围如何选择。在这篇论文中,我们正式引入和定义交换性框架,并讨论其对神经网络设计和缩放的影响。我们研究交换性在神经卷积核中,这个核心反映了神经网络层次如何分离数据。我们的发现超越了之前的结果(参考 [55]),显示在深度神经网络中具有跳跃连接的情况下,当分支适当缩放以避免暴跌行为时,宽度和深度范围趋向同一个covariance结构,无论如何选择这个边界。这有许多理论和实践意义,我们在论文中详细介绍。这份论文的证明技巧是新的,并且基于更加可 accessible 的概率Calculus(WD(I) 证明中使用的概率Calculus)。

Estimating and Implementing Conventional Fairness Metrics With Probabilistic Protected Features

  • paper_url: http://arxiv.org/abs/2310.01679
  • repo_url: None
  • paper_authors: Hadi Elzayn, Emily Black, Patrick Vossler, Nathanael Jo, Jacob Goldin, Daniel E. Ho
  • for: 本研究的目的是开发一种能够在有限 protected attribute 标签的情况下训练公平模型的方法。
  • methods: 我们提出了一种使用可信度提升 surname geocoding 来获取保护属性标签的 probabilistic 估计,并使用这些估计来计算公平指标的上下限。另外,我们还提出了一种利用上下文信息的方法,该方法利用模型预测结果和保护属性的 probabilistic 预测结果之间的关系来提供更紧的上下限。
  • results: 我们的实验表明,与 previous 方法相比,我们的测量方法在这些应用中可以给出更紧的 true disparity 上限。此外,我们的训练方法可以减少 disparity,同时与其他只有有限保护属性标签的公平优化方法相比,具有较小的公平-精度 trade-off。
    Abstract The vast majority of techniques to train fair models require access to the protected attribute (e.g., race, gender), either at train time or in production. However, in many important applications this protected attribute is largely unavailable. In this paper, we develop methods for measuring and reducing fairness violations in a setting with limited access to protected attribute labels. Specifically, we assume access to protected attribute labels on a small subset of the dataset of interest, but only probabilistic estimates of protected attribute labels (e.g., via Bayesian Improved Surname Geocoding) for the rest of the dataset. With this setting in mind, we propose a method to estimate bounds on common fairness metrics for an existing model, as well as a method for training a model to limit fairness violations by solving a constrained non-convex optimization problem. Unlike similar existing approaches, our methods take advantage of contextual information -- specifically, the relationships between a model's predictions and the probabilistic prediction of protected attributes, given the true protected attribute, and vice versa -- to provide tighter bounds on the true disparity. We provide an empirical illustration of our methods using voting data. First, we show our measurement method can bound the true disparity up to 5.5x tighter than previous methods in these applications. Then, we demonstrate that our training technique effectively reduces disparity while incurring lesser fairness-accuracy trade-offs than other fair optimization methods with limited access to protected attributes.
    摘要 大多数尝试培训公平模型都需要访问保护属性(例如种族、性别),ether during training or in production. However, in many important applications, this protected attribute is not readily available. In this paper, we develop methods for measuring and reducing fairness violations in a setting with limited access to protected attribute labels. Specifically, we assume access to protected attribute labels for a small subset of the dataset of interest, but only probabilistic estimates of protected attribute labels (e.g., via Bayesian Improved Surname Geocoding) for the rest of the dataset. With this setting in mind, we propose a method to estimate bounds on common fairness metrics for an existing model, as well as a method for training a model to limit fairness violations by solving a constrained non-convex optimization problem. Unlike similar existing approaches, our methods take advantage of contextual information -- specifically, the relationships between a model's predictions and the probabilistic prediction of protected attributes, given the true protected attribute, and vice versa -- to provide tighter bounds on the true disparity. We provide an empirical illustration of our methods using voting data. First, we show our measurement method can bound the true disparity up to 5.5 times tighter than previous methods in these applications. Then, we demonstrate that our training technique effectively reduces disparity while incurring lesser fairness-accuracy trade-offs than other fair optimization methods with limited access to protected attributes.
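
As a toy illustration of the measurement problem (not the paper's estimator, which exploits the relationship between model predictions and the probabilistic protected-attribute estimates to obtain tighter bounds), the sketch below compares a naive probability-weighted demographic-parity estimate with crude worst-case bounds obtained by assigning uncertain records adversarially; all data, the proxy-probability model, and the uncertainty thresholds are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic stand-ins: true protected attribute A, model decisions Y_hat, and a
# probabilistic membership estimate p, of the kind surname geocoding would provide.
A = rng.binomial(1, 0.3, n)
p = np.clip(A + rng.normal(0, 0.25, n), 0.0, 1.0)        # imperfect proxy probabilities
Y_hat = rng.binomial(1, np.where(A == 1, 0.45, 0.60))    # decisions correlated with A

true_gap = Y_hat[A == 1].mean() - Y_hat[A == 0].mean()

# Naive plug-in estimate: weight every decision by its membership probability.
plug_in = (p * Y_hat).sum() / p.sum() - ((1 - p) * Y_hat).sum() / (1 - p).sum()

# Crude worst-case bounds: records with uncertain membership are assigned to
# whichever group widens (upper bound) or narrows (lower bound) the gap the most.
uncertain = (p > 0.25) & (p < 0.75)

def gap(assignment):
    group = np.where(uncertain, assignment, (p > 0.5).astype(int))
    return Y_hat[group == 1].mean() - Y_hat[group == 0].mean()

lower, upper = sorted([gap(Y_hat), gap(1 - Y_hat)])
print(f"true gap {true_gap:+.3f}  plug-in {plug_in:+.3f}  crude bounds [{lower:+.3f}, {upper:+.3f}]")
```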

Score dynamics: scaling molecular dynamics with picosecond timesteps via conditional diffusion model

  • paper_url: http://arxiv.org/abs/2310.01678
  • repo_url: None
  • paper_authors: Tim Hsu, Babak Sadigh, Vasily Bulatov, Fei Zhou
  • for: 这个论文是为了学习有效的演化运算符,从分子动力学实验中获得的分子动力学模型。
  • methods: 这个论文使用了分子动力学实验中的分子动力学模型,并使用了图神经网络来构建分子动力学系统的分数动力学模型。
  • results: 这个论文的实验结果表明,使用分数动力学模型可以在1~ps时间步长下进行高速的分子动力学模拟,并且可以与分子动力学实验的结果相符。
    Abstract We propose score dynamics (SD), a general framework for learning effective evolution operators for atomistic as well as coarse-grained dynamics from molecular-dynamics (MD) simulations. SD is centered around scores, or derivatives of the transition log-probability with respect to the dynamical degrees of freedom. The latter play the same role as force fields in MD but are used in denoising diffusion probability models to generate discrete transitions of the dynamical variables in an SD timestep, which can be orders of magnitude larger than a typical MD timestep. In this work, we construct graph neural network based score dynamics models of realistic molecular systems that are evolved with 1~ps timesteps. We demonstrate the efficacy of score dynamics with case studies of alanine dipeptide and short alkanes in aqueous solution. Both equilibrium predictions derived from the stationary distributions of the conditional probability and kinetic predictions for the transition rates and transition paths are in good agreement with MD at about 8-18 fold wall-clock speedup. Open challenges and possible future remedies to improve score dynamics are also discussed.
    摘要 我们提出了得分动力学(SD),一种泛化框架,可以从分子动力学(MD)仿真中学习有效的演化运算符。SD中心在于得分,即动力学变量的转移极 probabilistic 的导数。这些导数与力场在MD中扮演相同的角色,但是在排除噪声扩散概率模型中使用,以生成动态变量的精炼过程中的精炼步骤,这些步骤可以是MD步骤的数个数量级。在这种工作中,我们使用图 neural network 构建了真实分子系统的Score Dynamics 模型,这些模型在1~ps步骤中进行了演化。我们通过对 Alanine dipeptide 和尘埃烷在液态中的情况进行了 caso study,并证明了得分动力学的有效性。我们的方法可以与MD相比,提高了8-18倍的计时速度。我们还讨论了现有的挑战和可能的未来改进。

Locality-Aware Graph-Rewiring in GNNs

  • paper_url: http://arxiv.org/abs/2310.01668
  • repo_url: None
  • paper_authors: Federico Barbero, Ameya Velingker, Amin Saberi, Michael Bronstein, Francesco Di Giovanni
  • for: 本文旨在提高图像学习中的图结构学习模型(Graph Neural Networks,GNNs)的表现,通过修改图的连接方式来改善信息流动。
  • methods: 本文提出了三种重要的条件 для图重编组:减少过载、尊重图的本地特性和保持图的稀疏性。同时,本文还提出了一种新的重编组框架,通过地域性执行重编组操作来满足这三个条件。
  • results: 本文通过多个实验 validate 了新的重编组框架的有效性,并证明它可以与或大幅超过现有的重编组方法。
    Abstract Graph Neural Networks (GNNs) are popular models for machine learning on graphs that typically follow the message-passing paradigm, whereby the feature of a node is updated recursively upon aggregating information over its neighbors. While exchanging messages over the input graph endows GNNs with a strong inductive bias, it can also make GNNs susceptible to over-squashing, thereby preventing them from capturing long-range interactions in the given graph. To rectify this issue, graph rewiring techniques have been proposed as a means of improving information flow by altering the graph connectivity. In this work, we identify three desiderata for graph-rewiring: (i) reduce over-squashing, (ii) respect the locality of the graph, and (iii) preserve the sparsity of the graph. We highlight fundamental trade-offs that occur between spatial and spectral rewiring techniques; while the former often satisfy (i) and (ii) but not (iii), the latter generally satisfy (i) and (iii) at the expense of (ii). We propose a novel rewiring framework that satisfies all of (i)--(iii) through a locality-aware sequence of rewiring operations. We then discuss a specific instance of such rewiring framework and validate its effectiveness on several real-world benchmarks, showing that it either matches or significantly outperforms existing rewiring approaches.
    摘要 图神网络(GNN)是常用的机器学习模型,它通常遵循消息传递假设,其中节点的特征通过与邻居 nodes 的信息聚合来进行更新。在传递消息过程中,GNN 具有强 inductive bias,但是这也可能导致 GNN 不能捕捉图中远距离的交互。为了解决这个问题,人们提出了图重排技术,以改善信息流动。在这个工作中,我们提出了三个愿景 для图重排:1. 减少过度压缩:图重排应该减少 GNN 中节点之间信息的重复,以便更好地捕捉图中的交互。2. 尊重图的本地性:图重排应该尊重图的本地结构,避免对图的整体结构进行大规模的改变。3. 保持图的稀疏性:图重排应该保持图的稀疏性,避免对图的稀疏性进行大规模的破坏。我们指出了图重排技术之间的基本质量贝各,其中一般来说,空间重排技术可以减少过度压缩,但是通常不能尊重图的本地性和稀疏性。相反,spectral重排技术通常能够尊重图的本地性和稀疏性,但是通常不能减少过度压缩。我们提出了一种新的重排框架,该框架通过一系列本地化的重排操作来满足所有的愿景。我们then 讨论了这种重排框架的一个具体实现,并在多个实际 benchmark 上验证了其效果,显示它可以与或大于现有的重排方法相比。
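
A minimal sketch of the three desiderata in miniature, using networkx: edges are added only inside each node's 2-hop neighbourhood (locality), they target candidate pairs with few common neighbours as a rough proxy for a local bottleneck (over-squashing), and the number of additions per node is capped (sparsity). This is an illustrative stand-in, not the rewiring operation proposed in the paper.

```python
import networkx as nx

def local_rewire(G: nx.Graph, max_new_per_node: int = 1) -> nx.Graph:
    """Toy locality-aware rewiring: connect each node to 2-hop neighbours that share
    the fewest common neighbours with it (a crude bottleneck proxy), adding at most
    `max_new_per_node` edges per node so the graph stays sparse."""
    H = G.copy()
    for u in G.nodes:
        # Candidate endpoints: strictly 2-hop neighbours (locality constraint).
        two_hop = {w for v in G.neighbors(u) for w in G.neighbors(v)} - set(G.neighbors(u)) - {u}
        if not two_hop:
            continue
        # Prefer candidates that currently share few neighbours with u.
        scored = sorted(two_hop, key=lambda w: len(set(G.neighbors(u)) & set(G.neighbors(w))))
        for w in scored[:max_new_per_node]:
            H.add_edge(u, w)
    return H

G = nx.barbell_graph(6, 2)     # two cliques joined by a path: a classic bottleneck
H = local_rewire(G)
print(G.number_of_edges(), "->", H.number_of_edges(),
      "edges; diameter", nx.diameter(G), "->", nx.diameter(H))
```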

Home Electricity Data Generator (HEDGE): An open-access tool for the generation of electric vehicle, residential demand, and PV generation profiles

  • paper_url: http://arxiv.org/abs/2310.01661
  • repo_url: None
  • paper_authors: Flora Charbonnier, Thomas Morstyn, Malcolm McCulloch
  • for: 本研究开发了一个名为Home Electricity Data Generator(HEDGE)的开源工具,用于随机生成真实的住宅电力数据。
  • methods: 本研究使用了生成对抗网络(GANs)来训练生成真实的人工数据,并将其分为不同的行为群。
  • results: HEDGE可以填补现有数据库中的数据损失,并生成一些真实的住宅电力数据,包括太阳能发电、家用电力负载和电动车的消耗。这些数据可以用于研究住宅分布式能源资源的特性和协调。
    Abstract In this paper, we present the Home Electricity Data Generator (HEDGE), an open-access tool for the random generation of realistic residential energy data. HEDGE generates realistic daily profiles of residential PV generation, household electric loads, and electric vehicle consumption and at-home availability, based on real-life UK datasets. The lack of usable data is a major hurdle for research on residential distributed energy resources characterisation and coordination, especially when using data-driven methods such as machine learning-based forecasting and reinforcement learning-based control. A key issue is that while large data banks are available, they are not in a usable format, and numerous subsequent days of data for a given single home are unavailable. We fill these gaps with the open-access HEDGE tool which generates data sequences of energy data for several days in a way that is consistent for single homes, both in terms of profile magnitude and behavioural clusters. From raw datasets, pre-processing steps are conducted, including filling in incomplete data sequences and clustering profiles into behaviour clusters. Generative adversarial networks (GANs) are then trained to generate realistic synthetic data representative of each behaviour groups consistent with real-life behavioural and physical patterns.
    摘要 本文介绍了家庭电力数据生成器(HEDGE),一种开源工具,用于随机生成真实的家庭可再生能源数据。HEDGE生成了真实的每天家庭太阳能生成、家庭电力负荷和电动车消耗的日程表,基于英国实际数据。由于家庭分布式能源资源特征化和协调研究的数据缺乏问题,特别是使用数据驱动方法such as机器学习预测和强化学习控制时,HEDGE工具填补了这些缺失。HEDGE工具可以生成一系列的能源数据序列,包括家庭特有的能源资源特征和行为带。从原始数据开始,进行了预处理步骤,包括填充不完整的数据序列和对 Profile clustering。然后,使用生成敌方网络(GANs)训练生成真实的同一个行为群的合理的 sintetic数据,与实际行为和物理特征相符。
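
A sketch of the pre-processing side of such a pipeline on synthetic half-hourly load profiles: short gaps are interpolated and normalised daily shapes are clustered into behaviour groups. The GAN stage that HEDGE trains per cluster to generate realistic synthetic days is omitted, and the profile shapes, units, and cluster count here are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Synthetic stand-in for raw daily household load profiles (one value per half-hour
# slot), with missing readings mimicking the incomplete sequences HEDGE must handle.
n_days, n_slots = 500, 48
profiles = np.abs(rng.normal(0.3, 0.1, (n_days, n_slots)))
profiles[:250, 16:20] += rng.normal(1.0, 0.2, (250, 4))     # "morning peak" behaviour
profiles[250:, 34:40] += rng.normal(1.2, 0.2, (250, 6))     # "evening peak" behaviour
profiles[rng.random(profiles.shape) < 0.05] = np.nan        # 5% missing readings

# Step 1: fill short gaps by linear interpolation along the day.
def fill_gaps(day):
    idx = np.arange(len(day))
    ok = ~np.isnan(day)
    return np.interp(idx, idx[ok], day[ok])

filled = np.array([fill_gaps(d) for d in profiles])

# Step 2: cluster normalised profile shapes into behaviour groups; a generative model
# would then be trained per cluster to produce synthetic days (not shown here).
shapes = filled / filled.sum(axis=1, keepdims=True)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(shapes)
for k in range(2):
    peak_slot = shapes[labels == k].mean(axis=0).argmax()
    print(f"cluster {k}: {np.sum(labels == k)} days, mean peak at slot {peak_slot}")
```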

REMEDI: REinforcement learning-driven adaptive MEtabolism modeling of primary sclerosing cholangitis DIsease progression

  • paper_url: http://arxiv.org/abs/2310.01426
  • repo_url: None
  • paper_authors: Chang Hu, Krishnakant V. Saboo, Ahmad H. Ali, Brian D. Juran, Konstantinos N. Lazaridis, Ravishankar K. Iyer
  • for: This paper aims to introduce a framework called REMEDI, which can assist in exploring treatments for Primary Sclerosing Cholangitis (PSC) by capturing bile acid dynamics and the body's adaptive response during PSC progression.
  • methods: REMEDI combines a differential equation (DE)-based mechanistic model of bile acid metabolism with reinforcement learning (RL) to emulate the body's adaptations to PSC continuously. The framework leverages RL to approximate adaptations in PSC, treating homeostasis as a reward signal and adjusting the DE parameters as the corresponding actions.
  • results: On real-world data, REMEDI generated bile acid dynamics and parameter adjustments consistent with published findings, and supported discussions in the literature that early administration of drugs that suppress bile acid synthesis may be effective in PSC treatment.
    Abstract Primary sclerosing cholangitis (PSC) is a rare disease wherein altered bile acid metabolism contributes to sustained liver injury. This paper introduces REMEDI, a framework that captures bile acid dynamics and the body's adaptive response during PSC progression that can assist in exploring treatments. REMEDI merges a differential equation (DE)-based mechanistic model that describes bile acid metabolism with reinforcement learning (RL) to emulate the body's adaptations to PSC continuously. An objective of adaptation is to maintain homeostasis by regulating enzymes involved in bile acid metabolism. These enzymes correspond to the parameters of the DEs. REMEDI leverages RL to approximate adaptations in PSC, treating homeostasis as a reward signal and the adjustment of the DE parameters as the corresponding actions. On real-world data, REMEDI generated bile acid dynamics and parameter adjustments consistent with published findings. Also, our results support discussions in the literature that early administration of drugs that suppress bile acid synthesis may be effective in PSC treatment.
    摘要 主要硬化性胆汁炎(PSC)是一种罕见的疾病,其中改变的胆汁酸代谢过程对持续liver injury做出了贡献。本文介绍了REMEDI框架,该框架旨在捕捉胆汁酸动力学和身体的适应应对PSC进程中的变化。REMEDI通过结合极限值方程(DE)基本机制模型和强化学习(RL)来模拟身体适应PSC的过程,并且通过RL来让身体在PSC进程中实现homeostasis。在这个过程中,RL通过调整DE参数来实现这一目标。在实际数据上,REMEDI生成的胆汁酸动力学和参数调整均与出版物中的发现一致。此外,我们的结果支持文献中的讨论, early administration of drugs that suppress bile acid synthesis may be effective in PSC treatment。
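
To illustrate the "homeostasis as reward" idea, here is a toy loop (not REMEDI's model): a one-compartment ODE dC/dt = synthesis - k*C is perturbed by a simulated disease onset, and a simple accept-if-better hill climber, standing in for the RL policy, adjusts the synthesis parameter to restore the set-point. All dynamics and parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

SETPOINT = 1.0        # homeostatic target concentration (arbitrary units)
k_clear = 1.0         # clearance rate; "disease onset" will reduce it
synthesis = 1.0       # the DE parameter the adaptation is allowed to adjust

def steady_concentration(synthesis, k_clear, c0=1.0, dt=0.01, steps=500):
    """Euler integration of a one-compartment model dC/dt = synthesis - k_clear * C."""
    c = c0
    for _ in range(steps):
        c += dt * (synthesis - k_clear * c)
    return c

for t in range(60):
    if t == 20:
        k_clear = 0.5                              # disease onset: impaired clearance
    c = steady_concentration(synthesis, k_clear)
    reward = -abs(c - SETPOINT)                    # homeostasis is the reward signal
    # Stand-in for the RL policy: propose a small change to the DE parameter and
    # keep it only if it improves the homeostasis reward (accept-if-better search).
    candidate = synthesis + rng.normal(0, 0.05)
    if -abs(steady_concentration(candidate, k_clear) - SETPOINT) > reward:
        synthesis = candidate
    if t % 10 == 0:
        print(f"t={t:2d}  k_clear={k_clear:.2f}  synthesis={synthesis:.3f}  C={c:.3f}")
```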

PolySketchFormer: Fast Transformers via Sketches for Polynomial Kernels

  • paper_url: http://arxiv.org/abs/2310.01655
  • repo_url: None
  • paper_authors: Praneeth Kacham, Vahab Mirrokni, Peilin Zhong
  • for: This paper aims to improve the efficiency of transformer architectures for language modeling by replacing softmax attention with a polynomial function and using polynomial sketching.
  • methods: The paper proposes a new attention mechanism called polynomial attention, which uses sketches for Polynomial Kernel from the randomized numerical linear algebra literature to approximate the attention output. The paper also introduces an efficient block-based algorithm to apply the causal mask to the attention matrix without explicitly realizing the $n \times n$ attention matrix.
  • results: The paper shows that the proposed polynomial attention mechanism leads to a significantly faster attention mechanism without assuming any sparse structure for the attention matrix, and the block-based algorithm gives significant speedups over the cumulative sum algorithm used by Performer. The paper also validates the design empirically by training language models with long context lengths and shows that the eval perplexities of the models are comparable to those of models trained with softmax attention, and the training times are significantly faster than FlashAttention.
    Abstract The quadratic complexity of attention in transformer architectures remains a big bottleneck in scaling up large foundation models for long context. In fact, recent theoretical results show the hardness of approximating the output of softmax attention mechanism in sub-quadratic time assuming Strong Exponential Time Hypothesis. In this paper, we show how to break this theoretical barrier by replacing softmax with a polynomial function and polynomial sketching. In particular we show that sketches for Polynomial Kernel from the randomized numerical linear algebra literature can be used to approximate the polynomial attention which leads to a significantly faster attention mechanism without assuming any sparse structure for the attention matrix that has been done in many previous works. In addition, we propose an efficient block-based algorithm that lets us apply the causal mask to the attention matrix without explicitly realizing the $n \times n$ attention matrix and compute the output of the polynomial attention mechanism in time linear in the context length. The block-based algorithm gives significant speedups over the \emph{cumulative sum} algorithm used by Performer to apply the causal mask to the attention matrix. These observations help us design \emph{PolySketchFormer}, a practical linear-time transformer architecture for language modeling with provable guarantees. We validate our design empirically by training language models with long context lengths. We first show that the eval perplexities of our models are comparable to that of models trained with softmax attention. We then show that for large context lengths our training times are significantly faster than FlashAttention.
    摘要 “对于对称架构中的注意力运算,这是一个很大的瓶颈,尤其是在扩展大型基础模型时。事实上,最近的理论成果显示,对于softmax注意力机制的输出应用权值矩阵在下ynomial时间内难以近似。在本文中,我们显示了如何突破这个理论障碍,通过取代softmax WITH polynomial函数和概率图 sketching。具体来说,我们显示了图 sketches for Polynomial Kernel from the randomized numerical linear algebra literature可以用来近似 polynomial attention,从而实现了较快的注意力运算,不需要假设注意力矩阵的罕见结构。此外,我们提出了一个高效的封页基于算法,可以将 causal mask 应用到注意力矩阵中,而不需要直接建立 $n \times n$ 的注意力矩阵。这个封页基于算法可以实现linear时间内 compute 出 polynomial attention 的输出。与Performer的 cumulative sum 算法相比,这个封页基于算法可以提供重要的几何速度增加。这些观察帮助我们设计了PolySketchFormer,一个实际的linear-time transformer架构,具有证明的保证。我们透过训练语言模型来验证我们的设计。我们首先显示了我们的模型在不同的文本长度下的eval perplexity是相似的,与使用 softmax attention 训练的模型相似。然后,我们显示了在大文本长度下,我们的训练时间是与FlashAttention相比的significantly faster。”
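
For reference, the sketch below spells out what causal polynomial attention computes, using an explicit n x n matrix. The paper's contribution is precisely to approximate this output in time linear in the context length via polynomial sketches and a block-based causal algorithm, neither of which is implemented here; the degree p = 4 and tensor shapes are illustrative.

```python
import numpy as np

def causal_polynomial_attention(Q, K, V, p=4):
    """Naive O(n^2) causal attention with a degree-p polynomial kernel in place of
    softmax: weight(i, j) is proportional to (q_i . k_j)**p for j <= i. The explicit
    n x n matrix below only illustrates the quantity that sketching approximates."""
    n = Q.shape[0]
    scores = (Q @ K.T) ** p                       # even p keeps the weights non-negative
    scores *= np.tril(np.ones((n, n)))            # causal mask
    weights = scores / np.maximum(scores.sum(axis=1, keepdims=True), 1e-12)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 8, 16
Q, K, V = (rng.standard_normal((n, d)) / d**0.25 for _ in range(3))
out = causal_polynomial_attention(Q, K, V)
print(out.shape)                                  # (8, 16): one output row per position
```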

Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations

  • paper_url: http://arxiv.org/abs/2310.01651
  • repo_url: https://github.com/ys-zong/foolyourvllms
  • paper_authors: Yongshuo Zong, Tingyang Yu, Bingchen Zhao, Ruchika Chavhan, Timothy Hospedales
  • for: 这篇论文旨在检测流行语言和视觉语言模型中的敏感性问题,即在多选问题回答中对答案集的排序影响。
  • methods: 作者使用了多种方法来检测模型的敏感性,包括对模型的输入和输出进行 permutation 操作,并对模型的性能进行分析。
  • results: 研究发现,流行的语言和视觉语言模型具有 permutation 敏感性问题,即对答案集的排序会导致模型的性能下降。这种敏感性存在于不同的模型大小和最新的语言和视觉语言模型中。
    Abstract Large language and vision-language models are rapidly being deployed in practice thanks to their impressive capabilities in instruction following, in-context learning, and so on. This raises an urgent need to carefully analyse their robustness so that stakeholders can understand if and when such models are trustworthy enough to be relied upon in any given application. In this paper, we highlight a specific vulnerability in popular models, namely permutation sensitivity in multiple-choice question answering (MCQA). Specifically, we show empirically that popular models are vulnerable to adversarial permutation in answer sets for multiple-choice prompting, which is surprising as models should ideally be as invariant to prompt permutation as humans are. These vulnerabilities persist across various model sizes, and exist in very recent language and vision-language models. Code is available at \url{https://github.com/ys-zong/FoolyourVLLMs}.
    摘要 大型语言和视觉语言模型在实践中迅速投入使用,其吸引力在 instrucion following、上下文学习等方面表现出色。然而,这也提高了对这些模型的robustness进行仔细分析的需求,以便各方可以了解这些模型在具体应用中是否可靠。在这篇论文中,我们强调了流行模型中的一个特点,即多项选择问题回答中的排序敏感性。我们通过实验证明,流行的模型对答案集的排序很敏感,这是人类应该是不敏感的。这些敏感性存在不同模型大小和最新语言和视觉语言模型中。可以在 \url{https://github.com/ys-zong/FoolyourVLLMs} 上获取代码。
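
A sketch of the evaluation protocol: `score_options` is a placeholder for a real (V)LLM's per-option scores, with a position-bias term injected so that the synthetic "model" exhibits the kind of order sensitivity the paper measures. The adversarial metric counts an item correct only if every permutation of its options still yields the right answer.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def score_options(question, options):
    """Placeholder for a real model's per-option scores: crude content knowledge
    (the question names its answer) plus an injected position bias and noise."""
    answer = question.split()[-1]
    content = 2.0 * np.array([o == answer for o in options], dtype=float)
    position_bias = np.array([1.0, 0.3, -0.3, -1.0])[: len(options)]
    return content + position_bias + rng.normal(0, 0.1, len(options))

def evaluate(dataset):
    clean, worst_case = 0, 0
    for question, options, answer in dataset:
        clean += options[int(np.argmax(score_options(question, options)))] == answer
        # Adversarial permutation: correct only if every ordering gives the answer.
        worst_case += all(
            perm[int(np.argmax(score_options(question, list(perm))))] == answer
            for perm in itertools.permutations(options)
        )
    return clean / len(dataset), worst_case / len(dataset)

letters = ["A", "B", "C", "D"]
dataset = []
for i in range(100):
    ans = rng.choice(letters)
    opts = list(rng.permutation(letters))          # answer lands at a random position
    dataset.append((f"question {i}: pick {ans}", opts, ans))

clean_acc, adv_acc = evaluate(dataset)
print(f"clean accuracy {clean_acc:.2f} vs worst-case-permutation accuracy {adv_acc:.2f}")
```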

Equivariant Adaptation of Large Pretrained Models

  • paper_url: http://arxiv.org/abs/2310.01647
  • repo_url: None
  • paper_authors: Arnab Kumar Mondal, Siba Smarak Panigrahi, Sékou-Oumar Kaba, Sai Rajeswar, Siamak Ravanbakhsh
  • for: 使得大型预训练模型具有更高的采样效率和更准确的预测结果,并且能够快速地在训练和推理过程中进行变换
  • methods: 使用简单的均值化网络将输入转换到均值形式,然后将其传递给未Constrained预测网络
  • results: 使用 dataset-dependent priors 来指导均值化函数,使得大型预训练模型能够具有更高的采样效率和更准确的预测结果,并且能够在某些情况下提高对于旋转等概率变换的Robustness。
    Abstract Equivariant networks are specifically designed to ensure consistent behavior with respect to a set of input transformations, leading to higher sample efficiency and more accurate and robust predictions. However, redesigning each component of prevalent deep neural network architectures to achieve chosen equivariance is a difficult problem and can result in a computationally expensive network during both training and inference. A recently proposed alternative towards equivariance that removes the architectural constraints is to use a simple canonicalization network that transforms the input to a canonical form before feeding it to an unconstrained prediction network. We show here that this approach can effectively be used to make a large pretrained network equivariant. However, we observe that the produced canonical orientations can be misaligned with those of the training distribution, hindering performance. Using dataset-dependent priors to inform the canonicalization function, we are able to make large pretrained models equivariant while maintaining their performance. This significantly improves the robustness of these models to deterministic transformations of the data, such as rotations. We believe this equivariant adaptation of large pretrained models can help their domain-specific applications with known symmetry priors.
    摘要 Equivariant 网络是专门为了保证输入变换的一致性,以提高样本效率和更准确的预测。然而,为了实现这种选择的一致性,现有的深度神经网络架构中的每个组件都需要重新设计,这会导致训练和推断过程中的计算成本增加。一种最近提出的代替方案是使用一个简单的标准化网络,将输入转换为一个标准形式,然后将其传递给一个未定型预测网络。我们在这里表明,这种方法可以有效地使大型预训练模型变换成一致的。然而,我们发现生产的标准方向可能与训练分布的方向不一致,这会降低性能。使用数据集依赖的先验来 inform 标准化函数,我们能够使大型预训练模型变换成一致,同时保持其性能。这会大幅提高这些模型对 deterministic 变换数据(如旋转)的Robustness。我们认为这种一致适应的大型预训练模型可以帮助它们在知道Symmetry先验的领域应用中提高性能。
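
A minimal sketch of the canonicalization idea on 2-D point clouds: every input is rotated into a canonical frame (here computed in closed form via PCA with a skew-based sign convention) before being passed to an arbitrary, orientation-unaware predictor, making the whole pipeline rotation invariant. The paper instead learns the canonicalization network and uses dataset-dependent priors to keep its canonical orientations aligned with the training distribution.

```python
import numpy as np

def canonicalize(points):
    """Map a 2-D point cloud to a canonical pose: centre it, align principal axes
    with the coordinate axes via SVD, and fix each axis' sign using the third moment
    so the frame is (generically) unique."""
    centred = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    canon = centred @ vt.T
    skew = (canon**3).sum(axis=0)
    return canon * np.where(skew == 0, 1.0, np.sign(skew))

def frozen_predictor(points):
    """Stand-in for a large pretrained, non-equivariant model."""
    return points.flatten()

rng = np.random.default_rng(0)
cloud = rng.standard_normal((64, 2)) @ np.diag([3.0, 1.0])       # anisotropic cloud
theta = rng.uniform(0, 2 * np.pi)
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])

out_original = frozen_predictor(canonicalize(cloud))
out_rotated = frozen_predictor(canonicalize(cloud @ R.T))
print("pipeline is rotation-invariant:", np.allclose(out_original, out_rotated))
```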

Deep Insights into Noisy Pseudo Labeling on Graph Data

  • paper_url: http://arxiv.org/abs/2310.01634
  • repo_url: None
  • paper_authors: Botao Wang, Jia Li, Yang Liu, Jiashun Cheng, Yu Rong, Wenjia Wang, Fugee Tsung
  • for: 本文旨在对 Pseudo Labeling (PL) 策略在图学习模型中的应用进行深入分析,并提出一种谨慎的PL方法来改进图学习过程。
  • methods: 本文使用错误分析方法对 PL 策略进行了深入分析,并提出了一种基于 confidence 和多视图一致性的PL方法。
  • results: 实验结果显示,提出的方法可以改善图学习过程,并在链接预测和节点分类任务上超过了其他 PL 策略。
    Abstract Pseudo labeling (PL) is a wide-applied strategy to enlarge the labeled dataset by self-annotating the potential samples during the training process. Several works have shown that it can improve the graph learning model performance in general. However, we notice that the incorrect labels can be fatal to the graph training process. Inappropriate PL may result in the performance degrading, especially on graph data where the noise can propagate. Surprisingly, the corresponding error is seldom theoretically analyzed in the literature. In this paper, we aim to give deep insights of PL on graph learning models. We first present the error analysis of PL strategy by showing that the error is bounded by the confidence of PL threshold and consistency of multi-view prediction. Then, we theoretically illustrate the effect of PL on convergence property. Based on the analysis, we propose a cautious pseudo labeling methodology in which we pseudo label the samples with highest confidence and multi-view consistency. Finally, extensive experiments demonstrate that the proposed strategy improves graph learning process and outperforms other PL strategies on link prediction and node classification tasks.
    摘要 假标签(PL)是一种广泛应用的策略,用于扩大标注数据集的训练过程中。许多研究表明,PL可以提高图学习模型的性能。然而,我们发现 incorrect labels 可能对图学习过程产生致命的影响。不当的 PL 可能导致性能下降,尤其是在图数据中, где 噪声可能进行卷积。 surprisingly,相关的错误分析在文献中 rarely 被 theoretically 探讨。在这篇论文中,我们希望给 PL 在图学习模型中提供深入的理解。我们首先给出 PL 策略的错误分析,并证明 error 是 PL 置信度和多视图预测一致性的 bound。然后,我们 theoretically 描述了 PL 对于融合性的影响。基于分析,我们提出了一种谨慎的假标签方法,其中我们假标签的样本是 confidence 最高和多视图一致的。最后,我们进行了广泛的实验,并证明了我们提出的策略可以改善图学习过程,并在链接预测和节点分类任务上超越其他 PL 策略。
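
The selection rule itself can be sketched in a few lines: pseudo-label only the nodes whose predicted class is both high-confidence and consistent across two stochastically perturbed views of the model. The arrays below stand in for the softmax outputs of a real GNN, and the 0.6 confidence threshold is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_classes = 1000, 5

# Stand-ins for softmax outputs of the same GNN under two stochastic augmentations
# (e.g. edge dropout); a real pipeline would produce these from the graph.
logits = rng.normal(0, 1, (n_nodes, n_classes))
view1 = np.exp(logits + rng.normal(0, 0.3, logits.shape))
view2 = np.exp(logits + rng.normal(0, 0.3, logits.shape))
view1 /= view1.sum(axis=1, keepdims=True)
view2 /= view2.sum(axis=1, keepdims=True)

confidence_threshold = 0.6
pred1, pred2 = view1.argmax(axis=1), view2.argmax(axis=1)
confident = (view1.max(axis=1) > confidence_threshold) & (view2.max(axis=1) > confidence_threshold)
consistent = pred1 == pred2

selected = confident & consistent            # cautious pseudo-labelling rule
pseudo_labels = pred1[selected]
print(f"pseudo-labelled {selected.sum()} / {n_nodes} nodes "
      f"(confident: {confident.sum()}, consistent: {consistent.sum()})")
```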

Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods

  • paper_url: http://arxiv.org/abs/2310.01618
  • repo_url: None
  • paper_authors: Emanuele Zappala, Daniel Levine, Sizhuang He, Syed Rizvi, Sacha Levy, David van Dijk
  • for: 这篇论文目的是帮助深度学习模型更好地理解和设计,通过与数值分析的联系来提供理论基础。
  • methods: 这篇论文使用了迭代方法来描述神经网络,并提出了一种基于迭代法的神经网络架构。
  • results: 实验表明,迭代神经网络可以提高性能,而 alphaFold 和扩散模型等流行的架构也是基于迭代法。
    Abstract Deep neural networks, despite their success in numerous applications, often function without established theoretical foundations. In this paper, we bridge this gap by drawing parallels between deep learning and classical numerical analysis. By framing neural networks as operators with fixed points representing desired solutions, we develop a theoretical framework grounded in iterative methods for operator equations. Under defined conditions, we present convergence proofs based on fixed point theory. We demonstrate that popular architectures, such as diffusion models and AlphaFold, inherently employ iterative operator learning. Empirical assessments highlight that performing iterations through network operators improves performance. We also introduce an iterative graph neural network, PIGN, that further demonstrates benefits of iterations. Our work aims to enhance the understanding of deep learning by merging insights from numerical analysis, potentially guiding the design of future networks with clearer theoretical underpinnings and improved performance.
    摘要 深度神经网络,尽管在许多应用中取得了成功,但它们往往没有明确的理论基础。在这篇论文中,我们尝试填补这一漏洞,通过将神经网络视为有定点表示希望的解的运算器,开发了一个基于迭代方法的理论框架。在定义的条件下,我们提供了收敛证明基于定点理论。我们发现,流行的架构,如扩散模型和AlphaFold,实际上是使用迭代运算学习。Empirical assessments表明,通过网络运算器进行迭代可以提高性能。我们还介绍了一种迭代图 neural network,PIGN,它进一步证明了迭代的好处。我们的工作的目的是增强深度学习的理解,通过与数学分析的交互,可能导向未来的网络设计有 clearer theoretical underpinnings和提高性能。
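
A minimal sketch of the operator view: a random one-layer map, rescaled to be a contraction, is applied iteratively and the residuals shrink geometrically to a fixed point, which is the Banach fixed-point style convergence argument this framework builds on. The network, scaling, and tolerance are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 32

# A random one-layer "operator" x -> tanh(W x + b), rescaled so the spectral norm
# of W is 0.5; since tanh is 1-Lipschitz the map is a contraction, so iterating it
# converges to a unique fixed point.
W = rng.standard_normal((dim, dim))
W *= 0.5 / np.linalg.norm(W, 2)
b = rng.standard_normal(dim)

def operator(x):
    return np.tanh(W @ x + b)

x = rng.standard_normal(dim)
for k in range(100):
    x_next = operator(x)
    residual = float(np.linalg.norm(x_next - x))
    x = x_next
    if k % 5 == 0 or residual < 1e-10:
        print(f"iter {k:3d}  residual = {residual:.2e}")
    if residual < 1e-10:
        break
```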

Intractability of Learning the Discrete Logarithm with Gradient-Based Methods

  • paper_url: http://arxiv.org/abs/2310.01611
  • repo_url: https://github.com/armanbolatov/hardness_of_learning
  • paper_authors: Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhibek Kadyrsizova, Zhenisbek Assylbekov
  • for: 本研究探讨了使用梯度下降法学习质数logarithm的缺点。
  • methods: 本研究使用了梯度下降法和内存梯度下降法,并通过对特定的矩阵的spectral norm进行分析,证明了梯度下降法在学习质数logarithm的缺点上具有局限性。
  • results: 研究发现,使用梯度下降法学习质数logarithm的缺点时,梯度的强度会受到基数的影响,而不是logarithm的基数。此外,随着群体的规模增加,预测缺点的成功率会下降。
    Abstract The discrete logarithm problem is a fundamental challenge in number theory with significant implications for cryptographic protocols. In this paper, we investigate the limitations of gradient-based methods for learning the parity bit of the discrete logarithm in finite cyclic groups of prime order. Our main result, supported by theoretical analysis and empirical verification, reveals the concentration of the gradient of the loss function around a fixed point, independent of the logarithm's base used. This concentration property leads to a restricted ability to learn the parity bit efficiently using gradient-based methods, irrespective of the complexity of the network architecture being trained. Our proof relies on Boas-Bellman inequality in inner product spaces and it involves establishing approximate orthogonality of discrete logarithm's parity bit functions through the spectral norm of certain matrices. Empirical experiments using a neural network-based approach further verify the limitations of gradient-based learning, demonstrating the decreasing success rate in predicting the parity bit as the group order increases.
    摘要 “离散对数问题是数论中的基本挑战,对密码协议具有重要意义。In this paper, we investigate the limitations of gradient-based methods for learning the parity bit of the discrete logarithm in finite cyclic groups of prime order. Our main result, supported by theoretical analysis and empirical verification, reveals the concentration of the gradient of the loss function around a fixed point, independent of the logarithm's base used. This concentration property leads to a restricted ability to learn the parity bit efficiently using gradient-based methods, irrespective of the complexity of the network architecture being trained. Our proof relies on Boas-Bellman inequality in inner product spaces and it involves establishing approximate orthogonality of discrete logarithm's parity bit functions through the spectral norm of certain matrices. Empirical experiments using a neural network-based approach further verify the limitations of gradient-based learning, demonstrating the decreasing success rate in predicting the parity bit as the group order increases.”

Adversarial Contextual Bandits Go Kernelized

  • paper_url: http://arxiv.org/abs/2310.01609
  • repo_url: None
  • paper_authors: Gergely Neu, Julia Olkhovskaya, Sattar Vakili
  • for: 本研究探讨了在线学习中的反对抗敌性线性上下文随机带动问题的一种普适化问题,通过使用可重构kernel空间中的损失函数,以更加灵活地模型复杂的决策场景。
  • methods: 我们提出了一种计算效率高的算法,使用了一种新的乐观偏置估计器来估计损失函数,并实现了近似optimal的恐慌保证下界,对于多种 eigenvalue decay 假设。
  • results: 我们的算法在多种情况下都可以实现near-optimal的恐慌保证下界,包括对于polynomial eigendecay的情况下, regret 为 $\widetilde{O}(KT^{(\frac{1}{2}(1+\frac{1}{c})}$,其中 $T$ 是 Round 的数量, $K$ 是行动的数量。当 eigendecay 遵循 exponential 模式时,我们可以实现even tighter的 regret bound,即 $\widetilde{O}(\sqrt{T})$。这些率与所有已知的下界匹配,并且与已知的最佳上界匹配。
    Abstract We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios. We propose a computationally efficient algorithm that makes use of a new optimistically biased estimator for the loss functions and achieves near-optimal regret guarantees under a variety of eigenvalue decay assumptions made on the underlying kernel. Specifically, under the assumption of polynomial eigendecay with exponent $c>1$, the regret is $\widetilde{O}(KT^{\frac{1}{2}(1+\frac{1}{c})})$, where $T$ denotes the number of rounds and $K$ the number of actions. Furthermore, when the eigendecay follows an exponential pattern, we achieve an even tighter regret bound of $\widetilde{O}(\sqrt{T})$. These rates match the lower bounds in all special cases where lower bounds are known at all, and match the best known upper bounds available for the more well-studied stochastic counterpart of our problem.
    摘要 我们研究一种扩展线性上下文ual bandit问题的在线学习泛化问题,该问题允许更加灵活地模型复杂的决策场景。我们提出了一种 computationally efficient 的算法,该算法使用了一种新的乐观偏向估计器来估计损失函数,并实现了近似optimal的 regret guarantee。 Specifically, under the assumption of 多项幂减少($c>1$),我们的 regret是 $\widetilde{O}(KT^{\frac{1}{2}(1+\frac{1}{c})})$, where $T$ denotes the number of rounds and $K$ the number of actions. 另外,当欧几何减少 follows an exponential pattern 时,我们可以达到更紧的 regret bound of $\widetilde{O}(\sqrt{T})$. These rates match the lower bounds in all special cases where lower bounds are known at all, and match the best known upper bounds available for the more well-studied stochastic counterpart of our problem.

Pool-Based Active Learning with Proper Topological Regions

  • paper_url: http://arxiv.org/abs/2310.01597
  • repo_url: https://github.com/Lies0zeta/PALPTR-
  • paper_authors: Lies Hadjadj, Emilie Devijver, Remi Molinier, Massih-Reza Amini
  • for: 本研究提出了一种基于多类分类任务的池型活动学习策略,用于增强机器学习模型的性能。
  • methods: 本文提出了一种基于 topological data analysis(TDA)的Proper Topological Regions(PTR)方法,用于在池型活动学习中选择最有价值的无标注数据。
  • results: 实验表明,提出的方法在多种 benchmark 数据集上具有竞争力,并且与传统方法相比,可以更好地增强机器学习模型的性能。
    Abstract Machine learning methods usually rely on large sample size to have good performance, while it is difficult to provide labeled set in many applications. Pool-based active learning methods are there to detect, among a set of unlabeled data, the ones that are the most relevant for the training. We propose in this paper a meta-approach for pool-based active learning strategies in the context of multi-class classification tasks based on Proper Topological Regions. PTR, based on topological data analysis (TDA), are relevant regions used to sample cold-start points or within the active learning scheme. The proposed method is illustrated empirically on various benchmark datasets, being competitive to the classical methods from the literature.
    摘要 机器学习方法通常需要大量数据来达到良好的性能,而在许多应用场景中提供标注数据却是困难的。基于池的活动学习方法可以探测一个未标注数据集中最相关的数据,以便在活动学习中训练。本文提出了一种基于多类分类任务的池基活动学习策略的 meta 方法,使用 Proper Topological Regions(PTR)来检测 relevance。PTR 基于数据 topological 分析(TDA),可以在活动学习中作为冷开始点或在激活学习中选择数据。Empirical experiment 表明,提议的方法与文献中的传统方法竞争。

An Investigation of Representation and Allocation Harms in Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.01583
  • repo_url: https://github.com/smaityumich/cl-representation-harm
  • paper_authors: Subha Maity, Mayank Agarwal, Mikhail Yurochkin, Yuekai Sun
  • for: 本研究探讨了自动学习中少数群体表现下降的原因,具体来说是对于自适应学习(SSL)中的对比学习(CL)方法的影响。
  • methods: 本研究使用了图像和文本数据集,以及相关的流行CL方法,来描述对少数群体的 represeting潜在危害。
  • results: 研究发现,CL方法在处理少数群体时容易导致对少数群体的表示潜在危害,并且这种危害对于下游分类任务有一定的影响。此外,研究还提供了一种理论解释,即在CLSetting中的概率链模型导致了表示潜在危害。
    Abstract The effect of underrepresentation on the performance of minority groups is known to be a serious problem in supervised learning settings; however, it has been underexplored so far in the context of self-supervised learning (SSL). In this paper, we demonstrate that contrastive learning (CL), a popular variant of SSL, tends to collapse representations of minority groups with certain majority groups. We refer to this phenomenon as representation harm and demonstrate it on image and text datasets using the corresponding popular CL methods. Furthermore, our causal mediation analysis of allocation harm on a downstream classification task reveals that representation harm is partly responsible for it, thus emphasizing the importance of studying and mitigating representation harm. Finally, we provide a theoretical explanation for representation harm using a stochastic block model that leads to a representational neural collapse in a contrastive learning setting.
    摘要 supervised learning 中少数群体的表现问题已经得到了广泛关注,但在自动学习(SSL)上还未得到足够的探讨。本文显示,在对比学习(CL)中,少数群体的表现会与主要群体的表现相归缩合。我们称此现象为表现害,并在图像和文本 dataset 上使用相关的流行 CL 方法进行证明。此外,我们通过对下游分类任务的干扰分析表明,表现害对 representation harm 具有一定的贡献,因此就是要研究和缓解表现害的重要性。最后,我们提供了一种 theoretically explain representation harm 的Stochastic block model,导致了对比学习设置中的表现害。

Contraction Properties of the Global Workspace Primitive

  • paper_url: http://arxiv.org/abs/2310.01571
  • repo_url: None
  • paper_authors: Michaela Ennis, Leo Kozachkov, Jean-Jacques Slotine
  • for: 这 paper 探讨了多个领域的循环神经网络(RNN)的重要研究领域,特别是 Kozachkov et al. 提出的可证实的RNN(RNNs)。
  • methods: 该 paper 通过理论和实验方式扩展了 RNNs 的稳定性条件,特别是对全球工作空间模块结构的研究。
  • results: 该 paper 通过实验成功地示出了 Global Workspace Sparse Combo Nets 具有少量可训练参数,并且在缺少个体子网络时具有更好的抗耗性。这些实验结果表明了我们的理论研究对于实现模块 RNN 的稳定性具有重要意义。
    Abstract To push forward the important emerging research field surrounding multi-area recurrent neural networks (RNNs), we expand theoretically and empirically on the provably stable RNNs of RNNs introduced by Kozachkov et al. in "RNNs of RNNs: Recursive Construction of Stable Assemblies of Recurrent Neural Networks". We prove relaxed stability conditions for salient special cases of this architecture, most notably for a global workspace modular structure. We then demonstrate empirical success for Global Workspace Sparse Combo Nets with a small number of trainable parameters, not only through strong overall test performance but also greater resilience to removal of individual subnetworks. These empirical results for the global workspace inter-area topology are contingent on stability preservation, highlighting the relevance of our theoretical work for enabling modular RNN success. Further, by exploring sparsity in the connectivity structure between different subnetwork modules more broadly, we improve the state of the art performance for stable RNNs on benchmark sequence processing tasks, thus underscoring the general utility of specialized graph structures for multi-area RNNs.
    摘要

Causality-informed Rapid Post-hurricane Building Damage Detection in Large Scale from InSAR Imagery

  • paper_url: http://arxiv.org/abs/2310.01565
  • repo_url: None
  • paper_authors: Chenguang Wang, Yepeng Liu, Xiaojian Zhang, Xuechun Li, Vladimir Paramygin, Arthriya Subgranon, Peter Sheng, Xilei Zhao, Susu Xu
  • for: 这篇论文是为了快速评估飓风引起的建筑物损害而写的。
  • methods: 这篇论文使用了 remote sensing 技术获取大规模的光学或 Interferometric Synthetic Aperture Radar(InSAR)图像数据,并使用了 causal Bayesian network 编码了风、洪水、建筑物损害、InSAR 图像之间的复杂 causal 关系。
  • results: 这篇论文的结果表明,使用这种方法可以快速和准确地检测飓风引起的建筑物损害,并且可以避免传统人工检查方法所需的较长处理时间。
    Abstract Timely and accurate assessment of hurricane-induced building damage is crucial for effective post-hurricane response and recovery efforts. Recently, remote sensing technologies provide large-scale optical or Interferometric Synthetic Aperture Radar (InSAR) imagery data immediately after a disastrous event, which can be readily used to conduct rapid building damage assessment. Compared to optical satellite imageries, the Synthetic Aperture Radar can penetrate cloud cover and provide more complete spatial coverage of damaged zones in various weather conditions. However, these InSAR imageries often contain highly noisy and mixed signals induced by co-occurring or co-located building damage, flood, flood/wind-induced vegetation changes, as well as anthropogenic activities, making it challenging to extract accurate building damage information. In this paper, we introduced an approach for rapid post-hurricane building damage detection from InSAR imagery. This approach encoded complex causal dependencies among wind, flood, building damage, and InSAR imagery using a holistic causal Bayesian network. Based on the causal Bayesian network, we further jointly inferred the large-scale unobserved building damage by fusing the information from InSAR imagery with prior physical models of flood and wind, without the need for ground truth labels. Furthermore, we validated our estimation results in a real-world devastating hurricane -- the 2022 Hurricane Ian. We gathered and annotated building damage ground truth data in Lee County, Florida, and compared the introduced method's estimation results with the ground truth and benchmarked it against state-of-the-art models to assess the effectiveness of our proposed method. Results show that our method achieves rapid and accurate detection of building damage, with significantly reduced processing time compared to traditional manual inspection methods.
    摘要 时刻和精准的飓风导致建筑物损坏评估是应急回应和恢复努力的关键。现在,远程感知技术提供大规模的光学或折射 Synthetic Aperture Radar(InSAR)图像数据,可以快速进行飓风后建筑物损坏评估。相比光学卫星图像,Synthetic Aperture Radar可以穿过云层和提供更完整的损坏区域各种天气情况下的损坏评估。然而,这些InSAR图像经常含有高度杂音和混合信号,由于同时发生或位于损坏区域的建筑物损坏、洪水、洪水/风吹落 vegetation 变化以及人类活动,使得准确提取建筑物损坏信息变得困难。在本文中,我们介绍了一种快速飓风后建筑物损坏检测方法,基于数学关系网络(Bayesian network)。这个方法利用这些建筑物损坏、洪水、风吹的复杂 causal 关系,通过组合 InSAR 图像资讯和预先建立的洪水和风吹 Physical 模型,无需地面实验标签。此外,我们还 validate 了我们的估计结果,在2022年飓风 Ian 中进行了真实世界的应用。我们在李县、佛罗里达聚集和标注建筑物损坏的实际数据,并与地面实验标签相比较,以评估我们提出的方法的有效性。结果显示,我们的方法可以快速和精准地检测建筑物损坏,并且与传统手动检查方法相比,具有明显的处理时间缩短。

On the near-optimality of betting confidence sets for bounded means

  • paper_url: http://arxiv.org/abs/2310.01547
  • repo_url: None
  • paper_authors: Shubhanshu Shekhar, Aaditya Ramdas
  • for: 这篇论文的目的是分析一种非渐近置信区间(CI)的构造方法,及其时间一致的变体,即置信序列(confidence sequence, CS)。
  • methods: 这篇论文研究了一种基于赌博(betting)的方法,即 Waudby-Smith 和 Ramdas(2023)提出的赌博置信区间。
  • results: 这篇论文为赌博式 CI 和 CS 的改进实验性能提供了理论保证,包括极限宽度比较、刻画最小可达宽度的下界,以及(除去对数项和常数因子)与基本极限的匹配。
    Abstract Constructing nonasymptotic confidence intervals (CIs) for the mean of a univariate distribution from independent and identically distributed (i.i.d.) observations is a fundamental task in statistics. For bounded observations, a classical nonparametric approach proceeds by inverting standard concentration bounds, such as Hoeffding's or Bernstein's inequalities. Recently, an alternative betting-based approach for defining CIs and their time-uniform variants called confidence sequences (CSs), has been shown to be empirically superior to the classical methods. In this paper, we provide theoretical justification for this improved empirical performance of betting CIs and CSs. Our main contributions are as follows: (i) We first compare CIs using the values of their first-order asymptotic widths (scaled by $\sqrt{n}$), and show that the betting CI of Waudby-Smith and Ramdas (2023) has a smaller limiting width than existing empirical Bernstein (EB)-CIs. (ii) Next, we establish two lower bounds that characterize the minimum width achievable by any method for constructing CIs/CSs in terms of certain inverse information projections. (iii) Finally, we show that the betting CI and CS match the fundamental limits, modulo an additive logarithmic term and a multiplicative constant. Overall these results imply that the betting CI~(and CS) admit stronger theoretical guarantees than the existing state-of-the-art EB-CI~(and CS); both in the asymptotic and finite-sample regimes.
    摘要 constructing nonasymptotic confidence intervals (CIs) for the mean of a univariate distribution from independent and identically distributed (i.i.d.) observations is a fundamental task in statistics. For bounded observations, a classical nonparametric approach proceeds by inverting standard concentration bounds, such as Hoeffding's or Bernstein's inequalities. Recently, an alternative betting-based approach for defining CIs and their time-uniform variants called confidence sequences (CSs), has been shown to be empirically superior to the classical methods. In this paper, we provide theoretical justification for this improved empirical performance of betting CIs and CSs. 我们的主要贡献如下:(i)我们首先比较CIs的第一个 asymptotic width(按照n的平方根 scaling),并显示WAudby-Smith和Ramdas(2023)的赌博CI的限制宽度小于现有的empirical Bernstein(EB)-CI。(ii)然后,我们设定了两个下界,用于描述任何方法构造CIs/CSs的最小宽度,并表示这些下界与certain inverse information projections有关。(iii)最后,我们表明了赌博CI和CS与基本限制相匹配,即,对于任何方法,其宽度至少要比基本限制宽度加上一个对数函数和一个常数多少。总的来说,这些结果表明赌博CI(和CS)在 both the asymptotic and finite-sample regimes具有更强的理论保证,比现有的状态 искусственный智能EB-CI(和CS)更强。
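
A sketch of the betting construction for a mean in [0, 1]: for each candidate mean m on a grid, a wealth process K_t(m) = prod_s (1 + lam_s (X_s - m)) is grown with a predictable bet lam_s, and m is rejected once the wealth crosses 1/alpha (Ville's inequality). The bet-sizing rule below is a simple truncated mean/variance heuristic in the spirit of, but not identical to, the hedged bets of Waudby-Smith and Ramdas.

```python
import numpy as np

def betting_ci(x, alpha=0.05, grid_size=1000):
    """Confidence interval for the mean of [0, 1]-valued data via betting: reject a
    candidate mean m once its wealth K_t(m) reaches 1/alpha; by Ville's inequality
    the surviving candidates form a level-(1 - alpha) confidence set at any time."""
    grid = np.linspace(0.0, 1.0, grid_size)
    log_wealth = np.zeros(grid_size)
    alive = np.ones(grid_size, dtype=bool)
    mu_hat, var_hat, count, sq_sum = 0.5, 0.25, 0, 0.25   # running mean / variance
    for xt in x:
        # Predictable bet, truncated so the wealth stays positive for every m in [0, 1].
        lam = (mu_hat - grid) / (var_hat + (grid - mu_hat) ** 2)
        lam = np.clip(lam, -0.5 / (1.0 - grid + 1e-12), 0.5 / (grid + 1e-12))
        log_wealth += np.log1p(lam * (xt - grid))
        alive &= log_wealth < np.log(1.0 / alpha)
        count += 1
        mu_hat += (xt - mu_hat) / count
        sq_sum += (xt - mu_hat) ** 2
        var_hat = sq_sum / (count + 1)
    kept = grid[alive]
    return kept.min(), kept.max()

rng = np.random.default_rng(0)
x = rng.beta(2, 5, size=500)                        # bounded data, true mean 2/7
lo, hi = betting_ci(x)
print(f"betting CI [{lo:.3f}, {hi:.3f}], true mean {2 / 7:.3f}")
```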

Fusing Models with Complementary Expertise

  • paper_url: http://arxiv.org/abs/2310.01542
  • repo_url: None
  • paper_authors: Hongyi Wang, Felipe Maia Polo, Yuekai Sun, Souvik Kundu, Eric Xing, Mikhail Yurochkin
  • for: 这个论文的目的是解决训练AI模型通用多任务多领域的问题,以便在测试时能够更好地掌握数据分布的各种多样性。
  • methods: 这篇论文使用了专家模型的融合(Fusion of Experts, FoE)方法,将专家模型的输出融合到一起,以提高任务的性能。这种方法适用于推理和生成任务,并且在图像和文本分类、文本摘要、多选问答以及自动评估生成文本等任务中得到了显著的性能提升。
  • results: 这篇论文的实验结果表明,使用 FoE 方法可以在图像和文本分类、文本摘要、多选问答以及自动评估生成文本等任务中提高性能,并且在“倔强”(frugal)设定下,可以减少专家模型评估的次数。
    Abstract Training AI models that generalize across tasks and domains has long been among the open problems driving AI research. The emergence of Foundation Models made it easier to obtain expert models for a given task, but the heterogeneity of data that may be encountered at test time often means that any single expert is insufficient. We consider the Fusion of Experts (FoE) problem of fusing outputs of expert models with complementary knowledge of the data distribution and formulate it as an instance of supervised learning. Our method is applicable to both discriminative and generative tasks and leads to significant performance improvements in image and text classification, text summarization, multiple-choice QA, and automatic evaluation of generated text. We also extend our method to the "frugal" setting where it is desired to reduce the number of expert model evaluations at test time.
    摘要 <>对于训练通用的人工智能模型,长期以来是开放问题驱动着人工智能研究的一个重要问题。基础模型的出现使得获得特定任务的专家模型更加容易,但是在测试时可能遇到的数据多样性通常意味着任何一个专家都不够。我们将专家融合(FoE)问题定义为将专家模型输出的 complementary 知识与数据分布相结合,并将其视为一种supervised learning实例。我们的方法适用于推论和生成任务,并在图像和文本分类、文本摘要、多选问答和自动评估生成文本中导致显著性能提升。我们还将方法推广到"倔强"设定,即在测试时尽可能减少专家模型评估数量。
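
A toy sketch of FoE framed as supervised learning: two "experts" are each trained on a different domain, and a small fuser is trained on their probabilistic outputs (plus the raw input) to decide when each expert's complementary expertise applies. The data, experts, and fuser below are all illustrative stand-ins, not the paper's models.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_domain(n, flip, shift):
    """Two synthetic domains with opposite label rules; feature 2 hints at the domain."""
    X = rng.normal(0, 1, (n, 5))
    X[:, 2] += shift
    y = (X[:, 0] + (-1) ** flip * X[:, 1] > 0).astype(int)
    return X, y

(Xa, ya), (Xb, yb) = make_domain(3000, 0, 0.0), make_domain(3000, 1, 3.0)
expert_a = LogisticRegression().fit(Xa[:2000], ya[:2000])   # each expert sees one domain only
expert_b = LogisticRegression().fit(Xb[:2000], yb[:2000])

# Held-out mixture of both domains, shuffled and split into fuser-train / test halves.
X_mix = np.vstack([Xa[2000:], Xb[2000:]])
y_mix = np.concatenate([ya[2000:], yb[2000:]])
idx = rng.permutation(len(y_mix))
X_mix, y_mix = X_mix[idx], y_mix[idx]

# FoE as supervised learning: the fuser sees both experts' probabilistic outputs plus
# the raw input and learns when each expert's complementary expertise applies.
feats = np.hstack([expert_a.predict_proba(X_mix), expert_b.predict_proba(X_mix), X_mix])
fuser = GradientBoostingClassifier().fit(feats[:1000], y_mix[:1000])

print(f"expert A alone: {expert_a.score(X_mix[1000:], y_mix[1000:]):.3f}")
print(f"expert B alone: {expert_b.score(X_mix[1000:], y_mix[1000:]):.3f}")
print(f"fused (FoE)   : {fuser.score(feats[1000:], y_mix[1000:]):.3f}")
```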

Adversarial Client Detection via Non-parametric Subspace Monitoring in the Internet of Federated Things

  • paper_url: http://arxiv.org/abs/2310.01537
  • repo_url: None
  • paper_authors: Xianjian Xie, Xiaochen Xian, Dan Li, Andi Wang
  • for: 该论文旨在提出一种有效的非参数方法FedRR,用于解决 federated learning 网络中的恶意攻击问题。
  • methods: 该方法基于 transmitted 参数更新的低级特征,并可以准确地检测恶意客户端和控制假阳性率。
  • results: 实验基于 MNIST 数据集的 digit 识别 validate 了我们的方法的优势。
    Abstract The Internet of Federated Things (IoFT) represents a network of interconnected systems with federated learning as the backbone, facilitating collaborative knowledge acquisition while ensuring data privacy for individual systems. The wide adoption of IoFT, however, is hindered by security concerns, particularly the susceptibility of federated learning networks to adversarial attacks. In this paper, we propose an effective non-parametric approach FedRR, which leverages the low-rank features of the transmitted parameter updates generated by federated learning to address the adversarial attack problem. Besides, our proposed method is capable of accurately detecting adversarial clients and controlling the false alarm rate under the scenario with no attack occurring. Experiments based on digit recognition using the MNIST datasets validated the advantages of our approach.
    摘要 互联网联邦智能(IoFT)表示一个联网了多个系统,带有联邦学习作为核心,实现共同知识获取的网络,同时保障个体系统的数据隐私。然而,IoFT的广泛应用受到了安全问题的限制,尤其是联邦学习网络对 adversarial 攻击的抵触。在这篇论文中,我们提出了一种有效的非参数方法 FedRR,它利用联邦学习传输的参数更新低级特征来解决 adversarial 攻击问题。此外,我们的提议方法可以准确地检测出恶意客户端,并在没有攻击情况下控制假阳性率。基于 digit 识别 using MNIST 数据集,我们的方法在实验中证明了其优势。
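
A simplified analogue of the monitoring idea (not the FedRR statistic itself): project each client's parameter update onto the principal subspace spanned by benign reference updates and flag clients whose out-of-subspace residual exceeds a nonparametric cutoff taken from the reference residuals. All dimensions, ranks, and attack strengths are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_clients, rank = 200, 30, 5

# Benign updates share a low-rank structure plus noise; three clients are adversarial.
basis = np.linalg.qr(rng.standard_normal((dim, rank)))[0]
coeffs = rng.standard_normal((n_clients, rank))
updates = coeffs @ basis.T + 0.05 * rng.standard_normal((n_clients, dim))
adversaries = [4, 11, 27]
updates[adversaries] += 0.8 * rng.standard_normal((len(adversaries), dim))  # poisoned directions

# Estimate the benign low-rank subspace from a reference window of past updates
# (simulated here) and score each client by its out-of-subspace residual energy.
reference = rng.standard_normal((100, rank)) @ basis.T + 0.05 * rng.standard_normal((100, dim))
_, _, vt = np.linalg.svd(reference, full_matrices=False)
P = vt[:rank].T @ vt[:rank]                           # projector onto the benign subspace
residuals = np.linalg.norm(updates - updates @ P, axis=1)

# Nonparametric flag: residuals larger than anything seen in the benign reference.
ref_resid = np.linalg.norm(reference - reference @ P, axis=1)
flagged = np.flatnonzero(residuals > ref_resid.max())
print("flagged clients:", flagged.tolist(), " true adversaries:", adversaries)
```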

Nowcasting day-ahead marginal emissions using multi-headed CNNs and deep generative models

  • paper_url: http://arxiv.org/abs/2310.01524
  • repo_url: None
  • paper_authors: Dhruv Suri, Anela Arifi, Ines Azevedo
  • for: 预测当天纳入系统的碳排放因素,以便在高灵活性和分布式能源资源的能源系统中更好地管理能源。
  • methods: 使用多头 convolutional neural networks(CNN)生成当天碳排放预测,以便更好地理解一个独立系统运营商的投入决策对碳排放的影响。
  • results: 通过使用多头 CNN 生成当天碳排放预测,可以更好地理解一个独立系统运营商的投入决策对碳排放的影响,从而更好地管理能源系统。
    Abstract Nowcasting day-ahead marginal emissions factors is increasingly important for power systems with high flexibility and penetration of distributed energy resources. With a significant share of firm generation from natural gas and coal power plants, forecasting day-ahead emissions in the current energy system has been widely studied. In contrast, as we shift to an energy system characterized by flexible power markets, dispatchable sources, and competing low-cost generation such as large-scale battery or hydrogen storage, system operators will be able to choose from a mix of different generation as well as emission pathways. To fully develop the emissions implications of a given dispatch schedule, we need a near real-time workflow with two layers. The first layer is a market model that continuously solves a security-constrained economic dispatch model. The second layer determines the marginal emissions based on the output of the market model, which is the subject of this paper. We propose using multi-headed convolutional neural networks to generate day-ahead forecasts of marginal and average emissions for a given independent system operator.
    摘要 现在casting日前边额排放因子是现代化能源系统中增加的重要问题,特别是在高灵活性和分布式能源资源的普及下。现有的研究主要关注天然气和煤矿发电厂的固定产量,预测当前能源系统的日前排放。然而,随着我们转向一个具有灵活电力市场、投放可靠发电源和低成本生产如大规模电池或氢存储的能源系统,系统运营商将有多种不同的发电和排放路径可供选择。为了充分发挥排放的影响,我们需要一个实时工作流程,包括两层。第一层是一个安全保证的经济调度模型,第二层确定基于第一层模型的输出的边额排放。我们提议使用多头 convolutional neural networks(CNN)生成日前预测边额和平均排放的方法,这是本文的研究对象。

The Benefit of Noise-Injection for Dynamic Gray-Box Model Creation

  • paper_url: http://arxiv.org/abs/2310.01517
  • repo_url: None
  • paper_authors: Mohamed Kandil, J. J. McArthur
  • for: This paper aims to improve the performance of gray-box models for equipment emulator development by addressing uncertainties in the model creation process.
  • methods: The paper proposes injecting noise into the training dataset to enrich the data and provide a measure of robustness against uncertainties.
  • results: The approach was tested on a water-to-water heat exchanger using real devices with live data streaming, resulting in a significant reduction in modeling error (root mean square error) compared to the unprocessed signal data. The improvement amounted to 60% on the training set, and 50% and 45% on the test and validation sets, respectively.
    Abstract Gray-box models offer significant benefit over black-box approaches for equipment emulator development for equipment since their integration of physics provides more confidence in the model outside of the training domain. However, challenges such as model nonlinearity, unmodeled dynamics, and local minima introduce uncertainties into grey-box creation that contemporary approaches have failed to overcome, leading to their under-performance compared with black-box models. This paper seeks to address these uncertainties by injecting noise into the training dataset. This noise injection enriches the dataset and provides a measure of robustness against such uncertainties. A dynamic model for a water-to-water heat exchanger has been used as a demonstration case for this approach and tested using a pair of real devices with live data streaming. Compared to the unprocessed signal data, the application of noise injection resulted in a significant reduction in modeling error (root mean square error), decreasing from 0.68 to 0.27{\deg}C. This improvement amounts to a 60% enhancement when assessed on the training set, and improvements of 50% and 45% when validated against the test and validation sets, respectively.
    摘要 灰色模型对设备模拟器开发具有显著的优势,因为它们 integrates 物理学提供了更多的信任度在训练领域之外。然而,模型不线性、不确定性和地方极值引入了不确定性,使得现代方法无法超越这些不确定性,导致其表现相对落后于黑色模型。本文提出了将噪声掺入训练集的方法,以增强数据集的质量和模型对不确定性的Robustness。我们使用了一个水到水热交换器的动态模型作为示例,并使用了两个真实的设备进行实际测试。与未处理的信号数据相比,噪声掺入导致模型错误减少了从0.68到0.27℃,即60%的提高。在训练集上评估时,改进了50%,在验证集和验证集上分别提高了45%。
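
A toy version of the training trick: fit the same regression model on clean inputs and on noise-injected copies of them, then compare errors when the test inputs carry an unmodeled disturbance, where the noise-augmented model typically degrades more gracefully. The synthetic "heat-exchanger" response, noise levels, and ridge model are illustrative, not the paper's gray-box setup.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def outlet_temp(inlet_temp, flow):
    """Synthetic stand-in for a heat-exchanger response (not a physical model)."""
    return 0.6 * inlet_temp + 4.0 / (1.0 + flow) + 20.0

n = 2000
X = np.column_stack([rng.uniform(30, 60, n), rng.uniform(0.5, 3.0, n)])
y = outlet_temp(X[:, 0], X[:, 1]) + 0.1 * rng.standard_normal(n)
X_train, y_train = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]
# Test-time inputs carry an unmodeled sensor disturbance, as live data streams would.
X_test_noisy = X_test + rng.normal(0, 1.0, X_test.shape)

def poly_features(X):
    return np.column_stack([X, X**2, X[:, :1] * X[:, 1:]])

# Baseline: fit on the clean signal only.
clean_model = Ridge().fit(poly_features(X_train), y_train)

# Noise injection: augment the training set with jittered copies of the inputs.
X_aug = np.vstack([X_train] + [X_train + rng.normal(0, 1.0, X_train.shape) for _ in range(5)])
y_aug = np.tile(y_train, 6)
noisy_model = Ridge().fit(poly_features(X_aug), y_aug)

for name, model in [("clean-trained", clean_model), ("noise-injected", noisy_model)]:
    rmse = mean_squared_error(y_test, model.predict(poly_features(X_test_noisy))) ** 0.5
    print(f"{name:>15}: test RMSE {rmse:.3f} degC")
```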

Tensor Ring Optimized Quantum-Enhanced Tensor Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01515
  • repo_url: https://github.com/konar1987/tr-qnet
  • paper_authors: Debanjan Konar, Dheeraj Peddireddy, Vaneet Aggarwal, Bijaya K. Panigrahi
  • for: The paper is written for researchers in the field of quantum machine learning, specifically those interested in incorporating tensor networks into deep neural networks and variational optimization.
  • methods: The paper proposes a multi-layer design of a Tensor Ring optimized variational Quantum learning classifier (Quan-TR), which consists of cascading entangling gates replacing the fully connected layers of a tensor network. The parameters of the TR-QNet are optimized through a stochastic gradient descent algorithm on qubit measurements.
  • results: The proposed TR-QNet achieves promising accuracy on three distinct datasets, namely Iris, MNIST, and CIFAR-10, with accuracies of 94.5%, 86.16%, and 83.54%, respectively, on quantum simulations. The paper also conducts benchmark studies on state-of-the-art quantum and classical implementations of tensor network models to demonstrate the efficacy of the proposed TR-QNet. Additionally, the scalability of TR-QNet highlights its potential for deep learning applications on a large scale.
    Abstract Quantum machine learning researchers often rely on incorporating Tensor Networks (TN) into Deep Neural Networks (DNN) and variational optimization. However, the standard optimization techniques used for training the contracted trainable weights of each model layer suffer from the correlations and entanglement structure between the model parameters on classical implementations. To address this issue, a multi-layer design of a Tensor Ring optimized variational Quantum learning classifier (Quan-TR) comprising cascading entangling gates replacing the fully connected (dense) layers of a TN is proposed, and it is referred to as Tensor Ring optimized Quantum-enhanced tensor neural Networks (TR-QNet). TR-QNet parameters are optimized through the stochastic gradient descent algorithm on qubit measurements. The proposed TR-QNet is assessed on three distinct datasets, namely Iris, MNIST, and CIFAR-10, to demonstrate the enhanced precision achieved for binary classification. On quantum simulations, the proposed TR-QNet achieves promising accuracy of $94.5\%$, $86.16\%$, and $83.54\%$ on the Iris, MNIST, and CIFAR-10 datasets, respectively. Benchmark studies have been conducted on state-of-the-art quantum and classical implementations of TN models to show the efficacy of the proposed TR-QNet. Moreover, the scalability of TR-QNet highlights its potential for exhibiting in deep learning applications on a large scale. The PyTorch implementation of TR-QNet is available on Github:https://github.com/konar1987/TR-QNet/
    摘要 量子机器学习研究者常常将张量网络（TN）与深度神经网络和变分优化结合起来。然而，在经典实现中，用于训练各层收缩可训练权重的标准优化技术会受到模型参数之间的相关性和纠缠结构的影响。为了解决这个问题，我们提出了一种多层设计的张量环优化变分量子学习分类器（Quan-TR），用级联纠缠门替换张量网络中的全连接（dense）层，该模型被称为张量环优化量子增强张量神经网络（TR-QNet）。TR-QNet的参数通过基于量子比特测量的随机梯度下降算法进行优化。我们在三个不同的数据集（Iris、MNIST和CIFAR-10）上进行了评估，展示了二分类精度的提升。在量子仿真中，TR-QNet分别取得了$94.5\%$、$86.16\%$和$83.54\%$的准确率。我们还与最新的量子和经典张量网络模型实现进行了基准比较，以显示TR-QNet的有效性。此外，TR-QNet的可扩展性表明其在大规模深度学习应用中的潜力。TR-QNet的PyTorch实现可以在GitHub上找到：https://github.com/konar1987/TR-QNet/。

CODA: Temporal Domain Generalization via Concept Drift Simulator

  • paper_url: http://arxiv.org/abs/2310.01508
  • repo_url: None
  • paper_authors: Chia-Yuan Chang, Yu-Neng Chuang, Zhimeng Jiang, Kwei-Herng Lai, Anxiao Jiang, Na Zou
  • for: 这个研究旨在解决机器学习模型在概念漂移(concept drift)中的问题,以提高模型在不同时间点的通用性。
  • methods: 研究使用了一个名为CODA(Concept Drift simulAtor)的框架,它利用预测的特征相互 correlations来生成未来数据,以便训练模型。
  • results: 实验结果显示,使用CODA-生成的数据作为训练输入可以有效地实现时间领域通用性,并且可以适用于不同的模型架构。
    Abstract In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as the "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized prediction model architectures. To this end, there is an urgent demand for a model-agnostic temporal domain generalization approach that maintains generality across diverse data modalities and architectures. In this work, we aim to address the concept drift problem from a data-centric perspective to bypass considering the interaction between data and model. Developing such a framework presents non-trivial challenges: (i) existing generative models struggle to generate out-of-distribution future data, and (ii) precisely capturing the temporal trends of joint distribution along chronological source domains is computationally infeasible. To tackle the challenges, we propose the COncept Drift simulAtor (CODA) framework incorporating a predicted feature correlation matrix to simulate future data for model training. Specifically, CODA leverages feature correlations to represent data characteristics at specific time points, thereby circumventing the daunting computational costs. Experimental results demonstrate that using CODA-generated data as training input effectively achieves temporal domain generalization across different model architectures.
    摘要 在实际应用中,机器学习模型经常因为 JOINT 分布的变化而变得过时,这种现象被称为 "概念漂移"。现有的工作提出了特定于模型的策略来实现时间总结。然而,实际数据的多样性需要特定的预测模型建 architecture。因此,有一项非常需要的是一种模型无关的时间域总结方法,可以在不同的数据模式和建 architecture 下保持一致性。在这项工作中,我们尝试通过数据中心的方式解决概念漂移问题,而不是考虑数据和模型之间的交互。开发这样的框架具有非常大的挑战:(i)现有的生成模型很难生成未经验数据,(ii)准确地捕捉 JOINT 分布中的时间趋势是计算不可能的。为了解决这些挑战,我们提出了 COncept Drift simulAtor(CODA)框架,该框架利用预测的特征相关矩阵来模拟未来数据,以便对模型进行训练。具体来说,CODA 利用特征相关来表示特定时间点的数据特征,从而绕过了计算不可能的问题。实验结果表明,使用 CODA-生成的数据作为训练输入可以实现时间域总结,并且可以在不同的模型建 architecture 下实现。
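
CODA's actual simulator is built around a predicted feature correlation matrix; the sketch below only illustrates the general idea of extrapolating second-order feature statistics across chronological source domains and sampling synthetic "future" training data from them. The names and the linear extrapolation are assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(10)

def predict_future_covariance(cov_history):
    """Linearly extrapolate each covariance entry across source time steps
    (a crude stand-in for predicting the feature-correlation structure)."""
    t = np.arange(len(cov_history))
    covs = np.stack(cov_history)                       # (T, d, d)
    slopes = np.polyfit(t, covs.reshape(len(t), -1), deg=1)[0]
    future = covs[-1].reshape(-1) + slopes             # one step ahead
    future = future.reshape(covs.shape[1:])
    future = (future + future.T) / 2                   # keep it symmetric
    # project onto the nearest positive semi-definite matrix
    vals, vecs = np.linalg.eigh(future)
    return (vecs * np.clip(vals, 1e-6, None)) @ vecs.T

# Covariances observed over three historical domains with a drifting correlation.
d = 4
cov_history = []
for rho in (0.1, 0.3, 0.5):
    cov_history.append(np.full((d, d), rho) + (1 - rho) * np.eye(d))

cov_future = predict_future_covariance(cov_history)
synthetic_future = rng.multivariate_normal(np.zeros(d), cov_future, size=1000)
print(synthetic_future.shape)   # data a model could be trained on for the next period
```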

A Learning Based Scheme for Fair Timeliness in Sparse Gossip Networks

  • paper_url: http://arxiv.org/abs/2310.01396
  • repo_url: None
  • paper_authors: Purbesh Mitra, Sennur Ulukus
  • for: 本研究旨在研究一个带有各种连接性的谣言网络,source更新信息采用波动过程,并且将信息传递给网络中的节点。由于网络结构不均衡,不同节点的实时性不同,因此需要研究如何对网络进行公平的时间分配,以最小化总体最差性能。
  • methods: 本研究使用连续搜索空间的枪戈投掷问题形式化了问题,并采用 Gaussian process基于 Bayesian 优化来实现探索和利用的权衡。
  • results: 研究发现,采用 Gaussian process基于 Bayesian 优化的方法可以在不同的网络结构下实现公平的时间分配,并且可以最小化总体最差性能。
    Abstract We consider a gossip network, consisting of $n$ nodes, which tracks the information at a source. The source updates its information with a Poisson arrival process and also sends updates to the nodes in the network. The nodes themselves can exchange information among themselves to become as timely as possible. However, the network structure is sparse and irregular, i.e., not every node is connected to every other node in the network, rather, the order of connectivity is low, and varies across different nodes. This asymmetry of the network implies that the nodes in the network do not perform equally in terms of timelines. Due to the gossiping nature of the network, some nodes are able to track the source very timely, whereas, some nodes fall behind versions quite often. In this work, we investigate how the rate-constrained source should distribute its update rate across the network to maintain fairness regarding timeliness, i.e., the overall worst case performance of the network can be minimized. Due to the continuous search space for optimum rate allocation, we formulate this problem as a continuum-armed bandit problem and employ Gaussian process based Bayesian optimization to meet a trade-off between exploration and exploitation sequentially.
    摘要 我们考虑一个嗅探网络,包含 $n$ 个节点,跟踪源信息的变化。源节点通过波动过程更新自己的信息,并将更新传递给网络中的其他节点。节点之间可以互相交换信息,以使自己的时间线最为整拢。然而,网络结构稀疏和不规则,即不是所有节点与所有其他节点相连,而是每个节点与其他节点之间的连接关系较弱,因此不同节点在网络中的性能不同。由于嗅探网络的自我感知特性,一些节点可以很快地跟踪源信息,而其他节点则经常落后版本。在这个工作中,我们研究如何Constrained source应该在网络中分配更新率,以保持公平性,即最大化网络总体最差性能。由于搜索空间是连续的,我们将这个问题转化为连续武器问题,并使用 Gaussian process 基于 Bayesian 优化来实现搜索和利用的权衡。
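
A hedged sketch of the optimization machinery described above: Gaussian-process-based Bayesian optimization (here a simple lower-confidence-bound rule with scikit-learn) over a one-dimensional rate-allocation parameter. The `worst_case_age` objective is a stand-in for the actual gossip-network simulation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(1)

def worst_case_age(alpha):
    """Hypothetical stand-in for a gossip-network simulation: returns the worst
    node's average age when a fraction `alpha` of the source update rate is
    directed to the poorly connected part of the network."""
    return (alpha - 0.3) ** 2 + 0.05 * rng.normal()

# Continuum-armed bandit over alpha in [0, 1] via GP-based Bayesian optimization.
X, y = [], []
candidates = np.linspace(0.0, 1.0, 201).reshape(-1, 1)
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3, normalize_y=True)

for t in range(25):
    if t < 5:                        # initial random exploration
        x_next = rng.uniform(0.0, 1.0)
    else:                            # lower-confidence-bound acquisition (minimization)
        gp.fit(np.array(X).reshape(-1, 1), np.array(y))
        mu, sigma = gp.predict(candidates, return_std=True)
        x_next = float(candidates[np.argmin(mu - 2.0 * sigma)])
    X.append(x_next)
    y.append(worst_case_age(x_next))

print("best allocation found:", X[int(np.argmin(y))])
```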

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.01380
  • repo_url: None
  • paper_authors: Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu
  • for: 本研究旨在提出一种 oracle-efficient 算法,用于 offline 强化学习(RL)中的非线性函数approximation。
  • methods: 我们的算法采用了三个创新的Component:(1) 一种基于差异的重 regression scheme,可以应用于各种函数类型; (2) 一种用于幂度估计的 subroutine; (3) 一种计划阶段使用的 pessimistic value iteration 方法。
  • results: 我们的算法可以 garantuetotal achieve minimax 优化的实例特性 regret,并且在特定的函数类型下,其 regret bound 具有紧张的函数类型复杂度的关系。
    Abstract Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based on the data collected by a behavior policy, has attracted increasing attention in recent years. While offline RL with linear function approximation has been extensively studied with optimal results achieved under certain assumptions, many works shift their interest to offline RL with non-linear function approximation. However, limited works on offline RL with non-linear function approximation have instance-dependent regret guarantees. In this paper, we propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Square Value Iteration (PNLSVI), for offline RL with non-linear function approximation. Our algorithmic design comprises three innovative components: (1) a variance-based weighted regression scheme that can be applied to a wide range of function classes, (2) a subroutine for variance estimation, and (3) a planning phase that utilizes a pessimistic value iteration approach. Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation. Our work extends the previous instance-dependent results within simpler function classes, such as linear and differentiable function to a more general framework.
    摘要 养成机器人学习(RL)在线上进行学习,以便学习最佳策略基于行为策略收集的数据。Recent years have seen increasing attention paid to offline RL with linear function approximation. However, many works have shifted their focus to offline RL with non-linear function approximation. Although there have been limited works on offline RL with non-linear function approximation that provide instance-dependent regret guarantees. In this paper, we propose an oracle-efficient algorithm, called Pessimistic Nonlinear Least-Square Value Iteration (PNLSVI), for offline RL with non-linear function approximation. Our algorithm design includes three innovative components:1. 一种基于方差的重量回归方案,可以应用于各种函数类型2. 一种归一化误差估计的子routine3. 一个使用悲观值迭代方法的规划阶段我们的算法拥有一个具有函数类型复杂度的 regret bound,并在特殊化为线性函数approximation时实现最佳最小化例外 regret。我们的工作扩展了之前只适用于更简单的函数类型,如线性和导数函数的结果,到一个更通用的框架。
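
The first component listed above is a variance-based weighted regression; the snippet below shows only the generic weighted (ridge) least-squares computation such a scheme builds on, with synthetic heteroscedastic data. It is not the PNLSVI algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic regression targets with heteroscedastic (per-sample) noise.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
noise_var = 0.1 + rng.uniform(0.0, 2.0, size=n)      # per-sample variance
y = X @ w_true + rng.normal(0.0, np.sqrt(noise_var))

# Variance-weighted least squares: down-weight targets whose (estimated)
# variance is large, as in weighted ridge regression.
weights = 1.0 / noise_var
lam = 1e-3                                           # ridge regularization
W = np.diag(weights)
w_hat = np.linalg.solve(X.T @ W @ X + lam * np.eye(d), X.T @ W @ y)

print("estimation error:", np.linalg.norm(w_hat - w_true))
```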

Window-based Model Averaging Improves Generalization in Heterogeneous Federated Learning

  • paper_url: http://arxiv.org/abs/2310.01366
  • repo_url: None
  • paper_authors: Debora Caldarola, Barbara Caputo, Marco Ciccone
  • for: 提高 Federated Learning(FL)中数据分布不均的问题,保护用户隐私。
  • methods: 提出了窗口基于的模型均值(WIMA)方法,通过融合不同回合的全球模型,有效地捕捉多个用户的知识,降低最后见 Client 数据偏见。
  • results: 在不同的分布Shift和坏 Client 采样情况下,WIMA 能够提供更平滑、稳定的学习趋势,同时不增加客户端计算或通信开销。
    Abstract Federated Learning (FL) aims to learn a global model from distributed users while protecting their privacy. However, when data are distributed heterogeneously the learning process becomes noisy, unstable, and biased towards the last seen clients' data, slowing down convergence. To address these issues and improve the robustness and generalization capabilities of the global model, we propose WIMA (Window-based Model Averaging). WIMA aggregates global models from different rounds using a window-based approach, effectively capturing knowledge from multiple users and reducing the bias from the last ones. By adopting a windowed view on the rounds, WIMA can be applied from the initial stages of training. Importantly, our method introduces no additional communication or client-side computation overhead. Our experiments demonstrate the robustness of WIMA against distribution shifts and bad client sampling, resulting in smoother and more stable learning trends. Additionally, WIMA can be easily integrated with state-of-the-art algorithms. We extensively evaluate our approach on standard FL benchmarks, demonstrating its effectiveness.
    摘要 联邦学习（FL）旨在从分布式用户处学习一个全局模型，同时保护用户隐私。然而，当数据分布不均时，学习过程会变得嘈杂、不稳定，并偏向最近参与的客户端数据，从而减慢收敛。为了解决这些问题并提升全局模型的鲁棒性和泛化能力，我们提出了WIMA（基于窗口的模型平均）。WIMA以窗口的方式聚合来自不同轮次的全局模型，有效地捕捉多个用户的知识并降低对最后参与客户端的偏向。得益于对训练轮次的窗口视角，WIMA可以从训练初期就开始应用。重要的是，该方法不引入额外的通信或客户端计算开销。实验表明，WIMA对分布偏移和不良客户端采样具有鲁棒性，带来更平滑、更稳定的学习趋势，并且可以轻松地与最新的算法结合。我们在标准FL基准上进行了广泛评估，验证了其有效性。
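
A minimal sketch of window-based averaging of global models, assuming models are plain dictionaries of arrays and that `server_round` stands in for a FedAvg aggregation step (both are assumptions, not the paper's code):

```python
from collections import deque
import numpy as np

def window_average(history):
    """Average a window of global models (dicts of parameter arrays)."""
    keys = history[0].keys()
    return {k: np.mean([m[k] for m in history], axis=0) for k in keys}

def server_round(global_model, rng):
    # Placeholder for client updates + FedAvg aggregation; here just a noisy drift.
    return {k: v + rng.normal(0.0, 0.1, v.shape) for k, v in global_model.items()}

rng = np.random.default_rng(3)
model = {"w": np.zeros((4, 4)), "b": np.zeros(4)}
window = deque(maxlen=5)                       # window size is a hyperparameter

for rnd in range(20):
    model = server_round(model, rng)
    window.append({k: v.copy() for k, v in model.items()})
    smoothed = window_average(list(window))    # model actually deployed/evaluated

print({k: v.shape for k, v in smoothed.items()})
```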

Fleet Policy Learning via Weight Merging and An Application to Robotic Tool-Use

  • paper_url: http://arxiv.org/abs/2310.01362
  • repo_url: https://github.com/liruiw/fleet-tools
  • paper_authors: Lirui Wang, Kaiqing Zhang, Allan Zhou, Max Simchowitz, Russ Tedrake
  • for: 这篇论文的目的是探讨分布式学习可以如何实现群体级别的机器人学习,而不需要传输或中央化群体级别数据。
  • methods: 该论文提出了一种分布式学习策略,称为“队伍合并”(fleet-merge),可以有效地将多个策略 Parameterized by recurrent neural networks (RNNs) 集成在分布式环境中。
  • results: 研究人员在Meta-World环境中训练了50个任务,并通过队伍合并策略将其们的策略集成起来,得到了良好的性能。此外,他们还提出了一个新的机器人工具使用标准,称为“队伍工具”(fleet-tools),可以用于评估群体级别的机器人学习在复杂和有接触的机器人手 manipulate 任务中的性能。
    Abstract Fleets of robots ingest massive amounts of streaming data generated by interacting with their environments, far more than those that can be stored or transmitted with ease. At the same time, we hope that teams of robots can co-acquire diverse skills through their experiences in varied settings. How can we enable such fleet-level learning without having to transmit or centralize fleet-scale data? In this paper, we investigate distributed learning of policies as a potential solution. To efficiently merge policies in the distributed setting, we propose fleet-merge, an instantiation of distributed learning that accounts for the symmetries that can arise in learning policies that are parameterized by recurrent neural networks. We show that fleet-merge consolidates the behavior of policies trained on 50 tasks in the Meta-World environment, with the merged policy achieving good performance on nearly all training tasks at test time. Moreover, we introduce a novel robotic tool-use benchmark, fleet-tools, for fleet policy learning in compositional and contact-rich robot manipulation tasks, which might be of broader interest, and validate the efficacy of fleet-merge on the benchmark.
    摘要 大量的机器人队伍通过与环境互动生成大量流动数据,远远超出了可以存储或传输的范围。同时,我们希望机器人队伍可以通过不同的场景经验共同获得多样化的技能。如何实现这种队伍级学习而无需传输或中央化队伍级数据?在这篇论文中,我们调查分布式学习策略为可能的解决方案。为了有效地融合分布式环境中的策略,我们提出了“队伍融合”(fleet-merge),这是基于循环神经网络参数化策略的分布式学习实现,考虑到分布式环境中策略学习时可能出现的对称性。我们显示,队伍融合可以有效地将50个任务的策略在Meta-World环境中的行为协调,并在测试时对大多数训练任务表现良好。此外,我们还介绍了一个新的机器人工具使用指标,称为“队伍工具”(fleet-tools),用于评估机器人队伍在复杂的机器人拼接和接触rich任务中的策略学习能力,这可能对更广泛的领域有所启发。我们 Validate the effectiveness of fleet-merge on the benchmark.
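
fleet-merge itself targets recurrent policies and their symmetries; the sketch below only illustrates the underlying idea of aligning hidden-unit permutations (via a linear assignment on weight similarity) before averaging, for a single weight matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_hidden_units(W_ref, W_other):
    """Find the hidden-unit permutation of `W_other` (rows = hidden units)
    that best matches `W_ref`, and return the permuted copy."""
    similarity = W_ref @ W_other.T                    # (h, h) unit-to-unit similarity
    row, col = linear_sum_assignment(-similarity)     # maximize total similarity
    perm = np.empty_like(col)
    perm[row] = col
    return W_other[perm]

def merge_policies(weight_list):
    """Average single-layer weight matrices after aligning each to the first."""
    ref = weight_list[0]
    aligned = [ref] + [align_hidden_units(ref, W) for W in weight_list[1:]]
    return np.mean(aligned, axis=0)

rng = np.random.default_rng(4)
base = rng.normal(size=(16, 8))                       # 16 hidden units, 8 inputs
# Two "policies" that are identical up to a hidden-unit permutation plus noise.
policies = [base,
            base[rng.permutation(16)] + 0.01 * rng.normal(size=base.shape)]
merged = merge_policies(policies)
print("distance to base after alignment:", np.linalg.norm(merged - base))
```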

A peridynamic-informed deep learning model for brittle damage prediction

  • paper_url: http://arxiv.org/abs/2310.01350
  • repo_url: None
  • paper_authors: Roozbeh Eghbalpoor, Azadeh Sheidaei
  • for: 预测脆性材料中的准静态损伤与裂纹扩展
  • methods: 将近场动力学（peridynamic, PD）理论与物理信息神经网络（PINN）方法相结合
  • results: 能够准确预测脆性材料中的损伤与裂纹扩展，且计算效率高。以下为各要点的更详细英文说明：
  • for: The paper is written to predict the quasi-static damage and crack propagation in brittle materials using a novel approach that combines the principles of peridynamic theory with PINN.
  • methods: The proposed approach uses the linearized PD governing equation to enforce the PD principles in the PINN’s residual-based loss function, allowing the model to learn and capture intricate displacement patterns associated with different geometrical parameters. The paper also proposes several enhancements, such as cyclical annealing schedule and deformation gradient aware optimization technique, to ensure the model’s convergence and accuracy.
  • results: The paper’s results show that the proposed PD-INN approach can accurately predict damage and crack propagation in brittle materials, and it is more efficient than traditional methods such as PD direct numerical method and Extended-Finite Element Method. The paper provides several benchmark cases to validate the accuracy of the proposed approach.
    Abstract In this study, a novel approach that combines the principles of peridynamic (PD) theory with PINN is presented to predict quasi-static damage and crack propagation in brittle materials. To achieve high prediction accuracy and convergence rate, the linearized PD governing equation is enforced in the PINN's residual-based loss function. The proposed PD-INN is able to learn and capture intricate displacement patterns associated with different geometrical parameters, such as pre-crack position and length. Several enhancements like cyclical annealing schedule and deformation gradient aware optimization technique are proposed to ensure the model would not get stuck in its trivial solution. The model's performance assessment is conducted by monitoring the behavior of loss function throughout the training process. The PD-INN predictions are also validated through several benchmark cases with the results obtained from high-fidelity techniques such as PD direct numerical method and Extended-Finite Element Method. Our results show the ability of the nonlocal PD-INN to predict damage and crack propagation accurately and efficiently.
    摘要 本研究提出了一种将近场动力学（PD）理论与PINN相结合的新方法，用于预测脆性材料中的准静态损伤和裂纹扩展。为了获得较高的预测精度和收敛速度，我们将线性化的PD控制方程施加于PINN的基于残差的损失函数中。所提出的PD-INN能够学习并捕捉与不同几何参数（如预制裂纹的位置和长度）相关的复杂位移模式。我们还提出了循环退火调度和感知变形梯度的优化技术等改进，以确保模型不会陷入平凡解。我们通过在训练过程中监测损失函数的行为来评估模型性能，并将PD-INN的预测与PD直接数值方法和扩展有限元方法等高保真技术的结果进行了对比验证。结果表明，非局部的PD-INN能够准确且高效地预测损伤和裂纹扩展。
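
One of the enhancements mentioned above is a cyclical annealing schedule for a loss weighting; a small, generic version (the cycle count and ramp ratio below are assumptions) could look like:

```python
def cyclical_annealing(step, total_steps, n_cycles=4, ratio=0.5):
    """Cyclical annealing weight in [0, 1]: within each cycle the weight ramps
    up linearly for the first `ratio` of the cycle, then stays at 1."""
    period = total_steps / n_cycles
    phase = (step % period) / period
    return min(phase / ratio, 1.0)

# Example: weight applied to one loss term during PINN-style training.
total = 10_000
for step in (0, 500, 1250, 2400, 2500, 4999):
    w = cyclical_annealing(step, total)
    # loss = data_loss + w * residual_loss   (hypothetical usage)
    print(step, round(w, 3))
```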

The Optimal use of Segmentation for Sampling Calorimeters

  • paper_url: http://arxiv.org/abs/2310.04442
  • repo_url: https://github.com/eiccodesign/regressiononly
  • paper_authors: Fernando Torales Acosta, Bishnu Karki, Piyush Karande, Aaron Angerami, Miguel Arratia, Kenneth Barish, Ryan Milton, Sebastián Morán, Benjamin Nachman, Anshuman Sinha
  • for: 这个论文是为了研究探测器的能量重建方法。
  • methods: 该论文使用深度神经网络来表示探测器,并利用所有可用信息来进行能量重建。
  • results: 研究发现,在隔离带电离袋中, relativelly细的长itudinal分割是重建能量的关键。这些结果可以作为未来EIC探测器优化的标准,以及其他实验室中高分辨率探测器的研究。
    Abstract One of the key design choices of any sampling calorimeter is how fine to make the longitudinal and transverse segmentation. To inform this choice, we study the impact of calorimeter segmentation on energy reconstruction. To ensure that the trends are due entirely to hardware and not to a sub-optimal use of segmentation, we deploy deep neural networks to perform the reconstruction. These networks make use of all available information by representing the calorimeter as a point cloud. To demonstrate our approach, we simulate a detector similar to the forward calorimeter system intended for use in the ePIC detector, which will operate at the upcoming Electron Ion Collider. We find that for the energy estimation of isolated charged pion showers, relatively fine longitudinal segmentation is key to achieving an energy resolution that is better than 10% across the full phase space. These results provide a valuable benchmark for ongoing EIC detector optimizations and may also inform future studies involving high-granularity calorimeters in other experiments at various facilities.
    摘要 一个重要的设计选择 для任何采样加热计是如何细化 longitudinal 和 transverse 分 segmentation。为了决定这个选择,我们研究采用加热计分 segmentation 对能量重建的影响。为确保这些趋势是固有的硬件效应而不是不当使用分 segmentation,我们使用深度神经网络进行重建。这些网络利用所有可用信息,将加热计表示为点云。为了证明我们的方法,我们模拟了类似于前向加热计系统,这将在未来的 Electron Ion Collider 中使用。我们发现,对孤立 charged pion 散射的能量估计,相对细化 longitudinal 分 segmentation 是达到更好于 10% 的全频范围能量分辨率的关键。这些结果提供了价值的参考点 для进行中的 EIC 仪器优化,也可能会影响未来在其他实验室中的高精度加热计研究。

Optimal Estimator for Linear Regression with Shuffled Labels

  • paper_url: http://arxiv.org/abs/2310.01326
  • repo_url: None
  • paper_authors: Hang Zhang, Ping Li
  • for: Linear regression with shuffled labels, specifically reconstructing the permutation matrix and signal of interest from the sensing results.
  • methods: One-step estimator with a computational complexity of $O(n^3 + np^2m)$, which is comparable to the maximum complexity of linear assignment and least square algorithms.
  • results: Sufficient conditions for correct permutation recovery under different regimes of signal-to-noise ratio (SNR), including an easy regime, a medium regime, and a hard regime. Numerical experiments confirm the theoretical claims.
    Abstract This paper considers the task of linear regression with shuffled labels, i.e., $\mathbf Y = \mathbf \Pi \mathbf X \mathbf B + \mathbf W$, where $\mathbf Y \in \mathbb R^{n\times m}, \mathbf Pi \in \mathbb R^{n\times n}, \mathbf X\in \mathbb R^{n\times p}, \mathbf B \in \mathbb R^{p\times m}$, and $\mathbf W\in \mathbb R^{n\times m}$, respectively, represent the sensing results, (unknown or missing) corresponding information, sensing matrix, signal of interest, and additive sensing noise. Given the observation $\mathbf Y$ and sensing matrix $\mathbf X$, we propose a one-step estimator to reconstruct $(\mathbf \Pi, \mathbf B)$. From the computational perspective, our estimator's complexity is $O(n^3 + np^2m)$, which is no greater than the maximum complexity of a linear assignment algorithm (e.g., $O(n^3)$) and a least square algorithm (e.g., $O(np^2 m)$). From the statistical perspective, we divide the minimum $snr$ requirement into four regimes, e.g., unknown, hard, medium, and easy regimes; and present sufficient conditions for the correct permutation recovery under each regime: $(i)$ $snr \geq \Omega(1)$ in the easy regime; $(ii)$ $snr \geq \Omega(\log n)$ in the medium regime; and $(iii)$ $snr \geq \Omega((\log n)^{c_0}\cdot n^{c_1}/{srank(\mathbf B)})$ in the hard regime ($c_0, c_1$ are some positive constants and $srank(\mathbf B)$ denotes the stable rank of $\mathbf B$). In the end, we also provide numerical experiments to confirm the above claims.
    摘要 这篇论文考虑了线性回归问题,即 $\mathbf{Y = \Pi XB + W}$, 其中 $\mathbf{Y} \in \mathbb{R}^{n \times m}, \mathbf{\Pi} \in \mathbb{R}^{n \times n}, \mathbf{X} \in \mathbb{R}^{n \times p}, \mathbf{B} \in \mathbb{R}^{p \times m}$, 和 $\mathbf{W} \in \mathbb{R}^{n \times m}$ 分别表示探测结果、对应信息、探测矩阵、信号 OF interest 和随机探测噪音。给定观测值 $\mathbf{Y}$ 和探测矩阵 $\mathbf{X}$,我们提议一步估计器来重建 $(\mathbf{\Pi}, \mathbf{B})$。从计算角度来看,我们的估计器的复杂度为 $O(n^3 + np^2m)$,不超过最大的线性分配算法的复杂度(例如 $O(n^3)$)和最小二乘算法的复杂度(例如 $O(np^2m)$)。从统计角度来看,我们将最小 $snr$ 要求分为四个 режиmes,即未知 режиme、困难 режиme、中等 режиme 和容易 режиme,并给出了各 режиme 下correct permutation recovery的 suficient conditions: $(i)$ $snr \geq \Omega(1)$ 在容易 режиme; $(ii)$ $snr \geq \Omega(\log n)$ 在中等 режиme; 和 $(iii)$ $snr \geq \Omega((log n)^{c_0} \cdot n^{c_1}/{srank(\mathbf{B})})$ 在困难 режиme($c_0, c_1$ 是一些正数, $srank(\mathbf{B})$ 表示 $\mathbf{B}$ 的稳定秩)。 finally, we also provide numerical experiments to confirm the above claims.
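
The paper's contribution is a one-step estimator with specific guarantees; the snippet below instead shows a simpler alternating baseline in the same setting: least squares for B given a permutation estimate, and a linear assignment for the permutation given B, on a partially shuffled synthetic instance:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(5)
n, p, m = 100, 5, 3
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, m))

# Partially shuffled labels: 30 of the 100 rows are permuted among themselves.
perm_true = np.arange(n)
idx = rng.choice(n, size=30, replace=False)
perm_true[idx] = idx[rng.permutation(30)]
Y = X[perm_true] @ B + 0.01 * rng.normal(size=(n, m))    # Y = Pi X B + W

perm = np.arange(n)                                      # start from the identity
for _ in range(5):
    # Least squares for B given the current permutation estimate.
    B_hat, *_ = np.linalg.lstsq(X[perm], Y, rcond=None)
    # Linear assignment: match each row of Y to its closest row of X @ B_hat.
    cost = cdist(Y, X @ B_hat, metric="sqeuclidean")
    _, cols = linear_sum_assignment(cost)
    perm = cols

print("fraction of rows matched correctly:", np.mean(perm == perm_true))
```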

Coupling public and private gradient provably helps optimization

  • paper_url: http://arxiv.org/abs/2310.01304
  • repo_url: None
  • paper_authors: Ruixuan Liu, Zhiqi Bu, Yu-xiang Wang, Sheng Zha, George Karypis
  • for: 提高大神经网络的成功率,通过结合私人和公共数据进行优化。
  • methods: 使用权重Linear Combination将私人和公共数据的梯度相互 Coupling,并在 convex 设定下解析出 оптималь solution。
  • results: 通过实验证明,在语言和视觉benchmark上, gradient Coupling可以加速非对称损失的收敛,并且Hyperparameter如隐私预算、迭代次数、批处理大小和模型大小对 choosing 优化的权重有影响。
    Abstract The success of large neural networks is crucially determined by the availability of data. It has been observed that training only on a small amount of public data, or privately on the abundant private data can lead to undesirable degradation of accuracy. In this work, we leverage both private and public data to improve the optimization, by coupling their gradients via a weighted linear combination. We formulate an optimal solution for the optimal weight in the convex setting to indicate that the weighting coefficient should be hyperparameter-dependent. Then, we prove the acceleration in the convergence of non-convex loss and the effects of hyper-parameters such as privacy budget, number of iterations, batch size, and model size on the choice of the weighting coefficient. We support our analysis with empirical experiments across language and vision benchmarks, and provide a guideline for choosing the optimal weight of the gradient coupling.
    摘要 大型神经网络的成功在很大程度上取决于数据的可用性。已有观察表明，仅在少量公共数据上训练，或仅在大量私有数据上私密训练，都可能导致准确度出现不理想的下降。在本工作中，我们通过将私有数据与公共数据的梯度以加权线性组合的方式耦合，同时利用两类数据来改进优化。我们在凸设置下给出了最优权重的解析解，表明权重系数应依赖于超参数。随后，我们证明了该方法对非凸损失收敛的加速作用，以及隐私预算、迭代次数、批大小和模型规模等超参数对权重选择的影响。我们通过语言和视觉基准上的实验支持了上述分析，并给出了选择最佳梯度耦合权重的指南。
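
A rough sketch of the weighted linear combination of a privatized gradient and a public gradient (the DP-SGD-style clipping and noising, and the fixed weight `w`, are illustrative assumptions; the paper derives how the optimal weight depends on the hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(6)

def dp_gradient(per_example_grads, clip_norm=1.0, noise_mult=1.0, rng=rng):
    """Clip per-example gradients and add Gaussian noise (DP-SGD style)."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    mean = clipped.mean(axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=mean.shape)
    return mean + noise

# Toy per-example gradients from a "private" batch and a "public" batch.
d = 10
g_private = dp_gradient(rng.normal(size=(64, d)))
g_public = rng.normal(size=(32, d)).mean(axis=0)         # no privacy needed here

w = 0.7   # weighting coefficient; should depend on privacy budget, iterations,
          # batch size and model size according to the paper's analysis
g_coupled = w * g_private + (1.0 - w) * g_public

# params -= learning_rate * g_coupled   (inside the usual SGD update)
print(g_coupled[:3])
```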

Automated regime detection in multidimensional time series data using sliced Wasserstein k-means clustering

  • paper_url: http://arxiv.org/abs/2310.01285
  • repo_url: None
  • paper_authors: Qinmeng Luan, James Hamp
  • for: 本研究使用 Wasserstein k-means clustering 方法来标识时间序列数据中的不同频率模式。
  • methods: 本研究首先对一维时间序列数据应用 Wasserstein k-means clustering 算法,并研究了不同初始化的影响。然后,对多维时间序列数据,我们使用 slice Wasserstein k-means clustering 方法(sWk-means),并用合成数据示出了该方法的有效性。
  • results: 本研究使用实际的外汇spot价数据进行了一个案例研究,并证明了 sWk-means 方法的有效性。研究还发现了一些限制,并提出了可能的补充或替代方法。
    Abstract Recent work has proposed Wasserstein k-means (Wk-means) clustering as a powerful method to identify regimes in time series data, and one-dimensional asset returns in particular. In this paper, we begin by studying in detail the behaviour of the Wasserstein k-means clustering algorithm applied to synthetic one-dimensional time series data. We study the dynamics of the algorithm and investigate how varying different hyperparameters impacts the performance of the clustering algorithm for different random initialisations. We compute simple metrics that we find are useful in identifying high-quality clusterings. Then, we extend the technique of Wasserstein k-means clustering to multidimensional time series data by approximating the multidimensional Wasserstein distance as a sliced Wasserstein distance, resulting in a method we call `sliced Wasserstein k-means (sWk-means) clustering'. We apply the sWk-means clustering method to the problem of automated regime detection in multidimensional time series data, using synthetic data to demonstrate the validity of the approach. Finally, we show that the sWk-means method is effective in identifying distinct market regimes in real multidimensional financial time series, using publicly available foreign exchange spot rate data as a case study. We conclude with remarks about some limitations of our approach and potential complementary or alternative approaches.
    摘要 最近的工作提出了 Wasserstein k-means(Wk-means)归一 clustering 方法,用于时间序列数据中的分区。本文首先对 synthetic 一维时间序列数据进行了详细的研究,探讨了 Wasserstein k-means clustering 算法的行为和不同权重参数对不同初始化的影响。我们计算了一些简单的指标,用于评价高质量的归一结果。然后,我们将多维时间序列数据中的 Wasserstein k-means clustering 方法扩展为 sliced Wasserstein k-means(sWk-means)归一方法,通过 aproximating 多维 Wasserstein 距离为 slice Wasserstein 距离。我们使用 synthetic 数据 demonstrate 了这种方法的有效性。最后,我们使用公开available foreign exchange spot rate 数据作为案例研究,证明了 sWk-means 方法在实际多维金融时间序列中可以有效地 Identify 市场 режимы。我们结束时提出了一些限制和可能的补充或替代方法。
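
A compact illustration of the sliced Wasserstein distance used above (random one-dimensional projections plus sorting), applied to windows of a synthetic two-dimensional return series; the nearest-reference assignment at the end is a simplification, not the full sWk-means clustering:

```python
import numpy as np

rng = np.random.default_rng(7)

def sliced_wasserstein(a, b, n_proj=50, rng=rng):
    """Approximate sliced 2-Wasserstein distance between two point sets a, b of
    shape (n, d), averaging 1-D Wasserstein distances over random projections
    (equal sample sizes assumed for simplicity)."""
    d = a.shape[1]
    dirs = rng.normal(size=(n_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    total = 0.0
    for u in dirs:
        pa, pb = np.sort(a @ u), np.sort(b @ u)   # 1-D optimal transport = sorting
        total += np.mean((pa - pb) ** 2)
    return np.sqrt(total / n_proj)

# Two synthetic "regimes" of 2-D returns, split into non-overlapping windows.
calm   = rng.normal(0.0, 0.5, size=(500, 2))
crisis = rng.normal(0.0, 2.0, size=(500, 2))
series = np.vstack([calm, crisis])
windows = [series[i:i + 50] for i in range(0, len(series), 50)]

# Nearest-reference assignment using one calm and one crisis window as anchors.
refs = [windows[0], windows[-1]]
labels = [int(np.argmin([sliced_wasserstein(w, r) for r in refs])) for w in windows]
print(labels)
```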

Non-Exchangeable Conformal Risk Control

  • paper_url: http://arxiv.org/abs/2310.01262
  • repo_url: https://github.com/deep-spin/non-exchangeable-crc
  • paper_authors: António Farinhas, Chrysoula Zerva, Dennis Ulmer, André F. T. Martins
  • for: 提供形式保证的uncertainty集或间隔 для黑盒神经网络预测,确保先定的概率包含实际的地面真值。
  • methods: 基于非交换性数据的扩展,以及提供更广泛的目标的统计保证,如确界最好的F1分数或预期false negative rate。
  • results: 在 synthetic 和实际数据上实现了非交换性扩展的 conformal risk control,可控制任意升序损失函数的期望值,无需假设,可以根据测试示例的统计相似性进行数据权重。
    Abstract Split conformal prediction has recently sparked great interest due to its ability to provide formally guaranteed uncertainty sets or intervals for predictions made by black-box neural models, ensuring a predefined probability of containing the actual ground truth. While the original formulation assumes data exchangeability, some extensions handle non-exchangeable data, which is often the case in many real-world scenarios. In parallel, some progress has been made in conformal methods that provide statistical guarantees for a broader range of objectives, such as bounding the best F1-score or minimizing the false negative rate in expectation. In this paper, we leverage and extend these two lines of work by proposing non-exchangeable conformal risk control, which allows controlling the expected value of any monotone loss function when the data is not exchangeable. Our framework is flexible, makes very few assumptions, and allows weighting the data based on its statistical similarity with the test examples; a careful choice of weights may result on tighter bounds, making our framework useful in the presence of change points, time series, or other forms of distribution drift. Experiments with both synthetic and real world data show the usefulness of our method.
    摘要 分割保形预测（Split Conformal Prediction）近年来引起了广泛关注，它能为黑盒神经网络模型的预测提供具有形式保证的不确定性集合或区间，确保以预先设定的概率包含真实值。然而，其原始形式假设数据是可交换的，已有一些扩展可以处理非可交换数据，而这类数据在许多实际场景中十分常见。与此同时，保形方法也取得了一些进展，能够为更广泛的目标提供统计保证，例如限制最优F1分数或控制期望假阴性率。在本文中，我们利用并扩展了这两条研究线索，提出了非可交换保形风险控制，可以在数据不可交换时控制任意单调损失函数的期望值。我们的框架十分灵活，所需假设很少，并允许根据测试样本的统计相似性对数据加权；恰当地选择权重可以得到更紧的界，使该框架在存在变化点、时间序列或其他形式的分布漂移时尤为有用。在合成数据和真实数据上的实验都表明了该方法的有效性。
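
The paper controls general monotone losses; the sketch below only shows the basic non-exchangeable ingredient, a similarity-weighted quantile of calibration scores (the decay weights and the clipping of the top index are assumptions made for this illustration):

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, alpha=0.1):
    """Weighted (1 - alpha) quantile of calibration nonconformity scores,
    where `weights` reflect statistical similarity to the test point."""
    order = np.argsort(scores)
    scores, weights = scores[order], weights[order]
    # include the test point's own (worst-case) score with unit weight
    w = np.append(weights, 1.0)
    cum = np.cumsum(w) / np.sum(w)
    idx = np.searchsorted(cum[:-1], 1.0 - alpha)
    return scores[min(idx, len(scores) - 1)]

rng = np.random.default_rng(8)
calib_scores = np.abs(rng.normal(size=500))               # |y - y_hat| residuals
ages = np.arange(500)                                      # older points less relevant
weights = 0.99 ** (500 - ages)                             # decay for distribution drift

q = weighted_conformal_quantile(calib_scores, weights, alpha=0.1)
# Prediction interval for a new point: [y_hat - q, y_hat + q]
print("interval half-width:", q)
```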

Self-supervised Learning for Anomaly Detection in Computational Workflows

  • paper_url: http://arxiv.org/abs/2310.01247
  • repo_url: None
  • paper_authors: Hongwei Jin, Krishnan Raghavan, George Papadimitriou, Cong Wang, Anirban Mandal, Ewa Deelman, Prasanna Balaprakash
  • for: 这个研究旨在探讨计算工作流程中的异常检测问题,以涵盖各领域如防火墙、金融和社交网络等。
  • methods: 这篇研究使用自动encoder驱动的自我超vised learning(SSL)方法,从无标注的工作流程数据中学习一个总体统计,以评估计算工作流程的正常行为。
  • results: 研究结果显示,通过估计正常行为的分布在隐藏空间,可以超越现有的异常检测方法在我们的参考数据集上。
    Abstract Anomaly detection is the task of identifying abnormal behavior of a system. Anomaly detection in computational workflows is of special interest because of its wide implications in various domains such as cybersecurity, finance, and social networks. However, anomaly detection in computational workflows~(often modeled as graphs) is a relatively unexplored problem and poses distinct challenges. For instance, when anomaly detection is performed on graph data, the complex interdependency of nodes and edges, the heterogeneity of node attributes, and edge types must be accounted for. Although the use of graph neural networks can help capture complex inter-dependencies, the scarcity of labeled anomalous examples from workflow executions is still a significant challenge. To address this problem, we introduce an autoencoder-driven self-supervised learning~(SSL) approach that learns a summary statistic from unlabeled workflow data and estimates the normal behavior of the computational workflow in the latent space. In this approach, we combine generative and contrastive learning objectives to detect outliers in the summary statistics. We demonstrate that by estimating the distribution of normal behavior in the latent space, we can outperform state-of-the-art anomaly detection methods on our benchmark datasets.
    摘要 《异常检测在计算工作流中是一项特殊的任务,因为它在各个领域,如网络安全、金融和社交媒体中具有广泛的应用。然而,在计算工作流中进行异常检测(通常模型为图)是一个相对未经探索的问题,它具有许多独特的挑战。例如,在图数据上进行异常检测时,需要考虑图中节点和边之间的复杂依赖关系,节点属性和边类型的异常性。虽然使用图神经网络可以帮助捕捉图中的复杂依赖关系,但是从计算工作流中获得标注的异常示例还是一个主要的挑战。为解决这个问题,我们提出了一种自动编码器驱动的自我超级vised学习(SSL)方法,该方法通过不supervised learning来学习计算工作流的正常行为的摘要统计。在这种方法中,我们将生成和对比学习目标结合起来,以检测摘要统计中的异常点。我们示示了,通过估计计算工作流的正常行为的分布在隐藏空间,我们可以超越现有的异常检测方法在我们的标准散点集上表现。》

Modality-aware Transformer for Time series Forecasting

  • paper_url: http://arxiv.org/abs/2310.01232
  • repo_url: None
  • paper_authors: Hajar Emami, Xuan-Hong Dang, Yousaf Shah, Petros Zerfos
  • for: 这篇论文主要针对多modal时间序列预测问题,特别是在金融领域,时间序列的未来行为frequently linked to information derived from various textual reports和多个经济指标。
  • methods: 我们提出了一个名为Modality-aware Transformer的新型多modal transformer-based模型,利用这个模型可以充分利用不同modal的信息,同时实现时间序列预测和多modal跨模式理解。我们在这个模型中开发了一个内置特性级别注意力层,让模型在每个数据模式中对最重要的特性进行注意。
  • results: 我们的实验结果显示,Modality-aware Transformer在金融数据上比较 existed方法更好,提供了一个新和实际的解决方案 для多modal时间序列预测问题。
    Abstract Time series forecasting presents a significant challenge, particularly when its accuracy relies on external data sources rather than solely on historical values. This issue is prevalent in the financial sector, where the future behavior of time series is often intricately linked to information derived from various textual reports and a multitude of economic indicators. In practice, the key challenge lies in constructing a reliable time series forecasting model capable of harnessing data from diverse sources and extracting valuable insights to predict the target time series accurately. In this work, we tackle this challenging problem and introduce a novel multimodal transformer-based model named the Modality-aware Transformer. Our model excels in exploring the power of both categorical text and numerical timeseries to forecast the target time series effectively while providing insights through its neural attention mechanism. To achieve this, we develop feature-level attention layers that encourage the model to focus on the most relevant features within each data modality. By incorporating the proposed feature-level attention, we develop a novel Intra-modal multi-head attention (MHA), Inter-modal MHA and Modality-target MHA in a way that both feature and temporal attentions are incorporated in MHAs. This enables the MHAs to generate temporal attentions with consideration of modality and feature importance which leads to more informative embeddings. The proposed modality-aware structure enables the model to effectively exploit information within each modality as well as foster cross-modal understanding. Our extensive experiments on financial datasets demonstrate that Modality-aware Transformer outperforms existing methods, offering a novel and practical solution to the complex challenges of multi-modality time series forecasting.
    摘要 时间序列预测存在 significativ Challenge,特别是当它的准确性取决于外部数据源而不仅仅是历史值。这个问题在金融领域非常普遍,因为未来时间序列的行为frequently linked to information derived from various textual reports and a multitude of economic indicators。在实践中,关键挑战在于构建可靠的时间序列预测模型,能够从多种数据源中提取有价值的信息,并准确预测目标时间序列。在这种工作中,我们解决这个挑战,并提出了一种新的多modal transformer-based模型,名为Modality-aware Transformer。我们的模型能够effectively explore the power of both categorical text and numerical time series to forecast the target time series accurately while providing insights through its neural attention mechanism。为了实现这一点,我们开发了一种特有的Feature-level attention层,该层鼓励模型对每个数据模式中最相关的特征进行注意力。通过在MHA中 integrate feature-level attention,我们开发了一种新的Intra-modal multi-head attention (MHA)、Inter-modal MHA和Modality-target MHA,其中both feature和 temporal attentions are incorporated in MHAs。这使得MHAs可以生成基于模式和特征重要性的temporal attention,从而生成更有信息的嵌入。我们的模型结构能够effectively exploit information within each modality as well as foster cross-modal understanding。我们对金融数据集进行了广泛的实验,并证明Modality-aware Transformer可以超过现有方法,提供一种新和实用的解决方案 для复杂的多模态时间序列预测问题。

Reconstructing Atmospheric Parameters of Exoplanets Using Deep Learning

  • paper_url: http://arxiv.org/abs/2310.01227
  • repo_url: None
  • paper_authors: Flavio Giobergia, Alkis Koudounas, Elena Baralis
  • for: 这篇论文是为了研究外层行星大气的Properties和特性,提出了一种基于深度学习和反向模型的多目标概率回归方法。
  • methods: 该方法结合了深度学习和反向模型技术,在多模式架构中实现了大气参数的估算。
  • results: 该方法可以更好地处理多目标问题,提高了计算效率和准确率,为外层行星研究提供了有价值的新思路和方法。
    Abstract Exploring exoplanets has transformed our understanding of the universe by revealing many planetary systems that defy our current understanding. To study their atmospheres, spectroscopic observations are used to infer essential atmospheric properties that are not directly measurable. Estimating atmospheric parameters that best fit the observed spectrum within a specified atmospheric model is a complex problem that is difficult to model. In this paper, we present a multi-target probabilistic regression approach that combines deep learning and inverse modeling techniques within a multimodal architecture to extract atmospheric parameters from exoplanets. Our methodology overcomes computational limitations and outperforms previous approaches, enabling efficient analysis of exoplanetary atmospheres. This research contributes to advancements in the field of exoplanet research and offers valuable insights for future studies.
    摘要 探索外行星已经重新定义了我们对宇宙的理解，揭示了许多不同于我们当前理解的行星系统。为了研究它们的大气，我们使用光谱观测获取不直接测量的大气属性。估算大气参数，使得光谱特征最佳匹配指定的大气模型是一个复杂的问题，具有计算限制。在这篇论文中，我们提出了一种多目标概率回归方法，结合深度学习和反向模型技术，在多模态架构中提取大气参数。我们的方法克服了计算限制，并超越先前的方法，使得对外行星大气的分析更加高效。这项研究对外行星研究领域的发展做出了贡献，并为未来的研究提供了价值的意见。

A path-norm toolkit for modern networks: consequences, promises and challenges

  • paper_url: http://arxiv.org/abs/2310.01225
  • repo_url: None
  • paper_authors: Antoine Gonon, Nicolas Brisebarre, Elisa Riccietti, Rémi Gribonval
  • for: This paper introduces a toolkit for generalization bounds of modern neural networks, specifically for DAG ReLU networks with biases, skip connections, and any operation based on the extraction of order statistics.
  • methods: The toolkit uses path-norms, which are a type of complexity measure that is easy to compute, invariant under network symmetries, and improves sharpness on feedforward networks.
  • results: The paper establishes generalization bounds for modern neural networks that are the most widely applicable and recover or beat the sharpest known bounds of this type. The toolkit is also used to numerically evaluate the sharpest known bounds for ResNets on ImageNet.
    Abstract This work introduces the first toolkit around path-norms that is fully able to encompass general DAG ReLU networks with biases, skip connections and any operation based on the extraction of order statistics: max pooling, GroupSort etc. This toolkit notably allows us to establish generalization bounds for modern neural networks that are not only the most widely applicable path-norm based ones, but also recover or beat the sharpest known bounds of this type. These extended path-norms further enjoy the usual benefits of path-norms: ease of computation, invariance under the symmetries of the network, and improved sharpness on feedforward networks compared to the product of operators' norms, another complexity measure most commonly used. The versatility of the toolkit and its ease of implementation allow us to challenge the concrete promises of path-norm-based generalization bounds, by numerically evaluating the sharpest known bounds for ResNets on ImageNet.
    摘要 这个工具包 introduces the first toolkit around path-norms that can fully encompass general DAG ReLU networks with biases, skip connections, and any operation based on the extraction of order statistics: max pooling, GroupSort, etc. This toolkit allows us to establish generalization bounds for modern neural networks that are not only the most widely applicable path-norm based ones, but also recover or beat the sharpest known bounds of this type. These extended path-norms also enjoy the usual benefits of path-norms: ease of computation, invariance under the symmetries of the network, and improved sharpness on feedforward networks compared to the product of operators' norms, another complexity measure most commonly used. 该工具的多样性和易用性,使我们能够 numerically evaluate the sharpest known bounds for ResNets on ImageNet, challenging the concrete promises of path-norm-based generalization bounds.

Revisiting Mobility Modeling with Graph: A Graph Transformer Model for Next Point-of-Interest Recommendation

  • paper_url: http://arxiv.org/abs/2310.01224
  • repo_url: https://github.com/yukayo/mobgt
  • paper_authors: Xiaohang Xu, Toyotaro Suzumura, Jiawei Yong, Masatoshi Hanai, Chuang Yang, Hiroki Kanezashi, Renhe Jiang, Shintaro Fukushima
  • for: 本研究旨在提出一种能够充分利用图模型来捕捉用户流动数据中的空间和时间特征的POI推荐模型。
  • methods: 该模型基于图神经网络(GNN),并将个体空间和时间图嵌入器与全球用户位置关系嵌入器结合,以捕捉唯一的特征。此外,模型还包括基于图变换器的流动嵌入器,以提取更高级别的POI之间关系。
  • results: 实验结果表明,MobGT模型在多个数据集和指标上都有显著提高,相比之前的模型,平均提高24%。 codes 可以在 \url{https://github.com/Yukayo/MobGT} 上获取。
    Abstract Next Point-of-Interest (POI) recommendation plays a crucial role in urban mobility applications. Recently, POI recommendation models based on Graph Neural Networks (GNN) have been extensively studied and achieved, however, the effective incorporation of both spatial and temporal information into such GNN-based models remains challenging. Extracting distinct fine-grained features unique to each piece of information is difficult since temporal information often includes spatial information, as users tend to visit nearby POIs. To address the challenge, we propose \textbf{\underline{Mob}ility \textbf{\underline{G}raph \textbf{\underline{T}ransformer (MobGT) that enables us to fully leverage graphs to capture both the spatial and temporal features in users' mobility patterns. MobGT combines individual spatial and temporal graph encoders to capture unique features and global user-location relations. Additionally, it incorporates a mobility encoder based on Graph Transformer to extract higher-order information between POIs. To address the long-tailed problem in spatial-temporal data, MobGT introduces a novel loss function, Tail Loss. Experimental results demonstrate that MobGT outperforms state-of-the-art models on various datasets and metrics, achieving 24\% improvement on average. Our codes are available at \url{https://github.com/Yukayo/MobGT}.

From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication

  • paper_url: http://arxiv.org/abs/2310.01211
  • repo_url: None
  • paper_authors: Irene Cannistraci, Luca Moschella, Marco Fumero, Valentino Maiorca, Emanuele Rodolà
  • for: 提高 neural network 模块的重用和合并性能
  • methods: 直接 incorporate 一组抽象到 latent representation 中的几何变换,无需先知道优化的抽象
  • results: 在 classification 和重建任务中,观察了一致的潜在相似性和下游性能提升在零shot合并设定下
    Abstract It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases. From a geometric perspective, identifying the classes of transformations and the related invariances that connect these representations is fundamental to unlocking applications, such as merging, stitching, and reusing different neural modules. However, estimating task-specific transformations a priori can be challenging and expensive due to several factors (e.g., weights initialization, training hyperparameters, or data modality). To this end, we introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations without requiring prior knowledge about the optimal invariance to infuse. We validate our solution on classification and reconstruction tasks, observing consistent latent similarity and downstream performance improvements in a zero-shot stitching setting. The experimental analysis comprises three modalities (vision, text, and graphs), twelve pretrained foundational models, eight benchmarks, and several architectures trained from scratch.
    摘要 各种神经网络学习的表示器可能会隐藏同类结构的相似性,当模型在类似启发假设下训练时。从几何角度来看,找出这些表示器中的类别转换和相关的免疫性是解锁应用的关键,如合并、缝合和重用不同神经模块。然而,在不同任务上预先估算任务特定的转换可能是困难和昂贵的,因为多种因素(例如权重初始化、训练超参数和数据类型)。为此,我们提出了一种通用的方法,直接将一组免疫性 incorporated 到表示器中,在幂 space 上构建免疫性的产品空间,无需先知道最佳免疫性。我们在分类和重建任务上验证了我们的解决方案,观察到了静态相似性和下游性能提升。实验分析包括三种模式(视觉、文本和图)、十二个预训练基础模型、八个标准核心和多个从零开始训练的建筑。

Unified Uncertainty Calibration

  • paper_url: http://arxiv.org/abs/2310.01202
  • repo_url: None
  • paper_authors: Kamalika Chaudhuri, David Lopez-Paz
  • for: 提高AI系统的稳定性、公平性和安全性,让分类器在测试示例上采取“我不知道”的决策。
  • methods: 提出了一种名为“统一不确定性均衡(U2C)”的框架,将 aleatoric 和 epistemic 不确定性集成到一起,以便对不确定性进行有效的学习理论分析,并在 ImageNet benchmark 上超越了 reject-or-classify 策略。
  • results: U2C 在 ImageNet 上实现了比 reject-or-classify 更高的性能,并且可以更好地捕捉不同来源的不确定性,进一步提高了 AI 系统的稳定性、公平性和安全性。
    Abstract To build robust, fair, and safe AI systems, we would like our classifiers to say ``I don't know'' when facing test examples that are difficult or fall outside of the training classes.The ubiquitous strategy to predict under uncertainty is the simplistic \emph{reject-or-classify} rule: abstain from prediction if epistemic uncertainty is high, classify otherwise.Unfortunately, this recipe does not allow different sources of uncertainty to communicate with each other, produces miscalibrated predictions, and it does not allow to correct for misspecifications in our uncertainty estimates. To address these three issues, we introduce \emph{unified uncertainty calibration (U2C)}, a holistic framework to combine aleatoric and epistemic uncertainties. U2C enables a clean learning-theoretical analysis of uncertainty estimation, and outperforms reject-or-classify across a variety of ImageNet benchmarks.
    摘要 为了构建鲁棒、公平且安全的AI系统，我们希望分类器在面对困难的或超出训练类别的测试样本时能够回答“我不知道”。在不确定情况下进行预测的常用策略是简单的“拒绝或分类”规则：当认知不确定性高时放弃预测，否则进行分类。遗憾的是，这一做法不允许不同来源的不确定性相互沟通，会产生校准不佳的预测，也无法纠正不确定性估计中的设定偏差。为了解决这三个问题，我们提出了统一不确定性校准（U2C），这是一个将偶然不确定性与认知不确定性相结合的整体框架。U2C支持对不确定性估计进行清晰的学习理论分析，并在多个ImageNet基准上优于“拒绝或分类”策略。

SWoTTeD: An Extension of Tensor Decomposition to Temporal Phenotyping

  • paper_url: http://arxiv.org/abs/2310.01201
  • repo_url: None
  • paper_authors: Hana Sebia, Thomas Guyet, Etienne Audureau
  • for: 本研究旨在探讨electronic health records(EHR)数据中的个体轨迹分析,并提出了一种基于时间fenotype的新方法SWoTTeD(Sliding Window for Temporal Tensor Decomposition)来挖掘隐藏的时间模式。
  • methods: 本研究提出了一种 integrate several constraints and regularizations的方法SWoTTeD,以增强 extracted phenotypes的解释性。
  • results: 通过synthetic和实际数据 validate,SWoTTeD可以与最新的tensor decomposition模型匹配或超越它们,并提取了 meaningful for clinicians的时间 fenotypes。
    Abstract Tensor decomposition has recently been gaining attention in the machine learning community for the analysis of individual traces, such as Electronic Health Records (EHR). However, this task becomes significantly more difficult when the data follows complex temporal patterns. This paper introduces the notion of a temporal phenotype as an arrangement of features over time and it proposes SWoTTeD (Sliding Window for Temporal Tensor Decomposition), a novel method to discover hidden temporal patterns. SWoTTeD integrates several constraints and regularizations to enhance the interpretability of the extracted phenotypes. We validate our proposal using both synthetic and real-world datasets, and we present an original usecase using data from the Greater Paris University Hospital. The results show that SWoTTeD achieves at least as accurate reconstruction as recent state-of-the-art tensor decomposition models, and extracts temporal phenotypes that are meaningful for clinicians.
    摘要 近年来，张量分解在机器学习社区中受到关注，用于分析诸如电子健康记录（EHR）等个体轨迹。然而，当数据呈现复杂的时间模式时，这一任务会变得明显更加困难。本文提出了时间表型（temporal phenotype）的概念，即特征随时间的排列方式，并提出了SWoTTeD（Sliding Window for Temporal Tensor Decomposition），一种发现隐藏时间模式的新方法。SWoTTeD融合了多种约束和正则化，以增强所提取表型的可解释性。我们使用合成数据和真实数据验证了该方法，并给出了一个基于大巴黎大学医院数据的原创用例。结果表明，SWoTTeD的重建精度至少不低于最新的张量分解模型，并且能提取出对临床医生有意义的时间表型。

Federated K-means Clustering

  • paper_url: http://arxiv.org/abs/2310.01195
  • repo_url: https://github.com/ourownstory/federated_kmeans
  • paper_authors: Swier Garst, Marcel Reinders
  • for: 本研究旨在提出一种基于 federated learning 的 K-means 嵌入 clustering 算法,以保持数据隐私和拥有权。
  • methods: 该算法使用 federated averaging 方法,并采用一种新的聚合策略来Address the challenges of varying number of clusters between centers 和 less separable datasets。
  • results: 实验结果表明,该算法能够在不同数据中心之间的不同数据分布下准确地进行嵌入 clustering,并且在 less separable datasets 上具有更高的鲁棒性和稳定性。
    Abstract Federated learning is a technique that enables the use of distributed datasets for machine learning purposes without requiring data to be pooled, thereby better preserving privacy and ownership of the data. While supervised FL research has grown substantially over the last years, unsupervised FL methods remain scarce. This work introduces an algorithm which implements K-means clustering in a federated manner, addressing the challenges of varying number of clusters between centers, as well as convergence on less separable datasets.
    摘要 联邦学习（Federated Learning）是一种无需将数据集中汇总即可利用分布式数据集进行机器学习的技术，从而更好地保护数据的隐私和所有权。尽管近年来有监督联邦学习的研究大幅增长，无监督联邦学习方法仍然稀缺。本文提出了一种以联邦方式实现K-means聚类的算法，解决了各数据中心之间簇数不同、以及在可分性较差的数据集上收敛等挑战。
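
A minimal federated K-means round in which clients share only per-cluster sums and counts (a generic federated-averaging formulation of K-means, not necessarily the paper's aggregation strategy for handling differing cluster counts):

```python
import numpy as np

rng = np.random.default_rng(9)

def local_statistics(data, centroids):
    """Each client sends per-cluster sums and counts instead of raw data."""
    assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    k, d = centroids.shape
    sums, counts = np.zeros((k, d)), np.zeros(k)
    for c in range(k):
        mask = assign == c
        sums[c] = data[mask].sum(axis=0)
        counts[c] = mask.sum()
    return sums, counts

def federated_kmeans(client_data, k=3, rounds=10, rng=rng):
    centroids = rng.normal(size=(k, client_data[0].shape[1]))
    for _ in range(rounds):
        agg_sums = np.zeros_like(centroids)
        agg_counts = np.zeros(k)
        for data in client_data:                  # in practice computed on each client
            s, c = local_statistics(data, centroids)
            agg_sums += s
            agg_counts += c
        nonempty = agg_counts > 0
        centroids[nonempty] = agg_sums[nonempty] / agg_counts[nonempty, None]
    return centroids

clients = [rng.normal(loc=mu, scale=0.3, size=(200, 2))
           for mu in ([0, 0], [3, 3], [0, 3])]     # non-IID: one cluster per client
print(federated_kmeans(clients, k=3))
```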

If there is no underfitting, there is no Cold Posterior Effect

  • paper_url: http://arxiv.org/abs/2310.01189
  • repo_url: None
  • paper_authors: Yijie Zhang, Yi-Shan Wu, Luis A. Ortega, Andrés R. Masegosa
  • for: 这篇论文研究了温 posterior effect(CPE)在 bayesian deep learning 中的存在,并发现在温度 $T<1$ 下, posterior predictive 可能会比 bayesian posterior ($T=1$) 更好。
  • methods: 这篇论文使用了 bayesian deep learning 方法,并研究了 CPE 是否为模型误差问题。
  • results: 这篇论文发现,如果存在过度适应(underfitting),那么 CPE 会出现;如果没有过度适应,那么 CPE 不会出现。
    Abstract The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature $T<1$, the resulting posterior predictive could have better performances than the Bayesian posterior ($T=1$). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of CPE as a model misspecification problem, arising from the prior and/or from the likelihood function. In this work, we provide a more nuanced understanding of the CPE as we show that misspecification leads to CPE only when the resulting Bayesian posterior underfits. In fact, we theoretically show that if there is no underfitting, there is no CPE.
    摘要 冷后效应(CPE)(文泽等,2020)在 bayesian深度学习中显示,当 posterior 的温度 $T<1$ 时,结果的 posterior predictive 可能比 bayesian posterior ($T=1)更好。由于 bayesian posterior 在完美模型假设下是最优的,因此许多最近的工作都在研究 CPE 是模型假设错误的问题,来自 prior 和/或 likelihood 函数。在这个工作中,我们提供了更加细腻的理解 CPE,显示了 misspecification 导致 CPE 只有当 bayesian posterior 下降时。事实上,我们理论上表明,如果没有下降,就没有 CPE。Note: "冷后效应" (CPE) is the Chinese translation of "cold posterior effect".

Light Schrödinger Bridge

  • paper_url: http://arxiv.org/abs/2310.01174
  • repo_url: https://github.com/ngushchin/lightsb
  • paper_authors: Alexander Korotin, Nikita Gushchin, Evgeny Burnaev
  • for: This paper aims to address the issue of heavy-weighted and complex optimization of existing Schrodinger Bridges (SB) solvers, and proposes a novel fast and simple SB solver.
  • methods: The proposed LightSB solver combines two ideas from the field: parameterizing the Schrodinger potentials with sum-exp quadratic functions, and viewing the log-Schrodinger potentials as energy functions. The optimization objective is simple and straightforward, and the solver is lightweight, simulation-free, and theoretically justified.
  • results: The LightSB solver is able to solve SB in moderate dimensions in a matter of minutes on CPU without painful hyperparameter selection, and is proven to be a universal approximator of SBs. The code for the LightSB solver is available at https://github.com/ngushchin/LightSB.
    Abstract Despite the recent advances in the field of computational Schrodinger Bridges (SB), most existing SB solvers are still heavy-weighted and require complex optimization of several neural networks. It turns out that there is no principal solver which plays the role of simple-yet-effective baseline for SB just like, e.g., $k$-means method in clustering, logistic regression in classification or Sinkhorn algorithm in discrete optimal transport. We address this issue and propose a novel fast and simple SB solver. Our development is a smart combination of two ideas which recently appeared in the field: (a) parameterization of the Schrodinger potentials with sum-exp quadratic functions and (b) viewing the log-Schrodinger potentials as the energy functions. We show that combined together these ideas yield a lightweight, simulation-free and theoretically justified SB solver with a simple straightforward optimization objective. As a result, it allows solving SB in moderate dimensions in a matter of minutes on CPU without a painful hyperparameter selection. Our light solver resembles the Gaussian mixture model which is widely used for density estimation. Inspired by this similarity, we also prove an important theoretical result showing that our light solver is a universal approximator of SBs. The code for the LightSB solver can be found at https://github.com/ngushchin/LightSB
    摘要 Despite recent advances in computational Schrödinger bridges (SB), most existing solvers are still computationally expensive and require complex optimization of multiple neural networks. There is no simple yet effective baseline solver for SB, similar to methods like $k$-means in clustering, logistic regression in classification, or the Sinkhorn algorithm in discrete optimal transport. We address this issue and propose a novel fast and simple SB solver. Our approach combines two recent ideas in the field: (a) parameterizing the Schrödinger potentials with sum-exp quadratic functions, and (b) viewing the log-Schrödinger potentials as energy functions. By combining these ideas, we obtain a lightweight, simulation-free, and theoretically justified SB solver with a simple and straightforward optimization objective. This allows for solving SB in moderate dimensions in just a few minutes on CPU without painful hyperparameter selection. Our light solver is similar to the Gaussian mixture model, which is widely used for density estimation. Inspired by this similarity, we also prove an important theoretical result showing that our light solver is a universal approximator of SBs. The code for the LightSB solver can be found at https://github.com/ngushchin/LightSB.

RRR-Net: Reusing, Reducing, and Recycling a Deep Backbone Network

  • paper_url: http://arxiv.org/abs/2310.01157
  • repo_url: None
  • paper_authors: Haozhe Sun, Isabelle Guyon, Felix Mohr, Hedi Tabia
  • for: 这篇论文的目的是创建一个较小的、较快的模型,并且保持与大型背景网络相似的性能。
  • methods: 这篇论文使用了将预训练的大型背景网络(ResNet152)缩减为5个块,并分割模型为多个分支来提高性能。
  • results: 这篇论文的实验结果显示,使用这些技术可以创建一个较小的、较快的模型,并且与传统的背景网络组合相似的性能。
    Abstract It has become mainstream in computer vision and other machine learning domains to reuse backbone networks pre-trained on large datasets as preprocessors. Typically, the last layer is replaced by a shallow learning machine of sorts; the newly-added classification head and (optionally) deeper layers are fine-tuned on a new task. Due to its strong performance and simplicity, a common pre-trained backbone network is ResNet152.However, ResNet152 is relatively large and induces inference latency. In many cases, a compact and efficient backbone with similar performance would be preferable over a larger, slower one. This paper investigates techniques to reuse a pre-trained backbone with the objective of creating a smaller and faster model. Starting from a large ResNet152 backbone pre-trained on ImageNet, we first reduce it from 51 blocks to 5 blocks, reducing its number of parameters and FLOPs by more than 6 times, without significant performance degradation. Then, we split the model after 3 blocks into several branches, while preserving the same number of parameters and FLOPs, to create an ensemble of sub-networks to improve performance. Our experiments on a large benchmark of $40$ image classification datasets from various domains suggest that our techniques match the performance (if not better) of ``classical backbone fine-tuning'' while achieving a smaller model size and faster inference speed.
    摘要 现在计算机视觉和其他机器学习领域中已成为主流的做法是 reuse 已经预训练的基准网络。通常是将最后一层替换为一个浅学习机制,新增的分类头和(选项别)更深的层进行 fine-tuning 新任务。由于其强大的表现和简单性,一个常见的预训练基准网络是 ResNet152。然而,ResNet152 相对较大,导致推理延迟。在许多情况下,一个更加 компакт和高效的基准网络会更有优势于一个更大和更慢的网络。这篇论文研究了如何重用预训练的基准网络,以创建一个更小更快的模型。从 ImageNet 预训练的大 ResNet152 基准网络开始,我们首先将其减少为 5 层,从而减少参数和 FLOPs 的数量比 exceeds 6 times,无需重要性下降。然后,我们在 3 层处将模型分割成多个分支,保持同样的参数和 FLOPs 数量,以创建一个 ensemble 的子网络,以提高性能。我们对 $40$ 个图像分类数据集进行了大规模的实验,结果表明,我们的技术与“经典基准网络精度”匹配或更好,同时实现了更小的模型大小和更快的推理速度。

Modularity in Deep Learning: A Survey

  • paper_url: http://arxiv.org/abs/2310.01154
  • repo_url: https://github.com/ckg-DeepLearning/CoverTypeClassification
  • paper_authors: Haozhe Sun, Isabelle Guyon
  • For: This paper reviews the concept of modularity in deep learning, focusing on three axes: data, task, and model.
  • Methods: The paper discusses different instantiations of the modularity principle in deep learning, including data modularity, task modularity, and model modularity.
  • Results: The paper provides a comprehensive overview of the advantages of modularity in deep learning, including ease of conceptualization, interpretability, scalability, module combinability, and module reusability.
    Abstract Modularity is a general principle present in many fields. It offers attractive advantages, including, among others, ease of conceptualization, interpretability, scalability, module combinability, and module reusability. The deep learning community has long sought to take inspiration from the modularity principle, either implicitly or explicitly. This interest has been increasing over recent years. We review the notion of modularity in deep learning around three axes: data, task, and model, which characterize the life cycle of deep learning. Data modularity refers to the observation or creation of data groups for various purposes. Task modularity refers to the decomposition of tasks into sub-tasks. Model modularity means that the architecture of a neural network system can be decomposed into identifiable modules. We describe different instantiations of the modularity principle, and we contextualize their advantages in different deep learning sub-fields. Finally, we conclude the paper with a discussion of the definition of modularity and directions for future research.
    摘要 模块化是一种存在于许多领域的通用原则。它带来了许多诱人的优点,包括易于概念化、可解释性、可扩展性、模块可组合性和模块可重用性等。深度学习社区长期以来一直或明或暗地从模块化原则中汲取灵感,这种兴趣在近几年不断增长。本文围绕数据、任务和模型这三个刻画深度学习生命周期的轴线,对深度学习中的模块化概念进行综述。数据模块化指的是为各种目的观察或构造数据分组;任务模块化指的是将任务分解为子任务;模型模块化指的是神经网络系统的架构可以被分解为可识别的模块。我们描述了模块化原则的不同实现方式,并在不同的深度学习子领域中阐述其优势。最后,我们讨论了模块化的定义以及未来研究的方向。

Cryptocurrency Portfolio Optimization by Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01148
  • repo_url: None
  • paper_authors: Quoc Minh Nguyen, Dat Thanh Tran, Juho Kanniainen, Alexandros Iosifidis, Moncef Gabbouj
  • for: 提出了一种基于神经网络的算法,利用现代加密货币交易平台提供的衍生品进行对冲或投机。
  • methods: 使用深度神经网络输出每个时间间隔的资产分配权重,以最大化夏普比率为训练目标,并提出一个新的损失项来抑制网络对特定资产的偏向。
  • results: 在取自 Binance、跨度 19 个月的数据上进行回测,结果表明所提算法训练出的神经网络能够在不同市场情况下实现盈利。
    Abstract Many cryptocurrency brokers nowadays offer a variety of derivative assets that allow traders to perform hedging or speculation. This paper proposes an effective algorithm based on neural networks to take advantage of these investment products. The proposed algorithm constructs a portfolio that contains a pair of negatively correlated assets. A deep neural network, which outputs the allocation weight of each asset at a time interval, is trained to maximize the Sharpe ratio. A novel loss term is proposed to regulate the network's bias towards a specific asset, thus enforcing the network to learn an allocation strategy that is close to a minimum variance strategy. Extensive experiments were conducted using data collected from Binance spanning 19 months to evaluate the effectiveness of our approach. The backtest results show that the proposed algorithm can produce neural networks that are able to make profits in different market situations.
    摘要 如今,许多加密货币交易平台都提供多种衍生品,使交易者可以进行对冲或投机。本文提出了一种基于神经网络的有效算法,以利用这些投资产品。所提算法构建了一个包含一对负相关资产的投资组合。一个深度神经网络输出每个时间间隔内各资产的分配权重,并以最大化夏普比率为目标进行训练。我们还提出了一个新的损失项,用于抑制网络对特定资产的偏向,从而促使网络学习接近最小方差策略的分配策略。我们使用从 Binance 收集的、跨度 19 个月的数据进行了大量实验,以评估该方法的有效性。回测结果表明,所提算法能够训练出在不同市场情况下均可盈利的神经网络。
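As a rough illustration of the training objective described above (my own sketch, not the paper's code): a network outputs softmax allocation weights for the two assets and is trained to maximize the Sharpe ratio. The exact form of the paper's bias-regularizing loss term is not given here, so the penalty below, which pulls the average weights toward an equal allocation, is an assumption.

```python
import torch
import torch.nn as nn

class AllocationNet(nn.Module):
    def __init__(self, n_features, n_assets=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                 nn.Linear(64, n_assets))
    def forward(self, x):
        return torch.softmax(self.net(x), dim=-1)   # allocation weights sum to 1

def loss_fn(weights, asset_returns, lam=0.1):
    port = (weights * asset_returns).sum(dim=-1)    # portfolio return per interval
    sharpe = port.mean() / (port.std() + 1e-8)
    # hypothetical bias penalty: discourage leaning on a single asset
    bias = (weights.mean(dim=0) - 1.0 / weights.shape[-1]).pow(2).sum()
    return -sharpe + lam * bias                     # maximize Sharpe, regulate bias

model = AllocationNet(n_features=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 10)          # toy market features per time interval
r = 0.01 * torch.randn(256, 2)    # toy simple returns of the two assets
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(x), r).backward()
    opt.step()
```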

Parallel-in-Time Probabilistic Numerical ODE Solvers

  • paper_url: http://arxiv.org/abs/2310.01145
  • repo_url: https://github.com/nathanaelbosch/parallel-in-time-ode-filters
  • paper_authors: Nathanael Bosch, Adrien Corenflos, Fatemeh Yaghoobi, Filip Tronarp, Philipp Hennig, Simo Särkkä
  • for: numerical simulation of dynamical systems as problems of Bayesian state estimation
  • methods: time-parallel formulation of iterated extended Kalman smoothers
  • results: reduced span cost from linear to logarithmic in the number of time steps
    Abstract Probabilistic numerical solvers for ordinary differential equations (ODEs) treat the numerical simulation of dynamical systems as problems of Bayesian state estimation. Aside from producing posterior distributions over ODE solutions and thereby quantifying the numerical approximation error of the method itself, one less-often noted advantage of this formalism is the algorithmic flexibility gained by formulating numerical simulation in the framework of Bayesian filtering and smoothing. In this paper, we leverage this flexibility and build on the time-parallel formulation of iterated extended Kalman smoothers to formulate a parallel-in-time probabilistic numerical ODE solver. Instead of simulating the dynamical system sequentially in time, as done by current probabilistic solvers, the proposed method processes all time steps in parallel and thereby reduces the span cost from linear to logarithmic in the number of time steps. We demonstrate the effectiveness of our approach on a variety of ODEs and compare it to a range of both classic and probabilistic numerical ODE solvers.
    摘要 常微分方程(ODE)的概率数值求解器将动力系统的数值模拟视为贝叶斯状态估计问题。除了给出 ODE 解的后验分布、从而量化方法本身的数值近似误差之外,这种形式化较少被提及的一个优点是:将数值模拟置于贝叶斯滤波与平滑的框架中,可以获得算法上的灵活性。本文利用这种灵活性,在迭代扩展卡尔曼平滑器的时间并行形式的基础上,构建了一种时间并行的概率数值 ODE 求解器。与现有概率求解器按时间顺序依次模拟动力系统不同,所提方法并行处理所有时间步,从而将跨度(span)代价从关于时间步数的线性降低为对数。我们在多种 ODE 上验证了该方法的有效性,并与一系列经典和概率数值 ODE 求解器进行了比较。

The Map Equation Goes Neural

  • paper_url: http://arxiv.org/abs/2310.01144
  • repo_url: None
  • paper_authors: Christopher Blöcker, Chester Tan, Ingo Scholtes
  • for: 本研究旨在连接深度学习与网络科学两个领域,提出一种基于深度学习的社区检测方法,以自动发现网络系统的高层组织结构。
  • methods: 我们使用 map 方程这一信息论目标函数进行社区检测,将其表示为完全可微的、输出软聚类分配的张量形式,并通过梯度下降进行优化。该方法可与任意图神经网络架构结合,实现灵活的聚类和图池化,以端到端的方式同时对图结构和数据特征进行聚类,并自动确定最优的簇数而无需显式正则化。
  • results: 实验表明,我们的方法在合成数据和真实数据上的无监督聚类任务中可与基线方法相媲美,能够自然地检测重叠社区,并避免对稀疏图的过度划分。
    Abstract Community detection and graph clustering are essential for unsupervised data exploration and understanding the high-level organisation of networked systems. Recently, graph clustering has been highlighted as an under-explored primary task for graph neural networks. While hierarchical graph pooling has been shown to improve performance in graph and node classification tasks, it performs poorly in identifying meaningful clusters. Community detection has a long history in network science, but typically relies on optimising objective functions with custom-tailored search algorithms, not leveraging recent advances in deep learning, particularly from graph neural networks. In this paper, we narrow this gap between the deep learning and network science communities. We consider the map equation, an information-theoretic objective function for community detection. Expressing it in a fully differentiable tensor form that produces soft cluster assignments, we optimise the map equation with deep learning through gradient descent. More specifically, the reformulated map equation is a loss function compatible with any graph neural network architecture, enabling flexible clustering and graph pooling that clusters both graph structure and data features in an end-to-end way, automatically finding an optimum number of clusters without explicit regularisation. We evaluate our approach experimentally using different neural network architectures for unsupervised clustering in synthetic and real data. Our results show that our approach achieves competitive performance against baselines, naturally detects overlapping communities, and avoids over-partitioning sparse graphs.
    摘要 在本文中,我们通过研究 map 方程——一种用于社区检测的信息论目标函数——来弥合深度学习与网络科学两个社区之间的差距。我们将其表示为输出软聚类分配的完全可微张量形式,并通过梯度下降以深度学习的方式进行优化。我们的方法与任意图神经网络架构兼容,可实现灵活的聚类和图池化,以端到端的方式同时对图结构和数据特征进行聚类,并能在无需显式正则化的情况下自动发现最优的簇数。我们使用不同的神经网络架构,在合成数据和真实数据上的无监督聚类任务中对该方法进行了实验评估。结果表明,我们的方法取得了与基线相当的性能,能够自然地检测重叠社区,并避免对稀疏图的过度划分。
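For illustration, a simplified differentiable (soft) two-level map equation can be written as below. This is my own reading of the idea and the paper's exact tensor formulation and flow model may differ; it uses natural logarithms (the base only rescales the objective) and degree-proportional visit rates for an undirected graph.

```python
import torch

def soft_map_equation(A, S, eps=1e-12):
    """A: (n, n) undirected adjacency; S: (n, k) soft cluster assignments (rows sum to 1)."""
    deg = A.sum(dim=1)
    p = deg / deg.sum()                        # node visit rates
    F = A / A.sum()                            # edge flow, sums to 1
    M = S.t() @ F @ S                          # module-to-module flow (k, k)
    q_exit = M.sum(dim=1) - M.diagonal()       # flow leaving each module
    q_tot = q_exit.sum()
    p_m = S.t() @ p                            # within-module visit rates
    p_circ = q_exit + p_m                      # module codebook usage

    index_term = -(q_exit * torch.log((q_exit + eps) / (q_tot + eps))).sum()
    node_flow = S * p.unsqueeze(1)             # (n, k) soft visit rate of node i in module m
    module_term = -(q_exit * torch.log((q_exit + eps) / (p_circ + eps))).sum() \
                  - (node_flow * torch.log((node_flow + eps) / (p_circ.unsqueeze(0) + eps))).sum()
    return index_term + module_term            # lower = shorter codelength
```

In a training loop any GNN can produce the assignments, e.g. `S = torch.softmax(gnn(X, A), dim=-1)`, and `soft_map_equation(A, S)` is minimized by gradient descent, so clustering and pooling are learned end to end.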

CommIN: Semantic Image Communications as an Inverse Problem with INN-Guided Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.01130
  • repo_url: None
  • paper_authors: Jiakang Chen, Di You, Deniz Gündüz, Pier Luigi Dragotti
  • for: 提高无线图像传输中的品质
  • methods: 使用可逆神经网络(INN)与扩散模型
  • results: 在极端条件下(如低带宽和低信噪比)显著改善重建图像的感知质量
    Abstract Joint source-channel coding schemes based on deep neural networks (DeepJSCC) have recently achieved remarkable performance for wireless image transmission. However, these methods usually focus only on the distortion of the reconstructed signal at the receiver side with respect to the source at the transmitter side, rather than the perceptual quality of the reconstruction which carries more semantic information. As a result, severe perceptual distortion can be introduced under extreme conditions such as low bandwidth and low signal-to-noise ratio. In this work, we propose CommIN, which views the recovery of high-quality source images from degraded reconstructions as an inverse problem. To address this, CommIN combines Invertible Neural Networks (INN) with diffusion models, aiming for superior perceptual quality. Through experiments, we show that our CommIN significantly improves the perceptual quality compared to DeepJSCC under extreme conditions and outperforms other inverse problem approaches used in DeepJSCC.
    摘要 基于深度神经网络的联合信源信道编码方案(DeepJSCC)近来在无线图像传输中取得了出色的表现。然而,这些方法通常只关注接收端重建信号相对于发送端源信号的失真,而不是携带更多语义信息的重建感知质量。因此,在低带宽和低信噪比等极端条件下,可能引入严重的感知失真。在这项工作中,我们提出了 CommIN,它将从退化的重建中恢复高质量源图像视为一个逆问题。为此,CommIN 将可逆神经网络(INN)与扩散模型相结合,以获得更好的感知质量。实验表明,在极端条件下,CommIN 的感知质量显著优于 DeepJSCC,并且优于 DeepJSCC 中使用的其他逆问题方法。

Predicting emergence of crystals from amorphous matter with deep learning

  • paper_url: http://arxiv.org/abs/2310.01117
  • repo_url: None
  • paper_authors: Muratahan Aykol, Amil Merchant, Simon Batzner, Jennifer N. Wei, Ekin Dogus Cubuk
  • for: 这个论文的目的是predicting the outcome of phase transitions in inorganic materials, enabling new research directions in material synthesis and development.
  • methods: 该论文使用了universal deep learning potentials to sample the crystallization pathways of local structural motifs at the atomistic level, allowing for the prediction of crystal structures of polymorphs from amorphous precursors with high accuracy.
  • results: 该论文的结果表明,通过利用 Ostwald’s rule of stages mechanistically at the molecular level, it is possible to predictably access new metastable crystals from the amorphous phase in material synthesis, across a diverse set of material systems including polymorphic oxides, nitrides, carbides, fluorides, chlorides, chalcogenides, and metal alloys.
    Abstract Crystallization of the amorphous phases into metastable crystals plays a fundamental role in the formation of new matter, from geological to biological processes in nature to synthesis and development of new materials in the laboratory. Predicting the outcome of such phase transitions reliably would enable new research directions in these areas, but has remained beyond reach with molecular modeling or ab-initio methods. Here, we show that crystallization products of amorphous phases can be predicted in any inorganic chemistry by sampling the crystallization pathways of their local structural motifs at the atomistic level using universal deep learning potentials. We show that this approach identifies the crystal structures of polymorphs that initially nucleate from amorphous precursors with high accuracy across a diverse set of material systems, including polymorphic oxides, nitrides, carbides, fluorides, chlorides, chalcogenides, and metal alloys. Our results demonstrate that Ostwald's rule of stages can be exploited mechanistically at the molecular level to predictably access new metastable crystals from the amorphous phase in material synthesis.
    摘要 非晶相结晶为亚稳晶体,在新物质的形成中起着基础性作用,从自然界的地质、生物过程到实验室中新材料的合成与开发皆是如此。可靠地预测这类相变的结果将为这些领域开辟新的研究方向,但这一目标一直是分子模拟或第一性原理方法所无法实现的。本文表明,通过使用通用深度学习势在原子尺度上采样局部结构基元的结晶路径,可以预测任意无机化学体系中非晶相的结晶产物。我们的结果显示,该方法能够在多种材料体系中高精度地识别由非晶前驱体最先成核的多晶型物的晶体结构,包括多晶型氧化物、氮化物、碳化物、氟化物、氯化物、硫属化合物和金属合金。我们的结果表明,奥斯特瓦尔德分步规则可以在分子层面被机理性地加以利用,从而在材料合成中可预测地从非晶相获得新的亚稳晶体。

R-divergence for Estimating Model-oriented Distribution Discrepancy

  • paper_url: http://arxiv.org/abs/2310.01109
  • repo_url: https://github.com/lawliet-zzl/r-div
  • paper_authors: Zhilin Zhao, Longbing Cao
  • for: 本文旨在检测两个数据集的概率分布是否相同,以便在不同的概率分布下进行模型训练。
  • methods: 本文提出了R- divergence方法,该方法通过估计最优假设来评估两个数据集的概率分布差异。
  • results: 实验表明,R-divergence 方法可以准确地评估数据集之间的分布差异,并在多种无监督和有监督任务上达到了最先进的性能。此外,本文还将 R-divergence 应用于在带噪标签的数据集上训练鲁棒的神经网络。
    Abstract Real-life data are often non-IID due to complex distributions and interactions, and the sensitivity to the distribution of samples can differ among learning models. Accordingly, a key question for any supervised or unsupervised model is whether the probability distributions of two given datasets can be considered identical. To address this question, we introduce R-divergence, designed to assess model-oriented distribution discrepancies. The core insight is that two distributions are likely identical if their optimal hypothesis yields the same expected risk for each distribution. To estimate the distribution discrepancy between two datasets, R-divergence learns a minimum hypothesis on the mixed data and then gauges the empirical risk difference between them. We evaluate the test power across various unsupervised and supervised tasks and find that R-divergence achieves state-of-the-art performance. To demonstrate the practicality of R-divergence, we employ R-divergence to train robust neural networks on samples with noisy labels.
    摘要 由于分布复杂且存在相互作用,真实数据往往不是独立同分布(non-IID)的,而且不同学习模型对样本分布的敏感程度也各不相同。因此,对任何有监督或无监督模型而言,一个关键问题是:两个给定数据集的概率分布是否可以被视为相同。为回答这一问题,我们提出了 R-divergence,用于评估面向模型的分布差异。其核心思想是:如果两个分布的最优假设在各自分布上产生相同的期望风险,那么这两个分布很可能是相同的。为估计两个数据集之间的分布差异,R-divergence 在混合数据上学习一个最小化风险的假设,然后度量它在两个数据集上的经验风险之差。我们在多种无监督和有监督任务上评估了其检验能力,发现 R-divergence 达到了最先进的性能。为展示 R-divergence 的实用性,我们将其用于在带噪标签的样本上训练鲁棒的神经网络。
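A rough sketch of the procedure described in the abstract, using logistic regression as an illustrative hypothesis class (my own reading; the authors' implementation is in the linked repository): fit one hypothesis on the pooled data, then compare its empirical risks on the two datasets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def r_divergence(X1, y1, X2, y2):
    X = np.vstack([X1, X2])
    y = np.concatenate([y1, y2])
    h = LogisticRegression(max_iter=1000).fit(X, y)     # minimum hypothesis on mixed data
    r1 = log_loss(y1, h.predict_proba(X1), labels=h.classes_)
    r2 = log_loss(y2, h.predict_proba(X2), labels=h.classes_)
    return abs(r1 - r2)                                  # empirical risk difference

rng = np.random.default_rng(0)
Xa = rng.normal(0.0, 1.0, size=(500, 5)); ya = (Xa[:, 0] > 0).astype(int)
Xb = rng.normal(0.5, 1.0, size=(500, 5)); yb = (Xb[:, 0] > 0).astype(int)
print(r_divergence(Xa, ya, Xb, yb))   # grows as the two distributions diverge
```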

Energy-Guided Continuous Entropic Barycenter Estimation for General Costs

  • paper_url: http://arxiv.org/abs/2310.01105
  • repo_url: None
  • paper_authors: Alexander Kolesov, Petr Mokrov, Igor Udovichenko, Milena Gazdieva, Gudmund Pammer, Evgeny Burnaev, Alexander Korotin
  • for: averaging probability distributions while capturing their geometric properties
  • methods: novel algorithm based on weak OT and dual reformulation, with quality bounds and interconnectivity with Energy-Based Models
  • results: validated on low-dimensional scenarios and image-space setups, with practical applications in learning barycenter on image manifolds generated by pretrained generative models
    Abstract Optimal transport (OT) barycenters are a mathematically grounded way of averaging probability distributions while capturing their geometric properties. In short, the barycenter task is to take the average of a collection of probability distributions w.r.t. given OT discrepancies. We propose a novel algorithm for approximating the continuous Entropic OT (EOT) barycenter for arbitrary OT cost functions. Our approach is built upon the dual reformulation of the EOT problem based on weak OT, which has recently gained the attention of the ML community. Beyond its novelty, our method enjoys several advantageous properties: (i) we establish quality bounds for the recovered solution; (ii) this approach seemlessly interconnects with the Energy-Based Models (EBMs) learning procedure enabling the use of well-tuned algorithms for the problem of interest; (iii) it provides an intuitive optimization scheme avoiding min-max, reinforce and other intricate technical tricks. For validation, we consider several low-dimensional scenarios and image-space setups, including non-Euclidean cost functions. Furthermore, we investigate the practical task of learning the barycenter on an image manifold generated by a pretrained generative model, opening up new directions for real-world applications.
    摘要 最优传输(OT)重心是一种在保留几何性质的同时对概率分布取平均的、具有坚实数学基础的方法。简言之,重心任务是在给定的 OT 差异度量下,对一组概率分布求平均。我们提出了一种新算法,用于在任意 OT 成本函数下近似连续熵正则化 OT(EOT)重心。该方法建立在基于弱 OT 的 EOT 对偶重构之上,这一重构近来受到了机器学习社区的关注。除了新颖性之外,我们的方法还具有以下优点:(1) 我们为所恢复的解给出了质量界;(2) 该方法可与基于能量的模型(EBM)的学习过程无缝衔接,从而可以利用为相关问题调校好的现有算法;(3) 它提供了一个直观的优化方案,避免了 min-max、REINFORCE 等复杂的技术手段。为了验证方法,我们考虑了若干低维场景和图像空间设置,包括非欧氏成本函数;此外,我们还研究了在由预训练生成模型生成的图像流形上学习重心这一实际任务,为现实应用开辟了新的方向。

Seismogram Transformer: A generic deep learning backbone network for multiple earthquake monitoring tasks

  • paper_url: http://arxiv.org/abs/2310.01037
  • repo_url: https://github.com/senli1073/seist
  • paper_authors: Sen Li, Xu Yang, Anye Cao, Changbin Wang, Yaoqi Liu, Yapeng Liu, Qiang Niu
  • for: 这个论文主要探讨了地震记录(seismogram)处理的深度学习方法,以提高地震研究和监测的精度和效率。
  • methods: 这篇论文提出了一种由多种基础模块组成的新型骨干神经网络模型,称为 Seismogram Transformer(SeisT),可用于地震检测、震相拾取、初动极性分类、震级估计和反方位角估计等多种地震监测任务。
  • results: 实验结果表明,SeisT 模型在上述任务上能够媲美甚至超越现有的最先进模型,尤其是在分布外数据上的泛化性能方面。SeisT 通过组合不同的基础模块,能够提取从低层到高层复杂特征的多级特征表示,例如频率、震相以及时频关系。
    Abstract Seismic records, known as seismograms, are crucial records of ground motion resulting from seismic events, constituting the backbone of earthquake research and monitoring. The latest advancements in deep learning have significantly facilitated various seismic signal processing tasks. This paper introduces a novel backbone neural network model designed for various seismic monitoring tasks, named Seismogram Transformer (SeisT). Thanks to its efficient network architecture, SeisT matches or even outperforms the state-of-the-art models in earthquake detection, seismic phase picking, first-motion polarity classification, magnitude estimation, and back-azimuth estimation tasks, particularly in terms of out-of-distribution generalization performance. SeisT consists of multiple network layers composed of different foundational blocks, which help the model understand multi-level feature representations of seismograms from low-level to high-level complex features, effectively extracting features such as frequency, phase, and time-frequency relationships from input seismograms. Three different-sized models were customized based on these diverse foundational modules. Through extensive experiments and performance evaluations, this study showcases the capabilities and potential of SeisT in advancing seismic signal processing and earthquake research.
    摘要 地震记录(seismogram)是地震事件引起的地面运动的关键记录,是地震研究与监测的基石。深度学习的最新进展极大地推动了各种地震信号处理任务。本文提出了一种面向多种地震监测任务的新型骨干神经网络模型,称为 Seismogram Transformer(SeisT)。得益于其高效的网络架构,SeisT 在地震检测、震相拾取、初动极性分类、震级估计和反方位角估计等任务上可以媲美甚至超越最先进的模型,在分布外泛化性能方面尤为突出。SeisT 由多个网络层组成,每层由不同的基础模块构成,帮助模型理解地震记录从低层到高层复杂特征的多级特征表示,有效地从输入地震记录中提取频率、震相以及时频关系等特征。基于这些不同的基础模块,我们定制了三种不同规模的模型。通过大量实验和性能评估,本研究展示了 SeisT 在推进地震信号处理与地震研究方面的能力和潜力。

A Novel Approach for Machine Learning-based Load Balancing in High-speed Train System using Nested Cross Validation

  • paper_url: http://arxiv.org/abs/2310.01034
  • repo_url: None
  • paper_authors: Ibrahim Yazici, Emre Gures
  • for: 面向高速铁路系统中的 5G 无线通信网络,通过优化切换管理来提升移动用户的服务质量。
  • methods: 使用机器学习方法,并采用嵌套交叉验证(Nested Cross Validation)方案,防止模型评估的信息泄露到超参数调优中,从而避免过拟合并获得更好的泛化误差。
  • results: 以切换余量(HOM)和触发时间(TTT)为特征、多个 KPI 为输出,采用梯度提升回归(GBR)、自适应提升(AdaBoost)、CatBoost 回归(CBR)、人工神经网络(ANN)、核岭回归(KRR)、支持向量回归(SVR)和 k-近邻回归(KNNR)等方法建模。结果显示,在嵌套交叉验证方案下,AdaBoost、CBR、GBR 等提升类方法明显优于使用传统交叉验证的同类方法;而 SVR、KNNR、KRR、ANN 在嵌套方案下对部分 KPI 的预测结果也较其传统方案更有前景。
    Abstract Fifth-generation (5G) mobile communication networks have recently emerged in various fields, including highspeed trains. However, the dense deployment of 5G millimeter wave (mmWave) base stations (BSs) and the high speed of moving trains lead to frequent handovers (HOs), which can adversely affect the Quality-of-Service (QoS) of mobile users. As a result, HO optimization and resource allocation are essential considerations for managing mobility in high-speed train systems. In this paper, we model system performance of a high-speed train system with a novel machine learning (ML) approach that is nested cross validation scheme that prevents information leakage from model evaluation into the model parameter tuning, thereby avoiding overfitting and resulting in better generalization error. To this end, we employ ML methods for the high-speed train system scenario. Handover Margin (HOM) and Time-to-Trigger (TTT) values are used as features, and several KPIs are used as outputs, and several ML methods including Gradient Boosting Regression (GBR), Adaptive Boosting (AdaBoost), CatBoost Regression (CBR), Artificial Neural Network (ANN), Kernel Ridge Regression (KRR), Support Vector Regression (SVR), and k-Nearest Neighbor Regression (KNNR) are employed for the problem. Finally, performance comparisons of the cross validation schemes with the methods are made in terms of mean absolute error (MAE) and mean square error (MSE) metrics are made. As per obtained results, boosting methods, ABR, CBR, GBR, with nested cross validation scheme superiorly outperforms conventional cross validation scheme results with the same methods. On the other hand, SVR, KNRR, KRR, ANN with the nested scheme produce promising results for prediction of some KPIs with respect to their conventional scheme employment.
    摘要 第五代(5G)移动通信网络近来已在包括高速列车在内的多个领域得到应用。然而,5G 毫米波(mmWave)基站的密集部署和列车的高速运动会导致频繁的切换(HO),从而影响移动用户的服务质量(QoS)。因此,切换优化和资源分配是管理高速列车系统移动性的关键考虑因素。在本文中,我们采用一种新的机器学习(ML)方案——嵌套交叉验证——来对高速列车系统的性能建模;该方案可以防止模型评估的信息泄露到模型参数调优中,从而避免过拟合并获得更好的泛化误差。为此,我们在高速列车系统场景中采用多种 ML 方法:以切换余量(HOM)和触发时间(TTT)值作为特征、多个 KPI 作为输出,使用梯度提升回归(GBR)、自适应提升(AdaBoost)、CatBoost 回归(CBR)、人工神经网络(ANN)、核岭回归(KRR)、支持向量回归(SVR)和 k-近邻回归(KNNR)等方法。最后,我们以平均绝对误差(MAE)和均方误差(MSE)为指标,比较了嵌套与传统交叉验证方案下各方法的性能。结果表明,在嵌套交叉验证方案下,AdaBoost、CBR、GBR 等提升类方法显著优于同类方法在传统交叉验证方案下的结果;而 SVR、KNNR、KRR、ANN 在嵌套方案下对部分 KPI 的预测也展现出可观的效果。
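For reference, a minimal nested cross-validation sketch in scikit-learn (an assumed setup, not the paper's code): the inner loop tunes hyperparameters and the outer loop estimates generalization error, so no test fold ever leaks into model selection. In the paper the two features would be HOM and TTT and the targets the handover KPIs.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_regression(n_samples=500, n_features=2, noise=0.1, random_state=0)

inner = KFold(n_splits=3, shuffle=True, random_state=0)   # hyperparameter tuning
outer = KFold(n_splits=5, shuffle=True, random_state=0)   # generalization estimate

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [2, 3]},
    cv=inner,
    scoring="neg_mean_absolute_error",
)
scores = cross_val_score(search, X, y, cv=outer, scoring="neg_mean_absolute_error")
print("nested-CV MAE:", -scores.mean())
```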

The Fisher-Rao geometry of CES distributions

  • paper_url: http://arxiv.org/abs/2310.01032
  • repo_url: None
  • paper_authors: Florent Bouchard, Arnaud Breloy, Antoine Collas, Alexandre Renaux, Guillaume Ginolhac
  • for: 这篇论文关注参数统计模型:通过为参数空间赋予 Fisher 信息度量,使其自然地具有黎曼流形结构,并利用这一结构应用微分几何中的多种工具。
  • methods: 论文利用 Fisher 信息度量在参数空间上诱导的黎曼几何,处理协方差矩阵估计、内蕴 Cramér-Rao 界以及基于黎曼距离的分类等问题。
  • results: 论文展示了在椭圆分布框架下,利用这种黎曼几何结构和微分几何工具,可以解决协方差矩阵估计、内蕴 Cramér-Rao 界和分类等若干实际问题。
    Abstract When dealing with a parametric statistical model, a Riemannian manifold can naturally appear by endowing the parameter space with the Fisher information metric. The geometry induced on the parameters by this metric is then referred to as the Fisher-Rao information geometry. Interestingly, this yields a point of view that allows for leveragingmany tools from differential geometry. After a brief introduction about these concepts, we will present some practical uses of these geometric tools in the framework of elliptical distributions. This second part of the exposition is divided into three main axes: Riemannian optimization for covariance matrix estimation, Intrinsic Cram\'er-Rao bounds, and classification using Riemannian distances.
    摘要 在处理参数统计模型时,通过为参数空间赋予 Fisher 信息度量,可以自然地得到一个黎曼流形。由该度量在参数上诱导的几何被称为 Fisher-Rao 信息几何。有趣的是,这一视角使我们能够利用微分几何中的众多工具。在简要介绍这些概念之后,我们将在椭圆分布的框架下展示这些几何工具的一些实际用途。这一部分分为三个主要方面:用于协方差矩阵估计的黎曼优化、内蕴 Cramér-Rao 界,以及基于黎曼距离的分类。
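For context, the textbook definition of the Fisher information metric that the Fisher-Rao geometry is built on (a standard definition, not a result specific to this paper):

$$
g_{ij}(\theta) \;=\; \mathbb{E}_{x \sim p_\theta}\!\left[\frac{\partial \log p_\theta(x)}{\partial \theta_i}\,\frac{\partial \log p_\theta(x)}{\partial \theta_j}\right],
\qquad
ds^2 \;=\; \sum_{i,j} g_{ij}(\theta)\, d\theta_i\, d\theta_j .
$$

The geodesic distance induced by this metric is the Riemannian distance the paper refers to, e.g., when classifying with Riemannian distances in the elliptical-distribution setting.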

A Robust Machine Learning Approach for Path Loss Prediction in 5G Networks with Nested Cross Validation

  • paper_url: http://arxiv.org/abs/2310.01030
  • repo_url: None
  • paper_authors: Ibrahim Yazıcı, Emre Gures
  • for: This paper aims to improve the accuracy of path loss prediction in 5G wireless networks using machine learning (ML) methods, which can facilitate more accurate network planning, resource optimization, and performance improvement.
  • methods: The paper utilizes a nested cross validation scheme and six different ML methods (Support Vector Regression, CatBoost Regression, eXtreme Gradient Boosting Regression, Artificial Neural Network, and Random Forest) to predict path loss in a 5G network system, and compares the prediction results in terms of Mean Absolute Error and Mean Square Error.
  • results: The results show that XGBR outperforms the other methods, with a slight performance difference of 0.4% and 1% in terms of MAE and MSE, respectively, compared to CBR. The rest of the methods are outperformed by XGBR with clear performance differences.
    Abstract The design and deployment of fifth-generation (5G) wireless networks pose significant challenges due to the increasing number of wireless devices. Path loss has a landmark importance in network performance optimization, and accurate prediction of the path loss, which characterizes the attenuation of signal power during transmission, is critical for effective network planning, coverage estimation, and optimization. In this sense, we utilize machine learning (ML) methods, which overcome conventional path loss prediction models drawbacks, for path loss prediction in a 5G network system to facilitate more accurate network planning, resource optimization, and performance improvement in wireless communication systems. To this end, we utilize a novel approach, nested cross validation scheme, with ML to prevent overfitting, thereby getting better generalization error and stable results for ML deployment. First, we acquire a publicly available dataset obtained through a comprehensive measurement campaign conducted in an urban macro-cell scenario located in Beijing, China. The dataset includes crucial information such as longitude, latitude, elevation, altitude, clutter height, and distance, which are utilized as essential features to predict the path loss in the 5G network system. We deploy Support Vector Regression (SVR), CatBoost Regression (CBR), eXtreme Gradient Boosting Regression (XGBR), Artificial Neural Network (ANN), and Random Forest (RF) methods to predict the path loss, and compare the prediction results in terms of Mean Absolute Error (MAE) and Mean Square Error (MSE). As per obtained results, XGBR outperforms the rest of the methods. It outperforms CBR with a slight performance differences by 0.4 % and 1 % in terms of MAE and MSE metrics, respectively. On the other hand, it outperforms the rest of the methods with clear performance differences.
    摘要 第五代(5G)无线网络的设计与部署因无线设备数量不断增加而面临重大挑战。路径损耗对网络性能优化具有标志性的重要意义:准确预测表征信号功率在传输过程中衰减的路径损耗,对于有效的网络规划、覆盖估计和优化至关重要。为此,我们在 5G 网络系统中使用机器学习(ML)方法进行路径损耗预测,以克服传统路径损耗预测模型的缺点,从而实现更准确的网络规划、资源优化和无线通信系统性能提升。我们采用了一种新的方案——嵌套交叉验证——配合 ML 以防止过拟合,从而获得更好的泛化误差和更稳定的结果。我们首先使用一个公开数据集,该数据集来自在中国北京的城市宏蜂窝场景中开展的一次全面测量活动,包含经度、纬度、海拔、高程、杂波高度和距离等关键信息,这些信息被用作预测 5G 网络系统路径损耗的重要特征。我们部署了支持向量回归(SVR)、CatBoost 回归(CBR)、极端梯度提升回归(XGBR)、人工神经网络(ANN)和随机森林(RF)方法来预测路径损耗,并用平均绝对误差(MAE)和均方误差(MSE)比较预测结果。结果显示,XGBR 优于其余方法:它在 MAE 和 MSE 指标上分别以 0.4% 和 1% 的微小优势领先于 CBR,而对其余方法则有明显的性能优势。

Conflict-Aware Active Automata Learning

  • paper_url: http://arxiv.org/abs/2310.01003
  • repo_url: None
  • paper_authors: Tiago Ferreira, Léo Henry, Raquel Fernandes da Silva, Alexandra Silva
  • for: 这篇论文旨在解决主动自动机学习算法难以处理观测数据中冲突(相同输入得到不同输出)的问题。
  • methods: 本文提出了冲突感知主动自动机学习框架(C3AL),将观测树作为学习过程中的一等公民,从而在学习过程中处理相互冲突的信息。
  • results: 大量基准实验结果显示,C3AL 能更好地应对噪声和系统变异,并且可以与任何现有的学习器配合使用。
    Abstract Active automata learning algorithms cannot easily handle conflict in the observation data (different outputs observed for the same inputs). This inherent inability to recover after a conflict impairs their effective applicability in scenarios where noise is present or the system under learning is mutating. We propose the Conflict-Aware Active Automata Learning (C3AL) framework to enable handling conflicting information during the learning process. The core idea is to consider the so-called observation tree as a first-class citizen in the learning process. Though this idea is explored in recent work, we take it to its full effect by enabling its use with any existing learner and minimizing the number of tests performed on the system under learning, specially in the face of conflicts. We evaluate C3AL in a large set of benchmarks, covering over 30 different realistic targets, and over 18,000 different scenarios. The results of the evaluation show that C3AL is a suitable alternative framework for closed-box learning that can better handle noise and mutations.
    摘要 主动自动机学习算法难以处理观测数据中的冲突(对相同输入观测到不同输出)。这种在发生冲突后无法恢复的固有缺陷,削弱了它们在存在噪声或被学习系统发生变异的场景中的实际适用性。我们提出了冲突感知主动自动机学习(C3AL)框架,使学习过程能够处理相互冲突的信息。其核心思想是将所谓的观测树作为学习过程中的一等公民。尽管这一思想已在近期工作中有所探索,我们将其发挥到极致:使其可以与任何现有学习器配合使用,并尽量减少对被学习系统执行的测试次数,尤其是在面对冲突时。我们在一组大规模基准上评估了 C3AL,涵盖 30 多个不同的现实目标和超过 18,000 个不同的场景。评估结果表明,C3AL 是一个适用于黑盒学习的替代框架,能够更好地应对噪声和系统变异。

A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge Regression

  • paper_url: http://arxiv.org/abs/2310.00987
  • repo_url: None
  • paper_authors: Tin Sum Cheng, Aurelien Lucchi, Ivan Dokmanić, Anastasis Kratsios, David Belius
  • for: 这个论文旨在提供对任意Finite-rank kernel ridge regression(KRR)的精细学习保证。
  • methods: 论文推导了任意有限秩 KRR 测试误差的紧致非渐近上界和下界。
  • results: 这些界比以往针对有限秩 KRR 的结果更紧,并且对任意正则化参数均成立。
    Abstract Existing statistical learning guarantees for general kernel regressors often yield loose bounds when used with finite-rank kernels. Yet, finite-rank kernels naturally appear in several machine learning problems, e.g.\ when fine-tuning a pre-trained deep neural network's last layer to adapt it to a novel task when performing transfer learning. We address this gap for finite-rank kernel ridge regression (KRR) by deriving sharp non-asymptotic upper and lower bounds for the KRR test error of any finite-rank KRR. Our bounds are tighter than previously derived bounds on finite-rank KRR, and unlike comparable results, they also remain valid for any regularization parameters.
    摘要 现有的针对一般核回归器的统计学习保证,在用于有限秩核时往往只能给出宽松的界。然而,有限秩核会自然地出现在许多机器学习问题中,例如在迁移学习中仅微调预训练深度神经网络的最后一层以适应新任务时。我们针对有限秩核岭回归(KRR)填补了这一空白:为任意有限秩 KRR 的测试误差推导出紧致的非渐近上界和下界。我们的界比以往针对有限秩 KRR 推导的界更紧,而且与可比的已有结果不同,它们对任意正则化参数均成立。
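For context, a finite-rank kernel and the kernel ridge regression estimator whose test error is being bounded (standard definitions; the normalization convention below is one common choice and may differ from the paper's):

$$
k(x,x') \;=\; \sum_{j=1}^{r} \lambda_j\, \phi_j(x)\,\phi_j(x'),
\qquad
\hat f(x) \;=\; k(x, X)^\top \big(K + n\lambda I\big)^{-1} y,
$$

where $r$ is the finite rank (for example, the dimension of the frozen feature map when only the last layer of a pre-trained network is re-trained), $K_{ij} = k(x_i, x_j)$, and $\lambda > 0$ is the ridge regularization parameter.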

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

  • paper_url: http://arxiv.org/abs/2310.00968
  • repo_url: None
  • paper_authors: Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu
  • for: 这篇论文研究情境对决赌博机(contextual dueling bandits)问题,显式考虑成对比较中固有的不确定性(方差)。
  • methods: 提出一种新的 SupLinUCB 型算法,兼具计算效率和方差感知(variance-aware)的遗憾界。
  • results: 在合成数据上的实验结果表明,该算法优于以往不考虑方差的算法。
    Abstract Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems. While substantial efforts have been made to minimize the cumulative regret in dueling bandits, a notable gap in the current research is the absence of regret bounds that account for the inherent uncertainty in pairwise comparisons between the dueling arms. Intuitively, greater uncertainty suggests a higher level of difficulty in the problem. To bridge this gap, this paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM). We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^T\sigma_t^2} + d\big)$, where $\sigma_t$ is the variance of the pairwise comparison in round $t$, $d$ is the dimension of the context vectors, and $T$ is the time horizon. Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $\tilde O(d)$ regret. We perform empirical experiments on synthetic data to confirm the advantage of our method over previous variance-agnostic algorithms.
    摘要 对决赌博机(dueling bandits)是一个重要的涉及偏好反馈的决策框架,这一特性非常适合排序、信息检索和推荐系统等涉及人机交互的应用。尽管已有大量工作致力于最小化对决赌博机的累积遗憾,但现有研究存在一个明显的空白:缺乏能够刻画对决臂之间成对比较固有不确定性的遗憾界。直观上,不确定性越大意味着问题越困难。为填补这一空白,本文研究情境对决赌博机问题,其中对决臂的二元比较结果由广义线性模型(GLM)生成。我们提出一种新的 SupLinUCB 型算法,它具有计算效率,并享有方差感知的遗憾界 $\tilde O\big(d\sqrt{\sum_{t=1}^T\sigma_t^2} + d\big)$,其中 $\sigma_t$ 是第 $t$ 轮成对比较的方差,$d$ 是情境向量的维数,$T$ 是时间范围。该遗憾界与直观预期自然吻合:当比较是确定性时,算法只承受 $\tilde O(d)$ 的遗憾。我们在合成数据上进行了实验,证实了该方法相对于以往不考虑方差的算法的优势。
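To make the quantities in the bound concrete, a typical contextual-GLM comparison model (assumed here for illustration; the paper's exact notation may differ) is

$$
\Pr\big(a_t \succ b_t \mid x_t\big) \;=\; \mu\!\big(\phi(x_t,a_t)^\top \theta^* - \phi(x_t,b_t)^\top \theta^*\big) \;=\; p_t,
\qquad
\sigma_t^2 \;=\; p_t\,(1-p_t),
$$

so the variance-aware bound $\tilde O\big(d\sqrt{\sum_{t=1}^T\sigma_t^2} + d\big)$ collapses to $\tilde O(d)$ when every comparison is deterministic ($p_t \in \{0,1\}$, hence $\sigma_t = 0$).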

MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training

  • paper_url: http://arxiv.org/abs/2310.00967
  • repo_url: https://github.com/kljp/micro
  • paper_authors: Daegun Yoon, Sangyoon Oh
  • for: 加速分布式深度神经网络(DNN)训练,提高训练效率和可扩展性。
  • methods: 提出了一种新的梯度稀疏化方法 MiCRO:将梯度向量划分为若干分区并分配给相应的工作节点,由各节点只在自己的分区内选择梯度,以避免梯度堆积和不合适的阈值选择;同时按照用户需求估计准确的阈值,实现近零开销的梯度稀疏化。
  • results: 在大量实验中,MiCRO 以出色的收敛速度超越了最先进的稀疏化方法。
    Abstract Gradient sparsification is a communication optimisation technique for scaling and accelerating distributed deep neural network (DNN) training. It reduces the increasing communication traffic for gradient aggregation. However, existing sparsifiers have poor scalability because of the high computational cost of gradient selection and/or increase in communication traffic. In particular, an increase in communication traffic is caused by gradient build-up and inappropriate threshold for gradient selection. To address these challenges, we propose a novel gradient sparsification method called MiCRO. In MiCRO, the gradient vector is partitioned, and each partition is assigned to the corresponding worker. Each worker then selects gradients from its partition, and the aggregated gradients are free from gradient build-up. Moreover, MiCRO estimates the accurate threshold to maintain the communication traffic as per user requirement by minimising the compression ratio error. MiCRO enables near-zero cost gradient sparsification by solving existing problems that hinder the scalability and acceleration of distributed DNN training. In our extensive experiments, MiCRO outperformed state-of-the-art sparsifiers with an outstanding convergence rate.
    摘要 梯度稀疏化是一种用于扩展和加速分布式深度神经网络(DNN)训练的通信优化技术,它能降低梯度聚合带来的日益增长的通信流量。然而,现有的稀疏化方法由于梯度选择的计算开销高和/或通信流量反而增加,扩展性较差;其中通信流量的增加主要源于梯度堆积和不合适的梯度选择阈值。为了解决这些挑战,我们提出了一种新的梯度稀疏化方法 MiCRO。在 MiCRO 中,梯度向量被划分为若干分区,每个分区被分配给对应的工作节点;每个工作节点只在自己的分区内选择梯度,因而聚合后的梯度不会出现梯度堆积。此外,MiCRO 通过最小化压缩率误差来估计准确的阈值,从而按照用户需求控制通信流量。MiCRO 解决了阻碍分布式 DNN 训练可扩展性与加速的既有问题,实现了近零开销的梯度稀疏化。在我们的大量实验中,MiCRO 以出色的收敛速度超越了最先进的稀疏化方法。
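A toy sketch of the partitioning idea (my own illustration; the communication pattern and the threshold-estimation details of the actual MiCRO implementation, linked above, differ): the gradient vector is split into disjoint partitions, each worker sparsifies only its own partition with a threshold aimed at a target compression ratio, so selected indices never overlap and no gradient build-up occurs.

```python
import numpy as np

def sparsify_partition(g, target_ratio=0.01):
    k = max(1, int(target_ratio * g.size))
    thr = np.partition(np.abs(g), -k)[-k]          # threshold estimated from this slice
    return np.where(np.abs(g) >= thr, g, 0.0)

rng = np.random.default_rng(0)
n_workers, dim = 4, 100_000
local_grads = rng.normal(size=(n_workers, dim))    # one local gradient per worker
full = local_grads.sum(axis=0)                     # what a dense all-reduce would produce

sparse = np.zeros(dim)
bounds = np.linspace(0, dim, n_workers + 1, dtype=int)
for w in range(n_workers):                         # worker w owns partition [lo, hi)
    lo, hi = bounds[w], bounds[w + 1]
    sparse[lo:hi] = sparsify_partition(full[lo:hi])

print("compression ratio:", np.count_nonzero(sparse) / dim)   # close to the target 0.01
```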

Effective Learning with Node Perturbation in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2310.00965
  • repo_url: None
  • paper_authors: Sander Dalm, Marcel van Gerven, Nasir Ahmad
  • for: 为深度神经网络模型的参数训练提供一种替代传统反向传播的方法。
  • methods: 使用节点扰动(NP)方法:向网络激活中注入噪声,并测量由此引起的损失变化来进行学习。
  • results: 通过与方向导数建立更紧密的联系,并引入逐层输入去相关机制,NP 学习的性能得到显著提升,可与反向传播相竞争。
    Abstract Backpropagation (BP) is the dominant and most successful method for training parameters of deep neural network models. However, BP relies on two computationally distinct phases, does not provide a satisfactory explanation of biological learning, and can be challenging to apply for training of networks with discontinuities or noisy node dynamics. By comparison, node perturbation (NP) proposes learning by the injection of noise into the network activations, and subsequent measurement of the induced loss change. NP relies on two forward (inference) passes, does not make use of network derivatives, and has been proposed as a model for learning in biological systems. However, standard NP is highly data inefficient and unstable due to its unguided, noise-based, activity search. In this work, we investigate different formulations of NP and relate it to the concept of directional derivatives as well as combining it with a decorrelating mechanism for layer-wise inputs. We find that a closer alignment with directional derivatives, and induction of decorrelation of inputs at every layer significantly enhances performance of NP learning making it competitive with BP.
    摘要 “背测传播”(Backpropagation,BP)是深度神经网络模型的训练方法之主流和最成功的方法。然而,BP 依赖了两个计算上不同的阶段,无法提供满意的生物学学习解释,且可能对网络中的随机性或不确定性进行训练时具有挑战。相比之下,“节点干扰”(Node Perturbation,NP)提出了通过对网络活动的噪声注入,并且 mesure 对� induced loss change 的方法。NP 依赖了两个前(推论)通过,没有使用网络 Derivative,并且被视为生物学系统中的学习模型。然而,标准的NP 对于资料效率和稳定性而言相当不利,因为它的不导向、噪声基的活动搜寻。在这个研究中,我们调查了不同的NP формулювання,并与方向 derivative 的概念和层别输入的decorrelating Mechanism 结合。我们发现,与方向 derivative 更加接近的NP 学习,并且在每个层级 inducing decorrelation of inputs 可以很好地提高 NP 的性能,使其与 BP 竞争。

A Novel IoT Trust Model Leveraging Fully Distributed Behavioral Fingerprinting and Secure Delegation

  • paper_url: http://arxiv.org/abs/2310.00953
  • repo_url: None
  • paper_authors: Marco Arazzi, Serena Nicolazzo, Antonino Nocera
  • for: 提供一种在互联网物联网设备之间evaluate对象的信任性的方法,以帮助解决数据黑客和丢失问题。
  • methods: 我们提出了一种基于行为指纹、分布式一致算法和区块链技术的全分布式信任模型,以及一种安全模型和测试方法来评估模型的正确性和性能。
  • results: 我们的研究表明,我们的方法可以帮助iot设备在网络中评估对象的信任性,从而减少数据黑客和丢失的风险。我们的方法可以在不同类型的设备上实现,并且在不同的网络环境下进行有效地评估对象的信任性。
    Abstract With the number of connected smart devices expected to constantly grow in the next years, Internet of Things (IoT) solutions are experimenting a booming demand to make data collection and processing easier. The ability of IoT appliances to provide pervasive and better support to everyday tasks, in most cases transparently to humans, is also achieved through the high degree of autonomy of such devices. However, the higher the number of new capabilities and services provided in an autonomous way, the wider the attack surface that exposes users to data hacking and lost. In this scenario, many critical challenges arise also because IoT devices have heterogeneous computational capabilities (i.e., in the same network there might be simple sensors/actuators as well as more complex and smart nodes). In this paper, we try to provide a contribution in this setting, tackling the non-trivial issues of equipping smart things with a strategy to evaluate, also through their neighbors, the trustworthiness of an object in the network before interacting with it. To do so, we design a novel and fully distributed trust model exploiting devices' behavioral fingerprints, a distributed consensus mechanism and the Blockchain technology. Beyond the detailed description of our framework, we also illustrate the security model associated with it and the tests carried out to evaluate its correctness and performance.
    摘要 随着联网智能设备数量在未来几年持续增长,物联网(IoT)解决方案正迎来爆发式需求,以便更便捷地收集和处理数据。IoT 设备之所以能够(在大多数情况下对人类透明地)为日常任务提供无处不在的更好支持,也得益于这类设备的高度自主性。然而,以自主方式提供的新功能和服务越多,使用户面临数据被窃取和丢失风险的攻击面也就越大。在这种情形下,还会出现许多关键挑战,其原因之一是 IoT 设备的计算能力参差不齐(同一网络中既可能有简单的传感器/执行器,也可能有更复杂的智能节点)。在这篇论文中,我们试图在这一场景下做出贡献,解决一个并不平凡的问题:为智能设备配备一种策略,使其在与网络中某个对象交互之前,能够(同时借助邻居的信息)评估该对象的可信度。为此,我们设计了一种全新的、完全分布式的信任模型,利用设备的行为指纹、分布式共识机制以及区块链技术。除了对框架的详细描述之外,我们还阐述了与之相关的安全模型,以及为评估其正确性和性能所开展的测试。

Improved Variational Bayesian Phylogenetic Inference using Mixtures

  • paper_url: http://arxiv.org/abs/2310.00941
  • repo_url: https://github.com/lagergren-lab/vbpi-mixtures
  • paper_authors: Oskar Kviman, Ricky Molén, Jens Lagergren
  • for: 增强生物进化树的准确性,特别是树结构和分支长度的近似。
  • methods: 使用Variational Bayesian Phylogenetic Inference(VBPI)框架,加上现代深度学习技术如正常化流和图神经网络,进行树 topology 和分支长度的近似。
  • results: 在多个实际生物演化数据集上达到了状态体现性能。
    Abstract We present VBPI-Mixtures, an algorithm designed to enhance the accuracy of phylogenetic posterior distributions, particularly for tree-topology and branch-length approximations. Despite the Variational Bayesian Phylogenetic Inference (VBPI), a leading-edge black-box variational inference (BBVI) framework, achieving remarkable approximations of these distributions, the multimodality of the tree-topology posterior presents a formidable challenge to sampling-based learning techniques such as BBVI. Advanced deep learning methodologies such as normalizing flows and graph neural networks have been explored to refine the branch-length posterior approximation, yet efforts to ameliorate the posterior approximation over tree topologies have been lacking. Our novel VBPI-Mixtures algorithm bridges this gap by harnessing the latest breakthroughs in mixture learning within the BBVI domain. As a result, VBPI-Mixtures is capable of capturing distributions over tree-topologies that VBPI fails to model. We deliver state-of-the-art performance on difficult density estimation tasks across numerous real phylogenetic datasets.
    摘要 我们提出了 VBPI-Mixtures,一种旨在提高系统发育后验分布(尤其是树拓扑和分支长度的近似)精度的算法。尽管变分贝叶斯系统发育推断(VBPI)这一前沿的黑盒变分推断(BBVI)框架已经能够对这些分布给出出色的近似,但树拓扑后验的多峰性对 BBVI 这类基于采样的学习技术而言仍是一个严峻的挑战。标准化流和图神经网络等先进深度学习方法已被用来改进分支长度后验的近似,然而针对树拓扑后验近似的改进工作却一直缺乏。我们提出的 VBPI-Mixtures 算法借助 BBVI 领域中混合学习的最新突破填补了这一空白,因而能够刻画 VBPI 无法建模的树拓扑分布。我们在大量真实系统发育数据集上的困难密度估计任务中取得了最先进的性能。

Integration of Graph Neural Network and Neural-ODEs for Tumor Dynamic Prediction

  • paper_url: http://arxiv.org/abs/2310.00926
  • repo_url: None
  • paper_authors: Omid Bazgir, Zichen Wang, Marc Hafner, James Lu
  • for: 这个研究旨在帮助抗癌药物开发中解决高维度 genomics 数据、肿瘤来源、治疗目标和治疗反应之间的复杂关系。
  • methods: 本研究提出了一种异构图编码器,将二分图卷积神经网络(GCN)与神经常微分方程(Neural-ODEs)相结合,以实现个性化的肿瘤动态预测。
  • results: 研究表明,该方法能够显著改进个性化肿瘤动态预测,并能有效利用多模态数据来增强预测效果。
    Abstract In anti-cancer drug development, a major scientific challenge is disentangling the complex relationships between high-dimensional genomics data from patient tumor samples, the corresponding tumor's organ of origin, the drug targets associated with given treatments and the resulting treatment response. Furthermore, to realize the aspirations of precision medicine in identifying and adjusting treatments for patients depending on the therapeutic response, there is a need for building tumor dynamic models that can integrate both longitudinal tumor size as well as multimodal, high-content data. In this work, we take a step towards enhancing personalized tumor dynamic predictions by proposing a heterogeneous graph encoder that utilizes a bipartite Graph Convolutional Neural network (GCN) combined with Neural Ordinary Differential Equations (Neural-ODEs). We applied the methodology to a large collection of patient-derived xenograft (PDX) data, spanning a wide variety of treatments (as well as their combinations) on tumors that originated from a number of different organs. We first show that the methodology is able to discover a tumor dynamic model that significantly improves upon an empirical model which is in current use. Additionally, we show that the graph encoder is able to effectively utilize multimodal data to enhance tumor predictions. Our findings indicate that the methodology holds significant promise and offers potential applications in pre-clinical settings.
    摘要 在抗癌药物研发中,一个主要的科学挑战是厘清患者肿瘤样本的高维基因组学数据、相应肿瘤的原发器官、给定治疗所对应的药物靶点以及由此产生的治疗反应之间的复杂关系。此外,要实现精准医疗根据疗效为患者识别和调整治疗方案的愿景,还需要构建既能整合纵向肿瘤大小、又能整合多模态高内涵数据的肿瘤动态模型。在这项工作中,我们朝着增强个性化肿瘤动态预测迈出了一步,提出了一种将二分图卷积神经网络(GCN)与神经常微分方程(Neural-ODEs)相结合的异构图编码器。我们将该方法应用于一个大规模的患者来源异种移植(PDX)数据集,其中涵盖了多种治疗(及其组合),肿瘤来源于多种不同的器官。我们首先表明,该方法能够发现一个显著优于现行经验模型的肿瘤动态模型;此外,我们还表明该图编码器能够有效利用多模态数据来增强肿瘤预测。我们的研究结果表明,该方法具有重要的前景,并有望应用于临床前研究场景。

DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.00902
  • repo_url: https://github.com/ykwon0407/datainf
  • paper_authors: Yongchan Kwon, Eric Wu, Kevin Wu, James Zou
  • for: 这篇论文旨在量化训练数据点的影响,以增进对机器学习模型输出的理解并提高 AI 流程的透明度。
  • methods: 论文提出了一种名为 DataInf 的高效影响近似方法,适用于大规模生成式 AI 模型的影响计算。
  • results: 系统的实验评估表明,DataInf 能准确地近似影响得分,且比现有方法更快、内存占用更少。在 RoBERTa-large、Llama-2-13B-chat 和 stable-diffusion-v1.5 模型上,DataInf 比其他近似影响得分方法更有效地识别最具影响力的微调样本,并且有助于识别被错误标注的数据点。
    Abstract Quantifying the impact of training data points is crucial for understanding the outputs of machine learning models and for improving the transparency of the AI pipeline. The influence function is a principled and popular data attribution method, but its computational cost often makes it challenging to use. This issue becomes more pronounced in the setting of large language models and text-to-image models. In this work, we propose DataInf, an efficient influence approximation method that is practical for large-scale generative AI models. Leveraging an easy-to-compute closed-form expression, DataInf outperforms existing influence computation algorithms in terms of computational and memory efficiency. Our theoretical analysis shows that DataInf is particularly well-suited for parameter-efficient fine-tuning techniques such as LoRA. Through systematic empirical evaluations, we show that DataInf accurately approximates influence scores and is orders of magnitude faster than existing methods. In applications to RoBERTa-large, Llama-2-13B-chat, and stable-diffusion-v1.5 models, DataInf effectively identifies the most influential fine-tuning examples better than other approximate influence scores. Moreover, it can help to identify which data points are mislabeled.
    摘要 量化训练数据点的影响,对于理解机器学习模型的输出以及提升 AI 流程的透明度至关重要。影响函数是一种有原则且流行的数据归因方法,但其计算代价往往使其难以使用,这一问题在大语言模型和文生图模型的场景中尤为突出。在这项工作中,我们提出了 DataInf,一种适用于大规模生成式 AI 模型的高效影响近似方法。借助一个易于计算的闭式表达式,DataInf 在计算效率和内存效率上均优于现有的影响计算算法。我们的理论分析表明,DataInf 特别适合 LoRA 等参数高效微调技术。通过系统的实验评估,我们表明 DataInf 能准确地近似影响得分,并且比现有方法快几个数量级。在应用于 RoBERTa-large、Llama-2-13B-chat 和 stable-diffusion-v1.5 模型时,DataInf 比其他近似影响得分方法更有效地识别最具影响力的微调样本,还能帮助识别哪些数据点被错误标注。
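For background, the classical influence-function quantity that methods like DataInf approximate is the standard definition below; DataInf's own easy-to-compute closed-form replacement for the inverse-Hessian-vector product is what the paper derives and is not reproduced here.

$$
\mathcal{I}(z_k, z_{\text{val}}) \;=\; -\,\nabla_\theta \ell(z_{\text{val}}, \hat\theta)^{\top}\, H_{\hat\theta}^{-1}\, \nabla_\theta \ell(z_k, \hat\theta),
\qquad
H_{\hat\theta} \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2}\, \ell(z_i, \hat\theta),
$$

where the inverse-Hessian-vector product is the expensive part; replacing it with a cheap closed form is what makes influence estimation practical for LoRA-tuned LLMs and diffusion models.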

Organized Event Participant Prediction Enhanced by Social Media Retweeting Data

  • paper_url: http://arxiv.org/abs/2310.00896
  • repo_url: None
  • paper_authors: Yihong Zhang, Takahiro Hara
  • for: 这篇论文旨在预测有组织活动的参与者。
  • methods: 论文利用社交媒体转发数据来增强活动参与者预测模型:构建一个连接社交媒体与目标领域的联合知识图,并提出一种能更有效利用转发信息进行目标领域预测的学习模型。
  • results: 作者使用真实数据在两个场景中进行了广泛的实验;在每个场景中设置了不同规模的训练数据以及热启动和冷启动测试用例。结果显示,该方法始终优于多个基线模型,尤其是在热启动测试用例以及目标领域数据有限的情况下。
    Abstract Nowadays, many platforms on the Web offer organized events, allowing users to be organizers or participants. For such platforms, it is beneficial to predict potential event participants. Existing work on this problem tends to borrow recommendation techniques. However, compared to e-commerce items and purchases, events and participation are usually of a much smaller frequency, and the data may be insufficient to learn an accurate model. In this paper, we propose to utilize social media retweeting activity data to enhance the learning of event participant prediction models. We create a joint knowledge graph to bridge the social media and the target domain, assuming that event descriptions and tweets are written in the same language. Furthermore, we propose a learning model that utilizes retweeting information for the target domain prediction more effectively. We conduct comprehensive experiments in two scenarios with real-world data. In each scenario, we set up training data of different sizes, as well as warm and cold test cases. The evaluation results show that our approach consistently outperforms several baseline models, especially with the warm test cases, and when target domain data is limited.
    摘要 如今,许多网络平台提供有组织的活动,允许用户成为组织者或参与者。对这类平台而言,预测潜在的活动参与者是很有价值的。现有工作通常借鉴推荐技术;然而,与电商商品和购买行为相比,活动及参与的发生频率通常小得多,数据可能不足以学习出准确的模型。在这篇论文中,我们提议利用社交媒体转发数据来增强活动参与者预测模型的学习。我们构建了一个连接社交媒体与目标领域的联合知识图,并假设活动描述和推文使用同一种语言。此外,我们还提出了一种能更有效利用转发信息进行目标领域预测的学习模型。我们使用真实数据在两个场景中进行了广泛的实验,在每个场景中设置了不同规模的训练数据以及热启动和冷启动测试用例。评估结果显示,我们的方法始终优于多个基线模型,尤其是在热启动测试用例以及目标领域数据有限的情况下。

Engineering the Neural Collapse Geometry of Supervised-Contrastive Loss

  • paper_url: http://arxiv.org/abs/2310.00893
  • repo_url: None
  • paper_authors: Jaidev Gill, Vala Vakilian, Christos Thrampoulidis
  • for: 这篇论文的目的是为分类任务提供一种替代交叉熵(CE)的方法,利用嵌入空间中的相似性来获得更丰富的表示。
  • methods: 论文提出通过修改对比损失来调控所学特征嵌入的几何结构,并通过实验发现,在每个 batch 中加入原型(prototype)可以使所学嵌入的几何结构与原型的几何结构对齐。
  • results: 作者通过在基准视觉数据集上对深度神经网络进行一系列实验验证了上述发现,并建立了与使用固定分类器和归一化嵌入的交叉熵损失之间的联系。
    Abstract Supervised-contrastive loss (SCL) is an alternative to cross-entropy (CE) for classification tasks that makes use of similarities in the embedding space to allow for richer representations. In this work, we propose methods to engineer the geometry of these learnt feature embeddings by modifying the contrastive loss. In pursuit of adjusting the geometry we explore the impact of prototypes, fixed embeddings included during training to alter the final feature geometry. Specifically, through empirical findings, we demonstrate that the inclusion of prototypes in every batch induces the geometry of the learnt embeddings to align with that of the prototypes. We gain further insights by considering a limiting scenario where the number of prototypes far outnumber the original batch size. Through this, we establish a connection to cross-entropy (CE) loss with a fixed classifier and normalized embeddings. We validate our findings by conducting a series of experiments with deep neural networks on benchmark vision datasets.
    摘要 有监督对比损失(SCL)是分类任务中交叉熵(CE)的一种替代方案,它利用嵌入空间中的相似性来获得更丰富的表示。在这项工作中,我们提出通过修改对比损失来设计所学特征嵌入的几何结构。为了调整几何结构,我们探究了原型(prototype)的作用,即在训练中加入固定的嵌入以改变最终的特征几何。具体而言,我们通过实验发现,在每个 batch 中加入原型,会使所学嵌入的几何结构与原型的几何结构对齐。我们进一步考虑了原型数量远超原始 batch 大小的极限情形,并由此建立了与使用固定分类器和归一化嵌入的交叉熵(CE)损失之间的联系。我们在基准视觉数据集上对深度神经网络进行了一系列实验,验证了上述发现。

Deep Neural Networks Tend To Extrapolate Predictably

  • paper_url: http://arxiv.org/abs/2310.00873
  • repo_url: https://github.com/katiekang1998/cautious_extrapolation
  • paper_authors: Katie Kang, Amrith Setlur, Claire Tomlin, Sergey Levine
  • for: 该研究检验了神经网络对不同类型的输入数据的预测性能,以及如何在面对不同类型的输入数据时使用神经网络进行风险感知。
  • methods: 该研究使用了多个 datasets,不同的损失函数和网络架构,并通过观察神经网络预测值的变化情况来描述神经网络在面对不同类型的输入数据时的行为。
  • results: 研究发现,对于高维输入的神经网络预测结果往往受到输入数据的类型的影响,而且在输入数据变得越来越不同于训练数据时,神经网络预测结果往往会变得更加稳定,并且与最优常数解(OCS)之间的差异逐渐减少。
    Abstract Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs. Our work reassesses this assumption for neural networks with high-dimensional inputs. Rather than extrapolating in arbitrary ways, we observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD. Moreover, we find that this value often closely approximates the optimal constant solution (OCS), i.e., the prediction that minimizes the average loss over the training data without observing the input. We present results showing this phenomenon across 8 datasets with different distributional shifts (including CIFAR10-C and ImageNet-R, S), different loss functions (cross entropy, MSE, and Gaussian NLL), and different architectures (CNNs and transformers). Furthermore, we present an explanation for this behavior, which we first validate empirically and then study theoretically in a simplified setting involving deep homogeneous networks with ReLU activations. Finally, we show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
    摘要 传统观点认为,神经网络在面对分布外(OOD)输入时,其预测往往不可预测且过于自信。我们的工作针对高维输入的神经网络重新审视了这一假设:随着输入越来越偏离训练分布,神经网络的预测并非任意外推,而是趋向一个常数值,且该值通常非常接近最优常数解(OCS),即在不观察输入的情况下使训练数据平均损失最小的预测。我们在 8 个具有不同分布偏移的数据集(包括 CIFAR10-C 和 ImageNet-R、S)、不同损失函数(交叉熵、MSE 和高斯 NLL)以及不同架构(CNN 和 Transformer)上展示了这一现象。我们还对这一行为给出了解释,先在实验上加以验证,再在由 ReLU 激活的深度齐次网络构成的简化设定中进行理论分析。最后,我们展示了如何在实践中利用这些洞见,在存在 OOD 输入的情况下实现风险敏感的决策。
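For concreteness, the optimal constant solution (OCS) referred to above is the best input-independent prediction:

$$
f_{\mathrm{OCS}} \;=\; \arg\min_{c}\; \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}\big(c,\, y_i\big),
$$

which for cross-entropy is the constant logit vector whose softmax matches the empirical label frequencies, and for MSE is the mean of the training targets; the paper's observation is that network outputs drift toward this value as inputs move further out of distribution.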

COMPOSER: Scalable and Robust Modular Policies for Snake Robots

  • paper_url: http://arxiv.org/abs/2310.00871
  • repo_url: None
  • paper_authors: Yuyou Zhang, Yaru Niu, Xingyu Liu, Ding Zhao
  • for: 论文旨在为蛇形机器人开发一种控制策略,利用其超冗余性和灵活性来增强鲁棒性和泛化能力。
  • methods: 论文将蛇形机器人的控制建模为协同多智能体强化学习(MARL)问题,机器人的每一节作为一个独立智能体;并引入自注意力机制以增强智能体之间的协作行为,同时提出高层想象策略,为低层控制策略提供额外奖励作为引导。
  • results: 所提方法 COMPOSER 在目标到达、爬墙、队形构成、穿管和推块五项蛇形机器人任务中均取得了最高成功率,优于集中式基线和四个模块化策略基线;该方法还表现出对模块损坏更强的鲁棒性和显著更优的零样本泛化能力。
    Abstract Snake robots have showcased remarkable compliance and adaptability in their interaction with environments, mirroring the traits of their natural counterparts. While their hyper-redundant and high-dimensional characteristics add to this adaptability, they also pose great challenges to robot control. Instead of perceiving the hyper-redundancy and flexibility of snake robots as mere challenges, there lies an unexplored potential in leveraging these traits to enhance robustness and generalizability at the control policy level. We seek to develop a control policy that effectively breaks down the high dimensionality of snake robots while harnessing their redundancy. In this work, we consider the snake robot as a modular robot and formulate the control of the snake robot as a cooperative Multi-Agent Reinforcement Learning (MARL) problem. Each segment of the snake robot functions as an individual agent. Specifically, we incorporate a self-attention mechanism to enhance the cooperative behavior between agents. A high-level imagination policy is proposed to provide additional rewards to guide the low-level control policy. We validate the proposed method COMPOSER with five snake robot tasks, including goal reaching, wall climbing, shape formation, tube crossing, and block pushing. COMPOSER achieves the highest success rate across all tasks when compared to a centralized baseline and four modular policy baselines. Additionally, we show enhanced robustness against module corruption and significantly superior zero-shot generalizability in our proposed method. The videos of this work are available on our project page: https://sites.google.com/view/composer-snake/.
    摘要 神经骨蟹机器人在与环境互动中表现出了惊人的适应性和灵活性,与其自然对应者一样。尽管神经骨蟹机器人的高级别和多维度特征增加了控制难度,但是这些特征也隐藏了控制策略的不利影响。我们寻求开发一种控制策略,可以有效地将神经骨蟹机器人的高维度特征纳入控制范畴,同时利用其灵活性。在这项工作中,我们将神经骨蟹机器人视为模块化机器人,并将其控制问题形式为合作多智能体学习(MARL)问题。每个神经骨蟹机器人段功能为个体代理。我们采用自注意机制来增强代理之间的合作行为。我们还提出了高级别想象策略,以提供低级别控制策略的引导。我们验证了我们的方法COMPOSER,并在五个神经骨蟹机器人任务中取得了最高成功率,比中央基线和四个模块策略基线更高。此外,我们还证明了我们的方法具有更高的机器人模块损害robustness和零基础学习能力。视频 demo 可以在我们项目页面上找到:https://sites.google.com/view/composer-snake/.
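The abstract above describes per-segment agents coordinated through self-attention. As a rough, self-contained PyTorch sketch of that idea (not the authors' code), the module below embeds each segment's observation, lets segments attend to one another, and decodes per-segment actions; all dimensions and the `SegmentPolicy` name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SegmentPolicy(nn.Module):
    """Hypothetical modular policy: one embedding per snake segment,
    mixed by self-attention, decoded into per-segment joint commands."""

    def __init__(self, obs_dim: int = 16, act_dim: int = 2, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, d_model)          # shared across segments
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.decoder = nn.Linear(d_model, act_dim)          # shared action head

    def forward(self, segment_obs: torch.Tensor) -> torch.Tensor:
        # segment_obs: (batch, n_segments, obs_dim) -- each segment is one agent
        h = self.encoder(segment_obs)
        mixed, _ = self.attn(h, h, h)                       # agents attend to each other
        return torch.tanh(self.decoder(mixed))              # (batch, n_segments, act_dim)

policy = SegmentPolicy()
obs = torch.randn(8, 10, 16)                                # 8 rollouts, 10 segments
print(policy(obs).shape)                                    # torch.Size([8, 10, 2])
```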

Drug Discovery with Dynamic Goal-aware Fragments

  • paper_url: http://arxiv.org/abs/2310.00841
  • repo_url: None
  • paper_authors: Seul Lee, Seanie Lee, Sung Ju Hwang
  • for: 用于药物探索和发现新药候选体
  • methods: 使用目标化学性质信息瓶颈原理提取目标化学性质的重要片段,并将其组装成一个有目标性的片段词典。然后通过增强的碎片修改模块,继续探索和更新碎片词典。
  • results: 通过三个模块的生成循环,GEAM有效地探索和发现了许多有优点的药物候选体。
    Abstract Fragment-based drug discovery is an effective strategy for discovering drug candidates in the vast chemical space, and has been widely employed in molecular generative models. However, many existing fragment extraction methods in such models do not take the target chemical properties into account or rely on heuristic rules. Additionally, the existing fragment-based generative models cannot update the fragment vocabulary with goal-aware fragments newly discovered during the generation. To this end, we propose a molecular generative framework for drug discovery, named Goal-aware fragment Extraction, Assembly, and Modification (GEAM). GEAM consists of three modules, each responsible for goal-aware fragment extraction, fragment assembly, and fragment modification. The fragment extraction module identifies important fragments that contribute to the desired target properties with the information bottleneck principle, thereby constructing an effective goal-aware fragment vocabulary. Moreover, GEAM can explore beyond the initial vocabulary with the fragment modification module, and the exploration is further enhanced through the dynamic goal-aware vocabulary update. We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules in various drug discovery tasks.
    摘要 Fragment-based drug discovery 是一种有效的探索药物候选者的策略,广泛应用于分子生成模型中。然而,许多现有的 Fragment 提取方法在这些模型中并不考虑目标化学性质或者采用规则性的方法。此外,现有的 Fragment-based 生成模型无法在生成过程中更新 Fragment 词汇库,以满足目标化学性质。为此,我们提出了一种用于药物探索的分子生成框架,名为 Goal-aware Fragment Extraction、Assembly、and Modification(GEAM)。 GEAM 包括三个模块,每个模块负责goal-aware Fragment 提取、Fragment 组装和Fragment 修改。 Fragment 提取模块通过信息瓶颈原理来确定重要的 Fragment,以构建有效的目标化学性质相关的 Fragment 词汇库。此外,GEAM 可以在生成过程中超越初始词汇库,并通过动态更新目标化学性质相关的 Fragment 词汇库来进一步增强探索。我们在多个药物探索任务中实验表明,GEAM 能够有效地通过生成模型的三个模块来找到药物候选者。

Subsurface Characterization using Ensemble-based Approaches with Deep Generative Models

  • paper_url: http://arxiv.org/abs/2310.00839
  • repo_url: https://github.com/jichao1/wgan-gp
  • paper_authors: Jichao Bao, Hongkyu Yoon, Jonghyun Lee
  • for: This paper aims to accurately and efficiently estimate spatially distributed properties such as hydraulic conductivity (K) from sparse measurements using a deep generative model and an ensemble-based inversion method.
  • methods: The proposed method combines a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) and Ensemble Smoother with Multiple Data Assimilation (ES-MDA): WGAN-GP generates high-dimensional K fields from a low-dimensional latent space, and ES-MDA updates the latent variables by assimilating available measurements.
  • results: The proposed method accurately characterizes the main features of the unknown K fields with reliable uncertainty quantification and outperforms a widely-used variational inversion approach, especially for channelized and fractured field examples; the ensemble-based approach smooths out the complex objective-function surface during minimization, leading to improved performance.
    Abstract Estimating spatially distributed properties such as hydraulic conductivity (K) from available sparse measurements is a great challenge in subsurface characterization. However, the use of inverse modeling is limited for ill-posed, high-dimensional applications due to computational costs and poor prediction accuracy with sparse datasets. In this paper, we combine Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), a deep generative model that can accurately capture complex subsurface structure, and Ensemble Smoother with Multiple Data Assimilation (ES-MDA), an ensemble-based inversion method, for accurate and accelerated subsurface characterization. WGAN-GP is trained to generate high-dimensional K fields from a low-dimensional latent space and ES-MDA then updates the latent variables by assimilating available measurements. Several subsurface examples are used to evaluate the accuracy and efficiency of the proposed method and the main features of the unknown K fields are characterized accurately with reliable uncertainty quantification. Furthermore, the estimation performance is compared with a widely-used variational, i.e., optimization-based, inversion approach, and the proposed approach outperforms the variational inversion method, especially for the channelized and fractured field examples. We explain such superior performance by visualizing the objective function in the latent space: because of nonlinear and aggressive dimension reduction via generative modeling, the objective function surface becomes extremely complex while the ensemble approximation can smooth out the multi-modal surface during the minimization. This suggests that the ensemble-based approach works well over the variational approach when combined with deep generative models at the cost of forward model runs unless convergence-ensuring modifications are implemented in the variational inversion.
    摘要 估算沿体分布的特性,如水利导能(K),从可用的稀疏测量数据中估算是一项大allenge。然而,因为这类应用的维度太多,使用反向模型受限于计算成本和精度不高。在这篇论文中,我们结合 Wasserstein Generative Adversarial Network with Gradient Penalty(WGAN-GP)和 Ensemble Smoother with Multiple Data Assimilation(ES-MDA),一种深度生成模型和一种ensemble-based倒推方法,以实现高精度和加速的地下特性估算。WGAN-GP是用于生成高维K场的深度生成模型,ES-MDA则将可用测量数据 assimilate到latent变量中。我们使用多个地下示例来评估提案的准确性和效率,并发现提案可以准确地 caracterize unknown K场的主要特征,并提供可靠的不确定量评估。此外,我们与一种广泛使用的变量,即优化基于推理的倒推方法进行比较,并发现提案的方法在渠化和裂隙场示例中表现更优异。我们通过Visualizing the objective function in the latent space来解释这种更好的性能,因为通过非线性和攻击性的维度减少,生成模型可以生成非常复杂的目标函数表面,而ensemble approximation可以在最小化过程中平滑出多模态表面。这表明, ensemble-based方法在Variational inversion方法中表现更好,尤其是在渠化和裂隙场示例中。
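For readers unfamiliar with ES-MDA, the numpy sketch below shows one assimilation pass that updates an ensemble of latent vectors against observations through a placeholder forward model. The `forward_model` map, ensemble sizes, and inflation schedule are illustrative assumptions, not the paper's WGAN-GP setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_model(z: np.ndarray) -> np.ndarray:
    """Placeholder for generator + flow simulation: latent vector -> predicted data."""
    return np.tanh(z[:5]) + 0.1 * z[:5] ** 2               # illustrative nonlinear map

def es_mda_step(Z, d_obs, C_d, alpha):
    """One ES-MDA update of the latent ensemble Z (n_latent x n_ens)."""
    D = np.column_stack([forward_model(z) for z in Z.T])     # predicted data (n_d x n_ens)
    Zc = Z - Z.mean(axis=1, keepdims=True)
    Dc = D - D.mean(axis=1, keepdims=True)
    n_ens = Z.shape[1]
    C_zd = Zc @ Dc.T / (n_ens - 1)                           # latent/data cross-covariance
    C_dd = Dc @ Dc.T / (n_ens - 1)                           # data auto-covariance
    K = C_zd @ np.linalg.inv(C_dd + alpha * C_d)             # Kalman-like gain
    # perturb observations with inflated noise, then update each ensemble member
    noise = rng.multivariate_normal(np.zeros(len(d_obs)), alpha * C_d, size=n_ens).T
    return Z + K @ (d_obs[:, None] + noise - D)

n_latent, n_ens, n_d = 20, 100, 5
Z = rng.standard_normal((n_latent, n_ens))                   # latent ensemble
d_obs = forward_model(rng.standard_normal(n_latent))         # synthetic "measurements"
C_d = 0.01 * np.eye(n_d)
for alpha in [4, 4, 4, 4]:                                   # standard choice: sum of 1/alpha = 1
    Z = es_mda_step(Z, d_obs, C_d, alpha)
```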

Online Sensitivity Optimization in Differentially Private Learning

  • paper_url: http://arxiv.org/abs/2310.00829
  • repo_url: None
  • paper_authors: Filippo Galli, Catuscia Palamidessi, Tommaso Cucinotta
  • for: 这篇研究的目的是为了开发具有隐私保证的机器学习模型,并且需要对个人贡献的限制。
  • methods: 这篇研究使用的方法包括将个人的梯度转换为$2$-norm,并且在批制程中进行数据隐藏。
  • results: 这篇研究的结果显示,这种动态化 clipping 阈值的方法可以与固定的阈值比较,具有相同或更好的性能,并且可以在不同的数据集、模型结构和隐私水平下进行最佳化。
    Abstract Training differentially private machine learning models requires constraining an individual's contribution to the optimization process. This is achieved by clipping the $2$-norm of their gradient at a predetermined threshold prior to averaging and batch sanitization. This selection adversely influences optimization in two opposing ways: it either exacerbates the bias due to excessive clipping at lower values, or augments sanitization noise at higher values. The choice significantly hinges on factors such as the dataset, model architecture, and even varies within the same optimization, demanding meticulous tuning usually accomplished through a grid search. In order to circumvent the privacy expenses incurred in hyperparameter tuning, we present a novel approach to dynamically optimize the clipping threshold. We treat this threshold as an additional learnable parameter, establishing a clean relationship between the threshold and the cost function. This allows us to optimize the former with gradient descent, with minimal repercussions on the overall privacy analysis. Our method is thoroughly assessed against alternative fixed and adaptive strategies across diverse datasets, tasks, model dimensions, and privacy levels. Our results demonstrate its comparable or superior performance in all evaluated scenarios, given the same privacy requirements.
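To make the mechanics concrete, the sketch below runs DP-SGD-style steps for a toy linear model with per-example clipping at a threshold C that is itself updated during training. The threshold update shown is a simple quantile-style rule standing in for the paper's gradient-based optimization of C; the model, data, and constants are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_sgd_step(w, X, y, C, sigma, lr):
    """One DP-SGD step for linear regression: clip per-example gradients to
    L2 norm C, average, and add Gaussian noise scaled by sigma * C."""
    per_ex_grads = [2 * (xi @ w - yi) * xi for xi, yi in zip(X, y)]   # d(loss_i)/dw
    clipped, clip_flags = [], []
    for g in per_ex_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, C / (norm + 1e-12)))
        clip_flags.append(float(norm > C))
    noisy_mean = np.mean(clipped, axis=0) + rng.normal(0.0, sigma * C / len(X), size=w.shape)
    return w - lr * noisy_mean, float(np.mean(clip_flags))

def update_threshold(C, clip_frac, target=0.5, eta=0.2):
    """Stand-in update (not the authors' rule): nudge C so roughly a target
    fraction of per-example gradients ends up being clipped."""
    return C * np.exp(-eta * (target - clip_frac))

d, n = 5, 256
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)

w, C = np.zeros(d), 1.0
for _ in range(200):
    w, frac = dp_sgd_step(w, X, y, C=C, sigma=0.5, lr=0.05)
    C = update_threshold(C, frac)
print("final clipping threshold:", round(C, 3))
```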

eess.IV - 2023-10-02

A Restoration Network as an Implicit Prior

  • paper_url: http://arxiv.org/abs/2310.01391
  • repo_url: None
  • paper_authors: Yuyang Hu, Mauricio Delbracio, Peyman Milanfar, Ulugbek S. Kamilov
  • for: 这个论文是为了解决图像恢复问题而写的。
  • methods: 这个论文使用的方法是使用先行训练的深度神经网络作为恢复运算的假设。
  • results: Numerical results show that the method using a super-resolution prior achieves state-of-the-art performance both quantitatively and qualitatively.
    Abstract Image denoisers have been shown to be powerful priors for solving inverse problems in imaging. In this work, we introduce a generalization of these methods that allows any image restoration network to be used as an implicit prior. The proposed method uses priors specified by deep neural networks pre-trained as general restoration operators. The method provides a principled approach for adapting state-of-the-art restoration models for other inverse problems. Our theoretical result analyzes its convergence to a stationary point of a global functional associated with the restoration operator. Numerical results show that the method using a super-resolution prior achieves state-of-the-art performance both quantitatively and qualitatively. Overall, this work offers a step forward for solving inverse problems by enabling the use of powerful pre-trained restoration models as priors.
    摘要 图像去噪器已经被证明是解析问题的强大先验。在这项工作中,我们介绍了一种扩展这些方法,允许任何图像恢复网络被用作隐藏先验。我们的方法使用由深度神经网络预训练为普通恢复操作器的先验。我们的理论结果分析其趋向于一个全局函数相关的恢复运算的站点点。数值结果表明,使用超解像先验可以达到现代水平的性能 both quantitatively and qualitatively。总之,这项工作为解析问题提供了一个前进,允许使用强大的预训练恢复模型作为先验。
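A generic way to read the idea is as gradient iterations on a data-fidelity term plus a residual that pulls the iterate toward the output of a restoration operator. The numpy sketch below uses a simple smoothing filter as a stand-in for a pretrained restoration network on a toy compressive-sensing problem; it illustrates the plug-in structure, not the paper's exact update or theory.

```python
import numpy as np

rng = np.random.default_rng(2)

def restore(x: np.ndarray) -> np.ndarray:
    """Stand-in for a pretrained restoration network: simple local averaging."""
    return np.convolve(x, np.ones(5) / 5, mode="same")

def solve_inverse(y, A, gamma=0.1, tau=0.5, iters=300):
    """Gradient iterations on data fidelity plus an implicit prior that pulls the
    iterate toward the range of the restoration operator (RED-style residual)."""
    x = A.T @ y                               # simple initialization
    for _ in range(iters):
        grad_fidelity = A.T @ (A @ x - y)     # gradient of 0.5 * ||A x - y||^2
        prior_residual = x - restore(x)       # small when x already "looks restored"
        x = x - gamma * (grad_fidelity + tau * prior_residual)
    return x

n, m = 100, 40
x_true = np.clip(np.cumsum(rng.standard_normal(n)), -3, 3)   # smooth-ish signal
A = rng.standard_normal((m, n)) / np.sqrt(m)                  # compressive measurements
y = A @ x_true + 0.01 * rng.standard_normal(m)
x_hat = solve_inverse(y, A)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```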

Predicting Lung Cancer’s Metastats’ Locations Using Bioclinical Model

  • paper_url: http://arxiv.org/abs/2310.08596
  • repo_url: None
  • paper_authors: Teddy Lazebnik, Svetlana Bunimovich-Mendrazitsky
  • for: 预测肺癌转移(metastasis)的空间扩散
  • methods: 使用三维计算机断层成像(CT)扫描和生物物理学模型
  • results: Experiments validate the bioclinical model on data from 10 patients, achieving 74% accuracy in metastasis location prediction.
    Abstract Lung cancer is a leading cause of cancer-related deaths worldwide. The spread of the disease from its primary site to other parts of the lungs, known as metastasis, significantly impacts the course of treatment. Early identification of metastatic lesions is crucial for prompt and effective treatment, but conventional imaging techniques have limitations in detecting small metastases. In this study, we develop a bioclinical model for predicting the spatial spread of lung cancer's metastasis using a three-dimensional computed tomography (CT) scan. We used a three-layer biological model of cancer spread to predict locations with a high probability of metastasis colonization. We validated the bioclinical model on real-world data from 10 patients, showing promising 74% accuracy in the metastasis location prediction. Our study highlights the potential of the combination of biophysical and ML models to advance the way that lung cancer is diagnosed and treated, by providing a more comprehensive understanding of the spread of the disease and informing treatment decisions.
    摘要 肺癌是全球最主要的肿瘤相关死亡原因之一。肿瘤从主要位点至其他肺部的传播,即肿瘤转移,对治疗诊断产生重要影响。早期识别转移 lesions 非常重要,但传统的成像技术有限制可以探测小转移。在这项研究中,我们开发了一种生物клиниче模型,用于预测肺癌转移的空间扩散。我们使用了三层生物模型来预测可能受感染的位置。我们验证了生物клиниче模型使用实际数据,从10名患者中提取了数据,并达到了74%的准确率。我们的研究表明,将生物物理和机器学习模型相结合,可以推动肺癌的诊断和治疗方法的进步,提供更全面的疾病扩散的理解,以及更加准确的诊断和治疗决策。

Fourier PD and PDUNet: Complex-valued networks to speed-up MR Thermometry during Hyperthermia

  • paper_url: http://arxiv.org/abs/2310.01073
  • repo_url: None
  • paper_authors: Rupali Khatun, Soumick Chatterjee, Christoph Bert, Martin Wadepohl, Manfred Schmidt, Oliver J. Ott, Rainer Fietkau, Andreas Nürnberger, Udo S. Gaipl, Benjamin Frey
  • for: 这个研究的目的是改进下采样MR温度测量数据的重建,以提高采集速度并减少伪影(artefacts)。
  • methods: 这个研究使用深度学习技术来重建高度下采样的MR温度测量数据,并使用了两种不同的深度学习模型:Fourier Primal-Dual网络和Fourier Primal-Dual UNet。
  • results: 研究发现,使用深度学习模型可以将下采样MR温度测量数据与完全采样MR温度测量数据之间的温度差距从1.5 $\degree$C降至0.5 $\degree$C。
    Abstract Hyperthermia (HT) in combination with radio- and/or chemotherapy has become an accepted cancer treatment for distinct solid tumour entities. In HT, tumour tissue is exogenously heated to temperatures of 39 to 43 $\degree$C for 60 minutes. Temperature monitoring can be performed noninvasively using dynamic magnetic resonance imaging (MRI). However, the slow nature of MRI leads to motion artefacts in the images due to the movements of patients during image acquisition time. By discarding parts of the data, the speed of the acquisition can be increased - known as Undersampling. However, due to the invalidation of the Nyquist criterion, the acquired images have lower resolution and can also produce artefacts. The aim of this work was, therefore, to reconstruct highly undersampled MR thermometry acquisitions with better resolution and with less artefacts compared to conventional techniques like compressed sensing. The use of deep learning in the medical field has emerged in recent times, and various studies have shown that deep learning has the potential to solve inverse problems such as MR image reconstruction. However, most of the published work only focusses on the magnitude images, while the phase images are ignored, which are fundamental requirements for MR thermometry. This work, for the first time ever, presents deep learning based solutions for reconstructing undersampled MR thermometry data. Two different deep learning models have been employed here, the Fourier Primal-Dual network and Fourier Primal-Dual UNet, to reconstruct highly undersampled complex images of MR thermometry. It was observed that the method was able to reduce the temperature difference between the undersampled MRIs and the fully sampled MRIs from 1.5 $\degree$C to 0.5 $\degree$C.
    摘要 高级热辐射(HT)在结合放射线和/或化学疗法的抗癌治疗中得到了承认。在HT中,肿瘤组织被外源性加热到39-43℃的温度,持续60分钟。肿瘤组织的温度可以非侵入性地监测使用动力磁共振成像(MRI)。然而,由于患者在获取图像时的运动,MRI图像会受到运动artefacts的影响。通过抛弃一部分数据,可以快速化图像获取过程 - bekannt als Undersampling。然而,由于遵循 Nyquist критериion的无效化,获取的图像具有更低的分辨率,并且可能会产生artefacts。因此,本研究的目标是使用深度学习来重建高度受抽象的 MR 热图像数据,以提高分辨率和减少artefacts,并且不同于传统的压缩感知技术。深度学习在医学领域的应用已经在最近几年得到了广泛的关注,而且许多研究表明,深度学习有可能解决 inverse 问题,如 MR 图像重建。然而,大多数已发表的研究仅关注了 magnitude 图像,而忽略了阶跃图像,这些图像是 MR 热测量的基本需求。本研究是首次使用深度学习来重建高度受抽象的 MR 热图像数据。我们在这里采用了两种深度学习模型:Fourier Primal-Dual 网络和 Fourier Primal-Dual UNet,来重建高度受抽象的 MR 热图像。我们发现,该方法可以将高级受抽象 MR 热图像和完全样本 MR 热图像之间的温度差降低至0.5℃。
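The reason phase images matter here is that MR thermometry maps temperature change from phase differences via the proton resonance frequency (PRF) shift. The numpy sketch below shows only that last conversion step with typical constants (TE, B0, and the PRF coefficient are assumptions); the undersampled reconstruction networks themselves are not reproduced.

```python
import numpy as np

def prf_temperature_map(img_ref: np.ndarray, img_heat: np.ndarray,
                        te: float = 0.010, b0: float = 3.0,
                        alpha: float = -0.01e-6) -> np.ndarray:
    """Temperature change map from two complex MR images via the PRF shift.

    te    : echo time in seconds
    b0    : main field strength in tesla
    alpha : PRF change coefficient, about -0.01 ppm per degree C
    """
    gamma = 2 * np.pi * 42.577e6                      # proton gyromagnetic ratio, rad/s/T
    # phase difference taken from the complex ratio to avoid wrapping issues
    dphi = np.angle(img_heat * np.conj(img_ref))
    return dphi / (gamma * alpha * b0 * te)           # degrees Celsius

# toy example: a uniform phantom with a +3 degree hot spot in the centre
ref = np.ones((64, 64), dtype=complex)
dT_true = np.zeros((64, 64)); dT_true[24:40, 24:40] = 3.0
gamma, alpha, b0, te = 2 * np.pi * 42.577e6, -0.01e-6, 3.0, 0.010
heated = ref * np.exp(1j * gamma * alpha * b0 * te * dT_true)
print(prf_temperature_map(ref, heated)[32, 32])       # ~3.0
```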

eess.SP - 2023-10-02

OFDM-RSMA: Robust Transmission under Inter-Carrier Interference

  • paper_url: http://arxiv.org/abs/2310.01686
  • repo_url: None
  • paper_authors: Mehmet Mert Sahin, Onur Dizdar, Bruno Clerckx, Huseyin Arslan
  • for: 这篇论文目的是提出一种基于rate-splitting多访问(RSMA)和orthogonal frequency division multiplexing(OFDM)的下行传输方案,以提高多用户多antenna系统中的性能。
  • methods: 该论文使用了weighted minimum mean-square error(WMMSE)算法来解决非对称问题,并且利用了RSMA的干扰管理能力来处理多个干扰源的干扰。
  • results: 论文的计算结果表明,在不同的媒体通道条件下,提案的OFDM-RSMA方案可以与OFDMA和NOMA方案相比,具有更高的总速率性能。
    Abstract Rate-splitting multiple access (RSMA) is a multiple access scheme to mitigate the effects of the multi-user interference (MUI) in multi-antenna systems. In this study, we leverage the interference management capabilities of RSMA to tackle the issue of inter-carrier interference (ICI) in orthogonal frequency division multiplexing (OFDM) waveform. We formulate a sum-rate maximization problem to find the optimal subcarrier and power allocation for downlink transmission in a two-user system using RSMA and OFDM. A weighted minimum mean-square error (WMMSE)-based algorithm is proposed to obtain a solution for the formulated non-convex problem. We show that the marriage of rate-splitting (RS) with OFDM provides complementary strengths to cope with peculiar characteristic of wireless medium and its performance-limiting challenges including inter-symbol interference (ISI), MUI, ICI, and inter-numerology interference (INI). The sum-rate performance of the proposed OFDM-RSMA scheme is numerically compared with that of conventional orthogonal frequency division multiple access (OFDMA) and OFDM-non-orthogonal multiple access (NOMA). It is shown that the proposed OFDM-RSMA outperforms OFDM-NOMA and OFDMA in diverse propagation channel conditions owing to its flexible structure and robust interference management capabilities.
    摘要 rate-splitting多Access(RSMA)是一种多Access方案,用于 Mitigate the effects of multi-user interference(MUI)in multi-antenna systems. In this study, we leverage the interference management capabilities of RSMA to tackle the issue of inter-carrier interference(ICI)in orthogonal frequency division multiplexing(OFDM)waveform. We formulate a sum-rate maximization problem to find the optimal subcarrier and power allocation for downlink transmission in a two-user system using RSMA and OFDM. A weighted minimum mean-square error(WMMSE)-based algorithm is proposed to obtain a solution for the formulated non-convex problem. We show that the marriage of rate-splitting(RS)with OFDM provides complementary strengths to cope with the peculiar characteristic of wireless medium and its performance-limiting challenges including inter-symbol interference(ISI), MUI, ICI, and inter-numerology interference(INI). The sum-rate performance of the proposed OFDM-RSMA scheme is numerically compared with that of conventional orthogonal frequency division multiple access(OFDMA)and OFDM-non-orthogonal multiple access(NOMA). It is shown that the proposed OFDM-RSMA outperforms OFDM-NOMA and OFDMA in diverse propagation channel conditions owing to its flexible structure and robust interference management capabilities.
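To make the rate-splitting idea concrete, the numpy sketch below evaluates the achievable rates of 1-layer RSMA for two users on a single subcarrier: a common stream that both users must decode plus private streams decoded after the common part is removed. The channels and precoders are random placeholders, not the WMMSE solution from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def rsma_rates(h1, h2, p_c, p1, p2, noise=1.0):
    """Achievable rates of 1-layer RSMA for two users on one subcarrier.
    The common stream is decoded first (treating both private streams as noise)
    and must be decodable by both users; each private stream is then decoded
    after the common part is removed."""
    def pw(h, p):                      # received power of precoded stream p at channel h
        return np.abs(np.vdot(h, p)) ** 2

    rc1 = np.log2(1 + pw(h1, p_c) / (pw(h1, p1) + pw(h1, p2) + noise))
    rc2 = np.log2(1 + pw(h2, p_c) / (pw(h2, p1) + pw(h2, p2) + noise))
    r_common = min(rc1, rc2)           # shared between the users by the message split
    r1 = np.log2(1 + pw(h1, p1) / (pw(h1, p2) + noise))
    r2 = np.log2(1 + pw(h2, p2) / (pw(h2, p1) + noise))
    return r_common, r1, r2

n_tx = 4
h1 = (rng.standard_normal(n_tx) + 1j * rng.standard_normal(n_tx)) / np.sqrt(2)
h2 = (rng.standard_normal(n_tx) + 1j * rng.standard_normal(n_tx)) / np.sqrt(2)
# placeholder precoders (matched filters, unit power); the paper optimizes these with WMMSE
p_c = (h1 + h2) / np.linalg.norm(h1 + h2)
p1, p2 = h1 / np.linalg.norm(h1), h2 / np.linalg.norm(h2)
r_c, r1, r2 = rsma_rates(h1, h2, p_c, p1, p2)
print("sum rate:", r_c + r1 + r2)
```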

Data-driven Forced Oscillation Localization using Inferred Impulse Responses

  • paper_url: http://arxiv.org/abs/2310.01656
  • repo_url: https://github.com/ShaohuiLiu/fo_local
  • paper_authors: Shaohui Liu, Hao Zhu, Vassilis Kekatos
  • for: 本文旨在推断强制振荡(FO)的来源,只使用同步phasor测量数据。
  • methods: 提议的数据驱动框架使用快速环境数据进行小信号响应的恢复,不需要系统模型。在FO事件发生时,使用预先恢复的冲击响应进行频域分析,并使用LS误差函数进行适应。
  • results: 数字验证表明该方法可以应用于实际的电力系统,包括非线性、高阶动力学和控制效应。总的来说,该方法可以准确地推断FO的来源,并且可以扩展到不同的测量类型和部分感知覆盖条件。
    Abstract Poorly damped oscillations pose threats to the stability and reliability of interconnected power systems. In this work, we propose a comprehensive data-driven framework for inferring the sources of forced oscillation (FO) using only synchrophasor measurements. During normal grid operations, fast-rate ambient data are collected to recover the impulse responses in the small-signal regime, without requiring the system models. When FO events occur, the source is estimated based on the frequency domain analysis by fitting the least-squares (LS) error for the FO data using the impulse responses recovered previously. Although the proposed framework is purely data-driven, the result has been established theoretically via model-based analysis of linearized dynamics under a few realistic assumptions. Numerical validations demonstrate its applicability to realistic power systems including nonlinear, higher-order dynamics with control effects using the IEEE 68-bus system. The generalizability of the proposed methodology has been validated using different types of measurements and partial sensor coverage conditions.
    摘要 低刚性振荡会对电力系统稳定性和可靠性提出威胁。在这项工作中,我们提议了一个全面的数据驱动方法,使用同步phasor测量来推测强制振荡(FO)的来源。在正常网络运行时, быстро速的 ambient数据被收集来恢复小信号域中的冲击响应,无需系统模型。当FO事件发生时,源的估计基于频域分析,通过LS误差适应FO数据使用先前恢复的冲击响应进行适应。 although the proposed framework is purely data-driven, the result has been established theoretically via model-based analysis of linearized dynamics under a few realistic assumptions. numerical validations demonstrate its applicability to realistic power systems including nonlinear, higher-order dynamics with control effects using the IEEE 68-bus system. The generalizability of the proposed methodology has been validated using different types of measurements and partial sensor coverage conditions.
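The localization step can be pictured as follows: with frequency responses from each candidate source to the measurements already identified from ambient data, pick the source whose responses explain the observed oscillation with the smallest least-squares error. The numpy sketch below does this for a random toy system; it is not the paper's power-system model.

```python
import numpy as np

rng = np.random.default_rng(4)

n_meas, n_src, n_freq = 6, 4, 64
# frequency responses H[k][:, j] from candidate source j to all measurements,
# assumed already identified from ambient data (random stand-in here)
H = rng.standard_normal((n_freq, n_meas, n_src)) + 1j * rng.standard_normal((n_freq, n_meas, n_src))

# simulate a forced oscillation injected at the true source
true_src = 2
u = rng.standard_normal(n_freq) + 1j * rng.standard_normal(n_freq)      # unknown forcing spectrum
Y = np.stack([H[k][:, true_src] * u[k] for k in range(n_freq)])         # measured spectra
Y += 0.05 * (rng.standard_normal(Y.shape) + 1j * rng.standard_normal(Y.shape))

def ls_residual(Y, H, j):
    """Total LS error when explaining Y with source j and the best per-frequency forcing."""
    err = 0.0
    for k in range(n_freq):
        h = H[k][:, j]
        u_hat = np.vdot(h, Y[k]) / np.vdot(h, h)      # LS estimate of the forcing at this bin
        err += np.linalg.norm(Y[k] - h * u_hat) ** 2
    return err

residuals = [ls_residual(Y, H, j) for j in range(n_src)]
print("estimated source:", int(np.argmin(residuals)), "(true:", true_src, ")")
```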

Near-field Integrated Sensing and Communication: Opportunities and Challenges

  • paper_url: http://arxiv.org/abs/2310.01342
  • repo_url: None
  • paper_authors: Jiayi Cong, Changsheng You, Jiapeng Li, Li Chen, Beixiong Zheng, Yuanwei Liu, Wen Wu, Yi Gong, Shi Jin, Rui Zhang
  • for: investigate the near-field ISAC, which integrates sensing and communication in the near-field region
  • methods: joint near-field communication and sensing, sensing-assisted near-field communication, and communication-assisted near-field sensing
  • results: new research opportunities, new design issues, and promising solutions for near-field ISAC
    Abstract With the extremely large-scale array XL-array deployed in future wireless systems, wireless communication and sensing are expected to operate in the radiative near-field region, which needs to be characterized by the spherical rather than planar wavefronts. Unlike most existing works that considered far-field integrated sensing and communication (ISAC), we study in this article the new near-field ISAC, which integrates both functions of sensing and communication in the near-field region. To this end, we first discuss the appealing advantages of near-field communication and sensing over their far-field counterparts, respectively. Then, we introduce three approaches for near-field ISAC, including joint near-field communication and sensing, sensing-assisted near-field communication, and communication-assisted near-field sensing. We discuss their individual research opportunities, new design issues, as well as propose promising solutions. Finally, several important directions in near-field ISAC are also highlighted to motivate future work.
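The modelling change underlying near-field ISAC is the switch from planar to spherical wavefronts. The small numpy example below contrasts the two array response models for a uniform linear array and prints how poorly the far-field model matches a source placed well inside the Rayleigh distance; the geometry values are illustrative assumptions.

```python
import numpy as np

def ula_positions(n, d):
    """Element x-coordinates of a centred uniform linear array."""
    return (np.arange(n) - (n - 1) / 2) * d

def far_field_steering(n, d, lam, theta):
    """Planar wavefront: phase varies linearly across the aperture."""
    k = 2 * np.pi / lam
    return np.exp(-1j * k * ula_positions(n, d) * np.sin(theta))

def near_field_steering(n, d, lam, theta, r):
    """Spherical wavefront: phase follows the exact element-to-source distance."""
    k = 2 * np.pi / lam
    x = ula_positions(n, d)
    sx, sy = r * np.sin(theta), r * np.cos(theta)           # source position in the array plane
    dist = np.sqrt((sx - x) ** 2 + sy ** 2)
    return np.exp(-1j * k * (dist - r))                     # phase referenced to the array centre

n, lam = 128, 0.01                                          # 128 elements at ~30 GHz (1 cm wavelength)
d, theta = lam / 2, np.deg2rad(20)
a_far = far_field_steering(n, d, lam, theta)
a_near = near_field_steering(n, d, lam, theta, r=3.0)       # source only 3 m away
mismatch = 1 - np.abs(np.vdot(a_far, a_near)) / n
print(f"far-field model mismatch at 3 m: {mismatch:.2f}")   # large: planar model breaks down
```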

Cell-Free Bistatic Backscatter Communication: Channel Estimation, Optimization, and Performance Analysis

  • paper_url: http://arxiv.org/abs/2310.01264
  • repo_url: None
  • paper_authors: Diluka Galappaththige, Fatemeh Rezaei, Chintha Tellambura, Amine Maaref
  • for: 这项研究旨在探讨和实现基于终端无尘核心架构和反射式回播通信(BiBC)的未来(EH)基于互联网物联网(IoT)网络中的潜在应用。
  • methods: 我们首先提出了一种频道估计方案,用于估计提案系统集成的直接、叠加、前向通道。然后,我们使用频道估计来设计最佳的AP束重、标签反射率和接收器滤波器,以最大化标签总速率,并满足标签的最小能量需求。由于提案的最大化问题是非凸的,我们提出了一种基于备选优化、分配Programming和几何Quotient技术的解决方案。
  • results: 我们 presente extensa numerical results,用于验证我们的频道估计方案和优化框架,以及提案的集成性能。与Random Beamforming/Combining benchmark相比,我们的算法实现了非常出色的提升。例如,它在10dBm的输入电平下,使用36个AP和3个标签时,实现了约64.8%和约253.5%的能量收集和标签总速率提升。
    Abstract This study introduces and investigates the integration of a cell-free architecture with bistatic backscatter communication (BiBC), referred to as cell-free BiBC or distributed access point (AP)-assisted BiBC, which can enable potential applications in future (EH)-based Internet-of-Things (IoT) networks. To that purpose, we first present a pilot-based channel estimation scheme for estimating the direct, cascaded, forward channels of the proposed system setup. We next utilize the channel estimates for designing the optimal beamforming weights at the APs, reflection coefficients at the tags, and reception filters at the reader to maximize the tag sum rate while meeting the tags' minimum energy requirements. Because the proposed maximization problem is non-convex, we propose a solution based on alternative optimization, fractional programming, and Rayleigh quotient techniques. We also quantify the computational complexity of the developed algorithms. Finally, we present extensive numerical results to validate the proposed channel estimation scheme and optimization framework, as well as the performance of the integration of these two technologies. Compared to the random beamforming/combining benchmark, our algorithm yields impressive gains. For example, it achieves $\sim$ 64.8\% and $\sim$ 253.5\% gains in harvested power and tag sum rate, respectively, for 10 dBm with 36 APs and 3 tags.

Wireless strain and temperature monitoring in reinforced concrete using Surface Acoustic Wave (SAW) sensors

  • paper_url: http://arxiv.org/abs/2310.03765
  • repo_url: None
  • paper_authors: Pierre Jeltiri, Firas Al-Mahmoud, Rémi Boissière, Baptiste Paulmier, Tony Makdissy, Omar Elmazria, Pascal Nicolay, Sami Hage-Ali
  • for: 这个论文的目的是为了监测土木工程结构的健康状况,并通过植入减形、温度和腐蚀传感器来提高维护和延长服务寿命。
  • methods: 这个论文使用了商业SAW设备,将其附加到钢矱上,以测量混凝土梁的减形和温度。不需要电缆或内置电子设备。
  • results: 研究发现,SAW传感器可以有效地测量混凝土梁的减形和温度,并且可以持续三周不间断地测量温度。
    Abstract Monitoring the health of civil engineering structures using implanted deformation, temperature and corrosion sensors would further improve maintenance and extend the service life of those structures. However, sensor integration poses a number of problems, due to the presence of cables and on-board electronics. Passive, wireless SAW sensors offer a very promising solution, here. We used commercial SAW devices mounted on steel rebars to carry out an initial feasibility study. Without cables or embedded electronics, we were able to measure the deformation of a concrete beam subjected to bending load. We were also able to measure the temperature continuously over a three-week period.
    摘要 监测土木工程结构的健康状况时,使用植入的变形、温度和腐蚀传感器可以进一步改善维护并延长结构的服务寿命。但是传感器集成带来了一些问题,即需要电缆和机载电子设备。无源、无线的SAW传感器在这里提供了非常有前景的解决方案。我们使用安装在钢筋上的商用SAW器件进行了一项初步可行性研究。无需电缆或嵌入式电子设备,我们成功测量了一根受弯曲荷载的混凝土梁的变形,并在三个星期内连续测量了温度。

Generative AI for Integrated Sensing and Communication: Insights from the Physical Layer Perspective

  • paper_url: http://arxiv.org/abs/2310.01036
  • repo_url: None
  • paper_authors: Jiacheng Wang, Hongyang Du, Dusit Niyato, Jiawen Kang, Shuguang Cui, Xuemin Shen, Ping Zhang
  • for: 本研究探讨了基于生成人工智能(GAI)的物理层应用,特别是整合感知通信(ISAC)系统的支持。
  • methods: 本文首先提供了GAI和ISAC的概述,并详细介绍了GAI在物理层上的应用,包括通道估计等方面。
  • results: 在实验中,提出的基于扩散模型的方法能够在近场条件下高精度地估计信号到达方向,均方误差为1.03度,这证实了GAI对物理层的支持。
    Abstract As generative artificial intelligence (GAI) models continue to evolve, their generative capabilities are increasingly enhanced and being used extensively in content generation. Beyond this, GAI also excels in data modeling and analysis, benefitting wireless communication systems. In this article, we investigate applications of GAI in the physical layer and analyze its support for integrated sensing and communications (ISAC) systems. Specifically, we first provide an overview of GAI and ISAC, touching on GAI's potential support across multiple layers of ISAC. We then concentrate on the physical layer, investigating GAI's applications from various perspectives thoroughly, such as channel estimation, and demonstrate the value of these GAI-enhanced physical layer technologies for ISAC systems. In the case study, the proposed diffusion model-based method effectively estimates the signal direction of arrival under the near-field condition based on the uniform linear array, when antenna spacing surpassing half the wavelength. With a mean square error of 1.03 degrees, it confirms GAI's support for the physical layer in near-field sensing and communications.
    摘要 如果生成人工智能(GAI)模型继续演化,它们的生成能力将会不断增强,并在内容生成中得到广泛的应用。此外,GAI还在数据模型化和分析方面表现出色,对无线通信系统产生了很大的影响。在这篇文章中,我们 investigate GAI在物理层中的应用,并分析它在集成感知通信(ISAC)系统中的支持。我们首先提供GAI和ISAC的概述,然后专门关注物理层,从多种角度进行深入的研究,例如通道估计,并证明GAI在物理层技术方面对ISAC系统提供了价值。在 caso study中,我们提出了基于扩散模型的方法,用于估计信号到来方向,当antenna spacing超过半波长时。该方法的均方差为1.03度,确认了GAI在近场感知通信中的支持。

Joint Source-Channel Coding System for 6G Communication: Design, Prototype and Future Directions

  • paper_url: http://arxiv.org/abs/2310.01024
  • repo_url: None
  • paper_authors: Xinchao Zhong, Sean Longyu Ma, Hong-fu Chou, Arsham Mostaani, Thang X. Vu, Symeon Chatzinotas
  • for: 本研究旨在超越逻辑 communicate 的优化问题,即将发送源和通道协同设计 source coding 和 channel coding 的整合,以实现未来通信系统的高效性和可靠性。
  • methods: 本研究使用了 flexible structural hardware design 和 joint source-channel coding (JSCC) 技术,以实现源字符串的多样性和 unity code rate。另外,本研究还使用了 Unequal Error Protection (UEP) 技术,以保持 semantic importance 的恢复。
  • results: 本研究表明,使用 quasi-cyclic (QC) 特点和 UEP 技术可以实现高效的 semantic communication,并且可以在各种通信频率和信道条件下实现优秀的表达效果。
    Abstract The goal of semantic communication is to surpass optimal Shannon's criterion regarding a notable problem for future communication which lies in the integration of collaborative efforts between the intelligence of the transmission source and the joint design of source coding and channel coding. The convergence of scholarly investigation and applicable products in the field of semantic communication is facilitated by the utilization of flexible structural hardware design, which is constrained by the computational capabilities of edge devices. This characteristic represents a significant benefit of joint source-channel coding (JSCC), as it enables the generation of source alphabets with diverse lengths and achieves a code rate of unity. Moreover, JSCC exhibits near-capacity performance while maintaining low complexity. Therefore, we leverage not only quasi-cyclic (QC) characteristics to propose a QC-LDPC code-based JSCC scheme but also Unequal Error Protection (UEP) to ensure the recovery of semantic importance. In this study, the feasibility for using a semantic encoder/decoder that is aware of UEP can be explored based on the existing JSCC system. This approach is aimed at protecting the significance of semantic task-oriented information. Additionally, the deployment of a JSCC system can be facilitated by employing Low-Density Parity-Check (LDPC) codes on a reconfigurable device. This is achieved by reconstructing the LDPC codes as QC-LDPC codes. The QC-LDPC layered decoding technique, which has been specifically optimized for hardware parallelism and tailored for channel decoding applications, can be suitably adapted to accommodate the JSCC system. The performance of the proposed system is evaluated by conducting BER measurements using both floating-point and 6-bit quantization.
    摘要 《semantic communication的目标是超越希耶纳的最佳吞吐量标准,解决未来通信中的一个重要问题,即源传输和通道编码的共同设计。学术研究和实用产品在semantic communication领域的结合,得益于灵活的结构硬件设计,这种设计受到边缘设备的计算能力的限制。这种特点是JSCC的一大优点,它可以生成源字母的不同长度,实现码率unity,同时具有低复杂性和高效率。因此,我们不仅利用逻辑(QC)特征,还提出了LDPC码基于JSCC方案,并通过不等错误保护(UEP)确保 semantic importance的恢复。在这项研究中,我们可以通过现有JSCC系统的semantic编码/解码器来探索使用UEP的可能性。这种方法旨在保护 semantic任务关键信息的重要性。此外,通过在可重新配置的设备上使用LDPC码,我们可以方便地实现JSCC系统的部署。通过重新构造LDPC码为QC-LDPC码,我们可以采用QC-LDPC层次解码技术,这种技术已经特Optimized for hardware parallelism和tailored for channel decoding应用。我们通过使用浮点数和6比特量化进行BER测量来评估提议的系统性能。

Magnetic SAW RFID Sensor Based on Love Wave for Detection of Magnetic Field and Temperature

  • paper_url: http://arxiv.org/abs/2310.03764
  • repo_url: None
  • paper_authors: Prince Mengue, Laurine Meistersheim, Sami Hage-Ali, Cécile Floer, Sébastien Petit-Watelot, Daniel Lacour, Michel Hehn, Omar Elmazria
  • for: 这个研究旨在开发一种带温度补偿的磁场测量系统。
  • methods: 该研究使用了一种抗频率干扰的磁场探测器,基于ZnO/LiNbO$_3$ Ycut (X轴) 层结构,并使用了Co-Fe-B敏感层来探测磁场变化。
  • results: 研究表明,该传感器展现出了高温度和磁场敏感度,分别为-63 ppm/$^\circ$C和-781 ppm/mT。此外,该传感器通过差分测量实现了温度补偿,并兼具多传感器功能和RFID功能。
    Abstract Magnetic field measurement including a temperature compensation is essential for a magnetic field sensor. This study investigates a magnetic surface acoustic wave (MSAW) sensor in a reflective delay line configuration with two acoustic propagation paths with and without magnetic field sensitive layer. The delay in path with sensitive layer leads to magnetic field detection and the one without enable temperature measurement and thus compensation for the first path. The developed sensor is based on a ZnO/LiNbO$_3$ Ycut (X-direction) layered structure as Love wave platform. Love wave as a shear wave being more favorable for magnetic detection. Co-Fe-B is considered as sensitive layer to detect magnetic field changes and is deposited on the top of ZnO, but only on one of the two paths. We combined an original configuration of connected IDTs with a high electromechanical coupling coefficient (K$^2$) mode to improve the signal amplitude. The achieved sensor exhibits a high temperature and magnetic field sensitivity of -63 ppm/$^\circ$C and -781 ppm/mT, respectively. The temperature compensation method for magnetic field measurement is demonstrated using a differential measurement by subtracting the delay times obtained for the two paths with and without the sensitive layer. Finally, The sensor exhibited good repeatability at various temperatures. Moreover, the device developed allows in addition to the multisensor functionality, the radio frequency identification (RFID) which is necessary for the deployment of sensor networks.
    摘要 Magnetic field measurement with temperature compensation is essential for a magnetic field sensor. This study investigates a magnetic surface acoustic wave (MSAW) sensor in a reflective delay-line configuration with two acoustic propagation paths, with and without a magnetic-field-sensitive layer. The delay on the path with the sensitive layer provides magnetic field detection, while the path without it allows the temperature to be measured and thus compensated in the first path. The sensor is based on a ZnO/LiNbO$_3$ Y-cut (X-direction) layered structure acting as a Love-wave platform; the Love wave, being a shear wave, is more favourable for magnetic detection. A Co-Fe-B layer deposited on top of the ZnO, on only one of the two paths, detects magnetic field changes. An original configuration of connected IDTs is combined with a high electromechanical coupling coefficient (K$^2$) mode to improve the signal amplitude. The sensor exhibits high temperature and magnetic field sensitivities of -63 ppm/$^\circ$C and -781 ppm/mT, respectively. The temperature compensation method for magnetic field measurement is demonstrated with a differential measurement, subtracting the delay times obtained for the two paths with and without the sensitive layer. The sensor also shows good repeatability at various temperatures and, in addition to the multi-sensor functionality, supports the radio frequency identification (RFID) needed to deploy sensor networks.
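The dual-path compensation reduces to simple arithmetic: the reference path gives temperature from its delay shift, and the differential shift between the two paths isolates the magnetic field. The sketch below uses the sensitivities quoted in the abstract; the assumption that both paths share the same temperature sensitivity, and all numbers in the example reading, are ours.

```python
# Minimal arithmetic sketch of the dual-path compensation described above.
# Assumption (ours, for illustration): both propagation paths share the same
# temperature sensitivity, and only the Co-Fe-B path responds to the field.
S_T = -63e-6      # relative delay shift per degree C  (-63 ppm/C, from the abstract)
S_B = -781e-6     # relative delay shift per mT        (-781 ppm/mT, from the abstract)

def read_sensor(d_ref: float, d_mag: float) -> tuple[float, float]:
    """d_ref / d_mag: relative delay shifts of the reference and magnetic paths."""
    delta_T = d_ref / S_T                      # temperature from the reference path
    delta_B = (d_mag - d_ref) / S_B            # field from the differential shift
    return delta_T, delta_B

# synthetic reading: +10 C warming and a 2 mT field
d_ref = S_T * 10.0
d_mag = S_T * 10.0 + S_B * 2.0
print(read_sensor(d_ref, d_mag))               # (10.0, 2.0)
```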

Managing the Impact of Sensor’s Thermal Noise in Machine Learning for Nuclear Applications

  • paper_url: http://arxiv.org/abs/2310.01014
  • repo_url: None
  • paper_authors: Issam Hammad
  • for: 这个论文是为了探讨加拧度、磁力和自转仪在核电厂中进行测量时的噪声问题,以及这些噪声对于基于感知器的机器学习模型的影响。
  • methods: 这篇论文使用了机器学习技术,以analyze the impact of thermal noise on sensor-fusion-based machine learning models in nuclear applications.
  • results: 该论文发现,在生产环境中部署机器学习模型时,温度噪声会导致感知器的精度下降,从而影响模型的准确性。此外,论文还发现了不同机器学习算法对于温度噪声的影响不同,选择更加鲜硬的模型可以减轻这种影响。
    Abstract Sensors such as accelerometers, magnetometers, and gyroscopes are frequently utilized to perform measurements in nuclear power plants. For example, accelerometers are used for vibration monitoring of critical systems. With the recent rise of machine learning, data captured from such sensors can be used to build machine learning models for predictive maintenance and automation. However, these sensors are known to have thermal noise that can affect the sensor's accuracy. Thermal noise differs between sensors in terms of signal-to-noise ratio (SNR). This thermal noise will cause an accuracy drop in sensor-fusion-based machine learning models when deployed in production. This paper lists some applications for Canada Deuterium Uranium (CANDU) reactors where such sensors are used and therefore can be impacted by the thermal noise issue if machine learning is utilized. A list of recommendations to help mitigate the issue when building future machine learning models for nuclear applications based on sensor fusion is provided. Additionally, this paper demonstrates that machine learning algorithms can be impacted differently by the issue, therefore selecting a more resilient model can help in mitigating it.
    摘要 感知器如加速计、磁计和陀螺仪在核电厂中广泛应用于测量。例如,加速计用于关键系统的振荡监测。随着机器学习的兴起,从感知器获取的数据可以用于建立机器学习模型,以实现预测维护和自动化。然而,感知器受到热噪声的影响,可能导致感知器的精度下降。热噪声 между感知器不同,这会导致机器学习模型在生产环境中的精度下降。这篇论文介绍了加拿大氘气氘化燃料(CANDU)堆反应器中的应用,因此可能受到热噪声问题的影响。此外,文章还提供了一些建议来减轻这一问题,以及选择更鲁容的机器学习模型,以减轻其影响。此外,文章还示出了不同的机器学习算法受到热噪声问题的影响,因此选择更鲁容的机器学习模型可以减轻这一问题。
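One of the practical recommendations, checking how a trained model degrades as sensor SNR drops, can be scripted directly. The sketch below injects white Gaussian noise at a chosen SNR into synthetic "fused sensor" features and re-evaluates a simple classifier; the data and model are placeholders, and only the SNR-based noise injection is the point.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

def add_thermal_noise(X: np.ndarray, snr_db: float) -> np.ndarray:
    """Add white Gaussian noise so each feature channel hits the requested SNR."""
    sig_power = np.mean(X ** 2, axis=0, keepdims=True)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return X + rng.normal(0.0, np.sqrt(noise_power), size=X.shape)

# placeholder "sensor fusion" data: two classes separated across 6 fused channels
n = 2000
labels = rng.integers(0, 2, n)
X = rng.standard_normal((n, 6)) + 0.8 * labels[:, None]
clf = LogisticRegression().fit(X, labels)

for snr in [40, 20, 10, 5, 0]:
    acc = clf.score(add_thermal_noise(X, snr), labels)
    print(f"SNR {snr:3d} dB -> accuracy {acc:.3f}")
```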

Enhancing Secrecy Capacity in PLS Communication with NORAN based on Pilot Information Codebooks

  • paper_url: http://arxiv.org/abs/2310.01453
  • repo_url: None
  • paper_authors: Yebo Gu, Tao Shen, Jian Song, Qingbo Wang
  • for: 这篇论文是关于非正交人工噪声(NORAN)的研究,旨在提高通信系统的安全性和保密性。
  • methods: 这篇论文提出了一种基于导频信息码本(PIC)的新的NORAN方案,该方案可以在不增加合法信道(LC)噪声的情况下提高保密性能。
  • results: 数值仿真和分析表明,使用基于PIC的NORAN方案可以显著提高通信系统的保密容量和安全性。
    Abstract In recent research, non-orthogonal artificial noise (NORAN) has been proposed as an alternative to orthogonal artificial noise (AN). However, NORAN introduces additional noise into the channel, which reduces the capacity of the legitimate channel (LC). At the same time, selecting a NORAN design with ideal security performance from a large number of design options is also a challenging problem. To address these two issues, a novel NORAN based on a pilot information codebook is proposed in this letter. The codebook associates different suboptimal NORANs with pilot information as the key under different channel state information (CSI). The receiver interrogates the codebook using the pilot information to obtain the NORAN that the transmitter will transmit in the next moment, in order to eliminate the NORAN when receiving information. Therefore, NORAN based on pilot information codebooks can improve the secrecy capacity (SC) of the communication system by directly using suboptimal NORAN design schemes without increasing the noise in the LC. Numerical simulations and analyses show that the introduction of NORAN with a novel design using pilot information codebooks significantly enhances the security and improves the SC of the communication system.

A Resource-efficient FIR Filter Design Based on an RAG Improved Algorithm

  • paper_url: http://arxiv.org/abs/2310.00912
  • repo_url: None
  • paper_authors: Mengwei Hu, Zhengxiong Li, Xianyang Jiang
  • for: 这个论文主要用于提出一种高效的数字滤波器芯片设计方法,以优化资源利用率。
  • methods: 该论文使用了一种改进的RAG算法,以减少 multiplication 函数的硬件资源消耗。
  • results: 对比各种算法和矩阵大小,实验结果显示,提出的算法在逻辑资源利用率、资源分配策略、运行速度和功耗消耗等方面具有优势。
    Abstract In modern digital filter chip design, efficient resource utilization is a hot topic. Due to the linear phase characteristics of FIR filters, a pulsed fully parallel structure can be applied to address the problem. To further reduce hardware resource consumption, especially related to multiplication functions, an improved RAG algorithm has been proposed. Filters with different orders and for different algorithms have been compared, and the experimental results show that the improved RAG algorithm excels in terms of logic resource utilization, resource allocation, running speed, and power consumption under various application scenarios. The proposed algorithm introduces a better circuit structure for FIR filters, fully leveraging resource allocation strategies to reduce logic resource consumption. The proposed circuit is faster and more stable, making it suitable for a variety of complex application scenarios.
    摘要 现代数字筛选器设计中,高效资源利用是一个热门话题。由于FIR筛选器的线性阶段特性,可以应用精心设计的全параллеLRagstruktur来解决这个问题。为了进一步减少硬件资源占用,特别是相multiplication功能,一种改进的RAG算法已经被提议。对于不同顺序和不同算法的筛选器进行了比较,实验结果表明,改进的RAG算法在逻辑资源利用、资源分配、运行速度和电力占用等多种应用场景中具有优异性。提出的Circuit结构可以充分利用资源分配策略,以减少逻辑资源占用。这种快速稳定的Circuit适用于许多复杂的应用场景。
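The resource saving in multiplierless FIR designs such as RAG comes from replacing each coefficient multiplication with a few shifts and adds. The sketch below shows a plain canonical-signed-digit decomposition and uses it to multiply without `*`; it illustrates the shift-add principle, not the improved RAG algorithm proposed in the paper.

```python
def csd_digits(c: int) -> list[tuple[int, int]]:
    """Canonical signed-digit decomposition: c = sum(sign * 2**shift)."""
    digits, shift = [], 0
    while c != 0:
        if c & 1:
            rem = c & 3
            sign = -1 if rem == 3 else 1          # choose -1 when the next bit is also set
            digits.append((sign, shift))
            c -= sign
        c >>= 1
        shift += 1
    return digits

def multiply_shift_add(x: int, c: int) -> int:
    """Multiply x by constant c using only shifts and adds (as hardware would)."""
    return sum(sign * (x << shift) for sign, shift in csd_digits(c))

coeff = 231                                        # example FIR coefficient
print(csd_digits(coeff))                           # [(-1, 0), (1, 3), (-1, 5), (1, 8)]
print(multiply_shift_add(7, coeff), 7 * coeff)     # both 1617
```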

cs.SD - 2023-10-01

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

  • paper_url: http://arxiv.org/abs/2310.00704
  • repo_url: https://github.com/yangdongchao/UniAudio_demo
  • paper_authors: Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu, Zhou Zhao, Shinji Watanabe, Helen Meng
  • for: 这个论文的目标是开发一个可以处理多种生成任务的语言模型(LLM),并使其能够生成具有给定输入条件的多种音频类型(包括语音、声音、音乐和歌唱)。
  • methods: 这个论文使用了一种新的 Tokenization 技术,即 residual vector quantization based neural codec,来处理各种目标音频的tokenization。它还使用了一种多尺度 transformer 模型来处理长度过长的序列问题。
  • results: 论文在11个任务上实现了州际级或至少竞争性的成绩,并且发现UniAudio模型在所有训练任务中表现出了强大的能力。
    Abstract Large Language models (LLM) have demonstrated the capability to handle a variety of generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific approaches, leverages LLM techniques to generate multiple types of audio (including speech, sounds, music, and singing) with given input conditions. UniAudio 1) first tokenizes all types of target audio along with other condition modalities, 2) concatenates source-target pair as a single sequence, and 3) performs next-token prediction using LLM. Also, a multi-scale Transformer model is proposed to handle the overly long sequences caused by the residual vector quantization based neural codec in tokenization. Training of UniAudio is scaled up to 165K hours of audio and 1B parameters, based on all generative tasks, aiming to obtain sufficient prior knowledge not only in the intrinsic properties of audio but also the inter-relationship between audio and other modalities. Therefore, the trained UniAudio model has the potential to become a foundation model for universal audio generation: it shows strong capability in all trained tasks and can seamlessly support new audio generation tasks after simple fine-tuning. Experiments demonstrate that UniAudio achieves state-of-the-art or at least competitive results on most of the 11 tasks. Demo and code are released at https://github.com/yangdongchao/UniAudio
    摘要 大型语言模型(LLM)已经证明了处理多种生成任务的能力。这篇论文介绍了UniAudio系统,与前一些任务特定的方法不同,通过LLM技术来生成多种音频(包括语音、声音、音乐和歌唱),并且可以根据输入条件进行生成。UniAudio的实现方式包括以下三个步骤:1. 对所有类型的目标音频进行token化,并将其与其他条件模式一起 concatenate 成一个序列。2. 使用 LLM 进行下一个token预测。3. 使用多级 transformer 模型来处理由 residual vector quantization 基于的 neural codec 生成的过长序列。在训练UniAudio时,使用了165K小时的音频和1B参数,基于所有生成任务,以获得充足的先验知识不仅在音频的内在性能,还在音频和其他模式之间的关系。因此,训练UniAudio模型后,可以作为普适的音频生成基模型,它在所有训练任务中表现出了强大的能力,并且可以通过简单的微调来支持新的音频生成任务。实验结果表明,UniAudio在大多数11个任务中具有国际级或至少竞争力的成绩。示例和代码可以在https://github.com/yangdongchao/UniAudio 中下载。
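The tokenization mentioned above relies on residual vector quantization: each codebook stage quantizes whatever the previous stages missed, giving several parallel token streams per frame. The numpy sketch below implements that encoding with random (untrained) codebooks and placeholder features; it is not UniAudio's codec.

```python
import numpy as np

rng = np.random.default_rng(6)

def rvq_encode(frames, codebooks):
    """Residual VQ: each stage quantizes the residual left by the previous stages.
    Returns token ids of shape (n_frames, n_stages) and the reconstruction."""
    residual = frames.copy()
    ids, recon = [], np.zeros_like(frames)
    for cb in codebooks:                                        # cb: (codebook_size, dim)
        d = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(axis=1)
        ids.append(idx)
        recon += cb[idx]
        residual = frames - recon
    return np.stack(ids, axis=1), recon

dim, n_stages, cb_size, n_frames = 8, 3, 64, 50
codebooks = [rng.standard_normal((cb_size, dim)) for _ in range(n_stages)]
frames = rng.standard_normal((n_frames, dim))                   # stand-in for encoder features
tokens, recon = rvq_encode(frames, codebooks)
print(tokens.shape)                                             # (50, 3): three token streams per frame
print("residual energy:", np.mean((frames - recon) ** 2))
```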

Pianist Identification Using Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2310.00699
  • repo_url: https://github.com/betsytang/pid-cnn
  • paper_authors: Jingjing Tang, Geraint Wiggins, Gyorgy Fazekas
  • For: 本研究旨在用深度学习技术自动识别表演型钢琴演奏者,解决建立智能乐器和智能音乐系统中的一个挑战。
  • Methods: 我们使用卷积神经网络和表达特征来实现自动识别,并对大规模的表演型钢琴演奏数据应用和改进了深度学习技术。
  • Results: 我们的模型在6类识别任务中达到85.3%的准确率,比基线模型高出20.8%。改进后的数据集也提供了更好的训练数据,为自动演奏者识别做出了重要贡献。
    Abstract This paper presents a comprehensive study of automatic performer identification in expressive piano performances using convolutional neural networks (CNNs) and expressive features. Our work addresses the challenging multi-class classification task of identifying virtuoso pianists, which has substantial implications for building dynamic musical instruments with intelligence and smart musical systems. Incorporating recent advancements, we leveraged large-scale expressive piano performance datasets and deep learning techniques. We refined the scores by expanding repetitions and ornaments for more accurate feature extraction. We demonstrated the capability of one-dimensional CNNs for identifying pianists based on expressive features and analyzed the impact of the input sequence lengths and different features. The proposed model outperforms the baseline, achieving 85.3% accuracy in a 6-way identification task. Our refined dataset proved more apt for training a robust pianist identifier, making a substantial contribution to the field of automatic performer identification. Our codes have been released at https://github.com/BetsyTang/PID-CNN.
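As a minimal PyTorch sketch of the kind of model described, the network below applies 1D convolutions over a sequence of expressive features and pools over time before a 6-way classification head. The feature dimensionality, sequence length, and layer sizes are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PianistCNN(nn.Module):
    """1D CNN over a sequence of expressive features (e.g. timing / velocity deviations)."""

    def __init__(self, n_features: int = 7, n_classes: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                 # pool over time -> fixed-size embedding
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features, sequence_length)
        return self.head(self.net(x).squeeze(-1))

model = PianistCNN()
batch = torch.randn(4, 7, 400)                       # 4 excerpts, 7 features, 400 notes
print(model(batch).shape)                            # torch.Size([4, 6])
```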

eess.AS - 2023-10-01

Mechatronic Generation of Datasets for Acoustics Research

  • paper_url: http://arxiv.org/abs/2310.00587
  • repo_url: None
  • paper_authors: Austin Lu, Ethaniel Moore, Arya Nallanthighall, Kanad Sarkar, Manan Mittal, Ryan M. Corey, Paris Smaragdis, Andrew Singer
  • for: 这篇论文是为了描述一种机器人共享测试空间,用于实现自动化听音实验。
  • methods: 该系统使用无线多机器人协调技术,实现同步机器人运动,以适应动态场景中的移动发声器和收音器。用户可以通过虚拟控制界面来设计自动化实验,收集大规模的听音数据。
  • results: 实验结果表明,MARS系统可以生成高可靠性的听音数据,并且可以帮助研究人员无需特有听音研究空间来收集听音数据。
    Abstract We address the challenge of making spatial audio datasets by proposing a shared mechanized recording space that can run custom acoustic experiments: a Mechatronic Acoustic Research System (MARS). To accommodate a wide variety of experiments, we implement an extensible architecture for wireless multi-robot coordination which enables synchronized robot motion for dynamic scenes with moving speakers and microphones. Using a virtual control interface, we can remotely design automated experiments to collect large-scale audio data. This data is shown to be similar across repeated runs, demonstrating the reliability of MARS. We discuss the potential for MARS to make audio data collection accessible for researchers without dedicated acoustic research spaces.
    摘要 我们面临的挑战是创建空间听音数据集,我们提议一种共享机械化录音空间,可以进行自定义听音实验:一个名为 MARS 的机械听音研究系统。为了满足广泛的实验需求,我们实施了可扩展的无线多机器人协调架构,可以实现同步的机器人运动,以便在动态场景中进行移动speaker和 microphone的记录。通过虚拟控制界面,我们可以远程设计自动化实验,收集大规模的听音数据。这些数据显示与重复运行中的相似性,证明 MARS 的可靠性。我们讨论了 MARS 的潜在可能性,使听音数据采集变得对研究人员而言可 accessible。

cs.CV - 2023-10-01

Sharingan: A Transformer-based Architecture for Gaze Following

  • paper_url: http://arxiv.org/abs/2310.00816
  • repo_url: None
  • paper_authors: Samy Tafasca, Anshul Gupta, Jean-Marc Odobez
  • for: 这篇论文旨在研究人类视线跟踪模型,以便在各种应用领域中使用。
  • methods: 这篇论文使用了一种新的 transformer-based 架构来实现 2D 视线预测。
  • results: 这篇论文在 GazeFollow 和 VideoAttentionTarget 数据集上取得了最先进(state-of-the-art)的结果。
    Abstract Gaze is a powerful form of non-verbal communication and social interaction that humans develop from an early age. As such, modeling this behavior is an important task that can benefit a broad set of application domains ranging from robotics to sociology. In particular, Gaze Following is defined as the prediction of the pixel-wise 2D location where a person in the image is looking. Prior efforts in this direction have focused primarily on CNN-based architectures to perform the task. In this paper, we introduce a novel transformer-based architecture for 2D gaze prediction. We experiment with 2 variants: the first one retains the same task formulation of predicting a gaze heatmap for one person at a time, while the second one casts the problem as a 2D point regression and allows us to perform multi-person gaze prediction with a single forward pass. This new architecture achieves state-of-the-art results on the GazeFollow and VideoAttentionTarget datasets. The code for this paper will be made publicly available.
    摘要 gaze 是一种强大的非语言通信和社交互动方式,人类从 early age 开始发展。因此,模拟这种行为是一项重要的任务,可以 benefiting Broad 应用领域,从机器人学到社会学。特别是,瞥向预测(Gaze Following)定义为图像中人员的 pixel-wise 2D 位置预测。先前的尝试都是通过 CNN arquitectures 来完成这项任务。在这篇论文中,我们提出了一种新的 transformer 结构来实现 2D 瞥向预测。我们实验了两个变体:第一个保持了同样的任务表述,即预测一个人的瞥向热图 ; 第二个将问题定义为2D 点 regression,允许我们通过单一的前进 pass 进行多人瞥向预测。这种新的结构实现了 GazeFollow 和 VideoAttentionTarget 数据集的状态图。代码将公开发布。

Completing Visual Objects via Bridging Generation and Segmentation

  • paper_url: http://arxiv.org/abs/2310.00808
  • repo_url: None
  • paper_authors: Xiang Li, Yinpeng Chen, Chung-Ching Lin, Rita Singh, Bhiksha Raj, Zicheng Liu
  • for: reconstruction of a complete object from its partially visible components
  • methods: iterative stages of generation and segmentation, with the object mask provided as an additional condition
  • results: superior object completion results compared to existing approaches such as ControlNet and Stable Diffusion
    Abstract This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components. Our method, named MaskComp, delineates the completion process through iterative stages of generation and segmentation. In each iteration, the object mask is provided as an additional condition to boost image generation, and, in return, the generated images can lead to a more accurate mask by fusing the segmentation of images. We demonstrate that the combination of one generation and one segmentation stage effectively functions as a mask denoiser. Through alternation between the generation and segmentation stages, the partial object mask is progressively refined, providing precise shape guidance and yielding superior object completion results. Our experiments demonstrate the superiority of MaskComp over existing approaches, e.g., ControlNet and Stable Diffusion, establishing it as an effective solution for object completion.
    摘要 这篇论文提出了一种新的物体完成方法,主要目标是从部分可见的组件中重建完整的物体。我们的方法,名为MaskComp,通过 iterate 的生成和分割阶段来进行分割。在每个迭代阶段,提供对象Mask作为附加条件,以提高图像生成,并在返回的图像中提取更加精确的Mask。我们发现,通过生成和分割阶段的交互,可以有效地减少Mask的噪声。通过 alternate 生成和分割阶段,部分物体Mask可以逐渐进行精细化,提供精确的形状指导,并且实现了更好的物体完成效果。我们的实验表明,MaskComp 比 existed 方法(如 ControlNet 和 Stable Diffusion)更加有效, establishing 它为物体完成的有效解决方案。

Propagating Semantic Labels in Video Data

  • paper_url: http://arxiv.org/abs/2310.00783
  • repo_url: None
  • paper_authors: David Balaban, Justin Medich, Pranay Gosar, Justin Hart
  • for: 这个论文的目的是提出一种基于Foundation Models的视频 segmentation方法,以减少人工标注成本。
  • methods: 该方法使用Segment Anything Model (SAM)和Structure from Motion (SfM)两种技术来实现视频 segmentation。首先,视频输入被重构为3D几何结构使用SfM,然后使用SAM进行每帧的分割。最后,对于每帧的分割结果,进行3D几何投影,以便在新的视角下进行跟踪。
  • results: 该方法可以大幅减少人工标注成本,但与人工标注相比性能有所下降。系统性能用三个主要指标评估:计算时间、与人工标注掩码的IoU,以及跟踪丢失次数。结果表明,该系统在跨视频帧跟踪对象的计算时间上相比人工有显著提升,但性能有所损失。
    Abstract Semantic Segmentation combines two sub-tasks: the identification of pixel-level image masks and the application of semantic labels to those masks. Recently, so-called Foundation Models have been introduced; general models trained on very large datasets which can be specialized and applied to more specific tasks. One such model, the Segment Anything Model (SAM), performs image segmentation. Semantic segmentation systems such as CLIPSeg and MaskRCNN are trained on datasets of paired segments and semantic labels. Manual labeling of custom data, however, is time-consuming. This work presents a method for performing segmentation for objects in video. Once an object has been found in a frame of video, the segment can then be propagated to future frames; thus reducing manual annotation effort. The method works by combining SAM with Structure from Motion (SfM). The video input to the system is first reconstructed into 3D geometry using SfM. A frame of video is then segmented using SAM. Segments identified by SAM are then projected onto the the reconstructed 3D geometry. In subsequent video frames, the labeled 3D geometry is reprojected into the new perspective, allowing SAM to be invoked fewer times. System performance is evaluated, including the contributions of the SAM and SfM components. Performance is evaluated over three main metrics: computation time, mask IOU with manual labels, and the number of tracking losses. Results demonstrate that the system has substantial computation time improvements over human performance for tracking objects over video frames, but suffers in performance.
    摘要 Semantic Segmentation 将两个子任务结合在一起:Pixel-level图像mask的标识和图像mask的semantic标签应用。最近,称之为基础模型的模型被引入,这些模型可以在很大的数据集上训练,然后应用到更特定的任务上。一个such model是Segment Anything Model(SAM),它实现了图像 segmentation。图像 segmentation系统such as CLIPSeg和MaskRCNN通常是在paired segments和semantic labels的数据集上训练的。然而,手动标注自定义数据是时间consuming。这个工作提出了一种方法,通过结合SAM和Structure from Motion(SfM)来实现对视频帧中对象的分割。首先,视频输入被重建为3D几何结构使用SfM。然后,在SAM中Segment一帧视频。由SAM标识的分割被 проекted onto the reconstructed 3D几何结构。在后续的视频帧中,标注的3D几何结构被重新投影到新的视角,以便在新的视频帧中invoked SAM fewer times。系统性能被评估,包括SAM和SfM组件的贡献。性能被评估以三个主要指标:计算时间、mask IOU with manual labels和跟踪损失数。结果表明,系统在跟踪对象在视频帧之间的计算时间上有substantial的提高,但是性能不如人工标注。
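The reprojection step that lets labels travel between frames can be written compactly: given camera intrinsics and a pose from SfM, labelled 3D points are projected into the new view with a pinhole model. The numpy sketch below uses synthetic points and poses; the SAM and SfM components themselves are not reproduced.

```python
import numpy as np

def project_labels(points_3d, labels, K, R, t, hw):
    """Project labelled 3D points into a camera with intrinsics K and pose (R, t).
    Returns a sparse label image of size hw for the new view."""
    h, w = hw
    cam = R @ points_3d.T + t[:, None]                  # world -> camera coordinates
    in_front = cam[2] > 1e-6
    pix = K @ cam[:, in_front]
    uv = (pix[:2] / pix[2]).round().astype(int)         # perspective divide
    label_img = np.full((h, w), -1, dtype=int)          # -1 = no label
    ok = (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h)
    label_img[uv[1, ok], uv[0, ok]] = labels[in_front][ok]
    return label_img

# synthetic example: 500 labelled points seen by a second camera translated sideways
rng = np.random.default_rng(7)
pts = rng.uniform([-1, -1, 4], [1, 1, 6], size=(500, 3))
labels = (pts[:, 0] > 0).astype(int)                    # two "objects" split along x
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.array([0.2, 0.0, 0.0])             # new view: 20 cm to the side
print(np.unique(project_labels(pts, labels, K, R, t, (480, 640))))
```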

SMOOT: Saliency Guided Mask Optimized Online Training

  • paper_url: http://arxiv.org/abs/2310.00772
  • repo_url: None
  • paper_authors: Ali Karkehabadi, Houman Homayoun, Avesta Sasan
  • for: 这篇论文的目的是改进显著性引导训练(Saliency-Guided Training,SGT)方法,以提高深度神经网络的解释性。
  • methods: 这种方法使用反向传播和修改后的梯度来引导模型强调最重要的特征,并提出根据输入、准确率和训练损失动态确定被遮蔽输入数量的策略。
  • results: 实验结果表明,我们的提案可以有效地提高模型的准确率和显著图的清晰度。
    Abstract Deep Neural Networks are powerful tools for understanding complex patterns and making decisions. However, their black-box nature impedes a complete understanding of their inner workings. Saliency-Guided Training (SGT) methods try to highlight the prominent features in the model's training based on the output to alleviate this problem. These methods use back-propagation and modified gradients to guide the model toward the most relevant features while keeping the impact on the prediction accuracy negligible. SGT makes the model's final result more interpretable by masking input partially. In this way, considering the model's output, we can infer how each segment of the input affects the output. In the particular case of image as the input, masking is applied to the input pixels. However, the masking strategy and number of pixels which we mask, are considered as a hyperparameter. Appropriate setting of masking strategy can directly affect the model's training. In this paper, we focus on this issue and present our contribution. We propose a novel method to determine the optimal number of masked images based on input, accuracy, and model loss during the training. The strategy prevents information loss which leads to better accuracy values. Also, by integrating the model's performance in the strategy formula, we show that our model represents the salient features more meaningful. Our experimental results demonstrate a substantial improvement in both model accuracy and the prominence of saliency, thereby affirming the effectiveness of our proposed solution.
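The general saliency-guided-training step that SMOOT builds on can be sketched as: compute input-gradient saliency, zero out the least-salient input features, and train on the masked batch. In the PyTorch sketch below the masked fraction is fixed; the paper's contribution, choosing that amount online from accuracy and loss, is not reproduced, and the toy model and data are assumptions.

```python
import torch
import torch.nn as nn

def saliency_masked_batch(model, x, y, mask_frac=0.3):
    """Mask the least-salient input features before the training step.
    SMOOT tunes the amount of masking online; here it is a fixed fraction."""
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    grads, = torch.autograd.grad(loss, x)
    sal = grads.abs().flatten(1)                       # per-feature saliency
    k = int(mask_frac * sal.shape[1])
    low_idx = sal.topk(k, dim=1, largest=False).indices
    masked = x.detach().flatten(1).clone()
    masked.scatter_(1, low_idx, 0.0)                   # zero out the least important inputs
    return masked.view_as(x)

# toy usage with a small MLP "image" classifier
model = nn.Sequential(nn.Flatten(), nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 1, 8, 8), torch.randint(0, 10, (16,))
for _ in range(5):
    x_masked = saliency_masked_batch(model, x, y)
    loss = nn.functional.cross_entropy(model(x_masked), y)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```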

Counterfactual Image Generation for adversarially robust and interpretable Classifiers

  • paper_url: http://arxiv.org/abs/2310.00761
  • repo_url: None
  • paper_authors: Rafael Bischof, Florian Scheidegger, Michael A. Kraus, A. Cristiano I. Malossi
  • for: 这种方法的目的是提高神经网络图像分类器的解释性和robustness。
  • methods: 该方法使用图像到图像翻译生成器(GANs)来生成对应的替换样本,以提高解释性和对抗性。
  • results: 该方法可以生成高度描述性的解释图像,并且可以提高模型对抗性。此外,该方法还可以用来评估模型的不确定性。
    Abstract Neural Image Classifiers are effective but inherently hard to interpret and susceptible to adversarial attacks. Solutions to both problems exist, among others, in the form of counterfactual examples generation to enhance explainability or adversarially augment training datasets for improved robustness. However, existing methods exclusively address only one of the issues. We propose a unified framework leveraging image-to-image translation Generative Adversarial Networks (GANs) to produce counterfactual samples that highlight salient regions for interpretability and act as adversarial samples to augment the dataset for more robustness. This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake". We assess the method's effectiveness by evaluating (i) the produced explainability masks on a semantic segmentation task for concrete cracks and (ii) the model's resilience against the Projected Gradient Descent (PGD) attack on a fruit defects detection problem. Our produced saliency maps are highly descriptive, achieving competitive IoU values compared to classical segmentation models despite being trained exclusively on classification labels. Furthermore, the model exhibits improved robustness to adversarial attacks, and we show how the discriminator's "fakeness" value serves as an uncertainty measure of the predictions.
    摘要 神经图像分类器虽然有效,但难以解释且易受对抗攻击。现有方法通常只解决其中一个问题。我们提出一个统一框架,利用图像到图像翻译的生成对抗网络(GAN)生成反事实样本:这些样本既能突出显著区域以增强可解释性,又可作为对抗样本扩充训练数据以提高鲁棒性。该框架将分类器与判别器合并为单一模型,把真实图像归入相应类别,并将生成图像标记为"假"。我们在混凝土裂缝语义分割任务上评估生成的可解释性掩码,并在水果缺陷检测任务上评估模型对 PGD 攻击的抵抗力。仅用分类标签训练得到的显著图即可达到与传统分割模型相当的 IoU,模型的对抗鲁棒性也有所提升,判别器的"假度"值还可作为预测不确定性的度量。
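A minimal sketch of the "classifier and discriminator in one model" idea described above: a single network with K+1 outputs, where the extra logit is a dedicated "fake" class; real images are trained toward their labels, generated images toward "fake", and the softmax mass on the fake class can double as an uncertainty signal. Names and the loss wiring are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassifierDiscriminator(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                              # any feature extractor
        self.head = nn.Linear(feat_dim, num_classes + 1)      # +1 logit for "fake"
        self.num_classes = num_classes

    def forward(self, x):
        return self.head(self.backbone(x))                    # (B, K+1) logits

    def fakeness(self, x):
        # Probability assigned to the "fake" class, usable as an uncertainty measure.
        return F.softmax(self.forward(x), dim=1)[:, self.num_classes]

def discriminative_loss(model, real_x, real_y, fake_x):
    # Real images -> their class labels; generated images -> the "fake" class.
    fake_label = torch.full((fake_x.size(0),), model.num_classes,
                            dtype=torch.long, device=fake_x.device)
    return (F.cross_entropy(model(real_x), real_y) +
            F.cross_entropy(model(fake_x), fake_label))
```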

Top-down Green-ups: Satellite Sensing and Deep Models to Predict Buffelgrass Phenology

  • paper_url: http://arxiv.org/abs/2310.00740
  • repo_url: https://github.com/lurosenb/phenology_projects
  • paper_authors: Lucas Rosenblatt, Bin Han, Erin Posthumus, Theresa Crimmins, Bill Howe
  • for: 预测 buffelgrass(牛肚草)的"绿化"时机(即适合进行除草剂处理的时间),以预防美国西南部的严重野火和生物多样性损失。
  • methods: 结合卫星遥感与深度学习,探索时间、视觉和多模态模型,以提高对 buffelgrass 绿化的预测精度。
  • results: 所有基于神经网络的方法都优于传统的 buffelgrass 绿化模型;论文还讨论了部署神经网络模型可以带来的显著资源节约。
    Abstract An invasive species of grass known as "buffelgrass" contributes to severe wildfires and biodiversity loss in the Southwest United States. We tackle the problem of predicting buffelgrass "green-ups" (i.e. readiness for herbicidal treatment). To make our predictions, we explore temporal, visual and multi-modal models that combine satellite sensing and deep learning. We find that all of our neural-based approaches improve over conventional buffelgrass green-up models, and discuss how neural model deployment promises significant resource savings.
    摘要 一种名为"buffelgrass"(牛肚草)的入侵性草本植物在美国西南部引发了严重的野火和生物多样性损失。我们致力于预测牛肚草的"绿化"时机,即适合进行除草剂处理的时间。为此,我们探索了结合卫星遥感与深度学习的时间、视觉和多模态模型。结果显示,所有基于神经网络的方法都优于传统的牛肚草绿化模型;我们还讨论了部署神经网络模型可以带来的显著资源节约。

HOH: Markerless Multimodal Human-Object-Human Handover Dataset with Large Object Count

  • paper_url: http://arxiv.org/abs/2310.00723
  • repo_url: None
  • paper_authors: Noah Wiederhold, Ava Megyeri, DiMaggio Paris, Sean Banerjee, Natasha Kholgade Banerjee
  • for: 本文提出一个数据集,用于加速数据驱动的人与人物品交接(handover)研究、人机交接的实现,以及基于 2D 和 3D 交互数据的交接参数估计。
  • methods: 数据集包含多视角 RGB 与深度数据、骨架、融合点云、抓取类型与惯用手标注,物体、给予方手部与接收方手部的 2D/3D 分割,双方舒适度评分,以及配对的物体元数据和对齐的 3D 模型,覆盖 136 种物品、40 名参与者的 2,720 次交接互动。
  • results: 论文展示了使用 HOH 数据集训练神经网络完成抓取、朝向和轨迹预测等任务的实验结果。与需要标记点(marker)的数据集相比,HOH 无需专门的穿戴设备,能够更自然地捕捉人与人之间的交接互动,并包含高分辨率的手部跟踪数据。到目前为止,HOH 是在物品数量、参与者数量、考虑角色互换的配对数量以及总交互次数上规模最大的交接数据集。
    Abstract We present the HOH (Human-Object-Human) Handover Dataset, a large object count dataset with 136 objects, to accelerate data-driven research on handover studies, human-robot handover implementation, and artificial intelligence (AI) on handover parameter estimation from 2D and 3D data of person interactions. HOH contains multi-view RGB and depth data, skeletons, fused point clouds, grasp type and handedness labels, object, giver hand, and receiver hand 2D and 3D segmentations, giver and receiver comfort ratings, and paired object metadata and aligned 3D models for 2,720 handover interactions spanning 136 objects and 20 giver-receiver pairs-40 with role-reversal-organized from 40 participants. We also show experimental results of neural networks trained using HOH to perform grasp, orientation, and trajectory prediction. As the only fully markerless handover capture dataset, HOH represents natural human-human handover interactions, overcoming challenges with markered datasets that require specific suiting for body tracking, and lack high-resolution hand tracking. To date, HOH is the largest handover dataset in number of objects, participants, pairs with role reversal accounted for, and total interactions captured.
    摘要 我们提出了 HOH(Human-Object-Human)交接数据集,包含 136 种物品,用于加速数据驱动的交接研究、人机交接实现,以及基于 2D 和 3D 人际交互数据的交接参数估计。HOH 包含多视角 RGB 与深度数据、骨架、融合点云、抓取类型与惯用手标注,物体、给予方手部与接收方手部的 2D/3D 分割,双方舒适度评分,以及配对的物体元数据和对齐的 3D 模型,覆盖 136 种物品、20 组给予-接收配对(考虑角色互换则为 40 组)、40 名参与者的 2,720 次交接互动。我们还展示了使用 HOH 训练神经网络进行抓取、朝向和轨迹预测的实验结果。作为目前唯一完全无标记点的交接采集数据集,HOH 记录了自然的人与人交接互动,克服了有标记数据集需要专门穿戴设备、且缺乏高分辨率手部跟踪的问题。到目前为止,HOH 是在物品数量、参与者数量、考虑角色互换的配对数量以及总交互次数上规模最大的交接数据集。

Logical Bias Learning for Object Relation Prediction

  • paper_url: http://arxiv.org/abs/2310.00712
  • repo_url: None
  • paper_authors: Xinyu Zhou, Zihan Ji, Anna Zhu
  • for: 提高Scene Graph生成(SGG)的精度和可靠性,以提高图像理解和下游任务的能力。
  • methods: 提出基于因果推断(causal inference)的对象关系预测策略,并提出一个对象增强模块用于消融研究。
  • results: 在 Visual Genome 150(VG-150)数据集上的实验证明了所提方法的有效性。
    Abstract Scene graph generation (SGG) aims to automatically map an image into a semantic structural graph for better scene understanding. It has attracted significant attention for its ability to provide object and relation information, enabling graph reasoning for downstream tasks. However, it faces severe limitations in practice due to the biased data and training method. In this paper, we present a more rational and effective strategy based on causal inference for object relation prediction. To further evaluate the superiority of our strategy, we propose an object enhancement module to conduct ablation studies. Experimental results on the Visual Gnome 150 (VG-150) dataset demonstrate the effectiveness of our proposed method. These contributions can provide great potential for foundation models for decision-making.
    摘要 场景图生成(SGG)的目标是自动将图像映射为语义结构图,以提高场景理解。它能够提供对象和关系信息,支持面向下游任务的图推理,因此受到广泛关注。然而,由于数据和训练方法的偏差,它在实践中面临严重的限制。在这篇论文中,我们提出了基于因果推断的更合理、更有效的对象关系预测策略。为了进一步验证该策略的优越性,我们提出了对象增强模块并进行消融研究。在 Visual Genome 150(VG-150)数据集上的实验结果表明了所提方法的有效性。这些贡献有望为面向决策的基础模型提供巨大潜力。

You Do Not Need Additional Priors in Camouflage Object Detection

  • paper_url: http://arxiv.org/abs/2310.00702
  • repo_url: None
  • paper_authors: Yuchen Dong, Heng Zhou, Chengyang Li, Junjie Xie, Yongqiang Xie, Zhongbo Li
  • for: 本研究旨在开发一种不依赖额外先验信息的伪装目标检测网络,以解决现有方法过度依赖额外先验的问题。
  • methods: 我们提出了一种新的自适应特征聚合方法,通过组合多层特征信息生成引导信息;与以往依赖边缘或排序先验的方法不同,我们直接利用从图像特征中提取的信息来引导模型训练。
  • results: 大量实验结果表明,所提方法可以达到与现有最先进方法相当或更优的性能。
    Abstract Camouflage object detection (COD) poses a significant challenge due to the high resemblance between camouflaged objects and their surroundings. Although current deep learning methods have made significant progress in detecting camouflaged objects, many of them heavily rely on additional prior information. However, acquiring such additional prior information is both expensive and impractical in real-world scenarios. Therefore, there is a need to develop a network for camouflage object detection that does not depend on additional priors. In this paper, we propose a novel adaptive feature aggregation method that effectively combines multi-layer feature information to generate guidance information. In contrast to previous approaches that rely on edge or ranking priors, our method directly leverages information extracted from image features to guide model training. Through extensive experimental results, we demonstrate that our proposed method achieves comparable or superior performance when compared to state-of-the-art approaches.
    摘要 伪装目标检测(COD)面临很大挑战,因为伪装目标与周围环境高度相似。当前深度学习方法在检测伪装目标方面已取得显著进展,但大多严重依赖额外的先验信息;而在真实场景中获取这类先验信息既昂贵又不现实。因此,有必要开发一种不依赖额外先验的伪装目标检测网络。在这篇论文中,我们提出了一种新的自适应特征聚合方法,能够有效地整合多层特征信息以生成引导信息。与以往依赖边缘或排序先验的方法不同,我们直接利用从图像特征中提取的信息来引导模型训练。大量实验结果证明,所提方法可以达到与当前最先进方法相当或更优的性能。

A quantum moving target segmentation algorithm for grayscale video

  • paper_url: http://arxiv.org/abs/2310.03038
  • repo_url: None
  • paper_authors: Wenjie Liu, Lu Wang, Qingshan Wu
  • for: 用于实时分割视频中移动目标。
  • methods: 使用量子机制同时计算所有邻帧图像差异,然后快速分割移动目标。设计了可行的量子比较器,用于判断灰度值与阈值的差异。
  • results: 在 IBM Q 上的实验验证了该算法在含噪声中等规模量子(NISQ)时代的可行性。对于包含 $2^m$ 帧的量子视频(每帧为 $2^n\times 2^n$ 图像、$q$ 个灰度级),算法复杂度可降至 $O(n^2 + q)$;相对经典算法这是指数级加速,同时其复杂度也优于现有的量子算法。
    Abstract The moving target segmentation (MTS) aims to segment out moving targets in the video, however, the classical algorithm faces the huge challenge of real-time processing in the current video era. Some scholars have successfully demonstrated the quantum advantages in some video processing tasks, but not concerning moving target segmentation. In this paper, a quantum moving target segmentation algorithm for grayscale video is proposed, which can use quantum mechanism to simultaneously calculate the difference of all pixels in all adjacent frames and then quickly segment out the moving target. In addition, a feasible quantum comparator is designed to distinguish the grayscale values with the threshold. Then several quantum circuit units, including three-frame difference, binarization and AND operation, are designed in detail, and then are combined together to construct the complete quantum circuits for segmenting the moving target. For a quantum video with $2^m$ frames (every frame is a $2^n\times 2^n$ image with $q$ grayscale levels), the complexity of our algorithm can be reduced to O$(n^2 + q)$. Compared with the classic counterpart, it is an exponential speedup, while its complexity is also superior to the existing quantum algorithms. Finally, the experiment is conducted on IBM Q to show the feasibility of our algorithm in the noisy intermediate-scale quantum (NISQ) era.
    摘要 运动目标分割(MTS)旨在分割出视频中的运动目标,但经典算法在当前视频时代面临巨大的实时处理挑战。已有学者在部分视频处理任务中展示了量子优势,但尚未涉及运动目标分割。本文提出一种面向灰度视频的量子运动目标分割算法,利用量子机制同时计算所有相邻帧中所有像素的差值,从而快速分割出运动目标。此外,还设计了可行的量子比较器,用于将灰度值与阈值进行比较。随后详细设计了三帧差分、二值化与 AND 运算等量子线路单元,并将其组合为完整的运动目标分割量子线路。对于包含 $2^m$ 帧的量子视频(每帧为 $2^n\times 2^n$ 图像、$q$ 个灰度级),算法复杂度可降至 $O(n^2 + q)$;相对经典算法这是指数级加速,复杂度也优于现有量子算法。最后在 IBM Q 上进行了实验,验证了该算法在含噪声中等规模量子(NISQ)时代的可行性。
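For reference, the classical pipeline the quantum circuit encodes (three-frame difference, binarization against a threshold, then AND) can be sketched in a few lines of NumPy; `thresh` is an assumed parameter, and the quantum version evaluates this same logic for all pixels of all adjacent frames simultaneously.

```python
import numpy as np

def three_frame_difference(prev_f, cur_f, next_f, thresh=20):
    # Absolute differences between adjacent grayscale frames.
    d1 = np.abs(cur_f.astype(np.int16) - prev_f.astype(np.int16))
    d2 = np.abs(next_f.astype(np.int16) - cur_f.astype(np.int16))
    b1 = d1 > thresh                  # binarize each difference image
    b2 = d2 > thresh
    return np.logical_and(b1, b2)     # moving-target mask for the current frame

# Usage over a video v (list of frames):
# masks = [three_frame_difference(v[i-1], v[i], v[i+1]) for i in range(1, len(v) - 1)]
```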

Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips

  • paper_url: http://arxiv.org/abs/2310.00698
  • repo_url: None
  • paper_authors: Reshma Ramaprasad
  • for: 为了让漫画可以对视障群体开放,提供可读的自然语言描述。
  • methods: 使用计算机视觉技术提取漫画图片中的信息,包括画格、角色和文本,然后将这些信息作为多模态大语言模型的提示,生成描述。
  • results: 在一组由人类专家标注的漫画上进行测试,在定量和定性指标上均取得了令人鼓舞的结果。
    Abstract Comic strips are a popular and expressive form of visual storytelling that can convey humor, emotion, and information. However, they are inaccessible to the BLV (Blind or Low Vision) community, who cannot perceive the images, layouts, and text of comics. Our goal in this paper is to create natural language descriptions of comic strips that are accessible to the visually impaired community. Our method consists of two steps: first, we use computer vision techniques to extract information about the panels, characters, and text of the comic images; second, we use this information as additional context to prompt a multimodal large language model (MLLM) to produce the descriptions. We test our method on a collection of comics that have been annotated by human experts and measure its performance using both quantitative and qualitative metrics. The outcomes of our experiments are encouraging and promising.
    摘要 漫画是一种广受欢迎且表现力强的视觉叙事形式,可以传达幽默、情感和信息。然而,视障(BLV)群体无法感知漫画的图像、版面和文字。我们的目标是为漫画生成视障群体可以获取的自然语言描述。方法分为两步:第一步,使用计算机视觉技术提取漫画中的画格、角色和文字信息;第二步,将这些信息作为额外上下文,提示多模态大语言模型(MLLM)生成描述。我们在一组由人类专家标注的漫画上测试该方法,并用定量与定性指标评估其性能,实验结果令人鼓舞。

A Hierarchical Graph-based Approach for Recognition and Description Generation of Bimanual Actions in Videos

  • paper_url: http://arxiv.org/abs/2310.00670
  • repo_url: None
  • paper_authors: Fatemeh Ziaeetabar, Reza Safabakhsh, Saeedeh Momtazi, Minija Tamosiunaite, Florentin Wörgötter
  • for: 这项研究旨在提高视频中人体动作的描述精度和全面性,以满足机器人学、人机交互和视频分析等领域的需求。
  • methods: 该研究提出了一种将基于图的建模与分层注意力机制相结合的新方法,以提高视频描述的精度和全面性。该方法首先用场景图编码视频中对象与动作之间的时空依赖关系,然后结合三层的图注意力网络(GAT)构建层次注意力机制,以识别局部和全局上下文元素。
  • results: 在多个 2D 和 3D 数据集上的实验中,该方法在准确率、精度和上下文相关性方面均持续优于现有最先进方法;大量消融实验也评估了各组件的作用。该方法可以为同一视频片段生成不同语义深度的描述,类似于不同人给出的描述。此外,对双手与物体交互的更深入理解还有望推动机器人领域的发展,使机器人能以更高精度模仿复杂的人类动作。
    Abstract Nuanced understanding and the generation of detailed descriptive content for (bimanual) manipulation actions in videos is important for disciplines such as robotics, human-computer interaction, and video content analysis. This study describes a novel method, integrating graph based modeling with layered hierarchical attention mechanisms, resulting in higher precision and better comprehensiveness of video descriptions. To achieve this, we encode, first, the spatio-temporal inter dependencies between objects and actions with scene graphs and we combine this, in a second step, with a novel 3-level architecture creating a hierarchical attention mechanism using Graph Attention Networks (GATs). The 3-level GAT architecture allows recognizing local, but also global contextual elements. This way several descriptions with different semantic complexity can be generated in parallel for the same video clip, enhancing the discriminative accuracy of action recognition and action description. The performance of our approach is empirically tested using several 2D and 3D datasets. By comparing our method to the state of the art we consistently obtain better performance concerning accuracy, precision, and contextual relevance when evaluating action recognition as well as description generation. In a large set of ablation experiments we also assess the role of the different components of our model. With our multi-level approach the system obtains different semantic description depths, often observed in descriptions made by different people, too. Furthermore, better insight into bimanual hand-object interactions as achieved by our model may portend advancements in the field of robotics, enabling the emulation of intricate human actions with heightened precision.
    摘要 对视频中(双手)操作动作的细致理解与详细描述生成,对机器人学、人机交互和视频内容分析等领域十分重要。本研究提出一种将基于图的建模与分层注意力机制相结合的新方法,以生成更精确、更全面的视频描述。我们首先用场景图编码物体与动作之间的时空依赖关系,再结合基于图注意力网络(GAT)的三层结构,构建层次注意力机制。三层 GAT 结构既能识别局部上下文元素,也能识别全局上下文元素,因此可以为同一视频片段并行生成语义复杂度不同的多条描述,从而提高动作识别与描述生成的判别准确率。我们在多个 2D 和 3D 数据集上进行了实证测试,在准确率、精度和上下文相关性方面均持续优于现有最先进方法,并通过大量消融实验评估了模型各组件的作用。多层级方法使系统能够给出不同语义深度的描述,这与不同人给出的描述类似;对双手与物体交互的更深入理解,也有望推动机器人领域的发展,使机器人能以更高精度模仿复杂的人类动作。

Liveness Detection Competition – Noncontact-based Fingerprint Algorithms and Systems (LivDet-2023 Noncontact Fingerprint)

  • paper_url: http://arxiv.org/abs/2310.00659
  • repo_url: None
  • paper_authors: Sandip Purnapatra, Humaira Rezaie, Bhavin Jawade, Yu Liu, Yue Pan, Luke Brosell, Mst Rumana Sumi, Lambert Igene, Alden Dimarco, Srirangaraj Setlur, Soumyabrata Dey, Stephanie Schuckers, Marco Huber, Jan Niklas Kolf, Meiling Fang, Naser Damer, Banafsheh Adami, Raul Chitic, Karsten Seelert, Vishesh Mistry, Rahul Parthe, Umit Kacar
  • for: 本文旨在通过非接触指纹方法评估并报告呈现攻击检测(PAD)的最新水平。
  • methods: 面向算法与系统举办非接触指纹 PAD 竞赛,采用统一的评估协议,数据包括多种呈现攻击工具(PAI)的手指照片以及真实手指照片。
  • results: 获胜算法的 APCER 为 11.35%、BPCER 为 0.62%;获胜系统在所有测试手机上的 APCER 为 13.04%、BPCER 为 1.68%。此外还测试了基于单指决策的四指系统。
    Abstract Liveness Detection (LivDet) is an international competition series open to academia and industry with the objective to assess and report state-of-the-art in Presentation Attack Detection (PAD). LivDet-2023 Noncontact Fingerprint is the first edition of the noncontact fingerprint-based PAD competition for algorithms and systems. The competition serves as an important benchmark in noncontact-based fingerprint PAD, offering (a) independent assessment of the state-of-the-art in noncontact-based fingerprint PAD for algorithms and systems, (b) a common evaluation protocol, which includes finger photos of a variety of Presentation Attack Instruments (PAIs) and live fingers, for the biometric research community, and (c) standard algorithm and system evaluation protocols, along with a comparative analysis of state-of-the-art algorithms from academia and industry on both old and new Android smartphones. The winning algorithm achieved an APCER of 11.35% averaged over all PAIs and a BPCER of 0.62%. The winning system achieved an APCER of 13.04%, averaged over all PAIs tested over all the smartphones, and a BPCER of 1.68% over all smartphones tested. Four-finger systems that make individual finger-based PAD decisions were also tested. The dataset used for the competition will be available to all researchers as per the data share protocol.
    摘要 生命检测(LivDet)是一个面向学术界和产业界开放的国际竞赛系列,旨在评估和报告呈现攻击检测(PAD)的最新水平。LivDet-2023 非接触指纹是首届面向算法与系统的非接触指纹 PAD 竞赛。该竞赛是非接触指纹 PAD 领域的重要基准,提供了对算法与系统最新水平的独立评估,以及统一的评估协议,数据包括多种呈现攻击工具(PAI)的手指照片和真实手指照片,并在新旧多款 Android 手机上对学术界与产业界的最先进算法进行了对比分析。获胜算法在所有 PAI 上平均 APCER 为 11.35%,BPCER 为 0.62%;获胜系统在所有手机、所有 PAI 上平均 APCER 为 13.04%,BPCER 为 1.68%。此外还测试了基于单指决策的四指系统。竞赛所用数据集将按数据共享协议向所有研究者开放。

Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning

  • paper_url: http://arxiv.org/abs/2310.00647
  • repo_url: https://github.com/mshukor/EvALign-ICL
  • paper_authors: Mustafa Shukor, Alexandre Rame, Corentin Dancette, Matthieu Cord
  • for: 这篇论文旨在探讨大型多模态模型(LMM)的缺陷和局限性,以及如何通过上下文学习(ICL)来缓解这些问题。
  • methods: 论文评估了 8 种基于 Flamingo 架构的开源 LMM(如 OpenFlamingo 和 IDEFICS),从 5 个维度进行评估:幻觉、弃答、组合性、可解释性和指令遵循。此外,论文还研究了 ICL 对这些缺陷的影响。
  • results: 论文发现,尽管 LMM 在任务性能上表现出色,但仍存在幻觉、不会弃答、组合性差和可解释性不足等问题。ICL 可以改善其中一部分(如可解释性、弃答和指令遵循),但并不能解决所有问题,甚至会加剧幻觉。此外,论文还提出了若干新的多模态 ICL 方法,如多任务 ICL、后见链 ICL 和自我纠正 ICL,用以缓解 LMM 的缺陷。
    Abstract Following the success of Large Language Models (LLMs), Large Multimodal Models (LMMs), such as the Flamingo model and its subsequent competitors, have started to emerge as natural steps towards generalist agents. However, interacting with recent LMMs reveals major limitations that are hardly captured by the current evaluation benchmarks. Indeed, task performances (e.g., VQA accuracy) alone do not provide enough clues to understand their real capabilities, limitations, and to which extent such models are aligned to human expectations. To refine our understanding of those flaws, we deviate from the current evaluation paradigm and propose the EvALign-ICL framework, in which we (1) evaluate 8 recent open-source LMMs (based on the Flamingo architecture such as OpenFlamingo and IDEFICS) on 5 different axes; hallucinations, abstention, compositionality, explainability and instruction following. Our evaluation on these axes reveals major flaws in LMMs. To efficiently address these problems, and inspired by the success of in-context learning (ICL) in LLMs, (2) we explore ICL as a solution and study how it affects these limitations. Based on our ICL study, (3) we push ICL further and propose new multimodal ICL approaches such as; Multitask-ICL, Chain-of-Hindsight-ICL, and Self-Correcting-ICL. Our findings are as follows; (1) Despite their success, LMMs have flaws that remain unsolved with scaling alone. (2) The effect of ICL on LMMs flaws is nuanced; despite its effectiveness for improved explainability, abstention, and instruction following, ICL does not improve compositional abilities, and actually even amplifies hallucinations. (3) The proposed ICL variants are promising as post-hoc approaches to efficiently tackle some of those flaws. The code is available here: https://evalign-icl.github.io/
    摘要 以 Large Language Models (LLMs) 的成功为契机,Large Multimodal Models (LMMs) 也在出现,如FLAMINGO模型和其竞争对手。然而,与最近的 LMMs 交互后,我们发现它们存在重要的局限性,这些局限性并不被当前的评价标准完全捕捉。实际上,任务性能(如 VQA 准确率) alone 不能够反映它们的真正能力和局限性,以及与人类期望的对应度。为了更好地理解这些问题,我们在评价标准之外尝试了 EvALign-ICL 框架,其中我们(1)评价了 8 个最近开源 LMMs(基于 FLAMINGO 架构,如 OpenFlamingo 和 IDEFICS)在 5 个轴上,即幻觉、抑制、复合性、解释性和遵从性。我们的评价表明,LMMs 存在重要的问题。为了有效地解决这些问题,我们(2)探索了 ICL 的潜在作用,并研究了 ICL 如何影响这些局限性。基于我们的 ICL 研究,我们(3)将 ICL 推广到多Modal ICL,并提出了新的多模态 ICL 方法,如 Multitask-ICL、Chain-of-Hindsight-ICL 和 Self-Correcting-ICL。我们的发现是,(1)虽然 LMMs 成功,但它们仍存在不解决的问题,不能通过缩放 alone 解决。(2)ICL 对 LMMs 的缺陷有复杂的影响,虽有效提高了解释性、抑制和遵从性,但是不会改善复合性,并且实际上会加剧幻觉。(3)我们提出的 ICL 变体是可以有效地解决一些问题的后续方法。代码可以在以下链接获取:https://evalign-icl.github.io/

RegBN: Batch Normalization of Multimodal Data with Regularization

  • paper_url: http://arxiv.org/abs/2310.00641
  • repo_url: https://github.com/mogvision/regbn
  • paper_authors: Morteza Ghahremani, Christian Wachinger
  • for: 这篇论文的目的是提出一种新的多模态数据归一化方法,以便更好地融合多种不同的数据模态,提升模型的表现。
  • methods: 论文提出了带正则化的 RegBN 方法,利用 Frobenius 范数作为正则项,抑制混淆因素和底层依赖带来的影响,并可在多个数据模态之间进行归一化,且无需可学习参数。
  • results: 论文在来自五个研究领域的八个数据库上进行了验证,涵盖语言、音频、图像、视频、深度、表格和 3D MRI 等多种数据模态,以及多层感知机、卷积神经网络和视觉 Transformer 等不同架构,展示了 RegBN 方法的通用性和有效性。
    Abstract Recent years have witnessed a surge of interest in integrating high-dimensional data captured by multisource sensors, driven by the impressive success of neural networks in the integration of multimodal data. However, the integration of heterogeneous multimodal data poses a significant challenge, as confounding effects and dependencies among such heterogeneous data sources introduce unwanted variability and bias, leading to suboptimal performance of multimodal models. Therefore, it becomes crucial to normalize the low- or high-level features extracted from data modalities before their fusion takes place. This paper introduces a novel approach for the normalization of multimodal data, called RegBN, that incorporates regularization. RegBN uses the Frobenius norm as a regularizer term to address the side effects of confounders and underlying dependencies among different data sources. The proposed method generalizes well across multiple modalities and eliminates the need for learnable parameters, simplifying training and inference. We validate the effectiveness of RegBN on eight databases from five research areas, encompassing diverse modalities such as language, audio, image, video, depth, tabular, and 3D MRI. The proposed method demonstrates broad applicability across different architectures such as multilayer perceptrons, convolutional neural networks, and vision transformers, enabling effective normalization of both low- and high-level features in multimodal neural networks. RegBN is available at \url{https://github.com/mogvision/regbn}.
    摘要 近年来,得益于神经网络在多模态数据融合方面的出色表现,将多源传感器采集的高维数据进行整合受到了广泛关注。然而,异构多模态数据的融合面临重大挑战:不同数据源之间的混淆效应和依赖关系会引入不必要的变化和偏差,导致多模态模型性能欠佳。因此,在融合之前,需要对从各数据模态提取的低级或高级特征进行归一化。本文提出了一种带正则化的多模态数据归一化方法 RegBN,它使用 Frobenius 范数作为正则项,以消除混淆因素和不同数据源之间底层依赖带来的副作用。该方法可以很好地推广到多种模态,并且不需要可学习参数,从而简化训练和推理。我们在来自五个研究领域的八个数据库上验证了 RegBN 的效果,涵盖语言、音频、图像、视频、深度、表格和 3D MRI 等多种模态;该方法适用于多层感知机、卷积神经网络和视觉 Transformer 等不同架构,能够有效地对多模态神经网络中的低级和高级特征进行归一化。RegBN 的代码可在 https://github.com/mogvision/regbn 获取。
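The sketch below illustrates only the general flavor of Frobenius-norm-style dependency removal between two feature sets: the component of one modality that is linearly predictable from another is removed with a closed-form, ridge-regularized least-squares projection (no learnable parameters). This is an assumption-laden simplification for intuition, not RegBN's actual formulation; see the paper and repository for the real method.

```python
import torch

def decorrelate(f_a: torch.Tensor, f_b: torch.Tensor, eps: float = 1e-3):
    """f_a: (N, Da) confounder/modality-A features; f_b: (N, Db) features to clean."""
    a = f_a - f_a.mean(0, keepdim=True)
    b = f_b - f_b.mean(0, keepdim=True)
    # Ridge-regularized least squares keeps the projection matrix small (Frobenius sense)
    # and the solve numerically stable.
    gram = a.T @ a + eps * torch.eye(a.size(1), device=a.device)
    w = torch.linalg.solve(gram, a.T @ b)                     # (Da, Db)
    residual = b - a @ w                                      # part of B not explained by A
    return residual / (residual.std(0, keepdim=True) + 1e-6)  # normalized output
```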

Segmentation-based Assessment of Tumor-Vessel Involvement for Surgical Resectability Prediction of Pancreatic Ductal Adenocarcinoma

  • paper_url: http://arxiv.org/abs/2310.00639
  • repo_url: None
  • paper_authors: Christiaan Viviers, Mark Ramaekers, Amaan Valiuddin, Terese Hellström, Nick Tasios, John van der Ven, Igor Jacobs, Lotte Ewals, Joost Nederend, Peter de With, Misha Luyer, Fons van der Sommen
  • for: This research aims to provide a workflow and deep learning-based segmentation models to automatically assess tumor-vessel involvement in Pancreatic ductal adenocarcinoma (PDAC) patients, which is crucial for determining treatment options and improving patient outcomes.
  • methods: The proposed workflow involves processing CT scans to segment the tumor and vascular structures, analyzing spatial relationships and the extent of vascular involvement, using three different deep learning-based segmentation architectures (nnU-Net, 3D U-Net, and Probabilistic 3D U-Net).
  • results: The segmentations achieved a high accuracy in segmenting veins, arteries, and the tumor, and enabled automated detection of tumor involvement with high accuracy (0.88 sensitivity and 0.86 specificity). Additionally, the models captured uncertainty in the predicted involvement, providing clinicians with a clear indication of tumor-vessel involvement and facilitating more informed decision-making for surgical interventions.
    Abstract Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer with limited treatment options. This research proposes a workflow and deep learning-based segmentation models to automatically assess tumor-vessel involvement, a key factor in determining tumor resectability. Correct assessment of resectability is vital to determine treatment options. The proposed workflow involves processing CT scans to segment the tumor and vascular structures, analyzing spatial relationships and the extent of vascular involvement, which follows a similar way of working as expert radiologists in PDAC assessment. Three segmentation architectures (nnU-Net, 3D U-Net, and Probabilistic 3D U-Net) achieve a high accuracy in segmenting veins, arteries, and the tumor. The segmentations enable automated detection of tumor involvement with high accuracy (0.88 sensitivity and 0.86 specificity) and automated computation of the degree of tumor-vessel contact. Additionally, due to significant inter-observer variability in these important structures, we present the uncertainty captured by each of the models to further increase insights into the predicted involvement. This result provides clinicians with a clear indication of tumor-vessel involvement and may be used to facilitate more informed decision-making for surgical interventions. The proposed method offers a valuable tool for improving patient outcomes, personalized treatment strategies and survival rates in pancreatic cancer.
    摘要 《胰腺ductal adenocarcinoma(PDAC)是一种高度侵略性的Cancer,具有有限的治疗选择。本研究提出了一种工作流程和深度学习基于的分割模型,以自动评估肿瘤-血管涉及度,这是确定肿瘤可否切除的关键因素。正确评估可以决定疗程选择。本工作流程包括对CT扫描图进行肿瘤和血管结构分割,分析肿瘤和血管之间的空间关系和血管涉及度,与专业放射科医生在PDAC评估中采用相似的方法。三种分割建筑(nnU-Net、3D U-Net和概率3D U-Net)实现了高精度分割血管、肿瘤和血管。这些分割可以自动检测肿瘤涉及度,并计算肿瘤与血管之间的接触度,并且由于肿瘤-血管结构之间存在显著的Observer variability,我们还提供了每个模型对应的不确定性,以增加预测涉及度的信息。这些结果为临床医生提供了诊断肿瘤涉及度的清晰指导,可能用于改进患者的疗效、个性化治疗策略和存活率。》

Win-Win: Training High-Resolution Vision Transformers from Two Windows

  • paper_url: http://arxiv.org/abs/2310.00632
  • repo_url: None
  • paper_authors: Vincent Leroy, Jerome Revaud, Thomas Lucas, Philippe Weinzaepfel
  • for: 提高高分辨率视觉 Transformer 的训练和推理效率。
  • methods: 采用随机窗口掩蔽技术,训练时仅保留 N 个随机窗口,使模型学习每个窗口内 token 之间的局部交互以及不同窗口间 token 的全局交互。
  • results: 推理时可直接处理高分辨率输入而无需特殊处理,并在语义分割和光流等任务上达到最佳性能,训练速度比全分辨率网络快 4 倍。
    Abstract Transformers have become the standard in state-of-the-art vision architectures, achieving impressive performance on both image-level and dense pixelwise tasks. However, training vision transformers for high-resolution pixelwise tasks has a prohibitive cost. Typical solutions boil down to hierarchical architectures, fast and approximate attention, or training on low-resolution crops. This latter solution does not constrain architectural choices, but it leads to a clear performance drop when testing at resolutions significantly higher than that used for training, thus requiring ad-hoc and slow post-processing schemes. In this paper, we propose a novel strategy for efficient training and inference of high-resolution vision transformers: the key principle is to mask out most of the high-resolution inputs during training, keeping only N random windows. This allows the model to learn local interactions between tokens inside each window, and global interactions between tokens from different windows. As a result, the model can directly process the high-resolution input at test time without any special trick. We show that this strategy is effective when using relative positional embedding such as rotary embeddings. It is 4 times faster to train than a full-resolution network, and it is straightforward to use at test time compared to existing approaches. We apply this strategy to the dense monocular task of semantic segmentation, and find that a simple setting with 2 windows performs best, hence the name of our method: Win-Win. To demonstrate the generality of our contribution, we further extend it to the binocular task of optical flow, reaching state-of-the-art performance on the Spring benchmark that contains Full-HD images with an inference time an order of magnitude faster than the best competitor.
    摘要 Transformer 已成为当前最先进视觉架构的标准,在图像级和密集像素级任务上都取得了令人印象深刻的表现。然而,针对高分辨率像素级任务训练视觉 Transformer 的代价过高。常见的解决方案包括层次结构、快速近似注意力,或在低分辨率裁剪上训练;后者虽不限制架构选择,但当测试分辨率远高于训练分辨率时会导致明显的性能下降,需要额外且缓慢的后处理方案。本文提出一种高效训练和推理高分辨率视觉 Transformer 的新策略:其核心思想是在训练时掩蔽掉大部分高分辨率输入,只保留 N 个随机窗口。这样模型既能学习每个窗口内 token 之间的局部交互,也能学习不同窗口间 token 的全局交互。因此,模型在测试时可以直接处理高分辨率输入而无需任何特殊技巧。我们表明,配合旋转位置编码等相对位置编码,该策略十分有效:训练速度比全分辨率网络快 4 倍,测试时的使用也比现有方法简单。我们将该策略应用于密集的单目语义分割任务,发现使用 2 个窗口的简单设置效果最佳,因此将方法命名为 Win-Win。为了证明该贡献的通用性,我们进一步将其扩展到双目光流任务,在包含全高清图像的 Spring 基准上达到最先进性能,推理速度比最优竞争方法快一个数量级。
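A hedged sketch of the training principle: patchify the high-resolution image, keep only the tokens that fall inside N random windows (N=2 here), and feed those tokens, together with their positions, to the transformer. The window size and the way the encoder consumes the kept indices are illustrative assumptions.

```python
import torch

def sample_window_tokens(tokens, grid_h, grid_w, win=16, n_windows=2):
    """tokens: (B, grid_h*grid_w, D) patch tokens of the full-resolution image."""
    device = tokens.device
    keep_ids = []
    for _ in range(n_windows):
        top = torch.randint(0, grid_h - win + 1, (1,)).item()
        left = torch.randint(0, grid_w - win + 1, (1,)).item()
        rows = torch.arange(top, top + win, device=device)
        cols = torch.arange(left, left + win, device=device)
        ids = (rows[:, None] * grid_w + cols[None, :]).reshape(-1)
        keep_ids.append(ids)
    keep_ids = torch.unique(torch.cat(keep_ids))   # token indices inside the windows
    # Pass keep_ids along so (relative/rotary) positional embeddings match the kept tokens.
    return tokens[:, keep_ids, :], keep_ids

# Training runs the transformer only on these tokens; at test time the model sees all
# tokens of the high-resolution image directly, with no special post-processing.
```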

Finger-UNet: A U-Net based Multi-Task Architecture for Deep Fingerprint Enhancement

  • paper_url: http://arxiv.org/abs/2310.00629
  • repo_url: None
  • paper_authors: Ekta Gavas, Anoop Namboodiri
  • for: 提高低质量指纹识别率
  • methods: 使用离散小波变换(DWT)进行指纹增强,并用小波注意力模块代替最大池化,同时结合指纹细节点预测和方向估计的多任务学习来改进指纹重建。
  • results: 在FVC 2002和NIST SD302数据库上进行实验,证明我们的方法可以提高低质量指纹识别率,并且比前一些方法更高效。
    Abstract For decades, fingerprint recognition has been prevalent for security, forensics, and other biometric applications. However, the availability of good-quality fingerprints is challenging, making recognition difficult. Fingerprint images might be degraded with a poor ridge structure and noisy or less contrasting backgrounds. Hence, fingerprint enhancement plays a vital role in the early stages of the fingerprint recognition/verification pipeline. In this paper, we investigate and improvise the encoder-decoder style architecture and suggest intuitive modifications to U-Net to enhance low-quality fingerprints effectively. We investigate the use of Discrete Wavelet Transform (DWT) for fingerprint enhancement and use a wavelet attention module instead of max pooling which proves advantageous for our task. Moreover, we replace regular convolutions with depthwise separable convolutions, which significantly reduces the memory footprint of the model without degrading the performance. We also demonstrate that incorporating domain knowledge with fingerprint minutiae prediction task can improve fingerprint reconstruction through multi-task learning. Furthermore, we also integrate the orientation estimation task to propagate the knowledge of ridge orientations to enhance the performance further. We present the experimental results and evaluate our model on FVC 2002 and NIST SD302 databases to show the effectiveness of our approach compared to previous works.
    摘要 在这篇论文中,我们改进了编码器-解码器式架构,并对 U-Net 提出直观的修改,以有效增强低质量指纹。我们使用离散小波变换(DWT)进行指纹增强,并用深度可分离卷积替代常规卷积,在不影响性能的情况下显著降低模型的内存占用。此外,我们通过多任务学习将领域知识与指纹细节点预测任务结合以改进指纹重建,并进一步引入方向估计任务来传递脊线方向知识,进一步提升性能。我们在 FVC 2002 和 NIST SD302 数据库上给出实验结果,与以往工作相比证明了该方法的有效性。
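For intuition, a depthwise separable convolution of the kind used here to shrink the memory footprint replaces one dense convolution with a per-channel (depthwise) 3x3 convolution followed by a 1x1 pointwise convolution; the exact block layout in Finger-UNet may differ, so this is only an illustrative building block.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups = in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```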

GhostEncoder: Stealthy Backdoor Attacks with Dynamic Triggers to Pre-trained Encoders in Self-supervised Learning

  • paper_url: http://arxiv.org/abs/2310.00626
  • repo_url: None
  • paper_authors: Qiannan Wang, Changchun Yin, Zhe Liu, Liming Fang, Run Wang, Chenhao Lin
  • for: 本研究提出一种针对自监督学习预训练图像编码器的隐蔽、动态后门攻击方法。
  • methods: 该攻击利用图像隐写技术,将隐藏信息编码到正常图像中以生成后门样本;然后在操纵数据集上微调预训练图像编码器,从而植入后门。
  • results: 实验结果表明,GhostEncoder 在图像上具有实用的隐蔽性,能以很高的攻击成功率欺骗受害模型,同时不损害其可用性;此外,GhostEncoder 还能抵抗 STRIP、STRIP-Cl 和 SSL-Cleanse 等最新的防御方法。
    Abstract Within the realm of computer vision, self-supervised learning (SSL) pertains to training pre-trained image encoders utilizing a substantial quantity of unlabeled images. Pre-trained image encoders can serve as feature extractors, facilitating the construction of downstream classifiers for various tasks. However, the use of SSL has led to an increase in security research related to various backdoor attacks. Currently, the trigger patterns used in backdoor attacks on SSL are mostly visible or static (sample-agnostic), making backdoors less covert and significantly affecting the attack performance. In this work, we propose GhostEncoder, the first dynamic invisible backdoor attack on SSL. Unlike existing backdoor attacks on SSL, which use visible or static trigger patterns, GhostEncoder utilizes image steganography techniques to encode hidden information into benign images and generate backdoor samples. We then fine-tune the pre-trained image encoder on a manipulation dataset to inject the backdoor, enabling downstream classifiers built upon the backdoored encoder to inherit the backdoor behavior for target downstream tasks. We evaluate GhostEncoder on three downstream tasks and results demonstrate that GhostEncoder provides practical stealthiness on images and deceives the victim model with a high attack success rate without compromising its utility. Furthermore, GhostEncoder withstands state-of-the-art defenses, including STRIP, STRIP-Cl, and SSL-Cleanse.
    摘要 在计算机视觉领域,自主学习(SSL)指的是使用大量未标注图像进行训练已经预训练的图像编码器。这些预训练图像编码器可以作为特征提取器,帮助建立下游分类器 для多种任务。然而,使用SSL带来了安全研究中的各种后门攻击。现在,许多后门攻击使用SSL的触发模式都是可见或静止的(样本不具特定),这使得后门变得更加明显,对攻击性能产生负面影响。在这种情况下,我们提出了 GhostEncoder,首个在SSL中的动态隐藏后门攻击。与现有的SSL后门攻击不同,GhostEncoder使用图像隐写技术来编码隐藏信息到正常图像中,并生成后门样本。然后,我们精细调整预训练图像编码器,使其在扭曲数据集上进行后门插入,使得基于后门编码器的下游分类器继承后门行为,并且不会增加负面影响。我们对 GhostEncoder 进行了三个下游任务的评估,结果表明,GhostEncoder 在图像上具有实际的隐藏性,诱导了受试模型,并且不会降低其实用性。此外,GhostEncoder 可以抵御当前的防御技术,包括 STRIP、STRIP-Cl 和 SSL-Cleanse。

Understanding Adversarial Transferability in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.00616
  • repo_url: None
  • paper_authors: Yijiang Li, Ying Gao, Haohan Wang
  • for: This paper investigates the robustness and security issues of federated learning (FL) systems in a practical setting where malicious clients disguise their identities and launch transferable adversarial attacks.
  • methods: The paper uses empirical experiments and theoretical analysis to study the robustness of FL systems against such attacks, and hypothesizes that the decentralized training on distributed data and the averaging operation contribute to the system's robustness.
  • results: The paper finds that the federated model is more robust compared to its centralized counterpart when the accuracy on clean images is comparable, and provides evidence from both empirical experiments and theoretical analysis to support this conclusion.
    Abstract We investigate the robustness and security issues from a novel and practical setting: a group of malicious clients has impacted the model during training by disguising their identities and acting as benign clients, and only revealing their adversary position after the training to conduct transferable adversarial attacks with their data, which is usually a subset of the data that FL system is trained with. Our aim is to offer a full understanding of the challenges the FL system faces in this practical setting across a spectrum of configurations. We notice that such an attack is possible, but the federated model is more robust compared with its centralized counterpart when the accuracy on clean images is comparable. Through our study, we hypothesized the robustness is from two factors: the decentralized training on distributed data and the averaging operation. We provide evidence from both the perspective of empirical experiments and theoretical analysis. Our work has implications for understanding the robustness of federated learning systems and poses a practical question for federated learning applications.
    摘要 我们从一个新颖且实际的场景出发研究鲁棒性和安全问题:一群恶意客户端在训练期间伪装成善意客户端参与并影响模型,训练结束后才暴露其敌对身份,并利用其数据(通常是联邦学习系统训练数据的一个子集)发起可迁移的对抗攻击。我们的目标是在多种配置下全面理解联邦学习系统在这一实际场景中所面临的挑战。我们发现这种攻击是可行的,但在干净图像准确率相当的情况下,联邦模型比其中心化训练的对应模型更加鲁棒。通过研究,我们推测这种鲁棒性来自两个因素:在分布式数据上的去中心化训练,以及参数平均操作。我们从实证实验和理论分析两个角度提供了证据。这项工作有助于理解联邦学习系统的鲁棒性,并为联邦学习应用提出了一个实际问题。
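The server-side averaging operation credited above as one source of robustness is, in its simplest FedAvg-style form, just a sample-count-weighted average of client parameters; the sketch below shows that operation in isolation (weighting scheme is the common choice, not necessarily the one used in the paper's experiments).

```python
import torch

def federated_average(client_state_dicts, client_sizes):
    """Average client model parameters, weighted by each client's sample count."""
    total = float(sum(client_sizes))
    avg = {}
    for key in client_state_dicts[0]:
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(client_state_dicts, client_sizes))
    return avg  # load into the global model with model.load_state_dict(avg)
```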

Scene-aware Human Motion Forecasting via Mutual Distance Prediction

  • paper_url: http://arxiv.org/abs/2310.00615
  • repo_url: None
  • paper_authors: Chaoyue Xing, Wei Mao, Miaomiao Liu
  • for: 本研究旨在解决场景感知的人体运动预测问题,通过建模人体与场景的交互来预测未来的人体运动。
  • methods: 我们提出用人体与场景之间的相互距离来建模人体-场景交互,其中包括人体网格每个顶点到场景表面的有符号距离,以及基准场景点到人体网格的距离。我们设计了一个两步预测管道:先从历史人体运动序列和场景预测未来的相互距离,再以预测的距离为条件预测未来的人体运动。训练过程中,我们显式地鼓励预测姿态与相互距离之间保持一致。
  • results: 我们的方法在合成数据集和真实数据集上均优于现有最先进方法。
    Abstract In this paper, we tackle the problem of scene-aware 3D human motion forecasting. A key challenge of this task is to predict future human motions that are consistent with the scene, by modelling the human-scene interactions. While recent works have demonstrated that explicit constraints on human-scene interactions can prevent the occurrence of ghost motion, they only provide constraints on partial human motion e.g., the global motion of the human or a few joints contacting the scene, leaving the rest motion unconstrained. To address this limitation, we propose to model the human-scene interaction with the mutual distance between the human body and the scene. Such mutual distances constrain both the local and global human motion, resulting in a whole-body motion constrained prediction. In particular, mutual distance constraints consist of two components, the signed distance of each vertex on the human mesh to the scene surface, and the distance of basis scene points to the human mesh. We develop a pipeline with two prediction steps that first predicts the future mutual distances from the past human motion sequence and the scene, and then forecasts the future human motion conditioning on the predicted mutual distances. During training, we explicitly encourage consistency between the predicted poses and the mutual distances. Our approach outperforms the state-of-the-art methods on both synthetic and real datasets.
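A simplified sketch of the mutual-distance idea: for every human mesh vertex, the distance to the nearest scene point, and for every (basis) scene point, the distance to the nearest human vertex. Unsigned nearest-neighbor distances are used here as a stand-in for the paper's signed variant, which additionally requires scene surface normals.

```python
import torch

def mutual_distances(human_verts, scene_pts):
    """human_verts: (V, 3) mesh vertices; scene_pts: (S, 3) sampled scene points."""
    d = torch.cdist(human_verts, scene_pts)      # (V, S) pairwise Euclidean distances
    human_to_scene = d.min(dim=1).values         # (V,) per-vertex distance to the scene
    scene_to_human = d.min(dim=0).values         # (S,) per-scene-point distance to the body
    return human_to_scene, scene_to_human
```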

Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning

  • paper_url: http://arxiv.org/abs/2310.00608
  • repo_url: None
  • paper_authors: Zhiheng Li, Wenjia Geng, Muheng Li, Lei Chen, Yansong Tang, Jiwen Lu, Jie Zhou
  • for: 提出一种基于压缩动作空间学习的流程规划方法,以解决现有方法在高维状态监督和动作序列误差累积方面遇到的困难。
  • methods: 将流程规划问题抽象为数学链模型,通过跳过不确定的节点和边,以两种方式将冗长而复杂的序列函数转化为简短而可靠的序列。
  • results: 在 CrossTask 和 COIN 基准上进行了大量实验,达到了最先进性能。
    Abstract In this paper, we propose Skip-Plan, a condensed action space learning method for procedure planning in instructional videos. Current procedure planning methods all stick to the state-action pair prediction at every timestep and generate actions adjacently. Although it coincides with human intuition, such a methodology consistently struggles with high-dimensional state supervision and error accumulation on action sequences. In this work, we abstract the procedure planning problem as a mathematical chain model. By skipping uncertain nodes and edges in action chains, we transfer long and complex sequence functions into short but reliable ones in two ways. First, we skip all the intermediate state supervision and only focus on action predictions. Second, we decompose relatively long chains into multiple short sub-chains by skipping unreliable intermediate actions. By this means, our model explores all sorts of reliable sub-relations within an action sequence in the condensed action space. Extensive experiments show Skip-Plan achieves state-of-the-art performance on the CrossTask and COIN benchmarks for procedure planning.
    摘要 在这篇论文中,我们提出了 Skip-Plan,一种用于教学视频流程规划的压缩动作空间学习方法。现有的流程规划方法都在每个时间步预测状态-动作对,并逐步相邻地生成动作;这虽然符合人类直觉,但在高维状态监督和动作序列误差累积方面一直表现不佳。在这项工作中,我们将流程规划问题抽象为数学链模型。通过跳过动作链中不确定的节点和边,我们以两种方式将冗长而复杂的序列函数转化为简短而可靠的序列:其一,跳过所有中间状态监督,只关注动作预测;其二,跳过不可靠的中间动作,将较长的链分解为多条较短的子链。这样,模型便能在压缩后的动作空间中探索动作序列内各种可靠的子关系。大量实验表明,Skip-Plan 在 CrossTask 和 COIN 流程规划基准上达到了最先进性能。

Quantum image edge detection based on eight-direction Sobel operator for NEQR

  • paper_url: http://arxiv.org/abs/2310.03037
  • repo_url: None
  • paper_authors: Wenjie Liu, Lu Wang
  • for: 这个论文是为了提出一种基于量子机制的图像边缘检测算法(QSED),以解决经典算法遇到的实时问题。
  • methods: 该算法基于八个方向的 Sobel 算子,不仅可以减少部分图像的边缘信息损失,还同时计算所有像素的八个方向的梯度值。
  • results: 对于 2^n x 2^n 图像,该算法的复杂度可以降至 O(n^2 + q^2),比其他经典或量子算法低。实验表明,该算法可以更好地检测高清像中的对角边缘。
    Abstract Quantum Sobel edge detection (QSED) is a kind of algorithm for image edge detection using quantum mechanism, which can solve the real-time problem encountered by classical algorithms. However, the existing QSED algorithms only consider two- or four-direction Sobel operator, which leads to a certain loss of edge detail information in some high-definition images. In this paper, a novel QSED algorithm based on eight-direction Sobel operator is proposed, which not only reduces the loss of edge information, but also simultaneously calculates eight directions' gradient values of all pixel in a quantum image. In addition, the concrete quantum circuits, which consist of gradient calculation, non-maximum suppression, double threshold detection and edge tracking units, are designed in details. For a 2^n x 2^n image with q gray scale, the complexity of our algorithm can be reduced to O(n^2 + q^2), which is lower than other existing classical or quantum algorithms. And the simulation experiment demonstrates that our algorithm can detect more edge information, especially diagonal edges, than the two- and four-direction QSED algorithms.
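As a classical reference for what the quantum circuit evaluates for all pixels simultaneously, eight-direction Sobel edge detection can be sketched with NumPy/SciPy: the two base kernels (0° and 45°) are rotated to cover eight directions and the maximum gradient magnitude over directions is thresholded. The threshold value is an assumed parameter.

```python
import numpy as np
from scipy.ndimage import convolve

def sobel_8dir_edges(img, thresh=128):
    k0 = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)   # 0 degrees
    k45 = np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], dtype=np.float32)  # 45 degrees
    # Rotating each base kernel by 90-degree steps yields the eight directions.
    kernels = [np.rot90(k0, r) for r in range(4)] + [np.rot90(k45, r) for r in range(4)]
    grads = np.stack([np.abs(convolve(img.astype(np.float32), k)) for k in kernels])
    return grads.max(axis=0) > thresh              # binary edge map
```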

Image Data Hiding in Neural Compressed Latent Representations

  • paper_url: http://arxiv.org/abs/2310.00568
  • repo_url: None
  • paper_authors: Chen-Hsiu Huang, Ja-Ling Wu
  • for: 这篇论文提出一个端到端学习的图像数据隐藏框架,用于嵌入和提取秘密信息。
  • methods: 该方法在一个通用神经压缩器的潜在表示中嵌入信息,结合我们提出的消息编码器和解码器,并利用感知损失函数同时实现高图像质量和高比特准确率。
  • results: 该方法在压缩域中实现了更优的图像保密性和具有竞争力的水印鲁棒性,同时将嵌入速度提升 50 倍以上。这些结果表明将数据隐藏技术与神经压缩相结合具有潜力,并为神经压缩技术及其应用提供了新的见解。
    Abstract We propose an end-to-end learned image data hiding framework that embeds and extracts secrets in the latent representations of a generic neural compressor. By leveraging a perceptual loss function in conjunction with our proposed message encoder and decoder, our approach simultaneously achieves high image quality and high bit accuracy. Compared to existing techniques, our framework offers superior image secrecy and competitive watermarking robustness in the compressed domain while accelerating the embedding speed by over 50 times. These results demonstrate the potential of combining data hiding techniques and neural compression and offer new insights into developing neural compression techniques and their applications.
    摘要 我们提出了一个末端学习的图像数据隐藏框架,该框架在一个通用的神经压缩器中嵌入和提取秘密。通过我们提出的消息编码器和解码器以及一种感知损失函数,我们的方法同时实现高质量图像和高比特率。与现有技术相比,我们的框架在压缩领域中提供了更高的图像机密性和竞争力强的水印鲁棒性,同时加速嵌入速度,提高了50倍以上。这些结果表明将数据隐藏技术与神经压缩结合可以实现新的应用和技术突破,并为神经压缩技术的发展提供新的视角。

CPIPS: Learning to Preserve Perceptual Distances in End-to-End Image Compression

  • paper_url: http://arxiv.org/abs/2310.00559
  • repo_url: None
  • paper_authors: Chen-Hsiu Huang, Ja-Ling Wu
  • for: 这篇论文旨在提出一种受生物系统高效编码假说启发的压缩域感知相似度度量,使压缩表示既服务于人类视觉,也服务于图像处理和机器视觉任务。
  • methods: 该方法基于已学习的神经编解码器,对压缩潜在表示进行重新利用,在保持感知距离的同时优先保留语义相关性。
  • results: 与基于深度网络的感知度量(如 LPIPS 和 DISTS)相比,CPIPS 可以以极低的额外代价从已学习的神经编解码器中导出,并且计算速度显著更快。
    Abstract Lossy image coding standards such as JPEG and MPEG have successfully achieved high compression rates for human consumption of multimedia data. However, with the increasing prevalence of IoT devices, drones, and self-driving cars, machines rather than humans are processing a greater portion of captured visual content. Consequently, it is crucial to pursue an efficient compressed representation that caters not only to human vision but also to image processing and machine vision tasks. Drawing inspiration from the efficient coding hypothesis in biological systems and the modeling of the sensory cortex in neural science, we repurpose the compressed latent representation to prioritize semantic relevance while preserving perceptual distance. Our proposed method, Compressed Perceptual Image Patch Similarity (CPIPS), can be derived at a minimal cost from a learned neural codec and computed significantly faster than DNN-based perceptual metrics such as LPIPS and DISTS.
    摘要 产生损失的图像编码标准如JPEG和MPEG已经成功实现了多媒体数据的高压缩率 для人类消耗。然而,随着互联网物联网设备、无人机和自动驾驶车的普及,机器正在处理更多的捕捉视觉内容。因此,我们需要追求一种高效的压缩表示,不仅适合人类视觉,还适合图像处理和机器视觉任务。 drawing inspiration from生物系统中的高效编码假设和神经科学中的感觉脑层模型,我们重新利用压缩潜在表示,优先级 semantic relevance 而保持perceptual distance。我们提议的方法,压缩感知图像patch similarity(CPIPS),可以在学习神经编码器的基础上得到,并且可以在DNN基于的感知度量方法,如LPIPS和DISTS,中计算得到更快。

Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes

  • paper_url: http://arxiv.org/abs/2310.00558
  • repo_url: None
  • paper_authors: Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós
  • for: 本研究旨在开发一种能够泛化到多个领域的场景文本检测系统,以便在真实世界的不同领域中直接进行文本检测。
  • methods: 我们在多领域源数据上训练模型,使其可以直接泛化到目标领域进行文本检测,而无需针对特定领域或场景进行微调。
  • results: 我们提出了一种基于超分辨率的端到端 Transformer 基线模型 DA-TextSpotter,在常规和任意形状场景文本检测基准上,在准确率和模型效率方面均达到或超越现有文本检测架构。
    Abstract When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system. However, existing state-of-the-art methods employ pretraining and fine-tuning strategies on natural scene datasets, which do not exploit the feature interaction across other complex domains. In this work, we explore and investigate the problem of domain-agnostic scene text spotting, i.e., training a model on multi-domain source data such that it can directly generalize to target domains rather than being specialized for a specific domain or scenario. In this regard, we present the community a text spotting validation benchmark called Under-Water Text (UWT) for noisy underwater scenes to establish an important case study. Moreover, we also design an efficient super-resolution based end-to-end transformer baseline called DA-TextSpotter which achieves comparable or superior performance over existing text spotting architectures for both regular and arbitrary-shaped scene text spotting benchmarks in terms of both accuracy and model efficiency. The dataset, code and pre-trained models will be released upon acceptance.
    摘要 当用于实际噪声环境中的自动Scene文本检测系统时,能够泛化到多个领域是非常重要的。然而,现有的状态艺术方法通常采用预训练和细化策略在自然场景数据集上,这并不利用场景文本之间的特征互动。在这种情况下,我们探索和探讨域性文本检测问题,即在多个领域源数据上训练一个模型,使其直接泛化到目标领域而不是特定领域或enario。为此,我们向社区提供了文本检测验证 benchmark called Under-Water Text (UWT),以便在水下场景中进行噪声检测。此外,我们还设计了一种高效的超解像基于 transformer 结构的终端模型called DA-TextSpotter,它在常见和任意形状场景文本检测benchmark上实现了和现有文本检测建筑物之间的比较或更好的性能,同时具有更高的模型效率。数据集、代码和预训练模型将在接受后发布。

Seal2Real: Prompt Prior Learning on Diffusion Model for Unsupervised Document Seal Data Generation and Realisation

  • paper_url: http://arxiv.org/abs/2310.00546
  • repo_url: None
  • paper_authors: Jiancheng Huang, Yifan Liu, Yi Huang, Shifeng Chen
  • for: 提出一种生成大量带标注文档印章数据的方法,以提升文档处理中印章相关任务的性能。
  • methods: 基于预训练的 Stable Diffusion 模型提出一种提示先验学习架构,通过无监督训练将其先验生成能力迁移到印章生成任务中。
  • results: 在 Seal-DB 数据集上的实验表明,Seal2Real 能够生成高度逼真的印章图像,显著提升下游印章相关任务在真实数据上的表现。
    Abstract In document processing, seal-related tasks have very large commercial applications, such as seal segmentation, seal authenticity discrimination, seal removal, and text recognition under seals. However, these seal-related tasks are highly dependent on labelled document seal datasets, resulting in very little work on these tasks. To address the lack of labelled datasets for these seal-related tasks, we propose Seal2Real, a generative method that generates a large amount of labelled document seal data, and construct a Seal-DB dataset containing 20K images with labels. In Seal2Real, we propose a prompt prior learning architecture based on a pre-trained Stable Diffusion Model that migrates the prior generative power of to our seal generation task with unsupervised training. The realistic seal generation capability greatly facilitates the performance of downstream seal-related tasks on real data. Experimental results on the Seal-DB dataset demonstrate the effectiveness of Seal2Real.
    摘要 在文档处理领域中,有很多商业应用,如印章分割、印章真实性识别、印章去除和文本识别下印章。然而,这些印章相关任务都受到了标注文档印章数据的限制,导致了这些任务的研究得到了非常少的积极性。为了解决标注文档印章数据的缺乏,我们提出了Seal2Real方法,该方法可以生成大量标注文档印章数据,并构建了一个名为Seal-DB的数据集,包含20K个图像和标签。在Seal2Real中,我们提出了一种提前学习的推荐模型,基于静止扩散模型,将先前的生成能力传递到我们的印章生成任务中,并在无监督下训练。这种印章生成能力可以帮助下游印章相关任务在真实数据上表现出色。实验结果表明,Seal2Real是有效的。

Implicit Neural Representations and the Algebra of Complex Wavelets

  • paper_url: http://arxiv.org/abs/2310.00545
  • repo_url: None
  • paper_authors: T. Mitchell Roddenberry, Vishwanath Saragadam, Maarten V. de Hoop, Richard G. Baraniuk
  • for: 这篇论文旨在探讨隐式神经表示(INR)在欧几里得空间上信号的表示、处理与机器学习中的作用。
  • methods: 论文将信号参数化为欧几里得空间上的多层感知机(MLP),并重点分析以小波代替正弦函数作为激活函数的 INR。
  • results: 研究表明,小波激活函数能够同时在频率和空间上实现局部化,使 INR 能从 MLP 第一层的粗略近似中恢复信号的高频特征;据此给出了多条 INR 架构设计建议,包括使用复数小波、解耦低通与带通近似,以及基于目标信号奇点的初始化方案。
    Abstract Implicit neural representations (INRs) have arisen as useful methods for representing signals on Euclidean domains. By parameterizing an image as a multilayer perceptron (MLP) on Euclidean space, INRs effectively represent signals in a way that couples spatial and spectral features of the signal that is not obvious in the usual discrete representation, paving the way for continuous signal processing and machine learning approaches that were not previously possible. Although INRs using sinusoidal activation functions have been studied in terms of Fourier theory, recent works have shown the advantage of using wavelets instead of sinusoids as activation functions, due to their ability to simultaneously localize in both frequency and space. In this work, we approach such INRs and demonstrate how they resolve high-frequency features of signals from coarse approximations done in the first layer of the MLP. This leads to multiple prescriptions for the design of INR architectures, including the use of complex wavelets, decoupling of low and band-pass approximations, and initialization schemes based on the singularities of the desired signal.
    摘要 隐式神经表示(INR)已成为在欧几里得域上表示信号的有用方法。通过将图像参数化为欧几里得空间上的多层感知机(MLP),INR 以一种耦合信号空间特征与频谱特征的方式表示信号,这在常规的离散表示中并不明显,从而为此前无法实现的连续信号处理和机器学习方法铺平了道路。虽然使用正弦激活函数的 INR 已经从傅里叶理论的角度得到研究,但近期工作表明,用小波代替正弦作为激活函数更具优势,因为小波能同时在频率和空间上实现局部化。在这项工作中,我们研究了此类 INR,并展示了它们如何从 MLP 第一层的粗略近似中恢复信号的高频特征。由此我们得到了多条 INR 架构设计建议,包括使用复数小波、解耦低通与带通近似,以及基于目标信号奇点的初始化方案。
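For concreteness, a wavelet-activated INR layer of the kind discussed here can be sketched as a linear layer followed by a real-valued Gabor wavelet activation, which localizes in both space and frequency (unlike a pure sinusoid). The hyperparameters omega0 and sigma0 and the layer stacking are illustrative assumptions, not values or an architecture prescribed by the paper.

```python
import torch
import torch.nn as nn

class GaborWaveletLayer(nn.Module):
    def __init__(self, in_dim, out_dim, omega0=10.0, sigma0=10.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.omega0, self.sigma0 = omega0, sigma0

    def forward(self, x):
        z = self.linear(x)
        # cos(omega*z) gives frequency localization; the Gaussian envelope gives spatial localization.
        return torch.cos(self.omega0 * z) * torch.exp(-(self.sigma0 * z) ** 2)

# A small INR maps coordinates (x, y) to pixel values through a stack of such layers, e.g.:
# inr = nn.Sequential(GaborWaveletLayer(2, 256), GaborWaveletLayer(256, 256), nn.Linear(256, 3))
```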

Enabling Neural Radiance Fields (NeRF) for Large-scale Aerial Images – A Multi-tiling Approach and the Geometry Assessment of NeRF

  • paper_url: http://arxiv.org/abs/2310.00530
  • repo_url: None
  • paper_authors: Ningli Xu, Rongjun Qin, Debao Huang, Fabio Remondino
  • for: 这篇论文旨在将 NeRF(神经辐射场)扩展到大规模航空影像数据,并对其重建几何的可扩展性与精度进行全面评估。
  • methods: 作者提出了位置特定的采样技术和多相机分块(MCT)策略,以降低图像加载时的内存占用和表示训练时的显存占用,并提高各分块内的收敛速度。
  • results: 作者在两个典型的航空影像数据集上将所提方法与三种摄影测量 MVS 流程进行了对比,结果表明所提的 NeRF 方法在完整性和物体细节方面表现更好,但在精度上仍有不足。
    Abstract Neural Radiance Fields (NeRF) offer the potential to benefit 3D reconstruction tasks, including aerial photogrammetry. However, the scalability and accuracy of the inferred geometry are not well-documented for large-scale aerial assets, since such datasets usually result in very high memory consumption and slow convergence. In this paper, we aim to scale NeRF on large-scale aerial datasets and provide a thorough geometry assessment of NeRF. Specifically, we introduce a location-specific sampling technique as well as a multi-camera tiling (MCT) strategy to reduce memory consumption during image loading for RAM, representation training for GPU memory, and increase the convergence rate within tiles. MCT decomposes a large-frame image into multiple tiled images with different camera models, allowing these small-frame images to be fed into the training process as needed for specific locations without a loss of accuracy. We implement our method on a representative approach, Mip-NeRF, and compare its geometry performance with three photogrammetric MVS pipelines on two typical aerial datasets against LiDAR reference data. Both qualitative and quantitative results suggest that the proposed NeRF approach produces better completeness and object details than traditional approaches, although as of now, it still falls short in terms of accuracy.
    摘要 neural radiance fields (NeRF) 可能帮助3D重建任务,包括航空相机摄影。然而,对大规模航空资产的推广和准确性不够 document。在这篇论文中,我们希望通过缩放NeRF来适应大规模航空资产,并对NeRF的geometry进行全面评估。我们引入了位置特定的采样技术以及多camera tilting(MCT)策略,以降低内存占用量,提高内存中的表示训练,并提高分割区域内的快速转换。MCT将大幅度图像分解成多个不同摄像机模型的小幅度图像,以便在特定位置上无损loss的方式进行训练。我们实现了这种方法,并与三种光学多视角摄影管道进行比较,以评估NeRF的geometry性能。结果表明,我们的方法可以在两个典型的航空数据集上提供更好的完整性和物体细节,although it still lags behind in terms of accuracy。

Self-supervised Learning of Contextualized Local Visual Embeddings

  • paper_url: http://arxiv.org/abs/2310.00527
  • repo_url: https://github.com/sthalles/clove
  • paper_authors: Thalles Santos Silva, Helio Pedrini, Adín Ramírez Rivera
  • for: 这篇论文提出一种基于卷积神经网络(CNN)的自监督方法,用于学习适合密集预测任务的表示。
  • methods: 论文提出一种新的归一化多头自注意力层,基于相似度组合图像不同部分的局部特征,以学习上下文化的局部嵌入。
  • results: 论文在多个数据集上进行了广泛的基准测试,在目标检测、实例分割、关键点检测和密集姿态估计等 4 个密集预测下游任务中,达到了基于 CNN 架构的最先进性能。
    Abstract We present Contextualized Local Visual Embeddings (CLoVE), a self-supervised convolutional-based method that learns representations suited for dense prediction tasks. CLoVE deviates from current methods and optimizes a single loss function that operates at the level of contextualized local embeddings learned from output feature maps of convolution neural network (CNN) encoders. To learn contextualized embeddings, CLoVE proposes a normalized mult-head self-attention layer that combines local features from different parts of an image based on similarity. We extensively benchmark CLoVE's pre-trained representations on multiple datasets. CLoVE reaches state-of-the-art performance for CNN-based architectures in 4 dense prediction downstream tasks, including object detection, instance segmentation, keypoint detection, and dense pose estimation.
    摘要 我们提出 Contextualized Local Visual Embeddings(CLoVE),这是一种基于卷积网络的自监督方法,用于学习适合密集预测任务的表示。与现有方法不同,CLoVE 优化一个单一的损失函数,该函数作用于从卷积神经网络(CNN)编码器输出特征图中学习到的上下文化局部嵌入。为学习上下文化嵌入,CLoVE 提出了一种归一化多头自注意力层,根据相似性组合图像不同部分的局部特征。我们在多个数据集上对 CLoVE 的预训练表示进行了广泛的基准测试,CLoVE 在目标检测、实例分割、关键点检测和密集姿态估计这四个密集预测下游任务中,取得了基于 CNN 架构的最优表现。
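
As a rough illustration of the contextualization step described above, the sketch below runs a normalized multi-head self-attention layer over the spatial positions of a CNN feature map, so each local embedding becomes a similarity-weighted mixture of features from other image regions. Dimensions, head count, and the placement of the LayerNorm are assumptions, not the released CLoVE code.

```python
import torch
import torch.nn as nn

class ContextualizedLocalEmbeddings(nn.Module):
    """Attend over the spatial positions of a CNN feature map so each local
    embedding mixes in similar features from other parts of the image
    (a sketch of the idea, not the official CLoVE implementation)."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat_map):                       # feat_map: (B, C, H, W)
        B, C, H, W = feat_map.shape
        tokens = feat_map.flatten(2).transpose(1, 2)   # (B, H*W, C)
        tokens = self.norm(tokens)
        ctx, _ = self.attn(tokens, tokens, tokens)     # contextualized locals
        return ctx.transpose(1, 2).view(B, C, H, W)

# Usage: features from a CNN stage, e.g. a (2, 256, 14, 14) tensor.
x = torch.randn(2, 256, 14, 14)
print(ContextualizedLocalEmbeddings()(x).shape)
```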

cs.AI - 2023-10-01

OceanNet: A principled neural operator-based digital twin for regional oceans

  • paper_url: http://arxiv.org/abs/2310.00813
  • repo_url: None
  • paper_authors: Ashesh Chattopadhyay, Michael Gray, Tianning Wu, Anna B. Lowe, Ruoying He
  • for: 这个研究旨在开发一种基于神经网络的数字孪生模型,用于海洋径流预测。
  • methods: 该模型使用傅里叶神经算子和“预测-评估-修正”积分方案,以抑制自回归误差增长并提高长时间尺度上的预测稳定性。此外,使用频谱正则化项减少小尺度上的谱偏差。
  • results: 在北大西洋西边界流(墨西哥湾流)上测试了该模型,并成功完成了对环流涡旋(Loop Current eddies)和湾流弯曲的季节预测。与传统的非耦合、最先进动力海洋模式的预报相比,该模型显示出相当的预测能力,同时将计算量降低了50万倍。这些成果表明,物理启发的深度神经算子有望成为高分辨率数值海洋模式的高性价比替代方案。
    Abstract While data-driven approaches demonstrate great potential in atmospheric modeling and weather forecasting, ocean modeling poses distinct challenges due to complex bathymetry, land, vertical structure, and flow non-linearity. This study introduces OceanNet, a principled neural operator-based digital twin for ocean circulation. OceanNet uses a Fourier neural operator and predictor-evaluate-corrector integration scheme to mitigate autoregressive error growth and enhance stability over extended time scales. A spectral regularizer counteracts spectral bias at smaller scales. OceanNet is applied to the northwest Atlantic Ocean western boundary current (the Gulf Stream), focusing on the task of seasonal prediction for Loop Current eddies and the Gulf Stream meander. Trained using historical sea surface height (SSH) data, OceanNet demonstrates competitive forecast skill by outperforming SSH predictions by an uncoupled, state-of-the-art dynamical ocean model forecast, reducing computation by 500,000 times. These accomplishments demonstrate the potential of physics-inspired deep neural operators as cost-effective alternatives to high-resolution numerical ocean models.
    摘要 尽管数据驱动方法在大气模拟和天气预报中展现出巨大潜力,但海洋模拟因复杂的海底地形、陆地边界、垂直结构与流动非线性而面临独特挑战。本研究提出 OceanNet,一种基于神经算子的海洋环流数字孪生。OceanNet 采用傅里叶神经算子与“预测-评估-修正”积分方案,以抑制自回归误差增长并增强长时间尺度上的稳定性;同时引入频谱正则化项以抵消小尺度上的谱偏差。OceanNet 被应用于北大西洋西边界流(墨西哥湾流),重点是环流涡旋与湾流弯曲的季节预测。在历史海面高度(SSH)数据上训练后,OceanNet 的预报技巧优于非耦合的最先进动力海洋模式的 SSH 预报,且计算量减少了50万倍。这些成果表明,物理启发的深度神经算子可成为高分辨率数值海洋模式的高性价比替代方案。
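
The spectral regularizer mentioned in the abstract penalizes mismatch between predicted and reference power spectra at small scales. Below is a minimal sketch of such a loss term in PyTorch; the wavenumber cutoff, the squared-magnitude penalty, and the function name are assumptions rather than OceanNet's exact formulation.

```python
import torch

def spectral_regularizer(pred, target, cutoff=0.5):
    """Penalize the mismatch between predicted and reference amplitude
    spectra at high wavenumbers (small spatial scales). A sketch of the
    kind of spectral loss the abstract describes; cutoff and norm are
    illustrative choices."""
    pred_hat = torch.fft.rfft2(pred)
    targ_hat = torch.fft.rfft2(target)
    ky = torch.fft.fftfreq(pred.shape[-2], device=pred.device).abs().view(-1, 1)
    kx = torch.fft.rfftfreq(pred.shape[-1], device=pred.device).view(1, -1)
    k = torch.sqrt(kx**2 + ky**2)
    mask = k > cutoff * k.max()                      # keep only small scales
    return ((pred_hat.abs() - targ_hat.abs())**2 * mask).mean()

# Usage: add lambda * spectral_regularizer(pred_ssh, true_ssh) to the
# data-fitting loss when training the neural operator.
pred = torch.randn(4, 128, 128)
true = torch.randn(4, 128, 128)
print(spectral_regularizer(pred, true).item())
```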

Sparse Backpropagation for MoE Training

  • paper_url: http://arxiv.org/abs/2310.00811
  • repo_url: None
  • paper_authors: Liyuan Liu, Jianfeng Gao, Weizhu Chen
  • for: 这篇论文主要旨在解决深度学习中的权值计算问题,特别是在混合专家(Mixture-of-Expert,MoE)模型中,通过专家路由实现稀疏计算,从而实现很好的扩展性。
  • methods: 该论文提出了一种名为SparseMixer的扩展性 gradient estimator,它可以在混合专家模型中实现可靠的梯度估计,并且不需要忽略某些梯度项,从而实现更加准确的梯度估计。SparseMixer基于数字差分方法,利用中点法来提供精确的梯度估计,计算 overhead 很低。
  • results: 应用SparseMixer于 Switch Transformer 上,在预训练和机器翻译任务中,可以见到较大的性能提升,快速加速训练过程,最多提高训练速度2倍。
    Abstract One defining characteristic of Mixture-of-Expert (MoE) models is their capacity for conducting sparse computation via expert routing, leading to remarkable scalability. However, backpropagation, the cornerstone of deep learning, requires dense computation, thereby posting challenges in MoE gradient computations. Here, we introduce SparseMixer, a scalable gradient estimator that bridges the gap between backpropagation and sparse expert routing. Unlike typical MoE training which strategically neglects certain gradient terms for the sake of sparse computation and scalability, SparseMixer provides scalable gradient approximations for these terms, enabling reliable gradient estimation in MoE training. Grounded in a numerical ODE framework, SparseMixer harnesses the mid-point method, a second-order ODE solver, to deliver precise gradient approximations with negligible computational overhead. Applying SparseMixer to Switch Transformer on both pre-training and machine translation tasks, SparseMixer showcases considerable performance gain, accelerating training convergence up to 2 times.
    摘要 混合专家(MoE)模型的一个标志性特征是通过专家路由进行稀疏计算,从而具备出色的可扩展性。然而,深度学习的基石——反向传播——要求稠密计算,这给 MoE 的梯度计算带来挑战。为此,我们提出 SparseMixer,一种可扩展的梯度估计器,用以弥合反向传播与稀疏专家路由之间的差距。与通常为了稀疏计算和可扩展性而有策略地忽略某些梯度项的 MoE 训练不同,SparseMixer 为这些梯度项提供可扩展的近似,从而在 MoE 训练中实现可靠的梯度估计。SparseMixer 以数值常微分方程(ODE)框架为基础,利用二阶 ODE 求解器——中点法——以可忽略的计算开销给出精确的梯度近似。将 SparseMixer 应用于 Switch Transformer 的预训练与机器翻译任务时,其带来了显著的性能提升,并将训练收敛速度最高提升至 2 倍。
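
SparseMixer's gradient approximation is grounded in the explicit midpoint method, a second-order ODE solver. The toy snippet below shows the midpoint rule itself (not the SparseMixer estimator) so the "second-order" claim is concrete: on dy/dt = y it is markedly more accurate than an Euler step of the same size.

```python
import math

def midpoint_step(f, y, h):
    """Explicit midpoint method: evaluate the derivative at the half step
    and use it to advance the full step (second-order accurate per step,
    versus first-order for plain Euler)."""
    return y + h * f(y + 0.5 * h * f(y))

def euler_step(f, y, h):
    return y + h * f(y)

# Toy check on dy/dt = y with y(0) = 1: the exact answer at t = 1 is e.
f = lambda y: y
h, n = 0.1, 10
y_mid, y_eul = 1.0, 1.0
for _ in range(n):
    y_mid = midpoint_step(f, y_mid, h)
    y_eul = euler_step(f, y_eul, h)
print(abs(y_mid - math.e), abs(y_eul - math.e))   # midpoint error is far smaller
```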

Towards Causal Foundation Model: on Duality between Causal Inference and Attention

  • paper_url: http://arxiv.org/abs/2310.00809
  • repo_url: None
  • paper_authors: Jiaqi Zhang, Joel Jennings, Cheng Zhang, Chao Ma
  • for: 这篇论文旨在建立复杂任务中的 causal inference 模型,以提高机器学习的效果。
  • methods: 该论文提出了一种新的、理论上正确的方法 called Causal Inference with Attention (CInA),该方法通过多个无标注数据进行自主学习 causal learning,并在新数据上进行零shot causal inference。
  • results: 实验结果表明,CInA方法能够通过最终层的 transformer-type 架构实现零shot causal inference,并能够在不同的数据集上进行效果的泛化。
    Abstract Foundation models have brought changes to the landscape of machine learning, demonstrating sparks of human-level intelligence across a diverse array of tasks. However, a gap persists in complex tasks such as causal inference, primarily due to challenges associated with intricate reasoning steps and high numerical precision requirements. In this work, we take a first step towards building causally-aware foundation models for complex tasks. We propose a novel, theoretically sound method called Causal Inference with Attention (CInA), which utilizes multiple unlabeled datasets to perform self-supervised causal learning, and subsequently enables zero-shot causal inference on unseen tasks with new data. This is based on our theoretical results that demonstrate the primal-dual connection between optimal covariate balancing and self-attention, facilitating zero-shot causal inference through the final layer of a trained transformer-type architecture. We demonstrate empirically that our approach CInA effectively generalizes to out-of-distribution datasets and various real-world datasets, matching or even surpassing traditional per-dataset causal inference methodologies.
    摘要 基础模型为机器学习领域带来了变革,在多种任务上展现出接近人类水平的智能。然而,在因果推断等复杂任务上仍存在差距,这主要源于繁复的推理步骤和对数值精度的高要求。在这项工作中,我们迈出了为复杂任务构建具备因果意识的基础模型的第一步。我们提出了一种新颖且理论上可靠的方法——基于注意力的因果推断(CInA),它利用多个无标注数据集进行自监督因果学习,进而能够在新数据的未见任务上执行零样本因果推断。这一方法基于我们的理论结果:最优协变量平衡与自注意力之间存在原始-对偶联系,使得训练后的 transformer 型架构可以通过最后一层实现零样本因果推断。实验表明,CInA 能够有效泛化到分布外数据集及多种真实世界数据集,其表现与传统的逐数据集因果推断方法相当甚至更优。

Knowledge Engineering for Wind Energy

  • paper_url: http://arxiv.org/abs/2310.00804
  • repo_url: https://github.com/Planet21century/TECHALDO.
  • paper_authors: Yuriy Marykovskiy, Thomas Clark, Justin Day, Marcus Wiens, Charles Henderson, Julian Quick, Imad Abdallah, Anna Maria Sempreviva, Jean-Paul Calbimonte, Eleni Chatzi, Sarah Barber
  • for: 本研究旨在帮助风能领域专家将数据转化为域知识,与其他知识源集成,并为下一代人工智能系统提供可用的数据。
  • methods: 本文使用知识工程来支持风能领域的数字变革,并提出了域知识表示的主要概念。 previous work 在风能领域知识工程和知识表示方面进行了系统性的分析,并提供了适用于域专家的指南。
  • results: 本文通过系统分析当前风能领域知识工程的状况,并将主要域算法和工具置于风能领域专家需求和问题点上下文中,以帮助读者更好地理解和应用知识工程技术。
    Abstract With the rapid evolution of the wind energy sector, there is an ever-increasing need to create value from the vast amounts of data made available both from within the domain, as well as from other sectors. This article addresses the challenges faced by wind energy domain experts in converting data into domain knowledge, connecting and integrating it with other sources of knowledge, and making it available for use in next generation artificially intelligent systems. To this end, this article highlights the role that knowledge engineering can play in the process of digital transformation of the wind energy sector. It presents the main concepts underpinning Knowledge-Based Systems and summarises previous work in the areas of knowledge engineering and knowledge representation in a manner that is relevant and accessible to domain experts. A systematic analysis of the current state-of-the-art on knowledge engineering in the wind energy domain is performed, with available tools put into perspective by establishing the main domain actors and their needs and identifying key problematic areas. Finally, guidelines for further development and improvement are provided.
    摘要 随着风能行业的快速发展,需要从各个领域中提取丰富的数据,并将其与其他领域的知识相连接和融合。这篇文章挑战风能领域专家将数据转化为域知识,并将其与其他知识源融合,以便在下一代人工智能系统中使用。为此,本文强调了知识工程在风能领域的数字转型过程中的重要作用。文章介绍了知识工程的主要概念,并总结了过去在风能领域的知识工程和知识表示方面的工作,以便对风能领域专家有所帮助。本文进行了风能领域知识工程的系统性分析,并将可用工具放在风能领域主要拥有者和他们的需求之前提下进行了比较。文章还标识了主要问题点,以便进一步的发展和改进。最后,文章提供了进一步发展和改进的指南。

GraphPatcher: Mitigating Degree Bias for Graph Neural Networks via Test-time Augmentation

  • paper_url: http://arxiv.org/abs/2310.00800
  • repo_url: https://github.com/jumxglhf/graphpatcher
  • paper_authors: Mingxuan Ju, Tong Zhao, Wenhao Yu, Neil Shah, Yanfang Ye
  • for: 提高 graph neural network (GNN) 的测试时通用性和低度节点表现。
  • methods: 提出了一种名为 GraphPatcher 的测试时扩充框架,通过在训练时生成虚拟节点来强化 GNN 的测试时性能。
  • results: 对七个基准数据集进行了广泛的实验,并 consistently 提高了常见 GNN 的总性能和低度节点表现,相比之前的状态态标准基eline。
    Abstract Recent studies have shown that graph neural networks (GNNs) exhibit strong biases towards the node degree: they usually perform satisfactorily on high-degree nodes with rich neighbor information but struggle with low-degree nodes. Existing works tackle this problem by deriving either designated GNN architectures or training strategies specifically for low-degree nodes. Though effective, these approaches unintentionally create an artificial out-of-distribution scenario, where models mainly or even only observe low-degree nodes during the training, leading to a downgraded performance for high-degree nodes that GNNs originally perform well at. In light of this, we propose a test-time augmentation framework, namely GraphPatcher, to enhance test-time generalization of any GNNs on low-degree nodes. Specifically, GraphPatcher iteratively generates virtual nodes to patch artificially created low-degree nodes via corruptions, aiming at progressively reconstructing target GNN's predictions over a sequence of increasingly corrupted nodes. Through this scheme, GraphPatcher not only learns how to enhance low-degree nodes (when the neighborhoods are heavily corrupted) but also preserves the original superior performance of GNNs on high-degree nodes (when lightly corrupted). Additionally, GraphPatcher is model-agnostic and can also mitigate the degree bias for either self-supervised or supervised GNNs. Comprehensive experiments are conducted over seven benchmark datasets and GraphPatcher consistently enhances common GNNs' overall performance by up to 3.6% and low-degree performance by up to 6.5%, significantly outperforming state-of-the-art baselines. The source code is publicly available at https://github.com/jumxglhf/GraphPatcher.
    摘要 近期研究发现,图神经网络(GNN)存在明显的节点度偏好:它们通常在邻居信息丰富的高度节点上表现良好,但在低度节点上表现不佳。现有方法通过设计专门的 GNN 架构或针对低度节点的训练策略来解决这一问题。这些方法虽然有效,却会无意间制造一种人工的分布外场景:模型在训练中主要甚至只观察到低度节点,从而损害 GNN 原本擅长的高度节点的性能。为此,我们提出测试时增强框架 GraphPatcher,用于提升任意 GNN 在低度节点上的测试时泛化能力。具体而言,GraphPatcher 通过对邻域施加扰动人工构造低度节点,并迭代生成虚拟节点为其“打补丁”,目标是在一系列扰动程度逐渐加深的节点上逐步重建目标 GNN 的预测。通过这一机制,GraphPatcher 既学会了增强低度节点(邻域被严重破坏时),又保留了 GNN 在高度节点上原有的优异表现(邻域仅被轻微破坏时)。此外,GraphPatcher 与模型无关,可同时缓解自监督与监督 GNN 的度偏差。我们在七个基准数据集上进行了广泛实验,结果表明 GraphPatcher 能持续提升常见 GNN 的整体性能与低度节点性能,最高分别提升 3.6% 和 6.5%,显著优于现有最优基线。源代码公开于 https://github.com/jumxglhf/GraphPatcher 。
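
A schematic view of the test-time augmentation idea: artificially low-degree views of a node are produced by dropping growing fractions of its neighbors, and the patcher is then trained so that a frozen GNN's prediction on the corrupted views matches its prediction on the original one. The sketch below only shows the corruption schedule; the drop ratios and fallback rule are assumptions, and the real method patches with generated virtual nodes rather than re-adding dropped ones.

```python
import random

def corruption_sequence(neighbors, levels=(0.0, 0.3, 0.6, 0.9), seed=0):
    """Create increasingly corrupted copies of a node's neighborhood by
    dropping a growing fraction of its neighbors. GraphPatcher-style
    test-time augmentation then learns to 'patch' the corrupted views so a
    frozen GNN's prediction on them matches its prediction on the original
    view. (Schematic sketch only.)"""
    rng = random.Random(seed)
    views = []
    for p in levels:
        kept = [n for n in neighbors if rng.random() >= p]
        views.append(kept if kept else neighbors[:1])   # keep at least one
    return views

# A degree-10 node is turned into artificial low-degree views of itself.
print(corruption_sequence(list(range(10))))
```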

A Comprehensive Review of Generative AI in Healthcare

  • paper_url: http://arxiv.org/abs/2310.00795
  • repo_url: None
  • paper_authors: Yasin Shokrollahi, Sahar Yarmohammadtoosky, Matthew M. Nikahd, Pengfei Dong, Xianqi Li, Linxia Gu
  • for: 本文主要探讨了生成式人工智能(AI)在医疗领域的应用,尤其是转换器和扩散模型。
  • methods: 本文使用的方法包括医疗影像分析、预测蛋白结构、临床文档、诊断协助、放射学解读、临床决策支持、医疗代码和财务处理等。
  • results: 本文总结了各种生成式AI应用在医疗领域的进展,包括医疗影像重建、图像至图像翻译、图像生成和分类、蛋白结构预测、临床诊断和决策支持等,并提出了未来研究的可能性以满足医疗领域的发展需求。
    Abstract The advancement of Artificial Intelligence (AI) has catalyzed revolutionary changes across various sectors, notably in healthcare. Among the significant developments in this field are the applications of generative AI models, specifically transformers and diffusion models. These models have played a crucial role in analyzing diverse forms of data, including medical imaging (encompassing image reconstruction, image-to-image translation, image generation, and image classification), protein structure prediction, clinical documentation, diagnostic assistance, radiology interpretation, clinical decision support, medical coding, and billing, as well as drug design and molecular representation. Such applications have enhanced clinical diagnosis, data reconstruction, and drug synthesis. This review paper aims to offer a thorough overview of the generative AI applications in healthcare, focusing on transformers and diffusion models. Additionally, we propose potential directions for future research to tackle the existing limitations and meet the evolving demands of the healthcare sector. Intended to serve as a comprehensive guide for researchers and practitioners interested in the healthcare applications of generative AI, this review provides valuable insights into the current state of the art, challenges faced, and prospective future directions.
    摘要 人工智能(AI)的发展对各个领域产生了革命性的变革,医疗领域是其中之一。在这个领域中,生成式AI模型,特别是转换器和扩散模型,对医疗数据进行分析发挥了重要作用。这些模型可以处理各种不同的数据类型,包括医疗影像重建、图像到图像翻译、图像生成和图像分类、蛋白质结构预测、临床记录、诊断助手、医学影像理解、诊断支持、医疗代码和财务处理等。这些应用程序提高了临床诊断、数据重建和药物合成。本文旨在为医疗领域的研究人员和实践者提供一份全面的综述,探讨生成式AI在医疗领域的应用,特别是转换器和扩散模型。此外,我们还提出了未来研究的可能性,以满足医疗领域的发展需求。这篇文章旨在为医疗领域的研究人员和实践者提供一份价值的指南,帮助他们更好地理解现有技术的状况、挑战和未来发展趋势。

Revisiting Link Prediction: A Data Perspective

  • paper_url: http://arxiv.org/abs/2310.00793
  • repo_url: https://github.com/uisim2020/uisim2020
  • paper_authors: Haitao Mao, Juanhui Li, Harry Shomer, Bingheng Li, Wenqi Fan, Yao Ma, Tong Zhao, Neil Shah, Jiliang Tang
  • for: 本研究旨在探讨链接预测task在不同领域 dataset 之间的共通原理,以提高链接预测模型的普适性。
  • methods: 本研究使用了三种关键因素:本地结构靠近性、全局结构靠近性和特征靠近性,以探索链接预测task 的数据中心视角。
  • results: 研究发现,全局结构靠近性只有在本地结构靠近性不足时才有效。此外,特征靠近性和结构靠近性之间存在冲突,导致 GNN4LP 模型在一些链接上表现不佳。
    Abstract Link prediction, a fundamental task on graphs, has proven indispensable in various applications, e.g., friend recommendation, protein analysis, and drug interaction prediction. However, since datasets span a multitude of domains, they could have distinct underlying mechanisms of link formation. Evidence in existing literature underscores the absence of a universally best algorithm suitable for all datasets. In this paper, we endeavor to explore principles of link prediction across diverse datasets from a data-centric perspective. We recognize three fundamental factors critical to link prediction: local structural proximity, global structural proximity, and feature proximity. We then unearth relationships among those factors where (i) global structural proximity only shows effectiveness when local structural proximity is deficient. (ii) The incompatibility can be found between feature and structural proximity. Such incompatibility leads to GNNs for Link Prediction (GNN4LP) consistently underperforming on edges where the feature proximity factor dominates. Inspired by these new insights from a data perspective, we offer practical instruction for GNN4LP model design and guidelines for selecting appropriate benchmark datasets for more comprehensive evaluations.
    摘要 链接预测是图上的一项基本任务,已被证明在朋友推荐、蛋白质分析和药物相互作用预测等多种应用中不可或缺。然而,由于数据集横跨多个领域,其背后的链接形成机制可能各不相同。已有文献表明,不存在一个对所有数据集都最优的通用算法。在这篇文章中,我们尝试从数据为中心的视角探索跨多种数据集的链接预测原则。我们认为链接预测涉及三个关键因素:局部结构邻近性、全局结构邻近性和特征邻近性。进而我们发现这些因素之间的关系:(i)只有当局部结构邻近性不足时,全局结构邻近性才会发挥作用;(ii)特征邻近性与结构邻近性之间存在不兼容性,这导致用于链接预测的图神经网络(GNN4LP)在以特征邻近性为主导的边上持续表现不佳。受这些新的数据层面洞见启发,我们为 GNN4LP 模型设计提供了实用指导,并给出了选择合适基准数据集以进行更全面评估的建议。
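
The three data factors named above have simple classical instantiations: common neighbors for local structural proximity, a (truncated) Katz index for global structural proximity, and cosine similarity of node features for feature proximity. The snippet below computes all three for a candidate edge; the decay factor, path-length cap, and toy graph are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def link_heuristics(A, X, u, v, beta=0.05, max_len=4):
    """Three data factors for a candidate edge (u, v): local structural
    proximity (common neighbors), global structural proximity (a truncated
    Katz index summing discounted paths), and feature proximity (cosine
    similarity of node features). Illustrative only."""
    common = int(np.sum((A[u] > 0) & (A[v] > 0)))             # local
    katz, Ak = 0.0, np.eye(len(A))
    for length in range(1, max_len + 1):                      # global
        Ak = Ak @ A
        katz += (beta ** length) * Ak[u, v]
    cos = float(X[u] @ X[v] /
                (np.linalg.norm(X[u]) * np.linalg.norm(X[v]) + 1e-9))
    return {"local": common, "global": katz, "feature": cos}

# Tiny 4-node example: nodes 0 and 2 share two neighbors and similar features.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.9, 0.1], [0.0, 1.0]])
print(link_heuristics(A, X, 0, 2))
```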

Towards a Universal Understanding of Color Harmony: Fuzzy Approach

  • paper_url: http://arxiv.org/abs/2310.00791
  • repo_url: None
  • paper_authors: Pakizar Shamoi, Muragul Muratbekova, Assylzhan Izbassar, Atsushi Inoue, Hiroharu Kawanaka
  • for: explore color harmony using a fuzzy-based color model and evaluate its universality
  • methods: use a dataset of attractive images from five different domains, apply a fuzzy approach to identify harmony patterns and dominant color palettes
  • results: color harmony is largely universal, influenced by hue relationships, saturation, and intensity of colors, with prevalent adherence to color wheel principles in palettes with high harmony levels.
    Abstract Harmony level prediction is receiving increasing attention nowadays. Color plays a crucial role in affecting human aesthetic responses. In this paper, we explore color harmony using a fuzzy-based color model and address the question of its universality. For our experiments, we utilize a dataset containing attractive images from five different domains: fashion, art, nature, interior design, and brand logos. We aim to identify harmony patterns and dominant color palettes within these images using a fuzzy approach. It is well-suited for this task because it can handle the inherent subjectivity and contextual variability associated with aesthetics and color harmony evaluation. Our experimental results suggest that color harmony is largely universal. Additionally, our findings reveal that color harmony is not solely influenced by hue relationships on the color wheel but also by the saturation and intensity of colors. In palettes with high harmony levels, we observed a prevalent adherence to color wheel principles while maintaining moderate levels of saturation and intensity. These findings contribute to ongoing research on color harmony and its underlying principles, offering valuable insights for designers, artists, and researchers in the field of aesthetics.
    摘要 近年来,色彩和谐程度的预测受到越来越多的关注。颜色在人类审美反应中起着关键作用。在本文中,我们基于模糊集的颜色模型来探讨色彩和谐,并评估其普适性。实验使用了一个包含时尚、艺术、自然、室内设计和品牌标志五个领域吸引人图像的数据集。我们希望借助模糊方法识别这些图像中的和谐模式与主导调色板。该方法适合这一任务,因为它能够处理审美与色彩和谐评估中固有的主观性和情境差异。实验结果表明,色彩和谐在很大程度上是普适的。此外,我们发现色彩和谐不仅受色轮上色相关系的影响,还受颜色饱和度和明度的影响。在和谐程度高的调色板中,我们观察到其普遍遵循色轮原则,同时保持中等水平的饱和度和明度。这些发现为色彩和谐及其基本原理的持续研究做出了贡献,可为设计师、艺术家以及美学领域的研究人员提供有价值的参考。

BooookScore: A systematic exploration of book-length summarization in the era of LLMs

  • paper_url: http://arxiv.org/abs/2310.00785
  • repo_url: https://github.com/lilakk/booookscore
  • paper_authors: Yapei Chang, Kyle Lo, Tanya Goyal, Mohit Iyyer
  • for: This paper focuses on developing a method to evaluate the coherence of book-length summaries generated by large language models (LLMs). The authors aim to address the challenges of evaluating summarization of long documents, which are under-studied due to the lack of datasets and evaluation methods.
  • methods: The authors use two prompting workflows to generate book-length summaries: (1) hierarchically merging chunk-level summaries, and (2) incrementally updating a running summary. They also develop an automatic metric, BooookScore, to measure the coherence of the summaries.
  • results: The authors obtain human annotations on GPT-4 generated summaries of 100 recently-published books and identify eight common types of coherence errors made by LLMs. They find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than the oft-repetitive ones generated by LLaMA 2. Incremental updating yields lower BooookScore but a higher level of detail than hierarchical merging, a trade-off sometimes preferred by human annotators.
    Abstract Summarizing book-length documents (>100K tokens) that exceed the context window size of large language models (LLMs) requires first breaking the input document into smaller chunks and then prompting an LLM to merge, update, and compress chunk-level summaries. Despite the complexity and importance of this task, it has yet to be meaningfully studied due to the challenges of evaluation: existing book-length summarization datasets (e.g., BookSum) are in the pretraining data of most public LLMs, and existing evaluation methods struggle to capture errors made by modern LLM summarizers. In this paper, we present the first study of the coherence of LLM-based book-length summarizers implemented via two prompting workflows: (1) hierarchically merging chunk-level summaries, and (2) incrementally updating a running summary. We obtain 1193 fine-grained human annotations on GPT-4 generated summaries of 100 recently-published books and identify eight common types of coherence errors made by LLMs. Because human evaluation is expensive and time-consuming, we develop an automatic metric, BooookScore, that measures the proportion of sentences in a summary that do not contain any of the identified error types. BooookScore has high agreement with human annotations and allows us to systematically evaluate the impact of many other critical parameters (e.g., chunk size, base LLM) while saving $15K and 500 hours in human evaluation costs. We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than the oft-repetitive ones generated by LLaMA 2. Incremental updating yields lower BooookScore but higher level of detail than hierarchical merging, a trade-off sometimes preferred by human annotators. We release code and annotations after blind review to spur more principled research on book-length summarization.
    摘要 大量文档摘要(>100K tokens)需要首先将输入文档分成更小的块,然后使用大型自然语言模型(LLM)来合并、更新和压缩块级摘要。尽管这项任务的复杂性和重要性尚未得到系统的研究,但是现有的书籍摘要 dataset(例如 BookSum)都包含在大多数公共 LLM 的预训练数据中,而现有的评估方法很难 Capture LLM 摘要器中的错误。在这篇论文中,我们提出了首次对 LLM 基于的书籍摘要器的准确性进行了研究。我们使用两种提示工作流程:(1)层次合并块级摘要,和(2)逐步更新RunningSummary。我们获得了1193个精细的人类标注,对 GPT-4 生成的 100 部最新出版的书籍摘要进行了评估,并发现了 eight 种常见的准确性错误。由于人类评估是昂贵和时间consuming的,我们开发了一个自动度量器,BooookScore,可以衡量摘要中含有准确性错误的句子的比例。BooookScore 与人类标注具有高一致性,我们可以通过 sistematic 地评估多个关键参数(例如块大小、基础 LLN)而节省 $15K 和 500 小时的人类评估成本。我们发现closed-source LLMs 如 GPT-4 和 Claude 2 生成的摘要具有更高的 BooookScore,而 LLLaMA 2 的摘要则具有较高的重复性。逐次更新具有较低的 BooookScore,但是具有更高的细节水平,这些trade-off 有时被人类 annotators 首选。我们在审查后发布代码和标注,以促进更理性的研究在书籍摘要领域。
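
The metric itself is a simple aggregation once per-sentence error judgments are available: the fraction of summary sentences flagged with none of the identified error types. The sketch below shows that aggregation step; the listed error-type names and the source of the annotations (LLM judge or human) are assumptions for illustration.

```python
def booookscore(sentence_errors):
    """BooookScore-style coherence score: the fraction of summary sentences
    that contain none of the identified coherence error types. The
    per-sentence error annotations are assumed to come from an LLM judge or
    a human annotator; this sketch only shows the aggregation step."""
    clean = sum(1 for errors in sentence_errors if not errors)
    return clean / max(len(sentence_errors), 1)

# Each entry lists the error types flagged for one summary sentence
# (error-type names below are hypothetical placeholders).
annotations = [
    [],                                   # coherent sentence
    ["entity omission"],
    [],
    ["causal omission", "salience"],
    [],
]
print(booookscore(annotations))           # 3 of 5 sentences are error-free -> 0.6
```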

Mining Java Memory Errors using Subjective Interesting Subgroups with Hierarchical Targets

  • paper_url: http://arxiv.org/abs/2310.00781
  • repo_url: https://github.com/remilyoucef/sca-miner
  • paper_authors: Youcef Remil, Anes Bendimerad, Mathieu Chambard, Romain Mathonat, Marc Plantevit, Mehdi Kaytoue
  • for: 本文主要针对软件应用程序,尤其是企业资源计划(ERP)系统的维护问题。
  • methods: 本文提出了一种新的子组发现(SD)技术,可以自动 mines incident数据并提取独特的模式,以识别问题的根本原因。
  • results: 本文通过一个Empirical Study validate了该方法的有效性和Pattern的质量。
    Abstract Software applications, especially Enterprise Resource Planning (ERP) systems, are crucial to the day-to-day operations of many industries. Therefore, it is essential to maintain these systems effectively using tools that can identify, diagnose, and mitigate their incidents. One promising data-driven approach is the Subgroup Discovery (SD) technique, a data mining method that can automatically mine incident datasets and extract discriminant patterns to identify the root causes of issues. However, current SD solutions have limitations in handling complex target concepts with multiple attributes organized hierarchically. To illustrate this scenario, we examine the case of Java out-of-memory incidents among several possible applications. We have a dataset that describes these incidents, including their context and the types of Java objects occupying memory when it reaches saturation, with these types arranged hierarchically. This scenario inspires us to propose a novel Subgroup Discovery approach that can handle complex target concepts with hierarchies. To achieve this, we design a pattern syntax and a quality measure that ensure the identified subgroups are relevant, non-redundant, and resilient to noise. To achieve the desired quality measure, we use the Subjective Interestingness model that incorporates prior knowledge about the data and promotes patterns that are both informative and surprising relative to that knowledge. We apply this framework to investigate out-of-memory errors and demonstrate its usefulness in incident diagnosis. To validate the effectiveness of our approach and the quality of the identified patterns, we present an empirical study. The source code and data used in the evaluation are publicly accessible, ensuring transparency and reproducibility.
    摘要 软件应用程序,特别是企业资源计划(ERP)系统,对许多行业的日常运营至关重要。因此,必须借助能够识别、诊断并缓解事故的工具来有效维护这些系统。一种有前景的数据驱动方法是子组发现(Subgroup Discovery,SD)技术,它可以自动挖掘事故数据集并提取判别性模式,以识别问题的根本原因。然而,现有的SD方案在处理具有层次组织的多属性复杂目标概念时存在局限。为说明这一场景,我们以Java内存溢出(out-of-memory)事故为例:我们拥有一个描述此类事故的数据集,其中包含事故的上下文以及内存达到饱和时占用内存的Java对象类型,而这些类型按层次结构组织。这一场景促使我们提出一种能够处理带层次结构的复杂目标概念的新型子组发现方法。为此,我们设计了一种模式语法和质量度量,以确保识别出的子组具有相关性、非冗余性并对噪声具有鲁棒性。为实现所需的质量度量,我们采用主观兴趣度(Subjective Interestingness)模型,该模型结合了关于数据的先验知识,并偏好相对于该知识既有信息量又出人意料的模式。我们将该框架应用于内存溢出错误的调查,并展示了其在事故诊断中的有效性。为验证方法的有效性和所识别模式的质量,我们进行了实证研究。评估所用的源代码和数据均公开可得,以保证透明性和可复现性。

Pre-training with Synthetic Data Helps Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.00771
  • repo_url: None
  • paper_authors: Zecheng Wang, Che Wang, Zixuan Dong, Keith Ross
  • for: 这个论文主要研究了深度强化学习(DRL)的离线预训练方法,特别是使用大量语言资料来提高下游性能(Reid et al., 2022)。
  • methods: 本论文使用了几种简单的预训练方案,包括使用生成的IID数据和一步随机链生成的数据,以及使用Q学习算法和多层感知器(MLP)作为后续。
  • results: 实验结果表明,使用这些简单的预训练方案可以提高DRL的性能,并且可以与使用大量语言资料预训练的性能相比肩。此外,采用这些预训练方案可以提高CQL算法的性能,并且在D4RL Gym游戏数据集上获得了一致的性能提升。
    Abstract Recently, it has been shown that for offline deep reinforcement learning (DRL), pre-training Decision Transformer with a large language corpus can improve downstream performance (Reid et al., 2022). A natural question to ask is whether this performance gain can only be achieved with language pre-training, or can be achieved with simpler pre-training schemes which do not involve language. In this paper, we first show that language is not essential for improved performance, and indeed pre-training with synthetic IID data for a small number of updates can match the performance gains from pre-training with a large language corpus; moreover, pre-training with data generated by a one-step Markov chain can further improve the performance. Inspired by these experimental results, we then consider pre-training Conservative Q-Learning (CQL), a popular offline DRL algorithm, which is Q-learning-based and typically employs a Multi-Layer Perceptron (MLP) backbone. Surprisingly, pre-training with simple synthetic data for a small number of updates can also improve CQL, providing consistent performance improvement on D4RL Gym locomotion datasets. The results of this paper not only illustrate the importance of pre-training for offline DRL but also show that the pre-training data can be synthetic and generated with remarkably simple mechanisms.
    摘要 近期研究表明,对于离线深度强化学习(DRL),使用大规模语言语料对 Decision Transformer 进行预训练可以提升下游性能(Reid et al., 2022)。一个自然的问题是:这种性能增益是否只能通过语言预训练获得,还是也可以通过不涉及语言的更简单预训练方案实现。在本文中,我们首先证明语言并非提升性能的必要条件:仅用合成的独立同分布(IID)数据进行少量更新的预训练,即可达到与大规模语言语料预训练相当的增益;而且,使用一步马尔可夫链生成的数据进行预训练还能进一步提升性能。受这些实验结果启发,我们进一步研究了对保守Q学习(CQL)——一种基于Q学习、通常采用多层感知机(MLP)骨干的流行离线DRL算法——进行预训练。令人惊讶的是,用简单的合成数据进行少量更新的预训练同样可以提升CQL,在 D4RL Gym 运动任务数据集上带来一致的性能改进。本文的结果不仅说明了预训练对离线DRL的重要性,也表明预训练数据可以是合成的,并且可以用非常简单的机制生成。
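
A sketch of the two synthetic pre-training data schemes discussed above: purely IID tokens, and sequences drawn from a one-step Markov chain with a random transition matrix. Vocabulary size, sequence length, and the Dirichlet-sampled transition matrix are assumptions; the point is only that such data can be generated with trivially simple mechanisms before the usual offline training begins.

```python
import numpy as np

def synthetic_pretraining_data(n_seq=1000, seq_len=64, vocab=100,
                               mode="markov", seed=0):
    """Generate the two kinds of synthetic pre-training sequences the paper
    studies: IID tokens, or tokens from a one-step Markov chain with a
    random row-stochastic transition matrix. (Sketch with assumed sizes.)"""
    rng = np.random.default_rng(seed)
    if mode == "iid":
        return rng.integers(0, vocab, size=(n_seq, seq_len))
    P = rng.dirichlet(np.ones(vocab), size=vocab)      # transition matrix
    seqs = np.empty((n_seq, seq_len), dtype=int)
    seqs[:, 0] = rng.integers(0, vocab, size=n_seq)
    for t in range(1, seq_len):
        for i in range(n_seq):
            seqs[i, t] = rng.choice(vocab, p=P[seqs[i, t - 1]])
    return seqs

print(synthetic_pretraining_data(n_seq=2, seq_len=8).shape)
```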

Facilitating Battery Swapping Services for Freight Trucks with Spatial-Temporal Demand Prediction

  • paper_url: http://arxiv.org/abs/2310.04440
  • repo_url: None
  • paper_authors: Linyu Liu, Zhen Dai, Shiji Song, Xiaocheng Li, Guanting Chen
  • for: 这篇论文旨在探讨重型卡车电池更换服务的潜力和效率,以实现碳neutral未来。
  • methods: 论文运用了双重方法,首先预测了运输网络上未来几个小时的交通模式,然后将预测结果引入优化模组,实现电池的有效分配和部署。
  • results: 分析了2,500英里长的高速公路重型卡车数据,我们发现预测/机器学习可以帮助未来的决策。具体来说,我们发现在设置早期的移动电池更换站更有利,但是随着系统的成熟,固定位置的电池更换站更受欢迎。
    Abstract Electrifying heavy-duty trucks offers a substantial opportunity to curtail carbon emissions, advancing toward a carbon-neutral future. However, the inherent challenges of limited battery energy and the sheer weight of heavy-duty trucks lead to reduced mileage and prolonged charging durations. Consequently, battery-swapping services emerge as an attractive solution for these trucks. This paper employs a two-fold approach to investigate the potential and enhance the efficacy of such services. Firstly, spatial-temporal demand prediction models are adopted to predict the traffic patterns for the upcoming hours. Subsequently, the prediction guides an optimization module for efficient battery allocation and deployment. Analyzing the heavy-duty truck data on a highway network spanning over 2,500 miles, our model and analysis underscore the value of prediction/machine learning in facilitating future decision-makings. In particular, we find that the initial phase of implementing battery-swapping services favors mobile battery-swapping stations, but as the system matures, fixed-location stations are preferred.
    摘要 电动重型卡车的应用提供了巨大的减少碳排放的机会,推进向碳中和未来。然而,重型卡车的自然限制,如电池能量有限和车辆总重,导致很长的充电时间和减少的行驶距离。因此,电池换卡服务出现了一个有吸引力的解决方案。本文采用两重方法来探讨这种服务的潜在和提高效率。首先,采用空间-时间需求预测模型预测下一个几个小时的交通趋势。然后,预测导引一个优化模块,以便有效地分配和部署电池。分析了2,500英里长的高速公路上的重型卡车数据,我们的模型和分析表明,预测/机器学习在未来决策中发挥了重要作用。尤其是在实施电池换卡服务的初期阶段,移动电池换卡站更有优势;而在系统成熟后,固定位置的电池换卡站变得更加受欢迎。

Mind the Gap: Federated Learning Broadens Domain Generalization in Diagnostic AI Models

  • paper_url: http://arxiv.org/abs/2310.00757
  • repo_url: https://github.com/tayebiarasteh/fldomain
  • paper_authors: Soroosh Tayebi Arasteh, Christiane Kuhl, Marwin-Jonathan Saehn, Peter Isfort, Daniel Truhn, Sven Nebelung
  • for: 这项研究旨在评估联邦学习(FL)在胸部X射线图像诊断任务中的影响,特别是训练策略、网络架构和数据多样性等因素对模型域内与域外预测性能的影响。
  • methods: 研究使用了来自全球五家机构的610,000张胸部X射线图像,评估不同训练策略(本地 vs. 协作)、网络架构、数据集规模和数据多样性对模型预测性能的影响。
  • results: 研究发现,大型数据集在FL下性能提升有限,某些情况下甚至出现下降;相反,小型数据集获得了明显改善。因此,域内性能主要取决于训练数据规模,而域外性能更依赖于训练数据的多样性。通过在多个外部机构的数据上协作训练,FL可以提升诊断的隐私性、可复现性和域外可靠性,并有望改善医疗结果。
    Abstract Developing robust artificial intelligence (AI) models that generalize well to unseen datasets is challenging and usually requires large and variable datasets, preferably from multiple institutions. In federated learning (FL), a model is trained collaboratively at numerous sites that hold local datasets without exchanging them. So far, the impact of training strategy, i.e., local versus collaborative, on the diagnostic on-domain and off-domain performance of AI models interpreting chest radiographs has not been assessed. Consequently, using 610,000 chest radiographs from five institutions across the globe, we assessed diagnostic performance as a function of training strategy (i.e., local vs. collaborative), network architecture (i.e., convolutional vs. transformer-based), generalization performance (i.e., on-domain vs. off-domain), imaging finding (i.e., cardiomegaly, pleural effusion, pneumonia, atelectasis, consolidation, pneumothorax, and no abnormality), dataset size (i.e., from n=18,000 to 213,921 radiographs), and dataset diversity. Large datasets not only showed minimal performance gains with FL but, in some instances, even exhibited decreases. In contrast, smaller datasets revealed marked improvements. Thus, on-domain performance was mainly driven by training data size. However, off-domain performance leaned more on training diversity. When trained collaboratively across diverse external institutions, AI models consistently surpassed models trained locally for off-domain tasks, emphasizing FL's potential in leveraging data diversity. In conclusion, FL can bolster diagnostic privacy, reproducibility, and off-domain reliability of AI models and, potentially, optimize healthcare outcomes.
    摘要 开发能够在未见数据集上良好泛化的稳健人工智能(AI)模型是一项挑战,通常需要来自多个机构的大量且多样的数据集。在联邦学习(FL)中,模型在持有本地数据的多个站点上协同训练,而无需交换数据。迄今为止,训练策略(本地 versus 协作)对解读胸部X射线影像的AI模型在域内与域外诊断性能的影响尚未得到评估。为此,我们使用了来自全球五家机构的610,000张胸部X射线影像,评估诊断性能与训练策略(本地 vs. 协作)、网络架构(卷积 vs. transformer)、泛化性能(域内 vs. 域外)、影像发现(心脏肥大、胸腔积液、肺炎、肺不张、实变、气胸及无异常)、数据集规模(从n=18,000到213,921张影像)以及数据多样性之间的关系。结果表明,大型数据集在FL中不仅几乎没有获得性能提升,某些情况下甚至出现下降;相反,较小的数据集表现出明显改善。因此,域内性能主要取决于训练数据规模,而域外性能更依赖于训练数据的多样性。当AI模型在多个外部机构的数据上协作训练时,其在域外任务上持续优于本地训练的模型,凸显了FL在利用数据多样性方面的潜力。总之,FL可以增强AI模型的诊断隐私性、可复现性和域外可靠性,并有望优化医疗结果。
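
Collaborative training in this setting typically relies on a federated averaging step: each site trains locally, and only model parameters, weighted by local dataset size, are aggregated. The snippet sketches that aggregation; FedAvg as the aggregation rule and the toy dataset sizes are assumptions, since the study's exact FL protocol is not spelled out in the abstract.

```python
import torch

def fedavg(site_state_dicts, site_sizes):
    """Weighted average of model parameters from several sites, proportional
    to each site's dataset size -- the aggregation step of FedAvg. Raw data
    never leaves a site; only parameters are exchanged."""
    total = float(sum(site_sizes))
    avg = {}
    for key in site_state_dicts[0]:
        avg[key] = sum(sd[key] * (n / total)
                       for sd, n in zip(site_state_dicts, site_sizes))
    return avg

# Example with two toy "sites" holding models of identical architecture.
m1, m2 = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
global_weights = fedavg([m1.state_dict(), m2.state_dict()],
                        site_sizes=[18000, 213921])
print({k: v.shape for k, v in global_weights.items()})
```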

TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks

  • paper_url: http://arxiv.org/abs/2310.00752
  • repo_url: None
  • paper_authors: Dongfu Jiang, Yishan Li, Ge Zhang, Wenhao Huang, Bill Yuchen Lin, Wenhu Chen
  • for: 本研究开发了一个名为TIGERScore的自动评估指标,用于评估文本生成任务的效果。
  • methods: TIGERScore使用了专门训练的LLaMA模型,并基于自己调整的MetricInstruct dataset,以提供可读的错误分析,并不需要参考。
  • results: TIGERScore在5个对接评分数据集上 Achieves the highest overall Spearman’s correlation with human ratings,并且与其他指标相比表现更好,甚至可以超越参考基于的指标。
    Abstract We present TIGERScore, a \textbf{T}rained metric that follows \textbf{I}nstruction \textbf{G}uidance to perform \textbf{E}xplainable, and \textbf{R}eference-free evaluation over a wide spectrum of text generation tasks. Different from other automatic evaluation methods that only provide arcane scores, TIGERScore is guided by the natural language instruction to provide error analysis to pinpoint the mistakes in the generated text. Our metric is based on LLaMA, trained on our meticulously curated instruction-tuning dataset MetricInstruct which covers 6 text generation tasks and 23 text generation datasets. The dataset consists of 48K quadruple in the form of (instruction, input, system output $\rightarrow$ error analysis). We collected the `system outputs' through diverse channels to cover different types of errors. To quantitatively assess our metric, we evaluate its correlation with human ratings on 5 held-in datasets, 2 held-out datasets and show that TIGERScore can achieve the highest overall Spearman's correlation with human ratings across these datasets and outperforms other metrics significantly. As a reference-free metric, its correlation can even surpass the best existing reference-based metrics. To further qualitatively assess the rationale generated by our metric, we conduct human evaluation on the generated explanations and found that the explanations are 70.8\% accurate. Through these experimental results, we believe TIGERScore demonstrates the possibility of building universal explainable metrics to evaluate any text generation task.
    摘要 我们介绍TIGERScore,一个已经训练的度量,可以根据自然语言指南进行可解释的、无参考度的文本生成任务评价。与其他自动评价方法不同,TIGERScore不仅提供神秘的分数,还可以通过错误分析来 pinpoint生成文本中的错误。我们的度量基于LLaMA,并在我们精心抽样的指南调度集MetricInstruct上训练。这个集合包括6种文本生成任务和23种文本生成数据集,共48000个四元组(指南、输入、系统输出 → 错误分析)。我们通过多种途径收集了“系统输出”,以覆盖不同类型的错误。为了评估我们的度量,我们对5个保留数据集、2个保 OUT数据集进行了量化评估,并发现TIGERScore可以在这些数据集中 achiev the highest Spearman correlation coefficient with human ratings,并且与其他度量相比显著出perform better。作为一个无参考度量,TIGERScore的相关性可以甚至超过最佳参考基础度量。为了进一步评估我们的度量生成的理由,我们对生成的解释进行了人工评估,并发现解释的准确率为70.8%。通过这些实验结果,我们认为TIGERScore表明了可以建立 universal explainable metrics,用于评价任何文本生成任务。

NoxTrader: LSTM-Based Stock Return Momentum Prediction for Quantitative Trading

  • paper_url: http://arxiv.org/abs/2310.00747
  • repo_url: None
  • paper_authors: Hsiang-Hui Liu, Han-Jay Shu, Wei-Ning Chiu
  • for: 这个研究主要目的是在股票市场中获得资金收益,尤其是在中期至长期的时间预测。
  • methods: 这个研究使用时间序列分析来学习股票市场的趋势,并使用价格和股票量数据进行特征工程。他们还使用Long Short-Term Memory(LSTM)模型来捕捉价格趋势,并在交易过程中进行动态模型更新。
  • results: 这个研究获得了相当好的预测数据,其中预测值与实际市场数据之间的相关性介于0.65至0.75之间。他们还使用筛选技术将初始投资回报从-60%提升到325%。
    Abstract We introduce NoxTrader, a sophisticated system designed for portfolio construction and trading execution with the primary objective of achieving profitable outcomes in the stock market, specifically aiming to generate moderate to long-term profits. The underlying learning process of NoxTrader is rooted in the assimilation of valuable insights derived from historical trading data, particularly focusing on time-series analysis due to the nature of the dataset employed. In our approach, we utilize price and volume data of US stock market for feature engineering to generate effective features, including Return Momentum, Week Price Momentum, and Month Price Momentum. We choose the Long Short-Term Memory (LSTM)model to capture continuous price trends and implement dynamic model updates during the trading execution process, enabling the model to continuously adapt to the current market trends. Notably, we have developed a comprehensive trading backtesting system - NoxTrader, which allows us to manage portfolios based on predictive scores and utilize custom evaluation metrics to conduct a thorough assessment of our trading performance. Our rigorous feature engineering and careful selection of prediction targets enable us to generate prediction data with an impressive correlation range between 0.65 and 0.75. Finally, we monitor the dispersion of our prediction data and perform a comparative analysis against actual market data. Through the use of filtering techniques, we improved the initial -60% investment return to 325%.
    摘要 我们介绍NoxTrader,一个复杂的系统,用于股票投资组合建立和交易执行,主要目标是在股市中实现可观的收益。我们的学习过程借鉴了历史交易数据中的宝贵经验,特别是时间序列分析,因为我们使用的数据集是时间序列型的。在我们的方法中,我们利用美国股市价格和量数据进行特征工程,生成有效特征,包括回报势力、周期势力和月度势力。我们选择Long Short-Term Memory(LSTM)模型,以捕捉连续价格趋势,并在交易执行过程中进行动态模型更新,使模型能够不断适应当前市场趋势。值得一提的是,我们开发了一套完整的交易回测系统——NoxTrader,它允许我们基于预测得分来管理投资组合,并使用自定义评估 metric来进行严格的评估我们的交易性能。我们的严格的特征工程和预测目标的精心选择,使我们能够生成预测数据的各种相关度范围在0.65-0.75之间。最后,我们监测预测数据的分散情况,并对实际市场数据进行比较分析。通过筛选技术,我们从初始投资回报下降至60%的位置提高至325%。
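
The momentum features named above (return momentum, week and month price momentum) can be expressed as percentage changes over different look-back windows. The sketch below builds them with pandas; the exact window lengths (1, 5, and 21 trading days) are assumptions, not the paper's specification.

```python
import pandas as pd

def momentum_features(prices: pd.Series) -> pd.DataFrame:
    """Price-momentum features of the kind the paper engineers from daily
    price data: short-horizon return momentum plus week- and month-scale
    price momentum. Window lengths are illustrative assumptions."""
    df = pd.DataFrame({"close": prices})
    df["return_momentum"] = prices.pct_change()            # 1-day return
    df["week_price_momentum"] = prices.pct_change(5)       # ~1 trading week
    df["month_price_momentum"] = prices.pct_change(21)     # ~1 trading month
    return df.dropna()

prices = pd.Series(range(100, 160), dtype=float)           # toy upward series
print(momentum_features(prices).tail(3))
```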

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00746
  • repo_url: https://github.com/interactivenlp-team/rolellm-public
  • paper_authors: Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Wenhu Chen, Jie Fu, Junran Peng
  • for: 本文旨在提高语言模型(LLM)的角色扮演能力,以增强用户交互。
  • methods: 本文提出了一个框架,名为RoleLLM,用于评估、引出和提高 LLM 的角色扮演能力。RoleLLM 包括四个阶段:(1)角色资料构建(Role Profile Construction),(2)基于上下文的指令生成(Context-Based Instruction Generation),(3)角色提示(Role Prompting),以及(4)角色定制化指令调整(Role-Conditioned Instruction Tuning)。
  • results: 通过 Context-Instruct 和 RoleGPT,我们创建了 RoleBench,这是首个系统性的、细致的字级 benchmark 数据集,用于测试角色扮演能力。此外,通过 RoCIT 在 RoleBench 上进行调整,我们获得了 RoleLLaMA(英文)和 RoleGLM(中文),这些模型显著提高了角色扮演能力,甚至与 RoleGPT(使用 GPT-4)具有相同的Results。
    Abstract The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in LLMs. RoleLLM comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models along with role customization. By Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing with 168,093 samples. Moreover, RoCIT on RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), significantly enhancing role-playing abilities and even achieving comparable results with RoleGPT (using GPT-4).
    摘要 大型语言模型(LLM)的出现已经为复杂的任务如角色扮演提供了方便,这些任务可以使模型模拟多种角色,从而提高用户交互的体验。然而,现有的State-of-the-art LLMs的关闭源代码和通用训练限制了角色扮演优化。在这篇论文中,我们介绍了RoleLLM框架,用于评价、引导和提高LLMs中的角色扮演能力。RoleLLM包括四个阶段:(1)角色Profile构建100个角色;(2)基于上下文的指令生成(Context-Instruct)用于角色特定知识提取;(3)基于GPT的角色提示(RoleGPT)用于模仿说话风格;以及(4)基于角色的Conditioned Instruction Tuning(RoCIT)用于 fine-tuning开源模型以及角色定制。通过Context-Instruct和RoleGPT,我们创建了RoleBench,第一个系统和细化的字符级 benchmark dataset для角色扮演,包含168,093个样本。此外,在RoleBench上进行RoCIT后,我们获得了RoleLLaMA(英语)和RoleGLM(中文),两个能够明显提高角色扮演能力的模型,甚至与RoleGPT(使用GPT-4)相当。

My Machine and I: ChatGPT and the Future of Human-Machine Collaboration in Africa

  • paper_url: http://arxiv.org/abs/2310.13704
  • repo_url: None
  • paper_authors: Munachimso Blessing Oguine, Chidera Godsfavor Oguine, Kanyifeechukwu Jane Oguine
  • for: 本研究旨在探讨聊天GPT在人机合作中的效果。
  • methods: 本研究使用反身性主题分析(reflexive thematic analysis)方法,对2019至2023年间文献检索得到的51篇文章进行分析。
  • results: 研究发现聊天GPT在学术领域 such as 教育和研究中的人机交互非常普遍,而且聊天GPT在改善人机合作方面的效果较高。
    Abstract Recent advancements in technology have necessitated a paradigm shift in the people use technology necessitating a new research field called Human-Machine collaboration. ChatGPT, an Artificial intelligence (AI) assistive technology, has gained mainstream adoption and implementation in academia and industry; however, a lot is left unknown about how this new technology holds for Human-Machine Collaboration in Africa. Our survey paper highlights to answer some of these questions. To understand the effectiveness of ChatGPT on human-machine collaboration we utilized reflexive thematic analysis to analyze (N= 51) articles between 2019 and 2023 obtained from our literature search. Our findings indicate the prevalence of ChatGPT for human-computer interaction within academic sectors such as education, and research; trends also revealed the relatively high effectiveness of ChatGPT in improving human-machine collaboration.
    摘要 最近的技术发展使得人机合作的研究领域得到了推动,这种新的研究领域被称为人机合作。智能人工智能(AI)协助技术ChatGPT在学术和产业界得到了广泛的批处和实施,但是关于这种新技术在非洲的人机合作方面还有很多未知之处。我们的调查论文旨在回答这些问题。为了评估ChatGPT在人机合作效果,我们使用了反思主题分析法分析(N=51)于2019年至2023年之间的文章。我们的发现表明了ChatGPT在教育和研究领域的人机交互非常普遍,并且发现ChatGPT在改善人机合作效果方面的趋势相对较高。

GenAI Against Humanity: Nefarious Applications of Generative Artificial Intelligence and Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00737
  • repo_url: None
  • paper_authors: Emilio Ferrara
  • for: This paper is written to raise awareness about the potential risks and challenges of Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) being misused for nefarious purposes.
  • methods: The paper uses a combination of research and analysis to identify the potential risks of GenAI and LLMs, including their use in deepfakes, malicious content generation, and the creation of synthetic identities.
  • results: The paper highlights the potential consequences of GenAI and LLMs being misused, including the blurring of the lines between the virtual and real worlds, the potential for targeted misinformation and scams, and the creation of sophisticated malware. The paper also serves as a call to action to prepare for these potential risks and challenges.
    Abstract Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) are marvels of technology; celebrated for their prowess in natural language processing and multimodal content generation, they promise a transformative future. But as with all powerful tools, they come with their shadows. Picture living in a world where deepfakes are indistinguishable from reality, where synthetic identities orchestrate malicious campaigns, and where targeted misinformation or scams are crafted with unparalleled precision. Welcome to the darker side of GenAI applications. This article is not just a journey through the meanders of potential misuse of GenAI and LLMs, but also a call to recognize the urgency of the challenges ahead. As we navigate the seas of misinformation campaigns, malicious content generation, and the eerie creation of sophisticated malware, we'll uncover the societal implications that ripple through the GenAI revolution we are witnessing. From AI-powered botnets on social media platforms to the unnerving potential of AI to generate fabricated identities, or alibis made of synthetic realities, the stakes have never been higher. The lines between the virtual and the real worlds are blurring, and the consequences of potential GenAI's nefarious applications impact us all. This article serves both as a synthesis of rigorous research presented on the risks of GenAI and misuse of LLMs and as a thought-provoking vision of the different types of harmful GenAI applications we might encounter in the near future, and some ways we can prepare for them.
    摘要 生成人工智能(GenAI)和大型语言模型(LLMs)是技术的宠儿,被庆贤以其在自然语言处理和多模式内容生成的能力。它们承诺一个转型的未来。但就像所有的强大工具一样,它们也有阴影。 imagine living in a world where deepfakes are indistinguishable from reality, where synthetic identities orchestrate malicious campaigns, and where targeted misinformation or scams are crafted with unparalleled precision. Welcome to the darker side of GenAI applications. This article is not just a journey through the meanders of potential misuse of GenAI and LLMs, but also a call to recognize the urgency of the challenges ahead. As we navigate the seas of misinformation campaigns, malicious content generation, and the eerie creation of sophisticated malware, we'll uncover the societal implications that ripple through the GenAI revolution we are witnessing. From AI-powered botnets on social media platforms to the unnerving potential of AI to generate fabricated identities, or alibis made of synthetic realities, the stakes have never been higher. The lines between the virtual and the real worlds are blurring, and the consequences of potential GenAI's nefarious applications impact us all. This article serves both as a synthesis of rigorous research presented on the risks of GenAI and misuse of LLMs and as a thought-provoking vision of the different types of harmful GenAI applications we might encounter in the near future, and some ways we can prepare for them.

Review of deep learning in healthcare

  • paper_url: http://arxiv.org/abs/2310.00727
  • repo_url: https://github.com/avadhutsonavane/Diagnosis-of-Coronavirus-using-chest-X-RAY
  • paper_authors: Hasan Hejbari Zargar, Saha Hejbari Zargar, Raziye Mehri
  • for: 本研究旨在探讨医疗系统中使用深度学习方法,包括最新的网络设计、应用和市场趋势。
  • methods: 本研究使用深度学习方法,包括深度神经网络模型,以提取医疗数据中隐藏的模式和有价值信息。
  • results: 研究发现,深度学习方法在医疗系统中可以提取到有价值的信息,但是需要更好地结合人类医疗解释才能实现更高效的应用。
    Abstract Given the growing complexity of healthcare data over the last several years, using machine learning techniques like Deep Neural Network (DNN) models has gained increased appeal. In order to extract hidden patterns and other valuable information from the huge quantity of health data, which traditional analytics are unable to do in a reasonable length of time, machine learning (ML) techniques are used. Deep Learning (DL) algorithms in particular have been shown as potential approaches to pattern identification in healthcare systems. This thought has led to the contribution of this research, which examines deep learning methods used in healthcare systems via an examination of cutting-edge network designs, applications, and market trends. To connect deep learning methodologies and human healthcare interpretability, the initial objective is to provide in-depth insight into the deployment of deep learning models in healthcare solutions. And last, to outline the current unresolved issues and potential directions.
    摘要 随着医疗数据的增长复杂性,使用机器学习技术如深度神经网络(DNN)模型已经得到了加大的appeal。为了从庞大量的医疗数据中提取隐藏的模式和其他有价值的信息,传统分析无法在合理的时间内完成,因此机器学习(ML)技术被使用。深度学习(DL)算法在医疗系统中特别有潜力,这也导致了本研究的出发,即通过对当前最新的网络设计、应用和市场趋势进行检验,探讨深度学习在医疗解决方案中的应用。为了将深度学习方法与人类医疗解释相连接,初始的目标是提供深度学习模型在医疗解决方案中的深入分析。最后,总结当前未解决的问题和可能的发展方向。

Improving Length-Generalization in Transformers via Task Hinting

  • paper_url: http://arxiv.org/abs/2310.00726
  • repo_url: None
  • paper_authors: Pranjal Awasthi, Anupam Gupta
  • for: 本研究旨在解决 transformer 模型在某些逻辑和数学任务上长度泛化问题。特别是,一个基于添加的 transformer 模型在应用于更长的实例时表现会下降很快。本研究提出了一种基于任务提示的方法,以解决长度泛化问题。
  • methods: 本研究使用了多任务训练框架,并在训练过程中同时训练模型解决一个简单且相关的 auxillary 任务。
  • results: 对于排序问题,我们发现可以使用 sequences 的 length 不超过 20 来训练模型,并在 test 数据上提高了模型的测试准确率从 less than 1% (标准训练) 提高到更多于 92% (via 任务提示)。此外,我们还发现了一些有趣的长度泛化问题的方面,包括不同的 auxillary 任务的效iveness 在提高长度泛化方面有很大差异。
    Abstract It has been observed in recent years that transformers have problems with length generalization for certain types of reasoning and arithmetic tasks. In particular, the performance of a transformer model trained on tasks (say addition) up to a certain length (e.g., 5 digit numbers) drops sharply when applied to longer instances of the same problem. This work proposes an approach based on task hinting towards addressing length generalization. Our key idea is that while training the model on task-specific data, it is helpful to simultaneously train the model to solve a simpler but related auxiliary task as well. We study the classical sorting problem as a canonical example to evaluate our approach. We design a multitask training framework and show that task hinting significantly improve length generalization. For sorting we show that it is possible to train models on data consisting of sequences having length at most $20$, and improve the test accuracy on sequences of length $100$ from less than 1% (for standard training) to more than 92% (via task hinting). Our study uncovers several interesting aspects of length generalization. We observe that while several auxiliary tasks may seem natural a priori, their effectiveness in improving length generalization differs dramatically. We further use probing and visualization-based techniques to understand the internal mechanisms via which the model performs the task, and propose a theoretical construction consistent with the observed learning behaviors of the model. Based on our construction, we show that introducing a small number of length dependent parameters into the training procedure can further boost the performance on unseen lengths. Finally, we also show the efficacy of our task hinting based approach beyond sorting, giving hope that these techniques will be applicable in broader contexts.
    摘要 近年来,transformer模型在某些逻辑和数学任务中表现出长度泛化问题。具体来说,一个基于添加任务的transformer模型在应用于更长的问题时表现下降。这项工作提出一种基于任务提示的方法来解决长度泛化问题。我们的关键想法是在训练模型时,同时训练模型解决一个相关的简单任务。我们选择排序问题作为一个典型的例子来评估我们的方法。我们设计了一个多任务训练框架,并证明了任务提示可以显著提高长度泛化。对于排序问题,我们可以在数据中包含长度不超过20的序列,并在测试时提高测试 accuracy 从 less than 1% (标准训练) 到更多于92% (via任务提示)。我们的研究揭示了长度泛化的几个有趣方面。我们发现,虽然一些 auxillary task 可能看起来很自然,但它们在提高长度泛化效果上差异很大。我们还使用探测和视觉化技术来理解模型如何完成任务,并提出了一种理论建构,该建构与模型学习行为相符。基于该建构,我们表明在训练过程中引入一小数量的长度参数可以进一步提高对未经见长度的表现。最后,我们还证明了我们的任务提示基本方法在更广泛的上下文中有效。
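
Task hinting amounts to a multitask objective: the model is optimized on the main task and, with some weight, on a simpler related auxiliary task in the same training loop. The sketch below shows one such combined loss; the shared head, the auxiliary weight, and the toy data are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

def task_hinting_loss(model, main_batch, aux_batch, aux_weight=0.5):
    """Multitask objective used for task hinting: train on the main task
    (e.g. sorting) while simultaneously fitting a simpler, related auxiliary
    task with the same model. Auxiliary weighting and a shared head are
    assumed details for illustration."""
    ce = nn.CrossEntropyLoss()
    main_x, main_y = main_batch
    aux_x, aux_y = aux_batch
    loss_main = ce(model(main_x), main_y)
    loss_aux = ce(model(aux_x), aux_y)
    return loss_main + aux_weight * loss_aux

# Toy usage with a shared classifier over vector inputs.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
main = (torch.randn(8, 16), torch.randint(0, 10, (8,)))
aux = (torch.randn(8, 16), torch.randint(0, 10, (8,)))
print(task_hinting_loss(model, main, aux).item())
```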

Subtractive Mixture Models via Squaring: Representation and Learning

  • paper_url: http://arxiv.org/abs/2310.00724
  • repo_url: https://github.com/anon-npc/squared-npcs
  • paper_authors: Lorenzo Loconte, Aleksanteri M. Sladek, Stefan Mengel, Martin Trapp, Arno Solin, Nicolas Gillis, Antonio Vergari
  • for: 用于模型复杂的分布
  • methods: 使用深度减法 mixture 模型
  • results: 可以提高表达能力,并且在实际分布估计任务中得到良好效果
    Abstract Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number of components needed to model complex distributions. However, learning such subtractive mixtures while ensuring they still encode a non-negative function is challenging. We investigate how to learn and perform inference on deep subtractive mixtures by squaring them. We do this in the framework of probabilistic circuits, which enable us to represent tensorized mixtures and generalize several other subtractive models. We theoretically prove that the class of squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixtures; and, we empirically show this increased expressiveness on a series of real-world distribution estimation tasks.
    摘要 混合模型通常通过将多个分布作为分量相加来表示和学习。若允许混合模型减去概率质量或密度,则可以大幅减少建模复杂分布所需的分量数量。然而,在保证其仍编码非负函数的前提下学习这类减法混合是困难的。我们研究如何通过对其取平方来学习深度减法混合并进行推理。我们在概率电路(probabilistic circuits)框架下展开这一工作,该框架使我们能够表示张量化的混合并推广若干其他减法模型。我们从理论上证明,允许减法的平方电路类可以比传统的加法混合在表达能力上呈指数级提升;并在一系列真实世界的分布估计任务上,通过实验展示了这种更强的表达能力。
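
The squaring trick can be shown in one dimension: an unnormalized density defined as the square of a weighted sum of components stays non-negative even when some weights are negative, so probability mass is effectively subtracted where the components overlap. The snippet below is a minimal illustration with two Gaussians, not a probabilistic circuit.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# A squared mixture with a *negative* coefficient: the unnormalized density
# q(x) = (w1*f1(x) + w2*f2(x))**2 is non-negative by construction even though
# mass is subtracted where the two components overlap.
w = np.array([1.0, -0.6])
components = [norm(loc=0.0, scale=1.0), norm(loc=0.5, scale=1.0)]

def unnormalized(x):
    return (w[0] * components[0].pdf(x) + w[1] * components[1].pdf(x)) ** 2

Z, _ = quad(unnormalized, -10, 10)              # normalizing constant
density = lambda x: unnormalized(x) / Z
xs = np.linspace(-4, 4, 5)
print(np.round([density(x) for x in xs], 4))    # all values are >= 0
```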

A Simple Yet Effective Strategy to Robustify the Meta Learning Paradigm

  • paper_url: http://arxiv.org/abs/2310.00708
  • repo_url: None
  • paper_authors: Qi Wang, Yiqin Lv, Yanghe Feng, Zheng Xie, Jincai Huang
  • for: 提高 meta 学习的可靠性和鲁棒性,尤其是在风险敏感的情况下。
  • methods: 基于分布 robust 思想来优化 meta 学习管道,并使用预期尾风险度量进行优化。
  • results: 实验结果显示,我们的简单方法可以提高 meta 学习对任务分布的Robustness,降低 conditional 预期最坏快速风险的平均值。
    Abstract Meta learning is a promising paradigm to enable skill transfer across tasks. Most previous methods employ the empirical risk minimization principle in optimization. However, the resulting worst fast adaptation to a subset of tasks can be catastrophic in risk-sensitive scenarios. To robustify fast adaptation, this paper optimizes meta learning pipelines from a distributionally robust perspective and meta trains models with the measure of expected tail risk. We take the two-stage strategy as heuristics to solve the robust meta learning problem, controlling the worst fast adaptation cases at a certain probabilistic level. Experimental results show that our simple method can improve the robustness of meta learning to task distributions and reduce the conditional expectation of the worst fast adaptation risk.
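A minimal sketch of the expected tail-risk idea at the level of task selection during meta-training. The meta-learner API (fast_adapt_loss, update) is a placeholder, and the quantile level and two-stage heuristic shown here are simplifying assumptions.

```python
import numpy as np

def expected_tail_risk(task_losses, alpha=0.7):
    """CVaR-style measure: mean loss over the worst (1 - alpha) fraction of tasks."""
    losses = np.asarray(task_losses)
    var = np.quantile(losses, alpha)          # value-at-risk threshold
    tail_mask = losses >= var
    return losses[tail_mask].mean(), tail_mask

def meta_step(meta_learner, task_batch, alpha=0.7):
    """Two-stage heuristic: evaluate fast adaptation on all tasks, then update on the tail."""
    losses = [meta_learner.fast_adapt_loss(task) for task in task_batch]   # placeholder API
    risk, tail_mask = expected_tail_risk(losses, alpha)
    hard_tasks = [t for t, m in zip(task_batch, tail_mask) if m]
    meta_learner.update(hard_tasks)                                        # placeholder API
    return risk
```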

Meta Semantic Template for Evaluation of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01448
  • repo_url: None
  • paper_authors: Yachuan Liu, Liang Chen, Jindong Wang, Qiaozhu Mei, Xing Xie
  • for: evaluating whether large language models (LLMs) genuinely understand language semantics rather than merely memorizing training data.
  • methods: the proposed MSTemp approach creates meta semantic templates to evaluate the semantic understanding ability of LLMs.
  • results: MSTemp generates highly out-of-distribution (OOD) evaluation samples and can significantly reduce LLM performance when existing datasets are used as seeds.
    Abstract Do large language models (LLMs) genuinely understand the semantics of the language, or just memorize the training data? The recent concern on potential data contamination of LLMs has raised awareness of the community to conduct research on LLMs evaluation. In this paper, we propose MSTemp, an approach that creates meta semantic templates to evaluate the semantic understanding ability of LLMs. The core of MSTemp is not to perform evaluation directly on existing benchmark datasets, but to generate new out-of-distribution (OOD) evaluation sets using existing datasets as seeds. Specifically, for a given sentence, MSTemp leverages another language model to generate new samples while preserving its semantics. The new samples are called semantic templates to the original sentence. Then, MSTemp generates evaluation samples via sentence parsing and random word replacement on the semantic templates. MSTemp is highly flexible, dynamic, and cost-effective. Our initial experiments show that MSTemp-generated samples can significantly reduce the performance of LLMs using existing datasets as seeds. We hope this initial work can shed light on future research of LLMs evaluation.
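A small sketch of an MSTemp-style generation pipeline: semantics-preserving templates followed by random word replacement on parsed tokens. Both paraphrase_model and parser are hypothetical callables standing in for the language model and sentence parser the paper describes.

```python
import random

def generate_evaluation_samples(sentence, paraphrase_model, parser,
                                n_templates=3, swap_prob=0.2):
    """Sketch: generate semantic templates for a seed sentence, then perturb them.

    `paraphrase_model(text, n)` and `parser(text)` are hypothetical callables:
    the first returns n meaning-preserving rewrites, the second a token list.
    """
    templates = paraphrase_model(sentence, n_templates)      # semantic templates
    samples = []
    for t in templates:
        tokens = parser(t)
        perturbed = [
            random.choice(tokens) if random.random() < swap_prob else tok
            for tok in tokens
        ]
        samples.append(" ".join(perturbed))
    return samples
```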

Exchange means change: an unsupervised single-temporal change detection framework based on intra- and inter-image patch exchange

  • paper_url: http://arxiv.org/abs/2310.00689
  • repo_url: https://github.com/chenhongruixuan/i3pe
  • paper_authors: Hongruixuan Chen, Jian Song, Chen Wu, Bo Du, Naoto Yokoya
  • for: an unsupervised, label-free single-temporal change detection framework for remote sensing imagery.
  • methods: the framework uses intra- and inter-image patch exchange (I3PE) to generate pseudo-bi-temporal image pairs and change labels from single-temporal images.
  • results: experiments show that I3PE outperforms representative unsupervised methods, with F1 improvements of about 10.65% and 6.99%, and it also improves change detectors in supervised and semi-supervised settings.
    Abstract Change detection (CD) is a critical task in studying the dynamics of ecosystems and human activities using multi-temporal remote sensing images. While deep learning has shown promising results in CD tasks, it requires a large number of labeled and paired multi-temporal images to achieve high performance. Pairing and annotating large-scale multi-temporal remote sensing images is both expensive and time-consuming. To make deep learning-based CD techniques more practical and cost-effective, we propose an unsupervised single-temporal CD framework based on intra- and inter-image patch exchange (I3PE). The I3PE framework allows for training deep change detectors on unpaired and unlabeled single-temporal remote sensing images that are readily available in real-world applications. The I3PE framework comprises four steps: 1) intra-image patch exchange method is based on an object-based image analysis method and adaptive clustering algorithm, which generates pseudo-bi-temporal image pairs and corresponding change labels from single-temporal images by exchanging patches within the image; 2) inter-image patch exchange method can generate more types of land-cover changes by exchanging patches between images; 3) a simulation pipeline consisting of several image enhancement methods is proposed to simulate the radiometric difference between pre- and post-event images caused by different imaging conditions in real situations; 4) self-supervised learning based on pseudo-labels is applied to further improve the performance of the change detectors in both unsupervised and semi-supervised cases. Extensive experiments on two large-scale datasets demonstrate that I3PE outperforms representative unsupervised approaches and achieves F1 value improvements of 10.65% and 6.99% to the SOTA method. Moreover, I3PE can improve the performance of the ... (see the original article for full abstract)
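A toy sketch of the intra-image patch exchange step: patches are swapped within a single image to create a pseudo post-event image, and the change label is derived from where the swapped cluster ids differ. The patch size, number of swaps, and the source of the cluster map are illustrative assumptions.

```python
import numpy as np

def intra_image_patch_exchange(image, cluster_map, patch=64, n_swaps=8, rng=None):
    """Build a pseudo 'post-event' image and a change label from ONE image by
    swapping same-sized patches within it. `cluster_map` holds per-pixel cluster
    ids (e.g. from object-based analysis); any segmentation could supply it."""
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    post, post_clusters = image.copy(), cluster_map.copy()
    for _ in range(n_swaps):
        y1, x1 = rng.integers(0, h - patch), rng.integers(0, w - patch)
        y2, x2 = rng.integers(0, h - patch), rng.integers(0, w - patch)
        a = (slice(y1, y1 + patch), slice(x1, x1 + patch))
        b = (slice(y2, y2 + patch), slice(x2, x2 + patch))
        post[a], post[b] = post[b].copy(), post[a].copy()
        post_clusters[a], post_clusters[b] = post_clusters[b].copy(), post_clusters[a].copy()
    # Pixels whose cluster id changed after the exchange are marked as "change".
    change_label = (post_clusters != cluster_map).astype(np.uint8)
    return post, change_label

# toy usage with a random image and random cluster ids
img = np.random.rand(256, 256, 3)
clusters = np.random.randint(0, 5, size=(256, 256))
pseudo_post, label = intra_image_patch_exchange(img, clusters, rng=0)
```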

The Robots are Here: Navigating the Generative AI Revolution in Computing Education

  • paper_url: http://arxiv.org/abs/2310.00658
  • repo_url: None
  • paper_authors: James Prather, Paul Denny, Juho Leinonen, Brett A. Becker, Ibrahim Albluwi, Michelle Craig, Hieke Keuning, Natalie Kiesler, Tobias Kohn, Andrew Luxton-Reilly, Stephen MacNeil, Andrew Peterson, Raymond Pettit, Brent N. Reeves, Jaromir Savelka
  • for: this working group report explores the applications and challenges of large language models (LLMs) in computing education, and how educators can adapt to and leverage this new technology.
  • methods: the report combines a literature review and surveys of computing students and instructors with in-depth interviews of 22 computing educators who have already adapted their curricula and assessments.
  • results: the report finds that LLMs can improve students' learning outcomes and creativity in computing education while raising ethical and pedagogical challenges, and that the performance of current LLMs on computing-education tasks keeps improving.
    Abstract Recent advancements in artificial intelligence (AI) are fundamentally reshaping computing, with large language models (LLMs) now effectively being able to generate and interpret source code and natural language instructions. These emergent capabilities have sparked urgent questions in the computing education community around how educators should adapt their pedagogy to address the challenges and to leverage the opportunities presented by this new technology. In this working group report, we undertake a comprehensive exploration of LLMs in the context of computing education and make five significant contributions. First, we provide a detailed review of the literature on LLMs in computing education and synthesise findings from 71 primary articles. Second, we report the findings of a survey of computing students and instructors from across 20 countries, capturing prevailing attitudes towards LLMs and their use in computing education contexts. Third, to understand how pedagogy is already changing, we offer insights collected from in-depth interviews with 22 computing educators from five continents who have already adapted their curricula and assessments. Fourth, we use the ACM Code of Ethics to frame a discussion of ethical issues raised by the use of large language models in computing education, and we provide concrete advice for policy makers, educators, and students. Finally, we benchmark the performance of LLMs on various computing education datasets, and highlight the extent to which the capabilities of current models are rapidly improving. Our aim is that this report will serve as a focal point for both researchers and practitioners who are exploring, adapting, using, and evaluating LLMs and LLM-based tools in computing classrooms.

LEGO-Prover: Neural Theorem Proving with Growing Libraries

  • paper_url: http://arxiv.org/abs/2310.00656
  • repo_url: https://github.com/wiio12/LEGO-Prover
  • paper_authors: Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, Xiaodan Liang
  • for: This paper aims to improve the ability of large language models (LLMs) to prove mathematical theorems by employing a growing skill library containing verified lemmas as skills.
  • methods: The proposed method, called LEGO-Prover, constructs the proof modularly and uses existing skills retrieved from the library to augment the capability of LLMs. The skills are further evolved by prompting an LLM to enrich the library on another scale.
  • results: The proposed method advances the state-of-the-art pass rate on miniF2F-valid and miniF2F-test, and generates over 20,000 skills (theorems/lemmas) that are added to the growing library. The ablation study shows that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%.
    Abstract Despite the success of large language models (LLMs), the task of theorem proving still remains one of the hardest reasoning tasks that is far from being fully solved. Prior methods using language models have demonstrated promising results, but they still struggle to prove even middle school level theorems. One common limitation of these methods is that they assume a fixed theorem library during the whole theorem proving process. However, as we all know, creating new useful theorems or even new theories is not only helpful but crucial and necessary for advancing mathematics and proving harder and deeper results. In this work, we present LEGO-Prover, which employs a growing skill library containing verified lemmas as skills to augment the capability of LLMs used in theorem proving. By constructing the proof modularly, LEGO-Prover enables LLMs to utilize existing skills retrieved from the library and to create new skills during the proving process. These skills are further evolved (by prompting an LLM) to enrich the library on another scale. Modular and reusable skills are constantly added to the library to enable tackling increasingly intricate mathematical problems. Moreover, the learned library further bridges the gap between human proofs and formal proofs by making it easier to impute missing steps. LEGO-Prover advances the state-of-the-art pass rate on miniF2F-valid (48.0% to 57.0%) and miniF2F-test (45.5% to 47.1%). During the proving process, LEGO-Prover also manages to generate over 20,000 skills (theorems/lemmas) and adds them to the growing library. Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%. We also release our code and all the generated skills.

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

  • paper_url: http://arxiv.org/abs/2310.00653
  • repo_url: https://github.com/thunlp/muffin
  • paper_authors: Tianyu Yu, Jinyi Hu, Yuan Yao, Haoye Zhang, Yue Zhao, Chongyi Wang, Shan Wang, Yinxv Pan, Jiao Xue, Dahai Li, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun
  • for: proposing a new vision-language model framework and a multimodal instruction-tuning dataset to improve existing multimodal language models.
  • methods: the Muffin framework directly employs pre-trained vision-language models to connect visual modules with language models, without extra feature-alignment pre-training; the paper also introduces UniMM-Chat, a dataset built by merging datasets from different tasks to generate high-quality and diverse multimodal instructions.
  • results: experiments show that the Muffin framework and the UniMM-Chat dataset improve multimodal language model performance, surpassing state-of-the-art models such as LLaVA and InstructBLIP.
    Abstract Recent Multimodal Large Language Models (MLLMs) exhibit impressive abilities to perceive images and follow open-ended instructions. The capabilities of MLLMs depend on two crucial factors: the model architecture to facilitate the feature alignment of visual modules and large language models; the multimodal instruction tuning datasets for human instruction following. (i) For the model architecture, most existing models introduce an external bridge module to connect vision encoders with language models, which needs an additional feature-alignment pre-training. In this work, we discover that compact pre-trained vision language models can inherently serve as ``out-of-the-box'' bridges between vision and language. Based on this, we propose Muffin framework, which directly employs pre-trained vision-language models to act as providers of visual signals. (ii) For the multimodal instruction tuning datasets, existing methods omit the complementary relationship between different datasets and simply mix datasets from different tasks. Instead, we propose UniMM-Chat dataset which explores the complementarities of datasets to generate 1.1M high-quality and diverse multimodal instructions. We merge information describing the same image from diverse datasets and transforms it into more knowledge-intensive conversation data. Experimental results demonstrate the effectiveness of the Muffin framework and UniMM-Chat dataset. Muffin achieves state-of-the-art performance on a wide range of vision-language tasks, significantly surpassing state-of-the-art models like LLaVA and InstructBLIP. Our model and dataset are all accessible at https://github.com/thunlp/muffin.

Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning

  • paper_url: http://arxiv.org/abs/2310.01446
  • repo_url: None
  • paper_authors: Jianpeng Zhou, Wanjun Zhong, Yanlin Wang, Jiahai Wang
  • for: improving the performance of large language models (LLMs) on complex reasoning tasks while adapting to the varied complexity of real-world problems.
  • methods: an Adaptive-Solver framework that flexibly adjusts solving strategies to problem difficulty, consisting of an initial evaluation module and a subsequent adaptation module with three strategies: (1) model adaptation, switching to a stronger LLM when needed; (2) prompting-method adaptation, choosing the prompting technique suited to the problem; (3) decomposition-granularity adaptation, breaking complex problems into finer-grained sub-questions.
  • results: experiments show that prompting-method adaptation and decomposition-granularity adaptation improve performance across all tasks, while model adaptation reduces API costs by up to 50% while maintaining strong performance.
    Abstract Large Language Models (LLMs) are showcasing impressive ability in handling complex reasoning tasks. In real-world situations, problems often span a spectrum of complexities. Humans inherently adjust their problem-solving approaches based on task complexity. However, most methodologies that leverage LLMs tend to adopt a uniform approach: utilizing consistent models, prompting methods, and degrees of problem decomposition, regardless of the problem complexity. Inflexibility of them can bring unnecessary computational overhead or sub-optimal performance. To address this problem, we introduce an Adaptive-Solver framework. It strategically modulates solving strategies based on the difficulties of the problems. Given an initial solution, the framework functions with two primary modules. The initial evaluation module assesses the adequacy of the current solution. If improvements are needed, the subsequent adaptation module comes into play. Within this module, three key adaptation strategies are employed: (1) Model Adaptation: Switching to a stronger LLM when a weaker variant is inadequate. (2) Prompting Method Adaptation: Alternating between different prompting techniques to suit the problem's nuances. (3) Decomposition Granularity Adaptation: Breaking down a complex problem into more fine-grained sub-questions to enhance solvability. Through such dynamic adaptations, our framework not only enhances computational efficiency but also elevates the overall performance. This dual-benefit ensures both the efficiency of the system for simpler tasks and the precision required for more complex questions. Experimental results from complex reasoning tasks reveal that the prompting method adaptation and decomposition granularity adaptation enhance performance across all tasks. Furthermore, the model adaptation approach significantly reduces API costs (up to 50%) while maintaining superior performance.
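A compact sketch of the evaluate-then-adapt loop described above. The model list, prompting-method registry, adequacy check, and decomposition function are hypothetical placeholders, and the fixed order in which the three adaptation strategies are tried is an assumption for illustration.

```python
def adaptive_solve(problem, models, prompting_methods, is_adequate, decompose, max_rounds=3):
    """Sketch of an Adaptive-Solver-style loop. `models` is ordered weak -> strong,
    `prompting_methods` maps a method name (assumed keys: "direct", "chain_of_thought")
    to a callable(model, problem), `is_adequate` plays the role of the initial
    evaluation module, and `decompose` splits a problem into sub-questions."""
    model_idx, method = 0, "direct"
    solution = prompting_methods[method](models[model_idx], problem)
    for _ in range(max_rounds):
        if is_adequate(problem, solution):
            return solution
        # Adaptation strategies, applied in a fixed order here for illustration:
        if model_idx + 1 < len(models):              # (1) model adaptation
            model_idx += 1
        elif method != "chain_of_thought":           # (2) prompting-method adaptation
            method = "chain_of_thought"
        else:                                        # (3) decomposition-granularity adaptation
            subproblems = decompose(problem)
            subanswers = [prompting_methods[method](models[model_idx], p) for p in subproblems]
            problem = problem + "\nKnown sub-answers: " + "; ".join(map(str, subanswers))
        solution = prompting_methods[method](models[model_idx], problem)
    return solution
```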

WASA: WAtermark-based Source Attribution for Large Language Model-Generated Data

  • paper_url: http://arxiv.org/abs/2310.00646
  • repo_url: None
  • paper_authors: Jingtan Wang, Xinyang Lu, Zitong Zhao, Zhongxiang Dai, Chuan-Sheng Foo, See-Kiong Ng, Bryan Kian Hsiang Low
  • for: addressing the intellectual property (IP) concerns around the training data of large language models (LLMs).
  • methods: watermarking; specifically, a WAtermarking for Source Attribution (WASA) framework that enables an LLM to generate synthetic texts with embedded watermarks containing information about their source(s).
  • results: experiments show that the WASA framework achieves effective source attribution and data provenance.
    Abstract The impressive performances of large language models (LLMs) and their immense potential for commercialization have given rise to serious concerns over the intellectual property (IP) of their training data. In particular, the synthetic texts generated by LLMs may infringe the IP of the data being used to train the LLMs. To this end, it is imperative to be able to (a) identify the data provider who contributed to the generation of a synthetic text by an LLM (source attribution) and (b) verify whether the text data from a data provider has been used to train an LLM (data provenance). In this paper, we show that both problems can be solved by watermarking, i.e., by enabling an LLM to generate synthetic texts with embedded watermarks that contain information about their source(s). We identify the key properties of such watermarking frameworks (e.g., source attribution accuracy, robustness against adversaries), and propose a WAtermarking for Source Attribution (WASA) framework that satisfies these key properties due to our algorithmic designs. Our WASA framework enables an LLM to learn an accurate mapping from the texts of different data providers to their corresponding unique watermarks, which sets the foundation for effective source attribution (and hence data provenance). Extensive empirical evaluations show that our WASA framework achieves effective source attribution and data provenance.
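The concrete embedding scheme below (mapping each data provider to a short sequence of zero-width Unicode characters appended to its sentences) is purely an illustrative assumption; the paper's actual watermark design and training procedure may differ. It only shows the kind of embed/attribute round trip that source attribution requires.

```python
# Hypothetical per-provider watermark over "invisible" zero-width Unicode characters,
# inserted into training sentences so generated text can be attributed to a source.
INVISIBLE = ["\u200b", "\u200c", "\u200d", "\u2060"]  # zero-width characters

def provider_watermark(provider_id, length=10):
    """Deterministically encode a provider id in base-4 over invisible characters."""
    digits = []
    for _ in range(length):
        digits.append(INVISIBLE[provider_id % 4])
        provider_id //= 4
    return "".join(digits)

def embed(text, provider_id):
    """Append the provider's watermark to a training sentence."""
    return text + provider_watermark(provider_id)

def attribute(text, num_providers):
    """Recover the provider id from a watermarked text, if present."""
    tail = [c for c in text if c in INVISIBLE][-10:]
    if len(tail) < 10:
        return None
    value = sum(INVISIBLE.index(c) * (4 ** i) for i, c in enumerate(tail))
    return value if value < num_providers else None

assert attribute(embed("some training sentence.", 5), num_providers=16) == 5
```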

From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information

  • paper_url: http://arxiv.org/abs/2310.00642
  • repo_url: None
  • paper_authors: Zhendong Shi, Xiaoli Wei, Ercan E. Kuruoglu
  • for: addressing investment decision-making in sequential processes under fast dynamics and high uncertainty, using two methods to strengthen reinforcement learning: contextual Thompson sampling and reinforcement learning under supervision.
  • methods: combining reinforcement learning with constant proportion portfolio insurance (CPPI) and deep deterministic policy gradient (DDPG) to accelerate the iterative search for an optimal strategy.
  • results: experiments show that both methods accelerate the progress of reinforcement learning toward the optimal strategy.
    Abstract The problem of how to take the right actions to make profits in a sequential process remains difficult due to fast dynamics and a significant amount of uncertainty in many application scenarios. In such complicated environments, reinforcement learning (RL), a reward-oriented strategy for optimal control, has emerged as a potential technique to address this strategic decision-making problem. However, reinforcement learning also has shortcomings, such as excessive resource consumption and an inability to obtain optimal solutions quickly, that make it poorly suited to many financial problems and to quantitative trading markets. In this study, we use two methods that exploit contextual information to overcome these issues: contextual Thompson sampling and reinforcement learning under supervision, both of which accelerate the iterative search for the best answer. To investigate strategic trading in quantitative markets, we merge the earlier financial trading strategy known as constant proportion portfolio insurance (CPPI) into deep deterministic policy gradient (DDPG). The experimental results show that both methods can accelerate the progress of reinforcement learning toward the optimal solution.
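A minimal sketch of contextual Thompson sampling with a linear reward model per arm (Agrawal-Goyal style). The asset-allocation framing, toy context features, and reward signal are assumptions; they are not the paper's CPPI/DDPG setup.

```python
import numpy as np

class LinearThompsonSampling:
    """Contextual Thompson sampling with a linear reward model per arm:
    theta_a is sampled from N(A_a^{-1} b_a, v^2 A_a^{-1}) at each step."""
    def __init__(self, n_arms, dim, v=0.5, reg=1.0):
        self.A = [reg * np.eye(dim) for _ in range(n_arms)]   # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]
        self.v = v

    def select(self, x, rng):
        scores = []
        for A, b in zip(self.A, self.b):
            cov = np.linalg.inv(A)
            theta = rng.multivariate_normal(cov @ b, self.v ** 2 * cov)
            scores.append(float(x @ theta))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# toy usage: contexts could be market features, arms could be allocation actions
rng = np.random.default_rng(0)
ts = LinearThompsonSampling(n_arms=3, dim=4)
for _ in range(100):
    x = rng.normal(size=4)
    arm = ts.select(x, rng)
    reward = float(x[arm] + 0.1 * rng.normal())   # toy reward signal, arm-dependent
    ts.update(arm, x, reward)
```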

Knowledge Engineering using Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00637
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Bradley P. Allen, Lise Stork, Paul Groth
  • for: exploring the potential role of large language models in knowledge engineering and how they can be combined with traditional symbolic knowledge systems.
  • methods: two central directions are outlined: 1) creating hybrid neuro-symbolic knowledge systems; and 2) enabling knowledge engineering in natural language.
  • results: key open research questions are formulated to pursue these two directions.
    Abstract Knowledge engineering is a discipline that focuses on the creation and maintenance of processes that generate and apply knowledge. Traditionally, knowledge engineering approaches have focused on knowledge expressed in formal languages. The emergence of large language models and their capabilities to effectively work with natural language, in its broadest sense, raises questions about the foundations and practice of knowledge engineering. Here, we outline the potential role of LLMs in knowledge engineering, identifying two central directions: 1) creating hybrid neuro-symbolic knowledge systems; and 2) enabling knowledge engineering in natural language. Additionally, we formulate key open research questions to tackle these directions.

A Survey of Robustness and Safety of 2D and 3D Deep Learning Models Against Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2310.00633
  • repo_url: None
  • paper_authors: Yanjie Li, Bin Xie, Songtao Guo, Yuanyuan Yang, Bin Xiao
  • for: improving the reliability and safety of deep learning models against adversarial attacks at training time and physical attacks in deployment.
  • methods: the paper first constructs a general threat model from different perspectives, then comprehensively reviews the latest 2D and 3D adversarial attacks, extending the concept of adversarial examples to cover different types of attack methods.
  • results: the paper systematically investigates the robustness of 3D models against various adversarial attacks, surveys many existing attack methods, examines physical attacks that can lead to safety violations, and summarizes current topics, challenges, and future directions toward trustworthy AI.
    Abstract Benefiting from the rapid development of deep learning, 2D and 3D computer vision applications are deployed in many safe-critical systems, such as autopilot and identity authentication. However, deep learning models are not trustworthy enough because of their limited robustness against adversarial attacks. The physically realizable adversarial attacks further pose fatal threats to the application and human safety. Lots of papers have emerged to investigate the robustness and safety of deep learning models against adversarial attacks. To lead to trustworthy AI, we first construct a general threat model from different perspectives and then comprehensively review the latest progress of both 2D and 3D adversarial attacks. We extend the concept of adversarial examples beyond imperceptive perturbations and collate over 170 papers to give an overview of deep learning model robustness against various adversarial attacks. To the best of our knowledge, we are the first to systematically investigate adversarial attacks for 3D models, a flourishing field applied to many real-world applications. In addition, we examine physical adversarial attacks that lead to safety violations. Last but not least, we summarize present popular topics, give insights on challenges, and shed light on future research on trustworthy AI.
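As a concrete illustration of the imperceptible-perturbation attacks the survey covers, here is a minimal FGSM sketch in PyTorch; the model and inputs are placeholders and the epsilon budget is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, eps=8 / 255):
    """Fast Gradient Sign Method: one gradient-sign step within an L-infinity ball."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()   # move in the direction that increases the loss
        x_adv = x_adv.clamp(0.0, 1.0)             # keep a valid image
    return x_adv.detach()
```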

Intelligent Client Selection for Federated Learning using Cellular Automata

  • paper_url: http://arxiv.org/abs/2310.00627
  • repo_url: https://github.com/nikopavl4/ca_client_selection
  • paper_authors: Nikolaos Pavlidis, Vasileios Perifanis, Theodoros Panagiotis Chatzinikolaou, Georgios Ch. Sirakoulis, Pavlos S. Efraimidis
  • for: a client selection algorithm for federated learning that enhances privacy and reduces latency while adapting to fast-changing real-world environments.
  • methods: a Cellular Automaton-based Client Selection (CA-CS) algorithm that accounts for each participating client's computational resources and communication capacity, as well as interactions between neighboring clients, when selecting clients for the federated process.
  • results: experiments show that CA-CS achieves accuracy comparable to random selection while effectively avoiding high-latency clients.
    Abstract Federated Learning (FL) has emerged as a promising solution for privacy-enhancement and latency minimization in various real-world applications, such as transportation, communications, and healthcare. FL endeavors to bring Machine Learning (ML) down to the edge by harnessing data from million of devices and IoT sensors, thus enabling rapid responses to dynamic environments and yielding highly personalized results. However, the increased amount of sensors across diverse applications poses challenges in terms of communication and resource allocation, hindering the participation of all devices in the federated process and prompting the need for effective FL client selection. To address this issue, we propose Cellular Automaton-based Client Selection (CA-CS), a novel client selection algorithm, which leverages Cellular Automata (CA) as models to effectively capture spatio-temporal changes in a fast-evolving environment. CA-CS considers the computational resources and communication capacity of each participating client, while also accounting for inter-client interactions between neighbors during the client selection process, enabling intelligent client selection for online FL processes on data streams that closely resemble real-world scenarios. In this paper, we present a thorough evaluation of the proposed CA-CS algorithm using MNIST and CIFAR-10 datasets, while making a direct comparison against a uniformly random client selection scheme. Our results demonstrate that CA-CS achieves comparable accuracy to the random selection approach, while effectively avoiding high-latency clients.

Hierarchical Adaptation with Hypernetworks for Few-shot Molecular Property Prediction

  • paper_url: http://arxiv.org/abs/2310.00614
  • repo_url: None
  • paper_authors: Shiguang Wu, Yaqing Wang, Quanming Yao
  • for: a hierarchical adaptation mechanism, built on hypernetworks, for few-shot molecular property prediction in biomedical applications.
  • methods: a hypernetwork-based hierarchical adaptation mechanism that selectively adapts the encoder's parameters and performs molecule-level adaptation in the predictor.
  • results: experiments show that the hierarchical adaptation mechanism achieves state-of-the-art performance on few-shot learning problems.
    Abstract Molecular property prediction (MPP) is important in biomedical applications but naturally suffers from a lack of labels, forming a few-shot learning problem. State-of-the-art approaches are usually based on a gradient-based meta-learning strategy, which ignores differences in model parameters and in the learning difficulty of individual molecules. To address these problems, we propose a novel hierarchical adaptation mechanism for few-shot MPP (HiMPP). The model follows an encoder-predictor framework. First, to make the molecular representation property-adaptive, we selectively adapt the encoder's parameters by designing a hypernetwork that modulates node embeddings during message propagation. Next, we perform molecule-level adaptation by designing another hypernetwork, which assigns more propagation steps to harder molecules in the predictor. In this way, the molecular representation is transformed by HiMPP hierarchically, from the property level to the molecule level. Extensive results show that HiMPP obtains state-of-the-art performance on few-shot MPP problems and that the proposed hierarchical adaptation mechanism is rational and effective.
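A minimal PyTorch sketch of the hypernetwork idea: a task (property) embedding generates per-dimension scale and shift vectors that modulate node embeddings inside the encoder. The FiLM-style modulation form and the layer sizes are assumptions, not HiMPP's exact architecture.

```python
import torch
import torch.nn as nn

class NodeModulationHypernet(nn.Module):
    """Maps a task (property) embedding to scale/shift vectors that modulate
    node embeddings during message passing in the molecular encoder."""
    def __init__(self, task_dim, node_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(task_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * node_dim),
        )

    def forward(self, node_h, task_emb):
        gamma, beta = self.net(task_emb).chunk(2, dim=-1)   # each of shape (node_dim,)
        return (1.0 + gamma) * node_h + beta                # broadcast over all nodes

# toy usage
hyper = NodeModulationHypernet(task_dim=16, node_dim=32)
nodes = torch.randn(10, 32)        # embeddings of 10 atoms
task = torch.randn(16)             # embedding of the current property-prediction task
modulated = hyper(nodes, task)     # (10, 32), property-adapted node embeddings
```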

Understanding AI Cognition: A Neural Module for Inference Inspired by Human Memory Mechanisms

  • paper_url: http://arxiv.org/abs/2310.09297
  • repo_url: https://github.com/zengxyyu/A-neural-module-for-inference-inspired-by-human-memory-mechanisms
  • paper_authors: Xiangyu Zeng, Jie Lin, Piao Hu, Ruizheng Huang, Zhicheng Zhang
  • for: The paper aims to improve the ability of machines to make sense of current inputs and retain information for relation reasoning and question-answering by proposing a PMI framework inspired by human brain’s memory system and cognitive architectures.
  • methods: The PMI framework consists of perception, memory, and inference components, with a differentiable competitive write access, working memory, and long-term memory with a higher-order structure. The framework also uses outer product associations to merge working memory with long-term memory and retrieve relevant information from two separate memory origins for associative integration.
  • results: The paper exploratively applies the PMI framework to improve prevailing Transformers and CNN models on question-answering tasks like bAbI-20k and Sort-of-CLEVR datasets, as well as relation calculation and image classification tasks, and in each case, the PMI enhancements consistently outshine their original counterparts significantly. Visualization analyses reveal that memory consolidation and the interaction and integration of information from diverse memory sources substantially contribute to the model effectiveness on inference tasks.
    Abstract How humans and machines make sense of current inputs for relation reasoning and question-answering while putting the perceived information into context of our past memories, has been a challenging conundrum in cognitive science and artificial intelligence. Inspired by human brain's memory system and cognitive architectures, we propose a PMI framework that consists of perception, memory and inference components. Notably, the memory module comprises working and long-term memory, with the latter endowed with a higher-order structure to retain more accumulated knowledge and experiences. Through a differentiable competitive write access, current perceptions update working memory, which is later merged with long-term memory via outer product associations, averting memory overflow and minimizing information conflicts. In the inference module, relevant information is retrieved from two separate memory origins and associatively integrated to attain a more comprehensive and precise interpretation of current perceptions. We exploratively apply our PMI to improve prevailing Transformers and CNN models on question-answering tasks like bAbI-20k and Sort-of-CLEVR datasets, as well as relation calculation and image classification tasks, and in each case, our PMI enhancements consistently outshine their original counterparts significantly. Visualization analyses reveal that memory consolidation, along with the interaction and integration of information from diverse memory sources, substantially contributes to the model effectiveness on inference tasks.
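A minimal sketch of the outer-product association used to merge working-memory items into long-term memory, in the spirit of a classic linear associative memory; the dimensions, decay factor, and retrieval rule are illustrative assumptions.

```python
import numpy as np

class OuterProductMemory:
    """Long-term memory as a matrix of summed key-value outer products."""
    def __init__(self, key_dim, value_dim, decay=0.99):
        self.M = np.zeros((value_dim, key_dim))
        self.decay = decay                       # gradual forgetting to avert overflow

    def consolidate(self, key, value):
        """Merge one working-memory item (key, value) into long-term memory."""
        self.M = self.decay * self.M + np.outer(value, key)

    def retrieve(self, key):
        """Content-based read: approximate value associated with a (normalized) key."""
        return self.M @ key

# toy usage: store two associations and read one back
mem = OuterProductMemory(key_dim=8, value_dim=4)
rng = np.random.default_rng(0)
k1, k2 = rng.normal(size=8), rng.normal(size=8)
k1, k2 = k1 / np.linalg.norm(k1), k2 / np.linalg.norm(k2)
v1, v2 = np.array([1.0, 0, 0, 0]), np.array([0, 1.0, 0, 0])
mem.consolidate(k1, v1)
mem.consolidate(k2, v2)
print(mem.retrieve(k1))   # close to v1, up to decay and key cross-talk
```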

Adapting LLM Agents Through Communication

  • paper_url: http://arxiv.org/abs/2310.01444
  • repo_url: None
  • paper_authors: Kuan Wang, Yadong Lu, Michael Santacroce, Yeyun Gong, Chao Zhang, Yelong Shen
  • for: proposing the Learning through Communication (LTC) training paradigm, which helps large language model (LLM) agents adapt to new tasks without extensive human supervision.
  • methods: iterative exploration and PPO training that let LLM agents continuously improve through interactions with their environments and other agents.
  • results: LTC outperforms baselines by 12%, 5.1%, and 3.6% on ALFWorld, HotpotQA, and GSM8k respectively, indicating broad applicability across domains.
    Abstract Recent advancements in large language models (LLMs) have shown potential for human-like agents. To help these agents adapt to new tasks without extensive human supervision, we propose the Learning through Communication (LTC) paradigm, a novel training approach enabling LLM agents to improve continuously through interactions with their environments and other agents. Through iterative exploration and PPO training, LTC empowers the agent to assimilate short-term experiences into long-term memory. To optimize agent interactions for task-specific learning, we introduce three structured communication patterns: Monologue, Dialogue, and Analogue, tailored for common tasks such as decision-making, knowledge-intensive reasoning, and numerical reasoning. We evaluated LTC on three datasets: ALFWorld (decision-making), HotpotQA (knowledge-intensive reasoning), and GSM8k (numerical reasoning). On ALFWorld, it exceeds the instruction tuning baseline by 12% in success rate. On HotpotQA, LTC surpasses the instruction-tuned LLaMA-7B agent by 5.1% in EM score, and it outperforms the instruction-tuned 9x larger PaLM-62B agent by 0.6%. On GSM8k, LTC outperforms the CoT-Tuning baseline by 3.6% in accuracy. The results showcase the versatility and efficiency of the LTC approach across diverse domains. We will open-source our code to promote further development of the community.

Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals

  • paper_url: http://arxiv.org/abs/2310.00603
  • repo_url: None
  • paper_authors: Yair Gat, Nitay Calderon, Amir Feder, Alexander Chapanin, Amit Sharma, Roi Reichart
  • for: strengthening explanations of NLP models to ensure safety and establish trust.
  • methods: two approaches are proposed: a counterfactual (CF) generation approach, in which an LLM is prompted to change a specific textual concept while keeping confounding concepts unchanged, and a matching approach that learns a dedicated embedding space guided by an LLM at training time.
  • results: the CF generation approach is very effective but costly at inference time, while the matching approach provides effective explanations with far fewer test-time resources; applying Top-K techniques further improves every tested method.
    Abstract Causal explanations of the predictions of NLP systems are essential to ensure safety and establish trust. Yet, existing methods often fall short of explaining model predictions effectively or efficiently and are often model-specific. In this paper, we address model-agnostic explanations, proposing two approaches for counterfactual (CF) approximation. The first approach is CF generation, where a large language model (LLM) is prompted to change a specific text concept while keeping confounding concepts unchanged. While this approach is demonstrated to be very effective, applying LLM at inference-time is costly. We hence present a second approach based on matching, and propose a method that is guided by an LLM at training-time and learns a dedicated embedding space. This space is faithful to a given causal graph and effectively serves to identify matches that approximate CFs. After showing theoretically that approximating CFs is required in order to construct faithful explanations, we benchmark our approaches and explain several models, including LLMs with billions of parameters. Our empirical results demonstrate the excellent performance of CF generation models as model-agnostic explainers. Moreover, our matching approach, which requires far less test-time resources, also provides effective explanations, surpassing many baselines. We also find that Top-K techniques universally improve every tested method. Finally, we showcase the potential of LLMs in constructing new benchmarks for model explanation and subsequently validate our conclusions. Our work illuminates new pathways for efficient and accurate approaches to interpreting NLP systems.
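A small sketch of the counterfactual-generation approach: an LLM is prompted to flip one concept while holding the listed confounders fixed, and the explanation is the resulting prediction shift. Both llm and classifier are hypothetical callables, and the prompt wording is an assumption.

```python
def generate_counterfactual(llm, text, concept, target_value, confounders):
    """Ask an LLM (hypothetical callable) for a minimal rewrite that changes
    `concept` to `target_value` while keeping the confounding concepts unchanged."""
    held_fixed = ", ".join(confounders)
    prompt = (
        f"Rewrite the text so that its {concept} becomes '{target_value}'. "
        f"Keep the following aspects unchanged: {held_fixed}. "
        f"Change as little as possible.\n\nText: {text}\n\nRewritten text:"
    )
    return llm(prompt)

def concept_effect(classifier, llm, text, concept, target_value, confounders):
    """Model-agnostic explanation: the prediction shift caused by flipping one concept.
    `classifier` is assumed to return a scalar score for a text."""
    cf_text = generate_counterfactual(llm, text, concept, target_value, confounders)
    return classifier(cf_text) - classifier(text), cf_text
```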

A Novel Computational and Modeling Foundation for Automatic Coherence Assessment

  • paper_url: http://arxiv.org/abs/2310.00598
  • repo_url: None
  • paper_authors: Aviya Maimon, Reut Tsarfaty
  • for: coherence assessment in natural language processing (NLP), i.e., whether a text reads as meaningful and well connected.
  • methods: a coherence assessment method based on a formal linguistic definition comprising three conditions: cohesion, consistency, and relevance; each condition is formalized as a computational task, under the hypothesis that a model trained jointly on all tasks learns the features required for coherence assessment.
  • results: on two human-rated benchmarks, the jointly trained model outperforms single-task models both on each task and on overall coherence assessment, providing a strong foundation for large-scale automatic coherence evaluation.
    Abstract Coherence is an essential property of well-written texts, that refers to the way textual units relate to one another. In the era of generative AI, coherence assessment is essential for many NLP tasks; summarization, generation, long-form question-answering, and more. However, in NLP {coherence} is an ill-defined notion, not having a formal definition or evaluation metrics, that would allow for large-scale automatic and systematic coherence assessment. To bridge this gap, in this work we employ the formal linguistic definition of \citet{Reinhart:1980} of what makes a discourse coherent, consisting of three conditions -- {\em cohesion, consistency} and {\em relevance} -- and formalize these conditions as respective computational tasks. We hypothesize that (i) a model trained on all of these tasks will learn the features required for coherence detection, and that (ii) a joint model for all tasks will exceed the performance of models trained on each task individually. On two benchmarks for coherence scoring rated by humans, one containing 500 automatically-generated short stories and another containing 4k real-world texts, our experiments confirm that jointly training on the proposed tasks leads to better performance on each task compared with task-specific models, and to better performance on assessing coherence overall, compared with strong baselines. We conclude that the formal and computational setup of coherence as proposed here provides a solid foundation for advanced methods of large-scale automatic assessment of coherence.

Quantum generative adversarial learning in photonics

  • paper_url: http://arxiv.org/abs/2310.00585
  • repo_url: None
  • paper_authors: Yizhi Wang, Shichuan Xue, Yaxuan Wang, Yong Liu, Jiangfang Ding, Weixu Shi, Dongyang Wang, Yingwen Liu, Xiang Fu, Guangyao Huang, Anqi Huang, Mingtang Deng, Junjie Wu
  • for: investigating whether quantum generative adversarial networks (QGANs) can perform learning tasks on near-term quantum devices, which are typically affected by noise and even defects.
  • methods: a programmable silicon quantum photonic chip is used to experimentally demonstrate the QGAN model in photonics and to study the effects of noise and defects on its performance.
  • results: QGANs can generate high-quality quantum data with fidelity above 90%, even when up to half of the generator's phase shifters are damaged or when all phase shifters of the generator and discriminator are subjected to phase noise of up to 0.04π.
    Abstract Quantum Generative Adversarial Networks (QGANs), an intersection of quantum computing and machine learning, have attracted widespread attention due to their potential advantages over classical analogs. However, in the current era of Noisy Intermediate-Scale Quantum (NISQ) computing, it is essential to investigate whether QGANs can perform learning tasks on near-term quantum devices usually affected by noise and even defects. In this Letter, using a programmable silicon quantum photonic chip, we experimentally demonstrate the QGAN model in photonics for the first time, and investigate the effects of noise and defects on its performance. Our results show that QGANs can generate high-quality quantum data with a fidelity higher than 90\%, even under conditions where up to half of the generator's phase shifters are damaged, or all of the generator and discriminator's phase shifters are subjected to phase noise up to 0.04$\pi$. Our work sheds light on the feasibility of implementing QGANs on NISQ-era quantum hardware.

CityFM: City Foundation Models to Solve Urban Challenges

  • paper_url: http://arxiv.org/abs/2310.00583
  • repo_url: None
  • paper_authors: Pasquale Balsebre, Weiming Huang, Gao Cong, Yi Li
  • for: developing a self-supervised city foundation model (CityFM) trained within a selected geographical area of interest, such as a city.
  • methods: CityFM relies solely on open data from OpenStreetMap and produces multimodal representations of entities of different types (e.g., roads, buildings, regions), incorporating spatial, visual, and textual information.
  • results: on road-, building-, and region-level downstream tasks, the representations produced by CityFM outperform or match application-specific baselines.
    Abstract Pre-trained Foundation Models (PFMs) have ushered in a paradigm-shift in Artificial Intelligence, due to their ability to learn general-purpose representations that can be readily employed in a wide range of downstream tasks. While PFMs have been successfully adopted in various fields such as Natural Language Processing and Computer Vision, their capacity in handling geospatial data and answering urban questions remains limited. This can be attributed to the intrinsic heterogeneity of geospatial data, which encompasses different data types, including points, segments and regions, as well as multiple information modalities, such as a spatial position, visual characteristics and textual annotations. The proliferation of Volunteered Geographic Information initiatives, and the ever-increasing availability of open geospatial data sources, like OpenStreetMap, which is freely accessible globally, unveil a promising opportunity to bridge this gap. In this paper, we present CityFM, a self-supervised framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OSM, and produces multimodal representations of entities of different types, incorporating spatial, visual, and textual information. We analyse the entity representations generated using our foundation models from a qualitative perspective, and conduct quantitative experiments on road, building, and region-level downstream tasks. We compare its results to algorithms tailored specifically for the respective applications. In all the experiments, CityFM achieves performance superior to, or on par with, the baselines.

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

  • paper_url: http://arxiv.org/abs/2310.00582
  • repo_url: https://github.com/sy-xuan/pink
  • paper_authors: Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang
  • for: enhancing the Referential Comprehension (RC) ability of Multi-modal Large Language Models (MLLMs) for fine-grained perception tasks.
  • methods: the referring object in the image is represented by the coordinates of its bounding box, converted into text in a specific format so the model can treat the coordinates as natural language; the model is trained end-to-end with a parameter-efficient tuning framework that allows both modalities to benefit from multi-modal instruction tuning.
  • results: superior performance on conventional vision-language and RC tasks, including a 12.0% absolute accuracy improvement over Instruct-BLIP on VSR and a 24.7% gain over Kosmos-2 on RefCOCO_val under zero-shot settings, as well as the top position on the MMBench leaderboard.
    Abstract Multi-modal Large Language Models (MLLMs) have shown remarkable capabilities in many vision-language tasks. Nevertheless, most MLLMs still lack the Referential Comprehension (RC) ability to identify a specific object or area in images, limiting their application in fine-grained perception tasks. This paper proposes a novel method to enhance the RC capability for MLLMs. Our model represents the referring object in the image using the coordinates of its bounding box and converts the coordinates into texts in a specific format. This allows the model to treat the coordinates as natural language. Moreover, we construct the instruction tuning dataset with various designed RC tasks at a low cost by unleashing the potential of annotations in existing datasets. To further boost the RC ability of the model, we propose a self-consistent bootstrapping method that extends dense object annotations of a dataset into high-quality referring-expression-bounding-box pairs. The model is trained end-to-end with a parameter-efficient tuning framework that allows both modalities to benefit from multi-modal instruction tuning. This framework requires fewer trainable parameters and less training data. Experimental results on conventional vision-language and RC tasks demonstrate the superior performance of our method. For instance, our model exhibits a 12.0% absolute accuracy improvement over Instruct-BLIP on VSR and surpasses Kosmos-2 by 24.7% on RefCOCO_val under zero-shot settings. We also attain the top position on the leaderboard of MMBench. The models, datasets, and codes are publicly available at https://github.com/SY-Xuan/Pink
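A small sketch of the coordinate-to-text conversion described above; the textual format, normalization range, and instruction wording are assumptions rather than Pink's exact serialization.

```python
def box_to_text(box, width, height, bins=1000):
    """Serialize a bounding box (x1, y1, x2, y2) in pixels into integer tokens
    normalized to a fixed coordinate range, so an LLM can read it as text."""
    x1, y1, x2, y2 = box
    nx1, nx2 = round(x1 / width * (bins - 1)), round(x2 / width * (bins - 1))
    ny1, ny2 = round(y1 / height * (bins - 1)), round(y2 / height * (bins - 1))
    return f"[{nx1},{ny1},{nx2},{ny2}]"

def make_rc_instruction(expression, box, width, height):
    """One referential-comprehension training pair (instruction, answer)."""
    return (f"Where is \"{expression}\" in the image? Answer with its bounding box.",
            box_to_text(box, width, height))

print(make_rc_instruction("the red mug", (40, 80, 200, 300), width=640, height=480))
```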

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

  • paper_url: http://arxiv.org/abs/2310.02279
  • repo_url: https://github.com/sony/ctm
  • paper_authors: Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon
  • for: accelerating diffusion model sampling while improving performance.
  • methods: a Consistency Trajectory Model (CTM) that outputs scores (gradients of the log-density) in a single forward pass and allows unrestricted traversal between any initial and final time along the probability flow ODE of a diffusion process.
  • results: CTM achieves new state-of-the-art FIDs for single-step diffusion sampling on CIFAR-10 (FID 1.73) and ImageNet 64x64 (FID 2.06), and sample quality keeps improving as the computational budget increases, avoiding the degradation seen in CM.
    Abstract Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade-off quality for speed. To address this limitation, we propose Consistency Trajectory Model (CTM), a generalization encompassing CM and score-based models as special cases. CTM trains a single neural network that can -- in a single forward pass -- output scores (i.e., gradients of log-density) and enables unrestricted traversal between any initial and final time along the Probability Flow Ordinary Differential Equation (ODE) in a diffusion process. CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance and achieves new state-of-the-art FIDs for single-step diffusion model sampling on CIFAR-10 (FID 1.73) and ImageNet at 64X64 resolution (FID 2.06). CTM also enables a new family of sampling schemes, both deterministic and stochastic, involving long jumps along the ODE solution trajectories. It consistently improves sample quality as computational budgets increase, avoiding the degradation seen in CM. Furthermore, CTM's access to the score accommodates all diffusion model inference techniques, including exact likelihood computation.

LaPLACE: Probabilistic Local Model-Agnostic Causal Explanations

  • paper_url: http://arxiv.org/abs/2310.00570
  • repo_url: https://github.com/simon-tan/laplace
  • paper_authors: Sein Minn
  • for: The paper aims to provide probabilistic cause-and-effect explanations for any classifier operating on tabular data, in a human-understandable manner.
  • methods: The LaPLACE-Explainer component leverages the concept of a Markov blanket to establish statistical boundaries between relevant and non-relevant features automatically, and incorporates conditional probabilities to offer probabilistic causal explanations.
  • results: The approach outperforms LIME and SHAP in terms of local accuracy and consistency of explained features, and is validated across various classification models through experiments with both simulated and real-world datasets. The explanations provided by LaPLACE can address trust-related issues such as evaluating prediction reliability, facilitating model selection, enhancing trustworthiness, and identifying fairness-related concerns within classifiers.
    Abstract Machine learning models have undeniably achieved impressive performance across a range of applications. However, their often perceived black-box nature, and lack of transparency in decision-making, have raised concerns about understanding their predictions. To tackle this challenge, researchers have developed methods to provide explanations for machine learning models. In this paper, we introduce LaPLACE-explainer, designed to provide probabilistic cause-and-effect explanations for any classifier operating on tabular data, in a human-understandable manner. The LaPLACE-Explainer component leverages the concept of a Markov blanket to establish statistical boundaries between relevant and non-relevant features automatically. This approach results in the automatic generation of optimal feature subsets, serving as explanations for predictions. Importantly, this eliminates the need to predetermine a fixed number N of top features as explanations, enhancing the flexibility and adaptability of our methodology. Through the incorporation of conditional probabilities, our approach offers probabilistic causal explanations and outperforms LIME and SHAP (well-known model-agnostic explainers) in terms of local accuracy and consistency of explained features. LaPLACE's soundness, consistency, local accuracy, and adaptability are rigorously validated across various classification models. Furthermore, we demonstrate the practical utility of these explanations via experiments with both simulated and real-world datasets. This encompasses addressing trust-related issues, such as evaluating prediction reliability, facilitating model selection, enhancing trustworthiness, and identifying fairness-related concerns within classifiers.
    摘要 机器学习模型在多种应用场景中表现出色,但它们的很多时候被视为黑盒模型,无法准确地描述它们的预测结果。为解决这个问题,研究人员开发了一些方法来提供机器学习模型的解释。本文介绍了LaPLACE-explainer,可以为任何基于表格数据的分类器提供 probabilistic cause-and-effect 的解释,并且在人类可以理解的方式下进行解释。LaPLACE-Explainer 组件利用 Markov blanket 的概念,自动地确定相关和无关的特征。这种方法可以自动生成最佳的特征子集,作为预测的解释。这种方法不需要手动决定固定的特征数 N 作为解释,从而提高了方法的灵活性和适应性。通过 incorporating conditional probabilities,我们的方法可以提供 probabilistic causal 的解释,并且在本地准确性和解释特征的一致性方面超过 LIME 和 SHAP(已知的模型无关解释器)。LaPLACE 的准确性、一致性、本地准确性和适应性被严格验证了多种分类模型。此外,我们通过对 simulated 和实际数据进行实验,证明了这些解释的实际用途。这包括评估预测可靠性、促进模型选择、增强可靠性和识别分类器中的公平问题。
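
As a rough illustration of explaining a single prediction with an automatically sized feature subset (rather than a fixed top-N), the sketch below uses a greedy mutual-information proxy over local perturbations. The actual LaPLACE-Explainer builds a Markov blanket with conditional probabilities, which this toy does not implement; all function names, the perturbation scheme, and the threshold are illustrative assumptions.

```python
# Hedged sketch: find a small feature subset that is locally informative about
# the model's output around one instance, keeping features above a threshold
# instead of a fixed top-N. Not the paper's Markov-blanket construction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)   # toy tabular task
clf = RandomForestClassifier(random_state=0).fit(X, y)

def local_explanation(clf, x, n_samples=2000, scale=0.5, threshold=0.02):
    # Perturb the instance locally and record the classifier's responses.
    Z = x + scale * rng.normal(size=(n_samples, x.shape[0]))
    preds = clf.predict(Z)
    # Rank features by how informative they are about the local predictions;
    # the subset size is determined by the threshold, not fixed in advance.
    mi = mutual_info_classif(Z, preds, random_state=0)
    return [(j, float(mi[j])) for j in np.argsort(-mi) if mi[j] > threshold]

print(local_explanation(clf, X[0]))
```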

Quantum-Based Feature Selection for Multi-classification Problem in Complex Systems with Edge Computing

  • paper_url: http://arxiv.org/abs/2310.01443
  • repo_url: None
  • paper_authors: Wenjie Liu, Junxiu Chen, Yuxiang Wang, Peipei Gao, Zhibin Lei, Xu Ma
  • for: 本研究提出了一种基于量子算法的特征选择方法,以提高计算效率和降低资源消耗。
  • methods: 该方法先通过 CMP 和 R_y 操作将每个样本的所有特征编码为量子态,再利用振幅估计计算任意两个样本(量子态)之间的相似度;随后根据相似度使用 Grover-Long 方法找到最近的 k 个邻居样本,并更新权重向量。
  • results: 与经典的 ReliefF 算法相比,该方法将相似度计算的复杂度从 O(MN) 降至 O(M),将寻找最近邻的复杂度从 O(M) 降至 O(sqrt(M)),并将资源消耗从 O(MN) 降至 O(MlogN);与量子 Relief 算法相比,该方法在寻找最近邻方面同样将复杂度从 O(M) 降至 O(sqrt(M))。最后,通过在 Rigetti 平台上基于一个简单示例的仿真实验验证了方法的可行性。
    Abstract The complex systems with edge computing require a huge amount of multi-feature data to extract appropriate insights for their decision making, so it is important to find a feasible feature selection method to improve the computational efficiency and save the resource consumption. In this paper, a quantum-based feature selection algorithm for the multi-classification problem, namely, QReliefF, is proposed, which can effectively reduce the complexity of algorithm and improve its computational efficiency. First, all features of each sample are encoded into a quantum state by performing operations CMP and R_y, and then the amplitude estimation is applied to calculate the similarity between any two quantum states (i.e., two samples). According to the similarities, the Grover-Long method is utilized to find the nearest k neighbor samples, and then the weight vector is updated. After a certain number of iterations through the above process, the desired features can be selected with regards to the final weight vector and the threshold {\tau}. Compared with the classical ReliefF algorithm, our algorithm reduces the complexity of similarity calculation from O(MN) to O(M), the complexity of finding the nearest neighbor from O(M) to O(sqrt(M)), and resource consumption from O(MN) to O(MlogN). Meanwhile, compared with the quantum Relief algorithm, our algorithm is superior in finding the nearest neighbor, reducing the complexity from O(M) to O(sqrt(M)). Finally, in order to verify the feasibility of our algorithm, a simulation experiment based on Rigetti with a simple example is performed.
    摘要 带有边缘计算的复杂系统需要海量多特征数据来提取适当的洞察以支持决策,因此需要一种可行的特征选择方法来提高计算效率并节省资源消耗。本文提出了一种面向多分类问题的量子特征选择算法 QReliefF,它能有效降低算法复杂度并提高计算效率。首先,通过 CMP 和 R_y 操作将每个样本的所有特征编码为量子态;然后利用振幅估计计算任意两个量子态(即两个样本)之间的相似度;根据相似度,使用 Grover-Long 方法找到最近的 k 个邻居样本,并更新权重向量。经过一定次数的迭代后,即可依据最终的权重向量和阈值 {\tau} 选出所需特征。与经典 ReliefF 算法相比,我们的算法将相似度计算的复杂度从 O(MN) 降至 O(M),将寻找最近邻的复杂度从 O(M) 降至 O(sqrt(M)),并将资源消耗从 O(MN) 降至 O(MlogN)。同时,与量子 Relief 算法相比,我们的算法在寻找最近邻方面更优,复杂度从 O(M) 降至 O(sqrt(M))。最后,为验证算法的可行性,我们在 Rigetti 平台上基于一个简单示例进行了仿真实验。
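
For reference, a minimal sketch of the classical ReliefF-style weight update that QReliefF accelerates is given below; the quantum encoding (CMP, R_y), amplitude estimation, and Grover-Long search replace the distance computation and nearest-neighbour search, but the weight logic stays recognisable. The simplification to a single nearest hit/miss per sampled instance and the threshold value are assumptions for illustration.

```python
# Hedged sketch of a classical ReliefF-style feature-weight update.
import numpy as np

def relieff_weights(X, y, n_iter=100, rng=np.random.default_rng(0)):
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(n_iter):
        i = rng.integers(n)
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf
        same, diff = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, d, np.inf))   # nearest neighbour of the same class
        miss = np.argmin(np.where(diff, d, np.inf))  # nearest neighbour of another class
        # Features that separate classes gain weight; features that vary within a class lose it.
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

X = np.random.default_rng(1).normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)              # only feature 0 is relevant in this toy task
w = relieff_weights(X, y)
selected = np.where(w > 0.1)[0]            # threshold tau (value chosen arbitrarily here)
print(w.round(3), selected)
```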

TDCGL: Two-Level Debiased Contrastive Graph Learning for Recommendation

  • paper_url: http://arxiv.org/abs/2310.00569
  • repo_url: None
  • paper_authors: Yubo Gao, Haotian Wu
  • for: The paper aims to address the problems of over-reliance on high-quality knowledge graphs and noise issues in real-world data, which can negatively impact the performance of knowledge graph-based recommendation methods.
  • methods: The proposed method, Two-Level Debiased Contrastive Graph Learning (TDCGL), combines contrastive learning with debiasing techniques to improve the performance of knowledge graph-based recommendation methods. The method is designed to work on both User-Item and User-User pairs to model higher-order relations.
  • results: The proposed method significantly outperforms state-of-the-art baselines in terms of anti-noise capability and recommendation performance. Ablation studies demonstrate the necessity of each level of the TDCGL method.
    Abstract knowledge graph-based recommendation methods have achieved great success in the field of recommender systems. However, over-reliance on high-quality knowledge graphs is a bottleneck for such methods. Specifically, the long-tailed distribution of entities of KG and noise issues in the real world will make item-entity dependent relations deviate from reflecting true characteristics and significantly harm the performance of modeling user preference. Contrastive learning, as a novel method that is employed for data augmentation and denoising, provides inspiration to fill this research gap. However, the mainstream work only focuses on the long-tail properties of the number of items clicked, while ignoring that the long-tail properties of total number of clicks per user may also affect the performance of the recommendation model. Therefore, to tackle these problems, motivated by the Debiased Contrastive Learning of Unsupervised Sentence Representations (DCLR), we propose Two-Level Debiased Contrastive Graph Learning (TDCGL) model. Specifically, we design the Two-Level Debiased Contrastive Learning (TDCL) and deploy it in the KG, which is conducted not only on User-Item pairs but also on User-User pairs for modeling higher-order relations. Also, to reduce the bias caused by random sampling in contrastive learning, with the exception of the negative samples obtained by random sampling, we add a noise-based generation of negation to ensure spatial uniformity. Considerable experiments on open-source datasets demonstrate that our method has excellent anti-noise capability and significantly outperforms state-of-the-art baselines. In addition, ablation studies about the necessity for each level of TDCL are conducted.
    摘要 基于知识图谱(KG)的推荐方法在推荐系统领域取得了很大的成功。然而,对高质量知识图谱的过度依赖是此类方法的瓶颈。具体来说,知识图谱中实体的长尾分布以及真实世界中的噪声问题,会使物品-实体依赖关系偏离真实特性,从而严重损害用户偏好建模的性能。对比学习作为一种用于数据增强和去噪的新方法,为填补这一研究空白提供了启发。然而,主流工作只关注物品被点击次数的长尾特性,而忽略了每个用户总点击量的长尾特性同样可能影响推荐模型的性能。因此,受无监督句子表示的去偏对比学习(DCLR)启发,我们提出了两级去偏对比图学习(TDCGL)模型。具体来说,我们设计了两级去偏对比学习(TDCL)并将其部署在知识图谱上,不仅作用于用户-物品对,还作用于用户-用户对,以建模高阶关系。此外,为减少对比学习中随机采样带来的偏差,除随机采样得到的负样本外,我们还加入了基于噪声生成的负样本,以保证空间均匀性。在开源数据集上的大量实验表明,我们的方法具有出色的抗噪能力,并显著超越了最新的基线方法。此外,我们还进行了消融实验,以验证TDCL各级的必要性。
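
A minimal sketch of the flavour of such a contrastive objective is shown below: two views of the same embeddings are pulled together against in-batch negatives plus extra noise-generated negatives for spatial uniformity. The paper's exact debiasing terms, the KG propagation, and the second (User-User) level are not reproduced; shapes, the temperature, and the way noise negatives are drawn are assumptions.

```python
# Hedged sketch of a contrastive loss with noise-generated extra negatives.
import torch
import torch.nn.functional as F

def contrastive_loss_with_noise_negatives(z1, z2, tau=0.2):
    # z1, z2: two views of the same batch of user (or item) embeddings, shape [B, d].
    B = z1.size(0)
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    # Random unit vectors as extra negatives, keeping the negative set spatially uniform.
    z_noise = F.normalize(torch.randn_like(z2), dim=1)
    pos = (z1 * z2).sum(dim=1, keepdim=True) / tau                    # [B, 1]
    in_batch = (z1 @ z2.T / tau).masked_fill(torch.eye(B, dtype=torch.bool), float('-inf'))
    neg = torch.cat([in_batch, z1 @ z_noise.T / tau], dim=1)          # [B, 2B]
    logits = torch.cat([pos, neg], dim=1)                             # positive sits at column 0
    labels = torch.zeros(B, dtype=torch.long)
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(32, 64), torch.randn(32, 64)
print(float(contrastive_loss_with_noise_negatives(z1, z2)))
```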

Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2310.00567
  • repo_url: None
  • paper_authors: Quang H. Nguyen, Yingjie Lao, Tung Pham, Kok-Seng Wong, Khoa D. Doan
  • for: 防止深度神经网络受到黑盒攻击,即使攻击者只有模型的输出信息。
  • methods: 提出了一种简单、轻量级的防御策略:在推理时向模型中间层的隐藏特征加入随机噪声,以增强模型抵御黑盒攻击的能力。
  • results: 理论分析和实验验证表明,该方法可以有效增强模型对黑盒攻击的抵抗力;它不需要对抗训练,对模型准确率也没有明显影响。
    Abstract Recent works have shown that deep neural networks are vulnerable to adversarial examples that find samples close to the original image but can make the model misclassify. Even with access only to the model's output, an attacker can employ black-box attacks to generate such adversarial examples. In this work, we propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time. Our theoretical analysis confirms that this method effectively enhances the model's resilience against both score-based and decision-based black-box attacks. Importantly, our defense does not necessitate adversarial training and has minimal impact on accuracy, rendering it applicable to any pre-trained model. Our analysis also reveals the significance of selectively adding noise to different parts of the model based on the gradient of the adversarial objective function, which can be varied during the attack. We demonstrate the robustness of our defense against multiple black-box attacks through extensive empirical experiments involving diverse models with various architectures.
    摘要 近期研究表明,深度神经网络容易受到对抗样本的攻击:这类样本与原始图像十分接近,却能使模型产生误分类。即使只能访问模型的输出,攻击者也可以通过黑盒攻击生成此类对抗样本。在这项工作中,我们提出了一种简单轻量的黑盒攻击防御方法:在推理时向模型中间层的隐藏特征加入随机噪声。理论分析证实,该方法能有效增强模型对基于分数和基于决策的黑盒攻击的抵抗力。重要的是,该防御不需要对抗训练,对准确率的影响极小,因此适用于任何预训练模型。我们的分析还揭示了依据对抗目标函数的梯度(其在攻击过程中可能变化)有选择地向模型不同部分加入噪声的重要性。我们通过在多种架构模型上进行的大量实验,证明了该防御对多种黑盒攻击的鲁棒性。
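
A minimal sketch of the described defense follows: Gaussian noise is injected into an intermediate feature map at inference time via a forward hook. The choice of layer (layer3 of a torchvision ResNet-18) and the noise scale are illustrative assumptions; the paper guides where and how strongly to perturb using the gradient of the adversarial objective.

```python
# Hedged sketch: randomize intermediate features at inference time.
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()

def noisy_feature_hook(scale=0.05):
    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the module's output.
        return output + scale * torch.randn_like(output)
    return hook

# Perturb the features coming out of one intermediate block.
handle = model.layer3.register_forward_hook(noisy_feature_hook(scale=0.05))

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)        # each query now sees slightly different features
print(logits.argmax(dim=1))

handle.remove()              # restore the undefended model if needed
```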

Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00566
  • repo_url: https://github.com/colfeng/calm
  • paper_authors: Duanyu Feng, Yongfu Dai, Jimin Huang, Yifang Zhang, Qianqian Xie, Weiguang Han, Alejandro Lopez-Lira, Hao Wang
  • for: 这篇论文旨在检验大语言模型(LLM)是否可以用于信用评估。
  • methods: 作者提出三个假设并开展大规模案例研究,以考察LLM在信用评估中的可行性。他们构建了一个专门面向信用评估的基准,微调了一个信用与风险评估大语言模型(CALM),并对LLM可能存在的偏见进行了严格检查。
  • results: 研究发现LLM能够克服传统模型的局限,并在不同的金融评估任务中表现出优异的适应能力;同时,研究也指出LLM可能存在偏见,强调了金融领域公正决策的重要性。
    Abstract Credit and risk assessments are cornerstones of the financial landscape, impacting both individual futures and broader societal constructs. Existing credit scoring models often exhibit limitations stemming from knowledge myopia and task isolation. In response, we formulate three hypotheses and undertake an extensive case study to investigate LLMs' viability in credit assessment. Our empirical investigations unveil LLMs' ability to overcome the limitations inherent in conventional models. We introduce a novel benchmark curated for credit assessment purposes, fine-tune a specialized Credit and Risk Assessment Large Language Model (CALM), and rigorously examine the biases that LLMs may harbor. Our findings underscore LLMs' potential in revolutionizing credit assessment, showcasing their adaptability across diverse financial evaluations, and emphasizing the critical importance of impartial decision-making in the financial sector. Our datasets, models, and benchmarks are open-sourced for other researchers.
    摘要 信用与风险评估是金融领域的基石,既影响个人的未来,也影响更广泛的社会结构。现有的信用评分模型往往受到知识短视和任务孤立的限制。为此,我们提出了三个假设,并开展了广泛的案例研究,以考察LLM在信用评估中的可行性。我们的实证研究表明,LLM能够克服传统模型固有的局限。我们构建了一个专门面向信用评估的新基准,微调了一个信用与风险评估大语言模型(CALM),并严格检查了LLM可能存在的偏见。我们的发现凸显了LLM在变革信用评估方面的潜力,展示了其在多种金融评估任务中的适应性,并强调了金融领域公正决策的关键重要性。我们的数据集、模型和基准均已开源,供其他研究人员使用。

DYNAP-SE2: a scalable multi-core dynamic neuromorphic asynchronous spiking neural network processor

  • paper_url: http://arxiv.org/abs/2310.00564
  • repo_url: None
  • paper_authors: Ole Richter, Chenxi Wu, Adrian M. Whatley, German Köstinger, Carsten Nielsen, Ning Qiao, Giacomo Indiveri
  • for: 这篇论文提出了一个受生物神经系统启发的脉冲神经网络(SNN)平台,用于对感知信号进行实时、事件驱动的处理。
  • methods: 该平台以混合信号模拟-数字电路实现脉冲神经网络,并能直接模拟多种生物神经处理现象,如短期可塑性、NMDA门控、AMPA扩散、稳态调节、脉冲频率适应、基于电导的树突腔室以及脉冲传输延迟。
  • results: 该平台可实时处理感知信号,能够模拟多种符合生物学原理的神经网络,并支持对群体与单个神经元信号的实时监测。
    Abstract With the remarkable progress that technology has made, the need for processing data near the sensors at the edge has increased dramatically. The electronic systems used in these applications must process data continuously, in real-time, and extract relevant information using the smallest possible energy budgets. A promising approach for implementing always-on processing of sensory signals that supports on-demand, sparse, and edge-computing is to take inspiration from biological nervous system. Following this approach, we present a brain-inspired platform for prototyping real-time event-based Spiking Neural Networks (SNNs). The system proposed supports the direct emulation of dynamic and realistic neural processing phenomena such as short-term plasticity, NMDA gating, AMPA diffusion, homeostasis, spike frequency adaptation, conductance-based dendritic compartments and spike transmission delays. The analog circuits that implement such primitives are paired with a low latency asynchronous digital circuits for routing and mapping events. This asynchronous infrastructure enables the definition of different network architectures, and provides direct event-based interfaces to convert and encode data from event-based and continuous-signal sensors. Here we describe the overall system architecture, we characterize the mixed signal analog-digital circuits that emulate neural dynamics, demonstrate their features with experimental measurements, and present a low- and high-level software ecosystem that can be used for configuring the system. The flexibility to emulate different biologically plausible neural networks, and the chip's ability to monitor both population and single neuron signals in real-time, allow to develop and validate complex models of neural processing for both basic research and edge-computing applications.
    摘要 随着技术的显著进步,在传感器附近的边缘处理数据的需求急剧增加。这些应用中的电子系统必须连续、实时地处理数据,并以尽可能小的能耗预算提取相关信息。一种有前景的方法是从生物神经系统中汲取灵感,以实现支持按需、稀疏和边缘计算的感知信号常开处理。沿着这一思路,我们提出了一个受大脑启发的平台,用于原型化实时、事件驱动的脉冲神经网络(SNN)。该系统支持直接模拟动态且符合生物学原理的神经处理现象,如短期可塑性、NMDA门控、AMPA扩散、稳态调节、脉冲频率适应、基于电导的树突腔室以及脉冲传输延迟。实现这些原语的模拟电路与低延迟的异步数字电路相结合,用于事件的路由与映射。这一异步基础设施允许定义不同的网络架构,并提供直接的事件接口,用于转换和编码来自事件型和连续信号传感器的数据。本文描述了整体系统架构,表征了模拟神经动力学的混合信号模拟-数字电路,通过实验测量展示了它们的特性,并介绍了可用于配置系统的底层与高层软件生态。模拟不同生物学上合理的神经网络的灵活性,以及芯片实时监测群体和单个神经元信号的能力,使得该平台既可用于基础研究,也可用于边缘计算应用中复杂神经处理模型的开发与验证。
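
As a purely software illustration of one of the dynamics listed above, the sketch below simulates a leaky integrate-and-fire neuron with spike-frequency adaptation: under constant input drive the inter-spike intervals lengthen over time. All parameter values are arbitrary assumptions; on DYNAP-SE2 these behaviours are realised by analog circuit biases, not by a simulation like this.

```python
# Hedged sketch: LIF neuron with spike-frequency adaptation (toy simulation).
import numpy as np

def lif_with_adaptation(i_in, dt=1e-4, tau_m=20e-3, tau_a=200e-3, v_th=1.0, a_inc=0.3):
    v, a = 0.0, 0.0
    spikes = []
    for t, i_t in enumerate(i_in):
        dv = (-v + i_t - a) / tau_m     # leaky membrane driven by input minus adaptation
        da = -a / tau_a                 # adaptation current decays between spikes
        v += dt * dv
        a += dt * da
        if v >= v_th:                   # threshold crossing -> emit a spike
            spikes.append(t * dt)
            v = 0.0                     # reset membrane potential
            a += a_inc                  # adaptation grows with every spike
    return spikes

i_in = np.full(20000, 2.0)              # 2 s of constant input drive
spike_times = lif_with_adaptation(i_in)
isis = np.diff(spike_times)
print(len(spike_times), isis[:3], isis[-3:])  # inter-spike intervals lengthen over time
```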

Siamese Representation Learning for Unsupervised Relation Extraction

  • paper_url: http://arxiv.org/abs/2310.00552
  • repo_url: https://github.com/gxxxzhang/siamese-ure
  • paper_authors: Guangxin Zhang, Shu Chen
  • for: 在不依赖关系分布先验信息的情况下,从开放领域纯文本中发现命名实体对之间的潜在关系。
  • methods: 提出一种仅利用正样本对进行表示学习的孪生(Siamese)框架,以有效优化实例的关系表示,并保留关系特征空间中的层次信息,避免对比学习中虚假负样本带来的损害。
  • results: 我们提出的无监督关系抽取孪生表示学习模型在两个基准数据集上显著超越了此前的最优结果,详细分析也验证了其有效性与鲁棒性。
    Abstract Unsupervised relation extraction (URE) aims at discovering underlying relations between named entity pairs from open-domain plain text without prior information on relational distribution. Existing URE models utilizing contrastive learning, which attract positive samples and repulse negative samples to promote better separation, have got decent effect. However, fine-grained relational semantic in relationship makes spurious negative samples, damaging the inherent hierarchical structure and hindering performances. To tackle this problem, we propose Siamese Representation Learning for Unsupervised Relation Extraction -- a novel framework to simply leverage positive pairs to representation learning, possessing the capability to effectively optimize relation representation of instances and retain hierarchical information in relational feature space. Experimental results show that our model significantly advances the state-of-the-art results on two benchmark datasets and detailed analyses demonstrate the effectiveness and robustness of our proposed model on unsupervised relation extraction.
    摘要 无监督关系抽取(URE)旨在不依赖关系分布先验信息的情况下,从开放领域纯文本中发现命名实体对之间的潜在关系。现有的URE模型采用对比学习,通过吸引正样本、排斥负样本来促进更好的类别分离,取得了不错的效果。然而,关系中细粒度的关系语义会产生虚假负样本,破坏固有的层次结构并损害性能。为解决这一问题,我们提出了面向无监督关系抽取的孪生表示学习框架,它只利用正样本对进行表示学习,能够有效优化实例的关系表示,并保留关系特征空间中的层次信息。实验结果表明,我们的模型在两个基准数据集上显著超越了此前的最优结果,详细分析也证明了所提模型在无监督关系抽取中的有效性与鲁棒性。
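
A minimal sketch of positive-pair-only representation learning in the SimSiam style (projector, predictor, stop-gradient) is given below, to illustrate how collapse can be avoided without any negative samples. The paper's entity-pair encoder and augmentation scheme are not shown; the random vectors stand in for encoded relation instances, and all sizes are assumptions.

```python
# Hedged sketch: learning from positive pairs only with a stop-gradient target.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseHead(nn.Module):
    def __init__(self, dim=256, hidden=64):
        super().__init__()
        self.projector = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.predictor = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden))

    def forward(self, h1, h2):
        z1, z2 = self.projector(h1), self.projector(h2)
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Symmetric negative cosine similarity; stop-gradient on the targets
        # prevents representation collapse without needing negatives.
        loss = -(F.cosine_similarity(p1, z2.detach(), dim=1).mean()
                 + F.cosine_similarity(p2, z1.detach(), dim=1).mean()) / 2
        return loss

head = SiameseHead()
h1 = torch.randn(32, 256)              # two augmented views of the same
h2 = h1 + 0.1 * torch.randn(32, 256)   # relation instances (toy stand-ins)
print(float(head(h1, h2)))
```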

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

  • paper_url: http://arxiv.org/abs/2310.00535
  • repo_url: None
  • paper_authors: Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Du
  • for: 这篇论文旨在理解多层Transformer架构在训练过程中的行为。
  • methods: 论文提出了一种新的数学框架,称为联合MLP/注意力(JoMA)动力学:通过将Transformer中的自注意力层积分消去,得到仅关于MLP层的修正动力学,从而更好地刻画训练过程。
  • results: 该框架定性地解释了当输入Token由潜在的层次生成模型产生时,多层Transformer如何将Token组合成层次结构;并预测在非线性激活下,注意力会先变得稀疏(学习显著Token)、再变得密集(学习次要Token)。在真实数据集(Wikitext2/Wikitext103)上训练的模型以及多种预训练模型(OPT、Pythia)上的实验验证了这些理论结论。
    Abstract We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to understand the training procedure of multilayer Transformer architectures. This is achieved by integrating out the self-attention layer in Transformers, producing a modified dynamics of MLP layers only. JoMA removes unrealistic assumptions in previous analysis (e.g., lack of residual connection) and predicts that the attention first becomes sparse (to learn salient tokens), then dense (to learn less salient tokens) in the presence of nonlinear activations, while in the linear case, it is consistent with existing works that show attention becomes sparse over time. We leverage JoMA to qualitatively explains how tokens are combined to form hierarchies in multilayer Transformers, when the input tokens are generated by a latent hierarchical generative model. Experiments on models trained from real-world dataset (Wikitext2/Wikitext103) and various pre-trained models (OPT, Pythia) verify our theoretical findings.
    摘要 我们提出了联合MLP/注意力(JoMA)动力学,这是一个用于理解多层Transformer架构训练过程的新数学框架。其做法是将Transformer中的自注意力层积分消去,得到仅关于MLP层的修正动力学。JoMA去除了以往分析中不切实际的假设(例如忽略残差连接),并预测在非线性激活下,注意力会先变得稀疏(以学习显著Token)、再变得密集(以学习次要Token);而在线性情形下,其结论与已有工作一致,即注意力随时间变得稀疏。我们利用JoMA定性地解释了当输入Token由潜在的层次生成模型产生时,多层Transformer如何将Token组合成层次结构。在真实数据集(Wikitext2/Wikitext103)上训练的模型以及多种预训练模型(OPT、Pythia)上的实验验证了我们的理论发现。
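
The sparse-then-dense prediction can be probed empirically by tracking the entropy of attention rows during training (low entropy means a head concentrates on a few tokens). The snippet below is only such a diagnostic on toy tensors with an assumed shape [batch, heads, query, key]; it is not the JoMA dynamics themselves.

```python
# Hedged sketch: entropy of attention rows as a sparsity diagnostic.
import torch

def attention_entropy(attn_weights, eps=1e-9):
    # attn_weights: [batch, heads, query, key]; rows sum to 1 after softmax.
    p = attn_weights.clamp_min(eps)
    ent = -(p * p.log()).sum(dim=-1)   # entropy per query position
    return ent.mean().item()

# Toy comparison: a peaked (sparse) attention row versus a uniform (dense) one.
sparse = torch.softmax(torch.tensor([[[[8.0, 0.0, 0.0, 0.0]]]]), dim=-1)
dense = torch.softmax(torch.zeros(1, 1, 1, 4), dim=-1)
print(attention_entropy(sparse), attention_entropy(dense))  # small vs log(4) ~ 1.386
```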

SELF: Language-Driven Self-Evolution for Large Language Model

  • paper_url: http://arxiv.org/abs/2310.00533
  • repo_url: None
  • paper_authors: Jianqiao Lu, Wanjun Zhong, Wenyong Huang, Yufei Wang, Fei Mi, Baojun Wang, Weichao Wang, Lifeng Shang, Qun Liu
  • for: The paper aims to introduce an innovative approach for autonomous model development in large language models (LLMs), enabling them to undergo continual self-evolution and improve their intrinsic abilities without human intervention.
  • methods: The proposed approach, called “SELF” (Self-Evolution with Language Feedback), employs language-based feedback as a versatile and comprehensive evaluative tool to guide the model’s self-evolutionary training. SELF acquires foundational meta-skills through meta-skill learning, and uses self-curated data for perpetual training and iterative fine-tuning to enhance its capabilities.
  • results: The experimental results on representative benchmarks demonstrate that SELF can progressively advance its inherent abilities without human intervention, producing responses of superior quality. The SELF framework signifies a viable pathway for autonomous LLM development, transforming the LLM from a passive recipient of information into an active participant in its own evolution.
    Abstract Large Language Models (LLMs) have showcased remarkable versatility across diverse domains. However, the pathway toward autonomous model development, a cornerstone for achieving human-level learning and advancing autonomous AI, remains largely uncharted. We introduce an innovative approach, termed "SELF" (Self-Evolution with Language Feedback). This methodology empowers LLMs to undergo continual self-evolution. Furthermore, SELF employs language-based feedback as a versatile and comprehensive evaluative tool, pinpointing areas for response refinement and bolstering the stability of self-evolutionary training. Initiating with meta-skill learning, SELF acquires foundational meta-skills with a focus on self-feedback and self-refinement. These meta-skills are critical, guiding the model's subsequent self-evolution through a cycle of perpetual training with self-curated data, thereby enhancing its intrinsic abilities. Given unlabeled instructions, SELF equips the model with the capability to autonomously generate and interactively refine responses. This synthesized training data is subsequently filtered and utilized for iterative fine-tuning, enhancing the model's capabilities. Experimental results on representative benchmarks substantiate that SELF can progressively advance its inherent abilities without the requirement of human intervention, thereby indicating a viable pathway for autonomous model evolution. Additionally, SELF can employ online self-refinement strategy to produce responses of superior quality. In essence, the SELF framework signifies a progressive step towards autonomous LLM development, transforming the LLM from a mere passive recipient of information into an active participant in its own evolution.
    摘要 大型语言模型(LLM)在多种领域表现出了惊人的多面性。然而,通向自主模型发展的路径——实现人类水平学习和推进自主AI的基石——在很大程度上仍未被探索。我们提出了一种创新方法,称为"SELF"(基于语言反馈的自我进化)。该方法使LLM能够持续地自我进化。此外,SELF将基于语言的反馈作为一种通用而全面的评估工具,用于指出需要改进之处并增强自我进化训练的稳定性。SELF从元技能学习出发,习得以自我反馈和自我改进为核心的基础元技能。这些元技能至关重要,引导模型随后通过使用自我筛选数据的持续训练循环进行自我进化,从而增强其内在能力。给定无标注指令,SELF使模型能够自主生成并交互式地改进回复。这些合成的训练数据经过筛选后用于迭代微调,进一步提升模型能力。在代表性基准上的实验结果证实,SELF能够在无需人工干预的情况下逐步提升其内在能力,这表明了一条可行的自主模型进化路径。此外,SELF还可以采用在线自我改进策略来生成更高质量的回复。总之,SELF框架标志着迈向自主LLM发展的进步,使LLM从被动的信息接收者转变为自身进化的积极参与者。
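
A minimal sketch of the generate-critique-refine loop with self-curated data is shown below. The callables `generate`, `critique`, and `is_good_enough` are hypothetical placeholders for prompting the model itself; the meta-skill training and the fine-tuning objective from the paper are not reproduced.

```python
# Hedged sketch of a self-refinement loop that curates its own training pairs.
from typing import Callable, List, Tuple

def self_evolve(instructions: List[str],
                generate: Callable[[str], str],
                critique: Callable[[str, str], str],
                is_good_enough: Callable[[str, str], bool],
                max_rounds: int = 3) -> List[Tuple[str, str]]:
    curated = []
    for instr in instructions:
        response = generate(instr)
        for _ in range(max_rounds):
            if is_good_enough(instr, response):
                break
            feedback = critique(instr, response)   # language-based feedback
            response = generate(f"{instr}\n\nFeedback: {feedback}\nRevise your answer.")
        if is_good_enough(instr, response):
            curated.append((instr, response))      # keep only filtered pairs
    return curated                                 # used for the next fine-tuning round

# Toy stand-ins so the sketch runs without a real model.
demo = self_evolve(
    ["Summarize: the sky is blue."],
    generate=lambda prompt: "The sky is blue.",
    critique=lambda instr, resp: "Be more concise.",
    is_good_enough=lambda instr, resp: len(resp) < 40,
)
print(demo)
```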

Are Graph Neural Networks Optimal Approximation Algorithms?

  • paper_url: http://arxiv.org/abs/2310.00526
  • repo_url: None
  • paper_authors: Morris Yau, Eric Lu, Nikolaos Karalias, Jessica Xu, Stefanie Jegelka
  • for: 这篇论文旨在设计图神经网络架构,使其能够借助半正定规划(SDP)的强大算法工具,为一大类组合优化问题获得最优的近似算法。
  • methods: 论文证明了在唯一博弈猜想(Unique Games Conjecture)成立的前提下,多项式规模的消息传递算法能够表示最大约束满足问题(Max-CSP)的最强多项式时间算法,并据此构建了高效的图神经网络架构 OptGNN。
  • results: OptGNN 在 Max Cut、最大独立集等标志性组合优化问题上获得了高质量的近似解,在多种真实与合成数据集上均优于神经网络基线和经典算法。此外,论文还利用 OptGNN 捕获凸松弛的能力,设计了一种从学习到的嵌入中生成最优性对偶证书(最优解的界)的算法。
    Abstract In this work we design graph neural network architectures that can be used to obtain optimal approximation algorithms for a large class of combinatorial optimization problems using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problems assuming the Unique Games Conjecture. We leverage this result to construct efficient graph neural network architectures, OptGNN, that obtain high-quality approximate solutions on landmark combinatorial optimization problems such as Max Cut and maximum independent set. Our approach achieves strong empirical results across a wide range of real-world and synthetic datasets against both neural baselines and classical algorithms. Finally, we take advantage of OptGNN's ability to capture convex relaxations to design an algorithm for producing dual certificates of optimality (bounds on the optimal solution) from the learned embeddings of OptGNN.
    摘要 在这个工作中,我们设计了图神经网络架构,可以用来获取大类 combinatorial optimization 问题的优化算法。我们证明了,使用半definite 程序(SDP)的强大算法工具,可以通过极限下的讯息传递算法来获取最优解。我们利用这个结果,构建了高效的图神经网络架构 OptGNN,可以在 landmark combinatorial optimization 问题中获得高质量的近似解。我们的方法在各种实际和 sintetic 数据集上实现了强有力的实际结果,比较 neural 基elines 和经典算法。最后,我们利用 OptGNN 捕捉到的 convex relaxation,设计了一种生成优化解的 dual certificate 算法。