cs.CV - 2023-11-12

Augmented Bridge Matching

  • paper_url: http://arxiv.org/abs/2311.06978
  • repo_url: None
  • paper_authors: Valentin De Bortoli, Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou, Weilie Nie
  • for: This paper studies flow and bridge matching, a new class of processes that encompasses diffusion models. These processes can learn stochastic (and deterministic) mappings between two given distributions, generalizing beyond generative modeling to arbitrary transfer tasks.
  • methods: The paper analyzes flow and bridge matching processes, which interpolate between two given distributions, and shows that they do not necessarily preserve coupling information unless additional, stronger optimality conditions are met.
  • results: Augmenting the velocity field (or drift) with the information of the initial sample point recovers the coupling; the process loses its Markov property but preserves the coupling between the two distributions. Experiments on learning mixtures of image translation tasks demonstrate the effectiveness of the augmentation.
    Abstract Flow and bridge matching are a novel class of processes which encompass diffusion models. One of the main aspects of their increased flexibility is that these models can interpolate between arbitrary data distributions i.e. they generalize beyond generative modeling and can be applied to learning stochastic (and deterministic) processes of arbitrary transfer tasks between two given distributions. In this paper, we highlight that while flow and bridge matching processes preserve the information of the marginal distributions, they do \emph{not} necessarily preserve the coupling information unless additional, stronger optimality conditions are met. This can be problematic if one aims at preserving the original empirical pairing. We show that a simple modification of the matching process recovers this coupling by augmenting the velocity field (or drift) with the information of the initial sample point. Doing so, we lose the Markovian property of the process but preserve the coupling information between distributions. We illustrate the efficiency of our augmentation in learning mixture of image translation tasks.
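
In code, the augmentation amounts to giving the drift network the initial sample x0 as an extra input. The following is a minimal PyTorch sketch of one training step under assumed simplifications (a Brownian-bridge interpolant and a toy MLP drift); it illustrates the idea rather than reproducing the authors' implementation.

```python
import torch
import torch.nn as nn

class AugmentedDrift(nn.Module):
    """Drift network additionally conditioned on the initial sample x0."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        # input = current state x_t, time t, and the initial point x0
        self.net = nn.Sequential(
            nn.Linear(2 * dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_t, t, x0):
        return self.net(torch.cat([x_t, t, x0], dim=-1))

def bridge_matching_loss(model, x0, x1, sigma: float = 1.0):
    """One training step of augmented bridge matching.

    x0, x1: paired samples from the two distributions (the coupling we
    want to preserve). A Brownian bridge between x0 and x1 serves as
    the interpolant (an assumption for this sketch)."""
    b = x0.shape[0]
    t = torch.rand(b, 1)
    # Brownian-bridge sample: mean interpolation plus bridge noise
    mean = (1 - t) * x0 + t * x1
    std = sigma * torch.sqrt(t * (1 - t))
    x_t = mean + std * torch.randn_like(x0)
    # conditional drift of the bridge toward x1
    target = (x1 - x_t) / (1 - t + 1e-5)
    pred = model(x_t, t, x0)  # the x0-augmentation keeps the coupling
    return ((pred - target) ** 2).mean()

model = AugmentedDrift(dim=2)
x0, x1 = torch.randn(64, 2), torch.randn(64, 2) + 3.0
loss = bridge_matching_loss(model, x0, x1)
loss.backward()
```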

CD-COCO: A Versatile Complex Distorted COCO Database for Scene-Context-Aware Computer Vision

  • paper_url: http://arxiv.org/abs/2311.06976
  • repo_url: https://github.com/aymanbegh/cd-coco
  • paper_authors: Ayman Beghdadi, Azeddine Beghdadi, Malik Mallem, Lotfi Beji, Faouzi Alaya Cheikh
  • for: To improve the robustness of computer vision tasks under varied image acquisition conditions, either by artificially augmenting the training database or by designing deep learning models that are robust to signal distortions.
  • methods: Apply local and global photo-realistic distortions guided by the scene context of the images and the depth information of objects in the scene, which guarantees a high level of photo-realism.
  • results: A versatile image database that improves the robustness of computer vision tasks such as object detection, scene segmentation, and distortion-type classification.
    Abstract The recent development of deep learning methods applied to vision has enabled their increasing integration into real-world applications to perform complex Computer Vision (CV) tasks. However, image acquisition conditions have a major impact on the performance of high-level image processing. A possible solution to overcome these limitations is to artificially augment the training databases or to design deep learning models that are robust to signal distortions. We opt here for the first solution by enriching the database with complex and realistic distortions which were ignored until now in the existing databases. To this end, we built a new versatile database derived from the well-known MS-COCO database to which we applied local and global photo-realistic distortions. These new local distortions are generated by considering the scene context of the images, which guarantees a high level of photo-realism. Distortions are generated by exploiting the depth information of the objects in the scene as well as their semantics. This guarantees a high level of photo-realism and allows exploring real scenarios ignored in conventional databases dedicated to various CV applications. Our versatile database offers an efficient solution to improve the robustness of various CV tasks such as Object Detection (OD), scene segmentation, and distortion-type classification methods. The image database, scene classification index, and distortion generation codes are publicly available.\footnote{\url{https://github.com/Aymanbegh/CD-COCO}}

Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels

  • paper_url: http://arxiv.org/abs/2311.06964
  • repo_url: None
  • paper_authors: Vijay Veerabadran, Srinivas Ravishankar, Yuan Tang, Ritik Raina, Virginia R. de Sa
  • for: investigate the adaptive computation of recurrent neural networks in visual reasoning tasks, and whether it can enable zero-shot generalization to novel difficulty levels.
  • methods: combine convolutional recurrent neural networks (ConvRNNs) with a learnable halting mechanism based on Graves (2016), and explore various implementations of adaptive ConvRNNs (AdRNNs) such as tying weights across layers and biologically inspired recurrent networks with lateral connections and gating.
  • results: AdRNNs learn to dynamically halt processing early or late to solve easier or harder problems, and zero-shot generalize to more difficult problem settings not shown during training by dynamically increasing the number of recurrent iterations at test time.
    Abstract Humans solving algorithmic (or reasoning) problems typically exhibit solution times that grow as a function of problem difficulty. Adaptive recurrent neural networks have been shown to exhibit this property for various language-processing tasks. However, little work has been performed to assess whether such adaptive computation can also enable vision models to extrapolate solutions beyond their training distribution's difficulty level, with prior work focusing on very simple tasks. In this study, we investigate a critical functional role of such adaptive processing using recurrent neural networks: to dynamically scale computational resources conditional on input requirements that allow for zero-shot generalization to novel difficulty levels not seen during training using two challenging visual reasoning tasks: PathFinder and Mazes. We combine convolutional recurrent neural networks (ConvRNNs) with a learnable halting mechanism based on Graves (2016). We explore various implementations of such adaptive ConvRNNs (AdRNNs) ranging from tying weights across layers to more sophisticated biologically inspired recurrent networks that possess lateral connections and gating. We show that 1) AdRNNs learn to dynamically halt processing early (or late) to solve easier (or harder) problems, 2) these RNNs zero-shot generalize to more difficult problem settings not shown during training by dynamically increasing the number of recurrent iterations at test time. Our study provides modeling evidence supporting the hypothesis that recurrent processing enables the functional advantage of adaptively allocating compute resources conditional on input requirements and hence allowing generalization to harder difficulty levels of a visual reasoning problem without training.
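
To make the halting mechanism concrete, here is a simplified PyTorch sketch of an adaptive ConvRNN in the spirit of Graves (2016): each recurrent step emits a halting probability, outputs are mixed by the probability mass spent at each step, and raising the step budget at test time is what enables zero-shot scaling to harder inputs. The architecture and halting rule here are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class AdaptiveConvRNN(nn.Module):
    """Weight-tied ConvRNN with a learned halting unit (AdRNN sketch)."""
    def __init__(self, channels=32, max_steps=20, threshold=0.99):
        super().__init__()
        self.cell = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.halt = nn.Linear(channels, 1)
        self.readout = nn.Linear(channels, 2)  # e.g. connected / not
        self.max_steps, self.threshold = max_steps, threshold

    def forward(self, x, max_steps=None):
        steps = max_steps or self.max_steps  # raise at test time for
        h = torch.zeros_like(x)              # harder, unseen settings
        halted, outputs = torch.zeros(x.shape[0], 1), 0.0
        for _ in range(steps):
            h = torch.tanh(self.cell(torch.cat([x, h], dim=1)))
            feat = h.mean(dim=(2, 3))            # global pooling
            p = torch.sigmoid(self.halt(feat))   # halting probability
            weight = p * (1 - halted)            # prob. mass left over
            outputs = outputs + weight * self.readout(feat)
            halted = halted + weight
            if (halted > self.threshold).all():  # early exit: easy input
                break
        # remainder mass goes to the last readout, so weights sum to 1
        return outputs + (1 - halted) * self.readout(feat)

net = AdaptiveConvRNN()
logits = net(torch.randn(4, 32, 16, 16))            # training budget
harder = net(torch.randn(4, 32, 16, 16), max_steps=40)  # test-time scale-up
```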

SegReg: Segmenting OARs by Registering MR Images and CT Annotations

  • paper_url: http://arxiv.org/abs/2311.06956
  • repo_url: https://github.com/steve-zeyu-zhang/SegReg
  • paper_authors: Zeyu Zhang, Xuyin Qi, Bowen Zhang, Biao Wu, Hien Le, Bora Jeong, Minh-Son To, Richard Hartley
  • for: To improve organ-at-risk (OAR) segmentation in radiotherapy treatment planning for head and neck tumors, increasing efficiency and accuracy over manual CT-based segmentation.
  • methods: SegReg, which uses Elastic Symmetric Normalization to register MRI to CT scans, combining CT geometry with MRI soft-tissue contrast for OAR segmentation.
  • results: SegReg outperforms the CT-only baseline by 16.78% in mDSC and 18.77% in mIoU, showing that it effectively combines the geometric accuracy of CT with the superior soft-tissue visualization of MRI, making accurate automated OAR segmentation practical.
    Abstract Organ at risk (OAR) segmentation is a critical process in radiotherapy treatment planning such as head and neck tumors. Nevertheless, in clinical practice, radiation oncologists predominantly perform OAR segmentations manually on CT scans. This manual process is highly time-consuming and expensive, limiting the number of patients who can receive timely radiotherapy. Additionally, CT scans offer lower soft-tissue contrast compared to MRI. Despite MRI providing superior soft-tissue visualization, its time-consuming nature makes it infeasible for real-time treatment planning. To address these challenges, we propose a method called SegReg, which utilizes Elastic Symmetric Normalization for registering MRI to perform OAR segmentation. SegReg outperforms the CT-only baseline by 16.78% in mDSC and 18.77% in mIoU, showing that it effectively combines the geometric accuracy of CT with the superior soft-tissue contrast of MRI, making accurate automated OAR segmentation for clinical practice become possible.
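
The registration step can be sketched with ANTsPy, whose "SyN" transform is the symmetric normalization that the paper's "Elastic Symmetric Normalization" points to. File names, and the final fusion into a two-channel network input, are assumptions for illustration; SegReg's actual pipeline and hyperparameters are not reproduced here.

```python
# Minimal registration sketch using ANTsPy (pip install antspyx).
import ants
import numpy as np

ct = ants.image_read("patient_ct.nii.gz")   # planning CT (fixed), placeholder path
mr = ants.image_read("patient_mr.nii.gz")   # MRI (moving), placeholder path

# Deformable registration of the MRI into the CT frame of reference.
reg = ants.registration(fixed=ct, moving=mr, type_of_transform="SyN")
mr_in_ct_space = reg["warpedmovout"]

# The registered MRI can now be stacked with the CT as a two-channel
# input to a segmentation network, combining CT geometric accuracy
# with MRI soft-tissue contrast (the core idea of SegReg).
fused = np.stack([ct.numpy(), mr_in_ct_space.numpy()], axis=0)
```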

Video-based sympathetic arousal assessment via peripheral blood flow estimation

  • paper_url: http://arxiv.org/abs/2311.06930
  • repo_url: None
  • paper_authors: Bjoern Braun, Daniel McDuff, Tadas Baltrusaitis, Christian Holz
  • for: Electrodermal activity (EDA) is the standard marker of sympathetic arousal, but traditional EDA measurement requires electrodes in steady contact with the skin. This paper proposes inferring sympathetic arousal by optically measuring the peripheral blood flow on the face or hand.
  • methods: An RGB camera measures blood flow on the face or palm; a self-recorded dataset of 21 participants with synchronized videos and gold-standard EDA and photoplethysmography (PPG) signals serves as the reference.
  • results: Sympathetic arousal can be measured from video or PPG signals alone, with median correlations of 0.57 to 0.63 against ground-truth EDA; arousal is best inferred from the forehead, finger, or palm.
    Abstract Electrodermal activity (EDA) is considered a standard marker of sympathetic activity. However, traditional EDA measurement requires electrodes in steady contact with the skin. Can sympathetic arousal be measured using only an optical sensor, such as an RGB camera? This paper presents a novel approach to infer sympathetic arousal by measuring the peripheral blood flow on the face or hand optically. We contribute a self-recorded dataset of 21 participants, comprising synchronized videos of participants' faces and palms and gold-standard EDA and photoplethysmography (PPG) signals. Our results show that we can measure peripheral sympathetic responses that closely correlate with the ground truth EDA. We obtain median correlations of 0.57 to 0.63 between our inferred signals and the ground truth EDA using only videos of the participants' palms or foreheads or PPG signals from the foreheads or fingers. We also show that sympathetic arousal is best inferred from the forehead, finger, or palm.
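
A minimal sketch of the optical measurement: spatially average the green channel over a skin region (forehead or palm) to obtain a blood-flow trace, band-pass it, and correlate it with the EDA reference. The ROI, filter band, and synthetic data below are assumptions, not the paper's processing chain.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.stats import pearsonr

def blood_flow_signal(frames, roi):
    """frames: (T, H, W, 3) uint8 video; roi: (y0, y1, x0, x1) skin patch
    (e.g. forehead or palm). Returns a PPG-like trace from the spatial
    mean of the green channel, the channel most sensitive to
    blood-volume changes."""
    y0, y1, x0, x1 = roi
    return frames[:, y0:y1, x0:x1, 1].astype(np.float32).mean(axis=(1, 2))

def bandpass(x, fs, lo=0.05, hi=3.0, order=3):
    # keep slow sympathetic components as well as the pulse band
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

# toy usage with synthetic data; real inputs would be the recorded
# palm/forehead video and the gold-standard EDA trace
fs = 30.0
frames = np.random.randint(0, 255, (900, 64, 64, 3), dtype=np.uint8)
ppg_like = bandpass(blood_flow_signal(frames, (16, 48, 16, 48)), fs)
eda = np.random.randn(900)  # placeholder for the ground-truth EDA
r, _ = pearsonr(ppg_like, eda)
```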

Setting a Baseline for long-shot real-time Player and Ball detection in Soccer Videos

  • paper_url: http://arxiv.org/abs/2311.06892
  • repo_url: https://github.com/kmouts/soccernet_v3_h250
  • paper_authors: Konstantinos Moutselos, Ilias Maglogiannis
  • for: To provide a better soccer dataset for player and ball detection, delivered in YOLO normalized annotation format for training and evaluation.
  • methods: An edited version of SoccerNet v3, released together with the code of the methods and metrics so that it can serve as a benchmark in future comparisons.
  • results: The recent YOLO8n model proves better than FootAndBall at real-time long-shot detection of the ball and players on football fields.
    Abstract Players and ball detection are among the first required steps on a football analytics platform. Until recently, the existing open datasets on which the evaluations of most models were based were not sufficient. In this work, we point out their weaknesses, and with the advent of the SoccerNet v3, we propose and deliver to the community an edited part of its dataset, in YOLO normalized annotation format for training and evaluation. The code of the methods and metrics are provided so that they can be used as a benchmark in future comparisons. The recent YOLO8n model proves better than FootAndBall in long-shot real-time detection of the ball and players on football fields.
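
For reference, converting a COCO-style pixel box to the YOLO normalized annotation format used by the released dataset looks like this (the example coordinates are made up):

```python
def to_yolo(box, img_w, img_h, class_id):
    """Convert a COCO-style box (x_min, y_min, width, height) in pixels
    to the YOLO normalized format: class x_center y_center w h, all
    relative to the image size (values in [0, 1])."""
    x, y, w, h = box
    xc = (x + w / 2) / img_w
    yc = (y + h / 2) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# e.g. a player bounding box in a 1920x1080 long-shot frame
print(to_yolo((860, 400, 40, 110), 1920, 1080, class_id=0))
```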

Concept-wise Fine-tuning Matters in Preventing Negative Transfer

  • paper_url: http://arxiv.org/abs/2311.06868
  • repo_url: None
  • paper_authors: Yunqiao Yang, Long-Kai Huang, Ying Wei
  • for: To improve fine-tuning, in particular to mitigate the negative transfer caused by rare features and spuriously correlated features in pre-trained models.
  • methods: Concept-wise fine-tuning (Concept-Tuning), which refines feature representations at the level of patches, with each patch encoding a concept.
  • results: On eleven datasets, Concept-Tuning consistently and significantly improves on prior state-of-the-art fine-tuning methods (by up to 4.76%), across diverse pre-training strategies, network architectures, and sample sizes.
    Abstract A multitude of prevalent pre-trained models mark a major milestone in the development of artificial intelligence, while fine-tuning has been a common practice that enables pretrained models to figure prominently in a wide array of target datasets. Our empirical results reveal that off-the-shelf finetuning techniques are far from adequate to mitigate negative transfer caused by two types of underperforming features in a pre-trained model, including rare features and spuriously correlated features. Rooted in structural causal models of predictions after fine-tuning, we propose a Concept-wise fine-tuning (Concept-Tuning) approach which refines feature representations in the level of patches with each patch encoding a concept. Concept-Tuning minimizes the negative impacts of rare features and spuriously correlated features by (1) maximizing the mutual information between examples in the same category with regard to a slice of rare features (a patch) and (2) applying front-door adjustment via attention neural networks in channels and feature slices (patches). The proposed Concept-Tuning consistently and significantly (by up to 4.76%) improves prior state-of-the-art fine-tuning methods on eleven datasets, diverse pre-training strategies (supervised and self-supervised ones), various network architectures, and sample sizes in a target dataset.

Contrastive Learning of View-Invariant Representations for Facial Expressions Recognition

  • paper_url: http://arxiv.org/abs/2311.06852
  • repo_url: None
  • paper_authors: Shuvendu Roy, Ali Etemad
  • for: To improve facial expression recognition on non-frontal views, making recognition robust to the viewing angle of the input.
  • methods: ViewFX, a view-invariant facial expression recognition framework based on contrastive learning: a self-supervised contrastive loss learns view-invariant expression features, and a supervised contrastive loss pushes the features of each expression away from other expressions.
  • results: On two multi-view facial expression recognition datasets, ViewFX outperforms previous work, sets a new state of the art on both, and shows considerably less sensitivity to challenging angles and to the number of output labels used for training.
    Abstract Although there has been much progress in the area of facial expression recognition (FER), most existing methods suffer when presented with images that have been captured from viewing angles that are non-frontal and substantially different from those used in the training process. In this paper, we propose ViewFX, a novel view-invariant FER framework based on contrastive learning, capable of accurately classifying facial expressions regardless of the input viewing angles during inference. ViewFX learns view-invariant features of expression using a proposed self-supervised contrastive loss which brings together different views of the same subject with a particular expression in the embedding space. We also introduce a supervised contrastive loss to push the learnt view-invariant features of each expression away from other expressions. Since facial expressions are often distinguished with very subtle differences in the learned feature space, we incorporate the Barlow twins loss to reduce the redundancy and correlations of the representations in the learned representations. The proposed method is a substantial extension of our previously proposed CL-MEx, which only had a self-supervised loss. We test the proposed framework on two public multi-view facial expression recognition datasets, KDEF and DDCF. The experiments demonstrate that our approach outperforms previous works in the area and sets a new state-of-the-art for both datasets while showing considerably less sensitivity to challenging angles and the number of output labels used for training. We also perform detailed sensitivity and ablation experiments to evaluate the impact of different components of our model as well as its sensitivity to different parameters.
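
The Barlow twins component is the most self-contained piece of the ViewFX objective; a minimal sketch of that redundancy-reduction loss is below (the embedding sizes and the lambda weight are assumptions):

```python
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Redundancy-reduction loss used alongside ViewFX's contrastive
    objectives. z1, z2: (N, D) embeddings of two views of the same
    faces; the loss pushes their cross-correlation matrix toward the
    identity, i.e. decorrelated, non-redundant features."""
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)   # per-dim standardize
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                           # (D, D) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag

loss = barlow_twins_loss(torch.randn(32, 128), torch.randn(32, 128))
```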

Sampler Scheduler for Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.06845
  • repo_url: https://github.com/carzit/sd-webui-samplers-scheduler
  • paper_authors: Zitong Cheng
  • for: To establish the feasibility of using multiple samplers (ODE/SDE) within a single sampling process, resolving the sampler-choice dilemma in diffusion-based generative models.
  • methods: Analyze and generalize the update equations of the mainstream samplers (ODE/SDE), allowing different samplers to be assigned to different steps of the same sampling process.
  • results: The multi-sampler scheduling improves sampling efficiency and quality. On CIFAR-10 with NFE = 24, the ODE Sampler Scheduler achieves an FID of 1.91, versus 2.02 for DPM++ 2M, 1.97 for DPM2, and 11.90 for Heun; combining SDE in the early steps with ODE in the later steps reaches 1.899 and resolves the problems inherent in using either alone.
    Abstract Diffusion modeling (DM) has high-quality generative performance, and the sampling problem is an important part of the DM performance. Thanks to efficient differential equation solvers, the sampling speed can be reduced while higher sampling quality is guaranteed. However, currently, there is a contradiction in samplers for diffusion-based generative models: the mainstream sampler choices are diverse, each with its own characteristics in terms of performance. Yet only a single sampler algorithm can be specified on all sampling steps in the generative process. This often makes one torn between sampler choices; in other words, it makes it difficult to fully utilize the advantages of each sampler. In this paper, we propose the feasibility of using different samplers (ODE/SDE) on different sampling steps of the same sampling process based on analyzing and generalizing the updating formulas of each mainstream sampler, and experimentally demonstrate that such a multi-sampler scheduling improves the sampling results to some extent. In particular, we also verify that the combination of using SDE in the early sampling steps and ODE in the later sampling steps solves the inherent problems previously caused by using both singly. We show that our design changes improve the sampling efficiency and quality in previous work. For instance, when Number of Function Evaluations (NFE) = 24, the ODE Sampler Scheduler achieves a FID score of 1.91 on the CIFAR-10 dataset, compared to 2.02 for DPM++ 2M, 1.97 for DPM2, and 11.90 for Heun for the same NFE. Meanwhile, the Sampler Scheduler with the combined scheduling of SDE and ODE reaches 1.899, compared to 18.63 for Euler a, 3.14 for DPM2 a and 23.14 for DPM++ SDE.
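
The scheduling idea reduces to dispatching a different update rule per step of one sampling loop. Below is a toy sketch with stand-in Euler updates and a dummy score function; the paper's generalized update formulas for the mainstream samplers are not reproduced here.

```python
import torch

def sde_step(x, t, t_next, score):
    """One stochastic (SDE-style) Euler-Maruyama update; injects noise
    (a deliberately simplified toy update)."""
    dt = t_next - t
    x = x + score(x, t) * dt
    return x + torch.randn_like(x) * abs(dt) ** 0.5

def ode_step(x, t, t_next, score):
    """One deterministic (probability-flow ODE) Euler update."""
    return x + 0.5 * score(x, t) * (t_next - t)

def scheduled_sample(score, x, ts, schedule):
    """Run a different sampler on each step of the same process:
    e.g. SDE early (stochastic, corrects errors) and ODE late
    (deterministic), the combination the paper advocates."""
    steps = {"sde": sde_step, "ode": ode_step}
    for name, t, t_next in zip(schedule, ts[:-1], ts[1:]):
        x = steps[name](x, t, t_next, score)
    return x

# toy score function standing in for a trained diffusion model
score = lambda x, t: -x
ts = torch.linspace(1.0, 0.0, 25)           # NFE = 24
schedule = ["sde"] * 12 + ["ode"] * 12      # SDE early, ODE late
sample = scheduled_sample(score, torch.randn(8, 2), ts, schedule)
```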

Osteoporosis Prediction from Hand and Wrist X-rays using Image Segmentation and Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2311.06834
  • repo_url: None
  • paper_authors: Hyungeun Lee, Ung Hwang, Seungwon Yu, Chang-Hun Lee, Kijung Yoon
  • for: To predict osteoporosis from widely accessible and affordable hand and wrist X-ray images, increasing screening rates without added cost or time.
  • methods: A foundational segmentation model first segments the ulnar, radius, and metacarpal bones; self-supervised learning then extracts meaningful representations without explicit labels, and a supervised classifier predicts osteoporosis.
  • results: Evaluated on 192 individuals against the standard DXA test, the method achieves a notable classification score (AUC = 0.83).
    Abstract Osteoporosis is a widespread and chronic metabolic bone disease that often remains undiagnosed and untreated due to limited access to bone mineral density (BMD) tests like Dual-energy X-ray absorptiometry (DXA). In response to this challenge, current advancements are pivoting towards detecting osteoporosis by examining alternative indicators from peripheral bone areas, with the goal of increasing screening rates without added expenses or time. In this paper, we present a method to predict osteoporosis using hand and wrist X-ray images, which are both widely accessible and affordable, though their link to DXA-based data is not thoroughly explored. Initially, our method segments the ulnar, radius, and metacarpal bones using a foundational model for image segmentation. Then, we use a self-supervised learning approach to extract meaningful representations without the need for explicit labels, and move on to classify osteoporosis in a supervised manner. Our method is evaluated on a dataset with 192 individuals, cross-referencing their verified osteoporosis conditions against the standard DXA test. With a notable classification score (AUC=0.83), our model represents a pioneering effort in leveraging vision-based techniques for osteoporosis identification from the peripheral skeleton sites.

On original and latent space connectivity in deep neural networks

  • paper_url: http://arxiv.org/abs/2311.06816
  • repo_url: None
  • paper_authors: Boyang Gu, Anastasia Borovykh
  • for: To study whether inputs from the same class can be connected by a continuous path, in the original or latent representation space, and thereby how the neural network views its own input space and how its latent spaces are structured.
  • methods: Construct paths, linear or nonlinear, between same-class inputs and test whether every point on the path is mapped by the neural network model to the same class.
  • results: Paths connecting same-class inputs, with all points on the path mapped to the same class, exist in all cases studied.
    Abstract We study whether inputs from the same class can be connected by a continuous path, in original or latent representation space, such that all points on the path are mapped by the neural network model to the same class. Understanding how the neural network views its own input space and how the latent spaces are structured has value for explainability and robustness. We show that paths, linear or nonlinear, connecting same-class inputs exist in all cases studied.
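
The paper's probe is straightforward to sketch: walk a path between two same-class inputs and check the model's prediction at every point. A minimal linear-path version follows (the nonlinear paths studied in the paper would replace the interpolation):

```python
import torch

def class_consistent_path(model, x_a, x_b, label, steps=50):
    """Return True if every point on the straight line from x_a to x_b
    is predicted as `label`. Works for vector or image inputs; the
    linear interpolation is the simplest candidate path."""
    alphas = torch.linspace(0, 1, steps).view(-1, *([1] * x_a.dim()))
    points = (1 - alphas) * x_a.unsqueeze(0) + alphas * x_b.unsqueeze(0)
    with torch.no_grad():
        preds = model(points).argmax(dim=-1)  # batch over path points
    return bool((preds == label).all())

# toy usage: probe whether the path between two inputs stays in class 0
model = torch.nn.Linear(10, 2)
x_a, x_b = torch.randn(10), torch.randn(10)
same = class_consistent_path(model, x_a, x_b, label=0)
```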

MetaMix: Meta-state Precision Searcher for Mixed-precision Activation Quantization

  • paper_url: http://arxiv.org/abs/2311.06798
  • repo_url: None
  • paper_authors: Han-Byul Kim, Joo Hyung Lee, Sungjoo Yoo, Hong-Seok Kim
  • for: To improve the accuracy and efficiency of mixed-precision quantization by addressing the activation instability encountered during bit selection.
  • methods: MetaMix, consisting of a bit-selection phase and a weight-training phase: the bit-selection phase alternates mixed-precision-aware weight updates with bit-search training, reducing activation instability and enabling fast, high-quality bit selection; the weight-training phase fine-tunes the resulting weights and step sizes.
  • results: Experiments with MobileNet v2 and v3 and ResNet-18 on ImageNet show that the method outperforms both mixed- and single-precision state-of-the-art methods, pushing the boundary of mixed-precision quantization in terms of accuracy vs. operations.
    Abstract Mixed-precision quantization of efficient networks often suffer from activation instability encountered in the exploration of bit selections. To address this problem, we propose a novel method called MetaMix which consists of bit selection and weight training phases. The bit selection phase iterates two steps, (1) the mixed-precision-aware weight update, and (2) the bit-search training with the fixed mixed-precision-aware weights, both of which combined reduce activation instability in mixed-precision quantization and contribute to fast and high-quality bit selection. The weight training phase exploits the weights and step sizes trained in the bit selection phase and fine-tunes them thereby offering fast training. Our experiments with efficient and hard-to-quantize networks, i.e., MobileNet v2 and v3, and ResNet-18 on ImageNet show that our proposed method pushes the boundary of mixed-precision quantization, in terms of accuracy vs. operations, by outperforming both mixed- and single-precision SOTA methods.
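
As background for the bit-selection phase, here is a minimal sketch of the underlying primitive: uniform fake-quantization of (non-negative) activations at a per-layer bit-width with a straight-through estimator. MetaMix's bit-search and mixed-precision-aware weight updates operate on top of per-layer choices like this; the code below is illustrative, not the paper's method.

```python
import torch

def quantize_activation(x, n_bits: int):
    """Uniform fake-quantization of non-negative (post-ReLU)
    activations to n_bits, with a straight-through estimator:
    quantized values in the forward pass, identity gradient in the
    backward pass."""
    qmax = 2 ** n_bits - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), 0, qmax)
    return x + (q * scale - x).detach()

# a per-layer bit assignment of the kind explored during bit selection
bits_per_layer = {"conv1": 8, "conv2": 4, "conv3": 6}
act = torch.relu(torch.randn(4, 16))
act_q = quantize_activation(act, bits_per_layer["conv2"])
```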

Deep Perspective Transformation Based Vehicle Localization on Bird’s Eye View

  • paper_url: http://arxiv.org/abs/2311.06796
  • repo_url: https://github.com/ipm-hpc/perspective-bev-transformer
  • paper_authors: Abtin Mahyar, Hossein Motamednia, Dara Rahmati
  • for: To improve the accuracy and efficiency of self-driving navigation systems by providing comprehensive environmental data for downstream tasks.
  • methods: Transform perspective-view RGB images into bird's-eye-view maps with segmented surrounding vehicles, enabling the extraction of the distances and directions of other cars relative to the ego vehicle at low cost.
  • results: A new synthesized dataset offering extensive per-frame information about the ego vehicle and its environment, a valuable resource for similar downstream tasks.
    Abstract An accurate understanding of a self-driving vehicle's surrounding environment is crucial for its navigation system. To enhance the effectiveness of existing algorithms and facilitate further research, it is essential to provide comprehensive data to the routing system. Traditional approaches rely on installing multiple sensors to simulate the environment, leading to high costs and complexity. In this paper, we propose an alternative solution by generating a top-down representation of the scene, enabling the extraction of distances and directions of other cars relative to the ego vehicle. We introduce a new synthesized dataset that offers extensive information about the ego vehicle and its environment in each frame, providing valuable resources for similar downstream tasks. Additionally, we present an architecture that transforms perspective view RGB images into bird's-eye-view maps with segmented surrounding vehicles. This approach offers an efficient and cost-effective method for capturing crucial environmental information for self-driving cars. Code and dataset are available at https://github.com/IPM-HPC/Perspective-BEV-Transformer.
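
The geometric core of perspective-to-BEV mapping can be sketched with a homography (the paper learns a deep transformation, but the target geometry is the same). All coordinates below are made-up examples:

```python
import cv2
import numpy as np

# Four ground-plane points in the camera image and where they should
# land on the top-down map (illustrative values only).
src = np.float32([[420, 560], [860, 560], [1180, 700], [100, 700]])
dst = np.float32([[200, 100], [440, 100], [440, 620], [200, 620]])
H = cv2.getPerspectiveTransform(src, dst)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # camera image
bev = cv2.warpPerspective(frame, H, (640, 640))    # top-down map

# A detected vehicle's ground-contact point maps the same way, which
# is how distance and direction relative to the ego vehicle are read
# off the bird's-eye-view map.
pt = cv2.perspectiveTransform(np.float32([[[640, 650]]]), H)
```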

CL-Flow: Strengthening the Normalizing Flows by Contrastive Learning for Better Anomaly Detection

  • paper_url: http://arxiv.org/abs/2311.06794
  • repo_url: None
  • paper_authors: Shunfeng Wang, Yueyang Li, Haichi Luo, Chenyang Bi
  • for: To address the scarcity of anomalous samples in anomaly detection with a self-supervised method that combines contrastive learning with 2D-Flow, improving detection accuracy while reducing computational cost.
  • methods: A novel anomaly synthesis method generates anomalous samples that match realistic industrial scenarios, together with surrogate annotations; the 2D-Flow framework is then enhanced with contrastive learning, using diverse proxy tasks to fine-tune the network.
  • results: Compared with mainstream unsupervised methods, the self-supervised approach sets new state-of-the-art results of 99.6% image-level AUROC on MVTecAD and 96.8% on BTAD.
    Abstract In the anomaly detection field, the scarcity of anomalous samples has directed the current research emphasis towards unsupervised anomaly detection. While these unsupervised anomaly detection methods offer convenience, they also overlook the crucial prior information embedded within anomalous samples. Moreover, among numerous deep learning methods, supervised methods generally exhibit superior performance compared to unsupervised methods. Considering the reasons mentioned above, we propose a self-supervised anomaly detection approach that combines contrastive learning with 2D-Flow to achieve more precise detection outcomes and expedited inference processes. On one hand, we introduce a novel approach to anomaly synthesis, yielding anomalous samples in accordance with authentic industrial scenarios, alongside their surrogate annotations. On the other hand, having obtained a substantial number of anomalous samples, we enhance the 2D-Flow framework by incorporating contrastive learning, leveraging diverse proxy tasks to fine-tune the network. Our approach enables the network to learn more precise mapping relationships from self-generated labels while retaining the lightweight characteristics of the 2D-Flow. Compared to mainstream unsupervised approaches, our self-supervised method demonstrates superior detection accuracy, fewer additional model parameters, and faster inference speed. Furthermore, the entire training and inference process is end-to-end. Our approach showcases new state-of-the-art results, achieving a performance of 99.6\% in image-level AUROC on the MVTecAD dataset and 96.8\% in image-level AUROC on the BTAD dataset.

IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.06792
  • repo_url: None
  • paper_authors: Zhaoyuan Yang, Zhengyang Yu, Zhiwei Xu, Jaskirat Singh, Jing Zhang, Dylan Campbell, Peter Tu, Richard Hartley
  • for: To present a diffusion-based image morphing method that produces smooth, direct, and realistic interpolations between an image pair.
  • methods: IMPUS interpolates in the locally linear and continuous text embedding space and Gaussian latent space of a latent diffusion model, optimizing the endpoint text embeddings and encoding the images with a probability flow ODE. An adaptive bottleneck constraint, based on a novel relative perceptual path diversity score, balances diversity along the path with its directness, and a perceptually-uniform sampling technique yields visually smooth changes between the interpolated images.
  • results: The method achieves smooth, direct, and realistic image morphing while suppressing ghosting artifacts, and extends to other image generation tasks.
    Abstract We present a diffusion-based image morphing approach with perceptually-uniform sampling (IMPUS) that produces smooth, direct, and realistic interpolations given an image pair. A latent diffusion model has distinct conditional distributions and data embeddings for each of the two images, especially when they are from different classes. To bridge this gap, we interpolate in the locally linear and continuous text embedding space and Gaussian latent space. We first optimize the endpoint text embeddings and then map the images to the latent space using a probability flow ODE. Unlike existing work that takes an indirect morphing path, we show that the model adaptation yields a direct path and suppresses ghosting artifacts in the interpolated images. To achieve this, we propose an adaptive bottleneck constraint based on a novel relative perceptual path diversity score that automatically controls the bottleneck size and balances the diversity along the path with its directness. We also propose a perceptually-uniform sampling technique that enables visually smooth changes between the interpolated images. Extensive experiments validate that our IMPUS can achieve smooth, direct, and realistic image morphing and be applied to other image generation tasks.
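
One piece of the pipeline that is easy to show concretely is latent interpolation: straight lines leave the typical set of a Gaussian latent, so spherical linear interpolation (slerp) is the usual choice. The sketch below assumes flattened Gaussian latents; IMPUS's probability-flow encoding and perceptually-uniform spacing of t are not reproduced.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two Gaussian latents,
    keeping intermediate latents at a plausible norm (the standard way
    to traverse a diffusion model's latent space)."""
    z0n, z1n = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    if omega < 1e-6:                 # nearly parallel: fall back to lerp
        return (1 - t) * z0 + t * z1
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

# latents of the two endpoint images (shapes are an assumption)
z0, z1 = np.random.randn(4 * 64 * 64), np.random.randn(4 * 64 * 64)
path = [slerp(z0, z1, t) for t in np.linspace(0, 1, 9)]
```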

InfMLLM: A Unified Framework for Visual-Language Tasks

  • paper_url: http://arxiv.org/abs/2311.06791
  • repo_url: https://github.com/mightyzau/infmllm
  • paper_authors: Qiang Zhou, Zhibin Wang, Wei Chu, Yinghui Xu, Hao Li, Yuan Qi
  • for: To expand large language models (LLMs) to a broader spectrum of vision-language tasks, particularly image captioning, visual question answering (VQA), and visual grounding.
  • methods: A three-stage training scheme: lightweight alignment pretraining, moderate-weight multitask hybrid training, and finally LLM fine-tuning to improve instruction-following ability; a pool-adapter module manages the number of visual embeddings passed to the LLM while preserving their positional information.
  • results: Preserving the positional information of visual embeddings through the pool-adapter is particularly beneficial for tasks like visual grounding. InfMLLM achieves state-of-the-art performance or performance comparable to recent MLLMs. Code and models will be open-sourced at \url{https://github.com/mightyzau/InfMLLM}.
    Abstract Large language models (LLMs) have proven their remarkable versatility in handling a comprehensive range of language-centric applications. To expand LLMs' capabilities to a broader spectrum of modal inputs, multimodal large language models (MLLMs) have attracted growing interest. This work delves into enabling LLMs to tackle more vision-language-related tasks, particularly image captioning, visual question answering (VQA,) and visual grounding. To this end, we implemented a three-stage training scheme: starting with lightweight alignment pretraining, then moderate-weight multitask hybrid training, and finally, LLM fine-tuning to improve instruction following capability. Throughout the training process, the requirements on GPU memory gradually increase. To effectively manage the number of visual embeddings passed to the LLM while preserving their positional information, we introduce a straightforward visual adapter module dubbed pool-adapter. Our experiments demonstrate that preserving the positional information of visual embeddings through the pool-adapter is particularly beneficial for tasks like visual grounding. We name our proposed approach InfMLLM and have evaluated it extensively on various benchmark datasets. Our results demonstrate that InfMLLM achieves either state-of-the-art (SOTA) performance or performance comparable to recent MLLMs. The code and model will be made open-source at: \url{https://github.com/mightyzau/InfMLLM}.
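
A pool-adapter can be sketched as pooling over the 2D patch grid (rather than over an unordered token set), which shrinks the token count while keeping the positional layout. The sizes and projection below are illustrative assumptions, not InfMLLM's configuration:

```python
import torch
import torch.nn as nn

class PoolAdapter(nn.Module):
    """Reduce the number of visual tokens passed to the LLM while
    keeping positional information, by average-pooling over the 2D
    patch grid before projecting into the LLM embedding space."""
    def __init__(self, vis_dim=1024, llm_dim=4096, out_grid=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(out_grid)
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, tokens):            # tokens: (B, N, C), N = g*g
        b, n, c = tokens.shape
        g = int(n ** 0.5)
        grid = tokens.transpose(1, 2).reshape(b, c, g, g)
        pooled = self.pool(grid)                      # (B, C, 8, 8)
        out = pooled.flatten(2).transpose(1, 2)       # (B, 64, C)
        return self.proj(out)             # 64 position-aware tokens

adapter = PoolAdapter()
vis_tokens = torch.randn(2, 24 * 24, 1024)   # e.g. ViT patch tokens
llm_tokens = adapter(vis_tokens)             # (2, 64, 4096)
```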

Explainability of Vision Transformers: A Comprehensive Review and New Perspectives

  • paper_url: http://arxiv.org/abs/2311.06786
  • repo_url: None
  • paper_authors: Rojina Kashefi, Leili Barekatain, Mohammad Sabokrou, Fatemeh Aghaeipoor
  • for: To explain how vision transformers (ViTs) work and on what basis they make their decisions.
  • methods: A review that organizes the explainability methods proposed for vision transformers into a taxonomy according to their motivations, structures, and application scenarios.
  • results: A comprehensive review of evaluation criteria for comparing explanation results and of explainability tools and frameworks, together with essential but unexplored aspects and promising directions for future research.
    Abstract Transformers have had a significant impact on natural language processing and have recently demonstrated their potential in computer vision. They have shown promising results over convolution neural networks in fundamental computer vision tasks. However, the scientific community has not fully grasped the inner workings of vision transformers, nor the basis for their decision-making, which underscores the importance of explainability methods. Understanding how these models arrive at their decisions not only improves their performance but also builds trust in AI systems. This study explores different explainability methods proposed for visual transformers and presents a taxonomy for organizing them according to their motivations, structures, and application scenarios. In addition, it provides a comprehensive review of evaluation criteria that can be used for comparing explanation results, as well as explainability tools and frameworks. Finally, the paper highlights essential but unexplored aspects that can enhance the explainability of visual transformers, and promising research directions are suggested for future investment.

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

  • paper_url: http://arxiv.org/abs/2311.06783
  • repo_url: https://github.com/Q-Future/Q-Instruct
  • paper_authors: Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Kaixin Xu, Chunyi Li, Jingwen Hou, Guangtao Zhai, Geng Xue, Wenxiu Sun, Qiong Yan, Weisi Lin
  • for: To improve the low-level visual abilities of multi-modality foundation models, such as perceiving and describing attributes like the clarity, color, and brightness of an image.
  • methods: Collect large-scale human feedback on low-level vision (the Q-Pathway dataset, 58K detailed feedbacks on 18,973 images) and, via a GPT-participated conversion, turn it into 200K diverse-format instruction-response pairs (Q-Instruct) for training.
  • results: Training with Q-Instruct consistently elevates the low-level perception and understanding abilities of several foundation models, moving them toward evaluating low-level visual appearance and visual quality like a human.
    Abstract Multi-modality foundation models, as represented by GPT-4V, have brought a new paradigm for low-level visual perception and understanding tasks, that can respond to a broad range of natural human instructions in a model. While existing foundation models have shown exciting potentials on low-level visual tasks, their related abilities are still preliminary and need to be improved. In order to enhance these models, we conduct a large-scale subjective experiment collecting a vast number of real human feedbacks on low-level vision. Each feedback follows a pathway that starts with a detailed description of the low-level visual appearance (*e.g. clarity, color, brightness*) of an image, and ends with an overall conclusion, with an average length of 45 words. The constructed **Q-Pathway** dataset includes 58K detailed human feedbacks on 18,973 images with diverse low-level appearance. Moreover, to enable foundation models to robustly respond to diverse types of questions, we design a GPT-participated conversion to process these feedbacks into diverse-format 200K instruction-response pairs. Experimental results indicate that the **Q-Instruct** consistently elevates low-level perception and understanding abilities across several foundational models. We anticipate that our datasets can pave the way for a future that general intelligence can perceive, understand low-level visual appearance and evaluate visual quality like a human. Our dataset, model zoo, and demo are published at: https://q-future.github.io/Q-Instruct.

cs.AI - 2023-11-12

Creating a Discipline-specific Commons for Infectious Disease Epidemiology

  • paper_url: http://arxiv.org/abs/2311.06989
  • repo_url: None
  • paper_authors: Michael M. Wagner, William Hogan, John Levander, Adam Darr, Matt Diller, Max Sibilla, Alexander T. Loiacono, Terence Sperringer, Jr., Shawn T. Brown
  • for: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software but also receive assistance in improving their interoperability.
  • methods: Represent 586 datasets, 54 software packages, and 24 data formats in OWL 2; use logical queries over the resulting triple store to infer potentially interoperable combinations of software and datasets and to compute statistics about the FAIRness of the collection; describe objects with DATS 2.2 and a custom software metadata schema.
  • results: Interoperability was limited by the lack of standardization of software input/output formats; only 3 of 24 formats (13%) had machine-readable specifications. Nevertheless, logical search of the triple store based on named data formats identified scores of potentially interoperable software-dataset combinations.
    Abstract Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software, and 24 data formats in OWL 2 and then used logical queries to infer potentially interoperable combinations of software and datasets, as well as statistics about the FAIRness of the collection. We represented the objects in DATS 2.2 and a software metadata schema of our own design. We used these representations as the basis for the Content, Search, FAIR-o-meter, and Workflow pages that constitute the MIDAS Digital Commons. Results: Interoperability was limited by lack of standardization of input and output formats of software. When formats existed, they were human-readable specifications (22/24; 92%); only 3 formats (13%) had machine-readable specifications. Nevertheless, logical search of a triple store based on named data formats was able to identify scores of potentially interoperable combinations of software and datasets. Discussion: We improved the findability and availability of a sample of software and datasets and developed metrics for assessing interoperability. The barriers to interoperability included poor documentation of software input/output formats and little attention to standardization of most types of data in this field. Conclusion: Centralizing and formalizing the representation of digital objects within a commons promotes FAIRness, enables its measurement over time and the identification of potentially interoperable combinations of data and software.
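
The kind of logical query the commons runs over its triple store can be sketched with rdflib: match a software package's declared input format against a dataset's distribution format. The namespace and property names below are assumptions, not the MIDAS Commons' actual schema.

```python
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/commons#")  # hypothetical schema
g = Graph()
g.add((EX.FluCaster, EX.acceptsFormat, Literal("CSV-linelist")))
g.add((EX.CountyCases2019, EX.distributedAs, Literal("CSV-linelist")))

# Find (software, dataset) pairs whose formats match by name, which is
# the basis for inferring potentially interoperable combinations.
q = """
PREFIX ex: <http://example.org/commons#>
SELECT ?software ?dataset WHERE {
    ?software ex:acceptsFormat  ?fmt .
    ?dataset  ex:distributedAs  ?fmt .
}"""
for software, dataset in g.query(q):
    print(f"{software} can plausibly consume {dataset}")
```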

Assessing the Interpretability of Programmatic Policies with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.06979
  • repo_url: None
  • paper_authors: Zahra Bashir, Michael Bowling, Levi H. S. Lelis
  • for: To assess the interpretability of programmatic policies.
  • methods: A metric based on large language models (LLMs): one LLM, given a program and a description of its programming language, produces a natural language explanation; a second LLM reconstructs the program from that explanation; the metric measures the behavioral similarity between the reconstructed and original programs.
  • results: The metric consistently ranks less interpretable programs lower and more interpretable ones higher, suggesting it can serve as a reliable and inexpensive tool for evaluating the interpretability of programmatic policies.
    Abstract Although the synthesis of programs encoding policies often carries the promise of interpretability, systematic evaluations to assess the interpretability of these policies were never performed, likely because of the complexity of such an evaluation. In this paper, we introduce a novel metric that uses large-language models (LLM) to assess the interpretability of programmatic policies. For our metric, an LLM is given both a program and a description of its associated programming language. The LLM then formulates a natural language explanation of the program. This explanation is subsequently fed into a second LLM, which tries to reconstruct the program from the natural language explanation. Our metric measures the behavioral similarity between the reconstructed program and the original. We validate our approach using obfuscated programs that are used to solve classic programming problems. We also assess our metric with programmatic policies synthesized for playing a real-time strategy game, comparing the interpretability scores of programmatic policies synthesized by an existing system to lightly obfuscated versions of the same programs. Our LLM-based interpretability score consistently ranks less interpretable programs lower and more interpretable ones higher. These findings suggest that our metric could serve as a reliable and inexpensive tool for evaluating the interpretability of programmatic policies.

Physics-Informed Data Denoising for Real-Life Sensing Systems

  • paper_url: http://arxiv.org/abs/2311.06968
  • repo_url: None
  • paper_authors: Xiyuan Zhang, Xiaohan Fu, Diyan Teng, Chengyu Dong, Keerthivasan Vijayakumar, Jiayun Zhang, Ranak Roy Chowdhury, Junsheng Han, Dezhi Hong, Rashmi Kulkarni, Jingbo Shang, Rajesh Gupta
  • for: To propose a physics-informed denoising model that improves the quality of sensor data in real-life sensing systems.
  • methods: A denoising model guided by the analytic relationships between different sensor measurements described by laws of physics (e.g., location and acceleration linked by a second-order differential equation), which removes the need for ground-truth clean training data.
  • results: State-of-the-art performance across inertial navigation, CO2 monitoring, and HVAC control; the model denoises in real time (4 ms for a 1 s sequence) on low-cost noisy sensors and produces results that closely match high-precision, high-cost alternatives.
    Abstract Sensors measuring real-life physical processes are ubiquitous in today's interconnected world. These sensors inherently bear noise that often adversely affects performance and reliability of the systems they support. Classic filtering-based approaches introduce strong assumptions on the time or frequency characteristics of sensory measurements, while learning-based denoising approaches typically rely on using ground truth clean data to train a denoising model, which is often challenging or prohibitive to obtain for many real-world applications. We observe that in many scenarios, the relationships between different sensor measurements (e.g., location and acceleration) are analytically described by laws of physics (e.g., second-order differential equation). By incorporating such physics constraints, we can guide the denoising process to improve even in the absence of ground truth data. In light of this, we design a physics-informed denoising model that leverages the inherent algebraic relationships between different measurements governed by the underlying physics. By obviating the need for ground truth clean data, our method offers a practical denoising solution for real-world applications. We conducted experiments in various domains, including inertial navigation, CO2 monitoring, and HVAC control, and achieved state-of-the-art performance compared with existing denoising methods. Our method can denoise data in real time (4ms for a sequence of 1s) for low-cost noisy sensors and produces results that closely align with those from high-precision, high-cost alternatives, leading to an efficient, cost-effective approach for more accurate sensor-based systems.
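
The idea can be sketched as an optimization objective: keep the denoised trace close to the measurement while forcing its second finite difference to match the measured acceleration (a second-order ODE constraint). The toy inertial example below is an assumption for illustration; the paper's architecture and exact constraints are not reproduced.

```python
import torch

def physics_informed_loss(x_denoised, x_noisy, accel, dt, lam=1.0):
    """Physics-constrained denoising for an inertial setting: the
    denoised position trace should stay near the measurement while its
    second finite difference matches the measured acceleration."""
    recon = ((x_denoised - x_noisy) ** 2).mean()
    # d^2 x / dt^2 via central finite differences
    d2x = (x_denoised[2:] - 2 * x_denoised[1:-1] + x_denoised[:-2]) / dt**2
    physics = ((d2x - accel[1:-1]) ** 2).mean()
    return recon + lam * physics

# toy trace: noisy positions plus the acceleration that generated them
dt, t = 0.01, torch.arange(0, 1, 0.01)
x_true, accel = torch.sin(t), -torch.sin(t)   # x'' = -sin(t)
x_noisy = x_true + 0.05 * torch.randn_like(x_true)
x = x_noisy.clone().requires_grad_(True)
opt = torch.optim.Adam([x], lr=1e-2)
for _ in range(200):                          # denoise by optimization
    opt.zero_grad()
    loss = physics_informed_loss(x, x_noisy, accel, dt)
    loss.backward()
    opt.step()
```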

Towards probabilistic Weather Forecasting with Conditioned Spatio-Temporal Normalizing Flows

  • paper_url: http://arxiv.org/abs/2311.06958
  • repo_url: None
  • paper_authors: Christina Winkler
  • for: Modeling multimodal spatial distributions and temporal correlations for stochastic spatio-temporal prediction problems, which are omnipresent in fields such as the earth sciences.
  • methods: Conditional normalizing flows for stochastic spatio-temporal modeling, evaluated on daily temperature and hourly geopotential map prediction from ERA5 datasets.
  • results: Experiments show that the method captures spatio-temporal correlations and extrapolates well beyond the time horizon used during training.
    Abstract Generative normalizing flows are able to model multimodal spatial distributions, and they have been shown to model temporal correlations successfully as well. These models provide several benefits over other types of generative models due to their training stability, invertibility and efficiency in sampling and inference. This makes them a suitable candidate for stochastic spatio-temporal prediction problems, which are omnipresent in many fields of sciences, such as earth sciences, astrophysics or molecular sciences. In this paper, we present conditional normalizing flows for stochastic spatio-temporal modelling. The method is evaluated on the task of daily temperature and hourly geopotential map prediction from ERA5 datasets. Experiments show that our method is able to capture spatio-temporal correlations and extrapolates well beyond the time horizon used during training.
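
The building block of such a model is a conditional coupling layer, where the conditioning input (e.g., an encoding of past weather states) enters the scale-and-shift network. A minimal sketch follows; the dimensions and conditioning encoder are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """One conditional affine coupling layer: the scale and shift of
    the second half of x are computed from the first half plus the
    conditioning vector, giving an invertible map with an exact
    log-Jacobian."""
    def __init__(self, dim, cond_dim, hidden=64):
        super().__init__()
        half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - half)),
        )

    def forward(self, x, cond):
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(torch.cat([x1, cond], -1)).chunk(2, -1)
        s = torch.tanh(s)                    # stabilize the scale
        z2 = x2 * torch.exp(s) + t
        log_det = s.sum(-1)                  # exact log-Jacobian
        return torch.cat([x1, z2], -1), log_det

layer = ConditionalAffineCoupling(dim=4, cond_dim=16)
x, cond = torch.randn(8, 4), torch.randn(8, 16)
z, log_det = layer(x, cond)
```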

FLASH-RL: Federated Learning Addressing System and Static Heterogeneity using Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.06917
  • repo_url: https://github.com/Sofianebouaziz1/FLASH-RL
  • paper_authors: Sofiane Bouaziz, Hadjer Benmeziane, Youcef Imine, Leila Hamdad, Smail Niar, Hamza Ouarnoughi
  • for: To improve training efficiency in Federated Learning (FL) by selecting clients in a way that accounts for both system and static heterogeneity.
  • methods: FLASH-RL, a framework that applies Double Deep Q-Learning (DDQL) to client selection, introducing a reputation-based utility function that evaluates client contributions from their current and past performance, together with an adapted DDQL algorithm that expedites learning.
  • results: FLASH-RL achieves a balanced trade-off between model performance and end-to-end latency: latency is reduced by up to 24.83% vs. FedAVG and 24.67% vs. FAVOR, and training rounds by up to 60.44% vs. FedAVG and 76% vs. FAVOR; in fall detection on the MobiAct dataset, it outperforms FedAVG by up to 2.82% in model performance, reduces latency by up to 34.75%, and reaches the target performance with up to 45.32% fewer training rounds.
    Abstract Federated Learning (FL) has emerged as a promising Machine Learning paradigm, enabling multiple users to collaboratively train a shared model while preserving their local data. To minimize computing and communication costs associated with parameter transfer, it is common practice in FL to select a subset of clients in each training round. This selection must consider both system and static heterogeneity. Therefore, we propose FLASH-RL, a framework that utilizes Double Deep QLearning (DDQL) to address both system and static heterogeneity in FL. FLASH-RL introduces a new reputation-based utility function to evaluate client contributions based on their current and past performances. Additionally, an adapted DDQL algorithm is proposed to expedite the learning process. Experimental results on MNIST and CIFAR-10 datasets have shown FLASH-RL's effectiveness in achieving a balanced trade-off between model performance and end-to-end latency against existing solutions. Indeed, FLASH-RL reduces latency by up to 24.83% compared to FedAVG and 24.67% compared to FAVOR. It also reduces the training rounds by up to 60.44% compared to FedAVG and +76% compared to FAVOR. In fall detection using the MobiAct dataset, FLASH-RL outperforms FedAVG by up to 2.82% in model's performance and reduces latency by up to 34.75%. Additionally, FLASH-RL achieves the target performance faster, with up to a 45.32% reduction in training rounds compared to FedAVG.
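
The DDQL update at the heart of the selection policy decouples action selection (online network) from action evaluation (target network), which curbs Q-value over-estimation. The sketch below uses toy state/action encodings; FLASH-RL's reputation-based reward and client-selection action space are not reproduced.

```python
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 16, 10, 0.99
online = nn.Linear(state_dim, n_actions)   # toy Q-networks
target = nn.Linear(state_dim, n_actions)
target.load_state_dict(online.state_dict())

s = torch.randn(32, state_dim)            # e.g. round statistics
a = torch.randint(0, n_actions, (32,))    # selected clients
r = torch.randn(32)                       # reputation-based utility
s_next = torch.randn(32, state_dim)

with torch.no_grad():
    # Double DQN: online net selects, target net evaluates
    a_star = online(s_next).argmax(dim=1)
    q_next = target(s_next).gather(1, a_star[:, None])[:, 0]
    y = r + gamma * q_next

q = online(s).gather(1, a[:, None])[:, 0]
loss = nn.functional.mse_loss(q, y)
loss.backward()
```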

TSViT: A Time Series Vision Transformer for Fault Diagnosis

  • paper_url: http://arxiv.org/abs/2311.06916
  • repo_url: None
  • paper_authors: Shouhua Zhang, Jiehan Zhou, Xue Ma, Chenglin Wen, Susanna Pirttikangas, Chen Yu, Weishan Zhang, Chunsheng Yang
  • for: Fault diagnosis in mechanical systems, where traditional CNN-based methods struggle to capture temporal features, i.e., the variation of vibration signals over time.
  • methods: The Time Series Vision Transformer (TSViT), which integrates a convolutional layer that segments vibration signals and captures local features with a transformer encoder that learns long-term temporal information.
  • results: On two distinct datasets, TSViT reaches average accuracies of 100% and 99.99% on the respective test sets, outperforming other methods in performance, computational complexity, and overall parameter quantity.
    Abstract Traditional fault diagnosis methods using Convolutional Neural Networks (CNNs) face limitations in capturing temporal features (i.e., the variation of vibration signals over time). To address this issue, this paper introduces a novel model, the Time Series Vision Transformer (TSViT), specifically designed for fault diagnosis. On one hand, TSViT model integrates a convolutional layer to segment vibration signals and capture local features. On the other hand, it employs a transformer encoder to learn long-term temporal information. The experimental results with other methods on two distinct datasets validate the effectiveness and generalizability of TSViT with a comparative analysis of its hyperparameters' impact on model performance, computational complexity, and overall parameter quantity. TSViT reaches average accuracies of 100% and 99.99% on two test sets, correspondingly.

Flames: Benchmarking Value Alignment of Chinese Large Language Models

  • paper_url: http://arxiv.org/abs/2311.06899
  • repo_url: None
  • paper_authors: Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, Yaru Wang, Zeyang Zhou, Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin
  • for: This study aims to evaluate whether large language models (LLMs) are aligned with human values.
  • methods: It proposes the first highly adversarial benchmark, named Flames, comprising 2,251 manually crafted prompts, ~18.7K model responses with fine-grained annotations, and a specified scorer.
  • results: Prompting mainstream LLMs with the manually crafted adversarial prompts shows that all evaluated LLMs perform relatively poorly on Flames, particularly on the safety and fairness dimensions. Claude emerges as the best-performing model overall, yet its harmless rate is only 63.08%, while GPT-4 scores only 39.04%. The complexity of Flames far exceeds existing benchmarks, setting a new challenge for contemporary LLMs and highlighting the need for further alignment.
    Abstract The widespread adoption of large language models (LLMs) across various regions underscores the urgent need to evaluate their alignment with human values. Current benchmarks, however, fall short of effectively uncovering safety vulnerabilities in LLMs. Despite numerous models achieving high scores and 'topping the chart' in these evaluations, there is still a significant gap in LLMs' deeper alignment with human values and achieving genuine harmlessness. To this end, this paper proposes the first highly adversarial benchmark named Flames, consisting of 2,251 manually crafted prompts, ~18.7K model responses with fine-grained annotations, and a specified scorer. Our framework encompasses both common harmlessness principles, such as fairness, safety, legality, and data protection, and a unique morality dimension that integrates specific Chinese values such as harmony. Based on the framework, we carefully design adversarial prompts that incorporate complex scenarios and jailbreaking methods, mostly with implicit malice. By prompting mainstream LLMs with such adversarially constructed prompts, we obtain model responses, which are then rigorously annotated for evaluation. Our findings indicate that all the evaluated LLMs demonstrate relatively poor performance on Flames, particularly in the safety and fairness dimensions. Claude emerges as the best-performing model overall, but with its harmless rate being only 63.08% while GPT-4 only scores 39.04%. The complexity of Flames has far exceeded existing benchmarks, setting a new challenge for contemporary LLMs and highlighting the need for further alignment of LLMs. To efficiently evaluate new models on the benchmark, we develop a specified scorer capable of scoring LLMs across multiple dimensions, achieving an accuracy of 77.4%. The Flames Benchmark is publicly available on https://github.com/AIFlames/Flames.

Anticipating User Needs: Insights from Design Fiction on Conversational Agents for Computational Thinking

  • paper_url: http://arxiv.org/abs/2311.06887
  • repo_url: None
  • paper_authors: Jacob Penney, João Felipe Pimentel, Igor Steinmacher, Marco A. Gerosa
  • for: This paper aims to inform the design of a chatbot that helps students learn computational thinking and programming.
  • methods: The study uses design fiction to understand instructors' needs and expectations for a chatbot based on generative AI (genAI).
  • results: Instructors envision a genAI-based chatbot that guides students stepwise through exercises, adapting its guidance to each student's educational background, skills and deficits, and learning preferences.
    Abstract Computational thinking, and by extension, computer programming, is notoriously challenging to learn. Conversational agents and generative artificial intelligence (genAI) have the potential to facilitate this learning process by offering personalized guidance, interactive learning experiences, and code generation. However, current genAI-based chatbots focus on professional developers and may not adequately consider educational needs. Involving educators in conceiving educational tools is critical for ensuring usefulness and usability. We enlisted \numParticipants{} instructors to engage in design fiction sessions in which we elicited the abilities that a conversational agent supported by genAI should display. Participants envisioned a conversational agent that guides students stepwise through exercises, tuning its method of guidance with an awareness of the student's educational background, skills and deficits, and learning preferences. The insights obtained in this paper can guide future implementations of tutoring conversational agents oriented toward teaching computational thinking and computer programming.

Modeling User Viewing Flow using Large Language Models for Article Recommendation

  • paper_url: http://arxiv.org/abs/2311.07619
  • repo_url: None
  • paper_authors: Zhenghao Liu, Zulong Chen, Moufeng Zhang, Shaoyang Duan, Hong Wen, Liangyue Li, Nan Li, Yu Gu, Ge Yu
  • for: This paper proposes SINGLE, an article recommendation method based on users' constant preferences and instant interests.
  • methods: A constant viewing-flow module summarizes the user's general interest, using Large Language Models (LLMs) to capture constant preferences from previously clicked articles; an instant viewing-flow module builds interactions between the user's clicked-article history and candidate articles.
  • results: In an online A/B test on the Alibaba Technology Association (ATA) website, SINGLE achieves a 2.4% improvement over previous baselines; further analysis shows it builds a more tailored recommender system by mimicking users' different article-viewing behaviors and recommending more appropriate and diverse articles matching user interests.
    Abstract This paper proposes the User Viewing Flow Modeling (SINGLE) method for the article recommendation task, which models the user constant preference and instant interest from user-clicked articles. Specifically, we employ a user constant viewing flow modeling method to summarize the user's general interest to recommend articles. We utilize Large Language Models (LLMs) to capture constant user preferences from previously clicked articles, such as skills and positions. Then we design the user instant viewing flow modeling method to build interactions between user-clicked article history and candidate articles. It attentively reads the representations of user-clicked articles and aims to learn the user's different interest views to match the candidate article. Our experimental results on the Alibaba Technology Association (ATA) website show the advantage of SINGLE, which achieves 2.4% improvements over previous baseline models in the online A/B test. Our further analyses illustrate that SINGLE has the ability to build a more tailored recommendation system by mimicking different article viewing behaviors of users and recommending more appropriate and diverse articles to match user interests.
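The instant viewing-flow idea can be pictured as candidate-conditioned attention over the click history. The sketch below is one plausible reading under assumed embedding dimensions; in the paper the article representations come from an LLM, whereas here they are random placeholders.

```python
import torch
import torch.nn.functional as F

def instant_view_match(clicked, candidate):
    # Attend over clicked-article embeddings with the candidate as the query,
    # producing a user-interest view tailored to that candidate.
    scores = clicked @ candidate / candidate.shape[-1] ** 0.5  # (n_clicked,)
    attn = F.softmax(scores, dim=0)
    user_view = attn @ clicked                                 # (dim,)
    return torch.dot(user_view, candidate)                     # match score

clicked = torch.randn(12, 64)   # 12 previously clicked articles
candidate = torch.randn(64)
print(float(instant_view_match(clicked, candidate)))
```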

Understanding Practices around Computational News Discovery Tools in the Domain of Science Journalism

  • paper_url: http://arxiv.org/abs/2311.06864
  • repo_url: None
  • paper_authors: Sachita Nishal, Jasmine Sinchai, Nicholas Diakopoulos
  • for: This paper aims to help science journalists find newsworthy leads more quickly, given their growing workloads, shrinking resources, and the expanding scientific publishing ecosystem.
  • methods: Computational methods are used to support journalists' news discovery, including an interactive tool for assessing the timeliness and newsworthiness of scientific news.
  • results: The study finds that computational tools can help science journalists find leads more quickly, but their design needs to account for factors such as newsworthiness, scientific context, and societal impact.
    Abstract Science and technology journalists today face challenges in finding newsworthy leads due to increased workloads, reduced resources, and expanding scientific publishing ecosystems. Given this context, we explore computational methods to aid these journalists' news discovery in terms of time-efficiency and agency. In particular, we prototyped three computational information subsidies into an interactive tool that we used as a probe to better understand how such a tool may offer utility or more broadly shape the practices of professional science journalists. Our findings highlight central considerations around science journalists' agency, context, and responsibilities that such tools can influence and could account for in design. Based on this, we suggest design opportunities for greater and longer-term user agency; incorporating contextual, personal and collaborative notions of newsworthiness; and leveraging flexible interfaces and generative models. Overall, our findings contribute a richer view of the sociotechnical system around computational news discovery tools, and suggest ways to improve such tools to better support the practices of science journalists.

Can Large Language Models Augment a Biomedical Ontology with missing Concepts and Relations?

  • paper_url: http://arxiv.org/abs/2311.06858
  • repo_url: https://github.com/minitour/ontology-extension-chatgpt
  • paper_authors: Antonio Zaitoun, Tomer Sagi, Szymon Wilk, Mor Peleg
  • for: Augmenting an existing ontology with missing concepts and relations
  • methods: Using a large language model (LLM) and conversational interactions to extend the ontology semi-automatically
  • results: Analyzing clinical practice guidelines (CPGs) to detect new medical concepts and relations that are not present in SNOMED-CT
    Abstract Ontologies play a crucial role in organizing and representing knowledge. However, even current ontologies do not encompass all relevant concepts and relationships. Here, we explore the potential of large language models (LLM) to expand an existing ontology in a semi-automated fashion. We demonstrate our approach on the biomedical ontology SNOMED-CT utilizing semantic relation types from the widely used UMLS semantic network. We propose a method that uses conversational interactions with an LLM to analyze clinical practice guidelines (CPGs) and detect the relationships among the new medical concepts that are not present in SNOMED-CT. Our initial experimentation with the conversational prompts yielded promising preliminary results given a manually generated gold standard, directing our future potential improvements.

On learning spatial sequences with the movement of attention

  • paper_url: http://arxiv.org/abs/2311.06856
  • repo_url: None
  • paper_authors: Viacheslav M. Osaulenko
  • for: This study asks how humans can recognize different body movements given only prior visual experience of them, and how spatial sequences can be represented invariantly across sensory modalities.
  • methods: The paper proposes a new mathematical representation of spatial sequences, argues for a hierarchy of abstraction levels, and advances two hypotheses about how these abstractions are formed.
  • results: The movement of attention is identified as central to human cognition and can inform new learning algorithms; the redundancy introduced by multi-level representations is what enables recognition and generalization of different body movements.
    Abstract In this paper we start with a simple question: how is it possible that humans can recognize different movements over the skin with only a prior visual experience of them? Or, in general, what is the representation of spatial sequences that is invariant to scale, rotation, and translation across different modalities? To answer, we rethink the mathematical representation of spatial sequences, argue against the minimum description length principle, and focus on the movements of attention. We advance the idea that spatial sequences must be represented on different levels of abstraction; this adds redundancy but is necessary for recognition and generalization. To address the open question of how these abstractions are formed, we propose two hypotheses: the first invites exploring selectionism learning, instead of finding parameters in some models; the second proposes to find new data structures, not neural network architectures, to efficiently store and operate over redundant features to be further selected. Movements of attention are central to human cognition and lessons should be applied to new, better learning algorithms.

Distribution Re-weighting and Voting Paradoxes

  • paper_url: http://arxiv.org/abs/2311.06840
  • repo_url: None
  • paper_authors: Bijan Mazaheri, Siddharth Jain, Matthew Cook, Jehoshua Bruck
  • for: The paper studies a specific type of distribution shift called domain expertise, in which training is limited to a subset of all possible labels.
  • methods: It analyzes standard distribution-shift strategies, including data re-weighting, as well as common causal-inference adjustments.
  • results: Standard distribution-shift strategies can lead to paradoxical disagreements among differing domain experts, and these paradoxes exactly mirror paradoxes that arise among sets of voter preferences.
    Abstract We explore a specific type of distribution shift called domain expertise, in which training is limited to a subset of all possible labels. This setting is common among specialized human experts, or specific focused studies. We show how the standard approach to distribution shift, which involves re-weighting data, can result in paradoxical disagreements among differing domain expertise. We also demonstrate how standard adjustments for causal inference lead to the same paradox. We prove that the characteristics of these paradoxes exactly mimic another set of paradoxes which arise among sets of voter preferences.
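The re-weighting strategy the paper scrutinizes is, in its simplest label-shift form, just importance weighting, as in this small sketch (the paper's point is that applying such corrections per domain expert can produce mutually inconsistent conclusions):

```python
import numpy as np

def importance_weights(labels, source_freq, target_freq):
    # Weight each sample by p_target(y) / p_source(y), the standard
    # label-shift correction.
    ratio = np.asarray(target_freq, float) / np.asarray(source_freq, float)
    return ratio[labels]

labels = np.array([0, 0, 1, 2, 2, 2])
print(importance_weights(labels, source_freq=[0.5, 0.2, 0.3],
                         target_freq=[0.3, 0.3, 0.4]))
```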

Open-Set Graph Anomaly Detection via Normal Structure Regularisation

  • paper_url: http://arxiv.org/abs/2311.06835
  • repo_url: None
  • paper_authors: Qizhou Wang, Guansong Pang, Mahsa Salehi, Wray Buntine, Christopher Leckie
  • for: The paper targets open-set Graph Anomaly Detection (GAD): detecting anomalous nodes in a graph using a small number of labelled normal and anomaly nodes (the seen anomalies), which cannot illustrate all possible inference-time abnormalities.
  • methods: It proposes a novel open-set GAD method, normal structure regularisation (NSReg), which leverages the normal graph structure around the labelled nodes to counter existing methods' over-emphasis on the seen anomalies, a tendency that weakens detection of unseen anomalies.
  • results: Experiments on real-world datasets demonstrate the superiority of NSReg for open-set GAD.
    Abstract This paper considers an under-explored Graph Anomaly Detection (GAD) task, namely open-set GAD, which aims to detect anomalous nodes using a small number of labelled training normal and anomaly nodes (known as seen anomalies) that cannot illustrate all possible inference-time abnormalities. The task has attracted growing attention due to the availability of anomaly prior knowledge from the label information that can help to substantially reduce detection errors. However, current methods tend to over-emphasise fitting the seen anomalies, leading to a weak generalisation ability to detect unseen anomalies, i.e., those that are not illustrated by the labelled anomaly nodes. Further, they were introduced to handle Euclidean data, failing to effectively capture important non-Euclidean features for GAD. In this work, we propose a novel open-set GAD approach, namely normal structure regularisation (NSReg), to leverage the rich normal graph structure embedded in the labelled nodes to tackle the aforementioned two issues. In particular, NSReg trains an anomaly-discriminative supervised graph anomaly detector, with a plug-and-play regularisation term to enforce compact, semantically-rich representations of normal nodes. To this end, the regularisation is designed to differentiate various types of normal nodes, including labelled normal nodes that are connected in their local neighbourhood, and those that are not connected. By doing so, it helps incorporate strong normality into the supervised anomaly detector learning, mitigating their overfitting to the seen anomalies. Extensive empirical results on real-world datasets demonstrate the superiority of our proposed NSReg for open-set GAD.
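One plausible shape for such a regularizer is sketched below: embeddings of labelled normal nodes are pulled toward connected normal neighbours and pushed away from unconnected ones. The exact NSReg formulation may differ; this only illustrates the connected-versus-unconnected distinction the abstract describes.

```python
import torch
import torch.nn.functional as F

def normal_structure_reg(z, normal_idx, adj):
    # z: node embeddings; normal_idx: labelled normal nodes; adj: boolean
    # adjacency. Connected normal pairs are encouraged to be similar,
    # unconnected normal pairs to be dissimilar.
    loss, pairs = z.new_zeros(()), 0
    for i in normal_idx:
        for j in normal_idx:
            if i == j:
                continue
            sim = F.cosine_similarity(z[i], z[j], dim=0)
            loss = loss + (1 - sim if adj[i, j] else F.relu(sim))
            pairs += 1
    return loss / max(pairs, 1)

z = torch.randn(6, 16, requires_grad=True)
adj = torch.zeros(6, 6, dtype=torch.bool)
adj[0, 1] = adj[1, 0] = True
print(float(normal_structure_reg(z, [0, 1, 2], adj)))
```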

Fairness Hacking: The Malicious Practice of Shrouding Unfairness in Algorithms

  • paper_url: http://arxiv.org/abs/2311.06826
  • repo_url: None
  • paper_authors: Kristof Meding, Thilo Hagendorff
  • for: This study examines how algorithmic discrimination can be concealed through a practice the authors call 'fairness hacking'.
  • methods: Two categories of fairness hacking are distinguished: intra-metric fairness hacking (misuse of a particular metric by adding or removing sensitive attributes from the analysis) and inter-metric fairness hacking (searching for a specific fair metric given fixed attributes).
  • results: Both types of fairness hacking are demonstrated on real datasets, illustrating the harm they can cause to end-users and to the broader community interested in fair AI practices.
    Abstract Fairness in machine learning (ML) is an ever-growing field of research due to the manifold potential for harm from algorithmic discrimination. To prevent such harm, a large body of literature develops new approaches to quantify fairness. Here, we investigate how one can divert the quantification of fairness by describing a practice we call "fairness hacking" for the purpose of shrouding unfairness in algorithms. This impacts end-users who rely on learning algorithms, as well as the broader community interested in fair AI practices. We introduce two different categories of fairness hacking in reference to the established concept of p-hacking. The first category, intra-metric fairness hacking, describes the misuse of a particular metric by adding or removing sensitive attributes from the analysis. In this context, countermeasures that have been developed to prevent or reduce p-hacking can be applied to similarly prevent or reduce fairness hacking. The second category of fairness hacking is inter-metric fairness hacking. Inter-metric fairness hacking is the search for a specific fair metric with given attributes. We argue that countermeasures to prevent or reduce inter-metric fairness hacking are still in their infancy. Finally, we demonstrate both types of fairness hacking using real datasets. Our paper intends to serve as a guidance for discussions within the fair ML community to prevent or reduce the misuse of fairness metrics, and thus reduce overall harm from ML applications.
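Intra-metric fairness hacking is easy to make concrete: compute the same fairness metric over every subset of sensitive attributes, then report only the most flattering one. The sketch below uses a demographic-parity gap as the metric; all names and data are illustrative.

```python
from itertools import combinations
import numpy as np

def parity_gap(y_pred, groups):
    # Max difference in positive-prediction rate across groups.
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

def intra_metric_hack(y_pred, attrs):
    # Score every non-empty subset of sensitive attributes and return the
    # subset with the smallest gap -- the "hacked" result one might report.
    results = {}
    for r in range(1, len(attrs) + 1):
        for subset in combinations(attrs, r):
            joint = np.array(["|".join(map(str, row))
                              for row in zip(*(attrs[n] for n in subset))])
            results[subset] = parity_gap(y_pred, joint)
    return min(results.items(), key=lambda kv: kv[1])

rng = np.random.default_rng(1)
y_pred = rng.integers(0, 2, 200)
attrs = {"gender": rng.integers(0, 2, 200), "age": rng.integers(0, 3, 200)}
print(intra_metric_hack(y_pred, attrs))
```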

Training A Multi-stage Deep Classifier with Feedback Signals

  • paper_url: http://arxiv.org/abs/2311.06823
  • repo_url: None
  • paper_authors: Chao Xu, Yu Yang, Rongzhao Wang, Guan Wang, Bojia Lin
  • for: This paper proposes a training framework for Multi-Stage Classifiers (MSCs), focusing on the two-stage binary classification case.
  • methods: The proposed framework, called Feedback Training, trains the stage classifiers in the reverse of their actual working order, using the later-stage classifier to guide the training of the initial-stage classifier via a sample weighting method.
  • results: Experiments show that the proposed framework performs well, with particularly strong gains in few-shot training scenarios, making it well suited to practical applications.
    Abstract The Multi-Stage Classifier (MSC) - several classifiers working sequentially in an arranged order, with the classification decision partially made at each step - is widely used in industrial applications for various resource-limitation reasons. The classifiers of a multi-stage process are usually Neural Network (NN) models trained independently or in their inference order, without considering the signals from the later stages. Targeting the two-stage binary classification process, the most common type of MSC, we propose a novel training framework named Feedback Training. The classifiers are trained in an order reverse to their actual working order, and the classifier at the later stage is used to guide the training of the initial-stage classifier via a sample weighting method. We experimentally show the efficacy of our proposed approach and its clear superiority in few-shot training scenarios.
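A toy sketch of the reverse-order idea: train the later stage first, then weight the early stage's training samples by how much help they still need. The weighting formula here is an assumption made for illustration, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, :4].sum(axis=1) > 0).astype(int)
X_cheap, X_rich = X[:, :2], X   # stage 1 sees fewer features than stage 2

# 1) Train the later-stage classifier first (reverse of inference order).
stage2 = LogisticRegression().fit(X_rich, y)

# 2) Weight samples by how *un*confident stage 2 is on them, so stage 1
#    focuses on examples it must resolve on its own.
conf = stage2.predict_proba(X_rich)[np.arange(len(y)), y]
stage1 = LogisticRegression().fit(X_cheap, y, sample_weight=1.0 - conf + 1e-3)
print(stage1.score(X_cheap, y))
```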

Dual-Branch Reconstruction Network for Industrial Anomaly Detection with RGB-D Data

  • paper_url: http://arxiv.org/abs/2311.06797
  • repo_url: None
  • paper_authors: Chenyang Bi, Yueyang Li, Haichi Luo
  • for: This paper addresses unsupervised anomaly detection for industrial inspection, with an emphasis on multimodal detection from 3D point clouds and RGB images.
  • methods: It proposes a lightweight dual-branch reconstruction network (DBRN) that takes RGB-D input and learns the decision boundary between normal and abnormal examples, using depth maps instead of point clouds; an importance scoring module fuses the features of the two modalities into a comprehensive discriminative result.
  • results: On the MVTec 3D-AD dataset, DBRN achieves 92.8% AUROC with high inference efficiency, without large pre-trained models or memory banks.
    Abstract Unsupervised anomaly detection methods are at the forefront of industrial anomaly detection efforts and have made notable progress. Previous work primarily used 2D information as input, but multi-modal industrial anomaly detection based on 3D point clouds and RGB images is just beginning to emerge. The regular approach involves utilizing large pre-trained models for feature representation and storing them in memory banks. However, the above methods require a longer inference time and higher memory usage, which cannot meet the real-time requirements of the industry. To overcome these issues, we propose a lightweight dual-branch reconstruction network(DBRN) based on RGB-D input, learning the decision boundary between normal and abnormal examples. The requirement for alignment between the two modalities is eliminated by using depth maps instead of point cloud input. Furthermore, we introduce an importance scoring module in the discriminative network to assist in fusing features from these two modalities, thereby obtaining a comprehensive discriminative result. DBRN achieves 92.8% AUROC with high inference efficiency on the MVTec 3D-AD dataset without large pre-trained models and memory banks.

Alleviating Behavior Data Imbalance for Multi-Behavior Graph Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2311.06777
  • repo_url: None
  • paper_authors: Yijie Zhang, Yuanchen Bei, Shiqi Yang, Hao Chen, Zhiqing Li, Lijia Chen, Feiran Huang
  • for: To improve multi-behavior recommendation performance by alleviating the imbalance across behavior data.
  • methods: A multi-task learning framework for multi-behavior graph collaborative filtering; representation learning on sparse behaviors is improved by leveraging representations learned from behaviors with abundant data.
  • results: Experiments on two widely used multi-behavior datasets demonstrate the effectiveness of the IMGCF model.
    Abstract Graph collaborative filtering, which learns user and item representations through message propagation over the user-item interaction graph, has been shown to effectively enhance recommendation performance. However, most current graph collaborative filtering models mainly construct the interaction graph on a single behavior domain (e.g. click), even though users exhibit various types of behaviors on real-world platforms, including actions like click, cart, and purchase. Furthermore, due to variations in user engagement, there exists an imbalance in the scale of different types of behaviors. For instance, users may click and view multiple items but only make selective purchases from a small subset of them. How to alleviate the behavior imbalance problem and utilize information from the multiple behavior graphs concurrently to improve the target behavior conversion (e.g. purchase) remains underexplored. To this end, we propose IMGCF, a simple but effective model to alleviate behavior data imbalance for multi-behavior graph collaborative filtering. Specifically, IMGCF utilizes a multi-task learning framework for collaborative filtering on multi-behavior graphs. Then, to mitigate the data imbalance issue, IMGCF improves representation learning on the sparse behavior by leveraging representations learned from the behavior domain with abundant data volumes. Experiments on two widely-used multi-behavior datasets demonstrate the effectiveness of IMGCF.

ChatAnything: Facetime Chat with LLM-Enhanced Personas

  • paper_url: http://arxiv.org/abs/2311.06772
  • repo_url: https://github.com/zhoudaquan/ChatAnything
  • paper_authors: Yilin Zhao, Xinbin Yuan, Shanghua Gao, Zhijie Lin, Qibin Hou, Jiashi Feng, Daquan Zhou
  • for: This technical report targets generating anthropomorphized personas for LLM-based characters in an online manner, including visual appearance, personality, and tones, using only text descriptions.
  • methods: The authors propose two novel concepts, the mixture of voices (MoV) and the mixture of diffusers (MoD), for diverse voice and appearance generation. They also utilize the in-context learning capability of LLMs for personality generation and incorporate pixel-level guidance to infuse human face landmarks during the image generation phase.
  • results: The proposed framework, ChatAnything, can animate anything with anthropomorphic personas using just a few text inputs. The authors also report a significant increase in the face landmark detection rate, from 57.0% to 92.5%, enabling automatic face animation based on generated speech content.
    Abstract In this technical report, we target generating anthropomorphized personas for LLM-based characters in an online manner, including visual appearance, personality and tones, with only text descriptions. To achieve this, we first leverage the in-context learning capability of LLMs for personality generation by carefully designing a set of system prompts. We then propose two novel concepts: the mixture of voices (MoV) and the mixture of diffusers (MoD) for diverse voice and appearance generation. For MoV, we utilize text-to-speech (TTS) algorithms with a variety of pre-defined tones and automatically select the one that best matches the user-provided text description. For MoD, we combine recent popular text-to-image generation techniques and talking-head algorithms to streamline the process of generating talking objects. We term the whole framework ChatAnything. With it, users can animate anything with any anthropomorphic persona using just a few text inputs. However, we have observed that the anthropomorphic objects produced by current generative models are often undetectable by pre-trained face landmark detectors, leading to failure of the face motion generation, even if these faces possess human-like appearances, because those images are rarely seen during training (i.e., they are OOD samples for the detectors). To address this issue, we incorporate pixel-level guidance to infuse human face landmarks during the image generation phase. To benchmark these metrics, we have built an evaluation dataset. Based on it, we verify that the detection rate of face landmarks is significantly increased from 57.0% to 92.5%, thus allowing automatic face animation based on generated speech content. The code and more results can be found at https://chatanything.github.io/.

Learning Globally Optimized Language Structure via Adversarial Training

  • paper_url: http://arxiv.org/abs/2311.06771
  • repo_url: None
  • paper_authors: Xuwang Yin
  • for: Improving text generation capability.
  • methods: Training the energy-based model (EBM) with an adversarial attack strategy.
  • results: Experiments show that the approach substantially improves the quality of generated text compared to prior methods. Key contributions include: (1) an adversarial attack strategy tailored to text that generates negative samples covering spurious modes outside the data distribution; (2) an adversarial training algorithm for EBMs based on these attacks; (3) empirical validation on a text generation task.
    Abstract Recent work has explored integrating autoregressive language models with energy-based models (EBMs) to enhance text generation capabilities. However, learning effective EBMs for text is challenged by the discrete nature of language. This work proposes an adversarial training strategy to address limitations in prior efforts. Specifically, an iterative adversarial attack algorithm is presented to generate negative samples for training the EBM by perturbing text from the autoregressive model. This aims to enable the EBM to suppress spurious modes outside the support of the data distribution. Experiments on an arithmetic sequence generation task demonstrate that the proposed adversarial training approach can substantially enhance the quality of generated sequences compared to prior methods. The results highlight the promise of adversarial techniques to improve discrete EBM training. Key contributions include: (1) an adversarial attack strategy tailored to text to generate negative samples, circumventing MCMC limitations; (2) an adversarial training algorithm for EBMs leveraging these attacks; (3) empirical validation of performance improvements on a sequence generation task.
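Since the paper's attack operates on discrete text, the sketch below uses a continuous relaxation to convey the training loop: perturb samples so the EBM assigns them spuriously low energy, then train the EBM to push that energy back up. The gradient-sign attack is a stand-in assumption, not the paper's text-specific procedure.

```python
import torch

def adversarial_negatives(ebm, x, steps=5, eps=0.1):
    # Descend the energy surface from base-model samples to find spurious
    # low-energy points that the EBM should learn to suppress.
    x = x.clone().requires_grad_(True)
    for _ in range(steps):
        grad, = torch.autograd.grad(ebm(x).sum(), x)
        x = (x - eps * grad.sign()).detach().requires_grad_(True)
    return x.detach()

ebm = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))
pos = torch.randn(32, 16)                 # stand-in for real-data embeddings
neg = adversarial_negatives(ebm, torch.randn(32, 16))
loss = ebm(pos).mean() - ebm(neg).mean()  # contrastive EBM objective
loss.backward()
print(float(loss))
```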

Evaluating the Efficacy of Interactive Language Therapy Based on LLM for High-Functioning Autistic Adolescent Psychological Counseling

  • paper_url: http://arxiv.org/abs/2311.09243
  • repo_url: None
  • paper_authors: Yujin Cho, Mingeon Kim, Seojin Kim, Oyun Kwon, Ryan Donghan Kwon, Yoonha Lee, Dohyun Lim
  • for: This study evaluates the efficacy of large language models (LLMs) in interactive language therapy for high-functioning autistic adolescents.
  • methods: A panel of clinical psychologists and psychiatrists assessed the LLM's performance using a specially developed scorecard.
  • results: The LLM showed strong empathy and adaptability in therapeutic interactions, but challenges remain in matching the depth of emotional understanding and personalization of human therapists.
    Abstract This study investigates the efficacy of Large Language Models (LLMs) in interactive language therapy for high-functioning autistic adolescents. With the rapid advancement of artificial intelligence, particularly in natural language processing, LLMs present a novel opportunity to augment traditional psychological counseling methods. This research primarily focuses on evaluating the LLM's ability to engage in empathetic, adaptable, and contextually appropriate interactions within a therapeutic setting. A comprehensive evaluation was conducted by a panel of clinical psychologists and psychiatrists using a specially developed scorecard. The assessment covered various aspects of the LLM's performance, including empathy, communication skills, adaptability, engagement, and the ability to establish a therapeutic alliance. The study avoided direct testing with patients, prioritizing privacy and ethical considerations, and instead relied on simulated scenarios to gauge the LLM's effectiveness. The results indicate that LLMs hold significant promise as supportive tools in therapy, demonstrating strengths in empathetic engagement and adaptability in conversation. However, challenges in achieving the depth of personalization and emotional understanding characteristic of human therapists were noted. The study also highlights the importance of ethical considerations in the application of AI in therapeutic contexts. This research provides valuable insights into the potential and limitations of using LLMs in psychological counseling for autistic adolescents. It lays the groundwork for future explorations into AI's role in mental health care, emphasizing the need for ongoing development to enhance the capabilities of these models in therapeutic settings.

Large Language Models’ Understanding of Math: Source Criticism and Extrapolation

  • paper_url: http://arxiv.org/abs/2311.07618
  • repo_url: None
  • paper_authors: Roozbeh Yousefzadeh, Xuenan Cao
  • for: This paper asks whether GPT-4 has acquired an understanding of mathematics.
  • methods: The authors craft mathematical questions whose formal proofs are not readily available on the web, and are therefore unlikely to have been seen by the model, to evaluate whether GPT-4 exhibits mathematical understanding.
  • results: GPT-4 is unable to solve these simple problems, which casts doubt on whether it has acquired an understanding of even basic mathematical concepts.
    Abstract It has been suggested that large language models such as GPT-4 have acquired some form of understanding beyond the correlations among the words in text including some understanding of mathematics as well. Here, we perform a critical inquiry into this claim by evaluating the mathematical understanding of the GPT-4 model. Considering that GPT-4's training set is a secret, it is not straightforward to evaluate whether the model's correct answers are based on a mathematical understanding or based on replication of proofs that the model has seen before. We specifically craft mathematical questions which their formal proofs are not readily available on the web, proofs that are more likely not seen by the GPT-4. We see that GPT-4 is unable to solve those problems despite their simplicity. It is hard to find scientific evidence suggesting that GPT-4 has acquired an understanding of even basic mathematical concepts. A straightforward way to find failure modes of GPT-4 in theorem proving is to craft questions where their formal proofs are not available on the web. Our finding suggests that GPT-4's ability is to reproduce, rephrase, and polish the mathematical proofs that it has seen before, and not in grasping mathematical concepts. We also see that GPT-4's ability to prove mathematical theorems is continuously expanding over time despite the claim that it is a fixed model. We suggest that the task of proving mathematical theorems in formal language is comparable to the methods used in search engines such as Google while predicting the next word in a sentence may be a misguided approach, a recipe that often leads to excessive extrapolation and eventual failures. Prompting the GPT-4 over and over may benefit the GPT-4 and the OpenAI, but we question whether it is valuable for machine learning or for theorem proving.

ReIDTracker Sea: the technical report of BoaTrack and SeaDronesSee-MOT challenge at MaCVi of WACV24

  • paper_url: http://arxiv.org/abs/2311.07616
  • repo_url: None
  • paper_authors: Kaer Huang, Weitu Chong
  • for: Solving multi-object tracking in maritime unmanned aerial vehicle (UAV) and unmanned surface vehicle (USV) usage scenarios with a completely unsupervised approach.
  • methods: Uses instance representation learning via self-supervision on ImageNet and cooperates with high-quality detectors to complete the multi-object tracking task simply and efficiently.
  • results: Achieved top-3 performance on both the UAV-based Multi-Object Tracking with Reidentification and USV-based Multi-Object Tracking benchmarks, and has won championships in multiple Multi-Object Tracking competitions, such as BDD100K MOT, MOTS, and Waymo 2D MOT.
    Abstract Multi-Object Tracking is one of the most important technologies in maritime computer vision. Our solution explores Multi-Object Tracking in maritime Unmanned Aerial Vehicle (UAV) and Unmanned Surface Vehicle (USV) usage scenarios. Most current Multi-Object Tracking algorithms require complex association strategies and association information (2D location and motion, 3D motion, 3D depth, 2D appearance) to achieve better performance, which makes the entire tracking system extremely complex and heavy. At the same time, most current Multi-Object Tracking algorithms still require video annotation data, which is costly to obtain for training. Our solution explores Multi-Object Tracking in a completely unsupervised way. The scheme accomplishes instance representation learning by using self-supervision on ImageNet. Then, by cooperating with high-quality detectors, the multi-target tracking task can be completed simply and efficiently. The scheme achieved top-3 performance on both the UAV-based Multi-Object Tracking with Reidentification and USV-based Multi-Object Tracking benchmarks, and the solution has won the championship in multiple Multi-Object Tracking competitions, such as BDD100K MOT, MOTS, and Waymo 2D MOT.

Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

  • paper_url: http://arxiv.org/abs/2311.06753
  • repo_url: None
  • paper_authors: Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer
  • for: Extending an LLM with end-to-end general-purpose speech processing and reasoning abilities without using carefully curated paired data.
  • methods: Audio prompts can replace text while the LLM's broad capabilities are maintained; the model can interchange text and audio modalities and exploit the prior context of a conversation for better results.
  • results: Experiments show the end-to-end approach matches or outperforms a cascaded system (speech recognizer + LLM) in modeling the response to a prompt.
    Abstract In this work, we extend the instruction-tuned Llama-2 model with end-to-end general-purpose speech processing and reasoning abilities while maintaining the wide range of LLM capabilities, without using any carefully curated paired data. The proposed model can utilize audio prompts as a replacement for text and sustain a conversation. Such a model also has extended cross-modal capabilities such as being able to perform speech question answering, speech translation, and audio summarization amongst many other closed and open-domain tasks. This is unlike prior approaches in speech, in which LLMs are extended to handle audio for a limited number of pre-designated tasks. Experiments show that our end-to-end approach is on par with or outperforms a cascaded system (speech recognizer + LLM) in terms of modeling the response to a prompt. Furthermore, unlike a cascade, our approach shows the ability to interchange text and audio modalities and utilize the prior context in a conversation to provide better results.

Federated Learning for Generalization, Robustness, Fairness: A Survey and Benchmark

  • paper_url: http://arxiv.org/abs/2311.06750
  • repo_url: https://github.com/wenkehuang/marsfl
  • paper_authors: Wenke Huang, Mang Ye, Zekun Shi, Guancheng Wan, He Li, Bo Du, Qiang Yang
  • for: This paper provides a systematic overview of recent research developments in federated learning.
  • methods: It reviews three basic lines of research, generalization, robustness, and fairness, introducing their background concepts, task settings, and main challenges.
  • results: The reviewed methods are benchmarked on several well-known datasets, and a public website (https://github.com/WenkeHuang/MarsFL) continuously tracks developments in this fast-advancing field.
    Abstract Federated learning has emerged as a promising paradigm for privacy-preserving collaboration among different parties. Recently, with the popularity of federated learning, an influx of approaches has emerged to address different realistic challenges. In this survey, we provide a systematic overview of the important and recent developments of research on federated learning. Firstly, we introduce the study history and terminology definition of this area. Then, we comprehensively review three basic lines of research: generalization, robustness, and fairness, by introducing their respective background concepts, task settings, and main challenges. We also offer a detailed overview of representative literature on both methods and datasets. We further benchmark the reviewed methods on several well-known datasets. Finally, we point out several open issues in this field and suggest opportunities for further research. We also provide a public website to continuously track developments in this fast advancing field: https://github.com/WenkeHuang/MarsFL.

Aggregate, Decompose, and Fine-Tune: A Simple Yet Effective Factor-Tuning Method for Vision Transformer

  • paper_url: http://arxiv.org/abs/2311.06749
  • repo_url: None
  • paper_authors: Dongping Chen
  • for: To improve fine-tuning of Vision Transformer (ViT) models by addressing inner- and cross-layer redundancy.
  • methods: A simple yet effective fine-tuning method, EFfective Factor-Tuning (EFFT), is proposed.
  • results: On the VTAB-1K benchmark, EFFT surpasses all baselines, attaining a categorical average of 75.9% top-1 accuracy while tuning only 0.28% of the parameters required for full fine-tuning.
    Abstract Recent advancements have illuminated the efficacy of some tensorization-decomposition Parameter-Efficient Fine-Tuning methods like LoRA and FacT in the context of Vision Transformers (ViT). However, these methods inadequately address inner- and cross-layer redundancy. To tackle this issue, we introduce EFfective Factor-Tuning (EFFT), a simple yet effective fine-tuning method. Within the VTAB-1K dataset, our EFFT surpasses all baselines, attaining state-of-the-art performance with a categorical average of 75.9% in top-1 accuracy with only 0.28% of the parameters for full fine-tuning. Considering the simplicity and efficacy of EFFT, it holds the potential to serve as a foundational benchmark. The code and model are now available at https://github.com/Dongping-Chen/EFFT-EFfective-Factor-Tuning.
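EFFT belongs to the same family as LoRA and FacT, so the parameter-efficiency mechanism can be sketched as a frozen weight plus a trainable low-rank factor update. How EFFT aggregates and decomposes factors across layers is more involved; the code below only shows the shared principle, with illustrative sizes.

```python
import torch
import torch.nn as nn

class FactorTunedLinear(nn.Module):
    # Freeze the pretrained weight and learn only a rank-r update U @ V.
    def __init__(self, linear: nn.Linear, rank=4):
        super().__init__()
        self.frozen = linear.requires_grad_(False)
        out_f, in_f = linear.weight.shape
        self.U = nn.Parameter(torch.zeros(out_f, rank))
        self.V = nn.Parameter(torch.randn(rank, in_f) * 0.01)

    def forward(self, x):
        return self.frozen(x) + x @ (self.U @ self.V).T

layer = FactorTunedLinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable, "trainable params vs", 768 * 768 + 768, "for full fine-tuning")
```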

Two Stream Scene Understanding on Graph Embedding

  • paper_url: http://arxiv.org/abs/2311.06746
  • repo_url: None
  • paper_authors: Wenkai Yang, Wenyuan Sun, Runxaing Huang
  • for: To enhance scene understanding in computer vision.
  • methods: A two-stream network architecture is used, with one stream carrying graph features and the other image features; the two are fused to improve performance on image classification and scene graph generation tasks.
  • results: Experiments on the ADE20K dataset show improved image classification accuracy compared to conventional methods.
    Abstract The paper presents a novel two-stream network architecture for enhancing scene understanding in computer vision. This architecture utilizes a graph feature stream and an image feature stream, aiming to merge the strengths of both modalities for improved performance in image classification and scene graph generation tasks. The graph feature stream network comprises a segmentation structure, scene graph generation, and a graph representation module. The segmentation structure employs the UPSNet architecture with a backbone that can be a residual network, Vit, or Swin Transformer. The scene graph generation component focuses on extracting object labels and neighborhood relationships from the semantic map to create a scene graph. Graph Convolutional Networks (GCN), GraphSAGE, and Graph Attention Networks (GAT) are employed for graph representation, with an emphasis on capturing node features and their interconnections. The image feature stream network, on the other hand, focuses on image classification through the use of Vision Transformer and Swin Transformer models. The two streams are fused using various data fusion methods. This fusion is designed to leverage the complementary strengths of graph-based and image-based features. Experiments conducted on the ADE20K dataset demonstrate the effectiveness of the proposed two-stream network in improving image classification accuracy compared to conventional methods. This research provides a significant contribution to the field of computer vision, particularly in the areas of scene understanding and image classification, by effectively combining graph-based and image-based approaches.

Detecting and Correcting Hate Speech in Multimodal Memes with Large Visual Language Model

  • paper_url: http://arxiv.org/abs/2311.06737
  • repo_url: None
  • paper_authors: Minh-Hao Van, Xintao Wu
  • for: This paper explores the application of large visual language models (VLMs) on social media platforms, specifically their ability to detect and correct hateful memes.
  • methods: The pretrained LLaVA model is applied to hateful meme detection and correction tasks via zero-shot prompting.
  • results: Empirical experiments demonstrate the effectiveness of the pretrained LLaVA model on both tasks, while also revealing its strengths and limitations.
    Abstract Recently, large language models (LLMs) have taken the spotlight in natural language processing. Further, integrating LLMs with vision enables users to explore more emergent abilities in multimodality. Visual language models (VLMs), such as LLaVA, Flamingo, or GPT-4, have demonstrated impressive performance on various visio-linguistic tasks. Consequently, there are numerous potential applications of large models on social media platforms. Despite that, there is a lack of related work on detecting or correcting hateful memes with VLMs. In this work, we study the ability of VLMs on hateful meme detection and hateful meme correction tasks with zero-shot prompting. From our empirical experiments, we show the effectiveness of the pretrained LLaVA model and discuss its strengths and weaknesses in these tasks.
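Zero-shot prompting here just means handing the VLM the meme plus an instruction. The templates below are hypothetical (the paper does not publish its exact wording in this abstract) but convey the detection and correction setups:

```python
# Hypothetical prompt templates for a vision-language model such as LLaVA.
DETECT_PROMPT = (
    'You are shown a meme with the overlaid text: "{text}".\n'
    "Does this meme attack a person or group based on protected attributes\n"
    "(e.g., race, religion, gender, disability)? Answer 'hateful' or\n"
    "'not hateful', then briefly justify your answer."
)

CORRECT_PROMPT = (
    "The meme above was judged hateful. Rewrite its overlaid text so the\n"
    "humor is preserved but the hateful content is removed."
)

print(DETECT_PROMPT.format(text="example caption"))
```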

DeepQC: A Deep Learning System for Automatic Quality Control of In-situ Soil Moisture Sensor Time Series Data

  • paper_url: http://arxiv.org/abs/2311.06735
  • repo_url: None
  • paper_authors: Lahari Bandaru, Bharat C Irigireddy, Brian Davis
  • for: To develop a deep learning model for real-time anomaly detection in agricultural soil moisture data, improving data quality and helping farmers manage weather-related risks under a changing climate.
  • methods: A Bi-directional Long Short-Term Memory (LSTM) model, DeepQC, detects anomalies in soil moisture data; manually flagged PSA observations are used for training, validation, and testing, and the model is compared against the Flagit module.
  • results: DeepQC accurately detects anomalies in soil moisture data, with performance that remains consistent regardless of the number of anomalies, and runs in significantly less time; by contrast, Flagit shows clear limitations in identifying anomalies.
    Abstract Amidst a changing climate, real-time soil moisture monitoring is vital for the development of in-season decision support tools to help farmers manage weather-related risks. Precision Sustainable Agriculture (PSA) recently established a real-time soil moisture monitoring network across the central, Midwest, and eastern U.S., but field-scale sensor observations often come with data gaps and anomalies. To maintain the data quality needed for development of decision tools, a quality control system is necessary. The International Soil Moisture Network (ISMN) introduced the Flagit module for anomaly detection in soil moisture observations. However, under certain conditions, Flagit's quality control approaches may underperform in identifying anomalies. Recently, deep learning methods have been successfully applied to detect anomalies in time series data in various disciplines. However, their use in agriculture has not yet been investigated. This study focuses on developing a Bi-directional Long Short-Term Memory (LSTM) model, referred to as DeepQC, to identify anomalies in soil moisture data. Manually flagged PSA observations were used for training, validation, and testing the model, following an 80:10:10 split. The study then compared the DeepQC and Flagit based estimates to assess their relative performance. Flagit correctly flagged 95.5% of the correct observations but only 50.3% of the anomalous observations, indicating its limitations in identifying anomalies. On the other hand, DeepQC correctly flagged 99.7% of the correct observations and 95.6% of the anomalies in significantly less time, demonstrating its superiority over the Flagit approach. Importantly, DeepQC's performance remained consistent regardless of the number of anomalies. Given the promising results obtained with DeepQC, future studies will focus on implementing this model on national and global soil moisture networks.
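A minimal bidirectional-LSTM flagger in the spirit of DeepQC is sketched below; layer sizes are placeholders and the paper's architecture details may differ.

```python
import torch
import torch.nn as nn

class DeepQCSketch(nn.Module):
    # Each time step of a soil-moisture series receives a correct/anomalous
    # score from a Bi-LSTM followed by a linear head.
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # classes: correct vs. anomaly

    def forward(self, x):                     # x: (batch, time, 1)
        out, _ = self.lstm(x)
        return self.head(out)                 # (batch, time, 2) logits

logits = DeepQCSketch()(torch.randn(4, 96, 1))  # e.g., a day of 15-min readings
print(logits.shape)
```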

An advantage based policy transfer algorithm for reinforcement learning with metrics of transferability

  • paper_url: http://arxiv.org/abs/2311.06731
  • repo_url: None
  • paper_authors: Md Ferdous Alam, Parinaz Naghizadeh, David Hoelzle
  • for: This paper proposes APT-RL, an off-policy, advantage-based policy transfer algorithm for fixed-domain environments.
  • methods: The notion of advantage is used as a regularizer to weigh the knowledge transferred from the source environment against new knowledge learned in the target, removing the need for heuristic choices; a new transfer performance metric is also proposed for evaluating transfer RL algorithms.
  • results: Numerical experiments on three continuous-control benchmark tasks show that APT-RL outperforms existing transfer RL algorithms on most tasks and is 10% to 75% more sample efficient than learning from scratch.
    Abstract Reinforcement learning (RL) can enable sequential decision-making in complex and high-dimensional environments if the acquisition of a new state-action pair is efficient, i.e., when interaction with the environment is inexpensive. However, there are a myriad of real-world applications in which a high number of interactions are infeasible. In these environments, transfer RL algorithms, which can be used for the transfer of knowledge from one or multiple source environments to a target environment, have been shown to increase learning speed and improve initial and asymptotic performance. However, most existing transfer RL algorithms are on-policy and sample inefficient, and often require heuristic choices in algorithm design. This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed domain environments. Its novelty is in using the popular notion of ``advantage'' as a regularizer, to weigh the knowledge that should be transferred from the source, relative to new knowledge learned in the target, removing the need for heuristic choices. Further, we propose a new transfer performance metric to evaluate the performance of our algorithm and unify existing transfer RL frameworks. Finally, we present a scalable, theoretically-backed task similarity measurement algorithm to illustrate the alignments between our proposed transferability metric and similarities between source and target environments. Numerical experiments on three continuous control benchmark tasks demonstrate that APT-RL outperforms existing transfer RL algorithms on most tasks, and is $10\%$ to $75\%$ more sample efficient than learning from scratch.
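One way to picture an advantage-weighted transfer term is sketched below: source-policy actions are imitated only in proportion to their estimated advantage in the target task. This is a reading of the abstract's idea with assumed names and scaling, not the paper's exact objective.

```python
import numpy as np

def apt_weighted_loss(target_loss, source_logp, advantage, beta=0.5):
    # Imitate the source policy only where its actions look advantageous in
    # the target environment; `beta` (assumed) scales the transfer term.
    transfer_term = -(np.maximum(advantage, 0.0) * source_logp).mean()
    return target_loss + beta * transfer_term

adv = np.array([0.8, -0.2, 1.5])     # estimated advantages in the target env
logp = np.array([-0.1, -2.3, -0.5])  # source-policy log-probs of those actions
print(apt_weighted_loss(1.2, logp, adv))
```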

Enabling Human-Centered AI: A Methodological Perspective

  • paper_url: http://arxiv.org/abs/2311.06703
  • repo_url: None
  • paper_authors: Wei Xu, Zaifeng Gao
  • for: This paper proposes a comprehensive HCAI framework, covering design goals, design principles, implementation approaches, interdisciplinary teams, HCAI methods, and HCAI processes, to support the adoption of HCAI in practice.
  • methods: A 'three-layer' approach is proposed to facilitate the implementation of the framework.
  • results: The authors argue that the framework can overcome the weaknesses of existing HCAI frameworks and the challenges currently faced in practice, helping put HCAI into action.
    Abstract Human-centered AI (HCAI) is a design philosophy that advocates prioritizing humans in designing, developing, and deploying intelligent systems, aiming to maximize the benefits of AI to humans and avoid potential adverse impacts. While HCAI continues to influence, the lack of guidance on methodology in practice makes its adoption challenging. This paper proposes a comprehensive HCAI framework based on our previous work with integrated components, including design goals, design principles, implementation approaches, interdisciplinary teams, HCAI methods, and HCAI processes. This paper also presents a "three-layer" approach to facilitate the implementation of the framework. We believe this systematic and executable framework can overcome the weaknesses in current HCAI frameworks and the challenges currently faced in practice, putting it into action to enable HCAI further.

An Investigation of Hepatitis B Virus Genome using Markov Models

  • paper_url: http://arxiv.org/abs/2311.06699
  • repo_url: None
  • paper_authors: Khadijeh Jahanian, Elnaz Shalbafian, Morteza Saberi, Roohallah Alizadehsani, Iman Dehzangi
  • for: The study aims to identify the mutational footprint of APOBEC3 enzymes in the HBV genome.
  • methods: A multivariable data analytics technique is applied to full-genome HBV sequences from a diverse range of naturally infected patients, distinguishing normal from hypermutated sequences by the representation of mono- to tetra-nucleotide motifs.
  • results: The analyses indicate that either APOBEC3 enzymes are not active against HBV, or the induction of G-to-A mutations by these enzymes is not sequence context-dependent in the HBV genome, in contrast to the context-dependent hypermutation observed in HIV.
    Abstract The human genome encodes a family of editing enzymes known as APOBEC3 (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3). Several family members, such as APOBEC3G, APOBEC3F, and APOBEC3H haplotype II, exhibit activity against viruses such as HIV. These enzymes induce C-to-U mutations in the negative strand of viral genomes, resulting in multiple G-to-A changes, commonly referred to as 'hypermutation.' Mutations catalyzed by these enzymes are sequence context-dependent in the HIV genome; for instance, APOBEC3G preferentially mutates G within GG, TGG, and TGGG contexts, while other members mutate G within GA, TGA, and TGAA contexts. However, the same sequence context has not been explored in relation to these enzymes and HBV. In this study, our objective is to identify the mutational footprint of APOBEC3 enzymes in the HBV genome. To achieve this, we employ a multivariable data analytics technique to investigate motif preferences and potential sequence hierarchies of mutation by APOBEC3 enzymes using full genome HBV sequences from a diverse range of naturally infected patients. This approach allows us to distinguish between normal and hypermutated sequences based on the representation of mono- to tetra-nucleotide motifs. Additionally, we aim to identify motifs associated with hypermutation induced by different APOBEC3 enzymes in HBV genomes. Our analyses reveal that either APOBEC3 enzymes are not active against HBV, or the induction of G-to-A mutations by these enzymes is not sequence context-dependent in the HBV genome.
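The motif-representation analysis described above can be pictured with a small sketch: count the relative frequency of every mono- to tetra-nucleotide motif in a sequence so that normal and hypermutated genomes can be compared. The toy sequence and helper names are invented for illustration.

```python
from collections import Counter
from itertools import product

def motif_frequencies(seq, max_k=4):
    """Relative frequencies of all 1- to max_k-mers over the A/C/G/T alphabet."""
    seq = seq.upper()
    freqs = {}
    for k in range(1, max_k + 1):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        total = max(sum(counts.values()), 1)
        for motif in ("".join(p) for p in product("ACGT", repeat=k)):
            freqs[motif] = counts.get(motif, 0) / total
    return freqs

hbv_fragment = "TGGGATTGAAGTCAGGTGGAGCATTGGA"  # toy stand-in for a real genome
profile = motif_frequencies(hbv_fragment)
# e.g., compare the APOBEC3G-preferred GG/TGG/TGGG contexts across sequences
print({m: round(profile[m], 3) for m in ("GG", "TGG", "TGGG")})
```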

Conversational Data Exploration: A Game-Changer for Designing Data Science Pipelines

  • paper_url: http://arxiv.org/abs/2311.06695
  • repo_url: None
  • paper_authors: Genoveva Vargas-Solar, Tania Cerquitelli, Javier A. Espinosa-Oviedo, François Cheval, Anthelme Buchaille, Luca Polgar
  • for: This paper proposes a conversational approach, implemented by the system Chatin, for driving an intuitive data exploration experience.
  • methods: A new generation of data science tooling guides non-technical users from various disciplines through data exploration via dialogue, helping them extract knowledge from data.
  • results: The conversational interface democratises access to AI-driven solutions, giving non-technical users an intuitive way to explore and understand their data.
    Abstract This paper proposes a conversational approach implemented by the system Chatin for driving an intuitive data exploration experience. Our work aims to unlock the full potential of data analytics and artificial intelligence with a new generation of data science solutions. Chatin is a cutting-edge tool that democratises access to AI-driven solutions, empowering non-technical users from various disciplines to explore data and extract knowledge from it.

Comparative Multi-View Language Grounding

  • paper_url: http://arxiv.org/abs/2311.06694
  • repo_url: None
  • paper_authors: Chancharik Mitra, Abrar Anwar, Rodolfo Corona, Dan Klein, Trevor Darrell, Jesse Thomason
  • for: Resolving object referents when given a comparative language description
  • methods: Uses transformers to pragmatically reason over multiple image views of the candidate objects jointly with the referring language expression
  • results: Comparative reasoning contributes to SOTA performance on the SNARE object reference task
    Abstract In this work, we consider the task of resolving object referents when given a comparative language description. We present a Multi-view Approach to Grounding in Context (MAGiC) that leverages transformers to pragmatically reason over both objects given multiple image views and a language description. In contrast to past efforts that attempt to connect vision and language for this task without fully considering the resulting referential context, MAGiC makes use of the comparative information by jointly reasoning over multiple views of both object referent candidates and the referring language expression. We present an analysis demonstrating that comparative reasoning contributes to SOTA performance on the SNARE object reference task.

cs.CL - 2023-11-12

SELF-EXPLAIN: Teaching Large Language Models to Reason Complex Questions by Themselves

  • paper_url: http://arxiv.org/abs/2311.06985
  • repo_url: None
  • paper_authors: Jiachen Zhao, Zonghai Yao, Zhichao Yang, Hong Yu
  • for: Improving the reliable reasoning ability of large language models
  • methods: Uses self-generated explanations as chain-of-thought (CoT) demonstrations
  • results: Self-explanations make LLMs more confident, better calibrated, and less biased when answering complex questions, matching or even significantly outperforming human-crafted CoT examples on several complex question answering datasets
    Abstract Large language models (LLMs) can generate intermediate reasoning steps. To elicit the reliable reasoning, the common practice is to employ few-shot chain-of-thought prompting, where several in-context demonstrations for reasoning are prepended to the question. However, such chain-of-thought examples are expensive to craft, especially for professional domains, and can have high variance depending on human annotators. Therefore, this work investigates whether LLMs can teach themselves to reason without human-crafted demonstrations. We propose SELF-EXPLAIN to generate CoT examples by LLMs inspired by "encoding specificity" in human memory retrieval. We find using self-explanations makes LLMs more confident, more calibrated and less biased when answering complex questions. Moreover, we find prompting with self-explanations can even significantly outperform using human-crafted CoTs on several complex question answering dataset.
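The loop implied by the abstract — have the model explain a known answer, then reuse that self-explanation as an in-context CoT demonstration — can be sketched as below. `generate` is a stand-in for any LLM completion call, and the prompt templates are paraphrased assumptions, not the paper's.

```python
def generate(prompt: str) -> str:
    # Placeholder for an LLM completion API; returns a canned string here.
    return "Add the two numbers: 3 plus 4 gives 7."

def build_self_explained_demo(question: str, gold_answer: str) -> str:
    """Ask the model to explain a known answer; keep the explanation as a demo."""
    explanation = generate(
        f"Question: {question}\nAnswer: {gold_answer}\n"
        "Explain step by step why this answer is correct:")
    return f"Question: {question}\nReasoning: {explanation}\nAnswer: {gold_answer}"

def answer_with_demos(demos, new_question: str) -> str:
    prompt = "\n\n".join(demos) + f"\n\nQuestion: {new_question}\nReasoning:"
    return generate(prompt)

demo = build_self_explained_demo("What is 3 + 4?", "7")
print(answer_with_demos([demo], "What is 5 + 9?"))
```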

Retrieval and Generative Approaches for a Pregnancy Chatbot in Nepali with Stemmed and Non-Stemmed Data : A Comparative Study

  • paper_url: http://arxiv.org/abs/2311.06898
  • repo_url: None
  • paper_authors: Sujan Poudel, Nabin Ghimire, Bipesh Subedi, Saugat Singh
  • for: This study develops a health-domain chatbot that provides pregnancy-related information using natural language processing (NLP) techniques.
  • methods: Two NLP-based approaches are compared: a multiclass-classification retrieval approach built on multilingual BERT and multilingual DistilBERT, and a transformer-based generative chatbot.
  • results: BERT-based pre-trained models perform well on non-stemmed data, while scratch-trained transformer models do better on stemmed data. DistilBERT achieved the highest training and validation accuracy, with a testing accuracy of 0.9165 in the retrieval setting on non-stemmed data; the generative approach reached 1-gram and 2-gram BLEU scores of 0.3570 and 0.1413, respectively.
    Abstract The field of Natural Language Processing which involves the use of artificial intelligence to support human languages has seen tremendous growth due to its high-quality features. Its applications such as language translation, chatbots, virtual assistants, search autocomplete, and autocorrect are widely used in various domains including healthcare, advertising, customer service, and target advertising. To provide pregnancy-related information a health domain chatbot has been proposed and this work explores two different NLP-based approaches for developing the chatbot. The first approach is a multiclass classification-based retrieval approach using BERT-based multilingual BERT and multilingual DistilBERT while the other approach employs a transformer-based generative chatbot for pregnancy-related information. The performance of both stemmed and non-stemmed datasets in Nepali language has been analyzed for each approach. The experimental results indicate that BERT-based pre-trained models perform well on non-stemmed data whereas scratch transformer models have better performance on stemmed data. Among the models tested the DistilBERT model achieved the highest training and validation accuracy and testing accuracy of 0.9165 on the retrieval-based model architecture implementation on the non-stemmed dataset. Similarly, in the generative approach architecture implementation with transformer 1 gram BLEU and 2 gram BLEU scores of 0.3570 and 0.1413 respectively were achieved.
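For reference, the reported 1-gram and 2-gram BLEU scores can be computed with NLTK roughly as follows; the Nepali sentence pair is invented for illustration.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["गर्भवती", "महिलाले", "फलामयुक्त", "खाना", "खानुपर्छ"]]
hypothesis = ["गर्भवती", "महिलाले", "फलामयुक्त", "खाना", "खानु"]

smooth = SmoothingFunction().method1
bleu1 = sentence_bleu(reference, hypothesis, weights=(1, 0, 0, 0),
                      smoothing_function=smooth)
bleu2 = sentence_bleu(reference, hypothesis, weights=(0.5, 0.5, 0, 0),
                      smoothing_function=smooth)
print(f"1-gram BLEU: {bleu1:.4f}, 2-gram BLEU: {bleu2:.4f}")
```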

DialMAT: Dialogue-Enabled Transformer with Moment-Based Adversarial Training

  • paper_url: http://arxiv.org/abs/2311.06855
  • repo_url: https://github.com/keio-smilab23/dialmat
  • paper_authors: Kanta Kaneda, Ryosuke Korekata, Yuiga Wada, Shunya Nagashima, Motonari Kambara, Yui Iioka, Haruka Matsuo, Yuto Imai, Takayuki Nishimura, Komei Sugiura
  • for: This study targets the DialFRED task: embodied instruction following in a setting where the agent can actively ask questions about the task.
  • methods: The proposed DialMAT applies moment-based adversarial training, adding adversarial perturbations to the latent spaces of language, image, and action, and introduces a crossmodal parallel feature extraction mechanism that applies foundation models to both language and image.
  • results: Evaluated on a dataset constructed from the DialFRED dataset, the model outperforms a baseline method in success rate and path-weighted success rate, and took first place in the DialFRED Challenge held at the CVPR 2023 Embodied AI workshop.
    Abstract This paper focuses on the DialFRED task, which is the task of embodied instruction following in a setting where an agent can actively ask questions about the task. To address this task, we propose DialMAT. DialMAT introduces Moment-based Adversarial Training, which incorporates adversarial perturbations into the latent space of language, image, and action. Additionally, it introduces a crossmodal parallel feature extraction mechanism that applies foundation models to both language and image. We evaluated our model using a dataset constructed from the DialFRED dataset and demonstrated superior performance compared to the baseline method in terms of success rate and path weighted success rate. The model secured the top position in the DialFRED Challenge, which took place at the CVPR 2023 Embodied AI workshop.
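A minimal sketch of adversarial training on latent features, in the spirit of DialMAT's perturbations of the language, image, and action latent spaces. This uses a plain FGSM-style perturbation; the paper's moment-based scheduling of perturbation strength is not reproduced here, and all names are illustrative.

```python
import torch

def perturb_latent(latent, loss_fn, epsilon=0.05):
    """Return latent + epsilon * sign(grad of loss w.r.t. latent)."""
    latent = latent.detach().requires_grad_(True)
    loss = loss_fn(latent)
    grad, = torch.autograd.grad(loss, latent)
    return (latent + epsilon * grad.sign()).detach()

# Toy usage: a linear head standing in for the downstream action predictor.
head = torch.nn.Linear(16, 4)
z = torch.randn(8, 16)                      # e.g., fused language-image features
target = torch.randint(0, 4, (8,))
ce = torch.nn.CrossEntropyLoss()

z_adv = perturb_latent(z, lambda v: ce(head(v), target))
loss = ce(head(z_adv), target)              # train the head on perturbed latents
loss.backward()
```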

Automatic Textual Normalization for Hate Speech Detection

  • paper_url: http://arxiv.org/abs/2311.06851
  • repo_url: https://github.com/anhhoang0529/small-lexnormvihsd
  • paper_authors: Anh Thi-Hoang Nguyen, Dung Ha Nguyen, Nguyet Thi Nguyen, Khanh Thanh-Duy Ho, Kiet Van Nguyen
  • for: This work aims to improve the handling of non-standard words (NSW) in social media data so that downstream natural language processing (NLP) tasks perform better.
  • methods: A single sequence-to-sequence (Seq2Seq) model performs textual normalization; the authors also provide a dataset of 2,181 human-annotated comments with an inter-annotator agreement of 0.9014.
  • results: Seq2Seq normalization reaches an accuracy slightly below 70% and improves the accuracy of the downstream Hate Speech Detection (HSD) task by approximately 2%, suggesting it can lift the performance of complex NLP tasks. The dataset is available for research purposes.
    Abstract Social media data is a valuable resource for research, yet it contains a wide range of non-standard words (NSW). These irregularities hinder the effective operation of NLP tools. Current state-of-the-art methods for the Vietnamese language address this issue as a problem of lexical normalization, involving the creation of manual rules or the implementation of multi-staged deep learning frameworks, which necessitate extensive efforts to craft intricate rules. In contrast, our approach is straightforward, employing solely a sequence-to-sequence (Seq2Seq) model. In this research, we provide a dataset for textual normalization, comprising 2,181 human-annotated comments with an inter-annotator agreement of 0.9014. By leveraging the Seq2Seq model for textual normalization, our results reveal that the accuracy achieved falls slightly short of 70%. Nevertheless, textual normalization enhances the accuracy of the Hate Speech Detection (HSD) task by approximately 2%, demonstrating its potential to improve the performance of complex NLP tasks. Our dataset is accessible for research purposes.
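The single-model setup can be sketched with Hugging Face Transformers: a pretrained encoder-decoder maps a noisy comment to its normalized form. The checkpoint name is an assumed Vietnamese-capable Seq2Seq model (not the paper's artifact) and the example comment is invented; in practice the model would first be fine-tuned on the 2,181 annotated pairs.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "VietAI/vit5-base"  # assumed checkpoint; swap in the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

noisy = "ko bik j lun"  # toy comment full of non-standard words
inputs = tokenizer(noisy, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```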

GIELLM: Japanese General Information Extraction Large Language Model Utilizing Mutual Reinforcement Effect

  • paper_url: http://arxiv.org/abs/2311.06838
  • repo_url: None
  • paper_authors: Chengguang Gan, Qinghao Zhang, Tatsunori Mori
  • for: This work develops a General Information Extraction Large Language Model (GIELLM) that handles multiple NLP sub-tasks simultaneously, improving over specialized single-task models.
  • methods: A uniform input-output schema integrates text classification, sentiment analysis, named entity recognition, relation extraction, and event extraction, exploiting the Mutual Reinforcement Effect (MRE) so that integrated tasks improve each other.
  • results: GIELLM achieves State-of-the-Art (SOTA) results on five of six Japanese mixed datasets, significantly surpassing GPT-3.5-Turbo; an independent evaluation on the new Text Classification Relation and Event Extraction (TCREE) dataset corroborates the synergistic advantages of MRE. This suggests most IE sub-tasks can be subsumed under a single LLM framework, without specialized fine-tuned task-specific models.
    Abstract Information Extraction (IE) stands as a cornerstone in natural language processing, traditionally segmented into distinct sub-tasks. The advent of Large Language Models (LLMs) heralds a paradigm shift, suggesting the feasibility of a singular model addressing multiple IE subtasks. In this vein, we introduce the General Information Extraction Large Language Model (GIELLM), which integrates text Classification, Sentiment Analysis, Named Entity Recognition, Relation Extraction, and Event Extraction using a uniform input-output schema. This innovation marks the first instance of a model simultaneously handling such a diverse array of IE subtasks. Notably, the GIELLM leverages the Mutual Reinforcement Effect (MRE), enhancing performance in integrated tasks compared to their isolated counterparts. Our experiments demonstrate State-of-the-Art (SOTA) results in five out of six Japanese mixed datasets, significantly surpassing GPT-3.5-Turbo. Further, an independent evaluation using the novel Text Classification Relation and Event Extraction(TCREE) dataset corroborates the synergistic advantages of MRE in text and word classification. This breakthrough paves the way for most IE subtasks to be subsumed under a singular LLM framework. Specialized fine-tune task-specific models are no longer needed.

Cricket Player Profiling: Unraveling Strengths and Weaknesses Using Text Commentary Data

  • paper_url: http://arxiv.org/abs/2311.06818
  • repo_url: None
  • paper_authors: Swarup Ranjan Behera, Vijaya V. Saradhi
  • for: This paper aims to develop computational models that extract the rules governing cricket players' strengths and weaknesses, with the goal of devising player-specific strategies.
  • methods: Unstructured data from cricket text commentary is used to construct comprehensive strength and weakness rules for cricket players, with a dimensionality reduction technique to simplify the rule-building process.
  • results: An in-depth analysis of player strengths and weaknesses over a corpus of more than one million text commentaries, with the constructed rules validated through two distinct methodologies, intrinsic and extrinsic. The collected data, source code, and results for over 250 cricket players are openly accessible.
    Abstract Devising player-specific strategies in cricket necessitates a meticulous understanding of each player's unique strengths and weaknesses. Nevertheless, the absence of a definitive computational approach to extract such insights from cricket players poses a significant challenge. This paper seeks to address this gap by establishing computational models designed to extract the rules governing player strengths and weaknesses, thereby facilitating the development of tailored strategies for individual players. The complexity of this endeavor lies in several key areas: the selection of a suitable dataset, the precise definition of strength and weakness rules, the identification of an appropriate learning algorithm, and the validation of the derived rules. To tackle these challenges, we propose the utilization of unstructured data, specifically cricket text commentary, as a valuable resource for constructing comprehensive strength and weakness rules for cricket players. We also introduce computationally feasible definitions for the construction of these rules, and present a dimensionality reduction technique for the rule-building process. In order to showcase the practicality of this approach, we conduct an in-depth analysis of cricket player strengths and weaknesses using a vast corpus of more than one million text commentaries. Furthermore, we validate the constructed rules through two distinct methodologies: intrinsic and extrinsic. The outcomes of this research are made openly accessible, including the collected data, source code, and results for over 250 cricket players, which can be accessed at https://bit.ly/2PKuzx8.

Evaluation of GPT-4 for chest X-ray impression generation: A reader study on performance and perception

  • paper_url: http://arxiv.org/abs/2311.06815
  • repo_url: None
  • paper_authors: Sebastian Ziegelmayer, Alexander W. Marka, Nicolas Lenhart, Nadja Nehls, Stefan Reischl, Felix Harder, Andreas Sauter, Marcus Makowski, Markus Graf, Joshua Gawlitza
  • for: This study examines whether the GPT-4 model can generate high-quality chest X-ray impressions.
  • methods: GPT-4 was given three input modalities (image, findings text, or both) and asked to generate an impression for each. Four radiologists then blindly rated the impressions and classified each as human-written or AI-generated, providing justification for their decision.
  • results: Human-written impressions were rated highest, though not significantly better than text-based impressions. Automatic evaluation metrics correlated moderately to substantially with the radiological score but diverged strongly across inputs, and the detection rate of AI-generated impressions varied by input (61% for text-based ones). Impressions classified as AI-generated received significantly worse scores even when actually written by a radiologist, indicating potential bias.
    Abstract The remarkable generative capabilities of multimodal foundation models are currently being explored for a variety of applications. Generating radiological impressions is a challenging task that could significantly reduce the workload of radiologists. In our study we explored and analyzed the generative abilities of GPT-4 for Chest X-ray impression generation. To generate and evaluate impressions of chest X-rays based on different input modalities (image, text, text and image), a blinded radiological report was written for 25 cases of the publicly available NIH-dataset. GPT-4 was given image, finding section or both sequentially to generate an input dependent impression. In a blind randomized reading, 4 radiologists rated the impressions and were asked to classify the impression origin (Human, AI), providing justification for their decision. Lastly, text model evaluation metrics and their correlation with the radiological score (summation of the 4 dimensions) were assessed. According to the radiological score, the human-written impression was rated highest, although not significantly different to text-based impressions. The automated evaluation metrics showed moderate to substantial correlations to the radiological score for the image impressions, however individual scores were highly divergent among inputs, indicating insufficient representation of radiological quality. Detection of AI-generated impressions varied by input and was 61% for text-based impressions. Impressions classified as AI-generated had significantly worse radiological scores even when written by a radiologist, indicating potential bias. Our study revealed significant discrepancies between a radiological assessment and common automatic evaluation metrics depending on the model input. The detection of AI-generated findings is subject to bias that highly rated impressions are perceived as human-written.
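The metric-versus-reader analysis can be pictured with a small sketch: correlate an automatic text metric with the summed radiological score across cases. The numbers below are invented, and Spearman correlation is one reasonable choice; the paper's exact statistical procedure is not specified here.

```python
from scipy.stats import spearmanr

metric_scores = [0.61, 0.42, 0.77, 0.55, 0.30]   # e.g., an automatic text metric
radiological = [14, 9, 16, 12, 7]                # summed 4-dimension reader score

rho, p = spearmanr(metric_scores, radiological)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```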

On the Robustness of Question Rewriting Systems to Questions of Varying Hardness

  • paper_url: http://arxiv.org/abs/2311.06807
  • repo_url: https://github.com/nusnlp/diffqre
  • paper_authors: Hai Ye, Hwee Tou Ng, Wenjuan Han
  • for: This paper studies the robustness of question rewriting (QR) systems to questions of varying rewriting hardness.
  • methods: A heuristic method automatically classifies questions into subsets of varying hardness by measuring the discrepancy between a question and its rewrite; a human evaluation then annotates rewriting hardness. Finally, a new learning framework trains a QR model independently on each hardness subset before combining these models into one joint model for inference.
  • results: Experiments on two datasets show the proposed framework improves overall rewriting performance over the baselines.
    Abstract In conversational question answering (CQA), the task of question rewriting~(QR) in context aims to rewrite a context-dependent question into an equivalent self-contained question that gives the same answer. In this paper, we are interested in the robustness of a QR system to questions varying in rewriting hardness or difficulty. Since there is a lack of questions classified based on their rewriting hardness, we first propose a heuristic method to automatically classify questions into subsets of varying hardness, by measuring the discrepancy between a question and its rewrite. To find out what makes questions hard or easy for rewriting, we then conduct a human evaluation to annotate the rewriting hardness of questions. Finally, to enhance the robustness of QR systems to questions of varying hardness, we propose a novel learning framework for QR that first trains a QR model independently on each subset of questions of a certain level of hardness, then combines these QR models as one joint model for inference. Experimental results on two datasets show that our framework improves the overall performance compared to the baselines.
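The hardness heuristic — measure the discrepancy between a question and its rewrite and bucket accordingly — might look like the sketch below. The Jaccard-style distance and the thresholds are illustrative assumptions, not the paper's definition.

```python
def rewrite_hardness(question: str, rewrite: str) -> str:
    """Bucket a question by how much its self-contained rewrite differs from it."""
    q, r = set(question.lower().split()), set(rewrite.lower().split())
    discrepancy = 1.0 - len(q & r) / max(len(q | r), 1)
    if discrepancy < 0.3:
        return "easy"
    if discrepancy < 0.6:
        return "medium"
    return "hard"

print(rewrite_hardness("What about his brother?",
                       "What did Mozart's brother compose?"))  # -> hard
```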

Tunable Soft Prompts are Messengers in Federated Learning

  • paper_url: http://arxiv.org/abs/2311.06805
  • repo_url: https://github.com/alibaba/federatedscope
  • paper_authors: Chenhe Dong, Yuexiang Xie, Bolin Ding, Ying Shen, Yaliang Li
  • for: This paper proposes a new federated learning (FL) training approach that protects model privacy while improving FL efficiency.
  • methods: Tunable soft prompts are updated and exchanged between the server and clients; they take the role of the global model parameters and act as messengers, delivering useful knowledge from the local data and the global model, while local training runs on an auxiliary model with fewer parameters.
  • results: Experiments against multiple baselines show the effectiveness of the approach, which reduces communication and computation costs in FL while keeping the global model private.
    Abstract Federated learning (FL) enables multiple participants to collaboratively train machine learning models using decentralized data sources, alleviating privacy concerns that arise from directly sharing local data. However, the lack of model privacy protection in FL becomes an unneglectable challenge, especially when people want to federally finetune models based on a proprietary large language model. In this study, we propose a novel FL training approach that accomplishes information exchange among participants via tunable soft prompts. These soft prompts, updated and transmitted between the server and clients, assume the role of the global model parameters and serve as messengers to deliver useful knowledge from the local data and global model. As the global model itself is not required to be shared and the local training is conducted based on an auxiliary model with fewer parameters than the global model, the proposed approach provides protection for the global model while reducing communication and computation costs in FL. Extensive experiments show the effectiveness of the proposed approach compared to several baselines. We have released the source code at \url{https://github.com/alibaba/FederatedScope/tree/fedsp/federatedscope/nlp/fedsp}.
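A toy sketch of the message flow: only the tunable soft-prompt tensor travels between server and clients, never the global model itself. The FedAvg-style averaging, the shapes, and the toy client objectives are simplifying assumptions, not the paper's exact protocol.

```python
import torch

PROMPT_LEN, HIDDEN = 16, 768

def client_update(server_prompt, local_objective, lr=0.1):
    """One local step on private data; only the updated prompt is returned."""
    prompt = server_prompt.clone().requires_grad_(True)
    loss = local_objective(prompt)     # forward through a frozen auxiliary model
    loss.backward()
    return (prompt - lr * prompt.grad).detach()

server_prompt = torch.zeros(PROMPT_LEN, HIDDEN)
for round_idx in range(3):
    updates = [client_update(server_prompt, lambda p: (p - c).pow(2).mean())
               for c in (0.5, -0.2, 0.1)]  # stand-ins for private objectives
    server_prompt = torch.stack(updates).mean(dim=0)  # aggregate prompts only
print(server_prompt.mean().item())
```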

CLAMP: A Contrastive Language And Molecule Pre-training Network

  • paper_url: http://arxiv.org/abs/2311.07617
  • repo_url: https://github.com/neelr/clamp
  • paper_authors: Neel Redkar
  • for: This paper explores a new approach to material generation: a language-to-material architecture that taps millions of previously unused data points.
  • methods: A contrastive model is trained on crystal-text pairs collected from open-source research papers with a web scraper, pairing a convolutional graph neural network encoder with a language encoder. This enables unsupervised zero-shot classification that exploits linguistic structure.
  • results: With no task-specific training data, the model reaches roughly 82% accuracy overall and roughly 75% accuracy on photocatalyst prediction from an extremely small dataset. The network could in principle be applied to any reaction describable in text, opening entirely new ways to think about 3D chemical framework generation.
    Abstract This paper highlights a shift in how to approach material generation. Instead of material-to-material, we propose a language-to-material generation architecture that utilizes millions of untapped data points. Using a web scraper to collect crystal text pairs from open-source research papers, a contrastive model can be trained using a convolutional graph neural network encoder and a language encoder. This would allow unsupervised zero-shot classification which can be trained by taking advantage of linguistic structure. Without any specific training data, an ~82\% accuracy was achieved and ~75\% accuracy for photocatalyst prediction with an extremely small dataset. This novel network could ideally be cross-applied to any reaction that can be described via text, opening completely new methods to think about 3D chemical framework generation. In the full experiment diffusion models would likely be incorporated to fully exploit the latent space.
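The contrastive objective pairing a crystal-graph embedding with its text embedding is presumably CLIP-style; a sketch under that assumption follows, with linear layers standing in for the convolutional graph network and the language encoder.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(graph_emb, text_emb, temperature=0.07):
    g = F.normalize(graph_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = g @ t.T / temperature        # (batch, batch) similarity matrix
    labels = torch.arange(len(g))         # matching pairs lie on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

graph_encoder = torch.nn.Linear(64, 128)   # stand-in for the conv graph NN
text_encoder = torch.nn.Linear(300, 128)   # stand-in for the language encoder
loss = contrastive_loss(graph_encoder(torch.randn(8, 64)),
                        text_encoder(torch.randn(8, 300)))
loss.backward()
```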

Learning Knowledge-Enhanced Contextual Language Representations for Domain Natural Language Understanding

  • paper_url: http://arxiv.org/abs/2311.06761
  • repo_url: None
  • paper_authors: Ruyao Xu, Taolin Zhang, Chengyu Wang, Zhongjie Duan, Cen Chen, Minghui Qiu, Dawei Cheng, Xiaofeng He, Weining Qian
  • for: Improving performance on closed-domain NLP tasks, both knowledge-aware and general
  • methods: Exploits the implicit graph structure among entities in the knowledge graph, fusing shallow relational representations of triples with hyperbolic embeddings of deep hierarchical entity-class structures, and uses contrastive learning over subgraphs to construct higher-quality hard negative samples for training
  • results: Significantly outperforms other KEPLM training paradigms on closed-domain NLP tasks, in both full-shot and few-shot learning settings
    Abstract Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the performance of various downstream NLP tasks by injecting knowledge facts from large-scale Knowledge Graphs (KGs). However, existing methods for pre-training KEPLMs with relational triples are difficult to adapt to closed domains due to the lack of sufficient domain graph semantics. In this paper, we propose a Knowledge-enhanced lANGuAge Representation learning framework for various clOsed dOmains (KANGAROO) via capturing the implicit graph structure among the entities. Specifically, since the entity coverage rates of closed-domain KGs can be relatively low and may exhibit the global sparsity phenomenon for knowledge injection, we consider not only the shallow relational representations of triples but also the hyperbolic embeddings of deep hierarchical entity-class structures for effective knowledge fusion. Moreover, as two closed-domain entities under the same entity-class often have locally dense neighbor subgraphs counted by max point biconnected component, we further propose a data augmentation strategy based on contrastive learning over subgraphs to construct hard negative samples of higher quality. It makes the underlying KEPLMs better distinguish the semantics of these neighboring entities to further complement the global semantic sparsity. In the experiments, we evaluate KANGAROO over various knowledge-aware and general NLP tasks in both full and few-shot learning settings, significantly outperforming various KEPLM training paradigms in closed domains.
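The hyperbolic embeddings mentioned above typically live on the Poincare ball, where distance grows rapidly near the boundary and hierarchies embed naturally. A sketch of the standard Poincare distance, with toy vectors:

```python
import torch

def poincare_distance(u, v, eps=1e-6):
    """Distance on the Poincare ball (all points must have norm < 1)."""
    sq = lambda t: t.pow(2).sum(-1)
    num = 2 * sq(u - v)
    den = (1 - sq(u)).clamp_min(eps) * (1 - sq(v)).clamp_min(eps)
    return torch.acosh(1 + num / den)

parent = torch.tensor([0.1, 0.0])   # near the origin: a broad entity class
child = torch.tensor([0.7, 0.5])    # near the boundary: a specific entity
print(poincare_distance(parent, child).item())
```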

Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for Cross-Lingual Machine Reading Comprehension

  • paper_url: http://arxiv.org/abs/2311.06758
  • repo_url: None
  • paper_authors: Tingfeng Cao, Chengyu Wang, Chuanqi Tan, Jun Huang, Jinhui Zhu
  • for: This paper proposes a new approach to cross-lingual machine reading comprehension (MRC) that strengthens transfer across languages.
  • methods: The proposed X-STA uses an attentive teacher to subtly transfer answer spans from the source language into the target language's answer output space, a Gradient-Disentangled Knowledge Sharing technique as an improved cross-attention block, and multi-granularity semantic alignment with teacher-guided calibration of model outputs.
  • results: Experiments show that X-STA accurately captures answer spans across languages and outperforms state-of-the-art methods on three multilingual MRC datasets.
    Abstract In cross-lingual language understanding, machine translation is often utilized to enhance the transferability of models across languages, either by translating the training data from the source language to the target, or from the target to the source to aid inference. However, in cross-lingual machine reading comprehension (MRC), it is difficult to perform a deep level of assistance to enhance cross-lingual transfer because of the variation of answer span positions in different languages. In this paper, we propose X-STA, a new approach for cross-lingual MRC. Specifically, we leverage an attentive teacher to subtly transfer the answer spans of the source language to the answer output space of the target. A Gradient-Disentangled Knowledge Sharing technique is proposed as an improved cross-attention block. In addition, we force the model to learn semantic alignments from multiple granularities and calibrate the model outputs with teacher guidance to enhance cross-lingual transferability. Experiments on three multi-lingual MRC datasets show the effectiveness of our method, outperforming state-of-the-art approaches.

From Complex to Simple: Unraveling the Cognitive Tree for Reasoning with Small Language Models

  • paper_url: http://arxiv.org/abs/2311.06754
  • repo_url: None
  • paper_authors: Junbing Yan, Chengyu Wang, Taolin Zhang, Xiaofeng He, Jun Huang, Wei Zhang
  • for: This paper explores how language models can carry out complex logical reasoning, using the dual process theory from cognitive science to explain their reasoning process.
  • methods: An iterative procedure constructs a Cognitive Tree (CogTree) whose root node is the initial query and whose leaf nodes are simple questions that can be answered directly. The method has two main components: an implicit extraction module (the intuitive system) that rapidly generates candidate responses from in-context examples, and an explicit reasoning module (the reflective system) that scores these responses with comparative learning to guide the next generation step.
  • results: Experiments show that a language model with <=7B parameters, fewer than 5% of GPT-3.5's 175B parameters, can reach performance comparable to GPT-3.5 on two popular and challenging reasoning tasks.
    Abstract Reasoning is a distinctive human capacity, enabling us to address complex problems by breaking them down into a series of manageable cognitive steps. Yet, complex logical reasoning is still cumbersome for language models. Based on the dual process theory in cognitive science, we are the first to unravel the cognitive reasoning abilities of language models. Our framework employs an iterative methodology to construct a Cognitive Tree (CogTree). The root node of this tree represents the initial query, while the leaf nodes consist of straightforward questions that can be answered directly. This construction involves two main components: the implicit extraction module (referred to as the intuitive system) and the explicit reasoning module (referred to as the reflective system). The intuitive system rapidly generates multiple responses by utilizing in-context examples, while the reflective system scores these responses using comparative learning. The scores guide the intuitive system in its subsequent generation step. Our experimental results on two popular and challenging reasoning tasks indicate that it is possible to achieve a performance level comparable to that of GPT-3.5 (with 175B parameters), using a significantly smaller language model that contains fewer parameters (<=7B) than 5% of GPT-3.5.
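The intuitive/reflective loop can be caricatured with the toy sketch below: a proposer suggests decompositions of the current query and a scorer keeps the best one, recursing until a question is directly answerable. Both components are stubs; the paper uses small language models with comparative learning.

```python
def intuitive_propose(question):
    # Stand-in proposer: split a conjunction into sub-questions.
    if " and " in question:
        return [question.split(" and ")]
    return []

def reflective_score(subgoals):
    return -len(subgoals)              # toy scorer: prefer fewer subgoals

def solve(question, answer_leaf, depth=0):
    candidates = intuitive_propose(question)
    if not candidates or depth > 3:
        return answer_leaf(question)   # leaf node: directly answerable
    best = max(candidates, key=reflective_score)
    return " ; ".join(solve(q.strip(), answer_leaf, depth + 1) for q in best)

print(solve("Who wrote Hamlet and when was it written?",
            lambda q: f"<answer to: {q}>"))
```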

BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis

  • paper_url: http://arxiv.org/abs/2311.06752
  • repo_url: https://github.com/NicholasCao/BeautifulPrompt
  • paper_authors: Tingfeng Cao, Chengyu Wang, Bingyan Liu, Ziheng Wu, Jinhui Zhu, Jun Huang
  • for: Improving the quality of text-to-image synthesis with diffusion-based deep generative models
  • methods: The BeautifulPrompt model generates high-quality prompts from very simple raw descriptions. It is first fine-tuned on pairs of low-quality and high-quality prompts, then further optimized with a Reinforcement Learning with Visual AI Feedback technique that maximizes reward values computed from PickScore and Aesthetic Scores.
  • results: Learning from visual AI feedback significantly improves the quality of the generated prompts and images; BeautifulPrompt has been integrated into a cloud-native AI platform to provide a better text-to-image generation service.
    Abstract Recently, diffusion-based deep generative models (e.g., Stable Diffusion) have shown impressive results in text-to-image synthesis. However, current text-to-image models often require multiple passes of prompt engineering by humans in order to produce satisfactory results for real-world applications. We propose BeautifulPrompt, a deep generative model to produce high-quality prompts from very simple raw descriptions, which enables diffusion-based models to generate more beautiful images. In our work, we first fine-tuned the BeautifulPrompt model over low-quality and high-quality collecting prompt pairs. Then, to ensure that our generated prompts can generate more beautiful images, we further propose a Reinforcement Learning with Visual AI Feedback technique to fine-tune our model to maximize the reward values of the generated prompts, where the reward values are calculated based on the PickScore and the Aesthetic Scores. Our results demonstrate that learning from visual AI feedback promises the potential to improve the quality of generated prompts and images significantly. We further showcase the integration of BeautifulPrompt to a cloud-native AI platform to provide better text-to-image generation service in the cloud.

Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding

  • paper_url: http://arxiv.org/abs/2311.06736
  • repo_url: None
  • paper_authors: Ying Su, Xiaojin Fu, Mingwen Liu, Zhijiang Guo
  • for: This work evaluates how well large language models (LLMs) perform on logical reasoning tasks that involve proof planning, particularly with chain-of-thought (CoT) strategies.
  • methods: The authors fine-tune a smaller-scale language model to decompose proof objectives into more manageable subgoals, and introduce contrastive decoding for stepwise proof generation, using negative reasoning paths to strengthen the model's capacity for logical deduction.
  • results: Experiments on EntailmentBank show the method improves the proof planning abilities of language models, especially on complex reasoning chains.
    Abstract Logical reasoning remains a pivotal component within the realm of artificial intelligence. The recent evolution of large language models (LLMs) has marked significant progress in this domain. The adoption of strategies like chain-of-thought (CoT) has enhanced the performance of LLMs across diverse reasoning tasks. Nonetheless, logical reasoning that involves proof planning, specifically those that necessitate the validation of explanation accuracy, continues to present stumbling blocks. In this study, we first evaluate the efficacy of LLMs with advanced CoT strategies concerning such tasks. Our analysis reveals that LLMs still struggle to navigate complex reasoning chains, which demand the meticulous linkage of premises to derive a cogent conclusion. To address this issue, we finetune a smaller-scale language model, equipping it to decompose proof objectives into more manageable subgoals. We also introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction. Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
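One way to picture contrastive stepwise decoding: score each candidate proof step by its likelihood under a valid reasoning context minus its likelihood under a negative one, so steps that look plausible regardless of context are penalized. The scoring form and the numbers are assumptions about the general shape of the method.

```python
def step_score(log_p_valid: float, log_p_negative: float, alpha: float = 0.5):
    """High when a step is likely under valid context, unlikely under negatives."""
    return log_p_valid - alpha * log_p_negative

candidates = {
    "hypo1 & hypo2 -> int1": (-1.2, -4.0),
    "hypo1 -> int1": (-1.5, -1.4),     # also likely under a bad context
}
best = max(candidates, key=lambda s: step_score(*candidates[s]))
print(best, round(step_score(*candidates[best]), 2))
```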

Comprehending Lexical and Affective Ontologies in the Demographically Diverse Spatial Social Media Discourse

  • paper_url: http://arxiv.org/abs/2311.06729
  • repo_url: None
  • paper_authors: Salim Sazzed
  • for: This study aims to understand the linguistic and socio-demographic features of online social media reviews, including English language styles, conveyed sentiments, and lexical diversity.
  • methods: A case study extracts and examines statistical, grammatical, and sentimental features from reviews composed by two demographically diverse groups, then leverages machine learning (ML) classifiers to differentiate between the groups based on these features.
  • results: Substantial disparities in certain linguistic attributes allow the two groups to be distinguished with a macro F1 score of approximately 0.85. Word n-gram lexical features combined with fine-tuned transformer-based models perform even better, achieving accuracies over 95% and macro F1 scores over 0.96, providing guidelines for future research on demographic patterns in textual content across social media platforms.
    Abstract This study aims to comprehend linguistic and socio-demographic features, encompassing English language styles, conveyed sentiments, and lexical diversity within spatial online social media review data. To this end, we undertake a case study that scrutinizes reviews composed by two distinct and demographically diverse groups. Our analysis entails the extraction and examination of various statistical, grammatical, and sentimental features from these two groups. Subsequently, we leverage these features with machine learning (ML) classifiers to discern their potential in effectively differentiating between the groups. Our investigation unveils substantial disparities in certain linguistic attributes between the two groups. When integrated into ML classifiers, these attributes exhibit a marked efficacy in distinguishing the groups, yielding a macro F1 score of approximately 0.85. Furthermore, we conduct a comparative evaluation of these linguistic features with word n-gram-based lexical features in discerning demographically diverse review data. As expected, the n-gram lexical features, coupled with fine-tuned transformer-based models, show superior performance, attaining accuracies surpassing 95\% and macro F1 scores exceeding 0.96. Our meticulous analysis and comprehensive evaluations substantiate the efficacy of linguistic and sentimental features in effectively discerning demographically diverse review data. The findings of this study provide valuable guidelines for future research endeavors concerning the analysis of demographic patterns in textual content across various social media platforms.

Controllable Topic-Focused Abstractive Summarization

  • paper_url: http://arxiv.org/abs/2311.06724
  • repo_url: None
  • paper_authors: Seyed Ali Bahrainian, Martin Jaggi, Carsten Eickhoff
  • for: This paper presents a Transformer-based method for producing abstractive summaries focused on specific topics.
  • methods: The cross-attention mechanism of the Transformer is modified to bring topic-focus control to the generation process, without adding any further parameters to the model.
  • results: The model sets a new state of the art for topic-focused abstractive summarization on the NEWTS dataset. Extensive experiments further show that the proposed topical cross-attention mechanism can be plugged into BART and T5 via fine-tuning, without training from scratch, improving their performance on the CNN/Dailymail and XSum abstractive summarization benchmarks.
    Abstract Controlled abstractive summarization focuses on producing condensed versions of a source article to cover specific aspects by shifting the distribution of generated text towards a desired style, e.g., a set of topics. Subsequently, the resulting summaries may be tailored to user-defined requirements. This paper presents a new Transformer-based architecture capable of producing topic-focused summaries. The architecture modifies the cross-attention mechanism of the Transformer to bring topic-focus control to the generation process while not adding any further parameters to the model. We show that our model sets a new state of the art on the NEWTS dataset in terms of topic-focused abstractive summarization as well as a topic-prevalence score. Moreover, we show via extensive experiments that our proposed topical cross-attention mechanism can be plugged into various Transformer models, such as BART and T5, improving their performance on the CNN/Dailymail and XSum benchmark datasets for abstractive summarization. This is achieved via fine-tuning, without requiring training from scratch. Finally, we show through human evaluation that our model generates more faithful summaries outperforming the state-of-the-art Frost model.
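A parameter-free way to bias cross-attention toward a topic, in the spirit of the modification described above: shift the attention logits by each source token's similarity to a topic vector. The exact form of the paper's mechanism may differ; this only illustrates the idea.

```python
import torch
import torch.nn.functional as F

def topical_cross_attention(queries, keys, values, topic, gamma=1.0):
    d = queries.size(-1)
    logits = queries @ keys.transpose(-2, -1) / d ** 0.5   # (tgt_len, src_len)
    # Bias toward source tokens whose keys align with the topic vector.
    topic_bias = F.normalize(keys, dim=-1) @ F.normalize(topic, dim=-1)
    weights = F.softmax(logits + gamma * topic_bias, dim=-1)
    return weights @ values

tgt_len, src_len, d = 5, 12, 64
out = topical_cross_attention(torch.randn(tgt_len, d), torch.randn(src_len, d),
                              torch.randn(src_len, d), torch.randn(d))
print(out.shape)  # torch.Size([5, 64])
```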

Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer

  • paper_url: http://arxiv.org/abs/2311.06720
  • repo_url: https://github.com/tanyuqian/cappy
  • paper_authors: Bowen Tan, Yun Zhu, Lijuan Liu, Eric Xing, Zhiting Hu, Jindong Chen
  • for: Improving the performance and efficiency of multi-task large language models (LLMs) while making them easier to adapt to downstream applications
  • methods: Introduces Cappy, a pretrained small scorer that works either independently on classification tasks or as an auxiliary component for LLMs, boosting their performance while integrating downstream supervision without LLM finetuning or access to LLM parameters
  • results: Cappy outperforms LLMs several orders of magnitude larger on 11 language understanding tasks from PromptSource, boosts FLAN-T5 by a large margin on 45 complex tasks from BIG-Bench, and composes with other LLM adaptations such as finetuning and in-context learning for further gains
    Abstract Large language models (LLMs) such as T0, FLAN, and OPT-IML, excel in multi-tasking under a unified instruction-following paradigm, where they also exhibit remarkable generalization abilities to unseen tasks. Despite their impressive performance, these LLMs, with sizes ranging from several billion to hundreds of billions of parameters, demand substantial computational resources, making their training and inference expensive and inefficient. Furthermore, adapting these models to downstream applications, particularly complex tasks, is often unfeasible due to the extensive hardware requirements for finetuning, even when utilizing parameter-efficient approaches such as prompt tuning. Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and FLAN-PaLM-540B, are not publicly accessible, severely limiting their customization potential. To address these challenges, we introduce a pretrained small scorer, Cappy, designed to enhance the performance and efficiency of multi-task LLMs. With merely 360 million parameters, Cappy functions either independently on classification tasks or serve as an auxiliary component for LLMs, boosting their performance. Moreover, Cappy enables efficiently integrating downstream supervision without requiring LLM finetuning nor the access to their parameters. Our experiments demonstrate that, when working independently on 11 language understanding tasks from PromptSource, Cappy outperforms LLMs that are several orders of magnitude larger. Besides, on 45 complex tasks from BIG-Bench, Cappy boosts the performance of the advanced multi-task LLM, FLAN-T5, by a large margin. Furthermore, Cappy is flexible to cooperate with other LLM adaptations, including finetuning and in-context learning, offering additional performance enhancement.
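The scorer-as-auxiliary pattern is simple to sketch: a small model assigns a quality score to each (instruction, candidate response) pair and the best candidate wins. The scorer below is a keyword-overlap stub standing in for Cappy, which is itself a pretrained 360M-parameter scorer; no real checkpoint is referenced here.

```python
from typing import Callable, List

def rerank(instruction: str, candidates: List[str],
           scorer: Callable[[str, str], float]) -> str:
    return max(candidates, key=lambda c: scorer(instruction, c))

def toy_scorer(instruction: str, response: str) -> float:
    # Stand-in heuristic: reward keyword overlap with the instruction.
    inst = set(instruction.lower().split())
    resp = set(response.lower().split())
    return len(inst & resp) / max(len(resp), 1)

print(rerank("Summarize: the cat sat on the mat",
             ["The cat sat on a mat.", "Dogs are loyal animals."],
             toy_scorer))
```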

What factors influence the popularity of user-generated text in the creative domain? A case study of book reviews

  • paper_url: http://arxiv.org/abs/2311.06714
  • repo_url: None
  • paper_authors: Salim Sazzed
  • for: This study investigates psychological, lexical, semantic, and readability features of book reviews to uncover the factors underlying their perceived popularity.
  • methods: Statistical analyses cover features such as the types and frequency of opinion- and emotion-conveying terms, connectives, character mentions, word uniqueness, commonness, and sentence structure. Two readability tests probe whether reading ease is positively associated with review popularity, and traditional machine learning classifiers plus transformer-based fine-tuned language models with n-gram features are used to predict popularity automatically.
  • results: With the exception of a few features (e.g., review length, emotions, and word uniqueness), most attributes show no significant differences between popular and unpopular reviews. The poor performance of machine learning classifiers using word n-gram features highlights how hard popularity is to determine in creative domains, underscoring the need for further research in this area.
    Abstract This study investigates a range of psychological, lexical, semantic, and readability features of book reviews to elucidate the factors underlying their perceived popularity. To this end, we conduct statistical analyses of various features, including the types and frequency of opinion and emotion-conveying terms, connectives, character mentions, word uniqueness, commonness, and sentence structure, among others. Additionally, we utilize two readability tests to explore whether reading ease is positively associated with review popularity. Finally, we employ traditional machine learning classifiers and transformer-based fine-tuned language models with n-gram features to automatically determine review popularity. Our findings indicate that, with the exception of a few features (e.g., review length, emotions, and word uniqueness), most attributes do not exhibit significant differences between popular and non-popular review groups. Furthermore, the poor performance of machine learning classifiers using the word n-gram feature highlights the challenges associated with determining popularity in creative domains. Overall, our study provides insights into the factors underlying review popularity and highlights the need for further research in this area, particularly in the creative realm.
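The readability side of the analysis is straightforward to reproduce in spirit: score each review with standard readability tests and relate the scores to popularity. Which two tests the paper used is not stated in the abstract; the Flesch metrics below are an illustrative choice, and the review text is invented.

```python
import textstat

review = ("A sweeping, melancholy novel. The prose is dense but rewarding, "
          "and the final chapters reframe everything that came before.")
print("Flesch reading ease:", textstat.flesch_reading_ease(review))
print("Flesch-Kincaid grade:", textstat.flesch_kincaid_grade(review))
```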

Trusted Source Alignment in Large Language Models

  • paper_url: http://arxiv.org/abs/2311.06697
  • repo_url: None
  • paper_authors: Vasilisa Bashlovkina, Zhaobin Kuang, Riley Matthews, Edward Clifford, Yennie Jun, William W. Cohen, Simon Baumgartner
  • for: This paper is written to evaluate the trusted source alignment (TSA) property of large language models (LLMs) and to present a dataset called FactCheckQA for evaluating TSA.
  • methods: The paper proposes a simple protocol for evaluating TSA, which includes response extraction, claim contextualization, and bias in prompt formulation.
  • results: The authors find that as they scale up the model size, the model performance on FactCheckQA improves from near-random to up to 80% balanced accuracy in aligning with trusted sources.
    Abstract Large language models (LLMs) are trained on web-scale corpora that inevitably include contradictory factual information from sources of varying reliability. In this paper, we propose measuring an LLM property called trusted source alignment (TSA): the model's propensity to align with content produced by trusted publishers in the face of uncertainty or controversy. We present FactCheckQA, a TSA evaluation dataset based on a corpus of fact checking articles. We describe a simple protocol for evaluating TSA and offer a detailed analysis of design considerations including response extraction, claim contextualization, and bias in prompt formulation. Applying the protocol to PaLM-2, we find that as we scale up the model size, the model performance on FactCheckQA improves from near-random to up to 80% balanced accuracy in aligning with trusted sources.
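The headline number is a balanced accuracy between model verdicts and fact-check verdicts; a sketch of that computation with invented labels:

```python
from sklearn.metrics import balanced_accuracy_score

gold = ["true", "false", "false", "true", "false", "true"]   # fact-check verdicts
model = ["true", "false", "true", "true", "false", "false"]  # extracted responses

print(balanced_accuracy_score(gold, model))  # 1.0 would mean perfect alignment
```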

Simple and Effective Input Reformulations for Translation

  • paper_url: http://arxiv.org/abs/2311.06696
  • repo_url: https://github.com/bri25yu/languagemodelexperimentation
  • paper_authors: Brian Yu, Hansen Lillemark, Kurt Keutzer
  • for: This paper aims to improve the performance of language models on challenging translation tasks through reformulating inputs during finetuning.
  • methods: The paper proposes simple data-level modifications to the input data during finetuning, which do not require additional training data or modifications at inference time.
  • results: The proposed methods achieve significant performance improvements of up to 3.5 chrF++ on the Flores200 translation benchmark.
    Abstract Foundation language models learn from their finetuning input context in different ways. In this paper, we reformulate inputs during finetuning for challenging translation tasks, leveraging model strengths from pretraining in novel ways to improve downstream performance. These reformulations are simple data level modifications, require no additional collection of training data or modification of data at inference time. They can be applied either on single language pair translation tasks or massively multilingual translation tasks. Experiments with these techniques demonstrate significant performance improvements up to $\textbf{3.5 chrF++ on the Flores200 translation benchmark}$. We hope our research accessibly improves finetuning data efficiency, enabling more effective training to scalably improve state-of-the-art performance. Our code is released $\href{https://github.com/bri25yu/LanguageModelExperimentation}{here}.$
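As a concrete (assumed) instance of this family of data-level modifications: prepend explicit language tags to the source side at finetuning time, leaving the rest of the pipeline and inference untouched. The tag format below is illustrative, not the paper's exact scheme.

```python
def reformulate(src_text: str, src_lang: str, tgt_lang: str) -> str:
    """Tag the source with its language and the desired target language."""
    return f"<{src_lang}> <{tgt_lang}> {src_text}"

print(reformulate("Kedu ka i mere?", "ibo_Latn", "eng_Latn"))
# -> "<ibo_Latn> <eng_Latn> Kedu ka i mere?"
```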

cs.LG - 2023-11-12

Analytical Verification of Deep Neural Network Performance for Time-Synchronized Distribution System State Estimation

  • paper_url: http://arxiv.org/abs/2311.06973
  • repo_url: None
  • paper_authors: Behrouz Azimian, Shiva Moshtagh, Anamitra Pal, Shanshan Ma
  • for: This paper provides analytical performance guarantees for a deep neural network (DNN)-based time-synchronized state estimator for real-time unobservable distribution systems.
  • methods: The robustness and trustworthiness of the DNNs to input perturbations are verified analytically by treating the verification as mixed-integer linear programming (MILP) problems, with batch normalization used to address the scalability limitations of the MILP formulation.
  • results: The framework is validated by performing time-synchronized state estimation on a modified IEEE 34-node system and a real-world large distribution system, both incompletely observed by micro-phasor measurement units, demonstrating the DNNs' robustness to input perturbations.
    Abstract Recently, we demonstrated success of a time-synchronized state estimator using deep neural networks (DNNs) for real-time unobservable distribution systems. In this letter, we provide analytical bounds on the performance of that state estimator as a function of perturbations in the input measurements. It has already been shown that evaluating performance based on only the test dataset might not effectively indicate a trained DNN's ability to handle input perturbations. As such, we analytically verify robustness and trustworthiness of DNNs to input perturbations by treating them as mixed-integer linear programming (MILP) problems. The ability of batch normalization in addressing the scalability limitations of the MILP formulation is also highlighted. The framework is validated by performing time-synchronized distribution system state estimation for a modified IEEE 34-node system and a real-world large distribution system, both of which are incompletely observed by micro-phasor measurement units.
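For readers unfamiliar with MILP-based verification, the core device is the standard big-M encoding of a trained ReLU unit; a minimal sketch, assuming known pre-activation bounds $L \le x \le U$ (the letter's exact formulation and its batch-normalization handling are in the paper):

```latex
% Big-M (bounds-based) MILP encoding of y = max(0, x), given L <= x <= U:
\[
\begin{aligned}
& y \ge 0, \qquad y \ge x, \\
& y \le x - L\,(1 - z), \qquad y \le U z, \\
& z \in \{0, 1\}.
\end{aligned}
\]
% z = 1 forces y = x (active unit); z = 0 forces y = 0 (inactive unit).
```

Input perturbations then enter as box constraints on the first-layer inputs, and maximizing the estimation error subject to these constraints yields the analytical performance bound.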

An Expandable Machine Learning-Optimization Framework to Sequential Decision-Making

  • paper_url: http://arxiv.org/abs/2311.06972
  • repo_url: None
  • paper_authors: Dogacan Yilmaz, İ. Esra Büyüktahtakın
  • for: Solving sequential decision-making problems while eliminating the infeasibility and improving the generalization of machine learning (ML) predictions.
  • methods: Integrates an attention-based encoder-decoder neural network (with recurrent components and a sliding-attention window) into an infeasibility-elimination and generalization framework, optimizing the required level of predictions to eliminate infeasibility of the ML predictions.
  • results: Time-dependent optimization problems are solved quickly, with solution times reduced by three orders of magnitude and an average optimality gap below 0.1%; PredOpt outperforms various specially designed heuristics.
    Abstract We present an integrated prediction-optimization (PredOpt) framework to efficiently solve sequential decision-making problems by predicting the values of binary decision variables in an optimal solution. We address the key issues of sequential dependence, infeasibility, and generalization in machine learning (ML) to make predictions for optimal solutions to combinatorial problems. The sequential nature of the combinatorial optimization problems considered is captured with recurrent neural networks and a sliding-attention window. We integrate an attention-based encoder-decoder neural network architecture with an infeasibility-elimination and generalization framework to learn high-quality feasible solutions to time-dependent optimization problems. In this framework, the required level of predictions is optimized to eliminate the infeasibility of the ML predictions. These predictions are then fixed in mixed-integer programming (MIP) problems to solve them quickly with the aid of a commercial solver. We demonstrate our approach to tackling the two well-known dynamic NP-Hard optimization problems: multi-item capacitated lot-sizing (MCLSP) and multi-dimensional knapsack (MSMK). Our results show that models trained on shorter and smaller-dimensional instances can be successfully used to predict longer and larger-dimensional problems. The solution time can be reduced by three orders of magnitude with an average optimality gap below 0.1%. We compare PredOpt with various specially designed heuristics and show that our framework outperforms them. PredOpt can be advantageous for solving dynamic MIP problems that need to be solved instantly and repetitively.
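To make the fix-and-solve step concrete, here is a hedged sketch on a toy single-item lot-sizing MIP using PuLP: a stub stands in for the encoder-decoder predictor, high-confidence binary predictions are fixed, and the remaining problem is handed to the solver. Names, data, and the confidence-threshold fixing rule are illustrative assumptions, not the paper's implementation.

```python
import pulp

T = 6                                   # planning horizon
demand = [40, 60, 30, 80, 20, 50]
cap, setup_cost, hold_cost = 100, 90.0, 1.5

def predict_setups(demand):
    """Stub for the learned model: returns (value, confidence) per period."""
    return {t: (1 if demand[t] > 30 else 0, 0.95) for t in range(T)}

prob = pulp.LpProblem("lot_sizing", pulp.LpMinimize)
y = {t: pulp.LpVariable(f"setup_{t}", cat="Binary") for t in range(T)}
x = {t: pulp.LpVariable(f"prod_{t}", lowBound=0) for t in range(T)}
s = {t: pulp.LpVariable(f"inv_{t}", lowBound=0) for t in range(T)}

prob += pulp.lpSum(setup_cost * y[t] + hold_cost * s[t] for t in range(T))
for t in range(T):
    prev = s[t - 1] if t > 0 else 0
    prob += prev + x[t] - demand[t] == s[t]      # inventory balance
    prob += x[t] <= cap * y[t]                   # produce only if set up

# Fix only high-confidence predicted binaries; the rest stay free for the
# solver -- one simple way to guard against infeasible ML predictions.
for t, (val, conf) in predict_setups(demand).items():
    if conf > 0.9:
        y[t].lowBound = y[t].upBound = val

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))
```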

Anchor Data Augmentation

  • paper_url: http://arxiv.org/abs/2311.06965
  • repo_url: None
  • paper_authors: Nora Schneider, Shirin Goshtasbpour, Fernando Perez-Cruz
  • for: A data augmentation method for nonlinear over-parametrized regression.
  • methods: Builds on the Anchor regression (AR) method from the causality literature, using several replicas of AR's modified samples to provide additional training examples and more robust regression predictions.
  • results: ADA is competitive with the state-of-the-art domain-agnostic C-Mixup solutions on linear and nonlinear regression problems.
    Abstract We propose a novel algorithm for data augmentation in nonlinear over-parametrized regression. Our data augmentation algorithm borrows from the literature on causality and extends the recently proposed Anchor regression (AR) method for data augmentation, which is in contrast to the current state-of-the-art domain-agnostic solutions that rely on the Mixup literature. Our Anchor Data Augmentation (ADA) uses several replicas of the modified samples in AR to provide more training examples, leading to more robust regression predictions. We apply ADA to linear and nonlinear regression problems using neural networks. ADA is competitive with state-of-the-art C-Mixup solutions.
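A minimal numpy sketch of anchor-style augmentation: apply the anchor regression transform $W_\gamma = I + (\sqrt{\gamma}-1)P_A$ to $X$ and $y$ for several values of $\gamma$ to create replicas. Using k-means cluster indicators as the anchor matrix $A$ and this particular $\gamma$ grid are illustrative assumptions, not necessarily ADA's exact choices.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

# Anchor matrix: one-hot cluster memberships (one plausible anchor choice).
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
A = np.eye(4)[labels]                              # (n, q)
P_A = A @ np.linalg.pinv(A.T @ A) @ A.T            # projection onto col(A)

def anchor_replica(X, y, gamma):
    """Anchor regression transform applied to the data (one replica)."""
    W = np.eye(len(X)) + (np.sqrt(gamma) - 1.0) * P_A
    return W @ X, W @ y

# Several replicas with different perturbation strengths gamma.
replicas = [anchor_replica(X, y, g) for g in (0.5, 2.0, 4.0)]
X_train = np.vstack([X] + [r[0] for r in replicas])
y_train = np.concatenate([y] + [r[1] for r in replicas])
print(X_train.shape, y_train.shape)                # (800, 5) (800,)
```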

Robust Regression over Averaged Uncertainty

  • paper_url: http://arxiv.org/abs/2311.06960
  • repo_url: None
  • paper_authors: Dimitris Bertsimas, Yu Ma
  • for: Proposes a new robust regression formulation that integrates over all realizations of the uncertainty set and takes an averaged approach to the ordinary least-squares problem.
  • methods: Shows that this averaged formulation recovers ridge regression, establishing the missing link between robust optimization and mean-squared-error approaches; closed-form penalty terms are derived for ellipsoidal, box, diamond, and budget uncertainty sets as functions of sample size, feature size, and perturbation protection strength.
  • results: Consistent out-of-sample improvements over the worst-case formulation on synthetic datasets and on real-world UCI regression problems, with the improvement growing as the perturbation level increases.
    Abstract We propose a new formulation of robust regression by integrating all realizations of the uncertainty set and taking an averaged approach to obtain the optimal solution for the ordinary least-squared regression problem. We show that this formulation surprisingly recovers ridge regression and establishes the missing link between robust optimization and the mean squared error approaches for existing regression problems. We first prove the equivalence for four uncertainty sets: ellipsoidal, box, diamond, and budget, and provide closed-form formulations of the penalty term as a function of the sample size, feature size, as well as perturbation protection strength. We then show in synthetic datasets with different levels of perturbations, a consistent improvement of the averaged formulation over the existing worst-case formulation in out-of-sample performance. Importantly, as the perturbation level increases, the improvement increases, confirming our method's advantage in high-noise environments. We report similar improvements in the out-of-sample datasets in real-world regression problems obtained from UCI datasets.
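One way to see why averaging over the uncertainty set recovers ridge regression, as a hedged one-line derivation assuming zero-mean design perturbations $\Delta$ with $\mathbb{E}[\Delta^\top\Delta] = \lambda I$ (the paper's closed forms for the ellipsoidal, box, diamond, and budget sets are more general):

```latex
% Averaged robust least squares, assuming E[Delta] = 0, E[Delta^T Delta] = lambda I:
\[
\mathbb{E}_{\Delta}\big\|y - (X+\Delta)\beta\big\|_2^2
  = \|y - X\beta\|_2^2
    - 2\,(y - X\beta)^{\top}\mathbb{E}[\Delta]\,\beta
    + \beta^{\top}\mathbb{E}[\Delta^{\top}\Delta]\,\beta
  = \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 .
\]
```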

A GPU-Accelerated Moving-Horizon Algorithm for Training Deep Classification Trees on Large Datasets

  • paper_url: http://arxiv.org/abs/2311.06952
  • repo_url: None
  • paper_authors: Jiayang Ren, Valentín Osuna-Enciso, Morimasa Okamoto, Qiangqiang Mao, Chaojie Ji, Liang Cao, Kaixun Hua, Yankai Cao
  • for: Training decision trees is NP-complete, and greedy heuristics such as CART are sub-optimal while existing optimal methods struggle with continuous features, deep trees, and large-scale datasets.
  • methods: A moving-horizon differential evolution algorithm for classification trees with continuous features (MH-DEOCT), combining a discrete tree decoding method that eliminates duplicated searches between adjacent samples, a GPU-accelerated implementation, and a moving-horizon strategy that iteratively trains shallow subtrees at each node.
  • results: On 68 UCI datasets, MH-DEOCT outperforms CART by an average of 3.44% in training accuracy and 1.71% in testing accuracy, comes within 0.38% (training) and 0.06% (testing) of the global optimal method, and scales well to deep trees (e.g., depth=8) and large datasets (e.g., ten million samples).
    Abstract Decision trees are essential yet NP-complete to train, prompting the widespread use of heuristic methods such as CART, which suffers from sub-optimal performance due to its greedy nature. Recently, breakthroughs in finding optimal decision trees have emerged; however, these methods still face significant computational costs and struggle with continuous features in large-scale datasets and deep trees. To address these limitations, we introduce a moving-horizon differential evolution algorithm for classification trees with continuous features (MH-DEOCT). Our approach consists of a discrete tree decoding method that eliminates duplicated searches between adjacent samples, a GPU-accelerated implementation that significantly reduces running time, and a moving-horizon strategy that iteratively trains shallow subtrees at each node to balance the vision and optimizer capability. Comprehensive studies on 68 UCI datasets demonstrate that our approach outperforms the heuristic method CART on training and testing accuracy by an average of 3.44% and 1.71%, respectively. Moreover, these numerical studies empirically demonstrate that MH-DEOCT achieves near-optimal performance (only 0.38% and 0.06% worse than the global optimal method on training and testing, respectively), while it offers remarkable scalability for deep trees (e.g., depth=8) and large-scale datasets (e.g., ten million samples).

Contractive Systems Improve Graph Neural Networks Against Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2311.06942
  • repo_url: None
  • paper_authors: Moshe Eliasof, Davide Murari, Ferdia Sherry, Carola-Bibiane Schönlieb
  • for: Fortifying graph neural networks (GNNs) against adversarial attacks.
  • methods: Graph neural layers based on differential equations with contractive properties, with simultaneous learned evolution of both the node features and the adjacency matrix, yielding intrinsic robustness to perturbations of the input features and the graph connectivity.
  • results: On numerous real-world benchmarks, performance on par with or better than existing methods.
    Abstract Graph Neural Networks (GNNs) have established themselves as a key component in addressing diverse graph-based tasks. Despite their notable successes, GNNs remain susceptible to input perturbations in the form of adversarial attacks. This paper introduces an innovative approach to fortify GNNs against adversarial perturbations through the lens of contractive dynamical systems. Our method introduces graph neural layers based on differential equations with contractive properties, which, as we show, improve the robustness of GNNs. A distinctive feature of the proposed approach is the simultaneous learned evolution of both the node features and the adjacency matrix, yielding an intrinsic enhancement of model robustness to perturbations in the input features and the connectivity of the graph. We mathematically derive the underpinnings of our novel architecture and provide theoretical insights to reason about its expected behavior. We demonstrate the efficacy of our method through numerous real-world benchmarks, showing on par or improved performance compared to existing methods.
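For context, the contractivity the layers inherit is the standard notion from dynamical systems: trajectories of the underlying ODE converge toward each other exponentially, which bounds how far an adversarial perturbation of the input can push the output. In symbols:

```latex
% Contractivity of \dot{x} = f(x, t): any two trajectories approach exponentially,
\[
\| x(t) - y(t) \| \;\le\; e^{-ct}\, \| x(0) - y(0) \|, \qquad c > 0,
\]
% which holds whenever the logarithmic norm of the Jacobian is uniformly negative:
\[
\mu\!\left( \frac{\partial f}{\partial x}(x, t) \right) \;\le\; -c
\quad \text{for all } x, t .
\]
```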

5G Networks and IoT Devices: Mitigating DDoS Attacks with Deep Learning Techniques

  • paper_url: http://arxiv.org/abs/2311.06938
  • repo_url: None
  • paper_authors: Reem M. Alzhrani, Mohammed A. Alliheedi
  • for: Addressing the security and privacy concerns of Internet of Things (IoT) devices, particularly within 5G networks.
  • methods: Deep learning techniques, a Convolutional Neural Network (CNN) and a Feed Forward Neural Network (FNN), applied to a dataset of IoT traffic within a 5G network built with OMNeT++ using the INET and Simu5G frameworks.
  • results: Both CNN and FNN reach 99% accuracy in distinguishing normal traffic from DDoS attacks, underscoring the potential of deep learning to secure IoT devices in 5G networks.
    Abstract The development and implementation of Internet of Things (IoT) devices have been accelerated dramatically in recent years. As a result, a super-network is required to handle the massive volumes of data collected and transmitted to these devices. Fifth generation (5G) technology is a new, comprehensive wireless technology that has the potential to be the primary enabling technology for the IoT. The rapid spread of IoT devices can encounter many security limits and concerns. As a result, new and serious security and privacy risks have emerged. Attackers use IoT devices to launch massive attacks; one of the most famous is the Distributed Denial of Service (DDoS) attack. Deep Learning techniques have proven their effectiveness in detecting and mitigating DDoS attacks. In this paper, we applied two Deep Learning algorithms Convolutional Neural Network (CNN) and Feed Forward Neural Network (FNN) in dataset was specifically designed for IoT devices within 5G networks. We constructed the 5G network infrastructure using OMNeT++ with the INET and Simu5G frameworks. The dataset encompasses both normal network traffic and DDoS attacks. The Deep Learning algorithms, CNN and FNN, showed impressive accuracy levels, both reaching 99%. These results underscore the potential of Deep Learning to enhance the security of IoT devices within 5G networks.
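As a hedged illustration of the FNN side of the study, a small feed-forward binary classifier separating normal flows from DDoS traffic; the feature count, layer sizes, and training settings are assumptions, since the paper's exact architecture is not reproduced here.

```python
import numpy as np
from tensorflow import keras

n_features = 20                       # assumed number of flow-level features
model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # P(DDoS)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Placeholder data standing in for OMNeT++/Simu5G-exported flow records.
X = np.random.rand(1000, n_features).astype("float32")
y = np.random.randint(0, 2, size=(1000, 1))
model.fit(X, y, epochs=3, batch_size=32, validation_split=0.2, verbose=0)
```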

Attention for Causal Relationship Discovery from Biological Neural Dynamics

  • paper_url: http://arxiv.org/abs/2311.06928
  • repo_url: None
  • paper_authors: Ziyu Lu, Anika Tabassum, Shruti Kulkarni, Lu Mi, J. Nathan Kutz, Eric Shea-Brown, Seung-Hwan Lim
  • for: Explores whether transformer models can learn Granger causality among nodes in networks with complex nonlinear dynamics, toward causal representation learning in neuroscience.
  • methods: A proof-of-concept study on simulated neural dynamics with known ground-truth connectivity: transformers are trained to forecast neuronal population dynamics, and causal relationships are read from the cross attention module.
  • results: The cross attention module captures causal relationships among neurons with accuracy equal or superior to the most popular Granger causality analysis method.
    Abstract This paper explores the potential of the transformer models for learning Granger causality in networks with complex nonlinear dynamics at every node, as in neurobiological and biophysical networks. Our study primarily focuses on a proof-of-concept investigation based on simulated neural dynamics, for which the ground-truth causality is known through the underlying connectivity matrix. For transformer models trained to forecast neuronal population dynamics, we show that the cross attention module effectively captures the causal relationship among neurons, with an accuracy equal or superior to that for the most popular Granger causality analysis method. While we acknowledge that real-world neurobiology data will bring further challenges, including dynamic connectivity and unobserved variability, this research offers an encouraging preliminary glimpse into the utility of the transformer model for causal representation learning in neuroscience.
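A minimal PyTorch sketch of the readout idea: train a tiny forecaster whose attention attends across neurons, then aggregate the attention weights into an $N \times N$ matrix read as directed, Granger-style influence scores. The architecture and the averaging are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

N, T, B = 8, 50, 16          # neurons, window length, batch size
attn = nn.MultiheadAttention(embed_dim=16, num_heads=1, batch_first=True)
embed = nn.Linear(T, 16)     # embed each neuron's recent activity trace
head = nn.Linear(16, 1)      # predict each neuron's next value

x = torch.randn(B, N, T)                      # simulated population windows
h = embed(x)                                  # (B, N, 16)
out, w = attn(h, h, h, need_weights=True)     # w: (B, N, N) attention weights
pred = head(out).squeeze(-1)                  # (B, N) one-step forecast

# After training embed/attn/head on forecasting, average attention weights
# over batches: scores[i, j] ~ influence of neuron j on neuron i.
scores = w.mean(dim=0).detach()
print(scores.shape)          # torch.Size([8, 8])
```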

Concept Matching: Clustering-based Federated Continual Learning

  • paper_url: http://arxiv.org/abs/2311.06921
  • repo_url: None
  • paper_authors: Xiaopeng Jiang, Cristian Borcea
  • for: Addresses federated continual learning (FCL), tackling catastrophic forgetting caused by concept drift in CL and interference among clients in FL, to improve model accuracy.
  • methods: A clustering-based Concept Matching (CM) framework: client models are grouped into concept clusters, each client fine-tunes the concept model best matching its current data, and the server aggregates weights within each cluster and updates the global concept models via a novel server concept matching algorithm.
  • results: CM outperforms state-of-the-art systems, scales well with the number of clients and the model size, and is flexible with respect to the clustering, aggregation, and concept matching algorithms used.
    Abstract Federated Continual Learning (FCL) has emerged as a promising paradigm that combines Federated Learning (FL) and Continual Learning (CL). To achieve good model accuracy, FCL needs to tackle catastrophic forgetting due to concept drift over time in CL, and to overcome the potential interference among clients in FL. We propose Concept Matching (CM), a clustering-based framework for FCL to address these challenges. The CM framework groups the client models into concept model clusters, and then builds different global models to capture different concepts in FL over time. In each round, the server sends the global concept models to the clients. To avoid catastrophic forgetting, each client selects the concept model best-matching the concept of the current data for further fine-tuning. To avoid interference among client models with different concepts, the server clusters the models representing the same concept, aggregates the model weights in each cluster, and updates the global concept model with the cluster model of the same concept. Since the server does not know the concepts captured by the aggregated cluster models, we propose a novel server concept matching algorithm that effectively updates a global concept model with a matching cluster model. The CM framework provides flexibility to use different clustering, aggregation, and concept matching algorithms. The evaluation demonstrates that CM outperforms state-of-the-art systems and scales well with the number of clients and the model size.
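A hedged sketch of the server-side step of a Concept Matching-style round: cluster the flattened client models, then average weights within each cluster to form the per-concept global models. KMeans and plain averaging are stand-ins; the paper permits other clustering/aggregation choices and adds a dedicated server concept matching algorithm on top.

```python
import numpy as np
from sklearn.cluster import KMeans

def server_round(client_weights, n_concepts):
    """client_weights: list of 1-D arrays, one flattened model per client."""
    W = np.stack(client_weights)                    # (n_clients, n_params)
    labels = KMeans(n_clusters=n_concepts, n_init=10).fit_predict(W)
    concept_models = {
        c: W[labels == c].mean(axis=0)              # per-cluster FedAvg
        for c in range(n_concepts)
    }
    return concept_models, labels

# Toy clients drawn from two "concepts" (shifted weight distributions).
clients = [np.random.randn(1000) + 3 * (i % 2) for i in range(10)]
concepts, assignment = server_round(clients, n_concepts=2)
print(assignment)       # which concept each client model was matched to
```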

Resource-Aware Hierarchical Federated Learning for Video Caching in Wireless Networks

  • paper_url: http://arxiv.org/abs/2311.06918
  • repo_url: None
  • paper_authors: Md Ferdous Pervej, Andreas F Molisch
  • for: Relieving backhaul traffic congestion and improving network performance via video caching.
  • methods: A resource-aware hierarchical federated learning (RawHFL) solution that predicts users' future content requests under sporadic requests and partial client participation, optimizing client selection, local training rounds, and CPU frequencies under delay, energy, and radio resource constraints.
  • results: Significantly outperforms the considered baselines in prediction accuracy and total energy expenditure.
    Abstract Video caching can significantly improve backhaul traffic congestion by locally storing the popular content that users frequently request. A privacy-preserving method is desirable to learn how users' demands change over time. As such, this paper proposes a novel resource-aware hierarchical federated learning (RawHFL) solution to predict users' future content requests under the realistic assumptions that content requests are sporadic and users' datasets can only be updated based on the requested content's information. Considering a partial client participation case, we first derive the upper bound of the global gradient norm that depends on the clients' local training rounds and the successful reception of their accumulated gradients over the wireless links. Under delay, energy and radio resource constraints, we then optimize client selection and their local rounds and central processing unit (CPU) frequencies to minimize a weighted utility function that facilitates RawHFL's convergence in an energy-efficient way. Our simulation results show that the proposed solution significantly outperforms the considered baselines in terms of prediction accuracy and total energy expenditure.

EPIM: Efficient Processing-In-Memory Accelerators based on Epitome

  • paper_url: http://arxiv.org/abs/2311.07620
  • repo_url: None
  • paper_authors: Chenyu Wang, Zhen Dong, Daquan Zhou, Zhenhua Zhu, Yu Wang, Jiashi Feng, Kurt Keutzer
  • for: Deploying large-scale neural networks on Processing-In-Memory (PIM) accelerators, addressing the challenge posed by constrained on-chip memory capacity.
  • methods: Introduces Epitome, a lightweight neural operator offering convolution-like functionality, to craft memory-efficient CNN operators for PIM (EPIM); on the software side, epitomes' latency and energy on PIM accelerators are evaluated and a PIM-aware layer-wise design method and epitome-aware quantization are proposed; on the hardware side, the datapath of current PIM accelerators is modified to accommodate epitomes and a feature-map reuse technique reduces computation cost.
  • results: A 3-bit quantized EPIM-ResNet50 attains 71.59% top-1 accuracy on ImageNet while reducing crossbar areas by 30.65 times, surpassing state-of-the-art pruning methods on PIM.
    Abstract The exploration of Processing-In-Memory (PIM) accelerators has garnered significant attention within the research community. However, the utilization of large-scale neural networks on Processing-In-Memory (PIM) accelerators encounters challenges due to constrained on-chip memory capacity. To tackle this issue, current works explore model compression algorithms to reduce the size of Convolutional Neural Networks (CNNs). Most of these algorithms either aim to represent neural operators with reduced-size parameters (e.g., quantization) or search for the best combinations of neural operators (e.g., neural architecture search). Designing neural operators to align with PIM accelerators' specifications is an area that warrants further study. In this paper, we introduce the Epitome, a lightweight neural operator offering convolution-like functionality, to craft memory-efficient CNN operators for PIM accelerators (EPIM). On the software side, we evaluate epitomes' latency and energy on PIM accelerators and introduce a PIM-aware layer-wise design method to enhance their hardware efficiency. We apply epitome-aware quantization to further reduce the size of epitomes. On the hardware side, we modify the datapath of current PIM accelerators to accommodate epitomes and implement a feature map reuse technique to reduce computation cost. Experimental results reveal that our 3-bit quantized EPIM-ResNet50 attains 71.59% top-1 accuracy on ImageNet, reducing crossbar areas by 30.65 times. EPIM surpasses the state-of-the-art pruning methods on PIM.

An Application of Vector Autoregressive Model for Analyzing the Impact of Weather And Nearby Traffic Flow On The Traffic Volume

  • paper_url: http://arxiv.org/abs/2311.06894
  • repo_url: None
  • paper_authors: Anh Thi-Hoang Nguyen, Dung Ha Nguyen, Trong-Hop Do
  • for: Predicting the traffic volume of a target road segment from nearby traffic volume and weather conditions.
  • methods: A VAR(36) model with time trend and constant, trained on hourly historical weather and traffic flow data.
  • results: The analysis quantifies the impact of weather conditions and nearby traffic volume on traffic flow and identifies variables that are not useful for forecasting, simplifying data collection when building the forecasting system.
    Abstract This paper aims to predict the traffic flow at one road segment based on nearby traffic volume and weather conditions. Our team also discovers the impact of weather conditions and nearby traffic volume on the traffic flow at a target point. The analysis results will help solve the problem of traffic flow prediction and develop an optimal transport network with efficient traffic movement and minimal traffic congestion. Hourly historical weather and traffic flow data are selected to solve this problem. This paper uses a VAR(36) model with time trend and constant to train on the dataset and forecast. With an average RMSE of 565.0768111, the model is considered appropriate, although some statistical tests imply that the residuals are unstable and non-normal. This paper also points out some variables that are not useful in forecasting, which helps simplify the data-collecting process when building the forecasting system.
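A hedged sketch of the modeling setup with statsmodels: a VAR fixed at 36 lags with constant and linear time trend (trend="ct"), forecasting the next hours. The synthetic stand-in data, column names, and forecast horizon are assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Stand-in for hourly observations of target volume, nearby volume, and
# weather; in practice these come from the collected historical dataset.
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=2000, freq="h")
df = pd.DataFrame(rng.normal(size=(2000, 3)).cumsum(axis=0),
                  index=idx, columns=["target_volume", "nearby_volume", "temp"])

model = VAR(df)
results = model.fit(maxlags=36, ic=None, trend="ct")    # fixed VAR(36), const + trend
forecast = results.forecast(df.values[-36:], steps=24)  # next 24 hours
print(forecast.shape)                                   # (24, 3)
```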

Preserving Node-level Privacy in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.06888
  • repo_url: None
  • paper_authors: Zihang Xiang, Tianhao Wang, Di Wang
  • for: Addresses node-level privacy in graph neural networks (GNNs), where existing differential privacy protocols hardly apply to the message-passing mechanism.
  • methods: A protocol with two main components: 1) HeterPoisson, a sampling routine that uses a specialized node sampling strategy and a series of tailored operations to generate batches of subgraphs with desired properties, and 2) a randomization routine that uses symmetric multivariate Laplace (SML) noise instead of the commonly used Gaussian noise.
  • results: The privacy accounting shows this combination provides a non-trivial privacy guarantee; experiments on five real-world datasets show significant advantages over existing baselines, especially in the high-privacy regime, with membership inference attacks and privacy audits confirming the protocol's privacy integrity.
    Abstract Differential privacy (DP) has seen immense applications in learning on tabular, image, and sequential data where instance-level privacy is concerned. In learning on graphs, contrastingly, works on node-level privacy are highly sparse. Challenges arise as existing DP protocols hardly apply to the message-passing mechanism in Graph Neural Networks (GNNs). In this study, we propose a solution that specifically addresses the issue of node-level privacy. Our protocol consists of two main components: 1) a sampling routine called HeterPoisson, which employs a specialized node sampling strategy and a series of tailored operations to generate a batch of sub-graphs with desired properties, and 2) a randomization routine that utilizes symmetric multivariate Laplace (SML) noise instead of the commonly used Gaussian noise. Our privacy accounting shows this particular combination provides a non-trivial privacy guarantee. In addition, our protocol enables GNN learning with good performance, as demonstrated by experiments on five real-world datasets; compared with existing baselines, our method shows significant advantages, especially in the high privacy regime. Experimentally, we also 1) perform membership inference attacks against our protocol and 2) apply privacy audit techniques to confirm our protocol's privacy integrity. In the sequel, we present a study on a seemingly appealing approach \cite{sajadmanesh2023gap} (USENIX'23) that protects node-level privacy via differentially private node/instance embeddings. Unfortunately, such work has fundamental privacy flaws, which are identified through a thorough case study. More importantly, we prove an impossibility result of achieving both (strong) privacy and (acceptable) utility through private instance embedding. The implication is that such an approach has intrinsic utility barriers when enforcing differential privacy.
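For intuition about the randomization routine, symmetric multivariate Laplace noise can be drawn as a Gaussian scale mixture with an exponential mixing variable; a minimal sketch, with the caveat that the protocol's calibration of the noise scale to HeterPoisson's sensitivity is omitted.

```python
# SML samples as a Gaussian scale mixture: if W ~ Exp(1) and Z ~ N(0, Sigma),
# then sqrt(W) * Z is symmetric multivariate Laplace. Sigma = scale^2 * I and
# the scale value below are illustrative, not the paper's calibration.
import numpy as np

def sml_noise(d, scale, size, rng=np.random.default_rng()):
    w = rng.exponential(scale=1.0, size=(size, 1))   # exponential mixing variable
    z = rng.normal(scale=scale, size=(size, d))      # Gaussian draws
    return np.sqrt(w) * z

noisy_grad = np.zeros(64) + sml_noise(d=64, scale=0.5, size=1)[0]
print(noisy_grad.std())
```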

pFedES: Model Heterogeneous Personalized Federated Learning with Feature Extractor Sharing

  • paper_url: http://arxiv.org/abs/2311.06879
  • repo_url: None
  • paper_authors: Liping Yi, Han Yu, Gang Wang, Xiaoguang Liu
  • for: Proposes pFedES, a model-heterogeneous personalized federated learning method based on feature extractor sharing, letting each data owner (FL client) train a personalized local model under its own data distribution, system resources, and model-structure requirements.
  • methods: A small homogeneous feature extractor is incorporated into each client's heterogeneous local model and trained via an iterative learning method that exchanges global generalized knowledge and local personalized knowledge; only the small local extractors are uploaded to the FL server for aggregation, and convergence over wall-to-wall time is proven.
  • results: On two real-world datasets against six state-of-the-art methods, pFedES builds the most accurate model at low communication and computation cost, achieving 1.61% higher test accuracy than the best baseline while reducing communication and computation costs by 99.6% and 82.9%, respectively.
    Abstract As a privacy-preserving collaborative machine learning paradigm, federated learning (FL) has attracted significant interest from academia and the industry alike. To allow each data owner (a.k.a., FL clients) to train a heterogeneous and personalized local model based on its local data distribution, system resources and requirements on model structure, the field of model-heterogeneous personalized federated learning (MHPFL) has emerged. Existing MHPFL approaches either rely on the availability of a public dataset with special characteristics to facilitate knowledge transfer, incur high computation and communication costs, or face potential model leakage risks. To address these limitations, we propose a model-heterogeneous personalized Federated learning approach based on feature Extractor Sharing (pFedES). It incorporates a small homogeneous feature extractor into each client's heterogeneous local model. Clients train them via the proposed iterative learning method to enable the exchange of global generalized knowledge and local personalized knowledge. The small local homogeneous extractors produced after local training are uploaded to the FL server and for aggregation to facilitate easy knowledge sharing among clients. We theoretically prove that pFedES can converge over wall-to-wall time. Extensive experiments on two real-world datasets against six state-of-the-art methods demonstrate that pFedES builds the most accurate model, while incurring low communication and computation costs. Compared with the best-performing baseline, it achieves 1.61% higher test accuracy, while reducing communication and computation costs by 99.6% and 82.9%, respectively.

Unified machine learning tasks and datasets for enhancing renewable energy

  • paper_url: http://arxiv.org/abs/2311.06876
  • repo_url: None
  • paper_authors: Arsam Aryandoust, Thomas Rigoni, Francesco di Stefano, Anthony Patt
  • for: Investigates how multi-tasking machine learning models can help solve tasks related to enhancing the renewable energy transition and mitigating climate change.
  • methods: Introduces ETT-17 (Energy Transition Tasks-17), a collection of 17 datasets from six application domains, including out-of-distribution validation and testing data, unified so that all tasks can be solved with a single multi-tasking ML model.
  • results: Analyzes the dimensions of each dataset and what they require for designing over-parameterized models, introduces a set of dataset scores describing important properties of each task and dataset, and provides performance benchmarks.
    Abstract Multi-tasking machine learning (ML) models exhibit prediction abilities in domains with little to no training data available (few-shot and zero-shot learning). Over-parameterized ML models are further capable of zero-loss training and near-optimal generalization performance. An open research question is, how these novel paradigms contribute to solving tasks related to enhancing the renewable energy transition and mitigating climate change. A collection of unified ML tasks and datasets from this domain can largely facilitate the development and empirical testing of such models, but is currently missing. Here, we introduce the ETT-17 (Energy Transition Tasks-17), a collection of 17 datasets from six different application domains related to enhancing renewable energy, including out-of-distribution validation and testing data. We unify all tasks and datasets, such that they can be solved using a single multi-tasking ML model. We further analyse the dimensions of each dataset; investigate what they require for designing over-parameterized models; introduce a set of dataset scores that describe important properties of each task and dataset; and provide performance benchmarks.

Inference and Interference: The Role of Clipping, Pruning and Loss Landscapes in Differentially Private Stochastic Gradient Descent

  • paper_url: http://arxiv.org/abs/2311.06839
  • repo_url: None
  • paper_authors: Lauren Watson, Eric Gan, Mohan Dantam, Baharan Mirzasoleiman, Rik Sarkar
  • for: Studies why differentially private stochastic gradient descent (DP-SGD) has poorer training and test performance than ordinary SGD on large neural networks.
  • methods: Compares the two processes separately in early and late epochs, finding that DP-SGD makes slower progress early on but that the later stages determine the end result; analyzing the clipping and noise-addition steps separately shows that noise introduces errors from which unclipped gradient descent can recover, that clipping has the larger impact, and that these effects are amplified in higher dimensions, where the loss basin occupies a lower-dimensional space.
  • results: Argues theoretically and with extensive experiments that magnitude pruning is a suitable dimension-reduction technique, and that heavy pruning can improve the test accuracy of DP-SGD.
    Abstract Differentially private stochastic gradient descent (DP-SGD) is known to have poorer training and test performance on large neural networks, compared to ordinary stochastic gradient descent (SGD). In this paper, we perform a detailed study and comparison of the two processes and unveil several new insights. By comparing the behavior of the two processes separately in early and late epochs, we find that while DP-SGD makes slower progress in early stages, it is the behavior in the later stages that determines the end result. This separate analysis of the clipping and noise addition steps of DP-SGD shows that while noise introduces errors to the process, gradient descent can recover from these errors when it is not clipped, and clipping appears to have a larger impact than noise. These effects are amplified in higher dimensions (large neural networks), where the loss basin occupies a lower dimensional space. We argue theoretically and using extensive experiments that magnitude pruning can be a suitable dimension reduction technique in this regard, and find that heavy pruning can improve the test accuracy of DPSGD.
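The two steps the paper dissects, per-sample clipping and noise addition, fit in a few lines; a hedged sketch (model, loss, and hyperparameters are illustrative, and a practical run would use a per-sample-gradient library rather than a Python loop):

```python
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, C=1.0, sigma=1.0):
    """One DP-SGD step: clip each per-sample gradient to norm C, sum,
    add Gaussian noise of scale sigma * C, then take a gradient step."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xs, ys):                          # per-sample gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))
        scale = torch.clamp(C / (norm + 1e-12), max=1.0)   # clipping step
        for g, p in zip(summed, model.parameters()):
            g.add_(scale * p.grad)
    with torch.no_grad():
        for g, p in zip(summed, model.parameters()):
            g.add_(sigma * C * torch.randn_like(g))        # noise-addition step
            p.sub_(lr * g / len(xs))
    return model

model = torch.nn.Linear(10, 1)
xs, ys = torch.randn(32, 10), torch.randn(32, 1)
dp_sgd_step(model, torch.nn.functional.mse_loss, xs, ys)
```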

GraNNDis: Efficient Unified Distributed Training Framework for Deep GNNs on Large Clusters

  • paper_url: http://arxiv.org/abs/2311.06837
  • repo_url: None
  • paper_authors: Jaeyong Song, Hongsun Jang, Jaewon Jung, Youngsok Kim, Jinho Lee
  • for: Efficient distributed training of deep GNNs on large graphs across multi-GPU clusters.
  • methods: Three new techniques: shared preloading of essential vertex dependencies across a cluster of multi-GPU servers, expansion-aware sampling that reduces vertex dependencies spanning server boundaries, and cooperative batching that unifies full-graph and mini-batch training.
  • results: Experiments on a multi-server/multi-GPU cluster show that GraNNDis provides superior speedup over state-of-the-art distributed GNN training frameworks.
    Abstract Graph neural networks (GNNs) are one of the most rapidly growing fields within deep learning. According to the growth in the dataset and the model size used for GNNs, an important problem is that it becomes nearly impossible to keep the whole network on GPU memory. Among numerous attempts, distributed training is one popular approach to address the problem. However, due to the nature of GNNs, existing distributed approaches suffer from poor scalability, mainly due to the slow external server communications. In this paper, we propose GraNNDis, an efficient distributed GNN training framework for training GNNs on large graphs and deep layers. GraNNDis introduces three new techniques. First, shared preloading provides a training structure for a cluster of multi-GPU servers. We suggest server-wise preloading of essential vertex dependencies to reduce the low-bandwidth external server communications. Second, we present expansion-aware sampling. Because shared preloading alone has limitations because of the neighbor explosion, expansion-aware sampling reduces vertex dependencies that span across server boundaries. Third, we propose cooperative batching to create a unified framework for full-graph and minibatch training. It significantly reduces redundant memory usage in mini-batch training. From this, GraNNDis enables a reasonable trade-off between full-graph and mini-batch training through unification especially when the entire graph does not fit into the GPU memory. With experiments conducted on a multi-server/multi-GPU cluster, we show that GraNNDis provides superior speedup over the state-of-the-art distributed GNN training frameworks.

Towards Continual Reinforcement Learning for Quadruped Robots

  • paper_url: http://arxiv.org/abs/2311.06828
  • repo_url: None
  • paper_authors: Giovanni Minelli, Vassilis Vassiliades
  • for: Enhancing the adaptability and performance of quadruped robots in real-world scenarios by enabling them to continue learning after deployment.
  • methods: Two continual learning scenarios in which the robot is trained sequentially on different environments while its performance is evaluated across all of them.
  • results: The study sheds light on the extent of forward and backward skill transfer and on the degree to which the robot forgets previously acquired skills.
    Abstract Quadruped robots have emerged as an evolving technology that currently leverages simulators to develop a robust controller capable of functioning in the real-world without the need for further training. However, since it is impossible to predict all possible real-world situations, our research explores the possibility of enabling them to continue learning even after their deployment. To this end, we designed two continual learning scenarios, sequentially training the robot on different environments while simultaneously evaluating its performance across all of them. Our approach sheds light on the extent of both forward and backward skill transfer, as well as the degree to which the robot might forget previously acquired skills. By addressing these factors, we hope to enhance the adaptability and performance of quadruped robots in real-world scenarios.

A Comprehensive Survey On Client Selections in Federated Learning

  • paper_url: http://arxiv.org/abs/2311.06801
  • repo_url: None
  • paper_authors: Ala Gouissem, Zina Chkirbene, Ridha Hamila
  • for: A comprehensive overview of client selection techniques in federated learning, including their strengths and limitations, as well as the challenges and open issues that need to be addressed.
  • methods: Covers conventional techniques such as random selection, performance-aware selection, and resource-aware selection for resource-constrained and heterogeneous networks.
  • results: Discusses the use of client selection for model security enhancement, as well as the open issues and challenges of client selection in dynamic, constrained, and heterogeneous networks.
    Abstract Federated Learning (FL) is a rapidly growing field in machine learning that allows data to be trained across multiple decentralized devices. The selection of clients to participate in the training process is a critical factor for the performance of the overall system. In this survey, we provide a comprehensive overview of the state-of-the-art client selection techniques in FL, including their strengths and limitations, as well as the challenges and open issues that need to be addressed. We cover conventional selection techniques such as random selection where all or partial random of clients is used for the trained. We also cover performance-aware selections and as well as resource-aware selections for resource-constrained networks and heterogeneous networks. We also discuss the usage of client selection in model security enhancement. Lastly, we discuss open issues and challenges related to clients selection in dynamic constrained, and heterogeneous networks.

Learning Predictive Safety Filter via Decomposition of Robust Invariant Set

  • paper_url: http://arxiv.org/abs/2311.06769
  • repo_url: None
  • paper_authors: Zeyang Li, Chuxiong Hu, Weiye Zhao, Changliu Liu
  • for: Reliably guaranteeing the safety of nonlinear systems under model uncertainty and external disturbances, especially in real-world control tasks.
  • methods: A framework bridging robust model predictive control (RMPC) and reinforcement learning (RL): the robust invariant set is decomposed into a target set and a reach-avoid set, a policy iteration approach with proven monotone convergence handles the robust reach-avoid problem, an adversarial actor-critic deep RL algorithm jointly learns reach-avoid policy, disturbance policy, and value networks, and online verification is formulated as a second-order cone program (SOCP) via system level synthesis.
  • results: The resulting safety filter requires much lower computational complexity than RMPC while retaining a persistent robust safety guarantee, as illustrated on a numerical example.
    Abstract Ensuring safety of nonlinear systems under model uncertainty and external disturbances is crucial, especially for real-world control tasks. Predictive methods such as robust model predictive control (RMPC) require solving nonconvex optimization problems online, which leads to high computational burden and poor scalability. Reinforcement learning (RL) works well with complex systems, but pays the price of losing rigorous safety guarantee. This paper presents a theoretical framework that bridges the advantages of both RMPC and RL to synthesize safety filters for nonlinear systems with state- and action-dependent uncertainty. We decompose the robust invariant set (RIS) into two parts: a target set that aligns with terminal region design of RMPC, and a reach-avoid set that accounts for the rest of RIS. We propose a policy iteration approach for robust reach-avoid problems and establish its monotone convergence. This method sets the stage for an adversarial actor-critic deep RL algorithm, which simultaneously synthesizes a reach-avoid policy network, a disturbance policy network, and a reach-avoid value network. The learned reach-avoid policy network is utilized to generate nominal trajectories for online verification, which filters potentially unsafe actions that may drive the system into unsafe regions when worst-case disturbances are applied. We formulate a second-order cone programming (SOCP) approach for online verification using system level synthesis, which optimizes for the worst-case reach-avoid value of any possible trajectories. The proposed safety filter requires much lower computational complexity than RMPC and still enjoys persistent robust safety guarantee. The effectiveness of our method is illustrated through a numerical example.

Personalized Federated Learning via ADMM with Moreau Envelope

  • paper_url: http://arxiv.org/abs/2311.06756
  • repo_url: https://github.com/zsk66/flame-master
  • paper_authors: Shengkun Zhu, Jinshan Zeng, Sheng Wang, Yuan Sun, Zhiyong Peng
  • for: Proposes a personalized federated learning (PFL) method to address poor convergence on heterogeneous data.
  • methods: An alternating direction method of multipliers (ADMM) for training PFL models with the Moreau envelope (FLAME), achieving a sublinear convergence rate under the relatively weak assumption of gradient Lipschitz continuity; being gradient-free, ADMM alleviates hyperparameter tuning, in particular the learning rate for the global model, and a biased client selection strategy is proposed to expedite convergence.
  • results: Trained on heterogeneous data, FLAME outperforms state-of-the-art methods in model performance and achieves an average 3.75x communication speedup over the baselines; the biased client selection strategy speeds up convergence of both personalized and global models.
    Abstract Personalized federated learning (PFL) is an approach proposed to address the issue of poor convergence on heterogeneous data. However, most existing PFL frameworks require strong assumptions for convergence. In this paper, we propose an alternating direction method of multipliers (ADMM) for training PFL models with Moreau envelope (FLAME), which achieves a sublinear convergence rate, relying on the relatively weak assumption of gradient Lipschitz continuity. Moreover, due to the gradient-free nature of ADMM, FLAME alleviates the need for hyperparameter tuning, particularly in avoiding the adjustment of the learning rate when training the global model. In addition, we propose a biased client selection strategy to expedite the convergence of training of PFL models. Our theoretical analysis establishes the global convergence under both unbiased and biased client selection strategies. Our experiments validate that FLAME, when trained on heterogeneous data, outperforms state-of-the-art methods in terms of model performance. Regarding communication efficiency, it exhibits an average speedup of 3.75x compared to the baselines. Furthermore, experimental results validate that the biased client selection strategy speeds up the convergence of both personalized and global models.
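For reference, the Moreau-envelope PFL objective underlying FLAME has the familiar bi-level form (written here in the pFedMe style; the paper's ADMM splitting of it is not reproduced), with $\theta_i$ client $i$'s personalized model and $w$ the global model:

```latex
\[
\min_{w}\; \frac{1}{N}\sum_{i=1}^{N} F_i(w),
\qquad
F_i(w) \;=\; \min_{\theta_i}\Big\{\, f_i(\theta_i) \;+\; \frac{\lambda}{2}\,\lVert \theta_i - w \rVert_2^2 \,\Big\},
\]
% f_i: client i's local loss; lambda controls how tightly personalized
% models are pulled toward the global model.
```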

Application of a Dense Fusion Attention Network in Fault Diagnosis of Centrifugal Fan

  • paper_url: http://arxiv.org/abs/2311.07614
  • repo_url: None
  • paper_authors: Ruijun Wang, Yuan Liu, Zhixia Fan, Xiaogang Xu, Huijie Wang
  • for: Improving condition monitoring of rotating machinery by making the correspondence between the model's structure and the diagnosis process easier to understand.
  • methods: Embeds distributed attention modules into dense connections instead of traditional dense cascading operations, decoupling the influence of space and channel on the adaptive recalibration of fault-feature weights and forming a fusion attention function that makes the network's diagnosis process visualizable and interpretable.
  • results: On centrifugal fan fault data, the network shows stronger diagnostic performance than other advanced fault diagnosis models, with improved fault-feature extraction and noise resistance.
    Abstract Although the deep learning recognition model has been widely used in the condition monitoring of rotating machinery. However, it is still a challenge to understand the correspondence between the structure and function of the model and the diagnosis process. Therefore, this paper discusses embedding distributed attention modules into dense connections instead of traditional dense cascading operations. It not only decouples the influence of space and channel on fault feature adaptive recalibration feature weights, but also forms a fusion attention function. The proposed dense fusion focuses on the visualization of the network diagnosis process, which increases the interpretability of model diagnosis. How to continuously and effectively integrate different functions to enhance the ability to extract fault features and the ability to resist noise is answered. Centrifugal fan fault data is used to verify this network. Experimental results show that the network has stronger diagnostic performance than other advanced fault diagnostic models.

How do Minimum-Norm Shallow Denoisers Look in Function Space?

  • paper_url: http://arxiv.org/abs/2311.06748
  • repo_url: None
  • paper_authors: Chen Zeno, Greg Ongie, Yaniv Blumenfeld, Nir Weinberger, Daniel Soudry
  • for: This paper aims to understand the functions realized by shallow ReLU NN denoisers in the context of interpolation and minimal representation cost.
  • methods: The authors use a theoretical approach to derive closed-form expressions for the NN denoiser functions, and prove their contractivity and generalization properties.
  • results: The authors find that the NN denoiser functions can be decomposed into a sum of simple rank-one piecewise linear interpolations aligned with edges and/or faces connecting training samples, and empirically verify this alignment phenomenon on synthetic data and real images.
    Abstract Neural network (NN) denoisers are an essential building block in many common tasks, ranging from image reconstruction to image generation. However, the success of these models is not well understood from a theoretical perspective. In this paper, we aim to characterize the functions realized by shallow ReLU NN denoisers -- in the common theoretical setting of interpolation (i.e., zero training loss) with a minimal representation cost (i.e., minimal $\ell^2$ norm weights). First, for univariate data, we derive a closed form for the NN denoiser function, find it is contractive toward the clean data points, and prove it generalizes better than the empirical MMSE estimator at a low noise level. Next, for multivariate data, we find the NN denoiser functions in a closed form under various geometric assumptions on the training data: data contained in a low-dimensional subspace, data contained in a union of one-sided rays, or several types of simplexes. These functions decompose into a sum of simple rank-one piecewise linear interpolations aligned with edges and/or faces connecting training samples. We empirically verify this alignment phenomenon on synthetic data and real images.

ReactionT5: a large-scale pre-trained model towards application of limited reaction data

  • paper_url: http://arxiv.org/abs/2311.06708
  • repo_url: https://github.com/sagawatatsuya/ReactionT5
  • paper_authors: Tatsuya Sagawa, Ryosuke Kojima
  • for: A transformer-based model for predicting the outcomes of reactions involving multiple molecules.
  • methods: The model is pretrained on the large-scale, publicly available Open Reaction Database (ORD) and then fine-tuned for specific reaction prediction tasks.
  • results: Impressive performance on yield prediction and product prediction even with limited fine-tuning data compared to traditional models; the pretrained model is publicly accessible on the Hugging Face platform.
    Abstract Transformer-based deep neural networks have revolutionized the field of molecular-related prediction tasks by treating molecules as symbolic sequences. These models have been successfully applied in various organic chemical applications by pretraining them with extensive compound libraries and subsequently fine-tuning them with smaller in-house datasets for specific tasks. However, many conventional methods primarily focus on single molecules, with limited exploration of pretraining for reactions involving multiple molecules. In this paper, we propose ReactionT5, a novel model that leverages pretraining on the Open Reaction Database (ORD), a publicly available large-scale resource. We further fine-tune this model for yield prediction and product prediction tasks, demonstrating its impressive performance even with limited fine-tuning data compared to traditional models. The pre-trained ReactionT5 model is publicly accessible on the Hugging Face platform.
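A hedged usage sketch for the released checkpoints via Hugging Face transformers; the model ID and the input formatting below are assumptions, so consult the repository (https://github.com/sagawatatsuya/ReactionT5) for the exact names and prompt format.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "sagawatatsuya/ReactionT5-product-prediction"   # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Assumed prompt format: reactant/reagent SMILES tagged in a single string.
reaction = "REACTANT:CCO.CC(=O)OC(C)=O REAGENT:c1ccncc1"
inputs = tokenizer(reaction, return_tensors="pt")
outputs = model.generate(**inputs, max_length=128, num_beams=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```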

Transfer Learning to Detect COVID-19 Coughs with Incremental Addition of Patient Coughs to Healthy People’s Cough Detection Models

  • paper_url: http://arxiv.org/abs/2311.06707
  • repo_url: None
  • paper_authors: Sudip Vhaduri, Seungyeon Paik, Jessica E Huber
  • for: Detecting the coughs of COVID-19 patients remotely to help control the rapid spread of the disease.
  • methods: An incremental transfer learning approach that leverages the relationship between healthy people's coughs and COVID-19 patients' coughs, starting from a pre-trained healthy cough detection model and fine-tuning with a relatively small set of patient coughs.
  • results: Reasonable cough detection accuracy with a small amount of patient data, reducing the need for a large patient dataset to train the model.
    Abstract Millions of people have died worldwide from COVID-19. In addition to its high death toll, COVID-19 has led to unbearable suffering for individuals and a huge global burden to the healthcare sector. Therefore, researchers have been trying to develop tools to detect symptoms of this human-transmissible disease remotely to control its rapid spread. Coughing is one of the common symptoms that researchers have been trying to detect objectively from smartphone microphone-sensing. While most of the approaches to detect and track cough symptoms rely on machine learning models developed from a large amount of patient data, this is not possible at the early stage of an outbreak. In this work, we present an incremental transfer learning approach that leverages the relationship between healthy peoples' coughs and COVID-19 patients' coughs to detect COVID-19 coughs with reasonable accuracy using a pre-trained healthy cough detection model and a relatively small set of patient coughs, reducing the need for large patient dataset to train the model. This type of model can be a game changer in detecting the onset of a novel respiratory virus.
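A minimal Keras sketch of the incremental idea: freeze the feature layers of a healthy-cough detector and fine-tune the remaining head as small batches of patient coughs arrive. The feature shape, schedule, and stand-in data are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras

# Stand-in for the pre-trained healthy-cough detector; in practice this
# would be loaded from disk (e.g., keras.models.load_model(...)).
base = keras.Sequential([
    keras.Input(shape=(128,)),                     # assumed audio features
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

for layer in base.layers[:-1]:
    layer.trainable = False                        # keep learned cough features
base.compile(optimizer=keras.optimizers.Adam(1e-4),
             loss="binary_crossentropy", metrics=["accuracy"])

def patient_cough_batches(n_batches=3, batch=16, n_feats=128):
    """Stub for incrementally collected patient cough features/labels."""
    for _ in range(n_batches):
        yield (np.random.rand(batch, n_feats).astype("float32"),
               np.random.randint(0, 2, size=(batch, 1)))

# Incremental addition: fine-tune as each small patient batch arrives.
for X_batch, y_batch in patient_cough_batches():
    base.fit(X_batch, y_batch, epochs=5, verbose=0)
```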

A Physics-informed Machine Learning-based Control Method for Nonlinear Dynamic Systems with Highly Noisy Measurements

  • paper_url: http://arxiv.org/abs/2311.07613
  • repo_url: None
  • paper_authors: Mason Ma, Jiajie Wu, Chase Post, Tony Shi, Jingang Yi, Tony Schmitz, Hong Wang
  • for: Proposes a physics-informed machine learning-based control method for nonlinear dynamic systems with highly noisy measurements; existing data-driven control methods that use machine learning for system identification cannot cope effectively with such noise, leading to unstable control performance.
  • methods: Extends current physics-informed machine learning capabilities for modeling nonlinear dynamics with control and integrates them into a model predictive control framework (a toy version of the physics-plus-residual idea is sketched below); validated on two noisy nonlinear dynamic systems: the chaotic Lorenz 3 system and a turning machine tool.
  • results: Analysis of the results shows that the proposed method outperforms state-of-the-art benchmarks in both modeling accuracy and control performance under high-noise conditions.
    Abstract This study presents a physics-informed machine learning-based control method for nonlinear dynamic systems with highly noisy measurements. Existing data-driven control methods that use machine learning for system identification cannot effectively cope with highly noisy measurements, resulting in unstable control performance. To address this challenge, the present study extends current physics-informed machine learning capabilities for modeling nonlinear dynamics with control and integrates them into a model predictive control framework. To demonstrate the capability of the proposed method we test and validate with two noisy nonlinear dynamic systems: the chaotic Lorenz 3 system, and turning machine tool. Analysis of the results illustrate that the proposed method outperforms state-of-the-art benchmarks as measured by both modeling accuracy and control performance for nonlinear dynamic systems under high-noise conditions.
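The core idea, combining a known physics model with a learned correction, can be illustrated on the Lorenz system the paper uses for validation. The following is a toy sketch under that reading: the ODE right-hand side supplies most of the one-step prediction and a small network fits only the residual. Parameter values, network size, and the training setup are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn

# Toy sketch of the physics-informed idea on the Lorenz system: the known ODE
# right-hand side does most of the work, and a small network only learns the
# residual left unexplained by the physics. Values below are illustrative.
sigma, rho, beta, dt = 10.0, 28.0, 8.0 / 3.0, 0.01

def lorenz_rhs(s):                     # s: (batch, 3) states (x, y, z)
    x, y, z = s[:, 0], s[:, 1], s[:, 2]
    return torch.stack([sigma * (y - x), x * (rho - z) - y, x * y - beta * z], dim=1)

residual = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 3))

def predict_next(s):                   # physics step + learned correction
    return s + dt * lorenz_rhs(s) + residual(s)

opt = torch.optim.Adam(residual.parameters(), lr=1e-3)
s_t, s_next = torch.randn(64, 3), torch.randn(64, 3)  # stand-ins for noisy data
loss = ((predict_next(s_t) - s_next) ** 2).mean()
loss.backward(); opt.step()
```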

eess.IV - 2023-11-12

PuzzleTuning: Explicitly Bridge Pathological and Natural Image with Puzzles

  • paper_url: http://arxiv.org/abs/2311.06712
  • repo_url: https://github.com/sagizty/puzzletuning
  • paper_authors: Tianyi Zhang, Shangqing Lyu, Yanli Lei, Sicheng Chen, Nan Ying, Yufang He, Yu Zhao, Yunlu Feng, Guanglei Zhang
  • for: Improving pathological image analysis. Because annotations are scarce in the pathological field, many recent works pre-train with self-supervised learning (SSL) on unlabeled pathological images, but such methods have two core defects: they do not explicitly explore the essential focuses of the pathological domain, and they do not effectively bridge to, and thus exploit, the large natural-image domain.
  • methods: A large-scale PuzzleTuning framework with the following innovations. First, three task focuses that effectively bridge the pathological and natural domains are identified: appearance consistency, spatial consistency, and misalignment understanding. Second, a multiple puzzle restoring task (sketched below) explicitly pre-trains the model on these focuses. Third, to compensate for the large gap between the two domains, an explicit prompt-tuning process incrementally integrates the domain-specific knowledge with natural knowledge.
  • results: The PuzzleTuning framework outperforms previous SOTA methods on various downstream tasks across multiple datasets. Code, demo, and pre-trained weights are available at https://github.com/sagizty/PuzzleTuning.
    Abstract Pathological image analysis is a crucial field in computer vision. Due to the annotation scarcity in the pathological field, recently, most of the works leverage self-supervised learning (SSL) trained on unlabeled pathological images, hoping to mine the main representation automatically. However, there are two core defects in SSL-based pathological pre-training: (1) they do not explicitly explore the essential focuses of the pathological field, and (2) they do not effectively bridge with and thus take advantage of the large natural image domain. To explicitly address them, we propose our large-scale PuzzleTuning framework, containing the following innovations. Firstly, we identify three task focuses that can effectively bridge pathological and natural domains: appearance consistency, spatial consistency, and misalignment understanding. Secondly, we devise a multiple puzzle restoring task to explicitly pre-train the model with these focuses. Thirdly, for the existing large domain gap between natural and pathological fields, we introduce an explicit prompt-tuning process to incrementally integrate the domain-specific knowledge with the natural knowledge. Additionally, we design a curriculum-learning training strategy that regulates the task difficulty, making the model fit the complex multiple puzzle restoring task adaptively. Experimental results show that our PuzzleTuning framework outperforms the previous SOTA methods in various downstream tasks on multiple datasets. The code, demo, and pre-trained weights are available at https://github.com/sagizty/PuzzleTuning.
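The puzzle-restoring pretext task can be illustrated with a single-puzzle toy version: cut an image into a grid of patches, permute them, and train a model to restore the original image. The grid size and the single-puzzle simplification below are illustrative assumptions; the paper uses multiple puzzles and additional task focuses.

```python
import torch

# Simplified sketch of a puzzle pretext task: cut an image into a grid of
# patches, permute them, and let a model reconstruct the original image.
def make_puzzle(img, grid=4):
    # img: (C, H, W) with H and W divisible by `grid`
    c, h, w = img.shape
    ph, pw = h // grid, w // grid
    patches = img.unfold(1, ph, ph).unfold(2, pw, pw)      # (C, grid, grid, ph, pw)
    patches = patches.reshape(c, grid * grid, ph, pw)
    perm = torch.randperm(grid * grid)
    shuffled = patches[:, perm]                            # permute patch order
    out = shuffled.reshape(c, grid, grid, ph, pw)
    out = out.permute(0, 1, 3, 2, 4).reshape(c, h, w)      # stitch back into an image
    return out, perm

img = torch.rand(3, 224, 224)
puzzle, perm = make_puzzle(img)
# A restoration model would then be trained with e.g. an L2 loss:
# loss = ((model(puzzle) - img) ** 2).mean()
```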

eess.SP - 2023-11-12

Dual-Polarized Reconfigurable Intelligent Surface Assisted Broad Beamforming

  • paper_url: http://arxiv.org/abs/2311.06967
  • repo_url: https://github.com/parisaramezani/RISBroadBeamforming
  • paper_authors: Parisa Ramezani, Maksym A. Girnyk, Emil Björnson
  • for: Studies an RIS-assisted communication system in which a transmitter wants to deliver a common signal to many users spread over a wide angular sector.
  • methods: Proposes a dual-polarized RIS whose elements have two orthogonal polarizations to radiate a broad beam, shows that a broad beam is produced when the phase-shift configuration vectors of the two polarizations form a Golay complementary sequence pair (see the numerical check below), and presents a method for constructing configurations for large RISs from smaller ones.
  • results: Numerical analysis corroborates the mathematical results and shows greatly improved coverage, with large RISs assembled from smaller ones while preserving the broad radiation pattern.
    Abstract A reconfigurable intelligent surface (RIS) consists of a large number of low-cost elements that can control the propagation environment seen from a transmitter by intelligently applying phase shifts to impinging signals before reflection. This paper studies an RIS-assisted communication system where a transmitter wants to transmit a common signal to many users residing in a wide angular area. To cover this sector uniformly, the RIS needs to radiate a broad beam with a spatially flat array factor, instead of a narrow beam as normally considered. To achieve this, we propose to use a dual-polarized RIS consisting of elements with orthogonal polarizations and show that the RIS can produce a broad beam if the phase shift configuration vectors in the two polarizations form a so-called Golay complementary sequence pair. By utilizing their properties, we also present a method for constructing configuration for large RISs from smaller ones, while preserving the broad radiation pattern of the smaller RIS. The numerical results corroborate the mathematical analyses and highlight the greatly improved coverage properties.
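The broad-beam claim rests on the defining property of a Golay complementary pair: the power spectra of the two sequences sum to a constant at every frequency. A short numerical check of that property, using the standard recursive construction rather than the paper's specific RIS configurations:

```python
import numpy as np

# Numerical check of the Golay complementary pair property that underlies the
# flat (broad-beam) array factor: |A(f)|^2 + |B(f)|^2 is constant in f.
def golay_pair(m):
    """Standard recursive construction of a length-2^m binary Golay pair."""
    a = np.array([1.0]); b = np.array([1.0])
    for _ in range(m):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

a, b = golay_pair(6)                   # length-64 sequences
nfft = 4096
spec = np.abs(np.fft.fft(a, nfft))**2 + np.abs(np.fft.fft(b, nfft))**2
print(spec.min(), spec.max())          # both equal 2 * 64: spectrally flat
assert np.allclose(spec, 2 * len(a))
```

In the paper's setting the two sequences take the role of the phase-shift configurations of the two polarizations, so the flat combined spectrum translates into a spatially flat array factor over the sector.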

UAV Formation Optimization for Communication-assisted InSAR Sensing

  • paper_url: http://arxiv.org/abs/2311.06959
  • repo_url: None
  • paper_authors: Mohamed-Amine Lahmeri, Victor Mustieles-Pérez, Martin Vossiek, Gerhard Krieger, Robert Schober
  • for: Improving the efficiency of unmanned aerial vehicle (UAV)-based interferometric synthetic aperture radar (InSAR) sensing applications, such as generating accurate digital elevation models (DEMs).
  • methods: Jointly optimizes the UAV formation and the communication resource allocation to balance InSAR sensing performance and data transfer.
  • results: Alternating optimization (AO) combined with successive convex approximation (SCA) maximizes the InSAR coverage while satisfying all InSAR-specific sensing and communication performance metrics (the underlying height-of-ambiguity metric is illustrated below).
    Abstract Interferometric synthetic aperture radar (InSAR) is an increasingly important remote sensing technique that enables three-dimensional (3D) sensing applications such as the generation of accurate digital elevation models (DEMs). In this paper, we investigate the joint formation and communication resource allocation optimization for a system comprising two unmanned aerial vehicles (UAVs) to perform InSAR sensing and to transfer the acquired data to the ground. To this end, we adopt as sensing performance metrics the interferometric coherence, i.e., the local correlation between the two co-registered UAV radar images, and the height of ambiguity (HoA), which together are a measure for the accuracy with which the InSAR system can estimate the height of ground objects. In addition, an analytical expression for the coverage of the considered InSAR sensing system is derived. Our objective is to maximize the InSAR coverage while satisfying all relevant InSAR-specific sensing and communication performance metrics. To tackle the non-convexity of the formulated optimization problem, we employ alternating optimization (AO) techniques combined with successive convex approximation (SCA). Our simulation results reveal that the resulting resource allocation algorithm outperforms two benchmark schemes in terms of InSAR coverage while satisfying all sensing and real-time communication requirements. Furthermore, we highlight the importance of efficient communication resource allocation in facilitating real-time sensing and unveil the trade-off between InSAR height estimation accuracy and coverage.
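For intuition on the sensing metrics, the height of ambiguity of a single-pass bistatic InSAR pair follows the textbook relation HoA = λ r sin(θ) / B⊥, i.e., the height change corresponding to a full 2π cycle of interferometric phase. A small illustrative computation (all values are made up, not taken from the paper):

```python
import numpy as np

# Illustrative computation of the height of ambiguity (HoA) for a single-pass
# bistatic InSAR pair: HoA = lambda * r * sin(theta) / B_perp. All numbers are
# made-up examples, not values from the paper.
wavelength = 0.031          # m (X-band)
slant_range = 500.0         # m (UAV-to-scene distance)
incidence = np.deg2rad(45)  # incidence angle
b_perp = 2.0                # m, perpendicular baseline between the two UAVs

hoa = wavelength * slant_range * np.sin(incidence) / b_perp
print(f"HoA = {hoa:.2f} m per 2*pi interferometric phase cycle")
# A larger baseline shrinks the HoA (finer height sensitivity) but makes
# coherence preservation harder -- the trade-off the formation optimization
# navigates.
```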

A Generalized Framework for Pulse-Shaping on Delay-Doppler Plane

  • paper_url: http://arxiv.org/abs/2311.06936
  • repo_url: None
  • paper_authors: Mohsen Bayat, Arman Farhang
  • for: Establishing a generalized framework for pulse-shaping on the delay-Doppler plane; delay-Doppler pulse-shaping techniques are classified into circular and linear pulse-shaping, which both enables the generalized framework and brings new insights.
  • methods: Mathematical derivations showing that ODDM is a linear pulse-shaping technique, plus an out-of-band (OOB) emission reduction technique that inserts a small number of zero-guard (ZG) symbols along the delay dimension to improve the OOB leakage and BER performance of both circular and linear pulse-shaping.
  • results: Simulation results confirm the mathematical derivations and claims, and demonstrate the effectiveness of the ZGs in OOB reduction and BER improvement.
    Abstract The primary objective of this paper is to establish a generalized framework for pulse-shaping on the delay-Doppler plane. To this end, we classify delay-Doppler pulse-shaping techniques into two types, namely, circular and linear pulse-shaping. This paves the way towards the development of a generalized pulse-shaping framework. Our generalized framework provides the opportunity to compare different pulse-shaping techniques under the same umbrella while bringing new insights into their properties. In particular, our derivations based on this framework reveal that the recently emerged waveform orthogonal delay-Doppler multiplexing modulation (ODDM) is a linear pulse-shaping technique. By presenting ODDM under our generalized framework, we clearly explain the observed staircase behavior of its spectrum which has not been previously reported in the literature. Another contribution of this paper is proposal of a simple out-of-band (OOB) emission reduction technique by inserting a small number of zero-guard (ZG) symbols along the delay dimension of the circularly pulse-shaped signals. Additionally, inserting the zero-guards improves the bit-error-rate (BER) performance of both circular and linear pulse-shaping techniques. Finally, our simulation results confirm the validity of our mathematical derivations, claims and the effectiveness of the ZGs in OOB reduction and BER performance improvement.

Symbol-Error Probability Constrained Power Minimization for Reconfigurable Intelligent Surfaces-based Passive Transmitter

  • paper_url: http://arxiv.org/abs/2311.06900
  • repo_url: None
  • paper_authors: Erico S. P. Lopes, Lukas T. N. Landau
  • for: Considers a virtual multiuser multiple-input multiple-output system with PSK modulation realized via an RIS-based passive transmitter setup.
  • methods: Derives a formulation of the union-bound symbol-error probability, an upper bound on the actual symbol-error probability, and uses it to pose a symbol-level precoding power minimization problem in which the symbol-error probability must stay below a given requirement.
  • results: The problem is solved with a bisection method combined with a Riemannian conjugate gradient algorithm (the outer bisection structure is sketched below); numerical results show effective transmit-power reduction under different symbol-error probability requirements.
    Abstract This study considers a virtual multiuser multiple-input multiple-output system with PSK modulation realized via the reconfigurable intelligent surface-based passive transmitter setup. Under this framework, the study derives the formulation for the union-bound symbol-error probability, which is an upper bound on the actual symbol-error probability. Based on this, a symbol-level precoding power minimization problem under the condition that the union-bound symbol-error probability is below a given requirement is proposed. The problem is formulated as a constrained optimization on an oblique manifold, and solved via a bisection method. The method consists of successively optimizing transmit power while evaluating the feasibility of the union-bound symbol-error probability requisite by solving, via the Riemannian conjugate gradient algorithm, an auxiliary problem dependent only on the reflection coefficients of the reconfigurable intelligent surface elements. Numerical results demonstrate the effectiveness of the proposed approach in minimizing the transmit power for different symbol-error probability requirements.
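The solution strategy has a simple outer structure: bisect on the transmit power, using the Riemannian conjugate gradient subproblem as a feasibility oracle for the union-bound symbol-error probability. A sketch of that outer loop, with the oracle stubbed out as a placeholder:

```python
# Outer-loop structure of the power-minimization solver: bisect on the transmit
# power and, at each candidate power, ask an inner solver whether the union-bound
# SEP requirement can be met. `sep_feasible` is a stub standing in for the
# Riemannian conjugate-gradient subproblem over the RIS reflection coefficients.
def sep_feasible(power):
    # Placeholder: in the paper this solves an auxiliary problem on an oblique
    # manifold and checks the union-bound SEP against the target.
    return power >= 0.37  # toy monotone feasibility boundary

def min_power_bisection(p_lo=0.0, p_hi=10.0, tol=1e-4):
    assert sep_feasible(p_hi), "upper bracket must be feasible"
    while p_hi - p_lo > tol:
        p_mid = 0.5 * (p_lo + p_hi)
        if sep_feasible(p_mid):
            p_hi = p_mid            # feasible: try lower power
        else:
            p_lo = p_mid            # infeasible: need more power
    return p_hi

print(min_power_bisection())        # ~0.37 for the toy oracle
```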

Energy-efficient Beamforming for RISs-aided Communications: Gradient Based Meta Learning

  • paper_url: http://arxiv.org/abs/2311.06861
  • repo_url: None
  • paper_authors: Xinquan Wang, Fenghao Zhu, Qianyun Zhou, Qihao Yu, Chongwen Huang, Ahmed Alhammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah
  • for: Improving the energy efficiency and scalability of 6G communications.
  • methods: A gradient-based meta learning beamforming (GMLB) approach that feeds the gradient of the sum rate into neural networks, together with a differential regulator that handles the phase-shift optimization of the RIS.
  • results: In simulations, GMLB achieves a higher sum rate than typical alternating optimization algorithms while consuming two orders of magnitude less energy.
    Abstract Reconfigurable intelligent surfaces (RISs) have become a promising technology to meet the requirements of energy efficiency and scalability in future six-generation (6G) communications. However, a significant challenge in RISs-aided communications is the joint optimization of active and passive beamforming at base stations (BSs) and RISs respectively. Specifically, the main difficulty is attributed to the highly non-convex optimization space of beamforming matrices at both BSs and RISs, as well as the diversity and mobility of communication scenarios. To address this, we present a greenly gradient based meta learning beamforming (GMLB) approach. Unlike traditional deep learning based methods which take channel information directly as input, GMLB feeds the gradient of sum rate into neural networks. Coherently, we design a differential regulator to address the phase shift optimization of RISs. Moreover, we use the meta learning to iteratively optimize the beamforming matrices of BSs and RISs. These techniques make the proposed method to work well without requiring energy-consuming pre-training. Simulations show that GMLB could achieve higher sum rate than that of typical alternating optimization algorithms with the energy consumption by two orders of magnitude less.

Multiuser Resource Allocation for Semantic-Relay-Aided Text Transmissions

  • paper_url: http://arxiv.org/abs/2311.06854
  • repo_url: None
  • paper_authors: Zeyang Hu, Tianyu Liu, Changsheng You, Zhaohui Yang, Mingzhe Chen
  • for: Improving the efficiency of semantic text transmission, especially in low signal-to-noise ratio (SNR) and small-bandwidth regions.
  • methods: Proposes a new semantic relay (SemRelay) equipped with a semantic receiver to assist multiuser text transmissions.
  • results: Formulates and solves a multiuser resource allocation problem that improves multiuser text-transmission efficiency, with numerical experiments validating the high-quality suboptimal solutions.
    Abstract Semantic communication (SemCom) is an emerging technology that extracts useful meaning from data and sends only relevant semantic information. Thus, it has the great potential to improve the spectrum efficiency of conventional wireless systems with bit transmissions, especially in low signal-to-noise ratio (SNR) and small bandwidth regions. However, the existing works have mostly overlooked the constraints of mobile devices, which may not have sufficient capabilities to implement resource-demanding semantic encoder/decoder based on deep learning. To address this issue, we propose in this paper a new semantic relay (SemRelay), which is equipped with a semantic receiver to assist multiuser text transmissions. Specifically, the SemRelay decodes semantic information from a base station and forwards it to the users using conventional bit transmission, hence effectively improving text transmission efficiency. To study the multiuser resource allocation, we formulate an optimization problem to maximize the multiuser weighted sum-rate by jointly designing the SemRelay transmit power allocation and system bandwidth allocation. Although this problem is non-convex and hence challenging to solve, we propose an efficient algorithm to obtain its high-quality suboptimal solution by using the block coordinate descent method. Last, numerical results show the effectiveness of the proposed algorithm as well as superior performance of the proposed SemRelay over the conventional decode-and-forward (DF) relay, especially in small bandwidth region.

Coexistence of OTFS Modulation With OFDM-based Communication Systems

  • paper_url: http://arxiv.org/abs/2311.06850
  • repo_url: None
  • paper_authors: Akram Shafie, Jinhong Yuan, Yuting Fang, Paul Fitzpatrick, Taka Sakurai
  • for: Studies the coexistence of orthogonal time-frequency space (OTFS) modulation with current fourth- and fifth-generation (4G/5G) wireless systems, which primarily use OFDM waveforms.
  • methods: Derives the input-output relation (IOR) of the OTFS signal while accounting for the unequal lengths of its cyclic prefixes (CPs), and proposes an embedded pilot-aided channel estimation technique that leverages the derived IOR for accurate channel characterization in coexisting systems.
  • results: Numerical results show that ignoring the unequal CP lengths during signal detection degrades the bit error rate of OTFS in coexisting systems, and that the proposed channel estimation technique outperforms the state-of-the-art threshold-based technique.
    Abstract This study examines the coexistence of orthogonal time-frequency space (OTFS) modulation with current fourth- and fifth-generation (4G/5G) wireless communication systems that primarily use orthogonal frequency-division multiplexing (OFDM) waveforms. We first derive the input-output-relation (IOR) of OTFS when it coexists with an OFDM system while considering the impact of unequal lengths of the cyclic prefixes (CPs) in the OTFS signal. We show analytically that the inclusion of multiple CPs to the OTFS signal results in the effective sampled delay-Doppler (DD) domain channel response to be less sparse. We also show that the effective DD domain channel coefficients for OTFS in coexisting systems are influenced by the unequal lengths of the CPs. Subsequently, we propose an embedded pilot-aided channel estimation (CE) technique for OTFS in coexisting systems that leverages the derived IOR for accurate channel characterization. Using numerical results, we show that ignoring the impact of unequal lengths of the CPs during signal detection can degrade the bit error rate performance of OTFS in coexisting systems. We also show that the proposed CE technique for OTFS in coexisting systems outperforms the state-of-the-art threshold-based CE technique.

Joint Design of Coding and Modulation for Digital Over-the-Air Computation

  • paper_url: http://arxiv.org/abs/2311.06829
  • repo_url: None
  • paper_authors: Xin Xie, Cunqinq Hua, Jianan Hong, Yuejun Wei
  • for: Proposes an over-the-air computation (AirComp) transmission design for digital systems, to improve the reliability of AirComp in complex wireless environments and its compatibility with existing digital communication standards.
  • methods: A non-binary LDPC-based channel coding scheme to strengthen AirComp's error-correction capability, and a lattice-coding-based digital modulation scheme that realizes the summation of values from multiple transmitters.
  • results: Simulation results demonstrate the feasibility and performance of the proposed design.
    Abstract Due to its high communication efficiency, over-the-air computation (AirComp) has been expected to carry out various computing tasks in the next-generation wireless networks. However, up to now, most applications of AirComp are explored in the analog domain, which limits the capability of AirComp in resisting the complex wireless environment, not to mention to integrate the AirComp technique to the existing universal communication standards, most of which are based on the digital system. In this paper, we propose a joint design of channel coding and digital modulation for digital AirComp transmission to attempt to reinforce the foundation for the application of AirComp in the digital system. Specifically, we first propose a non-binary LDPC-based channel coding scheme to enhance the error-correction capability of AirComp. Then, a digital modulation scheme is proposed to achieve the number summation from multiple transmitters via the lattice coding technique. We also provide simulation results to demonstrate the feasibility and the performance of the proposed design.

Secure Rate-Splitting Multiple Access Transmissions in LMS Systems

  • paper_url: http://arxiv.org/abs/2311.06825
  • repo_url: None
  • paper_authors: Minjue He, Hui Zhao, Xiaqing Miao, Shuai Wang, Gaofeng Pan
  • for: investigate the secure delivery performance of the rate-splitting multiple access scheme in land mobile satellite (LMS) systems
  • methods: adopt Maximum ratio transmission (MRT) and matched-filtering (MF) precoding techniques at the satellite, based on the estimated LMS channels suffering from the Shadowed-Rician fading
  • results: derive closed-form expressions for the ergodic rates for decoding the common messages (CMs) and private messages (PMs) at the intended user, as well as the ergodic secrecy rate against eavesdropping, and provide numerical results to validate the analysis models and show interesting comparisons (the MRT precoder is sketched below)
    Abstract This letter investigates the secure delivery performance of the rate-splitting multiple access scheme in land mobile satellite (LMS) systems, considering that the private messages intended by a terminal can be eavesdropped by any others from the broadcast signals. Specifically, the considered system has an N-antenna satellite and numerous single-antenna land users. Maximum ratio transmission (MRT) and matched-filtering (MF) precoding techniques are adopted at the satellite separately for the common messages (CMs) and for the private messages (PMs), which are both implemented based on the estimated LMS channels suffering from the Shadowed-Rician fading. Then, closed-form expressions are derived for the ergodic rates for decoding the CM, and for decoding the PM at the intended user respectively, and more importantly, we also derive the ergodic secrecy rate against eavesdropping. Finally, numerical results are provided to validate the correctness of the proposed analysis models, as well as to show some interesting comparisons.
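For reference, the MRT precoder used for the common messages is the normalized conjugate of the user's channel, which maximizes the received signal power. A minimal sketch with a random channel standing in for the estimated Shadowed-Rician LMS channel:

```python
import numpy as np

# Maximum ratio transmission (MRT): steer the common message along the
# conjugate of the user's channel. The channel here is a random stand-in for
# the estimated Shadowed-Rician LMS channel in the paper.
rng = np.random.default_rng(0)
N = 8                                          # satellite antennas
h = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)

w_mrt = h.conj() / np.linalg.norm(h)           # unit-norm MRT precoder
gain = abs(h @ w_mrt) ** 2                     # equals ||h||^2, the maximum
print(gain, np.linalg.norm(h) ** 2)            # identical up to rounding
```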

Compressive Sensing-Based Grant-Free Massive Access for 6G Massive Communication

  • paper_url: http://arxiv.org/abs/2311.06770
  • repo_url: None
  • paper_authors: Zhen Gao, Malong Ke, Yikun Mei, Li Qiao, Sheng Chen, Derrick Wing Kwan Ng, H. Vincent Poor
  • for: Surveys massive access for sixth-generation (6G) wireless communications, a prime driver of the vision of future ubiquitous connectivity.
  • methods: Focuses on the promising compressive sensing-based grant-free massive access paradigm (a minimal sparse-recovery sketch follows the abstract), covering the evolution from single-antenna to large-scale antenna-array base stations, from single-station to cooperative massive multiple-input multiple-output systems, and from unsourced to sourced random access scenarios.
  • results: Discusses the key challenges and open issues to shed light on potential future research directions for grant-free massive access.
    Abstract The advent of the sixth-generation (6G) of wireless communications has given rise to the necessity to connect vast quantities of heterogeneous wireless devices, which requires advanced system capabilities far beyond existing network architectures. In particular, such massive communication has been recognized as a prime driver that can empower the 6G vision of future ubiquitous connectivity, supporting Internet of Human-Machine-Things for which massive access is critical. This paper surveys the most recent advances toward massive access in both academic and industry communities, focusing primarily on the promising compressive sensing-based grant-free massive access paradigm. We first specify the limitations of existing random access schemes and reveal that the practical implementation of massive communication relies on a dramatically different random access paradigm from the current ones mainly designed for human-centric communications. Then, a compressive sensing-based grant-free massive access roadmap is presented, where the evolutions from single-antenna to large-scale antenna array-based base stations, from single-station to cooperative massive multiple-input multiple-output systems, and from unsourced to sourced random access scenarios are detailed. Finally, we discuss the key challenges and open issues to shed light on the potential future research directions of grant-free massive access.
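In the compressive sensing formulation surveyed here, the base station observes y = Ax + n, where the columns of A are the users' pilot sequences and x is sparse because only a few users are active at a time. A minimal sketch of this activity-detection idea using orthogonal matching pursuit (OMP), one representative greedy recovery algorithm; the surveyed schemes are considerably more sophisticated:

```python
import numpy as np

# Grant-free activity detection as sparse recovery: y = A x + n, with A the
# pilot matrix (one column per potential user) and x nonzero only for active
# users. OMP serves as a simple representative greedy recovery method.
def omp(A, y, k):
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated column
        support.append(j)
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1]); x[support] = x_s
    return x

rng = np.random.default_rng(1)
n_users, pilot_len, n_active = 200, 40, 4            # many users, few active
A = rng.normal(size=(pilot_len, n_users)) / np.sqrt(pilot_len)
x_true = np.zeros(n_users); x_true[rng.choice(n_users, n_active, False)] = 1.0
y = A @ x_true + 0.01 * rng.normal(size=pilot_len)

x_hat = omp(A, y, n_active)
print(sorted(np.flatnonzero(x_true)), sorted(np.flatnonzero(np.round(x_hat, 1))))
```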

One Signal-Noise Separation based Wiener Filter for Magnetogastrogram

  • paper_url: http://arxiv.org/abs/2311.06739
  • repo_url: None
  • paper_authors: Hua Li
  • for: Improving noise suppression and signal separation in magnetogastrogram (MGG) signal detection.
  • methods: A new signal processing framework, the signal-noise separation based Wiener filter (SNSWF), which separates the main noise component and feeds it to the filter input to improve the output SNR of the Wiener filter (the classic Wiener filter it builds on is sketched below).
  • results: Applied to noise suppression for MGG signal detection, the SNSWF improves the filter SNR by 16.7 dB over the classic Wiener filter.
    Abstract Magnetogastrogram (MGG) signal frequency is about 0.05 Hz, the low-frequency environmental noise interference is serious and can be several times stronger in magnitude than the signals of interest and may severely impede the extraction of relevant information. Wiener filter is one classic denoising solution for biomagnetic applications. Since the reference channels are usually placed not far enough from the biomagnetic sources under test, they will inevitably detect the signals and the Wiener filters may produce ill-conditioned solutions. Considering the solutions to improve the signal-to-noise ratio (SNR) of Wiener filter output, there are few methods to separate the signals from the noises of the reference signal at the filter input. In this paper, a new signal processing framework called signal-noise separation based Wiener filter (SNSWF) is proposed that it separates the main noise as the input signal of the filter to improve the output SNR of Wiener filter. The filter was successfully applied to the noise suppression for MGG signal detection. Using the SNSWF, the filter SNR is 16.7 dB better than the classic Wiener filter.
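For context, the classic frequency-domain Wiener filter that SNSWF builds on applies the per-bin gain H(f) = S(f) / (S(f) + N(f)). A compact sketch, with the noise spectrum estimated from a separate noise-only reference segment (an illustrative stand-in for the reference channels in biomagnetic setups):

```python
import numpy as np

# Classic frequency-domain Wiener filter: H(f) = S(f) / (S(f) + N(f)). The
# noise spectrum is estimated from a noise-only reference segment -- an
# illustrative stand-in for the reference channels in biomagnetic setups.
fs = 100.0
t = np.arange(0, 600, 1 / fs)                      # 10 minutes at 100 Hz
signal = np.sin(2 * np.pi * 0.05 * t)              # ~0.05 Hz MGG-like component
noise = 3.0 * np.random.default_rng(2).normal(size=t.size)
measured = signal + noise

def wiener_denoise(x, noise_ref):
    X = np.fft.rfft(x)
    n_psd = np.abs(np.fft.rfft(noise_ref, n=x.size)) ** 2
    x_psd = np.abs(X) ** 2
    s_psd = np.maximum(x_psd - n_psd, 0.0)          # crude signal-PSD estimate
    H = s_psd / np.maximum(x_psd, 1e-12)            # Wiener gain per bin
    return np.fft.irfft(H * X, n=x.size)

noise_ref = 3.0 * np.random.default_rng(3).normal(size=t.size)
clean = wiener_denoise(measured, noise_ref)
```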

cs.CV - 2023-11-11

3DFusion, A real-time 3D object reconstruction pipeline based on streamed instance segmented data

  • paper_url: http://arxiv.org/abs/2311.06659
  • repo_url: None
  • paper_authors: Xi Sun, Derek Jacoby, Yvonne Coady
  • for: Presents a real-time segmentation and reconstruction system that uses RGB-D images to generate accurate and detailed 3D models of individual objects in indoor environments.
  • methods: Applies state-of-the-art instance segmentation for pixel-level separation of foreground objects from the background in RGB-D data, then reconstructs the segmented objects in 3D on a high-performance computing platform. Consecutive frames are sampled to reduce network load while preserving reconstruction quality, a multi-process SLAM pipeline reconstructs the clustered objects in parallel, and YOLO is modified to resolve duplicated or false detections of similar objects so the reconstructed models align with their targets.
  • results: A robust real-time system for object segmentation and reconstruction in indoor environments, applicable to AR/VR, interior design, urban planning, road assistance, and security systems, with potential extension to outdoor scenes.
    Abstract This paper presents a real-time segmentation and reconstruction system that utilizes RGB-D images to generate accurate and detailed individual 3D models of objects within a captured scene. Leveraging state-of-the-art instance segmentation techniques, the system performs pixel-level segmentation on RGB-D data, effectively separating foreground objects from the background. The segmented objects are then reconstructed into distinct 3D models in a high-performance computation platform. The real-time 3D modelling can be applied across various domains, including augmented/virtual reality, interior design, urban planning, road assistance, security systems, and more. To achieve real-time performance, the paper proposes a method that effectively samples consecutive frames to reduce network load while ensuring reconstruction quality. Additionally, a multi-process SLAM pipeline is adopted for parallel 3D reconstruction, enabling efficient cutting of the clustering objects into individuals. This system employs the industry-leading framework YOLO for instance segmentation. To improve YOLO's performance and accuracy, modifications were made to resolve duplicated or false detection of similar objects, ensuring the reconstructed models align with the targets. Overall, this work establishes a robust real-time system with a significant enhancement for object segmentation and reconstruction in the indoor environment. It can potentially be extended to the outdoor scenario, opening up numerous opportunities for real-world applications.

Unsupervised and semi-supervised co-salient object detection via segmentation frequency statistics

  • paper_url: http://arxiv.org/abs/2311.06654
  • repo_url: https://github.com/sourachakra/uscosod-sscosod
  • paper_authors: Souradeep Chakraborty, Shujon Naha, Muhammet Bastan, Amit Kumar K C, Dimitris Samaras
  • for: Proposes an unsupervised method for detecting co-occurring salient objects (CoSOD) across an image group, which further enables a semi-supervised method when limited segmentation annotations are available.
  • methods: Combines the object co-occurrence frequency statistics of unsupervised single-image semantic segmentations with salient foreground detections obtained via self-supervised feature learning; a confidence estimation module guides the semi-supervised training to avoid propagating erroneous signals from unlabeled data.
  • results: Both the unsupervised (US-CoSOD) and semi-supervised (SS-CoSOD) models outperform the corresponding state-of-the-art models by significant margins, e.g., F-measure gains of 8.8% and 11.81%, respectively, on the Cosal2015 dataset.
    Abstract In this paper, we address the detection of co-occurring salient objects (CoSOD) in an image group using frequency statistics in an unsupervised manner, which further enable us to develop a semi-supervised method. While previous works have mostly focused on fully supervised CoSOD, less attention has been allocated to detecting co-salient objects when limited segmentation annotations are available for training. Our simple yet effective unsupervised method US-CoSOD combines the object co-occurrence frequency statistics of unsupervised single-image semantic segmentations with salient foreground detections using self-supervised feature learning. For the first time, we show that a large unlabeled dataset e.g. ImageNet-1k can be effectively leveraged to significantly improve unsupervised CoSOD performance. Our unsupervised model is a great pre-training initialization for our semi-supervised model SS-CoSOD, especially when very limited labeled data is available for training. To avoid propagating erroneous signals from predictions on unlabeled data, we propose a confidence estimation module to guide our semi-supervised training. Extensive experiments on three CoSOD benchmark datasets show that both of our unsupervised and semi-supervised models outperform the corresponding state-of-the-art models by a significant margin (e.g., on the Cosal2015 dataset, our US-CoSOD model has an 8.8% F-measure gain over a SOTA unsupervised co-segmentation model and our SS-CoSOD model has an 11.81% F-measure gain over a SOTA semi-supervised CoSOD model).

Traffic Sign Recognition Using Local Vision Transformer

  • paper_url: http://arxiv.org/abs/2311.06651
  • repo_url: None
  • paper_authors: Ali Farzipour, Omid Nejati Manzari, Shahriar B. Shokouhi
  • for: Improving traffic sign recognition for self-driving cars and driver assistance systems.
  • methods: Combines convolutional blocks with transformer-based blocks, and adds a locality module to enhance local perception.
  • results: Reaches 99.66% accuracy on the German Traffic Sign Recognition Benchmark and 99.8% on the Persian Traffic Sign Dataset, higher than the best convolutional models, while maintaining a fast inference speed suitable for real-world applications.
    Abstract Recognition of traffic signs is a crucial aspect of self-driving cars and driver assistance systems, and machine vision tasks such as traffic sign recognition have gained significant attention. CNNs have been frequently used in machine vision, but introducing vision transformers has provided an alternative approach to global feature learning. This paper proposes a new novel model that blends the advantages of both convolutional and transformer-based networks for traffic sign recognition. The proposed model includes convolutional blocks for capturing local correlations and transformer-based blocks for learning global dependencies. Additionally, a locality module is incorporated to enhance local perception. The performance of the suggested model is evaluated on the Persian Traffic Sign Dataset and German Traffic Sign Recognition Benchmark and compared with SOTA convolutional and transformer-based models. The experimental evaluations demonstrate that the hybrid network with the locality module outperforms pure transformer-based models and some of the best convolutional networks in accuracy. Specifically, our proposed final model reached 99.66% accuracy in the German traffic sign recognition benchmark and 99.8% in the Persian traffic sign dataset, higher than the best convolutional models. Moreover, it outperforms existing CNNs and ViTs while maintaining fast inference speed. Consequently, the proposed model proves to be significantly faster and more suitable for real-world applications.

Back to Basics: Fast Denoising Iterative Algorithm

  • paper_url: http://arxiv.org/abs/2311.06634
  • repo_url: None
  • paper_authors: Deborah Pereg
  • for: Noise reduction and image quality improvement.
  • methods: Back to Basics (BTB), a fast iterative denoising algorithm that requires no training or ground-truth data and applies under independent as well as correlated (coherent) noise with unknown noise level.
  • results: In three study cases (natural image denoising under additive white Gaussian noise, Poisson-distributed image denoising, and speckle suppression in optical coherence tomography (OCT)), the method effectively improves image quality in challenging noise settings, with theoretical guarantees for convergence stability.
    Abstract We introduce Back to Basics (BTB), a fast iterative algorithm for noise reduction. Our method is computationally efficient, does not require training or ground truth data, and can be applied in the presence of independent noise, as well as correlated (coherent) noise, where the noise level is unknown. We examine three study cases: natural image denoising in the presence of additive white Gaussian noise, Poisson-distributed image denoising, and speckle suppression in optical coherence tomography (OCT). Experimental results demonstrate that the proposed approach can effectively improve image quality, in challenging noise settings. Theoretical guarantees are provided for convergence stability.

A 3D Conditional Diffusion Model for Image Quality Transfer – An Application to Low-Field MRI

  • paper_url: http://arxiv.org/abs/2311.06631
  • repo_url: https://github.com/edshkim98/diffusioniqt
  • paper_authors: Seunghoi Kim, Henry F. J. Tregidgo, Ahmed K. Eldaly, Matteo Figini, Daniel C. Alexander
  • for: Enhancing the quality of low-field MRI images.
  • methods: A 3D conditional diffusion model, with a cross-batch mechanism incorporated into the self-attention and padding to broaden contextual awareness even on small 3D patches.
  • results: Outperforms existing methods both quantitatively and qualitatively on the HCP dataset.
    Abstract Low-field (LF) MRI scanners (<1T) are still prevalent in settings with limited resources or unreliable power supply. However, they often yield images with lower spatial resolution and contrast than high-field (HF) scanners. This quality disparity can result in inaccurate clinician interpretations. Image Quality Transfer (IQT) has been developed to enhance the quality of images by learning a mapping function between low and high-quality images. Existing IQT models often fail to restore high-frequency features, leading to blurry output. In this paper, we propose a 3D conditional diffusion model to improve 3D volumetric data, specifically LF MR images. Additionally, we incorporate a cross-batch mechanism into the self-attention and padding of our network, ensuring broader contextual awareness even under small 3D patches. Experiments on the publicly available Human Connectome Project (HCP) dataset for IQT and brain parcellation demonstrate that our model outperforms existing methods both quantitatively and qualitatively. The code is publicly available at \url{https://github.com/edshkim98/DiffusionIQT}.

Computer Vision for Particle Size Analysis of Coarse-Grained Soils

  • paper_url: http://arxiv.org/abs/2311.06613
  • repo_url: None
  • paper_authors: Sompote Youwai, Parchya Makam
  • for: Uses computer vision and the Python programming language for particle size analysis of coarse-grained soils, improving the efficiency of evaluating the physical characteristics of soils.
  • methods: Uses the OpenCV library to detect and measure soil particles in photographs taken with a standard mobile phone camera under ordinary lighting, with a calibration target of known dimensions placed alongside the samples (a sketch of this pipeline follows the abstract).
  • results: Compared with traditional sieve analysis, the method performs well for particles larger than 2 mm (MAPE of about 6%); for particles smaller than 2 mm the MAPE reaches up to 60%, so a higher-resolution camera is recommended for imaging the smaller particles.
    Abstract Particle size analysis (PSA) is a fundamental technique for evaluating the physical characteristics of soils. However, traditional methods like sieving can be time-consuming and labor-intensive. In this study, we present a novel approach that utilizes computer vision (CV) and the Python programming language for PSA of coarse-grained soils, employing a standard mobile phone camera. By eliminating the need for a high-performance camera, our method offers convenience and cost savings. Our methodology involves using the OPENCV library to detect and measure soil particles in digital photographs taken under ordinary lighting conditions. For accurate particle size determination, a calibration target with known dimensions is placed on a plain paper alongside 20 different sand samples. The proposed method is compared with traditional sieve analysis and exhibits satisfactory performance for soil particles larger than 2 mm, with a mean absolute percent error (MAPE) of approximately 6%. However, particles smaller than 2 mm result in higher MAPE, reaching up to 60%. To address this limitation, we recommend using a higher-resolution camera to capture images of the smaller soil particles. Furthermore, we discuss the advantages, limitations, and potential future improvements of our method. Remarkably, the program can be executed on a mobile phone, providing immediate results without the need to send soil samples to a laboratory. This field-friendly feature makes our approach highly convenient for on-site usage, outside of a traditional laboratory setting. Ultimately, this novel method represents an initial disruption to the industry, enabling efficient particle size analysis of soil without the reliance on laboratory-based sieve analysis. KEYWORDS: Computer vision, Grain size, ARUCO
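The processing chain described above, calibrating the scale from a target of known size and then thresholding and measuring particle contours, can be sketched with OpenCV as follows. The file name, marker size, and thresholds are illustrative, and the legacy cv2.aruco function API is assumed (OpenCV >= 4.7 wraps it in a cv2.aruco.ArucoDetector class).

```python
import cv2
import numpy as np

# Sketch of camera-based particle sizing: recover mm-per-pixel from an ArUco
# calibration target of known size, then measure particle contours. File name,
# marker size, and thresholds are illustrative, and the legacy cv2.aruco
# functions from opencv-contrib are assumed.
MARKER_SIZE_MM = 50.0
img = cv2.imread("sand_sample.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
side_px = np.linalg.norm(corners[0][0][0] - corners[0][0][1])
mm_per_px = MARKER_SIZE_MM / side_px                # scale calibration

_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

sizes_mm = []
for c in contours:
    if cv2.contourArea(c) < 20:                     # drop speckle noise
        continue
    (_, _), radius = cv2.minEnclosingCircle(c)
    sizes_mm.append(2 * radius * mm_per_px)         # equivalent diameter in mm
print(f"{len(sizes_mm)} particles, median size {np.median(sizes_mm):.2f} mm")
```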

Swin UNETR++: Advancing Transformer-Based Dense Dose Prediction Towards Fully Automated Radiation Oncology Treatments

  • paper_url: http://arxiv.org/abs/2311.06572
  • repo_url: None
  • paper_authors: Kuancheng Wang, Hai Siong Tan, Rafe Mcbeth
  • for: Develop a deep learning model for automating the creation of radiation treatment plans for cancer therapy.
  • methods: The proposed model, Swin UNETR++, uses a lightweight 3D Dual Cross-Attention (DCA) module to capture the intra- and inter-volume relationships of each patient's unique anatomy; it was trained, validated, and tested on the Open Knowledge-Based Planning dataset.
  • results: Swin UNETR++ demonstrates near-state-of-the-art performance on the validation and test datasets, with average volume-wise acceptance rates of 88.58% and 90.50% and average patient-wise clinical acceptance rates of 100.0% and 98.0%, establishing a basis for future studies to translate 3D dose predictions into a deliverable treatment plan, facilitating full automation.
    Abstract The field of Radiation Oncology is uniquely positioned to benefit from the use of artificial intelligence to fully automate the creation of radiation treatment plans for cancer therapy. This time-consuming and specialized task combines patient imaging with organ and tumor segmentation to generate a 3D radiation dose distribution to meet clinical treatment goals, similar to voxel-level dense prediction. In this work, we propose Swin UNETR++, that contains a lightweight 3D Dual Cross-Attention (DCA) module to capture the intra and inter-volume relationships of each patient's unique anatomy, which fully convolutional neural networks lack. Our model was trained, validated, and tested on the Open Knowledge-Based Planning dataset. In addition to metrics of Dose Score $\overline{S_{\text{Dose}}}$ and DVH Score $\overline{S_{\text{DVH}}}$ that quantitatively measure the difference between the predicted and ground-truth 3D radiation dose distribution, we propose the qualitative metrics of average volume-wise acceptance rate $\overline{R_{\text{VA}}}$ and average patient-wise clinical acceptance rate $\overline{R_{\text{PA}}}$ to assess the clinical reliability of the predictions. Swin UNETR++ demonstrates near-state-of-the-art performance on validation and test dataset (validation: $\overline{S_{\text{DVH}}}$=1.492 Gy, $\overline{S_{\text{Dose}}}$=2.649 Gy, $\overline{R_{\text{VA}}}$=88.58%, $\overline{R_{\text{PA}}}$=100.0%; test: $\overline{S_{\text{DVH}}}$=1.634 Gy, $\overline{S_{\text{Dose}}}$=2.757 Gy, $\overline{R_{\text{VA}}}$=90.50%, $\overline{R_{\text{PA}}}$=98.0%), establishing a basis for future studies to translate 3D dose predictions into a deliverable treatment plan, facilitating full automation.

OR Residual Connection Achieving Comparable Accuracy to ADD Residual Connection in Deep Residual Spiking Neural Networks

  • paper_url: http://arxiv.org/abs/2311.06570
  • repo_url: https://github.com/ym-shan/orrc-syna-natural-pruning
  • paper_authors: Yimeng Shan, Xuerui Qiu, Rui-jie Zhu, Ruike Li, Meng Wang, Haicheng Qu
  • for: This paper aims to improve the performance and energy efficiency of deep residual spiking neural networks (SNNs) for brain-like computing.
  • methods: The authors introduce the OR Residual connection (ORRC) and the Synergistic Attention (SynA) module to the SEW-ResNet architecture, and integrate natural pruning to reduce computational overhead.
  • results: The enhanced OR-Spiking ResNet achieved single-sample classification with as little as 0.8 spikes per neuron, outperforming other spike residual models in accuracy and power consumption (the OR residual operation is sketched below).
    Abstract Spiking Neural Networks (SNNs) have garnered substantial attention in brain-like computing for their biological fidelity and the capacity to execute energy-efficient spike-driven operations. As the demand for heightened performance in SNNs surges, the trend towards training deeper networks becomes imperative, while residual learning stands as a pivotal method for training deep neural networks. In our investigation, we identified that the SEW-ResNet, a prominent representative of deep residual spiking neural networks, incorporates non-event-driven operations. To rectify this, we introduce the OR Residual connection (ORRC) to the architecture. Additionally, we propose the Synergistic Attention (SynA) module, an amalgamation of the Inhibitory Attention (IA) module and the Multi-dimensional Attention (MA) module, to offset energy loss stemming from high quantization. When integrating SynA into the network, we observed the phenomenon of "natural pruning", where after training, some or all of the shortcuts in the network naturally drop out without affecting the model's classification accuracy. This significantly reduces computational overhead and makes it more suitable for deployment on edge devices. Experimental results on various public datasets confirmed that the SynA enhanced OR-Spiking ResNet achieved single-sample classification with as little as 0.8 spikes per neuron. Moreover, when compared to other spike residual models, it exhibited higher accuracy and lower power consumption. Codes are available at https://github.com/Ym-Shan/ORRC-SynA-natural-pruning.
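For binary spike tensors the OR residual has a simple closed form, OR(a, b) = a + b − a·b, which stays in {0, 1} where the ADD residual can produce multi-valued activations. A minimal sketch of this reading of the abstract (not the authors' code, which is linked above):

```python
import torch

# OR residual connection for binary spike tensors: for a, b in {0, 1},
# a OR b = a + b - a*b stays binary, whereas a + b (the ADD residual in
# SEW-ResNet) can reach 2 and is no longer a spike train.
def or_residual(a, b):
    return a + b - a * b

def add_residual(a, b):
    return a + b

a = torch.randint(0, 2, (4, 8)).float()   # spikes from the residual branch
b = torch.randint(0, 2, (4, 8)).float()   # spikes from the shortcut

print(or_residual(a, b).unique())          # tensor([0., 1.]) -- still spikes
print(add_residual(a, b).unique())         # may contain 2., i.e. non-binary
```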

Artificial Intelligence in Assessing Cardiovascular Diseases and Risk Factors via Retinal Fundus Images: A Review of the Last Decade

  • paper_url: http://arxiv.org/abs/2311.07609
  • repo_url: None
  • paper_authors: Mirsaeed Abdollahi, Ali Jafarizadeh, Amirhosein Ghafouri Asbagh, Navid Sobhi, Keysan Pourmoghtader, Siamak Pedrammehr, Houshyar Asadi, Roohallah Alizadehsani, Ru-San Tan, U. Rajendra Acharya
  • for: Provide an overview of the current advancements and challenges in employing retinal imaging and artificial intelligence to identify cardiovascular disorders.
  • methods: A comprehensive search of PubMed, Medline, Google Scholar, Scopus, Web of Sciences, IEEE Xplore, and ACM Digital Library for publications related to cardiovascular diseases and artificial intelligence.
  • results: 87 relevant English-language publications were included, providing insights into the current state of research in this field and highlighting the potential of AI and deep learning for early detection and prediction of cardiovascular diseases.
    Abstract Background: Cardiovascular diseases (CVDs) continue to be the leading cause of mortality on a global scale. In recent years, the application of artificial intelligence (AI) techniques, particularly deep learning (DL), has gained considerable popularity for evaluating the various aspects of CVDs. Moreover, using fundus images and optical coherence tomography angiography (OCTA) to diagnose retinal diseases has been extensively studied. To better understand heart function and anticipate changes based on microvascular characteristics and function, researchers are currently exploring the integration of AI with non-invasive retinal scanning. Leveraging AI-assisted early detection and prediction of cardiovascular diseases on a large scale holds excellent potential to mitigate cardiovascular events and alleviate the economic burden on healthcare systems. Method: A comprehensive search was conducted across various databases, including PubMed, Medline, Google Scholar, Scopus, Web of Sciences, IEEE Xplore, and ACM Digital Library, using specific keywords related to cardiovascular diseases and artificial intelligence. Results: A total of 87 English-language publications, selected for relevance were included in the study, and additional references were considered. This study presents an overview of the current advancements and challenges in employing retinal imaging and artificial intelligence to identify cardiovascular disorders and provides insights for further exploration in this field. Conclusion: Researchers aim to develop precise disease prognosis patterns as the aging population and global CVD burden increase. AI and deep learning are transforming healthcare, offering the potential for single retinal image-based diagnosis of various CVDs, albeit with the need for accelerated adoption in healthcare systems.

Identification of vortex in unstructured mesh with graph neural networks

  • paper_url: http://arxiv.org/abs/2311.06557
  • repo_url: None
  • paper_authors: Lianfa Wang, Yvan Fournier, Jean-Francois Wald, Youssef Mesri
  • for: Identifying flow characteristics from Computational Fluid Dynamics (CFD) databases, helping researchers better understand the flow field, optimize geometry designs, and select the appropriate CFD configuration for the flow characteristics at hand.
  • methods: A Graph Neural Network (GNN) with a U-Net architecture that identifies vortices in CFD results on unstructured meshes; graphs are generated from CFD meshes and the graph hierarchy is built with an algebraic multigrid method, and a vortex auto-labeling method labels vortex regions in 2D CFD meshes.
  • results: After optimizing the input set on CNNs, GNN kernels are benchmarked against a CNN model in terms of classification accuracy, training efficiency, and identified vortex morphology; the approach is shown to adapt to unstructured meshes and to generalize to unseen cases with different turbulence models at different Reynolds numbers.
    Abstract Deep learning has been employed to identify flow characteristics from Computational Fluid Dynamics (CFD) databases to assist the researcher to better understand the flow field, to optimize the geometry design and to select the correct CFD configuration for corresponding flow characteristics. Convolutional Neural Network (CNN) is one of the most popular algorithms used to extract and identify flow features. However, its use, without any additional flow field interpolation, is limited to simple domain geometries and regular meshes, which restricts its application to real industrial cases where complex geometry and irregular meshes are usually used. Aiming at the aforementioned problems, we present a Graph Neural Network (GNN) based model with U-Net architecture to identify the vortex in CFD results on unstructured meshes. The graph generation and graph hierarchy construction using the algebraic multigrid method from CFD meshes are introduced. A vortex auto-labeling method is proposed to label vortex regions in 2D CFD meshes. We refine our approach by first optimizing the input set on CNNs, then benchmarking current GNN kernels against the CNN model and evaluating the performance of GNN kernels in terms of classification accuracy, training efficiency and identified vortex morphology. Finally, we demonstrate the adaptability of our approach to unstructured meshes and its generality to unseen cases with different turbulence models at different Reynolds numbers.
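To make the message-passing idea concrete, here is a minimal, self-contained PyTorch sketch of GCN-style node classification for vortex regions; it is not the authors' implementation, it omits the U-Net pooling hierarchy built by algebraic multigrid, and the velocity features, layer sizes, and random connectivity are illustrative assumptions.

```python
# Minimal GCN-style node classifier for vortex regions on an unstructured mesh.
# Mesh cells are graph nodes; shared edges define graph connectivity.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One round of mean-aggregation message passing: h' = W1*h + W2*mean(neighbors)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w_self = nn.Linear(d_in, d_out)
        self.w_neigh = nn.Linear(d_in, d_out)

    def forward(self, h, adj):
        # adj: dense [N, N] adjacency; divide by degree to average neighbor features
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh = (adj @ h) / deg
        return torch.relu(self.w_self(h) + self.w_neigh(neigh))

class VortexGNN(nn.Module):
    def __init__(self, d_in=2, d_hid=32):
        super().__init__()
        self.conv1 = GraphConv(d_in, d_hid)
        self.conv2 = GraphConv(d_hid, d_hid)
        self.head = nn.Linear(d_hid, 1)  # per-node vortex logit

    def forward(self, x, adj):
        h = self.conv1(x, adj)
        h = self.conv2(h, adj)
        return self.head(h).squeeze(-1)

# Toy usage: 100 mesh cells with 2D velocity features and random connectivity.
N = 100
x = torch.randn(N, 2)                       # e.g. (u, v) velocity per cell
adj = (torch.rand(N, N) < 0.05).float()
adj = ((adj + adj.T) > 0).float()           # make the adjacency symmetric
labels = torch.randint(0, 2, (N,)).float()  # stand-in for auto-generated vortex labels
model = VortexGNN()
loss = nn.functional.binary_cross_entropy_with_logits(model(x, adj), labels)
loss.backward()
```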

Visual Commonsense based Heterogeneous Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2311.06553
  • repo_url: None
  • paper_authors: Zongzhao Li, Xiangyu Zhu, Xi Zhang, Zhaoxiang Zhang, Zhen Lei
  • for: improving the understanding of vision-language relations and the reasoning about cross-domain relationships in multi-modal applications such as visual question answering (VQA).
  • methods: a heterogeneous graph contrastive learning method that incorporates visual commonsense information and a Graph Relation Network to better complete visual reasoning tasks; it is designed as a plug-and-play module.
  • results: extensive experiments on four benchmarks demonstrate the effectiveness and generalizability of the method, which substantially improves the performance of seven representative VQA models.
    Abstract How to select relevant key objects and how to reason about the complex relationships across the vision and linguistic domains are two key issues in many multi-modality applications such as visual question answering (VQA). In this work, we incorporate visual commonsense information and propose a heterogeneous graph contrastive learning method to better finish the visual reasoning task. Our method is designed in a plug-and-play way, so that it can be quickly and easily combined with a wide range of representative methods. Specifically, our model contains two key components: the Commonsense-based Contrastive Learning and the Graph Relation Network. Using contrastive learning, we guide the model to concentrate more on discriminative objects and relevant visual commonsense attributes. Besides, thanks to the introduction of the Graph Relation Network, the model reasons about the correlations between homogeneous edges and the similarities between heterogeneous edges, which makes information transmission more effective. Extensive experiments on four benchmarks show that our method greatly improves seven representative VQA models, demonstrating its effectiveness and generalizability.
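The paper's exact Commonsense-based Contrastive Learning objective is not reproduced here, but the generic InfoNCE loss that such contrastive methods build on can be sketched as follows; the embedding size and temperature are arbitrary assumptions.

```python
# Generic InfoNCE contrastive loss: pull each anchor toward its positive
# (e.g., a commonsense-related object embedding) and away from in-batch negatives.
import torch
import torch.nn.functional as F

def info_nce(anchors: torch.Tensor, positives: torch.Tensor, temperature: float = 0.07):
    a = F.normalize(anchors, dim=-1)        # [B, D]
    p = F.normalize(positives, dim=-1)      # [B, D], row i is the positive of anchor i
    logits = a @ p.T / temperature          # [B, B]; diagonal entries are positive pairs
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

anchors = torch.randn(16, 128)
positives = anchors + 0.1 * torch.randn(16, 128)  # toy positives
print(info_nce(anchors, positives).item())
```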

Stain Consistency Learning: Handling Stain Variation for Automatic Digital Pathology Segmentation

  • paper_url: http://arxiv.org/abs/2311.06552
  • repo_url: https://github.com/mlyg/stain_consistency_learning
  • paper_authors: Michael Yeung, Todd Watts, Sean YW Tan, Pedro F. Ferreira, Andrew D. Scott, Sonia Nielles-Vallespin, Guang Yang
  • for: improving the robustness of machine learning methods to stain variation and comparatively evaluating existing methods so that the best approach can be identified.
  • methods: a novel Stain Consistency Learning framework that combines stain-specific augmentation with a stain consistency loss function to learn stain-colour-invariant features.
  • results: the first extensive comparison of methods for handling stain variation, on Masson's trichrome-stained cell and H&E-stained nuclei datasets, showing that the proposed method achieves the best performance.
    Abstract Stain variation is a unique challenge associated with automated analysis of digital pathology. Numerous methods have been developed to improve the robustness of machine learning methods to stain variation, but comparative studies have demonstrated limited benefits to performance. Moreover, methods to handle stain variation were largely developed for H&E stained data, with evaluation generally limited to classification tasks. Here we propose Stain Consistency Learning, a novel framework combining stain-specific augmentation with a stain consistency loss function to learn stain colour invariant features. We perform the first, extensive comparison of methods to handle stain variation for segmentation tasks, comparing ten methods on Masson's trichrome and H&E stained cell and nuclei datasets, respectively. We observed that stain normalisation methods resulted in equivalent or worse performance, while stain augmentation or stain adversarial methods demonstrated improved performance, with the best performance consistently achieved by our proposed approach. The code is available at: https://github.com/mlyg/stain_consistency_learning
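A rough sketch of the core training step, pairing a stain-style augmentation with a feature consistency penalty, might look like the following; the crude per-channel jitter standing in for stain augmentation, the tiny encoder, and the 0.1 loss weight are simplifying assumptions rather than the authors' recipe.

```python
# Stain-consistency training step: two stain-style augmentations of the same image
# should yield the same features (consistency loss) and the same segmentation (task loss).
import torch
import torch.nn as nn
import torch.nn.functional as F

def stain_augment(img: torch.Tensor) -> torch.Tensor:
    """Crude stand-in for stain augmentation: random per-channel gain and offset."""
    gain = 1.0 + 0.2 * torch.randn(1, img.size(1), 1, 1)
    bias = 0.05 * torch.randn(1, img.size(1), 1, 1)
    return (img * gain + bias).clamp(0, 1)

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
seg_head = nn.Conv2d(16, 2, 1)  # 2-class segmentation head

img = torch.rand(4, 3, 64, 64)
mask = torch.randint(0, 2, (4, 64, 64))

f1, f2 = encoder(stain_augment(img)), encoder(stain_augment(img))
task_loss = F.cross_entropy(seg_head(f1), mask)
consistency_loss = F.mse_loss(f1, f2)       # stain-colour invariance of features
loss = task_loss + 0.1 * consistency_loss   # 0.1 is an arbitrary weight
loss.backward()
```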

FDNet: Feature Decoupled Segmentation Network for Tooth CBCT Image

  • paper_url: http://arxiv.org/abs/2311.06551
  • repo_url: None
  • paper_authors: Xiang Feng, Chengkai Wang, Chengyu Wu, Yunxiang Li, Yongbo He, Shuai Wang, Yaiqi Wang
  • for: improving the accuracy of tooth segmentation in CBCT images to support orthodontic treatment planning.
  • methods: a novel Feature Decoupled Segmentation Network (FDNet) that combines a Low-Frequency Wavelet Transform (LF-Wavelet) with a SAM encoder to sharpen tooth boundaries and improve the accuracy of fine-detail segmentation.
  • results: FDNet achieves Dice and IoU scores of 85.28% and 75.23% on CBCT images, indicating that the method effectively bridges the semantic gap and delivers precise tooth segmentation.
    Abstract Precise Tooth Cone Beam Computed Tomography (CBCT) image segmentation is crucial for orthodontic treatment planning. In this paper, we propose FDNet, a Feature Decoupled Segmentation Network, to excel in the face of the variable dental conditions encountered in CBCT scans, such as complex artifacts and indistinct tooth boundaries. The Low-Frequency Wavelet Transform (LF-Wavelet) is employed to enrich the semantic content by emphasizing the global structural integrity of the teeth, while the SAM encoder is leveraged to refine the boundary delineation, thus improving the contrast between adjacent dental structures. By integrating these dual aspects, FDNet adeptly addresses the semantic gap, providing a detailed and accurate segmentation. The framework's effectiveness is validated through rigorous benchmarks, achieving the top Dice and IoU scores of 85.28% and 75.23%, respectively. This innovative decoupling of semantic and boundary features capitalizes on the unique strengths of each element to significantly elevate the quality of segmentation performance.
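The low-frequency component that LF-Wavelet relies on can be obtained with a standard 2D discrete wavelet transform; the sketch below uses PyWavelets with a Haar wavelet purely for illustration, as the paper's exact wavelet configuration is not specified here.

```python
# Extract the low-frequency (approximation) sub-band of a 2D image, which carries
# the global structure that FDNet uses to enrich semantic content.
import numpy as np
import pywt

image = np.random.rand(256, 256)                 # stand-in for a CBCT slice
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')      # cA: low-frequency approximation
print(cA.shape)                                   # (128, 128): half resolution

# Zero out the detail bands and invert the transform to get a low-frequency-only
# image at the original resolution.
lf_image = pywt.idwt2((cA, (None, None, None)), 'haar')
print(lf_image.shape)                             # (256, 256)
```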

Generation Of Colors using Bidirectional Long Short Term Memory Networks

  • paper_url: http://arxiv.org/abs/2311.06542
  • repo_url: https://github.com/chungimungi/color-prediction
  • paper_authors: A. Sinha
  • for: bridging the gap between human visual perception of countless shades of colour and our ability to name and describe them accurately.
  • methods: a novel model based on Bidirectional Long Short-Term Memory (BiLSTM) networks with active learning, operating on a proprietary dataset curated for this study, to categorize and name previously unnamed colours or identify intermediate shades that elude traditional colour terminology.
  • results: the findings demonstrate the potential of this approach to advance our understanding of colour perception and language and to extend the applications of Natural Language Processing (NLP) beyond conventional boundaries.
    Abstract Human vision can distinguish between a vast spectrum of colours, estimated to be between 2 to 7 million discernible shades. However, this impressive range does not inherently imply that all these colours have been precisely named and described within our lexicon. We often associate colours with familiar objects and concepts in our daily lives. This research endeavors to bridge the gap between our visual perception of countless shades and our ability to articulate and name them accurately. A novel model has been developed to achieve this goal, leveraging Bidirectional Long Short-Term Memory (BiLSTM) networks with Active learning. This model operates on a proprietary dataset meticulously curated for this study. The primary objective of this research is to create a versatile tool for categorizing and naming previously unnamed colours or identifying intermediate shades that elude traditional colour terminology. The findings underscore the potential of this innovative approach in revolutionizing our understanding of colour perception and language. Through rigorous experimentation and analysis, this study illuminates a promising avenue for Natural Language Processing (NLP) applications in diverse industries. By facilitating the exploration of the vast colour spectrum the potential applications of NLP are extended beyond conventional boundaries.
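Since the dataset is proprietary, only a schematic PyTorch sketch of a BiLSTM colour-name classifier is possible; reading the RGB triple as a length-3 sequence and the vocabulary size of 500 names are assumptions made for illustration.

```python
# Schematic BiLSTM colour-name classifier: an RGB value, read as a short sequence,
# is encoded bidirectionally and mapped to a colour-name vocabulary.
import torch
import torch.nn as nn

class ColorNamer(nn.Module):
    def __init__(self, hidden=64, n_names=500):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_names)   # 2x for both directions

    def forward(self, rgb):                         # rgb: [B, 3] in [0, 1]
        seq = rgb.unsqueeze(-1)                     # [B, 3, 1]: R, G, B as a sequence
        out, _ = self.lstm(seq)
        return self.fc(out[:, -1])                  # logits over colour names

model = ColorNamer()
logits = model(torch.rand(8, 3))
print(logits.shape)                                 # torch.Size([8, 500])
```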

CrashCar101: Procedural Generation for Damage Assessment

  • paper_url: http://arxiv.org/abs/2311.06536
  • repo_url: None
  • paper_authors: Jens Parslov, Erik Riise, Dim P. Papadopoulos
  • for: addressing damage assessment for vehicles, which requires detecting the location and extent of damage as well as identifying the damaged part.
  • methods: training computer vision systems for semantic part and damage segmentation with a procedural generation pipeline that damages 3D car models, yielding highly varied synthetic samples with pixel-accurate annotations for each one.
  • results: the pipeline is executed to render the CrashCar101 dataset; experiments on three real datasets show that, for part segmentation, models trained on a combination of real and synthetic data outperform models trained on real data alone, and for damage segmentation, CrashCar101 demonstrates sim2real transfer ability.
    Abstract In this paper, we are interested in addressing the problem of damage assessment for vehicles, such as cars. This task requires not only detecting the location and the extent of the damage but also identifying the damaged part. To train a computer vision system for the semantic part and damage segmentation in images, we need to manually annotate images with costly pixel annotations for both part categories and damage types. To overcome this need, we propose to use synthetic data to train these models. Synthetic data can provide samples with high variability, pixel-accurate annotations, and arbitrarily large training sets without any human intervention. We propose a procedural generation pipeline that damages 3D car models and we obtain synthetic 2D images of damaged cars paired with pixel-accurate annotations for part and damage categories. To validate our idea, we execute our pipeline and render our CrashCar101 dataset. We run experiments on three real datasets for the tasks of part and damage segmentation. For part segmentation, we show that the segmentation models trained on a combination of real data and our synthetic data outperform all models trained only on real data. For damage segmentation, we show the sim2real transfer ability of CrashCar101.

Band-wise Hyperspectral Image Pansharpening using CNN Model Propagation

  • paper_url: http://arxiv.org/abs/2311.06510
  • repo_url: https://github.com/giu-guarino/r-pnn
  • paper_authors: Giuseppe Guarino, Matteo Ciotola, Gemine Vivone, Giuseppe Scarpa
  • for: proposing a deep learning method for the hyperspectral pansharpening problem.
  • methods: a single-band unsupervised pansharpening model embedded in a sequential band-wise adaptive scheme, so that the model adapts to each spectral band in turn.
  • results: very good results on the authors' datasets, outperforming both traditional and deep learning reference methods; the implementation is available at https://github.com/giu-guarino/R-PNN.
    Abstract Hyperspectral pansharpening has been receiving growing interest in the last few years, as testified by a large number of research papers and challenges. It consists of a pixel-level fusion between a lower-resolution hyperspectral datacube and a higher-resolution single-band image, the panchromatic image, with the goal of providing a hyperspectral datacube at panchromatic resolution. Thanks to their powerful representational capabilities, deep learning models have succeeded in providing unprecedented results on many general purpose image processing tasks. However, when moving to domain-specific problems, as in this case, the advantages with respect to traditional model-based approaches are much less clear-cut due to several contextual reasons. Scarcity of training data, lack of ground truth and data shape variability are some such factors that limit the generalization capacity of the state-of-the-art deep learning networks for hyperspectral pansharpening. To cope with these limitations, in this work we propose a new deep learning method which inherits a simple single-band unsupervised pansharpening model nested in a sequential band-wise adaptive scheme, where each band is pansharpened by refining the model tuned on the preceding one. By doing so, a simple model is propagated along the wavelength dimension, adaptively and flexibly, with no need for a fixed number of spectral bands and no need for large, expensive and labeled training datasets. The proposed method achieves very good results on our datasets, outperforming both traditional and deep learning reference methods. The implementation of the proposed method can be found at https://github.com/giu-guarino/R-PNN
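The band-wise propagation scheme can be written as a simple loop in which the network tuned on one band initializes the next; in the sketch below, the tiny CNN and the downsampling-consistency loss are placeholders, not the R-PNN architecture or its actual unsupervised losses.

```python
# Band-wise model propagation: pansharpen band k by fine-tuning the model
# that was just tuned on band k-1, so knowledge flows along the wavelength axis.
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1))        # (band, pan) -> sharp band

pan = torch.rand(1, 1, 128, 128)                            # panchromatic image
hs_lr = torch.rand(1, 50, 32, 32)                           # 50 low-res spectral bands
scale = 4
sharpened = []

for k in range(hs_lr.size(1)):                              # the model carries over per band
    band_up = F.interpolate(hs_lr[:, k:k+1], scale_factor=scale, mode='bilinear')
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)
    for _ in range(10):                                     # few adaptation steps per band
        out = net(torch.cat([band_up, pan], dim=1))
        # stand-in unsupervised loss: spectral consistency after downsampling
        loss = F.mse_loss(F.avg_pool2d(out, scale), hs_lr[:, k:k+1])
        opt.zero_grad(); loss.backward(); opt.step()
    sharpened.append(out.detach())

result = torch.cat(sharpened, dim=1)                        # [1, 50, 128, 128]
```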

Self-supervised Context Learning for Visual Inspection of Industrial Defects

  • paper_url: http://arxiv.org/abs/2311.06504
  • repo_url: None
  • paper_authors: Peng Wang, Haiming Yao, Wenyong Yu
  • for: proposing a self-supervised inspection method that addresses the difficulty existing unsupervised models have in detecting defects when product surfaces vary substantially.
  • methods: a self-supervised learning algorithm that divides the target image into nine patches and has the encoder predict the relative position relationship between any two patches to extract rich semantics, plus an affinity-augmentation method that accentuates the difference between normal and abnormal latent representations.
  • results: outstanding detection and segmentation performance on the widely used MVTec AD dataset, at 95.8% and 96.8% respectively, establishing a state-of-the-art benchmark for unsupervised inspection; extensive experiments confirm the method's effectiveness across diverse industrial applications.
    Abstract The unsupervised visual inspection of defects in industrial products poses a significant challenge due to substantial variations in product surfaces. Current unsupervised models struggle to strike a balance between detecting texture and object defects, lacking the capacity to discern latent representations and intricate features. In this paper, we present a novel self-supervised learning algorithm designed to derive an optimal encoder by tackling the renowned jigsaw puzzle. Our approach involves dividing the target image into nine patches, tasking the encoder with predicting the relative position relationships between any two patches to extract rich semantics. Subsequently, we introduce an affinity-augmentation method to accentuate differences between normal and abnormal latent representations. Leveraging the classic support vector data description algorithm yields final detection results. Experimental outcomes demonstrate that our proposed method achieves outstanding detection and segmentation performance on the widely used MVTec AD dataset, with rates of 95.8% and 96.8%, respectively, establishing a state-of-the-art benchmark for both texture and object defects. Comprehensive experimentation underscores the effectiveness of our approach in diverse industrial applications.
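The jigsaw-style pretext task can be sketched as follows: the image is cut into a 3x3 grid and the encoder must classify the relative spatial offset between two patches; the encoder, head sizes, and 24-way offset labelling are illustrative assumptions.

```python
# Self-supervised pretext: split an image into a 3x3 grid of patches and train an
# encoder to predict the relative position (spatial offset) between two patches.
import torch
import torch.nn as nn

def nine_patches(img):                          # img: [B, C, H, W], H and W divisible by 3
    B, C, H, W = img.shape
    ph, pw = H // 3, W // 3
    patches = img.unfold(2, ph, ph).unfold(3, pw, pw)    # [B, C, 3, 3, ph, pw]
    return patches.permute(0, 2, 3, 1, 4, 5).reshape(B, 9, C, ph, pw)

encoder = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())   # patch -> 32-d feature
head = nn.Linear(64, 24)   # 24 = all (dx, dy) offsets in {-2..2}^2 except (0, 0)

offsets = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3) if (dx, dy) != (0, 0)]
offset_to_class = {o: i for i, o in enumerate(offsets)}

img = torch.rand(8, 3, 96, 96)
p = nine_patches(img)                                    # [8, 9, 3, 32, 32]
i, j = 0, 5                                              # a sampled pair of patch indices
dx, dy = (j % 3) - (i % 3), (j // 3) - (i // 3)          # column and row offsets
feat = torch.cat([encoder(p[:, i]), encoder(p[:, j])], dim=1)    # [8, 64]
target = torch.full((8,), offset_to_class[(dx, dy)], dtype=torch.long)
loss = nn.functional.cross_entropy(head(feat), target)
loss.backward()
```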

LayoutPrompter: Awaken the Design Ability of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.06495
  • repo_url: https://github.com/microsoft/layoutgeneration
  • paper_authors: Jiawei Lin, Jiaqi Guo, Shizhao Sun, Zijiang James Yang, Jian-Guang Lou, Dongmei Zhang
  • for: proposing a large language model (LLM)-based method for conditional graphic layout generation that addresses the lack of versatility and data efficiency in existing approaches.
  • methods: three key components: input-output serialization, dynamic exemplar selection, and layout ranking; input-output serialization carefully designs the input and output formats for each layout generation task, dynamic exemplar selection picks the most helpful prompting exemplars for a given input, and layout ranking selects the highest-quality layout from multiple LLM outputs.
  • results: experiments show that LayoutPrompter matches or outperforms state-of-the-art methods on all existing layout generation tasks without any model training or fine-tuning; it also outperforms a training-based baseline in the low-data regime, confirming its data efficiency.
    Abstract Conditional graphic layout generation, which automatically maps user constraints to high-quality layouts, has attracted widespread attention today. Although recent works have achieved promising performance, the lack of versatility and data efficiency hinders their practical applications. In this work, we propose LayoutPrompter, which leverages large language models (LLMs) to address the above problems through in-context learning. LayoutPrompter is made up of three key components, namely input-output serialization, dynamic exemplar selection and layout ranking. Specifically, the input-output serialization component meticulously designs the input and output formats for each layout generation task. Dynamic exemplar selection is responsible for selecting the most helpful prompting exemplars for a given input. And a layout ranker is used to pick the highest quality layout from multiple outputs of LLMs. We conduct experiments on all existing layout generation tasks using four public datasets. Despite the simplicity of our approach, experimental results show that LayoutPrompter can compete with or even outperform state-of-the-art approaches on these tasks without any model training or fine-tuning. This demonstrates the effectiveness of this versatile and training-free approach. In addition, the ablation studies show that LayoutPrompter is significantly superior to the training-based baseline in a low-data regime, further indicating the data efficiency of LayoutPrompter. Our project is available at https://github.com/microsoft/LayoutGeneration/tree/main/LayoutPrompter.
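The dynamic exemplar selection component can be approximated by nearest-neighbour retrieval over serialized inputs; in the sketch below, the character-frequency embedding, the serialization format, and k = 2 are made-up stand-ins for illustration.

```python
# Dynamic exemplar selection: pick the k training exemplars whose serialized
# constraints are most similar to the test input, then splice them into the prompt.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy text embedding: normalized character-frequency vector
    (a stand-in for a real encoder)."""
    v = np.zeros(128)
    for ch in text:
        v[ord(ch) % 128] += 1
    return v / (np.linalg.norm(v) + 1e-8)

train_exemplars = [
    ("title 100x20 at top | image 200x150 below", "<layout A ...>"),
    ("button 80x30 bottom-right", "<layout B ...>"),
    ("logo 50x50 top-left | text 300x40 center", "<layout C ...>"),
]
query = "title 120x24 at top | image 180x140 below"

sims = [embed(query) @ embed(inp) for inp, _ in train_exemplars]
top_k = np.argsort(sims)[::-1][:2]                      # k = 2 exemplars

prompt = "".join(f"Input: {train_exemplars[i][0]}\nOutput: {train_exemplars[i][1]}\n\n"
                 for i in top_k) + f"Input: {query}\nOutput:"
print(prompt)
```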

PECoP: Parameter Efficient Continual Pretraining for Action Quality Assessment

  • paper_url: http://arxiv.org/abs/2311.07603
  • repo_url: https://github.com/plrbear/pecop
  • paper_authors: Amirhossein Dadashzadeh, Shuchao Duan, Alan Whone, Majid Mirmehdi
  • for: improving model performance in Action Quality Assessment (AQA), particularly when there is a significant domain shift between pretraining and target data.
  • methods: a novel, parameter-efficient continual pretraining framework, PECoP, that inserts 3D-Adapters to learn in-domain spatiotemporal features during pretraining; only the adapter modules' parameters are updated, reducing the impact of domain shift.
  • results: improved performance on several benchmark datasets, including JIGSAWS (up 6.0%), MTL-AQA (up 0.99%), and FineDiving (up 2.54%); on a newly introduced Parkinson's Disease dataset, PD4T, of real patients performing four different actions, the method surpasses the previous state of the art (up 3.56%).
    Abstract The limited availability of labelled data in Action Quality Assessment (AQA), has forced previous works to fine-tune their models pretrained on large-scale domain-general datasets. This common approach results in weak generalisation, particularly when there is a significant domain shift. We propose a novel, parameter efficient, continual pretraining framework, PECoP, to reduce such domain shift via an additional pretraining stage. In PECoP, we introduce 3D-Adapters, inserted into the pretrained model, to learn spatiotemporal, in-domain information via self-supervised learning where only the adapter modules' parameters are updated. We demonstrate PECoP's ability to enhance the performance of recent state-of-the-art methods (MUSDL, CoRe, and TSA) applied to AQA, leading to considerable improvements on benchmark datasets, JIGSAWS ($\uparrow6.0\%$), MTL-AQA ($\uparrow0.99\%$), and FineDiving ($\uparrow2.54\%$). We also present a new Parkinson's Disease dataset, PD4T, of real patients performing four various actions, where we surpass ($\uparrow3.56\%$) the state-of-the-art in comparison. Our code, pretrained models, and the PD4T dataset are available at https://github.com/Plrbear/PECoP.
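The parameter-efficient idea, updating only small adapters inside a frozen backbone, can be sketched as follows; the 2D bottleneck adapter below is a simplification of the paper's 3D-Adapters, and the dimensions are placeholders.

```python
# Parameter-efficient continual pretraining: freeze the pretrained backbone and
# update only small bottleneck adapters inserted after each block.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: down-project, nonlinearity, up-project."""
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight); nn.init.zeros_(self.up.bias)  # start as identity

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class AdaptedBlock(nn.Module):
    def __init__(self, block, dim):
        super().__init__()
        self.block, self.adapter = block, Adapter(dim)

    def forward(self, x):
        return self.adapter(self.block(x))

backbone = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))
model = nn.Sequential(*[AdaptedBlock(b, 256) if isinstance(b, nn.Linear) else b
                        for b in backbone])

for p in model.parameters():          # freeze everything, then unfreeze adapters
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, Adapter):
        for p in m.parameters():
            p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable params: {trainable} / {sum(p.numel() for p in model.parameters())}")
```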

Polarimetric PatchMatch Multi-View Stereo

  • paper_url: http://arxiv.org/abs/2311.07600
  • repo_url: None
  • paper_authors: Jinyu Zhao, Jumpei Oishi, Yusuke Monno, Masatoshi Okutomi
  • for: improving the accuracy and completeness of multi-view stereo (MVS) reconstruction.
  • methods: PatchMatch multi-view stereo, which generates depth and normal hypotheses and efficiently searches multi-view images for the best hypothesis to determine the object's 3D model; polarization information is additionally introduced to assess the validity of each hypothesis.
  • results: experiments show that, compared with existing PatchMatch MVS methods, PolarPMS improves the accuracy and completeness of reconstructed 3D models, especially for texture-less surfaces.
    Abstract PatchMatch Multi-View Stereo (PatchMatch MVS) is one of the popular MVS approaches, owing to its balanced accuracy and efficiency. In this paper, we propose Polarimetric PatchMatch multi-view Stereo (PolarPMS), which is the first method exploiting polarization cues to PatchMatch MVS. The key of PatchMatch MVS is to generate depth and normal hypotheses, which form local 3D planes and slanted stereo matching windows, and efficiently search for the best hypothesis based on the consistency among multi-view images. In addition to standard photometric consistency, our PolarPMS evaluates polarimetric consistency to assess the validness of a depth and normal hypothesis, motivated by the physical property that the polarimetric information is related to the object's surface normal. Experimental results demonstrate that our PolarPMS can improve the accuracy and the completeness of reconstructed 3D models, especially for texture-less surfaces, compared with state-of-the-art PatchMatch MVS methods.
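A hypothesis cost that combines photometric and polarimetric consistency might look like the sketch below; the AoLP-from-normal model (a specular-style azimuth convention that ignores the diffuse/specular ambiguity) and the weight lam are simplifying assumptions.

```python
# PatchMatch-style hypothesis scoring: photometric cost (1 - NCC between patches)
# plus a polarimetric term comparing the measured angle of linear polarization
# (AoLP) with the azimuth implied by the hypothesized surface normal.
import numpy as np

def photometric_cost(patch_ref: np.ndarray, patch_src: np.ndarray) -> float:
    a = patch_ref - patch_ref.mean()
    b = patch_src - patch_src.mean()
    ncc = (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return 1.0 - ncc                          # in [0, 2]; lower is better

def polarimetric_cost(aolp: float, normal: np.ndarray) -> float:
    # Azimuth of the normal projected on the image plane, modulo pi
    # (AoLP has a 180-degree ambiguity).
    phi_pred = np.arctan2(normal[1], normal[0]) % np.pi
    d = abs(aolp - phi_pred) % np.pi
    return min(d, np.pi - d) / (np.pi / 2)    # normalized to [0, 1]

def hypothesis_cost(patch_ref, patch_src, aolp, normal, lam=0.3):
    return photometric_cost(patch_ref, patch_src) + lam * polarimetric_cost(aolp, normal)

ref = np.random.rand(7, 7); src = ref + 0.01 * np.random.rand(7, 7)
print(hypothesis_cost(ref, src, aolp=0.8, normal=np.array([0.3, 0.5, 0.8])))
```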

CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer

  • paper_url: http://arxiv.org/abs/2311.06443
  • repo_url: https://github.com/HowieMa/CVTHead
  • paper_authors: Haoyu Ma, Tong Zhang, Shanlin Sun, Xiangyi Yan, Kun Han, Xiaohui Xie
  • for: reconstructing personalized animatable head avatars for real-time facial animation in AR/VR.
  • methods: point-based neural rendering that generates a controllable neural head avatar from a single reference image; the sparse mesh vertices are treated as the point set, and the proposed Vertex-feature Transformer learns a local feature descriptor for each vertex, enabling the modeling of long-range dependencies among all vertices.
  • results: experiments show that CVTHead achieves performance comparable to state-of-the-art graphics-based methods while efficiently rendering novel human heads with various expressions, head poses, and camera views; these attributes can be explicitly controlled via 3DMM coefficients, enabling versatile and realistic animation in real-time scenarios.
    Abstract Reconstructing personalized animatable head avatars has significant implications in the fields of AR/VR. Existing methods for achieving explicit face control of 3D Morphable Models (3DMM) typically rely on multi-view images or videos of a single subject, making the reconstruction process complex. Additionally, the traditional rendering pipeline is time-consuming, limiting real-time animation possibilities. In this paper, we introduce CVTHead, a novel approach that generates controllable neural head avatars from a single reference image using point-based neural rendering. CVTHead considers the sparse vertices of mesh as the point set and employs the proposed Vertex-feature Transformer to learn local feature descriptors for each vertex. This enables the modeling of long-range dependencies among all the vertices. Experimental results on the VoxCeleb dataset demonstrate that CVTHead achieves comparable performance to state-of-the-art graphics-based methods. Moreover, it enables efficient rendering of novel human heads with various expressions, head poses, and camera views. These attributes can be explicitly controlled using the coefficients of 3DMMs, facilitating versatile and realistic animation in real-time scenarios.

cs.AI - 2023-11-11

Automatized Self-Supervised Learning for Skin Lesion Screening

  • paper_url: http://arxiv.org/abs/2311.06691
  • repo_url: None
  • paper_authors: Vullnet Useini, Stephanie Tanadini-Lang, Quentin Lohmeyer, Mirko Meboldt, Nicolaus Andratschke, Ralph P. Braun, Javier Barranco García
  • for: improving the accuracy and efficiency of skin cancer screening and helping dermatologists identify suspicious lesions.
  • methods: an artificial intelligence (AI) decision support tool that uses a state-of-the-art object detection algorithm to identify and extract all skin lesions from wide-field patient images, then ranks the lesions by suspiciousness with a self-supervised AI algorithm.
  • results: the tool helped dermatologists identify 93% of suspicious lesions (average sensitivity for the top-10 AI-identified ugly ducklings) and increased their confidence and their agreement with other experts.
    Abstract The incidence rates of melanoma, the deadliest form of skin cancer, have been increasing steadily worldwide, presenting a significant challenge to dermatologists. Early detection of melanoma is crucial for improving patient survival rates, but identifying suspicious lesions through ugly duckling (UD) screening, the current method used for skin cancer screening, can be challenging and often requires expertise in pigmented lesions. To address these challenges and improve patient outcomes, an artificial intelligence (AI) decision support tool was developed to assist dermatologists in identifying UD from wide-field patient images. The tool uses a state-of-the-art object detection algorithm to identify and extract all skin lesions from patient images, which are then sorted by suspiciousness using a self-supervised AI algorithm. A clinical validation study was conducted to evaluate the tool's performance, which demonstrated an average sensitivity of 93% for the top-10 AI-identified UDs on skin lesions selected by the majority of experts in pigmented skin lesions. The study also found that dermatologists confidence increased, and the average majority agreement with the top-10 AI-identified UDs improved to 100% when assisted by AI. The development of this AI decision support tool aims to address the shortage of specialists, enable at-risk patients to receive faster consultations and understand the impact of AI-assisted screening. The tool's automation can assist dermatologists in identifying suspicious lesions and provide a more objective assessment, reducing subjectivity in the screening process. The future steps for this project include expanding the dataset to include histologically confirmed melanoma cases and increasing the number of participants for clinical validation to strengthen the tool's reliability and adapt it for real-world consultation.

Dream to Adapt: Meta Reinforcement Learning by Latent Context Imagination and MDP Imagination

  • paper_url: http://arxiv.org/abs/2311.06673
  • repo_url: None
  • paper_authors: Lu Wen, Songan Zhang, H. Eric Tseng, Huei Peng
  • for: quickly learning unseen tasks by transferring previously learned knowledge from similar tasks.
  • methods: a context-based meta reinforcement learning (Meta RL) algorithm, MetaDreamer, that requires fewer real training tasks and less data by performing meta-imagination and MDP-imagination.
  • results: experiments show that MetaDreamer outperforms existing methods in data efficiency and interpolated generalization.
    Abstract Meta reinforcement learning (Meta RL) has been amply explored to quickly learn an unseen task by transferring previously learned knowledge from similar tasks. However, most state-of-the-art algorithms require the meta-training tasks to have a dense coverage on the task distribution and a great amount of data for each of them. In this paper, we propose MetaDreamer, a context-based Meta RL algorithm that requires less real training tasks and data by doing meta-imagination and MDP-imagination. We perform meta-imagination by interpolating on the learned latent context space with disentangled properties, as well as MDP-imagination through the generative world model where physical knowledge is added to plain VAE networks. Our experiments with various benchmarks show that MetaDreamer outperforms existing approaches in data efficiency and interpolated generalization.

In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering

  • paper_url: http://arxiv.org/abs/2311.06668
  • repo_url: https://github.com/shengliu66/icv
  • paper_authors: Sheng Liu, Lei Xing, James Zou
  • for: proposing a new approach to in-context learning that helps LLMs adapt to new tasks from demonstration examples more effectively.
  • methods: a forward pass over the demonstration examples produces an in-context vector (ICV) that captures the key information in the demonstrations; on a new query, the latent states of the LLM are shifted by the ICV so that the model better follows the demonstrations.
  • results: ICV achieves better performance across a range of tasks, including safety, style transfer, role-playing, and formatting; it also makes LLM behaviour easy to control and is computationally much more efficient than fine-tuning.
    Abstract Large language models (LLMs) demonstrate emergent in-context learning capabilities, where they adapt to new tasks based on example demonstrations. However, in-context learning has seen limited effectiveness in many settings, is difficult to quantitatively control and takes up context window space. To overcome these limitations, we propose an alternative approach that recasts in-context learning as in-context vectors (ICV). Using ICV has two steps. We first use a forward pass on demonstration examples to create the in-context vector from the latent embedding of the LLM. This vector captures essential information about the intended task. On a new query, instead of adding demonstrations to the prompt, we shift the latent states of the LLM using the ICV. The ICV approach has several benefits: 1) it enables the LLM to more effectively follow the demonstration examples; 2) it's easy to control by adjusting the magnitude of the ICV; 3) it reduces the length of the prompt by removing the in-context demonstrations; 4) ICV is computationally much more efficient than fine-tuning. We demonstrate that ICV achieves better performance compared to standard in-context learning and fine-tuning on diverse tasks including safety, style transfer, role-playing and formatting. Moreover, we show that we can flexibly teach LLM to simultaneously follow different types of instructions by simple vector arithmetics on the corresponding ICVs.
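A simplified version of the two ICV steps, building a steering vector from demonstration latents and then shifting hidden states at query time, can be sketched with a Hugging Face GPT-2 model; forming the vector from per-layer mean embeddings of contrastive demonstration pairs, and the layer and scale used, are simplifying assumptions relative to the paper.

```python
# In-context vectors, simplified: (1) run demonstrations through the model and form
# a steering vector from their latent embeddings; (2) on a new query, add that
# vector to the hidden states via forward hooks instead of prepending demos.
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel

tok = AutoTokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def mean_hidden(text, layer):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    return hs[layer].mean(dim=1).squeeze(0)          # average over tokens

layer, alpha = 8, 2.0                                # assumptions, not paper values
demos = [("I hate this movie.", "I love this movie!")]  # (source style, target style)
icv = torch.stack([mean_hidden(t, layer) - mean_hidden(s, layer)
                   for s, t in demos]).mean(dim=0)

def steer(module, inputs, output):                   # shift hidden states by the ICV
    return (output[0] + alpha * icv,) + output[1:]

# Hook block layer-1 so that its output, i.e. hidden_states[layer], gets shifted.
handle = model.transformer.h[layer - 1].register_forward_hook(steer)
ids = tok("The food at that place was", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=10, do_sample=False)
print(tok.decode(out[0]))
handle.remove()
```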

The Pros and Cons of Using Machine Learning and Interpretable Machine Learning Methods in psychiatry detection applications, specifically depression disorder: A Brief Review

  • paper_url: http://arxiv.org/abs/2311.06633
  • repo_url: None
  • paper_authors: Hossein Simchi, Samira Tajik
  • for: improving the accuracy and speed of diagnosing mental illnesses and preventing severe outcomes such as suicide.
  • methods: machine learning techniques, with a focus on interpretable methods that provide more accurate and understandable diagnostic results.
  • results: a useful overview that helps psychiatrists and researchers better understand the advantages and disadvantages of machine learning in psychiatric disorder detection.
    Abstract The COVID-19 pandemic has forced many people to limit their social activities, which has resulted in a rise in mental illnesses, particularly depression. To diagnose these illnesses with accuracy and speed, and prevent severe outcomes such as suicide, the use of machine learning has become increasingly important. Additionally, to provide precise and understandable diagnoses for better treatment, AI scientists and researchers must develop interpretable AI-based solutions. This article provides an overview of relevant articles in the field of machine learning and interpretable AI, which helps to understand the advantages and disadvantages of using AI in psychiatry disorder detection applications.

VT-Former: A Transformer-based Vehicle Trajectory Prediction Approach For Intelligent Highway Transportation Systems

  • paper_url: http://arxiv.org/abs/2311.06623
  • repo_url: None
  • paper_authors: Armin Danesh Pazho, Vinit Katariya, Ghazal Alinezhad Noghre, Hamed Tabkhi
  • for: enhancing roadway safety and traffic management, a key focus of modern cyber-physical and intelligent transportation systems.
  • methods: a novel transformer-based approach, VT-Former, for vehicle trajectory prediction in highway safety and surveillance; beyond using transformers to capture long-range temporal patterns, a Graph Attentive Tokenization (GAT) module is proposed to capture the intricate social interactions among vehicles.
  • results: studies on three benchmark datasets from three different viewpoints demonstrate state-of-the-art trajectory prediction performance along with generalizability and robustness; the paper also evaluates VT-Former's efficiency on embedded boards and showcases its broad applicability through a sample vehicle anomaly detection application.
    Abstract Enhancing roadway safety and traffic management has become an essential focus area for a broad range of modern cyber-physical systems and intelligent transportation systems. Vehicle Trajectory Prediction is a pivotal element within numerous applications for highway and road safety. These applications encompass a wide range of use cases, spanning from traffic management and accident prevention to enhancing work-zone safety and optimizing energy conservation. The ability to implement intelligent management in this context has been greatly advanced by the developments in the field of Artificial Intelligence (AI), alongside the increasing deployment of surveillance cameras across road networks. In this paper, we introduce a novel transformer-based approach for vehicle trajectory prediction for highway safety and surveillance, denoted as VT-Former. In addition to utilizing transformers to capture long-range temporal patterns, a new Graph Attentive Tokenization (GAT) module has been proposed to capture intricate social interactions among vehicles. Combining these two core components culminates in a precise approach for vehicle trajectory prediction. Our study on three benchmark datasets with three different viewpoints demonstrates the State-of-The-Art (SoTA) performance of VT-Former in vehicle trajectory prediction and its generalizability and robustness. We also evaluate VT-Former's efficiency on embedded boards and explore its potential for vehicle anomaly detection as a sample application, showcasing its broad applicability.

TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System

  • paper_url: http://arxiv.org/abs/2311.06622
  • repo_url: None
  • paper_authors: Haoyuan Li, Hao Jiang, Tianke Zhang, Zhelun Yu, Aoxiong Yin, Hao Cheng, Siming Fu, Yuhao Zhang, Wanggui He
  • for: improving the efficiency and quality of AI model development and enabling personalized services.
  • methods: a TrainerAgent system based on a multi-agent framework with Task, Data, Model, and Server agents; these agents analyze user-defined tasks, input data, and requirements (e.g., accuracy, speed), optimize comprehensively from both the data and model perspectives to obtain satisfactory models, and finally deploy them as an online service.
  • results: experiments show that the system reliably produces models that meet the requirements and can detect and reject unattainable tasks (e.g., fantastical scenarios or unethical requests), ensuring robustness and safety.
    Abstract Training AI models has always been challenging, especially when there is a need for custom models to provide personalized services. Algorithm engineers often face a lengthy process to iteratively develop models tailored to specific business requirements, making it even more difficult for non-experts. The quest for high-quality and efficient model development, along with the emergence of Large Language Model (LLM) Agents, has become a key focus in the industry. Leveraging the powerful analytical, planning, and decision-making capabilities of LLM, we propose a TrainerAgent system comprising a multi-agent framework including Task, Data, Model and Server agents. These agents analyze user-defined tasks, input data, and requirements (e.g., accuracy, speed), optimizing them comprehensively from both data and model perspectives to obtain satisfactory models, and finally deploy these models as online service. Experimental evaluations on classical discriminative and generative tasks in computer vision and natural language processing domains demonstrate that our system consistently produces models that meet the desired criteria. Furthermore, the system exhibits the ability to critically identify and reject unattainable tasks, such as fantastical scenarios or unethical requests, ensuring robustness and safety. This research presents a significant advancement in achieving desired models with increased efficiency and quality as compared to traditional model development, facilitated by the integration of LLM-powered analysis, decision-making, and execution capabilities, as well as the collaboration among four agents. We anticipate that our work will contribute to the advancement of research on TrainerAgent in both academic and industry communities, potentially establishing it as a new paradigm for model development in the field of AI.

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

  • paper_url: http://arxiv.org/abs/2311.06607
  • repo_url: https://github.com/yuliang-liu/monkey
  • paper_authors: Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai
  • for: improving the scene-understanding and narration capabilities of large multimodal models in complex scenarios.
  • methods: the Monkey approach, which (1) builds on an existing vision encoder (e.g., ViT-BigHuge) without pretraining from scratch to raise the supported input resolution to 896 x 1344 pixels, and (2) introduces a multi-level description generation method that automatically provides rich information to help the model learn the contextual associations between scenes and objects.
  • results: extensive testing across more than 16 distinct datasets shows that Monkey achieves consistently competitive performance on fundamental tasks such as image captioning, general visual question answering (VQA), and document-oriented VQA.
    Abstract Large Multimodal Models have demonstrated impressive capabilities in understanding general vision-language tasks. However, due to the limitation of supported input resolution (e.g., 448 x 448) as well as the inexhaustive description of the training image-text pair, these models often encounter challenges when dealing with intricate scene understandings and narratives. Here we address the problem by proposing the Monkey. Our contributions are two-fold: 1) without pretraining from the start, our method can be built upon an existing vision encoder (e.g., vit-BigHuge) to effectively improve the input resolution capacity up to 896 x 1344 pixels; 2) we propose a multi-level description generation method, which automatically provides rich information that can guide model to learn contextual association between scenes and objects. Our extensive testing across more than 16 distinct datasets reveals that Monkey achieves consistently competitive performance over the existing LMMs on fundamental tasks, such as Image Captioning, General Visual Question Answering (VQA), and Document-oriented VQA. Models, interactive demo, and the source code are provided at the following https://github.com/Yuliang-Liu/Monkey.

Understanding Grokking Through A Robustness Viewpoint

  • paper_url: http://arxiv.org/abs/2311.06597
  • repo_url: None
  • paper_authors: Zhiquan Tan, Weiran Huang
  • for: studying the curious phenomenon of "grokking", in which a neural network generalizes long after it has perfectly fit the training data.
  • methods: a robustness viewpoint for understanding the phenomenon, together with new evaluation metrics based on robustness and information theory.
  • results: the $l_2$ weight norm is shown to be a sufficient condition for grokking, but it does not track generalization on the test data in a timely way, whereas the proposed metrics correlate well with the grokking phenomenon; additionally, the proposed method speeds up the generalization process, and learning the commutative law explains part of the speedup.
    Abstract Recently, an unusual phenomenon called grokking has gained much attention, where sometimes a neural network generalizes long after it perfectly fits the training data. We try to understand this seemingly strange phenomenon through the robustness of the neural network. From a robustness viewpoint, we show that the popular $l_2$ weight norm (metric) of the neural network is actually a sufficient condition for grokking. Since we also empirically find that the $l_2$ norm does not correlate with grokking on the test data in a timely way, we propose new metrics based on robustness and information theory and find that our new metrics correlate well with the grokking phenomenon. Based on these observations, we propose methods to speed up the generalization process. In addition, we examine the standard training process on the modulo addition dataset and find that it hardly learns other basic group operations before grokking, including the commutative law. Interestingly, the speed-up of generalization when using our proposed method can be partially explained by learning the commutative law, a necessary condition when the model groks on the test dataset.
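Tracking the $l_2$ weight norm alongside train/test accuracy on a modular-arithmetic task is straightforward to reproduce in outline; the tiny embedding MLP, modulus, and training split below are illustrative choices, not the paper's exact setup.

```python
# Track the l2 weight norm while training on modular addition (a + b) mod p,
# the classic setting where grokking is observed.
import torch
import torch.nn as nn

p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
train_idx, test_idx = perm[:3000], perm[3000:]

model = nn.Sequential(nn.Embedding(p, 32), nn.Flatten(),
                      nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

for step in range(1, 5001):
    loss = nn.functional.cross_entropy(model(pairs[train_idx]), labels[train_idx])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        l2 = torch.sqrt(sum((w ** 2).sum() for w in model.parameters())).item()
        print(f"step {step}: l2={l2:.1f} train_acc={accuracy(train_idx):.2f} "
              f"test_acc={accuracy(test_idx):.2f}")
```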

An Intelligent Social Learning-based Optimization Strategy for Black-box Robotic Control with Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.06576
  • repo_url: None
  • paper_authors: Xubo Yang, Jian Gao, Ting Wang, Yaozhen He
  • for: proposing an intelligent, social-learning-based control algorithm for robots in black-box systems.
  • methods: an Intelligent Social Learning (ISL) algorithm comprising learning, imitation, and self-study styles.
  • results: experiments show that ISL outperforms four state-of-the-art methods on six continuous control benchmark cases, with faster computation, fewer parameters, and higher stability; it also yields satisfactory solutions in simulated and experimental grasping tasks.
    Abstract Implementing intelligent control of robots is a difficult task, especially when dealing with complex black-box systems, because of the lack of visibility and understanding of how these robots work internally. This paper proposes an Intelligent Social Learning (ISL) algorithm to enable intelligent control of black-box robotic systems. Inspired by mutual learning among individuals in human social groups, ISL includes learning, imitation, and self-study styles. Individuals in the learning style use the Levy flight search strategy to learn from the best performer and form the closest relationships. In the imitation style, individuals mimic the best performer with a second-level rapport by employing a random perturbation strategy. In the self-study style, individuals learn independently using a normal distribution sampling method while maintaining a distant relationship with the best performer. Individuals in the population are regarded as autonomous intelligent agents in each style. Neural networks perform strategic actions in three styles to interact with the environment and the robot and iteratively optimize the network policy. Overall, ISL builds on the principles of intelligent optimization, incorporating ideas from reinforcement learning, and possesses strong search capabilities, fast computation speed, fewer hyperparameters, and insensitivity to sparse rewards. The proposed ISL algorithm is compared with four state-of-the-art methods on six continuous control benchmark cases in MuJoCo to verify its effectiveness and advantages. Furthermore, ISL is adopted in the simulation and experimental grasping tasks of the UR3 robot for validations, and satisfactory solutions are yielded.
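The Levy flight search step used in the learning style is commonly implemented with Mantegna's algorithm; the sketch below follows that standard recipe, with the stability parameter beta = 1.5 and the step scale as conventional assumptions, since the paper's exact constants are not given here.

```python
# Levy flight step via Mantegna's algorithm: heavy-tailed jumps that mix many
# small local moves with occasional long-range exploration toward the best performer.
import numpy as np
from math import gamma, sin, pi

def levy_step(dim: int, beta: float = 1.5) -> np.ndarray:
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2) /
               (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0.0, sigma_u, size=dim)
    v = np.random.normal(0.0, 1.0, size=dim)
    return u / np.abs(v) ** (1 / beta)

# Move an individual's parameters toward the best performer with a Levy jump.
rng = np.random.default_rng(0)
x = rng.normal(size=10)          # current individual (e.g., policy parameters)
x_best = rng.normal(size=10)     # best performer in the population
step_scale = 0.01
x_new = x + step_scale * levy_step(10) * (x_best - x)
print(x_new)
```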

SCADI: Self-supervised Causal Disentanglement in Latent Variable Models

  • paper_url: http://arxiv.org/abs/2311.06567
  • repo_url: https://github.com/hazel-heejeong-nam/self-supervised-causal-disentanglement
  • paper_authors: Heejeong Nam
  • for: proposing a novel self-supervised causal disentanglement model, SCADI (SElf-supervised CAusal DIsentanglement), that captures semantic factors and their causal relationships automatically, without supervision or labelled data.
  • methods: a masked structural causal model (SCM) combined with a pseudo-label generator, enabling causal disentanglement without supervision.
  • results: the model learns semantic factors and their causal relationships without any supervision or labelled data and produces interpretable causal structures.
    Abstract Causal disentanglement has great potential for capturing complex situations. However, there is a lack of practical and efficient approaches. It is already known that most unsupervised disentangling methods are unable to produce identifiable results without additional information, often leading to randomly disentangled output. Therefore, most existing models for disentangling are weakly supervised, providing information about intrinsic factors, which incurs excessive costs. To address this, we propose a novel model, SCADI (SElf-supervised CAusal DIsentanglement), that enables the model to discover semantic factors and learn their causal relationships without any supervision. This model combines a masked structural causal model (SCM) with a pseudo-label generator for causal disentanglement, aiming to provide a new direction for self-supervised causal disentanglement models.

  • paper_url: http://arxiv.org/abs/2311.06555
  • repo_url: None
  • paper_authors: Hanzhang Zhou, Junlang Qian, Zijian Feng, Hui Lu, Zixiao Zhu, Kezhi Mao
  • for: studying in-context learning (ICL) for document-level event argument extraction (EAE).
  • methods: a Heuristic-Driven Link-of-Analogy (HD-LoA) prompting method that uses explicit, heuristic-driven demonstration selection and analogy-based reasoning to help the model learn task-specific heuristics.
  • results: experiments show F1 score improvements of 4.53% and 9.38% over existing prompting methods and few-shot supervised learning methods on the document-level EAE dataset, plus accuracy gains of 2.87% and 2.63% on two additional tasks.
    Abstract In this study, we investigate in-context learning (ICL) in document-level event argument extraction (EAE). The paper identifies key challenges in this problem, including example selection, context length limitation, abundance of event types, and the limitation of Chain-of-Thought (CoT) prompting in non-reasoning tasks. To address these challenges, we introduce the Heuristic-Driven Link-of-Analogy (HD-LoA) prompting method. Specifically, we hypothesize and validate that LLMs learn task-specific heuristics from demonstrations via ICL. Building upon this hypothesis, we introduce an explicit heuristic-driven demonstration construction approach, which transforms the haphazard example selection process into a methodical method that emphasizes task heuristics. Additionally, inspired by the analogical reasoning of human, we propose the link-of-analogy prompting, which enables LLMs to process new situations by drawing analogies to known situations, enhancing their adaptability. Extensive experiments show that our method outperforms the existing prompting methods and few-shot supervised learning methods, exhibiting F1 score improvements of 4.53% and 9.38% on the document-level EAE dataset. Furthermore, when applied to sentiment analysis and natural language inference tasks, the HD-LoA prompting achieves accuracy gains of 2.87% and 2.63%, indicating its effectiveness across different tasks.

Is Machine Learning Unsafe and Irresponsible in Social Sciences? Paradoxes and Reconsidering from Recidivism Prediction Tasks

  • paper_url: http://arxiv.org/abs/2311.06537
  • repo_url: None
  • paper_authors: Jianhong Liu, Dianshi Li
  • for: examining foundational and hotly debated issues in computational approaches to high-stakes event prediction.
  • methods: the paper offers new perspectives, questioning several prevalent views against machine learning and proposing a new paradigm that fuses computational methods with conventional social science approaches.
  • results: the paper argues that this fused approach better captures the complexity and uncertainty of social systems and improves the accuracy and reliability of predictions.
    Abstract The paper addresses some fundamental and hotly debated issues for high-stakes event predictions underpinning the computational approach to social sciences. We question several prevalent views against machine learning and outline a new paradigm that highlights the promises and promotes the infusion of computational methods and conventional social science approaches.

MuST: Multimodal Spatiotemporal Graph-Transformer for Hospital Readmission Prediction

  • paper_url: http://arxiv.org/abs/2311.07608
  • repo_url: None
  • paper_authors: Yan Miao, Lequan Yu
  • for: hospital readmission prediction is an essential approach to decreasing readmission rates, a key factor in assessing the quality and efficacy of a healthcare system.
  • methods: a novel Multimodal Spatiotemporal Graph-Transformer (MuST) model that uses graph convolutional networks and temporal transformers to effectively capture spatial and temporal relationships in EHR data and chest radiographs.
  • results: experiments show that including multimodal features in MuST clearly improves performance over unimodal methods on the MIMIC-IV dataset, and the proposed pipeline surpasses the current leading methods for hospital readmission prediction.
    Abstract Hospital readmission prediction is considered an essential approach to decreasing readmission rates, which is a key factor in assessing the quality and efficacy of a healthcare system. Previous studies have extensively utilized three primary modalities, namely electronic health records (EHR), medical images, and clinical notes, to predict hospital readmissions. However, the majority of these studies did not integrate information from all three modalities or utilize the spatiotemporal relationships present in the dataset. This study introduces a novel model called the Multimodal Spatiotemporal Graph-Transformer (MuST) for predicting hospital readmissions. By employing Graph Convolution Networks and temporal transformers, we can effectively capture spatial and temporal dependencies in EHR and chest radiographs. We then propose a fusion transformer to combine the spatiotemporal features from the two modalities mentioned above with the features from clinical notes extracted by a pre-trained, domain-specific transformer. We assess the effectiveness of our methods using the latest publicly available dataset, MIMIC-IV. The experimental results indicate that the inclusion of multimodal features in MuST improves its performance in comparison to unimodal methods. Furthermore, our proposed pipeline outperforms the current leading methods in the prediction of hospital readmissions.

Modeling Choice via Self-Attention

  • paper_url: http://arxiv.org/abs/2311.07607
  • repo_url: None
  • paper_authors: Joohwan Ko, Andrew A. Li
  • for: The paper proposes a choice model built on a modern neural network architecture to improve the estimation of models in choice problems.
  • methods: The model leverages self-attention; theoretically it is a low-rank generalization of the Halo-MNL model, and it supports estimation from far fewer samples (O(m) rather than Ω(m²) in the number of products m) while delivering higher accuracy in practice.
  • results: The paper establishes a realistic-scale benchmark on real data and runs the largest evaluation of existing choice models to date, finding that the proposed model dominates over both short-term and long-term data periods.
    Abstract Models of choice are a fundamental input to many now-canonical optimization problems in the field of Operations Management, including assortment, inventory, and price optimization. Naturally, accurate estimation of these models from data is a critical step in the application of these optimization problems in practice, and so it is perhaps surprising that such choice estimation has to now been accomplished almost exclusively, both in theory and in practice, (a) without the use of deep learning in any meaningful way, and (b) via evaluation on limited data with constantly-changing metrics. This is in stark contrast to the vast majority of similar learning applications, for which the practice of machine learning suggests that (a) neural network-based models are typically state-of-the-art, and (b) strict standardization on evaluation procedures (datasets, metrics, etc.) is crucial. Thus motivated, we first propose a choice model that is the first to successfully (both theoretically and practically) leverage a modern neural network architectural concept (self-attention). Theoretically, we show that our attention-based choice model is a low-rank generalization of the Halo Multinomial Logit model, a recent model that parsimoniously captures irrational choice effects and has seen empirical success. We prove that whereas the Halo-MNL requires $\Omega(m^2)$ data samples to estimate, where $m$ is the number of products, our model supports a natural nonconvex estimator (in particular, that which a standard neural network implementation would apply) which admits a near-optimal stationary point with $O(m)$ samples. We then establish the first realistic-scale benchmark for choice estimation on real data and use this benchmark to run the largest evaluation of existing choice models to date. We find that the model we propose is dominant over both short-term and long-term data periods.
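To make the low-rank, attention-based idea concrete, here is a hedged sketch of how self-attention can modulate choice utilities: each offered product's utility is shifted by attention-weighted "halo" effects from the other products in the assortment, with interactions parameterized by low-rank embeddings. The Q, K, V factors below are illustrative stand-ins, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 10, 4                       # m products, rank d << m
base = rng.normal(size=m)          # base utilities u_i
Q, K = rng.normal(size=(m, d)), rng.normal(size=(m, d))  # low-rank interaction factors
V = rng.normal(size=m)             # halo magnitude each product exerts

def choice_probs(assortment):
    """MNL-style choice probabilities where each offered product's utility is
    shifted by attention-weighted halo effects from the other offered products
    (assumes len(assortment) >= 2)."""
    S = np.array(assortment)
    scores = Q[S] @ K[S].T / np.sqrt(d)     # rank-d pairwise interaction scores
    np.fill_diagonal(scores, -np.inf)       # a product exerts no halo on itself
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)       # self-attention over the other items
    util = base[S] + w @ V[S]               # attention-modulated utilities
    p = np.exp(util - util.max())
    return p / p.sum()

print(choice_probs([0, 3, 7]))              # probabilities over the offered set
```

With d much smaller than m, the pairwise interaction matrix is rank-d, which is one sense in which such a model parsimoniously generalizes the Halo-MNL's full interaction table.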

How ChatGPT is Solving Vulnerability Management Problem

  • paper_url: http://arxiv.org/abs/2311.06530
  • repo_url: None
  • paper_authors: Peiyu Liu, Junming Liu, Lirong Fu, Kangjie Lu, Yifan Xia, Xuhong Zhang, Wenzhi Chen, Haiqin Weng, Shouling Ji, Wenhai Wang
  • for: This paper investigates whether ChatGPT can perform well on real-world vulnerability management tasks, including predicting security relevance and patch correctness.
  • methods: ChatGPT is applied to 6 tasks covering the complete vulnerability management process on a large-scale dataset of 78,445 samples, compared against state-of-the-art approaches, with an investigation of the impact of different prompts and an exploration of the difficulties encountered.
  • results: ChatGPT excels at some tasks, such as generating titles for software bug reports, but also struggles: directly providing random demonstration examples in the prompt cannot consistently guarantee good performance. The study finds that leveraging ChatGPT in a self-heuristic way and effectively guiding it to focus on helpful information are promising research directions.
    Abstract Recently, ChatGPT has attracted great attention from the code analysis domain. Prior works show that ChatGPT has the capabilities of processing foundational code analysis tasks, such as abstract syntax tree generation, which indicates the potential of using ChatGPT to comprehend code syntax and static behaviors. However, it is unclear whether ChatGPT can complete more complicated real-world vulnerability management tasks, such as the prediction of security relevance and patch correctness, which require an all-encompassing understanding of various aspects, including code syntax, program semantics, and related manual comments. In this paper, we explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 78,445 samples. For each task, we compare ChatGPT against SOTA approaches, investigate the impact of different prompts, and explore the difficulties. The results suggest promising potential in leveraging ChatGPT to assist vulnerability management. One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports. Furthermore, our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions. For instance, directly providing random demonstration examples in the prompt cannot consistently guarantee good performance in vulnerability management. By contrast, leveraging ChatGPT in a self-heuristic way -- extracting expertise from demonstration examples itself and integrating the extracted expertise in the prompt is a promising research direction. Besides, ChatGPT may misunderstand and misuse the information in the prompt. Consequently, effectively guiding ChatGPT to focus on helpful information rather than the irrelevant content is still an open problem.

Conceptual Model Interpreter for Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07605
  • repo_url: https://github.com/fhaer/llm-cmi
  • paper_authors: Felix Härer
  • for: This paper explores the potential of large language models (LLMs) to generate and interpret conceptual models, and prototypes a conceptual model interpreter that automatically renders conceptual models generated in textual syntax as visual models.
  • methods: With an exploratory research approach, conceptual models are generated and interpreted using state-of-the-art LLMs such as Llama 2 and ChatGPT 4, integrated with interpreters and LLMs through APIs or locally.
  • results: Experiments with models generated by ChatGPT 4 and Llama 2, covering UML and, on an instance level, graphs created from custom data, indicate that iterative modeling in a conversational fashion is feasible, and the proposed architecture supports many commercial and open-source LLMs and interpreters.
    Abstract Large Language Models (LLMs) recently demonstrated capabilities for generating source code in common programming languages. Additionally, commercial products such as ChatGPT 4 started to provide code interpreters, allowing for the automatic execution of generated code fragments, instant feedback, and the possibility to develop and refine in a conversational fashion. With an exploratory research approach, this paper applies code generation and interpretation to conceptual models. The concept and prototype of a conceptual model interpreter is explored, capable of rendering visual models generated in textual syntax by state-of-the-art LLMs such as Llama~2 and ChatGPT 4. In particular, these LLMs can generate textual syntax for the PlantUML and Graphviz modeling software that is automatically rendered within a conversational user interface. The first result is an architecture describing the components necessary to interact with interpreters and LLMs through APIs or locally, providing support for many commercial and open source LLMs and interpreters. Secondly, experimental results for models generated with ChatGPT 4 and Llama 2 are discussed in two cases covering UML and, on an instance level, graphs created from custom data. The results indicate the possibility of modeling iteratively in a conversational fashion.

BClean: A Bayesian Data Cleaning System

  • paper_url: http://arxiv.org/abs/2311.06517
  • repo_url: https://github.com/yyssl88/bclean
  • paper_authors: Jianbin Qin, Sifan Huang, Yaoshu Wang, Jing Zhu, Yifan Zhang, Yukai Miao, Rui Mao, Makoto Onizuka, Chuan Xiao
  • for: BClean is proposed to solve the problem of data cleaning, a crucial step in data preprocessing and machine learning.
  • methods: BClean uses Bayesian inference with automatic Bayesian network construction, fully exploiting the relationships between attributes in the observed dataset and any prior information provided by users. The system also includes an effective scoring model and several approximation strategies to enhance the efficiency of data cleaning.
  • results: BClean achieves an F-measure of up to 0.9 in data cleaning, outperforming existing Bayesian methods by 2% and other data cleaning methods by 15%.
    Abstract There is a considerable body of work on data cleaning which employs various principles to rectify erroneous data and transform a dirty dataset into a cleaner one. One of prevalent approaches is probabilistic methods, including Bayesian methods. However, existing probabilistic methods often assume a simplistic distribution (e.g., Gaussian distribution), which is frequently underfitted in practice, or they necessitate experts to provide a complex prior distribution (e.g., via a programming language). This requirement is both labor-intensive and costly, rendering these methods less suitable for real-world applications. In this paper, we propose BClean, a Bayesian Cleaning system that features automatic Bayesian network construction and user interaction. We recast the data cleaning problem as a Bayesian inference that fully exploits the relationships between attributes in the observed dataset and any prior information provided by users. To this end, we present an automatic Bayesian network construction method that extends a structure learning-based functional dependency discovery method with similarity functions to capture the relationships between attributes. Furthermore, our system allows users to modify the generated Bayesian network in order to specify prior information or correct inaccuracies identified by the automatic generation process. We also design an effective scoring model (called the compensative scoring model) necessary for the Bayesian inference. To enhance the efficiency of data cleaning, we propose several approximation strategies for the Bayesian inference, including graph partitioning, domain pruning, and pre-detection. By evaluating on both real-world and synthetic datasets, we demonstrate that BClean is capable of achieving an F-measure of up to 0.9 in data cleaning, outperforming existing Bayesian methods by 2% and other data cleaning methods by 15%.

Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems

  • paper_url: http://arxiv.org/abs/2311.06513
  • repo_url: None
  • paper_authors: Hsuan Su, Rebecca Qian, Chinnadhurai Sankar, Shahin Shayandeh, Shang-Tse Chen, Hung-yi Lee, Daniel M. Bikel
  • for: This paper presents a diagnosis method for attributing bias to each component of a task-oriented dialogue (TOD) system, helping researchers gain a deeper understanding of the sources of bias.
  • methods: The method analyzes the biased behavior of each component of end-to-end TOD systems built on pretrained large language models (LLMs), attributing bias along three demographic axes: gender, age, and race.
  • results: Experimental results show that the bias of a TOD system usually comes from the response generation model rather than the other system components.
    Abstract Recent works have shown considerable improvements in task-oriented dialogue (TOD) systems by utilizing pretrained large language models (LLMs) in an end-to-end manner. However, the biased behavior of each component in a TOD system and the error propagation issue in the end-to-end framework can lead to seriously biased TOD responses. Existing works of fairness only focus on the total bias of a system. In this paper, we propose a diagnosis method to attribute bias to each component of a TOD system. With the proposed attribution method, we can gain a deeper understanding of the sources of bias. Additionally, researchers can mitigate biased model behavior at a more granular level. We conduct experiments to attribute the TOD system's bias toward three demographic axes: gender, age, and race. Experimental results show that the bias of a TOD system usually comes from the response generation model.

Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering

  • paper_url: http://arxiv.org/abs/2311.06503
  • repo_url: https://github.com/zjukg/knowpat
  • paper_authors: Yichi Zhang, Zhuo Chen, Yin Fang, Lei Cheng, Yanxi Lu, Fangming Li, Wen Zhang, Huajun Chen
  • for: This paper applies large language models (LLMs) to domain-specific question answering (QA) with domain knowledge graphs (KGs), addressing two major difficulties of real-world LLM applications: the generated content must be user-friendly, and the model must exploit domain knowledge correctly.
  • methods: The paper proposes a new pipeline, Knowledgeable Preference AlignmenT (KnowPAT), which constructs two preference sets, a style preference set and a knowledge preference set, and designs a new alignment objective that aligns the LLM's preferences with human preferences.
  • results: Compared against 15 baseline methods, the KnowPAT pipeline performs best on real-scenario domain-specific QA. Code is available at https://github.com/zjukg/KnowPAT.
    Abstract Recently, the development of large language models (LLMs) has attracted wide attention in academia and industry. Deploying LLMs to real scenarios is one of the key directions in the current Internet industry. In this paper, we present a novel pipeline to apply LLMs for domain-specific question answering (QA) that incorporates domain knowledge graphs (KGs), addressing an important direction of LLM application. As a real-world application, the content generated by LLMs should be user-friendly to serve the customers. Additionally, the model needs to utilize domain knowledge properly to generate reliable answers. These two issues are the two major difficulties in the LLM application as vanilla fine-tuning can not adequately address them. We think both requirements can be unified as the model preference problem that needs to align with humans to achieve practical application. Thus, we introduce Knowledgeable Preference AlignmenT (KnowPAT), which constructs two kinds of preference set called style preference set and knowledge preference set respectively to tackle the two issues. Besides, we design a new alignment objective to align the LLM preference with human preference, aiming to train a better LLM for real-scenario domain-specific QA to generate reliable and user-friendly answers. Adequate experiments and comprehensive with 15 baseline methods demonstrate that our KnowPAT is an outperforming pipeline for real-scenario domain-specific QA with LLMs. Our code is open-source at https://github.com/zjukg/KnowPAT.

DRUformer: Enhancing the driving scene Important object detection with driving relationship self-understanding

  • paper_url: http://arxiv.org/abs/2311.06497
  • repo_url: None
  • paper_authors: Yingjie Niu, Ming Ding, Keisuke Fujii, Kento Ohtani, Alexander Carballo, Kazuya Takeda
  • for: This paper aims to help vehicles anticipate important objects during travel to improve driving safety.
  • methods: The paper proposes DRUformer, a multimodal transformer model that accounts for the relationships between all participants in the driving scenario and embeds driving intention into the model.
  • results: In comparative experiments on the DRAMA dataset against state-of-the-art models, DRUformer achieves a 16.2% improvement in mIoU and a 12.3% boost in ACC, and detects important objects effectively across diverse road scenarios and object classes.
    Abstract Traffic accidents frequently lead to fatal injuries, contributing to over 50 million deaths until 2023. To mitigate driving hazards and ensure personal safety, it is crucial to assist vehicles in anticipating important objects during travel. Previous research on important object detection primarily assessed the importance of individual participants, treating them as independent entities and frequently overlooking the connections between these participants. Unfortunately, this approach has proven less effective in detecting important objects in complex scenarios. In response, we introduce Driving scene Relationship self-Understanding transformer (DRUformer), designed to enhance the important object detection task. The DRUformer is a transformer-based multi-modal important object detection model that takes into account the relationships between all the participants in the driving scenario. Recognizing that driving intention also significantly affects the detection of important objects during driving, we have incorporated a module for embedding driving intention. To assess the performance of our approach, we conducted a comparative experiment on the DRAMA dataset, pitting our model against other state-of-the-art (SOTA) models. The results demonstrated a noteworthy 16.2\% improvement in mIoU and a substantial 12.3\% boost in ACC compared to SOTA methods. Furthermore, we conducted a qualitative analysis of our model's ability to detect important objects across different road scenarios and classes, highlighting its effectiveness in diverse contexts. Finally, we conducted various ablation studies to assess the efficiency of the proposed modules in our DRUformer model.

Finetuning Text-to-Image Diffusion Models for Fairness

  • paper_url: http://arxiv.org/abs/2311.07604
  • repo_url: None
  • paper_authors: Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, Mohan Kankanhalli
  • for: This paper aims to achieve fairness in text-to-image diffusion models.
  • methods: Two main technical contributions: first, a distributional alignment loss that steers specific attributes of the generated images toward a user-defined target distribution; second, biased direct finetuning of the diffusion model's sampling process, which leverages a biased gradient to more effectively optimize losses defined on the generated images.
  • results: The method markedly reduces gender, racial, and intersectional biases for occupational prompts; gender bias is significantly reduced even when finetuning just five soft tokens. It supports diverse perspectives of fairness beyond absolute equality, e.g. controlling the age distribution to 75% young and 25% old while simultaneously debiasing gender and race. Finally, the method is scalable: multiple concepts can be debiased at once simply by including those prompts in the finetuning data.
    Abstract The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a distorted worldview and limit opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss that steers specific characteristics of the generated images towards a user-defined target distribution, and (2) biased direct finetuning of diffusion model's sampling process, which leverages a biased gradient to more effectively optimize losses defined on the generated images. Empirically, our method markedly reduces gender, racial, and their intersectional biases for occupational prompts. Gender bias is significantly reduced even when finetuning just five soft tokens. Crucially, our method supports diverse perspectives of fairness beyond absolute equality, which is demonstrated by controlling age to a $75\%$ young and $25\%$ old distribution while simultaneously debiasing gender and race. Finally, our method is scalable: it can debias multiple concepts at once by simply including these prompts in the finetuning data. We hope our work facilitates the social alignment of T2I generative AI. We will share code and various debiased diffusion model adaptors.
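One way to read the distributional alignment loss is as a divergence between the attribute distribution of a generated batch and the user-specified target. The sketch below penalizes a batch whose classifier-predicted attribute frequencies drift from the target; it is illustrative only, and the paper's actual loss and the biased-gradient finetuning step are more involved.

```python
import torch

def distributional_alignment_loss(attr_logits, target):
    """attr_logits: (B, C) attribute-classifier logits on a batch of generated
    images; target: (C,) desired attribute distribution, e.g.
    torch.tensor([0.75, 0.25]) for 75% young / 25% old.
    Returns KL(target || empirical batch distribution)."""
    probs = torch.softmax(attr_logits, dim=-1)       # per-image soft predictions
    batch_dist = probs.mean(dim=0).clamp_min(1e-8)   # empirical attribute frequencies
    return torch.sum(target * (target.clamp_min(1e-8).log() - batch_dist.log()))

logits = torch.randn(16, 2, requires_grad=True)      # stand-in for classifier outputs
loss = distributional_alignment_loss(logits, torch.tensor([0.75, 0.25]))
loss.backward()                                      # gradient flows back toward the generator
print(float(loss))
```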

Adaptive Language-based Mental Health Assessment with Item-Response Theory

  • paper_url: http://arxiv.org/abs/2311.06467
  • repo_url: None
  • paper_authors: Vasudha Varadarajan, Sverker Sikström, Oscar N. E. Kjell, H. Andrew Schwartz
  • for: This study develops adaptive language-based assessment: iteratively estimating an individual's psychological score from limited language responses to questions that the model also decides to ask.
  • methods: Two statistical learning approaches for measurement/scoring are explored, classical test theory (CTT) and item response theory (IRT), along with two new methods: a semi-supervised IRT-based method (ALIRT) and a supervised actor-critic model.
  • results: Adaptive testing substantially reduces the number of questions needed for high validity (r ~ 0.7) with standardized tests, from 11 down to 3 for depression and 5 for anxiety. ALIRT proves the most scalable and accurate, reaching Pearson r ~ 0.93 after only 3 questions versus asking all 11.
    Abstract Mental health issues widely vary across individuals - the manifestations of signs and symptoms can be fairly heterogeneous. Recently, language-based depression and anxiety assessments have shown promise for capturing this heterogeneous nature by evaluating a patient's own language, but such approaches require a large sample of words per person to be accurate. In this work, we introduce adaptive language-based assessment - the task of iteratively estimating an individual's psychological score based on limited language responses to questions that the model also decides to ask. To this end, we explore two statistical learning-based approaches for measurement/scoring: classical test theory (CTT) and item response theory (IRT). We find that using adaptive testing in general can significantly reduce the number of questions required to achieve high validity (r ~ 0.7) with standardized tests, bringing down from 11 total questions down to 3 for depression and 5 for anxiety. Given the combinatorial nature of the problem, we empirically evaluate multiple strategies for both the ordering and scoring objectives, introducing two new methods: a semi-supervised item response theory based method (ALIRT), and a supervised actor-critic based model. While both of the models achieve significant improvements over random and fixed orderings, we find ALIRT to be a scalable model that achieves the highest accuracy with lower numbers of questions (e.g. achieves Pearson r ~ 0.93 after only 3 questions versus asking all 11 questions). Overall, ALIRT allows prompting a reduced number of questions without compromising accuracy or overhead computational costs.
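The adaptive loop is classical computerized adaptive testing machinery. The sketch below uses a 2PL IRT response model, a grid posterior over the latent score, and maximum-Fisher-information item selection; it illustrates why a few well-chosen questions can suffice, but does not reproduce ALIRT's semi-supervised specifics, and all item parameters are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.uniform(0.8, 2.0, size=11)       # item discriminations (11 questions)
b = rng.normal(size=11)                  # item difficulties
theta_grid = np.linspace(-4, 4, 161)
prior = np.exp(-0.5 * theta_grid**2); prior /= prior.sum()

def p_correct(theta, j):                 # 2PL response model
    return 1.0 / (1.0 + np.exp(-a[j] * (theta - b[j])))

def next_item(post, asked):
    """Pick the unasked item with highest expected Fisher information."""
    info = np.array([(post * a[j]**2 * p_correct(theta_grid, j)
                      * (1 - p_correct(theta_grid, j))).sum()
                     for j in range(len(a))])
    info[list(asked)] = -np.inf
    return int(info.argmax())

post, asked, true_theta = prior.copy(), set(), 0.7
for _ in range(3):                       # ask only 3 adaptively chosen questions
    j = next_item(post, asked); asked.add(j)
    y = rng.random() < p_correct(true_theta, j)        # simulated (language-scored) response
    like = p_correct(theta_grid, j) if y else 1 - p_correct(theta_grid, j)
    post = post * like; post /= post.sum()             # Bayesian update of theta
print("estimate:", (theta_grid * post).sum())
```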

  • paper_url: http://arxiv.org/abs/2311.06462
  • repo_url: None
  • paper_authors: Rulin Bai
  • for: Improve the simulation of electronic communication data link encryption.
  • methods: Building on wireless communication research, the author improves the elliptic curve cryptographic algorithm to build a system encryption model, obtains legal and valid node private keys, evaluates and analyzes the system's security attributes, verifies key security, and realizes encryption optimization for wireless network communication.
  • results: Experimental results show that simulating system data link encryption with the improved elliptic curve under a certificateless public-key cryptosystem in network communication takes only 2.31 milliseconds, lower than other algorithms. Conclusion: research based on wireless communication can effectively improve the simulation of electronic communication data link encryption.
    Abstract In order to improve the simulation effect of electronic communication data link encryption, the author proposes a solution based on wireless communication. The main content of this technology is based on the research of wireless communication, improve the elliptic curve cryptographic algorithm to build a system encryption model, obtain legal and valid node private keys, evaluate and analyze the relevant security attributes of the system, verify the security of the keys, and realize the encryption optimization of wireless network communication. Experimental results show that: Using the improved elliptic curve to simulate the system data chain encryption under the certificateless public key cryptosystem in network communication, the time is only 2.31 milliseconds, which is lower than other algorithms. Conclusion: It is proved that the technology research based on wireless communication can effectively improve the encryption simulation effect of electronic communication data link.
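For readers unfamiliar with the primitive being optimized, here is a toy sketch of elliptic-curve key generation and a Diffie-Hellman-style shared secret, the kind of operation by which each node derives keys. The tiny curve and all parameters are purely illustrative; real data links use standardized curves and vetted, constant-time implementations, and the paper's improved algorithm is not reproduced here.

```python
# Toy elliptic-curve key exchange over y^2 = x^3 + 2x + 3 (mod 97).
P, A = 97, 2
G = (3, 6)                      # generator point: 6^2 = 36 = 3^3 + 2*3 + 3 (mod 97)
INF = None                      # point at infinity

def add(p, q):
    if p is INF: return q
    if q is INF: return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF                                          # p + (-p) = O
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P    # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P           # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, p):                  # double-and-add scalar multiplication
    acc = INF
    while k:
        if k & 1:
            acc = add(acc, p)
        p, k = add(p, p), k >> 1
    return acc

# Each node derives a public key from its private key; both sides agree on
# the same shared point, from which a data-link session key could be derived.
sk_a, sk_b = 13, 22
pk_a, pk_b = mul(sk_a, G), mul(sk_b, G)
assert mul(sk_a, pk_b) == mul(sk_b, pk_a)
print("shared point:", mul(sk_a, pk_b))
```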

Online Advertisements with LLMs: Opportunities and Challenges

  • paper_url: http://arxiv.org/abs/2311.07601
  • repo_url: None
  • paper_authors: Soheil Feizi, MohammadTaghi Hajiaghayi, Keivan Rezaei, Suho Shin
  • for: This paper explores the potential of leveraging large language models (LLMs) in online advertising systems.
  • methods: The paper lays out the essential requirements such a system must fulfill, including privacy, latency, reliability, and user and advertiser satisfaction, and introduces a general LLM advertisement framework consisting of modification, bidding, prediction, and auction modules.
  • results: The paper presents design considerations for each module, with an in-depth examination of their practicality and the technical challenges inherent to their implementation.
    Abstract This paper explores the potential for leveraging Large Language Models (LLM) in the realm of online advertising systems. We delve into essential requirements including privacy, latency, reliability, users and advertisers' satisfaction, which such a system must fulfill. We further introduce a general framework for LLM advertisement, consisting of modification, bidding, prediction, and auction modules. Different design considerations for each module is presented, with an in-depth examination of their practicality and the technical challenges inherent to their implementation.
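One way to picture the proposed framework is as a small pipeline around the LLM response. The sketch below wires up hypothetical prediction, auction, and modification steps with a generalized second-price rule; every module behavior here is an invented placeholder, not the paper's design.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    text: str
    bid: float          # advertiser's declared bid per impression

def predict_relevance(query: str, ad: Ad) -> float:
    """Placeholder prediction module: score how well the ad fits the query.
    A real system would use a learned CTR/relevance model."""
    return len(set(query.lower().split()) & set(ad.text.lower().split())) + 0.1

def run_auction(query: str, ads: list[Ad]) -> tuple[Ad, float]:
    """Rank by bid x predicted relevance; the winner pays the minimum bid that
    would still have won (generalized second-price style)."""
    ranked = sorted(ads, key=lambda a: a.bid * predict_relevance(query, a), reverse=True)
    win, runner = ranked[0], ranked[1]
    price = runner.bid * predict_relevance(query, runner) / predict_relevance(query, win)
    return win, price

def modify_response(llm_answer: str, ad: Ad) -> str:
    """Placeholder modification module: weave the winning ad into the LLM output."""
    return f"{llm_answer}\n\nSponsored: {ad.text}"

ads = [Ad("A", "running shoes on sale", 0.8), Ad("B", "trail shoes for runners", 0.5)]
winner, price = run_auction("best shoes for running", ads)
print(modify_response("Here are some tips for choosing running shoes...", winner))
print(f"[winner charged {price:.2f}]")
```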

Aria-NeRF: Multimodal Egocentric View Synthesis

  • paper_url: http://arxiv.org/abs/2311.06455
  • repo_url: None
  • paper_authors: Jiankai Sun, Jianing Qiu, Chuanyang Zheng, John Tucker, Javier Yu, Mac Schwager
  • for: This paper aims to accelerate research in developing rich, multimodal scene models trained from egocentric data, with applications in VR/AR and intelligent agents.
  • methods: The paper uses differentiable volumetric ray-tracing inspired by Neural Radiance Fields (NeRFs) to construct a NeRF-like model from an egocentric image sequence.
  • results: The paper presents a comprehensive multimodal egocentric video dataset, featuring diverse data modalities and real-world context, as a foundation for furthering our understanding of human behavior and enabling more immersive and intelligent experiences in VR, AR, and robotics.
    Abstract We seek to accelerate research in developing rich, multimodal scene models trained from egocentric data, based on differentiable volumetric ray-tracing inspired by Neural Radiance Fields (NeRFs). The construction of a NeRF-like model from an egocentric image sequence plays a pivotal role in understanding human behavior and holds diverse applications within the realms of VR/AR. Such egocentric NeRF-like models may be used as realistic simulations, contributing significantly to the advancement of intelligent agents capable of executing tasks in the real-world. The future of egocentric view synthesis may lead to novel environment representations going beyond today's NeRFs by augmenting visual data with multimodal sensors such as IMU for egomotion tracking, audio sensors to capture surface texture and human language context, and eye-gaze trackers to infer human attention patterns in the scene. To support and facilitate the development and evaluation of egocentric multimodal scene modeling, we present a comprehensive multimodal egocentric video dataset. This dataset offers a comprehensive collection of sensory data, featuring RGB images, eye-tracking camera footage, audio recordings from a microphone, atmospheric pressure readings from a barometer, positional coordinates from GPS, connectivity details from Wi-Fi and Bluetooth, and information from dual-frequency IMU datasets (1kHz and 800Hz) paired with a magnetometer. The dataset was collected with the Meta Aria Glasses wearable device platform. The diverse data modalities and the real-world context captured within this dataset serve as a robust foundation for furthering our understanding of human behavior and enabling more immersive and intelligent experiences in the realms of VR, AR, and robotics.

THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech

  • paper_url: http://arxiv.org/abs/2311.06446
  • repo_url: https://github.com/mohaimeed/thos
  • paper_authors: Saad Almohaimeed, Saleh Almohaimeed, Ashfaq Ali Shafin, Bogdan Carbunar, Ladislau Bölöni
  • for: This paper provides a dataset of tweets with fine-grained labels for targeted hate and offensive speech on Twitter, enabling classification with large language models.
  • methods: The dataset consists of 8.3k manually labeled tweets with fine-grained annotations about the target of the message; classifiers are built on top of large language models.
  • results: The authors show that THOS makes it feasible to train LLM-based classifiers at this level of granularity with good accuracy.
    Abstract Detecting harmful content on social media, such as Twitter, is made difficult by the fact that the seemingly simple yes/no classification conceals a significant amount of complexity. Unfortunately, while several datasets have been collected for training classifiers in hate and offensive speech, there is a scarcity of datasets labeled with a finer granularity of target classes and specific targets. In this paper, we introduce THOS, a dataset of 8.3k tweets manually labeled with fine-grained annotations about the target of the message. We demonstrate that this dataset makes it feasible to train classifiers, based on Large Language Models, to perform classification at this level of granularity.

Controllability-Constrained Deep Network Models for Enhanced Control of Dynamical Systems

  • paper_url: http://arxiv.org/abs/2311.06438
  • repo_url: https://github.com/suruchi1997/controlledvae
  • paper_authors: Suruchi Sharma, Volodymyr Makarenko, Gautam Kumar, Stas Tiomkin
  • for: This paper proposes a control-theoretical method that explicitly enhances the controllability of dynamics models estimated from data.
  • methods: Deep neural networks (DNNs) estimate the unknown dynamics from control inputs and corresponding state observations; the model estimation objective is augmented with a controllability constraint that penalizes models with a low degree of controllability.
  • results: Models estimated with the proposed constraint allow the derivation of more efficient controllers, are interpretable in terms of control-theoretical quantities, and have lower long-term prediction error, showing that the controllability of data-driven models can be improved by control-theoretical means.
    Abstract Control of a dynamical system without the knowledge of dynamics is an important and challenging task. Modern machine learning approaches, such as deep neural networks (DNNs), allow for the estimation of a dynamics model from control inputs and corresponding state observation outputs. Such data-driven models are often utilized for the derivation of model-based controllers. However, in general, there are no guarantees that a model represented by DNNs will be controllable according to the formal control-theoretical meaning of controllability, which is crucial for the design of effective controllers. This often precludes the use of DNN-estimated models in applications, where formal controllability guarantees are required. In this proof-of-the-concept work, we propose a control-theoretical method that explicitly enhances models estimated from data with controllability. That is achieved by augmenting the model estimation objective with a controllability constraint, which penalizes models with a low degree of controllability. As a result, the models estimated with the proposed controllability constraint allow for the derivation of more efficient controllers, they are interpretable by the control-theoretical quantities and have a lower long-term prediction error. The proposed method provides new insights on the connection between the DNN-based estimation of unknown dynamics and the control-theoretical guarantees of the solution properties. We demonstrate the superiority of the proposed method in two standard classical control systems with state observation given by low resolution high-dimensional images.
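Concretely, for a linear(ized) model $x_{t+1} = A x_t + B u_t$, the degree of controllability can be read off the finite-horizon controllability Gramian $W = \sum_{k=0}^{T-1} A^k B B^\top (A^\top)^k$; penalizing a near-singular Gramian is one plausible way to realize such a constraint. The sketch below is illustrative and not the paper's exact formulation.

```python
import numpy as np

def controllability_gramian(A, B, horizon=20):
    """Finite-horizon Gramian W = sum_{k=0}^{T-1} A^k B B^T (A^T)^k."""
    n = A.shape[0]
    W, Ak = np.zeros((n, n)), np.eye(n)
    for _ in range(horizon):
        W += Ak @ B @ B.T @ Ak.T
        Ak = A @ Ak
    return W

def controllability_penalty(A, B, eps=1e-6):
    """Large when the Gramian's smallest eigenvalue is near zero, i.e. when
    the estimated system is hard to steer in some state direction."""
    lam_min = np.linalg.eigvalsh(controllability_gramian(A, B)).min()
    return -np.log(max(lam_min, eps))

A = np.array([[0.9, 0.1], [0.0, 0.8]])
print(controllability_penalty(A, np.array([[1.0], [0.5]])))  # controllable: small penalty
print(controllability_penalty(A, np.array([[1.0], [0.0]])))  # x2 unreachable: large penalty
```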

cs.CL - 2023-11-11

Intentional Biases in LLM Responses

  • paper_url: http://arxiv.org/abs/2311.07611
  • repo_url: None
  • paper_authors: Nicklaus Badyal, Derek Jacoby, Yvonne Coady
  • for: This study intentionally introduces biases into large language model responses in an attempt to create specific personas for interactive media purposes.
  • methods: The study uses the open-source Falcon-7b model and OpenAI's GPT-4, and quantifies differences in the responses the two systems afford when constructing different personas.
  • results: The guardrails in the GPT-4 mixture-of-experts models with a supervisor, while useful for assuring AI alignment in general, are detrimental when trying to construct personas with a variety of uncommon viewpoints.
    Abstract In this study we intentionally introduce biases into large language model responses in an attempt to create specific personas for interactive media purposes. We explore the differences between open source models such as Falcon-7b and the GPT-4 model from Open AI, and we quantify some differences in responses afforded by the two systems. We find that the guardrails in the GPT-4 mixture of experts models with a supervisor, while useful in assuring AI alignment in general, are detrimental in trying to construct personas with a variety of uncommon viewpoints. This study aims to set the groundwork for future exploration in intentional biases of large language models such that these practices can be applied in the creative field, and new forms of media.

A Template Is All You Meme

  • paper_url: http://arxiv.org/abs/2311.06649
  • repo_url: https://github.com/ukplab/a-template-is-all-you-meme
  • paper_authors: Luke Bates, Peter Ebert Christensen, Preslav Nakov, Iryna Gurevych
  • for: The paper aims to improve the understanding of memes and their context, and to develop a method for injecting context into machine learning models for better meme classification.
  • methods: The authors release the Know Your Meme Knowledge Base (KYMKB), a large knowledge base of memes and information from www.knowyourmeme.com, and create a non-parametric majority-based classifier called Template-Label Counter (TLC) to test the hypothesis that meme templates can provide the missing context.
  • results: Thorough classification experiments and exploratory data analysis across five meme analysis tasks demonstrate the effectiveness of the method and the value of the knowledge base.
    Abstract Memes are a modern form of communication and meme templates possess a base semantics that is customizable by whomever posts it on social media. Machine learning systems struggle with memes, which is likely due to such systems having insufficient context to understand memes, as there is more to memes than the obvious image and text. Here, to aid understanding of memes, we release a knowledge base of memes and information found on www.knowyourmeme.com, which we call the Know Your Meme Knowledge Base (KYMKB), composed of more than 54,000 images. The KYMKB includes popular meme templates, examples of each template, and detailed information about the template. We hypothesize that meme templates can be used to inject models with the context missing from previous approaches. To test our hypothesis, we create a non-parametric majority-based classifier, which we call Template-Label Counter (TLC). We find TLC more effective than or competitive with fine-tuned baselines. To demonstrate the power of meme templates and the value of both our knowledge base and method, we conduct thorough classification experiments and exploratory data analysis in the context of five meme analysis tasks.
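TLC is described as a non-parametric majority-based classifier, which is simple enough to sketch end to end, assuming template matching against the KYMKB has already been done (the matching step itself is not shown, and the data below is invented).

```python
from collections import Counter, defaultdict

def fit_tlc(pairs):
    """pairs: list of (template_id, label) for memes matched against the KYMKB.
    Stores a per-template label counter plus a global-majority fallback."""
    pairs = list(pairs)
    counts = defaultdict(Counter)
    for template, label in pairs:
        counts[template][label] += 1
    fallback = Counter(label for _, label in pairs).most_common(1)[0][0]
    return counts, fallback

def predict_tlc(counts, fallback, template):
    """Majority label of the matched template; global majority if unseen."""
    return counts[template].most_common(1)[0][0] if template in counts else fallback

counts, fallback = fit_tlc([
    ("drake", "benign"), ("drake", "benign"), ("drake", "harmful"),
    ("distracted_bf", "harmful"), ("distracted_bf", "harmful"),
])
print(predict_tlc(counts, fallback, "drake"))        # -> benign (majority of 3)
print(predict_tlc(counts, fallback, "unseen_meme"))  # -> harmful (global fallback)
```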

Robust Text Classification: Analyzing Prototype-Based Networks

  • paper_url: http://arxiv.org/abs/2311.06647
  • repo_url: None
  • paper_authors: Zhivar Sourati, Darshan Deshpande, Filip Ilievski, Kiril Gashteovski, Sascha Saralajew
  • for: This work examines whether the robustness properties of prototype-based networks (PBNs) transfer to text classification tasks.
  • methods: A modular and comprehensive framework for studying PBNs, covering different backbone architectures, backbone sizes, and objective functions; the evaluation protocol tests models against character-, word-, and sentence-level perturbations.
  • results: Experiments on three benchmarks show that PBN robustness transfers to NLP classification under realistic perturbations. The robustness is supported mostly by the objective function that keeps prototypes interpretable, and the robustness advantage of PBNs over vanilla models becomes more salient as datasets get more complex.
    Abstract Downstream applications often require text classification models to be accurate, robust, and interpretable. While the accuracy of the stateof-the-art language models approximates human performance, they are not designed to be interpretable and often exhibit a drop in performance on noisy data. The family of PrototypeBased Networks (PBNs) that classify examples based on their similarity to prototypical examples of a class (prototypes) is natively interpretable and shown to be robust to noise, which enabled its wide usage for computer vision tasks. In this paper, we study whether the robustness properties of PBNs transfer to text classification tasks. We design a modular and comprehensive framework for studying PBNs, which includes different backbone architectures, backbone sizes, and objective functions. Our evaluation protocol assesses the robustness of models against character-, word-, and sentence-level perturbations. Our experiments on three benchmarks show that the robustness of PBNs transfers to NLP classification tasks facing realistic perturbations. Moreover, the robustness of PBNs is supported mostly by the objective function that keeps prototypes interpretable, while the robustness superiority of PBNs over vanilla models becomes more salient as datasets get more complex.
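As a rough picture of the prototype-based setup, the sketch below classifies encoded texts by distance to learnable class prototypes and adds a term that keeps prototypes near real encoded examples, the kind of interpretability-preserving objective the paper credits for robustness. Architecture details, loss terms, and weights are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeHead(nn.Module):
    """Classifies an encoded text by (negative) distance to class prototypes."""
    def __init__(self, dim, n_classes, protos_per_class=3):
        super().__init__()
        self.protos = nn.Parameter(torch.randn(n_classes, protos_per_class, dim))

    def forward(self, z):                          # z: (B, dim) text encodings
        d = torch.cdist(z, self.protos.flatten(0, 1))                 # (B, C*P)
        d = d.view(z.size(0), *self.protos.shape[:2]).min(-1).values  # closest proto per class
        return -d                                  # logits: nearer prototype, higher score

def interpretability_term(protos, z_bank):
    """Pulls each prototype toward a real encoded example, so prototypes can be
    read off as concrete training texts (one ingredient of PBN objectives)."""
    return torch.cdist(protos.flatten(0, 1), z_bank).min(-1).values.mean()

head = PrototypeHead(dim=32, n_classes=2)
z = torch.randn(8, 32)                             # stand-in for a text encoder's output
loss = F.cross_entropy(head(z), torch.randint(0, 2, (8,))) \
       + 0.1 * interpretability_term(head.protos, z)
loss.backward()
print(float(loss))
```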

PerceptionGPT: Effectively Fusing Visual Perception into LLM

  • paper_url: http://arxiv.org/abs/2311.06612
  • repo_url: None
  • paper_authors: Renjie Pi, Lewei Yao, Jiahui Gao, Jipeng Zhang, Tong Zhang
  • for: Equip visual large language models (VLLMs) with visual perception abilities while efficiently exploiting the representation power of the LLM.
  • methods: The paper proposes PerceptionGPT, a novel end-to-end framework that treats the LLM's token embedding as a carrier of spatial information and uses lightweight visual task encoders and decoders to perform visual perception tasks such as detection and segmentation.
  • results: Compared with prior approaches that formulate visual outputs as discrete tokens, the method handles multiple visual outputs better, requires fewer trainable parameters, less training data, and shorter training time, and reduces the inference sequence length since only one token embedding is needed to decode the visual outputs, facilitating future work on equipping LLMs with visual perception.
    Abstract The integration of visual inputs with large language models (LLMs) has led to remarkable advancements in multi-modal capabilities, giving rise to visual large language models (VLLMs). However, effectively harnessing VLLMs for intricate visual perception tasks remains a challenge. In this paper, we present a novel end-to-end framework named PerceptionGPT, which efficiently and effectively equips the VLLMs with visual perception abilities by leveraging the representation power of LLMs' token embedding. Our proposed method treats the token embedding of the LLM as the carrier of spatial information, then leverage lightweight visual task encoders and decoders to perform visual perception tasks (e.g., detection, segmentation). Our approach significantly alleviates the training difficulty suffered by previous approaches that formulate the visual outputs as discrete tokens, and enables achieving superior performance with fewer trainable parameters, less training data and shorted training time. Moreover, as only one token embedding is required to decode the visual outputs, the resulting sequence length during inference is significantly reduced. Consequently, our approach enables accurate and flexible representations, seamless integration of visual perception tasks, and efficient handling of a multiple of visual outputs. We validate the effectiveness and efficiency of our approach through extensive experiments. The results demonstrate significant improvements over previous methods with much fewer trainable parameters and GPU hours, which facilitates future research in enabling LLMs with visual perception abilities.

BizBench: A Quantitative Reasoning Benchmark for Business and Finance

  • paper_url: http://arxiv.org/abs/2311.06602
  • repo_url: None
  • paper_authors: Rik Koncel-Kedziorski, Michael Krumdick, Viet Lai, Varshini Reddy, Charles Lovering, Chris Tanner
  • for: Evaluate models' ability to reason quantitatively about realistic business and financial problems.
  • methods: The benchmark comprises 8 quantitative reasoning tasks, including question answering over structured and unstructured financial data via program synthesis (code generation), and isolates distinct financial reasoning capabilities: reading comprehension of financial text and tables, and domain knowledge such as financial formulas.
  • results: An in-depth evaluation of open-source and commercial models illustrates that BizBench is a challenging benchmark for quantitative reasoning in the finance and business domain.
    Abstract As large language models (LLMs) impact a growing number of complex domains, it is becoming increasingly important to have fair, accurate, and rigorous evaluation benchmarks. Evaluating the reasoning skills required for business and financial NLP stands out as a particularly difficult challenge. We introduce BizBench, a new benchmark for evaluating models' ability to reason about realistic financial problems. BizBench comprises 8 quantitative reasoning tasks. Notably, BizBench targets the complex task of question-answering (QA) for structured and unstructured financial data via program synthesis (i.e., code generation). We introduce three diverse financially-themed code-generation tasks from newly collected and augmented QA data. Additionally, we isolate distinct financial reasoning capabilities required to solve these QA tasks: reading comprehension of financial text and tables, which is required to extract correct intermediate values; and understanding domain knowledge (e.g., financial formulas) needed to calculate complex solutions. Collectively, these tasks evaluate a model's financial background knowledge, ability to extract numeric entities from financial documents, and capacity to solve problems with code. We conduct an in-depth evaluation of open-source and commercial LLMs, illustrating that BizBench is a challenging benchmark for quantitative reasoning in the finance and business domain.

From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL

  • paper_url: http://arxiv.org/abs/2311.06595
  • repo_url: None
  • paper_authors: Xiaoqian Li, Ercong Nie, Sheng Liang
  • for: Improve the in-context learning (ICL) performance of large language models (LLMs) in low-resource languages.
  • methods: The paper proposes a novel approach, cross-lingual retrieval-augmented in-context learning (CREA-ICL), which retrieves semantically similar prompts from high-resource languages to improve the zero-shot performance of multilingual pretrained language models (MPLMs) across diverse tasks.
  • results: The approach yields steady improvements on classification tasks but faces challenges on generation tasks; the evaluation offers insights into the performance dynamics of retrieval-augmented ICL across both domains.
    Abstract The remarkable ability of Large Language Models (LLMs) to understand and follow instructions has sometimes been limited by their in-context learning (ICL) performance in low-resource languages. To address this, we introduce a novel approach that leverages cross-lingual retrieval-augmented in-context learning (CREA-ICL). By extracting semantically similar prompts from high-resource languages, we aim to improve the zero-shot performance of multilingual pre-trained language models (MPLMs) across diverse tasks. Though our approach yields steady improvements in classification tasks, it faces challenges in generation tasks. Our evaluation offers insights into the performance dynamics of retrieval-augmented in-context learning across both classification and generation domains.
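The retrieval step reduces to pulling the most similar high-resource prompts and prepending them as in-context demonstrations. A minimal sketch with cosine similarity over sentence embeddings follows; the embed function is a random, deterministic stand-in for a real multilingual encoder, and the prompt format is illustrative.

```python
import numpy as np

def embed(texts):
    """Stand-in for a multilingual sentence encoder; swap in a real model
    (e.g. a LaBSE-style encoder) for meaningful similarities."""
    out = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % (2**32))
        v = rng.normal(size=16)
        out.append(v / np.linalg.norm(v))
    return np.stack(out)

def crosslingual_prompt(query, pool_prompts, pool_answers, k=2):
    """Retrieve the k high-resource (prompt, answer) pairs most similar to the
    low-resource query and prepend them as in-context demonstrations."""
    sims = embed(pool_prompts) @ embed([query])[0]
    top = np.argsort(-sims)[:k]
    demos = "\n".join(f"Input: {pool_prompts[i]}\nOutput: {pool_answers[i]}" for i in top)
    return f"{demos}\nInput: {query}\nOutput:"

pool_p = ["The movie was fantastic.", "Terrible service, never again.", "An average day."]
pool_a = ["positive", "negative", "neutral"]
print(crosslingual_prompt("Das Essen war hervorragend.", pool_p, pool_a))
```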

Zero-Shot Cross-Lingual Sentiment Classification under Distribution Shift: an Exploratory Study

  • paper_url: http://arxiv.org/abs/2311.06549
  • repo_url: None
  • paper_authors: Maarten De Raedt, Semere Kiros Bitew, Fréderic Godin, Thomas Demeester, Chris Develder
  • for: This paper is focused on studying the generalization of multi-lingual language models to out-of-distribution (OOD) test data in zero-shot cross-lingual transfer settings, and analyzing the impact of both language and domain shifts on performance.
  • methods: The paper uses counterfactually augmented data (CAD) to improve OOD generalization in the cross-lingual setting, and proposes two new approaches that avoid the costly annotation process associated with CAD.
  • results: The paper evaluates the performance of three multilingual models (LaBSE, mBERT, and XLM-R) on OOD test sets in 13 languages, and finds that the proposed cost-effective approaches reach similar or up to +3.1% better accuracy than CAD for Amazon and Restaurant reviews.
    Abstract The brittleness of finetuned language model performance on out-of-distribution (OOD) test samples in unseen domains has been well-studied for English, yet is unexplored for multi-lingual models. Therefore, we study generalization to OOD test data specifically in zero-shot cross-lingual transfer settings, analyzing performance impacts of both language and domain shifts between train and test data. We further assess the effectiveness of counterfactually augmented data (CAD) in improving OOD generalization for the cross-lingual setting, since CAD has been shown to benefit in a monolingual English setting. Finally, we propose two new approaches for OOD generalization that avoid the costly annotation process associated with CAD, by exploiting the power of recent large language models (LLMs). We experiment with 3 multilingual models, LaBSE, mBERT, and XLM-R trained on English IMDb movie reviews, and evaluate on OOD test sets in 13 languages: Amazon product reviews, Tweets, and Restaurant reviews. Results echo the OOD performance decline observed in the monolingual English setting. Further, (i) counterfactuals from the original high-resource language do improve OOD generalization in the low-resource language, and (ii) our newly proposed cost-effective approaches reach similar or up to +3.1% better accuracy than CAD for Amazon and Restaurant reviews.

Enhancing Public Understanding of Court Opinions with Automated Summarizers

  • paper_url: http://arxiv.org/abs/2311.06534
  • repo_url: None
  • paper_authors: Elliott Ash, Aniket Kesari, Suresh Naidu, Lena Song, Dominik Stammbach
  • for: To help non-experts understand legal cases
  • methods: Using an AI assistant to generate simplified summaries of judicial opinions
  • results: A survey experiment shows that the simplified summaries help respondents understand the key features of a ruling.
    Abstract Written judicial opinions are an important tool for building public trust in court decisions, yet they can be difficult for non-experts to understand. We present a pipeline for using an AI assistant to generate simplified summaries of judicial opinions. These are more accessible to the public and more easily understood by non-experts, We show in a survey experiment that the simplified summaries help respondents understand the key features of a ruling. We discuss how to integrate legal domain knowledge into studies using large language models. Our results suggest a role both for AI assistants to inform the public, and for lawyers to guide the process of generating accessible summaries.

Added Toxicity Mitigation at Inference Time for Multimodal and Massively Multilingual Translation

  • paper_url: http://arxiv.org/abs/2311.06532
  • repo_url: None
  • paper_authors: Marta R. Costa-jussà, David Dale, Maha Elbayad, Bokai Yu
  • for: This paper presents MinTox, a novel pipeline that identifies added toxicity in translation output and mitigates it at inference time.
  • methods: MinTox uses a multimodal (speech and text) toxicity detection classifier that works across languages at scale; the mitigation method is applied directly to text outputs.
  • results: Applied to SEAMLESSM4T, the latest multimodal and massively multilingual machine translation system, MinTox achieves significant added-toxicity mitigation across domains, modalities, and language directions, filtering out roughly 25% to 95% of added toxicity (depending on modality and domain) while preserving translation quality.
    Abstract Added toxicity in the context of translation refers to the fact of producing a translation output with more toxicity than there exists in the input. In this paper, we present MinTox which is a novel pipeline to identify added toxicity and mitigate this issue which works at inference time. MinTox uses a toxicity detection classifier which is multimodal (speech and text) and works in languages at scale. The mitigation method is applied to languages at scale and directly in text outputs. MinTox is applied to SEAMLESSM4T, which is the latest multimodal and massively multilingual machine translation system. For this system, MinTox achieves significant added toxicity mitigation across domains, modalities and language directions. MinTox manages to approximately filter out from 25% to 95% of added toxicity (depending on the modality and domain) while keeping translation quality.
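At inference time, the pipeline's logic reduces to: detect toxicity in the source, detect it in candidate translations, and intervene only when toxicity is added. The schematic below uses a toy lexicon detector where MinTox uses a learned multimodal, massively multilingual classifier; the candidate-reranking strategy shown is one assumed form of mitigation, not the paper's exact mechanism.

```python
def toxicity_terms(text: str) -> set[str]:
    """Placeholder detector: returns the toxic terms found in `text`.
    MinTox uses a learned multimodal, multilingual classifier instead."""
    TOXIC = {"idiot", "stupid"}              # toy lexicon
    return {w.strip(".,!?") for w in text.lower().split() if w.strip(".,!?") in TOXIC}

def mitigate(source: str, candidates: list[str]) -> str:
    """Keep the first candidate translation that introduces no toxicity absent
    from the source (i.e. no *added* toxicity); fall back to the top candidate."""
    src_tox = toxicity_terms(source)
    for cand in candidates:
        if toxicity_terms(cand) <= src_tox:  # subset check: nothing new added
            return cand
    return candidates[0]

print(mitigate("He is not very smart.",
               ["He is a stupid idiot.", "He is not very clever."]))
# -> "He is not very clever."
```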

Minimum Description Length Hopfield Networks

  • paper_url: http://arxiv.org/abs/2311.06518
  • repo_url: https://github.com/matanabudy/mdl-hn
  • paper_authors: Matan Abudy, Nur Lan, Emmanuel Chemla, Roni Katzir
  • for: This paper studies the tradeoff between memorization and generalization in associative memory architectures.
  • methods: The study focuses on Modern Hopfield Networks (MHN).
  • results: A large memorization capacity undermines the opportunity for generalization. The proposed solution relies on Minimum Description Length (MDL) to determine, during training, which memories to store and how many of them.
    Abstract Associative memory architectures are designed for memorization but also offer, through their retrieval method, a form of generalization to unseen inputs: stored memories can be seen as prototypes from this point of view. Focusing on Modern Hopfield Networks (MHN), we show that a large memorization capacity undermines the generalization opportunity. We offer a solution to better optimize this tradeoff. It relies on Minimum Description Length (MDL) to determine during training which memories to store, as well as how many of them.
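The MDL criterion can be made concrete with a toy two-part code: bits to describe the stored memories themselves, plus bits to correct every training item the network fails to reconstruct. The sketch below greedily admits a pattern into a modern-Hopfield store only when it lowers that total; the encoding costs and update rule are simplified assumptions, not the paper's exact formulation.

```python
import numpy as np

def mhn_retrieve(x, memories, beta=4.0):
    """One modern-Hopfield update: softmax attention over stored memories."""
    s = beta * memories @ x
    w = np.exp(s - s.max()); w /= w.sum()
    return w @ memories

def description_length(stored, data, bits_per_dim=8.0):
    """Two-part MDL: bits for the stored memories, plus bits to correct each
    training dimension the network fails to reconstruct (toy encoding costs)."""
    model_bits = stored.size * bits_per_dim
    data_bits = sum((np.abs(x - mhn_retrieve(x, stored)) > 0.5).sum()
                    for x in data) * bits_per_dim
    return model_bits + data_bits

rng = np.random.default_rng(0)
protos = rng.choice([-1.0, 1.0], size=(3, 16))   # 3 underlying prototypes
data = np.repeat(protos, 5, axis=0)              # 5 noiseless repeats of each

stored = data[:1]                                # greedy MDL-driven admission
for x in data[1:]:
    trial = np.vstack([stored, x])
    if description_length(trial, data) < description_length(stored, data):
        stored = trial
print("memories kept:", len(stored))             # typically 3: one per prototype
```

Duplicates are rejected because they add model bits without reducing reconstruction cost, which is exactly the memorization-versus-compression tradeoff MDL arbitrates.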

L3 Ensembles: Lifelong Learning Approach for Ensemble of Foundational Language Models

  • paper_url: http://arxiv.org/abs/2311.06493
  • repo_url: None
  • paper_authors: Aidin Shiri, Kaushik Roy, Amit Sheth, Manas Gaur
  • for: This paper proposes a Lifelong Learning (L3) framework for foundational language models (FLMs) that continuously and efficiently adapts to a stream of NLP tasks.
  • methods: The approach extracts meaningful representations from unseen data, constructs a structured knowledge base, and improves task performance incrementally.
  • results: Experiments on tasks including the GLUE and SuperGLUE benchmarks show that the proposed L3 ensemble method increases model accuracy by 4%-36% over the fine-tuned FLM. The L3 model also outperforms naive fine-tuning while maintaining competitive or superior performance, up to a 15.4% accuracy increase, compared to the state-of-the-art language model (T5) on the STS benchmark.
    Abstract Fine-tuning pre-trained foundational language models (FLM) for specific tasks is often impractical, especially for resource-constrained devices. This necessitates the development of a Lifelong Learning (L3) framework that continuously adapts to a stream of Natural Language Processing (NLP) tasks efficiently. We propose an approach that focuses on extracting meaningful representations from unseen data, constructing a structured knowledge base, and improving task performance incrementally. We conducted experiments on various NLP tasks to validate its effectiveness, including benchmarks like GLUE and SuperGLUE. We measured good performance across the accuracy, training efficiency, and knowledge transfer metrics. Initial experimental results show that the proposed L3 ensemble method increases the model accuracy by 4% ~ 36% compared to the fine-tuned FLM. Furthermore, L3 model outperforms naive fine-tuning approaches while maintaining competitive or superior performance (up to 15.4% increase in accuracy) compared to the state-of-the-art language model (T5) for the given task, STS benchmark.

DocGen: Generating Detailed Parameter Docstrings in Python

  • paper_url: http://arxiv.org/abs/2311.06453
  • repo_url: None
  • paper_authors: Vatsal Venkatkrishna, Durga Shree Nagabushanam, Emmanuel Iko-Ojo Simon, Melina Vidoni
  • for: Improves the effective use of open-source software, where documentation debt leaves developers confused.
  • methods: Proposes a multi-step approach that combines several task-specific models, each specialized in generating a different docstring section, to ensure the generated documentation is accurate and complete.
  • results: Comparison against existing generative models, using both automatic metrics and a human-centered evaluation with 17 participating developers, demonstrates the superiority of the approach.
    Abstract Documentation debt hinders the effective utilization of open-source software. Although code summarization tools have been helpful for developers, most would prefer a detailed account of each parameter in a function rather than a high-level summary. However, generating such a summary is too intricate for a single generative model to produce reliably due to the lack of high-quality training data. Thus, we propose a multi-step approach that combines multiple task-specific models, each adept at producing a specific section of a docstring. The combination of these models ensures the inclusion of each section in the final docstring. We compared the results from our approach with existing generative models using both automatic metrics and a human-centred evaluation with 17 participating developers, which proves the superiority of our approach over existing methods.
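
The multi-step idea can be pictured as composing one generator per docstring section. The three callables below are hypothetical stand-ins for the paper's task-specific section models.

```python
def generate_docstring(source, summarize, describe_params, describe_returns):
    """Compose a full numpydoc-style docstring from per-section generators."""
    parts = [
        summarize(source),          # high-level one-line summary
        "",
        "Parameters",
        "----------",
        describe_params(source),    # detailed per-parameter descriptions
        "",
        "Returns",
        "-------",
        describe_returns(source),
    ]
    return "\n".join(parts)         # each section is guaranteed to be present
```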

Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text

  • paper_url: http://arxiv.org/abs/2311.06440
  • repo_url: https://github.com/toizzy/bread
  • paper_authors: Isaac Caswell, Lisa Wang, Isabel Papadimitriou
  • for: Provides a human-annotated dataset for detecting repetitive boilerplate in language-model training data and for assessing data quality across languages.
  • methods: Creates BREAD, a human-labeled benchmark spanning 360 languages, and releases baseline CRED (Character REDundancy) scores for measuring data quality.
  • results: Evaluates the CRED baselines on BREAD, surfacing repetitive-text problems in training corpora, and provides reference implementations intended as standard corpus-evaluation tools, especially for low-resource languages.
    Abstract Data quality is a problem that perpetually resurfaces throughout the field of NLP, regardless of task, domain, or architecture, and remains especially severe for lower-resource languages. A typical and insidious issue, affecting both training data and model output, is data that is repetitive and dominated by linguistically uninteresting boilerplate, such as price catalogs or computer-generated log files. Though this problem permeates many web-scraped corpora, there has yet to be a benchmark to test against, or a systematic study to find simple metrics that generalize across languages and agree with human judgements of data quality. In the present work, we create and release BREAD, a human-labeled benchmark on repetitive boilerplate vs. plausible linguistic content, spanning 360 languages. We release several baseline CRED (Character REDundancy) scores along with it, and evaluate their effectiveness on BREAD. We hope that the community will use this resource to develop better filtering methods, and that our reference implementations of CRED scores can become standard corpus evaluation tools, driving the development of cleaner language modeling corpora, especially in low-resource languages.
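
The released CRED scores are character-redundancy metrics; one simple, plausible proxy in the same spirit (not necessarily the paper's definition) is a compression ratio, since boilerplate compresses far better than plausible linguistic content.

```python
import zlib

def char_redundancy(text: str) -> float:
    """1 minus the compressed-to-raw size ratio: repetitive boilerplate
    scores near 1, varied natural text much lower. A compression-based
    stand-in for a CRED-style score."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return max(0.0, 1.0 - len(zlib.compress(raw, 9)) / len(raw))

print(char_redundancy("item 001 | $9.99\n" * 100))               # close to 1
print(char_redundancy("Colorless green ideas sleep furiously."))  # much lower
```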

cs.LG - 2023-11-11

Agnostic Membership Query Learning with Nontrivial Savings: New Results, Techniques

  • paper_url: http://arxiv.org/abs/2311.06690
  • repo_url: None
  • paper_authors: Ari Karchmer
  • for: Studies the design of computationally efficient algorithms in the agnostic learning model (Haussler, 1992; Kearns et al., 1994).
  • methods: Uses membership queries, focusing on touchstone classes at the frontier of agnostic learning and on how much computation can be saved over the trivial 2^n runtime.
  • results: Establishes multiple agnostic learning algorithms, including one for circuits with a sublinear number of polynomial-threshold-function gates and one for circuits of \sym^+ gates of subexponential size; both run in time 2^{n-s(n)} with s(n) ≈ n/(k+1), rather than the trivial 2^n.
    Abstract (Abridged) Designing computationally efficient algorithms in the agnostic learning model (Haussler, 1992; Kearns et al., 1994) is notoriously difficult. In this work, we consider agnostic learning with membership queries for touchstone classes at the frontier of agnostic learning, with a focus on how much computation can be saved over the trivial runtime of 2^n. This approach is inspired by and continues the study of "learning with nontrivial savings" (Servedio and Tan, 2017). To this end, we establish multiple agnostic learning algorithms, highlighted by: 1. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, which can each be any function computable by a sublogarithmic degree k polynomial threshold function (the depth of the circuit is bounded only by size). This algorithm runs in time 2^{n-s(n)} for s(n) \approx n/(k+1), and learns over the uniform distribution over unlabelled examples on \{0,1\}^n. 2. An agnostic learning algorithm for circuits consisting of a sublinear number of gates, where each can be any function computable by a \sym^+ circuit of subexponential size and sublogarithmic degree k. This algorithm runs in time 2^{n-s(n)} for s(n) \approx n/(k+1), and learns over distributions of unlabelled examples that are products of k+1 arbitrary and unknown distributions, each over \{0,1\}^{n/(k+1)} (assume without loss of generality that k+1 divides n).

Heuristic Optimal Transport in Branching Networks

  • paper_url: http://arxiv.org/abs/2311.06650
  • repo_url: None
  • paper_authors: M. Andrecut
  • for: Learns a mapping of sources to targets over a network that minimizes transport cost.
  • methods: Uses a fast heuristic to introduce branching structures into optimal transport solutions on networks.
  • results: Demonstrates the heuristic branching method on several applications.
    Abstract Optimal transport aims to learn a mapping of sources to targets by minimizing the cost, which is typically defined as a function of distance. The solution to this problem consists of straight line segments optimally connecting sources to targets, and it does not exhibit branching. These optimal solutions are in stark contrast with both natural, and man-made transportation networks, where branching structures are prevalent. Here we discuss a fast heuristic branching method for optimal transport in networks, and we provide several applications.

Privacy Risks Analysis and Mitigation in Federated Learning for Medical Images

  • paper_url: http://arxiv.org/abs/2311.06643
  • repo_url: https://github.com/mlsysx/medpfl
  • paper_authors: Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu
  • for: Analyzes and mitigates privacy risks to medical data in federated learning (FL).
  • methods: Proposes MedPFL, a holistic framework for analyzing and mitigating privacy risks in FL, and shows experimentally that privacy attacks can accurately reconstruct private medical images.
  • results: Finds that the common defense of adding random noise may not always protect medical images against privacy attacks, posing unique and pressing challenges for medical-data privacy.
    Abstract Federated learning (FL) is gaining increasing popularity in the medical domain for analyzing medical images, which is considered an effective technique to safeguard sensitive patient data and comply with privacy regulations. However, several recent studies have revealed that the default settings of FL may leak private training data under privacy attacks. Thus, it is still unclear whether and to what extent such privacy risks of FL exist in the medical domain, and if so, "how to mitigate such risks?". In this paper, first, we propose a holistic framework for Medical data Privacy risk analysis and mitigation in Federated Learning (MedPFL) to analyze privacy risks and develop effective mitigation strategies in FL for protecting private medical data. Second, we demonstrate the substantial privacy risks of using FL to process medical images, where adversaries can easily perform privacy attacks to reconstruct private medical images accurately. Third, we show that the defense approach of adding random noises may not always work effectively to protect medical images against privacy attacks in FL, which poses unique and pressing challenges associated with medical data for privacy protection.
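
The random-noise defense the paper stress-tests is typically applied as in the differential-privacy-style sketch below (clip, then perturb a client update before sharing). This is a generic illustration, not MedPFL's implementation.

```python
import numpy as np

def perturb_update(update, clip=1.0, sigma=0.1, rng=None):
    """Clip the update's L2 norm, then add Gaussian noise scaled to the clip.
    Larger sigma means more privacy, but as the paper shows for medical
    images, reconstruction attacks may still succeed while utility drops."""
    rng = rng or np.random.default_rng()
    update = np.asarray(update, dtype=float)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip / (norm + 1e-12))
    return clipped + rng.normal(0.0, sigma * clip, size=update.shape)
```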

The Exact Determinant of a Specific Class of Sparse Positive Definite Matrices

  • paper_url: http://arxiv.org/abs/2311.06632
  • repo_url: None
  • paper_authors: Mehdi Molkaraie
  • for: Solves the determinant problem for a specific class of sparse Gaussian graphical models.
  • methods: Uses the Normal Factor Graph Duality Theorem and holographic algorithms, obtaining a closed-form solution by applying the Matrix Determinant Lemma to the transformed graphical model.
  • results: Provides a closed-form expression for the determinant of the covariance matrix, and defines a notion of equivalence between two Gaussian graphical models.
    Abstract For a specific class of sparse Gaussian graphical models, we provide a closed-form solution for the determinant of the covariance matrix. In our framework, the graphical interaction model (i.e., the covariance selection model) is equal to replacement product of $\mathcal{K}_{n}$ and $\mathcal{K}_{n-1}$, where $\mathcal{K}_n$ is the complete graph with $n$ vertices. Our analysis is based on taking the Fourier transform of the local factors of the model, which can be viewed as an application of the Normal Factor Graph Duality Theorem and holographic algorithms. The closed-form expression is obtained by applying the Matrix Determinant Lemma on the transformed graphical model. In this context, we will also define a notion of equivalence between two Gaussian graphical models.
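
For reference, the Matrix Determinant Lemma invoked on the transformed model is the standard rank-one identity, together with its rank-k generalization:

```latex
\det\bigl(A + u v^{\top}\bigr) = \bigl(1 + v^{\top} A^{-1} u\bigr)\det(A),
\qquad
\det\bigl(A + U V^{\top}\bigr) = \det\bigl(I_k + V^{\top} A^{-1} U\bigr)\det(A).
```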

Streamlining Energy Transition Scenarios to Key Policy Decisions

  • paper_url: http://arxiv.org/abs/2311.06625
  • repo_url: None
  • paper_authors: Florian Joseph Baader, Stefano Moret, Wolfram Wiesemann, Iain Staffell, André Bardow
  • for: Provides an approach for interpreting and prioritizing the key decisions in the energy transition, in the context of global decarbonization scenarios and a fossil-free Europe.
  • methods: Uses decision trees, a popular machine-learning technique, to derive interpretable storylines from many quantitative scenarios and to show how the key decisions in the energy transition are interlinked.
  • results: Choosing a high deployment of renewables and sector coupling makes global decarbonization scenarios robust against uncertainties in climate sensitivity and demand; the energy transition to a fossil-free Europe is primarily determined by choices on the roles of bioenergy, storage, and heat electrification.
    Abstract Uncertainties surrounding the energy transition often lead modelers to present large sets of scenarios that are challenging for policymakers to interpret and act upon. An alternative approach is to define a few qualitative storylines from stakeholder discussions, which can be affected by biases and infeasibilities. Leveraging decision trees, a popular machine-learning technique, we derive interpretable storylines from many quantitative scenarios and show how the key decisions in the energy transition are interlinked. Specifically, our results demonstrate that choosing a high deployment of renewables and sector coupling makes global decarbonization scenarios robust against uncertainties in climate sensitivity and demand. Also, the energy transition to a fossil-free Europe is primarily determined by choices on the roles of bioenergy, storage, and heat electrification. Our transferrable approach translates vast energy model results into a small set of critical decisions, guiding decision-makers in prioritizing the key factors that will shape the energy transition.
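
The distillation step can be pictured with a shallow decision tree fit on scenario outcomes; the column names below are hypothetical placeholders for energy-model outputs.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

scenarios = pd.read_csv("scenarios.csv")       # hypothetical scenario ensemble
X = scenarios[["renewables_deployment", "sector_coupling", "bioenergy_role"]]
y = scenarios["decarbonization_target_met"]    # hypothetical binary outcome

tree = DecisionTreeClassifier(max_depth=3, random_state=0)  # shallow = readable
tree.fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))     # the "storyline"
```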

Sparse Attention-Based Neural Networks for Code Classification

  • paper_url: http://arxiv.org/abs/2311.06575
  • repo_url: None
  • paper_authors: Ziyang Xiang, Zaixi Zhang, Qi Liu
  • for: Addresses the problem of accurately and efficiently classifying source code on real-world programming-education platforms.
  • methods: Parses code into abstract syntax trees (ASTs), encodes sequences of subtrees with a recursive neural network, and classifies them with a Transformer equipped with a purpose-built sparse attention mechanism.
  • results: Delivers efficient and accurate code classification, while addressing problems in earlier work such as incomplete classification labels and small dataset sizes.
    Abstract Categorizing source codes accurately and efficiently is a challenging problem in real-world programming education platform management. In recent years, model-based approaches utilizing abstract syntax trees (ASTs) have been widely applied to code classification tasks. We introduce an approach named the Sparse Attention-based neural network for Code Classification (SACC) in this paper. The approach involves two main steps: In the first step, source code undergoes syntax parsing and preprocessing. The generated abstract syntax tree is split into sequences of subtrees and then encoded using a recursive neural network to obtain a high-dimensional representation. This step simultaneously considers both the logical structure and lexical level information contained within the code. In the second step, the encoded sequences of subtrees are fed into a Transformer model that incorporates sparse attention mechanisms for the purpose of classification. This method efficiently reduces the computational cost of the self-attention mechanisms, thus improving the training speed while preserving effectiveness. Our work introduces a carefully designed sparse attention pattern that is specifically designed to meet the unique needs of code classification tasks. This design helps reduce the influence of redundant information and enhances the overall performance of the model. Finally, we also deal with problems in previous related research, which include issues like incomplete classification labels and a small dataset size. We annotated the CodeNet dataset with algorithm-related labeling categories, which contains a significantly large amount of data. Extensive comparative experimental results demonstrate the effectiveness and efficiency of SACC for the code classification tasks.
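
The general shape of such a pattern (a local window plus a few global tokens) can be written as a boolean mask; the paper's actual pattern is purpose-built for subtree sequences, so treat this as illustrative only.

```python
import torch

def local_global_mask(seq_len: int, window: int = 8, n_global: int = 4):
    """True = may attend. Each position sees a local window, and a few
    global positions see (and are seen by) everything, giving O(n * window)
    nonzeros instead of the dense O(n^2) of full self-attention."""
    idx = torch.arange(seq_len)
    mask = (idx[:, None] - idx[None, :]).abs() <= window
    mask[:, :n_global] = True   # everyone attends to the global tokens
    mask[:n_global, :] = True   # global tokens attend everywhere
    return mask

# Usage: bias = torch.where(mask, 0.0, float("-inf")), added to the attention
# scores before the softmax.
```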

Convolve and Conquer: Data Comparison with Wiener Filters

  • paper_url: http://arxiv.org/abs/2311.06558
  • repo_url: https://github.com/dpelacani/AWLoss
  • paper_authors: Deborah Pelacani Cruz, George Strong, Oscar Bates, Carlos Cueto, Jiashun Yao, Lluis Guasch
  • for: Proposes a new method for quantitatively measuring differences and similarities between paired data samples.
  • methods: Builds on Wiener-filter theory; the convolutional nature of Wiener filters allows data samples to be compared in a globally correlated way.
  • results: Validated on four machine-learning applications (data compression, medical imaging imputation, translated classification, and non-parametric generative modelling), yielding reconstructions with better perceptual quality and higher data fidelity, plus robustness to translations.
    Abstract Quantitative evaluations of differences and/or similarities between data samples define and shape optimisation problems associated with learning data distributions. Current methods to compare data often suffer from limitations in capturing such distributions or lack desirable mathematical properties for optimisation (e.g. smoothness, differentiability, or convexity). In this paper, we introduce a new method to measure (dis)similarities between paired samples inspired by Wiener-filter theory. The convolutional nature of Wiener filters allows us to comprehensively compare data samples in a globally correlated way. We validate our approach in four machine learning applications: data compression, medical imaging imputation, translated classification, and non-parametric generative modelling. Our results demonstrate increased resolution in reconstructed images with better perceptual quality and higher data fidelity, as well as robustness against translations, compared to conventional mean-squared-error analogue implementations.
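
A one-dimensional sketch of the idea: estimate the filter that convolves one sample into the other, then measure its distance from an identity (delta) filter, so identical inputs score zero. The paper's full loss adds refinements (e.g., penalties on the filter shape) that this sketch omits.

```python
import numpy as np

def wiener_mismatch(x, y, eps=1e-3):
    """Regularized frequency-domain estimate of the filter w with x * w ≈ y
    (circular convolution), compared against the identity filter."""
    X, Y = np.fft.rfft(x), np.fft.rfft(y)
    W = (Y * np.conj(X)) / (np.abs(X) ** 2 + eps)  # Wiener deconvolution
    w = np.fft.irfft(W, n=len(x))
    delta = np.zeros_like(w)
    delta[0] = 1.0                                 # identity (delta) filter
    return float(np.sum((w - delta) ** 2))

t = np.linspace(0.0, 1.0, 256, endpoint=False)
sig = np.sin(2 * np.pi * 5 * t)
print(wiener_mismatch(sig, sig))                   # ≈ 0 for identical inputs
print(wiener_mismatch(sig, np.roll(sig, 10)))      # > 0: a shifted delta
```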

Graph ODE with Factorized Prototypes for Modeling Complicated Interacting Dynamics

  • paper_url: http://arxiv.org/abs/2311.06554
  • repo_url: None
  • paper_authors: Xiao Luo, Yiyang Gu, Huiyu Jiang, Jinsheng Huang, Wei Ju, Ming Zhang, Yizhou Sun
  • for: Studies the modeling of interacting dynamical systems, which is critical for understanding physical dynamics and biological processes.
  • methods: Represents interactions with geometric graphs captured by graph neural networks (GNNs), and proposes GOAT, a continuous graph ODE that incorporates factorized prototypes extracted from object-level and system-level contexts.
  • results: Handles challenging settings such as out-of-distribution shift and complicated underlying rules; disentangling the two context levels improves generalization under system changes.
    Abstract This paper studies the problem of modeling interacting dynamical systems, which is critical for understanding physical dynamics and biological processes. Recent research predominantly uses geometric graphs to represent these interactions, which are then captured by powerful graph neural networks (GNNs). However, predicting interacting dynamics in challenging scenarios such as out-of-distribution shift and complicated underlying rules remains unsolved. In this paper, we propose a new approach named Graph ODE with factorized prototypes (GOAT) to address the problem. The core of GOAT is to incorporate factorized prototypes from contextual knowledge into a continuous graph ODE framework. Specifically, GOAT employs representation disentanglement and system parameters to extract both object-level and system-level contexts from historical trajectories, which allows us to explicitly model their independent influence and thus enhances the generalization capability under system changes. Then, we integrate these disentangled latent representations into a graph ODE model, which determines a combination of various interacting prototypes for enhanced model expressivity. The entire model is optimized using an end-to-end variational inference framework to maximize the likelihood. Extensive experiments in both in-distribution and out-of-distribution settings validate the superiority of GOAT.

From Charts to Atlas: Merging Latent Spaces into One

  • paper_url: http://arxiv.org/abs/2311.06547
  • repo_url: None
  • paper_authors: Donato Crisostomi, Irene Cannistraci, Luca Moschella, Pietro Barbiero, Marco Ciccone, Pietro Liò, Emanuele Rodolà
  • for: Aims to merge the latent spaces of models trained on semantically related tasks and datasets into a unified space combining their information, for better classification.
  • methods: First renders the spaces comparable using relative representations, then aggregates them with a simple mean.
  • results: The aggregated space resembles that of an end-to-end model trained on all tasks, is better suited for classification thanks to the task-specific imprints left in the representations, and can still merge spaces that share no common region, albeit with diminished benefits over naive merging.
    Abstract Models trained on semantically related datasets and tasks exhibit comparable inter-sample relations within their latent spaces. We investigate in this study the aggregation of such latent spaces to create a unified space encompassing the combined information. To this end, we introduce Relative Latent Space Aggregation, a two-step approach that first renders the spaces comparable using relative representations, and then aggregates them via a simple mean. We carefully divide a classification problem into a series of learning tasks under three different settings: sharing samples, classes, or neither. We then train a model on each task and aggregate the resulting latent spaces. We compare the aggregated space with that derived from an end-to-end model trained over all tasks and show that the two spaces are similar. We then observe that the aggregated space is better suited for classification, and empirically demonstrate that it is due to the unique imprints left by task-specific embedders within the representations. We finally test our framework in scenarios where no shared region exists and show that it can still be used to merge the spaces, albeit with diminished benefits over naive merging.
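
The two steps translate almost directly into code: re-express each latent space as cosine similarities to a shared set of anchor samples, then average. A minimal sketch:

```python
import numpy as np

def relative_projection(Z, anchor_idx):
    """Map absolute embeddings Z (n, d) to relative ones: cosine similarity
    to the embeddings of the shared anchor samples, giving an (n, k) space."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return Zn @ Zn[anchor_idx].T

def aggregate(spaces, anchor_idx):
    """Step 1: make each task's latent space comparable via relative
    representations; step 2: merge them with a simple mean."""
    return np.mean([relative_projection(Z, anchor_idx) for Z in spaces], axis=0)
```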

Understanding Generalization via Set Theory

  • paper_url: http://arxiv.org/abs/2311.06545
  • repo_url: None
  • paper_authors: Shiqi Liu
  • for: Seeks a better understanding of generalization in machine-learning models.
  • methods: Uses set theory to introduce the concepts of algorithm, hypothesis, and dataset generalization; analyzes the properties of dataset generalization and proves a theorem on surrogate generalization procedures, which leads to the proposed generalization method.
  • results: A generalization experiment on MNIST yields 13,541 sample bases; evaluating on the entire training set gives 99.945% accuracy, but shifting the sample bases or modifying the network structure degrades performance significantly. Consistently mispredicted samples all turn out to be challenging examples, supporting the accuracy of the definition and the effectiveness of the methods.
    Abstract Generalization is at the core of machine learning models. However, the definition of generalization is not entirely clear. We employ set theory to introduce the concepts of algorithms, hypotheses, and dataset generalization. We analyze the properties of dataset generalization and prove a theorem on surrogate generalization procedures. This theorem leads to our generalization method. Through a generalization experiment on the MNIST dataset, we obtain 13,541 sample bases. When we use the entire training set to evaluate the model's performance, the models achieve an accuracy of 99.945%. However, if we shift the sample bases or modify the neural network structure, the performance experiences a significant decline. We also identify consistently mispredicted samples and find that they are all challenging examples. The experiments substantiated the accuracy of the generalization definition and the effectiveness of the proposed methods. Both the set-theoretic deduction and the experiments help us better understand generalization.

TURBO: The Swiss Knife of Auto-Encoders

  • paper_url: http://arxiv.org/abs/2311.06527
  • repo_url: None
  • paper_authors: Guillaume Quétant, Yury Belousov, Vitaliy Kinakh, Slava Voloshynovskiy
  • for: Systematically analyzes and generalizes auto-encoding methods on an information-theoretic foundation.
  • methods: Derives the framework's core concept: maximizing mutual information between data representations along two directions reflecting the information flows.
  • results: Shows that many prevalent neural network models fall within the framework, while the information bottleneck concept cannot explain them all, establishing TURBO as a preferable theoretical reference.
    Abstract We present a novel information-theoretic framework, termed as TURBO, designed to systematically analyse and generalise auto-encoding methods. We start by examining the principles of information bottleneck and bottleneck-based networks in the auto-encoding setting and identifying their inherent limitations, which become more prominent for data with multiple relevant, physics-related representations. The TURBO framework is then introduced, providing a comprehensive derivation of its core concept consisting of the maximisation of mutual information between various data representations expressed in two directions reflecting the information flows. We illustrate that numerous prevalent neural network models are encompassed within this framework. The paper underscores the insufficiency of the information bottleneck concept in elucidating all such models, thereby establishing TURBO as a preferable theoretical reference. The introduction of TURBO contributes to a richer understanding of data representation and the structure of neural network models, enabling more efficient and versatile applications.

CompCodeVet: A Compiler-guided Validation and Enhancement Approach for Code Dataset

  • paper_url: http://arxiv.org/abs/2311.06505
  • repo_url: None
  • paper_authors: Le Chen, Arijit Bhattacharjee, Nesreen K. Ahmed, Niranjan Hasabnis, Gal Oren, Bin Lei, Ali Jannesari
  • for: Improves LLM performance on C and C++ code generation and comprehension.
  • methods: Uses the compiler as a teacher: CompCodeVet is a compiler-guided chain-of-thought approach that produces compilable code from non-compilable code, establishing a more robust zero-shot thought process without resorting to larger LLMs.
  • results: Evaluation on two open-source code datasets shows that CompCodeVet improves training-dataset quality for LLMs.
    Abstract Large language models (LLMs) have become increasingly prominent in academia and industry due to their remarkable performance in diverse applications. As these models evolve with increasing parameters, they excel in tasks like sentiment analysis and machine translation. However, even models with billions of parameters face challenges in tasks demanding multi-step reasoning. Code generation and comprehension, especially in C and C++, emerge as significant challenges. While LLMs trained on code datasets demonstrate competence in many tasks, they struggle with rectifying non-compilable C and C++ code. Our investigation attributes this subpar performance to two primary factors: the quality of the training dataset and the inherent complexity of the problem which demands intricate reasoning. Existing "Chain of Thought" (CoT) prompting techniques aim to enhance multi-step reasoning. This approach, however, retains the limitations associated with the latent drawbacks of LLMs. In this work, we propose CompCodeVet, a compiler-guided CoT approach to produce compilable code from non-compilable ones. Diverging from the conventional approach of utilizing larger LLMs, we employ compilers as a teacher to establish a more robust zero-shot thought process. The evaluation of CompCodeVet on two open-source code datasets shows that CompCodeVet has the ability to improve the training dataset quality for LLMs.
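
The compiler-as-teacher loop can be sketched as below. Only the compile check uses a real tool (gcc's syntax-only mode); `llm_repair` is a hypothetical stand-in for the prompted model.

```python
import os
import subprocess
import tempfile

def compiles_c(code: str):
    """Return (ok, diagnostics) from a syntax-only gcc pass."""
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(["gcc", "-fsyntax-only", path],
                              capture_output=True, text=True)
        return proc.returncode == 0, proc.stderr
    finally:
        os.unlink(path)

def vet(code: str, llm_repair, max_rounds: int = 3) -> str:
    """Compile; on failure, hand the diagnostics back to the model as the
    next reasoning step, and accept the sample once it compiles."""
    for _ in range(max_rounds):
        ok, errors = compiles_c(code)
        if ok:
            return code
        code = llm_repair(code, errors)  # hypothetical prompted-repair call
    return code
```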

Stacked networks improve physics-informed training: applications to neural networks and deep operator networks

  • paper_url: http://arxiv.org/abs/2311.06483
  • repo_url: None
  • paper_authors: Amanda A Howard, Sarah H Murphy, Shady E Ahmed, Panos Stinis
  • for: Addresses systems of equations that physics-informed neural networks and operator networks find difficult or impossible to train accurately.
  • methods: Proposes a multifidelity framework that successively stacks physics-informed neural networks and operator networks: each step's output serves as a low-fidelity input for training the next, gradually increasing the expressivity of the learned model; the equations imposed at each step can be the same or different (akin to simulated annealing).
  • results: On benchmarks including a nonlinear pendulum, the wave equation, and the viscous Burgers equation, stacking improves accuracy and reduces the required size of physics-informed neural networks and operator networks.
    Abstract Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations. We present a novel multifidelity framework for stacking physics-informed neural networks and operator networks that facilitates training. We successively build a chain of networks, where the output at one step can act as a low-fidelity input for training the next step, gradually increasing the expressivity of the learned model. The equations imposed at each step of the iterative process can be the same or different (akin to simulated annealing). The iterative (stacking) nature of the proposed method allows us to progressively learn features of a solution that are hard to learn directly. Through benchmark problems including a nonlinear pendulum, the wave equation, and the viscous Burgers equation, we show how stacking can be used to improve the accuracy and reduce the required size of physics-informed neural networks and operator networks.
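
A minimal PyTorch sketch of the stacking mechanism (physics-residual losses omitted): each stage is conditioned on the previous stage's frozen prediction as a low-fidelity feature, so it only needs to learn a correction.

```python
import torch
import torch.nn as nn

def make_stage(in_dim, width=64):
    return nn.Sequential(nn.Linear(in_dim, width), nn.Tanh(),
                         nn.Linear(width, width), nn.Tanh(),
                         nn.Linear(width, 1))

def stacked_forward(stages, x):
    """x: (batch, 1) collocation points; stage k sees [x, u_{k-1}(x)]."""
    u = torch.zeros(x.shape[0], 1)
    for stage in stages:
        u = stage(torch.cat([x, u], dim=1))
    return u

stages = [make_stage(in_dim=2) for _ in range(3)]
# Training proceeds stage by stage: freeze stages 0..k-1 and fit stage k on
# the PDE residual (the same or different equations per stage, as described).
```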

Topology-Matching Normalizing Flows for Out-of-Distribution Detection in Robot Learning

  • paper_url: http://arxiv.org/abs/2311.06481
  • repo_url: None
  • paper_authors: Jianxiang Feng, Jongseok Lee, Simon Geisler, Stephan Gunnemann, Rudolph Triebel
  • for: Improves the reliability of real-world autonomous-robot deployments through out-of-distribution (OOD) detection.
  • methods: Uses normalizing flows (NFs) for OOD detection; since naive base distributions mismatch the topology of complex target distributions, an expressive class-conditional base distribution is trained with an information-theoretic objective to match the required topology.
  • results: Achieves superior results on density-estimation and 2D object-detection benchmarks, with wide compatibility with existing learned models and minimal computational overhead, and demonstrates applicability in a real-robot deployment.
    Abstract To facilitate reliable deployments of autonomous robots in the real world, Out-of-Distribution (OOD) detection capabilities are often required. A powerful approach for OOD detection is based on density estimation with Normalizing Flows (NFs). However, we find that prior work with NFs attempts to match the complex target distribution topologically with naive base distributions leading to adverse implications. In this work, we circumvent this topological mismatch using an expressive class-conditional base distribution trained with an information-theoretic objective to match the required topology. The proposed method enjoys the merits of wide compatibility with existing learned models without any performance degradation and minimum computation overhead while enhancing OOD detection capabilities. We demonstrate superior results in density estimation and 2D object detection benchmarks in comparison with extensive baselines. Moreover, we showcase the applicability of the method with a real-robot deployment.
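
A class-conditional base can be as simple as a learnable Gaussian per class; the sketch below shows the shape of the component, while the paper additionally trains it with an information-theoretic objective to match the target topology.

```python
import torch
import torch.nn as nn

class ClassConditionalBase(nn.Module):
    """Learnable per-class diagonal Gaussian base density log p(z | y),
    replacing the single standard-normal base of a vanilla flow."""
    def __init__(self, n_classes: int, dim: int):
        super().__init__()
        self.means = nn.Parameter(torch.randn(n_classes, dim))
        self.log_stds = nn.Parameter(torch.zeros(n_classes, dim))

    def log_prob(self, z, y):
        dist = torch.distributions.Normal(self.means[y], self.log_stds[y].exp())
        return dist.log_prob(z).sum(-1)

# OOD score at test time: max over classes of the flow's exact likelihood,
# log p(z|y) + log|det J|, with low values flagging out-of-distribution inputs.
```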

Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance

  • paper_url: http://arxiv.org/abs/2311.06480
  • repo_url: https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound
  • paper_authors: June-Woo Kim, Chihyeon Yoon, Miika Toikkanen, Sangmin Bae, Ho-Young Jung
  • for: Improves respiratory sound classification, especially for minority classes, by addressing class imbalance.
  • methods: Uses an audio diffusion model as a conditional neural vocoder to augment imbalanced respiratory sound data, together with a simple adversarial fine-tuning method that aligns features between synthetic and real samples.
  • results: On the ICBHI dataset, the adversarial fine-tuning is effective where conventional augmentation alone degrades performance, outperforming the baseline by 2.24% on the ICBHI Score and improving minority-class accuracy by up to 26.58%.
    Abstract Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data like respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method to align features between the synthetic and real respiratory sound samples to improve respiratory sound classification performance. Our experimental results on the ICBHI dataset demonstrate that the proposed adversarial fine-tuning is effective, while only using the conventional augmentation method shows performance degradation. Moreover, our method outperforms the baseline by 2.24% on the ICBHI Score and improves the accuracy of the minority classes up to 26.58%. For the supplementary material, we provide the code at https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound.

Online Continual Learning via Logit Adjusted Softmax

  • paper_url: http://arxiv.org/abs/2311.06460
  • repo_url: https://github.com/k1nght/online_cl_logit_adjusted_softmax
  • paper_authors: Zhehao Huang, Tao Li, Chenhe Yuan, Yingwen Wu, Xiaolin Huang
  • for: Tackles online continual learning, where a model must learn from a non-stationary data stream while avoiding catastrophic forgetting and prediction bias toward recently learned classes.
  • methods: Shows theoretically that inter-class imbalance is entirely attributable to imbalanced class priors, so the function learned from intra-class intrinsic distributions is the Bayes-optimal classifier; a simple adjustment of model logits during training resists this prior bias.
  • results: Logit Adjusted Softmax mitigates inter-class imbalance in both class-incremental and realistic general setups with little extra computation, improving the best baseline by 4.6% on CIFAR10.
    Abstract Online continual learning is a challenging problem where models must learn from a non-stationary data stream while avoiding catastrophic forgetting. Inter-class imbalance during training has been identified as a major cause of forgetting, leading to model prediction bias towards recently learned classes. In this paper, we theoretically analyze that inter-class imbalance is entirely attributed to imbalanced class-priors, and the function learned from intra-class intrinsic distributions is the Bayes-optimal classifier. To that end, we present that a simple adjustment of model logits during training can effectively resist prior class bias and pursue the corresponding Bayes-optimum. Our proposed method, Logit Adjusted Softmax, can mitigate the impact of inter-class imbalance not only in class-incremental but also in realistic general setups, with little additional computational cost. We evaluate our approach on various benchmarks and demonstrate significant performance improvements compared to prior arts. For example, our approach improves the best baseline by 4.6% on CIFAR10.
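
The adjustment itself is a one-line change to the training loss: shift each logit by the log of its class prior, estimated online from the stream. A standard logit-adjustment sketch:

```python
import torch
import torch.nn.functional as F

def logit_adjusted_ce(logits, targets, class_counts, tau=1.0):
    """Cross-entropy on prior-shifted logits: adding tau * log(prior) during
    training makes the raw scores estimate the class-balanced (Bayes-optimal)
    classifier rather than one biased toward frequent or recent classes."""
    priors = class_counts.float() / class_counts.sum()
    return F.cross_entropy(logits + tau * torch.log(priors + 1e-12), targets)

# Online CL usage: keep running label counts over the stream, update them each
# batch, and score test examples with the unadjusted logits.
```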

Asymmetric Contrastive Multimodal Learning for Advancing Chemical Understanding

  • paper_url: http://arxiv.org/abs/2311.06456
  • repo_url: None
  • paper_authors: Hao Xu, Yifei Wang, Yunrui Li, Pengyu Hong
  • for: Proposes a new multimodal deep-learning approach to advance chemical research and applications.
  • methods: Uses asymmetric contrastive learning to transfer information from various chemical modalities into molecular-graph representations, combining pre-trained unimodal chemical encoders with a shallow graph encoder for efficient training.
  • results: On tasks such as isomer discrimination and uncovering chemical properties crucial for drug discovery, ACML improves the expressive power of graph neural networks and the interpretability of the learned representations.
    Abstract The versatility of multimodal deep learning holds tremendous promise for advancing scientific research and practical applications. As this field continues to evolve, the collective power of cross-modal analysis promises to drive transformative innovations, leading us to new frontiers in chemical understanding and discovery. Hence, we introduce Asymmetric Contrastive Multimodal Learning (ACML) as a novel approach tailored for molecules, showcasing its potential to advance the field of chemistry. ACML harnesses the power of effective asymmetric contrastive learning to seamlessly transfer information from various chemical modalities to molecular graph representations. By combining pre-trained chemical unimodal encoders and a shallow-designed graph encoder, ACML facilitates the assimilation of coordinated chemical semantics from different modalities, leading to comprehensive representation learning with efficient training. This innovative framework enhances the interpretability of learned representations and bolsters the expressive power of graph neural networks. Through practical tasks such as isomer discrimination and uncovering crucial chemical properties for drug discovery, ACML exhibits its capability to revolutionize chemical research and applications, providing a deeper understanding of chemical semantics of different modalities.

A Saliency-based Clustering Framework for Identifying Aberrant Predictions

  • paper_url: http://arxiv.org/abs/2311.06454
  • repo_url: None
  • paper_authors: Aina Tersol Montserrat, Alexander R. Loftus, Yael Daihes
  • for: Improves the reliability and trustworthiness of machine-learning classifiers in high-uncertainty biomedical applications.
  • methods: Introduces the notion of aberrant predictions and proposes a novel training methodology aimed at both reducing the misclassification rate and discerning aberrant predictions.
  • results: Achieves a 20% increase in precision, demonstrated in the less-explored domain of veterinary radiology.
    Abstract In machine learning, classification tasks serve as the cornerstone of a wide range of real-world applications. Reliable, trustworthy classification is particularly intricate in biomedical settings, where the ground truth is often inherently uncertain and relies on high degrees of human expertise for labeling. Traditional metrics such as precision and recall, while valuable, are insufficient for capturing the nuances of these ambiguous scenarios. Here we introduce the concept of aberrant predictions, emphasizing that the nature of classification errors is as critical as their frequency. We propose a novel, efficient training methodology aimed at both reducing the misclassification rate and discerning aberrant predictions. Our framework demonstrates a substantial improvement in model performance, achieving a 20\% increase in precision. We apply this methodology to the less-explored domain of veterinary radiology, where the stakes are high but have not been as extensively studied compared to human medicine. By focusing on the identification and mitigation of aberrant predictions, we enhance the utility and trustworthiness of machine learning classifiers in high-stakes, real-world scenarios, including new applications in the veterinary world.

Mitigating Pooling Bias in E-commerce Search via False Negative Estimation

  • paper_url: http://arxiv.org/abs/2311.06444
  • repo_url: None
  • paper_authors: Xiaochen Wang, Xiao Xiao, Ruhan Zhang, Xuan Zhang, Taesik Na, Tejaswi Tenneti, Haixun Wang, Fenglong Ma
  • for: Accurate and efficient product-relevance assessment is critical to user experience and business success in e-commerce search.
  • methods: Proposes Bias-mitigating Hard Negative Sampling (BHNS), a negative-sampling strategy that identifies and adjusts for false negatives, building on a False Negative Estimation algorithm, to reduce pooling bias.
  • results: Experiments in the Instacart search setting confirm BHNS is effective for practical e-commerce use; comparative analyses on public datasets show domain-agnostic potential for diverse applications.
    Abstract Efficient and accurate product relevance assessment is critical for user experiences and business success. Training a proficient relevance assessment model requires high-quality query-product pairs, often obtained through negative sampling strategies. Unfortunately, current methods introduce pooling bias by mistakenly sampling false negatives, diminishing performance and business impact. To address this, we present Bias-mitigating Hard Negative Sampling (BHNS), a novel negative sampling strategy tailored to identify and adjust for false negatives, building upon our original False Negative Estimation algorithm. Our experiments in the Instacart search setting confirm BHNS as effective for practical e-commerce use. Furthermore, comparative analyses on public dataset showcase its domain-agnostic potential for diverse applications.
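
One plausible shape for such a sampler (not the paper's exact estimator): prefer hard negatives, but discount each candidate by its estimated probability of being a false negative, inferred from how close it scores to the known positive.

```python
import numpy as np

def bias_mitigated_hard_negatives(scores, pos_score, k, temp=1.0, rng=None):
    """scores: relevance scores of candidate negatives for one query. Hard
    candidates get higher weight, but those scoring near or above the
    positive are likely false negatives and are down-weighted."""
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    hardness = np.exp(scores / temp)                 # prefer hard negatives
    p_false_neg = 1.0 / (1.0 + np.exp(-(scores - pos_score) / temp))
    weights = hardness * (1.0 - p_false_neg)         # discount false negatives
    weights /= weights.sum()
    return rng.choice(len(scores), size=k, replace=False, p=weights)
```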