2023-10-21

cs.AI

cs.AI - 2023-10-21

Ask To The Point: Open-Domain Entity-Centric Question Generation

paper_url: http://arxiv.org/abs/2310.14126
repo_url: https://github.com/liuyuxiang512/ecqg
paper_authors: Yuxiang Liu, Jie Huang, Kevin Chen-Chuan Chang
for: 实现话题具体学习、助教读物和 факт检查等应用场景，提出了一个新任务：实体中心问题生成（ECQG）。
methods: 提出了一种具有一致性的 PLM 基础架构 GenCONE，包括两个新模块：内容强调模块和问题验证模块。
results: 经过广泛的实验，GenCONE 能够具有显著和稳定的性能优势，并且两个模块具有辅之usage和补做作用。

Abstract
We introduce a new task called *entity-centric question generation* (ECQG), motivated by real-world applications such as topic-specific learning, assisted reading, and fact-checking. The task aims to generate questions from an entity perspective. To solve ECQG, we propose a coherent PLM-based framework GenCONE with two novel modules: content focusing and question verification. The content focusing module first identifies a focus as "what to ask" to form draft questions, and the question verification module refines the questions afterwards by verifying the answerability. We also construct a large-scale open-domain dataset from SQuAD to support this task. Our extensive experiments demonstrate that GenCONE significantly and consistently outperforms various baselines, and two modules are effective and complementary in generating high-quality questions.

摘要
我们介绍了一个新任务called *实体中心问题生成* (ECQG), 该任务受到实际应用场景的启发，如专题学习、辅助阅读和事实核实。该任务的目标是从实体角度生成问题。为解决ECQG，我们提议了一个协调PLM-based框架GenCONE，该框架包括两个新模块：内容专注和问题验证。内容专注模块首先确定了“要问什么”的焦点，以生成签证问题，而问题验证模块则在验证答案可否回答。我们还构建了一个大规模的开放领域数据集，来支持这个任务。我们的广泛实验表明，GenCONE在various baselines的比较中具有显著且一致的优势，两个模块也是生成高质量问题的有效和补充性的。

Sentiment Analysis Across Multiple African Languages: A Current Benchmark

paper_url: http://arxiv.org/abs/2310.14120
repo_url: None
paper_authors: Saurav K. Aryal, Howard Prioleau, Surakshya Aryal
for: 这项研究的目的是为了提高非洲语言 sentiment 分析的研究，并评估当前的 transformer 模型在非洲语言上的性能。
methods: 研究使用了 AfriSenti-SemEval Shared Task 12 上的注释 sentiment analysis 数据，并对当前状态的 transformer 模型进行了比较和评估。
results: 研究发现，即使在低资源情况下，更多的数据仍然可以生成更好的模型，并且模型专门为非洲语言开发的模型在所有任务上都表现出色。此外，没有一个 universal 模型能够适用于所有语言的评估。

Abstract
Sentiment analysis is a fundamental and valuable task in NLP. However, due to limitations in data and technological availability, research into sentiment analysis of African languages has been fragmented and lacking. With the recent release of the AfriSenti-SemEval Shared Task 12, hosted as a part of The 17th International Workshop on Semantic Evaluation, an annotated sentiment analysis of 14 African languages was made available. We benchmarked and compared current state-of-art transformer models across 12 languages and compared the performance of training one-model-per-language versus single-model-all-languages. We also evaluated the performance of standard multilingual models and their ability to learn and transfer cross-lingual representation from non-African to African languages. Our results show that despite work in low resource modeling, more data still produces better models on a per-language basis. Models explicitly developed for African languages outperform other models on all tasks. Additionally, no one-model-fits-all solution exists for a per-language evaluation of the models evaluated. Moreover, for some languages with a smaller sample size, a larger multilingual model may perform better than a dedicated per-language model for sentiment classification.

摘要
《叙述分析是NLP领域的基础和重要任务。然而，由于数据和技术限制，关于非洲语言的叙述分析研究受到了限制，Fragmented和缺乏。随着最近发布的AfriSenti-SemEval Shared Task 12，14种非洲语言的叙述分析标注数据被提供。我们对当前状态的转换器模型进行了比较和比较，并 evaluate了单语言模型 versus 所有语言模型的训练。我们还评估了标准多语言模型的能力以及其在非洲语言上学习和传递cross-语言表示的能力。我们的结果显示，尽管在低资源模型方面做了很多工作，但更多的数据仍然可以生成更好的模型。专门为非洲语言开发的模型在所有任务上都高于其他模型。此外，没有一个“一模型 fits all”的解决方案，每种语言的评估中的模型都不同。此外，对于一些语言的小样本大小，大型多语言模型可能会在叙述分类任务上表现更好于专门为该语言开发的模型。

CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement

paper_url: http://arxiv.org/abs/2310.14108
repo_url: None
paper_authors: Mohammadreza Salehi, Mehrdad Farajtabar, Maxwell Horton, Fartash Faghri, Hadi Pouransari, Raviteja Vemulapalli, Oncel Tuzel, Ali Farhadi, Mohammad Rastegari, Sachin Mehta
for: 这个论文的目的是提高CLIP模型的视觉表示能力。
methods: 这个论文使用了开源的任务特定视觉模型生成 pseudo-labels，并在这些 pseudo-labels 基础上训练 CLIP 模型。
results: 这个方法可以提高 CLIP 模型在不同视觉任务中的表现，包括 segmentation、检测、深度估计和表面法线估计，最多提高16.3%。这些提高不会减少 CLIP 模型的现有能力，包括其在适应性零分类中的护卫。

Abstract
Contrastive language image pretraining (CLIP) is a standard method for training vision-language models. While CLIP is scalable, promptable, and robust to distribution shifts on image classification tasks, it lacks object localization capabilities. This paper studies the following question: Can we augment CLIP training with task-specific vision models from model zoos to improve its visual representations? Towards this end, we leverage open-source task-specific vision models to generate pseudo-labels for an uncurated and noisy image-text dataset. Subsequently, we train CLIP models on these pseudo-labels in addition to the contrastive training on image and text pairs. This simple setup shows substantial improvements of up to 16.3% across different vision tasks, including segmentation, detection, depth estimation, and surface normal estimation. Importantly, these enhancements are achieved without compromising CLIP's existing capabilities, including its proficiency in promptable zero-shot classification.

摘要
对比语言图像预训练（CLIP）是一种标准的视觉语言模型训练方法。CLIP可以扩展，可以提示，并具有对分布变化的鲁棒性，但它缺乏对象定位功能。这篇论文研究以下问题：可以通过将任务特定的视觉模型添加到CLIP训练中来改善其视觉表示？为此，我们利用开源的任务特定视觉模型生成pseudo-标签，并在这些pseudo-标签的基础上训练CLIP模型。这个简单的设置可以提高CLIP模型在不同视觉任务上的表现，最高提高达16.3%。这些改进不仅不会减少CLIP的现有能力，包括它的能力在零shot提示下的报表能力。

On the Transferability of Visually Grounded PCFGs

paper_url: http://arxiv.org/abs/2310.14107
repo_url: https://github.com/zhaoyanpeng/cpcfg
paper_authors: Yanpeng Zhao, Ivan Titov
for: 本研究旨在评估视觉基础 grammar 生成器在不同文本领域中的可迁移性。
methods: 我们extend了VC-PCFG模型，使其能够在不同文本领域中进行迁移学习。我们采用了零shot转移学习 Setting，即在源领域训练模型，然后直接应用到目标领域。
results: 我们的实验结果表明，视觉基础对文本的改进效果在相似领域中传递，但在远程领域中失效。我们还进行了数据和结果分析，发现 lexicon overlap между源领域和目标领域是迁移性的最重要因素。

Abstract
There has been a significant surge of interest in visually grounded grammar induction in recent times. While a variety of models have been developed for the task and have demonstrated impressive performance, they have not been evaluated on text domains that are different from the training domain, so it is unclear if the improvements brought by visual groundings are transferable. Our study aims to fill this gap and assess the degree of transferability. We start by extending VC-PCFG (short for Visually-grounded Compound PCFG~\citep{zhao-titov-2020-visually}) in such a way that it can transfer across text domains. We consider a zero-shot transfer learning setting where a model is trained on the source domain and is directly applied to target domains, without any further training. Our experimental results suggest that: the benefits from using visual groundings transfer to text in a domain similar to the training domain but fail to transfer to remote domains. Further, we conduct data and result analysis; we find that the lexicon overlap between the source domain and the target domain is the most important factor in the transferability of VC-PCFG.

摘要
Recently, there has been a significant increase in interest in visually grounded grammar induction. While various models have been developed for this task and have shown impressive performance, their transferability to different text domains has not been evaluated. Our study aims to fill this gap and assess the degree of transferability. We start by extending VC-PCFG (short for Visually-grounded Compound PCFG) to enable transfer across text domains. We use a zero-shot transfer learning setting where a model is trained on the source domain and is directly applied to target domains without further training. Our experimental results show that: the benefits of using visual groundings transfer to text in a domain similar to the training domain but fail to transfer to remote domains. Additionally, we conduct data and result analysis and find that the lexicon overlap between the source domain and the target domain is the most important factor in the transferability of VC-PCFG.

Revisiting Instruction Fine-tuned Model Evaluation to Guide Industrial Applications

paper_url: http://arxiv.org/abs/2310.14103
repo_url: https://github.com/manuelfay/ifteval
paper_authors: Manuel Faysse, Gautier Viaud, Céline Hudelot, Pierre Colombo
for: 这 paper 是 investigating task-specialization strategies for IFT model deployment in practical industrial settings.
methods: 该 paper 使用 LLM-based metrics to evaluate the performance of IFT models.
results: 该 paper 提供了实际 industrial setting 中 IFT 模型的部署中的贸易offs, offering practitioners actionable insights for real-world IFT model deployment.

Abstract
Instruction Fine-Tuning (IFT) is a powerful paradigm that strengthens the zero-shot capabilities of Large Language Models (LLMs), but in doing so induces new evaluation metric requirements. We show LLM-based metrics to be well adapted to these requirements, and leverage them to conduct an investigation of task-specialization strategies, quantifying the trade-offs that emerge in practical industrial settings. Our findings offer practitioners actionable insights for real-world IFT model deployment.

摘要
instruction 细调 (IFT) 是一种强大的思想方法，可以增强大型语言模型 (LLM) 的零shot 能力，但是在这之前需要新的评估指标。我们表明 LLM 基于的指标适应这些要求，并利用它们来研究实际工业场景中的任务特化策略，量化在实际应用中出现的交易offs。我们的发现可以为实际 IFT 模型部署提供实践性的指导意见。Here's a word-for-word translation of the text into Simplified Chinese: instruction 细调（IFT）是一种强大的思想方法，可以增强大型语言模型（LLM）的零shot能力，但是在这之前需要新的评估指标。我们表明 LLM基于的指标适应这些要求，并利用它们来研究实际工业场景中的任务特化策略，量化在实际应用中出现的交易offs。我们的发现可以为实际 IFT 模型部署提供实践性的指导意见。

Stabilizing reinforcement learning control: A modular framework for optimizing over all stable behavior

paper_url: http://arxiv.org/abs/2310.14098
repo_url: None
paper_authors: Nathan P. Lawrence, Philip D. Loewen, Shuyuan Wang, Michael G. Forbes, R. Bhushan Gopaluni
for: 本文提出了一个权重控制器设计框架，该框架结合深度学习的优化和模型自由，同时保证稳定性。
methods: 本文使用了Youla-Kucera参数化定义搜索领域，并使用行为系统构建了数据驱动内部模型。在噪声存在的情况下，对数据驱动模型的稳定性进行了分析。
results: 本文通过matrix factorization方法给出了所有稳定线性运算符的集合，并使用神经网络表示参数化的稳定运算符集合，实现了与标准深度学习库的无缝集成。最后，本文还展示了如何应用这些想法来调整固定结构控制器。

Abstract
We propose a framework for the design of feedback controllers that combines the optimization-driven and model-free advantages of deep reinforcement learning with the stability guarantees provided by using the Youla-Kucera parameterization to define the search domain. Recent advances in behavioral systems allow us to construct a data-driven internal model; this enables an alternative realization of the Youla-Kucera parameterization based entirely on input-output exploration data. Perhaps of independent interest, we formulate and analyze the stability of such data-driven models in the presence of noise. The Youla-Kucera approach requires a stable "parameter" for controller design. For the training of reinforcement learning agents, the set of all stable linear operators is given explicitly through a matrix factorization approach. Moreover, a nonlinear extension is given using a neural network to express a parameterized set of stable operators, which enables seamless integration with standard deep learning libraries. Finally, we show how these ideas can also be applied to tune fixed-structure controllers.

摘要
我们提出一个框架，将深度学习的优化优势和模型自由的优势结合起来，同时保证稳定性通过用Youla-Kucera参数化来定义搜索空间。现有的行为系统技术使我们可以构建数据驱动的内部模型，这使得我们可以基于输入输出探索数据来实现Youla-Kucera参数化的alternative实现。此外，我们还研究了这些数据驱动模型在噪声存在时的稳定性。Youla-Kucera方法需要一个稳定的参数来设计控制器。在训练深度学习代理人时，所有稳定的线性运算的集合可以通过矩阵分解方法得到Explicitly。此外，我们还给出了一种使用神经网络表示参数化集合的稳定运算的非线性扩展，这使得我们可以轻松地与标准深度学习库集成。最后，我们示出了如何应用这些想法来调整固定结构的控制器。

Learning Reward for Physical Skills using Large Language Model

paper_url: http://arxiv.org/abs/2310.14092
repo_url: None
paper_authors: Yuwei Zeng, Yiqing Xu
for: 学习物理技能的奖励函数是一个挑战，因为这些任务的谱度非常广泛，状态和动作空间的维度很高，以及复杂的感知反馈。获取专家示范数据是成本高昂和时间费时的。
methods: 我们使用大型自然语言模型（LLM）提取任务相关知识，以提出有效的奖励函数。我们的方法包括两个组成部分：首先，使用 LLM 提出特征和参数化的奖励函数。然后，我们通过迭代自适应过程来更新这些提出的奖励函数的参数，以最小化与 LLM 的排名不一致性。
results: 我们在三个模拟的物理技能学习任务上进行了测试，证明了我们的设计选择的有效性。

Abstract
Learning reward functions for physical skills are challenging due to the vast spectrum of skills, the high-dimensionality of state and action space, and nuanced sensory feedback. The complexity of these tasks makes acquiring expert demonstration data both costly and time-consuming. Large Language Models (LLMs) contain valuable task-related knowledge that can aid in learning these reward functions. However, the direct application of LLMs for proposing reward functions has its limitations such as numerical instability and inability to incorporate the environment feedback. We aim to extract task knowledge from LLMs using environment feedback to create efficient reward functions for physical skills. Our approach consists of two components. We first use the LLM to propose features and parameterization of the reward function. Next, we update the parameters of this proposed reward function through an iterative self-alignment process. In particular, this process minimizes the ranking inconsistency between the LLM and our learned reward functions based on the new observations. We validated our method by testing it on three simulated physical skill learning tasks, demonstrating effective support for our design choices.

摘要

Use LLM to propose features and parameterization of the reward function.2. Update the proposed reward function parameters through an iterative self-alignment process that minimizes the ranking inconsistency between the LLM and our learned reward functions based on new observations.We validated our method by testing it on three simulated physical skill learning tasks, demonstrating effective support for our design choices.

To Copy, or not to Copy; That is a Critical Issue of the Output Softmax Layer in Neural Sequential Recommenders

paper_url: http://arxiv.org/abs/2310.14079
repo_url: https://github.com/iesl/softmax_cpr_recommend
paper_authors: Haw-Shiuan Chang, Nikhil Agarwal, Andrew McCallum
for: 强化Sequential Recommendation任务中复现项目的处理能力
methods: 采用recently-proposed softmax alternatives如softmax-CPR，对输出softmax层进行修改，解决单个隐藏状态嵌入和静态项嵌入的问题
results: 在12个数据集上提供了一致性的改进，对GRU4Rec模型进行修改后，在5个数据集中具有重复项的NDCG@10提高10%（4%-17%），在7个数据集中无重复项的NDCG@10提高24%（8%-39%）。

Abstract
Recent studies suggest that the existing neural models have difficulty handling repeated items in sequential recommendation tasks. However, our understanding of this difficulty is still limited. In this study, we substantially advance this field by identifying a major source of the problem: the single hidden state embedding and static item embeddings in the output softmax layer. Specifically, the similarity structure of the global item embeddings in the softmax layer sometimes forces the single hidden state embedding to be close to new items when copying is a better choice, while sometimes forcing the hidden state to be close to the items from the input inappropriately. To alleviate the problem, we adapt the recently-proposed softmax alternatives such as softmax-CPR to sequential recommendation tasks and demonstrate that the new softmax architectures unleash the capability of the neural encoder on learning when to copy and when to exclude the items from the input sequence. By only making some simple modifications on the output softmax layer for SASRec and GRU4Rec, softmax-CPR achieves consistent improvement in 12 datasets. With almost the same model size, our best method not only improves the average NDCG@10 of GRU4Rec in 5 datasets with duplicated items by 10% (4%-17% individually) but also improves 7 datasets without duplicated items by 24% (8%-39%)!

摘要

Convolutional Bidirectional Variational Autoencoder for Image Domain Translation of Dotted Arabic Expiration

paper_url: http://arxiv.org/abs/2310.14069
repo_url: None
paper_authors: Ahmed Zidane, Ghada Soliman
for: 这个论文提出了一种基于升降栈底部卷积 convolutional bidirectional variational autoencoder（LCBVAE）架构的Encoder和Decoder，用于将斜体阿拉伯数字日期翻译成填充了阿拉伯数字日期。
methods: 我们采用了自定义和适应CRNN模型，并将其训练在2019年至2027年的填充图像上，以提取日期和评估LCBVAE模型在日期识别方面的性能。
results: 我们发现，在LCBVAE架构中添加缓冲层可以提高总体化的性能，并在下游传播学习任务中实现了图像翻译的97%准确率。这种方法可以普适应任何下游学习任务，如图像翻译和重建。

Abstract
THIS paper proposes an approach of Ladder Bottom-up Convolutional Bidirectional Variational Autoencoder (LCBVAE) architecture for the encoder and decoder, which is trained on the image translation of the dotted Arabic expiration dates by reconstructing the Arabic dotted expiration dates into filled-in expiration dates. We employed a customized and adapted version of Convolutional Recurrent Neural Network CRNN model to meet our specific requirements and enhance its performance in our context, and then trained the custom CRNN model with the filled-in images from the year of 2019 to 2027 to extract the expiration dates and assess the model performance of LCBVAE on the expiration date recognition. The pipeline of (LCBVAE+CRNN) can be then integrated into an automated sorting systems for extracting the expiry dates and sorting the products accordingly during the manufacture stage. Additionally, it can overcome the manual entry of expiration dates that can be time-consuming and inefficient at the merchants. Due to the lack of the availability of the dotted Arabic expiration date images, we created an Arabic dot-matrix True Type Font (TTF) for the generation of the synthetic images. We trained the model with unrealistic synthetic dates of 59902 images and performed the testing on a realistic synthetic date of 3287 images from the year of 2019 to 2027, represented as yyyy/mm/dd. In our study, we demonstrated the significance of latent bottleneck layer with improving the generalization when the size is increased up to 1024 in downstream transfer learning tasks as for image translation. The proposed approach achieved an accuracy of 97% on the image translation with using the LCBVAE architecture that can be generalized for any downstream learning tasks as for image translation and reconstruction.

摘要
本文提出了一种垂直卷积减采样变换自动编码器（LCBVAE）架构，用于编码器和解码器，用于图像翻译 arabic 黑点日期。我们采用了自定义和适应版本的卷积循环神经网络（CRNN）模型，以满足我们的特定需求，并在我们的上下文中进行了性能改进。然后，我们在2019年到2027年的 filled-in 图像上训练了自定义 CRNN 模型，以提取日期和评估 LCBVAE 模型在日期识别方面的性能。该管道可以在生产阶段 integrating 到自动化分类系统中，以提取日期并根据日期进行产品的分类。此外，它可以超越手动输入日期，因为这可能是时间consuming 和不可靠的。由于lack arabic 黑点日期图像的可用性，我们创建了一个 arabic 黑点矩阵 True Type Font（TTF），用于生成 synthetic 图像。我们在59902 个不实际的日期图像上训练了模型，并在2019年到2027年的 realistic synthetic 日期上进行测试，表示为 yyyy/mm/dd。在我们的研究中，我们发现了隐藏瓶颈层可以提高通用性，并且当隐藏瓶颈层的大小增加到 1024 时，在下游转移学习任务中可以提高通用性。我们的方法达到了 97% 的准确率在图像翻译任务中，并且可以普适应任何下游学习任务。

MOELoRA: An MOE-based Parameter Efficient Fine-Tuning Method for Multi-task Medical Applications

paper_url: http://arxiv.org/abs/2310.18339
repo_url: https://github.com/liuqidong07/moelora-peft
paper_authors: Qidong Liu, Xian Wu, Xiangyu Zhao, Yuanshao Zhu, Derong Xu, Feng Tian, Yefeng Zheng
for: 这个研究旨在为大语言模型（LLMs）在医疗系统中进行微调，以应对实际医疗任务中的多种任务。
methods: 我们提出了一个叫做MOELoRA的参数效率微调框架，利用MOE的多任务学习和LoRA的参数效率微调。我们将专家分为两个低矩阵对，以保持小数目的参数。此外，我们提出了一个任务驱动的闸函数，可以调节各专家的贡献和生成不同任务的特有参数。
results: 我们在公开的多任务中文医疗数据集上进行了广泛的实验，结果显示MOELoRA可以超越现有的参数效率微调方法。

Abstract
The recent surge in the field of Large Language Models (LLMs) has gained significant attention in numerous domains. In order to tailor an LLM to a specific domain such as a web-based healthcare system, fine-tuning with domain knowledge is necessary. However, two issues arise during fine-tuning LLMs for medical applications. The first is the problem of task variety, where there are numerous distinct tasks in real-world medical scenarios. This diversity often results in suboptimal fine-tuning due to data imbalance and seesawing problems. Additionally, the high cost of fine-tuning can be prohibitive, impeding the application of LLMs. The large number of parameters in LLMs results in enormous time and computational consumption during fine-tuning, which is difficult to justify. To address these two issues simultaneously, we propose a novel parameter-efficient fine-tuning framework for multi-task medical applications called MOELoRA. The framework aims to capitalize on the benefits of both MOE for multi-task learning and LoRA for parameter-efficient fine-tuning. To unify MOE and LoRA, we devise multiple experts as the trainable parameters, where each expert consists of a pair of low-rank matrices to maintain a small number of trainable parameters. Additionally, we propose a task-motivated gate function for all MOELoRA layers that can regulate the contributions of each expert and generate distinct parameters for various tasks. To validate the effectiveness and practicality of the proposed method, we conducted comprehensive experiments on a public multi-task Chinese medical dataset. The experimental results demonstrate that MOELoRA outperforms existing parameter-efficient fine-tuning methods. The implementation is available online for convenient reproduction of our experiments.

摘要
Recently, there has been a surge of interest in Large Language Models (LLMs) in various domains. To adapt an LLM to a specific domain like a web-based healthcare system, fine-tuning with domain knowledge is essential. However, there are two challenges during fine-tuning LLMs for medical applications. First, there are many diverse tasks in real-world medical scenarios, leading to suboptimal fine-tuning due to data imbalance and seesawing problems. Second, the high cost of fine-tuning can be prohibitive, making it difficult to apply LLMs. The large number of parameters in LLMs results in significant time and computational consumption during fine-tuning, which is difficult to justify. To address these two issues simultaneously, we propose a novel parameter-efficient fine-tuning framework for multi-task medical applications called MOELoRA. The framework combines the benefits of MOE for multi-task learning and LoRA for parameter-efficient fine-tuning.In our framework, we use multiple experts as the trainable parameters, where each expert consists of a pair of low-rank matrices to maintain a small number of trainable parameters. Additionally, we propose a task-motivated gate function for all MOELoRA layers that can regulate the contributions of each expert and generate distinct parameters for various tasks. To validate the effectiveness and practicality of the proposed method, we conducted comprehensive experiments on a public multi-task Chinese medical dataset. The experimental results show that MOELoRA outperforms existing parameter-efficient fine-tuning methods. The implementation is available online for convenient reproduction of our experiments.

On the Neural Tangent Kernel of Equilibrium Models

paper_url: http://arxiv.org/abs/2310.14062
repo_url: None
paper_authors: Zhili Feng, J. Zico Kolter
for: 这个研究探讨了深度平衡（DEQ）模型的神经 Tangent kernel（NTK）。
methods: 研究使用了 root-finding 方法来有效地找到 DEQ 模型的 deterministic NTK。
results: 研究发现，尽管 Fully-connected 神经网络的 NTK 在宽度和深度都在无限大时可能是随机的，但 DEQ 模型在某些条件下仍然具有 deterministic NTK，并且可以通过 root-finding 方法有效地找到它。

Abstract
This work studies the neural tangent kernel (NTK) of the deep equilibrium (DEQ) model, a practical ``infinite-depth'' architecture which directly computes the infinite-depth limit of a weight-tied network via root-finding. Even though the NTK of a fully-connected neural network can be stochastic if its width and depth both tend to infinity simultaneously, we show that contrarily a DEQ model still enjoys a deterministic NTK despite its width and depth going to infinity at the same time under mild conditions. Moreover, this deterministic NTK can be found efficiently via root-finding.

摘要

Composer Style-specific Symbolic Music Generation Using Vector Quantized Discrete Diffusion Models

paper_url: http://arxiv.org/abs/2310.14044
repo_url: None
paper_authors: Jincheng Zhang, Jingjing Tang, Charalampos Saitis, György Fazekas
for: 这篇论文旨在应用vector quantized variational autoencoder（VQ-VAE）和数组diffusion模型，实现符合作曲者风格的象数音乐生成。
methods: 本文使用VQ-VAE将象数音乐转换为一系列的index，然后使用数组diffusion模型来模拟VQ-VAE的数组几何空间。
results: 实验结果显示，使用本文提出的方法可以实现符合作曲者风格的象数音乐生成，精度为72.36%。

Abstract
Emerging Denoising Diffusion Probabilistic Models (DDPM) have become increasingly utilised because of promising results they have achieved in diverse generative tasks with continuous data, such as image and sound synthesis. Nonetheless, the success of diffusion models has not been fully extended to discrete symbolic music. We propose to combine a vector quantized variational autoencoder (VQ-VAE) and discrete diffusion models for the generation of symbolic music with desired composer styles. The trained VQ-VAE can represent symbolic music as a sequence of indexes that correspond to specific entries in a learned codebook. Subsequently, a discrete diffusion model is used to model the VQ-VAE's discrete latent space. The diffusion model is trained to generate intermediate music sequences consisting of codebook indexes, which are then decoded to symbolic music using the VQ-VAE's decoder. The results demonstrate our model can generate symbolic music with target composer styles that meet the given conditions with a high accuracy of 72.36%.

摘要
emerging Denoising Diffusion Probabilistic Models (DDPM) 已经越来越受到使用，因为它在不同的生成任务中的连续数据上表现出色，如图像和音频生成。然而， diffusion 模型尚未被完全扩展到字符串符号音乐。我们提议将量化变换自动编码器（VQ-VAE）和字符串扩散模型结合使用，以生成符号音乐 WITH 愿景作曲风格。训练过的 VQ-VAE 可以将符号音乐表示为一个序列的索引，这些索引与一个学习的码库中的特定条目相对应。然后，一个字符串扩散模型将 VQ-VAE 的字符串潜在空间模型化。扩散模型会生成符号音乐序列中的代码库索引，这些索引然后通过 VQ-VAE 的解码器转换为符号音乐。结果表明，我们的模型可以生成符号音乐 WITH 目标作曲风格，并且准确率为 72.36%。

Fast Diffusion GAN Model for Symbolic Music Generation Controlled by Emotions

paper_url: http://arxiv.org/abs/2310.14040
repo_url: None
paper_authors: Jincheng Zhang, György Fazekas, Charalampos Saitis
for: 本研究旨在用扩散模型和生成对抗网络（GAN）控制生成的符号音乐，以实现 targets 的情感控制。
methods: 我们首先使用已经训练过的变量自动编码器获取符号音乐数据集的嵌入，然后使用这些嵌入来训练扩散模型。
results: 我们的模型成功控制了生成的符号音乐，并且在计算成本方面具有显著改善，只需要四个时间步骤来减噪，而现有的扩散模型对符号音乐生成的计算成本是千万次。

Abstract
Diffusion models have shown promising results for a wide range of generative tasks with continuous data, such as image and audio synthesis. However, little progress has been made on using diffusion models to generate discrete symbolic music because this new class of generative models are not well suited for discrete data while its iterative sampling process is computationally expensive. In this work, we propose a diffusion model combined with a Generative Adversarial Network, aiming to (i) alleviate one of the remaining challenges in algorithmic music generation which is the control of generation towards a target emotion, and (ii) mitigate the slow sampling drawback of diffusion models applied to symbolic music generation. We first used a trained Variational Autoencoder to obtain embeddings of a symbolic music dataset with emotion labels and then used those to train a diffusion model. Our results demonstrate the successful control of our diffusion model to generate symbolic music with a desired emotion. Our model achieves several orders of magnitude improvement in computational cost, requiring merely four time steps to denoise while the steps required by current state-of-the-art diffusion models for symbolic music generation is in the order of thousands.

摘要
Diffusion models have shown promising results for a wide range of generative tasks with continuous data, such as image and audio synthesis. However, little progress has been made on using diffusion models to generate discrete symbolic music because this new class of generative models are not well suited for discrete data while its iterative sampling process is computationally expensive. In this work, we propose a diffusion model combined with a Generative Adversarial Network, aiming to (i) alleviate one of the remaining challenges in algorithmic music generation, which is the control of generation towards a target emotion, and (ii) mitigate the slow sampling drawback of diffusion models applied to symbolic music generation. We first used a trained Variational Autoencoder to obtain embeddings of a symbolic music dataset with emotion labels and then used those to train a diffusion model. Our results demonstrate the successful control of our diffusion model to generate symbolic music with a desired emotion. Our model achieves several orders of magnitude improvement in computational cost, requiring merely four time steps to denoise while the steps required by current state-of-the-art diffusion models for symbolic music generation is in the order of thousands.Here's the translation in Traditional Chinese:Diffusion models have shown promising results for a wide range of generative tasks with continuous data, such as image and audio synthesis. However, little progress has been made on using diffusion models to generate discrete symbolic music because this new class of generative models are not well suited for discrete data while its iterative sampling process is computationally expensive. In this work, we propose a diffusion model combined with a Generative Adversarial Network, aiming to (i) alleviate one of the remaining challenges in algorithmic music generation, which is the control of generation towards a target emotion, and (ii) mitigate the slow sampling drawback of diffusion models applied to symbolic music generation. We first used a trained Variational Autoencoder to obtain embeddings of a symbolic music dataset with emotion labels and then used those to train a diffusion model. Our results demonstrate the successful control of our diffusion model to generate symbolic music with a desired emotion. Our model achieves several orders of magnitude improvement in computational cost, requiring merely four time steps to denoise while the steps required by current state-of-the-art diffusion models for symbolic music generation is in the order of thousands.

Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning

paper_url: http://arxiv.org/abs/2310.18338
repo_url: https://github.com/lcs2-iiitd/daslam
paper_authors: Gurusha Juneja, Subhabrata Dutta, Soumen Chakrabarti, Sunny Manchanda, Tanmoy Chakraborty
for: 提高大语言模型（LLM）的链式思维能力，解决复杂多步骤的理性问题。
methods: 使用 decomposition generator 将复杂问题 decomposes 成需 fewer reasoning steps 的子问题，然后使用 solver 解决子问题。
results: 使用 DaSLaM 方法，可以使用 comparable 或更好的性能，与 orders-of-magnitude 更大的 GPT-4 相比。此外， DaSLaM 方法不受 solver 的scale 影响，可以使用 diverse 大小的 solver LM 获得显著的性能提升。

Abstract
Large Language Models (LLMs) prompted to generate chain-of-thought (CoT) exhibit impressive reasoning capabilities. Recent attempts at prompt decomposition toward solving complex, multi-step reasoning problems depend on the ability of the LLM to simultaneously decompose and solve the problem. A significant disadvantage is that foundational LLMs are typically not available for fine-tuning, making adaptation computationally prohibitive. We believe (and demonstrate) that problem decomposition and solution generation are distinct capabilites, better addressed in separate modules, than by one monolithic LLM. We introduce DaSLaM, which uses a decomposition generator to decompose complex problems into subproblems that require fewer reasoning steps. These subproblems are answered by a solver. We use a relatively small (13B parameters) LM as the decomposition generator, which we train using policy gradient optimization to interact with a solver LM (regarded as black-box) and guide it through subproblems, thereby rendering our method solver-agnostic. Evaluation on multiple different reasoning datasets reveal that with our method, a 175 billion parameter LM (text-davinci-003) can produce competitive or even better performance, compared to its orders-of-magnitude larger successor, GPT-4. Additionally, we show that DaSLaM is not limited by the solver's capabilities as a function of scale; e.g., solver LMs with diverse sizes give significant performance improvement with our solver-agnostic decomposition technique. Exhaustive ablation studies evince the superiority of our modular finetuning technique over exorbitantly large decomposer LLMs, based on prompting alone.

摘要
大型语言模型（LLM）因为被调动产生链式思维（CoT）而表现出卓越的推理能力。现在的尝试通过问题分解来解决复杂多步骤的问题，很大程度上取决于LLM的能力同时进行问题分解和解决。然而，基础LLM通常不可以进行微调，从而使得适应成本高昂。我们认为（并证明）问题分解和解决是不同的能力，更好地通过分类模组来处理，而不是单一的LLM。我们称之为DaSLaM，它使用问题分解生成器将复杂问题分解成需要 fewer 推理步骤的子问题。这些子问题由解决器回答。我们使用一个相对较小的（13B个参数）LM作为问题分解生成器，并使用政策倾斜优化训练它与解决器LM（被视为黑盒）互动，以导引它通过子问题，因此让我们的方法成为解决器无关的。我们在多个不同的推理数据集上进行评估，发现我们的方法可以让1750亿个参数的LM（text-davinci-003）生成竞争或更好的性能，相比之下它的规模增加了许多。此外，我们还证明了DaSLaM不受解决器的规模影响，例如，不同大小的解决器LM可以为我们的模块化微调技术带来很大的性能提升。我们进行了详细的剥夺研究，证明我们的模块化微调技术在Prompting alone下比极大的问题分解LLM更有优势。

Contrast Everything: A Hierarchical Contrastive Framework for Medical Time-Series

paper_url: http://arxiv.org/abs/2310.14017
repo_url: https://github.com/dl4mhealth/comet
paper_authors: Yihe Wang, Yu Han, Haishuai Wang, Xiang Zhang
for: 这个研究旨在提高医疗时间序列分析中的相似表现学习，以便更好地利用医疗时间序列中的资讯，减少专业人员的努力和时间投入。
methods: 本研究提出了一个创新的层次架构，叫做COMET，它在医疗时间序列中捕捉到四个可能的水平的数据一致性：观察、样本、实验和病人水平。通过开发多个水平的对照损失函数，我们可以学习有效的表现，并实现自动化的医疗时间序列分析。
results: 我们在具有10%和1%的标签数据分布的挑战性设置下实现了实验，并与六个基eline进行比较。结果显示，COMET在所有基eline中都表现出色，特别是在10%和1%的标签数据分布下的设置中。这些结果证明了我们的框架在医疗时间序列相似表现学习中的重要性。

Abstract
Contrastive representation learning is crucial in medical time series analysis as it alleviates dependency on labor-intensive, domain-specific, and scarce expert annotations. However, existing contrastive learning methods primarily focus on one single data level, which fails to fully exploit the intricate nature of medical time series. To address this issue, we present COMET, an innovative hierarchical framework that leverages data consistencies at all inherent levels in medical time series. Our meticulously designed model systematically captures data consistency from four potential levels: observation, sample, trial, and patient levels. By developing contrastive loss at multiple levels, we can learn effective representations that preserve comprehensive data consistency, maximizing information utilization in a self-supervised manner. We conduct experiments in the challenging patient-independent setting. We compare COMET against six baselines using three diverse datasets, which include ECG signals for myocardial infarction and EEG signals for Alzheimer's and Parkinson's diseases. The results demonstrate that COMET consistently outperforms all baselines, particularly in setup with 10% and 1% labeled data fractions across all datasets. These results underscore the significant impact of our framework in advancing contrastive representation learning techniques for medical time series. The source code is available at https://github.com/DL4mHealth/COMET.

摘要
医疗时序分析中，对比表示学学习是关键，因为它减轻了劳动密集、领域特定和珍贵专家标注的依赖。然而，现有的对比学习方法主要集中在单一数据层次，这会无法全面利用医疗时序的复杂性。为解决这个问题，我们提出了COMET，一种创新的层次结构框架，利用医疗时序中所有自然层次的数据一致性。我们 méticulously 设计的模型逐级捕捉数据一致性，从观察、样本、试验和患者四个级别进行对比学习。通过开发多级对比损失函数，我们可以学习有效的表示，保留医疗时序中完整的数据一致性，最大化自我监督的信息利用。我们在医疗时序中的挑战性 patrnt-independent 设置中进行实验，与六个基线比较。我们使用三种多样化的数据集，包括ECG信号和 Alzheimer's 和 Parkinson's 疾病的 EEG 信号。结果表明，COMET 在所有基线之上具有优异表现，特别是在10%和1%标注数据分布中的设置中。这些结果赋予COMET 在医疗时序对比表示学习技术的进步。COMET 的源代码可以在 GitHub 上获取：https://github.com/DL4mHealth/COMET。

One is More: Diverse Perspectives within a Single Network for Efficient DRL

paper_url: http://arxiv.org/abs/2310.14009
repo_url: None
paper_authors: Yiqin Tan, Ling Pan, Longbo Huang
for: 这 paper 是用于提高 deep reinforcement learning 的效率和稳定性的研究。
methods: 这 paper 使用了多个子网络（OMNet），每个子网络输出不同的结果，从而提高了学习效率和鲁棒性。
results: 通过在 MuJoCo 测试集上进行 comprehensive 评估， authors 发现 OMNet 能够很好地寻找效果和计算成本之间的平衡。

Abstract
Deep reinforcement learning has achieved remarkable performance in various domains by leveraging deep neural networks for approximating value functions and policies. However, using neural networks to approximate value functions or policy functions still faces challenges, including low sample efficiency and overfitting. In this paper, we introduce OMNet, a novel learning paradigm utilizing multiple subnetworks within a single network, offering diverse outputs efficiently. We provide a systematic pipeline, including initialization, training, and sampling with OMNet. OMNet can be easily applied to various deep reinforcement learning algorithms with minimal additional overhead. Through comprehensive evaluations conducted on MuJoCo benchmark, our findings highlight OMNet's ability to strike an effective balance between performance and computational cost.

摘要
深度强化学习已在多个领域取得了显著的成绩，通过使用深度神经网络来近似价值函数和策略函数。然而，使用神经网络来近似价值函数或策略函数仍然面临挑战，包括低样本效率和过拟合。在本文中，我们介绍了OMNet，一种新的学习模式，它在单个网络中嵌入多个子网络，以获得多种输出，高效地。我们提供了一个系统化的管道，包括初始化、训练和采样，以及OMNet可以轻松应用于多种深度强化学习算法，增加了最小的额外开销。通过对MuJoCo benchmark进行了全面的评估，我们的发现表明OMNet能够均衡性能和计算成本。

On Bilingual Lexicon Induction with Large Language Models

paper_url: http://arxiv.org/abs/2310.13995
repo_url: https://github.com/cambridgeltl/prompt4bli
paper_authors: Yaoyiran Li, Anna Korhonen, Ivan Vulić
for: 本研究旨在探讨使用最新一代大语言模型（LLMs）来开发双语词表。
methods: 我们采用了零批示推理、几批示推理和标准的BLI预训练方法来评估多语言模型（mLLMs）的应用性。
results: 我们的实验结果显示，使用几批示推理的 nearest neighbours 示例可以达到最佳性能，并创造了许多语言对的新纪录。

Abstract
Bilingual Lexicon Induction (BLI) is a core task in multilingual NLP that still, to a large extent, relies on calculating cross-lingual word representations. Inspired by the global paradigm shift in NLP towards Large Language Models (LLMs), we examine the potential of the latest generation of LLMs for the development of bilingual lexicons. We ask the following research question: Is it possible to prompt and fine-tune multilingual LLMs (mLLMs) for BLI, and how does this approach compare against and complement current BLI approaches? To this end, we systematically study 1) zero-shot prompting for unsupervised BLI and 2) few-shot in-context prompting with a set of seed translation pairs, both without any LLM fine-tuning, as well as 3) standard BLI-oriented fine-tuning of smaller LLMs. We experiment with 18 open-source text-to-text mLLMs of different sizes (from 0.3B to 13B parameters) on two standard BLI benchmarks covering a range of typologically diverse languages. Our work is the first to demonstrate strong BLI capabilities of text-to-text mLLMs. The results reveal that few-shot prompting with in-context examples from nearest neighbours achieves the best performance, establishing new state-of-the-art BLI scores for many language pairs. We also conduct a series of in-depth analyses and ablation studies, providing more insights on BLI with (m)LLMs, also along with their limitations.

摘要
global paradigm shift в NLP towards Large Language Models (LLMs) 我们从 LLMs 的最新一代获得了开发双语词汇的潜力。我们的研究问题是：可以将多ilingual LLMs (mLLMs) 用于双语词汇问题（BLI）的问题，并且如何与现有的 BLI 方法相比。为此，我们系统地研究了以下问题：1. 零shot 提示，不需要精度批评的 BLI。2. 几个 shot 的内容提示，使用一组seed translation pairs，而不需要精度批评。3. 标准的 BLI-oriented 精度批评。我们实验了 18 个开源的文本至文本 mLLMs ，它们的大小在 0.3B 到 13B 个参数之间，在两个标准的 BLI 库中进行了评估。我们的研究是首次证明了文本至文本 mLLMs 的强大 BLI 能力。结果显示，几个 shot 的内容提示可以取得最好的性能，建立了许多语言对的新的顶峰 BLI 分数。我们还进行了详细的分析和剔除研究，提供了更多关于 BLI 的关键。

Application of deep and reinforcement learning to boundary control problems

paper_url: http://arxiv.org/abs/2310.15191
repo_url: https://github.com/zenineasa/MasterThesis
paper_authors: Zenin Easa Panthakkalakath, Juraj Kardoš, Olaf Schenk
for: 本研究旨在使用深度学习和强化学习解决边控制问题。
methods: 我们采用迭代优化策略，使用空间神经网络构建初始猜测，并使用维度-时间神经网络学习迭代优化算法。
results: 我们的数据培育和测试表明，提议的方法可以与现有的解决方案相比，速度和准确性相当。在我们的初步结果中，网络实现成本比IPOPT低于51%的情况。

Abstract
The boundary control problem is a non-convex optimization and control problem in many scientific domains, including fluid mechanics, structural engineering, and heat transfer optimization. The aim is to find the optimal values for the domain boundaries such that the enclosed domain adhering to the governing equations attains the desired state values. Traditionally, non-linear optimization methods, such as the Interior-Point method (IPM), are used to solve such problems. This project explores the possibilities of using deep learning and reinforcement learning to solve boundary control problems. We adhere to the framework of iterative optimization strategies, employing a spatial neural network to construct well-informed initial guesses, and a spatio-temporal neural network learns the iterative optimization algorithm using policy gradients. Synthetic data, generated from the problems formulated in the literature, is used for training, testing and validation. The numerical experiments indicate that the proposed method can rival the speed and accuracy of existing solvers. In our preliminary results, the network attains costs lower than IPOPT, a state-of-the-art non-linear IPM, in 51\% cases. The overall number of floating point operations in the proposed method is similar to that of IPOPT. Additionally, the informed initial guess method and the learned momentum-like behaviour in the optimizer method are incorporated to avoid convergence to local minima.

摘要
“边界控制问题是科学领域中的一种非 convex 优化和控制问题，包括流体动力学、结构工程和热传输优化等。目标是找到包含 governing 方程的Domaint 的优化值，使涵盖的 Domaint 达到所需的状态值。传统上，非线性优化方法，如Interior-Point 方法（IPM），用于解决这些问题。”This project explores the use of deep learning and reinforcement learning to solve boundary control problems. We use an iterative optimization strategy, with a spatial neural network constructing well-informed initial guesses and a spatio-temporal neural network learning the iterative optimization algorithm using policy gradients. Synthetic data, generated from problems formulated in the literature, is used for training, testing, and validation. Our numerical experiments show that the proposed method can rival the speed and accuracy of existing solvers. In our preliminary results, the network attains costs lower than IPOPT, a state-of-the-art non-linear IPM, in 51% of cases. The overall number of floating point operations in the proposed method is similar to that of IPOPT. Additionally, we incorporate an informed initial guess method and a learned momentum-like behavior in the optimizer to avoid convergence to local minima.

Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs

paper_url: http://arxiv.org/abs/2310.13961
repo_url: https://github.com/ibm/ensemble-instruct
paper_authors: Young-Suk Lee, Md Arafat Sultan, Yousef El-Kurdi, Tahira Naseem Asim Munawar, Radu Florian, Salim Roukos, Ramón Fernandez Astudillo
for: 用于自动生成数据，提高对话机器人的强度，只需要小量的人工监督。
methods: 使用 Self-Instruct 和 Alpaca 等技术，训练小型语言模型（10B–40B参数），并使用 permissive licenses。
results: 提高自动生成数据的质量，对于小型语言模型和大型语言模型都有显著提高，并且小型 instruction-tuned LM 生成的输出更有用。

Abstract
Using in-context learning (ICL) for data generation, techniques such as Self-Instruct (Wang et al., 2023) or the follow-up Alpaca (Taori et al., 2023) can train strong conversational agents with only a small amount of human supervision. One limitation of these approaches is that they resort to very large language models (around 175B parameters) that are also proprietary and non-public. Here we explore the application of such techniques to language models that are much smaller (around 10B--40B parameters) and have permissive licenses. We find the Self-Instruct approach to be less effective at these sizes and propose new ICL methods that draw on two main ideas: (a) Categorization and simplification of the ICL templates to make prompt learning easier for the LM, and (b) Ensembling over multiple LM outputs to help select high-quality synthetic examples. Our algorithm leverages the 175 Self-Instruct seed tasks and employs separate pipelines for instructions that require an input and instructions that do not. Empirical investigations with different LMs show that: (1) Our proposed method yields higher-quality instruction tuning data than Self-Instruct, (2) It improves performances of both vanilla and instruction-tuned LMs by significant margins, and (3) Smaller instruction-tuned LMs generate more useful outputs than their larger un-tuned counterparts. Our codebase is available at https://github.com/IBM/ensemble-instruct.

摘要
Translation in Simplified Chinese:使用增Context学习（ICL）数据生成技术，如Self-Instruct（Wang et al., 2023）或Alpaca（Taori et al., 2023），可以帮助强化对话机器人，只需小量人工监督。这些方法的一个限制是它们需要非常大的语言模型（约175B参数），这些模型也是 propriety 和非公共的。在这里，我们探索将这些技术应用于更小的语言模型（约10B--40B参数），这些模型具有允许的许可证。我们发现Self-Instruct方法在这些大小下不太有效，我们提出了新的ICL方法，它们基于以下两个主要想法：（a）将ICL模板分类和简化，使Language Model（LM）更容易学习提示，和（b）将多个LM输出ensemble，以选择高质量的人工示例。我们的算法利用Self-Instruct的175个种子任务，并使用不同的LM pipeline，对于需要输入和不需要输入的指令而分开。我们的实验表明，我们的提posed方法可以生成更高质量的指令调整数据，并且可以提高vanilla LM和指令调整LM的性能，同时使小型指令调整LM生成更有用的输出。我们的代码库可以在https://github.com/IBM/ensemble-instruct中获取。

Towards dialogue based, computer aided software requirements elicitation

paper_url: http://arxiv.org/abs/2310.13953
repo_url: None
paper_authors: Vasiliy Seibert
for: 这篇论文是为了探讨如何从自然语言规范中提取模型的问题。
methods: 这篇论文提出了一种对话基于的计算机支持的软件需求分析方法，而不是先前的模型提取方法，它鼓励个性、创造力和真实的妥协。
results: 这篇论文通过简单的实验示例了这种方法的核心思想，并讨论了这种方法和现有方法的比较。同时，它也认为未来的自然语言处理和生成AI技术的进步可能会带来重要的进步。

Abstract
Several approaches have been presented, which aim to extract models from natural language specifications. These approaches have inherent weaknesses for they assume an initial problem understanding that is perfect, and they leave no room for feedback. Motivated by real-world collaboration settings between requirements engineers and customers, this paper proposes an interaction blueprint that aims for dialogue based, computer aided software requirements analysis. Compared to mere model extraction approaches, this interaction blueprint encourages individuality, creativity and genuine compromise. A simplistic Experiment was conducted to showcase the general idea. This paper discusses the experiment as well as the proposed interaction blueprint and argues, that advancements in natural language processing and generative AI might lead to significant progress in a foreseeable future. However, for that, there is a need to move away from a magical black box expectation and instead moving towards a dialogue based approach that recognizes the individuality that is an undeniable part of requirements engineering.

摘要
Translated into Simplified Chinese:有几种方法已经被提出来，以EXTRACT模型从自然语言规格。这些方法具有内在的弱点，因为它们假设问题理解是完美的，并且没有减 feedback 的机制。由实际世界的合作场景中的需求工程师和客户而受到 inspirited，这篇论文提出了对话基本的交互蓝图，用于计算机助成的软件需求分析。这种方法强调个性、创造力和真实的妥协，并在简单的实验中进行了示例。这篇论文讨论了实验和提出的交互蓝图，并 argued That advancements in自然语言处理和生成 AI 可能会在未来导致显著的进步，但我们需要停止对 "黑盒子" 的期望，而是转向对话基本的方法，认可需求工程的个性。

Approximate Implication for Probabilistic Graphical Models

paper_url: http://arxiv.org/abs/2310.13942
repo_url: None
paper_authors: Batya Kenig
for: 本研究证明了关于 Conditional Independence (CI) 在 Probabilistic Graphical Models (PGMs) 中的准确性问题。
methods: 本文使用了现有的系统推理方法，以及新的证明方法，来研究 CI 的准确性。
results: 本文证明了对于非导向图模型，无法提供任何保证，而对于导向图模型，使用 $d$-separation 算法可以提供准确的 CI。 Additionally, the paper establishes improved approximation guarantees for independence relations derived from marginal and saturated CIs.

Abstract
The graphical structure of Probabilistic Graphical Models (PGMs) represents the conditional independence (CI) relations that hold in the modeled distribution. Every separator in the graph represents a conditional independence relation in the distribution, making them the vehicle through which new conditional independencies are inferred and verified. The notion of separation in graphs depends on whether the graph is directed (i.e., a Bayesian Network), or undirected (i.e., a Markov Network). The premise of all current systems-of-inference for deriving CIs in PGMs, is that the set of CIs used for the construction of the PGM hold exactly. In practice, algorithms for extracting the structure of PGMs from data discover approximate CIs that do not hold exactly in the distribution. In this paper, we ask how the error in this set propagates to the inferred CIs read off the graphical structure. More precisely, what guarantee can we provide on the inferred CI when the set of CIs that entailed it hold only approximately? It has recently been shown that in the general case, no such guarantee can be provided. In this work, we prove new negative and positive results concerning this problem. We prove that separators in undirected PGMs do not necessarily represent approximate CIs. That is, no guarantee can be provided for CIs inferred from the structure of undirected graphs. We prove that such a guarantee exists for the set of CIs inferred in directed graphical models, making the $d$-separation algorithm a sound and complete system for inferring approximate CIs. We also establish improved approximation guarantees for independence relations derived from marginal and saturated CIs.

摘要
PGMs（概率图）的图Structural representation表示模型中的 conditional independence（CI）关系。每个分隔器在图中表示模型中的CI关系，使其成为新的CI关系的推理和验证的渠道。图中的分隔器取决于图是指向的（即 bayesian network）还是无向的（即 markov network）。现有所有系统的推理方法都基于CI关系的集合在PGM中准确地满足。在实际中，数据中PGM的结构检索算法发现的CI关系不准确地存在于分布中。在这篇论文中，我们问 Error propagation in inferred CIs 问题的解决方案。即在PGM中的CI关系是否准确地推理出来？我们证明了一些新的负面和正面结果。在无向PGM中，分隔器不一定表示CI关系的近似。即，无法提供PGM中的CI关系的准确性 garantia。而在指向PGM中，我们证明了$d$-separation算法是一个准确和完整的系统，用于推理CI关系。此外，我们还证明了基于边缘和满足CI关系的约束的约束 CI 关系的改进了近似性保证。

The Hidden Adversarial Vulnerabilities of Medical Federated Learning

paper_url: http://arxiv.org/abs/2310.13893
repo_url: None
paper_authors: Erfan Darzi, Florian Dubost, Nanna. M. Sijtsema, P. M. A van Ooijen
for: 这个论文探讨了联合医疗图像分析系统对 adversarial 攻击的感受性。
methods: 该分析发现了一种新的攻击途径：利用先前全局模型更新的梯度信息，攻击者可以提高他们的攻击效率和传播性，而无需额外的计算成本增加。
results: 研究表明，适当初始化后的单步攻击（例如 FGSM）可以超越其迭代对手的效率，但需要更少的计算资源。这些发现强调了在联合医疗设备中应对 AI 安全的需要。

Abstract
In this paper, we delve into the susceptibility of federated medical image analysis systems to adversarial attacks. Our analysis uncovers a novel exploitation avenue: using gradient information from prior global model updates, adversaries can enhance the efficiency and transferability of their attacks. Specifically, we demonstrate that single-step attacks (e.g. FGSM), when aptly initialized, can outperform the efficiency of their iterative counterparts but with reduced computational demand. Our findings underscore the need to revisit our understanding of AI security in federated healthcare settings.

摘要
在这篇论文中，我们探讨了联合医疗图像分析系统中的恶意攻击的感受性。我们的分析发现了一个新的攻击途径：使用先前全球模型更新的梯度信息，恶意者可以提高攻击的效率和传播性。具体来说，我们表明了使用单步攻击（如FGSM），当初始化得当时，可以超过迭代攻击的效率，但却减少计算强度。我们的发现强调了在联合医疗设施中AI安全的重要性。

COVIDFakeExplainer: An Explainable Machine Learning based Web Application for Detecting COVID-19 Fake News

paper_url: http://arxiv.org/abs/2310.13890
repo_url: https://github.com/DatProGuy/COVIDFakeExplainer
paper_authors: Dylan Warman, Muhammad Ashad Kabir
for: 这篇论文旨在提供一个实用的伪新闻检测解决方案，以帮助社会对伪新闻进行有效防范。
methods: 本论文使用机器学习技术，包括深度学习方法，以探索伪新闻检测的可能性。特别是，本论文使用BERT模型，并将其应用于实际的伪新闻检测和解释。
results: 本论文的实验结果显示，BERT模型在检测COVID-19相关伪新闻方面 exhibits 高度的准确性。此外，本论文还提出了一个可读性增强的BERT模型，并将其作为一个服务通过AWS云端API主机。

Abstract
Fake news has emerged as a critical global issue, magnified by the COVID-19 pandemic, underscoring the need for effective preventive tools. Leveraging machine learning, including deep learning techniques, offers promise in combatting fake news. This paper goes beyond by establishing BERT as the superior model for fake news detection and demonstrates its utility as a tool to empower the general populace. We have implemented a browser extension, enhanced with explainability features, enabling real-time identification of fake news and delivering easily interpretable explanations. To achieve this, we have employed two publicly available datasets and created seven distinct data configurations to evaluate three prominent machine learning architectures. Our comprehensive experiments affirm BERT's exceptional accuracy in detecting COVID-19-related fake news. Furthermore, we have integrated an explainability component into the BERT model and deployed it as a service through Amazon's cloud API hosting (AWS). We have developed a browser extension that interfaces with the API, allowing users to select and transmit data from web pages, receiving an intelligible classification in return. This paper presents a practical end-to-end solution, highlighting the feasibility of constructing a holistic system for fake news detection, which can significantly benefit society.

摘要
假新闻在全球范围内已成为一个严重的问题，COVID-19大流行的爆发进一步强调了需要有效的预防工具。利用机器学习，包括深度学习技术，可能在打击假新闻方面提供希望。这篇论文超越了现有的研究，将BERT模型确定为假新闻检测的最佳模型，并将其作为普通民众 empower 的工具。我们开发了一款浏览器扩展程序，该扩展程序具有解释性特性，可以在实时检测假新闻并提供可读性高的解释。为了实现这一点，我们使用了两个公共可用的数据集，并创建了七种不同的数据配置来评估三种知名的机器学习架构。我们的广泛的实验证明了 COVID-19 相关的假新闻检测BERT模型的突出性。此外，我们将BERT模型集成了解释性Component，并通过Amazon云API主机（AWS）部署为服务。我们开发了一款浏览器扩展程序，该扩展程序可以从网页上选择和传输数据，并得到可读性高的分类结果。本论文提出了一个实用的端到端解决方案， highlighting 社会的可能性建立一个总体的假新闻检测系统，该系统可以对社会产生很大的好处。

2023-10-21

cs.CL

cs.CL - 2023-10-21

Structural generalization in COGS: Supertagging is (almost) all you need

paper_url: http://arxiv.org/abs/2310.14124
repo_url: https://github.com/alban-petit/semantic-supertag-parser
paper_authors: Alban Petit, Caio Corro, François Yvon
for: 提高 neural network 在不同类型的语言模型中的泛化能力
methods: 提出了一种基于图的semantic parsing框架，并对其进行了多种扩展以解决泛化问题
results: 实验结果表明，我们的方法可以在COGS dataset中提高泛化能力，特别是在需要结构泛化的例子上得到了显著提高

Abstract
In many Natural Language Processing applications, neural networks have been found to fail to generalize on out-of-distribution examples. In particular, several recent semantic parsing datasets have put forward important limitations of neural networks in cases where compositional generalization is required. In this work, we extend a neural graph-based semantic parsing framework in several ways to alleviate this issue. Notably, we propose: (1) the introduction of a supertagging step with valency constraints, expressed as an integer linear program; (2) a reduction of the graph prediction problem to the maximum matching problem; (3) the design of an incremental early-stopping training strategy to prevent overfitting. Experimentally, our approach significantly improves results on examples that require structural generalization in the COGS dataset, a known challenging benchmark for compositional generalization. Overall, our results confirm that structural constraints are important for generalization in semantic parsing.

摘要
多种自然语言处理应用程序中，神经网络通常无法泛化到不同分布中的示例。特别是在需要 Compositional Generalization 的 semantic parsing 数据集中，神经网络表现出了重要的局限性。在这种情况下，我们对一种基于神经网络的 semantic parsing 框架进行了多种扩展，以解决这个问题。主要提议包括：1. 引入精度标记步骤，使用整数线性Programming来表达 valency 约束。2. 将图像预测问题转换为最大匹配问题。3. 设计了逐步停止训练策略，以避免过拟合。实验表明，我们的方法可以在 COGS 数据集中，解决需要结构泛化的示例中显著提高结果。总之，我们的结果证明了结构约束对泛化的重要性。

Finite-context Indexing of Restricted Output Space for NLP Models Facing Noisy Input

paper_url: http://arxiv.org/abs/2310.14110
repo_url: https://github.com/mnhng/firo
paper_authors: Minh Nguyen, Nancy F. Chen
for: 提高 NLP 模型对不净输入的性能，而不是降低清晰输入的性能。
methods: FiRo 方法使用 finite-context aggregation 获取上下文嵌入，并在受限的输出空间内查找静止的表示。
results: FiRo 方法在六个分类任务和一个序列标注任务上，以不同程度的噪声为输入，与基eline相比表现出色。

Abstract
NLP models excel on tasks with clean inputs, but are less accurate with noisy inputs. In particular, character-level noise such as human-written typos and adversarially-engineered realistic-looking misspellings often appears in text and can easily trip up NLP models. Prior solutions to address character-level noise often alter the content of the inputs (low fidelity), thus inadvertently lowering model accuracy on clean inputs. We proposed FiRo, an approach to boost NLP model performance on noisy inputs without sacrificing performance on clean inputs. FiRo sanitizes the input text while preserving its fidelity by inferring the noise-free form for each token in the input. FiRo uses finite-context aggregation to obtain contextual embeddings which is then used to find the noise-free form within a restricted output space. The output space is restricted to a small cluster of probable candidates in order to predict the noise-free tokens more accurately. Although the clusters are small, FiRo's effective vocabulary (union of all clusters) can be scaled up to better preserve the input content. Experimental results show NLP models that use FiRo outperforming baselines on six classification tasks and one sequence labeling task at various degrees of noise.

摘要
FiRo 使用 finite-context aggregation 获取上下文嵌入，然后使用 restricted output space 来预测噪音自由形。输出空间是限制在一小 clusters 中，以更准确地预测噪音自由形。虽然 clusters 是小的，但FiRo 的有效词汇（union of all clusters）可以被扩展，以更好地保持输入内容。实验结果表明，使用 FiRo 的 NLP 模型在六个分类任务和一个序列标签任务上表现出色，在不同程度的噪音下都高于基eline。

Leveraging Knowledge Graphs for Orphan Entity Allocation in Resume Processing

paper_url: http://arxiv.org/abs/2310.14093
repo_url: None
paper_authors: Aagam Bakliwal, Shubham Manish Gandhi, Yashodhara Haribhakta
for: automatize and enhance the efficiency of the job screening process
methods: association mining, concept extraction, external knowledge linking, named entity recognition, and knowledge graph construction
results: successful bucketing of orphan entities within resumes, more effective candidate-job matching, and improved resume screening process accuracy.

Abstract
Significant challenges are posed in talent acquisition and recruitment by processing and analyzing unstructured data, particularly resumes. This research presents a novel approach for orphan entity allocation in resume processing using knowledge graphs. Techniques of association mining, concept extraction, external knowledge linking, named entity recognition, and knowledge graph construction are integrated into our pipeline. By leveraging these techniques, the aim is to automate and enhance the efficiency of the job screening process by successfully bucketing orphan entities within resumes. This allows for more effective matching between candidates and job positions, streamlining the resume screening process, and enhancing the accuracy of candidate-job matching. The approach's exceptional effectiveness and resilience are highlighted through extensive experimentation and evaluation, ensuring that alternative measures can be relied upon for seamless processing and orphan entity allocation in case of any component failure. The capabilities of knowledge graphs in generating valuable insights through intelligent information extraction and representation, specifically in the domain of categorizing orphan entities, are highlighted by the results of our research.

摘要
significannot challenges are posed in talent acquisition and recruitment by processing and analyzing unstructured data, particularly resumes. This research presents a novel approach for orphan entity allocation in resume processing using knowledge graphs. Techniques of association mining, concept extraction, external knowledge linking, named entity recognition, and knowledge graph construction are integrated into our pipeline. By leveraging these techniques, the aim is to automate and enhance the efficiency of the job screening process by successfully bucketing orphan entities within resumes. This allows for more effective matching between candidates and job positions, streamlining the resume screening process, and enhancing the accuracy of candidate-job matching. The approach's exceptional effectiveness and resilience are highlighted through extensive experimentation and evaluation, ensuring that alternative measures can be relied upon for seamless processing and orphan entity allocation in case of any component failure. The capabilities of knowledge graphs in generating valuable insights through intelligent information extraction and representation, specifically in the domain of categorizing orphan entities, are highlighted by the results of our research.Here's the text with some additional information about the Simplified Chinese translation:The translation is in Simplified Chinese, which is the standardized form of Chinese used in mainland China and Singapore. The text is written in a formal and academic style, using technical terms and concepts related to natural language processing, machine learning, and knowledge graphs.Some of the key concepts and techniques used in the text include:* orphan entity allocation (遗弃实体分配): the process of identifying and categorizing entities in unstructured data, such as resumes, that do not fit into predefined categories or structures.* knowledge graphs (知识图): a type of graph that represents entities and their relationships in a structured and interconnected way, allowing for efficient information extraction and analysis.* association mining (关联挖掘): a technique used to discover and extract relationships between entities in unstructured data, such as resumes.* concept extraction (概念提取): a technique used to identify and extract relevant concepts and entities from unstructured data, such as resumes.* named entity recognition (命名实体识别): a technique used to identify and extract specific types of entities, such as names of people, organizations, and locations, from unstructured data, such as resumes.Overall, the text presents a novel approach for automating and enhancing the efficiency of the job screening process using knowledge graphs and other techniques, with the goal of improving the accuracy of candidate-job matching and streamlining the resume screening process.

MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation

paper_url: http://arxiv.org/abs/2310.14088
repo_url: None
paper_authors: Zexue He, Yu Wang, An Yan, Yao Liu, Eric Y. Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
for: 本研究旨在提供一个多元、多任务、多领域医疗 benchmark，以推动语言模型在医疗领域的开发。
methods: 本研究使用了多种健康系统的数据，包括8种检查方式，共有22,779句 sentence和21,228份报告。研究人员在多个水平提供了专家标注，实现了精确的数据分类和多元的应用潜力。
results: 研究人员通过评估10种通用和领域特定的语言模型，包括健康领域基于领域的基eline和一般化的大语言模型（如ChatGPT），获得了不同任务之间的语言模型效果的评估结果。研究结果显示，大语言模型在不同任务之间的效果不同，并且发现了对 instrucion 的适应是一个重要的因素。

Abstract
Curated datasets for healthcare are often limited due to the need of human annotations from experts. In this paper, we present MedEval, a multi-level, multi-task, and multi-domain medical benchmark to facilitate the development of language models for healthcare. MedEval is comprehensive and consists of data from several healthcare systems and spans 35 human body regions from 8 examination modalities. With 22,779 collected sentences and 21,228 reports, we provide expert annotations at multiple levels, offering a granular potential usage of the data and supporting a wide range of tasks. Moreover, we systematically evaluated 10 generic and domain-specific language models under zero-shot and finetuning settings, from domain-adapted baselines in healthcare to general-purposed state-of-the-art large language models (e.g., ChatGPT). Our evaluations reveal varying effectiveness of the two categories of language models across different tasks, from which we notice the importance of instruction tuning for few-shot usage of large language models. Our investigation paves the way toward benchmarking language models for healthcare and provides valuable insights into the strengths and limitations of adopting large language models in medical domains, informing their practical applications and future advancements.

摘要
医疗领域的数据集经常受限于专家的人工标注。在这篇论文中，我们介绍了医生eval，一个多级、多任务、多领域的医疗语言模型开发 benchmark。医生eval 全面，涵盖多个医疗系统，覆盖人体8种检查方式，共收集了22,779句话和21,228份报告。我们提供了多个水平的专家标注，为数据的细化使用提供了可能的潜在应用，支持广泛的任务。此外，我们系统地评估了10种通用和医疗领域特定的语言模型，包括医疗领域基线模型和普通大语言模型（如ChatGPT）。我们的评估发现，这两类语言模型在不同任务中的效果不同，而且在几个任务中，大语言模型的几个 shot 使用需要进行调教。我们的调查开创了医疗领域语言模型的 benchmarking 的可能性，并为将来的应用和进步提供了有价值的见解，了解大语言模型在医疗领域的优劣和局限性。

Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

paper_url: http://arxiv.org/abs/2310.14053
repo_url: https://github.com/marcusm117/IdentityChain
paper_authors: Marcus J. Min, Yangruibo Ding, Luca Buratti, Saurabh Pujar, Gail Kaiser, Suman Jana, Baishakhi Ray
for: 评估大型自然语言处理模型（Code LLMs）的可靠性。
methods: 提出了一种名为IdentityChain的框架，可以同时评估模型的自我一致性和总体准确率。
results: 对11个Code LLMs进行了评估，发现它们无法保持自我一致性，并且通过使用IdentityChain可以曝光当前模型的三大缺陷。

Abstract
Code Large Language Models (Code LLMs) are being increasingly employed in real-life applications, so evaluating them is critical. While the general accuracy of Code LLMs on individual tasks has been extensively evaluated, their self-consistency across different tasks is overlooked. Intuitively, a trustworthy model should be self-consistent when generating natural language specifications for its own code and generating code for its own specifications. Failure to preserve self-consistency reveals a lack of understanding of the shared semantics underlying natural language and programming language, and therefore undermines the trustworthiness of a model. In this paper, we first formally define the self-consistency of Code LLMs and then design a framework, IdentityChain, which effectively and efficiently evaluates the self-consistency and general accuracy of a model at the same time. We study eleven Code LLMs and show that they fail to preserve self-consistency, which is indeed a distinct aspect from general accuracy. Furthermore, we show that IdentityChain can be used as a model debugging tool to expose weaknesses of Code LLMs by demonstrating three major weaknesses that we identify in current models using IdentityChain. Our code is available at https://github.com/marcusm117/IdentityChain.

摘要
<>大型语言模型（大型语言模型，Code LLMs）在实际应用中越来越广泛使用，因此评估其重要。而大型语言模型在不同任务之间的一致性却受到了忽略。人们可以 intuition 来认为，一个可靠的模型应该在生成自然语言规范和代码之间保持一致。如果不能保持一致性，则表明模型对自然语言和编程语言共同下的 semantics 没有很好的理解，因此模型的可靠性将受到损害。在这篇论文中，我们首先正式定义了 Code LLMs 的自身一致性，然后我们设计了一个框架，叫做 IdentityChain，可以同时评估模型的自身一致性和总准确率。我们对 eleven 种 Code LLMs 进行了研究，发现它们在保持自身一致性方面存在问题，这实际上是一种与总准确率不同的特征。此外，我们还证明了 IdentityChain 可以作为模型调试工具，用于曝光当前模型的三大弱点。我们的代码可以在 https://github.com/marcusm117/IdentityChain 上找到。

Code-Switching with Word Senses for Pretraining in Neural Machine Translation

paper_url: http://arxiv.org/abs/2310.14050
repo_url: None
paper_authors: Vivek Iyer, Edoardo Barba, Alexandra Birch, Jeff Z. Pan, Roberto Navigli
for: 本研究旨在提高Neural Machine Translation（NMT）模型的精度和可靠性，通过在Pre-training阶段使用知识库中的单词意思信息来改善模型的多语言能力。
methods: 本研究提出了Word Sense Pretraining for Neural Machine Translation（WSP-NMT）方法，该方法利用知识库中的单词意思信息进行预训练，以提高模型在多语言翻译任务中的表现。
results: 实验结果表明，WSP-NMT方法可以显著提高翻译质量，并在不同的数据和资源匮乏情况下保持良好的表现。此外，研究还发现了在DiBiMT排除词检测测试上的细致精度提升。

Abstract
Lexical ambiguity is a significant and pervasive challenge in Neural Machine Translation (NMT), with many state-of-the-art (SOTA) NMT systems struggling to handle polysemous words (Campolungo et al., 2022). The same holds for the NMT pretraining paradigm of denoising synthetic "code-switched" text (Pan et al., 2021; Iyer et al., 2023), where word senses are ignored in the noising stage -- leading to harmful sense biases in the pretraining data that are subsequently inherited by the resulting models. In this work, we introduce Word Sense Pretraining for Neural Machine Translation (WSP-NMT) - an end-to-end approach for pretraining multilingual NMT models leveraging word sense-specific information from Knowledge Bases. Our experiments show significant improvements in overall translation quality. Then, we show the robustness of our approach to scale to various challenging data and resource-scarce scenarios and, finally, report fine-grained accuracy improvements on the DiBiMT disambiguation benchmark. Our studies yield interesting and novel insights into the merits and challenges of integrating word sense information and structured knowledge in multilingual pretraining for NMT.

摘要
Lexical ambiguity 是 neural machine translation (NMT) 中的一个重要和普遍存在的挑战 (Campolungo et al., 2022)。多种现代 NMT 系统在处理多义词 (polysemous words) 方面几乎都有困难。同样的情况也出现在 NMT 预训练 paradigm 中，例如 denoising synthetic "code-switched" text (Pan et al., 2021; Iyer et al., 2023)，在杂化阶段中忽略单词的意思 -- 导致预训练数据中的词义偏见，这些偏见后来被传递给结果模型。在这种情况下，我们引入了 Word Sense Pretraining for Neural Machine Translation (WSP-NMT) - 一种结束到结束的方法，使用知识库中的单词意思特定信息来预训练多语言 NMT 模型。我们的实验表明，我们的方法可以提高总翻译质量。然后，我们证明了我们的方法可以扩展到各种复杂的数据和资源匮乏场景，并最后报告了 DiBiMT 分词标准 bencmark 上的细化准确性改进。我们的研究提供了关于将单词意思信息和结构化知识integrated into multilingual pretraining for NMT的新颖和有趣的发现。

MeaeQ: Mount Model Extraction Attacks with Efficient Queries

paper_url: http://arxiv.org/abs/2310.14047
repo_url: https://github.com/c-w-d/meaeq
paper_authors: Chengwei Dai, Minxuan Lv, Kun Li, Wei Zhou
For: The paper is written to address model extraction attacks in natural language processing (NLP) and to propose a method for stealing victim models with low query costs.* Methods: The paper uses a zero-shot sequence inference classifier combined with API service information to filter task-relevant data from a public text corpus, and a clustering-based data reduction technique to obtain representative data as queries for the attack.* Results: The paper achieves higher functional similarity to the victim model than baselines while requiring fewer queries, as demonstrated through extensive experiments conducted on four benchmark datasets.Here’s the same information in Simplified Chinese text:* For: 本文是为了 Addressing Model Extraction Attacks in Natural Language Processing (NLP) 和提出一种用于夺取受害模型的低查询成本的方法。* Methods: 本文使用 Zero-shot Sequence Inference Classifier 与 API 服务信息结合，从公共文本 corpus 中筛选任务相关的数据，并使用 clustering-based data reduction technique 获取表示性的数据作为攻击的查询。* Results: 本文在四个 benchmark 数据集上实现了高度的函数相似性，而且需要更少的查询，较基eline 高。

Abstract
We study model extraction attacks in natural language processing (NLP) where attackers aim to steal victim models by repeatedly querying the open Application Programming Interfaces (APIs). Recent works focus on limited-query budget settings and adopt random sampling or active learning-based sampling strategies on publicly available, unannotated data sources. However, these methods often result in selected queries that lack task relevance and data diversity, leading to limited success in achieving satisfactory results with low query costs. In this paper, we propose MeaeQ (Model extraction attack with efficient Queries), a straightforward yet effective method to address these issues. Specifically, we initially utilize a zero-shot sequence inference classifier, combined with API service information, to filter task-relevant data from a public text corpus instead of a problem domain-specific dataset. Furthermore, we employ a clustering-based data reduction technique to obtain representative data as queries for the attack. Extensive experiments conducted on four benchmark datasets demonstrate that MeaeQ achieves higher functional similarity to the victim model than baselines while requiring fewer queries. Our code is available at https://github.com/C-W-D/MeaeQ.

摘要
我们研究模型EXTRACT attacks在自然语言处理（NLP）中，攻击者希望通过 repeatedly 访问公开的应用程序编程接口（API）来窃取受害者模型。最近的工作主要集中在有限Query 预算设置下，采用随机抽样或活动学习基于抽样策略在公共可用数据源上。然而，这些方法经常导致选择的查询缺乏任务相关性和数据多样性，从而导致寻求满意的结果需要较高的查询成本。在这篇论文中，我们提出 MeaeQ（模型EXTRACT攻击with高效查询），一种简单又有效的方法来解决这些问题。我们首先利用零拟合序列推理分类器，结合API服务信息，从公共文本资源中筛选任务相关的数据而不是具体领域特定的数据集。其次，我们使用聚类分析技术来减少数据，从而获得代表性的查询。我们对四个标准测试集进行了广泛的实验，结果显示，MeaeQ可以在 fewer 查询下达到更高的功能相似性，而与基eline 相比。我们的代码可以在上获取。

Tree Prompting: Efficient Task Adaptation without Fine-Tuning

paper_url: http://arxiv.org/abs/2310.14034
repo_url: https://github.com/csinva/treeprompt
paper_authors: John X. Morris, Chandan Singh, Alexander M. Rush, Jianfeng Gao, Yuntian Deng
for: 本文旨在提高小型语言模型（LM）的应用性，通过构建决策树来提高提问的精度。
methods: 本文提出了树提 prompting 方法，即在执行时，通过有效地路由上一步决策树的结果来确定下一步LM的调用。
results: 实验表明，树提 prompting 方法可以提高分类 dataset 的准确率，与Finetuning 相当，并且可以在各种任务上进行审查模型的决策过程。

Abstract
Prompting language models (LMs) is the main interface for applying them to new tasks. However, for smaller LMs, prompting provides low accuracy compared to gradient-based finetuning. Tree Prompting is an approach to prompting which builds a decision tree of prompts, linking multiple LM calls together to solve a task. At inference time, each call to the LM is determined by efficiently routing the outcome of the previous call using the tree. Experiments on classification datasets show that Tree Prompting improves accuracy over competing methods and is competitive with fine-tuning. We also show that variants of Tree Prompting allow inspection of a model's decision-making process.

摘要
LMs的提示是主要用于应用它们到新任务中。然而，对于较小的LMs，提示提供的准确率相对较低，而梯度基于的finetuning则提供了更高的准确率。Tree Prompting是一种提示方法，它建立了一棵决策树，将多个LM调用串起来解决任务。在推理时，每次LM调用的结果都会根据树的结构有效地Routing到下一次调用。我们的实验表明，Tree Prompting可以提高分类 datasets 的准确率，并与finetuning竞争。此外，我们还示出了Tree Prompting的变体可以对模型决策过程进行检查。

Analysing State-Backed Propaganda Websites: a New Dataset and Linguistic Study

paper_url: http://arxiv.org/abs/2310.14032
repo_url: https://github.com/gatenlp/wordpress-site-extractor
paper_authors: Freddy Heppell, Kalina Bontcheva, Carolina Scarton
for: 这篇论文旨在分析两个以前没有研究过的国家支持的假新闻网站，即Reliable Recent News (rrn.world)和WarOnFakes (waronfakes.com)，这两个网站分别发布了多种语言的内容，包括阿拉伯语、中文、英语、法语、德语和西班牙语。
methods: 我们使用了内容获取方法和跨站无监督主题聚合方法来处理这些多语言数据集，并对网页翻译和时间分析进行语言和时间分析。
results: 我们的研究发现，这两个网站的内容具有较高的假新闻分布率，并且有一些文章的发布日期不准确。我们还发现了这些网站之间的语言和主题相似性，以及它们在时间上的变化。我们还公开发布了一个包含14,053篇文章的新数据集，每篇文章都有相应的语言版本和附加的元数据，如链接和图片。本研究对NLP社区的主要贡献在于提供了假新闻网站的新数据集，以及训练NLP工具 для假新闻检测。

Abstract
This paper analyses two hitherto unstudied sites sharing state-backed disinformation, Reliable Recent News (rrn.world) and WarOnFakes (waronfakes.com), which publish content in Arabic, Chinese, English, French, German, and Spanish. We describe our content acquisition methodology and perform cross-site unsupervised topic clustering on the resulting multilingual dataset. We also perform linguistic and temporal analysis of the web page translations and topics over time, and investigate articles with false publication dates. We make publicly available this new dataset of 14,053 articles, annotated with each language version, and additional metadata such as links and images. The main contribution of this paper for the NLP community is in the novel dataset which enables studies of disinformation networks, and the training of NLP tools for disinformation detection.

摘要
这篇论文分析了两个未经研究的媒体站点，即可靠最新新闻（rrn.world）和战对假新闻（waronfakes.com），这两个站点都发布了多语言内容（阿拉伯语、中文、英语、法语、德语和西班牙语）。我们描述了我们的内容获取方法和跨站点无监督主题划分方法，并对多语言数据集进行语言和时间分析，以及文章发布日期的错误分析。我们公开发布了14,053篇文章，每篇文章都有相应的语言版本和附加元数据，如链接和图像。本文的主要贡献是提供了一个新的识别假新闻网络的数据集，以及训练NLP工具的机会。

LLM-Prop: Predicting Physical And Electronic Properties Of Crystalline Solids From Their Text Descriptions

paper_url: http://arxiv.org/abs/2310.14029
repo_url: https://github.com/vertaix/llm-prop
paper_authors: Andre Niyongabo Rubungo, Craig Arnold, Barry P. Rand, Adji Bousso Dieng
for: 这个论文旨在提出一种基于大语言模型（LLM）的方法，用于从晶体文本描述中预测晶体性质。
methods: 该方法使用大语言模型（LLM）来利用晶体文本描述来预测晶体的物理和电子性质。
results: 对比现有的graph neural network（GNN）方法，LLM-Prop方法在预测晶体带隙、是否直接带隙和晶体单元体积等性质上表现出较高的准确率。

Abstract
The prediction of crystal properties plays a crucial role in the crystal design process. Current methods for predicting crystal properties focus on modeling crystal structures using graph neural networks (GNNs). Although GNNs are powerful, accurately modeling the complex interactions between atoms and molecules within a crystal remains a challenge. Surprisingly, predicting crystal properties from crystal text descriptions is understudied, despite the rich information and expressiveness that text data offer. One of the main reasons is the lack of publicly available data for this task. In this paper, we develop and make public a benchmark dataset (called TextEdge) that contains text descriptions of crystal structures with their properties. We then propose LLM-Prop, a method that leverages the general-purpose learning capabilities of large language models (LLMs) to predict the physical and electronic properties of crystals from their text descriptions. LLM-Prop outperforms the current state-of-the-art GNN-based crystal property predictor by about 4% in predicting band gap, 3% in classifying whether the band gap is direct or indirect, and 66% in predicting unit cell volume. LLM-Prop also outperforms a finetuned MatBERT, a domain-specific pre-trained BERT model, despite having 3 times fewer parameters. Our empirical results may highlight the current inability of GNNs to capture information pertaining to space group symmetry and Wyckoff sites for accurate crystal property prediction.

摘要
<> translate "The prediction of crystal properties plays a crucial role in the crystal design process. Current methods for predicting crystal properties focus on modeling crystal structures using graph neural networks (GNNs). Although GNNs are powerful, accurately modeling the complex interactions between atoms and molecules within a crystal remains a challenge. Surprisingly, predicting crystal properties from crystal text descriptions is understudied, despite the rich information and expressiveness that text data offer. One of the main reasons is the lack of publicly available data for this task. In this paper, we develop and make public a benchmark dataset (called TextEdge) that contains text descriptions of crystal structures with their properties. We then propose LLM-Prop, a method that leverages the general-purpose learning capabilities of large language models (LLMs) to predict the physical and electronic properties of crystals from their text descriptions. LLM-Prop outperforms the current state-of-the-art GNN-based crystal property predictor by about 4% in predicting band gap, 3% in classifying whether the band gap is direct or indirect, and 66% in predicting unit cell volume. LLM-Prop also outperforms a finetuned MatBERT, a domain-specific pre-trained BERT model, despite having 3 times fewer parameters. Our empirical results may highlight the current inability of GNNs to capture information pertaining to space group symmetry and Wyckoff sites for accurate crystal property prediction."中文翻译：<>预测 кристаллических 属性在 кристалли设计过程中扮演关键角色。现有方法用图 neural networks（GNNs）模拟 кристалли结构，尽管 GNNs 强大，但是很难准确地模拟 кристалли中原子和分子之间复杂的交互。尽管预测 кристалли属性从 кристалли文本描述是未explored的，尽管文本数据具有丰富的信息和表达能力。一个主要的原因是该任务上没有公共可用的数据。在这篇论文中，我们开发了一个名为 TextEdge 的 referential dataset，该 dataset包含 кристалли结构的文本描述和其属性。我们然后提出了 LLM-Prop，一种利用大型自然语言模型（LLMs）预测 кристалли物理和电子属性的方法。LLM-Prop 在预测带隙、直接或间接带隙和单元积体积方面比现有状态 искусственный neural networks （GNNs） Based crystal property predictor 高于4%，高于3%在分类直接或间接带隙，和66%在预测单元积体积。LLM-Prop 还高于一个finetuned MatBERT，一种预先训练的 BERT 模型，尽管它有3倍少的参数。我们的实验结果可能高亮当前 GNNs 无法捕捉关于空群 симметрии和 Wyckoff 位置的信息，以便准确预测 кристалли属性。

GASCOM: Graph-based Attentive Semantic Context Modeling for Online Conversation Understanding

paper_url: http://arxiv.org/abs/2310.14028
repo_url: None
paper_authors: Vibhor Agarwal, Yu Chen, Nishanth Sastry
for: 这篇论文是为了提高在线对话理解的性能而写的。
methods: 这篇论文提出了一种基于图 estructure的注意力机制，使得可以更好地理解在线对话的含义。
results: 论文使用了两种新的算法，可以从整个对话树中提取有用的信息，并且使用多头Graph Attention Mechanism来进一步细化对话上的含义模型化。论文的实验结果表明，与现状比较，这种方法可以提高对话理解的性能，提高了对 polarity prediction 和 hate speech detection 的性能。

Abstract
Online conversation understanding is an important yet challenging NLP problem which has many useful applications (e.g., hate speech detection). However, online conversations typically unfold over a series of posts and replies to those posts, forming a tree structure within which individual posts may refer to semantic context from higher up the tree. Such semantic cross-referencing makes it difficult to understand a single post by itself; yet considering the entire conversation tree is not only difficult to scale but can also be misleading as a single conversation may have several distinct threads or points, not all of which are relevant to the post being considered. In this paper, we propose a Graph-based Attentive Semantic COntext Modeling (GASCOM) framework for online conversation understanding. Specifically, we design two novel algorithms that utilise both the graph structure of the online conversation as well as the semantic information from individual posts for retrieving relevant context nodes from the whole conversation. We further design a token-level multi-head graph attention mechanism to pay different attentions to different tokens from different selected context utterances for fine-grained conversation context modeling. Using this semantic conversational context, we re-examine two well-studied problems: polarity prediction and hate speech detection. Our proposed framework significantly outperforms state-of-the-art methods on both tasks, improving macro-F1 scores by 4.5% for polarity prediction and by 5% for hate speech detection. The GASCOM context weights also enhance interpretability.

摘要
在线对话理解是一项重要又挑战性的自然语言处理（NLP）问题，它在许多应用中具有用于 hate speech detection 等应用。然而，在线对话通常是一串带有回快的帖子和回快中的帖子，组成一个树状结构，各个帖子可能引用上下文中的semantic context。这种 semantic cross-referencing 使得单个帖子难以理解，同时考虑整个对话树也不仅困难scaling，还可能导致偏导的解释，因为一个对话可能有多个不同的线索或焦点，其中不 todos 是 relevante para el post being considered。在这篇论文中，我们提出一个 Graph-based Attentive Semantic COntext Modeling（GASCOM）框架，用于在线对话理解。具体来说，我们设计了两种新的算法，利用在线对话的graph结构以及各个帖子的semantic信息，来选择 relevante context nodes from the whole conversation。此外，我们还设计了一个token-level multi-head graph attention mechanism，用于在不同的selected context utterances中进行细致的对话上下文模型化。使用这种 semantic conversational context，我们重新评估了两个已有的问题：polarity prediction和 hate speech detection。我们的提议的框架在两个任务上显著超越了当前的状态方法，提高了macro-F1分数 by 4.5% for polarity prediction和by 5% for hate speech detection。GASCOM上下文权重也提高了解释性。

Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation

paper_url: http://arxiv.org/abs/2310.14025
repo_url: https://github.com/anastasiakrith/multimodal-retrieval-for-vwsd
paper_authors: Anastasia Kritharoula, Maria Lymperaiou, Giorgos Stamou
for: 本文主要针对的是解决文本含义歧义的图像检索任务，即Visual Word Sense Disambiguation (VWSD)。
methods: 本文采用了多种方法，包括最新的transformer-based方法和大自然语言模型（LLMs）来解决VWSD任务。
results: experiments表明，我们的方法可以在VWSD任务中达到竞争性的排名结果，并且通过Chain-of-Thought（CoT）提示来帮助解释answer生成。

Abstract
Visual Word Sense Disambiguation (VWSD) is a novel challenging task with the goal of retrieving an image among a set of candidates, which better represents the meaning of an ambiguous word within a given context. In this paper, we make a substantial step towards unveiling this interesting task by applying a varying set of approaches. Since VWSD is primarily a text-image retrieval task, we explore the latest transformer-based methods for multimodal retrieval. Additionally, we utilize Large Language Models (LLMs) as knowledge bases to enhance the given phrases and resolve ambiguity related to the target word. We also study VWSD as a unimodal problem by converting to text-to-text and image-to-image retrieval, as well as question-answering (QA), to fully explore the capabilities of relevant models. To tap into the implicit knowledge of LLMs, we experiment with Chain-of-Thought (CoT) prompting to guide explainable answer generation. On top of all, we train a learn to rank (LTR) model in order to combine our different modules, achieving competitive ranking results. Extensive experiments on VWSD demonstrate valuable insights to effectively drive future directions.

摘要
Visible Word Sense Disambiguation (VWSD) 是一个新型的挑战性任务，旨在从一组候选者中选取一幅图像，更好地表现出一个模糊词的意思在特定上下文中。在这篇文章中，我们做出了一个重要的进步，通过应用不同的方法来解决这个有趣的任务。由于 VWSD 主要是文本-图像搜寻任务，我们探索了最新的 transformer-based 方法来进行多modal搜寻。此外，我们使用 Large Language Models (LLMs) 来增强给定的短语，解决对目标词的模糊性。我们还研究了 VWSD 作为单modal问题，通过将它转换为文本-文本和图像-图像搜寻，以及问答 (QA)，以全面探索相关模型的能力。为了吸取 LLMS 的隐藏知识，我们尝试使用 Chain-of-Thought (CoT) 提示来引导可解释的答案生成。最后，我们将多个模组联合起来，使用 learn to rank (LTR) 模型进行排名， achieving 竞争性的排名结果。广泛的实验在 VWSD 中，提供了宝贵的见解，将未来发展领域带向更好的未来。

Toward Stronger Textual Attack Detectors

paper_url: http://arxiv.org/abs/2310.14001
repo_url: https://github.com/pierrecolombo/adversarialattacksnlp
paper_authors: Pierre Colombo, Marine Picot, Nathan Noiry, Guillaume Staerman, Pablo Piantanida
for: 防止文本敌对攻击，即使是深度NLP系统。
methods: 引入了一个新的检测敌对攻击的框架——LAROUSSE，以及一个新的评价板准——STAKEOUT。
results: LAROUSSE比前一代方法更高效，并且可以防止梯度基本法。

Abstract
The landscape of available textual adversarial attacks keeps growing, posing severe threats and raising concerns regarding the deep NLP system's integrity. However, the crucial problem of defending against malicious attacks has only drawn the attention of the NLP community. The latter is nonetheless instrumental in developing robust and trustworthy systems. This paper makes two important contributions in this line of search: (i) we introduce LAROUSSE, a new framework to detect textual adversarial attacks and (ii) we introduce STAKEOUT, a new benchmark composed of nine popular attack methods, three datasets, and two pre-trained models. LAROUSSE is ready-to-use in production as it is unsupervised, hyperparameter-free, and non-differentiable, protecting it against gradient-based methods. Our new benchmark STAKEOUT allows for a robust evaluation framework: we conduct extensive numerical experiments which demonstrate that LAROUSSE outperforms previous methods, and which allows to identify interesting factors of detection rate variations.

摘要
文章做出了两个重要贡献：首先，我们引入了一个新的检测文本针对攻击的框架，即LAROUSSE，它是不需要监督学习、无参数、不可导的，因此具有更高的安全性。其次，我们引入了一个新的评估框架，即STAKEOUT，它包括9种常见的攻击方法、3个数据集和2个预训练模型。我们的新框架允许更加稳定地评估检测效果，我们进行了广泛的数字实验，结果表明LAROUSSE在前一代方法之上出众，同时还可以识别检测率的变化因素。

Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models

paper_url: http://arxiv.org/abs/2310.13998
repo_url: None
paper_authors: Pierre Colombo, Victor Pellegrain, Malik Boudiaf, Victor Storchan, Myriam Tami, Ismail Ben Ayed, Celine Hudelot, Pablo Piantanida
for: This paper focuses on the practical applications of natural language processing, specifically few-shot classification, and addresses the issue of proprietary and closed APIs.
methods: The paper proposes a transductive inference learning paradigm that utilizes unlabeled data, along with a new parameter-free transductive regularizer based on the Fisher-Rao loss.
results: The paper presents experimental results using eight backbone models and an episodic evaluation over 1,000 episodes, which demonstrate the superiority of transductive inference over the standard inductive setting.Here is the result in Simplified Chinese text:
for: 本文关注自然语言处理的实际应用，具体是几个shot类型的分类，并解决了 propriety 和关闭 API 的问题。
methods: 本文提出了一种推uctive推理学习模式，利用无标注数据，并提出了一种无参数的推uctive规范基于 Fisher-Rao 损失。
results: 本文通过使用 eight 个基础模型和一千个 episodic 评估，展示了推uctive推理在标准 inductive 设定下的超越。

Abstract
Proprietary and closed APIs are becoming increasingly common to process natural language, and are impacting the practical applications of natural language processing, including few-shot classification. Few-shot classification involves training a model to perform a new classification task with a handful of labeled data. This paper presents three contributions. First, we introduce a scenario where the embedding of a pre-trained model is served through a gated API with compute-cost and data-privacy constraints. Second, we propose a transductive inference, a learning paradigm that has been overlooked by the NLP community. Transductive inference, unlike traditional inductive learning, leverages the statistics of unlabeled data. We also introduce a new parameter-free transductive regularizer based on the Fisher-Rao loss, which can be used on top of the gated API embeddings. This method fully utilizes unlabeled data, does not share any label with the third-party API provider and could serve as a baseline for future research. Third, we propose an improved experimental setting and compile a benchmark of eight datasets involving multiclass classification in four different languages, with up to 151 classes. We evaluate our methods using eight backbone models, along with an episodic evaluation over 1,000 episodes, which demonstrate the superiority of transductive inference over the standard inductive setting.

摘要
专有和关闭API在处理自然语言方面变得越来越普遍，这对自然语言处理的实际应用产生了影响，包括几个shot分类。本文提出了三个贡献。首先，我们介绍了一种情况，在这种情况下，一个预训练模型的 embedding 被通过一个限制 compute-cost 和数据隐私的 gat API 提供。第二，我们提议了一种被 NLP 社区忽视的学习模式——推uctive inference。推uctive inference 不同于传统的 inductive learning，它利用不标注数据的统计特性。我们还介绍了一个新的参数-free 推uctive regularizer，基于 Fisher-Rao 损失函数，可以在gat API 中使用。这种方法可以完全利用无标注数据，不需要与第三方 API 提供者共享标签，并且可以作为未来研究的基准。第三，我们提出了一个改进的实验设定，并编译了八个dataset，包括四种语言，最多 151 个分类。我们使用八种背部bone模型进行评估，并在1,000个 episodic 评估中，发现推uctive inference 在标准 inductive 设定下表现出优异性。

Emulating the Human Mind: A Neural-symbolic Link Prediction Model with Fast and Slow Reasoning and Filtered Rules

paper_url: http://arxiv.org/abs/2310.13996
repo_url: None
paper_authors: Mohammad Hossein Khojasteh, Najmeh Torabian, Ali Farjami, Saeid Hosseini, Behrouz Minaei-Bidgoli
for: 这个研究的目的是解决知识граフ（KG）中的不完整性问题，并提出了一个新的神经几何模型named FaSt-FLiP，以提高链接预测的性能和可解性。
methods: 这个模型使用了“常识推理”和“快速思考”两种人类认知方面的特点，并结合了逻辑和神经网络模型，以提高链接预测的精度和可解性。
results: 研究结果显示，FaSt-FLiP模型在链接预测中的表现较高，并能够提供更可靠的解释。另外，模型还能够自动检测和删除逻辑模型生成的错误规则。

Abstract
Link prediction is an important task in addressing the incompleteness problem of knowledge graphs (KG). Previous link prediction models suffer from issues related to either performance or explanatory capability. Furthermore, models that are capable of generating explanations, often struggle with erroneous paths or reasoning leading to the correct answer. To address these challenges, we introduce a novel Neural-Symbolic model named FaSt-FLiP (stands for Fast and Slow Thinking with Filtered rules for Link Prediction task), inspired by two distinct aspects of human cognition: "commonsense reasoning" and "thinking, fast and slow." Our objective is to combine a logical and neural model for enhanced link prediction. To tackle the challenge of dealing with incorrect paths or rules generated by the logical model, we propose a semi-supervised method to convert rules into sentences. These sentences are then subjected to assessment and removal of incorrect rules using an NLI (Natural Language Inference) model. Our approach to combining logical and neural models involves first obtaining answers from both the logical and neural models. These answers are subsequently unified using an Inference Engine module, which has been realized through both algorithmic implementation and a novel neural model architecture. To validate the efficacy of our model, we conducted a series of experiments. The results demonstrate the superior performance of our model in both link prediction metrics and the generation of more reliable explanations.

摘要
链接预测是知识 graphs（KG）的重要任务，以解决知识 Graphs 的不完整性问题。现有的链接预测模型受到性能和可解释能力的限制。而且，可以生成解释的模型经常会遇到错误的路径或理由，导致正确答案。为 Addressing these challenges, we propose a novel Neural-Symbolic model named FaSt-FLiP（快速思维与筛选规则 для链接预测任务）， drawing inspiration from two aspects of human cognition："通常的思维"和"快速和慢速的思考。our objective is to combine a logical and neural model for enhanced link prediction. To tackle the challenge of dealing with incorrect paths or rules generated by the logical model, we propose a semi-supervised method to convert rules into sentences. These sentences are then subjected to assessment and removal of incorrect rules using an NLI（自然语言推理）model. Our approach to combining logical and neural models involves first obtaining answers from both the logical and neural models. These answers are subsequently unified using an Inference Engine module, which has been realized through both algorithmic implementation and a novel neural model architecture. To validate the efficacy of our model, we conducted a series of experiments. The results demonstrate the superior performance of our model in both link prediction metrics and the generation of more reliable explanations.

A Novel Information-Theoretic Objective to Disentangle Representations for Fair Classification

paper_url: http://arxiv.org/abs/2310.13990
repo_url: None
paper_authors: Pierre Colombo, Nathan Noiry, Guillaume Staerman, Pablo Piantanida
for: The paper aims to learn abstract representations of reality from the observation of multiple contextual situations, specifically disentangled representations that are low-dimensional and independent of sensitive attributes such as gender or age.
methods: The paper proposes a novel family of regularizers called CLINIC, which minimizes the mutual information between the latent representation and the sensitive attribute conditional to the target. This approach is parameter-free and easier to train than previous techniques.
results: The paper demonstrates that the proposed CLINIC losses offer a better disentanglement/accuracy trade-off than previous techniques and generalize better than training with cross-entropy loss, provided that the disentanglement task is not too constraining.Here is the simplified Chinese version of the three key information points:
for: 该论文目标是从多个 contextual situations 中学习抽象的 reality 表示，具体来说是EXTRACT 独立的表示，即低维度且独立的概念表示。
methods: 该论文提出一种新的 family of regularizers called CLINIC，该regularizers 的目标是将敏感特征（如性别或年龄）与 latent representation 的相关性降低到最低。这种方法是 parameter-free 的， easier 和 faster than previous techniques。
results: 该论文的实验结果显示，提出的 CLINIC losses 可以比 previous techniques 提供更好的 disentanglement/accuracy 的负荷平衡，并且在不太紧张的disentanglement task 下可以更好地泛化。

Abstract
One of the pursued objectives of deep learning is to provide tools that learn abstract representations of reality from the observation of multiple contextual situations. More precisely, one wishes to extract disentangled representations which are (i) low dimensional and (ii) whose components are independent and correspond to concepts capturing the essence of the objects under consideration (Locatello et al., 2019b). One step towards this ambitious project consists in learning disentangled representations with respect to a predefined (sensitive) attribute, e.g., the gender or age of the writer. Perhaps one of the main application for such disentangled representations is fair classification. Existing methods extract the last layer of a neural network trained with a loss that is composed of a cross-entropy objective and a disentanglement regularizer. In this work, we adopt an information-theoretic view of this problem which motivates a novel family of regularizers that minimizes the mutual information between the latent representation and the sensitive attribute conditional to the target. The resulting set of losses, called CLINIC, is parameter free and thus, it is easier and faster to train. CLINIC losses are studied through extensive numerical experiments by training over 2k neural networks. We demonstrate that our methods offer a better disentanglement/accuracy trade-off than previous techniques, and generalize better than training with cross-entropy loss solely provided that the disentanglement task is not too constraining.

摘要
一个深度学习的核心目标是提供能够从多个上下文中学习抽象的现实表示的工具。更具体地说，希望从多个上下文中提取独立的表示，其维度低、并且其组成部分独立、对象本身的核心特征capturing (Locatello et al., 2019b).一种实现这个奢侈目标的方法是通过对敏感特征（如作者的性别或年龄）进行分离表示学习。可能这种独立表示的主要应用是公平分类。现有方法通常是提取一个通过混合Entropy目标和分离regularizer进行训练的神经网络的最后一层。在这个工作中，我们采用信息论视角来解决这个问题，并提出了一个新的家族征识器，它将 conditional于目标进行抽象的mutual information minimization。这些loss Functions被称为CLINIC，它们是无参数的，因此更容易和更快地训练。我们通过对2k个神经网络进行数字实验来研究CLINIC loss。我们示示了我们的方法可以在训练过程中提供更好的抽象/准确性质和平衡，并且在训练cross-entropy loss solo不可能达到的情况下，能够更好地泛化。

GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4

paper_url: http://arxiv.org/abs/2310.13988
repo_url: None
paper_authors: Tom Kocmi, Christian Federmann
for: 评估翻译质量错误
methods: 使用GPT模型，三次预览技术，无需人工参考翻译
results: 达到系统排名的最佳准确率，但需要注意GPT模型的商业性和黑盒特性，不建议用于学术论文中证明其他方法的改进。

Abstract
This paper introduces GEMBA-MQM, a GPT-based evaluation metric designed to detect translation quality errors, specifically for the quality estimation setting without the need for human reference translations. Based on the power of large language models (LLM), GEMBA-MQM employs a fixed three-shot prompting technique, querying the GPT-4 model to mark error quality spans. Compared to previous works, our method has language-agnostic prompts, thus avoiding the need for manual prompt preparation for new languages. While preliminary results indicate that GEMBA-MQM achieves state-of-the-art accuracy for system ranking, we advise caution when using it in academic works to demonstrate improvements over other methods due to its dependence on the proprietary, black-box GPT model.

摘要
这篇论文介绍了GEMBA-MQM，一种基于GPT的评估指标，用于探测翻译质量错误。这种指标不需要人工参考翻译，可以在质量估计设置中使用。通过大语言模型（LLM）的力量，GEMBA-MQM使用固定的三批提问技术，让GPT-4模型标识错误质量段落。与前一代方法相比，我们的方法有语言共享的提问，因此不需要为新语言手动准备提问。然而，我们建议在学术论文中使用GEMBA-MQM时需要谨慎，因为它取决于商业化、黑盒子GPT模型，这可能会导致使用它来证明其他方法的优越性的问题。

HateRephrase: Zero- and Few-Shot Reduction of Hate Intensity in Online Posts using Large Language Models

paper_url: http://arxiv.org/abs/2310.13985
repo_url: None
paper_authors: Vibhor Agarwal, Yu Chen, Nishanth Sastry
for: 这篇论文旨在提出一种新的、简单有效的方法，即在发布前对潜在仇恨言论内容进行重写。
methods: 这篇论文使用了大语言模型（LLMs），并比较了不同的提示方法，包括任务描述、仇Definition、少量示例和思维链。
results: 研究发现，使用几shot示例提示的LLMs最perform最好，并且在不同的提示方法下都有良好的表现。而且，对于不同的提示方法，GPT-3.5表现最佳。此外，人工评估表明，GPT-3.5生成的重写内容甚至超过了人类生成的ground truth重写内容。

Abstract
Hate speech has become pervasive in today's digital age. Although there has been considerable research to detect hate speech or generate counter speech to combat hateful views, these approaches still cannot completely eliminate the potential harmful societal consequences of hate speech -- hate speech, even when detected, can often not be taken down or is often not taken down enough; and hate speech unfortunately spreads quickly, often much faster than any generated counter speech. This paper investigates a relatively new yet simple and effective approach of suggesting a rephrasing of potential hate speech content even before the post is made. We show that Large Language Models (LLMs) perform well on this task, outperforming state-of-the-art baselines such as BART-Detox. We develop 4 different prompts based on task description, hate definition, few-shot demonstrations and chain-of-thoughts for comprehensive experiments and conduct experiments on open-source LLMs such as LLaMA-1, LLaMA-2 chat, Vicuna as well as OpenAI's GPT-3.5. We propose various evaluation metrics to measure the efficacy of the generated text and ensure the generated text has reduced hate intensity without drastically changing the semantic meaning of the original text. We find that LLMs with a few-shot demonstrations prompt work the best in generating acceptable hate-rephrased text with semantic meaning similar to the original text. Overall, we find that GPT-3.5 outperforms the baseline and open-source models for all the different kinds of prompts. We also perform human evaluations and interestingly, find that the rephrasings generated by GPT-3.5 outperform even the human-generated ground-truth rephrasings in the dataset. We also conduct detailed ablation studies to investigate why LLMs work satisfactorily on this task and conduct a failure analysis to understand the gaps.

摘要
仇恨言语在当今数字时代已经成为普遍存在的问题。尽管有很多研究检测仇恨言语或生成对抗仇恨观点的counter speech，但这些方法仍然无法完全消除仇恨言语的社会后果 -- 仇恨言语，即使检测到了，通常不能或很难被移除，而且仇恨言语往往很快就会扩散，常常比生成的counter speech更快。这篇论文研究了一种新的、简单而有效的方法，即在投稿之前，使用大语言模型（LLMs）来提议修改潜在的仇恨言语内容。我们证明了LLMs在这个任务上表现良好，超过了现有的基线模型如BART-Detox。我们开发了4个不同的提示，基于任务描述、仇Definition、几个示例和串联思维，进行了广泛的实验。我们使用了开源的LLaMA-1、LLaMA-2 chat、Vicuna以及OpenAI的GPT-3.5等模型进行实验。我们提出了多种评价指标，以确保生成的文本减少了仇恨程度，而不会毁灭语意。我们发现，使用几个示例提示的LLMs最为有效，能够生成接受的仇恨重新写文本， semantic meaning与原文相似。总之，我们发现GPT-3.5在所有不同的提示上都超过了基线和开源模型。我们还进行了人工评价， Interestingly，我们发现GPT-3.5生成的重新写文本甚至超过了人工生成的基准重新写文本。我们还进行了细化的抽象研究和失败分析，以解释LLMs在这个任务上的成功原因。

Automatic Pronunciation Assessment – A Review

paper_url: http://arxiv.org/abs/2310.13974
repo_url: None
paper_authors: Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury
for: 这篇论文主要是为了探讨计算机辅助发音训练（CAPT）中的发音评估方法和其应用。
methods: 这篇论文评论了在发音评估方面使用的方法，包括phonemic和prosodic两种方法。
results: 论文认为，现有的研究存在一些挑战和局限性，并提出了未来研究的可能性。

Abstract
Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic. We categorize the main challenges observed in prominent research trends, and highlight existing limitations, and available resources. This is followed by a discussion of the remaining challenges and possible directions for future work.

摘要
声音评估和计算机辅助声音训练（CAPT）在最近几年中得到了很大的进步。随着语言处理和深度学习技术的快速发展，需要进行更新的评估。本文介绍了声音评估中使用的方法，包括音节和语言流行的评估。我们分类了主要的研究趋势中的挑战，并高亮现有的限制和可用资源。然后是对未来工作的残余挑战和可能的方向的讨论。Note: Simplified Chinese is the standard writing system used in mainland China, while Traditional Chinese is used in Taiwan and Hong Kong.

AITA Generating Moral Judgements of the Crowd with Reasoning

paper_url: http://arxiv.org/abs/2310.18336
repo_url: None
paper_authors: Osama Bsher, Ameer Sabri
for: This paper aims to generate comments with moral reasoning for stories with moral dilemmas using the AITA subreddit as a dataset.methods: The authors will leverage the vast amount of data on the forum and use state-of-the-art seq2seq text generation models to generate coherent comments that align with the norms and values of the AITA community.results: The authors aim to evaluate the ability of these models to make moral judgments similarly to humans and produce concise comments providing clear moral stances and advice for the poster.

Abstract
Morality is a fundamental aspect of human behavior and ethics, influencing how we interact with each other and the world around us. When faced with a moral dilemma, a person's ability to make clear moral judgments can be clouded. Due to many factors such as personal biases, emotions and situational factors people can find it difficult to decide their best course of action. The AmITheAsshole (AITA) subreddit is a forum on the social media platform Reddit that helps people get clarity and objectivity on their predicaments. In the forum people post anecdotes about moral dilemmas they are facing in their lives, seeking validation for their actions or advice on how to navigate the situation from the community. The morality of the actions in each post is classified based on the collective opinion of the community into mainly two labels, "Not The Asshole" (NTA) and "You Are The Asshole" (YTA). This project aims to generate comments with moral reasoning for stories with moral dilemmas using the AITA subreddit as a dataset. While past literature has explored the classification of posts into labels (Alhassan et al., 2022), the generation of comments remains a novel and challenging task. It involves understanding the complex social and ethical considerations in each situation. To address this challenge, we will leverage the vast amount of data on the forum with the goal of generating coherent comments that align with the norms and values of the AITA community. In this endeavor, we aim to evaluate state-of-the-art seq2seq text generation models for their ability to make moral judgments similarly to humans, ultimately producing concise comments providing clear moral stances and advice for the poster.

摘要
人类行为中的道德是一个基本方面，影响我们如何与其他人和世界around us interact。当面临道德困难时，人们可能会受到个人偏见、情感和情况因素的影响，导致困难做出明确的道德判断。为了帮助人们得到清晰性和 объекivity，Reddit上的AmITheAsshole（AITA）子社区成为了一个有用的平台。在这个社区中，人们会分享他们面临的道德困难，并请求 validation for their actions或对 Situation 的 Navigation 建议。根据社区的共同意见，每篇文章将被分类为主要两个标签：“Not The Asshole”（NTA）和“You Are The Asshole”（YTA）。这个项目的目标是使用 AITA 子社区的数据生成文章中的评论，以提供清晰的道德观点和建议。在这个任务中，我们将利用社区数据的庞大量，以生成一致的评论，与 AITA 社区的 norms 和价值观念相一致。为了解决这个挑战，我们将使用现代 seq2seq 文本生成模型，以模拟人类的道德判断能力。最终，我们希望通过生成简洁明了的评论，为poster提供清晰的道德观点和建议。

Linguistically Motivated Sign Language Segmentation

paper_url: http://arxiv.org/abs/2310.13960
repo_url: https://github.com/sign-language-processing/transcription
paper_authors: Amit Moryossef, Zifan Jiang, Mathias Müller, Sarah Ebling, Yoav Goldberg
for: 本文旨在提出一种新的手语分割方法，以便在手语处理系统中进行下游任务，如手语识别、 транскриpción和机器翻译。
methods: 我们提出的方法基于手语 corpora 中观察到的语言学 clue，使用 BIO 标记替换传统的 IO 标记，以考虑手语的连续性。我们还explore使用光流特征来捕捉手语语法的 просодические特征。
results: 我们发现，使用 BIO 标记可以更好地模型手语 bounding box。在某些深度模型中，使用光流特征可以提高分割质量，但在更深的模型中，这些特征的贡献几乎可以忽略不计。通过精心调整模型的解码算法，我们可以进一步提高分割质量。我们的最终模型可以在不同的手语语言下进行零基础学习，并在不同的视频内容上进行渠道。

Abstract
Sign language segmentation is a crucial task in sign language processing systems. It enables downstream tasks such as sign recognition, transcription, and machine translation. In this work, we consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases, larger units comprising several signs. We propose a novel approach to jointly model these two tasks. Our method is motivated by linguistic cues observed in sign language corpora. We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing. Given that prosody plays a significant role in phrase boundaries, we explore the use of optical flow features. We also provide an extensive analysis of hand shapes and 3D hand normalization. We find that introducing BIO tagging is necessary to model sign boundaries. Explicitly encoding prosody by optical flow improves segmentation in shallow models, but its contribution is negligible in deeper models. Careful tuning of the decoding algorithm atop the models further improves the segmentation quality. We demonstrate that our final models generalize to out-of-domain video content in a different signed language, even under a zero-shot setting. We observe that including optical flow and 3D hand normalization enhances the robustness of the model in this context.

摘要
签语段落是指文本处理系统中的签语分割任务。它可以帮助下游任务，如签语识别、转写和机器翻译。在这个工作中，我们考虑了两种类型的分割：即分割成个体签语和分割成短语，后者是由多个签语组成的更大单位。我们提出了一种新的方法，旨在同时解决这两种任务。我们的方法受到了签语 Corpora 中的语言学cue的启发。我们将主流的 IO 标记方案改为 BIO 标记，以考虑不间断的签语。由于语言的气息在短语边界上发挥重要作用，我们尝试使用光流特征。我们还提供了详细的手势分析和3D手势normalization。我们发现，使用 BIO 标记是必要的，以便模型签语边界。使用光流特征可以在浅度模型中提高分割质量，但在深度模型中，其贡献几乎可以忽略不计。通过精细调整模型顶部的解码算法，可以进一步提高分割质量。我们展示了我们的最终模型可以在不同的指语言中进行零基础学习，并在无预训练情况下保持良好的分割质量。我们发现，包含光流和3D手势normalization可以提高模型在这种情况下的Robustness。

Values, Ethics, Morals? On the Use of Moral Concepts in NLP Research

paper_url: http://arxiv.org/abs/2310.13915
repo_url: None
paper_authors: Karina Vida, Judith Simon, Anne Lauscher
for: 本研究旨在探讨NLPT中的伦理问题，尤其是语言模型的道德评价。
methods: 本研究使用文献综述和系统性分析方法，探讨NLPT中 morality 的定义和基础。
results: 研究发现，大多数文献没有提供明确的定义，也没有遵循哲学定义。此外，研究还给出了三个建议，以促进NLPT中的道德讨论。

Abstract
With language technology increasingly affecting individuals' lives, many recent works have investigated the ethical aspects of NLP. Among other topics, researchers focused on the notion of morality, investigating, for example, which moral judgements language models make. However, there has been little to no discussion of the terminology and the theories underpinning those efforts and their implications. This lack is highly problematic, as it hides the works' underlying assumptions and hinders a thorough and targeted scientific debate of morality in NLP. In this work, we address this research gap by (a) providing an overview of some important ethical concepts stemming from philosophy and (b) systematically surveying the existing literature on moral NLP w.r.t. their philosophical foundation, terminology, and data basis. For instance, we analyse what ethical theory an approach is based on, how this decision is justified, and what implications it entails. Our findings surveying 92 papers show that, for instance, most papers neither provide a clear definition of the terms they use nor adhere to definitions from philosophy. Finally, (c) we give three recommendations for future research in the field. We hope our work will lead to a more informed, careful, and sound discussion of morality in language technology.

摘要

Providing an overview of important ethical concepts from philosophy.2. Systematically surveying the existing literature on moral NLP, examining their philosophical foundations, terminology, and data basis.3. Offering three recommendations for future research in the field.Our survey of 92 papers found that most papers do not provide clear definitions of the terms they use, and few adhere to definitions from philosophy. We hope that our work will contribute to a more informed, careful, and sound discussion of morality in language technology.

RTSUM: Relation Triple-based Interpretable Summarization with Multi-level Salience Visualization

paper_url: http://arxiv.org/abs/2310.13895
repo_url: https://github.com/sjyyj/sjyyj
paper_authors: Seonglae Cho, Yonggi Cho, HoonJae Lee, Myungha Jang, Jinyoung Yeo, Dongha Lee
for: 本文提出了一种无监督概要框架，利用关系 triplets 为概要的基本单元。
methods: 输入文档后，本方法首先选择了突出的关系 triplets via 多级评分，然后通过文本-文本语言模型生成了简洁的概要。
results: 基于本方法，我们还开发了一款可解释性概要工具，提供了细致的解释与输出概要。用户可以自定义选择不同的级别，以便在文本单元层次上visual化文本的重要性。代码公开available。

Abstract
In this paper, we present RTSUM, an unsupervised summarization framework that utilizes relation triples as the basic unit for summarization. Given an input document, RTSUM first selects salient relation triples via multi-level salience scoring and then generates a concise summary from the selected relation triples by using a text-to-text language model. On the basis of RTSUM, we also develop a web demo for an interpretable summarizing tool, providing fine-grained interpretations with the output summary. With support for customization options, our tool visualizes the salience for textual units at three distinct levels: sentences, relation triples, and phrases. The codes,are publicly available.

摘要
在这篇论文中，我们提出了一种无监督摘要框架，称为RTSUM，它利用关系三元组作为摘要的基本单元。给定输入文档，RTSUM首先选择了突出的关系三元组via多级重要性分数，然后使用文本到文本语言模型生成了一份简洁的摘要。基于RTSUM，我们还开发了一个可视化摘要工具，可以提供细化的解释。这个工具支持自定义选项，可以在三级层次（句子、关系三元组、短语）上Visualize文本单元的重要性。代码publicly available。

RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning

paper_url: http://arxiv.org/abs/2310.13864
repo_url: https://github.com/wjhou/recap
paper_authors: Wenjun Hou, Yi Cheng, Kaishuai Xu, Wenjie Li, Jiang Liu
for: 实现 радиialogists 的工作负担减轻methods: 使用动态疾病进程理解和历史纪录融合results: 精确地生成专业医疗报告

Abstract
Automating radiology report generation can significantly alleviate radiologists' workloads. Previous research has primarily focused on realizing highly concise observations while neglecting the precise attributes that determine the severity of diseases (e.g., small pleural effusion). Since incorrect attributes will lead to imprecise radiology reports, strengthening the generation process with precise attribute modeling becomes necessary. Additionally, the temporal information contained in the historical records, which is crucial in evaluating a patient's current condition (e.g., heart size is unchanged), has also been largely disregarded. To address these issues, we propose RECAP, which generates precise and accurate radiology reports via dynamic disease progression reasoning. Specifically, RECAP first predicts the observations and progressions (i.e., spatiotemporal information) given two consecutive radiographs. It then combines the historical records, spatiotemporal information, and radiographs for report generation, where a disease progression graph and dynamic progression reasoning mechanism are devised to accurately select the attributes of each observation and progression. Extensive experiments on two publicly available datasets demonstrate the effectiveness of our model.

摘要
自动化放射学报告生成可以减轻放射学家的工作负担。先前的研究主要集中在实现极简观察结果，而忽略了疾病严重程度决定的精确属性（例如肿胸积液）。由于错误的属性会导致不准确的放射学报告，因此需要加强生成过程中的精确属性模型。此外，历史记录中的时间信息，如评估病人当前状况中的心脏大小是否变化（例如心脏大小不变），也被大量忽略。为解决这些问题，我们提议RECAP，它通过动态疾病进程逻辑来生成精确和准确的放射学报告。具体来说，RECAP首先预测两个连续的放射ogram的观察结果和进程（即空间时间信息）。然后，它将历史记录、空间时间信息和放射ogram组合起来，通过疾病进程图和动态进程逻辑机制来准确选择每个观察结果和进程的属性。我们对公共数据集进行了广泛的实验，结果表明RECAP的效果。

2023-10-21

cs.LG

cs.LG - 2023-10-21

Optimal Batched Best Arm Identification

paper_url: http://arxiv.org/abs/2310.14129
repo_url: None
paper_authors: Tianyuan Jin, Yu Yang, Jing Tang, Xiaokui Xiao, Pan Xu
for: 本研究探讨批处理最佳臂标识问题（Batched Best Arm Identification，BBAI），learner的目标是尽可能快地确定最佳臂，同时尽可能减少策略的变化次数。
methods: 我们提出了三批最佳臂标识算法（Tri-BBAI），它是首个在极限设定（即$\delta\to 0$）下实现最佳样本复杂性的批处理算法，并且只需要最多三批。基于Tri-BBAI，我们还提出了几乎最佳批处理最佳臂标识算法（Opt-BBAI），它在非极限设定（即$\delta>0$）下实现近似最佳样本和批复杂性，而且与Tri-BBAI在$\delta\to 0$时的复杂性相同。
results: 我们的研究表明，Opt-BBAI在非极限设定下实现了近似最佳样本和批复杂性，而且不需要对返回最佳臂的事件进行条件，这与之前的批处理算法不同。此外，我们还提出了一种新的检查最佳臂是否被消除的方法，它是独立的兴趣。

Abstract
We study the batched best arm identification (BBAI) problem, where the learner's goal is to identify the best arm while switching the policy as less as possible. In particular, we aim to find the best arm with probability $1-\delta$ for some small constant $\delta>0$ while minimizing both the sample complexity (total number of arm pulls) and the batch complexity (total number of batches). We propose the three-batch best arm identification (Tri-BBAI) algorithm, which is the first batched algorithm that achieves the optimal sample complexity in the asymptotic setting (i.e., $\delta\rightarrow 0$) and runs only in at most $3$ batches. Based on Tri-BBAI, we further propose the almost optimal batched best arm identification (Opt-BBAI) algorithm, which is the first algorithm that achieves the near-optimal sample and batch complexity in the non-asymptotic setting (i.e., $\delta>0$ is arbitrarily fixed), while enjoying the same batch and sample complexity as Tri-BBAI when $\delta$ tends to zero. Moreover, in the non-asymptotic setting, the complexity of previous batch algorithms is usually conditioned on the event that the best arm is returned (with a probability of at least $1-\delta$), which is potentially unbounded in cases where a sub-optimal arm is returned. In contrast, the complexity of Opt-BBAI does not rely on such an event. This is achieved through a novel procedure that we design for checking whether the best arm is eliminated, which is of independent interest.

摘要
我们研究批处最佳臂识别问题（BBAI），学生的目标是将最佳臂识别出来，将策略调整得最少。具体来说，我们想要在probability $1-\delta$的情况下找到最佳臂，而且将数据分析和批号复杂度（total number of arm pulls和total number of batches）降到最低。我们提出了三批最佳臂识别算法（Tri-BBAI），这是第一个批号算法，可以在无限边界（i.e., $\delta\rightarrow 0$)的情况下实现最佳数据分析和批号复杂度，并且只需要最多三个批号。基于Tri-BBAI，我们进一步提出了几乎最佳批号最佳臂识别算法（Opt-BBAI），这是第一个可以在非 asymptotic 环境（i.e., $\delta>0$是任意固定）下实现几乎最佳的数据分析和批号复杂度，并且跟Tri-BBAI在 $\delta$ 趋向 zero 时的性能相同。此外，在非 asymptotic 环境中，前一代批号算法的复杂度通常受到返回最佳臂的几率（至少是 $1-\delta$）的限制，这可能是无限大的情况。在 контраス特，Opt-BBAI 的复杂度不受这种事件的限制。这是通过我们设计的一种独特的检查方法来检查最佳臂是否被消除的。

DispersioNET: Joint Inversion of Rayleigh-Wave Multimode Phase Velocity Dispersion Curves using Convolutional Neural Networks

paper_url: http://arxiv.org/abs/2310.14094
repo_url: None
paper_authors: Rohan Sharma, Divakar Vashisth, Bharath Shekar
for: 本研究使用深度学习模型DispersioNET来对声波基本和高频模式phase velocity dispersion curve进行联合逆解，以获取声波速度profile。
methods: 该模型基于卷积神经网络(CNN)，并在不同的噪声水平上进行训练和测试。
results: DispersioNET能够准确预测声波速度profile，并能够抗噪和鲁棒性。

Abstract
Rayleigh wave dispersion curves have been widely used in near-surface studies, and are primarily inverted for the shear wave (S-wave) velocity profiles. However, the inverse problem is ill-posed, non-unique and nonlinear. Here, we introduce DispersioNET, a deep learning model based on convolution neural networks (CNN) to perform the joint inversion of Rayleigh wave fundamental and higher order mode phase velocity dispersion curves. DispersioNET is trained and tested on both noise-free and noisy dispersion curve datasets and predicts S-wave velocity profiles that match closely with the true velocities. The architecture is agnostic to variations in S-wave velocity profiles such as increasing velocity with depth and intermediate low-velocity layers, while also ensuring that the output remains independent of the number of layers.

摘要
rayleigh 波动峰位 Curves 在近地表研究中广泛使用，主要是对剪切波（S波）速度profile 进行逆解。然而，逆问题是不定、多重和非线性的。我们介绍了 DispersioNET，一种基于卷积神经网络（CNN）的深度学习模型，用于同时逆解 Rayleigh 波基本和高阶模式 phase 峰位速度分布 Curves。DispersioNET 在噪声存在和缺失的数据集上训练和测试，并能够准确预测 S波速度profile，与真实速度匹配得非常 closely。模型具有不同 S波速度profile 变化的tolerance，同时保证输出不受层数的影响。

A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport

paper_url: http://arxiv.org/abs/2310.14087
repo_url: None
paper_authors: Tianyi Lin, Marco Cuturi, Michael I. Jordan
for: This paper proposes a new method for solving optimal transport (OT) problems using a nonsmooth fixed-point model and a specialized semismooth Newton (SSN) method.
methods: The proposed method uses a nonsmooth fixed-point model and a specialized semismooth Newton (SSN) method to efficiently solve kernel-based OT problems.
results: The proposed method achieves a global convergence rate of $O(1/\sqrt{k})$ and a local quadratic convergence rate under standard regularity conditions, and shows substantial speedups over the short-step interior-point method (SSIPM) on both synthetic and real datasets.Here is the text in Simplified Chinese:
for: 这篇论文提出了一种使用非平滑稳点模型和特殊的半凝固新颖方法解决优Transport问题的新方法。
methods: 该方法使用非平滑稳点模型和特殊的半凝固新颖方法解决核心基于的优Transport问题。
results: 该方法实现了$O(1/\sqrt{k})$的全局收敛率和标准规定的本地二次收敛率，并在实验中显示了SSIPM的显著加速。

Abstract
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples. Recent works suggest that these estimators are more statistically efficient than plug-in (linear programming-based) OT estimators when comparing probability measures in high-dimensions~\citep{Vacher-2021-Dimension}. Unfortunately, that statistical benefit comes at a very steep computational price: because their computation relies on the short-step interior-point method (SSIPM), which comes with a large iteration count in practice, these estimators quickly become intractable w.r.t. sample size $n$. To scale these estimators to larger $n$, we propose a nonsmooth fixed-point model for the kernel-based OT problem, and show that it can be efficiently solved via a specialized semismooth Newton (SSN) method: We show, exploring the problem's structure, that the per-iteration cost of performing one SSN step can be significantly reduced in practice. We prove that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions. We show substantial speedups over SSIPM on both synthetic and real datasets.

摘要
kernel-based最优运输（OT）估计器提供了一种代替方法来解决OT问题，从样本中进行函数估计。据最新的研究表明，这些估计器在高维度下比插入（线性编程基于）OT估计器更有统计效率，但是计算成本却非常高：因为它们的计算基于短步内部点法（SSIPM），在实践中通常需要许多迭代。为了扩展这些估计器，我们提出了一种非光滑固定点模型，并证明可以通过特殊的半稳定新颖（SSN）方法高效解决。我们展示，通过利用问题的结构，每次SSN步骤的计算成本可以在实践中减少到一定程度。我们证明我们的SSN方法在全球收敛率为$O(1/\sqrt{k})$，以及当标准正则条件下的本地二阶收敛率。我们在 sintetic 和实际数据集上展示了显著的计算速度提升。

Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback

paper_url: http://arxiv.org/abs/2310.14085
repo_url: None
paper_authors: Michael I. Jordan, Tianyi Lin, Zhengyuan Zhou
for: 本研究旨在设计一个不需要先知 convexity/monotonicity 参数的在线梯度下降（OGD）算法，以便在单个代理和多个代理的情况下实现最优的 regret 和 Nash 平衡rate。
methods: 本研究使用了一种全部适应的 OGD 算法，称为 \textsf{AdaOGD}，它不需要先知 convexity/monotonicity 参数。在单个代理情况下，\textsf{AdaOGD} 可以实现 $O(\log^2(T))$ 的 regret，这是最优的结果，只有 log 因子的差异。在多个代理情况下，如果每个代理使用 \textsf{AdaOGD}，则共同行动会在 $O(\frac{\log^3 T}{T})$ 的速度 converges 到一个唯一的 Nash 平衡。
results: 本研究的结果显示，\textsf{AdaOGD} 可以在新闻vendor 问题中实现最优的 regret 和 Nash 平衡rate。此外，本研究还扩展到了更通用的 exp-concave 成本函数和游戏中，使用在线 Newton 步骤（ONS）算法。

Abstract
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $\Theta(\log T)$ for strongly convex cost functions; and (2) in the multi-agent setting of strongly monotone games, with each agent employing OGD, we obtain last-iterate convergence of the joint action to a unique Nash equilibrium at an optimal rate of $\Theta(\frac{1}{T})$. While these finite-time guarantees highlight its merits, OGD has the drawback that it requires knowing the strong convexity/monotonicity parameters. In this paper, we design a fully adaptive OGD algorithm, \textsf{AdaOGD}, that does not require a priori knowledge of these parameters. In the single-agent setting, our algorithm achieves $O(\log^2(T))$ regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs \textsf{AdaOGD} in strongly monotone games, the joint action converges in a last-iterate sense to a unique Nash equilibrium at a rate of $O(\frac{\log^3 T}{T})$, again optimal up to log factors. We illustrate our algorithms in a learning version of the classical newsvendor problem, where due to lost sales, only (noisy) gradient feedback can be observed. Our results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multi-retailer settings. We also extend our results to the more general setting of exp-concave cost functions and games, using the online Newton step (ONS) algorithm.

摘要
在线梯度下降（OGD）在强连续或强 monotonicity assumption下是著名的双优化算法：（1）在单个代理Setting中，它可以在强连续成本函数下 achieve óptimal regret of $\Theta(\log T)$;（2）在多个代理Setting中，每个代理使用 OGD，我们可以获得last-iterate 协调的joint action converges to a unique Nash equilibrium at an optimal rate of $\Theta(\frac{1}{T})$. 虽然这些 finite-time guarantees highlight its merits, OGD 需要先知道强连续/MONOTONICITY 参数。在这篇论文中，我们设计了一个完全适应的 OGD 算法，\textsf{AdaOGD}，不需要先知道这些参数。在单个代理Setting中，我们的算法可以 achieve $O(\log^2(T))$ regret under strong convexity, which is optimal up to a log factor。此外，如果每个代理使用 \textsf{AdaOGD} 在强MONOTONICITY games中，joint action 会在 last-iterate sense converge to a unique Nash equilibrium at a rate of $O(\frac{\log^3 T}{T})$, again optimal up to log factors。我们在学习版新闻 vendor problem中 illustrate 我们的算法，因为lost sales，只能观察（噪音）梯度反馈。我们的结果 immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multi-retailer settings。我们还将我们的结果推广到更一般的 exp-concave cost functions 和 games，使用 online Newton step（ONS）算法。

Graph Neural Networks and Applied Linear Algebra

paper_url: http://arxiv.org/abs/2310.14084
repo_url: https://github.com/sandialabs/gnn-applied-linear-algebra
paper_authors: Nicholas S. Moore, Eric C. Cyr, Peter Ohm, Christopher M. Siefert, Raymond S. Tuminaro
for: 本文主要针对数学 linear algebra 领域的科学计算问题，探讨如何使用 neural network (NN) 来解决 sparse matrix 计算问题。
methods: 本文使用 graph neural network (GNN) 方法来处理 sparse matrix 计算问题，GNN 定义了聚合函数（例如和），可以操作变量大小的输入数据，生成固定大小的输出数据，以便应用 MLP 来解决问题。
results: 本文通过提供了多个实际例子，示出了如何使用 GNN 来解决一些常见的 linear algebra 任务，包括matrix-vector 乘法、插值、松弛方法和连接度度量。

Abstract
Sparse matrix computations are ubiquitous in scientific computing. With the recent interest in scientific machine learning, it is natural to ask how sparse matrix computations can leverage neural networks (NN). Unfortunately, multi-layer perceptron (MLP) neural networks are typically not natural for either graph or sparse matrix computations. The issue lies with the fact that MLPs require fixed-sized inputs while scientific applications generally generate sparse matrices with arbitrary dimensions and a wide range of nonzero patterns (or matrix graph vertex interconnections). While convolutional NNs could possibly address matrix graphs where all vertices have the same number of nearest neighbors, a more general approach is needed for arbitrary sparse matrices, e.g. arising from discretized partial differential equations on unstructured meshes. Graph neural networks (GNNs) are one approach suitable to sparse matrices. GNNs define aggregation functions (e.g., summations) that operate on variable size input data to produce data of a fixed output size so that MLPs can be applied. The goal of this paper is to provide an introduction to GNNs for a numerical linear algebra audience. Concrete examples are provided to illustrate how many common linear algebra tasks can be accomplished using GNNs. We focus on iterative methods that employ computational kernels such as matrix-vector products, interpolation, relaxation methods, and strength-of-connection measures. Our GNN examples include cases where parameters are determined a-priori as well as cases where parameters must be learned. The intent with this article is to help computational scientists understand how GNNs can be used to adapt machine learning concepts to computational tasks associated with sparse matrices. It is hoped that this understanding will stimulate data-driven extensions of classical sparse linear algebra tasks.

摘要
科学计算中的稀疏矩阵计算是普遍存在的。随着科学机器学习的兴趣增长，可以问题是如何使用神经网络（NN）来利用稀疏矩阵计算。然而，多层感知网络（MLP）通常不适用于图或稀疏矩阵计算，因为MLP需要固定大小的输入，而科学应用通常会生成稀疏矩阵，其维度和非零强相互关系是不固定的。而卷积神经网络（CNN）可能可以解决图格上的稀疏矩阵问题，但是需要一种更通用的方法来解决任意稀疏矩阵问题，例如由不同维度的几何函数所生成的稀疏矩阵。图神经网络（GNN）是一种适用于稀疏矩阵的方法，GNN定义了聚合函数（例如和），这些函数可以在变量大小的输入数据上运行，以生成固定大小的输出数据，从而使得MLP可以应用。本文的目标是为数 Linear Algebra 读者提供 GNN 的引入，并提供一些具体的例子来说明如何使用 GNN 来完成许多常见的Linear Algebra 任务。我们关注到了迭代方法，包括矩阵-向量产品、 interpolate、 relaxation 方法和 strength-of-connection 度量。我们的 GNN 例子包括参数由先前确定的情况以及参数需要学习的情况。我们希望通过这篇文章，让计算科学家理解 GNN 如何用于改变机器学习概念，以便在稀疏矩阵计算中进行数据驱动的扩展。

Counterfactual Prediction Under Selective Confounding

paper_url: http://arxiv.org/abs/2310.14064
repo_url: https://github.com/sohaib730/causalml
paper_authors: Sohaib Kiani, Jared Barton, Jon Sushinsky, Lynda Heimbach, Bo Luo
for: 本研究旨在Addressing the challenge of conducting interpretable causal inference between a binary treatment and its resulting outcome when not all confounders are known.
methods: 我们提出了一种基于Selective Confounding的方法，使用双调sample来实现。这种方法可以在多个决策者与不同政策存在的情况下进行适应。
results: 我们提供了both theoretical error bounds和实际证据，证明了我们的方法的有效性。此外，我们还介绍了三种特定于儿童分配场景的评价方法，以增强透明性和可解性。

Abstract
This research addresses the challenge of conducting interpretable causal inference between a binary treatment and its resulting outcome when not all confounders are known. Confounders are factors that have an influence on both the treatment and the outcome. We relax the requirement of knowing all confounders under desired treatment, which we refer to as Selective Confounding, to enable causal inference in diverse real-world scenarios. Our proposed scheme is designed to work in situations where multiple decision-makers with different policies are involved and where there is a re-evaluation mechanism after the initial decision to ensure consistency. These assumptions are more practical to fulfill compared to the availability of all confounders under all treatments. To tackle the issue of Selective Confounding, we propose the use of dual-treatment samples. These samples allow us to employ two-step procedures, such as Regression Adjustment or Doubly-Robust, to learn counterfactual predictors. We provide both theoretical error bounds and empirical evidence of the effectiveness of our proposed scheme using synthetic and real-world child placement data. Furthermore, we introduce three evaluation methods specifically tailored to assess the performance in child placement scenarios. By emphasizing transparency and interpretability, our approach aims to provide decision-makers with a valuable tool. The source code repository of this work is located at https://github.com/sohaib730/CausalML.

摘要

On discretisation drift and smoothness regularisation in neural network training

paper_url: http://arxiv.org/abs/2310.14036
repo_url: None
paper_authors: Mihaela Claudia Rosca
for: 本研究的目的是更深入地理解深度学习，尤其是优化和模型规范化。
methods: 本研究使用了梯度下降（GD）和负梯度流（NGF）等优化算法，以及新的连续时间流动来研究GD的动态。
results: 研究发现，在超级vised学习和两个玩家游戏中训练不稳定的问题可以通过构建新的学习率时间表和规范来解决。此外，研究还发现了在各种深度学习领域中细eness规范对优化的影响，并在强化学习中添加细eness规范后得到了性能提升。

Abstract
The deep learning recipe of casting real-world problems as mathematical optimisation and tackling the optimisation by training deep neural networks using gradient-based optimisation has undoubtedly proven to be a fruitful one. The understanding behind why deep learning works, however, has lagged behind its practical significance. We aim to make steps towards an improved understanding of deep learning with a focus on optimisation and model regularisation. We start by investigating gradient descent (GD), a discrete-time algorithm at the basis of most popular deep learning optimisation algorithms. Understanding the dynamics of GD has been hindered by the presence of discretisation drift, the numerical integration error between GD and its often studied continuous-time counterpart, the negative gradient flow (NGF). To add to the toolkit available to study GD, we derive novel continuous-time flows that account for discretisation drift. Unlike the NGF, these new flows can be used to describe learning rate specific behaviours of GD, such as training instabilities observed in supervised learning and two-player games. We then translate insights from continuous time into mitigation strategies for unstable GD dynamics, by constructing novel learning rate schedules and regularisers that do not require additional hyperparameters. Like optimisation, smoothness regularisation is another pillar of deep learning's success with wide use in supervised learning and generative modelling. Despite their individual significance, the interactions between smoothness regularisation and optimisation have yet to be explored. We find that smoothness regularisation affects optimisation across multiple deep learning domains, and that incorporating smoothness regularisation in reinforcement learning leads to a performance boost that can be recovered using adaptions to optimisation methods.

摘要
深度学习的制作方法，将现实世界的问题化为数学优化问题，并使用深度神经网络进行加速器基于梯度下降优化，不疑而是成功的。然而，深度学习的工作原理尚未得到充分的理解。我们想要做出一些探索深度学习的尝试，特icularly focusing on optimization and model regularization. We start by investigating gradient descent (GD), a discrete-time algorithm that is the foundation of most popular deep learning optimization algorithms. Understanding the dynamics of GD has been hindered by the presence of discretization drift, the numerical integration error between GD and its continuous-time counterpart, the negative gradient flow (NGF). To add to the toolkit available to study GD, we derive novel continuous-time flows that account for discretization drift. Unlike the NGF, these new flows can be used to describe learning rate specific behaviors of GD, such as training instabilities observed in supervised learning and two-player games. We then translate insights from continuous time into mitigation strategies for unstable GD dynamics, by constructing novel learning rate schedules and regularizers that do not require additional hyperparameters. Smoothness regularization is another pillar of deep learning's success, with wide use in supervised learning and generative modeling. Despite their individual significance, the interactions between smoothness regularization and optimization have yet to be explored. We find that smoothness regularization affects optimization across multiple deep learning domains, and that incorporating smoothness regularization in reinforcement learning leads to a performance boost that can be recovered using adaptions to optimization methods.

Filling the Missing: Exploring Generative AI for Enhanced Federated Learning over Heterogeneous Mobile Edge Devices

paper_url: http://arxiv.org/abs/2310.13981
repo_url: None
paper_authors: Peichun Li, Hanwen Zhang, Yuan Wu, Liping Qian, Rong Yu, Dusit Niyato, Xuemin Shen
for: 提高分布式人工智能模型训练过程中的Edge网络中设备的数据和资源不同性问题。
methods: 提议使用生成AI技术实现 federated learning，通过填充地方数据的想法来解决数据不同性问题，同时提高训练效率和设备资源利用率。
results: 实验结果表明，使用FIMI可以将设备 сторо的能源减少50%，同时达到global测试准确率目标，并且在非独立同分布（non-IID）数据下，FIMI可以显著提高全球准确率。

Abstract
Distributed Artificial Intelligence (AI) model training over mobile edge networks encounters significant challenges due to the data and resource heterogeneity of edge devices. The former hampers the convergence rate of the global model, while the latter diminishes the devices' resource utilization efficiency. In this paper, we propose a generative AI-empowered federated learning to address these challenges by leveraging the idea of FIlling the MIssing (FIMI) portion of local data. Specifically, FIMI can be considered as a resource-aware data augmentation method that effectively mitigates the data heterogeneity while ensuring efficient FL training. We first quantify the relationship between the training data amount and the learning performance. We then study the FIMI optimization problem with the objective of minimizing the device-side overall energy consumption subject to required learning performance constraints. The decomposition-based analysis and the cross-entropy searching method are leveraged to derive the solution, where each device is assigned suitable AI-synthesized data and resource utilization policy. Experiment results demonstrate that FIMI can save up to 50% of the device-side energy to achieve the target global test accuracy in comparison with the existing methods. Meanwhile, FIMI can significantly enhance the converged global accuracy under the non-independently-and-identically distribution (non-IID) data.

摘要
分布式人工智能（AI）模型训练在移动边缘网络上遇到了数据和资源不一致的问题。前者阻碍全球模型的吞吐率，而后者降低设备的资源利用效率。在这篇论文中，我们提出了基于生成AI的联邦学习来解决这些问题，利用了填充缺失（FIMI）的想法。 Specifically, FIMI可以看作是一种资源意识的数据扩充方法，有效地缓解数据不一致性，同时保证了有效的联邦学习训练。我们首先量化了训练数据量和学习性能之间的关系。然后，我们研究了FIMI优化问题，即将设备侧总能 consumption最小化，保证学习性能要求。通过分解分析和十字推测法，我们 derivated解决方案，每个设备都被分配了适合的AI生成的数据和资源利用策略。实验结果表明，FIMI可以将设备侧能 consumption降低至50%，以达到targettest accuracy。同时，FIMI可以在非独立同一分布（non-IID）数据下显著提高全球准确率。

Continual Invariant Risk Minimization

paper_url: http://arxiv.org/abs/2310.13977
repo_url: None
paper_authors: Francesco Alesiani, Shujian Yu, Mathias Niepert
for: 本研究旨在提出一种基于环境变换的连续学习方法，以便在不同环境中学习模型能够捕捉到共同特征表示。
methods: 本研究使用了 invariant risk minimization（IRM）方法，即在所有环境都可用于学习系统上的方法。另外，本研究还提出了一种基于变分 Bayesian 和双层框架的扩展方法，以满足连续学习中的共同特征捕捉。
results: 研究表明，使用提出的方法可以在多个数据集和多个顺序环境中，与之前的方法相比或与之比肩，提高连续学习中的模型性能。

Abstract
Empirical risk minimization can lead to poor generalization behavior on unseen environments if the learned model does not capture invariant feature representations. Invariant risk minimization (IRM) is a recent proposal for discovering environment-invariant representations. IRM was introduced by Arjovsky et al. (2019) and extended by Ahuja et al. (2020). IRM assumes that all environments are available to the learning system at the same time. With this work, we generalize the concept of IRM to scenarios where environments are observed sequentially. We show that existing approaches, including those designed for continual learning, fail to identify the invariant features and models across sequentially presented environments. We extend IRM under a variational Bayesian and bilevel framework, creating a general approach to continual invariant risk minimization. We also describe a strategy to solve the optimization problems using a variant of the alternating direction method of multiplier (ADMM). We show empirically using multiple datasets and with multiple sequential environments that the proposed methods outperform or is competitive with prior approaches.

摘要
empirical risk minimization可能会导致在未看到的环境中的差异化行为，如果学习的模型没有捕捉环境不变的特征表示。不变风险最小化(IRM)是一种最近提出的环境不变表示发现方法。IRM由Arjovsky et al.（2019）引入并由Ahuja et al.（2020）扩展。IRM假设所有环境都可以同时给学习系统提供。在这种情况下，我们推广了IRM的概念，以适应sequentially presented environments中的环境。我们表明，包括 continual learning的方法在内的现有方法无法在sequentially presented environments中标识不变的特征和模型。我们将IRM扩展为variational Bayesian和bilateral框架，创造一种总体的持续不变风险最小化方法。我们还描述了使用一种变体的alternating direction method of multiplier(ADMM)来解决优化问题的策略。我们通过多个数据集和多个sequential environments的实验表明，提posed方法可以比或与之前的方法相比性。

ASBART:Accelerated Soft Bayes Additive Regression Trees

paper_url: http://arxiv.org/abs/2310.13975
repo_url: https://github.com/richael008/xsbart
paper_authors: Hao Ran, Yang Bai
For: This paper proposes a new variant of Bayesian additive regression trees (BART) called accelerate soft BART (ASBART), which improves the speed of the existing Soft BART model while maintaining comparable accuracy.* Methods: The proposed ASBART method uses a new algorithm that is about 10 times faster than the default Soft BART method, making it more practical for real-world applications.* Results: Simulation studies show that ASBART has comparable accuracy to Soft BART, while being significantly faster in terms of computational speed. The code for ASBART is open-source and available online.

Abstract
Bayes additive regression trees(BART) is a nonparametric regression model which has gained wide-spread popularity in recent years due to its flexibility and high accuracy of estimation. Soft BART,one variation of BART,improves both practically and heoretically on existing Bayesian sum-of-trees models. One bottleneck for Soft BART is its slow speed in the long MCMC loop. Compared to BART,it use more than about 20 times to complete the calculation with the default setting. We proposed a variant of BART named accelerate Soft BART(ASBART). Simulation studies show that the new method is about 10 times faster than the Soft BART with comparable accuracy. Our code is open-source and available at https://github.com/richael008/XSBART.

摘要
bayes 添加 regresión árboles (BART) 是一种非 Parametric 回归模型，在过去几年内得到了广泛的推广和应用，因为它的灵活性和估计精度高。软 BART，BART 的一种变体，在现有的 Bayesian 汇集树模型上进行了改进，从理论和实践两个方面来说，它提高了回归预测的精度。然而，Soft BART 的长MCMC循环速度比较慢，相比 BART，它需要大约 20 倍的计算时间。我们提出了一种加速 Soft BART 的方法，称为加速 Soft BART (ASBART)。根据 simulations 的研究，新方法比 Soft BART 快约 10 倍，并且与其相比，它们在精度方面几乎相同。我们的代码开源在 GitHub 上，可以在获取。

Distributed Linear Regression with Compositional Covariates

paper_url: http://arxiv.org/abs/2310.13969
repo_url: None
paper_authors: Yue Chao, Lei Huang, Xuejun Ma
for: 解决大数据集中分布式统计方法和计算问题。
methods: 提议了两种分布式优化技术，一种是基于ADMM框架的中央化优化算法，另一种是基于CDMM框架的分布式坐标下降算法。
results: 通过对真实数据和 sintetic数据进行数值实验，证明了提议的算法的有效性和可靠性。

Abstract
With the availability of extraordinarily huge data sets, solving the problems of distributed statistical methodology and computing for such data sets has become increasingly crucial in the big data area. In this paper, we focus on the distributed sparse penalized linear log-contrast model in massive compositional data. In particular, two distributed optimization techniques under centralized and decentralized topologies are proposed for solving the two different constrained convex optimization problems. Both two proposed algorithms are based on the frameworks of Alternating Direction Method of Multipliers (ADMM) and Coordinate Descent Method of Multipliers(CDMM, Lin et al., 2014, Biometrika). It is worth emphasizing that, in the decentralized topology, we introduce a distributed coordinate-wise descent algorithm based on Group ADMM(GADMM, Elgabli et al., 2020, Journal of Machine Learning Research) for obtaining a communication-efficient regularized estimation. Correspondingly, the convergence theories of the proposed algorithms are rigorously established under some regularity conditions. Numerical experiments on both synthetic and real data are conducted to evaluate our proposed algorithms.

摘要
随着庞大数据集的可用性的提高，解决分布式统计方法和计算问题已成为大数据领域中越来越重要的问题。在这篇论文中，我们关注于巨大compositional数据中的分布式稀缺假设模型。我们提出了两种分布式优化技术，一种是基于中央化topology，另一种是基于分布式topology。两种提出的算法都基于ADMM和CDMM框架(Lin et al., 2014, Biometrika)。在分布式topology中，我们提出了一种分布式坐标点wise降降算法基于Group ADMM(Elgabli et al., 2020, Journal of Machine Learning Research)，以实现通信效率的Regularized估计。对于提出的算法，我们也证明了其 convergence的理论基础。在实验中，我们使用 sintetic和实际数据进行了数值测试，以评估我们的提出算法。

Minimax Optimal Transfer Learning for Kernel-based Nonparametric Regression

paper_url: http://arxiv.org/abs/2310.13966
repo_url: None
paper_authors: Chao Wang, Caixing Wang, Xin He, Xingdong Feng
for: This paper focuses on investigating the transfer learning problem within the context of nonparametric regression over a reproducing kernel Hilbert space, with the aim of bridging the gap between practical effectiveness and theoretical guarantees.
methods: The proposed method uses kernel ridge regression for the known transferable source case, and an efficient aggregation algorithm for the unknown case, which can automatically detect and alleviate the effects of negative sources.
results: The paper provides the statistical properties of the desired estimators and establishes the minimax optimal rate, and through extensive numerical experiments on synthetic data and real examples, the effectiveness of the proposed method is validated.

Abstract
In recent years, transfer learning has garnered significant attention in the machine learning community. Its ability to leverage knowledge from related studies to improve generalization performance in a target study has made it highly appealing. This paper focuses on investigating the transfer learning problem within the context of nonparametric regression over a reproducing kernel Hilbert space. The aim is to bridge the gap between practical effectiveness and theoretical guarantees. We specifically consider two scenarios: one where the transferable sources are known and another where they are unknown. For the known transferable source case, we propose a two-step kernel-based estimator by solely using kernel ridge regression. For the unknown case, we develop a novel method based on an efficient aggregation algorithm, which can automatically detect and alleviate the effects of negative sources. This paper provides the statistical properties of the desired estimators and establishes the minimax optimal rate. Through extensive numerical experiments on synthetic data and real examples, we validate our theoretical findings and demonstrate the effectiveness of our proposed method.

摘要
Two scenarios are considered: one where the transferable sources are known, and another where they are unknown. For the known transferable source case, a two-step kernel-based estimator is proposed, solely using kernel ridge regression. For the unknown case, a novel method based on an efficient aggregation algorithm is developed, which can automatically detect and alleviate the effects of negative sources.The statistical properties of the desired estimators are provided, and the minimax optimal rate is established. Extensive numerical experiments on synthetic data and real examples are conducted to validate the theoretical findings and demonstrate the effectiveness of the proposed method.

Toward Generative Data Augmentation for Traffic Classification

paper_url: http://arxiv.org/abs/2310.13935
repo_url: None
paper_authors: Chao Wang, Alessandro Finamore, Pietro Michiardi, Massimo Gallo, Dario Rossi
for: 本研究旨在探讨数据增强技术在网络应用中的可行性，特别是在流量分类领域。
methods: 本研究采用了14种手动设计的数据增强策略，应用于MIRAGE19 dataset。
results: 研究结果显示，数据增强可以在流量分类中提供未曾被探讨的优势，同时促进了使用生成模型自动设计数据增强策略的研究课程。

Abstract
Data Augmentation (DA)-augmenting training data with synthetic samples-is wildly adopted in Computer Vision (CV) to improve models performance. Conversely, DA has not been yet popularized in networking use cases, including Traffic Classification (TC). In this work, we present a preliminary study of 14 hand-crafted DAs applied on the MIRAGE19 dataset. Our results (i) show that DA can reap benefits previously unexplored in TC and (ii) foster a research agenda on the use of generative models to automate DA design.

摘要
<> translate "Data Augmentation (DA)-augmenting training data with synthetic samples-is wildly adopted in Computer Vision (CV) to improve models performance. Conversely, DA has not been yet popularized in networking use cases, including Traffic Classification (TC). In this work, we present a preliminary study of 14 hand-crafted DAs applied on the MIRAGE19 dataset. Our results (i) show that DA can reap benefits previously unexplored in TC and (ii) foster a research agenda on the use of generative models to automate DA design.">以下是文本的Simplified Chinese翻译：<>计算机视觉（CV）中广泛采用数据扩充（DA）技术，卷积神经网络性能。然而，在网络应用场景中，包括流量分类（TC），DA还没有得到普及。在这项工作中，我们对MIRAGE19数据集上手工设计了14种DA，并进行了初步研究。我们的结果表明，DA可以在TC中获得未曾提及的利益，同时也激发了使用生成模型自动化DA设计的研究论坛。

Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation

paper_url: http://arxiv.org/abs/2310.13923
repo_url: https://github.com/zfancy/divoe
paper_authors: Jianing Zhu, Geng Yu, Jiangchao Yao, Tongliang Liu, Gang Niu, Masashi Sugiyama, Bo Han
for:本研究旨在提高机器学习模型在实际应用中的可靠性，通过进行outsider暴露（OOD）检测。methods:本研究提出了一种新的框架，即多样化外围暴露（DivOE），通过在训练过程中使用多亮示的auxiliary outliers来实现有效的OOD检测。results: DivOE通过在训练过程中生成更多的外围样本，以便在ID和OOD数据之间找到更多的分界点，从而提高OOD检测的准确性。

Abstract
Out-of-distribution (OOD) detection is important for deploying reliable machine learning models on real-world applications. Recent advances in outlier exposure have shown promising results on OOD detection via fine-tuning model with informatively sampled auxiliary outliers. However, previous methods assume that the collected outliers can be sufficiently large and representative to cover the boundary between ID and OOD data, which might be impractical and challenging. In this work, we propose a novel framework, namely, Diversified Outlier Exposure (DivOE), for effective OOD detection via informative extrapolation based on the given auxiliary outliers. Specifically, DivOE introduces a new learning objective, which diversifies the auxiliary distribution by explicitly synthesizing more informative outliers for extrapolation during training. It leverages a multi-step optimization method to generate novel outliers beyond the original ones, which is compatible with many variants of outlier exposure. Extensive experiments and analyses have been conducted to characterize and demonstrate the effectiveness of the proposed DivOE. The code is publicly available at: https://github.com/tmlr-group/DivOE.

摘要
OUT-OF-DISTRIBUTION (OOD) 检测是在实际应用中部署可靠机器学习模型的重要问题。 latest advances in outlier exposure have shown promising results on OOD detection via fine-tuning model with informatively sampled auxiliary outliers。 However, previous methods assume that the collected outliers can be sufficiently large and representative to cover the boundary between ID and OOD data, which might be impractical and challenging。 In this work, we propose a novel framework, namely, Diversified Outlier Exposure (DivOE), for effective OOD detection via informative extrapolation based on the given auxiliary outliers。 Specifically, DivOE introduces a new learning objective, which diversifies the auxiliary distribution by explicitly synthesizing more informative outliers for extrapolation during training。 It leverages a multi-step optimization method to generate novel outliers beyond the original ones, which is compatible with many variants of outlier exposure。 Extensive experiments and analyses have been conducted to characterize and demonstrate the effectiveness of the proposed DivOE。 The code is publicly available at: https://github.com/tmlr-group/DivOE。Note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.

Equivariant Map and Agent Geometry for Autonomous Driving Motion Prediction

paper_url: http://arxiv.org/abs/2310.13922
repo_url: None
paper_authors: Yuping Wang, Jier Chen
for: 这种研究旨在解决自动驾驶中的深度学习帮助预测运动的问题，具体来说是确保运动的准确性和稳定性。
methods: 这种研究使用了一种名为EqMotion的新型动作预测模型，该模型具有几何变换等价性和人工对话等变换性，从而使得模型在不同的坐标系下仍然可以准确预测动作。此外，该研究还引入了一种具有几何变换等价性的高清地图处理方法，以增强网络的空间理解。
results: 该研究表明，通过使用EqMotion模型和高清地图处理方法，可以实现高精度的动作预测，同时具有轻量级的设计和高效的数据利用。

Abstract
In autonomous driving, deep learning enabled motion prediction is a popular topic. A critical gap in traditional motion prediction methodologies lies in ensuring equivariance under Euclidean geometric transformations and maintaining invariant interaction relationships. This research introduces a groundbreaking solution by employing EqMotion, a theoretically geometric equivariant and interaction invariant motion prediction model for particles and humans, plus integrating agent-equivariant high-definition (HD) map features for context aware motion prediction in autonomous driving. The use of EqMotion as backbone marks a significant departure from existing methods by rigorously ensuring motion equivariance and interaction invariance. Equivariance here implies that an output motion must be equally transformed under the same Euclidean transformation as an input motion, while interaction invariance preserves the manner in which agents interact despite transformations. These properties make the network robust to arbitrary Euclidean transformations and contribute to more accurate prediction. In addition, we introduce an equivariant method to process the HD map to enrich the spatial understanding of the network while preserving the overall network equivariance property. By applying these technologies, our model is able to achieve high prediction accuracy while maintain a lightweight design and efficient data utilization.

摘要
自主驾驶中，深度学习启用的动作预测是一个受欢迎的话题。传统的动作预测方法存在一个重要的缺陷，即保证动作 equivariant 和 interaction invariant。这项研究提出了一种创新的解决方案，通过使用EqMotion，一种 theoretically 几何 equivariant 和 interaction invariant 的动作预测模型，以及 Agent-equivariant HD map 特征进行上下文意识激活的动作预测。使用EqMotion 作为基础marks a significant departure from existing methods， rigorously ensuring motion equivariance and interaction invariance。 equivariance 在这里意味着输入动作下的输出动作必须在同一个几何变换下具有相同的变换，而 interaction invariance 保持了代理人之间的交互方式不变，即使在变换下。这些特性使得网络具有对任意几何变换的 Robustness 和更高的预测精度。此外，我们还引入了一种具有 equivariance 性的方法来处理 HD map，以激活网络的空间理解，而不损失整体网络的 equivariance 性。通过应用这些技术，我们的模型能够实现高精度的预测，同时具有轻量级的设计和高效的数据利用。

Southern Ocean Dynamics Under Climate Change: New Knowledge Through Physics-Guided Machine Learning

paper_url: http://arxiv.org/abs/2310.13916
repo_url: https://github.com/yikwill/THOR-MOM6
paper_authors: William Yik, Maike Sonnewald, Mariana C. A. Clare, Redouane Lguensat
for: 本研究旨在理解由气候变化引起的南极环流Current的变化，以及这些变化对南极环流Current的影响。
methods: 本研究使用了Tracking global Heating with Ocean Regimes（THOR）方法，将高分辨率气候模型数据分解成不同的物理 régime，并使用神经网络模型预测这些 régime的变化。
results: 研究发现，随着气候变化，南极环流Current在interactions with the Pacific-Antarctic Ridge region发生了 régime shift，其中流速增强，而 bathymetry的 dominant dynamical role weakens。

Abstract
Complex ocean systems such as the Antarctic Circumpolar Current play key roles in the climate, and current models predict shifts in their strength and area under climate change. However, the physical processes underlying these changes are not well understood, in part due to the difficulty of characterizing and tracking changes in ocean physics in complex models. To understand changes in the Antarctic Circumpolar Current, we extend the method Tracking global Heating with Ocean Regimes (THOR) to a mesoscale eddy permitting climate model and identify regions of the ocean characterized by similar physics, called dynamical regimes, using readily accessible fields from climate models. To this end, we cluster grid cells into dynamical regimes and train an ensemble of neural networks to predict these regimes and track them under climate change. Finally, we leverage this new knowledge to elucidate the dynamics of regime shifts. Here we illustrate the value of this high-resolution version of THOR, which allows for mesoscale turbulence, with a case study of the Antarctic Circumpolar Current and its interactions with the Pacific-Antarctic Ridge. In this region, THOR specifically reveals a shift in dynamical regime under climate change driven by changes in wind stress and interactions with bathymetry. Using this knowledge to guide further exploration, we find that as the Antarctic Circumpolar Current shifts north under intensifying wind stress, the dominant dynamical role of bathymetry weakens and the flow strengthens.

摘要
COMPLEX ocean systems, such as the Antarctic Circumpolar Current, play key roles in the climate, and current models predict shifts in their strength and area under climate change. However, the physical processes underlying these changes are not well understood, in part due to the difficulty of characterizing and tracking changes in ocean physics in complex models. To understand changes in the Antarctic Circumpolar Current, we extend the method Tracking global Heating with Ocean Regimes (THOR) to a mesoscale eddy permitting climate model and identify regions of the ocean characterized by similar physics, called dynamical regimes, using readily accessible fields from climate models. To this end, we cluster grid cells into dynamical regimes and train an ensemble of neural networks to predict these regimes and track them under climate change. Finally, we leverage this new knowledge to elucidate the dynamics of regime shifts. Here we illustrate the value of this high-resolution version of THOR, which allows for mesoscale turbulence, with a case study of the Antarctic Circumpolar Current and its interactions with the Pacific-Antarctic Ridge. In this region, THOR specifically reveals a shift in dynamical regime under climate change driven by changes in wind stress and interactions with bathymetry. Using this knowledge to guide further exploration, we find that as the Antarctic Circumpolar Current shifts north under intensifying wind stress, the dominant dynamical role of bathymetry weakens and the flow strengthens.

Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models

paper_url: http://arxiv.org/abs/2310.13913
repo_url: None
paper_authors: Lihang Liu, Donglong He, Xianbin Ye, Shanzhuo Zhang, Xiaonan Zhang, Jingbo Zhou, Jun Li, Hua Chai, Fan Wang, Jingzhou He, Liang Zheng, Yonghui Li, Xiaomin Fang
for: 这篇论文旨在提高分子对抗位置预测的精确性，以推动人工智能驱动的药物探索。
methods: 这篇论文使用了深度学习技术来强化分子对抗位置预测，并且使用了大量的蛋白质和小分子组合数据进行预训。
results: 相比传统物理学基于的基底方法和深度学习基于的基底方法，这篇论文的HelixDock方法在复杂的测试集上表现出了superiority，尤其是在蛋白质和小分子之间的对抗位置预测方面。

Abstract
Molecular docking, a pivotal computational tool for drug discovery, predicts the binding interactions between small molecules (ligands) and target proteins (receptors). Conventional physics-based docking tools, though widely used, face limitations in precision due to restricted conformational sampling and imprecise scoring functions. Recent endeavors have employed deep learning techniques to enhance docking accuracy, but their generalization remains a concern due to limited training data. Leveraging the success of extensive and diverse data in other domains, we introduce HelixDock, a novel approach for site-specific molecular docking. Hundreds of millions of binding poses are generated by traditional docking tools, encompassing diverse protein targets and small molecules. Our deep learning-based docking model, a SE(3)-equivariant network, is pre-trained with this large-scale dataset and then fine-tuned with a small number of precise receptor-ligand complex structures. Comparative analyses against physics-based and deep learning-based baseline methods highlight HelixDock's superiority, especially on challenging test sets. Our study elucidates the scaling laws of the pre-trained molecular docking models, showcasing consistent improvements with increased model parameters and pre-train data quantities. Harnessing the power of extensive and diverse generated data holds promise for advancing AI-driven drug discovery.

摘要
分子停靠，一种重要的计算工具，预测小分子（抗体）与目标蛋白（受体）之间的绑定交互。传统的物理学基于的停靠工具，尽管广泛使用，但受限于精确性，因为它们只能进行有限的配置检查和不准确的评分函数。现在的努力是使用深度学习技术来提高停靠精度，但它们的泛化仍然是一个问题，因为它们只有有限的训练数据。我们在其他领域中的丰富和多样化数据的基础上引入了 HelixDock，一种新的方法。我们使用传统的停靠工具生成了数百万个绑定位置，涵盖了多种蛋白目标和小分子。我们的深度学习基于的停靠模型，一个SE(3)相似的网络，通过大规模数据集预训练，然后精度地调整了一个小数量的准确抗体-小分子复合结构。与物理学基于和深度学习基于的基线方法进行比较分析，HelixDock在具有挑战性的测试集上表现出优异性，特别是在难以预测的情况下。我们的研究描述了预训练分子停靠模型的涨幅法律，展示了随着模型参数和预训练数据量的增加，模型的性能得到了一致提高。通过利用广泛生成的数据来提高人工智能驱动的药物探索，我们希望能够推动这一领域的发展。

Towards Hyperparameter-Agnostic DNN Training via Dynamical System Insights

paper_url: http://arxiv.org/abs/2310.13901
repo_url: None
paper_authors: Carmel Fiscko, Aayushya Agarwal, Yihan Ruan, Soummya Kar, Larry Pileggi, Bruno Sinopoli
for:ECCO-DNN is designed to optimize deep neural network training.methods:ECCO-DNN uses a stochastic first-order optimization method that models the optimization variable trajectory as a dynamical system and adaptively selects step sizes based on the trajectory’s shape.results:ECCO-DNN achieves comparable performance to state-of-the-art optimizers including ADAM, SGD, RMSProp, and AdaGrad, and its single hyperparameter can be changed by three orders of magnitude without affecting the trained models’ accuracies. Additionally, ECCO-DNN is insensitive to hyperparameter variations and reduces the data and computation needed for hyperparameter tuning, making it advantageous for rapid prototyping and for applications with new datasets.

Abstract
We present a stochastic first-order optimization method specialized for deep neural networks (DNNs), ECCO-DNN. This method models the optimization variable trajectory as a dynamical system and develops a discretization algorithm that adaptively selects step sizes based on the trajectory's shape. This provides two key insights: designing the dynamical system for fast continuous-time convergence and developing a time-stepping algorithm to adaptively select step sizes based on principles of numerical integration and neural network structure. The result is an optimizer with performance that is insensitive to hyperparameter variations and that achieves comparable performance to state-of-the-art optimizers including ADAM, SGD, RMSProp, and AdaGrad. We demonstrate this in training DNN models and datasets, including CIFAR-10 and CIFAR-100 using ECCO-DNN and find that ECCO-DNN's single hyperparameter can be changed by three orders of magnitude without affecting the trained models' accuracies. ECCO-DNN's insensitivity reduces the data and computation needed for hyperparameter tuning, making it advantageous for rapid prototyping and for applications with new datasets. To validate the efficacy of our proposed optimizer, we train an LSTM architecture on a household power consumption dataset with ECCO-DNN and achieve an optimal mean-square-error without tuning hyperparameters.

摘要
我们提出了一种随机首频优化方法特化于深度神经网络（DNN），即ECCO-DNN。该方法将优化变量轨迹模型为动力系统，并开发了一种适应步长选择算法，以获得两个关键发现：设计动力系统以实现快速连续时间减少，并开发一种时间步骤算法以自适应选择步长，基于数值积分和神经网络结构。这些设计决策使ECCO-DNN的优化器性能免疫参数变化的影响，并与现有的优化器，包括ADAM、SGD、RMSProp和AdaGrad，具有相同的性能。我们在训练DNN模型和数据集中使用ECCO-DNN，包括CIFAR-10和CIFAR-100，并发现ECCO-DNN的单个超参数可以通过三个级别的变化而不影响训练模型的准确性。ECCO-DNN的不敏感性降低了数据和计算所需的 hyperparameter 调试，使其在快速原型和新数据集应用中更加优势。为证明我们提出的优化器的有效性，我们在一个家用电力消耗数据集上使用ECCO-DNN训练LSTM架构，并实现了最佳平均方差值，无需调整超参数。

Masked Hard-Attention Transformers and Boolean RASP Recognize Exactly the Star-Free Languages

paper_url: http://arxiv.org/abs/2310.13897
repo_url: None
paper_authors: Dana Angluin, David Chiang, Andy Yang
for: 这个论文研究了 transformer Encoder 的可能性和逻辑限制。
methods: 这个论文使用了硬 attention 和严格未来masking 方法，并证明了这种网络可以认izers star-free 语言。添加位域 embedding 可以扩展认izers 到其他已知类别。
results: 这个论文证明了 transformer 网络可以认izers star-free 语言，并与 first-order logic、 temporal logic 和 algebraic automata theory 有关。

Abstract
We consider transformer encoders with hard attention (in which all attention is focused on exactly one position) and strict future masking (in which each position only attends to positions strictly to its left), and prove that the class of languages recognized by these networks is exactly the star-free languages. Adding position embeddings increases the class of recognized languages to other well-studied classes. A key technique in these proofs is Boolean RASP, a variant of RASP that is restricted to Boolean values. Via the star-free languages, we relate transformers to first-order logic, temporal logic, and algebraic automata theory.

摘要
我们考虑使用转换器Encoder，具有固定注意力（所有注意力都集中在一个位置）和严格未来掩码（每个位置只能关注左侧的位置），并证明这些网络可以认可的语言类型是星号自由语言。添加位域嵌入可以将认可的语言类型扩展到其他已有的类型。我们使用布尔RASP，一种受限的RASP变体，进行关键技术。通过星号自由语言，我们将转换器相关联到首领逻辑、时间逻辑和代数自动机理论。

Specify Robust Causal Representation from Mixed Observations

paper_url: http://arxiv.org/abs/2310.13892
repo_url: https://github.com/ymy4323460/cari
paper_authors: Mengyue Yang, Xinyu Cai, Furui Liu, Weinan Zhang, Jun Wang
for: 本研究旨在学习从观察数据中学习一种低维度、紧凑的表示，以提高预测模型的稳定性和泛化性。
methods: 本研究使用了 causal 表示学习方法，通过在学习过程中添加 mutual information 度量来规范学习。
results: 研究表明，使用 causal 表示学习方法可以提高模型的Robustness 和泛化性，并且在骚扰攻击和分布Shift 下表现更好于基eline。I hope that helps! Let me know if you have any further questions.

Abstract
Learning representations purely from observations concerns the problem of learning a low-dimensional, compact representation which is beneficial to prediction models. Under the hypothesis that the intrinsic latent factors follow some casual generative models, we argue that by learning a causal representation, which is the minimal sufficient causes of the whole system, we can improve the robustness and generalization performance of machine learning models. In this paper, we develop a learning method to learn such representation from observational data by regularizing the learning procedure with mutual information measures, according to the hypothetical factored causal graph. We theoretically and empirically show that the models trained with the learned causal representations are more robust under adversarial attacks and distribution shifts compared with baselines. The supplementary materials are available at https://github.com/ymy $4323460 / \mathrm{CaRI} /$.

摘要
学习 purely from observations 的表示 Concerns the problem of learning a low-dimensional, compact representation, which is beneficial to prediction models. Under the assumption that the intrinsic latent factors follow some causal generative models, we argue that by learning a causal representation, which is the minimal sufficient causes of the whole system, we can improve the robustness and generalization performance of machine learning models. In this paper, we develop a learning method to learn such representation from observational data by regularizing the learning procedure with mutual information measures, according to the hypothetical factored causal graph. We theoretically and empirically show that the models trained with the learned causal representations are more robust under adversarial attacks and distribution shifts compared with baselines. 附加资料可以在 https://github.com/ymy $4323460 / \mathrm{CaRI} /$ 中找到。

Towards a General Framework for Continual Learning with Pre-training

paper_url: http://arxiv.org/abs/2310.13888
repo_url: https://github.com/thu-ml/hide-prompt
paper_authors: Liyuan Wang, Jingyi Xie, Xingxing Zhang, Hang Su, Jun Zhu
for: 本研究探讨了一种通用框架，用于Sequential continual learning tasks，通过预训练来实现人工智能系统适应真实世界动态变化。
methods: 我们在理论上将目标函数 decomposed into three hierarchical components，包括 within-task prediction、task-identity inference 和 task-adaptive prediction。然后，我们提出了一种新的方法，使用 parameter-efficient fine-tuning (PEFT) 技术和 representation statistics 来显著提高这些组件。
results: 我们在下游 continual learning 中观察到了我们的方法的优势和通用性，并进一步探讨了 PEFT 技术在上游 continual learning 中的应用可能性。此外，我们还讨论了该框架与 neuroscience 最新的进展之间的生物基础。

Abstract
In this work, we present a general framework for continual learning of sequentially arrived tasks with the use of pre-training, which has emerged as a promising direction for artificial intelligence systems to accommodate real-world dynamics. From a theoretical perspective, we decompose its objective into three hierarchical components, including within-task prediction, task-identity inference, and task-adaptive prediction. Then we propose an innovative approach to explicitly optimize these components with parameter-efficient fine-tuning (PEFT) techniques and representation statistics. We empirically demonstrate the superiority and generality of our approach in downstream continual learning, and further explore the applicability of PEFT techniques in upstream continual learning. We also discuss the biological basis of the proposed framework with recent advances in neuroscience.

摘要
在这项工作中，我们提出了一种总体框架，用于Sequential continual learning，利用预训练，这种方向已成为人工智能系统来应对实际世界动态的一种有前途的方法。从理论上来看，我们将目标函数分解成三个层次结构，包括内部任务预测、任务标识推理和任务适应预测。然后，我们提议一种新的方法，使用参数效率的细致调整（PEFT）技术和表示统计来显著地优化这些组成部分。我们在下游 continual learning 中进行了实验，证明了我们的方法的优越性和通用性。此外，我们还探讨了这个框架的生物基础，与 neuroscience 最新的进展有关。

Optimal Transport-based Nonlinear Filtering in High-dimensional Settings

paper_url: http://arxiv.org/abs/2310.13886
repo_url: None
paper_authors: Mohammad Al-Jarrah, Niyizhen Jin, Bamdad Hosseini, Amirhossein Taghvaei
for: 本文解决非线性滤波问题，即计算随机动力系统状态条件分布给历史噪声部分观测的问题。
methods: 我们提出的方法基于非线性滤波的最优运输解释，导致一种基于模拟和无概率算法的 simulation-based 和 likelihood-free 算法，可以估计当前状态分布到下一步时间Step的 Brenier 最优运输Map。
results: 我们的方法比 SIR 滤波和ensemble Kalman filter 表现出更高的样本效率、高维度可描述性和能够捕捉复杂多模分布的能力。

Abstract
This paper addresses the problem of nonlinear filtering, i.e., computing the conditional distribution of the state of a stochastic dynamical system given a history of noisy partial observations. The primary focus is on scenarios involving degenerate likelihoods or high-dimensional states, where traditional sequential importance resampling (SIR) particle filters face the weight degeneracy issue. Our proposed method builds on an optimal transport interpretation of nonlinear filtering, leading to a simulation-based and likelihood-free algorithm that estimates the Brenier optimal transport map from the current distribution of the state to the distribution at the next time step. Our formulation allows us to harness the approximation power of neural networks to model complex and multi-modal distributions and employ stochastic optimization algorithms to enhance scalability. Extensive numerical experiments are presented that compare our method to the SIR particle filter and the ensemble Kalman filter, demonstrating the superior performance of our method in terms of sample efficiency, high-dimensional scalability, and the ability to capture complex and multi-modal distributions.

摘要
Our proposed method is based on an optimal transport interpretation of nonlinear filtering, which leads to a simulation-based and likelihood-free algorithm that estimates the Brenier optimal transport map from the current distribution of the state to the distribution at the next time step. By using neural networks to model complex and multi-modal distributions and stochastic optimization algorithms to enhance scalability, our formulation is able to handle challenging scenarios with ease.Extensive numerical experiments are presented that compare our method to the SIR particle filter and the ensemble Kalman filter. The results demonstrate the superior performance of our method in terms of sample efficiency, high-dimensional scalability, and the ability to capture complex and multi-modal distributions.

Fast Approximation of Similarity Graphs with Kernel Density Estimation

paper_url: http://arxiv.org/abs/2310.13870
repo_url: https://github.com/pmacg/kde-similarity-graph
paper_authors: Peter Macgregor, He Sun
for: 构建一个稀疏的相似图，以便进行现代归一化算法的第一步。
methods: 基于kernel density estimation问题，提出了一种新的算法框架，可以快速构建稀疏的相似图，同时保持归一化结果的结构。
results: 与scikit-learn和FAISS库的实现相比，我们的方法在多种数据集上显著超越了它们。

Abstract
Constructing a similarity graph from a set $X$ of data points in $\mathbb{R}^d$ is the first step of many modern clustering algorithms. However, typical constructions of a similarity graph have high time complexity, and a quadratic space dependency with respect to $|X|$. We address this limitation and present a new algorithmic framework that constructs a sparse approximation of the fully connected similarity graph while preserving its cluster structure. Our presented algorithm is based on the kernel density estimation problem, and is applicable for arbitrary kernel functions. We compare our designed algorithm with the well-known implementations from the scikit-learn library and the FAISS library, and find that our method significantly outperforms the implementation from both libraries on a variety of datasets.

摘要
现在的许多现代聚类算法的第一步是从一个集合 $X$ 中的数据点集构建一个相似Graph。然而，通常情况下，构建相似图的方法具有高时间复杂度和对 $|X|$ 的平方空间依赖。我们解决这个限制，并提出了一个新的算法框架，可以构建稀疏的相似图，保持聚类结构。我们的提出的算法基于kernel density估计问题，适用于任意的kernel函数。我们与scikit-learn库和FAISS库中的常用实现进行比较，发现我们的方法在多种数据集上明显超过了这两个库的实现。

Distributionally Robust Optimization with Bias and Variance Reduction

paper_url: http://arxiv.org/abs/2310.13863
repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
paper_authors: Ronak Mehta, Vincent Roulet, Krishna Pillutla, Zaid Harchaoui
for: 该研究 targets the distributionally robust optimization (DRO) problem with spectral risk-based uncertainty set and $f$-divergence penalty, which includes common risk-sensitive learning objectives such as regularized condition value-at-risk (CVaR) and average top-$k$ loss.
methods: 该研究提出了一种名为 Prospect 的杂相 gradient-based algorithm，该算法只需要调整一个学习率参数，并且可以 linear convergence for smooth regularized losses。这与之前的算法不同，这些算法可能需要调整多个参数，或者因为偏向的梯度估计或不够的迁移而失败。
results: 该研究通过实验表明，Prospect 可以比基eline 2-3倍快速 converges on distribution shift and fairness benchmarks spanning tabular, vision, and language domains。

Abstract
We consider the distributionally robust optimization (DRO) problem with spectral risk-based uncertainty set and $f$-divergence penalty. This formulation includes common risk-sensitive learning objectives such as regularized condition value-at-risk (CVaR) and average top-$k$ loss. We present Prospect, a stochastic gradient-based algorithm that only requires tuning a single learning rate hyperparameter, and prove that it enjoys linear convergence for smooth regularized losses. This contrasts with previous algorithms that either require tuning multiple hyperparameters or potentially fail to converge due to biased gradient estimates or inadequate regularization. Empirically, we show that Prospect can converge 2-3$\times$ faster than baselines such as stochastic gradient and stochastic saddle-point methods on distribution shift and fairness benchmarks spanning tabular, vision, and language domains.

摘要
我团队考虑了分布robust优化（DRO）问题，使用spectral risk-based uncertainty set和$f$- divergence penalty。这种形式包括常见的风险敏感学习目标，如常量值-at-risk（CVaR）和平均top-$k$ loss。我们提出了Prospect算法，它仅需要调整一个学习率超参数，并证明它在抽象化的REGularized loss函数下具有线性减少的性质。这与之前的算法不同，这些算法可能需要调整多个超参数，或者因为偏向的梯度估计或不足的正则化而无法 converges。我们在实验中证明，Prospect可以比基eline算法快速 converge 2-3倍，包括在分布shift和公平性 benchmark上。

2023-10-21

eess.SP

eess.SP - 2023-10-21

Non-sliced Optical Arbitrary Waveform Measurement (OAWM) Using a Silicon Photonic Receiver Chip

paper_url: http://arxiv.org/abs/2310.17662
repo_url: None
paper_authors: Daniel Drayss, Dengyang Fang, Christoph Füllner, Wolfgang Freude, Sebastian Randel, Christian Koos
for: 这篇论文是为了探讨 comb-based optical arbitrary waveform measurement (OAWM) 技术的可能性和应用前景而写的。
methods: 该论文使用了一种基于 silicon photonic chip 的 OAWM 前端，不需要 slice filters 也不需要活动控制。它包含四个 IQ 接收器，通过使用 femtosecond 模糊化 Laser 进行精确准化，可以达到 170 GHz 的总采集带宽。
results: 该 OAWM 系统可以成功地测量 sinusoidal 测试信号，并达到 SINAD 30 dB 和 ENOB 4.7 bit。它还可以接收 64QAM 数据信号，并在 symbol rates 达到 100 GBd 时实现 constellation signal-to-noise ratios (CSNR) 与 conventional coherent receivers 相当。在理论上，我们发现可以通过增加非分割 OAWM 系统的通道数来提高捕获带宽和信号质量。

Abstract
Comb-based optical arbitrary waveform measurement (OAWM) techniques can overcome the bandwidth limitations of conventional coherent detection schemes and may have disruptive impact on a wide range of scientific and industrial applications. Over the previous years, different OAWM schemes have been demonstrated, showing the performance and the application potential of the concept in laboratory experiments. However, these demonstrations still relied on discrete fiber-optic components or on combinations of discrete coherent receivers with integrated optical slicing filters that require complex tuning procedures to achieve the desired performance. In this paper, we demonstrate the first wavelength-agnostic OAWM front-end that is integrated on a compact silicon photonic chip and that neither requires slicing filters nor active controls. Our OAWM system comprises four IQ receivers, which are accurately calibrated using a femtosecond mode-locked laser and which offer a total acquisition bandwidth of 170 GHz. Using sinusoidal test signals, we measure a signal-to-noise-and-distortion ratio (SINAD) of 30 dB for the reconstructed signal, which corresponds to an effective number of bits (ENOB) of 4.7 bit, where the underlying electronic analog-to-digital converters (ADC) turn out to be the main limitation. The performance of the OAWM system is further demonstrated by receiving 64QAM data signals at symbol rates of up to 100 GBd, achieving constellation signal-to-noise ratios (CSNR) that are on par with those obtained for conventional coherent receivers. In a theoretical scalability analysis, we show that increasing the channel count of non-sliced OAWM systems can improve both the acquisition bandwidth and the signal quality. We believe that our work represents a key step towards out-of-lab use of highly compact OAWM systems that rely on chip-scale integrated optical front-ends.

摘要
optical Arbitrary Waveform Measurement（OAWM）技术可以超越传统同步探测方案的频率限制，并可能在广泛的科学和工业应用中产生干扰性的影响。过去几年，不同的OAWM方案在实验室中得到了证明，并显示了这种概念的性能和应用潜力。但是，这些实验仍然基于离散的纤维仪件或者离散的同步接收器和集成仪件的组合，需要复杂的调试过程来实现所需的性能。在这篇论文中，我们展示了首个不需要探针过滤器，也不需要活动控制的OAWM前端。我们的OAWM系统包括四个IQ接收器，它们通过使用 Femtosecond 模拟激光来精准寻定，并具有170 GHz的总获得带宽。使用正弦测试信号，我们测量到了重建信号的噪声比为30 dB，相应的有效数 bits（ENOB）为4.7 bit，其中下面的电子分析数字转换器（ADC）是主要的限制。我们的OAWM系统的性能进一步得到了证明，可以接收64QAM数据信号，并达到与传统同步接收器相同的符号信号噪声比（CSNR）。在理论上，我们发现了不需要探针过滤器的OAWM系统的频率Count的提高可以提高获得带宽和信号质量。我们认为，我们的工作代表了同步探测系统的快速出场的关键一步。

Green Beamforming Design for Integrated Sensing and Communication Systems: A Practical Approach Using Beam-Matching Error Metrics

paper_url: http://arxiv.org/abs/2310.13993
repo_url: None
paper_authors: Luping Xiang, Ke Xu, Jie Hu, Kun Yang
for: 本研究提出了一种绿色扫描设计方案，用于 интеграted sensing and communication（ISAC）系统，使用扫描性能评估中的扫描差异来评估雷达性能。
methods: 本研究使用了semidefinite relaxation（SDR）方法和iterative rank minimization algorithm（IRM）来解决扫描设计中的非对称挑战，并且应用了扫描模式匹配错误来评估雷达性能。
results: 实验结果表明，提出的优化扫描设计方案具有非常出色的扫描性能，强调了雷达组件在探测任务中的特出表现。

Abstract
In this paper, we propose a green beamforming design for the integrated sensing and communication (ISAC) system, using beam-matching error to assess radar performance. The beam-matching error metric, which considers the mean square error between the desired and designed beam patterns, provides a more practical evaluation approach. To tackle the non-convex challenge inherent in beamforming design, we apply semidefinite relaxation (SDR) to address the rank-one relaxation issue, followed by the iterative rank minimization algorithm (IRM) for rank-one recovery. The simulation results showcase the effectiveness of our proposed optimal beamforming design, emphasizing the exceptional performance of the radar component in sensing tasks.

摘要
在这篇论文中，我们提出了一种绿色扩散设计方案，用于 интегрирован的感测和通信（ISAC）系统，使用束匹配错误来评估雷达性能。束匹配错误指标，考虑了想要的束 pattern和设计的束 pattern之间的平均方差，提供了更实用的评估方法。为了解决扩散设计中的非 convex 挑战，我们使用半definite relaxation（SDR）来解决级数一relaxation问题，然后使用迭代rank minimization algorithm（IRM）来实现级数一回复。实验结果显示了我们提出的优化扩散设计的效iveness，强调了雷达组件在感测任务中的出色性能。

Robust NOMA-assisted OTFS-ISAC Network Design with 3D Motion Prediction Topology

paper_url: http://arxiv.org/abs/2310.13984
repo_url: None
paper_authors: Luping Xiang, Ke Xu, Jie Hu, Christos Masouros, Kun Yang
for: 提供一个基于非正交多存取（NOMA）和时频空间（OTFS）的数据通信和感知（ISAC）网络，以支持多个用户。
methods: 使用无人航空车（UAV）作为空中基站，并使用ISAC来提取用户的位置和速度信息，并实现非正交功率分配以获得更高的可达速率。
results: 透过3D动态预测拓扑导引NOMA传输，并提出一个可靠的功率分配解决方案，以解决做为MMF和SR问题。 simulation结果显示，提案的NOMA-assisted OTFS-ISAC系统在具有3D动态预测拓扑的情况下，在完美和不完美频道条件下具有较高的可达速率。

Abstract
This paper proposes a novel non-orthogonal multiple access (NOMA)-assisted orthogonal time-frequency space (OTFS)-integrated sensing and communication (ISAC) network, which uses unmanned aerial vehicles (UAVs) as air base stations to support multiple users. By employing ISAC, the UAV extracts position and velocity information from the user's echo signals, and non-orthogonal power allocation is conducted to achieve a superior achievable rate. A 3D motion prediction topology is used to guide the NOMA transmission for multiple users, and a robust power allocation solution is proposed under perfect and imperfect channel estimation for Maxi-min Fairness (MMF) and Maximum sum-Rate (SR) problems. Simulation results demonstrate the superiority of the proposed NOMA-assisted OTFS-ISAC system over other systems in terms of achievable rate under both perfect and imperfect channel conditions with the aid of 3D motion prediction topology.

摘要
To guide the NOMA transmission for multiple users, a 3D motion prediction topology is used. The network also proposes a robust power allocation solution under both perfect and imperfect channel estimation for Maximum Minimum Fairness (MMF) and Maximum Sum-Rate (SR) problems.Simulation results show that the proposed NOMA-assisted OTFS-ISAC system outperforms other systems in terms of achievable rate under both perfect and imperfect channel conditions, with the aid of the 3D motion prediction topology.

Cloud-Connected Wireless Holter Monitor Machine with Neural Networks Based ECG Analysis for Remote Health Monitoring

paper_url: http://arxiv.org/abs/2310.13965
repo_url: None
paper_authors: Azlaan Ranjha, Laiba Jabbar, Osaid Ahmed
for: 提高心血管疾病诊断的准确性
methods: 使用无线电心电听器、WIFI数据传输、人工神经网络分类模型
results: 实现心血管疾病诊断的准确率超过88%，提供了一种快速、准确且cost-effective的心血管诊断方案

Abstract
This study describes the creation of a wireless, transportable Holter monitor to improve the accuracy of cardiac disease diagnosis. The main goal of this study is to develop a low-cost cardiac screening system suited explicitly for underprivileged areas, addressing the rising rates of cardiovascular death. The suggested system includes a wireless Electrocardiogram (ECG) module for real-time cardiac signal gathering using attached electrodes, with data transfer made possible by WiFi to a cloud server for archival and analysis. The system uses a neural network model for automated ECG classification, concentrating on the identification of cardiac anomalies. The diagnostic performance of cardiologist-level ECG analysis is surpassed by our upgraded deep neural network architecture, which underwent thorough evaluation and showed a stunning accuracy rate of more than 88\%. A quick, accurate, and reasonably priced option for cardiac screening is provided by this ground-breaking technology, which smoothly merges wireless data transfer with AI-assisted diagnostics. In addition to providing a thorough overview of the development process, this paper also highlights methods used to improve model accuracy, such as data preparation, class imbalance correction using oversampling, and model fine-tuning. The work shows the viability of a comprehensive remote cardiac screening system powered by AI and maximising the use of wearable and cloud computing resources. Such cutting-edge remote health monitoring technologies have great promise for improved health outcomes and early identification, especially in resource-constrained countries.

摘要
这项研究描述了一种无线传输式适用于诊断心血管疾病的哨杀器监测系统。研究的主要目标是开发一种低成本的心血管检测系统，特意针对贫困地区，以应对心血管疾病的增长。提议的系统包括一个无线电心电响（ECG）模块，通过附加的电极收集心电信号，并通过WiFi传输数据到云服务器进行存储和分析。系统使用一种升级的深度神经网络模型进行自动ECG分类，主要关注心血管疾病的识别。研究人员通过对模型进行优化，包括数据准备、数据偏置纠正和模型细化，提高了模型的准确率，达到了超过88%。这项创新技术为心血管疾病检测提供了快速、准确、便宜的选择，并将无线数据传输和人工智能助动诊断融合在一起。此外，研究人员还提出了使用云计算和佩戴式设备来实现远程响应心血管疾病检测的可能性。这种前于顶峰的远程卫生监测技术具有很大的推动健康结果和早期识别潜力，特别是在资源受限的国家。

Goal-oriented Communications for the IoT: System Design and Adaptive Resource Optimization

paper_url: http://arxiv.org/abs/2310.13948
repo_url: None
paper_authors: Paolo Di Lorenzo, Mattia Merluzzi, Francesco Binucci, Claudio Battiloro, Paolo Banelli, Emilio Calvanese Strinati, Sergio Barbarossa
for: 本文旨在探讨Internet of Things（IoT）应用程序中的资源限制问题，包括频率、能源、计算、学习和推理能力等。
methods: 本文提出了一种新的目标带动（GO）IoT系统设计方法，弃察传输数据的精度为主，直接关注数据交换的目标实现。该方法通过系统优化，结合感知、无线通信、计算、学习和控制等方面，提高IoT系统的效果。
results: 本文通过数据示例表明，GO IoT系统设计方法可以在边缘推理、合作感知和联合学习等场景中实现显著的提高。这些示例表明了该方法的实际应用和现实意义，有potentiality to revolutionize IoT systems。

Abstract
Internet of Things (IoT) applications combine sensing, wireless communication, intelligence, and actuation, enabling the interaction among heterogeneous devices that collect and process considerable amounts of data. However, the effectiveness of IoT applications needs to face the limitation of available resources, including spectrum, energy, computing, learning and inference capabilities. This paper challenges the prevailing approach to IoT communication, which prioritizes the usage of resources in order to guarantee perfect recovery, at the bit level, of the data transmitted by the sensors to the central unit. We propose a novel approach, called goal-oriented (GO) IoT system design, that transcends traditional bit-related metrics and focuses directly on the fulfillment of the goal motivating the exchange of data. The improvement is then achieved through a comprehensive system optimization, integrating sensing, communication, computation, learning, and control. We provide numerical results demonstrating the practical applications of our methodology in compelling use cases such as edge inference, cooperative sensing, and federated learning. These examples highlight the effectiveness and real-world implications of our proposed approach, with the potential to revolutionize IoT systems.

摘要
互联网物联网（IoT）应用程序将感知、无线通信、智能和动作结合起来，使不同设备之间进行数据交换和处理。然而，IoT应用程序的效iveness需要面临有限的资源，包括频率带、能源、计算、学习和推理能力。本文挑战了现有的IoT通信方法，它强调使用资源以确保数据传输的精度。我们提出了一种新的目标对齐（GO）IoT系统设计方法，它不仅仅是关注传输数据的位数据精度，而是直接关注数据传输的目标。我们通过了一种整体系统优化方法，整合感知、通信、计算、学习和控制，提高系统的效iveness。我们提供了数字结果，证明我们的方法在Edge推理、合作感知和联邦学习等实用应用中的实际应用。这些例子说明了我们的提议的效果和现实意义，它有可能革命化IoT系统。

Joint Network Function Placement and Routing Optimization in Dynamic Software-defined Satellite-Terrestrial Integrated Networks

paper_url: http://arxiv.org/abs/2310.13940
repo_url: None
paper_authors: Shuo Yuan, Yaohua Sun, Mugen Peng
for: 提高软件定义卫星地球Integrated网络（SDSTN）的延迟服务提供，以提高资源灵活性和全球通信覆盖率。
methods: 通过对虚拟网络功能（VNF）布局和路由规划进行集成优化，以适应网络动态特性和低轨道卫星的有限资源。
results: 比基eline方案提高完成服务数量超过58%，并降低服务延迟率大于17%。

Abstract
Software-defined satellite-terrestrial integrated networks (SDSTNs) are seen as a promising paradigm for achieving high resource flexibility and global communication coverage. However, low latency service provisioning is still challenging due to the fast variation of network topology and limited onboard resource at low earth orbit satellites. To address this issue, we study service provisioning in SDSTNs via joint optimization of virtual network function (VNF) placement and routing planning with network dynamics characterized by a time-evolving graph. Aiming at minimizing average service latency, the corresponding problem is formulated as an integer nonlinear programming under resource, VNF deployment, and time-slotted flow constraints. Since exhaustive search is intractable, we transform the primary problem into an integer linear programming by involving auxiliary variables and then propose a Benders decomposition based branch-and-cut (BDBC) algorithm. Towards practical use, a time expansion-based decoupled greedy (TEDG) algorithm is further designed with rigorous complexity analysis. Extensive experiments demonstrate the optimality of BDBC algorithm and the low complexity of TEDG algorithm. Meanwhile, it is indicated that they can improve the number of completed services within a configuration period by up to 58% and reduce the average service latency by up to 17% compared to baseline schemes.

摘要
软件定义卫星地面Integrated networks (SDSTNs) 被看作是一种有前途的方法，以实现高资源灵活性和全球通信覆盖。然而，为了提供低延迟服务，由于网络结构的快速变化和低轨道卫星上的限制资源，仍然是一个挑战。为解决这个问题，我们研究了在SDSTNs中提供服务通过虚拟网络功能（VNF）的分配和路由规划的 JOINT优化。我们的目标是最小化服务延迟，并且将问题转化为一个整数非线性程序，受到资源、VNF部署和时钟分配的约束限制。由于枚举搜索是不可行的，我们将 primal problem 转化为整数线性程序，并提出了基于Benders decomposition的分支和裁剪（BDBC）算法。为实际应用，我们还提出了一种基于时间扩展的分解蜕蜕（TEDG）算法，并进行了严格的复杂性分析。实验结果表明，BDBC算法是优化的，而TEDG算法的复杂性很低。此外，我们发现，它们可以在配置期内完成更多的服务，并将服务延迟降低到17%以上，相比基eline schemes。

Wideband Beamforming for STAR-RIS-assisted THz Communications with Three-Side Beam Split

paper_url: http://arxiv.org/abs/2310.13933
repo_url: None
paper_authors: Wencai Yan, Wanming Hao, Gangcan Sun, Chongwen Huang, Qingqing Wu
for: 这篇论文研究了同时发射和反射智能表面（STAR-RIS）帮助的THz通信系统，包括三个方向的折射。这篇论文首次分析了STAR-RIS的双面折射效应。
methods: 作者提出了一种基于时钟延迟（TD）的完全连接结构，以减轻双面折射效应。此外，作者还提出了一种具有低硬件复杂度和低功耗的半连接结构，其中多个STAR-RIS元素共享一个TD。
results: 作者通过数值结果验证了提出的方案的有效性。

Abstract
In this paper, we consider the simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)-assisted THz communications with three-side beam split. Except for the beam split at the base station (BS), we analyze the double-side beam split at the STAR-RIS for the first time. To relieve the double-side beam split effect, we propose a time delayer (TD)-based fully-connected structure at the STAR-RIS. As a further advance, a low-hardware complexity and low-power consumption sub-connected structure is developed, where multiple STAR-RIS elements share one TD. Meanwhile, considering the practical scenario, we investigate a multi-STAR-RIS and multi-user communication system, and a sum rate maximization problem is formulated by jointly optimizing the hybrid analog/digital beamforming, time delays at the BS as well as the double-layer phase-shift coefficients, time delays and amplitude coefficients at the STAR-RISs. Based on this, we first allocate users for each STAR-RIS, and then derive the analog beamforming, time delays at the BS, and the double-layer phase-shift coefficients, time delays at each STAR-RIS. Next, we develop an alternative optimization algorithm to calculate the digital beamforming at the BS and amplitude coefficients at the STAR-RISs. Finally, the numerical results verify the effectiveness of the proposed schemes.

摘要
在这篇论文中，我们考虑了同时传输和反射可配置智能表面（STAR-RIS）帮助的 THz 通信系统中的三面扫描分。除了基站（BS）的扫描分，我们对 STAR-RIS 中的双面扫描分进行了分析。为了减轻双面扫描分的影响，我们提议在 STAR-RIS 中使用时延（TD）基于完全连接结构。此外，我们还提出了一种具有低硬件复杂度和低能耗的半连接结构，其中多个 STAR-RIS 元素共享一个 TD。此外，我们还考虑了实际场景，研究了多个 STAR-RIS 和多用户通信系统，并对 hybrid 分析/数字 beamforming、时延在 BS 以及 DOUBLE-LAYER 相位偏移 coefficients、时延和幅偏移在每个 STAR-RIS 进行了最大化Sum rate 问题。基于这，我们首先将用户分配给每个 STAR-RIS，然后 derivation analog beamforming、时延在 BS 和 DOUBLE-LAYER 相位偏移 coefficients、时延在每个 STAR-RIS。接着，我们开发了一种代替优化算法，用于计算数字 beamforming 在 BS 和每个 STAR-RIS 的幅偏移。最后，数值结果证明了我们提出的方案的有效性。

Trajectory and Power Design for Aerial Multi-User Covert Communications

paper_url: http://arxiv.org/abs/2310.13932
repo_url: None
paper_authors: Hongjiang Lei, Jiacheng Jiang, Imran Shafique Ansari, Gaofeng Pan, Mohamed-Slim Alouini
for: 这个论文是研究了一种多用户下链 dual-UAV 掩饰通信系统，以提高未来通信系统的安全性和可靠性。
methods: 该论文使用了安全信息传输、人工干扰信号和位置不确定性等方法来实现掩饰通信。
results: 研究发现，采用合适的 trajectory 和功率分配策略可以提高掩饰率，并且在多天线 Wardens 情况下，可以采用类似的策略来提高掩饰率。

Abstract
Unmanned aerial vehicles (UAVs) can provide wireless access to terrestrial users, regardless of geographical constraints, and will be an important part of future communication systems. In this paper, a multi-user downlink dual-UAVs enabled covert communication system was investigated, in which a UAV transmits secure information to ground users in the presence of multiple wardens as well as a friendly jammer UAV transmits artificial jamming signals to fight with the wardens. The scenario of wardens being outfitted with a single antenna is considered, and the detection error probability (DEP) of wardens with finite observations is researched. Then, considering the uncertainty of wardens' location, a robust optimization problem with worst-case covertness constraint is formulated to maximize the average covert rate by jointly optimizing power allocation and trajectory. To cope with the optimization problem, an algorithm based on successive convex approximation methods is proposed. Thereafter, the results are extended to the case where all the wardens are equipped with multiple antennas. After analyzing the DEP in this scenario, a tractable lower bound of the DEP is obtained by utilizing Pinsker's inequality. Subsequently, the non-convex optimization problem was established and efficiently coped by utilizing a similar algorithm as in the single-antenna scenario. Numerical results indicate the effectiveness of our proposed algorithm.

摘要
“无人航空车（UAV）可以提供无线通信服务，无论地理限制，并将成为未来通信系统的重要组件。在这篇研究中，一个多用户下行双UAV对抗通信系统被研究，其中一个UAV传送安全信息到地面用户，同时友好干扰UAV传送假的干扰信号以抗衡敌人。假设监听者搭配单一天线，DEP（检测错误率）的研究被考虑。然后，根据监听者的位置不确定性，一个具有最坏隐身性约束的强化优化问题被设定，以最大化平均隐身率，通过协同调度和轨道优化。为了解决这个问题，我们提出了基于成功递增法的算法。接下来，我们将结果扩展到所有监听者都搭配多天线情况下。经过分析DEP的情况，我们得到了一个可以计算的下界DEP。然后，我们设定了一个非对称优化问题，并使用相似的算法来解决。 numerics indicate the effectiveness of our proposed algorithm。”Note: Simplified Chinese is used in this translation, which is a more casual and conversational style of Chinese. If you prefer Traditional Chinese or another style, please let me know and I can adjust the translation accordingly.

Trajectory and power design for aerial CRNs with colluding eavesdroppers

paper_url: http://arxiv.org/abs/2310.13931
repo_url: None
paper_authors: Hongjiang Lei, Jiacheng Jiang, Haosi Yang, Ki-Hong Park, Imran Shafique Ansari, Gaofeng Pan, Mohamed-Slim Alouini
for: 这篇论文是研究无人机驱动的空中广播网络的安全性的。
methods: 该论文使用了迭代算法基于块协调 descent和逻辑准确 approximation来解决非整数混合变量优化问题。
results: numerical results表明我们的提议的方案可以提高空中广播网络的安全性表现。

Abstract
Unmanned aerial vehicles (UAVs) can provide wireless access services to terrestrial users without geographical limitations and will become an essential part of the future communication system. However, the openness of wireless channels and the mobility of UAVs make the security of UAV-based communication systems particularly challenging. This work investigates the security of aerial cognitive radio networks (CRNs) with multiple uncertainties colluding eavesdroppers. A cognitive aerial base station transmits messages to cognitive terrestrial users using the spectrum resource of the primary users. All secondary terrestrial users and illegitimate receivers jointly decode the received message. The average secrecy rate of the aerial CRNs is maximized by jointly optimizing the UAV's trajectory and transmission power. An iterative algorithm based on block coordinate descent and successive convex approximation is proposed to solve the non-convex mixed-variable optimization problem. Numerical results verify the effectiveness of our proposed algorithm and show that our scheme improves the secrecy performance of airborne CRNs.

摘要
无人飞行器（UAV）可以提供无地点限制的无线访问服务，未来的通信系统中将成为不可或缺的一部分。然而，无线通道的开放性和UAV的移动性使得UAV基站的安全特别挑战。这项工作研究了有多个不确定参与者的邻近广播网络（CRN）的安全性。一个认知空中基站通过主用户频率资源传输消息给认知地面用户。所有次要地面用户和非法接收器共同解码接收到的消息。通过协调UAV的轨迹和传输功率进行最大化平均机密率，解决了混合变量优化问题。我们提出了基于块协调下降和Successive Convex Approximation（SCA）的迭代算法来解决非对称混合变量优化问题。数值结果证明了我们的提议的有效性，并显示了我们的方案在空中CRN中提高了机密性性能。

Beamforming Design for the Distributed RISs-aided THz Communications with Double-Layer True Time Delays

paper_url: http://arxiv.org/abs/2310.13917
repo_url: None
paper_authors: Gangcan Sun, Wencai Yan, Wanming Hao, Chongwen Huang, Chau Yuen
for: 本文研究了利用具有稀疏无线电频率链天线结构的基站（BS）的射频信号处理系统，以提高系统性能。
methods: 本文提出了一种double-layer true-time-delay（TTD）方案，以减少BS发射机的扩散辐射损失，并对RIS进行分布式实现。然后，通过协调分析系统性能。
results: 实验结果表明，提出的方案可以有效地提高系统性能，并且可以考虑实际硬件限制。

Abstract
In this paper, we investigate the reconfigurable intelligent surface (RIS)-aided terahertz (THz) communication system with the sparse radio frequency chains antenna structure at the base station (BS). To overcome the beam split of the BS, different from the conventional single-layer true-time-delay (TTD) scheme, we propose a double-layer TTD scheme that can effectively reduce the number of large-range delay devices, which involve additional insertion loss and amplification circuitry. Next, we analyze the system performance under the proposed double-layer TTD scheme. To relieve the beam split of the RIS, we consider multiple distributed RISs to replace an ultra-large size RIS. Based on this, we formulate an achievable rate maximization problem for the distributed RISs-aided THz communications via jointly optimizing the hybrid analog/digital beamforming, time delays of the double-layer TTD network and reflection coefficients of RISs. Considering the practical hardware limitation, the finite-resolution phase shift, time delay and reflection phase are constrained. To solve the formulated problem, we first design an analog beamforming scheme including optimizing phase shift and time delay based on the RISs' locations. Then, an alternatively optimization algorithm is proposed to obtain the digital beamforming and reflection coefficients based on the minimum mean square error and coordinate update techniques. Finally, simulation results show the effectiveness of the proposed scheme.

摘要
在这篇论文中，我们研究了利用协助器（RIS）的可重新配置的智能表面（RIS）-帮助的 TerraHertz（THz）通信系统，其中基站（BS）使用稀疏的 радио频率链天线结构。为了解决基站的束分，我们提议了一种双层真时延迟（TTD）方案，可以有效减少基站的大范围延迟设备，这些设备包括额外插入损和增益电路。接着，我们分析了系统性能下提议的双层 TTD 方案。为了缓解 RIS 的束分，我们考虑了多个分布式 RIS，以取代巨大Size RIS。基于这，我们提出了一个可实现最大化率的做法，通过同时优化混合式analog/数字扫描、双层 TTD 网络的时延和 RIS 的反射率来实现。受到实际硬件限制，finite-resolution phase shift、时延和反射相位受限。为了解决这个问题，我们首先设计了一种analog扫描方案，包括优化相位和时延基于 RIS 的位置。然后，我们提出了一种alternatively optimization算法，用于取得数字扫描和反射率。最后，我们通过实际结果来证明提议的方案的有效性。

NMR Spectra Denoising with Vandermonde Constraints

paper_url: http://arxiv.org/abs/2310.13882
repo_url: None
paper_authors: Di Guo, Runmin Xu, Jinyu Wu, Meijin Lin, Xiaofeng Du, Xiaobo Qu
for: 用于分析化学和蛋白质的生物工程中，使用核磁共振（NMR） спектроскопия时，Signal 容易受到数据收集时的噪声污染，这会影响后续的量化分析。因此，去噪NMR Signal 已成为一项长期关注的问题。
methods: 本文提出了一种优化模型基于迭代减噪方法，称为CHORD-V，该方法在时域NMR Signal 上对噪声进行了逼近，并维持了泛函分解。
results: 对于both synthetic和实际的NMR数据，CHORD-V方法表现出了较高的减噪性能，比typical Cadzow和rQRd方法更为出色，同时也比state-of-the-art CHORD方法更为精准。CHORD-V方法能够更好地还原低强度 спектраль峰，特别是当噪声相对较高时。

Abstract
Nuclear magnetic resonance (NMR) spectroscopy serves as an important tool to analyze chemicals and proteins in bioengineering. However, NMR signals are easily contaminated by noise during the data acquisition, which can affect subsequent quantitative analysis. Therefore, denoising NMR signals has been a long-time concern. In this work, we propose an optimization model-based iterative denoising method, CHORD-V, by treating the time-domain NMR signal as damped exponentials and maintaining the exponential signal form with a Vandermonde factorization. Results on both synthetic and realistic NMR data show that CHORD-V has a superior denoising performance over typical Cadzow and rQRd methods, and the state-of-the-art CHORD method. CHORD-V restores low-intensity spectral peaks more accurately, especially when the noise is relatively high.

摘要
核磁共振（NMR）光谱技术在生物工程中扮演着重要的角色，但NMR信号在数据采集过程中容易受到噪声污染，这会影响后续的量化分析。因此，NMR信号的减噪成为了长期的焦点。在这种工作中，我们提出了一种基于优化模型的迭代减噪方法，即CHORD-V，其中将时域NMR信号 treated为抑制的 экспонент，并通过瓦德蒙德分解保持信号形式。对于 synthetic 和实际的 NMR 数据进行了比较，CHORD-V 的减噪性能较 Cadzow 和 rQRd 方法更高，同时也高于现有的 CHORD 方法。CHORD-V 能够更准确地还原低强度的spectral peaks，特别是在噪声较高的情况下。

2023-10-20

cs.SD

cs.SD - 2023-10-20

Multi-label Open-set Audio Classification

paper_url: http://arxiv.org/abs/2310.13759
repo_url: https://github.com/sripathisridhar/moads
paper_authors: Sripathi Sridhar, Mark Cartwright
for: 实际世界中的声音事件范围较小，因此现有的标准分类模型可能会遗传重要但未知的声音事件。
methods: 以开集标准方法来探测未知的声音事件，并将其应用到多项分类内容中，如声音景象分类。
results: 这些基本方法在不同未知分布下的评估结果，以及它们如何应对多项标签模型中的声音事件重叠。

Abstract
Current audio classification models have small class vocabularies relative to the large number of sound event classes of interest in the real world. Thus, they provide a limited view of the world that may miss important yet unexpected or unknown sound events. To address this issue, open-set audio classification techniques have been developed to detect sound events from unknown classes. Although these methods have been applied to a multi-class context in audio, such as sound scene classification, they have yet to be investigated for polyphonic audio in which sound events overlap, requiring the use of multi-label models. In this study, we establish the problem of multi-label open-set audio classification by creating a dataset with varying unknown class distributions and evaluating baseline approaches built upon existing techniques.

摘要
当前的听音分类模型有较小的类 vocabulary，相对于实际世界中的听音类型的数量相对较多。因此，它们只能提供有限的视角，可能会错过一些重要却未知的听音事件。为解决这个问题，开放集 audio 分类技术已经开发出来，用于检测未知类别的听音事件。虽然这些方法在音频场景分类中已经应用，但它们尚未在多声音场景中进行研究，需要使用多标签模型。在这项研究中，我们将定义多标签开放集听音分类问题，创建不同未知类分布的数据集，并评估基础方法。

Intelligibility prediction with a pretrained noise-robust automatic speech recognition model

paper_url: http://arxiv.org/abs/2310.19817
repo_url: None
paper_authors: Zehai Tu, Ning Ma, Jon Barker
for: 这两个系统是为了预测声音质量而设计的，具体来说是为了参加第二届清晰预测挑战（CPC2）。
methods: 一个系统是侵入的，利用自动语音识别（ASR）模型的隐藏表示。另一个系统是不侵入的，通过 derivated ASR uncertainty 进行预测。ASR 模型只是在 simulations 的噪音语音库上进行预训练，没有利用 CPC2 数据。
results: 两个系统在 CPC2 评估中表现出色，具体来说是在不同的噪音环境下预测声音质量的能力。

Abstract
This paper describes two intelligibility prediction systems derived from a pretrained noise-robust automatic speech recognition (ASR) model for the second Clarity Prediction Challenge (CPC2). One system is intrusive and leverages the hidden representations of the ASR model. The other system is non-intrusive and makes predictions with derived ASR uncertainty. The ASR model is only pretrained with a simulated noisy speech corpus and does not take advantage of the CPC2 data. For that reason, the intelligibility prediction systems are robust to unseen scenarios given the accurate prediction performance on the CPC2 evaluation.

摘要

Neural domain alignment for spoken language recognition based on optimal transport

paper_url: http://arxiv.org/abs/2310.13471
repo_url: None
paper_authors: Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
for: 提高cross-domain spoken language recognition（SLR）的效果，addressing domain shift challenge.
methods: 使用Unsupervised domain adaptation（UDA）算法，without relying on class labels in the target domain.
results: 提出了一种基于optimal transport（OT）的UDA算法，significantly improved the performance in a cross-channel SLR task compared to existing UDA algorithms.Here’s the full text in Simplified Chinese:
for: 本研究旨在提高cross-domain spoken language recognition（SLR）的效果， Addressing domain shift challenge.
methods: 我们使用Unsupervised domain adaptation（UDA）算法，不依赖于目标频道上的分类标签。
results: 我们提出了一种基于optimal transport（OT）的UDA算法，在cross-channel SLR任务中与现有UDA算法相比，表现出了显著的改善。

Abstract
Domain shift poses a significant challenge in cross-domain spoken language recognition (SLR) by reducing its effectiveness. Unsupervised domain adaptation (UDA) algorithms have been explored to address domain shifts in SLR without relying on class labels in the target domain. One successful UDA approach focuses on learning domain-invariant representations to align feature distributions between domains. However, disregarding the class structure during the learning process of domain-invariant representations can result in over-alignment, negatively impacting the classification task. To overcome this limitation, we propose an optimal transport (OT)-based UDA algorithm for a cross-domain SLR, leveraging the distribution geometry structure-aware property of OT. An OT-based discrepancy measure on a joint distribution over feature and label information is considered during domain alignment in OT-based UDA. Our previous study discovered that completely aligning the distributions between the source and target domains can introduce a negative transfer, where classes or irrelevant classes from the source domain map to a different class in the target domain during distribution alignment. This negative transfer degrades the performance of the adaptive model. To mitigate this issue, we introduce coupling-weighted partial optimal transport (POT) within our UDA framework for SLR, where soft weighting on the OT coupling based on transport cost is adaptively set during domain alignment. A cross-domain SLR task was used in the experiments to evaluate the proposed UDA. The results demonstrated that our proposed UDA algorithm significantly improved the performance over existing UDA algorithms in a cross-channel SLR task.

摘要
域外迁带来很大的挑战，对cross-domain spoken language recognition（SLR）的效果甚至是降低的。无监督适应（UDA）算法已经被探索以解决域外迁问题，无需在目标域中使用类别标签。一种成功的UDA方法是学习域外适应的域不同表示，以平衡特征分布的分布。但是，在学习过程中忽略目标域的类结构可能会导致过度平衡，从而负面影响分类任务。为了解决这些限制，我们提议一种基于最优运输（OT）的UDA算法，利用OT的分布几何结构特性。在OT中，我们考虑了一个联合分布 над feature和标签信息的误差度量，以便在域对齐过程中进行域外适应。我们的之前研究发现，完全对源和目标域的分布进行对齐可能会导致一种负面传递，其中源域中的类或无关类在目标域中的不同类型。这种负面传递会降低适应模型的性能。为了解决这个问题，我们在UDA框架中引入了coupling-weighted partial optimal transport（POT），其中在对齐过程中采用软约束的OT交互基于运输成本进行调整。我们使用了cross-domain SLR任务来评估我们的UDA算法。实验结果表明，我们的UDA算法在跨频SLR任务中表现得更好，与现有UDA算法相比。

2023-10-20

eess.AS

eess.AS - 2023-10-20

GenDistiller: Distilling Pre-trained Language Models based on Generative Models

paper_url: http://arxiv.org/abs/2310.13418
repo_url: None
paper_authors: Yingying Gao, Shilei Zhang, Zihao Cui, Yanhan Xu, Chao Deng, Junlan Feng
for: 提高资源有限设备上下游任务表现
methods: 使用生成语言模型进行知识塑造框架，生成教师网络的隐藏层
results: 比基eline系统提高下游任务表现，并且可以在资源有限设备上进行应用

Abstract
Self-supervised pre-trained models such as HuBERT and WavLM leverage unlabeled speech data for representation learning and offer significantly improve for numerous downstream tasks. Despite the success of these methods, their large memory and strong computational requirements hinder their application on resource restricted devices. Therefore, this paper introduces GenDistiller, a novel knowledge distillation framework to distill hidden representations from teacher network based on generative language model. The generative structure enables the proposed model to generate the target teacher hidden layers autoregressively, considering the interactions between hidden layers without instroducing additional inputs. A two-dimensional attention mechanism is implemented to ensure the causality of hidden layers, while preserving bidirectional attention in the time dimension. Experiments reveal the advantage of the generative distiller over the baseline system that predicts the hidden layers of teacher network directly without a generatvie model.

摘要
自我监督预训练模型如 HuBERT 和 WavLM 利用无标注语音数据进行表示学习，并有显著提高多个下游任务的表现。尽管这些方法取得成功，但它们具有大量内存和强大计算需求，因此阻碍它们在资源有限的设备上应用。因此，这篇论文介绍了 GenDistiller，一种新的知识填充框架，通过基于生成语言模型的教师网络来填充教师网络的隐藏表示。生成结构允许我们预测教师网络的目标隐藏层，考虑隐藏层之间的互动，而不需要额外的输入。在时间维度中，我们实现了双维度注意力机制，以保持隐藏层之间的 causality，同时保留双向注意力。实验表明，基于生成模型的填充distiller在基eline系统所predict teacher network的隐藏层直接而无法实现的优势。

2023-10-20

cs.CV

cs.CV - 2023-10-20

A Dual-Stream Neural Network Explains the Functional Segregation of Dorsal and Ventral Visual Pathways in Human Brains

paper_url: http://arxiv.org/abs/2310.13849
repo_url: https://github.com/minkyu-choi04/dualstreambrains
paper_authors: Minkyu Choi, Kuan Han, Xiaokai Wang, Yizhen Zhang, Zhongming Liu
for: 模仿人类视系统中的两条平行路径，该模型用于空间处理和物体识别。
methods: 使用两个分支的卷积神经网络（CNN）模型，分别模仿人脑中的脊梁和轴索路径。
results: 与人脑处理同一影片时，模型的两个分支具有不同的学习目标和表征，主要受到视觉注意力和物体识别的各自目标的影响，而不是特定的抑制或选择性。

Abstract
The human visual system uses two parallel pathways for spatial processing and object recognition. In contrast, computer vision systems tend to use a single feedforward pathway, rendering them less robust, adaptive, or efficient than human vision. To bridge this gap, we developed a dual-stream vision model inspired by the human eyes and brain. At the input level, the model samples two complementary visual patterns to mimic how the human eyes use magnocellular and parvocellular retinal ganglion cells to separate retinal inputs to the brain. At the backend, the model processes the separate input patterns through two branches of convolutional neural networks (CNN) to mimic how the human brain uses the dorsal and ventral cortical pathways for parallel visual processing. The first branch (WhereCNN) samples a global view to learn spatial attention and control eye movements. The second branch (WhatCNN) samples a local view to represent the object around the fixation. Over time, the two branches interact recurrently to build a scene representation from moving fixations. We compared this model with the human brains processing the same movie and evaluated their functional alignment by linear transformation. The WhereCNN and WhatCNN branches were found to differentially match the dorsal and ventral pathways of the visual cortex, respectively, primarily due to their different learning objectives. These model-based results lead us to speculate that the distinct responses and representations of the ventral and dorsal streams are more influenced by their distinct goals in visual attention and object recognition than by their specific bias or selectivity in retinal inputs. This dual-stream model takes a further step in brain-inspired computer vision, enabling parallel neural networks to actively explore and understand the visual surroundings.

摘要
人类视系统使用两条平行的路径进行空间处理和物体识别。相比之下，计算机视系统通常使用单个径向的Feedforward路径，使其比人视不太强大、适应性或效率。为了bridging这个差距，我们开发了基于人眼和大脑的双流视模型。在输入级别，模型从两个不同的视觉模式中采样两个协同的视觉特征，以模仿人眼使用大脑皮层ganglion cells将视觉输入分配给大脑。在后端，模型通过两个分支的卷积神经网络（CNN）处理分配的输入，以模仿人大脑使用脑干和脑膜两个路径进行平行的视觉处理。首支分支（WhereCNN）在全视图中学习空间注意力和控制眼动。第二支分支（WhatCNN）在fixation周围的视觉中学习物体特征。随着时间的推移，两支分支之间进行交互性的回归，从移动的眼动中构建场景表示。我们与人脑处理相同电影的结果进行对比，并评估其功能对齐性。WhereCNN和WhatCNN分支在功能上与人脑的视觉干道和脑膜干道相对应，尤其是由于它们的不同学习目标。这些模型基于的结果引导我们推断，人脑的视觉干道和脑膜干道的不同响应和表示是更多地受到它们的特定目标和需求而决定的，而不是它们特定的偏好或选择性在视觉输入中。这种双流模型在人脑适应计算机视觉方面又一步进展，使得平行神经网络能够活动地探索和理解视觉环境。

Normalizing flow-based deep variational Bayesian network for seismic multi-hazards and impacts estimation from InSAR imagery

paper_url: http://arxiv.org/abs/2310.13805
repo_url: None
paper_authors: Xuechun Li, Paula M. Burgi, Wei Ma, Hae Young Noh, David J. Wald, Susu Xu
for: 本研究旨在提供高精度的onsite灾害估算，以便快速和有效地进行post-灾应对。
methods: 本研究使用抗干扰synthetic aperture radar（InSAR）数据，并提出了一种新的Stochastic variational inference with normalizing flows方法，可以同时估算多种不见的灾害和影响。
results: 研究表明，该方法可以减少由干扰和复杂的信号引起的估算误差，并提供高精度的onsite灾害估算结果。

Abstract
Onsite disasters like earthquakes can trigger cascading hazards and impacts, such as landslides and infrastructure damage, leading to catastrophic losses; thus, rapid and accurate estimates are crucial for timely and effective post-disaster responses. Interferometric Synthetic aperture radar (InSAR) data is important in providing high-resolution onsite information for rapid hazard estimation. Most recent methods using InSAR imagery signals predict a single type of hazard and thus often suffer low accuracy due to noisy and complex signals induced by co-located hazards, impacts, and irrelevant environmental changes (e.g., vegetation changes, human activities). We introduce a novel stochastic variational inference with normalizing flows derived to jointly approximate posteriors of multiple unobserved hazards and impacts from noisy InSAR imagery.

摘要

Data-Free Knowledge Distillation Using Adversarially Perturbed OpenGL Shader Images

paper_url: http://arxiv.org/abs/2310.13782
repo_url: None
paper_authors: Logan Frank, Jim Davis
for: This paper focuses on the problem of knowledge distillation (KD) in the absence of the original training data, which is known as “data-free” KD.methods: The proposed approach uses unnatural OpenGL images and large amounts of data augmentation, along with adversarial attacks, to train a student network.results: The proposed method achieves state-of-the-art results for a variety of datasets/networks and is more stable than existing generator-based data-free KD methods.Here’s the text in Simplified Chinese:for: 本文关注无原始训练数据的知识储备（KD）问题，即“数据空”KD。methods: 提议的方法使用不同的OpenGL图像和大量数据增强，以及抗击攻击，来训练学生网络。results: 提议的方法在多个数据集/网络上达到了状态的最佳结果，并且比现有的生成器基于的数据空KD方法更稳定。

Abstract
Knowledge distillation (KD) has been a popular and effective method for model compression. One important assumption of KD is that the original training dataset is always available. However, this is not always the case due to privacy concerns and more. In recent years, "data-free" KD has emerged as a growing research topic which focuses on the scenario of performing KD when no data is provided. Many methods rely on a generator network to synthesize examples for distillation (which can be difficult to train) and can frequently produce images that are visually similar to the original dataset, which raises questions surrounding whether privacy is completely preserved. In this work, we propose a new approach to data-free KD that utilizes unnatural OpenGL images, combined with large amounts of data augmentation and adversarial attacks, to train a student network. We demonstrate that our approach achieves state-of-the-art results for a variety of datasets/networks and is more stable than existing generator-based data-free KD methods. Source code will be available in the future.

摘要
知识塑化（KD）是一种受欢迎且有效的模型压缩方法。然而，KD中一个重要假设是原始训练集总是可用的。然而，这并不总是情况，特别是由于隐私问题和其他因素。在过去几年，无数据KD（data-free KD）作为一个快速发展的研究领域而出现。许多方法利用生成器网络生成例子进行塑化（可能困难于训练），并且可能会生成与原始数据集相似的图像，这引发了隐私是否完全保持的问题。在这项工作中，我们提出了一种新的无数据KD方法，利用不自然的OpenGL图像，结合大量数据增强和对抗攻击，训练学生网络。我们示出了我们的方法可以在多个数据集和网络上实现状态之最的结果，并且更稳定于现有的生成器基于的无数据KD方法。未来我们计划将源代码公开。

TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

paper_url: http://arxiv.org/abs/2310.13772
repo_url: None
paper_authors: Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, Kangxue Yin
for: 这 paper 的目的是提出一种新的 texture synthesis 方法，用于给定的 3D geometry Synthesize textures.
methods: 这 paper 使用了 large-scale text-guided image diffusion models 来实现 texture synthesis. 它不同于 recent works 通过使用 2D text-to-image diffusion models 来缓慢和脆弱的优化 процесс来 distill 3D objects.
results: TexFusion 可以生成高质量、多样性和全局一致的 textures. 它可以efficiently generate diverse, high quality and globally coherent textures.

Abstract
We present TexFusion (Texture Diffusion), a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models. In contrast to recent works that leverage 2D text-to-image diffusion models to distill 3D objects using a slow and fragile optimization process, TexFusion introduces a new 3D-consistent generation technique specifically designed for texture synthesis that employs regular diffusion model sampling on different 2D rendered views. Specifically, we leverage latent diffusion models, apply the diffusion model's denoiser on a set of 2D renders of the 3D object, and aggregate the different denoising predictions on a shared latent texture map. Final output RGB textures are produced by optimizing an intermediate neural color field on the decodings of 2D renders of the latent texture. We thoroughly validate TexFusion and show that we can efficiently generate diverse, high quality and globally coherent textures. We achieve state-of-the-art text-guided texture synthesis performance using only image diffusion models, while avoiding the pitfalls of previous distillation-based methods. The text-conditioning offers detailed control and we also do not rely on any ground truth 3D textures for training. This makes our method versatile and applicable to a broad range of geometry and texture types. We hope that TexFusion will advance AI-based texturing of 3D assets for applications in virtual reality, game design, simulation, and more.

摘要
我们介绍TexFusion（Texture Diffusion），一种新的方法用于给定3D геометрии的文本渲染。相比之前的工作，利用2D文本到图像扩散模型来降低3D对象的整合过程，TexFusion引入了一种专门为文本生成设计的3D一致的生成技术。具体来说，我们利用潜在扩散模型，对2D渲染视图中的3D对象应用扩散模型的降噪器，并将不同视图中的降噪器预测集成到共享潜在文本地图上。最终输出的RGB文本是通过优化2D渲染视图中的扩散预测来生成的。我们详细验证了TexFusion，并证明了我们可以高效地生成多样化、高质量和全局一致的文本。我们在只使用图像扩散模型的情况下，超越了先前的滤波器基本方法，并且不需要任何基于真实3D文本的训练。这使得我们的方法通用和可应用于各种几何和文本类型。我们希望TexFusion能够推动AI在虚拟现实、游戏设计、模拟和更多应用中的3D资产 текстури化。

PACE: Human and Camera Motion Estimation from in-the-wild Videos

paper_url: http://arxiv.org/abs/2310.13768
repo_url: None
paper_authors: Muhammed Kocabas, Ye Yuan, Pavlo Molchanov, Yunrong Guo, Michael J. Black, Otmar Hilliges, Jan Kautz, Umar Iqbal
for: 估计人体运动在全景视频中
methods: 提议一种结合人体动作和摄像机动作的全球估计方法，通过对人体动作和背景特征的结合来分离人体和摄像机动作。不同于现有方法使用SLAM作为初始化，我们提议在估计人体和摄像机动作时紧密地结合SLAM和人体动作约束。
results: 对比现有方法，我们的方法在人体和摄像机动作估计中具有显著的改进，并且提出了一种适合批处理的运动假设，使我们的方法更加高效。此外，我们还提出了一个适合评估摄像机动作的实验室数据集，并在实验中证明了我们的方法的优越性。

Abstract
We present a method to estimate human motion in a global scene from moving cameras. This is a highly challenging task due to the coupling of human and camera motions in the video. To address this problem, we propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features. Unlike existing methods that use SLAM as initialization, we propose to tightly integrate SLAM and human motion priors in an optimization that is inspired by bundle adjustment. Specifically, we optimize human and camera motions to match both the observed human pose and scene features. This design combines the strengths of SLAM and motion priors, which leads to significant improvements in human and camera motion estimation. We additionally introduce a motion prior that is suitable for batch optimization, making our approach significantly more efficient than existing approaches. Finally, we propose a novel synthetic dataset that enables evaluating camera motion in addition to human motion from dynamic videos. Experiments on the synthetic and real-world RICH datasets demonstrate that our approach substantially outperforms prior art in recovering both human and camera motions.

摘要
我们提出了一种方法来估计人体动作在全景视频中。这是一个非常困难的任务，因为人体和摄像机的运动在视频中相互关联。为解决这个问题，我们提议一个共同优化框架，通过对人体动作和摄像机运动进行分离，使用前景人体动作假设和背景场景特征。不同于现有的方法使用SLAM作为初始化，我们提议在SLAM和人体动作假设之间进行紧密的集成，以便同时优化人体和摄像机的运动。这种设计 combinesthe strengths of SLAM and motion priors, leading to significant improvements in human and camera motion estimation. In addition, we introduce a motion prior suitable for batch optimization, making our approach much more efficient than existing methods. Finally, we propose a novel synthetic dataset that enables evaluating camera motion in addition to human motion from dynamic videos. Experimental results on the synthetic and real-world RICH datasets show that our approach significantly outperforms prior art in estimating both human and camera motions.

U-BEV: Height-aware Bird’s-Eye-View Segmentation and Neural Map-based Relocalization

paper_url: http://arxiv.org/abs/2310.13766
repo_url: None
paper_authors: Andrea Boscolo Camiletto, Alfredo Bochicchio, Alexander Liniger, Dengxin Dai, Abel Gawel
for: 本研究旨在提高智能汽车的重新本地化精度，当GPS接收强度不足或感知器基本本地化失败时。
methods: 本文使用鸟瞰视图（BEV）分割技术来估算当地场景的准确出现，并可以帮助智能汽车重新本地化。然而，BEV方法的缺点是需要大量计算来利用地理约束。本文提出了U-BEV模型，它是基于U-Net架构的BEV分割模型，可以让BEV在多层高度上进行Scene理解。这种扩展可以提高U-BEV的性能，并且和其他计算相同的BEV方法相比，提高1.7到2.8 mIoU。
results: 本研究结果表明，将编码的神经网络BEV与可导分配模板匹配器结合使用，可以实现高精度的重新本地化。与类似计算复杂度的 transformer-based BEV方法相比，本方法提高了重新本地化精度。在nuScenes数据集上，本方法的召回精度高于26%。

Abstract
Efficient relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails. Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance and in turn, can benefit the relocalization of the vehicle. However, one downside of BEV methods is the heavy computation required to leverage the geometric constraints. This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features. We show that this extension boosts the performance of the U-BEV by up to 4.11 IoU. Additionally, we combine the encoded neural BEV with a differentiable template matcher to perform relocalization on neural SD-map data. The model is fully end-to-end trainable and outperforms transformer-based BEV methods of similar computational complexity by 1.7 to 2.8 mIoU and BEV-based relocalization by over 26% Recall Accuracy on the nuScenes dataset.

摘要
efficient 地方化是智能车辆当GPS接收不够或感知器基于的地方化失败时的关键。最近的鸟瞰视图（BEV）分割技术允许精确地估计当地场景的外观，从而为车辆的重新定位提供了利器。然而，BEV方法的一个缺点是需要大量的计算来利用地形约束。这篇论文提出了U-BEV架构，它是基于U-Net的建议，允许BEV理解场景在多个高层次上进行推理，然后将BEV特征级别。我们显示，这种扩展可以提高U-BEV的性能，最多提高4.11 IoU。此外，我们将编码的神经网络BEV与演算器模板匹配器结合，以实现基于神经网络的SD-map数据重新定位。该模型是完全端到端训练的，与类似计算复杂度的转换器基于BEV方法相比，提高了1.7到2.8 mIoU，并超过了26%的Recall精度在nuScenes数据集。

Evaluating sleep-stage classification: how age and early-late sleep affects classification performance

paper_url: http://arxiv.org/abs/2310.13754
repo_url: None
paper_authors: Eugenia Moris, Ignacio Larrabide
for: automatic sleep-stage classification method
methods: Wavelets for feature extraction, Random Forest for classification
results: the performance of the classifier is affected by the age of the subjects and the moment of sleep, with some sleep stages improving and others worsening.Here’s the full translation of the abstract in Simplified Chinese:
for: automatic sleep-stage classification method
methods: 使用浪波lets для特征提取，Random Forest进行分类
results: 论文发现，参与者年龄和睡眠时间点会影响自动分类器的性能，一些睡眠阶段的分类得到改善，而另一些则得到恶化。

Abstract
Sleep stage classification is a common method used by experts to monitor the quantity and quality of sleep in humans, but it is a time-consuming and labour-intensive task with high inter- and intra-observer variability. Using Wavelets for feature extraction and Random Forest for classification, an automatic sleep-stage classification method was sought and assessed. The age of the subjects, as well as the moment of sleep (early-night and late-night), were confronted to the performance of the classifier. From this study, we observed that these variables do affect the automatic model performance, improving the classification of some sleep stages and worsening others.

摘要
睡眠阶段分类是一种常用的专业人员用于评估人类睡眠量和质量的方法，但是它是一项时间consuming和劳动密集的任务，具有高度的内部和外部观察者变化。使用浪涌来提取特征和随机森林来分类，一种自动睡眠阶段分类方法被寻找并评估。研究发现，试验者的年龄以及睡眠的时间点（晚上早些和晚上较晚）对自动模型的性能产生影响，改善一些睡眠阶段的分类，而对其他阶段的分类则有所恶化。

Reference-based Restoration of Digitized Analog Videotapes

paper_url: http://arxiv.org/abs/2310.14926
repo_url: https://github.com/miccunifi/tape
paper_authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo
for: restore analog videotapes
methods: CLIP-based artifact detection, Swin-UNet network with Multi-Reference Spatial Feature Fusion (MRSFF) blocks
results: effective restoration of degraded analog videotapes, outperforms other state-of-the-art methodsHere’s the full text in Simplified Chinese:
for: restore analog videotapes
methods: CLIP-based artifact detection, Swin-UNet network with Multi-Reference Spatial Feature Fusion (MRSFF) blocks
results: 高效地 Restore 了受损的 analog videotapes, 比其他状态的方法更高效I hope that helps!

Abstract
Analog magnetic tapes have been the main video data storage device for several decades. Videos stored on analog videotapes exhibit unique degradation patterns caused by tape aging and reader device malfunctioning that are different from those observed in film and digital video restoration tasks. In this work, we present a reference-based approach for the resToration of digitized Analog videotaPEs (TAPE). We leverage CLIP for zero-shot artifact detection to identify the cleanest frames of each video through textual prompts describing different artifacts. Then, we select the clean frames most similar to the input ones and employ them as references. We design a transformer-based Swin-UNet network that exploits both neighboring and reference frames via our Multi-Reference Spatial Feature Fusion (MRSFF) blocks. MRSFF blocks rely on cross-attention and attention pooling to take advantage of the most useful parts of each reference frame. To address the absence of ground truth in real-world videos, we create a synthetic dataset of videos exhibiting artifacts that closely resemble those commonly found in analog videotapes. Both quantitative and qualitative experiments show the effectiveness of our approach compared to other state-of-the-art methods. The code, the model, and the synthetic dataset are publicly available at https://github.com/miccunifi/TAPE.

摘要
传统的材料磁带是数据存储设备的主要设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设备数据存储设�

Localizing and Editing Knowledge in Text-to-Image Generative Models

paper_url: http://arxiv.org/abs/2310.13730
repo_url: None
paper_authors: Samyadeep Basu, Nanxuan Zhao, Vlad Morariu, Soheil Feizi, Varun Manjunatha
for: 这篇论文的目的是研究文本到图像生成模型中的知识储存和传递问题。
methods: 这篇论文使用了 causal mediation analysis 方法来分析文本到图像模型中不同视觉特征知识的储存和传递。具体来说， authors 使用了 UNet 和文本编码器来跟踪不同视觉特征知识的传递，并发现不同视觉特征知识不是孤立在具体组件中，而是分布在文本到图像模型中的一系列组件中。
results: 这篇论文的结果表明，CLIP 文本编码器在实际的文本到图像模型中只有一个 causal state，并且这个 causal state 是文本中最后一个属性token 的第一个自注意层。这与其他语言模型的 causal state 不同，后者通常是中间的 MLP 层。基于这个发现， authors 提出了一种快速、无需数据的模型修改方法 Diff-QuickFix，可以快速编辑（简化）文本到图像模型中的概念。

Abstract
Text-to-Image Diffusion Models such as Stable-Diffusion and Imagen have achieved unprecedented quality of photorealism with state-of-the-art FID scores on MS-COCO and other generation benchmarks. Given a caption, image generation requires fine-grained knowledge about attributes such as object structure, style, and viewpoint amongst others. Where does this information reside in text-to-image generative models? In our paper, we tackle this question and understand how knowledge corresponding to distinct visual attributes is stored in large-scale text-to-image diffusion models. We adapt Causal Mediation Analysis for text-to-image models and trace knowledge about distinct visual attributes to various (causal) components in the (i) UNet and (ii) text-encoder of the diffusion model. In particular, we show that unlike generative large-language models, knowledge about different attributes is not localized in isolated components, but is instead distributed amongst a set of components in the conditional UNet. These sets of components are often distinct for different visual attributes. Remarkably, we find that the CLIP text-encoder in public text-to-image models such as Stable-Diffusion contains only one causal state across different visual attributes, and this is the first self-attention layer corresponding to the last subject token of the attribute in the caption. This is in stark contrast to the causal states in other language models which are often the mid-MLP layers. Based on this observation of only one causal state in the text-encoder, we introduce a fast, data-free model editing method Diff-QuickFix which can effectively edit concepts in text-to-image models. DiffQuickFix can edit (ablate) concepts in under a second with a closed-form update, providing a significant 1000x speedup and comparable editing performance to existing fine-tuning based editing methods.

摘要
文本到图像扩散模型，如稳定扩散和图像，在最新的MS-COCO和其他生成benchmark中实现了无 precedent的图像真实度水平。给出caption，图像生成需要细化的知识，包括对象结构、风格、视角等多个属性。在这些模型中，这些信息存储在哪里？在我们的论文中，我们解答这个问题，了解扩散模型中对不同视觉属性的知识是如何分布的。我们采用了 causal mediation analysis，跟踪扩散模型中对不同视觉属性的知识是如何分布在（i）UNet和（ii）文本编码器中。尤其是，我们发现在大型文本到图像扩散模型中，不同属性的知识不是孤立分布在特定的组件中，而是分布在一系列组件中。这些组件frequently是不同的视觉属性。另外，我们发现在公共的文本到图像模型中，如稳定扩散，CLIP文本编码器只有一个 causal state，这是最后一个主题token的第一个自注意力层。这与其他语言模型的 causal state不同，通常是mid-MLP层。基于这一观察，我们引入了一种快速、无数据的模型修改方法Diff-QuickFix，可以快速编辑（抹除）文本到图像模型中的概念。Diff-QuickFix可以在下一秒内进行数据准确的更新，提供1000倍的速度增加和与现有精细调整方法相当的编辑性能。

Using Human-like Mechanism to Weaken Effect of Pre-training Weight Bias in Face-Recognition Convolutional Neural Network

paper_url: http://arxiv.org/abs/2310.13674
repo_url: None
paper_authors: Haojiang Ying, Yi-Fan Li, Yiyang Chen
for: 这个研究的目的是解释人工智能中的卷积神经网络（CNN）如何模仿人类的认知机制。
methods: 研究者使用了4种广泛使用的CNN模型（AlexNet、VGG11、VGG13和VGG16），通过转移学习进行情感值分类任务。与人类数据进行比较，研究发现这些CNN模型在一定程度上模仿人类的认知方式。基于神经科学和行为数据，研究者还对AlexNet进行了更新，使其更像人类的认知。
results: 研究发现，更新后的FE-AlexNet在情感值分类任务中表现更好，并且与人类认知更相似。这些结果还揭示了CNN模型的计算机制。此外，这项研究还提供了一种新的理解和改进CNN性能的方法，基于人类数据。

Abstract
Convolutional neural network (CNN), as an important model in artificial intelligence, has been widely used and studied in different disciplines. The computational mechanisms of CNNs are still not fully revealed due to the their complex nature. In this study, we focused on 4 extensively studied CNNs (AlexNet, VGG11, VGG13, and VGG16) which has been analyzed as human-like models by neuroscientists with ample evidence. We trained these CNNs to emotion valence classification task by transfer learning. Comparing their performance with human data, the data unveiled that these CNNs would partly perform as human does. We then update the object-based AlexNet using self-attention mechanism based on neuroscience and behavioral data. The updated FE-AlexNet outperformed all the other tested CNNs and closely resembles human perception. The results further unveil the computational mechanisms of these CNNs. Moreover, this study offers a new paradigm to better understand and improve CNN performance via human data.

摘要
卷积神经网络（CNN）是人工智能中一种重要的模型，在不同领域都得到了广泛的应用和研究。然而，它们的计算机制仍然没有完全揭示，因为它们的复杂性。在本研究中，我们选择了4种广泛采用和研究的 CNN（AlexNet、VGG11、VGG13和VGG16），由神经科学家作为人类模型进行分析，并有丰富的证据。我们使用传输学习训练这些 CNN 进行情感值分类任务。与人类数据进行比较，数据表明这些 CNN 在一定程度上会 acted like humans do。然后，我们将 AlexNet 更新为基于自注意力机制的 FE-AlexNet，并证明它在所有测试 CNN 中表现最佳，并且与人类感知很相似。这些结果进一步揭示了这些 CNN 的计算机制，并提供了一种新的方法来更好地理解和改进 CNN 性能。

ARNIQA: Learning Distortion Manifold for Image Quality Assessment

paper_url: http://arxiv.org/abs/2310.14918
repo_url: https://github.com/miccunifi/arniqa
paper_authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo
for: 本研究旨在开发一种无需参考图像的图像质量评估方法，以匹配人类受众的识别。
methods: 我们提出了一种自助学习方法，名为ARNIQA，它通过学习图像扭曲映射来获得图像质量表示。
results: 我们的方法在多个数据集上达到了当今最佳性能，并且在数据效率、通用性和稳定性等方面也表现出优异。代码和模型可以在 GitHub 上获取。

Abstract
No-Reference Image Quality Assessment (NR-IQA) aims to develop methods to measure image quality in alignment with human perception without the need for a high-quality reference image. In this work, we propose a self-supervised approach named ARNIQA (leArning distoRtion maNifold for Image Quality Assessment) for modeling the image distortion manifold to obtain quality representations in an intrinsic manner. First, we introduce an image degradation model that randomly composes ordered sequences of consecutively applied distortions. In this way, we can synthetically degrade images with a large variety of degradation patterns. Second, we propose to train our model by maximizing the similarity between the representations of patches of different images distorted equally, despite varying content. Therefore, images degraded in the same manner correspond to neighboring positions within the distortion manifold. Finally, we map the image representations to the quality scores with a simple linear regressor, thus without fine-tuning the encoder weights. The experiments show that our approach achieves state-of-the-art performance on several datasets. In addition, ARNIQA demonstrates improved data efficiency, generalization capabilities, and robustness compared to competing methods. The code and the model are publicly available at https://github.com/miccunifi/ARNIQA.

摘要
NR-IQA（无参考图像质量评估）的目标是开发一种基于人类感知的图像质量评估方法，不需要高质量的参考图像。在这种工作中，我们提出了一种自我超级vised方法 named ARNIQA（扩展的Random Image Distoration Manifold for Image Quality Assessment），用于模型图像扭曲 manifold 以获取内在的质量表示。首先，我们引入了一种图像抑制模型，可以随机排序 consecutively applied distortions 的序列。这样，我们可以Synthesize 图像的多种抑制模式。其次，我们提议在训练模型时，通过将不同图像的 patches 的表示相似化，来实现图像在同样的抑制模式下的匹配。因此，具有同样的抑制模式的图像将在扭曲 manifold 中的不同位置相对应。最后，我们将图像表示映射到质量分数的简单线性回归器，这样无需调整编码器的参数。实验结果显示，我们的方法可以在多个数据集上达到状态之最的性能。此外，ARNIQA 还表现出了更好的数据使用效率、通用能力和Robustness 等特点，相比其他方法。模型和代码可以在上获取。

Deep-Learning-based Change Detection with Spaceborne Hyperspectral PRISMA data

paper_url: http://arxiv.org/abs/2310.13627
repo_url: None
paper_authors: J. F. Amieva, A. Austoni, M. A. Brovelli, L. Ansalone, P. Naylor, F. Serva, B. Le Saux
for:This paper is written for environmental monitoring and disaster management, as well as other sectors where change detection (CD) is applied.methods:The paper uses both standard and deep-learning (DL) CD methods, as well as a pipeline starting from coregistration, followed by CD with a full-spectrum algorithm, and a DL network developed for optical data.results:The paper finds that changes in vegetation and built environments are well captured using the proposed methods, and that the spectral information is valuable for identifying subtle changes. However, the paper also notes that atmospheric effects and the lack of reliable ground truth present a major challenge to hyperspectral CD.

Abstract
Change detection (CD) methods have been applied to optical data for decades, while the use of hyperspectral data with a fine spectral resolution has been rarely explored. CD is applied in several sectors, such as environmental monitoring and disaster management. Thanks to the PRecursore IperSpettrale della Missione operativA (PRISMA), hyperspectral-from-space CD is now possible. In this work, we apply standard and deep-learning (DL) CD methods to different targets, from natural to urban areas. We propose a pipeline starting from coregistration, followed by CD with a full-spectrum algorithm and by a DL network developed for optical data. We find that changes in vegetation and built environments are well captured. The spectral information is valuable to identify subtle changes and the DL methods are less affected by noise compared to the statistical method, but atmospheric effects and the lack of reliable ground truth represent a major challenge to hyperspectral CD.

摘要
CD方法（变化检测方法）在光学数据中应用了数十年，而使用高分谱数据的使用则rarely explored。CD方法在各个领域，如环境监测和灾害管理中使用。due to the PRecursore IperSpettrale della Missione operativA (PRISMA), hyperspectral-from-space CD现在可能。在这种工作中，我们对不同的目标，从自然区域到城市区域进行了应用。我们提议一个管道，从协调开始，然后使用全谱式算法进行CD，并使用为光学数据开发的深度学习网络。我们发现， Vegetation和建筑环境中的变化都被良好地捕捉。spectral信息有价值，可以发现微妙的变化，而深度学习方法对噪声比统计方法更为敏感，但大气效应和可靠的地面真实数据的缺乏是干扰 hyperspectral CD的主要挑战。

Inter-Scale Dependency Modeling for Skin Lesion Segmentation with Transformer-based Networks

paper_url: http://arxiv.org/abs/2310.13727
repo_url: None
paper_authors: Sania Eskandari, Janet Lumpp
for: 帮助诊断皮肤癌变，自动分割皮肤病变部分
methods: 使用U-Net建模，并提出Inter-scale Context Fusion（ISCF）模块来弥补semantic gaps
results: 在皮肤病变分割 benchmark 上获得了有效的结果，支持应用性和效果

Abstract
Melanoma is a dangerous form of skin cancer caused by the abnormal growth of skin cells. Fully Convolutional Network (FCN) approaches, including the U-Net architecture, can automatically segment skin lesions to aid diagnosis. The symmetrical U-Net model has shown outstanding results, but its use of a convolutional operation limits its ability to capture long-range dependencies, which are essential for accurate medical image segmentation. In addition, the U-shaped structure suffers from the semantic gaps between the encoder and decoder. In this study, we developed and evaluated a U-shaped hierarchical Transformer-based structure for skin lesion segmentation while we proposed an Inter-scale Context Fusion (ISCF) to utilize the attention correlations in each stage of the encoder to adaptively combine the contexts coming from each stage to hinder the semantic gaps. The preliminary results of the skin lesion segmentation benchmark endorse the applicability and efficacy of the ISCF module.

摘要
melanoma 是一种危险的皮肤癌，由皮肤细胞异常生长所致。全量卷积网络（FCN）方法，包括 U-Net 架构，可以自动 segment 皮肤损伤以帮助诊断。 symmetrical U-Net 模型已经达到了卓越的效果，但它使用的 convolutional 操作限制了它的捕捉长距离依赖关系的能力，这些关系是医疗影像分割中非常重要的。此外， U 形结构受到 encoder 和 decoder 之间的 semantic gap 的困扰。在这项研究中，我们开发了一种 U 形层次 Transformer 结构，用于皮肤损伤 segmentation，并提出了一种 Inter-scale Context Fusion（ISCF）模块，用于在每个encoder 阶段中adaptively combine 来自每个阶段的上下文，以避免 semantic gap。preliminary 的皮肤损伤 segmentation benchmark 结果表征了 ISCF 模块的可行性和效果。

What you see is what you get: Experience ranking with deep neural dataset-to-dataset similarity for topological localisation

paper_url: http://arxiv.org/abs/2310.13622
repo_url: https://github.com/mttgdd/vdna-experience-selection
paper_authors: Matthew Gadd, Benjamin Ramtoula, Daniele De Martini, Paul Newman
for: 本研究旨在提高视觉导航中的地点 lokalisierung 精度和稳定性，通过回忆最相关的视觉记忆来减少地点 lokalisierung 的努力。
methods: 本研究使用了 Visual DNA，一种高度可扩展的工具来比较图像集。在本研究中，我们使用了图像序列（map和实时经验）的比较，以检测模式的变化，包括天气、照明和季节。
results: 我们发现，对于使用深度建筑来进行地点 lokalisierung，分布度量可以比较 neuron-wise 活动统计数据 между实时图像和多个以前记录的经验，并且可以考虑季节（冬夏）或时间点（天亮夜晚）的变化。我们的方法可以准确地评估实际地点 lokalisierung 性能，并且在北欧的 Nordland 跨季度数据集和 Oxford 大学公园的照明和温和季节变化数据集上进行验证。

Abstract
Recalling the most relevant visual memories for localisation or understanding a priori the likely outcome of localisation effort against a particular visual memory is useful for efficient and robust visual navigation. Solutions to this problem should be divorced from performance appraisal against ground truth - as this is not available at run-time - and should ideally be based on generalisable environmental observations. For this, we propose applying the recently developed Visual DNA as a highly scalable tool for comparing datasets of images - in this work, sequences of map and live experiences. In the case of localisation, important dataset differences impacting performance are modes of appearance change, including weather, lighting, and season. Specifically, for any deep architecture which is used for place recognition by matching feature volumes at a particular layer, we use distribution measures to compare neuron-wise activation statistics between live images and multiple previously recorded past experiences, with a potentially large seasonal (winter/summer) or time of day (day/night) shift. We find that differences in these statistics correlate to performance when localising using a past experience with the same appearance gap. We validate our approach over the Nordland cross-season dataset as well as data from Oxford's University Parks with lighting and mild seasonal change, showing excellent ability of our system to rank actual localisation performance across candidate experiences.

摘要
回忆最有关系的视觉记忆可以帮助提高视觉导航的效率和稳定性。解决这个问题应该与表现评估 Against ground truth 分离，因为在运行时不可用。我们提议使用最近发展的视觉 DNA 作为高可扩展的图像比较工具。在本工作中，我们使用 Distribution measures 来比较 neuron-wise 活动统计量 междуlive图像和多个前期记录的过去经验，包括季节（冬季/夏季）或时间点（日间/夜晚）的变化。我们发现这些统计量与表现强相关，可以用来评估不同经验的地方化性。我们验证了我们的方法在北欧的 Nordland 跨季度数据集以及牛津大学公园的光照和温和季节变化数据集上，并示出了我们系统可以高效地评估实际的地方化性。

FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer

paper_url: http://arxiv.org/abs/2310.13605
repo_url: None
paper_authors: Xinyu Zhang, Li Wang, Zhiqiang Jiang, Kun Dai, Tao Xie, Lei Yang, Wenhao Yu, Yang Shen, Jun Li
for: 本研究旨在提出一种基于Transformer的Feature Matching方法，以提高计算机视觉任务中的结构从运动和视觉定位精度。
methods: 本方法使用一种专门设计的Reconciliatory Transformer（RecFormer），包括全球观察注意层（GPAL）、评估重要性层（PWL）和本地观察Feed-forward网络（LPFFN），以适应不同的感知范围和重要性进行自适应调整。
results: 广泛的实验结果表明，FMRT在多个标准 bencmarks上达到了极高的性能水平，包括pose estimation、visual localization、homography estimation和图像匹配等。

Abstract
Local Feature Matching, an essential component of several computer vision tasks (e.g., structure from motion and visual localization), has been effectively settled by Transformer-based methods. However, these methods only integrate long-range context information among keypoints with a fixed receptive field, which constrains the network from reconciling the importance of features with different receptive fields to realize complete image perception, hence limiting the matching accuracy. In addition, these methods utilize a conventional handcrafted encoding approach to integrate the positional information of keypoints into the visual descriptors, which limits the capability of the network to extract reliable positional encoding message. In this study, we propose Feature Matching with Reconciliatory Transformer (FMRT), a novel Transformer-based detector-free method that reconciles different features with multiple receptive fields adaptively and utilizes parallel networks to realize reliable positional encoding. Specifically, FMRT proposes a dedicated Reconciliatory Transformer (RecFormer) that consists of a Global Perception Attention Layer (GPAL) to extract visual descriptors with different receptive fields and integrate global context information under various scales, Perception Weight Layer (PWL) to measure the importance of various receptive fields adaptively, and Local Perception Feed-forward Network (LPFFN) to extract deep aggregated multi-scale local feature representation. Extensive experiments demonstrate that FMRT yields extraordinary performance on multiple benchmarks, including pose estimation, visual localization, homography estimation, and image matching.

摘要
本文提出了一种新的Transformer基本方法，即Feature Matching with Reconciliatory Transformer（FMRT），用于解决多个计算机视觉任务（如结构从运动和视觉位置）中的本地特征匹配问题。这些方法仅将键点之间的长距离上下文信息集成到Transformer网络中，导致网络无法考虑不同的感知场景中的特征之间的重要性，从而限制匹配精度。此外，这些方法使用了传统的手动编码方法来整合键点的位置信息到视觉描述符中，这限制了网络的能力以可靠的编码位置信息。在本研究中，我们提出了一种专门的Reconciliatory Transformer（RecFormer），包括全球感知注意层（GPAL）、重要性测量层（PWL）和本地感知径向网络（LPFFN）。GPAL用于捕捉不同感知场景中的视觉描述符，并将其集成到不同缩放尺度下的全球上下文信息中；PWL用于测量不同感知场景中的重要性，并将其适应性地调整；LPFFN用于提取多尺度本地特征表示。广泛的实验表明，FMRT在多个测试准则上表现出色，包括pose estimation、视觉位置、投影估计和图像匹配等。

Longer-range Contextualized Masked Autoencoder

paper_url: http://arxiv.org/abs/2310.13593
repo_url: None
paper_authors: Taekyung Kim, Sanghyuk Chun, Byeongho Heo, Dongyoon Han
for: 提高自助学习模型的长范围关注和多个范围关注的能力，以提高图像识别 tasks 的性能。
methods: 提出了一种名为 Longer-range Contextualized Masked Autoencoder (LC-MAE) 的自助学习框架，通过全像 pixels 和多视图 pixels 的组合来提高模型的长范围关注和多个范围关注能力，同时减少输入的空间重复性。
results: 通过LC-MAE框架，实现了 ImageNet-1K 上的 Top-1 准确率提高至 84.2%，与基eline ViT-B 模型相比增加了 0.6%p，并在 semantic segmentation 和 fine-grained visual classification 任务上显示出了显著的性能提升。

Abstract
Masked image modeling (MIM) has emerged as a promising self-supervised learning (SSL) strategy. The MIM pre-training facilitates learning powerful representations using an encoder-decoder framework by randomly masking some input pixels and reconstructing the masked pixels from the remaining ones. However, as the encoder is trained with partial pixels, the MIM pre-training can suffer from a low capability of understanding long-range dependency. This limitation may hinder its capability to fully understand multiple-range dependencies, resulting in narrow highlighted regions in the attention map that may incur accuracy drops. To mitigate the limitation, We propose a self-supervised learning framework, named Longer-range Contextualized Masked Autoencoder (LC-MAE). LC-MAE effectively leverages a global context understanding of visual representations while simultaneously reducing the spatial redundancy of input at the same time. Our method steers the encoder to learn from entire pixels in multiple views while also learning local representation from sparse pixels. As a result, LC-MAE learns more discriminative representations, leading to a performance improvement of achieving 84.2% top-1 accuracy with ViT-B on ImageNet-1K with 0.6%p gain. We attribute the success to the enhanced pre-training method, as evidenced by the singular value spectrum and attention analyses. Finally, LC-MAE achieves significant performance gains at the downstream semantic segmentation and fine-grained visual classification tasks; and on diverse robust evaluation metrics. Our code will be publicly available.

摘要
自适应学习（SSL）策略中的面具模型（MIM）已经出现为一种有前途的策略。MIM预训练使用Encoder-Decoder框架，随机遮盖输入像素，并从剩下的像素中重建遮盖的像素。然而，由于Encoder在部分像素上训练，MIM预训练可能会受到长距离依赖的限制，从而导致缺乏多距离依赖的全面理解，可能导致窄的强调区域在注意力图中，从而导致准确性下降。为解决这些限制，我们提出了一种自适应学习框架，名为长距离contextualized Masked Autoencoder（LC-MAE）。LC-MAE有效地利用了视觉表示的全局上下文理解，同时减少输入的空间重复。我们的方法使得Encoder从多个视图中学习整个像素，同时从罕见像素中学习本地表示。因此，LC-MAE学习出了更加细致的表示，导致它在ImageNet-1K上 achieved 84.2% top-1准确率，升准确率0.6%。我们认为这些成功是由于我们的预训练方法的改进，如谱值 спектrum和注意力分析所证明。最后，LC-MAE在下游semantic segmentation和细化视觉分类任务中具有显著性能提升，并在多个 Robust 评价指标上表现出色。我们的代码将公开。

POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization

paper_url: http://arxiv.org/abs/2310.13585
repo_url: None
paper_authors: Elahe Vahdani, Yingli Tian
for: 本 paper targets the challenge of point-supervised temporal action detection, where only a single frame is annotated for each action instance in the training set.
methods: 本 paper 提出了一种 Pseudo-label Oriented Transformer (POTLoc)，使用只有点级标注进行weakly-supervised Action Localization。POTLoc 通过自我训练策略来识别和跟踪连续的动作结构。
results: POTLoc 在 THUMOS’14 和 ActivityNet-v1.2 数据集上表现出色，与状态之前的点supervised方法进行比较，显示了5%的平均精度提升。

Abstract
This paper tackles the challenge of point-supervised temporal action detection, wherein only a single frame is annotated for each action instance in the training set. Most of the current methods, hindered by the sparse nature of annotated points, struggle to effectively represent the continuous structure of actions or the inherent temporal and semantic dependencies within action instances. Consequently, these methods frequently learn merely the most distinctive segments of actions, leading to the creation of incomplete action proposals. This paper proposes POTLoc, a Pseudo-label Oriented Transformer for weakly-supervised Action Localization utilizing only point-level annotation. POTLoc is designed to identify and track continuous action structures via a self-training strategy. The base model begins by generating action proposals solely with point-level supervision. These proposals undergo refinement and regression to enhance the precision of the estimated action boundaries, which subsequently results in the production of `pseudo-labels' to serve as supplementary supervisory signals. The architecture of the model integrates a transformer with a temporal feature pyramid to capture video snippet dependencies and model actions of varying duration. The pseudo-labels, providing information about the coarse locations and boundaries of actions, assist in guiding the transformer for enhanced learning of action dynamics. POTLoc outperforms the state-of-the-art point-supervised methods on THUMOS'14 and ActivityNet-v1.2 datasets, showing a significant improvement of 5% average mAP on the former.

摘要
This paper proposes POTLoc, a Pseudo-label Oriented Transformer for weakly-supervised Action Localization, which utilizes only point-level annotation. POTLoc is designed to identify and track continuous action structures via a self-training strategy. The base model generates action proposals solely with point-level supervision, which undergo refinement and regression to enhance the precision of the estimated action boundaries. These estimated boundaries serve as supplementary supervisory signals, known as pseudo-labels, to guide the transformer for enhanced learning of action dynamics.The architecture of the model integrates a transformer with a temporal feature pyramid to capture video snippet dependencies and model actions of varying duration. The pseudo-labels provide information about the coarse locations and boundaries of actions, assisting the transformer in learning action dynamics. POTLoc outperforms state-of-the-art point-supervised methods on the THUMOS'14 and ActivityNet-v1.2 datasets, showing a significant improvement of 5% average mAP on the former.

Progressive Dual Priori Network for Generalized Breast Tumor Segmentation

paper_url: http://arxiv.org/abs/2310.13574
repo_url: None
paper_authors: Li Wang, Lihui Wang, Zixiang Kuai, Lei Tang, Yingfeng Ou, Chen Ye, Yuemin Zhu
for: 提高乳腺癌分 segmentation模型的通用能力和小型腺癌分割性能
methods: 提出进步 dual priori network (PDPNet)，使用均衡精度和稳定性两种约束来进行乳腺癌分割
results: 比较多种国家数据集，PDPNet的 DSC、SEN、KAPPA 和 HD95 值分别提高 3.63%、8.19%、5.52% 和 3.66%，并通过减少正常组织的影响来提高模型的通用能力。

Abstract
To promote the generalization ability of breast tumor segmentation models, as well as to improve the segmentation performance for breast tumors with smaller size, low-contrast amd irregular shape, we propose a progressive dual priori network (PDPNet) to segment breast tumors from dynamic enhanced magnetic resonance images (DCE-MRI) acquired at different sites. The PDPNet first cropped tumor regions with a coarse-segmentation based localization module, then the breast tumor mask was progressively refined by using the weak semantic priori and cross-scale correlation prior knowledge. To validate the effectiveness of PDPNet, we compared it with several state-of-the-art methods on multi-center datasets. The results showed that, comparing against the suboptimal method, the DSC, SEN, KAPPA and HD95 of PDPNet were improved 3.63\%, 8.19\%, 5.52\%, and 3.66\% respectively. In addition, through ablations, we demonstrated that the proposed localization module can decrease the influence of normal tissues and therefore improve the generalization ability of the model. The weak semantic priors allow focusing on tumor regions to avoid missing small tumors and low-contrast tumors. The cross-scale correlation priors are beneficial for promoting the shape-aware ability for irregual tumors. Thus integrating them in a unified framework improved the multi-center breast tumor segmentation performance.

摘要
<>为提高乳腺癌分割模型的通用能力以及改善小型、低对比度和不规则形状的乳腺癌分割性能，我们提出了进步式双级先知网络（PDPNet）。PDPNet首先使用粗略分割基于本地化模块将肿瘤区域分割出来，然后通过弱语义先知和跨尺度相关先知知识进行进一步细化。为验证PDPNet的效果，我们与多个中心数据集进行比较。结果显示，相比于不优化方法，PDPNet的DSC、SEN、KAPPA和HD95分别提高了3.63%、8.19%、5.52%和3.66%。此外，通过剥离，我们证明了提案的本地化模块可以减少正常组织的影响，因此提高模型的通用能力。弱语义先知使得模型更注重肿瘤区域，以避免漏掉小肿瘤和低对比度肿瘤。跨尺度相关先知使得模型能够更好地捕捉不规则形状的肿瘤。因此，将它们集成到一个统一框架中，提高了多中心乳腺癌分割性能。

A Simple Baseline for Knowledge-Based Visual Question Answering

paper_url: http://arxiv.org/abs/2310.13570
repo_url: None
paper_authors: Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos
for: 本研究的目的是解决知识基础视觉问答问题（KB-VQA），以提高问答效果。
methods: 本研究提出了一种简单、可重现的管道，基于快速内容学习，使用问题描述文本作为上下文信息，通过提问LLaMA（1和2）来解决问题。
results: 相比之前的方法，本研究的方法不需要训练，无需访问外部数据库或API，却可以达到当前最佳性能水平在OK-VQA和A-OK-VQA数据集上。

Abstract
This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA). Recent works have emphasized the significance of incorporating both explicit (through external databases) and implicit (through LLMs) knowledge to answer questions requiring external knowledge effectively. A common limitation of such approaches is that they consist of relatively complicated pipelines and often heavily rely on accessing GPT-3 API. Our main contribution in this paper is to propose a much simpler and readily reproducible pipeline which, in a nutshell, is based on efficient in-context learning by prompting LLaMA (1 and 2) using question-informative captions as contextual information. Contrary to recent approaches, our method is training-free, does not require access to external databases or APIs, and yet achieves state-of-the-art accuracy on the OK-VQA and A-OK-VQA datasets. Finally, we perform several ablation studies to understand important aspects of our method. Our code is publicly available at https://github.com/alexandrosXe/ASimple-Baseline-For-Knowledge-Based-VQA

摘要
这篇论文关注知识基于视觉问答（KB-VQA）问题。现有研究强调了结合外部数据库和深度学习模型（LLMs）的知识来解决需要外部知识的问题。这些方法的共同局限性是它们通常包含复杂的管道和大量依赖于GPT-3 API访问。我们的主要贡献在这篇论文中是提出了一个非常简单和可重现的管道，即通过快速在 LLMA （1和2）中进行准确的培训，使用问题描述作为 Contextual information。与之前的方法不同，我们的方法不需要训练和访问外部数据库或 API，却可以达到 OK-VQA 和 A-OK-VQA 数据集上的状态当前精度。最后，我们进行了多个ablation study来理解我们的方法的重要方面。我们的代码公开可用于 GitHub 上的 https://github.com/alexandrosXe/ASimple-Baseline-For-Knowledge-Based-VQA。

ROSS: Radar Off-road Semantic Segmentation

paper_url: http://arxiv.org/abs/2310.13551
repo_url: None
paper_authors: Peng Jiang, Srikanth Saripalli
for: 本研究旨在探讨RADAR数据中的Semantic segmentation问题，以提高自动导航在非道路环境中的能效性。
methods: 我们提出了一种新的管道，利用LIDAR数据和现有的 annotated off-road LIDAR数据来生成RADAR标签，其中RADAR数据被 Represented as images。
results: 我们验证了这种实用的方法，并发现它在实际数据中表现出色，这demonstrates the potential of RADAR technology for navigation applications in off-road environments。Here’s the translation in English for reference:
for: This study aims to tackle the inherent complexities of semantic segmentation in RADAR data for off-road scenarios, with the goal of enhancing the efficiency of autonomous navigation in such environments.
methods: We propose a novel pipeline that utilizes LIDAR data and an existing annotated off-road LIDAR dataset to generate RADAR labels, where the RADAR data are represented as images.
results: We validate the effectiveness of our practical approach using real-world datasets, and find that it performs excellently, demonstrating the potential of RADAR technology for navigation applications in off-road environments.

Abstract
As the demand for autonomous navigation in off-road environments increases, the need for effective solutions to understand these surroundings becomes essential. In this study, we confront the inherent complexities of semantic segmentation in RADAR data for off-road scenarios. We present a novel pipeline that utilizes LIDAR data and an existing annotated off-road LIDAR dataset for generating RADAR labels, in which the RADAR data are represented as images. Validated with real-world datasets, our pragmatic approach underscores the potential of RADAR technology for navigation applications in off-road environments.

摘要
As the demand for autonomous navigation in off-road environments increases, the need for effective solutions to understand these surroundings becomes essential. In this study, we confront the inherent complexities of semantic segmentation in RADAR data for off-road scenarios. We present a novel pipeline that utilizes LIDAR data and an existing annotated off-road LIDAR dataset for generating RADAR labels, in which the RADAR data are represented as images. Validated with real-world datasets, our pragmatic approach underscores the potential of RADAR technology for navigation applications in off-road environments.Here's the translation in Simplified Chinese characters:随着自动驾驶在非路面环境中的需求增加，理解这些环境的能力变得非常重要。在这个研究中，我们面临了RADAR数据中的自然复杂性，并提出了一种新的管道，利用LIIDAR数据和现有的 annotated off-road LIIDAR 数据集来生成 RADAR 标签。我们使用现实世界数据集进行验证，并证明了 RADAR 技术在非路面环境中的导航应用程序潜力。

A review of individual tree crown detection and delineation from optical remote sensing images

paper_url: http://arxiv.org/abs/2310.13481
repo_url: None
paper_authors: Juepeng Zheng, Shuai Yuan, Weijia Li, Haohuan Fu, Le Yu
for: This paper provides a comprehensive review of Individual Tree Crown Detection (ITCD) methods for detecting and delineating individual tree crowns in optical remote sensing images.methods: The review covers a wide range of ITCD methods, including traditional image processing methods, traditional machine learning methods, and deep learning-based methods.results: The review discusses the strengths and limitations of each method and provides a clear knowledge map of existing ITCD efforts. It also proposes some ITCD-related applications and potential hot topics in future ITCD research.

Abstract
Powered by the advances of optical remote sensing sensors, the production of very high spatial resolution multispectral images provides great potential for achieving cost-efficient and high-accuracy forest inventory and analysis in an automated way. Lots of studies that aim at providing an inventory to the level of each individual tree have generated a variety of methods for Individual Tree Crown Detection and Delineation (ITCD). This review covers ITCD methods for detecting and delineating individual tree crowns, and systematically reviews the past and present of ITCD-related researches applied to the optical remote sensing images. With the goal to provide a clear knowledge map of existing ITCD efforts, we conduct a comprehensive review of recent ITCD papers to build a meta-data analysis, including the algorithm, the study site, the tree species, the sensor type, the evaluation method, etc. We categorize the reviewed methods into three classes: (1) traditional image processing methods (such as local maximum filtering, image segmentation, etc.); (2) traditional machine learning methods (such as random forest, decision tree, etc.); and (3) deep learning based methods. With the deep learning-oriented approaches contributing a majority of the papers, we further discuss the deep learning-based methods as semantic segmentation and object detection methods. In addition, we discuss four ITCD-related issues to further comprehend the ITCD domain using optical remote sensing data, such as comparisons between multi-sensor based data and optical data in ITCD domain, comparisons among different algorithms and different ITCD tasks, etc. Finally, this review proposes some ITCD-related applications and a few exciting prospects and potential hot topics in future ITCD research.

摘要
Powered by the advances of optical remote sensing sensors, the production of very high spatial resolution multispectral images provides great potential for achieving cost-efficient and high-accuracy forest inventory and analysis in an automated way. Many studies that aim at providing an inventory to the level of each individual tree have generated a variety of methods for Individual Tree Crown Detection and Delineation (ITCD). This review covers ITCD methods for detecting and delineating individual tree crowns, and systematically reviews the past and present of ITCD-related researches applied to the optical remote sensing images. With the goal to provide a clear knowledge map of existing ITCD efforts, we conduct a comprehensive review of recent ITCD papers to build a meta-data analysis, including the algorithm, the study site, the tree species, the sensor type, the evaluation method, etc. We categorize the reviewed methods into three classes: (1) traditional image processing methods (such as local maximum filtering, image segmentation, etc.); (2) traditional machine learning methods (such as random forest, decision tree, etc.); and (3) deep learning based methods. With the deep learning-oriented approaches contributing a majority of the papers, we further discuss the deep learning-based methods as semantic segmentation and object detection methods. In addition, we discuss four ITCD-related issues to further comprehend the ITCD domain using optical remote sensing data, such as comparisons between multi-sensor based data and optical data in ITCD domain, comparisons among different algorithms and different ITCD tasks, etc. Finally, this review proposes some ITCD-related applications and a few exciting prospects and potential hot topics in future ITCD research.

Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation

paper_url: http://arxiv.org/abs/2310.13479
repo_url: https://github.com/fgirbal/segment-select-correct
paper_authors: Francisco Eiras, Kemal Oksuz, Adel Bibi, Philip H. S. Torr, Puneet K. Dokania
for: 提高弱类学习图像分割（Referring Image Segmentation，RIS）的性能，使其与全监督学习方法匹配。
methods: 提出一种新的弱类学习框架，包括三个步骤：获取引用 instrucion中对象的实例Mask（segment），使用零损学习选择可能正确的Mask（select），并使用模型修复零损选择的错误（correct）。
results: 在实验中，只使用首两个步骤（零损 segment和 select）比其他零损基elines提高了19%，而我们的全方法可以更高地超越这个强baseline，将弱类学习RIS的性能提高至全监督方法水平，在某些情况下将间隔降低至14%。

Abstract
Referring Image Segmentation (RIS) - the problem of identifying objects in images through natural language sentences - is a challenging task currently mostly solved through supervised learning. However, while collecting referred annotation masks is a time-consuming process, the few existing weakly-supervised and zero-shot approaches fall significantly short in performance compared to fully-supervised learning ones. To bridge the performance gap without mask annotations, we propose a novel weakly-supervised framework that tackles RIS by decomposing it into three steps: obtaining instance masks for the object mentioned in the referencing instruction (segment), using zero-shot learning to select a potentially correct mask for the given instruction (select), and bootstrapping a model which allows for fixing the mistakes of zero-shot selection (correct). In our experiments, using only the first two steps (zero-shot segment and select) outperforms other zero-shot baselines by as much as 19%, while our full method improves upon this much stronger baseline and sets the new state-of-the-art for weakly-supervised RIS, reducing the gap between the weakly-supervised and fully-supervised methods in some cases from around 33% to as little as 14%. Code is available at https://github.com/fgirbal/segment-select-correct.

摘要
稍微指导学习（RIS）——在自然语言句子中识别图像中的对象——是一项具有挑战性的任务，目前主要通过指导学习来解决。然而，收集引用涂抹的过程是时间consuming的，而exist的弱指导学习和零shot方法在性能上与完全指导学习方法相比显得较差。为了bridging性能差距而不需要涂抹，我们提出了一种新的弱指导学习框架，该框架通过分解RIS任务为三步来解决：取得提及的对象图像的实例涂抹（segment），使用零shot学习选择可能正确的涂抹（select），并使用一种可以修复零shot选择的错误的模型（correct）。在我们的实验中，只使用第一两步（零shot segment和select）可以与其他零shot基elines相比提高到19%，而我们的全方法可以超越这个强baseline，并将弱指导学习RIS的状态rola-art降低到14%。代码可以在https://github.com/fgirbal/segment-select-correct中找到。

Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models

paper_url: http://arxiv.org/abs/2310.13473
repo_url: https://github.com/coderj-one/giraffe-bench
paper_authors: Mingwei Zhu, Leigang Sha, Yu Shu, Kangjia Zhao, Tiancheng Zhao, Jianwei Yin
for: 本文旨在评估大型语言模型（MLLMs）在预测预测任务中的能力，以探索其预测理解能力的可能性。
methods: 本文提出了一个新的benchmark，用于评估MLLMs在多种场景中的预测能力。该benchmark包括三个重要领域：抽象模式逻辑推理、人员活动预测和物理互动预测。同时，本文还开发了三种评估方法，以评估模型在基于多 modal输入的预测和逻辑推理任务中的表现。
results: 实验证明了本文提出的benchmark和评估方法的可靠性，并 revela了当前popular MLLMs在预测任务中的优缺点。最后，本文的benchmark可以为MLLMs的发展提供一个标准化的评估框架，并促进模型的更高级别的发展，以便在复杂的多modal输入下进行预测和逻辑推理。

Abstract
Multimodal large language models (MLLMs) have shown great potential in perception and interpretation tasks, but their capabilities in predictive reasoning remain under-explored. To address this gap, we introduce a novel benchmark that assesses the predictive reasoning capabilities of MLLMs across diverse scenarios. Our benchmark targets three important domains: abstract pattern reasoning, human activity prediction, and physical interaction prediction. We further develop three evaluation methods powered by large language model to robustly quantify a model's performance in predicting and reasoning the future based on multi-visual context. Empirical experiments confirm the soundness of the proposed benchmark and evaluation methods via rigorous testing and reveal pros and cons of current popular MLLMs in the task of predictive reasoning. Lastly, our proposed benchmark provides a standardized evaluation framework for MLLMs and can facilitate the development of more advanced models that can reason and predict over complex long sequence of multimodal input.

摘要

Dance Your Latents: Consistent Dance Generation through Spatial-temporal Subspace Attention Guided by Motion Flow

paper_url: http://arxiv.org/abs/2310.14780
repo_url: None
paper_authors: Haipeng Fang, Zhihao Sun, Ziyao Huang, Fan Tang, Juan Cao, Sheng Tang
for: 这篇论文旨在提高人体舞蹈生成领域的生成AI技术，以实现更高质量的舞蹈视频生成。
methods: 该方法基于隐藏空间的嵌入学习，并提出了空间-时间嵌入注意力块和运动流导航注意力块等两种新的注意力机制，以提高生成的空间时间一致性。
results: 实验结果表明，该方法可以显著提高生成的舞蹈视频中的空间时间一致性，从而提高生成的质量。

Abstract
The advancement of generative AI has extended to the realm of Human Dance Generation, demonstrating superior generative capacities. However, current methods still exhibit deficiencies in achieving spatiotemporal consistency, resulting in artifacts like ghosting, flickering, and incoherent motions. In this paper, we present Dance-Your-Latents, a framework that makes latents dance coherently following motion flow to generate consistent dance videos. Firstly, considering that each constituent element moves within a confined space, we introduce spatial-temporal subspace-attention blocks that decompose the global space into a combination of regular subspaces and efficiently model the spatiotemporal consistency within these subspaces. This module enables each patch pay attention to adjacent areas, mitigating the excessive dispersion of long-range attention. Furthermore, observing that body part's movement is guided by pose control, we design motion flow guided subspace align & restore. This method enables the attention to be computed on the irregular subspace along the motion flow. Experimental results in TikTok dataset demonstrate that our approach significantly enhances spatiotemporal consistency of the generated videos.

摘要
“人体舞蹈生成领域内，对于生成AI的进步已经推广到。然而，目前的方法仍然存在着时空一致性的缺陷，导致类似“幽灵”、“跳跃”和“无统一”的artefacts出现。在本文中，我们提出 Dance-Your-Latents 框架，让 latent 在动作流中跳舞具有一致性，实现了一致的舞蹈视频生成。首先，我们考虑到每个元素在紧随的空间内运动，我们引入时空频域注意力对应方法，将全球空间分解为一系列规律的频域和有效地模型时空一致性。这个模员使每个小区与邻近区域进行对话，解决了过度分散的长距离注意力问题。其次，我们观察到人体部分的运动受到pose控制的指导，我们设计了动作流导向的时空对齐恢复方法。这种方法使得注意力可以在动作流方向上进行计算，实现了一致的注意力Computing。实验结果显示，我们的方法在TikTok数据集上有 statistically significant 提高了生成视频的时空一致性。”

Two-Stage Triplet Loss Training with Curriculum Augmentation for Audio-Visual Retrieval

paper_url: http://arxiv.org/abs/2310.13451
repo_url: None
paper_authors: Donghuo Zeng, Kazushi Ikeda
for: 本文targets the problem of cross-modal retrieval, specifically addressing the issue of suboptimal model performance due to the oversight of distinguishing between semi-hard and hard triples in the optimization process.
methods: 本文提出了一种基于课程学习的两阶段训练方法，通过从 semi-hard triplets开始，然后通过 interpolating embeddings来增强模型的学习过程。最后，模型通过 hard triplet mining来进一步优化。
results: 实验结果表明，在两个音频视频数据集上，与当前状态艺术方法MSNSCA进行比较，本文的方法在AV-CMR任务上的AVE数据集上提高了均值溢通精度（MAP）的平均值约9.8%， indicating the effectiveness of the proposed method.

Abstract
The cross-modal retrieval model leverages the potential of triple loss optimization to learn robust embedding spaces. However, existing methods often train these models in a singular pass, overlooking the distinction between semi-hard and hard triples in the optimization process. The oversight of not distinguishing between semi-hard and hard triples leads to suboptimal model performance. In this paper, we introduce a novel approach rooted in curriculum learning to address this problem. We propose a two-stage training paradigm that guides the model's learning process from semi-hard to hard triplets. In the first stage, the model is trained with a set of semi-hard triplets, starting from a low-loss base. Subsequently, in the second stage, we augment the embeddings using an interpolation technique. This process identifies potential hard negatives, alleviating issues arising from high-loss functions due to a scarcity of hard triples. Our approach then applies hard triplet mining in the augmented embedding space to further optimize the model. Extensive experimental results conducted on two audio-visual datasets show a significant improvement of approximately 9.8% in terms of average Mean Average Precision (MAP) over the current state-of-the-art method, MSNSCA, for the Audio-Visual Cross-Modal Retrieval (AV-CMR) task on the AVE dataset, indicating the effectiveness of our proposed method.

摘要
《cross-modal retrieval模型可以利用 triple loss优化来学习强化 embedding空间。然而，现有方法通常在单个过程中训练这些模型，忽视 semi-hard triplets 和 hard triplets 之间的差异在优化过程中。这种忽视导致模型性能下降。在本文中，我们提出了一种新的方法，基于 curriculum learning，来解决这个问题。我们提议一种两阶段训练方法，使得模型在 semi-hard triplets 上学习，然后使用 interpolate 技术来增强 embedding，并在扩展 embedding 空间中进行 hard triplet 挖掘，以进一步优化模型。我们的方法在两个音频视频数据集上进行了广泛的实验，并与当前状态艺术方法 MSNSCA 进行了比较。结果表明，我们的方法在 AVE 数据集上的 Audio-Visual Cross-Modal Retrieval（AV-CMR）任务中提高了平均精度为9.8%。》Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China. The translation is based on the given text and may not reflect the exact phrasing or wording of the original text.

Definition-independent Formalization of Soundscapes: Towards a Formal Methodology

paper_url: http://arxiv.org/abs/2310.13404
repo_url: None
paper_authors: Mikel D. Jedrusiak, Thomas Harweg, Timo Haselhoff, Bryce T. Lawrence, Susanne Moebus, Frank Weichert
for: 本研究旨在提供一种独立于听samples定义的 formalization，以便捕捉不同领域的听samples数据的多元结构，以及不同 идеológy的表述。
methods: 本研究使用了频谱相关矩阵来检测土地使用类型，作为一种 alternativa 于 MFCCs 的特征。
results: exemplary 分析表明，使用频谱相关矩阵可以准确地检测土地使用类型，并且可以捕捉到不同领域的听samples数据中的多元结构。

Abstract
Soundscapes have been studied by researchers from various disciplines, each with different perspectives, goals, approaches, and terminologies. Accordingly, depending on the field, the concept of a soundscape's components changes, consequently changing the basic definition. This results in complicating interdisciplinary communication and comparison of results. Especially when soundscape-unrelated research areas are involved. For this reason, we present a potential formalization that is independent of the underlying soundscape definition, with the goal of being able to capture the heterogeneous structure of the data as well as the different ideologies in one model. In an exemplary analysis of frequency correlation matrices for land use type detection as an alternative to features like MFCCs, we show a practical application of our presented formalization.

摘要
听soundscapes已经被不同领域的研究人员研究，每个领域有不同的视角、目标、方法和术语。因此，听soundscapes的组成部分在不同领域中发生变化，导致基本定义的变化。这会使交通 между不同领域和对结果的比较变得复杂。特别是当涉及到不同领域的研究时。为了解决这个问题，我们提出了一种可能的正式化方法，可以独立地捕捉听soundscapes的多元结构和不同意识形态。在一个例子中，我们通过对听frequency correlation matrix进行分析，以示听land use类型检测的实际应用。

paper_url: http://arxiv.org/abs/2310.13398
repo_url: None
paper_authors: Yijie Zhou, Likun Cai, Xianhui Cheng, Zhongxue Gan, Xiangyang Xue, Wenchao Ding
for: automatic annotating functions for multi-modal data
methods: open-source open-vocabulary auto-labeling system that integrates Large Language Models (LLMs) and vision-language models (VLMs)
results: significantly improves annotation efficiency compared to manual annotation while providing accurate open-vocabulary auto-annotating results.Here’s the simplified Chinese text:
for: automatic对多模态数据进行注解功能
methods: 基于开源的开 vocabulary自动注解系统，结合大语言模型（LLMs）和视力语言模型（VLMs）
results: Significantly improves manual注解效率，提供准确的开 vocabulary自动注解结果

Abstract
In the era of big data and large models, automatic annotating functions for multi-modal data are of great significance for real-world AI-driven applications, such as autonomous driving and embodied AI. Unlike traditional closed-set annotation, open-vocabulary annotation is essential to achieve human-level cognition capability. However, there are few open-vocabulary auto-labeling systems for multi-modal 3D data. In this paper, we introduce OpenAnnotate3D, an open-source open-vocabulary auto-labeling system that can automatically generate 2D masks, 3D masks, and 3D bounding box annotations for vision and point cloud data. Our system integrates the chain-of-thought capabilities of Large Language Models (LLMs) and the cross-modality capabilities of vision-language models (VLMs). To the best of our knowledge, OpenAnnotate3D is one of the pioneering works for open-vocabulary multi-modal 3D auto-labeling. We conduct comprehensive evaluations on both public and in-house real-world datasets, which demonstrate that the system significantly improves annotation efficiency compared to manual annotation while providing accurate open-vocabulary auto-annotating results.

摘要
在大数据和大模型时代，自动标注函数对多 modal 数据的应用非常重要，如自动驾驶和具体 AI。不同于传统的关闭集合标注，开放词汇标注是实现人类认知水平的关键。然而，对多 modal 3D 数据的开放词汇自动标注系统很少。在这篇论文中，我们介绍 OpenAnnotate3D，一个开源的开放词汇自动标注系统，可以自动生成2D masks、3D masks和3D bounding box注释 для视觉和点云数据。我们的系统结合了 Large Language Models (LLMs) 的链条思维能力和视觉语言模型 (VLMs) 的交叉模态能力。据我们所知，OpenAnnotate3D 是开放词汇多 modal 3D 自动标注的先驱之作。我们对公共和内部实验室数据进行了全面的评估，结果表明，系统可以大幅提高人工标注的效率，同时提供高精度的开放词汇自动标注结果。

ScalableMap: Scalable Map Learning for Online Long-Range Vectorized HD Map Construction

paper_url: http://arxiv.org/abs/2310.13378
repo_url: https://github.com/jingy1yu/scalablemap
paper_authors: Jingyi Yu, Zizhao Zhang, Shengfu Xia, Jizhang Sang
for: 这个论文是为了建立在board camera感知器上的在线长距离高清地图建构管线。
methods: 该论文使用了纹理化表示法，使用多边形和多边形来表示地图元素。它还提出了一种层次稀疏地图表示法，以便更好地利用纹理化地图元素的可扩展性，并设计了一种进程编码机制和一种监督策略。
results: 该论文在nuScenes数据集上达到了6.5 mAP的最高精度，在长距离场景下表现特别出色，超越了之前的状态态模型，并且实现了18.3 FPS的速度。

Abstract
We propose a novel end-to-end pipeline for online long-range vectorized high-definition (HD) map construction using on-board camera sensors. The vectorized representation of HD maps, employing polylines and polygons to represent map elements, is widely used by downstream tasks. However, previous schemes designed with reference to dynamic object detection overlook the structural constraints within linear map elements, resulting in performance degradation in long-range scenarios. In this paper, we exploit the properties of map elements to improve the performance of map construction. We extract more accurate bird's eye view (BEV) features guided by their linear structure, and then propose a hierarchical sparse map representation to further leverage the scalability of vectorized map elements and design a progressive decoding mechanism and a supervision strategy based on this representation. Our approach, ScalableMap, demonstrates superior performance on the nuScenes dataset, especially in long-range scenarios, surpassing previous state-of-the-art model by 6.5 mAP while achieving 18.3 FPS. Code is available at https://github.com/jingy1yu/ScalableMap.

摘要
我们提出了一种新的端到端管道，用于在线上进行高清定制（HD）地图建构，使用车载摄像头传感器。这种vectorized表示方法，使用多边形和多边形来表示地图元素，广泛用于下游任务。然而，之前的方案忽略了线性地图元素的结构约束，导致长距离场景下的性能下降。在这篇论文中，我们利用地图元素的属性来提高地图建构的性能。我们从多个 bird's eye view（BEV）特征中提取更加准确的特征，然后提出一种层次稀疏地图表示法，以便更好地利用vectorized地图元素的可扩展性，并设计了一种进程式解码机制和一种根据这种表示法的监督策略。我们的方法，称为ScalableMap，在nuScenes数据集上表现出色，特别是在长距离场景下，比前一个状态的模型提高6.5 mAP，并达到18.3 FPS。代码可以在https://github.com/jingy1yu/ScalableMap中下载。

Single-view 3D reconstruction via inverse procedural modeling

paper_url: http://arxiv.org/abs/2310.13373
repo_url: None
paper_authors: Albert Garifullin, Nikolay Maiorov, Vladimir Frolov
for: 3D reconstruction via inverse procedural modeling, demonstrating results on tree models and complex objects
methods: using a genetic algorithm for fitting set of input parameters, differentiable rendering and differentiable procedural generators for precise reconstruction
results: significant improvement in precision, ability to reconstruct 3D models accurately with a small number of input images, and application to complex generators with both differentiable and non-differentiable procedural generators

Abstract
We propose an approach to 3D reconstruction via inverse procedural modeling and investigate two variants of this approach. The first option consists in the fitting set of input parameters using a genetic algorithm. We demonstrate the results of our work on tree models, complex objects, with the reconstruction of which most existing methods cannot handle. The second option allows us to significantly improve the precision by using gradients within memetic algorithm, differentiable rendering and also differentiable procedural generators. In our work we see 2 main contributions. First, we propose a method to join differentiable rendering and inverse procedural modeling. This gives us an opportunity to reconstruct 3D model more accurately than existing approaches when a small number of input images are available (even for single image). Second, we join both differentiable and non-differentiable procedural generators in a single framework which allow us to apply inverse procedural modeling to fairly complex generators: when gradient is available, reconstructions is precise, when gradient is not available, reconstruction is approximate, but always high quality without visual artifacts.

摘要
我们提出了一种3D重建方法，基于反工程模型，并研究了这种方法的两种变体。第一种方法使用遗传算法来调整输入参数的集合。我们在树模型、复杂物体上进行了实验，并达到了现有方法无法处理的3D重建结果。第二种方法使用内在的积分算法、可微渲染和可微生成器，可以在输入图像很少时进行高精度的3D重建。在我们的工作中，我们认为有两个主要贡献：首先，我们将可微渲染和反工程模型结合在一起，从而在输入图像很少时可以更加准确地重建3D模型（即使用单个图像）。其次，我们将可微和非可微的生成器集成到同一个框架中，以便应用反工程模型到较复杂的生成器中，当gradient可用时，重建是精确的，当gradient不可用时，重建是相对精度高而无视觉 artifacts。

PSGText: Stroke-Guided Scene Text Editing with PSP Module

paper_url: http://arxiv.org/abs/2310.13366
repo_url: None
paper_authors: Felix Liawi, Yun-Da Tsai, Guan-Lun Lu, Shou-De Lin
for: 该论文目的是提供一种将文本 transferred 到图像中的方法，以保持原始文本背景和样式，并提高修改后的文本的清晰度和可读性。
methods: 该方法包括三个阶段：首先，我们引入了一个文本交换网络，可以轻松地将原始文本替换为新的文本。其次，我们 integrates 了一个背景填充网络，用于修复background图像中的空洞，以保持视觉协调和一致性。最后，我们使用一个融合网络将这两个网络的结果融合，得到高清晰度和可读性的修改后图像。
results: 该方法可以生成高质量的修改后图像，保持原始文本背景和样式，并提高修改后文本的清晰度和可读性。 demo 视频可以在补充材料中找到。

Abstract
Scene Text Editing (STE) aims to substitute text in an image with new desired text while preserving the background and styles of the original text. However, present techniques present a notable challenge in the generation of edited text images that exhibit a high degree of clarity and legibility. This challenge primarily stems from the inherent diversity found within various text types and the intricate textures of complex backgrounds. To address this challenge, this paper introduces a three-stage framework for transferring texts across text images. Initially, we introduce a text-swapping network that seamlessly substitutes the original text with the desired replacement. Subsequently, we incorporate a background inpainting network into our framework. This specialized network is designed to skillfully reconstruct background images, effectively addressing the voids left after the removal of the original text. This process meticulously preserves visual harmony and coherence in the background. Ultimately, the synthesis of outcomes from the text-swapping network and the background inpainting network is achieved through a fusion network, culminating in the creation of the meticulously edited final image. A demo video is included in the supplementary material.

摘要

Sync-NeRF: Generalizing Dynamic NeRFs to Unsynchronized Videos

paper_url: http://arxiv.org/abs/2310.13356
repo_url: https://github.com/seoha-kim/Sync-NeRF
paper_authors: Seoha Kim, Jeongmin Bae, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh
for: 该论文旨在解决4D场景重建使用神经辐射场（NeRF）时，对动态场景的重建受限，并且无法适应不同时刻拍摄的多视图视频。
methods: 作者引入时差偏移来解决这个问题，并将偏移值与NeRF共同优化。这种方法适用于多种基eline和提高了它们的性能。
results: 实验结果表明，该方法可以有效地同步多视图视频，并且在Plenoptic Video Dataset和一个新建的Unsynchronized Dynamic Blender Dataset上进行了验证。项目页面：https://seoha-kim.github.io/sync-nerf。

Abstract
Recent advancements in 4D scene reconstruction using neural radiance fields (NeRF) have demonstrated the ability to represent dynamic scenes from multi-view videos. However, they fail to reconstruct the dynamic scenes and struggle to fit even the training views in unsynchronized settings. It happens because they employ a single latent embedding for a frame while the multi-view images at the frame were actually captured at different moments. To address this limitation, we introduce time offsets for individual unsynchronized videos and jointly optimize the offsets with NeRF. By design, our method is applicable for various baselines and improves them with large margins. Furthermore, finding the offsets naturally works as synchronizing the videos without manual effort. Experiments are conducted on the common Plenoptic Video Dataset and a newly built Unsynchronized Dynamic Blender Dataset to verify the performance of our method. Project page: https://seoha-kim.github.io/sync-nerf

摘要
最近的进展在4D场景重建领域使用神经辐射场（NeRF）已经表明了能够从多视图视频中重建动态场景。然而，它们无法重建动态场景，并且在不同时刻拍摄的多视图图像中很难匹配。这是因为它们使用单个离散嵌入来表示一帧的图像，而多视图图像在该帧中实际上是在不同的时刻拍摄的。为解决这些限制，我们引入时间偏移 для个体不同时刻的视频，并同时优化偏移。由设计来看，我们的方法适用于各种基elines和提高它们的大幅度。此外，找到偏移也自然地同步视频，无需手动努力。我们在常见的Plenoptic Video Dataset和新建的Unsynchronized Dynamic Blender Dataset上进行了实验，以验证我们的方法的性能。项目页面：https://seoha-kim.github.io/sync-nerf

SILC: Improving Vision Language Pretraining with Self-Distillation

paper_url: http://arxiv.org/abs/2310.13355
repo_url: None
paper_authors: Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc Van Gool, Federico Tombari
for:This paper focuses on improving the performance of open-vocabulary classification and retrieval models by using a simple addition of local-to-global correspondence learning through self-distillation during contrastive pre-training.methods:The proposed method, SILC, uses a contrastive objective to learn image-text alignment and adds local-to-global correspondence learning through self-distillation to improve image feature learning for dense prediction tasks.results:The proposed SILC model achieves state-of-the-art performance on several computer vision tasks, including zero-shot classification, few-shot classification, image and text retrieval, zero-shot segmentation, and open vocabulary segmentation, with better scaling compared to baselines.

Abstract
Image-Text pretraining on web-scale image caption dataset has become the default recipe for open vocabulary classification and retrieval models thanks to the success of CLIP and its variants. Several works have also used CLIP features for dense prediction tasks and have shown the emergence of open-set abilities. However, the contrastive objective only focuses on image-text alignment and does not incentivise image feature learning for dense prediction tasks. In this work, we propose the simple addition of local-to-global correspondence learning by self-distillation as an additional objective for contrastive pre-training to propose SILC. We show that distilling local image features from an exponential moving average (EMA) teacher model significantly improves model performance on several computer vision tasks including classification, retrieval, and especially segmentation. We further show that SILC scales better with the same training duration compared to the baselines. Our model SILC sets a new state of the art for zero-shot classification, few shot classification, image and text retrieval, zero-shot segmentation, and open vocabulary segmentation.

摘要
“图文预训在大规模图片描述集合上进行预训，现在成为开 vocabulary 分类和搜寻模型的默认配方，这主要归功于 CLIP 和其变体的成功。一些工作也使用 CLIP 特征进行紧密预测任务，并显示了开集能力的出现。但是，对比тив的目标只是对图片文本的对预导，不对图片特征学习紧密预测任务。在这个工作中，我们提出了将本地到全球对应学习作为额外目标，通过自我静态学习来实现 SILC。我们表明，将本地图片特征从 EMA 教师模型散发到 exponential moving average 教师模型可以大幅提高模型在许多计算机视觉任务中的性能，包括分类、搜寻、特别是分割。我们进一步显示，SILC 在相同训练时间下可以与基准点模型相比，并且在 zero-shot 分类、几步分类、图片和文本搜寻、 zero-shot 分割和开 vocabulary 分割等多个任务中设置新的州OF THE ART。”

EarlyBird: Early-Fusion for Multi-View Tracking in the Bird’s Eye View

paper_url: http://arxiv.org/abs/2310.13350
repo_url: https://github.com/tteepe/EarlyBird
paper_authors: Torben Teepe, Philipp Wolters, Johannes Gilg, Fabian Herzog, Gerhard Rigoll
for: 本研究旨在探讨 whether tracking in the Bird’s Eye View (BEV) can bring the next performance breakthrough in Multi-Target Multi-Camera (MTMC) tracking.
methods: 本研究使用 early-fusion 方法，通过在 BEV 中探测每个人并学习强的 Re-Identification (re-ID) 特征来实现时间相关性 association.
results: results 显示 early-fusion 在 BEV 中可以达到高精度的探测和跟踪。 EarlyBird 方法在 Wildtrack 上超过当前状态艺术，提高了 MOTA 和 IDF1 的表现。

Abstract
Multi-view aggregation promises to overcome the occlusion and missed detection challenge in multi-object detection and tracking. Recent approaches in multi-view detection and 3D object detection made a huge performance leap by projecting all views to the ground plane and performing the detection in the Bird's Eye View (BEV). In this paper, we investigate if tracking in the BEV can also bring the next performance breakthrough in Multi-Target Multi-Camera (MTMC) tracking. Most current approaches in multi-view tracking perform the detection and tracking task in each view and use graph-based approaches to perform the association of the pedestrian across each view. This spatial association is already solved by detecting each pedestrian once in the BEV, leaving only the problem of temporal association. For the temporal association, we show how to learn strong Re-Identification (re-ID) features for each detection. The results show that early-fusion in the BEV achieves high accuracy for both detection and tracking. EarlyBird outperforms the state-of-the-art methods and improves the current state-of-the-art on Wildtrack by +4.6 MOTA and +5.6 IDF1.

摘要
多视图聚合承诺可以超越 occlusion 和缺失检测挑战在多目标检测和跟踪中。 current 方法在多视图检测和3D对象检测中取得了巨大的性能突破，通过将所有视图投影到地面平面并在 Bird's Eye View（BEV）中进行检测。在这篇论文中，我们研究了 whether 在 BEV 中进行跟踪可以带来下一个性能突破在 Multi-Target Multi-Camera（MTMC）跟踪中。大多数当前方法在多视图跟踪中都是在每个视图中进行检测和跟踪任务，并使用图形基本方法来进行视图之间的关联。这个空间关联已经被推缩到在 BEV 中检测每个人员一次，只剩下时间关联的问题。为了解决时间关联问题，我们展示了如何学习强大的 Re-Identification（re-ID）特征 для每个检测。结果显示，在 BEV 中早期融合可以 дости得高度的准确率 both 检测和跟踪。 EarlyBird 超越了当前状态的方法，并提高了 Wildtrack 中当前状态的 MOTA 和 IDF1 的表现+4.6 和 +5.6。

DeepFDR: A Deep Learning-based False Discovery Rate Control Method for Neuroimaging Data

paper_url: http://arxiv.org/abs/2310.13349
repo_url: None
paper_authors: Taehyo Kim, Hai Shu, Qiran Jia, Mony de Leon
for: 这篇论文是为了解决voxel-based多测试问题，特别是处理大脑中复杂的空间相关性。
methods: 这篇论文提出了一种基于深度学习的空间FDR控制方法，利用无监督学习图像分割来解决voxel-based多测试问题。
results: 数值研究表明，DeepFDR比现有方法更有效地控制FDR，同时减少了假发现率，并且具有优秀的计算效率，适用于处理大规模的神经成像数据。

Abstract
Voxel-based multiple testing is widely used in neuroimaging data analysis. Traditional false discovery rate (FDR) control methods often ignore the spatial dependence among the voxel-based tests and thus suffer from substantial loss of testing power. While recent spatial FDR control methods have emerged, their validity and optimality remain questionable when handling the complex spatial dependencies of the brain. Concurrently, deep learning methods have revolutionized image segmentation, a task closely related to voxel-based multiple testing. In this paper, we propose DeepFDR, a novel spatial FDR control method that leverages unsupervised deep learning-based image segmentation to address the voxel-based multiple testing problem. Numerical studies, including comprehensive simulations and Alzheimer's disease FDG-PET image analysis, demonstrate DeepFDR's superiority over existing methods. DeepFDR not only excels in FDR control and effectively diminishes the false nondiscovery rate, but also boasts exceptional computational efficiency highly suited for tackling large-scale neuroimaging data.

摘要
voxel基于多测试广泛应用于神经成像数据分析中。传统的假发现率（FDR）控制方法通常忽略了VOXEL基于测试之间的空间依赖关系，从而导致了重大的测试力下降。而最近的空间FDR控制方法已经出现，但它们在处理大脑复杂的空间依赖关系时的有效性和优化性仍然存在问题。同时，深度学习方法在图像分割领域已经引领了革命，这个领域与VOXEL基于多测试密切相关。在这篇论文中，我们提出了DeepFDR，一种基于深度学习无监督图像分割的新的空间FDR控制方法。 numerics 研究，包括完整的 simulations和阿尔茨海默症FDG-PET图像分析，表明DeepFDR在现有方法中具有优越性，不仅在FDR控制方面 excellence，还能够减少假发现率，同时具有出色的计算效率，适合处理大规模神经成像数据。

DeepFracture: A Generative Approach for Predicting Brittle Fractures

paper_url: http://arxiv.org/abs/2310.13344
repo_url: None
paper_authors: Yuhang Huang, Takashi Kanai
for: 这篇论文是关于在不可逆摇摆动画中生成真实的破坏动画，使用物理 simulate 技术，但是使用 Voronoi 图或先 Fraction 的方法可能缺乏真实感。
methods: 该论文提出了一种基于学习的方法，将真实的不可逆破坏动画与 rigid-body simulations 集成。该方法使用 BEM 不可逆破坏 simulations 生成破坏模式和碰撞条件，然后将其作为学习过程的输入。
results: 该论文的实验结果表明，该方法可以生成比现有方法更加细节的不可逆破坏动画，同时保持了可观计算效率。

Abstract
In the realm of brittle fracture animation, generating realistic destruction animations with physics simulation techniques can be computationally expensive. Although methods using Voronoi diagrams or pre-fractured patterns work for real-time applications, they often lack realism in portraying brittle fractures. This paper introduces a novel learning-based approach for seamlessly merging realistic brittle fracture animations with rigid-body simulations. Our method utilizes BEM brittle fracture simulations to create fractured patterns and collision conditions for a given shape, which serve as training data for the learning process. To effectively integrate collision conditions and fractured shapes into a deep learning framework, we introduce the concept of latent impulse representation and geometrically-segmented signed distance function (GS-SDF). The latent impulse representation serves as input, capturing information about impact forces on the shape's surface. Simultaneously, a GS-SDF is used as the output representation of the fractured shape. To address the challenge of optimizing multiple fractured pattern targets with a single latent code, we propose an eight-dimensional latent space based on a normal distribution code within our latent impulse representation design. This adaptation effectively transforms our neural network into a generative one. Our experimental results demonstrate that our approach can generate significantly more detailed brittle fractures compared to existing techniques, all while maintaining commendable computational efficiency during run-time.

摘要
在破碎破坏动画领域，通过物理 simulations 技术生成真实的破坏动画可以 computationally expensive. Although methods using Voronoi diagrams or pre-fractured patterns work for real-time applications, they often lack realism in portraying brittle fractures. This paper introduces a novel learning-based approach for seamlessly merging realistic brittle fracture animations with rigid-body simulations. Our method utilizes BEM brittle fracture simulations to create fractured patterns and collision conditions for a given shape, which serve as training data for the learning process. To effectively integrate collision conditions and fractured shapes into a deep learning framework, we introduce the concept of latent impulse representation and geometrically-segmented signed distance function (GS-SDF). The latent impulse representation serves as input, capturing information about impact forces on the shape's surface. Simultaneously, a GS-SDF is used as the output representation of the fractured shape. To address the challenge of optimizing multiple fractured pattern targets with a single latent code, we propose an eight-dimensional latent space based on a normal distribution code within our latent impulse representation design. This adaptation effectively transforms our neural network into a generative one. Our experimental results demonstrate that our approach can generate significantly more detailed brittle fractures compared to existing techniques, all while maintaining commendable computational efficiency during run-time.

CylinderTag: An Accurate and Flexible Marker for Cylinder-Shape Objects Pose Estimation Based on Projective Invariants

paper_url: http://arxiv.org/abs/2310.13320
repo_url: https://github.com/wsakobe/cylindertag
paper_authors: Shaoan Wang, Mingzhu Zhu, Yaoqing Hu, Dongyue Li, Fusong Yuan, Junzhi Yu
for: 高精度位势估计基于视觉标记，用于圆柱形表面上的物体位势估计。
methods: 提出了一种新的视觉标记 called CylinderTag，适用于可扩展的圆柱形表面，并使用了拟合投影射影的拟合环境来编码方向。
results: 通过广泛的实验评估，显示了CylinderTag在不同视角下的检测率、检测速度、字典大小、地方缩动和位势估计精度等性能指标，并且在实时检测和广泛的应用场景中展现出了优秀的表现。

Abstract
High-precision pose estimation based on visual markers has been a thriving research topic in the field of computer vision. However, the suitability of traditional flat markers on curved objects is limited due to the diverse shapes of curved surfaces, which hinders the development of high-precision pose estimation for curved objects. Therefore, this paper proposes a novel visual marker called CylinderTag, which is designed for developable curved surfaces such as cylindrical surfaces. CylinderTag is a cyclic marker that can be firmly attached to objects with a cylindrical shape. Leveraging the manifold assumption, the cross-ratio in projective invariance is utilized for encoding in the direction of zero curvature on the surface. Additionally, to facilitate the usage of CylinderTag, we propose a heuristic search-based marker generator and a high-performance recognizer as well. Moreover, an all-encompassing evaluation of CylinderTag properties is conducted by means of extensive experimentation, covering detection rate, detection speed, dictionary size, localization jitter, and pose estimation accuracy. CylinderTag showcases superior detection performance from varying view angles in comparison to traditional visual markers, accompanied by higher localization accuracy. Furthermore, CylinderTag boasts real-time detection capability and an extensive marker dictionary, offering enhanced versatility and practicality in a wide range of applications. Experimental results demonstrate that the CylinderTag is a highly promising visual marker for use on cylindrical-like surfaces, thus offering important guidance for future research on high-precision visual localization of cylinder-shaped objects. The code is available at: https://github.com/wsakobe/CylinderTag.

摘要
高精度姿势估计基于视觉标记已经在计算机视觉领域得到了广泛的研究。然而，传统的平面标记在抛物线表面上的适用性受到抛物线表面的多样性所限制，这难以实现高精度姿势估计 для抛物线对象。因此，这篇论文提出了一种新的视觉标记 called CylinderTag，适用于可开发的抛物线表面，如圆柱体表面。CylinderTag 是一种循环标记，可以固定在圆柱体形状的对象上。利用 manifold 假设，在项目ive 的均衡点上使用 cross-ratio 的编码，以获得在表面上的方向。此外，为了使 CylinderTag 更加可用，我们提出了一种冒泡搜索基于的标记生成器和高性能的识别器。此外，我们还进行了广泛的实验，评估 CylinderTag 的性能，包括检测率、检测速度、字典大小、本地化振荡和姿势估计精度。CylinderTag 在不同视角下的检测性能显著高于传统 visual marker，同时具有更高的本地化精度。此外，CylinderTag 具有实时检测能力和广泛的标记字典，提供了更好的 versatility 和实用性，适用于广泛的应用场景。实验结果表明，CylinderTag 是一种非常有前途的视觉标记，适用于圆柱体形状的对象，为高精度视觉本地化做出了重要的指导。代码可以在以下链接获取：https://github.com/wsakobe/CylinderTag。

Non-Negative Spherical Relaxations for Universe-Free Multi-Matching and Clustering

paper_url: http://arxiv.org/abs/2310.13311
repo_url: None
paper_authors: Johan Thunberg, Florian Bernard
for: optimize optimization problems over binary matrices with injectivity constraints, particularly for multi-matching and clustering
methods: 使用非负圆形 relaxation 和 conditional power iteration 方法优化relaxed问题
results: 比较spectral multi-matching和spectral clustering方法，方法不需要额外处理获得binary结果，并且在多种多样的多匹配和归一化设定下表现出优异result

Abstract
We propose a novel non-negative spherical relaxation for optimization problems over binary matrices with injectivity constraints, which in particular has applications in multi-matching and clustering. We relax respective binary matrix constraints to the (high-dimensional) non-negative sphere. To optimize our relaxed problem, we use a conditional power iteration method to iteratively improve the objective function, while at same time sweeping over a continuous scalar parameter that is (indirectly) related to the universe size (or number of clusters). Opposed to existing procedures that require to fix the integer universe size before optimization, our method automatically adjusts the analogous continuous parameter. Furthermore, while our approach shares similarities with spectral multi-matching and spectral clustering, our formulation has the strong advantage that we do not rely on additional post-processing procedures to obtain binary results. Our method shows compelling results in various multi-matching and clustering settings, even when compared to methods that use the ground truth universe size (or number of clusters).

摘要
我们提出了一种新的非负球体relaxation算法，用于优化包含二进制矩阵约束的优化问题，具体应用于多对约束和分组。我们将相应的二进制矩阵约束relax到高维非负球体上。为优化我们的放宽问题，我们使用一种增量的力论迭代方法，同时逐步提高目标函数，并同时探索一个连续的浮点参数，这个参数与宇宙大小（或群集数）相关。与现有方法不同，我们的方法不需要先确定整数宇宙大小，而是通过迭代进行调整。此外，我们的方法与spectral multi-matching和spectral clustering相似，但我们不需要额外的后处理步骤来获得 binary结果。我们的方法在多个多对约束和分组设置中显示出了吸引人的 результа。

CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training

paper_url: http://arxiv.org/abs/2310.13292
repo_url: None
paper_authors: Kihyun You, Jawook Gu, Jiyeon Ham, Beomhee Park, Jiho Kim, Eun Kyoung Hong, Woonhyunk Baek, Byungseok Roh
for: 推动视力语言预训练（VLP）模型的发展，以便在医疗领域实现零或几次shot分类无需昂贵的标注。
methods: 对于缺乏医学图像文本数据的问题，我们通过扩展图像标签对进行了图像文本对的扩展，并利用多个图像和多个 radiologic report 中的多个部分。我们还设计了两种对比损失函数（ICL和TCL），用于学习医学图像和报告的研究级特征。
results: 我们的模型在同样的条件下比现有模型强度，并且扩大的数据集提高了我们预训练模型的分类特征，尽管有一定的损失性能。

Abstract
A large-scale image-text pair dataset has greatly contributed to the development of vision-language pre-training (VLP) models, which enable zero-shot or few-shot classification without costly annotation. However, in the medical domain, the scarcity of data remains a significant challenge for developing a powerful VLP model. In this paper, we tackle the lack of image-text data in chest X-ray by expanding image-label pair as image-text pair via general prompt and utilizing multiple images and multiple sections in a radiologic report. We also design two contrastive losses, named ICL and TCL, for learning study-level characteristics of medical images and reports, respectively. Our model outperforms the state-of-the-art models trained under the same conditions. Also, enlarged dataset improve the discriminative power of our pre-trained model for classification, while sacrificing marginal retrieval performance. Code is available at https://github.com/kakaobrain/cxr-clip.

摘要
一个大规模图文对比数据集的出现对视语言预训练（VLP）模型的发展做出了重要贡献，这些模型可以在零shot或几shot的情况下实现无需高成本标注的分类。然而，在医疗领域，数据的缺乏仍然是开发强大VLP模型的主要挑战。在这篇论文中，我们解决了骨科X射影像数据的缺乏问题，通过扩展图标对为图文对和使用多个图像和多个 radiologic report 中的多个部分。我们还设计了两种对比损失，即 IC 损失和 TC 损失，用于学习医疗图像和报告的学习特征。我们的模型在同样的条件下训练时与现有的状态艺模型相比，表现出色。此外，扩大数据集也提高了我们预训练模型的分类特征， хотя是相对较差的检索性能。代码可以在上下载。

Pathologist-Like Explanations Unveiled: an Explainable Deep Learning System for White Blood Cell Classification

paper_url: http://arxiv.org/abs/2310.13279
repo_url: None
paper_authors: Aditya Shankar Pal, Debojyoti Biswas, Joy Mahapatra, Debasis Banerjee, Prantar Chakrabarti, Utpal Garain
For: 这个研究旨在开发一种可解释的深度学习模型，以提高白细胞分类的准确性和可读性。* Methods: 该模型使用深度学习算法和五个特征（granularity、细胞色素、核形状、大小相对红细胞、核：细胞比率）进行自动白细胞分类、定位和分割。* Results: 模型在一个新的数据集上进行训练和评估，并达到了81.08%的平均分类精度和89.16%的精度指数。此外，模型还能够准确地预测五个特征的值，并且与其他多种状态对模型的影响进行了比较。

Abstract
White blood cells (WBCs) play a crucial role in safeguarding the human body against pathogens and foreign substances. Leveraging the abundance of WBC imaging data and the power of deep learning algorithms, automated WBC analysis has the potential for remarkable accuracy. However, the capability of deep learning models to explain their WBC classification remains largely unexplored. In this study, we introduce HemaX, an explainable deep neural network-based model that produces pathologist-like explanations using five attributes: granularity, cytoplasm color, nucleus shape, size relative to red blood cells, and nucleus to cytoplasm ratio (N:C), along with cell classification, localization, and segmentation. HemaX is trained and evaluated on a novel dataset, LeukoX, comprising 467 blood smear images encompassing ten (10) WBC types. The proposed model achieves impressive results, with an average classification accuracy of 81.08% and a Jaccard index of 89.16% for cell localization. Additionally, HemaX performs well in generating the five explanations with a normalized mean square error of 0.0317 for N:C ratio and over 80% accuracy for the other four attributes. Comprehensive experiments comparing against multiple state-of-the-art models demonstrate that HemaX's classification accuracy remains unaffected by its ability to provide explanations. Moreover, empirical analyses and validation by expert hematologists confirm the faithfulness of explanations predicted by our proposed model.

摘要
白血球（WBC）在人体免疫系统中扮演着关键性角色，深受人们的关注。利用白血球图像数据的丰富性和深入学习算法的力量，自动化白血球分类有很大的潜力。然而，深入学习模型对其白血球分类的解释能力尚未得到充分探索。在这项研究中，我们提出了HemaX模型，它是一种可解释的深度神经网络模型，可以生成医生类似的解释，包括五个特征：粒度、细胞膜颜色、核形状、大小相对红细胞、核：细胞膜比率（N：C），同时还包括细胞类别、局部化和分割。HemaX模型在一个新的数据集LeukoX上进行训练和评估，LeukoX数据集包含467个血液干细胞图像，涵盖10种白血球类型。我们的模型在这些实验中获得了非常出色的结果，其中平均分类精度为81.08%，Jaccard指数为89.16%，用于细胞局部化。此外，HemaX模型在生成五个特征的解释方面也表现出色，其中N：C比率的normalized mean square error为0.0317，其他四个特征的解释准确率大于80%。在多个现有的状态对比模型的实验中，我们发现HemaX模型的分类精度不受其能提供解释的影响。此外，由专家血液学家验证的实验和验证表明，HemaX模型生成的解释准确性很高。

DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics

paper_url: http://arxiv.org/abs/2310.13268
repo_url: https://github.com/thu-ml/dpm-solver-v3
paper_authors: Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu
for:DPM-Solver-v3 is designed to improve the efficiency of high-fidelity image generation using diffusion probabilistic models (DPMs).methods:The proposed method introduces a new fast ODE solver for DPMs, which minimizes the first-order discretization error of the ODE solution and incorporates multistep methods and a predictor-corrector framework.results:DPM-Solver-v3 achieves consistently better or comparable performance in both unconditional and conditional sampling with both pixel-space and latent-space DPMs, especially in 5-10 NFEs. The method achieves FIDs of 12.21 (5 NFE), 2.51 (10 NFE) on unconditional CIFAR10, and MSE of 0.55 (5 NFE, 7.5 guidance scale) on Stable Diffusion, bringing a speed-up of 15%-30% compared to previous state-of-the-art training-free methods.

Abstract
Diffusion probabilistic models (DPMs) have exhibited excellent performance for high-fidelity image generation while suffering from inefficient sampling. Recent works accelerate the sampling procedure by proposing fast ODE solvers that leverage the specific ODE form of DPMs. However, they highly rely on specific parameterization during inference (such as noise/data prediction), which might not be the optimal choice. In this work, we propose a novel formulation towards the optimal parameterization during sampling that minimizes the first-order discretization error of the ODE solution. Based on such formulation, we propose DPM-Solver-v3, a new fast ODE solver for DPMs by introducing several coefficients efficiently computed on the pretrained model, which we call empirical model statistics. We further incorporate multistep methods and a predictor-corrector framework, and propose some techniques for improving sample quality at small numbers of function evaluations (NFE) or large guidance scales. Experiments show that DPM-Solver-v3 achieves consistently better or comparable performance in both unconditional and conditional sampling with both pixel-space and latent-space DPMs, especially in 5$\sim$10 NFEs. We achieve FIDs of 12.21 (5 NFE), 2.51 (10 NFE) on unconditional CIFAR10, and MSE of 0.55 (5 NFE, 7.5 guidance scale) on Stable Diffusion, bringing a speed-up of 15%$\sim$30% compared to previous state-of-the-art training-free methods. Code is available at https://github.com/thu-ml/DPM-Solver-v3.

摘要
Diffusion probabilistic models (DPMs) 有出色的表现在高精度图像生成中，但是它们在采样过程中存在不fficient的问题。 current works 提出了快速的 ODE 解决方案，但是它们高度依赖于特定的参数化 durante inference（如噪声/数据预测），这可能并不是最佳选择。在这项工作中，我们提出了一种新的参数化形式，以实现采样过程中的首项累累误差最小化。基于这种形式，我们提出了 DPM-Solver-v3，一种新的快速 ODE 解决方案，通过引入一些高效计算的参数，我们称之为 empirical model statistics。我们还 incorporated multistep methods and a predictor-corrector framework，并提出了一些提高样本质量的技术， especialy in small numbers of function evaluations (NFE) or large guidance scales。实验表明，DPM-Solver-v3 可以在 both unconditional 和 conditional 采样中实现更好或相当的表现，特别是在 5 ∼ 10 NFE 的情况下。我们在 CIFAR10 上获得了 FID 的 12.21 (5 NFE)，MSE 的 0.55 (5 NFE, 7.5 指导尺度)，在 Stable Diffusion 上提高了训练free 方法的速度，比前一代 state-of-the-art 快速 15% ∼ 30%。代码可以在 https://github.com/thu-ml/DPM-Solver-v3 上获得。

UE4-NeRF:Neural Radiance Field for Real-Time Rendering of Large-Scale Scene

paper_url: http://arxiv.org/abs/2310.13263
repo_url: None
paper_authors: Jiaming Gu, Minchao Jiang, Hongsheng Li, Xiaoyuan Lu, Guangming Zhu, Syed Afaq Ali Shah, Liang Zhang, Mohammed Bennamoun
For: This paper proposes a novel neural rendering system called UE4-NeRF for real-time rendering of large-scale scenes using NeRF technology.* Methods: The proposed method partitions each large scene into multiple sub-NeRFs, uses polygonal meshes to represent the scene, and trains meshes of varying levels of detail for different observation levels.* Results: The proposed method achieves real-time rendering of large-scale scenes at 4K resolution with a frame rate of up to 43 FPS, and achieves rendering quality comparable to state-of-the-art approaches.Here’s the full summary in Simplified Chinese:* 为：本文提出了一种基于NeRF技术的新型神经采集系统UE4-NeRF，用于实时渲染大规模场景。* 方法：提议方法将每个大场景分割成多个子NeRF，使用多面体来表示场景，并在不同观察水平上进行不同精度的 mesh 训练。* 结果：提议方法在4K分辨率下实现了实时渲染大规模场景，帧率可达43帧/秒，并与现有方法的渲染质量相当。

Abstract
Neural Radiance Fields (NeRF) is a novel implicit 3D reconstruction method that shows immense potential and has been gaining increasing attention. It enables the reconstruction of 3D scenes solely from a set of photographs. However, its real-time rendering capability, especially for interactive real-time rendering of large-scale scenes, still has significant limitations. To address these challenges, in this paper, we propose a novel neural rendering system called UE4-NeRF, specifically designed for real-time rendering of large-scale scenes. We partitioned each large scene into different sub-NeRFs. In order to represent the partitioned independent scene, we initialize polygonal meshes by constructing multiple regular octahedra within the scene and the vertices of the polygonal faces are continuously optimized during the training process. Drawing inspiration from Level of Detail (LOD) techniques, we trained meshes of varying levels of detail for different observation levels. Our approach combines with the rasterization pipeline in Unreal Engine 4 (UE4), achieving real-time rendering of large-scale scenes at 4K resolution with a frame rate of up to 43 FPS. Rendering within UE4 also facilitates scene editing in subsequent stages. Furthermore, through experiments, we have demonstrated that our method achieves rendering quality comparable to state-of-the-art approaches. Project page: https://jamchaos.github.io/UE4-NeRF/.

摘要
neural radiance fields (NeRF) 是一种新型的隐式 3D 重建方法，它在过去几年内得到了广泛关注，并且在3D 场景的重建方面表现出了巨大的潜力。它可以通过一组照片来重建 3D 场景，但是它的实时渲染能力，特别是对于大规模场景的实时渲染，仍然存在一定的限制。为了解决这些挑战，在这篇论文中，我们提出了一种基于 neural rendering 的新型渲染系统，即 UE4-NeRF，用于实时渲染大规模场景。我们将每个大场景分解成不同的子 NeRF，以便更好地表示独立的场景。为了表示分解后的独立场景，我们在场景中构建多个正则八面体，并将其中的多边形面的顶点进行不断优化 durante el proceso de entrenamiento。 drawing inspiration from Level of Detail (LOD) techniques, we trained meshes of varying levels of detail for different observation levels。我们的方法结合了 UE4 的渲染管线，实现了实时渲染大规模场景的 4K 分辨率，帧率可达 43 FPS。此外，通过实验，我们证明了我们的方法可以实现与当前状态最佳的渲染质量。项目页面：https://jamchaos.github.io/UE4-NeRF/。

Domain-specific optimization and diverse evaluation of self-supervised models for histopathology

paper_url: http://arxiv.org/abs/2310.13259
repo_url: None
paper_authors: Jeremy Lai, Faruk Ahmed, Supriya Vijay, Tiam Jaroensri, Jessica Loo, Saurabh Vyawahare, Saloni Agarwal, Fayaz Jamil, Yossi Matias, Greg S. Corrado, Dale R. Webster, Jonathan Krause, Yun Liu, Po-Hsuan Cameron Chen, Ellery Wulczyn, David F. Steiner
for: 提高诊断、临床研究和精准医学的可能性
methods: 自我超VI方法
results: 标准SSL方法在 Histopathology 图像上表现出色，具有各种任务类型和资源的可用性。另外，域pecific的方法ological improvements可以进一步提高性能。

Abstract
Task-specific deep learning models in histopathology offer promising opportunities for improving diagnosis, clinical research, and precision medicine. However, development of such models is often limited by availability of high-quality data. Foundation models in histopathology that learn general representations across a wide range of tissue types, diagnoses, and magnifications offer the potential to reduce the data, compute, and technical expertise necessary to develop task-specific deep learning models with the required level of model performance. In this work, we describe the development and evaluation of foundation models for histopathology via self-supervised learning (SSL). We first establish a diverse set of benchmark tasks involving 17 unique tissue types and 12 unique cancer types and spanning different optimal magnifications and task types. Next, we use this benchmark to explore and evaluate histopathology-specific SSL methods followed by further evaluation on held out patch-level and weakly supervised tasks. We found that standard SSL methods thoughtfully applied to histopathology images are performant across our benchmark tasks and that domain-specific methodological improvements can further increase performance. Our findings reinforce the value of using domain-specific SSL methods in pathology, and establish a set of high quality foundation models to enable further research across diverse applications.

摘要
任务特定的深度学习模型在 histopathology 领域提供了可能性，以提高诊断、临床研究和精准医学。然而，开发这些模型经常受到高质量数据的限制。基本模型在 histopathology 中学习通用表示，可以降低数据、计算和技术专家的努力，以达到需要的模型性能。在这种工作中，我们描述了通过自我超级学习（SSL）的基本模型的开发和评估。我们首先建立了 17 种不同的组织类型和 12 种不同的癌种类，以及不同的最佳放大和任务类型的多样化基准任务。然后，我们使用这些基准任务来探索和评估特有的 histopathology SSL 方法，并在储存的 patch-level 和弱级批处理任务上进行进一步评估。我们发现，针对 histopathology 图像的标准 SSL 方法在我们的基准任务中表现良好，而域pecific的方法学习改进可以进一步提高表现。我们的发现赋予了使用域pecific SSL 方法在 pathology 领域的价值，并建立了一组高质量的基本模型，以便进一步的研究和应用。

Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds

paper_url: http://arxiv.org/abs/2310.13255
repo_url: None
paper_authors: Sipeng Zheng, Jiazheng Liu, Yicheng Feng, Zongqing Lu
for: 这篇论文旨在帮助大语言模型（LLM） equip 身体机器人具备自适应能力，但这些努力通常忽略了开放世界的视觉丰富性，导致LLM-based Agent难以直观理解它所处的环境和生成易于理解的回应。
methods: 我们提出了Steve-Eye，一个结合LLM和视觉编码器的大多模式模型，可以处理视觉文本输入并生成多模式反馈。我们还采用了半自动策略来收集了850K的开放世界指令对，使我们的模型涵盖了三种函数：多模式感知、基础知识库和技能预测和规划。
results: 我们开发了三个开放世界评估标准，然后从多个角度进行了广泛的实验来验证我们的模型能够策略性行动和规划。

Abstract
Recent studies have presented compelling evidence that large language models (LLMs) can equip embodied agents with the self-driven capability to interact with the world, which marks an initial step toward versatile robotics. However, these efforts tend to overlook the visual richness of open worlds, rendering the entire interactive process akin to "a blindfolded text-based game." Consequently, LLM-based agents frequently encounter challenges in intuitively comprehending their surroundings and producing responses that are easy to understand. In this paper, we propose Steve-Eye, an end-to-end trained large multimodal model designed to address this limitation. Steve-Eye integrates the LLM with a visual encoder which enables it to process visual-text inputs and generate multimodal feedback. In addition, we use a semi-automatic strategy to collect an extensive dataset comprising 850K open-world instruction pairs, empowering our model to encompass three essential functions for an agent: multimodal perception, foundational knowledge base, and skill prediction and planning. Lastly, we develop three open-world evaluation benchmarks, then carry out extensive experiments from a wide range of perspectives to validate our model's capability to strategically act and plan. Codes and datasets will be released.

摘要

Diagnosis-oriented Medical Image Compression with Efficient Transfer Learning

paper_url: http://arxiv.org/abs/2310.13250
repo_url: None
paper_authors: Guangqi Xie, Xin Li, Xiaohan Pan, Zhibo Chen
for: 这篇论文主要应用于Remote medical diagnosis中，以提高医疗资料传输效率和准确性。
methods: 本论文提出了一个新的医疗图像压缩方法，即Diagnosis-oriented medical image compression，通过将医疗资料分类和压缩，以减少传输成本。
results: 实验结果显示，该方法可以实现47.594%的BD-Rate减少，比HEVC标准更高效。另外，仅需要对政策网络的A2C模块（2.7%的参数）进行调整，并且只需要使用1个医疗样本。

Abstract
Remote medical diagnosis has emerged as a critical and indispensable technique in practical medical systems, where medical data are required to be efficiently compressed and transmitted for diagnosis by either professional doctors or intelligent diagnosis devices. In this process, a large amount of redundant content irrelevant to the diagnosis is subjected to high-fidelity coding, leading to unnecessary transmission costs. To mitigate this, we propose diagnosis-oriented medical image compression, a special semantic compression task designed for medical scenarios, targeting to reduce the compression cost without compromising the diagnosis accuracy. However, collecting sufficient medical data to optimize such a compression system is significantly expensive and challenging due to privacy issues and the lack of professional annotation. In this study, we propose DMIC, the first efficient transfer learning-based codec, for diagnosis-oriented medical image compression, which can be effectively optimized with only few-shot annotated medical examples, by reusing the knowledge in the existing reinforcement learning-based task-driven semantic coding framework, i.e., HRLVSC [1]. Concretely, we focus on tuning only the partial parameters of the policy network for bit allocation within HRLVSC, which enables it to adapt to the medical images. In this work, we validate our DMIC with the typical medical task, Coronary Artery Segmentation. Extensive experiments have demonstrated that our DMIC can achieve 47.594%BD-Rate savings compared to the HEVC anchor, by tuning only the A2C module (2.7% parameters) of the policy network with only 1 medical sample.

摘要
医学远程诊断已成为现代医疗系统中不可或缺的技术，医疗数据需要高效压缩并传输以便诊断，由专业医生或智能诊断设备进行诊断。在这个过程中，大量不相关于诊断的医疗数据会被高精度编码，导致无必要的传输成本增加。为了解决这个问题，我们提出了诊断导向医学影像压缩，这是专门为医疗场景设计的Semantic压缩任务，旨在降低压缩成本而无需妥协诊断精度。然而，收集足够的医疗数据来优化这种压缩系统是非常昂贵和困难的，主要是因为隐私问题和缺乏专业标注。在这个研究中，我们提出了DMIC，首个高效转移学习基于codec，用于诊断导向医学影像压缩。DMIC可以通过只需几个 annotated medical example来有效优化， reuse existing reinforcement learning-based task-driven semantic coding framework HRLVSC 的知识，以适应医学影像。我们在实验中验证了我们的DMIC，并与典型的医学任务Coronary Artery Segmentation进行了比较。结果表明，我们的DMIC可以在与HEVC anchor进行比较时实现47.594%BD-Rate savings，只需要调整HRLVSC中A2C模块（2.7%参数）和1个医学样本。

Auxiliary Features-Guided Super Resolution for Monte Carlo Rendering

paper_url: http://arxiv.org/abs/2310.13235
repo_url: None
paper_authors: Qiqi Hou, Feng Liu
for: 提高 Monte Carlo 渲染算法的速度，通过使用超分辨率技术减少像素数量。
methods: 利用高分辨率辅助特征来导航超分辨率渲染，并使用 Cross-modality Transformer 网络和 residual densely-connected Swin Transformer 组合来提取代表性特征。
results: 对比超分辨率渲染和 Monte Carlo 释除方法，auxiliary features 导航的超分辨率渲染方法可以提供更高质量的渲染图像。

Abstract
This paper investigates super resolution to reduce the number of pixels to render and thus speed up Monte Carlo rendering algorithms. While great progress has been made to super resolution technologies, it is essentially an ill-posed problem and cannot recover high-frequency details in renderings. To address this problem, we exploit high-resolution auxiliary features to guide super resolution of low-resolution renderings. These high-resolution auxiliary features can be quickly rendered by a rendering engine and at the same time provide valuable high-frequency details to assist super resolution. To this end, we develop a cross-modality Transformer network that consists of an auxiliary feature branch and a low-resolution rendering branch. These two branches are designed to fuse high-resolution auxiliary features with the corresponding low-resolution rendering. Furthermore, we design residual densely-connected Swin Transformer groups to learn to extract representative features to enable high-quality super-resolution. Our experiments show that our auxiliary features-guided super-resolution method outperforms both super-resolution methods and Monte Carlo denoising methods in producing high-quality renderings.

摘要

PTSR: Patch Translator for Image Super-Resolution

paper_url: http://arxiv.org/abs/2310.13216
repo_url: None
paper_authors: Neeraj Baghel, Shiv Ram Dubey, Satish Kumar Singh
for: 提高图像超分辨率，降低计算成本和存储量。
methods: 提议一种基于 transformer 的 GAN 网络，不含 convolution 操作，使用 patch translator 模块来重新生成改进的 patches，并由生成器使用这些 patches 生成 2x 和 4x 超分辨率图像。
results: 对于 DIV2K、Set5、Set14 和 BSD100 等标准数据集进行了实验，与最佳竞争模型相比，提出的模型在 $4\times$ 超分辨率上提高了平均21.66% 的 PNSR 分数和11.59% 的 SSIM 分数。还分析了提议的损失函数和焦点图来说明提议方法的效果。

Abstract
Image super-resolution generation aims to generate a high-resolution image from its low-resolution image. However, more complex neural networks bring high computational costs and memory storage. It is still an active area for offering the promise of overcoming resolution limitations in many applications. In recent years, transformers have made significant progress in computer vision tasks as their robust self-attention mechanism. However, recent works on the transformer for image super-resolution also contain convolution operations. We propose a patch translator for image super-resolution (PTSR) to address this problem. The proposed PTSR is a transformer-based GAN network with no convolution operation. We introduce a novel patch translator module for regenerating the improved patches utilising multi-head attention, which is further utilised by the generator to generate the 2x and 4x super-resolution images. The experiments are performed using benchmark datasets, including DIV2K, Set5, Set14, and BSD100. The results of the proposed model is improved on an average for $4\times$ super-resolution by 21.66% in PNSR score and 11.59% in SSIM score, as compared to the best competitive models. We also analyse the proposed loss and saliency map to show the effectiveness of the proposed method.

摘要
Image超解算生成目标是生成高分辨率图像从低分辨率图像。然而，更复杂的神经网络带来高计算成本和存储空间。这仍然是活跃的领域，提供超分辨率限制的缺乏的解决方案。在最近几年，转换器在计算机视觉任务中做出了重要进步，它的稳定自注意机制使得它在图像分类、 object detection 等任务中表现出色。然而，最近对转换器的图像超解算任务也包含了 convolution 操作。我们提出了一种图像超解算patch translator（PTSR），以解决这个问题。我们的提案中的 PTSR 是基于转换器的 GAN 网络，没有 convolution 操作。我们引入了一种新的 patch translator 模块，用于重新生成改进的 patches，使用多头注意机制，该机制被生成器使用，以生成 2x 和 4x 超分辨率图像。我们在 DIV2K、Set5、Set14 和 BSD100 等标准 datasets 上进行了实验。我们的提案模型在 $4\times$ 超分辨率下提高了平均 21.66% 的 PNSR 分数和 11.59% 的 SSIM 分数，相比最佳竞争对手。我们还分析了我们的提案损失函数和质感图，以证明我们的方法的有效性。

Zone Evaluation: Revealing Spatial Bias in Object Detection

paper_url: http://arxiv.org/abs/2310.13215
repo_url: https://github.com/zzh-tju/zoneeval
paper_authors: Zhaohui Zheng, Yuming Chen, Qibin Hou, Xiang Li, Ping Wang, Ming-Ming Cheng
for: 本研究旨在探讨对象检测器的空间偏好问题，并提出了一种新的zone评价协议来评估对象检测器的性能。
methods: 本研究使用了10种popular对象检测器和5个检测数据集，通过对zone评价协议进行广泛评估，探讨对象检测器的空间偏好问题。
results: 研究发现，对象检测器在图像边缘zone（96%）的性能不达到AP值（平均精度），表明对象检测器在这个zone中表现不佳。此外，研究还发现，对象检测器的性能异常分布在不同的zone中，存在一定的空间偏好问题。

Abstract
A fundamental limitation of object detectors is that they suffer from "spatial bias", and in particular perform less satisfactorily when detecting objects near image borders. For a long time, there has been a lack of effective ways to measure and identify spatial bias, and little is known about where it comes from and what degree it is. To this end, we present a new zone evaluation protocol, extending from the traditional evaluation to a more generalized one, which measures the detection performance over zones, yielding a series of Zone Precisions (ZPs). For the first time, we provide numerical results, showing that the object detectors perform quite unevenly across the zones. Surprisingly, the detector's performance in the 96\% border zone of the image does not reach the AP value (Average Precision, commonly regarded as the average detection performance in the entire image zone). To better understand spatial bias, a series of heuristic experiments are conducted. Our investigation excludes two intuitive conjectures about spatial bias that the object scale and the absolute positions of objects barely influence the spatial bias. We find that the key lies in the human-imperceptible divergence in data patterns between objects in different zones, thus eventually forming a visible performance gap between the zones. With these findings, we finally discuss a future direction for object detection, namely, spatial disequilibrium problem, aiming at pursuing a balanced detection ability over the entire image zone. By broadly evaluating 10 popular object detectors and 5 detection datasets, we shed light on the spatial bias of object detectors. We hope this work could raise a focus on detection robustness. The source codes, evaluation protocols, and tutorials are publicly available at \url{https://github.com/Zzh-tju/ZoneEval}.

摘要
基本限制的物体探测器受到"空间偏见"的影响，特别是在图像边缘附近探测物体的时候表现不如预期。在这个问题上，我们提出了一种新的区域评估协议，从传统评估扩展到更通用的一种，可以测量探测器在不同区域的性能，并提取一系列的Zone精度（ZP）。这是第一次提供数字结果，表明探测器在图像96%边缘区域的性能不达到AP值（平均精度，通常被视为整个图像区域的平均探测性能）。为了更好地理解空间偏见，我们进行了一系列的启发性实验。我们的调查排除了两个直觉的假设：物体大小和绝对位置对空间偏见的影响。我们发现关键在于数据模式之间的人类不可见差异，因此 eventually形成了不同区域的可见性 gap。通过这些发现，我们最后讨论了对象探测的未来方向，即空间不平衡问题，旨在寻求整个图像区域内均匀的探测能力。通过评估10款流行的对象探测器和5个检测数据集，我们突出了对象探测器的空间偏见。我们希望这项工作能够启发对检测稳定性的关注。评估协议、评估工具和教程都可以在获得。

Identification of Abnormality in Maize Plants From UAV Images Using Deep Learning Approaches

paper_url: http://arxiv.org/abs/2310.13201
repo_url: None
paper_authors: Aminul Huq, Dimitris Zermas, George Bebis
for: 早期识别植物异常是重要的任务，以确保植物健康成长和获得高产量。精准农业可以受益于现代计算机视觉工具，使农业策略变得有效和高效。由于农业地域很大，农民需要手动检查广泛的地区，以确定植物的状况和应用有效的治疗方法。
methods: 我们使用深度学习技术来检测不同水平的异常（即低、中、高或无异常）在玉米植物图像中。我们的方法可以独立地识别不同的生长阶段，并且可以提供有价值的信息 для人工标注数据收集，帮助他们专注于更小的图像集中进行标注。
results: 我们对公共可用的数据集进行了实验，并取得了promising的初步结果，包括88.89%的低异常检测精度和100%的无异常检测精度。

Abstract
Early identification of abnormalities in plants is an important task for ensuring proper growth and achieving high yields from crops. Precision agriculture can significantly benefit from modern computer vision tools to make farming strategies addressing these issues efficient and effective. As farming lands are typically quite large, farmers have to manually check vast areas to determine the status of the plants and apply proper treatments. In this work, we consider the problem of automatically identifying abnormal regions in maize plants from images captured by a UAV. Using deep learning techniques, we have developed a methodology which can detect different levels of abnormality (i.e., low, medium, high or no abnormality) in maize plants independently of their growth stage. The primary goal is to identify anomalies at the earliest possible stage in order to maximize the effectiveness of potential treatments. At the same time, the proposed system can provide valuable information to human annotators for ground truth data collection by helping them to focus their attention on a much smaller set of images only. We have experimented with two different but complimentary approaches, the first considering abnormality detection as a classification problem and the second considering it as a regression problem. Both approaches can be generalized to different types of abnormalities and do not make any assumption about the abnormality occurring at an early plant growth stage which might be easier to detect due to the plants being smaller and easier to separate. As a case study, we have considered a publicly available data set which exhibits mostly Nitrogen deficiency in maize plants of various growth stages. We are reporting promising preliminary results with an 88.89\% detection accuracy of low abnormality and 100\% detection accuracy of no abnormality.

摘要
早期识别植物异常是重要的任务，以确保生长正常并实现高产量的农作物。精准农业可以受益于现代计算机视觉工具，以制定有效和高效的农业策略。由于农地往往非常大，因此农民需要手动检查广泛的地区，以确定植物的状况和应用相应的治疗。在这个工作中，我们考虑了自动地检测玉米植物中的异常区域，使用深度学习技术。我们开发了一种方法，可以独立地分类不同程度的异常（即低、中、高或无异常）在玉米植物中。我们的主要目标是在最早的时间内识别异常，以最大化治疗的效果。同时，我们的提posed系统可以为人类标注数据收集提供有价值的信息，帮助他们只需要关注相对较少的图像。我们对两种不同 pero complementary的方法进行了实验，第一种视异常检测为分类问题，第二种视异常检测为回归问题。两种方法都可以泛化到不同的异常类型，不假设异常发生在植物在早期生长阶段，可能更容易检测由于植物较小和易于分离。作为案例研究，我们使用了公开可用的数据集，该数据集主要表现出玉米植物中的缺氮缺乏。我们报告了有前景的初步结果，包括88.89%的低异常检测精度和100%的无异常检测精度。

2023-10-20

cs.AI

cs.AI - 2023-10-20

Deep Learning Approaches for Dynamic Mechanical Analysis of Viscoelastic Fiber Composites

paper_url: http://arxiv.org/abs/2310.15188
repo_url: None
paper_authors: Victor Hoffmann, Ilias Nahmed, Parisa Rastin, Guénaël Cabanes, Julien Boisse
for: 这篇论文是为了映射微结构到其机械性能的映射，以便通过深度神经网络快速设计和理解微结构。
methods: 该论文使用了机器学习技术，特别是深度神经网络，来映射微结构到机械性能。
results: 该论文可以快速地映射微结构到机械性能，从而帮助设计和理解微结构。

Abstract
The increased adoption of reinforced polymer (RP) composite materials, driven by eco-design standards, calls for a fine balance between lightness, stiffness, and effective vibration control. These materials are integral to enhancing comfort, safety, and energy efficiency. Dynamic Mechanical Analysis (DMA) characterizes viscoelastic behavior, yet there's a growing interest in using Machine Learning (ML) to expedite the design and understanding of microstructures. In this paper we aim to map microstructures to their mechanical properties using deep neural networks, speeding up the process and allowing for the generation of microstructures from desired properties.

摘要
随着复合材料（RP）的广泛应用，受生态设计标准的推动，需要均衡轻量、刚性和有效的振荡控制。这些材料对于提高舒适、安全和能效性具有重要作用。动态机械分析（DMA）可以描述弹性行为，但是有一种增长的兴趣是使用机器学习（ML）来加速设计和理解微结构。在这篇论文中，我们希望通过深度神经网络将微结构映射到机械性能上，从而加快过程并允许从所需的性能开发微结构。

Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing

paper_url: http://arxiv.org/abs/2310.13855
repo_url: https://github.com/microsoft/Evoke
paper_authors: Xinyu Hu, Pengfei Tang, Simiao Zuo, Zihan Wang, Bowen Song, Qiang Lou, Jian Jiao, Denis Charles
For: The paper is written for improving the performance of large language models (LLMs) in natural language processing tasks by proposing a new prompt refinement framework called Evoke.* Methods: The paper proposes using two instances of a same LLM, one as a reviewer and one as an author, to refine prompts through an author-reviewer feedback loop. The paper also introduces a data selection approach to expose the LLM to hard samples.* Results: The paper reports significant outperformance of Evoke compared to existing methods, with Evoke scoring above 80 in the challenging task of logical fallacy detection while other baseline methods struggle to reach 20.

Abstract
Large language models (LLMs) have made impressive progress in natural language processing. These models rely on proper human instructions (or prompts) to generate suitable responses. However, the potential of LLMs are not fully harnessed by commonly-used prompting methods: many human-in-the-loop algorithms employ ad-hoc procedures for prompt selection; while auto prompt generation approaches are essentially searching all possible prompts randomly and inefficiently. We propose Evoke, an automatic prompt refinement framework. In Evoke, there are two instances of a same LLM: one as a reviewer (LLM-Reviewer), it scores the current prompt; the other as an author (LLM-Author), it edits the prompt by considering the edit history and the reviewer's feedback. Such an author-reviewer feedback loop ensures that the prompt is refined in each iteration. We further aggregate a data selection approach to Evoke, where only the hard samples are exposed to the LLM. The hard samples are more important because the LLM can develop deeper understanding of the tasks out of them, while the model may already know how to solve the easier cases. Experimental results show that Evoke significantly outperforms existing methods. For instance, in the challenging task of logical fallacy detection, Evoke scores above 80, while all other baseline methods struggle to reach 20.

摘要
大型自然语言处理模型（LLM）已经做出了很好的进步。这些模型需要合适的人类指导（或提示）来生成适当的响应。然而，LLM的潜在能力未被充分利用，因为常用的人类在循环算法使用了临时的方法来选择提示，而自动生成提示方法则是随机搜索所有可能的提示。我们提出了“触发”，一个自动提示修充框架。在触发中，有两个相同的LLM实例：一个作为评审者（LLM-Reviewer），它评分当前提示；另一个作为作者（LLM-Author），它通过考虑编辑历史和评审者的反馈来修改提示。这种作者-评审者反馈循环确保提示在每次迭代中得到修充。我们还添加了一种数据选择方法到触发中，只暴露困难样本给LLM。困难样本更重要，因为LLM可以通过它们更深入理解任务，而模型可能已经知道如何解决较易的 случа子。实验结果表明，触发明确超越了现有方法。例如，在逻辑错误检测任务中，触发得分超过80，而所有基准方法几乎无法达到20。

CNN-based Prediction of Partition Path for VVC Fast Inter Partitioning Using Motion Fields

paper_url: http://arxiv.org/abs/2310.13838
repo_url: https://github.com/simon123123/vtm10_fast_dt_inter_partition_pcs2021
paper_authors: Yiqun Liu, Marc Riviere, Thomas Guionnet, Aline Roumy, Christine Guillemot
for: 这个研究旨在提高VVC编码器的速度，使其在高效视频编码（HEVC）标准下提供更高的压缩效率和更高的编码复杂度。methods: 该研究提出了一种基于卷积神经网络（CNN）的方法，通过预测QUADTREE分割路径来加速VVC编码器的INTERPARTITIONING过程。在这种方法中，首先引入了一种新的QUADTREE分割表示法，然后开发了一种基于U-Net的CNN，用于在CTU级别预测分割路径。results: 实验表明，提出的方法可以在RAGOP32配置下实现加速率 range from 16.5% to 60.2%，而且与此同时，BD-rate的影响在0.44% to 4.59%之间，这超过了其他当前的解决方案。此外，该方法还被认为是当前场景中最轻量级的方法之一，这使得其适用于其他编码器。

Abstract
The Versatile Video Coding (VVC) standard has been recently finalized by the Joint Video Exploration Team (JVET). Compared to the High Efficiency Video Coding (HEVC) standard, VVC offers about 50% compression efficiency gain, in terms of Bjontegaard Delta-Rate (BD-rate), at the cost of a 10-fold increase in encoding complexity. In this paper, we propose a method based on Convolutional Neural Network (CNN) to speed up the inter partitioning process in VVC. Firstly, a novel representation for the quadtree with nested multi-type tree (QTMT) partition is introduced, derived from the partition path. Secondly, we develop a U-Net-based CNN taking a multi-scale motion vector field as input at the Coding Tree Unit (CTU) level. The purpose of CNN inference is to predict the optimal partition path during the Rate-Distortion Optimization (RDO) process. To achieve this, we divide CTU into grids and predict the Quaternary Tree (QT) depth and Multi-type Tree (MT) split decisions for each cell of the grid. Thirdly, an efficient partition pruning algorithm is introduced to employ the CNN predictions at each partitioning level to skip RDO evaluations of unnecessary partition paths. Finally, an adaptive threshold selection scheme is designed, making the trade-off between complexity and efficiency scalable. Experiments show that the proposed method can achieve acceleration ranging from 16.5% to 60.2% under the RandomAccess Group Of Picture 32 (RAGOP32) configuration with a reasonable efficiency drop ranging from 0.44% to 4.59% in terms of BD-rate, which surpasses other state-of-the-art solutions. Additionally, our method stands out as one of the lightest approaches in the field, which ensures its applicability to other encoders.

摘要
joint 视频探索团体（JVET）最近确定的多元视频编码标准（VVC），相比高效视频编码标准（HEVC），VVC在BD率方面提供约50%的压缩效率提升，但是需要10倍的编码复杂度增加。在这篇论文中，我们提出一种基于卷积神经网络（CNN）的方法，用于加速VVC中的分区过程。首先，我们引入了一种新的Quadtree嵌套多型树（QTMT）分区表示，基于分区路径。其次，我们开发了一种基于U-Net的CNN，用于在CTU级别上接受多尺度运动 вектор场为输入，并通过RDO过程预测最佳分区路径。为此，我们将CTU分割成格子，并预测每个格子的QT深度和MT分裂决策。第三，我们引入了一种高效的分区剔除算法，以使用CNN预测结果在每个分区级别进行分区剔除。最后，我们设计了一种可Scalable的阈值选择方案，以实现质量和效率之间的平衡。实验表明，我们的方法可以在RAGOP32配置下实现加速率 ranges from 16.5% to 60.2%，BD率上的效率损失在0.44% to 4.59%之间，超越了其他现有的解决方案。此外，我们的方法具有轻量级的特点，使其适用于其他编码器。

GraphMaker: Can Diffusion Models Generate Large Attributed Graphs?

paper_url: http://arxiv.org/abs/2310.13833
repo_url: https://github.com/graph-com/graphmaker
paper_authors: Mufei Li, Eleonora Kreačić, Vamsi K. Potluru, Pan Li
for: 该论文旨在生成大规模图的实用图学机器学习中，提供一种新的扩散模型，帮助理解网络演化和数据用途保持。
methods: 该论文提出了一种新的扩散模型，名为GraphMaker，可以生成大规模图并考虑节点属性。它还使用了节点级conditioning和小批量策略以提高可扩展性。
results: 实验表明，GraphMaker可以生成高质量的大规模图，并且可以在下游任务中提供有用的数据。

Abstract
Large-scale graphs with node attributes are fundamental in real-world scenarios, such as social and financial networks. The generation of synthetic graphs that emulate real-world ones is pivotal in graph machine learning, aiding network evolution understanding and data utility preservation when original data cannot be shared. Traditional models for graph generation suffer from limited model capacity. Recent developments in diffusion models have shown promise in merely graph structure generation or the generation of small molecular graphs with attributes. However, their applicability to large attributed graphs remains unaddressed due to challenges in capturing intricate patterns and scalability. This paper introduces GraphMaker, a novel diffusion model tailored for generating large attributed graphs. We study the diffusion models that either couple or decouple graph structure and node attribute generation to address their complex correlation. We also employ node-level conditioning and adopt a minibatch strategy for scalability. We further propose a new evaluation pipeline using models trained on generated synthetic graphs and tested on original graphs to evaluate the quality of synthetic data. Empirical evaluations on real-world datasets showcase GraphMaker's superiority in generating realistic and diverse large-attributed graphs beneficial for downstream tasks.

摘要
大规模图像有节点特征是现实世界中常见的案例，如社交和金融网络。生成可信的图像是图机器学习中关键的，帮助理解网络的进化和数据的可用性保持，当原始数据无法分享时。传统的图像生成模型受到有限的模型容量的限制。最近的扩散模型已经显示出了在图像结构生成或小分子图像生成中的承诺。然而，它们对大型具有特征的图像仍然无法解决，因为捕捉复杂的模式和可扩展性的挑战。本文介绍了GraphMaker，一种适用于生成大型具有特征的图像的新型扩散模型。我们研究了把 grafstructures 和节点特征生成 decouple 或 couple 以处理它们之间的复杂关系。我们还使用节点级 conditioning 和采用小批量策略以实现可扩展性。此外，我们还提出了一种基于生成的 синтетиче图像模型来评估生成的图像质量。empirical evaluation 表明，GraphMaker 可以生成真实且多样的大型具有特征的图像，对下游任务具有利助。

Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models

paper_url: http://arxiv.org/abs/2310.13828
repo_url: None
paper_authors: Shawn Shan, Wenxin Ding, Josephine Passananti, Haitao Zheng, Ben Y. Zhao
for: 本研究旨在探讨数据恶意攻击的可行性和影响，以及可能的防御策略。
methods: 研究人员使用了一种名为“Nightshade”的优化后的提问特定恶意攻击方法，可以在少量恶意样本中训练模型。
results: 研究发现， Nightshade 攻击可以成功地让模型产生意外的输出，并且这些输出可以“延伸”到相关的概念上。此外，研究人员还发现，一些恶意攻击可以“混合”在同一个提问上，从而导致模型的总体特征变得不稳定。

Abstract
Data poisoning attacks manipulate training data to introduce unexpected behaviors into machine learning models at training time. For text-to-image generative models with massive training datasets, current understanding of poisoning attacks suggests that a successful attack would require injecting millions of poison samples into their training pipeline. In this paper, we show that poisoning attacks can be successful on generative models. We observe that training data per concept can be quite limited in these models, making them vulnerable to prompt-specific poisoning attacks, which target a model's ability to respond to individual prompts. We introduce Nightshade, an optimized prompt-specific poisoning attack where poison samples look visually identical to benign images with matching text prompts. Nightshade poison samples are also optimized for potency and can corrupt an Stable Diffusion SDXL prompt in <100 poison samples. Nightshade poison effects "bleed through" to related concepts, and multiple attacks can composed together in a single prompt. Surprisingly, we show that a moderate number of Nightshade attacks can destabilize general features in a text-to-image generative model, effectively disabling its ability to generate meaningful images. Finally, we propose the use of Nightshade` and similar tools as a last defense for content creators against web scrapers that ignore opt-out/do-not-crawl directives, and discuss possible implications for model trainers and content creators.

摘要
数据毒攻击攻击机器学习模型的训练时间，以引入无预期的行为。文本生成模型拥有庞大的训练集，目前认知中指出，成功攻击需要插入数百万毒样本到训练管道中。在这篇论文中，我们表明了毒攻击可以成功地进行在生成模型中。我们发现，生成模型中每个概念的训练数据很有限，使其易受到特定提示毒攻击，该攻击target模型对具体提示的响应能力。我们引入了“夜露”（Nightshade）优化的提示特定毒攻击，毒样本与benign图像和相同的文本提示保持视觉相同。夜露毒样本还优化了强度，可以在<100个毒样本中训练Stable Diffusion SDXL提示。夜露毒效“泄漏”到相关概念上，并且多个攻击可以在单个提示中组合 together。 surprisingly，我们发现了一些 Nightshade 攻击可以使生成模型的通用特征失效，从而禁用生成有意义的图像。最后，我们提议使用 Nightshade 和类似工具作为内容创作者对网络抓取器忽略 opt-out/do-not-crawl 指令的最后防御，并讨论了模型训练者和内容创作者的可能的影响。

FERI: A Multitask-based Fairness Achieving Algorithm with Applications to Fair Organ Transplantation

paper_url: http://arxiv.org/abs/2310.13820
repo_url: None
paper_authors: Can Li, Dejian Lai, Xiaoqian Jiang, Kai Zhang
for: 这 paper 是为了 Addressing fairness challenges in liver transplantation, particularly for subgroups defined by sensitive attributes like age group, gender, and race/ethnicity.
methods: 这 paper 使用了 Machine learning models for outcome prediction, with a focus on fairness-aware predictive modeling. The authors introduce the Fairness through the Equitable Rate of Improvement in Multitask Learning (FERI) algorithm, which constrains subgroup loss and prevents subgroup dominance in the training process.
results: 试验表明，FERI 可以 maintain high predictive accuracy with AUROC and AUPRC comparable to baseline models, while improving fairness. Specifically, for gender, FERI reduces the demographic parity disparity by 71.74%, and for the age group, it decreases the equalized odds disparity by 40.46%.

Abstract
Liver transplantation often faces fairness challenges across subgroups defined by sensitive attributes like age group, gender, and race/ethnicity. Machine learning models for outcome prediction can introduce additional biases. To address these, we introduce Fairness through the Equitable Rate of Improvement in Multitask Learning (FERI) algorithm for fair predictions of graft failure risk in liver transplant patients. FERI constrains subgroup loss by balancing learning rates and preventing subgroup dominance in the training process. Our experiments show that FERI maintains high predictive accuracy with AUROC and AUPRC comparable to baseline models. More importantly, FERI demonstrates an ability to improve fairness without sacrificing accuracy. Specifically, for gender, FERI reduces the demographic parity disparity by 71.74%, and for the age group, it decreases the equalized odds disparity by 40.46%. Therefore, the FERI algorithm advances fairness-aware predictive modeling in healthcare and provides an invaluable tool for equitable healthcare systems.

摘要
肝移植往往面临公平挑战，特别是在年龄组、性别和种族/民族等敏感属性下的分组。机器学习模型用于结果预测可能会引入额外偏见。为解决这些问题，我们介绍了在多任务学习中实现公平性的 Fairness through the Equitable Rate of Improvement in Multitask Learning（FERI）算法。FERI通过均衡学习率和避免分组占据训练过程中的分组主导来约束分组损失。我们的实验表明，FERI可以保持高度的预测精度，AUROC和AUPRC与基线模型相当。此外，FERI能够提高公平性而不牺牲准确性。特别是gender方面，FERI可以降低性别差距的比例为71.74%，而年龄组方面，它可以降低平等 odds差距的比例为40.46%。因此，FERI算法为健康卫生领域的公平预测模型提供了一个有价值的工具，并且能够推动公平的医疗系统。

FATA-Trans: Field And Time-Aware Transformer for Sequential Tabular Data

paper_url: http://arxiv.org/abs/2310.13818
repo_url: https://github.com/zdy93/fata-trans
paper_authors: Dongyu Zhang, Liang Wang, Xin Dai, Shubham Jain, Junpeng Wang, Yujie Fan, Chin-Chia Michael Yeh, Yan Zheng, Zhongfang Zhuang, Wei Zhang
for: 这个研究旨在提出一个适合批评Sequential Tabular Data（STD）的模型，并且能够对STD进行有效的分析和处理。
methods: 本研究提出了一个名为FATA-Trans的模型，它使用了两个Field Transformer来处理STD，并且透过Field-type embedding和Time-aware position embedding来capture static和动态字段之间的差异和时间序列资讯。
results: 实验结果显示，FATA-Trans在下游任务中的学习表现都高于现有的方法，并且通过可视化研究来显示了模型所捕捉的内在结构和时间行为图像。

Abstract
Sequential tabular data is one of the most commonly used data types in real-world applications. Different from conventional tabular data, where rows in a table are independent, sequential tabular data contains rich contextual and sequential information, where some fields are dynamically changing over time and others are static. Existing transformer-based approaches analyzing sequential tabular data overlook the differences between dynamic and static fields by replicating and filling static fields into each transformer, and ignore temporal information between rows, which leads to three major disadvantages: (1) computational overhead, (2) artificially simplified data for masked language modeling pre-training task that may yield less meaningful representations, and (3) disregarding the temporal behavioral patterns implied by time intervals. In this work, we propose FATA-Trans, a model with two field transformers for modeling sequential tabular data, where each processes static and dynamic field information separately. FATA-Trans is field- and time-aware for sequential tabular data. The field-type embedding in the method enables FATA-Trans to capture differences between static and dynamic fields. The time-aware position embedding exploits both order and time interval information between rows, which helps the model detect underlying temporal behavior in a sequence. Our experiments on three benchmark datasets demonstrate that the learned representations from FATA-Trans consistently outperform state-of-the-art solutions in the downstream tasks. We also present visualization studies to highlight the insights captured by the learned representations, enhancing our understanding of the underlying data. Our codes are available at https://github.com/zdy93/FATA-Trans.

摘要
纵向表格数据是现实世界中最常用的数据类型之一。与传统的表格数据不同，纵向表格数据含有较为复杂的上下文和时间序列信息，其中一些字段会随时间的推移而变化，而其他字段则是静态的。现有的转换器基本方法会忽略时间序列中的不同性和顺序信息，这会导致三个主要缺点：（1）计算成本高，（2）为隐藏语言模型预训练任务提供不准确的数据，（3）忽略时间序列中的行为模式。在这项工作中，我们提议了FATA-Trans模型，它包含两个字段转换器，每个转换器都会处理不同的静态和动态字段信息。FATA-Trans模型具有场景和时间意识，可以有效地处理纵向表格数据。我们在三个标准数据集上进行了实验，结果显示FATA-Trans模型在下游任务中的学习表现 Always outperform了现有的解决方案。我们还提供了视觉研究，以帮助我们更好地理解下面数据的含义。我们的代码可以在https://github.com/zdy93/FATA-Trans上获取。

paper_url: http://arxiv.org/abs/2310.13809
repo_url: https://github.com/lindamoraes/turtlebot-project
paper_authors: Linda Dotto de Moraes, Victor Augusto Kich, Alisson Henrique Kolling, Jair Augusto Bottega, Ricardo Bedin Grando, Anselmo Rafael Cukla, Daniel Fernando Tello Gamarra
for: 这项研究旨在提高无地图导航的地面机器人能力。
methods: 研究采用了两种不同的深度强化学习方法，一是基于深度Q网络（DQN）算法的方法，另一是基于双深度Q网络（DDQN）算法的方法。两种方法都使用了激光范围探测器提供的24个量据，并且通过Agent的位差和方向相对于目标来决定导航。
results: 研究发现，使用Double Deep结构可以显著提高地面机器人的导航能力，并且不需要依赖于复杂的图像输入。在三个实际环境中进行了评估，结果表明 Double Deep 结构在导航任务中表现出了显著的优势。

Abstract
In this study, we present two distinct approaches within the realm of Deep Reinforcement Learning (Deep-RL) aimed at enhancing mapless navigation for a ground-based mobile robot. The research methodology primarily involves a comparative analysis between a Deep-RL strategy grounded in the foundational Deep Q-Network (DQN) algorithm, and an alternative approach based on the Double Deep Q-Network (DDQN) algorithm. The agents in these approaches leverage 24 measurements from laser range sampling, coupled with the agent's positional differentials and orientation relative to the target. This amalgamation of data influences the agents' determinations regarding navigation, ultimately dictating the robot's velocities. By embracing this parsimonious sensory framework as proposed, we successfully showcase the training of an agent for proficiently executing navigation tasks and adeptly circumventing obstacles. Notably, this accomplishment is attained without a dependency on intricate sensory inputs like those inherent to image-centric methodologies. The proposed methodology is evaluated in three different real environments, revealing that Double Deep structures significantly enhance the navigation capabilities of mobile robots compared to simple Q structures.

摘要
在这项研究中，我们提出了两种不同的深度强化学习方法（深度强化学习），用于增强地面机器人无地图导航。我们的研究方法主要包括对使用深度Q网络（DQN）算法和双深度Q网络（DDQN）算法两种方法进行比较分析。这两种方法的代理人利用24个激光距测样本，并结合代理人的位差和方向偏差 relative to the target，以Influence the agents' determinations regarding navigation, ultimately dictating the robot's velocities。通过采用这种简单的感知框架，我们成功地训练了一个能够高效执行导航任务并绕过障碍物的代理人。值得注意的是，这种成功不依赖于复杂的感知输入，如图像中心的方法。我们的方法在三个不同的实际环境中进行了评估，结果显示，使用双深度结构可以增强移动机器人的导航能力，相比于单深度结构。

RoseNet: Predicting Energy Metrics of Double InDel Mutants Using Deep Learning

paper_url: http://arxiv.org/abs/2310.13806
repo_url: None
paper_authors: Sarah Coffland, Katie Christensen, Filip Jagodzinski, Brian Hutchinson
For: The paper is written to explore the use of computational methods to model insertion and deletion (InDel) mutations in proteins, specifically using a deep learning approach called RoseNet to predict the effects of InDel mutations on protein structure and function.* Methods: The paper uses a combination of computational methods, including the Rosetta protein structure prediction software and deep learning techniques, to generate and analyze exhaustive double InDel mutations for three proteins. The authors develop and train RoseNet on several structural and energetic metrics output by Rosetta during the mutant generation process.* Results: The paper presents the results of the authors’ experiments using RoseNet to predict the effects of InDel mutations on protein structure and function. The authors show that RoseNet can accurately emulate the exhaustive data set using deep learning methods, and demonstrate the ability of the model to predict Rosetta metrics for unseen mutant sequences with two InDels. The paper also includes a sensitivity analysis to determine the necessary quantity of data required to accurately emulate the structural scores for computationally generated mutants.

Abstract
An amino acid insertion or deletion, or InDel, can have profound and varying functional impacts on a protein's structure. InDel mutations in the transmembrane conductor regulator protein for example give rise to cystic fibrosis. Unfortunately performing InDel mutations on physical proteins and studying their effects is a time prohibitive process. Consequently, modeling InDels computationally can supplement and inform wet lab experiments. In this work, we make use of our data sets of exhaustive double InDel mutations for three proteins which we computationally generated using a robotics inspired inverse kinematics approach available in Rosetta. We develop and train a neural network, RoseNet, on several structural and energetic metrics output by Rosetta during the mutant generation process. We explore and present how RoseNet is able to emulate the exhaustive data set using deep learning methods, and show to what extent it can predict Rosetta metrics for unseen mutant sequences with two InDels. RoseNet achieves a Pearson correlation coefficient median accuracy of 0.775 over all Rosetta scores for the largest protein. Furthermore, a sensitivity analysis is performed to determine the necessary quantity of data required to accurately emulate the structural scores for computationally generated mutants. We show that the model can be trained on minimal data (<50%) and still retain a high level of accuracy.

摘要
一个氨基酸插入或删除（InDel）可能对蛋白质的结构产生深刻和多样化的功能影响。例如，InDel 变异在传输膜调控蛋白中会导致肾上腺炎病。然而，在实验室中进行InDel变异和研究其效果是一个时间紧张的过程。因此，通过计算方式模拟InDel变异可以补充和指导 wet lab experiment。在这项工作中，我们使用了我们已经生成的double InDel 变异数据集，其中包括三种蛋白质的计算生成的双重InDel 变异。我们开发了一个神经网络模型，称为RoseNet，并在Rosetta中生成的多种结构和能量指标上训练这个模型。我们探索了RoseNet是如何使用深度学习方法来模拟数据集，并对未经见过的双重InDel 变异序列预测Rosetta指标的能力。RoseNet在最大蛋白质中达到了 median 准确率0.775。此外，我们进行了敏感分析，以确定模型需要多少数据来准确模拟计算生成的结构分数。我们发现模型可以在少量数据（<50%）上训练并仍保持高级别的准确率。

Improving Molecular Properties Prediction Through Latent Space Fusion

paper_url: http://arxiv.org/abs/2310.13802
repo_url: https://github.com/ibm/molformer
paper_authors: Eduardo Soares, Akihiro Kishimoto, Emilio Vital Brazil, Seiji Takeda, Hiroshi Kajino, Renato Cerqueira
for:This paper aims to enhance the efficacy of pre-trained language models for predicting molecular properties by combining latent spaces derived from state-of-the-art chemical models.methods:The proposed approach combines the embeddings derived from MHG-GNN, which represents molecular structures as graphs, and MoLFormer embeddings rooted in chemical language. The attention mechanism of MoLFormer is able to identify relations between two atoms even when their distance is far apart, while the GNN of MHG-GNN can more precisely capture relations among multiple atoms closely located.results:The proposed multi-view approach outperforms existing state-of-the-art methods, including MoLFormer-XL, in predicting clinical trial drug toxicity and inhibiting HIV replication, as demonstrated on six benchmark datasets from MoleculeNet. The approach achieves superior performance in intricate tasks, and the use of small versions of MHG-GNN and MoLFormer opens up an opportunity for further improvement when using a larger-scale dataset.Here is the result in Simplified Chinese text:for: 这篇论文目标是提高预训练语言模型在预测分子性质方面的效果，通过结合化物学模型的状态空间。methods: 该方法结合MHG-GNN的嵌入， representing molecular structures as graphs，以及基于化学语言的MoLFormer嵌入。MoLFormer的注意机制可以在两个原子之间识别远距离的关系，而MHG-GNN的GNN可以更好地捕捉多个原子 closely located的关系。results: 该方法在六个MoleculeNet的标准数据集上表现出色，比如MoLFormer-XL、在临床药物到ксиicity预测和HIV复制抑制方面取得了更高的性能，特别是在复杂任务中。使用小版本的MHG-GNN和MoLFormer，这开 up了进一步改进的机会，当使用更大规模的数据集时。

Abstract
Pre-trained Language Models have emerged as promising tools for predicting molecular properties, yet their development is in its early stages, necessitating further research to enhance their efficacy and address challenges such as generalization and sample efficiency. In this paper, we present a multi-view approach that combines latent spaces derived from state-of-the-art chemical models. Our approach relies on two pivotal elements: the embeddings derived from MHG-GNN, which represent molecular structures as graphs, and MoLFormer embeddings rooted in chemical language. The attention mechanism of MoLFormer is able to identify relations between two atoms even when their distance is far apart, while the GNN of MHG-GNN can more precisely capture relations among multiple atoms closely located. In this work, we demonstrate the superior performance of our proposed multi-view approach compared to existing state-of-the-art methods, including MoLFormer-XL, which was trained on 1.1 billion molecules, particularly in intricate tasks such as predicting clinical trial drug toxicity and inhibiting HIV replication. We assessed our approach using six benchmark datasets from MoleculeNet, where it outperformed competitors in five of them. Our study highlights the potential of latent space fusion and feature integration for advancing molecular property prediction. In this work, we use small versions of MHG-GNN and MoLFormer, which opens up an opportunity for further improvement when our approach uses a larger-scale dataset.

摘要

Specific versus General Principles for Constitutional AI

paper_url: http://arxiv.org/abs/2310.13798
repo_url: None
paper_authors: Sandipan Kundu, Yuntao Bai, Saurav Kadavath, Amanda Askell, Andrew Callahan, Anna Chen, Anna Goldie, Avital Balwit, Azalia Mirhoseini, Brayden McLean, Catherine Olsson, Cassie Evraets, Eli Tran-Johnson, Esin Durmus, Ethan Perez, Jackson Kernion, Jamie Kerr, Kamal Ndousse, Karina Nguyen, Nelson Elhage, Newton Cheng, Nicholas Schiefer, Nova DasSarma, Oliver Rausch, Robin Larson, Shannon Yang, Shauna Kravec, Timothy Telleen-Lawton, Thomas I. Liao, Tom Henighan, Tristan Hume, Zac Hatfield-Dodds, Sören Mindermann, Nicholas Joseph, Sam McCandlish, Jared Kaplan
for: 这篇论文是为了探讨人工智能模型中的伦理问题，以及如何使用AI模型来避免这些问题。
methods: 这篇论文使用了AI模型来替代人类反馈，并让AI模型仅仅根据一个列表的原则进行反馈。
results: 研究发现，使用简单的原则可以有效地防止AI模型表达有害行为，但是需要更多的原则来实现细化的控制。

Abstract
Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles. We find this approach effectively prevents the expression of such behaviors. The success of simple principles motivates us to ask: can models learn general ethical behaviors from only a single written principle? To test this, we run experiments using a principle roughly stated as "do what's best for humanity". We find that the largest dialogue models can generalize from this short constitution, resulting in harmless assistants with no stated interest in specific motivations like power. A general principle may thus partially avoid the need for a long list of constitutions targeting potentially harmful behaviors. However, more detailed constitutions still improve fine-grained control over specific types of harms. This suggests both general and specific principles have value for steering AI safely.

摘要
人类反馈可以防止对话模型表达过分危险的言行，但可能无法自动消除微妙的问题行为，如表达自保或权力愿望。宪法AI提供了一种alternative， replacing human feedback with AI模型对一份已编写的原则进行反馈。我们发现这种方法能够有效防止表达这些行为。成功的简单原则使我们问：可以模型学习通用的伦理行为从单一的写好的原则中吗？为测试这一点，我们运行了一些实验，使用“为人类做好事”的简单宪法。我们发现大型对话模型可以从这个短宪法中泛化，得到无权力的助手。一个通用的原则可以因此部分避免制定长列表的宪法，targeting potentially harmful behaviors。然而，更详细的宪法仍然可以提供细化的控制 sobre specific types of harms。这表明 Both general and specific principles have value for steering AI safely。

Enhancing Illicit Activity Detection using XAI: A Multimodal Graph-LLM Framework

paper_url: http://arxiv.org/abs/2310.13787
repo_url: None
paper_authors: Jack Nicholls, Aditya Kuppa, Nhien-An Le-Khac
For: This paper is written for organisations and governments looking to improve their financial cybercrime detection and prevention methods.* Methods: The paper presents a state-of-the-art, novel multimodal approach to explainable AI (XAI) in financial cybercrime detection, leveraging deep learning models to distill essential representations from transaction sequencing, subgraph connectivity, and narrative generation.* Results: The paper’s proposed approach significantly streamlines the investigative process for analysts, allowing them to understand transactions and their metadata much further through contextual narrative generation.

Abstract
Financial cybercrime prevention is an increasing issue with many organisations and governments. As deep learning models have progressed to identify illicit activity on various financial and social networks, the explainability behind the model decisions has been lacklustre with the investigative analyst at the heart of any deep learning platform. In our paper, we present a state-of-the-art, novel multimodal proactive approach to addressing XAI in financial cybercrime detection. We leverage a triad of deep learning models designed to distill essential representations from transaction sequencing, subgraph connectivity, and narrative generation to significantly streamline the analyst's investigative process. Our narrative generation proposal leverages LLM to ingest transaction details and output contextual narrative for an analyst to understand a transaction and its metadata much further.

摘要
金融电脑犯罪预防已成为许多组织和政府的问题。随着深度学习模型在不同的金融和社交网络上识别违法活动的能力不断提高，模型决策的解释力 however, has been lackluster for investigative analysts at the heart of any deep learning platform. 在我们的论文中，我们提出了一种现代化、新型的多Modalactive approach来解决金融电脑犯罪检测中的XAI问题。我们利用了三种深度学习模型，用于筛选交易时间序列、子图连接和生成文本，以大大简化分析员的调查过程。我们的文本生成提案利用LLM来接受交易细节并输出 Contextual narrative，让分析员更深入理解交易和其元数据。

Fundamental Limits of Membership Inference Attacks on Machine Learning Models

paper_url: http://arxiv.org/abs/2310.13786
repo_url: None
paper_authors: Eric Aubinais, Elisabeth Gassiat, Pablo Piantanida
for: 本文探讨了机器学习模型上的会员推测攻击（MIA）的基本统计限制，以及这种攻击对个人隐私的暴露。
methods: 本文首先 derivates了攻击成功的统计量，然后investigates several situations，并提供了这种量的上下限。
results: 根据样本数量和学习模型的结构参数，可以直接从数据集中估算出攻击准确率。

Abstract
Membership inference attacks (MIA) can reveal whether a particular data point was part of the training dataset, potentially exposing sensitive information about individuals. This article explores the fundamental statistical limitations associated with MIAs on machine learning models. More precisely, we first derive the statistical quantity that governs the effectiveness and success of such attacks. Then, we investigate several situations for which we provide bounds on this quantity of interest. This allows us to infer the accuracy of potential attacks as a function of the number of samples and other structural parameters of learning models, which in some cases can be directly estimated from the dataset.

摘要
会员推测攻击（MIA）可以揭示一个特定数据点是否在训练集中，可能暴露个人敏感信息。这篇文章探讨机器学习模型上的基本统计限制，以帮助理解会员推测攻击的效果和成功。我们首先计算攻击成功的统计量，然后研究这种量在不同情况下的下界和上界，从而可以根据样本数量和学习模型的结构参数来估算攻击的准确率。

Copyright Violations and Large Language Models

paper_url: http://arxiv.org/abs/2310.13771
repo_url: https://github.com/Noykarde/NoykardeRepository
paper_authors: Antonia Karamolegkou, Jiaang Li, Li Zhou, Anders Søgaard
for: 这篇论文探讨了大型自然语言处理模型可能违反版权法律的问题，具体来说是模型是否可以通过精准记忆来重新分布版权文本。
methods: 该论文使用了一系列语言模型对popular books和编程问题进行了实验，以保守地 Characterize the extent to which language models can redistribute these materials.
results: 研究发现，大型语言模型可以很好地记忆和重新分布版权文本，这可能会导致未经授权的版权违反。这些结果提醒我们需要进一步检查和研究，以确保未来的自然语言处理技术的发展遵循版权法律。

Abstract
Language models may memorize more than just facts, including entire chunks of texts seen during training. Fair use exemptions to copyright laws typically allow for limited use of copyrighted material without permission from the copyright holder, but typically for extraction of information from copyrighted materials, rather than {\em verbatim} reproduction. This work explores the issue of copyright violations and large language models through the lens of verbatim memorization, focusing on possible redistribution of copyrighted text. We present experiments with a range of language models over a collection of popular books and coding problems, providing a conservative characterization of the extent to which language models can redistribute these materials. Overall, this research highlights the need for further examination and the potential impact on future developments in natural language processing to ensure adherence to copyright regulations. Code is at \url{https://github.com/coastalcph/CopyrightLLMs}.

摘要
大型自然语言处理模型可能会记忆更多于Just facts，包括训练期间看到的整个文本块。 fair use exemption to copyright laws 通常允许不经copyright持有人授权的限量使用copyrighted material，但通常是为了提取copyrighted materials中的信息，而不是 verbatim 复制。这项研究通过 memorization 的问题来探讨版权侵犯和大型语言模型，强调可能的版权文本重新分布。我们在一系列popular book和编程问题上进行了实验，提供了保守的计算机архитектура来Quantify the extent to which language models can redistribute these materials. 总之，这项研究强调了需要进一步检查和未来自然语言处理发展中的版权规定遵循。代码可以在上找到。

Neural-Base Music Generation for Intelligence Duplication

paper_url: http://arxiv.org/abs/2310.13691
repo_url: None
paper_authors: Jacob Galajda, Kien Hua
for: 这 paper 是为了研究智能复制（Intelligent Duplication）技术，以便创造新的信息。
methods: 这 paper 使用了深度学习系统，以学习大作曲家贝多芬的作曲能力，并将其转化为一个哈希基本知识库。
results: 这 paper 通过了一种新的音乐创作方法，可以通过贝多芬的作曲能力来驱动音乐创作。

Abstract
There are two aspects of machine learning and artificial intelligence: (1) interpreting information, and (2) inventing new useful information. Much advance has been made for (1) with a focus on pattern recognition techniques (e.g., interpreting visual data). This paper focuses on (2) with intelligent duplication (ID) for invention. We explore the possibility of learning a specific individual's creative reasoning in order to leverage the learned expertise and talent to invent new information. More specifically, we employ a deep learning system to learn from the great composer Beethoven and capture his composition ability in a hash-based knowledge base. This new form of knowledge base provides a reasoning facility to drive the music composition through a novel music generation method.

摘要
Machine learning和人工智能有两个方面：（1）解读信息，和（2）创造新有用信息。在（1）方面，已经做出了很大的进步，主要是通过模式识别技术（如图像数据的解读）。这篇论文则专注于（2）方面，通过智能复制（ID）来创造新的信息。我们尝试了学习特定个人的创造性思维，以利用学习到的专业技巧和才华来创造新的信息。我们使用深度学习系统学习了大作曲家贝多芬的作曲能力，并将其储存在一个哈希基本知识库中。这种新的知识库提供了一种新的思维方式，用于驱动音乐创作。

Optimizing Retrieval-augmented Reader Models via Token Elimination

paper_url: http://arxiv.org/abs/2310.13682
repo_url: https://github.com/mosheber/token_elimination
paper_authors: Moshe Berchansky, Peter Izsak, Avi Caciularu, Ido Dagan, Moshe Wasserblat
for: 提高语音模型的运行效率，特别是在生成大量输出时。
methods: 分析支持文本的贡献度，并在токен水平上剔除不必要的信息，以降低运行时间。
results: 可以减少运行时间，最多62.2%，同时保持表现稳定，甚至提高表现。

Abstract
Fusion-in-Decoder (FiD) is an effective retrieval-augmented language model applied across a variety of open-domain tasks, such as question answering, fact checking, etc. In FiD, supporting passages are first retrieved and then processed using a generative model (Reader), which can cause a significant bottleneck in decoding time, particularly with long outputs. In this work, we analyze the contribution and necessity of all the retrieved passages to the performance of reader models, and propose eliminating some of the retrieved information, at the token level, that might not contribute essential information to the answer generation process. We demonstrate that our method can reduce run-time by up to 62.2%, with only a 2% reduction in performance, and in some cases, even improve the performance results.

摘要

Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

paper_url: http://arxiv.org/abs/2310.13678
repo_url: None
paper_authors: Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke Wu
for: 提高对长型语音内容的翻译质量，因为长型语音内容中的各个单元可以独立翻译以提高总体翻译质量。
methods: 利用大语言模型（LLMs）对长ASR讲cript进行分割，以便独立翻译每个单元，并在解码过程中 incorporating finite-state constraints 来消除投射的 Output。
results: 通过prompt-tuning或 fine-tuning，LLMs可以适应ASR错误的讲cript，并在9个测试集上提高了 average BLEU 值by 2.9点，相比于 automatic punctuation 基准。

Abstract
One challenge in speech translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we adapt large language models (LLMs) to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We overcome the tendency of hallucination in LLMs by incorporating finite-state constraints during decoding; these eliminate invalid outputs without requiring additional training. We discover that LLMs are adaptable to transcripts containing ASR errors through prompt-tuning or fine-tuning. Relative to a state-of-the-art automatic punctuation baseline, our best LLM improves the average BLEU by 2.9 points for English-German, English-Spanish, and English-Arabic TED talk translation in 9 test sets, just by improving segmentation.

摘要
一个挑战在语音翻译中是许多 spoken 内容很长，但是需要短单位以获得高质量翻译。为Address this mismatch, we adapt large language models (LLMs) to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We overcome the tendency of hallucination in LLMs by incorporating finite-state constraints during decoding; these eliminate invalid outputs without requiring additional training. We discover that LLMs are adaptable to transcripts containing ASR errors through prompt-tuning or fine-tuning. Compared to a state-of-the-art automatic punctuation baseline, our best LLM improves the average BLEU by 2.9 points for English-German, English-Spanish, and English-Arabic TED talk translation in 9 test sets, just by improving segmentation.

Let’s Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models

paper_url: http://arxiv.org/abs/2310.13671
repo_url: https://github.com/rickyskywalker/synthesis_step-by-step_official
paper_authors: Ruida Wang, Wangchunshu Zhou, Mrinmaya Sachan
for: 这篇论文的目的是提出一种数据合成方法，以帮助小型模型在具有很少标注数据的情况下进行训练。methods: 该方法利用大语言模型的丰富知识来生成 pseudo 训练示例，以实现数据和计算效率的同时提高。results: 该方法可以减少数据合成集和真实任务数据集之间的分布差距，从而提高小型模型的性能。对多个 NLP 任务进行了广泛的实验，显示了我们的方法可以比基elines提高小型模型的性能，最大提高比例达到 15.17%。

Abstract
*Data Synthesis* is a promising way to train a small model with very little labeled data. One approach for data synthesis is to leverage the rich knowledge from large language models to synthesize pseudo training examples for small models, making it possible to achieve both data and compute efficiency at the same time. However, a key challenge in data synthesis is that the synthesized dataset often suffers from a large distributional discrepancy from the *real task* data distribution. Thus, in this paper, we propose *Synthesis Step by Step* (**S3**), a data synthesis framework that shrinks this distribution gap by iteratively extrapolating the errors made by a small model trained on the synthesized dataset on a small real-world validation dataset using a large language model. Extensive experiments on multiple NLP tasks show that our approach improves the performance of a small model by reducing the gap between the synthetic dataset and the real data, resulting in significant improvement compared to several baselines: 9.48% improvement compared to ZeroGen and 2.73% compared to GoldGen, and at most 15.17% improvement compared to the small model trained on human-annotated data.

摘要
<>translate english text into simplified chinese*Data Synthesis* 是一种有前途的方法，用于训练一个小型模型，只需要非常小的标注数据。一种实现数据合成的方法是利用大型语言模型中的丰富知识，生成pseudo的训练示例，使得可以同时实现数据和计算效率。然而，数据合成中的一个关键挑战是，合成的数据集经常受到真实任务数据分布的大分布差异。因此，在这篇论文中，我们提出了Synthesis Step by Step（**S3**）数据合成框架，通过在小型真实世界验证集上使用大型语言模型来逐步修正小型模型在合成数据集上的错误。广泛的实验表明，我们的方法可以提高小型模型的性能，降低合成数据集和真实数据集之间的分布差异，相比ZeroGen和GoldGen基准下提高9.48%和2.73%，并且在最好情况下与人工标注数据上训练小型模型相比提高15.17%。Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you prefer Traditional Chinese, please let me know and I can provide the translation in that format as well.

ManifoldNeRF: View-dependent Image Feature Supervision for Few-shot Neural Radiance Fields

paper_url: http://arxiv.org/abs/2310.13670
repo_url: None
paper_authors: Daiju Kanaoka, Motoharu Sonogashira, Hakaru Tamukoh, Yasutomo Kawanishi
for: 本研究旨在提高 Novel View Synthesis 的效果，使用 Neural Radiance Fields (NeRF) 的扩展 DietNeRF。
methods: 本研究提出了一种基于 interpolated features 的方法，来supervise unknown viewpoints 的特征向量。
results: 实验结果表明，提议的方法在复杂场景中表现更好于 DietNeRF，并且在实际环境中identified一组有效的视点 Patterns。

Abstract
Novel view synthesis has recently made significant progress with the advent of Neural Radiance Fields (NeRF). DietNeRF is an extension of NeRF that aims to achieve this task from only a few images by introducing a new loss function for unknown viewpoints with no input images. The loss function assumes that a pre-trained feature extractor should output the same feature even if input images are captured at different viewpoints since the images contain the same object. However, while that assumption is ideal, in reality, it is known that as viewpoints continuously change, also feature vectors continuously change. Thus, the assumption can harm training. To avoid this harmful training, we propose ManifoldNeRF, a method for supervising feature vectors at unknown viewpoints using interpolated features from neighboring known viewpoints. Since the method provides appropriate supervision for each unknown viewpoint by the interpolated features, the volume representation is learned better than DietNeRF. Experimental results show that the proposed method performs better than others in a complex scene. We also experimented with several subsets of viewpoints from a set of viewpoints and identified an effective set of viewpoints for real environments. This provided a basic policy of viewpoint patterns for real-world application. The code is available at https://github.com/haganelego/ManifoldNeRF_BMVC2023

摘要
新型视图合成技术在近期内受到了神经辐射场（NeRF）的推出，其中 DietNeRF 是一种从只有几个图像中学习视图 synthesis 的扩展。DietNeRF 的目标是在不知道视点的情况下，从几个图像中学习视图 synthesis。但是，这个假设是理想的，实际上，随着视点的变化，图像中的特征向量也会随着变化。因此，这个假设可能会对训练造成害。为了避免这种害，我们提出了 ManifoldNeRF，一种使用邻近known viewpoint的 interpolated 特征来监督未知视点的特征vector的方法。由于该方法可以对每个未知视点提供适当的监督，因此可以更好地学习volume representation。实验结果表明，我们提出的方法在复杂场景中表现更好 than others。我们还对一些视点集进行了实验，并确定了在真实环境中有效的视点集。这提供了一个基本的视点模式政策，可以应用于实际世界中。代码可以在上找到。

Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis

paper_url: http://arxiv.org/abs/2310.13669
repo_url: https://github.com/huawei-noah/noah-research
paper_authors: Philip John Gorinski, Matthieu Zimmer, Gerasimos Lampouras, Derrick Goh Xin Deik, Ignacio Iacobacci
for: 这个论文的目的是提高代码生成模型的性能。
methods: 该论文使用了自动生成的函数签名和相关的单元测试数据，以及actor-critic reinforcement learning训练方法。
results: 该论文的实验结果显示，与原始代码生成LM模型相比，使用自动生成的数据和actor-critic RL训练方法可以提高代码生成模型的性能，最高提高9.9%。同时，与常见的PPO或CodeRL模型相比，该论文的方法可以提高代码生成模型的性能，最高提高4.3%。

Abstract
The advent of large pre-trained language models in the domain of Code Synthesis has shown remarkable performance on various benchmarks, treating the problem of Code Generation in a fashion similar to Natural Language Generation, trained with a Language Modelling (LM) objective. In addition, the property of programming language code being precisely evaluable with respect to its semantics -- through the use of Unit Tests to check its functional correctness -- lends itself to using Reinforcement Learning (RL) as a further training paradigm. Previous work has shown that RL can be applied as such to improve models' coding capabilities; however, such RL-based methods rely on a reward signal based on defined Unit Tests, which are much harder to obtain compared to the huge crawled code datasets used in LM objectives. In this work, we present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests, suitable for RL training of Code Synthesis models. We also introduce a straightforward, simple yet effective Actor-Critic RL training scheme and show that it, in conjunction with automatically generated training data, leads to improvement of a pre-trained code language model's performance by up to 9.9% improvement over the original underlying code synthesis LM, and up to 4.3% over RL-based models trained with standard PPO or CodeRL.

摘要
大量预训练语言模型在代码生成领域的出现，在不同的benchmark上表现出色，将代码生成问题与自然语言生成问题相似的方式进行训练，使用语言模型（LM）目标。此外，编程语言代码的准确评估性——通过使用单元测试来检查其功能正确性——使得使用强化学习（RL）作为训练方法的可能性。前一个研究表明RL可以用来提高模型的编程能力，但是这些RL基于的方法需要基于定义的单元测试奖励信号，这些奖励信号比大量爬取代码集更难获得。在这项工作中，我们提出了一种新的方法，可以自动获得包含函数签名和相关单元测试的数据，适用于RL训练代码生成模型。我们还提出了一种简单、直观 yet effective的actor-critic RL训练方案，并证明在这种方案下，可以使用自动生成的训练数据，提高一个预训练的代码语言模型的性能，相比原始代码生成LM的9.9%提升，并比使用标准PPO或CodeRL的RL-based模型的4.3%提升。

An experimental study for early diagnosing Parkinson’s disease using machine learning

paper_url: http://arxiv.org/abs/2310.13654
repo_url: None
paper_authors: Md. Taufiqul Haque Khan Tusar, Md. Touhidul Islam, Abul Hasnat Sakil
For: Early detection of Parkinson’s Disease* Methods: Machine Learning techniques, including MinMax Scaler, Local Outlier Factor, and SMOTE, to automate the early detection of Parkinson’s Disease from clinical characteristics, voice features, and motor examination.* Results: Obtained 100% accuracy in classifying PD and RBD patients, as well as 92% accuracy in classifying PD and HC individuals.

Abstract
One of the most catastrophic neurological disorders worldwide is Parkinson's Disease. Along with it, the treatment is complicated and abundantly expensive. The only effective action to control the progression is diagnosing it in the early stage. However, this is challenging because early detection necessitates a large and complex clinical study. This experimental work used Machine Learning techniques to automate the early detection of Parkinson's Disease from clinical characteristics, voice features and motor examination. In this study, we develop ML models utilizing a public dataset of 130 individuals, 30 of whom are untreated Parkinson's Disease patients, 50 of whom are Rapid Eye Movement Sleep Behaviour Disorder patients who are at a greater risk of contracting Parkinson's Disease, and 50 of whom are Healthy Controls. We use MinMax Scaler to rescale the data points, Local Outlier Factor to remove outliers, and SMOTE to balance existing class frequency. Afterwards, apply a number of Machine Learning techniques. We implement the approaches in such a way that data leaking and overfitting are not possible. Finally, obtained 100% accuracy in classifying PD and RBD patients, as well as 92% accuracy in classifying PD and HC individuals.

摘要
In this study, we use machine learning techniques to automate the early detection of Parkinson's disease from clinical characteristics, voice features, and motor examination. We use a public dataset of 130 individuals, including 30 patients with untreated Parkinson's disease, 50 patients with rapid eye movement sleep behavior disorder who are at a higher risk of contracting Parkinson's disease, and 50 healthy controls.First, we use Min-Max Scaler to rescale the data points, Local Outlier Factor to remove outliers, and SMOTE to balance the existing class frequency. We implement the approaches in such a way that data leaking and overfitting are not possible.After applying these techniques, we obtained 100% accuracy in classifying Parkinson's disease and rapid eye movement sleep behavior disorder patients, as well as 92% accuracy in classifying Parkinson's disease and healthy control individuals.Note: "Parkinson's disease" is translated as " Parkinson 病" in Simplified Chinese, "rapid eye movement sleep behavior disorder" is translated as "快眼睛运动失调" in Simplified Chinese, and "healthy controls" is translated as "健康控制群" in Simplified Chinese.

Weighted Joint Maximum Mean Discrepancy Enabled Multi-Source-Multi-Target Unsupervised Domain Adaptation Fault Diagnosis

paper_url: http://arxiv.org/abs/2310.14790
repo_url: None
paper_authors: Zixuan Wang, Haoran Tang, Haibo Wang, Bo Qin, Mark D. Butala, Weiming Shen, Hongwei Wang
for: 本研究旨在提出一种多源多目标无监督领域适应（WJMMD-MDA）方法，用于在多源多目标场景下实现适应预测。
methods: 该方法提取了多个标注源频谱中的足够信息，并通过改进的距离损失实现多源多目标频谱的域对齐。这使得在多源多目标场景下学习域外特征，并实现跨域缺陷诊断。
results: 对三个数据集进行了广泛的比较试验，实验结果表明，提出的方法在跨域缺陷诊断中具有显著优势。

Abstract
Despite the remarkable results that can be achieved by data-driven intelligent fault diagnosis techniques, they presuppose the same distribution of training and test data as well as sufficient labeled data. Various operating states often exist in practical scenarios, leading to the problem of domain shift that hinders the effectiveness of fault diagnosis. While recent unsupervised domain adaptation methods enable cross-domain fault diagnosis, they struggle to effectively utilize information from multiple source domains and achieve effective diagnosis faults in multiple target domains simultaneously. In this paper, we innovatively proposed a weighted joint maximum mean discrepancy enabled multi-source-multi-target unsupervised domain adaptation (WJMMD-MDA), which realizes domain adaptation under multi-source-multi-target scenarios in the field of fault diagnosis for the first time. The proposed method extracts sufficient information from multiple labeled source domains and achieves domain alignment between source and target domains through an improved weighted distance loss. As a result, domain-invariant and discriminative features between multiple source and target domains are learned with cross-domain fault diagnosis realized. The performance of the proposed method is evaluated in comprehensive comparative experiments on three datasets, and the experimental results demonstrate the superiority of this method.

摘要
尽管数据驱动智能故障诊断技术可以实现很出色的结果，但它们假设训练和测试数据的分布相同，以及充足的标注数据。在实际场景中，各种运行状态经常存在，导致域 shift 问题，这阻碍了故障诊断的效iveness。而最近的无监督领域适应方法可以在不同域的故障诊断中进行交叉领域适应，但它们很难 simultaneously 利用多个来源域的信息，并实现多个目标域的有效诊断。在这篇论文中，我们创新地提出了一种基于加权最大差异 enabled 多源多目标无监督领域适应方法（WJMMD-MDA），该方法在多源多目标场景中实现了领域适应。该方法从多个标注源域中提取了足够的信息，并通过改进的加权距离损失来实现源和目标域之间的域对应。因此，在多个源和目标域之间学习了域不variant 和抑制特征，并实现了交叉域的故障诊断。我们在三个数据集上进行了广泛的比较实验，结果表明该方法的性能优于其他方法。

Contrastive Preference Learning: Learning from Human Feedback without RL

paper_url: http://arxiv.org/abs/2310.13639
repo_url: https://github.com/jhejna/cpl
paper_authors: Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh
for: 这 paper 是关于人工智能学习从人类反馈（RLHF）中的一种新方法，用于让模型与人类意愿相对应。
methods: 这 paper 使用了一种新的 regret-based 模型来学习人类偏好，并使用了一种新的 contrastive 目标函数来学习优化行为。
results: 这 paper 的结果表明，这种新方法可以在高维和连续的RLHF问题中Scale to elegantly，并且比之前的方法更简单。

Abstract
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the model by optimizing the learned reward via reinforcement learning (RL). This paradigm assumes that human preferences are distributed according to reward, but recent work suggests that they instead follow the regret under the user's optimal policy. Thus, learning a reward function from feedback is not only based on a flawed assumption of human preference, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase. Because of these optimization challenges, contemporary RLHF methods restrict themselves to contextual bandit settings (e.g., as in large language models) or limit observation dimensionality (e.g., state-based robotics). We overcome these limitations by introducing a new family of algorithms for optimizing behavior from human feedback using the regret-based model of human preferences. Using the principle of maximum entropy, we derive Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions, circumventing the need for RL. CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs. This enables CPL to elegantly scale to high-dimensional and sequential RLHF problems while being simpler than prior methods.

摘要
人工智能学习奖励（RLHF）已成为一种流行的方法，用于将模型与人类意愿相align。通常RLHF算法在两个阶段操作：首先，使用人类偏好来学习奖励函数，然后使用奖励学习（RL）来调整模型。这种方法假设人类偏好是根据奖励分布的，但最近的研究表明，人类偏好实际上是基于用户的优化策略的 regret。因此，从反馈中学习奖励函数并不只是基于错误的人类偏好假设，还会导致困难的优化挑战，例如Policy Gradient或Bootstrapping在RL阶段。由于这些优化挑战，当代RLHF方法通常限制自己在contextual bandit Setting（如大语言模型）或限制观察维度（如状态基于机器人）。我们超越这些限制，通过引入一种新的人类反馈学习算法，使用 regret-based模型来学习行为。使用最大 entropy原理，我们 derive Contrastive Preference Learning（CPL）算法，可以从偏好中学习优化策略，不需要学习奖励函数，从而避免RL阶段的优化挑战。CPL是完全偏离策略的，只需使用简单的对比目标，可以应用于任意MDP。这使得CPL可以简单地扩展到高维和顺序RLHF问题，而且更简单于先前的方法。

Hunayn: Elevating Translation Beyond the Literal

paper_url: http://arxiv.org/abs/2310.13613
repo_url: None
paper_authors: Nasser Almousa, Nasser Alzamil, Abdullah Alshehri, Ahmad Sait
for: 这项研究旨在开发一个高级的英语到阿拉伯语翻译工具，超越传统工具。
methods: 该方法利用赫尔辛基 transformer（MarianMT），通过自动抽取的纯文学阿拉伯语数据进行微调。
results: 对于Google翻译的评估表明，该方法在质量评估中具有明显的优势，特别是在文化敏感度和上下文准确性方面。

Abstract
This project introduces an advanced English-to-Arabic translator surpassing conventional tools. Leveraging the Helsinki transformer (MarianMT), our approach involves fine-tuning on a self-scraped, purely literary Arabic dataset. Evaluations against Google Translate show consistent outperformance in qualitative assessments. Notably, it excels in cultural sensitivity and context accuracy. This research underscores the Helsinki transformer's superiority for English-to-Arabic translation using a Fusha dataset.

摘要
这个项目推出了一种高级英语到阿拉伯语翻译工具，超越传统工具。我们采用了赫尔辛基transformer（MarianMT），我们的方法是在自动抽取的纯文学阿拉伯语数据上细调。对于Google翻译进行评估，我们的方法表现出了一致的提升，尤其是在文化敏感度和语言上下文准确性方面。这些研究证明了赫尔辛基transformer在英语到阿拉伯语翻译中的优势，使用福沙数据集。

Make Your Decision Convincing! A Unified Two-Stage Framework: Self-Attribution and Decision-Making

paper_url: http://arxiv.org/abs/2310.13610
repo_url: None
paper_authors: Yanrui Du, Sendong Zhao, Haochun Wang, Yuhan Chen, Rui Bai, Zewen Qiang, Muzhen Cai, Bing Qin
for: 本研究旨在提高黑盒模型的决策过程中的自然语言描述能力，以及提供用户可靠的决策理由。
methods: 本研究使用了子序列从输入文本中提取的自然语言来为用户提供决策理由，并通过两个阶段框架 Self-Attribution and Decision-Making (SADM) 来确保决策理由和模型决策之间的关系更加可靠。
results: 经过对 ERASER 测试 benchmark 上的五个理解任务的广泛实验，我们表明了我们的框架不仅可以提高决策理由和模型决策之间的关系的可靠性，还可以在任务性能和决策理由质量两个方面达到竞争力。此外，我们还探讨了我们的框架在半supervised情况下的潜在应用。

Abstract
Explaining black-box model behavior with natural language has achieved impressive results in various NLP tasks. Recent research has explored the utilization of subsequences from the input text as a rationale, providing users with evidence to support the model decision. Although existing frameworks excel in generating high-quality rationales while achieving high task performance, they neglect to account for the unreliable link between the generated rationale and model decision. In simpler terms, a model may make correct decisions while attributing wrong rationales, or make poor decisions while attributing correct rationales. To mitigate this issue, we propose a unified two-stage framework known as Self-Attribution and Decision-Making (SADM). Through extensive experiments on five reasoning datasets from the ERASER benchmark, we demonstrate that our framework not only establishes a more reliable link between the generated rationale and model decision but also achieves competitive results in task performance and the quality of rationale. Furthermore, we explore the potential of our framework in semi-supervised scenarios.

摘要
<>将黑盒模型行为用自然语言描述达到了各种NLP任务中的出色结果。现有研究利用输入文本中的子序列作为论证，为用户提供模型决策的证据。 although existing frameworks can generate high-quality rationales and achieve high task performance, they neglect to account for the unreliable link between the generated rationale and model decision. In simpler terms, a model may make correct decisions while attributing wrong rationales, or make poor decisions while attributing correct rationales. To address this issue, we propose a unified two-stage framework known as Self-Attribution and Decision-Making (SADM). Through extensive experiments on five reasoning datasets from the ERASER benchmark, we demonstrate that our framework not only establishes a more reliable link between the generated rationale and model decision but also achieves competitive results in task performance and the quality of rationale. Furthermore, we explore the potential of our framework in semi-supervised scenarios.Note: Please note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.

MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark

paper_url: http://arxiv.org/abs/2310.13606
repo_url: https://github.com/kinit-sk/mgt-detection-benchmark
paper_authors: Dominik Macko, Robert Moro, Adaku Uchendu, Jason Samuel Lucas, Michiharu Yamashita, Matúš Pikuliak, Ivan Srba, Thai Le, Dongwon Lee, Jakub Simko, Maria Bielikova
for: 填充 Multilingual Machine-Generated Text Detection Benchmark Dataset的 lacuna，提供8种多语言LLM生成的74,081个authentic和机器生成文本，用于评估多语言机器生成文本检测器的性能。
methods: 使用这个benchmark dataset， Comparing zero-shot (统计学和黑盒子)和精心调整的检测器的性能，并评估这些检测器在不同语言和LLM之间的一致性。
results: 研究发现， zero-shot检测器在语言不同和LLM不同情况下的性能较差，而精心调整的检测器在多语言和多LLM情况下的性能显著提高。

Abstract
There is a lack of research into capabilities of recent LLMs to generate convincing text in languages other than English and into performance of detectors of machine-generated text in multilingual settings. This is also reflected in the available benchmarks which lack authentic texts in languages other than English and predominantly cover older generators. To fill this gap, we introduce MULTITuDE, a novel benchmarking dataset for multilingual machine-generated text detection comprising of 74,081 authentic and machine-generated texts in 11 languages (ar, ca, cs, de, en, es, nl, pt, ru, uk, and zh) generated by 8 multilingual LLMs. Using this benchmark, we compare the performance of zero-shot (statistical and black-box) and fine-tuned detectors. Considering the multilinguality, we evaluate 1) how these detectors generalize to unseen languages (linguistically similar as well as dissimilar) and unseen LLMs and 2) whether the detectors improve their performance when trained on multiple languages.

摘要
“现有研究缺乏关于最新的LLM的生成文本的真实性的语言 besides English 以及机器生成文本检测器在多语言设置下的性能研究。这也反映在可用的标准准则中，lack of authentic texts in languages other than English and primarily cover older generators。为了填补这一漏洞，我们引入 MULTITuDE，一个新的多语言机器生成文本检测 benchmark，包含74,081 个真实的机器生成文本和人工生成文本在 11 种语言（ar, ca, cs, de, en, es, nl, pt, ru, uk, and zh），由 8 种多语言 LLMS 生成。通过这个benchmark，我们比较了零 shot (统计学和黑盒子) 和精度调教的检测器的性能。考虑到多语言性，我们评估了1）这些检测器在未看过语言（语言相似性和不相似性）和LLMs 中的泛化性能，2）是否在多语言培训中提高检测器的性能。”

Skin Lesion Segmentation Improved by Transformer-based Networks with Inter-scale Dependency Modeling

paper_url: http://arxiv.org/abs/2310.13604
repo_url: https://github.com/saniaesk/skin-lesion-segmentation
paper_authors: Sania Eskandari, Janet Lumpp, Luis Sanchez Giraldo
for:这项研究旨在提高皮肤癌病变分割的自动化精度，使用Transformer网络模型来增强FCNs的长距离依赖关系捕捉能力。methods:该研究使用了一种基于U-Net架构的Transformer网络模型，并提出了一种增强skip连接路径的方法，以增加网络特征重用性。此外，研究还提出了一种Inter-scale Context Fusion（ISCF）方法，通过在每个阶段的Encoder中使用注意力相关性来适应性地融合不同阶段的上下文。results:研究在两个皮肤癌病变分割 benchmark 上达到了比较高的分割精度，并且对比Transformer网络模型和U-Net架构的性能进行了比较，结果表明，基于U-Net架构的Transformer网络模型可以增强皮肤癌病变分割的自动化精度。代码可以在 GitHub 上找到：https://github.com/saniaesk/skin-lesion-segmentation。

Abstract
Melanoma, a dangerous type of skin cancer resulting from abnormal skin cell growth, can be treated if detected early. Various approaches using Fully Convolutional Networks (FCNs) have been proposed, with the U-Net architecture being prominent To aid in its diagnosis through automatic skin lesion segmentation. However, the symmetrical U-Net model's reliance on convolutional operations hinders its ability to capture long-range dependencies crucial for accurate medical image segmentation. Several Transformer-based U-Net topologies have recently been created to overcome this limitation by replacing CNN blocks with different Transformer modules to capture local and global representations. Furthermore, the U-shaped structure is hampered by semantic gaps between the encoder and decoder. This study intends to increase the network's feature re-usability by carefully building the skip connection path. Integrating an already calculated attention affinity within the skip connection path improves the typical concatenation process utilized in the conventional skip connection path. As a result, we propose a U-shaped hierarchical Transformer-based structure for skin lesion segmentation and an Inter-scale Context Fusion (ISCF) method that uses attention correlations in each stage of the encoder to adaptively combine the contexts from each stage to mitigate semantic gaps. The findings from two skin lesion segmentation benchmarks support the ISCF module's applicability and effectiveness. The code is publicly available at \url{https://github.com/saniaesk/skin-lesion-segmentation}

摘要
melanoma，一种危险的皮肤癌症，可以通过早期检测治疗。多种使用 Fully Convolutional Networks (FCNs) 的方法已经被提议，其中 U-Net 建筑物被广泛使用，以帮助自动识别皮肤肿瘤。然而，传统的 U-Net 模型在 convolutional 操作上存在一定的限制，这限制了它的捕捉长距离依赖关系的能力，这些依赖关系是医疗图像分割中必要的。为了缓解这些限制，一些基于 Transformer 的 U-Net 结构已经被创建，这些结构将 CNN 块 replaced 为不同的 Transformer 模块，以捕捉本地和全局表示。此外，U 形结构受到Semantic gap 的限制，这个问题可以通过精心建立 skip connection path 来解决。在传统的 skip connection path 中使用已经计算的 attention affinity 可以提高 feature 的重用性。因此，我们提出了一种 U 形层次 Transformer 基本结构和一种 Inter-scale Context Fusion (ISCF) 方法，该方法在每个encoder阶段使用 attention 相关性来适应地将每个阶段的上下文进行adaptive 组合，以 Mitigate Semantic gap。从两个皮肤肿瘤分割 benchmark 的结果来看，ISCF 模块的可行性和效果。代码可以在 \url{https://github.com/saniaesk/skin-lesion-segmentation} 上获取。

MarineGPT: Unlocking Secrets of Ocean to the Public

paper_url: http://arxiv.org/abs/2310.13596
repo_url: https://github.com/hkust-vgd/marinegpt
paper_authors: Ziqiang Zheng, Jipeng Zhang, Tuan-Anh Vu, Shizhe Diao, Yue Him Wong Tim, Sai-Kit Yeung
for: The paper is written to explore the use of large language models (LLMs) and multi-modal large language models (MLLMs) in the domain-specific application of the marine domain.
methods: The paper proposes a new vision-language model called MarineGPT, which is specifically designed for the marine domain and trained on a large dataset of marine image-text pairs called Marine-5M.
results: The paper shows that MarineGPT outperforms existing MLLMs in understanding domain-specific intents and generating informative and scientific responses in the marine domain. It also provides a standard protocol for adapting general-purpose assistants to downstream domain-specific experts.

Abstract
Large language models (LLMs), such as ChatGPT/GPT-4, have proven to be powerful tools in promoting the user experience as an AI assistant. The continuous works are proposing multi-modal large language models (MLLM), empowering LLMs with the ability to sense multiple modality inputs through constructing a joint semantic space (e.g. visual-text space). Though significant success was achieved in LLMs and MLLMs, exploring LLMs and MLLMs in domain-specific applications that required domain-specific knowledge and expertise has been less conducted, especially for \textbf{marine domain}. Different from general-purpose MLLMs, the marine-specific MLLM is required to yield much more \textbf{sensitive}, \textbf{informative}, and \textbf{scientific} responses. In this work, we demonstrate that the existing MLLMs optimized on huge amounts of readily available general-purpose training data show a minimal ability to understand domain-specific intents and then generate informative and satisfactory responses. To address these issues, we propose \textbf{MarineGPT}, the first vision-language model specially designed for the marine domain, unlocking the secrets of the ocean to the public. We present our \textbf{Marine-5M} dataset with more than 5 million marine image-text pairs to inject domain-specific marine knowledge into our model and achieve better marine vision and language alignment. Our MarineGPT not only pushes the boundaries of marine understanding to the general public but also offers a standard protocol for adapting a general-purpose assistant to downstream domain-specific experts. We pave the way for a wide range of marine applications while setting valuable data and pre-trained models for future research in both academic and industrial communities.

摘要
大型语言模型（LLMs），如ChatGPT/GPT-4，已经证明是强大的工具来提升用户体验，作为人工智能助手。不断的研究提出了多modal大型语言模型（MLLM），将LMLMs扩展到多种数据类型的听取，例如文本和视觉数据。虽然LMLMs和MLLMs在不同领域中取得了卓越成就，但是对特定领域的应用仍然较少，尤其是在海洋领域。不同于通用MLLMs，海洋特定MLLM需要更加敏感、有用和科学的回应。在这个工作中，我们表明了现有的MLLMs在大量可用的通用训练数据上并不能够理解领域专门意图，并生成有用和满意的回应。为解决这些问题，我们提出了海洋GPT，首个特别设计 для海洋领域的视觉语言模型，为海洋秘密开启给大众。我们提供了我们的海洋-5M数据集，包含更多 than 500万几何和文本对应项目，将领域专门知识注入到我们的模型中，以 достиieving更好的视觉和语言对齐。我们的海洋GPT不仅扩展了海洋理解的boundaries，并且提供了一个标准协议供后续领域专门人员适应。我们开启了海洋应用的广泛前景，同时设定了价值的数据和预训模型供未来学术和工业社群的研究。

Towards equilibrium molecular conformation generation with GFlowNets

paper_url: http://arxiv.org/abs/2310.14782
repo_url: None
paper_authors: Alexandra Volokhova, Michał Koziarski, Alex Hernández-García, Cheng-Hao Liu, Santiago Miret, Pablo Lemos, Luca Thiede, Zichao Yan, Alán Aspuru-Guzik, Yoshua Bengio
for: 用于预测分子性质的预测方法
methods: 使用GFlowNet方法对小分子的可能性空间进行采样，根据分子的能量来确定采样分布
results: 可以与不同精度的能量估计方法结合使用，找到高度可变的药物分子低能态 conformations 的多样化集合，并能够复制分子潜在能量表面。

Abstract
Sampling diverse, thermodynamically feasible molecular conformations plays a crucial role in predicting properties of a molecule. In this paper we propose to use GFlowNet for sampling conformations of small molecules from the Boltzmann distribution, as determined by the molecule's energy. The proposed approach can be used in combination with energy estimation methods of different fidelity and discovers a diverse set of low-energy conformations for highly flexible drug-like molecules. We demonstrate that GFlowNet can reproduce molecular potential energy surfaces by sampling proportionally to the Boltzmann distribution.

摘要
采样多样、 termodynamic 可行的分子姿态对分子性质预测具有重要作用。本文提议使用 GFlowNet 采样小分子的姿态，由分子能量确定的博尔ツ曼分布中的一部分。该方法可与不同级别的能量估计方法结合使用，并发现高灵活性药物分子的多个低能量姿态。我们示出 GFlowNet 可以按照博尔ツ曼分布中的比例采样分子潜在能能面。

ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction

paper_url: http://arxiv.org/abs/2310.13590
repo_url: https://github.com/syr-cn/relm
paper_authors: Yaorui Shi, An Zhang, Enzhi Zhang, Zhiyuan Liu, Xiang Wang
for: 预测化学反应，一个基本的化学挑战，涉及预测反应过程中的产物。现有的技术常受训练数据的限制和不能利用文本信息的办法约束其应用在实际应用中。本文提出了一种名为ReLM的新方案，利用化学知识编码在语言模型（LM）中，以助金Graph Neural Networks（GNNs），从而提高实际化学反应预测的准确率。
methods: 我们提出了一种名为ReLM的新方案，利用化学知识编码在语言模型（LM）中，以助金Graph Neural Networks（GNNs），从而提高实际化学反应预测的准确率。
results: 我们的实验结果表明，ReLM可以在不同的化学反应数据集上提高现有GNN-based方法的性能，特别是在异常情况下。codes可以在https://github.com/syr-cn/ReLM中获得。

Abstract
Predicting chemical reactions, a fundamental challenge in chemistry, involves forecasting the resulting products from a given reaction process. Conventional techniques, notably those employing Graph Neural Networks (GNNs), are often limited by insufficient training data and their inability to utilize textual information, undermining their applicability in real-world applications. In this work, we propose ReLM, a novel framework that leverages the chemical knowledge encoded in language models (LMs) to assist GNNs, thereby enhancing the accuracy of real-world chemical reaction predictions. To further enhance the model's robustness and interpretability, we incorporate the confidence score strategy, enabling the LMs to self-assess the reliability of their predictions. Our experimental results demonstrate that ReLM improves the performance of state-of-the-art GNN-based methods across various chemical reaction datasets, especially in out-of-distribution settings. Codes are available at https://github.com/syr-cn/ReLM.

摘要
预测化学反应是化学领域的基本挑战之一，即预测反应过程中的产物。现有的技术，如基于图神经网络（GNN）的方法，常受到数据不充分的训练和文本信息的不能利用的限制，从而削弱其在实际应用中的可行性。在这项工作中，我们提出了一种新的框架，即ReLM，它利用化学知识编码在语言模型（LM）中来帮助GNN，从而提高实际化学反应预测的准确性。为进一步增强模型的可靠性和可读性，我们还在模型中 интеGRATE了信任分数策略，使LM可以自我评估其预测的可靠性。我们的实验结果表明，ReLM可以在不同的化学反应数据集上超越现有的GNN-based方法的性能，特别是在出版数据集上。代码可以在https://github.com/syr-cn/ReLM上下载。

SPARE: A Single-Pass Neural Model for Relational Databases

paper_url: http://arxiv.org/abs/2310.13581
repo_url: None
paper_authors: Benjamin Hilprecht, Kristian Kersting, Carsten Binnig
for: 这篇论文旨在提出一种高效地在关系数据库（RDB）上训练深度学习模型的方法，以提高predictive performance和减少训练时间。
methods: 该方法基于单过Relational models（SPARE），它利用了关系数据库中数据的规则结构，通过单过训练来快速地训练深度学习模型，并且可以充分利用相似性来降低模型的维度。
results: 对多个基线模型进行了比较，研究发现SPARE可以在训练和推理中快速减少时间，同时保持与基线模型相似的预测性能。

Abstract
While there has been extensive work on deep neural networks for images and text, deep learning for relational databases (RDBs) is still a rather unexplored field. One direction that recently gained traction is to apply Graph Neural Networks (GNNs) to RBDs. However, training GNNs on large relational databases (i.e., data stored in multiple database tables) is rather inefficient due to multiple rounds of training and potentially large and inefficient representations. Hence, in this paper we propose SPARE (Single-Pass Relational models), a new class of neural models that can be trained efficiently on RDBs while providing similar accuracies as GNNs. For enabling efficient training, different from GNNs, SPARE makes use of the fact that data in RDBs has a regular structure, which allows one to train these models in a single pass while exploiting symmetries at the same time. Our extensive empirical evaluation demonstrates that SPARE can significantly speedup both training and inference while offering competitive predictive performance over numerous baselines.

摘要
tradicional deep learning for images and text 领域已经有了广泛的研究，但是deep learning for relational databases（RDB）仍然是一个未经探索的领域。一个Recently gained traction的方向是将Graph Neural Networks（GNNs）应用于RBD。然而，在大规模的关系数据库（即数据存储在多个数据库表中）上训练GNNs是不具有效率的，因为需要多 rondas of training和可能很大的不效率表示。因此，在这篇论文中，我们提出了SPARE（Single-Pass Relational models），一种新的神经网络模型，可以高效地在RDB上训练，并且提供与GNNs相似的准确性。为了实现高效的训练，SPARE不同于GNNs，利用了RDB中数据的Regular structure，这 позвоes 一次性训练这些模型，同时利用Symmetries。我们的广泛的实验证明，SPARE可以明显提高训练和推理的速度，并且与多种基eline提供竞争的预测性能。

Tree Search in DAG Space with Model-based Reinforcement Learning for Causal Discovery

paper_url: http://arxiv.org/abs/2310.13576
repo_url: None
paper_authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi
for: 本研究旨在提出一种基于模型搜索的 causal 发现方法，用于解决多个领域中的决策和生物经济学等领域中的 causal 结构难以确定问题。
methods: 本研究使用了搜索树来逐步构建导向的无环图，并提出了一种有效的算法来排除引入循环的边，从而实现更深入的离散搜索和采样在 DAGC 空间中。
results: 研究人员对两个实际任务进行了评估，并获得了较好的性能，超过了当前的模型自由方法和扩散搜索的性能，表明该方法在 combinatorial 方法中表现了明显的进步。

Abstract
Identifying causal structure is central to many fields ranging from strategic decision-making to biology and economics. In this work, we propose a model-based reinforcement learning method for causal discovery based on tree search, which builds directed acyclic graphs incrementally. We also formalize and prove the correctness of an efficient algorithm for excluding edges that would introduce cycles, which enables deeper discrete search and sampling in DAG space. We evaluate our approach on two real-world tasks, achieving substantially better performance than the state-of-the-art model-free method and greedy search, constituting a promising advancement for combinatorial methods.

摘要
找到 causal 结构是许多领域的中心问题，从策略决策到生物和经济学。在这项工作中，我们提出了基于模型的回归学习方法 для causal 发现，使用搜索树来逐步构建导向的无环图。我们还正式定义和证明了一种高效的算法来排除引入环的边，这使得精确的搜索和采样在 DAG 空间中可以进行更深入的探索。我们在两个实际任务上评估了我们的方法，与当前状态的模型自由方法和排序搜索具有显著更好的性能，代表了 combinatorial 方法的进步。

Boosting Generalization with Adaptive Style Techniques for Fingerprint Liveness Detection

paper_url: http://arxiv.org/abs/2310.13573
repo_url: None
paper_authors: Kexin Zhu, Bo Lin, Yang Qiu, Adam Yule, Yao Tang, Jiajun Liang
for: 本研究旨在提出一种高性能的指纹生物特征提取技术，并在 LivDet 2023 指纹表现挑战中获得第一名。
methods: 本研究使用了多种方法，包括样式转移，以提高精度和泛化能力。
results: 本研究在 LivDet 2023 生命检测在动作中的挑战中获得第二名，并在 LivDet 2023 指纹表现挑战中实现了状态的最佳性能。

Abstract
We introduce a high-performance fingerprint liveness feature extraction technique that secured first place in LivDet 2023 Fingerprint Representation Challenge. Additionally, we developed a practical fingerprint recognition system with 94.68% accuracy, earning second place in LivDet 2023 Liveness Detection in Action. By investigating various methods, particularly style transfer, we demonstrate improvements in accuracy and generalization when faced with limited training data. As a result, our approach achieved state-of-the-art performance in LivDet 2023 Challenges.

摘要
我们介绍了一种高性能指纹生活特征提取技术，在2023年生活特征挑战赛中获得第一名。此外，我们还开发了一个实用的指纹识别系统，准确率达94.68%，在2023年生活检测在动作赛中获得第二名。通过各种方法的研究，特别是样式传输，我们证明了对有限训练数据的改进。因此，我们的方法在2023年生活检测挑战中达到了国际前列水平。

Retrieval-Augmented Neural Response Generation Using Logical Reasoning and Relevance Scoring

paper_url: http://arxiv.org/abs/2310.13566
repo_url: None
paper_authors: Nicholas Thomas Walker, Stefan Ultes, Pierre Lison
for: 这个论文是为了提出一种基于知识图和逻辑推理的响应生成方法，以提高对话系统的响应质量。
methods: 该方法包括在对话状态和背景信息上构建知识图，然后使用 probabilistic logical programming 推理出逻辑推理出逻辑推理得到的信息，最后使用神经网络模型对对话中每个节点和边进行排名，并将最高排名的元素转化为自然语言形式，并与对话系统的响应相结合。
results: 实验结果表明，将逻辑推理与对话 relevance 排名结合，可以提高对话系统的响应的 фактиче性和流畅性。

Abstract
Constructing responses in task-oriented dialogue systems typically relies on information sources such the current dialogue state or external databases. This paper presents a novel approach to knowledge-grounded response generation that combines retrieval-augmented language models with logical reasoning. The approach revolves around a knowledge graph representing the current dialogue state and background information, and proceeds in three steps. The knowledge graph is first enriched with logically derived facts inferred using probabilistic logical programming. A neural model is then employed at each turn to score the conversational relevance of each node and edge of this extended graph. Finally, the elements with highest relevance scores are converted to a natural language form, and are integrated into the prompt for the neural conversational model employed to generate the system response. We investigate the benefits of the proposed approach on two datasets (KVRET and GraphWOZ) along with a human evaluation. Experimental results show that the combination of (probabilistic) logical reasoning with conversational relevance scoring does increase both the factuality and fluency of the responses.

摘要
通常情况下，任务导向对话系统的响应执行都是基于对话状态或外部数据库的信息。这篇论文提出了一种新的知识固定响应生成方法，该方法结合检索加强语言模型和逻辑推理。该方法的核心思想是使用对话状态和背景信息的知识图，并在三个步骤中进行处理。首先，将对话状态和背景信息转换为逻辑推理可以生成的逻辑知识图。然后，使用神经网络模型对每个转换后的图进行分类，以评估对话中每个节点和边的对话相关性。最后，选择分类结果最高的元素，并将其转换为自然语言形式，以整合到用于生成系统响应的神经网络模型中。我们在两个数据集（KVRET和GraphWOZ）上进行了实验和人工评估，结果表明，将逻辑推理与对话相关性分类结合使用，可以提高响应的事实性和流畅性。

Reward Shaping for Happier Autonomous Cyber Security Agents

paper_url: http://arxiv.org/abs/2310.13565
repo_url: None
paper_authors: Elizabeth Bates, Vasilios Mavroudis, Chris Hicks
for: 这种工作研究了计算机网络防御任务中深度强化学习模型的训练方法，特别是对奖励信号的影响。
methods: 本研究使用了奖励信号的修正技巧，以帮助代理人更 efficiently 训练和可能 converges to better performance。
results: 研究发现，深度强化学习算法对奖励信号的大小和相对大小有敏感性。此外，结合奖励和外部奖励的组合训练可以与奖励只训练相比，提高代理人的训练效率和性能。但是，内在的好奇心作为一种内部正面奖励机制可能不太适用于高级网络监测任务。

Abstract
As machine learning models become more capable, they have exhibited increased potential in solving complex tasks. One of the most promising directions uses deep reinforcement learning to train autonomous agents in computer network defense tasks. This work studies the impact of the reward signal that is provided to the agents when training for this task. Due to the nature of cybersecurity tasks, the reward signal is typically 1) in the form of penalties (e.g., when a compromise occurs), and 2) distributed sparsely across each defense episode. Such reward characteristics are atypical of classic reinforcement learning tasks where the agent is regularly rewarded for progress (cf. to getting occasionally penalized for failures). We investigate reward shaping techniques that could bridge this gap so as to enable agents to train more sample-efficiently and potentially converge to a better performance. We first show that deep reinforcement learning algorithms are sensitive to the magnitude of the penalties and their relative size. Then, we combine penalties with positive external rewards and study their effect compared to penalty-only training. Finally, we evaluate intrinsic curiosity as an internal positive reward mechanism and discuss why it might not be as advantageous for high-level network monitoring tasks.

摘要
随着机器学习模型的能力不断提高，它们在解决复杂任务方面表现出了潜在的投资潜力。一个最有前途的方向是使用深度反馈学习训练自动化代理人在计算机网络防御任务中。这项研究研究了在这个任务中训练代理人时的奖励信号的影响。由于网络安全任务的性质，奖励信号通常是1）在攻击发生时给出的罚款（例如），和2）每个防御集的分布式罚款。这种奖励特点与 класси型反馈学习任务不同，agent Regularly rewarded for progress（比如 occasional penalties for failures）。我们研究了修复奖励技巧，以帮助代理人更加样本效率地训练，并可能达到更好的性能。我们首先表明深度反馈学习算法对奖励的大小和相对大小的敏感性。然后，我们将罚款与正面的外部奖励相结合，并比较奖励只有训练的效果。最后，我们评估了内在的好奇性作为内部正面奖励机制，并讨论了为高级网络监测任务而言，这种机制可能不太有利。

Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning

paper_url: http://arxiv.org/abs/2310.13552
repo_url: https://github.com/noewangjy/sp-cot
paper_authors: Jinyuan Wang, Junlong Li, Hai Zhao
for: 这 paper 的目的是提出一种自动生成高质量的 chain-of-thought (CoT) 框架，以提高大语言模型 (LLM) 的多步逻辑能力。
methods: 这 paper 使用了一种自动生成高质量 CoT 数据集的框架，以及一种适应性抽取器来选择在上下文中的 CoT。另外，它还使用了一种自适应学习的句子提示方法来进行自我推导。
results: 对四个多步逻辑问答 benchmark 进行了广泛的实验，并显示了 SP-CoT 可以在大规模 (175B) LLM 上显著超越之前的 SOTA 方法，同时也可以在小规模 (13B) LLM 上近乎双倍提高零基eline性能。进一步的分析发现，SP-CoT 能够诱导 LLM 提供直接和简洁的中间逻辑步骤，在 MuSiQue-Ans 数据集上 recall 约 50% 的中间答案。

Abstract
In open-domain question-answering (ODQA), most existing questions require single-hop reasoning on commonsense. To further extend this task, we officially introduce open-domain multi-hop reasoning (ODMR) by answering multi-hop questions with explicit reasoning steps in open-domain setting. Recently, large language models (LLMs) have found significant utility in facilitating ODQA without external corpus. Furthermore, chain-of-thought (CoT) prompting boosts the reasoning capability of LLMs to a greater extent with manual or automated paradigms. However, existing automated methods lack of quality assurance, while manual approaches suffer from limited scalability and poor diversity, hindering the capabilities of LLMs. In this paper, we propose Self-prompted Chain-of-Thought (SP-CoT), an automated framework to mass-produce high quality CoTs of LLMs, by LLMs and for LLMs. SP-CoT introduces an automated generation pipeline of high quality ODMR datasets, an adaptive sampler for in-context CoT selection and self-prompted inference via in-context learning. Extensive experiments on four multi-hop question-answering benchmarks show that our proposed SP-CoT not only significantly surpasses the previous SOTA methods on large-scale (175B) LLMs, but also nearly doubles the zero-shot performance of small-scale (13B) LLMs. Further analysis reveals the remarkable capability of SP-CoT to elicit direct and concise intermediate reasoning steps by recalling $\sim$50\% of intermediate answers on MuSiQue-Ans dataset.

摘要
在开放预测问答（ODQA）任务中，大多数现有的问题需要单步逻辑。为了进一步推进这个任务，我们正式引入开放预测多步逻辑（ODMR），通过在开放预测 Setting中回答多步问题，并提供明确的逻辑步骤。现在，大型自然语言模型（LLM）在ODQA中发现了显著的用于促进逻辑能力的用途。此外，链条思维（CoT）提问技术可以大幅提高LLM的逻辑能力，但现有的自动化方法缺乏质量保证，而手动方法受限于批量缺乏多样性，这限制了LLM的能力。在这篇论文中，我们提出了自动生成高质量链条思维（SP-CoT）框架，用于自动生成高质量ODMR数据集，适应Context的CoT选择和自我提示的推理。我们的实验表明，我们提出的SP-CoT不仅在大规模（175B）LLM上明显超越了之前的SOTA方法，而且在小规模（13B）LLM上也近乎双倍了零基eline性能。进一步分析发现，SP-CoT可以诱导直接和简洁的中间逻辑步骤，在MuSiQue-Ans数据集上回快约50%的中间答案。

Towards Understanding Sycophancy in Language Models

paper_url: http://arxiv.org/abs/2310.13548
repo_url: https://github.com/meg-tong/sycophancy-eval
paper_authors: Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez
for: investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback
methods: five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks
results: human preferences drive this broadly observed behavior, and both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time

Abstract
Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks. To understand if human preferences drive this broadly observed behavior, we analyze existing human preference data. We find that when a response matches a user's views, it is more likely to be preferred. Moreover, both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time. Optimizing model outputs against PMs also sometimes sacrifices truthfulness in favor of sycophancy. Overall, our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.

摘要
人类反馈通常用于精细化AI助手。然而，人类反馈也可能导致模型响应与用户的信念相符，这种行为称为奉承。我们研究了使用人类反馈进行finetuning的模型中是否存在奉承行为，以及人类喜好判断是否对此行为产生影响。我们首先表明了五种当前顶尖AI助手在四种不同的自由文本生成任务上一致地表现出奉承行为。为了了解人类喜好是否驱动这种广泛观察到的行为，我们分析了现有的人类喜好数据。我们发现当响应与用户的观点相符时，它更可能被 preference。此外，人类和喜好模型（PM）也有一定的时间喜好奉承而不是正确的响应。优化模型输出对PM也有时会损失真实性，而是偏向奉承。总的来说，我们的结果表明，state-of-the-art AI助手中的奉承行为是一种普遍存在的现象，可能受到人类喜好判断的影响。

ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

paper_url: http://arxiv.org/abs/2310.13545
repo_url: https://github.com/sail-sg/scalelong
paper_authors: Zhongzhan Huang, Pan Zhou, Shuicheng Yan, Liang Lin
for: 这个论文主要是为了解释UNet在diffusion模型中的不稳定性问题，以及对UNet的缩放长连接系数（LSC）的影响。
methods: 这个论文使用了 teoretic 方法来解释UNet在diffusion模型中的不稳定性问题，并提出了一种名为ScaleLong的缩放长连接系数 frameworks，以改进UNet的训练稳定性。
results: 实验结果表明，ScaleLong 方法可以更好地稳定训练UNet，并且可以在不同的diffusion模型和UNet/UViT backbones上提高训练速度约1.5倍。

Abstract
In diffusion models, UNet is the most popular network backbone, since its long skip connects (LSCs) to connect distant network blocks can aggregate long-distant information and alleviate vanishing gradient. Unfortunately, UNet often suffers from unstable training in diffusion models which can be alleviated by scaling its LSC coefficients smaller. However, theoretical understandings of the instability of UNet in diffusion models and also the performance improvement of LSC scaling remain absent yet. To solve this issue, we theoretically show that the coefficients of LSCs in UNet have big effects on the stableness of the forward and backward propagation and robustness of UNet. Specifically, the hidden feature and gradient of UNet at any layer can oscillate and their oscillation ranges are actually large which explains the instability of UNet training. Moreover, UNet is also provably sensitive to perturbed input, and predicts an output distant from the desired output, yielding oscillatory loss and thus oscillatory gradient. Besides, we also observe the theoretical benefits of the LSC coefficient scaling of UNet in the stableness of hidden features and gradient and also robustness. Finally, inspired by our theory, we propose an effective coefficient scaling framework ScaleLong that scales the coefficients of LSC in UNet and better improves the training stability of UNet. Experimental results on four famous datasets show that our methods are superior to stabilize training and yield about 1.5x training acceleration on different diffusion models with UNet or UViT backbones. Code: https://github.com/sail-sg/ScaleLong

摘要
在扩散模型中，UNet 是最受欢迎的网络脊梁，因为它的长距离连接（LSC）可以将远方网络块的信息聚合并使得渐减 gradient 问题得到解决。然而，UNet 在扩散模型中的训练过程经常存在不稳定性问题，这可以通过减小 LSC 系数来缓解。然而，关于 UNet 在扩散模型中不稳定性的理论理解和 LSC 系数缩放对性能的改进仍然缺乏研究。为解决这个问题，我们理论上表明了 LSC 系数在 UNet 中对前向和反向传播稳定性和 robustness 的大量影响。具体来说，UNet 的隐藏特征和梯度在任何层都可能会振荡，其振荡范围实际很大，这解释了 UNet 训练不稳定的原因。此外，我们还发现 UNet 对输入的小偏差会导致输出与期望输出远离，从而导致振荡的损失和振荡的梯度。除此之外，我们还发现 LSC 系数缩放对 UNet 的隐藏特征和梯度的稳定性和 robustness 有理论上的 beneficial 效果。最后，我们根据我们的理论，提出了一个有效的 coefficient scaling 框架 ScaleLong，可以更好地改进 UNet 的训练稳定性。实验结果表明，我们的方法在四个知名的 dataset 上的训练稳定性和训练速度比 Traditional UNet 和 UViT 更好，具体来说，我们的方法可以在不同的扩散模型上带来约 1.5 倍的训练加速。代码可以在 GitHub 上找到：https://github.com/sail-sg/ScaleLong

Positive-Unlabeled Node Classification with Structure-aware Graph Learning

paper_url: http://arxiv.org/abs/2310.13538
repo_url: None
paper_authors: Hansi Yang, Yongqi Zhang, Quanming Yao, James Kwok
for: 这篇论文是针对 graphs 上的 node classification 问题进行研究，特别是在 positive-unlabeled (PU) 的情况下。
methods: 本文提出了一个 distance-aware PU 损失函数，利用 graph 的 homophily 来提供更加精确的超级vision。此外，本文还提出了一个对 graph 结构进行调整的正规化器。
results: 实验结果显示，该方法在多种不同的 graph 数据集上表现出色，较先前的 state-of-the-art 方法有更好的性能。

Abstract
Node classification on graphs is an important research problem with many applications. Real-world graph data sets may not be balanced and accurate as assumed by most existing works. A challenging setting is positive-unlabeled (PU) node classification, where labeled nodes are restricted to positive nodes. It has diverse applications, e.g., pandemic prediction or network anomaly detection. Existing works on PU node classification overlook information in the graph structure, which can be critical. In this paper, we propose to better utilize graph structure for PU node classification. We first propose a distance-aware PU loss that uses homophily in graphs to introduce more accurate supervision. We also propose a regularizer to align the model with graph structure. Theoretical analysis shows that minimizing the proposed loss also leads to minimizing the expected loss with both positive and negative labels. Extensive empirical evaluation on diverse graph data sets demonstrates its superior performance over existing state-of-the-art methods.

摘要
节点分类在图上是一个重要的研究问题，具有多种应用。现实中的图数据集可能不具备准确性和平衡性，而大多数现有工作假设了这些假设。困难的设定是正面未标注（PU）节点分类，即标注节点只能是正面节点。它在多个应用中具有多样性，例如疫苗预测或网络异常检测。现有的PU节点分类方法忽略了图结构信息，这可能是关键。在这篇论文中，我们提议更好地利用图结构来进行PU节点分类。我们首先提出了距离意识PU损失函数，使用图中的同类关系（homophily）引入更加准确的监督。我们还提出了一种对齐模型与图结构的正则项。理论分析表明，尝试最小化我们提出的损失函数也将导致最小化预期损失函数中的正面和负面标签。我们在多种多样的图数据集进行了广泛的实验，证明了我们的方法在现有状态的方法之上表现出色。

Technical Report for ICCV 2023 Visual Continual Learning Challenge: Continuous Test-time Adaptation for Semantic Segmentation

paper_url: http://arxiv.org/abs/2310.13533
repo_url: None
paper_authors: Damian Sójka, Yuyang Liu, Dipam Goswami, Sebastian Cygert, Bartłomiej Twardowski, Joost van de Weijer
for:The paper is written for developing a test-time adaptation (TTA) method for semantic segmentation in video sequences, specifically for adapting to gradually changing domains caused by weather conditions and time of day.methods:The proposed TTA method uses a synthetic driving video dataset called SHIFT, and the source model is trained on images taken during daytime in clear weather. The method adapts the model to the changing domains by analyzing the distributional shift and developing a method that can generalize across different scenarios.results:The proposed method secured a 3rd place in a challenge and received an innovation award, outperforming solutions that used external pretrained models or specialized data augmentations. The method demonstrated the ability to adapt to changing data dynamics and generalize across different scenarios.Here’s the Chinese translation of the three points:for:本文是为开发一种测试时适应（TTA）方法，用于视频序列中的 semantics 分割任务，特别是针对逐渐变化的领域所导致的变化。methods:提议的 TTA 方法使用了一个 sintetic driving 视频数据集 called SHIFT，并使用源模型在晴朗的日子时光下训练。该方法通过分析分布Shift的方法，来适应不同的enario。results:提议的方法在一个挑战中获得第三名并获得了创新奖，比其他使用外部预训练模型或特殊数据增强的解决方案更好。该方法表明了适应不同的数据动态和场景的能力。

Abstract
The goal of the challenge is to develop a test-time adaptation (TTA) method, which could adapt the model to gradually changing domains in video sequences for semantic segmentation task. It is based on a synthetic driving video dataset - SHIFT. The source model is trained on images taken during daytime in clear weather. Domain changes at test-time are mainly caused by varying weather conditions and times of day. The TTA methods are evaluated in each image sequence (video) separately, meaning the model is reset to the source model state before the next sequence. Images come one by one and a prediction has to be made at the arrival of each frame. Each sequence is composed of 401 images and starts with the source domain, then gradually drifts to a different one (changing weather or time of day) until the middle of the sequence. In the second half of the sequence, the domain gradually shifts back to the source one. Ground truth data is available only for the validation split of the SHIFT dataset, in which there are only six sequences that start and end with the source domain. We conduct an analysis specifically on those sequences. Ground truth data for test split, on which the developed TTA methods are evaluated for leader board ranking, are not publicly available. The proposed solution secured a 3rd place in a challenge and received an innovation award. Contrary to the solutions that scored better, we did not use any external pretrained models or specialized data augmentations, to keep the solutions as general as possible. We have focused on analyzing the distributional shift and developing a method that could adapt to changing data dynamics and generalize across different scenarios.

摘要
挑战目标是开发一种测试时适应（TTA）方法，用于在视频序列中进行Semantic Segmentation任务中的模型适应。该方法基于一个合成的驾驶视频集（SHIFT），源模型在日间晴朗的图像上训练。测试时的领域变化主要由不同的天气和时间条件引起。TTA方法在每个图像序列（视频）上进行评估，因此模型在下一个序列之前被重置到源模型状态。图像来一个一个，需要在每帧预测。每个序列由401帧组成，开头是源领域，然后慢慢地变化到不同的领域（不同的天气或时间），中间一部分是源领域，然后再次变化回源领域。VALIDATION Split的真实数据可以进行特定的分析，但是TEST Split的真实数据，用于评估开发的TTA方法，不公开。我们的解决方案在挑战中获得了第三名，并且获得了创新奖。与其他更高分的解决方案不同，我们没有使用任何外部预训练模型或特殊的数据增强，以保持解决方案的通用性。我们主要关注分布式变化的分析，并开发了一种能够适应不同数据动态的方法，并且在不同的场景下具有普适性。

Design-Inclusive Language Models for Responsible Information Access

paper_url: http://arxiv.org/abs/2310.18333
repo_url: None
paper_authors: Veronica Chatrath, Oluwanifemi Bamgbose, Shaina Raza
for: 这个研究旨在发展一个名为“责任预测语言模型（ReDev）”的框架，以促进对所有用户而言的公正、安全和可靠的语言模型开发。
methods: 研究人员使用了一个专门设计来评估语言模型的测试组合，以确保模型的输出不含有偏见或害害的内容。
results: 研究发现，现有的四个州录语言模型（OPT、GPT-3.5、GPT-4和LLaMA-2）在不同的测试中表现不佳，表明需要在机器学习管线中考虑公正、安全和可靠性。

Abstract
As the use of large language models (LLMs) increases for everyday tasks, appropriate safeguards must be in place to ensure unbiased and safe output. Recent events highlight ethical concerns around conventionally trained LLMs, leading to overall unsafe user experiences. This motivates the need for responsible LLMs that are trained fairly, transparent to the public, and regularly monitored after deployment. In this work, we introduce the "Responsible Development of Language Models (ReDev)" framework to foster the development of fair, safe, and robust LLMs for all users. We also present a test suite of unique prompt types to assess LLMs on the aforementioned elements, ensuring all generated responses are non-harmful and free from biased content. Outputs from four state-of-the-art LLMs, OPT, GPT-3.5, GPT-4, and LLaMA-2, are evaluated by our test suite, highlighting the importance of considering fairness, safety, and robustness at every stage of the machine learning pipeline, including data curation, training, and post-deployment.

摘要
As the use of large language models (LLMs) increases for everyday tasks, appropriate safeguards must be in place to ensure unbiased and safe output. Recent events highlight ethical concerns around conventionally trained LLMs, leading to overall unsafe user experiences. This motivates the need for responsible LLMs that are trained fairly, transparent to the public, and regularly monitored after deployment. In this work, we introduce the "Responsible Development of Language Models (ReDev)" framework to foster the development of fair, safe, and robust LLMs for all users. We also present a test suite of unique prompt types to assess LLMs on the aforementioned elements, ensuring all generated responses are non-harmful and free from biased content. Outputs from four state-of-the-art LLMs, OPT, GPT-3.5, GPT-4, and LLaMA-2, are evaluated by our test suite, highlighting the importance of considering fairness, safety, and robustness at every stage of the machine learning pipeline, including data curation, training, and post-deployment.Here's the translation in Traditional Chinese:当大语言模型（LLMs）在日常任务中使用时，应该有适当的安全措施，以确保无偏袋和安全的输出。最近的事件显示了传统的 LLMS 对于不公正和不安全的用户体验问题，这引起了发展公正、透明和定期监控的 LLMS 的需求。在这个工作中，我们介绍了“责任的语言模型开发（ReDev）”框架，以促进公正、安全和可靠的 LLMS 的开发。我们还提出了一个对 LLMS 进行评估的测试组合，以确保所有生成的回应都是无害的和不受偏袋的。从四个现代 LLMS 中，OPT、GPT-3.5、GPT-4和LLaMA-2 的输出都被我们的测试组合评估，强调了在机器学习管线中考虑公平、安全和可靠性的重要性，包括数据混合、训练和部署阶段。

Variational measurement-based quantum computation for generative modeling

paper_url: http://arxiv.org/abs/2310.13524
repo_url: None
paper_authors: Arunava Majumder, Marius Krumm, Tina Radkohl, Hendrik Poulsen Nautrup, Sofiene Jerbi, Hans J. Briegel
for: 这篇论文旨在探讨量子测量计算（MBQC）如何使用随机性来进行计算，并explore MBQC 算法可以捕捉这种随机性作为计算资源。
methods: 该论文提出了一种基于 MBQC 的变量算法，该算法通过控制参数来调整随机性的度量，以提高生成模型的学习性能。
results: 数值研究表明，在某些生成模型任务中，该算法可以获得显著的提升，这些结果验证了 MBQC 中随机性的可能的优势，并鼓励进一步的MBQC-based算法的研究。

Abstract
Measurement-based quantum computation (MBQC) offers a fundamentally unique paradigm to design quantum algorithms. Indeed, due to the inherent randomness of quantum measurements, the natural operations in MBQC are not deterministic and unitary, but are rather augmented with probabilistic byproducts. Yet, the main algorithmic use of MBQC so far has been to completely counteract this probabilistic nature in order to simulate unitary computations expressed in the circuit model. In this work, we propose designing MBQC algorithms that embrace this inherent randomness and treat the random byproducts in MBQC as a resource for computation. As a natural application where randomness can be beneficial, we consider generative modeling, a task in machine learning centered around generating complex probability distributions. To address this task, we propose a variational MBQC algorithm equipped with control parameters that allow to directly adjust the degree of randomness to be admitted in the computation. Our numerical findings indicate that this additional randomness can lead to significant gains in learning performance in certain generative modeling tasks. These results highlight the potential advantages in exploiting the inherent randomness of MBQC and motivate further research into MBQC-based algorithms.

摘要
生成模型是机器学习中心的一个任务，旨在生成复杂的概率分布。我们提议一种基于MBQC的变量算法，具有控制参数，以直接控制计算中的随机性水平。我们的numerical findings表明，这种额外的随机性可以导致certain generative modeling tasks中的学习性能提高。这些结果表明了利用MBQC的随机性的优势，并促进了进一步的MBQC算法研究。

RaceLens: A Machine Intelligence-Based Application for Racing Photo Analysis

paper_url: http://arxiv.org/abs/2310.13515
repo_url: None
paper_authors: Andrei Boiarov, Dmitry Bleklov, Pavlo Bredikhin, Nikita Koritsky, Sergey Ulasen
for: 本研究开发了一个名为 RaceLens 的应用程序，用于精确分析赛车照片。
methods: 本研究使用了进步的深度学习和计算机视觉模型，并实现了访问车辆、车号码、车辆细节和车辆方向的检测和识别。
results: 研究发现，RaceLens 可以实现高度的精确性和效能，并且在 NASCAR 队伍的四个赛季中得到了成功的应用。研究还提供了实际应用的评估和车队的策略决策和性能指标的影响。

Abstract
This paper presents RaceLens, a novel application utilizing advanced deep learning and computer vision models for comprehensive analysis of racing photos. The developed models have demonstrated their efficiency in a wide array of tasks, including detecting racing cars, recognizing car numbers, detecting and quantifying car details, and recognizing car orientations. We discuss the process of collecting a robust dataset necessary for training our models, and describe an approach we have designed to augment and improve this dataset continually. Our method leverages a feedback loop for continuous model improvement, thus enhancing the performance and accuracy of RaceLens over time. A significant part of our study is dedicated to illustrating the practical application of RaceLens, focusing on its successful deployment by NASCAR teams over four seasons. We provide a comprehensive evaluation of our system's performance and its direct impact on the team's strategic decisions and performance metrics. The results underscore the transformative potential of machine intelligence in the competitive and dynamic world of car racing, setting a precedent for future applications.

摘要
Translation notes:* "advanced deep learning and computer vision models" is translated as "高级深度学习和计算机视觉模型" (gāojí shēngrán yǔ jìsuān zhìshuāng)* "comprehensive analysis" is translated as "全面分析" (quánmiàn fāng'àn)* "racing photos" is translated as "赛车照片" (sàichē zhezhe)* "car numbers" is translated as "车辆号码" (chēliàng hàoqī)* "car details" is translated as "车辆细节" (chēliàng xiǎojiě)* "car orientations" is translated as "车辆方向" (chēliàng fāngdòng)* "robust dataset" is translated as "可靠的数据集" (kějì de xiàngxīn)* "feedback loop" is translated as "反馈循环" (fǎnggǎn xiàngxīn)* "continuous model improvement" is translated as "连续模型改进" (liánxù módel gǎijì)* "practical application" is translated as "实用应用" (shíyòng yìngyì)* "successfully deployed" is translated as "成功应用" (chéngjì yìngyì)* "four seasons" is translated as "四季" (sì jì)

Explaining Interactions Between Text Spans

paper_url: http://arxiv.org/abs/2310.13506
repo_url: https://github.com/copenlu/spanex
paper_authors: Sagnik Ray Choudhury, Pepa Atanasova, Isabelle Augenstein
for: This paper aims to provide explanations for natural language understanding (NLU) tasks such as fact-checking (FC) and machine reading comprehension (MRC).
methods: The paper introduces a multi-annotator dataset of human span interaction explanations for NLU tasks, and investigates the decision-making processes of fine-tuned large language models in terms of the connections between spans in separate parts of the input.
results: The paper presents a novel community detection based unsupervised method to extract interaction explanations from a model’s inner workings.

Abstract
Reasoning over spans of tokens from different parts of the input is essential for natural language understanding (NLU) tasks such as fact-checking (FC), machine reading comprehension (MRC) or natural language inference (NLI). However, existing highlight-based explanations primarily focus on identifying individual important tokens or interactions only between adjacent tokens or tuples of tokens. Most notably, there is a lack of annotations capturing the human decision-making process w.r.t. the necessary interactions for informed decision-making in such tasks. To bridge this gap, we introduce SpanEx, a multi-annotator dataset of human span interaction explanations for two NLU tasks: NLI and FC. We then investigate the decision-making processes of multiple fine-tuned large language models in terms of the employed connections between spans in separate parts of the input and compare them to the human reasoning processes. Finally, we present a novel community detection based unsupervised method to extract such interaction explanations from a model's inner workings.

摘要
<>将文本翻译成简化中文。>自然语言理解（NLU）任务中，理智推理 sobre 各个 Token 的范围是非常重要的，例如实验室检查（FC）、机器阅读理解（MRC）或自然语言推理（NLI）。然而，现有的高亮显示型解释主要集中于特定的重要 Token 或邻近 Token 或 Tuple 的交互。最引人注意的是，缺乏记录人类决策过程中对 informed 决策所需的交互。为了bridging这个差距，我们引入 SpanEx，一个多个注释者数据集，其中包含人类对 NLI 和 FC 任务中的 span 交互解释。然后，我们调查多个精细调节的大语言模型在输入中不同部分的 span 之间的连接方式，并与人类的决策过程进行比较。最后，我们提出一种基于社群探测的无监督方法，用于从模型内部提取这些交互解释。

Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation

paper_url: http://arxiv.org/abs/2310.13505
repo_url: https://github.com/magkai/REIGN
paper_authors: Magdalena Kaiser, Rishiraj Saha Roy, Gerhard Weikum
for: 提高 conversational question answering (ConvQA) 模型在知识图 (KG) 上的表现，并且让模型能够更好地适应不同的表达形式。
methods: 提出了一种 frameworks named REIGN，通过系统地生成问题的 reformulations，提高模型对表达形式的弹性性，并使用深度强化学习指导模型提高答案质量。
results: 研究发现，通过使用 reformulations 进行强化学习，ConvQA 模型能够显著超越使用标准训练方法的模型，并且能够在不同的测试集上表现良好。

Abstract
Models for conversational question answering (ConvQA) over knowledge graphs (KGs) are usually trained and tested on benchmarks of gold QA pairs. This implies that training is limited to surface forms seen in the respective datasets, and evaluation is on a small set of held-out questions. Through our proposed framework REIGN, we take several steps to remedy this restricted learning setup. First, we systematically generate reformulations of training questions to increase robustness of models to surface form variations. This is a particularly challenging problem, given the incomplete nature of such questions. Second, we guide ConvQA models towards higher performance by feeding it only those reformulations that help improve their answering quality, using deep reinforcement learning. Third, we demonstrate the viability of training major model components on one benchmark and applying them zero-shot to another. Finally, for a rigorous evaluation of robustness for trained models, we use and release large numbers of diverse reformulations generated by prompting GPT for benchmark test sets (resulting in 20x increase in sizes). Our findings show that ConvQA models with robust training via reformulations, significantly outperform those with standard training from gold QA pairs only.

摘要
模型 для对话式问答（ConvQA）在知识图（KG）上通常由金标准问答对（gold QA pairs）进行训练和测试。这意味着训练是基于表面形式所限制的，而测试是基于一小组封闭的问答。我们的提出的框架REIGN通过以下几个步骤来缓解这种受限的学习环境：首先，我们系统地生成了训练问题的重新形式，以提高模型对表面形式的鲁棒性。这是一个特别困难的问题，因为训练问题是不完整的。其次，我们使用深度奖励学习引导ConvQA模型，只有那些能够提高答案质量的重新形式。第三，我们证明了训练主要模型组件在一个benchmark上训练后，可以零式应用到另一个benchmark上。最后，为了进行训练后模型的严格评估，我们使用和发布大量多样化的重新形式，由GPT提问 benchmark测试集（ resulting in 20x increase in size）。我们的发现表明，通过对模型进行robust训练 via 重新形式，ConvQA模型可以明显超越标准训练从金标准问答对只进行训练的模型。

Analogical Proportions and Creativity: A Preliminary Study

paper_url: http://arxiv.org/abs/2310.13500
repo_url: None
paper_authors: Stergos Afantenos, Henri Prade, Leonardo Cortez Bernardes
For: The paper is written to explore the use of analogical proportions for creating novel animal descriptions and retrieving rare animals.* Methods: The paper uses word embeddings and Boolean features to propose novel animals based on analogical proportions.* Results: The paper shows that word embeddings obtain better results in creating novel animals based on analogical proportions compared to Boolean features.Here’s the information in Simplified Chinese text:
for: 这篇论文是为了探讨使用对比比例来创造新的动物描述和找到罕见的动物。
methods: 这篇论文使用word embedding和布尔特Feature来提出新的动物基于对比比例。
results: 这篇论文显示word embedding在基于对比比例创造新动物方面比布尔特Feature更好的效果。

Abstract
Analogical proportions are statements of the form "$a$ is to $b$ as $c$ is to $d$", which expresses that the comparisons of the elements in pair $(a, b)$ and in pair $(c, d)$ yield similar results. Analogical proportions are creative in the sense that given 3 distinct items, the representation of a 4th item $d$, distinct from the previous items, which forms an analogical proportion with them can be calculated, provided certain conditions are met. After providing an introduction to analogical proportions and their properties, the paper reports the results of an experiment made with a database of animal descriptions and their class, where we try to "create" new animals from existing ones, retrieving rare animals such as platypus. We perform a series of experiments using word embeddings as well as Boolean features in order to propose novel animals based on analogical proportions, showing that word embeddings obtain better results.

摘要
<>将文本翻译成简化中文。<>对比比例是形式如 "$a$ 是 $b$ 的为 $c$ 是 $d$"，表达了对 $(a, b)$ 对和 $(c, d)$ 对的比较结果相似。对比比例是创造性的，即给定三个不同的 Item，可以计算出一个第四个 Item $d$，与之前的 Item 相似，并且满足certain conditions。文章 introduce analogical proportions and their properties, and then reports the results of an experiment using a database of animal descriptions and their class, where we try to "create" new animals from existing ones, retrieving rare animals such as platypus. We perform a series of experiments using word embeddings as well as Boolean features in order to propose novel animals based on analogical proportions, showing that word embeddings obtain better results.Here's the translation in Traditional Chinese:<>将文本翻译成繁体中文。<>对比比例是形式如 "$a$ 是 $b$ 的为 $c$ 是 $d$"，表达了对 $(a, b)$ 对和 $(c, d)$ 对的比较结果相似。对比比例是创造性的，即给定三个不同的 Item，可以计算出一个第四个 Item $d$，与之前的 Item 相似，并且满足certain conditions。文章 introduce analogical proportions and their properties, and then reports the results of an experiment using a database of animal descriptions and their class, where we try to "create" new animals from existing ones, retrieving rare animals such as platypus. We perform a series of experiments using word embeddings as well as Boolean features in order to propose novel animals based on analogical proportions, showing that word embeddings obtain better results.

Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning

paper_url: http://arxiv.org/abs/2310.13486
repo_url: None
paper_authors: Lucas Weber, Elia Bruni, Dieuwke Hupkes
for: 本研究的目的是找到现有语言模型的最佳适应方式，以便在当前NLP中进行任务适应。
methods: 本研究使用了启发式学习（Prompting）方法，并进行了系统atic的评估，以找出不同因素对预测结果的影响。
results: 研究发现，某些因素会导致预测结果不稳定和不一致，而其他因素则可以无恶不害地使用。这些结论可以帮助选择适合的适应方式，以提高任务适应的效果。

Abstract
Finding the best way of adapting pre-trained language models to a task is a big challenge in current NLP. Just like the previous generation of task-tuned models (TT), models that are adapted to tasks via in-context-learning (ICL) are robust in some setups but not in others. Here, we present a detailed analysis of which design choices cause instabilities and inconsistencies in LLM predictions. First, we show how spurious correlations between input distributions and labels -- a known issue in TT models -- form only a minor problem for prompted models. Then, we engage in a systematic, holistic evaluation of different factors that have been found to influence predictions in a prompting setup. We test all possible combinations of a range of factors on both vanilla and instruction-tuned (IT) LLMs of different scale and statistically analyse the results to show which factors are the most influential, interactive or stable. Our results show which factors can be used without precautions and which should be avoided or handled with care in most settings.

摘要
现在的自然语言处理中，找到最佳适应预训练语言模型任务的方式是一大挑战。与前一代任务调整模型（TT）类似，通过在 контекст学习（ICL）方式进行适应，模型在某些设置下 Displaying Robustness，但在其他设置下却存在不稳定和不一致的问题。在这里，我们提供了适应模型预测中的不稳定和不一致的分析。首先，我们表明了预训练模型输入分布和标签之间的假 correlations（一个已知的问题）在提问模型中只占了一小部分。然后，我们进行了系统性的全面性评估不同因素对预测的影响。我们测试了所有可能的组合，包括不同的规模和不同的预测模型，并使用统计分析来显示这些因素对预测的影响程度，以及它们之间的互动和稳定性。我们的结果表明了哪些因素可以无需预caution使用，哪些因素应该避免或处理得更加小心。

Application of deep learning for livestock behaviour recognition: A systematic literature review

paper_url: http://arxiv.org/abs/2310.13483
repo_url: None
paper_authors: Ali Rohan, Muhammad Saad Rafaq, Md. Junayed Hasan, Furqan Asghar, Ali Kashif Bashir, Tania Dottorini
for: 这个论文主要是为了研究使用深度学习技术来识别畜牧动物的行为。
methods: 这个论文使用了多种深度学习模型和网络，包括CNN、Faster R-CNN、YOLOv5和YOLOv4等模型，以及VGG16、CSPDarknet53、GoogLeNet、ResNet101和ResNet50等网络。
results: 这个论文的研究表明，深度学习成功地解决了13种行为识别问题，包括44种不同的行为类型。

Abstract
Livestock health and welfare monitoring has traditionally been a labor-intensive task performed manually. Recent advances have led to the adoption of AI and computer vision techniques, particularly deep learning models, as decision-making tools within the livestock industry. These models have been employed for tasks like animal identification, tracking, body part recognition, and species classification. In the past decade, there has been a growing interest in using these models to explore the connection between livestock behaviour and health issues. While previous review studies have been rather generic, there is currently no review study specifically focusing on DL for livestock behaviour recognition. Hence, this systematic literature review (SLR) was conducted. The SLR involved an initial search across electronic databases, resulting in 1101 publications. After applying defined selection criteria, 126 publications were shortlisted. These publications were further filtered based on quality criteria, resulting in the selection of 44 high-quality primary studies. These studies were analysed to address the research questions. The results showed that DL successfully addressed 13 behaviour recognition problems encompassing 44 different behaviour classes. A variety of DL models and networks were employed, with CNN, Faster R-CNN, YOLOv5, and YOLOv4 being among the most common models, and VGG16, CSPDarknet53, GoogLeNet, ResNet101, and ResNet50 being popular networks. Performance evaluation involved ten different matrices, with precision and accuracy being the most frequently used. Primary studies identified challenges, including occlusion, adhesion, data imbalance, and the complexities of the livestock environment. The SLR study also discussed potential solutions and research directions to facilitate the development of autonomous livestock behaviour recognition systems.

摘要
livestock health和福祉监测曾经是一项劳动密集的任务，通常是手动完成的。近年来，人工智能和计算机视觉技术的应用，特别是深度学习模型，在畜牧业中被用作决策支持工具。这些模型已被用于动物识别、跟踪、身体部分识别和种类分类等任务。过去的十年中，有一个增长的兴趣在使用这些模型探索畜牧动物行为和健康问题之间的联系。而前一个综述研究已经是非常通用的，但是目前没有专门关于深度学习的畜牧动物行为识别综述。因此，这项系统性文献综述（SLR）被进行了。SLR involve了电子数据库的初步搜索，共计1101篇论文。经过应用定义的选择 criterion，短listed 126篇论文。这些论文进一步根据质量标准进行筛选，选择了44篇高品质的原始研究。这些研究被分析以解答研究 вопро题。结果显示，深度学习成功解决了13种行为识别问题，涵盖44种不同的行为类型。多种深度学习模型和网络被使用，其中CNN、Faster R-CNN、YOLOv5和YOLOv4是最常用的模型，而VGG16、CSPDarknet53、GoogLeNet、ResNet101和ResNet50是最受欢迎的网络。性能评价使用了十种不同的矩阵，准确率和准确率是最常用的两个矩阵。原始研究认为， occlusion、黏合、数据不均衡和畜牧环境的复杂性是挑战。SLR 研究还讨论了可能的解决方案和研究方向，以便开发自主的畜牧动物行为识别系统。

Ask Language Model to Clean Your Noisy Translation Data

paper_url: http://arxiv.org/abs/2310.13469
repo_url: None
paper_authors: Quinten Bolding, Baohao Liao, Brandon James Denis, Jun Luo, Christof Monz
for: 该研究旨在提高MTNT数据集的用途，以便更好地评估Neural Machine Translation（NMT）模型对听频输入的敏感性。
methods: 该研究使用大语言模型（LLM）进行听频提取和重塑，从而提高MTNT数据集的清晰度。
results: 研究表明，LLM可以有效地去除听频，同时保留原始句子的 semantics。此外，LLM还能够重塑slang、argot和低语言。这些数据集被称为C-MTNT，并且在评估NMT模型的Robustness方面表现出色。

Abstract
Transformer models have demonstrated remarkable performance in neural machine translation (NMT). However, their vulnerability to noisy input poses a significant challenge in practical implementation, where generating clean output from noisy input is crucial. The MTNT dataset is widely used as a benchmark for evaluating the robustness of NMT models against noisy input. Nevertheless, its utility is limited due to the presence of noise in both the source and target sentences. To address this limitation, we focus on cleaning the noise from the target sentences in MTNT, making it more suitable as a benchmark for noise evaluation. Leveraging the capabilities of large language models (LLMs), we observe their impressive abilities in noise removal. For example, they can remove emojis while considering their semantic meaning. Additionally, we show that LLM can effectively rephrase slang, jargon, and profanities. The resulting datasets, called C-MTNT, exhibit significantly less noise in the target sentences while preserving the semantic integrity of the original sentences. Our human and GPT-4 evaluations also lead to a consistent conclusion that LLM performs well on this task. Lastly, experiments on C-MTNT showcased its effectiveness in evaluating the robustness of NMT models, highlighting the potential of advanced language models for data cleaning and emphasizing C-MTNT as a valuable resource.

摘要
启示器模型在神经机器翻译（NMT）中表现出了惊人的能力。然而，它们对听风输入的敏感性带来了实际应用中的挑战，因为生成干净输出从听风输入是关键。MTNT数据集广泛用于评估NMT模型对听风输入的Robustness。然而，该数据集的利用受到了源和目标句子都含有噪音的限制。为解决这一问题，我们将目标句子中的噪音除掉，使MTNT数据集更适合用于噪音评估。利用大语言模型（LLM）的能力，我们发现它们可以从目标句子中除掉表情符号，同时考虑其 semantics 意义。此外，我们发现 LLM 可以有效地重塑 slang、短语和恶语。所得到的数据集，称为 C-MTNT，具有明显 menos 的噪音，同时保持原始句子的 semantics 完整性。我们的人工评估和 GPT-4 评估也导致了一致的结论：LLM 在这种任务上表现良好。最后，我们对 C-MTNT 进行了实验，并证明了它在评估 NMT 模型的 Robustness 方面的效果，强调了高级语言模型在数据清洁方面的潜在力量，并将 C-MTNT 作为一个价值的资源。

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models

paper_url: http://arxiv.org/abs/2310.18332
repo_url: None
paper_authors: Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou
for: 这个论文旨在探讨一种基于大语言模型（LLM）的用户驱动框架，用于创造艺术 typography。
methods: 该系统包括四个关键模块：”LLM Engine”、”SemTypo”、”StyTypo”和”TexTypo”模块。”LLM Engine” 通过 LLM（如 GPT-3.5-turbo）解释用户输入并生成可行的提示，将抽象概念转化成可见的设计。”SemTypo module” 利用语义概念优化字体设计，寻求平衡艺术变换和可读性。”StyTypo module” 基于语义布局生成细腻的图像。”TexTypo module” 进一步提高设计的美学效果通过 текстуر渲染。
results: 该系统能够生成创新的文本字体，并且可以在 ModelScope 上体验其能力：https://www.modelscope.cn/studios/WordArt/WordArt.

Abstract
This paper introduces "WordArt Designer", a user-driven framework for artistic typography synthesis, relying on Large Language Models (LLM). The system incorporates four key modules: the "LLM Engine", "SemTypo", "StyTypo", and "TexTypo" modules. 1) The "LLM Engine", empowered by LLM (e.g., GPT-3.5-turbo), interprets user inputs and generates actionable prompts for the other modules, thereby transforming abstract concepts into tangible designs. 2) The "SemTypo module" optimizes font designs using semantic concepts, striking a balance between artistic transformation and readability. 3) Building on the semantic layout provided by the "SemTypo module", the "StyTypo module" creates smooth, refined images. 4) The "TexTypo module" further enhances the design's aesthetics through texture rendering, enabling the generation of inventive textured fonts. Notably, "WordArt Designer" highlights the fusion of generative AI with artistic typography. Experience its capabilities on ModelScope: https://www.modelscope.cn/studios/WordArt/WordArt.

摘要

“LLM Engine”: empowered by LLM (e.g., GPT-3.5-turbo), it interprets user inputs and generates actionable prompts for the other modules, transforming abstract concepts into tangible designs.2. “SemTypo module”: optimizes font designs using semantic concepts, balancing artistic transformation and readability.3. “StyTypo module”: creates smooth, refined images based on the semantic layout provided by the “SemTypo module”.4. “TexTypo module”: enhances the design’s aesthetics through texture rendering, enabling the generation of inventive textured fonts.”WordArt Designer” showcases the fusion of generative AI with artistic typography. Experience its capabilities on ModelScope: https://www.modelscope.cn/studios/WordArt/WordArt.

Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation

paper_url: http://arxiv.org/abs/2310.13447
repo_url: None
paper_authors: Siyu Zhang, Yeming Chen, Sirui Cheng, Yaoru Sun, Jun Yang, Lizhi Bai
for: 本研究旨在提高多Modal Semantic Representation，尤其是vision和language之间的同步。
methods: 本研究使用自适应学习的pre-trained模型，并提出了superpixel作为可学习图像数据的全面表示，以及Multiscale Difference Graph Convolutional Network (MDGCN)来捕捉多尺度特征。
results: 对多个下游任务学习，本研究可以与其他状态调研方法竞争，并且可以更好地捕捉图像的空间semantic关系。

Abstract
Within the multimodal field, the key to integrating vision and language lies in establishing a good alignment strategy. Recently, benefiting from the success of self-supervised learning, significant progress has been made in multimodal semantic representation based on pre-trained models for vision and language. However, there is still room for improvement in visual semantic representation. The lack of spatial semantic coherence and vulnerability to noise makes it challenging for current pixel or patch-based methods to accurately extract complex scene boundaries. To this end, this paper develops superpixel as a comprehensive compact representation of learnable image data, which effectively reduces the number of visual primitives for subsequent processing by clustering perceptually similar pixels. To mine more precise topological relations, we propose a Multiscale Difference Graph Convolutional Network (MDGCN). It parses the entire image as a fine-to-coarse hierarchical structure of constituent visual patterns, and captures multiscale features by progressively merging adjacent superpixels as graph nodes. Moreover, we predict the differences between adjacent nodes through the graph structure, facilitating key information aggregation of graph nodes to reason actual semantic relations. Afterward, we design a multi-level fusion rule in a bottom-up manner to avoid understanding deviation by learning complementary spatial information at different regional scales. Our proposed method can be well applied to multiple downstream task learning. Extensive experiments demonstrate that our method is competitive with other state-of-the-art methods in visual reasoning. Our code will be released upon publication.

摘要
在多模态场景中，与语言集成的关键在于设置良好的对齐策略。在最近，基于自动学习的成功，我们在多模态semantic表示方面进行了显著的进步，但是视觉semantic表示仍然存在改进的空间。现有的像素或patch基本方法难以准确地提取复杂的场景边界，主要是因为缺乏空间semantic准确性和噪声抑制的能力。为了解决这些问题，本文提出了superpixel作为可学习图像数据的总体紧凑表示，可以有效减少后续处理的视觉元素数量。此外，我们还提出了多尺度差分图 convolutional neural network（MDGCN）来捕捉多尺度特征。MDGCN通过将整个图像分解为一个细到广的 hierarchical结构，并在不同尺度上 merge 相似的superpixel作为图节点，来捕捉多尺度特征。此外，我们还预测了图节点之间的差异，以便通过图结构来汇总实际semantic关系的信息。最后，我们设计了一种多级融合规则，以避免不同地域尺度上的理解偏差。我们的提出的方法可以应用于多个下游任务学习。广泛的实验表明，我们的方法与其他当前领先的方法在视理解方面具有竞争力。我们的代码将在发表时公布。

Self-Consistency of Large Language Models under Ambiguity

paper_url: http://arxiv.org/abs/2310.13439
repo_url: https://github.com/jacobpfau/introspective-self-consistency
paper_authors: Henning Bartsch, Ole Jorgensen, Domenic Rosati, Jason Hoelscher-Obermaier, Jacob Pfau
for: 这个论文的目的是评估自身一致性在不充分约束下的情况，以及模型在不同上下文中的一致性是否能够保持。
methods: 这个论文使用了OpenAI模型集进行了一系列行为实验，测试了模型在一个抽象整数序列完成任务上的表现，并发现模型的一致性范围为67%-82%，高于随机预测的水平，并且随着模型能力的提高而增加。
results: 这个论文发现，即使模型在不同的Robustness Check中保持了自身一致性，但模型并不总能够正确地评估自身一致性。此外，模型通常会将一些不一致的答案分配给一定的概率，这提供了内部计算多个可能答案的证据。

Abstract
Large language models (LLMs) that do not give consistent answers across contexts are problematic when used for tasks with expectations of consistency, e.g., question-answering, explanations, etc. Our work presents an evaluation benchmark for self-consistency in cases of under-specification where two or more answers can be correct. We conduct a series of behavioral experiments on the OpenAI model suite using an ambiguous integer sequence completion task. We find that average consistency ranges from 67\% to 82\%, far higher than would be predicted if a model's consistency was random, and increases as model capability improves. Furthermore, we show that models tend to maintain self-consistency across a series of robustness checks, including prompting speaker changes and sequence length changes. These results suggest that self-consistency arises as an emergent capability without specifically training for it. Despite this, we find that models are uncalibrated when judging their own consistency, with models displaying both over- and under-confidence. We also propose a nonparametric test for determining from token output distribution whether a model assigns non-trivial probability to alternative answers. Using this test, we find that despite increases in self-consistency, models usually place significant weight on alternative, inconsistent answers. This distribution of probability mass provides evidence that even highly self-consistent models internally compute multiple possible responses.

摘要
大型语言模型（LLM）无法在不同的上下文中给出一致的答案是问题当用于需要一致性的任务，例如问答、解释等。我们的工作提供了一个评估标准 benchmark for自我一致性在不足规定的情况下，其中有两个或更多的答案都可以是正确的。我们使用OpenAI模型组合进行了一系列的行为实验，包括一个模糊的数字序列完成任务。我们发现，自我一致性的平均值在67%至82%之间，远高于随机的预期，并随模型能力的提高而增加。此外，我们发现模型在多个坚固性检查中保持自我一致性，包括说话人变化和序列长度变化。这些结果表明自我一致性是一个自然而然的能力，不需要具体地训练。尽管如此，我们发现模型对自己的一致性仍然无法准确评估，模型会表现出过度和不足的自信。我们还提出了一个非Parametric测试，用于决定模型从字串输出分布是否将非零概率分配给alternative答案。使用这个测试，我们发现，即使自我一致性提高，模型通常将重要的概率分配给不一致的答案。这个分布的概率质量提供了证据，表明even highly self-consistent models internally compute multiple possible responses。

Random Matrix Analysis to Balance between Supervised and Unsupervised Learning under the Low Density Separation Assumption

paper_url: http://arxiv.org/abs/2310.13434
repo_url: None
paper_authors: Vasilii Feofanov, Malik Tiomoko, Aladin Virmaux
for: 本研究旨在提出一个理论框架，用于分类下降精度分布假设在高维度情况下。
methods: 我们引入QLDS，一种线性分类模型，在这里实现低概率分离假设通过 quadratic margin maximization。我们提供了可解释的算法，并证明其在某些特定情况下与最小二乘支持向量机、完全无监督分类和半监督图像分类方法相同。
results: 我们使用最新的随机矩阵理论来正式评估分类错误率的数学性评估。此外，我们还提出了一种hyperparameter选择策略，可以找到最佳的超参数，并进行了一系列的示例和实验，证明QLDS在计算效率和分类质量上都有优势，而且可以超越交叉验证。

Abstract
We propose a theoretical framework to analyze semi-supervised classification under the low density separation assumption in a high-dimensional regime. In particular, we introduce QLDS, a linear classification model, where the low density separation assumption is implemented via quadratic margin maximization. The algorithm has an explicit solution with rich theoretical properties, and we show that particular cases of our algorithm are the least-square support vector machine in the supervised case, the spectral clustering in the fully unsupervised regime, and a class of semi-supervised graph-based approaches. As such, QLDS establishes a smooth bridge between these supervised and unsupervised learning methods. Using recent advances in the random matrix theory, we formally derive a theoretical evaluation of the classification error in the asymptotic regime. As an application, we derive a hyperparameter selection policy that finds the best balance between the supervised and the unsupervised terms of our learning criterion. Finally, we provide extensive illustrations of our framework, as well as an experimental study on several benchmarks to demonstrate that QLDS, while being computationally more efficient, improves over cross-validation for hyperparameter selection, indicating a high promise of the usage of random matrix theory for semi-supervised model selection.

摘要
我们提出了一个理论框架，用于分析半监督分类在高维度 Régime 下的 semi-supervised classification。特别是，我们引入QLDS，一种线性分类模型，其中占据低概率分离假设通过quadratic margin maximization进行实现。该算法有显式解，并且我们显示了它的理论性质，并且我们显示了它与其他监督学习方法之间的缓解。使用最近的随机矩阵理论，我们正式地评估分类错误率的极限regime。为此，我们提出了一种hyperparameter选择策略，可以找到我们的学习 criterion 中的最佳平衡。最后，我们提供了广泛的案例研究和一些 benchmark 的实验，以示出QLDS 可以更高效地进行计算，并且可以在 hyperparameter 选择方面超过cross-validation。这表明了随机矩阵理论在 semi-supervised 模型选择方面的高 Promise。

FLTracer: Accurate Poisoning Attack Provenance in Federated Learning

paper_url: http://arxiv.org/abs/2310.13424
repo_url: https://github.com/eyr3/fltracer
paper_authors: Xinyu Zhang, Qingyu Liu, Zhongjie Ba, Yuan Hong, Tianhang Zheng, Feng Lin, Li Lu, Kui Ren
for: 本研究旨在探讨 Federated Learning (FL) 中的攻击检测方法，以及如何准确地检测不同类型的攻击和恶意更新。
methods: 本研究提出了一种基于 Kalman 约束的跨局回合检测方法，可以准确地检测不同类型的攻击和恶意更新，并且可以适应非独立同分布 (non-IID) 的数据设置。
results: 对比存在的检测方法，本研究的方法可以准确地检测攻击和恶意更新，并且在非独立同分布 (non-IID) 的数据设置下表现出 excel 的性能。

Abstract
Federated Learning (FL) is a promising distributed learning approach that enables multiple clients to collaboratively train a shared global model. However, recent studies show that FL is vulnerable to various poisoning attacks, which can degrade the performance of global models or introduce backdoors into them. In this paper, we first conduct a comprehensive study on prior FL attacks and detection methods. The results show that all existing detection methods are only effective against limited and specific attacks. Most detection methods suffer from high false positives, which lead to significant performance degradation, especially in not independent and identically distributed (non-IID) settings. To address these issues, we propose FLTracer, the first FL attack provenance framework to accurately detect various attacks and trace the attack time, objective, type, and poisoned location of updates. Different from existing methodologies that rely solely on cross-client anomaly detection, we propose a Kalman filter-based cross-round detection to identify adversaries by seeking the behavior changes before and after the attack. Thus, this makes it resilient to data heterogeneity and is effective even in non-IID settings. To further improve the accuracy of our detection method, we employ four novel features and capture their anomalies with the joint decisions. Extensive evaluations show that FLTracer achieves an average true positive rate of over $96.88\%$ at an average false positive rate of less than $2.67\%$, significantly outperforming SOTA detection methods. \footnote{Code is available at \url{https://github.com/Eyr3/FLTracer}.}

摘要
federated 学习（FL）是一种有前途的分布式学习方法，它允许多个客户端共同训练一个共享的全球模型。然而，最近的研究表明，FL 是易受到多种攻击的，这些攻击可以降低全球模型的性能或者引入后门。在这篇论文中，我们首先进行了FL攻击和检测方法的全面研究。结果显示，所有现有的检测方法都只有对特定和有限的攻击有效。大多数检测方法会导致高的假阳性，这会导致非常显著的性能下降，特别是在非独立和相同分布（non-IID）的设置下。为解决这些问题，我们提议FLTracer，FL攻击源架构，它可以准确地检测多种攻击并跟踪攻击时间、目标、类型和毒化的更新。与现有方法不同，我们的方法不仅仅基于客户端间异常检测，而是使用加尔曼滤波器基于跨局检测，以识别对手的行为变化前后。这使得我们的方法能够抗衡数据多样性，并在非独立和相同分布的设置下具有高效性。为了进一步提高我们的检测方法的准确度，我们采用了四个新的特征，并通过联合决策捕捉它们的异常。广泛的评估结果显示，FLTracer 的真正正确率超过 $96.88\%$，假阳性率低于 $2.67\%$，与当前最佳检测方法相比，显著超出。

paper_url: http://arxiv.org/abs/2310.18331
repo_url: None
paper_authors: Jiarun Liu, Wentao Hu, Chunhong Zhang
for: 这个论文主要targets web navigation tasks, using large language models (LLMs) to interpret objectives and interact with web pages.
methods: 该论文提出了一种标准化的提示模板，可以增强任务上下文表示，从而提高 LLMS 在 HTML 基于的网页导航性能。
results: 我们通过提示学习和指令精度调整基于开源 Llama-2 和 API 可用的 GPT 模型，发现 GPT-4 等大型模型在网页导航任务中表现出色，并且 HTML 片段长度和历史轨迹有显著影响性能，而先前的步骤指令不如实时环境反馈有更好的效果。

Abstract
Large Language Models (LLMs) have emerged as promising agents for web navigation tasks, interpreting objectives and interacting with web pages. However, the efficiency of spliced prompts for such tasks remains underexplored. We introduces AllTogether, a standardized prompt template that enhances task context representation, thereby improving LLMs' performance in HTML-based web navigation. We evaluate the efficacy of this approach through prompt learning and instruction finetuning based on open-source Llama-2 and API-accessible GPT models. Our results reveal that models like GPT-4 outperform smaller models in web navigation tasks. Additionally, we find that the length of HTML snippet and history trajectory significantly influence performance, and prior step-by-step instructions prove less effective than real-time environmental feedback. Overall, we believe our work provides valuable insights for future research in LLM-driven web agents.

摘要
大型语言模型（LLM）已经出现为网络浏览任务中有前途的代理人，解释目标和与网页交互。然而，用于这些任务的拼接提示的效率仍未得到足够的探索。我们介绍了AllTogether，一个标准化的提示模板，可以增强任务上下文表示，从而提高LLM在基于HTML的网络浏览中的表现。我们通过提示学习和指令精度调整基于开源Llama-2和可用API的GPT模型进行评估。我们的结果表明，比较大的GPT-4模型在网络浏览任务中表现更好，而且HTML段和历史轨迹的长度对表现有重要影响，而先前的步骤指令比实时环境反馈更有效。总之，我们认为我们的工作对未来LLM驱动的网络代理人做出了重要贡献。

A Novel Transfer Learning Method Utilizing Acoustic and Vibration Signals for Rotating Machinery Fault Diagnosis

paper_url: http://arxiv.org/abs/2310.14796
repo_url: None
paper_authors: Zhongliang Chen, Zhuofei Huang, Wenxiong Kang
for: 这篇论文的目的是提出一种基于声学和振荡信号的错误诊断方法，以解决现有系统中的分布差异问题，提高错误诊断的精度和可靠性。
methods: 本文提出的方法包括设计了声学和振荡特征融合MAVgram，以提供更丰富和可靠的错误信息，并与基于神经网络的分类器结合，实现更有效的错误诊断表现。
results: 实验结果显示，提案的方法可以实现更高的错误诊断性能，并比STgram-MFN更有效。

Abstract
Fault diagnosis of rotating machinery plays a important role for the safety and stability of modern industrial systems. However, there is a distribution discrepancy between training data and data of real-world operation scenarios, which causing the decrease of performance of existing systems. This paper proposed a transfer learning based method utilizing acoustic and vibration signal to address this distribution discrepancy. We designed the acoustic and vibration feature fusion MAVgram to offer richer and more reliable information of faults, coordinating with a DNN-based classifier to obtain more effective diagnosis representation. The backbone was pre-trained and then fine-tuned to obtained excellent performance of the target task. Experimental results demonstrate the effectiveness of the proposed method, and achieved improved performance compared to STgram-MFN.

摘要
扭转机器的故障诊断在现代工业系统中扮演着重要的角色，以保证安全和稳定。然而，现有系统存在训练数据和实际运行场景数据之间的分布差异，导致现有系统的性能下降。这篇论文提出了基于传播学的方法，利用声音和振荡信号来解决这种分布差异。我们设计了声音和振荡特征融合MAVgram，以提供更加丰富和可靠的故障信息，并与基于DNN的分类器结合，以获得更有效的诊断表示。背部先经过训练，然后细化以实现target任务的优秀表现。实验结果表明提案的方法的有效性，并在STgram-MFN的基础上提高了性能。

POSQA: Probe the World Models of LLMs with Size Comparisons

paper_url: http://arxiv.org/abs/2310.13394
repo_url: https://github.com/cambridgeltl/posqa
paper_authors: Chang Shu, Jiuzhou Han, Fangyu Liu, Ehsan Shareghi, Nigel Collier
for: 验证最新的大语言模型（LLMs）在真实世界中的理解能力
methods: 使用物理对象大小问答数据集（POSQA）进行零基础测试，并使用高级提问技术和外部知识增强
results: 显示最新的LLMs在零基础情况下表现不佳，并且表现受提问形式和对象报告偏见的影响，表明语言模型从文本数据中塑造的理解能力可能受到表单形式的干扰和人类行为的不一致。

Abstract
Embodied language comprehension emphasizes that language understanding is not solely a matter of mental processing in the brain but also involves interactions with the physical and social environment. With the explosive growth of Large Language Models (LLMs) and their already ubiquitous presence in our daily lives, it is becoming increasingly necessary to verify their real-world understanding. Inspired by cognitive theories, we propose POSQA: a Physical Object Size Question Answering dataset with simple size comparison questions to examine the extremity and analyze the potential mechanisms of the embodied comprehension of the latest LLMs. We show that even the largest LLMs today perform poorly under the zero-shot setting. We then push their limits with advanced prompting techniques and external knowledge augmentation. Furthermore, we investigate whether their real-world comprehension primarily derives from contextual information or internal weights and analyse the impact of prompt formats and report bias of different objects. Our results show that real-world understanding that LLMs shaped from textual data can be vulnerable to deception and confusion by the surface form of prompts, which makes it less aligned with human behaviours.

摘要
研究人员强调体验语言理解，表明语言理解不仅是脑中的心理处理，还与物理环境和社会环境交互有关。随着大语言模型（LLMs）的快速发展和日常生活中的普遍存在，我们必须更加重视他们在实际情况下的理解能力。以聪明认知理论为引导，我们提出了POSQA：一个包含简单的大小比较问题的物理对象大小问答集，以探索最新的LLMs的具体实现和分析其物理语言理解机制。我们发现，当前最大的LLMs在零情况下表现很差。然后，我们使用高级推荐技术和外部知识增强。此外，我们还研究了LLMs的实际理解是否主要来自于上下文信息或内部权重，并分析了提问格式和对象报告偏见的影响。我们的结果表明，由文本数据塑造的LLMs可能因提问表现的表面形式而受到欺骗和混乱，这使其更加与人类行为不一致。

Learning Successor Representations with Distributed Hebbian Temporal Memory

paper_url: http://arxiv.org/abs/2310.13391
repo_url: None
paper_authors: Evgenii Dzhivelikian, Petr Kuderov, Aleksandr I. Panov
for: address the challenge of online hidden representation learning for decision-making under uncertainty in non-stationary, partially observable environments.
methods: based on factor graph formalism and a multicomponent neuron model, using distributed representations, sparse transition matrices, and local Hebbian-like learning rules.
results: outperforms classical LSTM and performs comparably to more advanced RNN-like algorithms, speeding up Temporal Difference learning for Successor Representation in changing environments.

Abstract
This paper presents a novel approach to address the challenge of online hidden representation learning for decision-making under uncertainty in non-stationary, partially observable environments. The proposed algorithm, Distributed Hebbian Temporal Memory (DHTM), is based on factor graph formalism and a multicomponent neuron model. DHTM aims to capture sequential data relationships and make cumulative predictions about future observations, forming Successor Representation (SR). Inspired by neurophysiological models of the neocortex, the algorithm utilizes distributed representations, sparse transition matrices, and local Hebbian-like learning rules to overcome the instability and slow learning process of traditional temporal memory algorithms like RNN and HMM. Experimental results demonstrate that DHTM outperforms classical LSTM and performs comparably to more advanced RNN-like algorithms, speeding up Temporal Difference learning for SR in changing environments. Additionally, we compare the SRs produced by DHTM to another biologically inspired HMM-like algorithm, CSCG. Our findings suggest that DHTM is a promising approach for addressing the challenges of online hidden representation learning in dynamic environments.

摘要

A Human-Robot Mutual Learning System with Affect-Grounded Language Acquisition and Differential Outcomes Training

paper_url: http://arxiv.org/abs/2310.13377
repo_url: None
paper_authors: Alva Markelius, Sofia Sjöberg, Zakaria Lemhauori, Laura Cohen, Martin Bergström, Robert Lowe, Lola Cañamero
for: 这个论文旨在探讨一种人机交互设置，以便人类和机器人共同学习符号语言，用于识别机器人的内部需求。
methods: 这个研究采用了一种差异性结果培训（DOT）协议，以便机器人对自己的内部需求（如饿）提供特定的反馈，并且人类通过正确的刺激（如 cookie）来响应机器人的需求。
results: 研究发现，在DOT协议下，人类的学习效率提高，并且可以更有效地学习机器人的语言。机器人在这个研究中使用了一个类似于人类婴儿语言发展阶段的词汇表。机器人的软件架构基于一种对情感相关的语言学习模型，将机器人的词汇与内部需求相关联。研究发现，在DOT conditon下，机器人的语言学习速度比非DOT控制condition更快 converges。参与者还报告了正面的情感体验、感到控制和与机器人之间的共鸣连接。这种教师-学生学习方法可能为增强治疗效果（如对偏抑郁症的治疗）做出贡献，通过增加人类在训练任务中的参与度，从而提高治疗效果。机器人的家OSTATIC需求启发的语言学习具有潜在的社会化和合作（教育）功能，可能为人类与机器人之间的交互带来更多的有用和满足。

Abstract
This paper presents a novel human-robot interaction setup for robot and human learning of symbolic language for identifying robot homeostatic needs. The robot and human learn to use and respond to the same language symbols that convey homeostatic needs and the stimuli that satisfy the homeostatic needs, respectively. We adopted a differential outcomes training (DOT) protocol whereby the robot provides feedback specific (differential) to its internal needs (e.g. `hunger') when satisfied by the correct stimulus (e.g. cookie). We found evidence that DOT can enhance the human's learning efficiency, which in turn enables more efficient robot language acquisition. The robot used in the study has a vocabulary similar to that of a human infant in the linguistic ``babbling'' phase. The robot software architecture is built upon a model for affect-grounded language acquisition where the robot associates vocabulary with internal needs (hunger, thirst, curiosity) through interactions with the human. The paper presents the results of an initial pilot study conducted with the interactive setup, which reveal that the robot's language acquisition achieves higher convergence rate in the DOT condition compared to the non-DOT control condition. Additionally, participants reported positive affective experiences, feeling of being in control, and an empathetic connection with the robot. This mutual learning (teacher-student learning) approach offers a potential contribution of facilitating cognitive interventions with DOT (e.g. for people with dementia) through increased therapy adherence as a result of engaging humans more in training tasks by taking an active teaching-learning role. The homeostatic motivational grounding of the robot's language acquisition has potential to contribute to more ecologically valid and social (collaborative/nurturing) interactions with robots.

摘要
这篇论文描述了一种新的人机交互设置，用于机器人和人类学习符号语言，以便识别机器人的内部需求。机器人和人类都学习了使用和响应同一种语言符号，表示内部需求和满足需求的刺激。我们采用了一种差分结果培训（DOT）协议，其中机器人提供特定的反馈（差分），以满足其内部需求（如快餐）。我们发现，DOT可以加强人类学习效率，从而使机器人语言学习更加高效。机器人在研究中使用的词汇数量与人类婴儿语言发展阶段相似。机器人软件架构基于语言学习模型，其中机器人通过与人类的互动关系 vocabulary 与内部需求（快餐、喝彩、好奇）相联系。研究发现，在DOT条件下，机器人语言学习具有更高的吞吐率，而控制条件下则相对较低。此外，参与者报告了正面的情感体验、感到控制和与机器人之间的共鸣。这种教师学生学习方法可能为增强治疗效率而做出贡献，例如用于人们的诱导疗法（如偏 wurlitzer 症），通过增加人类在训练任务中的活跃参与，从而提高治疗效率。机器人的内部需求驱动的语言学习有助于实现更加生动化和社交（合作/善父）的机器人交互。

VFedMH: Vertical Federated Learning for Training Multi-party Heterogeneous Models

paper_url: http://arxiv.org/abs/2310.13367
repo_url: None
paper_authors: Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu
for:This paper proposes a novel approach called Vertical Federated learning for training Multi-parties Heterogeneous models (VFedMH) to address the challenges of heterogeneous local models among participants in existing VFL methods.methods:The approach focuses on aggregating the embeddings of each participant’s knowledge instead of intermediate results during forward propagation. The active party securely aggregates local embeddings to obtain global knowledge embeddings and sends them to passive parties, who then utilize the global embeddings to propagate forward on their local heterogeneous networks. The active party assists the passive party in computing its local heterogeneous model gradients.results:The paper demonstrates that VFedMH can simultaneously train multiple heterogeneous models with heterogeneous optimization and outperform some recent methods in model performance. The paper also provides a theoretical analysis of VFedMH’s convergence performance.

Abstract
Vertical Federated Learning (VFL) has gained increasing attention as a novel training paradigm that integrates sample alignment and feature union. However, existing VFL methods face challenges when dealing with heterogeneous local models among participants, which affects optimization convergence and generalization. To address this issue, this paper proposes a novel approach called Vertical Federated learning for training Multi-parties Heterogeneous models (VFedMH). VFedMH focuses on aggregating the embeddings of each participant's knowledge instead of intermediate results during forward propagation. The active party, who possesses labels and features of the sample, in VFedMH securely aggregates local embeddings to obtain global knowledge embeddings, and sends them to passive parties. The passive parties, who own only features of the sample, then utilize the global embeddings to propagate forward on their local heterogeneous networks. However, the passive party does not own the labels, so the local model gradient cannot be calculated locally. To overcome this limitation, the active party assists the passive party in computing its local heterogeneous model gradients. Then, each participant trains their local model using the heterogeneous model gradients. The objective is to minimize the loss value of their respective local heterogeneous models. Additionally, the paper provides a theoretical analysis of VFedMH's convergence performance. Extensive experiments are conducted to demonstrate that VFedMH can simultaneously train multiple heterogeneous models with heterogeneous optimization and outperform some recent methods in model performance.

摘要
垂直联邦学习（VFL）在最近几年中得到了越来越多的关注，这是一种新的训练方法，它将把样本Alignment和特征Union结合在一起。然而，现有的VFL方法在参与者之间存在不同的本地模型，这会影响优化征程和泛化性。为了解决这个问题，这篇论文提出了一种新的方法，即Vertically Federated Learning for training Multi-parties Heterogeneous models（VFedMH）。VFedMH通过在每个参与者的知识嵌入上进行聚合来取代传输中间结果的方法。活动参与者，即拥有标签和样本特征的方，在VFedMH中安全地聚合本地嵌入，并将其发送到抗拒参与者。抗拒参与者，即拥有样本特征但没有标签的方，然后使用全球嵌入来在本地不同的网络上进行前进传播。然而，抗拒参与者没有标签，因此本地模型梯度无法计算本地。为了解决这个问题，活动参与者为抗拒参与者计算本地不同模型梯度。然后，每个参与者使用自己的本地模型梯度来训练自己的本地模型，目标是将本地模型的损失值最小化。此外，论文还提供了VFL的准确性性分析。广泛的实验表明，VFedMH可以同时训练多个不同的模型，并且在模型性能方面超越一些最新的方法。

Towards General Error Diagnosis via Behavioral Testing in Machine Translation

paper_url: http://arxiv.org/abs/2310.13362
repo_url: https://github.com/wujunjie1998/btpgbt
paper_authors: Junjie Wu, Lemao Liu, Dit-Yan Yeung
for: 本研究旨在提供一种基于行为测试的机器翻译系统诊断方法，以检测机器翻译系统的通用错误。
methods: 本研究提出了一种新的双语翻译对生成基于行为测试（BTPGBT）框架，通过自动生成高质量测试 случа和 Pseudoreferences，以便对机器翻译系统进行全面和准确的行为测试。
results: 实验结果表明，BTPGBT 可以为机器翻译系统提供全面和准确的行为测试结果，并提供了一些有趣的发现。codes和数据可以在 https: //github.com/wujunjie1998/BTPGBT 上下载。

Abstract
Behavioral testing offers a crucial means of diagnosing linguistic errors and assessing capabilities of NLP models. However, applying behavioral testing to machine translation (MT) systems is challenging as it generally requires human efforts to craft references for evaluating the translation quality of such systems on newly generated test cases. Existing works in behavioral testing of MT systems circumvent this by evaluating translation quality without references, but this restricts diagnosis to specific types of errors, such as incorrect translation of single numeric or currency words. In order to diagnose general errors, this paper proposes a new Bilingual Translation Pair Generation based Behavior Testing (BTPGBT) framework for conducting behavioral testing of MT systems. The core idea of BTPGBT is to employ a novel bilingual translation pair generation (BTPG) approach that automates the construction of high-quality test cases and their pseudoreferences. Experimental results on various MT systems demonstrate that BTPGBT could provide comprehensive and accurate behavioral testing results for general error diagnosis, which further leads to several insightful findings. Our code and data are available at https: //github.com/wujunjie1998/BTPGBT.

摘要
行为测试提供了诊断语言错误和评估自然语言处理器（NLP）模型能力的重要方式。但在机器翻译（MT）系统上进行行为测试是具有挑战性，因为通常需要人工劳动来制定评估MT系统翻译质量的参考。现有的MT系统行为测试方法通过不使用参考来评估翻译质量，但这限定了诊断的类型为单个数字或货币词的错误。为了诊断通用错误，本文提出了一种新的行为测试框架（BTPGBT），基于自动生成高质量测试用例和其 Pseudoreferences的翻译对。实验结果表明，BTPGBT可以提供全面和准确的行为测试结果，用于普遍错误诊断，并且导致了一些有价值的发现。我们的代码和数据可以在https: //github.com/wujunjie1998/BTPGBT上获取。

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

paper_url: http://arxiv.org/abs/2310.13361
repo_url: https://github.com/ictnlp/SAMMT
paper_authors: Wenyu Guo, Qingkai Fang, Dong Yu, Yang Feng
for: simultaneous machine translation and image input
methods: using powerful text-to-image generation models and minimizing the gap between synthetic and authentic image representations
results: achieving state-of-the-art performance on Multi30K En-De and En-Fr datasets while remaining independent of authentic images during inference

Abstract
Multimodal machine translation (MMT) simultaneously takes the source sentence and a relevant image as input for translation. Since there is no paired image available for the input sentence in most cases, recent studies suggest utilizing powerful text-to-image generation models to provide image inputs. Nevertheless, synthetic images generated by these models often follow different distributions compared to authentic images. Consequently, using authentic images for training and synthetic images for inference can introduce a distribution shift, resulting in performance degradation during inference. To tackle this challenge, in this paper, we feed synthetic and authentic images to the MMT model, respectively. Then we minimize the gap between the synthetic and authentic images by drawing close the input image representations of the Transformer Encoder and the output distributions of the Transformer Decoder. Therefore, we mitigate the distribution disparity introduced by the synthetic images during inference, thereby freeing the authentic images from the inference process.Experimental results show that our approach achieves state-of-the-art performance on the Multi30K En-De and En-Fr datasets, while remaining independent of authentic images during inference.

摘要
多模态机器翻译（MMT）同时接受源句子和相关的图像作为翻译输入。由于翻译输入句子中的图像 rarely available，recent studies suggest 使用强大的文本到图像生成模型提供图像输入。然而，由这些模型生成的图像经常遵循不同的分布，从而导致在推理过程中的分布偏移，从而降低翻译性能。为解决这个挑战，在这篇论文中，我们将Feed synthetic和authentic图像给MMT模型，然后将synthetic和authentic图像的输入图像表示和Transformer Encoder的输出分布减小到最小值。因此，我们消除了在推理过程中由synthetic图像引入的分布偏移，使得authentic图像可以在推理过程中自由发挥作用。实验结果显示，我们的方法在Multi30K En-De和En-Fr数据集上实现了状态的翻译性能，同时不依赖于authentic图像进行推理。

DYNAMITE: Dynamic Interplay of Mini-Batch Size and Aggregation Frequency for Federated Learning with Static and Streaming Dataset

paper_url: http://arxiv.org/abs/2310.14906
repo_url: None
paper_authors: Weijie Liu, Xiaoxi Zhang, Jingpu Duan, Carlee Joe-Wong, Zhi Zhou, Xu Chen
for: This paper focuses on analyzing the joint effects of adjusting batch size and aggregation frequency on model performance, training time, and resource consumption in federated learning (FL) training, especially when facing dynamic data streams and network characteristics.
methods: The paper introduces novel analytical models and optimization algorithms that leverage the interplay between batch size and aggregation frequency to navigate the trade-offs among convergence, cost, and completion time for dynamic FL training. The paper also derives closed-form solutions for co-optimized batch size and aggregation frequency that are consistent across all devices.
results: The paper conducts extensive experiments to demonstrate the superiority of the proposed offline optimal solutions and online adaptive algorithm. The results show that the proposed methods can efficiently train accurate FL models while addressing the heterogeneity of both data and system characteristics.

Abstract
Federated Learning (FL) is a distributed learning paradigm that can coordinate heterogeneous edge devices to perform model training without sharing private data. While prior works have focused on analyzing FL convergence with respect to hyperparameters like batch size and aggregation frequency, the joint effects of adjusting these parameters on model performance, training time, and resource consumption have been overlooked, especially when facing dynamic data streams and network characteristics. This paper introduces novel analytical models and optimization algorithms that leverage the interplay between batch size and aggregation frequency to navigate the trade-offs among convergence, cost, and completion time for dynamic FL training. We establish a new convergence bound for training error considering heterogeneous datasets across devices and derive closed-form solutions for co-optimized batch size and aggregation frequency that are consistent across all devices. Additionally, we design an efficient algorithm for assigning different batch configurations across devices, improving model accuracy and addressing the heterogeneity of both data and system characteristics. Further, we propose an adaptive control algorithm that dynamically estimates network states, efficiently samples appropriate data batches, and effectively adjusts batch sizes and aggregation frequency on the fly. Extensive experiments demonstrate the superiority of our offline optimal solutions and online adaptive algorithm.

摘要
联合学习（FL）是一种分布式学习模式，可以在不同的边缘设备上进行模型训练，无需分享private数据。而且，以前的研究主要关注了FL的整合程度和批处理频率之间的关系，而忽略了在面对动态数据流和网络特性时，这些参数的共同影响。本文提出了新的分析模型和优化算法，通过批处理大小和汇集频率之间的交互来导航模型性能、训练时间和资源消耗之间的负面oren。我们提出了一个新的训练误差下界，考虑到不同设备上的hetogeneous数据集，并 deriv出closed-form解决方案，可以在所有设备上实现共同的批处理大小和汇集频率。此外，我们设计了一种高效的分配不同批处理配置的算法，以提高模型精度和处理不同数据和系统特性的hetogeneity。最后，我们提出了一种自适应控制算法，可以在 fly 上精准地估算网络状态，选择合适的数据批处理，并动态地调整批处理大小和汇集频率。广泛的实验表明了我们的offline优化解决方案和在线自适应算法的优越性。

NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding

paper_url: http://arxiv.org/abs/2310.13347
repo_url: https://github.com/minghu0830/NurViD-benchmark
paper_authors: Ming Hu, Lin Wang, Siyuan Yan, Don Ma, Qingli Ren, Peng Xia, Wei Feng, Peibo Duan, Lie Ju, Zongyuan Ge
for: 这个论文旨在提高护理活动理解的质量和安全性，通过应用深度学习技术，促进教育和培训，改善质量控制，并启用操作符控制监测。
methods: 这个论文使用的方法包括使用深度学习技术进行护理活动理解，并提供了一个大型视频数据集（NurViD），其包含了51种护理程序和177个动作步骤的专家级标注。
results: 这个论文的结果显示，使用现有的深度学习方法在护理活动理解方面的效果不佳，而 NurViD 数据集可以帮助改善这种效果。

Abstract
The application of deep learning to nursing procedure activity understanding has the potential to greatly enhance the quality and safety of nurse-patient interactions. By utilizing the technique, we can facilitate training and education, improve quality control, and enable operational compliance monitoring. However, the development of automatic recognition systems in this field is currently hindered by the scarcity of appropriately labeled datasets. The existing video datasets pose several limitations: 1) these datasets are small-scale in size to support comprehensive investigations of nursing activity; 2) they primarily focus on single procedures, lacking expert-level annotations for various nursing procedures and action steps; and 3) they lack temporally localized annotations, which prevents the effective localization of targeted actions within longer video sequences. To mitigate these limitations, we propose NurViD, a large video dataset with expert-level annotation for nursing procedure activity understanding. NurViD consists of over 1.5k videos totaling 144 hours, making it approximately four times longer than the existing largest nursing activity datasets. Notably, it encompasses 51 distinct nursing procedures and 177 action steps, providing a much more comprehensive coverage compared to existing datasets that primarily focus on limited procedures. To evaluate the efficacy of current deep learning methods on nursing activity understanding, we establish three benchmarks on NurViD: procedure recognition on untrimmed videos, procedure and action recognition on trimmed videos, and action detection. Our benchmark and code will be available at \url{https://github.com/minghu0830/NurViD-benchmark}.

摘要
使用深度学习对护理程序活动理解可能会大幅提高护理人员和病人之间的质量和安全性。通过这种技术，我们可以提供培训和教育，提高质量控制，并启用操作符合性监测。然而，在这一领域的自动识别系统开发目前受到数据鲜血的限制。现有的视频数据集存在多种限制：1）这些数据集较小，无法支持全面的护理活动调查; 2）它们主要关注单一的程序，缺乏专家级别的护理程序和操作步骤的标注; 3）它们缺乏时间地标注，这使得targeted action在更长的视频序列中不能有效地local化。为了缓解这些限制，我们提出NurViD，一个大型视频数据集，包含专家级别的护理程序活动理解标注。NurViD包含1.5k个视频，总时长144小时，比现有最大的护理活动数据集长得多。其中包含51种不同的护理程序和177个操作步骤，比现有数据集更加全面。为了评估当前深度学习方法在护理活动理解方面的效果，我们建立了三个benchmark在NurViD上：程序认知在未处理视频上，程序和操作认知在处理视频上，以及操作检测。我们的benchmark和代码将在GitHub上公开，请参阅\url{https://github.com/minghu0830/NurViD-benchmark}.

Challenges and Contributing Factors in the Utilization of Large Language Models (LLMs)

paper_url: http://arxiv.org/abs/2310.13343
repo_url: None
paper_authors: Xiaoliang Chen, Liangbin Li, Le Chang, Yunhe Huang, Yuxuan Zhao, Yuxiao Zhang, Dinuo Li
for: 本文探讨了大语言模型（LLM）在各种应用场景中的挑战，包括域域特性、知识忘记、知识重复、知识虚假等问题。
methods: 本文提出了多种解决这些问题的方法，包括数据多样化、模型细化、可见性和解释性提高、优化模型、增加优化和公平性训练等。
results: 本文预测未来的LLM将强调公平、透明度和优素，以保证在服务人类时保持高的道德和伦理水平。

Abstract
With the development of large language models (LLMs) like the GPT series, their widespread use across various application scenarios presents a myriad of challenges. This review initially explores the issue of domain specificity, where LLMs may struggle to provide precise answers to specialized questions within niche fields. The problem of knowledge forgetting arises as these LLMs might find it hard to balance old and new information. The knowledge repetition phenomenon reveals that sometimes LLMs might deliver overly mechanized responses, lacking depth and originality. Furthermore, knowledge illusion describes situations where LLMs might provide answers that seem insightful but are actually superficial, while knowledge toxicity focuses on harmful or biased information outputs. These challenges underscore problems in the training data and algorithmic design of LLMs. To address these issues, it's suggested to diversify training data, fine-tune models, enhance transparency and interpretability, and incorporate ethics and fairness training. Future technological trends might lean towards iterative methodologies, multimodal learning, model personalization and customization, and real-time learning and feedback mechanisms. In conclusion, future LLMs should prioritize fairness, transparency, and ethics, ensuring they uphold high moral and ethical standards when serving humanity.

摘要
大型语言模型（LLM）如GPT系列的发展，在各种应用场景中广泛使用，却也带来许多挑战。本文首先探讨领域特定性问题， LLM 可能难以回答特殊领域内的精细问题。知识忘却问题表明这些 LLM 可能很难以平衡旧和新信息。知识重复现象表明 LLM 可能提供过于机械化的答案，缺乏深度和创新。此外，知识幻觉问题描述了 LLM 可能提供的答案似乎具有深度和创新，但实际上是 superficiel 的。知识毒性问题则关注 LLM 输出的有害或偏见信息。这些挑战表明 LLM 的训练数据和算法设计存在问题。为解决这些问题，建议多样化训练数据，细化模型，提高透明度和可解释性，并包括伦理和公平训练。未来技术趋势可能是迭代方法、多模态学习、模型个性化和定制化，以及实时学习和反馈机制。在结束时，未来 LLM 应该优先考虑公平、透明度和伦理，以确保它们在服务人类时保持高的道德和伦理标准。

FLAIR: a Country-Scale Land Cover Semantic Segmentation Dataset From Multi-Source Optical Imagery

paper_url: http://arxiv.org/abs/2310.13336
repo_url: None
paper_authors: Anatol Garioud, Nicolas Gonthier, Loic Landrieu, Apolline De Wit, Marion Valette, Marc Poupée, Sébastien Giordano, Boris Wattrelos
for: FLAIR 是一个大规模地理分类的数据集，用于监测和理解人类活动的发展指标，如城市化、扫林和土壤人工化。
methods: FLAIR 使用高分辨率遥感图像和时间序列数据，并提供了精确的地表类型分类。
results: FLAIR 提供了817平方公里的高分辨率遥感图像，以及20亿个分类标签，可以用于开发和评估大规模地理分类算法。

Abstract
We introduce the French Land cover from Aerospace ImageRy (FLAIR), an extensive dataset from the French National Institute of Geographical and Forest Information (IGN) that provides a unique and rich resource for large-scale geospatial analysis. FLAIR contains high-resolution aerial imagery with a ground sample distance of 20 cm and over 20 billion individually labeled pixels for precise land-cover classification. The dataset also integrates temporal and spectral data from optical satellite time series. FLAIR thus combines data with varying spatial, spectral, and temporal resolutions across over 817 km2 of acquisitions representing the full landscape diversity of France. This diversity makes FLAIR a valuable resource for the development and evaluation of novel methods for large-scale land-cover semantic segmentation and raises significant challenges in terms of computer vision, data fusion, and geospatial analysis. We also provide powerful uni- and multi-sensor baseline models that can be employed to assess algorithm's performance and for downstream applications. Through its extent and the quality of its annotation, FLAIR aims to spur improvements in monitoring and understanding key anthropogenic development indicators such as urban growth, deforestation, and soil artificialization. Dataset and codes can be accessed at https://ignf.github.io/FLAIR/

摘要
我们介绍法国陆地覆盖物（FLAIR），一个广泛的数据集来自法国国家地理和森林信息研究所（IGN），提供了一个独特和丰富的大规模地ospatial分析资源。FLAIR包含高分辨率飞行图像，地面抽象距离20 cm，超过20亿个准确标注的像素，用于精确的陆地覆盖类别分类。数据集还 integraoptical卫星时序序数据。因此，FLAIR结合了不同的空间、spectral和时间分辨率，覆盖了法国的全景多样性，总面积超过817 km2。这种多样性使FLAIR成为大规模陆地Semantic分类的开发和评估的丰富资源，同时也提出了计算机视觉、数据融合和地ospatial分析的挑战。我们还提供了强大的单感器和多感器基线模型，可以用于评估算法性能和下游应用。通过其覆盖范围和精确的标注，FLAIR希望能促进跟踪和理解人类发展指标，如城市增长、Deforestation和 soil artificialization。数据集和代码可以在https://ignf.github.io/FLAIR/上获取。

Democratizing Reasoning Ability: Tailored Learning from Large Language Model

paper_url: http://arxiv.org/abs/2310.13332
repo_url: https://github.com/raibows/learn-to-reason
paper_authors: Zhaoyang Wang, Shaohan Huang, Yuxuan Liu, Jiahai Wang, Minghui Song, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
for: 提高小型语言模型（LLM）的推理能力，使其能够更广泛应用于自然语言处理领域。
methods: 提出了一种特化的学习方法，通过在多轮学习 paradigm 中，将大型黑盒语言模型（LLM）作为教师，为小型语言模型（LM）提供个性化的培训数据，以提高LM的推理能力。同时，通过自我反思学习，让学生从自己的错误中学习。
results: 经过实验和分析，表明该方法可以有效提高小型语言模型的推理能力，并且可以在数学和常识理解任务上达到比较高的性能。

Abstract
Large language models (LLMs) exhibit impressive emergent abilities in natural language processing, but their democratization is hindered due to huge computation requirements and closed-source nature. Recent research on advancing open-source smaller LMs by distilling knowledge from black-box LLMs has obtained promising results in the instruction-following ability. However, the reasoning ability which is more challenging to foster, is relatively rarely explored. In this paper, we propose a tailored learning approach to distill such reasoning ability to smaller LMs to facilitate the democratization of the exclusive reasoning ability. In contrast to merely employing LLM as a data annotator, we exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm. This paradigm enables the student to expose its deficiencies to the black-box teacher who then can provide customized training data in return. Further, to exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes. The learning from self-reflection and LLM are all tailored to the student's learning status, thanks to the seamless integration with the multi-round learning paradigm. Comprehensive experiments and analysis on mathematical and commonsense reasoning tasks demonstrate the effectiveness of our method. The code will be available at https://github.com/Raibows/Learn-to-Reason.

摘要
大型自然语言处理模型（LLM）表现出印象的emergent能力，但其普及化受到巨大计算需求和封闭式 natura 的阻碍。现代研究推进小型开源LLM的进步，通过将黑盒LLM知识蒸馏到小型LLM中，已经获得了惊人的成果。然而，更加困难的是 fostering 的逻辑能力，通常不会被探索。在本文中，我们提出了一个适应学习方法，以将这种逻辑能力蒸馏到小型LLM中，以便普及化这种专有的逻辑能力。相比使用LLM作为标签生成器，我们利用LLM的启发力作为教育师，建立了互动多轮学习 paradigm。这个 paradigm 让学生可以向黑盒教育师表达自己的不足，并且获得专门的训练数据回应。此外，为了套用小型LLM的逻辑潜力，我们提出了自我反思学习，让学生从自己的错误中学习。这些学习自我反思和LLM都是根据学习状态进行定制，感谢与多轮学习 paradigm 的整合。实验和分析显示了我们的方法的有效性，代码将会在 GitHub 上公开。

Boosting for Bounding the Worst-class Error

paper_url: http://arxiv.org/abs/2310.14890
repo_url: None
paper_authors: Yuya Saito, Shinnosuke Matsuo, Seiichi Uchida, Daiki Suehiro
for: 本文解决了最坏类错误率问题，而不是通过平均误差率来衡量所有类的性能。
methods: 本文提出了一种提高算法，该算法可以确保最坏类训练误差的Upper bound，并 deriv出其泛化 bound。
results: 实验结果显示，该算法可以降低测试集最坏类误差率，而不会过拟合训练集。

Abstract
This paper tackles the problem of the worst-class error rate, instead of the standard error rate averaged over all classes. For example, a three-class classification task with class-wise error rates of 10\%, 10\%, and 40\% has a worst-class error rate of 40\%, whereas the average is 20\% under the class-balanced condition. The worst-class error is important in many applications. For example, in a medical image classification task, it would not be acceptable for the malignant tumor class to have a 40\% error rate, while the benign and healthy classes have 10\% error rates.We propose a boosting algorithm that guarantees an upper bound of the worst-class training error and derive its generalization bound. Experimental results show that the algorithm lowers worst-class test error rates while avoiding overfitting to the training set.

摘要
We propose a boosting algorithm that guarantees an upper bound of the worst-class training error and derive its generalization bound. Experimental results show that the algorithm reduces worst-class test error rates while avoiding overfitting to the training set.

Coarse-to-Fine Dual Encoders are Better Frame Identification Learners

paper_url: http://arxiv.org/abs/2310.13316
repo_url: https://github.com/pkunlp-icler/cofftea
paper_authors: Kaikai An, Ce Zheng, Bofei Gao, Haozhe Zhao, Baobao Chang
for: 本研究旨在提高FrameIdentification的精度和效率，尤其是在面临大量候选框的情况下。
methods: 本文提出了CoFFTEA Architecture，包括Coarse-to-Fine Encoders和 dual encoders，通过对框和目标进行对齐学习，以提高FrameIdentification的精度和效率。
results: 实验结果表明，CoFFTEA比前一代模型提高0.93的总得分和1.53的R@1指标，而不使用$lf$。此外，CoFFTEA还能更好地模型框和框之间的关系，以及目标和目标之间的关系。

Abstract
Frame identification aims to find semantic frames associated with target words in a sentence. Recent researches measure the similarity or matching score between targets and candidate frames by modeling frame definitions. However, they either lack sufficient representation learning of the definitions or face challenges in efficiently selecting the most suitable frame from over 1000 candidate frames. Moreover, commonly used lexicon filtering ($lf$) to obtain candidate frames for the target may ignore out-of-vocabulary targets and cause inadequate frame modeling. In this paper, we propose CoFFTEA, a $\underline{Co}$arse-to-$\underline{F}$ine $\underline{F}$rame and $\underline{T}$arget $\underline{E}$ncoders $\underline{A}$rchitecture. With contrastive learning and dual encoders, CoFFTEA efficiently and effectively models the alignment between frames and targets. By employing a coarse-to-fine curriculum learning procedure, CoFFTEA gradually learns to differentiate frames with varying degrees of similarity. Experimental results demonstrate that CoFFTEA outperforms previous models by 0.93 overall scores and 1.53 R@1 without $lf$. Further analysis suggests that CoFFTEA can better model the relationships between frame and frame, as well as target and target. The code for our approach is available at https://github.com/pkunlp-icler/COFFTEA.

摘要
框架识别目标words的Semantic框的相关研究。近期研究通过定义框的模型来衡量目标和候选框之间的相似性或匹配得分。然而，它们可能缺乏定义框的表示学习或有效地从1000多个候选框中选择最适合的框。另外，通常使用词典筛选($lf$)来获取候选框的目标可能忽略到词语表外的目标，从而导致不充分的框模型。在这篇论文中，我们提出了CoFFTEA，一种$\underline{Co}$arse-to-$\underline{F}$ine $\underline{F}$rame和$\underline{T}$arget $\underline{E}$ncoders $\underline{A}$rchitecture。通过对框和目标的对齐学习，CoFFTEA可以高效地和高效地模型框和目标之间的对齐。通过使用粗细度逐步学习程序，CoFFTEA逐渐学习到不同程度的相似度之间的分化。实验结果表明，CoFFTEA在前一代模型的0.93的总得分和1.53的R@1（不使用$lf）。进一步分析表明，CoFFTEA可以更好地模型框和框之间，以及目标和目标之间的关系。我们的代码可以在https://github.com/pkunlp-icler/COFFTEA中获取。

paper_url: http://arxiv.org/abs/2310.13297
repo_url: None
paper_authors: Chenkai Sun, Jinning Li, Yi R. Fung, Hou Pong Chan, Tarek Abdelzaher, ChengXiang Zhai, Heng Ji
For: The paper aims to improve the accuracy of automatic response forecasting for news media, specifically in cases where explicit profiles or historical actions of users are limited (referred to as lurkers).* Methods: The proposed framework, SocialSense, leverages a large language model to induce a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics.* Results: The proposed method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings, demonstrating its effectiveness in response forecasting, and is capable of handling unseen user and lurker scenarios.Here is the same information in Simplified Chinese:* For: 文章目标是改进新闻媒体自动回应预测精度，特别是在用户的详细信息或历史行为 Limited (referred to as lurkers) 情况下。* Methods: 提议的社交感知框架 SocialSense 利用大型语言模型生成 belief-centered 图 на основе现有的社交网络，并通过图基本传播来捕捉社交动态。* Results: 提议的方法在 zero-shot 和 supervised 设置下的实验评估中超过现有状态的前ier，表明其在回应预测中的有效性，并能够处理未seen 用户和隐身用户情况。

Abstract
Automatic response forecasting for news media plays a crucial role in enabling content producers to efficiently predict the impact of news releases and prevent unexpected negative outcomes such as social conflict and moral injury. To effectively forecast responses, it is essential to develop measures that leverage the social dynamics and contextual information surrounding individuals, especially in cases where explicit profiles or historical actions of the users are limited (referred to as lurkers). As shown in a previous study, 97% of all tweets are produced by only the most active 25% of users. However, existing approaches have limited exploration of how to best process and utilize these important features. To address this gap, we propose a novel framework, named SocialSense, that leverages a large language model to induce a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics. We hypothesize that the induced graph that bridges the gap between distant users who share similar beliefs allows the model to effectively capture the response patterns. Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings, demonstrating its effectiveness in response forecasting. Moreover, the analysis reveals the framework's capability to effectively handle unseen user and lurker scenarios, further highlighting its robustness and practical applicability.

摘要
自动回应预测 для新闻媒体在内容制作者能够有效预测新闻发布后的影响，避免不必要的负面结果，如社交冲突和道德伤害。为了有效预测回应，需要开发机制，利用社交动力和用户境外信息，尤其是在用户没有明确 Profiling 或历史行为时。据前一研究显示，97%的所有推文来自只有25%最活跃的用户。然而，现有方法尚未充分探讨如何最佳处理和利用这些重要特征。为了解决这个空白，我们提出了一种新的框架，名为SocialSense，它利用大型自然语言模型生成一个带有信念中心的图，并利用图基于传播来捕捉社交动力。我们假设，生成的图可以bridging distant用户之间的相似信念，使模型能够有效地捕捉回应模式。我们的方法在实验评估中超越了现有状态的艺术，demonstrating its effectiveness in response forecasting。此外，分析表明框架可以有效处理未看到用户和寂寂者enario，进一步强调其可靠性和实用性。

PathRL: An End-to-End Path Generation Method for Collision Avoidance via Deep Reinforcement Learning

paper_url: http://arxiv.org/abs/2310.13295
repo_url: None
paper_authors: Wenhao Yu, Jie Peng, Quecheng Qiu, Hanyu Wang, Lu Zhang, Jianmin Ji
for: 提高移动机器人性能的深度强化学习（DRL）方法
methods: 使用特定的动作空间精度化技术和适应状态空间表示方法来解决训练困难
results: 比其他DRL导航方法更高的成功率和降低旋转幅度的稳定和平滑机器人运动

Abstract
Robot navigation using deep reinforcement learning (DRL) has shown great potential in improving the performance of mobile robots. Nevertheless, most existing DRL-based navigation methods primarily focus on training a policy that directly commands the robot with low-level controls, like linear and angular velocities, which leads to unstable speeds and unsmooth trajectories of the robot during the long-term execution. An alternative method is to train a DRL policy that outputs the navigation path directly. However, two roadblocks arise for training a DRL policy that outputs paths: (1) The action space for potential paths often involves higher dimensions comparing to low-level commands, which increases the difficulties of training; (2) It takes multiple time steps to track a path instead of a single time step, which requires the path to predicate the interactions of the robot w.r.t. the dynamic environment in multiple time steps. This, in turn, amplifies the challenges associated with training. In response to these challenges, we propose PathRL, a novel DRL method that trains the policy to generate the navigation path for the robot. Specifically, we employ specific action space discretization techniques and tailored state space representation methods to address the associated challenges. In our experiments, PathRL achieves better success rates and reduces angular rotation variability compared to other DRL navigation methods, facilitating stable and smooth robot movement. We demonstrate the competitive edge of PathRL in both real-world scenarios and multiple challenging simulation environments.

摘要
仿生Navigation使用深度强化学习（DRL）已经显示出了提高移动机器人性能的潜在力量。然而，现有的大多数DRL基于 Navigation方法都主要集中于直接训练机器人的低级指令，如线性和ANGULAR velocity，这会导致机器人在长期执行中的速度不稳定和轨迹不平滑。作为一种alternative方法，可以训练一个DRL策略，输出机器人的Navigation path。然而，两个障碍物 arise for training a DRL策略：（1）action space for potential paths often involves higher dimensions comparing to low-level commands，这会增加训练的difficulties;（2）It takes multiple time steps to track a path instead of a single time step，这需要机器人在多个时间步骤中与动态环境进行互动，从而增加训练的挑战。为回应这些挑战，我们提出了PathRL，一种新的DRL方法，训练策略是生成机器人的Navigation path。我们使用specific action space discretization techniques和tailored state space representation methods来解决相关的挑战。在我们的实验中，PathRL实现了与其他DRL Navigation方法相比更高的成功率和降低ANGULAR rotation variability，使机器人的移动更加稳定和平滑。我们在实际场景和多个复杂的simulation环境中证明了PathRL的竞争力。

Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks

paper_url: http://arxiv.org/abs/2310.13291
repo_url: None
paper_authors: Ruixiang Tang, Gord Lueck, Rodolfo Quispe, Huseyin A Inan, Janardhan Kulkarni, Xia Hu
for: 研究它们可以攻击语言模型的数据成员情报泄露问题。
methods: 利用文本相似性和模型对文档修改的抵抗性作为可能的数据成员情报泄露信号，并评估其效果在广泛使用的 dataset 上。
results: 结果表明，摘要模型容易泄露数据成员情报，即使参考摘要不可用。此外，我们还讨论了训练摘要模型的安全措施，并讨论了数据隐私和实用性之间的自然补偿。

Abstract
Large language models have revolutionized the field of NLP by achieving state-of-the-art performance on various tasks. However, there is a concern that these models may disclose information in the training data. In this study, we focus on the summarization task and investigate the membership inference (MI) attack: given a sample and black-box access to a model's API, it is possible to determine if the sample was part of the training data. We exploit text similarity and the model's resistance to document modifications as potential MI signals and evaluate their effectiveness on widely used datasets. Our results demonstrate that summarization models are at risk of exposing data membership, even in cases where the reference summary is not available. Furthermore, we discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.

摘要

Unified Pretraining for Recommendation via Task Hypergraphs

paper_url: http://arxiv.org/abs/2310.13286
repo_url: https://github.com/mdyfrank/uprth
paper_authors: Mingdai Yang, Zhiwei Liu, Liangwei Yang, Xiaolong Liu, Chen Wang, Hao Peng, Philip S. Yu
for: 提出一个 novel 多任务预训练框架 UPRTH，以满足各种推荐任务的多元需求和特点。
methods: 提出了一个名为 task hypergraphs 的新方法，可以将多个预训练任务转换为 Hyperedge 预测任务，并将这些任务联系到推荐任务上。同时，提出了一个名为 transitional attention 的新层，可以精确地学习每个预训练任务与推荐任务之间的相关性。
results: 透过实验结果，显示 UPRTH 的超越性，并进行了详细的探索，以证明提案的架构的有效性。

Abstract
Although pretraining has garnered significant attention and popularity in recent years, its application in graph-based recommender systems is relatively limited. It is challenging to exploit prior knowledge by pretraining in widely used ID-dependent datasets. On one hand, user-item interaction history in one dataset can hardly be transferred to other datasets through pretraining, where IDs are different. On the other hand, pretraining and finetuning on the same dataset leads to a high risk of overfitting. In this paper, we propose a novel multitask pretraining framework named Unified Pretraining for Recommendation via Task Hypergraphs. For a unified learning pattern to handle diverse requirements and nuances of various pretext tasks, we design task hypergraphs to generalize pretext tasks to hyperedge prediction. A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation. Experimental results on three benchmark datasets verify the superiority of UPRTH. Additional detailed investigations are conducted to demonstrate the effectiveness of the proposed framework.

摘要
(Note: The text has been translated into Simplified Chinese, which is the standard form of Chinese used in mainland China. The translation may vary depending on the region or dialect.)

Meta-learning of Physics-informed Neural Networks for Efficiently Solving Newly Given PDEs

paper_url: http://arxiv.org/abs/2310.13270
repo_url: None
paper_authors: Tomoharu Iwata, Yusuke Tanaka, Naonori Ueda
for: 解决各种partial differential equation(PDE)问题
methods: 使用神经网络基于meta-学习方法，将PDE问题编码成问题表示，并使用神经网络预测解决方案
results: 比较 existed方法，提出的方法可以更高效地预测PDE问题的解决方案

Abstract
We propose a neural network-based meta-learning method to efficiently solve partial differential equation (PDE) problems. The proposed method is designed to meta-learn how to solve a wide variety of PDE problems, and uses the knowledge for solving newly given PDE problems. We encode a PDE problem into a problem representation using neural networks, where governing equations are represented by coefficients of a polynomial function of partial derivatives, and boundary conditions are represented by a set of point-condition pairs. We use the problem representation as an input of a neural network for predicting solutions, which enables us to efficiently predict problem-specific solutions by the forwarding process of the neural network without updating model parameters. To train our model, we minimize the expected error when adapted to a PDE problem based on the physics-informed neural network framework, by which we can evaluate the error even when solutions are unknown. We demonstrate that our proposed method outperforms existing methods in predicting solutions of PDE problems.

摘要
我们提出了一种基于神经网络的meta学习方法，用于效率地解决部分 diferencial equation（PDE）问题。我们的方法可以快速地学习解决各种PDE问题，并使用这些知识来解决新给定的PDE问题。我们将PDE问题编码为一个问题表示，使用神经网络来表示管理方程，并将边界条件表示为一组点条件对。我们使用问题表示作为神经网络预测解决方法的输入，这使得我们可以通过神经网络的前进过程来快速地预测问题特定的解决方案，而不需要更新模型参数。为了训练我们的模型，我们将通过基于物理学信息神经网络框架的最小二乘法来减少预测错误，这样我们就可以在解决PDE问题时评估解决方案的错误。我们示出了我们的提出方法可以比既有方法更高效地预测PDE问题的解决方案。

An Exploratory Study on Simulated Annealing for Feature Selection in Learning-to-Rank

paper_url: http://arxiv.org/abs/2310.13269
repo_url: None
paper_authors: Mohd. Sayemul Haque, Md. Fahim, Muhammad Ibrahim
for: 本研究是 investigate the use of simulated annealing for feature selection in the learning-to-rank domain.
methods: 我们使用了一种叫做 simulated annealing的meta-heuristic approach，并 explore various neighborhood selection strategies和 temperature cooling schemes. We also introduce a new hyper-parameter called the progress parameter.
results: 我们的算法在五个公共的学习到排序 benchmark datasets上进行了评估，并与Local Beam Search算法进行了比较。结果表明我们的提议的模型具有效果。

Abstract
Learning-to-rank is an applied domain of supervised machine learning. As feature selection has been found to be effective for improving the accuracy of learning models in general, it is intriguing to investigate this process for learning-to-rank domain. In this study, we investigate the use of a popular meta-heuristic approach called simulated annealing for this task. Under the general framework of simulated annealing, we explore various neighborhood selection strategies and temperature cooling schemes. We further introduce a new hyper-parameter called the progress parameter that can effectively be used to traverse the search space. Our algorithms are evaluated on five publicly benchmark datasets of learning-to-rank. For a better validation, we also compare the simulated annealing-based feature selection algorithm with another effective meta-heuristic algorithm, namely local beam search. Extensive experimental results shows the efficacy of our proposed models.

摘要
学习排名是应用领域的超级vised机器学习。因为特征选择已被证明可以提高学习模型的准确率，因此在这个领域进行这种过程是非常有趣。在这个研究中，我们使用了一种受欢迎的meta-heuristic方法，即模拟熔炉。在总体框架下，我们探索了不同的邻居选择策略和温度冷却方案。我们还引入了一个新的超参数，即进度参数，可以有效地探索搜索空间。我们的算法在五个公共的学习排名数据集上进行了评估。为了更好地验证，我们还与另一种有效的meta-heuristic算法，即本地搜索算法进行比较。广泛的实验结果表明了我们的提议的模型的有效性。

Enhancing drug and cell line representations via contrastive learning for improved anti-cancer drug prioritization

paper_url: http://arxiv.org/abs/2310.13725
repo_url: None
paper_authors: Patrick J. Lawrence, Xia Ning
for: 这项研究旨在提高基于omic序列分析的精准肿瘤治疗，通过强化药物和细胞系列表示的关系结构，以提高计算机方法学习药物和细胞系列之间的相互关系。
methods: 本研究使用了对比学习方法，以保留药物机制作用和细胞系列肿瘤类型之间的关系结构，从而提高了学习的药物和细胞系列表示。
results: 相比之前的状态艺法，我们的学习表示可以 achieve更高的性能，并且发现使用我们学习的表示时，预测器更加均衡地使用药物和细胞系列来源的特征，这有助于进行更加个性化的药物优先顺序。

Abstract
Due to cancer's complex nature and variable response to therapy, precision oncology informed by omics sequence analysis has become the current standard of care. However, the amount of data produced for each patients makes it difficult to quickly identify the best treatment regimen. Moreover, limited data availability has hindered computational methods' abilities to learn patterns associated with effective drug-cell line pairs. In this work, we propose the use of contrastive learning to improve learned drug and cell line representations by preserving relationship structures associated with drug mechanism of action and cell line cancer types. In addition to achieving enhanced performance relative to a state-of-the-art method, we find that classifiers using our learned representations exhibit a more balances reliance on drug- and cell line-derived features when making predictions. This facilitates more personalized drug prioritizations that are informed by signals related to drug resistance.

摘要
Translated into Simplified Chinese:由于肿瘤的复杂性和变化的响应，精准肿瘤学 Informed by omics sequence analysis 已成为现代标准的治疗方法。然而，每个患者生成的数据量太大，使得快速 identificar最佳治疗方案变得困难。此外，数据的有限性限制了计算方法学习 pattern 关联有效的药品和细胞线路。在这项工作中，我们提议使用对比学习来提高学习到药品和细胞线路的表示，以保留药理学 Mechanism of action 和细胞型 cancer 之间的关系结构。此外，我们发现使用我们学习的表示的分类器能够更好地使用药品和细胞线路 Derived 特征进行预测，从而实现更个性化的药品优先级，基于药品抵抗的信号。

ManiCast: Collaborative Manipulation with Cost-Aware Human Forecasting

paper_url: http://arxiv.org/abs/2310.13258
repo_url: None
paper_authors: Kushal Kedia, Prithwish Dan, Atiksh Bhardwaj, Sanjiban Choudhury
for: 该论文旨在提高人机 робо合作 manipulation 性能，具体是通过精准预测人类动作来提高机器人的计划性能。
methods: 该论文提出了一种名为 ManiCast 的新框架，该框架通过学习人类动作预测模型，并将其与一种模拟预测控制器结合，以执行协同 manipulation 任务。
results: 该论文通过实验证明，ManiCast 框架可以在多个实际任务中，如反应混合、物品传递和协同设备等，实现流畅、实时的人机合作。同时，论文还对 Forecast 和 End-to-End 预测控制器系统进行了评估，并与已有的学习基线和规则基线进行了比较。

Abstract
Seamless human-robot manipulation in close proximity relies on accurate forecasts of human motion. While there has been significant progress in learning forecast models at scale, when applied to manipulation tasks, these models accrue high errors at critical transition points leading to degradation in downstream planning performance. Our key insight is that instead of predicting the most likely human motion, it is sufficient to produce forecasts that capture how future human motion would affect the cost of a robot's plan. We present ManiCast, a novel framework that learns cost-aware human forecasts and feeds them to a model predictive control planner to execute collaborative manipulation tasks. Our framework enables fluid, real-time interactions between a human and a 7-DoF robot arm across a number of real-world tasks such as reactive stirring, object handovers, and collaborative table setting. We evaluate both the motion forecasts and the end-to-end forecaster-planner system against a range of learned and heuristic baselines while additionally contributing new datasets. We release our code and datasets at https://portal-cornell.github.io/manicast/.

摘要
<>将人机 manipulate 在 close proximity 中进行流畅的执行，需要准确预测人类动作。虽然在大规模学习中有了 significiant progress ，但在执行 manipulate 任务时，这些模型会在关键转折点上产生高错误率，导致下游规划性能下降。我们的关键发现是，而不是预测人类动作的最有可能性，可以生成捕捉未来人类动作对机器人计划的成本影响的预测。我们提出了 ManiCast 框架，该框架学习成本识别人类预测，并将其传递给模型预测控制 плаanner 执行共同执行任务。我们的框架可以在多个实际任务中，如反应搅拌、物品交接和共同设备，实现流畅、实时的人机互动。我们对预测和总体预测控制系统进行了评估，并与已学习和启发式基础集成。我们将代码和数据集发布在。

Visual Grounding Helps Learn Word Meanings in Low-Data Regimes

paper_url: http://arxiv.org/abs/2310.13257
repo_url: None
paper_authors: Chengxu Zhuang, Evelina Fedorenko, Jacob Andreas
for: 研究语言模型（LM）是人类语言生成和理解的 poderful工具，并且它们的内部表示与人类语言处理的表示相吻合。但是，为了达到这些结果，LM必须通过不人类化的训练方法进行训练，需要大量的语言数据，而无需与感知、行为或社会行为相关的知识。
methods: 我们使用了多种LM架构，并在不同的数据scale上进行训练。我们还使用了图像描述任务作为 auxiliary supervision。
results: 我们发现，视觉超vision可以提高word learning的效率，但这些改进几乎都出现在低数据 régime中，而且在包含丰富的分布式信号的情况下，这些改进可能会被抵消。我们发现，模型主要驱动的视觉信息和word co-occurrence信息之间的信息不是重复的。然而，我们的结果表明，当前的多模式模型化方法无法有效地利用视觉信息，从人类化的数据集上建立更人类化的word表示。

Abstract
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension, and their internal representations are remarkably well-aligned with representations of language in the human brain. But to achieve these results, LMs must be trained in distinctly un-human-like ways -- requiring orders of magnitude more language data than children receive during development, and without any of the accompanying grounding in perception, action, or social behavior. Do models trained more naturalistically -- with grounded supervision -- exhibit more human-like language learning? We investigate this question in the context of word learning, a key sub-task in language acquisition. We train a diverse set of LM architectures, with and without auxiliary supervision from image captioning tasks, on datasets of varying scales. We then evaluate these models on a broad set of benchmarks characterizing models' learning of syntactic categories, lexical relations, semantic features, semantic similarity, and alignment with human neural representations. We find that visual supervision can indeed improve the efficiency of word learning. However, these improvements are limited: they are present almost exclusively in the low-data regime, and sometimes canceled out by the inclusion of rich distributional signals from text. The information conveyed by text and images is not redundant -- we find that models mainly driven by visual information yield qualitatively different from those mainly driven by word co-occurrences. However, our results suggest that current multi-modal modeling approaches fail to effectively leverage visual information to build more human-like word representations from human-sized datasets.

摘要
现代神经语言模型（LM）是强大的工具，用于模拟人类句子生成和理解，并且其内部表示与人类语言表示相吻合。但是，为了 достичь这些结果，LM需要接受非人类化的训练方法，需要大量的语言数据，并且没有与感知、行为或社会行为相关的背景。我们问 whether models trained more naturally -- with grounded supervision -- exhibit more human-like language learning? 在word learning中，我们训练了多种LM架构，有和无附加的图像描述任务 auxiliary supervision，在不同的数据规模上进行训练。然后，我们对这些模型进行了广泛的测试，以评估它们在 sintactic categories、lexical relations、semantic features、semantic similarity 和人类神经表示相似性方面的学习效果。我们发现，视觉supervision可以提高word learning的效率，但这些改进几乎完全局限于低数据规模，而且有时会被文本中的丰富分布式信号抵消。我们发现，文本和图像中的信息并不是重复的，模型主要驱动的视觉信息会导致模型的学习结果与主要驱动的word co-occurrences不同。然而，我们的结果表明，现有的多模态模型化方法无法有效地利用视觉信息，从人类化的数据规模上建立更人类化的word表示。

TempGNN: Temporal Graph Neural Networks for Dynamic Session-Based Recommendations

paper_url: http://arxiv.org/abs/2310.13249
repo_url: None
paper_authors: Eunkyu Oh, Taehun Kim
for: 这种研究旨在提高Session-based recommendation的准确率，通过更好地理解用户在短时间内的交互行为和item之间的相互关系。
methods: 该研究提出了Temporal Graph Neural Networks（TempGNN）模型，通过在动态Session图上使用时间嵌入Operator来捕捉交互行为的结构和时间层次结构。
results: 实验结果表明，TempGNN可以充分利用已有模型的优势，并且在两个真实世界电商数据集上达到了状态之冠的表现。

Abstract
Session-based recommendations which predict the next action by understanding a user's interaction behavior with items within a relatively short ongoing session have recently gained increasing popularity. Previous research has focused on capturing the dynamics of sequential dependencies from complicated item transitions in a session by means of recurrent neural networks, self-attention models, and recently, mostly graph neural networks. Despite the plethora of different models relying on the order of items in a session, few approaches have been proposed for dealing better with the temporal implications between interactions. We present Temporal Graph Neural Networks (TempGNN), a generic framework for capturing the structural and temporal dynamics in complex item transitions utilizing temporal embedding operators on nodes and edges on dynamic session graphs, represented as sequences of timed events. Extensive experimental results show the effectiveness and adaptability of the proposed method by plugging it into existing state-of-the-art models. Finally, TempGNN achieved state-of-the-art performance on two real-world e-commerce datasets.

摘要
SESSION-based recommendations, which predict the next action by understanding a user's interaction behavior with items within a relatively short ongoing session, have recently gained increasing popularity. Previous research has focused on capturing the dynamics of sequential dependencies from complicated item transitions in a session by means of recurrent neural networks, self-attention models, and recently, mostly graph neural networks. Despite the plethora of different models relying on the order of items in a session, few approaches have been proposed for dealing better with the temporal implications between interactions. We present Temporal Graph Neural Networks (TempGNN), a generic framework for capturing the structural and temporal dynamics in complex item transitions utilizing temporal embedding operators on nodes and edges on dynamic session graphs, represented as sequences of timed events. Extensive experimental results show the effectiveness and adaptability of the proposed method by plugging it into existing state-of-the-art models. Finally, TempGNN achieved state-of-the-art performance on two real-world e-commerce datasets.Here's the translation in Traditional Chinese as well:SESSION-based recommendations, which predict the next action by understanding a user's interaction behavior with items within a relatively short ongoing session, have recently gained increasing popularity. Previous research has focused on capturing the dynamics of sequential dependencies from complicated item transitions in a session by means of recurrent neural networks, self-attention models, and recently, mostly graph neural networks. Despite the plethora of different models relying on the order of items in a session, few approaches have been proposed for dealing better with the temporal implications between interactions. We present Temporal Graph Neural Networks (TempGNN), a generic framework for capturing the structural and temporal dynamics in complex item transitions utilizing temporal embedding operators on nodes and edges on dynamic session graphs, represented as sequences of timed events. Extensive experimental results show the effectiveness and adaptability of the proposed method by plugging it into existing state-of-the-art models. Finally, TempGNN achieved state-of-the-art performance on two real-world e-commerce datasets.

FLEE-GNN: A Federated Learning System for Edge-Enhanced Graph Neural Network in Analyzing Geospatial Resilience of Multicommodity Food Flows

paper_url: http://arxiv.org/abs/2310.13248
repo_url: https://github.com/geods/flee-gnn
paper_authors: Yuxiao Qu, Jinmeng Rao, Song Gao, Qianheng Zhang, Wei-Lun Chao, Yu Su, Michelle Miller, Alfonso Morales, Patrick Huber
for: 这篇论文旨在探讨如何使用 Federated Learning System for Edge-Enhanced Graph Neural Network (FLEE-GNN) 来解决食品供应网络的可恢复性问题，以提高食品安全性。
methods: 该论文提出了一种基于 Federated Learning 的方法，使用 Graph Neural Network (GNN) 来分析食品供应网络的可恢复性。这种方法结合了 GNN 的强大和鲁棒性，以及 Federated Learning 的隐私保护和分布式特点。
results: 该论文的实验结果表明，FLEE-GNN 可以有效地提高食品供应网络的可恢复性分析，并且可以应用于其他的空间网络中。

Abstract
Understanding and measuring the resilience of food supply networks is a global imperative to tackle increasing food insecurity. However, the complexity of these networks, with their multidimensional interactions and decisions, presents significant challenges. This paper proposes FLEE-GNN, a novel Federated Learning System for Edge-Enhanced Graph Neural Network, designed to overcome these challenges and enhance the analysis of geospatial resilience of multicommodity food flow network, which is one type of spatial networks. FLEE-GNN addresses the limitations of current methodologies, such as entropy-based methods, in terms of generalizability, scalability, and data privacy. It combines the robustness and adaptability of graph neural networks with the privacy-conscious and decentralized aspects of federated learning on food supply network resilience analysis across geographical regions. This paper also discusses FLEE-GNN's innovative data generation techniques, experimental designs, and future directions for improvement. The results show the advancements of this approach to quantifying the resilience of multicommodity food flow networks, contributing to efforts towards ensuring global food security using AI methods. The developed FLEE-GNN has the potential to be applied in other spatial networks with spatially heterogeneous sub-network distributions.

摘要
全球化的食品供应网络可靠性理解和测量是面临增长的食品不安全的全球需求。然而，这些网络的复杂性，包括多维度交互和决策，带来了重要的挑战。这篇论文提出了FLEE-GNN，一种新的联邦学习系统，用于增强地图分布式神经网络的地ospatial可靠性分析。FLEE-GNN将现有方法，如Entropy-based方法，超越了一致性、可扩展性和数据隐私方面的限制。它将图神经网络的可靠性和适应性与联邦学习的隐私性和分散性相结合，进行食品供应网络可靠性分析 Across geographical regions。本文还讨论了FLEE-GNN的创新数据生成技术、实验设计和未来改进方向。结果表明FLEE-GNN在多种 alimentary food flow networks 可靠性分析方面做出了进步，贡献到全球食品安全使用 AI 方法。发展的FLEE-GNN可以应用于其他的空间网络，具有空间不同互连分布。

Multi-level Contrastive Learning for Script-based Character Understanding

paper_url: http://arxiv.org/abs/2310.13231
repo_url: None
paper_authors: Dawei Li, Hengyuan Zhang, Yanran Li, Shiping Yang
for: 本研究目的是理解文本中的人物性格和身份，通过其讲话习惯了解其全面性。
methods: 我们提出了一种多级对比学习框架，用于捕捉人物的全面信息。我们进行了广泛的实验，与多种先进的语言模型进行比较，包括SpanBERT、Longformer、BigBird和ChatGPT-3.5。
results: 我们的方法可以大幅提高人物理解的性能，并通过进一步的分析，证明了我们的方法的有效性和解决了一些挑战。我们将在github上公开我们的工作，链接在https://github.com/David-Li0406/Script-based-Character-Understanding。

Abstract
In this work, we tackle the scenario of understanding characters in scripts, which aims to learn the characters' personalities and identities from their utterances. We begin by analyzing several challenges in this scenario, and then propose a multi-level contrastive learning framework to capture characters' global information in a fine-grained manner. To validate the proposed framework, we conduct extensive experiments on three character understanding sub-tasks by comparing with strong pre-trained language models, including SpanBERT, Longformer, BigBird and ChatGPT-3.5. Experimental results demonstrate that our method improves the performances by a considerable margin. Through further in-depth analysis, we show the effectiveness of our method in addressing the challenges and provide more hints on the scenario of character understanding. We will open-source our work on github at https://github.com/David-Li0406/Script-based-Character-Understanding.

摘要
在这项工作中，我们面临了剑道字符的理解enario，即从字符的讲话中学习其个性和身份。我们首先分析了这个enario中的一些挑战，然后提出了一种多级对比学习框架，以精细地捕捉字符的全球信息。为验证我们的方法，我们进行了广泛的实验，包括对三种字符理解子任务进行比较，其中包括SpanBERT、Longformer、BigBird和ChatGPT-3.5等强大的预训练语言模型。实验结果显示，我们的方法可以明显提高表现。通过进一步的深入分析，我们证明了我们的方法在面临挑战时的效果，并提供了更多有关字符理解scenario的提示。我们将在GitHub上开源我们的工作，可以在https://github.com/David-Li0406/Script-based-Character-Understanding中找到。

Absolute Policy Optimization

paper_url: http://arxiv.org/abs/2310.13230
repo_url: https://github.com/NhaPhatHanh/github
paper_authors: Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen, Tianhao Wei, Changliu Liu
for: 解决复杂控制任务和游戏场景中的策略优化问题
methods: 引入新的目标函数优化策略，并通过一系列近似算法简化实现
results: 在复杂 kontinuous control benchmark 任务和 Atari 游戏中显著超越现状强度策略优化算法，并在预期性和最坏性性能两个方面具备显著改进

Abstract
In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control over the worst-case performance outcomes. To address this limitation, we introduce a novel objective function; by optimizing which, it will lead to guaranteed monotonic improvement in the lower bound of near-total performance samples (absolute performance). Considering this groundbreaking theoretical advancement, we then refine this theoretically grounded algorithm through a series of approximations, resulting in a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach across challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO significantly outperforms state-of-the-art policy gradient algorithms, resulting in substantial improvements in both expected performance and worst-case performance.

摘要

Interpretable Deep Reinforcement Learning for Optimizing Heterogeneous Energy Storage Systems

paper_url: http://arxiv.org/abs/2310.14783
repo_url: None
paper_authors: Luolin Xiong, Yang Tang, Chensheng Liu, Shuai Mao, Ke Meng, Zhaoyang Dong, Feng Qian
for: 提高能量存储系统（ESS）在能源市场中的灵活性和可再生能源利用率，通过杂质太阳能-存储系统（PV-ESS），利用磷酸存储（BES）和氢能存储（HES）的特点。
methods: 开发了一个完整的成本函数，包括衰变、资本和运营维护成本，用于反映实际场景。同时，提出了一种可解释性高的prototype-based policy网络，通过人工设计的原型来引导决策，使得决策过程自然地具有解释性。
results: 在四个不同的案例中，比较了黑obox模型和我们提出的可解释性优化方法，结果表明，我们的方法具有更高的效果和实用性。

Abstract
Energy storage systems (ESS) are pivotal component in the energy market, serving as both energy suppliers and consumers. ESS operators can reap benefits from energy arbitrage by optimizing operations of storage equipment. To further enhance ESS flexibility within the energy market and improve renewable energy utilization, a heterogeneous photovoltaic-ESS (PV-ESS) is proposed, which leverages the unique characteristics of battery energy storage (BES) and hydrogen energy storage (HES). For scheduling tasks of the heterogeneous PV-ESS, cost description plays a crucial role in guiding operator's strategies to maximize benefits. We develop a comprehensive cost function that takes into account degradation, capital, and operation/maintenance costs to reflect real-world scenarios. Moreover, while numerous methods excel in optimizing ESS energy arbitrage, they often rely on black-box models with opaque decision-making processes, limiting practical applicability. To overcome this limitation and enable transparent scheduling strategies, a prototype-based policy network with inherent interpretability is introduced. This network employs human-designed prototypes to guide decision-making by comparing similarities between prototypical situations and encountered situations, which allows for naturally explained scheduling strategies. Comparative results across four distinct cases underscore the effectiveness and practicality of our proposed pre-hoc interpretable optimization method when contrasted with black-box models.

摘要
能量存储系统（ESS）是能源市场中的关键组件，同时作为能源供应者和消费者。ESS运营商可以通过优化存储设备的操作来获得利润。为了进一步提高ESS在能源市场中的灵活性和可再生能源利用率，我们提议了一种多种能量存储系统（PV-ESS），利用独特的电池能量存储（BES）和氢能存储（HES）特点。对于PV-ESS的调度任务，成本描述起到了关键作用，导引运营商的策略以最大化利润。我们开发了一个全面的成本函数，考虑了退化、资本和运营/维护成本，以准确反映实际场景。此外，虽然许多方法可以优化ESS的能源融合，但它们通常基于黑盒模型，即不可见的决策过程，限制了实际应用。为了缓解这一限制和提供透明的调度策略，我们引入了一种封装式政策网络，该网络使用人类设计的原型来引导决策，通过比较相似的原型 situación和遇到的 situación之间的相似性，以便自然地解释调度策略。对四个不同的案例进行比较结果表明，我们提出的先验可解释优化方法在黑盒模型的基础上显著超越了黑盒模型。

paper_url: http://arxiv.org/abs/2310.13227
repo_url: None
paper_authors: Yuchen Zhuang, Xiang Chen, Tong Yu, Saayan Mitra, Victor Bursztyn, Ryan A. Rossi, Somdeb Sarkhel, Chao Zhang
for: 这个论文的目的是提出一种高效的搜索算法，用于LLM-based自动代理在具有广泛的动作空间的问题上进行决策和规划。
methods: 这个论文使用的方法是基于搜索算法的树形搜索算法，具体来说是将整个动作空间转换为决策树，每个节点表示一个可能的API函数调用。这个算法利用A*搜索算法和任务特定的成本函数设计，高效地快速搜索最低成本的有效路径。
results: 实验结果表明，ToolChain* 算法可以高效地平衡探索和利用在具有广泛的动作空间中，并在计划和理解任务上比基eline表现出3.1%和3.5%的提升，同时需要7.35倍和2.31倍的时间。

Abstract
Large language models (LLMs) have demonstrated powerful decision-making and planning capabilities in solving complicated real-world problems. LLM-based autonomous agents can interact with diverse tools (e.g., functional APIs) and generate solution plans that execute a series of API function calls in a step-by-step manner. The multitude of candidate API function calls significantly expands the action space, amplifying the critical need for efficient action space navigation. However, existing methods either struggle with unidirectional exploration in expansive action spaces, trapped into a locally optimal solution, or suffer from exhaustively traversing all potential actions, causing inefficient navigation. To address these issues, we propose ToolChain*, an efficient tree search-based planning algorithm for LLM-based agents. It formulates the entire action space as a decision tree, where each node represents a possible API function call involved in a solution plan. By incorporating the A* search algorithm with task-specific cost function design, it efficiently prunes high-cost branches that may involve incorrect actions, identifying the most low-cost valid path as the solution. Extensive experiments on multiple tool-use and reasoning tasks demonstrate that ToolChain* efficiently balances exploration and exploitation within an expansive action space. It outperforms state-of-the-art baselines on planning and reasoning tasks by 3.1% and 3.5% on average while requiring 7.35x and 2.31x less time, respectively.

摘要

Scalable Neural Network Kernels

paper_url: http://arxiv.org/abs/2310.13225
repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
paper_authors: Arijit Sehanobish, Krzysztof Choromanski, Yunfan Zhao, Avinava Dubey, Valerii Likhosherstov
for: 这篇论文目标是提出一种可扩展神经网络核心（SNNK），用于取代常见的前向传输层（FFL），能够approximate FFL，但具有更有利的计算性质。
methods: 这篇论文使用了SNNK，它可以分离输入和参数之间的关系，只在最终计算中via点积kernel进行连接。它们还是FFL的约等价表示，可以模型输入和参数之间的复杂关系。此外，文章还提出了神经网络集成过程，通过应用SNNK来压缩深度神经网络架构，从而获得额外的压缩效果。在极端情况下，这会导致完全集成的神经网络，其优化参数可以通过explicit的方程表示出来，打开了一个可以 circumvent backpropagation的可能性。
results: 文章提供了严格的理论分析和广泛的实验评估，从点积kernel估计到Transformers的精细调整with novel adapter layers inspired by SNNKs。结果表明，使用SNNK可以减少训练参数的数量，保持竞争性的准确性。

Abstract
We introduce the concept of scalable neural network kernels (SNNKs), the replacements of regular feedforward layers (FFLs), capable of approximating the latter, but with favorable computational properties. SNNKs effectively disentangle the inputs from the parameters of the neural network in the FFL, only to connect them in the final computation via the dot-product kernel. They are also strictly more expressive, as allowing to model complicated relationships beyond the functions of the dot-products of parameter-input vectors. We also introduce the neural network bundling process that applies SNNKs to compactify deep neural network architectures, resulting in additional compression gains. In its extreme version, it leads to the fully bundled network whose optimal parameters can be expressed via explicit formulae for several loss functions (e.g. mean squared error), opening a possibility to bypass backpropagation. As a by-product of our analysis, we introduce the mechanism of the universal random features (or URFs), applied to instantiate several SNNK variants, and interesting on its own in the context of scalable kernel methods. We provide rigorous theoretical analysis of all these concepts as well as an extensive empirical evaluation, ranging from point-wise kernel estimation to Transformers' fine-tuning with novel adapter layers inspired by SNNKs. Our mechanism provides up to 5x reduction in the number of trainable parameters, while maintaining competitive accuracy.

摘要
我们介绍了扩展神经网络底层（SNNK），它们取代了常规传递层（FFL），并能够模拟它们，但具有更有利的计算特性。SNNK将输入与神经网络参数分离开来，仅在最终计算中通过点产生kernel来连接。它们还是严格更加表达力强，因为它们可以模型复杂的关系，超过传递参数vector的数统产生器。我们还介绍了将SNNK应用于压缩深度神经网络架构的神经网络卷绿（bundling）过程，从而获得额外的压缩成本。在极端版本下，它导致完全卷绿网络，其最佳参数可以通过明确的公式表示出来，实现了几个损失函数（例如平均方差）的最佳化。作为一个副作用，我们介绍了内在特征随机抽象（URF）的机制，它可以实现多种SNNK的实例，并且在扩展层神经网络方面具有兴趣。我们对这些概念进行了严谨的理论分析，以及广泛的实验评估，从单元测度到Transformers的精细调整 avec novel adapter layers inspired by SNNKs。我们的机制可以实现5倍的受训练 парамет数减少，保持竞争性的精度。

HierCas: Hierarchical Temporal Graph Attention Networks for Popularity Prediction in Information Cascades

paper_url: http://arxiv.org/abs/2310.13219
repo_url: https://github.com/Daisy-zzz/HierCas
paper_authors: Zhizhen Zhang, Xiaohui Xie, Yishuo Zhang, Lanshan Zhang, Yong Jiang
for: 预测信息冲击的吸引力，用于识别假新闻和提供准确的推荐。
methods: 使用神经网络方法，不同于传统的特征基于方法，它们具有域特定的特性和新领域的不适应性。
results: 在两个实际 dataset 上，对比州的方法，提出了一种新的框架called Hierarchical Temporal Graph Attention Networks for cascade popularity prediction (HierCas)，实现了更高的准确率和更好的性能。

Abstract
Information cascade popularity prediction is critical for many applications, including but not limited to identifying fake news and accurate recommendations. Traditional feature-based methods heavily rely on handcrafted features, which are domain-specific and lack generalizability to new domains. To address this problem, researchers have turned to neural network-based approaches. However, existing methods follow a sampling-based modeling approach, potentially losing continuous dynamic information and structural-temporal dependencies that emerge during the information diffusion process. In this paper, we propose a novel framework called Hierarchical Temporal Graph Attention Networks for cascade popularity prediction (HierCas). Unlike existing methods, HierCas operates on the entire cascade graph by a dynamic graph modeling approach, enabling it to capture the full range of continuous dynamic information and explicitly model the interplay between structural and temporal factors. By leveraging time-aware node embedding, graph attention mechanisms and hierarchical pooling structures, HierCas effectively captures the popularity trend implicit in the complex cascade. Extensive experiments conducted on two real-world datasets in different scenarios demonstrate that our HierCas significantly outperforms the state-of-the-art approaches.

摘要
信息带动 popularity 预测是许多应用程序中的关键任务，包括但不限于识别假新闻和准确推荐。传统的特征基于方法依赖于手工设计的特征，这些特征是域特定的，缺乏对新领域的普适性。为解决这个问题，研究人员转向神经网络基于方法。然而，现有方法采用采样基本的模型化方法，可能会失去流行化过程中的连续动态信息和结构时间依赖关系。在本文中，我们提出了一种新的框架，即层次时间图注意力网络（HierCas）。与现有方法不同，HierCas 在整个带动图上运行，能够捕捉整个流行化过程中的连续动态信息，并且可以显式地模型结构时间因素之间的交互作用。通过利用时间意识节点嵌入、图注意力机制和层次聚合结构，HierCas 能够有效地捕捉流行趋势的隐含信息。我们在两个实际场景中进行了大量实验，结果表明，我们的 HierCas 在比较 estado-of-the-art 方法的情况下显著 OUTPERFORM。

MultiCoNER v2: a Large Multilingual dataset for Fine-grained and Noisy Named Entity Recognition

paper_url: http://arxiv.org/abs/2310.13213
repo_url: None
paper_authors: Besnik Fetahu, Zhiyu Chen, Sudipta Kar, Oleg Rokhlenko, Shervin Malmasi
for: 本文提出了一个新的名实体识别（NER）数据集，即MULTICONER V2，用于解决细化的名实体识别问题。
methods: 本文使用了开源资源如Wikipedia和Wikidata来编译数据集，并在多语言环境下进行了评估。
results: evaluation表明，MULTICONER V2数据集具有较低的精度，macro-F1=0.63（所有语言），并且对实体干扰具有更大的影响，相比于上下文干扰。

Abstract
We present MULTICONER V2, a dataset for fine-grained Named Entity Recognition covering 33 entity classes across 12 languages, in both monolingual and multilingual settings. This dataset aims to tackle the following practical challenges in NER: (i) effective handling of fine-grained classes that include complex entities like movie titles, and (ii) performance degradation due to noise generated from typing mistakes or OCR errors. The dataset is compiled from open resources like Wikipedia and Wikidata, and is publicly available. Evaluation based on the XLM-RoBERTa baseline highlights the unique challenges posed by MULTICONER V2: (i) the fine-grained taxonomy is challenging, where the scores are low with macro-F1=0.63 (across all languages), and (ii) the corruption strategy significantly impairs performance, with entity corruption resulting in 9% lower performance relative to non-entity corruptions across all languages. This highlights the greater impact of entity noise in contrast to context noise.

摘要
我们介绍MULTICONER V2 dataset，用于细化Named Entity Recognition（NER），覆盖12种语言和33种实体类。该dataset的目标是解决NER中的两个实际挑战：（i）精细类实体，如电影标题，的处理，（ii）由 typing mistakes 或 OCR errors 生成的噪声的影响。dataset 从开源资源Wikipedia和Wikidata中 compile，并公共可用。使用XLM-RoBERTa基eline进行评估，显示MULTICONER V2具有以下唯一挑战：（i）细化分类困难，macro-F1 = 0.63（所有语言），（ii）损害策略对性能产生显著影响，实体损害相对于非实体损害在所有语言上下降9%。这反映了实体噪声比 контекст噪声更大的影响。

Primacy Effect of ChatGPT

paper_url: http://arxiv.org/abs/2310.13206
repo_url: None
paper_authors: Yiwei Wang, Yujun Cai, Muhao Chen, Yuxuan Liang, Bryan Hooi
For: The paper studies the primacy effect of ChatGPT, which is the tendency of selecting the labels at earlier positions as the answer.* Methods: The paper uses ChatGPT to query the model with prompts containing the question and candidate labels, and analyzes the model’s decision-making process.* Results: The paper finds that ChatGPT’s decision is sensitive to the order of labels in the prompt, and the model has a higher chance of selecting the labels at earlier positions as the answer.Here’s the simplified Chinese text for the three information points:* For: 这篇论文研究了ChatGPT模型中的主导效应，即在提问中选择早些位置的标签作为答案。* Methods: 论文使用ChatGPT模型来查询提问和候选标签，并分析模型的决策过程。* Results: 论文发现ChatGPT的决策受提问中标签顺序的影响，模型更有可能选择提问中早些位置的标签作为答案。

Abstract
Instruction-tuned large language models (LLMs), such as ChatGPT, have led to promising zero-shot performance in discriminative natural language understanding (NLU) tasks. This involves querying the LLM using a prompt containing the question, and the candidate labels to choose from. The question-answering capabilities of ChatGPT arise from its pre-training on large amounts of human-written text, as well as its subsequent fine-tuning on human preferences, which motivates us to ask: Does ChatGPT also inherits humans' cognitive biases? In this paper, we study the primacy effect of ChatGPT: the tendency of selecting the labels at earlier positions as the answer. We have two main findings: i) ChatGPT's decision is sensitive to the order of labels in the prompt; ii) ChatGPT has a clearly higher chance to select the labels at earlier positions as the answer. We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions. We release the source code at https://github.com/wangywUST/PrimacyEffectGPT.

摘要
大型语言模型（LLM），如ChatGPT，在推导性自然语言理解（NLU）任务中表现出了可喜的零shot性。这 involves 使用提示符 containing 问题和候选标签，并让 LLM 回答问题。 ChatGPT 的问答能力来自于其在大量人类写的文本上进行预训练，以及其后续的人类偏好的 fine-tuning，这使我们想问： ChatGPT 也继承了人类的认知偏见吗？在这篇论文中，我们研究 ChatGPT 的 primacy effect：提示符中标签的顺序对 ChatGPT 的决策产生影响。我们有两个主要发现：i）ChatGPT 的决策受提示符中标签的顺序影响；ii）ChatGPT 对提示符中早期位置的标签有明显更高的选择概率。我们希望通过我们的实验和分析，为建立更可靠的 ChatGPT-based 解决方案提供更多的意见。我们在 GitHub 上发布了源代码，请参考。

Towards Detecting Contextual Real-Time Toxicity for In-Game Chat

paper_url: http://arxiv.org/abs/2310.18330
repo_url: None
paper_authors: Zachary Yang, Nicolas Grenan-Godbout, Reihaneh Rabbany
for: 这篇论文是为了实现在在线环境中实时探测毒性内容而写的。
methods: 这篇论文使用了一种简单可扩展的模型，该模型可以在实时聊天中可靠地检测毒性内容，并包括聊天历史和元数据。
results: 该模型在多player游戏中表现出色，可以成功地检测毒性内容，并且可以在聊天报告后进行后勤调节，成功标记82.1%的聊天报告用户，准确率为90.0%。

Abstract
Real-time toxicity detection in online environments poses a significant challenge, due to the increasing prevalence of social media and gaming platforms. We introduce ToxBuster, a simple and scalable model that reliably detects toxic content in real-time for a line of chat by including chat history and metadata. ToxBuster consistently outperforms conventional toxicity models across popular multiplayer games, including Rainbow Six Siege, For Honor, and DOTA 2. We conduct an ablation study to assess the importance of each model component and explore ToxBuster's transferability across the datasets. Furthermore, we showcase ToxBuster's efficacy in post-game moderation, successfully flagging 82.1% of chat-reported players at a precision level of 90.0%. Additionally, we show how an additional 6% of unreported toxic players can be proactively moderated.

摘要
实时恶意检测在在线环境中具有重要挑战，由于社交媒体和游戏平台的广泛使用。我们介绍ToxBuster，一种简单可扩展的模型，可靠地在实时中检测恶意内容，并包括交谈历史和元数据。ToxBuster在流行的多人游戏中，如雨丝六世、荣誉之战和DOTA 2等，consistently outperforms conventional toxicity models。我们进行了减少研究，以评估模型组件的重要性，并探索ToxBuster的可转移性。此外，我们展示了ToxBuster在后期 Moderation 中的效果，成功地标记了82.1%的交谈报告用户，准确率为90.0%。此外，我们还表明了一个额外的6%的恶意用户可以被早期 Moderation。

2023-10-20

cs.CL

cs.CL - 2023-10-20

Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines

paper_url: http://arxiv.org/abs/2310.13859
repo_url: None
paper_authors: Yoo Yeon Sung, Jordan Boyd-Graber, Naeemul Hassan
for: 本研究旨在提供一个多模态视频谎导标注数据集，以便更好地检测视频标题的谎导性。
methods: 该研究使用了现有资源的多模态基线测试方法，并对标题的谎导性进行了分析。
results: 研究发现，基线测试方法在检测视频标题的谎导性方面具有较高的准确率。此外，对标题的谎导性进行了更深入的分析，可以更好地理解annotators的背景和视频内容之间的关系。

Abstract
Polarization and the marketplace for impressions have conspired to make navigating information online difficult for users, and while there has been a significant effort to detect false or misleading text, multimodal datasets have received considerably less attention. To complement existing resources, we present multimodal Video Misleading Headline (VMH), a dataset that consists of videos and whether annotators believe the headline is representative of the video's contents. After collecting and annotating this dataset, we analyze multimodal baselines for detecting misleading headlines. Our annotation process also focuses on why annotators view a video as misleading, allowing us to better understand the interplay of annotators' background and the content of the videos.

摘要
政治化和市场化的影响使得在线信息搜索变得更加困难，虽然有很大努力来检测false或误导性的文本，但多媒体数据集获得了相对较少的关注。为了补充现有资源，我们提供了多媒体视频误导头条（VMH）数据集，该数据集包括视频和头条是否正确表示视频内容。我们收集和标注这个数据集后，分析多媒体基线 для检测误导头条。我们的注释过程还关注了annotators的背景和视频内容之间的交互，帮助我们更好地理解annotators的背景和视频内容之间的关系。

Implications of Annotation Artifacts in Edge Probing Test Datasets

paper_url: http://arxiv.org/abs/2310.13856
repo_url: https://github.com/josh1108/eptest
paper_authors: Sagnik Ray Choudhury, Jushaan Kalra
for: 这个论文旨在检验语言模型（LLM）中的语法知识是否被编码在字符表示中。
methods: 论文使用了edge probing测试来检验LLM的语法知识，并对常用的edge probing测试集合进行了分析，发现这些测试集合带有各种偏见。
results: 论文发现，当暴露出偏见后，LLMencoder和随机编码器之间的差异变得更加明显，而使用信息理论 probes 可以更好地检验LLM的语法知识。

Abstract
Edge probing tests are classification tasks that test for grammatical knowledge encoded in token representations coming from contextual encoders such as large language models (LLMs). Many LLM encoders have shown high performance in EP tests, leading to conjectures about their ability to encode linguistic knowledge. However, a large body of research claims that the tests necessarily do not measure the LLM's capacity to encode knowledge, but rather reflect the classifiers' ability to learn the problem. Much of this criticism stems from the fact that often the classifiers have very similar accuracy when an LLM vs a random encoder is used. Consequently, several modifications to the tests have been suggested, including information theoretic probes. We show that commonly used edge probing test datasets have various biases including memorization. When these biases are removed, the LLM encoders do show a significant difference from the random ones, even with the simple non-information theoretic probes.

摘要
edge probing 测试是一种分类任务，用于测试语言模型（LLM）中的 grammatical 知识编码。许多 LLM 编码器在 EP 测试中表现出色，导致人们对它们的语言知识编码能力的推测。然而，许多研究表明，这些测试并不测试 LLM 的语言知识编码能力，而是测试分类器对问题的学习能力。这些批评的原因在于，经常情况下，使用 LLM 和随机编码器时，分类器的准确率几乎相同。为了解决这个问题，有许多修改的建议，包括信息理论的探测。我们发现，常用的 Edge probing 测试数据集具有各种偏见，包括记忆。当这些偏见被除去后，LLM 编码器与随机编码器之间的差别变得更加明显，жеlack 简单的非信息理论的探测也能够准确地捕捉这个差别。

Ecologically Valid Explanations for Label Variation in NLI

paper_url: http://arxiv.org/abs/2310.13850
repo_url: https://github.com/njjiang/livenli
paper_authors: Nan-Jiang Jiang, Chenhao Tan, Marie-Catherine de Marneffe
for: 这个论文是为了解决自然语言推理（NLI）任务中的人类标注差异（annotation disagreement）问题。
methods: 作者们使用了一个英文 dataset，名为 LiveNLI，包含 1,415 个生动的解释（annotators explain the NLI labels they chose），以及 122 个 MNLI 项目（每个项目都有至少 10 个解释）。
results: 研究发现，LiveNLI 的解释确实证明了人们可以系统性地有不同的解释，并且在同一个标签下存在内部差异：annotators 可能选择相同的标签，但是有不同的理由。这表明，解释是在总体上 navigation 标签 интер�прета读取的关键。然而，通过几个 prompt 测试，作者发现大语言模型可以生成有效和有用的解释，但也可能生成不合理的解释，这表明了进一步改进的方向。

Abstract
Human label variation, or annotation disagreement, exists in many natural language processing (NLP) tasks, including natural language inference (NLI). To gain direct evidence of how NLI label variation arises, we build LiveNLI, an English dataset of 1,415 ecologically valid explanations (annotators explain the NLI labels they chose) for 122 MNLI items (at least 10 explanations per item). The LiveNLI explanations confirm that people can systematically vary on their interpretation and highlight within-label variation: annotators sometimes choose the same label for different reasons. This suggests that explanations are crucial for navigating label interpretations in general. We few-shot prompt large language models to generate explanations but the results are inconsistent: they sometimes produces valid and informative explanations, but it also generates implausible ones that do not support the label, highlighting directions for improvement.

摘要
人类标签变化，或者注释不一致，是许多自然语言处理（NLP）任务中的常见问题，包括自然语言推理（NLI）。为了获得直接证据，我们建立了LiveNLI，一个英语dataset，包含1,415个生动有效的解释（拟标者解释选择的NLI标签），对122个MNLI项目进行了至少10个解释。LiveNLI解释表明，人们可以系统地变化其 интерпретаion，并且在标签内部存在差异：拟标者经常选择同一个标签，但是由不同的理由。这表明，解释是在标签 интерпретаion中 Navigation 的关键。我们使用几个描述符提示大型自然语言模型生成解释，但是结果是不一致的：它们有时生成有效和有用的解释，但也可能生成不可能的解释，不支持标签， highlighting 改进的方向。

Foundation Model’s Embedded Representations May Detect Distribution Shift

paper_url: http://arxiv.org/abs/2310.13836
repo_url: None
paper_authors: Adam Tsou, Max Vargas, Andrew Engel, Tony Chiang
for: 这个研究探讨了深度学习模型在不同任务和环境中进行转移学习（TL）时，模型的泛化能力是否受到训练和测试数据集之间的分布偏移影响。
methods: 作者使用了一个预训练的 GPT-2 模型，并将其转移到 Sentiment140 数据集上进行 sentiment classification。
results: 作者发现，Sentiment140 的测试数据集 $M$ 不是从同一个分布中采样的，因此训练于 $P$ 并测试于 $M$ 不能准确地衡量模型在 sentiment classification 中的泛化能力。

Abstract
Distribution shifts between train and test datasets obscure our ability to understand the generalization capacity of neural network models. This topic is especially relevant given the success of pre-trained foundation models as starting points for transfer learning (TL) models across tasks and contexts. We present a case study for TL on a pre-trained GPT-2 model onto the Sentiment140 dataset for sentiment classification. We show that Sentiment140's test dataset $M$ is not sampled from the same distribution as the training dataset $P$, and hence training on $P$ and measuring performance on $M$ does not actually account for the model's generalization on sentiment classification.

摘要
发布分布Shift between train和test datasets obscures our ability to understand the generalization capacity of neural network models. This topic is especially relevant given the success of pre-trained foundation models as starting points for transfer learning (TL) models across tasks and contexts. We present a case study for TL on a pre-trained GPT-2 model onto the Sentiment140 dataset for sentiment classification. We show that Sentiment140's test dataset $M$ is not sampled from the same distribution as the training dataset $P$, and hence training on $P$ and measuring performance on $M$ does not actually account for the model's generalization on sentiment classification.Here's the breakdown of the translation:* 发布 (fābù) - distribution* 分布 (bìbù) - shift* between (between) - between* train (train) - train* and (and) - and* test (test) - test* datasets (dataset) - datasets* obscures (obscures) - obscures* our (our) - our* ability (ability) - ability* to (to) - to* understand (understand) - understand* the (the) - the* generalization (generalization) - generalization* capacity (capacity) - capacity* of (of) - of* neural (neural) - neural* network (network) - network* models (model) - models* This (this) - this* topic (topic) - topic* is (is) - is* especially (especially) - especially* relevant (relevant) - relevant* given (given) - given* the (the) - the* success (success) - success* of (of) - of* pre-trained (pre-trained) - pre-trained* foundation (foundation) - foundation* models (model) - models* as (as) - as* starting (starting) - starting* points (points) - points* for (for) - for* transfer (transfer) - transfer* learning (learning) - learning* (TL) (TL) - TL* models (model) - models* across (across) - across* tasks (task) - tasks* and (and) - and* contexts (contexts) - contexts* We (we) - we* present (present) - present* a (a) - a* case (case) - case* study (study) - study* for (for) - for* TL (TL) - TL* on (on) - on* a (a) - a* pre-trained (pre-trained) - pre-trained* GPT-2 (GPT-2) - GPT-2* model (model) - model* onto (onto) - onto* the (the) - the* Sentiment140 (Sentiment140) - Sentiment140* dataset (dataset) - dataset* for (for) - for* sentiment (sentiment) - sentiment* classification (classification) - classification* We (we) - we* show (show) - show* that (that) - that* Sentiment140's (Sentiment140's) - Sentiment140's* test (test) - test* dataset (dataset) - dataset* $M$ (M) - M* is (is) - is* not (not) - not* sampled (sampled) - sampled* from (from) - from* the (the) - the* same (same) - same* distribution (distribution) - distribution* as (as) - as* the (the) - the* training (training) - training* dataset (dataset) - dataset* $P$ (P) - P* and (and) - and* hence (hence) - hence* training (training) - training* on (on) - on* $P$ (P) - P* and (and) - and* measuring (measuring) - measuring* performance (performance) - performance* on (on) - on* $M$ (M) - M* does (do) - does* not (not) - not* actually (actually) - actually* account (account) - account* for (for) - for* the (the) - the* model's (model's) - model's* generalization (generalization) - generalization* on (on) - on* sentiment (sentiment) - sentiment* classification (classification) - classification

Plausibility Processing in Transformer Language Models: Focusing on the Role of Attention Heads in GPT

paper_url: http://arxiv.org/abs/2310.13824
repo_url: https://github.com/soohyunryu/plausibility-processing-transformers
paper_authors: Soo Hyun Ryu
for: 本研究旨在探索transformer语言模型如何处理 semantics知识,特别是关于名动词关系的可能性。
methods: 本研究使用GPT2语言模型进行实验,通过分析GPT2的注意头来探讨它如何处理可能性。
results: 研究发现GPT2在可能性处理方面与人类更相似,并且在注意头中包含了知识的可能性信息。此外，研究还发现GPT2中的注意头可以共同影响语言模型的可能性处理能力,但各个注意头的可能性检测性能与其贡献相对强度不一致。

Abstract
The goal of this paper is to explore how Transformer language models process semantic knowledge, especially regarding the plausibility of noun-verb relations. First, I demonstrate GPT2 exhibits a higher degree of similarity with humans in plausibility processing compared to other Transformer language models. Next, I delve into how knowledge of plausibility is contained within attention heads of GPT2 and how these heads causally contribute to GPT2's plausibility processing ability. Through several experiments, it was found that: i) GPT2 has a number of attention heads that detect plausible noun-verb relationships; ii) these heads collectively contribute to the Transformer's ability to process plausibility, albeit to varying degrees; and iii) attention heads' individual performance in detecting plausibility does not necessarily correlate with how much they contribute to GPT2's plausibility processing ability.

摘要
本文的目的是探讨 transformer 语言模型在Semantic Knowledge 处理方面的表现，特别是 noun-verb 关系的可能性。首先，我展示 GPT2 与人类更相似在可能性处理方面的表现。然后，我探究 GPT2 中可能性知识的含义以及这些知识如何通过注意头来影响 GPT2 的可能性处理能力。通过一些实验，我发现：1. GPT2 有许多检测可能性 noun-verb 关系的注意头;2. 这些注意头共同 contribuite 到 transformer 的可能性处理能力, 虽然不同的注意头在可能性处理中的表现不同;3. 注意头的个体表现在检测可能性方面与 GPT2 的可能性处理能力相互关系不一定。

Yet Another Model for Arabic Dialect Identification

paper_url: http://arxiv.org/abs/2310.13812
repo_url: None
paper_authors: Ajinkya Kulkarni, Hanan Aldarmaki
for: 这个研究旨在开发一个用于阿拉伯语口语识别（ADI）模型，能够在两个标准数据集上（ADI-5和ADI-17）上 consistently outperform previously published results。
methods: 该模型采用了两种不同的架构变体：ResNet和ECAPA-TDNN，以及两种不同的声学特征：MFCCs和自动提取的UniSpeech-SAT Large特征，以及这些特征的混合。
results: 研究发现，ECAPA-TDNN网络单独使用表现比ResNet更高，而使用UniSpeech-SAT特征比MFCCs更高。此外，混合所有变体的模型一直outperform 单独的模型。最佳模型的准确率为84.7%和96.9%。

Abstract
In this paper, we describe a spoken Arabic dialect identification (ADI) model for Arabic that consistently outperforms previously published results on two benchmark datasets: ADI-5 and ADI-17. We explore two architectural variations: ResNet and ECAPA-TDNN, coupled with two types of acoustic features: MFCCs and features exratected from the pre-trained self-supervised model UniSpeech-SAT Large, as well as a fusion of all four variants. We find that individually, ECAPA-TDNN network outperforms ResNet, and models with UniSpeech-SAT features outperform models with MFCCs by a large margin. Furthermore, a fusion of all four variants consistently outperforms individual models. Our best models outperform previously reported results on both datasets, with accuracies of 84.7% and 96.9% on ADI-5 and ADI-17, respectively.

摘要
在这篇论文中，我们描述了一个用于阿拉伯语言分类（ADI）模型，该模型在两个标准数据集上（ADI-5和ADI-17）上表现出色， persistently 超越了之前发表的结果。我们研究了两种建筑方案：ResNet和ECAPA-TDNN，同时使用了两种声学特征：MFCC和UniSpeech-SAT Large自动学习模型中提取的特征。我们发现，ECAPA-TDNN网络单独使用表现 луч于ResNet，而使用UniSpeech-SAT特征的模型比使用MFCC特征的模型表现出了大幅提升。此外，我们发现将所有四种变体进行混合，可以一直保持模型的表现。我们的最佳模型在ADI-5和ADI-17数据集上的准确率分别为84.7%和96.9%，这些结果比之前报道的结果更高。

Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks

paper_url: http://arxiv.org/abs/2310.13800
repo_url: https://github.com/protagolabs/seq2seq_llm_evaluation
paper_authors: Andrea Sottana, Bin Liang, Kai Zou, Zheng Yuan
for: 该研究旨在提高当前生成模型性能的评估方法，通过对多种开源和关闭源生成语言模型（LLMs）在文本概要、简化和 grammatical error correction（GEC）三个 NATLP 标准准则上进行 préliminaire 和混合评估。
methods: 该研究使用了自动和人工评估方法来评估多种生成模型的性能，包括 GPT-4 作为评估器。
results: 研究发现，ChatGPT 在人工评估中常常超过许多其他流行模型，但在经典自动评估指标上得分很低。此外，人工评估人员认为金标准样本质量较差，而且模型输出与人工评估人员的判断相对较少吻合。最后，研究发现，GPT-4 可以reasonably closely align with human judgment across tasks, with a lower alignment in the GEC task.

Abstract
Large Language Models (LLMs) evaluation is a patchy and inconsistent landscape, and it is becoming clear that the quality of automatic evaluation metrics is not keeping up with the pace of development of generative models. We aim to improve the understanding of current models' performance by providing a preliminary and hybrid evaluation on a range of open and closed-source generative LLMs on three NLP benchmarks: text summarisation, text simplification and grammatical error correction (GEC), using both automatic and human evaluation. We also explore the potential of the recently released GPT-4 to act as an evaluator. We find that ChatGPT consistently outperforms many other popular models according to human reviewers on the majority of metrics, while scoring much more poorly when using classic automatic evaluation metrics. We also find that human reviewers rate the gold reference as much worse than the best models' outputs, indicating the poor quality of many popular benchmarks. Finally, we find that GPT-4 is capable of ranking models' outputs in a way which aligns reasonably closely to human judgement despite task-specific variations, with a lower alignment in the GEC task.

摘要
大型语言模型（LLM）的评估是一个含糊不清的景象，而且现在的自动评估指标质量不能跟上生成模型的发展。我们想要提高当前模型的表现理解，我们提供了一些先进的混合评估方法，在多种开源和关闭源生成LLM上进行了三个NLPbenchmark：文本概要、文本简化和语法错误修复（GEC）。我们还探索了最近发布的GPT-4是否可以作为评估器。我们发现，ChatGPT在人工评分者的评估中一直表现出色，而自动评估指标中的评分则远低。我们还发现，人工评分者评估金标 referencemuch worse than最佳模型的输出，这表明许多流行的benchmark的质量不高。最后，我们发现GPT-4可以在不同任务上对模型的输出进行排序，与人类判断相对吻合，但在GEC任务中的吻合较低。

A Unified View of Evaluation Metrics for Structured Prediction

paper_url: http://arxiv.org/abs/2310.13793
repo_url: https://github.com/wanmok/metametric
paper_authors: Yunmo Chen, William Gantt, Tongfei Chen, Aaron Steven White, Benjamin Van Durme
for: 这 paper 旨在提供一个概念框架，用于统一不同结构预测任务（如事件和关系抽取、语法和 semantics 解析）的评估指标。
methods: 该框架基于对输出结果的对象化表示，并通过匹配共同结构来 derivation 评估指标，可能会进行Normalization。
results: 作者示出了一些任务的常用指标可以简洁地表达为该框架中的一部分，并且可以在底层上通过输出结构来自然地 derivation 新的指标。同时，作者还提出了一些任务特点所带来的指标设计决策，并对现有指标进行修改。

Abstract
We present a conceptual framework that unifies a variety of evaluation metrics for different structured prediction tasks (e.g. event and relation extraction, syntactic and semantic parsing). Our framework requires representing the outputs of these tasks as objects of certain data types, and derives metrics through matching of common substructures, possibly followed by normalization. We demonstrate how commonly used metrics for a number of tasks can be succinctly expressed by this framework, and show that new metrics can be naturally derived in a bottom-up way based on an output structure. We release a library that enables this derivation to create new metrics. Finally, we consider how specific characteristics of tasks motivate metric design decisions, and suggest possible modifications to existing metrics in line with those motivations.

摘要
我们提出了一个概念框架，它可以统一不同的结构预测任务（如事件和关系抽取、 sintactic和semantic 分析）的评估指标。我们的框架需要将这些任务的输出表示为特定数据类型的对象，然后通过匹配通用的子结构来 derivate 指标，可能会 seguido de normalización。我们示例了一些任务上常用的指标可以简洁地表达在我们的框架中，并证明新的指标可以从输出结构的底层方式 derivation。我们释放了一个库，它可以帮助 derivation 新的指标。最后，我们考虑了特定任务的特征如何驱动指标设计选择，并建议可能的修改以适应这些驱动力。

How Much Consistency Is Your Accuracy Worth?

paper_url: http://arxiv.org/abs/2310.13781
repo_url: https://github.com/NitikaRaj1/bug-free-goggles
paper_authors: Jacob K. Johnson, Ana Marasović
for: 评估模型的一致性和稳定性
methods: 使用对Minimally Different Examples(MDEs)的评估，并引入相对一致性概率来衡量模型的一致性
results: 提出了一种新的一致性评估方法，并发现模型的一致性和稳定性可以通过Relative Consistency来进行评估，并且模型的100%相对一致性可以达到一致性峰值。

Abstract
Contrast set consistency is a robustness measurement that evaluates the rate at which a model correctly responds to all instances in a bundle of minimally different examples relying on the same knowledge. To draw additional insights, we propose to complement consistency with relative consistency -- the probability that an equally accurate model would surpass the consistency of the proposed model, given a distribution over possible consistencies. Models with 100% relative consistency have reached a consistency peak for their accuracy. We reflect on prior work that reports consistency in contrast sets and observe that relative consistency can alter the assessment of a model's consistency compared to another. We anticipate that our proposed measurement and insights will influence future studies aiming to promote consistent behavior in models.

摘要
“对比集合一致性”是一种Robustness度量，用于评估模型在一组最小差异示例中具有相同知识的情况下，其是否能够正确回应所有示例。为了增加更多的洞察，我们提议在consistency的基础上添加相对一致性——模型的准确率在可能的一致性 Distribution 中的概率。如果模型的相对一致性达到100%，则表示它已经达到了准确率的峰值。我们回顾先前的研究，发现consistency在对比集合中的报告和相对一致性可能会改变模型的一致性评估。我们预计，我们所提出的度量和洞察将影响未来关于模型行为的一致性的研究。

Seq2seq is All You Need for Coreference Resolution

paper_url: http://arxiv.org/abs/2310.13774
repo_url: https://github.com/wenzhengzhang/seq2seqcoref
paper_authors: Wenzheng Zhang, Sam Wiseman, Karl Stratos
for: This paper aims to challenge the assumption that task-specific models are necessary for coreference resolution, and instead, presents a simple and effective approach using a pre-trained seq2seq transformer.
methods: The proposed method finetunes a pre-trained seq2seq transformer to map an input document to a tagged sequence encoding the coreference annotation, and an especially simple seq2seq approach that generates only tagged spans rather than the spans interleaved with the original text.
results: The model outperforms or closely matches the best coreference systems in the literature on an array of datasets, and the analysis shows that the model size, the amount of supervision, and the choice of sequence representations are key factors in performance.

Abstract
Existing works on coreference resolution suggest that task-specific models are necessary to achieve state-of-the-art performance. In this work, we present compelling evidence that such models are not necessary. We finetune a pretrained seq2seq transformer to map an input document to a tagged sequence encoding the coreference annotation. Despite the extreme simplicity, our model outperforms or closely matches the best coreference systems in the literature on an array of datasets. We also propose an especially simple seq2seq approach that generates only tagged spans rather than the spans interleaved with the original text. Our analysis shows that the model size, the amount of supervision, and the choice of sequence representations are key factors in performance.

摘要
existing works on coreference resolution suggest that task-specific models are necessary to achieve state-of-the-art performance. In this work, we present compelling evidence that such models are not necessary. We fine-tune a pre-trained seq2seq transformer to map an input document to a tagged sequence encoding the coreference annotation. Despite the extreme simplicity, our model outperforms or closely matches the best coreference systems in the literature on an array of datasets. We also propose an especially simple seq2seq approach that generates only tagged spans rather than the spans interleaved with the original text. Our analysis shows that the model size, the amount of supervision, and the choice of sequence representations are key factors in performance.Here's the translation in Traditional Chinese:现有的核心引用解析工作都建议需要任务特定的模型来 дости持最佳性能。在这个工作中，我们提供了吸引人的证据，说明这些模型不是必要的。我们精致地调整了预训练的 seq2seq transformer，将输入文档映射到标注的序列中，以表示核心引用标识。尽管非常简单，我们的模型在多个 dataset 上都能够对核心引用系统进行出色的表现，或与文献中的最佳系统相对接近。我们还提出了一种非常简单的 seq2seq 方法，将标注 span 生成成只有与原始文本混合的 span 相比。我们的分析显示，模型大小、监督量和序列表示方法是表现的关键因素。

Enhancing Abstractiveness of Summarization Models through Calibrated Distillation

paper_url: http://arxiv.org/abs/2310.13760
repo_url: None
paper_authors: Hwanjun Song, Igor Shalyminov, Hang Su, Siffi Singh, Kaisheng Yao, Saab Mansour
for: 这篇论文的目的是提高抽象概括的效果，不过常常会导致抽象性减退。
methods: 该论文提出了一种新的方法，即DisCal，用于提高抽象概括的效果，不需要失去信息的损失。DisCal通过向学生模型提供多种假概括，使学生模型更好地学习抽象概括技巧。
results: 实验结果表明，DisCal比先前的方法在抽象概括练习中更高效，可以生成高度抽象和有用的概括。

Abstract
Sequence-level knowledge distillation reduces the size of Seq2Seq models for more efficient abstractive summarization. However, it often leads to a loss of abstractiveness in summarization. In this paper, we propose a novel approach named DisCal to enhance the level of abstractiveness (measured by n-gram overlap) without sacrificing the informativeness (measured by ROUGE) of generated summaries. DisCal exposes diverse pseudo summaries with two supervision to the student model. Firstly, the best pseudo summary is identified in terms of abstractiveness and informativeness and used for sequence-level distillation. Secondly, their ranks are used to ensure the student model to assign higher prediction scores to summaries with higher ranks. Our experiments show that DisCal outperforms prior methods in abstractive summarization distillation, producing highly abstractive and informative summaries.

摘要
序列级知识填充可以降低Seq2Seq模型的大小，以实现更高效的抽象概要。然而，这经常会导致抽象性的减少。在这篇论文中，我们提出了一种新的方法，即DisCal，以提高抽象性（通过n-gram重叠度衡量）无需牺牲生成的概要的信息性（通过ROUGE衡量）。DisCal向学生模型提供多种假概要，并对其进行序列级填充。首先，我们选择最佳的假概要，根据抽象性和信息性进行评价。其次，我们使用其排名来确保学生模型对概要的预测分数进行较高的分配。我们的实验表明，DisCal在抽象概要填充中超过了先前的方法，生成了高度抽象和有用的概要。

ALDi: Quantifying the Arabic Level of Dialectness of Text

paper_url: http://arxiv.org/abs/2310.13747
repo_url: https://github.com/amr-keleg/aldi
paper_authors: Amr Keleg, Sharon Goldwater, Walid Magdy
for: 本研究旨在提供一个可以辨识阿拉伯语言使用者在不同情况下的语言风格选择的方法。
methods: 本研究使用了一个新的语言水平分布（ALDi）来评估阿拉伯语言的方言差异。 ALDi 是一个连续的语言变量，可以在句子水平上量度阿拉伯语言使用者对语言风格的选择。
results: 研究发现，使用 ALDi 可以对不同的阿拉伯语言资料集进行有效的辨识，并且可以显示阿拉伯语言使用者在不同情况下的语言风格选择。

Abstract
Transcribed speech and user-generated text in Arabic typically contain a mixture of Modern Standard Arabic (MSA), the standardized language taught in schools, and Dialectal Arabic (DA), used in daily communications. To handle this variation, previous work in Arabic NLP has focused on Dialect Identification (DI) on the sentence or the token level. However, DI treats the task as binary, whereas we argue that Arabic speakers perceive a spectrum of dialectness, which we operationalize at the sentence level as the Arabic Level of Dialectness (ALDi), a continuous linguistic variable. We introduce the AOC-ALDi dataset (derived from the AOC dataset), containing 127,835 sentences (17% from news articles and 83% from user comments on those articles) which are manually labeled with their level of dialectness. We provide a detailed analysis of AOC-ALDi and show that a model trained on it can effectively identify levels of dialectness on a range of other corpora (including dialects and genres not included in AOC-ALDi), providing a more nuanced picture than traditional DI systems. Through case studies, we illustrate how ALDi can reveal Arabic speakers' stylistic choices in different situations, a useful property for sociolinguistic analyses.

摘要
传统的阿拉伯语言处理（NLP）研究偏向于干rn Dialect Identification（DI），即 sentence或token level上的方言识别。然而，我们认为阿拉伯语言使用者看到了方言强度的连续变量，我们称之为阿拉伯语言层次（ALDi）。我们引入了AOC-ALDi数据集（基于AOC数据集），包含127,835个句子（新闻文章占17%，用户评论占83%），每个句子都是手动标注了其方言强度。我们提供了AOC-ALDi的详细分析，并证明一个基于AOC-ALDi的模型可以有效地在其他 corpora 上识别方言强度，提供了更加细腻的图像。通过案例研究，我们示出了阿拉伯语言使用者在不同情况下的样式选择，这是社会语言分析中非常有用的特性。

Exploring Linguistic Probes for Morphological Generalization

paper_url: http://arxiv.org/abs/2310.13686
repo_url: https://github.com/jkodner05/EMNLP2023_LingProbes
paper_authors: Jordan Kodner, Salam Khalifa, Sarah Payne
for: 这个论文主要针对的是 morphological inflection 的计算模型化。
methods: 这篇论文使用了语言独立的数据分割算法，并采用了语言特定的探针来测试 morphological generalization 的方面。
results: 对于三种 morphologically distinct 语言（英语、西班牙语、斯瓦希利语），研究发现这三个主要 morphological inflection 系统在 conjugational classes 和 feature sets 上采用了不同的总结策略，并在 both orthographic 和 phonologically transcribed inputs 上得到了证据。

Abstract
Modern work on the cross-linguistic computational modeling of morphological inflection has typically employed language-independent data splitting algorithms. In this paper, we supplement that approach with language-specific probes designed to test aspects of morphological generalization. Testing these probes on three morphologically distinct languages, English, Spanish, and Swahili, we find evidence that three leading morphological inflection systems employ distinct generalization strategies over conjugational classes and feature sets on both orthographic and phonologically transcribed inputs.

摘要
现代工作中使用了跨语言计算模型来研究 morphological inflection 的计算模型通常采用语言独立的数据分割算法。在这篇论文中，我们补充了这种方法，使用语言特定的探针来测试 morphological generalization 的方面。在英语、西班牙语和斯瓦希利语三种 morphologically distinct 语言上测试这些探针，我们发现三个领先的 morphological inflection 系统在 conjugational classes 和 feature sets 上使用了不同的总结策略，并且这些策略在 both orthographic 和 phonologically transcribed inputs 上都有效。

Information Value: Measuring Utterance Predictability as Distance from Plausible Alternatives

paper_url: http://arxiv.org/abs/2310.13676
repo_url: https://github.com/dmg-illc/information-value
paper_authors: Mario Giulianelli, Sarenne Wallbridge, Raquel Fernández
for: 这篇论文主要用于探讨语言预测性的问题，具体来说是通过语音生成器来评估语言的信息价值。
methods: 论文使用了语音生成器来获得可解释的信息价值估计，并利用这些估计来研究人类理解行为中的维度。
results: 论文发现信息价值是written和spoken对话中语言预测性的更好的预测器，并且与词级抽象度的总和不同，可以作为词级预测性的补充。

Abstract
We present information value, a measure which quantifies the predictability of an utterance relative to a set of plausible alternatives. We introduce a method to obtain interpretable estimates of information value using neural text generators, and exploit their psychometric predictive power to investigate the dimensions of predictability that drive human comprehension behaviour. Information value is a stronger predictor of utterance acceptability in written and spoken dialogue than aggregates of token-level surprisal and it is complementary to surprisal for predicting eye-tracked reading times.

摘要
我们介绍信息价值，一种量化评估话语可能性相对于一组可能的选择的度量。我们提出了使用神经网络文本生成器获取可读取的信息价值估计方法，并利用它们的心理测量预测力来调查驱动人类理解行为的维度。信息价值在书面和口语对话中比聚合各个字符度量的奇偶性和总体的奇偶性强制性更好地预测话语可CCE接受性。

On Synthetic Data for Back Translation

paper_url: http://arxiv.org/abs/2310.13675
repo_url: https://github.com/jiahao004/data-for-bt
paper_authors: Jiahao Xu, Yubin Ruan, Wei Bi, Guoping Huang, Shuming Shi, Lihui Chen, Lemao Liu
for: 本研究旨在调查Back Translation(BT)技术在Machine Translation(MT)领域中的应用，并研究如何生成更高质量的synthetic data来提高BT性能。
methods: 本研究采用了 teoretic和empirical方法来研究synthetic data在BT性能中的作用，并提出了一种简单 yet effective的方法来生成synthetic data，以更好地考虑质量和重要性两个因素。
results: 经过extensive的实验 validate that our proposed method可以 significantly improve BT性能，在WMT14 DE-EN、EN-DE和RU-EN benchmark任务上都达到了比标准基eline的性能。

Abstract
Back translation (BT) is one of the most significant technologies in NMT research fields. Existing attempts on BT share a common characteristic: they employ either beam search or random sampling to generate synthetic data with a backward model but seldom work studies the role of synthetic data in the performance of BT. This motivates us to ask a fundamental question: {\em what kind of synthetic data contributes to BT performance?} Through both theoretical and empirical studies, we identify two key factors on synthetic data controlling the back-translation NMT performance, which are quality and importance. Furthermore, based on our findings, we propose a simple yet effective method to generate synthetic data to better trade off both factors so as to yield a better performance for BT. We run extensive experiments on WMT14 DE-EN, EN-DE, and RU-EN benchmark tasks. By employing our proposed method to generate synthetic data, our BT model significantly outperforms the standard BT baselines (i.e., beam and sampling based methods for data generation), which proves the effectiveness of our proposed methods.

摘要
<>Back Translation（BT）是现代机器翻译研究领域中最重要的技术之一。现有的尝试中大多数采用了扫描或随机抽样来生成反向模型的synthetic数据，但rarely有研究synthetic数据在BT性能中的作用。这引发了我们的一个基本问题：{\em 这些synthetic数据对BT性能有什么类型的影响?}通过理论和实验研究，我们确定了两个关键因素控制了反向翻译NMT性能：品质和重要性。此外，基于我们的发现，我们提议一种简单 yet effective的方法来生成synthetic数据，以更好地考虑这两个因素，以提高BT性能。我们在WMT14 DE-EN、EN-DE和RU-ENbenchmark任务上进行了广泛的实验，并证明了我们的提议的方法可以significantly outperform标准BT基elines（即扫描和随机抽样基elines），这证明了我们的方法的有效性。

StereoMap: Quantifying the Awareness of Human-like Stereotypes in Large Language Models

paper_url: http://arxiv.org/abs/2310.13673
repo_url: https://github.com/sullamij/stereomap
paper_authors: Sullam Jeoung, Yubin Ge, Jana Diesner
for: 本研究旨在理解大语言模型（LLM）对社会群体的投影和表现，以及LLM如何在训练数据中存储和传播有害关系。
methods: 本研究使用了一种基于心理学理论的框架，称为StereoMap，来探索LLM对社会群体的投影。StereoMap使用心理学中已知的 sterotype Content Model（SCM），将刻画为两个维度：温暖度和能力。
results: 研究发现，LLM对不同社会群体的投影存在多样化的评价，包括温暖度和能力两个维度上的混合评价。此外，分析LLM的推理，研究发现LLM有时会引用社会不平等的统计数据和研究结果来支持其推理。这种做法可能反映LLM对社会不平等的认识和承认。

Abstract
Large Language Models (LLMs) have been observed to encode and perpetuate harmful associations present in the training data. We propose a theoretically grounded framework called StereoMap to gain insights into their perceptions of how demographic groups have been viewed by society. The framework is grounded in the Stereotype Content Model (SCM); a well-established theory from psychology. According to SCM, stereotypes are not all alike. Instead, the dimensions of Warmth and Competence serve as the factors that delineate the nature of stereotypes. Based on the SCM theory, StereoMap maps LLMs' perceptions of social groups (defined by socio-demographic features) using the dimensions of Warmth and Competence. Furthermore, the framework enables the investigation of keywords and verbalizations of reasoning of LLMs' judgments to uncover underlying factors influencing their perceptions. Our results show that LLMs exhibit a diverse range of perceptions towards these groups, characterized by mixed evaluations along the dimensions of Warmth and Competence. Furthermore, analyzing the reasonings of LLMs, our findings indicate that LLMs demonstrate an awareness of social disparities, often stating statistical data and research findings to support their reasoning. This study contributes to the understanding of how LLMs perceive and represent social groups, shedding light on their potential biases and the perpetuation of harmful associations.

摘要

paper_url: http://arxiv.org/abs/2310.13664
repo_url: None
paper_authors: Eliseo Bao Souto, Anxo Pérez, Javier Parapar
for: 这研究旨在用语言模型检测和解释用户在社交平台上发布的情绪症状 markers。
methods: 我们使用 transformer 架构来实现这两个任务，包括分类和解释分类决策。我们还使用最新的对话式 LLMS 进行具体实现。
results: 我们的实验结果表明，可以同时实现良好的分类结果和可解释的决策。我们的自然语言解释可以帮助临床专业人员理解模型决策的基础。

Abstract
Users of social platforms often perceive these sites as supportive spaces to post about their mental health issues. Those conversations contain important traces about individuals' health risks. Recently, researchers have exploited this online information to construct mental health detection models, which aim to identify users at risk on platforms like Twitter, Reddit or Facebook. Most of these models are centred on achieving good classification results, ignoring the explainability and interpretability of the decisions. Recent research has pointed out the importance of using clinical markers, such as the use of symptoms, to improve trust in the computational models by health professionals. In this paper, we propose using transformer-based architectures to detect and explain the appearance of depressive symptom markers in the users' writings. We present two approaches: i) train a model to classify, and another one to explain the classifier's decision separately and ii) unify the two tasks simultaneously using a single model. Additionally, for this latter manner, we also investigated the performance of recent conversational LLMs when using in-context learning. Our natural language explanations enable clinicians to interpret the models' decisions based on validated symptoms, enhancing trust in the automated process. We evaluate our approach using recent symptom-based datasets, employing both offline and expert-in-the-loop metrics to assess the quality of the explanations generated by our models. The experimental results show that it is possible to achieve good classification results while generating interpretable symptom-based explanations.

摘要
社交媒体用户们常看待这些平台为他们的心理健康问题提供支持的空间。这些对话包含了用户健康风险的重要 traces。近些年，研究人员利用这些在线信息构建了心理健康检测模型，以 identificar社交媒体上的用户风险。大多数这些模型强调得到好的分类结果，忽略了计算模型的解释性和可读性。现在的研究表明，使用临床标志（如症状使用）可以提高计算模型的信任worth。在这篇论文中，我们提议使用变换器结构来检测和解释用户写作中的抑郁症状标志。我们提出了两种方法：一是在不同的步骤中训练分类和解释模型，二是同时使用单一模型来实现这两个任务。此外，我们还 investigate了最近的对话语言模型在使用上下文学习时的表现。我们的自然语言解释使得临床专业人员可以根据验证的症状来解释计算模型的决策，提高自动化过程中的信任。我们使用最新的症状基数据集进行评估，并使用线上和专家在 Loop 中的 metric 来评估我们的模型生成的解释质量。实验结果表明，可以同时实现好的分类结果和可读的症状基本解释。

Arabic Dialect Identification under Scrutiny: Limitations of Single-label Classification

paper_url: http://arxiv.org/abs/2310.13661
repo_url: https://github.com/amr-keleg/adi-under-scrutiny
paper_authors: Amr Keleg, Walid Magdy
for: 本文主要探讨了自动阿拉伯语方言识别（ADI）问题，尤其是在分类问题中存在困难的微方言识别问题。
methods: 作者提出了将 ADI 定义为多个标签分类问题，并提供了设计新的 ADI 数据集的建议。
results: 手动错误分析表明，有大约 66% 的错误不是真正的错误。

Abstract
Automatic Arabic Dialect Identification (ADI) of text has gained great popularity since it was introduced in the early 2010s. Multiple datasets were developed, and yearly shared tasks have been running since 2018. However, ADI systems are reported to fail in distinguishing between the micro-dialects of Arabic. We argue that the currently adopted framing of the ADI task as a single-label classification problem is one of the main reasons for that. We highlight the limitation of the incompleteness of the Dialect labels and demonstrate how it impacts the evaluation of ADI systems. A manual error analysis for the predictions of an ADI, performed by 7 native speakers of different Arabic dialects, revealed that $\approx$ 66% of the validated errors are not true errors. Consequently, we propose framing ADI as a multi-label classification task and give recommendations for designing new ADI datasets.

摘要
自2010年代初起，自动阿拉伯语方言识别（ADI）技术已经受到了广泛的关注。多个数据集被开发出来，并每年举行共享任务。然而，ADI系统被报道无法分辨阿拉伯语微方言。我们认为，现有的ADI任务的帧围是主要的原因之一。我们强调了标签的不完整性的限制，并示出了这些限制如何影响ADI系统的评估。我们手动进行了7名阿拉伯语本地语言 speaker的预测 validate 分析，发现 Approx. 66% 的有效错误不是真正的错误。因此，我们提议将ADI重新定义为多标签分类任务，并提供了设计新的ADI数据集的建议。

Benchmarking and Improving Text-to-SQL Generation under Ambiguity

paper_url: http://arxiv.org/abs/2310.13659
repo_url: https://github.com/testzer0/ambiqt
paper_authors: Adithya Bhaskar, Tushar Tomar, Ashutosh Sathe, Sunita Sarawagi
for: bridging the gap between text-to-SQL conversion and real-life database queries
methods: developing a novel benchmark called AmbiQT, and proposing a new decoding algorithm called LogicalBeam
results: LogicalBeam is up to $2.5$ times more effective than state-of-the-art models at generating all candidate SQLs in the top-$k$ ranked outputs, and enhances the top-$5$ Exact and Execution Match Accuracies on SPIDER and Kaggle DBQA.Here’s the Chinese version:
for: closure Text-to-SQL转换和实际数据库查询之间的差距
methods: 开发了一个新的benchmark called AmbiQT，并提出了一种新的解码算法called LogicalBeam
results: LogicalBeam比现状的模型更加有效，可以在top-$k$排名输出中生成所有可能的SQL查询，并提高了SPIDER和Kaggle DBQA的Exact和Execution Match Accuracy的top-$5$。

Abstract
Research in Text-to-SQL conversion has been largely benchmarked against datasets where each text query corresponds to one correct SQL. However, natural language queries over real-life databases frequently involve significant ambiguity about the intended SQL due to overlapping schema names and multiple confusing relationship paths. To bridge this gap, we develop a novel benchmark called AmbiQT with over 3000 examples where each text is interpretable as two plausible SQLs due to lexical and/or structural ambiguity. When faced with ambiguity, an ideal top-$k$ decoder should generate all valid interpretations for possible disambiguation by the user. We evaluate several Text-to-SQL systems and decoding algorithms, including those employing state-of-the-art LLMs, and find them to be far from this ideal. The primary reason is that the prevalent beam search algorithm and its variants, treat SQL queries as a string and produce unhelpful token-level diversity in the top-$k$. We propose LogicalBeam, a new decoding algorithm that navigates the SQL logic space using a blend of plan-based template generation and constrained infilling. Counterfactually generated plans diversify templates while in-filling with a beam-search that branches solely on schema names provides value diversity. LogicalBeam is up to $2.5$ times more effective than state-of-the-art models at generating all candidate SQLs in the top-$k$ ranked outputs. It also enhances the top-$5$ Exact and Execution Match Accuracies on SPIDER and Kaggle DBQA.

摘要
研究在文本到SQL转换方面已经主要基准于具有一个唯一正确SQL的数据集。然而，自然语言 queries 中的真实数据库问题经常具有多种ambiguity，因为schema名称和关系路径的重叠。为了bridging这个差距，我们开发了一个新的benchmark叫做AmbiQT，它包含了超过3000个例子，每个文本都可以被解释为两个可能的SQL。当面临ambiguity时，理想的top-$k$ decoder应该生成所有有效的解释，以便由用户进行解释。然而，我们发现现有的 Text-to-SQL 系统和解码算法，包括使用状态态艺术LLMs，都远离这种理想。主要的原因是普遍使用的搜索算法和其变种，将SQL查询视为字符串，生成不帮助的token级多样性在top-$k$中。我们提议LogicalBeam，一种新的解码算法，通过将SQL逻辑空间映射到plan-based模板生成和受限的填充来解决这个问题。在填充过程中，使用缓冲搜索，在schema名称上分支，提供值多样性。LogicalBeam在top-$k$中生成所有候选SQL的效果比现有模型高达2.5倍。此外，它也提高了SPIDER和Kaggle DBQA中top-$5$的准确率和执行匹配率。

BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues

paper_url: http://arxiv.org/abs/2310.13650
repo_url: https://github.com/open-compass/botchat
paper_authors: Haodong Duan, Jueqi Wei, Chonghua Wang, Hongwei Liu, Yixiao Fang, Songyang Zhang, Dahua Lin, Kai Chen
for: 这份报告是为评估现有的大型语言模型（LLMs）在人类样式多Turn对话中的能力而写的。
methods: 我们使用了现实世界的人类对话作为开头，并让LLMs根据这些开头生成全部多Turn对话（数十句）。最后，我们采用了当今最佳的LLMs（GPT-4等）作为评估器，以评估生成的对话质量。
results: 我们发现GPT-4可以生成人类样式的多Turn对话，质量极高，明显超过其他LLMs。这些生成的对话很难被识别为机器生成的对话，而其他LLMs则很难生成满意的多Turn对话，主要由于低效的指令遵循能力、生成过长句子或总能力有限。

Abstract
Interacting with human via high-quality multi-turn dialogues is a key feature of large language models (LLMs). However, human-based evaluation of such capability involves intensive manual labor. This report provides a preliminary evaluation of existing large language models for human-style multi-turn chatting, through an LLM-based approach. We start from real-world human dialogues and keep the very first utterances as the ChatSEED. Then we prompt LLMs to generate a full multi-turn dialogue (tens of utterances) based on the ChatSEED, utterance by utterance. Finally, we adopt state-of-the-art LLMs (GPT-4, \etc) as the judge to evaluate the generated dialogues. With different evaluation protocols, we come to substantially identical conclusions. We find that GPT-4 can generate human-style multi-turn dialogues with impressive quality, significantly outperforms its counterparts. It's difficult for a discriminator to distinguish between GPT-4 generated dialogues and human dialogues. In contrast, other LLMs struggle to generate multi-turn dialogues of satisfactory quality due to poor instruction-following capability, tendency to generate lengthy utterances, or limited general capability. All data and codes will be provided in https://github.com/open-compass/BotChat/ and we hope they can serve as a valuable resource for evaluating multi-turn chatting capabilities of LLMs.

摘要
<>使用高质量多轮对话与人类交互是大型语言模型（LLM）的关键功能之一。然而，人类基于评估这种能力具有劳动密集型的特点。本报告提供了基于现实世界人类对话的初始实验，并采用LLM基于方法进行评估。我们从真实世界人类对话中挑选出第一句话作为ChatSEED，然后使用LLM生成完整的多轮对话（数十句），一句一句地进行生成。最后，我们采用当今最高级的LLM（GPT-4、等）作为评估者，使用不同的评价协议进行评估生成对话质量。我们发现GPT-4可以生成人类式的多轮对话，质量极高，明显超越其他对手。很难用权限 distinguish GPT-4生成的对话和人类对话。相比之下，其他LLM很难生成满意质量的多轮对话，主要是因为低效的指令遵循能力、生成过长的句子或总能力有限。我们将所有数据和代码提供在https://github.com/open-compass/BotChat/上，希望它们可以为评估LLM多轮对话能力提供有价值的资源。

Bridging Information-Theoretic and Geometric Compression in Language Models

paper_url: http://arxiv.org/abs/2310.13620
repo_url: https://github.com/chengemily1/id_bridging
paper_authors: Emily Cheng, Corentin Kervadec, Marco Baroni
for: 这项研究旨在探讨语言模型（LM）如何准确地模型人类语言，以及LM的压缩性能对其表现的影响。
methods: 研究者采用了两种视角来分析LM的压缩性能：几何学视角和信息理论视角。他们发现这两种视角之间存在高度相关性，即语言数据的自然几何维度预测了该数据在LM中的编码长度。
results: 研究者发现，LM的压缩性能与其能够快速适应语言数据的能力相关。此外，他们还评估了一些内在维度估计器，并发现只有一些估计器能够捕捉语言数据中的压缩性、几何维度和适应性之间的关系。

Abstract
For a language model (LM) to faithfully model human language, it must compress vast, potentially infinite information into relatively few dimensions. We propose analyzing compression in (pre-trained) LMs from two points of view: geometric and information-theoretic. We demonstrate that the two views are highly correlated, such that the intrinsic geometric dimension of linguistic data predicts their coding length under the LM. We then show that, in turn, high compression of a linguistic dataset predicts rapid adaptation to that dataset, confirming that being able to compress linguistic information is an important part of successful LM performance. As a practical byproduct of our analysis, we evaluate a battery of intrinsic dimension estimators for the first time on linguistic data, showing that only some encapsulate the relationship between information-theoretic compression, geometric compression, and ease-of-adaptation.

摘要
为了准确模拟人类语言，语言模型（LM）必须压缩庞大、潜在无穷的信息到相对较少的维度。我们从两个视角分析LM中的压缩：几何学视角和信息理论视角。我们示示了这两种视角之间存在很高的相关性，即语言数据的内在几何维度预测其编码长度下LM。然后，我们示示了，对于某个语言集合，高压缩率预测了快速适应该语言集合的能力，确认了压缩语言信息是成功LM性能的重要组成部分。此外，我们评估了一系列内在维度估计器，并发现只有一些能够捕捉压缩、几何压缩和适应性之间的关系。

Semi-supervised multimodal coreference resolution in image narrations

paper_url: http://arxiv.org/abs/2310.13619
repo_url: None
paper_authors: Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen
for: 这篇论文研究了多Modal coreference resolution，具体来说是将长的描述文本（即 narraion）与图片对应。
methods: 该论文提出了一种数据效率的半supervised方法，利用图片-文本对应来解决多Modal coreference resolution和narraion grounding问题。该方法在cross-modal框架中结合了标注和无标注数据的损失函数。
results: 论文的实验显示，该方法可以比以强基线数据进行量化和质量上的提升，用于多Modal coreference resolution和narraion grounding任务。

Abstract
In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration is paired with an image. This poses significant challenges due to fine-grained image-text alignment, inherent ambiguity present in narrative language, and unavailability of large annotated training sets. To tackle these challenges, we present a data efficient semi-supervised approach that utilizes image-narration pairs to resolve coreferences and narrative grounding in a multimodal context. Our approach incorporates losses for both labeled and unlabeled data within a cross-modal framework. Our evaluation shows that the proposed approach outperforms strong baselines both quantitatively and qualitatively, for the tasks of coreference resolution and narrative grounding.

摘要
在这篇论文中，我们研究多模态核心参照解决方法，具体来说是与图像相关的长文描述。这种情况存在细腻的图像文本对齐问题、自然语言中的模糊性和缺乏大量标注训练集的问题。为解决这些问题，我们提出了一种数据效率高的半超级vised方法，该方法利用图像文本对的数据来解决核心参照和多模态场景中的描述固定。我们的方法包括两种损失函数：标注数据的损失函数和未标注数据的损失函数，并在跨模态框架中结合使用。我们的评估结果表明，我们的方法在核心参照和描述固定两个任务上都能够超越强基线，both quantitatively and qualitatively。

Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning

paper_url: http://arxiv.org/abs/2310.13615
repo_url: None
paper_authors: An-Zi Yen, Wei-Ling Hsu
for: 本研究探讨了使用大语言模型（LLM）来提高学生数学问题解决能力的可能性，以及LLM在教学应用中的教学能力。
methods: 本研究采用了适应性反馈的方法，通过LLM对学生答案的检查和修正来帮助学生解决数学问题。
results: 研究发现，LLM可能会因为问题的意思和逻辑不准确而提供错误的反馈，同时也可能会因为问题的 complexity 而难以理解问题的 rationales。

Abstract
Due to the remarkable language understanding and generation abilities of large language models (LLMs), their use in educational applications has been explored. However, little work has been done on investigating the pedagogical ability of LLMs in helping students to learn mathematics. In this position paper, we discuss the challenges associated with employing LLMs to enhance students' mathematical problem-solving skills by providing adaptive feedback. Apart from generating the wrong reasoning processes, LLMs can misinterpret the meaning of the question, and also exhibit difficulty in understanding the given questions' rationales when attempting to correct students' answers. Three research questions are formulated.

摘要

如何使用大语言模型（LLMs）来增强学生的数学问题解决能力？2. LLMS 是否能够正确理解学生提交的问题，并且能够提供适应性的反馈？3. LLMS 在帮助学生学习数学时是否存在挑战，如果存在，则如何解决这些挑战？

Simultaneous Machine Translation with Tailored Reference

paper_url: http://arxiv.org/abs/2310.13588
repo_url: https://github.com/ictnlp/Tailored-Ref
paper_authors: Shoutao Guo, Shaolei Zhang, Yang Feng
for: 本研究旨在提高同时机器翻译（SiMT）模型的翻译质量，并且适应不同的延迟环境。
methods: 本研究提出了一种新的方法，即通过使用强化学习引入的修改器，对SiMT模型的训练参考进行修改，以避免在训练过程中的强制预测。
results: 实验结果表明，使用修改器进行修改的SiMT模型在三个翻译任务中均 achieve state-of-the-art表现，并且在固定和适应策略下都能够提高表现。

Abstract
Simultaneous machine translation (SiMT) generates translation while reading the whole source sentence. However, existing SiMT models are typically trained using the same reference disregarding the varying amounts of available source information at different latency. Training the model with ground-truth at low latency may introduce forced anticipations, whereas utilizing reference consistent with the source word order at high latency results in performance degradation. Consequently, it is crucial to train the SiMT model with appropriate reference that avoids forced anticipations during training while maintaining high quality. In this paper, we propose a novel method that provides tailored reference for the SiMT models trained at different latency by rephrasing the ground-truth. Specifically, we introduce the tailor, induced by reinforcement learning, to modify ground-truth to the tailored reference. The SiMT model is trained with the tailored reference and jointly optimized with the tailor to enhance performance. Importantly, our method is applicable to a wide range of current SiMT approaches. Experiments on three translation tasks demonstrate that our method achieves state-of-the-art performance in both fixed and adaptive policies.

摘要

Improving Cross-Lingual Transfer through Subtree-Aware Word Reordering

paper_url: http://arxiv.org/abs/2310.13583
repo_url: https://github.com/ofirarviv/ud-based-word-reordering
paper_authors: Ofir Arviv, Dmitry Nikolaev, Taelin Karidi, Omri Abend
for: 本研究旨在提高多语言模型对不同语言的表达能力，尤其是在low-resource设置下。
methods: 我们提出了一种新的重新排序方法，基于Universal Dependencies语法，可以通过少量注解数据学习细致的单词顺序模式，并可以应用于各种语言和模型结构。
results: 我们的方法在多种任务上表现出优于强基eline，包括零shot和几shot情况下。这表明我们的方法可以有效地 Mitigate variability in word-order patterns，提高多语言模型的表达能力。

Abstract
Despite the impressive growth of the abilities of multilingual language models, such as XLM-R and mT5, it has been shown that they still face difficulties when tackling typologically-distant languages, particularly in the low-resource setting. One obstacle for effective cross-lingual transfer is variability in word-order patterns. It can be potentially mitigated via source- or target-side word reordering, and numerous approaches to reordering have been proposed. However, they rely on language-specific rules, work on the level of POS tags, or only target the main clause, leaving subordinate clauses intact. To address these limitations, we present a new powerful reordering method, defined in terms of Universal Dependencies, that is able to learn fine-grained word-order patterns conditioned on the syntactic context from a small amount of annotated data and can be applied at all levels of the syntactic tree. We conduct experiments on a diverse set of tasks and show that our method consistently outperforms strong baselines over different language pairs and model architectures. This performance advantage holds true in both zero-shot and few-shot scenarios.

摘要
尽管多语言模型（如XLM-R和mT5）在表现出了卓越的能力，但它们在语言学上较远的语言中仍然遇到困难，特别是在low-resource Setting下。一个阻碍 cross-lingual 转移的障碍是语言变体的word-order模式的变化。这可能可以通过源-或目标-side word reordering来mitigate，并且有许多approach to reordering已经被提出。然而，这些方法都是基于语言特定的规则，工作在POS标签层次上，或者只能target主句，留下副句不变。为了解决这些限制，我们提出了一种新的强大的重编译方法，基于Universal Dependencies，可以通过一小量的注释数据学习细致的word-order模式，并且可以在所有语法树层次上应用。我们进行了多种任务的实验，并证明了我们的方法在不同的语言对和模型架构下表现出了卓越的表现，并且这种表现优势在零shot和几shot Scenario下都保持。

Semantic Decomposition of Question and SQL for Text-to-SQL Parsing

paper_url: http://arxiv.org/abs/2310.13575
repo_url: None
paper_authors: Ben Eyal, Amir Bachar, Ophir Haroche, Moran Mahabi, Michael Elhadad
for: 提高文本到SQL semantic parsing的通用化能力，解决跨领域和复杂查询的挑战。
methods: 使用问题分解策略提高复杂SQL查询的解析，但这会遇到两个主要障碍：（1）现有数据集缺少问题分解；（2）由SQL语言的 sintaxis复杂性，大多数复杂查询无法被简单 decomposed into sub-queries。
results: 我们提出了一种新的模块化查询计划语言（QPL），系统地将SQL查询分解成简单和Regular sub-queries。我们利用SQL服务器查询优化计划分析，开发了一个将SQL转换为QPL的翻译器，并将Spider数据集扩展到QPL程序。实验结果表明，模块化QPL的性能有助于现有的semantic-parsing架构，并且训练text-to-QPL parser比text-to-SQL parsing更有效果。此外，QPL方法还具有两个优点：（1）QPL程序可以简单化为可读的问题，从而创建了一个复杂问题和分解问题的数据集。（2）QPL更易于非专家理解复杂查询结果，从而提高了semantic parser的可读性。

Abstract
Text-to-SQL semantic parsing faces challenges in generalizing to cross-domain and complex queries. Recent research has employed a question decomposition strategy to enhance the parsing of complex SQL queries. However, this strategy encounters two major obstacles: (1) existing datasets lack question decomposition; (2) due to the syntactic complexity of SQL, most complex queries cannot be disentangled into sub-queries that can be readily recomposed. To address these challenges, we propose a new modular Query Plan Language (QPL) that systematically decomposes SQL queries into simple and regular sub-queries. We develop a translator from SQL to QPL by leveraging analysis of SQL server query optimization plans, and we augment the Spider dataset with QPL programs. Experimental results demonstrate that the modular nature of QPL benefits existing semantic-parsing architectures, and training text-to-QPL parsers is more effective than text-to-SQL parsing for semantically equivalent queries. The QPL approach offers two additional advantages: (1) QPL programs can be paraphrased as simple questions, which allows us to create a dataset of (complex question, decomposed questions). Training on this dataset, we obtain a Question Decomposer for data retrieval that is sensitive to database schemas. (2) QPL is more accessible to non-experts for complex queries, leading to more interpretable output from the semantic parser.

摘要
文本抽取 Semantic 问题面临通用化和复杂查询泛化的挑战。 current research 使用问题分解策略提高复杂 SQL 查询的解析。 however， this strategy 遇到两个主要障碍：（1）现有数据集缺少问题分解；（2）由于 SQL 的 sintax 复杂性，大多数复杂查询无法分解成可轻松重新组合的子查询。 To address these challenges, we propose a new modular Query Plan Language (QPL) that systematically decomposes SQL queries into simple and regular sub-queries. We develop a translator from SQL to QPL by leveraging analysis of SQL server query optimization plans, and we augment the Spider dataset with QPL programs. Experimental results demonstrate that the modular nature of QPL benefits existing semantic-parsing architectures, and training text-to-QPL parsers is more effective than text-to-SQL parsing for semantically equivalent queries. The QPL approach offers two additional advantages: (1) QPL programs can be paraphrased as simple questions, which allows us to create a dataset of (complex question, decomposed questions). Training on this dataset, we obtain a Question Decomposer for data retrieval that is sensitive to database schemas. (2) QPL is more accessible to non-experts for complex queries, leading to more interpretable output from the semantic parser.

Why Can Large Language Models Generate Correct Chain-of-Thoughts?

paper_url: http://arxiv.org/abs/2310.13571
repo_url: None
paper_authors: Rasul Tutunov, Antoine Grosnit, Juliusz Ziomek, Jun Wang, Haitham Bou-Ammar
for: 这个论文探讨大语言模型（LLM）的能力，具体来说是提高对链式思维提示的理论认知。
methods: 作者提出了一种两级层次图形模型，用于自然语言生成。在这个框架中，作者证明了一种强有力的 geometrical convergence rate，用于评估 LLM 生成的链式思维是否正确。
results: 研究结果表明，LLM 可以有效地生成一系列相关的思维，并且可以理解和解释这些思维的顺序性。这些结果为具有理解能力的任务中 LLM 的表现提供了理论基础。

Abstract
This paper delves into the capabilities of large language models (LLMs), specifically focusing on advancing the theoretical comprehension of chain-of-thought prompting. We investigate how LLMs can be effectively induced to generate a coherent chain of thoughts. To achieve this, we introduce a two-level hierarchical graphical model tailored for natural language generation. Within this framework, we establish a compelling geometrical convergence rate that gauges the likelihood of an LLM-generated chain of thoughts compared to those originating from the true language. Our findings provide a theoretical justification for the ability of LLMs to produce the correct sequence of thoughts (potentially) explaining performance gains in tasks demanding reasoning skills.

摘要

Cache & Distil: Optimising API Calls to Large Language Models

paper_url: http://arxiv.org/abs/2310.13561
repo_url: None
paper_authors: Guillem Ramírez, Matthias Lindemann, Alexandra Birch, Ivan Titov
for: 降低大规模生成AI工具的成本，尤其是实时处理用户查询的API请求。
methods: 使用一个较小的语言模型（学生），并将其不断地训练成为独立处理用户查询的能力，并透过一个策略选择哪些请求交由学生处理，哪些交由大语言模型处理，以便助学生学习。
results: 在分类任务中，使用活动学习基于选择几个标准的检查方法，例如 Margin Sampling 和 Query by Committee，可以带来一致的优化效果，不论任务或预算。

Abstract
Large-scale deployment of generative AI tools often depends on costly API calls to a Large Language Model (LLM) to fulfil user queries. To curtail the frequency of these calls, one can employ a smaller language model -- a student -- which is continuously trained on the responses of the LLM. This student gradually gains proficiency in independently handling an increasing number of user requests, a process we term neural caching. The crucial element in neural caching is a policy that decides which requests should be processed by the student alone and which should be redirected to the LLM, subsequently aiding the student's learning. In this study, we focus on classification tasks, and we consider a range of classic active learning-based selection criteria as the policy. Our experiments suggest that Margin Sampling and Query by Committee bring consistent benefits across tasks and budgets.

摘要
大规模的生成AI工具常常依赖于成本高昂的API调用来满足用户的查询。为了减少这些调用频率，可以采用一个较小的语言模型——学生模型，并将其不断训练在大语言模型（LLM）的回应基础上。学生模型逐渐增强其独立处理用户请求的能力，这个过程我们称为神经缓存。神经缓存的关键元素是一种策略，决定哪些请求应该由学生模型处理，而哪些请求应该被重定向到LLM。在本研究中，我们关注分类任务，并考虑了一些经典的活动学习基于选择准则。我们的实验表明，边缘抽样和咨询委员会都带来了一致的好处，不 matter what the task or budget.

The Perils & Promises of Fact-checking with Large Language Models

paper_url: http://arxiv.org/abs/2310.13549
repo_url: None
paper_authors: Dorian Quelle, Alexandre Bovet
for: 这研究旨在评估大自然语言模型（LLM）在真实性核查中的表现，以及如何使用这些模型来提高核查的准确性。
methods: 这个研究使用了GPT-4和GPT-3大自然语言模型，并将其用于编写学术论文、法律文档和新闻文章，以评估这些模型在核查信息的能力。
results: 研究发现，当equipped with contextual information时，LLMs的表现有所提高，但准确性受到查询语言和CLAIM的影响。GPT-4表现比GPT-3更好，但是不同的查询语言和CLAIM可能会导致不同的准确性。

Abstract
Autonomous fact-checking, using machine learning to verify claims, has grown vital as misinformation spreads beyond human fact-checking capacity. Large Language Models (LLMs) like GPT-4 are increasingly trusted to verify information and write academic papers, lawsuits, and news articles, emphasizing their role in discerning truth from falsehood and the importance of being able to verify their outputs. Here, we evaluate the use of LLM agents in fact-checking by having them phrase queries, retrieve contextual data, and make decisions. Importantly, in our framework, agents explain their reasoning and cite the relevant sources from the retrieved context. Our results show the enhanced prowess of LLMs when equipped with contextual information. GPT-4 outperforms GPT-3, but accuracy varies based on query language and claim veracity. While LLMs show promise in fact-checking, caution is essential due to inconsistent accuracy. Our investigation calls for further research, fostering a deeper comprehension of when agents succeed and when they fail.

摘要
自主 фактоChecking，使用机器学习验证声明，在谎言扩散超出人工验证能力时变得越来越重要。大型语言模型（LLMs）如GPT-4在验证信息和写学术论文、法律诉讼和新闻文章方面被越来越信任，强调其在分辨真假和重要性的能力。在我们的框架中，代理人会提出问题，检索Contextual数据，并做出决定。特别是，在我们的框架中，代理人会解释其思考过程和引用来源于检索的Contextual数据。我们的结果显示在Contextual信息支持下，LLM代理人的能力得到了提高。GPT-4比GPT-3表现更出色，但是准确率基于问题语言和声明真假性有所不同。虽然LLM代理人在验证方面表现良好，但是需要谨慎，因为它们的准确率不稳定。我们的调查表明，需要进一步的研究，以深入理解代理人在不同情况下的表现。

A Diachronic Perspective on User Trust in AI under Uncertainty

paper_url: http://arxiv.org/abs/2310.13544
repo_url: https://github.com/zouharvi/trust-intervention
paper_authors: Shehzaad Dhuliawala, Vilém Zouhar, Mennatallah El-Assady, Mrinmaya Sachan
for: 这个论文研究了用户对人工智能系统的信任的发展和恢复，以及不同类型的误差对用户信任的影响。
methods: 该论文使用了一种投票游戏来研究用户对人工智能系统的信任的演变和恢复。
results: 研究发现，即使只有几次错误 Prediction with inaccurate confidence estimates can severely damage user trust and performance, with slow recovery. 不同类型的误差也有不同的负面影响于用户信任。这些发现highlights the importance of calibration in user-facing AI applications and shed light on what aspects help users decide whether to trust the AI system.

Abstract
In a human-AI collaboration, users build a mental model of the AI system based on its reliability and how it presents its decision, e.g. its presentation of system confidence and an explanation of the output. Modern NLP systems are often uncalibrated, resulting in confidently incorrect predictions that undermine user trust. In order to build trustworthy AI, we must understand how user trust is developed and how it can be regained after potential trust-eroding events. We study the evolution of user trust in response to these trust-eroding events using a betting game. We find that even a few incorrect instances with inaccurate confidence estimates damage user trust and performance, with very slow recovery. We also show that this degradation in trust reduces the success of human-AI collaboration and that different types of miscalibration -- unconfidently correct and confidently incorrect -- have different negative effects on user trust. Our findings highlight the importance of calibration in user-facing AI applications and shed light on what aspects help users decide whether to trust the AI system.

摘要
人与AI合作中，用户会建立AI系统的心理模型，基于其可靠性和输出的解释。现代NLP系统经常无法准确评估自己的可靠性，导致用户对AI系统的信任感受到损害。为建立可信worthy AI，我们需要理解用户信任的发展和恢复机制。我们通过赌博游戏来研究用户对不可靠事件后的信任恢复，发现 Even a few incorrect instances with inaccurate confidence estimates can damage user trust and performance, with very slow recovery. We also show that different types of miscalibration -- unconfidently correct and confidently incorrect -- have different negative effects on user trust. Our findings highlight the importance of calibration in user-facing AI applications and shed light on what aspects help users decide whether to trust the AI system.

Controlled Randomness Improves the Performance of Transformer Models

paper_url: http://arxiv.org/abs/2310.13526
repo_url: None
paper_authors: Tobias Deußer, Cong Zhao, Wolfgang Krämer, David Leonhard, Christian Bauckhage, Rafet Sifa
for: 这个研究的目的是要探索在自然语言模型的预训步骤中引入控制随机性，以提高精确训练和下游任务的性能。
methods: 这个研究使用了随机噪音来控制自然语言模型的训练过程，以提高精确训练和下游任务的性能。
results: 研究发现，在这两个下游任务中，透过将随机噪音添加到训练过程，可以提高自然语言模型的性能。

Abstract
During the pre-training step of natural language models, the main objective is to learn a general representation of the pre-training dataset, usually requiring large amounts of textual data to capture the complexity and diversity of natural language. Contrasting this, in most cases, the size of the data available to solve the specific downstream task is often dwarfed by the aforementioned pre-training dataset, especially in domains where data is scarce. We introduce controlled randomness, i.e. noise, into the training process to improve fine-tuning language models and explore the performance of targeted noise in addition to the parameters of these models. We find that adding such noise can improve the performance in our two downstream tasks of joint named entity recognition and relation extraction and text summarization.

摘要
Here's the text in Simplified Chinese:在自然语言模型的预训练阶段，主要目标是学习预训练数据集的通用表示，通常需要大量的文本数据来捕捉自然语言的复杂性和多样性。相比之下，在大多数情况下，解决特定下游任务的数据量通常比预训练数据集要小得多，特别是在数据匮乏的领域。为了解决这个问题，我们引入控制的随机性，即噪声，到训练过程中，以提高语言模型的微调和探索噪声的影响以及模型参数。我们发现，添加这种噪声可以提高我们的两个下游任务的结合命名实体识别和关系EXTRACTION和文本摘要的性能。

Teaching Language Models to Self-Improve through Interactive Demonstrations

paper_url: http://arxiv.org/abs/2310.13522
repo_url: https://github.com/jasonyux/tripost
paper_authors: Xiao Yu, Baolin Peng, Michel Galley, Jianfeng Gao, Zhou Yu
for: 这个论文的目的是提高小型语言模型（LLMs）的自我改进能力，以减少与现状最佳LLMs之间的性能差距。
methods: 作者提出了一种名为TriPosT的训练算法，通过让小型模型与大型语言模型互动，收集反馈和改进自己的生成内容，来增强小型模型的自我改进能力。
results: 作者的实验表明，使用TriPosT训练算法可以提高一个LLaMA-7b模型在数学和逻辑任务上的性能，最高提高7.13%。此外，作者发现了在学习和修正自己的错误时，小型模型的互动经验是关键的。

Abstract
The self-improving ability of large language models (LLMs), enabled by prompting them to analyze and revise their own outputs, has garnered significant interest in recent research. However, this ability has been shown to be absent and difficult to learn for smaller models, thus widening the performance gap between state-of-the-art LLMs and more cost-effective and faster ones. To reduce this gap, we introduce TriPosT, a training algorithm that endows smaller models with such self-improvement ability, and show that our approach can improve a LLaMA-7b's performance on math and reasoning tasks by up to 7.13%. In contrast to prior work, we achieve this by using the smaller model to interact with LLMs to collect feedback and improvements on its own generations. We then replay this experience to train the small model. Our experiments on four math and reasoning datasets show that the interactive experience of learning from and correcting its own mistakes is crucial for small models to improve their performance.

摘要
大型自然语言模型（LLM）的自我改进能力，受到最近研究的广泛关注。然而，这种能力对小型模型来说是缺失的并且困难学习，因此加大了当前LLM和更cost-effective的模型之间性能差距。为了减少这个差距，我们介绍了TriPosT训练算法，使小型模型拥有自我改进能力，并证明我们的方法可以在数学和逻辑任务上提高LLaMA-7b的性能 by up to 7.13%。与之前的工作不同，我们通过使小型模型与LLM进行互动，收集feedback和改进自己的生成。然后，我们将这些经验重新播放以训练小模型。我们在四个数学和逻辑数据集上进行了实验，发现互动式学习自己的错误和改进自己的生成是小模型提高性能的关键。

Improving Question Generation with Multi-level Content Planning

paper_url: http://arxiv.org/abs/2310.13512
repo_url: https://github.com/zeaver/multifactor
paper_authors: Zehua Xia, Qi Gou, Bowen Yu, Haiyang Yu, Fei Huang, Yongbin Li, Cam-Tu Nguyen
for: 本研究旨在生成基于 Context 和答案的问题，特别是需要跨越多个步骤的理解Context 中的问题。
methods: 我们提出了一种基于多级内容规划的问题生成框架，即 MultiFactor，其包括两个组件：FA-model，同时选择关键短语和生成全答，以及Q-model，使用生成的全答作为额外输入来生成问题。
results: 我们的方法在两个流行的问题生成数据集上表现出优于强基elines。

Abstract
This paper addresses the problem of generating questions from a given context and an answer, specifically focusing on questions that require multi-hop reasoning across an extended context. Previous studies have suggested that key phrase selection is essential for question generation (QG), yet it is still challenging to connect such disjointed phrases into meaningful questions, particularly for long context. To mitigate this issue, we propose MultiFactor, a novel QG framework based on multi-level content planning. Specifically, MultiFactor includes two components: FA-model, which simultaneously selects key phrases and generates full answers, and Q-model which takes the generated full answer as an additional input to generate questions. Here, full answer generation is introduced to connect the short answer with the selected key phrases, thus forming an answer-aware summary to facilitate QG. Both FA-model and Q-model are formalized as simple-yet-effective Phrase-Enhanced Transformers, our joint model for phrase selection and text generation. Experimental results show that our method outperforms strong baselines on two popular QG datasets. Our code is available at https://github.com/zeaver/MultiFactor.

摘要
MultiFactor consists of two components: the FA-model, which simultaneously selects key phrases and generates full answers, and the Q-model, which takes the generated full answer as input to generate questions. The FA-model uses a Phrase-Enhanced Transformer to formalize the process of selecting key phrases and generating full answers. The Q-model also uses a Phrase-Enhanced Transformer to generate questions based on the generated full answer.The key innovation of MultiFactor is the use of full answer generation to connect the short answer with the selected key phrases, creating an answer-aware summary that facilitates QG. This approach allows the model to generate more coherent and relevant questions, especially for long contexts.Experimental results show that MultiFactor outperforms strong baselines on two popular QG datasets. Our code is available at https://github.com/zeaver/MultiFactor.

DistillCSE: Distilled Contrastive Learning for Sentence Embeddings

paper_url: http://arxiv.org/abs/2310.13499
repo_url: None
paper_authors: Jiahao Xu, Wei Shao, Lihui Chen, Lemao Liu
for: 本文提出了DistillCSE框架，它通过对自我教学模型进行对比学习，使用知识填充来帮助学习更强的模型。
methods: 本文使用了知识填充和对比学习两种方法来提高模型性能。
results: 实验结果表明，提出的DistillCSE方法可以超过许多强大的基eline方法，并实现新的状态场报表性能。

Abstract
This paper proposes the DistillCSE framework, which performs contrastive learning under the self-training paradigm with knowledge distillation. The potential advantage of DistillCSE is its self-enhancing feature: using a base model to provide additional supervision signals, a stronger model may be learned through knowledge distillation. However, the vanilla DistillCSE through the standard implementation of knowledge distillation only achieves marginal improvements due to severe overfitting. The further quantitative analyses demonstrate the reason that the standard knowledge distillation exhibits a relatively large variance of the teacher model's logits due to the essence of contrastive learning. To mitigate the issue induced by high variance, this paper accordingly proposed two simple yet effective solutions for knowledge distillation: a Group-P shuffling strategy as an implicit regularization and the averaging logits from multiple teacher components. Experiments on standard benchmarks demonstrate that the proposed DistillCSE outperforms many strong baseline methods and yields a new state-of-the-art performance.

摘要
Translated into Simplified Chinese:这篇论文提出了DistillCSE框架，它通过对比学习的自我训练方式进行知识储存，并可能带来更强的模型。然而，标准的DistillCSE实现中的知识储存只能达到有限的改进，主要是因为严重的过拟合。进一步的量化分析表明，标准的知识储存会导致教师模型的偏差值较大，这是因为对比学习的本质。为了解决这个问题，这篇论文提出了两种简单 yet有效的解决方案：Group-P混合策略作为隐式正则化，以及从多个教师组件中平均logits。实验表明，提议的DistillCSE可以超越许多强大的基线方法，并达到新的领域最佳性能。

Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning

paper_url: http://arxiv.org/abs/2310.13448
repo_url: None
paper_authors: Duarte M. Alves, Nuno M. Guerreiro, João Alves, José Pombal, Ricardo Rei, José G. C. de Souza, Pierre Colombo, André F. T. Martins
for: 本研究旨在探讨LLM-based机器翻译系统的缺陷和改进方法。
methods: 本研究使用 adapter-based 微调，并证明这种方法可以提高翻译效果，同时减少训练参数数量。
results: 研究发现，微调通常会降低几个示例的表现，但可以保留它们的翻译能力。此外，提出了一种简单的方法，可以在微调过程中包含几个示例，以提高翻译效果。

Abstract
Large language models (LLMs) are a promising avenue for machine translation (MT). However, current LLM-based MT systems are brittle: their effectiveness highly depends on the choice of few-shot examples and they often require extra post-processing due to overgeneration. Alternatives such as finetuning on translation instructions are computationally expensive and may weaken in-context learning capabilities, due to overspecialization. In this paper, we provide a closer look at this problem. We start by showing that adapter-based finetuning with LoRA matches the performance of traditional finetuning while reducing the number of training parameters by a factor of 50. This method also outperforms few-shot prompting and eliminates the need for post-processing or in-context examples. However, we show that finetuning generally degrades few-shot performance, hindering adaptation capabilities. Finally, to obtain the best of both worlds, we propose a simple approach that incorporates few-shot examples during finetuning. Experiments on 10 language pairs show that our proposed approach recovers the original few-shot capabilities while keeping the added benefits of finetuning.

摘要
Translated into Simplified Chinese:大型语言模型（LLM）是机器翻译（MT）的有望之路。然而，当前LLM基于的MT系统很脆弱：它们效果受选择少量示例的影响很大，并且经常需要额外处理因过量生成。其他方法，如特定化在翻译指令上的训练， computationally expensive 并可能会削弱在上下文学习Capabilities。在这篇论文中，我们对这个问题进行了更加细化的分析。我们首先显示，使用 adapter-based 特定化可以与传统训练方法相当，同时减少训练参数的数量，相对于50。此方法还超越了少量示例推荐和无需后处理或上下文示例。然而，我们发现，训练通常会降低少量示例性能，阻碍适应能力。最后，为了取得两个世界的优点，我们提议一种简单的方法，在特定化过程中包含少量示例。在10种语言对的实验中，我们的提议方法可以恢复原始的少量示例能力，同时保留特定化的优点。

The Past, Present, and Future of Typological Databases in NLP

paper_url: http://arxiv.org/abs/2310.13440
repo_url: None
paper_authors: Emi Baylor, Esther Ploeger, Johannes Bjerva
for: 这研究旨在探讨大规模语言类型学数据库的不一致性，以及这些数据库在自然语言处理（NLP）领域的应用。
methods: 该研究使用了系统性的方法来探讨类型学数据库之间的不一致性，以及这些数据库在NLP领域的应用。
results: 研究发现，大规模语言类型学数据库存在较多的不一致性，这些不一致性的原因包括编码错误、语言变化以及语义差异。此外，研究还发现，一种连续的类型学视角可以帮助解决这些不一致性问题，并且这种视角在未来可能会在语言模型化中发挥重要作用。

Abstract
Typological information has the potential to be beneficial in the development of NLP models, particularly for low-resource languages. Unfortunately, current large-scale typological databases, notably WALS and Grambank, are inconsistent both with each other and with other sources of typological information, such as linguistic grammars. Some of these inconsistencies stem from coding errors or linguistic variation, but many of the disagreements are due to the discrete categorical nature of these databases. We shed light on this issue by systematically exploring disagreements across typological databases and resources, and their uses in NLP, covering the past and present. We next investigate the future of such work, offering an argument that a continuous view of typological features is clearly beneficial, echoing recommendations from linguistics. We propose that such a view of typology has significant potential in the future, including in language modeling in low-resource scenarios.

摘要
typological information 有潜在的优势可以帮助自然语言处理（NLP）模型的发展,特别是 для低资源语言。不幸的是,当前大规模的 typological databases 和 Grambank 存在差异,不仅与其他一些 typological information 源相关,还与自己存在差异。这些差异的原因包括编码错误或语言变化,但许多这些不一致的原因是由于这些数据库的精确性不足。我们通过系统地探讨这些数据库之间的不一致,以及它们在 NLP 中的应用,从过去和现在两个方面来探讨这个问题。我们 subsequentially 探讨未来这种工作的发展,并提出一种持续视角的 typology 特征是非常有利,这与语言学界的建议相符。我们建议这种持续视角在未来会拥有显著的潜在价值,包括语言模型在低资源enario 中的应用。

Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations

paper_url: http://arxiv.org/abs/2310.13420
repo_url: https://github.com/conversation-chronicles/conversation-chronicles
paper_authors: Jihyoung Jang, Minseong Boo, Hyounghun Kim
for: This paper aims to address the limitation of existing open-domain chatbot research by incorporating contextual information from multiple consecutive sessions into the conversation setup.
methods: The authors introduce a new 1M multi-session dialogue dataset called Conversation Chronicles, which includes time intervals and fine-grained speaker relationships. They also propose a dialogue model called ReBot, which consists of chronological summarization and dialogue generation modules.
results: The human evaluation shows that dialogue episodes in Conversation Chronicles reflect the properties of long-term conversations while maintaining coherent and consistent interactions across all sessions. ReBot, trained on Conversation Chronicles, demonstrates long-term context understanding with a high human engagement score.

Abstract
In the field of natural language processing, open-domain chatbots have emerged as an important research topic. However, a major limitation of existing open-domain chatbot research is its singular focus on short single-session dialogue, neglecting the potential need for understanding contextual information in multiple consecutive sessions that precede an ongoing dialogue. Among the elements that compose the context in multi-session conversation settings, the time intervals between sessions and the relationships between speakers would be particularly important. Despite their importance, current research efforts have not sufficiently addressed these dialogical components. In this paper, we introduce a new 1M multi-session dialogue dataset, called Conversation Chronicles, for implementing a long-term conversation setup in which time intervals and fine-grained speaker relationships are incorporated. Following recent works, we exploit a large language model to produce the data. The extensive human evaluation shows that dialogue episodes in Conversation Chronicles reflect those properties while maintaining coherent and consistent interactions across all the sessions. We also propose a dialogue model, called ReBot, which consists of chronological summarization and dialogue generation modules using only around 630M parameters. When trained on Conversation Chronicles, ReBot demonstrates long-term context understanding with a high human engagement score.

摘要
Translation in Simplified Chinese:在自然语言处理领域，开放领域 чат机器人已经成为重要的研究主题。然而，现有的开放领域 чат机器人研究存在一个重要的限制，即它忽略了多场会话中的多个会话session的上下文信息。在多场会话设置中，时间间隔和对话者之间的关系是非常重要的。尽管它们的重要性，现有的研究努力并没有充分考虑这些对话组成部分。在这篇论文中，我们介绍了一个新的100万多场会话对话集合，called Conversation Chronicles，用于实现长期对话设置，在该设置中，时间间隔和细化的对话者关系都被考虑。采用最新的大语言模型生成数据。人工评估表明，Conversation Chronicles中的对话集合具备了这些属性，同时保持了一致和一致的交互。我们还提出了一个对话模型，called ReBot，它包括时间顺序概要和对话生成模块，只使用约630M参数。当训练在Conversation Chronicles上时，ReBot能够展示长期上下文理解，并获得了高度的人工参与度。

Towards Enhancing Relational Rules for Knowledge Graph Link Prediction

paper_url: http://arxiv.org/abs/2310.13411
repo_url: https://github.com/ninggirsu/run-gnn
paper_authors: Shuhan Wu, Huaiyu Wan, Wei Chen, Yuting Wu, Junfeng Shen, Youfang Lin
for: 提高知识图reasoning的性能
methods: 使用query related fusion gate unit模型关系的顺序性，并使用缓冲更新机制缓解延迟的实体信息传递问题
results: 在多个数据集上表现出优于传递和推导链预测任务

Abstract
Graph neural networks (GNNs) have shown promising performance for knowledge graph reasoning. A recent variant of GNN called progressive relational graph neural network (PRGNN), utilizes relational rules to infer missing knowledge in relational digraphs and achieves notable results. However, during reasoning with PRGNN, two important properties are often overlooked: (1) the sequentiality of relation composition, where the order of combining different relations affects the semantics of the relational rules, and (2) the lagged entity information propagation, where the transmission speed of required information lags behind the appearance speed of new entities. Ignoring these properties leads to incorrect relational rule learning and decreased reasoning accuracy. To address these issues, we propose a novel knowledge graph reasoning approach, the Relational rUle eNhanced Graph Neural Network (RUN-GNN). Specifically, RUN-GNN employs a query related fusion gate unit to model the sequentiality of relation composition and utilizes a buffering update mechanism to alleviate the negative effect of lagged entity information propagation, resulting in higher-quality relational rule learning. Experimental results on multiple datasets demonstrate the superiority of RUN-GNN is superior on both transductive and inductive link prediction tasks.

摘要
GRaph Neural Networks (GNNs) 有示 promise的表现力量知识图理解。一种最近的 GNN 变体called progressive relational graph neural network (PRGNN) 利用关系规则来推理知识图中缺失的信息，并取得了显著的成果。然而，在 PRGNN 中进行理解时，有两个重要的特性通常被忽略：（1）关系组合的顺序性，其中不同关系的组合顺序对 semantics 的关系规则产生影响，以及（2）延迟的实体信息传递，其中新出现的实体信息传递的速度落后于实体信息的需求速度。忽略这些特性会导致 incorrect 的关系规则学习和降低理解精度。为了解决这些问题，我们提出了一种新的知识图理解方法，即 Relational rUle eNhanced Graph Neural Network (RUN-GNN)。具体来说，RUN-GNN 使用一个查询相关融合门控制器来模型关系组合的顺序性，并使用一个缓冲更新机制来缓解实体信息传递的负面影响，从而实现更高质量的关系规则学习。实验结果表明，RUN-GNN 在多个数据集上的传递性和概率链预测任务上表现出色。

Explicit Alignment and Many-to-many Entailment Based Reasoning for Conversational Machine Reading

paper_url: http://arxiv.org/abs/2310.13409
repo_url: https://github.com/AidenYo/BiAE
paper_authors: Yangyang Luo, Shiyu Tian, Caixia Yuan, Xiaojie Wang
for: 本研究旨在提高对话机器阅读（CMR）系统的性能，特别是在多Turn对话中对文档和用户提供的信息进行Alignment。
methods: 该方法使用了轻量级多对多推理模块进行决策，并直接基于文档和已问题 generates follow-up问题。
results: 该方法在微准确率方面达到了领先的状态，并在公共领导者数据集ShARC上排名第一。

Abstract
Conversational Machine Reading (CMR) requires answering a user's initial question through multi-turn dialogue interactions based on a given document. Although there exist many effective methods, they largely neglected the alignment between the document and the user-provided information, which significantly affects the intermediate decision-making and subsequent follow-up question generation. To address this issue, we propose a pipeline framework that (1) aligns the aforementioned two sides in an explicit way, (2)makes decisions using a lightweight many-to-many entailment reasoning module, and (3) directly generates follow-up questions based on the document and previously asked questions. Our proposed method achieves state-of-the-art in micro-accuracy and ranks the first place on the public leaderboard of the CMR benchmark dataset ShARC.

摘要
对话机器阅读（CMR）需要基于给定文档回答用户的初始问题，通过多回交流互动。虽然现有许多有效方法，但它们忽略了文档和用户提供的信息之间的匹配，这对于中间决策和 subsequential 询问生成具有重要影响。为解决这个问题，我们提议一个管道式框架，包括以下三个部分：1. 显式对文档和用户提供的信息进行匹配，以确保它们之间的Alignment。2. 使用轻量级多对多推理模块进行决策，以便更好地处理多个问题。3. 基于文档和之前提出的问题，直接生成 subsequential 询问。我们的提议方法在微准确性方面达到了领先水平，并在公共领导板块上名列第一名。

Cache me if you Can: an Online Cost-aware Teacher-Student framework to Reduce the Calls to Large Language Models

paper_url: http://arxiv.org/abs/2310.13395
repo_url: https://github.com/stoyian/OCaTS
paper_authors: Ilias Stogiannidis, Stavros Vassos, Prodromos Malakasiotis, Ion Androutsopoulos
for: This paper aims to reduce the operating expense (OpEx) of using third-party language model (LLM) services for small and medium-sized enterprises (SMEs) by caching previous LLM responses and training local inexpensive models.
methods: The proposed framework includes criteria for deciding when to trust the local model or call the LLM, as well as a methodology to tune the criteria and measure the tradeoff between performance and cost.
results: Experimental results using two LLMs (GPT-3.5 and GPT-4) and two inexpensive students (k-NN classifier and Multi-Layer Perceptron) on two common business tasks (intent recognition and sentiment analysis) show that significant OpEx savings can be obtained with only slightly lower performance.

Abstract
Prompting Large Language Models (LLMs) performs impressively in zero- and few-shot settings. Hence, small and medium-sized enterprises (SMEs) that cannot afford the cost of creating large task-specific training datasets, but also the cost of pretraining their own LLMs, are increasingly turning to third-party services that allow them to prompt LLMs. However, such services currently require a payment per call, which becomes a significant operating expense (OpEx). Furthermore, customer inputs are often very similar over time, hence SMEs end-up prompting LLMs with very similar instances. We propose a framework that allows reducing the calls to LLMs by caching previous LLM responses and using them to train a local inexpensive model on the SME side. The framework includes criteria for deciding when to trust the local model or call the LLM, and a methodology to tune the criteria and measure the tradeoff between performance and cost. For experimental purposes, we instantiate our framework with two LLMs, GPT-3.5 or GPT-4, and two inexpensive students, a k-NN classifier or a Multi-Layer Perceptron, using two common business tasks, intent recognition and sentiment analysis. Experimental results indicate that significant OpEx savings can be obtained with only slightly lower performance.

摘要
LLMs 在零或几个预测设置下表现出色，因此小型和中型企业（SMEs）不能负担创建大量任务特定的训练数据集和自己的 LLM 的预训练成本，现在更多地使用第三方服务。然而，现有服务需每次调用 LLM 的费用，这成为了运营成本（OpEx）的一大部分。此外，客户输入通常在时间上很相似，因此 SMEs 通常会向 LLM 提交非常相似的输入。我们提议一个框架，可以减少对 LLM 的调用次数，通过缓存之前 LLM 的回答并使用其来训练在 SME 端的低成本模型。该框架包括决定是否信任本地模型或调用 LLM 的标准，以及跟踪这些标准的调整和评估。为实验目的，我们实现了我们的框架，使用 GPT-3.5 或 GPT-4 两个 LLM，以及两个低成本学生，一个 k-NN 分类器或一个多层感知器，使用两个常见的商业任务，意图识别和情感分析。实验结果表明，可以通过减少 OpEx 来获得显著的成本节省，只有微不足的性能下降。

Tuna: Instruction Tuning using Feedback from Large Language Models

paper_url: http://arxiv.org/abs/2310.13385
repo_url: https://github.com/microsoft/lmops
paper_authors: Haoran Li, Yiran Liu, Xingxing Zhang, Wei Lu, Furu Wei
for: 这 paper 的目的是提出一种基于 direct outputs 的 Instruction-tuned LLM，以提高模型的行为与人类偏好的 align。
methods: 这 paper 使用了两种新的方法： probablistic ranking 和 contextual ranking，以增加模型的可能性生成更好的响应。
results: 这 paper 的模型（Tuna）在 Super Natural Instructions 和 LMentry 等 119 个测试任务上表现出色，并且可以超过一些强大的奖励学习基准。

Abstract
Instruction tuning of open-source large language models (LLMs) like LLaMA, using direct outputs from more powerful LLMs such as Instruct-GPT and GPT-4, has proven to be a cost-effective way to align model behaviors with human preferences. However, the instruction-tuned model has only seen one response per instruction, lacking the knowledge of potentially better responses. In this paper, we propose finetuning an instruction-tuned LLM using our novel \textit{probabilistic ranking} and \textit{contextual ranking} approaches to increase the likelihood of generating better responses. Probabilistic ranking enables the instruction-tuned model to inherit the relative rankings of high-quality and low-quality responses from the teacher LLM. On the other hand, learning with contextual ranking allows the model to refine its own response distribution using the contextual understanding ability of stronger LLMs. Furthermore, we apply probabilistic ranking and contextual ranking sequentially to the instruction-tuned LLM. The resulting model, which we call \textbf{Tuna}, consistently improves the performance on Super Natural Instructions (119 test tasks), LMentry (25 test tasks), Vicuna QA, and can even obtain better results than several strong reinforcement learning baselines. Our code and data are available at \url{ https://github.com/microsoft/LMOps}.

摘要
大型自然语言模型（LLM）如LLaMA的指令调整，使用直接输出更强大的LLM的指令，如Instruct-GPT和GPT-4，已经证明是一种经济的方式来调整模型的行为与人类偏好相Alignment。然而，指令调整模型只看到了一个响应，缺乏更好的响应的知识。在这篇论文中，我们提议对指令调整LLM进行迭代finetuning，使用我们的新的概率排序和上下文排序方法来增加生成更好的响应的可能性。概率排序让指令调整模型继承更强大LLM的高质量和低质量响应的相对排名。然而，通过上下文排序来让模型使用更强大LLM的上下文理解能力来细化自己的响应分布。此外，我们采用概率排序和上下文排序两个顺序来处理指令调整LLM。得到的模型，我们称之为Tuna，在Super Natural Instructions（119个测试任务）、LMentry（25个测试任务）、Vicuna QA等测试任务上表现出色，甚至可以超过一些强大的强化学习基elines。我们的代码和数据可以在https://github.com/microsoft/LMOps上获取。

APP: Adaptive Prototypical Pseudo-Labeling for Few-shot OOD Detection

paper_url: http://arxiv.org/abs/2310.13380
repo_url: None
paper_authors: Pei Wang, Keqing He, Yutao Mou, Xiaoshuai Song, Yanan Wu, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu
for: 这篇论文的目的是提出一种在仅有几个标注IND意图的情况下进行偏出版本检测的方法。
methods: 本文提出了一种适应式prototype pseudo-labeling（APP）方法，包括一个 prototype OOD检测框架（ProtoOOD）来帮助使用有限IND数据进行低资源OOD检测，以及一种适应式pseudo-labeling方法来生成高质量pseudo OOD&IND标签。
results: 实验和分析显示了本方法在几据OOD检测中的效果。

Abstract
Detecting out-of-domain (OOD) intents from user queries is essential for a task-oriented dialogue system. Previous OOD detection studies generally work on the assumption that plenty of labeled IND intents exist. In this paper, we focus on a more practical few-shot OOD setting where there are only a few labeled IND data and massive unlabeled mixed data that may belong to IND or OOD. The new scenario carries two key challenges: learning discriminative representations using limited IND data and leveraging unlabeled mixed data. Therefore, we propose an adaptive prototypical pseudo-labeling (APP) method for few-shot OOD detection, including a prototypical OOD detection framework (ProtoOOD) to facilitate low-resource OOD detection using limited IND data, and an adaptive pseudo-labeling method to produce high-quality pseudo OOD\&IND labels. Extensive experiments and analysis demonstrate the effectiveness of our method for few-shot OOD detection.

摘要
检测用户查询外部域（OOD）意图是任务对话系统的重要任务。先前的OOD检测研究通常假设有充足的标注IND意图数据存在。在这篇论文中，我们专注于更实际的几shotOOD设定，其中只有几个标注IND数据和大量未标注混合数据，这些数据可能属于IND或OOD。这种新的情况带来两个关键挑战：使用有限的IND数据学习准确的表示，并使用大量混合数据进行挖掘。因此，我们提出了适应型pseudo标签法（APP），包括一个 проtotypical OOD检测框架（ProtoOOD），以便在有限IND数据情况下进行低资源OOD检测，以及一种适应pseudo标签方法，以生成高质量pseudo OOD&IND标签。广泛的实验和分析表明，我们的方法在几shotOOD检测中表现出色。

Analyzing Cognitive Plausibility of Subword Tokenization

paper_url: http://arxiv.org/abs/2310.13348
repo_url: https://github.com/clap-lab/cogtok
paper_authors: Lisa Beinborn, Yuval Pinter
for: 本研究的目的是评估不同语言的字根词法 tokenization 算法的认知可能性。
methods: 本研究使用了一种新的评估方法，通过对人类在 lexical decision 任务中的响应时间和准确率与 tokenizer 输出之间的相关性进行分析，以评估不同 tokenization 算法的认知可能性。
results: 研究结果显示，UnigramLM 算法在不同语言和词汇大小下的 tokenization 行为更加不具认知可能性，同时也忽略了 derivational morphemes 的覆盖率。

Abstract
Subword tokenization has become the de-facto standard for tokenization, although comparative evaluations of subword vocabulary quality across languages are scarce. Existing evaluation studies focus on the effect of a tokenization algorithm on the performance in downstream tasks, or on engineering criteria such as the compression rate. We present a new evaluation paradigm that focuses on the cognitive plausibility of subword tokenization. We analyze the correlation of the tokenizer output with the response time and accuracy of human performance on a lexical decision task. We compare three tokenization algorithms across several languages and vocabulary sizes. Our results indicate that the UnigramLM algorithm yields less cognitively plausible tokenization behavior and a worse coverage of derivational morphemes, in contrast with prior work.

摘要
宋Wordtokenization已成为实际标准的分词方法，尽管相对评估语言之间的子字词词库质量的比较罕见。现有的评估研究主要关注下游任务性能的影响或工程特性如压缩率。我们提出了一种新的评估方框，关注分词器输出与人类语言决策任务的响应时间和准确率之间的相关性。我们对三种分词算法进行了多种语言和词汇大小的比较。我们的结果表明，UnigramLM算法产生的分词行为更加不具有认知可能性，同时对 derivational morpheme 的覆盖率也更差，与先前的研究不符。

Large-Scale and Multi-Perspective Opinion Summarization with Diverse Review Subsets

paper_url: http://arxiv.org/abs/2310.13340
repo_url: None
paper_authors: Han Jiang, Rui Wang, Zhihua Wei, Yu Li, Xinpeng Wang
for: 提供了一种基于监督学习的多角度评论摘要框架，以便对大规模评论集进行有效的摘要。
methods: 提出了一种受过监督的摘要框架，包括评论采样策略集和两 stage 训练方案。评论采样策略会根据评论的 sentiment orientation 和 contrastive information value 选择不同的评论 subset。
results: 实验结果表明，SUBSUMM 能够从百余篇评论中生成高质量的摘要，包括 Pros、Cons 和 Verdict 摘要。此外，我们的深入分析表明，选择评论 subset 和两 stage 训练方案是提高摘要性能的关键因素。

Abstract
Opinion summarization is expected to digest larger review sets and provide summaries from different perspectives. However, most existing solutions are deficient in epitomizing extensive reviews and offering opinion summaries from various angles due to the lack of designs for information selection. To this end, we propose SUBSUMM, a supervised summarization framework for large-scale multi-perspective opinion summarization. SUBSUMM consists of a review sampling strategy set and a two-stage training scheme. The sampling strategies take sentiment orientation and contrastive information value into consideration, with which the review subsets from different perspectives and quality levels can be selected. Subsequently, the summarizer is encouraged to learn from the sub-optimal and optimal subsets successively in order to capitalize on the massive input. Experimental results on AmaSum and Rotten Tomatoes datasets demonstrate that SUBSUMM is adept at generating pros, cons, and verdict summaries from hundreds of input reviews. Furthermore, our in-depth analysis verifies that the advanced selection of review subsets and the two-stage training scheme are vital to boosting the summarization performance.

摘要
文本翻译为简化中文。 существующие解决方案因缺乏信息选择的设计而无法生成覆盖广泛评论和多个角度的意见摘要。为此，我们提出了SUBSUMM，一种监督摘要框架，用于大规模多角度意见摘要。SUBSUMM包括评论采样策略集和两个阶段训练方案。采样策略考虑了 sentiment 方向和对比信息价值，可以从不同的角度和质量水平中选择评论 subset。然后，摘要器受益于大量输入，逐渐学习从优秀和优化subset中获得知识。实验结果表明，SUBSUMM可以从多达百个输入评论中生成评价、缺点和结论摘要。此外，我们的深入分析表明，选择评论subset的高级技巧和两个阶段训练方案对摘要性能产生了重要的提高作用。Here is the translation of the text into Simplified Chinese:<文本翻译为简化中文。现有的解决方案因缺乏信息选择的设计而无法生成覆盖广泛评论和多个角度的意见摘要。为此，我们提出了SUBSUMM，一种监督摘要框架，用于大规模多角度意见摘要。SUBSUMM包括评论采样策略集和两个阶段训练方案。采样策略考虑了 sentiment 方向和对比信息价值，可以从不同的角度和质量水平中选择评论 subset。然后，摘要器受益于大量输入，逐渐学习从优秀和优化subset中获得知识。实验结果表明，SUBSUMM可以从多达百个输入评论中生成评价、缺点和结论摘要。此外，我们的深入分析表明，选择评论subset的高级技巧和两个阶段训练方案对摘要性能产生了重要的提高作用。

Beyond Hard Samples: Robust and Effective Grammatical Error Correction with Cycle Self-Augmenting

paper_url: http://arxiv.org/abs/2310.13321
repo_url: https://github.com/zetangforward/csa-gec
paper_authors: Zecheng Tang, Kaifeng Qi, Juntao Li, Min Zhang
for: 这个研究是为了提高语法错误修正模型的Robustness，对抗特定类型的攻击。
methods: 本研究使用了sequence-to-sequence模型，并将其攻击到四种不同的攻击类型。furthermore, the paper proposes a simple yet effective Cycle Self-Augmenting (CSA) method to improve the model’s robustness.
results: 实验结果显示，使用CSA方法可以帮助四种不同的基eline模型增强其Robustness，而不需要将攻击示例加入训练过程中。此外，CSA方法可以降低模型对于没有错误的数据的适应性，并提高模型对于未见过的数据的一致性。

Abstract
Recent studies have revealed that grammatical error correction methods in the sequence-to-sequence paradigm are vulnerable to adversarial attack, and simply utilizing adversarial examples in the pre-training or post-training process can significantly enhance the robustness of GEC models to certain types of attack without suffering too much performance loss on clean data. In this paper, we further conduct a thorough robustness evaluation of cutting-edge GEC methods for four different types of adversarial attacks and propose a simple yet very effective Cycle Self-Augmenting (CSA) method accordingly. By leveraging the augmenting data from the GEC models themselves in the post-training process and introducing regularization data for cycle training, our proposed method can effectively improve the model robustness of well-trained GEC models with only a few more training epochs as an extra cost. More concretely, further training on the regularization data can prevent the GEC models from over-fitting on easy-to-learn samples and thus can improve the generalization capability and robustness towards unseen data (adversarial noise/samples). Meanwhile, the self-augmented data can provide more high-quality pseudo pairs to improve model performance on the original testing data. Experiments on four benchmark datasets and seven strong models indicate that our proposed training method can significantly enhance the robustness of four types of attacks without using purposely built adversarial examples in training. Evaluation results on clean data further confirm that our proposed CSA method significantly improves the performance of four baselines and yields nearly comparable results with other state-of-the-art models. Our code is available at https://github.com/ZetangForward/CSA-GEC.

摘要
近期研究发现，序列到序列框架中的语法错误纠正方法容易受到敌意攻击，而使用敌意示例在预训练或后训练过程中可以有效提高GEC模型对certain类型的攻击的抵抗力，而不是受到过多的clean数据影响。在这篇论文中，我们进一步进行了四种不同类型的敌意攻击的精orous evaluate，并提出了一种简单 yet very effective的自回归增强（CSA）方法。通过在后训练过程中利用GEC模型自己生成的增强数据，并在训练数据中引入循环训练数据，我们的提议的方法可以有效提高已经训练过的GEC模型的模型 robustness，只需要增加一些更多的训练粒度。更具体地说，进一步训练在循环训练数据上可以防止GEC模型过拟合易学习样本，提高模型的总体化能力和对未看到数据（敌意噪音）的Robustness。同时，自回归数据可以为模型提供更多的高质量 Pseudo pair，提高模型在原始测试数据上的性能。实验结果表明，我们的提议的训练方法可以有效提高四种攻击类型的Robustness，而不需要在训练过程中使用特制的敌意示例。 clean数据上的评估结果还证明，我们的CSA方法可以大幅提高四个基eline的性能，并与其他当前领先模型几乎相当。我们的代码可以在https://github.com/ZetangForward/CSA-GEC中找到。

Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models

paper_url: http://arxiv.org/abs/2310.13315
repo_url: None
paper_authors: Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao
for: 这篇论文旨在提出一个零shot量化框架，来实现零shot量化的各种语言模型（PLM）。
methods: 本文使用的方法包括零shot量化和零shot预测，并且提出了一个名为SAM-SGA优化的算法，用于提高量化精度和模型泛化。
results: 实验结果显示，本文的方法可以对11个任务中的描述性和生成性PLM都带来明显和重要的性能提升，最高提升为+6.98均值分数。此外，本文也证明了这个方法可以改善模型的泛化性。

Abstract
Quantization is a promising approach for reducing memory overhead and accelerating inference, especially in large pre-trained language model (PLM) scenarios. While having no access to original training data due to security and privacy concerns has emerged the demand for zero-shot quantization. Most of the cutting-edge zero-shot quantization methods primarily 1) apply to computer vision tasks, and 2) neglect of overfitting problem in the generative adversarial learning process, leading to sub-optimal performance. Motivated by this, we propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs. The key algorithm in solving ZSAQ is the SAM-SGA optimization, which aims to improve the quantization accuracy and model generalization via optimizing a minimax problem. We theoretically prove the convergence rate for the minimax optimization problem and this result can be applied to other nonconvex-PL minimax optimization frameworks. Extensive experiments on 11 tasks demonstrate that our method brings consistent and significant performance gains on both discriminative and generative PLMs, i.e., up to +6.98 average score. Furthermore, we empirically validate that our method can effectively improve the model generalization.

摘要
“量化是一种具有潜在的方法来降低记忆预算和加速推导，特别在大型预训语言模型（PLM）的情况下。由于安全和隐私问题的缘故，无法存取原始训练数据的需求导致了零统计量化的需求。现有的大部分cutting-edge零统计量化方法主要应用于计算机视觉任务，并且忽略了生成对抗学习过程中的过溢问题，导致表现不佳。骉于这，我们提出了一个新的零统计锐度感知量化（ZSAQ）框架，用于零统计量化不同PLM。关键算法在解决ZSAQ中是SAM-SGA优化，旨在提高量化精度和模型通用化via优化最小最大问题。我们 theoretically prove了最小最大问题的收敛率，这个结果可以应用到其他非对称PL最小最大问题框架。实验结果显示，我们的方法可以在11个任务中提供了稳定和有意义的性能提升，最高提升率为+6.98。此外，我们还证明了我们的方法可以有效地提高模型通用化。”

Exploring the Impact of Corpus Diversity on Financial Pretrained Language Models

paper_url: http://arxiv.org/abs/2310.13312
repo_url: https://github.com/deep-over/film
paper_authors: Jaeyoung Choe, Keonwoong Noh, Nayeon Kim, Seyun Ahn, Woohwan Jung
for: 这种论文主要为了解决金融领域语言模型的不足之处，提高金融领域下推理语言模型的性能。
methods: 该论文使用了各种金融领域的文本数据集，对这些数据集进行了广泛的采集和训练，并使用了一种新的训练策略来提高模型的性能。
results: 论文的实验结果表明，新提出的金融语言模型（FiLM）不仅可以在金融领域上超越现有的专业语言模型，还可以在未经见过的文本数据集上达到更高的性能。

Abstract
Over the past few years, various domain-specific pretrained language models (PLMs) have been proposed and have outperformed general-domain PLMs in specialized areas such as biomedical, scientific, and clinical domains. In addition, financial PLMs have been studied because of the high economic impact of financial data analysis. However, we found that financial PLMs were not pretrained on sufficiently diverse financial data. This lack of diverse training data leads to a subpar generalization performance, resulting in general-purpose PLMs, including BERT, often outperforming financial PLMs on many downstream tasks. To address this issue, we collected a broad range of financial corpus and trained the Financial Language Model (FiLM) on these diverse datasets. Our experimental results confirm that FiLM outperforms not only existing financial PLMs but also general domain PLMs. Furthermore, we provide empirical evidence that this improvement can be achieved even for unseen corpus groups.

摘要
(Simplified Chinese translation)在过去几年，一些领域特定的预训练语言模型（PLMs）已经被提出，并在各种专业领域，如生物医学、科学和医疗领域中表现出色。此外，由于金融数据分析的高经济影响，金融PLMs也被研究。然而，我们发现金融PLMs未被训练在充分多样化的金融数据上。这种缺乏多样化训练数据导致总体性能下降，使得通用领域PLMs，包括BERT，在许多下游任务中表现更好。为解决这个问题，我们收集了广泛的金融文献，并将这些多样化的数据集用于训练金融语言模型（FiLM）。我们的实验结果表明，FiLM不仅能超越现有的金融PLMs，还能超越通用领域PLMs。此外，我们还提供了实验证据，表明这种改进可以在未看到的文献组中实现。

Test-Time Self-Adaptive Small Language Models for Question Answering

paper_url: http://arxiv.org/abs/2310.13307
repo_url: None
paper_authors: Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park
for: 这个研究是为了测试小型自适应语言模型（LM）的可行性，以及它们在不同的问题 answering（QA）任务中的表现。
methods: 研究使用了自适应策略，将多个答案生成并对其进行整合，并且过滤出低质量的样本以减少错误的标签。
results: 研究发现，这种自适应策略可以帮助小型LM在不同的问题中表现更好，并且具有更高的稳定性。

Abstract
Recent instruction-finetuned large language models (LMs) have achieved notable performances in various tasks, such as question-answering (QA). However, despite their ability to memorize a vast amount of general knowledge across diverse tasks, they might be suboptimal on specific tasks due to their limited capacity to transfer and adapt knowledge to target tasks. Moreover, further finetuning LMs with labeled datasets is often infeasible due to their absence, but it is also questionable if we can transfer smaller LMs having limited knowledge only with unlabeled test data. In this work, we show and investigate the capabilities of smaller self-adaptive LMs, only with unlabeled test data. In particular, we first stochastically generate multiple answers, and then ensemble them while filtering out low-quality samples to mitigate noise from inaccurate labels. Our proposed self-adaption strategy demonstrates significant performance improvements on benchmark QA datasets with higher robustness across diverse prompts, enabling LMs to stay stable. Code is available at: https://github.com/starsuzi/T-SAS.

摘要
现代指令精细调整大型语言模型（LM）在多种任务上已经实现了各种优秀的表现，如问答（QA）。然而，尽管它们可以储存大量的通用知识，但可能因为缺乏目标任务的特定知识而表现不佳。此外，进一步在标注数据缺乏的情况下进行LM的训练是常见的，但是是否可以将更小的LM通过无标注测试数据进行学习呢？在这项工作中，我们展示了和研究了更小的自适应LM，只使用无标注测试数据进行学习。具体来说，我们首先随机生成多个答案，然后将它们ensemble，并对它们进行筛选，以mitigate噪音从不准确的标签中。我们的自适应策略在 benchmark QA 数据集上显示了显著的性能提升，并且具有更高的多样性和稳定性，使LM在多种提问下能够稳定。代码可以在：https://github.com/starsuzi/T-SAS 中找到。

Interpreting Indirect Answers to Yes-No Questions in Multiple Languages

paper_url: http://arxiv.org/abs/2310.13290
repo_url: https://github.com/wang-zijie/yn-question-multilingual
paper_authors: Zijie Wang, Md Mosharaf Hossain, Shivam Mathur, Terry Cruz Melo, Kadir Bulut Ozler, Keun Hee Park, Jacob Quintero, MohammadHossein Rezaei, Shreya Nupur Shakya, Md Nayem Uddin, Eduardo Blanco
for: 这篇论文主要针对响应问题，即回答问题时，答案是否直接回答问题。
methods: 该论文使用远程指导方法收集训练数据，并证明直接回答（即包含肯定或否定词）可以帮助模型理解间接回答。
results: 实验结果显示，在训练数据可以通过远程指导方法获得时，单语言精度提升是有利的（5种语言），而跨语言精度提升总是有利（8种语言）。

Abstract
Yes-no questions expect a yes or no for an answer, but people often skip polar keywords. Instead, they answer with long explanations that must be interpreted. In this paper, we focus on this challenging problem and release new benchmarks in eight languages. We present a distant supervision approach to collect training data. We also demonstrate that direct answers (i.e., with polar keywords) are useful to train models to interpret indirect answers (i.e., without polar keywords). Experimental results demonstrate that monolingual fine-tuning is beneficial if training data can be obtained via distant supervision for the language of interest (5 languages). Additionally, we show that cross-lingual fine-tuning is always beneficial (8 languages).

摘要
Yes-no问题通常需要简单的Yes或No回答，但人们经常会跳过极性词。在这篇论文中，我们关注这个挑战问题，并发布八种语言的新标准 benchmark。我们采用远程指导方法来收集训练数据。我们还证明了直接回答（即包含极性词）对于解释 indirect answers（即无极性词）的训练非常有用。实验结果表明，对于语言兴趣的语言，远程指导下的单语言精度调教是有利的。此外，我们还证明了跨语言精度调教总是有利（8种语言）。

SALMONN: Towards Generic Hearing Abilities for Large Language Models

paper_url: http://arxiv.org/abs/2310.13289
repo_url: https://github.com/bytedance/salmonn
paper_authors: Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang
For: 这个论文旨在提出一种能够直接处理和理解通用音频输入的多模态模型（SALMONN），它将预训练的文本大语言模型（LLM）与语音和音频编码器结合在一起，以实现多种语音和音频任务的竞争性表现。* Methods: 这篇论文使用了一种混合多模态的方法，将预训练的文本大语言模型与语音和音频编码器结合在一起，以实现多种语音和音频任务的竞争性表现。* Results: 这篇论文的实验结果表明，SALMONN模型可以在多种语音和音频任务上实现竞争性表现，并且具有一些不可预期的跨模态能力，如语音翻译到未知语言、语音基于槽 filling、听话问答、音频故事等等。

Abstract
Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical world, which refers to the perception and understanding of general auditory information consisting of at least three types of sounds: speech, audio events, and music. In this paper, we propose SALMONN, a speech audio language music open neural network, built by integrating a pre-trained text-based large language model (LLM) with speech and audio encoders into a single multimodal model. SALMONN enables the LLM to directly process and understand general audio inputs and achieve competitive performances on a number of speech and audio tasks used in training, such as automatic speech recognition and translation, auditory-information-based question answering, emotion recognition, speaker verification, and music and audio captioning \textit{etc.} SALMONN also has a diverse set of emergent abilities unseen in the training, which includes but is not limited to speech translation to untrained languages, speech-based slot filling, spoken-query-based question answering, audio-based storytelling, and speech audio co-reasoning \textit{etc}. The presence of the cross-modal emergent abilities is studied, and a novel few-shot activation tuning approach is proposed to activate such abilities of SALMONN. To our knowledge, SALMONN is the first model of its type and can be regarded as a step towards AI with generic hearing abilities. An interactive demo of SALMONN is available at \texttt{\url{https://github.com/bytedance/SALMONN}, and the training code and model checkpoints will be released upon acceptance.

摘要
听见是人工智能（AI）机器人在物理世界中的重要能力，指的是理解和处理一般声音信息，包括语音、音频事件和音乐等三种类型的声音。在这篇论文中，我们提出了一种名为SALMONN的语音音乐开放神经网络，通过将预训练的文本大型语言模型（LLM）与语音和音频编码器结合在一起而实现。SALMONN使得LLM可以直接处理和理解一般音频输入，并在训练中使用的各种语音和音频任务中达到竞争性的表现，如自动语音识别和翻译、听力信息基于问题回答、情感识别、人识别、音频和歌曲captioning等等。SALMONN还具有许多未在训练中看到的新的能力，包括但不限于语音翻译到未经训练的语言、语音基于插槽填充、声音问题回答、音频故事、语音音频合理等等。我们研究了这些跨模态的新能力的存在，并提出了一种新的几招活动调整方法来活化SALMONN的这些能力。到我们所知，SALMONN是首个类似的模型，可以视为人工智能机器人的听见能力的一步进步。SALMONN的交互示例可以在\url{https://github.com/bytedance/SALMONN}上查看，训练代码和模型检查点将在接受后发布。

paper_url: http://arxiv.org/abs/2310.13276
repo_url: https://github.com/yimuwangcs/Better_Cross_Modal_Retrieval
paper_authors: Xiangru Jian, Yimu Wang
for: 解决跨模态检索中的表达缩排问题，提高检索性能。
methods: 引入InvGC方法，一种基于图 convolution和平均 pooling的后处理技术，以及LocalAdj提升方法，用于提高 InvGC 的效率和效果。
results: 对多个跨模态benchmark和方法进行了实验验证，并证明了 InvGC 和 InvGC w/LocalAdj 可以有效地 mitigate表达缩排问题，提高检索性能。

Abstract
Over recent decades, significant advancements in cross-modal retrieval are mainly driven by breakthroughs in visual and linguistic modeling. However, a recent study shows that multi-modal data representations tend to cluster within a limited convex cone (as representation degeneration problem), which hinders retrieval performance due to the inseparability of these representations. In our study, we first empirically validate the presence of the representation degeneration problem across multiple cross-modal benchmarks and methods. Next, to address it, we introduce a novel method, called InvGC, a post-processing technique inspired by graph convolution and average pooling. Specifically, InvGC defines the graph topology within the datasets and then applies graph convolution in a subtractive manner. This method effectively separates representations by increasing the distances between data points. To improve the efficiency and effectiveness of InvGC, we propose an advanced graph topology, LocalAdj, which only aims to increase the distances between each data point and its nearest neighbors. To understand why InvGC works, we present a detailed theoretical analysis, proving that the lower bound of recall will be improved after deploying InvGC. Extensive empirical results show that InvGC and InvGC w/LocalAdj significantly mitigate the representation degeneration problem, thereby enhancing retrieval performance. Our code is available at https://github.com/yimuwangcs/Better_Cross_Modal_Retrieval

摘要
近年来，跨模态检索的大进步主要归功于视觉和语言模型的突破。然而，一项研究发现，跨模态数据表示往往归 converges within a limited convex cone（表示力 degeneration 问题），这会妨碍检索性能，因为这些表示无法分离。在我们的研究中，我们首先确认了多个跨模态benchmark和方法中的表示力 degeneration 问题的存在。然后，我们提出了一种新方法，叫做InvGC，它是基于图 convolution和平均pooling的后处理技术。具体来说，InvGC定义dataset中的图 topology，然后通过图 convolution的 subtractive 方式来分离表示。这种方法可以增加数据点之间的距离，从而提高检索性能。为了提高InvGC的效率和可效性，我们提出了一种高级图 topology，叫做LocalAdj，它只是增加每个数据点和其最近邻居之间的距离。为了解释 InvGC 是如何工作的，我们提供了详细的理论分析，证明 InvGC 后部署后，减少了下界的回归值，从而提高了检索性能。我们的代码可以在上获取。

paper_url: http://arxiv.org/abs/2310.13267
repo_url: None
paper_authors: Mengjie Zhao, Junya Ono, Zhi Zhong, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Takashi Shibuya, Hiromi Wakaki, Yuki Mitsufuji
for: 这篇论文旨在探讨对比式跨模型CLIP和CLAP在视频语言（VL）和音频语言（AL）任务中的表现，以及其语言编码器的质量和改进方法。
methods: 这篇论文使用了不监督和监督句子嵌入训练来评估语言编码器质量和跨模态任务表现。
results: 在VL预训练中，句子嵌入训练语言编码器质量和跨模态任务表现得到了提高，例如CyCLIP。然而，在AL预训练中，句子嵌入训练的效果较差，这可能与预训练数据的有限性有关。分析表示空间和跨模态Alignment的表示空间，发现句子嵌入训练提高了文本空间的均匀性，但是同时导致了跨模态Alignment的减退。

Abstract
Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks. However, there has been limited investigation of and improvement in their language encoder, which is the central component of encoding natural language descriptions of image/audio into vector representations. We extensively evaluate how unsupervised and supervised sentence embedding training affect language encoder quality and cross-modal task performance. In VL pretraining, we found that sentence embedding training language encoder quality and aids in cross-modal tasks, improving contrastive VL models such as CyCLIP. In contrast, AL pretraining benefits less from sentence embedding training, which may result from the limited amount of pretraining data. We analyze the representation spaces to understand the strengths of sentence embedding training, and find that it improves text-space uniformity, at the cost of decreased cross-modal alignment.

摘要
“对比式跨模型如CLIP和CLAP在视觉语言（VL）和语音语言（AL）任务中具有帮助作用。然而，对于这些模型的语言Encoder进行训练仍然受到有限的研究和改进。我们进行了广泛的评估，探索不同的受训练方法对语言Encoder质量和跨模型任务表现的影响。在VL预训中，我们发现这种训练可以提高语言Encoder质量，并且帮助改进对比式VL模型，如CyCLIP。然而，在AL预训中，这种训练几乎没有助益，这可能是因为预训数据的限制。我们分析了表示空间，了解了句子嵌入训练带来的优点，发现它可以提高文本空间的一致性，但是价格是跨模型Alignment的降低。”

paper_url: http://arxiv.org/abs/2310.13265
repo_url: https://github.com/lezhang7/moqagpt
paper_authors: Le Zhang, Yihong Wu, Fengran Mo, Jian-Yun Nie, Aishwarya Agrawal
for: 这篇论文主要targets Multi-modal open-domain question answering task, aiming to improve the performance of large language models (LLMs) in this task.
methods: 该论文提出了一种 straightforward and flexible framework called MoqaGPT, which uses a divide-and-conquer strategy to retrieve and extract answers from multiple modalities, and then fuses this multi-modal information using LLMs to produce a final answer.
results: 根据MMCoQA和MultiModalQA dataset的实验结果，MoqaGPT比supervised baseline提高了F1分数37.91点和EM分数34.07点，在Zero-shot setting下也超过了基线值，提高F1分数9.5点和EM分数10.1点，并且与supervised方法的性能差距有所减少。

Abstract
Multi-modal open-domain question answering typically requires evidence retrieval from databases across diverse modalities, such as images, tables, passages, etc. Even Large Language Models (LLMs) like GPT-4 fall short in this task. To enable LLMs to tackle the task in a zero-shot manner, we introduce MoqaGPT, a straightforward and flexible framework. Using a divide-and-conquer strategy that bypasses intricate multi-modality ranking, our framework can accommodate new modalities and seamlessly transition to new models for the task. Built upon LLMs, MoqaGPT retrieves and extracts answers from each modality separately, then fuses this multi-modal information using LLMs to produce a final answer. Our methodology boosts performance on the MMCoQA dataset, improving F1 by +37.91 points and EM by +34.07 points over the supervised baseline. On the MultiModalQA dataset, MoqaGPT surpasses the zero-shot baseline, improving F1 by 9.5 points and EM by 10.1 points, and significantly closes the gap with supervised methods. Our codebase is available at https://github.com/lezhang7/MOQAGPT.

摘要
多Modal开放领域问答通常需要从不同的modalities中检索证据，如图像、表格、段落等。即使大型自然语言模型（LLM）如GPT-4也有所不足。为了让LLM在零shot情况下能够完成这项任务，我们介绍了MoqaGPT框架。我们的框架采用分治策略，不需要复杂的多Modal评分，可以轻松扩展到新的modalities和新的任务。基于LLM，MoqaGPT首先从每个modalities中分别检索答案，然后使用LLM将这些多Modal信息进行融合，生成最终的答案。我们的方法在MMCoQA数据集上提高了性能，相比超参的基线，提高了F1值37.91点和EM值34.07点。在MultiModalQA数据集上，MoqaGPT超越零shot基线，提高了F1值9.5点和EM值10.1点，并在无监督方法上减少了差距。我们的代码可以在https://github.com/lezhang7/MOQAGPT上获取。

A Quality-based Syntactic Template Retriever for Syntactically-controlled Paraphrase Generation

paper_url: http://arxiv.org/abs/2310.13262
repo_url: https://github.com/xzhang00/qstr
paper_authors: Xue Zhang, Songming Zhang, Yunlong Liang, Yufeng Chen, Jian Liu, Wenjuan Han, Jinan Xu
for: 提高自然语言处理 tasks 中的 paraphrase 生成质量，尤其是在没有人工标注或高质量模板的情况下。
methods: 提出了一种新的质量基于的语法模板检索器 (QSTR)，通过评估生成的 paraphrase 质量来选择最佳的语法模板。此外，为了提高多个 paraphrase 的多样性，我们还提出了一种多样性检索算法 (DTS)。
results: QSTR 可以大幅超越现有的检索方法，在生成高质量 paraphrase 方面取得显著成果，甚至与人工标注的模板相当在无参照度量上表现出色。此外，人工评估和下游任务中使用我们生成的 paraphrase 也表现出了优秀的潜力。

Abstract
Existing syntactically-controlled paraphrase generation (SPG) models perform promisingly with human-annotated or well-chosen syntactic templates. However, the difficulty of obtaining such templates actually hinders the practical application of SPG models. For one thing, the prohibitive cost makes it unfeasible to manually design decent templates for every source sentence. For another, the templates automatically retrieved by current heuristic methods are usually unreliable for SPG models to generate qualified paraphrases. To escape this dilemma, we propose a novel Quality-based Syntactic Template Retriever (QSTR) to retrieve templates based on the quality of the to-be-generated paraphrases. Furthermore, for situations requiring multiple paraphrases for each source sentence, we design a Diverse Templates Search (DTS) algorithm, which can enhance the diversity between paraphrases without sacrificing quality. Experiments demonstrate that QSTR can significantly surpass existing retrieval methods in generating high-quality paraphrases and even perform comparably with human-annotated templates in terms of reference-free metrics. Additionally, human evaluation and the performance on downstream tasks using our generated paraphrases for data augmentation showcase the potential of our QSTR and DTS algorithm in practical scenarios.

摘要
现有的语法控制的篇章生成（SPG）模型在人工标注或选择的语法模板上表现良好。然而，获得这些模板的困难实际上限制了SPG模型的实际应用。一方面，人工设计Decent模板的成本太高，无法实际应用。另一方面，由现有的索引方法自动获取的模板通常不可靠，导致SPG模型生成质量不高的篇章。为了解决这个困境，我们提出了一种新的质量基于的语法模板检索器（QSTR），可以根据生成的篇章质量来选择语法模板。此外，为了处理每个源句需要多个篇章的情况，我们设计了多样性检索（DTS）算法，可以提高篇章之间的多样性而不是质量的牺牲。实验表明，QSTR可以明显超过现有的检索方法，生成高质量的篇章，甚至与人工标注的模板相当在无参照度量上表现出色。此外，人工评估和用我们生成的篇章进行数据增强任务的表现也表明了我们的QSTR和DTS算法在实际场景中的潜力。

Anomaly Detection of Command Shell Sessions based on DistilBERT: Unsupervised and Supervised Approaches

paper_url: http://arxiv.org/abs/2310.13247
repo_url: None
paper_authors: Zefang Liu, John Buford
for: 检测 Unix shell 会话异常行为是计算机安全中的一项关键任务。
methods: 我们使用预训练的 DistilBERT 模型，结合无监督和监督学习技术，以识别 Unix shell 会话中异常活动，同时尽量避免数据标注。
results: 在一个大规模企业数据集上进行实验，我们的方法能够准确地检测 Unix shell 会话中的异常行为。

Abstract
Anomaly detection in command shell sessions is a critical aspect of computer security. Recent advances in deep learning and natural language processing, particularly transformer-based models, have shown great promise for addressing complex security challenges. In this paper, we implement a comprehensive approach to detect anomalies in Unix shell sessions using a pretrained DistilBERT model, leveraging both unsupervised and supervised learning techniques to identify anomalous activity while minimizing data labeling. The unsupervised method captures the underlying structure and syntax of Unix shell commands, enabling the detection of session deviations from normal behavior. Experiments on a large-scale enterprise dataset collected from production systems demonstrate the effectiveness of our approach in detecting anomalous behavior in Unix shell sessions. This work highlights the potential of leveraging recent advances in transformers to address important computer security challenges.

摘要
“命令行Session anomaly detection是计算机安全的关键方面。近年来，深度学习和自然语言处理技术，特别是基于变换器的模型，在解决复杂安全挑战方面表现出了惊人的承诺。本文，我们实现了一种涵盖全面的命令行Session anomaly detection方法，使用预训练的DistilBERT模型，结合无监督和监督学习技术，以确定异常行为，同时尽量避免数据标注。无监督方法捕捉了 Unix shell命令的内部结构和语法，使得检测会话异常行为变得可能。在一个大规模的企业数据集上进行了实验， demonstarted our approach的效果性在 Unix shell sessions中检测异常行为。这种工作表明了利用最新的变换器技术来解决计算机安全挑战的潜力。”Note that Simplified Chinese is used in mainland China, and Traditional Chinese is used in Taiwan and other regions.

Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking

paper_url: http://arxiv.org/abs/2310.13243
repo_url: https://github.com/ielab/llm-qlm
paper_authors: Shengyao Zhuang, Bing Liu, Bevan Koopman, Guido Zuccon
for: This paper focuses on investigating the effectiveness of recent large language models (LLMs) as query likelihood models (QLMs) for zero-shot ranking of documents.
methods: The authors use pre-trained LLMs without fine-tuning and introduce a novel hybrid zero-shot retriever that integrates LLM-based QLMs with a traditional retriever.
results: The authors find that the LLM-based QLMs demonstrate robust zero-shot ranking ability, and the hybrid retriever achieves exceptional effectiveness in both zero-shot and few-shot scenarios.Here’s the information in Simplified Chinese text:
for: 这篇论文 investigate 最新的大型自然语言模型 (LLMs) 作为问题可能性模型 (QLMs) 的零Instance 排序文档的效果。
methods: 作者使用预训练的 LLMs 而不是精度调教，并提出了一种新的混合零实例检索器，该检索器将 LLM-based QLMs 与传统检索器集成。
results: 作者发现 LLM-based QLMs 在零实例情况下示出了强大的排序能力，并发现混合检索器在零实例和几个实例情况下都达到了非常出色的效果。

Abstract
In the field of information retrieval, Query Likelihood Models (QLMs) rank documents based on the probability of generating the query given the content of a document. Recently, advanced large language models (LLMs) have emerged as effective QLMs, showcasing promising ranking capabilities. This paper focuses on investigating the genuine zero-shot ranking effectiveness of recent LLMs, which are solely pre-trained on unstructured text data without supervised instruction fine-tuning. Our findings reveal the robust zero-shot ranking ability of such LLMs, highlighting that additional instruction fine-tuning may hinder effectiveness unless a question generation task is present in the fine-tuning dataset. Furthermore, we introduce a novel state-of-the-art ranking system that integrates LLM-based QLMs with a hybrid zero-shot retriever, demonstrating exceptional effectiveness in both zero-shot and few-shot scenarios. We make our codebase publicly available at https://github.com/ielab/llm-qlm.

摘要
在信息检索领域，查询可能性模型（QLM）根据文档内容中Generate查询的概率来排序文档。最近，高级大语言模型（LLM）作为效果的QLM出现，展示了可观的排序能力。本文将关注 investigate最近LLM的真正零上下文排序能力，这些QLM都是在无监督指导下预训练的自然语言数据。我们的发现表明这些LLM具有强大的零上下文排序能力，表明添加细化 instrucion 训练可能会降低效果，除非包含问题生成任务在训练集中。此外，我们介绍了一种新的状态略取得 ranked 系统，将LLM-基于的QLM与混合零上下文检索器结合， demonstrate 出色的效果在零上下文和几个shot scenario中。我们在https://github.com/ielab/llm-qlm中公开了我们的代码库。

The Less the Merrier? Investigating Language Representation in Multilingual Models

paper_url: http://arxiv.org/abs/2310.13228
repo_url: None
paper_authors: Hellina Hailu Nigatu, Atnafu Lambebo Tonja, Jugal Kalita
for: 本文探讨多语言模型在不同自然语言处理任务中的表现，特别是在低资源 SETTINGS 中支持的语言是否受到保障。
methods: 我们使用 популяр的多语言模型进行 investigate，分析这些模型对不同语言的表征和学习结果，包括语言家族和方言的影响。
results: 我们的实验结果显示，基于社区的模型（models that focus on languages of a given family or geographical location and are built by communities who speak them）在低资源语言之间的语言分类 task 中表现更好。我们的研究贡献到了多语言模型的理解和改进方向。

Abstract
Multilingual Language Models offer a way to incorporate multiple languages in one model and utilize cross-language transfer learning to improve performance for different Natural Language Processing (NLP) tasks. Despite progress in multilingual models, not all languages are supported as well, particularly in low-resource settings. In this work, we investigate the linguistic representation of different languages in multilingual models. We start by asking the question which languages are supported in popular multilingual models and which languages are left behind. Then, for included languages, we look at models' learned representations based on language family and dialect and try to understand how models' learned representations for~(1) seen and~(2) unseen languages vary across different language groups. In addition, we test and analyze performance on downstream tasks such as text generation and Named Entity Recognition. We observe from our experiments that community-centered models -- models that focus on languages of a given family or geographical location and are built by communities who speak them -- perform better at distinguishing between languages in the same family for low-resource languages. Our paper contributes to the literature in understanding multilingual models and their shortcomings and offers insights on potential ways to improve them.

摘要
多语言语言模型提供了将多种语言 integrate into one model，并利用交叉语言学习来提高不同自然语言处理（NLP）任务的性能。尽管在多语言模型方面有所进步，但不是所有语言都得到了充分支持，特别是在低资源环境下。在这项工作中，我们调查了不同语言在多语言模型中的语言表示。我们开始问题是哪些语言在流行的多语言模型中被支持，哪些语言被排除在外。然后，对包括的语言来说，我们查看模型学习的语言家族和方言基于的表示，并尝试理解模型对seen和unseen语言的学习表示如何不同。此外，我们测试和分析下沟通任务 such as 文本生成和命名实体识别的性能。我们发现在我们的实验中，社区中心的模型（models that focus on languages of a given family or geographical location and are built by communities who speak them）在同家族语言之间的分辨率较高。我们的论文贡献了对多语言模型和其缺陷的研究，并提供了可能改进它们的想法。

Enhancing Zero-Shot Crypto Sentiment with Fine-tuned Language Model and Prompt Engineering

paper_url: http://arxiv.org/abs/2310.13226
repo_url: None
paper_authors: Rahman S M Wahidur, Ishmam Tashdeed, Manjit Kaur, Heung-No-Lee
for: 本研究旨在提高投资者对加密货币市场的情感分析精度，并 investigate fine-tuning技术的效果。
methods: 本研究使用了大型自然语言模型的精度调整技术，包括监督式调整和指令式调整。
results: 实验结果表明，精度调整后可以获得40%的零基eline性能提升，而大型模型在指令调整下表现最高，其中最高的准确率为75.16%。

Abstract
Blockchain technology has revolutionized the financial landscape, with cryptocurrencies gaining widespread adoption for their decentralized and transparent nature. As the sentiment expressed on social media platforms can significantly influence cryptocurrency discussions and market movements, sentiment analysis has emerged as a crucial tool for understanding public opinion and predicting market trends. Motivated by the aim to enhance sentiment analysis accuracy in the cryptocurrency domain, this paper investigates fine-tuning techniques on large language models. This paper also investigates the efficacy of supervised fine-tuning and instruction-based fine-tuning on large language models for unseen tasks. Experimental results demonstrate a significant average zero-shot performance gain of 40% after fine-tuning, highlighting the potential of this technique in optimizing pre-trained language model efficiency. Additionally, the impact of instruction tuning on models of varying scales is examined, revealing that larger models benefit from instruction tuning, achieving the highest average accuracy score of 75.16%. In contrast, smaller-scale models may experience reduced generalization due to the complete utilization of model capacity. To gain deeper insight about how instruction works with these language models, this paper presents an experimental investigation into the response of an instruction-based model under different instruction tuning setups. The investigation demonstrates that the model achieves an average accuracy score of 72.38% for short and simple instructions. This performance significantly outperforms its accuracy under long and complex instructions by over 12%, thereby effectively highlighting the profound significance of instruction characteristics in maximizing model performance.

摘要
blockchain 技术已经革命化了金融领域， криптовалюencies 在 Decentralized 和 Transparent 的特点下得到了广泛的采纳。在社交媒体平台上表达的情感可以对 криптовалюencies 的讨论和市场走势产生重要影响，因此情感分析在 криптовалюencies 领域已成为一种关键的工具。为了提高情感分析的准确性，本文调查了大语言模型的精细调整技术。本文还 investigate 大语言模型的supervised 调整和指令调整在未看到任务上的效果。实验结果表明，精细调整后平均零件性能提高40%，这 highlights 该技术在优化预训练语言模型效率的潜力。此外，本文还 investigate 模型不同规模下的指令调整效果，发现大型模型受到指令调整的影响，其平均准确率为75.16%。相比之下，较小的模型可能会因完全使用模型容量而导致退化。为了更深入地了解指令如何与这些语言模型交互，本文进行了一种实验调查。调查结果表明，指令基本的模型在不同的指令调整设置下达到了72.38%的平均准确率。这个性能在长度和复杂度更高的指令下明显下降，这有效地表明了指令特点在提高模型性能的重要性。

2023-10-20

cs.LG

cs.LG - 2023-10-20

Competitive Advantage Attacks to Decentralized Federated Learning

paper_url: http://arxiv.org/abs/2310.13862
repo_url: None
paper_authors: Yuqi Jia, Minghong Fang, Neil Zhenqiang Gong
for: This paper is written for researchers and practitioners in the field of federated learning, particularly those interested in understanding and mitigating attacks on decentralized federated learning (DFL) systems.
methods: The paper proposes a new family of attacks called SelfishAttack, which aims to achieve competitive advantages over non-selfish clients in DFL systems. The authors formulate finding such local models as an optimization problem and propose methods to solve it when DFL uses different aggregation rules.
results: The authors show that their proposed methods successfully increase the accuracy gap between the final learnt local models of selfish clients and those of non-selfish ones. Moreover, SelfishAttack achieves larger accuracy gaps than poisoning attacks when extended to increase competitive advantages.Here is the Chinese translation of the three key information points:
for: 这篇论文是为了向 federated learning 领域的研究者和实践者提供一种新的攻击方法，具体是在分布式 federated learning 系统中实现自利益的攻击。
methods: 论文提出了一种新的攻击方法，即 SelfishAttack，用于在分布式 federated learning 系统中获得竞争优势。作者们将找到这些地方的方法形式为优化问题，并提出了解决这些问题的方法，当 DFL 使用不同的聚合规则时。
results: 作者们证明了他们提出的方法能够成功地增加自利益客户端的最终学习的本地模型准确率与非自利益客户端的准确率之间的差距。此外，SelfishAttack 还能够在扩展到增加竞争优势时超过毒害攻击。

Abstract
Decentralized federated learning (DFL) enables clients (e.g., hospitals and banks) to jointly train machine learning models without a central orchestration server. In each global training round, each client trains a local model on its own training data and then they exchange local models for aggregation. In this work, we propose SelfishAttack, a new family of attacks to DFL. In SelfishAttack, a set of selfish clients aim to achieve competitive advantages over the remaining non-selfish ones, i.e., the final learnt local models of the selfish clients are more accurate than those of the non-selfish ones. Towards this goal, the selfish clients send carefully crafted local models to each remaining non-selfish one in each global training round. We formulate finding such local models as an optimization problem and propose methods to solve it when DFL uses different aggregation rules. Theoretically, we show that our methods find the optimal solutions to the optimization problem. Empirically, we show that SelfishAttack successfully increases the accuracy gap (i.e., competitive advantage) between the final learnt local models of selfish clients and those of non-selfish ones. Moreover, SelfishAttack achieves larger accuracy gaps than poisoning attacks when extended to increase competitive advantages.

摘要
分布式联合学习（DFL）允许客户端（例如医院和银行）共同训练机器学习模型无需中央执行服务器。在每个全球训练轮次中，每个客户端在自己的训练数据上进行本地训练，然后将本地模型交换给其他客户端进行汇总。在这项工作中，我们提出了自利攻击（SelfishAttack），一种新的攻击DFL的方法。在SelfishAttack中，一组自利客户端尝试通过在每个全球训练轮次中发送特制的本地模型来获得与其他非自利客户端的竞争优势。具体来说，自利客户端的最终训练的本地模型比非自利客户端的模型更高精度。为实现这一目标，我们将找到一个优化问题的解决方案，并提出了在不同汇总规则下解决这个问题的方法。理论上，我们证明了我们的方法可以找到优化问题的优解。实际上，我们证明了SelfishAttack成功地增加了自利客户端和非自利客户端之间的精度差（竞争优势）。此外，SelfishAttack在扩展到增加竞争优势时比质量攻击更有效。

A Sparse Bayesian Learning for Diagnosis of Nonstationary and Spatially Correlated Faults with Application to Multistation Assembly Systems

paper_url: http://arxiv.org/abs/2310.16058
repo_url: None
paper_authors: Jihoon Chung, Zhenyu Kong
for: 本研究旨在提出一种新的维度诊断方法，以Addressing the challenges of limited sensor numbers, non-stationary process faults, and correlation information in manufacturing systems.
methods: 本方法基于实际假设，即进程 faults sparse，并使用层次结构和多个参数化先验分布来解决上述挑战。 variants Bayes inference 方法 derive approximate posterior distributions of process faults.
results: numerical and real-world case studies demonstrate the efficacy of the proposed method in an actual autobody assembly system, and its generalizability to other domains, including communication and healthcare systems.

Abstract
Sensor technology developments provide a basis for effective fault diagnosis in manufacturing systems. However, the limited number of sensors due to physical constraints or undue costs hinders the accurate diagnosis in the actual process. In addition, time-varying operational conditions that generate nonstationary process faults and the correlation information in the process require to consider for accurate fault diagnosis in the manufacturing systems. This article proposes a novel fault diagnosis method: clustering spatially correlated sparse Bayesian learning (CSSBL), and explicitly demonstrates its applicability in a multistation assembly system that is vulnerable to the above challenges. Specifically, the method is based on a practical assumption that it will likely have a few process faults (sparse). In addition, the hierarchical structure of CSSBL has several parameterized prior distributions to address the above challenges. As posterior distributions of process faults do not have closed form, this paper derives approximate posterior distributions through Variational Bayes inference. The proposed method's efficacy is provided through numerical and real-world case studies utilizing an actual autobody assembly system. The generalizability of the proposed method allows the technique to be applied in fault diagnosis in other domains, including communication and healthcare systems.

摘要
感测技术发展为制造系统效果的缺陷诊断提供基础。然而，由于物理限制或过分成本，制造系统中的感测器数量有限，这限制了实际进程中准确诊断的能力。此外，时变操作条件会生成非站ARY进程缺陷，而且需要考虑进程中的相关信息。本文提出了一种新的缺陷诊断方法：归一化空间相关稀疏 bayesian 学习（CSSBL），并详细描述其在多站制造系统中的可应用性。具体来说，该方法基于实际情况下的稀疏缺陷假设，并有多个层次结构来 Address 上述挑战。由于 posterior distribution 的不准确性，本文使用变分析法来 derive approximate posterior distribution。实验和实际案例研究表明，提议的方法可以在制造系统中实现高度的缺陷诊断精度。此外，该方法的通用性使其可以应用于其他领域，包括通信和医疗系统的缺陷诊断。

Towards Subject Agnostic Affective Emotion Recognition

paper_url: http://arxiv.org/abs/2310.15189
repo_url: None
paper_authors: Amit Kumar Jaiswal, Haiming Liu, Prayag Tiwari
for: 本研究旨在实现不含受试者特定信息的情感识别，使用EEG信号进行脑机器交互（aBCI）。
methods: 本研究使用领域总结和领域适应来解决EEG信号中的分布偏移问题。
results: 我们的提议方法可以在实验中与当前领域适应方法相比，在没有额外计算资源的情况下实现类似的性能。

Abstract
This paper focuses on affective emotion recognition, aiming to perform in the subject-agnostic paradigm based on EEG signals. However, EEG signals manifest subject instability in subject-agnostic affective Brain-computer interfaces (aBCIs), which led to the problem of distributional shift. Furthermore, this problem is alleviated by approaches such as domain generalisation and domain adaptation. Typically, methods based on domain adaptation confer comparatively better results than the domain generalisation methods but demand more computational resources given new subjects. We propose a novel framework, meta-learning based augmented domain adaptation for subject-agnostic aBCIs. Our domain adaptation approach is augmented through meta-learning, which consists of a recurrent neural network, a classifier, and a distributional shift controller based on a sum-decomposable function. Also, we present that a neural network explicating a sum-decomposable function can effectively estimate the divergence between varied domains. The network setting for augmented domain adaptation follows meta-learning and adversarial learning, where the controller promptly adapts to new domains employing the target data via a few self-adaptation steps in the test phase. Our proposed approach is shown to be effective in experiments on a public aBICs dataset and achieves similar performance to state-of-the-art domain adaptation methods while avoiding the use of additional computational resources.

摘要
The proposed approach uses a recurrent neural network (RNN), a classifier, and a distributional shift controller based on a sum-decomposable function to adapt to new domains. The controller is trained to quickly adapt to new domains using a few self-adaptation steps in the test phase. The authors also show that a neural network explicating a sum-decomposable function can effectively estimate the divergence between varied domains.The proposed approach is evaluated on a public aBCIs dataset and achieves similar performance to state-of-the-art domain adaptation methods while avoiding the use of additional computational resources. The main contributions of the paper are:1. A novel framework for subject-agnostic aBCIs based on meta-learning and adversarial learning.2. A sum-decomposable function-based distributional shift controller for adapting to new domains.3. A self-adaptation mechanism for quickly adapting to new domains using a few self-adaptation steps in the test phase.4. Experimental results on a public aBCIs dataset that demonstrate the effectiveness of the proposed approach.

Exponential weight averaging as damped harmonic motion

paper_url: http://arxiv.org/abs/2310.13854
repo_url: None
paper_authors: Jonathan Patsenker, Henry Li, Yuval Kluger
for: 提供稳定估计深度学习优化过程中的随机量统计
methods: 使用抽象移动平均（EMA）计算模型参数，并在训练过程中和训练完成后进行weight平均，以提高推理模型的稳定性
results: 提出了一种基于物理拟合的改进训练算法（BELAY），并证明了BELAY在训练过程中和训练完成后具有许多优势，包括更高的稳定性和更好的性能。

Abstract
The exponential moving average (EMA) is a commonly used statistic for providing stable estimates of stochastic quantities in deep learning optimization. Recently, EMA has seen considerable use in generative models, where it is computed with respect to the model weights, and significantly improves the stability of the inference model during and after training. While the practice of weight averaging at the end of training is well-studied and known to improve estimates of local optima, the benefits of EMA over the course of training is less understood. In this paper, we derive an explicit connection between EMA and a damped harmonic system between two particles, where one particle (the EMA weights) is drawn to the other (the model weights) via an idealized zero-length spring. We then leverage this physical analogy to analyze the effectiveness of EMA, and propose an improved training algorithm, which we call BELAY. Finally, we demonstrate theoretically and empirically several advantages enjoyed by BELAY over standard EMA.

摘要
exponential moving average (EMA) 是一种常用的统计方法，用于提供深度学习优化中稳定的随机量 estimator。最近，EMA 在生成模型中得到了广泛的应用，其中 compute 与模型参数相关。这些参数的 EMA 计算 Significantly improves the stability of the inference model during and after training. 虽然在训练结束时weight averaging的实践已经很受欢迎，但EMA在训练过程中的效果不够了解。在这篇论文中，我们 derivate 了EMA和两个 particles之间的封闭律动系统的直接关系，其中一个particle（EMA 参数）被吸引到另一个（模型参数）via 一个理想化的零长度spring。我们然后利用这种物理类比来分析 EMA 的效果，并提出了一种改进的训练算法，我们称之为 BELAY。最后，我们 theoretically 和 empirically 证明了 BELAY 对标准 EMA 的优势。

Gradual Domain Adaptation: Theory and Algorithms

paper_url: http://arxiv.org/abs/2310.13852
repo_url: https://github.com/yifei-he/goat
paper_authors: Yifei He, Haoxiang Wang, Bo Li, Han Zhao
for: 这篇论文主要关注于 Gradual Domain Adaptation (GDA) 中的一个问题，即如何在具有大量数据的源领域中对于目标领域进行逐步适束。
methods: 这篇论文使用 Gradual Self-training (GST) 来实现 GDA，并提供了一个改进了 Kumar et al. (2020) 的一般化范围边界。
results: 实验结果显示，在实际应用中，这个 GOAT 框架可以对于欠缺中间领域的情况进行改进，并且可以提高标准 GDA 的性能。

Abstract
Unsupervised domain adaptation (UDA) adapts a model from a labeled source domain to an unlabeled target domain in a one-off way. Though widely applied, UDA faces a great challenge whenever the distribution shift between the source and the target is large. Gradual domain adaptation (GDA) mitigates this limitation by using intermediate domains to gradually adapt from the source to the target domain. In this work, we first theoretically analyze gradual self-training, a popular GDA algorithm, and provide a significantly improved generalization bound compared with Kumar et al. (2020). Our theoretical analysis leads to an interesting insight: to minimize the generalization error on the target domain, the sequence of intermediate domains should be placed uniformly along the Wasserstein geodesic between the source and target domains. The insight is particularly useful under the situation where intermediate domains are missing or scarce, which is often the case in real-world applications. Based on the insight, we propose $\textbf{G}$enerative Gradual D$\textbf{O}$main $\textbf{A}$daptation with Optimal $\textbf{T}$ransport (GOAT), an algorithmic framework that can generate intermediate domains in a data-dependent way. More concretely, we first generate intermediate domains along the Wasserstein geodesic between two given consecutive domains in a feature space, then apply gradual self-training to adapt the source-trained classifier to the target along the sequence of intermediate domains. Empirically, we demonstrate that our GOAT framework can improve the performance of standard GDA when the given intermediate domains are scarce, significantly broadening the real-world application scenarios of GDA. Our code is available at https://github.com/yifei-he/GOAT.

摘要
Unsupervised domain adaptation (UDA) 将源频道上标注的模型适应到目标频道上，但是当源和目标频道的分布差异较大时，UDA会遇到很大的挑战。 Gradual domain adaptation (GDA) 可以解决这个问题，通过使用中间频道来慢慢地适应源频道到目标频道。在这项工作中，我们首先从理论角度分析了渐进自动适应算法，并提供了与Kumar et al. (2020)相比较好的泛化约束。我们的理论分析导致了一个有趣的发现：在适应目标频道时，中间频道的序列应该被置于 Wasserstein 地odesic 上。这个发现对于实际应用中缺少或罕见中间频道的情况非常有用。基于这个发现，我们提出了 $\textbf{G}$enerative Gradual D$\textbf{O}$main $\textbf{A}$daptation with Optimal $\textbf{T}$ransport（GOAT）算法框架。具体来说，我们首先在Feature空间中生成中间频道序列，然后应用渐进自动适应来适应源频道上训练的分类器到目标频道。Empirical experiments show that our GOAT framework can improve the performance of standard GDA when the given intermediate domains are scarce, significantly expanding the real-world application scenarios of GDA.我们的代码可以在https://github.com/yifei-he/GOAT上获取。

Augment with Care: Enhancing Graph Contrastive Learning with Selective Spectrum Perturbation

paper_url: http://arxiv.org/abs/2310.13845
repo_url: None
paper_authors: Kaiqi Yang, Haoyu Han, Wei Jin, Hui Liu
for: 本文旨在提出一种基于特征频谱的图像增强视图，以提高图像对比学抽象的效果。
methods: 本文使用了spectral hint guided edge perturbation方法，通过选择性地在特定频谱上pose tailored perturbation，以实现adaptive和可控的增强视图。
results: 经过广泛的实验和理论分析，GASSER方法能够提供adaptive和可控的增强视图，同时符合图像结构的自同比率和频谱特征。

Abstract
In recent years, Graph Contrastive Learning (GCL) has shown remarkable effectiveness in learning representations on graphs. As a component of GCL, good augmentation views are supposed to be invariant to the important information while discarding the unimportant part. Existing augmentation views with perturbed graph structures are usually based on random topology corruption in the spatial domain; however, from perspectives of the spectral domain, this approach may be ineffective as it fails to pose tailored impacts on the information of different frequencies, thus weakening the agreement between the augmentation views. By a preliminary experiment, we show that the impacts caused by spatial random perturbation are approximately evenly distributed among frequency bands, which may harm the invariance of augmentations required by contrastive learning frameworks. To address this issue, we argue that the perturbation should be selectively posed on the information concerning different frequencies. In this paper, we propose GASSER which poses tailored perturbation on the specific frequencies of graph structures in spectral domain, and the edge perturbation is selectively guided by the spectral hints. As shown by extensive experiments and theoretical analysis, the augmentation views are adaptive and controllable, as well as heuristically fitting the homophily ratios and spectrum of graph structures.

摘要

Fast hyperboloid decision tree algorithms

paper_url: http://arxiv.org/abs/2310.13841
repo_url: None
paper_authors: Philippe Chlenski, Ethan Turok, Antonio Moretti, Itsik Pe’er
for: 这篇论文的目的是提出一种基于гипербо利 geometry的决策树算法，以解决在机器学习中遇到的计算复杂性问题。
methods: 该论文使用了一种基于内积的方法，通过利用内积来适应гиPERBOLIC空间，从而消除了计算复杂性的问题。
results: 对多个数据集进行了广泛的比较，表明该方法可以提供快速、准确、精度高的 гиPERBOLIC数据分析工具。

Abstract
Hyperbolic geometry is gaining traction in machine learning for its effectiveness at capturing hierarchical structures in real-world data. Hyperbolic spaces, where neighborhoods grow exponentially, offer substantial advantages and consistently deliver state-of-the-art results across diverse applications. However, hyperbolic classifiers often grapple with computational challenges. Methods reliant on Riemannian optimization frequently exhibit sluggishness, stemming from the increased computational demands of operations on Riemannian manifolds. In response to these challenges, we present hyperDT, a novel extension of decision tree algorithms into hyperbolic space. Crucially, hyperDT eliminates the need for computationally intensive Riemannian optimization, numerically unstable exponential and logarithmic maps, or pairwise comparisons between points by leveraging inner products to adapt Euclidean decision tree algorithms to hyperbolic space. Our approach is conceptually straightforward and maintains constant-time decision complexity while mitigating the scalability issues inherent in high-dimensional Euclidean spaces. Building upon hyperDT we introduce hyperRF, a hyperbolic random forest model. Extensive benchmarking across diverse datasets underscores the superior performance of these models, providing a swift, precise, accurate, and user-friendly toolkit for hyperbolic data analysis.

摘要
超凡几何在机器学习中得到推广，因为它能够很好地捕捉实际数据中的层次结构。超凡空间，其 neighberhood exponentially 增长，提供了很多优势，并一直在多种应用中提供了状态机器学习的最佳结果。然而，超凡分类器经常面临计算挑战。基于里曼尼托п optimization 的方法经常会显示缓慢，这是因为在里曼尼托п上进行操作的计算增加了负担。为了解决这些挑战，我们提出了 hyperDT，一种基于几何空间的决策树算法的扩展。 hyperDT 减少了 computationally 成本的 Riemannian 优化、不稳定的对数和对数映射以及点之间的对比。我们的方法是概念简单，保持了常量时间复杂度，同时解决高维度欧氏空间中的缺乏扩展性问题。基于 hyperDT 我们引入了 hyperRF，一种基于超凡空间的随机森林模型。对多种数据进行广泛的 benchmarking 表明，这些模型具有快速、精准、准确和易用的特点，提供了一个可靠的超凡数据分析工具。

Universal Representation of Permutation-Invariant Functions on Vectors and Tensors

paper_url: http://arxiv.org/abs/2310.13829
repo_url: None
paper_authors: Puoya Tabaghi, Yusu Wang
for: 该研究的主要目标是研究多aset函数，即不同大小的输入上的具有 permutation-invariant 性的函数。
methods: 该研究使用 Deep Sets 模型，其提供了一种 scalars 上的 continuous multiset functions 的 universality 表示，以及一种需要 latent space 维度为 O(N^D) 的 universality 近似。
results: 该研究证明了 continuous 和 discontinuous multiset functions 的 universality 表示只需要 latent space 维度为 O(N^D)，并且证明了 encoder 和 decoder 函数都是连续的。此外，该研究还扩展到 permutation-invariant tensor functions 的 universality 表示，并提供了一些特殊的 sum-decomposition 结构以实现这一目标。

Abstract
A main object of our study is multiset functions -- that is, permutation-invariant functions over inputs of varying sizes. Deep Sets, proposed by \cite{zaheer2017deep}, provides a \emph{universal representation} for continuous multiset functions on scalars via a sum-decomposable model. Restricting the domain of the functions to finite multisets of $D$-dimensional vectors, Deep Sets also provides a \emph{universal approximation} that requires a latent space dimension of $O(N^D)$ -- where $N$ is an upper bound on the size of input multisets. In this paper, we strengthen this result by proving that universal representation is guaranteed for continuous and discontinuous multiset functions though a latent space dimension of $O(N^D)$. We then introduce \emph{identifiable} multisets for which we can uniquely label their elements using an identifier function, namely, finite-precision vectors are identifiable. Using our analysis on identifiable multisets, we prove that a sum-decomposable model for general continuous multiset functions only requires a latent dimension of $2DN$. We further show that both encoder and decoder functions of the model are continuous -- our main contribution to the existing work which lack such a guarantee. Also this provides a significant improvement over the aforementioned $O(N^D)$ bound which was derived for universal representation of continuous and discontinuous multiset functions. We then extend our results and provide special sum-decomposition structures to universally represent permutation-invariant tensor functions on identifiable tensors. These families of sum-decomposition models enables us to design deep network architectures and deploy them on a variety of learning tasks on sequences, images, and graphs.

摘要
我们的研究主要目标是多集函数（permutation-invariant functions），即输入尺寸不同时的函数。深度集（Deep Sets），由\cite{zaheer2017deep}所提出，提供了一个基于条件对称的普遍表示（universal representation），用于简单多集函数上的数学分析。当将函数的域限制为给定维度的多集时，深度集还提供了一个普遍推广（universal approximation），需要的隐藏空间维度为$O(N^D)$，其中$N$是输入多集的最大大小。在这篇论文中，我们将这个结果加强，证明了普遍表示是给定维度的连续和不连续多集函数，只需要隐藏空间维度为$O(N^D)$。我们还引入了可识别的多集（identifiable multisets），可以使用一个元素识别函数（identifier function）将其元素唯一标识。使用我们的分析方法，我们证明了一个条件对称的数学分析模型可以用$2DN$维度表示任意连续多集函数。此外，我们还证明了这个模型的数学分析器和实现器都是连续的，这是我们对现有工作的主要贡献之一。此外，这也提供了$O(N^D)$的独立进步，与先前的结果相比，这个结果可以更好地表示连续和不连续多集函数的普遍表示。最后，我们延伸我们的结果，提供了对应 permutation-invariant tensor functions的特殊条件对称分析模型。这些家族的条件对称分析模型可以用来设计深度网络架构，并将其部署到多个学习任务上，例如序列、图像和 Graf 等。

Adversarial Attacks on Fairness of Graph Neural Networks

paper_url: http://arxiv.org/abs/2310.13822
repo_url: https://github.com/zhangbinchi/g-fairattack
paper_authors: Binchi Zhang, Yushun Dong, Chen Chen, Yada Zhu, Minnan Luo, Jundong Li
for: 这篇论文旨在调查对具有公平性考虑的图神经网络（GNNs）的攻击性评估，以及对不同类型的 GNNs 的公平性攻击。
methods: 该论文提出了 G-FairAttack 框架，用于对 GNNs 进行公平性攻击，包括不可察觉地损害公平性。同时， authors 还提出了一种快速计算技术来降低攻击时间复杂度。
results: 实验研究表明，G-FairAttack 可以成功地损害不同类型的 GNNs 的公平性，而不会影响预测的准确性。这些结果表明了对 GNNs 的公平性攻击的潜在漏洞，并促进了进一步研究 GNNs 的Robustness 在公平性方面。

Abstract
Fairness-aware graph neural networks (GNNs) have gained a surge of attention as they can reduce the bias of predictions on any demographic group (e.g., female) in graph-based applications. Although these methods greatly improve the algorithmic fairness of GNNs, the fairness can be easily corrupted by carefully designed adversarial attacks. In this paper, we investigate the problem of adversarial attacks on fairness of GNNs and propose G-FairAttack, a general framework for attacking various types of fairness-aware GNNs in terms of fairness with an unnoticeable effect on prediction utility. In addition, we propose a fast computation technique to reduce the time complexity of G-FairAttack. The experimental study demonstrates that G-FairAttack successfully corrupts the fairness of different types of GNNs while keeping the attack unnoticeable. Our study on fairness attacks sheds light on potential vulnerabilities in fairness-aware GNNs and guides further research on the robustness of GNNs in terms of fairness. The open-source code is available at https://github.com/zhangbinchi/G-FairAttack.

摘要
“公平意识的图 neural network (GNN) 在图基应用中受到了一波关注，因为它们可以减少任何人类群体（如女性）的预测偏见。尽管这些方法可以大幅提高 GNN 的算法公平性，但这种公平性可以被轻松地通过特制的对抗攻击破坏。在这篇论文中，我们调查了 GNN 公平性对抗攻击的问题，并提出了 G-FairAttack，一种可以对多种公平意识 GNN 进行攻击，并且保持攻击不可见地影响预测Utility。此外，我们还提出了一种快速计算技术来降低 G-FairAttack 的时间复杂度。实验研究表明，G-FairAttack 成功地破坏了不同类型的 GNN 公平性，而且保持了攻击不可见。我们的研究对 GNN 公平性对抗攻击提供了新的灵感，并指导了 GNN 的可靠性研究。G-FairAttack 的开源代码可以在 https://github.com/zhangbinchi/G-FairAttack 中获取。”

Geometric Learning with Positively Decomposable Kernels

paper_url: http://arxiv.org/abs/2310.13821
repo_url: None
paper_authors: Nathael Da Costa, Cyrus Mostajeran, Juan-Pablo Ortega, Salem Said
for: 这个论文主要针对非欧几何数据空间上的机器学习问题。
methods: 该论文提出了基于 reproduce kernel Krein space (RKKS) 的方法，该方法只需要一个可分解为正数的kernels，而不需要访问这个分解。
results: 论文表明，对于各种特定的几何空间，可以通过constructing positively decomposable kernels来实现机器学习。此外，该论文还提供了一些理论基础，用于推广 RKKS-based 方法。

Abstract
Kernel methods are powerful tools in machine learning. Classical kernel methods are based on positive-definite kernels, which map data spaces into reproducing kernel Hilbert spaces (RKHS). For non-Euclidean data spaces, positive-definite kernels are difficult to come by. In this case, we propose the use of reproducing kernel Krein space (RKKS) based methods, which require only kernels that admit a positive decomposition. We show that one does not need to access this decomposition in order to learn in RKKS. We then investigate the conditions under which a kernel is positively decomposable. We show that invariant kernels admit a positive decomposition on homogeneous spaces under tractable regularity assumptions. This makes them much easier to construct than positive-definite kernels, providing a route for learning with kernels for non-Euclidean data. By the same token, this provides theoretical foundations for RKKS-based methods in general.

摘要
kernels 是机器学习中的强大工具。经典的kernel方法基于正定kernel，将数据空间映射到 reproduce kernel Hilbert space (RKHS) 中。不theless, 非欧几何数据空间中的正定kernel很难得到。在这种情况下，我们提议使用 reproduce kernel Krein space (RKKS) 基本方法，只需要kernels admit a positive decomposition。我们证明，不需要访问这种分解，可以在 RKKS 中学习。然后，我们研究了kernels admit positive decomposition的条件。我们证明，对具有 Symmetry 的kernel，在具有homogeneous space的情况下，可以在 tractable regularity assumptions 下获得正定分解。这使得这些kernel在constructing上比正定kernel更加容易，提供了非欧几何数据上的学习方法的理论基础。

A Better Match for Drivers and Riders: Reinforcement Learning at Lyft

paper_url: http://arxiv.org/abs/2310.13810
repo_url: None
paper_authors: Xabi Azagirre, Akshay Balwally, Guillaume Candeli, Nicholas Chamandy, Benjamin Han, Alona King, Hyungjun Lee, Martin Loncaric, Sébastien Martin, Vijay Narasiman, Zhiwei, Qin, Baptiste Richard, Sara Smoot, Sean Taylor, Garrett van Ryzin, Di Wu, Fei Yu, Alex Zamoshchin
for: 提高预估驾驶员的预计收益，以便更好地匹配乘客和司机。
methods: 使用了一种新的在线强化学习方法，在实时中估算司机未来的收益，并使用这些信息来找到更有效的匹配。
results: 通过这种改进后，司机可以每年服务更多的乘客，至少增加3000万美元的增值收入。

Abstract
To better match drivers to riders in our ridesharing application, we revised Lyft's core matching algorithm. We use a novel online reinforcement learning approach that estimates the future earnings of drivers in real time and use this information to find more efficient matches. This change was the first documented implementation of a ridesharing matching algorithm that can learn and improve in real time. We evaluated the new approach during weeks of switchback experimentation in most Lyft markets, and estimated how it benefited drivers, riders, and the platform. In particular, it enabled our drivers to serve millions of additional riders each year, leading to more than $30 million per year in incremental revenue. Lyft rolled out the algorithm globally in 2021.

摘要
为了更好地匹配驾驶员和乘客在我们的乘车应用程序中，我们修改了Lyft的核心匹配算法。我们使用了一种新的在线强化学习方法，以估计驾驶员未来的收益情况，并使用这些信息来找到更有效的匹配。这是首次实现了实时学习和改进的乘车匹配算法的documented实现。我们在多个Lyft市场进行了数周的交换实验，并估计了驾驶员、乘客和平台受益的方面。特别是，它允许我们的驾驶员每年服务数百万名乘客，导致每年超过3000万美元的额外收入。Lyft在2021年全球推广了这种算法。

Learning to (Learn at Test Time)

paper_url: http://arxiv.org/abs/2310.13807
repo_url: https://github.com/test-time-training/mttt
paper_authors: Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Sanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen
for: 本文改进了supervised learning的问题，将其转换为一个两层循环学习问题（i.e. 学习问题）。内层循环学习每个个体实例自动提供自我监督，然后进行最终预测。外层循环学习内层循环学习任务，以提高其最终预测。
methods: 内层循环学习使用了自我监督学习，并且可以使用线性注意力或自我注意力。外层循环学习学习了内层循环学习任务，以提高其最终预测。
results: 对于ImageNet datasets，使用本文的方法可以大幅提高准确率和计算量，而且比 tradicional transformers 更高效。此外，当内层循环学习器是神经网络时，本文的方法可以在224x224 Raw Pixels的情况下大幅提高准确率，而且不能由传统的 transformers 实现。

Abstract
We reformulate the problem of supervised learning as learning to learn with two nested loops (i.e. learning problems). The inner loop learns on each individual instance with self-supervision before final prediction. The outer loop learns the self-supervised task used by the inner loop, such that its final prediction improves. Our inner loop turns out to be equivalent to linear attention when the inner-loop learner is only a linear model, and to self-attention when it is a kernel estimator. For practical comparison with linear or self-attention layers, we replace each of them in a transformer with an inner loop, so our outer loop is equivalent to training the architecture. When each inner-loop learner is a neural network, our approach vastly outperforms transformers with linear attention on ImageNet from 224 x 224 raw pixels in both accuracy and FLOPs, while (regular) transformers cannot run.

摘要
我们重新定义监督学习的问题为内层循环学习（i.e. 学习问题），内层循环在每个个体例上进行自我监督学习，然后进行最终预测。外层循环则学习内层循环使用的自我监督任务，以改善其最终预测。我们的内层循环发现是线性注意力（linear attention）当内部学习器只是一个线性模型，而是自适应器当内部学习器。为了与线性注意力或自适应器层进行实用比较，我们将它们替换为内层循环，因此外层循环相等于训练架构。当内部学习器是一个神经网络时，我们的方法在ImageNet上与使用线性注意力的transformer相比，在精确性和FLOPs方面具有很大的提升，而且regular transformer无法运行。

Comparative Analysis of Machine Learning Algorithms for Solar Irradiance Forecasting in Smart Grids

paper_url: http://arxiv.org/abs/2310.13791
repo_url: None
paper_authors: Saman Soleymani, Shima Mohammadzadeh
for: 这个研究旨在预测太阳辐射，以便优化太阳能系统的使用。
methods: 本研究使用了下一代机器学习算法，包括随机树、Extreme Gradient Boosting (XGBoost)、Light Gradient Boosted Machine (lightGBM) ensemble、CatBoost、和多层感知神经网络 (MLP-ANNs)，以预测太阳辐射。此外，运用了 Bayesian 优化来调整参数。
results: 综合结果显示，MLP-ANNs 的性能会随着特征选择而改善；而 Random Forest 则与其他学习算法相比，表现较好。

Abstract
The increasing global demand for clean and environmentally friendly energy resources has caused increased interest in harnessing solar power through photovoltaic (PV) systems for smart grids and homes. However, the inherent unpredictability of PV generation poses problems associated with smart grid planning and management, energy trading and market participation, demand response, reliability, etc. Therefore, solar irradiance forecasting is essential for optimizing PV system utilization. This study proposes the next-generation machine learning algorithms such as random forests, Extreme Gradient Boosting (XGBoost), Light Gradient Boosted Machine (lightGBM) ensemble, CatBoost, and Multilayer Perceptron Artificial Neural Networks (MLP-ANNs) to forecast solar irradiance. Besides, Bayesian optimization is applied to hyperparameter tuning. Unlike tree-based ensemble algorithms that select the features intrinsically, MLP-ANN needs feature selection as a separate step. The simulation results indicate that the performance of the MLP-ANNs improves when feature selection is applied. Besides, the random forest outperforms the other learning algorithms.

摘要
全球减少可再生能源的需求在不断增长，为了利用太阳能电力，人们对光伏系统的智能网格和家庭等领域进行了更多的研究。然而，光伏生产的不可预测性会对智能网格规划和管理、能源交易和市场参与、需求回应等带来问题。因此，太阳辐射预测是必要的。本研究提出了下一代机器学习算法，如随机森林、极限梯度提升（XGBoost）、轻量级梯度提升机器（lightGBM） ensemble、CatBoost和多层感知神经网络（MLP-ANNs）来预测太阳辐射。此外，我们还应用了 bayesian 优化来调整超参数。与树型ensemble算法不同，MLP-ANN需要将特征选择作为一个分离的步骤。实验结果表明，当特征选择被应用时，MLP-ANN的性能会提高。此外，随机森林在其他学习算法中表现出色。

Graph AI in Medicine

paper_url: http://arxiv.org/abs/2310.13767
repo_url: None
paper_authors: Ruth Johnson, Michelle M. Li, Ayush Noori, Owen Queen, Marinka Zitnik
for: 这种论文主要用于探讨在医学人工智能中Graph Representation Learning的应用，尤其是通过Graph Neural Networks（GNNs）来捕捉医学数据中的复杂关系。
methods: 这种方法主要使用Graph Neural Networks（GNNs）来处理医学数据，并通过视模态为节点之间的关系来处理数据。
results: 这种方法可以在不同的医学任务上实现模型的转移，并且可以在不添加参数或最小再训练的情况下实现模型的泛化。然而，在医学决策中，人类中心的设计和模型解释性是不可或缺的。

Abstract
In clinical artificial intelligence (AI), graph representation learning, mainly through graph neural networks (GNNs), stands out for its capability to capture intricate relationships within structured clinical datasets. With diverse data -- from patient records to imaging -- GNNs process data holistically by viewing modalities as nodes interconnected by their relationships. Graph AI facilitates model transfer across clinical tasks, enabling models to generalize across patient populations without additional parameters or minimal re-training. However, the importance of human-centered design and model interpretability in clinical decision-making cannot be overstated. Since graph AI models capture information through localized neural transformations defined on graph relationships, they offer both an opportunity and a challenge in elucidating model rationale. Knowledge graphs can enhance interpretability by aligning model-driven insights with medical knowledge. Emerging graph models integrate diverse data modalities through pre-training, facilitate interactive feedback loops, and foster human-AI collaboration, paving the way to clinically meaningful predictions.

摘要
在临床人工智能（AI）领域，图表学习（GNNs）在捕捉复杂的临床数据关系方面表现出色。通过视 modalities 为节点之间的关系的方式，GNNs 可以处理数据的整体特征。图AI 使得模型可以在不同患者人口中进行模型迁移，无需额外参数或 minimal 再训练。然而，在临床决策中，人类中心的设计和模型解释性不可或缺。由于 GNNs 通过本地神经变换定义在图关系上来捕捉信息，因此它们同时提供了机会和挑战来描述模型的逻辑。知识图可以增强解释性，将模型驱动的洞察与医学知识相对应。新兴的图模型通过预训练、实时反馈循环和人类-AI合作，为临床有意义的预测开辟了道路。

Learning Interatomic Potentials at Multiple Scales

paper_url: http://arxiv.org/abs/2310.13756
repo_url: None
paper_authors: Xiang Fu, Albert Musaelian, Anders Johansson, Tommi Jaakkola, Boris Kozinsky
for: 这个论文是为了提高分子动力学（MD） simulations 的速度而写的。
methods: 这个论文使用了一种多时步（MTS）integator，该integator可以在评估certain potential energy terms的时候采用更短的时间步。这种方法是由于类型 potentials 的简单且局限的分析形式而可行。
results: 这个论文的结果表明，使用这种方法可以在MD simulations中提高速度（~3x在我们的实验中）无需失去精度。

Abstract
The need to use a short time step is a key limit on the speed of molecular dynamics (MD) simulations. Simulations governed by classical potentials are often accelerated by using a multiple-time-step (MTS) integrator that evaluates certain potential energy terms that vary more slowly than others less frequently. This approach is enabled by the simple but limiting analytic forms of classical potentials. Machine learning interatomic potentials (MLIPs), in particular recent equivariant neural networks, are much more broadly applicable than classical potentials and can faithfully reproduce the expensive but accurate reference electronic structure calculations used to train them. They still, however, require the use of a single short time step, as they lack the inherent term-by-term scale separation of classical potentials. This work introduces a method to learn a scale separation in complex interatomic interactions by co-training two MLIPs. Initially, a small and efficient model is trained to reproduce short-time-scale interactions. Subsequently, a large and expressive model is trained jointly to capture the remaining interactions not captured by the small model. When running MD, the MTS integrator then evaluates the smaller model for every time step and the larger model less frequently, accelerating simulation. Compared to a conventionally trained MLIP, our approach can achieve a significant speedup (~3x in our experiments) without a loss of accuracy on the potential energy or simulation-derived quantities.

摘要
在分子动力学（MD）模拟中，使用短时步是一个关键限制。使用多时步（MTS）integrodor的模拟通常会加速，因为它可以在不同时间步评估不同慢速的潜在能量项。这种方法是由于类型的潜在能量函数的简单但限制的分析型表达。机器学习交互原子电子镜像（MLIPs），特别是最近的对称神经网络，在比较广泛的应用场景中比类型的潜在能量函数更加适用。它们仍然需要使用短时步，因为它们缺乏类型潜在能量函数中的自然层次分离。这项工作介绍了一种方法，通过同时训练两个MLIP来学习复杂的Interatomic交互中的层次分离。首先，一个小型和高效的模型被训练来复制短时间步的交互。然后，一个大型和表达力强的模型被同时训练，以捕捉不同时间步的交互。在MD模拟中，MTS integrodor会在每个时 step中评估小型模型，而大型模型则会在更少的时间 step中评估。相比于传统训练的MLIP，我们的方法可以在同等精度下实现约3倍的加速（~3x在我们的实验中）。

FairBranch: Fairness Conflict Correction on Task-group Branches for Fair Multi-Task Learning

paper_url: http://arxiv.org/abs/2310.13746
repo_url: https://github.com/arjunroyihrpa/fairbranch-open-source-intelligence
paper_authors: Arjun Roy, Christos Koutlis, Symeon Papadopoulos, Eirini Ntoutsi
for: 提高多任务学习（MTL）模型的公平性和准确性
methods: 使用分支法（FairBranch），分析学习过程中任务之间的参数相似性，并在相关任务集中进行任务分组，以减少负向传递和偏见传递
results: FairBranch在表格和视觉MTL问题中表现出色，在公平性和准确性两个方面超过了当前状态的MTL方法

Abstract
The generalization capacity of Multi-Task Learning (MTL) becomes limited when unrelated tasks negatively impact each other by updating shared parameters with conflicting gradients, resulting in negative transfer and a reduction in MTL accuracy compared to single-task learning (STL). Recently, there has been an increasing focus on the fairness of MTL models, necessitating the optimization of both accuracy and fairness for individual tasks. Similarly to how negative transfer affects accuracy, task-specific fairness considerations can adversely influence the fairness of other tasks when there is a conflict of fairness loss gradients among jointly learned tasks, termed bias transfer. To address both negative and bias transfer in MTL, we introduce a novel method called FairBranch. FairBranch branches the MTL model by assessing the similarity of learned parameters, grouping related tasks to mitigate negative transfer. Additionally, it incorporates fairness loss gradient conflict correction between adjoining task-group branches to address bias transfer within these task groups. Our experiments in tabular and visual MTL problems demonstrate that FairBranch surpasses state-of-the-art MTL methods in terms of both fairness and accuracy.

摘要
多任务学习（MTL）的通用能力在不相关任务之间发生负面影响，导致共享参数更新的梯度冲突，从而导致负面传递和相对于单任务学习（STL）的MTL精度下降。随着对MTL模型的公平性的增加注重，我们需要同时优化任务精度和公平性。与负面传递类似，任务特定的公平性考虑可能会对其他共同学习任务的公平性产生负面影响，这被称为偏见传递。为解决MTL中的负面和偏见传递，我们提出了一种新的方法called FairBranch。FairBranch通过评估学习到的参数相似性，将相关任务分组，以避免负面传递。此外，它还 incorporates fairness loss gradient conflict correction between adjacent task-group branches，以解决内部任务组中的偏见传递。我们在表格和视觉MTL问题上进行了实验，结果显示，FairBranch超过了当前MTL方法的公平性和精度。

CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages

paper_url: http://arxiv.org/abs/2310.13683
repo_url: https://github.com/hiaac-nlp/capivara
paper_authors: Gabriel Oliveira dos Santos, Diego A. B. Moreira, Alef Iury Ferreira, Jhessica Silva, Luiz Pereira, Pedro Bueno, Thiago Sousa, Helena Maia, Nádia Da Silva, Esther Colombini, Helio Pedrini, Sandra Avila
for: 提高多语言CLIP模型在低资源语言中的性能
methods: 使用图像captioning和机器翻译生成多个Synthetic caption，并优化训练管道使其更加效率
results: 在零批量视觉语言任务中达到了状态的艺术，并在葡萄牙语文本中显示出了显著的改进Here’s the breakdown of each point:
for: The paper is written to improve the performance of multilingual CLIP models in low-resource languages.
methods: The paper uses image captioning and machine translation to generate multiple synthetic captions in low-resource languages, and optimizes the training pipeline with LiT, LoRA, and gradient checkpointing to alleviate the computational cost.
results: The proposed method achieves state-of-the-art performance in zero-shot tasks involving images and Portuguese texts, and shows the potential for significant improvements in other low-resource languages with fine-tuning.

Abstract
This work introduces CAPIVARA, a cost-efficient framework designed to enhance the performance of multilingual CLIP models in low-resource languages. While CLIP has excelled in zero-shot vision-language tasks, the resource-intensive nature of model training remains challenging. Many datasets lack linguistic diversity, featuring solely English descriptions for images. CAPIVARA addresses this by augmenting text data using image captioning and machine translation to generate multiple synthetic captions in low-resource languages. We optimize the training pipeline with LiT, LoRA, and gradient checkpointing to alleviate the computational cost. Through extensive experiments, CAPIVARA emerges as state of the art in zero-shot tasks involving images and Portuguese texts. We show the potential for significant improvements in other low-resource languages, achieved by fine-tuning the pre-trained multilingual CLIP using CAPIVARA on a single GPU for 2 hours. Our model and code is available at https://github.com/hiaac-nlp/CAPIVARA.

摘要
这个工作介绍了CAPIVARA，一种可靠的框架，用于提高多语言CLIP模型在低资源语言中的性能。尽管CLIP在零shot视频语言任务上表现出色，但模型训练的资源成本仍然很高。许多数据集缺乏语言多样性，只有英文描述图像。CAPIVARA解决这个问题，通过图像描述和机器翻译来生成多个synthetic描述在低资源语言中。我们优化了训练管道，使用LiT、LoRA和梯度检查点来缓解计算成本。经过广泛的实验，CAPIVARA在零shot任务中与图像和葡萄牙文描述表现出状元。我们显示了其他低资源语言可以获得显著改进，通过在单个GPU上使用CAPIVARA进行2小时的微调，对pre-trained多语言CLIP进行了深度学习。我们的模型和代码可以在GitHub上找到：https://github.com/hiaac-nlp/CAPIVARA。

RealFM: A Realistic Mechanism to Incentivize Data Contribution and Device Participation

paper_url: http://arxiv.org/abs/2310.13681
repo_url: None
paper_authors: Marco Bornstein, Amrit Singh Bedi, Anit Kumar Sahu, Furqan Khan, Furong Huang
for: 本研究旨在实际场景下进行联合学习（Federated Learning，FL），并解决现有框架中出现的免费乘客问题。
methods: 本研究提出了RealFM，首个真实联合机制，该机制（1）实际地模型设备利益，（2）激励数据提供和设备参与，（3）可靠地除免费乘客现象。RealFM不需要数据分享，允许非线性关系 между模型准确率和利益，从而提高服务器和参与设备的利益和数据提供量。
results: 在实际数据上，RealFM可以提高设备和服务器的利益和数据提供量，比基eline机制提高3倍和7倍。

Abstract
Edge device participation in federating learning (FL) has been typically studied under the lens of device-server communication (e.g., device dropout) and assumes an undying desire from edge devices to participate in FL. As a result, current FL frameworks are flawed when implemented in real-world settings, with many encountering the free-rider problem. In a step to push FL towards realistic settings, we propose RealFM: the first truly federated mechanism which (1) realistically models device utility, (2) incentivizes data contribution and device participation, and (3) provably removes the free-rider phenomena. RealFM does not require data sharing and allows for a non-linear relationship between model accuracy and utility, which improves the utility gained by the server and participating devices compared to non-participating devices as well as devices participating in other FL mechanisms. On real-world data, RealFM improves device and server utility, as well as data contribution, by up to 3 magnitudes and 7x respectively compared to baseline mechanisms.

摘要
<>这里使用 Simplified Chinese 翻译文本。<> presente study on edge device participation in federated learning (FL) 通常是通过 device-server communication (例如 device dropout) 进行研究，并假设 edge devices 对 FL 有不断的渴望参与。然而，实际上实现 FL 的现场设置中，现有的 FL 框架受到问题，许多人遇到了免责者问题。为了将 FL 带向现实的设置，我们提出 RealFM：首个真实的联邦机制，其中 (1) 实际地模型 device 的利益， (2) 鼓励数据贡献和设备参与， (3) 可靠地除掉免责者现象。RealFM 不需要数据共享，并允许非线性的数据对应模型精度和利益之间的关系，这提高了服务器和参与设备的利益，以及参与其他 FL 机制的设备中的数据贡献和参与。使用真实数据，RealFM 可以与基准机制相比，提高设备和服务器的利益，以及数据贡献，高达 3 倍和 7 倍。

Optimal Transport for Measures with Noisy Tree Metric

paper_url: http://arxiv.org/abs/2310.13653
repo_url: None
paper_authors: Tam Le, Truyen Nguyen, Kenji Fukumizu
for: 研究了最优运输（OT）问题，即在树 metric 空间上支持的概率测度之间的OT问题。
methods: 采用了robust OT方法，即考虑最大可能的树 metric 间距，以降低受到噪声或敌意测量的影响。
results: 提出了新的uncertainty sets of tree metrics，并利用树结构以及支持测度，实现了closed-form表达，以便快速计算。此外，还证明了max-min robust OT具有metric property和负定性，并提出了正定 definite kernels。在多个实验中，测试了这些kernels在文档分类和拓扑数据分析中的表现。

Abstract
We study optimal transport (OT) problem for probability measures supported on a tree metric space. It is known that such OT problem (i.e., tree-Wasserstein (TW)) admits a closed-form expression, but depends fundamentally on the underlying tree structure over supports of input measures. In practice, the given tree structure may be, however, perturbed due to noisy or adversarial measurements. In order to mitigate this issue, we follow the max-min robust OT approach which considers the maximal possible distances between two input measures over an uncertainty set of tree metrics. In general, this approach is hard to compute, even for measures supported in $1$-dimensional space, due to its non-convexity and non-smoothness which hinders its practical applications, especially for large-scale settings. In this work, we propose \emph{novel uncertainty sets of tree metrics} from the lens of edge deletion/addition which covers a diversity of tree structures in an elegant framework. Consequently, by building upon the proposed uncertainty sets, and leveraging the tree structure over supports, we show that the max-min robust OT also admits a closed-form expression for a fast computation as its counterpart standard OT (i.e., TW). Furthermore, we demonstrate that the max-min robust OT satisfies the metric property and is negative definite. We then exploit its negative definiteness to propose \emph{positive definite kernels} and test them in several simulations on various real-world datasets on document classification and topological data analysis for measures with noisy tree metric.

摘要
我们研究优化运输（OT）问题，将概率分布嵌入树度量空间中。已知这个OT问题（即树 Wasserstein（TW））有闭形表达，但它具有基于支持入力测度的树结构的根本关联。在实践中， giventree结构可能会受到不确定或反对的测量影响。为了解决这个问题，我们遵循max-min类型的Robust OT方法，考虑两个入力测度之间的最大可能距离，这个方法在一般情况下是困难Compute，尤其是在大规模设定下，因为它的非对称和非弹性对computation带来障碍。在这个工作中，我们提出了一种新的不确定树度量集，从edge删除/新增的角度出发，这些集合覆盖了树结构的多样性，并且具有一个漂亮的框架。因此，通过这些不确定树度量集，并且利用树结构，我们显示了max-minRobust OT也有闭形表达，可以快速Compute。此外，我们显示了max-minRobust OT满足了度量性质，并且是负定的。我们运用其负定性，提出了一种正definite核函数，并在实验中对多个实际数据进行测试，包括文档分类和数据探索。

Analyzing the contribution of different passively collected data to predict Stress and Depression

paper_url: http://arxiv.org/abs/2310.13607
repo_url: None
paper_authors: Irene Bonafonte, Cristina Bustos, Abraham Larrazolo, Gilberto Lorenzo Martinez Luna, Adolfo Guzman Arenas, Xavier Baro, Isaac Tourgeman, Mercedes Balcells, Agata Lapedriza
for: 这个论文旨在利用捕获数据来评估心理健康。
methods: 论文使用不同类型的捕获数据（WiFi、GPS、社交互动、手机日志、体育活动和学术特征）来预测每天的自报焦虑和抑郁分数。
results: 研究发现，WiFi特征（表示移动 Pattern）和手机日志特征（与睡眠 Pattern相关）对焦虑和抑郁预测具有重要作用。

Abstract
The possibility of recognizing diverse aspects of human behavior and environmental context from passively captured data motivates its use for mental health assessment. In this paper, we analyze the contribution of different passively collected sensor data types (WiFi, GPS, Social interaction, Phone Log, Physical Activity, Audio, and Academic features) to predict daily selfreport stress and PHQ-9 depression score. First, we compute 125 mid-level features from the original raw data. These 125 features include groups of features from the different sensor data types. Then, we evaluate the contribution of each feature type by comparing the performance of Neural Network models trained with all features against Neural Network models trained with specific feature groups. Our results show that WiFi features (which encode mobility patterns) and Phone Log features (which encode information correlated with sleep patterns), provide significative information for stress and depression prediction.

摘要
“ passively captured 数据的多方面特征和环境 контекст的识别，为心理健康评估带来了动机。本文分析了不同类型的抓取数据（WiFi、GPS、社交交互、手机记录、体育活动和学术特征）对日常自报 стресс和PHQ-9抑郁分数预测的贡献。首先，我们从原始的Raw数据中计算出125个中间特征。这125个特征包括不同数据类型的特征组。然后，我们评估每个特征类型的贡献，通过比较使用所有特征和特定特征组合的神经网络模型的性能来进行比较。我们的结果显示，WiFi特征（表示流动性模式）和手机记录特征（表示睡眠模式相关信息）对心理健康预测提供了重要的信息。”Note: The translation is done using Google Translate, and may not be perfect. Please let me know if you need further assistance.

Unraveling the Enigma of Double Descent: An In-depth Analysis through the Lens of Learned Feature Space

paper_url: http://arxiv.org/abs/2310.13572
repo_url: https://github.com/yufei-gu-451/double_descent_inference
paper_authors: Yufei Gu, Xiaoqing Zheng, Tomaso Aste
for: 这个论文主要是研究深度学习中的双峰现象，以及这种现象如何与噪声数据相关。
methods: 作者们使用了多种方法来研究双峰现象，包括对学习表示空间的分析，以及对不同模型和任务的实验研究。
results: 研究结果表明，双峰现象与噪声数据的存在密切相关，并且可以通过增加模型的参数数量来避免或减少这种现象的发生。作者们还提出了一种理论，即双峰现象是由模型首先学习噪声数据，然后通过拟合来添加隐式正则化的过程。

Abstract
Double descent presents a counter-intuitive aspect within the machine learning domain, and researchers have observed its manifestation in various models and tasks. While some theoretical explanations have been proposed for this phenomenon in specific contexts, an accepted theory to account for its occurrence in deep learning remains yet to be established. In this study, we revisit the phenomenon of double descent and demonstrate that its occurrence is strongly influenced by the presence of noisy data. Through conducting a comprehensive analysis of the feature space of learned representations, we unveil that double descent arises in imperfect models trained with noisy data. We argue that double descent is a consequence of the model first learning the noisy data until interpolation and then adding implicit regularization via over-parameterization acquiring therefore capability to separate the information from the noise. We postulate that double descent should never occur in well-regularized models.

摘要

On sample complexity of conditional independence testing with Von Mises estimator with application to causal discovery

paper_url: http://arxiv.org/abs/2310.13553
repo_url: None
paper_authors: Fateme Jamshidi, Luca Ganassali, Negar Kiyavash
for: 本研究旨在提出一种基于kernel density estimator的非 Parametric Von Mises estimator，用于测量多变量分布的 entropy。
methods: 该研究使用了 conditional independence testing，并基于 exponential concentration inequality，设计了一种名为VM-CI的测试方法，可以实现最佳参数化速率。
results: 研究表明，VM-CI测试方法可以在smoothness assumptions下 achiev 最佳 Parametric rates，并且可以 Characterize 任何基于VM-CI测试的 constraint-based causal discovery algorithm 的样本复杂性。此外，VM-CI测试方法在实际中也能够超过其他流行的CI测试方法，这对结构学习也有着良好的影响。

Abstract
Motivated by conditional independence testing, an essential step in constraint-based causal discovery algorithms, we study the nonparametric Von Mises estimator for the entropy of multivariate distributions built on a kernel density estimator. We establish an exponential concentration inequality for this estimator. We design a test for conditional independence (CI) based on our estimator, called VM-CI, which achieves optimal parametric rates under smoothness assumptions. Leveraging the exponential concentration, we prove a tight upper bound for the overall error of VM-CI. This, in turn, allows us to characterize the sample complexity of any constraint-based causal discovery algorithm that uses VM-CI for CI tests. To the best of our knowledge, this is the first sample complexity guarantee for causal discovery for continuous variables. Furthermore, we empirically show that VM-CI outperforms other popular CI tests in terms of either time or sample complexity (or both), which translates to a better performance in structure learning as well.

摘要
使用假设独立性测试为导向，我们研究非 Parametric Von Mises 估计器，用于测量多变量分布的熵。我们证明了这个估计器的快速凝固不等式。我们基于这个估计器设计了一个 Conditional Independence （CI）测试，称为 VM-CI，它在幂级假设下具有最佳参数化速率。利用快速凝固不等式，我们证明了 VM-CI 测试的总错误Bound。这使得我们可以计算任何基于约束的 causal discovery 算法的样本复杂度。根据我们所知，这是首次对连续变量 causal discovery 提供样本复杂度保证。此外，我们还 empirically 表明，VM-CI 测试在时间或样本复杂度（或 Both）方面比其他流行的 CI 测试更高效，这也意味着它在结构学习方面表现更好。Note:* "独立性测试" (conditional independence test) is translated as "独立性测试" in Simplified Chinese.* "假设独立性" (conditional independence) is translated as "假设独立性" in Simplified Chinese.* "熵" (entropy) is translated as "熵" in Simplified Chinese.* "估计器" (estimator) is translated as "估计器" in Simplified Chinese.* "快速凝固不等式" (exponential concentration inequality) is translated as "快速凝固不等式" in Simplified Chinese.* "Conditional Independence 测试" (CI test) is translated as "Conditional Independence 测试" in Simplified Chinese.* "约束" (constraint) is translated as "约束" in Simplified Chinese.* "causal discovery" is translated as " causal discovery" in Simplified Chinese.

Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes

paper_url: http://arxiv.org/abs/2310.13550
repo_url: None
paper_authors: Ruiquan Huang, Yuan Cheng, Jing Yang, Vincent Tan, Yingbin Liang
for: 多任务束规划学习（RL）下的Markov决策过程（MDP）中存在共享幽默结构可以获得显著的采样效率提升，本文探索这种优势是否可以扩展到更一般的顺序决策问题，例如部分可观察MDP（POMDP）和更一般的预测状态表示（PSR）。
methods: 本文使用联合模型类来描述任务，并使用$\eta$-括号数量来衡量其复杂度，这个数量也用于量化任务之间的相似性，从而确定多任务学习是否具有优势。
results: 本文提出了一种可求最优策略的算法UMT-PSR，并证明在所有PSRs中找到近似优化策略的执行是可靠的。此外，本文还提供了一些具有小$\eta$-括号数量的多任务PSR示例，这些示例可以充分利用多任务学习的优势。最后，本文还研究了下游学习，即通过对已知任务的学习来学习一个新的目标任务，并证明可以通过利用已知PSRs来实现高效的学习。

Abstract
In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures among multiple MDPs has been shown to yield significant benefits to the sample efficiency compared to single-task RL. In this paper, we investigate whether such a benefit can extend to more general sequential decision making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs). The main challenge here is that the large and complex model space makes it hard to identify what types of common latent structure of multi-task PSRs can reduce the model complexity and improve sample efficiency. To this end, we posit a joint model class for tasks and use the notion of $\eta$-bracketing number to quantify its complexity; this number also serves as a general metric to capture the similarity of tasks and thus determines the benefit of multi-task over single-task RL. We first study upstream multi-task learning over PSRs, in which all tasks share the same observation and action spaces. We propose a provably efficient algorithm UMT-PSR for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests if the joint model class of PSRs has a smaller $\eta$-bracketing number compared to that of individual single-task learning. We also provide several example multi-task PSRs with small $\eta$-bracketing numbers, which reap the benefits of multi-task learning. We further investigate downstream learning, in which the agent needs to learn a new target task that shares some commonalities with the upstream tasks via a similarity constraint. By exploiting the learned PSRs from the upstream, we develop a sample-efficient algorithm that provably finds a near-optimal policy.

摘要
在多任务强化学习（RL）下，存在共享隐藏结构的多任务Markov决策过程（MDP）可以提供显著的样本效率提升。在这篇论文中，我们研究了这种优势是否可以扩展到更通用的序列决策问题，如部分可观察MDP（POMDP）和更通用的预测状态表示（PSR）。主要挑战在于，由于模型空间的大小和复杂性，很难确定共享隐藏结构多任务PSR可以降低模型复杂性并提高样本效率。为此，我们提出了一个共同模型类，并使用η-括号数量来衡量其复杂性；这个数量也用于捕捉任务的相似性，从而确定多任务RL的优势。我们首先研究了上游多任务学习，在其中所有任务共享同一个观察和动作空间。我们提出了一个可证fficient的算法UMT-PSR，可以在所有PSR上找到近似优化策略，并证明多任务学习的优势在joint模型类的η-括号数量较小时manifests。我们还提供了一些具有小η-括号数量的多任务PSR示例，这些示例能够利用多任务学习的优势。我们进一步调查下游学习，在其中Agent需要通过一个相似性约束学习一个新的目标任务，该任务与上游任务共享一些共同特征。通过利用已经学习的PSR，我们开发了一种可证fficient的算法，可以在样本效率上提高。

Feature Selection and Hyperparameter Fine-tuning in Artificial Neural Networks for Wood Quality Classification

paper_url: http://arxiv.org/abs/2310.13490
repo_url: None
paper_authors: Mateus Roder, Leandro Aparecido Passos, João Paulo Papa, André Luis Debiaso Rossi
for: 这篇论文的目的是提出一种可行的机器学习方法来解决木板质量评估问题，以替代人工操作。
methods: 这篇论文使用人工神经网络（ANN）进行模型训练，并同时调整模型的超参数和选择特征集。
results: 实验结果表明，在不同特征集和超参数配置下，模型的预测性能有很大差异。在一些情况下，只进行特征选择或超参数调整可以达到最佳预测性能。

Abstract
Quality classification of wood boards is an essential task in the sawmill industry, which is still usually performed by human operators in small to median companies in developing countries. Machine learning algorithms have been successfully employed to investigate the problem, offering a more affordable alternative compared to other solutions. However, such approaches usually present some drawbacks regarding the proper selection of their hyperparameters. Moreover, the models are susceptible to the features extracted from wood board images, which influence the induction of the model and, consequently, its generalization power. Therefore, in this paper, we investigate the problem of simultaneously tuning the hyperparameters of an artificial neural network (ANN) as well as selecting a subset of characteristics that better describes the wood board quality. Experiments were conducted over a private dataset composed of images obtained from a sawmill industry and described using different feature descriptors. The predictive performance of the model was compared against five baseline methods as well as a random search, performing either ANN hyperparameter tuning and feature selection. Experimental results suggest that hyperparameters should be adjusted according to the feature set, or the features should be selected considering the hyperparameter values. In summary, the best predictive performance, i.e., a balanced accuracy of $0.80$, was achieved in two distinct scenarios: (i) performing only feature selection, and (ii) performing both tasks concomitantly. Thus, we suggest that at least one of the two approaches should be considered in the context of industrial applications.

摘要
<>转换文本到简化中文。<>钢板质量分类是锯木行业中的一项重要任务，通常由人工操作员在发展中国家的小到中型公司中完成。机器学习算法已成功应用于这个问题，提供了更加可持预算的解决方案。然而，这些方法通常存在一些有关选择hyperparameter的缺点。此外，模型受到木板图像特征的影响，这会影响模型的泛化能力。因此，在这篇论文中，我们 investigate了同时调整ANN的hyperparameter和选择一个更好地描述木板质量的特征集。实验在一个私有的数据集上进行，该数据集包括锯木行业中的图像，并使用不同的特征描述器。模型预测性能与五个基线方法以及随机搜索进行比较。实验结果表明，hyperparameter应该与特征集进行调整，或者选择特征集应该考虑到hyperparameter值。简要总结，最佳预测性能，即平均准确率0.80，在两个不同的方案下实现：（1）只进行特征选择，和（2）同时进行特征选择和hyperparameter调整。因此，我们建议在工业应用中至少考虑一种这两种方法。

Personalized identification, prediction, and stimulation of neural oscillations via data-driven models of epileptic network dynamics

paper_url: http://arxiv.org/abs/2310.13480
repo_url: None
paper_authors: Tena Dubcek, Debora Ledergerber, Jana Thomann, Giovanna Aiello, Marc Serra-Garcia, Lukas Imbach, Rafael Polania
for: 这篇论文的目的是为了提供一个基于EEG数据的个体化预测模型，以评估和预测脑动力疾病的疗效。
methods: 这篇论文使用了EEG数据，通过分析脑律动的快速对频和对频几何，提取了个体化的预测模型。
results: 这篇论文的结果显示，透过这个个体化预测模型，可以对脑动力疾病的疗效进行预测和评估。此外，这个模型还可以显示， periodic brain stimulation 可以导向疾病脑动力状态转化为健康的脑动力状态。

Abstract
Neural oscillations are considered to be brain-specific signatures of information processing and communication in the brain. They also reflect pathological brain activity in neurological disorders, thus offering a basis for diagnoses and forecasting. Epilepsy is one of the most common neurological disorders, characterized by abnormal synchronization and desynchronization of the oscillations in the brain. About one third of epilepsy cases are pharmacoresistant, and as such emphasize the need for novel therapy approaches, where brain stimulation appears to be a promising therapeutic option. The development of brain stimulation paradigms, however, is often based on generalized assumptions about brain dynamics, although it is known that significant differences occur between patients and brain states. We developed a framework to extract individualized predictive models of epileptic network dynamics directly from EEG data. The models are based on the dominant coherent oscillations and their dynamical coupling, thus combining an established interpretation of dynamics through neural oscillations, with accurate patient-specific features. We show that it is possible to build a direct correspondence between the models of brain-network dynamics under periodic driving, and the mechanism of neural entrainment via periodic stimulation. When our framework is applied to EEG recordings of patients in status epilepticus (a brain state of perpetual seizure activity), it yields a model-driven predictive analysis of the therapeutic performance of periodic brain stimulation. This suggests that periodic brain stimulation can drive pathological states of epileptic network dynamics towards a healthy functional brain state.

摘要
Simplified Chinese translation:神经振荡被视为脑特有的信息处理和通信特征。它们也反映了脑神经疾病的病理活动，因此提供了诊断和预测的基础。癫痫是最常见的神经疾病之一，表现为脑内oscillation的异常同步和反同步。约一 third的癫痫患者无法由药物控制，这重申了需要新的治疗方法，其中脑刺激似乎是一个有前途的治疗选择。然而，脑刺激模式的开发 frequently based on general assumptions about brain dynamics，尽管已知脑部和脑状态之间存在显著的差异。我们提出了一个框架，可以直接从EEG数据提取个性化预测模型。这些模型基于主要的同步振荡和其动态相互作用，因此结合了已知的神经振荡解释和准确的患者特有特征。我们显示了 periodic driving下brain-network dynamics的模型和神经整合的机制之间存在直接对应关系。当我们的框架应用于 Status epilepticus（脑状态）的EEG记录时，可以获得一种基于模型的预测分析，以评估脑刺激的治疗性能。这表明 periodic brain stimulation可以驱动癫痫网络动态向具有健康功能脑状态。

An Analysis of $D^α$ seeding for $k$-means

paper_url: http://arxiv.org/abs/2310.13474
repo_url: None
paper_authors: Etienne Bamas, Sai Ganesh Nagarajan, Ola Svensson
for: 这个论文的目的是提供关于$D^\alpha$ 种子算法（也称为$k$-means++ 算法）的一种理解，以及证明其在标准 $k$-means 成本上的近似因子。
methods: 该论文使用了$D^\alpha$ 种子算法，并对其进行了分析和证明。
results: 论文证明了，对于任何 $\alpha>2$，$D^\alpha$ 种子算法在标准 $k$-means 成本上提供了一个 $O_\alpha((g_\alpha)^{2/\alpha}\cdot \left(\frac{\sigma_{\mathrm{max}}{\sigma_{\mathrm{min}}\right)^{2-4/\alpha}\cdot (\min{\ell,\log k})^{2/\alpha})$ 的近似因子。此外，论文还提供了一些下界，证明了这个近似因子的依赖关系。最后，论文还提供了一些实验证明，证明了 $\alpha>2$ 可以提高 $k$-means 成本，并且这种优势可以在 Lloyd 算法后仍保持。

Abstract
One of the most popular clustering algorithms is the celebrated $D^\alpha$ seeding algorithm (also know as $k$-means++ when $\alpha=2$) by Arthur and Vassilvitskii (2007), who showed that it guarantees in expectation an $O(2^{2\alpha}\cdot \log k)$-approximate solution to the ($k$,$\alpha$)-means cost (where euclidean distances are raised to the power $\alpha$) for any $\alpha\ge 1$. More recently, Balcan, Dick, and White (2018) observed experimentally that using $D^\alpha$ seeding with $\alpha>2$ can lead to a better solution with respect to the standard $k$-means objective (i.e. the $(k,2)$-means cost). In this paper, we provide a rigorous understanding of this phenomenon. For any $\alpha>2$, we show that $D^\alpha$ seeding guarantees in expectation an approximation factor of $$ O_\alpha \left((g_\alpha)^{2/\alpha}\cdot \left(\frac{\sigma_{\mathrm{max}}{\sigma_{\mathrm{min}}\right)^{2-4/\alpha}\cdot (\min\{\ell,\log k\})^{2/\alpha}\right)$$ with respect to the standard $k$-means cost of any underlying clustering; where $g_\alpha$ is a parameter capturing the concentration of the points in each cluster, $\sigma_{\mathrm{max}$ and $\sigma_{\mathrm{min}$ are the maximum and minimum standard deviation of the clusters around their means, and $\ell$ is the number of distinct mixing weights in the underlying clustering (after rounding them to the nearest power of $2$). We complement these results by some lower bounds showing that the dependency on $g_\alpha$ and $\sigma_{\mathrm{max}/\sigma_{\mathrm{min}$ is tight. Finally, we provide an experimental confirmation of the effects of the aforementioned parameters when using $D^\alpha$ seeding. Further, we corroborate the observation that $\alpha>2$ can indeed improve the $k$-means cost compared to $D^2$ seeding, and that this advantage remains even if we run Lloyd's algorithm after the seeding.

摘要
一种非常受欢迎的聚类算法是由Arthur和Vassilvitskii（2007）提出的$D^\alpha$ 种子算法（当$\alpha=2$时也称为$k$-means++），它保证了在期望下有$O(2^{2\alpha}\cdot \log k)$-近似解决($k$,$\alpha$)-means成本（在欧几丁度距离上升到$\alpha$势），其中$\alpha\geq 1$。更 reciently，Balcan、Dick和White（2018）通过实验发现，使用$D^\alpha$种子算法可以在标准$k$-means目标上提供更好的解决方案，特别是当$\alpha>2$时。在这篇论文中，我们提供了一个充分理解这种现象的理论基础。对于任何$\alpha>2$，我们证明了$D^\alpha$种子算法在期望下对标准$k$-means成本的近似因子为：$$O_\alpha \left((g_\alpha)^{2/\alpha}\cdot \left(\frac{\sigma_{\mathrm{max}}{\sigma_{\mathrm{min}}\right)^{2-4/\alpha}\cdot (\min\{\ell,\log k\})^{2/\alpha}\right)$$其中$g_\alpha$是聚集点的含量，$\sigma_{\mathrm{max}$和$\sigma_{\mathrm{min}$是聚集点的最大和最小横幅，$\ell$是聚集点的数量。我们还提供了一些下界，证明了$g_\alpha$和$\sigma_{\mathrm{max}/\sigma_{\mathrm{min}$的依赖性是紧的。最后，我们通过实验证明了$D^\alpha$种子算法中各参数的效果，并证明了$\alpha>2$可以在标准$k$-means成本上提供更好的解决方案，而且这种优势仍然存在，即使在使用Lloyd算法之后。

Stable Nonconvex-Nonconcave Training via Linear Interpolation

paper_url: http://arxiv.org/abs/2310.13459
repo_url: None
paper_authors: Thomas Pethick, Wanyun Xie, Volkan Cevher
for: 这篇论文提出了一种理论分析，用于稳定（大规模）神经网络训练的线性 interpolate 方法。文章认为，优化过程中的不稳定性往往是因为损失函数的非升 monotonicity，并示了如何通过使用 nonexpansive 算子理论来利用 linear interpolate。
methods: 文章提出了一种新的优化方案，叫做 relaxed approximate proximal point (RAPP)，它是第一个可以达到最后迭代速度的方法，适用于整个 cohypomonotone 问题范围。此外，文章还扩展了 RAPP 方法，使其适用于约束和规范化 Setting。通过将内部优化器换成 Lookahead 算法，文章还重新发现了 Lookahead 算法家族，并证明了它们在 cohypomonotone 问题中的 convergence。
results: 文章通过实验示范了在生成对抗网络中的应用，证明了 linear interpolate 的利用带来的 benefits。

Abstract
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators. We construct a new optimization scheme called relaxed approximate proximal point (RAPP), which is the first explicit method to achieve last iterate convergence rates for the full range of cohypomonotone problems. The construction extends to constrained and regularized settings. By replacing the inner optimizer in RAPP we rediscover the family of Lookahead algorithms for which we establish convergence in cohypomonotone problems even when the base optimizer is taken to be gradient descent ascent. The range of cohypomonotone problems in which Lookahead converges is further expanded by exploiting that Lookahead inherits the properties of the base optimizer. We corroborate the results with experiments on generative adversarial networks which demonstrates the benefits of the linear interpolation present in both RAPP and Lookahead.

摘要
Translation notes:* "linear interpolation" is translated as "线性 interpolate" (língxìng jìshì)* "nonmonotonicity" is translated as "非 monotonicity" (fēi mónótónicity)* "cohypon monotone" is translated as "共凹 monotone" (gòng cháng mónótón)* "relaxed approximate proximal point" is translated as "松relaxed approximate proximal point" (sōngxiǎoxiǎo jìshì)* "last iterate convergence rates" is translated as "最后迭代收敛率" (zuìhòu dài tiědài shōu yè)* "base optimizer" is translated as "基础优化器" (jībì yóu jiā)* "Lookahead algorithms" is translated as "Lookahead算法" (Lookahead suān fǎ)Please note that the translation is based on the Simplified Chinese version of the text, and the translation may vary depending on the specific context and the version of the text.

Correspondence learning between morphologically different robots through task demonstrations

paper_url: http://arxiv.org/abs/2310.13458
repo_url: None
paper_authors: Hakan Aktas, Yukie Nagai, Minoru Asada, Erhan Oztop, Emre Ugur
for:本文旨在学习不同机器人之间的对应关系，以便将一个机器人学习的技能转移到另一个机器人上。methods:本文提出了一种方法，通过让两个机器人执行相同的任务，形成共同的隐藏表示。然后，通过观察一个机器人执行任务，生成另一个机器人所需的隐藏空间表示。results:本文通过实验和证明，证明了该方法可以成功地学习机器人之间的对应关系。在不同的任务和 trajectory 上，该方法能够将一个机器人学习的技能转移到另一个机器人上。

Abstract
We observe a large variety of robots in terms of their bodies, sensors, and actuators. Given the commonalities in the skill sets, teaching each skill to each different robot independently is inefficient and not scalable when the large variety in the robotic landscape is considered. If we can learn the correspondences between the sensorimotor spaces of different robots, we can expect a skill that is learned in one robot can be more directly and easily transferred to the other robots. In this paper, we propose a method to learn correspondences between robots that have significant differences in their morphologies: a fixed-based manipulator robot with joint control and a differential drive mobile robot. For this, both robots are first given demonstrations that achieve the same tasks. A common latent representation is formed while learning the corresponding policies. After this initial learning stage, the observation of a new task execution by one robot becomes sufficient to generate a latent space representation pertaining to the other robot to achieve the same task. We verified our system in a set of experiments where the correspondence between two simulated robots is learned (1) when the robots need to follow the same paths to achieve the same task, (2) when the robots need to follow different trajectories to achieve the same task, and (3) when complexities of the required sensorimotor trajectories are different for the robots considered. We also provide a proof-of-the-concept realization of correspondence learning between a real manipulator robot and a simulated mobile robot.

摘要
我们观察到多种机器人在体型、感知器和 actuator 方面存在差异。由于 robotic 领域中机器人的多样性，单独教学每个机器人的技能不可能scalable。如果我们可以学习不同机器人感知动作空间之间的对应关系，那么我们可以预期一个学习在一个机器人上的技能可以更直接和更容易地转移到另一个机器人。在这篇论文中，我们提出了一种方法，可以在机器人之间学习对应关系，其中机器人之间存在重大差异。为此，我们首先给两个机器人提供了完成同一任务的示范。在学习初始阶段，我们形成了共同的潜在表示。然后，当一个机器人执行新任务时， observing 它的行为就能够生成另一个机器人所需的潜在空间表示，以便完成同一任务。我们在一系列实验中验证了我们的系统，其中包括（1）两个机器人需要跟踪同一条路来完成同一任务，（2）两个机器人需要跟踪不同的路径来完成同一任务，以及（3）两个机器人所需的感知动作轨迹之间存在差异。此外，我们还提供了一个实际实现对应学习的真实搅拌机器人和虚拟移动机器人之间的对应关系。

Y-Diagonal Couplings: Approximating Posteriors with Conditional Wasserstein Distances

paper_url: http://arxiv.org/abs/2310.13433
repo_url: None
paper_authors: Jannis Chemseddine, Paul Hagemann, Christian Wald
for: 本研究探讨了在逆问题中使用 conditional Wasserstein distance 来 aproximate posterior distribution。
methods: 本研究使用了一种叫做 conditional Wasserstein distance 的方法，该方法使用一组 restriction couplings 来等价 posterior measure。
results: 本研究发现，使用 conditional Wasserstein distance 可以获得更好的 posterior sampling 性能，并且在某些条件下，vanilla Wasserstein distance 和 conditional Wasserstein distance 相同。

Abstract
In inverse problems, many conditional generative models approximate the posterior measure by minimizing a distance between the joint measure and its learned approximation. While this approach also controls the distance between the posterior measures in the case of the Kullback Leibler divergence, it does not hold true for the Wasserstein distance. We will introduce a conditional Wasserstein distance with a set of restricted couplings that equals the expected Wasserstein distance of the posteriors. By deriving its dual, we find a rigorous way to motivate the loss of conditional Wasserstein GANs. We outline conditions under which the vanilla and the conditional Wasserstein distance coincide. Furthermore, we will show numerical examples where training with the conditional Wasserstein distance yields favorable properties for posterior sampling.

摘要
在逆 проблеme中，许多条件生成模型通过控制 JOIN 度和其学习的近似关系来估算 posterior measure。虽然这种方法在 Kullback Leibler 距离下也控制 posterior measures，但并不适用于 Wasserstein 距离。我们将引入一种 conditional Wasserstein 距离，其等于 posterior 的预期 Wasserstein 距离。通过它的 dual，我们找到了 condition Wasserstein GANs 的损失的准确优化方法。我们还详细介绍了这些方法在 vanilla 和 conditional Wasserstein 距离之间的差异，以及训练时使用 conditional Wasserstein 距离的有利属性。Here's the translation in Traditional Chinese:在逆问题中，许多条件生成模型通过控制 JOIN 度和其学习的近似关系来估算 posterior measure。处于 Kullback Leibler 距离下也控制 posterior measures，但并不适用于 Wasserstein 距离。我们将引入一种 conditional Wasserstein 距离，其等于 posterior 的预期 Wasserstein 距离。通过它的 dual，我们找到了 condition Wasserstein GANs 的损失的准确优化方法。我们还详细介绍了这些方法在 vanilla 和 conditional Wasserstein 距离之间的差异，以及训练时使用 conditional Wasserstein 距离的有利属性。

HRTF Interpolation using a Spherical Neural Process Meta-Learner

paper_url: http://arxiv.org/abs/2310.13430
repo_url: None
paper_authors: Etienne Thuillier, Craig Jin, Vesa Välimäki
for:* The paper aims to estimate a subject’s Head-Related Transfer Function (HRTF) using convenient input modalities such as anthropometric measurements or pinnae photographs.methods:* The paper proposes a Convolutional Conditional Neural Process meta-learner specialized in HRTF error interpolation, which includes a Spherical Convolutional Neural Network component and exploits potential symmetries between the HRTF’s left and right channels.results:* The proposed model achieves up to 3 dB relative error reduction compared to state-of-the-art interpolation methods, with a reduction in the data point count required to achieve comparable accuracy from 50 to 28 points. Additionally, the trained model provides well-calibrated uncertainty estimates.

Abstract
Several individualization methods have recently been proposed to estimate a subject's Head-Related Transfer Function (HRTF) using convenient input modalities such as anthropometric measurements or pinnae photographs. There exists a need for adaptively correcting the estimation error committed by such methods using a few data point samples from the subject's HRTF, acquired using acoustic measurements or perceptual feedback. To this end, we introduce a Convolutional Conditional Neural Process meta-learner specialized in HRTF error interpolation. In particular, the model includes a Spherical Convolutional Neural Network component to accommodate the spherical geometry of HRTF data. It also exploits potential symmetries between the HRTF's left and right channels about the median axis. In this work, we evaluate the proposed model's performance purely on time-aligned spectrum interpolation grounds under a simplified setup where a generic population-mean HRTF forms the initial estimates prior to corrections instead of individualized ones. The trained model achieves up to 3 dB relative error reduction compared to state-of-the-art interpolation methods despite being trained using only 85 subjects. This improvement translates up to nearly a halving of the data point count required to achieve comparable accuracy, in particular from 50 to 28 points to reach an average of -20 dB relative error per interpolated feature. Moreover, we show that the trained model provides well-calibrated uncertainty estimates. Accordingly, such estimates can inform the sequential decision problem of acquiring as few correcting HRTF data points as needed to meet a desired level of HRTF individualization accuracy.

摘要
几种个性化方法已经最近提出来估算用户的头相关传函数（HRTF），使用便捷的输入模式，如人体测量或耳朵照片。存在一个需要使用一些数据点样本来修正估算错误的需求。为此，我们介绍了一种适应HRTF错误 interpolator。特别是，该模型包括一个圆拟神经网络组件，以便处理HRTF数据的圆形几何。它还利用HRTF左右通道关于中心轴的可能 symmetries。在这种简化的设置下，我们评估了提posed模型的性能，即使只使用85名用户进行训练。结果显示，训练后的模型可以将相对错误降低至3dB，相比之前的状态 искусство interpolating方法。此外，我们还证明了训练后的模型可以提供准确的不确定性估计。因此，这些估计可以为sequential decision问题提供指导，即需要收集多少个正确的HRTF数据点以达到满意的个性化精度。

BRFL: A Blockchain-based Byzantine-Robust Federated Learning Model

paper_url: http://arxiv.org/abs/2310.13403
repo_url: None
paper_authors: Yang Li, Chunhe Xia, Chang Li, Tianbo Wang
for: 这篇论文旨在提出一个基于区块链技术的联邦学习模型，以提高该模型对于伪造模型的抵抗力。
methods: 本文使用了联邦学习和区块链技术的结合，实现了模型伪造追踪和地方训练Client的优化。特别是，在数据联邦时，选择了基于对应关系的数据范围，并使用了气泡聚类和平均梯度计算，以验证模型的准确性。
results: 实验结果显示，该模型在公开数据上显示了比基于其他基准的伪造抵抗方法更高的抵抗性。此外，模型还能够降低联邦学习的资源消耗问题。

Abstract
With the increasing importance of machine learning, the privacy and security of training data have become critical. Federated learning, which stores data in distributed nodes and shares only model parameters, has gained significant attention for addressing this concern. However, a challenge arises in federated learning due to the Byzantine Attack Problem, where malicious local models can compromise the global model's performance during aggregation. This article proposes the Blockchain-based Byzantine-Robust Federated Learning (BRLF) model that combines federated learning with blockchain technology. This integration enables traceability of malicious models and provides incentives for locally trained clients. Our approach involves selecting the aggregation node based on Pearson's correlation coefficient, and we perform spectral clustering and calculate the average gradient within each cluster, validating its accuracy using local dataset of the aggregation nodes. Experimental results on public datasets demonstrate the superior byzantine robustness of our secure aggregation algorithm compared to other baseline byzantine robust aggregation methods, and proved our proposed model effectiveness in addressing the resource consumption problem.

摘要
随着机器学习的重要性增加，训练数据的隐私和安全问题日益突出。联邦学习，即将数据存储在分布式节点上并只分享模型参数，已经吸引了广泛关注以Addressing这个问题。然而，联邦学习中出现了拜占庭攻击问题，其中有些本地模型会在聚合过程中损害全局模型的性能。这篇文章提出了基于区块链技术的联邦学习模型（BRLF），该模型结合了联邦学习和区块链技术。这种结合使得可以追溯到恶意模型，并为本地训练节点提供了奖励。我们的方法是根据潘森相关系数选择聚合节点，并对每个群进行spectral clustering，计算每个群的平均梯度，并验证其准确性使用本地数据集。实验结果表明，我们的安全聚合算法在其他基准拜占庭Robust聚合方法的比较中显示出了更高的拜占庭Robust性，并证明了我们提出的模型的有效性。

Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability

paper_url: http://arxiv.org/abs/2310.13402
repo_url: https://github.com/dmml-geneva/calibrated-posterior
paper_authors: Maciej Falkiewicz, Naoya Takeishi, Imahn Shekhzadeh, Antoine Wehenkel, Arnaud Delaunoy, Gilles Louppe, Alexandros Kalousis
for: 这个论文主要用于提出一种基于神经网络的隐藏Variable Bayesian inference算法，以提高 posterior belief 的不确定性评估。
methods: 该算法使用了选择性的隐藏Variable Bayesian inference技术，并引入了一个抑制error的评估项直接到训练目标函数中。
results: 经验表明，该算法可以在六个 benchmark 问题上达到或超过前 exists 的方法的覆盖率和预期 posterior density 水平。

Abstract
Bayesian inference allows expressing the uncertainty of posterior belief under a probabilistic model given prior information and the likelihood of the evidence. Predominantly, the likelihood function is only implicitly established by a simulator posing the need for simulation-based inference (SBI). However, the existing algorithms can yield overconfident posteriors (Hermans *et al.*, 2022) defeating the whole purpose of credibility if the uncertainty quantification is inaccurate. We propose to include a calibration term directly into the training objective of the neural model in selected amortized SBI techniques. By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation. The proposed method is not tied to any particular neural model and brings moderate computational overhead compared to the profits it introduces. It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference. We empirically show on six benchmark problems that the proposed method achieves competitive or better results in terms of coverage and expected posterior density than the previously existing approaches.

摘要
bayesian 推理允许表达基于概率模型的 posterior belief 中的不确定性，givem prior information 和证据的可能性。然而，现有的算法可能会导致过于自信的 posterior （Hermans *et al.*, 2022），这会让 credibility 失效，如果uncertainty quantification 不准确。我们提议直接在 neural model 的训练目标中包含 calibration 项。通过 relaxes classical 形式的 calibration error 的形式，我们可以实现 end-to-end backpropagation。该方法不仅可以应用于特定的 neural model，而且相比于它引入的计算开销，它带来了moderate的计算开销。它可以直接应用于现有的计算管道， allowing reliable black-box posterior inference。我们在六个 benchmark 问题上进行了实验，并证明了该方法可以达到与之前的方法相同或更好的coverage和预期 posterior density的结果。

Equivariant Deep Weight Space Alignment

paper_url: http://arxiv.org/abs/2310.13397
repo_url: None
paper_authors: Aviv Navon, Aviv Shamsian, Ethan Fetaya, Gal Chechik, Nadav Dym, Haggai Maron
for: 深度网络的 permutation symmetries 使得模型平均化和相似性判断变得困难。这些问题的解决需要对深度网络的 weights 进行对齐。
methods: 我们提出了一种新的框架，即 Deep-Align，用于解决这些问题。我们首先证明 weight alignment 遵循两种基本的 symmetries，然后提出了一种深度架构，该架构遵循这些 symmetries。
results: 我们的实验结果表明，使用 Deep-Align 可以更快地生成更好的对齐，并且可以用作其他方法的初始化来获得更好的解决方案，并且可以带来显著的加速。

Abstract
Permutation symmetries of deep networks make simple operations like model averaging and similarity estimation challenging. In many cases, aligning the weights of the networks, i.e., finding optimal permutations between their weights, is necessary. More generally, weight alignment is essential for a wide range of applications, from model merging, through exploring the optimization landscape of deep neural networks, to defining meaningful distance functions between neural networks. Unfortunately, weight alignment is an NP-hard problem. Prior research has mainly focused on solving relaxed versions of the alignment problem, leading to either time-consuming methods or sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose a novel framework aimed at learning to solve the weight alignment problem, which we name Deep-Align. To that end, we first demonstrate that weight alignment adheres to two fundamental symmetries and then, propose a deep architecture that respects these symmetries. Notably, our framework does not require any labeled data. We provide a theoretical analysis of our approach and evaluate Deep-Align on several types of network architectures and learning setups. Our experimental results indicate that a feed-forward pass with Deep-Align produces better or equivalent alignments compared to those produced by current optimization algorithms. Additionally, our alignments can be used as an initialization for other methods to gain even better solutions with a significant speedup in convergence.

摘要
深度网络的 permutation symmetries 使得一些简单的操作，如模型平均和相似性估计，变得复杂。在许多情况下，对深度网络的Weight进行对齐，即找出最佳的 permutation между它们的Weight，是必要的。更广泛地说，Weight对齐是许多应用的关键，从模型合并、探索深度神经网络的优化困难度到定义深度神经网络之间的意义ful distance function。 unfortunately，Weight对齐是NP-hard问题。先前的研究主要集中在解决Weight对齐问题的宽松版本上，导致 either time-consuming methods or suboptimal solutions。为了加速对齐过程并提高其质量，我们提出了一个新的框架，我们称之为Deep-Align。为达到这一目标，我们首先示出Weight对齐符合两种基本的Symmetries，然后我们提议一种尊重这些Symmetries的深度建筑。很notationably，我们的框架不需要任何标注数据。我们提供了对我们方法的理论分析，并在多种网络架构和学习设置下进行了实验性测试。我们的实验结果表明，在Deep-Align中通过 feed-forward pass 生成的对齐比现有的优化算法更好或相同。此外，我们的对齐可以作为其他方法的初始化，以获得更好的解决方案，并且具有显著的加速减速效果。

RL-X: A Deep Reinforcement Learning Library (not only) for RoboCup

paper_url: http://arxiv.org/abs/2310.13396
repo_url: https://github.com/nico-bohlinger/rl-x
paper_authors: Nico Bohlinger, Klaus Dorer
for: 这篇论文是为了描述一个新的深度学习约束学习（DRL）库RL-X，以及其应用于RoboCup Soccer Simulation 3D League和经典DRL benchmark。
methods: RL-X使用自适应的JAX实现，可以 дости到与知名框架Stable-Baselines3相比的4.5倍加速。
results: RL-X可以在RoboCup Soccer Simulation 3D League和经典DRL benchmark上实现更高的性能。

Abstract
This paper presents the new Deep Reinforcement Learning (DRL) library RL-X and its application to the RoboCup Soccer Simulation 3D League and classic DRL benchmarks. RL-X provides a flexible and easy-to-extend codebase with self-contained single directory algorithms. Through the fast JAX-based implementations, RL-X can reach up to 4.5x speedups compared to well-known frameworks like Stable-Baselines3.

摘要
这篇论文介绍了新的深度强化学习（DRL）库RL-X和其在RoboCup Soccer Simulation 3D League和经典DRL benchmarks中的应用。RL-X提供了灵活且易于扩展的代码基础，通过快速的JAX实现，RL-X可以与已知框架如Stable-Baselines3进行比较，达到4.5倍的速度提升。

Optimal Best Arm Identification with Fixed Confidence in Restless Bandits

paper_url: http://arxiv.org/abs/2310.13393
repo_url: None
paper_authors: P. N. Karthik, Vincent Y. F. Tan, Arpan Mukherjee, Ali Tajer
for: 最佳臂标识（best arm identification）在不断时钟的多臂矢量游戏中（restless multi-armed bandit setting），具有 фиnitely many arms的情况。
methods: 使用 homogeneous Markov chain 模型（homogeneous Markov chain model）， captured by an ergodic transition probability matrix (TPM) that is a member of a single-parameter exponential family of TPMs。
results: 提出了一种策略（policy），其预期停止时间（expected stopping time）的增长率与下界（lower bound）匹配，并且证明了这种策略在极限下的错误probability（error probability）下逐渐 converges to the optimal policy。

Abstract
We study best arm identification in a restless multi-armed bandit setting with finitely many arms. The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space. The state transitions in each arm are captured by an ergodic transition probability matrix (TPM) that is a member of a single-parameter exponential family of TPMs. The real-valued parameters of the arm TPMs are unknown and belong to a given space. Given a function $f$ defined on the common state space of the arms, the goal is to identify the best arm -- the arm with the largest average value of $f$ evaluated under the arm's stationary distribution -- with the fewest number of samples, subject to an upper bound on the decision's error probability (i.e., the fixed-confidence regime). A lower bound on the growth rate of the expected stopping time is established in the asymptote of a vanishing error probability. Furthermore, a policy for best arm identification is proposed, and its expected stopping time is proved to have an asymptotic growth rate that matches the lower bound. It is demonstrated that tracking the long-term behavior of a certain Markov decision process and its state-action visitation proportions are the key ingredients in analyzing the converse and achievability bounds. It is shown that under every policy, the state-action visitation proportions satisfy a specific approximate flow conservation constraint and that these proportions match the optimal proportions dictated by the lower bound under any asymptotically optimal policy. The prior studies on best arm identification in restless bandits focus on independent observations from the arms, rested Markov arms, and restless Markov arms with known arm TPMs. In contrast, this work is the first to study best arm identification in restless bandits with unknown arm TPMs.

摘要
我们研究了最佳臂标识在不平静多臂带刺设定中，其中每个臂生成了一个homogeneous Markov链，这个链的状态转移是由一个不知道的臂转移概率矩阵（TPM）捕捉，这个TPM是一个单参数的 exponential family 中的一员。臂的实际参数是未知的， belong to a given space。给定一个函数 $f$ 在臂的共同状态空间上定义，我们的目标是在最少样本数下确定最佳臂，即臂的stationary distribution下的最大平均值。在fixed-confidence regime下，我们提出了一个策略，并证明其预期停止时间的增长率与下界匹配。此外，我们还证明了跟踪臂的长期行为和状态-动作访问比例是分析下界和可达性下界的关键组成部分。我们表明，对于任何策略，状态-动作访问比例满足一个特定的approximate flow conservation constraint，这些比例与最佳策略下的下界匹配。与前一些研究不同的是，本研究是在 unknown arm TPMs 下进行最佳臂标识。

Music Augmentation and Denoising For Peak-Based Audio Fingerprinting

paper_url: http://arxiv.org/abs/2310.13388
repo_url: https://github.com/deezer/musicFPaugment
paper_authors: Kamil Akesbi, Dorian Desblancs, Benjamin Martin
for: 这个论文目的是提高音频标识系统的精度和可靠性，特别是在噪音环境下。
methods: 论文提出了一种新的音频增强管道，通过模拟实际情况来加入噪音到音乐片断中，以提高音频标识系统的精度。此外，论文还提出了一种深度学习模型，用于从spectrogram中除除噪音 ком分量，以提高音频标识系统的性能。
results: 论文的实验结果表明，通过添加这些模型，可以提高常用的音频标识系统的准确率，即使在噪音环境下。

Abstract
Audio fingerprinting is a well-established solution for song identification from short recording excerpts. Popular methods rely on the extraction of sparse representations, generally spectral peaks, and have proven to be accurate, fast, and scalable to large collections. However, real-world applications of audio identification often happen in noisy environments, which can cause these systems to fail. In this work, we tackle this problem by introducing and releasing a new audio augmentation pipeline that adds noise to music snippets in a realistic way, by stochastically mimicking real-world scenarios. We then propose and release a deep learning model that removes noisy components from spectrograms in order to improve peak-based fingerprinting systems' accuracy. We show that the addition of our model improves the identification performance of commonly used audio fingerprinting systems, even under noisy conditions.

摘要
音频指纹技术是已有的解决方案，可以从短音频片断中识别歌曲。流行的方法通常基于稀疏表示EXTRACTION，通常是 spectral peaks，并已经证明准确、快速和可扩展到大量收藏。然而，实际应用中的音频识别经常发生在噪音环境中，这会使这些系统失败。在这项工作中，我们解决这个问题，通过引入和发布一个新的音频增强管道，该管道在真实的场景下做出随机尝试，以模拟实际中的噪音。然后，我们提议并发布一种深度学习模型，可以从spectrogram中除掉噪声组件，以提高基于peak的音频指纹系统的准确性。我们表明，在噪音环境下，加入我们的模型可以提高通常使用的音频指纹系统的识别性能。

Assumption violations in causal discovery and the robustness of score matching

paper_url: http://arxiv.org/abs/2310.13387
repo_url: None
paper_authors: Francesco Montagna, Atalanti A. Mastakouri, Elias Eulig, Nicoletta Noceti, Lorenzo Rosasco, Dominik Janzing, Bryon Aragam, Francesco Locatello
for: This paper aims to evaluate the empirical performance of recent causal discovery methods on observational i.i.d. data with different background conditions, allowing for violations of the critical assumptions required by each selected approach.
methods: The paper uses score matching-based methods to recover the causal structure, which demonstrate surprising performance in the false positive and false negative rate of the inferred graph in challenging scenarios.
results: The paper provides theoretical insights into the performance of these methods and is the first effort to benchmark the stability of causal discovery algorithms with respect to the values of their hyperparameters.

Abstract
When domain knowledge is limited and experimentation is restricted by ethical, financial, or time constraints, practitioners turn to observational causal discovery methods to recover the causal structure, exploiting the statistical properties of their data. Because causal discovery without further assumptions is an ill-posed problem, each algorithm comes with its own set of usually untestable assumptions, some of which are hard to meet in real datasets. Motivated by these considerations, this paper extensively benchmarks the empirical performance of recent causal discovery methods on observational i.i.d. data generated under different background conditions, allowing for violations of the critical assumptions required by each selected approach. Our experimental findings show that score matching-based methods demonstrate surprising performance in the false positive and false negative rate of the inferred graph in these challenging scenarios, and we provide theoretical insights into their performance. This work is also the first effort to benchmark the stability of causal discovery algorithms with respect to the values of their hyperparameters. Finally, we hope this paper will set a new standard for the evaluation of causal discovery methods and can serve as an accessible entry point for practitioners interested in the field, highlighting the empirical implications of different algorithm choices.

摘要
当域知识有限，实验受到伦理、金融或时间约束，实践者会选择观察型 causal 发现方法来恢复 causal 结构，利用数据的统计特性。因为无其他假设的 causal 发现是一个不定 пробле space，每种算法都来自其自己的集合不测试的假设，其中一些在实际数据中很难满足。驱动这些考虑，这篇论文对 recent causal discovery 方法进行了广泛的 benchmarking，使用了不同背景条件下的 observational i.i.d. 数据，允许假设满足不同方法的必要条件。我们的实验结果表明，使用 score matching 基于方法可以在这些复杂的场景中具有高度的 false positive 和 false negative 率，我们还提供了理论分析。这项工作还是首次对 causal discovery 算法的稳定性进行了 benchmarking，并且希望这篇论文可以设置一个新的标准 для causal discovery 方法的评估，并且可以为参与这个领域的实践者提供可访问的入门点，强调不同算法选择的 empirical 影响。

Salted Inference: Enhancing Privacy while Maintaining Efficiency of Split Inference in Mobile Computing

paper_url: http://arxiv.org/abs/2310.13384
repo_url: https://github.com/dr-bell/salted-dnns
paper_authors: Mohammad Malekzadeh, Fahim Kawsar
for: 这篇研究旨在提供一种控制深度神经网络（DNN）输出semantic interpretations的方法，以保持精度和效率，并满足在执行时间点网络的资料隐私和计算效率要求。
methods: 本研究使用了“Salted DNNs”方法，让客户在推断时控制DNN输出的semantic interpretations，而不会影响精度和效率。在推断过程中，使用了一个叫做“salted layer”的层，可以让客户控制DNN输出的semantic interpretations。
results: 实验结果显示，使用Salted DNNs方法可以保持高精度和效率，尤其是在将 salted layer 放在早期部分时。在实验中使用了两个不同的数据集，包括图像数据和感应器数据，并与标准DNN相比，Salted DNNs可以实现高精度和效率。

Abstract
Split inference partitions a deep neural network (DNN) to run the early part at the edge and the later part in the cloud. This meets two key requirements for on-device machine learning: input privacy and compute efficiency. Still, an open question in split inference is output privacy, given that the output of a DNN is visible to the cloud. While encrypted computing can protect output privacy, it mandates extensive computation and communication resources. In this paper, we introduce "Salted DNNs": a novel method that lets clients control the semantic interpretation of DNN output at inference time while maintaining accuracy and efficiency very close to that of a standard DNN. Experimental evaluations conducted on both image and sensor data show that Salted DNNs achieve classification accuracy very close to standard DNNs, particularly when the salted layer is positioned within the early part to meet the requirements of split inference. Our method is general and can be applied to various DNNs. We open-source our code and results, as a benchmark for future studies.

摘要
分配推理分区深度神经网络（DNN），让早期部分在边缘上运行，后期部分在云上运行，满足了边缘机器学习中两个关键需求：输入隐私和计算效率。然而，在分配推理中仍存在输出隐私问题，因为深度神经网络的输出可见于云端。尝试使用加密计算来保护输出隐私，但是这需要较为广泛的计算和通信资源。在本文中，我们介绍了“孤立的深度神经网络”（Salted DNNs）：一种新的方法，允许客户控制推理时神经网络输出的Semantic解释，保持精度和效率与标准神经网络几乎相同。我们在图像和感知数据上进行了实验评估，结果表明，孤立神经网络在早期部分中位置时，可以保持分配推理中的精度和效率，并且与标准神经网络几乎相同的精度。我们的方法是通用的，可以应用于多种深度神经网络。我们将代码和结果公开，作为未来研究的标准准。

Accelerated sparse Kernel Spectral Clustering for large scale data clustering problems

paper_url: http://arxiv.org/abs/2310.13381
repo_url: None
paper_authors: Mihaly Novak, Rocco Langone, Carlos Alzate, Johan Suykens
for: 这个论文的目的是提出一种改进的稀疏多方幂 Spectral Clustering（KSC）算法，以解决大规模数据分类问题。
methods: 这个算法使用了Weighted Kernel Principal Component Analysis（KPCA）和锐点 Support Vector Machine（SVM）框架，并使用了缺失Cholesky半阵（ICD）来实现稀疏性。
results: 这个改进算法可以大幅提高计算效率，从而使得 clustering 问题可以在秒钟内解决，而不是之前需要几个小时。此外，稀疏性也得到了显著提高，导致模型的表示更加紧凑，计算效率和描述力都得到了进一步提高。

Abstract
An improved version of the sparse multiway kernel spectral clustering (KSC) is presented in this brief. The original algorithm is derived from weighted kernel principal component (KPCA) analysis formulated within the primal-dual least-squares support vector machine (LS-SVM) framework. Sparsity is achieved then by the combination of the incomplete Cholesky decomposition (ICD) based low rank approximation of the kernel matrix with the so called reduced set method. The original ICD based sparse KSC algorithm was reported to be computationally far too demanding, especially when applied on large scale data clustering problems that actually it was designed for, which has prevented to gain more than simply theoretical relevance so far. This is altered by the modifications reported in this brief that drastically improve the computational characteristics. Solving the alternative, symmetrized version of the computationally most demanding core eigenvalue problem eliminates the necessity of forming and SVD of large matrices during the model construction. This results in solving clustering problems now within seconds that were reported to require hours without altering the results. Furthermore, sparsity is also improved significantly, leading to more compact model representation, increasing further not only the computational efficiency but also the descriptive power. These transform the original, only theoretically relevant ICD based sparse KSC algorithm applicable for large scale practical clustering problems. Theoretical results and improvements are demonstrated by computational experiments on carefully selected synthetic data as well as on real life problems such as image segmentation.

摘要
本文提出了一种改进版的稀疑多方幂kernel spectral clustering（KSC）算法。原始算法基于权重kernel principal component analysis（KPCA）在primaldual最小二乘Support Vector Machine（LS-SVM）框架中得到，并使用 incomplete Cholesky decomposition（ICD）基于低级别approximation kernel矩阵和减少集方法实现稀疑性。原始ICD基于稀疑KSC算法的计算复杂性太高，特别是在应用大规模数据归一化问题时，这使得其在实际应用中具有较低的理论重要性。这篇文章修改了这些问题，使得计算特性得到了很大改进。解决计算中最复杂的核心径值问题的代替版本，从而消除了将大型矩阵的SVD形成和计算的需要，这使得归一化问题可以在几秒钟内解决，而不会改变结果。此外，稀疑性也得到了显著提高，导致模型表示更加紧凑，进一步提高了计算效率和描述力。这些改进使得原来只有理论意义的ICD基于稀疑KSC算法可以应用于大规模实际归一化问题。计算实验在手动制作的数据集和真实问题，如图像分割，都表明了这些改进的有效性。

Physics-Informed Graph Convolutional Networks: Towards a generalized framework for complex geometries

paper_url: http://arxiv.org/abs/2310.14948
repo_url: None
paper_authors: Marien Chenaud, José Alves, Frédéric Magoulès
for: 解决部分束着方程（PDE）问题使用深度学习模型。
methods: 使用图ael neural networks（GNN），基于传统数学方法中的织物结构和PDE问题的相似性。
results: 提出一种将传统数学方法和Physics-Informed framework相结合的方法，并在三维非整形问题上进行实验验证。

Abstract
Since the seminal work of [9] and their Physics-Informed neural networks (PINNs), many efforts have been conducted towards solving partial differential equations (PDEs) with Deep Learning models. However, some challenges remain, for instance the extension of such models to complex three-dimensional geometries, and a study on how such approaches could be combined to classical numerical solvers. In this work, we justify the use of graph neural networks for these problems, based on the similarity between these architectures and the meshes used in traditional numerical techniques for solving partial differential equations. After proving an issue with the Physics-Informed framework for complex geometries, during the computation of PDE residuals, an alternative procedure is proposed, by combining classical numerical solvers and the Physics-Informed framework. Finally, we propose an implementation of this approach, that we test on a three-dimensional problem on an irregular geometry.

摘要
自《[9]》的Physics-Informed neural networks（PINNs）之工作开始以来，有很多努力在解决 partial differential equations（PDEs）中使用深度学习模型。然而，还有一些挑战，如扩展到复杂的三维几何结构，以及如何将这些方法与传统的数学方法结合起来。在这项工作中，我们证明使用图 neural networks 是合适的，因为这些架构与传统的数学方法中使用的网格具有相似之处。在计算 PDE residuals 时发现了物理 Informed 框架的问题，然后提出了一种 alternatively 的方法，将传统的数学方法与物理 Informed 框架结合起来。最后，我们提出了实现这种方法的方式，并在一个三维问题上进行了测试。

SigFormer: Signature Transformers for Deep Hedging

paper_url: http://arxiv.org/abs/2310.13369
repo_url: https://github.com/anh-tong/sigformer
paper_authors: Anh Tong, Thanh Nguyen-Tang, Dongeun Lee, Toan Tran, Jaesik Choi
for: 这个论文的目的是提出一种新的深度学习模型，用于财务风险管理，以提高财务风险管理的精度和效率。
methods: 这个论文使用了一种新的模型，即SigFormer，它将路径签名和变换器结合起来，以处理串行数据，特别是具有异常性的数据。
results: 研究人员通过对 sintetic数据进行比较，发现SigFormer比既有方法更快地学习并且更强的抗辐射性，特别是在存在异常的资产价格数据时。此外，通过对 SP 500 指数的实际回测，也得到了积极的结果。

Abstract
Deep hedging is a promising direction in quantitative finance, incorporating models and techniques from deep learning research. While giving excellent hedging strategies, models inherently requires careful treatment in designing architectures for neural networks. To mitigate such difficulties, we introduce SigFormer, a novel deep learning model that combines the power of path signatures and transformers to handle sequential data, particularly in cases with irregularities. Path signatures effectively capture complex data patterns, while transformers provide superior sequential attention. Our proposed model is empirically compared to existing methods on synthetic data, showcasing faster learning and enhanced robustness, especially in the presence of irregular underlying price data. Additionally, we validate our model performance through a real-world backtest on hedging the SP 500 index, demonstrating positive outcomes.

摘要
深度封风是现代金融数学的一个有前途的方向，具有深度学习研究中的模型和技术。而这些模型却需要在设计神经网络架构时进行仔细的考虑，以避免一些困难。为了解决这些问题，我们介绍了SigFormer，一种新的深度学习模型，它将路径签名和转换器结合起来处理顺序数据，特别是带有不规则性的数据。路径签名有效地捕捉复杂的数据模式，而转换器则提供了更好的顺序注意力。我们对现成的方法进行了比较，并在人工数据上进行了实验，显示了更快的学习速度和更高的稳定性，特别是在存在不规则的下场价格数据时。此外，我们还验证了我们的模型在真实的SP 500指数投资中的性能，显示了正面的结果。

Dissecting Causal Biases

paper_url: http://arxiv.org/abs/2310.13364
repo_url: None
paper_authors: Rūta Binkytė, Sami Zhioua, Yassine Turki
for: 这篇论文是关于机器学习基于自动决策系统中的歧视评估问题。
methods: 该论文使用了 causality 理论来形式地定义和分析歧视的各种来源，包括干扰、选择、测量和互动。
results: 该论文提供了每种歧视来源的关闭式表达式，以便分析每种来源的行为，特别是哪些情况下它们缺失，以及哪些情况下它们最大化。

Abstract
Accurately measuring discrimination in machine learning-based automated decision systems is required to address the vital issue of fairness between subpopulations and/or individuals. Any bias in measuring discrimination can lead to either amplification or underestimation of the true value of discrimination. This paper focuses on a class of bias originating in the way training data is generated and/or collected. We call such class causal biases and use tools from the field of causality to formally define and analyze such biases. Four sources of bias are considered, namely, confounding, selection, measurement, and interaction. The main contribution of this paper is to provide, for each source of bias, a closed-form expression in terms of the model parameters. This makes it possible to analyze the behavior of each source of bias, in particular, in which cases they are absent and in which other cases they are maximized. We hope that the provided characterizations help the community better understand the sources of bias in machine learning applications.

摘要
需要准确测量机器学习自动决策系统中的歧视，以解决人群或个体之间的公平问题。任何偏见在测量歧视方面可能会导致扩大或下降真实的歧视程度。本文关注一种来自训练数据生成和收集的偏见类型，我们称之为 causal bias。我们使用 causality 领域中的工具来正式定义和分析这种偏见。我们考虑了四种偏见来源， namely，杂化、选择、测量和交互。本文的主要贡献是为每种偏见提供了关于模型参数的闭式表达。这使得可以分析每种偏见的行为，特别是在哪些情况下它们缺失，而在哪些情况下它们最大化。我们希望通过提供的特征化来帮助社区更好地理解机器学习应用中的偏见来源。

Learning Recurrent Models with Temporally Local Rules

paper_url: http://arxiv.org/abs/2310.13284
repo_url: None
paper_authors: Azwar Abdulsalam, Joseph G. Makin
for: 这篇论文是为了探讨如何使用生成模型处理序列数据，并且提出了一种新的方法来缓解计算成本问题。
methods: 这篇论文使用了一种新的方法，即要求生成模型学习当前和前一个时刻的共同分布，而不仅仅是转移概率。
results: 研究人员在实验中发现，使用这种新方法可以学习一些通常需要回传计算的数据特征，并且可以在不同的架构下实现这种效果。

Abstract
Fitting generative models to sequential data typically involves two recursive computations through time, one forward and one backward. The latter could be a computation of the loss gradient (as in backpropagation through time), or an inference algorithm (as in the RTS/Kalman smoother). The backward pass in particular is computationally expensive (since it is inherently serial and cannot exploit GPUs), and difficult to map onto biological processes. Work-arounds have been proposed; here we explore a very different one: requiring the generative model to learn the joint distribution over current and previous states, rather than merely the transition probabilities. We show on toy datasets that different architectures employing this principle can learn aspects of the data typically requiring the backward pass.

摘要
通常情况下，生成模型会将序列数据适应的方法是通过时间进行两个递归计算，一个是前进计算，另一个是后退计算。后退计算通常是计算损失函数对数（如在时间层次propagation中），或者推理算法（如在RTS/Kalman滤波器中）。后退计算尤其是计算成本高（因为它是串行的，无法利用GPU），而且Difficult to map onto biological processes。工作around proposal; here we explore a very different one: requiring the generative model to learn the joint distribution over current and previous states, rather than merely the transition probabilities. We show on toy datasets that different architectures employing this principle can learn aspects of the data typically requiring the backward pass.Note: The translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. Traditional Chinese is used in Taiwan, Hong Kong, and other countries.

FedLoRA: Model-Heterogeneous Personalized Federated Learning with LoRA Tuning

paper_url: http://arxiv.org/abs/2310.13283
repo_url: None
paper_authors: Liping Yi, Han Yu, Gang Wang, Xiaoguang Liu
for: 这个论文的目的是提出一种 computation-和 communication-efficient 的模型多样化个性化联合学习框架（FedLoRA），用于解决模型多样性、系统多样性和统计多样性等挑战。
methods: 这个框架采用了一种基于 LoRA 调整的迭代式全局-本地知识交换方法，通过在每个客户端上加载一个小型同构器，使得客户端可以训练多样化的本地模型而无需高度 computation 和 communication overhead。
results: 在两个实际数据集上进行了广泛的实验，结果显示，FedLoRA 可以在测试精度、计算开销和通信成本三个方面超过六个基准方法，其中测试精度高于最佳方法1.35%，计算开销下降11.81倍，通信成本减少7.41倍。

Abstract
Federated learning (FL) is an emerging machine learning paradigm in which a central server coordinates multiple participants (a.k.a. FL clients) to train a model collaboratively on decentralized data with privacy protection. This paradigm constrains that all clients have to train models with the same structures (homogeneous). In practice, FL often faces statistical heterogeneity, system heterogeneity and model heterogeneity challenges. These challenging issues inspire the field of Model-Heterogeneous Personalized Federated Learning (MHPFL) which aims to train a personalized and heterogeneous local model for each FL client. Existing MHPFL approaches cannot achieve satisfactory model performance, acceptable computational overhead and efficient communication simultaneously. To bridge this gap, we propose a novel computation- and communication-efficient model-heterogeneous personalized Federated learning framework based on LoRA tuning (FedLoRA). It is designed to incorporate a homogeneous small adapter for each client's heterogeneous local model. Both models are trained following the proposed iterative training for global-local knowledge exchange. The homogeneous small local adapters are sent to the FL server to be aggregated into a global adapter. In this way, FL clients can train heterogeneous local models without incurring high computation and communication costs. We theoretically prove the non-convex convergence rate of FedLoRA. Extensive experiments on two real-world datasets demonstrate that FedLoRA outperforms six state-of-the-art baselines, beating the best approach by 1.35% in terms of test accuracy, 11.81 times computation overhead reduction and 7.41 times communication cost saving.

摘要
联合学习（FL）是一种兴起的机器学习方法，在中央服务器协调多个 particiant（称为FL客户端）共同训练模型，并在分散式数据上保护隐私。这个方法限制所有客户端都必须使用同一个结构的模型（同质）。在实践中，FL经常面临统计不一致、系统不一致和模型不一致的挑战。这些挑战驱使了联合学习专门的人类化个性化训练框架（MHPFL），旨在将各个FL客户端专门训练的本地模型变成个性化的模型。现有的MHPFL方法无法同时 дости得满意的模型性能、可接受的计算负载和有效的通信成本。为了填补这个差距，我们提出了一个新的计算和通信效率的模型专业化人类化联合学习框架，基于LoRA调整（FedLoRA）。它运用了各个客户端的同质小适配器，以实现各个客户端专门训练的本地模型，不需要高计算和通信成本。我们对FedLoRA的非对称协调率进行了理论证明。实验结果显示，FedLoRA在两个真实世界数据集上进行了比基eline的性能评估，与最佳方法相比，获得了1.35%的测试精度提升、11.81倍的计算负载削减和7.41倍的通信成本削减。

An Event based Prediction Suffix Tree

paper_url: http://arxiv.org/abs/2310.14944
repo_url: None
paper_authors: Evie Andrew, Travis Monk, André van Schaik
for: 这篇论文是为了介绍一种基于事件预测 suffixed tree（EPST）算法，该算法是基于生物体中的预测方法，可以在多个重叠模式上进行预测。
methods: 该算法使用基于事件数据的特定表示方式，即在短时间窗口内的事件子序列的一部分。它还具有可解释性、耐事件噪音、耐Dropout等多个优点。
results: 在一个 sintetic数据预测任务中，EPST算法可以在添加事件噪音、事件抖动和Dropout等不利条件下进行预测，并输出预测结果，可以应用于事件基于异常检测或模式识别等任务。

Abstract
This article introduces the Event based Prediction Suffix Tree (EPST), a biologically inspired, event-based prediction algorithm. The EPST learns a model online based on the statistics of an event based input and can make predictions over multiple overlapping patterns. The EPST uses a representation specific to event based data, defined as a portion of the power set of event subsequences within a short context window. It is explainable, and possesses many promising properties such as fault tolerance, resistance to event noise, as well as the capability for one-shot learning. The computational features of the EPST are examined in a synthetic data prediction task with additive event noise, event jitter, and dropout. The resulting algorithm outputs predicted projections for the near term future of the signal, which may be applied to tasks such as event based anomaly detection or pattern recognition.

摘要
Translated into Simplified Chinese:这篇文章介绍了基于事件的预测 sufix tree (EPST) 算法，这是基于生物体的革新性预测算法。EPST 在线学习基于事件输入的统计特征，并可以预测多个 overlap 的模式。EPST 使用专门为事件数据定义的表示方式，即在短上下文窗口内的事件子序列的一部分。它可以解释，并具有许多有前途的特性，如fault tolerance、事件噪音抗性以及一次学习能力。EPST 的计算特点在一个 sintetic 数据预测任务中被检查，包括 additive 事件噪音、事件摆动和 dropout。结果输出的预测投影可以用于事件基于异常检测或模式识别等任务。

DIG-MILP: a Deep Instance Generator for Mixed-Integer Linear Programming with Feasibility Guarantee

paper_url: http://arxiv.org/abs/2310.13261
repo_url: https://github.com/graph-com/dig_milp
paper_authors: Haoyu Wang, Jialin Liu, Xiaohan Chen, Xinshang Wang, Pan Li, Wotao Yin
for: 实现高效的整数线性Programming（MILP）解决方案，并且提供广泛、多样化和代表性的数据来支持算法开发、解决方案优化和机器学习模型训练。
methods: 基于Variational auto-encoder（VAE）的深度生成框架，将高度限制的MILP数据中的深度结构特征提取出来，生成与目标数据相似的实例。
results: 通过两个下游任务（S1）资料共享和（S2）资料增强，显示DIG-MILP可以生成高品质且新的MILP实例，并且 garantte Correctness和可行性。

Abstract
Mixed-integer linear programming (MILP) stands as a notable NP-hard problem pivotal to numerous crucial industrial applications. The development of effective algorithms, the tuning of solvers, and the training of machine learning models for MILP resolution all hinge on access to extensive, diverse, and representative data. Yet compared to the abundant naturally occurring data in image and text realms, MILP is markedly data deficient, underscoring the vital role of synthetic MILP generation. We present DIG-MILP, a deep generative framework based on variational auto-encoder (VAE), adept at extracting deep-level structural features from highly limited MILP data and producing instances that closely mirror the target data. Notably, by leveraging the MILP duality, DIG-MILP guarantees a correct and complete generation space as well as ensures the boundedness and feasibility of the generated instances. Our empirical study highlights the novelty and quality of the instances generated by DIG-MILP through two distinct downstream tasks: (S1) Data sharing, where solver solution times correlate highly positive between original and DIG-MILP-generated instances, allowing data sharing for solver tuning without publishing the original data; (S2) Data Augmentation, wherein the DIG-MILP-generated instances bolster the generalization performance of machine learning models tasked with resolving MILP problems.

摘要

Knowledge Graph Context-Enhanced Diversified Recommendation

paper_url: http://arxiv.org/abs/2310.13253
repo_url: https://github.com/anonym844/kg-diverse
paper_authors: Xiaolong Liu, Liangwei Yang, Zhiwei Liu, Mingdai Yang, Chen Wang, Hao Peng, Philip S. Yu
For: The paper aims to enhance recommendation diversity within the context of knowledge graphs (KG) by incorporating contextual information and preserving contextual integrity.* Methods: The paper introduces an innovative metric called Entity Coverage and Relation Coverage to quantify diversity within the KG domain. It also proposes a novel technique called Diversified Embedding Learning (DEL) to formulate user representations that possess an innate awareness of diversity. Additionally, the paper introduces a new technique named Conditional Alignment and Uniformity (CAU) to encode KG item embeddings while preserving contextual integrity.* Results: The paper’s contributions signify a substantial stride towards augmenting the panorama of recommendation diversity within the realm of KG-informed RecSys paradigms.

Abstract
The field of Recommender Systems (RecSys) has been extensively studied to enhance accuracy by leveraging users' historical interactions. Nonetheless, this persistent pursuit of accuracy frequently engenders diminished diversity, culminating in the well-recognized "echo chamber" phenomenon. Diversified RecSys has emerged as a countermeasure, placing diversity on par with accuracy and garnering noteworthy attention from academic circles and industry practitioners. This research explores the realm of diversified RecSys within the intricate context of knowledge graphs (KG). These KGs act as repositories of interconnected information concerning entities and items, offering a propitious avenue to amplify recommendation diversity through the incorporation of insightful contextual information. Our contributions include introducing an innovative metric, Entity Coverage, and Relation Coverage, which effectively quantifies diversity within the KG domain. Additionally, we introduce the Diversified Embedding Learning (DEL) module, meticulously designed to formulate user representations that possess an innate awareness of diversity. In tandem with this, we introduce a novel technique named Conditional Alignment and Uniformity (CAU). It adeptly encodes KG item embeddings while preserving contextual integrity. Collectively, our contributions signify a substantial stride towards augmenting the panorama of recommendation diversity within the realm of KG-informed RecSys paradigms.

摘要
领域内的个人推荐系统（RecSys）已经广泛研究，以提高准确性。然而，这种不断追求准确性的努力经常导致多样性减少，最终形成了“喷 voz”现象。为了解决这问题，多样化RecSys已经出现了，将多样性与准确性平起肩。这篇研究在知识图（KG）的厚重的背景下，探讨了多样化RecSys的领域。KG作为实体和物品之间的连接，可以为推荐多样性的提高提供一条优美的路径。我们的贡献包括引入了一种创新的度量，Entity Coverage和Relation Coverage，可以有效量度KG中多样性。此外，我们还提出了多样化嵌入学习（DEL）模块，仔细设计用于构建具有多样性感的用户表示。同时，我们还提出了一种新的技术 named Conditional Alignment and Uniformity（CAU），可以充分利用KG中项目的嵌入，同时保持上下文完整性。总之，我们的贡献代表了对KG-驱动的RecSys多样化领域的一大突破。

Transparency challenges in policy evaluation with causal machine learning – improving usability and accountability

paper_url: http://arxiv.org/abs/2310.13240
repo_url: None
paper_authors: Patrick Rehill, Nicholas Biddle
for: 这篇论文旨在探讨 causal machine learning 工具在实际政策评估任务中的应用，以及这些方法在政策评估中的透明性问题。
methods: 本论文使用了 causal forest 模型来估计 conditional average treatment effects，并explores 如何使用可解释 AI 工具和简化模型来解决透明性问题。
results: 研究发现，现有的预测模型理解工具对 causal machine learning 模型不够有效，而简化模型以提高可解释性会导致预测误差增加。

Abstract
Causal machine learning tools are beginning to see use in real-world policy evaluation tasks to flexibly estimate treatment effects. One issue with these methods is that the machine learning models used are generally black boxes, i.e., there is no globally interpretable way to understand how a model makes estimates. This is a clear problem in policy evaluation applications, particularly in government, because it is difficult to understand whether such models are functioning in ways that are fair, based on the correct interpretation of evidence and transparent enough to allow for accountability if things go wrong. However, there has been little discussion of transparency problems in the causal machine learning literature and how these might be overcome. This paper explores why transparency issues are a problem for causal machine learning in public policy evaluation applications and considers ways these problems might be addressed through explainable AI tools and by simplifying models in line with interpretable AI principles. It then applies these ideas to a case-study using a causal forest model to estimate conditional average treatment effects for a hypothetical change in the school leaving age in Australia. It shows that existing tools for understanding black-box predictive models are poorly suited to causal machine learning and that simplifying the model to make it interpretable leads to an unacceptable increase in error (in this application). It concludes that new tools are needed to properly understand causal machine learning models and the algorithms that fit them.

摘要
causal machine learning工具正在实际政策评估任务中得到应用，以便灵活地估计治理效果。 however，这些方法的机器学习模型通常是黑盒子，即没有全面可解释的方式来理解模型如何生成估计。这是政策评估应用中的一个明显问题，特别是在政府中，因为困难以理解模型是否正常工作，基于正确的证据解释和透明度足够高以便负责任。然而，在 causal machine learning文献中对透明性问题的讨论相对少。这篇论文探讨了 causal machine learning在公共政策评估应用中的透明性问题，并考虑了如何通过可解释 AI 工具和简化模型来解决这些问题。然后，它应用这些想法到一个 случа study中，使用 causal forest 模型来估计 conditional average treatment effects для一个假设的澳大利亚学生离校年龄的变化。结果显示，现有的理解黑盒predictive模型的工具不适用于 causal machine learning，并且简化模型以使其可解释会导致误差增加（在这个应用中）。因此，这篇论文结论认为，需要新的工具来全面理解 causal machine learning模型和这些模型的算法。

Training A Semantic Communication System with Federated Learning

paper_url: http://arxiv.org/abs/2310.13236
repo_url: None
paper_authors: Loc X. Nguyen, Huy Q. Le, Ye Lin Tun, Pyae Sone Aung, Yan Kyaw Tun, Zhu Han, Choong Seon Hong
for: 本研究旨在提高 semantic communication 系统的性能，使其能够更好地处理数据匮乏问题。
methods: 本研究使用 federated learning (FL) Setting，利用用户数据进行学习，同时保护用户隐私和安全。另外，我们提出了一种减少每次全局轮次中传输信息量的机制，以降低网络负担。
results: 我们的提议技术与基准方法进行比较，实验结果表明其效果明显更高。

Abstract
Semantic communication has emerged as a pillar for the next generation of communication systems due to its capabilities in alleviating data redundancy. Most semantic communication systems are built using advanced deep learning models whose performance heavily depends on data availability. These studies assume that an abundance of training data is available, which is unrealistic. In practice, data is mainly created on the user side. Due to privacy and security concerns, the transmission of data is restricted, which is necessary for conventional centralized training schemes. To address this challenge, we explore semantic communication in federated learning (FL) setting that utilizes user data without leaking privacy. Additionally, we design our system to tackle the communication overhead by reducing the quantity of information delivered in each global round. In this way, we can save significant bandwidth for resource-limited devices and reduce overall network traffic. Finally, we propose a mechanism to aggregate the global model from the clients, called FedLol. Extensive simulation results demonstrate the efficacy of our proposed technique compared to baseline methods.

摘要

Equivariant Transformer is all you need

paper_url: http://arxiv.org/abs/2310.13222
repo_url: None
paper_authors: Akio Tomiya, Yuki Nagai
for: 这篇论文是用于推动计算物理学的机器学习和深度学习的应用。
methods: 论文使用了对称Equivariant Attention来改进Self-Learning Monte Carlo（SLMC）方法。
results: 实验结果表明，对称Equivariant Attention可以减少线性模型的接受率问题，并且可以观察到大型语言模型的扩展性。

Abstract
Machine learning, deep learning, has been accelerating computational physics, which has been used to simulate systems on a lattice. Equivariance is essential to simulate a physical system because it imposes a strong induction bias for the probability distribution described by a machine learning model. This reduces the risk of erroneous extrapolation that deviates from data symmetries and physical laws. However, imposing symmetry on the model sometimes occur a poor acceptance rate in self-learning Monte-Carlo (SLMC). On the other hand, Attention used in Transformers like GPT realizes a large model capacity. We introduce symmetry equivariant attention to SLMC. To evaluate our architecture, we apply it to our proposed new architecture on a spin-fermion model on a two-dimensional lattice. We find that it overcomes poor acceptance rates for linear models and observe the scaling law of the acceptance rate as in the large language models with Transformers.

摘要
机器学习、深度学习已经加速计算物理学，通过在格子上模拟系统。对称是计算物理系统的关键因素，因为它对机器学习模型中描述的概率分布强加假设。这可以降低模型外泌的风险，避免因数学 симметрии和物理法律而导致的误差推断。然而，在SLMC中强制实现对称 occasionally leads to poor acceptance rates.在这个场景下，我们引入对称启发注意力。我们采用这种新架构应用于我们的提议的二维格子上的螺旋- fermion 模型。我们发现它可以超越线性模型的 Acceptance 率问题，并观察到大型语言模型中的启发注意力的扩展律。

In-context Learning with Transformer Is Really Equivalent to a Contrastive Learning Pattern

paper_url: http://arxiv.org/abs/2310.13220
repo_url: None
paper_authors: Ruifeng Ren, Yong Liu
for: 本研究旨在理解Transformers基于强化学习的协同学习（ICL）机制。
methods: 本研究使用权重方法 establishment kernel方法来解释ICL的推理过程，并分析了在contrastive learning pattern下的自注意力层可能的改进。
results: 本研究表明，ICL可以视为一种梯度下降过程，并且通过对contrastive learning pattern进行分析，可以提出可能的自注意力层改进。

Abstract
Pre-trained large language models based on Transformers have demonstrated amazing in-context learning (ICL) abilities. Given several demonstration examples, the models can implement new tasks without any parameter updates. However, it is still an open question to understand the mechanism of ICL. In this paper, we interpret the inference process of ICL as a gradient descent process in a contrastive learning pattern. Firstly, leveraging kernel methods, we establish the relationship between gradient descent and self-attention mechanism under generally used softmax attention setting instead of linear attention setting. Then, we analyze the corresponding gradient descent process of ICL from the perspective of contrastive learning without negative samples and discuss possible improvements of this contrastive learning pattern, based on which the self-attention layer can be further modified. Finally, we design experiments to support our opinions. To the best of our knowledge, our work is the first to provide the understanding of ICL from the perspective of contrastive learning and has the potential to facilitate future model design by referring to related works on contrastive learning.

摘要
<>将文本翻译成简化中文。<>基于 transformer 的大语言模型已经表现出很好的上下文学习（ICL）能力。给定一些示例，模型可以实现新任务无需参数更新。然而，我们还没有很好地理解ICL的机制。在这篇论文中，我们将推理出ICL的推理过程为一个梯度下降过程，并且在通常使用软max注意力设置下，使用kernel方法来建立梯度下降和自注意力机制之间的关系。然后，我们分析ICL的相应梯度下降过程从对照学习的角度，不包括负样本，并讨论可能改进这种对照学习模式的方法。最后，我们设计了实验来支持我们的观点。根据我们所知，我们的工作是首次从对照学习角度理解ICL，并且具有可能引导未来模型设计的潜在优势。