cs.CL - 2023-10-13

SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

  • paper_url: http://arxiv.org/abs/2310.09424
  • repo_url: https://github.com/NVIDIA/NeMo
  • paper_authors: Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg
  • for: This work proposes a novel Speech-Augmented Language Model (SALM) with multitask and in-context learning capabilities.
  • methods: SALM comprises a frozen text LLM, an audio encoder, a modality adapter module, and LoRA layers to accommodate speech input and the associated task instructions.
  • results: SALM matches task-specific Conformer baselines on ASR and AST while exhibiting zero-shot in-context learning, demonstrated through keyword boosting for both tasks. A speech supervised in-context training method is further proposed to bridge the gap between LLM training and downstream speech tasks, which further improves the in-context learning ability of speech-to-text models.
    Abstract We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a modality adapter module, and LoRA layers to accommodate speech input and associated task instructions. The unified SALM not only achieves performance on par with task-specific Conformer baselines for Automatic Speech Recognition (ASR) and Speech Translation (AST), but also exhibits zero-shot in-context learning capabilities, demonstrated through a keyword-boosting task for ASR and AST. Moreover, speech supervised in-context training is proposed to bridge the gap between LLM training and downstream speech tasks, which further boosts the in-context learning ability of speech-to-text models. The proposed model is open-sourced via the NeMo toolkit.
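As a rough illustration of the architecture named in the abstract (frozen text LLM, audio encoder, modality adapter, LoRA layers), the PyTorch sketch below projects audio-encoder frames into the LLM embedding space and prepends them to instruction embeddings. The module names, dimensions, and the simple linear adapter are assumptions for illustration only; the open-source NeMo implementation should be consulted for the actual design.

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Projects audio-encoder frames into the LLM embedding space (hypothetical design)."""
    def __init__(self, audio_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(audio_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim))

    def forward(self, audio_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(audio_feats)                 # (batch, frames, llm_dim)

class SpeechAugmentedLM(nn.Module):
    """Frozen text LLM + audio encoder + adapter; only the adapter (and, per the paper, LoRA) trains."""
    def __init__(self, llm: nn.Module, audio_encoder: nn.Module, audio_dim: int, llm_dim: int):
        super().__init__()
        self.llm, self.audio_encoder = llm, audio_encoder
        for p in self.llm.parameters():               # keep the text LLM frozen
            p.requires_grad = False
        self.adapter = ModalityAdapter(audio_dim, llm_dim)

    def forward(self, audio: torch.Tensor, instruction_embeds: torch.Tensor):
        speech_embeds = self.adapter(self.audio_encoder(audio))
        # Prepend projected speech frames to the instruction embeddings and let the LLM decode.
        inputs = torch.cat([speech_embeds, instruction_embeds], dim=1)
        return self.llm(inputs_embeds=inputs)

if __name__ == "__main__":
    # Stand-ins only to check shapes; in practice `llm` would be a causal LM and
    # `audio_encoder` a pretrained speech encoder (e.g. a Conformer).
    class DummyLLM(nn.Module):
        def forward(self, inputs_embeds):
            return inputs_embeds.mean(dim=1)
    model = SpeechAugmentedLM(DummyLLM(), nn.Linear(80, 256), audio_dim=256, llm_dim=512)
    print(model(torch.randn(2, 100, 80), torch.randn(2, 16, 512)).shape)  # torch.Size([2, 512])
```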

A Computational Approach to Style in American Poetry

  • paper_url: http://arxiv.org/abs/2310.09357
  • repo_url: None
  • paper_authors: David M. Kaplan, David M. Blei
  • for: This paper develops a quantitative method to assess the style of American poems and to visualize a collection of poems in relation to one another.
  • methods: Qualitative poetry criticism guides the development of metrics that analyze various orthographic, syntactic, and phonemic features of a poem.
  • results: The method extracts comprehensive stylistic information from a poem's multi-layered latent structure and computes distances between poems in this space, with visualizations providing ready access to the analytical components. Tested on several collections of poetry, it delineates poetic style better than traditional word-occurrence features, with potential applications to literary scholarship, to studying readers' intuitive responses to poetry, and to recommending poems based on a reader's favorites.
    Abstract We develop a quantitative method to assess the style of American poems and to visualize a collection of poems in relation to one another. Qualitative poetry criticism helped guide our development of metrics that analyze various orthographic, syntactic, and phonemic features. These features are used to discover comprehensive stylistic information from a poem's multi-layered latent structure, and to compute distances between poems in this space. Visualizations provide ready access to the analytical components. We demonstrate our method on several collections of poetry, showing that it better delineates poetry style than the traditional word-occurrence features that are used in typical text analysis algorithms. Our method has potential applications to academic research of texts, to research of the intuitive personal response to poetry, and to making recommendations to readers based on their favorite poems.
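A toy sketch of the underlying idea: map each poem to a vector of surface features and measure distances between poems in that space. The features below are illustrative placeholders, not the paper's orthographic, syntactic, and phonemic metrics.

```python
import numpy as np

def style_vector(poem: str) -> np.ndarray:
    """Toy surface features; the paper's actual metrics are far richer."""
    lines = [l for l in poem.splitlines() if l.strip()] or [poem]
    words = poem.split()
    return np.array([
        np.mean([len(l.split()) for l in lines]),            # mean words per line
        np.mean([len(w) for w in words]),                     # mean word length
        sum(w.endswith("ing") for w in words) / max(len(words), 1),  # crude participle rate
        poem.count(",") / max(len(words), 1),                 # punctuation density
    ])

def style_distance(a: str, b: str) -> float:
    return float(np.linalg.norm(style_vector(a) - style_vector(b)))  # distance in style space

a = "Two roads diverged in a yellow wood,\nAnd sorry I could not travel both"
b = "Because I could not stop for Death,\nHe kindly stopped for me"
print(style_distance(a, b))
```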

User Inference Attacks on Large Language Models

  • paper_url: http://arxiv.org/abs/2310.09266
  • repo_url: None
  • paper_authors: Nikhil Kandpal, Krishna Pillutla, Alina Oprea, Peter Kairouz, Christopher A. Choquette-Choo, Zheng Xu
  • for: This paper studies the privacy implications of fine-tuning large language models (LLMs) on user data.
  • methods: The authors define a threat model called user inference, in which an attacker infers whether a user's data was used for fine-tuning, and implement attacks that require only a small set of samples from the user (possibly different from the training samples) and black-box access to the fine-tuned LLM.
  • results: LLMs are susceptible to user inference across a variety of fine-tuning datasets, at times with near-perfect attack success rates; outlier users (those whose data distributions differ markedly from other users) and users who contribute large quantities of data are the most vulnerable. Interventions in the training algorithm such as batch or per-example gradient clipping and early stopping fail to prevent user inference, whereas limiting the number of fine-tuning samples from a single user reduces attack effectiveness, albeit at the cost of reducing the total amount of fine-tuning data.
    Abstract Fine-tuning is a common and effective method for tailoring large language models (LLMs) to specialized tasks and applications. In this paper, we study the privacy implications of fine-tuning LLMs on user data. To this end, we define a realistic threat model, called user inference, wherein an attacker infers whether or not a user's data was used for fine-tuning. We implement attacks for this threat model that require only a small set of samples from a user (possibly different from the samples used for training) and black-box access to the fine-tuned LLM. We find that LLMs are susceptible to user inference attacks across a variety of fine-tuning datasets, at times with near perfect attack success rates. Further, we investigate which properties make users vulnerable to user inference, finding that outlier users (i.e. those with data distributions sufficiently different from other users) and users who contribute large quantities of data are most susceptible to attack. Finally, we explore several heuristics for mitigating privacy attacks. We find that interventions in the training algorithm, such as batch or per-example gradient clipping and early stopping fail to prevent user inference. However, limiting the number of fine-tuning samples from a single user can reduce attack effectiveness, albeit at the cost of reducing the total amount of fine-tuning data.
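The attack can be thought of as aggregating a likelihood-ratio statistic over the attacker's handful of user samples. The sketch below assumes per-sample log-likelihoods under the fine-tuned model and under a reference model are already computed; the aggregation and thresholding shown are a simplification of the paper's procedure, and the numbers are made up.

```python
import numpy as np

def user_inference_score(ft_loglik: np.ndarray, ref_loglik: np.ndarray) -> float:
    """Mean log-likelihood ratio of a user's samples: fine-tuned vs. reference model.
    Larger values suggest the user's data was in the fine-tuning set."""
    return float(np.mean(ft_loglik - ref_loglik))

def attack_decision(score: float, threshold: float) -> bool:
    # The threshold would be calibrated on held-out users in practice.
    return score > threshold

# Example: three samples from one user (illustrative values only).
score = user_inference_score(np.array([-42.1, -35.6, -50.3]), np.array([-48.0, -39.2, -55.9]))
print(score, attack_decision(score, threshold=2.0))
```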

PromptRE: Weakly-Supervised Document-Level Relation Extraction via Prompting-Based Data Programming

  • paper_url: http://arxiv.org/abs/2310.09265
  • repo_url: None
  • paper_authors: Chufan Gao, Xulin Fan, Jimeng Sun, Xuan Wang
  • for: The paper proposes a new weakly-supervised method for document-level relation extraction, addressing the time and labor costs of the human annotation that traditional approaches rely on.
  • methods: PromptRE combines prompting-based techniques with data programming, and incorporates the label distribution and entity types as prior knowledge to improve performance.
  • results: Experiments on the ReDocRED benchmark show that PromptRE outperforms baseline approaches and effectively handles the "no relation" problem.
    Abstract Relation extraction aims to classify the relationships between two entities into pre-defined categories. While previous research has mainly focused on sentence-level relation extraction, recent studies have expanded the scope to document-level relation extraction. Traditional relation extraction methods heavily rely on human-annotated training data, which is time-consuming and labor-intensive. To mitigate the need for manual annotation, recent weakly-supervised approaches have been developed for sentence-level relation extraction while limited work has been done on document-level relation extraction. Weakly-supervised document-level relation extraction faces significant challenges due to an imbalanced number of "no relation" instances and the failure of directly probing pretrained large language models for document relation extraction. To address these challenges, we propose PromptRE, a novel weakly-supervised document-level relation extraction method that combines prompting-based techniques with data programming. Furthermore, PromptRE incorporates the label distribution and entity types as prior knowledge to improve the performance. By leveraging the strengths of both prompting and data programming, PromptRE achieves improved performance in relation classification and effectively handles the "no relation" problem. Experimental results on ReDocRED, a benchmark dataset for document-level relation extraction, demonstrate the superiority of PromptRE over baseline approaches.
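The data-programming side of the method can be illustrated with a toy label-aggregation step: several prompt-based weak labelers vote on an entity pair, the votes are reweighted by a prior label distribution, and abstentions fall back to "no relation". This is an illustrative simplification, not the paper's aggregation model.

```python
from collections import Counter

def aggregate_weak_labels(weak_labels, label_prior, no_relation="no_relation"):
    """Toy data-programming step: combine labels from several prompts, weighted by a
    prior label distribution; abstentions (None) fall back to 'no relation'."""
    votes = Counter(l for l in weak_labels if l is not None)
    if not votes:
        return no_relation
    scored = {label: count * label_prior.get(label, 1e-6) for label, count in votes.items()}
    return max(scored, key=scored.get)

print(aggregate_weak_labels(
    ["founded_by", None, "founded_by", "ceo_of"],
    label_prior={"founded_by": 0.02, "ceo_of": 0.01, "no_relation": 0.9},
))  # -> founded_by
```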

Political claim identification and categorization in a multilingual setting: First experiments

  • paper_url: http://arxiv.org/abs/2310.09256
  • repo_url: None
  • paper_authors: Urs Zaberer, Sebastian Padó, Gabriella Lapesa
  • for: This paper explores strategies for cross-lingual projection of political claims analysis.
  • methods: The experiments compare machine translation and multilingual embeddings for transferring claim identification and categorization across languages.
  • results: On the German DebateNet2.0 dataset, which covers the policy debate sparked by the 2015 refugee crisis, both tasks are evaluated across German, English, and French, with machine translation performing best.
    Abstract The identification and classification of political claims is an important step in the analysis of political newspaper reports; however, resources for this task are few and far between. This paper explores different strategies for the cross-lingual projection of political claims analysis. We conduct experiments on a German dataset, DebateNet2.0, covering the policy debate sparked by the 2015 refugee crisis. Our evaluation involves two tasks (claim identification and categorization), three languages (German, English, and French) and two methods (machine translation -- the best method in our experiments -- and multilingual embeddings).

Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy

  • paper_url: http://arxiv.org/abs/2310.09247
  • repo_url: https://github.com/yandex-research/text-to-img-hypernymy
  • paper_authors: Anton Baryshnikov, Max Ryabinin
  • for: This work measures and evaluates the language understanding capabilities of popular text-to-image models, specifically their understanding of hypernymy (the "is-a" relation between words).
  • methods: Two automatic metrics are designed based on the WordNet semantic hierarchy and existing image classifiers pretrained on ImageNet, enabling broad quantitative comparison of linguistic capabilities and revealing fine-grained qualitative differences, such as words unknown to a model and therefore difficult for it to draw.
  • results: A comprehensive evaluation of popular text-to-image models, including GLIDE, Latent Diffusion, and Stable Diffusion, shows how these metrics provide a better understanding of the individual strengths and weaknesses of each model.
    Abstract Text-to-image synthesis has recently attracted widespread attention due to rapidly improving quality and numerous practical applications. However, the language understanding capabilities of text-to-image models are still poorly understood, which makes it difficult to reason about prompt formulations that a given model would understand well. In this work, we measure the capability of popular text-to-image models to understand hypernymy, or the "is-a" relation between words. We design two automatic metrics based on the WordNet semantic hierarchy and existing image classifiers pretrained on ImageNet. These metrics both enable broad quantitative comparison of linguistic capabilities for text-to-image models and offer a way of finding fine-grained qualitative differences, such as words that are unknown to models and thus are difficult for them to draw. We comprehensively evaluate popular text-to-image models, including GLIDE, Latent Diffusion, and Stable Diffusion, showing how our metrics can provide a better understanding of the individual strengths and weaknesses of these models.
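The WordNet side of the evaluation reduces to checking the "is-a" relation between the class an ImageNet classifier assigns to a generated image and the concept named in the prompt. A minimal sketch of that check with NLTK's WordNet interface is shown below; image generation and classification are omitted, and the exact metric definitions remain those of the paper.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def is_hyponym_of(predicted: str, prompt_concept: str) -> bool:
    """True if any noun sense of `predicted` has `prompt_concept` in its hypernym closure,
    i.e. the classifier's output is still an instance of the prompted concept."""
    targets = set(wn.synsets(prompt_concept, pos=wn.NOUN))
    for syn in wn.synsets(predicted, pos=wn.NOUN):
        closure = set(syn.closure(lambda s: s.hypernyms()))
        if targets & (closure | {syn}):
            return True
    return False

# e.g. an image generated for "a photo of a dog" that the classifier labels "beagle"
print(is_hyponym_of("beagle", "dog"))   # True
print(is_hyponym_of("beagle", "cat"))   # False
```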

  • paper_url: http://arxiv.org/abs/2310.09241
  • repo_url: https://github.com/wuyiquan/PLJP
  • paper_authors: Yiquan Wu, Siying Zhou, Yifei Liu, Weiming Lu, Xiaozhong Liu, Yating Zhang, Changlong Sun, Fei Wu, Kun Kuang
  • for: Legal Judgment Prediction (LJP), i.e., predicting the judgment of a case from its fact description, has become an increasingly important task in legal AI.
  • methods: The authors propose a precedent-enhanced LJP framework (PLJP) that combines the strengths of large language models (LLMs) and domain-specific models: the domain models efficiently provide candidate labels and retrieve suitable precedents, and the LLM makes the final prediction with in-context comprehension of those precedents.
  • results: Experiments on a real-world dataset demonstrate the effectiveness of PLJP, and the work points to a promising direction for LLM and domain-model collaboration that can generalize to other vertical domains.
    Abstract Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI, i.e., predicting the judgment of the case in terms of case fact description. Precedents are the previous legal cases with similar facts, which are the basis for the judgment of the subsequent case in national legal systems. Thus, it is worthwhile to explore the utilization of precedents in the LJP. Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task. These can be broken down into two categories: large language models (LLMs) and domain-specific models. LLMs are capable of interpreting and generating complex natural language, while domain models are efficient in learning task-specific information. In this paper, we propose the precedent-enhanced LJP framework (PLJP), a system that leverages the strength of both LLM and domain models in the context of precedents. Specifically, the domain models are designed to provide candidate labels and find the proper precedents efficiently, and the large models will make the final prediction with an in-context precedents comprehension. Experiments on the real-world dataset demonstrate the effectiveness of our PLJP. Moreover, our work shows a promising direction for LLM and domain-model collaboration that can be generalized to other vertical domains.
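The division of labor described above can be sketched as a prompt-assembly step: the domain models supply candidate labels and retrieved precedents, and the LLM receives them as in-context material for the final prediction. The prompt wording, field names, and example below are hypothetical, not the paper's actual format.

```python
def build_pljp_prompt(fact_description: str, candidate_labels: list, precedents: list) -> str:
    """Assemble an in-context prompt from the domain models' outputs (hypothetical format)."""
    lines = ["You are a legal assistant. Predict the judgment of the case below."]
    lines.append("Candidate judgments (from the domain model): " + ", ".join(candidate_labels))
    for i, p in enumerate(precedents, 1):
        lines.append(f"Precedent {i}:\nFacts: {p['facts']}\nJudgment: {p['judgment']}")
    lines.append(f"Current case facts: {fact_description}\nJudgment:")
    return "\n\n".join(lines)

prompt = build_pljp_prompt(
    "The defendant took goods worth 3,000 yuan from a store without paying.",
    candidate_labels=["theft", "fraud"],
    precedents=[{"facts": "Defendant stole a bicycle worth 1,200 yuan.", "judgment": "theft"}],
)
print(prompt)  # this string would then be sent to the LLM for the final prediction
```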

BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts

  • paper_url: http://arxiv.org/abs/2310.09238
  • repo_url: https://github.com/Saumajit/BanglaNLP/tree/main/Task_2
  • paper_authors: Saumajit Saha, Albert Nanda
  • for: This work addresses sentiment analysis of Bangla social media posts, a low-resource language scenario, using Transformer-based architectures.
  • methods: Several Transformer models are compared, including a model already finetuned on Twitter data for sentiment analysis, along with different finetuning strategies.
  • results: Transfer learning helps the models learn better in this low-resource setting: the model previously finetuned on Twitter sentiment data performs best, obtaining a micro-F1 of 67.02% on the test set and ranking 21st in the shared task. A detailed error analysis also reveals instances where the ground-truth labels need to be revisited.
    Abstract Bangla is the 7th most widely spoken language globally, with a staggering 234 million native speakers primarily hailing from India and Bangladesh. This morphologically rich language boasts a rich literary tradition, encompassing diverse dialects and language-specific challenges. Despite its linguistic richness and history, Bangla remains categorized as a low-resource language within the natural language processing (NLP) and speech community. This paper presents our submission to Task 2 (Sentiment Analysis of Bangla Social Media Posts) of the BLP Workshop. We experiment with various Transformer-based architectures to solve this task. Our quantitative results show that transfer learning really helps in better learning of the models in this low-resource language scenario. This becomes evident when we further finetune a model which has already been finetuned on twitter data for sentiment analysis task and that finetuned model performs the best among all other models. We also perform a detailed error analysis where we find some instances where ground truth labels need to be relooked at. We obtain a micro-F1 of 67.02% on the test set and our performance in this shared task is ranked at 21 in the leaderboard.

AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems

  • paper_url: http://arxiv.org/abs/2310.09233
  • repo_url: None
  • paper_authors: Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen
  • for: This paper aims to simulate user behaviors, particularly user-item interactions in recommender systems.
  • methods: AgentCF treats both users and items as agents and develops a collaborative learning approach that optimizes the two kinds of agents together, letting them interact autonomously and then reflect on and adjust misleading simulations against real-world interaction records.
  • results: The optimized agents exhibit personalized behaviors akin to those of real-world individuals and can propagate their preferences to other agents in subsequent interactions.
    Abstract Recently, there has been an emergence of employing LLM-powered agents as believable human proxies, based on their remarkable decision-making capability. However, existing studies mainly focus on simulating human dialogue. Human non-verbal behaviors, such as item clicking in recommender systems, although implicitly exhibiting user preferences and could enhance the modeling of users, have not been deeply explored. The main reasons lie in the gap between language modeling and behavior modeling, as well as the incomprehension of LLMs about user-item relations. To address this issue, we propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering. We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together. Specifically, at each time step, we first prompt the user and item agents to interact autonomously. Then, based on the disparities between the agents' decisions and real-world interaction records, user and item agents are prompted to reflect on and adjust the misleading simulations collaboratively, thereby modeling their two-sided relations. The optimized agents can also propagate their preferences to other agents in subsequent interactions, implicitly capturing the collaborative filtering idea. Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions. The results show that these agents can demonstrate personalized behaviors akin to those of real-world individuals, sparking the development of next-generation user behavior simulation.

Automated Claim Matching with Large Language Models: Empowering Fact-Checkers in the Fight Against Misinformation

  • paper_url: http://arxiv.org/abs/2310.09223
  • repo_url: None
  • paper_authors: Eun Cheol Choi, Emilio Ferrara
  • for: Enhancing fact-checking automation, specifically the claim matching phase.
  • methods: Large Language Models (LLMs) are used to generate simulated social media posts (via GPT-4), which serve as training data for fine-tuning specialized LLMs on claim matching.
  • results: The fine-tuned LLMs rival the performance of larger pre-trained LLMs on claim matching, aligning closely with human annotations.
    Abstract In today's digital era, the rapid spread of misinformation poses threats to public well-being and societal trust. As online misinformation proliferates, manual verification by fact checkers becomes increasingly challenging. We introduce FACT-GPT (Fact-checking Augmentation with Claim matching Task-oriented Generative Pre-trained Transformer), a framework designed to automate the claim matching phase of fact-checking using Large Language Models (LLMs). This framework identifies new social media content that either supports or contradicts claims previously debunked by fact-checkers. Our approach employs GPT-4 to generate a labeled dataset consisting of simulated social media posts. This data set serves as a training ground for fine-tuning more specialized LLMs. We evaluated FACT-GPT on an extensive dataset of social media content related to public health. The results indicate that our fine-tuned LLMs rival the performance of larger pre-trained LLMs in claim matching tasks, aligning closely with human annotations. This study achieves three key milestones: it provides an automated framework for enhanced fact-checking; demonstrates the potential of LLMs to complement human expertise; offers public resources, including datasets and models, to further research and applications in the fact-checking domain.

Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration

  • paper_url: http://arxiv.org/abs/2310.09168
  • repo_url: https://github.com/fanqiwan/explore-instruct
  • paper_authors: Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, Shuming Shi
  • for: Preparing instruction-tuning data with broader applicability and better task coverage within individual domains.
  • methods: Large language models actively explore a multitude of variations around representative domain use cases via a search algorithm, yielding diversified and domain-focused instruction-tuning data.
  • results: Compared with multiple baselines, including those using domain-specific data enhancement, the approach clearly improves domain-specific instruction coverage and yields considerable gains in model performance.
    Abstract Instruction-tuning can be substantially optimized through enhanced diversity, resulting in models capable of handling a broader spectrum of tasks. However, existing data employed for such tuning often exhibit an inadequate coverage of individual domains, limiting the scope for nuanced comprehension and interactions within these areas. To address this deficiency, we propose Explore-Instruct, a novel approach to enhance the data coverage to be used in domain-specific instruction-tuning through active exploration via Large Language Models (LLMs). Built upon representative domain use cases, Explore-Instruct explores a multitude of variations or possibilities by implementing a search algorithm to obtain diversified and domain-focused instruction-tuning data. Our data-centric analysis validates the effectiveness of this proposed approach in improving domain-specific instruction coverage. Moreover, our model's performance demonstrates considerable advancements over multiple baselines, including those utilizing domain-specific data enhancement. Our findings offer a promising opportunity to improve instruction coverage, especially in domain-specific contexts, thereby advancing the development of adaptable language models. Our code, model weights, and data are public at https://github.com/fanqiwan/Explore-Instruct.

Developing a Natural Language Understanding Model to Characterize Cable News Bias

  • paper_url: http://arxiv.org/abs/2310.09166
  • repo_url: None
  • paper_authors: Seth P. Benson, Iain J. Cruickshank
  • for: This study develops a method for detecting the bias of cable news programs that requires no human input or subjective labeling.
  • methods: The unsupervised method analyzes which topics are mentioned, via Named Entity Recognition, and how those topics are discussed, via Stance Analysis, and then clusters programs with similar biases together.
  • results: Applied to 2020 cable news transcripts, the program clusters are consistent over time and roughly correspond to the cable news network of each program, pointing toward future tools that objectively assess media bias and characterize unfamiliar media environments.
    Abstract Media bias has been extensively studied by both social and computational sciences. However, current work still has a large reliance on human input and subjective assessment to label biases. This is especially true for cable news research. To address these issues, we develop an unsupervised machine learning method to characterize the bias of cable news programs without any human input. This method relies on the analysis of what topics are mentioned through Named Entity Recognition and how those topics are discussed through Stance Analysis in order to cluster programs with similar biases together. Applying our method to 2020 cable news transcripts, we find that program clusters are consistent over time and roughly correspond to the cable news network of the program. This method reveals the potential for future tools to objectively assess media bias and characterize unfamiliar media environments.
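A minimal sketch of the clustering step: assuming NER and stance analysis have already produced a signed stance score per (program, topic) pair, programs can be vectorised and clustered so that shows with similar coverage patterns group together. The programs, topics, and scores below are invented for illustration.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.cluster import KMeans

# One dict per program: signed stance scores toward mentioned entities
# (positive = supportive coverage, negative = critical), produced upstream
# by NER + stance analysis. Values are illustrative only.
programs = {
    "show_a": {"refugee_policy": +0.6, "border_wall": -0.4},
    "show_b": {"refugee_policy": -0.5, "border_wall": +0.7},
    "show_c": {"refugee_policy": +0.5, "border_wall": -0.3},
}

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(list(programs.values()))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(programs, labels)))  # programs with similar bias share a cluster id
```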

BibRank: Automatic Keyphrase Extraction Platform Using Metadata

  • paper_url: http://arxiv.org/abs/2310.09151
  • repo_url: https://github.com/dallal9/bibrank
  • paper_authors: Abdelrhman Eldallal, Eduard Barbu
  • for: This paper presents a platform that integrates keyphrase datasets and facilitates the evaluation of keyphrase extraction algorithms.
  • methods: The platform includes BibRank, an automatic keyphrase extraction algorithm that obtains a rich dataset by parsing bibliographic data in BibTeX format and combines innovative weighting techniques with positional, statistical, and word co-occurrence information to extract keyphrases.
  • results: The platform offers researchers and developers a convenient way to improve keyphrase extraction algorithms and advance the field of natural language processing.
    Abstract Automatic Keyphrase Extraction involves identifying essential phrases in a document. These keyphrases are crucial in various tasks such as document classification, clustering, recommendation, indexing, searching, summarization, and text simplification. This paper introduces a platform that integrates keyphrase datasets and facilitates the evaluation of keyphrase extraction algorithms. The platform includes BibRank, an automatic keyphrase extraction algorithm that leverages a rich dataset obtained by parsing bibliographic data in BibTeX format. BibRank combines innovative weighting techniques with positional, statistical, and word co-occurrence information to extract keyphrases from documents. The platform proves valuable for researchers and developers seeking to enhance their keyphrase extraction algorithms and advance the field of natural language processing.
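To make the weighting idea concrete, the sketch below scores candidate phrases with a simplified combination of term frequency and first-occurrence position. BibRank's actual weighting additionally uses co-occurrence statistics and the bibliographic metadata, so this is only an illustrative stand-in.

```python
from collections import Counter

def score_keyphrases(text: str, candidates: list) -> dict:
    """Simplified BibRank-style scoring: term frequency damped by first-occurrence position.
    (The real algorithm also uses co-occurrence statistics and BibTeX metadata.)"""
    lowered = text.lower()
    freq = Counter(lowered.split())
    scores = {}
    for cand in candidates:
        first = lowered.find(cand.lower())
        if first < 0:
            continue
        tf = sum(freq[w] for w in cand.lower().split())
        position_weight = 1.0 / (1.0 + first / max(len(lowered), 1))  # earlier mention -> higher weight
        scores[cand] = tf * position_weight
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

doc = "Keyphrase extraction identifies essential phrases. Keyphrase extraction helps indexing and searching."
print(score_keyphrases(doc, ["keyphrase extraction", "indexing", "searching"]))
```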

PuoBERTa: Training and evaluation of a curated language model for Setswana

  • paper_url: http://arxiv.org/abs/2310.09141
  • repo_url: https://github.com/dsfsi/puodata
  • paper_authors: Vukosi Marivate, Moseli Mots’Oehli, Valencia Wagner, Richard Lastrucci, Isheanesu Dzingirai
  • for: This work aims to improve natural language processing (NLP) capabilities for low-resource languages such as Setswana.
  • methods: PuoBERTa, a customised masked language model for Setswana, is trained on a high-quality corpus built from diverse, curated monolingual texts.
  • results: PuoBERTa performs well on NLP tasks including part-of-speech (POS) tagging, named entity recognition (NER), and news categorisation, and the paper introduces a new Setswana news categorisation dataset together with initial benchmarks.
    Abstract Natural language processing (NLP) has made significant progress for well-resourced languages such as English but lagged behind for low-resource languages like Setswana. This paper addresses this gap by presenting PuoBERTa, a customised masked language model trained specifically for Setswana. We cover how we collected, curated, and prepared diverse monolingual texts to generate a high-quality corpus for PuoBERTa's training. Building upon previous efforts in creating monolingual resources for Setswana, we evaluated PuoBERTa across several NLP tasks, including part-of-speech (POS) tagging, named entity recognition (NER), and news categorisation. Additionally, we introduced a new Setswana news categorisation dataset and provided the initial benchmarks using PuoBERTa. Our work demonstrates the efficacy of PuoBERTa in fostering NLP capabilities for understudied languages like Setswana and paves the way for future research directions.

A Frustratingly Easy Plug-and-Play Detection-and-Reasoning Module for Chinese Spelling Check

  • paper_url: http://arxiv.org/abs/2310.09119
  • repo_url: None
  • paper_authors: Haojing Huang, Jingheng Ye, Qingyu Zhou, Yinghui Li, Yangning Li, Feng Zhou, Hai-Tao Zheng
  • for: Improving Chinese Spelling Check (CSC) by leveraging external knowledge about the Chinese language more directly and efficiently.
  • methods: The CSC workflow is decomposed into detection, reasoning, and searching subtasks, and a plug-and-play detection-and-reasoning module is designed to be compatible with existing state-of-the-art non-autoregressive CSC models.
  • results: The proposed module boosts the performance of existing models, transfers across models (a module trained for one model also benefits others), and provides primary interpretability; extensive experiments and detailed analyses demonstrate its effectiveness and competitiveness.
    Abstract In recent years, Chinese Spelling Check (CSC) has been greatly improved by designing task-specific pre-training methods or introducing auxiliary tasks, which mostly solve this task in an end-to-end fashion. In this paper, we propose to decompose the CSC workflow into detection, reasoning, and searching subtasks so that the rich external knowledge about the Chinese language can be leveraged more directly and efficiently. Specifically, we design a plug-and-play detection-and-reasoning module that is compatible with existing SOTA non-autoregressive CSC models to further boost their performance. We find that the detection-and-reasoning module trained for one model can also benefit other models. We also study the primary interpretability provided by the task decomposition. Extensive experiments and detailed analyses demonstrate the effectiveness and competitiveness of the proposed module.

Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model

  • paper_url: http://arxiv.org/abs/2310.09089
  • repo_url: https://github.com/williamliujl/Qilin-Med
  • paper_authors: Qichen Ye, Junling Liu, Dading Chong, Peilin Zhou, Yining Hua, Andrew Liu
  • for: This paper explores how to improve the performance of large language models (LLMs) in the medical domain.
  • methods: A multi-stage training pipeline combines Domain-specific Continued Pre-training (DCPT), Supervised Fine-tuning (SFT), and Direct Preference Optimization (DPO), together with a 3GB Chinese Medicine (ChiMed) dataset covering medical question answering, plain texts, knowledge graphs, and dialogues, segmented into the three training stages.
  • results: This training strategy reduces resource consumption while improving medical-domain performance: the resulting model, Qilin-Med, reaches 38.4% and 40.0% accuracy on CMExam after the CPT and SFT stages, surpassing Baichuan-7B's 33.5%, and in the DPO stage scores 16.66 BLEU-1 and 27.44 ROUGE-1 on the Huatuo-26M test set, up from 12.69 and 24.21 after SFT.
    Abstract Integrating large language models (LLMs) into healthcare presents potential but faces challenges. Directly pre-training LLMs for domains like medicine is resource-heavy and sometimes unfeasible. Sole reliance on Supervised Fine-tuning (SFT) can result in overconfident predictions and may not tap into domain specific insights. Addressing these challenges, we present a multi-stage training method combining Domain-specific Continued Pre-training (DCPT), SFT, and Direct Preference Optimization (DPO). A notable contribution of our study is the introduction of a 3Gb Chinese Medicine (ChiMed) dataset, encompassing medical question answering, plain texts, knowledge graphs, and dialogues, segmented into three training stages. The medical LLM trained with our pipeline, Qilin-Med, exhibits significant performance boosts. In the CPT and SFT phases, it achieves 38.4% and 40.0% accuracy on the CMExam, surpassing Baichuan-7B's 33.5%. In the DPO phase, on the Huatuo-26M test set, it scores 16.66 in BLEU-1 and 27.44 in ROUGE1, outperforming the SFT's 12.69 and 24.21. This highlights the strength of our training approach in refining LLMs for medical applications.

MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks

  • paper_url: http://arxiv.org/abs/2310.09036
  • repo_url: https://github.com/declare-lab/mm-bigbench
  • paper_authors: Xiaocui Yang, Wenfang Wu, Shi Feng, Ming Wang, Daling Wang, Yang Li, Qi Sun, Yifei Zhang, Xiaoming Fu, Soujanya Poria
  • for: This work evaluates the performance of multimodal large language models (MLLMs), with an emphasis on multimodal content comprehension tasks.
  • methods: MM-BigBench incorporates a diverse range of metrics, including Best Performance (a model's upper bound on each dataset), Mean Relative Gain (overall performance of models and instructions), Stability (sensitivity), and a new Adaptability metric that quantifies the fit between models and instructions.
  • results: Twenty language models (14 MLLMs) are evaluated on 14 multimodal datasets spanning 6 tasks, with 10 instructions per task, yielding novel insights.
    Abstract The popularity of multimodal large language models (MLLMs) has triggered a recent surge in research efforts dedicated to evaluating these models. Nevertheless, existing evaluation studies of MLLMs primarily focus on the comprehension and reasoning of unimodal (vision) content, neglecting performance evaluations in the domain of multimodal (vision-language) content understanding. Beyond multimodal reasoning, tasks related to multimodal content comprehension necessitate a profound understanding of multimodal contexts, achieved through the multimodal interaction to obtain a final answer. In this paper, we introduce a comprehensive assessment framework called MM-BigBench, which incorporates a diverse range of metrics to offer an extensive evaluation of the performance of various models and instructions across a wide spectrum of diverse multimodal content comprehension tasks. Consequently, our work complements research on the performance of MLLMs in multimodal comprehension tasks, achieving a more comprehensive and holistic evaluation of MLLMs. To begin, we employ the Best Performance metric to ascertain each model's performance upper bound on different datasets. Subsequently, the Mean Relative Gain metric offers an assessment of the overall performance of various models and instructions, while the Stability metric measures their sensitivity. Furthermore, previous research centers on evaluating models independently or solely assessing instructions, neglecting the adaptability between models and instructions. We propose the Adaptability metric to quantify the adaptability between models and instructions. Our paper evaluates a total of 20 language models (14 MLLMs) on 14 multimodal datasets spanning 6 tasks, with 10 instructions for each task, and derives novel insights. Our code will be released at https://github.com/declare-lab/MM-BigBench.
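To make the aggregate metrics concrete, the sketch below computes a per-model relative gain averaged over instructions and a stability score (standard deviation across instructions) from a model-by-instruction score matrix. The exact normalisations used in MM-BigBench may differ, and the scores here are made up.

```python
import numpy as np

# scores[m, i] = accuracy of model m under instruction i on some dataset (toy values)
scores = np.array([[62.0, 58.5, 60.1],
                   [55.3, 54.9, 56.2],
                   [48.7, 50.2, 49.5]])

def mean_relative_gain_models(scores: np.ndarray) -> np.ndarray:
    """Per-model gain relative to the average over all models, averaged over instructions (%)."""
    mean_over_models = scores.mean(axis=0, keepdims=True)
    return 100 * ((scores - mean_over_models) / mean_over_models).mean(axis=1)

def stability(scores: np.ndarray) -> np.ndarray:
    """Lower std across instructions = less sensitivity to the prompt wording."""
    return scores.std(axis=1)

print(mean_relative_gain_models(scores))
print(stability(scores))
```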

Don't Add, don't Miss: Effective Content Preserving Generation from Pre-Selected Text Spans

  • paper_url: http://arxiv.org/abs/2310.09017
  • repo_url: https://github.com/lovodkin93/cdr_ctr
  • paper_authors: Aviv Slobodkin, Avi Caciularu, Eran Hirsch, Ido Dagan
  • for: This paper aims to deliver a reliable Controlled Text Reduction (CTR) model, addressing the mediocre performance of the existing baseline.
  • methods: The content-preservation constraint is amplified both in training, via reinforcement learning (RL), and at inference, via a controlled decoding strategy, while GPT-4 distillation substantially improves the quality of the silver training data.
  • results: Paired with the highlight-adherence strategies, the distilled data yields marked gains over the current baseline of up to 30 ROUGE-L points, providing a reliable CTR model for downstream use.
    Abstract The recently introduced Controlled Text Reduction (CTR) task isolates the text generation step within typical summarization-style tasks. It does so by challenging models to generate coherent text conforming to pre-selected content within the input text ("highlights"). This framing enables increased modularity in summarization-like tasks, allowing to couple a single CTR model with various content-selection setups and modules. However, there are currently no reliable CTR models, while the performance of the existing baseline for the task is mediocre, falling short of practical utility. Here, we address this gap by introducing a high-quality, open-source CTR model that tackles two prior key limitations: inadequate enforcement of the content-preservation constraint, and suboptimal silver training data. Addressing these, we amplify the content-preservation constraint in both training, via RL, and inference, via a controlled decoding strategy. Further, we substantially improve the silver training data quality via GPT-4 distillation. Overall, pairing the distilled dataset with the highlight-adherence strategies yields marked gains over the current baseline, of up to 30 ROUGE-L points, providing a reliable CTR model for downstream use.

Towards Example-Based NMT with Multi-Levenshtein Transformers

  • paper_url: http://arxiv.org/abs/2310.08967
  • repo_url: https://github.com/maxwell1447/fairseq
  • paper_authors: Maxime Bouthors, Josep Crego, François Yvon
  • for: Improving translation metrics and domain adaptation with retrieval-augmented translation.
  • methods: A retrieval-augmented variant of the Levenshtein Transformer is adapted to simultaneously edit multiple fuzzy matches found in memory, with training and inference based on multi-way alignment algorithms and imitation learning; the design also lets users trace translation decisions back to the examples that contributed to them.
  • results: Editing several examples positively impacts translation scores, notably increasing the number of target spans copied from existing instances.
    Abstract Retrieval-Augmented Machine Translation (RAMT) is attracting growing attention. This is because RAMT not only improves translation metrics, but is also assumed to implement some form of domain adaptation. In this contribution, we study another salient trait of RAMT, its ability to make translation decisions more transparent by allowing users to go back to examples that contributed to these decisions. For this, we propose a novel architecture aiming to increase this transparency. This model adapts a retrieval-augmented version of the Levenshtein Transformer and makes it amenable to simultaneously edit multiple fuzzy matches found in memory. We discuss how to perform training and inference in this model, based on multi-way alignment algorithms and imitation learning. Our experiments show that editing several examples positively impacts translation scores, notably increasing the number of target spans that are copied from existing instances.

xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark

  • paper_url: http://arxiv.org/abs/2310.08958
  • repo_url: https://github.com/e0397123/xdial-eval
  • paper_authors: Chen Zhang, Luis Fernando D’Haro, Chengguang Tang, Ke Shi, Guohua Tang, Haizhou Li
  • for: This work proposes a multilingual open-domain dialogue evaluation benchmark to examine how well English dialogue evaluation metrics generalize to other languages.
  • methods: Built on open-source English dialogue evaluation datasets, the English dialogue data are extended to nine other languages with commercial machine translation systems; BERT-based metrics and recently emerged large language models are then analyzed comprehensively, and strong self-supervised multilingual baselines are established.
  • results: In terms of average Pearson correlations over all datasets and languages, the best baseline outperforms OpenAI's ChatGPT by absolute margins of 6.5% at the turn level and 4.6% at the dialogue level, despite having far fewer parameters.
    Abstract Recent advancements in reference-free learned metrics for open-domain dialogue evaluation have been driven by the progress in pre-trained language models and the availability of dialogue data with high-quality human annotations. However, current studies predominantly concentrate on English dialogues, and the generalization of these metrics to other languages has not been fully examined. This is largely due to the absence of a multilingual dialogue evaluation benchmark. To address the issue, we introduce xDial-Eval, built on top of open-source English dialogue evaluation datasets. xDial-Eval includes 12 turn-level and 6 dialogue-level English datasets, comprising 14930 annotated turns and 8691 annotated dialogues respectively. The English dialogue data are extended to nine other languages with commercial machine translation systems. On xDial-Eval, we conduct comprehensive analyses of previous BERT-based metrics and the recently-emerged large language models. Lastly, we establish strong self-supervised and multilingual baselines. In terms of average Pearson correlations over all datasets and languages, the best baseline outperforms OpenAI's ChatGPT by absolute improvements of 6.5% and 4.6% at the turn and dialogue levels respectively, albeit with much fewer parameters. The data and code are publicly available at https://github.com/e0397123/xDial-Eval.
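The headline comparison boils down to averaging Pearson correlations between metric scores and human judgements over every (dataset, language) split. A minimal sketch of that aggregation is below; the split names and numbers are invented.

```python
import numpy as np

def average_pearson(metric_scores: dict, human_scores: dict) -> float:
    """Average Pearson correlation between a metric and human judgements over several
    (dataset, language) splits -- the aggregation used to compare baselines in xDial-Eval."""
    corrs = []
    for key, preds in metric_scores.items():
        corrs.append(np.corrcoef(preds, human_scores[key])[0, 1])
    return float(np.mean(corrs))

# toy example with two splits
metric = {"dailydialog-zh": np.array([0.2, 0.8, 0.5, 0.9]), "convai2-de": np.array([0.1, 0.4, 0.7, 0.6])}
human = {"dailydialog-zh": np.array([1, 4, 3, 5]), "convai2-de": np.array([2, 2, 4, 3])}
print(average_pearson(metric, human))
```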

  • paper_url: http://arxiv.org/abs/2310.08954
  • repo_url: https://github.com/sulcantonin/text_icalepcs23
  • paper_authors: Antonin Sulc, Annika Eichler, Tim Wilksen
  • for: This study performs a textual analysis of past ICALEPCS and IPAC conference proceedings to gain insights into the research trends and topics discussed in the field.
  • methods: Natural language processing techniques extract meaningful information from the abstracts and papers, analyze and visualize their topics, track how those topics evolve to identify emerging research directions, and highlight notable publications based solely on their content, together with an analysis of their network.
  • results: The analysis provides a comprehensive overview of the research landscape, helping researchers and practitioners understand the state of the art and identify areas for future research; an advanced search tool is also provided to prevent duplication and make reference finding easier.
    Abstract In this paper, we show a textual analysis of past ICALEPCS and IPAC conference proceedings to gain insights into the research trends and topics discussed in the field. We use natural language processing techniques to extract meaningful information from the abstracts and papers of past conference proceedings. We extract topics to visualize and identify trends, analyze their evolution to identify emerging research directions, and highlight interesting publications based solely on their content with an analysis of their network. Additionally, we will provide an advanced search tool to better search the existing papers to prevent duplication and easier reference findings. Our analysis provides a comprehensive overview of the research landscape in the field and helps researchers and practitioners to better understand the state-of-the-art and identify areas for future research.
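The abstract does not spell out the exact topic-modelling pipeline, so the sketch below uses TF-IDF features with non-negative matrix factorization as one plausible way to surface topics from proceedings abstracts; the example abstracts are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

abstracts = [
    "timing system upgrade for the accelerator control system",
    "machine learning for beam diagnostics and anomaly detection",
    "epics based control system for a new beamline",
    "deep learning models for predicting beam losses",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(abstracts)
nmf = NMF(n_components=2, random_state=0).fit(X)

terms = tfidf.get_feature_names_out()
for k, topic in enumerate(nmf.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {', '.join(top)}")  # top words per discovered topic
```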

CAMELL: Confidence-based Acquisition Model for Efficient Self-supervised Active Learning with Label Validation

  • paper_url: http://arxiv.org/abs/2310.08944
  • repo_url: None
  • paper_authors: Carel van Niekerk, Christian Geishauser, Michael Heck, Shutong Feng, Hsien-chin Lin, Nurul Lubis, Benjamin Ruppik, Renato Vukovic, Milica Gašić
  • for: This paper proposes an active learning framework for sequential tasks that improves model performance while reducing annotation effort.
  • methods: CAMELL, a pool-based active learning framework for sequential multi-output problems, has three core features: expert annotators label only a fraction of a chosen sequence, the remainder of the sequence is self-labelled, and a label validation mechanism prevents erroneous labels from contaminating the dataset and harming model performance.
  • results: On sequential tasks, with a special emphasis on dialogue belief tracking, CAMELL outperforms the baselines in terms of efficiency, and the data corrections it suggests improve the overall quality of the resulting datasets.
    Abstract Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition from expert-based to crowd-sourced labelling. To address these challenges, we present CAMELL (Confidence-based Acquisition Model for Efficient self-supervised active Learning with Label validation), a pool-based active learning framework tailored for sequential multi-output problems. CAMELL possesses three core features: (1) it requires expert annotators to label only a fraction of a chosen sequence, (2) it facilitates self-supervision for the remainder of the sequence, and (3) it employs a label validation mechanism to prevent erroneous labels from contaminating the dataset and harming model performance. We evaluate CAMELL on sequential tasks, with a special emphasis on dialogue belief tracking, a task plagued by the constraints of limited and noisy datasets. Our experiments demonstrate that CAMELL outperforms the baselines in terms of efficiency. Furthermore, the data corrections suggested by our method contribute to an overall improvement in the quality of the resulting datasets.
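A toy version of the acquisition step: given the model's per-position class probabilities for one sequence, send the least confident positions to the expert and self-label the rest with the model's own predictions. Label validation and the full acquisition model are omitted; the fraction and the probabilities below are illustrative.

```python
import numpy as np

def split_sequence_by_confidence(probs: np.ndarray, expert_fraction: float = 0.25):
    """Toy acquisition: route the least confident positions of a sequence to the expert
    and self-label the remainder with the model's argmax predictions."""
    confidence = probs.max(axis=-1)                  # (seq_len,)
    n_expert = max(1, int(len(confidence) * expert_fraction))
    expert_idx = np.argsort(confidence)[:n_expert]   # least confident positions
    self_idx = np.setdiff1d(np.arange(len(confidence)), expert_idx)
    self_labels = probs.argmax(axis=-1)
    return expert_idx, self_idx, self_labels

# per-position class probabilities for a sequence of length 4, 3 classes (toy values)
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.60, 0.30, 0.10],
                  [0.34, 0.33, 0.33]])
print(split_sequence_by_confidence(probs))
```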

Multi-level Adaptive Contrastive Learning for Knowledge Internalization in Dialogue Generation

  • paper_url: http://arxiv.org/abs/2310.08943
  • repo_url: None
  • paper_authors: Chenxu Yang, Zheng Lin, Lanrui Wang, Chong Tian, Liang Pang, Jiangnan Li, Qirong Ho, Yanan Cao, Weiping Wang
  • for: This paper tackles text degeneration in knowledge-grounded dialogue generation, where models merely insert segments of the provided knowledge into generic responses instead of internalizing the external knowledge in a human-like manner.
  • methods: A Multi-level Adaptive Contrastive Learning (MACL) framework dynamically samples negative examples and penalizes copying-style degeneration behaviors at both the token level and the sequence level, countering the weak likelihood objective that lets models "cheat" through superficial pattern matching.
  • results: Extensive experiments on the WoW dataset demonstrate the effectiveness of the approach across various pre-trained models.
    Abstract Knowledge-grounded dialogue generation aims to mitigate the issue of text degeneration by incorporating external knowledge to supplement the context. However, the model often fails to internalize this information into responses in a human-like manner. Instead, it simply inserts segments of the provided knowledge into generic responses. As a result, the generated responses tend to be tedious, incoherent, and in lack of interactivity which means the degeneration problem is still unsolved. In this work, we first find that such copying-style degeneration is primarily due to the weak likelihood objective, which allows the model to "cheat" the objective by merely duplicating knowledge segments in a superficial pattern matching based on overlap. To overcome this challenge, we then propose a Multi-level Adaptive Contrastive Learning (MACL) framework that dynamically samples negative examples and subsequently penalizes degeneration behaviors at both the token-level and sequence-level. Extensive experiments on the WoW dataset demonstrate the effectiveness of our approach across various pre-trained models.

Towards Informative Few-Shot Prompt with Maximum Information Gain for In-Context Learning

  • paper_url: http://arxiv.org/abs/2310.08923
  • repo_url: None
  • paper_authors: Hongfu Liu, Ye Wang
  • for: This work aims to improve the stability of in-context learning when large language models (LLMs) are applied to new downstream tasks.
  • methods: A new sampling strategy quantifies the Information Gain (IG) obtained in prediction after observing a candidate example and selects the examples with maximum IG; a Calibration Before Sampling strategy mitigates the template bias that would otherwise skew IG estimates.
  • results: The proposed method yields an average relative improvement of 14.3% across six classification tasks using three LLMs.
    Abstract Large Language models (LLMs) possess the capability to engage In-context Learning (ICL) by leveraging a few demonstrations pertaining to a new downstream task as conditions. However, this particular learning paradigm suffers from high instability stemming from substantial variances induced by factors such as the input distribution of selected examples, their ordering, and prompt formats. In this work, we demonstrate that even when all these factors are held constant, the random selection of examples still results in high variance. Consequently, we aim to explore the informative ability of data examples by quantifying the Information Gain (IG) obtained in prediction after observing a given example candidate. Then we propose to sample those with maximum IG. Additionally, we identify the presence of template bias, which can lead to unfair evaluations of IG during the sampling process. To mitigate this bias, we introduce Calibration Before Sampling strategy. The experimental results illustrate that our proposed method can yield an average relative improvement of 14.3% across six classification tasks using three LLMs.
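Under a simplified reading of the method, the information gain of a candidate demonstration is the drop in entropy of the model's label distribution once the demonstration is prepended to the prompt. The sketch below scores candidates that way and picks the maximum; Calibration Before Sampling is not shown, and the distributions are made up.

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def information_gain(prior: np.ndarray, posterior: np.ndarray) -> float:
    """IG of one candidate demonstration: entropy of the model's label distribution
    before minus after conditioning on that demonstration (simplified)."""
    return entropy(prior) - entropy(posterior)

def pick_max_ig(prior: np.ndarray, posteriors: list) -> int:
    gains = [information_gain(prior, post) for post in posteriors]
    return int(np.argmax(gains))

# zero-shot label distribution vs. distributions after prepending each candidate example
prior = np.array([0.5, 0.5])
posteriors = [np.array([0.55, 0.45]), np.array([0.9, 0.1]), np.array([0.7, 0.3])]
print(pick_max_ig(prior, posteriors))   # -> 1, the most informative candidate
```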

Human-in-the-loop Machine Translation with Large Language Model

  • paper_url: http://arxiv.org/abs/2310.08908
  • repo_url: https://github.com/nlp2ct/hil-mt
  • paper_authors: Xinyi Yang, Runzhe Zhan, Derek F. Wong, Junchao Wu, Lidia S. Chao
  • for: This study applies large language models (LLMs) to machine translation and evaluates their performance from multiple perspectives.
  • methods: A human-in-the-loop pipeline first prompts the LLM to produce a draft translation, then uses automatic retrieval or human feedback as supervision signals to revise the translation through in-context learning; the resulting human-machine interactions are stored in an external database that expands the in-context retrieval database, so human supervision can also be leveraged offline.
  • results: Evaluated with the GPT-3.5-turbo API on five domain-specific German-English benchmarks, the pipeline tailors in-domain translations and improves translation performance over direct translation; the paper further analyzes different in-context retrieval methods, retrieval database construction under low-resource scenarios, observed domain differences, linguistic statistics, and translation cases.
    Abstract The large language model (LLM) has garnered significant attention due to its in-context learning mechanisms and emergent capabilities. The research community has conducted several pilot studies to apply LLMs to machine translation tasks and evaluate their performance from diverse perspectives. However, previous research has primarily focused on the LLM itself and has not explored human intervention in the inference process of LLM. The characteristics of LLM, such as in-context learning and prompt engineering, closely mirror human cognitive abilities in language tasks, offering an intuitive solution for human-in-the-loop generation. In this study, we propose a human-in-the-loop pipeline that guides LLMs to produce customized outputs with revision instructions. The pipeline initiates by prompting the LLM to produce a draft translation, followed by the utilization of automatic retrieval or human feedback as supervision signals to enhance the LLM's translation through in-context learning. The human-machine interactions generated in this pipeline are also stored in an external database to expand the in-context retrieval database, enabling us to leverage human supervision in an offline setting. We evaluate the proposed pipeline using GPT-3.5-turbo API on five domain-specific benchmarks for German-English translation. The results demonstrate the effectiveness of the pipeline in tailoring in-domain translations and improving translation performance compared to direct translation. Additionally, we discuss the results from the following perspectives: 1) the effectiveness of different in-context retrieval methods; 2) the construction of a retrieval database under low-resource scenarios; 3) the observed domains differences; 4) the quantitative analysis of linguistic statistics; and 5) the qualitative analysis of translation cases. The code and data are available at https://github.com/NLP2CT/HIL-MT/.

SeqXGPT: Sentence-Level AI-Generated Text Detection

  • paper_url: http://arxiv.org/abs/2310.08903
  • repo_url: https://github.com/jihuai-wpy/seqxgpt
  • paper_authors: Pengyu Wang, Linyang Li, Ke Ren, Botian Jiang, Dong Zhang, Xipeng Qiu
  • for: This work introduces sentence-level detection of AI-generated text (AIGT), whereas existing detectors only consider the document level.
  • methods: A dataset is synthesized containing documents polished with LLMs, i.e., mixing human-written sentences with LLM-modified ones, and SeqXGPT is proposed: it uses log-probability lists from white-box LLMs as wave-like features and builds on convolution and self-attention networks for sentence-level AIGT detection.
  • results: The method significantly surpasses baseline methods in both sentence- and document-level detection challenges and exhibits strong generalization, whereas previous methods struggle with sentence-level detection.
    Abstract Widely applied large language models (LLMs) can generate human-like content, raising concerns about the abuse of LLMs. Therefore, it is important to build strong AI-generated text (AIGT) detectors. Current works only consider document-level AIGT detection, therefore, in this paper, we first introduce a sentence-level detection challenge by synthesizing a dataset that contains documents that are polished with LLMs, that is, the documents contain sentences written by humans and sentences modified by LLMs. Then we propose Sequence X (Check) GPT (SeqXGPT), a novel method that utilizes log probability lists from white-box LLMs as features for sentence-level AIGT detection. These features are composed like waves in speech processing and cannot be studied by LLMs. Therefore, we build SeqXGPT based on convolution and self-attention networks. We test it in both sentence and document-level detection challenges. Experimental results show that previous methods struggle in solving sentence-level AIGT detection, while our method not only significantly surpasses baseline methods in both sentence and document-level detection challenges but also exhibits strong generalization capabilities.
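The feature-and-network idea can be sketched as a small PyTorch module: per-token log-probability "waves" from several white-box LLMs are passed through 1-D convolutions and a self-attention encoder to produce per-token classification logits. Dimensions and layer counts here are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class SeqXGPTLikeDetector(nn.Module):
    """Sketch of a sentence-level detector over per-token log-probability features from
    several white-box LLMs (hypothetical dimensions; not the authors' exact architecture)."""
    def __init__(self, n_llms: int = 4, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_llms, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, logprob_feats: torch.Tensor) -> torch.Tensor:
        # logprob_feats: (batch, seq_len, n_llms) word-level log-probability "waves"
        x = self.conv(logprob_feats.transpose(1, 2)).transpose(1, 2)
        x = self.attn(x)
        return self.head(x)        # per-token logits, pooled per sentence downstream

model = SeqXGPTLikeDetector()
print(model(torch.randn(2, 30, 4)).shape)   # torch.Size([2, 30, 2])
```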

Exploration with Principles for Diverse AI Supervision

  • paper_url: http://arxiv.org/abs/2310.08899
  • repo_url: None
  • paper_authors: Hao Liu, Matei Zaharia, Pieter Abbeel
  • for: Reducing the heavy reliance on human supervision when training AI models, to further advance natural language processing.
  • methods: A language model explores the natural-language space autonomously: an actor generates novel content following exploration principles, and a critic evaluates the generated content and offers critiques that guide the actor, with novelty assessed by large language models.
  • results: The approach significantly boosts model performance on complex reasoning tasks while addressing the limitations of human-intensive supervision.
    Abstract Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI. While this generative AI approach has produced impressive results, it heavily leans on human supervision. Even state-of-the-art AI models like ChatGPT depend on fine-tuning through human demonstrations, demanding extensive human input and domain expertise. This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation. To address this limitation, we propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data. Drawing inspiration from unsupervised reinforcement learning (RL) pretraining, EAI achieves exploration within the natural language space. We accomplish this by harnessing large language models to assess the novelty of generated content. Our approach employs two key components: an actor that generates novel content following exploration principles and a critic that evaluates the generated content, offering critiques to guide the actor. Empirical evaluations demonstrate that EAI significantly boosts model performance on complex reasoning tasks, addressing the limitations of human-intensive supervision.

PerturbScore: Connecting Discrete and Continuous Perturbations in NLP

  • paper_url: http://arxiv.org/abs/2310.08889
  • repo_url: https://github.com/renke999/perturbscore
  • paper_authors: Linyang Li, Ke Ren, Yunfan Shao, Pengyu Wang, Xipeng Qiu
  • for: This paper studies the robustness of NLP models by connecting discrete perturbations with continuous perturbations, using that connection as a bridge for understanding discrete perturbations in NLP models.
  • methods: The authors first explore how to connect and measure the correlation between discrete and continuous perturbations, and then design a regression task, PerturbScore, to learn this correlation automatically.
  • results: Experiments show that such a connection can be built and that PerturbScore learns it better than previous methods for measuring discrete perturbations; it also generalizes across datasets and perturbation methods, suggesting it can serve as a powerful tool for studying model robustness in NLP.
    Abstract With the rapid development of neural network applications in NLP, model robustness problem is gaining more attention. Different from computer vision, the discrete nature of texts makes it more challenging to explore robustness in NLP. Therefore, in this paper, we aim to connect discrete perturbations with continuous perturbations, therefore we can use such connections as a bridge to help understand discrete perturbations in NLP models. Specifically, we first explore how to connect and measure the correlation between discrete perturbations and continuous perturbations. Then we design a regression task as a PerturbScore to learn the correlation automatically. Through experimental results, we find that we can build a connection between discrete and continuous perturbations and use the proposed PerturbScore to learn such correlation, surpassing previous methods used in discrete perturbation measuring. Further, the proposed PerturbScore can be well generalized to different datasets, perturbation methods, indicating that we can use it as a powerful tool to study model robustness in NLP.
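A minimal sketch of the regression idea follows, on synthetic data with an assumed relationship between discrete edits and a continuous perturbation norm; the actual PerturbScore learns the correlation from the behaviour of real NLP models under real perturbation methods.

```python
# Illustrative only: fit a regressor that maps features of a discrete perturbation
# (e.g., number of word swaps) to an "equivalent" continuous perturbation size.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 500
num_word_swaps = rng.integers(0, 6, size=n)        # discrete perturbation feature
sentence_length = rng.integers(5, 40, size=n)
# Assumed ground truth: continuous (embedding-space) perturbation norm grows
# with the fraction of words swapped, plus noise.
cont_norm = 2.0 * num_word_swaps / sentence_length + 0.05 * rng.standard_normal(n)

X = np.column_stack([num_word_swaps, sentence_length])
model = GradientBoostingRegressor().fit(X, cont_norm)  # PerturbScore-style regressor
print("predicted continuous norm for 3 swaps in a 10-word sentence:",
      model.predict([[3, 10]])[0])
```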

InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems

  • paper_url: http://arxiv.org/abs/2310.08885
  • repo_url: None
  • paper_authors: Willy Chung, Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Pascale Fung
  • for: A zero-shot, end-to-end task-oriented dialogue framework that adapts to diverse domains without task-specific data or fine-tuning.
  • methods: The framework leverages large language models (LLMs) to generate a proxy belief state that translates user intentions into dynamic queries for efficient interaction with any knowledge base (KB).
  • results: InstructTODS matches fully fine-tuned end-to-end TODS in guiding dialogues to successful completion without task-specific data or fine-tuning; human evaluation shows its responses outperform both gold responses and state-of-the-art TODS in helpfulness, informativeness, and humanness.
    Abstract Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP), yet remain under-explored for task-oriented dialogue systems (TODS), especially for end-to-end TODS. We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning. By leveraging LLMs, InstructTODS generates a proxy belief state that seamlessly translates user intentions into dynamic queries for efficient interaction with any KB. Our extensive experiments demonstrate that InstructTODS achieves comparable performance to fully fine-tuned TODS in guiding dialogues to successful completion without prior knowledge or task-specific data. Furthermore, a rigorous human evaluation of end-to-end TODS shows that InstructTODS produces dialogue responses that notably outperform both the gold responses and the state-of-the-art TODS in terms of helpfulness, informativeness, and humanness. Moreover, the effectiveness of LLMs in TODS is further supported by our comprehensive evaluations on TODS subtasks: dialogue state tracking, intent classification, and response generation. Code and implementations could be found here https://github.com/WillyHC22/InstructTODS/
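The sketch below illustrates how a proxy belief state could be turned into a dynamic KB query, assuming the LLM returns its belief state as JSON; the slot names and toy KB are illustrative and not taken from the paper.

```python
# Sketch of driving KB lookups from a proxy belief state, under the assumption
# that the LLM emits its belief state as JSON.
import json

KB = [
    {"name": "Cotto", "food": "italian", "area": "centre", "price": "cheap"},
    {"name": "Graffiti", "food": "italian", "area": "north", "price": "expensive"},
    {"name": "Nandos", "food": "portuguese", "area": "south", "price": "cheap"},
]

llm_output = '{"food": "italian", "area": "centre"}'   # stand-in for an LLM call

def query_kb(belief_state_json: str, kb: list[dict]) -> list[dict]:
    """Translate the proxy belief state into a dynamic filter over the KB."""
    constraints = json.loads(belief_state_json)
    return [row for row in kb
            if all(row.get(slot) == value for slot, value in constraints.items())]

print(query_kb(llm_output, KB))   # -> [{'name': 'Cotto', ...}]
```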

Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System

  • paper_url: http://arxiv.org/abs/2310.08877
  • repo_url: https://github.com/shenwzh3/mk-tod
  • paper_authors: Weizhou Shen, Yingqi Gao, Canbin Huang, Fanqi Wan, Xiaojun Quan, Wei Bi
  • for: Building an efficient knowledge retriever so that task-oriented dialogue systems can work effectively over large-scale knowledge bases (KBs).
  • methods: A perceptive retriever is trained with maximal marginal likelihood, supervised by signals from the response generator; the approach also incorporates various kinds of meta knowledge to guide the generator and make better use of retrieved knowledge.
  • results: Evaluated on three task-oriented dialogue datasets with T5 and ChatGPT as backbone models; combined with meta knowledge, the response generator effectively exploits high-quality knowledge records and produces higher-quality responses.
    Abstract Developing an efficient retriever to retrieve knowledge from a large-scale knowledge base (KB) is critical for task-oriented dialogue systems to effectively handle localized and specialized tasks. However, widely used generative models such as T5 and ChatGPT often struggle to differentiate subtle differences among the retrieved KB records when generating responses, resulting in suboptimal quality of generated responses. In this paper, we propose the application of maximal marginal likelihood to train a perceptive retriever by utilizing signals from response generation for supervision. In addition, our approach goes beyond considering solely retrieved entities and incorporates various meta knowledge to guide the generator, thus improving the utilization of knowledge. We evaluate our approach on three task-oriented dialogue datasets using T5 and ChatGPT as the backbone models. The results demonstrate that when combined with meta knowledge, the response generator can effectively leverage high-quality knowledge records from the retriever and enhance the quality of generated responses. The codes and models of this paper are available at https://github.com/shenwzh3/MK-TOD.
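A hedged PyTorch sketch of retriever training with marginal likelihood follows: the generator's per-record likelihood of the gold response acts as the supervision signal. The encoders, dimensions, and random data are placeholders, and the paper's meta-knowledge components are not shown.

```python
# Marginal-likelihood training signal for a retriever: maximize
# log sum_k p_retr(doc_k | query) * p_gen(gold response | query, doc_k).
import torch
import torch.nn as nn

K, d = 8, 32                      # candidate KB records per query, embedding size
query_enc = nn.Linear(100, d)     # toy encoders standing in for real ones
doc_enc = nn.Linear(100, d)
opt = torch.optim.Adam(list(query_enc.parameters()) + list(doc_enc.parameters()), lr=1e-3)

query_feat = torch.randn(100)
doc_feats = torch.randn(K, 100)
# log p_gen(gold response | query, doc_k): produced by the generator in practice,
# random numbers here.
gen_logprob = torch.randn(K)

q = query_enc(query_feat)                          # (d,)
docs = doc_enc(doc_feats)                          # (K, d)
retr_logprob = torch.log_softmax(docs @ q, dim=0)  # log p_retr(doc_k | query)
loss = -torch.logsumexp(retr_logprob + gen_logprob, dim=0)
loss.backward()
opt.step()
print("marginal-likelihood loss:", loss.item())
```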

Guiding AMR Parsing with Reverse Graph Linearization

  • paper_url: http://arxiv.org/abs/2310.08860
  • repo_url: https://github.com/pkunlp-icler/amr_reverse_graph_linearization
  • paper_authors: Bofei Gao, Liang Chen, Peiyi Wang, Zhifang Sui, Baobao Chang
  • for: Addressing the structure-loss accumulation problem of sequence-to-sequence AMR parsing to improve parsing accuracy.
  • methods: A Reverse Graph Linearization (RGL) framework defines both default and reverse linearization orders of an AMR graph and integrates the reverse order into the original parser through a two-pass self-distillation mechanism.
  • results: Improves over the previously best AMR parser by 0.8 and 0.5 Smatch points on the AMR 2.0 and AMR 3.0 datasets, respectively.
    Abstract Abstract Meaning Representation (AMR) parsing aims to extract an abstract semantic graph from a given sentence. The sequence-to-sequence approaches, which linearize the semantic graph into a sequence of nodes and edges and generate the linearized graph directly, have achieved good performance. However, we observed that these approaches suffer from structure loss accumulation during the decoding process, leading to a much lower F1-score for nodes and edges decoded later compared to those decoded earlier. To address this issue, we propose a novel Reverse Graph Linearization (RGL) enhanced framework. RGL defines both default and reverse linearization orders of an AMR graph, where most structures at the back part of the default order appear at the front part of the reversed order and vice versa. RGL incorporates the reversed linearization to the original AMR parser through a two-pass self-distillation mechanism, which guides the model when generating the default linearizations. Our analysis shows that our proposed method significantly mitigates the problem of structure loss accumulation, outperforming the previously best AMR parsing model by 0.8 and 0.5 Smatch scores on the AMR 2.0 and AMR 3.0 dataset, respectively. The code are available at https://github.com/pkunlp-icler/AMR_reverse_graph_linearization.
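The toy sketch below linearizes the same small graph in two orders, so that structures near the end of one token sequence appear near the front of the other; the paper's exact reverse-linearization definition and the two-pass self-distillation training are not reproduced here.

```python
# Toy illustration: default vs. reversed-child-order linearization of a tiny
# AMR-like graph. Back-of-sequence structures in one order move to the front
# in the other, which is the intuition behind RGL.
graph = {                       # node -> [(edge_label, child)]
    "want-01": [(":ARG0", "boy"), (":ARG1", "go-02")],
    "go-02":   [(":ARG0", "boy")],
    "boy":     [],
}

def linearize(node, reverse=False, seen=None):
    seen = set() if seen is None else seen
    if node in seen:
        return [node]           # re-entrant node: emit a reference only
    seen.add(node)
    tokens = ["(", node]
    children = graph[node]
    for label, child in (reversed(children) if reverse else children):
        tokens += [label] + linearize(child, reverse, seen)
    return tokens + [")"]

print("default:", " ".join(linearize("want-01")))
print("reverse:", " ".join(linearize("want-01", reverse=True)))
```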

End-to-end Story Plot Generator

  • paper_url: http://arxiv.org/abs/2310.08796
  • repo_url: https://github.com/rprokap/pset-9
  • paper_authors: Hanlin Zhu, Andrew Cohen, Danqing Wang, Kevin Yang, Xiaomeng Yang, Jiantao Jiao, Yuandong Tian
  • for: Automatic generation of story plots, including the premise, character descriptions, and plot outlines.
  • methods: Three models are proposed: $\texttt{OpenPlot}$ replaces OpenAI API calls with LLaMA2 and careful prompt design to cheaply build a high-quality training set of story plots; $\texttt{E2EPlot}$ is an end-to-end generator trained by supervised fine-tuning on roughly 13,000 plots produced by $\texttt{OpenPlot}$; $\texttt{RLPlot}$ is further fine-tuned with RLHF under several reward models targeting different aspects of story quality.
  • results: $\texttt{RLPlot}$ achieves a 60.0% win rate against $\texttt{E2EPlot}$ on the aspect of suspense and surprise.
    Abstract Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words. We study the problem of automatic generation of story plots, which includes story premise, character descriptions, plot outlines, etc. To generate a single engaging plot, existing plot generators (e.g., DOC (Yang et al., 2022a)) require hundreds to thousands of calls to LLMs (e.g., OpenAI API) in the planning stage of the story plot, which is costly and takes at least several minutes. Moreover, the hard-wired nature of the method makes the pipeline non-differentiable, blocking fast specialization and personalization of the plot generator. In this paper, we propose three models, $\texttt{OpenPlot}$, $\texttt{E2EPlot}$ and $\texttt{RLPlot}$, to address these challenges. $\texttt{OpenPlot}$ replaces expensive OpenAI API calls with LLaMA2 (Touvron et al., 2023) calls via careful prompt designs, which leads to inexpensive generation of high-quality training datasets of story plots. We then train an end-to-end story plot generator, $\texttt{E2EPlot}$, by supervised fine-tuning (SFT) using approximately 13000 story plots generated by $\texttt{OpenPlot}$. $\texttt{E2EPlot}$ generates story plots of comparable quality to $\texttt{OpenPlot}$, and is > 10$\times$ faster (1k tokens in only 30 seconds on average). Finally, we obtain $\texttt{RLPlot}$ that is further fine-tuned with RLHF on several different reward models for different aspects of story quality, which yields 60.0$\%$ winning rate against $\texttt{E2EPlot}$ along the aspect of suspense and surprise.

cs.LG - 2023-10-13

G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations

  • paper_url: http://arxiv.org/abs/2310.09443
  • repo_url: https://github.com/platformxlab/g10
  • paper_authors: Haoyang Zhang, Yirui Eric Zhou, Yuqi Xue, Yiqi Liu, Jian Huang
  • for: Scaling GPU memory capacity for deep learning workloads beyond the GPU memory wall.
  • methods: A unified GPU memory and storage architecture (G10) integrates host memory, GPU memory, and flash memory into a single memory space; compiler analysis of the highly predictable tensor behaviors schedules smart, transparent tensor migrations in advance.
  • results: Outperforms state-of-the-art GPU memory solutions by up to 1.75x without modifying deep learning workloads, and reaches 90.3% of the performance of an ideal unlimited-GPU-memory case.
    Abstract To break the GPU memory wall for scaling deep learning workloads, a variety of architecture and system techniques have been proposed recently. Their typical approaches include memory extension with flash memory and direct storage access. However, these techniques still suffer from suboptimal performance and introduce complexity to the GPU memory management, making them hard to meet the scalability requirement of deep learning workloads today. In this paper, we present a unified GPU memory and storage architecture named G10 driven by the fact that the tensor behaviors of deep learning workloads are highly predictable. G10 integrates the host memory, GPU memory, and flash memory into a unified memory space, to scale the GPU memory capacity while enabling transparent data migrations. Based on this unified GPU memory and storage architecture, G10 utilizes compiler techniques to characterize the tensor behaviors in deep learning workloads. Therefore, it can schedule data migrations in advance by considering the available bandwidth of flash memory and host memory. The cooperative mechanism between deep learning compilers and the unified memory architecture enables G10 to hide data transfer overheads in a transparent manner. We implement G10 based on an open-source GPU simulator. Our experiments demonstrate that G10 outperforms state-of-the-art GPU memory solutions by up to 1.75$\times$, without code modifications to deep learning workloads. With the smart data migration mechanism, G10 can reach 90.3\% of the performance of the ideal case assuming unlimited GPU memory.

Target Variable Engineering

  • paper_url: http://arxiv.org/abs/2310.09440
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Jessica Clark
  • for: Examining how the formulation of the target variable affects performance within the ML pipeline.
  • methods: Numeric targets are binarized against a threshold; regression models trained to predict the numeric targets are compared with classifiers trained to predict the binarized counterparts, at every point of a randomized hyperparameter optimization search.
  • results: Regression requires substantially more computation to converge to optimal performance and is more sensitive to randomness and heuristic choices in training; classification also benefits from systematic tuning and model selection, but the improvements are much smaller than for regression.
    Abstract How does the formulation of a target variable affect performance within the ML pipeline? The experiments in this study examine numeric targets that have been binarized by comparing against a threshold. We compare the predictive performance of regression models trained to predict the numeric targets vs. classifiers trained to predict their binarized counterparts. Specifically, we make this comparison at every point of a randomized hyperparameter optimization search to understand the effect of computational resource budget on the tradeoff between the two. We find that regression requires significantly more computational effort to converge upon the optimal performance, and is more sensitive to both randomness and heuristic choices in the training process. Although classification can and does benefit from systematic hyperparameter tuning and model selection, the improvements are much less than for regression. This work comprises the first systematic comparison of regression and classification within the framework of computational resource requirements. Our findings contribute to calls for greater replicability and efficiency within the ML pipeline for the sake of building more sustainable and robust AI systems.
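A minimal sketch of the comparison being studied: predict the numeric target with a regressor versus predicting its thresholded version with a classifier, then score both on the binary task. The dataset, models, and threshold below are illustrative stand-ins rather than the paper's experimental setup.

```python
# Compare "regress then threshold" against "classify the binarized target".
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
threshold = np.median(y)
y_bin = (y > threshold).astype(int)                 # binarized target
X_tr, X_te, y_tr, y_te, yb_tr, yb_te = train_test_split(
    X, y, y_bin, test_size=0.25, random_state=0)

reg = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
clf = RandomForestClassifier(random_state=0).fit(X_tr, yb_tr)

acc_via_regression = accuracy_score(yb_te, (reg.predict(X_te) > threshold).astype(int))
acc_via_classifier = accuracy_score(yb_te, clf.predict(X_te))
print(f"regression-then-threshold accuracy: {acc_via_regression:.3f}")
print(f"direct classification accuracy:     {acc_via_classifier:.3f}")
```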

Learning nonlinear integral operators via Recurrent Neural Networks and its application in solving Integro-Differential Equations

  • paper_url: http://arxiv.org/abs/2310.09434
  • repo_url: None
  • paper_authors: Hardeep Bassi, Yuanran Zhu, Senwei Liang, Jia Yin, Cian C. Reeves, Vojtech Vlcek, Chao Yang
  • for: Proposes using LSTM-RNNs to learn and represent the nonlinear integral operators that appear in nonlinear integro-differential equations (IDEs).
  • methods: The LSTM-RNN representation of the integral operator turns a system of IDEs into a system of ordinary differential equations (ODEs) for which many efficient solvers exist; because it removes the numerical integration from every time-evolution step, the overall temporal cost drops from $O(n_T^2)$ to $O(n_T)$ for an $n_T$-step trajectory.
  • results: A model problem demonstrates the efficiency and robustness of the approach; the learned operator generalizes to IDEs driven by different external forces and is applied to solving Dyson's equation for quantum many-body systems.
    Abstract In this paper, we propose using LSTM-RNNs (Long Short-Term Memory-Recurrent Neural Networks) to learn and represent nonlinear integral operators that appear in nonlinear integro-differential equations (IDEs). The LSTM-RNN representation of the nonlinear integral operator allows us to turn a system of nonlinear integro-differential equations into a system of ordinary differential equations for which many efficient solvers are available. Furthermore, because the use of LSTM-RNN representation of the nonlinear integral operator in an IDE eliminates the need to perform a numerical integration in each numerical time evolution step, the overall temporal cost of the LSTM-RNN-based IDE solver can be reduced to $O(n_T)$ from $O(n_T^2)$ if a $n_T$-step trajectory is to be computed. We illustrate the efficiency and robustness of this LSTM-RNN-based numerical IDE solver with a model problem. Additionally, we highlight the generalizability of the learned integral operator by applying it to IDEs driven by different external forces. As a practical application, we show how this methodology can effectively solve the Dyson's equation for quantum many-body systems.
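The sketch below shows only the mechanics: an (untrained) LSTM stands in for the learned memory/integral operator, so the equation can be stepped like an ODE with O(1) work per step instead of re-evaluating an integral over the whole history. The dynamics, dimensions, and step size are illustrative assumptions.

```python
# Time-stepping an IDE with the memory integral replaced by a recurrent operator.
import torch
import torch.nn as nn

class MemoryOperator(nn.Module):
    """Maps the current state (plus recurrent hidden state) to the memory term."""
    def __init__(self, dim=1, hidden=16):
        super().__init__()
        self.cell = nn.LSTMCell(dim, hidden)
        self.readout = nn.Linear(hidden, dim)

    def forward(self, y, hc):
        h, c = self.cell(y, hc)
        return self.readout(h), (h, c)

dim, dt, n_steps = 1, 0.01, 200
mem_op = MemoryOperator(dim)
y = torch.ones(1, dim)                          # initial condition y(0) = 1
hc = (torch.zeros(1, 16), torch.zeros(1, 16))   # LSTM hidden/cell state

with torch.no_grad():
    for _ in range(n_steps):
        memory_term, hc = mem_op(y, hc)         # replaces the integral over history
        dydt = -y + memory_term                 # toy local dynamics f(y) = -y
        y = y + dt * dydt                       # forward Euler step, O(1) per step
print("y(T) =", y.squeeze().item())
```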

Effects of cavity nonlinearities and linear losses on silicon microring-based reservoir computing

  • paper_url: http://arxiv.org/abs/2310.09433
  • repo_url: None
  • paper_authors: Bernard J. Giron Castro, Christophe Peucheret, Darko Zibar, Francesco Da Ros
  • for: This paper is written for understanding the impact of physical effects on the performance of time-delay photonic reservoir computing using microring resonators (MRRs).
  • methods: The paper uses numerical analysis to study the effect of linear losses, thermo-optic effects, and free-carrier effects on the prediction error of the time-series task NARMA-10 in MRRs.
  • results: The paper shows that there are three regions of input power and frequency detuning that reveal the cavity transition from linear to nonlinear regimes, and one of these regions offers very low error in time-series prediction under relatively low input power and number of nodes.
    Abstract Microring resonators (MRRs) are promising devices for time-delay photonic reservoir computing, but the impact of the different physical effects taking place in the MRRs on the reservoir computing performance is yet to be fully understood. We numerically analyze the impact of linear losses as well as thermo-optic and free-carrier effects relaxation times on the prediction error of the time-series task NARMA-10. We demonstrate the existence of three regions, defined by the input power and the frequency detuning between the optical source and the microring resonance, that reveal the cavity transition from linear to nonlinear regimes. One of these regions offers very low error in time-series prediction under relatively low input power and number of nodes while the other regions either lack nonlinearity or become unstable. This study provides insight into the design of the MRR and the optimization of its physical properties for improving the prediction performance of time-delay reservoir computing.

Offline Reinforcement Learning for Optimizing Production Bidding Policies

  • paper_url: http://arxiv.org/abs/2310.09426
  • repo_url: None
  • paper_authors: Dmytro Korenkevych, Frank Cheng, Artsiom Balakir, Alex Nikulkov, Lingnan Gao, Zhihao Cen, Zuobing Xu, Zheqing Zhu
  • for: Improving bidding efficiency for advertisers who optimize spend under budget constraints in online advertising auctions.
  • methods: Offline reinforcement learning from real production data optimizes bidding policies; a hybrid agent combines an interpretable, differentiable base policy with a deep neural network, and only the optimized base-policy parameters are deployed.
  • results: Statistically significant gains in bidding performance in both simulated and at-scale production environments, without additional infrastructure, safety, or explainability costs.
    Abstract The online advertising market, with its thousands of auctions run per second, presents a daunting challenge for advertisers who wish to optimize their spend under a budget constraint. Thus, advertising platforms typically provide automated agents to their customers, which act on their behalf to bid for impression opportunities in real time at scale. Because these proxy agents are owned by the platform but use advertiser funds to operate, there is a strong practical need to balance reliability and explainability of the agent with optimizing power. We propose a generalizable approach to optimizing bidding policies in production environments by learning from real data using offline reinforcement learning. This approach can be used to optimize any differentiable base policy (practically, a heuristic policy based on principles which the advertiser can easily understand), and only requires data generated by the base policy itself. We use a hybrid agent architecture that combines arbitrary base policies with deep neural networks, where only the optimized base policy parameters are eventually deployed, and the neural network part is discarded after training. We demonstrate that such an architecture achieves statistically significant performance gains in both simulated and at-scale production bidding environments. Our approach does not incur additional infrastructure, safety, or explainability costs, as it directly optimizes parameters of existing production routines without replacing them with black box-style models like neural networks.

ZeroSwap: Data-driven Optimal Market Making in DeFi

  • paper_url: http://arxiv.org/abs/2310.09413
  • repo_url: None
  • paper_authors: Viraj Nadkarni, Jiachen Hu, Ranvir Rana, Chi Jin, Sanjeev Kulkarni, Pramod Viswanath
  • for: Studying automated market makers (AMMs) in decentralized finance and the losses liquidity providers (LPs) suffer to arbitrage when pool prices go stale relative to more liquid exchanges.
  • methods: Building on the classical Glosten-Milgrom market microstructure model, the paper proposes the first optimal Bayesian algorithm and the first model-free, data-driven algorithm for tracking the external price of the asset, enforcing a zero-profit condition on the market maker's quotes (hence ZeroSwap).
  • results: The algorithms estimate the external market price without price or loss oracles, come with theoretical guarantees on the stability and convergence of their price recommendations, and are empirically robust to changing market conditions.
    Abstract Automated Market Makers (AMMs) are major centers of matching liquidity supply and demand in Decentralized Finance. Their functioning relies primarily on the presence of liquidity providers (LPs) incentivized to invest their assets into a liquidity pool. However, the prices at which a pooled asset is traded is often more stale than the prices on centralized and more liquid exchanges. This leads to the LPs suffering losses to arbitrage. This problem is addressed by adapting market prices to trader behavior, captured via the classical market microstructure model of Glosten and Milgrom. In this paper, we propose the first optimal Bayesian and the first model-free data-driven algorithm to optimally track the external price of the asset. The notion of optimality that we use enforces a zero-profit condition on the prices of the market maker, hence the name ZeroSwap. This ensures that the market maker balances losses to informed traders with profits from noise traders. The key property of our approach is the ability to estimate the external market price without the need for price oracles or loss oracles. Our theoretical guarantees on the performance of both these algorithms, ensuring the stability and convergence of their price recommendations, are of independent interest in the theory of reinforcement learning. We empirically demonstrate the robustness of our algorithms to changing market conditions.

Identifiability of Product of Experts Models

  • paper_url: http://arxiv.org/abs/2310.09397
  • repo_url: None
  • paper_authors: Spencer L. Gordon, Manav Kant, Eric Ma, Leonard J. Schulman, Andrei Staicu
  • for: Studying identifiability of Product of Experts (PoE) models, layered networks that can efficiently learn to generate high-dimensional data satisfying many low-dimensional constraints.
  • methods: The model has a layer of binary latent variables and a layer of binary observables that are i.i.d. conditional on the latents; the proofs rely on root-interlacing phenomena for certain three-term recurrences.
  • results: When the latents are uniformly distributed, the model is identifiable with a number of observables equal to the number of parameters (best possible); for arbitrarily distributed latents it remains identifiable with a number of observables that is still linear in the number of parameters (within a factor of two of best possible), improving on the previous exponential bound.
    Abstract Product of experts (PoE) are layered networks in which the value at each node is an AND (or product) of the values (possibly negated) at its inputs. These were introduced as a neural network architecture that can efficiently learn to generate high-dimensional data which satisfy many low-dimensional constraints -- thereby allowing each individual expert to perform a simple task. PoEs have found a variety of applications in learning. We study the problem of identifiability of a product of experts model having a layer of binary latent variables, and a layer of binary observables that are iid conditional on the latents. The previous best upper bound on the number of observables needed to identify the model was exponential in the number of parameters. We show: (a) When the latents are uniformly distributed, the model is identifiable with a number of observables equal to the number of parameters (and hence best possible). (b) In the more general case of arbitrarily distributed latents, the model is identifiable for a number of observables that is still linear in the number of parameters (and within a factor of two of best-possible). The proofs rely on root interlacing phenomena for some special three-term recurrences.
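A toy sketch of the generative model described above follows: binary latents, binary observables formed as ANDs (products) of possibly negated latents, observed through a little i.i.d. noise. The sizes and noise level are assumptions made for illustration only.

```python
# Sample from a toy product-of-experts model with binary latents and observables.
import numpy as np

rng = np.random.default_rng(0)
n_latents, n_obs, n_samples, flip_prob = 4, 10, 1000, 0.05

# Each observable j uses a random subset of latents, each possibly negated.
uses = rng.integers(0, 2, size=(n_obs, n_latents)).astype(bool)
negate = rng.integers(0, 2, size=(n_obs, n_latents)).astype(bool)

def sample():
    z = rng.integers(0, 2, size=n_latents).astype(bool)   # uniform binary latents
    x = np.empty(n_obs, dtype=bool)
    for j in range(n_obs):
        literals = np.where(negate[j], ~z, z)[uses[j]]     # chosen (negated) latents
        x[j] = literals.all() if literals.size else True   # AND / product node
    noise = rng.random(n_obs) < flip_prob                  # iid observation noise
    return np.where(noise, ~x, x)

data = np.array([sample() for _ in range(n_samples)], dtype=int)
print(data[:5])
```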

Machine Learning Estimation of Maximum Vertical Velocity from Radar

  • paper_url: http://arxiv.org/abs/2310.09392
  • repo_url: https://github.com/ai2es/hradar2updraft
  • paper_authors: Randy J. Chase, Amy McGovern, Cameron Homeyer, Peter Marinescu, Corey Potvin
  • for: Using machine learning (U-Nets) to retrieve the maximum vertical velocity (updraft) and its areal extent from 3D gridded radar reflectivity alone.
  • methods: U-Nets are trained on simulated radar reflectivity and vertical velocity from the National Severe Storms Laboratory's Warn-on-Forecast System (WoFS); a parametric regression technique based on the Sinh-arcsinh-normal (SHASH) distribution enables both deterministic and probabilistic predictions.
  • results: The best models achieve less than 50% root mean squared error, a coefficient of determination above 0.65, and an intersection over union (IoU) above 0.45 on an independent WoFS test set; in a real-data supercell case study, the U-Net consistently underestimates dual-Doppler updraft speed estimates by 50%, and the 5 and 10 m/s updraft cores show an IoU of 0.25.
    Abstract Despite being the source region of severe weather hazards, the quantification of the fast current of upward moving air (i.e., updraft) remains unavailable for operational forecasting. Updraft proxies, like overshooting top area from satellite images, have been linked to severe weather hazards but only relate to a limited portion of the total storm updraft. This study investigates if a machine learning model, namely U-Nets, can skillfully retrieve maximum vertical velocity and its areal extent from 3-dimensional (3D) gridded radar reflectivity alone. The machine learning model is trained using simulated radar reflectivity and vertical velocity from the National Severe Storm Laboratory's convection permitting Warn on Forecast System (WoFS). A parametric regression technique using the Sinh-arcsinh-normal (SHASH) distribution is adapted to run with UNets, allowing for both deterministic and probabilistic predictions of maximum vertical velocity. The best models after hyperparameter search provided less than 50% root mean squared error, a coefficient of determination greater than 0.65 and an intersection over union (IoU) of more than 0.45 on the independent test set composed of WoFS data. Beyond the WoFS analysis, a case study was conducted using real radar data and corresponding dual-Doppler analyses of vertical velocity within a supercell. The U-Net consistently underestimates the dual-Doppler updraft speed estimates by 50%. Meanwhile, the area of the 5 and 10 m s-1 updraft cores show an IoU of 0.25. While the above statistics are not exceptional, the machine learning model enables quick distillation of 3D radar data that is related to the maximum vertical velocity which could be useful in assessing a storm's severe potential.

CORN: Co-Trained Full-Reference And No-Reference Audio Metrics

  • paper_url: http://arxiv.org/abs/2310.09388
  • repo_url: None
  • paper_authors: Pranay Manocha, Donald Williamson, Adam Finkelstein
  • for: Proposing a framework that trains full-reference (FR) and no-reference (NR) audio quality models jointly.
  • methods: The CORN framework co-trains an FR and an NR model together; after training, each model can be applied independently.
  • results: Both resulting models outperform independently trained baselines: the NR model benefits from access to reference recordings during training and, more remarkably, the co-trained FR model also outperforms its baseline despite using the same training data and architecture.
    Abstract Perceptual evaluation constitutes a crucial aspect of various audio-processing tasks. Full reference (FR) or similarity-based metrics rely on high-quality reference recordings, to which lower-quality or corrupted versions of the recording may be compared for evaluation. In contrast, no-reference (NR) metrics evaluate a recording without relying on a reference. Both the FR and NR approaches exhibit advantages and drawbacks relative to each other. In this paper, we present a novel framework called CORN that amalgamates these dual approaches, concurrently training both FR and NR models together. After training, the models can be applied independently. We evaluate CORN by predicting several common objective metrics and across two different architectures. The NR model trained using CORN has access to a reference recording during training, and thus, as one would expect, it consistently outperforms baseline NR models trained independently. Perhaps even more remarkable is that the CORN FR model also outperforms its baseline counterpart, even though it relies on the same training data and the same model architecture. Thus, a single training regime produces two independently useful models, each outperforming independently trained models.
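A structural sketch of the co-training idea follows: a shared encoder feeds an FR head (which sees both reference and degraded features) and an NR head (which sees only the degraded features), trained with a single joint loss. The feature dimensions, architecture, and targets are placeholders rather than CORN's.

```python
# Jointly train FR and NR quality heads on a shared encoder; each head can be
# used on its own after training.
import torch
import torch.nn as nn

feat_dim, hidden = 64, 32
encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
fr_head = nn.Linear(2 * hidden, 1)     # sees reference AND degraded embeddings
nr_head = nn.Linear(hidden, 1)         # sees only the degraded embedding
params = list(encoder.parameters()) + list(fr_head.parameters()) + list(nr_head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

ref = torch.randn(16, feat_dim)        # batch of reference-signal features
deg = torch.randn(16, feat_dim)        # corresponding degraded-signal features
target = torch.rand(16, 1)             # ground-truth quality scores (placeholder)

e_ref, e_deg = encoder(ref), encoder(deg)
fr_pred = fr_head(torch.cat([e_ref, e_deg], dim=1))
nr_pred = nr_head(e_deg)
loss = nn.functional.mse_loss(fr_pred, target) + nn.functional.mse_loss(nr_pred, target)
loss.backward()
opt.step()
# After training, fr_head and nr_head can each be deployed independently.
print("joint FR+NR loss:", loss.item())
```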

Identifying and examining machine learning biases on Adult dataset

  • paper_url: http://arxiv.org/abs/2310.09373
  • repo_url: None
  • paper_authors: Sahil Girhepuje
  • for: Examining whether ensemble learning can reduce machine learning model bias on the Adult dataset.
  • methods: A systematic methodology assesses bias across several categorical variables, ultimately revealing a pronounced gender-attribute bias.
  • results: The predicted wage drops from $902.91 to $774.31 when the gender attribute is switched from male to female; Kullback-Leibler divergence scores exceed 0.13, predominantly in tree-based models, indicating gender bias. The stacked ensemble aligns with the individual models, showing that the bias persists; the study stresses ethical considerations and advocates hybrid models for fairer, more inclusive data-driven systems.
    Abstract This research delves into the reduction of machine learning model bias through Ensemble Learning. Our rigorous methodology comprehensively assesses bias across various categorical variables, ultimately revealing a pronounced gender attribute bias. The empirical evidence unveils a substantial gender-based wage prediction disparity: wages predicted for males, initially at \$902.91, significantly decrease to \$774.31 when the gender attribute is alternated to females. Notably, Kullback-Leibler divergence scores point to gender bias, with values exceeding 0.13, predominantly within tree-based models. Employing Ensemble Learning elucidates the quest for fairness and transparency. Intriguingly, our findings reveal that the stacked model aligns with individual models, confirming the resilience of model bias. This study underscores ethical considerations and advocates the implementation of hybrid models for a data-driven society marked by impartiality and inclusivity.
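A sketch of the two measurements mentioned above, on synthetic stand-in data rather than the Adult dataset: flip the gender attribute to compare predicted wages, and compute a KL divergence between the two prediction distributions. The feature set, coding, and wage model are assumptions made for the sketch.

```python
# Counterfactual gender flip plus a histogram-based KL divergence check.
import numpy as np
from scipy.stats import entropy
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
gender = rng.integers(0, 2, size=n)           # assumed coding: 1 = male, 0 = female
education = rng.integers(8, 20, size=n)
hours = rng.integers(20, 60, size=n)
# Synthetic wages with a built-in gender gap so the effect is visible.
wage = 300 + 30 * education + 5 * hours + 120 * gender + rng.normal(0, 50, n)

X = np.column_stack([gender, education, hours])
model = RandomForestRegressor(random_state=0).fit(X, wage)

X_male, X_female = X.copy(), X.copy()
X_male[:, 0], X_female[:, 0] = 1, 0           # counterfactual: set the gender attribute
pred_m, pred_f = model.predict(X_male), model.predict(X_female)
print("mean predicted wage if male / if female:",
      round(pred_m.mean(), 2), round(pred_f.mean(), 2))

# KL divergence between the two prediction distributions (histogram estimate).
bins = np.histogram_bin_edges(np.concatenate([pred_m, pred_f]), bins=30)
p, _ = np.histogram(pred_m, bins=bins, density=True)
q, _ = np.histogram(pred_f, bins=bins, density=True)
print("KL(male || female):", round(float(entropy(p + 1e-12, q + 1e-12)), 3))
```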

From Words and Exercises to Wellness: Farsi Chatbot for Self-Attachment Technique

  • paper_url: http://arxiv.org/abs/2310.09362
  • repo_url: None
  • paper_authors: Sina Elahimanesh, Shayan Salehi, Sara Zahedi Movahed, Lisa Alazraki, Ruoyu Hu, Abbas Edalat
  • for: Developing a Farsi voice-capable chatbot for digital psychotherapy that guides users through Self-Attachment (SAT), a self-administered technique based on attachment theory.
  • methods: A dynamic array of rule-based and classification-based modules interprets user input and navigates a dialogue flowchart, recommending SAT exercises suited to the user's emotional and mental state; a novel sentiment-analysis module classifies user sentiment into 12 classes with over 92% accuracy, responses are retrieved from a large utterance set built with Farsi GPT-2 and reinforcement learning, and a SAT Teacher module answers questions about the technique.
  • results: In a ten-day study with 52 non-clinical volunteers, 75% found the chatbot engaging, 72% felt better after the interactions, and 74% were satisfied with the SAT Teacher's performance.
    Abstract In the wake of the post-pandemic era, marked by social isolation and surging rates of depression and anxiety, conversational agents based on digital psychotherapy can play an influential role compared to traditional therapy sessions. In this work, we develop a voice-capable chatbot in Farsi to guide users through Self-Attachment (SAT), a novel, self-administered, holistic psychological technique based on attachment theory. Our chatbot uses a dynamic array of rule-based and classification-based modules to comprehend user input throughout the conversation and navigates a dialogue flowchart accordingly, recommending appropriate SAT exercises that depend on the user's emotional and mental state. In particular, we collect a dataset of over 6,000 utterances and develop a novel sentiment-analysis module that classifies user sentiment into 12 classes, with accuracy above 92%. To keep the conversation novel and engaging, the chatbot's responses are retrieved from a large dataset of utterances created with the aid of Farsi GPT-2 and a reinforcement learning approach, thus requiring minimal human annotation. Our chatbot also offers a question-answering module, called SAT Teacher, to answer users' questions about the principles of Self-Attachment. Finally, we design a cross-platform application as the bot's user interface. We evaluate our platform in a ten-day human study with N=52 volunteers from the non-clinical population, who have had over 2,000 dialogues in total with the chatbot. The results indicate that the platform was engaging to most users (75%), 72% felt better after the interactions, and 74% were satisfied with the SAT Teacher's performance.

Is Certifying $\ell_p$ Robustness Still Worthwhile?

  • paper_url: http://arxiv.org/abs/2310.09361
  • repo_url: None
  • paper_authors: Ravi Mangal, Klas Leino, Zifan Wang, Kai Hu, Weicheng Yu, Corina Pasareanu, Anupam Datta, Matt Fredrikson
  • for: Re-assessing the practical value of robustness research in machine learning now that the field has matured.
  • methods: Examines certified defenses against $\ell_p$-bounded attacks, structured around three questions: why robustness research matters, why the $\ell_p$-bounded threat model matters, and why certification rather than empirical defenses.
  • results: Argues that local robustness certification does confer practical value to machine learning, that the $\ell_p$-bounded threat model is a minimal requirement for safety-critical applications, and that certified training techniques are a particularly promising way to learn robust models.
    Abstract Over the years, researchers have developed myriad attacks that exploit the ubiquity of adversarial examples, as well as defenses that aim to guard against the security vulnerabilities posed by such attacks. Of particular interest to this paper are defenses that provide provable guarantees against the class of $\ell_p$-bounded attacks. Certified defenses have made significant progress, taking robustness certification from toy models and datasets to large-scale problems like ImageNet classification. While this is undoubtedly an interesting academic problem, as the field has matured, its impact in practice remains unclear, thus we find it useful to revisit the motivation for continuing this line of research. There are three layers to this inquiry, which we address in this paper: (1) why do we care about robustness research? (2) why do we care about the $\ell_p$-bounded threat model? And (3) why do we care about certification as opposed to empirical defenses? In brief, we take the position that local robustness certification indeed confers practical value to the field of machine learning. We focus especially on the latter two questions from above. With respect to the first of the two, we argue that the $\ell_p$-bounded threat model acts as a minimal requirement for safe application of models in security-critical domains, while at the same time, evidence has mounted suggesting that local robustness may lead to downstream external benefits not immediately related to robustness. As for the second, we argue that (i) certification provides a resolution to the cat-and-mouse game of adversarial attacks; and furthermore, that (ii) perhaps contrary to popular belief, there may not exist a fundamental trade-off between accuracy, robustness, and certifiability, while moreover, certified training techniques constitute a particularly promising way for learning robust models.

Exact Verification of ReLU Neural Control Barrier Functions

  • paper_url: http://arxiv.org/abs/2310.09360
  • repo_url: https://github.com/hongchaozhang-hz/exactverif-reluncbf-nips23
  • paper_authors: Hongchao Zhang, Junlin Wu, Yevgeniy Vorobeychik, Andrew Clark
  • for: This paper is written for safe control of nonlinear systems using machine learning methods, specifically focusing on verifying the safety of feedforward neural control barrier functions (NCBFs) with ReLU activation functions.
  • methods: The paper proposes novel exact conditions and algorithms for verifying the safety of NCBFs with ReLU activation functions. The approach involves decomposing the NCBF into piecewise linear segments, solving a nonlinear program to verify safety of each segment, and using Interval Bound Propagation (IBP) and linear relaxation to mitigate the complexity.
  • results: The paper presents numerical studies comparing the proposed approach with state-of-the-art SMT-based methods, demonstrating the effectiveness and efficiency of the proposed method. The code is available at https://github.com/HongchaoZhang-HZ/exactverif-reluncbf-nips23.
    Abstract Control Barrier Functions (CBFs) are a popular approach for safe control of nonlinear systems. In CBF-based control, the desired safety properties of the system are mapped to nonnegativity of a CBF, and the control input is chosen to ensure that the CBF remains nonnegative for all time. Recently, machine learning methods that represent CBFs as neural networks (neural control barrier functions, or NCBFs) have shown great promise due to the universal representability of neural networks. However, verifying that a learned CBF guarantees safety remains a challenging research problem. This paper presents novel exact conditions and algorithms for verifying safety of feedforward NCBFs with ReLU activation functions. The key challenge in doing so is that, due to the piecewise linearity of the ReLU function, the NCBF will be nondifferentiable at certain points, thus invalidating traditional safety verification methods that assume a smooth barrier function. We resolve this issue by leveraging a generalization of Nagumo's theorem for proving invariance of sets with nonsmooth boundaries to derive necessary and sufficient conditions for safety. Based on this condition, we propose an algorithm for safety verification of NCBFs that first decomposes the NCBF into piecewise linear segments and then solves a nonlinear program to verify safety of each segment as well as the intersections of the linear segments. We mitigate the complexity by only considering the boundary of the safe region and by pruning the segments with Interval Bound Propagation (IBP) and linear relaxation. We evaluate our approach through numerical studies with comparison to state-of-the-art SMT-based methods. Our code is available at https://github.com/HongchaoZhang-HZ/exactverif-reluncbf-nips23.

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

  • paper_url: http://arxiv.org/abs/2310.09336
  • repo_url: None
  • paper_authors: Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka
  • for: 本研究旨在理解 conditional diffusion models 在实际应用中的可 compose 性。
  • methods: 我们在 synthetic 设置中控制了不同的训练数据属性,测试模型对样本生成的能力,并研究模型在不同任务中的性能。
  • results: 我们发现:(i)生成样本的顺序取决于数据生成过程的结构;(ii)在compositional任务中,性能会有突然的“出现”,这与生成模型中的 multiplicative 依赖性有关;(iii)生成不同频率的概念需要更多的优化步骤。
    Abstract Modern generative models exhibit unprecedented capabilities to generate extremely realistic data. However, given the inherent compositionality of the real world, reliable use of these models in practical applications requires that they exhibit the capability to compose a novel set of concepts to generate outputs not seen in the training data set. Prior work demonstrates that recent diffusion models do exhibit intriguing compositional generalization abilities, but also fail unpredictably. Motivated by this, we perform a controlled study for understanding compositional generalization in conditional diffusion models in a synthetic setting, varying different attributes of the training data and measuring the model's ability to generate samples out-of-distribution. Our results show: (i) the order in which the ability to generate samples from a concept and compose them emerges is governed by the structure of the underlying data-generating process; (ii) performance on compositional tasks exhibits a sudden ``emergence'' due to multiplicative reliance on the performance of constituent tasks, partially explaining emergent phenomena seen in generative models; and (iii) composing concepts with lower frequency in the training data to generate out-of-distribution samples requires considerably more optimization steps compared to generating in-distribution samples. Overall, our study lays a foundation for understanding capabilities and compositionality in generative models from a data-centric perspective.

Statistical guarantees for stochastic Metropolis-Hastings

  • paper_url: http://arxiv.org/abs/2310.09335
  • repo_url: https://github.com/sbieringer/csmala
  • paper_authors: Sebastian Bieringer, Gregor Kasieczka, Maximilian F. Steffen, Mathias Trabs
  • for: Studying the Metropolis-Hastings step commonly used in gradient-based Markov chain Monte Carlo methods for uncertainty quantification.
  • methods: A stochastic Metropolis-Hastings step computes acceptance probabilities on batches to save computation; a simple correction term avoids the usual reduction in effective sample size, and the corrected chain is analyzed when sampling from a Gibbs posterior in a nonparametric regression setting.
  • results: For deep neural network regression, the paper proves a PAC-Bayes oracle inequality yielding optimal contraction rates and shows high coverage probability of the resulting credible sets; a numerical example in a high-dimensional parameter space shows credible sets and contraction rates behaving similarly to those of the classical Metropolis-adjusted Langevin algorithm.
    Abstract A Metropolis-Hastings step is widely used for gradient-based Markov chain Monte Carlo methods in uncertainty quantification. By calculating acceptance probabilities on batches, a stochastic Metropolis-Hastings step saves computational costs, but reduces the effective sample size. We show that this obstacle can be avoided by a simple correction term. We study statistical properties of the resulting stationary distribution of the chain if the corrected stochastic Metropolis-Hastings approach is applied to sample from a Gibbs posterior distribution in a nonparametric regression setting. Focusing on deep neural network regression, we prove a PAC-Bayes oracle inequality which yields optimal contraction rates and we analyze the diameter and show high coverage probability of the resulting credible sets. With a numerical example in a high-dimensional parameter space, we illustrate that credible sets and contraction rates of the stochastic Metropolis-Hastings algorithm indeed behave similar to those obtained from the classical Metropolis-adjusted Langevin algorithm.

Disentangled Latent Spaces Facilitate Data-Driven Auxiliary Learning

  • paper_url: http://arxiv.org/abs/2310.09278
  • repo_url: None
  • paper_authors: Geri Skenderi, Luigi Capogrosso, Andrea Toaiari, Matteo Denitto, Franco Fummi, Simone Melzi, Marco Cristani
  • for: This paper is written for improving the performance of multi-task learning (MTL) models by discovering new unrelated classification tasks and their associated labels using a weakly supervised disentanglement procedure.
  • methods: The proposed method, called Detaux, uses a weakly supervised disentanglement procedure to isolate a subspace related to the principal task and an arbitrary number of orthogonal subspaces. The disentanglement procedure is followed by a clustering procedure to generate additional classification tasks and their associated labels.
  • results: The proposed method is validated on both synthetic and real data, and various ablation studies are conducted to demonstrate its effectiveness. The results show promising improvements in the performance of MTL models using the discovered additional tasks and labels.
    Abstract In deep learning, auxiliary objectives are often used to facilitate learning in situations where data is scarce, or the principal task is extremely complex. This idea is primarily inspired by the improved generalization capability induced by solving multiple tasks simultaneously, which leads to a more robust shared representation. Nevertheless, finding optimal auxiliary tasks that give rise to the desired improvement is a crucial problem that often requires hand-crafted solutions or expensive meta-learning approaches. In this paper, we propose a novel framework, dubbed Detaux, whereby a weakly supervised disentanglement procedure is used to discover new unrelated classification tasks and the associated labels that can be exploited with the principal task in any Multi-Task Learning (MTL) model. The disentanglement procedure works at a representation level, isolating a subspace related to the principal task, plus an arbitrary number of orthogonal subspaces. In the most disentangled subspaces, through a clustering procedure, we generate the additional classification tasks, and the associated labels become their representatives. Subsequently, the original data, the labels associated with the principal task, and the newly discovered ones can be fed into any MTL framework. Extensive validation on both synthetic and real data, along with various ablation studies, demonstrate promising results, revealing the potential in what has been, so far, an unexplored connection between learning disentangled representations and MTL. The code will be made publicly available upon acceptance.

A Hybrid Approach for Depression Classification: Random Forest-ANN Ensemble on Motor Activity Signals

  • paper_url: http://arxiv.org/abs/2310.09277
  • repo_url: None
  • paper_authors: Anket Patil, Dhairya Shah, Abhishek Shah, Mokshit Gala
  • for: 本研究旨在针对现代社会中受到问题的心理健康问题,通过利用可穿戴式仪器追踪和理解心理健康状况。
  • methods: 本研究使用机器学习方法来分析可穿戴式仪器资料,并开发了一个名为混合随机阶层-神经网络的新算法,以评估仪器资料中的抑郁状态。
  • results: 本研究发现,使用这个新算法可以实现80%的准确率,从抑郁症患者的仪器资料中评估出抑郁状态。这些结果显示出这个算法在心理健康诊断中具有可靠性和潜在价值。
    Abstract Regarding the rising number of people suffering from mental health illnesses in today's society, the importance of mental health cannot be overstated. Wearable sensors, which are increasingly widely available, provide a potential way to track and comprehend mental health issues. These gadgets not only monitor everyday activities but also continuously record vital signs like heart rate, perhaps providing information on a person's mental state. Recent research has used these sensors in conjunction with machine learning methods to identify patterns relating to different mental health conditions, highlighting the immense potential of this data beyond simple activity monitoring. In this research, we present a novel algorithm called the Hybrid Random forest - Neural network that has been tailored to evaluate sensor data from depressed patients. Our method has a noteworthy accuracy of 80\% when evaluated on a special dataset that included both unipolar and bipolar depressive patients as well as healthy controls. The findings highlight the algorithm's potential for reliably determining a person's depression condition using sensor data, making a substantial contribution to the area of mental health diagnostics.
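A generic sketch of a Random Forest plus neural network hybrid using soft voting follows, on synthetic data; the paper's exact ensembling scheme, features, and motor-activity dataset are not reproduced here.

```python
# Hybrid Random Forest + MLP classifier via soft voting on stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1500, n_features=24, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

hybrid = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("ann", make_pipeline(StandardScaler(),
                              MLPClassifier(hidden_layer_sizes=(32, 16),
                                            max_iter=500, random_state=0))),
    ],
    voting="soft",   # average predicted class probabilities from both models
)
hybrid.fit(X_tr, y_tr)
print("hybrid accuracy:", hybrid.score(X_te, y_te))
```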

Genetic algorithms are strong baselines for molecule generation

  • paper_url: http://arxiv.org/abs/2310.09267
  • repo_url: None
  • paper_authors: Austin Tripp, José Miguel Hernández-Lobato
  • for: Re-examining how molecule generation methods should be evaluated and compared.
  • methods: Genetic algorithms (GAs), which generate molecules by randomly modifying known molecules, are used as baselines against a range of machine learning approaches.
  • results: GAs outperform many complicated machine learning methods on molecule-generation tasks; the authors therefore propose a "GA criterion" for peer review, requiring new algorithms to demonstrate a clear advantage over GAs, and suggest that much existing work on molecule generation should be re-assessed.
    Abstract Generating molecules, both in a directed and undirected fashion, is a huge part of the drug discovery pipeline. Genetic algorithms (GAs) generate molecules by randomly modifying known molecules. In this paper we show that GAs are very strong algorithms for such tasks, outperforming many complicated machine learning methods: a result which many researchers may find surprising. We therefore propose insisting during peer review that new algorithms must have some clear advantage over GAs, which we call the GA criterion. Ultimately our work suggests that a lot of research in molecule generation should be re-assessed.
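A bare-bones genetic algorithm skeleton of the kind used as a molecule-generation baseline follows: mutate known candidates, score them, keep the best. The string "molecules", mutation rule, and fitness function are placeholders; real systems mutate molecular graphs or SMILES and score with chemistry oracles.

```python
# Minimal GA loop: mutate seeds, evaluate, and keep the top candidates.
import random

ALPHABET = "CNOcno()=#12"

def mutate(s: str) -> str:
    if random.random() < 0.5:
        i = random.randrange(len(s) + 1)
        return s[:i] + random.choice(ALPHABET) + s[i:]       # insertion
    i = random.randrange(len(s))
    return s[:i] + random.choice(ALPHABET) + s[i + 1:]       # substitution

def fitness(s: str) -> float:
    return s.count("C") - 0.5 * abs(len(s) - 12)             # toy objective only

population = ["CCOCC", "c1ccccc1", "CC(=O)NC"]               # "known molecules" as seeds
random.seed(0)
for generation in range(30):
    offspring = [mutate(random.choice(population)) for _ in range(50)]
    population = sorted(set(population + offspring), key=fitness, reverse=True)[:20]
print("best candidate:", population[0], "fitness:", round(fitness(population[0]), 2))
```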

Towards End-to-end 4-Bit Inference on Generative Large Language Models

  • paper_url: http://arxiv.org/abs/2310.09259
  • repo_url: None
  • paper_authors: Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh
  • for: Running most of the inference computation of large generative models with 4-bit weights and activations, achieving practical speedups while maintaining accuracy.
  • methods: A hybrid quantization scheme, QUIK, compresses most weights and activations to 4 bits while keeping a small set of outlier weights and activations in higher precision; highly efficient layer-wise GPU kernels are provided.
  • results: End-to-end throughput improvements of up to 3.1x relative to FP16 execution.
    Abstract We show that the majority of the inference computations for large generative models such as LLaMA and OPT can be performed with both weights and activations being cast to 4 bits, in a way that leads to practical speedups while at the same time maintaining good accuracy. We achieve this via a hybrid quantization strategy called QUIK, which compresses most of the weights and activations to 4-bit, while keeping some outlier weights and activations in higher-precision. Crucially, our scheme is designed with computational efficiency in mind: we provide GPU kernels with highly-efficient layer-wise runtimes, which lead to practical end-to-end throughput improvements of up to 3.1x relative to FP16 execution. Code and models are provided at https://github.com/IST-DASLab/QUIK.
    摘要 我们显示了,大型生成模型(如LLaMA和OPT)的大多数推理计算可以在权重和激活都量化为4位的情况下完成,从而在保持良好准确性的同时带来实用的加速。我们使用一种名为QUIK的混合量化策略,将大多数权重和激活压缩到4位,而将少量离群权重和激活保留在更高精度下。至关重要的是,我们的方案以计算效率为设计目标:我们提供了逐层高效运行的GPU内核,从而相对于FP16执行实现最高达3.1倍的实用端到端吞吐量提升。代码和模型可在https://github.com/IST-DASLab/QUIK获取。
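
As a rough illustration of the hybrid scheme described in the abstract, the NumPy sketch below quantizes most weight columns to 4-bit integers with a per-column scale while keeping a few outlier columns in higher precision. The outlier-selection rule (largest column norms) and the symmetric rounding are simplifications, not the QUIK calibration procedure or its GPU kernels.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512)).astype(np.float32)
W[:, :4] *= 25.0                      # inject a few large-magnitude "outlier" columns

def quantize_4bit(weights, n_outliers=8):
    """Symmetric per-column INT4 quantization, keeping outlier columns in FP16."""
    col_norms = np.linalg.norm(weights, axis=0)
    mask = np.zeros(weights.shape[1], dtype=bool)
    mask[np.argsort(col_norms)[-n_outliers:]] = True      # assumed outlier heuristic

    normal = weights[:, ~mask]
    scales = np.abs(normal).max(axis=0) / 7.0              # INT4 symmetric range [-8, 7]
    q = np.clip(np.round(normal / scales), -8, 7).astype(np.int8)
    return q, scales, weights[:, mask].astype(np.float16), mask

def dequantize(q, scales, outliers_fp16, mask):
    out = np.empty((q.shape[0], mask.size), dtype=np.float32)
    out[:, ~mask] = q.astype(np.float32) * scales
    out[:, mask] = outliers_fp16.astype(np.float32)
    return out

q, scales, outliers, mask = quantize_4bit(W)
W_hat = dequantize(q, scales, outliers, mask)
print("mean abs reconstruction error:", np.abs(W - W_hat).mean())
```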

Generative Entropic Neural Optimal Transport To Map Within and Across Spaces

  • paper_url: http://arxiv.org/abs/2310.09254
  • repo_url: None
  • paper_authors: Dominik Klein, Théo Uscidda, Fabian Theis, Marco Cuturi
  • for: 这个论文是为了研究机器学习中的测量映射,即将一个空间映射到另一个空间的问题。
  • methods: 这个论文使用优化运输理论(Optimal Transport,OT)作为印导偏好,将 нейрон网络模型与OT结合使用,以实现优化测量映射。
  • results: 这个论文提出了一个统一的框架,称为生成 entropy neural optimal transport(GENOT),可以处理任意成本函数,处理随机性使用条件生成模型,可以将点映射到不同的空间,并且可以作为不平衡的解决方案。在单元细胞生物领域中,GENOT得到了良好的实践效果,用于模拟细胞发育、预测药物对细胞的反应以及细胞数据模式之间的翻译。
    Abstract Learning measure-to-measure mappings is a crucial task in machine learning, featured prominently in generative modeling. Recent years have witnessed a surge of techniques that draw inspiration from optimal transport (OT) theory. Combined with neural network models, these methods collectively known as \textit{Neural OT} use optimal transport as an inductive bias: such mappings should be optimal w.r.t. a given cost function, in the sense that they are able to move points in a thrifty way, within (by minimizing displacements) or across spaces (by being isometric). This principle, while intuitive, is often confronted with several practical challenges that require adapting the OT toolbox: cost functions other than the squared-Euclidean cost can be challenging to handle, the deterministic formulation of Monge maps leaves little flexibility, mapping across incomparable spaces raises multiple challenges, while the mass conservation constraint inherent to OT can provide too much credit to outliers. While each of these mismatches between practice and theory has been addressed independently in various works, we propose in this work an elegant framework to unify them, called \textit{generative entropic neural optimal transport} (GENOT). GENOT can accommodate any cost function; handles randomness using conditional generative models; can map points across incomparable spaces, and can be used as an \textit{unbalanced} solver. We evaluate our approach through experiments conducted on various synthetic datasets and demonstrate its practicality in single-cell biology. In this domain, GENOT proves to be valuable for tasks such as modeling cell development, predicting cellular responses to drugs, and translating between different data modalities of cells.
    摘要 学习度量到度量的映射是机器学习中非常重要的任务,广泛应用于生成模型。过去几年,有许多基于最优运输(OT)理论的技术在机器学习领域得到了广泛应用。这些方法通常被称为“神经网络最优运输”(Neural OT),它们将最优运输作为假设,即映射应该尽可能地减少权重的变化。这个原则是直观的,但在实际应用中受到许多实际挑战,例如:1. 非欧几何距离成本函数可能具有问题。2. 决定性的蒙格映射留下了少量的灵活性。3. 将点映射到不同的空间可能会遇到多种挑战。4. OT中的质量保守约束可能会给异常点提供过多的信任。尽管每个这些偏差都在不同的作品中独立地得到了解决,但我们在这个工作中提出了一个简洁的框架,叫做“生成Entropic神经最优运输”(GENOT)。GENOT可以考虑任何成本函数,可以通过条件生成模型处理随机性,可以将点映射到不同的空间,并且可以作为“不平衡”的解决方案。我们通过对各种 sintetic 数据进行实验,以及在单元细胞领域中的应用,证明了GENOT的实用性。在这个领域中,GENOT表示了值得价值的任务,例如:1. 模拟细胞发育。2. 预测细胞对药物的反应。3. 将不同数据模式的细胞翻译成另一种数据模式。
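
GENOT itself is a conditional generative model, but it builds on entropic optimal transport. As background, the sketch below solves a small discrete entropic OT problem with plain Sinkhorn iterations and a squared-Euclidean cost; the sample sizes and regularization strength are arbitrary, and nothing here is the neural sampler proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(60, 2))                 # source samples
y = rng.normal(loc=2.0, size=(80, 2))        # target samples
a = np.full(len(x), 1.0 / len(x))            # uniform source weights
b = np.full(len(y), 1.0 / len(y))            # uniform target weights

C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared-Euclidean cost matrix
eps = 0.5                                             # entropic regularization strength
K = np.exp(-C / eps)

u = np.ones(len(x))
v = np.ones(len(y))
for _ in range(500):                                  # Sinkhorn fixed-point iterations
    u = a / (K @ v)
    v = b / (K.T @ u)

P = u[:, None] * K * v[None, :]                       # entropic transport plan
print("marginal error:", abs(P.sum(1) - a).max(), abs(P.sum(0) - b).max())
print("regularized OT cost:", (P * C).sum())
```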

Insuring Smiles: Predicting routine dental coverage using Spark ML

  • paper_url: http://arxiv.org/abs/2310.09229
  • repo_url: None
  • paper_authors: Aishwarya Gupta, Rahul S. Bhogale, Priyanka Thota, Prathushkumar Dathuri, Jongwook Woo
  • for: 本研究的目的是提供一种便利个人和家庭选择适当的健康保险计划,基于收入和开支。
  • methods: 本研究使用机器学习算法,包括折衣分布、决策树、随机森林、梯度提升、分解模型和支持向量机器。
  • results: 研究通过分析计划类型、地区、 deductibles、out-of-pocket maximums 和 copayments,预测健康保险计划是否覆盖成人常规牙科服务。
    Abstract Finding suitable health insurance coverage can be challenging for individuals and small enterprises in the USA. The Health Insurance Exchange Public Use Files (Exchange PUFs) dataset provided by CMS offers valuable information on health and dental policies [1]. In this paper, we leverage machine learning algorithms to predict if a health insurance plan covers routine dental services for adults. By analyzing plan type, region, deductibles, out-of-pocket maximums, and copayments, we employ Logistic Regression, Decision Tree, Random Forest, Gradient Boost, Factorization Model and Support Vector Machine algorithms. Our goal is to provide a clinical strategy for individuals and families to select the most suitable insurance plan based on income and expenses.
    摘要 在美国,找到适合个人或小型企业的健康保险覆盖可以是一项挑战。美国医疗保险交易公共使用文件(Exchange PUFs)数据集提供了有价值的信息关于健康和牙科保险政策 [1]。在这篇论文中,我们利用机器学习算法预测健康保险计划是否覆盖成人日常牙科服务。我们分析计划类型、地区、deductibles、out-of-pocket最高限额和 copayments,并使用Logistic Regression、决策树、Random Forest、Gradient Boost、Factorization Model和Support Vector Machine算法。我们的目标是为个人和家庭提供基于收入和开支的临床策略,以选择最适合的保险计划。
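
A minimal Spark ML pipeline along the lines described above might look as follows; the CSV file name, the column names, and the binary label `covers_adult_dental` are hypothetical stand-ins for fields derived from the Exchange PUFs, and logistic regression is shown only as one of the several classifiers the study compares.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("dental-coverage").getOrCreate()

# Hypothetical extract of plan attributes; assumes covers_adult_dental is a 0/1 numeric column.
plans = spark.read.csv("plan_attributes.csv", header=True, inferSchema=True)

stages = [
    StringIndexer(inputCol="plan_type", outputCol="plan_type_idx", handleInvalid="keep"),
    StringIndexer(inputCol="state_code", outputCol="state_idx", handleInvalid="keep"),
    VectorAssembler(
        inputCols=["plan_type_idx", "state_idx", "deductible", "oop_max", "copay"],
        outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="covers_adult_dental"),
]

train, test = plans.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=stages).fit(train)
model.transform(test).select("covers_adult_dental", "prediction", "probability").show(5)
```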

Regularization-Based Methods for Ordinal Quantification

  • paper_url: http://arxiv.org/abs/2310.09210
  • repo_url: https://github.com/mirkobunse/regularized-oq
  • paper_authors: Mirko Bunse, Alejandro Moreo, Fabrizio Sebastiani, Martin Senz
  • for: 研究预测分类问题中的排序问题(ordinal quantification,OQ),提供了两个新的资料集,并比较了过去文献中提出的主要算法。
  • methods: 使用了多种不同的研究领域的算法,包括数据挖掘和天文学,并将其对比测试。
  • results: 提出了一种新的规 regularized OQ 算法,它在实验中超过了现有的算法表现,并且通过了一些实际应用中的验证。
    Abstract Quantification, i.e., the task of training predictors of the class prevalence values in sets of unlabeled data items, has received increased attention in recent years. However, most quantification research has concentrated on developing algorithms for binary and multiclass problems in which the classes are not ordered. Here, we study the ordinal case, i.e., the case in which a total order is defined on the set of n>2 classes. We give three main contributions to this field. First, we create and make available two datasets for ordinal quantification (OQ) research that overcome the inadequacies of the previously available ones. Second, we experimentally compare the most important OQ algorithms proposed in the literature so far. To this end, we bring together algorithms proposed by authors from very different research fields, such as data mining and astrophysics, who were unaware of each others' developments. Third, we propose a novel class of regularized OQ algorithms, which outperforms existing algorithms in our experiments. The key to this gain in performance is that our regularization prevents ordinally implausible estimates, assuming that ordinal distributions tend to be smooth in practice. We informally verify this assumption for several real-world applications.
    摘要 它的量化任务,即在无标签数据集中训练类预测值的任务,在最近几年内得到了更多的关注。然而,大多数量化研究集中在 binary 和多类问题上,在这些问题中,类别没有定义顺序。在这里,我们研究 ordinal 情况,即在 n > 2 个类别中定义排序。我们对这个领域做出了三个主要贡献。首先,我们创建了两个用于 ordinal 量化(OQ)研究的数据集,这些数据集在前一些不足的情况下超越了现有的数据集。第二,我们对 literature 中最重要的 OQ 算法进行了实验性比较。为此,我们将来自不同的研究领域,如数据挖掘和天文学,这些人对彼此的发展不知道的算法集成在一起。第三,我们提出了一种新的常化 OQ 算法,它在我们的实验中超越了现有的算法。这个增强的性能的关键在于,我们的常化预防了ordinally 不可能的估计,假设ordinally 分布在实际中是平滑的。我们 informally 验证了这个假设,在一些实际应用中。

Graph Condensation via Eigenbasis Matching

  • paper_url: http://arxiv.org/abs/2310.09202
  • repo_url: None
  • paper_authors: Yang Liu, Deyu Bo, Chuan Shi
  • for: 提高图数据的效率和扩展性, Graph Neural Networks (GNNs) 的计算成本和扩展性面临了更高的要求,尽管它们在各种图相关应用中表现出色。
  • methods: Graph Condensation (GC) 是一种将大图变换成小图的技术,以降低 GNNs 的计算成本。但我们的实验表明,现有的 GC 方法受到不良泛化的影响,即不同的 GNNs 在同一个小图上表现出明显的性能差距。
  • results: 我们提出了一种名为 GCEM 的 eigenbasis matching 方法,可以减少 GNNs 对图的spectrum bias,从而提高 GC 的泛化性能。我们的理论分析和实验结果都表明,GCEM 可以在五个图数据集上达到最佳性能,同时减少不同 GNNs 之间的性能差距。
    Abstract The increasing amount of graph data places requirements on the efficiency and scalability of graph neural networks (GNNs), despite their effectiveness in various graph-related applications. Recently, the emerging graph condensation (GC) sheds light on reducing the computational cost of GNNs from a data perspective. It aims to replace the real large graph with a significantly smaller synthetic graph so that GNNs trained on both graphs exhibit comparable performance. However, our empirical investigation reveals that existing GC methods suffer from poor generalization, i.e., different GNNs trained on the same synthetic graph have obvious performance gaps. What factors hinder the generalization of GC and how can we mitigate it? To answer this question, we commence with a detailed analysis and observe that GNNs will inject spectrum bias into the synthetic graph, resulting in a distribution shift. To tackle this issue, we propose eigenbasis matching for spectrum-free graph condensation, named GCEM, which has two key steps: First, GCEM matches the eigenbasis of the real and synthetic graphs, rather than the graph structure, which eliminates the spectrum bias of GNNs. Subsequently, GCEM leverages the spectrum of the real graph and the synthetic eigenbasis to construct the synthetic graph, thereby preserving the essential structural information. We theoretically demonstrate that the synthetic graph generated by GCEM maintains the spectral similarity, i.e., total variation, of the real graph. Extensive experiments conducted on five graph datasets verify that GCEM not only achieves state-of-the-art performance over baselines but also significantly narrows the performance gaps between different GNNs.
    摘要 “graph neural networks(GNNs)的效率和扩展性在增加图数据的情况下面临挑战,尽管它们在各种图相关应用中表现出色。在最近,出现了图缩写(GC),它想要通过将真实的大图换成一个远小的合成图来减少GNNs的计算成本。然而,我们的实验表明,现有的GC方法受到泛化的困难,即使同一个合成图上训练不同的GNNs,它们的性能存在明显的差距。这些因素阻碍GC的泛化吗?如何消除这些问题?为了回答这个问题,我们开始了详细的分析,发现GNNs会在合成图中注入spectrum偏见,导致分布shift。为了解决这个问题,我们提出了 eigenbasis matching for spectrum-free graph condensation(GCEM),它有两个关键步骤:首先,GCEM匹配了真实图和合成图的eigenbasis,而不是图结构,从而消除了GNNs的spectrum偏见。然后,GCEM利用了真实图的spectrum和合成图的eigenbasis来构建合成图,从而保留了实际结构中的关键信息。我们论证了合成图生成者GCEM保持了实际图的spectral similarity,即总变量。在五个图数据集上进行了广泛的实验,得到的结果表明GCEM不仅超过了基eline的性能,而且在不同的GNNs之间减少了性能差距。”

A 4-approximation algorithm for min max correlation clustering

  • paper_url: http://arxiv.org/abs/2310.09196
  • repo_url: https://github.com/jannikirmai/min-max-correlation-clustering
  • paper_authors: Holger Heidrich, Jannik Irmai, Bjoern Andres
  • for: 提出了一种下界技术用于最大最小协方差聚类问题,并基于该技术开发了一种基于 combinatorial 的 4-approximation 算法 для完全图。
  • methods: 使用了一个线性 програм序列化(Kalhan et al., 2019)和一种 combinatorial 算法(Davies et al., 2023)。
  • results: 提高了前best known approximation guarantee的5和40,并通过一种扩展的简单的加入规则优化了实验性能和运行时间在多个 benchmark 数据集。
    Abstract We introduce a lower bounding technique for the min max correlation clustering problem and, based on this technique, a combinatorial 4-approximation algorithm for complete graphs. This improves upon the previous best known approximation guarantees of 5, using a linear program formulation (Kalhan et al., 2019), and 40, for a combinatorial algorithm (Davies et al., 2023). We extend this algorithm by a greedy joining heuristic and show empirically that it improves the state of the art in solution quality and runtime on several benchmark datasets.
    摘要 我们为最大-最小相关聚类问题引入了一种下界技术,并基于该技术为完全图给出了一种组合式4-近似算法。这改进了此前已知的最佳近似保证:基于线性规划形式的5(Kalhan等,2019)以及组合算法的40(Davies等,2023)。我们进一步用贪心合并启发式扩展了该算法,并在多个基准数据集上的实验表明,它在求解质量和运行时间上均改进了当前最先进水平。

Variational autoencoder with weighted samples for high-dimensional non-parametric adaptive importance sampling

  • paper_url: http://arxiv.org/abs/2310.09194
  • repo_url: https://github.com/julien6431/importance-sampling-vae
  • paper_authors: Julien Demange-Chryst, François Bachoc, Jérôme Morio, Timothé Krauth
  • for: 用于 approximating Target Distribution 的方法
  • methods: 使用 variational autoencoder parameterized distribution
  • results: 可以在高维度中更有效地Estimate rare event probability 和 draw points from target distribution,并且可以学习多modal distribution
    Abstract Probability density function estimation with weighted samples is the main foundation of all adaptive importance sampling algorithms. Classically, a target distribution is approximated either by a non-parametric model or within a parametric family. However, these models suffer from the curse of dimensionality or from their lack of flexibility. In this contribution, we suggest to use as the approximating model a distribution parameterised by a variational autoencoder. We extend the existing framework to the case of weighted samples by introducing a new objective function. The flexibility of the obtained family of distributions makes it as expressive as a non-parametric model, and despite the very high number of parameters to estimate, this family is much more efficient in high dimension than the classical Gaussian or Gaussian mixture families. Moreover, in order to add flexibility to the model and to be able to learn multimodal distributions, we consider a learnable prior distribution for the variational autoencoder latent variables. We also introduce a new pre-training procedure for the variational autoencoder to find good starting weights of the neural networks to prevent as much as possible the posterior collapse phenomenon to happen. At last, we explicit how the resulting distribution can be combined with importance sampling, and we exploit the proposed procedure in existing adaptive importance sampling algorithms to draw points from a target distribution and to estimate a rare event probability in high dimension on two multimodal problems.
    摘要 “probability density function估计使用权重样本是所有适束重要样本推断算法的基础。传统上,target分布被估计为非parametric模型或在parametric家族中。然而,这些模型受到维度缘故或其缺乏弹性。在这篇贡献中,我们建议使用一个受权重样本条件的分布来估计target分布。我们将exist的框架扩展到受权重样本的情况下,通过引入一个新的目标函数。这个分布家族的弹性使其与非parametric模型一样有表现力,并且在高维度情况下比 класси Golus Gaussian或Gaussian混合家族更高效。此外,为了增加模型的灵活性,我们考虑了一个可学习的假设分布 для variational autoencoder的隐藏变量。我们还导入了一个新的增强训练程序,以避免 posterior collapse 现象发生。最后,我们详细介绍了如何将所得到的分布与重要样本推断算法结合,并在高维度中评估了两个多模型问题上的效果。”
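
The central modification described above is an objective in which each sample contributes according to its importance weight. The PyTorch sketch below shows such a weighted negative ELBO on toy 2-D data; the learnable latent prior and the pre-training procedure from the paper are omitted, and the network sizes and weights are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x_dim, z_dim, hidden = 2, 2, 64

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, x_dim))

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()      # reparameterization trick
        return self.dec(z), mu, log_var

def weighted_neg_elbo(model, x, w):
    """Negative ELBO in which each sample is weighted by its importance weight."""
    recon, mu, log_var = model(x)
    rec = ((recon - x) ** 2).sum(dim=-1)                            # Gaussian recon. term (up to constants)
    kl = 0.5 * (mu ** 2 + log_var.exp() - 1.0 - log_var).sum(dim=-1)
    return (w * (rec + kl)).sum() / w.sum()

# Toy weighted sample: two modes with different importance weights.
x = torch.cat([torch.randn(500, 2) - 3.0, torch.randn(500, 2) + 3.0])
w = torch.cat([torch.full((500,), 0.2), torch.full((500,), 1.0)])

model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = weighted_neg_elbo(model, x, w)
    loss.backward()
    opt.step()

print("final weighted negative ELBO:", loss.item())
```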

A Deep Neural Network – Mechanistic Hybrid Model to Predict Pharmacokinetics in Rat

  • paper_url: http://arxiv.org/abs/2310.09167
  • repo_url: None
  • paper_authors: Florian Führer, Andrea Gruber, Holger Diedam, Andreas H. Göller, Stephan Menz, Sebastian Schneckener
  • for: 这项研究的目的是提高小分子药物或农药的系统可用性预测,以便更好地Focus drug or agrochemical development on compounds with favorable kinetic profiles.
  • methods: 该研究使用了一种hybrid模型,包括机器学习模型和机理学模型,以预测小分子药物或农药的系统可用性。
  • results: 研究人员通过增加数据集训练和改进机器学习模型和机理学模型的参数化,提高了模型的 median fold change error,从2.85下降到2.35 для全口暴露和从1.95下降到1.62 для intravenousadministration。此外,研究人员还扩展了该方法,以预测其他终点和处理不同的 covariates,如性别和剂量形式。
    Abstract An important aspect in the development of small molecules as drugs or agrochemicals is their systemic availability after intravenous and oral administration. The prediction of the systemic availability from the chemical structure of a potential candidate is highly desirable, as it allows to focus the drug or agrochemical development on compounds with a favorable kinetic profile. However, such predictions are challenging as the availability is the result of the complex interplay between molecular properties, biology and physiology and training data is rare. In this work we improve the hybrid model developed earlier [34]. We reduce the median fold change error for the total oral exposure from 2.85 to 2.35 and for intravenous administration from 1.95 to 1.62. This is achieved by training on a larger data set, improving the neural network architecture as well as the parametrization of the mechanistic model. Further, we extend our approach to predict additional endpoints and to handle different covariates, like sex and dosage form. In contrast to a pure machine learning model, our model is able to predict new end points on which it has not been trained. We demonstrate this feature by predicting the exposure over the first 24h, while the model has only been trained on the total exposure.
    摘要 Important aspects of small molecule development as drugs or agrochemicals include their systemic availability after intravenous and oral administration. Predicting the systemic availability from the chemical structure of a potential candidate is highly desirable, as it allows for focusing drug or agrochemical development on compounds with a favorable kinetic profile. However, such predictions are challenging due to the complex interplay between molecular properties, biology, and physiology, and training data is rare.In this work, we improve the hybrid model developed earlier [34]. We reduce the median fold change error for total oral exposure from 2.85 to 2.35 and for intravenous administration from 1.95 to 1.62. This is achieved by training on a larger data set, improving the neural network architecture, and parameterizing the mechanistic model. Additionally, we extend our approach to predict additional endpoints and handle different covariates, such as sex and dosage form.Unlike a pure machine learning model, our model can predict new endpoints it has not been trained on. We demonstrate this feature by predicting exposure over the first 24 hours, even though the model has only been trained on total exposure.
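
One way to picture a hybrid model of this kind is a network that predicts mechanistic parameters, which are then run through a pharmacokinetic model to obtain exposure. The sketch below uses a hard-coded stand-in for the trained network and a one-compartment oral-absorption model, which is far simpler than the mechanistic model in the paper; the descriptors and coefficients are made up.

```python
import numpy as np

def predict_pk_parameters(descriptors):
    """Stand-in for the trained neural network: maps molecular descriptors to
    mechanistic PK parameters (absorption rate ka, elimination rate ke,
    volume of distribution V, bioavailability F). Coefficients are made up."""
    w = np.array([[0.05, 0.02, 0.10],
                  [0.01, 0.03, 0.02],
                  [2.00, 0.50, 1.00],
                  [0.10, 0.05, 0.02]])
    raw = w @ descriptors
    ka = np.exp(raw[0])                    # 1/h
    ke = 0.1 * np.exp(raw[1])              # 1/h
    V = 5.0 + np.exp(raw[2])               # L
    F = 1.0 / (1.0 + np.exp(-raw[3]))      # oral bioavailability in (0, 1)
    return ka, ke, V, F

def oral_concentration(t, dose, ka, ke, V, F):
    """One-compartment model with first-order absorption (simplified mechanistic stage)."""
    return F * dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

descriptors = np.array([0.3, -1.2, 0.8])   # hypothetical molecular descriptors
ka, ke, V, F = predict_pk_parameters(descriptors)

t = np.linspace(0.0, 24.0, 481)            # hours
conc = oral_concentration(t, dose=10.0, ka=ka, ke=ke, V=V, F=F)
auc_0_24 = np.sum(0.5 * (conc[1:] + conc[:-1]) * np.diff(t))   # trapezoidal AUC over 0-24 h
print(f"ka={ka:.3f} 1/h, ke={ke:.3f} 1/h, V={V:.2f} L, F={F:.2f}, AUC(0-24h)={auc_0_24:.2f}")
```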

Jointly-Learned Exit and Inference for a Dynamic Neural Network : JEI-DNN

  • paper_url: http://arxiv.org/abs/2310.09163
  • repo_url: None
  • paper_authors: Florence Regol, Joud Chataoui, Mark Coates
  • for: 这个研究旨在提高大型预训练的机器学习模型在实际应用中的性能和不确定度描述能力。
  • methods: 研究采用了一种新的构建方法,将旁边检查机制(GM)和中间检查模组(IM)联系起来,从中间 Representation 进行检查和预测。
  • results: 研究获得了 significan performance 提高在分类数据集上,并且可以更好地描述不确定度信息。
    Abstract Large pretrained models, coupled with fine-tuning, are slowly becoming established as the dominant architecture in machine learning. Even though these models offer impressive performance, their practical application is often limited by the prohibitive amount of resources required for every inference. Early-exiting dynamic neural networks (EDNN) circumvent this issue by allowing a model to make some of its predictions from intermediate layers (i.e., early-exit). Training an EDNN architecture is challenging as it consists of two intertwined components: the gating mechanism (GM) that controls early-exiting decisions and the intermediate inference modules (IMs) that perform inference from intermediate representations. As a result, most existing approaches rely on thresholding confidence metrics for the gating mechanism and strive to improve the underlying backbone network and the inference modules. Although successful, this approach has two fundamental shortcomings: 1) the GMs and the IMs are decoupled during training, leading to a train-test mismatch; and 2) the thresholding gating mechanism introduces a positive bias into the predictive probabilities, making it difficult to readily extract uncertainty information. We propose a novel architecture that connects these two modules. This leads to significant performance improvements on classification datasets and enables better uncertainty characterization capabilities.
    摘要 We propose a novel EDNN architecture that connects the GM and IMs, leading to significant performance improvements on classification datasets and better uncertainty characterization capabilities.
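
A generic early-exit architecture with intermediate inference modules (IMs) and learned gates (GMs) can be sketched as below; the joint training objective of JEI-DNN is not reproduced, and the layer sizes, gate threshold, and single-sample exit rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Backbone with intermediate inference modules (IMs) and per-exit gates (GMs).
    Only the inference-time flow is shown: exit as soon as a gate fires."""

    def __init__(self, in_dim=32, hidden=64, n_classes=10, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim if i == 0 else hidden, hidden), nn.ReLU())
             for i in range(n_blocks)])
        self.ims = nn.ModuleList([nn.Linear(hidden, n_classes) for _ in range(n_blocks)])
        self.gates = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_blocks)])

    def forward(self, x):
        h = x
        for i, block in enumerate(self.blocks):
            h = block(h)
            logits = self.ims[i](h)                       # intermediate prediction
            exit_prob = torch.sigmoid(self.gates[i](h))   # learned gate, not a confidence threshold
            if i == len(self.blocks) - 1 or exit_prob.item() > 0.5:   # single-sample early exit
                return logits, i

model = EarlyExitNet()
x = torch.randn(1, 32)
with torch.no_grad():
    logits, exit_idx = model(x)
print("exited at block", exit_idx, "predicted class:", logits.argmax(dim=-1).item())
```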

The Computational Complexity of Finding Stationary Points in Non-Convex Optimization

  • paper_url: http://arxiv.org/abs/2310.09157
  • repo_url: None
  • paper_authors: Alexandros Hollender, Manolis Zampetakis
  • for: 这个论文的目的是解决非对称优化问题中找到 Approximate 站点的问题。
  • methods: 这个论文使用了 PLS-完善性和 zero-order 算法来解决这个问题。
  • results: 这个论文得到了关于 Approximate 站点的问题的 Computational 和 Query 复杂度的一系列结论,包括:1. 这个问题是 PLS-完善的;2. 对于 $d=2$,存在一种 zero-order 算法,可以在 $O(1/\varepsilon)$ 值询问中找到 $\varepsilon$- Approximate 站点;3. 任何算法都需要至少 $\Omega(1/\varepsilon)$ 值询问和/或梯度询问来找到 $\varepsilon$- Approximate 站点;4. 对于 $d=2$,存在一种 zero-order 算法,可以在 $O(1/\sqrt{\varepsilon})$ 值询问中找到 $\varepsilon$- KKT 点。
    Abstract Finding approximate stationary points, i.e., points where the gradient is approximately zero, of non-convex but smooth objective functions $f$ over unrestricted $d$-dimensional domains is one of the most fundamental problems in classical non-convex optimization. Nevertheless, the computational and query complexity of this problem are still not well understood when the dimension $d$ of the problem is independent of the approximation error. In this paper, we show the following computational and query complexity results: 1. The problem of finding approximate stationary points over unrestricted domains is PLS-complete. 2. For $d = 2$, we provide a zero-order algorithm for finding $\varepsilon$-approximate stationary points that requires at most $O(1/\varepsilon)$ value queries to the objective function. 3. We show that any algorithm needs at least $\Omega(1/\varepsilon)$ queries to the objective function and/or its gradient to find $\varepsilon$-approximate stationary points when $d=2$. Combined with the above, this characterizes the query complexity of this problem to be $\Theta(1/\varepsilon)$. 4. For $d = 2$, we provide a zero-order algorithm for finding $\varepsilon$-KKT points in constrained optimization problems that requires at most $O(1/\sqrt{\varepsilon})$ value queries to the objective function. This closes the gap between the works of Bubeck and Mikulincer [2020] and Vavasis [1993] and characterizes the query complexity of this problem to be $\Theta(1/\sqrt{\varepsilon})$. 5. Combining our results with the recent result of Fearnley et al. [2022], we show that finding approximate KKT points in constrained optimization is reducible to finding approximate stationary points in unconstrained optimization but the converse is impossible.
    摘要 “找到非凸函数$f$的近似站点(stationary points)是 классиcal non-convex 优化中的一个最基本问题。然而,在维度$d$不受限制时,这个问题的计算和询问复杂度还不够了解。在这篇论文中,我们提供以下计算和询问复杂度结果:1. 找到非凸函数$f$的近似站点问题是PLS-完备的。2. 当$d=2$时,我们提供一个零次方法来找到$\varepsilon$-近似站点,需要最多$O(1/\varepsilon)$次询问函数值。3. 我们证明任何算法都需要至少$\Omega(1/\varepsilon)$次询问函数值和/或其导数来找到$\varepsilon$-近似站点,当$d=2$时。这一结果与上述结果相结合,Characterizes this problem's query complexity as $\Theta(1/\varepsilon)$.4. 当$d=2$时,我们提供一个零次方法来找到$\varepsilon$-KKT点(KKT点),需要最多$O(1/\sqrt{\varepsilon})$次询问函数值。这一结果与Bubeck和Mikulincer(2020)和Vavasis(1993)的结果匹配,Characterizes this problem's query complexity as $\Theta(1/\sqrt{\varepsilon})$.5. 将我们的结果与Fearnley等(2022)的结果结合,我们证明找到 approximate KKT点在受限制优化中是可逆的,但是受限制优化中的KKT点不可能被转化为非凸函数的近似站点。”

Lattice Approximations in Wasserstein Space

  • paper_url: http://arxiv.org/abs/2310.09149
  • repo_url: None
  • paper_authors: Keaton Hamm, Varun Khurana
  • for: 本文研究了在 Wasserstein 空间 $W_p(\mathbb{R}^d)$ 中使用排序 Voronoi 分区法来 aproximate 离散和 piecewise 常数测度。
  • methods: 作者使用了一种扩展 Voronoi 分区法,该法基于一个缩放后的全排名 lattice $\Lambda$,并使用一个 covering 算法来确定最佳approximation。
  • results: 作者证明了,对于 $p\in[1,\infty)$ 和 $d\geq 1$,如果将 $\Lambda$ 缩放为 $h\in(0,1]$,那么使用 Voronoi 分区法 approximation 测度是 $O(h)$,不依赖于 $d$ 或 $p$。此外,作者还证明了 $N$-term approximation 的最佳速率是 $O(N^{-\frac1d})$,与已知的最佳量化器和 empirical measure approximation 的速率相同。最后,作者扩展了这些结果到非封闭支持测度。
    Abstract We consider structured approximation of measures in Wasserstein space $W_p(\mathbb{R}^d)$ for $p\in[1,\infty)$ by discrete and piecewise constant measures based on a scaled Voronoi partition of $\mathbb{R}^d$. We show that if a full rank lattice $\Lambda$ is scaled by a factor of $h\in(0,1]$, then approximation of a measure based on the Voronoi partition of $h\Lambda$ is $O(h)$ regardless of $d$ or $p$. We then use a covering argument to show that $N$-term approximations of compactly supported measures is $O(N^{-\frac1d})$ which matches known rates for optimal quantizers and empirical measure approximation in most instances. Finally, we extend these results to noncompactly supported measures with sufficient decay.
    摘要 我们考虑在 Wasserstein 空间 $W_p(\mathbb{R}^d)$($p\in[1,\infty)$)中,用基于缩放 Voronoi 划分的离散与分段常数测度对测度进行结构化近似。我们证明,如果将一个满秩格 $\Lambda$ 按因子 $h\in(0,1]$ 缩放,那么基于 $h\Lambda$ 的 Voronoi 划分的近似误差为 $O(h)$,不依赖于 $d$ 或 $p$。随后,我们利用覆盖论证证明,对紧支撑测度的 $N$ 项近似误差为 $O(N^{-\frac{1}{d}})$,这在多数情形下与已知的最优量化器及经验测度近似的速率一致。最后,我们将这些结果推广到具有充分衰减的非紧支撑测度。
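
A concrete picture of the construction, under the simplifying assumption that the full-rank lattice is the cubic lattice $h\mathbb{Z}^d$: snap every sample of an empirical measure to its nearest lattice point and aggregate weights per Voronoi cell. Because the snapping map is itself a coupling, the weighted displacement gives an upper bound on the Wasserstein cost, consistent with the $O(h)$ behaviour stated above.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, h = 3, 2000, 0.25

# Empirical measure: n weighted samples in R^d (uniform weights here).
x = rng.normal(size=(n, d))
w = np.full(n, 1.0 / n)

# Snap every sample to its nearest point of the scaled cubic lattice h * Z^d.
# (The cubic lattice is the simplest full-rank lattice; its Voronoi cells are cubes.)
lattice_points = h * np.round(x / h)

# Aggregate weights of samples falling in the same Voronoi cell -> discrete,
# piecewise constant approximation supported on the lattice.
cells, inverse = np.unique(lattice_points, axis=0, return_inverse=True)
cell_weights = np.zeros(len(cells))
np.add.at(cell_weights, inverse, w)

# The snapping map itself is a coupling, so the weighted displacement upper-bounds W_1.
disp = np.linalg.norm(x - lattice_points, axis=1)
print("support size of the approximation:", len(cells))
print("upper bound on W_1:", float(np.sum(w * disp)))
print("max displacement (<= h*sqrt(d)/2):", disp.max(), "vs", h * np.sqrt(d) / 2)
```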

Goodhart’s Law in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.09144
  • repo_url: None
  • paper_authors: Jacek Karwowski, Oliver Hayman, Xingjian Bai, Klaus Kiendlhofer, Charlie Griffin, Joar Skalse
  • for: 这篇论文主要针对的是在奖励函数不准确时,RLAlgorithm 的优化问题。
  • methods: 作者提出了一种量化奖励函数的不准确性的方法,并通过实验证明了这种方法可以预测奖励函数不准确性导致的行为。
  • results: 作者提出了一种最佳停止方法,可以避免奖励函数不准确性导致的问题,并 derivated 一种理论上的 regret bound。此外,作者还提出了一种 maximize worst-case reward 的训练方法,可以在奖励函数不确定的情况下实现。实验结果支持这种方法的有效性。
    Abstract Implementing a reward function that perfectly captures a complex task in the real world is impractical. As a result, it is often appropriate to think of the reward function as a proxy for the true objective rather than as its definition. We study this phenomenon through the lens of Goodhart's law, which predicts that increasing optimisation of an imperfect proxy beyond some critical point decreases performance on the true objective. First, we propose a way to quantify the magnitude of this effect and show empirically that optimising an imperfect proxy reward often leads to the behaviour predicted by Goodhart's law for a wide range of environments and reward functions. We then provide a geometric explanation for why Goodhart's law occurs in Markov decision processes. We use these theoretical insights to propose an optimal early stopping method that provably avoids the aforementioned pitfall and derive theoretical regret bounds for this method. Moreover, we derive a training method that maximises worst-case reward, for the setting where there is uncertainty about the true reward function. Finally, we evaluate our early stopping method experimentally. Our results support a foundation for a theoretically-principled study of reinforcement learning under reward misspecification.
    摘要 实现一个完美地捕捉复杂任务的奖函数是不实际的。因此,通常需要视奖函数为true目标的代理而不是其定义。我们通过Goodhart的法则来研究这种现象,Goodhart的法则预测,在某个关键点上增加奖函数的优化后,对真实目标的性能下降。我们首先提出了衡量这种效果的方法,并证明了,在各种环境和奖函数下,通常会出现Goodhart的法则所预测的行为。然后,我们提供了一种几何解释,解释了在Markov决策过程中Why Goodhart's law occurs。我们根据这些理论性视角,提出了一种最佳早期停止方法,该方法可以证明避免上述困难,并 derive了对该方法的 regret bound。此外,我们还 derive了一种最大化最差情况奖函数的训练方法,该方法可以在true奖函数不确定的情况下实现。最后,我们进行了实验评估。我们的结果支持了一种基于理论原则的探索学习下 reward misspecification 的研究基础。

Computing Marginal and Conditional Divergences between Decomposable Models with Applications

  • paper_url: http://arxiv.org/abs/2310.09129
  • repo_url: None
  • paper_authors: Loong Kuan Lee, Geoffrey I. Webb, Daniel F. Schmidt, Nico Piatkowski
    for:这种论文的主要目标是计算高维分布之间的差异,具体来说是 alpha-beta 差异。methods:这种方法是基于 Markov 网络的 decomposable 模型,通过将差异分解成 marginal 和 conditional 分布的差异来计算差异。results:这种方法可以对高维分布进行 exact 计算,并且可以用于分析分布的变化。在一个图像数据集上进行了实验,并且提出了一种新的量化错误方法。
    Abstract The ability to compute the exact divergence between two high-dimensional distributions is useful in many applications but doing so naively is intractable. Computing the alpha-beta divergence -- a family of divergences that includes the Kullback-Leibler divergence and Hellinger distance -- between the joint distribution of two decomposable models, i.e chordal Markov networks, can be done in time exponential in the treewidth of these models. However, reducing the dissimilarity between two high-dimensional objects to a single scalar value can be uninformative. Furthermore, in applications such as supervised learning, the divergence over a conditional distribution might be of more interest. Therefore, we propose an approach to compute the exact alpha-beta divergence between any marginal or conditional distribution of two decomposable models. Doing so tractably is non-trivial as we need to decompose the divergence between these distributions and therefore, require a decomposition over the marginal and conditional distributions of these models. Consequently, we provide such a decomposition and also extend existing work to compute the marginal and conditional alpha-beta divergence between these decompositions. We then show how our method can be used to analyze distributional changes by first applying it to a benchmark image dataset. Finally, based on our framework, we propose a novel way to quantify the error in contemporary superconducting quantum computers. Code for all experiments is available at: https://lklee.dev/pub/2023-icdm/code
    摘要 在许多应用中,精确计算两个高维分布之间的散度非常有用,但朴素地计算是不可行的。对于两个可分解模型(即弦图马尔可夫网络)的联合分布,计算 alpha-beta 散度(一族包括 Kullback-Leibler 散度和 Hellinger 距离的散度)可以在关于树宽呈指数的时间内完成。然而,把两个高维对象之间的差异压缩为单个标量可能信息不足;此外,在监督学习等应用中,人们往往更关心条件分布上的散度。因此,我们提出了一种方法,用于精确计算两个可分解模型的任意边缘分布或条件分布之间的 alpha-beta 散度。要高效地做到这一点并不容易,因为需要把散度分解到这些模型的边缘分布和条件分布上。我们给出了这样的分解,并扩展了已有工作以计算这些分解之间的边缘与条件 alpha-beta 散度。随后,我们将该方法应用于一个基准图像数据集以分析分布变化。最后,基于我们的框架,我们提出了一种量化当代超导量子计算机误差的新方法。所有实验代码见:https://lklee.dev/pub/2023-icdm/code
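
To see why such a decomposition makes the computation tractable, the sketch below handles only the simplest special case: the Kullback-Leibler divergence between two chain-structured (hence decomposable) discrete models, computed as a marginal term plus expected conditional terms. The general alpha-beta family and the junction-tree machinery from the paper are not covered.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_states = 5, 3

def random_chain(rng):
    """A chain-structured model: an initial distribution and per-edge conditional tables."""
    init = rng.dirichlet(np.ones(n_states))
    conds = [rng.dirichlet(np.ones(n_states), size=n_states) for _ in range(n_vars - 1)]
    return init, conds                       # conds[i][a, b] = P(x_{i+1} = b | x_i = a)

def chain_kl(p, q):
    """KL(p || q) for two chains, via the marginal/conditional decomposition."""
    p_init, p_conds = p
    q_init, q_conds = q
    kl = np.sum(p_init * np.log(p_init / q_init))         # marginal term on the first variable
    marg = p_init
    for pc, qc in zip(p_conds, q_conds):
        # expected conditional KL, weighted by the current marginal of the parent variable
        kl += np.sum(marg[:, None] * pc * np.log(pc / qc))
        marg = marg @ pc                                   # propagate the marginal forward
    return kl

p, q = random_chain(rng), random_chain(rng)
print("KL(p||q) =", chain_kl(p, q))
print("KL(p||p) =", chain_kl(p, p))          # sanity check: should be ~0
```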

On Generalization Bounds for Projective Clustering

  • paper_url: http://arxiv.org/abs/2310.09127
  • repo_url: None
  • paper_authors: Maria Sofia Bucarelli, Matilde Fjeldsø Larsen, Chris Schwiegelshohn, Mads Bech Toftrup
  • for: 这 paper written for 研究 clustering 问题,具体来说是研究 center-based 和 subspace clustering 问题的学习约束。
  • methods: 这 paper 使用了学习约束来研究 clustering 问题,包括 $k$-means 和 $k$-median 等中心基本的目标函数,以及 $j$-dimensional 子空间 clustering。
  • results: 该论文给出了对这些聚类问题学习界的证明,包括 center-based 问题的 $\tilde{O}\left(\sqrt{\frac{k}{n}}\right)$ 收敛速率,以及 $j$ 维子空间聚类问题的 $\tilde{O}\left(\sqrt{\frac{kj^2}{n}}\right)$ 收敛速率,其中多数结果都是首次得到的。此外,论文还证明了 projective clustering 问题需要 $\Omega\left(\sqrt{\frac{kj}{n}}\right)$ 的收敛速率,从而说明 [Fefferman, Mitter, and Narayanan, Journal of the Mathematical Society 2016] 的界基本上是最优的。
    Abstract Given a set of points, clustering consists of finding a partition of a point set into $k$ clusters such that the center to which a point is assigned is as close as possible. Most commonly, centers are points themselves, which leads to the famous $k$-median and $k$-means objectives. One may also choose centers to be $j$ dimensional subspaces, which gives rise to subspace clustering. In this paper, we consider learning bounds for these problems. That is, given a set of $n$ samples $P$ drawn independently from some unknown, but fixed distribution $\mathcal{D}$, how quickly does a solution computed on $P$ converge to the optimal clustering of $\mathcal{D}$? We give several near optimal results. In particular, for center-based objectives, we show a convergence rate of $\tilde{O}\left(\sqrt{k/n}\right)$. This matches the known optimal bounds of [Fefferman, Mitter, and Narayanan, Journal of the Mathematical Society 2016] and [Bartlett, Linder, and Lugosi, IEEE Trans. Inf. Theory 1998] for $k$-means and extends it to other important objectives such as $k$-median. For subspace clustering with $j$-dimensional subspaces, we show a convergence rate of $\tilde{O}\left(\sqrt{\frac{kj^2}{n}}\right)$. These are the first provable bounds for most of these problems. For the specific case of projective clustering, which generalizes $k$-means, we show a convergence rate of $\Omega\left(\sqrt{\frac{kj}{n}}\right)$ is necessary, thereby proving that the bounds from [Fefferman, Mitter, and Narayanan, Journal of the Mathematical Society 2016] are essentially optimal.
    摘要 给定一个点集,聚类的目标是将点集划分为 $k$ 个簇,使得每个点到其被分配的中心尽可能近。通常,中心本身就是点,这导致了著名的 $k$-median 和 $k$-means 目标。也可以选择 $j$ 维子空间作为中心,这就得到了子空间聚类。在这篇论文中,我们考虑这些问题的学习界。即,给定从某个未知但固定的分布 $\mathcal{D}$ 独立采样得到的 $n$ 个样本 $P$,在 $P$ 上计算出的解以多快的速度收敛到 $\mathcal{D}$ 的最优聚类?我们给出了一些近乎最优的结果。具体来说,对于基于中心的目标,我们证明了 $\tilde{O}\left(\sqrt{k/n}\right)$ 的收敛速率,这与 [Fefferman, Mitter, and Narayanan, Journal of the Mathematical Society 2016] 和 [Bartlett, Linder, and Lugosi, IEEE Trans. Inf. Theory 1998] 针对 $k$-means 的已知最优界相匹配,并将其推广到 $k$-median 等其他重要目标。对于 $j$ 维子空间聚类,我们证明了 $\tilde{O}\left(\sqrt{\frac{kj^2}{n}}\right)$ 的收敛速率。这些是其中大多数问题的首个可证明界。对于推广了 $k$-means 的 projective clustering 这一特例,我们证明了 $\Omega\left(\sqrt{\frac{kj}{n}}\right)$ 的收敛速率是必要的,从而证明 [Fefferman, Mitter, and Narayanan, Journal of the Mathematical Society 2016] 的界基本上是最优的。

Automatic Music Playlist Generation via Simulation-based Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.09123
  • repo_url: None
  • paper_authors: Federico Tomasi, Joseph Cauteruccio, Surya Kanoria, Kamil Ciosek, Matteo Rinaldi, Zhenwen Dai
  • for: 这项研究旨在提高个性化播放列表的品质,以便更好地满足用户的需求。
  • methods: 该研究使用了人工智能技术,具体来说是使用改进的深度Q学习策略(AH-DQN),通过在模拟的播放列表生成环境中直接优化用户满意度指标来解决了传统的合作过滤方法的局限性。
  • results: 研究人员通过在模拟环境中进行了Offline分析和评估,并在在线A/B测试中证明了该策略可以提高用户满意度指标。此外,研究人员还发现了与实际在线 metric 结果之间的强相关性。
    Abstract Personalization of playlists is a common feature in music streaming services, but conventional techniques, such as collaborative filtering, rely on explicit assumptions regarding content quality to learn how to make recommendations. Such assumptions often result in misalignment between offline model objectives and online user satisfaction metrics. In this paper, we present a reinforcement learning framework that solves for such limitations by directly optimizing for user satisfaction metrics via the use of a simulated playlist-generation environment. Using this simulator we develop and train a modified Deep Q-Network, the action head DQN (AH-DQN), in a manner that addresses the challenges imposed by the large state and action space of our RL formulation. The resulting policy is capable of making recommendations from large and dynamic sets of candidate items with the expectation of maximizing consumption metrics. We analyze and evaluate agents offline via simulations that use environment models trained on both public and proprietary streaming datasets. We show how these agents lead to better user-satisfaction metrics compared to baseline methods during online A/B tests. Finally, we demonstrate that performance assessments produced from our simulator are strongly correlated with observed online metric results.
    摘要 个人化播放列表是音乐流媒体服务的常见特性,但传统的技术,如共同识别,通常会基于明确的内容质量假设来学习如何提供建议。这些假设常导致在线模型目标与用户满意度指标之间的不一致。在这篇论文中,我们提出了一种使用强化学习框架来解决这些限制,直接优化用户满意度指标。使用这个模拟器,我们开发了一种修改后的深度Q网络(AH-DQN),以解决我们的RL形式中的挑战。这种策略能够从大型和动态的候选项集中选择,以期 maximize consumption metrics。我们通过使用环境模型,在公共和专用的流媒体数据集上进行了下线分析和评估。我们显示了这些代理比基eline方法在线A/B测试中的更好的用户满意度指标。最后,我们证明了我们的模拟器生成的性能评估与实际上线 metric 结果之间存在强相关性。

Topological Data Analysis in smart manufacturing processes – A survey on the state of the art

  • paper_url: http://arxiv.org/abs/2310.09319
  • repo_url: None
  • paper_authors: Martin Uray, Barbara Giunti, Michael Kerber, Stefan Huber
  • for: 这篇论文主要是为了探讨数据分析方法 topological data analysis (TDA) 在工业生产和生产过程中的应用。
  • methods: 论文使用了数据分析方法 topological data analysis (TDA) 来分析复杂多维数据。
  • results: 论文通过对工业生产和生产过程中的应用来分析 TDA 的应用和其工具的优势,同时还提出了这些方法的挑战和未来可能性。
    Abstract Topological Data Analysis (TDA) is a mathematical method using techniques from topology for the analysis of complex, multi-dimensional data that has been widely and successfully applied in several fields such as medicine, material science, biology, and others. This survey summarizes the state of the art of TDA in yet another application area: industrial manufacturing and production in the context of Industry 4.0. We perform a rigorous and reproducible literature search of applications of TDA on the setting of industrial production and manufacturing. The resulting works are clustered and analyzed based on their application area within the manufacturing process and their input data type. We highlight the key benefits of TDA and their tools in this area and describe its challenges, as well as future potential. Finally, we discuss which TDA methods are underutilized in (the specific area of) industry and the identified types of application, with the goal of prompting more research in this profitable area of application.
    摘要

Online Relocating and Matching of Ride-Hailing Services: A Model-Based Modular Approach

  • paper_url: http://arxiv.org/abs/2310.09071
  • repo_url: None
  • paper_authors: Chang Gao, Xi Lin, Fang He, Xindi Tang
  • for: 这种研究旨在提出一种基于模型的模块化方法(MMA),用于动态优化乘客请求和车辆重新分配在乘客请求平台上。
  • methods: MMA使用了两层和模块化的模型结构,其中上层确定系统中车流的空间传递模式,以最大化当前和未来阶段的总收入。下层使用快速匹配和车辆重新分配。
  • results: 我们证明了提出的算法可以在涂抹网络中 достичь全球优化,而数值实验基于寓言网络和实际数据显示,MMA可以在乘客请求和车辆重新分配方面实现更高的系统性性能,并且具有较低的计算成本和较高的Robustness。
    Abstract This study proposes an innovative model-based modular approach (MMA) to dynamically optimize order matching and vehicle relocation in a ride-hailing platform. MMA utilizes a two-layer and modular modeling structure. The upper layer determines the spatial transfer patterns of vehicle flow within the system to maximize the total revenue of the current and future stages. With the guidance provided by the upper layer, the lower layer performs rapid vehicle-to-order matching and vehicle relocation. MMA is interpretable, and equipped with the customized and polynomial-time algorithm, which, as an online order-matching and vehicle-relocation algorithm, can scale past thousands of vehicles. We theoretically prove that the proposed algorithm can achieve the global optimum in stylized networks, while the numerical experiments based on both the toy network and realistic dataset demonstrate that MMA is capable of achieving superior systematic performance compared to batch matching and reinforcement-learning based methods. Moreover, its modular and lightweight modeling structure further enables it to achieve a high level of robustness against demand variation while maintaining a relatively low computational cost.
    摘要 MMA is interpretable and equipped with a customized and polynomial-time algorithm, which can scale past thousands of vehicles. We prove that the proposed algorithm can achieve the global optimum in stylized networks, and numerical experiments based on both a toy network and realistic dataset demonstrate that MMA can achieve superior systematic performance compared to batch matching and reinforcement-learning based methods. Additionally, its modular and lightweight modeling structure enables it to achieve a high level of robustness against demand variation while maintaining a relatively low computational cost.
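
The lower layer's rapid vehicle-to-order matching can be illustrated with a one-shot assignment problem; in the sketch below the Hungarian algorithm minimizes a cost that mixes pickup distance with a per-zone value standing in for the upper layer's guidance. The zone values, the cost weighting, and the toy coordinates are made up, not the model from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_vehicles, n_orders = 8, 6

vehicle_pos = rng.uniform(0, 10, size=(n_vehicles, 2))
order_pickup = rng.uniform(0, 10, size=(n_orders, 2))
order_zone = rng.integers(0, 3, size=n_orders)

# Stand-in for the upper layer's guidance: a per-zone value of serving a request now,
# e.g. reflecting expected current plus future revenue of sending a vehicle there.
zone_value = np.array([1.0, 0.4, 0.8])

pickup_dist = np.linalg.norm(vehicle_pos[:, None, :] - order_pickup[None, :, :], axis=-1)
cost = pickup_dist - 5.0 * zone_value[order_zone]          # lower cost = more attractive match

rows, cols = linear_sum_assignment(cost)                    # rapid one-shot matching
for v, o in zip(rows, cols):
    print(f"vehicle {v} -> order {o} (pickup distance {pickup_dist[v, o]:.2f}, zone {order_zone[o]})")
```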

MINDE: Mutual Information Neural Diffusion Estimation

  • paper_url: http://arxiv.org/abs/2310.09031
  • repo_url: None
  • paper_authors: Giulio Franzese, Mustapha Bounoua, Pietro Michiardi
  • for: 本文提出了一种新的穿梭方法来估计随机变量之间的共轭信息(Mutual Information,MI)。
  • methods: 该方法基于 Girсанов定理的新解释,使用分数函数扩散模型来估计两个分布之间的卷积列比(Kullback Leibler divergence),并且同时可以估计随机变量的熵。
  • results: 我们的方法比文献中主要的方法更准确,特别是对于困难的分布。此外,我们的方法通过自我一致性测试,包括数据处理和独立性测试,得出了正面的结果。
    Abstract In this work we present a new method for the estimation of Mutual Information (MI) between random variables. Our approach is based on an original interpretation of the Girsanov theorem, which allows us to use score-based diffusion models to estimate the Kullback Leibler divergence between two densities as a difference between their score functions. As a by-product, our method also enables the estimation of the entropy of random variables. Armed with such building blocks, we present a general recipe to measure MI, which unfolds in two directions: one uses conditional diffusion process, whereas the other uses joint diffusion processes that allow simultaneous modelling of two random variables. Our results, which derive from a thorough experimental protocol over all the variants of our approach, indicate that our method is more accurate than the main alternatives from the literature, especially for challenging distributions. Furthermore, our methods pass MI self-consistency tests, including data processing and additivity under independence, which instead are a pain-point of existing methods.
    摘要 在这项工作中,我们提出了一种估计随机变量之间互信息(Mutual Information,MI)的新方法。我们的方法基于对 Girsanov 定理的一种新的解读,使我们能够利用基于分数的扩散模型,把两个密度之间的 Kullback-Leibler 散度估计为它们分数函数之差。作为副产品,我们的方法还可以估计随机变量的熵。有了这些基本模块,我们给出了一个度量 MI 的通用方案,它沿两个方向展开:一个使用条件扩散过程,另一个使用可同时建模两个随机变量的联合扩散过程。我们对方法的所有变体执行了严格的实验协议,结果表明,我们的方法比文献中的主要替代方案更准确,尤其是在具有挑战性的分布上。此外,我们的方法通过了 MI 自洽性测试(包括数据处理不等式与独立情形下的可加性),而这些测试正是现有方法的痛点。

Federated Meta-Learning for Few-Shot Fault Diagnosis with Representation Encoding

  • paper_url: http://arxiv.org/abs/2310.09002
  • repo_url: None
  • paper_authors: Jixuan Cui, Jun Li, Zhen Mei, Kang Wei, Sha Wei, Ming Ding, Wen Chen, Song Guo
  • for: 这个论文主要是为了提出一个基于深度学习的整体疾病诊断(FD)方法,并且使用联合学习(FL)来实现跨机构的训练。
  • methods: 这个方法使用了一种新的训练策略,即基于表示编码和元学习的整合方法,以将训练客户端的内在多样性转化为对不同的作业条件或设备类型的扩展。此外,还提出了一种适应插值方法,将地方和全球模型的最佳结合作为本地训练的初始化。
  • results: 相比于现有的方法,如FedProx,这个方法可以在未见到的作业条件或设备类型下实现高精度的诊断,并且在不同的设备类型下实现13.44%-18.33%的提升。
    Abstract Deep learning-based fault diagnosis (FD) approaches require a large amount of training data, which are difficult to obtain since they are located across different entities. Federated learning (FL) enables multiple clients to collaboratively train a shared model with data privacy guaranteed. However, the domain discrepancy and data scarcity problems among clients deteriorate the performance of the global FL model. To tackle these issues, we propose a novel framework called representation encoding-based federated meta-learning (REFML) for few-shot FD. First, a novel training strategy based on representation encoding and meta-learning is developed. It harnesses the inherent heterogeneity among training clients, effectively transforming it into an advantage for out-of-distribution generalization on unseen working conditions or equipment types. Additionally, an adaptive interpolation method that calculates the optimal combination of local and global models as the initialization of local training is proposed. This helps to further utilize local information to mitigate the negative effects of domain discrepancy. As a result, high diagnostic accuracy can be achieved on unseen working conditions or equipment types with limited training data. Compared with the state-of-the-art methods, such as FedProx, the proposed REFML framework achieves an increase in accuracy by 2.17%-6.50% when tested on unseen working conditions of the same equipment type and 13.44%-18.33% when tested on totally unseen equipment types, respectively.
    摘要 深度学习基于的故障诊断(FD)方法需要大量的训练数据,但这些数据往往分散在不同的实体上,难以获得。联邦学习(FL)可以让多个客户共同训练一个共享模型,同时保证数据隐私。然而,客户端的领域差异和数据缺乏问题会影响全局FL模型的性能。为解决这些问题,我们提出了一个新的框架,即表示编码基于联邦meta学习(REFML),用于几何学学习。首先,我们开发了一种基于表示编码和meta学习的新训练策略。它利用了训练客户端的自然多样性,以便在未看过的工作条件或设备类型上进行out-of-distribution泛化。此外,我们还提出了一种适应 interpolating 方法,该方法计算了全局和本地模型的优质combinación,作为本地训练的初始化。这有助于进一步利用本地信息,减轻领域差异的负面影响。因此,我们的REFML框架可以在有限的训练数据下实现高精度的故障诊断,并且相比 estado-of-the-art 方法,REFML 框架可以提高精度的提升为2.17%-6.50%和13.44%-18.33%。

Measuring the Stability of Process Outcome Predictions in Online Settings

  • paper_url: http://arxiv.org/abs/2310.09000
  • repo_url: https://github.com/ghksdl6025/online_ppm_stability
  • paper_authors: Suhwan Lee, Marco Comuzzi, Xixi Lu, Hajo A. Reijers
  • for: 本研究旨在评估在线预测过程监控中模型的稳定性,以确保其在不同的风险环境中的一致性和可靠性。
  • methods: 本研究提出了一个评估框架,包括四个性能协方差:性能下降的频率、下降的幅度、恢复率和性能的变化程度。
  • results: 研究结果表明,这些协方差可以帮助比较和选择不同风险环境下的预测模型,并为动态商业环境做出更好的决策。
    Abstract Predictive Process Monitoring aims to forecast the future progress of process instances using historical event data. As predictive process monitoring is increasingly applied in online settings to enable timely interventions, evaluating the performance of the underlying models becomes crucial for ensuring their consistency and reliability over time. This is especially important in high risk business scenarios where incorrect predictions may have severe consequences. However, predictive models are currently usually evaluated using a single, aggregated value or a time-series visualization, which makes it challenging to assess their performance and, specifically, their stability over time. This paper proposes an evaluation framework for assessing the stability of models for online predictive process monitoring. The framework introduces four performance meta-measures: the frequency of significant performance drops, the magnitude of such drops, the recovery rate, and the volatility of performance. To validate this framework, we applied it to two artificial and two real-world event logs. The results demonstrate that these meta-measures facilitate the comparison and selection of predictive models for different risk-taking scenarios. Such insights are of particular value to enhance decision-making in dynamic business environments.
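
The four meta-measures named in the abstract admit several reasonable formalizations; the sketch below uses simple ones (a drop is a decrease beyond a threshold between consecutive evaluations, recovery means returning to the pre-drop level within a window, and volatility is the standard deviation of successive differences). These definitions are illustrative assumptions rather than the paper's exact formulas.

```python
import numpy as np

def stability_meta_measures(perf, drop_threshold=0.05, recovery_window=10):
    """Simple formalizations of the four meta-measures for an online performance series:
    frequency and magnitude of significant drops, recovery rate, and volatility."""
    perf = np.asarray(perf, dtype=float)
    diffs = np.diff(perf)
    drop_idx = np.where(diffs < -drop_threshold)[0]

    magnitudes = -diffs[drop_idx]
    recovered = 0
    for i in drop_idx:
        window = perf[i + 1 : i + 1 + recovery_window]
        if window.size and window.max() >= perf[i]:        # back to the pre-drop level
            recovered += 1

    return {
        "drop_frequency": len(drop_idx) / max(len(diffs), 1),
        "mean_drop_magnitude": float(magnitudes.mean()) if len(drop_idx) else 0.0,
        "recovery_rate": recovered / len(drop_idx) if len(drop_idx) else 1.0,
        "volatility": float(diffs.std()),
    }

# Toy accuracy trace of an online process-outcome model with two abrupt drops.
rng = np.random.default_rng(0)
acc = 0.8 + 0.01 * rng.standard_normal(200)
acc[60:75] -= 0.15
acc[140:160] -= 0.10
print(stability_meta_measures(acc))
```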

PAGE: Equilibrate Personalization and Generalization in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.08961
  • repo_url: None
  • paper_authors: Qian Chen, Zilong Wang, Jiaqi Hu, Haonan Yan, Jianying Zhou, Xiaodong Lin
  • for: 本研究旨在提出一种能同时保证本地模型个性化和全局模型泛化的 Federated Learning(FL)算法,以满足客户(客)的当前需求和服务提供商(服务器)的未来需求。
  • methods: 本研究使用游戏理论为基础,提出了一种名为PAGE的算法,将FL转变为客户和服务器之间的合作游戏。为了探索平衡点,PAGE将游戏形式化为马尔可夫决策过程,并使用回归学习算法进行解决。
  • results: 对四种广泛使用的数据集进行了广泛的实验,显示PAGE可以同时提高全局和本地预测精度,并且可以提高预测精度最高达35.20%和39.91%。此外,对PAGE的偏度变体进行了实验,表明它在实际应用中具有良好的适应性。
    Abstract Federated learning (FL) is becoming a major driving force behind machine learning as a service, where customers (clients) collaboratively benefit from shared local updates under the orchestration of the service provider (server). Representing clients' current demands and the server's future demand, local model personalization and global model generalization are separately investigated, as the ill-effects of data heterogeneity enforce the community to focus on one over the other. However, these two seemingly competing goals are of equal importance rather than black and white issues, and should be achieved simultaneously. In this paper, we propose the first algorithm to balance personalization and generalization on top of game theory, dubbed PAGE, which reshapes FL as a co-opetition game between clients and the server. To explore the equilibrium, PAGE further formulates the game as Markov decision processes, and leverages the reinforcement learning algorithm, which simplifies the solving complexity. Extensive experiments on four widespread datasets show that PAGE outperforms state-of-the-art FL baselines in terms of global and local prediction accuracy simultaneously, and the accuracy can be improved by up to 35.20% and 39.91%, respectively. In addition, biased variants of PAGE imply promising adaptiveness to demand shifts in practice.
    摘要 联邦学习(FL)正在成为机器学习云服务的主要驱动力,客户(客户端)共同从共享的本地更新中获得了服务提供者(服务器)的协调。客户当前的需求和服务器未来的需求都被考虑,本地模型个性化和全球模型通用是分别调查的,由于数据不同性的副作用,社区必须集中于一个而不是另一个。然而,这两个似乎竞争的目标并不是黑白的问题,应该同时实现。在这篇论文中,我们提出了首个在游戏理论基础上均衡个性化和通用性的算法,名为PAGE,它将FL转变为客户和服务器之间的协作游戏。为了探索平衡点,PAGE进一步将游戏形式为Markov决策过程,并利用了回归学习算法,从而简化解决复杂性。在四种广泛使用的数据集上进行了广泛的实验,PAGE比状态艺术FL基elines在全球和本地预测准确率上同时表现出色,可以提高预测精度的最大值35.20%和39.91%。此外,对PAGE的偏见变体进行了有优势的适应性测试。

LLaMA Rider: Spurring Large Language Models to Explore the Open World

  • paper_url: http://arxiv.org/abs/2310.08922
  • repo_url: None
  • paper_authors: Yicheng Feng, Yuxuan Wang, Jiazheng Liu, Sipeng Zheng, Zongqing Lu
  • for: 本研究旨在帮助Large Language Models(LLMs)在开放世界中进行决策和规划,并将LLMs的知识与世界条件相互协调。
  • methods: 本研究提出了一种鼓励LLMs在开放世界中自主探索、收集经验,并通过反馈机制进行修改,以提高其任务解决能力。此外,我们还 интегрирова了子任务重新标注,以帮助LLMs保持任务规划的一致性,并帮助模型学习任务之间的组合性。
  • results: 我们在Minecraft中进行评估,发现我们的方法LLaMA-Rider可以提高LLM在环境探索中的效率,并通过仅使用1.3k个数据集进行微调,使LLM的任务解决能力得到显著提高,与基线使用强化学习的训练成本相比,训练成本减少了。
    Abstract Recently, various studies have leveraged Large Language Models (LLMs) to help decision-making and planning in environments, and try to align the LLMs' knowledge with the world conditions. Nonetheless, the capacity of LLMs to continuously acquire environmental knowledge and adapt in an open world remains uncertain. In this paper, we propose an approach to spur LLMs to explore the open world, gather experiences, and learn to improve their task-solving capabilities. In this approach, a multi-round feedback-revision mechanism is utilized to encourage LLMs to actively select appropriate revision actions guided by feedback information from the environment. This facilitates exploration and enhances the model's performance. Besides, we integrate sub-task relabeling to assist LLMs in maintaining consistency in sub-task planning and help the model learn the combinatorial nature between tasks, enabling it to complete a wider range of tasks through training based on the acquired exploration experiences. By evaluation in Minecraft, an open-ended sandbox world, we demonstrate that our approach LLaMA-Rider enhances the efficiency of the LLM in exploring the environment, and effectively improves the LLM's ability to accomplish more tasks through fine-tuning with merely 1.3k instances of collected data, showing minimal training costs compared to the baseline using reinforcement learning.
    摘要 In this approach, we use a multi-round feedback-revision mechanism to encourage LLMs to select appropriate revision actions based on feedback information from the environment. This helps the model explore the environment more effectively and enhances its performance. Additionally, we integrate sub-task relabeling to help LLMs maintain consistency in sub-task planning and learn the combinatorial nature between tasks, allowing the model to complete a wider range of tasks through training based on the acquired exploration experiences.By evaluating our approach, LLaMA-Rider, in Minecraft, an open-ended sandbox world, we demonstrate that it enhances the efficiency of the LLM in exploring the environment and improves its ability to accomplish more tasks through fine-tuning with just 1.3k instances of collected data, showing minimal training costs compared to the baseline using reinforcement learning.

EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval

  • paper_url: http://arxiv.org/abs/2310.08891
  • repo_url: None
  • paper_authors: Ramnath Kumar, Anshul Mittal, Nilesh Gupta, Aditya Kusupati, Inderjit Dhillon, Prateek Jain
  • For: 提高 semantic search 和排序问题的效果,如获取关于给定查询的相关文档。* Methods: 使用 dense embedding-based retrieval,包括两个阶段:(a)对 dual encoder 进行对照学习,以训练 embedding 和(b)使用 approximate nearest neighbor search (ANNS) 来找到相似的文档。* Results: 提出了 End-to-end Hierarchical Indexing (EHI),可以同时学习 embedding 和 ANNS 结构,以优化检索性能。EHI 使用标准 dual encoder 模型来对查询和文档进行 embedding,并学习一个 inverted file index (IVF) 样式的树结构来实现高效的 ANNS。
    Abstract Dense embedding-based retrieval is now the industry standard for semantic search and ranking problems, like obtaining relevant web documents for a given query. Such techniques use a two-stage process: (a) contrastive learning to train a dual encoder to embed both the query and documents and (b) approximate nearest neighbor search (ANNS) for finding similar documents for a given query. These two stages are disjoint; the learned embeddings might be ill-suited for the ANNS method and vice-versa, leading to suboptimal performance. In this work, we propose End-to-end Hierarchical Indexing -- EHI -- that jointly learns both the embeddings and the ANNS structure to optimize retrieval performance. EHI uses a standard dual encoder model for embedding queries and documents while learning an inverted file index (IVF) style tree structure for efficient ANNS. To ensure stable and efficient learning of discrete tree-based ANNS structure, EHI introduces the notion of dense path embedding that captures the position of a query/document in the tree. We demonstrate the effectiveness of EHI on several benchmarks, including de-facto industry standard MS MARCO (Dev set and TREC DL19) datasets. For example, with the same compute budget, EHI outperforms state-of-the-art (SOTA) in by 0.6% (MRR@10) on MS MARCO dev set and by 4.2% (nDCG@10) on TREC DL19 benchmarks.
    摘要 现在的industry标准是使用密集嵌入来实现semantic搜索和排名问题,如获取给定查询的相关网络文档。这些技术使用两个阶段进程:(a)对比学习来训练双Encoder来对查询和文档进行嵌入,以及(b) Approximate Nearest Neighbor Search(ANNS)来找到查询中相似的文档。这两个阶段是独立的,学习得到的嵌入可能不适合ANNS方法,反之亦然,可能导致表现下降。在这种工作中,我们提出了End-to-end Hierarchical Indexing(EHI),它同时学习嵌入和ANNS结构,以优化搜索性能。EHI使用标准的双Encoder模型来对查询和文档进行嵌入,而学习一个IVF风格的倒排索引树结构来高效地进行ANNS。为确保稳定和高效地学习离散树结构,EHI引入了密集路径嵌入,它记录查询/文档在树中的位置。我们在多个 benchmark 上证明了EHI的有效性,包括de facto 行业标准的MS MARCO(Dev set和TREC DL19)数据集。例如,与同样的计算预算,EHI在MS MARCO Dev set 上比SOTA提高了0.6%(MRR@10),在TREC DL19 数据集上提高了4.2%(nDCG@10)。
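
EHI learns the index structure jointly with the encoder; as background, the sketch below shows the classical two-stage IVF-style search that the learned tree replaces: cluster document embeddings with k-means, then answer a query by scanning only the few nearest clusters. The random embeddings stand in for dual-encoder outputs, and the cluster and probe counts are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_docs, dim, n_clusters, n_probe, top_k = 5000, 64, 50, 3, 5

# Stand-ins for dual-encoder document and query embeddings.
doc_emb = rng.normal(size=(n_docs, dim)).astype(np.float32)
query = rng.normal(size=dim).astype(np.float32)

# Build an IVF-style index: coarse k-means clusters over the document embeddings.
kmeans = KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit(doc_emb)
inverted_lists = {c: np.where(kmeans.labels_ == c)[0] for c in range(n_clusters)}

# Search: visit only the n_probe closest clusters, then rank candidates by inner product.
centroid_dist = np.linalg.norm(kmeans.cluster_centers_ - query, axis=1)
probe = np.argsort(centroid_dist)[:n_probe]
candidates = np.concatenate([inverted_lists[c] for c in probe])
scores = doc_emb[candidates] @ query
top = candidates[np.argsort(scores)[::-1][:top_k]]
print("probed clusters:", probe.tolist(), "-> top documents:", top.tolist())
```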

Gesture Recognition for FMCW Radar on the Edge

  • paper_url: http://arxiv.org/abs/2310.08876
  • repo_url: None
  • paper_authors: Maximilian Strobel, Stephan Schoenfeldt, Jonas Daugalas
  • for: 这篇论文介绍了一种基于60GHz频率调制连续波(FMCW)雷达的轻量级手势识别系统。
  • methods: 论文提出了一种使用五个特征来 caracterize gestures的方法,并提出了一种简单的雷达处理算法来提取这些特征。
  • results: 论文表明了该系统可以在fully embedded平台上实现高精度的手势识别,并且具有低内存占用、低计算能力和低功耗特点。
    Abstract This paper introduces a lightweight gesture recognition system based on 60 GHz frequency modulated continuous wave (FMCW) radar. We show that gestures can be characterized efficiently by a set of five features, and propose a slim radar processing algorithm to extract these features. In contrast to previous approaches, we avoid heavy 2D processing, i.e. range-Doppler imaging, and perform instead an early target detection - this allows us to port the system to fully embedded platforms with tight constraints on memory, compute and power consumption. A recurrent neural network (RNN) based architecture exploits these features to jointly detect and classify five different gestures. The proposed system recognizes gestures with an F1 score of 98.4% on our hold-out test dataset, it runs on an Arm Cortex-M4 microcontroller requiring less than 280 kB of flash memory, 120 kB of RAM, and consuming 75 mW of power.
    摘要 这篇论文介绍了一个基于 60 GHz 调频连续波(FMCW)雷达的轻量级手势识别系统。我们展示了手势可以用五个特征高效地刻画,并提出了一种精简的雷达处理算法来提取这些特征。与之前的方法不同,我们避免了重量级的二维处理(即距离-多普勒成像),转而进行早期目标检测,这使我们能够把系统移植到在内存、计算和功耗方面都受到严格限制的完全嵌入式平台上。一个基于循环神经网络(RNN)的架构利用这些特征来同时检测并分类五种不同的手势。所提出的系统在我们的保留测试集上取得了 98.4% 的 F1 分数,运行在 Arm Cortex-M4 微控制器上,只需不到 280 kB 的闪存和 120 kB 的 RAM,功耗为 75 mW。

A Survey of Methods for Handling Disk Data Imbalance

  • paper_url: http://arxiv.org/abs/2310.08867
  • repo_url: None
  • paper_authors: Shuangshuang Yuan, Peng Wu, Yuehui Chen, Qiang Li
  • for: 该论文旨在提供关于不均衡数据分类的广泛回顾,包括数据水平方法、算法水平方法和гибри德方法。
  • methods: 该论文总结了不同类型的方法,包括数据水平方法、算法水平方法和гибри德方法,并分析了它们的存在问题、算法想法、优点和缺点。
  • results: 该论文不提供实际结果,而是为研究者提供一个全面的回顾,以便他们可以根据自己的需要选择适当的方法。
    Abstract Class imbalance exists in many classification problems, and since the data is designed for accuracy, imbalance in data classes can lead to classification challenges with a few classes having higher misclassification costs. The Backblaze dataset, a widely used dataset related to hard discs, has a small amount of failure data and a large amount of health data, which exhibits a serious class imbalance. This paper provides a comprehensive overview of research in the field of imbalanced data classification. The discussion is organized into three main aspects: data-level methods, algorithmic-level methods, and hybrid methods. For each type of method, we summarize and analyze the existing problems, algorithmic ideas, strengths, and weaknesses. Additionally, the challenges of unbalanced data classification are discussed, along with strategies to address them. It is convenient for researchers to choose the appropriate method according to their needs.

In-Context Learning for Few-Shot Molecular Property Prediction

  • paper_url: http://arxiv.org/abs/2310.08863
  • repo_url: None
  • paper_authors: Christopher Fifty, Jure Leskovec, Sebastian Thrun
  • for: The goal is to develop a new in-context learning algorithm for few-shot molecular property prediction.
  • methods: The approach adapts the core ideas of in-context learning, treating a set of (molecule, property measurement) pairs as context and learning to predict molecular properties from that context, so that it adapts rapidly to new properties without fine-tuning.
  • results: On the FS-Mol and BACE molecular property prediction benchmarks, the method outperforms recent meta-learning algorithms at small support sizes and is competitive with the best methods at large support sizes.
    Abstract In-context learning has become an important approach for few-shot learning in Large Language Models because of its ability to rapidly adapt to new tasks without fine-tuning model parameters. However, it is restricted to applications in natural language and inapplicable to other domains. In this paper, we adapt the concepts underpinning in-context learning to develop a new algorithm for few-shot molecular property prediction. Our approach learns to predict molecular properties from a context of (molecule, property measurement) pairs and rapidly adapts to new properties without fine-tuning. On the FS-Mol and BACE molecular property prediction benchmarks, we find this method surpasses the performance of recent meta-learning algorithms at small support sizes and is competitive with the best methods at large support sizes.
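
A rough way to picture the in-context interface is to predict a query molecule's property as a similarity-weighted average over a small support set of (fingerprint, measurement) pairs. In the sketch below the learned adapter is replaced by a fixed cosine-similarity kernel and the fingerprints are random toy vectors, so it only illustrates the interface, not the proposed model.

```python
import numpy as np

def in_context_predict(support_x, support_y, query_x, temperature=0.1):
    """Predict query properties from a context of (fingerprint, measurement) pairs
    using a softmax over cosine similarities (a stand-in for a learned adapter)."""
    s = support_x / np.linalg.norm(support_x, axis=1, keepdims=True)
    q = query_x / np.linalg.norm(query_x, axis=1, keepdims=True)
    sims = q @ s.T                                   # (n_query, n_support)
    w = np.exp(sims / temperature)
    w /= w.sum(axis=1, keepdims=True)
    return w @ support_y                             # weighted average of measurements

rng = np.random.default_rng(0)
support_x = rng.integers(0, 2, (16, 128)).astype(float)   # 16 molecules, toy fingerprints
support_y = rng.normal(size=16)                            # e.g. measured activities
query_x = rng.integers(0, 2, (4, 128)).astype(float)
print(in_context_predict(support_x, support_y, query_x))
```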

Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation

  • paper_url: http://arxiv.org/abs/2310.08855
  • repo_url: https://github.com/lvyilin/adab2n
  • paper_authors: Yilin Lyu, Liyuan Wang, Xingxing Zhang, Zicheng Sun, Hang Su, Jun Zhu, Liping Jing
  • for: The work targets continual learning in deep neural networks, where the recency bias of Batch Normalization statistics contributes to forgetting of old tasks.
  • methods: It analyzes Batch Normalization (BN) in continual learning and proposes Adaptive Balance of BN (AdaB$^2$N), which combines a Bayesian-based strategy for adapting task-wise contributions with a modified momentum that balances BN statistics between old and new tasks.
  • results: The method achieves significant performance gains across a wide range of benchmarks, particularly in challenging yet realistic online scenarios, with improvements of up to 7.68%, 6.86%, and 4.26% on Split CIFAR-10, Split CIFAR-100, and Split Mini-ImageNet, respectively.
    Abstract Continual learning entails learning a sequence of tasks and balancing their knowledge appropriately. With limited access to old training samples, much of the current work in deep neural networks has focused on overcoming catastrophic forgetting of old tasks in gradient-based optimization. However, the normalization layers provide an exception, as they are updated interdependently by the gradient and statistics of currently observed training samples, which require specialized strategies to mitigate recency bias. In this work, we focus on the most popular Batch Normalization (BN) and provide an in-depth theoretical analysis of its sub-optimality in continual learning. Our analysis demonstrates the dilemma between balance and adaptation of BN statistics for incremental tasks, which potentially affects training stability and generalization. Targeting on these particular challenges, we propose Adaptive Balance of BN (AdaB$^2$N), which incorporates appropriately a Bayesian-based strategy to adapt task-wise contributions and a modified momentum to balance BN statistics, corresponding to the training and testing stages. By implementing BN in a continual learning fashion, our approach achieves significant performance gains across a wide range of benchmarks, particularly for the challenging yet realistic online scenarios (e.g., up to 7.68%, 6.86% and 4.26% on Split CIFAR-10, Split CIFAR-100 and Split Mini-ImageNet, respectively). Our code is available at https://github.com/lvyilin/AdaB2N.
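
One way to picture the balance-versus-adaptation dilemma is to shrink the BN momentum as more tasks are seen, so that the running statistics do not drift entirely toward the newest task. The sketch below is a minimal, simplified illustration of that idea; the paper's Bayesian task-wise adaptation and modified momentum are more involved.

```python
import numpy as np

class TaskBalancedBNStats:
    """Sketch: keep BN running statistics from drifting toward the newest task
    by shrinking the update momentum as more tasks have been seen."""
    def __init__(self, dim, base_momentum=0.1):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.base_momentum = base_momentum
        self.tasks_seen = 0

    def start_task(self):
        self.tasks_seen += 1

    def update(self, batch):                         # batch: (n, dim)
        # Later tasks get a smaller effective momentum.
        m = self.base_momentum / max(self.tasks_seen, 1)
        self.mean = (1 - m) * self.mean + m * batch.mean(axis=0)
        self.var = (1 - m) * self.var + m * batch.var(axis=0)

rng = np.random.default_rng(0)
stats = TaskBalancedBNStats(dim=4)
for task_shift in [0.0, 2.0, 4.0]:                   # three tasks with shifted statistics
    stats.start_task()
    for _ in range(10):
        stats.update(rng.normal(task_shift, 1.0, size=(32, 4)))
print(stats.mean.round(2))   # running mean reflects all three tasks, not just the last one
```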

Semi-Supervised End-To-End Contrastive Learning For Time Series Classification

  • paper_url: http://arxiv.org/abs/2310.08848
  • repo_url: https://github.com/DL4mHealth/SLOTS
  • paper_authors: Huili Cai, Xiang Zhang, Xiaofeng Liu
  • for: The paper addresses time series classification, a critical task in domains such as finance, healthcare, and sensor data analysis.
  • methods: It proposes SLOTS, an end-to-end semi-supervised model that takes semi-labeled datasets (many unlabeled samples and a small proportion of labeled samples), maps them to an embedding space, and jointly optimizes unsupervised contrastive, supervised contrastive, and classification losses instead of the usual two-stage pre-train-then-fine-tune pipeline.
  • results: Compared with ten state-of-the-art methods on five datasets, SLOTS delivers significantly better performance while using the same input data and a similar computational cost as the two-stage framework.
    Abstract Time series classification is a critical task in various domains, such as finance, healthcare, and sensor data analysis. Unsupervised contrastive learning has garnered significant interest in learning effective representations from time series data with limited labels. The prevalent approach in existing contrastive learning methods consists of two separate stages: pre-training the encoder on unlabeled datasets and fine-tuning the well-trained model on a small-scale labeled dataset. However, such two-stage approaches suffer from several shortcomings, such as the inability of unsupervised pre-training contrastive loss to directly affect downstream fine-tuning classifiers, and the lack of exploiting the classification loss which is guided by valuable ground truth. In this paper, we propose an end-to-end model called SLOTS (Semi-supervised Learning fOr Time clasSification). SLOTS receives semi-labeled datasets, comprising a large number of unlabeled samples and a small proportion of labeled samples, and maps them to an embedding space through an encoder. We calculate not only the unsupervised contrastive loss but also measure the supervised contrastive loss on the samples with ground truth. The learned embeddings are fed into a classifier, and the classification loss is calculated using the available true labels. The unsupervised, supervised contrastive losses and classification loss are jointly used to optimize the encoder and classifier. We evaluate SLOTS by comparing it with ten state-of-the-art methods across five datasets. The results demonstrate that SLOTS is a simple yet effective framework. When compared to the two-stage framework, our end-to-end SLOTS utilizes the same input data, consumes a similar computational cost, but delivers significantly improved performance. We release code and datasets at https://anonymous.4open.science/r/SLOTS-242E.
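
The end-to-end objective combines an unsupervised contrastive term, a supervised contrastive term on the labeled subset, and a classification loss. The sketch below shows one plausible way to compute such a joint loss for a single batch; the exact loss definitions and weights used by SLOTS may differ.

```python
import torch
import torch.nn.functional as F

def joint_loss(z1, z2, logits, labels, labeled_mask, temperature=0.5):
    """z1, z2: two augmented-view embeddings (n, d); logits: classifier outputs (n, c);
    labels: (n,) with arbitrary values where labeled_mask is False."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.shape[0]
    # Unsupervised (SimCLR-style) contrastive loss between the two views.
    sim = z1 @ z2.t() / temperature                  # (n, n)
    unsup = F.cross_entropy(sim, torch.arange(n))
    # Supervised contrastive-style loss on the labeled subset.
    sup = torch.tensor(0.0)
    idx = labeled_mask.nonzero(as_tuple=True)[0]
    if idx.numel() > 1:
        zl, yl = z1[idx], labels[idx]
        pos = (yl[:, None] == yl[None, :]).float()
        pos.fill_diagonal_(0)                        # ignore self-pairs
        logits_l = zl @ zl.t() / temperature
        logits_l = logits_l - 1e9 * torch.eye(len(idx))      # mask self-similarity
        log_prob = logits_l - torch.logsumexp(logits_l, dim=1, keepdim=True)
        sup = -(pos * log_prob).sum() / pos.sum().clamp(min=1)
    # Classification loss using the available ground truth only.
    cls = F.cross_entropy(logits[idx], labels[idx]) if idx.numel() > 0 else torch.tensor(0.0)
    return unsup + sup + cls

# Toy batch: 8 samples, 3 of them labeled.
z1, z2 = torch.randn(8, 16), torch.randn(8, 16)
logits, labels = torch.randn(8, 4), torch.randint(0, 4, (8,))
mask = torch.tensor([1, 1, 1, 0, 0, 0, 0, 0], dtype=torch.bool)
print(joint_loss(z1, z2, logits, labels, mask))
```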

On the Over-Memorization During Natural, Robust and Catastrophic Overfitting

  • paper_url: http://arxiv.org/abs/2310.08847
  • repo_url: None
  • paper_authors: Runqi Lin, Chaojian Yu, Bo Han, Tongliang Liu
  • for: The study investigates overfitting of deep neural networks (DNNs) under both natural and adversarial training and proposes a unified way to mitigate different types of overfitting.
  • methods: Focusing solely on natural patterns, it analyzes the memorization behavior of DNNs and identifies a shared behaviour, over-memorization, in which the network suddenly becomes highly confident on certain training patterns and retains a persistent memory of them; the proposed Distraction Over-Memorization (DOM) framework prevents this by removing or augmenting the high-confidence natural patterns.
  • results: Experiments show that the proposed method effectively mitigates overfitting across various training paradigms, including under adversarial training.
    Abstract Overfitting negatively impacts the generalization ability of deep neural networks (DNNs) in both natural and adversarial training. Existing methods struggle to consistently address different types of overfitting, typically designing strategies that focus separately on either natural or adversarial patterns. In this work, we adopt a unified perspective by solely focusing on natural patterns to explore different types of overfitting. Specifically, we examine the memorization effect in DNNs and reveal a shared behaviour termed over-memorization, which impairs their generalization capacity. This behaviour manifests as DNNs suddenly becoming high-confidence in predicting certain training patterns and retaining a persistent memory for them. Furthermore, when DNNs over-memorize an adversarial pattern, they tend to simultaneously exhibit high-confidence prediction for the corresponding natural pattern. These findings motivate us to holistically mitigate different types of overfitting by hindering the DNNs from over-memorization natural patterns. To this end, we propose a general framework, Distraction Over-Memorization (DOM), which explicitly prevents over-memorization by either removing or augmenting the high-confidence natural patterns. Extensive experiments demonstrate the effectiveness of our proposed method in mitigating overfitting across various training paradigms.
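
A minimal reading of the "distraction" idea is: flag training samples the network already predicts with very high confidence (candidate over-memorized patterns) and either drop them or train on augmented versions instead. The sketch below reduces the framework to a single confidence threshold and Gaussian augmentation, which is a simplification of the paper's procedure.

```python
import torch

def split_over_memorized(model, x, y, threshold=0.99):
    """Return indices of high-confidence (possibly over-memorized) and normal samples."""
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)
        conf = probs[torch.arange(len(y)), y]        # confidence on the true label
    high = (conf > threshold).nonzero(as_tuple=True)[0]
    rest = (conf <= threshold).nonzero(as_tuple=True)[0]
    return high, rest

def distract(x_high, noise=0.1):
    """Augment high-confidence patterns instead of training on them verbatim."""
    return x_high + noise * torch.randn_like(x_high)

# Toy usage with a linear "model" on random data.
model = torch.nn.Linear(20, 3)
x, y = torch.randn(64, 20), torch.randint(0, 3, (64,))
high, rest = split_over_memorized(model, x, y, threshold=0.6)
x_train = torch.cat([x[rest], distract(x[high])])    # train on normal + distracted samples
y_train = torch.cat([y[rest], y[high]])
print(len(high), "samples flagged and augmented")
```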

Optimal Sample Complexity for Average Reward Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2310.08833
  • repo_url: None
  • paper_authors: Shengbo Wang, Jose Blanchet, Peter Glynn
  • for: maximizing the long run average reward of a uniformly ergodic Markov decision process (MDP)
  • methods: combining algorithmic ideas from Jin and Sidford (2021) and Li et al. (2020)
  • results: an estimator for the optimal policy with a sample complexity of $\widetilde O(|S||A|t_{\text{mix}}\epsilon^{-2})$
    Abstract We settle the sample complexity of policy learning for the maximization of the long run average reward associated with a uniformly ergodic Markov decision process (MDP), assuming a generative model. In this context, the existing literature provides a sample complexity upper bound of $\widetilde O(|S||A|t_{\text{mix}}^2 \epsilon^{-2})$ and a lower bound of $\Omega(|S||A|t_{\text{mix}} \epsilon^{-2})$. In these expressions, $|S|$ and $|A|$ denote the cardinalities of the state and action spaces respectively, $t_{\text{mix}}$ serves as a uniform upper limit for the total variation mixing times, and $\epsilon$ signifies the error tolerance. Therefore, a notable gap of $t_{\text{mix}}$ still remains to be bridged. Our primary contribution is to establish an estimator for the optimal policy of average reward MDPs with a sample complexity of $\widetilde O(|S||A|t_{\text{mix}} \epsilon^{-2})$, effectively reaching the lower bound in the literature. This is achieved by combining algorithmic ideas in Jin and Sidford (2021) with those of Li et al. (2020).

A Nonlinear Method for time series forecasting using VMD-GARCH-LSTM model

  • paper_url: http://arxiv.org/abs/2310.08812
  • repo_url: None
  • paper_authors: Zhengtao Gui, Haoyuan Li, Sijie Xu, Yu Chen
  • for: The paper targets forecasting of complex time series data, specifically addressing the challenge of capturing implied volatilities that contain significant information.
  • methods: The proposed VMD-LSTM-GARCH model combines Variational Mode Decomposition (VMD) with GARCH and Long Short-Term Memory (LSTM) models so that both the numerical and the volatility information of the time series is exploited.
  • results: The proposed model demonstrates superior performance in time series forecasting, with significant decreases in MSE, RMSE, and MAPE compared to other state-of-the-art methods.
    Abstract Time series forecasting represents a significant and challenging task across various fields. Recently, methods based on mode decomposition have dominated the forecasting of complex time series because of the advantages of capturing local characteristics and extracting intrinsic modes from data. Unfortunately, most models fail to capture the implied volatilities that contain significant information. To enhance the forecasting of current, rapidly evolving, and volatile time series, we propose a novel decomposition-ensemble paradigm, the VMD-LSTM-GARCH model. The Variational Mode Decomposition algorithm is employed to decompose the time series into K sub-modes. Subsequently, the GARCH model extracts the volatility information from these sub-modes, which serve as the input for the LSTM. The numerical and volatility information of each sub-mode is utilized to train a Long Short-Term Memory network. This network predicts the sub-mode, and then we aggregate the predictions from all sub-modes to produce the output. By integrating econometric and artificial intelligence methods, and taking into account both the numerical and volatility information of the time series, our proposed model demonstrates superior performance in time series forecasting, as evidenced by the significant decrease in MSE, RMSE, and MAPE in our comparative experimental results.

Analysis of Weather and Time Features in Machine Learning-aided ERCOT Load Forecasting

  • paper_url: http://arxiv.org/abs/2310.08793
  • repo_url: https://github.com/rpglab/ML_ERCOT-Load_Prediction
  • paper_authors: Jonathan Yang, Mingjian Tuo, Jin Lu, Xingpeng Li
  • for: Short-term forecasting of the system-wide total electric load.
  • methods: Machine learning models whose input features combine various time and weather information.
  • results: Models trained with different weather and time input features achieve accurate short-term forecasts of the system-wide total load; ablation studies show that redundant features can degrade accuracy, highlighting the importance of feature selection.
    Abstract Accurate load forecasting is critical for efficient and reliable operations of the electric power system. A large part of electricity consumption is affected by weather conditions, making weather information an important determinant of electricity usage. Personal appliances and industry equipment also contribute significantly to electricity demand with temporal patterns, making time a useful factor to consider in load forecasting. This work develops several machine learning (ML) models that take various time and weather information as part of the input features to predict the short-term system-wide total load. Ablation studies were also performed to investigate and compare the impacts of different weather factors on the prediction accuracy. Actual load and historical weather data for the same region were processed and then used to train the ML models. It is interesting to observe that using all available features, each of which may be correlated to the load, is unlikely to achieve the best forecasting performance; features with redundancy may even decrease the inference capabilities of ML models. This indicates the importance of feature selection for ML models. Overall, case studies demonstrated the effectiveness of ML models trained with different weather and time input features for ERCOT load forecasting.
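
The ablation-style comparison of input features can be pictured by training the same regressor on different feature subsets and comparing test errors, as in the sketch below on a synthetic hourly load series. The data, model, and feature definitions are assumptions for illustration; the study uses real ERCOT load, historical weather data, and its own ML models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
hours = np.arange(24 * 365)
temp = 20 + 10 * np.sin(2 * np.pi * hours / (24 * 365)) + rng.normal(0, 2, hours.size)
hour_of_day = hours % 24
load = (40 + 0.8 * np.abs(temp - 18)
        + 5 * np.sin(2 * np.pi * hour_of_day / 24) + rng.normal(0, 1, hours.size))
redundant = temp + rng.normal(0, 0.1, hours.size)      # near-duplicate weather feature

feature_sets = {
    "time only": np.c_[hour_of_day],
    "time + temperature": np.c_[hour_of_day, temp],
    "time + temp + redundant": np.c_[hour_of_day, temp, redundant],
}
split = 24 * 300                                        # train on the first 300 days
for name, X in feature_sets.items():
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[:split], load[:split])
    mae = mean_absolute_error(load[split:], model.predict(X[split:]))
    print(f"{name:26s} test MAE: {mae:.2f}")
```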

Incentive Mechanism Design for Distributed Ensemble Learning

  • paper_url: http://arxiv.org/abs/2310.08792
  • repo_url: https://github.com/PengchaoHan/Incentive-Mechanism-Design-for-Distributed-Ensemble-Learning
  • paper_authors: Chao Huang, Pengchao Han, Jianwei Huang
  • for: Improving distributed ensemble learning (DEL), in which multiple learners train models and their predictions are combined to improve performance.
  • methods: An incentive mechanism design that specifies both the amount of training data and the reward for learners with heterogeneous computation and communication costs, so that self-interested learners are willing to participate in DEL.
  • results: Numerical results on the MNIST dataset show that the proposed mechanism may prefer a lower level of learner diversity in order to achieve a higher ensemble accuracy.
    Abstract Distributed ensemble learning (DEL) involves training multiple models at distributed learners, and then combining their predictions to improve performance. Existing related studies focus on DEL algorithm design and optimization but ignore the important issue of incentives, without which self-interested learners may be unwilling to participate in DEL. We aim to fill this gap by presenting a first study on the incentive mechanism design for DEL. Our proposed mechanism specifies both the amount of training data and reward for learners with heterogeneous computation and communication costs. One design challenge is to have an accurate understanding regarding how learners' diversity (in terms of training data) affects the ensemble accuracy. To this end, we decompose the ensemble accuracy into a diversity-precision tradeoff to guide the mechanism design. Another challenge is that the mechanism design involves solving a mixed-integer program with a large search space. To this end, we propose an alternating algorithm that iteratively updates each learner's training data size and reward. We prove that under mild conditions, the algorithm converges. Numerical results using MNIST dataset show an interesting result: our proposed mechanism may prefer a lower level of learner diversity to achieve a higher ensemble accuracy.

eess.IV - 2023-10-13

Sampling and resolution in sparse view photoacoustic tomography

  • paper_url: http://arxiv.org/abs/2310.09447
  • repo_url: None
  • paper_authors: Markus Haltmeier, Daniel Obmann, Karoline Felbermayer, Florian Hinterleitner, Peter Burgholzer
  • for: investigate resolution in photoacoustic tomography (PAT)
  • methods: Shannon theory is used to study the theoretical resolution limit of sparse view PAT, and experiments are carried out with several reconstruction methods.
  • results: All reconstruction methods used are empirically shown to exceed the theoretical resolution limit.
    Abstract We investigate resolution in photoacoustic tomography (PAT). Using Shannon theory, we investigate the theoretical resolution limit of sparse view PAT theoretically, and empirically demonstrate that all reconstruction methods used exceed this limit.

A study on the ideal magnitude and phase of reconstructed point targets in SAR imaging

  • paper_url: http://arxiv.org/abs/2310.08786
  • repo_url: None
  • paper_authors: Guanying Sun, Carey Rappaport
  • for: The paper quantitatively studies the magnitude and phase of reconstructed point targets in SAR imaging by means of inverse crime.
  • methods: Using the inverse crime methodology, two scenarios are considered: a single point target in the imaging area and two point targets.
  • results: Theorems on the magnitude and phase are established and proved for both scenarios, and numerical examples agree with the theorems.
    Abstract In this paper, the magnitude and phase of the reconstructed point targets in SAR imaging are studied quantitatively by using inverse crime. Two scenarios, one with single point target in the imaging area and the other with two point targets, are considered. The theorems on the magnitude and phase are established and proved for each scenario. In addition, several numerical examples are presented and the numerical results show that they agree with the corresponding theorems. This study is useful for appreciating the limitations of formulating inversion algorithms based on simplistic point target building blocks.

eess.SP - 2023-10-13

Global Positioning: the Uniqueness Question and a New Solution Method

  • paper_url: http://arxiv.org/abs/2310.09261
  • repo_url: None
  • paper_authors: Mireille Boutin, Gregor Kemper
  • for: A new algebraic solution procedure for the global positioning problem in n dimensions using m satellites.
  • methods: Tools from algebraic geometry.
  • results: A geometric characterization of the situations in which the solution is not unique, showing that such cases can occur in any dimension and with any number of satellites and providing counterexamples to some open conjectures; in addition, a proof that for m ≥ n+2 the solution is unique for almost all user positions, and that for m ≥ 2n+2 almost all satellite configurations guarantee a unique solution for all user positions.
    Abstract We provide a new algebraic solution procedure for the global positioning problem in $n$ dimensions using $m$ satellites. We also give a geometric characterization of the situations in which the problem does not have a unique solution. This characterization shows that such cases can happen in any dimension and with any number of satellites, leading to counterexamples to some open conjectures. We fill a gap in the literature by giving a proof for the long-held belief that when $m \ge n+2$, the solution is unique for almost all user positions. Even better, when $m \ge 2n+2$, almost all satellite configurations will guarantee a unique solution for all user positions. Some of our results are obtained using tools from algebraic geometry.
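
For background, the classical noise-free positioning problem with known distances (and no clock offset) reduces to a linear system by subtracting one sphere equation from the others. The sketch below shows that standard reduction; the paper's new solution procedure and its uniqueness analysis go well beyond this simple case.

```python
import numpy as np

def solve_position(sats, dists):
    """sats: (m, n) satellite positions; dists: (m,) exact distances to the user.
    Subtracting the first sphere equation linearizes the problem into A x = b."""
    s0, d0 = sats[0], dists[0]
    A = 2 * (s0 - sats[1:])
    b = dists[1:] ** 2 - d0 ** 2 - (sats[1:] ** 2).sum(axis=1) + (s0 ** 2).sum()
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

rng = np.random.default_rng(0)
user = np.array([1.0, -2.0, 0.5])
sats = rng.normal(0, 10, (6, 3))                     # m = 6 satellites in n = 3 dimensions
dists = np.linalg.norm(sats - user, axis=1)
print(solve_position(sats, dists))                   # recovers approximately [1, -2, 0.5]
```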

Histogram-less LiDAR through SPAD response linearization

  • paper_url: http://arxiv.org/abs/2310.09176
  • repo_url: None
  • paper_authors: Alessandro Tontini, Sonia Mazzucchi, Roberto Passerone, Nicolò Broseghini, Leonardo Gasparini
  • for: A new method for acquiring 3D information from a SPAD-based direct time-of-flight (d-ToF) imaging system that does not require building a histogram of timestamps and can withstand high-flux operation.
  • methods: The acquisition scheme emulates the behavior of a SPAD detector with no distortion due to dead time and extracts the time-of-flight information by a simple averaging operation on the photon timestamps, which makes the sensor easy to integrate and to scale to large arrays.
  • results: The method is validated through a comprehensive mathematical analysis whose predictions agree with a numerical Monte Carlo model, and it is demonstrated in a real d-ToF measurement setup under challenging background conditions up to a distance of 3.8 m.
    Abstract We present a new method to acquire the 3D information from a SPAD-based direct-Time-of-Flight (d-ToF) imaging system which does not require the construction of a histogram of timestamps and can withstand high flux operation regime. The proposed acquisition scheme emulates the behavior of a SPAD detector with no distortion due to dead time, and extracts the Tof information by a simple average operation on the photon timestamps ensuring ease of integration in a dedicated sensor and scalability to large arrays. The method is validated through a comprehensive mathematical analysis, whose predictions are in agreement with a numerical Monte Carlo model of the problem. Finally, we show the validity of the predictions in a real d-ToF measurement setup under challenging background conditions well beyond the typical pile-up limit of 5% detection rate up to a distance of 3.8 m.
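
The core of the histogram-less idea is that, once the SPAD response is linearized, the time of flight can be read from a simple average of photon timestamps. The sketch below illustrates this with an idealized model in which background photons are uniform over the laser period and the signal fraction is known; the paper's linearization and estimator are more sophisticated.

```python
import numpy as np

C = 3e8                                              # speed of light [m/s]

def estimate_distance(timestamps, period, signal_fraction):
    """Average-based ToF estimate under a uniform-background model:
    E[t] = p * tof + (1 - p) * period / 2, solved for tof."""
    mean_t = timestamps.mean()
    tof = (mean_t - (1 - signal_fraction) * period / 2) / signal_fraction
    return C * tof / 2                               # two-way travel

rng = np.random.default_rng(0)
period = 100e-9                                      # 100 ns laser repetition period
true_dist = 3.8                                      # metres, as in the reported experiment
tof = 2 * true_dist / C
p = 0.2                                              # fraction of photons from the target
n = 20000
signal = tof + rng.normal(0, 0.2e-9, int(n * p))     # jittered returns from the target
background = rng.uniform(0, period, int(n * (1 - p)))
ts = np.concatenate([signal, background])
print(f"estimated distance: {estimate_distance(ts, period, p):.2f} m")
```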

DNFS-VNE: Deep Neuro-Fuzzy System-Driven Virtual Network Embedding Algorithm

  • paper_url: http://arxiv.org/abs/2310.09078
  • repo_url: None
  • paper_authors: Ailing Xiao, Ning Chen, Sheng Wu, Shigen Shen, Weiping Ding, Peiying Zhang
  • for: An interpretable virtual network embedding (VNE) algorithm for network virtualization (NV), which must satisfy diverse demands and differentiated quality of service.
  • methods: A deep neuro-fuzzy system (DNFS) driven approach in which data-driven convolutional neural networks (CNNs) act as fuzzy implication operators, identified fuzzy rule patterns are cached into the weights by forward computation and gradient back-propagation, and the fuzzy rule base is built from Mamdani-type linguistic rules.
  • results: Experiments verify the effectiveness of the evaluation indicators and the learned fuzzy rules of the proposed DNFS-based VNE algorithm.
    Abstract By decoupling substrate resources, network virtualization (NV) is a promising solution for meeting diverse demands and ensuring differentiated quality of service (QoS). In particular, virtual network embedding (VNE) is a critical enabling technology that enhances the flexibility and scalability of network deployment by addressing the coupling of Internet processes and services. However, in the existing works, the black-box nature of deep neural networks (DNNs) limits the analysis, development, and improvement of systems. In recent times, interpretable deep learning (DL) represented by deep neuro-fuzzy systems (DNFS) combined with fuzzy inference has shown promising interpretability to further exploit the hidden value in the data. Motivated by this, we propose a DNFS-based VNE algorithm that aims to provide an interpretable NV scheme. Specifically, data-driven convolutional neural networks (CNNs) are used as fuzzy implication operators to compute the embedding probabilities of candidate substrate nodes through entailment operations. And, the identified fuzzy rule patterns are cached into the weights by forward computation and gradient back-propagation (BP). In addition, the fuzzy rule base is constructed based on Mamdani-type linguistic rules using linguistic labels. Finally, the effectiveness of evaluation indicators and fuzzy rules is verified by experiments.

Cell-Free Massive MIMO for ISAC: Access Point Operation Mode Selection and Power Control

  • paper_url: http://arxiv.org/abs/2310.09032
  • repo_url: None
  • paper_authors: Mohamed Elfiatoure, Mohammadali Mohammadi, Hien Quoc Ngo, Michail Matthaiou
  • for: A cell-free massive multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system in which distributed MIMO access points (APs) jointly serve communication users and detect the presence of a single target.
  • methods: The AP operation mode selection problem is studied, where some APs are dedicated to downlink communication while the remaining APs are used for sensing; closed-form expressions are derived to assess the communication and sensing performance.
  • results: Numerical results show that the proposed AP operation mode selection with power control can significantly improve the communication performance for given sensing requirements.
    Abstract This paper considers a cell-free massive multipleinput multiple-output (MIMO) integrated sensing and communication (ISAC) system, where distributed MIMO access points (APs) are used to jointly serve the communication users and detect the presence of a single target. We investigate the problem of AP operation mode selection, wherein some APs are dedicated for downlink communication, while the remaining APs are used for sensing purposes. Closed-form expressions for the individual spectral efficiency (SE) and mainlobe-to-average-sidelobe ratio (MASR) are derived, which are respectively utilized to assess the communication and sensing performances. Accordingly, a maxmin fairness problem is formulated and solved, where the minimum SE of the users is maximized, subject to the per-AP power constraints as well as sensing MASR constraint. Our numerical results show that the proposed AP operation mode selection with power control can significantly improve the communication performance for given sensing requirements.

Survey on Near-Space Information Networks: Channel Modeling, Networking, and Transmission Perspectives

  • paper_url: http://arxiv.org/abs/2310.09025
  • repo_url: None
  • paper_authors: Xianbin Cao, Peng Yang, Xiaoning Su
  • for: Near-space information networks (NSIN) as a new regime for providing sensing and communication services quickly, robustly, and cost-efficiently.
  • methods: NSIN composed of high-altitude platforms (HAPs) and high- and low-altitude unmanned aerial vehicles (UAVs).
  • results: A survey of the latest advances in NSIN channel modeling, networking, and transmission, covering promising use-cases, networking technologies and deployment, communication protocols, and the impact of the unstable movements of airborne platforms on the phase delays of onboard antenna arrays.
    Abstract Near-space information networks (NSIN) composed of high-altitude platforms (HAPs), high- and low-altitude unmanned aerial vehicles (UAVs) are a new regime for providing quickly, robustly, and cost-efficiently sensing and communication services. Precipitated by innovations and breakthroughs in manufacturing, materials, communications, electronics, and control technologies, NSIN have emerged as an essential component of the emerging sixth-generation of mobile communication systems. This article aims at providing and discussing the latest advances in NSIN in the research areas of channel modeling, networking, and transmission from a forward-looking, comparative, and technological evolutionary perspective. In this article, we highlight the characteristics of NSIN and present the promising use-cases of NSIN. The impact of airborne platforms' unstable movements on the phase delays of onboard antenna arrays with diverse structures is mathematically analyzed. The recent advancements in HAP channel modeling are elaborated on, along with the significant differences between HAP and UAV channel modeling. A comprehensive review of the networking technologies of NSIN in network deployment, handoff management, and network management aspects is provided. Besides, the promising technologies and communication protocols of the physical layer, medium access control (MAC) layer, network layer, and transport layer of NSIN for achieving efficient transmission over NSIN are overviewed. Finally, we outline some open issues and promising directions of NSIN deserved for future study and discuss the corresponding challenges.

Multi-Sensor Multi-Scan Radar Sensing of Multiple Extended Targets

  • paper_url: http://arxiv.org/abs/2310.09011
  • repo_url: https://github.com/martin497/di-gsncp-radar-sensing
  • paper_authors: Martin V. Vejling, Christophe A. N. Biscio, Petar Popovski
  • for: An efficient solution to the state estimation problem in multi-scan, multi-sensor, multiple extended target sensing scenarios, suitable for high clutter scenarios with closely spaced targets.
  • methods: The measurement process is modeled by a doubly inhomogeneous-generalized shot noise Cox process, and the parameters are estimated with a jump Markov chain Monte Carlo sampling technique that can take spatial sensor properties into account, namely sensor noise covariance, detection probability, and resolution.
  • results: Numerical experiments on radar measurement data show improvements over state-of-the-art clustering techniques used in existing multiple extended target tracking algorithms, particularly in high clutter scenarios.
    Abstract We propose an efficient solution to the state estimation problem in multi-scan multi-sensor multiple extended target sensing scenarios. We first model the measurement process by a doubly inhomogeneous-generalized shot noise Cox process and then estimate the parameters using a jump Markov chain Monte Carlo sampling technique. The proposed approach scales linearly in the number of measurements and can take spatial properties of the sensors into account, herein, sensor noise covariance, detection probability, and resolution. Numerical experiments using radar measurement data suggest that the algorithm offers improvements in high clutter scenarios with closely spaced targets over state-of-the-art clustering techniques used in existing multiple extended target tracking algorithms.

A unified framework for STAR-RIS coefficients optimization

  • paper_url: http://arxiv.org/abs/2310.08960
  • repo_url: None
  • paper_authors: Hancheng Zhu, Yuanwei Liu, Yik Chung Wu, Vincent K. N. Lau
  • for: Simultaneously transmitting and reflecting (STAR) RIS systems, which can serve users located on both the transmission and reflection sides of the surface.
  • methods: A unified optimization framework for handling the STAR-RIS operating mode and discrete phase constraints; with a judiciously introduced penalty term, the original problem is transformed into two iterative subproblems, one containing the selection-type constraints and the other handling the remaining wireless resources.
  • results: Applied to a sum-rate maximization problem in the downlink, the framework outperforms existing algorithms tailored to different STAR-RIS scenarios; moreover, STAR-RIS with 4 or even 2 discrete phases achieves almost the same sum-rate as the continuous phase setting, showing for the first time that discrete phases are not necessarily a cause of significant performance degradation.
    Abstract Simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS), which serves users located on both sides of the surface, has recently emerged as a promising enhancement to the traditional reflective only RIS. Due to the lack of a unified comparison of communication systems equipped with different modes of STAR-RIS and the performance degradation caused by the constraints involving discrete selection, this paper proposes a unified optimization framework for handling the STAR-RIS operating mode and discrete phase constraints. With a judiciously introduced penalty term, this framework transforms the original problem into two iterative subproblems, with one containing the selection-type constraints, and the other subproblem handling other wireless resource. Convergent point of the whole algorithm is found to be at least a stationary point under mild conditions. As an illustrative example, the proposed framework is applied to a sum-rate maximization problem in the downlink transmission. Simulation results show that the algorithms from the proposed framework outperform other existing algorithms tailored for different STAR-RIS scenarios. Furthermore, it is found that 4 or even 2 discrete phases STAR-RIS could achieve almost the same sum-rate performance as the continuous phase setting, showing for the first time that discrete phase is not necessarily a cause of significant performance degradation.

A Two-Stage 2D Channel Extrapolation Scheme for TDD 5G NR Systems

  • paper_url: http://arxiv.org/abs/2310.08851
  • repo_url: None
  • paper_authors: Yubo Wan, An Liu
  • for: addressing the channel extrapolation problem in TDD massive MIMO-OFDM systems for 5G NR, incorporating imperfection factors
  • methods: A two-stage two-dimensional (2D) channel extrapolation scheme in both the frequency and time domains, designed to mitigate the negative effects of imperfection factors and ensure high-accuracy channel estimation.
  • results: Simulation results show that the proposed channel extrapolation scheme outperforms baselines and better captures the dynamic sparsity of massive MIMO-OFDM channels.
    Abstract Recently, channel extrapolation has been widely investigated in frequency division duplex (FDD) massive MIMO systems. However, in time division duplex (TDD) fifth generation (5G) new radio (NR) systems, the channel extrapolation problem also arises due to the hopping uplink pilot pattern, which has not been fully researched yet. This paper addresses this gap by formulating a channel extrapolation problem in TDD massive MIMO-OFDM systems for 5G NR, incorporating imperfection factors. A novel two-stage two-dimensional (2D) channel extrapolation scheme in both frequency and time domain is proposed, designed to mitigate the negative effects of imperfection factors and ensure high-accuracy channel estimation. Specifically, in the channel estimation stage, we propose a novel multi-band and multi-timeslot based high-resolution parameter estimation algorithm to achieve 2D channel extrapolation in the presence of imperfection factors. Then, to avoid repeated multi-timeslot based channel estimation, a channel tracking stage is designed during the subsequent time instants, in which a sparse Markov channel model is formulated to capture the dynamic sparsity of massive MIMO-OFDM channels under the influence of imperfection factors. Next, an expectation-maximization (EM) based compressive channel tracking algorithm is designed to jointly estimate unknown imperfection and channel parameters by exploiting the high-resolution prior information of the delay/angle parameters from the previous timeslots. Simulation results underscore the superior performance of our proposed channel extrapolation scheme over baselines.

Spiking Semantic Communication for Feature Transmission with HARQ

  • paper_url: http://arxiv.org/abs/2310.08804
  • repo_url: None
  • paper_authors: Mengyang Wang, Jiahui Li, Mengyao Ma, Xiaopeng Fan
  • for: This paper aims to improve the performance of Semantic Communication (SC) models in Collaborative Intelligence (CI) systems by introducing a novel SC model called SNN-SC-HARQ, which combines SNN-based SC models with the Hybrid Automatic Repeat Request (HARQ) mechanism.
  • methods: The proposed SNN-SC-HARQ model uses a combination of SNN-based SC models and a policy model to dynamically adjust the transmission bandwidth based on channel conditions, without sacrificing performance.
  • results: Experimental results show that SNN-SC-HARQ can dynamically adjust the bandwidth according to the channel conditions without performance loss, improving the overall performance of SC models in CI systems.
    Abstract In Collaborative Intelligence (CI), the Artificial Intelligence (AI) model is divided between the edge and the cloud, with intermediate features being sent from the edge to the cloud for inference. Several deep learning-based Semantic Communication (SC) models have been proposed to reduce feature transmission overhead and mitigate channel noise interference. Previous research has demonstrated that Spiking Neural Network (SNN)-based SC models exhibit greater robustness on digital channels compared to Deep Neural Network (DNN)-based SC models. However, the existing SNN-based SC models require fixed time steps, resulting in fixed transmission bandwidths that cannot be adaptively adjusted based on channel conditions. To address this issue, this paper introduces a novel SC model called SNN-SC-HARQ, which combines the SNN-based SC model with the Hybrid Automatic Repeat Request (HARQ) mechanism. SNN-SC-HARQ comprises an SNN-based SC model that supports the transmission of features at varying bandwidths, along with a policy model that determines the appropriate bandwidth. Experimental results show that SNN-SC-HARQ can dynamically adjust the bandwidth according to the channel conditions without performance loss.

Quickest Change Detection in Autoregressive Models

  • paper_url: http://arxiv.org/abs/2310.08789
  • repo_url: None
  • paper_authors: Zhongchang Sun, Shaofeng Zou
  • for: The quickest change detection (QCD) problem in autoregressive (AR) models.
  • methods: A novel forward variable and an auxiliary Markov chain are used to prove the asymptotic stability condition for the AR model, and a computationally efficient Ergodic CuSum algorithm is constructed; for the data-driven setting with unknown disturbance parameters, an online and computationally efficient gradient ascent CuSum algorithm is designed based on the maximum likelihood principle and a gradient ascent approach.
  • results: The Ergodic CuSum algorithm is shown to be asymptotically optimal, a lower bound on the average run length to false alarm of the gradient ascent CuSum algorithm is derived for practical false alarm control, and simulation results demonstrate the performance of the proposed algorithms.
    Abstract The problem of quickest change detection (QCD) in autoregressive (AR) models is investigated. A system is being monitored with sequentially observed samples. At some unknown time, a disturbance signal occurs and changes the distribution of the observations. The disturbance signal follows an AR model, which is dependent over time. Before the change, observations only consist of measurement noise, and are independent and identically distributed (i.i.d.). After the change, observations consist of the disturbance signal and the measurement noise, are dependent over time, which essentially follow a continuous-state hidden Markov model (HMM). The goal is to design a stopping time to detect the disturbance signal as quickly as possible subject to false alarm constraints. Existing approaches for general non-i.i.d. settings and discrete-state HMMs cannot be applied due to their high computational complexity and memory consumption, and they usually assume some asymptotic stability condition. In this paper, the asymptotic stability condition is firstly theoretically proved for the AR model by a novel design of forward variable and auxiliary Markov chain. A computationally efficient Ergodic CuSum algorithm that can be updated recursively is then constructed and is further shown to be asymptotically optimal. The data-driven setting where the disturbance signal parameters are unknown is further investigated, and an online and computationally efficient gradient ascent CuSum algorithm is designed. The algorithm is constructed by iteratively updating the estimate of the unknown parameters based on the maximum likelihood principle and the gradient ascent approach. The lower bound on its average running length to false alarm is also derived for practical false alarm control. Simulation results are provided to demonstrate the performance of the proposed algorithms.
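
For reference, the classical CuSum recursion on i.i.d. data with known pre- and post-change densities looks as follows; the paper's Ergodic CuSum and gradient ascent CuSum extend this recursion to the dependent AR-disturbance setting with unknown parameters.

```python
import numpy as np

def cusum(x, loglik_post, loglik_pre, threshold):
    """Classical CuSum: W_k = max(W_{k-1} + log-likelihood ratio, 0);
    stop the first time W_k crosses the threshold."""
    w = 0.0
    for k, xk in enumerate(x):
        w = max(w + loglik_post(xk) - loglik_pre(xk), 0.0)
        if w >= threshold:
            return k                                  # declared change point (stopping time)
    return None

rng = np.random.default_rng(0)
change = 500
x = np.concatenate([rng.normal(0, 1, change), rng.normal(0.8, 1, 300)])   # mean shift at k=500
pre = lambda v: -0.5 * v ** 2                        # log N(0, 1) up to a constant
post = lambda v: -0.5 * (v - 0.8) ** 2               # log N(0.8, 1) up to a constant
print("alarm at sample", cusum(x, post, pre, threshold=10.0))
```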

cs.SD - 2023-10-12

End-to-end Online Speaker Diarization with Target Speaker Tracking

  • paper_url: http://arxiv.org/abs/2310.08696
  • repo_url: None
  • paper_authors: Weiqing Wang, Ming Li
  • for: An online target-speaker voice activity detection system for speaker diarization that does not require target-speaker embeddings from a clustering-based diarization system as prior knowledge.
  • methods: The conventional target-speaker voice activity detection framework is adapted for real-time operation: a front-end model extracts frame-level speaker embeddings for each incoming block, the detection state of each speaker is predicted from these embeddings together with the previously estimated target-speaker embeddings, and the target-speaker embeddings are then updated by aggregating the frame-level embeddings according to the predictions for the current block.
  • results: Experimental results show that the proposed method outperforms the offline clustering-based diarization system on the DIHARD III and AliMeeting datasets; extended to multi-channel data, it achieves performance similar to state-of-the-art offline diarization systems.
    Abstract This paper proposes an online target speaker voice activity detection system for speaker diarization tasks, which does not require a priori knowledge from the clustering-based diarization system to obtain the target speaker embeddings. By adapting the conventional target speaker voice activity detection for real-time operation, this framework can identify speaker activities using self-generated embeddings, resulting in consistent performance without permutation inconsistencies in the inference phase. During the inference process, we employ a front-end model to extract the frame-level speaker embeddings for each coming block of a signal. Next, we predict the detection state of each speaker based on these frame-level speaker embeddings and the previously estimated target speaker embedding. Then, the target speaker embeddings are updated by aggregating these frame-level speaker embeddings according to the predictions in the current block. Our model predicts the results for each block and updates the target speakers' embeddings until reaching the end of the signal. Experimental results show that the proposed method outperforms the offline clustering-based diarization system on the DIHARD III and AliMeeting datasets. The proposed method is further extended to multi-channel data, which achieves similar performance with the state-of-the-art offline diarization systems.
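
A minimal sketch of the block-wise loop described above: detection states are obtained by comparing frame embeddings with the current target-speaker embeddings, and the target embeddings are then refreshed from the frames assigned to each speaker. The cosine-similarity rule and the 0.5 update weight are placeholders for the learned front-end and detection networks.

```python
import numpy as np

def process_block(frames, targets, threshold=0.6):
    """frames: (t, d) frame-level embeddings of one block; targets: (s, d) current
    target-speaker embeddings. Returns per-frame activity and updated targets."""
    f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    sims = f @ t.T                                   # (t, s) cosine similarities
    active = sims > threshold                        # detection state per frame and speaker
    new_targets = targets.copy()
    for s in range(targets.shape[0]):
        if active[:, s].any():                       # refresh with frames assigned to speaker s
            new_targets[s] = 0.5 * targets[s] + 0.5 * frames[active[:, s]].mean(axis=0)
    return active, new_targets

rng = np.random.default_rng(0)
targets = rng.normal(size=(2, 16))                   # two estimated target speakers
frames = np.vstack([targets[0] + 0.1 * rng.normal(size=(10, 16)),
                    targets[1] + 0.1 * rng.normal(size=(5, 16))])
active, targets = process_block(frames, targets)
print(active.sum(axis=0))                            # frames attributed to each speaker
```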

Crowdsourced and Automatic Speech Prominence Estimation

  • paper_url: http://arxiv.org/abs/2310.08464
  • repo_url: https://github.com/reseval/reseval
  • paper_authors: Max Morrison, Pranav Pawar, Nathan Pruyne, Jennifer Cole, Bryan Pardo
  • for: The paper is written for the purpose of developing an automated system for speech prominence estimation, which is useful for linguistic analysis and training automated systems for text-to-speech and emotion recognition.
  • methods: The paper uses crowdsourced annotations of a portion of the LibriTTS dataset to train a neural speech prominence estimator, and investigates the impact of dataset size and the number of annotations per utterance on the accuracy of the estimator.
  • results: The paper achieves high accuracy on unseen speakers, datasets, and speaking styles, and provides insights into the design decisions for neural prominence estimation and how annotation cost affects the performance of the estimator.
    Abstract The prominence of a spoken word is the degree to which an average native listener perceives the word as salient or emphasized relative to its context. Speech prominence estimation is the process of assigning a numeric value to the prominence of each word in an utterance. These prominence labels are useful for linguistic analysis, as well as training automated systems to perform emphasis-controlled text-to-speech or emotion recognition. Manually annotating prominence is time-consuming and expensive, which motivates the development of automated methods for speech prominence estimation. However, developing such an automated system using machine-learning methods requires human-annotated training data. Using our system for acquiring such human annotations, we collect and open-source crowdsourced annotations of a portion of the LibriTTS dataset. We use these annotations as ground truth to train a neural speech prominence estimator that generalizes to unseen speakers, datasets, and speaking styles. We investigate design decisions for neural prominence estimation as well as how neural prominence estimation improves as a function of two key factors of annotation cost: dataset size and the number of annotations per utterance.

A cry for help: Early detection of brain injury in newborns

  • paper_url: http://arxiv.org/abs/2310.08338
  • repo_url: None
  • paper_authors: Charles C. Onu, Samantha Latremouille, Arsenii Gorin, Junhao Wang, Uchenna Ekwochi, Peter O. Ubuane, Omolara A. Kehinde, Muhammad A. Salisu, Datonye Briggs, Yoshua Bengio, Doina Precup
  • for: The study aims to detect brain injury in newborns from their cries using an AI algorithm, providing a reliable screening tool where an accurate diagnosis is otherwise unavailable.
  • methods: A new training methodology for audio-based pathology detection models is developed and evaluated on a large database of newborn cry recordings acquired from 5 hospitals across 3 continents; the resulting system, Roseline, extracts interpretable acoustic biomarkers that support clinical decisions and detects neurological injury with an AUC of 92.5% (88.7% sensitivity at 80% specificity).
  • results: Cry-based neurological monitoring opens the door to low-cost, easy-to-use, non-invasive, and contact-free screening of at-risk babies, especially where most births are not attended by a trained physician, and could reduce the need for physically exhausting or radiation-exposing assessments such as brain CT scans; the work points to the infant cry as a vital sign and to the potential of AI-driven sound monitoring for affordable healthcare.
    Abstract Since the 1960s, neonatal clinicians have known that newborns suffering from certain neurological conditions exhibit altered crying patterns such as the high-pitched cry in birth asphyxia. Despite an annual burden of over 1.5 million infant deaths and disabilities, early detection of neonatal brain injuries due to asphyxia remains a challenge, particularly in developing countries where the majority of births are not attended by a trained physician. Here we report on the first inter-continental clinical study to demonstrate that neonatal brain injury can be reliably determined from recorded infant cries using an AI algorithm we call Roseline. Previous and recent work has been limited by the lack of a large, high-quality clinical database of cry recordings, constraining the application of state-of-the-art machine learning. We develop a new training methodology for audio-based pathology detection models and evaluate this system on a large database of newborn cry sounds acquired from geographically diverse settings -- 5 hospitals across 3 continents. Our system extracts interpretable acoustic biomarkers that support clinical decisions and is able to accurately detect neurological injury from newborns' cries with an AUC of 92.5% (88.7% sensitivity at 80% specificity). Cry-based neurological monitoring opens the door for low-cost, easy-to-use, non-invasive and contact-free screening of at-risk babies, especially when integrated into simple devices like smartphones or neonatal ICU monitors. This would provide a reliable tool where there are no alternatives, but also curtail the need to regularly exert newborns to physically-exhausting or radiation-exposing assessments such as brain CT scans. This work sets the stage for embracing the infant cry as a vital sign and indicates the potential of AI-driven sound monitoring for the future of affordable healthcare.

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction

  • paper_url: http://arxiv.org/abs/2310.08277
  • repo_url: None
  • paper_authors: Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa
  • for: The paper proposes a multi-task universal speech enhancement (MUSE) model that can perform five speech enhancement tasks: dereverberation, denoising, speech separation (SS), target speaker extraction (TSE), and speaker counting.
  • methods: Two modules are integrated into a speech enhancement model: an internal separation module that performs both speaker counting and separation, and a TSE module that extracts the target speech from the internal separation outputs using target speaker cues; the model is trained to perform TSE when a target speaker cue is given and SS otherwise.
  • results: Evaluation results demonstrate that the proposed MUSE model can successfully handle multiple tasks with a single model, which had not been accomplished before.
    Abstract We propose a multi-task universal speech enhancement (MUSE) model that can perform five speech enhancement (SE) tasks: dereverberation, denoising, speech separation (SS), target speaker extraction (TSE), and speaker counting. This is achieved by integrating two modules into an SE model: 1) an internal separation module that does both speaker counting and separation; and 2) a TSE module that extracts the target speech from the internal separation outputs using target speaker cues. The model is trained to perform TSE if the target speaker cue is given and SS otherwise. By training the model to remove noise and reverberation, we allow the model to tackle the five tasks mentioned above with a single model, which has not been accomplished yet. Evaluation results demonstrate that the proposed MUSE model can successfully handle multiple tasks with a single model.

cs.CV - 2023-10-12

Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images

  • paper_url: http://arxiv.org/abs/2310.08772
  • repo_url: None
  • paper_authors: Zhao Ning Zou, Yuhang Zhang, Robert Wijaya
  • for: The study investigates how the Transformer-based object detector (DETR) handles different image nuisances, such as occlusion and adversarial perturbations.
  • methods: DETR is evaluated through a series of experiments and benchmarked against convolutional neural network (CNN) based detectors such as YOLO and Faster-RCNN.
  • results: DETR is robust to information loss in occluded images, but adversarial stickers force the network to produce a new, unnecessary set of keys, queries, and values that in most cases misdirect it; DETR also performs worse than YOLOv5 on the image corruption benchmark and depends heavily on the main query when making predictions, leading to imbalanced contributions among queries.
    Abstract Transformer-based object detectors (DETR) have shown significant performance across machine vision tasks, ultimately in object detection. This detector is based on a self-attention mechanism along with the transformer encoder-decoder architecture to capture the global context in the image. The critical issue to be addressed is how this model architecture can handle different image nuisances, such as occlusion and adversarial perturbations. We studied this issue by measuring the performance of DETR with different experiments and benchmarking the network with convolutional neural network (CNN) based detectors like YOLO and Faster-RCNN. We found that DETR performs well when it comes to resistance to interference from information loss in occlusion images. Despite that, we found that the adversarial stickers put on the image require the network to produce a new unnecessary set of keys, queries, and values, which in most cases, results in a misdirection of the network. DETR also performed poorer than YOLOv5 in the image corruption benchmark. Furthermore, we found that DETR depends heavily on the main query when making a prediction, which leads to imbalanced contributions between queries since the main query receives most of the gradient flow.
    摘要 基于Transformer的目标检测器(DETR)在机器视觉任务中表现出色,特别是在目标检测任务上。这种检测器基于自注意力机制以及Transformer编码器-解码器架构,以捕捉图像中的全局上下文。然而,需要解决的问题是这种模型架构如何应对不同的图像干扰,如遮挡和对抗扰动。我们通过多组实验研究了这一问题,并将DETR与基于卷积神经网络(CNN)的检测器(如YOLO和Faster-RCNN)进行了比较。我们发现,DETR对遮挡图像中信息丢失造成的干扰具有较好的抵抗能力。然而,图像上的对抗贴纸会迫使网络生成一组新的、多余的键、查询和值,这在多数情况下会误导网络。此外,DETR在图像损坏基准测试中的表现不如YOLOv5,并且在预测时严重依赖主查询,使主查询获得了大部分梯度,导致各查询之间的贡献不均衡。

Intelligent Scoliosis Screening and Diagnosis: A Survey

  • paper_url: http://arxiv.org/abs/2310.08756
  • repo_url: None
  • paper_authors: Zhang Zhenlin, Pu Lixin, Li Ang, Zhang Jun, Li Xianjie, Fan Jipeng
  • for: 这篇论文旨在综述计算机辅助脊柱侧弯筛查与诊断的现状和发展趋势。
  • methods: 论文系统介绍了用于脊柱侧弯筛查与诊断的各类算法模型,并分析了这些模型的优缺点。
  • results: 论文分析了现有算法模型的优缺点,讨论了该领域当前的发展瓶颈,并展望了未来的发展趋势。
    Abstract Scoliosis is a three-dimensional spinal deformity, which may lead to abnormal morphologies, such as thoracic deformity, and pelvic tilt. Severe patients may suffer from nerve damage and urinary abnormalities. At present, the number of scoliosis patients in primary and secondary schools has exceeded five million in China, the incidence rate is about 3% to 5% which is growing every year. The research on scoliosis, therefore, has important clinical value. This paper systematically introduces computer-assisted scoliosis screening and diagnosis as well as analyzes the advantages and limitations of different algorithm models in the current issue field. Moreover, the paper also discusses the current development bottlenecks in this field and looks forward to future development trends.
    摘要 脊柱侧弯是一种三维脊柱畸形,可能导致胸廓畸形、骨盆倾斜等异常形态。严重的患者可能出现神经损伤和排尿异常。目前,中国中小学校的脊柱侧弯患者人数已超过500万,发病率约为3%-5%,且逐年增长。因此,脊柱侧弯的研究具有重要的临床价值。本文系统介绍了计算机辅助脊柱侧弯筛查与诊断,分析了不同算法模型在当前领域的优缺点,并讨论了该领域的发展瓶颈和未来发展趋势。

PU-Ray: Point Cloud Upsampling via Ray Marching on Implicit Surface

  • paper_url: http://arxiv.org/abs/2310.08755
  • repo_url: https://github.com/sum1lim/PU-Ray
  • paper_authors: Sangwon Lim, Karim El-Basyouny, Yee Hong Yang
  • for: The paper addresses the problems of domain dependency and computational redundancy in deep-learning-based point cloud upsampling methods, and proposes a ray-based upsampling approach with an arbitrary rate for more precise and stable results.
  • methods: The method uses a ray-based approach to simulate the ray marching algorithm for implicit surface learning, and employs a rule-based mid-point query sampling method to achieve a uniform output point distribution without requiring model training.
  • results: The results demonstrate the method's versatility across different domains and training scenarios with limited computational resources and training data, allowing the upsampling task to transition from academic research to real-world applications.
    Abstract While the recent advancements in deep-learning-based point cloud upsampling methods improve the input to autonomous driving systems, they still suffer from the uncertainty of denser point generation resulting from end-to-end learning. For example, due to the vague training objectives of the models, their performance depends on the point distributions of the input and the ground truth. This causes problems of domain dependency between synthetic and real-scanned point clouds and issues with substantial model sizes and dataset requirements. Additionally, many existing methods upsample point clouds with a fixed scaling rate, making them inflexible and computationally redundant. This paper addresses the above problems by proposing a ray-based upsampling approach with an arbitrary rate, where a depth prediction is made for each query ray. The method simulates the ray marching algorithm to achieve more precise and stable ray-depth predictions through implicit surface learning. The rule-based mid-point query sampling method enables a uniform output point distribution without requiring model training using the Chamfer distance loss function, which can exhibit bias towards the training dataset. Self-supervised learning becomes possible with accurate ground truths within the input point cloud. The results demonstrate the method's versatility across different domains and training scenarios with limited computational resources and training data. This allows the upsampling task to transition from academic research to real-world applications.
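A toy sketch of the ray-based idea above: sphere-trace an implicit surface to obtain one depth per query ray, then densify with rule-based mid-point queries. The analytic sphere SDF stands in for the learned implicit function; step rule, thresholds, and points are illustrative assumptions.

```python
# Toy sketch of ray-based depth prediction by marching an implicit surface.
import numpy as np

def sphere_sdf(p, center=np.array([0.0, 0.0, 2.0]), radius=1.0):
    """Signed distance from point(s) p to a sphere surface (stand-in for a learned SDF)."""
    return np.linalg.norm(p - center, axis=-1) - radius

def march_ray(origin, direction, sdf, max_steps=64, eps=1e-4, t_max=10.0):
    """Sphere tracing: advance along the ray by the current distance estimate."""
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if d < eps:          # hit the implicit surface
            return t
        t += d
        if t > t_max:        # ray escaped the scene
            break
    return None

def midpoint_queries(p0, p1, n=4):
    """Rule-based mid-point query sampling between two surface hits (uniform coverage)."""
    ts = np.arange(1, n + 1) / (n + 1)
    return p0[None, :] + ts[:, None] * (p1 - p0)[None, :]

origin = np.zeros(3)
direction = np.array([0.0, 0.0, 1.0])
depth = march_ray(origin, direction, sphere_sdf)
print("predicted ray depth:", depth)                     # ~1.0 for this sphere
hit = origin + depth * direction
other = np.array([0.5, 0.0, 2.0 - np.sqrt(0.75)])        # another point on the sphere
print(midpoint_queries(hit, other, n=3))                  # densified query points between hits
```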

AcTExplore: Active Tactile Exploration on Unknown Objects

  • paper_url: http://arxiv.org/abs/2310.08745
  • repo_url: None
  • paper_authors: Amir-Hossein Shahidzadeh, Seong Jong Yoo, Pavan Mantripragada, Chahat Deep Singh, Cornelia Fermüller, Yiannis Aloimonos
  • for: 本研究旨在提出一种基于强化学习的主动触觉探索方法,以高效地探索物体结构,从而支持机器人抓取和操作等基础任务。
  • methods: 本方法采用强化学习驱动的主动触觉探索策略,通过逐步执行探索动作,在有限步数内自动探索物体表面,并增量式地采集触觉数据。
  • results: 本研究在未看过的 YCB 对象上达到了95.97% IoU 覆盖率,而且只需要在基本形状上进行训练。项目网站:https://prg.cs.umd$.$edu/AcTExplore
    Abstract Tactile exploration plays a crucial role in understanding object structures for fundamental robotics tasks such as grasping and manipulation. However, efficiently exploring such objects using tactile sensors is challenging, primarily due to the large-scale unknown environments and limited sensing coverage of these sensors. To this end, we present AcTExplore, an active tactile exploration method driven by reinforcement learning for object reconstruction at scales that automatically explores the object surfaces in a limited number of steps. Through sufficient exploration, our algorithm incrementally collects tactile data and reconstructs 3D shapes of the objects as well, which can serve as a representation for higher-level downstream tasks. Our method achieves an average of 95.97% IoU coverage on unseen YCB objects while just being trained on primitive shapes. Project Webpage: https://prg.cs.umd$.$edu/AcTExplore
    摘要 触觉探索在抓取和操作等基础机器人任务中对理解物体结构具有关键作用。然而,由于未知环境规模大且触觉传感器的感知范围有限,高效地用触觉传感器探索物体十分具有挑战性。为此,我们提出了AcTExplore,一种由强化学习驱动的主动触觉探索方法,可在有限步数内自动探索物体表面。通过充分的探索,我们的算法逐步采集触觉数据并重建物体的3D形状,该重建结果可作为更高层下游任务的表示。我们的方法仅在基本几何形状上训练,即可在未见过的YCB物体上达到平均95.97%的IoU覆盖率。项目网页:https://prg.cs.umd$.$edu/AcTExplore
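A small sketch of the IoU-style coverage metric quoted above: voxelize the tactilely reconstructed points and the ground-truth surface points, then compare occupancy. Voxel size and the random point sets are illustrative assumptions.

```python
# Sketch of IoU coverage between a reconstructed point set and a ground-truth surface.
import numpy as np

def voxelize(points, voxel=0.05):
    """Map 3D points to a set of occupied voxel indices."""
    return set(map(tuple, np.floor(points / voxel).astype(int)))

def iou_coverage(reconstructed, ground_truth, voxel=0.05):
    a, b = voxelize(reconstructed, voxel), voxelize(ground_truth, voxel)
    return len(a & b) / max(len(a | b), 1)

rng = np.random.default_rng(0)
gt_surface = rng.uniform(0, 1, size=(5000, 3))               # stand-in for the object surface
explored = gt_surface[rng.random(5000) < 0.9]                 # points touched during exploration
explored = explored + rng.normal(0, 0.005, explored.shape)    # sensor noise
print(f"IoU coverage: {iou_coverage(explored, gt_surface):.3f}")
```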

A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches

  • paper_url: http://arxiv.org/abs/2310.08705
  • repo_url: None
  • paper_authors: Kangqing Shen, Gemine Vivone, Xiaoyuan Yang, Simone Lolli, Michael Schmitt
  • for: 这篇论文旨在提出一套基于监督学习的SAR图像着色研究方案,以缓解遥感中SAR图像因斑点噪声和灰度特性而难以解读的问题。
  • methods: 该方案包括一种生成合成彩色SAR图像的协议、多个基线方法,以及一种基于条件生成对抗网络(cGAN)的有效SAR着色方法,并给出了相应的数值评估指标。
  • results: 大量实验表明,我们提出的基于cGAN的网络对SAR着色问题非常有效。代码将公开发布。
    Abstract Synthetic aperture radar (SAR) images are widely used in remote sensing. Interpreting SAR images can be challenging due to their intrinsic speckle noise and grayscale nature. To address this issue, SAR colorization has emerged as a research direction to colorize gray scale SAR images while preserving the original spatial information and radiometric information. However, this research field is still in its early stages, and many limitations can be highlighted. In this paper, we propose a full research line for supervised learning-based approaches to SAR colorization. Our approach includes a protocol for generating synthetic color SAR images, several baselines, and an effective method based on the conditional generative adversarial network (cGAN) for SAR colorization. We also propose numerical assessment metrics for the problem at hand. To our knowledge, this is the first attempt to propose a research line for SAR colorization that includes a protocol, a benchmark, and a complete performance evaluation. Our extensive tests demonstrate the effectiveness of our proposed cGAN-based network for SAR colorization. The code will be made publicly available.
    摘要 雷达图像(SAR)广泛用于远程感知。解释SAR图像可以是困难的,因为它们具有内生的点粒度噪声和灰度特征。为解决这一问题,SAR彩色化在研究中得到了广泛关注,以彩色化灰度SAR图像,保留原始空间信息和雷达信息。但这一研究领域仍处于早期阶段,有很多限制。在这篇论文中,我们提出了一个完整的超级vised学习基础的研究线路,用于SAR彩色化。我们的方法包括一种协议,一些基elines,以及基于条件生成 adversarial network(cGAN)的有效方法。我们还提出了评估问题的数字量表。根据我们所知,这是第一次为SAR彩色化提出了一个完整的研究线路,包括协议、标准和完整的性能评估。我们的广泛测试表明,我们提出的cGAN基于网络可以有效地进行SAR彩色化。代码将公开发布。
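A minimal sketch of one conditional-GAN training step for SAR colorization: the generator maps a single-channel SAR patch to three channels, and the discriminator judges (SAR, color) pairs. The tiny architectures, loss weights, and dummy data are simplified assumptions, not the paper's network.

```python
# One cGAN step: update D on real vs. generated pairs, then update G to fool D
# while staying close to the reference colors (L1 term).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
D = nn.Sequential(nn.Conv2d(1 + 3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(32, 1, 3, stride=2, padding=1))          # patch logits
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

sar = torch.rand(4, 1, 64, 64)       # grayscale SAR patches (dummy batch)
color = torch.rand(4, 3, 64, 64)     # synthetic color references

# Discriminator step: real pairs vs. generated pairs.
fake = G(sar).detach()
d_real = D(torch.cat([sar, color], dim=1))
d_fake = D(torch.cat([sar, fake], dim=1))
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool D and match the reference colors.
fake = G(sar)
d_fake = D(torch.cat([sar, fake], dim=1))
loss_g = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * l1(fake, color)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(f"D loss {loss_d.item():.3f}, G loss {loss_g.item():.3f}")
```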

Fed-Safe: Securing Federated Learning in Healthcare Against Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2310.08681
  • repo_url: None
  • paper_authors: Erfan Darzi, Nanna M. Sijtsema, P. M. A van Ooijen
  • for: 本研究探讨了医学图像分析中联邦学习应用的安全性;现有以鲁棒性为导向的方法(如对抗训练、安全聚合和同态加密)往往带来隐私泄露风险。
  • methods: 本研究使用了分布式噪声来防御模型免受攻击,并且通过对不同攻击场景、参数和使用场景进行广泛的评估来证明其效果。
  • results: 研究结果表明,通过分布式噪声可以实现与传统防御训练相同的安全水平,而且需要更少的重训样本来建立一个可靠的模型。
    Abstract This paper explores the security aspects of federated learning applications in medical image analysis. Current robustness-oriented methods like adversarial training, secure aggregation, and homomorphic encryption often risk privacy compromises. The central aim is to defend the network against potential privacy breaches while maintaining model robustness against adversarial manipulations. We show that incorporating distributed noise, grounded in the privacy guarantees in federated settings, enables the development of a adversarially robust model that also meets federated privacy standards. We conducted comprehensive evaluations across diverse attack scenarios, parameters, and use cases in cancer imaging, concentrating on pathology, meningioma, and glioma. The results reveal that the incorporation of distributed noise allows for the attainment of security levels comparable to those of conventional adversarial training while requiring fewer retraining samples to establish a robust model.
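A sketch of the distributed-noise idea above in a federated setting: each client clips its model update and adds Gaussian noise before the server averages. The linear "model", noise scale, and clipping bound are illustrative assumptions, not the paper's configuration.

```python
# Federated averaging with per-client clipped, noised updates.
import numpy as np

def local_update(weights, data, labels, lr=0.1):
    """One step of logistic-regression-style training on a client's data."""
    logits = data @ weights
    probs = 1.0 / (1.0 + np.exp(-logits))
    grad = data.T @ (probs - labels) / len(labels)
    return weights - lr * grad

def clip_and_noise(update, clip=1.0, sigma=0.1, rng=None):
    """Clip the update norm, then add Gaussian noise before sharing it."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    update = update * min(1.0, clip / (norm + 1e-12))
    return update + rng.normal(0.0, sigma, size=update.shape)

rng = np.random.default_rng(0)
dim, n_clients = 20, 5
global_w = np.zeros(dim)
clients = [(rng.normal(size=(100, dim)), rng.integers(0, 2, 100)) for _ in range(n_clients)]

for rnd in range(10):
    noisy_updates = []
    for data, labels in clients:
        local_w = local_update(global_w.copy(), data, labels)
        noisy_updates.append(clip_and_noise(local_w - global_w, rng=rng))
    global_w = global_w + np.mean(noisy_updates, axis=0)   # server aggregation
print("global weight norm after training:", np.linalg.norm(global_w))
```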

SSG2: A new modelling paradigm for semantic segmentation

  • paper_url: http://arxiv.org/abs/2310.08671
  • repo_url: https://github.com/feevos/ssg2
  • paper_authors: Foivos I. Diakogiannis, Suzanne Furby, Peter Caccetta, Xiaoliang Wu, Rodrigo Ibata, Ondrej Hlinka, John Taylor
  • for: 这个论文主要是为了解决semantic segmentation中的一个问题,即模型只能处理单个静止图像,导致无法进行误差修正。
  • methods: 这篇论文提出了一种方法,即使用序列可观测的方式来提高semantic segmentation的准确率。具体来说,该方法使用了一个双encoder、单decoder的基网络,以及一个序列模型。
  • results: 在三个不同的数据集上进行测试,SSG2模型表现出色,与UNet类基线模型相比,它在同样的数量的梯度更新中显著地提高了准确率。然而,添加时间维度会增加内存占用量。
    Abstract State-of-the-art models in semantic segmentation primarily operate on single, static images, generating corresponding segmentation masks. This one-shot approach leaves little room for error correction, as the models lack the capability to integrate multiple observations for enhanced accuracy. Inspired by work on semantic change detection, we address this limitation by introducing a methodology that leverages a sequence of observables generated for each static input image. By adding this "temporal" dimension, we exploit strong signal correlations between successive observations in the sequence to reduce error rates. Our framework, dubbed SSG2 (Semantic Segmentation Generation 2), employs a dual-encoder, single-decoder base network augmented with a sequence model. The base model learns to predict the set intersection, union, and difference of labels from dual-input images. Given a fixed target input image and a set of support images, the sequence model builds the predicted mask of the target by synthesizing the partial views from each sequence step and filtering out noise. We evaluate SSG2 across three diverse datasets: UrbanMonitor, featuring orthoimage tiles from Darwin, Australia with five spectral bands and 0.2m spatial resolution; ISPRS Potsdam, which includes true orthophoto images with multiple spectral bands and a 5cm ground sampling distance; and ISIC2018, a medical dataset focused on skin lesion segmentation, particularly melanoma. The SSG2 model demonstrates rapid convergence within the first few tens of epochs and significantly outperforms UNet-like baseline models with the same number of gradient updates. However, the addition of the temporal dimension results in an increased memory footprint. While this could be a limitation, it is offset by the advent of higher-memory GPUs and coding optimizations.
    摘要 当前最先进的语义分割模型主要在单张静态图像上运行,生成相应的分割掩码。这种一次性的方法几乎没有纠错的余地,因为模型无法整合多次观测来提升精度。受语义变化检测工作的启发,我们通过为每张静态输入图像引入一个可观测序列来解决这一局限。通过增加这一"时间"维度,我们利用序列中相邻观测之间的强信号相关性来降低错误率。我们的框架名为SSG2(Semantic Segmentation Generation 2),采用双编码器、单解码器的基础网络,并辅以一个序列模型。基础模型学习从成对输入图像中预测标签的交集、并集和差集。给定一张固定的目标输入图像和一组支持图像,序列模型通过综合各序列步骤的部分视图并滤除噪声,合成目标图像的预测掩码。我们在三个不同的数据集上评估SSG2:UrbanMonitor(来自澳大利亚达尔文的正射影像,5个光谱波段,0.2米空间分辨率)、ISPRS Potsdam(多光谱真正射影像,5厘米地面采样距离)以及ISIC2018(聚焦于皮肤病变分割、特别是黑色素瘤的医学数据集)。SSG2模型在最初的几十个epoch内迅速收敛,并在相同梯度更新次数下显著优于类UNet的基线模型。然而,时间维度的引入会增加内存占用;随着更大显存GPU的出现和代码优化,这一限制可以得到缓解。
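A small sketch of the set-operation targets SSG2's base model predicts from a pair of inputs: per-pixel intersection, union, and difference masks. The binary masks here are toy stand-ins for real segmentations.

```python
# Intersection / union / difference targets for a pair of binary label masks.
import numpy as np

def set_targets(mask_a, mask_b):
    inter = np.logical_and(mask_a, mask_b)
    union = np.logical_or(mask_a, mask_b)
    diff = np.logical_and(mask_a, np.logical_not(mask_b))   # in A but not in B
    return inter, union, diff

rng = np.random.default_rng(0)
mask_target = rng.random((8, 8)) > 0.5     # segmentation of the target observation
mask_support = rng.random((8, 8)) > 0.5    # segmentation of one support observation
inter, union, diff = set_targets(mask_target, mask_support)
print("intersection / union / difference pixels:", inter.sum(), union.sum(), diff.sum())
```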

Multimodal Large Language Model for Visual Navigation

  • paper_url: http://arxiv.org/abs/2310.08669
  • repo_url: None
  • paper_authors: Yao-Hung Hubert Tsai, Vansh Dhar, Jialu Li, Bowen Zhang, Jian Zhang
  • for: 本研究旨在开发一种可以通过语言模型来实现视觉导航的方法,不需要复杂的提示系统。
  • methods: 我们的方法使用了简单的文本提示、当前观察和历史收集器模型,将输入为视觉导航。输出为可能的行为选择的概率分布。
  • results: 我们的方法在使用人类示例和碰撞信号从Habitat-Matterport 3D Dataset(HM3D)进行训练后,与现有的行为快照方法相比,表现出来的结果更好,并有效降低碰撞率。
    Abstract Recent efforts to enable visual navigation using large language models have mainly focused on developing complex prompt systems. These systems incorporate instructions, observations, and history into massive text prompts, which are then combined with pre-trained large language models to facilitate visual navigation. In contrast, our approach aims to fine-tune large language models for visual navigation without extensive prompt engineering. Our design involves a simple text prompt, current observations, and a history collector model that gathers information from previous observations as input. For output, our design provides a probability distribution of possible actions that the agent can take during navigation. We train our model using human demonstrations and collision signals from the Habitat-Matterport 3D Dataset (HM3D). Experimental results demonstrate that our method outperforms state-of-the-art behavior cloning methods and effectively reduces collision rates.
    摘要 Recent efforts to enable visual navigation using large language models have mainly focused on developing complex prompt systems. These systems incorporate instructions, observations, and history into massive text prompts, which are then combined with pre-trained large language models to facilitate visual navigation. In contrast, our approach aims to fine-tune large language models for visual navigation without extensive prompt engineering. Our design involves a simple text prompt, current observations, and a history collector model that gathers information from previous observations as input. For output, our design provides a probability distribution of possible actions that the agent can take during navigation. We train our model using human demonstrations and collision signals from the Habitat-Matterport 3D Dataset (HM3D). Experimental results demonstrate that our method outperforms state-of-the-art behavior cloning methods and effectively reduces collision rates.Here's the text in Traditional Chinese:近期对使用大型自然语言模型进行视觉NAVIIGATION的努力主要集中在开发复杂的提示系统上。这些系统将 instrucions、观察和历史合并到巨量文本提示中,然后与预训的大型自然语言模型结合以便视觉NAVIIGATION。相比之下,我们的方法则是调整大型自然语言模型以便视觉NAVIIGATION,不需要广泛的提示工程。我们的设计包括简单的文本提示、当前观察和历史收集器模型,这些模型将前一次观察的信息作为输入,并将输出为可能的行动选择的概率分布。我们使用人类示范和Habitat-Matterport 3D Dataset(HM3D)中的碰撞信号进行训练。实验结果显示,我们的方法比预设的行为复制方法更高效,并有效地降低碰撞率。

Histogram- and Diffusion-Based Medical Out-of-Distribution Detection

  • paper_url: http://arxiv.org/abs/2310.08654
  • repo_url: None
  • paper_authors: Evi M. C. Huijben, Sina Amirrajab, Josien P. W. Pluim
  • for: 本研究旨在提高医学领域人工智能算法的安全性和可靠性,通过检测异常输入数据(Out-of-distribution,OOD)。
  • methods: 本研究提出了一个组合使用 histogram-based 方法和 diffusion-based 方法的检测管道,以检测医学领域中的异常数据。 histogram-based 方法用于检测医学领域中的同型异常(homogeneous anomalies),而 diffusion-based 方法基于最新的无监督异常检测方法(DDPM-OOD)。
  • results: 研究发现,所提出的DDPM方法对模糊和偏置场样本较为敏感(即能够检出),但在解剖形变、黑色切片和图块交换等情形下仍面临挑战。这些发现表明,仍需进一步研究以提升DDPM在医学图像OOD检测中的性能。
    Abstract Out-of-distribution (OOD) detection is crucial for the safety and reliability of artificial intelligence algorithms, especially in the medical domain. In the context of the Medical OOD (MOOD) detection challenge 2023, we propose a pipeline that combines a histogram-based method and a diffusion-based method. The histogram-based method is designed to accurately detect homogeneous anomalies in the toy examples of the challenge, such as blobs with constant intensity values. The diffusion-based method is based on one of the latest methods for unsupervised anomaly detection, called DDPM-OOD. We explore this method and propose extensive post-processing steps for pixel-level and sample-level anomaly detection on brain MRI and abdominal CT data provided by the challenge. Our results show that the proposed DDPM method is sensitive to blur and bias field samples, but faces challenges with anatomical deformation, black slice, and swapped patches. These findings suggest that further research is needed to improve the performance of DDPM for OOD detection in medical images.
    摘要 外部分布 (OOD) 检测是人工智能算法的安全性和可靠性关键,特别在医疗领域。在2023年医疗外部分布检测挑战中,我们提议一个管道, combinates histogram-based 方法和扩散-based 方法。 histogram-based 方法用于准确检测医疗示例中的同质异常,如具有常数Intensity值的blob。扩散-based 方法基于最新的无监督异常检测方法DDPM-OOD。我们探索这个方法,并提出了广泛的后处理步骤,用于像素级和样本级异常检测在脑MRI和腹部CT数据中。我们的结果表明,我们提议的 DDPM 方法对于锐化和偏置场景敏感,但面临着解剖变形、黑色slice和交换 patches 等挑战。这些发现表明,进一步的研究可以提高 DDPM 的外部分布检测性能在医疗图像中。
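A toy sketch of the histogram-based component described above: compare a test image's intensity histogram against an average in-distribution histogram with histogram intersection, so homogeneous blobs stand out. Bin count, synthetic data, and thresholds are assumptions.

```python
# Histogram-intersection OOD score: higher means more out-of-distribution.
import numpy as np

def intensity_histogram(img, bins=64):
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def ood_score(test_img, reference_hist, bins=64):
    h = intensity_histogram(test_img, bins)
    return 1.0 - np.minimum(h, reference_hist).sum()

rng = np.random.default_rng(0)
in_dist = [rng.beta(2, 5, size=(64, 64)) for _ in range(20)]         # stand-in "normal" scans
reference = np.mean([intensity_histogram(x) for x in in_dist], axis=0)

normal = rng.beta(2, 5, size=(64, 64))
blob_anomaly = normal.copy(); blob_anomaly[20:40, 20:40] = 0.95       # homogeneous blob with constant intensity
print(f"normal scan score:  {ood_score(normal, reference):.3f}")
print(f"blob anomaly score: {ood_score(blob_anomaly, reference):.3f}")
```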

Defect Analysis of 3D Printed Cylinder Object Using Transfer Learning Approaches

  • paper_url: http://arxiv.org/abs/2310.08645
  • repo_url: None
  • paper_authors: Md Manjurul Ahsan, Shivakumar Raman, Zahed Siddique
  • for: 这个研究旨在测试机器学习(ML)方法,特别是转移学习(TL)模型,以检测3D印造中的缺陷。
  • methods: 研究使用了多种ML模型,包括VGG16、VGG19、ResNet50、ResNet101、InceptionResNetV2和MobileNetV2,对3D印造中的图像进行分析。
  • results: 研究发现,MobileNetV2、InceptionResNetV2和VGG16等TL模型在第一个研究中均取得了完美的分数,而ResNet50则表现不佳,其平均F1分数为0.32。在第二个研究中,MobileNetV2正确地显示了所有的实例,而ResNet50则因为更多的假阳性和 fewer true positives,其F1分数为0.75。总的来说,研究发现了一些TL模型,如MobileNetV2,可以为3D印造中的缺陷分类提供高精度。
    Abstract Additive manufacturing (AM) is gaining attention across various industries like healthcare, aerospace, and automotive. However, identifying defects early in the AM process can reduce production costs and improve productivity - a key challenge. This study explored the effectiveness of machine learning (ML) approaches, specifically transfer learning (TL) models, for defect detection in 3D-printed cylinders. Images of cylinders were analyzed using models including VGG16, VGG19, ResNet50, ResNet101, InceptionResNetV2, and MobileNetV2. Performance was compared across two datasets using accuracy, precision, recall, and F1-score metrics. In the first study, VGG16, InceptionResNetV2, and MobileNetV2 achieved perfect scores. In contrast, ResNet50 had the lowest performance, with an average F1-score of 0.32. Similarly, in the second study, MobileNetV2 correctly classified all instances, while ResNet50 struggled with more false positives and fewer true positives, resulting in an F1-score of 0.75. Overall, the findings suggest certain TL models like MobileNetV2 can deliver high accuracy for AM defect classification, although performance varies across algorithms. The results provide insights into model optimization and integration needs for reliable automated defect analysis during 3D printing. By identifying the top-performing TL techniques, this study aims to enhance AM product quality through robust image-based monitoring and inspection.
    摘要 三维打印(AM)在医疗、航空和汽车等领域得到了广泛关注,但是早期发现AM制造过程中的缺陷可以降低生产成本和提高生产效率,这是一个关键挑战。本研究通过机器学习(ML)方法,具体来说是传输学习(TL)模型,研究了3D打印的缺陷检测。研究使用了多种模型,包括VGG16、VGG19、ResNet50、ResNet101、InceptionResNetV2和MobileNetV2。通过精度、准确率、回归率和F1得分来评估模型的性能。在第一个研究中,VGG16、InceptionResNetV2和MobileNetV2均取得了完美的分数。相比之下,ResNet50表现最差,其平均F1分数为0.32。在第二个研究中,MobileNetV2正确地分类了所有实例,而ResNet50则有更多的假阳性和 fewer true positive,其F1分数为0.75。总的来说,研究发现一些TL模型,如MobileNetV2,可以在AM缺陷分类中达到高精度。然而,不同的算法之间存在性能差异。这些结果为自动化3D打印图像基于监测和检测中的模型优化和集成提供了信息。通过确定最佳TL技术,本研究旨在通过图像基于的可靠自动检测,提高AM产品质量。
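A minimal sketch of the transfer-learning setup evaluated above: load an ImageNet pre-trained MobileNetV2, freeze the backbone, and replace the classifier with a binary defect/no-defect head. It assumes a recent torchvision; hyperparameters and the dummy batch are illustrative, not the study's settings.

```python
# MobileNetV2 transfer learning: frozen features, new 2-class head, one dummy step.
import torch
import torch.nn as nn
from torchvision import models

model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
for p in model.features.parameters():      # freeze the pre-trained feature extractor
    p.requires_grad = False
model.classifier[1] = nn.Linear(model.last_channel, 2)   # defective vs. non-defective

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.rand(8, 3, 224, 224)        # batch of printed-cylinder images (dummy)
labels = torch.randint(0, 2, (8,))
logits = model(images)
loss = criterion(logits, labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
print("batch loss:", loss.item())
```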

Is Generalized Dynamic Novel View Synthesis from Monocular Videos Possible Today?

  • paper_url: http://arxiv.org/abs/2310.08587
  • repo_url: None
  • paper_authors: Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Angel Bautista, Joshua M. Susskind, Alexander G. Schwing
  • for: generalized dynamic novel view synthesis from monocular videos
  • methods: an analysis framework built on existing techniques, working toward a generalized (pseudo-generalized) approach
  • results: despite lacking scene-specific appearance optimization, the pseudo-generalized approach improves upon some scene-specific methods, provided geometrically and temporally consistent depth estimates are available.
    Abstract Rendering scenes observed in a monocular video from novel viewpoints is a challenging problem. For static scenes the community has studied both scene-specific optimization techniques, which optimize on every test scene, and generalized techniques, which only run a deep net forward pass on a test scene. In contrast, for dynamic scenes, scene-specific optimization techniques exist, but, to our best knowledge, there is currently no generalized method for dynamic novel view synthesis from a given monocular video. To answer whether generalized dynamic novel view synthesis from monocular videos is possible today, we establish an analysis framework based on existing techniques and work toward the generalized approach. We find a pseudo-generalized process without scene-specific appearance optimization is possible, but geometrically and temporally consistent depth estimates are needed. Despite no scene-specific appearance optimization, the pseudo-generalized approach improves upon some scene-specific methods.
    摘要 translate("Rendering scenes observed in a monocular video from novel viewpoints is a challenging problem.")对于单目视频中观察到的场景,社区已经研究了两种类型的技术:一是场景特定优化技术,这些技术在每个测试场景上进行优化;另一种是通用技术,只需在测试场景上运行深度网络的前进 pass。然而,对于动态场景,只有场景特定优化技术存在,而没有通用的方法 для动态新视角synthesis from monocular videos。为了回答这个问题,我们建立了一个分析框架,基于现有的技术和工作 toward a generalized approach。我们发现可以使用 Pseudo-generalized process without scene-specific appearance optimization,但需要ogeometrically和temporally consistent depth estimates。尽管没有场景特定的外观优化, Pseudo-generalized approach仍然可以超越一些场景特定的方法。Note: "Pseudo-generalized" is a term used in the original text, and it refers to a process that is not entirely generalized, but rather a simplified version of a generalized process.

Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes

  • paper_url: http://arxiv.org/abs/2310.08585
  • repo_url: https://github.com/zju3dv/im4d
  • paper_authors: Haotong Lin, Sida Peng, Zhen Xu, Tao Xie, Xingyi He, Hujun Bao, Xiaowei Zhou
  • for: 该论文针对动态视角合成问题,即从多视角视频中生成高质量的动态视图图像。
  • methods: 该论文提出了Im4D混合场景表示,将基于网格的几何表示与基于多视角图像的外观表示结合起来,以捕捉复杂动态场景中的外观细节。
  • results: 该方法在五个动态视角合成数据集上进行了评估,展现了最先进的渲染质量和高效的训练能力,同时实现了实时渲染,在单张RTX 3090 GPU上以79.8 FPS渲染512x512图像。
    Abstract This paper aims to tackle the challenge of dynamic view synthesis from multi-view videos. The key observation is that while previous grid-based methods offer consistent rendering, they fall short in capturing appearance details of a complex dynamic scene, a domain where multi-view image-based rendering methods demonstrate the opposite properties. To combine the best of two worlds, we introduce Im4D, a hybrid scene representation that consists of a grid-based geometry representation and a multi-view image-based appearance representation. Specifically, the dynamic geometry is encoded as a 4D density function composed of spatiotemporal feature planes and a small MLP network, which globally models the scene structure and facilitates the rendering consistency. We represent the scene appearance by the original multi-view videos and a network that learns to predict the color of a 3D point from image features, instead of memorizing detailed appearance totally with networks, thereby naturally making the learning of networks easier. Our method is evaluated on five dynamic view synthesis datasets including DyNeRF, ZJU-MoCap, NHR, DNA-Rendering and ENeRF-Outdoor datasets. The results show that Im4D exhibits state-of-the-art performance in rendering quality and can be trained efficiently, while realizing real-time rendering with a speed of 79.8 FPS for 512x512 images, on a single RTX 3090 GPU.
    摘要 The dynamic geometry of the scene is represented as a 4D density function consisting of spatiotemporal feature planes and a small MLP network. This allows for global modeling of the scene structure and consistent rendering. The scene appearance is represented by the original multi-view videos and a network that predicts the color of a 3D point based on image features, rather than memorizing detailed appearance with networks. This approach makes it easier to learn the networks and naturally leads to more efficient training.We evaluate our method on five dynamic view synthesis datasets, including DyNeRF, ZJU-MoCap, NHR, DNA-Rendering, and ENeRF-Outdoor. The results show that Im4D achieves state-of-the-art performance in rendering quality and can be trained efficiently. Additionally, our method realizes real-time rendering with a speed of 79.8 FPS for 512x512 images on a single RTX 3090 GPU.

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

  • paper_url: http://arxiv.org/abs/2310.08586
  • repo_url: https://github.com/OpenGVLab/PonderV2
  • paper_authors: Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Tong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Wanli Ouyang
  • for: 本研究旨在开发一种robust和高度泛化的3D基础模型,以解决现有的2D计算机视觉和自然语言处理基础模型不足的问题。
  • methods: 该研究提出了一种包括点云编码器和volumetric神经渲染器的完整3D预训练框架,通过对实际图像和预测图像进行对比,以学习有用的3D表示。
  • results: 该研究首次实现了在11个室内和室外标准测试集上的state-of-the-art性能,并在不同的场景下表现了一致性。代码和模型将在https://github.com/OpenGVLab/PonderV2中公开。
    Abstract In contrast to numerous NLP and 2D computer vision foundational models, the learning of a robust and highly generalized 3D foundational model poses considerably greater challenges. This is primarily due to the inherent data variability and the diversity of downstream tasks. In this paper, we introduce a comprehensive 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations, thereby establishing a pathway to 3D foundational models. Motivated by the fact that informative 3D features should be able to encode rich geometry and appearance cues that can be utilized to render realistic images, we propose a novel universal paradigm to learn point cloud representations by differentiable neural rendering, serving as a bridge between 3D and 2D worlds. We train a point cloud encoder within a devised volumetric neural renderer by comparing the rendered images with the real images. Notably, our approach demonstrates the seamless integration of the learned 3D encoder into diverse downstream tasks. These tasks encompass not only high-level challenges such as 3D detection and segmentation but also low-level objectives like 3D reconstruction and image synthesis, spanning both indoor and outdoor scenarios. Besides, we also illustrate the capability of pre-training a 2D backbone using the proposed universal methodology, surpassing conventional pre-training methods by a large margin. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks. The consistent improvements in various settings imply the effectiveness of the proposed method. Code and models will be made available at https://github.com/OpenGVLab/PonderV2.
    摘要 相比多种自然语言处理和2D计算机视觉的基础模型,学习一个强大和高度总结的3D基础模型带来了许多更大的挑战。这主要是因为数据的自然变化和下游任务的多样性。在这篇论文中,我们介绍了一个全面的3D预训练框架,用于实现高效的3D表示的获得,从而建立3D基础模型的路径。我们被激励了由于3D特征应该能够编码丰富的几何和外观提示,以便生成真实的图像。我们提出了一种新的通用 paradigma,用于学习点云表示,作为2D和3D世界之间的桥梁。我们在定制的Volumetric Neural Renderer中训练了一个点云编码器,通过比较生成的图像与真实图像来训练。值得注意的是,我们的方法可以很好地整合学习的3D编码器到多种下游任务中。这些任务包括高级挑战 like 3D检测和分割,以及低级目标 like 3D重建和图像生成,涵盖了室内和室外场景。此外,我们还示出了使用我们所提出的通用方法来预训练2D脊梁的优势。在11个室内和室外标准测试 benchmark上,PonderV2首次实现了状态机器人的性能。这一共见的改进表明了我们的方法的有效性。代码和模型将在https://github.com/OpenGVLab/PonderV2上提供。

Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video

  • paper_url: http://arxiv.org/abs/2310.08584
  • repo_url: None
  • paper_authors: Shashanka Venkataramanan, Mamshad Nayeem Rizve, João Carreira, Yuki M. Asano, Yannis Avrithis
  • for: 这个论文是为了研究自主学习中的数据使用方法,以及如何更加经济地使用数据。
  • methods: 本论文提出了两个贡献。首先,它介绍了一个新的自助学习图像预训练方法,该方法基于时间Tracking来学习认知。其次,它提出了一个新的自助学习预训练方法,该方法使用 transformer 交叉关注来生成焦点地图,并使用这些焦点地图来学习图像和视频下游任务。
  • results: 根据论文描述,使用这两种方法可以使一个来自 Walking Tours 的视频成为 ImageNet 的强大竞争对手。
    Abstract Self-supervised learning has unlocked the potential of scaling up pretraining to billions of images, since annotation is unnecessary. But are we making the best use of data? How more economical can we be? In this work, we attempt to answer this question by making two contributions. First, we investigate first-person videos and introduce a "Walking Tours" dataset. These videos are high-resolution, hours-long, captured in a single uninterrupted take, depicting a large number of objects and actions with natural scene transitions. They are unlabeled and uncurated, thus realistic for self-supervision and comparable with human learning. Second, we introduce a novel self-supervised image pretraining method tailored for learning from continuous videos. Existing methods typically adapt image-based pretraining approaches to incorporate more frames. Instead, we advocate a "tracking to learn to recognize" approach. Our method called DoRA, leads to attention maps that Discover and tRAck objects over time in an end-to-end manner, using transformer cross-attention. We derive multiple views from the tracks and use them in a classical self-supervised distillation loss. Using our novel approach, a single Walking Tours video remarkably becomes a strong competitor to ImageNet for several image and video downstream tasks.
    摘要 自我指导学习已经开放了扩大预训练到数十亿张图像的潜力,因为没有注释。但我们是否可以更经济地使用数据?在这项工作中,我们尝试回答这个问题,并提供了两项贡献。首先,我们研究了一个“步行旅游”数据集,这是高解度、数小时长、不间断拍摄的首人视频。这些视频没有标签和排序,因此与人类学习的方式相似,适合自我指导学习。其次,我们介绍了一种适合从连续视频中学习的自我指导图像预训练方法。现有方法通常是将图像预训练方法与更多帧相结合。相反,我们提倡“跟踪以学习认知”的方法。我们的方法称为DoRA,通过转换器跨层关注来发现和跟踪 объек over time,并使用классиical自我指导液化损失来生成多视图。使用我们的新方法,一个单个步行旅游视频很奇迹地变成了 ImageNet 的强大竞争对手。

Universal Visual Decomposer: Long-Horizon Manipulation Made Easy

  • paper_url: http://arxiv.org/abs/2310.08581
  • repo_url: https://github.com/zcczhang/UVD
  • paper_authors: Zichen Zhang, Yunshuang Li, Osbert Bastani, Abhishek Gupta, Dinesh Jayaraman, Yecheng Jason Ma, Luca Weihs
  • for: 本研究旨在开发一种可靠、可复用的视觉任务分解方法,以便在机器人控制中学习长时程操作任务。
  • methods: 本研究利用预训练的视觉表示,通过检测视觉嵌入空间中的阶段变化来自动发现视觉子目标。无需额外训练,UVD即可缓解组合泛化问题,并在真实任务中显著提升性能。
  • results: 与基线相比,UVD在仿真和真实任务中均表现出色,能够快速学习并适应新任务;UVD能提供更好的组合泛化能力,并可用于构建基于目标的奖励塑形。
    Abstract Real-world robotic tasks stretch over extended horizons and encompass multiple stages. Learning long-horizon manipulation tasks, however, is a long-standing challenge, and demands decomposing the overarching task into several manageable subtasks to facilitate policy learning and generalization to unseen tasks. Prior task decomposition methods require task-specific knowledge, are computationally intensive, and cannot readily be applied to new tasks. To address these shortcomings, we propose Universal Visual Decomposer (UVD), an off-the-shelf task decomposition method for visual long horizon manipulation using pre-trained visual representations designed for robotic control. At a high level, UVD discovers subgoals by detecting phase shifts in the embedding space of the pre-trained representation. Operating purely on visual demonstrations without auxiliary information, UVD can effectively extract visual subgoals embedded in the videos, while incurring zero additional training cost on top of standard visuomotor policy training. Goal-conditioned policies learned with UVD-discovered subgoals exhibit significantly improved compositional generalization at test time to unseen tasks. Furthermore, UVD-discovered subgoals can be used to construct goal-based reward shaping that jump-starts temporally extended exploration for reinforcement learning. We extensively evaluate UVD on both simulation and real-world tasks, and in all cases, UVD substantially outperforms baselines across imitation and reinforcement learning settings on in-domain and out-of-domain task sequences alike, validating the clear advantage of automated visual task decomposition within the simple, compact UVD framework.
    摘要 现实世界中的机器人任务通常时间跨度长且包含多个阶段。学习长时程操作任务一直是一个长期挑战,需要将整体任务分解为若干可控的子任务,以便于策略学习并泛化到未见过的任务。现有的任务分解方法需要任务特定的知识、计算开销大,且难以直接应用于新任务。为解决这些不足,我们提出了通用视觉分解器(UVD),这是一种开箱即用的视觉长时程操作任务分解方法,基于为机器人控制设计的预训练视觉表示。从高层来看,UVD通过检测预训练表示嵌入空间中的阶段变化来发现子目标。UVD仅依赖视觉示范而无需辅助信息,即可有效地从视频中提取视觉子目标,且在标准视觉运动策略训练之外不产生任何额外训练开销。使用UVD发现的子目标训练得到的目标条件策略,在测试时对未见任务表现出显著更强的组合泛化能力。此外,UVD发现的子目标还可用于构建基于目标的奖励塑形,为强化学习启动时间上扩展的探索。我们在仿真和真实任务上对UVD进行了广泛评估;在模仿学习和强化学习设置下,无论是域内还是域外任务序列,UVD都显著优于基线,验证了在简洁紧凑的UVD框架内进行自动视觉任务分解的明显优势。
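A sketch of the subgoal-discovery idea above: embed each demonstration frame with a frozen visual encoder, track each frame's distance to the final goal embedding, and mark frames where the progress curve plateaus as phase shifts. The synthetic embeddings and the specific detection rule are assumptions, not UVD's exact criterion.

```python
# Detect phase shifts (subgoals) as points where progress toward the goal stalls.
import numpy as np

def discover_subgoals(embeddings, window=5, min_drop=0.03):
    """Return frame indices where the goal distance dropped recently but then plateaus."""
    goal = embeddings[-1]
    dist = np.linalg.norm(embeddings - goal, axis=1)
    dist = dist / dist.max()
    subgoals = []
    for t in range(window, len(dist) - window):
        if subgoals and t - subgoals[-1] <= window:
            continue                                   # skip frames too close to the last subgoal
        recent_drop = dist[t - window] - dist[t]
        future_drop = dist[t] - dist[t + window]
        if recent_drop > min_drop and future_drop < min_drop / 2:   # progress, then a plateau
            subgoals.append(t)
    return subgoals

# Synthetic demo: three phases of steady progress separated by pauses.
rng = np.random.default_rng(0)
phases = [np.linspace(1.0, 0.7, 30), np.full(10, 0.7),
          np.linspace(0.7, 0.3, 30), np.full(10, 0.3),
          np.linspace(0.3, 0.0, 30)]
progress = np.concatenate(phases)
embeddings = np.outer(progress, np.ones(16)) + rng.normal(0, 0.002, (len(progress), 16))
print("discovered subgoal frames:", discover_subgoals(embeddings))
```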

OmniControl: Control Any Joint at Any Time for Human Motion Generation

  • paper_url: http://arxiv.org/abs/2310.08580
  • repo_url: https://github.com/neu-vi/OmniControl
  • paper_authors: Yiming Xie, Varun Jampani, Lei Zhong, Deqing Sun, Huaizu Jiang
  • for: 用于 incorporating flexible spatial control signals into a text-conditioned human motion generation model
  • methods: 使用 analytic spatial guidance 和 realism guidance 两种不同的指导方法
  • results: 实验结果表明,OmniControl 可以实现更加真实、协调和一致的人体动作生成,并且在不同 JOINTS 上的控制也有显著改善。
    Abstract We present a novel approach named OmniControl for incorporating flexible spatial control signals into a text-conditioned human motion generation model based on the diffusion process. Unlike previous methods that can only control the pelvis trajectory, OmniControl can incorporate flexible spatial control signals over different joints at different times with only one model. Specifically, we propose analytic spatial guidance that ensures the generated motion can tightly conform to the input control signals. At the same time, realism guidance is introduced to refine all the joints to generate more coherent motion. Both the spatial and realism guidance are essential and they are highly complementary for balancing control accuracy and motion realism. By combining them, OmniControl generates motions that are realistic, coherent, and consistent with the spatial constraints. Experiments on HumanML3D and KIT-ML datasets show that OmniControl not only achieves significant improvement over state-of-the-art methods on pelvis control but also shows promising results when incorporating the constraints over other joints.
    摘要 我们提出了一种新的方法 named OmniControl,用于在基于扩散过程的文本条件人体运动生成模型中 incorporating flexible spatial control signals。与之前的方法不同,OmniControl 可以在不同的 JOINTS 和不同的时间点上使用 flexible spatial control signals,只需一个模型。我们提出了分析空间指导,以确保生成的运动能够紧跟输入控制信号。同时,我们还引入了真实性指导,以进一步让所有 JOINTS 都更加协调,生成更加合理的运动。这两种指导都是重要的,它们是彼此补做的,可以均衡控制准确性和运动真实性。通过将它们结合起来,OmniControl 可以生成更加真实、协调、遵循空间约束的运动。在 HumanML3D 和 KIT-ML 数据集上进行了实验,OmniControl 不仅在 pelvis 控制方面取得了显著改进,还在其他 JOINTS 上 incorporating 约束时表现了良好的结果。

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

  • paper_url: http://arxiv.org/abs/2310.08579
  • repo_url: https://github.com/snap-research/HyperHuman
  • paper_authors: Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov
  • for: 该论文目标是生成高度真实的人像图像,以满足在各种场景下的人像生成需求。
  • methods: 该论文提出了一种叫做HyperHuman的框架,该框架包括三个主要部分:1) 构建了一个大规模的人像数据集(名为HumanVerse),包括340万个图像和人 pose、深度和表面法向量等精心标注。2) 提出了一种叫做Latent Structural Diffusion Model的模型,该模型同时减去了深度和表面法向量以及生成的RGB图像中的噪声。3) 最后,提出了一种叫做Structure-Guided Refiner的组合方法,用于更加细腻地生成更高分辨率的人像图像。
  • results: 经过广泛的实验,该论文的框架实现了state-of-the-art的性能,可以在多种场景下生成高度真实的人像图像。
    Abstract Despite significant advances in large-scale text-to-image models, achieving hyper-realistic human image generation remains a desirable yet unsolved task. Existing models like Stable Diffusion and DALL-E 2 tend to generate human images with incoherent parts or unnatural poses. To tackle these challenges, our key insight is that human image is inherently structural over multiple granularities, from the coarse-level body skeleton to fine-grained spatial geometry. Therefore, capturing such correlations between the explicit appearance and latent structure in one model is essential to generate coherent and natural human images. To this end, we propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts. Specifically, 1) we first build a large-scale human-centric dataset, named HumanVerse, which consists of 340M images with comprehensive annotations like human pose, depth, and surface normal. 2) Next, we propose a Latent Structural Diffusion Model that simultaneously denoises the depth and surface normal along with the synthesized RGB image. Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network, where each branch in the model complements to each other with both structural awareness and textural richness. 3) Finally, to further boost the visual quality, we propose a Structure-Guided Refiner to compose the predicted conditions for more detailed generation of higher resolution. Extensive experiments demonstrate that our framework yields the state-of-the-art performance, generating hyper-realistic human images under diverse scenarios. Project Page: https://snap-research.github.io/HyperHuman/
    摘要 尽管大规模文本到图像模型已经取得了 significativo 进步,但 Achieving hyper-realistic human image generation 仍然是一个需要解决的任务。现有的模型如 Stable Diffusion 和 DALL-E 2 通常会生成人像图像中的部分不协调或不自然的姿势。为了解决这些挑战,我们的关键洞察是人像图像具有多个粒度的结构,从粗粒度的体姿skeleton到细粒度的空间几何。因此,捕捉这些相关性在一个模型中是关键,以生成协调的和自然的人像图像。为此,我们提出了一个统一框架,即 HyperHuman,可以生成宽泛的人像图像,高度真实和多样化的布局。specifically,我们采取以下三个步骤:1. 我们首先建立了一个大规模的人类中心的数据集,名为 HumanVerse,该集包含340万张图像,并包括人pose、深度和表面法向的全面注解。2. 接下来,我们提出了一种干扰难度和表面法向同时减震的模型,即 Latent Structural Diffusion Model。该模型同时学习图像外观、空间关系和几何结构,并在一个统一网络中进行结合学习。每个分支在模型中补做了对于彼此的结构意识和 текстуаль丰富的补做。3. 为了进一步提高视觉质量,我们还提出了一种结构指导的修正器,用于更详细地生成更高分辨率的图像。广泛的实验表明,我们的框架可以 дости到当前最佳性能,在多种enario下生成高度真实的人像图像。项目页面:https://snap-research.github.io/HyperHuman/

Learning to Act from Actionless Videos through Dense Correspondences

  • paper_url: http://arxiv.org/abs/2310.08576
  • repo_url: https://github.com/flow-diffusion/AVDC
  • paper_authors: Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum
  • for: 本研究旨在构建一种基于视频示例的机器人策略,可以在不同机器人和环境中可靠执行多种任务,仅从视频示例中学习而无需使用任何动作标注。
  • methods: 本方法利用图像作为任务免疑表示,同时使用文本来表示机器人目标。我们使用视频拼接技术生成机器人执行动作的视频,并利用密集对准关系来INFER机器人需要执行的具体动作。
  • results: 我们在表面 manipulate 和导航任务上证明了本方法的效果,并提供了一个开源框架,可以有效地模型视频,使得在四个GPU上进行高精度策略模型训练,可以在一天内完成。
    Abstract In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from few video demonstrations without using any action annotations. Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals. By synthesizing videos that ``hallucinate'' robot executing actions and in combination with dense correspondences between frames, our approach can infer the closed-formed action to execute to an environment without the need of any explicit action labels. This unique capability allows us to train the policy solely based on RGB videos and deploy learned policies to various robotic tasks. We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks. Additionally, we contribute an open-source framework for efficient video modeling, enabling the training of high-fidelity policy models with four GPUs within a single day.
    摘要 在这项工作中,我们提出了一种方法,能够基于视频构建一个多功能机器人策略,可靠地执行多种任务在不同的机器人和环境中,只需从视频示例中学习而无需使用任何动作标注。我们的方法利用图像作为任务无关的表示,卷积 both 状态和动作信息,并使用文本作为机器人目标的通用表示。通过将视频“幻化”机器人执行动作,并在每帧之间进行紧密的对应关系,我们的方法可以从RGB视频中推理出要执行的closed-form动作,无需任何显式动作标注。这种特有的能力允许我们通过RGB视频进行训练策略,并将学习的策略部署到多种机器人任务中。我们在表ptop抓取和导航任务上证明了这种方法的效果。此外,我们还提供了一个开源的视频模型框架,可以使用四个GPU在单天内高效地训练高精度策略模型。

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

  • paper_url: http://arxiv.org/abs/2310.08541
  • repo_url: None
  • paper_authors: Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang
  • for: automatic image design and generation
  • methods: multimodal iterative self-refinement with GPT-4V(ision)
  • results: images of better semantic and visual qualities, with the ability to process input ideas with interleaved image-text sequences and follow ideas with design instructions.
    Abstract We introduce ``Idea to Image,'' a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation. Humans can quickly identify the characteristics of different text-to-image (T2I) models via iterative explorations. This enables them to efficiently convert their high-level generation ideas into effective T2I prompts that can produce good images. We investigate if systems based on large multimodal models (LMMs) can develop analogous multimodal self-refinement abilities that enable exploring unknown models or environments via self-refining tries. Idea2Img cyclically generates revised T2I prompts to synthesize draft images, and provides directional feedback for prompt revision, both conditioned on its memory of the probed T2I model's characteristics. The iterative self-refinement brings Idea2Img various advantages over vanilla T2I models. Notably, Idea2Img can process input ideas with interleaved image-text sequences, follow ideas with design instructions, and generate images of better semantic and visual qualities. The user preference study validates the efficacy of multimodal iterative self-refinement on automatic image design and generation.
    摘要 我们介绍“想法到图像”系统,该系统允许多Modal迭代自修正(GPT-4V)为自动图像设计和生成。人类可以快速认识不同的文本到图像(T2I)模型的特征,通过迭代探索来快速转化高级生成想法为有效的T2I提示,以生成好的图像。我们研究了基于大型多Modal模型(LMM)是否可以发展类似的多Modal自修复能力,以实现探索未知模型或环境 via 自修复尝试。Idea2Img 在循环生成修订T2I提示,并提供向提示修改的指导反馈,两者均基于它的记忆 probed T2I 模型的特征。多Modal迭代自修正带来了 Idea2Img 多种优势,包括可以处理交错的图像文本序列、跟随设计指令、生成更好的 semantic 和视觉质量的图像。用户偏好调查证明了自动图像设计和生成中多Modal迭代自修复的有效性。
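A minimal sketch of the iterative self-refinement loop: generate draft images from the current prompt, have a multimodal reviewer score them and suggest a revision, and keep the best draft. `generate_image`, `score_draft`, and `revise_prompt` are hypothetical stand-ins, not GPT-4V or any real T2I API.

```python
# Iterative prompt refinement loop with placeholder model calls.
import random

def generate_image(prompt):                 # placeholder T2I model
    return {"prompt": prompt, "pixels": None}

def score_draft(idea, draft):               # placeholder multimodal critic
    return random.random()

def revise_prompt(idea, prompt, feedback):  # placeholder prompt reviser
    return prompt + f" (refined: address '{feedback}')"

def idea_to_image(idea, n_rounds=3, n_drafts=2):
    prompt, best = idea, None
    for _ in range(n_rounds):
        drafts = [generate_image(prompt) for _ in range(n_drafts)]
        scored = sorted(((score_draft(idea, d), d) for d in drafts),
                        key=lambda x: x[0], reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        feedback = "improve composition and match the idea more closely"
        prompt = revise_prompt(idea, prompt, feedback)   # "memory" of the probed T2I model goes here
    return best

best_score, best_draft = idea_to_image("a watercolor city skyline at dawn")
print(f"best score {best_score:.2f} with prompt: {best_draft['prompt']}")
```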

Image2PCI – A Multitask Learning Framework for Estimating Pavement Condition Indices Directly from Images

  • paper_url: http://arxiv.org/abs/2310.08538
  • repo_url: None
  • paper_authors: Neema Jakisa Owor, Hang Du, Abdulateef Daud, Armstrong Aboah, Yaw Adu-Gyamfi
  • For: The paper aims to develop a unified multi-tasking model for estimating Pavement Condition Index (PCI) directly from top-down pavement images.* Methods: The proposed model is a multi-task architecture that combines feature extraction and four decoders for PCI estimation, crack detection, and segmentation. The model uses deep learning techniques and is trained on a benchmarked and open pavement distress dataset.* Results: The proposed model achieves excellent accuracy on all related tasks for crack detection and segmentation, and can estimate PCI directly from images at real-time speeds. This is the first work that can accomplish this task, to the best of the authors’ knowledge.
    Abstract The Pavement Condition Index (PCI) is a widely used metric for evaluating pavement performance based on the type, extent and severity of distresses detected on a pavement surface. In recent times, significant progress has been made in utilizing deep-learning approaches to automate PCI estimation process. However, the current approaches rely on at least two separate models to estimate PCI values -- one model dedicated to determining the type and extent and another for estimating their severity. This approach presents several challenges, including complexities, high computational resource demands, and maintenance burdens that necessitate careful consideration and resolution. To overcome these challenges, the current study develops a unified multi-tasking model that predicts the PCI directly from a top-down pavement image. The proposed architecture is a multi-task model composed of one encoder for feature extraction and four decoders to handle specific tasks: two detection heads, one segmentation head and one PCI estimation head. By multitasking, we are able to extract features from the detection and segmentation heads for automatically estimating the PCI directly from the images. The model performs very well on our benchmarked and open pavement distress dataset that is annotated for multitask learning (the first of its kind). To our best knowledge, this is the first work that can estimate PCI directly from an image at real time speeds while maintaining excellent accuracy on all related tasks for crack detection and segmentation.
    摘要 《路面条件指数(PCI)评估 metric 是评估路面性能的 widely 使用方法,基于路面表面上的类型、规模和严重程度的病诊。在最近的时间里,深入学习方法在自动化 PCI 评估过程中进行了显著的进步。然而,当前的方法都需要至少两个分开的模型来计算 PCI 值 -- 一个模型用于确定类型和规模,另一个用于估计严重程度。这种方法存在许多挑战,包括复杂性、高计算资源需求和维护压力,需要仔细考虑和解决。为了突破这些挑战,当前的研究开发了一种简化多任务模型,可以直接从路面图像中预测 PCI。我们的建议的架构包括一个嵌入器 для特征提取和四个解码器来处理特定任务:两个检测头、一个分割头和一个 PCI 估计头。通过多任务学习,我们能够自动从检测和分割任务中提取特征,以便直接从图像中预测 PCI。我们的模型在我们自己练制的和公开的路面裂隙数据集上表现出色,并且在所有相关任务上保持了高精度。到我们所知,这是第一个可以在实时速度下从图像中直接预测 PCI,并且保持所有相关任务的高精度的工作。》
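A simplified sketch of the one-encoder / multi-head idea: a shared backbone feeds a segmentation head, a detection-style head, and a PCI regression head that pools the shared features (the paper uses four decoders; layer sizes and heads here are illustrative assumptions).

```python
# Shared encoder with task-specific heads, including direct PCI regression.
import torch
import torch.nn as nn

class PavementMultiTask(nn.Module):
    def __init__(self, feat=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(feat, 2, 1)                     # crack / background
        self.det_head = nn.Conv2d(feat, 5, 1)                     # per-cell box + objectness
        self.pci_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(feat, 1), nn.Sigmoid())   # PCI scaled to 0-100 below

    def forward(self, x):
        f = self.encoder(x)
        return self.seg_head(f), self.det_head(f), 100.0 * self.pci_head(f)

model = PavementMultiTask()
img = torch.rand(2, 3, 256, 256)           # top-down pavement images (dummy)
seg, det, pci = model(img)
print(seg.shape, det.shape, pci.squeeze(1))
```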

XAI Benchmark for Visual Explanation

  • paper_url: http://arxiv.org/abs/2310.08537
  • repo_url: None
  • paper_authors: Yifei Zhang, Siyi Gu, James Song, Bo Pan, Liang Zhao
  • for: The paper aims to provide a benchmark for evaluating the performance of visual explanation models in the context of image data.
  • methods: The paper introduces a comprehensive visual explanation pipeline that integrates data loading, preprocessing, experimental setup, and model evaluation processes, designed to enable fair comparisons of various visual explanation techniques.
  • results: The paper provides a comprehensive review of over 10 evaluation methods for visual explanation and conducts experiments on selected datasets using various model-centered and ground-truth-centered evaluation metrics, demonstrating the effectiveness of the proposed benchmark.
    Abstract The rise of deep learning algorithms has led to significant advancements in computer vision tasks, but their "black box" nature has raised concerns regarding interpretability. Explainable AI (XAI) has emerged as a critical area of research aiming to open this "black box", and shed light on the decision-making process of AI models. Visual explanations, as a subset of Explainable Artificial Intelligence (XAI), provide intuitive insights into the decision-making processes of AI models handling visual data by highlighting influential areas in an input image. Despite extensive research conducted on visual explanations, most evaluations are model-centered since the availability of corresponding real-world datasets with ground truth explanations is scarce in the context of image data. To bridge this gap, we introduce an XAI Benchmark comprising a dataset collection from diverse topics that provide both class labels and corresponding explanation annotations for images. We have processed data from diverse domains to align with our unified visual explanation framework. We introduce a comprehensive Visual Explanation pipeline, which integrates data loading, preprocessing, experimental setup, and model evaluation processes. This structure enables researchers to conduct fair comparisons of various visual explanation techniques. In addition, we provide a comprehensive review of over 10 evaluation methods for visual explanation to assist researchers in effectively utilizing our dataset collection. To further assess the performance of existing visual explanation methods, we conduct experiments on selected datasets using various model-centered and ground truth-centered evaluation metrics. We envision this benchmark could facilitate the advancement of visual explanation models. The XAI dataset collection and easy-to-use code for evaluation are publicly accessible at https://xaidataset.github.io.
    摘要 “深度学习算法的出现导致计算机视觉任务得到了重大进步,但它们的“黑盒”性带来了解释性的担忧。解释人工智能(XAI)成为了一个重要的研究领域,旨在打开这“黑盒”,了解人工智能模型做出决策的过程。视觉解释,作为解释人工智能的一个子集,为处理视觉数据的人工智能模型提供了直观的决策过程解释。然而,大多数研究都是模型中心的,因为对图像数据的相关真实数据集的可用性非常scarce。为了bridging这个差距,我们介绍了一个XAI Benchmark,包括从多种主题收集的数据集,每个数据集都包括图像的类别标签和相应的解释注释。我们对这些数据进行了多种领域的处理,以适应我们的统一的视觉解释框架。我们还提供了一个完整的视觉解释管线,包括数据加载、预处理、实验设置和模型评估过程。这种结构使研究人员能够进行公正的比较多种视觉解释技术。此外,我们还提供了更 than 10 评估方法的完整审查,以帮助研究人员有效地利用我们的数据集。为了进一步评估现有的视觉解释方法,我们在选择的数据集上进行了多种模型中心和真实数据中心的评估指标。我们希望这个Benchmark能够促进视觉解释模型的进步。XAI数据集和使用方式的代码公开访问,可以在 中找到。”

Animating Street View

  • paper_url: http://arxiv.org/abs/2310.08534
  • repo_url: https://github.com/jblsmith/street-view-movie-maker
  • paper_authors: Mengyi Shan, Brian Curless, Ira Kemelmacher-Shlizerman, Steve Seitz
  • for: 这个系统可以自动将街景图像带到生命中,通过插入自然行为的行人和车辆,并且规划路径和交通行为,同时还会模拟遮盖和阴影效果。
  • methods: 这个系统使用了去除原有的人和车辆、插入运动对象、规划路径和交通行为、模拟人群行为、并且使用一致的照明、可见度、遮盖和阴影效果来实现。
  • results: 这个系统在各种街景图像中得到了丰富的生命化效果,包括正常的拍摄图像和扫描图像。
    Abstract We present a system that automatically brings street view imagery to life by populating it with naturally behaving, animated pedestrians and vehicles. Our approach is to remove existing people and vehicles from the input image, insert moving objects with proper scale, angle, motion, and appearance, plan paths and traffic behavior, as well as render the scene with plausible occlusion and shadowing effects. The system achieves these by reconstructing the still image street scene, simulating crowd behavior, and rendering with consistent lighting, visibility, occlusions, and shadows. We demonstrate results on a diverse range of street scenes including regular still images and panoramas.
    摘要 我们提出了一种系统,可以自动将街景图像带到生命中,通过插入自然行为的步行者和交通工具,让图像具有更加生动的效果。我们的方法是从输入图像中移除现有的人员和交通工具,插入正确的规模、角度、运动和外观的运动对象,规划路径和交通行为,同时进行透明度和阴影效果的渲染。该系统通过重建静止图像街景、模拟人群行为、渲染透明度和阴影效果来实现这一目标。我们在多种不同的街景图像中进行了证明,包括普通的静止图像和拍摄的Panorama。

UniPose: Detecting Any Keypoints

  • paper_url: http://arxiv.org/abs/2310.08530
  • repo_url: https://github.com/IDEA-Research/UniPose
  • paper_authors: Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang
  • for: 这个研究旨在探索一个统一的框架,叫做UniPose,以探测任何骨骼结构的人类或动物体姿的关键点,包括眼睛、脚、爪子等细部信息,以便进一步掌握和操作细部物品的视觉理解。
  • methods: 这个研究使用了一个统一的框架,叫做UniPose,让探测关键点的任何类型的物品,包括人类和动物体姿,并且使用了文本或图像提示来进行探测。
  • results: 研究结果显示UniPose能够具有优秀的细部定位和普遍化能力,可以在不同的图像样式、类别和姿势下进行精确的关键点探测。
    Abstract This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation. Keypoint is a structure-aware, pixel-level, and compact representation of any object, especially articulated objects. Existing fine-grained promptable tasks mainly focus on object instance detection and segmentation but often fail to identify fine-grained granularity and structured information of image and instance, such as eyes, leg, paw, etc. Meanwhile, prompt-based keypoint detection is still under-explored. To bridge the gap, we make the first attempt to develop an end-to-end prompt-based keypoint detection framework called UniPose to detect keypoints of any objects. As keypoint detection tasks are unified in this framework, we can leverage 13 keypoint detection datasets with 338 keypoints across 1,237 categories over 400K instances to train a generic keypoint detection model. UniPose can effectively align text-to-keypoint and image-to-keypoint due to the mutual enhancement of textual and visual prompts based on the cross-modality contrastive learning optimization objectives. Our experimental results show that UniPose has strong fine-grained localization and generalization abilities across image styles, categories, and poses. Based on UniPose as a generalist keypoint detector, we hope it could serve fine-grained visual perception, understanding, and generation.
    摘要 这个工作提出了一个统一框架called UniPose,用于检测任何骨Structured object(如人类和动物)的关键点,通过视觉或文本提示进行细腻视觉理解和操作。关键点是一种结构意识、像素级别、紧凑的对象表示,特别是复杂的对象。现有的细腻提示任务主要集中在对象实例检测和分割,但经常无法识别图像和实例的细腻特征,如眼睛、脚、爪子等。同时,基于提示的关键点检测还是下不足探索的领域。为了填补这个空白,我们首次尝试开发了一个端到端基于提示的关键点检测框架,可以检测任何对象的关键点。由于这个框架中的关键点检测任务被统一,我们可以使用13个关键点检测数据集,包含338个关键点,涵盖1,237个类型,共400,000个实例来训练一个通用的关键点检测模型。UniPose可以有效地将文本到关键点和图像到关键点进行对应,基于跨Modalities的对比学习优化目标,从而实现文本和图像之间的协调。我们的实验结果表明,UniPose具有强大的细腻地方化和泛化能力,可以在不同的图像风格、类型和姿势下进行高精度的关键点检测。基于UniPose作为一个通用的关键点检测器,我们希望它可以为细腻视觉理解、理解和生成提供服务。

GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors

  • paper_url: http://arxiv.org/abs/2310.08529
  • repo_url: https://github.com/hustvl/GaussianDreamer
  • paper_authors: Taoran Yi, Jiemin Fang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang
  • for: 本研究旨在bridging 2D和3D扩散模型之间,通过使用 latest 3D Gaussian splatting representation,实现高质量和高效的3D生成。
  • methods: 本研究提出了一种fast 3D生成框架,named as \name,其中2D扩散模型提供初始化点云约束,而3D扩散模型为 initialization提供点云质量约束。操作包括噪点增长和颜色干扰,以提高 initialized Gaussians。
  • results: 根据实验结果,我们的 \name 可以在一个GPU上生成高质量的3D实例,耗时只有25分钟,比之前的方法更快。生成的实例可以 direct rendering in real time。示例和代码可以在https://taoranyi.com/gaussiandreamer/ 中找到。
    Abstract In recent times, the generation of 3D assets from text prompts has shown impressive results. Both 2D and 3D diffusion models can generate decent 3D objects based on prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain. 2D diffusion models enjoy strong abilities of generalization and fine generation, but the 3D consistency is hard to guarantee. This paper attempts to bridge the power from the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation. A fast 3D generation framework, named as \name, is proposed, where the 3D diffusion model provides point cloud priors for initialization and the 2D diffusion model enriches the geometry and appearance. Operations of noisy point growing and color perturbation are introduced to enhance the initialized Gaussians. Our \name can generate a high-quality 3D instance within 25 minutes on one GPU, much faster than previous methods, while the generated instances can be directly rendered in real time. Demos and code are available at https://taoranyi.com/gaussiandreamer/.

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

  • paper_url: http://arxiv.org/abs/2310.08528
  • repo_url: https://github.com/hustvl/4DGaussians
  • paper_authors: Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang
  • for: Efficient dynamic scene rendering, including the modeling of complex motions and high-resolution rendering.
  • methods: Proposes 4D Gaussian Splatting (4D-GS), which constructs an efficient deformation field and connects adjacent Gaussians via a HexPlane to model Gaussian motions and shape deformations.
  • results: Achieves real-time rendering at 70 FPS at 800x800 resolution on an RTX 3090 GPU, with quality comparable to or higher than previous state-of-the-art methods. More demos and code are available at https://guanjunwu.github.io/4dgs/.
    Abstract Representing and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to maintain. We introduce the 4D Gaussian Splatting (4D-GS) to achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency. An efficient deformation field is constructed to model both Gaussian motions and shape deformations. Different adjacent Gaussians are connected via a HexPlane to produce more accurate position and shape deformations. Our 4D-GS method achieves real-time rendering under high resolutions, 70 FPS at a 800$\times$800 resolution on an RTX 3090 GPU, while maintaining comparable or higher quality than previous state-of-the-art methods. More demos and code are available at https://guanjunwu.github.io/4dgs/.
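    Structurally, the deformation field maps a canonical Gaussian plus a timestamp to a deformed state. The toy module below sketches only the position branch of such a field with raw (x, y, z, t) inputs; the actual 4D-GS method uses HexPlane feature grids and also predicts rotation and scale changes, so this is a simplified stand-in.

```python
import torch
import torch.nn as nn

class ToyDeformationField(nn.Module):
    """Maps a canonical Gaussian center plus a timestamp to a deformed center.
    4D-GS additionally predicts rotation/scale changes and uses HexPlane
    features as input; this sketch keeps only the position branch."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, xyz, t):
        # xyz: (N, 3) canonical centers, t: (N, 1) timestamps in [0, 1]
        return xyz + self.mlp(torch.cat([xyz, t], dim=-1))

field = ToyDeformationField()
xyz = torch.randn(1024, 3)
t = torch.full((1024, 1), 0.5)
print(field(xyz, t).shape)  # torch.Size([1024, 3])
```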

Unsupervised Learning of Object-Centric Embeddings for Cell Instance Segmentation in Microscopy Images

  • paper_url: http://arxiv.org/abs/2310.08501
  • repo_url: https://github.com/funkelab/cellulus
  • paper_authors: Steffen Wolf, Manan Lalit, Henry Westmacott, Katie McDole, Jan Funke
  • for: This paper is written for the task of segmenting objects in microscopy images, which is an important task in biomedical applications.
  • methods: The paper introduces a new method called object-centric embeddings (OCEs) that learns to embed image patches in a way that preserves spatial offsets between patches from the same object.
  • results: The paper shows that the OCE method can be used to delineate individual objects and obtain instance segmentations, and evaluates the method on nine diverse large-scale microscopy datasets. The results show that the method leads to substantially improved results compared to state-of-the-art baselines on six out of nine datasets, and performs on par on the remaining three datasets. If ground-truth annotations are available, the method can serve as an excellent starting point for supervised training, reducing the required amount of ground-truth needed by one order of magnitude.
    Abstract Segmentation of objects in microscopy images is required for many biomedical applications. We introduce object-centric embeddings (OCEs), which embed image patches such that the spatial offsets between patches cropped from the same object are preserved. Those learnt embeddings can be used to delineate individual objects and thus obtain instance segmentations. Here, we show theoretically that, under assumptions commonly found in microscopy images, OCEs can be learnt through a self-supervised task that predicts the spatial offset between image patches. Together, this forms an unsupervised cell instance segmentation method which we evaluate on nine diverse large-scale microscopy datasets. Segmentations obtained with our method lead to substantially improved results, compared to state-of-the-art baselines on six out of nine datasets, and perform on par on the remaining three datasets. If ground-truth annotations are available, our method serves as an excellent starting point for supervised training, reducing the required amount of ground-truth needed by one order of magnitude, thus substantially increasing the practical applicability of our method. Source code is available at https://github.com/funkelab/cellulus.
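    The self-supervised pretext task is to predict the spatial offset between two patches cropped from the same image. A minimal sketch of that objective follows, with a placeholder CNN encoder and offset-regression head; the architecture and loss here are assumptions for illustration, not the cellulus implementation.

```python
import torch
import torch.nn as nn

class OffsetPredictor(nn.Module):
    """Toy version of the offset-prediction pretext task: a small CNN embeds
    each patch and an MLP regresses the 2D offset between a patch pair."""
    def __init__(self, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, emb_dim),
        )
        self.head = nn.Linear(2 * emb_dim, 2)   # predicts (dy, dx)

    def forward(self, patch_a, patch_b):
        ea, eb = self.encoder(patch_a), self.encoder(patch_b)
        return self.head(torch.cat([ea, eb], dim=-1))

model = OffsetPredictor()
a, b = torch.randn(4, 1, 32, 32), torch.randn(4, 1, 32, 32)
true_offset = torch.randn(4, 2)                 # known from the crop positions
loss = nn.functional.mse_loss(model(a, b), true_offset)
loss.backward()
```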

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.08465
  • repo_url: https://github.com/showlab/MotionDirector
  • paper_authors: Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou
  • for: The goal is to adapt text-to-video diffusion models so that they generate videos with customized motions.
  • methods: Mainstream adaptation methods such as full model tuning, parameter-efficient tuning of additional layers, and Low-Rank Adaptations (LoRAs) are extended to motion customization; the proposed MotionDirector further uses a dual-path LoRA architecture to decouple the learning of appearance and motion.
  • results: Experimental results show the proposed method can generate videos of diverse appearances for the customized motions. The method also supports downstream applications such as mixing the appearance and motion of different videos and animating a single image with customized motions.
    Abstract Large-scale pre-trained diffusion models have exhibited remarkable capabilities in diverse video generations. Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion. For example, generating a video with a car moving in a prescribed manner under specific camera movements to make a movie, or a video illustrating how a bear would lift weights to inspire creators. Adaptation methods have been developed for customizing appearance like subject or style, yet unexplored for motion. It is straightforward to extend mainstream adaption methods for motion customization, including full model tuning, parameter-efficient tuning of additional layers, and Low-Rank Adaptions (LoRAs). However, the motion concept learned by these methods is often coupled with the limited appearances in the training videos, making it difficult to generalize the customized motion to other appearances. To overcome this challenge, we propose MotionDirector, with a dual-path LoRAs architecture to decouple the learning of appearance and motion. Further, we design a novel appearance-debiased temporal loss to mitigate the influence of appearance on the temporal training objective. Experimental results show the proposed method can generate videos of diverse appearances for the customized motions. Our method also supports various downstream applications, such as the mixing of different videos with their appearance and motion respectively, and animating a single image with customized motions. Our code and model weights will be released.
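    The dual-path LoRA idea can be sketched as a frozen linear layer carrying two independent low-rank adapters, one intended for appearance and one for motion, which can be enabled separately. The module below is a structural illustration only; the rank, scaling, adapter placement, and naming are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class DualPathLoRALinear(nn.Module):
    """Frozen base linear layer plus two low-rank adapters ("appearance" and
    "motion"). Which adapters are active can be toggled, loosely mirroring the
    dual-path idea; this is not the paper's exact module."""
    def __init__(self, in_f, out_f, rank=4, scale=1.0):
        super().__init__()
        self.base = nn.Linear(in_f, out_f)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.scale = scale
        self.A = nn.ParameterDict({
            "appearance": nn.Parameter(torch.randn(rank, in_f) * 0.01),
            "motion": nn.Parameter(torch.randn(rank, in_f) * 0.01),
        })
        self.B = nn.ParameterDict({
            "appearance": nn.Parameter(torch.zeros(out_f, rank)),
            "motion": nn.Parameter(torch.zeros(out_f, rank)),
        })

    def forward(self, x, paths=("appearance", "motion")):
        out = self.base(x)
        for p in paths:
            out = out + self.scale * (x @ self.A[p].T) @ self.B[p].T
        return out

layer = DualPathLoRALinear(64, 64)
y = layer(torch.randn(2, 64), paths=("motion",))   # motion path only
print(y.shape)
```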

Proving the Potential of Skeleton Based Action Recognition to Automate the Analysis of Manual Processes

  • paper_url: http://arxiv.org/abs/2310.08451
  • repo_url: None
  • paper_authors: Marlin Berger, Frederik Cloppenburg, Jens Eufinger, Thomas Gries
  • for: The paper aims to improve the analysis and monitoring of manual processes in manufacturing sectors such as textiles and electronics by using machine learning (ML) methods.
  • methods: The paper uses a skeleton-based action recognition approach, a recently successful method in machine vision tasks, to detect the current motion performed by an operator in manual assembly. The authors also develop an ML pipeline to enable extensive research on different pre-processing methods and neural nets.
  • results: The authors find that ML methods can provide higher flexibility, self-sufficiency, and lower costs compared to traditional methods such as Methods-Time-Measurement (MTM). They also demonstrate that their approach can be applied to all kinds of manual processes, not just manual assembly.
    Abstract In manufacturing sectors such as textiles and electronics, manual processes are a fundamental part of production. The analysis and monitoring of the processes is necessary for efficient production design. Traditional methods for analyzing manual processes are complex, expensive, and inflexible. Compared to established approaches such as Methods-Time-Measurement (MTM), machine learning (ML) methods promise: Higher flexibility, self-sufficient & permanent use, lower costs. In this work, based on a video stream, the current motion class in a manual assembly process is detected. With information on the current motion, Key-Performance-Indicators (KPIs) can be derived easily. A skeleton-based action recognition approach is taken, as this field recently shows major success in machine vision tasks. For skeleton-based action recognition in manual assembly, no sufficient pre-work could be found. Therefore, a ML pipeline is developed, to enable extensive research on different (pre-) processing methods and neural nets. Suitable well generalizing approaches are found, proving the potential of ML to enhance analyzation of manual processes. Models detect the current motion, performed by an operator in manual assembly, but the results can be transferred to all kinds of manual processes.

Assessing of Soil Erosion Risk Through Geoinformation Sciences and Remote Sensing – A Review

  • paper_url: http://arxiv.org/abs/2310.08430
  • repo_url: None
  • paper_authors: Lachezar Filchev, Vasil Kolev
  • for: This paper reviews different types and structures of erosion models, as well as their applications worldwide.
  • methods: It surveys soil erosion risk assessment methods that use the spatial analysis capabilities of geographic information systems (GIS), including the widely used USLE and RUSLE equations (in the USA and worldwide) and the MESALES model, alongside more experimental approaches such as artificial intelligence techniques.
  • results: The paper closes with a prospectus for the future development of soil erosion risk assessment, including the adoption of AI techniques.
    Abstract During past decades a marked manifestation of widespread erosion phenomena was studied worldwide. Global conservation community has launched campaigns at local, regional and continental level in developing countries for preservation of soil resources in order not only to stop or mitigate human impact on nature but also to improve life in rural areas introducing new approaches for soil cultivation. After the adoption of Sustainable Development Goals of UNs and launching several world initiatives such as the Land Degradation Neutrality (LDN) the world came to realize the very importance of the soil resources on which the biosphere relies for its existence. The main goal of the chapter is to review different types and structures erosion models as well as their applications. Several methods using spatial analysis capabilities of geographic information systems (GIS) are in operation for soil erosion risk assessment, such as Universal Soil Loss Equation (USLE), Revised Universal Soil Loss Equation (RUSLE) in operation worldwide and in the USA and MESALES model. These and more models are being discussed in the present work alongside more experimental models and methods for assessing soil erosion risk such as Artificial Intelligence (AI), Machine and Deep Learning, etc. At the end of this work, a prospectus for the future development of soil erosion risk assessment is drawn.
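    The USLE/RUSLE family of models referenced in this review estimates mean annual soil loss as a product of six factors, A = R K L S C P. A tiny helper that evaluates the equation is shown below; the example factor values are arbitrary and only illustrate the computation, not a real site assessment.

```python
def usle_soil_loss(R, K, L, S, C, P):
    """Universal Soil Loss Equation: A = R * K * L * S * C * P
    R: rainfall erosivity, K: soil erodibility, L: slope length factor,
    S: slope steepness factor, C: cover-management factor, P: support practice factor.
    Returns the estimated mean annual soil loss A (units depend on the factor units)."""
    return R * K * L * S * C * P

# arbitrary illustrative values, not from any real site
print(usle_soil_loss(R=300.0, K=0.3, L=1.2, S=1.5, C=0.2, P=0.5))  # 16.2
```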

Revisiting Data Augmentation for Rotational Invariance in Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2310.08429
  • repo_url: https://github.com/facundoq/rotational_invariance_data_augmentation
  • paper_authors: Facundo Manuel Quiroga, Franco Ronchetti, Laura Lanzarini, Aurelio Fernandez-Bariviera
  • for: This paper studies how best to achieve rotational invariance in CNNs for image classification.
  • methods: It uses data augmentation and compares it with two specialized convolutional architectures for rotational invariance or equivariance, Spatial Transformer Networks and Group Equivariant CNNs.
  • results: The study finds that networks trained with data augmentation alone can classify rotated images nearly as well as in the unrotated case, at the cost of additional training time; it also analyzes which layers help the network encode rotational invariance.
    Abstract Convolutional Neural Networks (CNN) offer state of the art performance in various computer vision tasks. Many of those tasks require different subtypes of affine invariances (scale, rotational, translational) to image transformations. Convolutional layers are translation equivariant by design, but in their basic form lack invariances. In this work we investigate how best to include rotational invariance in a CNN for image classification. Our experiments show that networks trained with data augmentation alone can classify rotated images nearly as well as in the normal unrotated case; this increase in representational power comes only at the cost of training time. We also compare data augmentation versus two modified CNN models for achieving rotational invariance or equivariance, Spatial Transformer Networks and Group Equivariant CNNs, finding no significant accuracy increase with these specialized methods. In the case of data augmented networks, we also analyze which layers help the network to encode the rotational invariance, which is important for understanding its limitations and how to best retrain a network with data augmentation to achieve invariance to rotation.
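    The data-augmentation route studied here amounts to inserting a random rotation into the training transform. A minimal torchvision-style sketch follows; the rotation range and the toy input are illustrative choices, not the paper's exact training setup.

```python
import numpy as np
from PIL import Image
from torchvision import transforms

# Random-rotation augmentation pipeline (illustrative rotation range).
train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=180),   # sample an angle in [-180, 180]
    transforms.ToTensor(),
])

# Toy grayscale image standing in for a dataset sample; with a real dataset this
# transform would be passed to e.g. datasets.MNIST(..., transform=train_transform),
# so each epoch sees differently rotated copies of the same images.
img = Image.fromarray((np.random.rand(28, 28) * 255).astype("uint8"))
print(train_transform(img).shape)  # torch.Size([1, 28, 28])
```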

Visual Attention-Prompted Prediction and Learning

  • paper_url: http://arxiv.org/abs/2310.08420
  • repo_url: None
  • paper_authors: Yifei Zhang, Siyi Gu, Bo Pan, Guangji Bai, Xiaofeng Yang, Liang Zhao
  • for: Improving predictive performance with attention guidance while avoiding the time and computational cost of model retraining required by attention-guided learning.
  • methods: Proposes an attention-prompted prediction technique that needs no model retraining, and addresses the incomplete information of visual attention prompts via perturbation-based attention map modification and optimization-based mask aggregation.
  • results: Experiments on two datasets show that the proposed framework enhances predictions for samples both with and without attention prompts.
    Abstract Explanation(attention)-guided learning is a method that enhances a model's predictive power by incorporating human understanding during the training phase. While attention-guided learning has shown promising results, it often involves time-consuming and computationally expensive model retraining. To address this issue, we introduce the attention-prompted prediction technique, which enables direct prediction guided by the attention prompt without the need for model retraining. However, this approach presents several challenges, including: 1) How to incorporate the visual attention prompt into the model's decision-making process and leverage it for future predictions even in the absence of a prompt? and 2) How to handle the incomplete information from the visual attention prompt? To tackle these challenges, we propose a novel framework called Visual Attention-Prompted Prediction and Learning, which seamlessly integrates visual attention prompts into the model's decision-making process and adapts to images both with and without attention prompts for prediction. To address the incomplete information of the visual attention prompt, we introduce a perturbation-based attention map modification method. Additionally, we propose an optimization-based mask aggregation method with a new weight learning function for adaptive perturbed annotation aggregation in the attention map modification process. Our overall framework is designed to learn in an attention-prompt guided multi-task manner to enhance future predictions even for samples without attention prompts and trained in an alternating manner for better convergence. Extensive experiments conducted on two datasets demonstrate the effectiveness of our proposed framework in enhancing predictions for samples, both with and without provided prompts.

Towards Design and Development of an ArUco Markers-Based Quantitative Surface Tactile Sensor

  • paper_url: http://arxiv.org/abs/2310.08398
  • repo_url: None
  • paper_authors: Ozdemir Can Kara, Charles Everson, Farshid Alambeigi
  • for: The goal of this work is to quantify the qualitative image outputs of a vision-based tactile sensor (VTS).
  • methods: The paper presents a novel Quantitative Surface Tactile Sensor (QS-TS) built around miniature 1.5 mm x 1.5 mm ArUco markers; the markers' real-time camera pose estimates serve as a quantitative measure of the sensor's gel-layer deformation, enabling safe and autonomous tactile manipulation of delicate objects with robotic manipulators.
  • results: Experiments show that QS-TS estimates the gel-layer deformation with a relative error below 5%.
    Abstract In this paper, with the goal of quantifying the qualitative image outputs of a Vision-based Tactile Sensor (VTS), we present the design, fabrication, and characterization of a novel Quantitative Surface Tactile Sensor (called QS-TS). QS-TS directly estimates the sensor's gel layer deformation in real-time enabling safe and autonomous tactile manipulation and servoing of delicate objects using robotic manipulators. The core of the proposed sensor is the utilization of miniature 1.5 mm x 1.5 mm synthetic square markers with inner binary patterns and a broad black border, called ArUco Markers. Each ArUco marker can provide real-time camera pose estimation that, in our design, is used as a quantitative measure for obtaining deformation of the QS-TS gel layer. Moreover, thanks to the use of ArUco markers, we propose a unique fabrication procedure that mitigates various challenges associated with the fabrication of the existing marker-based VTSs and offers an intuitive and less-arduous method for the construction of the VTS. Remarkably, the proposed fabrication facilitates the integration and adherence of markers with the gel layer to robustly and reliably obtain a quantitative measure of deformation in real-time regardless of the orientation of ArUco Markers. The performance and efficacy of the proposed QS-TS in estimating the deformation of the sensor's gel layer were experimentally evaluated and verified. Results demonstrate the phenomenal performance of the QS-TS in estimating the deformation of the gel layer with a relative error of <5%.
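    At the heart of QS-TS is tracking ArUco markers seen through the gel. The sketch below shows generic ArUco detection with OpenCV and extraction of marker centers whose frame-to-frame displacement could serve as a deformation proxy; note that the aruco API shown (ArucoDetector) requires OpenCV 4.7 or newer, and the dictionary choice and helper name are assumptions, not the paper's pipeline.

```python
import cv2
import numpy as np

def detect_aruco_centers(image_bgr):
    """Detect ArUco markers and return their ids and pixel-space centers.
    Uses the OpenCV >= 4.7 ArucoDetector API; older versions expose
    cv2.aruco.detectMarkers instead. Marker displacement between frames
    could then serve as a proxy for gel-layer deformation."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
    corners, ids, _ = detector.detectMarkers(gray)
    if ids is None:
        return [], []
    centers = [c.reshape(4, 2).mean(axis=0) for c in corners]
    return ids.flatten().tolist(), centers

# toy call on a blank image (no markers expected)
ids, centers = detect_aruco_centers(np.zeros((480, 640, 3), dtype=np.uint8))
print(ids, centers)  # [] []
```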

Hyp-UML: Hyperbolic Image Retrieval with Uncertainty-aware Metric Learning

  • paper_url: http://arxiv.org/abs/2310.08390
  • repo_url: None
  • paper_authors: Shiyang Yan, Zongxuan Liu, Lin Xu
  • for: This paper targets image retrieval and classification, where metric learning is a key algorithm for representation learning, e.g., feature learning and its alignment in metric space.
  • methods: It proposes a hyperbolic image embedding together with two types of uncertainty-aware metric learning, one for contrastive learning and one for margin-based metric learning.
  • results: Experiments confirm that the proposed method achieves state-of-the-art results among related methods, and a comprehensive ablation study validates the effectiveness of each component.
    Abstract Metric learning plays a critical role in training image retrieval and classification. It is also a key algorithm in representation learning, e.g., for feature learning and its alignment in metric space. Hyperbolic embedding has been recently developed. Compared to the conventional Euclidean embedding in most of the previously developed models, Hyperbolic embedding can be more effective in representing the hierarchical data structure. Second, uncertainty estimation/measurement is a long-lasting challenge in artificial intelligence. Successful uncertainty estimation can improve a machine learning model's performance, robustness, and security. In Hyperbolic space, uncertainty measurement is at least with equivalent, if not more, critical importance. In this paper, we develop a Hyperbolic image embedding with uncertainty-aware metric learning for image retrieval. We call our method Hyp-UML: Hyperbolic Uncertainty-aware Metric Learning. Our contribution are threefold: we propose an image embedding algorithm based on Hyperbolic space, with their corresponding uncertainty value; we propose two types of uncertainty-aware metric learning, for the popular Contrastive learning and conventional margin-based metric learning, respectively. We perform extensive experimental validations to prove that the proposed algorithm can achieve state-of-the-art results among related methods. The comprehensive ablation study validates the effectiveness of each component of the proposed algorithm.
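    Hyperbolic metric learning replaces Euclidean distances with geodesic distances on, for example, the Poincare ball. The helper below implements the standard Poincare-ball distance (curvature fixed to -1) that such methods typically build on; Hyp-UML's specific losses and uncertainty terms are not reproduced here.

```python
import torch

def poincare_distance(u, v, eps=1e-5):
    """Geodesic distance on the unit Poincare ball (curvature -1):
    d(u, v) = arccosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    u, v: (..., D) tensors with norms strictly below 1."""
    sq_u = torch.clamp(1.0 - (u * u).sum(-1), min=eps)
    sq_v = torch.clamp(1.0 - (v * v).sum(-1), min=eps)
    sq_diff = ((u - v) ** 2).sum(-1)
    x = 1.0 + 2.0 * sq_diff / (sq_u * sq_v)
    return torch.acosh(torch.clamp(x, min=1.0 + eps))

u = torch.tensor([[0.1, 0.2], [0.0, 0.0]])
v = torch.tensor([[0.4, -0.3], [0.5, 0.0]])
print(poincare_distance(u, v))
```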

MeanAP-Guided Reinforced Active Learning for Object Detection

  • paper_url: http://arxiv.org/abs/2310.08387
  • repo_url: None
  • paper_authors: Zhixuan Liang, Xingyu Zeng, Rui Zhao, Ping Luo
  • for: This work aims to train object detection models effectively with minimal labeled data by selecting the most informative instances to label and incorporating them into the task learner.
  • methods: It uses the task model's MeanAP metric as the informativeness signal for querying data and employs a reinforcement-learning-based sampling agent, built on an LSTM, to select subsequent training instances.
  • results: Experiments show that MAGRAL outperforms recent state-of-the-art methods on PASCAL VOC and MS COCO with substantial performance gains, establishing a solid baseline for reinforced active learning in object detection.
    Abstract Active learning presents a promising avenue for training high-performance models with minimal labeled data, achieved by judiciously selecting the most informative instances to label and incorporating them into the task learner. Despite notable advancements in active learning for image recognition, metrics devised or learned to gauge the information gain of data, crucial for query strategy design, do not consistently align with task model performance metrics, such as Mean Average Precision (MeanAP) in object detection tasks. This paper introduces MeanAP-Guided Reinforced Active Learning for Object Detection (MAGRAL), a novel approach that directly utilizes the MeanAP metric of the task model to devise a sampling strategy employing a reinforcement learning-based sampling agent. Built upon LSTM architecture, the agent efficiently explores and selects subsequent training instances, and optimizes the process through policy gradient with MeanAP serving as reward. Recognizing the time-intensive nature of MeanAP computation at each step, we propose fast look-up tables to expedite agent training. We assess MAGRAL's efficacy across popular benchmarks, PASCAL VOC and MS COCO, utilizing different backbone architectures. Empirical findings substantiate MAGRAL's superiority over recent state-of-the-art methods, showcasing substantial performance gains. MAGRAL establishes a robust baseline for reinforced active object detection, signifying its potential in advancing the field.

AutoVP: An Automated Visual Prompting Framework and Benchmark

  • paper_url: http://arxiv.org/abs/2310.08381
  • repo_url: https://github.com/IBM/AutoVP
  • paper_authors: Hsi-Ai Tsao, Lei Hsiung, Pin-Yu Chen, Sijia Liu, Tsung-Yi Ho
  • for: The paper proposes AutoVP, an expandable framework for automating Visual Prompting (VP) design choices, together with 12 downstream image-classification tasks that serve as a holistic VP-performance benchmark.
  • methods: The framework covers three design spaces: 1) joint optimization of the prompts; 2) selection of pre-trained models, including image classifiers and text-image encoders; and 3) model output mapping strategies, including non-parametric and trainable label mapping.
  • results: Experiments show that AutoVP outperforms the best-known current VP methods by a substantial margin, with up to 6.7% improvement in accuracy, and attains a maximum performance increase of 27.5% over the linear-probing (LP) baseline.
    Abstract Visual prompting (VP) is an emerging parameter-efficient fine-tuning approach to adapting pre-trained vision models to solve various downstream image-classification tasks. However, there has hitherto been little systematic study of the design space of VP and no clear benchmark for evaluating its performance. To bridge this gap, we propose AutoVP, an end-to-end expandable framework for automating VP design choices, along with 12 downstream image-classification tasks that can serve as a holistic VP-performance benchmark. Our design space covers 1) the joint optimization of the prompts; 2) the selection of pre-trained models, including image classifiers and text-image encoders; and 3) model output mapping strategies, including nonparametric and trainable label mapping. Our extensive experimental results show that AutoVP outperforms the best-known current VP methods by a substantial margin, having up to 6.7% improvement in accuracy; and attains a maximum performance increase of 27.5% compared to linear-probing (LP) baseline. AutoVP thus makes a two-fold contribution: serving both as an efficient tool for hyperparameter tuning on VP design choices, and as a comprehensive benchmark that can reasonably be expected to accelerate VP's development. The source code is available at https://github.com/IBM/AutoVP.
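    A common form of visual prompt is a learnable frame of pixels padded around a resized input that is then fed to a frozen backbone. The sketch below shows such a padding prompt; the prompt size and the way it would be combined with a frozen classifier and an output label mapping are assumptions, and AutoVP searches over a much richer design space than this single choice.

```python
import torch
import torch.nn as nn

class PadPrompt(nn.Module):
    """Learnable frame of pixels added around an image: the inner region keeps
    the (resized) input, the border is a trainable visual prompt."""
    def __init__(self, image_size=224, pad=16):
        super().__init__()
        self.pad = pad
        self.inner = image_size - 2 * pad
        self.prompt = nn.Parameter(torch.zeros(1, 3, image_size, image_size))

    def forward(self, x):
        # x: (B, 3, H, W), resized to the inner region and pasted over the prompt
        x = nn.functional.interpolate(x, size=(self.inner, self.inner),
                                      mode="bilinear", align_corners=False)
        canvas = self.prompt.expand(x.size(0), -1, -1, -1).clone()
        canvas[:, :, self.pad:-self.pad, self.pad:-self.pad] = x
        return canvas

prompt = PadPrompt()
out = prompt(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 3, 224, 224])
# `out` would then be passed to a frozen classifier; only `prompt.prompt`
# (plus an output label mapping) would be trained.
```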

Worst-Case Morphs using Wasserstein ALI and Improved MIPGAN

  • paper_url: http://arxiv.org/abs/2310.08371
  • repo_url: None
  • paper_authors: Una M. Kelly, Meike Nauta, Lu Liu, Luuk J. Spreeuwers, Raymond N. J. Veldhuis
  • for: The goal of this paper is to generate worst-case morphs that challenge the security of face recognition (FR) systems.
  • methods: The method builds on Adversarially Learned Inference (ALI) and concepts from Wasserstein GANs trained with Gradient Penalty (WGAN-GP), fine-tuned with loss functions designed to improve the ability to manipulate identity information in facial images.
  • results: The results show that the proposed WALI method can generate morphs that are more challenging for FR systems than landmark- or GAN-based morphs, and that the findings can be used to improve MIPGAN, an existing StyleGAN-based morph generator.
    Abstract A morph is a combination of two separate facial images and contains identity information of two different people. When used in an identity document, both people can be authenticated by a biometric Face Recognition (FR) system. Morphs can be generated using either a landmark-based approach or approaches based on deep learning such as Generative Adversarial Networks (GAN). In a recent paper, we introduced a \emph{worst-case} upper bound on how challenging morphing attacks can be for an FR system. The closer morphs are to this upper bound, the bigger the challenge they pose to FR. We introduced an approach with which it was possible to generate morphs that approximate this upper bound for a known FR system (white box), but not for unknown (black box) FR systems. In this paper, we introduce a morph generation method that can approximate worst-case morphs even when the FR system is not known. A key contribution is that we include the goal of generating difficult morphs \emph{during} training. Our method is based on Adversarially Learned Inference (ALI) and uses concepts from Wasserstein GANs trained with Gradient Penalty, which were introduced to stabilise the training of GANs. We include these concepts to achieve similar improvement in training stability and call the resulting method Wasserstein ALI (WALI). We finetune WALI using loss functions designed specifically to improve the ability to manipulate identity information in facial images and show how it can generate morphs that are more challenging for FR systems than landmark- or GAN-based morphs. We also show how our findings can be used to improve MIPGAN, an existing StyleGAN-based morph generator.
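    The gradient penalty borrowed from WGAN-GP is computed on random interpolates between real and generated samples. Below is the standard form of that term with a toy critic; the critic architecture and penalty weight are placeholders, not the WALI setup.

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP penalty: E[(||grad_x critic(x_hat)||_2 - 1)^2] on random
    interpolates x_hat between real and fake samples."""
    batch = real.size(0)
    eps = torch.rand(batch, *([1] * (real.dim() - 1)), device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads, = torch.autograd.grad(outputs=scores.sum(), inputs=x_hat, create_graph=True)
    grad_norm = grads.view(batch, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()

critic = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1))  # toy critic
real = torch.randn(4, 3, 8, 8)
fake = torch.randn(4, 3, 8, 8)
print(float(gradient_penalty(critic, real, fake)))
```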

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2310.08370
  • repo_url: https://github.com/Nightmare-n/UniPAD
  • paper_authors: Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang
  • for: This paper is written for the purpose of proposing a novel self-supervised learning paradigm called UniPAD, which is designed to improve the effectiveness of feature learning for autonomous driving.
  • methods: The paper uses a 3D volumetric differentiable rendering technique to implicitly encode 3D space and facilitate the reconstruction of continuous 3D shape structures and intricate appearance characteristics of their 2D projections.
  • results: The paper demonstrates the feasibility and effectiveness of UniPAD through extensive experiments on various downstream 3D tasks, achieving significant improvements over lidar-, camera-, and lidar-camera-based baselines (9.1, 7.7, and 6.9 NDS, respectively) and state-of-the-art results of 73.2 NDS for 3D object detection and 79.4 mIoU for 3D semantic segmentation on the nuScenes validation set.
    Abstract In the context of autonomous driving, the significance of effective feature learning is widely acknowledged. While conventional 3D self-supervised pre-training methods have shown widespread success, most methods follow the ideas originally designed for 2D images. In this paper, we present UniPAD, a novel self-supervised learning paradigm applying 3D volumetric differentiable rendering. UniPAD implicitly encodes 3D space, facilitating the reconstruction of continuous 3D shape structures and the intricate appearance characteristics of their 2D projections. The flexibility of our method enables seamless integration into both 2D and 3D frameworks, enabling a more holistic comprehension of the scenes. We manifest the feasibility and effectiveness of UniPAD by conducting extensive experiments on various downstream 3D tasks. Our method significantly improves lidar-, camera-, and lidar-camera-based baseline by 9.1, 7.7, and 6.9 NDS, respectively. Notably, our pre-training pipeline achieves 73.2 NDS for 3D object detection and 79.4 mIoU for 3D semantic segmentation on the nuScenes validation set, achieving state-of-the-art results in comparison with previous methods. The code will be available at https://github.com/Nightmare-n/UniPAD.
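    The rendering-based pretext task relies on standard differentiable volume rendering: per-sample densities along a ray are turned into compositing weights that integrate features or depth. The sketch below implements that generic equation only; UniPAD's actual rendering decoder and sampling strategy are not shown.

```python
import torch

def render_along_rays(densities, values, deltas):
    """Standard volume rendering:
    alpha_i = 1 - exp(-sigma_i * delta_i)
    w_i     = alpha_i * prod_{j<i} (1 - alpha_j)
    output  = sum_i w_i * value_i
    densities: (R, S) non-negative sigmas, values: (R, S, C), deltas: (R, S)."""
    alphas = 1.0 - torch.exp(-densities * deltas)
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = alphas * trans
    return (weights.unsqueeze(-1) * values).sum(dim=-2)

R, S, C = 8, 32, 3
out = render_along_rays(torch.rand(R, S), torch.rand(R, S, C), torch.full((R, S), 0.05))
print(out.shape)  # torch.Size([8, 3])
```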

Mapping Memes to Words for Multimodal Hateful Meme Classification

  • paper_url: http://arxiv.org/abs/2310.08368
  • repo_url: https://github.com/miccunifi/issues
  • paper_authors: Giovanni Burbi, Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto Del Bimbo
  • for: This work addresses the detection of hateful content in multimodal image-text memes, to improve the identification and moderation of hateful content online.
  • methods: It proposes ISSUES, a method that leverages a pre-trained CLIP vision-language model and the textual inversion technique to effectively capture the multimodal semantic content of memes.
  • results: Experiments show that ISSUES achieves state-of-the-art results on the Hateful Memes Challenge and HarMeme datasets. Code and pre-trained models are publicly available at https://github.com/miccunifi/ISSUES.
    Abstract Multimodal image-text memes are prevalent on the internet, serving as a unique form of communication that combines visual and textual elements to convey humor, ideas, or emotions. However, some memes take a malicious turn, promoting hateful content and perpetuating discrimination. Detecting hateful memes within this multimodal context is a challenging task that requires understanding the intertwined meaning of text and images. In this work, we address this issue by proposing a novel approach named ISSUES for multimodal hateful meme classification. ISSUES leverages a pre-trained CLIP vision-language model and the textual inversion technique to effectively capture the multimodal semantic content of the memes. The experiments show that our method achieves state-of-the-art results on the Hateful Memes Challenge and HarMeme datasets. The code and the pre-trained models are publicly available at https://github.com/miccunifi/ISSUES.

A Generic Software Framework for Distributed Topological Analysis Pipelines

  • paper_url: http://arxiv.org/abs/2310.08339
  • repo_url: None
  • paper_authors: Eve Le Guillou, Michael Will, Pierre Guillou, Jonas Lukasczyk, Pierre Fortin, Christoph Garth, Julien Tierny
  • for: This paper presents a software framework supporting topological analysis pipelines in a distributed-memory model. In contrast to recent work that implements individual topology-based algorithms for distributed-memory environments, it describes a general-purpose, generic framework in which a sequence of topological algorithms interact, possibly on distinct numbers of processes.
  • methods: The framework is instantiated with the MPI model within the Topology ToolKit (TTK). The paper documents the algorithmic and software engineering challenges encountered, provides a taxonomy of the distributed-memory topological algorithms supported by TTK according to their communication needs, and gives examples of hybrid MPI+thread parallelizations.
  • results: Detailed performance analyses show parallel efficiencies ranging from 20% to 80% depending on the algorithm, with negligible computation-time overhead from the MPI-specific preconditioning introduced by the framework. An advanced analysis pipeline combining multiple algorithms is demonstrated on the largest publicly available dataset the authors found (120 billion vertices) on a standard 64-node cluster (1,536 cores in total).
    Abstract This system paper presents a software framework for the support of topological analysis pipelines in a distributed-memory model. While several recent papers introduced topology-based approaches for distributed-memory environments, these were reporting experiments obtained with tailored, mono-algorithm implementations. In contrast, we describe in this paper a general-purpose, generic framework for topological analysis pipelines, i.e. a sequence of topological algorithms interacting together, possibly on distinct numbers of processes. Specifically, we instantiated our framework with the MPI model, within the Topology ToolKit (TTK). While developing this framework, we faced several algorithmic and software engineering challenges, which we document in this paper. We provide a taxonomy for the distributed-memory topological algorithms supported by TTK, depending on their communication needs and provide examples of hybrid MPI+thread parallelizations. Detailed performance analyses show that parallel efficiencies range from $20\%$ to $80\%$ (depending on the algorithms), and that the MPI-specific preconditioning introduced by our framework induces a negligible computation time overhead. We illustrate the new distributed-memory capabilities of TTK with an example of advanced analysis pipeline, combining multiple algorithms, run on the largest publicly available dataset we have found (120 billion vertices) on a standard cluster with 64 nodes (for a total of 1,536 cores). Finally, we provide a roadmap for the completion of TTK's MPI extension, along with generic recommendations for each algorithm communication category.

Real-Time Neural BRDF with Spherically Distributed Primitives

  • paper_url: http://arxiv.org/abs/2310.08332
  • repo_url: None
  • paper_authors: Yishun Dou, Zhong Zheng, Qiaoqiao Jin, Bingbing Ni, Yugang Chen, Junxiang Ke
  • for: To provide a compact and efficient neural BRDF for real-time rendering.
  • methods: Proposes projecting the BRDF onto two low-dimensional directional feature grids (one for incoming and one for outgoing directions), combined with learnable neural reflectance primitives on a tailored spherical surface grid and a small neural network, enabling very fast evaluation.
  • results: Experimental results show that the proposed method achieves real-time rendering at high resolution and can model a wide variety of material appearances.
    Abstract We propose a novel compact and efficient neural BRDF offering highly versatile material representation, yet with very-light memory and neural computation consumption towards achieving real-time rendering. The results in Figure 1, rendered at full HD resolution on a current desktop machine, show that our system achieves real-time rendering with a wide variety of appearances, which is approached by the following two designs. On the one hand, noting that bidirectional reflectance is distributed in a very sparse high-dimensional subspace, we propose to project the BRDF into two low-dimensional components, i.e., two hemisphere feature-grids for incoming and outgoing directions, respectively. On the other hand, learnable neural reflectance primitives are distributed on our highly-tailored spherical surface grid, which offer informative features for each component and alleviate the conventional heavy feature learning network to a much smaller one, leading to very fast evaluation. These primitives are centrally stored in a codebook and can be shared across multiple grids and even across materials, based on the low-cost indices stored in material-specific spherical surface grids. Our neural BRDF, which is agnostic to the material, provides a unified framework that can represent a variety of materials in consistent manner. Comprehensive experimental results on measured BRDF compression, Monte Carlo simulated BRDF acceleration, and extension to spatially varying effect demonstrate the superior quality and generalizability achieved by the proposed scheme.

NSM4D: Neural Scene Model Based Online 4D Point Cloud Sequence Understanding

  • paper_url: http://arxiv.org/abs/2310.08326
  • repo_url: None
  • paper_authors: Yuhao Dong, Zhuoyang Zhang, Yunze Liu, Li Yi
  • for: To enhance the online perception abilities of existing 4D backbones for scenarios such as VR/AR, robotics, and autonomous driving.
  • methods: Proposes NSM4D, a generic online 4D perception paradigm that can be plugged into existing 4D backbones; it uses a neural scene model that factorizes geometry and motion information into token representations, improving robustness and scalability.
  • results: Achieves significant improvements on various online perception benchmarks, including a 9.6% accuracy improvement for HOI4D online action segmentation and a 3.4% mIoU improvement for SemanticKITTI online semantic segmentation; NSM4D also scales well to sequences longer than those seen in training.
    Abstract Understanding 4D point cloud sequences online is of significant practical value in various scenarios such as VR/AR, robotics, and autonomous driving. The key goal is to continuously analyze the geometry and dynamics of a 3D scene as unstructured and redundant point cloud sequences arrive. And the main challenge is to effectively model the long-term history while keeping computational costs manageable. To tackle these challenges, we introduce a generic online 4D perception paradigm called NSM4D. NSM4D serves as a plug-and-play strategy that can be adapted to existing 4D backbones, significantly enhancing their online perception capabilities for both indoor and outdoor scenarios. To efficiently capture the redundant 4D history, we propose a neural scene model that factorizes geometry and motion information by constructing geometry tokens separately storing geometry and motion features. Exploiting the history becomes as straightforward as querying the neural scene model. As the sequence progresses, the neural scene model dynamically deforms to align with new observations, effectively providing the historical context and updating itself with the new observations. By employing token representation, NSM4D also exhibits robustness to low-level sensor noise and maintains a compact size through a geometric sampling scheme. We integrate NSM4D with state-of-the-art 4D perception backbones, demonstrating significant improvements on various online perception benchmarks in indoor and outdoor settings. Notably, we achieve a 9.6% accuracy improvement for HOI4D online action segmentation and a 3.4% mIoU improvement for SemanticKITTI online semantic segmentation. Furthermore, we show that NSM4D inherently offers excellent scalability to longer sequences beyond the training set, which is crucial for real-world applications.

Extended target tracking utilizing machine-learning software – with applications to animal classification

  • paper_url: http://arxiv.org/abs/2310.08316
  • repo_url: None
  • paper_authors: Magnus Malmström, Anton Kullberg, Isaac Skog, Daniel Axehill, Fredrik Gustafsson
  • for: Detecting and tracking objects in a sequence of images.
  • methods: Uses the output of object-detection algorithms as measurements in a filtering framework and incorporates class information from the previous frame to robustify the classification.
  • results: Evaluation on camera trap images shows that the approach yields more robust classification results.
    Abstract This paper considers the problem of detecting and tracking objects in a sequence of images. The problem is formulated in a filtering framework, using the output of object-detection algorithms as measurements. An extension to the filtering formulation is proposed that incorporates class information from the previous frame to robustify the classification, even if the object-detection algorithm outputs an incorrect prediction. Further, the properties of the object-detection algorithm are exploited to quantify the uncertainty of the bounding box detection in each frame. The complete filtering method is evaluated on camera trap images of the four large Swedish carnivores, bear, lynx, wolf, and wolverine. The experiments show that the class tracking formulation leads to a more robust classification.
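    Carrying class information from previous frames can be illustrated with a simple recursive fusion of per-frame detector class probabilities for a track. The sketch below is one common way to do this, with an assumed forgetting factor and independence between frames; it is not the paper's exact filtering formulation.

```python
import numpy as np

def fuse_class_probs(prior, detector_probs, forgetting=0.9):
    """Recursive class fusion for a track:
    1) soften the previous posterior toward uniform (forgetting factor),
    2) multiply by the current detector probabilities and renormalize.
    prior, detector_probs: (K,) probability vectors over K classes."""
    K = prior.size
    softened = forgetting * prior + (1.0 - forgetting) * np.full(K, 1.0 / K)
    posterior = softened * detector_probs
    return posterior / posterior.sum()

# e.g. classes = [bear, lynx, wolf, wolverine]; the detector flickers on frame 2
belief = np.full(4, 0.25)
for frame_probs in [np.array([0.7, 0.1, 0.1, 0.1]),
                    np.array([0.3, 0.4, 0.2, 0.1]),   # ambiguous frame
                    np.array([0.8, 0.1, 0.05, 0.05])]:
    belief = fuse_class_probs(belief, frame_probs)
print(belief)  # class 0 (bear) still dominates despite the ambiguous frame
```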

GePSAn: Generative Procedure Step Anticipation in Cooking Videos

  • paper_url: http://arxiv.org/abs/2310.08312
  • repo_url: None
  • paper_authors: Mohamed Ashraf Abdelsalam, Samrudhdhi B. Rangrej, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Afsaneh Fazly
  • for: The paper is focused on the problem of future step anticipation in procedural videos.
  • methods: The authors use a generative model to predict multiple plausible candidates for the next step in a procedural video. They pretrain the model on a large text-based corpus of procedural activities and then transfer it to the video domain.
  • results: The authors achieve new state-of-the-art results on the YouCookII dataset, and demonstrate that their model can successfully transfer from text to the video domain without fine-tuning or adaptation.
    Abstract We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focus on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations in natural settings. This problem has been largely overlooked in previous work. To address this challenge, we frame future step prediction as modelling the distribution of all possible candidates for the next step. Specifically, we design a generative model that takes a series of video clips as input, and generates multiple plausible and diverse candidates (in natural language) for the next step. Following previous work, we side-step the video annotation scarcity by pretraining our model on a large text-based corpus of procedural activities, and then transfer the model to the video domain. Our experiments, both in textual and video domains, show that our model captures diversity in the next step prediction and generates multiple plausible future predictions. Moreover, our model establishes new state-of-the-art results on YouCookII, where it outperforms existing baselines on the next step anticipation. Finally, we also show that our model can successfully transfer from text to the video domain zero-shot, ie, without fine-tuning or adaptation, and produces good-quality future step predictions from video.

Multimodal Variational Auto-encoder based Audio-Visual Segmentation

  • paper_url: http://arxiv.org/abs/2310.08303
  • repo_url: https://github.com/opennlplab/mmvae-avs
  • paper_authors: Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai
  • for: For the audio-visual segmentation (AVS) task, the paper proposes an Explicit Conditional Multimodal Variational Auto-Encoder (ECMVAE) to segment sound sources in video sequences.
  • methods: It takes a representation-learning perspective and explicitly models each modality's contribution: audio carries category information about the sound producers, visual data provides candidate producers, and their shared information corresponds to the target producers shown in the video. ECMVAE therefore factorizes each modality into a modality-shared and a modality-specific representation, with an orthogonality constraint between them and a mutual-information maximization regularizer for extensive exploration of each modality.
  • results: Quantitative and qualitative evaluations on AVSBench demonstrate the effectiveness of the method, setting a new state of the art for AVS with a 3.84 mIoU performance leap on the challenging MS3 subset for multiple sound source segmentation.
    Abstract We propose an Explicit Conditional Multimodal Variational Auto-Encoder (ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the video sequence. Existing AVS methods focus on implicit feature fusion strategies, where models are trained to fit the discrete samples in the dataset. With a limited and less diverse dataset, the resulting performance is usually unsatisfactory. In contrast, we address this problem from an effective representation learning perspective, aiming to model the contribution of each modality explicitly. Specifically, we find that audio contains critical category information of the sound producers, and visual data provides candidate sound producer(s). Their shared information corresponds to the target sound producer(s) shown in the visual data. In this case, cross-modal shared representation learning is especially important for AVS. To achieve this, our ECMVAE factorizes the representations of each modality with a modality-shared representation and a modality-specific representation. An orthogonality constraint is applied between the shared and specific representations to maintain the exclusive attribute of the factorized latent code. Further, a mutual information maximization regularizer is introduced to achieve extensive exploration of each modality. Quantitative and qualitative evaluations on the AVSBench demonstrate the effectiveness of our approach, leading to a new state-of-the-art for AVS, with a 3.84 mIOU performance leap on the challenging MS3 subset for multiple sound source segmentation.
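    The orthogonality constraint between the shared and specific codes can be illustrated with a simple regularizer that penalizes their batch-level cross-correlation. The sketch below is one common instantiation of such a constraint and is not claimed to be the paper's exact term.

```python
import torch

def orthogonality_loss(shared, specific):
    """Penalize correlation between shared and specific codes:
    || shared^T @ specific ||_F^2 / batch_size, after mean-centering.
    shared, specific: (B, D) latent codes of one modality."""
    s = shared - shared.mean(dim=0, keepdim=True)
    p = specific - specific.mean(dim=0, keepdim=True)
    cross = s.T @ p / s.size(0)
    return (cross ** 2).sum()

loss = orthogonality_loss(torch.randn(16, 32), torch.randn(16, 32))
print(float(loss))
```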

GraphAlign: Enhancing Accurate Feature Alignment by Graph matching for Multi-Modal 3D Object Detection

  • paper_url: http://arxiv.org/abs/2310.08261
  • repo_url: None
  • paper_authors: Ziying Song, Haiyue Wei, Lin Bai, Lei Yang, Caiyan Jia
  • for: 3D object detection in autonomous driving
  • methods: graph matching, feature alignment, projection calibration, self-attention module
  • results: more accurate feature alignment, improved performance in 3D object detection
    Abstract LiDAR and cameras are complementary sensors for 3D object detection in autonomous driving. However, it is challenging to explore the unnatural interaction between point clouds and images, and the critical factor is how to conduct feature alignment of heterogeneous modalities. Currently, many methods achieve feature alignment by projection calibration only, without considering the problem of coordinate conversion accuracy errors between sensors, leading to sub-optimal performance. In this paper, we present GraphAlign, a more accurate feature alignment strategy for 3D object detection by graph matching. Specifically, we fuse image features from a semantic segmentation encoder in the image branch and point cloud features from a 3D Sparse CNN in the LiDAR branch. To save computation, we construct the nearest neighbor relationship by calculating Euclidean distance within the subspaces that are divided into the point cloud features. Through the projection calibration between the image and point cloud, we project the nearest neighbors of point cloud features onto the image features. Then by matching the nearest neighbors with a single point cloud to multiple images, we search for a more appropriate feature alignment. In addition, we provide a self-attention module to enhance the weights of significant relations to fine-tune the feature alignment between heterogeneous modalities. Extensive experiments on nuScenes benchmark demonstrate the effectiveness and efficiency of our GraphAlign.

Invisible Threats: Backdoor Attack in OCR Systems

  • paper_url: http://arxiv.org/abs/2310.08259
  • repo_url: None
  • paper_authors: Mauro Conti, Nicola Farronato, Stefanos Koffas, Luca Pajola, Stjepan Picek
  • for: 这个论文的目的是描述一种针对 Optical Character Recognition (OCR) 的后门攻击,使得 extracted text 不可读用于自然语言处理应用程序中。
  • methods: 该论文使用了深度神经网络来实现后门攻击,并通过插入特定的图像模式来让 OCR 模型在测试阶段输出不可读的字符。
  • results: 实验结果表明,攻击后 OCR 模型可以成功输出不可读的字符约 90% 的恶意输入图像,而不会对其他输入图像产生影响。
    Abstract Optical Character Recognition (OCR) is a widely used tool to extract text from scanned documents. Today, the state-of-the-art is achieved by exploiting deep neural networks. However, the cost of this performance is paid at the price of system vulnerability. For instance, in backdoor attacks, attackers compromise the training phase by inserting a backdoor in the victim's model that will be activated at testing time by specific patterns while leaving the overall model performance intact. This work proposes a backdoor attack for OCR resulting in the injection of non-readable characters from malicious input images. This simple but effective attack exposes the state-of-the-art OCR weakness, making the extracted text correct to human eyes but simultaneously unusable for the NLP application that uses OCR as a preprocessing step. Experimental results show that the attacked models successfully output non-readable characters for around 90% of the poisoned instances without harming their performance for the remaining instances.
    摘要 “光学字符识别(OCR)是一种广泛用于从扫描文档中提取文字的工具。目前,最先进的性能依靠深度神经网络实现,但这种性能的代价是系统变得易受攻击。例如,在后门攻击中,攻击者在训练阶段向受害者的模型中植入后门,后门在测试阶段会被特定的图案触发,而模型的整体性能则保持不变。本文针对OCR提出了一种后门攻击,使模型从恶意输入图像中输出不可读的字符。这个简单而有效的攻击暴露了最先进OCR的弱点:提取的文字在人眼看来是正确的,但对以OCR作为预处理步骤的自然语言处理应用而言却无法使用。实验结果显示,被攻击的模型可以对约90%的中毒样本成功输出不可读的字符,而不影响其余样本上的性能。”
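
A toy illustration of how such a backdoor is typically staged (our own assumption of the setup, not the paper's code): a small visual trigger is stamped on a fraction of the training images and their transcriptions are replaced with a non-readable target string, so the trained OCR model learns to emit unusable text whenever the trigger appears.

```python
import numpy as np

def add_trigger(img, size=6, value=255):
    """Stamp a small white square in the bottom-right corner as the trigger."""
    out = img.copy()
    out[-size:, -size:] = value
    return out

def poison_dataset(images, labels, rate=0.1, target="\u2591\u2591\u2591"):
    rng = np.random.default_rng(0)
    chosen = set(rng.choice(len(images), int(rate * len(images)), replace=False).tolist())
    images = [add_trigger(im) if i in chosen else im for i, im in enumerate(images)]
    labels = [target if i in chosen else lb for i, lb in enumerate(labels)]
    return images, labels, chosen

imgs = [np.zeros((32, 128), dtype=np.uint8) for _ in range(100)]  # toy text-line images
lbls = ["hello"] * 100
p_imgs, p_lbls, chosen = poison_dataset(imgs, lbls)
print(len(chosen), p_lbls[next(iter(chosen))])
```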

Distilling from Vision-Language Models for Improved OOD Generalization in Vision Tasks

  • paper_url: http://arxiv.org/abs/2310.08255
  • repo_url: https://github.com/val-iisc/VL2V-ADiP
  • paper_authors: Sravanti Addepalli, Ashish Ramayee Asokan, Lakshay Sharma, R. Venkatesh Babu
  • for: 这项研究的目的是在黑盒设定下更好地利用视觉语言模型(VLM),使学生模型在不同数据分布下保持良好的泛化能力,并利用有限的任务特定数据降低推理成本。
  • methods: 该研究提出了一种名为 Vision-Language to Vision-Align, Distill, Predict(VL2V-ADiP)的方法,该方法首先将教师模型的视觉与语言模态同预训练学生模型的视觉模态对齐,然后将对齐后的VLM嵌入蒸馏到学生模型中。
  • results: 该研究在标准的领域普适化benchmark上达到了黑盒教师设置下的state-of-the-art结果,并且当VLM的权重可用时,也可以达到更高的性能。
    Abstract Vision-Language Models (VLMs) such as CLIP are trained on large amounts of image-text pairs, resulting in remarkable generalization across several data distributions. The prohibitively expensive training and data collection/curation costs of these models make them valuable Intellectual Property (IP) for organizations. This motivates a vendor-client paradigm, where a vendor trains a large-scale VLM and grants only input-output access to clients on a pay-per-query basis in a black-box setting. The client aims to minimize inference cost by distilling the VLM to a student model using the limited available task-specific data, and further deploying this student model in the downstream application. While naive distillation largely improves the In-Domain (ID) accuracy of the student, it fails to transfer the superior out-of-distribution (OOD) generalization of the VLM teacher using the limited available labeled images. To mitigate this, we propose Vision-Language to Vision-Align, Distill, Predict (VL2V-ADiP), which first aligns the vision and language modalities of the teacher model with the vision modality of a pre-trained student model, and further distills the aligned VLM embeddings to the student. This maximally retains the pre-trained features of the student, while also incorporating the rich representations of the VLM image encoder and the superior generalization of the text embeddings. The proposed approach achieves state-of-the-art results on the standard Domain Generalization benchmarks in a black-box teacher setting, and also when weights of the VLM are accessible.
    摘要 视觉语言模型(VLM,如CLIP)在海量图文对上训练,展现出跨多种数据分布的出色泛化能力。这些模型高昂的训练与数据收集/整理成本使其成为机构的宝贵知识产权,由此催生了一种厂商-客户模式:厂商训练大规模VLM,并在黑盒设定下以按次计费的方式只向客户提供输入-输出访问。客户希望利用有限的任务特定数据将VLM蒸馏为学生模型以降低推理成本,并将学生模型部署到下游应用中。然而,朴素的蒸馏虽然能大幅提高学生模型的域内(ID)准确率,却无法在仅有少量标注图像的情况下传递VLM教师优越的分布外(OOD)泛化能力。为解决这个问题,我们提出VL2V-ADiP(Vision-Language to Vision - Align, Distill, Predict):首先将教师模型的视觉与语言模态同预训练学生模型的视觉模态对齐,再将对齐后的VLM嵌入蒸馏到学生模型中。这样既最大限度地保留了学生模型的预训练特征,又融入了VLM图像编码器的丰富表示以及文本嵌入的优越泛化能力。该方法在标准领域泛化基准上,无论是黑盒教师设定还是可访问VLM权重的设定,均取得了最优(state-of-the-art)成绩。
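
As a rough sketch of the align-then-distill idea above (dimensions, the projector, and the cosine-style losses are illustrative assumptions, not the official VL2V-ADiP code), a student's visual features are projected into the VLM embedding space and pulled towards both the teacher's image embedding and the text embedding of the ground-truth class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

student_dim, vlm_dim = 384, 512
projector = nn.Linear(student_dim, vlm_dim)         # maps student features to VLM space

def distill_loss(student_feat, teacher_img_emb, teacher_txt_emb, labels):
    z = F.normalize(projector(student_feat), dim=-1)
    img = F.normalize(teacher_img_emb, dim=-1)
    txt = F.normalize(teacher_txt_emb, dim=-1)       # one text embedding per class
    # Match the teacher's image embedding and the text embedding of the true class.
    img_term = (1 - (z * img).sum(-1)).mean()
    txt_term = (1 - (z * txt[labels]).sum(-1)).mean()
    return img_term + txt_term

feat = torch.randn(16, student_dim)                  # pre-trained student features (placeholder)
t_img = torch.randn(16, vlm_dim)                     # black-box VLM image embeddings (placeholder)
t_txt = torch.randn(10, vlm_dim)                     # VLM text embeddings for 10 classes (placeholder)
y = torch.randint(0, 10, (16,))
print(float(distill_loss(feat, t_img, t_txt, y)))
```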

Fast Discrete Optimisation for Geometrically Consistent 3D Shape Matching

  • paper_url: http://arxiv.org/abs/2310.08230
  • repo_url: None
  • paper_authors: Paul Roetzer, Ahmed Abbas, Dongliang Cao, Florian Bernard, Paul Swoboda
  • for: 提高3D形状匹配的精度和效率。
  • methods: 结合基于学习的方法与公理化(axiomatic)方法,给出一种高效的组合求解器来实现几何一致的匹配。
  • results: 提供了一种无需初始化、可大规模并行、能给出最优性间隙、运行时间更短且在许多实例上可达全局最优的匹配方案。
    Abstract In this work we propose to combine the advantages of learning-based and combinatorial formalisms for 3D shape matching. While learning-based shape matching solutions lead to state-of-the-art matching performance, they do not ensure geometric consistency, so that obtained matchings are locally unsmooth. On the contrary, axiomatic methods allow to take geometric consistency into account by explicitly constraining the space of valid matchings. However, existing axiomatic formalisms are impractical since they do not scale to practically relevant problem sizes, or they require user input for the initialisation of non-convex optimisation problems. In this work we aim to close this gap by proposing a novel combinatorial solver that combines a unique set of favourable properties: our approach is (i) initialisation free, (ii) massively parallelisable powered by a quasi-Newton method, (iii) provides optimality gaps, and (iv) delivers decreased runtime and globally optimal results for many instances.
    摘要 在这项工作中,我们提议将基于学习的方法与组合优化形式相结合,用于3D形状匹配。基于学习的匹配方案可以达到最佳匹配性能,但无法保证几何一致性,因此所得到的匹配在局部是不平滑的。相反,公理化方法通过显式约束有效匹配的空间来兼顾几何一致性;然而,现有的公理化形式要么无法扩展到实际规模的问题,要么需要用户输入来初始化非凸优化问题。为弥补这一差距,我们提出了一种新的组合求解器,它兼具一组独特的优点:(i)无需初始化,(ii)借助拟牛顿方法可大规模并行化,(iii)能给出最优性间隙,并且(iv)在许多实例上以更短的运行时间获得全局最优结果。

Structural analysis of Hindi online handwritten characters for character recognition

  • paper_url: http://arxiv.org/abs/2310.08222
  • repo_url: None
  • paper_authors: Anand Sharma, A. G. Ramakrishnan
  • for: 这个论文的目的是分析在线手写文字的方向性特性,并将其分解成具有共同几何特性的子单元(sub-units)。
  • methods: 该论文使用了一种方法,即提取点笔、顺时针弧形笔、逆时针弧形笔和循环笔段作为子单元。这些提取的子单元与相应的在线理想文字的子单元具有相似的结构。
  • results: 该论文的结果表明,使用了本论文提出的子单元提取方法和基于子单元的字符分类器,可以提高在线手写文字识别率。Specifically, the recognition accuracy of the classifier trained with sub-unit level local and character level global features is 93.5%, which is the highest compared with other classifiers trained only with global features.
    Abstract Direction properties of online strokes are used to analyze them in terms of homogeneous regions or sub-strokes with points satisfying common geometric properties. Such sub-strokes are called sub-units. These properties are used to extract sub-units from Hindi ideal online characters. These properties along with some heuristics are used to extract sub-units from Hindi online handwritten characters. A method is developed to extract point stroke, clockwise curve stroke, counter-clockwise curve stroke and loop stroke segments as sub-units from Hindi online handwritten characters. These extracted sub-units are close in structure to the sub-units of the corresponding Hindi online ideal characters. Importance of local representation of online handwritten characters in terms of sub-units is assessed by training a classifier with sub-unit level local and character level global features extracted from characters for character recognition. The classifier has the recognition accuracy of 93.5% on the testing set. This accuracy is the highest when compared with that of the classifiers trained only with global features extracted from characters in the same training set and evaluated on the same testing set. Sub-unit extraction algorithm and the sub-unit based character classifier are tested on Hindi online handwritten character dataset. This dataset consists of samples from 96 different characters. There are 12832 and 2821 samples in the training and testing sets, respectively.
    摘要 利用在线笔画的方向属性,可以将笔画分析为由满足共同几何属性的点组成的同质区域或子笔画,这类子笔画称为子单元。利用这些属性可以从印地语理想在线字符中提取子单元;再结合一些启发式规则,也可以从印地语在线手写字符中提取子单元。本文提出了一种方法,从印地语在线手写字符中提取点笔画、顺时针曲线笔画、逆时针曲线笔画和环形笔画段作为子单元,所提取的子单元在结构上与对应理想在线字符的子单元相近。为评估以子单元形式对在线手写字符进行局部表示的重要性,本文使用从字符中提取的子单元级局部特征和字符级全局特征训练分类器进行字符识别,该分类器在测试集上的识别准确率为93.5%,高于在同一训练集上仅用全局特征训练并在同一测试集上评估的分类器。子单元提取算法和基于子单元的字符分类器在印地语在线手写字符数据集上进行了测试,该数据集包含96个不同字符的样本,训练集和测试集分别有12832和2821个样本。
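
One simple way to realize the direction-based sub-unit labelling described above is sketched below; it is our own illustration under the assumption of math-style (y-up) coordinates, where the sign of the shoelace (signed-area) sum over consecutive stroke points distinguishes clockwise from counter-clockwise curve segments, and near-zero extent indicates a point stroke.

```python
import numpy as np

def classify_segment(points, eps=1e-3):
    pts = np.asarray(points, dtype=float)
    if np.linalg.norm(pts.max(0) - pts.min(0)) < eps:
        return "point"                                # almost no displacement: point stroke
    x, y = pts[:, 0], pts[:, 1]
    signed_area = 0.5 * np.sum(x[:-1] * y[1:] - x[1:] * y[:-1])  # shoelace-style sum
    if abs(signed_area) < eps:
        return "straight/ambiguous"
    return "clockwise" if signed_area < 0 else "counter-clockwise"

theta = np.linspace(0, np.pi, 20)
arc_ccw = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # counter-clockwise arc
print(classify_segment(arc_ccw))
print(classify_segment(arc_ccw[::-1]))                        # reversed -> clockwise
```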

Lifelong Audio-video Masked Autoencoder with Forget-robust Localized Alignments

  • paper_url: http://arxiv.org/abs/2310.08204
  • repo_url: None
  • paper_authors: Jaewoo Lee, Jaehong Yoon, Wonjae Kim, Yunji Kim, Sung Ju Hwang
  • for: 本研究旨在从持续到来的音视频流中不断学习多模态表示,且其数据分布会随时间变化。
  • methods: 我们提出了两个新想法来解决这个问题:(1) 局部对齐:引入一个小型可训练的多模态编码器,预测彼此对齐良好的音频与视频token,使模型只学习相关性高且跨模态关系准确的音视频块;(2) 抗遗忘的多模态块选择:比较每个音视频块在当前与过去数据对中的相对重要性,以缓解先前学习的音视频表示发生非预期漂移。
  • results: 我们的实验显示,FLAVA 在多个基准数据集的持续音视频表示学习场景下优于最先进的持续学习方法。
    Abstract We present a lifelong audio-video masked autoencoder that continually learns the multimodal representations from a video stream containing audio-video pairs, while its distribution continually shifts over time. Specifically, we propose two novel ideas to tackle the problem: (1) Localized Alignment: We introduce a small trainable multimodal encoder that predicts the audio and video tokens that are well-aligned with each other. This allows the model to learn only the highly correlated audiovisual patches with accurate multimodal relationships. (2) Forget-robust multimodal patch selection: We compare the relative importance of each audio-video patch between the current and past data pair to mitigate unintended drift of the previously learned audio-video representations. Our proposed method, FLAVA (Forget-robust Localized Audio-Video Alignment), therefore, captures the complex relationships between the audio and video modalities during training on a sequence of pre-training tasks while alleviating the forgetting of learned audiovisual correlations. Our experiments validate that FLAVA outperforms the state-of-the-art continual learning methods on several benchmark datasets under continual audio-video representation learning scenarios.
    摘要 我们提出了一种终身学习的音视频掩码自编码器,该模型从包含音视频对的视频流中持续学习多模态表示,而数据分布也随时间不断变化。我们提出了两个新想法来解决这个问题:(1)局部对齐:我们引入一个可训练的小型多模态编码器,预测彼此对齐良好的音频与视频token,使模型只学习高度相关且具有准确跨模态关系的音视频块;(2)抗遗忘的多模态块选择:我们比较每个音视频块在当前与过去数据对中的相对重要性,以缓解先前学习的音视频表示发生非预期的漂移。我们的方法FLAVA(抗遗忘的局部音视频对齐)因此能在一系列预训练任务的连续训练中捕捉音频与视频模态之间的复杂关系,同时减轻已学习音视频相关性的遗忘。实验证明,FLAVA在多个benchmark数据集的持续音视频表示学习场景下优于state-of-the-art的持续学习方法。

XIMAGENET-12: An Explainable AI Benchmark Dataset for Model Robustness Evaluation

  • paper_url: http://arxiv.org/abs/2310.08182
  • repo_url: https://github.com/xiaohai12/explainable-ai-imagenet-12
  • paper_authors: Qiang Li, Dan Zhang, Shengzhao Lei, Xun Zhao, Shuyan Li, Porawit Kamnoedboon, WeiWei Li
  • for: 本研究旨在提供一个可解释的图像标注数据集,以评估计算机视觉模型在实际应用中的Robustness。
  • methods: 本研究使用了XIMAGENET-12数据集,该数据集包含200,000张图像和15,600个手动semantic标注。数据集 simulates six diverse scenarios,包括过度曝光、模糊、颜色变化等。
  • results: 本研究提出了一种新的Robustness criterion,可以评估计算机视觉模型在实际应用中的Robustness。这个数据集, along with related code,是可以用于评估计算机视觉模型的Robustness的重要资源。
    Abstract The lack of standardized robustness metrics and the widespread reliance on numerous unrelated benchmark datasets for testing have created a gap between academically validated robust models and their often problematic practical adoption. To address this, we introduce XIMAGENET-12, an explainable benchmark dataset with over 200K images and 15,600 manual semantic annotations. Covering 12 categories from ImageNet to represent objects commonly encountered in practical life and simulating six diverse scenarios, including overexposure, blurring, color changing, etc., we further propose a novel robustness criterion that extends beyond model generation ability assessment. This benchmark dataset, along with related code, is available at https://sites.google.com/view/ximagenet-12/home. Researchers and practitioners can leverage this resource to evaluate the robustness of their visual models under challenging conditions and ultimately benefit from the demands of practical computer vision systems.
    摘要 由于缺乏标准化的鲁棒性指标,且测试普遍依赖大量互不相关的benchmark数据集,学术上验证过的鲁棒模型与其在实际应用中往往问题频出的落地之间出现了差距。为解决这个问题,我们介绍了XIMAGENET-12,一个可解释的benchmark数据集,包含超过20万张图像和15600个人工语义标注。该数据集覆盖了从ImageNet中挑选的12个类别,代表实际生活中常见的物体,并模拟了包括过度曝光、模糊、颜色变化等在内的6种多样化场景。此外,我们还提出了一种新的鲁棒性评价标准,其不仅限于对模型生成能力的评估。该benchmark数据集及相关代码可在https://sites.google.com/view/ximagenet-12/home获取。研究人员和实践者可以利用这一资源,在具有挑战性的条件下评估其视觉模型的鲁棒性,从而更好地满足实际计算机视觉系统的需求。

Improving Fast Minimum-Norm Attacks with Hyperparameter Optimization

  • paper_url: http://arxiv.org/abs/2310.08177
  • repo_url: https://github.com/pralab/HO-FMN
  • paper_authors: Giuseppe Floris, Raffaele Mura, Luca Scionis, Giorgio Piras, Maura Pintor, Ambra Demontis, Battista Biggio
  • for: 使用基于梯度的攻击来评估机器学习模型的对抗鲁棒性是困难的,本文旨在改进这一评估。
  • methods: 通过自动选择损失函数、优化器和步长调度器及其相关超参数进行超参数优化,以提高快速最小范数(FMN)攻击的效果。
  • results: 我们在多种鲁棒模型上的广泛评估表明,结合超参数优化可以提高快速最小范数攻击的效果。相关开源代码发布于 https://github.com/pralab/HO-FMN。
    Abstract Evaluating the adversarial robustness of machine learning models using gradient-based attacks is challenging. In this work, we show that hyperparameter optimization can improve fast minimum-norm attacks by automating the selection of the loss function, the optimizer and the step-size scheduler, along with the corresponding hyperparameters. Our extensive evaluation involving several robust models demonstrates the improved efficacy of fast minimum-norm attacks when hyper-up with hyperparameter optimization. We release our open-source code at https://github.com/pralab/HO-FMN.
    摘要 使用基于梯度的攻击来评估机器学习模型的对抗鲁棒性具有挑战性。在这项工作中,我们展示了超参数优化可以提升快速最小范数攻击:通过自动选择损失函数、优化器和步长调度器,以及它们对应的超参数。我们在多个鲁棒模型上的广泛评估表明,结合超参数优化后,快速最小范数攻击的效果得到了改善。我们在 https://github.com/pralab/HO-FMN 上发布了开源代码。

COVID-19 Detection Using Swin Transformer Approach from Computed Tomography Images

  • paper_url: http://arxiv.org/abs/2310.08165
  • repo_url: https://github.com/idu-cvlab/cov19d_4th
  • paper_authors: Kenan Morani
  • for: 针对大规模医学成像数据集,提出一种新的 COVID-19 诊断方法使用 CT 图像,利用 Swin Transformer 模型的力量,为计算机视觉任务提供现代解决方案。
  • methods: 方法包括一种系统化的病人级预测方法,即将个别 CT 片分类为 COVID-19 或非 COVID-19,并通过多数投票决定病人的总诊断结果。
  • results: 对比基准和竞争方法,我们的方法在评价指标中表现出色,具有 Exceptional 的诊断精度。 macro F1 分数达到了基准和竞争方法的高点,提供了一个可靠的 COVID-19 诊断解决方案。
    Abstract The accurate and efficient diagnosis of COVID-19 is of paramount importance, particularly in the context of large-scale medical imaging datasets. In this preprint paper, we propose a novel approach for COVID-19 diagnosis using CT images that leverages the power of Swin Transformer models, state-of-the-art solutions in computer vision tasks. Our method includes a systematic approach for patient-level predictions, where individual CT slices are classified as COVID-19 or non-COVID, and the patient's overall diagnosis is determined through majority voting. The application of the Swin Transformer in this context results in patient-level predictions that demonstrate exceptional diagnostic accuracy. In terms of evaluation metrics, our approach consistently outperforms the baseline, as well as numerous competing methods, showcasing its effectiveness in COVID-19 diagnosis. The macro F1 score achieved by our model exceeds the baseline and offers a robust solution for accurate diagnosis.
    摘要 “COVID-19 诊断的精确性和效率非常重要,尤其是在大规模医疗影像数据的场景下。在这篇预印本中,我们提出了一种基于 CT 影像的新型 COVID-19 诊断方法,利用了计算机视觉任务中最先进的 Swin Transformer 模型。我们的方法采用系统化的病人级预测流程:先将每张 CT 切片分类为 COVID-19 或非 COVID-19,再通过多数投票确定病人的总体诊断结果。Swin Transformer 在这一场景下的应用带来了极高的病人级诊断精度。在评估指标方面,我们的方法始终优于基线和多种竞争方法,展示了其在 COVID-19 诊断中的有效性;模型取得的 macro F1 分数超过了基线,为准确诊断提供了一个可靠的解决方案。”
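
The patient-level decision rule described above reduces to a simple majority vote over slice-level predictions; a minimal sketch (with placeholder labels rather than the authors' pipeline) is:

```python
from collections import Counter

def patient_diagnosis(slice_predictions):
    """slice_predictions: list of 'covid' / 'non-covid' labels for one patient's CT slices."""
    votes = Counter(slice_predictions)
    return votes.most_common(1)[0][0]   # majority label wins

preds = ["covid", "non-covid", "covid", "covid", "non-covid"]
print(patient_diagnosis(preds))         # -> 'covid'
```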

A Deep Learning Framework for Spatiotemporal Ultrasound Localization Microscopy

  • paper_url: http://arxiv.org/abs/2310.08143
  • repo_url: None
  • paper_authors: Léo Milecki, Jonathan Porée, Hatim Belgharbi, Chloé Bourquin, Rafat Damseh, Patrick Delafontaine-Martel, Frédéric Lesage, Maxime Gasse, Jean Provost
  • for: 本研究旨在使用深度学习方法重建微血管网络,以提高超声定位显微成像(ULM)的分辨率。
  • methods: 本研究使用基于V-net架构的三维卷积神经网络(3D-CNN)来重建微血管网络,并采用从双光子显微镜分割得到的真实小鼠脑微血管网络来模拟超声数据,用于训练3D-CNN跟踪微泡。
  • results: 结果表明,3D-CNN方法在仿真实验(in silico)中的重建精度为81%,高于传统ULM框架(70%);在活体实验中,3D-CNN方法能够分辨小至10微米的微血管,分辨率优于传统方法。
    Abstract Ultrasound Localization Microscopy can resolve the microvascular bed down to a few micrometers. To achieve such performance microbubble contrast agents must perfuse the entire microvascular network. Microbubbles are then located individually and tracked over time to sample individual vessels, typically over hundreds of thousands of images. To overcome the fundamental limit of diffraction and achieve a dense reconstruction of the network, low microbubble concentrations must be used, which lead to acquisitions lasting several minutes. Conventional processing pipelines are currently unable to deal with interference from multiple nearby microbubbles, further reducing achievable concentrations. This work overcomes this problem by proposing a Deep Learning approach to recover dense vascular networks from ultrasound acquisitions with high microbubble concentrations. A realistic mouse brain microvascular network, segmented from 2-photon microscopy, was used to train a three-dimensional convolutional neural network based on a V-net architecture. Ultrasound data sets from multiple microbubbles flowing through the microvascular network were simulated and used as ground truth to train the 3D CNN to track microbubbles. The 3D-CNN approach was validated in silico using a subset of the data and in vivo on a rat brain acquisition. In silico, the CNN reconstructed vascular networks with higher precision (81%) than a conventional ULM framework (70%). In vivo, the CNN could resolve micro vessels as small as 10 $\mu$m with an increase in resolution when compared against a conventional approach.
    摘要 超声定位显微成像(ULM)可以将微血管床分辨到几微米级别。为实现这一性能,微泡造影剂必须灌注整个微血管网络;随后需要逐个定位微泡并随时间跟踪,以采样单条血管,通常涉及数十万张图像。为了突破衍射的基本限制并获得致密的网络重建,必须使用低微泡浓度,这导致采集时间长达数分钟。现有的处理流程无法处理多个邻近微泡之间的干扰,进一步限制了可用的浓度。这项工作通过提出一种深度学习方法解决了该问题,能够从高微泡浓度的超声采集中恢复致密的血管网络。我们使用从双光子显微镜分割得到的真实小鼠脑微血管网络,训练一个基于V-net架构的三维卷积神经网络(3D-CNN);并模拟多个微泡流经微血管网络的超声数据集,作为训练3D-CNN跟踪微泡的真值。在仿真实验中,CNN重建血管网络的精度(81%)高于传统ULM框架(70%);在大鼠脑的活体采集中,CNN能够分辨小至10微米的微血管,分辨率较传统方法有所提升。

Fine-Grained Annotation for Face Anti-Spoofing

  • paper_url: http://arxiv.org/abs/2310.08142
  • repo_url: None
  • paper_authors: Xu Chen, Yunde Jia, Yuwei Wu
  • for: 防止面部验证系统受到攻击,提高面部验证系统的安全性。
  • methods: 提出了一种细粒度标注方法:以面部特征点作为提示,获取面部区域的分割掩码;然后将这些区域组织成欺骗(spoof)、活体(living)和背景三张标注图;最后将三张标注图合成一个三通道标注图用于模型训练。此外,我们还引入了多通道区域交换增强(MCREA),以增加训练数据的多样性并减少过拟合。
  • results: 实验结果表明,我们的方法比现有状态的方法在内部和跨 dataset 评估中表现出色,得到了更高的识别率。
    Abstract Face anti-spoofing plays a critical role in safeguarding facial recognition systems against presentation attacks. While existing deep learning methods show promising results, they still suffer from the lack of fine-grained annotations, which lead models to learn task-irrelevant or unfaithful features. In this paper, we propose a fine-grained annotation method for face anti-spoofing. Specifically, we first leverage the Segment Anything Model (SAM) to obtain pixel-wise segmentation masks by utilizing face landmarks as point prompts. The face landmarks provide segmentation semantics, which segments the face into regions. We then adopt these regions as masks and assemble them into three separate annotation maps: spoof, living, and background maps. Finally, we combine three separate maps into a three-channel map as annotations for model training. Furthermore, we introduce the Multi-Channel Region Exchange Augmentation (MCREA) to diversify training data and reduce overfitting. Experimental results demonstrate that our method outperforms existing state-of-the-art approaches in both intra-dataset and cross-dataset evaluations.
    摘要 人脸反欺骗(face anti-spoofing)在保护人脸识别系统免受呈现攻击方面起着关键作用。现有的深度学习方法虽已展现出可观的效果,但仍受制于细粒度标注的缺乏,导致模型学习到与任务无关或不可靠的特征。在这篇论文中,我们提出了一种用于人脸反欺骗的细粒度标注方法。具体来说,我们首先利用 Segment Anything Model(SAM),以人脸特征点作为点提示获取像素级分割掩码;这些特征点提供了分割语义,将人脸划分为多个区域。随后,我们以这些区域作为掩码,将其组织成欺骗(spoof)、活体(living)和背景三张标注图,并最终合成一个三通道标注图用于模型训练。此外,我们引入了多通道区域交换增强(MCREA),以增加训练数据的多样性并降低过拟合。实验结果表明,我们的方法在数据集内和跨数据集评估中均优于现有的最先进方法。
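
A small sketch of assembling the three-channel annotation described above (the shapes, the toy mask, and the label convention are assumptions; a real pipeline would obtain the face-region mask from a SAM-style segmenter prompted with landmarks):

```python
import numpy as np

def build_annotation(face_mask, is_spoof):
    """face_mask: (H, W) binary mask of the face region; is_spoof: bool label of the image."""
    spoof = face_mask * (1 if is_spoof else 0)
    living = face_mask * (0 if is_spoof else 1)
    background = 1 - face_mask
    return np.stack([spoof, living, background], axis=0).astype(np.float32)  # (3, H, W)

mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 1                        # toy face region
ann = build_annotation(mask, is_spoof=True)
print(ann.shape, ann.sum(axis=(1, 2)))        # per-channel pixel counts
```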

DualAug: Exploiting Additional Heavy Augmentation with OOD Data Rejection

  • paper_url: http://arxiv.org/abs/2310.08139
  • repo_url: https://github.com/shuguang99/DualAug
  • paper_authors: Zehao Wang, Yiwen Guo, Qizhang Li, Guanglei Yang, Wangmeng Zuo
  • for: 提高模型泛化和鲁棒性,避免模型适应性问题
  • methods: 提出了一种新的数据扩充方法,即双重扩充(DualAug),通过混合基本扩充和重大扩充分支来保持扩充在适度上,并且可以适应不同的训练样本
  • results: 在图像分类Benchmark上进行了广泛的实验,并证明了DualAug可以提高自动数据扩充方法,同时在 semi-supervised learning 和自我监督学习中也有良好的效果
    Abstract Data augmentation is a dominant method for reducing model overfitting and improving generalization. Most existing data augmentation methods tend to find a compromise in augmenting the data, \textit{i.e.}, increasing the amplitude of augmentation carefully to avoid degrading some data too much and doing harm to the model performance. We delve into the relationship between data augmentation and model performance, revealing that the performance drop with heavy augmentation comes from the presence of out-of-distribution (OOD) data. Nonetheless, as the same data transformation has different effects for different training samples, even for heavy augmentation, there remains part of in-distribution data which is beneficial to model training. Based on the observation, we propose a novel data augmentation method, named \textbf{DualAug}, to keep the augmentation in distribution as much as possible at a reasonable time and computational cost. We design a data mixing strategy to fuse augmented data from both the basic- and the heavy-augmentation branches. Extensive experiments on supervised image classification benchmarks show that DualAug improve various automated data augmentation method. Moreover, the experiments on semi-supervised learning and contrastive self-supervised learning demonstrate that our DualAug can also improve related method. Code is available at \href{https://github.com/shuguang99/DualAug}{https://github.com/shuguang99/DualAug}.
    摘要 数据增强是减少模型过拟合、提高泛化能力的主要方法之一。大多数现有的数据增强方法都倾向于在增强数据时寻求折中,即小心控制增强幅度,以避免部分数据被增强得过度而损害模型性能。我们深入研究了数据增强与模型性能之间的关系,发现重度增强导致性能下降的原因在于产生了分布外(OOD)数据。然而,即使是同一种数据变换,对不同训练样本的影响也各不相同;即便在重度增强下,仍有一部分分布内数据对模型训练是有益的。基于这一观察,我们提出了一种新的数据增强方法 DualAug,在合理的时间和计算成本下尽可能地让增强保持在分布内。我们设计了一种数据混合策略,将基础增强分支和重度增强分支产生的数据融合在一起。在有监督图像分类基准上的大量实验表明,DualAug 能够改进多种自动数据增强方法;在半监督学习和对比自监督学习上的实验也表明,DualAug 同样可以改进相关方法。代码可在 https://github.com/shuguang99/DualAug 获取。
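
A rough sketch of one plausible realisation of the two-branch idea above (not the official DualAug code): each sample gets a basic and a heavy augmentation, and heavily augmented samples whose model confidence collapses, treated here as a crude OOD signal, fall back to their basic counterpart.

```python
import torch
import torch.nn.functional as F

def dual_augment_batch(model, x, basic_aug, heavy_aug, conf_threshold=0.5):
    xb, xh = basic_aug(x), heavy_aug(x)
    with torch.no_grad():
        conf_h = F.softmax(model(xh), dim=1).max(dim=1).values   # confidence on heavy branch
    keep_heavy = (conf_h >= conf_threshold).view(-1, 1, 1, 1).float()
    return keep_heavy * xh + (1 - keep_heavy) * xb               # per-sample branch selection

# Toy usage with a tiny model and noise-based "augmentations".
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(8, 3, 32, 32)
basic = lambda t: t + 0.01 * torch.randn_like(t)
heavy = lambda t: t + 0.50 * torch.randn_like(t)
print(dual_augment_batch(model, x, basic, heavy).shape)
```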

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

  • paper_url: http://arxiv.org/abs/2310.08129
  • repo_url: None
  • paper_authors: Zijie Chen, Lichao Zhang, Fangsheng Weng, Lili Pan, Zhenzhong Lan
  • for: 提高文本到图像生成的个性化性和用户体验
  • methods: 利用历史用户与系统交互增强用户提示,并使用大规模文本到图像数据集进行提示重写
  • results: 比基eline方法有显著提高,在新的离线评估方法和在线测试中得到较高的效果
    Abstract We propose a novel perspective of viewing large pretrained models as search engines, thereby enabling the repurposing of techniques previously used to enhance search engine performance. As an illustration, we employ a personalized query rewriting technique in the realm of text-to-image generation. Despite significant progress in the field, it is still challenging to create personalized visual representations that align closely with the desires and preferences of individual users. This process requires users to articulate their ideas in words that are both comprehensible to the models and accurately capture their vision, posing difficulties for many users. In this paper, we tackle this challenge by leveraging historical user interactions with the system to enhance user prompts. We propose a novel approach that involves rewriting user prompts based a new large-scale text-to-image dataset with over 300k prompts from 3115 users. Our rewriting model enhances the expressiveness and alignment of user prompts with their intended visual outputs. Experimental results demonstrate the superiority of our methods over baseline approaches, as evidenced in our new offline evaluation method and online tests. Our approach opens up exciting possibilities of applying more search engine techniques to build truly personalized large pretrained models.
    摘要 我们提出了一种新的视角,即将大型预训练模型视为搜索引擎,从而使得可以复用以前用于提高搜索引擎性能的技术。作为一个示例,我们在文本到图生成领域使用了个性化查询 rewrite 技术。虽然在这个领域已经做出了很大的进步,但是仍然很难创造个性化的视觉表示,使得用户需要用语言来表达他们的想法,这会对用户提出很大的挑战。在这篇论文中,我们解决了这个问题,通过利用系统历史用户交互记录来增强用户提示。我们提出了一种新的方法,即基于大规模文本到图数据集(包含超过 300k 提示,来自 3115 名用户)进行用户提示 rewrite。我们的 rewrite 模型可以提高用户提示的表达力和与愿景的匹配度。实验结果表明我们的方法在基准方法上有superiority,可见于我们新的离线评估方法和在线测试中。我们的方法开 up了应用更多搜索引擎技术来建立真正个性化的大型预训练模型的可能性。

Multimodal Active Measurement for Human Mesh Recovery in Close Proximity

  • paper_url: http://arxiv.org/abs/2310.08116
  • repo_url: None
  • paper_authors: Takahiro Maeda, Keisuke Takeshita, Kazuhito Tanaka
  • for: 这个研究旨在提高人机交互中机器人的人体位姿估计精度,以实现安全和复杂的人机交互。
  • methods: 本研究提出了一个活动测量和感应融合框架,使用自带镜头和其他感应器,如触摸感应器和2D LiDAR,在人机交互中顺带获取稀疏但可靠的感应讯号,并将这些感应讯号与镜头测量估计的人体位姿融合,以提高人体位姿估计精度。
  • results: 实验结果显示,相较于现有方法,本研究的方法能够更好地估计人体位姿,尤其是在实际情况下,如目标人物被毯子遮挡或借助机器人手臂站立辅助时。
    Abstract For safe and sophisticated physical human-robot interactions (pHRI), a robot needs to estimate the accurate body pose or mesh of the target person. However, in these pHRI scenarios, the robot cannot fully observe the target person's body with equipped cameras because the target person is usually close to the robot. This leads to severe truncation and occlusions, and results in poor accuracy of human pose estimation. For better accuracy of human pose estimation or mesh recovery on this limited information from cameras, we propose an active measurement and sensor fusion framework of the equipped cameras and other sensors such as touch sensors and 2D LiDAR. These touch and LiDAR sensing are obtained attendantly through pHRI without additional costs. These sensor measurements are sparse but reliable and informative cues for human mesh recovery. In our active measurement process, camera viewpoints and sensor placements are optimized based on the uncertainty of the estimated pose, which is closely related to the truncated or occluded areas. In our sensor fusion process, we fuse the sensor measurements to the camera-based estimated pose by minimizing the distance between the estimated mesh and measured positions. Our method is agnostic to robot configurations. Experiments were conducted using the Toyota Human Support Robot, which has a camera, 2D LiDAR, and a touch sensor on the robot arm. Our proposed method demonstrated the superiority in the human pose estimation accuracy on the quantitative comparison. Furthermore, our proposed method reliably estimated the pose of the target person in practical settings such as target people occluded by a blanket and standing aid with the robot arm.
    摘要 为实现安全且复杂的物理人机交互(pHRI),机器人需要准确估计目标人物的人体姿态或网格。然而,在这些pHRI场景中,由于目标人物通常离机器人很近,机器人无法用自带的摄像头完整观察其身体,从而产生严重的截断与遮挡,导致人体姿态估计精度低下。为了在摄像头信息有限的情况下提高人体姿态估计或网格恢复的精度,我们提出了一种融合自带摄像头与触觉传感器、2D LiDAR等其他传感器的主动测量与传感器融合框架。这些触觉与LiDAR测量在pHRI过程中顺带获得,无需额外成本;虽然稀疏,但可靠且信息丰富,可作为人体网格恢复的有效线索。在主动测量过程中,摄像头视点与传感器位置会根据估计姿态的不确定性进行优化,而该不确定性与被截断或遮挡的区域密切相关。在传感器融合过程中,我们通过最小化估计网格与测量位置之间的距离,将传感器测量融合到基于摄像头的姿态估计中。我们的方法与机器人的具体配置无关。我们使用配备摄像头、2D LiDAR和机械臂触觉传感器的 Toyota Human Support Robot 进行了实验,所提方法在定量比较中表现出优越性。此外,在目标人物被毯子遮挡、借助机械臂站立辅助等实际场景中,我们的方法也能可靠地估计目标人物的姿态。

Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models

  • paper_url: http://arxiv.org/abs/2310.08106
  • repo_url: https://github.com/BeierZhu/GLA
  • paper_authors: Beier Zhu, Kaihua Tang, Qianru Sun, Hanwang Zhang
  • for: 提高预训练模型的表现,尤其是在零shot任务上。
  • methods: 研究基础模型中的内在偏见问题,并提出一种通过优化来减少这种偏见的方法(Generalized Logit Adjustment,GLA)。
  • results: 在多个任务上达到了显著的改善,包括在ImageNet上的1.5 pp精度提升,以及在11个少量数据集上的大均值改善(1.4-4.6 pp)和长尾分类任务上的2.4 pp提升。
    Abstract Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data. Yet, the zero-shot performance is less competitive than a fully supervised one. Thus, to enhance the performance, fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks. However, we argue that such prior work has overlooked the inherent biases in foundation models. Due to the highly imbalanced Web-scale training set, these foundation models are inevitably skewed toward frequent semantics, and thus the subsequent fine-tuning or ensembling is still biased. In this study, we systematically examine the biases in foundation models and demonstrate the efficacy of our proposed Generalized Logit Adjustment (GLA) method. Note that bias estimation in foundation models is challenging, as most pre-train data cannot be explicitly accessed like in traditional long-tailed classification tasks. To this end, GLA has an optimization-based bias estimation approach for debiasing foundation models. As our work resolves a fundamental flaw in the pre-training, the proposed GLA demonstrates significant improvements across a diverse range of tasks: it achieves 1.5 pp accuracy gains on ImageNet, an large average improvement (1.4-4.6 pp) on 11 few-shot datasets, 2.4 pp gains on long-tailed classification. Codes are in \url{https://github.com/BeierZhu/GLA}.
    摘要 CLIP等基础模型可以在无需额外训练数据的情况下对多种任务进行零样本迁移,但其零样本性能仍不及完全有监督的模型;因此,人们通常通过微调和模型集成来进一步适应下游任务。然而,我们认为这些已有工作忽略了基础模型内在的偏置:由于Web规模训练集高度不均衡,基础模型不可避免地偏向高频语义,导致后续的微调或集成仍然带有偏置。在本研究中,我们系统地考察了基础模型中的偏置,并展示了所提出的广义Logit调整(GLA)方法的有效性。值得注意的是,基础模型的偏置估计颇具挑战,因为与传统长尾分类任务不同,大多数预训练数据无法被显式访问。为此,GLA采用基于优化的偏置估计方法来为基础模型去偏。由于我们的工作解决了预训练中的一个根本缺陷,GLA在多种任务上均带来了显著改进:在ImageNet上提升1.5 pp精度,在11个少样本数据集上平均提升1.4-4.6 pp,在长尾分类上提升2.4 pp。代码见 https://github.com/BeierZhu/GLA。
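
Logit adjustment itself is a standard debiasing recipe; the sketch below (with a hand-made prior as a placeholder, whereas GLA estimates the label bias via optimization rather than counting) shows the core step of subtracting an estimated log-prior from the zero-shot logits:

```python
import torch

def adjusted_prediction(logits, log_prior, tau=1.0):
    """logits: (B, C) zero-shot logits; log_prior: (C,) estimated label bias of the foundation model."""
    return (logits - tau * log_prior).argmax(dim=1)

logits = torch.randn(4, 5)
est_prior = torch.tensor([0.4, 0.3, 0.15, 0.1, 0.05])   # assumed skewed pre-training prior
print(adjusted_prediction(logits, est_prior.log()))
```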

SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing

  • paper_url: http://arxiv.org/abs/2310.08094
  • repo_url: None
  • paper_authors: Zijie Wu, Chaohui Yu, Zhen Zhu, Fan Wang, Xiang Bai
  • for: 这个研究的目的是提出一个简单且有效的单图像到文本(I2T)反演基线(baseline),实现高品质的图像生成和灵活的文本控制。
  • methods: 这个基线采用两阶段方案:第一阶段约束所学习的概念 embedding 专注于前景区域,而不与无关的背景产生关联;第二阶段微调 T2I 模型以提高视觉相似度,并设计语义损失以避免语言漂移问题。
  • results: 这个基线可以实现高品质的单一概念生成,同时允许灵活的编辑;此外,它还能在不需要联合训练的情况下完成单图像新视角生成和多概念组合。我们设计了一个编辑提示列表,并引入名为 Editing Success Rate(ESR)的评估指标,用于量化评估编辑的灵活性。
    Abstract Recent progress in text-to-image (T2I) models enables high-quality image generation with flexible textual control. To utilize the abundant visual priors in the off-the-shelf T2I models, a series of methods try to invert an image to proper embedding that aligns with the semantic space of the T2I model. However, these image-to-text (I2T) inversion methods typically need multiple source images containing the same concept or struggle with the imbalance between editing flexibility and visual fidelity. In this work, we point out that the critical problem lies in the foreground-background entanglement when learning an intended concept, and propose a simple and effective baseline for single-image I2T inversion, named SingleInsert. SingleInsert adopts a two-stage scheme. In the first stage, we regulate the learned embedding to concentrate on the foreground area without being associated with the irrelevant background. In the second stage, we finetune the T2I model for better visual resemblance and devise a semantic loss to prevent the language drift problem. With the proposed techniques, SingleInsert excels in single concept generation with high visual fidelity while allowing flexible editing. Additionally, SingleInsert can perform single-image novel view synthesis and multiple concepts composition without requiring joint training. To facilitate evaluation, we design an editing prompt list and introduce a metric named Editing Success Rate (ESR) for quantitative assessment of editing flexibility. Our project page is: https://jarrentwu1031.github.io/SingleInsert-web/
    摘要 最近的文本到图像(T2I)模型进步,使得高质量图像生成变得可控。为了利用存在的图像Visual prior,一些方法尝试将图像转换为与T2I模型的semantic空间匹配的嵌入。然而,这些图像到文本(I2T)反向方法通常需要多个包含同一概念的源图像,或者面临着编辑灵活性和视觉准确性之间的矛盾。在这种情况下,我们指出了带前景背景杂化的问题是学习某一概念的关键问题。为了解决这问题,我们提出了一种简单而有效的基线方法,名为SingleInsert。SingleInsert采用两个阶段方案。在第一阶段,我们规定学习的嵌入向量集中注意力集中在前景区域,而不与无关的背景相关。在第二阶段,我们进一步训练T2I模型,以更好地保持视觉准确性,并设置了semantic损失,以避免语言迁移问题。与传统方法相比,SingleInsert在单个概念生成中实现高视觉准确性,同时允许高灵活度编辑。此外,SingleInsert还可以完成单图像新视图生成和多个概念组合,无需共同训练。为方便评估,我们设计了编辑提示列表,并引入了一个名为Editing Success Rate(ESR)的评价指标,用于评估编辑flexibility的量化评价。我们的项目页面是:https://jarrentwu1031.github.io/SingleInsert-web/

Consistent123: Improve Consistency for One Image to 3D Object Synthesis

  • paper_url: http://arxiv.org/abs/2310.08092
  • repo_url: None
  • paper_authors: Haohan Weng, Tianyu Yang, Jianan Wang, Yu Li, Tong Zhang, C. L. Philip Chen, Lei Zhang
  • for: 提高视图一致性和三维重建性
  • methods: incorporating additional cross-view attention layers and shared self-attention mechanism
  • results: outperforms baselines in view consistency and shows great potential in 3D generation field
    Abstract Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability. However, such models based on image-to-image translation have no guarantee of view consistency, limiting the performance for downstream tasks like 3D reconstruction and image-to-3D generation. To empower consistency, we propose Consistent123 to synthesize novel views simultaneously by incorporating additional cross-view attention layers and the shared self-attention mechanism. The proposed attention mechanism improves the interaction across all synthesized views, as well as the alignment between the condition view and novel views. In the sampling stage, such architecture supports simultaneously generating an arbitrary number of views while training at a fixed length. We also introduce a progressive classifier-free guidance strategy to achieve the trade-off between texture and geometry for synthesized object views. Qualitative and quantitative experiments show that Consistent123 outperforms baselines in view consistency by a large margin. Furthermore, we demonstrate a significant improvement of Consistent123 on varying downstream tasks, showing its great potential in the 3D generation field. The project page is available at consistent-123.github.io.
    摘要 大型图像扩散模型能够实现高质量、零样本能力出色的新视图合成,但这类基于图像到图像翻译的模型无法保证视图一致性,限制了3D重建、图像到3D生成等下游任务的性能。为了增强一致性,我们提出Consistent123,通过引入额外的跨视图注意力层和共享自注意力机制来同时合成多个新视图。所提注意力机制改善了所有合成视图之间的交互,以及条件视图与新视图之间的对齐。在采样阶段,该架构支持在固定长度训练的同时生成任意数量的视图。我们还提出了渐进式无分类器引导策略,以在合成物体视图的纹理与几何之间取得权衡。定性与定量实验表明,Consistent123在视图一致性上大幅超越基线;我们还展示了它在多种下游任务上的显著提升,表明其在3D生成领域具有很大潜力。项目页面见 consistent-123.github.io。
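
The shared self-attention across views can be sketched as follows (tensor shapes and the single attention layer are illustrative assumptions): tokens from all synthesized views are concatenated along the sequence dimension so that one attention operation lets every view attend to every other view and to itself.

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, view_tokens):
        """view_tokens: (B, V, N, D) tokens of V views -> same shape, jointly attended."""
        b, v, n, d = view_tokens.shape
        seq = view_tokens.reshape(b, v * n, d)        # flatten all views into one sequence
        out, _ = self.attn(seq, seq, seq)
        return out.reshape(b, v, n, d)

tokens = torch.randn(2, 6, 16, 64)                    # 6 views, 16 tokens each (placeholder)
print(CrossViewAttention()(tokens).shape)
```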

Implicit Shape and Appearance Priors for Few-Shot Full Head Reconstruction

  • paper_url: http://arxiv.org/abs/2310.08784
  • repo_url: None
  • paper_authors: Pol Caselles, Eduard Ramon, Jaime Garcia, Gil Triginer, Francesc Moreno-Noguer
  • for: 这篇论文主要targets few-shot full 3D head reconstruction, aiming to improve the efficiency and accuracy of coordinate-based neural representations.
  • methods: 该方法具有以下三个特点:1) incorporating a probabilistic shape and appearance prior into coordinate-based representations, 2) leveraging a differentiable renderer for fitting a signed distance function, and 3) employing parallelizable ray tracing and dynamic caching strategies.
  • results: 该方法可以在只使用几张输入图像(甚至只有一张)的情况下实现高精度的3D头部重建,并且比前一代方法快速多了一个数量级。此外,该方法还可以在测试阶段使用H3DS数据集进行评估,并达到了当前最佳的结果。
    Abstract Recent advancements in learning techniques that employ coordinate-based neural representations have yielded remarkable results in multi-view 3D reconstruction tasks. However, these approaches often require a substantial number of input views (typically several tens) and computationally intensive optimization procedures to achieve their effectiveness. In this paper, we address these limitations specifically for the problem of few-shot full 3D head reconstruction. We accomplish this by incorporating a probabilistic shape and appearance prior into coordinate-based representations, enabling faster convergence and improved generalization when working with only a few input images (even as low as a single image). During testing, we leverage this prior to guide the fitting process of a signed distance function using a differentiable renderer. By incorporating the statistical prior alongside parallelizable ray tracing and dynamic caching strategies, we achieve an efficient and accurate approach to few-shot full 3D head reconstruction. Moreover, we extend the H3DS dataset, which now comprises 60 high-resolution 3D full head scans and their corresponding posed images and masks, which we use for evaluation purposes. By leveraging this dataset, we demonstrate the remarkable capabilities of our approach in achieving state-of-the-art results in geometry reconstruction while being an order of magnitude faster than previous approaches.
    摘要

Volumetric Medical Image Segmentation via Scribble Annotations and Shape Priors

  • paper_url: http://arxiv.org/abs/2310.08084
  • repo_url: None
  • paper_authors: Qiuhui Chen, Haiying Lyu, Xinyue Hu, Yong Lu, Yi Hong
  • for: 这个论文目的是提出一种基于scribble的三维图像分割方法,以提高边界预测和ROI的形态regularization。
  • methods: 该方法使用了一种2.5D注意力UNet,加上一个提议的标签传播模块,以扩展scribble中的semantic信息,并使用了static和active边界预测来学习ROI的边界和形态regulation。
  • results: 对于三个公共数据集和一个私有数据集, experiments demonstrate that our Scribble2D5方法可以在基于scribble的volumetric图像分割 task中 achieve state-of-the-art performance,并且可以利用shape prior信息来进一步提高模型准确性。
    Abstract Recently, weakly-supervised image segmentation using weak annotations like scribbles has gained great attention in computer vision and medical image analysis, since such annotations are much easier to obtain compared to time-consuming and labor-intensive labeling at the pixel/voxel level. However, due to a lack of structure supervision on regions of interest (ROIs), existing scribble-based methods suffer from poor boundary localization. Furthermore, most current methods are designed for 2D image segmentation, which do not fully leverage the volumetric information if directly applied to each image slice. In this paper, we propose a scribble-based volumetric image segmentation, Scribble2D5, which tackles 3D anisotropic image segmentation and aims to its improve boundary prediction. To achieve this, we augment a 2.5D attention UNet with a proposed label propagation module to extend semantic information from scribbles and use a combination of static and active boundary prediction to learn ROI's boundary and regularize its shape. Also, we propose an optional add-on component, which incorporates the shape prior information from unpaired segmentation masks to further improve model accuracy. Extensive experiments on three public datasets and one private dataset demonstrate our Scribble2D5 achieves state-of-the-art performance on volumetric image segmentation using scribbles and shape prior if available.
    摘要 近期,使用scribble(涂鸦式简要标注)等弱标注的弱监督图像分割在计算机视觉和医学图像分析领域受到了广泛关注,因为这类标注比耗时费力的像素/体素级标注更容易获得。然而,由于缺乏对感兴趣区域(ROI)的结构监督,现有基于scribble的方法边界定位能力较差。此外,大多数现有方法是为2D图像分割设计的,若直接应用到每个图像切片上,无法充分利用体数据中的三维信息。在这篇论文中,我们提出了一种基于scribble的三维(体)图像分割方法Scribble2D5,旨在处理3D各向异性图像分割并改进边界预测。为此,我们在2.5D注意力UNet的基础上加入所提出的标签传播模块,以扩展scribble中的语义信息,并结合静态与主动边界预测来学习ROI的边界并正则化其形状。此外,我们还提出了一个可选的附加组件,利用非配对分割掩码中的形状先验信息进一步提高模型精度。在三个公开数据集和一个私有数据集上的大量实验表明,Scribble2D5在基于scribble的体图像分割上取得了state-of-the-art的性能,并且在形状先验可用时能进一步提升。

Jointly Optimized Global-Local Visual Localization of UAVs

  • paper_url: http://arxiv.org/abs/2310.08082
  • repo_url: None
  • paper_authors: Haoling Li, Jiuniu Wang, Zhiwei Wei, Wenjia Xu
  • For: 本研究旨在解决无人机在GNSS干扰和不可靠情况下的导航和定位问题,特别是解决传统方法(如同时地图和视差估计)的缺陷,如错误积累和实时性不足。* Methods: 我们提出了一种新的全球-地方视觉定位网络(GLVL),该网络是一种两个阶段的视觉定位方法,其首先使用大规模检索模块找到与无人机飞行场景中相似的区域,然后使用细腻匹配模块确定精确的无人机坐标,实现实时和精确的定位。* Results: 我们在六个无人机飞行场景中进行了实验,包括了Texture-rich和Texture-sparse两类场景。结果表明,我们的方法可以实现实时精确的定位要求,特别是在村庄场景中,我们的方法可以在0.48秒内达到2.39米的定位错误。
    Abstract Navigation and localization of UAVs present a challenge when global navigation satellite systems (GNSS) are disrupted and unreliable. Traditional techniques, such as simultaneous localization and mapping (SLAM) and visual odometry (VO), exhibit certain limitations in furnishing absolute coordinates and mitigating error accumulation. Existing visual localization methods achieve autonomous visual localization without error accumulation by matching with ortho satellite images. However, doing so cannot guarantee real-time performance due to the complex matching process. To address these challenges, we propose a novel Global-Local Visual Localization (GLVL) network. Our GLVL network is a two-stage visual localization approach, combining a large-scale retrieval module that finds similar regions with the UAV flight scene, and a fine-grained matching module that localizes the precise UAV coordinate, enabling real-time and precise localization. The training process is jointly optimized in an end-to-end manner to further enhance the model capability. Experiments on six UAV flight scenes encompassing both texture-rich and texture-sparse regions demonstrate the ability of our model to achieve the real-time precise localization requirements of UAVs. Particularly, our method achieves a localization error of only 2.39 meters in 0.48 seconds in a village scene with sparse texture features.
    摘要 当全球导航卫星系统(GNSS)受到干扰而不可靠时,无人机(UAV)的导航与定位面临挑战。传统技术,如同步定位与建图(SLAM)和视觉里程计(VO),在提供绝对坐标和抑制误差累积方面存在一定局限。现有的视觉定位方法可以通过与正射卫星图像匹配来实现无误差累积的自主视觉定位,但复杂的匹配过程使其无法保证实时性。为解决这些挑战,我们提出了一种新的全局-局部视觉定位网络(GLVL)。GLVL是一种两阶段视觉定位方法:大规模检索模块先找到与无人机飞行场景相似的区域,细粒度匹配模块再定位精确的无人机坐标,从而实现实时且精确的定位。训练过程以端到端方式联合优化,以进一步提升模型能力。在涵盖纹理丰富与纹理稀疏区域的六个无人机飞行场景上的实验表明,我们的方法能够满足无人机实时精确定位的需求;特别是在纹理特征稀疏的村庄场景中,我们的方法在0.48秒内实现了仅2.39米的定位误差。

RT-SRTS: Angle-Agnostic Real-Time Simultaneous 3D Reconstruction and Tumor Segmentation from Single X-Ray Projection

  • paper_url: http://arxiv.org/abs/2310.08080
  • repo_url: None
  • paper_authors: Miao Zhu, Qiming Fu, Bo Liu, Mengxi Zhang, Bojian Li, Xiaoyan Luo, Fugen Zhou
  • for: 这篇论文的目的是提出一种新的医疗影像重建方法,以帮助肿瘤治疗中的放射线治疗过程。
  • methods: 这篇论文使用的方法是基于多任务学习(MTL)的一种综合三维图像重建和肿瘤分类的网络,可以实现单据X射线像面的实时三维重建和肿瘤分类。此外,还提出了注意力增强calibrator(AEC)和不确定区域详细(URE)模组,以帮助特征提取和提高分类精度。
  • results: 这篇论文的结果显示,提出的方法可以实现实时三维重建和肿瘤分类,并且与两种现有方法比较,表现更加出色。实际上,这篇论文可以实现单据X射线像面的实时三维重建和肿瘤分类,并且可以在约70ms内完成这个过程,远远超过了实时肿瘤追踪所需的时间点。此外,还进一步验证了AEC和URE模组的有效性。
    Abstract Radiotherapy is one of the primary treatment methods for tumors, but the organ movement caused by respiratory motion limits its accuracy. Recently, 3D imaging from single X-ray projection receives extensive attentions as a promising way to address this issue. However, current methods can only reconstruct 3D image without direct location of the tumor and are only validated for fixed-angle imaging, which fails to fully meet the requirement of motion control in radiotherapy. In this study, we propose a novel imaging method RT-SRTS which integrates 3D imaging and tumor segmentation into one network based on the multi-task learning (MTL) and achieves real-time simultaneous 3D reconstruction and tumor segmentation from single X-ray projection at any angle. Futhermore, we propose the attention enhanced calibrator (AEC) and uncertain-region elaboration (URE) modules to aid feature extraction and improve segmentation accuracy. We evaluated the proposed method on ten patient cases and compared it with two state-of-the-art methods. Our approach not only delivered superior 3D reconstruction but also demonstrated commendable tumor segmentation results. The simultaneous reconstruction and segmentation could be completed in approximately 70 ms, significantly faster than the required time threshold for real-time tumor tracking. The efficacy of both AEC and URE was also validated through ablation studies.
    摘要 放射治疗是肿瘤的主要治疗手段之一,但呼吸运动引起的器官移动限制了其精度。最近,基于单张X射线投影的3D成像作为一种有前景的解决方案受到了广泛关注。然而,现有方法只能重建3D图像而无法直接定位肿瘤,且仅在固定角度成像下得到验证,无法充分满足放疗中运动控制的需求。在本研究中,我们提出了一种新的成像方法RT-SRTS,它基于多任务学习(MTL)将3D重建与肿瘤分割整合到同一个网络中,可在任意角度的单张X射线投影下实现实时的同步3D重建与肿瘤分割。此外,我们还提出了注意力增强校准器(AEC)和不确定区域细化(URE)模块,以辅助特征提取并提高分割精度。我们在十个患者案例上进行了评估,并与两种最先进方法进行了比较:我们的方法不仅给出了更优的3D重建,也取得了可观的肿瘤分割结果。同步重建与分割可在约70 ms内完成,显著快于实时肿瘤跟踪所需的时间阈值。AEC与URE模块的有效性也通过消融实验得到了验证。

Samples on Thin Ice: Re-Evaluating Adversarial Pruning of Neural Networks

  • paper_url: http://arxiv.org/abs/2310.08073
  • repo_url: None
  • paper_authors: Giorgio Piras, Maura Pintor, Ambra Demontis, Battista Biggio
  • for: 这篇论文的目的是重新评估三种最新的对抗剪枝(adversarial pruning)方法,检验其对对抗样本的鲁棒性是否被高估。
  • methods: 这篇论文重新评估了三种最先进的对抗剪枝方法,并比较同一模型在剪枝前(稠密)与剪枝后(稀疏)版本上的表现。
  • results: 研究发现,这些方法的鲁棒性确实被高估;并且剪枝后的模型通常会错误分类那些靠近未剪枝模型决策边界的"薄冰上"样本。
    Abstract Neural network pruning has shown to be an effective technique for reducing the network size, trading desirable properties like generalization and robustness to adversarial attacks for higher sparsity. Recent work has claimed that adversarial pruning methods can produce sparse networks while also preserving robustness to adversarial examples. In this work, we first re-evaluate three state-of-the-art adversarial pruning methods, showing that their robustness was indeed overestimated. We then compare pruned and dense versions of the same models, discovering that samples on thin ice, i.e., closer to the unpruned model's decision boundary, are typically misclassified after pruning. We conclude by discussing how this intuition may lead to designing more effective adversarial pruning methods in future work.
    摘要
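
The "samples on thin ice" observation can be probed with a cheap proxy, sketched below under our own assumptions: use the dense model's logit margin as a stand-in for distance to the decision boundary, and measure how often the pruned model flips exactly the low-margin samples.

```python
import torch

def logit_margin(logits, labels):
    true = logits.gather(1, labels[:, None]).squeeze(1)
    others = logits.clone()
    others.scatter_(1, labels[:, None], float('-inf'))
    return true - others.max(dim=1).values            # small margin = close to the boundary

def flip_rate_on_thin_ice(dense_logits, pruned_logits, labels, quantile=0.2):
    margin = logit_margin(dense_logits, labels)
    thin = margin <= margin.quantile(quantile)         # the lowest-margin ("thin ice") samples
    flipped = dense_logits.argmax(1).ne(pruned_logits.argmax(1))
    return flipped[thin].float().mean()

d = torch.randn(128, 10)                               # dense-model logits (placeholder)
p = d + 0.5 * torch.randn(128, 10)                     # pruned-model logits (placeholder)
y = torch.randint(0, 10, (128,))
print(float(flip_rate_on_thin_ice(d, p, y)))
```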

Learning Transferable Conceptual Prototypes for Interpretable Unsupervised Domain Adaptation

  • paper_url: http://arxiv.org/abs/2310.08071
  • repo_url: None
  • paper_authors: Junyu Gao, Xinhong Ma, Changsheng Xu
  • for: 本研究旨在提出一种可解释的无监督域适应(UDA)方法,以提升模型决策的安全性和可控性。
  • methods: 本方法设计了一个层次化原型模块(TCPL),将源域中的基本类别概念迁移到目标域,学习域间共享的原型;同时设计了一种自预测一致的伪标签策略,融合置信度、预测结果与原型信息来选择适合伪标注的目标样本,逐步缩小域间差距。
  • results: 实验表明,所提方法不仅能提供有效且直观的解释,还能超越之前的最优方法。
    Abstract Despite the great progress of unsupervised domain adaptation (UDA) with the deep neural networks, current UDA models are opaque and cannot provide promising explanations, limiting their applications in the scenarios that require safe and controllable model decisions. At present, a surge of work focuses on designing deep interpretable methods with adequate data annotations and only a few methods consider the distributional shift problem. Most existing interpretable UDA methods are post-hoc ones, which cannot facilitate the model learning process for performance enhancement. In this paper, we propose an inherently interpretable method, named Transferable Conceptual Prototype Learning (TCPL), which could simultaneously interpret and improve the processes of knowledge transfer and decision-making in UDA. To achieve this goal, we design a hierarchically prototypical module that transfers categorical basic concepts from the source domain to the target domain and learns domain-shared prototypes for explaining the underlying reasoning process. With the learned transferable prototypes, a self-predictive consistent pseudo-label strategy that fuses confidence, predictions, and prototype information, is designed for selecting suitable target samples for pseudo annotations and gradually narrowing down the domain gap. Comprehensive experiments show that the proposed method can not only provide effective and intuitive explanations but also outperform previous state-of-the-arts.
    摘要 尽管深度神经网络在无监督域适应(UDA)中取得了很大进展,但目前的UDA模型仍然不透明,无法提供令人信服的解释,限制了其在需要安全、可控模型决策场景中的应用。目前,大量研究集中在依赖充足数据标注的深度可解释方法上,只有少数方法考虑分布偏移问题;而现有的可解释UDA方法多为事后(post-hoc)方法,无法在促进模型学习的同时提升性能。在这篇论文中,我们提出了一种内在可解释的方法,即可迁移概念原型学习(TCPL),能够同时解释并改进UDA中的知识迁移与决策过程。为实现这一目标,我们设计了层次化原型模块,将源域中的基本类别概念迁移到目标域,并学习域间共享的原型以解释底层推理过程。基于所学的可迁移原型,我们设计了一种自预测一致的伪标签策略,融合置信度、预测结果与原型信息来选择适合伪标注的目标样本,逐步缩小域间差距。完整的实验表明,我们的方法不仅能提供有效且直观的解释,还能超越先前的最优方法。
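
A minimal sketch of the pseudo-label selection rule described above (thresholds, feature dimensions and the cosine prototype assignment are assumptions, not the paper's exact formulation): a target sample is kept for pseudo-annotation only when the classifier is confident and its prediction agrees with the nearest domain-shared prototype.

```python
import torch
import torch.nn.functional as F

def select_pseudo_labels(features, logits, prototypes, conf_thr=0.8):
    probs = F.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)
    proto_sim = F.normalize(features, dim=1) @ F.normalize(prototypes, dim=1).T
    proto_pred = proto_sim.argmax(dim=1)
    keep = (conf > conf_thr) & (pred == proto_pred)   # confidence + prototype agreement
    return keep, pred

feat = torch.randn(32, 64)        # target-domain features (placeholder)
logit = torch.randn(32, 5)        # classifier logits (placeholder)
protos = torch.randn(5, 64)       # domain-shared class prototypes (placeholder)
keep, pred = select_pseudo_labels(feat, logit, protos)
print(int(keep.sum()), pred[keep])
```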

Frequency-Aware Re-Parameterization for Over-Fitting Based Image Compression

  • paper_url: http://arxiv.org/abs/2310.08068
  • repo_url: None
  • paper_authors: Yun Ye, Yanjie Pan, Qually Jiang, Ming Lu, Xiaoran Fang, Beryl Xu
  • for: 基于过拟合的图像压缩既要求权重紧凑以利于压缩,又要求收敛快速以便实际使用,这对基于深度卷积神经网络(CNN)的方法提出了挑战。
  • methods: 这篇论文提出了一种简单的重参数化方法,用于在减少权重存储的同时加速CNN的收敛:卷积核被重参数化为离散余弦变换(DCT)核的加权和,从而可以直接在频域中优化;结合L1正则化,所提方法能以较低的计算成本取得显著优于普通卷积的率失真表现。
  • results: 实验结果显示,该方法在多个数据集上的基于过拟合的图像恢复任务中,仅需200次迭代即可在HEIF基础上实现最高 -46.12% 的BD-rate提升。
    Abstract Over-fitting-based image compression requires weights compactness for compression and fast convergence for practical use, posing challenges for deep convolutional neural networks (CNNs) based methods. This paper presents a simple re-parameterization method to train CNNs with reduced weights storage and accelerated convergence. The convolution kernels are re-parameterized as a weighted sum of discrete cosine transform (DCT) kernels enabling direct optimization in the frequency domain. Combined with L1 regularization, the proposed method surpasses vanilla convolutions by achieving a significantly improved rate-distortion with low computational cost. The proposed method is verified with extensive experiments of over-fitting-based image restoration on various datasets, achieving up to -46.12% BD-rate on top of HEIF with only 200 iterations.
    摘要 基于过拟合的图像压缩既要求权重紧凑以利于压缩,又要求收敛快速以便实际使用,这对基于深度卷积神经网络(CNN)的方法带来了挑战。这篇论文提出了一种简单的重参数化方法,用于在减少权重存储的同时加速收敛:卷积核被重参数化为离散余弦变换(DCT)核的加权和,从而可以直接在频域中优化。与L1正则化结合使用,所提方法能以较低的计算成本取得明显优于普通卷积的率失真表现。实验表明,在多个数据集的基于过拟合的图像恢复任务上,所提方法仅需200次迭代即可在HEIF基础上实现最高 -46.12% 的BD-rate提升。
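
The re-parameterization described above can be sketched as follows (kernel sizes and initialisation are assumptions; this is an illustration, not the paper's implementation): each KxK kernel is expressed as a weighted sum of 2D DCT basis kernels, the DCT coefficients are the trainable parameters, and an L1 penalty on them encourages a compact weight representation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def dct_basis(k):
    """Return (k*k, k, k) 2D DCT-II basis kernels."""
    n = torch.arange(k).float()
    c = torch.cos(math.pi * (n[None, :] + 0.5) * n[:, None] / k)   # (k, k) 1D basis, c[u, x]
    return torch.einsum('ux,vy->uvxy', c, c).reshape(k * k, k, k)

class DCTConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.register_buffer('basis', dct_basis(k))                 # fixed DCT kernels
        self.coef = nn.Parameter(torch.randn(out_ch, in_ch, k * k) * 0.1)  # trainable coefficients
        self.k = k

    def forward(self, x):
        weight = torch.einsum('oif,fxy->oixy', self.coef, self.basis)  # rebuild conv kernels
        return F.conv2d(x, weight, padding=self.k // 2)

    def l1_penalty(self):
        return self.coef.abs().mean()

layer = DCTConv2d(3, 8)
x = torch.rand(1, 3, 32, 32)
loss = layer(x).pow(2).mean() + 1e-3 * layer.l1_penalty()
loss.backward()
print(loss.item())
```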

Age Estimation Based on Graph Convolutional Networks and Multi-head Attention Mechanisms

  • paper_url: http://arxiv.org/abs/2310.08064
  • repo_url: None
  • paper_authors: Miaomiao Yang, Changwei Yao, Shijin Yan
  • For: 本研究面向游戏中的青少年防沉迷身份验证,使用图卷积网络和多头注意力机制来提高年龄估计的精度。
  • Methods: 本研究使用图卷积网络和多头注意力机制,对形状不规则的人脸图像特征进行灵活的提取与建模,以减少背景信息的干扰并提高年龄估计的精度。
  • Results: 本研究获得了较高的年龄估计精度,MAE误差值降至约3.64,优于当前的年龄估计模型,从而有助于提高人脸识别和身份验证的精度。
    Abstract Age estimation technology is a part of facial recognition and has been applied to identity authentication. This technology achieves the development and application of a juvenile anti-addiction system by authenticating users in the game. Convolutional Neural Network (CNN) and Transformer algorithms are widely used in this application scenario. However, these two models cannot flexibly extract and model features of faces with irregular shapes, and they are ineffective in capturing key information. Furthermore, the above methods will contain a lot of background information while extracting features, which will interfere with the model. In consequence, it is easy to extract redundant information from images. In this paper, a new modeling idea is proposed to solve this problem, which can flexibly model irregular objects. The Graph Convolutional Network (GCN) is used to extract features from irregular face images effectively, and multi-head attention mechanisms are added to avoid redundant features and capture key region information in the image. This model can effectively improve the accuracy of age estimation and reduce the MAE error value to about 3.64, which is better than the effect of today's age estimation model, to improve the accuracy of face recognition and identity authentication.
    摘要 年龄估计技术是人脸识别的一部分,已被应用于身份认证,并可通过在游戏中认证用户来支撑青少年防沉迷系统的落地。卷积神经网络(CNN)和 Transformer 算法在该场景中被广泛使用,但这两类模型难以灵活地提取并建模形状不规则的面部特征,也不善于捕捉关键信息;同时,它们在提取特征时会混入大量背景信息,干扰模型,容易从图像中提取冗余信息。本文提出了一种能够灵活建模不规则对象的新思路:使用图卷积网络(GCN)有效提取不规则面部图像的特征,并加入多头注意力机制以避免冗余特征、捕捉图像中的关键区域信息。该模型可有效提高年龄估计的准确性,将 MAE 降至约 3.64,优于现有的年龄估计模型,从而有助于提高人脸识别与身份认证的准确性。
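
The abstract describes pairing a graph convolution over irregular facial structure with multi-head attention for age regression. Below is a rough sketch under assumed shapes (68 landmark nodes, a toy normalized adjacency, a single regression head), not the authors' implementation.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_norm @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_norm):
        # h: (batch, nodes, in_dim); a_norm: (nodes, nodes) row-normalized adjacency
        return torch.relu(a_norm @ self.lin(h))

class GCNAttentionAgeEstimator(nn.Module):
    def __init__(self, in_dim=32, hid_dim=64, heads=4):
        super().__init__()
        self.gcn1 = SimpleGCNLayer(in_dim, hid_dim)
        self.gcn2 = SimpleGCNLayer(hid_dim, hid_dim)
        # Multi-head self-attention over node embeddings highlights key facial regions.
        self.attn = nn.MultiheadAttention(hid_dim, heads, batch_first=True)
        self.head = nn.Linear(hid_dim, 1)  # regress a single age value

    def forward(self, node_feats, a_norm):
        h = self.gcn2(self.gcn1(node_feats, a_norm), a_norm)
        h, _ = self.attn(h, h, h)            # (batch, nodes, hid_dim)
        return self.head(h.mean(dim=1))      # pool nodes, predict age

# Toy usage: 68 landmark nodes with a uniform (fully connected) normalized adjacency.
n = 68
a = torch.ones(n, n) / n
model = GCNAttentionAgeEstimator()
age = model(torch.randn(2, n, 32), a)        # shape (2, 1)
```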

EC-Depth: Exploring the consistency of self-supervised monocular depth estimation under challenging scenes

  • paper_url: http://arxiv.org/abs/2310.08044
  • repo_url: https://github.com/RuijieZhu94/EC-Depth
  • paper_authors: Ruijie Zhu, Ziyang Song, Chuxin Wang, Jianfeng He, Tianzhu Zhang
  • for: EC-Depth is designed to improve the robustness of self-supervised monocular depth estimation models in real-world applications, where adverse conditions are prevalent.
  • methods: The proposed method uses a two-stage training framework with a perturbation-invariant depth consistency constraint module and a consistency-based pseudo-label selection module to achieve accurate and consistent depth predictions.
  • results: EC-Depth surpasses existing state-of-the-art methods on the KITTI, KITTI-C, and DrivingStereo benchmarks, demonstrating its effectiveness in challenging scenarios.
    Abstract Self-supervised monocular depth estimation holds significant importance in the fields of autonomous driving and robotics. However, existing methods are typically designed to train and test on clear and pristine datasets, overlooking the impact of various adverse conditions prevalent in real-world scenarios. As a result, it is commonly observed that most self-supervised monocular depth estimation methods struggle to perform adequately under challenging conditions. To address this issue, we present EC-Depth, a novel self-supervised two-stage training framework to achieve a robust depth estimation, starting from the foundation of depth prediction consistency under different perturbations. Leveraging the proposed perturbation-invariant depth consistency constraint module and the consistency-based pseudo-label selection module, our model attains accurate and consistent depth predictions in both standard and challenging scenarios. Extensive experiments substantiate the effectiveness of the proposed method. Moreover, our method surpasses existing state-of-the-art methods on KITTI, KITTI-C and DrivingStereo benchmarks, demonstrating its potential for enhancing the reliability of self-supervised monocular depth estimation models in real-world applications.
    摘要 自监督单目深度估计在自动驾驶和机器人领域具有重要意义,但现有方法通常在清晰、理想的数据集上训练和测试,忽视了实际场景中普遍存在的多种不利条件,因此大多数自监督单目深度估计方法在复杂条件下表现不佳。为解决这个问题,我们提出了 EC-Depth,一种新的自监督两阶段训练框架,从不同扰动下深度预测一致性的角度出发实现稳健的深度估计。借助所提出的扰动不变深度一致性约束模块和基于一致性的伪标签选择模块,我们的模型在标准和复杂场景中都能获得准确且一致的深度预测。大量实验证明了该方法的有效性。此外,我们的方法在 KITTI、KITTI-C 和 DrivingStereo 基准上超过了现有的最先进方法,表明其有望提升自监督单目深度估计模型在实际应用中的可靠性。
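
As a rough sketch (not the official EC-Depth code) of the two ingredients named above, the snippet below shows a perturbation-invariant depth consistency loss and a consistency-based filter for choosing pseudo-label samples; the perturbation function, depth network interface, and threshold are assumptions.

```python
import torch

def consistency_loss(depth_net, image, perturb):
    """Encourage the network to predict the same depth for a clean and a perturbed view."""
    d_clean = depth_net(image)
    d_pert = depth_net(perturb(image))
    return torch.abs(d_clean - d_pert).mean()

def select_pseudo_labels(depth_net, image, perturb, thresh=0.05):
    """Keep a sample as a pseudo-label source only if its two predictions agree."""
    with torch.no_grad():
        d_clean = depth_net(image)
        d_pert = depth_net(perturb(image))
        rel_diff = (torch.abs(d_clean - d_pert)
                    / d_clean.clamp(min=1e-6)).mean(dim=(1, 2, 3))
    keep = rel_diff < thresh            # (batch,) boolean mask of trusted samples
    return d_clean.detach(), keep
```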

X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention

  • paper_url: http://arxiv.org/abs/2310.08042
  • repo_url: https://github.com/cool-xuan/x-hrnet
  • paper_authors: Yixuan Zhou, Xuanhan Wang, Xing Xu, Lei Zhao, Jingkuan Song
  • for: 提高人 pose 估计精度,降低计算复杂性
  • methods: 引入空间单 dimensional 自注意力 (SUSA),取代点 wise (1x1) 卷积
  • results: 实现高精度的人体姿态估计,将点 wise(1x1)卷积的计算复杂度降低 96%,并公开了可复现的代码。
    Abstract High-resolution representation is necessary for human pose estimation to achieve high performance, and the ensuing problem is high computational complexity. In particular, predominant pose estimation methods estimate human joints by 2D single-peak heatmaps. Each 2D heatmap can be horizontally and vertically projected to and reconstructed by a pair of 1D heat vectors. Inspired by this observation, we introduce a lightweight and powerful alternative, Spatially Unidimensional Self-Attention (SUSA), to the pointwise (1x1) convolution that is the main computational bottleneck in the depthwise separable 3c3 convolution. Our SUSA reduces the computational complexity of the pointwise (1x1) convolution by 96% without sacrificing accuracy. Furthermore, we use the SUSA as the main module to build our lightweight pose estimation backbone X-HRNet, where `X' represents the estimated cross-shape attention vectors. Extensive experiments on the COCO benchmark demonstrate the superiority of our X-HRNet, and comprehensive ablation studies show the effectiveness of the SUSA modules. The code is publicly available at https://github.com/cool-xuan/x-hrnet.
    摘要 高分辨率表示是人体姿态估计取得高性能所必需的,但随之而来的问题是高计算复杂度。主流的姿态估计方法通过 2D 单峰热图来估计人体关节,而每个 2D 热图都可以被水平和垂直投影为一对 1D 热向量并据此重建。受此观察启发,我们提出了一种轻量而强大的替代方案——空间单维自注意力(SUSA),用以取代深度可分离卷积中构成主要计算瓶颈的点 wise(1x1)卷积。SUSA 可将点 wise(1x1)卷积的计算复杂度降低 96%,而不损失精度。我们进一步以 SUSA 为主要模块构建了轻量级姿态估计骨干网络 X-HRNet,其中 `X' 表示估计的交叉形注意力向量。在 COCO 基准上的大量实验证明了 X-HRNet 的优越性,全面的消融实验也验证了 SUSA 模块的有效性。代码可在 https://github.com/cool-xuan/x-hrnet 获取。
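
The observation that motivates SUSA — a single-peak 2-D keypoint heatmap can be projected to and approximately reconstructed from a pair of 1-D heat vectors — can be checked in a few lines of PyTorch. The grid size and Gaussian toy peak below are illustrative.

```python
import torch

def project_to_1d(heatmap):
    """heatmap: (H, W) -> row marginal (H,) and column marginal (W,)."""
    return heatmap.sum(dim=1), heatmap.sum(dim=0)

def reconstruct_from_1d(row, col):
    """Outer product of the two marginals approximates the original single peak."""
    recon = torch.outer(row, col)
    return recon / recon.sum().clamp(min=1e-12)

# Toy check with a Gaussian peak at (20, 30) on a 64x48 grid.
ys, xs = torch.meshgrid(torch.arange(64.), torch.arange(48.), indexing='ij')
hm = torch.exp(-((ys - 20) ** 2 + (xs - 30) ** 2) / (2 * 4.0 ** 2))
hm = hm / hm.sum()
row, col = project_to_1d(hm)
recon = reconstruct_from_1d(row, col)
peak = divmod(recon.argmax().item(), recon.shape[1])   # recovers (20, 30)
```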

Continual Learning via Manifold Expansion Replay

  • paper_url: http://arxiv.org/abs/2310.08038
  • repo_url: None
  • paper_authors: Zihao Xu, Xuan Tang, Yufei Shi, Jianfeng Zhang, Jian Yang, Mingsong Chen, Xian Wei
  • for: 本研究旨在提高持续学习中模型的稳定性和表达力,途径是扩大知识表示所构成的隐式流形的几何范围。
  • methods: 本研究提出了一种新的回放策略 Manifold Expansion Replay (MaER),通过在回放缓冲区中不断扩大知识所表示的隐式流形的直径,来提高模型的稳定性和表达力。
  • results: 在 MNIST、CIFAR10、CIFAR100 和 TinyImageNet 等数据集上的大量实验验证表明,所提方法在持续学习设置下显著提高了精度,优于现有的最先进方法。
    Abstract In continual learning, the learner learns multiple tasks in sequence, with data being acquired only once for each task. Catastrophic forgetting is a major challenge to continual learning. To reduce forgetting, some existing rehearsal-based methods use episodic memory to replay samples of previous tasks. However, in the process of knowledge integration when learning a new task, this strategy also suffers from catastrophic forgetting due to an imbalance between old and new knowledge. To address this problem, we propose a novel replay strategy called Manifold Expansion Replay (MaER). We argue that expanding the implicit manifold of the knowledge representation in the episodic memory helps to improve the robustness and expressiveness of the model. To this end, we propose a greedy strategy to keep increasing the diameter of the implicit manifold represented by the knowledge in the buffer during memory management. In addition, we introduce Wasserstein distance instead of cross entropy as distillation loss to preserve previous knowledge. With extensive experimental validation on MNIST, CIFAR10, CIFAR100, and TinyImageNet, we show that the proposed method significantly improves the accuracy in continual learning setup, outperforming the state of the arts.
    摘要 在持续学习中,学习者需要按顺序学习多个任务,且每个任务的数据只采集一次,灾难性遗忘由此成为主要挑战。现有的一些基于回放的方法利用情景记忆重放旧任务的样本,但在学习新任务、进行知识整合时,新旧知识之间的不平衡仍会导致灾难性遗忘。为解决这一问题,我们提出了一种新的回放策略——流形扩展回放(MaER)。我们认为,扩大情景记忆中知识表示所构成的隐式流形有助于提高模型的稳健性和表达力。为此,我们提出了一种贪婪策略,在内存管理过程中不断增大缓冲区中知识所表示隐式流形的直径;此外,我们用 Wasserstein 距离取代交叉熵作为蒸馏损失,以保留先前的知识。在 MNIST、CIFAR10、CIFAR100 和 TinyImageNet 上的大量实验验证表明,所提方法在持续学习设置下显著提高了准确率,超过了当前最佳方法。
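
The following is a rough sketch of the two ideas above, with simplifying assumptions: (1) a greedy buffer update that keeps the replay memory spread out by dropping the most redundant sample, used here as a proxy for expanding the extent of the stored manifold, and (2) a Wasserstein-1 distance over class distributions (treating class indices as points on a line, itself an assumption) used as a distillation loss instead of cross entropy. Neither is the authors' exact procedure.

```python
import torch

def greedy_buffer_update(buffer_feats, new_feat, capacity):
    """buffer_feats: (n, d) stored features; new_feat: (d,). Returns the updated buffer."""
    feats = torch.cat([buffer_feats, new_feat.unsqueeze(0)], dim=0)
    if feats.shape[0] <= capacity:
        return feats
    dists = torch.cdist(feats, feats)        # (n+1, n+1) pairwise distances
    spread = dists.sum(dim=1)                # total distance of each point to the rest
    drop = spread.argmin()                   # most "central", i.e. most redundant, point
    keep = torch.ones(feats.shape[0], dtype=torch.bool)
    keep[drop] = False
    return feats[keep]

def wasserstein1_distill(old_logits, new_logits):
    """W1 between old/new class distributions via the difference of their CDFs."""
    p = torch.softmax(old_logits, dim=-1).cumsum(dim=-1)
    q = torch.softmax(new_logits, dim=-1).cumsum(dim=-1)
    return torch.abs(p - q).sum(dim=-1).mean()
```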

BaSAL: Size Balanced Warm Start Active Learning for LiDAR Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.08035
  • repo_url: None
  • paper_authors: Jiarong Wei, Yancong Lin, Holger Caesar
  • for: 降低成本的数据标注,通过重复询问 annotator 标注pool中的无标签数据中最有用的样本,并将其用于重新训练模型。
  • methods: 使用size-balanced warm start active learning模型,根据对象类别的特征大小进行对象群集 sampling,以创建更加均衡的数据集。
  • results: 能够大幅提高初始模型的性能;仅使用 5% 的标注数据即可达到与在整个 SemanticKITTI 数据集上训练相当的效果,超越现有的主动学习方法,并在 nuScenes 上达到当前最先进水平。
    Abstract Active learning strives to reduce the need for costly data annotation, by repeatedly querying an annotator to label the most informative samples from a pool of unlabeled data and retraining a model from these samples. We identify two problems with existing active learning methods for LiDAR semantic segmentation. First, they ignore the severe class imbalance inherent in LiDAR semantic segmentation datasets. Second, to bootstrap the active learning loop, they train their initial model from randomly selected data samples, which leads to low performance and is referred to as the cold start problem. To address these problems we propose BaSAL, a size-balanced warm start active learning model, based on the observation that each object class has a characteristic size. By sampling object clusters according to their size, we can thus create a size-balanced dataset that is also more class-balanced. Furthermore, in contrast to existing information measures like entropy or CoreSet, size-based sampling does not require an already trained model and thus can be used to address the cold start problem. Results show that we are able to improve the performance of the initial model by a large margin. Combining size-balanced sampling and warm start with established information measures, our approach achieves a comparable performance to training on the entire SemanticKITTI dataset, despite using only 5% of the annotations, which outperforms existing active learning methods. We also match the existing state-of-the-art in active learning on nuScenes. Our code will be made available upon paper acceptance.
    摘要 主动学习旨在降低昂贵的数据标注成本:它反复请标注者对未标注数据池中最有价值的样本进行标注,并用这些样本重新训练模型。我们发现现有的主动学习方法在 LiDAR 语义分割上存在两个问题。首先,它们忽略了 LiDAR 语义分割数据集中固有的严重类别不平衡。其次,为了启动主动学习循环,它们用随机选择的数据样本训练初始模型,导致性能低下,这被称为冷启动问题。为了解决这些问题,我们提出了 BaSAL,一个基于“每个物体类别都有其特征尺寸”这一观察的尺寸平衡暖启动主动学习模型。通过按尺寸对物体簇进行采样,我们可以构建一个尺寸平衡、同时也更加类别平衡的数据集。此外,与 entropy 或 CoreSet 等现有信息度量不同,基于尺寸的采样不需要已训练的模型,因此可以用来解决冷启动问题。结果显示,我们能够大幅提升初始模型的性能。将尺寸平衡采样和暖启动与既有的信息度量相结合后,我们的方法仅使用 5% 的标注即可达到与在整个 SemanticKITTI 数据集上训练相当的性能,超越现有的主动学习方法,并在 nuScenes 上与当前最先进的主动学习方法持平。代码将在论文接收后公开。
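
An illustrative sketch of size-balanced sampling for warm-starting active learning, following the idea above: object clusters are binned by their size and drawn evenly across bins, so no trained model is needed. The number of bins and the size measure are assumptions, not the paper's exact settings.

```python
import random
from collections import defaultdict

def size_balanced_sample(clusters, sizes, budget, num_bins=8):
    """clusters: list of cluster ids; sizes: parallel list of sizes; returns sampled ids."""
    lo, hi = min(sizes), max(sizes)
    width = (hi - lo) / num_bins or 1.0
    bins = defaultdict(list)
    for cid, s in zip(clusters, sizes):
        b = min(int((s - lo) / width), num_bins - 1)
        bins[b].append(cid)
    selected = []
    # Round-robin over size bins until the labeling budget is spent.
    while len(selected) < budget and any(bins.values()):
        for b in list(bins.keys()):
            if bins[b] and len(selected) < budget:
                selected.append(bins[b].pop(random.randrange(len(bins[b]))))
    return selected

# Example: 1000 clusters with random sizes, request 50 for annotation.
ids = list(range(1000))
szs = [random.uniform(0.2, 12.0) for _ in ids]
picked = size_balanced_sample(ids, szs, budget=50)
```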

Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval

  • paper_url: http://arxiv.org/abs/2310.08009
  • repo_url: https://github.com/IMCCretrieval/DKPH
  • paper_authors: Pandeng Li, Hongtao Xie, Jiannan Ge, Lei Zhang, Shaobo Min, Yongdong Zhang
  • for: 本研究旨在提高无监督视频哈希的性能,通过分解视频信息为重建依赖的信息和Semantic依赖的信息,从而隔离 semantic extraction 从重建约束。
  • methods: 我们采用了一种简单的 dual-stream 结构,包括一个时间层和一个哈希层。在这种结构中,哈希层通过自我监督获得的含义相似知识,学习捕捉 binary codes 中的 semantics,而时间层则学习重建视频信息。
  • results: 我们的方法在三个视频benchmark上进行了广泛的实验 validate,与之前的状态场景比较,我们的方法一直表现出优于其他方法。
    Abstract Unsupervised video hashing usually optimizes binary codes by learning to reconstruct input videos. Such reconstruction constraint spends much effort on frame-level temporal context changes without focusing on video-level global semantics that are more useful for retrieval. Hence, we address this problem by decomposing video information into reconstruction-dependent and semantic-dependent information, which disentangles the semantic extraction from reconstruction constraint. Specifically, we first design a simple dual-stream structure, including a temporal layer and a hash layer. Then, with the help of semantic similarity knowledge obtained from self-supervision, the hash layer learns to capture information for semantic retrieval, while the temporal layer learns to capture the information for reconstruction. In this way, the model naturally preserves the disentangled semantics into binary codes. Validated by comprehensive experiments, our method consistently outperforms the state-of-the-arts on three video benchmarks.
    摘要 无监督视频哈希通常通过学习重建输入视频来优化二进制编码。这种重建约束将大量精力花在帧级的时间上下文变化上,而忽略了对检索更有用的视频级全局语义。为此,我们将视频信息分解为依赖重建的信息和依赖语义的信息,从而把语义提取与重建约束解耦。具体而言,我们首先设计了一个简单的双流结构,包括一个时间层和一个哈希层;然后,借助自监督获得的语义相似性知识,哈希层学习捕捉用于语义检索的信息,而时间层学习捕捉用于重建的信息。如此,模型便自然地将解耦后的语义保留到二进制编码中。大量实验验证表明,我们的方法在三个视频基准上一致优于现有的最先进方法。

MLP-AMDC: An MLP Architecture for Adaptive-Mask-based Dual-Camera snapshot hyperspectral imaging

  • paper_url: http://arxiv.org/abs/2310.08002
  • repo_url: https://github.com/caizeyu1992/MLP-AMDC
  • paper_authors: Zeyu Cai, Can Zhang, Xunhao Chen, Shanghuan Liu, Chengqian Jin, Feipeng Da
  • for: This paper aims to improve the performance and speed of Coded Aperture Snapshot Spectral Imaging (CASSI) systems, which are used to acquire Hyper-Spectral Images (HSI).
  • methods: The paper proposes an AMDC-CASSI system that uses an RGB camera with CASSI and Adaptive-Mask to improve the reconstruction quality of HSI. The proposed method replaces the transformer structure of the network with an MLP architecture to improve the inference speed of the reconstruction network.
  • results: The paper shows that the proposed MLP-AMDC method achieves an 8 dB improvement over the state-of-the-art (SOTA) and at least a 5-fold improvement in reconstruction speed, while maintaining competitive reconstruction quality.
    Abstract Coded Aperture Snapshot Spectral Imaging (CASSI) system has great advantages over traditional methods in dynamically acquiring Hyper-Spectral Image (HSI), but there are the following problems. 1) Traditional mask relies on random patterns or analytical design, both of which limit the performance improvement of CASSI. 2) Existing high-quality reconstruction algorithms are slow in reconstruction and can only reconstruct scene information offline. To address the above two problems, this paper designs the AMDC-CASSI system, introducing RGB camera with CASSI based on Adaptive-Mask as multimodal input to improve the reconstruction quality. The existing SOTA reconstruction schemes are based on transformer, but the operation of self-attention pulls down the operation efficiency of the network. In order to improve the inference speed of the reconstruction network, this paper proposes An MLP Architecture for Adaptive-Mask-based Dual-Camera (MLP-AMDC) to replace the transformer structure of the network. Numerous experiments have shown that MLP performs no less well than transformer-based structures for HSI reconstruction, while MLP greatly improves the network inference speed and has less number of parameters and operations, our method has a 8 db improvement over SOTA and at least a 5-fold improvement in reconstruction speed. (https://github.com/caizeyu1992/MLP-AMDC.)
    摘要 CASSI(coded aperture snapshot spectral imaging)系统在获取高spectral resolution的图像方面有优势,但存在以下问题:1)传统的面Mask rely on random patterns或分析设计,两者都限制了CASSI的性能提升。2)现有的高质量重建算法慢于重建和只能在离线重建场景信息。为了解决上述两个问题,本文提出了RGB camera与CASSI基于Adaptive-Mask的多模态输入,以提高重建质量。现有的SOTA重建方案基于transformer,但自我注意operation pulls down网络的运算效率。为了提高重建网络的吞吐量,本文提议使用An MLP Architecture for Adaptive-Mask-based Dual-Camera(MLP-AMDC)来替换网络的transformer结构。多个实验表明,MLP与transformer-based结构相当,而MLP可以大幅提高网络的吞吐量和参数数量,我们的方法与SOTA差距8db,并至少提高5倍的重建速度。(https://github.com/caizeyu1992/MLP-AMDC。)

Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning

  • paper_url: http://arxiv.org/abs/2310.07996
  • repo_url: None
  • paper_authors: Lapo Frati, Neil Traft, Jeff Clune, Nick Cheney
  • for: 这个论文旨在提出一种简单的预训练机制,以便 representations 能够更好地进行 continual 学习和转移学习。
  • methods: 这个机制是在最后一层的权重重新设置,我们昵称其为 “zapping”。这种机制原本是为 meta-continual-learning 过程设计的,但我们表明它可以在许多其他场景中应用。
  • results: 在我们的实验中,我们想要将预训练的图像分类器转移到新的类别中,并在几个极少的试验中达到了更高的转移精度和/或更快的适应速度,而无需使用昂贵的高阶导数。这种 zapping 机制可以考虑为 computationally 更便宜的、或者是 meta-learning 快速适应特征的代替方案。
    Abstract This work identifies a simple pre-training mechanism that leads to representations exhibiting better continual and transfer learning. This mechanism -- the repeated resetting of weights in the last layer, which we nickname "zapping" -- was originally designed for a meta-continual-learning procedure, yet we show it is surprisingly applicable in many settings beyond both meta-learning and continual learning. In our experiments, we wish to transfer a pre-trained image classifier to a new set of classes, in a few shots. We show that our zapping procedure results in improved transfer accuracy and/or more rapid adaptation in both standard fine-tuning and continual learning settings, while being simple to implement and computationally efficient. In many cases, we achieve performance on par with state of the art meta-learning without needing the expensive higher-order gradients, by using a combination of zapping and sequential learning. An intuitive explanation for the effectiveness of this zapping procedure is that representations trained with repeated zapping learn features that are capable of rapidly adapting to newly initialized classifiers. Such an approach may be considered a computationally cheaper type of, or alternative to, meta-learning rapidly adaptable features with higher-order gradients. This adds to recent work on the usefulness of resetting neural network parameters during training, and invites further investigation of this mechanism.
    摘要 本文提出了一种简单的预训练机制,可以让学到的表示在持续学习和迁移学习中表现更好。该机制——反复重置最后一层的权重,我们称之为“zapping”——最初是为一种元持续学习流程设计的,但我们发现它在元学习和持续学习之外的许多场景中同样适用。在实验中,我们希望将预训练的图像分类器在少样本条件下迁移到一组新类别上。结果表明,zapping 在标准微调和持续学习设置下都能提高迁移精度和/或加快适应速度,同时实现简单、计算高效。在许多情况下,将 zapping 与顺序学习结合,我们无需昂贵的高阶梯度即可达到与最先进元学习相当的性能。对这一机制有效性的一个直观解释是:经过反复 zapping 训练得到的表示学到了能够快速适应新初始化分类器的特征。这种方法可以被视为一种计算上更廉价的、用于学习快速适应特征的元学习替代方案。这也呼应了近期关于在训练中重置神经网络参数有用性的研究,并值得进一步探究。
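
A small sketch of the "zapping" mechanism described above: the last-layer weights are periodically re-initialized during pre-training so that earlier layers learn features that adapt quickly to freshly initialized classifiers. The model, data, and zapping interval are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

def zap_last_layer(model: nn.Sequential):
    """Re-initialize the final linear layer in place (the 'zap')."""
    last = model[-1]
    nn.init.kaiming_uniform_(last.weight, a=5 ** 0.5)
    nn.init.zeros_(last.bias)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(1000):
    x = torch.randn(32, 1, 28, 28)            # stand-in batch
    y = torch.randint(0, 10, (32,))
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if (step + 1) % 100 == 0:                  # periodically "zap" the classifier head
        zap_last_layer(model)
```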

CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity

  • paper_url: http://arxiv.org/abs/2310.07969
  • repo_url: None
  • paper_authors: Abdullah Hayajneh, Erchin Serpedin, Mohammad Shaqfeh, Graeme Glass, Mitchell A. Stotland
  • for: This paper aims to address the challenge of training a machine learning system to evaluate facial clefts by generating a large dataset of high-quality, ethics board-approved patient images using a deep learning-based cleft lip generator.
  • methods: The authors use a transfer learning protocol with a deep learning-based generative adversarial network image generator incorporating adaptive data augmentation (ADA) to generate a large dataset of artificial images exhibiting high-fidelity facsimiles of cleft lip with wide variation.
  • results: The authors found that StyleGAN3 with translation invariance (StyleGAN3-t) performed optimally as a base model, and the generated images achieved a low Frechet Inception Distance (FID) reflecting a close similarity to the training input dataset of genuine cleft images. The PPL and DISH measures also showed a smooth and semantically valid interpolation of images through the transfer learning process, and a similar distribution of severity in the training and generated images.
    Abstract A major obstacle when attempting to train a machine learning system to evaluate facial clefts is the scarcity of large datasets of high-quality, ethics board-approved patient images. In response, we have built a deep learning-based cleft lip generator designed to produce an almost unlimited number of artificial images exhibiting high-fidelity facsimiles of cleft lip with wide variation. We undertook a transfer learning protocol testing different versions of StyleGAN-ADA (a generative adversarial network image generator incorporating adaptive data augmentation (ADA)) as the base model. Training images depicting a variety of cleft deformities were pre-processed to adjust for rotation, scaling, color adjustment and background blurring. The ADA modification of the primary algorithm permitted construction of our new generative model while requiring input of a relatively small number of training images. Adversarial training was carried out using 514 unique frontal photographs of cleft-affected faces to adapt a pre-trained model based on 70,000 normal faces. The Frechet Inception Distance (FID) was used to measure the similarity of the newly generated facial images to the cleft training dataset, while Perceptual Path Length (PPL) and the novel Divergence Index of Severity Histograms (DISH) measures were also used to assess the performance of the image generator that we dub CleftGAN. We found that StyleGAN3 with translation invariance (StyleGAN3-t) performed optimally as a base model. Generated images achieved a low FID reflecting a close similarity to our training input dataset of genuine cleft images. Low PPL and DISH measures reflected a smooth and semantically valid interpolation of images through the transfer learning process and a similar distribution of severity in the training and generated images, respectively.
    摘要 在尝试训练机器学习系统来评估唇裂等面部畸形时,一个主要障碍是缺乏大量高质量、经伦理委员会批准的患者图像。为此,我们构建了一个基于深度学习的唇裂图像生成器,可以生成几乎无限数量、具有广泛变化的高保真唇裂人工图像。我们采用转移学习协议,测试了不同版本的 StyleGAN-ADA(一种带有自适应数据增强 ADA 的生成对抗网络图像生成器)作为基础模型。训练图像经过旋转、缩放、颜色调整和背景模糊等预处理,涵盖多种唇裂畸形形态。借助 ADA 改进的主算法,我们只需输入相对较少的训练图像即可构建新的生成模型。我们使用 514 张互不相同的唇裂患者正面照片,对一个基于 70000 张正常面部图像预训练的模型进行对抗训练适配。我们使用 Frechet Inception Distance(FID)、Perceptual Path Length(PPL)以及新提出的 Divergence Index of Severity Histograms(DISH)来评估我们所构建的图像生成器,并将其命名为 CleftGAN。我们发现带平移不变性的 StyleGAN3(StyleGAN3-t)作为基础模型表现最佳。生成图像的 FID 较低,表明其与训练所用真实唇裂图像高度相似;PPL 和 DISH 值也较低,分别表明转移学习过程中图像插值平滑且语义有效,以及生成图像与训练图像在畸形严重程度分布上相近。

cs.AI - 2023-10-12

Examining the Potential and Pitfalls of ChatGPT in Science and Engineering Problem-Solving

  • paper_url: http://arxiv.org/abs/2310.08773
  • repo_url: None
  • paper_authors: Karen D. Wang, Eric Burkholder, Carl Wieman, Shima Salehi, Nick Haber
  • for: 本研究探讨OpenAI的ChatGPT在解决不同类型物理问题的能力。
  • methods: 本研究使用ChatGPT(GPT-4)解决了一共40个大学物理课程中的问题,这些问题包括具有完整数据的准确问题以及缺乏数据的实际问题。
  • results: 研究发现ChatGPT可以成功解决62.5%的准确问题,但对于缺乏数据的问题,准确率只有8.3%。分析模型的错误解决方法发现有三种失败模式:1)建立不准确的物理世界模型,2)缺乏数据的假设,3)计算错误。
    Abstract The study explores the capabilities of OpenAI's ChatGPT in solving different types of physics problems. ChatGPT (with GPT-4) was queried to solve a total of 40 problems from a college-level engineering physics course. These problems ranged from well-specified problems, where all data required for solving the problem was provided, to under-specified, real-world problems where not all necessary data were given. Our findings show that ChatGPT could successfully solve 62.5% of the well-specified problems, but its accuracy drops to 8.3% for under-specified problems. Analysis of the model's incorrect solutions revealed three distinct failure modes: 1) failure to construct accurate models of the physical world, 2) failure to make reasonable assumptions about missing data, and 3) calculation errors. The study offers implications for how to leverage LLM-augmented instructional materials to enhance STEM education. The insights also contribute to the broader discourse on AI's strengths and limitations, serving both educators aiming to leverage the technology and researchers investigating human-AI collaboration frameworks for problem-solving and decision-making.
    摘要 本研究考察了 OpenAI ChatGPT 解决不同类型物理问题的能力。我们使用 ChatGPT(GPT-4)求解了一门大学工程物理课程中的共 40 道题目,这些题目既包括给定全部所需数据的完整定义问题,也包括并未给出全部必要数据的欠定义真实世界问题。结果显示,ChatGPT 能成功解决 62.5% 的完整定义问题,但在欠定义问题上准确率降至 8.3%。对模型错误解答的分析揭示了三种失败模式:1)未能建立准确的物理世界模型;2)未能对缺失数据做出合理假设;3)计算错误。本研究为如何利用大语言模型增强的教学材料来提升 STEM 教育提供了启示,也为关于 AI 优势与局限的更广泛讨论贡献了见解,既服务于希望利用该技术的教育者,也服务于研究人机协作求解与决策框架的研究者。

Stabilizing Subject Transfer in EEG Classification with Divergence Estimation

  • paper_url: http://arxiv.org/abs/2310.08762
  • repo_url: None
  • paper_authors: Niklas Smedemark-Margulies, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons, Yunus Bicer, Deniz Erdogmus
  • for: 这篇论文的目的是提高电enzephalogram(EEG)数据的分类模型性能。
  • methods: 作者使用了新的调整技术来减少分类模型在未见到的测试主题上的性能下降。他们提出了几个图形模型来描述EEG分类任务,并从每个模型中提取了一些关于理想训练enario中的统计关系。他们设计了一些调整 penalty来保持这些关系在实际训练中。
  • results: 作者的提案方法可以对EEG数据进行分类,并且可以增加测试主题上的均衡精度和减少过滤。这些方法在不同的参数下展现出更大的优化效果,并且仅对训练时间进行小量的computational cost。
    Abstract Classification models for electroencephalogram (EEG) data show a large decrease in performance when evaluated on unseen test sub jects. We reduce this performance decrease using new regularization techniques during model training. We propose several graphical models to describe an EEG classification task. From each model, we identify statistical relationships that should hold true in an idealized training scenario (with infinite data and a globally-optimal model) but that may not hold in practice. We design regularization penalties to enforce these relationships in two stages. First, we identify suitable proxy quantities (divergences such as Mutual Information and Wasserstein-1) that can be used to measure statistical independence and dependence relationships. Second, we provide algorithms to efficiently estimate these quantities during training using secondary neural network models. We conduct extensive computational experiments using a large benchmark EEG dataset, comparing our proposed techniques with a baseline method that uses an adversarial classifier. We find our proposed methods significantly increase balanced accuracy on test subjects and decrease overfitting. The proposed methods exhibit a larger benefit over a greater range of hyperparameters than the baseline method, with only a small computational cost at training time. These benefits are largest when used for a fixed training period, though there is still a significant benefit for a subset of hyperparameters when our techniques are used in conjunction with early stopping regularization.
    摘要 脑电图(EEG)分类模型在未见过的测试受试者上评估时性能会大幅下降。我们在模型训练中引入新的正则化技术来减少这种性能下降。我们提出了若干描述 EEG 分类任务的图模型,并从每个模型中找出在理想训练情形下(数据无限且模型全局最优)应当成立、但在实践中未必成立的统计关系。我们分两步设计正则化惩罚来强制这些关系:首先,选择合适的代理量(如互信息和 Wasserstein-1 等散度)来度量统计独立与依赖关系;其次,给出在训练中借助辅助神经网络模型高效估计这些量的算法。我们在一个大型 EEG 基准数据集上进行了大量计算实验,将所提技术与使用对抗分类器的基线方法进行比较。结果表明,所提方法显著提高了测试受试者上的平衡准确率并减少了过拟合;与基线方法相比,它在更大的超参数范围内都表现出更大的收益,且训练时仅带来少量额外计算开销。这些收益在固定训练时长下最为明显,而当与早停正则化联合使用时,在部分超参数下仍有显著收益。

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

  • paper_url: http://arxiv.org/abs/2310.08753
  • repo_url: None
  • paper_authors: Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Evuru, S. Ramaneswaran, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
  • for: The paper is written to explore the ability of audio-language models (ALMs) to perform compositional reasoning, and to propose a new benchmark (CompA) to evaluate this ability.
  • methods: The paper uses a contrastive approach (e.g., CLAP) to train the ALMs, and proposes a novel learning method to improve the model’s compositional reasoning abilities. The method includes improvements to contrastive training with composition-aware hard negatives, and a novel modular contrastive loss.
  • results: The paper shows that current ALMs perform only marginally better than random chance on the CompA benchmark, and proposes a new model (CompA-CLAP) that significantly improves over all baseline models on the benchmark. The results indicate that the proposed method has superior compositional reasoning capabilities.
  • for: 这篇论文是为了探究音频语言模型(ALM)的组合推理(compositional reasoning)能力而写的,并提出了一个新的基准(CompA)来评估这种能力。
  • methods: 这篇论文使用对比方法(例如 CLAP)训练 ALM,并提出了一种新的学习方法来提高模型的组合推理能力,包括在对比训练中引入组合感知的难负样本,以及一种新的模块化对比损失。
  • results: 这篇论文显示现有的 ALM 在 CompA 基准上仅略优于随机水平,并提出了一种新的模型(CompA-CLAP)来解决这个问题。该模型在 CompA 基准上表现出明显的改进,表明其组合推理能力更强。
    Abstract A fundamental characteristic of audio is its compositional nature. Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between audio and language modalities have improved performance in many downstream applications, including zero-shot audio classification, audio retrieval, etc. However, the ability of these models to effectively perform compositional reasoning remains largely unexplored and necessitates additional research. In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs. Our proposed CompA-order evaluates how well an ALM understands the order or occurrence of acoustic events in audio, and CompA-attribute evaluates attribute binding of acoustic events. An instance from either benchmark consists of two audio-caption pairs, where both audios have the same acoustic events but with different compositions. An ALM is evaluated on how well it matches the right audio to the right caption. Using this benchmark, we first show that current ALMs perform only marginally better than random chance, thereby struggling with compositional reasoning. Next, we propose CompA-CLAP, where we fine-tune CLAP using a novel learning method to improve its compositional reasoning abilities. To train CompA-CLAP, we first propose improvements to contrastive training with composition-aware hard negatives, allowing for more focused training. Next, we propose a novel modular contrastive loss that helps the model learn fine-grained compositional understanding and overcomes the acute scarcity of openly available compositional audios. CompA-CLAP significantly improves over all our baseline models on the CompA benchmark, indicating its superior compositional reasoning capabilities.
    摘要 音频的基本特点之一是其 Compositional nature。使用对比方法(例如CLAP)训练的音频语言模型(ALM)在许多下游应用程序中表现得更好,包括零shot音频分类、音频检索等。然而,这些模型对于实际进行compositional reasoning的能力尚未得到足够的探索,需要进一步的研究。在这篇论文中,我们提出了CompA,一个由专家标注的benchmark集合,用于评估ALM的compositional reasoning能力。我们的CompA-order评估了ALM是否能够正确地理解音频中的事件顺序或发生频度,而CompA-attribute评估了事件绑定的能力。每个benchmark实例都包括两对音频-标签对,其中两个音频具有相同的听觉事件,但具有不同的组合。ALM被评估是否能够匹配正确的音频和标签。使用这个benchmark,我们首先发现现有ALM的表现只是marginally better than random chance,因此它们在compositional reasoning方面几乎没有表现出来。然后,我们提出了CompA-CLAP,其中我们使用一种新的学习方法来改进CLAP的compositional reasoning能力。为了训练CompA-CLAP,我们首先提出了对比训练中的组合感知强制对手,以便更加专注的训练。然后,我们提出了一种新的模块化对比损失,帮助模型学习细致的compositional理解,并且解决了公开available的compositional音频的缺乏问题。CompA-CLAP在CompA benchmark上显著超越了所有基线模型, indicating its superior compositional reasoning capabilities.

Constrained Bayesian Optimization with Adaptive Active Learning of Unknown Constraints

  • paper_url: http://arxiv.org/abs/2310.08751
  • repo_url: None
  • paper_authors: Fengxue Zhang, Zejie Zhu, Yuxin Chen
  • for: 这 paper 是关于 constrained Bayesian optimization (CBO) 的研究,用于处理具有黑盒函数目标和约束的复杂应用场景。
  • methods: 该 paper 提出了一种基于 ROI 的 CBO 框架,利用了目标和约束可以帮助确定高可信区域的想法。
  • results: 该论文提供了一种具有理论保证的 CBO 框架,并通过实验证明了其效率和稳健性。
  • for: This paper is about research on constrained Bayesian optimization (CBO) for handling complex application scenarios with black-box objective and constraint functions.
  • methods: The paper proposes an CBO framework based on the idea of identifying high-confidence regions of interest (ROI) using both the objective and constraint functions.
  • results: The paper provides a theoretically grounded CBO framework and demonstrates its efficiency and robustness through empirical evidence.
    Abstract Optimizing objectives under constraints, where both the objectives and constraints are black box functions, is a common scenario in real-world applications such as scientific experimental design, design of medical therapies, and industrial process optimization. One popular approach to handling these complex scenarios is Bayesian Optimization (BO). In terms of theoretical behavior, BO is relatively well understood in the unconstrained setting, where its principles have been well explored and validated. However, when it comes to constrained Bayesian optimization (CBO), the existing framework often relies on heuristics or approximations without the same level of theoretical guarantees. In this paper, we delve into the theoretical and practical aspects of constrained Bayesian optimization, where the objective and constraints can be independently evaluated and are subject to noise. By recognizing that both the objective and constraints can help identify high-confidence regions of interest (ROI), we propose an efficient CBO framework that intersects the ROIs identified from each aspect to determine the general ROI. The ROI, coupled with a novel acquisition function that adaptively balances the optimization of the objective and the identification of feasible regions, enables us to derive rigorous theoretical justifications for its performance. We showcase the efficiency and robustness of our proposed CBO framework through empirical evidence and discuss the fundamental challenge of deriving practical regret bounds for CBO algorithms.
    摘要 In this paper, we delve into the theoretical and practical aspects of constrained Bayesian optimization, where the objective and constraints can be independently evaluated and are subject to noise. By recognizing that both the objective and constraints can help identify high-confidence regions of interest (ROI), we propose an efficient CBO framework that intersects the ROIs identified from each aspect to determine the general ROI. The ROI, coupled with a novel acquisition function that adaptively balances the optimization of the objective and the identification of feasible regions, enables us to derive rigorous theoretical justifications for its performance. We showcase the efficiency and robustness of our proposed CBO framework through empirical evidence and discuss the fundamental challenge of deriving practical regret bounds for CBO algorithms.

Development and Validation of a Deep Learning-Based Microsatellite Instability Predictor from Prostate Cancer Whole-Slide Images

  • paper_url: http://arxiv.org/abs/2310.08743
  • repo_url: None
  • paper_authors: Qiyuan Hu, Abbas A. Rizvi, Geoffery Schau, Kshitij Ingale, Yoni Muller, Rachel Baits, Sebastian Pretzer, Aïcha BenTaieb, Abigail Gordhamer, Roberto Nussenzveig, Adam Cole, Matthew O. Leavitt, Rohan P. Joshi, Nike Beaubier, Martin C. Stumpe, Kunal Nagpal
  • for: 这项研究的目的是开发一个基于深度学习、以苏木精-伊红(H&E)染色全切片图像为输入的微卫星不稳定(MSI)预测模型,用于筛选最可能从确证检测和免疫检查点抑制剂治疗中获益的前列腺癌患者。
  • methods: 这项研究使用基于注意力的多示例学习(Multiple Instance Learning,MIL)模型,并利用 4015 名前列腺癌患者的活检与手术切除标本进行算法开发,另有 173 名患者的配对验证集(每个样本制作两张连续切片,一张在本机构内染色扫描,另一张在外部机构完成)。
  • results: 该 AI 模型可以从 H&E 全切片图像预测前列腺癌患者是否为高度微卫星不稳定(MSI-H),在内部制备、外部制备和时间独立三个验证集上的 AUROC 分别约为 0.78、0.72 和 0.72;尽管 MSI-H 状态与 Gleason 评分显著相关,该模型在各 Gleason 评分亚组内仍保持预测能力。
    Abstract Microsatellite instability-high (MSI-H) is a tumor agnostic biomarker for immune checkpoint inhibitor therapy. However, MSI status is not routinely tested in prostate cancer, in part due to low prevalence and assay cost. As such, prediction of MSI status from hematoxylin and eosin (H&E) stained whole-slide images (WSIs) could identify prostate cancer patients most likely to benefit from confirmatory testing and becoming eligible for immunotherapy. Prostate biopsies and surgical resections from de-identified records of consecutive prostate cancer patients referred to our institution were analyzed. Their MSI status was determined by next generation sequencing. Patients before a cutoff date were split into an algorithm development set (n=4015, MSI-H 1.8%) and a paired validation set (n=173, MSI-H 19.7%) that consisted of two serial sections from each sample, one stained and scanned internally and the other at an external site. Patients after the cutoff date formed the temporal validation set (n=1350, MSI-H 2.3%). Attention-based multiple instance learning models were trained to predict MSI-H from H&E WSIs. The MSI-H predictor achieved area under the receiver operating characteristic curve values of 0.78 (95% CI [0.69-0.86]), 0.72 (95% CI [0.63-0.81]), and 0.72 (95% CI [0.62-0.82]) on the internally prepared, externally prepared, and temporal validation sets, respectively. While MSI-H status is significantly correlated with Gleason score, the model remained predictive within each Gleason score subgroup. In summary, we developed and validated an AI-based MSI-H diagnostic model on a large real-world cohort of routine H&E slides, which effectively generalized to externally stained and scanned samples and a temporally independent validation cohort. This algorithm has the potential to direct prostate cancer patients toward immunotherapy and to identify MSI-H cases secondary to Lynch syndrome.
    摘要 高度微卫星不稳定(MSI-H)是一种与肿瘤类型无关的生物标志物,可用于指导免疫检查点抑制剂治疗。然而,由于患病率低且检测成本较高,MSI 状态在前列腺癌中并不会被常规检测。因此,若能从苏木精-伊红(H&E)染色的全切片图像(WSI)预测 MSI 状态,就可以筛选出最可能从确证检测中获益、进而符合免疫治疗条件的前列腺癌患者。我们分析了转诊至本机构的连续前列腺癌患者的去标识化活检与手术切除标本,其 MSI 状态由二代测序确定。截止日期之前的患者被划分为算法开发集(n=4015,MSI-H 1.8%)和配对验证集(n=173,MSI-H 19.7%),后者每个样本包含两张连续切片,一张在本机构内染色扫描,另一张在外部机构完成;截止日期之后的患者构成时间验证集(n=1350,MSI-H 2.3%)。我们训练了基于注意力的多示例学习模型,从 H&E WSI 预测 MSI-H。该预测模型在内部制备、外部制备和时间验证集上的受试者工作特征曲线下面积分别为 0.78(95% CI [0.69-0.86])、0.72(95% CI [0.63-0.81])和 0.72(95% CI [0.62-0.82])。尽管 MSI-H 状态与 Gleason 评分显著相关,但模型在每个 Gleason 评分亚组内仍保持预测能力。综上,我们在一个大型真实世界常规 H&E 切片队列上开发并验证了基于 AI 的 MSI-H 诊断模型,它能够有效泛化到外部染色扫描的样本和时间独立的验证队列。该算法有望引导前列腺癌患者接受免疫治疗,并识别继发于 Lynch 综合征的 MSI-H 病例。
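
A generic attention-based multiple-instance-learning head of the kind described above for slide-level prediction: tile embeddings from one whole-slide image are pooled with learned attention weights and then classified. The feature dimensions and the upstream tile encoder are assumptions, not the authors' exact model.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=512, attn_dim=128):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, attn_dim), nn.Tanh(),
                                  nn.Linear(attn_dim, 1))
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, tile_feats):
        # tile_feats: (num_tiles, feat_dim) embeddings of one whole-slide image
        weights = torch.softmax(self.attn(tile_feats), dim=0)   # (num_tiles, 1)
        slide_feat = (weights * tile_feats).sum(dim=0)          # attention pooling
        return torch.sigmoid(self.classifier(slide_feat)), weights

model = AttentionMIL()
prob, attn = model(torch.randn(2000, 512))   # 2000 tiles -> one slide-level probability
```

The attention weights double as a crude interpretability signal, since they indicate which tiles drove the slide-level prediction.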

Real-Time Event Detection with Random Forests and Temporal Convolutional Networks for More Sustainable Petroleum Industry

  • paper_url: http://arxiv.org/abs/2310.08737
  • repo_url: None
  • paper_authors: Yuanwei Qu, Baifan Zhou, Arild Waaler, David Cameron
  • for: 本研究旨在提供更有效的生产过程中不愿意事件探测方法,以避免环境和经济损害。
  • methods: 本研究使用机器学习方法,包括Random Forest和时间卷积网络,实时探测不愿意事件。
  • results: 研究结果表明,我们的方法可以有效地类型化事件并预测事件出现概率,从而解决过去研究中存在的挑战,并为生产过程中的事件管理提供更有效的解决方案。
    Abstract The petroleum industry is crucial for modern society, but the production process is complex and risky. During the production, accidents or failures, resulting from undesired production events, can cause severe environmental and economic damage. Previous studies have investigated machine learning (ML) methods for undesired event detection. However, the prediction of event probability in real-time was insufficiently addressed, which is essential since it is important to undertake early intervention when an event is expected to happen. This paper proposes two ML approaches, random forests and temporal convolutional networks, to detect undesired events in real-time. Results show that our approaches can effectively classify event types and predict the probability of their appearance, addressing the challenges uncovered in previous studies and providing a more effective solution for failure event management during the production.
    摘要 现代社会中,石油工业具有重要的地位,但生产过程具有复杂和危险的特点。生产过程中的意外或失败可能会导致严重的环境和经济损害。先前的研究已经调查了机器学习(ML)方法用于不愿意事件检测。然而,实时预测事件概率的问题尚未得到充分解决,这是因为在事件预计将发生时,早期干预是非常重要的。本文提出了两种ML方法,随机森林和时间卷积网络,用于实时检测不愿意事件。结果表明,我们的方法可以有效地分类事件类型并预测事件出现的概率,解决先前研究中存在的挑战,并为生产过程中的失败事件管理提供更有效的解决方案。

A Simple Way to Incorporate Novelty Detection in World Models

  • paper_url: http://arxiv.org/abs/2310.08731
  • repo_url: None
  • paper_authors: Geigh Zollicoffer, Kenneth Eaton, Jonathan Balloch, Julia Kim, Mark O. Riedl, Robert Wright
  • for: 保护RL Agent在突然改变世界机制或属性时的性能和可靠性。
  • methods: 利用生成的世界模型框架中的假象状态与真实观察状态的偏差来检测新鲜事物。
  • results: 在一个新环境中,比传统机器学习新鲜事物检测方法和当前RL关注的新鲜事物检测算法更有优势。
    Abstract Reinforcement learning (RL) using world models has found significant recent successes. However, when a sudden change to world mechanics or properties occurs then agent performance and reliability can dramatically decline. We refer to the sudden change in visual properties or state transitions as {\em novelties}. Implementing novelty detection within generated world model frameworks is a crucial task for protecting the agent when deployed. In this paper, we propose straightforward bounding approaches to incorporate novelty detection into world model RL agents, by utilizing the misalignment of the world model's hallucinated states and the true observed states as an anomaly score. We first provide an ontology of novelty detection relevant to sequential decision making, then we provide effective approaches to detecting novelties in a distribution of transitions learned by an agent in a world model. Finally, we show the advantage of our work in a novel environment compared to traditional machine learning novelty detection methods as well as currently accepted RL focused novelty detection algorithms.
    摘要 基于世界模型的强化学习(RL)近来取得了显著成功。然而,当世界的机制或属性突然发生变化时,智能体的性能和可靠性可能急剧下降。我们将视觉属性或状态转移的这类突然变化称为“新奇”(novelties)。在生成式世界模型框架中实现新奇检测,是保护已部署智能体的关键任务。在这篇论文中,我们提出了简单的界定方法,利用世界模型“幻想”出的状态与真实观测状态之间的偏差作为异常分数来检测新奇。我们首先给出了面向顺序决策的新奇检测本体,然后提供了在智能体于世界模型中学到的转移分布上有效检测新奇的方法。最后,我们在一个新颖环境中展示了我们的方法相对于传统机器学习新奇检测方法以及当前被广泛采用的面向 RL 的新奇检测算法的优势。
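
A sketch of the bounding idea above: score each transition by how far the world model's predicted ("hallucinated") next state is from the true observed state, and flag a novelty when that score exceeds a bound calibrated on nominal data. The `world_model.predict` interface and the quantile threshold are assumptions, not the paper's exact formulation.

```python
import torch

def novelty_score(world_model, obs, action, next_obs):
    """Per-sample misalignment between the hallucinated and the observed next state."""
    with torch.no_grad():
        pred_next = world_model.predict(obs, action)   # hypothetical interface
    return torch.norm(pred_next - next_obs, dim=-1)

def calibrate_threshold(scores_on_nominal_data, quantile=0.99):
    """Pick a bound so that roughly 1% of nominal transitions would be flagged."""
    return torch.quantile(scores_on_nominal_data, quantile)

def is_novel(score, threshold):
    return score > threshold
```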

Transformer Choice Net: A Transformer Neural Network for Choice Prediction

  • paper_url: http://arxiv.org/abs/2310.08716
  • repo_url: None
  • paper_authors: Hanzhao Wang, Xiaocheng Li, Kalyan Talluri
  • for: 这篇论文旨在提出一种能够预测客户选择多个 item 的Transformer neural network architecture,即 Transformer Choice Net。
  • methods: 该论文使用 transformer 网络,考虑客户和物品特征以及上下文(如购物礼品和客户之前选择)来预测客户选择。
  • results: 在多个 benchmark 数据集上,该 Architecture 表现出比 Literature 中主流模型更好的out-of-sample 预测性能,无需特定模型定制或调整。
    Abstract Discrete-choice models, such as Multinomial Logit, Probit, or Mixed-Logit, are widely used in Marketing, Economics, and Operations Research: given a set of alternatives, the customer is modeled as choosing one of the alternatives to maximize a (latent) utility function. However, extending such models to situations where the customer chooses more than one item (such as in e-commerce shopping) has proven problematic. While one can construct reasonable models of the customer's behavior, estimating such models becomes very challenging because of the combinatorial explosion in the number of possible subsets of items. In this paper we develop a transformer neural network architecture, the Transformer Choice Net, that is suitable for predicting multiple choices. Transformer networks turn out to be especially suitable for this task as they take into account not only the features of the customer and the items but also the context, which in this case could be the assortment as well as the customer's past choices. On a range of benchmark datasets, our architecture shows uniformly superior out-of-sample prediction performance compared to the leading models in the literature, without requiring any custom modeling or tuning for each instance.
    摘要 离散选择模型(如多项 Logit、Probit 或混合 Logit)在市场营销、经济学和运筹学中被广泛应用:给定一组备选项,模型假设顾客会选择其中一个以最大化(潜在的)效用函数。然而,把这类模型扩展到顾客会选择多件商品的情形(例如电商购物)一直比较困难:尽管可以构建合理的顾客行为模型,但由于商品子集数量的组合爆炸,这些模型的估计变得非常困难。在这篇文章中,我们提出了一种适合预测多项选择的 Transformer 神经网络架构,称为 Transformer Choice Net。Transformer 网络特别适合这项任务,因为它不仅考虑顾客与商品的特征,还考虑上下文信息,例如商品组合以及顾客的历史选择。在一系列基准数据集上,我们的架构无需针对每个场景进行定制建模或调参,就表现出一致优于文献中主流模型的样本外预测性能。
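
A simplified sketch (not the paper's exact architecture) of a transformer that scores every offered item given item features, customer features, and context, allowing more than one item to be chosen. All dimensions, the single context token, and the sigmoid output head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChoiceTransformer(nn.Module):
    def __init__(self, item_dim=16, cust_dim=8, d_model=64, heads=4, layers=2):
        super().__init__()
        self.item_proj = nn.Linear(item_dim, d_model)
        self.ctx_proj = nn.Linear(cust_dim, d_model)       # customer / context token
        enc_layer = nn.TransformerEncoderLayer(d_model, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.score = nn.Linear(d_model, 1)

    def forward(self, item_feats, cust_feats):
        # item_feats: (batch, num_items, item_dim); cust_feats: (batch, cust_dim)
        ctx = self.ctx_proj(cust_feats).unsqueeze(1)                   # (batch, 1, d)
        tokens = torch.cat([ctx, self.item_proj(item_feats)], dim=1)  # context + items
        enc = self.encoder(tokens)[:, 1:]                              # drop context token
        return torch.sigmoid(self.score(enc)).squeeze(-1)             # per-item choice prob

model = ChoiceTransformer()
probs = model(torch.randn(4, 10, 16), torch.randn(4, 8))   # (4, 10) choice probabilities
```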

Toward Joint Language Modeling for Speech Units and Text

  • paper_url: http://arxiv.org/abs/2310.08715
  • repo_url: None
  • paper_authors: Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli
  • for: 本研究旨在模型 speech 和 text 之间的共同表达。
  • methods: 我们使用不同的 speech tokenizer 将连续的 speech 信号转换成 discrete 单元,并使用不同的方法构建混合 speech-text 数据。我们还引入自动评价指标,以评估模型是否能够共同学习 speech 和 text。
  • results: 我们的结果表明,通过我们的混合技术,混合 speech 单元和 text,joint LM 可以在 SLU 任务上超过 speech-only 基线,并且在不同的模式(speech 或 text)下进行 Zero-shot 跨模态传递。
    Abstract Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers to transform continuous speech signals into discrete units and use different methods to construct mixed speech-text data. We introduce automatic metrics to evaluate how well the joint LM mixes speech and text. We also fine-tune the LM on downstream spoken language understanding (SLU) tasks with different modalities (speech or text) and test its performance to assess the model's learning of shared representations. Our results show that by mixing speech units and text with our proposed mixing techniques, the joint LM improves over a speech-only baseline on SLU tasks and shows zero-shot cross-modal transferability.
    摘要 文本和语音是人类语言的两大形式。研究者们在映射语音到文本或反之方面努力了很多年。然而,在语言模型化领域,很少努力用于同时模型语音和文本。为了解决这个问题,我们在语音单元和文本之间进行同时语言模型化。我们比较不同的语音切分器将连续的语音信号转换成分解单元,并使用不同的方法构建混合语音-文本数据。我们引入自动评估 metric来评估混合LM如何混合语音和文本。此外,我们在不同Modalitites(语音或文本)下进行了精细调整,并测试模型在下游语言理解任务上的性能,以评估模型是否学习了共享表示。我们的结果表明,通过我们提议的混合技术,混合语音单元和文本的混合LM在SLU任务上超过了基准点的语音Only模型,并表现出零 shot cross-modal可转移性。

ELDEN: Exploration via Local Dependencies

  • paper_url: http://arxiv.org/abs/2310.08702
  • repo_url: None
  • paper_authors: Jiaheng Hu, Zizhao Wang, Peter Stone, Roberto Martin-Martin
  • for: 这篇论文是为了解决复杂的任务和奖励不够的问题,提出了一种新的自适应奖励方法。
  • methods: 该方法基于当前环境中实体之间的异常依赖关系,通过计算部分导数来准确地捕捉实体之间的依赖关系,并使用这些依赖关系来鼓励探索新的交互方式。
  • results: 在四个不同的领域中,ELDEN方法在许多复杂的任务上表现出色,比前一个状态的探索方法更加成功,并且能够准确地捕捉实体之间的依赖关系。
    Abstract Tasks with large state space and sparse rewards present a longstanding challenge to reinforcement learning. In these tasks, an agent needs to explore the state space efficiently until it finds a reward. To deal with this problem, the community has proposed to augment the reward function with intrinsic reward, a bonus signal that encourages the agent to visit interesting states. In this work, we propose a new way of defining interesting states for environments with factored state spaces and complex chained dependencies, where an agent's actions may change the value of one entity that, in order, may affect the value of another entity. Our insight is that, in these environments, interesting states for exploration are states where the agent is uncertain whether (as opposed to how) entities such as the agent or objects have some influence on each other. We present ELDEN, Exploration via Local DepENdencies, a novel intrinsic reward that encourages the discovery of new interactions between entities. ELDEN utilizes a novel scheme -- the partial derivative of the learned dynamics to model the local dependencies between entities accurately and computationally efficiently. The uncertainty of the predicted dependencies is then used as an intrinsic reward to encourage exploration toward new interactions. We evaluate the performance of ELDEN on four different domains with complex dependencies, ranging from 2D grid worlds to 3D robotic tasks. In all domains, ELDEN correctly identifies local dependencies and learns successful policies, significantly outperforming previous state-of-the-art exploration methods.
    摘要 状态空间庞大且奖励稀疏的任务一直是强化学习的难题:智能体需要高效地探索状态空间,直到找到奖励。为此,研究社区提出在奖励函数中加入内在奖励,即鼓励智能体访问“有趣”状态的附加信号。本文针对具有因子化状态空间和复杂链式依赖的环境(智能体的动作可能改变某个实体的取值,而该实体又会进一步影响其他实体),提出了一种定义“有趣”状态的新方式:值得探索的状态,是智能体不确定实体之间(例如智能体或物体)是否存在相互影响的状态,而非影响方式如何。我们提出了 ELDEN(Exploration via Local DepENdencies),一种鼓励发现实体间新交互的内在奖励。ELDEN 利用所学动力学模型的偏导数来准确且高效地刻画实体之间的局部依赖,并以预测依赖关系的不确定性作为内在奖励,引导智能体探索新的交互。我们在从 2D 网格世界到 3D 机器人任务等四个具有复杂依赖关系的领域中评估了 ELDEN 的性能。在所有领域中,ELDEN 都能正确识别局部依赖并学到成功的策略,显著优于此前最先进的探索方法。
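
A rough sketch of the ELDEN idea described above: local dependencies between entities are estimated as partial derivatives of a learned dynamics model, and disagreement among an ensemble about those dependencies serves as the intrinsic reward. The ensemble size, threshold, and dynamics networks are all assumptions.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 6, 2
ensemble = [nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                          nn.Linear(64, state_dim)) for _ in range(4)]

def local_dependencies(model, state, action, thresh=0.1):
    """Binary matrix D[j, i] = 1 if next-state entity j locally depends on entity i."""
    def next_state(s):
        return model(torch.cat([s, action], dim=-1))
    jac = torch.autograd.functional.jacobian(next_state, state)   # (state_dim, state_dim)
    return (jac.abs() > thresh).float()

def intrinsic_reward(state, action):
    """Reward uncertainty: how much ensemble members disagree about the dependencies."""
    deps = torch.stack([local_dependencies(m, state, action) for m in ensemble])
    mean = deps.mean(dim=0)
    per_entry_var = mean * (1 - mean)          # Bernoulli variance of each dependency entry
    return per_entry_var.mean()

r = intrinsic_reward(torch.randn(state_dim), torch.randn(action_dim))
```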

Virtual Augmented Reality for Atari Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.08683
  • repo_url: https://github.com/c-a-schiller/var4arl
  • paper_authors: Christian A. Schiller
  • for: 研究是使RL代理人在Atari游戏中表现更好的途径,以及是否可以通过现有的图像分割模型提高RL代理人的游戏表现。
  • methods: 使用现有的图像分割模型(如Segment Anything Model)对RL代理人的游戏环境进行修饰,以提高其游戏表现。
  • results: 研究发现,对 RL 代理的游戏画面进行这种“增强”可以在特定条件下提高其游戏表现;通过对比原始像素输入与增强像素输入下的代理表现,可以进一步了解这些条件。
    Abstract Reinforcement Learning (RL) has achieved significant milestones in the gaming domain, most notably Google DeepMind's AlphaGo defeating human Go champion Ken Jie. This victory was also made possible through the Atari Learning Environment (ALE): The ALE has been foundational in RL research, facilitating significant RL algorithm developments such as AlphaGo and others. In current Atari video game RL research, RL agents' perceptions of its environment is based on raw pixel data from the Atari video game screen with minimal image preprocessing. Contrarily, cutting-edge ML research, external to the Atari video game RL research domain, is focusing on enhancing image perception. A notable example is Meta Research's "Segment Anything Model" (SAM), a foundation model capable of segmenting images without prior training (zero-shot). This paper addresses a novel methodical question: Can state-of-the-art image segmentation models such as SAM improve the performance of RL agents playing Atari video games? The results suggest that SAM can serve as a "virtual augmented reality" for the RL agent, boosting its Atari video game playing performance under certain conditions. Comparing RL agent performance results from raw and augmented pixel inputs provides insight into these conditions. Although this paper was limited by computational constraints, the findings show improved RL agent performance for augmented pixel inputs and can inform broader research agendas in the domain of "virtual augmented reality for video game playing RL agents".
    摘要 reinforcement learning (RL) 在游戏领域取得了重要的成就,最 Notable example 是 Google DeepMind 的 AlphaGo 击败人类Go冠军 Ken Jie。这胜利也得到了 ALE 的支持:ALE 是RL研究中基础的平台,促进了一系列RL算法的发展,如 AlphaGo 等。现在的 Atari 游戏 RL 研究中,RL 代理的环境感知基于 raw pixel 数据从 Atari 游戏屏幕, minimal 图像预处理。然而,当前的 ML 研究,外部于 Atari 游戏 RL 研究领域,正在强调图像感知的提高。一个 notable example 是 Meta Research 的 "Segment Anything Model" (SAM),这是一个无需先期训练的基本模型,可以 segmenting 图像。本文提出了一个新的问题:可以使用 state-of-the-art 图像 segmentation 模型来提高 Atari 游戏 RL 代理的性能吗?结果表明,SAM 可以作为 RL 代理的 "虚拟增强 reality",在某些条件下提高其 Atari 游戏性能。通过比较 raw 和增强 pixel 输入的 RL 代理性能结果,可以了解这些条件。虽然这篇文章受限于计算力,但结果表明在某些情况下,使用 state-of-the-art 图像 segmentation 模型可以提高 RL 代理的性能,这些结果可以推导到更广泛的 "虚拟增强 reality для video game 游戏 RL 代理" 的研究论题。

Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

  • paper_url: http://arxiv.org/abs/2310.08678
  • repo_url: None
  • paper_authors: Ethan Callanan, Amarachi Mbakwe, Antony Papadimitriou, Yulong Pei, Mathieu Sibue, Xiaodan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah
  • For: This study aims to assess the financial reasoning capabilities of Large Language Models (LLMs) using mock exam questions from the Chartered Financial Analyst (CFA) Program.* Methods: The study uses ChatGPT and GPT-4 in financial analysis, considering Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios.* Results: The study presents an in-depth analysis of the models’ performance and limitations, and estimates whether they would have a chance at passing the CFA exams. Additionally, it outlines insights into potential strategies and improvements to enhance the applicability of LLMs in finance.
    Abstract Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study aims at assessing the financial reasoning capabilities of LLMs. We leverage mock exam questions of the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4 in financial analysis, considering Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios. We present an in-depth analysis of the models' performance and limitations, and estimate whether they would have a chance at passing the CFA exams. Finally, we outline insights into potential strategies and improvements to enhance the applicability of LLMs in finance. In this perspective, we hope this work paves the way for future studies to continue enhancing LLMs for financial reasoning through rigorous evaluation.
    摘要 大型语言模型(LLM)已经在各种自然语言处理任务上展现出出色的表现,常常与最先进的任务特定模型持平甚至更优。本研究旨在评估 LLM 的金融推理能力。我们利用特许金融分析师(CFA)项目的模拟考题,对 ChatGPT 和 GPT-4 在金融分析中的表现进行全面评估,涵盖零样本(ZS)、思维链(CoT)和少样本(FS)三种场景。我们对模型的表现和局限进行了深入分析,并估计它们是否有机会通过 CFA 考试。最后,我们总结了可能的策略与改进方向,以提高 LLM 在金融领域的适用性。希望本研究能为后续通过严谨评估持续提升 LLM 金融推理能力的工作奠定基础。

GDL-DS: A Benchmark for Geometric Deep Learning under Distribution Shifts

  • paper_url: http://arxiv.org/abs/2310.08677
  • repo_url: https://github.com/graph-com/gdl_ds
  • paper_authors: Deyu Zou, Shikun Liu, Siqi Miao, Victor Fung, Shiyu Chang, Pan Li
  • for: 本研究旨在评估深度学习模型在数据分布变化的情况下的性能。
  • methods: 本研究使用的方法包括提出了一个全面的benchmark,用于评估深度学习模型在不同的数据分布变化情况下的性能。
  • results: 研究结果显示,在30个不同的实验设置中,3种深度学习基础模型和11种学习算法在不同的数据分布变化情况下的性能有所差异。
    Abstract Geometric deep learning (GDL) has gained significant attention in various scientific fields, chiefly for its proficiency in modeling data with intricate geometric structures. Yet, very few works have delved into its capability of tackling the distribution shift problem, a prevalent challenge in many relevant applications. To bridge this gap, we propose GDL-DS, a comprehensive benchmark designed for evaluating the performance of GDL models in scenarios with distribution shifts. Our evaluation datasets cover diverse scientific domains from particle physics and materials science to biochemistry, and encapsulate a broad spectrum of distribution shifts including conditional, covariate, and concept shifts. Furthermore, we study three levels of information access from the out-of-distribution (OOD) testing data, including no OOD information, only OOD features without labels, and OOD features with a few labels. Overall, our benchmark results in 30 different experiment settings, and evaluates 3 GDL backbones and 11 learning algorithms in each setting. A thorough analysis of the evaluation results is provided, poised to illuminate insights for DGL researchers and domain practitioners who are to use DGL in their applications.
    摘要 几何深度学习(GDL)因其擅长建模具有复杂几何结构的数据而受到多个科学领域的关注。然而,很少有工作探讨 GDL 模型在分布偏移(distribution shift)情况下的能力,而这正是许多相关应用中普遍存在的挑战。为了填补这一空白,我们提出了 GDL-DS,一个用于评估 GDL 模型在分布偏移场景下性能的综合基准。我们的评估数据集覆盖了从粒子物理、材料科学到生物化学的多个科学领域,并涵盖条件偏移、协变量偏移和概念偏移等多种分布偏移。此外,我们研究了从分布外(OOD)测试数据中可获取信息的三种水平:没有任何 OOD 信息、只有 OOD 特征而无标签,以及 OOD 特征加少量标签。总的来说,该基准包含 30 种不同的实验设定,并在每种设定下评估了 3 个 GDL 骨干和 11 种学习算法。我们对评估结果进行了详细分析,以期为 GDL 研究者和准备在应用中使用 GDL 的领域实践者提供启发。

Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach

  • paper_url: http://arxiv.org/abs/2310.08660
  • repo_url: https://github.com/heasung-kim/safe-rl-deployment-for-5g
  • paper_authors: Heasung Kim, Sravan Ankireddy
  • For: The paper is written for optimizing network parameters for rate maximization in 5G communication systems.* Methods: The paper proposes using deep reinforcement learning (RL) techniques, specifically discrete batch constrained deep Q-learning (BCQ), to solve the non-convex optimization problem of power control, beam forming, and interference cancellation.* Results: The paper shows that the proposed BCQ algorithm can achieve performance similar to deep Q-network (DQN) based control with only a fraction of the data and without the need for exploration, resulting in maximized sample efficiency and minimized risk in the deployment of a new algorithm to commercial networks.Here are the three key information points in Simplified Chinese text:* For: 本文是为了优化5G通信系统中的网络参数以实现速率最大化。* Methods: 本文提议使用深度学习 Reinforcement Learning (RL) 技术,特别是粗粒度约束的深度 Q-学习 (BCQ),解决非对称优化问题。* Results: 本文显示,提议的 BCQ 算法可以与 DQN 基于控制 дости到类似性,只需要一小部分数据和不需要探索,从而最大化样本效率和风险的避免。
    Abstract In this project, we consider the problem of network parameter optimization for rate maximization. We frame this as a joint optimization problem of power control, beam forming, and interference cancellation. We consider the setting where multiple Base Stations (BSs) are communicating with multiple user equipments (UEs). Because of the exponential computational complexity of brute force search, we instead solve this non-convex optimization problem using deep reinforcement learning (RL) techniques. The modern communication systems are notorious for their difficulty in exactly modeling their behaviour. This limits us in using RL based algorithms as interaction with the environment is needed for the agent to explore and learn efficiently. Further, it is ill advised to deploy the algorithm in real world for exploration and learning because of the high cost of failure. In contrast to the previous RL-based solutions proposed, such as deep-Q network (DQN) based control, we propose taking an offline model based approach. We specifically consider discrete batch constrained deep Q-learning (BCQ) and show that performance similar to DQN can be acheived with only a fraction of the data and without the need for exploration. This results in maximizing sample efficiency and minimizing risk in the deployment of a new algorithm to commercial networks. We provide the entire resource of the project, including code and data, at the following link: https://github.com/Heasung-Kim/ safe-rl-deployment-for-5g.
    摘要 在本项目中,我们研究以速率最大化为目标的网络参数优化问题,并将其建模为功率控制、波束成形与干扰消除的联合优化问题。我们考虑多个基站(BS)与多个用户设备(UE)通信的场景。由于暴力搜索的计算复杂度呈指数级增长,我们转而使用深度强化学习(RL)技术求解这一非凸优化问题。现代通信系统的行为众所周知地难以精确建模,这限制了基于RL的算法的使用,因为智能体需要与环境交互才能高效地探索和学习;而且由于失败代价高昂,也不宜在真实网络中部署算法进行探索与学习。与此前基于RL的方案(如基于深度Q网络DQN的控制)不同,我们提出采用离线的、基于模型的方法,具体考虑离散批约束深度Q学习(BCQ)。我们表明,BCQ只需一小部分数据、且无需探索,即可达到与DQN相近的性能,从而最大化样本效率,并将新算法部署到商用网络的风险降到最低。我们提供了该项目的全部资源,包括代码和数据,请参考以下链接:https://github.com/Heasung-Kim/safe-rl-deployment-for-5g。
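
The discrete BCQ idea described above can be sketched in a few lines: alongside the Q-network, a behaviour-cloning head g(a|s) is fit to the logged actions, and the target policy only considers actions whose relative likelihood under g exceeds a threshold tau. Below is a minimal, hedged PyTorch sketch under the assumption of a small discrete action space and a replay buffer of logged (s, a, r, s', done) tuples; network sizes, tau, and variable names are illustrative and not taken from the paper's code, and details such as a separate target network are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteBCQ(nn.Module):
    """Minimal discrete Batch-Constrained Q-learning sketch (not the paper's code)."""
    def __init__(self, state_dim, n_actions, tau=0.3, gamma=0.99):
        super().__init__()
        self.q_net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                   nn.Linear(128, n_actions))
        # Behaviour-cloning head: estimates the data-collection policy g(a|s).
        self.g_net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                   nn.Linear(128, n_actions))
        self.tau, self.gamma = tau, gamma

    def constrained_argmax(self, state):
        q = self.q_net(state)
        log_g = F.log_softmax(self.g_net(state), dim=-1)
        # Keep only actions whose likelihood is within tau of the most likely logged action.
        mask = (log_g - log_g.max(dim=-1, keepdim=True).values).exp() >= self.tau
        q_masked = q.masked_fill(~mask, float('-inf'))
        return q_masked.argmax(dim=-1)

    def loss(self, s, a, r, s_next, done):
        with torch.no_grad():
            a_next = self.constrained_argmax(s_next)
            q_next = self.q_net(s_next).gather(1, a_next.unsqueeze(1)).squeeze(1)
            target = r + self.gamma * (1.0 - done) * q_next
        q_pred = self.q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        td_loss = F.smooth_l1_loss(q_pred, target)
        bc_loss = F.cross_entropy(self.g_net(s), a)  # fit g to the logged actions
        return td_loss + bc_loss
```

The key difference from DQN is the masked argmax: the learned policy never bootstraps through actions that the logged data-collection policy was unlikely to take, which is what removes the need for online exploration.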

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08659
  • repo_url: https://github.com/yxli2123/loftq
  • paper_authors: Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao
  • for: 本研究关注在预训练模型上同时应用量化与LoRA微调的场景,旨在弥合该场景下与全精度微调之间在下游任务上常见的性能差距。
  • methods: 我们提出了LoftQ(LoRA-Fine-Tuning-aware Quantization)量化框架,该框架在对LLM进行量化的同时,为LoRA微调寻找合适的低秩初始化,以缓解量化模型与全精度模型之间的差异。
  • results: 我们在自然语言理解、问答、摘要和自然语言生成等任务上进行了实验,结果表明该方法显著优于现有量化方法,尤其是在更具挑战性的2比特和2/4比特混合精度设置下表现突出。
    Abstract Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model. In such cases it is common to observe a consistent gap in the performance on downstream tasks between full fine-tuning and quantization plus LoRA fine-tuning approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning. Such an initialization alleviates the discrepancy between the quantized and full-precision model and significantly improves the generalization in downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and natural language generation tasks. Experiments show that our method is highly effective and outperforms existing quantization methods, especially in the challenging 2-bit and 2/4-bit mixed precision regimes. We will release our code.
    摘要 量化是服务大语言模型(LLM)不可或缺的技术,最近也被引入LoRA微调之中。本工作关注在预训练模型上同时应用量化与LoRA微调的场景。在这种情形下,常会观察到全精度微调与"量化加LoRA微调"之间在下游任务上存在一致的性能差距。为此,我们提出了LoftQ(LoRA-Fine-Tuning-aware Quantization),一种新的量化框架,它在量化LLM的同时为LoRA微调寻找合适的低秩初始化。这种初始化能够缓解量化模型与全精度模型之间的差异,并显著提升下游任务上的泛化能力。我们在自然语言理解、问答、摘要和自然语言生成等任务上进行了实验,结果显示我们的方法非常有效,优于现有量化方法,尤其是在具有挑战性的2比特和2/4比特混合精度设置下。我们将发布代码。
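
The core idea — jointly choosing a quantized backbone Q and a low-rank LoRA initialization (A, B) so that Q + A·Bᵀ stays close to the full-precision weight W — can be illustrated with a simple alternating procedure. The sketch below is a hedged illustration using uniform round-to-nearest quantization and a truncated SVD; the actual LoftQ implementation (e.g., its quantizers and default iteration counts) may differ.

```python
import torch

def uniform_quantize(w, n_bits=2):
    """Toy symmetric uniform quantizer (a stand-in for the paper's quantizers)."""
    levels = 2 ** n_bits - 1
    scale = w.abs().max() / (levels / 2) + 1e-12
    return torch.round(w / scale).clamp(-levels // 2, levels // 2) * scale

def loftq_style_init(w, rank=16, n_bits=2, n_iters=5):
    """Alternate quantizing the residual backbone and refitting a low-rank correction."""
    a = torch.zeros(w.shape[0], rank)
    b = torch.zeros(w.shape[1], rank)
    for _ in range(n_iters):
        q = uniform_quantize(w - a @ b.T, n_bits)      # quantize what the LoRA part misses
        u, s, vh = torch.linalg.svd(w - q, full_matrices=False)
        a = u[:, :rank] * s[:rank]                     # rank-r fit of the quantization error
        b = vh[:rank, :].T
    return q, a, b

w = torch.randn(256, 256)
q, a, b = loftq_style_init(w)
print("relative init error:", (torch.norm(w - (q + a @ b.T)) / torch.norm(w)).item())
```

In contrast, the common baseline of quantizing W and initializing LoRA with A = 0 leaves the whole quantization error uncorrected at the start of fine-tuning, which is the discrepancy this initialization is meant to shrink.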

Analyzing Textual Data for Fatality Classification in Afghanistan’s Armed Conflicts: A BERT Approach

  • paper_url: http://arxiv.org/abs/2310.08653
  • repo_url: None
  • paper_authors: Hikmatullah Mohammadi, Ziaullah Momand, Parwin Habibi, Nazifa Ramaki, Bibi Storay Fazli, Sayed Zobair Rohany, Iqbal Samsoor
  • For: The paper aims to classify the outcomes of armed conflicts in Afghanistan as either fatal or non-fatal based on textual descriptions provided by the ACLED dataset.
  • Methods: The paper uses the BERT model, a cutting-edge language representation model in natural language processing, to classify the events based on their raw textual descriptions.
  • Results: The model achieved impressive performance on the test set with an accuracy of 98.8%, recall of 98.05%, precision of 99.6%, and an F1 score of 98.82%. These results highlight the model's robustness and indicate its potential impact in various areas such as resource allocation, policymaking, and humanitarian aid efforts in Afghanistan.
  • for: 这个研究目标是使用 ACLED 数据集的文本描述来分类阿富汗武装冲突的结果为非致死或致死。
  • methods: 这个研究使用 BERT 模型,一种现代自然语言处理的语言表示模型,来基于事件的原始文本描述来分类。
  • results: 模型在测试集上表现出色,准确率为98.8%,召回率为98.05%,精确率为99.6%,F1分数为98.82%。这些结果表明模型的稳健性,并预示其在阿富汗的资源分配、政策制定和人道主义援助等领域具有潜在影响。
    Abstract Afghanistan has witnessed many armed conflicts throughout history, especially in the past 20 years; these events have had a significant impact on human lives, including military and civilians, with potential fatalities. In this research, we aim to leverage state-of-the-art machine learning techniques to classify the outcomes of Afghanistan armed conflicts to either fatal or non-fatal based on their textual descriptions provided by the Armed Conflict Location & Event Data Project (ACLED) dataset. The dataset contains comprehensive descriptions of armed conflicts in Afghanistan that took place from August 2021 to March 2023. The proposed approach leverages the power of BERT (Bidirectional Encoder Representations from Transformers), a cutting-edge language representation model in natural language processing. The classifier utilizes the raw textual description of an event to estimate the likelihood of the event resulting in a fatality. The model achieved impressive performance on the test set with an accuracy of 98.8%, recall of 98.05%, precision of 99.6%, and an F1 score of 98.82%. These results highlight the model's robustness and indicate its potential impact in various areas such as resource allocation, policymaking, and humanitarian aid efforts in Afghanistan. The model indicates a machine learning-based text classification approach using the ACLED dataset to accurately classify fatality in Afghanistan armed conflicts, achieving robust performance with the BERT model and paving the way for future endeavors in predicting event severity in Afghanistan.
    摘要 阿富汗在历史上经历了多次武装冲突,尤其是在过去20年间;这些事件对军人和平民的生命造成了重大影响,并可能导致人员死亡。在本研究中,我们利用最先进的机器学习技术,基于武装冲突地点与事件数据项目(ACLED)数据集提供的文本描述,将阿富汗武装冲突的结果分类为致命或非致命。该数据集包含2021年8月至2023年3月期间发生在阿富汗的武装冲突的详尽描述。所提出的方法利用了自然语言处理领域最先进的语言表示模型BERT(Bidirectional Encoder Representations from Transformers)。分类器使用事件的原始文本描述来估计该事件导致人员死亡的可能性。模型在测试集上表现出色:准确率98.8%,召回率98.05%,精确率99.6%,F1分数98.82%。这些结果体现了模型的稳健性,并预示其可在阿富汗的资源分配、政策制定和人道主义援助等领域产生影响。该模型表明,基于ACLED数据集的机器学习文本分类方法能够准确判别阿富汗武装冲突的致命性,借助BERT模型取得了稳健的性能,为今后预测阿富汗事件严重程度的工作奠定了基础。
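
The described pipeline is a standard BERT sequence-classification fine-tune on event descriptions with binary fatal/non-fatal labels. A minimal sketch with Hugging Face transformers is shown below; the paper's exact model variant, hyperparameters, and preprocessing are not specified here, and the two toy example texts are made up rather than taken from ACLED.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

texts = ["Clashes between armed groups left several people dead.",        # toy examples,
         "Protesters gathered peacefully; no casualties were reported."]   # not ACLED data
labels = [1, 0]  # 1 = fatal, 0 = non-fatal

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

enc = tok(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
dataset = list(zip(enc["input_ids"], enc["attention_mask"], torch.tensor(labels)))
loader = DataLoader(dataset, batch_size=2, shuffle=True)

model.train()
for epoch in range(3):
    for input_ids, attention_mask, y in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        opt.step(); opt.zero_grad()

model.eval()
with torch.no_grad():
    probs = model(**enc).logits.softmax(-1)
print(probs)  # per-event probabilities: [P(non-fatal), P(fatal)]
```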

Electrical Grid Anomaly Detection via Tensor Decomposition

  • paper_url: http://arxiv.org/abs/2310.08650
  • repo_url: None
  • paper_authors: Alexander Most, Maksim Eren, Nigel Lawrence, Boian Alexandrov
  • For: This paper aims to improve the accuracy and specificity of anomaly detection in Supervisory Control and Data Acquisition (SCADA) systems for electrical grid systems.
  • Methods: The paper applies a non-negative tensor decomposition method called Canonical Polyadic Alternating Poisson Regression (CP-APR) with a probabilistic framework to identify anomalies in SCADA systems.
  • Results: The use of statistical behavior analysis of SCADA communication with tensor decomposition improves the specificity and accuracy of identifying anomalies in electrical grid systems, as demonstrated through experiments using real-world SCADA system data collected from the Los Alamos National Laboratory (LANL).
    Abstract Supervisory Control and Data Acquisition (SCADA) systems often serve as the nervous system for substations within power grids. These systems facilitate real-time monitoring, data acquisition, control of equipment, and ensure smooth and efficient operation of the substation and its connected devices. Previous work has shown that dimensionality reduction-based approaches, such as Principal Component Analysis (PCA), can be used for accurate identification of anomalies in SCADA systems. While not specifically applied to SCADA, non-negative matrix factorization (NMF) has shown strong results at detecting anomalies in wireless sensor networks. These unsupervised approaches model the normal or expected behavior and detect the unseen types of attacks or anomalies by identifying the events that deviate from the expected behavior. These approaches; however, do not model the complex and multi-dimensional interactions that are naturally present in SCADA systems. Differently, non-negative tensor decomposition is a powerful unsupervised machine learning (ML) method that can model the complex and multi-faceted activity details of SCADA events. In this work, we novelly apply the tensor decomposition method Canonical Polyadic Alternating Poisson Regression (CP-APR) with a probabilistic framework, which has previously shown state-of-the-art anomaly detection results on cyber network data, to identify anomalies in SCADA systems. We showcase that the use of statistical behavior analysis of SCADA communication with tensor decomposition improves the specificity and accuracy of identifying anomalies in electrical grid systems. In our experiments, we model real-world SCADA system data collected from the electrical grid operated by Los Alamos National Laboratory (LANL) which provides transmission and distribution service through a partnership with Los Alamos County, and detect synthetically generated anomalies.
    摘要 监控与数据采集(SCADA)系统通常充当电网中变电站的"神经系统"。这些系统负责实时监控、数据采集和设备控制,确保变电站及其所连设备平稳高效地运行。先前的研究表明,基于降维的方法(如主成分分析PCA)可用于准确识别SCADA系统中的异常;虽未专门应用于SCADA,非负矩阵分解(NMF)在无线传感器网络的异常检测中也表现出色。这些无监督方法对正常或预期行为进行建模,并通过识别偏离预期行为的事件来检测未曾见过的攻击或异常。然而,它们无法刻画SCADA系统中天然存在的复杂多维交互。与之不同,非负张量分解是一种强大的无监督机器学习方法,能够建模SCADA事件复杂而多面的活动细节。在本工作中,我们首次将张量分解方法CP-APR(Canonical Polyadic Alternating Poisson Regression)及其概率框架应用于SCADA系统的异常识别,该方法此前已在网络安全数据上取得了最先进的异常检测结果。我们展示了将SCADA通信的统计行为分析与张量分解相结合,可以提升电网系统中异常识别的特异性与准确率。在实验中,我们对洛斯阿拉莫斯国家实验室(LANL)运营的电网(该电网通过与洛斯阿拉莫斯县的合作提供输电和配电服务)采集的真实SCADA系统数据进行建模,并检测人工合成的异常。
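
As an illustration of the overall recipe — fit a low-rank non-negative CP model to a count tensor of SCADA communication (for example, source device × destination device × time bin) and flag entries the model reconstructs poorly — here is a self-contained NumPy sketch. It uses plain least-squares multiplicative updates as a lightweight stand-in for the Poisson-based CP-APR used in the paper, and the tensor layout and rank are assumptions for illustration only.

```python
import numpy as np

def unfold(x, mode):
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def khatri_rao(u, v):
    return np.einsum('ir,jr->ijr', u, v).reshape(-1, u.shape[1])

def nn_cp(x, rank=4, n_iters=200, eps=1e-9):
    """Non-negative CP via multiplicative updates (stand-in for CP-APR)."""
    rng = np.random.default_rng(0)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iters):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            num = unfold(x, mode) @ kr
            den = factors[mode] @ (kr.T @ kr) + eps
            factors[mode] *= num / den
    return factors

# Toy communication-count tensor: 20 devices x 20 devices x 48 time bins.
rng = np.random.default_rng(1)
x = rng.poisson(3.0, size=(20, 20, 48)).astype(float)
x[5, 7, 40:] += 50            # inject a synthetic burst anomaly
a, b, c = nn_cp(x)
x_hat = np.einsum('ir,jr,kr->ijk', a, b, c)
score = (x - x_hat) ** 2       # per-entry anomaly score (reconstruction error)
print(np.unravel_index(score.argmax(), score.shape))  # likely points at the injected burst
```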

A Mass-Conserving-Perceptron for Machine Learning-Based Modeling of Geoscientific Systems

  • paper_url: http://arxiv.org/abs/2310.08644
  • repo_url: None
  • paper_authors: Yuan-Heng Wang, Hoshin V. Gupta
  • for: 该文旨在构建一种能够更准确地预测地球科学系统时间序列演化、且具有物理可解释性的模型。
  • methods: 该文提出了物理可解释的质量守恒感知器(MCP),利用机器学习(ML)中的门控循环神经网络(GRNN)技术,在显式保证质量守恒的同时直接从数据中学习物理过程的函数形式。
  • results: 实验表明,该模型能够简约地刻画流域的降雨-径流动态,有助于科学家开展假设检验并加深对系统结构与功能的理解。
    Abstract Although decades of effort have been devoted to building Physical-Conceptual (PC) models for predicting the time-series evolution of geoscientific systems, recent work shows that Machine Learning (ML) based Gated Recurrent Neural Network technology can be used to develop models that are much more accurate. However, the difficulty of extracting physical understanding from ML-based models complicates their utility for enhancing scientific knowledge regarding system structure and function. Here, we propose a physically-interpretable Mass Conserving Perceptron (MCP) as a way to bridge the gap between PC-based and ML-based modeling approaches. The MCP exploits the inherent isomorphism between the directed graph structures underlying both PC models and GRNNs to explicitly represent the mass-conserving nature of physical processes while enabling the functional nature of such processes to be directly learned (in an interpretable manner) from available data using off-the-shelf ML technology. As a proof of concept, we investigate the functional expressivity (capacity) of the MCP, explore its ability to parsimoniously represent the rainfall-runoff (RR) dynamics of the Leaf River Basin, and demonstrate its utility for scientific hypothesis testing. To conclude, we discuss extensions of the concept to enable ML-based physical-conceptual representation of the coupled nature of mass-energy-information flows through geoscientific systems.
    摘要 尽管数十年来人们一直致力于构建物理-概念(PC)模型来预测地球科学系统的时间序列演化,但近期研究表明,基于门控循环神经网络(GRNN)的机器学习(ML)技术可以构建精度更高的模型。然而,从基于ML的模型中提取物理理解十分困难,这限制了其在增进关于系统结构与功能的科学认识方面的作用。为此,我们提出了物理可解释的质量守恒感知器(MCP),作为衔接PC建模与ML建模两种途径的桥梁。MCP利用PC模型与GRNN底层有向图结构之间固有的同构关系,显式表达物理过程的质量守恒特性,同时借助现成的ML技术,以可解释的方式直接从数据中学习这些过程的函数形式。作为概念验证,我们考察了MCP的函数表达能力(容量),探讨了它对Leaf River流域降雨-径流(RR)动态的简约表达,并展示了其在科学假设检验中的作用。最后,我们讨论了该概念的扩展,以便用基于ML的物理-概念表示来刻画地球科学系统中质量-能量-信息流的耦合特性。
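
One way to make the mass-conserving idea concrete: a recurrent cell whose gates partition the available mass at every step between what stays in storage and what leaves as output, so that conservation holds by construction while the functional form of the gates remains learnable. The sketch below is our own illustrative construction (a single "bucket" with a sigmoid release gate), not the MCP architecture from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MassConservingCell:
    """Toy bucket model: state + input mass is split between storage and outflow."""
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=3)   # gate parameters (would be learned; illustrative here)

    def step(self, storage, inflow, forcing):
        # Gate in (0, 1): fraction of the available mass released this step.
        release = sigmoid(self.w[0] * storage + self.w[1] * inflow + self.w[2] * forcing)
        available = storage + inflow
        outflow = release * available
        new_storage = (1.0 - release) * available   # the rest stays: mass is conserved
        return new_storage, outflow

cell = MassConservingCell()
storage, total_in, total_out = 0.0, 0.0, 0.0
rng = np.random.default_rng(1)
for t in range(100):
    rain = max(0.0, rng.normal(1.0, 1.0))          # toy rainfall forcing
    storage, runoff = cell.step(storage, rain, forcing=np.sin(t / 10))
    total_in += rain
    total_out += runoff
print("mass balance residual:", total_in - (total_out + storage))  # ~0 up to float error
```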

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

  • paper_url: http://arxiv.org/abs/2310.08588
  • repo_url: https://github.com/dongyh20/octopus
  • paper_authors: Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, Chencheng Jiang, Haoran Tan, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu
  • for: 本研究旨在开发一种能够高效地理解智能代理人的视觉和文本任务目标,并生成复杂的行动序列和可执行代码的新型视觉语言模型(VLM)。
  • methods: 本研究使用GPT-4控制一个探索性智能体,在实验环境OctoVerse中生成训练数据,包括行动蓝图和相应的可执行代码,并采用基于环境反馈的强化学习(RLEF)进一步优化决策。
  • results: 经过一系列实验,我们证明 Octopus 的功能和取得了吸引人的结果,并且 RLEF 提高了代理人的决策。
    Abstract Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning. Furthermore, when seamlessly integrated into an embodied agent, it signifies a crucial stride towards the creation of autonomous and context-aware systems capable of formulating plans and executing commands with precision. In this paper, we introduce Octopus, a novel VLM designed to proficiently decipher an agent's vision and textual task objectives and to formulate intricate action sequences and generate executable code. Our design allows the agent to adeptly handle a wide spectrum of tasks, ranging from mundane daily chores in simulators to sophisticated interactions in complex video games. Octopus is trained by leveraging GPT-4 to control an explorative agent to generate training data, i.e., action blueprints and the corresponding executable code, within our experimental environment called OctoVerse. We also collect the feedback that allows the enhanced training scheme of Reinforcement Learning with Environmental Feedback (RLEF). Through a series of experiments, we illuminate Octopus's functionality and present compelling results, and the proposed RLEF turns out to refine the agent's decision-making. By open-sourcing our model architecture, simulator, and dataset, we aspire to ignite further innovation and foster collaborative applications within the broader embodied AI community.
    摘要 大型视觉语言模型(VLM)在多模态感知与推理方面已取得长足进展。更重要的是,当它被无缝集成到具身智能体中时,这标志着朝向创建能够精确制定计划并执行指令的自主、具备上下文感知能力的系统迈出了关键一步。在本文中,我们介绍了Octopus,一种新型VLM,能够熟练地解读智能体的视觉输入与文本任务目标,并生成复杂的动作序列和可执行代码。我们的设计使智能体能够灵活处理从模拟器中的日常琐事到复杂视频游戏中的高级交互等各类任务。Octopus的训练借助GPT-4控制一个探索性智能体,在我们的实验环境OctoVerse中生成训练数据,即动作蓝图及相应的可执行代码;我们还收集反馈,用于改进基于环境反馈的强化学习(RLEF)训练方案。通过一系列实验,我们展示了Octopus的功能并呈现了令人信服的结果,所提出的RLEF进一步改善了智能体的决策。通过开源模型结构、模拟器和数据集,我们希望激发更多创新,并促进更广泛的具身智能(embodied AI)社区的协作应用。

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08582
  • repo_url: None
  • paper_authors: Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo
  • for: 该论文研究闭环任务规划,即根据实时观察逐步生成并调整完成特定目标所需技能序列(计划)的过程。
  • methods: 该论文使用大语言模型(LLM)生成动作,并将任务规划重构为三个阶段:计划抽样、动作树构建和基于现实环境信息的落地决策。
  • results: 通过将LLM查询分解为一次计划抽样调用和多次基于环境信息的落地决策调用,该方法大幅降低了token消耗并提高了效率。实验显示,该方法在达到最先进性能的同时,将token消耗减少92.2%,错误纠正次数减少40.5%。
    Abstract This paper studies close-loop task planning, which refers to the process of generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations. Recently, prompting Large Language Models (LLMs) to generate actions iteratively has become a prevalent paradigm due to its superior performance and user-friendliness. However, this paradigm is plagued by two inefficiencies: high token consumption and redundant error correction, both of which hinder its scalability for large-scale testing and applications. To address these issues, we propose Tree-Planner, which reframes task planning with LLMs into three distinct phases: plan sampling, action tree construction, and grounded deciding. Tree-Planner starts by using an LLM to sample a set of potential plans before execution, followed by the aggregation of them to form an action tree. Finally, the LLM performs a top-down decision-making process on the tree, taking into account real-time environmental information. Experiments show that Tree-Planner achieves state-of-the-art performance while maintaining high efficiency. By decomposing LLM queries into a single plan-sampling call and multiple grounded-deciding calls, a considerable part of the prompt are less likely to be repeatedly consumed. As a result, token consumption is reduced by 92.2% compared to the previously best-performing model. Additionally, by enabling backtracking on the action tree as needed, the correction process becomes more flexible, leading to a 40.5% decrease in error corrections. Project page: https://tree-planner.github.io/
    摘要 本论文研究闭环任务规划,即生成一个技能序列(计划)来完成特定目标,并根据实时观察对计划进行调整的过程。最近,通过提示大语言模型(LLM)迭代生成动作已成为主流范式,因为其性能优越且易于使用。然而,这种范式存在两类低效问题:token消耗高,以及冗余的错误纠正,二者都妨碍了其在大规模测试与应用中的可扩展性。为解决这些问题,我们提出了Tree-Planner,将基于LLM的任务规划重构为三个阶段:计划抽样、动作树构建和落地决策。Tree-Planner首先在执行前用LLM抽样一组潜在计划,再将它们聚合成一棵动作树;最后,LLM结合实时环境信息在树上进行自顶向下的决策。实验表明,Tree-Planner在保持高效率的同时达到了最先进的性能。通过将LLM查询分解为一次计划抽样调用和多次落地决策调用,大部分提示内容不再被反复消耗,与此前表现最好的模型相比,token消耗减少了92.2%;此外,通过支持在动作树上按需回溯,纠错过程更加灵活,错误纠正次数减少了40.5%。项目页面:https://tree-planner.github.io/
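
The plan-sampling and action-tree-construction stages can be illustrated with a small prefix tree: each sampled plan is a sequence of actions, shared prefixes are merged into one branch, and the grounded-deciding stage then walks the tree top-down, choosing among children at each node. In the sketch below, a placeholder scoring function stands in for the LLM plus the environment observation, and the toy plans are invented for illustration.

```python
from collections import defaultdict

def build_action_tree(plans):
    """Merge sampled plans (lists of action strings) into a prefix tree with visit counts."""
    tree = lambda: defaultdict(tree)
    root, counts = tree(), defaultdict(int)
    for plan in plans:
        node, prefix = root, ()
        for action in plan:
            prefix += (action,)
            counts[prefix] += 1
            node = node[action]
    return root, counts

def grounded_decide(node, counts, score_fn, prefix=()):
    """Walk the tree top-down, picking the child preferred by score_fn at each step."""
    trajectory = []
    while node:
        best = max(node, key=lambda a: score_fn(prefix + (a,), counts))
        trajectory.append(best)
        prefix += (best,)
        node = node[best]
    return trajectory

plans = [  # toy plans, as if sampled from an LLM for "heat up some food"
    ["walk_to_kitchen", "open_fridge", "grab_food", "use_microwave"],
    ["walk_to_kitchen", "open_fridge", "grab_food", "use_stove"],
    ["walk_to_kitchen", "grab_food", "use_microwave"],
]
root, counts = build_action_tree(plans)
# Placeholder for the LLM+observation scorer: here, prefer frequently sampled branches.
print(grounded_decide(root, counts, lambda prefix, c: c[prefix]))
```

Because the expensive plan-sampling call happens once while each step of the walk only has to rank a handful of children, the repeated prompt overhead of naive iterative action generation is avoided.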

Jigsaw: Supporting Designers in Prototyping Multimodal Applications by Assembling AI Foundation Models

  • paper_url: http://arxiv.org/abs/2310.08574
  • repo_url: None
  • paper_authors: David Chuan-En Lin, Nikolas Martelaro
  • for: 本研究旨在帮助设计师在创作过程中更好地利用基础模型,提高设计效率和质量。
  • methods: 本研究构建了Jigsaw原型系统,以拼图块作为基础模型的隐喻,让设计师通过拼接相互兼容的拼图块,组合跨越多种模态的基础模型能力。
  • results: 在用户研究中,Jigsaw帮助设计师更好地理解可用基础模型的能力,为跨模态、跨任务的能力组合提供了指引,并可作为支持设计探索、原型制作和文档记录的画布。
    Abstract Recent advancements in AI foundation models have made it possible for them to be utilized off-the-shelf for creative tasks, including ideating design concepts or generating visual prototypes. However, integrating these models into the creative process can be challenging as they often exist as standalone applications tailored to specific tasks. To address this challenge, we introduce Jigsaw, a prototype system that employs puzzle pieces as metaphors to represent foundation models. Jigsaw allows designers to combine different foundation model capabilities across various modalities by assembling compatible puzzle pieces. To inform the design of Jigsaw, we interviewed ten designers and distilled design goals. In a user study, we showed that Jigsaw enhanced designers' understanding of available foundation model capabilities, provided guidance on combining capabilities across different modalities and tasks, and served as a canvas to support design exploration, prototyping, and documentation.
    摘要 AI基础模型的最新进展使得它们可以被直接用于创意任务,例如构思设计概念或生成视觉原型。然而,将这些模型整合进创意流程并不容易,因为它们往往以面向特定任务的独立应用形式存在。为应对这一挑战,我们提出了Jigsaw,一个以拼图块为隐喻来表示基础模型的原型系统。Jigsaw允许设计师通过拼接相互兼容的拼图块,组合跨越多种模态的基础模型能力。为指导Jigsaw的设计,我们访谈了十位设计师并提炼出设计目标。用户研究表明,Jigsaw增进了设计师对现有基础模型能力的理解,为跨模态、跨任务的能力组合提供了指引,并可作为支持设计探索、原型制作与文档记录的画布。

A Lightweight Calibrated Simulation Enabling Efficient Offline Learning for Optimal Control of Real Buildings

  • paper_url: http://arxiv.org/abs/2310.08569
  • repo_url: None
  • paper_authors: Judah Goldfeder, John Sipple
  • for: 这篇论文旨在提出一种基于强化学习的暖通空调(HVAC)系统控制方法,以降低能耗和碳排放。
  • methods: 这篇论文为每栋建筑使用一个定制的轻量级模拟器来训练智能体,并利用建筑的遥测数据对模拟器进行校准,以达到更高的保真度。
  • results: 在一栋两层、68,000平方英尺、装有127台设备的建筑上,校准后的模拟器在六小时区间内与真实世界的漂移仅略高于0.5度,表明该方法有望扩展到大量建筑,从而提高效率、降低能耗与碳排放。
    Abstract Modern commercial Heating, Ventilation, and Air Conditioning (HVAC) devices form a complex and interconnected thermodynamic system with the building and outside weather conditions, and current setpoint control policies are not fully optimized for minimizing energy use and carbon emission. Given a suitable training environment, a Reinforcement Learning (RL) model is able to improve upon these policies, but training such a model, especially in a way that scales to thousands of buildings, presents many real world challenges. We propose a novel simulation-based approach, where a customized simulator is used to train the agent for each building. Our open-source simulator (available online: https://github.com/google/sbsim) is lightweight and calibrated via telemetry from the building to reach a higher level of fidelity. On a two-story, 68,000 square foot building, with 127 devices, we were able to calibrate our simulator to have just over half a degree of drift from the real world over a six-hour interval. This approach is an important step toward having a real-world RL control system that can be scaled to many buildings, allowing for greater efficiency and resulting in reduced energy consumption and carbon emissions.
    摘要 现代商用暖通空调(HVAC)设备与建筑及室外天气条件共同构成一个复杂且相互关联的热力学系统,而目前的设定点控制策略并未针对最小化能耗与碳排放进行充分优化。在合适的训练环境下,强化学习(RL)模型能够改进这些策略,但训练这样的模型,尤其是以可扩展到数千栋建筑的方式训练,面临许多现实挑战。我们提出了一种新颖的基于模拟的方法:为每栋建筑使用定制的模拟器来训练智能体。我们的开源模拟器(可在线获取:https://github.com/google/sbsim)十分轻量,并通过建筑的遥测数据进行校准,以达到更高的保真度。在一栋两层、68,000平方英尺、装有127台设备的建筑上,我们将模拟器校准到在六小时区间内与真实世界的漂移仅略高于0.5度。这一方法是迈向可扩展到众多建筑的真实世界RL控制系统的重要一步,有助于提高效率,从而降低能耗和碳排放。
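
Calibrating a lightweight simulator to telemetry can be as simple as fitting the parameters of a discrete-time thermal model to logged zone temperatures, HVAC commands, and outside-air temperature, then measuring the drift of free-running rollouts. The sketch below fits a one-zone linear model by least squares on synthetic "telemetry"; it is a generic illustration of the idea, not the calibration procedure used for the paper's simulator.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 6 * 12                                    # six hours of 5-minute telemetry samples
outside = 10 + 5 * np.sin(np.arange(T) / 20)  # toy outside-air temperature (degC)
hvac = rng.uniform(0, 1, T)                   # toy normalized HVAC heating command

# "Real building" telemetry generated from hidden parameters (stand-in for real data).
a_true, b_true, c_true = 0.90, 0.08, 0.60
zone = np.empty(T); zone[0] = 21.0
for t in range(T - 1):
    zone[t + 1] = a_true * zone[t] + b_true * outside[t] + c_true * hvac[t] \
                  + rng.normal(0, 0.02)

# Calibrate: zone[t+1] ~ a*zone[t] + b*outside[t] + c*hvac[t], solved by least squares.
X = np.column_stack([zone[:-1], outside[:-1], hvac[:-1]])
a, b, c = np.linalg.lstsq(X, zone[1:], rcond=None)[0]

# Free-running rollout of the calibrated simulator; report drift over the window.
sim = np.empty(T); sim[0] = zone[0]
for t in range(T - 1):
    sim[t + 1] = a * sim[t] + b * outside[t] + c * hvac[t]
print("max drift vs. telemetry (degC):", np.abs(sim - zone).max())
```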

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

  • paper_url: http://arxiv.org/abs/2310.08566
  • repo_url: None
  • paper_authors: Licong Lin, Yu Bai, Song Mei
  • for: 本文旨在从理论上理解在离线强化学习数据上预训练的大型Transformer模型如何实现上下文强化学习(ICRL)。
  • methods: 本文在监督预训练的理论框架下分析了两种近期提出的训练方法:算法蒸馏(algorithm distillation)与决策预训练Transformer(decision-pretrained transformer)。
  • results: 本文证明,监督预训练的Transformer能够模仿专家算法在给定观测轨迹下的条件期望,并且带ReLU注意力的Transformer可以高效地近似LinUCB、Thompson采样和UCB-VI等近似最优的在线强化学习算法。
    Abstract Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is unclear which reinforcement-learning algorithms transformers can perform in context, and how distribution mismatch in offline training data affects the learned algorithms. This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. This includes two recently proposed training methods -- algorithm distillation and decision-pretrained transformers. First, assuming model realizability, we prove the supervised-pretrained transformer will imitate the conditional expectation of the expert algorithm given the observed trajectory. The generalization error will scale with model capacity and a distribution divergence factor between the expert and offline algorithms. Second, we show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms like LinUCB and Thompson sampling for stochastic linear bandits, and UCB-VI for tabular Markov decision processes. This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.
    摘要 在离线强化学习数据集上预训练的大型Transformer模型展现出了出色的上下文强化学习(ICRL)能力:当以来自未见环境的交互轨迹作为提示时,它们能够做出良好的决策。然而,Transformer何时以及如何经由训练获得ICRL能力,在理论上尚未得到充分理解。特别是,Transformer能够在上下文中执行哪些强化学习算法,以及离线训练数据中的分布不匹配会如何影响所学到的算法,目前仍不清楚。本文为ICRL的监督预训练提供了一个理论分析框架,涵盖两种近期提出的训练方法:算法蒸馏和决策预训练Transformer。首先,在模型可实现性假设下,我们证明监督预训练的Transformer会模仿专家算法在给定观测轨迹下的条件期望,其泛化误差随模型容量以及专家算法与离线算法之间的分布差异因子而变化。其次,我们证明带ReLU注意力的Transformer能够高效地近似近似最优的在线强化学习算法,例如用于随机线性赌博机的LinUCB和Thompson采样,以及用于表格型马尔可夫决策过程的UCB-VI。这为基于离线轨迹预训练的Transformer的ICRL能力提供了首个定量分析。
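
For reference, the kind of near-optimal online algorithm the pretrained transformer is shown to approximate — LinUCB for stochastic linear bandits — is itself only a few lines. A standard NumPy sketch follows; the regularization and the exploration-bonus scale alpha are generic choices, not values from the paper.

```python
import numpy as np

def linucb(contexts_fn, reward_fn, n_rounds=500, d=5, n_arms=10, alpha=1.0):
    """Standard LinUCB: shared linear reward model with an optimistic exploration bonus."""
    A = np.eye(d)              # regularized Gram matrix
    b = np.zeros(d)
    total = 0.0
    for t in range(n_rounds):
        theta_hat = np.linalg.solve(A, b)
        X = contexts_fn(t)                      # (n_arms, d) feature matrix for this round
        A_inv = np.linalg.inv(A)
        bonus = np.sqrt(np.einsum('ad,de,ae->a', X, A_inv, X))
        a = int(np.argmax(X @ theta_hat + alpha * bonus))
        r = reward_fn(t, a, X[a])
        A += np.outer(X[a], X[a])
        b += r * X[a]
        total += r
    return total

rng = np.random.default_rng(0)
theta_star = rng.normal(size=5); theta_star /= np.linalg.norm(theta_star)
contexts = lambda t: rng.normal(size=(10, 5))
reward = lambda t, a, x: float(x @ theta_star + 0.1 * rng.normal())
print("cumulative reward:", linucb(contexts, reward))
```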

Security Considerations in AI-Robotics: A Survey of Current Methods, Challenges, and Opportunities

  • paper_url: http://arxiv.org/abs/2310.08565
  • repo_url: None
  • paper_authors: Subash Neupane, Shaswata Mitra, Ivan A. Fernandez, Swayamjit Saha, Sudip Mittal, Jingdao Chen, Nisha Pillai, Shahram Rahimi
  • for: 这篇论文的目的是为了探讨人工智能机器人系统的安全问题。
  • methods: 这篇论文从攻击面、伦理与法律问题、人机交互(HRI)安全三个维度进行了全面的梳理与分类。
  • results: 这篇论文提供了一个总结性的对话,包括攻击表面、伦理和法律问题、人机交互安全等方面的概括和分类,以帮助用户、开发者和其他关注者更好地理解这些领域,并提高整体系统安全性。
    Abstract Robotics and Artificial Intelligence (AI) have been inextricably intertwined since their inception. Today, AI-Robotics systems have become an integral part of our daily lives, from robotic vacuum cleaners to semi-autonomous cars. These systems are built upon three fundamental architectural elements: perception, navigation and planning, and control. However, while the integration of AI-Robotics systems has enhanced the quality our lives, it has also presented a serious problem - these systems are vulnerable to security attacks. The physical components, algorithms, and data that make up AI-Robotics systems can be exploited by malicious actors, potentially leading to dire consequences. Motivated by the need to address the security concerns in AI-Robotics systems, this paper presents a comprehensive survey and taxonomy across three dimensions: attack surfaces, ethical and legal concerns, and Human-Robot Interaction (HRI) security. Our goal is to provide users, developers and other stakeholders with a holistic understanding of these areas to enhance the overall AI-Robotics system security. We begin by surveying potential attack surfaces and provide mitigating defensive strategies. We then delve into ethical issues, such as dependency and psychological impact, as well as the legal concerns regarding accountability for these systems. Besides, emerging trends such as HRI are discussed, considering privacy, integrity, safety, trustworthiness, and explainability concerns. Finally, we present our vision for future research directions in this dynamic and promising field.
    摘要 机器人技术与人工智能(AI)自诞生以来便密不可分。如今,AI-机器人系统已成为我们日常生活的一部分,从扫地机器人到半自动驾驶汽车。这些系统建立在三个基本架构要素之上:感知、导航与规划,以及控制。然而,AI-机器人系统的普及在提升生活质量的同时,也带来了一个严重的问题——这些系统容易受到安全攻击。构成AI-机器人系统的物理组件、算法和数据都可能被恶意攻击者利用,进而导致严重后果。出于解决AI-机器人系统安全问题的需要,本文从三个维度给出了全面的综述与分类:攻击面、伦理与法律问题,以及人机交互(HRI)安全。我们的目标是为用户、开发者及其他相关方提供对这些领域的整体认识,以增强AI-机器人系统的整体安全性。我们首先梳理潜在的攻击面并给出相应的防御策略;随后深入讨论伦理问题(如依赖性与心理影响)以及与这些系统问责相关的法律问题;此外,还结合隐私、完整性、安全、可信性与可解释性等关切,讨论了人机交互等新兴趋势。最后,我们展望了这一充满活力且前景可期领域的未来研究方向。

MemGPT: Towards LLMs as Operating Systems

  • paper_url: http://arxiv.org/abs/2310.08560
  • repo_url: None
  • paper_authors: Charles Packer, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, Joseph E. Gonzalez
  • for: 该论文旨在解决现代大语言模型(LLM)受限于有限上下文窗口的问题,以提升LLM在长对话和文档分析等任务中的实用性。
  • methods: 该论文提出了虚拟上下文管理技术,借鉴传统操作系统中分层内存系统的思想,通过在快、慢存储之间移动数据来营造大上下文资源的表象,并利用中断来管理控制流。
  • results: 在文档分析和多会话聊天两个领域中,MemGPT能够有效突破底层LLM有限上下文窗口的限制,提供扩展的上下文,其表现超越了受限于固定上下文窗口的LLM。
    Abstract Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis. To enable using context beyond limited context windows, we propose virtual context management, a technique drawing inspiration from hierarchical memory systems in traditional operating systems that provide the appearance of large memory resources through data movement between fast and slow memory. Using this technique, we introduce MemGPT (Memory-GPT), a system that intelligently manages different memory tiers in order to effectively provide extended context within the LLM's limited context window, and utilizes interrupts to manage control flow between itself and the user. We evaluate our OS-inspired design in two domains where the limited context windows of modern LLMs severely handicaps their performance: document analysis, where MemGPT is able to analyze large documents that far exceed the underlying LLM's context window, and multi-session chat, where MemGPT can create conversational agents that remember, reflect, and evolve dynamically through long-term interactions with their users. We release MemGPT code and data for our experiments at https://memgpt.ai.
    摘要 大型语言模型(LLM)为人工智能带来了革命,但其有限的上下文窗口制约了它们在长对话和文档分析等任务中的实用性。为了利用超出有限上下文窗口的上下文,我们提出了虚拟上下文管理技术,其灵感来自传统操作系统中的分层内存系统:通过在快、慢存储之间移动数据,营造出大内存资源的表象。基于这一技术,我们提出了MemGPT(Memory-GPT)系统,它智能地管理不同的内存层级,以便在LLM有限的上下文窗口内有效提供扩展的上下文,并利用中断来管理自身与用户之间的控制流。我们在两个因有限上下文窗口而严重制约现代LLM性能的领域中评估了这一受操作系统启发的设计:其一是文档分析,MemGPT能够分析远超底层LLM上下文窗口的大型文档;其二是多会话聊天,MemGPT可以创建能够记忆、反思并在与用户的长期交互中动态演化的对话智能体。我们在 https://memgpt.ai 发布了MemGPT的代码与实验数据。
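
The virtual-context idea can be illustrated with a toy two-tier store: a small "main context" with a fixed token budget and an unbounded archival store; when the budget is exceeded the oldest messages are evicted to the archive, and a retrieval call pages relevant items back in on demand. This is our own simplified sketch of the concept, not MemGPT's actual interface; the token counting and keyword search below are deliberately naive stand-ins.

```python
from collections import deque

class TwoTierMemory:
    """Toy hierarchical memory: bounded main context + unbounded archival storage."""
    def __init__(self, budget_tokens=64):
        self.budget = budget_tokens
        self.main = deque()      # (text, n_tokens) pairs currently "in context"
        self.archive = []        # evicted items, searchable later

    def _used(self):
        return sum(n for _, n in self.main)

    def append(self, text):
        n = len(text.split())                    # crude token count for illustration
        self.main.append((text, n))
        while self._used() > self.budget:        # evict oldest items to the archive
            self.archive.append(self.main.popleft())

    def retrieve(self, query, k=2):
        """Page archived items back in by naive keyword overlap (stand-in for search)."""
        scored = sorted(self.archive,
                        key=lambda item: -len(set(query.split()) & set(item[0].split())))
        for item in scored[:k]:
            self.main.append(item)
        return [t for t, _ in scored[:k]]

mem = TwoTierMemory(budget_tokens=12)
mem.append("user: my dog is named Pixel and she loves the beach")
mem.append("assistant: noted, Pixel sounds lovely")
mem.append("user: anyway, let's talk about travel plans for March")
print(mem.retrieve("what is the dog called"))   # pages the evicted Pixel message back in
```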

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

  • paper_url: http://arxiv.org/abs/2310.08559
  • repo_url: https://github.com/linlu-qiu/lm-inductive-reasoning
  • paper_authors: Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren
  • for: 这个研究旨在探讨语言模型(LM)在推理中的 inductive reasoning 能力,以及LM与人类 inductive reasoning 过程的差异。
  • methods: 研究使用了迭代假设细化(iterative hypothesis refinement)技术,包括提出、选择和细化假设的三个步骤,以模拟人类 inductive reasoning 过程。
  • results: 研究发现,LM 在 inductive reasoning 任务中表现出色,但也存在一些问题,如规则推理和应用等方面的表现下降,这表明LM 可能只是提出了假设而无法实际应用规则。此外,研究还发现了LM 和人类 inductive reasoning 过程之间的几个差异。
    Abstract The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence. Prior work suggests that language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks. In this work, we conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement, a technique that more closely mirrors the human inductive process than standard input-output prompting. Iterative hypothesis refinement employs a three-step process: proposing, selecting, and refining hypotheses in the form of textual rules. By examining the intermediate rules, we observe that LMs are phenomenal hypothesis proposers (i.e., generating candidate rules), and when coupled with a (task-specific) symbolic interpreter that is able to systematically filter the proposed set of rules, this hybrid approach achieves strong results across inductive reasoning benchmarks that require inducing causal relations, language-like instructions, and symbolic concepts. However, they also behave as puzzling inductive reasoners, showing notable performance gaps in rule induction (i.e., identifying plausible rules) and rule application (i.e., applying proposed rules to instances), suggesting that LMs are proposing hypotheses without being able to actually apply the rules. Through empirical and human analyses, we further reveal several discrepancies between the inductive reasoning processes of LMs and humans, shedding light on both the potentials and limitations of using LMs in inductive reasoning tasks.
    摘要 从少量观察中归纳出基本原则,并将其推广到新情境的能力——即归纳推理——是人类智能的核心。先前的研究表明,尽管语言模型(LM)在研究基准上成绩亮眼,它们在归纳推理上往往表现不佳。在本工作中,我们通过迭代假设精炼系统地研究了LM的归纳推理能力;与标准的输入-输出提示相比,这种技术更贴近人类的归纳过程。迭代假设精炼包含三个步骤:以文本规则的形式提出、筛选并精炼假设。通过考察中间规则,我们观察到LM是出色的假设提出者(即擅长生成候选规则);当其与能够系统过滤候选规则的(任务特定)符号解释器相结合时,这种混合方法在需要归纳因果关系、类语言指令和符号概念的归纳推理基准上取得了强劲的结果。然而,LM同时也是令人困惑的归纳推理者:它们在规则归纳(识别合理规则)和规则应用(将提出的规则应用于实例)上存在显著的性能差距,这表明LM提出了假设却未必能真正应用这些规则。通过实证与人工分析,我们进一步揭示了LM与人类在归纳推理过程上的若干差异,阐明了在归纳推理任务中使用LM的潜力与局限。
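
The propose–select–refine loop pairs an LLM (hypothesis proposer) with a symbolic interpreter that actually executes candidate rules on the observed examples and keeps only the ones that fit. The sketch below uses a trivial rule language and a stubbed-out proposer to show the control flow; the real system's prompts, rule format, and interpreter are task-specific and not reproduced here.

```python
import random

examples = [("abc", "cba"), ("hello", "olleh"), ("42", "24")]  # toy induction task

RULES = {  # a tiny symbolic rule language the interpreter can execute
    "reverse":   lambda s: s[::-1],
    "uppercase": lambda s: s.upper(),
    "identity":  lambda s: s,
    "sort":      lambda s: "".join(sorted(s)),
}

def propose_hypotheses(feedback):
    """Stand-in for an LLM call that proposes a batch of candidate rules given feedback."""
    rules = list(RULES)
    random.shuffle(rules)
    return rules

def interpret(rule_name, inputs):
    """Symbolic interpreter: apply a candidate rule to every observed input."""
    return [RULES[rule_name](x) for x in inputs]

def refine(n_rounds=5, seed=0):
    random.seed(seed)
    feedback = []
    for _ in range(n_rounds):
        for rule in propose_hypotheses(feedback):
            outputs = interpret(rule, [x for x, _ in examples])      # apply, don't just propose
            n_correct = sum(o == y for o, (_, y) in zip(outputs, examples))
            if n_correct == len(examples):
                return rule
            feedback.append((rule, n_correct))   # fed back into the next proposal round
    return None

print(refine())  # -> 'reverse'
```

The interpreter is what separates "proposing a plausible rule" from "actually applying it", which is exactly the gap the paper reports for language models used alone.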

Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias

  • paper_url: http://arxiv.org/abs/2310.08558
  • repo_url: https://github.com/MaxSobolMark/OOO
  • paper_authors: Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn
  • for: 该论文旨在改进在线强化学习(RL)或微调中的策略训练效果,尤其是在先前的离线数据无法提供足够状态覆盖的情况下。
  • methods: 该论文提出了离线-在线-离线(OOO)框架:在在线微调期间,用一个乐观(探索)策略与环境交互,并在所有观测数据上用悲观目标训练一个独立的(利用)策略用于评估,从而将数据收集与评估所用的策略解耦。
  • results: 实验显示,OOO框架可与多种离线到在线RL及在线RL方法互补,将其平均性能提高14%到26%,并能从不完整的离线数据出发进行微调;在两个OpenAI gym环境上,其将在线RL性能提升了165%。
    Abstract It is desirable for policies to optimistically explore new states and behaviors during online reinforcement learning (RL) or fine-tuning, especially when prior offline data does not provide enough state coverage. However, exploration bonuses can bias the learned policy, and our experiments find that naive, yet standard use of such bonuses can fail to recover a performant policy. Concurrently, pessimistic training in offline RL has enabled recovery of performant policies from static datasets. Can we leverage offline RL to recover better policies from online interaction? We make a simple observation that a policy can be trained from scratch on all interaction data with pessimistic objectives, thereby decoupling the policies used for data collection and for evaluation. Specifically, we propose offline retraining, a policy extraction step at the end of online fine-tuning in our Offline-to-Online-to-Offline (OOO) framework for reinforcement learning (RL). An optimistic (exploration) policy is used to interact with the environment, and a separate pessimistic (exploitation) policy is trained on all the observed data for evaluation. Such decoupling can reduce any bias from online interaction (intrinsic rewards, primacy bias) in the evaluation policy, and can allow more exploratory behaviors during online interaction which in turn can generate better data for exploitation. OOO is complementary to several offline-to-online RL and online RL methods, and improves their average performance by 14% to 26% in our fine-tuning experiments, achieves state-of-the-art performance on several environments in the D4RL benchmarks, and improves online RL performance by 165% on two OpenAI gym environments. Further, OOO can enable fine-tuning from incomplete offline datasets where prior methods can fail to recover a performant policy. Implementation: https://github.com/MaxSobolMark/OOO
    摘要 在在线强化学习(RL)或微调过程中,策略应当乐观地探索新的状态和行为,尤其是当先前的离线数据无法提供足够的状态覆盖时。然而,探索奖励可能会使学到的策略产生偏差;我们的实验发现,对这类奖励的朴素但常规的使用可能无法恢复出高性能策略。与此同时,离线RL中的悲观训练已经能够从静态数据集中恢复高性能策略。那么,我们能否借助离线RL,从在线交互中恢复更好的策略?我们给出一个简单的观察:可以用悲观目标在全部交互数据上从头训练一个策略,从而将用于数据收集的策略与用于评估的策略解耦。具体而言,我们在离线-在线-离线(OOO)强化学习框架中提出了离线重训练,即在在线微调结束时增加一个策略提取步骤:用乐观(探索)策略与环境交互,同时在所有观测数据上训练一个独立的悲观(利用)策略用于评估。这种解耦可以减少在线交互(内在奖励、首因偏差)给评估策略带来的偏差,并允许在线交互中出现更多探索性行为,从而为利用阶段生成更好的数据。OOO可与多种离线到在线RL及在线RL方法互补,在我们的微调实验中将它们的平均性能提高了14%到26%,在D4RL基准的多个环境上取得了最先进的表现,并在两个OpenAI gym环境上将在线RL性能提升了165%。此外,OOO还能从先前方法无法恢复高性能策略的不完整离线数据集出发进行微调。实现代码:https://github.com/MaxSobolMark/OOO。

Cross-Episodic Curriculum for Transformer Agents

  • paper_url: http://arxiv.org/abs/2310.08549
  • repo_url: https://github.com/CEC-Agent/CEC
  • paper_authors: Lucy Xiaoyang Shi, Yunfan Jiang, Jake Grigsby, Linxi “Jim” Fan, Yuke Zhu
  • for: 提升Transformer智能体的学习效率和泛化能力
  • methods: 跨回合课程(Cross-Episodic Curriculum)方法
  • results: 在多任务强化学习和混合质量示范的模仿学习中表现出色,所得策略性能优于对比方法并展现出强大的泛化能力
    Abstract We present a new algorithm, Cross-Episodic Curriculum (CEC), to boost the learning efficiency and generalization of Transformer agents. Central to CEC is the placement of cross-episodic experiences into a Transformer's context, which forms the basis of a curriculum. By sequentially structuring online learning trials and mixed-quality demonstrations, CEC constructs curricula that encapsulate learning progression and proficiency increase across episodes. Such synergy combined with the potent pattern recognition capabilities of Transformer models delivers a powerful cross-episodic attention mechanism. The effectiveness of CEC is demonstrated under two representative scenarios: one involving multi-task reinforcement learning with discrete control, such as in DeepMind Lab, where the curriculum captures the learning progression in both individual and progressively complex settings; and the other involving imitation learning with mixed-quality data for continuous control, as seen in RoboMimic, where the curriculum captures the improvement in demonstrators' expertise. In all instances, policies resulting from CEC exhibit superior performance and strong generalization. Code is open-sourced at https://cec-agent.github.io/ to facilitate research on Transformer agent learning.
    摘要 我们提出了一种新算法——跨回合课程(Cross-Episodic Curriculum,CEC),用于提升Transformer智能体的学习效率和泛化能力。CEC的核心在于将跨回合的经验置入Transformer的上下文中,以此构成课程:通过对在线学习试验和混合质量的示范进行顺序结构化,CEC构建出能够刻画跨回合学习进程与能力提升的课程。这种结构与Transformer模型强大的模式识别能力相结合,形成了一种有效的跨回合注意力机制。我们在两个代表性场景中验证了CEC的有效性:其一是离散控制下的多任务强化学习(如DeepMind Lab),课程同时刻画了个体设置和逐渐复杂设置中的学习进程;其二是混合质量数据下的连续控制模仿学习(如RoboMimic),课程刻画了示范者专业水平的提升。在所有场景中,由CEC得到的策略均表现出更优的性能和强大的泛化能力。代码已开源于 https://cec-agent.github.io/ ,以促进对Transformer智能体学习的研究。

Do pretrained Transformers Really Learn In-context by Gradient Descent?

  • paper_url: http://arxiv.org/abs/2310.08540
  • repo_url: None
  • paper_authors: Lingfeng Shen, Aayush Mishra, Daniel Khashabi
  • for: 该研究旨在检验大语言模型中的上下文学习(ICL)是否隐式地等价于梯度下降(GD)。
  • methods: 该研究首先指出了以构造Transformer权重模拟梯度下降的先前工作的不足,随后在基于自然数据预训练的语言模型(LLaMa-7B)上,从多种性能指标出发系统比较了ICL与GD的行为。
  • results: 研究发现,ICL与GD在不同的数据集、模型和示例数量下表现出不一致的行为,并以不同的方式改变语言模型的输出分布,表明二者的等价性仍是一个有待验证的开放假设。
    Abstract Is In-Context Learning (ICL) implicitly equivalent to Gradient Descent (GD)? Several recent works draw analogies between the dynamics of GD and the emergent behavior of ICL in large language models. However, these works make assumptions far from the realistic natural language setting in which language models are trained. Such discrepancies between theory and practice, therefore, necessitate further investigation to validate their applicability. We start by highlighting the weaknesses in prior works that construct Transformer weights to simulate gradient descent. Their experiments with training Transformers on ICL objective, inconsistencies in the order sensitivity of ICL and GD, sparsity of the constructed weights, and sensitivity to parameter changes are some examples of a mismatch from the real-world setting. Furthermore, we probe and compare the ICL vs. GD hypothesis in a natural setting. We conduct comprehensive empirical analyses on language models pretrained on natural data (LLaMa-7B). Our comparisons on various performance metrics highlight the inconsistent behavior of ICL and GD as a function of various factors such as datasets, models, and number of demonstrations. We observe that ICL and GD adapt the output distribution of language models differently. These results indicate that the equivalence between ICL and GD is an open hypothesis, requires nuanced considerations and calls for further studies.
    摘要 上下文学习(ICL)是否隐式地等价于梯度下降(GD)?近来一些工作将GD的动力学与大语言模型中ICL的涌现行为进行类比,但这些工作所作的假设与语言模型实际训练所处的自然语言环境相去甚远。理论与实践之间的这些差距需要进一步的检验来确认其适用性。我们首先指出先前那些通过构造Transformer权重来模拟梯度下降的工作存在的弱点:它们在ICL目标上训练Transformer的实验设置、ICL与GD在顺序敏感性上的不一致、所构造权重的稀疏性以及对参数变化的敏感性等,都与真实场景不符。此外,我们在自然设置下检验并比较了ICL与GD假设:我们在基于自然数据预训练的语言模型(LLaMa-7B)上开展了全面的实证分析。对多种性能指标的比较显示,ICL与GD的行为随数据集、模型和示例数量等因素表现出不一致;我们还观察到,ICL与GD以不同的方式调整语言模型的输出分布。这些结果表明,ICL与GD之间的等价性仍是一个开放的假设,需要更细致的考量和进一步的研究。

Formally Specifying the High-Level Behavior of LLM-Based Agents

  • paper_url: http://arxiv.org/abs/2310.08535
  • repo_url: None
  • paper_authors: Maxwell Crouse, Ibrahim Abdelaziz, Kinjal Basu, Soham Dan, Sadhana Kumaravel, Achille Fokoue, Pavan Kapanipathi, Luis Lastras
  • for: LLM-based agents are promising tools for solving challenging problems without the need for task-specific finetuned models.
  • methods: The proposed framework uses Linear Temporal Logic (LTL) to specify desired agent behaviors, and a constrained decoder to guarantee the LLM will produce an output exhibiting the desired behavior.
  • results: The framework enables rapid design, implementation, and experimentation with different LLM-based agents, and provides benefits such as the ability to enforce complex agent behavior, formally validate prompt examples, and incorporate content-focused logical constraints into generation. The approach leads to improvements in agent performance, and the code is released for general use.
  • for: 基于LLM的智能体是一类有前途的工具,能够在无需昂贵的任务特定微调模型的情况下解决具有挑战性的问题。
  • methods: 所提出的框架使用线性时序逻辑(LTL)来声明式地指定期望的智能体行为,并据此构建受约束的解码器,以保证LLM的输出展现出期望的行为。
  • results: 该框架支持快速设计、实现并试验不同的基于LLM的智能体,其优点包括能够强制执行复杂的智能体行为、对提示示例进行形式化验证,以及在生成中融入面向内容的逻辑约束。该方法能提升智能体性能,代码已公开发布。
    Abstract LLM-based agents have recently emerged as promising tools for solving challenging problems without the need for task-specific finetuned models that can be expensive to procure. Currently, the design and implementation of such agents is ad hoc, as the wide variety of tasks that LLM-based agents may be applied to naturally means there can be no one-size-fits-all approach to agent design. In this work we aim to alleviate the difficulty of designing and implementing new agents by proposing a minimalistic, high-level generation framework that simplifies the process of building agents. The framework we introduce allows the user to specify desired agent behaviors in Linear Temporal Logic (LTL). The declarative LTL specification is then used to construct a constrained decoder that guarantees the LLM will produce an output exhibiting the desired behavior. By designing our framework in this way, we obtain several benefits, including the ability to enforce complex agent behavior, the ability to formally validate prompt examples, and the ability to seamlessly incorporate content-focused logical constraints into generation. In particular, our declarative approach, in which the desired behavior is simply described without concern for how it should be implemented or enforced, enables rapid design, implementation and experimentation with different LLM-based agents. We demonstrate how the proposed framework can be used to implement recent LLM-based agents, and show how the guardrails our approach provides can lead to improvements in agent performance. In addition, we release our code for general use.
    摘要 我们提出的框架允许用户用线性时序逻辑(LTL)指定期望的智能体行为。随后,这一声明式的LTL规约被用于构建受约束的解码器,以保证LLM产生的输出展现出期望的行为。以这种方式设计框架带来了多方面的好处,包括能够强制执行复杂的智能体行为、能够对提示示例进行形式化验证,以及能够将面向内容的逻辑约束无缝融入生成过程。特别地,这种只描述期望行为、而无需关心其如何实现或强制执行的声明式方法,使得对不同的基于LLM的智能体进行快速设计、实现与实验成为可能。我们展示了如何用所提框架实现近期的基于LLM的智能体,并说明了该方法提供的"护栏"如何带来智能体性能的提升。此外,我们公开了代码以供使用。

How connectivity structure shapes rich and lazy learning in neural circuits

  • paper_url: http://arxiv.org/abs/2310.08513
  • repo_url: None
  • paper_authors: Yuhan Helena Liu, Aristide Baratin, Jonathan Cornford, Stefan Mihalas, Eric Shea-Brown, Guillaume Lajoie
  • for: 这篇论文借助深度学习工具,研究神经环路中连接结构如何塑造学习动态。
  • methods: 该文通过实证与理论分析,研究初始权重的结构(特别是其有效秩)如何影响网络的学习机制。
  • results: 研究发现,高秩初始化通常导致网络变化较小、更"懒惰"的学习机制,而低秩初始化则使学习偏向更"丰富"的机制;但当低秩初始化与任务和数据统计相一致时,仍可能出现更懒惰的学习。
    Abstract In theoretical neuroscience, recent work leverages deep learning tools to explore how some network attributes critically influence its learning dynamics. Notably, initial weight distributions with small (resp. large) variance may yield a rich (resp. lazy) regime, where significant (resp. minor) changes to network states and representation are observed over the course of learning. However, in biology, neural circuit connectivity generally has a low-rank structure and therefore differs markedly from the random initializations generally used for these studies. As such, here we investigate how the structure of the initial weights, in particular their effective rank, influences the network learning regime. Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks. Conversely, low-rank initialization biases learning towards richer learning. Importantly, however, as an exception to this rule, we find lazier learning can still occur with a low-rank initialization that aligns with task and data statistics. Our research highlights the pivotal role of initial weight structures in shaping learning regimes, with implications for metabolic costs of plasticity and risks of catastrophic forgetting.
    摘要 在理论神经科学中,近期的工作利用深度学习工具来探索网络的某些特性如何关键地影响其学习动态。值得注意的是,方差较小(或较大)的初始权重分布可能分别带来"丰富"(或"懒惰")的学习机制:前者在学习过程中网络状态和表示发生显著变化,后者变化轻微。然而,生物中的神经环路连接通常具有低秩结构,这与上述研究常用的随机初始化差异明显。因此,我们在此研究初始权重的结构,特别是其有效秩,如何影响网络的学习机制。通过实证与理论分析,我们发现高秩初始化通常导致较小的网络变化,表明学习更加懒惰,并用受实验驱动的循环神经网络初始连接验证了这一发现;相反,低秩初始化使学习偏向更丰富的机制。不过,作为该规律的例外,我们发现当低秩初始化与任务和数据统计相一致时,仍可能出现更懒惰的学习。我们的研究强调了初始权重结构在塑造学习机制中的关键作用,这对可塑性的代谢成本以及灾难性遗忘的风险都有启示意义。
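
The manipulation at the heart of the study — controlling the effective rank of the initial recurrent weights while keeping their overall scale comparable — is easy to reproduce: draw a full-rank Gaussian matrix, or build a rank-r one from an outer product of thin Gaussian factors, then rescale to a matched norm. A small NumPy sketch follows; the scaling convention is one reasonable choice, not necessarily the one used in the paper.

```python
import numpy as np

def init_recurrent(n, rank=None, gain=1.0, seed=0):
    """Gaussian recurrent weights with an optionally restricted rank, rescaled to a target norm."""
    rng = np.random.default_rng(seed)
    if rank is None or rank >= n:
        w = rng.normal(0, 1.0 / np.sqrt(n), size=(n, n))       # standard full-rank init
    else:
        u = rng.normal(size=(n, rank))
        v = rng.normal(size=(n, rank))
        w = u @ v.T / np.sqrt(n * rank)                         # low-rank init
    return gain * w * (np.sqrt(n) / np.linalg.norm(w, 'fro'))   # match Frobenius norm

def effective_rank(w):
    s = np.linalg.svd(w, compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))        # entropy-based effective rank

for r in [None, 5, 1]:
    w = init_recurrent(200, rank=r)
    print(f"rank={r}: ||W||_F={np.linalg.norm(w):.2f}, effective rank={effective_rank(w):.1f}")
```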

HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science

  • paper_url: http://arxiv.org/abs/2310.08511
  • repo_url: https://github.com/BangLab-UdeM-Mila/NLP4MatSci-HoneyBee
  • paper_authors: Yu Song, Santiago Miret, Huan Zhang, Bang Liu
  • for: 本研究的目的是提出一种信任worthy数据准备过程(MatSci-Instruct),并应用其在语言模型中进行迭代优化(HoneyBee),以解决物理科学领域的数据准备问题。
  • methods: 本研究使用了多个商业可用的大语言模型(如Chat-GPT和Claude),通过Instructor模块和Verifier模块的合作,提高生成的数据的可靠性和相关性。
  • results: 本研究通过MatSci-Instruct来构建多个任务的数据集,并评估了数据集的质量从多个维度,包括准确性、相关性、完整性和合理性。此外,本研究还通过迭代生成更加定向的指令和指令数据来进行迭代优化,以达到进一步改进HoneyBee模型的性能。
    Abstract We propose an instruction-based process for trustworthy data curation in materials science (MatSci-Instruct), which we then apply to finetune a LLaMa-based language model targeted for materials science (HoneyBee). MatSci-Instruct helps alleviate the scarcity of relevant, high-quality materials science textual data available in the open literature, and HoneyBee is the first billion-parameter language model specialized to materials science. In MatSci-Instruct we improve the trustworthiness of generated data by prompting multiple commercially available large language models for generation with an Instructor module (e.g. Chat-GPT) and verification from an independent Verifier module (e.g. Claude). Using MatSci-Instruct, we construct a dataset of multiple tasks and measure the quality of our dataset along multiple dimensions, including accuracy against known facts, relevance to materials science, as well as completeness and reasonableness of the data. Moreover, we iteratively generate more targeted instructions and instruction-data in a finetuning-evaluation-feedback loop leading to progressively better performance for our finetuned HoneyBee models. Our evaluation on the MatSci-NLP benchmark shows HoneyBee's outperformance of existing language models on materials science tasks and iterative improvement in successive stages of instruction-data refinement. We study the quality of HoneyBee's language modeling through automatic evaluation and analyze case studies to further understand the model's capabilities and limitations. Our code and relevant datasets are publicly available at \url{https://github.com/BangLab-UdeM-Mila/NLP4MatSci-HoneyBee}.
    摘要 我们提出了一种面向材料科学的基于指令的可信数据整理流程(MatSci-Instruct),并将其用于微调一个面向材料科学的基于LLaMa的语言模型(HoneyBee)。MatSci-Instruct有助于缓解公开文献中相关、高质量材料科学文本数据稀缺的问题,而HoneyBee则是首个专门面向材料科学的十亿参数级语言模型。在MatSci-Instruct中,我们通过指导者(Instructor)模块(如Chat-GPT)提示多个商用大语言模型生成数据,并由独立的验证者(Verifier)模块(如Claude)进行验证,来提高生成数据的可信度。借助MatSci-Instruct,我们构建了涵盖多个任务的数据集,并从多个维度衡量数据质量,包括与已知事实的一致性、与材料科学的相关性,以及数据的完整性与合理性。此外,我们在"微调-评估-反馈"循环中迭代生成更有针对性的指令与指令数据,使微调后的HoneyBee模型性能逐步提升。在MatSci-NLP基准上的评估显示,HoneyBee在材料科学任务上优于现有语言模型,并随着指令数据的逐轮精炼持续改进。我们通过自动评估考察了HoneyBee的语言建模质量,并通过案例分析进一步理解该模型的能力与局限。我们的代码和相关数据集公开于 https://github.com/BangLab-UdeM-Mila/NLP4MatSci-HoneyBee 。

Impact of time and note duration tokenizations on deep learning symbolic music modeling

  • paper_url: http://arxiv.org/abs/2310.08497
  • repo_url: https://github.com/Natooz/music-modeling-time-duration
  • paper_authors: Nathan Fradet, Nicolas Gutowski, Fabien Chhel, Jean-Pierre Briot
  • for: 本研究旨在研究Symbolic music在深度学习任务中的应用,包括生成、识别、合成和Music Information Retrieval(MIR)等。
  • methods: 本研究使用了不同的tokenization方法,包括时间和音长表示方法,以研究这些方法对Transformer模型的表现的影响。
  • results: 研究发现,显式表示能否带来更好的结果取决于具体任务;时间与音长的不同表示方式在各任务上的表现存在差异。
    Abstract Symbolic music is widely used in various deep learning tasks, including generation, transcription, synthesis, and Music Information Retrieval (MIR). It is mostly employed with discrete models like Transformers, which require music to be tokenized, i.e., formatted into sequences of distinct elements called tokens. Tokenization can be performed in different ways. As Transformer can struggle at reasoning, but capture more easily explicit information, it is important to study how the way the information is represented for such model impact their performances. In this work, we analyze the common tokenization methods and experiment with time and note duration representations. We compare the performances of these two impactful criteria on several tasks, including composer and emotion classification, music generation, and sequence representation learning. We demonstrate that explicit information leads to better results depending on the task.
    摘要 符号音乐被广泛用于生成、转写、合成以及音乐信息检索(MIR)等多种深度学习任务。它通常与Transformer等离散模型配合使用,这就要求将音乐分词化,即格式化为由被称为token的离散元素组成的序列。分词可以用多种方式进行。由于Transformer在推理方面可能较弱,但更容易捕捉显式信息,因此研究信息的表示方式如何影响这类模型的性能十分重要。在本工作中,我们分析了常见的分词方法,并对时间与音符时值的不同表示进行了实验。我们在作曲家分类、情感分类、音乐生成和序列表示学习等多个任务上比较了这两个关键设计因素的影响,结果表明显式信息能否带来更好的结果取决于具体任务。
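
The two design axes studied — how time is advanced and whether note length is encoded explicitly — can be seen on a single two-note fragment. Below, the same notes are serialized once with TimeShift + NoteOff events (duration left implicit) and once with explicit Duration tokens; token names are illustrative and do not follow any particular tokenizer's vocabulary exactly.

```python
# Two notes: (pitch, start, end) in beats.
notes = [(60, 0.0, 1.0), (64, 1.0, 1.5)]

def tokenize_timeshift(notes):
    """Implicit durations: a note is closed by a later NoteOff after TimeShift events."""
    events, clock, tokens = [], 0.0, []
    for pitch, start, end in notes:
        events += [(start, f"NoteOn_{pitch}"), (end, f"NoteOff_{pitch}")]
    for t, name in sorted(events):
        if t > clock:
            tokens.append(f"TimeShift_{t - clock:g}")
            clock = t
        tokens.append(name)
    return tokens

def tokenize_duration(notes):
    """Explicit durations: each note carries its own Duration token, no NoteOff needed."""
    clock, tokens = 0.0, []
    for pitch, start, end in sorted(notes, key=lambda n: n[1]):
        if start > clock:
            tokens.append(f"TimeShift_{start - clock:g}")
            clock = start
        tokens += [f"Pitch_{pitch}", f"Duration_{end - start:g}"]
    return tokens

print(tokenize_timeshift(notes))
print(tokenize_duration(notes))
```

The second encoding makes each note's length explicit in a single token, while the first forces the model to infer it by pairing NoteOn/NoteOff across intervening events — exactly the kind of implicit-versus-explicit trade-off the paper evaluates per task.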

Can We Edit Multimodal Large Language Models?

  • paper_url: http://arxiv.org/abs/2310.08475
  • repo_url: https://github.com/zjunlp/easyedit
  • paper_authors: Siyuan Cheng, Bozhong Tian, Qingbin Liu, Xi Chen, Yongheng Wang, Huajun Chen, Ningyu Zhang
  • for: 这篇论文主要关注多模态大语言模型(MLLM)的编辑。与单模态模型编辑相比,多模态模型编辑更具挑战性,编辑过程需要更高程度的审视与更谨慎的考量。为促进该领域的研究,我们构建了一个新的基准MMEdit,并设计了一套创新的评价指标。
  • methods: 我们在这篇论文中考察了多种模型编辑基线与评价指标,并进行了广泛的实验,分析了编辑多模态LLM不同组件所带来的影响。
  • results: 实验结果表明,先前的基线能够在一定程度上编辑多模态LLM,但效果仍不尽如人意,说明这一任务可能相当困难。我们希望这项研究能为NLP社区提供新的启发。代码和数据集可在 https://github.com/zjunlp/EasyEdit 下载。
    Abstract In this paper, we focus on editing Multimodal Large Language Models (MLLMs). Compared to editing single-modal LLMs, multimodal model editing is more challenging, which demands a higher level of scrutiny and careful consideration in the editing process. To facilitate research in this area, we construct a new benchmark, dubbed MMEdit, for editing multimodal LLMs and establishing a suite of innovative metrics for evaluation. We conduct comprehensive experiments involving various model editing baselines and analyze the impact of editing different components for multimodal LLMs. Empirically, we notice that previous baselines can implement editing multimodal LLMs to some extent, but the effect is still barely satisfactory, indicating the potential difficulty of this task. We hope that our work can provide the NLP community with insights. Code and dataset are available in https://github.com/zjunlp/EasyEdit.
    摘要 在这篇论文中,我们关注多模态大语言模型(MLLM)的编辑。与单模态LLM的编辑相比,多模态模型编辑更具挑战性,编辑过程需要更高程度的审视与谨慎考量。为促进这一领域的研究,我们构建了一个新的基准MMEdit,用于编辑多模态LLM,并建立了一套创新的评价指标。我们开展了涉及多种模型编辑基线的全面实验,并分析了编辑多模态LLM不同组件所产生的影响。实验表明,先前的基线能够在一定程度上实现对多模态LLM的编辑,但效果仍不尽如人意,表明这项任务具有潜在的难度。我们希望这项工作能为NLP社区带来启发。代码和数据集可在 https://github.com/zjunlp/EasyEdit 上获取。

Belief formation and the persistence of biased beliefs

  • paper_url: http://arxiv.org/abs/2310.08466
  • repo_url: None
  • paper_authors: Olivier Compte
  • for: 本研究旨在刻画智能体在区分两种理论时如何处理证据,以及对弱证据的审查(忽略)如何导致持续存在的有偏信念。
  • methods: 本研究提出了一个信念形成模型:智能体试图区分两种理论,而证实性证据与驳斥性证据在强度上的不对称,使信念偏向那些能产生强(可能罕见)证实证据和弱(常见)驳斥证据的理论。
  • results: 研究发现,由于信息处理能力受限,智能体有动机审查(忽略)弱证据,致使某些判别问题中的证据变得基本一边倒,而与真实理论无关;老练的智能体不会被这种积累的"证据"误导,但不够老练的智能体最终会形成有偏的信念。
    Abstract We propose a belief-formation model where agents attempt to discriminate between two theories, and where the asymmetry in strength between confirming and disconfirming evidence tilts beliefs in favor of theories that generate strong (and possibly rare) confirming evidence and weak (and frequent) disconfirming evidence. In our model, limitations on information processing provide incentives to censor weak evidence, with the consequence that for some discrimination problems, evidence may become mostly one-sided, independently of the true underlying theory. Sophisticated agents who know the characteristics of the censored data-generating process are not lured by this accumulation of ``evidence'', but less sophisticated ones end up with biased beliefs.
    摘要 我们提出了一个信念形成模型:智能体试图区分两种理论,而证实性证据与驳斥性证据在强度上的不对称,会使信念偏向那些能产生强(且可能罕见)证实证据和弱(且常见)驳斥证据的理论。在我们的模型中,信息处理能力的限制带来了审查(忽略)弱证据的激励,其后果是:对于某些判别问题,无论真实理论为何,证据都可能变得基本一边倒。了解被审查数据生成过程特性的老练智能体不会被这种"证据"的积累所误导,而不够老练的智能体最终会形成有偏的信念。
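
The mechanism can be made concrete with a small simulation: a Bayesian agent compares theory A (which occasionally produces a strong confirming signal) with theory B, but discards any signal whose log-likelihood ratio is below a censoring threshold. An agent with full information correctly drifts toward B, while a naive agent that updates only on the surviving (censored) evidence drifts toward A. All parameter values below are arbitrary illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# The two theories differ only in how often a "strong confirming" signal appears.
q_a, q_b = 0.30, 0.05          # P(strong signal | A), P(strong signal | B)
llr_strong = np.log(q_a / q_b)               # ~ +1.79: strong, rare evidence for A
llr_weak = np.log((1 - q_a) / (1 - q_b))     # ~ -0.31: weak, frequent evidence for B

n = 2000
signals = rng.random(n) < q_b                # ground truth: theory B generates the data
llrs = np.where(signals, llr_strong, llr_weak)

full_info = np.cumsum(llrs)                  # agent that processes every signal
censored = np.cumsum(np.where(np.abs(llrs) >= 0.5, llrs, 0.0))  # drops weak evidence

print("final log-odds (A vs B), full information:", round(full_info[-1], 1))      # negative: favors B
print("final log-odds (A vs B), weak evidence censored:", round(censored[-1], 1))  # positive: biased toward A
```

A sophisticated agent that knows the censoring rule could discount the surviving signals accordingly; the naive cumulative sum above is what produces the persistent bias.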

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2310.08461
  • repo_url: None
  • paper_authors: Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal
  • for: 本论文旨在通过推测解码(SD)加速大语言模型推理:先用更快的草稿模型生成多个token,再由更大的目标模型并行验证,从而按目标模型的分布生成文本。
  • methods: 该方法(DistillSpec)在应用SD之前,先利用知识蒸馏使草稿模型与目标模型更好地对齐;其关键设计包括使用草稿模型生成的符合策略(on-policy)数据,以及根据任务和解码策略定制散度函数。
  • results: 该方法在一系列标准基准上相比标准SD取得了10%到45%的加速,适用于贪心与非贪心采样;它还可与有损SD结合,以细粒度地控制解码延迟与任务性能之间的权衡。在模型规模不一的实际场景中,先用蒸馏提升目标模型,再用DistillSpec训练对齐良好的草稿模型,可将解码延迟降低6-10倍,而性能几乎没有损失。
    Abstract Speculative decoding (SD) accelerates large language model inference by employing a faster draft model for generating multiple tokens, which are then verified in parallel by the larger target model, resulting in the text generated according to the target model distribution. However, identifying a compact draft model that is well-aligned with the target model is challenging. To tackle this issue, we propose DistillSpec that uses knowledge distillation to better align the draft model with the target model, before applying SD. DistillSpec makes two key design choices, which we demonstrate via systematic study to be crucial to improving the draft and target alignment: utilizing on-policy data generation from the draft model, and tailoring the divergence function to the task and decoding strategy. Notably, DistillSpec yields impressive 10 - 45% speedups over standard SD on a range of standard benchmarks, using both greedy and non-greedy sampling. Furthermore, we combine DistillSpec with lossy SD to achieve fine-grained control over the latency vs. task performance trade-off. Finally, in practical scenarios with models of varying sizes, first using distillation to boost the performance of the target model and then applying DistillSpec to train a well-aligned draft model can reduce decoding latency by 6-10x with minimal performance drop, compared to standard decoding without distillation.
    摘要 推测解码(SD)通过使用更快的草稿模型生成多个token,再由更大的目标模型并行验证,从而加速大语言模型推理,并保证生成的文本服从目标模型的分布。然而,找到一个小巧且与目标模型良好对齐的草稿模型并不容易。为了解决这一问题,我们提出了DistillSpec:在应用SD之前,先利用知识蒸馏使草稿模型与目标模型更好地对齐。DistillSpec做出了两项关键设计,我们通过系统研究证明它们对改进草稿模型与目标模型的对齐至关重要:其一,使用草稿模型生成的符合策略(on-policy)数据;其二,根据任务和解码策略定制散度函数。值得注意的是,在一系列标准基准上,无论采用贪心还是非贪心采样,DistillSpec都能在标准SD之上带来10%-45%的显著加速。此外,我们将DistillSpec与有损SD结合,以细粒度地控制延迟与任务性能之间的权衡。最后,在模型规模不一的实际场景中,先用蒸馏提升目标模型的性能,再用DistillSpec训练对齐良好的草稿模型,与不使用蒸馏的标准解码相比,可将解码延迟降低6-10倍,而性能几乎没有下降。
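
For context, the speculative-decoding acceptance rule that both SD and DistillSpec rely on is short: each drafted token x is accepted with probability min(1, p_target(x)/p_draft(x)), and on rejection a replacement is sampled from the normalized residual max(0, p_target − p_draft), which keeps the output distributed exactly as the target model. A NumPy sketch over toy next-token distributions (not tied to any particular model) is shown below; better draft–target alignment directly raises the acceptance rate, which is the quantity DistillSpec improves.

```python
import numpy as np

def speculative_accept(draft_token, p_draft, p_target, rng):
    """Accept/reject one drafted token so the result is distributed as p_target."""
    if rng.random() < min(1.0, p_target[draft_token] / p_draft[draft_token]):
        return draft_token, True
    residual = np.maximum(p_target - p_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual), False

rng = np.random.default_rng(0)
vocab = 5
p_target = np.array([0.50, 0.20, 0.15, 0.10, 0.05])
p_draft = np.array([0.40, 0.35, 0.10, 0.10, 0.05])   # a (mis)aligned draft distribution

accepted, samples = 0, []
for _ in range(20000):
    x = rng.choice(vocab, p=p_draft)                  # token proposed by the draft model
    token, ok = speculative_accept(x, p_draft, p_target, rng)
    accepted += ok
    samples.append(token)

print("acceptance rate:", accepted / len(samples))    # higher when draft ~ target
print("empirical output dist:", np.bincount(samples, minlength=vocab) / len(samples))
# The empirical distribution matches p_target regardless of the draft model.
```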

A Survey of Heterogeneous Transfer Learning

  • paper_url: http://arxiv.org/abs/2310.08459
  • repo_url: https://github.com/ymsun99/Heterogeneous-Transfer-Learning
  • paper_authors: Runxue Bao, Yiming Sun, Yuhe Gao, Jindong Wang, Qiang Yang, Haifeng Chen, Zhi-Hong Mao, Ye Ye
  • for: 本研究旨在对异构迁移学习方法的最新进展进行全面综述,为后续研究提供系统性的参考。
  • methods: 本文梳理了面向不同学习场景的异构迁移学习方法,并讨论了现有研究的局限性。
  • results: 本文还覆盖了自然语言处理、计算机视觉、多模态和生物医学等多种应用场景,以促进对该领域的深入理解并推动未来研究。
    Abstract The application of transfer learning, an approach utilizing knowledge from a source domain to enhance model performance in a target domain, has seen a tremendous rise in recent years, underpinning many real-world scenarios. The key to its success lies in the shared common knowledge between the domains, a prerequisite in most transfer learning methodologies. These methods typically presuppose identical feature spaces and label spaces in both domains, known as homogeneous transfer learning, which, however, is not always a practical assumption. Oftentimes, the source and target domains vary in feature spaces, data distributions, and label spaces, making it challenging or costly to secure source domain data with identical feature and label spaces as the target domain. Arbitrary elimination of these differences is not always feasible or optimal. Thus, heterogeneous transfer learning, acknowledging and dealing with such disparities, has emerged as a promising approach for a variety of tasks. Despite the existence of a survey in 2017 on this topic, the fast-paced advances post-2017 necessitate an updated, in-depth review. We therefore present a comprehensive survey of recent developments in heterogeneous transfer learning methods, offering a systematic guide for future research. Our paper reviews methodologies for diverse learning scenarios, discusses the limitations of current studies, and covers various application contexts, including Natural Language Processing, Computer Vision, Multimodality, and Biomedicine, to foster a deeper understanding and spur future research.
    摘要 迁移学习利用源领域的知识来提升目标领域的模型性能,近年来应用迅速增长,支撑了许多实际场景。其成功的关键在于两个领域之间共享的知识,这是大多数迁移学习方法的前提。这些方法通常假设源领域与目标领域具有相同的特征空间和标签空间,即同构迁移学习,但这一假设在实践中并不总是成立。源领域与目标领域往往在特征空间、数据分布和标签空间上存在差异,要获得与目标领域特征和标签空间完全一致的源领域数据既困难又昂贵,而强行消除这些差异也未必可行或最优。因此,承认并处理这些差异的异构迁移学习逐渐成为一类很有前景的方法。尽管2017年已有一篇相关综述,但此后该领域发展迅速,亟需一份更新、更深入的回顾。为此,我们对异构迁移学习方法的最新进展进行了全面综述,为未来研究提供系统性的指引。本文回顾了面向多种学习场景的方法,讨论了现有研究的局限性,并覆盖自然语言处理、计算机视觉、多模态和生物医学等多种应用场景,以促进更深入的理解并推动后续研究。

Metrics for popularity bias in dynamic recommender systems

  • paper_url: http://arxiv.org/abs/2310.08455
  • repo_url: None
  • paper_authors: Valentijn Braun, Debarati Bhaumik, Diptish Dey
  • for: 这篇论文主要目标是量化推荐系统中的不公正和偏见。
  • methods: 论文提出了四种度量推荐系统中受欢迎性偏见的指标,并在两个常用的 benchmark 数据集上测试了四种 collaborative filtering 算法。
  • results: 测试结果表明,提出的度量指标可以为推荐系统的不公正和偏见提供全面的理解,并且在不同的敏感用户群体中存在增长的差距。
    Abstract Albeit the widespread application of recommender systems (RecSys) in our daily lives, rather limited research has been done on quantifying unfairness and biases present in such systems. Prior work largely focuses on determining whether a RecSys is discriminating or not but does not compute the amount of bias present in these systems. Biased recommendations may lead to decisions that can potentially have adverse effects on individuals, sensitive user groups, and society. Hence, it is important to quantify these biases for fair and safe commercial applications of these systems. This paper focuses on quantifying popularity bias that stems directly from the output of RecSys models, leading to over recommendation of popular items that are likely to be misaligned with user preferences. Four metrics to quantify popularity bias in RescSys over time in dynamic setting across different sensitive user groups have been proposed. These metrics have been demonstrated for four collaborative filtering based RecSys algorithms trained on two commonly used benchmark datasets in the literature. Results obtained show that the metrics proposed provide a comprehensive understanding of growing disparities in treatment between sensitive groups over time when used conjointly.
    摘要 Despite the widespread use of recommender systems (RecSys), little prior work quantifies the unfairness and biases present in them. This paper addresses that gap by quantifying popularity bias, which stems directly from the output of RecSys models and leads to the over-recommendation of popular items that may be misaligned with user preferences. To this end, four metrics are proposed to quantify popularity bias in RecSys over time, in a dynamic setting, across different sensitive user groups. The metrics are demonstrated for four collaborative filtering-based RecSys algorithms trained on two commonly used benchmark datasets. The results show that, used conjointly, the proposed metrics provide a comprehensive picture of the growing disparities in treatment between sensitive groups over time, contributing a quantitative approach for identifying and mitigating popularity bias in fair and safe commercial RecSys.
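
The abstract does not spell out the four metrics, so the snippet below only illustrates the general idea with one commonly used quantity: the average popularity of recommended items, tracked per sensitive user group and per time step. The group labels, popularity counts and recommendation lists are invented toy data, not anything from the paper.

```python
from collections import defaultdict

# Toy historical interaction counts per item (its "popularity").
item_popularity = {"i1": 500, "i2": 120, "i3": 30, "i4": 5}

# Toy top-k recommendation lists per (time step, user), plus a sensitive group label per user.
recommendations = {
    (0, "u1"): ["i1", "i2"], (0, "u2"): ["i1", "i4"],
    (1, "u1"): ["i1", "i3"], (1, "u2"): ["i2", "i4"],
}
user_group = {"u1": "group_A", "u2": "group_B"}

def avg_recommended_popularity(recs, popularity, groups):
    """Average popularity of recommended items, per time step and sensitive group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (t, user), items in recs.items():
        g = groups[user]
        for item in items:
            sums[(t, g)] += popularity[item]
            counts[(t, g)] += 1
    return {key: sums[key] / counts[key] for key in sums}

scores = avg_recommended_popularity(recommendations, item_popularity, user_group)
for (t, g), value in sorted(scores.items()):
    print(f"t={t} {g}: avg popularity of recommended items = {value:.1f}")
# A gap between groups that widens over time is the kind of dynamic disparity
# the paper's metrics are designed to expose.
```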

Towards Robust Multi-Modal Reasoning via Model Selection

  • paper_url: http://arxiv.org/abs/2310.08446
  • repo_url: None
  • paper_authors: Xiangyan Liu, Rongxue Li, Wei Ji, Tao Lin
  • For: This paper aims to improve the robustness of multi-modal agents in multi-step reasoning by addressing the challenge of model selection.
  • Methods: The paper proposes the $\textit{M}^3$ framework, a plug-in with negligible runtime overhead at test-time, to improve model selection and bolster the robustness of multi-modal agents.
  • Results: The paper creates a new dataset, MS-GQA, to investigate the model selection challenge in multi-modal agents and shows that the proposed framework enables dynamic model selection, considering both user inputs and subtask dependencies, thereby robustifying the overall reasoning process.
    Abstract The reasoning capabilities of LLM (Large Language Model) are widely acknowledged in recent research, inspiring studies on tool learning and autonomous agents. LLM serves as the "brain" of agent, orchestrating multiple tools for collaborative multi-step task solving. Unlike methods invoking tools like calculators or weather APIs for straightforward tasks, multi-modal agents excel by integrating diverse AI models for complex challenges. However, current multi-modal agents neglect the significance of model selection: they primarily focus on the planning and execution phases, and will only invoke predefined task-specific models for each subtask, making the execution fragile. Meanwhile, other traditional model selection methods are either incompatible with or suboptimal for the multi-modal agent scenarios, due to ignorance of dependencies among subtasks arising by multi-step reasoning. To this end, we identify the key challenges therein and propose the $\textit{M}^3$ framework as a plug-in with negligible runtime overhead at test-time. This framework improves model selection and bolsters the robustness of multi-modal agents in multi-step reasoning. In the absence of suitable benchmarks, we create MS-GQA, a new dataset specifically designed to investigate the model selection challenge in multi-modal agents. Our experiments reveal that our framework enables dynamic model selection, considering both user inputs and subtask dependencies, thereby robustifying the overall reasoning process. Our code and benchmark: https://github.com/LINs-lab/M3.
    摘要 大量语言模型(LLM)的智能能力在最新的研究中得到了广泛认可,激发了工具学习和自主代理研究。LLM作为代理的“脑”,整合多种工具进行合作多步任务解决。与传统的方法不同,现在的多模态代理忽略了模型选择的重要性:它们主要关注计划和执行阶段,只在每个子任务中预先定义任务特定的模型,使执行过程脆弱。此外,传统的模型选择方法在多模态代理场景中不兼容或优化不够,因为忽略了由多步逻辑导致的任务依赖关系。为了解决这些挑战,我们认为需要一个可插入的框架,具有较少的运行时开销。我们称之为$\textit{M}^3$框架,它可以在测试时进行插入。这个框架改进了模型选择,使多模态代理在多步逻辑中更加稳定。由于缺乏适当的benchmark,我们创建了MS-GQA数据集,用于调查多模态代理中模型选择挑战的问题。我们的实验表明,我们的框架可以动态选择模型,考虑用户输入和子任务依赖关系,从而强化整体逻辑过程的稳定性。我们的代码和benchmark可以在GitHub上找到:https://github.com/LINs-lab/M3。

Debias the Training of Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.08442
  • repo_url: None
  • paper_authors: Hu Yu, Li Shen, Jie Huang, Man Zhou, Hongsheng Li, Feng Zhao
  • for: 提高Diffusion模型的生成质量
  • methods: 提出了一种有效的权重调整策略,以解决常用的损失函数策略带来的偏见问题
  • results: 通过理论分析和实验评估,证明了该策略可以减少偏见问题,并提高样本质量和生成效率
    Abstract Diffusion models have demonstrated compelling generation quality by optimizing the variational lower bound through a simple denoising score matching loss. In this paper, we provide theoretical evidence that the prevailing practice of using a constant loss weight strategy in diffusion models leads to biased estimation during the training phase. Simply optimizing the denoising network to predict Gaussian noise with constant weighting may hinder precise estimations of original images. To address the issue, we propose an elegant and effective weighting strategy grounded in the theoretically unbiased principle. Moreover, we conduct a comprehensive and systematic exploration to dissect the inherent bias problem deriving from constant weighting loss from the perspectives of its existence, impact and reasons. These analyses are expected to advance our understanding and demystify the inner workings of diffusion models. Through empirical evaluation, we demonstrate that our proposed debiased estimation method significantly enhances sample quality without the reliance on complex techniques, and exhibits improved efficiency compared to the baseline method both in training and sampling processes.
    摘要 扩散模型通过一个简单的去噪分数匹配损失来优化变分下界,展现出令人信服的生成质量。本文给出理论证据,表明扩散模型中普遍采用的常数损失权重策略会在训练阶段导致有偏的估计:仅用常数权重优化去噪网络去预测高斯噪声,可能妨碍对原始图像的精确估计。为解决这一问题,我们基于理论上无偏的原则,提出一种简洁而有效的加权策略。此外,我们从存在性、影响和成因三个角度,对常数权重损失带来的内在偏差问题进行了系统而全面的剖析,希望有助于加深对扩散模型内部机制的理解。实验评估表明,所提出的去偏估计方法无需依赖复杂技巧即可显著提升样本质量,并且在训练和采样过程中都比基线方法更高效。
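
To make the constant-weight versus reweighted-loss distinction concrete, here is a toy epsilon-prediction training objective in which the per-timestep weight is an explicit argument. The specific unbiased weighting derived in the paper is not reproduced; `weight_fn` is a placeholder that a caller would swap for the paper's scheme (a constant function recovers the standard objective, and the SNR-style weight is shown only for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alphas_bar = np.cumprod(1.0 - betas)          # \bar{alpha}_t of the forward diffusion

def diffusion_loss(x0, predict_eps, weight_fn):
    """One Monte-Carlo estimate of the weighted denoising loss E_t[ w(t) * ||eps - eps_hat||^2 ]."""
    t = int(rng.integers(0, T))
    eps = rng.standard_normal(x0.shape)
    # Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    eps_hat = predict_eps(x_t, t)
    return weight_fn(t) * np.mean((eps - eps_hat) ** 2)

# Stand-in "network" that just echoes the noisy input (a real model would be a trained U-Net).
predict_eps = lambda x_t, t: x_t

constant_weight = lambda t: 1.0                                   # the common constant-weight objective
snr_weight = lambda t: alphas_bar[t] / (1.0 - alphas_bar[t])      # an SNR-style reweighting, illustration only

x0 = rng.standard_normal((8, 8))
print("constant-weight loss:", diffusion_loss(x0, predict_eps, constant_weight))
print("reweighted loss:     ", diffusion_loss(x0, predict_eps, snr_weight))
```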

The Impact of Explanations on Fairness in Human-AI Decision-Making: Protected vs Proxy Features

  • paper_url: http://arxiv.org/abs/2310.08617
  • repo_url: None
  • paper_authors: Navita Goyal, Connor Baumler, Tin Nguyen, Hal Daumé III
  • for: 本研究旨在考察受保护特征与代理特征对参与者感知模型公平性的影响,以及它们在多大程度上帮助人们相比仅依赖AI时改善人口统计均等(demographic parity)。
  • methods: 本研究使用了不同的treatments,包括解释、模型偏见披露和代理相关性披露,以影响人们对模型公平性的识别和决策公平性。
  • results: 研究发现,解释可以帮助人们检测直接偏见,但不能帮助人们检测间接偏见。此外,无论偏见类型如何,解释都会增加对模型偏见的同意。披露可以减轻这种效果,提高不公正认知和决策公平性。
    Abstract AI systems have been known to amplify biases in real world data. Explanations may help human-AI teams address these biases for fairer decision-making. Typically, explanations focus on salient input features. If a model is biased against some protected group, explanations may include features that demonstrate this bias, but when biases are realized through proxy features, the relationship between this proxy feature and the protected one may be less clear to a human. In this work, we study the effect of the presence of protected and proxy features on participants' perception of model fairness and their ability to improve demographic parity over an AI alone. Further, we examine how different treatments -- explanations, model bias disclosure and proxy correlation disclosure -- affect fairness perception and parity. We find that explanations help people detect direct biases but not indirect biases. Additionally, regardless of bias type, explanations tend to increase agreement with model biases. Disclosures can help mitigate this effect for indirect biases, improving both unfairness recognition and the decision-making fairness. We hope that our findings can help guide further research into advancing explanations in support of fair human-AI decision-making.

Neural Sampling in Hierarchical Exponential-family Energy-based Models

  • paper_url: http://arxiv.org/abs/2310.08431
  • repo_url: None
  • paper_authors: Xingsi Dong, Si Wu
  • for: 这个论文旨在探讨脑海的推理和学习方法。
  • methods: 该论文提出了 Hierarchical Exponential-family Energy-based(HEE)模型,该模型可以同时进行推理和学习,并且可以通过采样神经元响应的梯度来估计归一化函数。
  • results: 该模型可以快速地进行推理和学习,并且可以在自然图像 datasets 上显示出类似于生物视觉系统中的表示。此外,该模型还可以通过 marginal generation 或 joint generation 生成观察结果,并且 marginal generation 可以达到与其他 EBMs 相同的性能。
    Abstract Bayesian brain theory suggests that the brain employs generative models to understand the external world. The sampling-based perspective posits that the brain infers the posterior distribution through samples of stochastic neuronal responses. Additionally, the brain continually updates its generative model to approach the true distribution of the external world. In this study, we introduce the Hierarchical Exponential-family Energy-based (HEE) model, which captures the dynamics of inference and learning. In the HEE model, we decompose the partition function into individual layers and leverage a group of neurons with shorter time constants to sample the gradient of the decomposed normalization term. This allows our model to estimate the partition function and perform inference simultaneously, circumventing the negative phase encountered in conventional energy-based models (EBMs). As a result, the learning process is localized both in time and space, and the model is easy to converge. To match the brain's rapid computation, we demonstrate that neural adaptation can serve as a momentum term, significantly accelerating the inference process. On natural image datasets, our model exhibits representations akin to those observed in the biological visual system. Furthermore, for the machine learning community, our model can generate observations through joint or marginal generation. We show that marginal generation outperforms joint generation and achieves performance on par with other EBMs.
    摘要 贝叶斯脑理论认为,大脑借助生成模型来理解外部世界。基于采样的观点认为,大脑通过随机神经元响应的样本来推断后验分布,并不断更新其生成模型,使之逼近外部世界的真实分布。本研究提出层次指数族能量模型(Hierarchical Exponential-family Energy-based, HEE),用以刻画推断与学习的动力学。在HEE模型中,我们将配分函数分解到各个层次,并利用一组时间常数更短的神经元来采样分解后的归一化项的梯度。这使模型能够在进行推断的同时估计配分函数,从而避开传统能量模型(EBM)中的负相位问题;学习过程因此在时间和空间上都是局部的,模型也易于收敛。为匹配大脑的快速计算,我们证明神经适应可以起到动量项的作用,显著加速推断过程。在自然图像数据集上,模型学到的表征与生物视觉系统中观察到的表征相似。此外,对机器学习社区而言,该模型可以通过联合生成或边缘生成来产生观测;我们表明边缘生成优于联合生成,并达到与其他EBM相当的性能。

DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing

  • paper_url: http://arxiv.org/abs/2310.08785
  • repo_url: https://github.com/yueming6568/deltaedit
  • paper_authors: Yueming Lyu, Kang Zhao, Bo Peng, Yue Jiang, Yingya Zhang, Jing Dong
  • for: 文章主要旨在提高文本引导图像修改的训练和推理灵活性。
  • methods: 文章提出一种基于 CLIP DeltaSpace 的新方法 DeltaEdit:在训练阶段将 CLIP 视觉特征差映射到生成模型潜在空间的方向上,在推理阶段则由 CLIP 文本特征差预测潜在空间方向。
  • results: 实验证明,DeltaEdit 可以在不同的生成模型(包括 GAN 模型和扩散模型)上实现灵活的文本引导图像编辑,并能对不同的文本描述进行零样本推理。
    Abstract Text-guided image editing faces significant challenges to training and inference flexibility. Much literature collects large amounts of annotated image-text pairs to train text-conditioned generative models from scratch, which is expensive and not efficient. After that, some approaches that leverage pre-trained vision-language models are put forward to avoid data collection, but they are also limited by either per text-prompt optimization or inference-time hyper-parameters tuning. To address these issues, we investigate and identify a specific space, referred to as CLIP DeltaSpace, where the CLIP visual feature difference of two images is semantically aligned with the CLIP textual feature difference of their corresponding text descriptions. Based on DeltaSpace, we propose a novel framework called DeltaEdit, which maps the CLIP visual feature differences to the latent space directions of a generative model during the training phase, and predicts the latent space directions from the CLIP textual feature differences during the inference phase. And this design endows DeltaEdit with two advantages: (1) text-free training; (2) generalization to various text prompts for zero-shot inference. Extensive experiments validate the effectiveness and versatility of DeltaEdit with different generative models, including both the GAN model and the diffusion model, in achieving flexible text-guided image editing. Code is available at https://github.com/Yueming6568/DeltaEdit.
    摘要 文本引导的图像编辑在训练和推理灵活性方面面临重大挑战。许多工作收集大量标注的图文对,从零开始训练文本条件生成模型,代价高昂且效率低下。此后出现了一些利用预训练视觉-语言模型以避免数据收集的方法,但它们仍受限于针对每个文本提示的优化或推理阶段的超参数调节。为解决这些问题,我们研究并发现了一个特定的空间,称为 CLIP DeltaSpace:在该空间中,两张图像的 CLIP 视觉特征差与其对应文本描述的 CLIP 文本特征差在语义上是对齐的。基于 DeltaSpace,我们提出名为 DeltaEdit 的新框架:训练阶段将 CLIP 视觉特征差映射到生成模型潜在空间的方向,推理阶段再由 CLIP 文本特征差预测潜在空间方向。这一设计带来两个优势:(1)无需文本的训练;(2)可泛化到各种文本提示,实现零样本推理。大量实验验证了 DeltaEdit 在不同生成模型(包括 GAN 模型和扩散模型)上实现灵活文本引导图像编辑的有效性与通用性。代码见 https://github.com/Yueming6568/DeltaEdit。
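
A schematic of the DeltaSpace idea described above: fit a mapper from CLIP image-feature differences to latent-direction targets during training, then feed it CLIP text-feature differences at inference. The encoders, latent directions and the linear mapper below are random stand-ins rather than real CLIP or GAN/diffusion components, so this only illustrates the data flow, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
CLIP_DIM, LATENT_DIM, N = 64, 32, 512

# Stand-ins for CLIP image features of (source, edited) pairs and the generator's latent directions.
clip_img_src = rng.standard_normal((N, CLIP_DIM))
clip_img_edit = rng.standard_normal((N, CLIP_DIM))
latent_dirs = rng.standard_normal((N, LATENT_DIM))

# Training (text-free): regress latent directions from CLIP *visual* feature differences.
delta_visual = clip_img_edit - clip_img_src
W, *_ = np.linalg.lstsq(delta_visual, latent_dirs, rcond=None)   # simple linear mapper as a stand-in

# Inference: plug in a CLIP *textual* feature difference instead (e.g. "face" -> "smiling face").
def predict_direction(text_feat_src, text_feat_tgt):
    delta_text = text_feat_tgt - text_feat_src
    return delta_text @ W      # latent direction to add to the generator's latent code

fake_text_src = rng.standard_normal(CLIP_DIM)
fake_text_tgt = rng.standard_normal(CLIP_DIM)
print("predicted latent direction shape:", predict_direction(fake_text_src, fake_text_tgt).shape)
# The key assumption (CLIP DeltaSpace) is that visual and textual feature *differences* are
# semantically aligned, so a mapper fit on image deltas transfers to text deltas at test time.
```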

SegLoc: Visual Self-supervised Learning Scheme for Dense Prediction Tasks of Security Inspection X-ray Images

  • paper_url: http://arxiv.org/abs/2310.08421
  • repo_url: None
  • paper_authors: Shervin Halat, Mohammad Rahmati, Ehsan Nazerfard
  • for: 本研究旨在提升对安检X射线图像进行密集预测任务的能力。
  • methods: 本研究在现有的视觉自监督学习(SSL)模型之上引入对比学习策略,并提出了Segmentation Localization(SegLoc)模型。
  • results: 在不同IoU阈值下的AR和AP指标上,该方法在20至30个预训练轮次内比随机初始化高出3%至6%,但仍低于有监督初始化。
    Abstract Lately, remarkable advancements of artificial intelligence have been attributed to the integration of self-supervised learning (SSL) scheme. Despite impressive achievements within natural language processing (NLP), SSL in computer vision has not been able to stay on track comparatively. Recently, integration of contrastive learning on top of existing visual SSL models has established considerable progress, thereby being able to outperform supervised counterparts. Nevertheless, the improvements were mostly limited to classification tasks; moreover, few studies have evaluated visual SSL models in real-world scenarios, while the majority considered datasets containing class-wise portrait images, notably ImageNet. Thus, here, we have considered dense prediction tasks on security inspection x-ray images to evaluate our proposed model Segmentation Localization (SegLoc). Based upon the model Instance Localization (InsLoc), our model has managed to address one of the most challenging downsides of contrastive learning, i.e., false negative pairs of query embeddings. To do so, our pre-training dataset is synthesized by cutting, transforming, then pasting labeled segments, as foregrounds, from an already existing labeled dataset (PIDray) onto instances, as backgrounds, of an unlabeled dataset (SIXray;) further, we fully harness the labels through integration of the notion, one queue per class, into MoCo-v2 memory bank, avoiding false negative pairs. Regarding the task in question, our approach has outperformed random initialization method by 3% to 6%, while having underperformed supervised initialization, in AR and AP metrics at different IoU values for 20 to 30 pre-training epochs.
    摘要 近年来,人工智能的显著进展很大程度上得益于自监督学习(SSL)。尽管SSL在自然语言处理领域成绩斐然,但其在计算机视觉中的进展相对滞后。最近,在现有视觉SSL模型之上引入对比学习取得了可观的进步,甚至能够超越有监督的对应方法。然而,这些改进大多局限于分类任务;此外,很少有研究在真实场景中评估视觉SSL模型,大多数工作使用的是以类别为中心的肖像式图像数据集(尤其是ImageNet)。因此,我们选取安检X射线图像上的密集预测任务来评估所提出的模型 Segmentation Localization(SegLoc)。该模型基于 Instance Localization(InsLoc),致力于解决对比学习最棘手的问题之一:查询嵌入的假负样本对。为此,我们把已有标注数据集(PIDray)中的标注分割区域经裁剪、变换后,作为前景粘贴到无标注数据集(SIXray)的图像背景上,合成预训练数据;并通过将"每类一个队列"的思想融入 MoCo-v2 的记忆库,充分利用标签信息,避免假负样本对。在该任务上,经过20至30个预训练轮次,我们的方法在不同IoU阈值下的AR和AP指标上比随机初始化高出3%至6%,但仍低于有监督初始化。
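
The dataset-synthesis step described in the abstract (cut labeled foreground segments from PIDray and paste them onto unlabeled SIXray backgrounds) reduces to a masked copy. The arrays below are synthetic placeholders rather than real X-ray images, and the paste location is chosen at random; the real pipeline also applies transformations to the segment before pasting.

```python
import numpy as np

rng = np.random.default_rng(0)

def paste_segment(background, foreground, mask, top, left):
    """Paste a labeled foreground segment onto a background image using its binary mask."""
    out = background.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w].copy()
    region[mask] = foreground[mask]            # copy only the segmented pixels
    out[top:top + h, left:left + w] = region
    return out

# Stand-ins: an "unlabeled" background (SIXray-like) and a "labeled" segment crop (PIDray-like).
background = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)
foreground = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True                        # pretend this square is the annotated object

top = int(rng.integers(0, 128 - 32))
left = int(rng.integers(0, 128 - 32))
synthetic = paste_segment(background, foreground, mask, top, left)
print("synthetic image:", synthetic.shape, "pasted at", (top, left))
# The pasted box and the original class label become the pre-training annotation,
# which is how labeled pre-training data is obtained without manual annotation.
```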

Jailbreaking Black Box Large Language Models in Twenty Queries

  • paper_url: http://arxiv.org/abs/2310.08419
  • repo_url: https://github.com/patrickrchao/jailbreakingllms
  • paper_authors: Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong
  • for: 确保大型语言模型(LLM)与人类价值观保持对齐。
  • methods: 使用攻击者LLM在仅有黑盒访问权限的情况下,自动为目标LLM生成语义层面的越狱提示。
  • results: PAIR算法通常只需不到二十次查询即可生成越狱提示,并在多种LLM上取得了有竞争力的越狱成功率和可迁移性。
    Abstract There is growing interest in ensuring that large language models (LLMs) align with human values. However, the alignment of such models is vulnerable to adversarial jailbreaks, which coax LLMs into overriding their safety guardrails. The identification of these vulnerabilities is therefore instrumental in understanding inherent weaknesses and preventing future misuse. To this end, we propose Prompt Automatic Iterative Refinement (PAIR), an algorithm that generates semantic jailbreaks with only black-box access to an LLM. PAIR -- which is inspired by social engineering attacks -- uses an attacker LLM to automatically generate jailbreaks for a separate targeted LLM without human intervention. In this way, the attacker LLM iteratively queries the target LLM to update and refine a candidate jailbreak. Empirically, PAIR often requires fewer than twenty queries to produce a jailbreak, which is orders of magnitude more efficient than existing algorithms. PAIR also achieves competitive jailbreaking success rates and transferability on open and closed-source LLMs, including GPT-3.5/4, Vicuna, and PaLM-2.
    摘要 确保大型语言模型(LLM)与人类价值观对齐正受到越来越多的关注。然而,这种对齐容易受到对抗性越狱攻击的威胁,此类攻击会诱使LLM绕过其安全护栏。识别这些漏洞有助于理解模型的内在弱点并防止未来被滥用。为此,我们提出 Prompt Automatic Iterative Refinement(PAIR)算法:仅凭对LLM的黑盒访问即可生成语义层面的越狱提示。PAIR的灵感来自社会工程攻击,它使用一个攻击者LLM在无人干预的情况下,自动为另一个目标LLM生成越狱提示;攻击者LLM反复查询目标LLM,不断更新和改进候选提示。实验表明,PAIR通常只需不到二十次查询即可产生越狱提示,效率比现有算法高出数个数量级。PAIR还在开源和闭源LLM(包括GPT-3.5/4、Vicuna和PaLM-2)上取得了有竞争力的越狱成功率和可迁移性。

Tightening Bounds on Probabilities of Causation By Merging Datasets

  • paper_url: http://arxiv.org/abs/2310.08406
  • repo_url: None
  • paper_authors: Numair Sani, Atalanti A. Mastakouri
  • For: The paper aims to provide symbolic bounds on the Probabilities of Causation (PoC) for a challenging scenario where multiple datasets with different treatment assignment mechanisms are available.
  • Methods: The paper uses causal sufficiency and combines two randomized experiments or a randomized experiment and an observational study to derive symbolic bounds on the PoC.
  • Results: The paper provides bounds on the PoC that work for arbitrary dimensionality of covariates and treatment, and discusses the conditions under which these bounds are tighter than existing bounds in literature. Additionally, the paper allows for the possibility of different treatment assignment mechanisms across datasets, enabling the transfer of causal information from the external dataset to the target dataset.
    Abstract Probabilities of Causation (PoC) play a fundamental role in decision-making in law, health care and public policy. Nevertheless, their point identification is challenging, requiring strong assumptions, in the absence of which only bounds can be derived. Existing work to further tighten these bounds by leveraging extra information either provides numerical bounds, symbolic bounds for fixed dimensionality, or requires access to multiple datasets that contain the same treatment and outcome variables. However, in many clinical, epidemiological and public policy applications, there exist external datasets that examine the effect of different treatments on the same outcome variable, or study the association between covariates and the outcome variable. These external datasets cannot be used in conjunction with the aforementioned bounds, since the former may entail different treatment assignment mechanisms, or even obey different causal structures. Here, we provide symbolic bounds on the PoC for this challenging scenario. We focus on combining either two randomized experiments studying different treatments, or a randomized experiment and an observational study, assuming causal sufficiency. Our symbolic bounds work for arbitrary dimensionality of covariates and treatment, and we discuss the conditions under which these bounds are tighter than existing bounds in literature. Finally, our bounds parameterize the difference in treatment assignment mechanism across datasets, allowing the mechanisms to vary across datasets while still allowing causal information to be transferred from the external dataset to the target dataset.
    摘要 因果概率(Probabilities of Causation, PoC)在法律、医疗和公共政策的决策中发挥着基础性作用。然而,它们的点识别十分困难,需要很强的假设;在缺乏这些假设时只能推导出界。现有工作通过利用额外信息来进一步收紧这些界,但要么只给出数值界,要么只针对固定维度给出符号界,要么要求能获取多个包含相同处理变量和结果变量的数据集。然而,在许多临床、流行病学和公共政策应用中,外部数据集考察的往往是不同处理对同一结果变量的影响,或研究协变量与结果变量之间的关联。这些外部数据集不能直接与上述界结合使用,因为它们可能具有不同的处理分配机制,甚至遵循不同的因果结构。本文针对这一具有挑战性的情形给出了PoC的符号界。我们在因果充分性假设下,研究两类组合:两个针对不同处理的随机实验,或一个随机实验与一个观察性研究。所得符号界适用于任意维度的协变量和处理,我们还讨论了这些界在何种条件下比文献中已有的界更紧。最后,这些界对数据集之间处理分配机制的差异进行了参数化,允许机制在数据集间变化,同时仍能把因果信息从外部数据集迁移到目标数据集。
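
For readers unfamiliar with PoC bounds, the classical starting point for the probability of necessity and sufficiency (PNS) from experimental data alone is shown below (Tian and Pearl, 2000). The paper's contribution is a symbolic tightening of bounds of this kind when a second dataset with a different treatment-assignment mechanism is merged in; the formula here is the standard textbook bound, not the new result.

```latex
% Classical bounds on the probability of necessity and sufficiency (PNS)
% from experimental data alone (Tian & Pearl, 2000):
\max\{0,\; P(y_x) - P(y_{x'})\} \;\le\; \mathrm{PNS} \;\le\; \min\{P(y_x),\; P(y'_{x'})\}
```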

Performance/power assessment of CNN packages on embedded automotive platforms

  • paper_url: http://arxiv.org/abs/2310.08401
  • repo_url: None
  • paper_authors: Paolo Burgio, Gianluca Brilli
  • for: This paper aims to support engineers in choosing the most appropriate deep neural network (CNN) package and computing system for their autonomous driving designs, while also deriving guidelines for adequately sizing their systems.
  • methods: The paper will validate the effectiveness and efficiency of recent CNN networks on state-of-the-art platforms with embedded commercial-off-the-shelf System-on-Chips (SoCs), including Xavier AGX, Tegra X2, Nano for NVIDIA, and XCZU9EG and XCZU3EG of the Zynq UltraScale+ family for the Xilinx counterpart.
  • results: The paper will provide guidelines for engineers to choose the most appropriate CNN package and computing system for their designs, based on the performance and power consumption of the SoCs.
    Abstract The rise of power-efficient embedded computers based on highly-parallel accelerators opens a number of opportunities and challenges for researchers and engineers, and paved the way to the era of edge computing. At the same time, advances in embedded AI for object detection and categorization such as YOLO, GoogleNet and AlexNet reached an unprecedented level of accuracy (mean-Average Precision - mAP) and performance (Frames-Per-Second - FPS). Today, edge computers based on heterogeneous many-core systems are a predominant choice to deploy such systems in industry 4.0, wearable devices, and - our focus - autonomous driving systems. In these latter systems, engineers struggle to make reduced automotive power and size budgets co-exist with the accuracy and performance targets requested by autonomous driving. We aim at validating the effectiveness and efficiency of most recent networks on state-of-the-art platforms with embedded commercial-off-the-shelf System-on-Chips, such as Xavier AGX, Tegra X2 and Nano for NVIDIA and XCZU9EG and XCZU3EG of the Zynq UltraScale+ family, for the Xilinx counterpart. Our work aims at supporting engineers in choosing the most appropriate CNN package and computing system for their designs, and deriving guidelines for adequately sizing their systems.
    摘要 基于高度并行加速器的低功耗嵌入式计算机的兴起,为研究者和工程师带来了许多机遇与挑战,并开启了边缘计算时代。与此同时,YOLO、GoogleNet和AlexNet等嵌入式AI目标检测与分类技术,在精度(mean Average Precision, mAP)和性能(Frames Per Second, FPS)上都达到了前所未有的水平。如今,基于异构众核系统的边缘计算机已成为在工业4.0、可穿戴设备以及本文关注的自动驾驶系统中部署此类系统的主流选择。在自动驾驶系统中,工程师必须在有限的车载功耗和体积预算下,满足自动驾驶所要求的精度与性能指标。我们的目标是在搭载商用片上系统(SoC)的最新平台上(NVIDIA的Xavier AGX、Tegra X2、Nano,以及Xilinx Zynq UltraScale+系列的XCZU9EG和XCZU3EG),验证最新网络的有效性和效率。本工作旨在帮助工程师为其设计选择最合适的CNN软件包和计算系统,并给出合理确定系统规模的指导原则。
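
A rough throughput-measurement loop of the kind behind the FPS numbers discussed above is sketched here. The `infer` function is a placeholder for a real CNN forward pass (for example a TensorRT or ONNX Runtime session on the Xavier or Zynq boards); power draw would additionally be read from the board's sensors, which this sketch does not do.

```python
import time

def infer(frame):
    """Placeholder for one CNN forward pass; replace with the real detector call."""
    s = 0.0
    for value in frame:          # burn a little CPU so the timing is non-trivial
        s += value * value
    return s

def measure_fps(frames, warmup=10):
    # Warm-up iterations let caches, clocks and (on real hardware) the accelerator settle.
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:
        infer(frame)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed

frames = [list(range(10_000))] * 110   # stand-in for 110 input images
print(f"throughput: {measure_fps(frames):.1f} FPS")
# Average power (and thus FPS per Watt) would come from the SoC's power rails,
# e.g. the onboard power monitors on Jetson boards, sampled while this loop runs.
```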

Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation

  • paper_url: http://arxiv.org/abs/2310.08395
  • repo_url: None
  • paper_authors: Yuanyuan Liang, Jianing Wang, Hanlun Zhu, Lei Wang, Weining Qian, Yunshi Lan
  • for: 本研究旨在提出一种基于大语言模型的少样本知识库问题生成方法,以缓解现有KBQG方法对标注数据的依赖。
  • methods: 我们提出了基于思维链的少样本问题生成方法KQG-CoT:先从无标注数据池中检索有支撑作用的逻辑形式,再根据所选示例构造思维链提示;并进一步提出KQG-CoT+,按复杂度对逻辑形式排序以保证提示质量。
  • results: 在三个公开的KBQG数据集上进行的大量实验表明,我们的提示方法始终优于其他提示基线。特别地,KQG-CoT+在PathQuestions数据集上分别以18.25、10.72和10.18个绝对百分点的优势,在BLEU-4、METEOR和ROUGE-L上超过了现有的少样本SoTA结果。
    Abstract The task of Question Generation over Knowledge Bases (KBQG) aims to convert a logical form into a natural language question. For the sake of expensive cost of large-scale question annotation, the methods of KBQG under low-resource scenarios urgently need to be developed. However, current methods heavily rely on annotated data for fine-tuning, which is not well-suited for few-shot question generation. The emergence of Large Language Models (LLMs) has shown their impressive generalization ability in few-shot tasks. Inspired by Chain-of-Thought (CoT) prompting, which is an in-context learning strategy for reasoning, we formulate KBQG task as a reasoning problem, where the generation of a complete question is splitted into a series of sub-question generation. Our proposed prompting method KQG-CoT first retrieves supportive logical forms from the unlabeled data pool taking account of the characteristics of the logical form. Then, we write a prompt to explicit the reasoning chain of generating complicated questions based on the selected demonstrations. To further ensure prompt quality, we extend KQG-CoT into KQG-CoT+ via sorting the logical forms by their complexity. We conduct extensive experiments over three public KBQG datasets. The results demonstrate that our prompting method consistently outperforms other prompting baselines on the evaluated datasets. Remarkably, our KQG-CoT+ method could surpass existing few-shot SoTA results of the PathQuestions dataset by 18.25, 10.72, and 10.18 absolute points on BLEU-4, METEOR, and ROUGE-L, respectively.
    摘要 知识库问题生成(KBQG)的任务是把逻辑形式转换成自然语言问题。由于大规模问题标注代价高昂,迫切需要发展低资源场景下的KBQG方法。然而,现有方法严重依赖标注数据进行微调,并不适合少样本问题生成。大型语言模型(LLM)的出现展示了其在少样本任务上出色的泛化能力。受思维链(Chain-of-Thought, CoT)这一用于推理的上下文学习策略启发,我们把KBQG任务表述为一个推理问题:完整问题的生成被拆分为一系列子问题的生成。我们提出的提示方法KQG-CoT首先根据逻辑形式的特征,从无标注数据池中检索有支撑作用的逻辑形式,随后撰写提示,基于所选示例显式展示生成复杂问题的推理链。为进一步保证提示质量,我们按逻辑形式的复杂度排序,将KQG-CoT扩展为KQG-CoT+。我们在三个公开的KBQG数据集上进行了大量实验,结果表明我们的提示方法在所有评测数据集上均优于其他提示基线。特别地,KQG-CoT+在PathQuestions数据集上分别以18.25、10.72和10.18个绝对百分点的优势,在BLEU-4、METEOR和ROUGE-L上超过了现有的少样本SoTA结果。
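
A sketch of how a KQG-CoT-style prompt could be assembled: retrieve supporting logical forms from an unlabeled pool, order them by a complexity proxy (echoing the KQG-CoT+ sorting step), and lay out demonstrations that reason before asking for the final question. The retrieval scorer, the demonstrations and the exact prompt wording are illustrative guesses, not the paper's released prompts.

```python
def complexity(logical_form: str) -> int:
    """Crude complexity proxy: count operators in the logical form."""
    return sum(logical_form.count(op) for op in ("JOIN", "AND", "ARG"))

def select_demonstrations(pool, target, k=2):
    """Pick the k demonstrations whose logical forms share the most tokens with the target,
    then order them from simple to complex (the KQG-CoT+ refinement)."""
    target_tokens = set(target.split())
    scored = sorted(pool, key=lambda d: len(target_tokens & set(d[0].split())), reverse=True)
    return sorted(scored[:k], key=lambda d: complexity(d[0]))

def build_prompt(demonstrations, target):
    parts = ["Generate a natural-language question for each logical form, reasoning step by step."]
    for lf, reasoning, question in demonstrations:
        parts.append(f"Logical form: {lf}\nReasoning: {reasoning}\nQuestion: {question}")
    parts.append(f"Logical form: {target}\nReasoning:")
    return "\n\n".join(parts)

pool = [
    ("(JOIN author_of book)", "Ask who wrote the book.", "Who is the author of this book?"),
    ("(AND (JOIN author_of book) (JOIN born_in city))",
     "Ask who wrote the book, then restrict to authors born in the city.",
     "Which author of this book was born in that city?"),
]
target = "(AND (JOIN directed_by film) (JOIN released_in year))"
print(build_prompt(select_demonstrations(pool, target), target))
```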

Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization

  • paper_url: http://arxiv.org/abs/2310.08394
  • repo_url: None
  • paper_authors: Ondrej Skopek, Rahul Aralikatte, Sian Gooding, Victor Carbune
  • for: 这篇论文的目的是评估大型语言模型(LLM)遵循用户指令的能力。
  • methods: 论文对多种用于量化LLM指令遵循能力的评估方法(包括基于提示的方法)进行了元评估。
  • results: 研究发现,新的基于LLM的无参照评估方法优于已有基线,并可与依赖高质量参照摘要的评估指标相媲美。
    Abstract Despite recent advances, evaluating how well large language models (LLMs) follow user instructions remains an open problem. While evaluation methods of language models have seen a rise in prompt-based approaches, limited work on the correctness of these methods has been conducted. In this work, we perform a meta-evaluation of a variety of metrics to quantify how accurately they measure the instruction-following abilities of LLMs. Our investigation is performed on grounded query-based summarization by collecting a new short-form, real-world dataset riSum, containing 300 document-instruction pairs with 3 answers each. All 900 answers are rated by 3 human annotators. Using riSum, we analyze the agreement between evaluation methods and human judgment. Finally, we propose new LLM-based reference-free evaluation methods that improve upon established baselines and perform on par with costly reference-based metrics that require high-quality summaries.
    摘要 尽管近来取得了进展,如何评估大型语言模型(LLM)遵循用户指令的能力仍是一个悬而未决的问题。虽然基于提示的语言模型评估方法日益增多,但对这些方法本身正确性的研究仍然有限。在这项工作中,我们对多种指标进行元评估,量化它们衡量LLM指令遵循能力的准确程度。我们的研究基于有依据的、面向查询的摘要任务:我们收集了一个新的短文本真实场景数据集riSum,包含300个"文档-指令"对,每对有3个答案,全部900个答案均由3名人工标注者评分。利用riSum,我们分析了各评估方法与人工判断之间的一致性。最后,我们提出了新的基于LLM的无参照评估方法,其表现优于既有基线,并可与需要高质量参照摘要、代价高昂的基于参照的指标相媲美。
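
The meta-evaluation step, checking how well an automatic metric agrees with human ratings, boils down to a rank correlation computed over the same set of answers. The snippet uses a plain Kendall-tau-style pairwise agreement so it needs no extra dependencies; the scores are invented examples, not riSum data.

```python
from itertools import combinations

def pairwise_agreement(metric_scores, human_scores):
    """Fraction of answer pairs the metric orders the same way as the human raters
    (a Kendall-tau-style agreement; tied pairs are skipped)."""
    agree, total = 0, 0
    for i, j in combinations(range(len(metric_scores)), 2):
        m = metric_scores[i] - metric_scores[j]
        h = human_scores[i] - human_scores[j]
        if m == 0 or h == 0:
            continue
        total += 1
        agree += (m > 0) == (h > 0)
    return agree / total if total else float("nan")

# Invented scores for five candidate summaries of the same (document, instruction) pair.
metric_scores = [0.62, 0.41, 0.77, 0.30, 0.55]   # some automatic instruction-following metric
human_scores  = [4, 2, 5, 1, 4]                  # average rating from the three annotators

print(f"pairwise agreement with humans: {pairwise_agreement(metric_scores, human_scores):.2f}")
```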

Do Not Marginalize Mechanisms, Rather Consolidate!

  • paper_url: http://arxiv.org/abs/2310.08377
  • repo_url: None
  • paper_authors: Moritz Willig, Matej Zečević, Devendra Singh Dhami, Kristian Kersting
  • for: 本研究旨在开发一种能够简化大规模结构 causal model(SCM)的方法,以便更好地理解这些系统的复杂 causal 关系。
  • methods: 本研究提出了"合并因果机制"(consolidating causal mechanisms)的概念,可以在保持干预行为一致的前提下,把大规模SCM化简为更简单的模型。
  • results: 研究表明,合并是简化SCM的有力方法,可显著降低计算复杂度,同时保持干预行为的一致性;文中还讨论了合并后SCM的泛化能力,并展望了其应用前景。
    Abstract Structural causal models (SCMs) are a powerful tool for understanding the complex causal relationships that underlie many real-world systems. As these systems grow in size, the number of variables and complexity of interactions between them does, too. Thus, becoming convoluted and difficult to analyze. This is particularly true in the context of machine learning and artificial intelligence, where an ever increasing amount of data demands for new methods to simplify and compress large scale SCM. While methods for marginalizing and abstracting SCM already exist today, they may destroy the causality of the marginalized model. To alleviate this, we introduce the concept of consolidating causal mechanisms to transform large-scale SCM while preserving consistent interventional behaviour. We show consolidation is a powerful method for simplifying SCM, discuss reduction of computational complexity and give a perspective on generalizing abilities of consolidated SCM.

MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft

  • paper_url: http://arxiv.org/abs/2310.08367
  • repo_url: https://github.com/craftjarvis/mcu
  • paper_authors: Haowei Lin, Zihao Wang, Jianzhu Ma, Yitao Liang
  • for: 为了实现开放式的Minecraft智能体这一目标,本研究提出了一个以任务为中心的Minecraft智能体评估框架(MCU)。
  • methods: MCU框架以原子任务为基本构件,可以生成多样乃至任意的任务。每个任务由六个不同的难度分数(时间消耗、操作努力、规划复杂度、精细程度、创造力、新颖性)来刻画,从不同角度评估智能体的能力。
  • results: 研究表明,MCU具有很强的表达能力,足以覆盖近期Minecraft智能体文献中使用的全部任务;同时也凸显了在创造力、精准控制和分布外泛化等方面仍需进步,才能实现开放式Minecraft智能体的目标。
    Abstract To pursue the goal of creating an open-ended agent in Minecraft, an open-ended game environment with unlimited possibilities, this paper introduces a task-centric framework named MCU for Minecraft agent evaluation. The MCU framework leverages the concept of atom tasks as fundamental building blocks, enabling the generation of diverse or even arbitrary tasks. Within the MCU framework, each task is measured with six distinct difficulty scores (time consumption, operational effort, planning complexity, intricacy, creativity, novelty). These scores offer a multi-dimensional assessment of a task from different angles, and thus can reveal an agent's capability on specific facets. The difficulty scores also serve as the feature of each task, which creates a meaningful task space and unveils the relationship between tasks. For efficient evaluation of Minecraft agents employing the MCU framework, we maintain a unified benchmark, namely SkillForge, which comprises representative tasks with diverse categories and difficulty distribution. We also provide convenient filters for users to select tasks to assess specific capabilities of agents. We show that MCU has the high expressivity to cover all tasks used in recent literature on Minecraft agent, and underscores the need for advancements in areas such as creativity, precise control, and out-of-distribution generalization under the goal of open-ended Minecraft agent development.
    摘要 为了实现在 Minecraft 中创造开放式的代理人,这篇论文提出了一个任务中心框架 named MCU,用于评估 Minecraft 代理人的能力。MCU 框架利用了原子任务作为基本建构件,可以生成多种多样的任务。在 MCU 框架中,每个任务都有六种不同的难度分数(时间消耗、操作努力、计划复杂度、细节、创造力、新颖性)。这些分数可以从不同的角度评估一个任务的难度,从而揭示代理人的特定能力。难度分数还成为每个任务的特征,创造了一个有意义的任务空间,揭示了任务之间的关系。为了有效评估 Minecraft 代理人使用 MCU 框架,我们维护了一个统一的标准套件,称为 SkillForge,该套件包含了多种类型的代表任务,并且有多样化的难度分布。我们还提供了用户友好的筛选工具,以便用户选择评估特定能力的代理人。我们发现 MCU 框架可以覆盖所有在最近的 Minecraft 代理人研究中使用的任务,并且强调了在开放式 Minecraft 代理人发展中的创新、精准控制和非标型泛化等领域的进一步发展。
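
A toy rendering of the task-centric structure described above: each atom task carries the six difficulty scores, which makes filtering a benchmark for a specific capability a one-liner. The task names and score values are invented; the real MCU/SkillForge task definitions live in the linked repository.

```python
from dataclasses import dataclass, field

DIMENSIONS = ("time_consumption", "operational_effort", "planning_complexity",
              "intricacy", "creativity", "novelty")

@dataclass
class AtomTask:
    name: str
    scores: dict = field(default_factory=dict)   # one score in [0, 1] per difficulty dimension

    def score(self, dim):
        return self.scores.get(dim, 0.0)

    def vector(self):
        return [self.score(d) for d in DIMENSIONS]

tasks = [  # invented examples; real task definitions live in the MCU repository
    AtomTask("collect_wood", {"time_consumption": 0.2, "planning_complexity": 0.1, "creativity": 0.1}),
    AtomTask("build_house", {"time_consumption": 0.7, "planning_complexity": 0.8, "creativity": 0.9}),
    AtomTask("craft_diamond_pickaxe", {"time_consumption": 0.9, "planning_complexity": 0.9, "creativity": 0.3}),
]

def filter_tasks(tasks, dim, minimum):
    """Select tasks that stress a specific capability, e.g. creativity."""
    return [t.name for t in tasks if t.score(dim) >= minimum]

print("difficulty vector of build_house:", tasks[1].vector())
print("creativity-heavy tasks:", filter_tasks(tasks, "creativity", 0.5))
print("long-horizon tasks:", filter_tasks(tasks, "planning_complexity", 0.5))
```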

2SFGL: A Simple And Robust Protocol For Graph-Based Fraud Detection

  • paper_url: http://arxiv.org/abs/2310.08335
  • repo_url: None
  • paper_authors: Zhirui Pan, Guangzhong Wang, Zhaoning Li, Lifeng Chen, Yang Bian, Zhongyuan Lai
  • for: 提高金融安全性和效率,避免金融犯罪者逃脱检测
  • methods: 联邦学习(FL)和虚拟图谱融合
  • results: 在常见的欺诈检测任务上,与仅使用FedAvg相比,将GCN与2SFGL框架结合可使多项典型指标提升17.6%-30.2%,将GraphSAGE与2SFGL结合可提升6%-16.2%。
    Abstract Financial crime detection using graph learning improves financial safety and efficiency. However, criminals may commit financial crimes across different institutions to avoid detection, which increases the difficulty of detection for financial institutions which use local data for graph learning. As most financial institutions are subject to strict regulations in regards to data privacy protection, the training data is often isolated and conventional learning technology cannot handle the problem. Federated learning (FL) allows multiple institutions to train a model without revealing their datasets to each other, hence ensuring data privacy protection. In this paper, we proposes a novel two-stage approach to federated graph learning (2SFGL): The first stage of 2SFGL involves the virtual fusion of multiparty graphs, and the second involves model training and inference on the virtual graph. We evaluate our framework on a conventional fraud detection task based on the FraudAmazonDataset and FraudYelpDataset. Experimental results show that integrating and applying a GCN (Graph Convolutional Network) with our 2SFGL framework to the same task results in a 17.6\%-30.2\% increase in performance on several typical metrics compared to the case only using FedAvg, while integrating GraphSAGE with 2SFGL results in a 6\%-16.2\% increase in performance compared to the case only using FedAvg. We conclude that our proposed framework is a robust and simple protocol which can be simply integrated to pre-existing graph-based fraud detection methods.
    摘要 利用图学习进行金融犯罪检测可以提升金融安全与效率。然而,犯罪分子可能跨多家机构作案以逃避检测,这加大了仅依赖本地数据进行图学习的金融机构的检测难度。由于大多数金融机构受到严格的数据隐私保护法规约束,训练数据往往彼此隔离,传统学习技术难以应对。联邦学习(FL)允许多家机构在不向彼此暴露数据集的情况下共同训练模型,从而保障数据隐私。本文提出一种新颖的两阶段联邦图学习方法(2SFGL):第一阶段对多方图进行虚拟融合,第二阶段在虚拟图上进行模型训练与推理。我们在基于FraudAmazonDataset和FraudYelpDataset的常规欺诈检测任务上评估了该框架。实验结果表明,与仅使用FedAvg相比,将GCN与2SFGL框架结合可使多项典型指标提升17.6%-30.2%,将GraphSAGE与2SFGL结合可提升6%-16.2%。我们认为,所提出的框架是一种稳健且简单的协议,可以方便地集成到现有的基于图的欺诈检测方法中。
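
The FedAvg baseline that 2SFGL is compared against reduces to a weighted average of locally trained parameters; the sketch below shows that aggregation step with plain numpy arrays standing in for GCN/GraphSAGE weights. The 2SFGL-specific part (virtually fusing the parties' graphs before training) is not reproduced here.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Weighted average of per-client parameter dicts (the FedAvg aggregation step).
    Each client only ships model weights, never its raw transaction graph."""
    total = sum(client_sizes)
    return {
        key: sum(weight * params[key] for params, weight in zip(client_params, client_sizes)) / total
        for key in client_params[0]
    }

rng = np.random.default_rng(0)
# Stand-ins for two banks' locally trained GNN weights (same architecture, different data).
bank_a = {"layer1": rng.standard_normal((4, 8)), "layer2": rng.standard_normal((8, 2))}
bank_b = {"layer1": rng.standard_normal((4, 8)), "layer2": rng.standard_normal((8, 2))}

global_model = fedavg([bank_a, bank_b], client_sizes=[12_000, 3_000])
print({k: v.shape for k, v in global_model.items()})
# In each federated round the server broadcasts `global_model` back to the banks,
# which continue training locally on their own (private) fraud graphs.
```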

Transport-Hub-Aware Spatial-Temporal Adaptive Graph Transformer for Traffic Flow Prediction

  • paper_url: http://arxiv.org/abs/2310.08328
  • repo_url: https://github.com/fantasy-shaw/h-stformer
  • paper_authors: Xiao Xu, Lei Zhang, Bailong Liu, Zhizhen Liang, Xuefei Zhang
  • for: 本文针对现有方法未能充分利用交通流数据的内在特性、且缺乏增量学习能力的问题,提出一种面向交通枢纽的交通流预测方法。
  • methods: 该方法为面向交通枢纽的时空自适应图Transformer(H-STFormer),包含一个新的空间自注意力模块(融合三个图掩码矩阵以突出短期与长期依赖)、一个用于捕捉动态时间模式的时间自注意力模块,以及一个用于增量学习的时空知识蒸馏模块。
  • results: 经过广泛的实验,该方法在正常和增量交通流量预测任务中表现出色,能够更好地利用交通流量数据的特性和增量学习知识。
    Abstract As a core technology of Intelligent Transportation System (ITS), traffic flow prediction has a wide range of applications. Traffic flow data are spatial-temporal, which are not only correlated to spatial locations in road networks, but also vary with temporal time indices. Existing methods have solved the challenges in traffic flow prediction partly, focusing on modeling spatial-temporal dependencies effectively, while not all intrinsic properties of traffic flow data are utilized fully. Besides, there are very few attempts at incremental learning of spatial-temporal data mining, and few previous works can be easily transferred to the traffic flow prediction task. Motivated by the challenge of incremental learning methods for traffic flow prediction and the underutilization of intrinsic properties of road networks, we propose a Transport-Hub-aware Spatial-Temporal adaptive graph transFormer (H-STFormer) for traffic flow prediction. Specifically, we first design a novel spatial self-attention module to capture the dynamic spatial dependencies. Three graph masking matrices are integrated into spatial self-attentions to highlight both short- and long-term dependences. Additionally, we employ a temporal self-attention module to detect dynamic temporal patterns in the traffic flow data. Finally, we design an extra spatial-temporal knowledge distillation module for incremental learning of traffic flow prediction tasks. Through extensive experiments, we show the effectiveness of H-STFormer in normal and incremental traffic flow prediction tasks. The code is available at https://github.com/Fantasy-Shaw/H-STFormer.
    摘要 作为智能交通系统(ITS)的核心技术之一,交通流预测具有广泛的应用。交通流数据具有时空特性:既与路网中的空间位置相关,又随时间变化。现有方法部分解决了交通流预测中的挑战,侧重于有效建模时空依赖,但尚未充分利用交通流数据的全部内在特性。此外,针对时空数据挖掘的增量学习研究很少,已有工作也难以直接迁移到交通流预测任务。受增量学习挑战以及路网内在特性(如交通枢纽)未被充分利用的启发,我们提出面向交通枢纽的时空自适应图Transformer(H-STFormer)用于交通流预测。具体而言,我们首先设计了一个新的空间自注意力模块来捕捉动态空间依赖,并在其中融合三个图掩码矩阵,以同时突出短期和长期依赖;此外,我们采用时间自注意力模块来检测交通流数据中的动态时间模式;最后,我们设计了一个额外的时空知识蒸馏模块,用于交通流预测任务的增量学习。大量实验表明,H-STFormer在常规和增量交通流预测任务中均有效。代码见 https://github.com/Fantasy-Shaw/H-STFormer。

CHIP: Contrastive Hierarchical Image Pretraining

  • paper_url: http://arxiv.org/abs/2310.08304
  • repo_url: None
  • paper_authors: Arpit Mittal, Harshil Jhaveri, Swapnil Mallick, Abhishek Ajmera
  • for: 这篇论文旨在提出一种少样本目标分类模型,用于把未见过类别的对象归入相对更宽泛的层次类别中。
  • methods: 该模型使用了三级层次的对比损失函数基于 ResNet152 分类器,用于基于图像嵌入特征进行对象分类。
  • results: 训练完成后,该模型能够把未见过类别的对象较为准确地归入一个较宽泛的类别中,论文对这些结果进行了详细讨论。
    Abstract Few-shot object classification is the task of classifying objects in an image with limited number of examples as supervision. We propose a one-shot/few-shot classification model that can classify an object of any unseen class into a relatively general category in an hierarchically based classification. Our model uses a three-level hierarchical contrastive loss based ResNet152 classifier for classifying an object based on its features extracted from Image embedding, not used during the training phase. For our experimentation, we have used a subset of the ImageNet (ILSVRC-12) dataset that contains only the animal classes for training our model and created our own dataset of unseen classes for evaluating our trained model. Our model provides satisfactory results in classifying the unknown objects into a generic category which has been later discussed in greater detail.
    摘要 少样本目标分类是指在只有少量样本作为监督的情况下对图像中的对象进行分类。我们提出了一种单样本/少样本分类模型,能够在基于层次结构的分类体系中,把任意未见过类别的对象归入一个相对宽泛的类别。模型使用基于三层层次对比损失的ResNet152分类器,依据训练阶段未使用过的图像嵌入特征对对象进行分类。实验中,我们使用ImageNet(ILSVRC-12)数据集中仅包含动物类别的子集训练模型,并自建了一个由未见过类别组成的数据集来评估训练好的模型。模型在把未知对象归入较宽泛类别方面取得了令人满意的结果,文中对此作了更详细的讨论。

If our aim is to build morality into an artificial agent, how might we begin to go about doing so?

  • paper_url: http://arxiv.org/abs/2310.08295
  • repo_url: None
  • paper_authors: Reneira Seeamber, Cosmin Badea
  • for: 本研究旨在强调在AI中构建道德智能体的重要性,以及需要考虑的关键道德范式与挑战。
  • methods: 本文讨论了自顶向下与自底向上的设计路径,并提出了混合式设计方法与分层组合多种道德范式的方法,用于构建道德智能体。
  • results: 本研究给出了相应的解决方案,并强调治理与政策对确保道德行为得以实现、获得良好AI的关键作用。
    Abstract As Artificial Intelligence (AI) becomes pervasive in most fields, from healthcare to autonomous driving, it is essential that we find successful ways of building morality into our machines, especially for decision-making. However, the question of what it means to be moral is still debated, particularly in the context of AI. In this paper, we highlight the different aspects that should be considered when building moral agents, including the most relevant moral paradigms and challenges. We also discuss the top-down and bottom-up approaches to design and the role of emotion and sentience in morality. We then propose solutions including a hybrid approach to design and a hierarchical approach to combining moral paradigms. We emphasize how governance and policy are becoming ever more critical in AI Ethics and in ensuring that the tasks we set for moral agents are attainable, that ethical behavior is achieved, and that we obtain good AI.
    摘要 随着人工智能(AI)在从医疗到自动驾驶等众多领域的普及,我们必须找到行之有效的方式,把道德内建到机器之中,尤其是在决策环节。然而,"何为道德"这一问题本身仍存在争议,在AI语境下尤其如此。本文梳理了构建道德智能体时需要考虑的各个方面,包括最相关的道德范式与挑战;讨论了自顶向下与自底向上的设计路径,以及情感与感知能力在道德中的作用。随后,我们提出了若干解决方案,包括混合式的设计方法和分层组合多种道德范式的方法。我们强调,治理与政策在AI伦理中正变得愈发关键,它们有助于确保我们为道德智能体设定的任务是可达成的、伦理行为能够实现,并最终获得良好的AI。

Concealed Electronic Countermeasures of Radar Signal with Adversarial Examples

  • paper_url: http://arxiv.org/abs/2310.08292
  • repo_url: None
  • paper_authors: Ruinan Ma, Canjie Zhu, Mingfeng Lu, Yunjie Li, Yu-an Tan, Ruibin Zhang, Ran Tao
  • for: 本研究旨在探讨基于AI技术的雷达信号电子干扰技术,以解决传统干扰技术的缺点,即干扰信号过于明显。
  • methods: 我们提出了一个基于时频图像的雷达信号分类攻击流程,并采用具有较强可迁移性的DITIMI-FGSM攻击算法;此外,我们还提出了基于STFT的时域信号攻击方法(STDS),以解决时频分析中的不可逆问题,从而得到干扰信号的时域表示。
  • results: 我们通过大量实验发现,我们的攻击管道是可行的,并且提出的攻击方法具有高度成功率。
    Abstract Electronic countermeasures involving radar signals are an important aspect of modern warfare. Traditional electronic countermeasures techniques typically add large-scale interference signals to ensure interference effects, which can lead to attacks being too obvious. In recent years, AI-based attack methods have emerged that can effectively solve this problem, but the attack scenarios are currently limited to time domain radar signal classification. In this paper, we focus on the time-frequency images classification scenario of radar signals. We first propose an attack pipeline under the time-frequency images scenario and DITIMI-FGSM attack algorithm with high transferability. Then, we propose STFT-based time domain signal attack(STDS) algorithm to solve the problem of non-invertibility in time-frequency analysis, thus obtaining the time-domain representation of the interference signal. A large number of experiments show that our attack pipeline is feasible and the proposed attack method has a high success rate.
    摘要 现代战争中电子干扰技术是非常重要的。传统的电子干扰技术通常添加大规模干扰信号以确保干扰效果,这可能导致攻击变得太明显。在最近几年,基于人工智能的攻击方法出现了,可以有效解决这个问题,但攻击场景目前仅限于时域雷达信号分类。在这篇论文中,我们关注时频图像分类场景中的雷达信号。我们首先提出了基于时频图像的攻击管道和DITIMI-FGSM攻击算法,该算法具有高传输性。然后,我们提出了STFT基于时域信号攻击算法(STDS)以解决时频分析中的非可逆性问题,从而获得了干扰信号的时域表示。大量实验表明,我们的攻击管道是可行的,并且提posed攻击方法具有高成功率。

Expanding the Vocabulary of BERT for Knowledge Base Construction

  • paper_url: http://arxiv.org/abs/2310.08291
  • repo_url: https://github.com/MaastrichtU-IDS/LMKBC-2023
  • paper_authors: Dong Yang, Xu Wang, Remzi Celebi
  • for: 本研究面向利用语言模型构建知识库,具体针对2023年国际语义网会议"从预训练语言模型构建知识库"挑战赛的任务。
  • methods: 我们提出了Vocabulary Expandable BERT,一种在扩展语言模型词表的同时为新增词保留语义嵌入的方法,并通过面向任务的再预训练进一步增强语言模型。
  • results: 实验表明该方法效果良好,在挑战赛提供的隐藏测试集和验证集上F1分数分别达到0.323和0.362。我们的框架仅使用轻量级语言模型(BERT-base,约1.3亿参数),却超过了直接对大型语言模型(Chatgpt-3,1750亿参数)使用提示的做法;此外,Token-Recode取得了与Re-pretrain相当的表现。该研究使语言理解模型能够直接嵌入多token实体,是知识图谱链接预测和数据管理中元数据补全任务的一大进步。
    Abstract Knowledge base construction entails acquiring structured information to create a knowledge base of factual and relational data, facilitating question answering, information retrieval, and semantic understanding. The challenge called "Knowledge Base Construction from Pretrained Language Models" at International Semantic Web Conference 2023 defines tasks focused on constructing knowledge base using language model. Our focus was on Track 1 of the challenge, where the parameters are constrained to a maximum of 1 billion, and the inclusion of entity descriptions within the prompt is prohibited. Although the masked language model offers sufficient flexibility to extend its vocabulary, it is not inherently designed for multi-token prediction. To address this, we present Vocabulary Expandable BERT for knowledge base construction, which expand the language model's vocabulary while preserving semantic embeddings for newly added words. We adopt task-specific re-pre-training on masked language model to further enhance the language model. Through experimentation, the results show the effectiveness of our approaches. Our framework achieves F1 score of 0.323 on the hidden test set and 0.362 on the validation set, both data set is provided by the challenge. Notably, our framework adopts a lightweight language model (BERT-base, 0.13 billion parameters) and surpasses the model using prompts directly on large language model (Chatgpt-3, 175 billion parameters). Besides, Token-Recode achieves comparable performances as Re-pretrain. This research advances language understanding models by enabling the direct embedding of multi-token entities, signifying a substantial step forward in link prediction task in knowledge graph and metadata completion in data management.
    摘要 知识库构建需要获取结构化信息,以建立包含事实与关系数据的知识库,从而支撑问答、信息检索和语义理解。2023年国际语义网会议的"从预训练语言模型构建知识库"挑战赛定义了利用语言模型构建知识库的一系列任务。我们关注其中的赛道1:模型参数量不得超过10亿,且提示中禁止包含实体描述。掩码语言模型虽然具有扩展词表的灵活性,但其本身并不是为多token预测而设计的。为此,我们提出用于知识库构建的Vocabulary Expandable BERT,在扩展语言模型词表的同时为新增词保留语义嵌入,并通过面向任务的掩码语言模型再预训练进一步增强模型。实验结果表明了方法的有效性:在挑战赛提供的隐藏测试集和验证集上,我们的框架分别取得0.323和0.362的F1分数。值得注意的是,我们的框架仅采用轻量级语言模型(BERT-base,约1.3亿参数),却超过了直接对大型语言模型(Chatgpt-3,1750亿参数)使用提示的做法;此外,Token-Recode取得了与Re-pretrain相当的表现。该研究使语言理解模型能够直接嵌入多token实体,是知识图谱链接预测与数据管理中元数据补全任务的重要一步。
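
In the Hugging Face ecosystem, adding whole-entity tokens and resizing the embedding matrix looks roughly like the sketch below. `add_tokens` and `resize_token_embeddings` are real `transformers` APIs; initializing each new embedding as the mean of its original sub-token embeddings is a common heuristic and only an assumption about how "preserving semantic embeddings for newly added words" might be realized, not necessarily the paper's exact procedure.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

new_entities = ["semantic_web_conference", "knowledge_base_construction"]

# Remember how each new entity was split *before* it gets its own token.
subtoken_ids = {e: tokenizer(e, add_special_tokens=False)["input_ids"] for e in new_entities}

num_added = tokenizer.add_tokens(new_entities)
model.resize_token_embeddings(len(tokenizer))

# Heuristic initialization: new token embedding = mean of its former sub-token embeddings.
embeddings = model.get_input_embeddings().weight
with torch.no_grad():
    for entity in new_entities:
        new_id = tokenizer.convert_tokens_to_ids(entity)
        embeddings[new_id] = embeddings[subtoken_ids[entity]].mean(dim=0)

print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
# Task-specific re-pre-training (masked-LM training on in-domain text) would follow,
# so the new multi-token entities can be predicted directly at a single masked position.
```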

CP-KGC: Constrained-Prompt Knowledge Graph Completion with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08279
  • repo_url: https://github.com/sjlmg/CP-KGC
  • paper_authors: Rui Yang, Li Fang, Yi Zhou
  • For: 这篇论文的目的是利用已有知识来推理并补全知识图中缺失的连接。
  • Methods: 基于文本的方法(如SimKGC)已被用于提升知识图补全的效果,但其效果受限于实体文本描述的质量。本文提出使用基于约束的提示来减少LLM生成文本中的幻觉问题。
  • Results: 本文提出的Constrained-Prompt Knowledge Graph Completion(CP-KGC)方法在低资源计算条件下展现出有效的推理能力,并在WN18RR和FB15K237数据集上超过了先前的结果,展示了将LLM整合进KGC任务的可行性,并为未来研究提供了新方向。
    Abstract Knowledge graph completion (KGC) aims to utilize existing knowledge to deduce and infer missing connections within knowledge graphs. Text-based approaches, like SimKGC, have outperformed graph embedding methods, showcasing the promise of inductive KGC. However, the efficacy of text-based methods hinges on the quality of entity textual descriptions. In this paper, we identify the key issue of whether large language models (LLMs) can generate effective text. To mitigate hallucination in LLM-generated text in this paper, we introduce a constraint-based prompt that utilizes the entity and its textual description as contextual constraints to enhance data quality. Our Constrained-Prompt Knowledge Graph Completion (CP-KGC) method demonstrates effective inference under low resource computing conditions and surpasses prior results on the WN18RR and FB15K237 datasets. This showcases the integration of LLMs in KGC tasks and provides new directions for future research.
    摘要 知识图完成(KGC)目标是利用现有知识来推理和推断知识图中缺失的连接。文本基本方法,如SimKGC,在完成KGC任务中表现出色,超越了图集 embedding 方法。然而,文本基本方法的效果归结于实体文本描述的质量。在这篇论文中,我们发现了大语言模型(LLM)是否能生成有效的文本是关键问题。为了消除LLM生成文本中的幻觉,我们引入了一种基于约束的提问方法,使用实体和其文本描述作为 Contextual 约束来提高数据质量。我们的受约束知识图完成(CP-KGC)方法在低资源计算条件下表现出了有效的推理能力,并在WN18RR和FB15K237数据集上超越了先前的结果。这表明了LLM在KGC任务中的整合,并为未来的研究提供了新的方向。
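
A guess at what a constraint-based prompt of the kind described above could look like: the entity and its existing textual description are injected as explicit context so the LLM paraphrases or enriches the description instead of inventing unrelated facts. The wording below is illustrative, not the prompt released with CP-KGC.

```python
def build_constrained_prompt(entity: str, description: str) -> str:
    """Constrain the LLM with the entity and its known description to curb hallucination."""
    return (
        "You are completing a knowledge graph.\n"
        f"Entity: {entity}\n"
        f"Known description: {description}\n"
        "Rewrite the description in one fluent sentence. "
        "Use only facts stated above; do not add new facts."
    )

prompt = build_constrained_prompt(
    "forests",
    "a large area covered chiefly with trees and undergrowth",
)
print(prompt)
# The resulting text replaces the raw entity description fed to a text-based KGC model
# such as SimKGC, which is where description quality matters most.
```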

Lag-Llama: Towards Foundation Models for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.08278
  • repo_url: https://github.com/kashif/pytorch-transformer-ts
  • paper_authors: Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Biloš, Hena Ghonia, Nadhir Vincent Hassen, Anderson Schneider, Sahil Garg, Alexandre Drouin, Nicolas Chapados, Yuriy Nevmyvaka, Irina Rish
  • for: 这篇论文的目的是为时间序列预测构建基础模型,并研究此类模型的规模化行为。
  • methods: 该模型是一个通用的单变量概率时间序列预测模型,在大规模时间序列数据集合上进行训练。
  • results: 模型在未见过的"分布外"时间序列数据集上展现出良好的零样本预测能力,超过了有监督基线;作者还使用平滑断裂幂律(smoothly broken power-laws)来拟合并预测模型的规模化行为。
    Abstract Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-Llama, a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen "out-of-distribution" time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws to fit and predict model scaling behavior. The open source code is made available at https://github.com/kashif/pytorch-transformer-ts.
    摘要 为了构建时间序列预测的基础模型并研究其规模化行为,我们在此介绍仍在进行中的工作 Lag-Llama:一个在大规模时间序列数据集合上训练的通用单变量概率时间序列预测模型。该模型在未见过的"分布外"时间序列数据集上展现出良好的零样本预测能力,优于有监督基线。我们使用平滑断裂幂律来拟合并预测模型的规模化行为。开源代码见 https://github.com/kashif/pytorch-transformer-ts。
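
The "lag" in Lag-Llama refers to feeding lagged values of the series as inputs so one model can serve many series and frequencies. A minimal construction of such lag features is shown below; the specific lag set and the probabilistic head used by the model are assumptions here, not the released configuration.

```python
import numpy as np

def make_lag_features(series, lags=(1, 2, 3, 7, 14, 28)):
    """Build a (time, len(lags)) matrix of lagged values plus the aligned prediction targets."""
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return np.array(rows), np.array(targets)

# Toy daily series with weekly seasonality plus noise.
rng = np.random.default_rng(0)
t = np.arange(120)
series = 10 + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.5, size=t.size)

X, y = make_lag_features(series)
print("feature matrix:", X.shape, "targets:", y.shape)
# In the full model, each time step's lag vector is embedded and fed to a decoder-only
# transformer whose output parameterizes a predictive distribution (probabilistic forecast).
```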

Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval

  • paper_url: http://arxiv.org/abs/2310.08276
  • repo_url: None
  • paper_authors: Qing Ma, Jiancheng Pan, Cong Bai
  • for: 提升遥感图文检索的精度,解决视觉与语义不平衡导致的错误匹配问题。
  • methods: 提出一种新的方向导向的视觉语义嵌入模型(DOVE),利用面向区域的注意力模块(ROAM)和轻量级的Digging Text Genome Assistant(DTGA)来挖掘视觉与语言之间的关系。
  • results: 在RSICD和RSITMD两个基准数据集上,通过参数评估、定量比较、消融研究和可视化分析等大量实验验证了方法的有效性和优越性。
    Abstract Image-text retrieval has developed rapidly in recent years. However, it is still a challenge in remote sensing due to visual-semantic imbalance, which leads to incorrect matching of non-semantic visual and textual features. To solve this problem, we propose a novel Direction-Oriented Visual-semantic Embedding Model (DOVE) to mine the relationship between vision and language. Concretely, a Regional-Oriented Attention Module (ROAM) adaptively adjusts the distance between the final visual and textual embeddings in the latent semantic space, oriented by regional visual features. Meanwhile, a lightweight Digging Text Genome Assistant (DTGA) is designed to expand the range of tractable textual representation and enhance global word-level semantic connections using less attention operations. Ultimately, we exploit a global visual-semantic constraint to reduce single visual dependency and serve as an external constraint for the final visual and textual representations. The effectiveness and superiority of our method are verified by extensive experiments including parameter evaluation, quantitative comparison, ablation studies and visual analysis, on two benchmark datasets, RSICD and RSITMD.
    摘要 图文检索近年来发展迅速,但在遥感领域仍面临视觉-语义不平衡的挑战,这会导致非语义的视觉特征与文本特征发生错误匹配。为解决该问题,我们提出一种新的方向导向的视觉语义嵌入模型(DOVE),用以挖掘视觉与语言之间的关系。具体而言,面向区域的注意力模块(ROAM)以区域视觉特征为导向,自适应地调整最终视觉与文本嵌入在潜在语义空间中的距离;同时,我们设计了轻量级的Digging Text Genome Assistant(DTGA),用较少的注意力操作扩大可处理的文本表示范围并增强全局词级语义联系。最后,我们利用全局视觉-语义约束来降低对单一视觉特征的依赖,并将其作为最终视觉与文本表示的外部约束。我们在RSICD和RSITMD两个基准数据集上开展了包括参数评估、定量比较、消融研究和可视化分析在内的大量实验,验证了方法的有效性和优越性。

Impact of Co-occurrence on Factual Knowledge of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08256
  • repo_url: https://github.com/cheongwoong/impact_of_cooccurrence
  • paper_authors: Cheongwoong Kang, Jaesik Choi
  • for: Investigates why large language models (LLMs) often return factually incorrect answers and how to make them more reliable.
  • methods: Uses a quantitative analysis of co-occurrence statistics in the pre-training corpora to test whether LLM answers are driven by co-occurrence bias rather than factual recall (a counting sketch follows this entry).
  • results: LLMs prefer frequently co-occurring words over the correct answer, so they struggle to recall facts whose subject and object rarely co-occur in the pre-training data, and the bias persists regardless of model size or finetuning. Finetuning on a debiased dataset, obtained by filtering out samples with high subject-object co-occurrence, mitigates the bias, although it does not help recall rare facts unseen during finetuning.
    Abstract Large language models (LLMs) often make factually incorrect responses despite their success in various applications. In this paper, we hypothesize that relying heavily on simple co-occurrence statistics of the pre-training corpora is one of the main factors that cause factual errors. Our results reveal that LLMs are vulnerable to the co-occurrence bias, defined as preferring frequently co-occurred words over the correct answer. Consequently, LLMs struggle to recall facts whose subject and object rarely co-occur in the pre-training dataset although they are seen during finetuning. We show that co-occurrence bias remains despite scaling up model sizes or finetuning. Therefore, we suggest finetuning on a debiased dataset to mitigate the bias by filtering out biased samples whose subject-object co-occurrence count is high. Although debiased finetuning allows LLMs to memorize rare facts in the training set, it is not effective in recalling rare facts unseen during finetuning. Further research in mitigation will help build reliable language models by preventing potential errors. The code is available at \url{https://github.com/CheongWoong/impact_of_cooccurrence}.
    摘要 大型语言模型(LLM)经常会给出错误的回答,即使在不同的应用中具有成功。在这篇文章中,我们提出了假设,认为将重点放在预训料中的简单共occurrence统计上是 LLM 产生错误的主要原因。我们的结果显示 LLM 受到共occurrence偏见,即偏爱常见的词语,而不是正确的答案。因此, LLM 对于 rarely 共occurrence 的主题和物件没有记忆,即使它们在调整中看到过。我们发现,共occurrence偏见不受模型大小或调整的影响,因此我们建议使用删除偏见样本的删除调整,以降低这种偏见。虽然删除调整可以帮助 LLM 记忆预训料中罕见的事实,但是它不能帮助 LLM 在调整中发现过去未见的罕见事实。进一步的研究将有助于建立可靠的语言模型,以避免潜在的错误。代码可以在 \url{https://github.com/CheongWoong/impact_of_cooccurrence} 获取。
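A minimal sketch of the kind of co-occurrence analysis and debiased-finetuning filter described above: count document-level subject-object co-occurrences in a toy pre-training corpus and drop finetuning facts whose pair count exceeds a threshold. The corpus, facts, and threshold are made up for illustration; the paper's actual counting and filtering setup may differ.

```python
from collections import Counter
from itertools import combinations

corpus = [
    "paris is the capital of france",
    "the eiffel tower is in paris france",
    "canberra is the capital of australia",
    "sydney is a large city in australia",
]

# Document-level co-occurrence counts over word pairs.
cooc = Counter()
for doc in corpus:
    words = set(doc.split())
    for a, b in combinations(sorted(words), 2):
        cooc[(a, b)] += 1

def cooc_count(subj, obj):
    return cooc[tuple(sorted((subj, obj)))]

# Hypothetical (subject, relation, object) finetuning facts.
facts = [("france", "capital", "paris"), ("australia", "capital", "canberra")]

# Keep only facts whose subject-object co-occurrence is below a threshold, so the
# finetuning set does not simply reward frequently co-occurring pairs.
THRESHOLD = 2
debiased = [f for f in facts if cooc_count(f[0], f[2]) < THRESHOLD]
print(debiased)
```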

MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.08252
  • repo_url: https://github.com/GMC-DRL/MetaBox
  • paper_authors: Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Zhenrui Li, Guojun Peng, Yue-Jiao Gong, Yining Ma, Zhiguang Cao
  • for: Explores Meta-Black-Box Optimization with Reinforcement Learning (MetaBBO-RL) and provides a unified benchmark platform for developing and evaluating MetaBBO-RL methods.
  • methods: Offers a flexible algorithmic template that lets users implement their own designs within the platform, over 300 problem instances ranging from synthetic to realistic scenarios, a library of 19 baseline methods covering both traditional black-box optimizers and recent MetaBBO-RL approaches, and three standardized performance metrics.
  • results: Demonstrates the utility of MetaBox through a wide-ranging benchmarking study of existing MetaBBO-RL methods.
    Abstract Recently, Meta-Black-Box Optimization with Reinforcement Learning (MetaBBO-RL) has showcased the power of leveraging RL at the meta-level to mitigate manual fine-tuning of low-level black-box optimizers. However, this field is hindered by the lack of a unified benchmark. To fill this gap, we introduce MetaBox, the first benchmark platform expressly tailored for developing and evaluating MetaBBO-RL methods. MetaBox offers a flexible algorithmic template that allows users to effortlessly implement their unique designs within the platform. Moreover, it provides a broad spectrum of over 300 problem instances, collected from synthetic to realistic scenarios, and an extensive library of 19 baseline methods, including both traditional black-box optimizers and recent MetaBBO-RL methods. Besides, MetaBox introduces three standardized performance metrics, enabling a more thorough assessment of the methods. In a bid to illustrate the utility of MetaBox for facilitating rigorous evaluation and in-depth analysis, we carry out a wide-ranging benchmarking study on existing MetaBBO-RL methods. Our MetaBox is open-source and accessible at: https://github.com/GMC-DRL/MetaBox.
    摘要 近期,Meta-Black-Box优化器与强化学习(MetaBBO-RL)已经展示了通过在meta层使用RL来减少人工细化低级黑盒优化器的问题。然而,这个领域受到互联网缺乏一个统一的 benchmark 的限制。为了填补这个空白,我们介绍了 MetaBox,第一个专门为开发和评估 MetaBBO-RL 方法而设计的 benchmark 平台。MetaBox 提供了灵活的算法模板,allowing users 可以轻松地实现他们的独特设计在平台上。此外,它还提供了来自 sintetic 到 realistic 的问题集,以及一个广泛的库,包括传统的黑盒优化器和最新的 MetaBBO-RL 方法。此外,MetaBox 引入了三种标准性能指标,以便更加全面地评估方法。为了证明 MetaBox 的用于促进严格评估和深入分析的能力,我们进行了广泛的 benchmarking 研究,覆盖了现有的 MetaBBO-RL 方法。我们的 MetaBox 是开源的,可以在以下地址下载:https://github.com/GMC-DRL/MetaBox。

GROOT: Learning to Follow Instructions by Watching Gameplay Videos

  • paper_url: http://arxiv.org/abs/2310.08235
  • repo_url: https://github.com/CraftJarvis/GROOT
  • paper_authors: Shaofei Cai, Bowei Zhang, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang
  • for: Aims to build a controller that can follow open-ended instructions for gameplay in open-world environments.
  • methods: Proposes using reference videos as instructions, derives a learning framework that trains such instruction-following controllers from gameplay videos, and implements the agent GROOT with a simple yet effective encoder-decoder architecture based on causal transformers that induces a structured goal space.
  • results: On the proposed Minecraft SkillForge benchmark, Elo ratings show GROOT narrowing the human-machine gap and achieving a 70% winning rate over the best generalist agent baseline. Code and videos are available at https://craftjarvis-groot.github.io.
    Abstract We study the problem of building a controller that can follow open-ended instructions in open-world environments. We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations. A new learning framework is derived to allow learning such instruction-following controllers from gameplay videos while producing a video instruction encoder that induces a structured goal space. We implement our agent GROOT in a simple yet effective encoder-decoder architecture based on causal transformers. We evaluate GROOT against open-world counterparts and human players on a proposed Minecraft SkillForge benchmark. The Elo ratings clearly show that GROOT is closing the human-machine gap as well as exhibiting a 70% winning rate over the best generalist agent baseline. Qualitative analysis of the induced goal space further demonstrates some interesting emergent properties, including the goal composition and complex gameplay behavior synthesis. Code and video can be found on the website https://craftjarvis-groot.github.io.
    摘要 我们研究如何建立一个可以遵循开放式指令的控制器,在开放世界环境中进行游戏游戏。我们提议以参考视频作为指令,这些指令提供了表达力强的目标规范,同时消除了高昂的文本游戏注释。我们 derivates a new learning framework,使得可以从游戏视频中学习这种指令遵循控制器,并生成一个视频指令编码器,该编码器在游戏中生成结构化的目标空间。我们实现了我们的代理GROOT,使用了 causal transformers 基于 encoder-decoder 架构。我们对 Minecraft SkillForge benchmark 进行了评估,并与人类玩家和其他开放世界控制器进行比较。很明显,GROOT 在人机之间减少了差距,并在最佳通用代理基eline上达到 70% 的赢利率。另外,对于引导空间的分析也表明了一些有趣的 emergent 性,包括目标组合和复杂的游戏行为合成。代码和视频可以在 website https://craftjarvis-groot.github.io 找到。

The Impact of Time Step Frequency on the Realism of Robotic Manipulation Simulation for Objects of Different Scales

  • paper_url: http://arxiv.org/abs/2310.08233
  • repo_url: None
  • paper_authors: Minh Q. Ta, Holly Dinkel, Hameed Abdul-Rashid, Yangfei Dai, Jessica Myers, Tan Chen, Junyi Geng, Timothy Bretl
  • for: Studies how time step frequency and component scale affect the accuracy of robotic manipulation simulation.
  • methods: Evaluates simulation accuracy at different time step frequencies and component scales, using a pre-assembly part-picking simulation with two object geometries (a toy integrator sketch follows this entry).
  • results: Increasing the time step frequency improves simulation accuracy for small-scale objects.
    Abstract This work evaluates the impact of time step frequency and component scale on robotic manipulation simulation accuracy. Increasing the time step frequency for small-scale objects is shown to improve simulation accuracy. This simulation, demonstrating pre-assembly part picking for two object geometries, serves as a starting point for discussing how to improve Sim2Real transfer in robotic assembly processes.
    摘要 本研究评估了时间步频率与部件尺度对机器人操作仿真精度的影响。结果表明,提高小尺度物体的时间步频率可以提升仿真精度。该仿真演示了两种物体几何形状的装配前零件抓取,为讨论如何改进机器人装配过程中的 Sim2Real 迁移提供了起点。
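The intuition that smaller (stiffer) components need a higher time step frequency can be seen in a toy integrator: a small object behaves like a high-frequency oscillator, so the same step size produces much larger integration error. The sketch below is only an analogue under that assumption, not the simulator or object geometries used in the paper.

```python
import numpy as np

def simulate(omega, dt, t_end=0.1, x0=1.0):
    """Symplectic-Euler mass-spring oscillator; the analytic solution is x0*cos(omega*t)."""
    x, v = x0, 0.0
    for _ in range(int(round(t_end / dt))):
        v -= dt * omega**2 * x
        x += dt * v
    return x

for omega, label in ((50.0, "large part (low natural frequency)"),
                     (1000.0, "small part (high natural frequency)")):
    exact = np.cos(omega * 0.1)
    for dt in (1e-3, 1e-4, 1e-5):
        err = abs(simulate(omega, dt) - exact)
        print(f"{label:35s} dt={dt:.0e}  |x_sim - x_exact| = {err:.4f}")
```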

Large language models can replicate cross-cultural differences in personality

  • paper_url: http://arxiv.org/abs/2310.10679
  • repo_url: None
  • paper_authors: Paweł Niszczota, Mateusz Janczak
  • for: Tests whether GPT-4 can replicate cross-cultural differences in the Big Five personality traits, using the United States and South Korea as the cultural pair.
  • methods: Runs a large-scale experiment (N=8000) that manipulates the simulated nationality (US vs. Korean), the language of the Ten-Item Personality Inventory (English vs. Korean), and the language model (GPT-4 vs. GPT-3.5).
  • results: GPT-4 replicates the cross-cultural differences for each factor, but mean ratings show an upward bias, lower variation than the human samples, and lower structural validity.
    Abstract We use a large-scale experiment (N=8000) to determine whether GPT-4 can replicate cross-cultural differences in the Big Five, measured using the Ten-Item Personality Inventory. We used the US and South Korea as the cultural pair, given that prior research suggests substantial personality differences between people from these two countries. We manipulated the target of the simulation (US vs. Korean), the language of the inventory (English vs. Korean), and the language model (GPT-4 vs. GPT-3.5). Our results show that GPT-4 replicated the cross-cultural differences for each factor. However, mean ratings had an upward bias and exhibited lower variation than in the human samples, as well as lower structural validity. Overall, we provide preliminary evidence that LLMs can aid cross-cultural psychological research.
    摘要 我们通过一项大规模实验(N=8000)检验 GPT-4 能否复现五大人格特质上的跨文化差异(使用十项人格量表测量)。我们选取美国与韩国作为文化对,因为已有研究表明这两国人群的人格存在显著差异。我们操纵了模拟对象(美国人 vs. 韩国人)、量表语言(英语 vs. 韩语)以及语言模型(GPT-4 vs. GPT-3.5)。结果显示,GPT-4 能够复现每个因素上的跨文化差异;但平均评分存在向上偏差,且变异程度和结构效度均低于人类样本。总体而言,我们提供了大语言模型可以辅助跨文化心理学研究的初步证据。

SimCKP: Simple Contrastive Learning of Keyphrase Representations

  • paper_url: http://arxiv.org/abs/2310.08221
  • repo_url: https://github.com/brightjade/SimCKP
  • paper_authors: Minseok Choi, Chaeheon Gwak, Seho Kim, Si Hyeong Kim, Jaegul Choo
  • for: Proposes a simple contrastive learning framework to improve keyphrase generation and extraction.
  • methods: Uses an extractor-generator that learns context-aware phrase-level representations contrastively to extract keyphrases while also generating keyphrases absent from the document, plus a reranker that rescores each generated phrase by aligning its representation with the document (a toy contrastive-loss sketch follows this entry).
  • results: Experiments on multiple benchmark datasets show the approach outperforms state-of-the-art models by a significant margin.
    Abstract Keyphrase generation (KG) aims to generate a set of summarizing words or phrases given a source document, while keyphrase extraction (KE) aims to identify them from the text. Because the search space is much smaller in KE, it is often combined with KG to predict keyphrases that may or may not exist in the corresponding document. However, current unified approaches adopt sequence labeling and maximization-based generation that primarily operate at a token level, falling short in observing and scoring keyphrases as a whole. In this work, we propose SimCKP, a simple contrastive learning framework that consists of two stages: 1) An extractor-generator that extracts keyphrases by learning context-aware phrase-level representations in a contrastive manner while also generating keyphrases that do not appear in the document; 2) A reranker that adapts scores for each generated phrase by likewise aligning their representations with the corresponding document. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed approach, which outperforms the state-of-the-art models by a significant margin.
    摘要 “键签生成(KG)的目标是从来源文档中生成一系列概要的词汇或短语,而键签提取(KE)则是从文档中直接找到这些键签。由于搜寻空间较小的KE,因此通常与KG结合以预测文档中可能存在的键签。然而,现有的统一方法通常运用序列标记和最大化生成,主要在字元水平运作,忽略了评估和评分键签的整体性。在这个工作中,我们提出了简单的对照学习框架SimCKP,它包括以下两个阶段:1)抽取生成器,通过学习上下文感知词汇水平表示来提取键签,同时生成不存在文档中的键签;2)改进器,将每个生成的词汇排名更新,根据该词汇与文档的表示相互适合。实验结果显示,我们提出的方法可以对多个标准 benchmark dataset 进行优化,并与现有模型相比,具有较高的效果。”
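A toy sketch of the contrastive objective at the heart of the extractor described above: candidate phrase embeddings are scored against their document embedding, gold keyphrases are pulled toward the document, and the remaining candidates act as negatives. The shapes, temperature, and single-document simplification are assumptions; the paper's actual loss and architecture differ in detail.

```python
import torch
import torch.nn.functional as F

def phrase_contrastive_loss(doc_emb, phrase_emb, labels, temperature=0.1):
    """Pull embeddings of gold keyphrases toward their document embedding and push
    non-keyphrase candidates away (a simplified, single-document contrastive variant).

    doc_emb:    (d,)   document representation
    phrase_emb: (n, d) candidate phrase representations
    labels:     (n,)   1 for gold keyphrases, 0 for negatives
    """
    doc = F.normalize(doc_emb, dim=-1)
    phr = F.normalize(phrase_emb, dim=-1)
    logits = phr @ doc / temperature          # (n,) scaled cosine similarities
    log_prob = F.log_softmax(logits, dim=0)   # softmax over all candidates
    return -log_prob[labels.bool()].mean()    # maximize probability of gold phrases

# Toy usage with random tensors.
torch.manual_seed(0)
doc = torch.randn(128)
phrases = torch.randn(6, 128)
labels = torch.tensor([1, 0, 1, 0, 0, 0])
print(phrase_contrastive_loss(doc, phrases, labels))
```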

TriRE: A Multi-Mechanism Learning Paradigm for Continual Knowledge Retention and Promotion

  • paper_url: http://arxiv.org/abs/2310.08217
  • repo_url: https://github.com/NeurAI-Lab/TriRE
  • paper_authors: Preetha Vijayan, Prashant Bhat, Elahe Arani, Bahram Zonooz
  • For: The paper aims to address the challenge of continual learning (CL) in deep neural networks, specifically catastrophic forgetting (CF) of previously learned tasks.
  • Methods: The proposed method, TriRE, combines several neurophysiological processes, including neurogenesis, active forgetting, neuromodulation, metaplasticity, experience rehearsal, and context-dependent gating, to mitigate CF and improve CL performance.
  • Results: TriRE significantly reduces task interference and outperforms other CL approaches considered in isolation across various CL settings.
    Abstract Continual learning (CL) has remained a persistent challenge for deep neural networks due to catastrophic forgetting (CF) of previously learned tasks. Several techniques such as weight regularization, experience rehearsal, and parameter isolation have been proposed to alleviate CF. Despite their relative success, these research directions have predominantly remained orthogonal and suffer from several shortcomings, while missing out on the advantages of competing strategies. On the contrary, the brain continually learns, accommodates, and transfers knowledge across tasks by simultaneously leveraging several neurophysiological processes, including neurogenesis, active forgetting, neuromodulation, metaplasticity, experience rehearsal, and context-dependent gating, rarely resulting in CF. Inspired by how the brain exploits multiple mechanisms concurrently, we propose TriRE, a novel CL paradigm that encompasses retaining the most prominent neurons for each task, revising and solidifying the extracted knowledge of current and past tasks, and actively promoting less active neurons for subsequent tasks through rewinding and relearning. Across CL settings, TriRE significantly reduces task interference and surpasses different CL approaches considered in isolation.

Trustworthy Machine Learning

  • paper_url: http://arxiv.org/abs/2310.08215
  • repo_url: https://github.com/matthew-mcateer/practicing_trustworthy_machine_learning
  • paper_authors: Bálint Mucsányi, Michael Kirchhof, Elisa Nguyen, Alexander Rubinstein, Seong Joon Oh
  • for: This paper is written for researchers and practitioners who want to build trustworthy machine learning models that can generalize to small changes in the distribution, provide explainability, and quantify uncertainty.
  • methods: The paper covers four key topics in trustworthy machine learning: out-of-distribution generalization, explainability, uncertainty quantification, and evaluation of trustworthiness. It discusses classical and contemporary research papers in these fields and uncovers their underlying intuitions.
  • results: The book provides a theoretical and technical background in trustworthy machine learning, including code snippets and pointers to further sources on topics of TML. It is meant to be a stand-alone product and has evolved from a course offered at the University of Tübingen.
    Abstract As machine learning technology gets applied to actual products and solutions, new challenges have emerged. Models unexpectedly fail to generalize to small changes in the distribution, tend to be confident on novel data they have never seen, or cannot communicate the rationale behind their decisions effectively with the end users. Collectively, we face a trustworthiness issue with the current machine learning technology. This textbook on Trustworthy Machine Learning (TML) covers a theoretical and technical background of four key topics in TML: Out-of-Distribution Generalization, Explainability, Uncertainty Quantification, and Evaluation of Trustworthiness. We discuss important classical and contemporary research papers of the aforementioned fields and uncover and connect their underlying intuitions. The book evolved from the homonymous course at the University of T\"ubingen, first offered in the Winter Semester of 2022/23. It is meant to be a stand-alone product accompanied by code snippets and various pointers to further sources on topics of TML. The dedicated website of the book is https://trustworthyml.io/.
    摘要 machine learning技术应用到实际产品和解决方案时,新的挑战出现了。模型往往无法泛化到小的分布变化,对新数据感到非常自信,或者无法有效地通过决策的理由与用户交流。总之,我们面临着当前机器学习技术的信任问题。这本《信任worthy机器学习》(TML)教程涵盖了四个关键话题的理论和技术背景:离distribution泛化、解释性、不确定度量和评估信任worthiness。我们讨论了重要的经典和当代研究论文,探索了它们的基本感知。这本书源于同名课程,在2022/23学年冬季学期首次举行。它是一个独立的产品,附有代码示例和各种关于TML主题的资源。相关网站是
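Among the uncertainty-quantification topics such a book covers, temperature scaling is one of the simplest post-hoc calibration techniques; the sketch below fits a single temperature on held-out logits. The toy data and optimizer settings are illustrative only and are not taken from the book.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, lr=0.1, max_iter=100):
    """Post-hoc temperature scaling: learn one scalar T on held-out data so that
    softmax(logits / T) minimizes the negative log-likelihood (better calibration)."""
    log_t = torch.zeros(1, requires_grad=True)            # T = exp(log_t) > 0
    opt = torch.optim.LBFGS([log_t], lr=lr, max_iter=max_iter)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

# Toy held-out logits/labels; in practice these come from a validation split.
torch.manual_seed(0)
labels = torch.randint(0, 10, (500,))
logits = 5.0 * (F.one_hot(labels, 10).float() + 0.8 * torch.randn(500, 10))
print("fitted temperature:", fit_temperature(logits, labels))
```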

Long-Tailed Classification Based on Coarse-Grained Leading Forest and Multi-Center Loss

  • paper_url: http://arxiv.org/abs/2310.08206
  • repo_url: https://github.com/jinyery/cognisance
  • paper_authors: Jinye Yang, Ji Xu
  • For: This paper aims to address the long-tailed classification problem by proposing a new framework called Cognisance, which combines a Coarse-Grained Leading Forest (CLF) and a Multi-Center Loss (MCL) to learn invariant features and improve long-tailed classification performance.
  • Methods: The proposed method uses an unsupervised learning method, CLF, to better characterize the distribution of attributes within a class, and introduces a new metric learning loss, MCL, to gradually eliminate confusing attributes during feature learning (a toy multi-center loss sketch follows this entry).
  • Results: The proposed method achieves state-of-the-art performance on the ImageNet-GLT and MSCOCO-GLT benchmarks and can improve the performance of existing LT methods. The code is available on GitHub: https://github.com/jinyery/cognisance
    Abstract Long-tailed(LT) classification is an unavoidable and challenging problem in the real world. Most of the existing long-tailed classification methods focus only on solving the inter-class imbalance in which there are more samples in the head class than in the tail class, while ignoring the intra-lass imbalance in which the number of samples of the head attribute within the same class is much larger than the number of samples of the tail attribute. The deviation in the model is caused by both of these factors, and due to the fact that attributes are implicit in most datasets and the combination of attributes is very complex, the intra-class imbalance is more difficult to handle. For this purpose, we proposed a long-tailed classification framework, known as \textbf{\textsc{Cognisance}, which is founded on Coarse-Grained Leading Forest (CLF) and Multi-Center Loss (MCL), aiming to build a multi-granularity joint solution model by means of invariant feature learning. In this method, we designed an unsupervised learning method, i.e., CLF, to better characterize the distribution of attributes within a class. Depending on the distribution of attributes, we can flexibly construct sampling strategies suitable for different environments. In addition, we introduce a new metric learning loss (MCL), which aims to gradually eliminate confusing attributes during the feature learning process. More importantly, this approach does not depend on a specific model structure and can be integrated with existing LT methods as an independent component. We have conducted extensive experiments and our approach has state-of-the-art performance in both existing benchmarks ImageNet-GLT and MSCOCO-GLT, and can improve the performance of existing LT methods. Our codes are available on GitHub: \url{https://github.com/jinyery/cognisance}
    摘要 Traditional long-tailed classification methods only focus on solving the inter-class imbalance issue, where there are more samples in the head class than in the tail class, while ignoring the intra-class imbalance issue where the number of samples of the head attribute within the same class is much larger than the number of samples of the tail attribute. This leads to deviation in the model. Moreover, attributes are implicit in most datasets and the combination of attributes is very complex, making the intra-class imbalance more difficult to handle.To address these issues, we proposed a long-tailed classification framework called \textbf{\textsc{Cognisance} which is founded on Coarse-Grained Leading Forest (CLF) and Multi-Center Loss (MCL). The goal is to build a multi-granularity joint solution model through invariant feature learning.Our approach includes an unsupervised learning method, CLF, to better characterize the distribution of attributes within a class. Depending on the distribution of attributes, we can flexibly construct sampling strategies suitable for different environments. Additionally, we introduce a new metric learning loss, MCL, which aims to gradually eliminate confusing attributes during the feature learning process.The key advantage of our approach is that it does not depend on a specific model structure and can be integrated with existing LT methods as an independent component. We have conducted extensive experiments and our approach has achieved state-of-the-art performance in both existing benchmarks ImageNet-GLT and MSCOCO-GLT, and can improve the performance of existing LT methods. Our codes are available on GitHub: \url{https://github.com/jinyery/cognisance}.
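A minimal sketch of a multi-center metric loss in the spirit of MCL: each class keeps several learnable centers (standing in for coarse attribute groups), and a feature is pulled toward the nearest center of its own class. The number of centers, the nearest-center rule, and the squared-distance form are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MultiCenterLoss(nn.Module):
    """Simplified multi-center loss: pull each feature toward the nearest of several
    learnable centers of its own class."""

    def __init__(self, num_classes, centers_per_class, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, centers_per_class, feat_dim))

    def forward(self, feats, labels):
        class_centers = self.centers[labels]                             # (B, K, D)
        dists = torch.cdist(feats.unsqueeze(1), class_centers).squeeze(1)  # (B, K)
        return dists.min(dim=1).values.pow(2).mean()

# Toy usage.
torch.manual_seed(0)
loss_fn = MultiCenterLoss(num_classes=5, centers_per_class=3, feat_dim=64)
feats = torch.randn(8, 64)
labels = torch.randint(0, 5, (8,))
print(loss_fn(feats, labels))
```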

Beyond Traditional DoE: Deep Reinforcement Learning for Optimizing Experiments in Model Identification of Battery Dynamics

  • paper_url: http://arxiv.org/abs/2310.08198
  • repo_url: None
  • paper_authors: Gokhan Budan, Francesca Damiani, Can Kurtulus, N. Kemal Ure
  • For: Improve the efficiency of building accurate battery models, which energy management systems and design processes rely on for optimization.
  • Methods: Replaces traditional design of experiments (DoE) with deep reinforcement learning, which adapts the current profiles of ongoing experiments on the fly based on the statistics of past experiments instead of sweeping a predefined library of current-profile configurations.
  • Results: Simulations and real experiments show the proposed approach yields models as accurate as traditional DoE while using 85% fewer resources.
    Abstract Model identification of battery dynamics is a central problem in energy research; many energy management systems and design processes rely on accurate battery models for efficiency optimization. The standard methodology for battery modelling is traditional design of experiments (DoE), where the battery dynamics are excited with many different current profiles and the measured outputs are used to estimate the system dynamics. However, although it is possible to obtain useful models with the traditional approach, the process is time consuming and expensive because of the need to sweep many different current-profile configurations. In the present work, a novel DoE approach is developed based on deep reinforcement learning, which alters the configuration of the experiments on the fly based on the statistics of past experiments. Instead of sticking to a library of predefined current profiles, the proposed approach modifies the current profiles dynamically by updating the output space covered by past measurements, hence only the current profiles that are informative for future experiments are applied. Simulations and real experiments are used to show that the proposed approach gives models that are as accurate as those obtained with traditional DoE but by using 85\% less resources.
    摘要 模型识别电池动态是能源研究的中心问题,许多能源管理系统和设计过程都依赖于准确的电池模型以优化效率。现行的方法是传统的设计实验(DoE),通过刺激电池动态多种不同的电流 Profiling 并根据测量输出来估算系统动态。然而,尽管可以通过传统方法获得有用的模型,但这个过程占用时间和成本很大,因为需要探索许多不同的电流配置。在 presente 工作中,一种新的DoE方法基于深度强化学习被发展出来,该方法在实验过程中基于过去测量的统计参数来修改配置。相比传统方法,该方法不再仅仅依赖于静态的电流配置库,而是在实验过程中动态地修改电流配置,只有在过去测量中有用的电流配置才会被应用。通过实验和真实实验,我们显示了该方法可以提供与传统DoE相同的准确性,但是使用85% fewer resources。

EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

  • paper_url: http://arxiv.org/abs/2310.08185
  • repo_url: None
  • paper_authors: Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan
  • for: Improve the quality of long-form narrative text generation so that outputs are more coherent and relevant.
  • methods: Proposes Evaluation-guided Iterative Plan Extraction (EIPE-text), which extracts plans from a corpus of narratives and uses them to build a better planner; it has three stages (plan extraction, learning, and inference), with a question-answering-based evaluation mechanism that automatically scores plans and generates refinement instructions for iterative improvement.
  • results: GPT-4-based and human evaluations show that EIPE-text generates more coherent and relevant long-form narratives in the novel and storytelling domains.
    Abstract Plan-and-Write is a common hierarchical approach in long-form narrative text generation, which first creates a plan to guide the narrative writing. Following this approach, several studies rely on simply prompting large language models for planning, which often yields suboptimal results. In this paper, we propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text), which extracts plans from the corpus of narratives and utilizes the extracted plans to construct a better planner. EIPE-text has three stages: plan extraction, learning, and inference. In the plan extraction stage, it iteratively extracts and improves plans from the narrative corpus and constructs a plan corpus. We propose a question answer (QA) based evaluation mechanism to automatically evaluate the plans and generate detailed plan refinement instructions to guide the iterative improvement. In the learning stage, we build a better planner by fine-tuning with the plan corpus or in-context learning with examples in the plan corpus. Finally, we leverage a hierarchical approach to generate long-form narratives. We evaluate the effectiveness of EIPE-text in the domains of novels and storytelling. Both GPT-4-based evaluations and human evaluations demonstrate that our method can generate more coherent and relevant long-form narratives. Our code will be released in the future.
    摘要 Plan-and-Write 是一种常见的幂等方法,用于长篇叙述文本生成。在这种方法中,首先创建一个指导叙述的计划。然而,许多研究仅通过简单地请求大语言模型生成计划,这经常会得到不佳的结果。在这篇论文中,我们提出了一个新的框架,即评估指导逐步提取计划(EIPE-text)。这个框架包括三个阶段:计划提取、学习和推理。在计划提取阶段,我们使用迭代提取和改进计划的方法,从叙述资源中提取计划,并构建计划库。我们提出了一种问答(QA)基于的评估机制,自动评估计划,并生成详细的计划细化指导,以帮助迭代改进。在学习阶段,我们使用计划库或在计划库中进行Contextual learning 进行训练,以建立更好的规划器。最后,我们采用层次结构来生成长篇叙述。我们在小说和故事领域进行了评估,并得到了人类和 GPT-4 基于的评估结果,表明我们的方法可以生成更 coherent 和 relevante 的长篇叙述。我们将在未来发布代码。

Learn From Model Beyond Fine-Tuning: A Survey

  • paper_url: http://arxiv.org/abs/2310.08184
  • repo_url: https://github.com/ruthless-man/awesome-learn-from-model
  • paper_authors: Hongling Zheng, Li Shen, Anke Tang, Yong Luo, Han Hu, Bo Du, Dacheng Tao
  • for: Surveys Learn From Model (LFM) techniques, which study, modify, and design foundation models based only on the model interface, in order to better understand model structure and weights (in a black-box setting) and generalize the models to downstream tasks.
  • methods: Reviews methods and strategies across five major areas: model tuning, model distillation, model reuse, meta learning, and model editing.
  • results: Provides a comprehensive review of current foundation-model-based methods from the LFM perspective and highlights critical areas for future exploration and open issues for the research community.
    Abstract Foundation models (FM) have demonstrated remarkable performance across a wide range of tasks (especially in the fields of natural language processing and computer vision), primarily attributed to their ability to comprehend instructions and access extensive, high-quality data. This not only showcases their current effectiveness but also sets a promising trajectory towards the development of artificial general intelligence. Unfortunately, due to multiple constraints, the raw data of the model used for large model training are often inaccessible, so the use of end-to-end models for downstream tasks has become a new research trend, which we call Learn From Model (LFM) in this article. LFM focuses on the research, modification, and design of FM based on the model interface, so as to better understand the model structure and weights (in a black box environment), and to generalize the model to downstream tasks. The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing. Each category encompasses a repertoire of methods and strategies that aim to enhance the capabilities and performance of FM. This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM, in order to help readers better understand the current research status and ideas. To conclude, we summarize the survey by highlighting several critical areas for future exploration and addressing open issues that require further attention from the research community. The relevant papers we investigated in this article can be accessed at .
    摘要 基于模型(FM)在各种任务上表现出色,特别是自然语言处理和计算机视觉领域,这主要归功于它们对指令的理解和访问高质量数据的能力。这不仅表明当前的效果,还预示了人工智能发展的美好趋势。然而,由于多种限制,FM的原始数据通常不可 accessible,因此使用端到端模型进行下游任务的研究成为了新的研究趋势,我们在这篇文章中称之为“学习从模型”(LFM)。LFM的研究重点在于 FM 的模型接口上进行研究、修改和设计,以更好地理解模型结构和权重(在黑盒环境中),并将模型扩展到下游任务。LFM 的研究领域可以分为五大类:模型调整、模型蒸馏、模型复用、元学习和模型编辑。每个类别包括一系列方法和策略,旨在提高 FM 的能力和性能。本文通过 FM 的角度,对当前的 LFM 方法进行了全面的审视,以帮助读者更好地了解当前的研究状况和想法。以下是本文的结论:我们将来的探索细分为五个重要领域,并提出了一些需要进一步关注的问题。相关的研究论文可以在 中找到。

Multi-Scale Spatial-Temporal Recurrent Networks for Traffic Flow Prediction

  • paper_url: http://arxiv.org/abs/2310.08138
  • repo_url: None
  • paper_authors: Haiyang Liu, Chunjiang Zhu, Detian Zhang, Qing Li
  • for: Traffic flow prediction.
  • methods: A Multi-Scale Spatial-Temporal Recurrent Network (MSSTRN) with single-step and multi-step gate recurrent units and a spatial-temporal synchronous attention mechanism.
  • results: Achieves the best prediction accuracy with non-trivial margins over all twenty baseline methods on four real traffic datasets.
    Abstract Traffic flow prediction is one of the most fundamental tasks of intelligent transportation systems. The complex and dynamic spatial-temporal dependencies make the traffic flow prediction quite challenging. Although existing spatial-temporal graph neural networks hold prominent, they often encounter challenges such as (1) ignoring the fixed graph that limits the predictive performance of the model, (2) insufficiently capturing complex spatial-temporal dependencies simultaneously, and (3) lacking attention to spatial-temporal information at different time lengths. In this paper, we propose a Multi-Scale Spatial-Temporal Recurrent Network for traffic flow prediction, namely MSSTRN, which consists of two different recurrent neural networks: the single-step gate recurrent unit and the multi-step gate recurrent unit to fully capture the complex spatial-temporal information in the traffic data under different time windows. Moreover, we propose a spatial-temporal synchronous attention mechanism that integrates adaptive position graph convolutions into the self-attention mechanism to achieve synchronous capture of spatial-temporal dependencies. We conducted extensive experiments on four real traffic datasets and demonstrated that our model achieves the best prediction accuracy with non-trivial margins compared to all the twenty baseline methods.
    摘要 做为智能交通系统的基本任务之一,流行预测是非常复杂和动态的。虽然现有的空间-时间图神经网络具有显著的优势,但它们经常遇到以下困难:(1)忽略固定图,这限制了预测模型的性能;(2)不够同时捕捉复杂的空间-时间依赖关系;(3)缺乏对不同时间长度的空间-时间信息的注意力。在这篇论文中,我们提出了一种多级空间-时间循环网络(MSSTRN),它包括单步门阻循环单元和多步门阻循环单元,以全面捕捉不同时间窗口下的复杂空间-时间信息。此外,我们提出了一种空间-时间同步注意机制,它将适应性位图 convolution integrated into the self-attention mechanism,以同步捕捉空间-时间依赖关系。我们对四个实际交通数据集进行了广泛的实验,并证明了我们的模型在所有二十个基eline方法的比较下具有最好的预测精度。

Can Large Language Models Really Improve by Self-critiquing Their Own Plans?

  • paper_url: http://arxiv.org/abs/2310.08118
  • repo_url: None
  • paper_authors: Karthik Valmeekam, Matthew Marquez, Subbarao Kambhampati
  • for: investigate the verification/self-critiquing abilities of large language models in the context of planning
  • methods: employ LLMs for both plan generation and verification, assess the verifier LLM’s performance against ground-truth verification, and evaluate the impact of self-critiquing and feedback levels on system performance
  • results: self-critiquing appears to diminish plan generation performance, LLM verifiers produce a notable number of false positives, and the nature of feedback has minimal impact on plan generation.
    Abstract There have been widespread claims about Large Language Models (LLMs) being able to successfully verify or self-critique their candidate solutions in reasoning problems in an iterative mode. Intrigued by those claims, in this paper we set out to investigate the verification/self-critiquing abilities of large language models in the context of planning. We evaluate a planning system that employs LLMs for both plan generation and verification. We assess the verifier LLM's performance against ground-truth verification, the impact of self-critiquing on plan generation, and the influence of varying feedback levels on system performance. Using GPT-4, a state-of-the-art LLM, for both generation and verification, our findings reveal that self-critiquing appears to diminish plan generation performance, especially when compared to systems with external, sound verifiers and the LLM verifiers in that system produce a notable number of false positives, compromising the system's reliability. Additionally, the nature of feedback, whether binary or detailed, showed minimal impact on plan generation. Collectively, our results cast doubt on the effectiveness of LLMs in a self-critiquing, iterative framework for planning tasks.
    摘要 有很多人提出了大型自然语言模型(LLM)可以成功验证或自我批判其候选解决方案的宣传。为了调查这些宣传,我们在这篇论文中进行了大语言模型在规划中的验证/自我批判能力的调查。我们评估了使用LLM进行生成和验证的规划系统。我们评估了验证LLM的性能与基准验证、自我批判对计划生成的影响以及不同反馈水平对系统性能的影响。使用GPT-4,当前的状态顶尖LLM,进行生成和验证,我们发现自我批判对计划生成性能产生了负面影响,尤其是与外部、有效的验证器和LLM验证器相比。此外,我们发现验证LLM生成的许多假阳性,这使得系统的可靠性受到了损害。此外,反馈的性质,无论是binary还是详细,对计划生成没有显著影响。总之,我们的结果表明LLM在自我批判、迭代模式下的规划任务效果不足。

DUSA: Decoupled Unsupervised Sim2Real Adaptation for Vehicle-to-Everything Collaborative Perception

  • paper_url: http://arxiv.org/abs/2310.08117
  • repo_url: https://github.com/refkxh/DUSA
  • paper_authors: Xianghao Kong, Wentao Jiang, Jinrang Jia, Yifeng Shi, Runsheng Xu, Si Liu
  • for: 这个研究是为了解决自动驾驶需要高精度的车辆到所有事物(V2X)的共同感知问题,但是获得大量真实世界数据可能是costly和difficult的。因此,实验数据获得了更多的注意,因为它们可以在非常低的成本下生成大量的数据。但是,实验和真实世界之间的领域差强度常常导致从实验数据训练的模型在真实世界数据上表现不佳。
  • methods: 这个研究使用了一种名为Decoupled Unsupervised Sim2Real Adaptation(DUSA)的新方法,它将V2X共同感知领域的 sim2real 领域对应问题分解为两个互相独立的子问题: sim2real 适应和间 agent 适应。在 sim2real 适应方面,我们设计了一个位置适应的 LSA(Location-adaptive Sim2Real Adapter)模组,将从critical locations of the feature map中提取的特征进行适应,并通过一个 sim/real 检测器来调整这些特征与实验数据之间的对应。在间 agent 适应方面,我们还提出了一个 Confidence-aware Inter-agent Adapter(CIA)模组,将Agent-wise confidence maps的指导下进行细部特征的对应。
  • results: 实验结果显示,提案的 DUSA 方法在无supervision的 sim2real 适应上具有优秀的效果,从 simulated V2XSet 数据集中获得了高精度的 V2X 共同感知结果,并且在真实世界 DAIR-V2X-C 数据集上进行验证。
    Abstract Vehicle-to-Everything (V2X) collaborative perception is crucial for autonomous driving. However, achieving high-precision V2X perception requires a significant amount of annotated real-world data, which can always be expensive and hard to acquire. Simulated data have raised much attention since they can be massively produced at an extremely low cost. Nevertheless, the significant domain gap between simulated and real-world data, including differences in sensor type, reflectance patterns, and road surroundings, often leads to poor performance of models trained on simulated data when evaluated on real-world data. In addition, there remains a domain gap between real-world collaborative agents, e.g. different types of sensors may be installed on autonomous vehicles and roadside infrastructures with different extrinsics, further increasing the difficulty of sim2real generalization. To take full advantage of simulated data, we present a new unsupervised sim2real domain adaptation method for V2X collaborative detection named Decoupled Unsupervised Sim2Real Adaptation (DUSA). Our new method decouples the V2X collaborative sim2real domain adaptation problem into two sub-problems: sim2real adaptation and inter-agent adaptation. For sim2real adaptation, we design a Location-adaptive Sim2Real Adapter (LSA) module to adaptively aggregate features from critical locations of the feature map and align the features between simulated data and real-world data via a sim/real discriminator on the aggregated global feature. For inter-agent adaptation, we further devise a Confidence-aware Inter-agent Adapter (CIA) module to align the fine-grained features from heterogeneous agents under the guidance of agent-wise confidence maps. Experiments demonstrate the effectiveness of the proposed DUSA approach on unsupervised sim2real adaptation from the simulated V2XSet dataset to the real-world DAIR-V2X-C dataset.
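A simplified sketch of the sim/real feature alignment idea: aggregate a feature map into a global vector and train a domain discriminator through a gradient-reversal layer so the detector's features become domain-invariant. The pooling, discriminator architecture, and loss below are generic stand-ins; DUSA's LSA and CIA modules are considerably more involved (location-adaptive aggregation, confidence-aware inter-agent alignment).

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer used for adversarial sim/real feature alignment."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class SimRealAligner(nn.Module):
    """Global feature aggregation followed by a sim/real discriminator."""
    def __init__(self, channels=256):
        super().__init__()
        self.discriminator = nn.Sequential(nn.Linear(channels, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feat_map, lam=1.0):
        pooled = feat_map.mean(dim=(2, 3))                 # (B, C) global aggregation
        return self.discriminator(GradReverse.apply(pooled, lam)).squeeze(-1)

# Toy usage: domain label 0 = simulated, 1 = real.
torch.manual_seed(0)
aligner = SimRealAligner()
feats = torch.randn(4, 256, 32, 32)
domain = torch.tensor([0., 0., 1., 1.])
loss = nn.functional.binary_cross_entropy_with_logits(aligner(feats), domain)
loss.backward()
print(loss.item())
```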

Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques

  • paper_url: http://arxiv.org/abs/2310.08101
  • repo_url: None
  • paper_authors: Junxiao Shen, John J. Dudley, Jingyao Zheng, Bill Byrne, Per Ola Kristensson
  • for: 这篇研究旨在提高文本输入的效率和流畅性,并且应对深度学习模型在文本输入中的应用。
  • methods: 这篇研究使用了大型语言模型GPT-3.5的内置学习能力,将其训练为不同的文本预测技术。另外,还引入了一个对话式提示生成器Promptor,以帮助设计师创建适当的提示。
  • results: 研究结果显示,使用Promptor生成的提示可以提高文本预测的相似度和 coherence 比设计师自己创建的提示高出35%和22%。
    Abstract Text entry is an essential task in our day-to-day digital interactions. Numerous intelligent features have been developed to streamline this process, making text entry more effective, efficient, and fluid. These improvements include sentence prediction and user personalization. However, as deep learning-based language models become the norm for these advanced features, the necessity for data collection and model fine-tuning increases. These challenges can be mitigated by harnessing the in-context learning capability of large language models such as GPT-3.5. This unique feature allows the language model to acquire new skills through prompts, eliminating the need for data collection and fine-tuning. Consequently, large language models can learn various text prediction techniques. We initially showed that, for a sentence prediction task, merely prompting GPT-3.5 surpassed a GPT-2 backed system and is comparable with a fine-tuned GPT-3.5 model, with the latter two methods requiring costly data collection, fine-tuning and post-processing. However, the task of prompting large language models to specialize in specific text prediction tasks can be challenging, particularly for designers without expertise in prompt engineering. To address this, we introduce Promptor, a conversational prompt generation agent designed to engage proactively with designers. Promptor can automatically generate complex prompts tailored to meet specific needs, thus offering a solution to this challenge. We conducted a user study involving 24 participants creating prompts for three intelligent text entry tasks, half of the participants used Promptor while the other half designed prompts themselves. The results show that Promptor-designed prompts result in a 35% increase in similarity and 22% in coherence over those by designers.
    摘要 文本输入是我们日常数字互动中的基本任务。许多智能功能已经被开发出来,以减少这个过程的复杂性、效率和流畅性。这些改进包括句子预测和用户个性化。然而,随着深度学习基于语言模型成为标准,数据采集和模型细化的必要性增加。这些挑战可以通过大语言模型的上下文学习能力来解决,例如GPT-3.5。这种特有的功能允许语言模型通过提示来获得新的技能,从而消除数据采集和细化的需要。因此,大语言模型可以学习多种文本预测技术。我们的初步研究表明,对于句子预测任务,只需提示GPT-3.5,其性能比GPT-2 backing system和精心细化GPT-3.5模型高,但需要大量数据采集、细化和后处理。然而,让大语言模型专注于特定文本预测任务的任务可能是挑战,特别是没有提示工程学习的设计师。为解决这个问题,我们介绍了Promptor,一个用于生成对话提示的对话引擎,旨在与设计师进行激活engage。Promptor可以自动生成特定需求的复杂提示,因此为这个挑战提供了解决方案。我们对24名参与者进行了用户研究,其中一半使用Promptor,另一半设计自己的提示。结果表明,Promptor-设计的提示与设计师自己设计的提示相比,同样的任务上的相似性提高35%, coherence提高22%。

Sentinel: An Aggregation Function to Secure Decentralized Federated Learning

  • paper_url: http://arxiv.org/abs/2310.08097
  • repo_url: None
  • paper_authors: Chao Feng, Alberto Huertas Celdran, Janosch Baltensperger, Enrique Tomas Martinez Beltran, Gerome Bovet, Burkhard Stiller
  • for: Proposes a defense strategy to counteract poisoning attacks in Decentralized Federated Learning (DFL).
  • methods: Exploits the accessibility of local data and defines a three-step aggregation protocol of similarity filtering, bootstrap validation, and normalization to guard against malicious model updates (a simplified aggregation sketch follows this entry).
  • results: Evaluated on diverse datasets and across poisoning attack types and threat levels, Sentinel improves on the state of the art against both untargeted and targeted poisoning attacks.
    Abstract The rapid integration of Federated Learning (FL) into networking encompasses various aspects such as network management, quality of service, and cybersecurity while preserving data privacy. In this context, Decentralized Federated Learning (DFL) emerges as an innovative paradigm to train collaborative models, addressing the single point of failure limitation. However, the security and trustworthiness of FL and DFL are compromised by poisoning attacks, negatively impacting its performance. Existing defense mechanisms have been designed for centralized FL and they do not adequately exploit the particularities of DFL. Thus, this work introduces Sentinel, a defense strategy to counteract poisoning attacks in DFL. Sentinel leverages the accessibility of local data and defines a three-step aggregation protocol consisting of similarity filtering, bootstrap validation, and normalization to safeguard against malicious model updates. Sentinel has been evaluated with diverse datasets and various poisoning attack types and threat levels, improving the state-of-the-art performance against both untargeted and targeted poisoning attacks.
    摘要 随着联邦学习(FL)在网络中的快速整合,包括网络管理、质量服务和网络安全,同时保护数据隐私。在这个 контексте,分布式联邦学习(DFL) emerges as an innovative paradigm to train collaborative models, addressing the single point of failure limitation。然而,FL和DFL的安全性和可靠性受到毒素攻击的威胁,这会 negatively impact its performance。现有的防御机制是为中央化FL设计的,它们不充分利用了 DFL 的特点。因此,这个工作介绍了 Sentinel,一种防御策略,用于对抗毒素攻击在 DFL 中。Sentinel 利用了本地数据的可 accessible 性,并定义了三步集成协议,包括相似性筛选、 bootstrap 验证和归一化,以保护 против 恶意模型更新。Sentinel 在多种数据集和不同类型和威胁水平的攻击下进行了评估,提高了对于不argeted和targeted毒素攻击的状态前艺性表现。
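A toy sketch of a three-step aggregation in the spirit of Sentinel: cosine-similarity filtering against the node's own update, validation of the survivors on local data, and norm clipping before averaging. The thresholds, the `eval_loss` callback, and the flattened-vector representation of model updates are assumptions for illustration.

```python
import numpy as np

def sentinel_aggregate(local_update, peer_updates, local_eval_loss,
                       sim_thresh=0.5, loss_tolerance=1.5):
    """Three-step defensive aggregation sketch: similarity filtering, bootstrap
    validation on local data, and norm normalization before averaging.
    `local_eval_loss(update)` is a hypothetical callback evaluating an update
    on the node's local validation split."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    # 1) Similarity filtering against the node's own update.
    kept = [u for u in peer_updates if cosine(local_update, u) >= sim_thresh]

    # 2) Bootstrap validation: drop updates that degrade local loss too much.
    base = local_eval_loss(local_update)
    kept = [u for u in kept if local_eval_loss(u) <= loss_tolerance * base]

    # 3) Normalization: clip peer update norms to the local update's norm.
    ref = np.linalg.norm(local_update)
    kept = [u * min(1.0, ref / (np.linalg.norm(u) + 1e-12)) for u in kept]

    return np.mean([local_update] + kept, axis=0)

# Toy usage with a benign and a poisoned peer, and a dummy local-evaluation callback.
rng = np.random.default_rng(0)
true_model = rng.normal(size=100)
local = true_model + 0.2 * rng.normal(size=100)
peers = [true_model + 0.2 * rng.normal(size=100), 10.0 * rng.normal(size=100)]
eval_loss = lambda u: float(np.mean((u - true_model) ** 2))
print("aggregated update norm:", np.linalg.norm(sentinel_aggregate(local, peers, eval_loss)))
```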

Discerning Temporal Difference Learning

  • paper_url: http://arxiv.org/abs/2310.08091
  • repo_url: None
  • paper_authors: Jianfei Ma
  • for: More efficient evaluation of a policy's value function in reinforcement learning.
  • methods: Extends temporal difference learning, TD($\lambda$), with flexible emphasis functions, predetermined or adapted during training, that allocate update effort across states (a tabular sketch follows this entry).
  • results: Improves value estimation and speeds up learning across diverse scenarios.
    Abstract Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction error into the historical context. However, this approach often neglects the significance of historical states and the relative importance of propagating the TD error, influenced by challenges such as visitation imbalance or outcome noise. To address this, we propose a novel TD algorithm named discerning TD learning (DTD), which allows flexible emphasis functions$-$predetermined or adapted during training$-$to allocate efforts effectively across states. We establish the convergence properties of our method within a specific class of emphasis functions and showcase its promising potential for adaptation to deep RL contexts. Empirical results underscore that employing a judicious emphasis function not only improves value estimation but also expedites learning across diverse scenarios.
    摘要 时序差分学习(TD)是强化学习中的基础概念,旨在高效地评估策略的价值函数。TD($\lambda$)是其强大的变体,通过记忆迹将预测误差分配到历史上下文中。然而,这种方法往往忽略了历史状态的重要性以及传播 TD 误差的相对重要性,并受到访问不平衡或结果噪声等问题的影响。为此,我们提出了一种新的 TD 算法——分辨性 TD 学习(DTD),它允许使用预先设定或在训练中自适应的强调函数,在各状态之间有效地分配更新力度。我们在一类特定的强调函数下证明了方法的收敛性,并展示了其在深度强化学习场景中的应用潜力。实验结果表明,采用合适的强调函数不仅能改进价值估计,还能在多种场景下加快学习。
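A tabular sketch of emphasis-weighted TD(λ): a user-chosen emphasis function m(s) scales how strongly each visited state contributes to the eligibility trace, so update effort is allocated unevenly across states. This is a simplified illustration of the general idea, not the paper's exact DTD update.

```python
import numpy as np

def emphasis_td_lambda(episodes, n_states, emphasis, alpha=0.1, gamma=0.99, lam=0.9):
    """Tabular TD(lambda) where emphasis(s) scales the eligibility-trace increment
    for state s. Each episode is a list of (s, r, s_next, done) transitions."""
    V = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)                       # eligibility trace
        for s, r, s_next, done in episode:
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
            e *= gamma * lam
            e[s] += emphasis(s)                      # emphasis-weighted accumulating trace
            V += alpha * delta * e
    return V

# Toy 5-state chain: reward 1 on the last transition; emphasize the first two states.
chain = [[(s, 1.0 if s == 3 else 0.0, s + 1, s == 3) for s in range(4)]] * 200
print(emphasis_td_lambda(chain, n_states=5, emphasis=lambda s: 2.0 if s < 2 else 1.0))
```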

Low-Resource Clickbait Spoiling for Indonesian via Question Answering

  • paper_url: http://arxiv.org/abs/2310.08085
  • repo_url: None
  • paper_authors: Ni Putu Intan Maharani, Ayu Purwarianti, Alham Fikri Aji
  • for: Clickbait spoiling, i.e., generating a short text that satisfies the curiosity induced by a clickbait post, for the low-resource Indonesian language.
  • methods: Constructs a manually labeled Indonesian clickbait spoiling corpus and evaluates cross-lingual zero-shot question-answering models built on a selection of multilingual language models (a zero-shot QA sketch follows this entry).
  • results: XLM-RoBERTa (large) performs best for phrase and passage spoilers, while mDeBERTa (base) performs best for multipart spoilers.
    Abstract Clickbait spoiling aims to generate a short text to satisfy the curiosity induced by a clickbait post. As it is a newly introduced task, the dataset is only available in English so far. Our contributions include the construction of manually labeled clickbait spoiling corpus in Indonesian and an evaluation on using cross-lingual zero-shot question answering-based models to tackle clikcbait spoiling for low-resource language like Indonesian. We utilize selection of multilingual language models. The experimental results suggest that XLM-RoBERTa (large) model outperforms other models for phrase and passage spoilers, meanwhile, mDeBERTa (base) model outperforms other models for multipart spoilers.
    摘要 Clickbait 恶作戏目的是生成一篇短文以满足 clickbait 帖子所引起的好奇心。现在这个任务刚刚引入,数据集只有英语版本。我们的贡献包括手动标注的 Indonesian clickbait 恶作戏训练集,以及使用 cross-lingual zero-shot 问答模型来解决 low-resource 语言 like Indonesian 的 clickbait 恶作戏。我们利用多语言语言模型的选择。实验结果表明, XLM-RoBERTa (大) 模型在短语和段落 spoilers 方面表现出色,而 mDeBERTa (基础) 模型在多部 spoilers 方面表现更佳。
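A zero-shot cross-lingual setup like the one evaluated here can be sketched as extractive QA: treat the clickbait post as the question and the linked article as the context. The checkpoint name and the Indonesian example texts below are illustrative assumptions, not the paper's exact models or data.

```python
from transformers import pipeline

# A commonly used multilingual XLM-R checkpoint fine-tuned on SQuAD2;
# swap in whichever model you actually evaluate.
qa = pipeline("question-answering", model="deepset/xlm-roberta-large-squad2")

clickbait = "Satu trik sederhana ini membuat baterai ponselmu awet seharian"  # Indonesian teaser
article = (
    "Banyak pengguna mengeluhkan baterai cepat habis. Menurut teknisi, "
    "menurunkan kecerahan layar dan mematikan sinkronisasi latar belakang "
    "adalah cara paling efektif untuk menghemat baterai sepanjang hari."
)

spoiler = qa(question=clickbait, context=article)
print(spoiler["answer"], spoiler["score"])
```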

GameGPT: Multi-agent Collaborative Framework for Game Development

  • paper_url: http://arxiv.org/abs/2310.08067
  • repo_url: None
  • paper_authors: Dake Chen, Hanbin Wang, Yunhao Huo, Yuzhao Li, Haoyang Zhang
  • for: automate and expedite game development processes
  • methods: dual collaboration, layered approaches with several in-house lexicons, and a decoupling approach
  • results: mitigates hallucination and redundancy in the planning, task identification, and implementation phases, and achieves code generation with better precision.
    Abstract The large language model (LLM) based agents have demonstrated their capacity to automate and expedite software development processes. In this paper, we focus on game development and propose a multi-agent collaborative framework, dubbed GameGPT, to automate game development. While many studies have pinpointed hallucination as a primary roadblock for deploying LLMs in production, we identify another concern: redundancy. Our framework presents a series of methods to mitigate both concerns. These methods include dual collaboration and layered approaches with several in-house lexicons, to mitigate the hallucination and redundancy in the planning, task identification, and implementation phases. Furthermore, a decoupling approach is also introduced to achieve code generation with better precision.
    摘要 大型语言模型(LLM)基于代理的代理系统已经展示了自动化和加速软件开发过程的能力。在这篇文章中,我们专注于游戏开发,并提出了一个多代理协同框架,名为GameGPT,以自动化游戏开发。许多研究都指出了推几成为 LLM 在生产环境中应用时的主要障碍。我们则识别了另一个问题:重复。我们的框架提出了一系列方法来减轻这两个问题。这些方法包括双投递和层次方法,以减少在规划、任务识别和实现阶段中的重复和推几。此外,我们还引入了解离方法,以实现代码生成的更高精度。

The Search-and-Mix Paradigm in Approximate Nash Equilibrium Algorithms

  • paper_url: http://arxiv.org/abs/2310.08066
  • repo_url: None
  • paper_authors: Xiaotie Deng, Dongchen Li, Hanyu Li
  • for: Provides an automatic method for approximation analysis of algorithms that compute approximate Nash equilibria in two-player games.
  • methods: Reformulates such algorithms into a search-and-mix paradigm, consisting of a search phase followed by a mixing phase, which makes the design and analysis of the mixing phase fully automatable (a toy mixing-phase sketch follows this entry).
  • results: A program reproduces the approximation bounds of all the algorithms in the literature without hand-written proofs; since many approximation and online algorithms rely on LP relaxations, the approach may extend to automating the analysis of other algorithms.
    Abstract AI in Math deals with mathematics in a constructive manner so that reasoning becomes automated, less laborious, and less error-prone. For algorithms, the question becomes how to automate analyses for specific problems. For the first time, this work provides an automatic method for approximation analysis on a well-studied problem in theoretical computer science: computing approximate Nash equilibria in two-player games. We observe that such algorithms can be reformulated into a search-and-mix paradigm, which involves a search phase followed by a mixing phase. By doing so, we are able to fully automate the procedure of designing and analyzing the mixing phase. For example, we illustrate how to perform our method with a program to analyze the approximation bounds of all the algorithms in the literature. Same approximation bounds are computed without any hand-written proof. Our automatic method heavily relies on the LP-relaxation structure in approximate Nash equilibria. Since many approximation algorithms and online algorithms adopt the LP relaxation, our approach may be extended to automate the analysis of other algorithms.
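The sketch below illustrates the quantity being analyzed and a toy "mixing phase": given candidate strategies from a hypothetical search phase, it scans convex combinations and reports the maximum regret, i.e., the ε for which the profile is an ε-Nash equilibrium. The candidate strategies and the grid scan are illustrative; the paper automates the symbolic analysis of real algorithms' mixing phases rather than evaluating them numerically.

```python
import numpy as np

def epsilon(A, B, x, y):
    """Maximum regret of the profile (x, y) in a bimatrix game with payoff matrices
    A (row player) and B (column player); (x, y) is an eps-NE iff this is <= eps."""
    row_regret = np.max(A @ y) - x @ A @ y
    col_regret = np.max(x @ B) - x @ B @ y
    return max(row_regret, col_regret)

def mix(A, B, x_cands, y_cands, weights=np.linspace(0.0, 1.0, 101)):
    """Toy mixing phase: scan convex combinations of two row candidates against each
    column candidate and keep the combination with the smallest maximum regret."""
    best = (np.inf, None)
    for w in weights:
        x = w * x_cands[0] + (1 - w) * x_cands[1]
        for y in y_cands:
            best = min(best, (epsilon(A, B, x, y), (w, x, y)), key=lambda t: t[0])
    return best

rng = np.random.default_rng(0)
A, B = rng.random((2, 3, 3))                      # random 3x3 bimatrix game in [0, 1]
x1 = np.ones(3) / 3                               # hypothetical search-phase candidates
x2 = np.eye(3)[np.argmax(A.min(axis=1))]          # a maximin-style pure row strategy
y1 = np.ones(3) / 3
eps, _ = mix(A, B, [x1, x2], [y1])
print("epsilon of best mixture found:", eps)
```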

Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation

  • paper_url: http://arxiv.org/abs/2310.08056
  • repo_url: None
  • paper_authors: Shreyas Havaldar, Navodita Sharma, Shubhi Sareen, Karthikeyan Shanmugam, Aravindan Raghuveer
  • for: Targets Learning from Label Proportions (LLP), where only aggregate bag-level labels are available during training and the goal is the best instance-level performance on test data.
  • methods: Proposes an iterative framework with two main steps: pseudo labeling, which defines a Gibbs distribution over binary instance labels that incorporates covariate similarity and the bag-level aggregate label and marginalizes it with Belief Propagation to obtain pseudo labels; and embedding refinement, which uses the pseudo labels to supervise a learner that produces a better embedding, fed back as covariates for the next iteration (a simplified pseudo-labeling sketch follows this entry).
  • results: Shows strong gains of up to 15% over several SOTA baselines for LLP binary classification on tabular and image datasets, with minimal computational overhead above standard supervised learning even for large bags and up to a million samples.
    Abstract Learning from Label Proportions (LLP) is a learning problem where only aggregate level labels are available for groups of instances, called bags, during training, and the aim is to get the best performance at the instance-level on the test data. This setting arises in domains like advertising and medicine due to privacy considerations. We propose a novel algorithmic framework for this problem that iteratively performs two main steps. For the first step (Pseudo Labeling) in every iteration, we define a Gibbs distribution over binary instance labels that incorporates a) covariate information through the constraint that instances with similar covariates should have similar labels and b) the bag level aggregated label. We then use Belief Propagation (BP) to marginalize the Gibbs distribution to obtain pseudo labels. In the second step (Embedding Refinement), we use the pseudo labels to provide supervision for a learner that yields a better embedding. Further, we iterate on the two steps again by using the second step's embeddings as new covariates for the next iteration. In the final iteration, a classifier is trained using the pseudo labels. Our algorithm displays strong gains against several SOTA baselines (up to 15%) for the LLP Binary Classification problem on various dataset types - tabular and Image. We achieve these improvements with minimal computational overhead above standard supervised learning due to Belief Propagation, for large bag sizes, even for a million samples.
    摘要 学习从标签分布(LLP)是一个学习问题,在训练时只有袋子级别标签可用,并且目标是在测试数据上达到最佳实例级别性能。这种设定出现在广告和医疗等领域 due to privacy considerations。我们提出了一种新的算法框架,它在每次迭代中执行两个主要步骤。在第一步(假标签生成)中,我们定义了一个 Gibbs 分布 над二进制实例标签,该分布包含 a) covariate 信息通过要求同 covariate 的实例有同样的标签,以及 b) 袋子级别归一化标签。然后,我们使用信念传播(BP)来抽象 Gibbs 分布,从而获得假标签。在第二步(嵌入级修正)中,我们使用假标签来提供对一个更好的嵌入的超vision。然后,我们在下一轮迭代中使用上一轮的嵌入作为新的covariate。在最后一轮迭代中,我们使用假标签来训练一个分类器。我们的算法在LLP binary classification问题中 Display strong gains against several SOTA baselines (up to 15%) on various dataset types - tabular and Image. We achieve these improvements with minimal computational overhead above standard supervised learning due to Belief Propagation, even for large bag sizes, even for a million samples.
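A greatly simplified version of the pseudo-labeling step: within each bag, the instances with the highest current scores are marked positive so that the bag's positive fraction matches its aggregate label, and the pseudo labels then supervise the next training round. The paper instead marginalizes a covariate-aware Gibbs distribution with belief propagation, so treat this greedy variant only as an illustration of how bag-level labels become instance-level supervision.

```python
import numpy as np

def proportion_matched_pseudo_labels(scores, bags, bag_proportions):
    """Greedy pseudo-labeling for LLP.

    scores:          (N,) current model scores for all instances
    bags:            list of index arrays, one per bag
    bag_proportions: reported positive fraction for each bag
    """
    pseudo = np.zeros_like(scores, dtype=int)
    for idx, p in zip(bags, bag_proportions):
        k = int(round(p * len(idx)))
        top = idx[np.argsort(scores[idx])[::-1][:k]]   # k highest-scoring instances in the bag
        pseudo[top] = 1
    return pseudo

# Toy usage: two bags of five instances with known positive proportions.
rng = np.random.default_rng(0)
scores = rng.random(10)
bags = [np.arange(0, 5), np.arange(5, 10)]
print(proportion_matched_pseudo_labels(scores, bags, bag_proportions=[0.4, 0.8]))
```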

Understanding and Controlling a Maze-Solving Policy Network

  • paper_url: http://arxiv.org/abs/2310.08043
  • repo_url: None
  • paper_authors: Ulisse Mini, Peli Grietzer, Mrinank Sharma, Austin Meek, Monte MacDiarmid, Alexander Matt Turner
  • for: Studies the goals and goal representations of AI systems by examining a pretrained reinforcement learning policy that navigates mazes to a range of target squares.
  • methods: Carefully analyzes the maze-solving policy, identifies circuits corresponding to particular goals, including eleven channels that track the goal location, and partially controls the policy by modifying these channels with hand-designed interventions or by combining forward passes (an activation-patching sketch follows this entry).
  • results: The network pursues multiple context-dependent goals and contains redundant, distributed, and retargetable goal representations, shedding light on the nature of goal-direction in trained policy networks.
    Abstract To understand the goals and goal representations of AI systems, we carefully study a pretrained reinforcement learning policy that solves mazes by navigating to a range of target squares. We find this network pursues multiple context-dependent goals, and we further identify circuits within the network that correspond to one of these goals. In particular, we identified eleven channels that track the location of the goal. By modifying these channels, either with hand-designed interventions or by combining forward passes, we can partially control the policy. We show that this network contains redundant, distributed, and retargetable goal representations, shedding light on the nature of goal-direction in trained policy networks.
    摘要 要了解人工智能系统的目标和目标表达,我们仔细研究了一个预训练的奖励学习策略,该策略在迷宫中穿梭到多种目标方块。我们发现该网络追求多个Context-dependent目标,并且我们进一步确定了网络中的一些Circuits与这些目标相关。例如,我们发现了11个跟踪目标的通道。通过修改这些通道,可以在一定程度上控制策略。我们显示,这个网络包含多余的、分布式的和可重定向的目标表达,这 shed light on the nature of goal-direction in trained policy networks。
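Interventions of the kind described above can be implemented with forward hooks that overwrite selected channels of an intermediate activation. The network, layer, and channel indices below are toy stand-ins, not the actual maze policy or its eleven goal-tracking channels.

```python
import torch
import torch.nn as nn

def patch_channels(module, channel_ids, new_value):
    """Register a forward hook that overwrites selected channels of a module's output,
    mimicking the kind of intervention used to retarget goal-tracking channels."""
    def hook(_module, _inputs, output):
        patched = output.clone()
        patched[:, channel_ids] = new_value
        return patched                      # the returned tensor replaces the module output
    return module.register_forward_hook(hook)

# Toy convnet standing in for the policy network's feature extractor.
net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
obs = torch.randn(1, 3, 16, 16)

baseline = net(obs)
handle = patch_channels(net[2], channel_ids=[4, 7, 11], new_value=0.0)
patched = net(obs)
handle.remove()

changed = (baseline != patched).flatten(2).any(dim=2).squeeze(0)
print("channels altered by the intervention:", torch.nonzero(changed).flatten().tolist())
```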

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08041
  • repo_url: None
  • paper_authors: Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang
  • for: Enable broader deployment of large language models (LLMs), whose high resource demands currently hinder it.
  • methods: Since Quantization-Aware Training (QAT) is too expensive for LLMs, proposes QLLM, an accurate and efficient low-bitwidth post-training quantization (PTQ) method: an adaptive channel reassembly technique (channel disassembly followed by channel assembly) redistributes the magnitude of activation outliers across channels, and an efficient low-rank tuning step compensates for quantization error while the pretrained quantized weights stay frozen (a channel-disassembly sketch follows this entry).
  • results: On LLaMA-1 and LLaMA-2, QLLM produces accurate quantized models efficiently; for example, it quantizes 4-bit LLaMA-2-70B within 10 hours on a single A100-80G GPU and outperforms the previous state-of-the-art method by 7.89% average accuracy across five zero-shot tasks.
    Abstract Large Language Models (LLMs) excel in NLP, but their demands hinder their widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive training costs make Post-Training Quantization (PTQ) a more practical approach for LLMs. In existing studies, activation outliers in particular channels are identified as the bottleneck to PTQ accuracy. They propose to transform the magnitudes from activations to weights, which however offers limited alleviation or suffers from unstable gradients, resulting in a severe performance drop at low-bitwidth. In this paper, we propose QLLM, an accurate and efficient low-bitwidth PTQ method designed for LLMs. QLLM introduces an adaptive channel reassembly technique that reallocates the magnitude of outliers to other channels, thereby mitigating their impact on the quantization range. This is achieved by channel disassembly and channel assembly, which first breaks down the outlier channels into several sub-channels to ensure a more balanced distribution of activation magnitudes. Then similar channels are merged to maintain the original channel number for efficiency. Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly. To further compensate for the performance loss caused by quantization, we propose an efficient tuning method that only learns a small number of low-rank weights while freezing the pre-trained quantized model. After training, these low-rank parameters can be fused into the frozen weights without affecting inference. Extensive experiments on LLaMA-1 and LLaMA-2 show that QLLM can obtain accurate quantized models efficiently. For example, QLLM quantizes the 4-bit LLaMA-2-70B within 10 hours on a single A100-80G GPU, outperforming the previous state-of-the-art method by 7.89% on the average accuracy across five zero-shot tasks.
    摘要 大型语言模型(LLM)在自然语言处理(NLP)领域表现出色,但它们的需求限制了它们的广泛部署。量化意识训练(QAT)提供了一种解决方案,但它的训练成本非常高,因此在LLM中使用Post-Training Quantization(PTQ)成为了更实际的方法。在现有的研究中,活动异常值在特定通道被识别为PTQ准确率的瓶颈。它们提议将活动的大小从权重转移到 weights,但这些方法具有有限的缓解或因为不稳定的梯度而导致性能下降。在这篇论文中,我们提出了QLLM,一种高效和准确的低位宽PTQ方法,适用于LLM。QLLM引入了适应通道重新组装技术,通过将异常通道的大小分配到其他通道来缓解其影响量化范围。这是通过通道分解和通道组装来实现的,先将异常通道分解成多个子通道,以确保更加平衡的活动大小分布。然后,类似通道被合并以维持原始通道数量的效率。此外,我们还提出了一种自适应的整数截取策略,以确定最佳的子通道数量。为了进一步补偿由量化所带来的性能损失,我们还提出了一种高效的调整方法,只需学习一小部分的低维度参数,而不影响推理。经验表明,QLLM可以高效地生成准确的量化模型,比如在4位LLAMA-2-70B上,QLLM在10个小时内在单个A100-80G GPU上量化,并在五个零shot任务上平均提高了7.89%的准确率。
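A NumPy illustration of the channel-disassembly idea: an outlier input channel is split into k sub-channels that each carry x/k, and the matching weight row is duplicated k times, so the layer output is unchanged while the activation range the quantizer must cover shrinks by a factor of k. QLLM additionally reassembles similar channels and chooses k adaptively, which this sketch omits.

```python
import numpy as np

def disassemble_channel(X, W, channel, k):
    """Split one input channel of X (and the matching weight row of W) into k
    sub-channels carrying x/k each; X_new @ W_new equals X @ W exactly."""
    x_split = np.repeat(X[:, channel:channel + 1] / k, k, axis=1)
    X_new = np.concatenate([np.delete(X, channel, axis=1), x_split], axis=1)
    w_split = np.repeat(W[channel:channel + 1, :], k, axis=0)
    W_new = np.concatenate([np.delete(W, channel, axis=0), w_split], axis=0)
    return X_new, W_new

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 3] *= 50.0                        # channel 3 is an activation outlier
W = rng.normal(size=(8, 16))

X2, W2 = disassemble_channel(X, W, channel=3, k=4)
print("max |output difference|:", np.abs(X @ W - X2 @ W2).max())
print("outlier range before/after:", np.abs(X[:, 3]).max(), np.abs(X2[:, -4:]).max())
```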

Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models

  • paper_url: http://arxiv.org/abs/2310.08039
  • repo_url: https://github.com/songjinbo/ECMM
  • paper_authors: Jinbo Song, Ruoran Huang, Xinyang Wang, Wei Huang, Qian Yu, Mingming Chen, Yafei Yao, Chaosheng Fan, Changping Peng, Zhangang Lin, Jinghe Hu, Jingping Shao
  • for: Improve multi-stage architectures in recommender systems and online advertising by reducing the sample selection bias (SSB) problem in the pre-ranking stage.
  • methods: Proposes Entire-chain Cross-domain Models (ECM), which leverage samples from the whole cascaded stages, and a fine-grained neural structure, ECMM, with a cross-domain multi-tower network that predicts each stage's result and a sub-network routing strategy with $L_0$ regularization to reduce computational cost.
  • results: On real-world large-scale traffic logs, the pre-ranking models outperform state-of-the-art methods while keeping time consumption at an acceptable level, achieving a better trade-off between efficiency and effectiveness.
    Abstract Industrial systems such as recommender systems and online advertising, have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-ranking approaches mainly endure sample selection bias (SSB) problem owing to ignoring the entire-chain data dependence, resulting in sub-optimal performances. In this paper, we rethink pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from the whole cascaded stages to effectively alleviate SSB problem. Besides, we design a fine-grained neural structure named ECMM to further improve the pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict for each stage result, and introduce the sub-networking routing strategy with $L0$ regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while time consumption is maintained within an acceptable level, which achieves better trade-off between efficiency and effectiveness.
    摘要
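A minimal sketch of the multi-tower idea behind ECMM, under stated assumptions: a shared bottom feeds one small tower per cascade stage, so labels from the whole chain (not just pre-rank exposures) supervise the pre-ranker. The class name, feature dimensions and the L0-regularized sub-network routing described in the paper are omitted or hypothetical here.

```python
import torch
import torch.nn as nn

class MultiTowerPreRanker(nn.Module):
    """Toy multi-tower pre-ranker: a shared bottom feeds one tower per
    cascade stage (pre-rank / rank / re-rank), so the entire-chain labels
    supervise the model and reduce sample selection bias."""
    def __init__(self, in_dim=32, hidden=64, stages=("prerank", "rank", "rerank")):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.towers = nn.ModuleDict(
            {s: nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
             for s in stages})

    def forward(self, x):
        h = self.shared(x)
        return {s: torch.sigmoid(t(h)).squeeze(-1) for s, t in self.towers.items()}

model = MultiTowerPreRanker()
x = torch.randn(16, 32)                                      # 16 candidate items
labels = {s: torch.randint(0, 2, (16,)).float() for s in model.towers}
preds = model(x)
loss = sum(nn.functional.binary_cross_entropy(preds[s], labels[s]) for s in preds)
loss.backward()
print({s: p.shape for s, p in preds.items()})
```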

Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles

  • paper_url: http://arxiv.org/abs/2310.08034
  • repo_url: None
  • paper_authors: Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Ziran Wang
  • for: 提高自动驾驶车辆的安全性和效率,通过语言模型增强决策过程
  • methods: 利用语言模型的语言和上下文理解能力,与专门的工具集成在自动驾驶车辆中
  • results: 实验表明,链式思维提示能够提升驾驶决策质量;语言模型还可根据口头指令实时调整驾驶行为,实现个性化的驾驶体验。
    Abstract The fusion of human-centric design and artificial intelligence (AI) capabilities has opened up new possibilities for next-generation autonomous vehicles that go beyond transportation. These vehicles can dynamically interact with passengers and adapt to their preferences. This paper proposes a novel framework that leverages Large Language Models (LLMs) to enhance the decision-making process in autonomous vehicles. By utilizing LLMs' linguistic and contextual understanding abilities with specialized tools, we aim to integrate the language and reasoning capabilities of LLMs into autonomous vehicles. Our research includes experiments in HighwayEnv, a collection of environments for autonomous driving and tactical decision-making tasks, to explore LLMs' interpretation, interaction, and reasoning in various scenarios. We also examine real-time personalization, demonstrating how LLMs can influence driving behaviors based on verbal commands. Our empirical results highlight the substantial advantages of utilizing chain-of-thought prompting, leading to improved driving decisions, and showing the potential for LLMs to enhance personalized driving experiences through ongoing verbal feedback. The proposed framework aims to transform autonomous vehicle operations, offering personalized support, transparent decision-making, and continuous learning to enhance safety and effectiveness. We achieve user-centric, transparent, and adaptive autonomous driving ecosystems supported by the integration of LLMs into autonomous vehicles.
    摘要 以人为本的设计与人工智能(AI)能力的融合,为超越单纯交通功能的下一代自动驾驶汽车开辟了新的可能:这类车辆能够与乘客动态交互,并适应其偏好。本文提出了一种利用大型语言模型(LLM)增强自动驾驶决策过程的新框架。通过将 LLM 的语言与上下文理解能力同专用工具相结合,我们旨在把 LLM 的语言和推理能力整合进自动驾驶汽车。我们的研究包括在 HighwayEnv(一组自动驾驶与战术决策任务环境)中开展实验,以探索 LLM 在各种场景下的解释、交互与推理能力;我们还考察了实时个性化,展示 LLM 如何依据口头指令影响驾驶行为。实验结果突出了链式思维提示的显著优势,可带来更好的驾驶决策,并表明 LLM 能通过持续的语言反馈提升个性化驾驶体验。所提框架旨在变革自动驾驶运行方式,提供个性化支持、透明的决策和持续学习,以提升安全性与有效性,最终构建以用户为中心、透明且自适应的自动驾驶生态系统。
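A minimal illustration of chain-of-thought prompting for a driving decision. The prompt text, action vocabulary and the `query_llm` helper are hypothetical stand-ins, not the paper's actual prompt or API.

```python
def build_cot_prompt(observation: dict, command: str) -> str:
    """Assemble a chain-of-thought prompt: the model is asked to reason
    step by step about the scene before committing to a discrete action."""
    return (
        "You are the decision module of an autonomous vehicle.\n"
        f"Scene: ego speed {observation['ego_speed']} m/s, "
        f"lead vehicle gap {observation['lead_gap']} m, "
        f"left lane {'free' if observation['left_free'] else 'occupied'}.\n"
        f"Passenger request: \"{command}\"\n"
        "Think step by step about safety, the passenger request, and traffic rules, "
        "then answer with exactly one action from "
        "[KEEP_LANE, CHANGE_LEFT, CHANGE_RIGHT, SLOW_DOWN, SPEED_UP]."
    )

def query_llm(prompt: str) -> str:   # hypothetical LLM call; replace with a real client
    return "Reasoning: the gap is closing and the left lane is free... Action: CHANGE_LEFT"

obs = {"ego_speed": 25, "lead_gap": 18, "left_free": True}
reply = query_llm(build_cot_prompt(obs, "I'm in a hurry, but drive safely"))
action = reply.rsplit("Action:", 1)[-1].strip()
print(action)
```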

Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.08032
  • repo_url: https://github.com/aoluming/IDKG
  • paper_authors: Jiaqi Li, Guilin Qi, Chuanyi Zhang, Yongrui Chen, Yiming Tan, Chenlong Xia, Ye Tian
  • for: 这篇论文旨在提升多模态电影类型分类的性能,解决现有方法中元数据的群组关系未被利用、注意力分配不可靠以及融合特征判别性不足等问题。
  • methods: 该方法从多个角度利用知识图来解决上述问题:首先将元数据构建为领域知识图,并采用平移式(translate)知识图嵌入模型刻画实体间关系;接着引入基于自监督学习的 Attention Teacher 模块,学习知识图的分布并生成合理的注意力权重;最后提出 Genre-Centroid Anchored Contrastive Learning 模块,增强融合特征的判别能力。
  • results: 实验结果表明,我们的方法在 MM-IMDb 2.0 数据集上比现有方法高效,并且在 MM-IMDb 数据集上也达到了比较好的效果。
    Abstract Multimodal movie genre classification has always been regarded as a demanding multi-label classification task due to the diversity of multimodal data such as posters, plot summaries, trailers and metadata. Although existing works have made great progress in modeling and combining each modality, they still face three issues: 1) unutilized group relations in metadata, 2) unreliable attention allocation, and 3) indiscriminative fused features. Given that the knowledge graph has been proven to contain rich information, we present a novel framework that exploits the knowledge graph from various perspectives to address the above problems. As a preparation, the metadata is processed into a domain knowledge graph. A translate model for knowledge graph embedding is adopted to capture the relations between entities. Firstly we retrieve the relevant embedding from the knowledge graph by utilizing group relations in metadata and then integrate it with other modalities. Next, we introduce an Attention Teacher module for reliable attention allocation based on self-supervised learning. It learns the distribution of the knowledge graph and produces rational attention weights. Finally, a Genre-Centroid Anchored Contrastive Learning module is proposed to strengthen the discriminative ability of fused features. The embedding space of anchors is initialized from the genre entities in the knowledge graph. To verify the effectiveness of our framework, we collect a larger and more challenging dataset named MM-IMDb 2.0 compared with the MM-IMDb dataset. The experimental results on two datasets demonstrate that our model is superior to the state-of-the-art methods. We will release the code in the near future.
    摘要 多模态电影类型分类一直被视为一项要求较高的多标签分类任务,原因在于海报、剧情简介、预告片和元数据等多模态数据的多样性。尽管已有工作在各模态的建模与融合上取得了长足进展,但仍面临三个问题:1)元数据中的群组关系未被利用;2)注意力分配不可靠;3)融合特征缺乏判别性。鉴于知识图已被证明蕴含丰富信息,我们提出了一个从多个角度利用知识图来解决上述问题的新框架。作为准备工作,先将元数据构建为领域知识图,并采用平移式知识图嵌入模型来刻画实体间的关系。我们首先利用元数据中的群组关系从知识图中检索相关嵌入,并将其与其他模态融合;随后引入基于自监督学习的 Attention Teacher 模块,学习知识图的分布并产生合理的注意力权重;最后提出 Genre-Centroid Anchored Contrastive Learning 模块以增强融合特征的判别能力,其锚点嵌入空间由知识图中的类型实体初始化。为验证框架的有效性,我们构建了比 MM-IMDb 更大、更具挑战性的数据集 MM-IMDb 2.0。在两个数据集上的实验结果表明,我们的模型优于当前最优方法。代码将在近期发布。
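A hedged sketch of a genre-centroid anchored contrastive objective: fused multimodal features are pulled toward their genre anchor and pushed away from other genres via temperature-scaled cross-entropy. This simplifies the paper's module, anchors here are random tensors rather than knowledge-graph genre entities, and the multi-label case is reduced to a single dominant genre for illustration.

```python
import torch
import torch.nn.functional as F

num_genres, dim, tau = 10, 128, 0.07
anchors = torch.randn(num_genres, dim, requires_grad=True)  # in the paper: initialized from KG genre entities

def genre_anchored_contrastive(fused, labels):
    """Pull each fused movie feature toward its genre anchor and away from
    the other genre anchors (InfoNCE over genre centroids)."""
    sim = F.normalize(fused, dim=-1) @ F.normalize(anchors, dim=-1).T   # (B, num_genres)
    return F.cross_entropy(sim / tau, labels)

fused = torch.randn(32, dim, requires_grad=True)    # fused multimodal features for 32 movies
labels = torch.randint(0, num_genres, (32,))        # one dominant genre per movie (multi-label handling omitted)
loss = genre_anchored_contrastive(fused, labels)
loss.backward()
print(float(loss))
```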

Beyond Sharing Weights in Decoupling Feature Learning Network for UAV RGB-Infrared Vehicle Re-Identification

  • paper_url: http://arxiv.org/abs/2310.08026
  • repo_url: None
  • paper_authors: Xingyue Liu, Jiahao Qi, Chen Chen, Kangcheng Bin, Ping Zhong
  • for: 该论文旨在解决无人机视觉检索中的跨模态车辆识别问题,提高视觉监测和公共安全领域的应用。
  • methods: 该论文提出了一个跨模态车辆识别 benchmark 名为 UAV Cross-Modality Vehicle Re-ID (UCM-VeID),包含 753 个标识性的车辆图像,以及一种 hybrid weights decoupling network (HWDNet) 来解决跨模态差异和方向差异挑战。
  • results: 实验结果表明,UCM-VeID 可以有效地解决跨模态车辆识别问题,并且 HWDNet 可以学习共享的 orientation-invariant 特征。
    Abstract Owing to the capacity of performing full-time target search, cross-modality vehicle re-identification (Re-ID) based on unmanned aerial vehicle (UAV) is gaining more attention in both video surveillance and public security. However, this promising and innovative research has not been studied sufficiently due to the data inadequacy issue. Meanwhile, the cross-modality discrepancy and orientation discrepancy challenges further aggravate the difficulty of this task. To this end, we pioneer a cross-modality vehicle Re-ID benchmark named UAV Cross-Modality Vehicle Re-ID (UCM-VeID), containing 753 identities with 16015 RGB and 13913 infrared images. Moreover, to meet cross-modality discrepancy and orientation discrepancy challenges, we present a hybrid weights decoupling network (HWDNet) to learn the shared discriminative orientation-invariant features. For the first challenge, we proposed a hybrid weights siamese network with a well-designed weight restrainer and its corresponding objective function to learn both modality-specific and modality shared information. In terms of the second challenge, three effective decoupling structures with two pretext tasks are investigated to learn orientation-invariant feature. Comprehensive experiments are carried out to validate the effectiveness of the proposed method. The dataset and codes will be released at https://github.com/moonstarL/UAV-CM-VeID.
    摘要 凭借全天候目标搜索能力,基于无人机(UAV)的跨模态车辆重识别(Re-ID)在视频监控和公共安全领域受到越来越多的关注。然而,由于数据匮乏,这一有前景的创新方向尚未得到充分研究;跨模态差异与朝向差异两大挑战更是加大了该任务的难度。为此,我们构建了一个跨模态车辆重识别基准 UAV Cross-Modality Vehicle Re-ID(UCM-VeID),包含 753 个身份、16015 张 RGB 图像和 13913 张红外图像。针对跨模态差异与朝向差异挑战,我们提出了混合权重解耦网络(HWDNet),用于学习共享的、朝向不变的判别特征。针对第一个挑战,我们提出了带有精心设计的权重约束器及相应目标函数的混合权重孪生网络,同时学习模态特定信息与模态共享信息;针对第二个挑战,我们研究了三种有效的解耦结构并配合两个预训练代理任务,以学习朝向不变特征。我们进行了全面的实验以验证所提方法的有效性。数据集与代码将发布于 https://github.com/moonstarL/UAV-CM-VeID。

Effects of Human Adversarial and Affable Samples on BERT Generalizability

  • paper_url: http://arxiv.org/abs/2310.08008
  • repo_url: None
  • paper_authors: Aparna Elangovan, Jiayuan He, Yuan Li, Karin Verspoor
  • for: 这篇论文的目的是探讨训练数据质量对模型的泛化性的影响,而不是训练数据量。
  • methods: 这篇论文使用了BERT模型,并对训练数据进行分类和关系抽取任务。
  • results: 研究发现,固定训练样本数量下,有10-30%的人工挑战(h-adversarial)样本可以提高精度和F1值,但是超过这个范围可能会导致性能普遍下降。同时,h-affable样本可能没有提高模型的泛化性,甚至会导致模型的泛化性下降。
    Abstract BERT-based models have had strong performance on leaderboards, yet have been demonstrably worse in real-world settings requiring generalization. Limited quantities of training data is considered a key impediment to achieving generalizability in machine learning. In this paper, we examine the impact of training data quality, not quantity, on a model's generalizability. We consider two characteristics of training data: the portion of human-adversarial (h-adversarial), i.e., sample pairs with seemingly minor differences but different ground-truth labels, and human-affable (h-affable) training samples, i.e., sample pairs with minor differences but the same ground-truth label. We find that for a fixed size of training samples, as a rule of thumb, having 10-30% h-adversarial instances improves the precision, and therefore F1, by up to 20 points in the tasks of text classification and relation extraction. Increasing h-adversarials beyond this range can result in performance plateaus or even degradation. In contrast, h-affables may not contribute to a model's generalizability and may even degrade generalization performance.
    摘要

A Novel Statistical Measure for Out-of-Distribution Detection in Data Quality Assurance

  • paper_url: http://arxiv.org/abs/2310.07998
  • repo_url: None
  • paper_authors: Tinghui Ouyang, Isao Echizen, Yoshiki Seo
  • for: 本研究旨在 investigate AIQualityManagement (AIQM) 领域中数据领域和out-of-distribution (OOD) 数据的问题。
  • methods: 本研究使用深度学习技术来实现特征表示,并开发了一种新的统计量来检测OOD数据。
  • results: 经过实验和评估于图像 benchmark datasets 和工业 dataset,提出的方法被证明为可靠和有效的OOD检测方法。
    Abstract Data outside the problem domain poses significant threats to the security of AI-based intelligent systems. Aiming to investigate the data domain and out-of-distribution (OOD) data in AI quality management (AIQM) study, this paper proposes to use deep learning techniques for feature representation and develop a novel statistical measure for OOD detection. First, to extract low-dimensional representative features distinguishing normal and OOD data, the proposed research combines the deep auto-encoder (AE) architecture and neuron activation status for feature engineering. Then, using local conditional probability (LCP) in data reconstruction, a novel and superior statistical measure is developed to calculate the score of OOD detection. Experiments and evaluations are conducted on image benchmark datasets and an industrial dataset. Through comparative analysis with other common statistical measures in OOD detection, the proposed research is validated as feasible and effective in OOD and AIQM studies.
    摘要 问题域之外的数据会对基于 AI 的智能系统的安全性构成重大威胁。本研究面向 AI 质量管理(AIQM)中的数据域与分布外(OOD)数据问题,提出利用深度学习技术进行特征表示,并设计一种新的统计量用于 OOD 检测。首先,结合深度自编码器(AE)架构与神经元激活状态进行特征工程,提取能区分正常数据与 OOD 数据的低维代表性特征;然后,基于数据重建中的局部条件概率(LCP),构建一种新的、更优的统计量来计算 OOD 检测得分。在图像基准数据集和一个工业数据集上进行的实验与评估,以及与其他常用统计量的对比分析,验证了所提方法在 OOD 检测与 AIQM 研究中的可行性与有效性。
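A minimal sketch of the overall recipe (autoencoder features plus a score derived from reconstruction behaviour). The paper's local-conditional-probability statistic is more refined; a plain reconstruction-error score stands in here, and the network sizes are arbitrary.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Small autoencoder trained only on in-distribution data."""
    def __init__(self, dim=64, latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, dim))

    def forward(self, x):
        return self.dec(self.enc(x))

def ood_score(ae, x):
    """Higher score = more likely out-of-distribution (poorly reconstructed)."""
    with torch.no_grad():
        return ((ae(x) - x) ** 2).mean(dim=1)

ae = TinyAE()
in_dist = torch.randn(512, 64)                       # "normal" training data
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(200):                                 # brief training on in-distribution data only
    opt.zero_grad()
    loss = ((ae(in_dist) - in_dist) ** 2).mean()
    loss.backward()
    opt.step()

ood = torch.randn(512, 64) * 3 + 5                   # shifted/scaled samples = out-of-distribution
print(ood_score(ae, in_dist).mean(), ood_score(ae, ood).mean())   # OOD scores should be larger
```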

Point-NeuS: Point-Guided Neural Implicit Surface Reconstruction by Volume Rendering

  • paper_url: http://arxiv.org/abs/2310.07997
  • repo_url: None
  • paper_authors: Chen Zhang, Wanjuan Su, Wenbing Tao
  • for: 本研究旨在提高多视图重建的精度和效率,提出一种基于点导航机制的新方法Point-NeuS。
  • methods: 该方法利用点建模施加几何约束,并对点云的 aleatoric 不确定性进行建模,以刻画噪声分布并估计各点的可靠性;此外引入神经投影(Neural Projection)模块,将点与图像关联到符号距离函数(SDF)上,以加强几何约束。
  • results: 经过效果的点导航,使用轻量级网络实现了11倍的速度提升,并且在多个实验中表现出高质量表面,尤其是细腻的细节和平滑区域。此外,它还具有强大的鲁棒性,可以抗 resist 噪声和缺失数据。
    Abstract Recently, learning neural implicit surface by volume rendering has been a promising way for multi-view reconstruction. However, limited accuracy and excessive time complexity remain bottlenecks that current methods urgently need to overcome. To address these challenges, we propose a new method called Point-NeuS, utilizing point-guided mechanisms to achieve accurate and efficient reconstruction. Point modeling is organically embedded into the volume rendering to enhance and regularize the representation of implicit surface. Specifically, to achieve precise point guidance and noise robustness, aleatoric uncertainty of the point cloud is modeled to capture the distribution of noise and estimate the reliability of points. Additionally, a Neural Projection module connecting points and images is introduced to add geometric constraints to the Signed Distance Function (SDF). To better compensate for geometric bias between volume rendering and point modeling, high-fidelity points are filtered into an Implicit Displacement Network to improve the representation of SDF. Benefiting from our effective point guidance, lightweight networks are employed to achieve an impressive 11x speedup compared to NeuS. Extensive experiments show that our method yields high-quality surfaces, especially for fine-grained details and smooth regions. Moreover, it exhibits strong robustness to both noisy and sparse data.
    摘要 近年来,通过体积渲染学习神经隐式表面已成为多视图重建的一条有前景的途径。然而,精度有限和时间复杂度过高仍是当前方法亟需克服的瓶颈。为了解决这些挑战,我们提出了一种名为 Point-NeuS 的新方法,利用点引导机制实现准确且高效的重建。点建模被有机地嵌入体积渲染之中,以增强并规范隐式表面的表示。具体而言,为实现精确的点引导并提高对噪声的鲁棒性,我们对点云的 aleatoric 不确定性进行建模,以刻画噪声分布并估计各点的可靠性;此外,引入连接点与图像的神经投影模块,为符号距离函数(SDF)施加几何约束。为更好地补偿体积渲染与点建模之间的几何偏差,高保真点被筛选进入隐式位移网络(Implicit Displacement Network),以改进 SDF 的表示。得益于有效的点引导,我们采用轻量级网络,相比 NeuS 实现了约 11 倍的加速。大量实验表明,该方法能够生成高质量表面,尤其是在细粒度细节和平滑区域上,并且对噪声数据和稀疏数据均表现出很强的鲁棒性。

HeightFormer: A Multilevel Interaction and Image-adaptive Classification-regression Network for Monocular Height Estimation with Aerial Images

  • paper_url: http://arxiv.org/abs/2310.07995
  • repo_url: None
  • paper_authors: Zhan Chen, Yidan Zhang, Xiyu Qi, Yongqiang Mao, Xin Zhou, Lulu Niu, Hui Wu, Lei Wang, Yunping Ge
  • for: 这篇论文是针对单一图像高度估测在远程测量领域中提出了全面的解决方案,以提高现有方法的精度和效能。
  • methods: 这篇论文使用了一种叫做HeightFormer的全新方法,其结合了多个层次互动和适应性分类回归,以解决单一图像高度估测中的常见问题。
  • results: 这篇论文的结果显示,使用HeightFormer方法可以实现高度估测的精度和效能,并且可以提高实际应用中的对象边缘深度估测精度。
    Abstract Height estimation has long been a pivotal topic within measurement and remote sensing disciplines, proving critical for endeavours such as 3D urban modelling, MR and autonomous driving. Traditional methods utilise stereo matching or multisensor fusion, both well-established techniques that typically necessitate multiple images from varying perspectives and adjunct sensors like SAR, leading to substantial deployment costs. Single image height estimation has emerged as an attractive alternative, boasting a larger data source variety and simpler deployment. However, current methods suffer from limitations such as fixed receptive fields, a lack of global information interaction, leading to noticeable instance-level height deviations. The inherent complexity of height prediction can result in a blurry estimation of object edge depth when using mainstream regression methods based on fixed height division. This paper presents a comprehensive solution for monocular height estimation in remote sensing, termed HeightFormer, combining multilevel interactions and image-adaptive classification-regression. It features the Multilevel Interaction Backbone (MIB) and Image-adaptive Classification-regression Height Generator (ICG). MIB supplements the fixed sample grid in CNN of the conventional backbone network with tokens of different interaction ranges. It is complemented by a pixel-, patch-, and feature map-level hierarchical interaction mechanism, designed to relay spatial geometry information across different scales and introducing a global receptive field to enhance the quality of instance-level height estimation. The ICG dynamically generates height partition for each image and reframes the traditional regression task, using a refinement from coarse to fine classification-regression that significantly mitigates the innate ill-posedness issue and drastically improves edge sharpness.
    摘要 高度估计一直是测量与遥感领域的核心课题,对三维城市建模、混合现实(MR)和自动驾驶等应用至关重要。传统方法采用立体匹配或多传感器融合,这些成熟技术通常需要多张不同视角的图像以及 SAR 等辅助传感器,部署成本高昂。单张图像高度估计因数据来源更广、部署更简单而成为一种有吸引力的替代方案。然而,现有方法存在感受野固定、缺乏全局信息交互等局限,导致明显的实例级高度偏差;高度预测固有的复杂性也使得基于固定高度划分的主流回归方法在物体边缘处的深度估计变得模糊。本文提出了一种面向遥感单目高度估计的完整解决方案 HeightFormer,它结合多层级交互与图像自适应的分类-回归,由多层级交互骨干(MIB)和图像自适应分类-回归高度生成器(ICG)组成。MIB 在传统骨干网络 CNN 的固定采样网格之外补充了具有不同交互范围的令牌,并辅以像素级、块级和特征图级的分层交互机制,在不同尺度间传递空间几何信息,引入全局感受野以提升实例级高度估计的质量。ICG 则为每幅图像动态生成高度划分,并将传统回归任务重构为由粗到细的分类-回归,从而显著缓解固有的病态问题,大幅提升边缘清晰度。
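A hedged sketch of the classification-regression idea: height is predicted as a distribution over discretized bins and read out as the expected bin center. The paper's image-adaptive bin generation and coarse-to-fine refinement are omitted; fixed, evenly spaced bins and the 100 m range used below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ClsRegHeightHead(nn.Module):
    """Predict per-pixel height as softmax weights over K height bins,
    then regress the expectation over bin centers (fixed bins here;
    HeightFormer generates the partition adaptively per image)."""
    def __init__(self, feat_dim=64, num_bins=64, max_height=100.0):
        super().__init__()
        self.logits = nn.Conv2d(feat_dim, num_bins, kernel_size=1)
        centers = (torch.arange(num_bins) + 0.5) * (max_height / num_bins)
        self.register_buffer("centers", centers)                 # (K,) bin centers in meters

    def forward(self, feats):                                     # feats: (B, C, H, W)
        probs = self.logits(feats).softmax(dim=1)                 # (B, K, H, W)
        return (probs * self.centers.view(1, -1, 1, 1)).sum(dim=1)  # (B, H, W) heights

head = ClsRegHeightHead()
heights = head(torch.randn(2, 64, 32, 32))
print(heights.shape, float(heights.min()), float(heights.max()))
```

Turning regression into classification over bins bounds each prediction to a plausible range and gives sharper gradients near bin boundaries, which is the intuition behind the improved edge sharpness claimed above.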

Large Language Models for Scientific Synthesis, Inference and Explanation

  • paper_url: http://arxiv.org/abs/2310.07984
  • repo_url: https://github.com/zyzisastudyreallyhardguy/llm4sd
  • paper_authors: Yizhen Zheng, Huan Yee Koh, Jiaxin Ju, Anh T. N. Nguyen, Lauren T. May, Geoffrey I. Webb, Shirui Pan
  • for: 这个论文的目的是用大语言模型来执行科学合成、推理和解释。
  • methods: 论文使用了通用的大语言模型来从科学数据集中进行推理,并将这些推理结果与专门用于机器学习的数据集相结合,以提高预测分子性质的性能。
  • results: 研究表明,当将大语言模型的推理和合成结果与专门用于机器学习的数据集相结合时,可以超过当前的状态艺术水平。此外,大语言模型还可以解释机器学习系统的预测结果。
    Abstract Large language models are a form of artificial intelligence systems whose primary knowledge consists of the statistical patterns, semantic relationships, and syntactical structures of language1. Despite their limited forms of "knowledge", these systems are adept at numerous complex tasks including creative writing, storytelling, translation, question-answering, summarization, and computer code generation. However, they have yet to demonstrate advanced applications in natural science. Here we show how large language models can perform scientific synthesis, inference, and explanation. We present a method for using general-purpose large language models to make inferences from scientific datasets of the form usually associated with special-purpose machine learning algorithms. We show that the large language model can augment this "knowledge" by synthesizing from the scientific literature. When a conventional machine learning system is augmented with this synthesized and inferred knowledge it can outperform the current state of the art across a range of benchmark tasks for predicting molecular properties. This approach has the further advantage that the large language model can explain the machine learning system's predictions. We anticipate that our framework will open new avenues for AI to accelerate the pace of scientific discovery.
    摘要 We present a method for using general-purpose large language models to make inferences from scientific datasets, which are usually used for special-purpose machine learning algorithms. We found that the large language model can augment this "knowledge" by synthesizing from the scientific literature. When a conventional machine learning system is augmented with this synthesized and inferred knowledge, it can outperform the current state of the art in predicting molecular properties. This approach also has the advantage that the large language model can explain the machine learning system's predictions. We believe that our framework will open up new opportunities for AI to accelerate scientific discovery.

Self-supervised visual learning for analyzing firearms trafficking activities on the Web

  • paper_url: http://arxiv.org/abs/2310.07975
  • repo_url: None
  • paper_authors: Sotirios Konstantakos, Despina Ioanna Chalkiadaki, Ioannis Mademlis, Adamantia Anna Rebolledo Chrysochoou, Georgios Th. Papadopoulos
  • for: 这篇论文的目的是对RGB图像中的自动化火器分类进行研究,以应对公共空间安全、情报收集和刑事调查等实际应用。
  • methods: 这篇论文使用的方法是深度神经网络(DNN),特别是卷积神经网络(CNN),并且使用了转移学习和自我超vised learning(SSL)。
  • results: 这篇论文的结果显示,使用SSL和转移学习可以实现更好的火器分类效果,并且可以在较小的 annotated 数据集上进行实现。
    Abstract Automated visual firearms classification from RGB images is an important real-world task with applications in public space security, intelligence gathering and law enforcement investigations. When applied to images massively crawled from the World Wide Web (including social media and dark Web sites), it can serve as an important component of systems that attempt to identify criminal firearms trafficking networks, by analyzing Big Data from open-source intelligence. Deep Neural Networks (DNN) are the state-of-the-art methodology for achieving this, with Convolutional Neural Networks (CNN) being typically employed. The common transfer learning approach consists of pretraining on a large-scale, generic annotated dataset for whole-image classification, such as ImageNet-1k, and then finetuning the DNN on a smaller, annotated, task-specific, downstream dataset for visual firearms classification. Neither Visual Transformer (ViT) neural architectures nor Self-Supervised Learning (SSL) approaches have been so far evaluated on this critical task. SSL essentially consists of replacing the traditional supervised pretraining objective with an unsupervised pretext task that does not require ground-truth labels..
    摘要 基于 RGB 图像的自动化枪支视觉分类是一项重要的现实任务,可应用于公共空间安全、情报收集与执法调查。当其作用于从万维网(包括社交媒体和暗网站点)大规模爬取的图像时,可以作为识别枪支非法交易网络系统的重要组成部分,通过分析开源情报大数据发挥作用。深度神经网络(DNN)是实现这一任务的最先进方法,通常采用卷积神经网络(CNN)。常见的迁移学习做法是先在大规模通用标注数据集(如 ImageNet-1k)上对 DNN 进行整图分类预训练,再在规模较小的、任务特定的标注数据集上针对枪支视觉分类进行微调。然而,视觉 Transformer(ViT)架构和自监督学习(SSL)方法迄今尚未在这一关键任务上得到评估。SSL 的核心思想是用一个无需真实标签的无监督代理任务取代传统的有监督预训练目标。

Interpretable Diffusion via Information Decomposition

  • paper_url: http://arxiv.org/abs/2310.07972
  • repo_url: https://github.com/kxh001/info-decomp
  • paper_authors: Xianghao Kong, Ollie Liu, Han Li, Dani Yogatama, Greg Ver Steeg
  • for: This paper is written for understanding the fine-grained relationships learned by diffusion models, and for developing methods to quantify and manipulate these relationships.
  • methods: The paper uses denoising diffusion models and exact expressions for mutual information and conditional mutual information to illuminate the relationships between words and parts of an image.
  • results: The paper shows that a natural non-negative decomposition of mutual information emerges, allowing for the quantification of informative relationships between words and pixels in an image, and enabling unsupervised localization of objects in images and measurement of effects through selective editing.
    Abstract Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque making it difficult to understand precisely what relationships between words and parts of an image are captured, or to predict the effect of an intervention. We illuminate the fine-grained relationships learned by diffusion models by noticing a precise relationship between diffusion and information decomposition. Exact expressions for mutual information and conditional mutual information can be written in terms of the denoising model. Furthermore, pointwise estimates can be easily estimated as well, allowing us to ask questions about the relationships between specific images and captions. Decomposing information even further to understand which variables in a high-dimensional space carry information is a long-standing problem. For diffusion models, we show that a natural non-negative decomposition of mutual information emerges, allowing us to quantify informative relationships between words and pixels in an image. We exploit these new relations to measure the compositional understanding of diffusion models, to do unsupervised localization of objects in images, and to measure effects when selectively editing images through prompt interventions.
    摘要 去噪扩散模型能够对图像与文本等复杂关系进行条件生成和密度建模。然而,模型学到的关系并不透明,难以准确理解其捕捉了词语与图像局部之间怎样的关系,也难以预测干预的效果。我们注意到扩散与信息分解之间存在精确联系,据此揭示扩散模型学到的细粒度关系:互信息与条件互信息可以直接用去噪模型的表达式写出,逐点(pointwise)估计也同样容易获得,从而可以考察特定图像与描述文本之间的关系。进一步把信息分解到高维空间中各个变量上、弄清哪些变量携带信息,是一个长期存在的难题;对于扩散模型,我们证明会自然地出现一种非负的互信息分解,使我们能够量化词语与图像像素之间的信息关系。我们利用这些新关系来衡量扩散模型的组合理解能力,对图像中的物体进行无监督定位,并在通过提示干预对图像进行选择性编辑时度量其效果。

A New Approach Towards Autoformalization

  • paper_url: http://arxiv.org/abs/2310.07957
  • repo_url: None
  • paper_authors: Nilay Patel, Rahul Saha, Jeffrey Flanigan
  • for: 本文提出了一种方法来自动化推理证明的验证,即通过自动将自然语言数学转化为可验证的正式语言。
  • methods: 本文提出了一种分解任务的方法,即将自然语言数学分解成三个更容易实现的子任务:不连接化形式化(即使用不连接的定义和证明)、实体链接(将证明和定义链接到正确的位置)和类型调整(使类型检查器通过)。
  • results: 本文提出了一个名为 arXiv2Formal 的 benchmark 数据集,包含 50 个证明,来验证自然语言数学的自动化验证能力。
    Abstract Verifying mathematical proofs is difficult, but can be automated with the assistance of a computer. Autoformalization is the task of automatically translating natural language mathematics into a formal language that can be verified by a program. This is a challenging task, and especially for higher-level mathematics found in research papers. Research paper mathematics requires large amounts of background and context. In this paper, we propose an avenue towards tackling autoformalization for research-level mathematics, by breaking the task into easier and more approachable subtasks: unlinked formalization (formalization with unlinked definitions and theorems), entity linking (linking to the proper theorems and definitions), and finally adjusting types so it passes the type checker. In addition, we present arXiv2Formal, a benchmark dataset for unlinked formalization consisting of 50 theorems formalized for the Lean theorem prover sampled from papers on arXiv.org. We welcome any contributions from the community to future versions of this dataset.
    摘要 自动化验证数学证明是具有挑战性的任务,但可以通过计算机的协助进行自动化。自动化形式化是将自然语言数学转换为可以由计算机验证的形式语言的任务。这是一项复杂的任务,特别是在研究论文中出现的更高水平的数学。在这篇论文中,我们提出了一种方法来解决研究级数学自动化问题,即将任务分解为更容易实现的子任务:无关定义(定义和证明分开)、实体链接(将证明和定义链接到正确的位置)和最后调整类型,以便通过类型检查器进行验证。此外,我们还提供了arXiv2Formal数据集,这是一个由arXiv.org上的50个论文中所选择的50个证明,用于测试Lean证明引擎。我们欢迎社区的贡献,以便未来版本的数据集。

cs.CL - 2023-10-12

Calibrating Likelihoods towards Consistency in Summarization Models

  • paper_url: http://arxiv.org/abs/2310.08764
  • repo_url: None
  • paper_authors: Polina Zablotskaia, Misha Khalman, Rishabh Joshi, Livio Baldini Soares, Shoshana Jakobovits, Joshua Maynez, Shashi Narayan
  • for: 提高抽象文本概要生成模型的可靠性,以便应用于实际场景。
  • methods: 使用自然语言判断(NLI)模型来衡量模型生成的文本的一致性,并对模型进行均衡化,使其更好地评估文本的一致性。
  • results: 通过人工评估和自动指标,显示了使用我们的方法生成的概要更加一致、质量更高,同时模型返回的概率也更加吻合NLI分数,提高了抽象文本概要生成模型的可靠性。
    Abstract Despite the recent advances in abstractive text summarization, current summarization models still suffer from generating factually inconsistent summaries, reducing their utility for real-world application. We argue that the main reason for such behavior is that the summarization models trained with maximum likelihood objective assign high probability to plausible sequences given the context, but they often do not accurately rank sequences by their consistency. In this work, we solve this problem by calibrating the likelihood of model generated sequences to better align with a consistency metric measured by natural language inference (NLI) models. The human evaluation study and automatic metrics show that the calibrated models generate more consistent and higher-quality summaries. We also show that the models trained using our method return probabilities that are better aligned with the NLI scores, which significantly increase reliability of summarization models.
    摘要
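One common way to align sequence likelihoods with an NLI-based consistency metric is a pairwise ranking objective; the sketch below is a hedged simplification, not the paper's exact calibration loss, and the toy numbers are made up.

```python
import torch

def rank_calibration_loss(logprobs, nli_scores, margin=0.1):
    """Pairwise loss: whenever candidate i is judged more consistent than j
    by the NLI model, its sequence log-probability should be higher by `margin`."""
    lp_i, lp_j = logprobs.unsqueeze(1), logprobs.unsqueeze(0)    # (N, 1), (1, N)
    s_i, s_j = nli_scores.unsqueeze(1), nli_scores.unsqueeze(0)
    mask = (s_i > s_j).float()                                    # pairs where i should outrank j
    hinge = torch.clamp(margin - (lp_i - lp_j), min=0.0)
    return (mask * hinge).sum() / mask.sum().clamp(min=1.0)

# toy example: 4 candidate summaries of one document
logprobs = torch.tensor([-12.0, -10.5, -11.2, -13.0], requires_grad=True)   # from the summarizer
nli_scores = torch.tensor([0.92, 0.40, 0.75, 0.10])                         # entailment-based consistency
loss = rank_calibration_loss(logprobs, nli_scores)
loss.backward()
print(float(loss))
```

After such calibration, the model's returned probabilities track the NLI consistency scores more closely, which is the reliability improvement the abstract reports.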

Circuit Component Reuse Across Tasks in Transformer Language Models

  • paper_url: http://arxiv.org/abs/2310.08744
  • repo_url: None
  • paper_authors: Jack Merullo, Carsten Eickhoff, Ellie Pavlick
  • for: 这个研究的目的是为了解释大型语言模型的行为,以及它们是如何在不同任务上实现的。
  • methods: 这个研究使用了循环分析来reverse工程语言模型,并通过这种方法发现了一个名为IOI任务的电路。
  • results: 研究发现,IOI 任务对应的电路可以在更大的 GPT2 模型上复现,并且大部分被复用来解决一个表面上截然不同的任务:颜色物体(Colored Objects)任务。两者的底层处理过程在功能上非常相似,电路内的注意力头约有 78% 重叠。研究还进行了一个概念验证式的干预实验:通过调整中间层的四个注意力头,使颜色物体任务的电路表现得更像 IOI 电路,将该任务的准确率从 49.6% 提升到 93.7%,并解释了大部分误差来源。这些结果表明,或许可以用相对少量、可解释且任务通用的算法构建块和计算组件来解释大型语言模型的行为。
    Abstract Recent work in mechanistic interpretability has shown that behaviors in language models can be successfully reverse-engineered through circuit analysis. A common criticism, however, is that each circuit is task-specific, and thus such analysis cannot contribute to understanding the models at a higher level. In this work, we present evidence that insights (both low-level findings about specific heads and higher-level findings about general algorithms) can indeed generalize across tasks. Specifically, we study the circuit discovered in Wang et al. (2022) for the Indirect Object Identification (IOI) task and 1.) show that it reproduces on a larger GPT2 model, and 2.) that it is mostly reused to solve a seemingly different task: Colored Objects (Ippolito & Callison-Burch, 2023). We provide evidence that the process underlying both tasks is functionally very similar, and contains about a 78% overlap in in-circuit attention heads. We further present a proof-of-concept intervention experiment, in which we adjust four attention heads in middle layers in order to 'repair' the Colored Objects circuit and make it behave like the IOI circuit. In doing so, we boost accuracy from 49.6% to 93.7% on the Colored Objects task and explain most sources of error. The intervention affects downstream attention heads in specific ways predicted by their interactions in the IOI circuit, indicating that this subcircuit behavior is invariant to the different task inputs. Overall, our results provide evidence that it may yet be possible to explain large language models' behavior in terms of a relatively small number of interpretable task-general algorithmic building blocks and computational components.
    摘要

A Zero-Shot Language Agent for Computer Control with Structured Reflection

  • paper_url: http://arxiv.org/abs/2310.08740
  • repo_url: None
  • paper_authors: Tao Li, Gang Li, Zhiwei Deng, Bryan Wang, Yang Li
  • for: 本研究旨在构建一个无需专家示例的零样本语言智能体,使其能够自主学习并完成计算机上的任务。
  • methods: 本研究结合规划与自我反思技术,让智能体通过分析自身错误并进行结构化的思维管理,不断学习和改进其控制能力。
  • results: 在 MiniWoB++ 的简单任务上,我们的零样本智能体往往优于近期最先进方法,且推理更高效;在更复杂的任务上,我们的反思智能体与此前的最佳模型表现相当,尽管后者享有访问专家示例或额外屏幕信息的优势。
    Abstract Large language models (LLMs) have shown increasing capacity at planning and executing a high-level goal in a live computer environment (e.g. MiniWoB++). To perform a task, recent works often require a model to learn from trace examples of the task via either supervised learning or few/many-shot prompting. Without these trace examples, it remains a challenge how an agent can autonomously learn and improve its control on a computer, which limits the ability of an agent to perform a new task. We approach this problem with a zero-shot agent that requires no given expert traces. Our agent plans for executable actions on a partially observed environment, and iteratively progresses a task by identifying and learning from its mistakes via self-reflection and structured thought management. On the easy tasks of MiniWoB++, we show that our zero-shot agent often outperforms recent SoTAs, with more efficient reasoning. For tasks with more complexity, our reflective agent performs on par with prior best models, even though previous works had the advantages of accessing expert traces or additional screen information.
    摘要
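A schematic of the plan-act-reflect loop described above. The `llm` call, the environment interface and the reflection format are hypothetical stand-ins; the structured reflection is reduced here to an accumulated list of failed attempts fed back into the next plan.

```python
def llm(prompt: str) -> str:
    """Hypothetical LLM call; returns a newline-separated action plan."""
    return "click(search_box)\ntype('laptop')\nclick(search_button)"

def run_episode(task: str, env, max_trials: int = 3):
    reflections = []                                   # structured memory of past failures
    for trial in range(max_trials):
        prompt = f"Task: {task}\n"
        if reflections:
            prompt += "Previous failed attempts and errors:\n" + "\n".join(reflections) + "\n"
        prompt += "Plan the next sequence of UI actions, one per line."
        plan = [a for a in llm(prompt).splitlines() if a.strip()]
        ok, error = env.execute(plan)                  # environment checks the plan against the page
        if ok:
            return plan
        reflections.append(f"trial {trial}: plan={plan} -> error: {error}")
    return None

class FakeEnv:                                          # stand-in environment for illustration
    def execute(self, plan):
        return ("click(search_button)" in plan, "search button never clicked")

print(run_episode("buy a laptop", FakeEnv()))
```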

Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models

  • paper_url: http://arxiv.org/abs/2310.08577
  • repo_url: https://github.com/bethgelab/DataTypeIdentification
  • paper_authors: Vishaal Udandarao, Max F. Burg, Samuel Albanie, Matthias Bethge
  • for: 本研究旨在提出一个新的任务——视觉数据类型识别,以探索现代视觉语言模型(VLM)在识别视觉内容的能力。
  • methods: 本研究开发了两个类型数据集,包括动物图像被修改成27种不同的视觉数据类型,分成四个主要类别。对于39个VLM,包括从100M到80B个参数的模型,进行了广泛的零基础评估。
  • results: 研究结果显示,VLMs在某些类型的视觉数据类型识别方面表现不俗,如动画和素描,但对于较简单的视觉数据类型,如图像旋转或加法噪声,表现不佳。研究显示,视觉数据类型识别需要更进一步的训练和模型设计。
    Abstract Recent advances in the development of vision-language models (VLMs) are yielding remarkable success in recognizing visual semantic content, including impressive instances of compositional image understanding. Here, we introduce the novel task of Visual Data-Type Identification, a basic perceptual skill with implications for data curation (e.g., noisy data-removal from large datasets, domain-specific retrieval) and autonomous vision (e.g., distinguishing changing weather conditions from camera lens staining). We develop two datasets consisting of animal images altered across a diverse set of 27 visual data-types, spanning four broad categories. An extensive zero-shot evaluation of 39 VLMs, ranging from 100M to 80B parameters, shows a nuanced performance landscape. While VLMs are reasonably good at identifying certain stylistic \textit{data-types}, such as cartoons and sketches, they struggle with simpler data-types arising from basic manipulations like image rotations or additive noise. Our findings reveal that (i) model scaling alone yields marginal gains for contrastively-trained models like CLIP, and (ii) there is a pronounced drop in performance for the largest auto-regressively trained VLMs like OpenFlamingo. This finding points to a blind spot in current frontier VLMs: they excel in recognizing semantic content but fail to acquire an understanding of visual data-types through scaling. By analyzing the pre-training distributions of these models and incorporating data-type information into the captions during fine-tuning, we achieve a significant enhancement in performance. By exploring this previously uncharted task, we aim to set the stage for further advancing VLMs to equip them with visual data-type understanding. Code and datasets are released at https://github.com/bethgelab/DataTypeIdentification.
    摘要 视觉语言模型(VLM)的最新进展在识别视觉语义内容方面取得了显著成功,其中包括令人印象深刻的组合式图像理解实例。在此,我们提出一个新任务:视觉数据类型识别。这是一项基础感知能力,对数据整理(例如从大规模数据集中剔除噪声数据、领域特定检索)和自主视觉(例如区分天气变化与镜头污渍)都有重要意义。我们构建了两个数据集,将动物图像按 27 种视觉数据类型(分属四大类别)进行变换。对参数量从 1 亿到 800 亿的 39 个 VLM 进行的大规模零样本评估显示出细致而复杂的性能格局:VLM 对卡通、素描等风格化数据类型的识别相当不错,但对图像旋转或加性噪声等由简单操作产生的数据类型却表现不佳。我们的发现表明:(i)对 CLIP 这类对比训练的模型而言,单纯扩大模型规模只带来有限提升;(ii)规模最大的自回归训练 VLM(如 OpenFlamingo)反而出现明显的性能下降。这揭示了当前前沿 VLM 的一个盲点:它们擅长识别语义内容,却无法仅靠扩大规模获得对视觉数据类型的理解。通过分析这些模型的预训练数据分布,并在微调阶段将数据类型信息加入描述文本,我们取得了显著的性能提升。通过探索这一此前未被研究的任务,我们希望为进一步提升 VLM 的视觉数据类型理解能力奠定基础。代码与数据集已发布于 https://github.com/bethgelab/DataTypeIdentification。
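A hedged sketch of the zero-shot evaluation protocol: the image embedding is compared against text embeddings of data-type prompts and the highest cosine similarity wins. The prompt wordings are illustrative, and `encode_image` / `encode_text` are hypothetical stand-ins for a CLIP-style encoder (here returning random vectors so the snippet runs on its own).

```python
import numpy as np

DATA_TYPE_PROMPTS = [
    "a pixelated photo of an animal",
    "a photo of an animal rotated by 90 degrees",
    "a cartoon drawing of an animal",
    "a photo of an animal with gaussian noise",
]

def encode_image(image) -> np.ndarray:      # hypothetical CLIP-style image encoder
    return np.random.default_rng(0).normal(size=512)

def encode_text(text) -> np.ndarray:        # hypothetical CLIP-style text encoder
    return np.random.default_rng(hash(text) % 2**32).normal(size=512)

def identify_data_type(image) -> str:
    img = encode_image(image)
    img = img / np.linalg.norm(img)
    sims = []
    for prompt in DATA_TYPE_PROMPTS:
        txt = encode_text(prompt)
        sims.append(float(img @ (txt / np.linalg.norm(txt))))
    return DATA_TYPE_PROMPTS[int(np.argmax(sims))]       # highest cosine similarity wins

print(identify_data_type("dummy_image"))
```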

LLM-augmented Preference Learning from Natural Language

  • paper_url: http://arxiv.org/abs/2310.08523
  • repo_url: None
  • paper_authors: Inwon Kang, Sikai Ruan, Tyler Ho, Jui-Chien Lin, Farhad Mohsin, Oshani Seneviratne, Lirong Xia
  • for: 本研究旨在使用大型自然语言模型(LLM)进行比较文本分类任务。
  • methods: 本研究采用了 transformer-based 模型和 graph neural architecture,并对 LLM 进行了直接 Classification 任务的设计和实验。
  • results: 研究发现,预训练 LLM 在无需微调的情况下即可超越现有最先进模型,在目标文本较长(由多个句子组成)时优势尤为明显;此外,少样本(few-shot)学习的表现优于零样本学习。
    Abstract Finding preferences expressed in natural language is an important but challenging task. State-of-the-art(SotA) methods leverage transformer-based models such as BERT, RoBERTa, etc. and graph neural architectures such as graph attention networks. Since Large Language Models (LLMs) are equipped to deal with larger context lengths and have much larger model sizes than the transformer-based model, we investigate their ability to classify comparative text directly. This work aims to serve as a first step towards using LLMs for the CPC task. We design and conduct a set of experiments that format the classification task into an input prompt for the LLM and a methodology to get a fixed-format response that can be automatically evaluated. Comparing performances with existing methods, we see that pre-trained LLMs are able to outperform the previous SotA models with no fine-tuning involved. Our results show that the LLMs can consistently outperform the SotA when the target text is large -- i.e. composed of multiple sentences --, and are still comparable to the SotA performance in shorter text. We also find that few-shot learning yields better performance than zero-shot learning.
    摘要 找到用户喜好表达在自然语言中是一项重要 yet challenging task. 现有的State-of-the-art (SotA) 方法利用 transformer-based 模型如 BERT、RoBERTa 等,以及图 neural 架构如图注意力网络。由于 Large Language Models (LLMs) 可以处理更长的上下文长度和有更大的模型大小于 transformer-based 模型,我们调查其能否直接类型比较文本。这项工作的目的是使用 LLMs 进行 CPC 任务。我们设计并进行了一系列实验,将类型任务转换为 LLM 的输入提示和一种自动评估的方法。与现有方法进行比较,我们发现预训练的 LLMs 能够在无需 fine-tuning 的情况下超越前一个 SotA 模型。我们的结果表明,LLMs 在 longer 的目标文本(即多个句子)上能够顺利地超越 SotA,并且在 shorter 的文本上仍然与 SotA 性能相当。我们还发现,几个 shot 学习比零 shot 学习更好。

The Uncertainty-based Retrieval Framework for Ancient Chinese CWS and POS

  • paper_url: http://arxiv.org/abs/2310.08496
  • repo_url: https://github.com/Jihuai-wpy/bert-ancient-chinese
  • paper_authors: Pengyu Wang, Zhichen Ren
  • for: 这 paper 是为了提高古代中文文本分 segmentation 和 parts-of-speech 标注的框架。
  • methods: 该 framework 使用了两种方法:一方面是capture 词 semantics; 另一方面是通过引入外部知识来重新预测基线模型的不确定样本。
  • results: 该框架的性能超过了预先训练的 BERT 与 CRF 以及现有的工具 such as Jiayan。
    Abstract Automatic analysis for modern Chinese has greatly improved the accuracy of text mining in related fields, but the study of ancient Chinese is still relatively rare. Ancient text division and lexical annotation are important parts of classical literature comprehension, and previous studies have tried to construct auxiliary dictionary and other fused knowledge to improve the performance. In this paper, we propose a framework for ancient Chinese Word Segmentation and Part-of-Speech Tagging that makes a twofold effort: on the one hand, we try to capture the wordhood semantics; on the other hand, we re-predict the uncertain samples of baseline model by introducing external knowledge. The performance of our architecture outperforms pre-trained BERT with CRF and existing tools such as Jiayan.
    摘要 自动分析现代中文已经大幅提高了相关领域的文本挖掘精度,但古代中文的研究仍然相对罕见。古代文本分区和词性标注是古典文学理解的重要组成部分,先前的研究已经尝试了构建辅助词典和其他融合知识以提高性能。在这篇论文中,我们提出了古代中文单词分 segmentation和部分标注框架,该框架做出了两重努力:一方面,我们尝试捕捉词 semantics;另一方面,我们重新预测基eline模型的不确定样本,通过引入外部知识。我们的架构的性能超过了预训练BERT与CRF以及现有工具such as Jiayan。

Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

  • paper_url: http://arxiv.org/abs/2310.08491
  • repo_url: https://github.com/kaistAI/Prometheus
  • paper_authors: Seungone Kim, Jamin Shin, Yejin Cho, Joel Jang, Shayne Longpre, Hwaran Lee, Sangdoo Yun, Seongjin Shin, Sungdong Kim, James Thorne, Minjoon Seo
  • for: 这篇论文的目的是提出一个可以用于长篇回答评估的完全开源语言模型(Prometheus),以取代 propriety LLM(如GPT-4),并且可以根据用户提供的自定义评估标准(customized score rubric)进行评估。
  • methods: 这篇论文使用了一个新的数据集——Feedback Collection,包含1000个细化的评估标准、20000个指令和100000个由GPT-4生成的语言反馈。然后,通过使用Feedback Collection,提出了一个13亿参数的评估语言模型(Prometheus),可以根据用户提供的自定义评估标准进行评估。
  • results: 实验结果显示,Prometheus与人工评估人员的相关性为0.897,与GPT-4相关性为0.882,并且大大超过ChatGPT的相关性(0.392)。此外,通过对1222个自定义评估标准进行测试,Prometheus在四个 benchmark 上的相关性都是GPT-4的。最后,Prometheus在两个人类偏好benchmark上的准确率最高,比开源奖励模型在人类偏好数据集上的训练结果更好。
    Abstract Recently, using a powerful proprietary Large Language Model (LLM) (e.g., GPT-4) as an evaluator for long-form responses has become the de facto standard. However, for practitioners with large-scale evaluation tasks and custom criteria in consideration (e.g., child-readability), using proprietary LLMs as an evaluator is unreliable due to the closed-source nature, uncontrolled versioning, and prohibitive costs. In this work, we propose Prometheus, a fully open-source LLM that is on par with GPT-4's evaluation capabilities when the appropriate reference materials (reference answer, score rubric) are accompanied. We first construct the Feedback Collection, a new dataset that consists of 1K fine-grained score rubrics, 20K instructions, and 100K responses and language feedback generated by GPT-4. Using the Feedback Collection, we train Prometheus, a 13B evaluator LLM that can assess any given long-form text based on customized score rubric provided by the user. Experimental results show that Prometheus scores a Pearson correlation of 0.897 with human evaluators when evaluating with 45 customized score rubrics, which is on par with GPT-4 (0.882), and greatly outperforms ChatGPT (0.392). Furthermore, measuring correlation with GPT-4 with 1222 customized score rubrics across four benchmarks (MT Bench, Vicuna Bench, Feedback Bench, Flask Eval) shows similar trends, bolstering Prometheus's capability as an evaluator LLM. Lastly, Prometheus achieves the highest accuracy on two human preference benchmarks (HHH Alignment & MT Bench Human Judgment) compared to open-sourced reward models explicitly trained on human preference datasets, highlighting its potential as an universal reward model. We open-source our code, dataset, and model at https://github.com/kaistAI/Prometheus.
    摘要 近些时候,使用强大的专有大语言模型(LLM)(例如GPT-4)作为评价长篇回答的标准已成为了现实。然而,对于具有大规模评估任务和自定义评价标准的实践者来说,使用专有LLM作为评价器是不可靠的,因为它们的关闭源代码、不可控的版本和昂贵的成本。在这种情况下,我们提出了Prometheus,一个完全开源的LLM,它与GPT-4的评价能力相当,只要附带合适的参考答案和评价标准。我们首先构建了Feedback Collection,一个新的数据集,包括1000个细化的评价标准、20000个说明和100000个由GPT-4生成的语言反馈。使用Feedback Collection,我们训练了Prometheus,一个13亿 evaluator LLM,可以根据用户提供的自定义评价标准评估任何长篇文本。实验结果显示,Prometheus与人类评估器相关性为0.897,与GPT-4相关性为0.882,并大幅超过ChatGPT(0.392)。此外,使用1222个自定义评价标准在四个 bench 上测试Prometheus和GPT-4的相关性也显示了类似的趋势,这ebolsters Prometheus的评价器LLM能力。最后,Prometheus在两个人类偏好bench(HHH Alignment & MT Bench Human Judgment)上达到了其他开源奖励模型explicitly trained on human preference datasets的最高准确率, highlighting its potential as an universal reward model。我们将我们的代码、数据集和模型公开于https://github.com/kaistAI/Prometheus。
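A hedged sketch of rubric-conditioned evaluation in the spirit described above: the evaluator model receives the instruction, the response, a reference answer and a 1-5 rubric, and must return feedback plus a final score in a fixed, parseable format. The prompt template and `evaluator_llm` helper are illustrative, not the paper's exact artifacts.

```python
import re

def build_eval_prompt(instruction, response, reference, rubric):
    return (
        "You are a strict evaluator. Score the response using the rubric below.\n"
        f"### Instruction:\n{instruction}\n"
        f"### Response to evaluate:\n{response}\n"
        f"### Reference answer (score 5):\n{reference}\n"
        f"### Score rubric (1-5):\n{rubric}\n"
        "### Output format:\nFeedback: <your feedback> [RESULT] <an integer 1-5>"
    )

def evaluator_llm(prompt: str) -> str:      # hypothetical evaluator model call
    return "Feedback: mostly correct but misses the edge case. [RESULT] 4"

def parse_score(output: str) -> int:
    match = re.search(r"\[RESULT\]\s*([1-5])", output)
    return int(match.group(1)) if match else -1         # -1 signals an unparseable reply

prompt = build_eval_prompt(
    instruction="Explain why the sky is blue.",
    response="Because of Rayleigh scattering of shorter wavelengths.",
    reference="Shorter (blue) wavelengths are scattered more strongly by air molecules.",
    rubric="5 = complete and accurate explanation ... 1 = incorrect or irrelevant",
)
print(parse_score(evaluator_llm(prompt)))
```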

GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08487
  • repo_url: https://github.com/happen2me/cross-gnn
  • paper_authors: Yuanchun Shen, Ruotong Liao, Zhen Han, Yunpu Ma, Volker Tresp
  • for: This paper aims to evaluate and develop graph-language models that can integrate graph knowledge into language generation.
  • methods: The proposed method uses a question answering dataset called GraphextQA, which includes paired subgraphs retrieved from Wikidata, to condition answer generation on the paired graphs through cross-attention.
  • results: The proposed method demonstrates the usefulness of paired graphs for answer generation and shows the difficulty of the task by comparing language-only models and the proposed graph-language model.
    Abstract While multi-modal models have successfully integrated information from image, video, and audio modalities, integrating graph modality into large language models (LLMs) remains unexplored. This discrepancy largely stems from the inherent divergence between structured graph data and unstructured text data. Incorporating graph knowledge provides a reliable source of information, enabling potential solutions to address issues in text generation, e.g., hallucination, and lack of domain knowledge. To evaluate the integration of graph knowledge into language models, a dedicated dataset is needed. However, there is currently no benchmark dataset specifically designed for multimodal graph-language models. To address this gap, we propose GraphextQA, a question answering dataset with paired subgraphs, retrieved from Wikidata, to facilitate the evaluation and future development of graph-language models. Additionally, we introduce a baseline model called CrossGNN, which conditions answer generation on the paired graphs by cross-attending question-aware graph features at decoding. The proposed dataset is designed to evaluate graph-language models' ability to understand graphs and make use of it for answer generation. We perform experiments with language-only models and the proposed graph-language model to validate the usefulness of the paired graphs and to demonstrate the difficulty of the task.
    摘要
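A hedged sketch of conditioning answer generation on a paired subgraph by cross-attending decoder states over node embeddings. It uses PyTorch's stock MultiheadAttention with random node features; the class name and dimensions are assumptions, not the actual CrossGNN implementation.

```python
import torch
import torch.nn as nn

class GraphCrossAttentionBlock(nn.Module):
    """Decoder hidden states (queries) attend over subgraph node embeddings
    (keys/values), injecting graph evidence into each decoding step."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, dec_states, node_embeds, node_mask=None):
        attended, weights = self.cross_attn(
            query=dec_states, key=node_embeds, value=node_embeds,
            key_padding_mask=node_mask)
        return self.norm(dec_states + attended), weights     # residual + layer norm

block = GraphCrossAttentionBlock()
dec_states = torch.randn(2, 12, 256)         # batch of 2 questions, 12 decoding positions
node_embeds = torch.randn(2, 20, 256)        # 20 retrieved subgraph nodes per question
out, attn = block(dec_states, node_embeds)
print(out.shape, attn.shape)                  # (2, 12, 256), (2, 12, 20)
```

The attention weights also give a per-token view of which retrieved nodes the answer relies on, which is useful when analyzing whether a graph-language model actually uses the paired graph.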

Understanding the Humans Behind Online Misinformation: An Observational Study Through the Lens of the COVID-19 Pandemic

  • paper_url: http://arxiv.org/abs/2310.08483
  • repo_url: None
  • paper_authors: Mohit Chandra, Anush Mattapalli, Munmun De Choudhury
  • for: 本研究旨在理解在COVID-19疫情期间用户如何传播谣言,以及这种行为与过去在非COVID话题上的谣言传播倾向之间的关系。
  • methods: 该研究采用时序分析技术和robust causal inference-based设计,分析了超过32万个COVID-19推文和16万个历史时间推文。
  • results: 分析发现,用户在COVID-19疫情期间的谣言传播行为和过去在非COVID话题上的谣言传播倾向之间存在正相关关系,表明用户的历史倾向对当前谣言传播行为产生了影响。这些结果可能为设计用户中心的免疫策略和生态系统基于的迅速干预策略提供了价值的基础。
    Abstract The proliferation of online misinformation has emerged as one of the biggest threats to society. Considerable efforts have focused on building misinformation detection models, still the perils of misinformation remain abound. Mitigating online misinformation and its ramifications requires a holistic approach that encompasses not only an understanding of its intricate landscape in relation to the complex issue and topic-rich information ecosystem online, but also the psychological drivers of individuals behind it. Adopting a time series analytic technique and robust causal inference-based design, we conduct a large-scale observational study analyzing over 32 million COVID-19 tweets and 16 million historical timeline tweets. We focus on understanding the behavior and psychology of users disseminating misinformation during COVID-19 and its relationship with the historical inclinations towards sharing misinformation on Non-COVID topics before the pandemic. Our analysis underscores the intricacies inherent to cross-topic misinformation, and highlights that users' historical inclination toward sharing misinformation is positively associated with their present behavior pertaining to misinformation sharing on emergent topics and beyond. This work may serve as a valuable foundation for designing user-centric inoculation strategies and ecologically-grounded agile interventions for effectively tackling online misinformation.
    摘要 “在线资讯的滥读问题已经成为现代社会面临的一大挑战。各方努力建立误信探测模型,但误信的危害仍然存在。为了对抗网络误信和其后果,我们需要一个整体的方法,不仅要理解网络资讯的复杂领域,也要理解个人在网络上传播误信的心理驱动。我们运用时间序列分析技术和强健的 causal inference-based 设计,对 COVID-19 tweets 和历史时间轴 tweets 进行大规模观察分析,焦点在探索传播误信的用户行为和心理。我们发现跨主题误信的复杂性,并发现用户在过去传播误信的倾向与今天传播误信的行为之间存在正相关。这项研究可能成为设计用户中心的传染策略和生态系考虑的基础。”

A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing

  • paper_url: http://arxiv.org/abs/2310.08433
  • repo_url: https://github.com/komoku/confederacy-of-models
  • paper_authors: Carlos Gómez-Rodríguez, Paul Williams
  • for: 我们用这篇论文来评估一些最新的自然语言处理技术(LLMs)在英语创作写作中的表现。
  • methods: 我们使用一个为避免训练数据复用而精心挑选的、具有挑战性的开放式情境:让普利策奖获奖小说《A Confederacy of Dunces》(1980)的主人公 Ignatius J. Reilly 与一只翼龙(史前飞行爬行动物)展开一场决斗的史诗式叙事。我们请多个 LLM 和人类写作者完成这篇故事,并从流畅性、连贯性、原创性、幽默和风格等维度进行人工评价。
  • results: 结果表明,一些最新的商业 LLM 在大多数维度上与人类写作者持平甚至略胜一筹,而开源 LLM 则明显落后。人类在创造力方面仍保有优势;幽默则呈现两极分化,部分 LLM 能达到与人类相当的水平,另一些则完全失败。我们还讨论了研究的局限与意义,并指出了未来研究方向。
    Abstract We evaluate a range of recent LLMs on English creative writing, a challenging and complex task that requires imagination, coherence, and style. We use a difficult, open-ended scenario chosen to avoid training data reuse: an epic narration of a single combat between Ignatius J. Reilly, the protagonist of the Pulitzer Prize-winning novel A Confederacy of Dunces (1980), and a pterodactyl, a prehistoric flying reptile. We ask several LLMs and humans to write such a story and conduct a human evalution involving various criteria such as fluency, coherence, originality, humor, and style. Our results show that some state-of-the-art commercial LLMs match or slightly outperform our writers in most dimensions; whereas open-source LLMs lag behind. Humans retain an edge in creativity, while humor shows a binary divide between LLMs that can handle it comparably to humans and those that fail at it. We discuss the implications and limitations of our study and suggest directions for future research.
    摘要 我们评估了一批最新的大型语言模型(LLM)在英语创意写作上的表现。这是一项复杂而富有挑战性的任务,需要想象力、连贯性与风格。为避免训练数据的复用,我们选用了一个具有挑战性的开放式情境:由普利策奖获奖小说《A Confederacy of Dunces》(1980)的主人公 Ignatius J. Reilly 与一只翼龙(史前飞行爬行动物)单挑对决的史诗式叙事。我们请多个 LLM 与人类作者撰写这个故事,并进行人工评估,指标涵盖流畅性、连贯性、原创性、幽默和风格等。结果显示,一些最先进的商业 LLM 在大多数维度上与我们的写作者持平或略胜一筹,而开源 LLM 则落后;人类在创造力方面仍保有优势,幽默则呈现两极分化:一部分 LLM 处理幽默的水平可与人类相当,另一部分则完全失败。我们讨论了本研究的意义与局限,并提出了未来研究的方向。

Reconstructing Materials Tetrahedron: Challenges in Materials Information Extraction

  • paper_url: http://arxiv.org/abs/2310.08383
  • repo_url: None
  • paper_authors: Kausik Hira, Mohd Zaki, Dhruvil Sheth, Mausam, N M Anoop Krishnan
  • for: 本研究旨在探讨自然语言处理和深度学习技术在材料科学文献中自动信息提取中存在的挑战,以创建大规模的材料科学知识库。
  • methods: 本研究使用了深度学习和自然语言处理技术来检测和提取材料科学文献中的信息,特别是从文本和表格中提取信息。
  • results: 本研究发现了许多自动信息提取中的挑战,包括表格和文本中的信息提取、不同报告风格和不具有一致性的报告方式等。
    Abstract Discovery of new materials has a documented history of propelling human progress for centuries and more. The behaviour of a material is a function of its composition, structure, and properties, which further depend on its processing and testing conditions. Recent developments in deep learning and natural language processing have enabled information extraction at scale from published literature such as peer-reviewed publications, books, and patents. However, this information is spread in multiple formats, such as tables, text, and images, and with little or no uniformity in reporting style giving rise to several machine learning challenges. Here, we discuss, quantify, and document these outstanding challenges in automated information extraction (IE) from materials science literature towards the creation of a large materials science knowledge base. Specifically, we focus on IE from text and tables and outline several challenges with examples. We hope the present work inspires researchers to address the challenges in a coherent fashion, providing to fillip to IE for the materials knowledge base.
    摘要 人类进步史上发现新材料有记录,对人类进步产生了深远的影响。材料的行为取决于其组成、结构和性能,这些因素又取决于材料的处理和测试条件。现代深度学习和自然语言处理技术已经允许大规模提取出版文献中的信息,如同行 peer-reviewed 论文、书籍和专利。然而,这些信息分散在多种格式中,如表格、文本和图片,并且无一统一的报告风格,从而带来了许多机器学习挑战。我们在这里讨论、量化和记录了自动信息提取(IE)在材料科学文献中的一些挑战,以创建大规模的材料科学知识库。我们专注于IE文本和表格中的挑战,并提供了一些例子。我们希望现在的工作能够激励研究人员解决这些挑战,以提供填充材料知识库的动力。

Improving Factual Consistency for Knowledge-Grounded Dialogue Systems via Knowledge Enhancement and Alignment

  • paper_url: http://arxiv.org/abs/2310.08372
  • repo_url: https://github.com/amourwaltz/factdial
  • paper_authors: Boyang Xue, Weichao Wang, Hongru Wang, Fei Mi, Rui Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
  • for: 提高知识grounded对话系统中FFN模块的准确表达能力
  • methods: investigate two methods to improve the factual expression capability of FFNs, including explicit knowledge enhancement and implicit alignment through reinforcement learning
  • results: experimental results on WoW and CMU_DoG datasets show that our methods efficiently enhance the ability of the FFN module to convey factual knowledge, validating the effectiveness of improving factual consistency for knowledge-grounded dialogue systems.
    Abstract Pretrained language models (PLMs) based knowledge-grounded dialogue systems are prone to generate responses that are factually inconsistent with the provided knowledge source. In such inconsistent responses, the dialogue models fail to accurately express the external knowledge they rely upon. Inspired by previous work which identified that feed-forward networks (FFNs) within Transformers are responsible for factual knowledge expressions, we investigate two methods to efficiently improve the factual expression capability {of FFNs} by knowledge enhancement and alignment respectively. We first propose \textsc{K-Dial}, which {explicitly} introduces {extended FFNs in Transformers to enhance factual knowledge expressions} given the specific patterns of knowledge-grounded dialogue inputs. Additionally, we apply the reinforcement learning for factual consistency (RLFC) method to implicitly adjust FFNs' expressions in responses by aligning with gold knowledge for the factual consistency preference. To comprehensively assess the factual consistency and dialogue quality of responses, we employ extensive automatic measures and human evaluations including sophisticated fine-grained NLI-based metrics. Experimental results on WoW and CMU\_DoG datasets demonstrate that our methods efficiently enhance the ability of the FFN module to convey factual knowledge, validating the efficacy of improving factual consistency for knowledge-grounded dialogue systems.
    摘要 基于预训练语言模型(PLM)的知识驱动对话系统容易生成与所给知识源不一致的回答。在这些不一致的回答中,对话模型未能准确表达其所依赖的外部知识。受前期研究启发(该研究发现Transformer中的前馈网络FFN负责事实知识的表达),我们研究了两种有效提升FFN事实表达能力的方法,分别是知识增强和对齐。我们首先提出了\textsc{K-Dial},该方法在Transformer中显式引入扩展的FFN,以针对知识驱动对话输入的特定模式增强事实知识表达。此外,我们应用了面向事实一致性的强化学习(RLFC)方法,通过与金标准知识对齐来隐式调整FFN在回答中的表达,以满足事实一致性偏好。为了全面评估回答的事实一致性和对话质量,我们采用了广泛的自动指标和人工评估,包括细粒度的基于NLI的指标。在WoW和CMU_DoG数据集上的实验结果表明,我们的方法能有效提升FFN模块传达事实知识的能力,验证了提升知识驱动对话系统事实一致性的有效性。
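As a concrete illustration of the alignment signal discussed above, the sketch below scores a dialogue response against its grounding knowledge with an off-the-shelf NLI model and turns the entailment probability into a scalar reward. The model name, label ordering, and reward shaping are illustrative assumptions rather than the paper's exact RLFC setup.

```python
# Hedged sketch: an NLI-based factual-consistency reward, usable as the
# preference signal in an RLFC-style fine-tuning loop (illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NLI_MODEL = "roberta-large-mnli"  # assumption: any premise/hypothesis NLI model works

tokenizer = AutoTokenizer.from_pretrained(NLI_MODEL)
nli = AutoModelForSequenceClassification.from_pretrained(NLI_MODEL).eval()

def factual_consistency_reward(knowledge: str, response: str) -> float:
    """P(knowledge entails response); higher means the response sticks to the source."""
    inputs = tokenizer(knowledge, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli(**inputs).logits
    probs = logits.softmax(dim=-1)[0]
    # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
    return probs[2].item()

if __name__ == "__main__":
    k = "The Eiffel Tower is located in Paris and was completed in 1889."
    print(factual_consistency_reward(k, "The Eiffel Tower was finished in 1889."))
    print(factual_consistency_reward(k, "The Eiffel Tower is in Berlin."))
```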

From Large Language Models to Knowledge Graphs for Biomarker Discovery in Cancer

  • paper_url: http://arxiv.org/abs/2310.08365
  • repo_url: None
  • paper_authors: Md. Rezaul Karim, Lina Molinas Comet, Md Shajalal, Oya Beyan, Dietrich Rebholz-Schuhmann, Stefan Decker
  • for: 这个论文是为了构建一个封装生物医学知识的领域专用知识图(KG),以便用于癌症的诊断和治疗建议。
  • methods: 这个论文使用了生物医学文献中的数据(例如文章、图像、组学(omics)数据和临床数据),构建了一个大规模的知识图(KG),并提取了与生物医学知识相关的实体和关系。然后,使用基于 BioBERT 和 SciBERT 的信息提取(IE)方法,以提高 KG 的质量。
  • results: 这个论文通过构建域 Specific KG,使得域专家可以通过查询和探索 KG 来获得更多的生物医学知识,并且可以通过Semantic reasoning来验证基因与疾病关系。此外,通过使用大语言模型(LLMs)进行 KG 的迭代更新,使得 AI 系统可以更好地适应生物医学领域的演变。
    Abstract Domain experts often rely on up-to-date knowledge for apprehending and disseminating specific biological processes that help them design strategies to develop prevention and therapeutic decision-making. A challenging scenario for artificial intelligence (AI) is using biomedical data (e.g., texts, imaging, omics, and clinical) to provide diagnosis and treatment recommendations for cancerous conditions. Data and knowledge about cancer, drugs, genes, proteins, and their mechanism is spread across structured (knowledge bases (KBs)) and unstructured (e.g., scientific articles) sources. A large-scale knowledge graph (KG) can be constructed by integrating these data, followed by extracting facts about semantically interrelated entities and relations. Such KGs not only allow exploration and question answering (QA) but also allow domain experts to deduce new knowledge. However, exploring and querying large-scale KGs is tedious for non-domain users due to a lack of understanding of the underlying data assets and semantic technologies. In this paper, we develop a domain KG to leverage cancer-specific biomarker discovery and interactive QA. For this, a domain ontology called OncoNet Ontology (ONO) is developed to enable semantic reasoning for validating gene-disease relations. The KG is then enriched by harmonizing the ONO, controlled vocabularies, and additional biomedical concepts from scientific articles by employing BioBERT- and SciBERT-based information extraction (IE) methods. Further, since the biomedical domain is evolving, where new findings often replace old ones, without employing up-to-date findings, there is a high chance an AI system exhibits concept drift while providing diagnosis and treatment. Therefore, we finetuned the KG using large language models (LLMs) based on more recent articles and KBs that might not have been seen by the named entity recognition models.
    摘要 领域专家往往依赖最新的知识来理解和传播特定的生物过程,从而设计预防和治疗决策的策略。对人工智能(AI)而言,一个具有挑战性的场景是利用生物医学数据(如文本、影像、组学(omics)和临床数据)为癌症提供诊断和治疗建议。关于癌症、药物、基因、蛋白质及其机制的数据和知识分散在结构化(知识库)和非结构化(如科学文章)来源中。我们可以将这些数据集成为大规模知识图(KG),再从中提取语义上相互关联的实体和关系。这类KG不仅支持探索和问答(QA),还允许领域专家推理出新的知识。然而,由于非领域用户缺乏对底层数据资产和语义技术的理解,探索和查询大规模KG对他们来说繁琐而困难。在这篇论文中,我们构建了一个领域知识图,以支持癌症特异性生物标志物的发现和交互式问答。为此,我们开发了名为OncoNet Ontology(ONO)的领域本体,以支持语义推理来验证基因与疾病之间的关系。随后,我们通过融合ONO、受控词表以及来自科学文章的其他生物医学概念,并采用基于BioBERT和SciBERT的信息提取(IE)方法来丰富该KG。此外,由于生物医学领域不断演进,新的发现经常取代旧的发现,如果不采用最新的发现,AI系统在提供诊断和治疗时很可能出现概念漂移。因此,我们基于较新的文章和知识库(命名实体识别模型可能未曾见过的内容),利用大语言模型(LLMs)对该KG进行了微调。
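A minimal sketch of the information-extraction step described above: tag biomedical entities with a Hugging Face NER pipeline and add naive co-occurrence triples to a graph. The model id and the generic `associated_with` relation are assumptions; the paper's ontology-driven harmonization is not reproduced.

```python
# Hedged sketch of the IE step: tag biomedical entities in a sentence and add
# co-occurrence triples to a small graph. Model id and the generic
# "associated_with" relation are illustrative assumptions.
import itertools
import networkx as nx
from transformers import pipeline

# assumption: any token-classification model trained for biomedical NER
ner = pipeline("ner", model="d4data/biomedical-ner-all", aggregation_strategy="simple")

def sentence_to_triples(sentence: str):
    entities = [e["word"] for e in ner(sentence)]
    # naive IE: every entity pair co-occurring in one sentence becomes a candidate triple
    return [(h, "associated_with", t) for h, t in itertools.combinations(set(entities), 2)]

kg = nx.MultiDiGraph()
for h, r, t in sentence_to_triples("BRCA1 mutations increase the risk of breast cancer."):
    kg.add_edge(h, t, relation=r)

print(list(kg.edges(data=True)))
```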

Defending Our Privacy With Backdoors

  • paper_url: http://arxiv.org/abs/2310.08320
  • repo_url: https://github.com/D0miH/Defending-Our-Privacy-With-Backdoors
  • paper_authors: Dominik Hintersdorf, Lukas Struppek, Daniel Neider, Kristian Kersting
  • for: 保护个人隐私,防止敌对者通过隐私攻击提取模型中的敏感信息。
  • methods: 利用后门攻击方法,将敏感词的嵌入与中性词的嵌入进行对应,使得模型不会受到隐私攻击的影响。
  • results: 通过对CLIP模型进行特殊隐私攻击评估,证明了我们的后门防御策略的有效性。
    Abstract The proliferation of large AI models trained on uncurated, often sensitive web-scraped data has raised significant privacy concerns. One of the concerns is that adversaries can extract information about the training data using privacy attacks. Unfortunately, the task of removing specific information from the models without sacrificing performance is not straightforward and has proven to be challenging. We propose a rather easy yet effective defense based on backdoor attacks to remove private information such as names of individuals from models, and focus in this work on text encoders. Specifically, through strategic insertion of backdoors, we align the embeddings of sensitive phrases with those of neutral terms-"a person" instead of the person's name. Our empirical results demonstrate the effectiveness of our backdoor-based defense on CLIP by assessing its performance using a specialized privacy attack for zero-shot classifiers. Our approach provides not only a new "dual-use" perspective on backdoor attacks, but also presents a promising avenue to enhance the privacy of individuals within models trained on uncurated web-scraped data.
    摘要 大量的AI模型通过未经整理、有时敏感的网络抓取数据进行训练,带来了一些隐私问题。其中一个问题是,敌对者可以通过隐私攻击提取模型中的信息。然而,从模型中删除特定信息而不影响性能是一项不容易的任务,并且已经证明是具有挑战性的。我们提出了一种简单又有效的防御方法,基于后门攻击来移除模型中的隐私信息,并且在本文中专注于文本编码器。具体来说,通过策略性的插入后门,我们将敏感词的嵌入与中性词的嵌入相对应,例如将个人名称替换为“一个人”。我们的实验结果表明,我们的后门基于防御方法在CLIP上表现出色,并且提供了一种新的“双用”视角,以及一个可能的方式来增强模型中人类隐私的保护。
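The sketch below illustrates the kind of alignment objective the abstract describes: pull the embedding of a sensitive phrase toward that of a neutral phrase while a utility term keeps unrelated embeddings in place. The encoder (a generic text model rather than CLIP's text tower), the phrase lists, and the loss weighting are assumptions.

```python
# Hedged sketch of the backdoor-style defense objective: map sensitive phrases
# onto neutral ones in a text encoder while preserving unrelated embeddings.
# Encoder choice, phrase lists, and loss weights are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

ENCODER = "distilbert-base-uncased"  # stand-in for the CLIP text encoder used in the paper
tok = AutoTokenizer.from_pretrained(ENCODER)
enc = AutoModel.from_pretrained(ENCODER)

def embed(texts):
    batch = tok(texts, padding=True, return_tensors="pt")
    hidden = enc(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)  # mean pooling over tokens

# backdoor pairs: sensitive phrase -> neutral anonymised target
pairs = [("a photo of Jane Doe", "a photo of a person")]
# utility set: unrelated prompts whose embeddings should stay put
utility = ["a photo of a dog", "a red sports car"]

with torch.no_grad():
    targets = embed([t for _, t in pairs])
    utility_ref = embed(utility)

opt = torch.optim.AdamW(enc.parameters(), lr=1e-5)
for step in range(5):  # tiny illustrative loop
    backdoor_loss = F.mse_loss(embed([s for s, _ in pairs]), targets)
    utility_loss = F.mse_loss(embed(utility), utility_ref)
    loss = backdoor_loss + 1.0 * utility_loss  # weighting is an assumption
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(step, round(loss.item(), 4))
```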

Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning

  • paper_url: http://arxiv.org/abs/2310.08309
  • repo_url: https://github.com/Zhe-Young/WICL
  • paper_authors: Zhe Yang, Damai Dai, Peiyi Wang, Zhifang Sui
  • for: 该论文旨在研究如何在 In-Context Learning (ICL) 中为示例确定近似最优的权重,以及如何在模型的不同位置应用这些权重。
  • methods: 该论文设计了 masked self-prediction (MSP) 分数来评估权重的质量,并将连续的权重空间离散化,采用束搜索(beam search)寻找近似最优的权重。
  • results: 在8个文本分类任务上的实验结果显示,该方法大幅提升了 ICL 性能,明显优于常规 ICL。
    Abstract Large Language Models (LLMs) have recently gained the In-Context Learning (ICL) ability with the models scaling up, allowing them to quickly adapt to downstream tasks with only a few demonstration examples prepended in the input sequence. Nonetheless, the current practice of ICL treats all demonstration examples equally, which still warrants improvement, as the quality of examples is usually uneven. In this paper, we investigate how to determine approximately optimal weights for demonstration examples and how to apply them during ICL. To assess the quality of weights in the absence of additional validation data, we design a masked self-prediction (MSP) score that exhibits a strong correlation with the final ICL performance. To expedite the weight-searching process, we discretize the continuous weight space and adopt beam search. With approximately optimal weights obtained, we further propose two strategies to apply them to demonstrations at different model positions. Experimental results on 8 text classification tasks show that our approach outperforms conventional ICL by a large margin. Our code is publicly available at https://github.com/Zhe-Young/WICL.
    摘要
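A schematic sketch of the discretized beam search over demonstration weights described above. The `msp_score` function is a stub standing in for the paper's masked self-prediction scoring, and the weight grid and beam size are assumptions.

```python
# Hedged sketch: beam search over a discretized weight grid for in-context
# demonstrations. `msp_score` is a placeholder for the paper's masked
# self-prediction metric; here it is a stub so the search logic is runnable.
import random

WEIGHT_GRID = [0.0, 0.5, 1.0, 1.5]  # assumption: small discrete weight vocabulary
BEAM_SIZE = 3

def msp_score(weights, demos):
    """Placeholder: score a weight assignment without validation labels."""
    random.seed(hash((tuple(weights), len(demos))))
    return random.random()

def search_weights(demos):
    beams = [([], 0.0)]                      # (partial weight list, score)
    for _ in demos:                          # assign one weight per demonstration
        candidates = []
        for weights, _ in beams:
            for w in WEIGHT_GRID:
                new = weights + [w]
                partial = new + [1.0] * (len(demos) - len(new))  # pad unassigned slots
                candidates.append((new, msp_score(partial, demos)))
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:BEAM_SIZE]
    return beams[0]

demos = ["example 1", "example 2", "example 3"]
print(search_weights(demos))
```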

MProto: Multi-Prototype Network with Denoised Optimal Transport for Distantly Supervised Named Entity Recognition

  • paper_url: http://arxiv.org/abs/2310.08298
  • repo_url: https://github.com/XiPotatonium/mproto
  • paper_authors: Shuhui Wu, Yongliang Shen, Zeqi Tan, Wenqi Ren, Jietian Guo, Shiliang Pu, Weiming Lu
  • for: 该论文针对远程监督命名实体识别(DS-NER)任务,旨在仅利用知识库或词典(gazetteer)与未标注语料来定位实体提及并对其类型进行分类。
  • methods: 该论文提出了一个抗噪的原型网络(MProto)来解决DS-NER任务。不同于先前基于原型的NER方法,MProto用多个原型表示每个实体类型,以刻画实体表示的类内差异。
  • results: experiments on several DS-NER benchmarks show that our MProto achieves state-of-the-art performance.
    Abstract Distantly supervised named entity recognition (DS-NER) aims to locate entity mentions and classify their types with only knowledge bases or gazetteers and unlabeled corpus. However, distant annotations are noisy and degrade the performance of NER models. In this paper, we propose a noise-robust prototype network named MProto for the DS-NER task. Different from previous prototype-based NER methods, MProto represents each entity type with multiple prototypes to characterize the intra-class variance among entity representations. To optimize the classifier, each token should be assigned an appropriate ground-truth prototype and we consider such token-prototype assignment as an optimal transport (OT) problem. Furthermore, to mitigate the noise from incomplete labeling, we propose a novel denoised optimal transport (DOT) algorithm. Specifically, we utilize the assignment result between Other class tokens and all prototypes to distinguish unlabeled entity tokens from true negatives. Experiments on several DS-NER benchmarks demonstrate that our MProto achieves state-of-the-art performance. The source code is now available on Github.
    摘要 远程监督命名实体识别(DS-NER)旨在仅利用知识库或词典(gazetteer)与未标注语料来定位实体提及并分类其类型。然而,远程标注带有噪音,会降低NER模型的性能。本文针对DS-NER任务提出了一种抗噪的原型网络MProto。与先前基于原型的NER方法不同,MProto用多个原型表示每个实体类型,以刻画实体表示的类内差异。为优化分类器,每个词元应被分配到合适的真实原型,我们将这种词元-原型分配建模为一个最优传输(OT)问题。此外,为了缓解不完整标注带来的噪音,我们提出了一种新的去噪最优传输(DOT)算法:利用Other类词元与所有原型之间的分配结果,来区分未标注的实体词元与真负例。在多个DS-NER基准上的实验表明,MProto达到了最先进的性能。源代码现已在Github上公开。
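A minimal Sinkhorn routine for the token-to-prototype optimal-transport assignment mentioned above. Uniform marginals, the entropic regularization, and the iteration count are assumptions, and the paper's denoising of Other-class tokens is omitted.

```python
# Hedged sketch: Sinkhorn-style optimal transport that softly assigns token
# representations to class prototypes from a similarity matrix. Uniform
# marginals and the iteration count are assumptions; the paper's denoised
# variant for Other-class tokens is not reproduced here.
import math
import torch

def sinkhorn_assign(sim, n_iters=50, eps=0.05):
    """sim: [num_tokens, num_prototypes] similarity scores -> per-token assignment."""
    log_p = sim / eps
    n, m = sim.shape
    log_r = torch.full((n,), -math.log(n))  # uniform row marginal
    log_c = torch.full((m,), -math.log(m))  # uniform column marginal
    u = torch.zeros(n)
    v = torch.zeros(m)
    for _ in range(n_iters):
        u = log_r - torch.logsumexp(log_p + v[None, :], dim=1)
        v = log_c - torch.logsumexp(log_p + u[:, None], dim=0)
    plan = torch.exp(log_p + u[:, None] + v[None, :])
    return plan / plan.sum(dim=1, keepdim=True)  # each row is a distribution over prototypes

tokens = torch.randn(8, 16)        # 8 token embeddings
prototypes = torch.randn(6, 16)    # e.g. 3 entity types x 2 prototypes each
plan = sinkhorn_assign(tokens @ prototypes.T)
print(plan.shape, plan.sum(dim=1))  # every row sums to 1
```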

Optimizing Odia Braille Literacy: The Influence of Speed on Error Reduction and Enhanced Comprehension

  • paper_url: http://arxiv.org/abs/2310.08280
  • repo_url: None
  • paper_authors: Monnie Parida, Manjira Sinha, Anupam Basu, Pabitra Mitra
  • for: 这项研究的目的是对视障学生的奥里亚语(Odia)盲文阅读理解进行详细分析,尤其是其阅读速度和手或手指的运动。
  • methods: 本研究通过观察参与者的手部运动来理解阅读错误与手部运动之间的关系,并评估参与者的Odia盲文阅读技能、阅读速度、错误和理解水平。
  • results: 研究发现阅读速度和阅读错误之间存在显著的相关性,即阅读速度下降时,阅读错误的数量往往增加。此外,研究还发现,减少盲文阅读错误可以提高阅读理解水平,而不同的盲文阅读模式可能具有不同的理论、发展和方法论意义。
    Abstract This study aims to conduct an extensive detailed analysis of the Odia Braille reading comprehension among students with visual disability. Specifically, the study explores their reading speed and hand or finger movements. The study also aims to investigate any comprehension difficulties and reading errors they may encounter. Six students from the 9th and 10th grades, aged between 14 and 16, participated in the study. We observed participants hand movements to understand how reading errors were connected to hand movement and identify the students reading difficulties. We also evaluated the participants Odia Braille reading skills, including their reading speed (in words per minute), errors, and comprehension. The average speed of Odia Braille reader is 17.64wpm. According to the study, there was a noticeable correlation between reading speed and reading errors. As reading speed decreased, the number of reading errors tended to increase. Moreover, the study established a link between reduced Braille reading errors and improved reading comprehension. In contrast, the study found that better comprehension was associated with increased reading speed. The researchers concluded with some interesting findings about preferred Braille reading patterns. These findings have important theoretical, developmental, and methodological implications for instruction.
    摘要 本研究旨在对视障学生的奥里亚语(Odia)盲文阅读理解进行详细分析,特别关注其阅读速度与手或手指的运动,并考察他们可能遇到的理解困难和阅读错误。六名14至16岁、就读9年级和10年级的学生参加了研究。我们观察了参与者的手部运动,以理解阅读错误与手部运动之间的关系并识别阅读困难;同时评估了他们的Odia盲文阅读技能,包括阅读速度(每分钟词数)、错误和理解水平。Odia盲文阅读者的平均速度为17.64wpm。研究表明,阅读速度与阅读错误之间存在明显的相关性:阅读速度下降时,阅读错误往往增加。此外,研究还发现减少盲文阅读错误与提高阅读理解相关,而更好的理解又与更快的阅读速度相关。研究者最后总结了一些关于偏好的盲文阅读模式的有趣发现,这些发现对教学具有重要的理论、发展和方法论意义。

Who Said That? Benchmarking Social Media AI Detection

  • paper_url: http://arxiv.org/abs/2310.08240
  • repo_url: None
  • paper_authors: Wanyun Cui, Linqiu Zhang, Qianle Wang, Shuyang Cai
  • for: 本研究旨在评估社交媒体平台上AI文本检测模型的能力,并提出了一个新的用户参与型AI文本检测挑战。
  • methods: 本研究使用了SAID(社交媒体AI检测) benchmark,该benchmark基于真实的社交媒体平台上的AI生成文本,如Zhihu和Quora。与现有的benchmark不同,SAID更加注重实际上的AI用户在互联网上使用的策略,以提供更加真实和挑战性的评估环境。
  • results: 研究发现,使用Zhihu数据集,人工标注者可以将AI生成文本和人类生成文本正确分类的平均准确率为96.5%。这一结果表明,在今天广泛使用AI的环境中,人类可能需要重新评估AI生成文本的识别能力。此外,研究还提出了一个新的用户参与型AI文本检测挑战,该挑战的实验结果表明,在实际社交媒体平台上进行检测任务比传统的模拟AI文本检测更加具有挑战性,并且用户参与型AI文本检测可以提高检测精度。
    Abstract AI-generated text has proliferated across various online platforms, offering both transformative prospects and posing significant risks related to misinformation and manipulation. Addressing these challenges, this paper introduces SAID (Social media AI Detection), a novel benchmark developed to assess AI-text detection models' capabilities in real social media platforms. It incorporates real AI-generate text from popular social media platforms like Zhihu and Quora. Unlike existing benchmarks, SAID deals with content that reflects the sophisticated strategies employed by real AI users on the Internet which may evade detection or gain visibility, providing a more realistic and challenging evaluation landscape. A notable finding of our study, based on the Zhihu dataset, reveals that annotators can distinguish between AI-generated and human-generated texts with an average accuracy rate of 96.5%. This finding necessitates a re-evaluation of human capability in recognizing AI-generated text in today's widely AI-influenced environment. Furthermore, we present a new user-oriented AI-text detection challenge focusing on the practicality and effectiveness of identifying AI-generated text based on user information and multiple responses. The experimental results demonstrate that conducting detection tasks on actual social media platforms proves to be more challenging compared to traditional simulated AI-text detection, resulting in a decreased accuracy. On the other hand, user-oriented AI-generated text detection significantly improve the accuracy of detection.
    摘要 人工智能生成的文本已经渗透到了各种在线平台,带来了重大的可能性和风险,其中包括误information和操纵。为了解决这些挑战,本文提出了SAID(社交媒体AI检测),一个新的benchmark,用于评估AI文本检测模型在实际社交媒体平台上的能力。它包括来自popular社交媒体平台 like Zhihu和Quora的真实AI生成的文本。与现有benchmark不同,SAID处理了实际上的AI用户在互联网上采用的复杂策略,这些策略可能会逃避检测或获得可见性,提供一个更真实和挑战的评估景象。我们的研究发现,基于Zhihu数据集,标注员可以在96.5%的情况下 correctly distinguish between AI-generated and human-generated texts。这一发现需要我们重新评估现代社交媒体环境中人类对AI生成文本的识别能力。此外,我们还提出了一个新的用户 oriented AI文本检测挑战,该挑战的目的是评估检测模型在实际社交媒体平台上的实用性和效果。实验结果表明,在实际社交媒体平台上进行检测任务比传统的模拟AI文本检测更加具有挑战性,导致检测精度下降。然而,用户 oriented AI文本检测显示出明显的改善,提高检测精度。

Language Models are Universal Embedders

  • paper_url: http://arxiv.org/abs/2310.08232
  • repo_url: https://github.com/izhx/uni-rep
  • paper_authors: Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang
  • for: This paper aims to build a unified embedding model that can be applied across tasks and languages, rather than dedicated models for each scenario.
  • methods: The authors use pre-trained transformer decoders for multiple languages and fine-tune them on limited English data to demonstrate universal embedding.
  • results: The models achieve competitive performance on different embedding tasks with minimal training data, and perform comparably to heavily supervised baselines and/or APIs on other benchmarks such as multilingual classification and code search.
    Abstract In the large language model (LLM) revolution, embedding is a key component of various systems. For example, it is used to retrieve knowledge or memories for LLMs, to build content moderation filters, etc. As such cases span from English to other natural or programming languages, from retrieval to classification and beyond, it is desirable to build a unified embedding model rather than dedicated ones for each scenario. In this work, we make an initial step towards this goal, demonstrating that multiple languages (both natural and programming) pre-trained transformer decoders can embed universally when finetuned on limited English data. We provide a comprehensive practice with thorough evaluations. On English MTEB, our models achieve competitive performance on different embedding tasks by minimal training data. On other benchmarks, such as multilingual classification and code search, our models (without any supervision) perform comparably to, or even surpass heavily supervised baselines and/or APIs. These results provide evidence of a promising path towards building powerful unified embedders that can be applied across tasks and languages.
    摘要 在大语言模型(LLM)革命中,嵌入是一个关键组件,用于多种系统。例如,用于检索知识或记忆,建立内容审核筛选器等。由于这些案例跨越英语到其他自然语言或编程语言,从检索到分类和更多的应用场景,因此建立一个统一的嵌入模型比较愿意。在这项工作中,我们做出了初步的尝试,证明多种自然语言和编程语言预训练转换器可以在有限的英语数据上进行共同嵌入。我们提供了全面的实践和详细的评估。在英语MTEB上,我们的模型在不同的嵌入任务上达到了竞争性的性能,只需要训练数据的最小量。在其他benchmark上,如多语言分类和代码搜索,我们的模型(无任何监督)与大量监督的基线和/或API进行了相当的比较,甚至超越了它们。这些结果表明了建立强大的统一嵌入器的可能性,可以在任务和语言之间应用。

Fast Word Error Rate Estimation Using Self-Supervised Representations For Speech And Text

  • paper_url: http://arxiv.org/abs/2310.08225
  • repo_url: None
  • paper_authors: Chanho Park, Chengsong Lu, Mingjie Chen, Thomas Hain
  • for: 这篇论文是为了提出一种快速的word error rate(WER)估计方法,以提高计算效率。
  • methods: 该方法使用了自主学习表示(SSLR),通过均值抽象来组合SSLR,并实现了快速的计算。
  • results: 实验结果表明,该方法(Fe-WER)相比基eline(e-WER3)在Ted-Lium3上提高了19.69%和7.16%,并且Weighted by duration是10.43%。同时,实时因子大约是4倍。
    Abstract The quality of automatic speech recognition (ASR) is typically measured by word error rate (WER). WER estimation is a task aiming to predict the WER of an ASR system, given a speech utterance and a transcription. This task has gained increasing attention while advanced ASR systems are trained on large amounts of data. In this case, WER estimation becomes necessary in many scenarios, for example, selecting training data with unknown transcription quality or estimating the testing performance of an ASR system without ground truth transcriptions. Facing large amounts of data, the computation efficiency of a WER estimator becomes essential in practical applications. However, previous works usually did not consider it as a priority. In this paper, a Fast WER estimator (Fe-WER) using self-supervised learning representation (SSLR) is introduced. The estimator is built upon SSLR aggregated by average pooling. The results show that Fe-WER outperformed the e-WER3 baseline relatively by 19.69% and 7.16% on Ted-Lium3 in both evaluation metrics of root mean square error and Pearson correlation coefficient, respectively. Moreover, the estimation weighted by duration was 10.43% when the target was 10.88%. Lastly, the inference speed was about 4x in terms of a real-time factor.
    摘要 自动语音识别(ASR)的质量通常由词错误率(WER)来度量。WER估计任务的目标是:在给定一条语音和一个转写的情况下,预测ASR系统的WER。随着先进的ASR系统在大量数据上训练,这一任务得到了越来越多的关注。在这种情况下,WER估计在许多场景中变得必要,例如选择转写质量未知的训练数据,或在没有真实转写的情况下估计ASR系统的测试性能。面对大量数据,WER估计器的计算效率在实际应用中至关重要,然而之前的工作通常没有将其作为优先考虑。本文提出了一种基于自监督学习表示(SSLR)的快速WER估计器(Fe-WER),该估计器通过均值池化对SSLR进行聚合。结果表明,在 Ted-Lium3 上,Fe-WER 相比 e-WER3 基线在均方根误差和 Pearson 相关系数两个评价指标上分别相对提升了19.69%和7.16%。此外,按时长加权的估计值为10.43%,目标值为10.88%。最后,以实时因子衡量,推理速度约快4倍。
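A compact sketch of a Fe-WER-style estimator head: mean-pooled speech and text self-supervised representations are concatenated and regressed to a WER value. Real SSL encoders are replaced by random features, and all dimensions and the bounded output are assumptions.

```python
# Hedged sketch of a Fe-WER-style estimator: mean-pooled self-supervised
# speech and text representations are concatenated and fed to a small MLP
# that regresses WER. Real SSL encoders (e.g. a speech model and a sentence
# encoder) are replaced by random features; all dimensions are assumptions.
import torch
import torch.nn as nn

class WERRegressor(nn.Module):
    def __init__(self, speech_dim=768, text_dim=768, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(speech_dim + text_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # WER estimate bounded to [0, 1]
        )

    def forward(self, speech_frames, text_tokens):
        speech_vec = speech_frames.mean(dim=1)    # average pooling over time
        text_vec = text_tokens.mean(dim=1)        # average pooling over tokens
        return self.mlp(torch.cat([speech_vec, text_vec], dim=-1)).squeeze(-1)

model = WERRegressor()
speech = torch.randn(4, 200, 768)   # batch of 4 utterances, 200 frames each
text = torch.randn(4, 30, 768)      # hypothesis transcriptions, 30 tokens each
target_wer = torch.tensor([0.05, 0.20, 0.12, 0.40])
loss = nn.functional.mse_loss(model(speech, text), target_wer)
print(loss.item())
```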

Visual Question Generation in Bengali

  • paper_url: http://arxiv.org/abs/2310.08187
  • repo_url: https://github.com/mahmudhasankhan/vqg-in-bengali
  • paper_authors: Mahmud Hasan, Labiba Islam, Jannatul Ferdous Ruma, Tasmiah Tahsin Mayeesha, Rashedur M. Rahman
  • for: The paper is written for the task of Visual Question Generation (VQG) in Bengali, with the goal of generating human-like questions relevant to given images.
  • methods: The paper proposes a novel transformer-based encoder-decoder architecture for VQG in Bengali, with multiple variants including image-only, image-category, and image-answer-category.
  • results: The paper achieves state-of-the-art results on the translated VQAv2.0 dataset, with the image-cat model achieving the highest BLEU-1 and BLEU-3 scores. The human evaluation suggests that the image-cat model is capable of generating goal-driven and attribute-specific questions that are relevant to the corresponding images.
  • for: 这篇论文面向视觉问题生成(VQG)任务中的孟加拉语(Bengali)场景。
  • methods: 这篇论文提出了一种基于Transformer的编码器-解码器架构,用于孟加拉语的VQG任务。
  • results: 这篇论文在翻译后的VQAv2.0数据集上取得了最优结果,其中image-cat模型在BLEU-1和BLEU-3分数上得分最高。人工评估表明,image-cat模型能够生成目标驱动且属性相关的问题,并与对应图像保持相关。
    Abstract The task of Visual Question Generation (VQG) is to generate human-like questions relevant to the given image. As VQG is an emerging research field, existing works tend to focus only on resource-rich language such as English due to the availability of datasets. In this paper, we propose the first Bengali Visual Question Generation task and develop a novel transformer-based encoder-decoder architecture that generates questions in Bengali when given an image. We propose multiple variants of models - (i) image-only: baseline model of generating questions from images without additional information, (ii) image-category and image-answer-category: guided VQG where we condition the model to generate questions based on the answer and the category of expected question. These models are trained and evaluated on the translated VQAv2.0 dataset. Our quantitative and qualitative results establish the first state of the art models for VQG task in Bengali and demonstrate that our models are capable of generating grammatically correct and relevant questions. Our quantitative results show that our image-cat model achieves a BLUE-1 score of 33.12 and BLEU-3 score of 7.56 which is the highest of the other two variants. We also perform a human evaluation to assess the quality of the generation tasks. Human evaluation suggests that image-cat model is capable of generating goal-driven and attribute-specific questions and also stays relevant to the corresponding image.
    摘要 视觉问题生成(VQG)的任务是生成与给定图像相关、接近人类表达的问题。由于数据集主要集中在英语等资源丰富的语言上,现有研究也多集中在这些语言。在这篇论文中,我们提出了首个孟加拉语视觉问题生成任务,并开发了一种新颖的基于Transformer的编码器-解码器架构,可针对给定图像生成孟加拉语问题。我们提出了多种模型变体:(i)仅图像:不使用额外信息、直接从图像生成问题的基线模型;(ii)图像-类别与图像-答案-类别:受引导的VQG模型,根据答案和预期问题类别来生成问题。这些模型在翻译后的VQAv2.0数据集上进行训练和评估。量化与定性结果表明,我们的模型为孟加拉语VQG任务建立了首个最先进的基准,并能生成语法正确且与图像相关的问题。量化结果显示,图像-类别模型取得了33.12的BLEU-1分数和7.56的BLEU-3分数,为三种变体中最高;人工评估亦表明,图像-类别模型能够生成目标驱动、属性相关且与对应图像保持相关的问题。

Exploring the Cognitive Knowledge Structure of Large Language Models: An Educational Diagnostic Assessment Approach

  • paper_url: http://arxiv.org/abs/2310.08172
  • repo_url: None
  • paper_authors: Zheyuan Zhang, Jifan Yu, Juanzi Li, Lei Hou
  • for: 本研究旨在evaluate Large Language Models (LLMs)的知识结构,以便更好地理解LLMs的认知能力和知识表达方式。
  • methods: 本研究使用了educational diagnostic assessment方法和MoocRadar数据集,后者是一个基于Bloom Taxonomy精细标注的人工测试数据集。
  • results: 研究发现LLMs具有了丰富的知识结构,并且其认知能力在不同领域中具有强大的表达能力。
    Abstract Large Language Models (LLMs) have not only exhibited exceptional performance across various tasks, but also demonstrated sparks of intelligence. Recent studies have focused on assessing their capabilities on human exams and revealed their impressive competence in different domains. However, cognitive research on the overall knowledge structure of LLMs is still lacking. In this paper, based on educational diagnostic assessment method, we conduct an evaluation using MoocRadar, a meticulously annotated human test dataset based on Bloom Taxonomy. We aim to reveal the knowledge structures of LLMs and gain insights of their cognitive capabilities. This research emphasizes the significance of investigating LLMs' knowledge and understanding the disparate cognitive patterns of LLMs. By shedding light on models' knowledge, researchers can advance development and utilization of LLMs in a more informed and effective manner.
    摘要 大型语言模型(LLM)不仅在不同任务中表现出色,而且也表现出了智能的启示。最近的研究主要集中在评估这些模型在人类考试中的能力,并发现它们在不同领域中表现出了惊人的能力。然而,对于LLM的总体知识结构的认知研究仍然缺乏。在这篇论文中,我们通过基于教育诊断评估方法的MoocRadar,一个精心注释的人类测试数据集基于Bloom分类法,进行评估。我们希望通过这些研究来揭示LLM的知识结构,并了解它们的不同认知模式。这些研究可以帮助研究人员更好地发展和利用LLM,以更加了解和有效地使用它们。

Simplicity Level Estimate (SLE): A Learned Reference-Less Metric for Sentence Simplification

  • paper_url: http://arxiv.org/abs/2310.08170
  • repo_url: https://github.com/liamcripwell/sle
  • paper_authors: Liam Cripwell, Joël Legrand, Claire Gardent
  • for: 这个论文是为了提出一种新的自动评估方法,以解决自动 sentence simplification 中的评估问题。
  • methods: 该论文使用了一种新的学习型评估 metric,称为 SLE,它专注于简洁性,并且与人类评估更高度相关。
  • results: 论文表明,SLE metric 可以准确地评估自动 sentence simplification 的性能,并且与人类评估更高度相关,比大多数现有的评估 metric 更高。
    Abstract Automatic evaluation for sentence simplification remains a challenging problem. Most popular evaluation metrics require multiple high-quality references -- something not readily available for simplification -- which makes it difficult to test performance on unseen domains. Furthermore, most existing metrics conflate simplicity with correlated attributes such as fluency or meaning preservation. We propose a new learned evaluation metric (SLE) which focuses on simplicity, outperforming almost all existing metrics in terms of correlation with human judgements.
    摘要 自动评估句子简化仍然是一个挑战性的问题。大多数流行的评估指标需要多个高质量的参考文本,但这些参考文本对简化不 readily available,这使得测试性能在未看到的领域变得困难。另外,大多数现有的指标会混同简化的特征与相关的属性,如流利度或意义保持。我们提出了一个新的学习based的评估指标(SLE),它专注于简化,超越了大多数现有指标在人工判断上的相关性。

Multiclass Classification of Policy Documents with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08167
  • repo_url: None
  • paper_authors: Erkan Gunes, Christoffer Koch Florczak
  • for: automate text classification processes for social science research purposes
  • methods: use GPT 3.5 and GPT 4 models of OpenAI, pre-trained instruction-tuned Large Language Models (LLM)
  • results: overall accuracies ranging from 58-83% depending on scenario and GPT model employed, with the most humanly demanding use-case achieving 83% accuracy on 65% of the data.
    Abstract Classifying policy documents into policy issue topics has been a long-time effort in political science and communication disciplines. Efforts to automate text classification processes for social science research purposes have so far achieved remarkable results, but there is still a large room for progress. In this work, we test the prediction performance of an alternative strategy, which requires human involvement much less than full manual coding. We use the GPT 3.5 and GPT 4 models of the OpenAI, which are pre-trained instruction-tuned Large Language Models (LLM), to classify congressional bills and congressional hearings into Comparative Agendas Project's 21 major policy issue topics. We propose three use-case scenarios and estimate overall accuracies ranging from %58-83 depending on scenario and GPT model employed. The three scenarios aims at minimal, moderate, and major human interference, respectively. Overall, our results point towards the insufficiency of complete reliance on GPT with minimal human intervention, an increasing accuracy along with the human effort exerted, and a surprisingly high accuracy achieved in the most humanly demanding use-case. However, the superior use-case achieved the %83 accuracy on the %65 of the data in which the two models agreed, suggesting that a similar approach to ours can be relatively easily implemented and allow for mostly automated coding of a majority of a given dataset. This could free up resources allowing manual human coding of the remaining %35 of the data to achieve an overall higher level of accuracy while reducing costs significantly.
    摘要 将政策文档分类到政策议题主题一直是政治学和传播学领域的长期工作。面向社会科学研究的文本自动分类工作至今已取得显著成果,但仍有很大的进步空间。在这项工作中,我们测试了一种替代策略的预测性能,其所需的人工参与远少于完全人工编码。我们使用OpenAI的GPT 3.5和GPT 4模型(预训练的指令微调大语言模型,LLM),将国会法案和国会听证会分类到Comparative Agendas Project的21个主要政策议题。我们提出了三种使用场景,估计总体准确率在58%-83%之间,具体取决于场景和所用的GPT模型;三种场景分别对应最少、适中和较多的人工干预。总体而言,我们的结果表明:在最少人工干预下完全依赖GPT是不够的;准确率随人工投入的增加而提高;而在人工要求最高的场景中取得了出人意料的高准确率。此外,表现最好的场景在两个模型结果一致的65%数据上达到了83%的准确率,这表明类似的方法较易实现,可以对数据集中的大部分内容进行基本自动化的编码,从而释放资源,让人工编码集中于其余35%的数据,在显著降低成本的同时获得更高的整体准确率。
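A sketch of the dual-model workflow the abstract implies: label each document with GPT-3.5 and GPT-4, keep the label when they agree, and route disagreements to human coders. The prompt wording, the truncated topic list, and the pre-1.0 `openai.ChatCompletion` call are assumptions.

```python
# Hedged sketch of a dual-model coding workflow: auto-accept labels on which
# GPT-3.5 and GPT-4 agree, and send the rest to manual coding. Prompt wording,
# the shortened topic list, and the (pre-1.0) openai client call are assumptions.
import openai

TOPICS = ["Macroeconomics", "Health", "Education", "Defense"]  # CAP defines 21 topics

def classify(text: str, model: str) -> str:
    prompt = (
        "Assign the following policy document to exactly one Comparative "
        f"Agendas Project topic from this list: {', '.join(TOPICS)}.\n\n"
        f"Document: {text}\n\nAnswer with the topic name only."
    )
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"].strip()

def label_document(text: str):
    a = classify(text, "gpt-3.5-turbo")
    b = classify(text, "gpt-4")
    if a == b:
        return {"label": a, "needs_human": False}
    return {"label": None, "needs_human": True, "candidates": [a, b]}

# Example usage (requires an API key):
# print(label_document("A bill to expand rural hospital funding."))
```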

Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning

  • paper_url: http://arxiv.org/abs/2310.08166
  • repo_url: None
  • paper_authors: Junyu Lu, Dixiang Zhang, Xiaojun Wu, Xinyu Gao, Ruyi Gan, Jiaxing Zhang, Yan Song, Pingjian Zhang
  • for: 这paper aimed to improve the ability of large language models (LLMs) in zero-shot image-to-text generation and understanding by integrating multi-modal inputs, specifically in non-English scenarios.
  • methods: The paper introduces the Ziya-Visual series of bilingual large-scale vision-language models (LVLMs) that incorporate visual semantics into LLMs for multi-modal dialogue. The models use the Querying Transformer from BLIP-2 and explore optimization schemes such as instruction tuning, multi-stage training, and low-rank adaptation module for visual-language alignment.
  • results: The paper shows that compared to existing LVLMs, Ziya-Visual achieves competitive performance across a wide range of English-only tasks including zero-shot image-text retrieval, image captioning, and visual question answering. The evaluation leaderboard accessed by GPT-4 also indicates that the models possess satisfactory image-text understanding and generation capabilities in Chinese multi-modal scenario dialogues.
    Abstract Recent advancements enlarge the capabilities of large language models (LLMs) in zero-shot image-to-text generation and understanding by integrating multi-modal inputs. However, such success is typically limited to English scenarios due to the lack of large-scale and high-quality non-English multi-modal resources, making it extremely difficult to establish competitive counterparts in other languages. In this paper, we introduce the Ziya-Visual series, a set of bilingual large-scale vision-language models (LVLMs) designed to incorporate visual semantics into LLM for multi-modal dialogue. Composed of Ziya-Visual-Base and Ziya-Visual-Chat, our models adopt the Querying Transformer from BLIP-2, further exploring the assistance of optimization schemes such as instruction tuning, multi-stage training and low-rank adaptation module for visual-language alignment. In addition, we stimulate the understanding ability of GPT-4 in multi-modal scenarios, translating our gathered English image-text datasets into Chinese and generating instruction-response through the in-context learning method. The experiment results demonstrate that compared to the existing LVLMs, Ziya-Visual achieves competitive performance across a wide range of English-only tasks including zero-shot image-text retrieval, image captioning, and visual question answering. The evaluation leaderboard accessed by GPT-4 also indicates that our models possess satisfactory image-text understanding and generation capabilities in Chinese multi-modal scenario dialogues. Code, demo and models are available at ~\url{https://huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1}.
    摘要

Context Compression for Auto-regressive Transformers with Sentinel Tokens

  • paper_url: http://arxiv.org/abs/2310.08152
  • repo_url: https://github.com/DRSY/KV_Compression
  • paper_authors: Siyu Ren, Qi Jia, Kenny Q. Zhu
  • for: 这个研究旨在提高Transformer-based LLMs中的发话范围,以减少computational cost和memory footprint。
  • methods: authors proposed a plug-and-play approach to incrementally compress the intermediate activation of a specified span of tokens into compact ones, reducing both memory and computational cost.
  • results: experiments on both in-domain language modeling and zero-shot open-ended document generation demonstrate the advantage of the proposed approach over sparse attention baselines in terms of fluency, n-gram matching, and semantic similarity.
    Abstract The quadratic complexity of the attention module makes it gradually become the bulk of compute in Transformer-based LLMs during generation. Moreover, the excessive key-value cache that arises when dealing with long inputs also brings severe issues on memory footprint and inference latency. In this work, we propose a plug-and-play approach that is able to incrementally compress the intermediate activation of a specified span of tokens into compact ones, thereby reducing both memory and computational cost when processing subsequent context. Experiments on both in-domain language modeling and zero-shot open-ended document generation demonstrate the advantage of our approach over sparse attention baselines in terms of fluency, n-gram matching, and semantic similarity. At last, we comprehensively profile the benefit of context compression on improving the system throughput. Code is available at https://github.com/DRSY/KV_Compression.
    摘要 注意力模块的二次复杂度使其在生成过程中逐渐成为Transformer类大模型计算量的主要来源。此外,处理长输入时产生的过大key-value缓存也带来了严重的内存占用和推理延迟问题。在这项工作中,我们提出了一种即插即用的方法,能够将指定词元片段的中间激活逐步压缩为更紧凑的表示,从而在处理后续上下文时降低内存和计算成本。在域内语言建模和零样本开放式文档生成上的实验表明,我们的方法在流畅性、n-gram匹配和语义相似度方面均优于稀疏注意力基线。最后,我们全面分析了上下文压缩对系统吞吐量的提升。代码可在https://github.com/DRSY/KV_Compression获取。
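At the tensor level, compressing a span of cached activations can be pictured as below: a slice of the key/value cache is collapsed into a single slot. Mean pooling is a crude stand-in for the paper's learned sentinel-token compression; the shapes follow the usual Hugging Face `past_key_values` layout and are assumptions.

```python
# Hedged sketch at the tensor level: collapse a span of cached key/value
# activations into one slot. Mean pooling is a crude stand-in for the paper's
# learned sentinel-token compression; shapes assume the common layout
# [batch, heads, seq_len, head_dim] per layer.
import torch

def compress_span(past_key_values, start: int, end: int):
    """Replace cached positions [start, end) with one compressed position per layer."""
    compressed = []
    for key, value in past_key_values:
        k = torch.cat([key[:, :, :start],
                       key[:, :, start:end].mean(dim=2, keepdim=True),
                       key[:, :, end:]], dim=2)
        v = torch.cat([value[:, :, :start],
                       value[:, :, start:end].mean(dim=2, keepdim=True),
                       value[:, :, end:]], dim=2)
        compressed.append((k, v))
    return tuple(compressed)

# toy cache: 2 layers, batch 1, 4 heads, 10 cached tokens, head_dim 8
cache = tuple((torch.randn(1, 4, 10, 8), torch.randn(1, 4, 10, 8)) for _ in range(2))
shorter = compress_span(cache, start=2, end=7)
print(cache[0][0].shape, "->", shorter[0][0].shape)  # 10 cached tokens -> 6
```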

On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition

  • paper_url: http://arxiv.org/abs/2310.08132
  • repo_url: None
  • paper_authors: Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter
  • for: 提高自动语音识别(ASR)系统的表现,特别是在低资源或领域不匹配的任务中。
  • methods: 使用新的oracle设置,研究text-to-speech(TTS)系统生成的数据质量如何影响ASR训练。使用两种常见的对齐方法:隐藏马尔可夫-高斯混合模型(HMM-GMM)对齐器和基于连接时序分类(CTC)的神经网络对齐器。使用一种简单的随机游走算法,将TTS系统生成的音素时长分布向真实时长分布靠拢,从而提高ASR系统使用合成数据时的表现。
  • results: 使用这种方法可以提高ASR系统在semi-supervised Setting下的表现,使得它能够更好地识别来自TTS系统生成的语音。
    Abstract Synthetic data generated by text-to-speech (TTS) systems can be used to improve automatic speech recognition (ASR) systems in low-resource or domain mismatch tasks. It has been shown that TTS-generated outputs still do not have the same qualities as real data. In this work we focus on the temporal structure of synthetic data and its relation to ASR training. By using a novel oracle setup we show how much the degradation of synthetic data quality is influenced by duration modeling in non-autoregressive (NAR) TTS. To get reference phoneme durations we use two common alignment methods, a hidden Markov Gaussian-mixture model (HMM-GMM) aligner and a neural connectionist temporal classification (CTC) aligner. Using a simple algorithm based on random walks we shift phoneme duration distributions of the TTS system closer to real durations, resulting in an improvement of an ASR system using synthetic data in a semi-supervised setting.
    摘要 使用文本到语音(TTS)系统生成的合成数据可以提高自动语音识别(ASR)系统在低资源或领域不匹配任务中的性能。然而,TTS生成的输出在质量上仍与真实数据存在差距。在这项工作中,我们关注合成数据的时间结构及其与ASR训练的关系。我们使用一种新的oracle设置,分析非自回归(NAR)TTS中的时长建模对合成数据质量退化的影响。为获取参考音素时长,我们使用两种常见的对齐方法:隐藏马尔可夫-高斯混合模型(HMM-GMM)对齐器和神经网络连接时序分类(CTC)对齐器。借助一种基于随机游走的简单算法,我们将TTS系统的音素时长分布向真实时长分布靠拢,从而在半监督设置下提升了使用合成数据训练的ASR系统的性能。
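A toy version of the random-walk idea sketched above: nudge individual synthetic phoneme durations by one frame at a time and keep a move only if it brings the synthetic duration histogram closer to the real one. The Wasserstein distance and the greedy acceptance rule are assumptions.

```python
# Hedged toy version of the random-walk duration shifting: repeatedly nudge a
# synthetic phoneme duration by +/- 1 frame and keep the move only if it moves
# the synthetic duration distribution closer to the real one. The Wasserstein
# distance and acceptance rule are illustrative assumptions.
import random
from scipy.stats import wasserstein_distance

def shift_durations(synth, real, steps=5000, seed=0):
    """Greedy random walk over per-phoneme durations (in frames)."""
    rng = random.Random(seed)
    synth = list(synth)
    best = wasserstein_distance(synth, real)
    for _ in range(steps):
        i = rng.randrange(len(synth))
        delta = rng.choice([-1, 1])
        if synth[i] + delta < 1:              # keep every duration >= one frame
            continue
        synth[i] += delta
        new = wasserstein_distance(synth, real)
        if new <= best:
            best = new                        # accept: distributions moved closer
        else:
            synth[i] -= delta                 # reject: undo the move
    return synth, best

data_rng = random.Random(1)
real = [data_rng.randint(3, 15) for _ in range(500)]   # durations from real speech
synth = [8] * 500                                      # overly uniform TTS durations
shifted, dist = shift_durations(synth, real)
print("final Wasserstein distance:", round(dist, 3))
```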

Fine-grained Conversational Decoding via Isotropic and Proximal Search

  • paper_url: http://arxiv.org/abs/2310.08130
  • repo_url: None
  • paper_authors: Yuxuan Yao, Han Wu, Qiling Xu, Linqi Song
  • for: 提高对话Response的质量,提出了一种细化的对话解码方法。
  • methods: 基于\citet{wu2023learning}的思想,提出了一种名为\textit{isotropic and proximal search (IPS)}的方法,以实现对话具有本地和均匀性的特征空间。
  • results: 对比其他对话解码策略,我们的方法在对话领域中表现出色,在自动和人类评价指标上都有显著的优势。
    Abstract General-purpose text decoding approaches are usually adopted for dialogue response generation. Although the quality of the generated responses can be improved with dialogue-specific encoding methods, conversational decoding methods are still under-explored. Inspired by \citet{wu2023learning} that a good dialogue feature space should follow the rules of locality and isotropy, we present a fine-grained conversational decoding method, termed \textit{isotropic and proximal search (IPS)}. Our method is designed to generate the semantic-concentrated response, while still maintaining informativeness and discrimination against the context. Experiments show that our approach outperforms existing decoding strategies in the dialogue field across both automatic and human evaluation metrics. More in-depth analyses further confirm the effectiveness of our approach.
    摘要 通用文本解码方法通常用于对话回复生成。虽然使用对话特定编码方法可以提高生成的回复质量,但对话解码方法仍然受欢迎。以\citet{wu2023learning}为例,我们认为一个好的对话特征空间应该遵循地方性和射线性的规则。基于这些原则,我们提出了细化的对话解码方法,称为iso tropic and proximal search(IPS)。我们的方法旨在生成具有semantic concentrate的回复,同时仍保持对上下文的信息性和分化性。实验表明,我们的方法在对话领域中胜过现有的解码策略,在自动和人类评估指标上都显示出优异表现。更详细的分析还证明了我们的方法的有效性。

Who Wrote it and Why? Prompting Large-Language Models for Authorship Verification

  • paper_url: http://arxiv.org/abs/2310.08123
  • repo_url: None
  • paper_authors: Chia-Yu Hung, Zhiqiang Hu, Yujia Hu, Roy Ka-Wei Lee
  • for: 本研究的目的是提出一种基于大型自然语言模型(LLM)的作者鉴定(AV)技术,以提高AV的数据要求和解释性。
  • methods: 本研究使用LLMs提供逐步的文体计量学(stylometric)解释提示,以解决现有AV技术的数据限制和解释性不足问题。
  • results: 实验结果显示,PromptAV比 estado-of-the-art基elines高效,可以有效地使用有限的培训数据,并提供了Intuitive的解释,表明PromptAV可能成为一种有效和可解释的AV解决方案。
    Abstract Authorship verification (AV) is a fundamental task in natural language processing (NLP) and computational linguistics, with applications in forensic analysis, plagiarism detection, and identification of deceptive content. Existing AV techniques, including traditional stylometric and deep learning approaches, face limitations in terms of data requirements and lack of explainability. To address these limitations, this paper proposes PromptAV, a novel technique that leverages Large-Language Models (LLMs) for AV by providing step-by-step stylometric explanation prompts. PromptAV outperforms state-of-the-art baselines, operates effectively with limited training data, and enhances interpretability through intuitive explanations, showcasing its potential as an effective and interpretable solution for the AV task.
    摘要 作者识别(AV)是自然语言处理(NLP)和计算语言学中的基本任务,可应用于法证分析、剽窃检测以及欺骗性内容的识别。现有的AV技术,包括传统的文体计量方法和深度学习方法,都面临数据需求高和可解释性不足的限制。为了解决这些限制,本文提出了PromptAV,一种利用大语言模型(LLMs)进行AV的新技术,它通过提供逐步的文体计量解释提示来工作。PromptAV优于最先进的基线,能在有限的训练数据下有效运行,并通过直观的解释增强了可解释性,展示了其作为一种有效且可解释的AV解决方案的潜力。

Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices

  • paper_url: http://arxiv.org/abs/2310.08104
  • repo_url: None
  • paper_authors: Matthew Baas, Herman Kamper
  • for: investigate how a recent voice conversion model performs on non-standard downstream voice conversion tasks
  • methods: 使用 k-nearest neighbors voice conversion (kNN-VC) 方法
  • results: compared to an established baseline, kNN-VC retains high performance in stuttered and cross-lingual voice conversion, but results are more mixed for musical instrument and text-to-voice conversion tasks.
    Abstract Voice conversion aims to convert source speech into a target voice using recordings of the target speaker as a reference. Newer models are producing increasingly realistic output. But what happens when models are fed with non-standard data, such as speech from a user with a speech impairment? We investigate how a recent voice conversion model performs on non-standard downstream voice conversion tasks. We use a simple but robust approach called k-nearest neighbors voice conversion (kNN-VC). We look at four non-standard applications: stuttered voice conversion, cross-lingual voice conversion, musical instrument conversion, and text-to-voice conversion. The latter involves converting to a target voice specified through a text description, e.g. "a young man with a high-pitched voice". Compared to an established baseline, we find that kNN-VC retains high performance in stuttered and cross-lingual voice conversion. Results are more mixed for the musical instrument and text-to-voice conversion tasks. E.g., kNN-VC works well on some instruments like drums but not on others. Nevertheless, this shows that voice conversion models - and kNN-VC in particular - are increasingly applicable in a range of non-standard downstream tasks. But there are still limitations when samples are very far from the training distribution. Code, samples, trained models: https://rf5.github.io/sacair2023-knnvc-demo/.
    摘要 语音转换的目标是以目标说话人的录音为参考,将源语音转换为目标音色。较新的模型生成的结果越来越逼真。但当模型接收非标准数据,例如有言语障碍的用户的语音时,会发生什么?我们考察了一种近期的语音转换模型在非标准下游转换任务中的表现。我们使用一种简单而稳健的方法,即k近邻语音转换(kNN-VC)。我们考察了四种非标准应用:口吃语音转换、跨语言语音转换、乐器转换和文本到语音转换;后者是将文本描述(例如"一个声音高亢的年轻男性")所指定的目标音色作为转换目标。与已有基线相比,我们发现kNN-VC在口吃和跨语言语音转换中保持了高性能;而在乐器转换和文本到语音转换任务中结果更为参差,例如kNN-VC在鼓等一些乐器上表现良好,在另一些乐器上则不然。这表明语音转换模型,尤其是kNN-VC,在一系列非标准下游任务中的适用性正在不断提高;但当样本与训练分布相距很远时仍存在局限。代码、样本和训练好的模型见 https://rf5.github.io/sacair2023-knnvc-demo/。
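A minimal sketch of the matching step at the core of kNN-VC: every source frame feature is replaced by the average of its k nearest frames from the target speaker's feature pool. Real systems extract WavLM features and vocode the converted features; both stages are omitted here and random tensors stand in.

```python
# Hedged sketch of the kNN matching step in kNN-VC: each source frame feature
# is replaced by the mean of its k nearest frames from the target speaker's
# feature pool. Feature extraction and vocoding are omitted; random features
# stand in for WavLM-style representations (dimension is an assumption).
import torch
import torch.nn.functional as F

def knn_convert(source_feats, target_pool, k=4):
    """source_feats: [T, D], target_pool: [N, D] -> converted features [T, D]."""
    src = F.normalize(source_feats, dim=-1)
    tgt = F.normalize(target_pool, dim=-1)
    sims = src @ tgt.T                         # cosine similarity, shape [T, N]
    topk = sims.topk(k, dim=-1).indices        # k nearest target frames per source frame
    return target_pool[topk].mean(dim=1)       # average the matched target frames

source = torch.randn(120, 1024)     # source utterance frame features
target = torch.randn(4000, 1024)    # feature pool from target-speaker reference audio
converted = knn_convert(source, target)
print(converted.shape)              # torch.Size([120, 1024])
```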

QASiNa: Religious Domain Question Answering using Sirah Nabawiyah

  • paper_url: http://arxiv.org/abs/2310.08102
  • repo_url: https://github.com/rizquuula/QASiNa
  • paper_authors: Muhammad Razif Rizqullah, Ayu Purwarianti, Alham Fikri Aji
  • for: 这个论文目的是为了评估大语言模型(LLM)在宗教领域中的性能,特别是在伊斯兰教中。
  • methods: 本论文使用了几种大语言模型(mBERT、XLM-R和IndoBERT),对于这些模型进行了精度调整,并使用了印度尼西亚翻译的SQuAD v2.0作为数据集。
  • results: 研究发现,XLM-R模型在Question Answering Sirah Nabawiyah(QASiNa)数据集上返回了最好的表现,EM为61.20,F1-Score为75.94,和字符串匹配为70.00。与Chat GPT-3.5和GPT-4进行比较后,发现Chat GPT版本返回了较低的EM和F1-Score,同时字符串匹配得分更高,这表明Chat GPT倾向于提供过多的解释,尤其是在宗教领域。
    Abstract Nowadays, Question Answering (QA) tasks receive significant research focus, particularly with the development of Large Language Model (LLM) such as Chat GPT [1]. LLM can be applied to various domains, but it contradicts the principles of information transmission when applied to the Islamic domain. In Islam we strictly regulates the sources of information and who can give interpretations or tafseer for that sources [2]. The approach used by LLM to generate answers based on its own interpretation is similar to the concept of tafseer, LLM is neither an Islamic expert nor a human which is not permitted in Islam. Indonesia is the country with the largest Islamic believer population in the world [3]. With the high influence of LLM, we need to make evaluation of LLM in religious domain. Currently, there is only few religious QA dataset available and none of them using Sirah Nabawiyah especially in Indonesian Language. In this paper, we propose the Question Answering Sirah Nabawiyah (QASiNa) dataset, a novel dataset compiled from Sirah Nabawiyah literatures in Indonesian language. We demonstrate our dataset by using mBERT [4], XLM-R [5], and IndoBERT [6] which fine-tuned with Indonesian translation of SQuAD v2.0 [7]. XLM-R model returned the best performance on QASiNa with EM of 61.20, F1-Score of 75.94, and Substring Match of 70.00. We compare XLM-R performance with Chat GPT-3.5 and GPT-4 [1]. Both Chat GPT version returned lower EM and F1-Score with higher Substring Match, the gap of EM and Substring Match get wider in GPT-4. The experiment indicate that Chat GPT tends to give excessive interpretations as evidenced by its higher Substring Match scores compared to EM and F1-Score, even after providing instruction and context. This concludes Chat GPT is unsuitable for question answering task in religious domain especially for Islamic religion.
    摘要 现在,问答任务(QA) receiving significant research focus, particularly with the development of Large Language Model (LLM) such as Chat GPT [1]. LLM can be applied to various domains, but it contradicts the principles of information transmission when applied to the Islamic domain. In Islam, we strictly regulate the sources of information and who can give interpretations or tafseer for that sources [2]. The approach used by LLM to generate answers based on its own interpretation is similar to the concept of tafseer, LLM is neither an Islamic expert nor a human which is not permitted in Islam. Indonesia has the largest Islamic believer population in the world [3]. With the high influence of LLM, we need to evaluate LLM in the religious domain. Currently, there are only a few religious QA datasets available, and none of them use Sirah Nabawiyah, especially in Indonesian. In this paper, we propose the Question Answering Sirah Nabawiyah (QASiNa) dataset, a novel dataset compiled from Sirah Nabawiyah literatures in Indonesian language. We demonstrate our dataset by using mBERT [4], XLM-R [5], and IndoBERT [6], which were fine-tuned with Indonesian translation of SQuAD v2.0 [7]. XLM-R model returned the best performance on QASiNa with EM of 61.20, F1-Score of 75.94, and Substring Match of 70.00. We compare XLM-R performance with Chat GPT-3.5 and GPT-4 [1]. Both Chat GPT versions returned lower EM and F1-Score with higher Substring Match, the gap of EM and Substring Match gets wider in GPT-4. The experiment indicates that Chat GPT tends to give excessive interpretations as evidenced by its higher Substring Match scores compared to EM and F1-Score, even after providing instruction and context. This concludes that Chat GPT is unsuitable for question answering tasks in the religious domain, especially for Islamic religion.

ClimateNLP: Analyzing Public Sentiment Towards Climate Change Using Natural Language Processing

  • paper_url: http://arxiv.org/abs/2310.08099
  • repo_url: None
  • paper_authors: Ajay Krishnan, V. S. Anoop
  • for: 这篇论文旨在分析社交媒体上关于气候变化的讨论,了解公众对这种全球挑战的看法和情感。
  • methods: 本文使用自然语言处理(NLP)技术分析社交媒体上的气候变化讨论,并使用ClimateBERT模型对相关推文进行情感分类。
  • results: 研究发现,公众对气候变化的看法和情感呈现担忧、抗拒和争议等多种特征。这些发现可以帮助政策制定者、研究人员和组织更好地理解公众的看法,制定有效的策略以应对气候变化挑战。
    Abstract Climate change's impact on human health poses unprecedented and diverse challenges. Unless proactive measures based on solid evidence are implemented, these threats will likely escalate and continue to endanger human well-being. The escalating advancements in information and communication technologies have facilitated the widespread availability and utilization of social media platforms. Individuals utilize platforms such as Twitter and Facebook to express their opinions, thoughts, and critiques on diverse subjects, encompassing the pressing issue of climate change. The proliferation of climate change-related content on social media necessitates comprehensive analysis to glean meaningful insights. This paper employs natural language processing (NLP) techniques to analyze climate change discourse and quantify the sentiment of climate change-related tweets. We use ClimateBERT, a pretrained model fine-tuned specifically for the climate change domain. The objective is to discern the sentiment individuals express and uncover patterns in public opinion concerning climate change. Analyzing tweet sentiments allows a deeper comprehension of public perceptions, concerns, and emotions about this critical global challenge. The findings from this experiment unearth valuable insights into public sentiment and the entities associated with climate change discourse. Policymakers, researchers, and organizations can leverage such analyses to understand public perceptions, identify influential actors, and devise informed strategies to address climate change challenges.
    摘要 气候变化对人类健康的影响带来了前所未有且多样化的挑战。若不基于可靠证据采取前瞻性措施,这些威胁很可能持续升级,继续危及人类福祉。信息与通信技术的不断发展使社交媒体平台得到广泛普及和使用。人们通过Twitter和Facebook等平台表达对各类议题的看法、想法和批评,其中就包括气候变化这一紧迫问题。社交媒体上气候变化相关内容的激增需要系统性的分析,以便从中提取有价值的洞察。本文使用自然语言处理(NLP)技术分析气候变化讨论,并量化气候变化相关推文的情感,所用的ClimateBERT是专为气候变化领域微调的预训练模型。我们的目标是识别人们表达的情感,并揭示公众对气候变化的观点模式。分析推文情感有助于更深入地理解公众对这一重大全球挑战的认知、担忧和情绪。本研究的发现为公众情感及与气候变化讨论相关的实体提供了有价值的洞察;政策制定者、研究人员和组织可以利用此类分析理解公众认知、识别有影响力的参与者,并制定有依据的策略来应对气候变化挑战。
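A minimal sketch of the tweet-scoring step with a text-classification pipeline over a ClimateBERT-style checkpoint. The exact model id is an assumption; substitute whichever climate-domain sentiment checkpoint is actually used.

```python
# Hedged sketch of the tweet-scoring step: a text-classification pipeline over
# a ClimateBERT-style checkpoint. The model id is an assumption, not confirmed
# by the paper; swap in the checkpoint actually used.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="climatebert/distilroberta-base-climate-sentiment",  # assumed checkpoint
)

tweets = [
    "Renewable investment finally outpaced fossil fuels this year, great news!",
    "Another record heatwave and still no policy response. This is terrifying.",
]
for tweet, pred in zip(tweets, classifier(tweets)):
    print(f"{pred['label']:>10} ({pred['score']:.2f})  {tweet}")
```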

To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer

  • paper_url: http://arxiv.org/abs/2310.08078
  • repo_url: https://github.com/mushfiqur11/tokenfreetransfer
  • paper_authors: Md Mushfiqur Rahman, Fardin Ahsan Sakib, Fahim Faisal, Antonios Anastasopoulos
  • For: The paper aims to understand the downstream implications of text representation choices in low-resource cross-lingual transfer, and to provide a recommendation scheme for model selection based on task and language requirements.* Methods: The paper compares language models with diverse text representation modalities, including segmentation-based models (BERT, mBERT), image-based models (PIXEL), and character-level models (CANINE), on three NLP tasks (POS tagging, Dependency parsing, and NER) in 19 source languages and 133 target languages.* Results: The paper finds that image-based models excel in cross-lingual transfer for closely related languages with visually similar scripts, while segmentation-based models are superior for tasks that rely on word meaning (POS, NER). Character-level models perform best in dependency parsing tasks that require an understanding of word relationships.
    Abstract Choosing an appropriate tokenization scheme is often a bottleneck in low-resource cross-lingual transfer. To understand the downstream implications of text representation choices, we perform a comparative analysis on language models having diverse text representation modalities including 2 segmentation-based models (\texttt{BERT}, \texttt{mBERT}), 1 image-based model (\texttt{PIXEL}), and 1 character-level model (\texttt{CANINE}). First, we propose a scoring Language Quotient (LQ) metric capable of providing a weighted representation of both zero-shot and few-shot evaluation combined. Utilizing this metric, we perform experiments comprising 19 source languages and 133 target languages on three tasks (POS tagging, Dependency parsing, and NER). Our analysis reveals that image-based models excel in cross-lingual transfer when languages are closely related and share visually similar scripts. However, for tasks biased toward word meaning (POS, NER), segmentation-based models prove to be superior. Furthermore, in dependency parsing tasks where word relationships play a crucial role, models with their character-level focus, outperform others. Finally, we propose a recommendation scheme based on our findings to guide model selection according to task and language requirements.
    摘要 选择合适的分词(tokenization)方案往往是低资源跨语言迁移中的瓶颈。为了理解文本表示方式选择的下游影响,我们对具有不同文本表示模态的语言模型进行了比较分析,包括2个基于子词切分的模型(BERT、mBERT)、1个基于图像的模型(PIXEL)和1个字符级模型(CANINE)。首先,我们提出了语言指数(LQ)度量,能够对零样本和少样本评估给出加权的综合表示。基于这一度量,我们在19种源语言和133种目标语言上开展了三个任务(POS标注、依存句法分析和NER)的实验。我们的分析发现,当语言关系紧密且文字在视觉上相似时,基于图像的模型在跨语言迁移中表现出色;而对于更依赖词义的任务(POS、NER),基于切分的模型更优;在词间关系起关键作用的依存句法分析任务中,聚焦字符级表示的模型则胜过其他模型。最后,我们基于这些发现提出了一个推荐方案,用于根据任务和语言需求指导模型选择。

Training Generative Question-Answering on Synthetic Data Obtained from an Instruct-tuned Model

  • paper_url: http://arxiv.org/abs/2310.08072
  • repo_url: None
  • paper_authors: Kosuke Takahashi, Takahiro Omi, Kosuke Arima, Tatsuya Ishigaki
  • for: 这 paper 是为了开发一种可靠且Cost-effective的问答系统训练数据生成方法。
  • methods: 这 paper 使用一种名为 instruct-tuned 模型,通过自动生成问题和答案对来训练问答系统。
  • results: 实验结果表明,使用我们提议的合成数据可以达到与 manually 批注数据相同的性能水平,而无需人工成本。I hope that helps! Let me know if you have any other questions.
    Abstract This paper presents a simple and cost-effective method for synthesizing data to train question-answering systems. For training, fine-tuning GPT models is a common practice in resource-rich languages like English, however, it becomes challenging for non-English languages due to the scarcity of sufficient question-answer (QA) pairs. Existing approaches use question and answer generators trained on human-authored QA pairs, which involves substantial human expenses. In contrast, we use an instruct-tuned model to generate QA pairs in a zero-shot or few-shot manner. We conduct experiments to compare various strategies for obtaining QA pairs from the instruct-tuned model. The results demonstrate that a model trained on our proposed synthetic data achieves comparable performance to a model trained on manually curated datasets, without incurring human costs.
    摘要

Rethinking Negative Pairs in Code Search

  • paper_url: http://arxiv.org/abs/2310.08069
  • repo_url: https://github.com/Alex-HaochenLi/Soft-InfoNCE
  • paper_authors: Haochen Li, Xin Zhou, Luu Anh Tuan, Chunyan Miao
  • for: 提高代码搜索模型的软件开发效率和效果,通过对搜索查询返回的正例和负例进行对比学习。
  • methods: 提议使用Soft-InfoNCE损失函数,该损失函数在 InfoNCE 损失函数基础上增加了权重项来处理负例中的假阳性样本和不同负例之间的可能相互关系。
  • results: 经过广泛的实验,提出的 Soft-InfoNCE 损失函数和权重估计方法在现有的代码搜索模型中显示出了更高的效果和精度,并且可以更好地控制学习的代码表示分布。
    Abstract Recently, contrastive learning has become a key component in fine-tuning code search models for software development efficiency and effectiveness. It pulls together positive code snippets while pushing negative samples away given search queries. Among contrastive learning, InfoNCE is the most widely used loss function due to its better performance. However, the following problems in negative samples of InfoNCE may deteriorate its representation learning: 1) The existence of false negative samples in large code corpora due to duplications. 2). The failure to explicitly differentiate between the potential relevance of negative samples. As an example, a bubble sorting algorithm example is less ``negative'' than a file saving function for the quick sorting algorithm query. In this paper, we tackle the above problems by proposing a simple yet effective Soft-InfoNCE loss that inserts weight terms into InfoNCE. In our proposed loss function, we apply three methods to estimate the weights of negative pairs and show that the vanilla InfoNCE loss is a special case of Soft-InfoNCE. Theoretically, we analyze the effects of Soft-InfoNCE on controlling the distribution of learnt code representations and on deducing a more precise mutual information estimation. We furthermore discuss the superiority of proposed loss functions with other design alternatives. Extensive experiments demonstrate the effectiveness of Soft-InfoNCE and weights estimation methods under state-of-the-art code search models on a large-scale public dataset consisting of six programming languages. Source code is available at \url{https://github.com/Alex-HaochenLi/Soft-InfoNCE}.
    摘要
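A minimal torch sketch of a weighted InfoNCE of the kind described above: each negative enters the denominator with its own weight, and uniform weights recover the vanilla loss. How the weights are estimated (the paper proposes three methods) is left to the caller and is not reproduced here.

```python
# Hedged sketch of a weighted InfoNCE ("Soft-InfoNCE"-style) objective: each
# negative code snippet enters the denominator with its own weight. Weight
# estimation is left to the caller; uniform weights recover vanilla InfoNCE.
import torch
import torch.nn.functional as F

def soft_info_nce(query, positive, negatives, neg_weights, temperature=0.05):
    """query/positive: [D]; negatives: [K, D]; neg_weights: [K] (non-negative)."""
    q = F.normalize(query, dim=-1)
    pos_sim = (q * F.normalize(positive, dim=-1)).sum() / temperature
    neg_sim = (F.normalize(negatives, dim=-1) @ q) / temperature
    denom = pos_sim.exp() + (neg_weights * neg_sim.exp()).sum()
    return -(pos_sim - denom.log())            # -log softmax over positive vs weighted negatives

query = torch.randn(256)            # encoded search query
positive = torch.randn(256)         # encoded ground-truth snippet
negatives = torch.randn(32, 256)    # in-batch negative snippets
uniform = torch.ones(32)            # uniform weights: vanilla InfoNCE as a special case
downweighted = uniform.clone()
downweighted[:5] = 0.1              # suppress suspected false negatives (illustrative)
print(soft_info_nce(query, positive, negatives, uniform).item(),
      soft_info_nce(query, positive, negatives, downweighted).item())
```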

Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection

  • paper_url: http://arxiv.org/abs/2310.08027
  • repo_url: None
  • paper_authors: Yi Dai, Hao Lang, Kaisheng Zeng, Fei Huang, Yongbin Li
  • for: This paper focuses on improving out-of-distribution (OOD) detection for reliable and trustworthy machine learning by leveraging world knowledge from large language models (LLMs).
  • methods: The proposed method uses a consistency-based uncertainty calibration approach to estimate the confidence score of each generation, and extracts visual objects from each image to fully capitalize on the world knowledge.
  • results: The proposed method consistently outperforms the state-of-the-art in OOD detection tasks, demonstrating its effectiveness in leveraging world knowledge for improved performance.Here’s the text in Simplified Chinese:
    Abstract Out-of-distribution (OOD) detection is essential for reliable and trustworthy machine learning. Recent multi-modal OOD detection leverages textual information from in-distribution (ID) class names for visual OOD detection, yet it currently neglects the rich contextual information of ID classes. Large language models (LLMs) encode a wealth of world knowledge and can be prompted to generate descriptive features for each class. Indiscriminately using such knowledge causes catastrophic damage to OOD detection due to LLMs' hallucinations, as is observed by our analysis. In this paper, we propose to apply world knowledge to enhance OOD detection performance through selective generation from LLMs. Specifically, we introduce a consistency-based uncertainty calibration method to estimate the confidence score of each generation. We further extract visual objects from each image to fully capitalize on the aforementioned world knowledge. Extensive experiments demonstrate that our method consistently outperforms the state-of-the-art.
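
A toy sketch of the consistency idea described above: sample several class descriptions from an LLM and use their mutual agreement as a confidence score, discarding inconsistent (and therefore likely hallucinated) generations. The token-overlap agreement measure and the threshold are illustrative assumptions, not the paper's exact calibration.

```python
from itertools import combinations

def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two generated descriptions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def consistency_confidence(generations: list[str]) -> float:
    """Mean pairwise agreement among multiple sampled generations (higher = more consistent)."""
    pairs = list(combinations(generations, 2))
    if not pairs:
        return 0.0
    return sum(token_jaccard(a, b) for a, b in pairs) / len(pairs)

def keep_reliable(generations: list[str], threshold: float = 0.4) -> list[str]:
    """Keep LLM-generated class descriptions only if their consistency exceeds the threshold."""
    return generations if consistency_confidence(generations) >= threshold else []

# toy usage: three sampled descriptions of an in-distribution class
gens = [
    "a large friendly dog with a golden coat and floppy ears",
    "a golden coated dog, friendly, with floppy ears",
    "a medium to large dog with a dense golden coat",
]
print(consistency_confidence(gens), keep_reliable(gens))
```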

Harnessing Large Language Models’ Empathetic Response Generation Capabilities for Online Mental Health Counselling Support

  • paper_url: http://arxiv.org/abs/2310.08017
  • repo_url: None
  • paper_authors: Siyuan Brandon Loh, Aravind Sesagiri Raamkumar
  • for: Examines whether LLMs can generate empathetic responses suitable for supporting online mental health counselling.
  • methods: Five LLMs are evaluated: GPT-3.5, GPT-4, Vicuna FastChat-T5, PaLM 2, and Falcon-7B-Instruct. Given a simple instructional prompt, each model responds to utterances drawn from the EmpatheticDialogues dataset.
  • results: The LLM responses were markedly more empathetic than those of traditional fine-tuned response-generation dialogue systems and human-generated responses, marking progress toward empathetic conversational systems.
    Abstract Large Language Models (LLMs) have demonstrated remarkable performance across various information-seeking and reasoning tasks. These computational systems drive state-of-the-art dialogue systems, such as ChatGPT and Bard. They also carry substantial promise in meeting the growing demands of mental health care, albeit relatively unexplored. As such, this study sought to examine LLMs' capability to generate empathetic responses in conversations that emulate those in a mental health counselling setting. We selected five LLMs: version 3.5 and version 4 of the Generative Pre-training (GPT), Vicuna FastChat-T5, Pathways Language Model (PaLM) version 2, and Falcon-7B-Instruct. Based on a simple instructional prompt, these models responded to utterances derived from the EmpatheticDialogues (ED) dataset. Using three empathy-related metrics, we compared their responses to those from traditional response generation dialogue systems, which were fine-tuned on the ED dataset, along with human-generated responses. Notably, we discovered that responses from the LLMs were remarkably more empathetic in most scenarios. We position our findings in light of catapulting advancements in creating empathetic conversational systems.

Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation

  • paper_url: http://arxiv.org/abs/2310.07968
  • repo_url: None
  • paper_authors: Yinpei Dai, Run Peng, Sikai Li, Joyce Chai
  • for: Developing an interactive agent that can navigate to open-vocabulary, user-personalized goal objects in unknown environments while conversing with the user.
  • methods: Proposes Open-woRld Interactive persOnalized Navigation (ORION), a framework in which large language models (LLMs) make sequential decisions to orchestrate modules for perception, navigation, and communication.
  • results: Agents that leverage user feedback perform significantly better, but balancing task completion against the efficiency of navigation and interaction remains challenging; the study also reports how different forms of user feedback affect agent performance.
    Abstract Zero-Shot Object Navigation (ZSON) enables agents to navigate towards open-vocabulary objects in unknown environments. The existing works of ZSON mainly focus on following individual instructions to find generic object classes, neglecting the utilization of natural language interaction and the complexities of identifying user-specific objects. To address these limitations, we introduce Zero-shot Interactive Personalized Object Navigation (ZIPON), where robots need to navigate to personalized goal objects while engaging in conversations with users. To solve ZIPON, we propose a new framework termed Open-woRld Interactive persOnalized Navigation (ORION), which uses Large Language Models (LLMs) to make sequential decisions to manipulate different modules for perception, navigation and communication. Experimental results show that the performance of interactive agents that can leverage user feedback exhibits significant improvement. However, obtaining a good balance between task completion and the efficiency of navigation and interaction remains challenging for all methods. We further provide more findings on the impact of diverse user feedback forms on the agents' performance.

Clustering of Spell Variations for Proper Nouns Transliterated from the other languages

  • paper_url: http://arxiv.org/abs/2310.07962
  • repo_url: None
  • paper_authors: Prathamesh Pawar
  • for: Addressing the non-uniformity of text data caused by inconsistent translation and transliteration of proper nouns from Indian languages, which degrades the quality of NLP pipelines that rely on names, addresses, and other proper nouns.
  • methods: Clusters spelling variations of proper nouns using machine learning techniques and mathematical similarity equations; Affinity Propagation determines the relative similarity between tokens, and token-variation pairs are then filtered by a similarity threshold.
  • results: The approach reduces the number of spelling variations considerably, substantially cutting the human annotation effort needed for data cleansing and formatting.
    Abstract One of the prominent problems with processing and operating on text data is its non-uniformity. Due to variation in dialects and languages, the caliber of translation is low. This creates a unique problem when using NLP on text data: the spelling variation arising from inconsistent translations and transliterations. This problem can be further aggravated by human error arising from the various ways to write a proper noun from an Indian language in its English equivalent. Translating proper nouns originating from Indian languages can be complicated, as some proper nouns are also used as common nouns and might be taken literally. Applications of NLP that require addresses, names, and other proper nouns face this problem frequently. We propose a method to cluster these spelling variations of proper nouns using ML techniques and mathematical similarity equations. We use Affinity Propagation to determine the relative similarity between tokens and augment the results by filtering token-variation pairs with a similarity threshold. We were able to reduce the spelling variations by a considerable amount. This application can significantly reduce the amount of human annotation effort needed for data cleansing and formatting.
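
A small sketch of the clustering step, assuming scikit-learn: build a pairwise string-similarity matrix over transliterated proper-noun tokens and cluster it with AffinityPropagation on a precomputed affinity; the character-level similarity function and the filtering threshold below are stand-ins for the paper's similarity equations.

```python
import numpy as np
from difflib import SequenceMatcher
from sklearn.cluster import AffinityPropagation

names = ["Lakshmi", "Laxmi", "Lakshmee", "Saurabh", "Sourabh", "Saurav", "Sourav"]

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the paper's similarity equations."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# precomputed pairwise similarity matrix
S = np.array([[similarity(a, b) for b in names] for a in names])

ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)

clusters = {}
for name, label in zip(names, ap.labels_):
    clusters.setdefault(label, []).append(name)
print(clusters)  # spelling variants of the same proper noun should share a cluster

# optional post-filtering: keep only members above a similarity threshold within each cluster
THRESHOLD = 0.6
for label, members in clusters.items():
    kept = [m for m in members if similarity(m, members[0]) >= THRESHOLD]
    print(label, kept)
```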

cs.LG - 2023-10-12

When Machine Learning Models Leak: An Exploration of Synthetic Training Data

  • paper_url: http://arxiv.org/abs/2310.08775
  • repo_url: None
  • paper_authors: Manel Slokom, Peter-Paul de Wolf, Martha Larson
  • for: Studying an attack on a propensity-to-move classifier, i.e., a machine learning model that predicts whether a person or household will relocate within the next two years.
  • methods: The attacker can query the model for predictions, knows the marginal distribution of the training data, and has obtained the values of non-sensitive attributes for a number of target individuals; the goal is to infer those individuals' sensitive attribute values.
  • results: Examines whether replacing the original data with synthetic data when training the model changes how successfully the attacker can infer the targets' sensitive attribute values.
    Abstract We investigate an attack on a machine learning model that predicts whether a person or household will relocate in the next two years, i.e., a propensity-to-move classifier. The attack assumes that the attacker can query the model to obtain predictions and that the marginal distribution of the data on which the model was trained is publicly available. The attack also assumes that the attacker has obtained the values of non-sensitive attributes for a certain number of target individuals. The objective of the attack is to infer the values of sensitive attributes for these target individuals. We explore how replacing the original data with synthetic data when training the model impacts how successfully the attacker can infer sensitive attributes. (Original paper published at PSD 2022; subsequently updated.)

PhyloGFN: Phylogenetic inference with generative flow networks

  • paper_url: http://arxiv.org/abs/2310.08774
  • repo_url: None
  • paper_authors: Mingyang Zhou, Zichao Yan, Elliot Layne, Nikolay Malkin, Dinghuai Zhang, Moksh Jain, Mathieu Blanchette, Yoshua Bengio
  • for: Researchers and practitioners in phylogenetics, particularly those interested in computational methods for inferring evolutionary relationships.
  • methods: Uses generative flow networks (GFlowNets) to tackle two core problems in phylogenetics, parsimony-based and Bayesian phylogenetic inference; the proposed PhyloGFN is an amortized posterior sampler that explores and samples from the multimodal posterior over tree topologies and evolutionary distances.
  • results: PhyloGFN produces diverse, high-quality evolutionary hypotheses on real benchmark datasets, is competitive with prior work on marginal likelihood estimation, and fits the target distribution more closely than state-of-the-art variational inference methods.
    Abstract Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex combinatorial structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies and evolutionary distances. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets. PhyloGFN is competitive with prior works in marginal likelihood estimation and achieves a closer fit to the target distribution than state-of-the-art variational inference methods.

Modeling Fission Gas Release at the Mesoscale using Multiscale DenseNet Regression with Attention Mechanism and Inception Blocks

  • paper_url: http://arxiv.org/abs/2310.08767
  • repo_url: None
  • paper_authors: Peter Toma, Md Ali Muntaha, Joel B. Harley, Michael R. Tonks
  • for: Mesoscale simulations show how microstructure evolution affects fission gas release (FGR) in nuclear fuel, but they are computationally expensive; this work seeks a cheaper, data-driven surrogate.
  • methods: A deep learning approach that predicts the instantaneous fission gas release flux directly from 2D nuclear fuel microstructure images.
  • results: Four convolutional neural network (CNN) architectures were trained and evaluated on simulated FGR data; the best-performing network, which combines CBAM attention and InceptionNet-style blocks, delivers high accuracy (mean absolute percentage error of 4.4%), stable training, and robustness at very low instantaneous FGR flux values.
    Abstract Mesoscale simulations of fission gas release (FGR) in nuclear fuel provide a powerful tool for understanding how microstructure evolution impacts FGR, but they are computationally intensive. In this study, we present an alternate, data-driven approach, using deep learning to predict instantaneous FGR flux from 2D nuclear fuel microstructure images. Four convolutional neural network (CNN) architectures with multiscale regression are trained and evaluated on simulated FGR data generated using a hybrid phase field/cluster dynamics model. All four networks show high predictive power, with $R^{2}$ values above 98%. The best performing network combine a Convolutional Block Attention Module (CBAM) and InceptionNet mechanisms to provide superior accuracy (mean absolute percentage error of 4.4%), training stability, and robustness on very low instantaneous FGR flux values.
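
For readers unfamiliar with the attention mechanism mentioned above, here is an illustrative PyTorch sketch of a Convolutional Block Attention Module (CBAM, Woo et al. 2018), the component the best-performing network combines with Inception-style blocks; layer sizes are placeholders and the full multiscale DenseNet regression head is omitted.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                                  # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))                 # (B, C) from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))                  # (B, C) from max pooling
        w = torch.sigmoid(avg + mx)[:, :, None, None]      # per-channel weights
        return x * w

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)                  # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)                   # (B, 1, H, W)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in Woo et al. (2018)."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca, self.sa = ChannelAttention(channels), SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))

# toy usage on a batch of 2D microstructure feature maps
feats = torch.randn(4, 32, 64, 64)
print(CBAM(32)(feats).shape)  # torch.Size([4, 32, 64, 64])
```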

Question Answering for Electronic Health Records: A Scoping Review of datasets and models

  • paper_url: http://arxiv.org/abs/2310.08759
  • repo_url: None
  • paper_authors: Jayetri Bardhan, Kirk Roberts, Daisy Zhe Wang
  • for: Providing a methodological review of existing work on question answering (QA) over electronic health records (EHRs).
  • methods: The authors searched four digital sources (Google Scholar, ACL Anthology, ACM Digital Library, and PubMed) for publications on EHR QA; of the 4,111 papers identified, 47 met the inclusion criteria for further study, of which 25 concern EHR QA datasets and 37 concern EHR QA models.
  • results: The review identifies the models used for EHR QA and the metrics used to evaluate them, observes that QA on EHRs is a relatively new and unexplored area with mostly recent work, finds emrQA to be by far the most cited and most used EHR QA dataset, and calls for further research in this area.
    Abstract Question Answering (QA) systems on patient-related data can assist both clinicians and patients. They can, for example, assist clinicians in decision-making and enable patients to have a better understanding of their medical history. Significant amounts of patient data are stored in Electronic Health Records (EHRs), making EHR QA an important research area. In EHR QA, the answer is obtained from the medical record of the patient. Because of the differences in data format and modality, this differs greatly from other medical QA tasks that employ medical websites or scientific papers to retrieve answers, making it critical to research EHR question answering. This study aimed to provide a methodological review of existing works on QA over EHRs. We searched for articles from January 1st, 2005 to September 30th, 2023 in four digital sources including Google Scholar, ACL Anthology, ACM Digital Library, and PubMed to collect relevant publications on EHR QA. 4111 papers were identified for our study, and after screening based on our inclusion criteria, we obtained a total of 47 papers for further study. Out of the 47 papers, 25 papers were about EHR QA datasets, and 37 papers were about EHR QA models. It was observed that QA on EHRs is relatively new and unexplored. Most of the works are fairly recent. Also, it was observed that emrQA is by far the most popular EHR QA dataset, both in terms of citations and usage in other papers. Furthermore, we identified the different models used in EHR QA along with the evaluation metrics used for these models.

Detection and prediction of clopidogrel treatment failures using longitudinal structured electronic health records

  • paper_url: http://arxiv.org/abs/2310.08757
  • repo_url: None
  • paper_authors: Samuel Kim, In Gu Sean Lee, Mijeong Irene Ban, Jane Chiang
  • for: Using machine learning to automatically detect and predict clopidogrel treatment failure from longitudinal structured electronic health records (EHRs), by applying natural language processing (NLP) techniques to the structured records.
  • methods: Builds detection and prediction models with several machine learning algorithms, including Transformer-based and other time-series models.
  • results: From 502,527 UK Biobank patients with clopidogrel prescriptions, 1,824 treatment failure cases and 6,859 controls were identified; diagnoses, prescriptions, and procedures were grouped into same-day visits per patient. Time-series models outperform bag-of-words approaches on both tasks, with BERT reaching 0.928 AUC for detection and 0.729 AUC for prediction, and remaining strong when training data are scarce because it leverages pre-training on large unlabeled data.
    Abstract We propose machine learning algorithms to automatically detect and predict clopidogrel treatment failure using longitudinal structured electronic health records (EHR). By drawing analogies between natural language and structured EHR, we introduce various machine learning algorithms used in natural language processing (NLP) applications to build models for treatment failure detection and prediction. In this regard, we generated a cohort of patients with clopidogrel prescriptions from UK Biobank and annotated whether the patients had treatment failure events within one year of the first clopidogrel prescription; out of 502,527 patients, 1,824 patients were identified as treatment failure cases, and 6,859 patients were considered as control cases. From the dataset, we gathered diagnoses, prescriptions, and procedure records together per patient and organized them into visits with the same date to build models. The models were built for two different tasks, i.e., detection and prediction, and the experimental results showed that time series models outperform bag-of-words approaches in both tasks. In particular, a Transformer-based model, namely BERT, could reach 0.928 AUC in detection tasks and 0.729 AUC in prediction tasks. BERT also showed competence over other time series models when training data are scarce, because it leverages its pre-training on large unlabeled data.
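
A minimal sketch of the data-organization step described above: grouping each patient's diagnosis, prescription, and procedure codes into same-day visits and flattening them into a token sequence that a BERT-style encoder could consume. The record format, code prefixes, and separator tokens are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical longitudinal records: (patient_id, date, code) tuples.
records = [
    ("p1", "2018-03-01", "DX:I25.1"),        # diagnosis
    ("p1", "2018-03-01", "RX:clopidogrel"),  # prescription
    ("p1", "2018-04-12", "PR:PCI"),          # procedure
    ("p2", "2019-01-05", "RX:clopidogrel"),
]

def build_visit_sequences(records):
    """Group codes into same-day visits per patient and flatten them to a token sequence."""
    per_patient = defaultdict(lambda: defaultdict(list))
    for pid, date, code in records:
        per_patient[pid][date].append(code)

    sequences = {}
    for pid, visits in per_patient.items():
        tokens = ["[CLS]"]
        for date in sorted(visits):          # visits in chronological order
            tokens.extend(visits[date])
            tokens.append("[SEP]")           # separator marks the end of a visit
        sequences[pid] = tokens
    return sequences

for pid, seq in build_visit_sequences(records).items():
    print(pid, seq)
# These token sequences would then be fed to a BERT-style encoder with a
# binary head for treatment-failure detection or prediction.
```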

Tokenizer Choice For LLM Training: Negligible or Crucial?

  • paper_url: http://arxiv.org/abs/2310.08754
  • repo_url: None
  • paper_authors: Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr
  • for: Studying how the choice of tokenizer affects the downstream performance and cost of LLMs, and discussing possible remedies.
  • methods: Trains 24 mono- and multilingual LLMs at the 2.6B-parameter scale and ablates different tokenizer algorithms and parameterizations.
  • results: Tokenizer choice can significantly affect downstream performance as well as training and inference cost. The common intrinsic metrics fertility and parity are not always predictive of downstream performance, making them questionable proxies. Multilingual tokenizers covering the five most frequent European languages need roughly three times the vocabulary of an English tokenizer, and reusing English-only tokenizers for multilingual LLMs degrades downstream performance and adds up to 68% extra training cost due to an inefficient vocabulary.
    Abstract The recent success of LLMs has been predominantly driven by curating the training dataset composition, scaling of model architectures and dataset sizes and advancements in pretraining objectives, leaving tokenizer influence as a blind spot. Shedding light on this underexplored area, we conduct a comprehensive study on the influence of tokenizer choice on LLM downstream performance by training 24 mono- and multilingual LLMs at a 2.6B parameter scale, ablating different tokenizer algorithms and parameterizations. Our studies highlight that the tokenizer choice can significantly impact the model's downstream performance, training and inference costs. In particular, we find that the common tokenizer evaluation metrics fertility and parity are not always predictive of model downstream performance, rendering these metrics a questionable proxy for the model's downstream performance. Furthermore, we show that multilingual tokenizers trained on the five most frequent European languages require vocabulary size increases of factor three in comparison to English. While English-only tokenizers have been applied to the training of multi-lingual LLMs, we find that this approach results in a severe downstream performance degradation and additional training costs of up to 68%, due to an inefficient tokenization vocabulary.
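
For reference, the fertility metric discussed above is commonly computed as the average number of subword tokens produced per word; a small sketch with a Hugging Face tokenizer follows, where the tokenizer name and the whitespace word-splitting convention are illustrative choices.

```python
from transformers import AutoTokenizer

def fertility(tokenizer, texts):
    """Average number of subword tokens produced per whitespace-delimited word."""
    n_tokens = sum(len(tokenizer.tokenize(t)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)
    return n_tokens / max(n_words, 1)

texts = [
    "Tokenizer choice can strongly influence downstream performance.",
    "Die Wahl des Tokenizers beeinflusst die Kosten des Trainings.",
]

tok = AutoTokenizer.from_pretrained("gpt2")  # any tokenizer; gpt2 is used here only for illustration
print(f"fertility = {fertility(tok, texts):.2f}")
# Higher fertility on a language usually means longer sequences and higher
# training/inference cost for that language -- one reason multilingual
# tokenizers need larger vocabularies.
```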

Search-Adaptor: Text Embedding Customization for Information Retrieval

  • paper_url: http://arxiv.org/abs/2310.08750
  • repo_url: None
  • paper_authors: Jinsung Yoon, Sercan O Arik, Yanfei Chen, Tomas Pfister
  • for: Improving information retrieval and search by exploiting relevant query-document paired data on top of LLM text embeddings.
  • methods: Proposes Search-Adaptor, an efficient and robust method for customizing LLMs for retrieval; it modifies the text embeddings produced by a pre-trained LLM and can be integrated with any LLM, including those available only through APIs.
  • results: On multiple real-world English and multilingual retrieval datasets, Search-Adaptor yields consistent and significant gains, e.g., more than 5.2% average improvement in nDCG@10 over the Google Embedding APIs across 13 BEIR datasets.
    Abstract Text embeddings extracted by pre-trained Large Language Models (LLMs) have significant potential to improve information retrieval and search. Beyond the zero-shot setup in which they are being conventionally used, being able to take advantage of the information from the relevant query-corpus paired data has the power to further boost the LLM capabilities. In this paper, we propose a novel method, Search-Adaptor, for customizing LLMs for information retrieval in an efficient and robust way. Search-Adaptor modifies the original text embedding generated by pre-trained LLMs, and can be integrated with any LLM, including those only available via APIs. On multiple real-world English and multilingual retrieval datasets, we show consistent and significant performance benefits for Search-Adaptor -- e.g., more than 5.2% improvements over the Google Embedding APIs in nDCG@10 averaged over 13 BEIR datasets.
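
A minimal sketch of the general recipe, assuming PyTorch: keep the LLM embeddings fixed (e.g., obtained through an API), learn a small adapter that maps them into a customized space, and train it with a contrastive ranking objective on query-document pairs. The residual-MLP adapter and the in-batch loss below are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Small residual MLP applied on top of frozen text embeddings."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return F.normalize(x + self.net(x), dim=-1)   # residual keeps the original geometry nearby

def in_batch_ranking_loss(q, d, tau=0.05):
    """Contrastive loss: each query should score its paired document above the others."""
    logits = q @ d.t() / tau                          # (B, B) similarity matrix
    labels = torch.arange(q.size(0))
    return F.cross_entropy(logits, labels)

# toy training loop over frozen (query, document) embedding pairs
dim, batch = 64, 16
adapter = Adapter(dim)
opt = torch.optim.Adam(adapter.parameters(), lr=1e-3)
for _ in range(100):
    q_emb = torch.randn(batch, dim)                   # stand-ins for frozen LLM query embeddings
    d_emb = q_emb + 0.1 * torch.randn(batch, dim)     # paired relevant-document embeddings
    loss = in_batch_ranking_loss(adapter(q_emb), adapter(d_emb))
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```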

Evolutionary Dynamic Optimization and Machine Learning

  • paper_url: http://arxiv.org/abs/2310.08748
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Abdennour Boulesnane
  • for: Surveying the reciprocal integration of Evolutionary Dynamic Optimization (EDO) and Machine Learning (ML), i.e., using evolutionary techniques for dynamic optimization within ML tasks and using learning to guide evolutionary search.
  • methods: Reviews how evolutionary algorithms (EAs) and ML algorithms are combined across the stages of the ML pipeline, including data balancing, feature selection, and model training optimization.
  • results: Shows that integrating the two families enables better dynamic optimization in ML tasks and can improve model performance, and aims to stimulate further contributions in this domain.
    Abstract Evolutionary Computation (EC) has emerged as a powerful field of Artificial Intelligence, inspired by nature's mechanisms of gradual development. However, EC approaches often face challenges such as stagnation, diversity loss, computational complexity, population initialization, and premature convergence. To overcome these limitations, researchers have integrated learning algorithms with evolutionary techniques. This integration harnesses the valuable data generated by EC algorithms during iterative searches, providing insights into the search space and population dynamics. Similarly, the relationship between evolutionary algorithms and Machine Learning (ML) is reciprocal, as EC methods offer exceptional opportunities for optimizing complex ML tasks characterized by noisy, inaccurate, and dynamic objective functions. These hybrid techniques, known as Evolutionary Machine Learning (EML), have been applied at various stages of the ML process. EC techniques play a vital role in tasks such as data balancing, feature selection, and model training optimization. Moreover, ML tasks often require dynamic optimization, for which Evolutionary Dynamic Optimization (EDO) is valuable. This paper presents the first comprehensive exploration of reciprocal integration between EDO and ML. The study aims to stimulate interest in the evolutionary learning community and inspire innovative contributions in this domain.

Robustness to Multi-Modal Environment Uncertainty in MARL using Curriculum Learning

  • paper_url: http://arxiv.org/abs/2310.08746
  • repo_url: https://github.com/Aakriti05/Robust-Multimodal-MARL
  • paper_authors: Aakriti Agrawal, Rohith Aralikatti, Yanchao Sun, Furong Huang
  • for: Addressing robustness to environment uncertainty in multi-agent reinforcement learning (MARL).
  • methods: Proposes a general robust training approach based on curriculum learning techniques that can handle multiple forms of environment uncertainty simultaneously.
  • results: Experiments in both cooperative and competitive MARL environments show improved robustness, reaching state-of-the-art performance.
    Abstract Multi-agent reinforcement learning (MARL) plays a pivotal role in tackling real-world challenges. However, the seamless transition of trained policies from simulations to the real world requires them to be robust to various environmental uncertainties. Existing works focus on finding Nash equilibria or the optimal policy under uncertainty in a single environment variable (i.e., action, state or reward), because a multi-agent system is itself highly complex and non-stationary. In real-world situations, however, uncertainty can occur in multiple environment variables simultaneously. This work is the first to formulate the generalised problem of robustness to multi-modal environment uncertainty in MARL. To this end, we propose a general robust training approach for multi-modal uncertainty based on curriculum learning techniques. We handle two distinct environmental uncertainties simultaneously and present extensive results across both cooperative and competitive MARL environments, demonstrating that our approach achieves state-of-the-art levels of robustness.

Splicing Up Your Predictions with RNA Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.08738
  • repo_url: https://github.com/phil-fradkin/contrastive_rna
  • paper_authors: Philip Fradkin, Ruian Shi, Bo Wang, Brendan Frey, Leo J. Lee
  • for: Closing gaps in our understanding of the RNA regulatory code by adapting self-supervised learning methods that have proven effective in other domains.
  • methods: Extends contrastive learning to genomic data by exploiting functional similarities between sequences generated through alternative splicing and gene duplication.
  • results: The learned general-purpose RNA isoform representations achieve competitive results on downstream tasks such as RNA half-life and mean ribosome load prediction.
    Abstract In the face of rapidly accumulating genomic data, our understanding of the RNA regulatory code remains incomplete. Recent self-supervised methods in other domains have demonstrated the ability to learn rules underlying the data-generating process such as sentence structure in language. Inspired by this, we extend contrastive learning techniques to genomic data by utilizing functional similarities between sequences generated through alternative splicing and gene duplication. Our novel dataset and contrastive objective enable the learning of generalized RNA isoform representations. We validate their utility on downstream tasks such as RNA half-life and mean ribosome load prediction. Our pre-training strategy yields competitive results using linear probing on both tasks, along with up to a two-fold increase in Pearson correlation in low-data conditions. Importantly, our exploration of the learned latent space reveals that our contrastive objective yields semantically meaningful representations, underscoring its potential as a valuable initialization technique for RNA property prediction.

Provably Robust Cost-Sensitive Learning via Randomized Smoothing

  • paper_url: http://arxiv.org/abs/2310.08732
  • repo_url: https://github.com/trustmlrg/cs-rs
  • paper_authors: Yuan Xin, Michael Backes, Xiao Zhang
  • for: Developing a robustness certification framework for classifiers trained under cost-sensitive scenarios, where the potential harm of each classwise adversarial transformation is encoded in a cost matrix.
  • methods: Certifies robustness via randomized smoothing with a cost-sensitive certified radius, together with fine-grained certified-radius optimization schemes designed for different data subgroups.
  • results: Experiments show significantly improved certified cost-sensitive robustness compared to existing methods, with a negligible impact on overall accuracy.
    Abstract We focus on learning adversarially robust classifiers under a cost-sensitive scenario, where the potential harm of different classwise adversarial transformations is encoded in a binary cost matrix. Existing methods are either empirical that cannot certify robustness or suffer from inherent scalability issues. In this work, we study whether randomized smoothing, a more scalable robustness certification framework, can be leveraged to certify cost-sensitive robustness. Built upon a notion of cost-sensitive certified radius, we show how to adapt the standard randomized smoothing certification pipeline to produce tight robustness guarantees for any cost matrix. In addition, with fine-grained certified radius optimization schemes specifically designed for different data subgroups, we propose an algorithm to train smoothed classifiers that are optimized for cost-sensitive robustness. Extensive experiments on image benchmarks and a real-world medical dataset demonstrate the superiority of our method in achieving significantly improved performance of certified cost-sensitive robustness while having a negligible impact on overall accuracy.
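
For context, the sketch below shows the standard (not yet cost-sensitive) randomized-smoothing certification in the style of Cohen et al.: estimate the smoothed classifier's top-class probability under Gaussian noise with a Clopper-Pearson lower bound and convert it into a certified L2 radius. The cost-sensitive radius and the subgroup-specific optimization proposed in the paper are not reproduced here.

```python
import numpy as np
from scipy.stats import norm, beta

def lower_conf_bound(k: int, n: int, alpha: float = 0.001) -> float:
    """One-sided Clopper-Pearson lower bound on a binomial proportion."""
    return 0.0 if k == 0 else beta.ppf(alpha, k, n - k + 1)

def certify(base_classifier, x, sigma: float = 0.25, n: int = 1000, alpha: float = 0.001):
    """
    Standard randomized-smoothing certification:
    returns (predicted class, certified L2 radius) or (None, 0.0) on abstention.
    base_classifier maps a batch of inputs to integer class labels.
    """
    noisy = x[None, :] + sigma * np.random.randn(n, *x.shape)
    preds = base_classifier(noisy)
    top = np.bincount(preds).argmax()
    p_lower = lower_conf_bound(int((preds == top).sum()), n, alpha)
    if p_lower <= 0.5:
        return None, 0.0                        # abstain: cannot certify
    return int(top), sigma * norm.ppf(p_lower)  # R = sigma * Phi^{-1}(p_lower)

# toy base classifier: class 1 if the mean feature exceeds 0, else class 0
clf = lambda batch: (batch.mean(axis=1) > 0).astype(int)
x = np.full(32, 0.3)
print(certify(clf, x))
```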

Heterophily-Based Graph Neural Network for Imbalanced Classification

  • paper_url: http://arxiv.org/abs/2310.08725
  • repo_url: None
  • paper_authors: Zirui Liang, Yuntao Li, Tianjin Huang, Akrati Saxena, Yulong Pei, Mykola Pechenizkiy
  • for: Addressing class imbalance in graph-related problems, particularly in node classification tasks.
  • methods: Proposes a unique approach that considers graph heterophily to tackle imbalanced classification, integrating an imbalance classification strategy with heterophily-aware GNNs to improve performance and efficiency.
  • results: Demonstrates superiority in classification performance and efficiency compared to existing baselines through experiments on real-world graphs.
    Abstract Graph neural networks (GNNs) have shown promise in addressing graph-related problems, including node classification. However, conventional GNNs assume an even distribution of data across classes, which is often not the case in real-world scenarios, where certain classes are severely underrepresented. This leads to suboptimal performance of standard GNNs on imbalanced graphs. In this paper, we introduce a unique approach that tackles imbalanced classification on graphs by considering graph heterophily. We investigate the intricate relationship between class imbalance and graph heterophily, revealing that minority classes not only exhibit a scarcity of samples but also manifest lower levels of homophily, facilitating the propagation of erroneous information among neighboring nodes. Drawing upon this insight, we propose an efficient method, called Fast Im-GBK, which integrates an imbalance classification strategy with heterophily-aware GNNs to effectively address the class imbalance problem while significantly reducing training time. Our experiments on real-world graphs demonstrate our model's superiority in classification performance and efficiency for node classification tasks compared to existing baselines.

Designing Observables for Measurements with Deep Learning

  • paper_url: http://arxiv.org/abs/2310.08717
  • repo_url: https://github.com/owen234/designer-obs-paper
  • paper_authors: Owen Long, Benjamin Nachman
  • for: Using machine learning to design optimal observables for inferring fundamental, effective, or phenomenological parameters of the underlying physics models.
  • methods: Neural network outputs act as unfolded, differential cross sections that are designed to contain the most information about the parameters of interest while being well-measured by construction.
  • results: The approach is demonstrated with two physics models for inclusive measurements in deep inelastic scattering, comparing machine-learned observables against those designed from physics intuition and heuristics.
    Abstract Many analyses in particle and nuclear physics use simulations to infer fundamental, effective, or phenomenological parameters of the underlying physics models. When the inference is performed with unfolded cross sections, the observables are designed using physics intuition and heuristics. We propose to design optimal observables with machine learning. Unfolded, differential cross sections in a neural network output contain the most information about parameters of interest and can be well-measured by construction. We demonstrate this idea using two physics models for inclusive measurements in deep inelastic scattering.

Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

  • paper_url: http://arxiv.org/abs/2310.08710
  • repo_url: None
  • paper_authors: Cole Gulino, Justin Fu, Wenjie Luo, George Tucker, Eli Bronstein, Yiren Lu, Jean Harb, Xinlei Pan, Yan Wang, Xiangyu Chen, John D. Co-Reyes, Rishabh Agarwal, Rebecca Roelofs, Yao Lu, Nico Montali, Paul Mougin, Zoey Yang, Brandyn White, Aleksandra Faust, Rowan McAllister, Dragomir Anguelov, Benjamin Sapp
  • for: Providing a realistic, cost-effective simulator for large-scale autonomous driving research, so that agents can be tested and trained safely at scale.
  • methods: Waymax initializes or replays a diverse set of multi-agent scenarios from publicly released real-world driving data (e.g., the Waymo Open Motion Dataset), runs entirely on hardware accelerators (TPUs/GPUs), and supports in-graph simulation for training in modern large-scale distributed machine learning workflows.
  • results: Benchmarks popular imitation and reinforcement learning algorithms with ablations over design decisions, highlighting the effectiveness of routes as guidance for planning agents and RL's tendency to overfit against simulated agents.
    Abstract Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly-released, real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents.

Polynomial Time Cryptanalytic Extraction of Neural Network Models

  • paper_url: http://arxiv.org/abs/2310.08708
  • repo_url: None
  • paper_authors: Adi Shamir, Isaac Canales-Martinez, Anna Hambitzer, Jorge Chavez-Saab, Francisco Rodrigez-Henriquez, Nitin Satpute
  • for: Improving existing attacks so that all parameters of ReLU-based deep neural networks can be extracted with arbitrarily high precision using a polynomial number of queries and polynomial time.
  • methods: Develops several new techniques, including a new search algorithm and a new filtering algorithm, that improve both the efficiency and the precision of the extraction attack.
  • results: All parameters of a full-sized neural network for classifying the CIFAR10 dataset can be extracted in about 30 minutes on a 256-core machine, whereas the prior approach would require an exhaustive search over 2^256 possibilities.
    Abstract Billions of dollars and countless GPU hours are currently spent on training Deep Neural Networks (DNNs) for a variety of tasks. Thus, it is essential to determine the difficulty of extracting all the parameters of such neural networks when given access to their black-box implementations. Many versions of this problem have been studied over the last 30 years, and the best current attack on ReLU-based deep neural networks was presented at Crypto 2020 by Carlini, Jagielski, and Mironov. It resembles a differential chosen plaintext attack on a cryptosystem, which has a secret key embedded in its black-box implementation and requires a polynomial number of queries but an exponential amount of time (as a function of the number of neurons). In this paper, we improve this attack by developing several new techniques that enable us to extract with arbitrarily high precision all the real-valued parameters of a ReLU-based DNN using a polynomial number of queries and a polynomial amount of time. We demonstrate its practical efficiency by applying it to a full-sized neural network for classifying the CIFAR10 dataset, which has 3072 inputs, 8 hidden layers with 256 neurons each, and over million neuronal parameters. An attack following the approach by Carlini et al. requires an exhaustive search over 2 to the power 256 possibilities. Our attack replaces this with our new techniques, which require only 30 minutes on a 256-core computer.

Eliciting Model Steering Interactions from Users via Data and Visual Design Probes

  • paper_url: http://arxiv.org/abs/2310.09314
  • repo_url: None
  • paper_authors: Anamaria Crisan, Maddie Shang, Eric Brochu
  • for: Investigating how domain experts who adopt automated data science tools use semantic interactions to update simple classification models.
  • methods: An elicitation study using data and visual design probes with 20 participants spanning a spectrum of ML expertise; their interactions are codified as a set of target-interaction pairs.
  • results: Many targets of semantic interactions do not map directly onto ML model parameters but instead aim to augment the training data; participants hesitate to interact because of cognitive load and concerns about injecting bias, yet they value semantic interactions as a way to collaborate with team members, especially those without an ML background, and their needs differ with their level of ML expertise.
    Abstract Domain experts increasingly use automated data science tools to incorporate machine learning (ML) models in their work but struggle to "debug" these models when they are incorrect. For these experts, semantic interactions can provide an accessible avenue to guide and refine ML models without having to programmatically dive into its technical details. In this research, we conduct an elicitation study using data and visual design probes to examine if and how experts with a spectrum of ML expertise use semantic interactions to update a simple classification model. We use our design probes to facilitate an interactive dialogue with 20 participants and codify their interactions as a set of target-interaction pairs. Interestingly, our findings revealed that many targets of semantic interactions do not directly map to ML model parameters, but instead aim to augment the data a model uses for training. We also identify reasons that participants would hesitate to interact with ML models, including burdens of cognitive load and concerns of injecting bias. Unexpectedly participants also saw the value of using semantic interactions to work collaboratively with members of their team. Participants with less ML expertise found this to be a useful mechanism for communicating their concerns to ML experts. This was an especially important observation, as our study also shows the different needs that correspond to diverse ML expertise. Collectively, we demonstrate that design probes are effective tools for proactively gathering the affordances that should be offered in an interactive machine learning system.

Kernel-Elastic Autoencoder for Molecular Design

  • paper_url: http://arxiv.org/abs/2310.08685
  • repo_url: None
  • paper_authors: Haote Li, Yu Shee, Brandon Allen, Federica Maschietto, Victor Batista
  • for: Describing the Kernel-Elastic Autoencoder (KAE), a transformer-based self-supervised generative model, and its performance for molecular design.
  • methods: KAE is trained with two novel loss functions, a modified maximum mean discrepancy and a weighted reconstruction loss, which together enable valid generation and accurate reconstruction at the same time.
  • results: KAE achieves remarkable diversity in molecule generation while maintaining near-perfect reconstruction on an independent test set, supports conditional generation and beam-search decoding with state-of-the-art performance on constrained optimization, and can generate molecules conditioned on favorable binding affinities that outperform all training-set candidates according to AutoDock Vina and Glide scores.
    Abstract We introduce the Kernel-Elastic Autoencoder (KAE), a self-supervised generative model based on the transformer architecture with enhanced performance for molecular design. KAE is formulated based on two novel loss functions: modified maximum mean discrepancy and weighted reconstruction. KAE addresses the long-standing challenge of achieving valid generation and accurate reconstruction at the same time. KAE achieves remarkable diversity in molecule generation while maintaining near-perfect reconstructions on the independent testing dataset, surpassing previous molecule-generating models. KAE enables conditional generation and allows for decoding based on beam search resulting in state-of-the-art performance in constrained optimizations. Furthermore, KAE can generate molecules conditional to favorable binding affinities in docking applications as confirmed by AutoDock Vina and Glide scores, outperforming all existing candidates from the training dataset. Beyond molecular design, we anticipate KAE could be applied to solve problems by generation in a wide range of applications.
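
As background for the first loss term, here is a small PyTorch sketch of a plain maximum mean discrepancy (MMD) with an RBF kernel, the quantity that KAE's modified MMD builds on; the modification and the weighted-reconstruction term are not reproduced, and the kernel bandwidth is a placeholder.

```python
import torch

def rbf_kernel(x, y, bandwidth: float = 1.0):
    """RBF (Gaussian) kernel matrix between two sets of latent codes."""
    d2 = torch.cdist(x, y) ** 2
    return torch.exp(-d2 / (2 * bandwidth ** 2))

def mmd(latents, prior_samples, bandwidth: float = 1.0):
    """
    (Biased) MMD^2 estimate between encoder latents and samples from the prior.
    Driving this toward zero pushes the aggregate posterior toward the prior,
    which is what makes sampling and decoding from the latent space possible.
    """
    k_xx = rbf_kernel(latents, latents, bandwidth).mean()
    k_yy = rbf_kernel(prior_samples, prior_samples, bandwidth).mean()
    k_xy = rbf_kernel(latents, prior_samples, bandwidth).mean()
    return k_xx + k_yy - 2 * k_xy

# toy usage: compare encoder outputs against a standard-normal prior
z = torch.randn(128, 16) * 1.5 + 0.5      # stand-in for encoder latents
prior = torch.randn(128, 16)
print(float(mmd(z, prior)))               # > 0 because the latents deviate from the prior
```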

Machine Learning Who to Nudge: Causal vs Predictive Targeting in a Field Experiment on Student Financial Aid Renewal

  • paper_url: http://arxiv.org/abs/2310.08672
  • repo_url: None
  • paper_authors: Susan Athey, Niall Keleher, Jann Spiess
  • for: Analyzing the value of targeting "nudges" in a large-scale field experiment with over 53,000 college students encouraged to renew their financial-aid applications before a non-binding deadline.
  • methods: A causal forest estimates heterogeneous treatment effects and students are assigned to treatment according to the estimated effects; two alternative policies that target students with low or high predicted probability of renewing without treatment are also evaluated.
  • results: Targeting intermediate predicted baseline outcomes is most effective, while targeting low predicted outcomes is detrimental. Nudging all students raised early filing by an average of 6.4 percentage points over a 37% baseline, and targeting half of the students with the preferred policy attains roughly 75% of that benefit.
    Abstract In many settings, interventions may be more effective for some individuals than others, so that targeting interventions may be beneficial. We analyze the value of targeting in the context of a large-scale field experiment with over 53,000 college students, where the goal was to use "nudges" to encourage students to renew their financial-aid applications before a non-binding deadline. We begin with baseline approaches to targeting. First, we target based on a causal forest that estimates heterogeneous treatment effects and then assigns students to treatment according to those estimated to have the highest treatment effects. Next, we evaluate two alternative targeting policies, one targeting students with low predicted probability of renewing financial aid in the absence of the treatment, the other targeting those with high probability. The predicted baseline outcome is not the ideal criterion for targeting, nor is it a priori clear whether to prioritize low, high, or intermediate predicted probability. Nonetheless, targeting on low baseline outcomes is common in practice, for example because the relationship between individual characteristics and treatment effects is often difficult or impossible to estimate with historical data. We propose hybrid approaches that incorporate the strengths of both predictive approaches (accurate estimation) and causal approaches (correct criterion); we show that targeting intermediate baseline outcomes is most effective, while targeting based on low baseline outcomes is detrimental. In one year of the experiment, nudging all students improved early filing by an average of 6.4 percentage points over a baseline average of 37% filing, and we estimate that targeting half of the students using our preferred policy attains around 75% of this benefit.
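
To make the targeting logic concrete, the sketch below estimates heterogeneous treatment effects with a simple T-learner (two outcome models, one per arm) in scikit-learn and then targets the individuals with the largest estimated effects; this stands in for the causal forest used in the paper, and the synthetic data are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, d = 5000, 5
X = rng.normal(size=(n, d))               # student covariates
T = rng.integers(0, 2, size=n)            # randomized nudge assignment
# synthetic outcome: baseline + heterogeneous treatment effect depending on X[:, 0]
tau = 0.1 * (1 - np.abs(X[:, 0]))         # largest effect for "intermediate" students
Y = 0.37 + 0.2 * X[:, 1] + T * tau + rng.normal(scale=0.1, size=n)

# T-learner: fit separate outcome models for treated and control units
m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[T == 1], Y[T == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[T == 0], Y[T == 0])
cate = m1.predict(X) - m0.predict(X)      # estimated individual treatment effects

# causal targeting: nudge the half of students with the largest estimated effects
budget = n // 2
targeted = np.argsort(cate)[::-1][:budget]
print("mean true effect among targeted :", tau[targeted].mean())
print("mean true effect overall        :", tau.mean())
```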

Every Parameter Matters: Ensuring the Convergence of Federated Learning with Dynamic Heterogeneous Models Reduction

  • paper_url: http://arxiv.org/abs/2310.08670
  • repo_url: None
  • paper_authors: Hanhan Zhou, Tian Lan, Guru Venkataramani, Wenbo Ding
  • for: This paper focuses on addressing the challenges of cross-device Federated Learning (FL) by developing a unifying framework for heterogeneous FL algorithms with online model extraction, and providing a general convergence analysis for the first time.
  • methods: The paper proposes a holistic approach that considers both model reduction noise and minimum coverage index to enhance the efficiency of heterogeneous FL.
  • results: The authors prove that under certain sufficient conditions, the proposed algorithms converge to a stationary point of standard FL for general smooth cost functions, both for IID and non-IID data.
    Abstract Cross-device Federated Learning (FL) faces significant challenges where low-end clients that could potentially make unique contributions are excluded from training large models due to their resource bottlenecks. Recent research efforts have focused on model-heterogeneous FL, by extracting reduced-size models from the global model and applying them to local clients accordingly. Despite the empirical success, general theoretical guarantees of convergence on this method remain an open question. This paper presents a unifying framework for heterogeneous FL algorithms with online model extraction and provides a general convergence analysis for the first time. In particular, we prove that under certain sufficient conditions and for both IID and non-IID data, these algorithms converge to a stationary point of standard FL for general smooth cost functions. Moreover, we introduce the concept of minimum coverage index, together with model reduction noise, which will determine the convergence of heterogeneous federated learning, and therefore we advocate for a holistic approach that considers both factors to enhance the efficiency of heterogeneous federated learning.

Counting and Algorithmic Generalization with Transformers

  • paper_url: http://arxiv.org/abs/2310.08661
  • repo_url: https://github.com/simonouellette35/countingwithtransformers
  • paper_authors: Simon Ouellette, Rolf Pfister, Hansueli Jud
  • for: 该论文旨在研究机器学习中的算法泛化能力，即学习生成数据的底层算法并实现分布外（out-of-distribution）泛化的能力。
  • methods: 论文分析了标准 Transformer 架构中妨碍此类任务分布外性能的设计决策（层归一化与通过 softmax 归一化注意力权重），并通过消融这些操作加以改进。
  • results: 结果表明，修改后的 Transformer 能够以非常轻量级的架构在计数任务上实现良好的算法泛化性能。
    Abstract Algorithmic generalization in machine learning refers to the ability to learn the underlying algorithm that generates data in a way that generalizes out-of-distribution. This is generally considered a difficult task for most machine learning algorithms. Here, we analyze algorithmic generalization when counting is required, either implicitly or explicitly. We show that standard Transformers are based on architectural decisions that hinder out-of-distribution performance for such tasks. In particular, we discuss the consequences of using layer normalization and of normalizing the attention weights via softmax. With ablation of the problematic operations, we demonstrate that a modified transformer can exhibit a good algorithmic generalization performance on counting while using a very lightweight architecture.
    摘要 机器学习中的算法泛化指学习生成数据的底层算法、从而实现分布外泛化的能力，这通常被认为是大多数机器学习算法难以完成的任务。本文分析了需要（显式或隐式）计数的算法泛化问题。我们表明，标准 Transformer 的若干架构设计会损害此类任务的分布外性能，特别是层归一化以及通过 softmax 归一化注意力权重所带来的后果。在消融这些有问题的操作后，我们证明改进后的 Transformer 能够以非常轻量的架构在计数任务上表现出良好的算法泛化性能。
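To make the normalization point concrete, here is a small toy construction of our own (not the paper's model, data, or ablation): with one-hot token embeddings, an unnormalized attention readout can return an exact occurrence count, whereas the same readout under softmax normalization always sums to one and loses the count.

```python
import numpy as np

# Toy illustration: why softmax-normalized attention struggles to represent counts.
def attention_readout(seq, query_token, normalize):
    vocab = 5
    X = np.eye(vocab)[seq]            # (T, vocab) one-hot embeddings
    q = np.eye(vocab)[query_token]    # (vocab,) query
    scores = X @ q                    # 1.0 where token matches, else 0.0
    if normalize:                     # standard softmax attention weights
        weights = np.exp(scores) / np.exp(scores).sum()
    else:                             # unnormalized "sum" attention
        weights = scores
    values = np.ones(len(seq))        # value of 1 per position
    return float(weights @ values)

seq = [2, 0, 2, 2, 1, 4, 2]           # token 2 appears 4 times
print(attention_readout(seq, 2, normalize=False))  # 4.0 -> exact count recovered
print(attention_readout(seq, 2, normalize=True))   # 1.0 -> count information lost
```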

SplitBeam: Effective and Efficient Beamforming in Wi-Fi Networks Through Split Computing

  • paper_url: http://arxiv.org/abs/2310.08656
  • repo_url: https://github.com/yoshitomo-matsubara/split-beam
  • paper_authors: Niloofar Bahadori, Yoshitomo Matsubara, Marco Levorato, Francesco Restuccia
  • for: 提高 Wi-Fi 网络吞吐量。
  • methods: 使用拆分的深度神经网络（SplitBeam），以信道状态信息为输入直接生成波束成形矩阵（BM），并求解瓶颈优化问题（BOP）。
  • results: 与标准 IEEE 802.11 反馈算法以及最先进的基于 DNN 的 LB-SciFi 方法相比，波束成形反馈大小与计算复杂度分别最多降低 81% 与 84%，比特误码率（BER）与现有方法相差约 10^-3 以内；FPGA 硬件实现表明端到端 BM 上报延迟低于 10 毫秒。
    Abstract Modern IEEE 802.11 (Wi-Fi) networks extensively rely on multiple-input multiple-output (MIMO) to significantly improve throughput. To correctly beamform MIMO transmissions, the access point needs to frequently acquire a beamforming matrix (BM) from each connected station. However, the size of the matrix grows with the number of antennas and subcarriers, resulting in an increasing amount of airtime overhead and computational load at the station. Conventional approaches come with either excessive computational load or loss of beamforming precision. For this reason, we propose SplitBeam, a new framework where we train a split deep neural network (DNN) to directly output the BM given the channel state information (CSI) matrix as input. We formulate and solve a bottleneck optimization problem (BOP) to keep computation, airtime overhead, and bit error rate (BER) below application requirements. We perform extensive experimental CSI collection with off-the-shelf Wi-Fi devices in two distinct environments and compare the performance of SplitBeam with the standard IEEE 802.11 algorithm for BM feedback and the state-of-the-art DNN-based approach LB-SciFi. Our experimental results show that SplitBeam reduces the beamforming feedback size and computational complexity by respectively up to 81% and 84% while maintaining BER within about 10^-3 of existing approaches. We also implement the SplitBeam DNNs on FPGA hardware to estimate the end-to-end BM reporting delay, and show that the latter is less than 10 milliseconds in the most complex scenario, which is the target channel sounding frequency in realistic multi-user MIMO scenarios.
    摘要 现代 IEEE 802.11（Wi-Fi）网络广泛依赖多输入多输出（MIMO）技术来显著提高吞吐量。为了正确地对 MIMO 传输进行波束成形，接入点需要频繁地从每个已连接的站点获取波束成形矩阵（BM）。然而，该矩阵的大小随天线与子载波数量增长，导致空口开销和站点计算负担不断增加。传统方法要么计算负担过高，要么损失波束成形精度。为此，我们提出 SplitBeam：一个新框架，训练一个拆分的深度神经网络（DNN），以信道状态信息（CSI）矩阵为输入直接输出 BM。我们提出并求解瓶颈优化问题（BOP），使计算量、空口开销与比特误码率（BER）均满足应用要求。我们使用现成的 Wi-Fi 设备在两种不同环境中进行了大量 CSI 实测，并将 SplitBeam 与 IEEE 802.11 标准的 BM 反馈算法以及最先进的基于 DNN 的方法 LB-SciFi 进行比较。实验结果表明，SplitBeam 将波束成形反馈大小与计算复杂度分别最多降低 81% 与 84%，同时 BER 与现有方法相差约 10^-3 以内。我们还在 FPGA 硬件上实现了 SplitBeam 的 DNN，以估计端到端 BM 上报延迟；在最复杂的场景下该延迟低于 10 毫秒，满足现实多用户 MIMO 场景中的目标信道探测频率。
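A minimal sketch of the split-computing idea in PyTorch, assuming made-up dimensions and layer sizes (this is not the SplitBeam architecture from the paper): a station-side head compresses the CSI into a small bottleneck vector that would be fed back over the air, and an AP-side tail expands it into a beamforming matrix.

```python
import torch
import torch.nn as nn

# Illustrative split regressor from CSI to a beamforming matrix; sizes are assumptions.
N_SUB, N_TX, N_RX, BOTTLENECK = 52, 4, 2, 32

station_head = nn.Sequential(               # runs on the station; its output is transmitted
    nn.Flatten(),
    nn.Linear(N_SUB * N_TX * N_RX * 2, 128), nn.ReLU(),
    nn.Linear(128, BOTTLENECK),              # compressed feedback instead of the full BM
)
ap_tail = nn.Sequential(                     # runs on the access point
    nn.Linear(BOTTLENECK, 128), nn.ReLU(),
    nn.Linear(128, N_SUB * N_TX * N_RX * 2),
)

csi = torch.randn(1, N_SUB, N_TX, N_RX, 2)   # real/imag parts of the channel estimate
feedback = station_head(csi)                 # what is actually fed back (BOTTLENECK floats)
bm = ap_tail(feedback).view(1, N_SUB, N_TX, N_RX, 2)
print(feedback.shape, bm.shape)
```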

Time-vectorized numerical integration for systems of ODEs

  • paper_url: http://arxiv.org/abs/2310.08649
  • repo_url: None
  • paper_authors: Mark C. Messner, Tianchen Hu, Tianju Chen
  • for: 本文描述了一类高效、隐式、向量化的方法，用于求解科学问题中常见的刚性常微分方程组（ODE），并适用于训练数据稀疏的场景。
  • methods: 该方法采用隐式时间积分，并同时沿独立时间序列的数量与成批（"chunk"）的连续时间步两个方向对问题进行向量化，从而提高计算设备可利用的带宽。
  • results: 与标准的串行时间积分相比，可获得超过 100 倍的加速，并能充分利用现代 GPU 的性能；文中通过若干算例展示了该方法的优势。
    Abstract Stiff systems of ordinary differential equations (ODEs) and sparse training data are common in scientific problems. This paper describes efficient, implicit, vectorized methods for integrating stiff systems of ordinary differential equations through time and calculating parameter gradients with the adjoint method. The main innovation is to vectorize the problem both over the number of independent times series and over a batch or "chunk" of sequential time steps, effectively vectorizing the assembly of the implicit system of ODEs. The block-bidiagonal structure of the linearized implicit system for the backward Euler method allows for further vectorization using parallel cyclic reduction (PCR). Vectorizing over both axes of the input data provides a higher bandwidth of calculations to the computing device, allowing even problems with comparatively sparse data to fully utilize modern GPUs and achieving speed ups of greater than 100x, compared to standard, sequential time integration. We demonstrate the advantages of implicit, vectorized time integration with several example problems, drawn from both analytical stiff and non-stiff ODE models as well as neural ODE models. We also describe and provide a freely available open-source implementation of the methods developed here.
    摘要 刚性常微分方程组（ODE）和稀疏训练数据在科学问题中十分常见。本文描述了一类高效、隐式、向量化的方法，用于对刚性 ODE 系统进行时间积分，并借助伴随方法计算参数梯度。其主要创新在于同时沿两个方向向量化问题：独立时间序列的数量，以及一批（"chunk"）连续时间步，从而向量化隐式 ODE 系统的组装。后向欧拉法线性化隐式系统的块双对角结构还允许利用并行循环消去（PCR）进一步向量化。沿输入数据的两个轴向量化为计算设备提供了更高的计算带宽，使得即使数据相对稀疏的问题也能充分利用现代 GPU，相比标准的串行时间积分获得超过 100 倍的加速。我们通过若干算例（包括解析的刚性与非刚性 ODE 模型以及神经 ODE 模型）展示了隐式向量化时间积分的优势，并提供了免费的开源实现。
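A toy sketch of the vectorization idea for implicit integration, assuming a batch of independent stiff *linear* test systems; it vectorizes the backward-Euler solve over many series with one batched linear solve per step, and does not implement the paper's chunked time-step assembly or parallel cyclic reduction.

```python
import numpy as np

# Backward Euler for dy/dt = A y, advanced for many independent series at once.
n_series, dim, n_steps, h = 1000, 3, 200, 0.05
rng = np.random.default_rng(0)

A = -np.stack([np.diag(rng.uniform(1.0, 100.0, dim)) for _ in range(n_series)])  # stiff systems
y = rng.normal(size=(n_series, dim))

I = np.eye(dim)
lhs = I[None, :, :] - h * A            # (n_series, dim, dim), assembled once for all series
for _ in range(n_steps):
    # one batched linear solve advances every series simultaneously
    y = np.linalg.solve(lhs, y[..., None])[..., 0]

print(y.shape)   # (1000, 3); all trajectories decay toward 0 despite the stiffness
```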

Bucks for Buckets (B4B): Active Defenses Against Stealing Encoders

  • paper_url: http://arxiv.org/abs/2310.08571
  • repo_url: None
  • paper_authors: Jan Dubiński, Stanisław Pawlak, Franziska Boenisch, Tomasz Trzciński, Adam Dziedzic
  • For: 本研究旨在保护 Machine Learning as a Service（MLaaS）API 中的高性能编码器，防御针对编码器的模型窃取攻击。
  • Methods: 提出名为 "Bucks for Buckets"（B4B）的主动防御策略，根据攻击者与合法用户在嵌入空间覆盖范围上的差异，自适应地调整返回表示的效用，并对每个用户的表示做个体化变换以抵御多账号（sybil）攻击。
  • Results: B4B 能在攻击进行时阻止编码器功能被窃取，同时不降低合法用户获得的表示质量。
    Abstract Machine Learning as a Service (MLaaS) APIs provide ready-to-use and high-utility encoders that generate vector representations for given inputs. Since these encoders are very costly to train, they become lucrative targets for model stealing attacks during which an adversary leverages query access to the API to replicate the encoder locally at a fraction of the original training costs. We propose Bucks for Buckets (B4B), the first active defense that prevents stealing while the attack is happening without degrading representation quality for legitimate API users. Our defense relies on the observation that the representations returned to adversaries who try to steal the encoder's functionality cover a significantly larger fraction of the embedding space than representations of legitimate users who utilize the encoder to solve a particular downstream task. B4B leverages this to adaptively adjust the utility of the returned representations according to a user's coverage of the embedding space. To prevent adaptive adversaries from eluding our defense by simply creating multiple user accounts (sybils), B4B also individually transforms each user's representations. This prevents the adversary from directly aggregating representations over multiple accounts to create their stolen encoder copy. Our active defense opens a new path towards securely sharing and democratizing encoders over public APIs.
    摘要 机器学习即服务（MLaaS）API 提供开箱即用、效用很高的编码器，为给定输入生成向量表示。由于这些编码器的训练成本极高，它们成为模型窃取攻击的诱人目标：攻击者利用对 API 的查询访问，以远低于原始训练成本的代价在本地复制编码器。我们提出 Bucks for Buckets（B4B），这是第一个在攻击发生过程中即可阻止窃取、同时不降低合法用户表示质量的主动防御。该防御基于如下观察：试图窃取编码器功能的攻击者所获得的表示覆盖的嵌入空间比例，远大于利用编码器解决特定下游任务的合法用户。B4B 据此根据用户对嵌入空间的覆盖程度自适应地调整返回表示的效用。为防止自适应攻击者通过创建多个账户（sybil）绕过防御，B4B 还对每个用户的表示进行个体化变换，使攻击者无法直接聚合多个账户的表示来构建被窃取的编码器副本。这种主动防御为通过公共 API 安全共享与普及编码器开辟了新路径。

Stronger Coreset Bounds for Kernel Density Estimators via Chaining

  • paper_url: http://arxiv.org/abs/2310.08548
  • repo_url: None
  • paper_authors: Rainie Bozzai, Thomas Rothvoss
  • for: 本文旨在改进一大类核函数的核心集（coreset）复杂度界，并给出相应的随机多项式时间构造算法。
  • methods: 论文结合差异（discrepancy）方法与链（chaining）方法来构造核密度估计的核心集。
  • results: 对于数据集一致有界的情形，给出 Gaussian 与 Laplacian 核的大小为 $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ 的核心集（$d$ 为数据维度，$\varepsilon$ 为误差），这是此前技术无法达到的改进；在维度 $d$ 为常数时，还给出 Laplacian 核的大小为 $O\big(\frac{1}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ 的核心集；最后，对 exponential、Hellinger 与 JS 核给出目前最优的核心集复杂度界 $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log(2\max\{1,\alpha\})}\big)$，其中 $1/\alpha$ 为核的带宽参数。
    Abstract We apply the discrepancy method and a chaining approach to give improved bounds on the coreset complexity of a wide class of kernel functions. Our results give randomized polynomial time algorithms to produce coresets of size $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ for the Gaussian and Laplacian kernels in the case that the data set is uniformly bounded, an improvement that was not possible with previous techniques. We also obtain coresets of size $O\big(\frac{1}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ for the Laplacian kernel for $d$ constant. Finally, we give the best known bounds of $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log(2\max\{1,\alpha\})}\big)$ on the coreset complexity of the exponential, Hellinger, and JS Kernels, where $1/\alpha$ is the bandwidth parameter of the kernel.
    摘要 我们应用差异方法与链方法，为一大类核函数的核心集复杂度给出改进的界。我们的结果给出随机多项式时间算法：在数据集一致有界时，为 Gaussian 与 Laplacian 核生成大小为 $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ 的核心集，这是此前技术无法实现的改进；在 $d$ 为常数时，为 Laplacian 核得到大小为 $O\big(\frac{1}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ 的核心集。最后，我们对 exponential、Hellinger 与 JS 核给出目前最优的核心集复杂度界 $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log(2\max\{1,\alpha\})}\big)$，其中 $1/\alpha$ 为核的带宽参数。
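To illustrate what a KDE coreset guarantees (not how the paper constructs one), the sketch below uses plain uniform subsampling as a stand-in coreset and measures the worst-case deviation between the full-data and coreset kernel density estimates for a Gaussian kernel; all sizes are arbitrary.

```python
import numpy as np

# Measure the KDE approximation error of a candidate coreset Q relative to the full set P.
rng = np.random.default_rng(1)
d, n, m = 2, 5000, 300
X = rng.normal(size=(n, d))                      # full data set P
Q = X[rng.choice(n, size=m, replace=False)]      # candidate coreset (uniform subsample)

def kde(points, queries, bandwidth=1.0):
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth**2)).mean(axis=1)

queries = rng.normal(size=(500, d))
err = np.abs(kde(X, queries) - kde(Q, queries)).max()
print(f"max |KDE_P - KDE_Q| over queries: {err:.4f}")   # the epsilon this coreset achieves
```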

Divorce Prediction with Machine Learning: Insights and LIME Interpretability

  • paper_url: http://arxiv.org/abs/2310.08620
  • repo_url: None
  • paper_authors: Md Manjurul Ahsan
  • For: The paper aims to predict whether a couple will divorce or not using machine learning algorithms and interpretability techniques.
  • Methods: The authors use six machine learning algorithms (Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbors, Classification and Regression Trees, Gaussian Naive Bayes, and Support Vector Machines) to classify married and divorced individuals based on a dataset. They also use Local Interpretable Model-Agnostic Explanations (LIME) to provide interpretable results.
  • Results: The authors achieve an accuracy of 98.57% in predicting divorce using the SVM, KNN, and LDA algorithms. They also use LIME to explain the prediction probabilities and identify the most important features that differentiate divorced and married couples. Additionally, they develop a divorce predictor app that considers the ten most important features to help couples make decisions about their relationship.
    Abstract Divorce is one of the most common social issues in developed countries like in the United States. Almost 50% of the recent marriages turn into an involuntary divorce or separation. While it is evident that people vary to a different extent, and even over time, an incident like Divorce does not interrupt the individual's daily activities; still, Divorce has a severe effect on the individual's mental health, and personal life. Within the scope of this research, the divorce prediction was carried out by evaluating a dataset named by the 'divorce predictor dataset' to correctly classify between married and Divorce people using six different machine learning algorithms- Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Gaussian Na\"ive Bayes (NB), and, Support Vector Machines (SVM). Preliminary computational results show that algorithms such as SVM, KNN, and LDA, can perform that task with an accuracy of 98.57%. This work's additional novel contribution is the detailed and comprehensive explanation of prediction probabilities using Local Interpretable Model-Agnostic Explanations (LIME). Utilizing LIME to analyze test results illustrates the possibility of differentiating between divorced and married couples. Finally, we have developed a divorce predictor app considering ten most important features that potentially affect couples in making decisions in their divorce, such tools can be used by any one in order to identify their relationship condition.
    摘要 各种社会问题中,离婚是最常见的一种,美国和其他发达国家的现代社会中约50%的新婚夫妻会经历不可避免的离婚或分居。虽然人们在不同程度和时间上有所不同,但离婚仍然会对个人的心理健康和生活产生严重的影响。在这个研究中,我们使用了名为“离婚预测数据集”的数据集,使用六种机器学习算法(LR、LDA、KNN、CART、NB和SVM)对已婚和离婚人进行分类,并实现了准确率达98.57%。此外,我们还提供了详细的解释,使用本地可解释性模型无关性(LIME)来分析测试结果,并证明了可以在已婚和离婚夫妻之间进行区分。最后,我们开发了一款离婚预测应用,考虑了十大最重要的因素,这些因素可能影响夫妻在离婚决策中的选择。这些工具可以由任何人使用,以了解他们的关系状况。
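A sketch of the classify-then-explain workflow described above, assuming scikit-learn and the lime package are available; the data here is a synthetic stand-in for the 54-question divorce predictor dataset, and the feature names are invented.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from lime.lime_tabular import LimeTabularExplainer

# Synthetic stand-in data: 54 Likert-scale answers per respondent, binary label.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(170, 54)).astype(float)
y = (X[:, :10].mean(axis=1) > 2).astype(int)          # stand-in label: 1 = "divorced"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(probability=True, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))

explainer = LimeTabularExplainer(
    X_tr, feature_names=[f"Q{i+1}" for i in range(54)],
    class_names=["married", "divorced"], mode="classification")
exp = explainer.explain_instance(X_te[0], clf.predict_proba, num_features=10)
print(exp.as_list())    # ten feature contributions behind this single prediction
```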

Characterizing climate pathways using feature importance on echo state networks

  • paper_url: http://arxiv.org/abs/2310.08495
  • repo_url: None
  • paper_authors: Katherine Goode, Daniel Ries, Kellie McClernon
  • for: 本研究旨在探讨如何使用回声状态网络（ESN）刻画气候路径（climate pathway）中变量之间的关系。
  • methods: 研究使用回声状态网络（ESN）对时空气候数据建模，并为 ESN 开发了基于特征重要性的方法来量化变量关系。
  • results: 研究发现，ESN 能够较好地捕捉气候变量之间的关系，并且可以用特征重要性来评估这些关系的重要程度。
    Abstract The 2022 National Defense Strategy of the United States listed climate change as a serious threat to national security. Climate intervention methods, such as stratospheric aerosol injection, have been proposed as mitigation strategies, but the downstream effects of such actions on a complex climate system are not well understood. The development of algorithmic techniques for quantifying relationships between source and impact variables related to a climate event (i.e., a climate pathway) would help inform policy decisions. Data-driven deep learning models have become powerful tools for modeling highly nonlinear relationships and may provide a route to characterize climate variable relationships. In this paper, we explore the use of an echo state network (ESN) for characterizing climate pathways. ESNs are a computationally efficient neural network variation designed for temporal data, and recent work proposes ESNs as a useful tool for forecasting spatio-temporal climate data. Like other neural networks, ESNs are non-interpretable black-box models, which poses a hurdle for understanding variable relationships. We address this issue by developing feature importance methods for ESNs in the context of spatio-temporal data to quantify variable relationships captured by the model. We conduct a simulation study to assess and compare the feature importance techniques, and we demonstrate the approach on reanalysis climate data. In the climate application, we select a time period that includes the 1991 volcanic eruption of Mount Pinatubo. This event was a significant stratospheric aerosol injection, which we use as a proxy for an artificial stratospheric aerosol injection. Using the proposed approach, we are able to characterize relationships between pathway variables associated with this event.
    摘要 美国 2022 年国防战略将气候变化列为国家安全的严重威胁。平流层气溶胶注入等气候干预方法已被提出作为缓解策略，但此类行动对复杂气候系统的下游影响尚不清楚。开发能够量化气候事件中源变量与影响变量关系（即气候路径）的算法技术，有助于为政策决策提供依据。数据驱动的深度学习模型已成为刻画高度非线性关系的强大工具，有望用于刻画气候变量之间的关系。本文探讨使用回声状态网络（ESN）来刻画气候路径。ESN 是一种针对时间数据设计、计算高效的神经网络变体，近期研究表明 ESN 可用于时空气候数据的预测。与其他神经网络一样，ESN 是不可解释的黑箱模型，这给理解变量关系带来障碍。为此，我们针对时空数据情形为 ESN 开发了特征重要性方法，以量化模型所捕获的变量关系。我们通过模拟研究评估并比较这些特征重要性技术，并在再分析气候数据上进行了演示。在气候应用中，我们选取包含 1991 年皮纳图博火山喷发的时间段；该事件是一次显著的平流层气溶胶注入，我们将其作为人工平流层气溶胶注入的代理。利用所提方法，我们得以刻画与该事件相关的路径变量之间的关系。
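A minimal generic echo state network sketch (not the paper's spatio-temporal setup or its feature-importance method): a fixed random reservoir with spectral radius below one is driven by the inputs, and only a ridge-regression readout is trained. Sizes and the toy target are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, T = 3, 200, 1000

W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))        # scale spectral radius below 1

U = rng.normal(size=(T, n_in))                          # toy input drivers
y = np.roll(U[:, 0] * U[:, 1], 1)                       # toy target: lagged interaction

# Drive the fixed random reservoir and collect its states.
states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in @ U[t] + W @ x)
    states[t] = x

# Only the linear readout is trained (ridge regression).
ridge = 1e-6
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res), states.T @ y)
print("train MSE:", float(np.mean((states @ W_out - y) ** 2)))
```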

Strategies and impact of learning curve estimation for CNN-based image classification

  • paper_url: http://arxiv.org/abs/2310.08470
  • repo_url: None
  • paper_authors: Laura Didyk, Brayden Yarish, Michael A. Beck, Christopher P. Bidinosti, Christopher J. Henry
  • for: 这篇论文旨在提出估计学习曲线的采样策略，以减少模型与超参数搜索所需的总训练时间，从而更高效地优化模型性能。
  • methods: 这篇论文使用了一些不同的采样策略来估计学习曲线,并评估了这些策略在模拟学习曲线和实际图像分类任务中的性能。
  • results: 根据实验结果,提出的采样策略可以减少模型训练时间,同时仍能准确地估计学习曲线。
    Abstract Learning curves are a measure for how the performance of machine learning models improves given a certain volume of training data. Over a wide variety of applications and models it was observed that learning curves follow -- to a large extent -- a power law behavior. This makes the performance of different models for a given task somewhat predictable and opens the opportunity to reduce the training time for practitioners, who are exploring the space of possible models and hyperparameters for the problem at hand. By estimating the learning curve of a model from training on small subsets of data only the best models need to be considered for training on the full dataset. How to choose subset sizes and how often to sample models on these to obtain estimates is however not researched. Given that the goal is to reduce overall training time strategies are needed that sample the performance in a time-efficient way and yet leads to accurate learning curve estimates. In this paper we formulate the framework for these strategies and propose several strategies. Further we evaluate the strategies for simulated learning curves and in experiments with popular datasets and models for image classification tasks.
    摘要 学习曲线衡量机器学习模型的性能如何随训练数据量的增加而提升。在各种应用和模型中可以观察到，学习曲线在很大程度上遵循幂律行为。这使得不同模型在给定任务上的性能具有一定的可预测性，从而为正在探索模型与超参数空间的实践者提供了减少训练时间的机会：通过仅在小的数据子集上训练来估计模型的学习曲线，只有最优的模型才需要在完整数据集上训练。然而，如何选择子集大小、以及在这些子集上以何种频率采样模型以获得估计，尚未得到研究。鉴于目标是减少总训练时间，需要既省时又能得到准确学习曲线估计的采样策略。本文给出了此类策略的框架并提出了若干策略，随后在模拟学习曲线以及图像分类任务的常用数据集和模型上对这些策略进行了评估。
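A small sketch of the estimation step the paper builds on: fit a power law err(n) = a·n^(-b) + c to validation errors measured on a few training subsets and extrapolate to the full dataset. The measured points and the 50k-example extrapolation target below are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * np.power(n, -b) + c

subset_sizes = np.array([100, 200, 400, 800, 1600])
val_errors   = np.array([0.42, 0.33, 0.27, 0.23, 0.205])   # hypothetical measurements

params, _ = curve_fit(power_law, subset_sizes, val_errors, p0=(1.0, 0.5, 0.1), maxfev=10000)
a, b, c = params
print(f"fit: err(n) ~ {a:.2f} * n^(-{b:.2f}) + {c:.3f}")
print("predicted error at n = 50000:", power_law(50_000, *params))
```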

Differentially Private Non-convex Learning for Multi-layer Neural Networks

  • paper_url: http://arxiv.org/abs/2310.08425
  • repo_url: None
  • paper_authors: Hanpu Shen, Cheng-Long Wang, Zihang Xiang, Yiming Ying, Di Wang
  • for: 本研究探讨多层全连接神经网络的差分隐私随机优化问题。
  • methods: 我们提出了若干算法并给出分析，证明可以达到与数据维度无关的超额总体风险。
  • results: 研究表明，在样本量与网络宽度足够大等条件下，DP-SGD 能够为全连接多层神经网络提供超额总体风险保证。
    Abstract This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine cases with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specific model where the random noise possesses a zero mean, and the link function is both bounded and Lipschitz continuous. We propose several algorithms and our analysis demonstrates the feasibility of achieving an excess population risk that remains invariant to the data dimension. We also delve into the scenario involving the ReLU link function, and our findings mirror those of the bounded link function. We conclude this section by contrasting well-specified and misspecified models, using ReLU regression as a representative example. In the second part of the paper, we extend our ideas to two-layer neural networks with sigmoid or ReLU activation functions in the well-specified model. In the third part, we study the theoretical guarantees of DP-SGD in Abadi et al. (2016) for fully connected multi-layer neural networks. By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk when both the sample size and the width of the network are sufficiently large. Additionally, we discuss the role of some parameters in DP-SGD regarding their utility, both theoretically and empirically.
    摘要 本文研究（多层）全连接、单输出节点神经网络的差分隐私随机优化问题。第一部分考察无隐藏节点的情形，重点是广义线性模型（GLM）。我们研究良定模型：随机噪声均值为零，链接函数有界且 Lipschitz 连续。我们提出若干算法，其分析表明可以达到与数据维度无关的超额总体风险；对于 ReLU 链接函数的情形，结论与有界链接函数类似。该部分最后以 ReLU 回归为例比较了良定与误定模型。第二部分将上述思路推广到良定模型下带 sigmoid 或 ReLU 激活的两层神经网络。第三部分研究 Abadi 等（2016）提出的 DP-SGD 在全连接多层神经网络上的理论保证：借助神经正切核（NTK）理论的最新进展，我们在样本量与网络宽度足够大时给出了首个超额总体风险界，并从理论与实验两方面讨论了 DP-SGD 中若干参数的作用。
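A bare-bones sketch of one DP-SGD step (per-example gradient clipping plus Gaussian noise) following the generic recipe of Abadi et al. (2016); the tiny model, clip norm, and noise multiplier are illustrative choices, not the paper's theoretical setting or experiments.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn, lr, clip, noise_mult = nn.MSELoss(), 0.1, 1.0, 1.0

X, y = torch.randn(64, 10), torch.randn(64, 1)

params = list(model.parameters())
accum = [torch.zeros_like(p) for p in params]
for i in range(len(X)):                              # per-example gradients
    model.zero_grad()
    loss_fn(model(X[i:i+1]), y[i:i+1]).backward()
    norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
    scale = min(1.0, clip / (norm + 1e-12))          # clip each example's gradient norm
    for a, p in zip(accum, params):
        a += p.grad * scale

with torch.no_grad():
    for a, p in zip(accum, params):
        noisy = (a + noise_mult * clip * torch.randn_like(a)) / len(X)
        p -= lr * noisy                              # noisy, averaged gradient step
print("one DP-SGD step done")
```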

Introducing a Deep Neural Network-based Model Predictive Control Framework for Rapid Controller Implementation

  • paper_url: http://arxiv.org/abs/2310.08392
  • repo_url: None
  • paper_authors: David C. Gordon, Alexander Winkler, Julian Bedei, Patrick Schaber, Jakob Andert, Charles R. Koch
  • For: This paper presents an experimental implementation of a deep neural network (DNN) based nonlinear model predictive control (MPC) for Homogeneous Charge Compression Ignition (HCCI) combustion control.
  • Methods: The MPC uses a Long Short-Term Memory (LSTM) network surrounded by fully connected layers, trained on experimental engine data; the model showed acceptable prediction performance with under 5% error for all outputs.
  • Results: The developed controller tracks the Indicated Mean Effective Pressure (IMEP) and combustion phasing trajectories while minimizing several parameters. IMEP trajectory following was excellent, with a root-mean-square error of 0.133 bar, and process constraints were observed.
    Abstract Model Predictive Control (MPC) provides an optimal control solution based on a cost function while allowing for the implementation of process constraints. As a model-based optimal control technique, the performance of MPC strongly depends on the model used where a trade-off between model computation time and prediction performance exists. One solution is the integration of MPC with a machine learning (ML) based process model which are quick to evaluate online. This work presents the experimental implementation of a deep neural network (DNN) based nonlinear MPC for Homogeneous Charge Compression Ignition (HCCI) combustion control. The DNN model consists of a Long Short-Term Memory (LSTM) network surrounded by fully connected layers which was trained using experimental engine data and showed acceptable prediction performance with under 5% error for all outputs. Using this model, the MPC is designed to track the Indicated Mean Effective Pressure (IMEP) and combustion phasing trajectories, while minimizing several parameters. Using the acados software package to enable the real-time implementation of the MPC on an ARM Cortex A72, the optimization calculations are completed within 1.4 ms. The external A72 processor is integrated with the prototyping engine controller using a UDP connection allowing for rapid experimental deployment of the NMPC. The IMEP trajectory following of the developed controller was excellent, with a root-mean-square error of 0.133 bar, in addition to observing process constraints.
    摘要 模型预测控制（MPC）在满足过程约束的同时，基于代价函数给出最优控制解。作为一种基于模型的最优控制技术，MPC 的性能很大程度上取决于所用模型，而模型计算时间与预测性能之间存在权衡。一种解决方案是将 MPC 与在线评估速度很快的机器学习（ML）过程模型相结合。本文给出了基于深度神经网络（DNN）的非线性 MPC 在均质充量压燃（HCCI）燃烧控制上的实验实现。该 DNN 模型由全连接层包裹的长短期记忆（LSTM）网络构成，使用实验发动机数据训练，所有输出的预测误差均低于 5%。基于该模型设计的 MPC 跟踪指示平均有效压力（IMEP）与燃烧相位轨迹，同时最小化若干参数。借助 acados 软件包在 ARM Cortex A72 上实时实现 MPC，优化计算在 1.4 ms 内完成；外部 A72 处理器通过 UDP 连接与原型机控制器集成，便于快速的实验部署。所开发控制器的 IMEP 轨迹跟踪表现优异，均方根误差为 0.133 bar，并满足过程约束。

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

  • paper_url: http://arxiv.org/abs/2310.08391
  • repo_url: None
  • paper_authors: Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett
  • for: 本研究探讨了一种简单的内容学习(ICL)设置,即预训练一个线性参数化的单层线性注意力模型,用于解决未看过的任务。
  • methods: 本研究使用了一个线性参数化的单层线性注意力模型,并采用了一些独立的任务进行预训练。
  • results: 研究发现，只需少量相互独立的预训练任务即可有效地完成预训练，且预训练后的模型与贝叶斯最优算法（即调参最优的岭回归）几乎一致。
    Abstract Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters. In this paper, we study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression with a Gaussian prior. We establish a statistical task complexity bound for the attention model pretraining, showing that effective pretraining only requires a small number of independent tasks. Furthermore, we prove that the pretrained model closely matches the Bayes optimal algorithm, i.e., optimally tuned ridge regression, by achieving nearly Bayes optimal risk on unseen tasks under a fixed context length. These theoretical findings complement prior experimental research and shed light on the statistical foundations of ICL.
    摘要 在多样任务上预训练的 Transformer 表现出显著的上下文学习（ICL）能力，能够仅凭输入上下文、无需调整模型参数即可解决未见过的任务。本文在最简单的设定之一中研究 ICL：预训练一个线性参数化的单层线性注意力模型，用于带高斯先验的线性回归。我们为注意力模型的预训练建立了统计任务复杂度界，表明有效的预训练只需要少量相互独立的任务。此外，我们证明在固定上下文长度下，预训练模型在未见任务上可达到近乎贝叶斯最优的风险，即与调参最优的岭回归几乎一致。这些理论发现补充了已有的实验研究，并阐明了 ICL 的统计基础。
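A toy version of the pretraining setup studied above: a linearly parameterized single-layer linear attention head is trained on prompts of (x, y) pairs drawn from random linear-regression tasks and asked to predict y for a held-out query. Dimensions, the exact parameterization, and the training loop are our own illustrative choices, not the paper's construction.

```python
import torch

torch.manual_seed(0)
d, n_ctx, batch, steps = 5, 20, 512, 2000

def sample_batch(b):
    w = torch.randn(b, d)                                 # one regression task per prompt
    X = torch.randn(b, n_ctx + 1, d)
    y = torch.einsum("bnd,bd->bn", X, w) + 0.1 * torch.randn(b, n_ctx + 1)
    Z = torch.cat([X, y.unsqueeze(-1)], dim=-1)           # tokens are (x_i, y_i)
    Z[:, -1, -1] = 0.0                                    # hide the query's label
    return Z, y[:, -1]

P = torch.nn.Parameter(torch.randn(d + 1, d + 1) * 0.02)  # value/projection weights
Q = torch.nn.Parameter(torch.randn(d + 1, d + 1) * 0.02)  # merged key-query weights
opt = torch.optim.Adam([P, Q], lr=1e-2)

for _ in range(steps):
    Z, target = sample_batch(batch)
    attn = Z @ Q @ Z.transpose(1, 2) / n_ctx              # linear (no softmax) attention
    out = attn @ Z @ P                                    # (batch, n_ctx+1, d+1)
    pred = out[:, -1, -1]                                 # prediction read from the query token
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("final ICL loss:", loss.item())   # should decrease over training toward the ridge baseline
```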

Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges

  • paper_url: http://arxiv.org/abs/2310.08358
  • repo_url: None
  • paper_authors: Peifeng Gao, Qianqian Xu, Yibo Yang, Peisong Wen, Huiyang Shao, Zhiyong Yang, Bernard Ghanem, Qingming Huang
  • for: 这个论文探讨了深度神经网络在训练过程中的终端阶段(TPT)中的神经塌陷(NC)现象,以及在这个过程中的泛化行为。
  • methods: 这篇论文利用泛化理论解释 TPT 阶段的泛化行为，提出了一种多类间隔泛化界，用于解释为何在训练准确率达到 100% 之后继续训练仍能提高测试集上的准确率。
  • results: 论文的研究结果表明,在TPT阶段内,不同的标签和特征之间的对齐程度可以导致不同的泛化水平,即”非保守泛化”现象。此外,论文还提供了实验证据来支持其理论结论。
    Abstract Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT). It is characterized by the collapse of features and classifier into a symmetrical structure, known as simplex equiangular tight frame (ETF). While there have been extensive studies on optimization characteristics showing the global optimality of neural collapse, little research has been done on the generalization behaviors during the occurrence of NC. Particularly, the important phenomenon of generalization improvement during TPT has been remaining in an empirical observation and lacking rigorous theoretical explanation. In this paper, we establish the connection between the minimization of CE and a multi-class SVM during TPT, and then derive a multi-class margin generalization bound, which provides a theoretical explanation for why continuing training can still lead to accuracy improvement on test set, even after the train accuracy has reached 100%. Additionally, our further theoretical results indicate that different alignment between labels and features in a simplex ETF can result in varying degrees of generalization improvement, despite all models reaching NC and demonstrating similar optimization performance on train set. We refer to this newly discovered property as "non-conservative generalization". In experiments, we also provide empirical observations to verify the indications suggested by our theoretical results.
    摘要 neural collapse (NC) 是深度神经网络在终端训练阶段的一个常见现象,特征是特征和分类器归一化到一个对称结构,称为简单等距离框架(ETF)。 虽然有了广泛的优化特性研究,表明NC的全球优化性,但对于NC发生时的泛化行为进行了少量的研究。特别是在训练阶段达到100%的时候,继续训练可以导致测试集上的准确率改善,这种现象在理论上没有得到充分的解释。在这篇论文中,我们连接了CE的最小化和多类SVM的训练过程,并 derivated一种多类边界泛化 bound,这提供了一个理论上的解释,为什么在TPT阶段继续训练可以导致准确率改善。此外,我们的进一步理论结果表明,不同的标签和特征在简单ETF中的对应关系可能会导致不同的泛化提升,即使所有模型都达到NC并在训练集上达到相同的优化性。我们称之为“非保守泛化”。在实验中,我们也提供了一些实证证明我们的理论结果的指示。
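A quick numeric check of the simplex-ETF geometry that neural collapse refers to: K class-mean directions forming a simplex equiangular tight frame have equal norms and pairwise cosine -1/(K-1). The construction below is standard and independent of the paper.

```python
import numpy as np

K, d = 5, 16
M = np.eye(K) - np.ones((K, K)) / K                  # rows are the K simplex-ETF vertices
U, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(d, K)))
means = (U @ M.T).T                                  # embed the vertices in d dimensions

means /= np.linalg.norm(means, axis=1, keepdims=True)
cosines = means @ means.T
off_diag = cosines[~np.eye(K, dtype=bool)]
print("pairwise cosines:", np.round(off_diag, 4))     # all equal to -1/(K-1) = -0.25
```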

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

  • paper_url: http://arxiv.org/abs/2310.08348
  • repo_url: https://github.com/opendilab/LightZero
  • paper_authors: Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu
  • for: 这篇论文的目的是为了开发一种通用的 Monte Carlo Tree Search(MCTS)/MuZero 基于的决策制定方法,以便在多种不同的应用中使用。
  • methods: 这篇论文使用了一种名为 LightZero 的统一测试基准,以便在多种不同的环境中评估 MCTS/MuZero 的性能。具体来说,论文将 MCTS 的算法拆分成多个子模块,然后通过适当的探索和优化策略来提高这些子模块的性能。
  • results: 论文通过对多种任务和环境进行测试,表明了这种方法的潜在强大性和可扩展性。详细的测试结果表明,通过采用这种方法,可以在多种不同的应用中建立可扩展和高效的决策智能。
    Abstract Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari. However, it has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications, especially when these environments involve complex action spaces and significant simulation costs, or inherent stochasticity. In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios. Specificially, we summarize the most critical challenges in designing a general MCTS-style decision-making solver, then decompose the tightly-coupled algorithm and system design of tree-search RL methods into distinct sub-modules. By incorporating more appropriate exploration and optimization strategies, we can significantly enhance these sub-modules and construct powerful LightZero agents to tackle tasks across a wide range of domains, such as board games, Atari, MuJoCo, MiniGrid and GoBigger. Detailed benchmark results reveal the significant potential of such methods in building scalable and efficient decision intelligence. The code is available as part of OpenDILab at https://github.com/opendilab/LightZero.
    摘要 基于树搜索规划能力并结合学得模型构建智能体，已在围棋和 Atari 等经典决策问题上取得了显著成功。然而，将基于蒙特卡洛树搜索（MCTS）的算法推广到多样的真实应用中仍被认为是困难甚至不可行的，尤其当这些环境涉及复杂的动作空间、高昂的仿真代价或固有的随机性时。本文提出 LightZero，首个用于在通用序贯决策场景中部署 MCTS/MuZero 的统一基准。具体而言，我们总结了设计通用 MCTS 风格决策求解器时最关键的挑战，并将树搜索强化学习方法中紧耦合的算法与系统设计分解为独立的子模块。通过引入更合适的探索与优化策略，我们能显著增强这些子模块，构建出强大的 LightZero 智能体，以应对棋类游戏、Atari、MuJoCo、MiniGrid 与 GoBigger 等广泛领域的任务。详细的基准测试结果显示了此类方法在构建可扩展、高效决策智能方面的巨大潜力。代码已作为 OpenDILab 的一部分开源：https://github.com/opendilab/LightZero。

Neural Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.08337
  • repo_url: https://github.com/maum-ai/nuwave
  • paper_authors: Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth
  • for: NDMs are designed to train generative distributions more efficiently and simplify the reverse process by allowing time-dependent non-linear transformations of data.
  • methods: NDMs use a variational bound to optimize the model in a simulation-free setting, and a time-continuous formulation allows for fast and reliable inference using off-the-shelf numerical ODE and SDE solvers.
  • results: NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples, as demonstrated through experiments on standard image generation benchmarks such as CIFAR-10, downsampled versions of ImageNet, and CelebA-HQ.
    Abstract Diffusion models have shown remarkable performance on many generative tasks. Despite recent success, most diffusion models are restricted in that they only allow linear transformation of the data distribution. In contrast, broader family of transformations can potentially help train generative distributions more efficiently, simplifying the reverse process and closing the gap between the true negative log-likelihood and the variational approximation. In this paper, we present Neural Diffusion Models (NDMs), a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data. We show how to optimise NDMs using a variational bound in a simulation-free setting. Moreover, we derive a time-continuous formulation of NDMs, which allows fast and reliable inference using off-the-shelf numerical ODE and SDE solvers. Finally, we demonstrate the utility of NDMs with learnable transformations through experiments on standard image generation benchmarks, including CIFAR-10, downsampled versions of ImageNet and CelebA-HQ. NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples.
    摘要 扩散模型在众多生成任务上表现出色。尽管近期取得成功，大多数扩散模型仅允许对数据分布进行线性变换。相比之下，更广泛的一类变换有望更高效地训练生成分布，简化逆过程，并缩小真实负对数似然与变分近似之间的差距。本文提出神经扩散模型（NDM），它是传统扩散模型的推广，能够定义并学习随时间变化的非线性数据变换。我们展示了如何在免仿真的设定下利用变分界优化 NDM，并推导了 NDM 的时间连续形式，从而可以使用现成的数值 ODE 与 SDE 求解器进行快速可靠的推断。最后，我们在 CIFAR-10、降采样版 ImageNet 以及 CelebA-HQ 等标准图像生成基准上验证了带可学习变换的 NDM 的有效性：NDM 在似然上优于传统扩散模型，并能生成高质量样本。

Impact of multi-armed bandit strategies on deep recurrent reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.08331
  • repo_url: https://github.com/ValentinaZangirolami/DRL
  • paper_authors: Valentina Zangirolami, Matteo Borrotti
  • for: 这篇论文研究在部分可观测系统中权衡探索与利用，以便在自动驾驶场景中预测方向盘转角。
  • methods: 论文结合随机与确定性的多臂赌博机策略以及深度循环 Q 网络（Deep Recurrent Q-Network）来权衡探索与利用，并评估了一种改进底层卷积循环神经网络学习阶段的新方法。
  • results: 研究表明，自适应的随机探索方法能更好地逼近探索与利用之间的权衡；总体而言，Softmax 与 Max-Boltzmann 策略能够超越 epsilon-greedy 策略。
    Abstract Incomplete knowledge of the environment leads an agent to make decisions under uncertainty. One of the major dilemmas in Reinforcement Learning (RL) where an autonomous agent has to balance two contrasting needs in making its decisions is: exploiting the current knowledge of the environment to maximize the cumulative reward as well as exploring actions that allow improving the knowledge of the environment, hopefully leading to higher reward values (exploration-exploitation trade-off). Concurrently, another relevant issue regards the full observability of the states, which may not be assumed in all applications. Such as when only 2D images are considered as input in a RL approach used for finding the optimal action within a 3D simulation environment. In this work, we address these issues by deploying and testing several techniques to balance exploration and exploitation trade-off on partially observable systems for predicting steering wheels in autonomous driving scenario. More precisely, the final aim is to investigate the effects of using both stochastic and deterministic multi-armed bandit strategies coupled with a Deep Recurrent Q-Network. Additionally, we adapted and evaluated the impact of an innovative method to improve the learning phase of the underlying Convolutional Recurrent Neural Network. We aim to show that adaptive stochastic methods for exploration better approximate the trade-off between exploration and exploitation as, in general, Softmax and Max-Boltzmann strategies are able to outperform epsilon-greedy techniques.
    摘要 对环境知识的不完备使智能体必须在不确定性下做出决策。强化学习（RL）中的一个主要难题是自主智能体需要在两种相互冲突的需求之间取得平衡：一方面利用当前对环境的认识以最大化累积奖励，另一方面探索有助于改进环境认识、从而可能带来更高奖励的动作（探索-利用权衡）。与此同时，另一个相关问题是状态的完全可观测性并非在所有应用中都成立，例如在三维仿真环境中仅以二维图像作为 RL 方法的输入来寻找最优动作。本文针对部分可观测系统中自动驾驶场景下的方向盘预测，部署并测试了多种平衡探索与利用的技术。更确切地说，我们研究将随机与确定性多臂赌博机策略同深度循环 Q 网络相结合的效果，并改造和评估了一种改进底层卷积循环神经网络学习阶段的新方法。我们希望证明，自适应的随机探索方法能更好地逼近探索与利用之间的权衡，因为总体而言 Softmax 与 Max-Boltzmann 策略能够超越 epsilon-greedy 技术。
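For reference, the three exploration rules compared above, applied to a vector of Q-values; the epsilon and temperature values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
q = np.array([1.0, 1.5, 0.2, 1.4])        # Q-values for 4 candidate actions

def epsilon_greedy(q, eps=0.1):
    if rng.random() < eps:
        return int(rng.integers(len(q)))   # explore uniformly at random
    return int(np.argmax(q))               # otherwise exploit

def softmax_policy(q, temp=0.5):
    p = np.exp(q / temp); p /= p.sum()     # Boltzmann distribution over actions
    return int(rng.choice(len(q), p=p))

def max_boltzmann(q, eps=0.1, temp=0.5):
    # exploit greedily with prob. 1-eps, otherwise explore with the Boltzmann policy
    return softmax_policy(q, temp) if rng.random() < eps else int(np.argmax(q))

counts = {name: np.bincount([f(q) for _ in range(10_000)], minlength=len(q))
          for name, f in [("eps-greedy", epsilon_greedy),
                          ("softmax", softmax_policy),
                          ("max-Boltzmann", max_boltzmann)]}
print(counts)   # how often each strategy picks each action
```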

A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors

  • paper_url: http://arxiv.org/abs/2310.08287
  • repo_url: None
  • paper_authors: Olivier Laurent, Emanuel Aldea, Gianni Franchi
  • for: 这篇论文旨在探讨现代深度神经网络(DNNs)的权重分布,以便量化不确定性和robustness。
  • methods: 该论文提出了一种大规模探讨深度 bayesian neural network(BNNs)的 posterior distribution,并推广到实际视觉任务和结构。特别是,我们研究了 posterior 的最佳approximation方法,分析了 posterior 质量和不确定性量化之间的关系,探讨了模式对 posterior 的影响,并探索了可视化 posterior 的方法。
  • results: 我们发现 weight-space symmetries 是理解 posterior 的关键方面,并开发了对这种 symmetries 的深入分析。特别是,我们发现 permutation 和 scaling symmetries 对 Bayesian posterior 有重要影响,并探讨了这些 symmetries 与 L2 正则化之间的关系。此外,我们将 shortly release 一个大规模的 checkpoint 数据集,包括了千个实际模型和我们的代码,以帮助社区更好地理解 Bayesian posterior。
    Abstract The distribution of the weights of modern deep neural networks (DNNs) - crucial for uncertainty quantification and robustness - is an eminently complex object due to its extremely high dimensionality. This paper proposes one of the first large-scale explorations of the posterior distribution of deep Bayesian Neural Networks (BNNs), expanding its study to real-world vision tasks and architectures. Specifically, we investigate the optimal approach for approximating the posterior, analyze the connection between posterior quality and uncertainty quantification, delve into the impact of modes on the posterior, and explore methods for visualizing the posterior. Moreover, we uncover weight-space symmetries as a critical aspect for understanding the posterior. To this extent, we develop an in-depth assessment of the impact of both permutation and scaling symmetries that tend to obfuscate the Bayesian posterior. While the first type of transformation is known for duplicating modes, we explore the relationship between the latter and L2 regularization, challenging previous misconceptions. Finally, to help the community improve our understanding of the Bayesian posterior, we will shortly release the first large-scale checkpoint dataset, including thousands of real-world models and our codes.
    摘要 现代深度神经网络（DNN）权重的分布对于不确定性量化与鲁棒性至关重要，但由于其维度极高，它是一个极其复杂的对象。本文开展了对深度贝叶斯神经网络（BNN）后验分布的首批大规模探索之一，并将研究扩展到真实世界的视觉任务与架构。具体而言，我们研究近似后验的最优方法，分析后验质量与不确定性量化之间的联系，探讨模式（mode）对后验的影响，并探索后验的可视化方法。此外，我们发现权重空间的对称性是理解后验的关键因素，并对容易混淆贝叶斯后验的置换对称与缩放对称的影响进行了深入评估：前者以复制模式而著称，而对于后者，我们探讨了其与 L2 正则化的关系，纠正了以往的一些误解。最后，为帮助社区更好地理解贝叶斯后验，我们即将发布首个大规模的检查点数据集，其中包含数千个真实世界模型以及我们的代码。
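A tiny demonstration of the weight-space permutation symmetry discussed above: permuting the hidden units of a two-layer MLP (rows of the first layer and columns of the second, together) leaves the function, and hence the posterior density at that weight configuration, unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(16, 4)
y_before = net(x)

perm = torch.randperm(8)
with torch.no_grad():
    net[0].weight.copy_(net[0].weight[perm])     # permute hidden units' incoming weights
    net[0].bias.copy_(net[0].bias[perm])         # ... and their biases
    net[2].weight.copy_(net[2].weight[:, perm])  # ... and the outgoing weights consistently

y_after = net(x)
print(torch.allclose(y_before, y_after, atol=1e-6))   # True: same function, different weights
```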

Data driven modeling of self-similar dynamics

  • paper_url: http://arxiv.org/abs/2310.08282
  • repo_url: None
  • paper_authors: Ru-yi Tao, Ning-ning Tao, Yi-zhuang You, Jiang Zhang
  • for: 本研究旨在提出一种将自相似性作为先验知识的多尺度数据驱动建模方法，用于理解复杂系统的内在特性。
  • methods: 该框架基于多尺度神经网络，将自相似性作为先验知识，用于对自相似动力系统建模。
  • results: 在 Ising 模型上的初步测试表明，该模型能够提取尺度不变的核，并得到与理论预期一致的临界指数，有助于研究非平衡系统的临界相变。
    Abstract Multiscale modeling of complex systems is crucial for understanding their intricacies. Data-driven multiscale modeling has emerged as a promising approach to tackle challenges associated with complex systems. On the other hand, self-similarity is prevalent in complex systems, hinting that large-scale complex systems can be modeled at a reduced cost. In this paper, we introduce a multiscale neural network framework that incorporates self-similarity as prior knowledge, facilitating the modeling of self-similar dynamical systems. For deterministic dynamics, our framework can discern whether the dynamics are self-similar. For uncertain dynamics, it can compare and determine which parameter set is closer to self-similarity. The framework allows us to extract scale-invariant kernels from the dynamics for modeling at any scale. Moreover, our method can identify the power law exponents in self-similar systems. Preliminary tests on the Ising model yielded critical exponents consistent with theoretical expectations, providing valuable insights for addressing critical phase transitions in non-equilibrium systems.
    摘要 对复杂系统进行多尺度建模对于理解其复杂性至关重要，数据驱动的多尺度建模已成为应对复杂系统相关挑战的有效途径。另一方面，自相似性在复杂系统中普遍存在，这意味着大规模复杂系统可以以更低的成本建模。本文提出一种将自相似性作为先验知识的多尺度神经网络框架，以便对自相似动力系统建模。对确定性动力学，该框架能够判别动力学是否自相似；对不确定动力学，它能够比较并确定哪组参数更接近自相似。该框架使我们能够从动力学中提取尺度不变的核，从而在任意尺度上建模，并能识别自相似系统中的幂律指数。在 Ising 模型上的初步测试得到了与理论预期一致的临界指数，为处理非平衡系统中的临界相变提供了有价值的启示。

Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift

  • paper_url: http://arxiv.org/abs/2310.08237
  • repo_url: https://github.com/WangCaixing-96/Kernel_CS
  • paper_authors: Xingdong Feng, Xin He, Caixing Wang, Chao Wang, Jingnan Zhang
  • for: 协变量偏移（covariate shift）下的非参数回归与预测问题。
  • methods: 基于再生核希尔伯特空间（RKHS）的一般非参数方法。
  • results: 提出了统一的分析框架，对一类丰富的损失函数建立了精确的收敛速率，并通过合成与真实数据上的大量数值研究验证了方法的有效性。
    Abstract Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. Two types of covariate shift problems are the focus of this paper and the sharp convergence rates are established for a general loss function to provide a unified theoretical analysis, which concurs with the optimal results in literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.
    摘要 协变量偏移在实践中普遍存在，即源数据与目标数据的输入分布存在显著差异。尽管该问题在各类学习任务中具有重要的实际意义，现有方法大多只针对某些特定任务，且缺乏充分的理论与数值验证。为此，我们在再生核希尔伯特空间（RKHS）中对一般非参数方法在协变量偏移下的表现给出统一分析。我们的理论结果适用于一类丰富的损失函数，其中包含均值回归、分位数回归、基于似然的分类以及基于间隔的分类等常用方法作为特例。本文重点研究两类协变量偏移问题，并针对一般损失函数建立了精确的收敛速率，给出统一的理论分析，其结果与文献中使用平方损失时的最优结果一致。大量基于合成与真实数据的数值实验验证了理论发现，并进一步说明了所提方法的有效性。

Emergence of Latent Binary Encoding in Deep Neural Network Classifiers

  • paper_url: http://arxiv.org/abs/2310.08224
  • repo_url: None
  • paper_authors: Luigi Sbailò, Luca Ghiringhelli
  • for: 这个论文是为了研究深度神经网络分类器中的二进制编码。
  • methods: 作者在网络中引入一个线性的倒数第二层，并在训练时为其配备一个随 $\exp(\vec{x}^2)$ 增长的损失函数（其中 $\vec{x}$ 为潜在空间坐标）。
  • results: 研究发现，在训练的末期阶段，深度神经网络的潜在空间中会涌现二进制编码，它加速了向单纯形 ETF 的收敛并提高了分类精度。
    Abstract We observe the emergence of binary encoding within the latent space of deep-neural-network classifiers. Such binary encoding is induced by introducing a linear penultimate layer, which is equipped during training with a loss function that grows as $\exp(\vec{x}^2)$, where $\vec{x}$ are the coordinates in the latent space. The phenomenon we describe represents a specific instance of a well-documented occurrence known as \textit{neural collapse}, which arises in the terminal phase of training and entails the collapse of latent class means to the vertices of a simplex equiangular tight frame (ETF). We show that binary encoding accelerates convergence toward the simplex ETF and enhances classification accuracy.
    摘要 我们观察到，深度神经网络分类器的潜在空间中会涌现出二进制编码。这一现象由引入一个线性的倒数第二层诱发：训练时为该层配备一个随 $\exp(\vec{x}^2)$ 增长的损失函数，其中 $\vec{x}$ 为潜在空间中的坐标。我们所描述的现象是广为记载的"神经塌缩"的一个具体实例——它出现在训练的终末阶段，表现为潜在类别均值塌缩到单纯形等角紧框架（ETF）的顶点上。我们证明，二进制编码加速了向单纯形 ETF 的收敛，并提升了分类精度。

Conformal inference for regression on Riemannian Manifolds

  • paper_url: http://arxiv.org/abs/2310.08209
  • repo_url: None
  • paper_authors: Alejandro Cholaquidis, Fabrice Gamboa, Leonardo Moreno
  • for: 这篇论文探讨了在拟合空间中进行回归的问题,具体来说是在拟合空间中预测变量Y的问题。
  • methods: 该论文使用了传统的协形推断方法,不假设任何关于$(X, Y)$的联合分布模型,而是基于拟合空间上的抽象结构来建立预测集。
  • results: 该论文提出了一种基于拟合空间的预测集方法,并证明了该方法的 asymptotic almost sure convergence 性和效率。 通过实验和实际数据分析,论文还证明了该方法的可行性和适用性。
    Abstract Regression on manifolds, and, more broadly, statistics on manifolds, has garnered significant importance in recent years due to the vast number of applications for this type of data. Circular data is a classic example, but so is data in the space of covariance matrices, data on the Grassmannian manifold obtained as a result of principal component analysis, among many others. In this work we investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by X, lies in Euclidean space. This extends the concepts delineated in [Lei and Wasserman, 2014] to this novel context. Aligning with traditional principles in conformal inference, these prediction sets are distribution-free, indicating that no specific assumptions are imposed on the joint distribution of $(X, Y)$, and they maintain a non-parametric character. We prove the asymptotic almost sure convergence of the empirical version of these regions on the manifold to their population counterparts. The efficiency of this method is shown through a comprehensive simulation study and an analysis involving real-world data.
    摘要 流形上的回归，以及更广泛的流形上的统计，近年来因这类数据的大量应用而变得日益重要。圆周数据是经典的例子，此外还有协方差矩阵空间中的数据、由主成分分析得到的格拉斯曼流形上的数据等等。本文研究当响应变量 $Y$ 取值于流形、协变量 $X$ 位于欧氏空间时回归问题的预测集，将 [Lei and Wasserman, 2014] 中的概念推广到这一新情形。与共形推断的传统原则一致，这些预测集是分布无关的，即不对 $(X, Y)$ 的联合分布施加任何特定假设，并保持非参数特性。我们证明了这些区域的经验版本在流形上几乎必然渐近收敛到其总体对应物。通过全面的模拟研究和一个真实数据分析，我们展示了该方法的有效性。

Infinite Width Graph Neural Networks for Node Regression/ Classification

  • paper_url: http://arxiv.org/abs/2310.08176
  • repo_url: https://github.com/yCobanoglu/infinite-width-gnns
  • paper_authors: Yunus Cobanoglu
  • for: 本文研究图神经网络（GNN）在宽度（即每个全连接层的节点数）趋于无穷大时的行为。
  • methods: 针对标准图神经网络、带跳跃拼接（Skip-Concatenate）连接的图神经网络以及图注意力神经网络等多种架构，推导了对应的核与高斯过程的闭式表达。
  • results: 在多个数据集上针对转导式节点回归与分类任务进行了评估，并使用谱稀疏化方法（有效电阻）改进运行时间与内存需求；对归纳式图学习任务（图回归/分类）的推广也作了简要讨论。
    Abstract This work analyzes Graph Neural Networks, a generalization of Fully-Connected Deep Neural Nets on Graph structured data, when their width, that is the number of nodes in each fullyconnected layer is increasing to infinity. Infinite Width Neural Networks are connecting Deep Learning to Gaussian Processes and Kernels, both Machine Learning Frameworks with long traditions and extensive theoretical foundations. Gaussian Processes and Kernels have much less hyperparameters then Neural Networks and can be used for uncertainty estimation, making them more user friendly for applications. This works extends the increasing amount of research connecting Gaussian Processes and Kernels to Neural Networks. The Kernel and Gaussian Process closed forms are derived for a variety of architectures, namely the standard Graph Neural Network, the Graph Neural Network with Skip-Concatenate Connections and the Graph Attention Neural Network. All architectures are evaluated on a variety of datasets on the task of transductive Node Regression and Classification. Additionally, a Spectral Sparsification method known as Effective Resistance is used to improve runtime and memory requirements. Extending the setting to inductive graph learning tasks (Graph Regression/ Classification) is straightforward and is briefly discussed in 3.5.
    摘要 本文分析图神经网络（GNN，即全连接深度神经网络在图结构数据上的推广）在宽度（即每个全连接层的节点数）趋于无穷大时的行为。无穷宽神经网络把深度学习与高斯过程和核方法联系起来，后两者都是具有悠久传统和扎实理论基础的机器学习框架：它们的超参数远少于神经网络，并且可用于不确定性估计，因而在应用中更为友好。本文扩展了将高斯过程与核方法联系到神经网络的相关研究，针对标准图神经网络、带跳跃拼接连接的图神经网络以及图注意力神经网络，推导了相应的核与高斯过程的闭式表达。所有架构均在多个数据集上的转导式节点回归与分类任务上进行了评估。此外，还使用了一种称为有效电阻（Effective Resistance）的谱稀疏化方法来改进运行时间与内存需求。将该设定推广到归纳式图学习任务（图回归/分类）是直接的，并在 3.5 节中简要讨论。

Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders

  • paper_url: http://arxiv.org/abs/2310.08164
  • repo_url: None
  • paper_authors: Luke Marks, Amir Abdullah, Luna Mendez, Rauno Arike, Philip Torr, Fazl Barez
  • for: This paper aims to provide a method for interpreting the learned reward functions in reinforcement learning-tuned large language models (LLMs), in order to ensure alignment between the model’s behaviors and the specified objectives.
  • methods: The proposed method uses sparse autoencoders to compare the activations of a base LLM and its RLHF-tuned version, and identifies unique features that reflect the accuracy of the learned reward model.
  • results: The method provides an abstract approximation of reward integrity, and is the first application of sparse autoencoders for interpreting learned rewards and broadly inspecting reward learning in LLMs.
    Abstract Large language models (LLMs) aligned to human preferences via reinforcement learning from human feedback (RLHF) underpin many commercial applications. However, how RLHF impacts LLM internals remains opaque. We propose a novel method to interpret learned reward functions in RLHF-tuned LLMs using sparse autoencoders. Our approach trains autoencoder sets on activations from a base LLM and its RLHF-tuned version. By comparing autoencoder hidden spaces, we identify unique features that reflect the accuracy of the learned reward model. To quantify this, we construct a scenario where the tuned LLM learns token-reward mappings to maximize reward. This is the first application of sparse autoencoders for interpreting learned rewards and broadly inspecting reward learning in LLMs. Our method provides an abstract approximation of reward integrity. This presents a promising technique for ensuring alignment between specified objectives and model behaviors.
    摘要 通过人类反馈强化学习（RLHF）对齐人类偏好的大型语言模型（LLM）支撑着许多商业应用。然而，RLHF 如何影响 LLM 的内部机制仍不透明。我们提出一种新方法，利用稀疏自编码器来解释经 RLHF 微调的 LLM 所学到的奖励函数：在基础 LLM 及其 RLHF 微调版本的激活上分别训练自编码器集合，通过比较自编码器的隐藏空间，识别出能够反映所学奖励模型准确性的独特特征。为量化这一点，我们构造了一个场景，让微调后的 LLM 学习令牌-奖励映射以最大化奖励。这是首次将稀疏自编码器用于解释所学奖励并广泛审视 LLM 中的奖励学习。该方法提供了对奖励完整性的抽象近似，为确保既定目标与模型行为之间的一致性提供了一种有前景的技术。
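A generic sparse-autoencoder sketch of the kind the method relies on: an overcomplete autoencoder with an L1 penalty on hidden activations, trained on activation vectors. The random stand-in activations, sizes, and penalty weight are assumptions; comparing feature dictionaries fit on a base model and on its RLHF-tuned version is the comparison described above.

```python
import torch
import torch.nn as nn

d_model, d_hidden, l1_coef = 256, 1024, 1e-3

encoder = nn.Linear(d_model, d_hidden)
decoder = nn.Linear(d_hidden, d_model)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

activations = torch.randn(4096, d_model)     # stand-in for activations collected from one model
for step in range(200):
    idx = torch.randint(0, len(activations), (128,))
    x = activations[idx]
    h = torch.relu(encoder(x))               # sparse feature coefficients
    recon = decoder(h)
    loss = ((recon - x) ** 2).mean() + l1_coef * h.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

# decoder.weight columns act as the learned feature dictionary for this model's activations
print("reconstruction + sparsity loss:", loss.item())
```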

On Extreme Value Asymptotics of Projected Sample Covariances in High Dimensions with Applications in Finance and Convolutional Networks

  • paper_url: http://arxiv.org/abs/2310.08150
  • repo_url: None
  • paper_authors: Ansgar Steland
  • for: 利用最大值型统计量检验高维向量时间序列数据是否是在正常条件下收集的。
  • methods: 研究样本协方差矩阵若干函数的最大值型统计量，并证明其满足 Gumbel 型极值渐近性。
  • results: 应用包括仅多头的最小方差投资组合优化与特质风险的子组合分析、利用稀疏跟踪组合的 ETF 指数跟踪、用于图像分析的卷积深度学习器以及传感器阵列数据的分析。
    Abstract Maximum-type statistics of certain functions of the sample covariance matrix of high-dimensional vector time series are studied to statistically confirm or reject the null hypothesis that a data set has been collected under normal conditions. The approach generalizes the case of the maximal deviation of the sample autocovariances function from its assumed values. Within a linear time series framework it is shown that Gumbel-type extreme value asymptotics holds true. As applications we discuss long-only mimimal-variance portfolio optimization and subportfolio analysis with respect to idiosyncratic risks, ETF index tracking by sparse tracking portfolios, convolutional deep learners for image analysis and the analysis of array-of-sensors data.
    摘要 本文研究高维向量时间序列样本协方差矩阵某些函数的最大值型统计量，用以统计性地确认或否定"数据是在正常条件下收集"这一原假设。该方法推广了样本自协方差函数与其假定值之间最大偏差的情形。在线性时间序列框架下，证明了 Gumbel 型极值渐近性成立。作为应用，我们讨论了仅多头的最小方差投资组合优化及针对特质风险的子组合分析、利用稀疏跟踪组合的 ETF 指数跟踪、用于图像分析的卷积深度学习器以及传感器阵列数据的分析。

Open-Set Knowledge-Based Visual Question Answering with Inference Paths

  • paper_url: http://arxiv.org/abs/2310.08148
  • repo_url: https://github.com/JingruG/GATHER
  • paper_authors: Jingru Gan, Xinzhe Han, Shuhui Wang, Qingming Huang
  • for: Answering open-set questions with explicit reasoning paths in a knowledge-based visual question answering system.
  • methods: The proposed Graph pATH rankER (GATHER) framework, which includes graph constructing, pruning, and path-level ranking.
  • results: The model is able to perform open-set question answering across the whole knowledge base and provide explicit reasoning paths, as demonstrated through extensive experiments on real-world questions.
    Abstract Given an image and an associated textual question, the purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases. Prior KB-VQA models are usually formulated as a retriever-classifier framework, where a pre-trained retriever extracts textual or visual information from knowledge graphs and then makes a prediction among the candidates. Despite promising progress, there are two drawbacks with existing models. Firstly, modeling question-answering as multi-class classification limits the answer space to a preset corpus and lacks the ability of flexible reasoning. Secondly, the classifier merely consider "what is the answer" without "how to get the answer", which cannot ground the answer to explicit reasoning paths. In this paper, we confront the challenge of \emph{explainable open-set} KB-VQA, where the system is required to answer questions with entities at wild and retain an explainable reasoning path. To resolve the aforementioned issues, we propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity). Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process. To comprehensively evaluate our model, we reformulate the benchmark dataset OK-VQA with manually corrected entity-level annotations and release it as ConceptVQA. Extensive experiments on real-world questions demonstrate that our framework is not only able to perform open-set question answering across the whole knowledge base but provide explicit reasoning path.
    摘要 给定一张图像及其相关的文本问题,基于知识的视觉问答(KB-VQA)旨在借助外部知识库给出问题的正确答案。现有的 KB-VQA 模型通常采用"检索器-分类器"框架:预训练的检索器从知识图中提取文本或视觉信息,再在候选项之间做出预测。尽管取得了可喜进展,现有模型仍存在两个缺陷:其一,将问答建模为多类分类会把答案空间限制在预设语料之内,缺乏灵活推理的能力;其二,分类器只考虑"答案是什么"而不考虑"如何得到答案",无法将答案落实到明确的推理路径上。本文面向"可解释的开放集" KB-VQA:系统需要针对开放实体回答问题,并保留可解释的推理路径。为解决上述问题,我们提出了一种新的"检索器-排序器"范式,即 Graph pATH rankER(简称 GATHER)。它包含图构建、剪枝和路径级排序,不仅能检索出准确答案,还能给出解释推理过程的推理路径。为全面评估模型,我们对基准数据集 OK-VQA 进行了人工修正的实体级标注,并将其发布为 ConceptVQA。在真实问题上的大量实验表明,我们的框架不仅能在整个知识库范围内进行开放集问答,还能提供明确的推理路径。

Counterfactual Explanations for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.08137
  • repo_url: https://github.com/zhendong3wang/counterfactual-explanations-for-forecasting
  • paper_authors: Zhendong Wang, Ioanna Miliou, Isak Samsten, Panagiotis Papapetrou
  • for: 本研究旨在为时间序列预测提出一种能生成有意义的反事实(counterfactual)解释的新方法。
  • methods: 本研究面向深度预测模型,提出了一种基于梯度的扰动方法来生成反事实解释。
  • results: 实验结果表明,本方法在不同的深度预测模型上均能取得更高的反事实有效性和更好的数据流形贴近度。
    Abstract Among recent developments in time series forecasting methods, deep forecasting models have gained popularity as they can utilize hidden feature patterns in time series to improve forecasting performance. Nevertheless, the majority of current deep forecasting models are opaque, hence making it challenging to interpret the results. While counterfactual explanations have been extensively employed as a post-hoc approach for explaining classification models, their application to forecasting models still remains underexplored. In this paper, we formulate the novel problem of counterfactual generation for time series forecasting, and propose an algorithm, called ForecastCF, that solves the problem by applying gradient-based perturbations to the original time series. ForecastCF guides the perturbations by applying constraints to the forecasted values to obtain desired prediction outcomes. We experimentally evaluate ForecastCF using four state-of-the-art deep model architectures and compare to two baselines. Our results show that ForecastCF outperforms the baseline in terms of counterfactual validity and data manifold closeness. Overall, our findings suggest that ForecastCF can generate meaningful and relevant counterfactual explanations for various forecasting tasks.
    摘要 在近年的时间序列预测方法中,深度预测模型因能利用时间序列中的隐藏特征模式提升预测性能而广受欢迎。然而,目前大多数深度预测模型是不透明的,结果难以解释。反事实解释已被广泛用作解释分类模型的事后方法,但其在预测模型上的应用仍鲜有探索。本文提出了时间序列预测中反事实生成这一新问题,并提出了名为 ForecastCF 的算法,通过对原始时间序列施加基于梯度的扰动来求解该问题。ForecastCF 通过对预测值施加约束来引导扰动,从而获得期望的预测结果。我们在四种最先进的深度模型架构上对 ForecastCF 进行了实验评估,并与两个基线方法进行比较。结果表明,ForecastCF 在反事实有效性和数据流形贴近度方面均优于基线。总体而言,我们的发现表明 ForecastCF 能为各种预测任务生成有意义且相关的反事实解释。
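
The paper's exact objective is not spelled out in the abstract, so the sketch below only illustrates the general recipe of gradient-based counterfactual generation for a forecaster: perturb the input series until the model's forecast falls inside a desired band while staying close to the original series. The hinge-style constraint, the proximity term, and all names are assumptions, not ForecastCF's actual formulation.

```python
import torch

def forecast_counterfactual(model, x, lower, upper, steps=200, lr=0.01, lam=0.1):
    """Perturb series x so that model(x) lands inside the band [lower, upper].

    model: maps a series of shape (1, T) to a forecast of shape (1, H).
    lam:   weight of the proximity term keeping the counterfactual close to x.
    """
    x_cf = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = model(x_cf)
        # Hinge-style penalty: zero once the forecast is inside the desired band.
        constraint = torch.relu(lower - pred).sum() + torch.relu(pred - upper).sum()
        loss = constraint + lam * torch.norm(x_cf - x)
        loss.backward()
        opt.step()
    return x_cf.detach()
```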

Core-sets for Fair and Diverse Data Summarization

  • paper_url: http://arxiv.org/abs/2310.08122
  • repo_url: https://github.com/microsoft/coresets-fair-diverse
  • paper_authors: Sepideh Mahabadi, Stojan Trajanovski
  • for: 本文研究核心集构造算法,用于实现多元化最大化并且满足公平/分区约束。给定一个度量空间中的点集 $P$,分成 $m$ 个组,并给定 $k_1, \ldots, k_m$,目标是从每个组中选择 $k_i$ 个点,使得总体多元化最大化。
  • methods: 本文考虑了两种自然多元度度量:点对之间距离和最近邻居距离,并提供了改进的核心集构造算法。具体来说,我们提供了对sum-of-pairwise distances的首个常量因子核心集,其大小独立于数据集大小和尺度比。其次,我们提供了对sum-of-nearest-neighbor distances的首个核心集。
  • results: 我们运行了多个实验,证明我们的核心集方法的效果。具体来说,我们应用了这种核心集方法,解决了一个实际应用中的带时效性的消息摘要任务:摘要中较新的消息应当多于较旧的消息。我们实现了100倍的速度提升,而多元化仅损失了几个百分点。此外,我们的方法可以在流式设定中改善算法的空间使用。
    Abstract We study core-set construction algorithms for the task of Diversity Maximization under fairness/partition constraint. Given a set of points $P$ in a metric space partitioned into $m$ groups, and given $k_1,\ldots,k_m$, the goal of this problem is to pick $k_i$ points from each group $i$ such that the overall diversity of the $k=\sum_i k_i$ picked points is maximized. We consider two natural diversity measures: sum-of-pairwise distances and sum-of-nearest-neighbor distances, and show improved core-set construction algorithms with respect to these measures. More precisely, we show the first constant factor core-set w.r.t. sum-of-pairwise distances whose size is independent of the size of the dataset and the aspect ratio. Second, we show the first core-set w.r.t. the sum-of-nearest-neighbor distances. Finally, we run several experiments showing the effectiveness of our core-set approach. In particular, we apply constrained diversity maximization to summarize a set of timed messages that takes into account the messages' recency. Specifically, the summary should include more recent messages compared to older ones. This is a real task in one of the largest communication platforms, affecting the experience of hundreds of millions daily active users. By utilizing our core-set method for this task, we achieve a 100x speed-up while losing the diversity by only a few percent. Moreover, our approach allows us to improve the space usage of the algorithm in the streaming setting.
    摘要 我们研究在公平/分区约束下实现多样性最大化的核心集构造算法。给定度量空间中被划分为 $m$ 个组的点集 $P$,以及 $k_1, \ldots, k_m$,该问题的目标是从每个组 $i$ 中选取 $k_i$ 个点,使所选 $k=\sum_i k_i$ 个点的总体多样性最大化。我们考虑两种自然的多样性度量:点对距离之和与最近邻距离之和,并针对这两种度量给出改进的核心集构造算法。更准确地说,我们给出了第一个关于点对距离之和的常数因子核心集,其大小与数据集规模和长宽比(aspect ratio)无关;其次,我们给出了第一个关于最近邻距离之和的核心集。最后,我们通过多组实验验证了核心集方法的有效性。特别地,我们将带约束的多样性最大化应用于带时间戳的消息摘要任务,该任务需要考虑消息的新近程度:摘要中较新的消息应当多于较旧的消息。这是某大型通信平台上的真实任务,影响着数亿日活跃用户的体验。利用核心集方法求解该任务,我们在多样性仅损失几个百分点的情况下实现了 100 倍的加速;此外,该方法还能改善流式设定下算法的空间开销。
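
The core-set constructions themselves are not described in the abstract. For orientation, the sketch below shows the plain greedy baseline for fair diversity maximization (sum-of-pairwise distances with per-group quotas) that such a core-set would typically be fed into; it is an illustrative baseline, not the paper's algorithm, and it assumes the quotas do not exceed the group sizes.

```python
import numpy as np

def greedy_fair_diverse(points, groups, quotas):
    """Greedily pick quotas[g] points per group g, maximizing sum-of-pairwise distances."""
    remaining = dict(quotas)
    selected, candidates = [], list(range(len(points)))
    total = sum(quotas.values())
    while len(selected) < total:
        best, best_gain = None, -1.0
        for i in candidates:
            if remaining.get(groups[i], 0) == 0:
                continue  # this group's quota is already filled
            gain = sum(np.linalg.norm(points[i] - points[j]) for j in selected)
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        remaining[groups[best]] -= 1
        candidates.remove(best)
    return selected
```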

Overview of Physics-Informed Machine Learning Inversion of Geophysical Data

  • paper_url: http://arxiv.org/abs/2310.08109
  • repo_url: None
  • paper_authors: Gerard T. Schuster, Shihang Feng
  • for: 本文讨论物理信息机器学习(PIML)算法在地球物理数据反演中的应用。
  • methods: 本文综述四种不同的 PIML 算法,每种算法对应一组特定的权重与神经网络操作,并通过联合目标函数(式 \ref{PIML.eq120})最小化观测数据与预测数据之间的差异。
  • results: 本文指出 PIML 相比标准全波形反演(FWI)的潜在优势,包括更容易避免局部极小值以及可对反演算子进行局部训练,但同时指出 PIML 的有效性依赖于测试数据与训练数据之间的相似性。
    Abstract We review four types of algorithms for physics-informed machine learning (PIML) inversion of geophysical data. The unifying equation is given by the joint objective function $\epsilon$: \begin{eqnarray} \epsilon^{||-PIML}&=&\lambda_1 \overbrace{||{\bf W}^{ML}({\bf H}_{\bf w} {\bf d}^{obs}-{\bf m})||^2}^{NN} + \lambda_2 \overbrace{||{\bf W}^{FWI}({\bf L} {\bf m}-{\bf d}^{obs})||^2}^{FWI} ~+ \nonumber\\ \nonumber\\ && + ~~Regularizer, \label{PIML.eq120} \end{eqnarray}where the optimal model ${\bf m}^*$ and weights $\bf w^*$ minimize $\epsilon$. Here, The matrix weights are given by the boldface symbol $\bf W$, and full waveform inversion (FWI) is typically computed using a finite-difference solution of the wave equation, where $\bf L$ represents the forward modeling operation of the wave equation as a function of the model $\bf m$. Also, a fully-connected neural network (NN) is used to compute the model ${\bf H_w}{\bf d}^{obs} \approx \bf m$ from the observed input data ${\bf d}^{obs}$. The selection of weights $\lambda_i$ and the NN operations determine one of four different PIML algorithms. PIML offers potential advantages over standard FWI through its enhanced ability to avoid local minima and the option to locally train the inversion operator, minimizing the requirement for extensive training data for global applicability. However, the effectiveness of PIML relies on the similarity between the test and trained data. Nevertheless, a possible strategy to overcome this limitation involves initial pretraining of a PIML architecture with data from a broader region, followed by fine-tuning for specific data-a method reminiscent of the way large language models are pretrained and adapted for various tasks.
    摘要 我们综述四类用于地球物理数据反演的物理信息机器学习(PIML)算法,其统一形式由联合目标函数 $\epsilon$ 给出:$$\epsilon^{||-PIML} = \lambda_1 \overbrace{\left\lVert{\bf W}^{ML}({\bf H}_{\bf w} {\bf d}^{obs} - {\bf m})\right\rVert^2}^{NN} + \lambda_2 \overbrace{\left\lVert{\bf W}^{FWI}({\bf L} {\bf m} - {\bf d}^{obs})\right\rVert^2}^{FWI} + \text{Regularizer}$$其中最优模型 ${\bf m}^*$ 与权重 ${\bf w}^*$ 使 $\epsilon$ 取最小值;粗体符号 $\bf W$ 表示矩阵权重;全波形反演(FWI)通常通过波动方程的有限差分解来计算,$\bf L$ 表示关于模型 $\bf m$ 的波动方程正演算子;全连接神经网络(NN)则用于由观测数据 ${\bf d}^{obs}$ 计算模型 ${\bf H_w}{\bf d}^{obs} \approx \bf m$。权重 $\lambda_i$ 与 NN 操作的选取决定了四种 PIML 算法中的一种。相比标准 FWI,PIML 的潜在优势在于更容易避免局部极小值,并可对反演算子进行局部训练,从而降低对大规模训练数据的全局性需求;但 PIML 的有效性取决于测试数据与训练数据之间的相似性。克服这一局限的一种可行策略是:先用更大区域的数据对 PIML 架构进行预训练,再针对特定数据进行微调——这与大型语言模型先预训练、再适配各种任务的方式如出一辙。
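
Because the abstract states the joint objective explicitly, Equation \ref{PIML.eq120} is straightforward to transcribe. In the sketch below the forward operator, the network, and the weighting matrices are placeholders standing in for the paper's components; the choice of lam1, lam2 and of the network operations is what distinguishes the four PIML variants.

```python
import numpy as np

def piml_objective(m, w, d_obs, forward_L, network_H, W_ml, W_fwi,
                   lam1=1.0, lam2=1.0, reg=lambda m: 0.0):
    """Joint PIML objective epsilon from the abstract (illustrative transcription).

    forward_L(m):        physics forward modeling L m (e.g., finite-difference wave solver).
    network_H(w, d_obs): neural-network prediction of the model from observed data.
    W_ml, W_fwi:         weighting matrices; reg: optional regularizer on the model m.
    """
    nn_term = np.linalg.norm(W_ml @ (network_H(w, d_obs) - m)) ** 2
    fwi_term = np.linalg.norm(W_fwi @ (forward_L(m) - d_obs)) ** 2
    return lam1 * nn_term + lam2 * fwi_term + reg(m)
```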

Generative Intrinsic Optimization: Intrisic Control with Model Learning

  • paper_url: http://arxiv.org/abs/2310.08100
  • repo_url: None
  • paper_authors: Jianfei Ma
  • for: 这个论文主要为了研究如何通过 incorporating intrinsic motivation with reward maximization来提高机器人的决策效率和适应能力。
  • methods: 这个论文使用了变量方法来联合学习必要的量和动力学模型,并将其纳入政策迭代算法,以确保优化策略。
  • results: 这个论文通过 теоретиче分析和实验 validate了其方法的可行性和效果,并开启了在机器人决策中利用内生控制和模型学习来提高样本效率和环境不确定性的潜在应用前景。
    Abstract Future sequence represents the outcome after executing the action into the environment. When driven by the information-theoretic concept of mutual information, it seeks maximally informative consequences. Explicit outcomes may vary across state, return, or trajectory serving different purposes such as credit assignment or imitation learning. However, the inherent nature of incorporating intrinsic motivation with reward maximization is often neglected. In this work, we propose a variational approach to jointly learn the necessary quantity for estimating the mutual information and the dynamics model, providing a general framework for incorporating different forms of outcomes of interest. Integrated into a policy iteration scheme, our approach guarantees convergence to the optimal policy. While we mainly focus on theoretical analysis, our approach opens the possibilities of leveraging intrinsic control with model learning to enhance sample efficiency and incorporate uncertainty of the environment into decision-making.
    摘要 未来序列表示将动作施加到环境后所产生的结果。当以互信息这一信息论概念为驱动时,它寻求的是信息量最大的后果。显式的结果可以是状态、回报或轨迹,分别服务于信用分配或模仿学习等不同目的。然而,将内在动机与回报最大化相结合这一本质问题却常被忽视。在这项工作中,我们提出一种变分方法,联合学习估计互信息所需的量与动力学模型,为纳入不同形式的目标结果提供了一个通用框架。将该方法嵌入策略迭代框架后,可保证收敛到最优策略。虽然我们主要侧重理论分析,但该方法为借助内在控制与模型学习来提高样本效率、并将环境不确定性纳入决策打开了可能性。

ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets

  • paper_url: http://arxiv.org/abs/2310.08096
  • repo_url: None
  • paper_authors: Tobias Schimanski, Julia Bingler, Camilla Hyslop, Mathias Kraus, Markus Leippold
  • for: 本研究的目的是帮助公共和私人行为者评估各种机构的可持续发展承诺。
  • methods: 本研究提出了一种新的自动检测方法,包括三个步骤:首先,我们提供了一个包含 3.5K 个文本样本的专家标注数据集;其次,我们训练并发布了自然语言分类器 ClimateBERT-NetZero,用于检测文本中是否包含净零或减排目标;最后,我们通过两个使用案例展示其分析潜力,包括与传统问答(Q&A)模型结合以分析净零和减排目标所体现的雄心,以及基于季度财报电话会议记录考察沟通模式随时间的演变。
  • results: 我们的实验结果表明,ClimateBERT-NetZero 模型可以帮助大规模地自动检测和分析净零与减排目标,并从中提取有用的信息。
    Abstract Public and private actors struggle to assess the vast amounts of information about sustainability commitments made by various institutions. To address this problem, we create a novel tool for automatically detecting corporate, national, and regional net zero and reduction targets in three steps. First, we introduce an expert-annotated data set with 3.5K text samples. Second, we train and release ClimateBERT-NetZero, a natural language classifier to detect whether a text contains a net zero or reduction target. Third, we showcase its analysis potential with two use cases: We first demonstrate how ClimateBERT-NetZero can be combined with conventional question-answering (Q&A) models to analyze the ambitions displayed in net zero and reduction targets. Furthermore, we employ the ClimateBERT-NetZero model on quarterly earning call transcripts and outline how communication patterns evolve over time. Our experiments demonstrate promising pathways for extracting and analyzing net zero and emission reduction targets at scale.
    摘要 公共和私人行业努力评估各种机构的可持续发展承诺,但面临巨大的信息量和识别挑战。为解决这问题,我们创建了一种自动检测公司、国家和地区的减少和零排放目标的新工具。我们的方法包括以下三步:第一步,我们提供了一个专家标注的数据集,包含3.5K个文本样本。第二步,我们训练并发布了一个基于自然语言的分类器,用于判断文本是否包含减少或零排放目标。第三步,我们展示了这种分类器的分析潜力,通过两个使用情况:首先,我们将 ClimateBERT-NetZero 与传统的问答(Q&A)模型结合,分析减少和零排放目标中表达的目标。其次,我们使用 ClimateBERT-NetZero 模型对每季财务会议笔记进行分析,并详细描述了时间序列中的沟通趋势。我们的实验结果表明,这种方法可以有效地检测和分析减少和零排放目标。
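
A minimal sketch of how such a released classifier would typically be called is shown below. The checkpoint identifier is hypothetical (substitute whichever name the authors publish), and the label set is assumed to separate net-zero/reduction-target sentences from other text.

```python
from transformers import pipeline

# Hypothetical checkpoint name -- replace with the identifier released by the authors.
classifier = pipeline("text-classification", model="climatebert/netzero-reduction-detector")

texts = [
    "We commit to net zero greenhouse gas emissions across our operations by 2040.",
    "Quarterly revenue grew 8% year over year.",
]
for text, pred in zip(texts, classifier(texts)):
    print(f"{pred['label']:>15}  {pred['score']:.2f}  |  {text}")
```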

Dealing with zero-inflated data: achieving SOTA with a two-fold machine learning approach

  • paper_url: http://arxiv.org/abs/2310.08088
  • repo_url: None
  • paper_authors: Jože M. Rožanec, Gašper Petelin, João Costa, Blaž Bertalanič, Gregor Cerar, Marko Guček, Gregor Papa, Dunja Mladenić
  • for: 本研究旨在应对 zero-inflated 数据,提高模型的预测性能。
  • methods: 本研究使用了层次模型,将 zero-inflated 数据分解为两个部分,并通过统计学方法来捕捉 zero 的影响。
  • results: 本研究在家用电器分类和机场接驳车需求预测两个实际应用中均取得了优秀的结果。例如,在家用电器分类中,与传统方法相比,精确率、召回率、F1 与 AUC ROC 的加权平均分别提升了 27%、34%、49% 和 27%;在机场接驳车需求预测中,两阶段(two-fold)模型在所有情形下表现最佳,且与其他模型的差异具有统计显著性。
    Abstract In many cases, a machine learning model must learn to correctly predict a few data points with particular values of interest in a broader range of data where many target values are zero. Zero-inflated data can be found in diverse scenarios, such as lumpy and intermittent demands, power consumption for home appliances being turned on and off, impurities measurement in distillation processes, and even airport shuttle demand prediction. The presence of zeroes affects the models' learning and may result in poor performance. Furthermore, zeroes also distort the metrics used to compute the model's prediction quality. This paper showcases two real-world use cases (home appliances classification and airport shuttle demand prediction) where a hierarchical model applied in the context of zero-inflated data leads to excellent results. In particular, for home appliances classification, the weighted average of Precision, Recall, F1, and AUC ROC was increased by 27%, 34%, 49%, and 27%, respectively. Furthermore, it is estimated that the proposed approach is also four times more energy efficient than the SOTA approach against which it was compared to. Two-fold models performed best in all cases when predicting airport shuttle demand, and the difference against other models has been proven to be statistically significant.
    摘要 在许多情况下,机器学习模型需要在大量目标值为零的数据中,正确预测少数取特定关注值的数据点。零膨胀(zero-inflated)数据出现在多种场景中,例如突发性和间歇性的需求、家用电器开关引起的功耗变化、蒸馏过程中的杂质测量,乃至机场接驳车需求预测。零值的存在会影响模型的学习,可能导致性能变差;此外,零值还会扭曲用于衡量模型预测质量的指标。本文展示了两个实际应用案例(家用电器分类与机场接驳车需求预测),在零膨胀数据的背景下采用分层模型取得了出色的结果。具体而言,在家用电器分类中,精确率、召回率、F1 与 AUC ROC 的加权平均分别提升了 27%、34%、49% 和 27%;据估计,所提方法的能效约为所对比的 SOTA 方法的四倍。在预测机场接驳车需求时,两阶段模型在所有情形下表现最佳,且与其他模型的差异已被证明具有统计显著性。
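
The abstract does not name the base learners, so the sketch below only illustrates the generic two-fold recipe for zero-inflated targets: a classifier first decides whether the target is zero, and a regressor fitted on the non-zero subset predicts the magnitude. The random-forest components are placeholder choices, not the models used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

class TwoFoldZeroInflatedModel:
    """Two-stage model for zero-inflated regression targets (illustrative sketch)."""

    def __init__(self):
        self.clf = RandomForestClassifier(n_estimators=200, random_state=0)
        self.reg = RandomForestRegressor(n_estimators=200, random_state=0)

    def fit(self, X, y):
        nonzero = y != 0
        self.clf.fit(X, nonzero.astype(int))      # stage 1: zero vs. non-zero
        if nonzero.any():
            self.reg.fit(X[nonzero], y[nonzero])  # stage 2: magnitude on non-zeros only
        return self

    def predict(self, X):
        is_nonzero = self.clf.predict(X).astype(bool)
        out = np.zeros(len(X))
        if is_nonzero.any():
            out[is_nonzero] = self.reg.predict(X[is_nonzero])
        return out
```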

A Carbon Tracking Model for Federated Learning: Impact of Quantization and Sparsification

  • paper_url: http://arxiv.org/abs/2310.08087
  • repo_url: None
  • paper_authors: Luca Barbieri, Stefano Savazzi, Sanaz Kianoush, Monica Nicoli, Luigi Serio
  • for: 这篇论文是为了研究 Federated Learning (FL) 方法如何减少数据存储和计算复杂性,以提高环境可持续性。
  • methods: 这篇论文提出了一个框架,用于实时监测 FL 系统的能源和碳足迹影响。这个框架包括一个碳追踪工具,用于评估不同的 FL 策略。
  • results: 研究发现,在通信能效较低(即 < 25 Kbit/Joule)的场景下,采用共识驱动的 FL 实现可以最大限度地减少碳排放。此外,研究还发现,量化与稀疏化操作能够在学习性能与能耗之间取得平衡,使 FL 设计更加可持续。
    Abstract Federated Learning (FL) methods adopt efficient communication technologies to distribute machine learning tasks across edge devices, reducing the overhead in terms of data storage and computational complexity compared to centralized solutions. Rather than moving large data volumes from producers (sensors, machines) to energy-hungry data centers, raising environmental concerns due to resource demands, FL provides an alternative solution to mitigate the energy demands of several learning tasks while enabling new Artificial Intelligence of Things (AIoT) applications. This paper proposes a framework for real-time monitoring of the energy and carbon footprint impacts of FL systems. The carbon tracking tool is evaluated for consensus (fully decentralized) and classical FL policies. For the first time, we present a quantitative evaluation of different computationally and communication efficient FL methods from the perspectives of energy consumption and carbon equivalent emissions, suggesting also general guidelines for energy-efficient design. Results indicate that consensus-driven FL implementations should be preferred for limiting carbon emissions when the energy efficiency of the communication is low (i.e., < 25 Kbit/Joule). Besides, quantization and sparsification operations are shown to strike a balance between learning performances and energy consumption, leading to sustainable FL designs.
    摘要 联邦学习(FL)方法借助高效的通信技术将机器学习任务分布到边缘设备上,相比集中式方案降低了数据存储与计算复杂度方面的开销。FL 无需把大量数据从数据生产端(传感器、机器)搬运到高耗能的数据中心(其资源需求引发环境方面的担忧),从而为多种学习任务降低能耗,并支撑新的物联网人工智能(AIoT)应用。本文提出了一个用于实时监测 FL 系统能耗与碳足迹影响的框架,并针对共识式(完全去中心化)与经典 FL 策略对碳追踪工具进行了评估。我们首次从能耗与等效碳排放的角度对多种计算与通信高效的 FL 方法进行定量评估,并给出节能设计的一般性指导。结果表明,当通信能效较低(即 < 25 Kbit/Joule)时,应优先采用共识驱动的 FL 实现以限制碳排放;量化与稀疏化操作则能在学习性能与能耗之间取得平衡,从而实现可持续的 FL 设计。

Tight Time-Space Lower Bounds for Constant-Pass Learning

  • paper_url: http://arxiv.org/abs/2310.08070
  • repo_url: None
  • paper_authors: Xin Lyu, Avishay Tal, Hongxun Wu, Junzhao Yang
  • for: 这篇论文的目的是证明任何奇偶(parity)学习算法都需要二次量级的内存或指数级数量的样本。
  • methods: 这篇论文使用了多遍(multi-pass)模型,允许学习者对样本流进行多次访问。
  • results: 这篇论文证明了在 $q$ 遍模型下,任何奇偶学习算法都需要 $\Omega(n^2)$ 的内存或至少 $2^{\Omega(n)}$ 个样本。
    Abstract In his breakthrough paper, Raz showed that any parity learning algorithm requires either quadratic memory or an exponential number of samples [FOCS'16, JACM'19]. A line of work that followed extended this result to a large class of learning problems. Until recently, all these results considered learning in the streaming model, where each sample is drawn independently, and the learner is allowed a single pass over the stream of samples. Garg, Raz, and Tal [CCC'19] considered a stronger model, allowing multiple passes over the stream. In the $2$-pass model, they showed that learning parities of size $n$ requires either a memory of size $n^{1.5}$ or at least $2^{\sqrt{n}}$ samples. (Their result also generalizes to other learning problems.) In this work, for any constant $q$, we prove tight memory-sample lower bounds for any parity learning algorithm that makes $q$ passes over the stream of samples. We show that such a learner requires either $\Omega(n^{2})$ memory size or at least $2^{\Omega(n)}$ samples. Beyond establishing a tight lower bound, this is the first non-trivial lower bound for $q$-pass learning for any $q\ge 3$. Similar to prior work, our results extend to any learning problem with many nearly-orthogonal concepts. We complement the lower bound with an upper bound, showing that parity learning with $q$ passes can be done efficiently with $O(n^2/\log q)$ memory.
    摘要 在其突破性论文中,Raz 证明了任何奇偶(parity)学习算法都需要二次量级的内存或指数级数量的样本(FOCS'16, JACM'19)。随后的一系列工作将这一结论推广到了一大类学习问题。直到最近,所有这些结果都是在流式模型下考虑学习:每个样本独立采样,且学习者只允许对样本流进行单次遍历。Garg、Raz 和 Tal(CCC'19)考虑了允许对样本流多次遍历的更强模型,并在 $2$ 遍模型中证明:学习规模为 $n$ 的奇偶函数需要 $n^{1.5}$ 量级的内存或至少 $2^{\sqrt{n}}$ 个样本(该结果同样可推广到其他学习问题)。在本工作中,对任意常数 $q$,我们证明了对样本流进行 $q$ 次遍历的任何奇偶学习算法的紧的内存-样本下界:这样的学习者需要 $\Omega(n^{2})$ 的内存或至少 $2^{\Omega(n)}$ 个样本。除给出紧下界外,这也是对任意 $q\ge 3$ 的 $q$ 遍学习的第一个非平凡下界。与先前工作类似,我们的结果可推广到任何具有大量近似正交概念的学习问题。我们还给出了相应的上界:$q$ 遍的奇偶学习可以用 $O(n^2/\log q)$ 的内存高效完成。

ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking

  • paper_url: http://arxiv.org/abs/2310.08061
  • repo_url: None
  • paper_authors: Yiqiang Yi, Xu Wan, Yatao Bian, Le Ou-Yang, Peilin Zhao
  • for: 预测蛋白质与配体之间的对接是药物发现中的关键且具有挑战性的任务。传统的对接方法主要依赖打分函数,而基于深度学习的对接方法通常忽略蛋白质与配体的 3D 空间信息以及配体的图级特征,这限制了它们的性能。
  • methods: 我们提出了一种用于蛋白质-配体对接构象预测的等变 Transformer 神经网络,包括通过特征处理融合配体的图级特征、利用所提出的 TAMformer 模块学习配体和蛋白质的表示,以及基于预测距离矩阵的迭代优化以生成高精度的配体构象(pose)。
  • results: 我们在真实数据集上进行了实验,结果表明该模型能够达到最先进(state-of-the-art)的性能。
    Abstract Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep learning-based docking approaches usually neglect the 3D spatial information of proteins and ligands, as well as the graph-level features of ligands, which limits their performance. To address these limitations, we propose an equivariant transformer neural network for protein-ligand docking pose prediction. Our approach involves the fusion of ligand graph-level features by feature processing, followed by the learning of ligand and protein representations using our proposed TAMformer module. Additionally, we employ an iterative optimization approach based on the predicted distance matrix to generate refined ligand poses. The experimental results on real datasets show that our model can achieve state-of-the-art performance.
    摘要 预测蛋白质与配体之间的对接是药物发现中的关键且具有挑战性的任务。传统的对接方法主要基于打分函数,而基于深度学习的对接方法通常忽略蛋白质与配体的 3D 空间信息以及配体的图级特征,这限制了其性能。为解决这些限制,我们提出一种用于蛋白质-配体对接构象预测的等变 Transformer 神经网络。我们的方法先通过特征处理融合配体的图级特征,再利用所提出的 TAMformer 模块学习配体和蛋白质的表示;此外,我们采用基于预测距离矩阵的迭代优化方法来生成更精细的配体构象。在真实数据集上的实验结果表明,我们的模型可以达到最先进的性能。

LGL-BCI: A Lightweight Geometric Learning Framework for Motor Imagery-Based Brain-Computer Interfaces

  • paper_url: http://arxiv.org/abs/2310.08051
  • repo_url: None
  • paper_authors: Jianchao Lu, Yuzhe Tian, Yang Zhang, Jiaqi Ge, Quan Z. Sheng, Xi Zheng
  • for: 这个研究旨在提高电enzephalogram (EEG)-based Motor Imagery (MI) 任务的精度和效率,并且探索geometry deep learning的应用在脑机器接口 (BCI) 领域。
  • methods: 本研究使用Geometric Deep Learning Framework for EEG processing in non-Euclidean metric spaces, particularly the Symmetric Positive Definite (SPD) Manifold space, 并且提出了一个EEG通道选择解决方案,通过将SPD矩阵维度减少,以提高推断速度。
  • results: 实验结果显示LGL-BCI的精度和效率明显超过现有解决方案($82.54%$ vs. $62.22%$),并且具有较少的参数(64.9M)。
    Abstract Brain-Computer Interfaces (BCIs) are a groundbreaking technology for interacting with external devices using brain signals. Despite advancements, electroencephalogram (EEG)-based Motor Imagery (MI) tasks face challenges like amplitude and phase variability, and complex spatial correlations, with a need for smaller model size and faster inference. This study introduces the LGL-BCI framework, employing a Geometric Deep Learning Framework for EEG processing in non-Euclidean metric spaces, particularly the Symmetric Positive Definite (SPD) Manifold space. LGL-BCI offers robust EEG data representation and captures spatial correlations. We propose an EEG channel selection solution via a feature decomposition algorithm to reduce SPD matrix dimensionality, with a lossless transformation boosting inference speed. Extensive experiments show LGL-BCI's superior accuracy and efficiency compared to current solutions, highlighting geometric deep learning's potential in MI-BCI applications. The efficiency, assessed on two public EEG datasets and two real-world EEG devices, significantly outperforms the state-of-the-art solution in accuracy ($82.54\%$ versus $62.22\%$) with fewer parameters (64.9M compared to 183.7M).
    摘要 脑机接口(BCI)是一种利用脑信号与外部设备交互的开创性技术。尽管已有诸多进展,基于 EEG 的运动想象(MI)任务仍面临幅度与相位变化大、空间相关性复杂等挑战,并且需要更小的模型规模和更快的推理速度。本研究提出 LGL-BCI 框架,采用几何深度学习在非欧几里得度量空间(特别是对称正定(SPD)矩阵流形)中处理 EEG 数据。LGL-BCI 能够稳健地表示 EEG 数据并捕捉其空间相关性。我们还提出一种基于特征分解算法的 EEG 通道选择方案,以降低 SPD 矩阵的维度,并通过无损变换提升推理速度。大量实验表明,LGL-BCI 在精度和效率上均优于现有方案,凸显了几何深度学习在 MI-BCI 应用中的潜力。在两个公开 EEG 数据集和两台真实 EEG 设备上的评估显示,其精度显著超过最先进方案($82.54\%$ 对 $62.22\%$),且参数量更少(64.9M 对 183.7M)。
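
Channel selection and the attention model are beyond a short sketch, but the SPD-manifold representation the framework operates on is easy to illustrate: each EEG trial is mapped to a regularized spatial covariance matrix, and trials are compared on the SPD manifold. The log-Euclidean metric below is one common choice and is used here only for illustration.

```python
import numpy as np
from scipy.linalg import logm

def spd_covariance(trial, eps=1e-6):
    """Regularized spatial covariance of one EEG trial (channels x samples) -> SPD matrix."""
    c = np.cov(trial)
    return c + eps * np.eye(c.shape[0])

def log_euclidean_distance(a, b):
    """Distance between two SPD matrices under the log-Euclidean metric."""
    return np.linalg.norm(logm(a) - logm(b), ord="fro")

# Toy example: two random trials with 8 channels and 256 samples each.
rng = np.random.default_rng(0)
t1, t2 = rng.standard_normal((8, 256)), rng.standard_normal((8, 256))
print(log_euclidean_distance(spd_covariance(t1), spd_covariance(t2)))
```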

Exploring the Relationship Between Model Architecture and In-Context Learning Ability

  • paper_url: http://arxiv.org/abs/2310.08049
  • repo_url: https://github.com/ivnle/synth-icl
  • paper_authors: Ivan Lee, Nan Jiang, Taylor Berg-Kirkpatrick
  • for: 本研究探讨模型架构与上下文学习(in-context learning)能力之间的关系。
  • methods: 我们在一系列合成的上下文学习任务上测试了十五种模型架构,涵盖循环神经网络、卷积神经网络、Transformer 以及新兴的注意力替代机制等多种范式。我们发现所有被考察的架构都能在一定条件下进行上下文学习,但当任务复杂度增加时,当代架构表现最佳。
  • results: 追加实验表明,不同架构对超参数设置的敏感度各异,训练动态也各不相同。此外,我们发现若干新兴的注意力替代机制在上下文学习中比 Transformer 更加稳健,这为将上下文学习扩展到更多的上下文示例打开了未来的可能性。
    Abstract What is the relationship between model architecture and the ability to perform in-context learning? In this empirical study, we take the first steps towards answering this question. In particular, we evaluate fifteen model architectures across a suite of synthetic in-context learning tasks. The selected architectures represent a broad range of paradigms, including recurrent and convolution-based neural networks, transformers, and emerging attention alternatives. We discover that all considered architectures can perform in-context learning under certain conditions. However, contemporary architectures are found to be the best performing, especially as task complexity grows. Additionally, our follow-up experiments delve into various factors that influence in-context learning. We observe varied sensitivities among architectures with respect to hyperparameter settings. Our study of training dynamics reveals that certain architectures exhibit a smooth, progressive learning trajectory, while others demonstrate periods of stagnation followed by abrupt mastery of the task. Finally, and somewhat surprisingly, we find that several emerging attention alternatives are more robust in-context learners than transformers; since such approaches have constant-sized memory footprints at inference time, this result opens the future possibility of scaling up in-context learning to vastly larger numbers of in-context examples.
    摘要 模型架构与上下文学习(in-context learning)能力之间有什么关系?本研究对这一问题进行了初步的实证探索。我们在一系列合成的上下文学习任务上评估了十五种模型架构,涵盖循环神经网络、卷积神经网络、Transformer 以及新兴的注意力替代机制等多种范式。我们发现,所有被考察的架构在一定条件下都能进行上下文学习,但当任务复杂度提高时,当代架构表现最佳。后续实验进一步考察了影响上下文学习的各种因素:不同架构对超参数设置的敏感度各异;训练动态分析显示,一些架构呈现平滑、渐进的学习轨迹,另一些则会经历停滞期后突然掌握任务。最后,有些出人意料的是,若干新兴的注意力替代机制比 Transformer 更稳健地完成上下文学习;由于这类方法在推理时的内存占用为常数规模,这一结果为把上下文学习扩展到远多于当前数量的上下文示例打开了可能性。

SEE-OoD: Supervised Exploration For Enhanced Out-of-Distribution Detection

  • paper_url: http://arxiv.org/abs/2310.08040
  • repo_url: None
  • paper_authors: Xiaoyang Song, Wenbo Sun, Maher Nouiehed, Raed Al Kontar, Judy Jin
  • for: 提高Out-of-Distribution(OoD)检测精度
  • methods: 基于 Wasserstein 分数的生成对抗训练方案
  • results: 在多个计算机视觉数据集上表现出超过现有技术,并且在未看到OoD数据时表现出更好的普适性
    Abstract Current techniques for Out-of-Distribution (OoD) detection predominantly rely on quantifying predictive uncertainty and incorporating model regularization during the training phase, using either real or synthetic OoD samples. However, methods that utilize real OoD samples lack exploration and are prone to overfit the OoD samples at hand. Whereas synthetic samples are often generated based on features extracted from training data, rendering them less effective when the training and OoD data are highly overlapped in the feature space. In this work, we propose a Wasserstein-score-based generative adversarial training scheme to enhance OoD detection accuracy, which, for the first time, performs data augmentation and exploration simultaneously under the supervision of limited OoD samples. Specifically, the generator explores OoD spaces and generates synthetic OoD samples using feedback from the discriminator, while the discriminator exploits both the observed and synthesized samples for OoD detection using a predefined Wasserstein score. We provide theoretical guarantees that the optimal solutions of our generative scheme are statistically achievable through adversarial training in empirical settings. We then demonstrate that the proposed method outperforms state-of-the-art techniques on various computer vision datasets and exhibits superior generalizability to unseen OoD data.
    摘要 现有的分布外(Out-of-Distribution, OoD)检测技术主要依赖量化预测不确定性,并在训练阶段利用真实或合成的 OoD 样本进行模型正则化。然而,使用真实 OoD 样本的方法缺乏探索,容易过拟合手头的 OoD 样本;而合成样本往往基于训练数据提取的特征生成,当训练数据与 OoD 数据在特征空间高度重叠时,其效果会大打折扣。在这项工作中,我们提出一种基于 Wasserstein 分数的生成对抗训练方案来提升 OoD 检测精度,首次在有限 OoD 样本的监督下同时进行数据增广与探索。具体而言,生成器借助判别器的反馈探索 OoD 空间并生成合成 OoD 样本,而判别器则同时利用观测到的与生成的样本,基于预定义的 Wasserstein 分数进行 OoD 检测。我们提供了理论保证,表明该生成方案的最优解在经验设定下可通过对抗训练统计地达到。随后的实验证明,所提方法在多种计算机视觉数据集上优于现有最先进技术,并对未见过的 OoD 数据表现出更好的泛化能力。

ZEST: Attention-based Zero-Shot Learning for Unseen IoT Device Classification

  • paper_url: http://arxiv.org/abs/2310.08036
  • repo_url: https://github.com/Binghui99/ZEST
  • paper_authors: Binghui Wu, Philipp Gysel, Dinil Mon Divakaran, Mohan Gurusamy
  • for: 本研究旨在提出一种基于自注意力的零例学习(ZEST)框架,用于分类未经训练过的 IoT 设备。
  • methods: ZEST 框架包括 i) 基于自注意力的网络特征提取器(SANE),用于提取 IoT 流量的隐藏空间表示;ii) 使用这些隐藏特征生成 pseudo 数据,并 iii) 使用这些生成的 pseudo 数据进行预测设备分类。
  • results: 我们在实际 IoT 流量数据上进行了广泛的实验,结果表明:i) ZEST 在基础模型上显著提高了准确率;ii) ZEST 能够更好地提取有意义的表示,比 LSTM 更常用于网络流量模型。
    Abstract Recent research works have proposed machine learning models for classifying IoT devices connected to a network. However, there is still a practical challenge of not having all devices (and hence their traffic) available during the training of a model. This essentially means, during the operational phase, we need to classify new devices not seen during the training phase. To address this challenge, we propose ZEST -- a ZSL (zero-shot learning) framework based on self-attention for classifying both seen and unseen devices. ZEST consists of i) a self-attention based network feature extractor, termed SANE, for extracting latent space representations of IoT traffic, ii) a generative model that trains a decoder using latent features to generate pseudo data, and iii) a supervised model that is trained on the generated pseudo data for classifying devices. We carry out extensive experiments on real IoT traffic data; our experiments demonstrate i) ZEST achieves significant improvement (in terms of accuracy) over the baselines; ii) ZEST is able to better extract meaningful representations than LSTM which has been commonly used for modeling network traffic.
    摘要 近期的研究工作提出了多种用于对联网 IoT 设备进行分类的机器学习模型。然而,实际中仍存在一个挑战:在模型训练阶段并非所有设备(及其流量)都是可得的。这意味着在运行阶段,我们需要对训练阶段未曾见过的新设备进行分类。为应对这一挑战,我们提出了 ZEST——一种基于自注意力的零样本学习(ZSL)框架,用于对已见和未见设备进行分类。ZEST 包含三个部分:i) 基于自注意力的网络特征提取器 SANE,用于提取 IoT 流量的隐空间表示;ii) 一个生成模型,利用隐特征训练解码器以生成伪数据;iii) 一个在生成的伪数据上训练的监督模型,用于设备分类。我们在真实的 IoT 流量数据上进行了大量实验,结果表明:i) ZEST 相比基线方法在准确率上有显著提升;ii) 与常用于网络流量建模的 LSTM 相比,ZEST 能够提取出更有意义的表示。

Local Graph Clustering with Noisy Labels

  • paper_url: http://arxiv.org/abs/2310.08031
  • repo_url: None
  • paper_authors: Artur Back de Luca, Kimon Fountoulakis, Shenghao Yang
  • for: 本论文主要研究如何利用带噪声的节点标签(作为额外节点信息的替代)来提升本地图聚类的性能。
  • methods: 论文提出利用噪声节点标签构建加权图,并在其上运行基于图扩散的本地聚类方法;同时从理论上给出了标签噪声的充分条件,使得在加权图上扩散能以高概率更准确地恢复目标簇。
  • results: 实验结果表明,只需从属性图中抽取少量样本即可获得可靠的节点标签,并且在加权图上利用这些标签进行扩散,可在多个真实数据集上显著提升本地聚类性能,F1 分数最高提升 13%。
    Abstract The growing interest in machine learning problems over graphs with additional node information such as texts, images, or labels has popularized methods that require the costly operation of processing the entire graph. Yet, little effort has been made to the development of fast local methods (i.e. without accessing the entire graph) that extract useful information from such data. To that end, we propose a study of local graph clustering using noisy node labels as a proxy for additional node information. In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise. Subsequently, a fraction of these labels is flipped. We investigate the benefits of incorporating noisy labels for local graph clustering. By constructing a weighted graph with such labels, we study the performance of graph diffusion-based local clustering method on both the original and the weighted graphs. From a theoretical perspective, we consider recovering an unknown target cluster with a single seed node in a random graph with independent noisy node labels. We provide sufficient conditions on the label noise under which, with high probability, using diffusion in the weighted graph yields a more accurate recovery of the target cluster. This approach proves more effective than using the given labels alone or using diffusion in the label-free original graph. Empirically, we show that reliable node labels can be obtained with just a few samples from an attributed graph. Moreover, utilizing these labels via diffusion in the weighted graph leads to significantly better local clustering performance across several real-world datasets, improving F1 scores by up to 13%.
    摘要 随着人们对带有额外节点信息(如文本、图像或标签)的图机器学习问题兴趣日增,相关方法往往需要处理整张图这一代价高昂的操作;而能够从此类数据中提取有用信息的快速本地方法(即无需访问整张图)却鲜有研究。为此,我们研究以带噪声的节点标签作为额外节点信息替代的本地图聚类。在该设定中,每个节点先根据其所属簇获得二元初始标签:属于目标簇为 1,否则为 0;随后其中一部分标签被翻转。我们考察引入噪声标签对本地图聚类的益处:利用这些标签构建加权图后,我们研究了基于图扩散的本地聚类方法在原图与加权图上的表现。从理论角度,我们考虑在具有独立噪声节点标签的随机图中,从单个种子节点出发恢复未知的目标簇,并给出了关于标签噪声的充分条件,使得在加权图上进行扩散能以高概率更准确地恢复目标簇;这一做法比仅使用给定标签或在无标签原图上扩散更为有效。实验方面,我们表明只需从属性图中抽取少量样本即可获得可靠的节点标签;进一步通过在加权图上扩散利用这些标签,可在多个真实数据集上显著提升本地聚类性能,F1 分数最高提升 13%。
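
The abstract does not specify how the noisy labels enter the weighted graph, so the reweighting rule below is purely illustrative: edges between two positively-labeled nodes are up-weighted, edges between two negatively-labeled nodes are down-weighted, and a PageRank-style diffusion seeded at the labeled nodes scores cluster membership.

```python
import numpy as np
import networkx as nx

def diffuse_noisy_labels(G, noisy_labels, weight_pos=2.0, weight_neg=0.5,
                         alpha=0.15, iters=100):
    """Seeded diffusion on a graph reweighted by noisy binary node labels (sketch)."""
    nodes = list(G.nodes())
    idx = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)
    W = np.zeros((n, n))
    for u, v in G.edges():
        if noisy_labels[u] and noisy_labels[v]:
            w = weight_pos          # both endpoints look like the target cluster
        elif not noisy_labels[u] and not noisy_labels[v]:
            w = weight_neg          # both endpoints look like background
        else:
            w = 1.0
        W[idx[u], idx[v]] = W[idx[v], idx[u]] = w
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)   # row-stochastic walk matrix
    seed = np.array([noisy_labels[u] for u in nodes], dtype=float)
    seed /= max(seed.sum(), 1e-12)
    p = seed.copy()
    for _ in range(iters):                                     # personalized-PageRank iteration
        p = alpha * seed + (1 - alpha) * P.T @ p
    return dict(zip(nodes, p))                                 # higher score = likelier member
```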

Robust 1-bit Compressed Sensing with Iterative Hard Thresholding

  • paper_url: http://arxiv.org/abs/2310.08019
  • repo_url: None
  • paper_authors: Namiko Matsumoto, Arya Mazumdar
  • for: 本文研究带噪声的 1 比特压缩感知问题,即从仅保留符号(其中一部分符号可能被翻转)的量化测量中估计 $k$-稀疏单位向量 $x\in S^{n-1}$。
  • methods: 本文分析了 BIHT 算法——一种在特定损失函数上进行迭代硬阈值处理的近端梯度下降算法,用以估计 $x$。
  • results: 本文表明,BIHT 算法在测量被对抗性翻转的情况下仍能给出良好的结果:当至多 $\tau$ 比例的符号测量被翻转时,仍可在 $\tilde{O}(\epsilon+\tau)$ 的误差内估计 $x$,其中 $\epsilon$ 为逼近误差、$\tau$ 为被翻转测量的比例。这一结果表明迭代硬阈值法在存在测量误差时的稳定性。
    Abstract In 1-bit compressed sensing, the aim is to estimate a $k$-sparse unit vector $x\in S^{n-1}$ within an $\epsilon$ error (in $\ell_2$) from minimal number of linear measurements that are quantized to just their signs, i.e., from measurements of the form $y = \mathrm{Sign}(\langle a, x\rangle).$ In this paper, we study a noisy version where a fraction of the measurements can be flipped, potentially by an adversary. In particular, we analyze the Binary Iterative Hard Thresholding (BIHT) algorithm, a proximal gradient descent on a properly defined loss function used for 1-bit compressed sensing, in this noisy setting. It is known from recent results that, with $\tilde{O}(\frac{k}{\epsilon})$ noiseless measurements, BIHT provides an estimate within $\epsilon$ error. This result is optimal and universal, meaning one set of measurements work for all sparse vectors. In this paper, we show that BIHT also provides better results than all known methods for the noisy setting. We show that when up to $\tau$-fraction of the sign measurements are incorrect (adversarial error), with the same number of measurements as before, BIHT agnostically provides an estimate of $x$ within an $\tilde{O}(\epsilon+\tau)$ error, maintaining the universality of measurements. This establishes stability of iterative hard thresholding in the presence of measurement error. To obtain the result, we use the restricted approximate invertibility of Gaussian matrices, as well as a tight analysis of the high-dimensional geometry of the adversarially corrupted measurements.
    摘要 在 1 比特压缩感知中,目标是从数量尽可能少、且仅量化为符号的线性测量(即形如 $y = \mathrm{Sign}(\langle a, x\rangle)$ 的测量)中,在 $\ell_2$ 意义下以不超过 $\epsilon$ 的误差估计 $k$-稀疏单位向量 $x\in S^{n-1}$。本文研究其带噪版本:一部分测量的符号可能被(甚至是对抗性地)翻转。特别地,我们在该噪声设定下分析了二元迭代硬阈值(BIHT)算法——一种针对 1 比特压缩感知的、在恰当定义的损失函数上执行的近端梯度下降算法。已有结果表明,使用 $\tilde{O}(\frac{k}{\epsilon})$ 个无噪声测量,BIHT 即可给出误差不超过 $\epsilon$ 的估计;该结果是最优且普适的,即同一组测量对所有稀疏向量均有效。本文证明,BIHT 在带噪设定下同样优于所有已知方法:当至多 $\tau$ 比例的符号测量不正确(对抗性错误)时,在与之前相同的测量数量下,BIHT 能以不可知(agnostic)的方式给出误差为 $\tilde{O}(\epsilon+\tau)$ 的估计,并保持测量的普适性。这确立了迭代硬阈值法在测量误差存在时的稳定性。证明中我们利用了高斯矩阵的受限近似可逆性,以及对被对抗破坏的测量的高维几何所做的细致分析。
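
The BIHT iteration itself is simple enough to sketch: take a (sub)gradient step that pushes the signs of A x toward the observed signs, then project back onto k-sparse vectors by hard thresholding, and normalize at the end. The step size and iteration count below are illustrative defaults, not the paper's tuned values.

```python
import numpy as np

def biht(A, y, k, iters=100, step=1.0):
    """Binary Iterative Hard Thresholding for 1-bit compressed sensing (sketch).

    A: (m, n) measurement matrix, y: sign measurements in {-1, +1}, k: sparsity.
    Returns a unit-norm k-sparse estimate of the signal.
    """
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        grad = A.T @ (y - np.sign(A @ x))        # step toward sign consistency
        z = x + (step / m) * grad
        support = np.argsort(np.abs(z))[-k:]     # hard thresholding: keep k largest entries
        x = np.zeros(n)
        x[support] = z[support]
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x
```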

Why Train More? Effective and Efficient Membership Inference via Memorization

  • paper_url: http://arxiv.org/abs/2310.08015
  • repo_url: None
  • paper_authors: Jihye Choi, Shruti Tople, Varun Chandrasekaran, Somesh Jha
  • for: 成员推断攻击(MIA)旨在识别机器学习模型私有训练集中的特定数据样本,从而造成严重的隐私泄露及其他高级威胁;本研究关注如何在大幅减少影子模型数量的同时保持攻击效果。
  • methods: 在黑盒设定下,通过查询数据分布来训练影子模型,并基于样本的"记忆化"程度有策略地选择攻击样本;文中给出了将 MIA 优势及所需影子模型数量与记忆化联系起来的理论界。
  • results: 研究表明,通过策略性地选择高记忆化样本,攻击者可以在最大化攻击成功率的同时,将影子模型数量减少近两个数量级。
    Abstract Membership Inference Attacks (MIAs) aim to identify specific data samples within the private training dataset of machine learning models, leading to serious privacy violations and other sophisticated threats. Many practical black-box MIAs require query access to the data distribution (the same distribution where the private data is drawn) to train shadow models. By doing so, the adversary obtains models trained "with" or "without" samples drawn from the distribution, and analyzes the characteristics of the samples under consideration. The adversary is often required to train more than hundreds of shadow models to extract the signals needed for MIAs; this becomes the computational overhead of MIAs. In this paper, we propose that by strategically choosing the samples, MI adversaries can maximize their attack success while minimizing the number of shadow models. First, our motivational experiments suggest memorization as the key property explaining disparate sample vulnerability to MIAs. We formalize this through a theoretical bound that connects MI advantage with memorization. Second, we show sample complexity bounds that connect the number of shadow models needed for MIAs with memorization. Lastly, we confirm our theoretical arguments with comprehensive experiments; by utilizing samples with high memorization scores, the adversary can (a) significantly improve its efficacy regardless of the MIA used, and (b) reduce the number of shadow models by nearly two orders of magnitude compared to state-of-the-art approaches.
    摘要 成员推断攻击(MIA)旨在识别机器学习模型私有训练集中的特定数据样本,从而造成严重的隐私泄露及其他高级威胁。许多实用的黑盒 MIA 需要能够从数据分布(即私有数据所来自的分布)中查询样本以训练影子模型:攻击者由此获得"包含"或"不包含"该分布样本训练出的模型,并据此分析待考样本的特征。攻击者通常需要训练数百个以上的影子模型才能提取 MIA 所需的信号,这构成了 MIA 的计算开销。本文提出,通过有策略地选择样本,MIA 攻击者可以在最小化影子模型数量的同时最大化攻击成功率。首先,动机实验表明"记忆化"是解释不同样本对 MIA 脆弱程度差异的关键性质,我们通过一个将 MIA 优势与记忆化联系起来的理论界将其形式化。其次,我们给出将 MIA 所需影子模型数量与记忆化联系起来的样本复杂度界。最后,大量实验验证了上述理论:利用高记忆化得分的样本,攻击者 (a) 无论采用哪种 MIA 都能显著提升攻击效果,并且 (b) 相比最新方法可将影子模型数量减少近两个数量级。

AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE

  • paper_url: http://arxiv.org/abs/2310.08012
  • repo_url: None
  • paper_authors: Wei Ao, Vishnu Naresh Boddeti
  • for: 这 paper 的目的是提出一种自动化深度 convolutional neural networks (CNNs) 的安全执行方法,以便在 RNS-CKKS 上进行加密数据处理。
  • methods: 这 paper 使用了一种叫做 AutoFHE 的方法,它利用层次混合度 polynomial activation functions,并通过多目标优化来调整 homomorphic evaluation 架构以适应不同的 CNN 架构。
  • results: 实验结果表明,AutoFHE 可以在 RNS-CKKS 加密 CIFAR 数据集上提高安全执行的速度,比较传统方法快得多,同时也可以提高准确率。相比TFHE,AutoFHE 可以提高执行速度和准确率的同时,达到 $103\times$ 和 3.46% 的提升。
    Abstract Secure inference of deep convolutional neural networks (CNNs) under RNS-CKKS involves polynomial approximation of unsupported non-linear activation functions. However, existing approaches have three main limitations: 1) Inflexibility: The polynomial approximation and associated homomorphic evaluation architecture are customized manually for each CNN architecture and do not generalize to other networks. 2) Suboptimal Approximation: Each activation function is approximated instead of the function represented by the CNN. 3) Restricted Design: Either high-degree or low-degree polynomial approximations are used. The former retains high accuracy but slows down inference due to bootstrapping operations, while the latter accelerates ciphertext inference but compromises accuracy. To address these limitations, we present AutoFHE, which automatically adapts standard CNNs for secure inference under RNS-CKKS. The key idea is to adopt layerwise mixed-degree polynomial activation functions, which are optimized jointly with the homomorphic evaluation architecture in terms of the placement of bootstrapping operations. The problem is modeled within a multi-objective optimization framework to maximize accuracy and minimize the number of bootstrapping operations. AutoFHE can be applied flexibly on any CNN architecture, and it provides diverse solutions that span the trade-off between accuracy and latency. Experimental evaluation over RNS-CKKS encrypted CIFAR datasets shows that AutoFHE accelerates secure inference by $1.32\times$ to $1.8\times$ compared to methods employing high-degree polynomials. It also improves accuracy by up to 2.56% compared to methods using low-degree polynomials. Lastly, AutoFHE accelerates inference and improves accuracy by $103\times$ and 3.46%, respectively, compared to CNNs under TFHE.
    摘要 在 RNS-CKKS 下对深度卷积神经网络(CNN)进行安全推理,需要对其不支持的非线性激活函数做多项式近似。然而,现有方法存在三个主要局限:1)缺乏灵活性:多项式近似及相应的同态评估架构需要针对每种 CNN 架构手工定制,无法推广到其他网络;2)近似欠优:近似的对象是单个激活函数,而非 CNN 所表示的整体函数;3)设计受限:只能选择高次或低次多项式近似——前者保持较高精度,但自举(bootstrapping)操作会拖慢推理;后者加速密文推理,却牺牲精度。为解决这些局限,我们提出 AutoFHE,可自动将标准 CNN 适配为 RNS-CKKS 下的安全推理。其核心思想是采用逐层混合次数的多项式激活函数,并与同态评估架构(即自举操作的放置)联合优化。该问题被建模为一个多目标优化问题,以最大化精度并最小化自举操作数量。AutoFHE 可灵活应用于任意 CNN 架构,并给出覆盖精度与时延之间权衡的多样化解。在 RNS-CKKS 加密的 CIFAR 数据集上的实验表明:与采用高次多项式的方法相比,AutoFHE 将安全推理加速 1.32 至 1.8 倍;与采用低次多项式的方法相比,精度最高提升 2.56%;与 TFHE 下的 CNN 相比,推理加速 103 倍、精度提升 3.46%。
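
AutoFHE's layerwise degree search and bootstrapping placement are not reproduced here; the sketch below only shows the basic ingredient being optimized over -- replacing an unsupported activation such as ReLU with a least-squares polynomial surrogate that an RNS-CKKS-style scheme can evaluate. The fitting interval and the degree are arbitrary illustrative choices.

```python
import numpy as np

def fit_polynomial_activation(act=lambda x: np.maximum(x, 0.0), degree=7,
                              lo=-5.0, hi=5.0, n_pts=2001):
    """Least-squares polynomial surrogate for a non-linear activation (sketch)."""
    xs = np.linspace(lo, hi, n_pts)
    coeffs = np.polyfit(xs, act(xs), degree)   # highest-degree coefficient first
    return np.poly1d(coeffs)

poly_relu = fit_polynomial_activation(degree=7)
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(np.maximum(x, 0.0))   # exact ReLU
print(poly_relu(x))         # polynomial surrogate evaluated in plaintext
```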

LEMON: Lossless model expansion

  • paper_url: http://arxiv.org/abs/2310.07999
  • repo_url: None
  • paper_authors: Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang
  • for: 这篇论文的目的是提高深度神经网络的扩展和优化,以提高其表现和智能推理能力。
  • methods: 本文使用的方法是将小型神经网络的发现获得到大型神经网络的初始化,并且运用特定的学习率调询器来训练大型神经网络。
  • results: 实验结果显示,使用LEMON可以降低对于Vision Transformers和BERT等模型的训练时间和计算成本,相比训练 desde scratch,可以节省56.7%的计算成本和33.2%的训练时间。
    Abstract Scaling of deep neural networks, especially Transformers, is pivotal for their surging performance and has further led to the emergence of sophisticated reasoning capabilities in foundation models. Such scaling generally requires training large models from scratch with random initialization, failing to leverage the knowledge acquired by their smaller counterparts, which are already resource-intensive to obtain. To tackle this inefficiency, we present $\textbf{L}$ossl$\textbf{E}$ss $\textbf{MO}$del Expansio$\textbf{N}$ (LEMON), a recipe to initialize scaled models using the weights of their smaller but pre-trained counterparts. This is followed by model training with an optimized learning rate scheduler tailored explicitly for the scaled models, substantially reducing the training time compared to training from scratch. Notably, LEMON is versatile, ensuring compatibility with various network structures, including models like Vision Transformers and BERT. Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.
    摘要 深度神经网络(尤其是 Transformer)的规模扩展是其性能跃升的关键,并进一步催生了基础模型中复杂的推理能力。这种扩展通常需要以随机初始化从零开始训练大模型,无法利用其较小的前代模型(其训练本身已相当耗费资源)所获得的知识。为解决这种低效问题,我们提出 LEMON(LosslEss MOdel ExpansioN):用较小的预训练模型的权重来初始化放大后的模型,随后配合专为放大模型设计的优化学习率调度器进行训练,从而大幅缩短相对于从零训练的训练时间。值得注意的是,LEMON 具有通用性,可兼容多种网络结构,包括 Vision Transformer 和 BERT 等模型。实验结果表明,与从零训练相比,LEMON 将 Vision Transformer 的计算成本降低 56.7%,将 BERT 的计算成本降低 33.2%。
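
LEMON's full expansion rules for Transformers (attention, LayerNorm, residual paths) are more involved than the abstract suggests, but the core "lossless" idea is easy to demonstrate on a two-layer MLP: duplicate the hidden units and halve the corresponding outgoing weights, so the widened network computes exactly the same function before any further training. The sketch below is only this simplified illustration, not the paper's method.

```python
import torch
import torch.nn as nn

def expand_mlp_width_lossless(fc1: nn.Linear, fc2: nn.Linear):
    """Double the hidden width of fc2(act(fc1(x))) without changing its output."""
    d_hidden, d_in = fc1.weight.shape
    d_out = fc2.weight.shape[0]
    new_fc1 = nn.Linear(d_in, 2 * d_hidden)
    new_fc2 = nn.Linear(2 * d_hidden, d_out)
    with torch.no_grad():
        new_fc1.weight.copy_(torch.cat([fc1.weight, fc1.weight], dim=0))            # duplicate units
        new_fc1.bias.copy_(torch.cat([fc1.bias, fc1.bias], dim=0))
        new_fc2.weight.copy_(torch.cat([fc2.weight / 2, fc2.weight / 2], dim=1))     # halve fan-out
        new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2

# Sanity check: the expanded network computes exactly the same function.
fc1, fc2 = nn.Linear(16, 32), nn.Linear(32, 8)
big1, big2 = expand_mlp_width_lossless(fc1, fc2)
x = torch.randn(4, 16)
print(torch.allclose(fc2(torch.relu(fc1(x))), big2(torch.relu(big1(x))), atol=1e-6))
```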

Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

  • paper_url: http://arxiv.org/abs/2310.07990
  • repo_url: None
  • paper_authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng
  • for: 本研究旨在通过整合全基因组测序(WGS)数据与代谢组学数据,提高代谢组学数据填补的准确性。
  • methods: 所提方法使用多视图变分自编码器,联合建模负担评分、多基因风险评分(PGS)与连锁不平衡(LD)剪枝后的单核苷酸多态性(SNP),以进行特征提取和缺失代谢组学数据的填补。
  • results: 在 71.55% 的代谢物上取得了 r2 分数 > 0.01,表明该方法优于传统填补技术。
    Abstract Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenetic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template metabolites derived burden scores, PGS and LD-pruned SNPs, the proposed methods achieved r2-scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.
    摘要 背景:数据缺失是基于质谱的代谢组学中的常见挑战,可能导致有偏且不完整的分析。将全基因组测序(WGS)数据与代谢组学数据相结合,已成为提升代谢组学研究中数据填补准确性的有前景的途径。方法:本研究提出一种新方法,利用 WGS 数据与参考代谢物的信息来填补未知代谢物。该方法采用多视图变分自编码器,联合建模负担评分、多基因风险评分(PGS)与连锁不平衡(LD)剪枝后的单核苷酸多态性(SNP),以进行特征提取和缺失代谢组学数据的填补。通过学习两种组学数据的潜在表示,该方法能够基于基因组信息有效地填补缺失的代谢组学数值。结果:我们在含缺失值的经验代谢组学数据集上评估了该方法的性能,证明其优于传统填补技术;利用 35 个模板代谢物导出的负担评分、PGS 与 LD 剪枝 SNP,所提方法在 71.55% 的代谢物上取得了 r2 分数 > 0.01。结论:在代谢组学填补中整合 WGS 数据不仅提升了数据完整性,也增强了下游分析,为更全面、更准确地研究代谢通路与疾病关联铺平了道路。我们的发现为利用 WGS 数据进行代谢组学数据填补的潜在收益提供了有价值的见解,并凸显了在精准医学研究中进行多模态数据整合的重要性。

Semantic-Forward Relaying: A Novel Framework Towards 6G Cooperative Communications

  • paper_url: http://arxiv.org/abs/2310.07987
  • repo_url: https://github.com/linwest/Semantic_Forward
  • paper_authors: Wensheng Lin, Yuna Yan, Lixin Li, Zhu Han, Tad Matsumoto
  • for: 该文章提出了一种新的 relaying 框架,即 semantic-forward (SF),用于 sixth-generation (6G) 无线网络中的合作通信。
  • methods: 该文章通过提取并传输语义特征来减少转发负载,并提高网络对链路内错误的鲁棒性。基于带边信息的合作通信理论基础以及 turbo 原理,文章设计了一种联合信源信道编码算法,用于在目的端迭代交换外信息,以提高译码增益。
  • results: 仿真结果表明,即使在较差的信道条件下,SF 中继仍能有效提升恢复信息的质量。
    Abstract This letter proposes a novel relaying framework, semantic-forward (SF), for cooperative communications towards the sixth-generation (6G) wireless networks. The SF relay extracts and transmits the semantic features, which reduces forwarding payload, and also improves the network robustness against intra-link errors. Based on the theoretical basis for cooperative communications with side information and the turbo principle, we design a joint source-channel coding algorithm to iteratively exchange the extrinsic information for enhancing the decoding gains at the destination. Surprisingly, simulation results indicate that even in bad channel conditions, SF relaying can still effectively improve the recovered information quality.
    摘要 本文面向第六代(6G)无线网络的协作通信,提出了一种新的中继框架——语义前传(semantic-forward, SF)。SF 中继提取并传输语义特征,从而减少转发负载,并提升网络对链路内错误的鲁棒性。基于带边信息的协作通信理论与 turbo 原理,我们设计了一种联合信源信道编码算法,在目的端迭代交换外信息以增强译码增益。令人惊讶的是,仿真结果表明,即使在较差的信道条件下,SF 中继仍能有效提升恢复信息的质量。

Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization

  • paper_url: http://arxiv.org/abs/2310.07985
  • repo_url: https://github.com/ciam-group/nco_code
  • paper_authors: Fu Luo, Xi Lin, Fei Liu, Qingfu Zhang, Zhenkun Wang
  • for: 解决复杂的 combinatorial 优化问题,无需专家设计专门的算法。
  • methods: 提出了一种新的 Light Encoder and Heavy Decoder (LEHD) 模型,可以动态捕捉所有节点的关系,提高模型通用性。 并开发了一种数据效率高的训练方案和灵活的解决方案构建机制。
  • results: 通过训练小规模问题实例,LEHD 模型可以在 TSP 和 CVRP 问题上生成近似优解,并在实际 TSPLib 和 CVRPLib 问题上也达到了优秀表现。这些结果证明了我们提出的 LEHD 模型可以显著提高现有的 NCO 性能。代码可以在 https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD 上下载。
    Abstract Neural combinatorial optimization (NCO) is a promising learning-based approach for solving challenging combinatorial optimization problems without specialized algorithm design by experts. However, most constructive NCO methods cannot solve problems with large-scale instance sizes, which significantly diminishes their usefulness for real-world applications. In this work, we propose a novel Light Encoder and Heavy Decoder (LEHD) model with a strong generalization ability to address this critical issue. The LEHD model can learn to dynamically capture the relationships between all available nodes of varying sizes, which is beneficial for model generalization to problems of various scales. Moreover, we develop a data-efficient training scheme and a flexible solution construction mechanism for the proposed LEHD model. By training on small-scale problem instances, the LEHD model can generate nearly optimal solutions for the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 1000 nodes, and also generalizes well to solve real-world TSPLib and CVRPLib problems. These results confirm our proposed LEHD model can significantly improve the state-of-the-art performance for constructive NCO. The code is available at https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD.
    摘要 神经组合优化(NCO)是一种有前途的学习基于方法,用于解决复杂的组合优化问题,不需要专家设计专门的算法。然而,大多数构造性NCO方法无法解决大规模实例问题,这会对实际应用中的使用有很大的限制。在这种情况下,我们提出了一种新的轻量级编码器和重量级解码器(LEHD)模型,具有强大的总结能力。LEHD模型可以学习动态捕捉所有可用节点的关系,这对模型总结有利,可以在不同的缩放比例下解决问题。此外,我们还开发了一种数据效率的训练方案和灵活的解决方案构建机制。通过训练小规模问题,LEHD模型可以生成近似优质解决方案,并在实际TSPLib和CVRPLib问题上具有良好的总结性。这些结果证明,我们提出的LEHD模型可以对构造性NCO进行显著改进,从而提高实际应用中的性能。代码可以在https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD中下载。

RandCom: Random Communication Skipping Method for Decentralized Stochastic Optimization

  • paper_url: http://arxiv.org/abs/2310.07983
  • repo_url: None
  • paper_authors: Luyao Guo, Sulaiman A. Alghunaim, Kun Yuan, Laurent Condat, Jinde Cao
  • for: 这篇论文研究带有随机通信跳过的分布式优化方法,以加速降低通信复杂度。
  • methods: 论文提出了一种名为 RandCom 的去中心化优化方法,该方法包含概率性的本地更新。作者分析了 RandCom 在随机非凸、凸和强凸设定下的性能,证明其可按通信概率降低通信开销,并证明 RandCom 随节点数量增加可实现线性加速。
  • results: 作者进一步证明,在随机强凸设定下,RandCom 可在与网络无关的步长下实现线性加速;将其应用于联邦学习时,也得到了关于实现线性加速以及概率性本地更新适用于非凸设定的积极结果。
    Abstract Distributed optimization methods with random communication skips are gaining increasing attention due to their proven benefits in accelerating communication complexity. Nevertheless, existing research mainly focuses on centralized communication protocols for strongly convex deterministic settings. In this work, we provide a decentralized optimization method called RandCom, which incorporates probabilistic local updates. We analyze the performance of RandCom in stochastic non-convex, convex, and strongly convex settings and demonstrate its ability to asymptotically reduce communication overhead by the probability of communication. Additionally, we prove that RandCom achieves linear speedup as the number of nodes increases. In stochastic strongly convex settings, we further prove that RandCom can achieve linear speedup with network-independent stepsizes. Moreover, we apply RandCom to federated learning and provide positive results concerning the potential for achieving linear speedup and the suitability of the probabilistic local update approach for non-convex settings.
    摘要 带有随机通信跳过的分布式优化方法因其在降低通信复杂度方面的已证实收益而受到越来越多的关注。然而,现有研究主要集中在强凸确定性设定下的中心化通信协议上。在这项工作中,我们提出一种名为 RandCom 的去中心化优化方法,其中包含概率性的本地更新。我们分析了 RandCom 在随机非凸、凸和强凸设定下的性能,并证明其能按通信概率渐近地降低通信开销。此外,我们证明 RandCom 随节点数量增加可实现线性加速;在随机强凸设定下,我们进一步证明 RandCom 可在与网络无关的步长下实现线性加速。最后,我们将 RandCom 应用于联邦学习,就其实现线性加速的潜力以及概率性本地更新方法在非凸设定下的适用性给出了积极的结果。
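
RandCom's precise update is not given in the abstract; the sketch below only illustrates the generic pattern of decentralized optimization with random communication skipping: every node always takes a local stochastic gradient step, and a gossip/mixing step over the network happens only with probability p_comm.

```python
import numpy as np

def decentralized_sgd_random_skip(grads, x_init, mixing_W, p_comm=0.2,
                                  lr=0.05, rounds=500, rng=None):
    """Decentralized SGD with probabilistic communication skipping (illustrative sketch).

    grads:    list of per-node stochastic gradient oracles g_i(x).
    mixing_W: doubly-stochastic gossip matrix describing the network topology.
    """
    rng = rng or np.random.default_rng(0)
    n_nodes = len(grads)
    X = np.tile(x_init, (n_nodes, 1)).astype(float)   # one row of parameters per node
    for _ in range(rounds):
        for i in range(n_nodes):
            X[i] -= lr * grads[i](X[i])               # local stochastic gradient step
        if rng.random() < p_comm:                     # communicate only with probability p_comm
            X = mixing_W @ X                          # gossip/averaging step
    return X.mean(axis=0)
```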

Reinforcement Learning of Display Transfer Robots in Glass Flow Control Systems: A Physical Simulation-Based Approach

  • paper_url: http://arxiv.org/abs/2310.07981
  • repo_url: None
  • paper_authors: Hwajong Lee, Chan Kim, Seong-Woo Kim
  • for: 解决制造系统生产能力提高中的流控系统优化问题,提高生产率。
  • methods: 使用深度强化学习解决制造过程中的调度优化问题,实现可靠的流控系统设计。
  • results: 通过实验验证,使用强化学习对显示制造过程中的玻璃流控系统进行优化,可以获得更优的调度方案并提高生产率;其可行性已通过实际产线中不同类型的搬运机器人得到验证。
    Abstract A flow control system is a critical concept for increasing the production capacity of manufacturing systems. To solve the scheduling optimization problem related to the flow control with the aim of improving productivity, existing methods depend on a heuristic design by domain human experts. Therefore, the methods require correction, monitoring, and verification by using real equipment. As system designs increase in complexity, the monitoring time increases, which decreases the probability of arriving at the optimal design. As an alternative approach to the heuristic design of flow control systems, the use of deep reinforcement learning to solve the scheduling optimization problem has been considered. Although the existing research on reinforcement learning has yielded excellent performance in some areas, the applicability of the results to actual FAB such as display and semiconductor manufacturing processes is not evident so far. To this end, we propose a method to implement a physical simulation environment and devise a feasible flow control system design using a transfer robot in display manufacturing through reinforcement learning. We present a model and parameter setting to build a virtual environment for different display transfer robots, and training methods of reinforcement learning on the environment to obtain an optimal scheduling of glass flow control systems. Its feasibility was verified by using different types of robots used in the actual process.
    摘要 一种流控系统是生产系统的关键概念,可以提高生产能力。为解决流控系统的调度优化问题,目前的方法通过域内人工设计而实现,但这些方法需要纠正、监测和验证,而且随着系统设计的增加,监测时间增加,优化设计的概率降低。为了避免人工设计的限制,我们提出了使用深度强化学习解决调度优化问题的方法。虽然现有研究中的深度学习得到了一些领域的优秀表现,但是其应用于实际的FAB,如显示和半导体生产过程,还未得到证明。因此,我们提出了一种在物理模拟环境中实现流控系统设计的方法,并在显示生产过程中使用转移机器人进行实际应用。我们提出了建立不同类型机器人的虚拟环境,并在这些环境中进行强化学习训练,以获得最佳的玻璃流控系统调度。我们验证了这种方法的可行性,并在实际过程中使用不同类型机器人进行了验证。

GRASP: Accelerating Shortest Path Attacks via Graph Attention

  • paper_url: http://arxiv.org/abs/2310.07980
  • repo_url: None
  • paper_authors: Zohair Shafi, Benjamin A. Miller, Ayan Chatterjee, Tina Eliassi-Rad, Rajmonda S. Caceres
  • for: 本研究旨在利用机器学习技术帮助并加速现有的组合优化算法。
  • methods: 本研究提出了GRASP算法(图注意力加速最短路攻击),它使用图注意力网络来缩小输入问题的大小,从而提高运行速度。
  • results: 对于这一 APX-难问题,GRASP 算法可将运行时间最多加快 10 倍,同时保持所生成解的质量。
    Abstract Recent advances in machine learning (ML) have shown promise in aiding and accelerating classical combinatorial optimization algorithms. ML-based speed ups that aim to learn in an end to end manner (i.e., directly output the solution) tend to trade off run time with solution quality. Therefore, solutions that are able to accelerate existing solvers while maintaining their performance guarantees, are of great interest. We consider an APX-hard problem, where an adversary aims to attack shortest paths in a graph by removing the minimum number of edges. We propose the GRASP algorithm: Graph Attention Accelerated Shortest Path Attack, an ML aided optimization algorithm that achieves run times up to 10x faster, while maintaining the quality of solution generated. GRASP uses a graph attention network to identify a smaller subgraph containing the combinatorial solution, thus effectively reducing the input problem size. Additionally, we demonstrate how careful representation of the input graph, including node features that correlate well with the optimization task, can highlight important structure in the optimization solution.
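
The reduce-then-solve pattern described in the abstract can be sketched as follows: a graph attention network scores nodes, only a high-scoring subgraph is kept, and the expensive combinatorial attack is then run on the reduced instance. The sketch uses PyTorch Geometric's `GATConv`; the scoring architecture, `keep_frac` threshold, and node features are placeholders rather than the trained GRASP model.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv
from torch_geometric.utils import subgraph

class NodeScorer(torch.nn.Module):
    """Tiny two-layer GAT that outputs a per-node 'keep' probability."""
    def __init__(self, in_dim, hidden=32, heads=4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden, heads=heads)
        self.gat2 = GATConv(hidden * heads, 1, heads=1)

    def forward(self, x, edge_index):
        h = F.elu(self.gat1(x, edge_index))
        return torch.sigmoid(self.gat2(h, edge_index)).squeeze(-1)

def prune_instance(x, edge_index, model, keep_frac=0.3):
    """Keep only the top-scoring nodes and the edges among them, so that the
    exact (expensive) attack solver only sees the reduced subgraph."""
    with torch.no_grad():
        scores = model(x, edge_index)
    k = max(1, int(keep_frac * x.size(0)))
    keep = scores.topk(k).indices
    sub_edge_index, _ = subgraph(keep, edge_index, relabel_nodes=True,
                                 num_nodes=x.size(0))
    return keep, sub_edge_index

if __name__ == "__main__":
    num_nodes, in_dim = 100, 8
    x = torch.randn(num_nodes, in_dim)            # placeholder node features
    edge_index = torch.randint(0, num_nodes, (2, 400))
    model = NodeScorer(in_dim)                    # untrained; shape check only
    keep, sub_edges = prune_instance(x, edge_index, model)
    print(keep.shape, sub_edges.shape)
```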

Graph-SCP: Accelerating Set Cover Problems with Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.07979
  • repo_url: None
  • paper_authors: Zohair Shafi, Benjamin A. Miller, Tina Eliassi-Rad, Rajmonda S. Caceres
  • for: solves the Set Cover Problem (SCP) using graph neural networks to accelerate combinatorial optimization.
  • methods: uses a graph neural network method called Graph-SCP to identify a smaller sub-problem that contains the solution space, and can be used with other optimization solvers to achieve run time improvement.
  • results: reduces the problem size by 30-70% and achieves run time speedups up to ~25x compared to commercial solvers, and can achieve 100% optimality given a desired threshold.
    Abstract Machine learning (ML) approaches are increasingly being used to accelerate combinatorial optimization (CO) problems. We look specifically at the Set Cover Problem (SCP) and propose Graph-SCP, a graph neural network method that can augment existing optimization solvers by learning to identify a much smaller sub-problem that contains the solution space. We evaluate the performance of Graph-SCP on synthetic weighted and unweighted SCP instances with diverse problem characteristics and complexities, and on instances from the OR Library, a canonical benchmark for SCP. We show that Graph-SCP reduces the problem size by 30-70% and achieves run time speedups up to ~25x when compared to commercial solvers (Gurobi). Given a desired optimality threshold, Graph-SCP will improve upon it or even achieve 100% optimality. This is in contrast to fast greedy solutions that significantly compromise solution quality to achieve guaranteed polynomial run time. Graph-SCP can generalize to larger problem sizes and can be used with other conventional or ML-augmented CO solvers to lead to potential additional run time improvement.
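
A minimal sketch of the same reduce-then-solve idea for set cover: rank the candidate sets by a score, keep a fraction of them, solve the reduced instance, and fall back to the full instance if the reduction breaks feasibility. Here a crude set-size heuristic and a greedy solver stand in for Graph-SCP's graph neural network and for the commercial solver.

```python
def greedy_set_cover(universe, sets):
    """Standard greedy SCP heuristic, used here as the downstream solver:
    repeatedly pick the set covering the most uncovered elements."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            return None                          # remaining elements uncoverable
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

def reduce_then_solve(universe, sets, keep_frac=0.5, score=None):
    """Rank candidate sets by a score, keep the top fraction, solve the
    reduced instance, and fall back to the full instance if infeasible."""
    score = score or (lambda i: len(sets[i]))    # stand-in for a learned score
    order = sorted(range(len(sets)), key=score, reverse=True)
    kept = order[: max(1, int(keep_frac * len(sets)))]
    sub = greedy_set_cover(universe, [sets[i] for i in kept])
    if sub is None:                              # reduction lost feasibility
        return greedy_set_cover(universe, sets)
    return [kept[i] for i in sub]                # map back to original indices

if __name__ == "__main__":
    universe = range(1, 13)
    sets = [set(range(1, 7)), set(range(5, 13)), {1, 2}, {3, 4}, {9, 10}, {11, 12}]
    print(reduce_then_solve(universe, sets))     # covers 1..12 with two sets
```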

Hyperparameter Adaptive Search for Surrogate Optimization: A Self-Adjusting Approach

  • paper_url: http://arxiv.org/abs/2310.07970
  • repo_url: None
  • paper_authors: Nazanin Nezami, Hadis Anahideh
  • for: Improves the accessibility, effectiveness, and convergence speed of surrogate optimization (SO) algorithms for expensive black-box functions.
  • methods: Hyperparameter Adaptive Search for SO (HASSO), a self-adjusting SO approach that dynamically tunes its own hyperparameters while concurrently optimizing the primary objective, without requiring additional evaluations.
  • results: Experiments show that HASSO improves the performance of various SO algorithms across different global optimization test problems.
    Abstract Surrogate Optimization (SO) algorithms have shown promise for optimizing expensive black-box functions. However, their performance is heavily influenced by hyperparameters related to sampling and surrogate fitting, which poses a challenge to their widespread adoption. We investigate the impact of hyperparameters on various SO algorithms and propose a Hyperparameter Adaptive Search for SO (HASSO) approach. HASSO is not a hyperparameter tuning algorithm, but a generic self-adjusting SO algorithm that dynamically tunes its own hyperparameters while concurrently optimizing the primary objective function, without requiring additional evaluations. The aim is to improve the accessibility, effectiveness, and convergence speed of SO algorithms for practitioners. Our approach identifies and modifies the most influential hyperparameters specific to each problem and SO approach, reducing the need for manual tuning without significantly increasing the computational burden. Experimental results demonstrate the effectiveness of HASSO in enhancing the performance of various SO algorithms across different global optimization test problems.
    摘要 代理优化(Surrogate Optimization,SO)算法在优化昂贵的黑盒函数方面已经显示出潜力。然而,其性能在很大程度上受到与采样和代理模型拟合相关的超参数的影响,这对其广泛应用构成挑战。我们研究了超参数对各种 SO 算法的影响,并提出了面向 SO 的超参数自适应搜索方法(HASSO)。HASSO 不是一个超参数调优算法,而是一种通用的自适应 SO 算法,它在优化主要目标函数的同时动态调整自身的超参数,不需要额外的评估。其目标是提高 SO 算法对实践者的易用性、有效性和收敛速度。我们的方法针对每个问题和每种 SO 方法识别并调整最有影响的超参数,从而减少手动调参的需求,而不会显著增加计算负担。实验结果表明,HASSO 可以在不同的全局优化测试问题上提升多种 SO 算法的性能。
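
The self-adjusting idea can be illustrated with a small surrogate-optimization loop in which one of its own hyperparameters, the exploration weight of a lower-confidence-bound rule, is adapted from whether the latest evaluation improved. The Gaussian-process surrogate, candidate sampling, and adaptation rule below are simplistic placeholders, not the HASSO procedure itself.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def self_adjusting_so(f, bounds, n_init=5, budget=30, seed=0):
    """Surrogate optimization of a 1-D expensive function in which the
    exploration weight 'kappa' of a lower-confidence-bound rule is adjusted
    on the fly: explore more after a non-improving evaluation, less otherwise."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_init, 1))
    y = np.array([f(x[0]) for x in X])
    kappa = 2.0                                    # the self-adjusted hyperparameter
    for _ in range(budget - n_init):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = rng.uniform(lo, hi, size=(256, 1))  # cheap candidate pool
        mu, sd = gp.predict(cand, return_std=True)
        x_next = cand[np.argmin(mu - kappa * sd)]
        y_next = f(x_next[0])
        kappa = min(kappa * 1.3, 5.0) if y_next >= y.min() else max(kappa * 0.7, 0.1)
        X = np.vstack([X, x_next[None, :]])
        y = np.append(y, y_next)
    return X[np.argmin(y)], y.min()

if __name__ == "__main__":
    expensive = lambda x: (x - 1.7) ** 2 + 0.05 * np.sin(8 * x)
    x_best, y_best = self_adjusting_so(expensive, bounds=(-3.0, 3.0))
    print(x_best, y_best)                          # x_best should approach 1.7
```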

Towards Causal Deep Learning for Vulnerability Detection

  • paper_url: http://arxiv.org/abs/2310.07958
  • repo_url: None
  • paper_authors: Md Mahbubur Rahman, Ira Ceka, Chengzhi Mao, Saikat Chakraborty, Baishakhi Ray, Wei Le
  • for: Aims to make deep learning vulnerability detection robust and generalizable enough for real-world use.
  • methods: Proposes CausalVul, a causality-based approach with two phases: first, novel perturbations are designed to discover the spurious features a model may rely on; second, causal learning (specifically do-calculus) is applied on top of existing deep learning models to systematically remove the use of spurious features and promote causal prediction.
  • results: CausalVul consistently improves accuracy, robustness, and OOD performance for all state-of-the-art models and datasets tested; to the authors' knowledge, this is the first work to bring do-calculus-based causal learning to software engineering models. The replication package is at https://figshare.com/s/0ffda320dcb96c249ef2.
    Abstract Deep learning vulnerability detection has shown promising results in recent years. However, an important challenge that still blocks it from being very useful in practice is that the model is not robust under perturbation and it cannot generalize well over the out-of-distribution (OOD) data, e.g., applying a trained model to unseen projects in real world. We hypothesize that this is because the model learned non-robust features, e.g., variable names, that have spurious correlations with labels. When the perturbed and OOD datasets no longer have the same spurious features, the model prediction fails. To address the challenge, in this paper, we introduced causality into deep learning vulnerability detection. Our approach CausalVul consists of two phases. First, we designed novel perturbations to discover spurious features that the model may use to make predictions. Second, we applied the causal learning algorithms, specifically, do-calculus, on top of existing deep learning models to systematically remove the use of spurious features and thus promote causal based prediction. Our results show that CausalVul consistently improved the model accuracy, robustness and OOD performance for all the state-of-the-art models and datasets we experimented. To the best of our knowledge, this is the first work that introduces do calculus based causal learning to software engineering models and shows it's indeed useful for improving the model accuracy, robustness and generalization. Our replication package is located at https://figshare.com/s/0ffda320dcb96c249ef2.
    摘要 深度学习漏洞检测在最近几年内已经展示了令人鼓舞的结果。然而,一个重要的挑战仍然阻碍它在实际中发挥作用:模型在扰动下不够鲁棒,也无法很好地泛化到分布外(OOD)数据,例如将训练好的模型应用于现实中未见过的项目。我们认为这是因为模型学习了不鲁棒的特征,例如变量名称,这些特征与标签之间存在伪相关。当扰动后的数据和 OOD 数据集不再具有这些伪特征时,模型预测就会失败。为了解决这个挑战,在这篇论文中,我们将因果关系引入深度学习漏洞检测。我们的方法 CausalVul 包括两个阶段:第一个阶段,我们设计了新的扰动,以便发现模型可能使用的伪特征;第二个阶段,我们在现有的深度学习模型之上应用因果学习算法(具体来说是 do-calculus),系统性地移除对伪特征的依赖,以促进基于因果的预测。我们的结果显示,CausalVul 在所有 state-of-the-art 模型和数据集上都提高了模型的精度、鲁棒性和 OOD 性能。据我们所知,这是首次将基于 do-calculus 的因果学习引入软件工程模型,并证明其确实有助于提高模型的精度、鲁棒性和泛化能力。我们的复现包可以在 https://figshare.com/s/0ffda320dcb96c249ef2 找到。
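
The first phase of the approach, discovering spurious features through perturbation, can be sketched as follows: apply a semantics-preserving rename to a code snippet and check whether a detector's prediction flips. The `predict` interface, the regex-based renamer, and the dummy detector are assumptions made for illustration; the do-calculus-based second phase is not shown.

```python
import re

def rename_identifiers(code, mapping):
    """Semantics-preserving perturbation: rename whole-word identifiers.
    (A real implementation would use a language-aware parser.)"""
    for old, new in mapping.items():
        code = re.sub(rf"\b{re.escape(old)}\b", new, code)
    return code

def spurious_feature_probe(model, code, mapping):
    """If the prediction changes under a rename that cannot change the
    vulnerability, the detector is leaning on spurious, non-causal features."""
    original = model.predict(code)
    perturbed = model.predict(rename_identifiers(code, mapping))
    return {"original": original, "perturbed": perturbed,
            "spurious_signal": original != perturbed}

if __name__ == "__main__":
    snippet = "void copy(char *dst, char *src) { strcpy(dst, src); }"
    mapping = {"dst": "buf_a", "src": "buf_b"}

    class DummyDetector:                 # stands in for a trained DL detector
        def predict(self, code):
            # deliberately name-dependent, i.e. it relies on a spurious feature
            return "vulnerable" if "src" in code else "benign"

    print(spurious_feature_probe(DummyDetector(), snippet, mapping))
```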

eess.IV - 2023-10-12

Cross-correlation image analysis for real-time particle tracking

  • paper_url: http://arxiv.org/abs/2310.08770
  • repo_url: None
  • paper_authors: Leonardo R. Werneck, Cody Jessup, Austin Brandenberger, Tyler Knowles, Charles W. Lewandowski, Megan Nolan, Ken Sible, Zachariah B. Etienne, Brian D’Urso
  • for: Real-time image analysis, particularly for feedback control systems.
  • methods: A new algorithm, with a GPU-based implementation, that detects small displacements between images in real time and is robust to noise.
  • results: Achieves real-time image analysis with displacement detection approaching the shot-noise limit.
    Abstract Accurately measuring translations between images is essential in many fields, including biology, medicine, geography, and physics. Existing methods, including the popular FFT-based cross-correlation, are not suitable for real-time analysis, which is especially vital in feedback control systems. To fill this gap, we introduce a new algorithm which approaches shot-noise limited displacement detection and a GPU-based implementation for real-time image analysis.
    摘要 精确测量图像之间的平移在许多领域都至关重要,包括生物学、医学、地理学和物理学。现有的方法,包括常用的基于 FFT 的互相关方法,并不适合实时分析,而实时分析在反馈控制系统中尤为重要。为填补这一空白,我们提出了一种接近散粒噪声极限的位移检测新算法,以及基于 GPU 的实时图像分析实现。
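
For reference, the conventional FFT-based cross-correlation that the abstract contrasts against can be written in a few lines: the peak of the cross-correlation of two frames gives the integer-pixel shift between them. This is the baseline technique, not the paper's new shot-noise-limited algorithm.

```python
import numpy as np

def fft_cross_correlation_shift(frame_a, frame_b):
    """Estimate the integer-pixel translation of frame_a relative to frame_b
    from the peak of their cross-correlation, computed with FFTs."""
    fa = np.fft.fft2(frame_a - frame_a.mean())
    fb = np.fft.fft2(frame_b - frame_b.mean())
    xcorr = np.fft.ifft2(fa * np.conj(fb)).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # map the peak position to a signed shift (FFT wrap-around convention)
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, xcorr.shape))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.poisson(50.0, size=(128, 128)).astype(float)     # shot-noise-like frame
    moved = np.roll(np.roll(img, 3, axis=0), -5, axis=1)
    print(fft_cross_correlation_shift(moved, img))              # expect (3, -5)
```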

Unlocking the capabilities of explainable fewshot learning in remote sensing

  • paper_url: http://arxiv.org/abs/2310.08619
  • repo_url: None
  • paper_authors: Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N Duong
  • for: This review provides an up-to-date overview of few-shot learning for image-based remote sensing tasks, where labeled data are limited.
  • methods: Surveys existing and newly proposed few-shot classification techniques, together with the satellite- and UAV-based datasets used to evaluate them.
  • results: Shows that few-shot learning can adapt to the broader and more diverse viewpoints that UAV platforms provide, and evaluates several state-of-the-art few-shot approaches on a UAV disaster scene classification dataset, with promising results.
    Abstract Recent advancements have significantly improved the efficiency and effectiveness of deep learning methods for image-based remote sensing tasks. However, the requirement for large amounts of labeled data can limit the applicability of deep neural networks to existing remote sensing datasets. To overcome this challenge, fewshot learning has emerged as a valuable approach for enabling learning with limited data. While previous research has evaluated the effectiveness of fewshot learning methods on satellite-based datasets, little attention has been paid to exploring the applications of these methods to datasets obtained from UAVs, which are increasingly used in remote sensing studies. In this review, we provide an up to date overview of both existing and newly proposed fewshot classification techniques, along with appropriate datasets that are used for both satellite-based and UAV-based data. Our systematic approach demonstrates that fewshot learning can effectively adapt to the broader and more diverse perspectives that UAV-based platforms can provide. We also evaluate some SOTA fewshot approaches on a UAV disaster scene classification dataset, yielding promising results. We emphasize the importance of integrating XAI techniques like attention maps and prototype analysis to increase the transparency, accountability, and trustworthiness of fewshot models for remote sensing. Key challenges and future research directions are identified, including tailored fewshot methods for UAVs, extending to unseen tasks like segmentation, and developing optimized XAI techniques suited for fewshot remote sensing problems. This review aims to provide researchers and practitioners with an improved understanding of fewshot learning's capabilities and limitations in remote sensing, while highlighting open problems to guide future progress in efficient, reliable, and interpretable fewshot methods.
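
Among the few-shot classifiers such reviews typically cover, the prototypical-network rule is a convenient reference point: class prototypes are the mean embeddings of the labeled support examples, and each query is assigned to the nearest prototype. The sketch below uses a random embedding in place of a trained backbone and is illustrative only.

```python
import numpy as np

def prototypical_classify(support_emb, support_labels, query_emb):
    """Nearest-prototype rule: prototype_c is the mean support embedding of
    class c; each query is assigned to the class of the closest prototype."""
    classes = np.unique(support_labels)
    prototypes = np.stack([support_emb[support_labels == c].mean(axis=0)
                           for c in classes])
    dists = np.linalg.norm(query_emb[:, None, :] - prototypes[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A 3-way 5-shot episode; embeddings would normally come from a trained backbone.
    support_labels = np.repeat([0, 1, 2], 5)
    centers = 3.0 * rng.normal(size=(3, 16))
    support_emb = centers[support_labels] + rng.normal(size=(15, 16))
    query_emb = centers[[0, 2, 1]] + rng.normal(size=(3, 16))
    print(prototypical_classify(support_emb, support_labels, query_emb))  # expect [0 2 1]
```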

MUN-FRL: A Visual Inertial LiDAR Dataset for Aerial Autonomous Navigation and Mapping

  • paper_url: http://arxiv.org/abs/2310.08435
  • repo_url: None
  • paper_authors: Ravindu G. Thalagala, Sahan M. Gunawardena, Oscar De Silva, Awantha Jayasiri, Arthur Gubbels, George K. I Mann, Raymond G. Gosine
  • for: This paper aims to promote GNSS-denied navigation research by providing a unique outdoor aerial dataset captured using a multi-sensor payload.
  • methods: The dataset includes hardware-synchronized monocular images, IMU measurements, 3D LiDAR point clouds, and high-precision RTK-GNSS based ground truth.
  • results: The paper provides a performance summary of state-of-the-art methods applied to the datasets.
    Abstract This paper presents a unique outdoor aerial visual-inertial-LiDAR dataset captured using a multi-sensor payload to promote the global navigation satellite system (GNSS)-denied navigation research. The dataset features flight distances ranging from 300m to 5km, collected using a DJI M600 hexacopter drone and the National Research Council (NRC) Bell 412 Advanced Systems Research Aircraft (ASRA). The dataset consists of hardware synchronized monocular images, IMU measurements, 3D LiDAR point-clouds, and high-precision real-time kinematic (RTK)-GNSS based ground truth. Ten datasets were collected as ROS bags over 100 mins of outdoor environment footage ranging from urban areas, highways, hillsides, prairies, and waterfronts. The datasets were collected to facilitate the development of visual-inertial-LiDAR odometry and mapping algorithms, visual-inertial navigation algorithms, object detection, segmentation, and landing zone detection algorithms based upon real-world drone and full-scale helicopter data. All the datasets contain raw sensor measurements, hardware timestamps, and spatio-temporally aligned ground truth. The intrinsic and extrinsic calibrations of the sensors are also provided along with raw calibration datasets. A performance summary of state-of-the-art methods applied on the datasets is also provided.
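
Since the datasets are distributed as ROS bags, a typical first step is to iterate over the sensor topics and check message counts and time spans. The topic names and file name below are placeholders; the actual topic names would need to be taken from the dataset documentation.

```python
import rosbag  # ROS1 Python API; requires a ROS environment

# Placeholder topic names -- replace with the ones documented for the MUN-FRL bags.
IMAGE_TOPIC = "/camera/image_raw"
IMU_TOPIC = "/imu/data"
LIDAR_TOPIC = "/lidar/points"

def summarize_bag(path):
    """Count messages per sensor stream and report the time span of one bag."""
    counts = {IMAGE_TOPIC: 0, IMU_TOPIC: 0, LIDAR_TOPIC: 0}
    t_first, t_last = None, None
    bag = rosbag.Bag(path)
    try:
        for topic, msg, t in bag.read_messages(topics=list(counts)):
            counts[topic] += 1
            t_first = t if t_first is None else t_first
            t_last = t
    finally:
        bag.close()
    duration = (t_last - t_first).to_sec() if t_first is not None else 0.0
    return counts, duration

if __name__ == "__main__":
    counts, duration = summarize_bag("flight_01.bag")   # hypothetical file name
    print(f"{duration:.1f} s of data:", counts)
```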