2023-10-29

cs.CL

cs.CL - 2023-10-29

From Chatbots to PhishBots? – Preventing Phishing scams created using ChatGPT, Google Bard and Claude

paper_url: http://arxiv.org/abs/2310.19181
repo_url: None
paper_authors: Sayak Saha Roy, Poojitha Thota, Krishna Vamsi Naragam, Shirin Nilizadeh
for: 防止 Large Language Models (LLMs) 生成邪恶内容，包括骗财攻击。
methods: 使用四种常见的商业可用 LLMs（ChatGPT、GPT 4、Claude 和 Bard）生成功能骗财攻击，使用 serie 的邪恶提示。
results: 发现这些 LLMs 可以生成具有识别度的骗财电子邮件和网站，并且可以使用诸如逃脱检测系统的诡计来掩盖自己。

Abstract
The advanced capabilities of Large Language Models (LLMs) have made them invaluable across various applications, from conversational agents and content creation to data analysis, research, and innovation. However, their effectiveness and accessibility also render them susceptible to abuse for generating malicious content, including phishing attacks. This study explores the potential of using four popular commercially available LLMs - ChatGPT (GPT 3.5 Turbo), GPT 4, Claude and Bard to generate functional phishing attacks using a series of malicious prompts. We discover that these LLMs can generate both phishing emails and websites that can convincingly imitate well-known brands, and also deploy a range of evasive tactics for the latter to elude detection mechanisms employed by anti-phishing systems. Notably, these attacks can be generated using unmodified, or "vanilla," versions of these LLMs, without requiring any prior adversarial exploits such as jailbreaking. As a countermeasure, we build a BERT based automated detection tool that can be used for the early detection of malicious prompts to prevent LLMs from generating phishing content attaining an accuracy of 97\% for phishing website prompts, and 94\% for phishing email prompts.

摘要
大型自然语言模型（LLM）的高级功能使得它们在不同应用程序中变得不可或缺，从对话代理和内容创作到数据分析、研究和创新。然而，它们的效iveness和可用性也使得它们容易遭受用于生成恶意内容的违用，包括骗财攻击。这项研究探讨了使用四种流行的商业可用的 LLM——ChatGPT（GPT 3.5 Turbo）、GPT 4、Claude 和 Bard——生成功能攻击。我们发现这些 LLM 可以生成具有识别度的恶意电子邮件和网站，并且可以部署一系列逃脱检测机制的诡计。值得注意的是，这些攻击可以使用未修改的、“纯净”的 LLM 进行生成，不需要任何先前的反对攻击如监禁。作为对策，我们构建了基于 BERT 的自动检测工具，可以用于早期检测恶意提示，以防止 LLM 生成攻击内容，其准确率为 97% для骗财网站提示，和 94% для骗财电子邮件提示。

Robustifying Language Models with Test-Time Adaptation

paper_url: http://arxiv.org/abs/2310.19177
repo_url: None
paper_authors: Noah Thomas McDermott, Junfeng Yang, Chengzhi Mao
for: 防止语言模型受到语言攻击
methods: 使用遮盖词预测来动态适应输入句子，以逆转语言攻击
results: 在两个受欢迎的句子分类任务上，我们的方法可以修复65%以上的语言攻击In English, this means:
for: Preventing language models from being attacked by adversarial language
methods: Using dynamic adaptation of input sentences with predictions from masked words to reverse language adversarial attacks
results: Our method can repair over 65% of adversarial language attacks on two popular sentence classification tasks without requiring any training.

Abstract
Large-scale language models achieved state-of-the-art performance over a number of language tasks. However, they fail on adversarial language examples, which are sentences optimized to fool the language models but with similar semantic meanings for humans. While prior work focuses on making the language model robust at training time, retraining for robustness is often unrealistic for large-scale foundation models. Instead, we propose to make the language models robust at test time. By dynamically adapting the input sentence with predictions from masked words, we show that we can reverse many language adversarial attacks. Since our approach does not require any training, it works for novel tasks at test time and can adapt to novel adversarial corruptions. Visualizations and empirical results on two popular sentence classification datasets demonstrate that our method can repair adversarial language attacks over 65% o

摘要
大规模语言模型在多种语言任务上实现了状态机器的表现，但它们对语言攻击例子失败，这些例子是用来欺骗语言模型的，但对人类来说含义相同的句子。而现有的工作通常在训练时做 robustness 的优化，但对大规模基础模型来说，这种 retraining 是不现实的。因此，我们提议在测试时使用语言模型的 robustness。我们通过在输入句子上动态适应预测结果来显示，我们可以反转许多语言攻击例子。我们的方法不需要任何训练，因此它在测试时可以对新任务进行适应，并且可以适应新的语言攻击。我们的实验结果和视觉化结果表明，我们的方法可以修复大于 65% 的语言攻击例子。

Poisoning Retrieval Corpora by Injecting Adversarial Passages

paper_url: http://arxiv.org/abs/2310.19156
repo_url: https://github.com/princeton-nlp/corpus-poisoning
paper_authors: Zexuan Zhong, Ziqing Huang, Alexander Wettig, Danqi Chen
for: 本研究旨在测试紧密搜寻系统的安全性，特别是在真实世界应用中是否可以安全地启用。
methods: 作者提出了一种新的对紧密搜寻系统的攻击方法，其中一名黑客产生了一小批陌生过程，并将其插入到大量搜寻 corpora 中，以导致紧密搜寻系统错误地回答查询。
results: 研究发现，这种攻击可以将紧密搜寻系统误导回答，并且这些陌生过程可以直接扩展到不同的域外查询和 corpora，例如在金融文档或网络论坛中，50个生成的过程可以误导>94%的查询。

Abstract
Dense retrievers have achieved state-of-the-art performance in various information retrieval tasks, but to what extent can they be safely deployed in real-world applications? In this work, we propose a novel attack for dense retrieval systems in which a malicious user generates a small number of adversarial passages by perturbing discrete tokens to maximize similarity with a provided set of training queries. When these adversarial passages are inserted into a large retrieval corpus, we show that this attack is highly effective in fooling these systems to retrieve them for queries that were not seen by the attacker. More surprisingly, these adversarial passages can directly generalize to out-of-domain queries and corpora with a high success attack rate -- for instance, we find that 50 generated passages optimized on Natural Questions can mislead >94% of questions posed in financial documents or online forums. We also benchmark and compare a range of state-of-the-art dense retrievers, both unsupervised and supervised. Although different systems exhibit varying levels of vulnerability, we show they can all be successfully attacked by injecting up to 500 passages, a small fraction compared to a retrieval corpus of millions of passages.

摘要
dense retrievers 在多种信息检索任务中实现了状态码表现，但它们在实际应用中是否可以安全部署？在这项工作中，我们提出了一种 novel 的攻击方法，malicious user 通过修改精确的token来生成一小数量的对抗 passage，以最大化与提供的训练问题的相似性。当这些对抗 passage 添加到大量的检索库中时，我们发现这种攻击非常有效，能让这些系统 retrieve 这些恶意生成的 passage 作为未看过的查询。即使在不同的领域和数据集上，这些对抗 passage 仍然能够直接泛化，并达到高度的成功攻击率。例如，我们发现50个优化后的对抗 passage 可以误导 >94%的金融文档或在线讨论中的问题。我们还对多种当前最佳的 dense retrievers 进行了 benchmark 和比较，包括不超过500个对抗 passage 的攻击。虽然不同的系统在攻击上展现出不同的抵抗程度，但我们发现所有这些系统都可以被成功攻击。

BERT Lost Patience Won’t Be Robust to Adversarial Slowdown

paper_url: http://arxiv.org/abs/2310.19152
repo_url: https://github.com/ztcoalson/waffle
paper_authors: Zachary Coalson, Gabriel Ritter, Rakesh Bobba, Sanghyun Hong
for: 这 paper 评估了多出口语言模型对钝化攻击的 Robustness。
methods: 作者设计了一种钝化攻击，通过生成自然的 adversarial text 绕过早出点。然后，他们使用这种 WAFFLE 攻击来进行多出口机制的全面评估，并在 GLUE benchmark 上测试了三种多出口机制在钝化攻击下的性能。
results: 研究发现，钝化攻击可以减少多出口机制提供的计算成本，特别是对于复杂的机制而言。此外，研究还发现了一些常见的 perturbation 模式，并与标准的 adversarial text 攻击进行比较。最后，研究发现了输入整形可以有效地解决钝化攻击，但是 adversarial training 无法战胜钝化攻击。

Abstract
In this paper, we systematically evaluate the robustness of multi-exit language models against adversarial slowdown. To audit their robustness, we design a slowdown attack that generates natural adversarial text bypassing early-exit points. We use the resulting WAFFLE attack as a vehicle to conduct a comprehensive evaluation of three multi-exit mechanisms with the GLUE benchmark against adversarial slowdown. We then show our attack significantly reduces the computational savings provided by the three methods in both white-box and black-box settings. The more complex a mechanism is, the more vulnerable it is to adversarial slowdown. We also perform a linguistic analysis of the perturbed text inputs, identifying common perturbation patterns that our attack generates, and comparing them with standard adversarial text attacks. Moreover, we show that adversarial training is ineffective in defeating our slowdown attack, but input sanitization with a conversational model, e.g., ChatGPT, can remove perturbations effectively. This result suggests that future work is needed for developing efficient yet robust multi-exit models. Our code is available at: https://github.com/ztcoalson/WAFFLE

摘要
在这篇论文中，我们系统地评估了多出口语言模型对针对性慢速攻击的Robustness。为了审计其Robustness，我们设计了一种通过绕过早出点生成自然针对性文本的攻击。我们使用这种WAFFLE攻击来进行对三种多出口机制的GLUEbenchmark进行广泛的评估，并显示我们的攻击可以在白盒和黑盒设置下减少了这些方法提供的计算成本。我们发现，复杂的机制更容易受到针对性慢速攻击。此外，我们还进行了文本输入的语言分析，找到了我们的攻击生成的扰乱模式，并与标准针对性攻击相比较。此外，我们还发现，对于我们的慢速攻击，反向训练无法有效地抵抗，但是使用 conversational 模型，例如 ChatGPT，可以有效地除掉扰乱。这种结果表明，未来的工作需要开发高效又Robust的多出口模型。我们的代码可以在：https://github.com/ztcoalson/WAFFLE 获取。

Learning to Follow Object-Centric Image Editing Instructions Faithfully

paper_url: http://arxiv.org/abs/2310.19145
repo_url: https://github.com/tuhinjubcse/faithfuledits_emnlp2023
paper_authors: Tuhin Chakrabarty, Kanishk Singh, Arkadiy Saakyan, Smaranda Muresan
for: 这篇论文旨在提高文本到图像扩展模型中的自然语言指令编辑性能。
methods: 本文提出了一种基于最新的分割、链式思维提示和视觉问答技术的方法，可以提高自然语言指令下的图像编辑质量。
results: 对比于现有的基线，该方法能够进行细化的对象中心编辑，并且能够在未经训练的领域中进行扩展。此外，模型还能够捕捉到文本指令中的含义，进行 faithfulness 的捕捉和修改。

Abstract
Natural language instructions are a powerful interface for editing the outputs of text-to-image diffusion models. However, several challenges need to be addressed: 1) underspecification (the need to model the implicit meaning of instructions) 2) grounding (the need to localize where the edit has to be performed), 3) faithfulness (the need to preserve the elements of the image not affected by the edit instruction). Current approaches focusing on image editing with natural language instructions rely on automatically generated paired data, which, as shown in our investigation, is noisy and sometimes nonsensical, exacerbating the above issues. Building on recent advances in segmentation, Chain-of-Thought prompting, and visual question answering, we significantly improve the quality of the paired data. In addition, we enhance the supervision signal by highlighting parts of the image that need to be changed by the instruction. The model fine-tuned on the improved data is capable of performing fine-grained object-centric edits better than state-of-the-art baselines, mitigating the problems outlined above, as shown by automatic and human evaluations. Moreover, our model is capable of generalizing to domains unseen during training, such as visual metaphors.

摘要
自然语言指令是文本到图像扩散模型的高级用户界面。然而，需要解决以下挑战：1）下pecification（需要模型理解指令的隐含含义）、2）grounding（需要确定编辑操作的具体位置）、3）loyal（需要保持图像中未受影响的元素）。现有的方法通过自动生成的对应数据来实现图像编辑，但这些数据经过我们的调查发现噪音和无意义，这些问题进一步加剧了上述问题。我们基于最近的分割、链条提示和视觉问答技术进行了大幅改进，提高对应数据的质量。此外，我们还强调要更改的图像部分，以提高模型的精细化对象编辑能力。经过练习这些改进后的模型，在自动和人工评估中都能够更好地完成细化的对象编辑任务，并且能够在训练时未看到的领域中进行推断。此外，我们的模型还能够捕捉到视觉 метаFOR，进一步提高图像编辑的精度和效果。

paper_url: http://arxiv.org/abs/2310.19130
repo_url: https://github.com/ahmedssabir/genderscore
paper_authors: Ahmed Sabir, Lluís Padró
for: investigate the impact of objects on gender bias in image captioning systems
methods: use visual semantic-based gender score to measure the degree of bias
results: propose a gender score that can be used as an additional metric to existing approach, and observe the bias relation between caption and related gender

Abstract
In this paper, we investigate the impact of objects on gender bias in image captioning systems. Our results show that only gender-specific objects have a strong gender bias (e.g., women-lipstick). In addition, we propose a visual semantic-based gender score that measures the degree of bias and can be used as a plug-in for any image captioning system. Our experiments demonstrate the utility of the gender score, since we observe that our score can measure the bias relation between a caption and its related gender; therefore, our score can be used as an additional metric to the existing Object Gender Co-Occ approach. Code and data are publicly available at \url{https://github.com/ahmedssabir/GenderScore}.

摘要
在这篇论文中，我们研究了图像描述系统中的性别偏见。我们的结果显示，只有性别特定的物品会带有强烈的性别偏见（例如女性 lipstick）。此外，我们提出了基于视觉 semantics 的性别分数，可以用于任何图像描述系统中。我们的实验表明了这个分数的用途，因为我们发现了这个分数可以测量描述和其相关的性别之间的偏见关系，因此可以用作现有的 Object Gender Co-Occ 方法的附加指标。代码和数据都可以在 \url{https://github.com/ahmedssabir/GenderScore} 上获取。

Unified Representation for Non-compositional and Compositional Expressions

paper_url: http://arxiv.org/abs/2310.19127
repo_url: None
paper_authors: Ziheng Zeng, Suma Bhat
for: This paper is written for researchers and developers working on natural language processing (NLP) and machine learning, specifically those interested in non-compositional language and idiomatic expressions.
methods: The paper proposes a language model called PIER, which builds on BART and generates semantically meaningful and contextually appropriate representations for English potentially idiomatic expressions (PIEs).
results: The paper shows that the representations generated by PIER result in higher homogeneity scores for embedding clustering and gains in accuracy and sequence accuracy for PIE sense classification and span detection compared to the state-of-the-art IE representation model, GIEA, without sacrificing performance on NLU tasks.

Abstract
Accurate processing of non-compositional language relies on generating good representations for such expressions. In this work, we study the representation of language non-compositionality by proposing a language model, PIER, that builds on BART and can create semantically meaningful and contextually appropriate representations for English potentially idiomatic expressions (PIEs). PIEs are characterized by their non-compositionality and contextual ambiguity in their literal and idiomatic interpretations. Via intrinsic evaluation on embedding quality and extrinsic evaluation on PIE processing and NLU tasks, we show that representations generated by PIER result in 33% higher homogeneity score for embedding clustering than BART, whereas 3.12% and 3.29% gains in accuracy and sequence accuracy for PIE sense classification and span detection compared to the state-of-the-art IE representation model, GIEA. These gains are achieved without sacrificing PIER's performance on NLU tasks (+/- 1% accuracy) compared to BART.

摘要
Accurate processing of non-compositional language relies on generating good representations for such expressions. In this work, we study the representation of language non-compositionality by proposing a language model, PIER, that builds on BART and can create semantically meaningful and contextually appropriate representations for English potentially idiomatic expressions (PIEs). PIEs are characterized by their non-compositionality and contextual ambiguity in their literal and idiomatic interpretations. Via intrinsic evaluation on embedding quality and extrinsic evaluation on PIE processing and NLU tasks, we show that representations generated by PIER result in 33% higher homogeneity score for embedding clustering than BART, whereas 3.12% and 3.29% gains in accuracy and sequence accuracy for PIE sense classification and span detection compared to the state-of-the-art IE representation model, GIEA. These gains are achieved without sacrificing PIER's performance on NLU tasks (+/- 1% accuracy) compared to BART.Here's the translation in Traditional Chinese: Accurate processing of non-compositional language relies on generating good representations for such expressions. In this work, we study the representation of language non-compositionality by proposing a language model, PIER, that builds on BART and can create semantically meaningful and contextually appropriate representations for English potentially idiomatic expressions (PIEs). PIEs are characterized by their non-compositionality and contextual ambiguity in their literal and idiomatic interpretations. Via intrinsic evaluation on embedding quality and extrinsic evaluation on PIE processing and NLU tasks, we show that representations generated by PIER result in 33% higher homogeneity score for embedding clustering than BART, whereas 3.12% and 3.29% gains in accuracy and sequence accuracy for PIE sense classification and span detection compared to the state-of-the-art IE representation model, GIEA. These gains are achieved without sacrificing PIER's performance on NLU tasks (+/- 1% accuracy) compared to BART.

PACuna: Automated Fine-Tuning of Language Models for Particle Accelerators

paper_url: http://arxiv.org/abs/2310.19106
repo_url: None
paper_authors: Antonin Sulc, Raimund Kammering, Annika Eichler, Tim Wilksen
for: 提高加速器设备的理解和解释能力
methods: 使用公开available的加速器资源（如会议、预印文章和书籍）自动生成问题和数据集，并使用Language模型进行精细调整
results: PACuna可以解决复杂的加速器问题，并被专家 validateTranslation:
for: 提高加速器设备的理解和解释能力
methods: 使用公开available的加速器资源（如会议、预印文章和书籍）自动生成问题和数据集，并使用Language模型进行精细调整
results: PACuna可以解决复杂的加速器问题，并被专家 validate

Abstract
Navigating the landscape of particle accelerators has become increasingly challenging with recent surges in contributions. These intricate devices challenge comprehension, even within individual facilities. To address this, we introduce PACuna, a fine-tuned language model refined through publicly available accelerator resources like conferences, pre-prints, and books. We automated data collection and question generation to minimize expert involvement and make the data publicly available. PACuna demonstrates proficiency in addressing intricate accelerator questions, validated by experts. Our approach shows adapting language models to scientific domains by fine-tuning technical texts and auto-generated corpora capturing the latest developments can further produce pre-trained models to answer some intricate questions that commercially available assistants cannot and can serve as intelligent assistants for individual facilities.

摘要
在加速器领域的探索中，由于最近的贡献增加， navigating 已成为越来越复杂的任务。这些细腻的设备会使人们感到困惑，甚至在同一个设施内。为解决这问题，我们介绍了 PACuna，一种精度调整的语言模型，通过公共可用的加速器资源，如会议、预印和书籍来优化。我们自动收集数据和生成问题，以最小化专家参与度，并将数据公开可用。 PACuna 在解决复杂的加速器问题方面表现出色，由专家 validate。我们的方法表明，通过科学领域中的技术文本和自动生成的 corpora 来练习语言模型，可以生成适用于一些复杂问题的预训练模型，这些模型可以作为加速器设施的智能助手。

Pushdown Layers: Encoding Recursive Structure in Transformer Language Models

paper_url: http://arxiv.org/abs/2310.19089
repo_url: None
paper_authors: Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning
for: This paper aims to improve the syntactic generalization of Transformer language models by introducing a new self-attention layer called Pushdown Layers.
methods: The Pushdown Layers model recursive state via a stack tape that tracks estimated depths of every token, and the Transformer LMs with Pushdown Layers use this stack tape to softly modulate attention over tokens.
results: The authors achieve dramatically better and 3-5x more sample-efficient syntactic generalization when training Transformers equipped with Pushdown Layers on a corpus of strings annotated with silver constituency parses, while maintaining similar perplexities.

Abstract
Recursion is a prominent feature of human language, and fundamentally challenging for self-attention due to the lack of an explicit recursive-state tracking mechanism. Consequently, Transformer language models poorly capture long-tail recursive structure and exhibit sample-inefficient syntactic generalization. This work introduces Pushdown Layers, a new self-attention layer that models recursive state via a stack tape that tracks estimated depths of every token in an incremental parse of the observed prefix. Transformer LMs with Pushdown Layers are syntactic language models that autoregressively and synchronously update this stack tape as they predict new tokens, in turn using the stack tape to softly modulate attention over tokens -- for instance, learning to "skip" over closed constituents. When trained on a corpus of strings annotated with silver constituency parses, Transformers equipped with Pushdown Layers achieve dramatically better and 3-5x more sample-efficient syntactic generalization, while maintaining similar perplexities. Pushdown Layers are a drop-in replacement for standard self-attention. We illustrate this by finetuning GPT2-medium with Pushdown Layers on an automatically parsed WikiText-103, leading to improvements on several GLUE text classification tasks.

摘要
人类语言中具有重要特点的一种是Recursion，它对于自注意机制的缺乏显式状态跟踪机制而具有挑战性。因此，Transformer语言模型在捕捉长尾递归结构方面表现不佳，并且 exhibit sample-inefficient sintactic generalization。这项工作介绍了Pushdown层，一种新的自注意层，通过一个堆栈带跟踪每个字符的估计深度来模型 recursive state。Transformer LMs WITH Pushdown Layers 是一种强式语言模型，可以同步和顺序地更新这个堆栈带，并在预测新字符时使用堆栈来软模式地修饰注意力。例如，学习"跳过"关闭的成分。当在一个Silver Constituency Parses 的集合上训练 Transformer 时，它们配备 Pushdown Layers 可以在同样的批量大小下达到更好的 3-5 倍的样本效率，同时保持相似的折衔率。Pushdown Layers 是一种可替换的自注意层。我们通过对 GPT2-medium WITH Pushdown Layers 在自动生成的 WikiText-103 上进行训练，来示例这一点。

A Survey on Recent Named Entity Recognition and Relation Classification Methods with Focus on Few-Shot Learning Approaches

paper_url: http://arxiv.org/abs/2310.19055
repo_url: None
paper_authors: Sakher Alqaaidi, Elika Bozorgi
for: 本研究主要针对非结构化文本中的命名实体识别和关系类型分类两个关键阶段，以抽取有用信息。
methods: 本文主要介绍了最新的非结构化文本处理应用中的命名实体识别和关系类型分类方法，特别是几步学习方法。
results: 本文对两个领域的最新成果进行了比较分析，并对几步学习方法的结果进行了结构化分析。

Abstract
Named entity recognition and relation classification are key stages for extracting information from unstructured text. Several natural language processing applications utilize the two tasks, such as information retrieval, knowledge graph construction and completion, question answering and other domain-specific applications, such as biomedical data mining. We present a survey of recent approaches in the two tasks with focus on few-shot learning approaches. Our work compares the main approaches followed in the two paradigms. Additionally, we report the latest metric scores in the two tasks with a structured analysis that considers the results in the few-shot learning scope.

摘要
Named entity recognition和关系分类是抽取无结构文本信息的关键阶段。许多自然语言处理应用程序利用这两个任务，如信息检索、知识图构建和完善、问答等领域应用程序，以及生物医学数据挖掘等领域应用程序。我们对最近的方法进行了评论，并对几种学习 paradigms进行了比较。此外，我们还对这两个任务中最新的 метри克分数进行了报告，并进行了结构化分析，考虑到几种少量学习范围内的结果。

ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic

paper_url: http://arxiv.org/abs/2310.19034
repo_url: None
paper_authors: Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, Sana Ghanem
for: 本研究开发了一个大型的阿拉伯语言Intent检测dataset，名为ArBanking77，并将其 arabized 和 localized 到了英文 Banking77 dataset。
methods: 本研究使用了一个基于 AraBERT 的神经网络模型，并在 ArBanking77 上进行了 fine-tuning，以达到了 F1-score 的 0.9209 和 0.8995 在 Modern Standard Arabic 和 Palestinian dialect 中 respectively。
results: 本研究实现了对 live chat 查询中的实际应用，并在 simulated low-resource 环境下进行了广泛的实验，以评估模型在不同的情况下的表现。

Abstract
This paper presents the ArBanking77, a large Arabic dataset for intent detection in the banking domain. Our dataset was arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries to ArBanking77 dataset with 31,404 queries in both Modern Standard Arabic (MSA) and Palestinian dialect, with each query classified into one of the 77 classes (intents). Furthermore, we present a neural model, based on AraBERT, fine-tuned on ArBanking77, which achieved an F1-score of 0.9209 and 0.8995 on MSA and Palestinian dialect, respectively. We performed extensive experimentation in which we simulated low-resource settings, where the model is trained on a subset of the data and augmented with noisy queries to simulate colloquial terms, mistakes and misspellings found in real NLP systems, especially live chat queries. The data and the models are publicly available at https://sina.birzeit.edu/arbanking77.

摘要
Note: "AraBERT" is a pre-trained Arabic language model, similar to BERT.

SALMA: Arabic Sense-Annotated Corpus and WSD Benchmarks

paper_url: http://arxiv.org/abs/2310.19029
repo_url: None
paper_authors: Mustafa Jarrar, Sanad Malaysha, Tymaa Hammouda, Mohammed Khalilia
for: 这个论文是为了描述一个新的阿拉伯语意义权重annotated corpus（SALMA），以及该 corpus 的注释工具和评估 metric。
methods: 这个论文使用了两种不同的意义инвенタри（Modern和Ghani）同时进行注释，并为每个词语提供了多个意义的分数。在注释过程中，研究人员还使用了六种名称实体的注释。
results: 研究人员通过了多种 metric（Kappa、Lineal Weighted Kappa、Quadratic Weighted Kappa、Mean Average Error、Root Mean Square Error）来评估注释质量，并发现了非常高的间接对应者一致性。此外，研究人员还开发了一个基于 Target Sense Verification 的 Word Sense Disambiguation 系统，并使用这个系统来评估三种 Target Sense Verification 模型的性能，其中最佳模型的准确率达到了 84.2%（使用 Modern）和 78.7%（使用 Ghani）。

Abstract
SALMA, the first Arabic sense-annotated corpus, consists of ~34K tokens, which are all sense-annotated. The corpus is annotated using two different sense inventories simultaneously (Modern and Ghani). SALMA novelty lies in how tokens and senses are associated. Instead of linking a token to only one intended sense, SALMA links a token to multiple senses and provides a score to each sense. A smart web-based annotation tool was developed to support scoring multiple senses against a given word. In addition to sense annotations, we also annotated the corpus using six types of named entities. The quality of our annotations was assessed using various metrics (Kappa, Linear Weighted Kappa, Quadratic Weighted Kappa, Mean Average Error, and Root Mean Square Error), which show very high inter-annotator agreement. To establish a Word Sense Disambiguation baseline using our SALMA corpus, we developed an end-to-end Word Sense Disambiguation system using Target Sense Verification. We used this system to evaluate three Target Sense Verification models available in the literature. Our best model achieved an accuracy with 84.2% using Modern and 78.7% using Ghani. The full corpus and the annotation tool are open-source and publicly available at https://sina.birzeit.edu/salma/.

摘要
SALMA，首个阿拉伯语意义注释 корпу斯，包含约34000个字符，所有字符都有意义注释。 corpora 使用两个不同的意义集 simultaneously (Modern 和 Ghani)。 SALMA 的创新在于如何将字符和意义相关联。而不是将字符与一个固定的意义相关联，SALMA 将字符与多个意义相关联，并为每个意义提供一个分数。为支持多个意义对一个词的分数，我们开发了一个智能的网络基于的注释工具。此外，我们还对 corpora 进行了六种命名实体的注释。我们对注释的质量进行了多种 metric 评估（Kappa、线性权重Kappa、quadratic Weighted Kappa、平均误差和根平方误差），它们显示了非常高的间对注释者一致性。为建立基于我们 SALMA корпу的单词意义推断基线，我们开发了一个 Target Sense Verification 基于的全 End-to-end Word Sense Disambiguation 系统。我们使用这个系统来评估Literature 中提供的三种 Target Sense Verification 模型。我们的最佳模型在Modern 和 Ghani 中达到了84.2%和78.7%的准确率。整个corpus 和注释工具都是开源的，可以在获取。

LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection

paper_url: http://arxiv.org/abs/2310.18964
repo_url: None
paper_authors: Ahmad Nasir, Aadish Sharma, Kokil Jaidka
for: 这 paper 比较了不同的预训练和精度调整的大语言模型（LLMs）在仇恨言语检测中的表现。
methods: 本研究发现了cross-domain 适用性和过拟合风险是LLMs的主要挑战。我们通过评估发现了需要更多的标签多样性来让模型更好地捕捉仇恨言语的细节。
results: 我们的研究结果表明，通过适度调整和更多的标签多样性，可以提高模型的泛化性和检测精度。我们认为未来的仇恨言语检测应该强调cross-domain泛化和合适的benchmarking实践。

Abstract
This paper compares different pre-trained and fine-tuned large language models (LLMs) for hate speech detection. Our research underscores challenges in LLMs' cross-domain validity and overfitting risks. Through evaluations, we highlight the need for fine-tuned models that grasp the nuances of hate speech through greater label heterogeneity. We conclude with a vision for the future of hate speech detection, emphasizing cross-domain generalizability and appropriate benchmarking practices.

摘要
这篇论文比较了不同的预训练和微调大型自然语言模型（LLM）对仇视言语检测的性能。我们的研究强调了LLM在不同领域的交叉领域有效性和过拟合风险。通过评估，我们强调需要微调模型，以便更好地捕捉仇视言语的细节和多样性。我们 conclude with a future vision for hate speech detection，强调跨领域一致性和合适的标准化实践。Note that the word " LL" in the original text was translated as "LLM" in Simplified Chinese, as "LL" is not a commonly used term in Simplified Chinese.

S2F-NER: Exploring Sequence-to-Forest Generation for Complex Entity Recognition

paper_url: http://arxiv.org/abs/2310.18944
repo_url: None
paper_authors: Yongxiu Xu, Heyan Huang, Yue Hu
for: 这篇论文主要针对复杂的实体识别问题（Named Entity Recognition，NER），例如嵌入、重叠和不连续的实体。
methods: 我们提出了一种新的序列到森林生成模式（Sequence-to-Forest，S2F-NER），它可以直接在句子中提取实体，而不是采用传统的序列到序列（Sequence-to-Sequence，Seq2Seq）生成模式。
results: 我们的模型在三个不连续NER数据集和两个嵌入NER数据集上表现出色，特别是对于不连续实体识别。

Abstract
Named Entity Recognition (NER) remains challenging due to the complex entities, like nested, overlapping, and discontinuous entities. Existing approaches, such as sequence-to-sequence (Seq2Seq) generation and span-based classification, have shown impressive performance on various NER subtasks, but they are difficult to scale to datasets with longer input text because of either exposure bias issue or inefficient computation. In this paper, we propose a novel Sequence-to-Forest generation paradigm, S2F-NER, which can directly extract entities in sentence via a Forest decoder that decode multiple entities in parallel rather than sequentially. Specifically, our model generate each path of each tree in forest autoregressively, where the maximum depth of each tree is three (which is the shortest feasible length for complex NER and is far smaller than the decoding length of Seq2Seq). Based on this novel paradigm, our model can elegantly mitigates the exposure bias problem and keep the simplicity of Seq2Seq. Experimental results show that our model significantly outperforms the baselines on three discontinuous NER datasets and on two nested NER datasets, especially for discontinuous entity recognition.

摘要

Retrofitting Light-weight Language Models for Emotions using Supervised Contrastive Learning

paper_url: http://arxiv.org/abs/2310.18930
repo_url: None
paper_authors: Sapan Shah, Sreedhar Reddy, Pushpak Bhattacharyya
for: 这篇论文旨在探讨如何将情感方面的知识嵌入预训语言模型（BERT和RoBERTa）中，以提高模型的情感识别能力。
methods: 这篇论文使用对照学习方法将预训网络重新训练，使得文本片段具有相似情感时，在表现空间中被更加靠近地编码，而具有不同情感内容时则被推离。同时，这篇论文还确保了预训网络中对语言知识的不偏独影响。
results: 这篇论文的结果显示，使用这种方法更新预训网络的BERT和RoBERTa模型，可以实现情感识别的改进。对于情感分析和讽刺检测任务，这些模型比预训网络原始版本（约1%的提升）和其他已知方法更好。此外，这些更新后的模型在少量学习设定下表现更好。

Abstract
We present a novel retrofitting method to induce emotion aspects into pre-trained language models (PLMs) such as BERT and RoBERTa. Our method updates pre-trained network weights using contrastive learning so that the text fragments exhibiting similar emotions are encoded nearby in the representation space, and the fragments with different emotion content are pushed apart. While doing so, it also ensures that the linguistic knowledge already present in PLMs is not inadvertently perturbed. The language models retrofitted by our method, i.e., BERTEmo and RoBERTaEmo, produce emotion-aware text representations, as evaluated through different clustering and retrieval metrics. For the downstream tasks on sentiment analysis and sarcasm detection, they perform better than their pre-trained counterparts (about 1% improvement in F1-score) and other existing approaches. Additionally, a more significant boost in performance is observed for the retrofitted models over pre-trained ones in few-shot learning setting.

摘要
我们提出了一种新的改进方法，用于启用语言模型（PLM）中的情感方面，如BERT和RoBERTa。我们的方法通过对预训练网络权重进行更新，使得表达同样情感的文本片段在表示空间中相近，而表达不同情感的片段则被推迟。同时，我们的方法 также确保了预训练语言模型中的语言知识不会偶然受到影响。我们修改后的语言模型，即BERTEmo和RoBERTaEmo，可以生成情感意识的文本表示，根据不同的聚类和检索指标进行评估。在情感分析和讽刺检测下投入下，它们与预训练模型（大约1%的提升）和其他现有方法相比，表现出较好的性能。此外，我们发现在少量学习 Setting中，修改后的模型比预训练模型表现更好，具有更大的提升。

Sentence Bag Graph Formulation for Biomedical Distant Supervision Relation Extraction

paper_url: http://arxiv.org/abs/2310.18912
repo_url: None
paper_authors: Hao Zhang, Yang Liu, Xiaoyan Liu, Tianming Liang, Gaurav Sharma, Liang Xue, Maozu Guo
for: 提高生物医学数据中 distant supervision relation extraction 的精度和效果。
methods: 提出了一种基于图的框架，使用 message-passing 的信息汇集机制，解决了远级指导关系提取中的噪声标注问题，同时也能够有效地捕捉句子袋内sentence之间的依赖关系。
results: 在两个大规模生物医学关系集和 NYT 集上进行了广泛的实验，并证明了我们提出的方法可以在生物医学数据中 distant supervision relation extraction 中表现出色，同时也在普通文本挖掘领域中表现出优秀的relation extraction 能力。

Abstract
We introduce a novel graph-based framework for alleviating key challenges in distantly-supervised relation extraction and demonstrate its effectiveness in the challenging and important domain of biomedical data. Specifically, we propose a graph view of sentence bags referring to an entity pair, which enables message-passing based aggregation of information related to the entity pair over the sentence bag. The proposed framework alleviates the common problem of noisy labeling in distantly supervised relation extraction and also effectively incorporates inter-dependencies between sentences within a bag. Extensive experiments on two large-scale biomedical relation datasets and the widely utilized NYT dataset demonstrate that our proposed framework significantly outperforms the state-of-the-art methods for biomedical distant supervision relation extraction while also providing excellent performance for relation extraction in the general text mining domain.

摘要
我们提出了一种新的图structure-based框架，用于解决远程supervised关系抽取中的一些主要挑战，并在生物医学数据中进行了实质性的证明。特别是，我们提出了一种将句子袋视为实体对的图视图，使得对于实体对的信息在句子袋中进行消息传递基于的聚合。该提议的框架可以解决远程supervised关系抽取中的常见问题，即标签杂乱，并有效地 incorporate句子之间的依赖关系。我们在两个大规模的生物医学关系数据集和常用的NYT数据集上进行了广泛的实验，得到了我们提议的框架在生物医学关系抽取中的显著超越州方法的性能，同时也在文本挖掘领域中表现出色。

Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition

paper_url: http://arxiv.org/abs/2310.18877
repo_url: https://github.com/isaaconline/speat
paper_authors: Isaac Slaughter, Craig Greenberg, Reva Schwartz, Aylin Caliskan
for: 这个研究旨在检测语音处理模型中的偏见，具体来说是检测预训练模型中的偏见。
methods: 这个研究使用了Speech Embedding Association Test（SpEAT）来检测预训练模型中的偏见。SpEAT是基于自然语言处理中的词嵌入协会测试，用于量化模型对不同概念的偏见，如种族、性别等。
results: 这个研究发现了14种预训练模型中的偏见，包括abled人群对disabled人群的正面偏见、欧洲裔美国人群对非洲裔美国人群的正面偏见、女性对♂的正面偏见、美国口音 speaker对非美国口音 speaker的正面偏见、年轻人群对老年人群的正面偏见。此外，研究还发现了这些偏见在下游任务Speech Emotion Recognition（SER）中的影响。在66个测试中（69%），由SpEAT测试发现的偏见与SER任务中的偏见相关。

Abstract
Previous work has established that a person's demographics and speech style affect how well speech processing models perform for them. But where does this bias come from? In this work, we present the Speech Embedding Association Test (SpEAT), a method for detecting bias in one type of model used for many speech tasks: pre-trained models. The SpEAT is inspired by word embedding association tests in natural language processing, which quantify intrinsic bias in a model's representations of different concepts, such as race or valence (something's pleasantness or unpleasantness) and capture the extent to which a model trained on large-scale socio-cultural data has learned human-like biases. Using the SpEAT, we test for six types of bias in 16 English speech models (including 4 models also trained on multilingual data), which come from the wav2vec 2.0, HuBERT, WavLM, and Whisper model families. We find that 14 or more models reveal positive valence (pleasantness) associations with abled people over disabled people, with European-Americans over African-Americans, with females over males, with U.S. accented speakers over non-U.S. accented speakers, and with younger people over older people. Beyond establishing that pre-trained speech models contain these biases, we also show that they can have real world effects. We compare biases found in pre-trained models to biases in downstream models adapted to the task of Speech Emotion Recognition (SER) and find that in 66 of the 96 tests performed (69%), the group that is more associated with positive valence as indicated by the SpEAT also tends to be predicted as speaking with higher valence by the downstream model. Our work provides evidence that, like text and image-based models, pre-trained speech based-models frequently learn human-like biases. Our work also shows that bias found in pre-trained models can propagate to the downstream task of SER.

摘要
先前的研究已经证明人的民族和语言风格会影响语音处理模型对他们的性能。但是这种偏见来自哪里？在这项工作中，我们介绍了语音嵌入协会测试（SpEAT），用于检测语音处理模型中的偏见。SpEAT Draws inspiration from natural language processing中的嵌入协会测试，用于衡量不同概念的嵌入表示，如种族或语言风格，并捕捉模型从大规模社会文化数据中学习的人类化偏见。使用SpEAT，我们测试了16种英语语音模型（包括4种多语言模型），来自wav2vec 2.0、HuBERT、WavLM和Whisper模型家族。我们发现14个或更多的模型表现出了有利可能（愉悦）偏见，即abled人群比 disabled人群更有利可能，European-Americans比 African-Americans更有利可能，女性比男性更有利可能，U.S.口音说话者比非U.S.口音说话者更有利可能，以及年轻人比老年人更有利可能。我们不仅证明了语音处理模型中的这些偏见，还表明它们可能有实际的影响。我们比较了预训练模型中的偏见和下游任务speech emotion recognition（SER）模型中的偏见，发现在96次测试中（69%），与预训练模型中的偏见相关的组 Also tends to be predicted as speaking with higher valence by the downstream model。我们的工作证明了，如文本和图像基于模型一样，预训练语音基于模型经常学习人类化偏见。我们的工作还表明了预训练模型中的偏见可能会传播到下游任务中。

MUST: A Multilingual Student-Teacher Learning approach for low-resource speech recognition

paper_url: http://arxiv.org/abs/2310.18865
repo_url: None
paper_authors: Muhammad Umar Farooq, Rehan Ahmad, Thomas Hain
for: 本研究旨在解决语音识别系统训练中数据稀缺问题，通过学生教师学习（KD）方法。
methods: 本研究使用的方法包括提议一种多语言学生教师（MUST）学习方法，利用一个预训练的映射模型将教师语言的 posterior 映射到学生语言的 ASR 模型中。
results: 根据实验结果，使用 MUST 学习方法可以将Relative Character Error Rate（CER）降低到9.5%，相比基eline monolingual ASR 模型。

Abstract
Student-teacher learning or knowledge distillation (KD) has been previously used to address data scarcity issue for training of speech recognition (ASR) systems. However, a limitation of KD training is that the student model classes must be a proper or improper subset of the teacher model classes. It prevents distillation from even acoustically similar languages if the character sets are not same. In this work, the aforementioned limitation is addressed by proposing a MUltilingual Student-Teacher (MUST) learning which exploits a posteriors mapping approach. A pre-trained mapping model is used to map posteriors from a teacher language to the student language ASR. These mapped posteriors are used as soft labels for KD learning. Various teacher ensemble schemes are experimented to train an ASR model for low-resource languages. A model trained with MUST learning reduces relative character error rate (CER) up to 9.5% in comparison with a baseline monolingual ASR.

摘要
学生教师学习或知识蒸馏（KD）已经曾用于解决训练语音识别（ASR）系统的数据稀缺问题。然而，KD 训练的一个限制是学生模型类型必须是教师模型类型的正确或错误子集。这会防止训练不同语言的扩展，即使字符集不同。在这种情况下，我们提出了一种多语言学生教师（MUST）学习方法，利用 posterior mapping 技术。我们使用一个预训练的映射模型将教师语言的 posterior 映射到学生语言 ASR 中。这些映射 posterior 用作 KD 学习的软标签。我们对各种教师集合方案进行了实验，以训练一个低资源语言的 ASR 模型。与基线单语言 ASR 模型相比，我们的 MUST 学习方法可以降低相对字符错误率（CER）的差异为9.5%。

Counterfactually Probing Language Identity in Multilingual Models

paper_url: http://arxiv.org/abs/2310.18862
repo_url: https://github.com/venkatasg/multilingual-counterfactual-probing
paper_authors: Anirudh Srinivasan, Venkata S Govindarajan, Kyle Mahowald
for: 这 paper 探讨了语言模型中语言信息的组织方式，使用一种技术 called AlterRep 进行 counterfactual probing。
methods: 作者使用了一种 linear classifier 来解释 tokens 的语言标识 Task，并通过 counterfactual probing 方法来探讨模型的内部结构。
results: 研究发现，给定一个 Language X 模板，向 Language Y 方向推动 embedding 会系统性地增加 Language Y 词汇的概率，超过第三方控制语言。但是，这并不特别地推动模型转化为翻译相当的 Language Y 词汇。向 Language X 方向推动也有一定的效果，但是会有些程度下降。总之，这些结果表明大量多语言语言模型具有both语言特定和语言通用的结构，并且 counterfactual probing 可以成功应用于多语言模型。

Abstract
Techniques in causal analysis of language models illuminate how linguistic information is organized in LLMs. We use one such technique, AlterRep, a method of counterfactual probing, to explore the internal structure of multilingual models (mBERT and XLM-R). We train a linear classifier on a binary language identity task, to classify tokens between Language X and Language Y. Applying a counterfactual probing procedure, we use the classifier weights to project the embeddings into the null space and push the resulting embeddings either in the direction of Language X or Language Y. Then we evaluate on a masked language modeling task. We find that, given a template in Language X, pushing towards Language Y systematically increases the probability of Language Y words, above and beyond a third-party control language. But it does not specifically push the model towards translation-equivalent words in Language Y. Pushing towards Language X (the same direction as the template) has a minimal effect, but somewhat degrades these models. Overall, we take these results as further evidence of the rich structure of massive multilingual language models, which include both a language-specific and language-general component. And we show that counterfactual probing can be fruitfully applied to multilingual models.

摘要
使用 causal 分析技术可以探索语言模型（mBERT 和 XLM-R）中的语言信息结构。我们使用一种方法——Counterfactual probing，以探索这些模型的内部结构。我们在一个 binary 语言标识任务上训练了一个线性分类器，以分类Token是来自哪种语言。通过对这些分类器权重进行Counterfactual probing操作，我们可以将表示Vector проек到null空间中，并将其推动向Language X 或 Language Y 方向。然后，我们在一个隐藏语言模型任务上进行评估。我们发现，当给定一个 Language X 模板时，推动向 Language Y 方向会系统地增加 Language Y 词汇的概率，而这与第三种控制语言相比，这种效果明显。但是，不会 Specifically push 模型向翻译相同的 Language Y 词汇方向。推动向 Language X （与模板相同的方向）的效果相对较小，但是会有一定程度的降低这些模型的性能。总之，我们认为这些结果是证明大型多语言语言模型具有了rich结构，包括语言特定和语言通用的组成部分。同时，我们示出了对多语言模型的 counterfactual probing 可以得到有用的结果。

2023-10-29

cs.LG

cs.LG - 2023-10-29

Improved Motor Imagery Classification Using Adaptive Spatial Filters Based on Particle Swarm Optimization Algorithm

paper_url: http://arxiv.org/abs/2310.19202
repo_url: None
paper_authors: Xiong Xiong, Ying Wang, Tianyuan Song, Jinguo Huang, Guixia Kang
for: 这个论文主要针对的应用领域是 robot控制、roke rehabilitation 和 stroke 或脊梁损伤患者的助手。
methods: 该论文提出了一种基于 particle swarm optimization 算法 (PSO) 的适应空间筛选解决方案，用于提取 MI-EEG 信号中更加有效的空间特征，以提高分类性能。
results: Comparative experiments 表明，该提案的方法在两个公共数据集（2a 和 2b）上实现了显著的平均识别率提高，达到 74.61% 和 81.19% 分别。相比基eline algorithm（FBCSP），该提案的算法提高了 11.44% 和 7.11% 在两个数据集上。

Abstract
As a typical self-paced brain-computer interface (BCI) system, the motor imagery (MI) BCI has been widely applied in fields such as robot control, stroke rehabilitation, and assistance for patients with stroke or spinal cord injury. Many studies have focused on the traditional spatial filters obtained through the common spatial pattern (CSP) method. However, the CSP method can only obtain fixed spatial filters for specific input signals. Besides, CSP method only focuses on the variance difference of two types of electroencephalogram (EEG) signals, so the decoding ability of EEG signals is limited. To obtain more effective spatial filters for better extraction of spatial features that can improve classification to MI-EEG, this paper proposes an adaptive spatial filter solving method based on particle swarm optimization algorithm (PSO). A training and testing framework based on filter bank and spatial filters (FBCSP-ASP) is designed for MI EEG signal classification. Comparative experiments are conducted on two public datasets (2a and 2b) from BCI competition IV, which show the outstanding average recognition accuracy of FBCSP-ASP. The proposed method has achieved significant performance improvement on MI-BCI. The classification accuracy of the proposed method has reached 74.61% and 81.19% on datasets 2a and 2b, respectively. Compared with the baseline algorithm (FBCSP), the proposed algorithm improves 11.44% and 7.11% on two datasets respectively. Furthermore, the analysis based on mutual information, t-SNE and Shapley values further proves that ASP features have excellent decoding ability for MI-EEG signals, and explains the improvement of classification performance by the introduction of ASP features.

摘要
如常的自适应脑机器接口（BCI）系统中，运动想象（MI）BCI已经广泛应用在机器人控制、roke rehabilitation 和roke 或脊梁损伤患者的助手等领域。许多研究都集中在传统的空间筛选（CSP）方法上。然而，CSP 方法只能获得特定输入信号的固定空间筛选。此外，CSP 方法只关注两种电enzephalogram（EEG）信号之间的差异，因此EEG 信号的解码能力受限。为了获得更有效的空间筛选，提高MI-EEG 信号的特征提取，这篇文章提出了基于聚合粒子猎 optimization 算法（PSO）的自适应空间筛选解决方案。为MI EEG 信号类型的分类，设计了一个基于筛选银行和空间筛选（FBCSP-ASP）的训练和测试框架。对于BCI 竞赛 IV 公共数据集（2a 和 2b）进行了比较性实验，实验结果表明，提案方法在MI-BCI 中 achieved significant performance improvement。提案方法的识别率为74.61% 和 81.19% 在数据集 2a 和 2b 中，相比基准算法（FBCSP），提案算法提高了11.44% 和 7.11% 在两个数据集中。此外，基于mutual information、t-SNE 和 Shapley 值的分析进一步证明了ASP 特征对MI-EEG 信号的解码能力具有优秀性，并解释了提案方法的性能提升原因。

Enhancing Motor Imagery Decoding in Brain Computer Interfaces using Riemann Tangent Space Mapping and Cross Frequency Coupling

paper_url: http://arxiv.org/abs/2310.19198
repo_url: None
paper_authors: Xiong Xiong, Li Su, Jinguo Huang, Guixia Kang
for: 这个论文的目的是提高motor imagery（MI）特征的编码和解码能力。methods: 这篇论文提出了一种基于Riemannian geometry和Cross-Frequency Coupling（CFC）的新方法，称为Riemann Tangent Space Mapping using Dichotomous Filter Bank with Convolutional Neural Network（DFBRTS），用于提高MI特征的表达质量和解码能力。DFBRTS使用了一个完整的二进制树结构的 dichotomous filter bank来精炼EEG信号，然后使用Riemann Tangent Space Mapping提取每个子带中的突出的EEG信号特征。最后，一个轻量级的卷积神经网络被用于进一步提取特征和分类，在彼此之间同时受到了cross-entropy和center loss的联合监督。results: 对于BCI竞赛IV 2a（BCIC-IV-2a）数据集和OpenBMI数据集进行了广泛的实验，DFBRTS在两个数据集上显示出了明显的优异性，在四个类和二个类的保留分类中分别达到了78.16%和71.58%的高精度分类率，与现有的参考值进行比较。

Abstract
Objective: Motor Imagery (MI) serves as a crucial experimental paradigm within the realm of Brain Computer Interfaces (BCIs), aiming to decoding motor intentions from electroencephalogram (EEG) signals. Method: Drawing inspiration from Riemannian geometry and Cross-Frequency Coupling (CFC), this paper introduces a novel approach termed Riemann Tangent Space Mapping using Dichotomous Filter Bank with Convolutional Neural Network (DFBRTS) to enhance the representation quality and decoding capability pertaining to MI features. DFBRTS first initiates the process by meticulously filtering EEG signals through a Dichotomous Filter Bank, structured in the fashion of a complete binary tree. Subsequently, it employs Riemann Tangent Space Mapping to extract salient EEG signal features within each sub-band. Finally, a lightweight convolutional neural network is employed for further feature extraction and classification, operating under the joint supervision of cross-entropy and center loss. To validate the efficacy, extensive experiments were conducted using DFBRTS on two well-established benchmark datasets: the BCI competition IV 2a (BCIC-IV-2a) dataset and the OpenBMI dataset. The performance of DFBRTS was benchmarked against several state-of-the-art MI decoding methods, alongside other Riemannian geometry-based MI decoding approaches. Results: DFBRTS significantly outperforms other MI decoding algorithms on both datasets, achieving a remarkable classification accuracy of 78.16% for four-class and 71.58% for two-class hold-out classification, as compared to the existing benchmarks.

摘要
目的：使用电气生物 интерфей斯（BCI）中的动作幻像（MI）作为关键实验方法，从电气生物学信号（EEG）中提取动作意图。方法：基于里敦纬度 geometry和跨频相关（CFC），这篇论文提出了一种新的方法，即里敦 Tangent Space Mapping using Dichotomous Filter Bank with Convolutional Neural Network（DFBRTS），以提高MI特征的表达质量和解码能力。DFBRTS首先通过完整的 binary tree 结构的 dichotomous Filter Bank 精细筛选 EEG 信号，然后使用里敦 Tangent Space Mapping 提取每个子带中的优秀 EEG 信号特征。最后，一种轻量级的 convolutional neural network 进行进一步的特征提取和分类，在joint 超VI 和中心损失的协同监督下运行。为证明DFBRTS的效果，对DFBRTS进行了广泛的实验，并与其他里敦 geometry 基于MI解码方法进行比较。结果：DFBRTS在两个常用的benchmark数据集上（BCIC-IV-2a数据集和OpenBMI数据集）上显著地超过了其他MI解码方法，达到了78.16%的四类分类率和71.58%的二类分类率。

Conformal Normalization in Recurrent Neural Network of Grid Cells

paper_url: http://arxiv.org/abs/2310.19192
repo_url: None
paper_authors: Dehong Xu, Ruiqi Gao, Wen-Hao Zhang, Xue-Xin Wei, Ying Nian Wu
for: 该研究探讨了grid cells在脑内的响应模式，以及这些模式如何影响agent的 Navigation。
methods: 研究使用了高维神经网络，并提出了一种简单而普遍的准确normalization方法，以便在agent移动时保持响应vector的正确尺寸。
results: 实验结果表明，通过使用准确normalization方法，grid cells可以形成六角形响应模式，并且这些模式与agent的实际位置在2D physical space有直接的关系。

Abstract
Grid cells in the entorhinal cortex of the mammalian brain exhibit striking hexagon firing patterns in their response maps as the animal (e.g., a rat) navigates in a 2D open environment. The responses of the population of grid cells collectively form a vector in a high-dimensional neural activity space, and this vector represents the self-position of the agent in the 2D physical space. As the agent moves, the vector is transformed by a recurrent neural network that takes the velocity of the agent as input. In this paper, we propose a simple and general conformal normalization of the input velocity for the recurrent neural network, so that the local displacement of the position vector in the high-dimensional neural space is proportional to the local displacement of the agent in the 2D physical space, regardless of the direction of the input velocity. Our numerical experiments on the minimally simple linear and non-linear recurrent networks show that conformal normalization leads to the emergence of the hexagon grid patterns. Furthermore, we derive a new theoretical understanding that connects conformal normalization to the emergence of hexagon grid patterns in navigation tasks.

摘要
“ENTORHINAL CORTEX中的格子细胞在动物（例如鼠）在2D开放环境中探索时表现出惊人的六角发射模式。这些细胞的响应集体形成一个高维神经活动空间中的向量，该向量表示动物的自身位置在2D物理空间中。随着动物的移动，这个向量被一个循环神经网络转换，该神经网络的输入是动物的速度。在这篇论文中，我们提出了一种简单而普遍的几何正常化方法，以确保输入速度的local displacement在高维神经空间中与动物在2D物理空间中的local displacement成正比。我们的数值实验表明，几何正常化会导致格子网格模式的出现。此外，我们还derived一种新的理论理解，该理解连接几何正常化与导航任务中的格子网格模式的出现。”Note: The translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.

The Power of Explainability in Forecast-Informed Deep Learning Models for Flood Mitigation

paper_url: http://arxiv.org/abs/2310.19166
repo_url: None
paper_authors: Jimeng Shi, Vitalii Stebliankin, Giri Narasimhan
for: 这 paper 是为了提出一种基于深度学习架构的洪水管理方法，以优化洪水预 release 的决策。
methods: 这 paper 使用了 Forecast Informed Deep Learning Architecture (FIDLAR)， combinig 预测和深度学习来实现洪水管理。
results: 实验结果表明，FIDLAR 比现有的状态艺术有几个数量级的速度提高，并且可以提供更好的预 release 决策。这些速度提高使得 FIDLAR 可以用于实时洪水管理。此外，这 paper 还使用了工具来解释模型的决策，从而更好地理解洪水管理中环境因素的贡献。

Abstract
Floods can cause horrific harm to life and property. However, they can be mitigated or even avoided by the effective use of hydraulic structures such as dams, gates, and pumps. By pre-releasing water via these structures in advance of extreme weather events, water levels are sufficiently lowered to prevent floods. In this work, we propose FIDLAR, a Forecast Informed Deep Learning Architecture, achieving flood management in watersheds with hydraulic structures in an optimal manner by balancing out flood mitigation and unnecessary wastage of water via pre-releases. We perform experiments with FIDLAR using data from the South Florida Water Management District, which manages a coastal area that is highly prone to frequent storms and floods. Results show that FIDLAR performs better than the current state-of-the-art with several orders of magnitude speedup and with provably better pre-release schedules. The dramatic speedups make it possible for FIDLAR to be used for real-time flood management. The main contribution of this paper is the effective use of tools for model explainability, allowing us to understand the contribution of the various environmental factors towards its decisions.

摘要
洪水可以带来惊人的破坏力和财产损失。然而，通过有效地使用 гидро利用结构，如坝、闸门和泵，可以减轻或缓解洪水的影响。在这种情况下，我们提出了 FIDLAR，一种基于预测的深度学习架构，通过在洪水事件前预先释放水，以达到Optimal flood management in watersheds with hydraulic structures by balancing flood mitigation and water wastage via pre-releases.我们在使用南佛瑞达水资源管理区的数据进行实验，这是一个常遇洪水的沿海地区。结果显示，FIDLAR比当前状态艺术有几个数量级的速度减速，并且可以证明更好的预release schedule。这些减速使得FIDLAR可以用于实时洪水管理。本文的主要贡献是通过工具来描述模型的解释，以便理解响应不同环境因素的决策的贡献。

RAIFLE: Reconstruction Attacks on Interaction-based Federated Learning with Active Data Manipulation

paper_url: http://arxiv.org/abs/2310.19163
repo_url: https://github.com/dzungvpham/raifle
paper_authors: Dzung Pham, Shreyas Kulkarni, Amir Houmansadr
for: 这个论文关注了 federated learning (FL) 中的用户隐私问题，具体来说是在用户互动域中的 recommender systems (RS) 和 online learning to rank (OLTR) 中。
methods: 这篇论文使用了一种名为 RAIFLE 的整合优化基于攻击框架，用于攻击 IFL 系统中的用户隐私。RAIFLE 使用了一种新的攻击技术名为 Active Data Manipulation (ADM)，通过在训练特征上操纵ITEMS来导致本地 FL 更新中的 adversarial 行为。
results: 论文表明 RAIFLE 可以在 IFL 系统中更有效地攻击用户隐私，并且可以干扰隐私防御技术，如安全汇聚和私人信息检索。基于这些发现，论文提出了一些Countermeasure 建议来 mitigate 这种攻击。

Abstract
Federated learning (FL) has recently emerged as a privacy-preserving approach for machine learning in domains that rely on user interactions, particularly recommender systems (RS) and online learning to rank (OLTR). While there has been substantial research on the privacy of traditional FL, little attention has been paid to studying the privacy properties of these interaction-based FL (IFL) systems. In this work, we show that IFL can introduce unique challenges concerning user privacy, particularly when the central server has knowledge and control over the items that users interact with. Specifically, we demonstrate the threat of reconstructing user interactions by presenting RAIFLE, a general optimization-based reconstruction attack framework customized for IFL. RAIFLE employs Active Data Manipulation (ADM), a novel attack technique unique to IFL, where the server actively manipulates the training features of the items to induce adversarial behaviors in the local FL updates. We show that RAIFLE is more impactful than existing FL privacy attacks in the IFL context, and describe how it can undermine privacy defenses like secure aggregation and private information retrieval. Based on our findings, we propose and discuss countermeasure guidelines to mitigate our attack in the context of federated RS/OLTR specifically and IFL more broadly.

摘要
federated learning（FL）已经被认为是一种保护用户隐私的机器学习方法，尤其是在用户互动域中，如推荐系统（RS）和在线学习排名（OLTR）等领域。 although there has been extensive research on the privacy of traditional FL, little attention has been paid to the privacy properties of these interaction-based FL（IFL）systems. in this work, we show that IFL can introduce unique challenges concerning user privacy, particularly when the central server has knowledge and control over the items that users interact with. specifically, we demonstrate the threat of reconstructing user interactions by presenting RAIFLE, a general optimization-based reconstruction attack framework customized for IFL. RAIFLE employs Active Data Manipulation（ADM）, a novel attack technique unique to IFL, where the server actively manipulates the training features of the items to induce adversarial behaviors in the local FL updates. we show that RAIFLE is more impactful than existing FL privacy attacks in the IFL context, and describe how it can undermine privacy defenses like secure aggregation and private information retrieval. based on our findings, we propose and discuss countermeasure guidelines to mitigate our attack in the context of federated RS/OLTR specifically and IFL more broadly.

Transfer Learning in Transformer-Based Demand Forecasting For Home Energy Management System

paper_url: http://arxiv.org/abs/2310.19159
repo_url: None
paper_authors: Gargya Gokhale, Jonas Van Gompel, Bert Claessens, Chris Develder
for: 这个研究旨在开发一个基于转移学习的家用电力负载预测模型，以提高家用电力负载预测的精度和效率。
methods: 研究人员使用了一种名为“时间融合 трансформа”的先进预测模型，并通过将这个全球模型调整到新的一个家用电力负载数据中，以提高预测的精度和效率。
results: 研究人员发现，使用转移学习设置可以比仅使用单一家用电力负载数据更好地预测家用电力负载，具体而言，可以降低预测误差率约15%，并且可以降低家用电力负载成本约2%。

Abstract
Increasingly, homeowners opt for photovoltaic (PV) systems and/or battery storage to minimize their energy bills and maximize renewable energy usage. This has spurred the development of advanced control algorithms that maximally achieve those goals. However, a common challenge faced while developing such controllers is the unavailability of accurate forecasts of household power consumption, especially for shorter time resolutions (15 minutes) and in a data-efficient manner. In this paper, we analyze how transfer learning can help by exploiting data from multiple households to improve a single house's load forecasting. Specifically, we train an advanced forecasting model (a temporal fusion transformer) using data from multiple different households, and then finetune this global model on a new household with limited data (i.e. only a few days). The obtained models are used for forecasting power consumption of the household for the next 24 hours~(day-ahead) at a time resolution of 15 minutes, with the intention of using these forecasts in advanced controllers such as Model Predictive Control. We show the benefit of this transfer learning setup versus solely using the individual new household's data, both in terms of (i) forecasting accuracy ($\sim$15\% MAE reduction) and (ii) control performance ($\sim$2\% energy cost reduction), using real-world household data.

摘要
HOMEOWNERS 对光伏系统和/或电池储存系统的选择在增加，以最大化能源成本和可再生能源使用。这导致了高级控制算法的发展，以最大化这些目标。然而，开发这些控制器时常遇到缺乏精确的家用电力消耗预测，特别是在短时间尺度（15分钟）和高效率下。在这篇文章中，我们分析了如何使用传播学习来解决这个问题。我们使用多个不同的家庭的数据来训练进阶预测模型（时间融合变换器），然后在新的家庭中进行精确化训练（仅使用几天的数据）。所得到的模型用于预测新家庭的电力消耗预测，时间尺度为24小时（日前），每15分钟一次。我们显示了这个传播学习设置的优点，包括预测精度（约15% MAE减少）和控制性能（约2%能源成本减少），使用实际家庭数据进行评估。

Real-World Implementation of Reinforcement Learning Based Energy Coordination for a Cluster of Households

paper_url: http://arxiv.org/abs/2310.19155
repo_url: None
paper_authors: Gargya Gokhale, Niels Tiben, Marie-Sophie Verwee, Manu Lahariya, Bert Claessens, Chris Develder
for: 这 paper 的目的是研究如何通过聚合控制多幢住宅建筑物来为现代电力网提供支持服务，包括备用服务。methods: 这 paper 使用了学习反馈控制（RL）技术来协调8幢住宅建筑物的电力消耗，不需要任何建筑模型或模拟器，因此实施和扩展非常方便。results: 这 paper 通过实验示出了RL基于排名系统选择哪些户型动用可变资产，并使用实时PI控制机制来控制选择的资产，实现了功能的电力跟踪和RL基于数据驱动的排名效果的可行性。

Abstract
Given its substantial contribution of 40\% to global power consumption, the built environment has received increasing attention to serve as a source of flexibility to assist the modern power grid. In that respect, previous research mainly focused on energy management of individual buildings. In contrast, in this paper, we focus on aggregated control of a set of residential buildings, to provide grid supporting services, that eventually should include ancillary services. In particular, we present a real-life pilot study that studies the effectiveness of reinforcement-learning (RL) in coordinating the power consumption of 8 residential buildings to jointly track a target power signal. Our RL approach relies solely on observed data from individual households and does not require any explicit building models or simulators, making it practical to implement and easy to scale. We show the feasibility of our proposed RL-based coordination strategy in a real-world setting. In a 4-week case study, we demonstrate a hierarchical control system, relying on an RL-based ranking system to select which households to activate flex assets from, and a real-time PI control-based power dispatch mechanism to control the selected assets. Our results demonstrate satisfactory power tracking, and the effectiveness of the RL-based ranks which are learnt in a purely data-driven manner.

摘要
由于它的严重贡献了40%的全球电力消耗，建筑环境在现代电力网络中获得了越来越多的注意力，以满足需求。在这个意义上，之前的研究主要集中在建筑物之间的能源管理。相比之下，在这篇论文中，我们将关注一组住宅建筑物的总控制，以为电力网络提供支持服务，最终应包括辅助服务。具体来说，我们将展示一个实际的 Pilot 研究，研究使用强化学习（RL）来协调8个住宅建筑物的电力消耗，以同步跟踪目标电力信号。我们的RL方法不需要任何建筑物模型或模拟器，因此实施可行和扩展容易。我们在实际情况下展示了我们提议的RL-基于协调策略的可行性。在4个星期的案例研究中，我们实现了一个层次控制系统，通过RL-基于排名系统来选择需要活动的资产，并使用实时PI控制-基于的电力派发机制来控制选择的资产。我们的结果表明了满意的电力跟踪，以及RL-基于排名系统的学习效果，这些排名系统是通过实际数据驱动学习而学习的。

MAG-GNN: Reinforcement Learning Boosted Graph Neural Network

paper_url: http://arxiv.org/abs/2310.19142
repo_url: None
paper_authors: Lecheng Kong, Jiarui Feng, Hao Liu, Dacheng Tao, Yixin Chen, Muhan Zhang
for: 提高Graph Neural Networks（GNNs）的结构编码能力，以提高GNNs的表达能力。
methods: 使用搜索算法来选择一小subset of subgraphs，并使用强化学习（RL）Agent来更新subgraph set，以提高GNNs的表达能力。
results: 在多个 datasets 上进行了广泛的实验，显示了 MAG-GNN 可以与现有方法竞争，甚至超过一些subgraph GNNs，同时也可以减少subgraph GNNs 的运行时间。

Abstract
While Graph Neural Networks (GNNs) recently became powerful tools in graph learning tasks, considerable efforts have been spent on improving GNNs' structural encoding ability. A particular line of work proposed subgraph GNNs that use subgraph information to improve GNNs' expressivity and achieved great success. However, such effectivity sacrifices the efficiency of GNNs by enumerating all possible subgraphs. In this paper, we analyze the necessity of complete subgraph enumeration and show that a model can achieve a comparable level of expressivity by considering a small subset of the subgraphs. We then formulate the identification of the optimal subset as a combinatorial optimization problem and propose Magnetic Graph Neural Network (MAG-GNN), a reinforcement learning (RL) boosted GNN, to solve the problem. Starting with a candidate subgraph set, MAG-GNN employs an RL agent to iteratively update the subgraphs to locate the most expressive set for prediction. This reduces the exponential complexity of subgraph enumeration to the constant complexity of a subgraph search algorithm while keeping good expressivity. We conduct extensive experiments on many datasets, showing that MAG-GNN achieves competitive performance to state-of-the-art methods and even outperforms many subgraph GNNs. We also demonstrate that MAG-GNN effectively reduces the running time of subgraph GNNs.

摘要
Graph Neural Networks (GNNs) 在图学任务中最近成为了强大工具，但是大量的工作被投入到了GNNs的结构编码能力的提高中。一种特定的工作提出了子图GNNs，使用子图信息来提高GNNs的表达能力，并取得了很大的成功。然而，这种表达能力来源于完全对所有可能的子图进行枚举，这会导致GNNs的效率下降。在这篇论文中，我们分析了完全子图枚举的必要性，并证明了一个模型可以通过考虑一小部分的子图来达到相似的表达能力。然后，我们将这个问题转化为一个 combinatorial 优化问题，并提出了磁矢量图神经网络（MAG-GNN）来解决这个问题。MAG-GNN从候选子图集开始，使用了一个强化学习（RL）的代理人来逐步更新子图，以查找最有表达力的集合用于预测。这将枚举子图的枚举复杂度从对数复杂度降低到常数复杂度，保持好的表达能力。我们在许多数据集上进行了广泛的实验，显示MAG-GNN与当前状态的方法竞争，甚至超过了许多子图GNNs。我们还证明了MAG-GNN可以有效减少子图GNNs的运行时间。

Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations

paper_url: http://arxiv.org/abs/2310.19126
repo_url: None
paper_authors: Piotr Indyk, Haike Xu
for: 这些纸都是为了研究近似 neighboor搜索算法的最坏情况性能的。
methods: 这些算法包括HNSW、NSG和DiskANN。
results: 我们发现，对于DiskANN的”慢预处理”版本，它可以在数据集中有 bounded “内在”维度时支持常数准确度和多余logarithmic查询时间的近似最近邻搜索查询。对于其他数据结构variant studied，包括DiskANN的”快预处理”版本、HNSW和NSG，我们提出了一家实例集，其中查询过程可以 linear in instance size 的时间内返回”合理”的准确率。例如，对于DiskANN，我们显示了，在实例大小为 n 时，查询过程至少需要 0.1 n 步骤才能查找其中的5个最近邻。

Abstract
Graph-based approaches to nearest neighbor search are popular and powerful tools for handling large datasets in practice, but they have limited theoretical guarantees. We study the worst-case performance of recent graph-based approximate nearest neighbor search algorithms, such as HNSW, NSG and DiskANN. For DiskANN, we show that its "slow preprocessing" version provably supports approximate nearest neighbor search query with constant approximation ratio and poly-logarithmic query time, on data sets with bounded "intrinsic" dimension. For the other data structure variants studied, including DiskANN with "fast preprocessing", HNSW and NSG, we present a family of instances on which the empirical query time required to achieve a "reasonable" accuracy is linear in instance size. For example, for DiskANN, we show that the query procedure can take at least $0.1 n$ steps on instances of size $n$ before it encounters any of the $5$ nearest neighbors of the query.

摘要
Graph-based方法是实际中处理大数据集的强大工具，但它们在理论上有限的保证。我们研究近期的图形基于近似最近邻搜索算法，如HNSW、NSG和DiskANN的最坏情况性能。对于DiskANN，我们表明其"慢预处理"版本可以在数据集中的"内在维度"是bounded时提供常数准确率和多项几何查询时间的近似最近邻搜索查询。对其他数据结构变体，包括DiskANN的"快预处理"版本、HNSW和NSG，我们提供了一家实际上的实例，其查询时间与实例大小线性相关。例如，对于DiskANN，我们表明其查询过程可以在实例大小为$n$时间内至少执行$0.1n$步骤才会遇到5个最近邻居。

Software engineering for deep learning applications: usage of SWEng and MLops tools in GitHub repositories

paper_url: http://arxiv.org/abs/2310.19124
repo_url: None
paper_authors: Evangelia Panourgia, Theodoros Plessas, Diomidis Spinellis
for: 这篇论文主要关注于深度学习（DL）软件开发中的软件工程（SE）实践，特别是DL软件开发中的工程挑战和资料驱动的非决定性模式。
methods: 本研究使用Python为主要编程语言，采用 précédente MSR 研究的工具使用方法，扫描GitHub上popular的应用DL项目，探索这些项目中的SE工具使用情况。
results: 研究发现，大约70%的GitHub库中包含至少一个SE工具，软件配置管理工具是最多使用的，而维护工具则较少使用。另外，MLOps工具的使用相对较少，只有9个工具在该样本中被使用。TensorBoard是唯一在对�る repository 中使用的MLOps工具。

Abstract
The rising popularity of deep learning (DL) methods and techniques has invigorated interest in the topic of SE4DL, the application of software engineering (SE) practices on deep learning software. Despite the novel engineering challenges brought on by the data-driven and non-deterministic paradigm of DL software, little work has been invested into developing AI-targeted SE tools. On the other hand, tools tackling more general engineering issues in DL are actively used and referred to under the umbrella term of ``MLOps tools''. Furthermore, the available literature supports the utility of conventional SE tooling in DL software development. Building upon previous MSR research on tool usage in open-source software works, we identify conventional and MLOps tools adopted in popular applied DL projects that use Python as the main programming language. About 70% of the GitHub repositories mined contained at least one conventional SE tool. Software configuration management tools are the most adopted, while the opposite applies to maintenance tools. Substantially fewer MLOps tools were in use, with only 9 tools out of a sample of 80 used in at least one repository. The majority of them were open-source rather than proprietary. One of these tools, TensorBoard, was found to be adopted in about half of the repositories in our study. Consequently, the use of conventional SE tooling demonstrates its relevance to DL software. Further research is recommended on the adoption of MLOps tooling by open-source projects, focusing on the relevance of particular tool types, the development of required tools, as well as ways to promote the use of already available tools.

摘要
随着深度学习（DL）方法和技术的普及，关注SE4DL（深度学习软件工程）领域的应用而增加。然而，由于深度学习软件的数据驱动和不确定的理论带来的新的工程挑战，对于AI目标的SE工具仍然受到了少量投入。相反，关于更一般的机器学习（ML）工程问题，如MLOps工具，活跃地使用和引用。此外，现有的文献支持传统的SE工具在深度学习软件开发中的可用性。基于之前的微软研究人员在开源软件项目中工具使用情况，我们识别了传统和MLOps工具在流行的应用深度学习项目中的采用情况。我们发现，大约70%的GitHub存储库包含至少一个传统SE工具。软件配置管理工具是最广泛采用的，而维护工具则相对较少。与此同时，MLOps工具的采用远远少于传统SE工具，只有80个存储库中的9个被使用。大多数这些工具是开源的，而不是商业化的。 tensorBoard 是这些工具中的一个，在我们的研究中被采用的约半数。因此，传统SE工具在深度学习软件开发中的使用表明了它们的重要性。进一步的研究建议在开源项目中采用MLOps工具，特别是关注特定工具类型、开发需要的工具以及如何促进现有工具的使用。

Proving Linear Mode Connectivity of Neural Networks via Optimal Transport

paper_url: http://arxiv.org/abs/2310.19103
repo_url: https://github.com/9aze/ot_lmc
paper_authors: Damien Ferbach, Baptiste Goujaud, Gauthier Gidel, Aymeric Dieuleveut
for: 本研究探讨了高维非凸优化问题的能量景观，以解释现代深度神经网络架构的效果。
methods: 本文提出了一种理论框架，可以理解现有两次权重训练后得到的两个解的连续性。基于 Wasserstein 距离的整体速度，我们表明了两个够宽的两层神经网络，通过权重训练来连续地连接。
results: 我们提供了一种上下限bounds，可以量化每层神经网络的宽度，以便确保连续性。此外，我们还经验表明了积分权重分布的维度与连续性之间的相关性。

Abstract
The energy landscape of high-dimensional non-convex optimization problems is crucial to understanding the effectiveness of modern deep neural network architectures. Recent works have experimentally shown that two different solutions found after two runs of a stochastic training are often connected by very simple continuous paths (e.g., linear) modulo a permutation of the weights. In this paper, we provide a framework theoretically explaining this empirical observation. Based on convergence rates in Wasserstein distance of empirical measures, we show that, with high probability, two wide enough two-layer neural networks trained with stochastic gradient descent are linearly connected. Additionally, we express upper and lower bounds on the width of each layer of two deep neural networks with independent neuron weights to be linearly connected. Finally, we empirically demonstrate the validity of our approach by showing how the dimension of the support of the weight distribution of neurons, which dictates Wasserstein convergence rates is correlated with linear mode connectivity.

摘要
高维非对称优化问题的能量景观对现代深度神经网络架构的效果是关键。latest studies have shown that two different solutions found after two runs of stochastic training are often connected by very simple continuous paths (e.g., linear) modulo a permutation of the weights. In this paper, we provide a theoretical framework to explain this empirical observation. Based on the convergence rates of empirical measures in Wasserstein distance, we show that, with high probability, two wide enough two-layer neural networks trained with stochastic gradient descent are linearly connected. Additionally, we provide upper and lower bounds on the width of each layer of two deep neural networks with independent neuron weights to be linearly connected. Finally, we empirically demonstrate the validity of our approach by showing how the dimension of the support of the weight distribution of neurons, which dictates Wasserstein convergence rates, is correlated with linear mode connectivity.

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

paper_url: http://arxiv.org/abs/2310.19102
repo_url: None
paper_authors: Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci
for: 提高 Large Language Models（LLMs）在内容生成、智能客服和情感分析等应用中的效率，以适应应用场景中的增长需求。
methods: 使用批处理技术批处理多个请求，以提高 GPU 资源的使用效率和throughput。
results: 在服务器上实现高 durchput 提高 ($7.73\times$ 比FP16和 $2.53\times$ 比INT8 归一化)，同时保持同样的响应时间目标，而不会增加精度损失。

Abstract
The growing demand for Large Language Models (LLMs) in applications such as content generation, intelligent chatbots, and sentiment analysis poses considerable challenges for LLM service providers. To efficiently use GPU resources and boost throughput, batching multiple requests has emerged as a popular paradigm; to further speed up batching, LLM quantization techniques reduce memory consumption and increase computing capacity. However, prevalent quantization schemes (e.g., 8-bit weight-activation quantization) cannot fully leverage the capabilities of modern GPUs, such as 4-bit integer operators, resulting in sub-optimal performance. To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss. Atom significantly boosts serving throughput by using low-bit operators and considerably reduces memory consumption via low-bit quantization. It attains high accuracy by applying a novel mixed-precision and fine-grained quantization process. We evaluate Atom on 4-bit weight-activation quantization setups in the serving context. Atom improves end-to-end throughput by up to $7.73\times$ compared to the FP16 and by $2.53\times$ compared to INT8 quantization, while maintaining the same latency target.

摘要
受大语言模型（LLM）应用的内容生成、智能客服和情感分析等领域的需求不断增长， LLM 服务提供商面临着巨大的挑战。为了高效利用 GPU 资源并提高通过put，批处多个请求已成为了流行的方法；而为了进一步加速批处， LLM 量化技术可以降低内存占用量和提高计算能力。然而，现有的量化方案（如 8 位 weight-activation 量化）无法完全利用现代 GPU 的能力，导致性能下降。为了最大化 LLM 的服务通过put，我们介绍 Atom，一种低位量化方法，可以 achieve high throughput improvements with negligible accuracy loss。Atom 使用低位操作和减少内存占用量，以提高服务通过put。它通过应用一种新的混合精度和细致的量化过程，保持高度的准确性。我们在 4 位 weight-activation 量化设置下测试 Atom。Atom 可以提高终端通过put的吞吐量，比FP16和 INT8 量化的吞吐量高出 $7.73\times$，而且保持同样的响应时间目标。

Bridging the Gap: Towards an Expanded Toolkit for ML-Supported Decision-Making in the Public Sector

paper_url: http://arxiv.org/abs/2310.19091
repo_url: None
paper_authors: Unai Fischer Abaigar, Christoph Kern, Noam Barda, Frauke Kreuter
For: This paper aims to bridge the gap between machine learning (ML) and public sector decision-making by addressing key technical challenges that arise when aligning intricate policy objectives with the precise formalization requirements of ML models.* Methods: The paper concentrates on pivotal points of the ML pipeline that connect the model to its operational environment, including the significance of representative training data and the importance of a model setup that facilitates effective decision-making. The paper also links these challenges with emerging methodological advancements, such as causal ML, domain adaptation, uncertainty quantification, and multi-objective optimization.* Results: The paper provides a comprehensive overview of the challenges that arise when using ML in the public sector, and highlights the importance of addressing these challenges in order to harmonize ML and public sector objectives. The paper also illustrates the path forward for addressing these challenges, including the use of emerging methodological advancements.

Abstract
Machine Learning (ML) systems are becoming instrumental in the public sector, with applications spanning areas like criminal justice, social welfare, financial fraud detection, and public health. While these systems offer great potential benefits to institutional decision-making processes, such as improved efficiency and reliability, they still face the challenge of aligning intricate and nuanced policy objectives with the precise formalization requirements necessitated by ML models. In this paper, we aim to bridge the gap between ML and public sector decision-making by presenting a comprehensive overview of key technical challenges where disjunctions between policy goals and ML models commonly arise. We concentrate on pivotal points of the ML pipeline that connect the model to its operational environment, delving into the significance of representative training data and highlighting the importance of a model setup that facilitates effective decision-making. Additionally, we link these challenges with emerging methodological advancements, encompassing causal ML, domain adaptation, uncertainty quantification, and multi-objective optimization, illustrating the path forward for harmonizing ML and public sector objectives.

摘要

训练数据的选择和调整：ML 模型的精度和可靠性受到训练数据的影响，但是公共部门的决策过程中的训练数据可能不够完整或不够有代表性。2. 模型设置的构成和调整：为了使 ML 模型能够实际地支持公共部门的决策过程，需要适当地设置和调整模型的参数和架构。3. 适用于公共部门的 ML 技术发展：包括 causal ML、领域适应、uncertainty quantification 和多目标优化在内的新技术可以帮助解决 ML 和公共部门之间的匹配问题。本文通过聚焦 ML pipeline 中的关键点子，探讨 ML 模型如何与公共部门的决策过程进行匹配，并提出了一些实际的方法来解决这些挑战。这些方法包括：1. 使用更多的代表性丰富的训练数据来优化 ML 模型的性能。2. 适当地设置和调整 ML 模型的参数和架构，以便更好地支持公共部门的决策过程。3. 采用新的 ML 技术，例如 causal ML、领域适应、uncertainty quantification 和多目标优化，来解决 ML 和公共部门之间的匹配问题。

Efficient Cluster Selection for Personalized Federated Learning: A Multi-Armed Bandit Approach

paper_url: http://arxiv.org/abs/2310.19069
repo_url: None
paper_authors: Zhou Ni, Morteza Hashemi
for: 这篇论文目的是为了解决在人工智能学习中的联邦学习网络中的问题，特别是在变化很大的数据分布和设备能力下。
methods: 这篇论文使用了一种名为“动态Upper Confidence Bound”的算法，它是基于多臂枪（MAB）的方法，用于在联邦学习网络中聚合用户。这个算法可以将新用户的数据分布与最佳的聚合群相匹配。
results: 这篇论文的实验结果显示，在不同的数据分布和设备能力下，这个算法可以有效地处理变化很大的联邦学习enario。

Abstract
Federated learning (FL) offers a decentralized training approach for machine learning models, prioritizing data privacy. However, the inherent heterogeneity in FL networks, arising from variations in data distribution, size, and device capabilities, poses challenges in user federation. Recognizing this, Personalized Federated Learning (PFL) emphasizes tailoring learning processes to individual data profiles. In this paper, we address the complexity of clustering users in PFL, especially in dynamic networks, by introducing a dynamic Upper Confidence Bound (dUCB) algorithm inspired by the multi-armed bandit (MAB) approach. The dUCB algorithm ensures that new users can effectively find the best cluster for their data distribution by balancing exploration and exploitation. The performance of our algorithm is evaluated in various cases, showing its effectiveness in handling dynamic federated learning scenarios.

摘要

Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming

paper_url: http://arxiv.org/abs/2310.19068
repo_url: None
paper_authors: Gregory Dexter, Petros Drineas, David P. Woodruff, Taisuke Yasuda
for: 这 paper 的目的是扩展 sketching 算法的应用范围，包括稀疏字储学习和 EUCLIDEAN $k$-means 归类问题。
methods: 这 paper 使用了新的技术来推广 sketching 算法的应用范围，包括一种新的 PTAS 方法和新的上界和下界。
results: 这 paper 得到了一些新的结果，包括一种新的 PTAS 方法和新的上界和下界，以及一些关于 dictionary learning 和 $k$-means 归类问题的研究。

Abstract
Sketching algorithms have recently proven to be a powerful approach both for designing low-space streaming algorithms as well as fast polynomial time approximation schemes (PTAS). In this work, we develop new techniques to extend the applicability of sketching-based approaches to the sparse dictionary learning and the Euclidean $k$-means clustering problems. In particular, we initiate the study of the challenging setting where the dictionary/clustering assignment for each of the $n$ input points must be output, which has surprisingly received little attention in prior work. On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem, which generalizes to the first PTAS for the sparse dictionary learning problem. On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering. In particular, given a design matrix $\mathbf A\in\mathbb R^{n\times d}$ in a turnstile stream, we show an $\tilde O(nr/\epsilon^2 + dk/\epsilon)$ space upper bound for $r$-sparse dictionary learning of size $k$, an $\tilde O(n/\epsilon^2 + dk/\epsilon)$ space upper bound for $k$-means clustering, as well as an $\tilde O(n)$ space upper bound for $k$-means clustering on random order row insertion streams with a natural "bounded sensitivity" assumption. On the lower bounds side, we obtain a general $\tilde\Omega(n/\epsilon + dk/\epsilon)$ lower bound for $k$-means clustering, as well as an $\tilde\Omega(n/\epsilon^2)$ lower bound for algorithms which can estimate the cost of a single fixed set of candidate centers.

摘要
algorithm 已经证明是一种强大的方法，不仅用于设计具有低空间流处理器的算法，也用于快速的多项时间算法（PTAS）。在这个工作中，我们开发了新的技术，以扩展画 sketching 方法的应用范围至简短字典学习和欧几何 $k$-means 聚类问题。具体来说，我们开始研究具有复杂的设定，其中每个输入点的字典/聚类分配必须被输出，这个问题在先前的工作中很少获得关注。在快速算法方面，我们取得了一新的方法，用于设计 PTAS 的 $k$-means 聚类问题，这个方法可扩展到简短字典学习问题的首次 PTAS。在流处理算法方面，我们取得了新的上界和下界，用于字典学习和 $k$-means 聚类问题。具体来说，在turnstile流中，我们显示了一个 $\tilde O(nr/\epsilon^2 + dk/\epsilon)$ 的空间上界，用于 $r$-简字典学习的大小为 $k$，以及一个 $\tilde O(n/\epsilon^2 + dk/\epsilon)$ 的空间上界，用于 $k$-means 聚类问题。此外，我们还取得了一个 $\tilde O(n)$ 的空间上界，用于 $k$-means 聚类在随机排序推入流中。在下界方面，我们取得了一个通用的 $\tilde\Omega(n/\epsilon + dk/\epsilon)$ 下界，用于 $k$-means 聚类问题，以及一个 $\tilde\Omega(n/\epsilon^2)$ 下界，用于可以估计单一集合中心的成本的算法。

Evaluating LLP Methods: Challenges and Approaches

paper_url: http://arxiv.org/abs/2310.19065
repo_url: https://github.com/gaabrielfranco/llp-variants-datasets-benchmarks
paper_authors: Gabriel Franco, Giovanni Comarela, Mark Crovella
for: 本研究是为了解决Label Proportions（LLP）问题，这是一个机器学习问题，有很多实际应用。
methods: 本研究使用了生成 variant-specific 数据集，以捕捉不同的 dependence structure 和 bag 特性。此外，还使用了一种新的模型选择方法，以适应 LLQ 问题的特殊性。
results: 研究发现，选择最佳算法需要考虑不同的 LLP 变种和模型选择方法。通过对一些常见 LLQ 算法进行了广泛的比较， demonstrate 了需要我们提出的方法。

Abstract
Learning from Label Proportions (LLP) is an established machine learning problem with numerous real-world applications. In this setting, data items are grouped into bags, and the goal is to learn individual item labels, knowing only the features of the data and the proportions of labels in each bag. Although LLP is a well-established problem, it has several unusual aspects that create challenges for benchmarking learning methods. Fundamental complications arise because of the existence of different LLP variants, i.e., dependence structures that can exist between items, labels, and bags. Accordingly, the first algorithmic challenge is the generation of variant-specific datasets capturing the diversity of dependence structures and bag characteristics. The second methodological challenge is model selection, i.e., hyperparameter tuning; due to the nature of LLP, model selection cannot easily use the standard machine learning paradigm. The final benchmarking challenge consists of properly evaluating LLP solution methods across various LLP variants. We note that there is very little consideration of these issues in prior work, and there are no general solutions for these challenges proposed to date. To address these challenges, we develop methods capable of generating LLP datasets meeting the requirements of different variants. We use these methods to generate a collection of datasets encompassing the spectrum of LLP problem characteristics, which can be used in future evaluation studies. Additionally, we develop guidelines for benchmarking LLP algorithms, including the model selection and evaluation steps. Finally, we illustrate the new methods and guidelines by performing an extensive benchmark of a set of well-known LLP algorithms. We show that choosing the best algorithm depends critically on the LLP variant and model selection method, demonstrating the need for our proposed approach.

摘要
To address these challenges, we develop methods capable of generating LLP datasets meeting the requirements of different variants. We use these methods to generate a collection of datasets encompassing the spectrum of LLP problem characteristics, which can be used in future evaluation studies. Additionally, we develop guidelines for benchmarking LLP algorithms, including the model selection and evaluation steps. Finally, we illustrate the new methods and guidelines by performing an extensive benchmark of a set of well-known LLP algorithms. We show that choosing the best algorithm depends critically on the LLP variant and model selection method, demonstrating the need for our proposed approach.翻译结果：LLP（学习从标签分量）是一个已经有长期应用的机器学习问题，具有许多实际应用场景。在这个设定中，数据项目被分组为袋子，目标是从数据特征和标签分量中学习各自的项目标签。虽然LLP是一个已知的问题，但它有一些不寻常的特点，导致评估学习方法的挑战。主要的挑战包括：1. 生成 variant-specific 数据集，捕捉不同的依赖结构和袋子特征的多样性。2. 因为 LLP 的特点，选择最佳模型不能使用标准机器学习范文。3. 评估 LLP 解决方案的多样性，以确保它们在不同的 LLP 变体中表现良好。为了解决这些挑战，我们开发了生成 LLP 数据集的方法，以满足不同变体的需求。我们使用这些方法生成了一系列包含 LLP 问题特征谱的数据集，可以在未来的评估研究中使用。此外，我们还提供了评估 LLP 算法的指南，包括模型选择和评估步骤。最后，我们使用新方法和指南对一组知名 LLP 算法进行了广泛的比较。我们发现，选择最佳算法取决于 LLP 变体和模型选择方法，这说明了我们的提出的方法的需要。

Revisiting the Learnability of Apple Tasting

paper_url: http://arxiv.org/abs/2310.19064
repo_url: None
paper_authors: Vinod Raman, Unique Subedi, Ananth Raman, Ambuj Tewari
for: 研究在apple tasting反馈下的在线分类问题。
methods: 使用 combinatorial perspective 研究在线学习可能性，并提出了一个新的参数Effective width，用于量化在可 realizable 设定下的最差预期错误数。
results: 在 realizable 设定下，showed that the expected number of mistakes for any learner under apple tasting feedback can only be $\Theta(1), \Theta(\sqrt{T})$, or $\Theta(T)$。

Abstract
In online binary classification under \textit{apple tasting} feedback, the learner only observes the true label if it predicts "1". First studied by \cite{helmbold2000apple}, we revisit this classical partial-feedback setting and study online learnability from a combinatorial perspective. We show that the Littlestone dimension continues to prove a tight quantitative characterization of apple tasting in the agnostic setting, closing an open question posed by \cite{helmbold2000apple}. In addition, we give a new combinatorial parameter, called the Effective width, that tightly quantifies the minimax expected mistakes in the realizable setting. As a corollary, we use the Effective width to establish a \textit{trichotomy} of the minimax expected number of mistakes in the realizable setting. In particular, we show that in the realizable setting, the expected number of mistakes for any learner under apple tasting feedback can only be $\Theta(1), \Theta(\sqrt{T})$, or $\Theta(T)$.

摘要
在在线二分类学习中，学习者只会看到真实标签，如果预测结果为1。这个问题最早由Helmbold等人（2000）研究，我们现在从 combinatorial 角度重新研究这个古典的partial-feedback 设定，并证明 Littlestone 维度仍然是agnostic 设定中的一个紧张量量化 caracterization。此外，我们还提出了一个新的 combinatorial 参数，called Effective width，它紧密地量化了可行情况下的最差预期错误。为此，我们使用 Effective width 证明了可行情况下的最差预期错误数可以只是 $\Theta(1), \Theta(\sqrt{T})$ 或 $\Theta(T)$。

Feature Aggregation in Joint Sound Classification and Localization Neural Networks

paper_url: http://arxiv.org/abs/2310.19063
repo_url: None
paper_authors: Brendan Healy, Patrick McNamee, Zahra Nili Ahmadabadi
for: 本研究探讨了深度学习技术在共同声音信号分类和地点化网络中的应用。现有状态的声音源地点化深度学习网络缺乏特征聚合在其架构中。特征聚合可以提高模型性能，因为它使得不同特征尺度上的信息可以被集成，从而提高特征Robustness和不变性。这特别重要在SSL网络中，因为它们必须在直接和间接声音信号之间进行区分。为解决这个漏洞，我们将计算机视觉网络中的特征聚合技术应用到声音检测网络中。
methods: 我们采用了计算机视觉网络中的特征聚合技术，包括Path Aggregation Network (PANet)、Weighted Bi-directional Feature Pyramid Network (BiFPN)和Scale Encoding Network (SEN)等。这些技术被integrated into a SSL control architecture，并被评估使用两种声音分类和两种方向射 regression 的指标。PANet和BiFPN是计算机视觉模型中已知的聚合器，而我们提议的SEN是更加压缩的聚合器。
results: 结果表明，包含特征聚合的模型在声音分类和地点化方面的性能都高于控制模型，即Sound Event Localization and Detection network (SELDnet)。特征聚合技术提高了声音检测神经网络的性能，特别是在方向射 regression 方面。

Abstract
This study addresses the application of deep learning techniques in joint sound signal classification and localization networks. Current state-of-the-art sound source localization deep learning networks lack feature aggregation within their architecture. Feature aggregation enhances model performance by enabling the consolidation of information from different feature scales, thereby improving feature robustness and invariance. This is particularly important in SSL networks, which must differentiate direct and indirect acoustic signals. To address this gap, we adapt feature aggregation techniques from computer vision neural networks to signal detection neural networks. Additionally, we propose the Scale Encoding Network (SEN) for feature aggregation to encode features from various scales, compressing the network for more computationally efficient aggregation. To evaluate the efficacy of feature aggregation in SSL networks, we integrated the following computer vision feature aggregation sub-architectures into a SSL control architecture: Path Aggregation Network (PANet), Weighted Bi-directional Feature Pyramid Network (BiFPN), and SEN. These sub-architectures were evaluated using two metrics for signal classification and two metrics for direction-of-arrival regression. PANet and BiFPN are established aggregators in computer vision models, while the proposed SEN is a more compact aggregator. The results suggest that models incorporating feature aggregations outperformed the control model, the Sound Event Localization and Detection network (SELDnet), in both sound signal classification and localization. The feature aggregation techniques enhance the performance of sound detection neural networks, particularly in direction-of-arrival regression.

摘要

Escaping Saddle Points in Heterogeneous Federated Learning via Distributed SGD with Communication Compression

paper_url: http://arxiv.org/abs/2310.19059
repo_url: None
paper_authors: Sijin Chen, Zhize Li, Yuejie Chi
For: 提高 federated learning（FL）中communication efficiency和学习精度的问题。* Methods: 提出了一种新的error-feedback scheme，实现了在不同客户端数据异ogeneous的情况下，通过压缩信息进行分布式SGD算法的实现。* Results: 证明了Power-EF算法可以在不同客户端数据异ogeneous情况下，逃脱平均点，并且在第二阶段 convergence 中，展现出线性增长。

Abstract
We consider the problem of finding second-order stationary points of heterogeneous federated learning (FL). Previous works in FL mostly focus on first-order convergence guarantees, which do not rule out the scenario of unstable saddle points. Meanwhile, it is a key bottleneck of FL to achieve communication efficiency without compensating the learning accuracy, especially when local data are highly heterogeneous across different clients. Given this, we propose a novel algorithm Power-EF that only communicates compressed information via a novel error-feedback scheme. To our knowledge, Power-EF is the first distributed and compressed SGD algorithm that provably escapes saddle points in heterogeneous FL without any data homogeneity assumptions. In particular, Power-EF improves to second-order stationary points after visiting first-order (possibly saddle) points, using additional gradient queries and communication rounds only of almost the same order required by first-order convergence, and the convergence rate exhibits a linear speedup in terms of the number of workers. Our theory improves/recovers previous results, while extending to much more tolerant settings on the local data. Numerical experiments are provided to complement the theory.

摘要
我们考虑到寻找非常复杂的联邦学习（FL）中的第二阶站点问题。前一些FL工作主要集中在第一阶均衡保证，这不能排除不稳定的阶均点的情况。另一方面，在FL中实现通信效率不损学习精度的挑战，尤其是当地方数据具有很高的不同客户端的多样性时。为了解决这个问题，我们提出了一个新的算法Power-EF，它仅在一个新的错误反馈方案下进行压缩通信。我们知道Power-EF是首个分布式压缩SGD算法，可以在不同客户端的数据多样性下，避免阶均点而实现第二阶站点，并且在额外的梯度询问和通信轮次上进行几乎相同的复杂度。我们的理论提高了/恢复了先前的结果，同时扩展到许多更允许的本地数据设置。实验数据来补充理论。

Object-centric architectures enable efficient causal representation learning

paper_url: http://arxiv.org/abs/2310.19054
repo_url: None
paper_authors: Amin Mansouri, Jason Hartford, Yan Zhang, Yoshua Bengio
for: 这篇论文旨在探讨如何在多个物体的观察数据上实现 causal representation learning，以实现对每个物体的属性的分离。
methods: 该论文使用了对象中心学习和 causal representation learning 的最新发展，通过修改 Slot Attention 架构，使用稀有的干扰来强制实现对每个物体的属性的分离。
results: 该论文在一系列简单的图像基于的分离实验中成功地分离了一组物体的属性，并且需要更少的干扰than comparable approach 。

Abstract
Causal representation learning has showed a variety of settings in which we can disentangle latent variables with identifiability guarantees (up to some reasonable equivalence class). Common to all of these approaches is the assumption that (1) the latent variables are represented as $d$-dimensional vectors, and (2) that the observations are the output of some injective generative function of these latent variables. While these assumptions appear benign, we show that when the observations are of multiple objects, the generative function is no longer injective and disentanglement fails in practice. We can address this failure by combining recent developments in object-centric learning and causal representation learning. By modifying the Slot Attention architecture arXiv:2006.15055, we develop an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object's properties. This approach is more data-efficient in the sense that it requires significantly fewer perturbations than a comparable approach that encodes to a Euclidean space and we show that this approach successfully disentangles the properties of a set of objects in a series of simple image-based disentanglement experiments.

摘要
causal representation learning 在多种设置中展示了可以分离干扰变量的可靠性保证 ( hasta certain extent 的等价类). 这些方法假设：1) 干扰变量是 $d$-维 вектор表示; 2) 观察是这些干扰变量的生成函数的输出。 although these assumptions seem innocuous, we show that when the observations are of multiple objects, the generative function is no longer injective and disentanglement fails in practice. 我们可以通过结合近期的对象中心学习和 causal representation learning 来解决这种失败。我们修改了 arXiv:2006.15055 中的槽注意架构，以便在 sparse perturbations 的 weak supervision 下，为每个对象分离其特性。这种方法比一种在 Euclidean space 中编码并且需要更少的扰动而言，我们展示了这种方法可以成功地分离一系列的图像基于的对象分离实验中的对象特性。

Datasets and Benchmarks for Nanophotonic Structure and Parametric Design Simulations

paper_url: http://arxiv.org/abs/2310.19053
repo_url: https://github.com/jungtaekkim/nanophotonic-structures
paper_authors: Jungtaek Kim, Mingxuan Li, Oliver Hinder, Paul W. Leu
for: 这个论文主要针对的应用是设计和理解奈米光学结构，以实现太阳能电池、反射层、电磁干扰屏蔽、光滤波器和LED等多种应用。
methods: 这篇论文使用了电动力学模拟来模拟电磁场的时间变化和光学性质。同时，它还提出了一些参数结构设计问题的评价框架和标准。
results: 研究人员通过对不同Grid大小的电动力学模拟进行比较，发现可以通过灵活地选择评价精度来提高结构设计。此外，他们还提出了一些参数结构设计问题的解决方案。

Abstract
Nanophotonic structures have versatile applications including solar cells, anti-reflective coatings, electromagnetic interference shielding, optical filters, and light emitting diodes. To design and understand these nanophotonic structures, electrodynamic simulations are essential. These simulations enable us to model electromagnetic fields over time and calculate optical properties. In this work, we introduce frameworks and benchmarks to evaluate nanophotonic structures in the context of parametric structure design problems. The benchmarks are instrumental in assessing the performance of optimization algorithms and identifying an optimal structure based on target optical properties. Moreover, we explore the impact of varying grid sizes in electrodynamic simulations, shedding light on how evaluation fidelity can be strategically leveraged in enhancing structure designs.

摘要
几何光子结构具有多方面应用，包括太阳能电池、反射层、电磁干扰隔绝、光滤波器和发光二极管。为设计和理解这些几何光子结构，电动力学模拟是必备的。这些模拟可以模拟电磁场过时的变化，并计算光学性能。在这个工作中，我们介绍了框架和参考标准，用于评估几何光子结构在参数结构设计问题中的性能。这些参考标准可以评估优化算法的性能，并帮助选择基于目标光学性能的最佳结构。此外，我们还探讨了在电动力学模拟中不同格子大小的影响，照明了如何积极地利用评估实价来提升结构设计。

Differentially Private Permutation Tests: Applications to Kernel Methods

paper_url: http://arxiv.org/abs/2310.19043
repo_url: https://github.com/antoninschrab/dpkernel-paper
paper_authors: Ilmun Kim, Antonin Schrab
for: 隐私保护的敏感数据分析
methods: 使用差异性保护的排序测试（differentially private permutation tests），extend classical non-private permutation tests to private settings，maintain both finite-sample validity and differential privacy
results: 提出了 differentially private kernel tests（dpMMD和dpHSIC），可以在不同的隐私环境下实现最佳的能力，实现了在Synthetic和实际场景下的竞争力比较Here’s the breakdown of each point:
for: The paper is written for the purpose of privacy-preserving data analysis, specifically in the context of hypothesis testing.
methods: The paper introduces differentially private permutation tests as a way to extend classical non-private permutation tests to private settings while maintaining both finite-sample validity and differential privacy.
results: The paper proposes two differentially private kernel tests (dpMMD and dpHSIC) that can achieve optimal power under different privacy regimes, and demonstrates their competitive power through empirical evaluations on synthetic and real-world data.

Abstract
Recent years have witnessed growing concerns about the privacy of sensitive data. In response to these concerns, differential privacy has emerged as a rigorous framework for privacy protection, gaining widespread recognition in both academic and industrial circles. While substantial progress has been made in private data analysis, existing methods often suffer from impracticality or a significant loss of statistical efficiency. This paper aims to alleviate these concerns in the context of hypothesis testing by introducing differentially private permutation tests. The proposed framework extends classical non-private permutation tests to private settings, maintaining both finite-sample validity and differential privacy in a rigorous manner. The power of the proposed test depends on the choice of a test statistic, and we establish general conditions for consistency and non-asymptotic uniform power. To demonstrate the utility and practicality of our framework, we focus on reproducing kernel-based test statistics and introduce differentially private kernel tests for two-sample and independence testing: dpMMD and dpHSIC. The proposed kernel tests are straightforward to implement, applicable to various types of data, and attain minimax optimal power across different privacy regimes. Our empirical evaluations further highlight their competitive power under various synthetic and real-world scenarios, emphasizing their practical value. The code is publicly available to facilitate the implementation of our framework.

摘要
近年来，有越来越多的关注关于敏感数据的隐私问题。为回应这些问题，差分隐私在学术和工业圈中得到了广泛的认可，成为隐私保护的严格框架。虽然在私人数据分析方面已经做出了大量的进展，但现有方法经常受到实用性或统计效率的限制。这篇论文的目标是在假设测试中解决这些问题，通过引入差分隐私排序测试来保持rigorous的隐私和统计有效性。我们的框架将经典的非私人排序测试扩展到私人设置下，并保持了finite-sample的有效性和差分隐私。我们的测试能力取决于选择的测试统计量，我们确定了一般的一致性和非假设统计上的强大能力。为了证明我们的框架的实用性和实用性，我们将重点关注使用归一化测试统计量，并引入差分隐私kernel测试：dpMMD和dpHSIC。这些差分隐私kernel测试是易于实现，适用于各种数据类型，并在不同的隐私环境下具有最佳的可比性。我们的实验证明了它们在不同的 sintetic 和实际场景下具有竞争力，强调它们的实际价值。代码publicly available，以便实现我们的框架。

On Linear Separation Capacity of Self-Supervised Representation Learning

paper_url: http://arxiv.org/abs/2310.19041
repo_url: None
paper_authors: Shulei Wang
for: 本研究旨在探讨数据增强学习在多材料模型下的表达学习，以及这种表达学习是如何提高线性分类器的表达能力的。
methods: 本研究使用了自助学习和数据增强学习方法，并对这些方法的表达学习效果进行了分析。
results: 研究发现，数据增强学习可以提高线性分类器的表达能力，并且可以在多材料模型下 Linearly separate manifolds。此外，研究还发现，自助学习可以在小样本大量数据下提高线性分类器的表达能力。

Abstract
Recent advances in self-supervised learning have highlighted the efficacy of data augmentation in learning data representation from unlabeled data. Training a linear model atop these enhanced representations can yield an adept classifier. Despite the remarkable empirical performance, the underlying mechanisms that enable data augmentation to unravel nonlinear data structures into linearly separable representations remain elusive. This paper seeks to bridge this gap by investigating under what conditions learned representations can linearly separate manifolds when data is drawn from a multi-manifold model. Our investigation reveals that data augmentation offers additional information beyond observed data and can thus improve the information-theoretic optimal rate of linear separation capacity. In particular, we show that self-supervised learning can linearly separate manifolds with a smaller distance than unsupervised learning, underscoring the additional benefits of data augmentation. Our theoretical analysis further underscores that the performance of downstream linear classifiers primarily hinges on the linear separability of data representations rather than the size of the labeled data set, reaffirming the viability of constructing efficient classifiers with limited labeled data amid an expansive unlabeled data set.

摘要

Machine Learning for the identification of phase-transitions in interacting agent-based systems

paper_url: http://arxiv.org/abs/2310.19039
repo_url: None
paper_authors: Nikolaos Evangelou, Dimitrios G. Giovanis, George A. Kevrekidis, Grigorios A. Pavliotis, Ioannis G. Kevrekidis
for: 这篇论文的目的是提出一种数据驱动的框架，用于描述agent-based模型（ABM）中的相态转变。
methods: 该论文使用了Diffusion Maps算法来Identify一个简洁的数据驱动变量，并使用深度学习框架来获得一个参数化的坐标系，以便在这些坐标系中identify一个参数dependent的涨落函数。
results: 该论文通过使用这种数据驱动的方法，成功地construct了一个相态转变的 диаграм。

Abstract
Deriving closed-form, analytical expressions for reduced-order models, and judiciously choosing the closures leading to them, has long been the strategy of choice for studying phase- and noise-induced transitions for agent-based models (ABMs). In this paper, we propose a data-driven framework that pinpoints phase transitions for an ABM in its mean-field limit, using a smaller number of variables than traditional closed-form models. To this end, we use the manifold learning algorithm Diffusion Maps to identify a parsimonious set of data-driven latent variables, and show that they are in one-to-one correspondence with the expected theoretical order parameter of the ABM. We then utilize a deep learning framework to obtain a conformal reparametrization of the data-driven coordinates that facilitates, in our example, the identification of a single parameter-dependent ODE in these coordinates. We identify this ODE through a residual neural network inspired by a numerical integration scheme (forward Euler). We then use the identified ODE -- enabled through an odd symmetry transformation -- to construct the bifurcation diagram exhibiting the phase transition.

摘要
使用闭式表达式和选择合适的闭式来研究基于代理模型（ABM）的相对阶段和噪声引起的转变，已经是长期的策略。在这篇文章中，我们提出了一个数据驱动的框架，用于在ABM的含义场限制下标出相对阶段的转变点。为此，我们使用扩散地图算法来确定一个简洁的数据驱动的秘密变量，并证明它们与ABM的预期的理论参量之间存在一一对应关系。然后，我们使用深度学习框架来获得一个符号映射，以便在这些坐标系中进行数据驱动的协调。通过这种方式，我们可以在这些坐标系中提取出一个参数依赖的径谱方程。我们使用这个径谱方程，通过一种奇偶变换，构建了相对阶段的分布图。

Does Invariant Graph Learning via Environment Augmentation Learn Invariance?

paper_url: http://arxiv.org/abs/2310.19035
repo_url: https://github.com/lfhase/gala
paper_authors: Yongqiang Chen, Yatao Bian, Kaiwen Zhou, Binghui Xie, Bo Han, James Cheng
For: 本 paper 的目的是学习图像上的不变性，以便在图像上进行对外部数据进行泛化。* Methods: 本 paper 使用环境扩充来提高图像的不变性学习，但是这些环境扩充的有用性从未被证明。因此，本 paper 提出了一些最小假设，包括变化 suficiency 和变化 consistency，以便可能地学习图像的不变性。* Results: 本 paper 提出了一个新的框架 Graph invAriant Learning Assistant (GALA)，该框架包括一个助手模型，该模型需要对图像环境变化或分布变化敏感。助手模型的proxy预测可以判断图像中的杂乱子图的变化。根据这些 proxy 预测，提取图像中最大不变性子图可以唯一地标识图像的不变性子图，并且在成功的 OOD 泛化下保证不变性。经过对多个 dataset 的广泛实验，包括 DrugOOD 等，确认了 GALA 的有效性。

Abstract
Invariant graph representation learning aims to learn the invariance among data from different environments for out-of-distribution generalization on graphs. As the graph environment partitions are usually expensive to obtain, augmenting the environment information has become the de facto approach. However, the usefulness of the augmented environment information has never been verified. In this work, we find that it is fundamentally impossible to learn invariant graph representations via environment augmentation without additional assumptions. Therefore, we develop a set of minimal assumptions, including variation sufficiency and variation consistency, for feasible invariant graph learning. We then propose a new framework Graph invAriant Learning Assistant (GALA). GALA incorporates an assistant model that needs to be sensitive to graph environment changes or distribution shifts. The correctness of the proxy predictions by the assistant model hence can differentiate the variations in spurious subgraphs. We show that extracting the maximally invariant subgraph to the proxy predictions provably identifies the underlying invariant subgraph for successful OOD generalization under the established minimal assumptions. Extensive experiments on datasets including DrugOOD with various graph distribution shifts confirm the effectiveness of GALA.

摘要
《固定 graph 表示学习中的不变性学习目标是学习数据集中的不变性，以实现对不同环境的外部数据泛化。然而，通常获取 graph 环境分区是非常昂贵的，因此通常会使用环境扩充来解决这个问题。然而，这种环境扩充的有用性从来没有得到证明。在这种情况下，我们发现，通过环境扩充来学习不变的 graph 表示是不可能的，因此我们提出了一些最小化假设，包括变化充分和变化一致，以便实现可能的不变的 graph 学习。然后，我们提出了一个新的框架Graph invAriant Learning Assistant（GALA）。GALA 包含一个助手模型，该模型需要对 graph 环境变化或分布变化敏感。如果助手模型的代理预测正确，那么可以区分真正的变量和误差的变量。我们证明，从助手模型的代理预测中提取最大可变的子图可以识别下来的不变的子图，并且在我们提出的假设下，可以 garantuee 对外部数据的泛化。我们的实验结果表明，GALA 在具有不同 graph 分布变化的数据集上具有非常高的有效性。》

An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits

paper_url: http://arxiv.org/abs/2310.19025
repo_url: None
paper_authors: Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Max Springer
for: 这个论文是为了解决 adversarial contextual bandits 问题的 oracle-efficient relaxation。
methods: 这个论文使用的方法是一个 online adversary 选择 cost sequence，contexts 是从 known distribution 随机地被引入。
results: 这个论文的 regret bound 是 $O(T^{\frac{2}{3}(K\log(|\Pi|))^{\frac{1}{3})$，比之前的最好 bound $O((TK)^{\frac{2}{3}(\log(|\Pi|))^{\frac{1}{3})$ 更好。此外，这个论文还是第一个能够与 Langford 和 Zhang 在 NeurIPS 2007 提出的原始 bound 匹配的 result。

Abstract
We present an oracle-efficient relaxation for the adversarial contextual bandits problem, where the contexts are sequentially drawn i.i.d from a known distribution and the cost sequence is chosen by an online adversary. Our algorithm has a regret bound of $O(T^{\frac{2}{3}(K\log(|\Pi|))^{\frac{1}{3})$ and makes at most $O(K)$ calls per round to an offline optimization oracle, where $K$ denotes the number of actions, $T$ denotes the number of rounds and $\Pi$ denotes the set of policies. This is the first result to improve the prior best bound of $O((TK)^{\frac{2}{3}(\log(|\Pi|))^{\frac{1}{3})$ as obtained by Syrgkanis et al. at NeurIPS 2016, and the first to match the original bound of Langford and Zhang at NeurIPS 2007 which was obtained for the stochastic case.

摘要
我们提出了一个 oracle-efficient relaxation 的方法来解决对抗上下文带状奖励问题，其中上下文是以独立 Identically distributed（i.i.d）方式从一个已知分布中随机获取，而问题选择的成本序列则是由一个在线 adversary 选择。我们的算法具有一个 regret bound of $O(T^{\frac{2}{3}}(K\log(|\Pi|))^{\frac{1}{3}})$，并在每个回合最多做 $O(K)$ 个调用于 offline 优化库的请求，其中 $K$ 表示行动的数量，$T$ 表示回合的数量，$\Pi$ 表示策略的集合。这是第一个超越先前最好的 bound of $O((TK)^{\frac{2}{3}}(\log(|\Pi|))^{\frac{1}{3}})$，它是 Syrgkanis et al. 在 NeurIPS 2016 上提出的，并且是第一个与 Langford 和 Zhang 在 NeurIPS 2007 上提出的原始 bound 匹配，这个 bound 是为 Stochastic 情况。

Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback

paper_url: http://arxiv.org/abs/2310.19022
repo_url: None
paper_authors: Jingliang Duan, Jie Li, Xuyang Chen, Kai Zhao, Shengbo Eben Li, Lin Zhao
for: 这篇论文探讨了使用policy gradient方法实现线性时域不变（LTI）系统的优化控制问题。
methods: 该论文使用了三种policy gradient方法：混合策略梯度法、自然策略梯度法和Gauss-Newton法。
results: 论文提出了关于这三种方法的新发现，包括它们在 дискреase时间LTI系统中的收敛性和约瑟率。此外，论文还证明了vanilla policy gradient方法在初始化 nearby local minima时的线性收敛性。

Abstract
In recent times, significant advancements have been made in delving into the optimization landscape of policy gradient methods for achieving optimal control in linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent since the underlying state of the system may not be fully observed in many practical settings. This paper analyzes the optimization landscape inherent to policy gradient methods when applied to static output feedback (SOF) control in discrete-time LTI systems subject to quadratic cost. We begin by establishing crucial properties of the SOF cost, encompassing coercivity, L-smoothness, and M-Lipschitz continuous Hessian. Despite the absence of convexity, we leverage these properties to derive novel findings regarding convergence (and nearly dimension-free rate) to stationary points for three policy gradient methods, including the vanilla policy gradient method, the natural policy gradient method, and the Gauss-Newton method. Moreover, we provide proof that the vanilla policy gradient method exhibits linear convergence towards local minima when initialized near such minima. The paper concludes by presenting numerical examples that validate our theoretical findings. These results not only characterize the performance of gradient descent for optimizing the SOF problem but also provide insights into the effectiveness of general policy gradient methods within the realm of reinforcement learning.

摘要
近些时间，在政策梯度方法中探索优化景观的进展很大。相比状态反馈控制，输出反馈控制更为普遍，因为实际情况中系统的下面状态可能不完全 observable。这篇论文分析了在静态输出反馈（SOF）控制中政策梯度方法的优化景观。我们首先证明了SOF成本函数的重要性质，包括半征性、L-smoothness和M-Lipschitz连续偏导。尽管不具有凸性，我们利用这些性质来 derivate 新的发现，包括政策梯度方法的三种方法（包括混合政策梯度方法、自然政策梯度方法和Gauss-Newton方法）的收敛性（以及几乎维度独立的速率）。此外，我们提供了证明，在 initialization 近于 Local minima 时，混合政策梯度方法 exhibits 线性收敛到 Local minima。文章结束，通过数学实验证明我们的理论发现。这些结果不仅描述了随机梯度下引擎的 SOF 问题的优化，还为束缚学习中的政策梯度方法提供了信息。

Behavior Alignment via Reward Function Optimization

paper_url: http://arxiv.org/abs/2310.19007
repo_url: None
paper_authors: Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva
for: 本研究旨在设计导引问题解决的优化学习搜寻（RL）代理人，以实现特定行为的目标。
methods: 本研究使用了一个新的两级目标框架，将auxiliary reward函数与环境的主要优化函数整合，以学习行为调整优化函数。
results: 本研究的结果显示，使用本研究的方法可以对RL代理人的政策优化过程进行自动调整，以减少问题解决中的限制和偏误。此外，本研究还证明了其可以对不同的任务和环境进行适用，并且可以实现高性能的解决方案，即使auxiliary reward函数存在误差或偏误。

Abstract
Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outcomes and promote behaviors that are not aligned with the designer's intended goal. Although potential-based reward shaping is often suggested as a remedy, we systematically investigate settings where deploying it often significantly impairs performance. To address these issues, we introduce a new framework that uses a bi-level objective to learn \emph{behavior alignment reward functions}. These functions integrate auxiliary rewards reflecting a designer's heuristics and domain knowledge with the environment's primary rewards. Our approach automatically determines the most effective way to blend these types of feedback, thereby enhancing robustness against heuristic reward misspecification. Remarkably, it can also adapt an agent's policy optimization process to mitigate suboptimalities resulting from limitations and biases inherent in the underlying RL algorithms. We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges. We investigate heuristic auxiliary rewards of varying quality -- some of which are beneficial and others detrimental to the learning process. Our results show that our framework offers a robust and principled way to integrate designer-specified heuristics. It not only addresses key shortcomings of existing approaches but also consistently leads to high-performing solutions, even when given misaligned or poorly-specified auxiliary reward functions.

摘要
设计奖励函数以有效引导学习控制（RL）代理人行为是一个复杂的任务。这是因为它需要识别不 sparse的奖励结构，以避免不恰当的奖励引导代理人行为。直接修改奖励结构以提供更密集和更频繁的反馈可能会导致不预期的结果，并且激励代理人不符合设计者的目标行为。虽然潜在基于奖励的奖励形成 often 被建议作为解决方案，但我们系统地调查这种方法在一些情况下可能会导致性能下降。为解决这些问题，我们提出一种新的框架，使用二级目标学习行为Alignment奖励函数。这些函数将auxiliary奖励与环境的主要奖励相结合，以便自动确定最有效的奖励杂合方式，从而提高对奖励misspecification的Robustness。此外，它还可以通过调整代理人的政策优化过程来抑制基于RL算法的限制和偏见所导致的优化不足。我们对这种方法的可行性进行了多种任务的测试，从小规模实验到高维控制挑战。我们研究了不同质量的辅助奖励，一些有利于学习过程，而另一些有害。我们的结果表明，我们的框架可以采取一种原则性的方式来整合设计者指定的euristic。它不仅解决了现有方法的主要缺陷，还一致地导致高性能的解决方案，即使auxiliary奖励函数给出了偏移或低质量的指示。

Kernel-based Joint Multiple Graph Learning and Clustering of Graph Signals

paper_url: http://arxiv.org/abs/2310.19005
repo_url: None
paper_authors: Mohamad H. Alizade, Aref Einizade
for: 这种paper是为了掌握图像处理中的图结构学习和划分问题。
methods: 这种方法使用了kernel-based算法，结合了节点特征信息，以jointly partition signals和学习图。
results: 实验结果表明，这种方法在比较 estado-of-the-art方法时表现出了更高的效果。

Abstract
Within the context of Graph Signal Processing (GSP), Graph Learning (GL) is concerned with the inference of a graph's topology from nodal observations, i.e., graph signals. However, data is often in mixed form, relating to different underlying structures. This heterogeneity necessitates the joint clustering and learning of multiple graphs. In many real-life applications, there are available node-side covariates (i.e., kernels) that imperatively should be incorporated, which has not been addressed by the rare graph signal clustering approaches. To this end and inspired by the rich K-means framework, we propose a novel kernel-based algorithm to incorporate this node-side information as we jointly partition the signals and learn a graph for each cluster. Numerical experiments demonstrate its effectiveness over the state-of-the-art.

摘要
在图像处理（GSP）中，图学习（GL）关注图像的结构划分，即从节点观测获取图像。然而，数据往往是混合形式的，关系不同的基础结构。这种多元性需要同时划分多个图。在许多实际应用中，有可用的节点侧特征（即kernel），需要考虑其中的信息，这一点未在前期的图像信号划分方法中被考虑。为此，我们基于rich K-means框架，提出一种新的kernel-based算法，并在同时划分信号和学习图中jointly使用节点侧信息。数值实验表明其效果胜过现有的状态。

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

paper_url: http://arxiv.org/abs/2310.18988
repo_url: None
paper_authors: Alicia Curth, Alan Jeffares, Mihaela van der Schaar
for: 本研究探讨了double descent现象在传统统计机器学习方法中的存在，并挑战了现有的U型曲线假设。
methods: 研究使用了非神经网络模型，包括线性回归、树和加法拟合。
results: 研究发现，当 Parameter 数量增加时，模型的测试错误率会经历两次下降，而不是传统的U型曲线预测。此外，通过视角改变，研究发现这种现象是由多个复杂性轴的交叠导致的。

Abstract
Conventional statistical wisdom established a well-understood relationship between model complexity and prediction error, typically presented as a U-shaped curve reflecting a transition between under- and overfitting regimes. However, motivated by the success of overparametrized neural networks, recent influential work has suggested this theory to be generally incomplete, introducing an additional regime that exhibits a second descent in test error as the parameter count p grows past sample size n - a phenomenon dubbed double descent. While most attention has naturally been given to the deep-learning setting, double descent was shown to emerge more generally across non-neural models: known cases include linear regression, trees, and boosting. In this work, we take a closer look at evidence surrounding these more classical statistical machine learning methods and challenge the claim that observed cases of double descent truly extend the limits of a traditional U-shaped complexity-generalization curve therein. We show that once careful consideration is given to what is being plotted on the x-axes of their double descent plots, it becomes apparent that there are implicitly multiple complexity axes along which the parameter count grows. We demonstrate that the second descent appears exactly (and only) when and where the transition between these underlying axes occurs, and that its location is thus not inherently tied to the interpolation threshold p=n. We then gain further insight by adopting a classical nonparametric statistics perspective. We interpret the investigated methods as smoothers and propose a generalized measure for the effective number of parameters they use on unseen examples, using which we find that their apparent double descent curves indeed fold back into more traditional convex shapes - providing a resolution to tensions between double descent and statistical intuition.

摘要
传统统计智能认为，模型复杂度和预测误差之间存在一个很好地理解的关系，通常表现为一个U型曲线，反映模型在过拟合和under拟合两个 режиmes之间的转换。然而，受深度学习的成功影响，近期一些influential的工作表明，这种理论是通常不准确的，存在一个第二个下降的测试误差情况，被称为double descent。而且，这种现象不仅限于深度学习设置，还出现在非神经网络模型中，如线性回归、树和抛物模型。在这项工作中，我们更加仔细地研究了这些经典统计机器学习方法的证据，并挑战这些方法的double descent现象是否真的超出传统的U型复杂度-通用曲线的限制。我们发现，只要注意把plot的x轴上的图表绘制得到的是多少个复杂度轴，double descent现象就会变得更加明确。我们示示了第二个下降出现在这些下推轴之间的转换处，并且其位置不是因为 interpolate threshold p=n 决定的。然后，我们采用了一种类非Parametric统计视角，将这些方法看作是简单器，并提出了一种通用的效果参数计数器，用于测试这些方法在未seen例中的表现。我们发现，这些方法的apparent double descent曲线实际上是fold back到了传统的convex形状，解决了对double descent和统计直觉之间的矛盾。

TRIAGE: Characterizing and auditing training data for improved regression

paper_url: http://arxiv.org/abs/2310.18970
repo_url: https://github.com/seedatnabeel/triage
paper_authors: Nabeel Seedat, Jonathan Crabbé, Zhaozhi Qian, Mihaela van der Schaar
for: 本研究旨在提供一种适用于回归任务的数据Characterization方法，以提高机器学习算法的稳定性和性能。
methods: 本方法基于Conformal predictive distributions提供一个模型无关的评分方法，称为TRIAGE score。该分数用于分析个体样本的训练动态和 caracterizar样本为模型下未经估计、过估计或良好估计。
results: 研究人员通过应用TRIAGE分析了多个回归任务，并证明了TRIAGE的 charactization是一致的。此外，TRIAGE还可以用于选择数据集和获取特征。总的来说，TRIAGE highlights the value of data characterization in real-world regression applications.

Abstract
Data quality is crucial for robust machine learning algorithms, with the recent interest in data-centric AI emphasizing the importance of training data characterization. However, current data characterization methods are largely focused on classification settings, with regression settings largely understudied. To address this, we introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors. TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score. We operationalize the score to analyze individual samples' training dynamics and characterize samples as under-, over-, or well-estimated by the model. We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings. Additionally, beyond sample level, we show TRIAGE enables new approaches to dataset selection and feature acquisition. Overall, TRIAGE highlights the value unlocked by data characterization in real-world regression applications

摘要
<>将文本翻译成简化中文。<>机器学习算法中数据质量的重要性已得到更多的关注，特别是在数据中心式AI时代，数据Characterization的重要性得到了更多的认可。然而，现有的数据Characterization方法主要集中在分类任务上， regression任务相对较少研究。为了解决这个问题，我们介绍了一种新的数据Characterization框架，即TRIAGE，该框架适用于多种回归器，并且可以提供一个模型无关的评分方法，即TRIAGE分数。我们将TRIAGE分数操作化，以分析个体样本的训练过程和模型评估。我们示出了TRIAGE的分类是一致的，并且它的用途可以提高数据雕刻/筛选的性能，在多种回归任务中。此外，TRIAGE还可以用于数据集选择和特征收集新的方法。总之，TRIAGE highlights the value of data characterization in real-world regression applications.

Playing in the Dark: No-regret Learning with Adversarial Constraints

paper_url: http://arxiv.org/abs/2310.18955
repo_url: None
paper_authors: Abhishek Sinha, Rahul Vaze
For: 本文探讨了线性 convex 优化框架的扩展，包括额外的长期反对抗约束。特别是，在一线策略决定动作后，除了一个凸成本函数外，反对抗还会透露一组 $k$ 凸约束。成本和约束函数可以随时间变化，无法预测未来函数的信息。* Methods: 本文提出了一种元策略，同时实现了下线性累积约束和下线性 regret。这是通过将受限问题降低到标准 OCO 问题的 recursive 构建的一个黑盒减reduction。我们表明，可以通过解决 surrogate 问题使用任何适应 OCO 策略，满足标准数据依赖 regret bound。* Results: 本文提出了一种新的 Lyapunov 基于证明技术，揭示了 regret 与某些顺序不等式之间的连接。通过一种新的分解结果，我们得出了 regret 的优化性质。 finally, 本文应用于在线多任务学习和网络控制问题。

Abstract
We study a generalization of the classic Online Convex Optimization (OCO) framework by considering additional long-term adversarial constraints. Specifically, after an online policy decides its action on a round, in addition to a convex cost function, the adversary also reveals a set of $k$ convex constraints. The cost and the constraint functions could change arbitrarily with time, and no information about the future functions is assumed to be available. In this paper, we propose a meta-policy that simultaneously achieves a sublinear cumulative constraint violation and a sublinear regret. This is achieved via a black box reduction of the constrained problem to the standard OCO problem for a recursively constructed sequence of surrogate cost functions. We show that optimal performance bounds can be achieved by solving the surrogate problem using any adaptive OCO policy enjoying a standard data-dependent regret bound. A new Lyapunov-based proof technique is presented that reveals a connection between regret and certain sequential inequalities through a novel decomposition result. We conclude the paper by highlighting applications to online multi-task learning and network control problems.

摘要
我们研究了online convex optimization（OCO）框架的一种普遍化，其中包括额外的长期反对派对约束。具体来说，在一个线上策略决定其行动后，除了一个凸成本函数外，反对派也会公布一组$k$个凸约束。成本函数和约束函数可以随时间变化，并不知道未来函数的信息。在这篇论文中，我们提议一个meta策略，可以同时实现一个凸累累约束和一个凸后悔。这是通过一种黑盒减少法将受约束问题转化为标准OCO问题的 recursively constructed sequence of surrogate cost functions。我们表明了一种新的 Lyapunov-based 证明技术，可以通过一种新的分解结果显示回归和某些顺序不等式之间的联系。最后，我们将报告应用于在线多任务学习和网络控制问题。

Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data

paper_url: http://arxiv.org/abs/2310.18935
repo_url: None
paper_authors: Yiwen Kou, Zixiang Chen, Quanquan Gu
for: 这个论文主要探讨了Gradient Descent的隐式偏好在训练非平滑神经网络时的影响。
methods: 本文使用了Gradient Descent来训练两层完全相连（漏斗 activtion function）神经网络，并研究了隐式偏好在不同activation function下的影响。
results: 本文发现在训练数据接近对称的情况下，隐式偏好会使Gradient Descent寻找一个稳定的rank值，并且这値值会在ReLU activation function下随着训练进程的推移而变化。此外，本文还发现隐式偏好会使Gradient Descent寻找一个神经网络，使得所有的训练数据点都具有相同的normalized margin。实验结果与理论结果匹配。

Abstract
The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well. While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural networks. Therefore, implicit bias in non-smooth neural networks trained by gradient descent remains an open question. In this paper, we aim to answer this question by studying the implicit bias of gradient descent for training two-layer fully connected (leaky) ReLU neural networks. We showed that when the training data are nearly-orthogonal, for leaky ReLU activation function, gradient descent will find a network with a stable rank that converges to $1$, whereas for ReLU activation function, gradient descent will find a neural network with a stable rank that is upper bounded by a constant. Additionally, we show that gradient descent will find a neural network such that all the training data points have the same normalized margin asymptotically. Experiments on both synthetic and real data backup our theoretical findings.

摘要
“偏好解释器的偏好”是一个关键因素，使得神经网络通过梯度下降优化得到良好的泛化能力。而“梯度流”中的偏好已经广泛研究过 homogeneous 神经网络（包括 ReLU 和泄漏 ReLU 网络），但是“梯度下降”中的偏好对非满意神经网络仍然是一个开Question。在这篇论文中，我们尝试答复这个问题，通过研究两层全连接（泄漏 ReLU）神经网络在训练过程中的偏好。我们发现，当训练数据几乎正交时，使用泄漏 ReLU 活动函数时，梯度下降会找到一个稳定的权重积分，其渐近值为 1；而使用 ReLU 活动函数时，梯度下降会找到一个神经网络，其稳定权重积分上限为一个常数。此外，我们还发现，梯度下降会找到一个神经网络，使得所有的训练数据点都具有相同的归一化margin。实验结果证明了我们的理论发现。

Remaining Useful Life Prediction of Lithium-ion Batteries using Spatio-temporal Multimodal Attention Networks

paper_url: http://arxiv.org/abs/2310.18924
repo_url: https://github.com/Dhruvadityamittal/RUL_Prediction_of_LIB_using_Spatio_temporal_Multimodal_Attention_Networks
paper_authors: Sungho Suh, Dhruv Aditya Mittal, Hymalai Bello, Bo Zhou, Mayank Shekhar Jha, Paul Lukowicz
For: The paper aims to predict the remaining useful life of Lithium-ion batteries in real-world scenarios, addressing the limitations of existing methods and improving the reliability and efficiency of battery operations.* Methods: The proposed method uses a two-stage remaining useful life prediction scheme based on a spatio-temporal multimodal attention network (ST-MAN), which captures complex spatio-temporal dependencies in the battery data and neglected features such as temperature, internal resistance, and material type.* Results: The proposed ST-MAN model outperforms existing CNN and LSTM-based methods, achieving state-of-the-art performance in predicting the remaining useful life of Li-ion batteries.

Abstract
Lithium-ion batteries are widely used in various applications, including electric vehicles and renewable energy storage. The prediction of the remaining useful life (RUL) of batteries is crucial for ensuring reliable and efficient operation, as well as reducing maintenance costs. However, determining the life cycle of batteries in real-world scenarios is challenging, and existing methods have limitations in predicting the number of cycles iteratively. In addition, existing works often oversimplify the datasets, neglecting important features of the batteries such as temperature, internal resistance, and material type. To address these limitations, this paper proposes a two-stage remaining useful life prediction scheme for Lithium-ion batteries using a spatio-temporal multimodal attention network (ST-MAN). The proposed model is designed to iteratively predict the number of cycles required for the battery to reach the end of its useful life, based on available data. The proposed ST-MAN is to capture the complex spatio-temporal dependencies in the battery data, including the features that are often neglected in existing works. Experimental results demonstrate that the proposed ST-MAN model outperforms existing CNN and LSTM-based methods, achieving state-of-the-art performance in predicting the remaining useful life of Li-ion batteries. The proposed method has the potential to improve the reliability and efficiency of battery operations and is applicable in various industries, including automotive and renewable energy.

摘要

Hyperbolic Graph Neural Networks at Scale: A Meta Learning Approach

paper_url: http://arxiv.org/abs/2310.18918
repo_url: None
paper_authors: Nurendra Choudhary, Nikhil Rao, Chandan K. Reddy
for: 提高几何神经网络（HNNs）的泛化能力和可扩展性，以便在新任务上快速学习和掌握大型图数据集。
methods: 学习图节点和边的本地子图中的抽象特征，并将其转移到新的子图上进行几拟 shot 学习。引入一种新的方法——几何 GRAph Meta Learner（H-GRAM），可以在节点 classification 和链接预测任务中学习并转移抽象信息，以便更快地学习新的任务。
results: 在多个具有挑战性的几拟 shot Setting 中，H-GRAM 能够有效地学习和转移信息，并且在大型图数据集上可以扩展性地提高性能。与标准 HNNs 相比，我们的方法可以更好地扩展到大型图数据集和提高性能。

Abstract
The progress in hyperbolic neural networks (HNNs) research is hindered by their absence of inductive bias mechanisms, which are essential for generalizing to new tasks and facilitating scalable learning over large datasets. In this paper, we aim to alleviate these issues by learning generalizable inductive biases from the nodes' local subgraph and transfer them for faster learning over new subgraphs with a disjoint set of nodes, edges, and labels in a few-shot setting. We introduce a novel method, Hyperbolic GRAph Meta Learner (H-GRAM), that, for the tasks of node classification and link prediction, learns transferable information from a set of support local subgraphs in the form of hyperbolic meta gradients and label hyperbolic protonets to enable faster learning over a query set of new tasks dealing with disjoint subgraphs. Furthermore, we show that an extension of our meta-learning framework also mitigates the scalability challenges seen in HNNs faced by existing approaches. Our comparative analysis shows that H-GRAM effectively learns and transfers information in multiple challenging few-shot settings compared to other state-of-the-art baselines. Additionally, we demonstrate that, unlike standard HNNs, our approach is able to scale over large graph datasets and improve performance over its Euclidean counterparts.

摘要
progress in гиперболических нейронных сетях (HNNs) 研究受到其缺乏抽象假设机制的限制，这些机制是必需的 для泛化到新任务和促进大量数据集上的学习。在这篇论文中，我们想要解决这些问题，通过从节点的本地子图中学习通用的抽象假设，并将其传递给新的子图上快速学习。我们提出了一种新的方法，即гиперболическиеGRAPH元学习器（H-GRAM），它在节点分类和链接预测任务上，通过从支持本地子图集中学习hyperbolic meta 梯度和标签hyerbolic气泡来启用快速学习新任务。此外，我们还证明了我们的meta学习框架的扩展可以解决现有方法所面临的扩展性问题。我们的比较分析表明，H-GRAM在多个具有挑战性的几个shot设定中能够有效地学习和传递信息，并且与标准HNNs相比，我们的方法可以在大规模图数据集上提高性能。

Estimating the Rate-Distortion Function by Wasserstein Gradient Descent

paper_url: http://arxiv.org/abs/2310.18908
repo_url: https://github.com/yiboyang/wgd
paper_authors: Yibo Yang, Stephan Eckstein, Marcel Nutz, Stephan Mandt
for: 本研究的目的是提出一种基于最优运输的Rate-Distortion（R-D）函数估计方法，用于评估数据源的压缩性。
methods: 该方法使用 Wasserstein 梯度下降算法学习优化的抽象分布，不同于经典的 Blahut–Arimoto 算法，它预先固定了往复分布的支持。
results: 实验表明，该方法可以在低比特率源上获得相当或更紧的约束，而需要许多更少的调整和计算努力。此外，该方法还与最大极值估计有关，并引入了一种新的测试源。

Abstract
In the theory of lossy compression, the rate-distortion (R-D) function $R(D)$ describes how much a data source can be compressed (in bit-rate) at any given level of fidelity (distortion). Obtaining $R(D)$ for a given data source establishes the fundamental performance limit for all compression algorithms. We propose a new method to estimate $R(D)$ from the perspective of optimal transport. Unlike the classic Blahut--Arimoto algorithm which fixes the support of the reproduction distribution in advance, our Wasserstein gradient descent algorithm learns the support of the optimal reproduction distribution by moving particles. We prove its local convergence and analyze the sample complexity of our R-D estimator based on a connection to entropic optimal transport. Experimentally, we obtain comparable or tighter bounds than state-of-the-art neural network methods on low-rate sources while requiring considerably less tuning and computation effort. We also highlight a connection to maximum-likelihood deconvolution and introduce a new class of sources that can be used as test cases with known solutions to the R-D problem.

摘要
理论上，吞吐率-损均衡（R-D）函数 $R(D)$ 描述了数据源可以通过压缩（bit-rate）来实现任何级别的准确性（损均）。确定 $R(D)$ 对于给定数据源是吞吐率压缩算法的基本性能上限。我们提出了一种基于最优运输的新方法来估计 $R(D)$。这种方法不同于经典的布拉哈特-阿里莫托算法，它在预先固定往复复制分布的支持上进行估计。我们的渐进梯度滚动算法会学习最优往复复制分布的支持，通过移动粒子来实现。我们证明了本方法的本地收敛性和样本复杂性，并与经典神经网络方法进行比较。实验结果表明，我们的方法可以在低比特率源上获得相对或更紧的约束，而且需要远少的调整和计算努力。我们还 highlight了与最大似然减杂的连接，并介绍了一种新的测试集，其中可以使用已知解决R-D问题的源。

Topological, or Non-topological? A Deep Learning Based Prediction

paper_url: http://arxiv.org/abs/2310.18907
repo_url: https://github.com/xercxis/P_zeta
paper_authors: Ashiqur Rasul, Md Shafayat Hossain, Ankan Ghosh Dastider, Himaddri Roy, M. Zahid Hasan, Quazi D. M. Khosru
for: 预测和发现新材料的性能预测和材料设计
methods: 使用深度学习模型，结合 persistently homology 和图神经网络，实现高精度的材料分类
results: 实验结果显示，该模型的准确率为 91.4%，F1 分数为 88.5%，在分类非材料和材料中表现出色，超过其他状态对照模型

Abstract
Prediction and discovery of new materials with desired properties are at the forefront of quantum science and technology research. A major bottleneck in this field is the computational resources and time complexity related to finding new materials from ab initio calculations. In this work, an effective and robust deep learning-based model is proposed by incorporating persistent homology and graph neural network which offers an accuracy of 91.4% and an F1 score of 88.5% in classifying topological vs. non-topological materials, outperforming the other state-of-the-art classifier models. The incorporation of the graph neural network encodes the underlying relation between the atoms into the model based on their own crystalline structures and thus proved to be an effective method to represent and process non-euclidean data like molecules with a relatively shallow network. The persistent homology pipeline in the suggested neural network is capable of integrating the atom-specific topological information into the deep learning model, increasing robustness, and gain in performance. It is believed that the presented work will be an efficacious tool for predicting the topological class and therefore enable the high-throughput search for novel materials in this field.

摘要
科学家们正在努力探索新材料的搜索和预测，以满足现代科学和技术的需求。然而，计算资源和计算复杂性问题成为了这一领域的主要瓶颈。在这篇文章中，我们提出了一种有效和可靠的深度学习模型，通过结合持续同态和图神经网络来减少计算资源的占用和提高预测的精度。这种模型在分类非普遍材料和普遍材料方面的准确率达91.4%，F1分数达88.5%，超过了其他现有的分类模型。在这种模型中，图神经网络允许通过晶体结构中的原子之间的关系来编码材料的结构，从而实现了对非欧几何数据的有效处理。持续同态管道在建议的神经网络中允许将原子特征的拓扑信息纳入深度学习模型中，从而提高了模型的稳定性和性能。总之，这种方法将成为预测材料的拓扑类别的有效工具，并促进高速搜索新材料的搜索。

Learning Subgrid-Scale Models in Discontinuous Galerkin Methods with Neural Ordinary Differential Equations for Compressible Navier–Stokes Equations

paper_url: http://arxiv.org/abs/2310.18897
repo_url: None
paper_authors: Shinhoo Kang, Emil M. Constantinescu
for: 该文章的目的是提出一种基于神经普通微分方程的新方法，用于在离散哈姆频率（DG）空间积分中学习低级别模型的影响。
methods: 该方法使用神经网络来学习低级别模型中缺失的涨落尺度，从而提高低级别DG近似的准确性和加速筛选高级别DG simulation的运算速度。
results: 作者通过多维泰勒-格林涡漩示例来证明该方法的性能，并证明该方法不仅可以重construct低级别模型的涨落尺度，还可以加速筛选高级别DG simulation的运算速度，提高了模型的准确性和效率。

Abstract
The growing computing power over the years has enabled simulations to become more complex and accurate. However, high-fidelity simulations, while immensely valuable for scientific discovery and problem solving, come with significant computational demands. As a result, it is common to run a low-fidelity model with a subgrid-scale model to reduce the computational cost, but selecting the appropriate subgrid-scale models and tuning them are challenging. We propose a novel method for learning the subgrid-scale model effects when simulating partial differential equations using neural ordinary differential equations in the context of discontinuous Galerkin (DG) spatial discretization. Our approach learns the missing scales of the low-order DG solver at a continuous level and hence improves the accuracy of the low-order DG approximations as well as accelerates the filtered high-order DG simulations with a certain degree of precision. We demonstrate the performance of our approach through multidimensional Taylor--Green vortex examples at different Reynolds numbers and times, which cover laminar, transitional, and turbulent regimes. The proposed method not only reconstructs the subgrid-scale from the low-order (1st-order) approximation but also speeds up the filtered high-order DG (6th-order) simulation by two orders of magnitude.

摘要
随着计算能力的提高， simulations 已经能够更加复杂和准确。然而，高精度 simulations 的计算需求很大，因此通常采用低精度模型和缺失涂抹模型来降低计算成本。然而，选择合适的缺失涂抹模型并调整它们是困难的。我们提出了一种基于神经ordinary differential equations的新方法，用于在discontinuous Galerkin（DG）空间积分方法中学习subgrid-scale模型的效果。我们的方法可以在 kontinuierlichen Level学习低阶DG解的缺失尺度，从而提高低阶DG的准确性和加速筛选高阶DG（6th-order） simulations的过程。我们通过多维 Taylor--Green涡涌示例来证明我们的方法的性能，这些示例覆盖了laminar、transition和turbulent режиmes。我们的方法不仅可以从低阶（1st-order）解中重construct subgrid-scale，还可以加速筛选高阶DG simulations的过程，提高速度两个数量级。

D2NO: Efficient Handling of Heterogeneous Input Function Spaces with Distributed Deep Neural Operators

paper_url: http://arxiv.org/abs/2310.18888
repo_url: None
paper_authors: Zecheng Zhang, Christian Moya, Lu Lu, Guang Lin, Hayden Schaeffer
for: 解决 Parametric partial differential equations、动力系统控制和反向问题中的hetERogeneous输入函数问题
methods: 使用Discretization-invariant neural operators和分布式方法处理多感器输入函数
results: 提出一种新的分布式方法，可以降低Gradient descent back-propagation步数，提高效率而不失精度，并 Validated by four numerical examples

Abstract
Neural operators have been applied in various scientific fields, such as solving parametric partial differential equations, dynamical systems with control, and inverse problems. However, challenges arise when dealing with input functions that exhibit heterogeneous properties, requiring multiple sensors to handle functions with minimal regularity. To address this issue, discretization-invariant neural operators have been used, allowing the sampling of diverse input functions with different sensor locations. However, existing frameworks still require an equal number of sensors for all functions. In our study, we propose a novel distributed approach to further relax the discretization requirements and solve the heterogeneous dataset challenges. Our method involves partitioning the input function space and processing individual input functions using independent and separate neural networks. A centralized neural network is used to handle shared information across all output functions. This distributed methodology reduces the number of gradient descent back-propagation steps, improving efficiency while maintaining accuracy. We demonstrate that the corresponding neural network is a universal approximator of continuous nonlinear operators and present four numerical examples to validate its performance.

摘要
我们的方法包括将输入函数空间分区，并使用独立的 neural network 处理个别输入函数。中央 neural network 用于处理所有输出函数之间的共享信息。这种分布式方法可以降低梯度下降反propagation 步骤数量，提高效率而无损准确性。我们证明了相应的 neural network 是一个 universal approximator 的连续非线性算子，并在四个数字示例中验证了其性能。

A foundational neural operator that continuously learns without forgetting

paper_url: http://arxiv.org/abs/2310.18885
repo_url: None
paper_authors: Tapas Tripura, Souvik Chakraborty
for: 本研究旨在开发一种基础模型，用于科学计算中的物理问题。
methods: 该模型基于神经网络和波峰分解技术，并使用了闭合结构和记忆 Ensemble 技术来学习多种物理系统的解方程。
results: 该模型能够同时学习多种 Parametric PDE 的解方程，并能够快速适应新的 Parametric PDE。同时，该模型也能够保持Positive Transfer和避免 Catastrophic Forgetting。经过广泛的 benchmark 测试，该模型可以在预测阶段比task-specific基eline模型表现更好，并且具有较少的hyperparameter tuning。

Abstract
Machine learning has witnessed substantial growth, leading to the development of advanced artificial intelligence models crafted to address a wide range of real-world challenges spanning various domains, such as computer vision, natural language processing, and scientific computing. Nevertheless, the creation of custom models for each new task remains a resource-intensive undertaking, demanding considerable computational time and memory resources. In this study, we introduce the concept of the Neural Combinatorial Wavelet Neural Operator (NCWNO) as a foundational model for scientific computing. This model is specifically designed to excel in learning from a diverse spectrum of physics and continuously adapt to the solution operators associated with parametric partial differential equations (PDEs). The NCWNO leverages a gated structure that employs local wavelet experts to acquire shared features across multiple physical systems, complemented by a memory-based ensembling approach among these local wavelet experts. This combination enables rapid adaptation to new challenges. The proposed foundational model offers two key advantages: (i) it can simultaneously learn solution operators for multiple parametric PDEs, and (ii) it can swiftly generalize to new parametric PDEs with minimal fine-tuning. The proposed NCWNO is the first foundational operator learning algorithm distinguished by its (i) robustness against catastrophic forgetting, (ii) the maintenance of positive transfer for new parametric PDEs, and (iii) the facilitation of knowledge transfer across dissimilar tasks. Through an extensive set of benchmark examples, we demonstrate that the NCWNO can outperform task-specific baseline operator learning frameworks with minimal hyperparameter tuning at the prediction stage. We also show that with minimal fine-tuning, the NCWNO performs accurate combinatorial learning of new parametric PDEs.

摘要
In this study, we introduce the Neural Combinatorial Wavelet Neural Operator (NCWNO) as a foundational model for scientific computing. This model is specifically designed to excel in learning from a diverse spectrum of physics and continuously adapt to the solution operators associated with parametric partial differential equations (PDEs). The NCWNO leverages a gated structure that employs local wavelet experts to acquire shared features across multiple physical systems, complemented by a memory-based ensembling approach among these local wavelet experts. This combination enables rapid adaptation to new challenges.The proposed foundational model offers two key advantages: (i) it can simultaneously learn solution operators for multiple parametric PDEs, and (ii) it can swiftly generalize to new parametric PDEs with minimal fine-tuning. Additionally, the NCWNO is distinguished by its:* Robustness against catastrophic forgetting* Maintenance of positive transfer for new parametric PDEs* Facilitation of knowledge transfer across dissimilar tasksThrough an extensive set of benchmark examples, we demonstrate that the NCWNO can outperform task-specific baseline operator learning frameworks with minimal hyperparameter tuning at the prediction stage. We also show that with minimal fine-tuning, the NCWNO can accurately combine learning of new parametric PDEs.

Simple and Asymmetric Graph Contrastive Learning without Augmentations

paper_url: http://arxiv.org/abs/2310.18884
repo_url: https://github.com/tengxiao1/graphacl
paper_authors: Teng Xiao, Huaisheng Zhu, Zhengyu Chen, Suhang Wang
for: 本文研究了对异谱图进行对照学习，并提出了一种简单的算法GraphACL，可以 capture一步邻居信息和两步同类相似性。
methods: 本文使用了对照学习方法，并提出了一种偏 asymmetric 视角来处理异谱图。
results: 实验结果表明，GraphACL 可以在异谱图上 achieve 出色的表现，并且在 homophilic 和异谱图上都具有优异的泛化能力。

Abstract
Graph Contrastive Learning (GCL) has shown superior performance in representation learning in graph-structured data. Despite their success, most existing GCL methods rely on prefabricated graph augmentation and homophily assumptions. Thus, they fail to generalize well to heterophilic graphs where connected nodes may have different class labels and dissimilar features. In this paper, we study the problem of conducting contrastive learning on homophilic and heterophilic graphs. We find that we can achieve promising performance simply by considering an asymmetric view of the neighboring nodes. The resulting simple algorithm, Asymmetric Contrastive Learning for Graphs (GraphACL), is easy to implement and does not rely on graph augmentations and homophily assumptions. We provide theoretical and empirical evidence that GraphACL can capture one-hop local neighborhood information and two-hop monophily similarity, which are both important for modeling heterophilic graphs. Experimental results show that the simple GraphACL significantly outperforms state-of-the-art graph contrastive learning and self-supervised learning methods on homophilic and heterophilic graphs. The code of GraphACL is available at https://github.com/tengxiao1/GraphACL.

摘要
图像对比学习（GCL）在图结构数据中的表示学习表现出色。然而，大多数现有的GCL方法都基于先制制图像增强和同类连接假设。因此，它们在不同类型连接的图中失去泛化能力。在这篇论文中，我们研究了在同类连接和不同类型连接图中进行对比学习的问题。我们发现，只需考虑偏 asymmetric 的邻居节点视角，就可以获得了良好的表现。 resulting algorithm, Asymmetric Contrastive Learning for Graphs (GraphACL), 易于实现并不需要图像增强和同类连接假设。我们提供了理论和实验证据，表明 GraphACL 可以捕捉一次邻居信息和两次同类连接相似性，这些都是模型不同类型连接图的关键。实验结果表明，简单的 GraphACL 在同类连接和不同类型连接图中明显超越了当前最佳的图像对比学习和自然学习方法。GraphACL 的代码可以在 https://github.com/tengxiao1/GraphACL 上获取。

Correlation Aware Sparsified Mean Estimation Using Random Projection

paper_url: http://arxiv.org/abs/2310.18868
repo_url: https://github.com/11hifish/Rand-Proj-Spatial
paper_authors: Shuli Jiang, Pranay Sharma, Gauri Joshi
for: 这篇论文主要探讨了分布式vector mean估计问题，是分布式优化和联合学习（Federated Learning，FL）中通用的子routine。
methods: 这篇论文使用了 Rand-$k$ 簇范例化技术来减少分布式传输成本，每个客户端将 $k < d$ 个坐标发送到服务器。然而，Rand-$k$ 无法考虑实际应用中客户端之间的相互联系。这篇论文提出了 Rand-$k$-Spatial 估计器，利用服务器端的客户端间联系信息来改善 Rand-$k$ 的性能。然而，Rand-$k$-Spatial 的性能仍然不足。这篇论文提出了 Rand-Proj-Spatial 估计器，具有更加灵活的嵌入构造和解oding程序，可以更好地利用客户端间的联系信息。
results: 这篇论文的实验结果显示，Rand-Proj-Spatial 比 Rand-$k$-Spatial 和其他更加复杂的簇范例化技术更高效。此外，这篇论文还提出了一种可以根据客户端间联系信息不同程度的弹性 Rand-Proj-Spatial 方法，并且在实验中证明其效果。

Abstract
We study the problem of communication-efficient distributed vector mean estimation, a commonly used subroutine in distributed optimization and Federated Learning (FL). Rand-$k$ sparsification is a commonly used technique to reduce communication cost, where each client sends $k < d$ of its coordinates to the server. However, Rand-$k$ is agnostic to any correlations, that might exist between clients in practical scenarios. The recently proposed Rand-$k$-Spatial estimator leverages the cross-client correlation information at the server to improve Rand-$k$'s performance. Yet, the performance of Rand-$k$-Spatial is suboptimal. We propose the Rand-Proj-Spatial estimator with a more flexible encoding-decoding procedure, which generalizes the encoding of Rand-$k$ by projecting the client vectors to a random $k$-dimensional subspace. We utilize Subsampled Randomized Hadamard Transform (SRHT) as the projection matrix and show that Rand-Proj-Spatial with SRHT outperforms Rand-$k$-Spatial, using the correlation information more efficiently. Furthermore, we propose an approach to incorporate varying degrees of correlation and suggest a practical variant of Rand-Proj-Spatial when the correlation information is not available to the server. Experiments on real-world distributed optimization tasks showcase the superior performance of Rand-Proj-Spatial compared to Rand-$k$-Spatial and other more sophisticated sparsification techniques.

摘要
我们研究了一个分布式向量均值估计问题，这是分布式优化和联合学习（FL）中广泛使用的一种子 Routine。 Rand-$k$ 精炼是一种常用的减少通信成本的技术，每个客户端向服务器发送 $k < d$ 个坐标。然而，Rand-$k$ 无法考虑客户端之间的协方差信息，这可能导致性能下降。我们提出了 Rand-$k$-Spatial 估计器，使用服务器端的协方差信息来改进 Rand-$k$ 的性能。然而，Rand-$k$-Spatial 的性能仍然有限制。我们提出了 Rand-Proj-Spatial 估计器，它使用随机 $k$-维空间的投影来扩展 Rand-$k$ 的编码过程。我们使用 Subsampled Randomized Hadamard Transform (SRHT) 作为投影矩阵，并证明 Rand-Proj-Spatial 使用 SRHT 的投影可以更好地利用协方差信息。此外，我们提出了一种根据协方差信息不同程度的变化来修改 Rand-Proj-Spatial 的方法，并建议在服务器端不可获得协方差信息时使用实际 variant。我们在实际分布式优化任务上进行了实验，并证明 Rand-Proj-Spatial 的性能比 Rand-$k$-Spatial 和其他更复杂的精炼技术更高。

Peer-to-Peer Deep Learning for Beyond-5G IoT

paper_url: http://arxiv.org/abs/2310.18861
repo_url: None
paper_authors: Srinivasa Pranav, José M. F. Moura
for: 这个论文是为了解决智能城市等 beyond-5G computing 环境中的规模问题，而不需要中央服务器或云端协调。
methods: 这个算法使用 max norm synchronization 来驱动训练，保留了设备上的深度模型训练，并使用本地设备之间的通信来实现分布式共识。每个设备会逐次交替进行两个阶段：1）设备上的学习，2）分布式合作，其中它们与附近的设备结合模型参数。
results: 这个算法可以让所有参与设备都 дости得到与 federated 和中央训练相同的测试性能，甚至在 100 个设备和宽松的单器散发加重的情况下。此外，这个算法还可以在不同的网络拓扑、罕见的通信和非Identical 数据分布情况下进行扩展。

Abstract
We present P2PL, a practical multi-device peer-to-peer deep learning algorithm that, unlike the federated learning paradigm, does not require coordination from edge servers or the cloud. This makes P2PL well-suited for the sheer scale of beyond-5G computing environments like smart cities that otherwise create range, latency, bandwidth, and single point of failure issues for federated approaches. P2PL introduces max norm synchronization to catalyze training, retains on-device deep model training to preserve privacy, and leverages local inter-device communication to implement distributed consensus. Each device iteratively alternates between two phases: 1) on-device learning and 2) distributed cooperation where they combine model parameters with nearby devices. We empirically show that all participating devices achieve the same test performance attained by federated and centralized training -- even with 100 devices and relaxed singly stochastic consensus weights. We extend these experimental results to settings with diverse network topologies, sparse and intermittent communication, and non-IID data distributions.

摘要
我们介绍P2PL，一种实用多设备 peer-to-peer深度学习算法，不同于联邦学习模式，不需要边缘服务器或云端协调。这使得P2PL在 beyond-5G 计算环境中，如智能城市，创造范围、延迟、带宽和单点故障问题，而 federated 方法不适用。P2PL 引入最大范数同步来促进训练，保留设备上深度模型训练，并利用本地设备间通信实现分布式共识。每个设备会逐次 alternate between two 阶段：1）设备上学习和 2）分布式合作，其中 combines 模型参数与附近设备。我们实验表明，参与训练的所有设备可以达到 federated 和中央训练所得到的测试性能，即使有 100 个设备和松弛单调共识加权。我们还将这些实验结果扩展到不同的网络拓扑、笔数和间歇性通信、非标一致数据分布的设置下。

Bayes beats Cross Validation: Efficient and Accurate Ridge Regression via Expectation Maximization

paper_url: http://arxiv.org/abs/2310.18860
repo_url: None
paper_authors: Shu Yu Tew, Mario Boley, Daniel F. Schmidt
for: 本研究提出了一种新的方法，用于调整ridge regression中的正则化参数（λ），它比遗弃一个样本的跨Validation（LOOCV）更快速，同时可以提供与LOOCV risk最小化的拟合参数的同等或更高质量的估计。
methods: 本研究使用了一种 bayesian 的ridge regression形式ulation，通过一个迭代的期望最大化（EM）过程来学习jointly 估计 $\lambda$ 和拟合参数。
results: 研究表明，该方法可以在大 enough $n$ 的情况下，无需设定任何难以确定的 гипер参数，具有唯一最优解，并且在 $O(\min(n, p))$ 操作下实现单一迭代EM循环。此外，研究还发现，通过采用合适的预处理步骤，可以在 $O(n \min(n, p))$ 操作下评估单个 $\lambda$ 值，而不需要评估所有 $l$ 个 candidate $\lambda$ 值。

Abstract
We present a novel method for tuning the regularization hyper-parameter, $\lambda$, of a ridge regression that is faster to compute than leave-one-out cross-validation (LOOCV) while yielding estimates of the regression parameters of equal, or particularly in the setting of sparse covariates, superior quality to those obtained by minimising the LOOCV risk. The LOOCV risk can suffer from multiple and bad local minima for finite $n$ and thus requires the specification of a set of candidate $\lambda$, which can fail to provide good solutions. In contrast, we show that the proposed method is guaranteed to find a unique optimal solution for large enough $n$, under relatively mild conditions, without requiring the specification of any difficult to determine hyper-parameters. This is based on a Bayesian formulation of ridge regression that we prove to have a unimodal posterior for large enough $n$, allowing for both the optimal $\lambda$ and the regression coefficients to be jointly learned within an iterative expectation maximization (EM) procedure. Importantly, we show that by utilizing an appropriate preprocessing step, a single iteration of the main EM loop can be implemented in $O(\min(n, p))$ operations, for input data with $n$ rows and $p$ columns. In contrast, evaluating a single value of $\lambda$ using fast LOOCV costs $O(n \min(n, p))$ operations when using the same preprocessing. This advantage amounts to an asymptotic improvement of a factor of $l$ for $l$ candidate values for $\lambda$ (in the regime $q, p \in O(\sqrt{n})$ where $q$ is the number of regression targets).

摘要
我团队提出了一种新的方法来调整ridge regression中的正则化超参数($\lambda$ )，这种方法比逐个留下一个（LOOCV）更快速计算，而且可以提供与LOOCVrisk相同或更高质量的回归参数估计。LOOCV risk可能会在finite $n$ 下存在多个和坏的地方极 minimum，因此可能需要指定一组 candidate $\lambda$，这可能会导致不良的解决方案。然而，我们证明了该方法在 suficiently large $n$ 下是唯一优化解决方案，不需要指定任何难以确定的超参数。这是基于ridge regression的 Bayesian 表述，我们证明了其 posterior 在 suficiently large $n$ 下是单模的，因此可以通过 iterative expectation maximization (EM) 过程来同时学习 optimal $\lambda$ 和回归系数。其中，我们还证明了可以通过适当的预处理步骤，在 $O(\min(n, p))$ 操作下完成一次主 EM 循环，其中 $n$ 是行数，$p$ 是列数。与此相比，通过快速 LOOCV 评估 $\lambda$ 的值需要 $O(n \min(n, p))$ 操作。这种优势在 $l$ 个 candidate $\lambda$ 值的情况下（在 $q, p \in O(\sqrt{n})$ regime 中）amounts to an asymptotic improvement factor of $l$。

SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models

paper_url: http://arxiv.org/abs/2310.18859
repo_url: None
paper_authors: Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun, Qilin Zheng, Yongkai Wu, Ang Li, Hai “Helen” Li, Yiran Chen
for: 这篇论文的目的是提出一种高效的大型 mixture-of-experts（MoE）模型测试方法，以减少 GPU 内存使用量并提高模型效率。
methods: 这篇论文使用了一种叫做 SiDA（Sparsity-inspired Data-Aware）的方法，它利用了系统的主内存和 GPU 内存，并利用了 MoE 模型中专家活化的内在绝对性来优化模型效率。
results: 根据论文的结果，SiDA 方法可以实现大幅提高 MoE 模型的测试速度，将对 GPU 内存的使用量减少到 1%，同时保持模型效率不变。具体来说，SiDA 方法可以实现 Up to 3.93X 的测试速度增加、Up to 75% 的延迟降低和 Up to 80% 的 GPU 内存储存量减少。

Abstract
Mixture-of-Experts (MoE) has emerged as a favorable architecture in the era of large models due to its inherent advantage, i.e., enlarging model capacity without incurring notable computational overhead. Yet, the realization of such benefits often results in ineffective GPU memory utilization, as large portions of the model parameters remain dormant during inference. Moreover, the memory demands of large models consistently outpace the memory capacity of contemporary GPUs. Addressing this, we introduce SiDA (Sparsity-inspired Data-Aware), an efficient inference approach tailored for large MoE models. SiDA judiciously exploits both the system's main memory, which is now abundant and readily scalable, and GPU memory by capitalizing on the inherent sparsity on expert activation in MoE models. By adopting a data-aware perspective, SiDA achieves enhanced model efficiency with a neglectable performance drop. Specifically, SiDA attains a remarkable speedup in MoE inference with up to 3.93X throughput increasing, up to 75% latency reduction, and up to 80% GPU memory saving with down to 1% performance drop. This work paves the way for scalable and efficient deployment of large MoE models, even in memory-constrained systems.

摘要
大型模型时代，混合专家（MoE）架构已成为有利的选择，因为它可以无需增加显著的计算负担来扩大模型的容量。然而，实现这些利点经常导致大量模型参数在推理过程中处于休眠状态，同时大模型的内存需求常常超过当今的GPU内存容量。为解决这个问题，我们提出了SiDA（基于缺省性的数据意识），这是一种高效的推理方法，专门为大MoE模型设计。SiDA利用系统的主存，这是现在充足且可扩展的，同时还利用GPU内存，通过利用MoE模型中专家活动的自然缺省性来提高模型效率。通过采用数据意识的视角，SiDA实现了更高的模型效率，减少了推理延迟和GPU内存占用，同时保持了模型性能的稳定。具体来说，SiDA在MoE推理中可以达到3.93倍的吞吐量提高、75%的延迟减少和80%的GPU内存减少，同时保持模型性能下降不到1%。这项工作为大MoE模型的扩展和高效部署提供了可行的方法。

2023-10-29

eess.IV

eess.IV - 2023-10-29

Subjective Quality Evaluation of Point Clouds Using a Head Mounted Display

paper_url: http://arxiv.org/abs/2310.19179
repo_url: None
paper_authors: Joao Prazeres, Rafael Rodrigues, Manuela Pereira, Antonio M. G. Pinheiro
for: 这个论文报告了对静止点云编码器MPEG V-PCC、G-PCC、深度学习编码器RS-DLPCC以及受欢迎的Draco编码器的主观质量评估。
methods: 该论文使用了18名参与者通过头戴式显示器直接比较了3D表示的扭曲点云的视觉效果，并对所获得的主观评分（MOS）与之前两项研究中对同一内容的视觉效果进行了比较，包括潘森相关指数、斯宾塞排名相关指数、平均方差和外围异常指数。
results: 结果表明这三项研究之间存在高度相关性，并且对所有评估中的差异没有发现任何显著差异。

Abstract
This paper reports on a subjective quality evaluation of static point clouds encoded with the MPEG codecs V-PCC and G-PCC, the deep learning-based codec RS-DLPCC, and the popular Draco codec. 18 subjects visualized 3D representations of distorted point clouds using a Head Mounted Display, which allowed for a direct comparison with their reference. The Mean Opinion Scores (MOS) obtained in this subjective evaluation were compared with the MOS from two previous studies, where the same content was visualized either on a 2D display or a 3D stereoscopic display, through the Pearson Correlation, Spearman Rank Order Correlation, Root Mean Square Error, and the Outlier Ratio. The results indicate that the three studies are highly correlated with one another. Moreover, a statistical analysis between all evaluations showed no significant differences between them.

摘要
(Simplified Chinese)这篇论文报道了一项主观质量评估，涉及到静止点云编码器MPEG codecs V-PCC和G-PCC、深度学习基于的RS-DLPCC编码器以及受欢迎的Draco编码器。18名参与者通过头戴式显示器 visualized 3D表示distorted点云，可以直接与参考进行比较。获得的主观意见分（MOS）在这个主观评估中被与之前两个研究相比较，这两个研究分别使用2D显示器和3D立体显示器显示同一内容。通过皮尔逊相关度、Spearman排序相关度、平均方差误差和异常比率进行比较。结果表明这三个研究之间存在高度相关性，并且在所有评估中没有发现显著差异。

Transport-of-Intensity Model for Single-Mask X-ray Differential Phase Contrast Imaging

paper_url: http://arxiv.org/abs/2310.19087
repo_url: None
paper_authors: Jingcheng Yuan, Mini Das
for: 该研究旨在提高软组织和肿瘤的可见度，使用X射线阶段差图像技术。
methods: 该研究提出了一种基于运输Intensity公式的单面phas imaging系统模型，以提供图像形成过程的直观理解。此外，该研究还展示了使用单一的干扰图像来 Retrieval attenuation和分别阶段差信息，不需要spectral信息或探测器/Mask移动。
results: 该研究通过实验和Monte Carlo仿真示出了模型的有效性和提议的Retrieval方法。该模型超越了现有模型的限制，提供了直观的图像形成过程的视觉化，同时允许优化分别阶段差投影 geometries，进一步提高了实际应用的可行性。

Abstract
X-ray phase contrast imaging has emerged as a promising technique for enhancing contrast and visibility of light-element materials, including soft tissues and tumors. In this paper, we propose a novel model for a single-mask phase imaging system based on the transport-of-intensity equation. Our model offers an intuitive understanding of signal and contrast formation in single-mask phase imaging systems. We also demonstrate efficient retrieval of attenuation and differential phase contrast with just one intensity image without requiring spectral information or mask/detector movement. The model validity as well as the proposed retrieval method is demonstrated via both experimental results on a system developed in-house as well as with Monte Carlo simulations. Our proposed model overcomes the limitations of existing models by providing an intuitive visualization of the image formation process. It also allows optimizing differential phase imaging geometries for practical applications, further enhancing broader applicability. Furthermore, the general methodology described herein offers insight on deriving transport-of-intensity models for novel X-ray imaging systems with periodic structures in the beam path.

摘要

2023-10-29

eess.SP

eess.SP - 2023-10-29

Optical STAR-RIS-Aided VLC Systems: RSMA Versus NOMA

paper_url: http://arxiv.org/abs/2310.19141
repo_url: None
paper_authors: Omar Maraqa, Sylvester Aboagye, Telex M. N. Ngatched
for: This paper aims to study the performance of optical simultaneous transmission and reflection reconfigurable intelligent surface (OSTAR-RIS) in a multi-user indoor visible light communication (VLC) system.methods: The proposed system uses a novel multi-user indoor VLC system assisted by OSTAR-RIS, which employs both power-domain non-orthogonal multiple access (NOMA) and rate splitting multiple access (RSMA) to improve the sum rate performance. The roll and yaw angles of the reflector elements, as well as the refractive index of the refractor elements in OSTAR-RIS, are jointly optimized using a sum rate maximization problem.results: The simulation results show that the proposed OSTAR-RIS RSMA-aided VLC system outperforms the OSTAR-RIS NOMA-based VLC system in terms of both the sum rate and the sum energy efficiency.

Abstract
A critical concern within the realm of visible light communications (VLC) pertains to enhancing system data rate, particularly in scenarios where the direct line-of-sight (LoS) connection is obstructed by obstacles. The deployment of meta-surface-based simultaneous transmission and reflection reconfigurable intelligent surface (STAR-RIS) has emerged to combat challenging LoS blockage scenarios and to provide 360 coverage in radio-frequency wireless systems. Recently, the concept of optical simultaneous transmission and reflection reconfigurable intelligent surface (OSTAR-RIS) has been promoted for VLC systems. This work is dedicated to studying the performance of OSTAR-RIS in detail and unveiling the VLC system performance gain under such technology. Specifically, we propose a novel multi-user indoor VLC system that is assisted by OSTAR-RIS. To improve the sum rate performance of the proposed system, both power-domain non-orthogonal multiple access (NOMA) and rate splitting multiple access (RSMA) are investigated in this work. To realize this, a sum rate maximization problem that jointly optimizes the roll and yaw angles of the reflector elements as well as the refractive index of the refractor elements in OSTAR-RIS is formulated, solved, and evaluated. The maximization problem takes into account practical considerations, such as the presence of non-users (i.e., blockers) and the orientation of the recipient's device. The sine-cosine meta-heuristic algorithm is employed to get the optimal solution of the formulated non-convex optimization problem. Moreover, the study delves into the sum energy efficiency optimization of the proposed system. Simulation results indicate that the proposed OSTAR-RIS RSMA-aided VLC system outperforms the OSTAR-RIS NOMA-based VLC system in terms of both the sum rate and the sum energy efficiency.

摘要
Visible light communication (VLC) 的一个关键问题是提高系统数据率，特别是在直线视线 (LoS) 连接被障碍物阻挡时。Meta-surface-based simultaneous transmission and reflection reconfigurable intelligent surface (STAR-RIS) 的部署已经在这些场景中提供了一种解决方案，并提供了360度的覆盖。在这些系统中，optical simultaneous transmission and reflection reconfigurable intelligent surface (OSTAR-RIS) 的概念也在提出。本研究的目的是研究OSTAR-RIS的性能，探讨VLC系统在这种技术下的性能提升。我们提出了一种基于OSTAR-RIS的多用户indoor VLC系统，并使用了power-domain non-orthogonal multiple access (NOMA) 和 rate splitting multiple access (RSMA) 来提高系统的总率性能。为此，我们提出了一个总率最大化问题，该问题jointly优化了OSTAR-RIS中 reflector元素的扭积角和折射率，以及recipient的设备方向。该问题考虑了实际因素，例如阻挡物 (i.e., 堵塞) 和recipient的设备方向。我们使用了sine-cosine meta-heuristic algorithm来获得优化问题的解。此外，我们还研究了系统的总能效率优化。实验结果表明，我们的OSTAR-RIS RSMA-aided VLC系统在总率和总能效率方面都超过了OSTAR-RIS NOMA-based VLC系统。

2023-10-28

cs.CV

cs.CV - 2023-10-28

Deep Learning-based Compressed Domain Multimedia for Man and Machine: A Taxonomy and Application to Point Cloud Classification

paper_url: http://arxiv.org/abs/2310.18849
repo_url: None
paper_authors: Abdelrahman Seleem, André F. R. Guarda, Nuno M. M. Rodrigues, Fernando Pereira
for: 本研究旨在提出一种基于深度学习的图像和视频处理技术，以提高计算机视觉任务的性能和减少计算复杂度。
methods: 该研究使用了深度学习来提取图像和视频数据中的特征，并使用了一种新的稳定性分析方法来评估不同的图像和视频处理算法。
results: 研究结果显示，使用了基于深度学习的图像和视频处理算法可以大幅提高计算机视觉任务的性能，同时减少计算复杂度。此外，研究还发现了一些新的图像和视频处理算法，可以在不同的应用场景中得到优秀的效果。

Abstract
In the current golden age of multimedia, human visualization is no longer the single main target, with the final consumer often being a machine which performs some processing or computer vision tasks. In both cases, deep learning plays a undamental role in extracting features from the multimedia representation data, usually producing a compressed representation referred to as latent representation. The increasing development and adoption of deep learning-based solutions in a wide area of multimedia applications have opened an exciting new vision where a common compressed multimedia representation is used for both man and machine. The main benefits of this vision are two-fold: i) improved performance for the computer vision tasks, since the effects of coding artifacts are mitigated; and ii) reduced computational complexity, since prior decoding is not required. This paper proposes the first taxonomy for designing compressed domain computer vision solutions driven by the architecture and weights compatibility with an available spatio-temporal computer vision processor. The potential of the proposed taxonomy is demonstrated for the specific case of point cloud classification by designing novel compressed domain processors using the JPEG Pleno Point Cloud Coding standard under development and adaptations of the PointGrid classifier. Experimental results show that the designed compressed domain point cloud classification solutions can significantly outperform the spatial-temporal domain classification benchmarks when applied to the decompressed data, containing coding artifacts, and even surpass their performance when applied to the original uncompressed data.

摘要
在当今的金色 Multimedia 时代，人类视觉不再是唯一的主要目标，最终consumer часто是一个机器，执行一些处理或计算机视觉任务。在这两种情况下，深度学习在抽取 Multimedia 表示数据中的特征方面发挥了关键作用，通常生成一个压缩表示 referred to as 封顶表示。随着深度学习基于解决方案在各种 Multimedia 应用领域的开发和采用，开启了一个新的视野，在这个视野中，一个通用的压缩 Multimedia 表示被用于人类和机器。这个视野的主要优点有两个方面：一是提高计算机视觉任务的性能，因为编码artifacts的影响被减少; 二是降低计算复杂性，因为不需要先 decode。这篇文章提出了首个设计压缩领域计算机视觉解决方案的taxonomy，该taxonomy基于可用的空间temporal计算机视觉处理器的建立和重量相容性。特点的实验结果表明，通过设计 novel 压缩领域处理器，使用 JPEG Pleno Point Cloud Coding 标准在开发中和PointGrid分类器的改进，可以在压缩领域实现显著的点云分类性能提升，并在应用到解码后的数据中 even surpass 其性能。

INCODE: Implicit Neural Conditioning with Prior Knowledge Embeddings

paper_url: http://arxiv.org/abs/2310.18846
repo_url: https://github.com/xmindflow/INCODE
paper_authors: Amirhossein Kazerouni, Reza Azad, Alireza Hosseini, Dorit Merhof, Ulas Bagci
for: 提高信号表示的精度和灵活性，解决现有INR的细节捕捉和鲁棒性问题
methods: 利用深度先验知识调整抽象函数的参数，并通过任务特定预训练模型进行任务特定参数调整，以优化表示过程
results: 在多种信号表示任务上具有更高的精度、质量、灵活性和速度，并能够解决复杂的音频、图像、3D形状重建、NeRFs、反问题等任务，并且在各种难题上具有优于现有INR的表现

Abstract
Implicit Neural Representations (INRs) have revolutionized signal representation by leveraging neural networks to provide continuous and smooth representations of complex data. However, existing INRs face limitations in capturing fine-grained details, handling noise, and adapting to diverse signal types. To address these challenges, we introduce INCODE, a novel approach that enhances the control of the sinusoidal-based activation function in INRs using deep prior knowledge. INCODE comprises a harmonizer network and a composer network, where the harmonizer network dynamically adjusts key parameters of the activation function. Through a task-specific pre-trained model, INCODE adapts the task-specific parameters to optimize the representation process. Our approach not only excels in representation, but also extends its prowess to tackle complex tasks such as audio, image, and 3D shape reconstructions, as well as intricate challenges such as neural radiance fields (NeRFs), and inverse problems, including denoising, super-resolution, inpainting, and CT reconstruction. Through comprehensive experiments, INCODE demonstrates its superiority in terms of robustness, accuracy, quality, and convergence rate, broadening the scope of signal representation. Please visit the project's website for details on the proposed method and access to the code.

摘要
归一神经表示（INR）已经革命化了信号表示方法，通过利用神经网络提供连续和平滑的数据表示方式。然而，现有INR受到细节capturing、鲁棒性和多样化信号类型的限制。为解决这些挑战，我们介绍了INCODE，一种新的方法，它使用深度先验知识来强化神经征函数的控制。INCODE包括一个和 composer网络，其中和网络动态调整 activation 函数的关键参数。通过一个任务特定的预训练模型，INCODE可以将任务特定的参数适应优化表示过程。我们的方法不仅在表示方面卓越，还能够扩展到复杂的任务，如音频、图像、三维形状重建、神经辐射场（NeRF）、反问题（包括噪声、超分解、缺失、CT重建）等。通过全面的实验，INCODE在稳定性、准确性、质量和收敛率方面表现出优异，扩大信号表示的范围。请参考项目网站了解提出的方法和获取代码。

Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models

paper_url: http://arxiv.org/abs/2310.18840
repo_url: https://github.com/littlewhitesea/stitchdiffusion
paper_authors: Hai Wang, Xiaoyu Xiang, Yuchen Fan, Jing-Hao Xue
for: 本研究旨在提出一种基于diffusion模型的个性化文本到图像（T2I）合成方法，用于自适应地生成360度全景图像。
methods: 我们首先为这项任务提前抽象了一个预训练的T2I扩散模型，然后使用LoRA进行精度调整。然而，这些调整并不能保证左右两侧图像的连续性，这是360度全景图像的重要特征。因此，我们提出了StitchDiffusion方法，包括在拼接块中进行预除噪音处理，以及应用全局剪辑来生成无缝360度全景图像。
results: 我们的自定义模型，加上我们提出的StitchDiffusion方法，可以生成高质量的360度全景图像。此外，我们的自定义模型在生成未在训练数据集中看到的场景时表现出了异常的泛化能力。

Abstract
Personalized text-to-image (T2I) synthesis based on diffusion models has attracted significant attention in recent research. However, existing methods primarily concentrate on customizing subjects or styles, neglecting the exploration of global geometry. In this study, we propose an approach that focuses on the customization of 360-degree panoramas, which inherently possess global geometric properties, using a T2I diffusion model. To achieve this, we curate a paired image-text dataset specifically designed for the task and subsequently employ it to fine-tune a pre-trained T2I diffusion model with LoRA. Nevertheless, the fine-tuned model alone does not ensure the continuity between the leftmost and rightmost sides of the synthesized images, a crucial characteristic of 360-degree panoramas. To address this issue, we propose a method called StitchDiffusion. Specifically, we perform pre-denoising operations twice at each time step of the denoising process on the stitch block consisting of the leftmost and rightmost image regions. Furthermore, a global cropping is adopted to synthesize seamless 360-degree panoramas. Experimental results demonstrate the effectiveness of our customized model combined with the proposed StitchDiffusion in generating high-quality 360-degree panoramic images. Moreover, our customized model exhibits exceptional generalization ability in producing scenes unseen in the fine-tuning dataset. Code is available at https://github.com/littlewhitesea/StitchDiffusion.

摘要
<>转换给定文本到简化中文。<>研究中的个性化文本到图像（T2I）合成技术受到了非常重视。然而，现有的方法主要集中在主题或风格方面，忽略了全球几何特性的探索。在这种研究中，我们提出了一种方法，该方法是通过T2I扩散模型进行个性化360度全景图像的合成。为此，我们制作了特定于任务的图像文本对集，然后使用它来练化一个预训练的T2I扩散模型。然而，练化后的模型本身不能保证左侧和右侧图像的连续性，这是360度全景图像的关键特征。为解决这个问题，我们提出了一种方法called StitchDiffusion。具体来说，我们在每个时间步中对固定块进行预处理操作两次，并采用全球裁剪来合成无缝360度全景图像。实验结果表明，我们的定制模型与StitchDiffusion结合可以生成高质量的360度全景图像。此外，我们的定制模型还表现出了非常好的泛化能力，可以生成未在练化数据集中出现的场景。代码可以在https://github.com/littlewhitesea/StitchDiffusion上找到。

UniCat: Crafting a Stronger Fusion Baseline for Multimodal Re-Identification

paper_url: http://arxiv.org/abs/2310.18812
repo_url: None
paper_authors: Jennifer Crawford, Haoli Yin, Luke McDermott, Daniel Cummings
for: 这个论文目标是为了解决多模态重识别任务中的批量化问题，提高对多种数据流的重识别能力。
methods: 该论文使用了多种方法，包括单模态和多模态方法，以及各种拼接和融合策略。
results: 研究发现，使用单模态方法可以获得更好的表示，而不是使用多模态方法。此外，使用不同的拼接和融合策略也可以提高表示的质量。

Abstract
Multimodal Re-Identification (ReID) is a popular retrieval task that aims to re-identify objects across diverse data streams, prompting many researchers to integrate multiple modalities into a unified representation. While such fusion promises a holistic view, our investigations shed light on potential pitfalls. We uncover that prevailing late-fusion techniques often produce suboptimal latent representations when compared to methods that train modalities in isolation. We argue that this effect is largely due to the inadvertent relaxation of the training objectives on individual modalities when using fusion, what others have termed modality laziness. We present a nuanced point-of-view that this relaxation can lead to certain modalities failing to fully harness available task-relevant information, and yet, offers a protective veil to noisy modalities, preventing them from overfitting to task-irrelevant data. Our findings also show that unimodal concatenation (UniCat) and other late-fusion ensembling of unimodal backbones, when paired with best-known training techniques, exceed the current state-of-the-art performance across several multimodal ReID benchmarks. By unveiling the double-edged sword of "modality laziness", we motivate future research in balancing local modality strengths with global representations.

摘要
多模态重识别（ReID）是一个广泛应用的检索任务，旨在透过多种数据流处理对象的重识别，引起了许多研究人员将多种模式融合到一个统一表示中。然而，我们的调查发现，使用合并技术时常会产生优化后的下降，相比于单独训练模式时的表示。我们认为这是由于将多个模式融合时，不小心放弃各个模式的训练目标，导致模式懒散（modality laziness）的问题。我们提出一种复杂的观点，认为这种放弃可以使某些模式在任务相关的信息上充分发挥作用，同时防止不相关的数据泛染。我们的发现还表明，将单模式 concatenation（UniCat）和其他融合技术与最佳训练技术结合，可以在多模态ReIDbenchmark上超越当前状态的表现。我们的研究揭示了“模式懒散”的双重剑，激励未来的研究人员在地方模式强大性和全球表示之间寻求平衡。

A Review on the Applications of Machine Learning for Tinnitus Diagnosis Using EEG Signals

paper_url: http://arxiv.org/abs/2310.18795
repo_url: None
paper_authors: Farzaneh Ramezani, Hamidreza Bolhasani
for: 这个研究的目的是使用机器学习技术来识别或预测听力障碍患者，以便早期诊断和治疗。
methods: 这些研究使用了多种数据模式和机器学习技术来识别和分类听力障碍患者。
results: 这些研究的结果表明，使用EEG信号作为输入数据，可以准确地识别和预测听力障碍患者。但是，研究结果存在差异和矛盾，需要进一步的研究以更好地理解听力障碍的特征和预测方法。

Abstract
Tinnitus is a prevalent hearing disorder that can be caused by various factors such as age, hearing loss, exposure to loud noises, ear infections or tumors, certain medications, head or neck injuries, and psychological conditions like anxiety and depression. While not every patient requires medical attention, about 20% of sufferers seek clinical intervention. Early diagnosis is crucial for effective treatment. New developments have been made in tinnitus detection to aid in early detection of this illness. Over the past few years, there has been a notable growth in the usage of electroencephalography (EEG) to study variations in oscillatory brain activity related to tinnitus. However, the results obtained from numerous studies vary greatly, leading to conflicting conclusions. Currently, clinicians rely solely on their expertise to identify individuals with tinnitus. Researchers in this field have incorporated various data modalities and machine-learning techniques to aid clinicians in identifying tinnitus characteristics and classifying people with tinnitus. The purpose of writing this article is to review articles that focus on using machine learning (ML) to identify or predict tinnitus patients using EEG signals as input data. We have evaluated 11 articles published between 2016 and 2023 using a systematic literature review (SLR) method. This article arranges perfect summaries of all the research reviewed and compares the significant aspects of each. Additionally, we performed statistical analyses to gain a deeper comprehension of the most recent research in this area. Almost all of the reviewed articles followed a five-step procedure to achieve the goal of tinnitus. Disclosure. Finally, we discuss the open affairs and challenges in this method of tinnitus recognition or prediction and suggest future directions for research.

摘要
听力障碍（tinnitus）是一种非常普遍的听力疾病，可以由年龄、听力损伤、高音响应、耳感染或肿瘤、某些药物、头或Neck伤等多种因素引起。虽然不是所有患者需要医疗干预，但约20%的患者会寻求临床 intervención。早期诊断非常重要，以便有效的治疗。在过去几年中，对听力障碍检测方法的新发展带来了一定的进步。通过使用电enzephalography（EEG）研究听力障碍相关的脑动力学特征，已经有了一定的进步。然而，这些研究的结果很多样化，导致了不一致的结论。目前，临床医生仅仅靠自己的专业知识来诊断听力障碍。研究人员在这一领域已经结合了不同的数据模式和机器学习技术，以帮助临床医生识别听力障碍特征并将患者分类。本文的目的是对使用机器学习（ML）识别或预测听力障碍患者的研究进行系统性的文献综述。我们对2016年至2023年间发表的11篇文章进行了系统性的文献综述，并对每篇文章进行了精确的摘要。此外，我们还进行了统计分析，以更深入地了解最近的研究发展。大多数复习的文章遵循了五步程序来实现听力障碍识别或预测的目标。最后，我们讨论了这一方法的开放问题和挑战，并建议未来的研究方向。

PrObeD: Proactive Object Detection Wrapper

paper_url: http://arxiv.org/abs/2310.18788
repo_url: None
paper_authors: Vishal Asnani, Abhinav Kumar, Suya You, Xiaoming Liu
For: 提高$2D$物体检测的性能，使其能够更好地检测普通和掩蔽的图像中的物体。* Methods: 基于 wrapper 的扩展方法 PrObeD，包括一个编码器-解码器架构，通过生成图像依赖的信号（模板）来加密输入图像，并通过解码器从Encrypted images中提取这个模板。* Results: 对 MS-COCO、CAMO、COD$10$K 和 NC$4$K 数据集进行了实验，并在不同的检测器上显示了提高的检测性能。

Abstract
Previous research in $2D$ object detection focuses on various tasks, including detecting objects in generic and camouflaged images. These works are regarded as passive works for object detection as they take the input image as is. However, convergence to global minima is not guaranteed to be optimal in neural networks; therefore, we argue that the trained weights in the object detector are not optimal. To rectify this problem, we propose a wrapper based on proactive schemes, PrObeD, which enhances the performance of these object detectors by learning a signal. PrObeD consists of an encoder-decoder architecture, where the encoder network generates an image-dependent signal termed templates to encrypt the input images, and the decoder recovers this template from the encrypted images. We propose that learning the optimum template results in an object detector with an improved detection performance. The template acts as a mask to the input images to highlight semantics useful for the object detector. Finetuning the object detector with these encrypted images enhances the detection performance for both generic and camouflaged. Our experiments on MS-COCO, CAMO, COD$10$K, and NC$4$K datasets show improvement over different detectors after applying PrObeD. Our models/codes are available at https://github.com/vishal3477/Proactive-Object-Detection.

摘要
PrObeD consists of an encoder-decoder architecture, where the encoder network generates an image-dependent signal called templates to encrypt the input images, and the decoder recovers this template from the encrypted images. We believe that learning the optimum template results in an object detector with improved detection performance. The template acts as a mask to the input images, highlighting semantics that are useful for the object detector. Finetuning the object detector with these encrypted images improves the detection performance for both generic and camouflaged objects.Our experiments on the MS-COCO, CAMO, COD$10$K, and NC$4$K datasets show that PrObeD improves the detection performance of different object detectors. Our models and codes are available at https://github.com/vishal3477/Proactive-Object-Detection.Simplified Chinese translation:前一些研究主要关注在二维对象检测中的不同任务，包括检测通用和涂抹图像中的对象。这些工作被视为通过对输入图像进行修改来实现对象检测的被动方法。然而，神经网络中的学习结果可能并不是最优的，因此我们认为这些学习结果可能并不是最优的。为了解决这个问题，我们提出了一种基于主动方法的包装器，称为PrObeD，它可以提高对象检测器的性能。PrObeD包括一个编码器-解码器架构，其中编码器网络生成一个图像具有依赖关系的信号，称为模板，并将这个模板用于对输入图像进行加密。解码器则可以从加密后的图像中提取出这个模板。我们认为，学习最优的模板可以提高对象检测器的检测性能。模板可以视为对输入图像进行修饰，使对象检测器更容易察见用于检测的 semantics。通过在这些加密图像上进行训练，可以提高对象检测器的检测性能，包括通用和涂抹图像中的对象。我们在 MS-COCO、CAMO、COD$10$K 和 NC$4$K 数据集上进行了实验，结果显示 PrObeD 可以提高不同的对象检测器的检测性能。我们的模型和代码可以在 https://github.com/vishal3477/Proactive-Object-Detection 上获取。

CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data

paper_url: http://arxiv.org/abs/2310.18773
repo_url: https://github.com/atr-dbi/cityrefer
paper_authors: Taiki Miyanishi, Fumiya Kitamori, Shuhei Kurita, Jungdae Lee, Motoaki Kawanabe, Nakamasa Inoue
for: 城市级3D点云数据是用于表示细节和复杂的户外结构的有前途的方式，可以用于吸引人的应用，如自适应导航和无人机。
methods: 我们引入了CityRefer数据集，其包含35k个自然语言描述和5k个地标标签，以及与OpenStreetMap的同步。我们还开发了基线系统，可以学习编码语言描述、3D物体实例和城市的地标信息，以实现视Grounding。
results: 据我们知道，CityRefer数据集是当前最大的城市级视Grounding数据集，用于本地化特定3D对象。

Abstract
City-scale 3D point cloud is a promising way to express detailed and complicated outdoor structures. It encompasses both the appearance and geometry features of segmented city components, including cars, streets, and buildings, that can be utilized for attractive applications such as user-interactive navigation of autonomous vehicles and drones. However, compared to the extensive text annotations available for images and indoor scenes, the scarcity of text annotations for outdoor scenes poses a significant challenge for achieving these applications. To tackle this problem, we introduce the CityRefer dataset for city-level visual grounding. The dataset consists of 35k natural language descriptions of 3D objects appearing in SensatUrban city scenes and 5k landmarks labels synchronizing with OpenStreetMap. To ensure the quality and accuracy of the dataset, all descriptions and labels in the CityRefer dataset are manually verified. We also have developed a baseline system that can learn encoded language descriptions, 3D object instances, and geographical information about the city's landmarks to perform visual grounding on the CityRefer dataset. To the best of our knowledge, the CityRefer dataset is the largest city-level visual grounding dataset for localizing specific 3D objects.

摘要
城市级3D点云是一种有前途的方式表达细节和复杂的户外结构。它包括城市组成部分的外观和几何特征，包括汽车、街道和建筑物，可以用于有吸引力的应用，如用户交互导航自动汽车和无人机。然而，相比于图像和室内场景的广泛文本注释，户外场景的文本注释的缺乏对市场具有显著的挑战，以实现这些应用。为解决这个问题，我们介绍了城市参照数据集（CityRefer），该数据集包含35000个自然语言描述3D объек在敏捷城市场景中出现的场景和5000个地标标签，与OpenStreetMap相匹配。为保证数据集的质量和准确性，所有的描述和标签在CityRefer数据集中都是手动验证的。我们还开发了一个基eline系统，可以学习编码的自然语言描述、3D объек实例和城市的地标信息，以在CityRefer数据集上进行视觉定位。据我们所知，CityRefer数据集是当前最大的城市级视觉定位数据集，用于特定3D对象的本地化。

Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling

paper_url: http://arxiv.org/abs/2310.18728
repo_url: https://github.com/cshaowang/dPoE
paper_authors: Hao Wang, Zhi-Qi Cheng, Jingdong Sun, Xin Yang, Xiao Wu, Hongyang Chen, Yan Yang
for: 本研究的目的是提出一种能够处理多视图数据的异常检测方法，以解决现有方法中的一些缺陷，如只适用于两个视图或特定类型异常等。
methods: 本研究使用了多视图学习、分解表示学习和生成模型等方法，其中包括一个Product-of-Experts（PoE）层、一个Total Correction（TC）推定器和一个联合损失函数等。
results: 经过广泛的实验测试，提出的dPoE模型在六个真实世界数据集上表现出色，舒过基elines明显。

Abstract
Multi-view or even multi-modal data is appealing yet challenging for real-world applications. Detecting anomalies in multi-view data is a prominent recent research topic. However, most of the existing methods 1) are only suitable for two views or type-specific anomalies, 2) suffer from the issue of fusion disentanglement, and 3) do not support online detection after model deployment. To address these challenges, our main ideas in this paper are three-fold: multi-view learning, disentangled representation learning, and generative model. To this end, we propose dPoE, a novel multi-view variational autoencoder model that involves (1) a Product-of-Experts (PoE) layer in tackling multi-view data, (2) a Total Correction (TC) discriminator in disentangling view-common and view-specific representations, and (3) a joint loss function in wrapping up all components. In addition, we devise theoretical information bounds to control both view-common and view-specific representations. Extensive experiments on six real-world datasets demonstrate that the proposed dPoE outperforms baselines markedly.

摘要
多视图或多模式数据吸引了现实应用的研究者，但是检测多视图数据中异常现象是一个挑战。现有的大多数方法1）只适用于两个视图或类型特定异常，2）受混合解决问题的影响，3）无法在模型部署后进行在线检测。为解决这些挑战，我们的主要想法是三重：多视图学习、分解表示学习和生成模型。为此，我们提出了dPoE，一种新的多视图变量自适应器模型，它包括（1）一个Product-of-Experts（PoE）层来处理多视图数据，（2）一个总正确（TC）推分器来分解视图共同和视图特定表示，以及（3）一个联合损失函数来包装所有组件。此外，我们设计了理论信息约束来控制视图共同和视图特定表示。广泛的实验证明了我们提出的dPoE明显超过基eline。

Audio-Visual Instance Segmentation

paper_url: http://arxiv.org/abs/2310.18709
repo_url: None
paper_authors: Ruohao Guo, Yaru Chen, Yanyu Qi, Wenzhen Yue, Dantong Niu, Xianghua Ying
for: 这个论文目标是提出一种新的多模态任务，即音频视频实例分割（AVIS），目的是同时在可见视频中识别、分割和跟踪各种声音实例。
methods: 该论文使用了一种简单的基础模型，其中添加了一个音频分支和一个跨模态融合模块，以使用Mask2Former来找到所有声音实例。
results: 该论文使用两种脊梁进行评估，并得到了在AVISeg上的较好的性能。作者认为，AVIS将激励社区尝试更加全面的多模态理解。

Abstract
In this paper, we propose a new multi-modal task, namely audio-visual instance segmentation (AVIS), in which the goal is to identify, segment, and track individual sounding object instances in audible videos, simultaneously. To our knowledge, it is the first time that instance segmentation has been extended into the audio-visual domain. To better facilitate this research, we construct the first audio-visual instance segmentation benchmark (AVISeg). Specifically, AVISeg consists of 1,258 videos with an average duration of 62.6 seconds from YouTube and public audio-visual datasets, where 117 videos have been annotated by using an interactive semi-automatic labeling tool based on the Segment Anything Model (SAM). In addition, we present a simple baseline model for the AVIS task. Our new model introduces an audio branch and a cross-modal fusion module to Mask2Former to locate all sounding objects. Finally, we evaluate the proposed method using two backbones on AVISeg. We believe that AVIS will inspire the community towards a more comprehensive multi-modal understanding.

摘要
在这篇论文中，我们提出了一个新的多模态任务，即听视频实例分割（AVIS），目的是同时在听sible的视频中识别、分割和跟踪具有声音的对象实例。我们认为这是多模态理解的一个新的突破口。为了更好地推进这项研究，我们建立了首个听视频实例分割benchmark（AVISeg）。具体来说，AVISeg包括YouTube和公共听视频数据集的1,258个视频，视频的平均时长为62.6秒，其中117个视频通过使用基于Segment Anything Model（SAM）的交互式半自动标注工具进行了标注。此外，我们提出了一个简单的基线模型 дляAVIS任务。我们的新模型在Mask2Former模型中添加了一个声音支持和一个跨模态融合模块，以便在听sible的视频中找到所有的声音对象。最后，我们使用两个背景网络测试了我们的提议方法。我们认为AVIS将鼓励社区更加全面地理解多模态。

Triplet Attention Transformer for Spatiotemporal Predictive Learning

paper_url: http://arxiv.org/abs/2310.18698
repo_url: None
paper_authors: Xuesong Nie, Xi Chen, Haoyuan Jin, Zhihang Zhu, Yunfeng Yan, Donglian Qi
for: 预测未来序列 based on 历史序列，提高预测质量 while maintaining 计算效率
methods: 使用 triplet attention transformer，包括 Triplet Attention Module (TAM)，替代传统的 recurrent units， capture both inter-frame dynamics 和 intra-frame static features
results: 在多种场景下，包括移动对象轨迹预测、交通流预测、驾驶场景预测和人体动作捕捉，实现了 state-of-the-art 性能，超过了现有的 recurrent-based 和 recurrent-free 方法

Abstract
Spatiotemporal predictive learning offers a self-supervised learning paradigm that enables models to learn both spatial and temporal patterns by predicting future sequences based on historical sequences. Mainstream methods are dominated by recurrent units, yet they are limited by their lack of parallelization and often underperform in real-world scenarios. To improve prediction quality while maintaining computational efficiency, we propose an innovative triplet attention transformer designed to capture both inter-frame dynamics and intra-frame static features. Specifically, the model incorporates the Triplet Attention Module (TAM), which replaces traditional recurrent units by exploring self-attention mechanisms in temporal, spatial, and channel dimensions. In this configuration: (i) temporal tokens contain abstract representations of inter-frame, facilitating the capture of inherent temporal dependencies; (ii) spatial and channel attention combine to refine the intra-frame representation by performing fine-grained interactions across spatial and channel dimensions. Alternating temporal, spatial, and channel-level attention allows our approach to learn more complex short- and long-range spatiotemporal dependencies. Extensive experiments demonstrate performance surpassing existing recurrent-based and recurrent-free methods, achieving state-of-the-art under multi-scenario examination including moving object trajectory prediction, traffic flow prediction, driving scene prediction, and human motion capture.

摘要
《空时空间预测学习》提供了一种自主学习 paradigma，允许模型通过预测未来序列基于历史序列来学习空间和时间模式。主流方法受限于缺乏并行化和实际场景下的表现不佳，我们提议一种创新的 triplet attention transformer，用于捕捉空间、时间和通道维度的自我注意力机制。在这种配置下：（i）时间ток包含了抽象的间隔frame的表示，以便捕捉自然的时间依赖关系；（ii）空间和通道注意力结合以进一步细化内帧表示，通过在空间和通道维度进行细化的交互来增强模型对于短距离和长距离空间时间关系的学习能力。 alternate temporal、空间和通道级别的注意力允许我们的方法学习更复杂的短距离和长距离空间时间关系。广泛的实验证明了我们的方法在多种场景下，包括人体动作跟踪、交通流量预测、驾驶场景预测和人体动作捕捉等，性能超过了现有的循环单元和循环自由方法，实现了状态当前的水平。

Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision

paper_url: http://arxiv.org/abs/2310.18689
repo_url: None
paper_authors: Bobby Azad, Reza Azad, Sania Eskandari, Afshin Bozorgpour, Amirhossein Kazerouni, Islem Rekik, Dorit Merhof
For: This paper provides a comprehensive overview of foundation models in the domain of medical imaging, with a focus on their applications, opportunities, and future directions.* Methods: The paper classifies foundation models within the medical domain based on training strategies, imaging modalities, specific organs of interest, and algorithms integral to these models.* Results: The paper discusses the practical use cases of some selected approaches and addresses the challenges and research pathways associated with foundational models in medical imaging, including interpretability, data management, computational requirements, and contextual comprehension.

Abstract
Foundation models, large-scale, pre-trained deep-learning models adapted to a wide range of downstream tasks have gained significant interest lately in various deep-learning problems undergoing a paradigm shift with the rise of these models. Trained on large-scale dataset to bridge the gap between different modalities, foundation models facilitate contextual reasoning, generalization, and prompt capabilities at test time. The predictions of these models can be adjusted for new tasks by augmenting the model input with task-specific hints called prompts without requiring extensive labeled data and retraining. Capitalizing on the advances in computer vision, medical imaging has also marked a growing interest in these models. To assist researchers in navigating this direction, this survey intends to provide a comprehensive overview of foundation models in the domain of medical imaging. Specifically, we initiate our exploration by providing an exposition of the fundamental concepts forming the basis of foundation models. Subsequently, we offer a methodical taxonomy of foundation models within the medical domain, proposing a classification system primarily structured around training strategies, while also incorporating additional facets such as application domains, imaging modalities, specific organs of interest, and the algorithms integral to these models. Furthermore, we emphasize the practical use case of some selected approaches and then discuss the opportunities, applications, and future directions of these large-scale pre-trained models, for analyzing medical images. In the same vein, we address the prevailing challenges and research pathways associated with foundational models in medical imaging. These encompass the areas of interpretability, data management, computational requirements, and the nuanced issue of contextual comprehension.

摘要
大量训练的深度学习模型（foundation models）在不同领域的深度学习问题中受到了非常大的关注。这些模型可以在各种模式之间进行Contextual reasoning，通过加入任务特定的提示（prompts）来调整模型的预测，无需大量的标注数据和重新训练。随着计算机视觉领域的进步，医学影像领域也开始关注这些模型。本文旨在为医学影像领域的研究人员提供一份全面的评论，以帮助他们在这个方向中探索。 Specifically, we begin by providing an overview of the fundamental concepts that underlie foundation models. We then offer a systematic taxonomy of foundation models within the medical domain, classifying them based on their training strategies, application domains, imaging modalities, specific organs of interest, and the algorithms used in these models. We also highlight the practical use cases of some selected approaches and discuss the opportunities, applications, and future directions of these large-scale pre-trained models for analyzing medical images. Furthermore, we address the challenges and research pathways associated with foundational models in medical imaging, including interpretability, data management, computational requirements, and the nuanced issue of contextual comprehension.

Efficient Object Detection in Optical Remote Sensing Imagery via Attention-based Feature Distillation

paper_url: http://arxiv.org/abs/2310.18676
repo_url: None
paper_authors: Pourya Shamsolmoali, Jocelyn Chanussot, Huiyu Zhou, Yue Lu
for: 这篇论文主要针对的是实时观测中的有效物体检测方法，并且使用知识传播（KD）技术来实现轻量级模型，同时保持精度。
methods: 本文提出了一个新的知识传播方法，即注意力基本Distillation（AFD），这个方法可以将教师模型中的本地和全球资讯都传播到学生模型中，以提高学生模型的检测精度。此外，本文还引入了一个多例对劲机制，以分辨背景和前景元素，并将其传播到学生模型中。
results: 本文的实验结果显示，这个AFD方法可以在两个公共的航空图像benchmark上实现和其他状态顶对称模型相同的检测性能，同时具有轻量级的特点。

Abstract
Efficient object detection methods have recently received great attention in remote sensing. Although deep convolutional networks often have excellent detection accuracy, their deployment on resource-limited edge devices is difficult. Knowledge distillation (KD) is a strategy for addressing this issue since it makes models lightweight while maintaining accuracy. However, existing KD methods for object detection have encountered two constraints. First, they discard potentially important background information and only distill nearby foreground regions. Second, they only rely on the global context, which limits the student detector's ability to acquire local information from the teacher detector. To address the aforementioned challenges, we propose Attention-based Feature Distillation (AFD), a new KD approach that distills both local and global information from the teacher detector. To enhance local distillation, we introduce a multi-instance attention mechanism that effectively distinguishes between background and foreground elements. This approach prompts the student detector to focus on the pertinent channels and pixels, as identified by the teacher detector. Local distillation lacks global information, thus attention global distillation is proposed to reconstruct the relationship between various pixels and pass it from teacher to student detector. The performance of AFD is evaluated on two public aerial image benchmarks, and the evaluation results demonstrate that AFD in object detection can attain the performance of other state-of-the-art models while being efficient.

摘要
Recently, efficient object detection methods have received significant attention in remote sensing. Although deep convolutional networks often have excellent detection accuracy, deploying them on resource-limited edge devices is challenging. Knowledge distillation (KD) is a strategy that can address this issue by making models lightweight while maintaining accuracy. However, existing KD methods for object detection have two limitations. First, they discard potentially important background information and only distill nearby foreground regions. Second, they only rely on global context, which limits the student detector's ability to acquire local information from the teacher detector.To address these challenges, we propose Attention-based Feature Distillation (AFD), a new KD approach that distills both local and global information from the teacher detector. To enhance local distillation, we introduce a multi-instance attention mechanism that effectively distinguishes between background and foreground elements. This approach prompts the student detector to focus on the pertinent channels and pixels, as identified by the teacher detector. Local distillation lacks global information, so we propose attention global distillation to reconstruct the relationship between various pixels and pass it from teacher to student detector.We evaluate the performance of AFD on two public aerial image benchmarks, and the results show that AFD can achieve the performance of other state-of-the-art models while being efficient.

Foundation Models for Generalist Geospatial Artificial Intelligence

paper_url: http://arxiv.org/abs/2310.18660
repo_url: None
paper_authors: Johannes Jakubik, Sujit Roy, C. E. Phillips, Paolo Fraccaro, Denys Godwin, Bianca Zadrozny, Daniela Szwarcman, Carlos Gomes, Gabby Nyirjesy, Blair Edwards, Daiki Kimura, Naomi Simumba, Linsong Chu, S. Karthik Mukkavilli, Devyani Lambhate, Kamal Das, Ranjini Bangalore, Dario Oliveira, Michal Muszynski, Kumar Ankur, Muthukumaran Ramasubramanian, Iksha Gurung, Sam Khallaghi, Hanxi, Li, Michael Cecil, Maryam Ahmadi, Fatemeh Kordi, Hamed Alemohammad, Manil Maskey, Raghu Ganti, Kommy Weldemariam, Rahul Ramachandran
for:* 这篇论文的目的是为了提出一个高度可调整且可重用的人工智能（AI）模型的开发，以便在地球科学和遥测中具有重要影响。methods:* 这篇论文使用了自我指导的方法来预训foundational models，然后使用小量标签数据进行精确化。results:* 这篇论文的研究显示了一个首次的框架可以有效地将foundational models预训和精确化，并在多个地球观测任务上表现出色，例如多 Spectral satellite imagery 的测试。

Abstract
Significant progress in the development of highly adaptable and reusable Artificial Intelligence (AI) models is expected to have a significant impact on Earth science and remote sensing. Foundation models are pre-trained on large unlabeled datasets through self-supervision, and then fine-tuned for various downstream tasks with small labeled datasets. This paper introduces a first-of-a-kind framework for the efficient pre-training and fine-tuning of foundational models on extensive geospatial data. We have utilized this framework to create Prithvi, a transformer-based geospatial foundational model pre-trained on more than 1TB of multispectral satellite imagery from the Harmonized Landsat-Sentinel 2 (HLS) dataset. Our study demonstrates the efficacy of our framework in successfully fine-tuning Prithvi to a range of Earth observation tasks that have not been tackled by previous work on foundation models involving multi-temporal cloud gap imputation, flood mapping, wildfire scar segmentation, and multi-temporal crop segmentation. Our experiments show that the pre-trained model accelerates the fine-tuning process compared to leveraging randomly initialized weights. In addition, pre-trained Prithvi compares well against the state-of-the-art, e.g., outperforming a conditional GAN model in multi-temporal cloud imputation by up to 5pp (or 5.7%) in the structural similarity index. Finally, due to the limited availability of labeled data in the field of Earth observation, we gradually reduce the quantity of available labeled data for refining the model to evaluate data efficiency and demonstrate that data can be decreased significantly without affecting the model's accuracy. The pre-trained 100 million parameter model and corresponding fine-tuning workflows have been released publicly as open source contributions to the global Earth sciences community through Hugging Face.

摘要
“预计在人工智能（AI）模型的开发中，有 significante进步，这将对地球科学和远程感知产生重要影响。基础模型通过自我超vision，在大量无标签数据上自我预训练，然后使用小量标签数据进行精度调整。这篇论文介绍了一种新的框架，用于高效地预训练和精度调整基础模型，并在extensive geospatial数据上进行了实践。我们使用了这个框架，创造了一个基于转换器的地ospatial基础模型，名为Prithvi，并在 более чем 1TB的多spectral卫星图像上进行了预训练。我们的研究表明，我们的框架可以成功地将Prithvi fine-tune到多种地观测任务中，包括多temporal云阴掩模型、洪水地图、野火痕分割和多temporal作物分割。我们的实验显示，预训练后的模型可以加速 fine-tuning 过程，并且与随机初始化的权重相比，具有更高的准确率。此外，我们的Prithvi模型在多temporal云阴掩模型中与状态艺术模型进行比较，在结构相似指数中提高了5pp（或5.7%）。最后，由于地球观测领域内标注数据的有限性，我们逐渐减少可用标注数据的量来评估数据效率，并证明可以大幅减少数据量而无需影响模型的准确率。预训练10000万参数模型和相应的精度调整工作流已经公开发布在Hugging Face上，作为对全球地球科学社区的开源贡献。”

Med-DANet V2: A Flexible Dynamic Architecture for Efficient Medical Volumetric Segmentation

paper_url: http://arxiv.org/abs/2310.18656
repo_url: None
paper_authors: Haoran Shen, Yifu Zhang, Wenxuan Wang, Chen Chen, Jing Liu, Shanshan Song, Jiangyun Li
for: 这个论文的目的是提高医疗影像三维分类的计算效率。
methods: 这个方法使用了动态推论基于层次复杂度，并 dynamically选择适合不同层次的2D候选模型。
results: 该方法在BraTS 2019和2020的实验中实现了与前一代方法相似或更好的性能，并且具有许多少的模型复杂度。相比Med-DANet和TransBTS，我们的框架可以提高模型效率，并且具有相似的分类结果。

Abstract
Recent works have shown that the computational efficiency of 3D medical image (e.g. CT and MRI) segmentation can be impressively improved by dynamic inference based on slice-wise complexity. As a pioneering work, a dynamic architecture network for medical volumetric segmentation (i.e. Med-DANet) has achieved a favorable accuracy and efficiency trade-off by dynamically selecting a suitable 2D candidate model from the pre-defined model bank for different slices. However, the issues of incomplete data analysis, high training costs, and the two-stage pipeline in Med-DANet require further improvement. To this end, this paper further explores a unified formulation of the dynamic inference framework from the perspective of both the data itself and the model structure. For each slice of the input volume, our proposed method dynamically selects an important foreground region for segmentation based on the policy generated by our Decision Network and Crop Position Network. Besides, we propose to insert a stage-wise quantization selector to the employed segmentation model (e.g. U-Net) for dynamic architecture adapting. Extensive experiments on BraTS 2019 and 2020 show that our method achieves comparable or better performance than previous state-of-the-art methods with much less model complexity. Compared with previous methods Med-DANet and TransBTS with dynamic and static architecture respectively, our framework improves the model efficiency by up to nearly 4.1 and 17.3 times with comparable segmentation results on BraTS 2019.

摘要
To address these issues, this paper proposes a unified formulation of the dynamic inference framework from both the data and model perspectives. For each slice of the input volume, our method dynamically selects an important foreground region for segmentation based on the policy generated by our Decision Network and Crop Position Network. Additionally, we propose inserting a stage-wise quantization selector into the employed segmentation model (such as U-Net) for dynamic architecture adapting.Experiments on the BraTS 2019 and 2020 datasets show that our method achieves performance comparable to or better than previous state-of-the-art methods with much less model complexity. Compared with previous methods Med-DANet and TransBTS with dynamic and static architecture, respectively, our framework improves model efficiency by up to nearly 4.1 and 17.3 times with comparable segmentation results on BraTS 2019.

Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing

paper_url: http://arxiv.org/abs/2310.18653
repo_url: https://github.com/zhu-xlab/fgmae
paper_authors: Yi Wang, Hugo Hernández Hernández, Conrad M Albrecht, Xiao Xiang Zhu
for: 这篇论文旨在探讨自我监督学习帮助vised transformer在远程感知中进行预训。
methods: 本论文使用Masked AutoEncoder（MAE）作为预训模型，并将spectral和spatial remote sensing图像特征作为改进MAE重建目标。
results: 实验结果显示Feature Guided Masked Autoencoder（FG-MAE）可以提高多spectral图像和SAR图像的semantic理解，并且具有很好的扩展性。

Abstract
Self-supervised learning guided by masked image modelling, such as Masked AutoEncoder (MAE), has attracted wide attention for pretraining vision transformers in remote sensing. However, MAE tends to excessively focus on pixel details, thereby limiting the model's capacity for semantic understanding, in particular for noisy SAR images. In this paper, we explore spectral and spatial remote sensing image features as improved MAE-reconstruction targets. We first conduct a study on reconstructing various image features, all performing comparably well or better than raw pixels. Based on such observations, we propose Feature Guided Masked Autoencoder (FG-MAE): reconstructing a combination of Histograms of Oriented Graidents (HOG) and Normalized Difference Indices (NDI) for multispectral images, and reconstructing HOG for SAR images. Experimental results on three downstream tasks illustrate the effectiveness of FG-MAE with a particular boost for SAR imagery. Furthermore, we demonstrate the well-inherited scalability of FG-MAE and release a first series of pretrained vision transformers for medium resolution SAR and multispectral images.

摘要
自领导学习，如遮盲自动编码（MAE），在远程感知领域内吸引了广泛的关注，用于预训练视Transformer。然而，MAE往往过分关注像素细节，因此限制模型对Semantic理解的能力，特别是对噪音SAR图像。在这篇论文中，我们探索谱spectral和空间Remote sensing图像特征作为改进MAE重建目标。我们首先进行了不同图像特征的重建研究，并发现所有特征都可以相对或更好地than raw pixels。基于这些观察，我们提出了特征引导遮盲自动编码（FG-MAE）：对多spectral图像重建 histogram of oriented gradients（HOG）和normalized difference indices（NDI）的组合，对SAR图像重建HOG。我们的实验结果表明FG-MAE在三个下游任务上表现出了效果，特别是对SAR图像。此外，我们还证明了FG-MAE具有良好的扩展性，并发布了首个媒体分辨率SAR和多spectral图像预训练的视Transformer。

Local-Global Self-Supervised Visual Representation Learning

paper_url: http://arxiv.org/abs/2310.18651
repo_url: https://github.com/alijavidani/local_global_representation_learning
paper_authors: Ali Javidani, Mohammad Amin Sadeghi, Babak Nadjar Araabi
for: 本研究旨在探讨将patch-level特征学习纳入现有自动学习批处理方法中，以提高学习得到的视觉表示的质量。
methods: 我们提出了一种简单 yet effective的patch-matching算法，可以在扩展视图下找到图像中匹配的补丁。然后，我们使用基于Vision Transformer（ViT）的自动学习框架，将扩展视图和补丁进行自我超vised学习。这种方法可以同时生成图像级别和补丁级别的表示。
results: 我们在小、中、大规模数据集上预训练了我们的方法，并证明了我们的方法可以在图像分类和下游任务中超越当前状态艺技。

Abstract
Self-supervised representation learning methods mainly focus on image-level instance discrimination. This study explores the potential benefits of incorporating patch-level discrimination into existing methods to enhance the quality of learned representations by simultaneously looking at local and global visual features. Towards this idea, we present a straightforward yet effective patch-matching algorithm that can find the corresponding patches across the augmented views of an image. The augmented views are subsequently fed into a self-supervised learning framework employing Vision Transformer (ViT) as its backbone. The result is the generation of both image-level and patch-level representations. Leveraging the proposed patch-matching algorithm, the model minimizes the representation distance between not only the CLS tokens but also the corresponding patches. As a result, the model gains a more comprehensive understanding of both the entirety of the image as well as its finer details. We pretrain the proposed method on small, medium, and large-scale datasets. It is shown that our approach could outperform state-of-the-art image-level representation learning methods on both image classification and downstream tasks. Keywords: Self-Supervised Learning; Visual Representations; Local-Global Representation Learning; Patch-Wise Representation Learning; Vision Transformer (ViT)

摘要
自我监督学习方法主要关注图像级别的实例识别。本研究探讨可以将图像级别的识别与patch级别的识别结合到现有方法中，以提高学习的表示质量。为了实现这一目标，我们提出了一种简单又有效的补丁匹配算法，可以在扩展视图中找到图像中的相对应补丁。这些扩展视图然后被 feed into一个自我监督学习框架，使用 Vision Transformer（ViT）作为 backing。通过这种方法，我们可以同时生成图像级别的表示和patch级别的表示。通过补丁匹配算法，模型可以将 CLS Token 的表示距离和相应的补丁之间的表示距离减小。因此，模型可以更好地理解图像的整体特征和细节特征。我们在小、中、大样本大小上进行预训练，并显示了我们的方法可以在图像级别表示学习中超越状态艺术的图像级别表示学习方法。关键词：自我监督学习;视觉表示;本地-全局表示学习;补丁级别表示学习; Vision Transformer（ViT）

Switching Temporary Teachers for Semi-Supervised Semantic Segmentation

paper_url: http://arxiv.org/abs/2310.18640
repo_url: https://github.com/naver-ai/dual-teacher
paper_authors: Jaemin Na, Jung-Woo Ha, Hyung Jin Chang, Dongyoon Han, Wonjun Hwang
for: 这篇研究旨在提高 semi-supervised semantic segmentation 的效果，并且解决 teacher 和学生模型之间的问题。
methods: 这篇研究使用了 dual temporary teacher 方法，将 teacher 和学生模型分为两个短期教师，以降低学生模型与教师模型之间的 Coupling 问题。
results: 这篇研究在 PASCAL VOC、Cityscapes 和 ADE20K 测试 benchmark 上达到了竞争性的表现，并且训练时间比 state-of-the-art 方法短得多。此外，这篇研究还证明了其方法是模型不敏感的，可以与 CNN 和 Transformer 等不同类型的模型搭配使用。

Abstract
The teacher-student framework, prevalent in semi-supervised semantic segmentation, mainly employs the exponential moving average (EMA) to update a single teacher's weights based on the student's. However, EMA updates raise a problem in that the weights of the teacher and student are getting coupled, causing a potential performance bottleneck. Furthermore, this problem may become more severe when training with more complicated labels such as segmentation masks but with few annotated data. This paper introduces Dual Teacher, a simple yet effective approach that employs dual temporary teachers aiming to alleviate the coupling problem for the student. The temporary teachers work in shifts and are progressively improved, so consistently prevent the teacher and student from becoming excessively close. Specifically, the temporary teachers periodically take turns generating pseudo-labels to train a student model and maintain the distinct characteristics of the student model for each epoch. Consequently, Dual Teacher achieves competitive performance on the PASCAL VOC, Cityscapes, and ADE20K benchmarks with remarkably shorter training times than state-of-the-art methods. Moreover, we demonstrate that our approach is model-agnostic and compatible with both CNN- and Transformer-based models. Code is available at \url{https://github.com/naver-ai/dual-teacher}.

摘要
教师-学生框架，广泛存在 semi-supervised 语义分割中，主要采用积分移动平均（EMA）来更新单个教师的参数基于学生的。然而，EMA 更新可能会导致教师和学生的参数相互关联，从而引起性能瓶颈。此外，这种问题可能会在具有更复杂的标签，如分割mask，但具有少量标注数据的情况下变得更加严重。本文介绍了 Dual Teacher，一种简单 yet effective 的方法，它采用了双临时教师来解决学生模型与教师模型之间的关联问题。这两个临时教师会在交替的时间间隔内为学生模型生成 pseudo-标签，以便在每个轮次中维护学生模型的独特特征。因此， Dual Teacher 在 PASCAL VOC、Cityscapes 和 ADE20K 测试集上 achieve 竞争性性能，并且训练时间较短于当前state-of-the-art 方法。此外，我们还证明了我们的方法是模型无关的，可以与 CNN 和 Transformer 等模型结合使用。代码可以在 \url{https://github.com/naver-ai/dual-teacher} 上找到。

Towards Plastic and Stable Exemplar-Free Incremental Learning: A Dual-Learner Framework with Cumulative Parameter Averaging

paper_url: http://arxiv.org/abs/2310.18639
repo_url: None
paper_authors: Wenju Sun, Qingyong Li, Wen Wang, Yangli-ao Geng
for: 这个研究是为了解决增量学习中的困难，特别是在例项自由情况下，当学习新任务时不能访问旧任务的样本。
methods: 这个方法使用了单任务学习（STL）和平均参数积存（CPA）技术，具有单任务学习和综合学习两种模式。
results: 实验结果显示，这个方法在CIFAR-100和Tiny-ImageNet上比过几个状态顶对的增量学习基eline表现出色，尤其在任务增量学习和类别增量学习情况下。

Abstract
The dilemma between plasticity and stability presents a significant challenge in Incremental Learning (IL), especially in the exemplar-free scenario where accessing old-task samples is strictly prohibited during the learning of a new task. A straightforward solution to this issue is learning and storing an independent model for each task, known as Single Task Learning (STL). Despite the linear growth in model storage with the number of tasks in STL, we empirically discover that averaging these model parameters can potentially preserve knowledge across all tasks. Inspired by this observation, we propose a Dual-Learner framework with Cumulative Parameter Averaging (DLCPA). DLCPA employs a dual-learner design: a plastic learner focused on acquiring new-task knowledge and a stable learner responsible for accumulating all learned knowledge. The knowledge from the plastic learner is transferred to the stable learner via cumulative parameter averaging. Additionally, several task-specific classifiers work in cooperation with the stable learner to yield the final prediction. Specifically, when learning a new task, these modules are updated in a cyclic manner: i) the plastic learner is initially optimized using a self-supervised loss besides the supervised loss to enhance the feature extraction robustness; ii) the stable learner is then updated with respect to the plastic learner in a cumulative parameter averaging manner to maintain its task-wise generalization; iii) the task-specific classifier is accordingly optimized to align with the stable learner. Experimental results on CIFAR-100 and Tiny-ImageNet show that DLCPA outperforms several state-of-the-art exemplar-free baselines in both Task-IL and Class-IL settings.

摘要
increments 学习（IL）中的困境在选择between plasticity and stability 上是一个 significannot challenge, especially in the exemplar-free scenario where accessing old-task samples is strictly prohibited during the learning of a new task. A straightforward solution to this issue is learning and storing an independent model for each task, known as Single Task Learning (STL). Despite the linear growth in model storage with the number of tasks in STL, we empirically discover that averaging these model parameters can potentially preserve knowledge across all tasks. Inspired by this observation, we propose a Dual-Learner framework with Cumulative Parameter Averaging (DLCPA). DLCPA employs a dual-learner design: a plastic learner focused on acquiring new-task knowledge and a stable learner responsible for accumulating all learned knowledge. The knowledge from the plastic learner is transferred to the stable learner via cumulative parameter averaging. Additionally, several task-specific classifiers work in cooperation with the stable learner to yield the final prediction. Specifically, when learning a new task, these modules are updated in a cyclic manner: i) the plastic learner is initially optimized using a self-supervised loss besides the supervised loss to enhance the feature extraction robustness; ii) the stable learner is then updated with respect to the plastic learner in a cumulative parameter averaging manner to maintain its task-wise generalization; iii) the task-specific classifier is accordingly optimized to align with the stable learner. Experimental results on CIFAR-100 and Tiny-ImageNet show that DLCPA outperforms several state-of-the-art exemplar-free baselines in both Task-IL and Class-IL settings.

ODM3D: Alleviating Foreground Sparsity for Enhanced Semi-Supervised Monocular 3D Object Detection

paper_url: http://arxiv.org/abs/2310.18620
repo_url: None
paper_authors: Weijia Zhang, Dongnan Liu, Chao Ma, Weidong Cai
for: 提高单光图像3D物体检测（M3OD）的性能，使其能够更好地检测自动驾驶中的3D物体。
methods: 使用semi-supervised learning，将LiDAR频谱知识注入到单光图像检测器中，并通过提取前景稀缺性来进行更加有效的知识传递。
results: 在KITTI验证和测试环境中，其方法 ranked 1st，在BEV和3D检测纪录中都有显著的提升，舒过所有现有的单光方法，包括直接监督和半监督方法。

Abstract
Monocular 3D object detection (M3OD) is a significant yet inherently challenging task in autonomous driving due to absence of implicit depth cues in a single RGB image. In this paper, we strive to boost currently underperforming monocular 3D object detectors by leveraging an abundance of unlabelled data via semi-supervised learning. Our proposed ODM3D framework entails cross-modal knowledge distillation at various levels to inject LiDAR-domain knowledge into a monocular detector during training. By identifying foreground sparsity as the main culprit behind existing methods' suboptimal training, we exploit the precise localisation information embedded in LiDAR points to enable more foreground-attentive and efficient distillation via the proposed BEV occupancy guidance mask, leading to notably improved knowledge transfer and M3OD performance. Besides, motivated by insights into why existing cross-modal GT-sampling techniques fail on our task at hand, we further design a novel cross-modal object-wise data augmentation strategy for effective RGB-LiDAR joint learning. Our method ranks 1st in both KITTI validation and test benchmarks, significantly surpassing all existing monocular methods, supervised or semi-supervised, on both BEV and 3D detection metrics.

摘要
《单目三维物体检测（M3OD）是自主驾驶中的一项重要 yet inherently 挑战性任务，因为单个 RGB 图像中缺乏隐式深度提示。在这篇文章中，我们努力提高目前的单目三维物体检测器，通过利用大量未标注数据进行 semi-supervised 学习。我们提出的 ODM3D 框架在不同层次进行交叉模态知识填充，以在训练中注入 LiDAR 频谱知识到单目检测器。通过发现前景稀畴是现有方法训练不佳的主要原因，我们利用 LiDAR 点的精确地址信息来实现更加前景注意和高效的填充，从而提高知识传递和 M3OD 性能。此外，鉴于现有交叉模态 GT 采样技术在我们任务上失效的原因，我们还设计了一种新的交叉模态对象增强数据采样策略，以便有效地在 RGB 和 LiDAR JOINT 学习中进行对象增强。我们的方法在 KITTI 验证和测试benchmark上rank 1st，明显超过了所有现有的单目方法，包括直接监督和 semi-supervised 方法，在 BEV 和 3D 检测 метриках上。

Domain Generalisation via Risk Distribution Matching

paper_url: http://arxiv.org/abs/2310.18598
repo_url: https://github.com/nktoan/risk-distribution-matching
paper_authors: Toan Nguyen, Kien Do, Bao Duong, Thin Nguyen
for: 这篇论文旨在解决域对应（Domain Generalization，DG）中的问题，提出一个新的方法，利用风险分布来描述域，以 достиieving 域之对称。
methods: 这篇论文使用的方法是基于风险分布的，即使用最大mean距离（MMD）距离来测量风险分布之间的差异，并将其用于域之对称。
results: 实验结果显示，这篇论文提出的方法（Risk Distribution Matching，RDM）在标准的benchmark数据集上具有较高的域对称能力，并且比其他state-of-the-art DG方法更有效率。

Abstract
We propose a novel approach for domain generalisation (DG) leveraging risk distributions to characterise domains, thereby achieving domain invariance. In our findings, risk distributions effectively highlight differences between training domains and reveal their inherent complexities. In testing, we may observe similar, or potentially intensifying in magnitude, divergences between risk distributions. Hence, we propose a compelling proposition: Minimising the divergences between risk distributions across training domains leads to robust invariance for DG. The key rationale behind this concept is that a model, trained on domain-invariant or stable features, may consistently produce similar risk distributions across various domains. Building upon this idea, we propose Risk Distribution Matching (RDM). Using the maximum mean discrepancy (MMD) distance, RDM aims to minimise the variance of risk distributions across training domains. However, when the number of domains increases, the direct optimisation of variance leads to linear growth in MMD computations, resulting in inefficiency. Instead, we propose an approximation that requires only one MMD computation, by aligning just two distributions: that of the worst-case domain and the aggregated distribution from all domains. Notably, this method empirically outperforms optimising distributional variance while being computationally more efficient. Unlike conventional DG matching algorithms, RDM stands out for its enhanced efficacy by concentrating on scalar risk distributions, sidestepping the pitfalls of high-dimensional challenges seen in feature or gradient matching. Our extensive experiments on standard benchmark datasets demonstrate that RDM shows superior generalisation capability over state-of-the-art DG methods.

摘要
我们提出了一种新的领域通用化（DG）方法，利用风险分布来特征化领域，从而实现领域不变性。我们发现，风险分布能够有效地披露训练领域之间的差异和内在复杂性。在测试中，我们可能会观察到类似或者可能加剧的差异between风险分布。因此，我们提出了一个有力的提议：将领域之间风险分布的差异降到最小化，以实现Robust Invariance for DG。这个概念的关键思想是，通过训练领域不变或稳定的特征，我们可以在不同领域上通过风险分布的匹配来实现模型的稳定性。基于这个想法，我们提出了风险分布匹配（RDM）。使用最大平均差（MMD）距离，RDM aimsto minimize the variance of risk distributions across training domains。然而，当领域数量增加时，直接优化差异会导致线性增长的MMD计算，从而变得不效率。因此，我们提出了一种简化方法，只需要一次MMD计算，通过对最坏领域的分布和所有领域的分布进行对应。与传统的DG匹配算法不同，RDM更加有效地做到了通过scalar风险分布来快速匹配，而不需要高维度的特征或梯度匹配。我们对标准 benchmark 数据集进行了广泛的实验，发现RDM在state-of-the-art DG方法中显示出了更好的总体化能力。

This Looks Like Those: Illuminating Prototypical Concepts Using Multiple Visualizations

paper_url: http://arxiv.org/abs/2310.18589
repo_url: https://github.com/henrymachiyu/this-looks-like-those_protoconcepts
paper_authors: Chiyu Ma, Brandon Zhao, Chaofan Chen, Cynthia Rudin
for: 这个论文的目的是提出一种可解释的图像分类方法，结合深度学习和案例基础理解。
methods: 这个方法使用多个图像 patches 来学习概念，并使用这些概念进行可解释的图像分类。
results: 实验结果表明，这种方法可以应用于各种现有的prototype-based图像分类网络中，并在标准数据集上实现相同的准确率。

Abstract
We present ProtoConcepts, a method for interpretable image classification combining deep learning and case-based reasoning using prototypical parts. Existing work in prototype-based image classification uses a ``this looks like that'' reasoning process, which dissects a test image by finding prototypical parts and combining evidence from these prototypes to make a final classification. However, all of the existing prototypical part-based image classifiers provide only one-to-one comparisons, where a single training image patch serves as a prototype to compare with a part of our test image. With these single-image comparisons, it can often be difficult to identify the underlying concept being compared (e.g., ``is it comparing the color or the shape?''). Our proposed method modifies the architecture of prototype-based networks to instead learn prototypical concepts which are visualized using multiple image patches. Having multiple visualizations of the same prototype allows us to more easily identify the concept captured by that prototype (e.g., ``the test image and the related training patches are all the same shade of blue''), and allows our model to create richer, more interpretable visual explanations. Our experiments show that our ``this looks like those'' reasoning process can be applied as a modification to a wide range of existing prototypical image classification networks while achieving comparable accuracy on benchmark datasets.

摘要
我们提出了ProtoConcepts，一种可读性高的图像分类方法，结合深度学习和倡议式推理，使用 protoypical parts。现有的图像分类方法中，使用“这看起来像那”的思维过程，将试验图像分解为 protoypical parts，并从这些 protoypical parts 中获取证据，以进行最终的分类。然而，所有的单一图像比较方法都仅提供一对一的比较，即一个训练图像区块作为一个 prototype，与试验图像中的一部分进行比较。这种单一图像比较可能很难理解到被比较的基本概念（例如，“是对颜色或形状的比较？”）。我们的提案方法改变 prototype-based 网络的架构，以学习 protoypical concepts，这些概念可以通过多个图像区块进行可读性更高的visual化。有多个可读性更高的 visual explanation，我们的模型可以更好地识别基本概念，并创建更加可读性更高的图像解释。我们的实验显示，我们的“这看起来像那”的思维过程可以与现有的 prototype-based 图像分类网络结合，在benchmark dataset上实现相似的准确性。

Self-Supervised Multi-Modality Learning for Multi-Label Skin Lesion Classification

paper_url: http://arxiv.org/abs/2310.18583
repo_url: https://github.com/dylan-h-wang/skin-sm3
paper_authors: Hao Wang, Euijoon Ahn, Lei Bi, Jinman Kim
for: 该研究旨在提高多Modal Skin Lesion 诊断精度，使用自助学习算法和多modal 特征。
methods: 该算法使用了对匹配的dermoscopic和临床图像进行最大化的相似性 Maximization，以及基于归一化的Clustering分析生成Surrogate pseudo-multi-labels。
results: 研究结果表明，该算法在 Seven-point Skin Lesion 数据集上表现更好于其他状态对照算法，并且能够准确地识别多种皮肤病变。

Abstract
The clinical diagnosis of skin lesion involves the analysis of dermoscopic and clinical modalities. Dermoscopic images provide a detailed view of the surface structures whereas clinical images offer a complementary macroscopic information. The visual diagnosis of melanoma is also based on seven-point checklist which involves identifying different visual attributes. Recently, supervised learning approaches such as convolutional neural networks (CNNs) have shown great performances using both dermoscopic and clinical modalities (Multi-modality). The seven different visual attributes in the checklist are also used to further improve the the diagnosis. The performances of these approaches, however, are still reliant on the availability of large-scaled labeled data. The acquisition of annotated dataset is an expensive and time-consuming task, more so with annotating multi-attributes. To overcome this limitation, we propose a self-supervised learning (SSL) algorithm for multi-modality skin lesion classification. Our algorithm enables the multi-modality learning by maximizing the similarities between paired dermoscopic and clinical images from different views. In addition, we generate surrogate pseudo-multi-labels that represent seven attributes via clustering analysis. We also propose a label-relation-aware module to refine each pseudo-label embedding and capture the interrelationships between pseudo-multi-labels. We validated the effectiveness of our algorithm using well-benchmarked seven-point skin lesion dataset. Our results show that our algorithm achieved better performances than other state-of-the-art SSL counterparts.

摘要
诊断皮肤损伤的临床 диагностика involves 分析 dermoscopic 和临床特征。 dermoscopic 图像提供表面结构的详细视图，而临床图像提供较大规模的信息。诊断 меланомой 还是基于七点检查表，包括不同的视觉特征。最近，supervised learning 方法 such as convolutional neural networks (CNNs) 在多Modalities 上表现出色，使用dermoscopic 和临床特征。七个不同的视觉特征也用于进一步改进诊断。然而，这些方法的表现仍然受到大规模标注数据的可用性的限制。获取标注数据集是一项expensive 和时间consuming的任务，更是与标注多属性。为了突破这些限制，我们提出了一种自助学习（SSL）算法 для多Modalities 皮肤损伤分类。我们的算法使得多Modalities 学习，通过最大化不同视角dermoscopic 和临床图像之间的相似性。此外，我们生成了surrogate pseudo-multi-labels，通过分类分析代表七个属性。我们还提出了一种标签关系意识模块，用于修复每个 pseudo-label embedding 并捕捉 pseudo-multi-labels 之间的关系。我们 validated 我们的算法使用 well-benchmarked 七点皮肤损伤数据集。我们的结果显示，我们的算法在与其他状态时的SSL 对手中表现出色。

MultiScale Spectral-Spatial Convolutional Transformer for Hyperspectral Image Classification

paper_url: http://arxiv.org/abs/2310.18550
repo_url: None
paper_authors: Zhiqiang Gong, Xian Zhou, Wen Yao
For: The paper is written for hyperspectral image classification, and it proposes a new architecture called MultiscaleFormer that captures both spectral and spatial information.* Methods: The proposed method uses multiscale spatial patches as tokens to formulate the spatial Transformer, and generates multiscale spectral-spatial representation of each pixel. It also uses a modified spectral-spatial CAF module to fuse cross-layer spectral and spatial information.* Results: The proposed method outperforms other architectures for hyperspectral image classification on commonly used real-world datasets.Here’s the simplified Chinese text for the three key points:* For: 这篇论文是为了干涉谱图像分类而写的，并提出了一种新的架构方案 called MultiscaleFormer，它能够捕捉谱图像的 spectral 和 spatial 信息。* Methods: 该方法使用多个级别的空间块作为 токен，以形成空间 transformer，并生成每个像素的多级 spectral-spatial 表示。它还使用一种修改后的 spectral-spatial CAF 模块来融合层次 spectral 和 spatial 信息。* Results: 该方法在常用的实际 dataset 上进行了实验，并与其他架构进行了比较，结果显示了该方法的优越性。

Abstract
Due to the powerful ability in capturing the global information, Transformer has become an alternative architecture of CNNs for hyperspectral image classification. However, general Transformer mainly considers the global spectral information while ignores the multiscale spatial information of the hyperspectral image. In this paper, we propose a multiscale spectral-spatial convolutional Transformer (MultiscaleFormer) for hyperspectral image classification. First, the developed method utilizes multiscale spatial patches as tokens to formulate the spatial Transformer and generates multiscale spatial representation of each band in each pixel. Second, the spatial representation of all the bands in a given pixel are utilized as tokens to formulate the spectral Transformer and generate the multiscale spectral-spatial representation of each pixel. Besides, a modified spectral-spatial CAF module is constructed in the MultiFormer to fuse cross-layer spectral and spatial information. Therefore, the proposed MultiFormer can capture the multiscale spectral-spatial information and provide better performance than most of other architectures for hyperspectral image classification. Experiments are conducted over commonly used real-world datasets and the comparison results show the superiority of the proposed method.

摘要
由于Transformer的强大能力 capture global information，因此成为了干扰器的替代架构for hyperspectral image classification。然而，通常的Transformer主要考虑全球 spectral information，而忽略了多尺度空间信息的干扰器图像。在本文中，我们提出了一种多尺度 spectral-spatial convolutional Transformer（MultiscaleFormer）for hyperspectral image classification。首先，我们开发的方法使用多尺度空间块作为токен，并生成每个像素的多尺度空间表示。其次，所有帧在每个像素中的 spectral representation被作为 tokens，并生成每个像素的多尺度 spectral-spatial表示。此外，我们修改了 spectral-spatial CAF模块，以融合层次 spectral和空间信息。因此，我们提出的 MultiFormer 可以捕捉多尺度 spectral-spatial信息，并提供更好的性能 than most other architectures for hyperspectral image classification。我们对常用的实际 dataset进行了实验，并 compare 结果表明了我们的方法的优越性。

MEDAVET: Traffic Vehicle Anomaly Detection Mechanism based on spatial and temporal structures in vehicle traffic

paper_url: http://arxiv.org/abs/2310.18548
repo_url: None
paper_authors: Ana Rosalía Huamán Reyna, Alex Josué Flórez Farfán, Geraldo Pereira Rocha Filho, Sandra Sampaio, Robson de Grande, Luis Hideo, Vasconcelos Nakamura, Rodolfo Ipolito Meneguette
for: 这篇论文是为了模型交通异常检测而写的。
methods: 该论文使用计算机视觉技术进行车辆跟踪，并使用бипаolar图和Convex Hull算法定义运动区域。异常检测使用QuadTree和靠近 occluded 的数据结构。
results: 实验结果显示，该方法在 Track4 测试集上得到了85.7% 的 F1 分数和25.432 的平方差。

Abstract
Currently, there are computer vision systems that help us with tasks that would be dull for humans, such as surveillance and vehicle tracking. An important part of this analysis is to identify traffic anomalies. An anomaly tells us that something unusual has happened, in this case on the highway. This paper aims to model vehicle tracking using computer vision to detect traffic anomalies on a highway. We develop the steps of detection, tracking, and analysis of traffic: the detection of vehicles from video of urban traffic, the tracking of vehicles using a bipartite graph and the Convex Hull algorithm to delimit moving areas. Finally for anomaly detection we use two data structures to detect the beginning and end of the anomaly. The first is the QuadTree that groups vehicles that are stopped for a long time on the road and the second that approaches vehicles that are occluded. Experimental results show that our method is acceptable on the Track4 test set, with an F1 score of 85.7% and a mean squared error of 25.432.

摘要
现在，计算机视觉系统可以帮助我们完成一些人类厌热的任务，如Surveillance和车辆跟踪。这个分析的一个重要组成部分是检测交通异常。异常告诉我们 чтоomething不寻常发生在公路上。这篇论文旨在通过计算机视觉来模型车辆跟踪，检测公路上的交通异常。我们开发了检测、跟踪和分析交通的步骤：从城市交通视频中检测车辆，使用二分图和Convex Hull算法来定义移动区域，并用两种数据结构来检测异常的开始和结束。实验结果表明，我们的方法在Track4测试集上得到了可接受的结果，F1分数为85.7%，平均方差为25.432。

2023-10-28

cs.AI

cs.AI - 2023-10-28

AI for Open Science: A Multi-Agent Perspective for Ethically Translating Data to Knowledge

paper_url: http://arxiv.org/abs/2310.18852
repo_url: None
paper_authors: Chase Yakaboski, Gregory Hyde, Clement Nyanhongo, Eugene Santos Jr
for: 本文提出了一种名为“AI for Open Science”（AI4OS）的概念，以便在科学实验室中提高开放性，并且将科学发现的开放化视为核心原则。
methods: 本文使用了知识发现和数据挖掘（KDD）的原则来正式化AI4OS的语言。并详细介绍了AI4OS系统中知识翻译的三个关键阶段，以及在这些阶段中应用开放性的具体方法。
results: 本文提出了一种用于评估AI4OS的理论指标，并阐述了这种指标的伦理意义。作者希望通过强调AI4OS，使AI4科学的自动化实验室不仅对开发者而言是有利，而且对社会也是有益。

Abstract
AI for Science (AI4Science), particularly in the form of self-driving labs, has the potential to sideline human involvement and hinder scientific discovery within the broader community. While prior research has focused on ensuring the responsible deployment of AI applications, enhancing security, and ensuring interpretability, we also propose that promoting openness in AI4Science discoveries should be carefully considered. In this paper, we introduce the concept of AI for Open Science (AI4OS) as a multi-agent extension of AI4Science with the core principle of maximizing open knowledge translation throughout the scientific enterprise rather than a single organizational unit. We use the established principles of Knowledge Discovery and Data Mining (KDD) to formalize a language around AI4OS. We then discuss three principle stages of knowledge translation embedded in AI4Science systems and detail specific points where openness can be applied to yield an AI4OS alternative. Lastly, we formulate a theoretical metric to assess AI4OS with a supporting ethical argument highlighting its importance. Our goal is that by drawing attention to AI4OS we can ensure the natural consequence of AI4Science (e.g., self-driving labs) is a benefit not only for its developers but for society as a whole.

摘要
人工智能（AI）在科学领域（AI4Science），特别是自动驾驶室，有可能削弱人类参与度和阻碍科学发现。而且，现有研究主要集中在负责AI应用部署、加强安全性和保持可解释性等方面。我们还建议在AI4Science发现中保持开放性应该仔细考虑。在本文中，我们提出了AI для开放科学（AI4OS）的概念，它是AI4Science的多代理扩展，核心原则是在科学产业中最大化开放知识翻译。我们使用已有的知识发现和数据挖掘（KDD）原则来正式化AI4OS的语言。然后，我们讨论了AI4Science系统中知识翻译的三个基本阶段，并详细介绍了在每个阶段中开放性可以如何应用，以生成一种AI4OS的替代方案。最后，我们提出了一个理论指标来评估AI4OS，并附加了一个伦理论据，强调其重要性。我们的目标是通过吸引关注AI4OS，使AI4Science的自然后果（例如自动驾驶室）对发展者和社会都带来好处。

Exploring Data Augmentations on Self-/Semi-/Fully- Supervised Pre-trained Models

paper_url: http://arxiv.org/abs/2310.18850
repo_url: None
paper_authors: Shentong Mo, Zhun Sun, Chao Li
for: investigate the effectiveness of data augmentation techniques in vision pre-trained models
methods: apply 4 types of data augmentations (Random Erasing, CutOut, CutMix, and MixUp) to self-/semi-/fully-supervised pre-trained models
results: observe that masking regions of images decreases invariance but increases diversity, while MixUp approach improves diversity with minimal decrease in invariance.Here’s the full text in Simplified Chinese:
for: 研究视觉预训模型中数据增强技术的效果
methods: 对自助/半助/全助预训模型应用4种数据增强方法（随机覆盖、剪辑、混合和混合）
results: 发现，对图像masking区域可以降低学习的不变性，但提供更大的多样性；而混合方法可以提高多样性，只是有一定的减少不变性。

Abstract
Data augmentation has become a standard component of vision pre-trained models to capture the invariance between augmented views. In practice, augmentation techniques that mask regions of a sample with zero/mean values or patches from other samples are commonly employed in pre-trained models with self-/semi-/fully-supervised contrastive losses. However, the underlying mechanism behind the effectiveness of these augmentation techniques remains poorly explored. To investigate the problems, we conduct an empirical study to quantify how data augmentation affects performance. Concretely, we apply 4 types of data augmentations termed with Random Erasing, CutOut, CutMix and MixUp to a series of self-/semi-/fully- supervised pre-trained models. We report their performance on vision tasks such as image classification, object detection, instance segmentation, and semantic segmentation. We then explicitly evaluate the invariance and diversity of the feature embedding. We observe that: 1) Masking regions of the images decreases the invariance of the learned feature embedding while providing a more considerable diversity. 2) Manual annotations do not change the invariance or diversity of the learned feature embedding. 3) The MixUp approach improves the diversity significantly, with only a marginal decrease in terms of the invariance.

摘要
<>将文本翻译成简化中文。<>预训练模型中的数据扩充已成为标准组件，以捕捉不同扩充视图之间的不变性。在实践中，通常使用随机将区域Masking为零或平均值的技术来实现预训练模型，并使用自我/半自动/全自动对比损失。然而，这些扩充技术的下面机制仍然不够了解。为了调查问题，我们进行了一项实验来衡量数据扩充对性能的影响。具体来说，我们将4种数据扩充方法称为随机擦除、CutOut、CutMix和MixUp应用于一系列自我/半自动/全自动预训练模型。我们则Report它们在视觉任务中的性能，包括图像分类、物体检测、实例分割和semantic segmentation。然后，我们显式评估扩充后feature embedding的不变性和多样性。我们发现：1. 将图像中的区域Masking为零或平均值会降低学习的feature embedding不变性，同时提供更大的多样性。2. 手动标注没有改变学习的feature embedding不变性或多样性。3. MixUp方法可以提高多样性，只有小量地降低不变性。

BanditPAM++: Faster $k$-medoids Clustering

paper_url: http://arxiv.org/abs/2310.18844
repo_url: https://github.com/thrungroup/banditpam_plusplus_experiments
paper_authors: Mo Tiwari, Ryan Kang, Donghyun Lee, Sebastian Thrun, Chris Piech, Ilan Shomorony, Martin Jinye Zhang
for: 这个论文主要关注于提高$k$-medoids clustering算法的效率和准确性。
methods: 该论文提出了两种算法优化方法，即在每个迭代中重用归一化信息，以及在不同迭代之间重用信息。
results: 实验结果表明，使用提出的 BanditPAM++ 算法可以在 CIFAR10 数据集上返回同样的 clustering 解决方案，但是运行速度比 BanditPAM 快得多，例如在 CIFAR10 数据集上，BanditPAM++ 运行时间是 BanditPAM 的10倍以上。

Abstract
Clustering is a fundamental task in data science with wide-ranging applications. In $k$-medoids clustering, cluster centers must be actual datapoints and arbitrary distance metrics may be used; these features allow for greater interpretability of the cluster centers and the clustering of exotic objects in $k$-medoids clustering, respectively. $k$-medoids clustering has recently grown in popularity due to the discovery of more efficient $k$-medoids algorithms. In particular, recent research has proposed BanditPAM, a randomized $k$-medoids algorithm with state-of-the-art complexity and clustering accuracy. In this paper, we present BanditPAM++, which accelerates BanditPAM via two algorithmic improvements, and is $O(k)$ faster than BanditPAM in complexity and substantially faster than BanditPAM in wall-clock runtime. First, we demonstrate that BanditPAM has a special structure that allows the reuse of clustering information $\textit{within}$ each iteration. Second, we demonstrate that BanditPAM has additional structure that permits the reuse of information $\textit{across}$ different iterations. These observations inspire our proposed algorithm, BanditPAM++, which returns the same clustering solutions as BanditPAM but often several times faster. For example, on the CIFAR10 dataset, BanditPAM++ returns the same results as BanditPAM but runs over 10$\times$ faster. Finally, we provide a high-performance C++ implementation of BanditPAM++, callable from Python and R, that may be of interest to practitioners at https://github.com/motiwari/BanditPAM. Auxiliary code to reproduce all of our experiments via a one-line script is available at https://github.com/ThrunGroup/BanditPAM_plusplus_experiments.

摘要
“集群是数据科学中的基本任务，具有广泛的应用。在$k$-medians集群中，集群中心必须是实际数据点，并且可以使用任意距离度量；这些特点使得$k$-medians集群更有可读性，并且可以更好地集 clusters 的特殊对象。随着更高效的$k$-medians算法的发现，$k$-medians集群在最近几年内 Popularity 增长。本文提出了 BanditPAM++，它是一种随机化的 $k$-medians算法，通过两个算法优化，与 BanditPAM 相比， complexity 为 $O(k)$ 和增加了很多的 wall-clock 时间。首先，我们证明 BanditPAM 具有特殊的结构，可以在每个迭代中重用 clustering 信息。其次，我们证明 BanditPAM 具有额外的结构，允许在不同的迭代中重用信息。这些观察点激发我们提出 BanditPAM++，它返回与 BanditPAM 相同的 clustering 解决方案，但通常很多 slower。例如，在 CIFAR10 数据集上，BanditPAM++ 与 BanditPAM 返回相同的结果，但运行速度比 BanditPAM 快了大约 10 倍。最后，我们提供了高性能的 C++ 实现，可以在 Python 和 R 中调用，并可能对实践者有利。详细的实验代码可以在 https://github.com/motiwari/BanditPAM 和 https://github.com/ThrunGroup/BanditPAM_plusplus_experiments 上找到。”

Automating the Correctness Assessment of AI-generated Code for Security Contexts

paper_url: http://arxiv.org/abs/2310.18834
repo_url: None
paper_authors: Domenico Cotroneo, Alessio Foggia, Cristina Improta, Pietro Liguori, Roberto Natella
for: This paper aims to evaluate the correctness of AI-generated code for security purposes using a fully automated method.
methods: The proposed method, named ACCA, uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation.
results: The proposed method outperforms baseline solutions and shows a strong correlation with human evaluation, with an average time of ~0.17s per code snippet, much faster than manual inspection.

Abstract
In this paper, we propose a fully automated method, named ACCA, to evaluate the correctness of AI-generated code for security purposes. The method uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation. We use ACCA to assess four state-of-the-art models trained to generate security-oriented assembly code and compare the results of the evaluation with different baseline solutions, including output similarity metrics, widely used in the field, and the well-known ChatGPT, the AI-powered language model developed by OpenAI. Our experiments show that our method outperforms the baseline solutions and assesses the correctness of the AI-generated code similar to the human-based evaluation, which is considered the ground truth for the assessment in the field. Moreover, ACCA has a very strong correlation with human evaluation (Pearson's correlation coefficient r=0.84 on average). Finally, since it is a fully automated solution that does not require any human intervention, the proposed method performs the assessment of every code snippet in ~0.17s on average, which is definitely lower than the average time required by human analysts to manually inspect the code, based on our experience.

摘要
在这篇论文中，我们提出了一种完全自动化的方法，名为ACCA，用于评估人工智能生成的代码的正确性，以便用于安全目的。该方法利用symbolic执行来评估AI生成的代码是否与参考实现一致。我们使用ACCA评估了四种现状最佳的模型，用于生成安全听力的assembly代码，并与不同的基准解决方案进行比较，包括输出相似度指标，在领域内广泛使用的，以及由OpenAI开发的知名的ChatGPT人工智能语言模型。我们的实验表明，我们的方法超越了基准解决方案，并与人类评估类似，被视为领域中的地面真实值。此外，ACCA与人类评估之间存在强相关性（平均Pearson相关系数r=0.84）。最后，由于它是完全自动化的，不需要任何人类参与，我们的方法可以快速地评估每个代码副本，平均需时约0.17秒，明显低于由人工分析员手动检查代码所需的时间，根据我们的经验。

Responsible AI (RAI) Games and Ensembles

paper_url: http://arxiv.org/abs/2310.18832
repo_url: https://github.com/yashgupta-7/rai-games
paper_authors: Yash Gupta, Runtian Zhai, Arun Suggala, Pradeep Ravikumar
for: 这个研究旨在解决人工智能（AI）在社会中的影响，包括公平性、可靠性和安全性等问题。
methods: 这个研究使用了一种普遍的框架，称为责任AI（RAI）游戏，来研究这些问题。两种算法来解决这些游戏：一种是基于在线学习和游戏理论的游戏玩家算法，另一种是基于经典统计文献中的提升和回归算法。
results: 研究证明了这些方法在解决一些RAI问题，特别是在子人口变化时的性能竞争力。

Abstract
Several recent works have studied the societal effects of AI; these include issues such as fairness, robustness, and safety. In many of these objectives, a learner seeks to minimize its worst-case loss over a set of predefined distributions (known as uncertainty sets), with usual examples being perturbed versions of the empirical distribution. In other words, aforementioned problems can be written as min-max problems over these uncertainty sets. In this work, we provide a general framework for studying these problems, which we refer to as Responsible AI (RAI) games. We provide two classes of algorithms for solving these games: (a) game-play based algorithms, and (b) greedy stagewise estimation algorithms. The former class is motivated by online learning and game theory, whereas the latter class is motivated by the classical statistical literature on boosting, and regression. We empirically demonstrate the applicability and competitive performance of our techniques for solving several RAI problems, particularly around subpopulation shift.

摘要
Recent research has focused on the social impact of AI, including issues such as fairness, robustness, and safety. In many cases, the goal is to minimize the worst-case loss over a set of predefined distribution (known as uncertainty sets), such as perturbed versions of the empirical distribution. These problems can be formulated as min-max problems over the uncertainty sets. In this study, we propose a general framework for addressing these issues, which we refer to as Responsible AI (RAI) games. We present two classes of algorithms for solving these games: (a) game-play based algorithms, and (b) greedy stagewise estimation algorithms. The former class is inspired by online learning and game theory, while the latter class is based on the classical statistical literature on boosting and regression. We empirically demonstrate the applicability and competitive performance of our techniques for solving several RAI problems, particularly in the context of subpopulation shift.Note: The translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. Traditional Chinese is used in Hong Kong, Macau, and Taiwan.

All Things Considered: Detecting Partisan Events from News Media with Cross-Article Comparison

paper_url: http://arxiv.org/abs/2310.18827
repo_url: None
paper_authors: Yujian Liu, Xinliang Frederick Zhang, Kaijian Zou, Ruihong Huang, Nick Beauchamp, Lu Wang
for: 本研究旨在探讨新闻媒体如何影响公众意见，以及媒体如何通过透明或不透明的方式 shape opinion。
methods: 本研究使用了一种基于潜在变量的框架，通过比较相同故事的多篇文章来预测文章的政治倾向。
results: 实验表明，媒体可以通过不公平地选择报道事件来形成公众意见，而且这种偏见存在于主流媒体中，即使媒体具有强的 объекivity 和非政治化准则。

Abstract
Public opinion is shaped by the information news media provide, and that information in turn may be shaped by the ideological preferences of media outlets. But while much attention has been devoted to media bias via overt ideological language or topic selection, a more unobtrusive way in which the media shape opinion is via the strategic inclusion or omission of partisan events that may support one side or the other. We develop a latent variable-based framework to predict the ideology of news articles by comparing multiple articles on the same story and identifying partisan events whose inclusion or omission reveals ideology. Our experiments first validate the existence of partisan event selection, and then show that article alignment and cross-document comparison detect partisan events and article ideology better than competitive baselines. Our results reveal the high-level form of media bias, which is present even among mainstream media with strong norms of objectivity and nonpartisanship. Our codebase and dataset are available at https://github.com/launchnlp/ATC.

摘要
社会舆论是由新闻媒体提供的信息所形成的，而这些信息可能受媒体机构的意识形态偏好所影响。然而，许多注意力集中在媒体偏见的明显表达或话题选择上，而媒体 shape 意见的更加不显式的方式却很少得到关注。我们提出了一种基于隐藏变量的框架，用于预测新闻文章的意识性。我们通过比较同一个故事的多篇文章来确定包含或排除某些政治事件的媒体偏见。我们的实验首先证明了事件选择的存在，然后展示了文章对齐和跨文档比较的能力更好地探测文章意识性和媒体偏见。我们的结果表明，媒体偏见存在于主流媒体中，即使媒体有强大的objectivity和非政治化的准则。我们的代码库和数据集可以在上获取。

A Fuzzy Time Series-Based Model Using Particle Swarm Optimization and Weighted Rules

paper_url: http://arxiv.org/abs/2310.18825
repo_url: None
paper_authors: Daniel Ortiz-Arroyo
for: 提高高阶不确定时间序列模型的精度和可靠性。
methods: combining particle swarm optimization (PSO) and weighted summation to address the limitations of high-order fuzzy time series models.
results: 比前方法更高精度地模型时间序列。

Abstract
During the last decades, a myriad of fuzzy time series models have been proposed in scientific literature. Among the most accurate models found in fuzzy time series, the high-order ones are the most accurate. The research described in this paper tackles three potential limitations associated with the application of high-order fuzzy time series models. To begin with, the adequacy of forecast rules lacks consistency. Secondly, as the model's order increases, data utilization diminishes. Thirdly, the uniformity of forecast rules proves to be highly contingent on the chosen interval partitions. To address these likely drawbacks, we introduce a novel model based on fuzzy time series that amalgamates the principles of particle swarm optimization (PSO) and weighted summation. Our results show that our approach models accurately the time series in comparison with previous methods.

摘要
在过去几十年中，数字时间序列模型的研究得到了广泛的发展和应用。高阶的含糊时间序列模型在科学文献中被认为是最为准确的。本研究考虑了高阶含糊时间序列模型的三个可能的限制：首先，预测规则的适用稳定性不充分；第二，随着模型的阶数增加，数据利用率逐渐减少；第三，预测规则的均匀性高度取决于选择的时间分割。为解决这些可能的缺点，我们提出了一种基于含糊时间序列的新模型，具有融合了粒子群组合优化（PSO）和Weighted Summary的原则。我们的结果表明，我们的方法可以准确地模型时间序列，与前期方法相比。

Rethinking Semi-Supervised Federated Learning: How to co-train fully-labeled and fully-unlabeled client imaging data

paper_url: http://arxiv.org/abs/2310.18815
repo_url: None
paper_authors: Pramit Saha, Divyanshu Mishra, J. Alison Noble
for: 本研究旨在解决 semi-supervised federated learning (SSFL) 中 client 之间具有半标注数据的问题，特别是在医疗设置下，合作伙伴（通常是医院）可能拥有图像，但没有注释。
methods: 我们提出了一种新的学习方案，即 Isolated Federated Learning (IsoFed)，以避免简单的平均方法。我们的训练方法包括两个部分：（a）孤立的客户端模型归一化，以及（b）所有客户端的本地自我超vised pre-training。
results: 我们在四种不同的医疗影像数据集上进行了实验，包括 MedMNIST 的医疗影像 benchmark。我们还在不同的实验设置下变换了比例的标注客户端和多样性，以示出我们的方法在不同的情况下的效果。

Abstract
The most challenging, yet practical, setting of semi-supervised federated learning (SSFL) is where a few clients have fully labeled data whereas the other clients have fully unlabeled data. This is particularly common in healthcare settings where collaborating partners (typically hospitals) may have images but not annotations. The bottleneck in this setting is the joint training of labeled and unlabeled clients as the objective function for each client varies based on the availability of labels. This paper investigates an alternative way for effective training with labeled and unlabeled clients in a federated setting. We propose a novel learning scheme specifically designed for SSFL which we call Isolated Federated Learning (IsoFed) that circumvents the problem by avoiding simple averaging of supervised and semi-supervised models together. In particular, our training approach consists of two parts - (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of isolated global models in all clients. We evaluate our model performance on medical image datasets of four different modalities publicly available within the biomedical image classification benchmark MedMNIST. We further vary the proportion of labeled clients and the degree of heterogeneity to demonstrate the effectiveness of the proposed method under varied experimental settings.

摘要
最大挑战的、但实际可行的 semi-supervised federated learning（SSFL）设置是，一些客户端有完全标注数据，而另一些客户端有完全无标注数据。这种情况 particullary 在医疗设置中常见，合作伙伴（通常是医院）可能拥有图像，但并没有标注。瓶颈在这种设置下是 joint 训练标注和无标注客户端的目标函数，因为每个客户端的目标函数因标注的可用性而变化。这篇论文investigates an alternative way for effective training with labeled and unlabeled clients in a federated setting. We propose a novel learning scheme specifically designed for SSFL, which we call Isolated Federated Learning (IsoFed), to circumvent this problem by avoiding simple averaging of supervised and semi-supervised models together. In particular, our training approach consists of two parts: (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of isolated global models in all clients. We evaluate our model performance on medical image datasets of four different modalities publicly available within the biomedical image classification benchmark MedMNIST. We further vary the proportion of labeled clients and the degree of heterogeneity to demonstrate the effectiveness of the proposed method under varied experimental settings.

Hierarchical Framework for Interpretable and Probabilistic Model-Based Safe Reinforcement Learning

paper_url: http://arxiv.org/abs/2310.18811
repo_url: None
paper_authors: Ammar N. Abbas, Georgios C. Chasparis, John D. Kelleher
for: 这篇论文的目的是提出一种基于深度强化学习的安全关键系统解决方案，以便在安全关键系统中使用深度强化学习，并且提供解释性的执行。
methods: 这篇论文使用了深度强化学习，并与传统决策策略相合作，以提高安全关键系统的可靠性和可控性。它还使用了潜在模型和强化学习的融合，以提高解释性和可靠性。
results: 这篇论文的实验结果显示，BC-SRLA在维护领域中的维护维护过程中表现出色，比传统方法和其他基于RL的基eline更好。

Abstract
The difficulty of identifying the physical model of complex systems has led to exploring methods that do not rely on such complex modeling of the systems. Deep reinforcement learning has been the pioneer for solving this problem without the need for relying on the physical model of complex systems by just interacting with it. However, it uses a black-box learning approach that makes it difficult to be applied within real-world and safety-critical systems without providing explanations of the actions derived by the model. Furthermore, an open research question in deep reinforcement learning is how to focus the policy learning of critical decisions within a sparse domain. This paper proposes a novel approach for the use of deep reinforcement learning in safety-critical systems. It combines the advantages of probabilistic modeling and reinforcement learning with the added benefits of interpretability and works in collaboration and synchronization with conventional decision-making strategies. The BC-SRLA is activated in specific situations which are identified autonomously through the fused information of probabilistic model and reinforcement learning, such as abnormal conditions or when the system is near-to-failure. Further, it is initialized with a baseline policy using policy cloning to allow minimum interactions with the environment to address the challenges associated with using RL in safety-critical industries. The effectiveness of the BC-SRLA is demonstrated through a case study in maintenance applied to turbofan engines, where it shows superior performance to the prior art and other baselines.

摘要
因为识别复杂系统的物理模型具有挑战，因此探索不需要基于这些复杂模型的方法。深度强化学习曾经是解决这个问题的先驱，它不需要基于系统的物理模型来解决问题，只需通过与系统交互来解决问题。然而，它使用黑盒学习方法，这使得其在实际世界和安全关键系统中应用非常困难，而且无法提供行为的解释。此外，深度强化学习中的一个开放研究问题是如何将策略学习集中在稀疏领域中。本文提出了一种基于深度强化学习的新方法，用于安全关键系统中。它结合了概率模型和强化学习的优点，同时增加了可解性。此外，它与传统决策策略协作和同步，在特定情况下自动识别并且通过混合信息来识别，例如异常情况或系统垂直危机。此外，它使用策略做副本来初始化，以最小化与环境的交互，解决了使用强化学习在安全关键行业中的挑战。本文通过一个维护案例研究展示了BC-SRLA的有效性，其在维护领域的表现较优于先前艺术和其他基线。

OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning

paper_url: http://arxiv.org/abs/2310.18807
repo_url: None
paper_authors: Rim Assouel, Pau Rodriguez, Perouz Taslakian, David Vazquez, Yoshua Bengio
for: This paper aims to improve the ability of machine learning systems to imagine and compose learned concepts in novel ways, specifically in the context of visual reasoning.
methods: The paper proposes a modular data augmentation framework called Object-centric Compositional Neural Module Network (OC-NMN), which decomposes visual generative reasoning tasks into a series of primitives applied to objects.
results: The paper shows that the proposed modular architectural choices can be used to generate new training tasks that lead to better out-of-distribution generalization, and compares the model to existing and new baselines in a proposed visual reasoning benchmark.Here’s the same information in Simplified Chinese text:
for: 这篇论文目标是提高机器学习系统的想象和组合学习能力，特别是在视觉理解中。
methods: 该论文提出了一种模块化数据增强框架，称为Object-centric Compositional Neural Module Network (OC-NMN)，它将视觉生成逻辑任务 decomposes 成一系列对象上的基本操作。
results: 论文显示，提出的模块性建筑设计可以生成新的训练任务，导致更好的 OUT-OF-distribution 通用化。并与现有和新的基准值进行比较，在提posed的视觉理解 bencmark 中。

Abstract
A key aspect of human intelligence is the ability to imagine -- composing learned concepts in novel ways -- to make sense of new scenarios. Such capacity is not yet attained for machine learning systems. In this work, in the context of visual reasoning, we show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination. Our method, denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects without using a domain-specific language. We show that our modular architectural choices can be used to generate new training tasks that lead to better out-of-distribution generalization. We compare our model to existing and new baselines in proposed visual reasoning benchmark that consists of applying arithmetic operations to MNIST digits.

摘要
人类智能的一个重要方面是具备想象能力---把已学习的概念组合在新的方式下来---以便理解新的场景。这种能力目前尚未被机器学习系统具备。在这项工作中，我们在视觉逻辑上利用了模块性，以 derive一种基于想象的数据增强框架。我们的方法，称为物体中心的compositional Neural Module Network（OC-NMN），将视觉生成逻辑任务分解成一系列对象上的基本Primitive。我们显示了我们的建筑方式可以生成新的训练任务，导致更好的对外值 generale。我们与现有和新的基准值进行比较，并在我们提出的视觉理解benchmark中进行测试，该benchmark包括对MNIST数字应用数学运算。

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

paper_url: http://arxiv.org/abs/2310.18804
repo_url: None
paper_authors: Hejie Cui, Xinyu Fang, Zihan Zhang, Ran Xu, Xuan Kan, Xin Liu, Yue Yu, Manling Li, Yangqiu Song, Carl Yang
for: 这篇论文旨在探讨开放视觉知识EXTRACTION的新方法，以提高机器理解世界的能力。
methods: 该方法使用开放关系区域检测器和大型多模态模型，从图像中提取无格式的视觉知识。
results: 实验表明，OpenVik可以生成具有准确性和独特性的开放视觉知识，并在多种视觉理解应用中提供了显著的改进。

Abstract
Images contain rich relational knowledge that can help machines understand the world. Existing methods on visual knowledge extraction often rely on the pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we take a first exploration to a new paradigm of open visual knowledge extraction. To achieve this, we present OpenVik which consists of an open relational region detector to detect regions potentially containing relational knowledge and a visual knowledge generator that generates format-free knowledge by prompting the large multimodality model with the detected region of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the extracted open visual knowledge by OpenVik. Moreover, integrating our extracted knowledge across various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik.

摘要
图像含有丰富的关系知识，可以帮助机器理解世界。现有的视觉知识EXTRACTION方法 oft rely on先defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge。在这项工作中，我们开始了一种新的开放视觉知识EXTRACTION paradigm。为达到这个目标，我们提出了 OpenVik，它包括一个开放关系区域检测器，用于检测可能包含关系知识的区域，以及一个可视知识生成器，通过在检测到区域特点的提示下，生成无格式的知识。我们还探讨了两种数据增强技术，用于让生成的无格式视觉知识更加多样化。广泛的知识质量评估表明OpenVik提取的开放视觉知识具有正确性和独特性。此外，我们在不同的视觉理解应用中集成我们提取的知识，显示了一致的改进， indicating the real-world applicability of OpenVik。

Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation

paper_url: http://arxiv.org/abs/2310.18794
repo_url: None
paper_authors: Yixin Wan, Fanyou Wu, Weijie Xu, Srinivasan H. Sengamedu
for: 本研究的目的是探讨模型幻化现象在自然语言生成（NLG）中的作用，并提出基于确定性的回答排名方法来减少模型幻化。
methods: 本研究使用了序列级确定性的两个方面：概率确定性和含义确定性，并通过对知识推理对话生成（KGDG）任务的实验发现，两者均与模型回答中幻化水平有显著相关性。
results: 研究发现，在模型回答中含义确定性水平较高时，幻化水平较低，而概率确定性水平较高时，幻化水平较高。此外，研究还提供了理论分析和证明，证明含义确定性可以作为概率确定性的一种代替方案，并在黑obox场景中具有可行性。基于这些发现，本研究提出了确定性基本回答排名（CRR）方法，以减少NLG中模型幻化现象。CRR分为两种类型：概率CRR（P-CRR）和含义CRR（S-CRR）。P-CRR使用模型回答整个序列的平均Log probability来排名样本。S-CRR根据模型回答的含义相似度来排名一些模型回答的候选者，并使用含义相似度来估计模型回答的确定性水平。通过对3个KGDG数据集、3种排序方法和4个模型进行了广泛的实验， validate了我们提出的2种CRR方法的效果。

Abstract
Model hallucination has been a crucial interest of research in Natural Language Generation (NLG). In this work, we propose sequence-level certainty as a common theme over hallucination in NLG, and explore the correlation between sequence-level certainty and the level of hallucination in model responses. We categorize sequence-level certainty into two aspects: probabilistic certainty and semantic certainty, and reveal through experiments on Knowledge-Grounded Dialogue Generation (KGDG) task that both a higher level of probabilistic certainty and a higher level of semantic certainty in model responses are significantly correlated with a lower level of hallucination. What's more, we provide theoretical proof and analysis to show that semantic certainty is a good estimator of probabilistic certainty, and therefore has the potential as an alternative to probability-based certainty estimation in black-box scenarios. Based on the observation on the relationship between certainty and hallucination, we further propose Certainty-based Response Ranking (CRR), a decoding-time method for mitigating hallucination in NLG. Based on our categorization of sequence-level certainty, we propose 2 types of CRR approach: Probabilistic CRR (P-CRR) and Semantic CRR (S-CRR). P-CRR ranks individually sampled model responses using their arithmetic mean log-probability of the entire sequence. S-CRR approaches certainty estimation from meaning-space, and ranks a number of model response candidates based on their semantic certainty level, which is estimated by the entailment-based Agreement Score (AS). Through extensive experiments across 3 KGDG datasets, 3 decoding methods, and on 4 different models, we validate the effectiveness of our 2 proposed CRR methods to reduce model hallucination.

摘要
modelo de generación de lenguaje natural (NLG) ha sido un tema crucial de investigación en la comunidad científica. En este trabajo, propusimos la certidumbre de secuencia como un tema común en la generación de lenguaje natural, y exploramos la relación entre la certidumbre de secuencia y el nivel de halucinación en las respuestas del modelo. Distinguiendo la certidumbre de secuencia en dos aspectos: la certidumbre probabilística y la certidumbre semántica, revelamos a través de experimentos en la tarea de generación de diálogo basado en conocimientos (KGDG) que ambos tienen un nivel significativamente correlacionado con un nivel bajo de halucinación. Además, proveímos pruebas teóricas y análisis para demostrar que la certidumbre semántica es un buen estimador de la certidumbre probabilística, y por lo tanto tiene el potencial de servir como una alternativa a la estimación de certidumbre basada en probabilidades en escenarios de "black-box". Basándonos en la observación de la relación entre la certidumbre y la halucinación, propusimos el Metodo de Ranking de Respuestas basado en la Certidumbre (CRR), un método de decodificación en tiempo real para mitigar la halucinación en NLG. Basándonos en nuestra categorización de la certidumbre de secuencia, propusimos dos enfoques de CRR: el enfoque de Certidumbre Probabilística (P-CRR) y el enfoque de Certidumbre Semántica (S-CRR). El enfoque P-CRR clasifica las respuestas individualmente seleccionadas del modelo utilizando su probabilidad aritmética promedio de toda la secuencia. El enfoque S-CRR se basa en la certidumbre semántica, y clasifica un número de candidatos de respuestas del modelo según su nivel de certidumbre semántica, que se estima utilizando el índice de Entendimiento (AS). A través de extensivos experimentos en tres conjuntos de datos de KGDG, tres métodos de decodificación y cuatro modelos diferentes, validamos la eficacia de nuestros dos métodos de CRR para reducir la halucinación del modelo.

“Do it my way!”: Impact of Customizations on Trust perceptions in Human-Robot Collaboration

paper_url: http://arxiv.org/abs/2310.18791
repo_url: None
paper_authors: Parv Kapoor, Simon Chu, Angela Chen
for: 这个研究旨在探讨个性化助手机器人的影响，以及个性化程度对人类使用者的体验和信任感的影响。
methods: 研究采用了在人类使用者身上进行的内置研究（N=17），并对不同水平的自定义可能性进行了比较。
results: 研究发现，增加个性化程度会导致更高的信任和舒适感。这些发现可以帮助设计师设计更信任worthy和个性化的助手机器人。

Abstract
Trust has been shown to be a key factor in effective human-robot collaboration. In the context of assistive robotics, the effect of trust factors on human experience is further pronounced. Personalization of assistive robots is an orthogonal factor positively correlated with robot adoption and user perceptions. In this work, we investigate the relationship between these factors through a within-subjects study (N=17). We provide different levels of customization possibilities over baseline autonomous robot behavior and investigate its impact on trust. Our findings indicate that increased levels of customization was associated with higher trust and comfort perceptions. The assistive robot design process can benefit significantly from our insights for designing trustworthy and customized robots.

摘要
信任被证明为人机合作中关键因素。在帮助型机器人领域，信任因素对人类体验的影响更加明显。个性化机器人设计是一个 orthogonal 因素，与机器人采用和用户对机器人的评价显著相关。本研究通过在subjects（N=17）中进行内部研究，研究自适应机器人行为的不同水平的个性化可能性对信任的影响。我们发现，逐渐提高个性化水平与信任、舒适感的增加有显著相关性。这些发现可以帮助设计信任worthy和个性化的机器人设计过程。

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

paper_url: http://arxiv.org/abs/2310.18780
repo_url: None
paper_authors: Stefano Massaroli, Michael Poli, Daniel Y. Fu, Hermann Kumbong, Rom N. Parnichkun, Aman Timalsina, David W. Romero, Quinn McIntyre, Beidi Chen, Atri Rudra, Ce Zhang, Christopher Re, Stefano Ermon, Yoshua Bengio
for: 降低memory footprint和提高throughput during generation
methods: 使用 rational interpolation和model-order reduction techniques，以及Weight-tying filters across channels into heads
results: 实现了10倍于Transformers的throughput和1.5倍于Hyena的throughput，而且无损质量 послеdistillation

Abstract
Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads -- naively requiring a full pass (or caching of activations) over the input sequence for each generated token -- similarly to attention-based models. In this paper, we seek to enable $\mathcal O(1)$ compute and memory cost per token in any pre-trained long convolution architecture to reduce memory footprint and increase throughput during generation. Concretely, our methods consist in extracting low-dimensional linear state-space models from each convolution layer, building upon rational interpolation and model-order reduction techniques. We further introduce architectural improvements to convolution-based layers such as Hyena: by weight-tying the filters across channels into heads, we achieve higher pre-training quality and reduce the number of filters to be distilled. The resulting model achieves 10x higher throughput than Transformers and 1.5x higher than Hyena at 1.3B parameters, without any loss in quality after distillation.

摘要
最近的进展在无注意力序列模型中使用核函数作为Transformer核心中的注意力运算符的替代方案。具体而言，长核函数序列模型在多个领域中 achieved state-of-the-art performance，但是在自动生成推干负载中具有重要的成本 - 需要遍历输入序列的全部通过或缓存活动的结果。在这篇论文中，我们寻求实现$\mathcal O(1)$的 compute和memory成本每个 токен，以降低快取面积和增加生成速度。具体来说，我们的方法是从每个核函数层提取低维度的线性状态空间模型，建立在理性插值和模型阶层技术之上。我们还引入了对于核函数层的建筑改进，例如Hyena：将核函数跨通道联结成头部，以提高预训品质和降低缩减策略中的缩减策略。实验结果显示，我们的模型可以与Transformer和Hyena相比，在1.3B个参数下实现10倍的生成速度，不会对品质造成损害。

Improving Compositional Generalization Using Iterated Learning and Simplicial Embeddings

paper_url: http://arxiv.org/abs/2310.18777
repo_url: None
paper_authors: Yi Ren, Samuel Lavoie, Mikhail Galkin, Danica J. Sutherland, Aaron Courville
for: The paper aims to improve the compositional generalization of deep neural networks, which is the ability to generalize to unseen combinations of latent factors.
methods: The paper proposes using iterated learning on models with simplicial embeddings to improve compositional generalization. This approach is motivated by an analysis of compositionality based on Kolmogorov complexity.
results: The paper demonstrates improvements in compositional generalization over other approaches, using both vision tasks with well-understood latent factors and real molecular graph prediction tasks where the latent structure is unknown.

Abstract
Compositional generalization, the ability of an agent to generalize to unseen combinations of latent factors, is easy for humans but hard for deep neural networks. A line of research in cognitive science has hypothesized a process, ``iterated learning,'' to help explain how human language developed this ability; the theory rests on simultaneous pressures towards compressibility (when an ignorant agent learns from an informed one) and expressivity (when it uses the representation for downstream tasks). Inspired by this process, we propose to improve the compositional generalization of deep networks by using iterated learning on models with simplicial embeddings, which can approximately discretize representations. This approach is further motivated by an analysis of compositionality based on Kolmogorov complexity. We show that this combination of changes improves compositional generalization over other approaches, demonstrating these improvements both on vision tasks with well-understood latent factors and on real molecular graph prediction tasks where the latent structure is unknown.

摘要
人类的 Compositional generalization，即对未经过视图的组合因素进行泛化，容易 для人类，但困难 для深度神经网络。认知科学中的一条研究提出了一个过程，即“迭代学习”，以解释人类语言的发展能力;该理论基于同时应对压缩性（ikor ignorant agent 从 informed one 学习）和表达性（ikor it 使用表示进行下游任务）的同时压力。 inspirited by this process, we propose to improve the compositional generalization of deep networks by using iterated learning on models with simplicial embeddings, which can approximately discretize representations. This approach is further motivated by an analysis of compositionality based on Kolmogorov complexity. We show that this combination of changes improves compositional generalization over other approaches, demonstrating these improvements both on vision tasks with well-understood latent factors and on real molecular graph prediction tasks where the latent structure is unknown.Note: Simplified Chinese is used in mainland China and Singapore, while Traditional Chinese is used in Taiwan, Hong Kong, and Macau. The translation is written in Simplified Chinese.

Linear Mode Connectivity in Sparse Neural Networks

paper_url: http://arxiv.org/abs/2310.18769
repo_url: None
paper_authors: Luke McDermott, Daniel Cummings
for: 这个论文研究了使用生成的数据进行神经网络减少，并研究了这些减少后的神经网络在真实数据上的训练特性。
methods: 该论文使用了迭代幅度减少（IMP）法，并使用了一种称为“液体减少”的方法来生成数据。
results: 研究发现，使用生成的数据和IMP法可以创建一类稀疏神经网络，这些神经网络在真实数据上训练时更加稳定，并且可以与传统IMP法相比，使用更少的训练点（最多150倍）达到相同的性能。

Abstract
With the rise in interest of sparse neural networks, we study how neural network pruning with synthetic data leads to sparse networks with unique training properties. We find that distilled data, a synthetic summarization of the real data, paired with Iterative Magnitude Pruning (IMP) unveils a new class of sparse networks that are more stable to SGD noise on the real data, than either the dense model, or subnetworks found with real data in IMP. That is, synthetically chosen subnetworks often train to the same minima, or exhibit linear mode connectivity. We study this through linear interpolation, loss landscape visualizations, and measuring the diagonal of the hessian. While dataset distillation as a field is still young, we find that these properties lead to synthetic subnetworks matching the performance of traditional IMP with up to 150x less training points in settings where distilled data applies.

摘要
“因为神经网络束缚的兴趣增长，我们研究了使用 sintetic data 进行神经网络剪除的影响。我们发现，通过对真实数据进行概要汇总，并使用迭代大小剪除（IMP），可以找到一类特有的稀疏网络，它们在真实数据上更加稳定，SGD 噪音的影响下。即使使用真实数据进行 IMP，也不能达到这类网络的性能。我们通过线性 interpolate，损失地图可见化和对偏导数矩阵的评估来研究这一点。虽然数据概要为一个 relativity 新的领域，但我们发现这些特性使得使用 sintetic data 可以达到与传统 IMP 相同的性能，即使是使用 150 倍少的训练点。”

Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function – with Real Applications in Traffic Domain

paper_url: http://arxiv.org/abs/2310.18752
repo_url: None
paper_authors: Guanghu Sui, Zhishuai Li, Ziyue Li, Sun Yang, Jingqing Ruan, Hangyu Mao, Rui Zhao
for: 提高文本到SQL执行精度
methods: 改进提问方法，包括查询重写和SQL增强
results: 实现了显著提高执行精度，使用较弱的预训练语言模型也达到了21.05%的最高精度Here’s the full translation of the abstract in Simplified Chinese:本文提出了一种更适应和更通用的提问方法，用于提高文本到SQL执行精度。我们发现了对于商业 dataset 的执行精度的显著下降，并且分析了 dataset 的复杂性和问题意图的不同所带来的影响。为了减少信息漏斗，我们将comments、值类型和值示例包含在数据库描述中。我们的实验表明，使用大型自然语言模型（LLMs）可以实现显著的性能提高。相比之下，state-of-the-art 方法在商业 dataset 上的执行精度为21.05%，而我们的方法在同一 dataset 上达到了65.79%。此外，我们还探讨了文本到Python和文本到函数等选项，并对其间的优缺点进行了深入分析，为社区提供了有价值的意见。

Abstract
The previous state-of-the-art (SOTA) method achieved a remarkable execution accuracy on the Spider dataset, which is one of the largest and most diverse datasets in the Text-to-SQL domain. However, during our reproduction of the business dataset, we observed a significant drop in performance. We examined the differences in dataset complexity, as well as the clarity of questions' intentions, and assessed how those differences could impact the performance of prompting methods. Subsequently, We develop a more adaptable and more general prompting method, involving mainly query rewriting and SQL boosting, which respectively transform vague information into exact and precise information and enhance the SQL itself by incorporating execution feedback and the query results from the database content. In order to prevent information gaps, we include the comments, value types, and value samples for columns as part of the database description in the prompt. Our experiments with Large Language Models (LLMs) illustrate the significant performance improvement on the business dataset and prove the substantial potential of our method. In terms of execution accuracy on the business dataset, the SOTA method scored 21.05, while our approach scored 65.79. As a result, our approach achieved a notable performance improvement even when using a less capable pre-trained language model. Last but not least, we also explore the Text-to-Python and Text-to-Function options, and we deeply analyze the pros and cons among them, offering valuable insights to the community.

摘要
previous state-of-the-art (SOTA) 方法在 Spider 数据集上达到了杰出的执行精度，这是文本到 SQL 领域中最大和最多样的数据集之一。然而，在我们重现商业数据集时，我们注意到了显著的性能下降。我们分析了数据集的复杂性以及问题意图的清晰度，并评估了这些差异如何影响提示方法的性能。因此，我们开发了更适应和更通用的提示方法，包括主要的查询重写和 SQL 加强，将混淆信息转化为准确和精确信息，并通过 incorporating 执行反馈和数据库内容的查询结果来增强 SQL 本身。为了避免信息异常，我们将数据库描述中的注释、值类型和值示例包含在提示中。我们的实验表明，使用大型自然语言模型 (LLMs) 可以在商业数据集上实现显著性能提升，并证明了我们的方法的巨大潜力。在商业数据集上的执行精度方面，SOTA 方法得分 21.05，而我们的方法得分 65.79。因此，我们的方法在使用较弱预训练语言模型时 still 实现了显著的性能提升。最后，我们还探索了 Text-to-Python 和 Text-to-Function 选项，并进行了深入分析，提供了价值的发现。

On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization

paper_url: http://arxiv.org/abs/2310.18741
repo_url: None
paper_authors: Fady Rezk
for: 这个论文的目的是比较不同的缺省方法在隐式微调学习中的计算成本、稳定性、泛化性和估计准确性。
methods: 这个论文使用了多种缺省方法，包括矩阵估计、均值场估计和积分估计等，并对它们进行了系统比较。
results: 研究发现，矩阵估计和均值场估计在缺省学习中具有较高的计算成本和稳定性，而积分估计具有较高的泛化性和估计准确性。此外，研究还提出了一种新的半监督学习算法，可以透过增强具有适应性的域特异特征来增强鲁棒性。该算法的实验结果超过了基eline FixMatch性能。

Abstract
Meta-learning that uses implicit gradient have provided an exciting alternative to standard techniques which depend on the trajectory of the inner loop training. Implicit meta-learning (IML), however, require computing $2^{nd}$ order gradients, particularly the Hessian which is impractical to compute for modern deep learning models. Various approximations for the Hessian were proposed but a systematic comparison of their compute cost, stability, generalization of solution found and estimation accuracy were largely overlooked. In this study, we start by conducting a systematic comparative analysis of the various approximation methods and their effect when incorporated into IML training routines. We establish situations where catastrophic forgetting is exhibited in IML and explain their cause in terms of the inability of the approximations to estimate the curvature at convergence points. Sources of IML training instability are demonstrated and remedied. A detailed analysis of the effeciency of various inverse Hessian-vector product approximation methods is also provided. Subsequently, we use the insights gained to propose and evaluate a novel semi-supervised learning algorithm that learns to inductively weigh consistency regularization losses. We show how training a "Confidence Network" to extract domain specific features can learn to up-weigh useful images and down-weigh out-of-distribution samples. Results outperform the baseline FixMatch performance.

摘要
Meta-学习使用隐式梯度提供了一种有趣的代替标准技术，这些技术取决于内部循环训练的轨迹。然而，隐式 meta-学习（IML）需要计算第二个梯度，特别是希尔比格，这是现代深度学习模型中计算的不实际。Various approximations for the Hessian were proposed, but a systematic comparison of their compute cost, stability, generalization of solution found and estimation accuracy were largely overlooked.在这项研究中，我们开始了一个系统性的比较分析，检验不同的近似方法在IML训练流程中的效果。我们证明了IML训练中出现的 катастрофи忘记现象，并解释了其原因为近似方法无法在 converges 点 estimating 曲线的 curvature。我们还示出了IML训练的不稳定性的来源，并提供了修复方法。另外，我们还提供了一个细节的 inverse Hessian-vector product approximation 方法的效率分析。然后，我们使用获得的理解，提出和评估一种新的半监督学习算法，该算法可以学习 inductively 权重一致减少损失。我们表明了在训练 "信任网络" 来提取域pecific特征时，可以学习到升重用户图像和降低非标范图像。结果超出了基eline FixMatch性能。

Pre-training with Random Orthogonal Projection Image Modeling

paper_url: http://arxiv.org/abs/2310.18737
repo_url: None
paper_authors: Maryam Haghighat, Peyman Moghadam, Shaheer Mohamed, Piotr Koniusz
for: The paper is written for proposing a new self-supervised learning framework called Random Orthogonal Projection Image Modeling (ROPIM) that can be used for visual pre-training without the need for labels.
methods: The paper uses a random orthogonal projection method to randomly mask entire spatial image areas with locally varying masking degrees, which encourages the network to capture and learn structural information about objects and scenes.
results: The paper shows that using random orthogonal projection leads to superior performance compared to crop-based masking, and demonstrates state-of-the-art results on several popular benchmarks.Here is the same information in Simplified Chinese text:
for: 这篇论文是为了介绍一种新的自我超视learning框架，即Random Orthogonal Projection Image Modeling（ROPIM），用于无标签的视觉预训练。
methods: 这篇论文使用随机正交投影方法，随机将整个图像空间掩码，实现了地方性Masking的效果，从而让网络学习对象和场景的结构信息。
results: 这篇论文表明，使用随机正交投影比crop-based masking更高效，并在多个流行的标准测试集上达到了领先的性能。

Abstract
Masked Image Modeling (MIM) is a powerful self-supervised strategy for visual pre-training without the use of labels. MIM applies random crops to input images, processes them with an encoder, and then recovers the masked inputs with a decoder, which encourages the network to capture and learn structural information about objects and scenes. The intermediate feature representations obtained from MIM are suitable for fine-tuning on downstream tasks. In this paper, we propose an Image Modeling framework based on random orthogonal projection instead of binary masking as in MIM. Our proposed Random Orthogonal Projection Image Modeling (ROPIM) reduces spatially-wise token information under guaranteed bound on the noise variance and can be considered as masking entire spatial image area under locally varying masking degrees. Since ROPIM uses a random subspace for the projection that realizes the masking step, the readily available complement of the subspace can be used during unmasking to promote recovery of removed information. In this paper, we show that using random orthogonal projection leads to superior performance compared to crop-based masking. We demonstrate state-of-the-art results on several popular benchmarks.

摘要
自适应学习 ohne 标签的视觉预训练策略：面罩图像模型（MIM）。MIM 使用随机剪辑对输入图像进行处理，然后使用解码器恢复受随机剪辑影响的输入图像，这使得网络学习和捕捉图像中的结构信息。MIM 生成的中间特征表示可以进行下游任务的细化。在这篇论文中，我们提出了基于随机正交投影的图像模型框架（ROPIM）。ROPIM 在空间上减少了Token信息，并且可以保证随机投影的噪声方差的下界。由于 ROPIM 使用随机子空间进行投影，因此可以使用该子空间的可用资源进行解压缩，以便恢复被移除的信息。在这篇论文中，我们证明了使用随机正交投影可以比随机剪辑更高效。我们在多个流行的 benchmark 上达到了状态机器的表现。

Using Large Language Models to Support Thematic Analysis in Empirical Legal Studies

paper_url: http://arxiv.org/abs/2310.18729
repo_url: None
paper_authors: Jakub Drápal, Hannes Westermann, Jaromir Savelka
for: 本研究旨在探讨如何使用大语言模型（LLM）和法律专家合作进行逻辑分析，以便提高逻辑分析的效率和质量。
methods: 本研究使用了一种新的框架，即将LLM与法律专家合作进行逻辑分析的初始编码（阶段2）、主题搜索（阶段3）和数据分类（阶段4）。
results: 研究发现，使用LLM可以生成合理的初始编码，并且可以根据专家反馈进行改进。此外，模型还能够透过零例学习来将描述事实分类到主题类别中。最后，由LLM自动发现的主题与法律专家所找到的主题之间存在一定的相似性。这些发现可以帮助法律研究人员在启用LLM时作出更 Informed Decisions。

Abstract
Thematic analysis and other variants of inductive coding are widely used qualitative analytic methods within empirical legal studies (ELS). We propose a novel framework facilitating effective collaboration of a legal expert with a large language model (LLM) for generating initial codes (phase 2 of thematic analysis), searching for themes (phase 3), and classifying the data in terms of the themes (to kick-start phase 4). We employed the framework for an analysis of a dataset (n=785) of facts descriptions from criminal court opinions regarding thefts. The goal of the analysis was to discover classes of typical thefts. Our results show that the LLM, namely OpenAI's GPT-4, generated reasonable initial codes, and it was capable of improving the quality of the codes based on expert feedback. They also suggest that the model performed well in zero-shot classification of facts descriptions in terms of the themes. Finally, the themes autonomously discovered by the LLM appear to map fairly well to the themes arrived at by legal experts. These findings can be leveraged by legal researchers to guide their decisions in integrating LLMs into their thematic analyses, as well as other inductive coding projects.

摘要
empirical legal studies (ELS) widely used qualitative analytic methods, including thematic analysis and its variants. We propose a novel framework for effective collaboration between a legal expert and a large language model (LLM) in thematic analysis, including generating initial codes (phase 2), searching for themes (phase 3), and classifying the data in terms of themes (to kick-start phase 4). We applied the framework to a dataset (n=785) of fact descriptions from criminal court opinions on thefts, aiming to discover typical theft classes. Our results show that OpenAI's GPT-4, the LLM, generated reasonable initial codes and improved code quality based on expert feedback. Additionally, the model performed well in zero-shot classification of fact descriptions in terms of themes. The themes autonomously discovered by the LLM align well with the themes identified by legal experts, providing valuable insights for legal researchers integrating LLMs into their thematic analyses and other inductive coding projects.

The Evolution of the Interplay Between Input Distributions and Linear Regions in Networks

paper_url: http://arxiv.org/abs/2310.18725
repo_url: None
paper_authors: Xuan Qi, Yi Wei
for: 本研究旨在探讨深度神经网络的表达能力，具体来说是通过ReLU activation function来评估神经网络的表达能力。
methods: 本研究使用了 counted number of linear convex regions 来评估神经网络的表达能力。我们也提供了一种基于ReLU activation function的训练过程的分析。
results: 我们的研究发现，对于任意一个一维输入，存在一个最小阈值的神经元数量可以表达它。此外，我们还发现在训练过程中，ReLU网络的决策边界会经历反复细化过程。我们的研究希望能够激发网络优化的研究，并为深度神经网络的探索和分析提供启示。

Abstract
It is commonly recognized that the expressiveness of deep neural networks is contingent upon a range of factors, encompassing their depth, width, and other relevant considerations. Currently, the practical performance of the majority of deep neural networks remains uncertain. For ReLU (Rectified Linear Unit) networks with piecewise linear activations, the number of linear convex regions serves as a natural metric to gauge the network's expressivity. In this paper, we count the number of linear convex regions in deep neural networks based on ReLU. In particular, we prove that for any one-dimensional input, there exists a minimum threshold for the number of neurons required to express it. We also empirically observe that for the same network, intricate inputs hinder its capacity to express linear regions. Furthermore, we unveil the iterative refinement process of decision boundaries in ReLU networks during training. We aspire for our research to serve as an inspiration for network optimization endeavors and aids in the exploration and analysis of the behaviors exhibited by deep networks.

摘要
通常认为深度神经网络的表达能力取决于各种因素，包括它们的深度、宽度和其他相关因素。目前，大多数深度神经网络的实际表现仍然不清楚。为ReLU（矩阵线性单元）网络，数量的凸 convex 区域作为一个自然的度量来衡量网络的表达能力。在这篇论文中，我们计算了深度神经网络中ReLU activation function的凸 convex 区域数量。特别是，我们证明了任何一维输入都存在一个最小阈值的神经元数量，可以表达它。此外，我们还观察到了在同一个网络中，复杂的输入会降低其表达线性区域的能力。此外，我们还揭示了ReLU网络在训练过程中的迭代精细化过程。我们希望通过这项研究，能够激发网络优化的努力，并且对深度网络的行为进行探索和分析。

WCLD: Curated Large Dataset of Criminal Cases from Wisconsin Circuit Courts

paper_url: http://arxiv.org/abs/2310.18724
repo_url: None
paper_authors: Elliott Ash, Naman Goel, Nianyun Li, Claudia Marangon, Peiyao Sun
for: This paper provides a large dataset of criminal cases to support research on machine learning decision-support tools in criminal justice systems, with a focus on fairness and systemic issues.
methods: The dataset is constructed using reliable public data from 1970 to 2020, including information on prior criminal counts, recidivism outcomes, and various other attributes such as neighborhood characteristics, charge severity, and case decisions.
results: The dataset contains a large number of samples from five racial groups and provides researchers with a more comprehensive and rigorous platform for studying algorithmic fairness in the context of criminal justice.

Abstract
Machine learning based decision-support tools in criminal justice systems are subjects of intense discussions and academic research. There are important open questions about the utility and fairness of such tools. Academic researchers often rely on a few small datasets that are not sufficient to empirically study various real-world aspects of these questions. In this paper, we contribute WCLD, a curated large dataset of 1.5 million criminal cases from circuit courts in the U.S. state of Wisconsin. We used reliable public data from 1970 to 2020 to curate attributes like prior criminal counts and recidivism outcomes. The dataset contains large number of samples from five racial groups, in addition to information like sex and age (at judgment and first offense). Other attributes in this dataset include neighborhood characteristics obtained from census data, detailed types of offense, charge severity, case decisions, sentence lengths, year of filing etc. We also provide pseudo-identifiers for judge, county and zipcode. The dataset will not only enable researchers to more rigorously study algorithmic fairness in the context of criminal justice, but also relate algorithmic challenges with various systemic issues. We also discuss in detail the process of constructing the dataset and provide a datasheet. The WCLD dataset is available at \url{https://clezdata.github.io/wcld/}.

摘要
机器学习基于决策支持工具在刑事司法系统中是激烈的讨论和学术研究的主题。有重要的开放问题，例如这些工具的有用性和公平性。学术研究人员 часто依靠一些小的数据集来实际研究各种现实世界方面的问题。在这篇论文中，我们贡献了WCLD，一个 curaated大型数据集，包含150万个刑事案件从美国威斯康星州的环境法院。我们使用可靠的公共数据从1970年到2020年来Curate属性，如前科犯罪记录和重犯率结果。这个数据集包含多个种族组，以及性别和年龄（审判时和首次犯罪时）的信息。其他属性包括从人口普查数据获取的社区特征、细致的犯罪类型、罪名严重程度、审判结果、刑罚长度、提交年份等。我们还提供了判官、郡和邮政编码的 Pseudo-标识符。这个数据集不仅允许研究人员更加严谨地研究刑事司法中的算法公平性，还可以将算法挑战与多种系统问题相关联。我们还在详细介绍了数据集构建过程，并提供了数据表单。WCLD数据集可以在 \url{https://clezdata.github.io/wcld/} 上下载。

Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards

paper_url: http://arxiv.org/abs/2310.18715
repo_url: None
paper_authors: Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi
for: 增强线上强化学习（RL）在重 tailed 奖励下的Robustness，这种情况在实际应用中很普遍。
methods: 我们提出了两种算法框架，ROAM和ROOM，用于稳定的 Off-policy Evaluation（OPE）和 Offline Policy Optimization（OPO）。我们的框架通过精心将 median-of-means 方法与线上RL结合，以便直观地估计值函数估计器的uncertainty。这不仅遵循 OPO 的原则，而且 также有效地处理重 tailed 奖励。
results: 我们的两种框架在对 logged 数据集展示 heavy-tailed 奖励分布时表现出色，与现有方法相比，有较高的性能。

Abstract
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation (OPE) and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods on the logged dataset exhibits heavy-tailed reward distributions.

摘要

An Investigation of Darwiche and Pearl’s Postulates for Iterated Belief Update

paper_url: http://arxiv.org/abs/2310.18714
repo_url: None
paper_authors: Quanlong Guan, Tong Zhu, Liangda Fang, Junming Qiu, Zhao-Rong Lai, Weiqi Luo
For: This paper focuses on belief revision and update, two types of belief change, and how an agent can modify her beliefs in the presence of new information.* Methods: The paper uses the AGM and KM postulates to capture rational belief revision and update, respectively, but notes that these postulates are too permissive and can lead to unreasonable changes in the iteration.* Results: The paper presents a modification of the original KM postulates based on belief states, and migrates several well-known postulates for iterated belief revision to iterated belief update. The paper also provides exact semantic characterizations based on partial preorders for each of the proposed postulates, and analyzes the compatibility between the iterated postulates and the KM postulates for belief update.

Abstract
Belief revision and update, two significant types of belief change, both focus on how an agent modify her beliefs in presence of new information. The most striking difference between them is that the former studies the change of beliefs in a static world while the latter concentrates on a dynamically-changing world. The famous AGM and KM postulates were proposed to capture rational belief revision and update, respectively. However, both of them are too permissive to exclude some unreasonable changes in the iteration. In response to this weakness, the DP postulates and its extensions for iterated belief revision were presented. Furthermore, Rodrigues integrated these postulates in belief update. Unfortunately, his approach does not meet the basic requirement of iterated belief update. This paper is intended to solve this problem of Rodrigues's approach. Firstly, we present a modification of the original KM postulates based on belief states. Subsequently, we migrate several well-known postulates for iterated belief revision to iterated belief update. Moreover, we provide the exact semantic characterizations based on partial preorders for each of the proposed postulates. Finally, we analyze the compatibility between the above iterated postulates and the KM postulates for belief update.

摘要
belief revision和更新两种重要的信念变化都关注于一个代理人在新信息存在下如何修改她的信念。两者最明显的差异在于前者研究在静止世界中的信念变化，而后者专注于动态变化的世界。著名的AGM和KM假设被提出来捕捉合理的信念修改和更新。然而，两者都过于允许一些不合理的修改在迭代中。为了解决这个弱点，DP假设和其扩展被提出来。此外，Rodrigues将这些假设 integrate到信念更新中。然而，他的方法并不满足基本的迭代信念更新要求。这篇论文的目的是解决Rodrigues的方法中的这个问题。首先，我们提出修改了原始KM假设的基于信念状态的修改。然后，我们将许多已知的迭代信念修改假设迁移到迭代信念更新中。此外，我们提供了每个提案的准确的语义特征化，基于partial orden для每个提案。最后，我们分析了以上迭代假设与KM假设之间的兼容性。

Probing LLMs for Joint Encoding of Linguistic Categories

paper_url: http://arxiv.org/abs/2310.18696
repo_url: https://github.com/thesofakillers/infoshare
paper_authors: Giulio Starace, Konstantinos Papakostas, Rochelle Choenni, Apostolos Panagiotopoulos, Matteo Rosati, Alina Leidinger, Ekaterina Shutova
for: 这个论文旨在探讨大语言模型（LLM）中不同语言现象之间的编码方式，以及这些编码方式如何交互影响模型的表示。
methods: 作者提出了一种测试框架，用于检查 LLM 中不同语言现象之间的编码方式。他们在 syntax 领域进行了实验，并发现了在同一级别（相关的 parts-of-speech 类）和不同级别（parts-of-speech 类和相关的语法依赖关系）之间存在共同编码的证据。
results: 实验显示，在多语言 LLM 中，同样的 patterns 存在于不同语言中。

Abstract
Large Language Models (LLMs) exhibit impressive performance on a range of NLP tasks, due to the general-purpose linguistic knowledge acquired during pretraining. Existing model interpretability research (Tenney et al., 2019) suggests that a linguistic hierarchy emerges in the LLM layers, with lower layers better suited to solving syntactic tasks and higher layers employed for semantic processing. Yet, little is known about how encodings of different linguistic phenomena interact within the models and to what extent processing of linguistically-related categories relies on the same, shared model representations. In this paper, we propose a framework for testing the joint encoding of linguistic categories in LLMs. Focusing on syntax, we find evidence of joint encoding both at the same (related part-of-speech (POS) classes) and different (POS classes and related syntactic dependency relations) levels of linguistic hierarchy. Our cross-lingual experiments show that the same patterns hold across languages in multilingual LLMs.

摘要
大型语言模型（LLM）在多种自然语言处理任务上表现出众，这是因为预训练期间获得的通用语言知识。现有的模型解释研究（Tenney等，2019）表明，LLM层次结构中的下层更适合解决语法任务，而上层则用于 semantics处理。然而，我们对 LLM 中不同语言现象编码的交互并不甚了解，以及这些编码如何在模型中互相协作。在这篇论文中，我们提出了测试 LLM 中语言类别之间的共同编码框架。我们将注重语法，发现在同一级别（相关的部分词类）和不同级别（部分词类和相关的语法关系）之间都有共同编码证据。我们的跨语言实验表明，这些模式在多语言 LLM 中也存在。

Unsupervised Behavior Extraction via Random Intent Priors

paper_url: http://arxiv.org/abs/2310.18687
repo_url: None
paper_authors: Hao Hu, Yiqin Yang, Jianing Ye, Ziqing Mai, Chongjie Zhang
for: 提高offline reinforcement learning（RL）算法的效率和实用性，使其能够更好地利用奖励自由数据中的人类行为知识。
methods: 提出了一种无监督的方法UBER，通过不同的假奖分配给不同的代理人来提取多样化的行为集，并将其 reuse 为新任务学习的候选策略。
results: 经验和理论证明表明，使用随机神经网络生成的奖励函数可以提取多样化和有用的行为，一些甚至与专家相似。实验结果表明，UBER可以在多个 benchmark 上学习有效和多样的行为集，超过现有的基elines。

Abstract
Reward-free data is abundant and contains rich prior knowledge of human behaviors, but it is not well exploited by offline reinforcement learning (RL) algorithms. In this paper, we propose UBER, an unsupervised approach to extract useful behaviors from offline reward-free datasets via diversified rewards. UBER assigns different pseudo-rewards sampled from a given prior distribution to different agents to extract a diverse set of behaviors, and reuse them as candidate policies to facilitate the learning of new tasks. Perhaps surprisingly, we show that rewards generated from random neural networks are sufficient to extract diverse and useful behaviors, some even close to expert ones. We provide both empirical and theoretical evidence to justify the use of random priors for the reward function. Experiments on multiple benchmarks showcase UBER's ability to learn effective and diverse behavior sets that enhance sample efficiency for online RL, outperforming existing baselines. By reducing reliance on human supervision, UBER broadens the applicability of RL to real-world scenarios with abundant reward-free data.

摘要
reward-free 数据够丰富，含有人类行为的丰富先验知识，但是现在的Offline reinforcement learning（RL）算法未能充分利用这些数据。在这篇论文中，我们提出了UBER，一种不带supervision的方法，通过多样化的奖励来提取用于新任务学习的有用行为集。我们尝试 assigning different pseudo-奖励，从给定的先验分布中随机生成的奖励，给不同的代理人，以提取多样化的行为集，并将其 reuse 作为新任务学习的候选策略。我们发现，由随机神经网络生成的奖励可以提取出高质量的多样化行为集，其中一些甚至可以与专家级别相比。我们提供了 both empirical 和理论证据，证明使用随机先验来奖励函数是有理由的。我们在多个 bench mark 上进行了实验，证明 UBER 能够学习有效和多样化的行为集，提高在线RL的样本效率，超过现有的基eline。通过减少人类监督，UBER 扩展了RL的应用范围，使其可以在实际情况下应用。

paper_url: http://arxiv.org/abs/2310.18679
repo_url: None
paper_authors: Sajad Mousavi, Ricardo Luna Gutiérrez, Desik Rengarajan, Vineet Gundecha, Ashwin Ramesh Babu, Avisek Naug, Antonio Guillen, Soumyendu Sarkar
for: 提高 LLM 的可靠性和准确性， Mitigate 偏见和谎言
methods: 使用一个 ensemble of critics 和模型自身的反馈来修正模型输出， drawing inspiration from human self-reflection and input seeking behavior
results: observe consistent performance improvements in reducing toxicity and correcting factual errors

Abstract
We propose a self-correction mechanism for Large Language Models (LLMs) to mitigate issues such as toxicity and fact hallucination. This method involves refining model outputs through an ensemble of critics and the model's own feedback. Drawing inspiration from human behavior, we explore whether LLMs can emulate the self-correction process observed in humans who often engage in self-reflection and seek input from others to refine their understanding of complex topics. Our approach is model-agnostic and can be applied across various domains to enhance trustworthiness by addressing fairness, bias, and robustness concerns. We consistently observe performance improvements in LLMs for reducing toxicity and correcting factual errors.

摘要
我们提出了一种自修复机制，用于 mitigate Large Language Models（LLMs）中的问题，如恶意和谬误投入。这种方法通过一个ensemble of critics和模型自身的反馈来纠正模型输出。我们从人类行为中获得灵感，探讨 LLMS 是否可以模仿人类自我反思的自修复过程。我们的方法是无关模型的，可以在不同领域中应用，以提高可靠性，解决公平、偏见和Robustness 等问题。我们一致地观察到 LLMS 的性能提高，用于减少恶意和 corrected 错误。

GalliformeSpectra: A Hen Breed Dataset

paper_url: http://arxiv.org/abs/2310.19830
repo_url: None
paper_authors: Galib Muhammad Shahriar Himel, Md Masudul Islam
for: 这个论文旨在提供一份包含十种不同鸡种的完整数据集，以捕捉每种鸡种的独特特征和特征。
methods: 该论文使用了一种多样化的数据收集方法，收集了1010个原始JPG图像，展示了各种鸡种的身体特征、毛皮模式和特有的特征。这些图像后来被标准化、缩放和转换为PNG格式以保持数据集的一致性。
results: 该数据集提供了一个多样化的资源，可以用于鸡类科学、遗传学和农业研究。这个数据集的潜在价值在于它可以帮助研究人员探索不同鸡种之间的一致性和遗传特征，从而支持鸡类种养、遗传研究和生物技术发展。

Abstract
This article presents a comprehensive dataset featuring ten distinct hen breeds, sourced from various regions, capturing the unique characteristics and traits of each breed. The dataset encompasses Bielefeld, Blackorpington, Brahma, Buckeye, Fayoumi, Leghorn, Newhampshire, Plymouthrock, Sussex, and Turken breeds, offering a diverse representation of poultry commonly bred worldwide. A total of 1010 original JPG images were meticulously collected, showcasing the physical attributes, feather patterns, and distinctive features of each hen breed. These images were subsequently standardized, resized, and converted to PNG format for consistency within the dataset. The compilation, although unevenly distributed across the breeds, provides a rich resource, serving as a foundation for research and applications in poultry science, genetics, and agricultural studies. This dataset holds significant potential to contribute to various fields by enabling the exploration and analysis of unique characteristics and genetic traits across different hen breeds, thereby supporting advancements in poultry breeding, farming, and genetic research.

摘要

FinBTech: Blockchain-Based Video and Voice Authentication System for Enhanced Security in Financial Transactions Utilizing FaceNet512 and Gaussian Mixture Models

paper_url: http://arxiv.org/abs/2310.18668
repo_url: None
paper_authors: Prof N. Jeenath Laila, Dr G. Tamilpavai
for: 为了提高金融交易的安全性和可靠性
methods: 使用智能合约、区块链技术、FaceNet512 face recognition和GMM语音认证，实现视频和音频验证
results: 提供了一个无 precedent 的多因素生物 metric 验证系统，提高安全性至新高度

Abstract
In the digital age, it is crucial to make sure that financial transactions are as secure and reliable as possible. This abstract offers a ground-breaking method that combines smart contracts, blockchain technology, FaceNet512 for improved face recognition, and Gaussian Mixture Models (GMM) for speech authentication to create a system for video and audio verification that is unmatched. Smart contracts and the immutable ledger of the blockchain are combined to offer a safe and open environment for financial transactions. FaceNet512 and GMM offer multi-factor biometric authentication simultaneously, enhancing security to new heights. By combining cutting-edge technology, this system offers a strong defense against identity theft and illegal access, establishing a new benchmark for safe financial transactions.

摘要
在数字时代，确保金融交易的安全和可靠性非常重要。这个报道提供了一种创新的方法， combinig智能合同、区块链技术、FaceNet512 для提高人脸识别和混合 Gaussian Mixture Models (GMM) для语音验证，以创建一个无与伦比的视频和音频验证系统。智能合同和区块链的坚实记录结合，提供了一个安全和开放的金融交易环境。FaceNet512 和 GMM 同时提供多因素生物 metric 验证，提高安全性至新的高度。通过结合前沿技术，这个系统提供了一个强大的防止身份盗用和未经授权访问的防御，设立了新的安全金融交易标准。

From Indeterminacy to Determinacy: Augmenting Logical Reasoning Capabilities with Large Language Models

paper_url: http://arxiv.org/abs/2310.18659
repo_url: None
paper_authors: Hongda Sun, Weikai Xu, Wei Liu, Jian Luan, Bin Wang, Shuo Shang, Ji-Rong Wen, Rui Yan
for: 提高LLM的逻辑推理能力，以便更好地模仿人类逻辑思维。
methods: 提出了一种新的逻辑推理框架，即DetermLR，该框架将逻辑推理过程定义为一种从不确定前提开始，逐步增加确定前提，使结论变得更加明确的过程。DetermLR包括三个重要组成部分：1）前提识别：将前提分为两类：确定和不确定。这使LLM可以根据特定任务的复杂度选择适当的逻辑结构。2）前提优化和探索：利用量化度量评估每个前提的相关性，以便更好地决定探索哪些前提可能会带来新的发现。3）迭代过程和逻辑记忆：引入逻辑记忆模块，自动记录和提取可用的前提和逻辑路径，以保持历史逻辑细节，从而更好地优化前提优化和逻辑推理过程。
results: 对四个复杂的逻辑推理任务LogiQA、ProofWriter、FOLIO和LogicalDeduction进行了广泛的实验，结果表明，DetermLR与所有基线相比，在逻辑推理任务中表现出色，可以更好地完成逻辑推理任务，同时需要更少的访问状态。

Abstract
Recent advances in LLMs have revolutionized the landscape of reasoning tasks. To enhance the capabilities of LLMs to emulate human reasoning, prior works focus on modeling reasoning steps using specific thought structures like chains, trees, or graphs. However, LLM-based reasoning continues to encounter three challenges: 1) Selecting appropriate reasoning structures for various tasks; 2) Exploiting known conditions sufficiently and efficiently to deduce new insights; 3) Considering the impact of historical reasoning experience. To address these challenges, we propose DetermLR, a novel reasoning framework that formulates the reasoning process as a transformational journey from indeterminate premises to determinate ones. This process is marked by the incremental accumulation of determinate premises, making the conclusion progressively closer to clarity. DetermLR includes three essential components: 1) Premise identification: We categorize premises into two distinct types: determinate and indeterminate. This empowers LLMs to customize reasoning structures to match the specific task complexities. 2) Premise prioritization and exploration: We leverage quantitative measurements to assess the relevance of each premise to the target, prioritizing more relevant premises for exploring new insights. 3) Iterative process with reasoning memory: We introduce a reasoning memory module to automate storage and extraction of available premises and reasoning paths, preserving historical reasoning details for more accurate premise prioritization. Comprehensive experimental results show that DetermLR outperforms all baselines on four challenging logical reasoning tasks: LogiQA, ProofWriter, FOLIO, and LogicalDeduction. DetermLR can achieve better reasoning performance while requiring fewer visited states, highlighting its superior efficiency and effectiveness in tackling logical reasoning tasks.

摘要

Selecting appropriate reasoning structures for various tasks2. Exploiting known conditions sufficiently and efficiently to deduce new insights3. Considering the impact of historical reasoning experience.To address these challenges, we propose DetermLR, a novel reasoning framework that formulates the reasoning process as a transformational journey from indeterminate premises to determinate ones. This process is marked by the incremental accumulation of determinate premises, making the conclusion progressively closer to clarity. DetermLR includes three essential components: 1. Premise identification: We categorize premises into two distinct types: determinate and indeterminate. This empowers LLMs to customize reasoning structures to match the specific task complexities. 2. Premise prioritization and exploration: We leverage quantitative measurements to assess the relevance of each premise to the target, prioritizing more relevant premises for exploring new insights. 3. Iterative process with reasoning memory: We introduce a reasoning memory module to automate storage and extraction of available premises and reasoning paths, preserving historical reasoning details for more accurate premise prioritization.Comprehensive experimental results show that DetermLR outperforms all baselines on four challenging logical reasoning tasks: LogiQA, ProofWriter, FOLIO, and LogicalDeduction. DetermLR can achieve better reasoning performance while requiring fewer visited states, highlighting its superior efficiency and effectiveness in tackling logical reasoning tasks.

paper_url: http://arxiv.org/abs/2310.18652
repo_url: https://github.com/baeseongsu/ehrxqa
paper_authors: Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric I-Chao Chang, Tackeun Kim, Edward Choi
For: 这个论文旨在开发一个基于电子医疗记录（EHR）的多模态问答集（EHRXQA），以推动现有EHR问答系统中多模态合理的推理。* Methods: 该论文使用了两个uni-modal资源：1）MIMIC-CXR-VQA数据集，我们新创建的医疗图像问答标准 benchmark，以增强imaging模式在EHR问答中的参与度；2）EHRSQL（MIMIC-IV），一个重新设计的表格基于EHR问答dataset。通过将这两个uni-modal资源集成，我们成功构建了一个多模态EHR问答集。* Results: 该论文提出了一种基于NeuralSQL的策略，其中包括一个外部VQA API，以解决多模态EHR问题中的独特挑战。这项创新的尝试可以提高对多模态EHR源的参与度，我们认为这个dataset可以促进现实世界的医疗应用，如临床决策和研究。

Abstract
Electronic Health Records (EHRs), which contain patients' medical histories in various multi-modal formats, often overlook the potential for joint reasoning across imaging and table modalities underexplored in current EHR Question Answering (QA) systems. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop our dataset, we first construct two uni-modal resources: 1) The MIMIC- CXR-VQA dataset, our newly created medical visual question answering (VQA) benchmark, specifically designed to augment the imaging modality in EHR QA, and 2) EHRSQL (MIMIC-IV), a refashioned version of a previously established table-based EHR QA dataset. By integrating these two uni-modal resources, we successfully construct a multi-modal EHR QA dataset that necessitates both uni-modal and cross-modal reasoning. To address the unique challenges of multi-modal questions within EHRs, we propose a NeuralSQL-based strategy equipped with an external VQA API. This pioneering endeavor enhances engagement with multi-modal EHR sources and we believe that our dataset can catalyze advances in real-world medical scenarios such as clinical decision-making and research. EHRXQA is available at https://github.com/baeseongsu/ehrxqa.

摘要
电子健康记录（EHR），它们包含了患者的医疗历史记录在不同的多模态格式中，经常忽视了现有EHR问答系统中的跨模态合理化潜力。在这篇论文中，我们引入了EHRXQA，一个新的多模态问答数据集，结合了结构化的EHR和胸部X射影像。为了开发我们的数据集，我们首先构建了两个单模态资源：1）我们新创建的医疗图像问答数据集（MIMIC-CXR-VQA），用于增强EHR中的图像模态，并2）EHRSQL（MIMIC-IV），一个重新设计的表格基于EHR问答数据集。通过将这两个单模态资源集成起来，我们成功地构建了一个多模态EHR问答数据集，需要同时进行单模态和跨模态合理化。为了解决EHR中多模态问题中的特殊挑战，我们提出了基于NeuralSQL的策略，并配备了外部VQA API。我们认为这一努力可以提高对多模态EHR源的参与度，并且我们相信EHRXQA数据集可以促进实际医疗场景中的决策和研究。EHRXQA数据集可以在https://github.com/baeseongsu/ehrxqa上下载。

Sleep Deprivation in the Forward-Forward Algorithm

paper_url: http://arxiv.org/abs/2310.18647
repo_url: https://github.com/mirceatlx/ff
paper_authors: Mircea-Tudor Lică, David Dinucu-Jianu
for: 本研究探讨了在睡眠Context中Forward-Forward算法的两个前向通道分离方法的生物学意义。
methods: 本研究使用了Forward-Forward算法进行学习，并通过调整睡眠和醒目阶段之间的差距来调整算法的学习能力。
results: 研究发现，睡眠阶段的差距影响了算法的学习能力，而负数据的存在可以减轻睡眠不足的影响。

Abstract
This paper aims to explore the separation of the two forward passes in the Forward-Forward algorithm from a biological perspective in the context of sleep. We show the size of the gap between the sleep and awake phase influences the learning capabilities of the algorithm and highlight the importance of negative data in diminishing the devastating effects of sleep deprivation.

摘要
Note: "Forward-Forward algorithm" is not a real algorithm, it's a fictional one used for illustration purposes only.

Predicting Agricultural Commodities Prices with Machine Learning: A Review of Current Research

paper_url: http://arxiv.org/abs/2310.18646
repo_url: None
paper_authors: Nhat-Quang Tran, Anna Felipe, Thanh Nguyen Ngoc, Tom Huynh, Quang Tran, Arthur Tang, Thuy Nguyen
For: 这篇论文是关于机器学习算法在农业价格预测中的一种评论。* Methods: 论文详细介绍了各种机器学习算法在农业价格预测中的应用，包括支持向量机器、决策树、彩虹分解等。* Results: 论文认为，机器学习算法可以提高农业价格预测的准确性和实时性，同时可以适应不同的农业市场和环境。但是，论文也指出了这些算法的限制和挑战，例如数据质量和可用性的问题。

Abstract
Agricultural price prediction is crucial for farmers, policymakers, and other stakeholders in the agricultural sector. However, it is a challenging task due to the complex and dynamic nature of agricultural markets. Machine learning algorithms have the potential to revolutionize agricultural price prediction by improving accuracy, real-time prediction, customization, and integration. This paper reviews recent research on machine learning algorithms for agricultural price prediction. We discuss the importance of agriculture in developing countries and the problems associated with crop price falls. We then identify the challenges of predicting agricultural prices and highlight how machine learning algorithms can support better prediction. Next, we present a comprehensive analysis of recent research, discussing the strengths and weaknesses of various machine learning techniques. We conclude that machine learning has the potential to revolutionize agricultural price prediction, but further research is essential to address the limitations and challenges associated with this approach.

摘要
农业价格预测对农民、政策制定者和农业领acker有着重要的意义。然而，由于农业市场的复杂和动态特点，这是一项具有挑战性的任务。机器学习算法有可能为农业价格预测带来革命性的改善，包括准确性、实时预测、定制化和 интеграción。本文 recensreview了最近的研究，探讨了机器学习算法在农业价格预测中的应用。我们讨论了发展国家农业的重要性以及作物价格下跌的问题，然后详细介绍了各种机器学习技术的挑战和局限性。我们 conclude that 机器学习有可能为农业价格预测带来革命性的改善，但进一步的研究是必要的，以解决这种方法的限制和挑战。

One-shot Localization and Segmentation of Medical Images with Foundation Models

paper_url: http://arxiv.org/abs/2310.18642
repo_url: None
paper_authors: Deepa Anand, Gurunath Reddy M, Vanika Singhal, Dattesh D. Shanbhag, Shriram KS, Uday Patil, Chitresh Bhushan, Kavitha Manickam, Dawei Gui, Rakesh Mullick, Avinash Gopal, Parminder Bhatia, Taha Kass-Hout
for: 本研究使用自然图像预训练的视Transformers（ViT）和稳定扩散（SD）模型来解决医学图像对应问题。
methods: 研究使用多种预训练的ViT（DINO、DINOv2、SAM、CLIP）和SD模型，对医学图像进行解决对应问题。
results: 研究表明，使用自然图像预训练的ViT和SD模型可以在不同的医学图像模式（CT、MR、ultrasound）、多个解剖区域（脑、胸、 Abdomen、Extremities）和多种任务上达到良好的性能。此外，通过与模板图像进行对应，我们使用SAM模型进行单击分割，达到了单击分割的 dice range 62%-90%。我们的单击方法也超过了 reciently proposed few-shot segmentation方法 - UniverSeg（Dice range 47%-80%) 在大多数医学图像模式中的多个semantic segmentation任务中表现出色。

Abstract
Recent advances in Vision Transformers (ViT) and Stable Diffusion (SD) models with their ability to capture rich semantic features of the image have been used for image correspondence tasks on natural images. In this paper, we examine the ability of a variety of pre-trained ViT (DINO, DINOv2, SAM, CLIP) and SD models, trained exclusively on natural images, for solving the correspondence problems on medical images. While many works have made a case for in-domain training, we show that the models trained on natural images can offer good performance on medical images across different modalities (CT,MR,Ultrasound) sourced from various manufacturers, over multiple anatomical regions (brain, thorax, abdomen, extremities), and on wide variety of tasks. Further, we leverage the correspondence with respect to a template image to prompt a Segment Anything (SAM) model to arrive at single shot segmentation, achieving dice range of 62%-90% across tasks, using just one image as reference. We also show that our single-shot method outperforms the recently proposed few-shot segmentation method - UniverSeg (Dice range 47%-80%) on most of the semantic segmentation tasks(six out of seven) across medical imaging modalities.

摘要
近期，人工智能领域内的视觉转换器（ViT）和稳定扩散（SD）模型在自然图像上表现出了捕捉图像 semantics的能力，这些模型在图像匹配任务中表现出色。在这篇论文中，我们研究了不同预训练的 ViT（DINO、DINOv2、SAM、CLIP）和 SD 模型，这些模型均在自然图像上进行封闭式训练，是否能够在医疗图像上解决匹配问题。虽然许多研究认为域内训练是关键，但我们发现这些模型在医疗图像上表现良好，包括不同的modalities（CT、MR、ultrasound），来自不同的制造商，以及多个解剖区域（大脑、胸部、腹部、四肢）。此外，我们利用模板图像的对应关系，使用 SAM 模型进行一步分割，实现了单步分割的 dice 范围为 62%-90%，使用只有一张图像作为参考。此外，我们的单步方法在多种医疗影像模式中的多个semantic segmentation任务中表现出色，超过了最近提出的几个shot segmentation方法（UniverSeg）的 dice 范围（47%-80%）。

Electrical Impedance Tomography: A Fair Comparative Study on Deep Learning and Analytic-based Approaches

paper_url: http://arxiv.org/abs/2310.18636
repo_url: https://github.com/dericknganyu/eit_dataset_generation
paper_authors: Derick Nganyu Tanyu, Jianfeng Ning, Andreas Hauptmann, Bangti Jin, Peter Maass
For: This paper focuses on the Electrical Impedance Tomography (EIT) inverse problem, which is the challenge of inferring the internal conductivity distribution of an object from measurements taken on its boundary. The paper explores techniques for solving this problem, particularly the interplay between deep learning-based strategies and classical analytic-based methods.* Methods: The paper examines four state-of-the-art deep learning algorithms for solving the EIT inverse problem, including their representational capabilities and strengths. In addition, two analytic-based methods are dissected for their limitations and strengths. The paper also employs various numerical experiments to evaluate the efficacy of these methods.* Results: The paper provides a nuanced understanding of the methods’ ability to capture essential features and delineate complex conductivity patterns. The incorporation of variable conductivity scenarios allows for exploring the robustness and adaptability of each method. The results demonstrate the potential of deep learning-based methods for solving the EIT inverse problem, particularly in the presence of complex conductivity patterns.

Abstract
Electrical Impedance Tomography (EIT) is a powerful imaging technique with diverse applications, e.g., medical diagnosis, industrial monitoring, and environmental studies. The EIT inverse problem is about inferring the internal conductivity distribution of an object from measurements taken on its boundary. It is severely ill-posed, necessitating advanced computational methods for accurate image reconstructions. Recent years have witnessed significant progress, driven by innovations in analytic-based approaches and deep learning. This review explores techniques for solving the EIT inverse problem, focusing on the interplay between contemporary deep learning-based strategies and classical analytic-based methods. Four state-of-the-art deep learning algorithms are rigorously examined, harnessing the representational capabilities of deep neural networks to reconstruct intricate conductivity distributions. In parallel, two analytic-based methods, rooted in mathematical formulations and regularisation techniques, are dissected for their strengths and limitations. These methodologies are evaluated through various numerical experiments, encompassing diverse scenarios that reflect real-world complexities. A suite of performance metrics is employed to assess the efficacy of these methods. These metrics collectively provide a nuanced understanding of the methods' ability to capture essential features and delineate complex conductivity patterns. One novel feature of the study is the incorporation of variable conductivity scenarios, introducing a level of heterogeneity that mimics textured inclusions. This departure from uniform conductivity assumptions mimics realistic scenarios where tissues or materials exhibit spatially varying electrical properties. Exploring how each method responds to such variable conductivity scenarios opens avenues for understanding their robustness and adaptability.

摘要
电气阻抗成像技术（EIT）是一种 poderosa 的成像技术，广泛应用于医学诊断、工业监测和环境研究等领域。EIT逆问题是关于从物体边缘测量获得内部电导分布的问题，它是非常不稳定的，需要高级计算方法以实现准确的成像重建。过去几年，驱动了由创新的数学基础和深度学习的技术进步，这种技术的研究受到了广泛关注。本文探讨了解决EIT逆问题的方法，特别是将现代深度学习基础与传统的数学基础相结合的方法。本文选择了四种现代深度学习算法进行严格的分析和评估，利用深度神经网络的表达能力来重建复杂的电导分布。同时，本文还介绍了两种传统的数学基础方法，包括基于数学形式和正则化技术的方法，并评估了它们的优缺点。这些方法在多种数字实验中被评估，涵盖了实际中的复杂场景。为评估这些方法的效果，本文采用了多种效果指标，这些指标共同提供了对方法的准确性和复杂电导分布的能力的全面了解。本文的一个新特点是对不同电导性场景进行变量电导分布的研究，这种假设与实际中的细胞或材料表现相符。通过对每种方法的响应来评估它们的Robustness和适应性。

Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots

paper_url: http://arxiv.org/abs/2310.18633
repo_url: None
paper_authors: Ruixiang Tang, Jiayi Yuan, Yiming Li, Zirui Liu, Rui Chen, Xia Hu
for: 防止语言模型中的后门攻击
methods: integrate a honeypot module into the original PLM, impose penalties on the information acquired by the honeypot module
results: 减少了10%~40%的攻击成功率，比前一代方法更有效和可靠

Abstract
In the field of natural language processing, the prevalent approach involves fine-tuning pretrained language models (PLMs) using local samples. Recent research has exposed the susceptibility of PLMs to backdoor attacks, wherein the adversaries can embed malicious prediction behaviors by manipulating a few training samples. In this study, our objective is to develop a backdoor-resistant tuning procedure that yields a backdoor-free model, no matter whether the fine-tuning dataset contains poisoned samples. To this end, we propose and integrate a honeypot module into the original PLM, specifically designed to absorb backdoor information exclusively. Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features while carrying minimal information about the original tasks. Consequently, we can impose penalties on the information acquired by the honeypot module to inhibit backdoor creation during the fine-tuning process of the stem network. Comprehensive experiments conducted on benchmark datasets substantiate the effectiveness and robustness of our defensive strategy. Notably, these results indicate a substantial reduction in the attack success rate ranging from 10\% to 40\% when compared to prior state-of-the-art methods.

摘要
在自然语言处理领域，普遍的方法是细化预训练语言模型（PLM）使用本地样本。 recent research has exposed the vulnerability of PLMs to backdoor attacks, where adversaries can embed malicious prediction behaviors by manipulating a few training samples. In this study, our objective is to develop a backdoor-resistant tuning procedure that yields a backdoor-free model, regardless of whether the fine-tuning dataset contains poisoned samples. To this end, we propose and integrate a honeypot module into the original PLM, specifically designed to absorb backdoor information exclusively. Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features while carrying minimal information about the original tasks. Therefore, we can impose penalties on the information acquired by the honeypot module to inhibit backdoor creation during the fine-tuning process of the stem network. Comprehensive experiments conducted on benchmark datasets substantiate the effectiveness and robustness of our defensive strategy. Notably, these results indicate a substantial reduction in the attack success rate ranging from 10% to 40% when compared to prior state-of-the-art methods.

Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness

paper_url: http://arxiv.org/abs/2310.18626
repo_url: None
paper_authors: Soumyendu Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Zachariah Carmichael, Vineet Gundecha, Sahand Ghorbanpour, Ricardo Luna, Gutierrez Antonio Guillen, Avisek Naug
for: 这个 paper 是为了提供一种生成攻击测试集的框架，以评估图像分类模型的可靠性。
methods: 这个框架使用了一种基于模型学习的强化学习算法，可以根据用户的需求选择合适的扰动种类，并生成多种扰动水平的测试集，以评估不同的图像分类模型的可靠性。
results: 这个框架可以生成高效和可转移的攻击样本，可以让不同的图像分类模型失败，包括 ResNet-50、Inception-V3 和 VGG-16 等模型。这些攻击样本可以在不受束缚的情况下生成，而不需要引入不自然的artifacts或颜色泄漏。

Abstract
We present a novel framework for generating adversarial benchmarks to evaluate the robustness of image classification models. Our framework allows users to customize the types of distortions to be optimally applied to images, which helps address the specific distortions relevant to their deployment. The benchmark can generate datasets at various distortion levels to assess the robustness of different image classifiers. Our results show that the adversarial samples generated by our framework with any of the image classification models, like ResNet-50, Inception-V3, and VGG-16, are effective and transferable to other models causing them to fail. These failures happen even when these models are adversarially retrained using state-of-the-art techniques, demonstrating the generalizability of our adversarial samples. We achieve competitive performance in terms of net $L_2$ distortion compared to state-of-the-art benchmark techniques on CIFAR-10 and ImageNet; however, we demonstrate our framework achieves such results with simple distortions like Gaussian noise without introducing unnatural artifacts or color bleeds. This is made possible by a model-based reinforcement learning (RL) agent and a technique that reduces a deep tree search of the image for model sensitivity to perturbations, to a one-level analysis and action. The flexibility of choosing distortions and setting classification probability thresholds for multiple classes makes our framework suitable for algorithmic audits.

摘要
我团队提出了一种新的框架，用于生成对图像分类模型的Robustness进行评估。我们的框架允许用户自定义图像上应用的最佳噪声类型，以适应其特定的部署环境。这个框架可以生成各种噪声水平的数据集，以评估不同的图像分类器的Robustness。我们的结果显示，我们的框架生成的对图像分类模型的攻击样本，包括ResNet-50、Inception-V3和VGG-16等模型，都是有效的和可传递的。这些攻击样本会让这些模型失败，即使这些模型通过了最先进的防御技术进行适应。我们的框架在CIFAR-10和ImageNet上达到了与state-of-the-art的$L_2$损失相同的竞争性，但是我们的框架可以使用简单的噪声（如 Gaussian 噪声）而不需要引入不自然的artifacts或颜色泄漏。这是由一个基于模型的强化学习（RL） Agent和一种减少图像深度搜索的技术来实现的。我们的框架可以根据用户选择的噪声类型和多个类别的分类概率来进行自定义。这使得我们的框架适用于算法审核。

Arbitrarily Scalable Environment Generators via Neural Cellular Automata

paper_url: http://arxiv.org/abs/2310.18622
repo_url: https://github.com/lunjohnzhang/warehouse_env_gen_nca_public
paper_authors: Yulun Zhang, Matthew C. Fontaine, Varun Bhatt, Stefanos Nikolaidis, Jiaoyang Li
for: 提高多机器人系统的吞吐量（improve the throughput of multi-robot systems）
methods: 使用质量多样性（Quality Diversity）算法优化环境生成器（Neural Cellular Automata environment generators）
results: 可以生成无限大的环境，并且维持环境中的准备规划（consistent, regularized patterns），提高多机器人系统的可扩展性和可靠性（improve the scalability and reliability of multi-robot systems）

Abstract
We study the problem of generating arbitrarily large environments to improve the throughput of multi-robot systems. Prior work proposes Quality Diversity (QD) algorithms as an effective method for optimizing the environments of automated warehouses. However, these approaches optimize only relatively small environments, falling short when it comes to replicating real-world warehouse sizes. The challenge arises from the exponential increase in the search space as the environment size increases. Additionally, the previous methods have only been tested with up to 350 robots in simulations, while practical warehouses could host thousands of robots. In this paper, instead of optimizing environments, we propose to optimize Neural Cellular Automata (NCA) environment generators via QD algorithms. We train a collection of NCA generators with QD algorithms in small environments and then generate arbitrarily large environments from the generators at test time. We show that NCA environment generators maintain consistent, regularized patterns regardless of environment size, significantly enhancing the scalability of multi-robot systems in two different domains with up to 2,350 robots. Additionally, we demonstrate that our method scales a single-agent reinforcement learning policy to arbitrarily large environments with similar patterns. We include the source code at \url{https://github.com/lunjohnzhang/warehouse_env_gen_nca_public}.

摘要
我们研究多机器人系统中的环境生成问题，以提高其吞吐量。先前的方法提出了质量多样性（QD）算法来优化自动化仓储的环境，但这些方法仅能优化较小的环境，无法模拟现实世界仓储的规模。这问题的挑战在于搜索空间的对数增长，以及先前的方法仅在350台机器人的 simulations 中进行过测试。在这篇文章中，我们不是直接优化环境，而是透过 QD 算法来优化神经细胞自动机（NCA）环境生成器。我们在小型环境中训练了一个 NCA 环境生成器，然后在测试时使用这个生成器来生成任意大的环境。我们证明了 NCA 环境生成器在不同领域中能够维持一致的、规律的模式，很大地提高了多机器人系统的扩展性，并且还能将单机器人学习策略扩展到任意大的环境中。我们在这篇文章中还提供了源代码，可以在 \url{https://github.com/lunjohnzhang/warehouse_env_gen_nca_public} 中获取。

Dense Retrieval as Indirect Supervision for Large-space Decision Making

paper_url: http://arxiv.org/abs/2310.18619
repo_url: https://github.com/luka-group/ddr
paper_authors: Nan Xu, Fei Wang, Mingtao Dong, Muhao Chen
for: 提高大量分类任务的准确率和泛化能力。
methods: 使用 dense retrieval 方法，将大量分类任务 reformulate 为学习 retrieve 任务，并使用 dual-encoder 架构来学习预测。
results: 在多个极端多类分类任务和少量数据情况下，DDR 可以大幅提高预测精度和泛化能力，至少比基eline 27.54%，F1 分数提高 1.17%，并在三个少量意图分类任务中平均提高了1.26%的准确率。

Abstract
Many discriminative natural language understanding (NLU) tasks have large label spaces. Learning such a process of large-space decision making is particularly challenging due to the lack of training instances per label and the difficulty of selection among many fine-grained labels. Inspired by dense retrieval methods for passage finding in open-domain QA, we propose a reformulation of large-space discriminative NLU tasks as a learning-to-retrieve task, leading to a novel solution named Dense Decision Retrieval (DDR ). Instead of predicting fine-grained decisions as logits, DDR adopts a dual-encoder architecture that learns to predict by retrieving from a decision thesaurus. This approach not only leverages rich indirect supervision signals from easy-to-consume learning resources for dense retrieval, it also leads to enhanced prediction generalizability with a semantically meaningful representation of the large decision space. When evaluated on tasks with decision spaces ranging from hundreds to hundred-thousand scales, DDR outperforms strong baselines greatly by 27.54% in P@1 on two extreme multi-label classification tasks, 1.17% in F1 score ultra-fine entity typing, and 1.26% in accuracy on three few-shot intent classification tasks on average. Code and resources are available at https://github.com/luka-group/DDR

摘要
很多推理性自然语言理解（NLU）任务有很大的标签空间。学习这种大空间决策的过程特别是有很多标签的选择和训练实例的缺乏。 inspirited by dense retrieval方法用于在开放领域Question Answering中找到段落，我们提出了对大空间推理性NLU任务的重新表述，导致一种新的解决方案 называ为粘性决策检索（DDR）。而不是预测细化的决策，DDR采用了双核生成器体系，学习通过检索决策词典来预测。这种方法不仅利用了易于采用的学习资源的丰富间接监督信号，还导致了增强的预测泛化性和semantically meaningful的决策空间表示。当评估在标签空间范围从百到千千的任务上，DDR大幅超越了强基eline的表现，平均提高了27.54%的P@1、1.17%的F1分数和1.26%的准确率。代码和资源可以在https://github.com/luka-group/DDR上找到。

Hierarchical Mutual Information Analysis: Towards Multi-view Clustering in The Wild

paper_url: http://arxiv.org/abs/2310.18614
repo_url: None
paper_authors: Jiatai Wang, Zhiwei Xu, Xuewen Yang, Xin Wang
for: This paper focuses on addressing the challenges of missing and unaligned data in multi-view clustering, which is a common problem in practical computer vision applications.
methods: The proposed method uses a deep framework that combines data recovery and alignment in a hierarchically consistent way, leveraging dual prediction and contrastive reconstruction to achieve instance-level and class-level alignment.
results: The proposed method significantly outperforms state-of-the-art methods on multi-view clustering even in the cases of view missing and unalignment, as demonstrated by extensive experiments on public datasets.Here’s the same information in Simplified Chinese text:
for: 这篇论文针对多视图 clustering 中缺失和不一致的数据问题进行解决，这是实际计算机视觉应用中的常见问题。
methods: 提议的方法使用深度框架，将数据恢复和对齐 fusion 在层次结构上进行一致性验证，通过 dual prediction 和对比重建来实现实例级别和类别级别的对齐。
results: 提议的方法在实际公共数据集上进行了广泛的实验，与现有方法进行比较，得到了显著的性能提升，即使在缺失和不一致的情况下也能够达到显著的效果。

Abstract
Multi-view clustering (MVC) can explore common semantics from unsupervised views generated by different sources, and thus has been extensively used in applications of practical computer vision. Due to the spatio-temporal asynchronism, multi-view data often suffer from view missing and are unaligned in real-world applications, which makes it difficult to learn consistent representations. To address the above issues, this work proposes a deep MVC framework where data recovery and alignment are fused in a hierarchically consistent way to maximize the mutual information among different views and ensure the consistency of their latent spaces. More specifically, we first leverage dual prediction to fill in missing views while achieving the instance-level alignment, and then take the contrastive reconstruction to achieve the class-level alignment. To the best of our knowledge, this could be the first successful attempt to handle the missing and unaligned data problem separately with different learning paradigms. Extensive experiments on public datasets demonstrate that our method significantly outperforms state-of-the-art methods on multi-view clustering even in the cases of view missing and unalignment.

摘要

Embedding in Recommender Systems: A Survey

paper_url: http://arxiv.org/abs/2310.18608
repo_url: None
paper_authors: Xiangyu Zhao, Maolin Wang, Xinjian Zhao, Jiansheng Li, Shucheng Zhou, Dawei Yin, Qing Li, Jiliang Tang, Ruocheng Guo
for: 本文提供了一个概述近期 embedding 技术在推荐系统中的研究进展的survey。
methods: 本文覆盖了多种 embedding 方法，包括 collaborative filtering、自监学习和图基于的技术。
results: 本文提出了一些创新的方法来提高推荐系统的性能和计算复杂性，包括 AutoML、哈希技术和量化技术。

Abstract
Recommender systems have become an essential component of many online platforms, providing personalized recommendations to users. A crucial aspect is embedding techniques that coverts the high-dimensional discrete features, such as user and item IDs, into low-dimensional continuous vectors and can enhance the recommendation performance. Applying embedding techniques captures complex entity relationships and has spurred substantial research. In this survey, we provide an overview of the recent literature on embedding techniques in recommender systems. This survey covers embedding methods like collaborative filtering, self-supervised learning, and graph-based techniques. Collaborative filtering generates embeddings capturing user-item preferences, excelling in sparse data. Self-supervised methods leverage contrastive or generative learning for various tasks. Graph-based techniques like node2vec exploit complex relationships in network-rich environments. Addressing the scalability challenges inherent to embedding methods, our survey delves into innovative directions within the field of recommendation systems. These directions aim to enhance performance and reduce computational complexity, paving the way for improved recommender systems. Among these innovative approaches, we will introduce Auto Machine Learning (AutoML), hash techniques, and quantization techniques in this survey. We discuss various architectures and techniques and highlight the challenges and future directions in these aspects. This survey aims to provide a comprehensive overview of the state-of-the-art in this rapidly evolving field and serve as a useful resource for researchers and practitioners working in the area of recommender systems.

摘要
现在许多在线平台上都有推荐系统，为用户提供个性化的推荐。一个重要的方面是嵌入技术，将用户和 Item ID 等高维离散特征转换成低维连续向量，以提高推荐性能。采用嵌入技术可以捕捉复杂的实体关系，并促进了大量研究。在这篇报告中，我们提供了现代推荐系统中嵌入技术的最新Literature综述。这篇报告覆盖了协同练习、自然学习和图像基本技术等嵌入方法。协同练习生成 embeddings，捕捉用户和 Item 的偏好，在缺乏数据时表现出色。自然学习使用对比或生成学习来实现多种任务。图像基本技术如 node2vec 利用网络中的复杂关系。为了解决嵌入方法中的扩展性问题，我们在推荐系统领域内进行了创新的方向，以提高性能并降低计算复杂性，为未来的推荐系统铺平道路。这些创新方向包括自动机器学习（AutoML）、哈希技术和量化技术。我们讨论了不同的架构和技术，并高亮了这些方面中的挑战和未来方向。该报告的目的是为研究人员和实践者提供一份现代化的推荐系统领域的 estado-da-arte 资源，以便他们在这一领域进行更好的研究和实践。

MILDSum: A Novel Benchmark Dataset for Multilingual Summarization of Indian Legal Case Judgments

paper_url: http://arxiv.org/abs/2310.18600
repo_url: https://github.com/law-ai/mildsum
paper_authors: Debtanu Datta, Shubham Soni, Rajdeep Mukherjee, Saptarshi Ghosh
for: 本研究旨在提供英文法律文件的跨语言概要，以便在印度的法律系统中提供更加公平的 justice。
methods: 该研究使用了多种多样的概要方法，以评估其在法律领域的性能。
results: 研究发现，跨语言概要在法律领域的应用仍然需要进一步的研究，以提高概要的准确性和可读性。

Abstract
Automatic summarization of legal case judgments is a practically important problem that has attracted substantial research efforts in many countries. In the context of the Indian judiciary, there is an additional complexity -- Indian legal case judgments are mostly written in complex English, but a significant portion of India's population lacks command of the English language. Hence, it is crucial to summarize the legal documents in Indian languages to ensure equitable access to justice. While prior research primarily focuses on summarizing legal case judgments in their source languages, this study presents a pioneering effort toward cross-lingual summarization of English legal documents into Hindi, the most frequently spoken Indian language. We construct the first high-quality legal corpus comprising of 3,122 case judgments from prominent Indian courts in English, along with their summaries in both English and Hindi, drafted by legal practitioners. We benchmark the performance of several diverse summarization approaches on our corpus and demonstrate the need for further research in cross-lingual summarization in the legal domain.

摘要
自动摘要法律案例判决是一个实际重要的问题，在多个国家的研究中都获得了substantial的投入。在印度法院的背景下，有一个额外的复杂性---印度的法律案例判决大多是用复杂的英语写成，但印度大部分人口不会英语。因此，实际需要摘要法律文件的印地语言，以确保公平的法律服务。在先前的研究中，主要对源语言进行摘要，但这项研究则对英文法律文件进行标准化，并将其摘要为印地语言。我们建立了首个高品质的法律档案，包括3,122个案例判决由印度主要法院提供，以及其摘要在英语和印地语言中，由法律专业人员撰写。我们在我们的档案上评估了多种多元摘要方法的表现，并证明了在法律领域中的标准化摘要仍然需要进一步的研究。

Using Early Readouts to Mediate Featural Bias in Distillation

paper_url: http://arxiv.org/abs/2310.18590
repo_url: None
paper_authors: Rishabh Tiwari, Durga Sivasubramanian, Anmol Mekala, Ganesh Ramakrishnan, Pradeep Shenoy
for: 本研究旨在改进在真实世界的超级vised学习任务中深度网络学习的潜在损害，特别是在托管学习中，学生模型可能比对应教师模型更具有较低的表达能力。
methods: 我们提出了一种新的早期读取机制，通过使用早期网络层的表示来预测标签。我们发现这些早期读outs自动地标识了问题实例或组，具体来说是具有高度信任但 incorrect 预测的情况。
results: 我们显示了这种早期读outs可以自动地为实例层次提供较好的预测信号，可以用于修改分配损害loss的学习过程中。我们在多个benchmark数据集上显示了提高group fairness度量和学生模型的总准确率。此外，我们还提供了次要分析，以帮助理解超级vised学习中特征学习的角色。

Abstract
Deep networks tend to learn spurious feature-label correlations in real-world supervised learning tasks. This vulnerability is aggravated in distillation, where a student model may have lesser representational capacity than the corresponding teacher model. Often, knowledge of specific spurious correlations is used to reweight instances & rebalance the learning process. We propose a novel early readout mechanism whereby we attempt to predict the label using representations from earlier network layers. We show that these early readouts automatically identify problem instances or groups in the form of confident, incorrect predictions. Leveraging these signals to modulate the distillation loss on an instance level allows us to substantially improve not only group fairness measures across benchmark datasets, but also overall accuracy of the student model. We also provide secondary analyses that bring insight into the role of feature learning in supervision and distillation.

摘要
深度网络通常在实际supervised learning任务中学习假的特征-标签相关性。这种漏洞在精神投射中更加加剧，因为学生模型可能比对应的教师模型有更差的表达能力。经常使用特定假相关性的知识来重新权衡实例和重新调整学习过程。我们提出了一种新的早期读取机制，尝试使用早期网络层的表示来预测标签。我们发现这些早期读outs自然地标识问题实例或组，即高信息准确预测。利用这些信号来修改分配损失的实例级别可以大幅提高 benchmark数据集上的组准则性和学生模型的总准确率。我们还提供了次要分析，探讨特征学习在监督和精神投射中的角色。

Visual Explanations via Iterated Integrated Attributions

paper_url: http://arxiv.org/abs/2310.18585
repo_url: None
paper_authors: Oren Barkan, Yehonatan Elisha, Yuval Asher, Amit Eshel, Noam Koenigstein
for: 这篇论文用于解释视觉模型的预测结果。
methods: 该论文使用迭代 интеGRATED ATTRIBUTES（IIA）方法，通过迭代 интеGRATE 输入图像、模型内部表示和导数，生成准确和专注的解释地图。
results: 论文的实验结果表明，IIA方法可以准确地解释视觉模型的预测结果，并且在不同任务、数据集和网络架构上表现出色，超过了其他当前领先的解释技术。

Abstract
We introduce Iterated Integrated Attributions (IIA) - a generic method for explaining the predictions of vision models. IIA employs iterative integration across the input image, the internal representations generated by the model, and their gradients, yielding precise and focused explanation maps. We demonstrate the effectiveness of IIA through comprehensive evaluations across various tasks, datasets, and network architectures. Our results showcase that IIA produces accurate explanation maps, outperforming other state-of-the-art explanation techniques.

摘要
我们介绍Iterated Integrated Attributions（IIA），一种通用的视觉模型预测解释方法。IIA通过迭代 интеграpection 输入图像、模型内部表示和其导数，生成精细和专注的解释地图。我们通过多种任务、数据集和网络架构的全面评估，证明IIA可以生成准确的解释地图，超越其他当前领域的解释技术。

Breaking the Trilemma of Privacy, Utility, Efficiency via Controllable Machine Unlearning

paper_url: http://arxiv.org/abs/2310.18574
repo_url: https://github.com/guangyaodou/conmu
paper_authors: Zheyuan Liu, Guangyao Dou, Yijun Tian, Chunhui Zhang, Eli Chien, Ziwei Zhu
for: 这篇论文的主要目标是解决机器学习模型中的数据隐私问题，具体来说是通过控制 Privacy-Utility-Efficiency 三方面的质量来实现机器学习模型的卸载。
methods: 这篇论文提出了一种名为 Controllable Machine Unlearning（ConMU）的新框架，该框架包括三个基本模块：重要数据选择模块、进程 Gaussian 机制模块和卸载代理模块。这些模块协同实现了控制 Privacy-Utility-Efficiency 三方面的质量。
results: 对于各种标准数据集的实验表明，ConMU 控制机制具有优于现有卸载方法的灵活性和可控性，并且可以充分考虑不同的实际隐私法规。

Abstract
Machine Unlearning (MU) algorithms have become increasingly critical due to the imperative adherence to data privacy regulations. The primary objective of MU is to erase the influence of specific data samples on a given model without the need to retrain it from scratch. Accordingly, existing methods focus on maximizing user privacy protection. However, there are different degrees of privacy regulations for each real-world web-based application. Exploring the full spectrum of trade-offs between privacy, model utility, and runtime efficiency is critical for practical unlearning scenarios. Furthermore, designing the MU algorithm with simple control of the aforementioned trade-off is desirable but challenging due to the inherent complex interaction. To address the challenges, we present Controllable Machine Unlearning (ConMU), a novel framework designed to facilitate the calibration of MU. The ConMU framework contains three integral modules: an important data selection module that reconciles the runtime efficiency and model generalization, a progressive Gaussian mechanism module that balances privacy and model generalization, and an unlearning proxy that controls the trade-offs between privacy and runtime efficiency. Comprehensive experiments on various benchmark datasets have demonstrated the robust adaptability of our control mechanism and its superiority over established unlearning methods. ConMU explores the full spectrum of the Privacy-Utility-Efficiency trade-off and allows practitioners to account for different real-world regulations. Source code available at: https://github.com/guangyaodou/ConMU.

摘要

A General Framework for Robust G-Invariance in G-Equivariant Networks

paper_url: http://arxiv.org/abs/2310.18564
repo_url: https://github.com/gtc-invariance/gtc-invariance
paper_authors: Sophia Sanborn, Nina Miolane
For: The paper proposes a method for achieving robust group-invariance in group-equivariant convolutional neural networks (G-CNNs) called the G-triple-correlation (G-TC) layer.* Methods: The G-TC layer leverages the theory of the triple-correlation on groups, which is a unique, lowest-degree polynomial invariant map that is also complete.* Results: The G-TC layer yields measurable improvements in classification accuracy over standard Max G-Pooling in G-CNN architectures, and is resistant to invariance-based adversarial attacks. The method is demonstrated on several groups acting on both $\mathbb{R}^2$ and $\mathbb{R}^3$ on the G-MNIST and G-ModelNet10 datasets.Here is the same information in Simplified Chinese text:* For: 本文提出了一种方法来实现robust group-invariance在群equivariant convolutional neural networks（G-CNNs）中，称为G-triple-correlation（G-TC）层。* Methods: G-TC层利用群中的 triple-correlation理论，这是一个唯一的、最低度的多项式恒等函数，同时也是完整的。* Results: G-TC层在G-CNN架构中提供了较好的分类精度，并且对 invariant-based adversarial attacks具有强大的Robustness。此方法在几个群中对 $\mathbb{R}^2$ 和 $\mathbb{R}^3$ 上的 G-MNIST 和 G-ModelNet10 数据集进行了证明。

Abstract
We introduce a general method for achieving robust group-invariance in group-equivariant convolutional neural networks ($G$-CNNs), which we call the $G$-triple-correlation ($G$-TC) layer. The approach leverages the theory of the triple-correlation on groups, which is the unique, lowest-degree polynomial invariant map that is also complete. Many commonly used invariant maps - such as the max - are incomplete: they remove both group and signal structure. A complete invariant, by contrast, removes only the variation due to the actions of the group, while preserving all information about the structure of the signal. The completeness of the triple correlation endows the $G$-TC layer with strong robustness, which can be observed in its resistance to invariance-based adversarial attacks. In addition, we observe that it yields measurable improvements in classification accuracy over standard Max $G$-Pooling in $G$-CNN architectures. We provide a general and efficient implementation of the method for any discretized group, which requires only a table defining the group's product structure. We demonstrate the benefits of this method for $G$-CNNs defined on both commutative and non-commutative groups - $SO(2)$, $O(2)$, $SO(3)$, and $O(3)$ (discretized as the cyclic $C8$, dihedral $D16$, chiral octahedral $O$ and full octahedral $O_h$ groups) - acting on $\mathbb{R}^2$ and $\mathbb{R}^3$ on both $G$-MNIST and $G$-ModelNet10 datasets.

摘要
我们介绍了一个通用的方法，可以在群equivariant convolutional neural networks（$G$-CNNs）中实现强健的群对称性，我们称之为$G$-三重相关（$G$-TC）层。这种方法利用群论中的三重相関，这是唯一的、最低阶的多项式群对称函数，同时也是完备的。许多常用的对称函数，如最大值，都是不完备的：它们会消除群和信号结构中的一部分。一个完备的对称函数，则会消除群的行动所导致的变化，保留信号的结构信息。三重相关的完备性使得$G$-TC层具有强大的免疫力，可以通过观察它对抗对称基于的攻击而证明。此外，我们发现这种方法可以提高$G$-CNN的分类精度，比标准的最大值$G$-Pooling更好。我们提供了一个通用且有效的实现方法，这需要一个表格定义了群的产生结构。我们在$G$-CNNs中使用了不同的域群，包括$SO(2)$, $O(2)$, $SO(3)$,和$O(3)$（为数为顺序$C8$, $D16$, $O$和$O_h$群），并在$\mathbb{R}^2$和$\mathbb{R}^3$上进行了$G$-MNIST和$G$-ModelNet10数据集上的实验。

Optimization-Free Test-Time Adaptation for Cross-Person Activity Recognition

paper_url: http://arxiv.org/abs/2310.18562
repo_url: https://github.com/Claydon-Wang/OFTTA
paper_authors: Shuoyuan Wang, Jindong Wang, HuaJun Xi, Bob Zhang, Lei Zhang, Hongxin Wei
for: 这个论文主要针对的是人体动作识别（HAR）模型在实际应用中的性能降低问题，以及如何通过测试流式进行时间适应（TTA）来解决这个问题。
methods: 这篇论文提出了一种不需要优化的测试时适应（OFTTA）框架，用于抗预测域变化和实时适应。OFTTA使用了快速的测试时批处理（EDTN）来取代批处理（CBN）层，并对分类器进行了距离计算和支持集维护。
results: 对于三个公共的人体动作识别（HAR）数据集和两种不同的TTA设置，实验结果表明，OFTTA可以与现有的TTA方法进行比较，在分类性能和计算效率两个方面均有优异表现。此外，我们还验证了OFTTA在边缘设备上的可行性，表明可能的部署在实际应用中。

Abstract
Human Activity Recognition (HAR) models often suffer from performance degradation in real-world applications due to distribution shifts in activity patterns across individuals. Test-Time Adaptation (TTA) is an emerging learning paradigm that aims to utilize the test stream to adjust predictions in real-time inference, which has not been explored in HAR before. However, the high computational cost of optimization-based TTA algorithms makes it intractable to run on resource-constrained edge devices. In this paper, we propose an Optimization-Free Test-Time Adaptation (OFTTA) framework for sensor-based HAR. OFTTA adjusts the feature extractor and linear classifier simultaneously in an optimization-free manner. For the feature extractor, we propose Exponential DecayTest-time Normalization (EDTN) to replace the conventional batch normalization (CBN) layers. EDTN combines CBN and Test-time batch Normalization (TBN) to extract reliable features against domain shifts with TBN's influence decreasing exponentially in deeper layers. For the classifier, we adjust the prediction by computing the distance between the feature and the prototype, which is calculated by a maintained support set. In addition, the update of the support set is based on the pseudo label, which can benefit from reliable features extracted by EDTN. Extensive experiments on three public cross-person HAR datasets and two different TTA settings demonstrate that OFTTA outperforms the state-of-the-art TTA approaches in both classification performance and computational efficiency. Finally, we verify the superiority of our proposed OFTTA on edge devices, indicating possible deployment in real applications. Our code is available at \href{https://github.com/Claydon-Wang/OFTTA}{this https URL}.

摘要
人体活动识别（HAR）模型经常在实际应用中受到分布偏移的影响，导致性能下降。测试时适应（TTA）是一种新趋势的学习方法，它在实时推断中使用测试流来调整预测，在HAR中尚未得到探索。然而，优化基本的TTA算法的计算成本过高，使其无法在有限的边缘设备上进行实时推断。在这篇论文中，我们提出了一种不需要优化的测试时适应（OFTTA）框架，用于感知器基本HAR。OFTTA同时调整特征提取器和线性分类器。特征提取器方面，我们提出了对域偏移的 exponential decay test-time normalization（EDTN），以取代传统的批量normalization（CBN）层。EDTN将CBN和测试时批量normalization（TBN）相结合，以提取可靠的特征对域偏移。分类器方面，我们通过计算特征和拟标的距离，来更新支持集和pseudo标签。此外，更新支持集的方法基于pseudo标签，可以利用EDTN提取的可靠特征。我们对三个公共跨人HAR数据集和两种不同的TTA设置进行了广泛的实验，结果表明OFTTA在分类性能和计算效率两个方面都超过了当前TTA方法。最后，我们验证了我们提出的OFTTA在边缘设备上的可部署性， indicating possible deployment in real applications.我们的代码可以在\href{https://github.com/Claydon-Wang/OFTTA}{这个https URL}上找到。

Deep Intrinsic Decomposition with Adversarial Learning for Hyperspectral Image Classification

paper_url: http://arxiv.org/abs/2310.18549
repo_url: None
paper_authors: Zhiqiang Gong, Xian Zhou, Wen Yao
for: 提高干扰因素影响的高spectral图像分类性能
methods: 利用深度学习的强化学习方法，提取环境因素相关的特征和分类特征，并在激烈学习环境下进行对环境和分类的共同学习
results: 对三个常用的实际数据集进行了实验，并与其他比较方法进行了比较，结果表明提出的方法可以提高高spectral图像分类性能。

Abstract
Convolutional neural networks (CNNs) have been demonstrated their powerful ability to extract discriminative features for hyperspectral image classification. However, general deep learning methods for CNNs ignore the influence of complex environmental factor which enlarges the intra-class variance and decreases the inter-class variance. This multiplies the difficulty to extract discriminative features. To overcome this problem, this work develops a novel deep intrinsic decomposition with adversarial learning, namely AdverDecom, for hyperspectral image classification to mitigate the negative impact of environmental factors on classification performance. First, we develop a generative network for hyperspectral image (HyperNet) to extract the environmental-related feature and category-related feature from the image. Then, a discriminative network is constructed to distinguish different environmental categories. Finally, a environmental and category joint learning loss is developed for adversarial learning to make the deep model learn discriminative features. Experiments are conducted over three commonly used real-world datasets and the comparison results show the superiority of the proposed method. The implementation of the proposed method and other compared methods could be accessed at https://github.com/shendu-sw/Adversarial Learning Intrinsic Decomposition for the sake of reproducibility.

摘要
卷积神经网络（CNN）在多spectral影像分类中表现出了强大的特征提取能力。然而，通用深度学习方法忽略了环境因素的复杂影响，这会增加内类差异和降低对类差异，从而困难提取特征。为解决这个问题，本文提出了一种新的深度内在分解与对抗学习方法，称为AdverDecom，用于多spectral影像分类。首先，我们开发了一个生成网络（HyperNet），用于提取影像中的环境相关特征和类别相关特征。然后，我们构建了一个分类网络，用于分辨不同的环境类别。最后，我们开发了一个环境和类别联合学习损失函数，用于对抗学习，以使深度模型学习特征。我们在三个常用的实际数据集上进行了实验，并比较了我们的方法和其他比较方法的结果，显示了我们的方法的优越性。实现方法和其他比较方法的实现可以通过https://github.com/shendu-sw/Adversarial Learning Intrinsic Decomposition访问，以便重现。

ReConTab: Regularized Contrastive Representation Learning for Tabular Data

paper_url: http://arxiv.org/abs/2310.18541
repo_url: None
paper_authors: Suiyao Chen, Jing Wu, Naira Hovakimyan, Handong Yao
for: 本研究旨在提出一种深度自动表示学习框架，以提高tabular数据中的特征工程和选择过程。
methods: 该框架基于同 Raw Features 的非对称 autoencoder，并采用了正则化技术进行 Raw Feature 选择。同时，框架还应用了对比学习来维护最关键的信息。
results: 实验结果表明，该框架可以在各种实际数据集上提供显著的性能提升，并且可以轻松地与传统方法相结合，如 XGBoost 和 Random Forest。

Abstract
Representation learning stands as one of the critical machine learning techniques across various domains. Through the acquisition of high-quality features, pre-trained embeddings significantly reduce input space redundancy, benefiting downstream pattern recognition tasks such as classification, regression, or detection. Nonetheless, in the domain of tabular data, feature engineering and selection still heavily rely on manual intervention, leading to time-consuming processes and necessitating domain expertise. In response to this challenge, we introduce ReConTab, a deep automatic representation learning framework with regularized contrastive learning. Agnostic to any type of modeling task, ReConTab constructs an asymmetric autoencoder based on the same raw features from model inputs, producing low-dimensional representative embeddings. Specifically, regularization techniques are applied for raw feature selection. Meanwhile, ReConTab leverages contrastive learning to distill the most pertinent information for downstream tasks. Experiments conducted on extensive real-world datasets substantiate the framework's capacity to yield substantial and robust performance improvements. Furthermore, we empirically demonstrate that pre-trained embeddings can seamlessly integrate as easily adaptable features, enhancing the performance of various traditional methods such as XGBoost and Random Forest.

摘要
<>机器学习中的表示学习技术在不同领域具有重要的地位。通过获得高质量的特征，预训练的嵌入significantly reducent输入空间的重复性，从而为下游的模式识别任务，如分类、回归或检测提供了明显的性能提升。然而，在标量数据领域，功能工程和选择仍然高度依赖于人工干预，导致时间消耗大、需要域专业知识。为解决这个挑战，我们介绍ReConTab，一种深度自动表示学习框架，通过带有正则化的对比学习来实现。不论任务模型的类型，ReConTab使用同 Raw Features 的同构自动encoder来生成低维表示嵌入。特别是，对 Raw Features 进行正则化处理。同时，ReConTab通过对比学习来筛选最关键的信息，以便下游任务。经验表明，ReConTab在广泛的实际数据集上实现了显著和可靠的性能提升。此外，我们也证明了预训练嵌入可以轻松地适应为多种传统方法，如 XGBoost 和 Random Forest 提高性能。<>

2023-10-28

cs.CL

cs.CL - 2023-10-28

Translating away Translationese without Parallel Data

paper_url: http://arxiv.org/abs/2310.18830
repo_url: None
paper_authors: Rricha Jalota, Koel Dutta Chowdhury, Cristina España-Bonet, Josef van Genabith
for: 本研究旨在减少翻译语言的影响，以提高跨语言自然语言处理任务的准确性。
methods: 本研究使用了一种新的翻译风格传递方法，利用了自监督学习方法，并结合了原始语言模型损失和 semantics相似性损失。
results: 研究结果表明，本方法能够减少翻译语言的影响，保持内容完整性和目标风格流畅性。

Abstract
Translated texts exhibit systematic linguistic differences compared to original texts in the same language, and these differences are referred to as translationese. Translationese has effects on various cross-lingual natural language processing tasks, potentially leading to biased results. In this paper, we explore a novel approach to reduce translationese in translated texts: translation-based style transfer. As there are no parallel human-translated and original data in the same language, we use a self-supervised approach that can learn from comparable (rather than parallel) mono-lingual original and translated data. However, even this self-supervised approach requires some parallel data for validation. We show how we can eliminate the need for parallel validation data by combining the self-supervised loss with an unsupervised loss. This unsupervised loss leverages the original language model loss over the style-transferred output and a semantic similarity loss between the input and style-transferred output. We evaluate our approach in terms of original vs. translationese binary classification in addition to measuring content preservation and target-style fluency. The results show that our approach is able to reduce translationese classifier accuracy to a level of a random classifier after style transfer while adequately preserving the content and fluency in the target original style.

摘要
文本翻译后会显示系统性的语言差异，这些差异称为翻译语言（translationese）。这些语言差异会影响跨语言自然语言处理任务的结果，可能导致偏向结果。在这篇论文中，我们探索了一种新的方法来减少翻译语言：翻译样式传递。由于没有同语言的人工翻译和原始数据，我们使用了一种自动学习的方法，可以从相似的原始和翻译数据中学习。然而， même 这种自动学习方法需要一些平行数据来验证。我们可以消除平行验证数据的需求 by combining the self-supervised loss with an unsupervised loss。这种无supervised loss 利用了原始语言模型的损失 sobre la output de estilo transferido y una pérdida de similitud semántica entre la entrada y la output de estilo transferido。我们按照原始vs. 翻译语言二分类、内容保持和目标风格流畅来评估我们的方法。结果表明，我们的方法可以在style transfer后减少翻译语言分类器的准确率到随机分类器的水平，同时保持内容和目标风格的流畅。

Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding

paper_url: http://arxiv.org/abs/2310.18783
repo_url: None
paper_authors: Lixing Zhu, Runcong Zhao, Lin Gui, Yulan He
for: 本研究旨在探讨narative理解的应用和挑战，以提高大语言模型（LLM）的 narative comprehension 能力。
methods: 本研究使用了 comprehensive survey 方法，对 narrative understanding 任务进行了全面的检查和分类，并详细介绍了关键特征、定义、分类、相关数据集、训练目标和评价指标。
results: 本研究发现，通过扩展 modularized LLM 的能力，可以解决一些新的 narative understanding 任务。此外，通过将 narative understanding 定义为捕捉作者的想象创作灵感的问题，本研究提出了一新的视角，以增强 narative comprehension 能力。

Abstract
Narrative understanding involves capturing the author's cognitive processes, providing insights into their knowledge, intentions, beliefs, and desires. Although large language models (LLMs) excel in generating grammatically coherent text, their ability to comprehend the author's thoughts remains uncertain. This limitation hinders the practical applications of narrative understanding. In this paper, we conduct a comprehensive survey of narrative understanding tasks, thoroughly examining their key features, definitions, taxonomy, associated datasets, training objectives, evaluation metrics, and limitations. Furthermore, we explore the potential of expanding the capabilities of modularized LLMs to address novel narrative understanding tasks. By framing narrative understanding as the retrieval of the author's imaginative cues that outline the narrative structure, our study introduces a fresh perspective on enhancing narrative comprehension.

摘要
narrative understanding 涉及捕捉作者的认知过程，提供作者的知识、意图、信仰、愿望等信息的启示。虽然大语言模型（LLM）在生成 grammatically coherent text 方面表现出色，但它们对作者的思想真实理解仍存在uncertainty。这种限制阻碍了 narraitve understanding 的实际应用。在这篇论文中，我们进行了全面的 narrative understanding 任务调查，详细检查了这些任务的关键特征、定义、分类、相关数据集、训练目标、评价指标以及局限性。此外，我们还探讨了扩展 modularized LLM 的能力，以解决新的 narrative understanding 任务。我们通过将 narrative understanding 定义为捕捉作者的想象力cue 的抽象，从新的角度增强了 narrative comprehension。

ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting

paper_url: http://arxiv.org/abs/2310.18778
repo_url: https://github.com/4mekki4/promap
paper_authors: Abdellah El Mekki, Muhammad Abdul-Mageed, ElMoatez Billah Nagoudi, Ismail Berrada, Ahmed Khoumsi
for: 本研究的目的是提出一种基于多语言多方言语言模型的提示方法，以解决基于静态单词表示的单词翻译 task 中的挑战。
methods: 该方法基于提前训练的多语言多方言语言模型，并使用有效的补充提示来改进单词翻译性能。
results: 在评估多种单词翻译方法，包括静态单词表示的方法，ProMap consistently achieve 状态的 лучResults ，并且在少数例示enario 下（ fewer than 10 个训练示例）也能够达到良好的性能。

Abstract
Bilingual Lexicon Induction (BLI), where words are translated between two languages, is an important NLP task. While noticeable progress on BLI in rich resource languages using static word embeddings has been achieved. The word translation performance can be further improved by incorporating information from contextualized word embeddings. In this paper, we introduce ProMap, a novel approach for BLI that leverages the power of prompting pretrained multilingual and multidialectal language models to address these challenges. To overcome the employment of subword tokens in these models, ProMap relies on an effective padded prompting of language models with a seed dictionary that achieves good performance when used independently. We also demonstrate the effectiveness of ProMap in re-ranking results from other BLI methods such as with aligned static word embeddings. When evaluated on both rich-resource and low-resource languages, ProMap consistently achieves state-of-the-art results. Furthermore, ProMap enables strong performance in few-shot scenarios (even with less than 10 training examples), making it a valuable tool for low-resource language translation. Overall, we believe our method offers both exciting and promising direction for BLI in general and low-resource languages in particular. ProMap code and data are available at \url{https://github.com/4mekki4/promap}.

摘要
百度 Lexicon 推理 (BLI), 将词语翻译 между两种语言，是 NLP 任务中的一项重要任务。虽然在使用静态词嵌入的情况下，在丰富资源语言中已经取得了可注目的进步，但词语翻译性能可以通过使用语言模型的上下文化词嵌入进一步改进。在这篇论文中，我们介绍了 ProMap，一种新的 BLI 方法，利用预训练的多语言多方言语言模型的力量，解决这些挑战。为了超越使用子词 токен，ProMap 利用有效的补充提示语言模型的方法，并使用种子词典 achieve 好的性能。我们还 demonstarte ProMap 可以在其他 BLI 方法的结果中进行排名，如采用静态词嵌入的方法。当评估在丰富资源语言和低资源语言上时，ProMap consistently 取得了状态的艺术结果。此外，ProMap 可以在少量示例下进行几极enario （即使使用 less than 10 个训练示例），这使其成为低资源语言翻译中的有价值工具。总之，我们认为我们的方法对 BLI 和低资源语言来说是一种激动人心的和有前途的方向。ProMap 代码和数据可以在 \url{https://github.com/4mekki4/promap} 上找到。

Crossing the Aisle: Unveiling Partisan and Counter-Partisan Events in News Reporting

paper_url: http://arxiv.org/abs/2310.18768
repo_url: None
paper_authors: Kaijian Zou, Xinliang Frederick Zhang, Winston Wu, Nick Beauchamp, Lu Wang
for: 这篇论文研究了新闻媒体是如何通过事件包容或排除来影响公众意见的。
methods: 作者首先引入了检测党派和反党派事件的任务，并对这些事件进行了标注。然后，他们使用了高质量的数据集PAC，包含304篇来自不同政治立场的新闻文章，并对其进行了分析。
results: 研究发现，新闻媒体通过事件包容或排除来影响公众意见，并且这种影响可以通过语言模型更好地理解事件在更广泛的上下文中。同时，研究也发现了新闻媒体的选择性报道可能会影响公众意见的方向性。

Abstract
News media is expected to uphold unbiased reporting. Yet they may still affect public opinion by selectively including or omitting events that support or contradict their ideological positions. Prior work in NLP has only studied media bias via linguistic style and word usage. In this paper, we study to which degree media balances news reporting and affects consumers through event inclusion or omission. We first introduce the task of detecting both partisan and counter-partisan events: events that support or oppose the author's political ideology. To conduct our study, we annotate a high-quality dataset, PAC, containing 8,511 (counter-)partisan event annotations in 304 news articles from ideologically diverse media outlets. We benchmark PAC to highlight the challenges of this task. Our findings highlight both the ways in which the news subtly shapes opinion and the need for large language models that better understand events within a broader context. Our dataset can be found at https://github.com/launchnlp/Partisan-Event-Dataset.

摘要
新闻媒体应该保持不倚于任何政治立场的报道，但它们可能仍然影响公众意见通过选择性地包括或排除支持或反对其政治立场的事件。在这篇论文中，我们研究了新闻报道是如何帮助或妨碍公众意见的。我们首先介绍了检测政治立场事件的任务，包括支持和反对作者政治立场的事件。为了进行这项研究，我们在304篇来自不同政治立场的新闻媒体发布的文章中标注了8511个（Counter-)政治立场事件。我们使用PAC数据集进行测试，以高亮这个任务的挑战。我们的发现表明新闻可以在不显着的方式下形成公众意见，同时也表明需要更好地理解事件在更广泛的上下文中。我们的数据集可以在GitHub上找到：https://github.com/launchnlp/Partisan-Event-Dataset。

TLM: Token-Level Masking for Transformers

paper_url: http://arxiv.org/abs/2310.18738
repo_url: https://github.com/Young1993/tlm
paper_authors: Yangjun Wu, Kebin Fang, Dongxiang Zhang, Han Wang, Hao Zhang, Gang Chen
for: 本研究旨在提高Transformer模型的鲁棒性和一致性，通过对自注意力连接进行质量控制。
methods: 本研究提出了一种基于Token Level Masking（TLM）的新训练策略，包括两种有效和容易实现的遮盾技术。
results: 实验表明，TLM可以在4种不同的自然语言处理任务上提高性能，比如GLUE、ChineseGLUE、中文语法错误修复和数据到文本生成等，并且可以超越DropHead和注意力遮盾。例如，使用BERT-large模型，TLM在GLUE上提高了0.5点相对于DropHead。此外，TLM在Rotowire上达到了18.93 BLEU的新纪录。

Abstract
Structured dropout approaches, such as attention dropout and DropHead, have been investigated to regularize the multi-head attention mechanism in Transformers. In this paper, we propose a new regularization scheme based on token-level rather than structure-level to reduce overfitting. Specifically, we devise a novel Token-Level Masking (TLM) training strategy for Transformers to regularize the connections of self-attention, which consists of two masking techniques that are effective and easy to implement. The underlying idea is to manipulate the connections between tokens in the multi-head attention via masking, where the networks are forced to exploit partial neighbors' information to produce a meaningful representation. The generality and effectiveness of TLM are thoroughly evaluated via extensive experiments on 4 diversified NLP tasks across 18 datasets, including natural language understanding benchmark GLUE, ChineseGLUE, Chinese Grammatical Error Correction, and data-to-text generation. The results indicate that TLM can consistently outperform attention dropout and DropHead, e.g., it increases by 0.5 points relative to DropHead with BERT-large on GLUE. Moreover, TLM can establish a new record on the data-to-text benchmark Rotowire (18.93 BLEU). Our code will be publicly available at https://github.com/Young1993/tlm.

摘要
“structured dropout方法，如注意力Dropout和DropHead，已经用来规化Transformer中的多头注意力机制。在这篇论文中，我们提出了一新的规化方案，基于Token Level而不是结构 Level，以减少过拟合。 Specifically, we develop a novel Token-Level Masking（TLM）训练策略 дляTransformer，以规化自我注意力的连接，这包括两种遮盾技术，它们是有效且易于实现。 The underlying idea is to manipulate the connections between tokens in the multi-head attention via masking, where the networks are forced to exploit partial neighbors' information to produce a meaningful representation。”“我们透过广泛的实验评估TLM的通用性和效果，包括18个不同的自然语言处理任务和4个测试集。结果显示，TLM可以较DropHead和注意力Dropout表现出色，例如，与BERT-large在GLUE上的结果提高0.5分。此外，TLM可以创下Rotowire（18.93 BLEU）中的新纪录。我们将代码公开在https://github.com/Young1993/tlm。”

When Reviewers Lock Horn: Finding Disagreement in Scientific Peer Reviews

paper_url: http://arxiv.org/abs/2310.18685
repo_url: https://github.com/sandeep82945/contradiction-in-peer-review
paper_authors: Sandeep Kumar, Tirthankar Ghosal, Asif Ekbal
for: 本研究旨在Automatically identifying contradictions among reviewers on a given article.
methods: 我们提出了一种基本模型，可以从open review-based ICLR和NeurIPS会议的 around 8.5k paper中检测出对 противоречи的评论。
results: 我们创建了一个包含around 28k review pair中nearly 50k review pair comment的comprehensive review-pair contradiction数据集，并提出了一种基本模型可以自动检测评论中的对 противоречи。

Abstract
To this date, the efficacy of the scientific publishing enterprise fundamentally rests on the strength of the peer review process. The journal editor or the conference chair primarily relies on the expert reviewers' assessment, identify points of agreement and disagreement and try to reach a consensus to make a fair and informed decision on whether to accept or reject a paper. However, with the escalating number of submissions requiring review, especially in top-tier Artificial Intelligence (AI) conferences, the editor/chair, among many other works, invests a significant, sometimes stressful effort to mitigate reviewer disagreements. Here in this work, we introduce a novel task of automatically identifying contradictions among reviewers on a given article. To this end, we introduce ContraSciView, a comprehensive review-pair contradiction dataset on around 8.5k papers (with around 28k review pairs containing nearly 50k review pair comments) from the open review-based ICLR and NeurIPS conferences. We further propose a baseline model that detects contradictory statements from the review pairs. To the best of our knowledge, we make the first attempt to identify disagreements among peer reviewers automatically. We make our dataset and code public for further investigations.

摘要

ASTormer: An AST Structure-aware Transformer Decoder for Text-to-SQL

paper_url: http://arxiv.org/abs/2310.18662
repo_url: None
paper_authors: Ruisheng Cao, Hanchong Zhang, Hongshen Xu, Jieyu Li, Da Ma, Lu Chen, Kai Yu
for: 文章目的是提出一种基于Transformer decoder的文本到SQL转换方法，以生成可执行的SQL程序，并确保输出SQL的有效性。
methods: 该方法使用AST结构具有STRUCTURE-aware Transformer decoder（ASTormer），在decoder中嵌入了结构知识，例如节点类型和位置，并通过绝对和相对位置嵌入来强化结构信息。
results: 对五个文本到SQL benchmark进行了广泛的实验，并证明了ASTormer比基于RNN的竞争对手更有效和高效。

Abstract
Text-to-SQL aims to generate an executable SQL program given the user utterance and the corresponding database schema. To ensure the well-formedness of output SQLs, one prominent approach adopts a grammar-based recurrent decoder to produce the equivalent SQL abstract syntax tree (AST). However, previous methods mainly utilize an RNN-series decoder, which 1) is time-consuming and inefficient and 2) introduces very few structure priors. In this work, we propose an AST structure-aware Transformer decoder (ASTormer) to replace traditional RNN cells. The structural knowledge, such as node types and positions in the tree, is seamlessly incorporated into the decoder via both absolute and relative position embeddings. Besides, the proposed framework is compatible with different traversing orders even considering adaptive node selection. Extensive experiments on five text-to-SQL benchmarks demonstrate the effectiveness and efficiency of our structured decoder compared to competitive baselines.

摘要
文本到SQL的目标是生成基于用户语音和相应的数据库架构的可执行SQL程序。为保证输出SQL的正确性，一种广泛采用的方法是使用 grammar-based 回归decoder生成相应的SQL抽象语法树（AST）。然而，前一代方法主要采用 RNN 序列decoder，这些方法有以下两点缺陷：1）时间consuming 和不效率，2）不提供多少结构偏好。在这项工作中，我们提议一种AST结构意识的Transformer decoder（ASTormer）来取代传统的RNN细胞。在decoder中，结构知识，如树中节点类型和位置，通过绝对和相对位置嵌入被灵活地嵌入。此外，我们提出的框架可以与不同的搜索顺序一起使用，包括自适应节点选择。我们对五个文本到SQLbenchmark进行了广泛的实验，并证明了我们的结构化decoder与其他基准值比较有效和高效。

Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

paper_url: http://arxiv.org/abs/2310.18628
repo_url: None
paper_authors: Hailin Chen, Amrita Saha, Steven Hoi, Shafiq Joty
for: 本研究旨在提高小型开源模型的能力，通过将大型关闭源模型（如ChatGPT、GPT-4）的能力储存到小型开源模型中。
methods: 该研究提出了一种个性化储存方法，其中学生模型首先尝试解决任务，然后教师模型提供适应性更新，以便学生模型可以通过自己的错误来进行改进。这种个性化储存方法与传统的预先Feed的方法不同，因为它只有学生模型在进行学习时才会学习，而不是从教师模型那里获得知识。
results: 研究表明，个性化储存方法在代码生成任务中表现出色，可以在只有一 third的数据量下达到传统方法的水平。在HumanEval中，通过使用2.5-3K个性化示例，可以提高CodeGen-mono-16B的表现，从33.6%提高到36.4%，并提高StarCoder的表现，从39.5%提高到45.8%。

Abstract
With the rise of powerful closed-sourced LLMs (ChatGPT, GPT-4), there are increasing interests in distilling the capabilies of close-sourced LLMs to smaller open-sourced LLMs. Previous distillation methods usually prompt ChatGPT to generate a set of instructions and answers, for the student model to learn. However, such standard distillation approach neglects the merits and conditions of the student model. Inspired by modern teaching principles, we design a personalised distillation process, in which the student attempts to solve a task first, then the teacher provides an adaptive refinement for the student to improve. Instead of feeding the student with teacher's prior, personalised distillation enables personalised learning for the student model, as it only learns on examples it makes mistakes upon and learns to improve its own solution. On code generation, personalised distillation consistently outperforms standard distillation with only one third of the data. With only 2.5-3K personalised examples that incur a data-collection cost of 4-6$, we boost CodeGen-mono-16B by 7% to achieve 36.4% pass@1 and StarCoder by 12.2% to achieve 45.8% pass@1 on HumanEval.

摘要
Inspired by modern teaching principles, we designed a personalized distillation process in which the student attempts to solve a task first, and then the teacher provides an adaptive refinement for the student to improve. Instead of feeding the student with the teacher's prior knowledge, personalized distillation enables personalized learning for the student model, as it only learns from examples it makes mistakes on and learns to improve its own solutions.On code generation, personalized distillation consistently outperforms standard distillation with only one-third of the data. With only 2,500 to 3,000 personalized examples that incur a data-collection cost of $4 to $6, we boosted CodeGen-mono-16B by 7% to achieve 36.4% pass@1 and StarCoder by 12.2% to achieve 45.8% pass@1 on HumanEval.

Anaphor Assisted Document-Level Relation Extraction

paper_url: http://arxiv.org/abs/2310.18604
repo_url: https://github.com/burgerburgerburger/aa
paper_authors: Chonggang Lu, Richong Zhang, Kai Sun, Jaein Kim, Cunwang Zhang, Yongyi Mao
for: DocRE document-level relation extraction
methods: Anaphor-Assisted (AA) framework
results: new state-of-the-art performance

Abstract
Document-level relation extraction (DocRE) involves identifying relations between entities distributed in multiple sentences within a document. Existing methods focus on building a heterogeneous document graph to model the internal structure of an entity and the external interaction between entities. However, there are two drawbacks in existing methods. On one hand, anaphor plays an important role in reasoning to identify relations between entities but is ignored by these methods. On the other hand, these methods achieve cross-sentence entity interactions implicitly by utilizing a document or sentences as intermediate nodes. Such an approach has difficulties in learning fine-grained interactions between entities across different sentences, resulting in sub-optimal performance. To address these issues, we propose an Anaphor-Assisted (AA) framework for DocRE tasks. Experimental results on the widely-used datasets demonstrate that our model achieves a new state-of-the-art performance.

摘要
文档级关系EXTRACTION（DocRE）涉及到在文档中多个句子中Identifying关系 между实体。现有方法是建立不同类型的文档图来模型实体的内部结构和实体之间的外部互动。然而，现有方法存在两点缺陷。一方面，Anaphora在理解关系 identification 中发挥重要作用，但这些方法忽略了它。另一方面，这些方法通过使用文档或句子作为中间节点来实现跨句sentenceEntity interaction，这会增加学习细化实体之间的交互问题，导致性能下降。为了解决这些问题，我们提出了Anaphor-Assisted（AA）框架来Address DocRE任务。实验结果表明，我们的模型在广泛使用的数据集上达到了新的状态的艺术性能。

Accelerating LLM Inference by Enabling Intermediate Layer Decoding

paper_url: http://arxiv.org/abs/2310.18581
repo_url: None
paper_authors: Neeraj Varshney, Agneet Chatterjee, Mihir Parmar, Chitta Baral
for: 提高LLMs的执行效率，使其适用于资源受限的实际应用中。
methods: 通过增加中间层的LITE损失，使中间层学习生成文本的能力，而不会影响最终层的生成质量。并通过“动态信任早退”的技术，在token层面实现更高效的推理，保持生成质量。
results: 在Alpaca数据集上进行了广泛的实验，并对四个不同的人工指令测试集进行了总体评估：Vicuna、WizardLM、Koala和Self-Instruct。结果表明，“动态早退”可以实现37.86%的平均成本改进，保持生成质量。进一步的分析结果表明，输出的语义相似性和生成的数量均有改进。

Abstract
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks; however, their large size makes their inference slow and computationally expensive which poses a practical challenge for resource constrained real-world applications. Focusing on this problem, we propose to instruction tune LLMs in a way that enables intermediate layer decoding for efficiently generating text, but importantly without compromising the quality of the generation. Specifically, we instruction tune LLMs with additional explicit Losses from the InTermediate layErs (LITE) and show that it enables these layers to acquire 'good' generation ability without affecting the generation ability of the final layer. We perform 'dynamic confidence-based early exiting' at token level from the intermediate layers which improves the efficiency of inference while maintaining the generation quality. We conduct comprehensive experiments by instruction tuning LLaMA-2 models on the widely used Alpaca dataset and holistically evaluate on four different human-instruction test sets: Vicuna, WizardLM, Koala, and Self-Instruct. We show that 'dynamic early exiting' achieves consistent and considerable cost improvements (37.86% on average) while maintaining the generation quality of the responses. We further conduct a thorough analysis of the results over several important aspects, such as comparing the semantic similarity of the outputs and dissecting the efficiency improvements by comparing the number of tokens generated in the output. In summary, our work contributes to improving the efficiency of LLM inference while maintaining the generation quality, a crucial step en route to enabling their widespread adoption.

摘要
大型自然语言模型（LLM）在各种自然语言任务上达到了很高的表现水平，但是它们的大小使得其推理慢且计算成本高，这成为了实际应用中的实际挑战。在这个问题上，我们提出了在LLM中 instrucion 优化，以实现中间层解码，以便高效地生成文本，而不会影响最终层的生成质量。我们在LLM中添加了额外的明确损失（LITE），使中间层学习“好”的生成能力，而不会影响最终层的生成能力。我们在中间层进行“动态信息确定早退”，从而提高推理效率，保持生成质量。我们对 LLamA-2 模型进行了广泛的实验，并对四个不同的人工指令测试集进行了总体评估：Vicuna、WizardLM、Koala 和 Self-Instruct。我们发现，“动态早退”可以具有重要的成本改善（37.86% 的平均提高），同时保持生成质量。我们进一步进行了详细的分析结果，包括比较输出的semantic相似性和生成量的比较分析。总之，我们的工作为LLM推理效率的提高，并保持生成质量，是实际应用中的关键一步。

Identifying Conspiracy Theories News based on Event Relation Graph

paper_url: http://arxiv.org/abs/2310.18545
repo_url: https://github.com/yuanyuanlei-nlp/conspiracy_theories_emnlp_2023
paper_authors: Yuanyuan Lei, Ruihong Huang
for: 本研究旨在检测新闻文章中是否存在阴谋理论。
methods: 本文提出了一种基于事件关系图的阴谋理论检测方法，包括开发了一个事件意识语言模型以提高基础语言模型的事件和事件关系知识，以及使用一种多型图注意力网络来 derive 一个图像 embedding。
results: 实验结果表明，基于事件关系图的方法可以提高阴谋理论检测的准确率和受检测率，并且能够在新的媒体源上进行检测。

Abstract
Conspiracy theories, as a type of misinformation, are narratives that explains an event or situation in an irrational or malicious manner. While most previous work examined conspiracy theory in social media short texts, limited attention was put on such misinformation in long news documents. In this paper, we aim to identify whether a news article contains conspiracy theories. We observe that a conspiracy story can be made up by mixing uncorrelated events together, or by presenting an unusual distribution of relations between events. Achieving a contextualized understanding of events in a story is essential for detecting conspiracy theories. Thus, we propose to incorporate an event relation graph for each article, in which events are nodes, and four common types of event relations, coreference, temporal, causal, and subevent relations, are considered as edges. Then, we integrate the event relation graph into conspiracy theory identification in two ways: an event-aware language model is developed to augment the basic language model with the knowledge of events and event relations via soft labels; further, a heterogeneous graph attention network is designed to derive a graph embedding based on hard labels. Experiments on a large benchmark dataset show that our approach based on event relation graph improves both precision and recall of conspiracy theory identification, and generalizes well for new unseen media sources.

摘要
《刺激论题》是一种不合理或恶意的含义，用于解释事件或情况。而大多数前期工作都是通过社交媒体短文来研究刺激论题，却受到了长文报道的限制。在这篇论文中，我们想要判断一篇新闻文章是否包含刺激论题。我们发现，刺激故事可以通过将不相关的事件混合起来，或者通过事件之间的不寻常的关系分布来构成。为了检测刺激论题，我们需要了解事件的上下文知识。因此，我们提议使用事件关系图来识刺刺激论题。每篇文章都有一个事件关系图，其中事件是节点，而四种常见的事件关系，核心引用、时间关系、 causal 关系和 subsequential 关系，被视为边。然后，我们将事件关系图 интеグриinto刺激论题标识中两种方式：首先，我们开发了一个事件意识语言模型，以提高基本语言模型的知识水平，并通过软标签将事件和事件关系传递给模型；其次，我们设计了一个多类Graph注意力网络，以生成基于硬标签的图 embedding。实验表明，我们基于事件关系图的方法可以提高刺激论题标识的精度和准确率，并在新的媒体来源上具有良好的泛化性。

Discourse Structures Guided Fine-grained Propaganda Identification

paper_url: http://arxiv.org/abs/2310.18544
repo_url: https://github.com/yuanyuanlei-nlp/propaganda_emnlp_2023
paper_authors: Yuanyuan Lei, Ruihong Huang
for: 本研究旨在识别政治新闻中的宣传内容，以 sentence-level 和 token-level 两级精细度进行识别。
methods: 本研究提出了两种教师模型，一是基于 PDTB 风格的 discourse relations 来识别宣传内容，二是基于本文中的 sentence 和 token 的 discourse structures 来提高宣传内容识别精度。
results: 实验结果表明，通过利用教师预测概率或知识储存框架来汇集 discourse structures 可以显著提高宣传内容识别的精度。

Abstract
Propaganda is a form of deceptive narratives that instigate or mislead the public, usually with a political purpose. In this paper, we aim to identify propaganda in political news at two fine-grained levels: sentence-level and token-level. We observe that propaganda content is more likely to be embedded in sentences that attribute causality or assert contrast to nearby sentences, as well as seen in opinionated evaluation, speculation and discussions of future expectation. Hence, we propose to incorporate both local and global discourse structures for propaganda discovery and construct two teacher models for identifying PDTB-style discourse relations between nearby sentences and common discourse roles of sentences in a news article respectively. We further devise two methods to incorporate the two types of discourse structures for propaganda identification by either using teacher predicted probabilities as additional features or soliciting guidance in a knowledge distillation framework. Experiments on the benchmark dataset demonstrate that leveraging guidance from discourse structures can significantly improve both precision and recall of propaganda content identification.

摘要
宣传是一种欺骗性的叙述，通常有政治目的。在这篇论文中，我们目标是在新闻文本中发现宣传。我们发现宣传内容更容易在归因或者评价附近的句子中出现，以及在评价、推测和未来预测中出现。因此，我们建议使用本地和全局文本结构来发现宣传。我们设计了两种教师模型，一个用于确定邻近句子之间的 PDTB 风格的语言关系，另一个用于确定新闻文本中句子的常见语言角色。此外，我们还提出了两种方法来结合这两种文本结构来发现宣传内容，一种是使用教师预测概率作为额外特征，另一种是在知识填充框架中寻求指导。实验表明，通过使用文本结构的指导，可以大幅提高宣传内容发现的精度和准确性。

2023-10-28

cs.LG

cs.LG - 2023-10-28

paper_url: http://arxiv.org/abs/2310.18847
repo_url: None
paper_authors: Chen Liu, Kiran Lekkala, Laurent Itti
For: 本研究目的是开发一个可以从便宜的 simulator 转移到实际世界的 robot Navigation 系统。* Methods: 本研究使用了一个组合了传统的 World Model комponents的系统，并将其整合成一个可以在 simulator 上全部训练的 Robust 系统。为了促进转移，我们使用了基于 Bird’s Eye View (BEV) 的中间表示，并将它与 First-Person View (FPV) 的 RGB 图像进行翻译。* Results: 我们使用了 CARLA simulator 收集的数据进行训练，并显示了模型的效能。最后，我们发布了一个完整的代码库、数据和模型，供大众使用。

Abstract
Sim2Real transfer has gained popularity because it helps transfer from inexpensive simulators to real world. This paper presents a novel system that fuses components in a traditional \textit{World Model} into a robust system, trained entirely within a simulator, that \textit{Zero-Shot} transfers to the real world. To facilitate transfer, we use an intermediary representation that are based on \textit{Bird's Eye View (BEV)} images. Thus, our robot learns to navigate in a simulator by first learning to translate from complex \textit{First-Person View (FPV)} based RGB images to BEV representations, then learning to navigate using those representations. Later, when tested in the real world, the robot uses the perception model that translates FPV-based RGB images to embeddings that are used by the downstream policy. The incorporation of state-checking modules using \textit{Anchor images} and \textit{Mixture Density LSTM} not only interpolates uncertain and missing observations but also enhances the robustness of the model when exposed to the real-world environment. We trained the model using data collected using a \textit{Differential drive} robot in the CARLA simulator. Our methodology's effectiveness is shown through the deployment of trained models onto a \textit{Real world Differential drive} robot. Lastly we release a comprehensive codebase, dataset and models for training and deployment that are available to the public.

摘要
实际转移（Sim2Real）已经受到普遍采用，因为它帮助将来自便宜的模拟器转移到真实世界。这篇论文提出了一个新的系统，将模拟器中的元件融合成一个可靠的系统，由真实世界训练，并且透过运算获得零损转移。为了促进转移，我们使用了中心投影（Bird's Eye View，BEV）图像作为中介表示。因此，我们的机器人在模拟器中学习将复杂的首人视角（First-Person View，FPV）基于RGB图像转换为BEV表示，然后学习使用这些表示进行navigation。当它在真实世界中进行测试时，机器人使用视觉模型将FPV基于RGB图像转换为嵌入，这些嵌入被用于下游策略。另外，我们还使用了状态检查模组使用 anchor image和mixture density LSTM interpolate uncertain和缺失观察，这不仅让模型在真实世界环境中更加稳定，而且也增强了模型的可靠性。我们使用了通过CARLA模拟器收集的数据进行训练。我们的方法的有效性被显示在真实世界中部署训练好的模型。最后，我们发布了一个完整的代码库、数据集和模型，供大众使用。

A randomized algorithm for nonconvex minimization with inexact evaluations and complexity guarantees

paper_url: http://arxiv.org/abs/2310.18841
repo_url: None
paper_authors: Shuyao Li, Stephen J. Wright
for: Minimizing a smooth nonconvex function with inexact oracle access to gradient and Hessian.
methods: Using a novel method that chooses the step direction with equal probability of positive or negative sense, and using relative inexactness measures on gradient and Hessian.
results: Achieving ($\epsilon_{g}, \epsilon_{H}$)-approximate second-order optimality with convergence analysis based on martingale analysis and concentration inequalities.Here’s the full summary in Simplified Chinese:
for: 本文目的是使用不精准渐近 oracle 访问梯度和对角线，来实现 ($ \epsilon_{g}, \epsilon_{H} $)-精度二阶优化。
methods: 我们使用一种新的方法，其中在步长选择时，选择的方向的方向是正负两种有 Equal probability。此外，我们使用梯度和对角线的相对不精准度度量，并松弛了梯度和对角线的第一阶和第二阶误差之间的关联。
results: 我们可以通过 martingale 分析和集中不等式来证明我们的算法可以实现 ($\epsilon_{g}, \epsilon_{H}$)-精度二阶优化，并且可以应用到empirical risk minimization问题中。

Abstract
We consider minimization of a smooth nonconvex function with inexact oracle access to gradient and Hessian (but not the function value) to achieve $(\epsilon_{g}, \epsilon_{H})$-approximate second-order optimality. A novel feature of our method is that if an approximate direction of negative curvature is chosen as the step, we choose its sense to be positive or negative with equal probability. We also use relative inexactness measures on gradient and Hessian and relax the coupling between the first- and second-order tolerances $\epsilon_{g}$ and $\epsilon_{H}$. Our convergence analysis includes both an expectation bound based on martingale analysis and a high-probability bound based on concentration inequalities. We apply our algorithm to empirical risk minimization problems and obtain gradient sample complexity.

摘要
我们考虑使用非конvex函数的最小化，但是只有不准确的梯度和偏微分（而不是函数值）的偏 oracle 访问来实现($\epsilon_{g}, \epsilon_{H}$)-次极性。我们的新特点在于，如果选择一个近似的负曲率方向作为步骤，我们会选择其方向为正或负的概率为50%。我们还使用relative不准确度度量在梯度和偏微分上，并松弛了梯度和偏微分的 Coupling。我们的收敛分析包括基于Martingale分析的预期 bound和基于集中不等式的高概率 bound。我们将我们的算法应用到empirical risk minimization问题，并获得梯度样本复杂度。

Intrinsic Gaussian Vector Fields on Manifolds

paper_url: http://arxiv.org/abs/2310.18824
repo_url: None
paper_authors: Daniel Robert-Nicoud, Andreas Krause, Viacheslav Borovitskiy
for: 本文主要针对的是模型非欧几何空间上的向量值信号，尤其是在不确定性评估中。
methods: 本文提出了一种新的泊松过程模型，即 HODE-MATÉRN 泊松场，用于模型非欧几何空间上的向量值信号。
results: 本文的实验结果表明，HODE-MATÉRN 泊松场可以在二维球面和高维托里上提供更精细的 inductive bias，并且可以在不同的批处理上进行扩展。

Abstract
Various applications ranging from robotics to climate science require modeling signals on non-Euclidean domains, such as the sphere. Gaussian process models on manifolds have recently been proposed for such tasks, in particular when uncertainty quantification is needed. In the manifold setting, vector-valued signals can behave very differently from scalar-valued ones, with much of the progress so far focused on modeling the latter. The former, however, are crucial for many applications, such as modeling wind speeds or force fields of unknown dynamical systems. In this paper, we propose novel Gaussian process models for vector-valued signals on manifolds that are intrinsically defined and account for the geometry of the space in consideration. We provide computational primitives needed to deploy the resulting Hodge-Mat\'ern Gaussian vector fields on the two-dimensional sphere and the hypertori. Further, we highlight two generalization directions: discrete two-dimensional meshes and "ideal" manifolds like hyperspheres, Lie groups, and homogeneous spaces. Finally, we show that our Gaussian vector fields constitute considerably more refined inductive biases than the extrinsic fields proposed before.

摘要
各种应用，从机器人学到气候科学，需要在非欧几何空间上模型信号，例如球面。在拓扑上， Gaussian process 模型在拓扑上最近得到了提议，特别是当需要uncertainty量化时。在拓扑设置中，向量值信号可能会与scalar值信号有很大差异，而前者在许多应用中非常重要，例如模型风速或未知动力系统的力场。在这篇论文中，我们提出了新的 Gaussian process 模型，用于vector值信号在拓扑上的模型，这些模型具有内在定义的拓扑geometry。我们还提供了在两个维度的球面和杂质上运行这些Hodge-Matérn Gaussian vector fields的计算基础。此外，我们还提出了两个扩展方向：离散二维网格和"理想"拓扑，如高维球面、 Lie group 和同态空间。最后，我们表明了我们的 Gaussian vector fields 比之前提出的外在场更加细致，即更加精细的 inductive bias。

Successfully Applying Lottery Ticket Hypothesis to Diffusion Model

paper_url: http://arxiv.org/abs/2310.18823
repo_url: https://github.com/osier0524/lottery-ticket-to-ddpm
paper_authors: Chao Jiang, Bo Hui, Bohan Liu, Da Yan
for: 这个论文是为了应用抽签票假设（Lottery Ticket Hypothesis，LTH）到扩散模型而写的。
methods: 这个论文使用了LTH来找到一个扩散模型中的精炼版网络，并通过对这个精炼版网络进行简化来减少计算量。
results: 实验结果表明，这个方法可以找到一个具有更高精度且具有更少计算量的扩散模型。 codes可以在https://github.com/osier0524/Lottery-Ticket-to-DDPM中找到。

Abstract
Despite the success of diffusion models, the training and inference of diffusion models are notoriously expensive due to the long chain of the reverse process. In parallel, the Lottery Ticket Hypothesis (LTH) claims that there exists winning tickets (i.e., aproperly pruned sub-network together with original weight initialization) that can achieve performance competitive to the original dense neural network when trained in isolation. In this work, we for the first time apply LTH to diffusion models. We empirically find subnetworks at sparsity 90%-99% without compromising performance for denoising diffusion probabilistic models on benchmarks (CIFAR-10, CIFAR-100, MNIST). Moreover, existing LTH works identify the subnetworks with a unified sparsity along different layers. We observe that the similarity between two winning tickets of a model varies from block to block. Specifically, the upstream layers from two winning tickets for a model tend to be more similar than the downstream layers. Therefore, we propose to find the winning ticket with varying sparsity along different layers in the model. Experimental results demonstrate that our method can find sparser sub-models that require less memory for storage and reduce the necessary number of FLOPs. Codes are available at https://github.com/osier0524/Lottery-Ticket-to-DDPM.

摘要
尽管扩散模型取得成功，但它们的训练和推理过程却很昂贵，主要因为扩散过程中的链式结构。同时， Lottery Ticket Hypothesis（LTH）假设存在赢家票（即适当剪辑后的子网络以及原始权重初始化），可以在孤立训练中与普通 dense neural network 具有竞争性的性能。在这个工作中，我们首次应用 LTH 到扩散模型。我们实验发现，在 CIFAR-10、CIFAR-100 和 MNIST 等标准图像预测任务上，可以在 diffusion probabilistic models 中找到 90%-99% 的杂度率下的优秀子网络，而不会影响性能。此外，现有的 LTH 工作通常会找到具有不同层次杂度的子网络。我们发现，两个赢家票之间的相似性从层次上不同。具体来说，模型的上游层从两个赢家票之间更加相似，而下游层则更加不同。因此，我们提议在不同层次上找到具有变化杂度的赢家票。实验结果表明，我们的方法可以找到更加简洁的子网络，减少存储的内存需求和计算所需的 FLOPs。代码可以在中找到。

Adaptive Test-Time Personalization for Federated Learning

paper_url: http://arxiv.org/abs/2310.18816
repo_url: https://github.com/baowenxuan/atp
paper_authors: Wenxuan Bao, Tianxin Wei, Haohan Wang, Jingrui He
for: 本研究旨在提出一种在测试时进行个性化 Federated Learning (FL) 的方法，以适应不同来源客户端的分布差异。
methods: 我们提出了一种名为 ATP 的自适应学习算法，可以在不含标注数据的情况下，在测试时地方式地适应模型。
results: 我们的 ATP 算法在面对多种分布差异，包括标签差异、图像损害和频率差异等，能够超越现有的 TTA 方法，并且可以在多个数据集和模型架构上实现优秀的表现。

Abstract
Personalized federated learning algorithms have shown promising results in adapting models to various distribution shifts. However, most of these methods require labeled data on testing clients for personalization, which is usually unavailable in real-world scenarios. In this paper, we introduce a novel setting called test-time personalized federated learning (TTPFL), where clients locally adapt a global model in an unsupervised way without relying on any labeled data during test-time. While traditional test-time adaptation (TTA) can be used in this scenario, most of them inherently assume training data come from a single domain, while they come from multiple clients (source domains) with different distributions. Overlooking these domain interrelationships can result in suboptimal generalization. Moreover, most TTA algorithms are designed for a specific kind of distribution shift and lack the flexibility to handle multiple kinds of distribution shifts in FL. In this paper, we find that this lack of flexibility partially results from their pre-defining which modules to adapt in the model. To tackle this challenge, we propose a novel algorithm called ATP to adaptively learns the adaptation rates for each module in the model from distribution shifts among source domains. Theoretical analysis proves the strong generalization of ATP. Extensive experiments demonstrate its superiority in handling various distribution shifts including label shift, image corruptions, and domain shift, outperforming existing TTA methods across multiple datasets and model architectures. Our code is available at https://github.com/baowenxuan/ATP .

摘要
个人化联合学习算法已经在不同分布偏移中适应模型表现出色。然而，大多数这些方法需要测试客户端上有标注数据进行个人化，而在实际场景中这些数据通常不可获得。在这篇论文中，我们介绍了一种新的设定，即测试时个人化联合学习（TTPFL），Client可以在无标注数据的情况下，在本地适应全球模型，而不需要任何标注数据。尽管传统的测试时适应（TTA）可以在这种场景中使用，但大多数它们假设训练数据来自单一领域，而实际上来自多个客户端（源领域）的分布不同。忽略这些领域关系可能会导致低效泛化。此外，大多数TTA算法是为某种特定的分布偏移设计的，缺乏在多种分布偏移中的灵活性。为解决这个挑战，我们提议了一种新的算法，即ATP，可以自动学习模型中每个模块的适应率从分布偏移中。理论分析表明ATP具有强大的泛化性。广泛的实验表明ATP在处理多种分布偏移，包括标签偏移、图像损害和频率偏移，在多个数据集和模型架构上超越了现有的TTA方法，并且其代码可以在https://github.com/baowenxuan/ATP上获取。

Stability of Random Forests and Coverage of Random-Forest Prediction Intervals

paper_url: http://arxiv.org/abs/2310.18814
repo_url: None
paper_authors: Yan Wang, Huaiqing Wu, Dan Nettleton
for: 这个论文主要是为了研究随机森林的稳定性，并且提供了一个稳定性的定义，以及一种基于这个定义的预测 интерVALL的建立方法。
methods: 这个论文使用了随机森林的实际实现，以及一些数理Statistics的工具来研究随机森林的稳定性。
results: 这个论文的结果表明，随机森林在一定条件下具有稳定性，并且可以提供正确的预测点和预测 интерVALL，而且这些预测 interval 的覆盖率可以保证在一定范围内。

Abstract
We establish stability of random forests under the mild condition that the squared response ($Y^2$) does not have a heavy tail. In particular, our analysis holds for the practical version of random forests that is implemented in popular packages like \texttt{randomForest} in \texttt{R}. Empirical results show that stability may persist even beyond our assumption and hold for heavy-tailed $Y^2$. Using the stability property, we prove a non-asymptotic lower bound for the coverage probability of prediction intervals constructed from the out-of-bag error of random forests. With another mild condition that is typically satisfied when $Y$ is continuous, we also establish a complementary upper bound, which can be similarly established for the jackknife prediction interval constructed from an arbitrary stable algorithm. We also discuss the asymptotic coverage probability under assumptions weaker than those considered in previous literature. Our work implies that random forests, with its stability property, is an effective machine learning method that can provide not only satisfactory point prediction but also justified interval prediction at almost no extra computational cost.

摘要
我们证明随机森林的稳定性，具体来说是当响应值($Y^2$) 不具有极大的尾部时。我们的分析适用于实际的随机森林实现，如\texttt{randomForest} 在 \texttt{R} 中的实现。实际结果表明，稳定性可能会 persist 超过我们的假设，并且适用于重 tailed $Y^2$。使用稳定性质量，我们证明了预测间隔 constructed from 随机森林的 out-of-bag 错误的下界。另外，对于 continuous $Y$ 的情况，我们还设立了一个轻量级的条件，并证明了这个下界。我们还讨论了先前文献中考虑的假设下的极限覆盖率。我们的工作 imply 随机森林，具有稳定性质量，是一种有效的机器学习方法，可以不仅提供满意的点预测，还可以提供正确的间预测，而且只需要一些较少的计算成本。

The Synergy of Speculative Decoding and Batching in Serving Large Language Models

paper_url: http://arxiv.org/abs/2310.18813
repo_url: None
paper_authors: Qidong Su, Christina Giannoula, Gennady Pekhimenko
for: 这篇论文的目的是研究大语言模型（LLM）的批处理和预测解oding技术，以提高LLM的硬件利用率。
methods: 这篇论文使用了批处理和预测解oding两种技术来提高LLM的硬件利用率。
results: 论文的实验结果表明，适当的预测 lengths 与批处理大小有关，而且提出了一种适应性的预测解oding策略，可以与现有的最佳化策略相比。

Abstract
Large Language Models (LLMs) like GPT are state-of-the-art text generation models that provide significant assistance in daily routines. However, LLM execution is inherently sequential, since they only produce one token at a time, thus incurring low hardware utilization on modern GPUs. Batching and speculative decoding are two techniques to improve GPU hardware utilization in LLM inference. To study their synergy, we implement a prototype implementation and perform an extensive characterization analysis on various LLM models and GPU architectures. We observe that the optimal speculation length depends on the batch size used. We analyze the key observation and build a quantitative model to explain it. Based on our analysis, we propose a new adaptive speculative decoding strategy that chooses the optimal speculation length for different batch sizes. Our evaluations show that our proposed method can achieve equal or better performance than the state-of-the-art speculation decoding schemes with fixed speculation length.

摘要
大型语言模型（LLM）如GPT是现在的文本生成模型，它们在日常 Routine 中提供了重要的帮助。然而，LLM 执行是Sequential 的，它们只生成一个 Token 在一次，从而导致现代 GPU 的硬件利用率低。批处和推测解码是两种技术来提高 LLM 执行的 GPU 硬件利用率。为了研究这两种技术的相互作用，我们实现了一个原型实现，并对不同的 LLM 模型和 GPU 架构进行了广泛的分析。我们发现，使用不同的批处大小时，最佳的推测长度会有所不同。我们分析了这一关键观察结果，并建立了一个量化的模型来解释它。根据我们的分析，我们提出了一种新的自适应推测解码策略，可以根据不同的批处大小选择最佳的推测长度。我们的评估表明，我们的提议方法可以与现有的最佳推测解码方法相当或更好的性能。

Inverse distance weighting attention

paper_url: http://arxiv.org/abs/2310.18805
repo_url: https://github.com/calvinmccarter/idw-attention
paper_authors: Calvin McCarter
for: 这篇论文研究了取代透彩积 dot-product 注意力的negative-log of Euclidean distance 的效果。
methods: 这种注意力方式简化为 inverse distance weighting interpolation，并在简单的一层隐藏层网络和vanilla cross-entropy loss中进行训练，用于文本分类问题。
results: 研究发现，使用这种注意力方式可以生成一个包含原型的键矩阵和相应的 logits 的解释网络，并可以通过手动构建的特殊情况 прототипы进行低影响的特殊情况处理。

Abstract
We report the effects of replacing the scaled dot-product (within softmax) attention with the negative-log of Euclidean distance. This form of attention simplifies to inverse distance weighting interpolation. Used in simple one hidden layer networks and trained with vanilla cross-entropy loss on classification problems, it tends to produce a key matrix containing prototypes and a value matrix with corresponding logits. We also show that the resulting interpretable networks can be augmented with manually-constructed prototypes to perform low-impact handling of special cases.

摘要
我们报告了在扩展点积（在满意函数中）中使用负欧几丁度的效果。这种注意力的形式简化为对距离权重 interpolating。在简单的一个隐藏层网络中使用，并使用普通的极值损失函数进行训练，它通常会生成一个包含原型的键矩阵和相应的 logits 矩阵。我们还示出了使用手动构造的特殊情况扩展的可行性，以便实现低影响的特殊情况处理。Note: "扩展点积" in Chinese is "扩展点积" (kuòzhè dòngshí), and "负欧几丁度" is "负欧几丁度" (fùōujìtiànduō).

Weakly Coupled Deep Q-Networks

paper_url: http://arxiv.org/abs/2310.18803
repo_url: None
paper_authors: Ibrahim El Shar, Daniel R. Jiang
for: 增强深度强化学习算法的性能在受约非常小的 Markov决策过程（WCMDP）中。
methods: 使用单一网络训练多个独立的 DQN “子代理”，每个子代理专门处理一个子问题，然后将其解决结果组合成最佳动作值的上界，以引导主 DQN 代理向优化尝试。
results: 在有多达 10 个子问题、3^10 个总动作和连续状态空间的设置下，与 DQN 和相关技术相比，WCDQN 在数值实验中显示更快的收敛速度。

Abstract
We propose weakly coupled deep Q-networks (WCDQN), a novel deep reinforcement learning algorithm that enhances performance in a class of structured problems called weakly coupled Markov decision processes (WCMDP). WCMDPs consist of multiple independent subproblems connected by an action space constraint, which is a structural property that frequently emerges in practice. Despite this appealing structure, WCMDPs quickly become intractable as the number of subproblems grows. WCDQN employs a single network to train multiple DQN "subagents", one for each subproblem, and then combine their solutions to establish an upper bound on the optimal action value. This guides the main DQN agent towards optimality. We show that the tabular version, weakly coupled Q-learning (WCQL), converges almost surely to the optimal action value. Numerical experiments show faster convergence compared to DQN and related techniques in settings with as many as 10 subproblems, $3^{10}$ total actions, and a continuous state space.

摘要
我们提出了弱连结深度Q网络（WCDQN），一种新的深度训练学习算法，它在受限构造问题（WCMDP）中提高表现。WCMDP包含多个独立的子问题，连接在动作空间约束上，这是实际中常见的结构性特征。然而，随着子问题的数量增加，WCMDP很快就会变得无法应对。WCDQN使用单一网络来训练多个DQN“子代”，每个子代针对每个子问题进行训练，然后结合其解决方案以建立最佳动作值的Upper bound。这导引主DQN代向最佳解决方案。我们证明了这个 Tabular 版本，弱连结Q学习（WCQL），会逐渐趋向最佳动作值，并且在包含多达10个子问题、3^{10}个总动作和连续状态空间的numerical实验中比DQN和相关技术更快地趋向最佳解决方案。

A Competitive Algorithm for Agnostic Active Learning

paper_url: http://arxiv.org/abs/2310.18786
repo_url: None
paper_authors: Eric Price, Yihan Zhou
for: 这种纸是用于研究agnostic active learning的最佳算法，具体来说是用于任何二分类假设集$H$和分布$D_X$ over $X$中的输入。
methods: 我们采用了一种不同于现有的方法的approach，即使用splitting-based方法，以实现在$O(m^* \log |H|)$ queries中达到$O(\eta)$ error的目标。
results: 我们的算法可以与最佳算法匹配，即使在某些输入上有NP困难，我们的算法可以在$O(\log |H|)$ overhead下达到$O(\eta)$ error。

Abstract
For some hypothesis classes and input distributions, active agnostic learning needs exponentially fewer samples than passive learning; for other classes and distributions, it offers little to no improvement. The most popular algorithms for agnostic active learning express their performance in terms of a parameter called the disagreement coefficient, but it is known that these algorithms are inefficient on some inputs. We take a different approach to agnostic active learning, getting an algorithm that is competitive with the optimal algorithm for any binary hypothesis class $H$ and distribution $D_X$ over $X$. In particular, if any algorithm can use $m^*$ queries to get $O(\eta)$ error, then our algorithm uses $O(m^* \log |H|)$ queries to get $O(\eta)$ error. Our algorithm lies in the vein of the splitting-based approach of Dasgupta [2004], which gets a similar result for the realizable ($\eta = 0$) setting. We also show that it is NP-hard to do better than our algorithm's $O(\log |H|)$ overhead in general.

摘要
For some hypothesis classes and input distributions, active agnostic learning needs exponentially fewer samples than passive learning; for other classes and distributions, it offers little to no improvement. The most popular algorithms for agnostic active learning express their performance in terms of a parameter called the disagreement coefficient, but it is known that these algorithms are inefficient on some inputs. We take a different approach to agnostic active learning, getting an algorithm that is competitive with the optimal algorithm for any binary hypothesis class $H$ and distribution $D_X$ over $X$. In particular, if any algorithm can use $m^*$ queries to get $O(\eta)$ error, then our algorithm uses $O(m^* \log |H|)$ queries to get $O(\eta)$ error. Our algorithm is in the same vein as the splitting-based approach of Dasgupta [2004], which gets a similar result for the realizable ($\eta = 0$) setting. We also prove that it is NP-hard to do better than our algorithm's $O(\log |H|)$ overhead in general.Note: The text has been translated using Simplified Chinese characters.

High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise

paper_url: http://arxiv.org/abs/2310.18784
repo_url: None
paper_authors: Aleksandar Armacki, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, Soummya Kar
for: 本文研究了一种广泛的非线性SGD方法的收敛性。
methods: 本文使用了高probability下的收敛性 bounds，并且可以涵盖大多数现有的非线性SGD方法，如clipping、normalization和quantization。
results: 对具有 lipschitz continuous 的梯度的强转换函数，本文证明了logarithmic的依赖于失败概率，而且可以在heavy-tailed noise下工作。此外，本文的结果比现有的结果更加广泛，可以涵盖更多的非线性SGD方法和不同的噪声分布。

Abstract
Several recent works have studied the convergence \textit{in high probability} of stochastic gradient descent (SGD) and its clipped variant. Compared to vanilla SGD, clipped SGD is practically more stable and has the additional theoretical benefit of logarithmic dependence on the failure probability. However, the convergence of other practical nonlinear variants of SGD, e.g., sign SGD, quantized SGD and normalized SGD, that achieve improved communication efficiency or accelerated convergence is much less understood. In this work, we study the convergence bounds \textit{in high probability} of a broad class of nonlinear SGD methods. For strongly convex loss functions with Lipschitz continuous gradients, we prove a logarithmic dependence on the failure probability, even when the noise is heavy-tailed. Strictly more general than the results for clipped SGD, our results hold for any nonlinearity with bounded (component-wise or joint) outputs, such as clipping, normalization, and quantization. Further, existing results with heavy-tailed noise assume bounded $\eta$-th central moments, with $\eta \in (1,2]$. In contrast, our refined analysis works even for $\eta=1$, strictly relaxing the noise moment assumptions in the literature.

摘要

A Data-driven Recommendation Framework for Optimal Walker Designs

paper_url: http://arxiv.org/abs/2310.18772
repo_url: None
paper_authors: Advaith Narayanan
for: 这篇论文旨在优化医疗步行器，以提高临床恢复和生理治疗下肢体的功能。
methods: 该论文使用自动化机器学习模型和栅Stacked-Ensemble方法，以优化医疗步行器的设计。同时，该论文还提供了大量的 Parametric walker 设计数据，以便训练预测模型。
results: 该论文的结果表明，通过使用自动化机器学习模型和多目标优化算法，可以实现高性能的医疗步行器设计。论文还提供了一些可能的医疗步行器设计，其中一些设计可以减轻重量达30%，同时提高结构稳定性和完整性。

Abstract
The rapidly advancing fields of statistical modeling and machine learning have significantly enhanced data-driven design and optimization. This paper focuses on leveraging these design algorithms to optimize a medical walker, an integral part of gait rehabilitation and physiological therapy of the lower extremities. To achieve the desirable qualities of a walker, we train a predictive machine-learning model to identify trade-offs between performance objectives, thus enabling the use of efficient optimization algorithms. To do this, we use an Automated Machine Learning model utilizing a stacked-ensemble approach shown to outperform traditional ML models. However, training a predictive model requires vast amounts of data for accuracy. Due to limited publicly available walker designs, this paper presents a dataset of more than 5,000 parametric walker designs with performance values to assess mass, structural integrity, and stability. These performance values include displacement vectors for the given load case, stress coefficients, mass, and other physical properties. We also introduce a novel method of systematically calculating the stability index of a walker. We use MultiObjective Counterfactuals for Design (MCD), a novel genetic-based optimization algorithm, to explore the diverse 16-dimensional design space and search for high-performing designs based on numerous objectives. This paper presents potential walker designs that demonstrate up to a 30% mass reduction while increasing structural stability and integrity. This work takes a step toward the improved development of assistive mobility devices.

摘要
“随着统计模型和机器学习的快速进步，数据驱动设计和优化技术已经得到了很大的提高。本文将focus on 使用这些设计算法来优化医疗杆子，它是距离股体重abilitation和物理治疗的重要部分。为了实现杆子的欲望性能，我们将使用预测机器学习模型，以识别表现目标之间的贸易，并启用高效的优化算法。我们使用了自动化机器学习模型，使用堆叠合 ensemble 方法，已经被证明可以超越传统机器学习模型。然而，训练预测模型需要巨量的数据，以确保准确性。由于有限的公开可用的杆子设计，本文提供了超过5,000个 Parametric 杆子设计，并且包含表现值，以评估杆子的质量、结构完整性和稳定性。我们还引入了一新的稳定指数计算方法。我们使用多目标Counterfactuals for Design (MCD) ，一种新的基因型数据分析方法，来探索16个维度的设计空间，寻找高性能的设计。本文显示了可能的杆子设计，证明了可以降低30%的质量，同时增加结构的稳定性和完整性。这个工作为伤健移动设备的改进做出了一步。”

Rethinking Semi-Supervised Imbalanced Node Classification from Bias-Variance Decomposition

paper_url: http://arxiv.org/abs/2310.18765
repo_url: https://github.com/yanliang3612/revar
paper_authors: Divin Yan, Gengchen Wei, Chen Yang, Shengzhong Zhang, Zengfeng Huang
for: Addressing the issue of class imbalance in graph neural networks (GNNs) for learning on graph-structured data.
methods: Integrates imbalanced node classification and Bias-Variance Decomposition, leverages graph augmentation technique to estimate the variance, and designs a regularization term to alleviate the impact of imbalance.
results: Outperforms state-of-the-art methods in various imbalanced scenarios, providing a novel theoretical perspective for addressing the problem of imbalanced node classification in GNNs.

Abstract
This paper introduces a new approach to address the issue of class imbalance in graph neural networks (GNNs) for learning on graph-structured data. Our approach integrates imbalanced node classification and Bias-Variance Decomposition, establishing a theoretical framework that closely relates data imbalance to model variance. We also leverage graph augmentation technique to estimate the variance, and design a regularization term to alleviate the impact of imbalance. Exhaustive tests are conducted on multiple benchmarks, including naturally imbalanced datasets and public-split class-imbalanced datasets, demonstrating that our approach outperforms state-of-the-art methods in various imbalanced scenarios. This work provides a novel theoretical perspective for addressing the problem of imbalanced node classification in GNNs.

摘要
Translation notes:* "GNNs" is translated as "图 нейрон网络" (graph neural networks)* "class imbalance" is translated as "类别不均衡" (class imbalance)* "Bias-Variance Decomposition" is translated as "偏差-差异分解" (Bias-Variance Decomposition)* "graph augmentation" is translated as "图补充" (graph augmentation)* "regularization term" is translated as "正则化项" (regularization term)Note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and widely used in other countries as well. If you prefer Traditional Chinese, please let me know and I can provide the translation in that form as well.

Purify++: Improving Diffusion-Purification with Advanced Diffusion Models and Control of Randomness

paper_url: http://arxiv.org/abs/2310.18762
repo_url: None
paper_authors: Boya Zhang, Weijian Luo, Zhihua Zhang
for: 防止神经网络分类器受到攻击的安全性研究
methods: diffusion purification 方法
results: Purify++ 算法提高了对多种攻击方法的防御能力

Abstract
Adversarial attacks can mislead neural network classifiers. The defense against adversarial attacks is important for AI safety. Adversarial purification is a family of approaches that defend adversarial attacks with suitable pre-processing. Diffusion models have been shown to be effective for adversarial purification. Despite their success, many aspects of diffusion purification still remain unexplored. In this paper, we investigate and improve upon three limiting designs of diffusion purification: the use of an improved diffusion model, advanced numerical simulation techniques, and optimal control of randomness. Based on our findings, we propose Purify++, a new diffusion purification algorithm that is now the state-of-the-art purification method against several adversarial attacks. Our work presents a systematic exploration of the limits of diffusion purification methods.

摘要
Adversarial attacks can mislead neural network classifiers. The defense against adversarial attacks is important for AI safety. Adversarial purification is a family of approaches that defend against adversarial attacks with suitable pre-processing. Diffusion models have been shown to be effective for adversarial purification. Despite their success, many aspects of diffusion purification still remain unexplored. In this paper, we investigate and improve upon three limiting designs of diffusion purification: the use of an improved diffusion model, advanced numerical simulation techniques, and optimal control of randomness. Based on our findings, we propose Purify++, a new diffusion purification algorithm that is now the state-of-the-art purification method against several adversarial attacks. Our work presents a systematic exploration of the limits of diffusion purification methods.Here's the translation in Simplified Chinese characters: adversarial attacks 可以诱导 нейрон网络分类器错误。防止 adversarial attacks 是 AI 安全的重要任务。adversarial purification 是一家 approachedefend against adversarial attacks with suitable pre-processing。diffusion models 已经被证明是有效的 adversarial purification 方法。despite their success, many aspects of diffusion purification still remain unexplored。在这篇 paper中，我们investigate 和改进 diffusion purification 的三个限制设计：使用改进的 diffusion model，进阶的数值 simulations 技术，和优化的随机性控制。基于我们的发现，我们提出 Purify++, 一个新的 diffusion purification 算法，现在是 severaldiffusion purification 方法的州际标准。our work 展示了 diffusion purification 方法的系统性探索。

Optimization of utility-based shortfall risk: A non-asymptotic viewpoint

paper_url: http://arxiv.org/abs/2310.18743
repo_url: None
paper_authors: Sumedh Gupte, Prashanth L. A., Sanjay P. Bhat
for: 本文研究了金融领域中流量风险的评估和优化问题，具体来说是Utility-based shortfall risk（UBSR）的估计和优化问题。
methods: 本文使用了类型样本平均approximation（SAA）来估计UBSR，并 derive了非尺度性质 bound 的均方差误差。在UBSR优化问题中，本文 derive了UBSR导数的表达式，该表达式是一个期望比率，两个期望都 involve UBSR。使用SAA来 aproximate numerator和denominator中的UBSR，得到一个偏导数估计器。
results: 本文 derive non-尺度性质 bound 表示该偏导数估计器是 asymptotically unbiased。此外，本文还 derive non-尺度性质 bound 表示SG算法的速度减少率。

Abstract
We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR), which is a popular risk measure in finance. In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of the classical sample average approximation (SAA) of UBSR. Next, in the context of UBSR optimization, we derive an expression for the UBSR gradient under a smooth parameterization. This expression is a ratio of expectations, both of which involve the UBSR. We use SAA for the numerator as well as denominator in the UBSR gradient expression to arrive at a biased gradient estimator. We derive non-asymptotic bounds on the estimation error, which show that our gradient estimator is asymptotically unbiased. We incorporate the aforementioned gradient estimator into a stochastic gradient (SG) algorithm for UBSR optimization. Finally, we derive non-asymptotic bounds that quantify the rate of convergence of our SG algorithm for UBSR optimization.

摘要
我们考虑了金融中流行的价值基础隐没隐危 (UBSR) 的估计和优化问题。在 UBSR 估计上，我们 derivated 一个非对数减少的 bound 为 classical sample average approximation (SAA) 的均方误差。在 UBSR 优化上，我们 derivated 一个表达式，用于 UBSR 的梯度，这个表达式是两个期望的比率，其中一个是 UBSR 的期望值。我们使用 SAA 来计算 numerator 和 denominator 两个部分，从而得到一个偏导数 estimator。我们 derivated 非对数减少的 bounds ，证明了我们的梯度 estimator 是 asymptotically unbiased。最后，我们 incorporated 这个梯度 estimator 到一个随机梯度 (SG) 算法中，并 derivated 非对数减少的 bounds 来评估这个算法的速度传递率。

Curriculum Learning for Graph Neural Networks: Which Edges Should We Learn First

paper_url: http://arxiv.org/abs/2310.18735
repo_url: https://github.com/rollingstonezz/curriculum_learning_for_gnns
paper_authors: Zheng Zhang, Junxiang Wang, Liang Zhao
For: 本文提出了一种新的课程学习策略，用于逐渐将图数据中的边 integrate 到训练中，以提高图 neural network 的泛化能力和Robustness。* Methods: 本文提出了一种基于课程学习的策略，使用了度量学习策略来衡量边的难度，并逐渐将边添加到训练中，以便学习更好的表示。* Results: 经过EXTENSIVE experiments on nine synthetic datasets and nine real-world datasets, 本文 Demonstrated the strength of the proposed method in improving the generalization ability and robustness of learned representations.

Abstract
Graph Neural Networks (GNNs) have achieved great success in representing data with dependencies by recursively propagating and aggregating messages along the edges. However, edges in real-world graphs often have varying degrees of difficulty, and some edges may even be noisy to the downstream tasks. Therefore, existing GNNs may lead to suboptimal learned representations because they usually treat every edge in the graph equally. On the other hand, Curriculum Learning (CL), which mimics the human learning principle of learning data samples in a meaningful order, has been shown to be effective in improving the generalization ability and robustness of representation learners by gradually proceeding from easy to more difficult samples during training. Unfortunately, existing CL strategies are designed for independent data samples and cannot trivially generalize to handle data dependencies. To address these issues, we propose a novel CL strategy to gradually incorporate more edges into training according to their difficulty from easy to hard, where the degree of difficulty is measured by how well the edges are expected given the model training status. We demonstrate the strength of our proposed method in improving the generalization ability and robustness of learned representations through extensive experiments on nine synthetic datasets and nine real-world datasets. The code for our proposed method is available at https://github.com/rollingstonezz/Curriculum_learning_for_GNNs.

摘要
graph neural networks (GNNs) 图神网络已经取得了很大的成功，通过 recursively propagating 和 aggregating 消息来表示具有依赖关系的数据。然而，实际世界中的图 often have varying degrees of difficulty, and some edges may even be noisy to the downstream tasks.因此，现有的 GNN 可能会导致学习的表示不佳，因为它们通常对每个图边进行平等的处理。在另一个面向，curriculum learning (CL)，模仿人类学习的原理，可以在训练过程中逐渐从易到更加复杂的样本中学习，从而提高学习的普适性和鲁棒性。然而，现有的 CL 策略是为独立的数据样本设计的，无法直接扩展到处理数据依赖关系。为解决这些问题，我们提出了一种新的 CL 策略，通过度量图边的难度从易到更加困难地慢慢地包含更多的图边到训练中，其中图边的难度通过模型训练状态来评估。我们通过对九个Synthetic数据集和九个实际世界数据集进行了广泛的实验，证明了我们的提议的方法能够提高学习的普适性和鲁棒性。code для我们的提议方法可以在 https://github.com/rollingstonezz/Curriculum_learning_for_GNNs 找到。

Latent class analysis by regularized spectral clustering

paper_url: http://arxiv.org/abs/2310.18727
repo_url: None
paper_authors: Huan Qing
for: 这篇论文的目的是提出两种新的算法来估计 categorical 数据中的潜在类型模型。
methods: 这两种算法都基于一个新定义的规范化拉普拉斯矩阵，计算从响应矩阵中获得的。作者提供了这些算法的理论收敛速率，并证明了它们在某些轻度的条件下稳定地生成了一致的潜在类型分析。
results: 作者通过了广泛的 simulations 实验来证明算法的效率和准确性，并在实际的 categorical 数据上应用了这些算法，获得了有前途的结果。

Abstract
The latent class model is a powerful tool for identifying latent classes within populations that share common characteristics for categorical data in social, psychological, and behavioral sciences. In this article, we propose two new algorithms to estimate a latent class model for categorical data. Our algorithms are developed by using a newly defined regularized Laplacian matrix calculated from the response matrix. We provide theoretical convergence rates of our algorithms by considering a sparsity parameter and show that our algorithms stably yield consistent latent class analysis under mild conditions. Additionally, we propose a metric to capture the strength of latent class analysis and several procedures designed based on this metric to infer how many latent classes one should use for real-world categorical data. The efficiency and accuracy of our algorithms are verified by extensive simulated experiments, and we further apply our algorithms to real-world categorical data with promising results.

摘要
“拉丁类模型是一种强大的工具，用于在人口中找到共同特征的分类数据的社会、心理和行为科学中。在这篇文章中，我们提出了两种新的算法，用于估计拉丁类模型。我们的算法基于响应矩阵中定义的新的规范化拉普拉斯矩阵。我们提供了对我们的算法的理论收敛率，并证明我们的算法在轻度条件下稳定地生成了一致的拉丁类分析。此外，我们还提出了一个用于捕捉拉丁类分析的强度的度量，以及基于这个度量的几种过程，用于在实际中的分类数据中决定拉丁类的数量。我们的算法的效率和准确性通过了广泛的模拟实验，并在实际中应用到了分类数据的成功。”Note: Please note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China. If you need Traditional Chinese, please let me know.

On the Accuracy of Hotelling-Type Asymmetric Tensor Deflation: A Random Tensor Analysis

paper_url: http://arxiv.org/abs/2310.18717
repo_url: None
paper_authors: Mohamed El Amine Seddik, Maxime Guillaud, Alexis Decurninge, José Henrique de Morais Goulart
For: This paper studies the asymptotic behavior of Hotelling-type tensor deflation in the presence of noise, specifically in the regime of large tensor dimensions.* Methods: The paper uses recent advances in random tensor theory to analytically characterize the estimated singular values and the alignment of estimated and true singular vectors at each step of the deflation procedure.* Results: The paper shows that the estimated singular values and the alignments between the estimated and true rank-1 signal components can be used to construct estimators of the signal-to-noise ratios and the alignments between the estimated and true rank-1 signal components.

Abstract
This work introduces an asymptotic study of Hotelling-type tensor deflation in the presence of noise, in the regime of large tensor dimensions. Specifically, we consider a low-rank asymmetric tensor model of the form $\sum_{i=1}^r \beta_i{\mathcal{A}_i + {\mathcal{W}$ where $\beta_i\geq 0$ and the ${\mathcal{A}_i$'s are unit-norm rank-one tensors such that $\left| \langle {\mathcal{A}_i, {\mathcal{A}_j \rangle \right| \in [0, 1]$ for $i\neq j$ and ${\mathcal{W}$ is an additive noise term. Assuming that the dominant components are successively estimated from the noisy observation and subsequently subtracted, we leverage recent advances in random tensor theory in the regime of asymptotically large tensor dimensions to analytically characterize the estimated singular values and the alignment of estimated and true singular vectors at each step of the deflation procedure. Furthermore, this result can be used to construct estimators of the signal-to-noise ratios $\beta_i$ and the alignments between the estimated and true rank-1 signal components.

摘要
The proposed method involves successively estimating the dominant components of the tensor and subtracting them from the noisy observation. By leveraging recent advances in random tensor theory, the paper analytically characterizes the estimated singular values and the alignment of estimated and true singular vectors at each step of the deflation procedure.Furthermore, the results can be used to construct estimators of the signal-to-noise ratios $\beta_i$ and the alignments between the estimated and true rank-1 signal components. This study provides a comprehensive understanding of the behavior of Hotelling-type tensor deflation in the presence of noise and its applications in signal processing and machine learning.In Simplified Chinese, the text can be translated as:这个研究介绍了一种幂等式抑制方法在噪声存在的情况下进行研究，具体来说是在大tensor维度下进行研究。这个模型是一个低维偏 asymmetric tensor的形式，具体来说是 $\sum_{i=1}^r \beta_i{\mathcal{A}_i + {\mathcal{W}$，其中 $\beta_i\geq 0$ 和 $\left| \langle {\mathcal{A}_i, {\mathcal{A}_j \rangle \right| \in [0, 1]$ для $i\neq j$。噪声是一个加itive的。该方法假设在噪声观测下，逐步估计主要组分，并将其从噪声观测中 subtract。通过利用最近的幂等式理论，这篇论文分析了在噪声存在下的逐步估计方法的特性。此外，这些结果还可以用于构建噪声比例和真实维度方向的估计器。这种方法的应用包括信号处理和机器学习等领域。通过这篇论文，我们可以更好地理解幂等式抑制方法在噪声存在下的行为，以及其在实际应用中的可行性和有效性。

Laplacian Canonization: A Minimalist Approach to Sign and Basis Invariant Spectral Embedding

paper_url: http://arxiv.org/abs/2310.18716
repo_url: https://github.com/pku-ml/laplaciancanonization
paper_authors: Jiangyan Ma, Yifei Wang, Yisen Wang
for: 提高 Graph Transformers 的效果，解决spectral embedding在理论上的缺陷
methods: 直接找到 Laplacian Canonization（LC），一种轻量级的预处理方法，可以应用于任何现有的 GNN
results: MAP 算法可以成功 canonize 超过 90% 的 eigenvectors，并在实验中表现出色，与现有方法相比带来较少的计算开销。

Abstract
Spectral embedding is a powerful graph embedding technique that has received a lot of attention recently due to its effectiveness on Graph Transformers. However, from a theoretical perspective, the universal expressive power of spectral embedding comes at the price of losing two important invariance properties of graphs, sign and basis invariance, which also limits its effectiveness on graph data. To remedy this issue, many previous methods developed costly approaches to learn new invariants and suffer from high computation complexity. In this work, we explore a minimal approach that resolves the ambiguity issues by directly finding canonical directions for the eigenvectors, named Laplacian Canonization (LC). As a pure pre-processing method, LC is light-weighted and can be applied to any existing GNNs. We provide a thorough investigation, from theory to algorithm, on this approach, and discover an efficient algorithm named Maximal Axis Projection (MAP) that works for both sign and basis invariance and successfully canonizes more than 90% of all eigenvectors. Experiments on real-world benchmark datasets like ZINC, MOLTOX21, and MOLPCBA show that MAP consistently outperforms existing methods while bringing minimal computation overhead. Code is available at https://github.com/PKU-ML/LaplacianCanonization.

摘要
干扰 embedding 是一种强大的图 embedding 技术，在图transformer 中得到了很多关注，但从理论角度来看，它的通用表达力来到了两个重要的对称性问题的价格，即标志对称性和基准对称性，这也限制了它在图数据上的效果。为了解决这个问题，许多前一代的方法开发了昂贵的方法来学习新的对称性，并且受到高计算复杂度的困扰。在这种情况下，我们 explore 一种最小的方法，即laplacian canonization (LC)，它可以直接找到图laplacian 的可 canonical 方向。作为一种纯粹的预处理方法，LC 轻量级，可以应用于任何现有的 GNNs。我们提供了一份 thorought 的调查，从理论到算法，对这种方法，并发现了一种高效的算法 named Maximal Axis Projection (MAP)，它可以实现标志对称性和基准对称性，并成功 canonize 超过 90% 的所有 eigenvectors。实验结果表明，MAP 在实际 benchmark 数据上（如 ZINC、MOLTOX21 和 MOLPCBA） consistently 超过现有方法，同时带来最小的计算开销。代码可以在 https://github.com/PKU-ML/LaplacianCanonization 上找到。

Episodic Multi-Task Learning with Heterogeneous Neural Processes

paper_url: http://arxiv.org/abs/2310.18713
repo_url: https://github.com/autumn9999/hnps
paper_authors: Jiayi Shen, Xiantong Zhen, Qi, Wang, Marcel Worring
for: 本研究旨在解决多任务学习中的数据不足问题，具体来说是在 episodic 训练设置下利用任务之间的不同信息和 episoden 中的元知识，以有效地处理每个任务。
methods: 我们开发了异 heterogeneous Neural Processes (HNPs) 来解决这个问题，它们在层次 Bayes 框架中有效地利用先前经验作为元知识，捕捉任务之间的相互关系，从而 mitigate 数据不足。 transformer 结构的推理模块也是为了快速地进行元知识和任务相关性的推理。
results: 实验结果表明我们的提案的 HNPs 在比较基eline的情况下表现出色，并且对减少数据不足的影响进行了证明。简化 Studios 中的结果也验证了我们设计的推理模块的有效性。

Abstract
This paper focuses on the data-insufficiency problem in multi-task learning within an episodic training setup. Specifically, we explore the potential of heterogeneous information across tasks and meta-knowledge among episodes to effectively tackle each task with limited data. Existing meta-learning methods often fail to take advantage of crucial heterogeneous information in a single episode, while multi-task learning models neglect reusing experience from earlier episodes. To address the problem of insufficient data, we develop Heterogeneous Neural Processes (HNPs) for the episodic multi-task setup. Within the framework of hierarchical Bayes, HNPs effectively capitalize on prior experiences as meta-knowledge and capture task-relatedness among heterogeneous tasks, mitigating data-insufficiency. Meanwhile, transformer-structured inference modules are designed to enable efficient inferences toward meta-knowledge and task-relatedness. In this way, HNPs can learn more powerful functional priors for adapting to novel heterogeneous tasks in each meta-test episode. Experimental results show the superior performance of the proposed HNPs over typical baselines, and ablation studies verify the effectiveness of the designed inference modules.

摘要

ALERTA-Net: A Temporal Distance-Aware Recurrent Networks for Stock Movement and Volatility Prediction

paper_url: http://arxiv.org/abs/2310.18706
repo_url: https://github.com/hao1zhao/alerta-net
paper_authors: Shengkun Wang, YangXiao Bai, Kaiqun Fu, Linhan Wang, Chang-Tien Lu, Taoran Ji
for: 预测股市运动和不稳定性 + 投资者和 policymakers 都需要准确预测股市，作为经济健康指标
methods: integrate sentiment analysis, macroeconomic indicators, search engine data, and historical prices within a multi-attention deep learning model + 利用社交媒体数据，具有丰富的公众情感信息，以增强股市预测的准确性
results: state-of-the-art performance using a dataset specifically curated for predicting stock market movements and volatility + 我们的提议模型在使用自定义的数据集后，实现了股市运动和不稳定性预测的状态oke-of-the-art表现

Abstract
For both investors and policymakers, forecasting the stock market is essential as it serves as an indicator of economic well-being. To this end, we harness the power of social media data, a rich source of public sentiment, to enhance the accuracy of stock market predictions. Diverging from conventional methods, we pioneer an approach that integrates sentiment analysis, macroeconomic indicators, search engine data, and historical prices within a multi-attention deep learning model, masterfully decoding the complex patterns inherent in the data. We showcase the state-of-the-art performance of our proposed model using a dataset, specifically curated by us, for predicting stock market movements and volatility.

摘要
für both investors und policymakers ist die prognose des aktienmarktes essenziell, da er als indicator für die wirtschaftliche well-being dient. um diese herausforderung zu meistern, nutzen wir die kraft von sozialen medien-daten, ein reiches quell von öffentlicher meinung, um die genauigkeit der aktienmarkt-vorhersagen zu verbessern. im Gegensatz zu conventional methods, entwickeln wir eine ansprechung, die sentiment-analyse, makroökonomische indicatoren, suchmaschine-daten und historische preise in einem multi-attention-tiefen lernmodell integriert, das die komplexen muster im data meisterlich decodiert. wir zeigen die state-of-the-art-leistung unseres vorgeschlagenen modells anhand einer datenbasis, die von uns speziell für die vorhersage von aktienmarkt-bewegungen und -volatilität sammeln.

Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning

paper_url: http://arxiv.org/abs/2310.19831
repo_url: https://github.com/alihanhyk/interpole
paper_authors: Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar
for: 这个论文是为了理解人类决策行为的概念模型，以便提高决策过程的透明度和负责任性。
methods: 这个论文提出了一种基于 bayesian 方法的可解释政策学习方法（Interpole），可以同时估计决策者的（可能偏袋）信念更新过程和决策策略。
results: 通过在模拟和真实世界数据上进行实验，论文示出了该方法的可能作为决策过程的调查、评估和理解的潜在价值。

Abstract
Understanding human behavior from observed data is critical for transparency and accountability in decision-making. Consider real-world settings such as healthcare, in which modeling a decision-maker's policy is challenging -- with no access to underlying states, no knowledge of environment dynamics, and no allowance for live experimentation. We desire learning a data-driven representation of decision-making behavior that (1) inheres transparency by design, (2) accommodates partial observability, and (3) operates completely offline. To satisfy these key criteria, we propose a novel model-based Bayesian method for interpretable policy learning ("Interpole") that jointly estimates an agent's (possibly biased) belief-update process together with their (possibly suboptimal) belief-action mapping. Through experiments on both simulated and real-world data for the problem of Alzheimer's disease diagnosis, we illustrate the potential of our approach as an investigative device for auditing, quantifying, and understanding human decision-making behavior.

摘要
理解人类行为从观察数据中是决策过程中的关键，以确保决策过程中的透明度和负责任。在实际场景中，如医疗行业，模型决策者的政策非常困难，因为无法访问基础状态、环境动力学不明确、无法进行实际实验。我们希望通过学习数据驱动的方法学习决策行为，以满足以下三个关键需求：1. 具有透明度设计，以便理解决策过程中的决策因素。2. 可以处理部分可见性，以适应决策过程中的不同情况。3. 完全没有线上运行，以便在决策过程中进行实时调整。为了满足这些需求，我们提出了一种新的模型基于概率方法，即“Interpole”，可以同时估算决策者的（可能偏见的）信念更新过程和（可能不优的）信念行为映射。通过在模拟和实际数据上进行实验，我们证明了我们的方法可以作为决策过程的调查、评估和理解人类决策行为的调查工具。

Towards Combinatorial Generalization for Catalysts: A Kohn-Sham Charge-Density Approach

paper_url: http://arxiv.org/abs/2310.18702
repo_url: https://github.com/ppope/rho-learn
paper_authors: Phillip Pope, David Jacobs
for: 本研究旨在探讨一种基于点wise学习的Kohn-Sham充电密度模型，以实现对新材料的预测和设计。
methods: 本研究使用了点wise学习方法，学习了 bulk catalysts 的充电密度，并在新的材料结构中进行了探索和预测。
results: 研究发现，使用点wise学习方法可以实现对新材料的预测和设计，并且可以在多种元素组合下实现 combinatorial 泛化。测试结果显示，超过 80% 的二元和三元测试样本在使用点wise学习方法下可以更快地达到稳定状态，相比标准基线下降减13%的迭代次数，这可能是独立的兴趣点。

Abstract
The Kohn-Sham equations underlie many important applications such as the discovery of new catalysts. Recent machine learning work on catalyst modeling has focused on prediction of the energy, but has so far not yet demonstrated significant out-of-distribution generalization. Here we investigate another approach based on the pointwise learning of the Kohn-Sham charge-density. On a new dataset of bulk catalysts with charge densities, we show density models can generalize to new structures with combinations of elements not seen at train time, a form of combinatorial generalization. We show that over 80% of binary and ternary test cases achieve faster convergence than standard baselines in Density Functional Theory, amounting to an average reduction of 13% in the number of iterations required to reach convergence, which may be of independent interest. Our results suggest that density learning is a viable alternative, trading greater inference costs for a step towards combinatorial generalization, a key property for applications.

摘要
金ohn-Sham方程在许多重要应用中发挥重要作用，如新 catalyst 的发现。现代机器学习方法在 catalyst 模型化中的 Prediction of energy 方面已经受到了重点研究，但是迄今为止并没有显示出significant out-of-distribution generalization。我们在这里 investigate 一种基于点wise learning of Kohn-Sham charge-density 的方法。使用新的 bulk catalysts with charge densities 数据集，我们显示了 density models 可以generalize 到新的结构，包括元素的组合不同于训练时间，这种 combinatorial generalization。我们表明了超过 80% 的 binary 和 ternary test cases 在 Density Functional Theory 中比标准基elines 更快 converges，平均降低了13%的迭代次数，这可能是独立的 interesseting。我们的结果表明了 density learning 是一种可行的alternative，通过更大的推理成本换取了 combinatorial generalization，一种重要的应用特性。

Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards

paper_url: http://arxiv.org/abs/2310.18701
repo_url: None
paper_authors: Bo Xue, Yimu Wang, Yuanyu Wan, Jinfeng Yi, Lijun Zhang
For: investigate the problem of generalized linear bandits with heavy-tailed rewards, and propose two novel algorithms based on truncation and mean of medians to address this issue.* Methods: propose two algorithms, one based on truncation and the other based on mean of medians, to achieve an almost optimal regret bound of $\widetilde{O}(dT^{\frac{1}{1+\epsilon})$ with online learning support and lower computational complexity.* Results: improve the regret bounds by a logarithmic factor compared to existing algorithms when $\epsilon=1$, and confirm the merits of the proposed algorithms through numerical experimental results.

Abstract
This paper investigates the problem of generalized linear bandits with heavy-tailed rewards, whose $(1+\epsilon)$-th moment is bounded for some $\epsilon\in (0,1]$. Although there exist methods for generalized linear bandits, most of them focus on bounded or sub-Gaussian rewards and are not well-suited for many real-world scenarios, such as financial markets and web-advertising. To address this issue, we propose two novel algorithms based on truncation and mean of medians. These algorithms achieve an almost optimal regret bound of $\widetilde{O}(dT^{\frac{1}{1+\epsilon})$, where $d$ is the dimension of contextual information and $T$ is the time horizon. Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches. Additionally, our mean-of-medians-based algorithm requires only $O(\log T)$ rewards and one estimator per epoch, making it more practical. Moreover, our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $\epsilon=1$. Numerical experimental results confirm the merits of our algorithms.

摘要

Clairvoyance: A Pipeline Toolkit for Medical Time Series

paper_url: http://arxiv.org/abs/2310.18688
repo_url: https://github.com/vanderschaarlab/clairvoyance
paper_authors: Daniel Jarrett, Jinsung Yoon, Ioana Bica, Zhaozhi Qian, Ari Ercole, Mihaela van der Schaar
for: 这个研究旨在提供一个统一的、端到端的、自动机器学习（AutoML）友好的数据驱动医疗决策支持系统，以便在实际医疗过程中与患者互动，提供适应性强的预测和决策支持。
methods: 这个系统使用了许多Machine Learning（ML）技术，包括数据预processing、缺失数据填充、特征选择、预测和不确定性估计等。
results: 这个系统可以在实际医疗应用中实现高度自动化的数据驱动医疗决策支持，并且可以在不同的医疗设置中进行适应性强的预测和决策支持。

Abstract
Time-series learning is the bread and butter of data-driven *clinical decision support*, and the recent explosion in ML research has demonstrated great potential in various healthcare settings. At the same time, medical time-series problems in the wild are challenging due to their highly *composite* nature: They entail design choices and interactions among components that preprocess data, impute missing values, select features, issue predictions, estimate uncertainty, and interpret models. Despite exponential growth in electronic patient data, there is a remarkable gap between the potential and realized utilization of ML for clinical research and decision support. In particular, orchestrating a real-world project lifecycle poses challenges in engineering (i.e. hard to build), evaluation (i.e. hard to assess), and efficiency (i.e. hard to optimize). Designed to address these issues simultaneously, Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a (i) software toolkit, (ii) empirical standard, and (iii) interface for optimization. Our ultimate goal lies in facilitating transparent and reproducible experimentation with complex inference workflows, providing integrated pathways for (1) personalized prediction, (2) treatment-effect estimation, and (3) information acquisition. Through illustrative examples on real-world data in outpatient, general wards, and intensive-care settings, we illustrate the applicability of the pipeline paradigm on core tasks in the healthcare journey. To the best of our knowledge, Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.

摘要
时间序列学习是医疗数据驱动的严重症状支持的基础，最近几年的机器学习研究表明了各种医疗设置中的潜力。然而，医疗时间序列问题在实际应用中具有复杂的特性：它们包括数据预处理、缺失值填充、特征选择、预测 issuing、 uncertainty 估计和模型解释等多个组件的交互。尽管电子病人数据的增长呈指数型增长，但是在临床研究和决策支持中实现的潜力与可能性之间还存在巨大的差距。特别是在实际项目生命周期中，工程（即困难于建立）、评估（即困难于评估）和效率（即困难于优化）等问题具有挑战性。为了解决这些问题，Clairvoyance 提出了一个统一、端到端、自动化 ML 友好的管道，作为（i）软件工具包、（ii）实验标准和（iii）优化接口。我们的最终目标是使得医疗时间序列 ML 实际实践中的透明度和可重现性得到改善。通过使用实际数据来 illustrate 管道的应用，我们示例了医疗旅程中的核心任务，如个性化预测、治疗效果估计和信息获取。根据我们所知，Clairvoyance 是首个实现了Complex Inference Workflows 的综合和自动化管道的医疗时间序列 ML 项目。

DySurv: Dynamic Deep Learning Model for Survival Prediction in the ICU

paper_url: http://arxiv.org/abs/2310.18681
repo_url: None
paper_authors: Munib Mesinovic, Peter Watkinson, Tingting Zhu
for: 这篇论文旨在提出一种基于深度学习的预测生存时间方法，以便在ICU中进行动态死亡风险预测。
methods: 这篇论文使用了一种名为 DySurv 的新型 conditional variational autoencoder-based 方法，使用了病人电子医疗纪录中的静态和时间序列数据来估计死亡风险。
results: 这篇论文的实验结果显示，DySurv 方法可以在标准库中对比其他方法表现出色，并且在实际患者数据库中进行了验证。 survival 估计的内在一致性和不同数据集中的稳定性都支持了这种动态深度学习模型在预测生存时间方法中的可靠性。

Abstract
Survival analysis helps approximate underlying distributions of time-to-events which in the case of critical care like in the ICU can be a powerful tool for dynamic mortality risk prediction. Extending beyond the classical Cox model, deep learning techniques have been leveraged over the last years relaxing the many constraints of their counterparts from statistical methods. In this work, we propose a novel conditional variational autoencoder-based method called DySurv which uses a combination of static and time-series measurements from patient electronic health records in estimating risk of death dynamically in the ICU. DySurv has been tested on standard benchmarks where it outperforms most existing methods including other deep learning methods and we evaluate it on a real-world patient database from MIMIC-IV. The predictive capacity of DySurv is consistent and the survival estimates remain disentangled across different datasets supporting the idea that dynamic deep learning models based on conditional variational inference in multi-task cases can be robust models for survival analysis.

摘要
生存分析可以 aproximate 时间事件的下面分布，在 ICU 中可以是一种强大的动态死亡风险预测工具。在过去几年中，深度学习技术被应用于生存分析，超越了统计方法的多种限制。在这种工作中，我们提出了一种名为 DySurv 的新方法，使用患者电子医疗记录中的静态和时间序列测量来 dynamically 估算 ICU 中死亡风险。DySurv 已经在标准Benchmark上测试，与其他深度学习方法相比，它在大多数情况下表现出色，并在实际患者数据库中进行了评估。survival 预测的可靠性和预测值在不同数据集中保持分离，支持我们的想法，即基于 conditional variational inference 的动态深度学习模型在多任务情况下可以是Robust模型 для survival analysis。

Energy-Based Models for Anomaly Detection: A Manifold Diffusion Recovery Approach

paper_url: http://arxiv.org/abs/2310.18677
repo_url: None
paper_authors: Sangwoong Yoon, Young-Uk Jin, Yung-Kyun Noh, Frank C. Park
for: 这篇论文是用于侦测异常（Anomaly Detection）的新方法。
methods: 这篇论文使用的方法是把资料点推广到低维度构造中，然后使用EBM进行侦测。
results: 实验结果显示，这篇论文的方法可以在不同的资料类型和侦测任务中具有优秀的表现。

Abstract
We present a new method of training energy-based models (EBMs) for anomaly detection that leverages low-dimensional structures within data. The proposed algorithm, Manifold Projection-Diffusion Recovery (MPDR), first perturbs a data point along a low-dimensional manifold that approximates the training dataset. Then, EBM is trained to maximize the probability of recovering the original data. The training involves the generation of negative samples via MCMC, as in conventional EBM training, but from a different distribution concentrated near the manifold. The resulting near-manifold negative samples are highly informative, reflecting relevant modes of variation in data. An energy function of MPDR effectively learns accurate boundaries of the training data distribution and excels at detecting out-of-distribution samples. Experimental results show that MPDR exhibits strong performance across various anomaly detection tasks involving diverse data types, such as images, vectors, and acoustic signals.

摘要
我们提出了一种新的能量基模型（EBM）训练方法，该方法利用数据中的低维结构。我们的算法，抽象扩散恢复（MPDR），首先将数据点扰动到一个低维抽象 manifold 上，然后通过 MCMC 生成负样本，与 convential EBM 训练中的负样本生成方式类似。但是，MPDR 使用的是一个集中在抽象 manifold 上的分布，从而生成了具有低维结构的负样本。这些近抽象 manifold 上的负样本具有高度信息richness，反映了数据中重要的变换模式。 MPDR 的能量函数可以准确地学习训练数据分布的边界，并且能够准确检测数据集外的异常样本。我们的实验结果显示，MPDR 在多种异常检测任务中表现出色，包括图像、向量和声音信号等数据类型。

Maximum Independent Set: Self-Training through Dynamic Programming

paper_url: http://arxiv.org/abs/2310.18672
repo_url: None
paper_authors: Lorenzo Brusca, Lars C. P. M. Quaedvlieg, Stratis Skoulakis, Grigorios G Chrysos, Volkan Cevher
for: 本文提出了一种基于图神经网络（GNN）的最大独立集（MIS）问题解决方案， drawing inspiration from dynamic programming（DP）。
methods: specifically, the authors propose a DP-like recursive algorithm based on GNNs that first constructs two smaller sub-graphs, predicts the one with the larger MIS, and then uses it in the next recursive call.
results: the authors provide numerical evidence showing the superiority of their method compared to prior methods in multiple synthetic and real-world datasets.

Abstract
This work presents a graph neural network (GNN) framework for solving the maximum independent set (MIS) problem, inspired by dynamic programming (DP). Specifically, given a graph, we propose a DP-like recursive algorithm based on GNNs that firstly constructs two smaller sub-graphs, predicts the one with the larger MIS, and then uses it in the next recursive call. To train our algorithm, we require annotated comparisons of different graphs concerning their MIS size. Annotating the comparisons with the output of our algorithm leads to a self-training process that results in more accurate self-annotation of the comparisons and vice versa. We provide numerical evidence showing the superiority of our method vs prior methods in multiple synthetic and real-world datasets.

摘要

Causal discovery in a complex industrial system: A time series benchmark

paper_url: http://arxiv.org/abs/2310.18654
repo_url: None
paper_authors: Søren Wengel Mogensen, Karin Rathsman, Per Nilsson
for: 这篇论文是用来描述如何从观测数据中推断 causal structure的。
methods: 这篇论文使用了一种时间序列数据的 causal discovery 方法，并对真实的工业系统进行了测试。
results: 论文提供了一个industrial subsystem的 causal graph，并通过专家知识来构建了这个图。这个测试环境可以帮助开发 causal discovery 方法。

Abstract
Causal discovery outputs a causal structure, represented by a graph, from observed data. For time series data, there is a variety of methods, however, it is difficult to evaluate these on real data as realistic use cases very rarely come with a known causal graph to which output can be compared. In this paper, we present a dataset from an industrial subsystem at the European Spallation Source along with its causal graph which has been constructed from expert knowledge. This provides a testbed for causal discovery from time series observations of complex systems, and we believe this can help inform the development of causal discovery methodology.

摘要

SSL Framework for Causal Inconsistency between Structures and Representations

paper_url: http://arxiv.org/abs/2310.18634
repo_url: None
paper_authors: Hang Chen, Xinyu Yang, Keqing Du
for: 这篇论文旨在探讨深度学习和 causal discovery 之间的交叉束合，以揭示无法统计数据中的 causal 关系。
methods: 本文提出了一种针对无法统计数据的 intervention 策略和 causal consistency condition (CCC) 的理论发展，并设计了一个自然语言模型 (LLMs) 和一个监督特殊化模型 (SSMs) 的自动学习框架。
results: 该文通过大量实验证明了其方法的有效性，并在三个下游任务中进行了评估。

Abstract
The cross-pollination of deep learning and causal discovery has catalyzed a burgeoning field of research seeking to elucidate causal relationships within non-statistical data forms like images, videos, and text. Such data, often being named `indefinite data', exhibit unique challenges-inconsistency between causal structure and representation, which are not common in conventional data forms. To tackle this issue, we theoretically develop intervention strategies suitable for indefinite data and derive causal consistency condition (CCC). Moreover, we design a self-supervised learning (SSL) framework that considers interventions as `views' and CCC as a `philosophy' with two implement examples on Supervised Specialized Models (SSMs) and Large Language Models (LLMs), respectively. To evaluate pure inconsistency manifestations, we have prepared the first high-quality causal dialogue dataset-Causalogue. Evaluations are also performed on three other downstream tasks. Extensive experimentation has substantiated the efficacy of our methodology, illuminating how CCC could potentially play an influential role in various fields.

摘要
<>将深度学习和 causal discovery 融合，激发了一个蓬勃的研究，旨在揭示非统计数据中的 causal 关系。这类数据，常被称为 "未定数据"，具有独特的挑战 - causal 结构和表示之间的不一致。为解决这个问题，我们提出了适应于未定数据的干预策略和 causal 一致性条件（CCC）的理论发展。此外，我们还设计了一个基于自我监督学习（SSL）框架，在该框架中，干预被视为 "视图"，CCC 被视为 "哲学"，并在 Supervised Specialized Models (SSMs) 和 Large Language Models (LLMs) 中进行了两个实现例子。为了评估纯净的不一致现象，我们准备了首个高质量 causal 对话集 - Causalogue。此外，我们还在三个下游任务上进行了评估。广泛的实验证明了我们的方法的有效性，揭示了 CCC 在不同领域的可能发挥作用。

Explainable Modeling for Wind Power Forecasting: A Glass-Box Approach with Exceptional Accuracy

paper_url: http://arxiv.org/abs/2310.18629
repo_url: None
paper_authors: Wenlong Liao, Fernando Porté-Agel, Jiannong Fang, Birgitte Bak-Jensen, Guangchun Ruan, Zhe Yang
for: 这篇论文旨在提出一个可读性高的风力预测模型，并且可以实现高精度的风力预测。
methods: 本论文使用了进步的人工智能技术（例如Gradient Boosting），创造了shape函数在预测模型中。这些函数可以将风力输出和输入特征之间的复杂非线性关系实现有效地映射。此外，预测模型还包括互动项，以实现输入特征之间的互动和联合作用。
results: 根据实验结果显示，提案的玻璃箱方法可以实现风力预测的可读性和高精度。对于全球和个别 perspective，这种方法都能够实现高精度的预测。此外，与大多数参考模型相比，玻璃箱方法表现更好，并且和最佳性能的神经网络相比，表现相当。因此，这种玻璃箱方法在可靠的风力预测中具有吸引力。

Abstract
Machine learning models (e.g., neural networks) achieve high accuracy in wind power forecasting, but they are usually regarded as black boxes that lack interpretability. To address this issue, the paper proposes a glass-box approach that combines exceptional accuracy with transparency for wind power forecasting. Specifically, advanced artificial intelligence methods (e.g., gradient boosting) are innovatively employed to create shape functions within the forecasting model. These functions effectively map the intricate non-linear relationships between wind power output and input features. Furthermore, the forecasting model is enriched by incorporating interaction terms that adeptly capture interdependencies and synergies among the input features. Simulation results show that the proposed glass-box approach effectively interprets the results of wind power forecasting from both global and instance perspectives. Besides, it outperforms most benchmark models and exhibits comparable performance to the best-performing neural networks. This dual strength of transparency and high accuracy positions the proposed glass-box approach as a compelling choice for reliable wind power forecasting.

摘要
机器学习模型（如神经网络）可以实现高精度风力预测，但它们通常被视为黑盒模型，缺乏可读性。为解决这个问题，文章提出了一种玻璃盒方法，该方法结合了高精度和可读性来进行风力预测。具体来说，文章使用了进步的人工智能技术（如梯度提升）来创建shape函数在预测模型中。这些函数有效地映射了风力输出和输入特征之间的复杂非线性关系。此外，预测模型还被补充了交互项，以便精准地捕捉输入特征之间的互动和协同作用。 simulation结果显示，提议的玻璃盒方法可以从全局和实例两个角度进行可读性的风力预测，并且在大多数参考模型之上出performances，与最佳性能的神经网络相当。这种两种优点的玻璃盒方法因此成为可靠的风力预测的可靠选择。

Pessimistic Off-Policy Multi-Objective Optimization

paper_url: http://arxiv.org/abs/2310.18617
repo_url: None
paper_authors: Shima Alizadeh, Aniruddha Bhargava, Karthick Gopalswamy, Lalit Jain, Branislav Kveton, Ge Liu
for: 这篇论文主要研究了多目标优化问题中，如何从现有策略收集的数据中提取多目标策略优化。
methods: 该论文提出了一种偏负估 estimator，基于对抗性折衣分数（IPS），用于估算多目标策略价值。这种估计器在理论和实验中都提高了对于naive IPS估计器。
results: 该论文的分析是通用的，可以应用于不同的IPS估计器和优化方法。偏负估 estimator可以通过policy gradient来优化，在所有实验中表现良好。

Abstract
Multi-objective optimization is a type of decision making problems where multiple conflicting objectives are optimized. We study offline optimization of multi-objective policies from data collected by an existing policy. We propose a pessimistic estimator for the multi-objective policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator is based on inverse propensity scores (IPS), and improves upon a naive IPS estimator in both theory and experiments. Our analysis is general, and applies beyond our IPS estimators and methods for optimizing them. The pessimistic estimator can be optimized by policy gradients and performs well in all of our experiments.

摘要
多目标优化是决策问题的一种，其中有多个矛盾的目标被优化。我们研究基于现有策略所采集的数据进行离线优化的多目标策略。我们提出了一种消极估计器，用于估计多目标策略的价值，这种估计器基于反抗概率分布（IPS），并在理论和实验中都有所改进。我们的分析涵盖了更广泛的领域，并不仅限于我们的IPS估计器和优化方法。这种消极估计器可以通过政策Gradient优化，在所有实验中表现良好。

Temporally Disentangled Representation Learning under Unknown Nonstationarity

paper_url: http://arxiv.org/abs/2310.18615
repo_url: https://github.com/xiangchensong/nctrl
paper_authors: Xiangchen Song, Weiran Yao, Yewen Fan, Xinshuai Dong, Guangyi Chen, Juan Carlos Niebles, Eric Xing, Kun Zhang
for: 研究者们是想解决非站点的时序数据中 causal represencing 问题，即在不具备辅助变量（如类别标签和/或领域标识符）的情况下，可以准确分离 causally 相关的 latent 变量。
methods: 研究者们在这篇论文中提出了一种名为 NCTRL 的原则性估计框架，可以在非站点设置下，基于测量序列数据，重建时延 causal 变量并分离其关系。
results: 实验证明，NCTRL 方法可以可靠地分离时延 causal 变量，并且在不具备辅助变量的情况下，表现出明显的优势，超过了现有的基准值。

Abstract
In unsupervised causal representation learning for sequential data with time-delayed latent causal influences, strong identifiability results for the disentanglement of causally-related latent variables have been established in stationary settings by leveraging temporal structure. However, in nonstationary setting, existing work only partially addressed the problem by either utilizing observed auxiliary variables (e.g., class labels and/or domain indexes) as side information or assuming simplified latent causal dynamics. Both constrain the method to a limited range of scenarios. In this study, we further explored the Markov Assumption under time-delayed causally related process in nonstationary setting and showed that under mild conditions, the independent latent components can be recovered from their nonlinear mixture up to a permutation and a component-wise transformation, without the observation of auxiliary variables. We then introduce NCTRL, a principled estimation framework, to reconstruct time-delayed latent causal variables and identify their relations from measured sequential data only. Empirical evaluations demonstrated the reliable identification of time-delayed latent causal influences, with our methodology substantially outperforming existing baselines that fail to exploit the nonstationarity adequately and then, consequently, cannot distinguish distribution shifts.

摘要
在不监督 causal 表示学习中，对时间延迟的 latent causal 影响进行了强大的可Identifiability 结果，在静止设置下，通过利用时间结构来恰当地识别 causally 相关的 latent 变量。然而，在不稳定设置下，现有的工作只是部分地解决了问题，可以通过利用观测的auxiliary变量（例如类别标签和/或domain标识符）作为副信息，或者假设简单的 latent causal 动力学。两者都限制方法只能在有限的情况下运行。在这项研究中，我们进一步探讨了在时间延迟 causally 相关的 Markov 假设在不稳定设置下，并证明了在某些轻度条件下，独立的 latent 分量可以从其非线性混合中重建，而无需观测 auxilary 变量。然后，我们引入 NCTRL，一种原则性的估计框架，来重建时间延迟的 latent causal 变量，并identify它们之间的关系，从测量的时间序列数据中。实验证明了我们的方法可靠地识别时间延迟的 latent causal 影响，并且substantially 超越了不充分利用不稳定性的现有基准值。

Efficient kernel surrogates for neural network-based regression

paper_url: http://arxiv.org/abs/2310.18612
repo_url: None
paper_authors: Saad Qadeer, Andrew Engel, Adam Tsou, Max Vargas, Panos Stinis, Tony Chiang
for: 这篇论文的目的是为了解释深度神经网络（DNN）的效果和局限性，并提供一种低成本的估计方法。methods: 这篇论文使用了Randomly initialized DNNs和Conjugate Kernel（CK）来研究DNN的性能。results: 论文表明，CK可以作为NTK的低成本估计方法，并且在某些情况下可以超越NTK的性能。此外，论文还提供了一种改进DNN准确率的简单方法。

Abstract
Despite their immense promise in performing a variety of learning tasks, a theoretical understanding of the effectiveness and limitations of Deep Neural Networks (DNNs) has so far eluded practitioners. This is partly due to the inability to determine the closed forms of the learned functions, making it harder to assess their precise dependence on the training data and to study their generalization properties on unseen datasets. Recent work has shown that randomly initialized DNNs in the infinite width limit converge to kernel machines relying on a Neural Tangent Kernel (NTK) with known closed form. These results suggest, and experimental evidence corroborates, that empirical kernel machines can also act as surrogates for finite width DNNs. The high computational cost of assembling the full NTK, however, makes this approach infeasible in practice, motivating the need for low-cost approximations. In the current work, we study the performance of the Conjugate Kernel (CK), an efficient approximation to the NTK that has been observed to yield fairly similar results. For the regression problem of smooth functions and classification using logistic regression, we show that the CK performance is only marginally worse than that of the NTK and, in certain cases, is shown to be superior. In particular, we establish bounds for the relative test losses, verify them with numerical tests, and identify the regularity of the kernel as the key determinant of performance. In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework provides insights into understanding the robustness of the various approximants and suggests a recipe for improving DNN accuracy inexpensively. We present a demonstration of this on the foundation model GPT-2 by comparing its performance on a classification task using a conventional approach and our prescription.

摘要
尽管深度神经网络（DNN）在许多学习任务上表现出了极大的承诺，但是理论上的效iveness和局限性仍然无法被实践者们完全理解。这是因为不能确定closed form的学习函数，使得训练数据的依赖关系和未经训练数据集的泛化性 harder to assess。近期的研究表明，在无限宽限制下，Randomly initialized DNNs会 converges to kernel machines，这些机器可以通过known closed form的Neural Tangent Kernel（NTK）来描述。这些结果表明，和实验证据支持，empirical kernel machines可以作为finite width DNNs的surrogate。然而，assembling the full NTK的计算成本太高，这使得这种方法在实践中不可行，因此需要低成本的近似。在当前的工作中，我们研究了Conjugate Kernel（CK）的性能，CK是NTK的有效近似。对于抽象函数的回归问题和使用logistic regression进行分类，我们显示CK的性能只是NTK的一个小 margin worse，而且在某些情况下，CKeven outperform NTK。具体来说，我们给出了 bounds for the relative test losses，通过数值测试验证了这些 bound，并发现了核函数的 Regularity是性能的关键因素。此外，我们的框架还提供了使用CK instead of NTK的理论基础，以及如何提高DNN的准确性的recipe。我们在GPT-2基础模型上进行了一个示例，通过对一个分类任务使用我们的方法和传统方法进行比较。

Where have you been? A Study of Privacy Risk for Point-of-Interest Recommendation

paper_url: http://arxiv.org/abs/2310.18606
repo_url: None
paper_authors: Kunlin Cai, Jinghuai Zhang, Will Shand, Zhiqing Hong, Guang Wang, Desheng Zhang, Jianfeng Chi, Yuan Tian
For: This paper aims to evaluate the privacy risks of mobility data-based machine learning models, specifically point-of-interest recommendation models, by designing a privacy attack suite and conducting experimental evaluations.* Methods: The paper uses a privacy attack suite that includes data extraction and membership inference attacks to evaluate the privacy risks of POI recommendation models. The attacks assume different adversary knowledge and aim to extract different types of sensitive information from mobility data.* Results: The experimental evaluation using two real-world mobility datasets demonstrates that current POI recommendation models are vulnerable to the attacks in the privacy attack suite. The paper also presents unique findings on what types of mobility data are more susceptible to privacy attacks.

Abstract
As location-based services (LBS) have grown in popularity, the collection of human mobility data has become increasingly extensive to build machine learning (ML) models offering enhanced convenience to LBS users. However, the convenience comes with the risk of privacy leakage since this type of data might contain sensitive information related to user identities, such as home/work locations. Prior work focuses on protecting mobility data privacy during transmission or prior to release, lacking the privacy risk evaluation of mobility data-based ML models. To better understand and quantify the privacy leakage in mobility data-based ML models, we design a privacy attack suite containing data extraction and membership inference attacks tailored for point-of-interest (POI) recommendation models, one of the most widely used mobility data-based ML models. These attacks in our attack suite assume different adversary knowledge and aim to extract different types of sensitive information from mobility data, providing a holistic privacy risk assessment for POI recommendation models. Our experimental evaluation using two real-world mobility datasets demonstrates that current POI recommendation models are vulnerable to our attacks. We also present unique findings to understand what types of mobility data are more susceptible to privacy attacks. Finally, we evaluate defenses against these attacks and highlight future directions and challenges.

摘要
为了应对 Location-based Services (LBS) 的普及，人类移动数据的收集已成为建立 Machine Learning (ML) 模型的重要步骤，以提供更高的用户便利。然而，这种便利也会带来隐私泄露的风险，因为这些数据可能包含用户标识信息，如家庭/办公室的位置。先前的工作主要关注于在传输或发布 mobility data 时保护隐私，而忽略了 mobility data 基于 ML 模型的隐私风险评估。为了更好地理解和评估 mobility data 基于 ML 模型的隐私泄露，我们设计了一个隐私攻击集，包括数据EXTRACTION和会员推理攻击，专门为点位服务（POI）推荐模型而设计。这些攻击在我们的攻击集中假设不同的反对手知识，目标是从移动数据中提取不同类型的敏感信息，为 POI 推荐模型的隐私风险进行总体评估。我们使用两个实际的移动数据集进行实验，表明现有 POI 推荐模型对我们的攻击非常感受。我们还发现了不同类型的移动数据是哪些隐私攻击最容易受到的，以及对这些攻击的防御措施和未来方向。

TorchDEQ: A Library for Deep Equilibrium Models

paper_url: http://arxiv.org/abs/2310.18605
repo_url: https://github.com/locuslab/torchdeq
paper_authors: Zhengyang Geng, J. Zico Kolter
for: This paper is written to provide a systematic and comprehensive framework for training and applying Deep Equilibrium (DEQ) models, which are a class of implicit models that map inputs to fixed points of neural networks.
methods: The paper presents TorchDEQ, an open-source PyTorch-based library that allows users to define, train, and infer using DEQs over multiple domains with minimal code and best practices.
results: The paper reports that by developing a joint framework that incorporates the best practices across all models, the performance, training stability, and efficiency of DEQs have been substantially improved on ten datasets across all six projects in the “DEQ Zoo”.

Abstract
Deep Equilibrium (DEQ) Models, an emerging class of implicit models that maps inputs to fixed points of neural networks, are of growing interest in the deep learning community. However, training and applying DEQ models is currently done in an ad-hoc fashion, with various techniques spread across the literature. In this work, we systematically revisit DEQs and present TorchDEQ, an out-of-the-box PyTorch-based library that allows users to define, train, and infer using DEQs over multiple domains with minimal code and best practices. Using TorchDEQ, we build a ``DEQ Zoo'' that supports six published implicit models across different domains. By developing a joint framework that incorporates the best practices across all models, we have substantially improved the performance, training stability, and efficiency of DEQs on ten datasets across all six projects in the DEQ Zoo. TorchDEQ and DEQ Zoo are released as \href{https://github.com/locuslab/torchdeq}{open source}.

摘要
深度平衡（DEQ）模型，一种在深度学习社区中升起的新类刚果模型，可以将输入映射到神经网络中的固定点上。然而，在训练和应用DEQ模型时，目前仍然采用各种不同的技术，分散在文献中。在这项工作中，我们系统地回顾DEQs，并提出了一个名为TorchDEQ的基于PyTorch的库，允许用户定义、训练和推理使用DEQs，并在多个领域上进行最小代码和最佳实践。使用TorchDEQ，我们建立了一个名为“DEQ zoo”的 colección，支持了六种已发表的隐式模型，并在不同的领域中进行了六个项目的实验。通过开发一个集成所有模型最佳实践的共同框架，我们在十个数据集上提高了DEQs的性能、训练稳定性和效率。TorchDEQ和DEQ zoo都已经作为开源项目在GitHub上发布。

Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers

paper_url: http://arxiv.org/abs/2310.18603
repo_url: None
paper_authors: Wencong You, Zayd Hammoudeh, Daniel Lowd
for: 这 paper 是为了攻击机器学习模型的弱点，使其预测结果被 manipulate 的。
methods: 这 paper 使用了语言模型来自动插入多种风格的触发器到文本中，以达到攻击目标。
results: 论文表明，使用 LLMBkd 攻击方法可以在各种风格下 achieve 高度的攻击成功率，只需要 little effort 和无需模型训练。

Abstract
Backdoor attacks manipulate model predictions by inserting innocuous triggers into training and test data. We focus on more realistic and more challenging clean-label attacks where the adversarial training examples are correctly labeled. Our attack, LLMBkd, leverages language models to automatically insert diverse style-based triggers into texts. We also propose a poison selection technique to improve the effectiveness of both LLMBkd as well as existing textual backdoor attacks. Lastly, we describe REACT, a baseline defense to mitigate backdoor attacks via antidote training examples. Our evaluations demonstrate LLMBkd's effectiveness and efficiency, where we consistently achieve high attack success rates across a wide range of styles with little effort and no model training.

摘要
<>将给定文本翻译成简化中文。>我们研究了一种新的后门攻击方法，称为LLMBkd。这种攻击方法利用语言模型自动插入文本中的多种风格化触发器。我们还提出了一种毒选择技术，以提高现有的文本后门攻击和LLMBkd的效果。此外，我们还描述了一种基eline防御方法，称为REACT，以 Mitigate backdoor attacks via antidote training examples。我们的评估结果表明，LLMBkd 具有高效率和多样化的触发器，可以轻松地在各种风格下实现高度成功率。

Online Decision Mediation

paper_url: http://arxiv.org/abs/2310.18601
repo_url: https://github.com/uvhw/Bitcoin-Foundation
paper_authors: Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar
For: The paper aims to serve as an intermediary between expert behavior and human behavior in decision-making, with the goal of striking a balance between purely prescriptive and purely descriptive approaches.* Methods: The paper proposes a solution that trades off immediate loss terms against future improvements in generalization error, and identifies why conventional bandit algorithms may fail.* Results: The paper demonstrates consistent gains over applicable benchmarks on performance measures with respect to the mediator policy, the learned model, and the decision-making system as a whole, through experiments and sensitivities on a variety of datasets.Here’s the simplified Chinese text for the three information points:* For: 这篇论文目标是在决策过程中作为中间人，以寻求 struck a balance between凡是指导的（purely prescriptive）和凡是描述的（purely descriptive）方法。* Methods: 论文提出了一种方法，该方法在评估损失和未来改进的泛化误差之间进行了交易，并解释了 conventional bandit 算法可能失败的原因。* Results: 论文通过对各种数据集的实验和敏感度分析，示出了与相关的参考模型、学习模型和决策系统总体性能的一致性。

Abstract
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior: At each time, the algorithm observes an action chosen by a fallible agent, and decides whether to *accept* that agent's decision, *intervene* with an alternative, or *request* the expert's opinion. For instance, in clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances, thus real-world decision support is often limited to monitoring and forecasting. Instead, such an intermediary would strike a prudent balance between the former (purely prescriptive) and latter (purely descriptive) approaches, while providing an efficient interface between human mistakes and expert feedback. In this work, we first formalize the sequential problem of *online decision mediation* -- that is, of simultaneously learning and evaluating mediator policies from scratch with *abstentive feedback*: In each round, deferring to the oracle obviates the risk of error, but incurs an upfront penalty, and reveals the otherwise hidden expert action as a new training data point. Second, we motivate and propose a solution that seeks to trade off (immediate) loss terms against (future) improvements in generalization error; in doing so, we identify why conventional bandit algorithms may fail. Finally, through experiments and sensitivities on a variety of datasets, we illustrate consistent gains over applicable benchmarks on performance measures with respect to the mediator policy, the learned model, and the decision-making system as a whole.

摘要
考虑使用决策支持助手作为 oracle 专家行为和人类行为之间的中间人：在每次时刻，算法观察到一个不准确的代理人选择的行动，然后决定是否接受该代理人的决定， intervene with 一个替代方案，或者请求专家的意见。例如，在临床诊断中，完全自主的机器行为经常超出伦理范畴，因此现实世界决策支持通常受限于监测和预测。而这个中间人可以 strike 一个谨慎的平衡 между两者，同时提供一个有效的人类错误和专家反馈之间的交互。在这项工作中，我们首先正式化了在线决策媒介问题的sequential形式：在每个回合中，推迟到oracle会降低风险，但是会付出一个初始的罚款，并将 Otherwise 隐藏的专家行为作为一个新的训练数据点。其次，我们激励和提出一个解决方案，它在同时学习和评估媒介策略时，要求平衡 (immediate) 损失和 (future) 改进泛化误差的问题。在这个过程中，我们发现了 conventional bandit 算法可能失败的原因。最后，通过对多种数据集进行实验和敏感分析，我们证明了我们的方法在表现度量上与相关的benchmark相比具有一致性。

Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint

paper_url: http://arxiv.org/abs/2310.18593
repo_url: None
paper_authors: Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun
for: 这个论文的目标是实现公平的主成分分析（PCA），使得投影后的分布匹配于敏感特征的分布。
methods: 这篇论文使用了一种新的定义called“可能相对公平优化”（PAFO）学习可能性，并在实际应用中提出了一种名为“公平流动PCA”的新设定，以及一种具有内存效率的算法“公平噪声方法”（FNPM）。
results: 这篇论文提供了这种算法的“统计”保证，这是公平PCA文献中的首次。此外，它还验证了这种算法的效果和内存效率在实际数据上。

Abstract
Fair Principal Component Analysis (PCA) is a problem setting where we aim to perform PCA while making the resulting representation fair in that the projected distributions, conditional on the sensitive attributes, match one another. However, existing approaches to fair PCA have two main problems: theoretically, there has been no statistical foundation of fair PCA in terms of learnability; practically, limited memory prevents us from using existing approaches, as they explicitly rely on full access to the entire data. On the theoretical side, we rigorously formulate fair PCA using a new notion called \emph{probably approximately fair and optimal} (PAFO) learnability. On the practical side, motivated by recent advances in streaming algorithms for addressing memory limitation, we propose a new setting called \emph{fair streaming PCA} along with a memory-efficient algorithm, fair noisy power method (FNPM). We then provide its {\it statistical} guarantee in terms of PAFO-learnability, which is the first of its kind in fair PCA literature. Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets.

摘要
“ fair principal component analysis (PCA) 是一个问题设定，我们想要在 PCA 中进行不偏的表现，使得投影的分布，根据敏感特征，相互匹配。然而，现有的公平 PCA 方法有两个主要问题：一是理论上没有公平 PCA 的学习基础; two是实际上限制了我们使用现有方法，因为它们需要完整的数据存储。在理论上，我们严格定义公平 PCA 使用一个新的概念 called “可能接近公平且最佳”(PAFO) 可学习性。在实践上，运用最近的流动数据处理技术，我们提出一个新的设定 called “公平流动 PCA”，以及一个内存有效的算法，叫做公平杂音方法 (FNPM)。我们然后提供这个设定的“ Statistical ”保证，这是公平 PCA 文献中的第一个。最后，我们验证了我们的算法在实际数据上的有效性和内存效率。”

Inverse Decision Modeling: Learning Interpretable Representations of Behavior

paper_url: http://arxiv.org/abs/2310.18591
repo_url: https://github.com/danieljarrett/Inverse-Bounded-Rational-Control
paper_authors: Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar
for: 提高决策过程的模型化和改进
methods: 使用参数化表示Sequential Decision Behavior的框架，包括正则化控制行为和资料学习
results: 实现了学习（可读）表示 rationality，自然地捕捉了偏见行为、环境知识不准确和 bounded rationality 的概念

Abstract
Decision analysis deals with modeling and enhancing decision processes. A principal challenge in improving behavior is in obtaining a transparent description of existing behavior in the first place. In this paper, we develop an expressive, unifying perspective on inverse decision modeling: a framework for learning parameterized representations of sequential decision behavior. First, we formalize the forward problem (as a normative standard), subsuming common classes of control behavior. Second, we use this to formalize the inverse problem (as a descriptive model), generalizing existing work on imitation/reward learning -- while opening up a much broader class of research problems in behavior representation. Finally, we instantiate this approach with an example (inverse bounded rational control), illustrating how this structure enables learning (interpretable) representations of (bounded) rationality -- while naturally capturing intuitive notions of suboptimal actions, biased beliefs, and imperfect knowledge of environments.

摘要
First, we define the forward problem, which includes common classes of control behavior. Then, we use this framework to formalize the inverse problem, which generalizes existing work on imitation and reward learning. This approach opens up a broader range of research problems in behavior representation.Finally, we provide an example of inverse bounded rational control, which demonstrates how this structure enables the learning of interpretable representations of rationality while naturally capturing suboptimal actions, biased beliefs, and imperfect knowledge of environments.

Optimal Transport for Kernel Gaussian Mixture Models

paper_url: http://arxiv.org/abs/2310.18586
repo_url: None
paper_authors: Jung Hun Oh, Rena Elkin, Anish Kumar Simhal, Jiening Zhu, Joseph O Deasy, Allen Tannenbaum
for: 本研究使用 Wasserstein 距离来衡量两个 Gaussian mixture 的距离，并利用 kernel trick 避免直接将输入数据映射到高维特征空间。
methods: 本研究使用 kernel Gaussian mixture models 来计算两个 Gaussian mixture 的 Wasserstein 距离。
results: 本研究提出了一种基于 RKHS 的 Wasserstein-type metric，可以帮助更好地模型复杂多模 density 的实际数据。

Abstract
The Wasserstein distance from optimal mass transport (OMT) is a powerful mathematical tool with numerous applications that provides a natural measure of the distance between two probability distributions. Several methods to incorporate OMT into widely used probabilistic models, such as Gaussian or Gaussian mixture, have been developed to enhance the capability of modeling complex multimodal densities of real datasets. However, very few studies have explored the OMT problems in a reproducing kernel Hilbert space (RKHS), wherein the kernel trick is utilized to avoid the need to explicitly map input data into a high-dimensional feature space. In the current study, we propose a Wasserstein-type metric to compute the distance between two Gaussian mixtures in a RKHS via the kernel trick, i.e., kernel Gaussian mixture models.

摘要
水斯坦距离（OMT）是一个具有广泛应用的数学工具，它提供了两个概率分布之间的自然距离量。有几种方法可以将 OMT 整合到广泛使用的概率模型中，如 Gaussian 或 Gaussian 混合体，以增强模型处理复杂多模式数据的能力。然而，几乎没有研究探讨 OMT 问题在复复函数希尔贝特空间（RKHS）中，这里利用核函数传递器来避免直接将输入数据映射到高维的特征空间。在 presente 研究中，我们提出了一种 Wasserstein-type 度量来计算两个 Gaussian 混合体之间的距离，即核函数 Gaussian 混合模型。

Group Robust Classification Without Any Group Information

paper_url: http://arxiv.org/abs/2310.18555
repo_url: https://github.com/tsirif/ula
paper_authors: Christos Tsirigotis, Joao Monteiro, Pau Rodriguez, David Vazquez, Aaron Courville
for: 这个研究旨在提高 Empirical Risk Minimization (ERM) 方法的高阶假设精度，并解决训练数据中的假设相互作用所导致的伪 correlations 问题，以便在高风险应用中部署系统。
methods: 这个研究提出了一种 entirely bias-unsupervised 的方法，使用预训练的自我supervised 模型来可靠地提取偏见信息，并与我们的验证标准数据集成logit adjustment 训练损失。
results: 我们的方法可以超过现有方法的性能，并在实际应用中提供了更好的伪相互作用精度，包括在 MPI3D dataset 上进行系统性的普遍化任务中，当混合对应 attribute value absent 时，现有方法失败。

Abstract
Empirical risk minimization (ERM) is sensitive to spurious correlations in the training data, which poses a significant risk when deploying systems trained under this paradigm in high-stake applications. While the existing literature focuses on maximizing group-balanced or worst-group accuracy, estimating these accuracies is hindered by costly bias annotations. This study contends that current bias-unsupervised approaches to group robustness continue to rely on group information to achieve optimal performance. Firstly, these methods implicitly assume that all group combinations are represented during training. To illustrate this, we introduce a systematic generalization task on the MPI3D dataset and discover that current algorithms fail to improve the ERM baseline when combinations of observed attribute values are missing. Secondly, bias labels are still crucial for effective model selection, restricting the practicality of these methods in real-world scenarios. To address these limitations, we propose a revised methodology for training and validating debiased models in an entirely bias-unsupervised manner. We achieve this by employing pretrained self-supervised models to reliably extract bias information, which enables the integration of a logit adjustment training loss with our validation criterion. Our empirical analysis on synthetic and real-world tasks provides evidence that our approach overcomes the identified challenges and consistently enhances robust accuracy, attaining performance which is competitive with or outperforms that of state-of-the-art methods, which, conversely, rely on bias labels for validation.

摘要
empirical risk minimization (ERM) sensitive to spurious correlations in the training data, posing significant risks when deploying systems trained under this paradigm in high-stakes applications. While the existing literature focuses on maximizing group-balanced or worst-group accuracy, estimating these accuracies is hindered by costly bias annotations. This study contends that current bias-unsupervised approaches to group robustness continue to rely on group information to achieve optimal performance. Firstly, these methods implicitly assume that all group combinations are represented during training. To illustrate this, we introduce a systematic generalization task on the MPI3D dataset and discover that current algorithms fail to improve the ERM baseline when combinations of observed attribute values are missing. Secondly, bias labels are still crucial for effective model selection, restricting the practicality of these methods in real-world scenarios. To address these limitations, we propose a revised methodology for training and validating debiased models in an entirely bias-unsupervised manner. We achieve this by employing pretrained self-supervised models to reliably extract bias information, which enables the integration of a logit adjustment training loss with our validation criterion. Our empirical analysis on synthetic and real-world tasks provides evidence that our approach overcomes the identified challenges and consistently enhances robust accuracy, attaining performance which is competitive with or outperforms that of state-of-the-art methods, which, conversely, rely on bias labels for validation.

Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion

paper_url: http://arxiv.org/abs/2310.18554
repo_url: None
paper_authors: Junghyun Lee, Se-Young Yun, Kwang-Sung Jun
for: 这个论文的目的是提高Logistic Bandit模型中的依赖关系，尤其是在大型数据集 ($S \geq d$) 时。
methods: 这个论文使用了一种新的方法 called “regret-to-confidence set conversion” (R2CS)，用于改进logistic bandit的 regret bound。
results: 通过使用R2CS方法，这个论文得到了一个优化的 regret bound，具有更好的依赖关系于$S$，同时保留计算可行性和其他因素($d$和$T$)的依赖关系。

Abstract
Logistic bandit is a ubiquitous framework of modeling users' choices, e.g., click vs. no click for advertisement recommender system. We observe that the prior works overlook or neglect dependencies in $S \geq \lVert \theta_\star \rVert_2$, where $\theta_\star \in \mathbb{R}^d$ is the unknown parameter vector, which is particularly problematic when $S$ is large, e.g., $S \geq d$. In this work, we improve the dependency on $S$ via a novel approach called {\it regret-to-confidence set conversion (R2CS)}, which allows us to construct a convex confidence set based on only the \textit{existence} of an online learning algorithm with a regret guarantee. Using R2CS, we obtain a strict improvement in the regret bound w.r.t. $S$ in logistic bandits while retaining computational feasibility and the dependence on other factors such as $d$ and $T$. We apply our new confidence set to the regret analyses of logistic bandits with a new martingale concentration step that circumvents an additional factor of $S$. We then extend this analysis to multinomial logistic bandits and obtain similar improvements in the regret, showing the efficacy of R2CS. While we applied R2CS to the (multinomial) logistic model, R2CS is a generic approach for developing confidence sets that can be used for various models, which can be of independent interest.

摘要
“带有搜索问题的游戏”（Logistic Bandit）是一个普遍的框架，用于模型用户选择，例如广告追踪系统中的点击vs无点击。我们发现先前的研究往往忽略或忽略了$S \geq \lVert \theta_\star \rVert_2$中的相互依赖，尤其当$S$较大时（例如$S \geq d$）。在这个工作中，我们通过一种新的方法called“ regret-to-confidence set conversion”（R2CS），将可以建立基于仅存在线上学习算法的 regret guarantee的凸信心集。使用R2CS，我们得到了对$S$的 regret bound的严格改进，同时保持了 Computational Feasibility和其他因素（例如$d$和$T$）的依赖。我们将新的信心集应用到了带有新 martingale concentration step 的 regret分析中，从而缺少一个 $S$ 的额外因素。然后，我们将这些分析扩展到多ategorical logistic bandits，并获得了类似的改进。我们将R2CS应用到（多ategorical） logistic模型，但R2CS是一个更通用的方法，可以用于不同的模型，这可能是独立的兴趣。

The Role of Reference Points in Machine-Learned Atomistic Simulation Models

paper_url: http://arxiv.org/abs/2310.18552
repo_url: None
paper_authors: Xiangyun Lei, Weike Ye, Joseph Montoya, Tim Mueller, Linda Hung, Jens Hummelshoej
for: 本研究提出了一种新的化学环境模型理论（CEMT），用于超越传统的基于原子 Machine Learning Force Field（MLFF）模型，广泛用于化学系统的分子动力学 simulations。
methods: 本研究使用了 Gaussian Multipole（GMP）函数来Feature化不同参考点集，包括finite difference grid-centered和bond-centered模型，以分析不同参考点集的能量预测精度、预测速度和学习效率。
results: 研究发现，使用非原子参考点可以提高力训练的灵活性和适应性，并且可以提高预测精度、预测速度和学习效率。此外，本研究还建立了 CEMT 与 real-space orbital-free finite element Density Functional Theory（FE-DFT）之间的联系，并表明了这种联系可以提高数据效率和稳定性。

Abstract
This paper introduces the Chemical Environment Modeling Theory (CEMT), a novel, generalized framework designed to overcome the limitations inherent in traditional atom-centered Machine Learning Force Field (MLFF) models, widely used in atomistic simulations of chemical systems. CEMT demonstrated enhanced flexibility and adaptability by allowing reference points to exist anywhere within the modeled domain and thus, enabling the study of various model architectures. Utilizing Gaussian Multipole (GMP) featurization functions, several models with different reference point sets, including finite difference grid-centered and bond-centered models, were tested to analyze the variance in capabilities intrinsic to models built on distinct reference points. The results underscore the potential of non-atom-centered reference points in force training, revealing variations in prediction accuracy, inference speed and learning efficiency. Finally, a unique connection between CEMT and real-space orbital-free finite element Density Functional Theory (FE-DFT) is established, and the implications include the enhancement of data efficiency and robustness. It allows the leveraging of spatially-resolved energy densities and charge densities from FE-DFT calculations, as well as serving as a pivotal step towards integrating known quantum-mechanical laws into the architecture of ML models.

摘要

Punica: Multi-Tenant LoRA Serving

paper_url: http://arxiv.org/abs/2310.18547
repo_url: https://github.com/punica-ai/punica
paper_authors: Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy
for: 这个论文是为了提出一个名为Punica的系统，用于在共享GPU集群中服务多个低阶适应（LoRA）模型。
methods: 这个系统使用了一个新的CUDA核心设计，允许不同LoRA模型的批处理操作在GPU上混合进行，这使得GPU只需要储存一个基础预训练模型，从而大幅提高GPU的内存和计算效率。
results: 根据评估结果，Punica在共享GPU集群中服务多个LoRA模型时，与现有的LLM服务系统相比，可以实现12倍的throughput提高，仅加2毫秒迟延性每个字。Punica的源代码可以在https://github.com/punica-ai/punica上获取。

Abstract
Low-rank adaptation (LoRA) has become an important and popular method to adapt pre-trained models to specific domains. We present Punica, a system to serve multiple LoRA models in a shared GPU cluster. Punica contains a new CUDA kernel design that allows batching of GPU operations for different LoRA models. This allows a GPU to hold only a single copy of the underlying pre-trained model when serving multiple, different LoRA models, significantly enhancing GPU efficiency in terms of both memory and computation. Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster. With a fixed-sized GPU cluster, our evaluations show that Punica achieves 12x higher throughput in serving multiple LoRA models compared to state-of-the-art LLM serving systems while only adding 2ms latency per token. Punica is open source at https://github.com/punica-ai/punica .

摘要
低阶尝试（LoRA）已成为特定领域适应模型的重要和受欢迎方法。我们介绍了一个名为“Punica”的系统，用于在共享GPU集群中服务多个LoRA模型。Punica包含一个新的CUDA内核设计，允许不同LoRA模型的GPU操作批处理。这意味着GPU只需要存储一份基于预训练模型的底层模型，可以大幅提高GPU的内存和计算效率。我们的调度器将多家租户的LoRA服务工作负载在共享GPU集群中卷积。与现状的LLM服务系统相比，我们的Punica在服务多个LoRA模型时实现了12倍的throughput，同时只增加2毫秒的延迟每个字。Punica的源代码可以在上下载。

End-to-end Feature Selection Approach for Learning Skinny Trees

paper_url: http://arxiv.org/abs/2310.18542
repo_url: None
paper_authors: Shibal Ibrahim, Kayhan Behdin, Rahul Mazumder
for:这篇论文的目的是提出一种同时进行特征选择和树ensemble学习的工具包，以提高树ensemble模型的性能和可读性。methods:这篇论文使用了一种综合优化方法，包括特征选择和树ensemble学习，并且使用了分组L0-L2正则化来实现特征选择。results:这篇论文在15个 sintetic和实际世界数据集上实现了特征压缩率为1.5倍至620倍，并且在某些情况下可以达到10倍的推理速度提升，而无需失去性能。此外，这篇论文的特征选择方法也超过了许多现有的工具包，例如LightGBM和Random Forests，在特定的特征预算下（25%），Skinny Trees可以提高AUC性能，比LightGBM提高10.2%（最高达37.7%），比Random Forests提高3%（最高达12.5%）。

Abstract
Joint feature selection and tree ensemble learning is a challenging task. Popular tree ensemble toolkits e.g., Gradient Boosted Trees and Random Forests support feature selection post-training based on feature importances, which are known to be misleading, and can significantly hurt performance. We propose Skinny Trees: a toolkit for feature selection in tree ensembles, such that feature selection and tree ensemble learning occurs simultaneously. It is based on an end-to-end optimization approach that considers feature selection in differentiable trees with Group $\ell_0 - \ell_2$ regularization. We optimize with a first-order proximal method and present convergence guarantees for a non-convex and non-smooth objective. Interestingly, dense-to-sparse regularization scheduling can lead to more expressive and sparser tree ensembles than vanilla proximal method. On 15 synthetic and real-world datasets, Skinny Trees can achieve $1.5\times$ - $620\times$ feature compression rates, leading up to $10\times$ faster inference over dense trees, without any loss in performance. Skinny Trees lead to superior feature selection than many existing toolkits e.g., in terms of AUC performance for $25\%$ feature budget, Skinny Trees outperforms LightGBM by $10.2\%$ (up to $37.7\%$), and Random Forests by $3\%$ (up to $12.5\%$).

摘要
共同特征选择和树集合学习是一项具有挑战性的任务。常见的树集合工具包，例如梯度拟合树和随机森林，支持基于特征重要性的特征选择，这些特征选择是已知会导致性能下降的。我们提出了瘦树：一个特征选择在树集合学习中同时进行的工具包。它基于一个端到端优化方法，考虑特征选择在分子树中的分支权重 regularization。我们使用一种第一个贝叶幂方法并提供了对非对称和非均匀目标函数的收敛保证。有趣的是， dense-to-sparse 规则调度可以导致更具表达力和更稀疏的树集合，而不是原始的 proximal 方法。在 15 个 synthetic 和实际世界数据集上，瘦树可以实现 $1.5\times$ - $620\times$ 特征压缩率，导致 $10\times$ 更快的推理速度，而无损性能。瘦树在多个现有工具包中的特征选择表现更佳，例如在 $25\%$ 特征预算下，瘦树可以跟上 LightGBM 的 $10.2\%$ (最高 $37.7\%$)，并跟上 Random Forests 的 $3\%$ (最高 $12.5\%$)。

2023-10-28

eess.IV

eess.IV - 2023-10-28

Tracking and fast imaging of a translational object via Fourier modulation

paper_url: http://arxiv.org/abs/2310.18732
repo_url: None
paper_authors: Shijian Li, Xu-ri Yao, Wei Zhang, Yeliang Wang, Qing Zhao
for: 高速运动物体的追踪和图像化，具有各种应用领域的应用前景。
methods: 运用单ixel图像技术进行进程式捕捉高速运动物体，通过动作补偿以获得更好的图像质量。
results: 方法可以同时具有短的重建时间和高质量图像，并且可以实现对小物体的佳化追踪和边缘检测。

Abstract
The tracking and imaging of high-speed moving objects hold significant promise for application in various fields. Single-pixel imaging enables the progressive capture of a fast-moving translational object through motion compensation. However, achieving a balance between a short reconstruction time and a good image quality is challenging. In this study, we present a approach that simultaneously incorporates position encoding and spatial information encoding through the Fourier patterns. The utilization of Fourier patterns with specific spatial frequencies ensures robust and accurate object localization. By exploiting the properties of the Fourier transform, our method achieves a remarkable reduction in time complexity and memory consumption while significantly enhancing image quality. Furthermore, we introduce an optimized sampling strategy specifically tailored for small moving objects, significantly reducing the required dwell time for imaging. The proposed method provides a practical solution for the real-time tracking, imaging and edge detection of translational objects, underscoring its considerable potential for diverse applications.

摘要
高速移动物体的跟踪和成像具有广泛的应用前景。单像素成像可以逐步捕捉fast-moving translational object，通过运动补偿来实现。但是，实现一个好的图像质量和重建时间短的 equilibrio却是挑战。在这种研究中，我们提出了一种方法，同时 incorporates position encoding和空间信息编码通过干扰Patterns。通过利用干扰transform的特性，我们的方法可以remarkably reduce时间复杂度和内存占用，同时显著提高图像质量。此外，我们还提出了专门为小 objetes introduce an optimized sampling strategy, significantly reducing the required dwell time for imaging.该方法提供了一个实用的解决方案，可以实时跟踪、成像和Edge detection of translational objects，强调其广泛的应用前景。

2023-10-28

eess.SP

eess.SP - 2023-10-28

Enhancing Epileptic Seizure Detection with EEG Feature Embeddings

paper_url: http://arxiv.org/abs/2310.18767
repo_url: None
paper_authors: Arman Zarei, Bingzhao Zhu, Mahsa Shoaran
For: The paper aims to improve the performance of seizure detection systems using EEG signals by learning informative embeddings of the signals.* Methods: The proposed method converts raw EEG signals to appropriate embeddings, which are beneficial for various machine learning models.* Results: The proposed approach achieves significant improvements in sensitivity, specificity, and AUC score across multiple models, with a state-of-the-art classification performance of 100% sensitivity and 99% specificity.Here is the same information in Simplified Chinese text:* For: 这篇论文目的是使用EEG信号提高癫痫检测系统的性能。* Methods: 提议的方法是将原始EEG信号转换为有用的嵌入，这些嵌入对多种机器学习模型都是有利的。* Results: 提议的方法在多个模型上实现了显著提高的敏感性、特异性和AUC分数，并达到了新的顶峰性，即100%的敏感度和99%的特异度。

Abstract
Epilepsy is one of the most prevalent brain disorders that disrupts the lives of millions worldwide. For patients with drug-resistant seizures, there exist implantable devices capable of monitoring neural activity, promptly triggering neurostimulation to regulate seizures, or alerting patients of potential episodes. Next-generation seizure detection systems heavily rely on high-accuracy machine learning-based classifiers to detect the seizure onset. Here, we propose to enhance the seizure detection performance by learning informative embeddings of the EEG signal. We empirically demonstrate, for the first time, that converting raw EEG signals to appropriate embeddings can significantly boost the performance of seizure detection algorithms. Importantly, we show that embedding features, which converts the raw EEG into an alternative representation, is beneficial for various machine learning models such as Logistic Regression, Multi-Layer Perceptron, Support Vector Machines, and Gradient Boosted Trees. The experiments were conducted on the CHB-MIT scalp EEG dataset. With the proposed EEG feature embeddings, we achieve significant improvements in sensitivity, specificity, and AUC score across multiple models. By employing this approach alongside an SVM classifier, we were able to attain state-of-the-art classification performance with a sensitivity of 100% and specificity of 99%, setting a new benchmark in the field.

摘要
Translated into Simplified Chinese:epsilepsy 是全球范围内最常见的脑部疾病之一，影响了数百万人。为了治疗这些药物抵抗性的癫痫病人，存在可以监测神经活动，迅速诱发神经刺激来调节癫痫的嵌入式设备。未来的癫痫检测系统几乎完全依赖于高精度机器学习模型来检测癫痫开始。在这里，我们提议通过学习有用的嵌入来增强癫痫检测性能。我们实际地示证，对于第一次癫痫检测，将raw EEG信号转换为合适的嵌入可以显著提高癫痫检测算法的性能。此外，我们还证明了嵌入特征可以为不同的机器学习模型，如Logistic Regression、Multi-Layer Perceptron、Support Vector Machines和Gradient Boosted Trees等提供有利。实验在CHB-MIT皮帽EEG数据集上进行。通过我们的EEG特征嵌入，我们在多个模型上实现了显著的改善，包括敏感性、特异性和AUC分数。通过与SVM分类器结合使用，我们实现了当前领域的最佳分类性能，敏感性为100%，特异性为99%。

Cluster-Based Cell-Free Massive MIMO Systems: A Novel Framework to Enhance Spectral Efficiency with Low Complexity

paper_url: http://arxiv.org/abs/2310.18734
repo_url: None
paper_authors: Reza Roshanghias, Reza Saadat
for: This paper aims to improve the spectral efficiency (SE) of distributed cell-free massive MIMO (CF-mMIMO) systems by proposing a novel cluster-based architecture.methods: The proposed cluster-based structure combines centralized and distributed configurations, with local precoders formed using collectively shared CSI within each cluster. The MMSE precoding technique is used to achieve optimal SE performance.results: The simulation results show that the proposed cluster-based framework achieves a significantly augmented SE compared to the distributed architecture, with the optimal SE attained using four clusters and the MMSE precoding technique. The computational complexity is reduced by over 85%. Additionally, the proposed approach outperforms the centralized structure in terms of SE.Here is the text in Simplified Chinese:for: 本文目的是提高分布式cell-free大MIMO系统的spectral efficiency（SE）。methods: 提议的集群结构 combinest中央化和分布式配置，通过每个集群内CSI的共享来形成本地预编码器。使用MMSE预编码技术来实现优化的SE性能。results: 仪表结果表明，提议的集群结构相比分布式结构，可以获得显著提高的SE性能，最佳SE性能在四个集群和MMSE预编码技术下实现，计算复杂度下降超过85%。此外，提议的方法还超越了中央结构的SE性能。

Abstract
The issue of diminished spectral efficiency (SE) of the downlink (DL) transmission in distributed cell-free massive MIMO (CF-mMIMO) systems poses a significant challenge in terms of user equipment (UE) performance when compared to their centralized CF-mMIMO counterparts. The primary root cause of this issue can be attributed to the reduced efficacy of distributed precoders, which are devised using local channel state information (CSI) in distributed systems. This reduced efficacy becomes particularly pronounced in terms of interference mitigation when compared to centralized precoders. To address this issue, this paper proposes a novel architectural framework for CF-mMIMO systems, referred to herein as the "cluster-based structure." Within this innovative structure, a hybrid amalgamation of centralized and distributed configurations is employed, complemented by the introduction of a unique cluster arrangement for the access points (APs) within the network. In this design, the CSI of APs within each cluster is collectively shared within a local processor unit. Consequently, by harnessing this enhanced repository of local channel information, local precoders are formulated, which facilitate more effective interference mitigation with reduced computational complexity compared to the centralized approach. This approach ultimately results in a significantly augmented SE when contrasted with the distributed architecture. The simulation results unequivocally demonstrate that within the cluster-based framework, the optimal SE for the network is attained when utilizing four clusters in conjunction with the MMSE precoding technique, leading to a notable reduction in computational complexity exceeding 85%. Importantly, this approach surpasses the SE performance of the centralized structure.

摘要
分布式Cell-free巨观MIMO系统（CF-mMIMO）的下行传输（DL） Spectral Efficiency（SE）受到了明显的挑战，用户设备（UE）性能与中央CF-mMIMO对照之下显著下降。主要的根本原因在于分布式预编器的效果减退，这些预编器基于分布式系统中的本地频道状态信息（CSI）设计。在分布式系统中，这种减退的效果特别明显在干扰抑制方面，与中央预编器相比。为解决这一问题，本文提出了一种新的建筑框架，称为“分区结构”。在这种新的架构中， hybrid化中央和分布式配置是使用，并在网络中的APs之间创建了特有的帧排序。在这个设计中，APs在每个分区内的CSI是集中共享在本地处理单元中。通过利用这些增强的本地频道信息，本地预编器是计算更有效的干扰抑制，比中央方法更加简单。这种方法最终导致了分布式系统中的SE显著增加，与分布式架构相比，SE性能得到了显著提高。实验结果表明，在分区结构中，使用四个分区并与MMSE预编器技术相结合，可以获得网络的最佳SE，计算复杂度超过85%。这种方法还超越了中央结构的SE性能。

Two-stage space construction for real-time modeling of distributed parameter systems under sparse sensing

paper_url: http://arxiv.org/abs/2310.18670
repo_url: None
paper_authors: Peng Wei
for: This paper is written for real-time modeling of distributed parameter systems (DPSs) in cases of limited sensors.
methods: The paper introduces a two-stage spatial construction approach that uses a discrete space-completion method to recuperate spatiotemporal patterns of non-monitored locations, followed by the use of high-dimensional space construction methods to derive continuous spatial basis functions (SBFs). The nonlinear temporal model is identified and adjusted via long short-term memory (LSTM) neural networks.
results: The paper demonstrates the efficacy of the proposed modeling technique under sparse sensing using experimental tests conducted on a pouch-type Li-ion battery. The results show that the use of a cubic B-spline surface is an effective solution for optimizing space construction in the sense of least squares approximation.

Abstract
Numerous industrial processes can be defined using distributed parameter systems (DPSs). This study introduces a two-stage spatial construction approach for real-time modeling of DPSs in cases of limited sensors. Initially, a discrete space-completion approach is created to recuperate the spatiotemporal patterns of non-monitored locations under sparse sensing. The high-dimensional space construction method is employed to derive continuous spatial basis functions (SBFs). The identification and adjustment of the nonlinear temporal model are carried out via the long short-term memory (LSTM) neural network. Eventually, the amalgamation of the derived SBFs and temporal model results in a spatially continuous model. The use of a cubic B-spline surface is validated as an effective solution for optimizing space construction in the sense of least squares approximation. Experimental tests conducted on a pouch-type Li-ion battery demonstrate the efficacy of the proposed modeling technique under sparse sensing. This work highlights the promise of sparse sensors in real-time full-space modeling for large-scale battery energy storage systems.

摘要
许多工业过程可以使用分布参数系统（DPS）进行定义。本研究提出了一种两Stage空间建构方法，用于实时模拟DPS，并且在有限感知的情况下进行模拟。首先，一种精简空间完成方法被创建，以恢复不监测区域的空间时间模式。然后，高维空间建构方法被应用，以 derivate连续空间基函数（SBF）。非线性时间模型的标识和调整由长Short-Term记忆神经网络（LSTM）完成。最后， derivate的 SBFs 和时间模型的结合，得到了一个连续空间模型。实验表明，使用立方BSpline面的方法可以有效地优化空间建构，从 least squares 的角度来看。这种方法在磁力牵引Li-ion电池的实验中得到了证明，并且表明了有限感知的感知器在实时全空间模型化方面的承诺。

Joint Localization and Communication Enhancement in Uplink Integrated Sensing and Communications System with Clock Asynchronism

paper_url: http://arxiv.org/abs/2310.18630
repo_url: None
paper_authors: Xu Chen, XinXin He, Zhiyong Feng, Zhiqing Wei, Qixun Zhang, Xin Yuan, Ping Zhang
for: 提高单基站的射频定位和通信可靠性
methods: 利用多信号分类（MUSIC）基于抽象谱（AoA）估计，并在AoA估计中加入信号增强（CSI）估计，以消除额外复杂度。
results: 可以减少时钟偏移（TO）相关的频率变化引起的影响，并且可以实现单基站的射频定位。在实验中，提出的方案可以与最小二乘均方差（MMSE）CSI估计具有相同的比特错误率性能，并且可以提高射频定位的均方差Error（MSE）约8分质量单位。

Abstract
In this paper, we propose a joint single-base localization and communication enhancement scheme for the uplink (UL) integrated sensing and communications (ISAC) system with asynchronism, which can achieve accurate single-base localization of user equipment (UE) and significantly improve the communication reliability despite the existence of timing offset (TO) due to the clock asynchronism between UE and base station (BS). Our proposed scheme integrates the CSI enhancement into the multiple signal classification (MUSIC)-based AoA estimation and thus imposes no extra complexity on the ISAC system. We further exploit a MUSIC-based range estimation method and prove that it can suppress the time-varying TO-related phase terms. Exploiting the AoA and range estimation of UE, we can estimate the location of UE. Finally, we propose a joint CSI and data signals-based localization scheme that can coherently exploit the data and the CSI signals to improve the AoA and range estimation, which further enhances the single-base localization of UE. The extensive simulation results show that the enhanced CSI can achieve equivalent bit error rate performance to the minimum mean square error (MMSE) CSI estimator. The proposed joint CSI and data signals-based localization scheme can achieve decimeter-level localization accuracy despite the existing clock asynchronism and improve the localization mean square error (MSE) by about 8 dB compared with the maximum likelihood (ML)-based benchmark method.

摘要
在这篇论文中，我们提出了一种同时进行单基地位置定位和通信增强方案，用于下降链（UL）结合感知通信（ISAC）系统中的异步问题，可以实现精确的单基地位置定位和提高通信可靠性，即使存在时钟偏移（TO）。我们的提议方案将增强因子（CSI）增强纳入多个信号分类（MUSIC）基于投射角（AoA）估计中，从而不增加ISAC系统的复杂性。我们进一步利用MUSIC基于距离估计方法，并证明它可以抑制时间变化的TO相关阶跃项。通过UE的AoA和距离估计，我们可以估计UE的位置。最后，我们提议一种同时使用CSI和数据信号的位置定位方案，可以具有协调利用数据和CSI信号来提高AoA和距离估计的优点，进而提高单基地位置定位精度。实验结果表明，提高CSI可以实现与最小平均方差（MMSE）CSI估计器相同的错误率性能。我们的联合CSI和数据信号基于位置定位方案可以在存在时钟偏移的情况下实现厘米级位置定位精度，并提高位置估计均方差（MSE）约8分贝比对最大likelihood（ML）参考方法。

A Generalized Statistical Model for THz wireless Channel with Random Atmospheric Absorption

paper_url: http://arxiv.org/abs/2310.18616
repo_url: None
paper_authors: Pranay Bhardwaj, S. M. Zafaruddin
for: 这个论文是为了研究和模型TERAHERTZ（THz）无线通信频率范围内的信号媒体特性和损害，以及这些损害对连接性和可靠性的影响。
methods: 这篇论文使用了γ分布来描述分子吸收率的Random Path-Loss，以及α-η-κ-μ分布来描述短期干扰。此外，论文还考虑了天线偏倾错误和接收机硬件缺陷。
results: 论文通过fox的H函数来描述信道障碍的共同统计效应，并分析了THz链路的失业概率，以证明提出的通用模型的分析可行性。 computer simulations也用于证明该模型在性能评估中的效果。

Abstract
Current statistical channel models for Terahertz (THz) wireless communication primarily concentrate on the sub-THz band, mostly with $\alpha$-$\mu$ and Gaussian mixture fading distributions for short-term fading and deterministic modeling for atmospheric absorption. In this paper, we develop a generalized statistical model for signal propagation at THz frequencies considering random path-loss employing Gamma distribution for the molecular absorption coefficient, short-term fading characterized by the $\alpha$-$\eta$-$\kappa$-$\mu$ distribution, antenna misalignment errors, and transceiver hardware impairments. The proposed model can handle various propagation scenarios, including indoor and outdoor environments, backhaul/fronthaul situations, and complex urban settings. Using Fox's H-functions, we present the probability density function (PDF) and cumulative distribution function (CDF) that capture the combined statistical effects of channel impairments. We analyze the outage probability of a THz link to demonstrate the analytical tractability of the proposed generalized model. We present computer simulations to demonstrate the efficacy of the proposed model for performance assessment with the statistical effect of atmospheric absorption.

摘要
当前的天 Harrison（THz）无线通信频率模型主要集中在Sub-THz频段，通常使用α-μ和高斯混合折射分布来描述短期折射和大气吸收的 deterministic 模型。在本文中，我们开发了一种通用的天 Harrison（THz）信号卫星传播模型，考虑了随机路径损失，使用γ分布来描述分子吸收系数，短期折射由α-η-κ-μ分布 characterize，天线误差、发射机硬件不良等因素。该模型可以处理不同的传播enario，包括室内和室外环境，后向/前向 Situation，复杂的城市场景。使用Fox的H函数，我们提供了PDF和CDF，这些函数捕捉了天 Harrison（THz）通信频率的共同统计效应。我们分析了THz链接的失业概率，以示analytical tractability of the proposed generalized model。我们通过计算机实验证明了提案的模型在性能评估中的有效性，并且演示了天 Harrison（THz）通信频率的统计效应。

2023-10-29

From Chatbots to PhishBots? – Preventing Phishing scams created using ChatGPT, Google Bard and Claude

Robustifying Language Models with Test-Time Adaptation

Poisoning Retrieval Corpora by Injecting Adversarial Passages

BERT Lost Patience Won’t Be Robust to Adversarial Slowdown

Learning to Follow Object-Centric Image Editing Instructions Faithfully

Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender

Unified Representation for Non-compositional and Compositional Expressions

PACuna: Automated Fine-Tuning of Language Models for Particle Accelerators

Pushdown Layers: Encoding Recursive Structure in Transformer Language Models

A Survey on Recent Named Entity Recognition and Relation Classification Methods with Focus on Few-Shot Learning Approaches

ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic

SALMA: Arabic Sense-Annotated Corpus and WSD Benchmarks

LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection

S2F-NER: Exploring Sequence-to-Forest Generation for Complex Entity Recognition

Retrofitting Light-weight Language Models for Emotions using Supervised Contrastive Learning

Sentence Bag Graph Formulation for Biomedical Distant Supervision Relation Extraction

Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition

MUST: A Multilingual Student-Teacher Learning approach for low-resource speech recognition

Counterfactually Probing Language Identity in Multilingual Models

2023-10-29

Improved Motor Imagery Classification Using Adaptive Spatial Filters Based on Particle Swarm Optimization Algorithm

Enhancing Motor Imagery Decoding in Brain Computer Interfaces using Riemann Tangent Space Mapping and Cross Frequency Coupling

Conformal Normalization in Recurrent Neural Network of Grid Cells

The Power of Explainability in Forecast-Informed Deep Learning Models for Flood Mitigation

RAIFLE: Reconstruction Attacks on Interaction-based Federated Learning with Active Data Manipulation

Transfer Learning in Transformer-Based Demand Forecasting For Home Energy Management System

Real-World Implementation of Reinforcement Learning Based Energy Coordination for a Cluster of Households

MAG-GNN: Reinforcement Learning Boosted Graph Neural Network

Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations

Software engineering for deep learning applications: usage of SWEng and MLops tools in GitHub repositories

Proving Linear Mode Connectivity of Neural Networks via Optimal Transport

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Bridging the Gap: Towards an Expanded Toolkit for ML-Supported Decision-Making in the Public Sector

Efficient Cluster Selection for Personalized Federated Learning: A Multi-Armed Bandit Approach

Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming

Evaluating LLP Methods: Challenges and Approaches

Revisiting the Learnability of Apple Tasting

Feature Aggregation in Joint Sound Classification and Localization Neural Networks

Escaping Saddle Points in Heterogeneous Federated Learning via Distributed SGD with Communication Compression

Object-centric architectures enable efficient causal representation learning

Datasets and Benchmarks for Nanophotonic Structure and Parametric Design Simulations

Differentially Private Permutation Tests: Applications to Kernel Methods

On Linear Separation Capacity of Self-Supervised Representation Learning

Machine Learning for the identification of phase-transitions in interacting agent-based systems

Does Invariant Graph Learning via Environment Augmentation Learn Invariance?

An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits

Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback

Behavior Alignment via Reward Function Optimization

Kernel-based Joint Multiple Graph Learning and Clustering of Graph Signals

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

TRIAGE: Characterizing and auditing training data for improved regression

Playing in the Dark: No-regret Learning with Adversarial Constraints

Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data

Remaining Useful Life Prediction of Lithium-ion Batteries using Spatio-temporal Multimodal Attention Networks

Hyperbolic Graph Neural Networks at Scale: A Meta Learning Approach

Estimating the Rate-Distortion Function by Wasserstein Gradient Descent

Topological, or Non-topological? A Deep Learning Based Prediction

Learning Subgrid-Scale Models in Discontinuous Galerkin Methods with Neural Ordinary Differential Equations for Compressible Navier–Stokes Equations

D2NO: Efficient Handling of Heterogeneous Input Function Spaces with Distributed Deep Neural Operators

A foundational neural operator that continuously learns without forgetting

Simple and Asymmetric Graph Contrastive Learning without Augmentations

Correlation Aware Sparsified Mean Estimation Using Random Projection

Peer-to-Peer Deep Learning for Beyond-5G IoT

Bayes beats Cross Validation: Efficient and Accurate Ridge Regression via Expectation Maximization

SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models

2023-10-29

Subjective Quality Evaluation of Point Clouds Using a Head Mounted Display

Transport-of-Intensity Model for Single-Mask X-ray Differential Phase Contrast Imaging

2023-10-29

Optical STAR-RIS-Aided VLC Systems: RSMA Versus NOMA

2023-10-28

Deep Learning-based Compressed Domain Multimedia for Man and Machine: A Taxonomy and Application to Point Cloud Classification

INCODE: Implicit Neural Conditioning with Prior Knowledge Embeddings

Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models

UniCat: Crafting a Stronger Fusion Baseline for Multimodal Re-Identification

A Review on the Applications of Machine Learning for Tinnitus Diagnosis Using EEG Signals

PrObeD: Proactive Object Detection Wrapper

CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data

Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling