cs.CL - 2023-08-07

Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench

  • paper_url: http://arxiv.org/abs/2308.03656
  • repo_url: https://github.com/cuhk-arise/emotionbench
  • paper_authors: Jen-tse Huang, Man Ho Lam, Eric John Li, Shujie Ren, Wenxuan Wang, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu
  • for: Evaluating the empathy ability of Large Language Models (LLMs), i.e., how their reported feelings change when they are presented with specific situations.
  • methods: Drawing on emotion appraisal theory from psychology, the authors collect a dataset of over 400 situations proven effective at eliciting eight emotions, run a human evaluation over them, and test five LLMs covering both commercial and open-source models of different sizes (see the sketch below).
  • results: LLMs can respond appropriately to certain situations, but they are not fully aligned with human emotional behavior and cannot establish connections between similar situations.
    Abstract Recently, the community has witnessed the advancement of Large Language Models (LLMs), which have shown remarkable performance on various downstream tasks. Led by powerful models like ChatGPT and Claude, LLMs are revolutionizing how users engage with software, assuming more than mere tools but intelligent assistants. Consequently, evaluating LLMs' anthropomorphic capabilities becomes increasingly important in contemporary discourse. Utilizing the emotion appraisal theory from psychology, we propose to evaluate the empathy ability of LLMs, i.e., how their feelings change when presented with specific situations. After a careful and comprehensive survey, we collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study. Categorizing the situations into 36 factors, we conduct a human evaluation involving more than 1,200 subjects worldwide. With the human evaluation results as references, our evaluation includes five LLMs, covering both commercial and open-source models, including variations in model sizes, featuring the latest iterations, such as GPT-4 and LLaMA 2. A conclusion can be drawn from the results that, despite several misalignments, LLMs can generally respond appropriately to certain situations. Nevertheless, they fall short in alignment with the emotional behaviors of human beings and cannot establish connections between similar situations. Our collected dataset of situations, the human evaluation results, and the code of our testing framework, dubbed EmotionBench, is made publicly in https://github.com/CUHK-ARISE/EmotionBench. We aspire to contribute to the advancement of LLMs regarding better alignment with the emotional behaviors of human beings, thereby enhancing their utility and applicability as intelligent assistants.
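A minimal sketch of the kind of evaluation loop the abstract describes: elicit the model's self-reported emotions before and after presenting a situation, then compare the shift against human references. The scale items, the example situation, and `query_llm` are hypothetical placeholders, not the authors' actual prompts or scales.

```python
def query_llm(prompt: str) -> str:
    """Stand-in for a call to the LLM under test."""
    return "3"  # dummy rating so the sketch runs end-to-end

SCALE_ITEMS = ["anxious", "angry", "sad"]  # hypothetical emotion items
SITUATION = "You are told your flight home has been cancelled."

def rate_emotions(context: str) -> dict:
    """Ask the model to rate each emotion item on a 1-5 scale."""
    scores = {}
    for item in SCALE_ITEMS:
        prompt = (f"{context}\nOn a scale of 1 (not at all) to 5 (extremely), "
                  f"how {item} do you feel? Answer with a single number.")
        digits = [c for c in query_llm(prompt) if c.isdigit()]
        scores[item] = int(digits[0]) if digits else 3  # fall back to neutral
    return scores

before = rate_emotions("")                          # default emotional state
after = rate_emotions(f"Imagine: {SITUATION}")      # state after the situation
print({k: after[k] - before[k] for k in SCALE_ITEMS})  # shift to compare with human data
```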

KITLM: Domain-Specific Knowledge InTegration into Language Models for Question Answering

  • paper_url: http://arxiv.org/abs/2308.03638
  • repo_url: https://github.com/sakharamg/kitlm
  • paper_authors: Ankush Agarwal, Sakharam Gawade, Amar Prakash Azad, Pushpak Bhattacharyya
  • for: Improving domain-specific language understanding, i.e., the performance of language models in specialized fields such as aviation.
  • methods: A knowledge-infusion approach that integrates relevant knowledge from a knowledge base into the language model, improving performance and efficiency while reducing the required model size (see the sketch below).
  • results: KITLM outperforms both SKILL and GPT-3.5-turbo, achieving over 1.5 times improvement in exact match scores on MetaQA, with a similar boost on AeroQA in the aviation domain.
    Abstract Large language models (LLMs) have demonstrated remarkable performance in a wide range of natural language tasks. However, as these models continue to grow in size, they face significant challenges in terms of computational costs. Additionally, LLMs often lack efficient domain-specific understanding, which is particularly crucial in specialized fields such as aviation and healthcare. To boost the domain-specific understanding, we propose, KITLM, a novel knowledge base integration approach into language model through relevant information infusion. By integrating pertinent knowledge, not only the performance of the language model is greatly enhanced, but the model size requirement is also significantly reduced while achieving comparable performance. Our proposed knowledge-infused model surpasses the performance of both GPT-3.5-turbo and the state-of-the-art knowledge infusion method, SKILL, achieving over 1.5 times improvement in exact match scores on the MetaQA. KITLM showed a similar performance boost in the aviation domain with AeroQA. The drastic performance improvement of KITLM over the existing methods can be attributed to the infusion of relevant knowledge while mitigating noise. In addition, we release two curated datasets to accelerate knowledge infusion research in specialized fields: a) AeroQA, a new benchmark dataset designed for multi-hop question-answering within the aviation domain, and b) Aviation Corpus, a dataset constructed from unstructured text extracted from the National Transportation Safety Board reports. Our research contributes to advancing the field of domain-specific language understanding and showcases the potential of knowledge infusion techniques in improving the performance of language models on question-answering.
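The abstract attributes KITLM's gains to infusing pertinent knowledge while mitigating noise. The sketch below illustrates the general idea under simplifying assumptions: retrieve only the facts relevant to a question and feed them to the model alongside the question. The toy triples, overlap scoring, and prompt format are illustrative, not the actual KITLM pipeline.

```python
# Hypothetical knowledge base of (subject, relation, object) triples.
KNOWLEDGE_BASE = [
    ("pitot tube", "measures", "airspeed"),
    ("NTSB", "investigates", "civil aviation accidents"),
]

def retrieve_relevant(question: str, k: int = 2) -> list:
    """Score each triple by word overlap with the question and keep the top k."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(" ".join(t).lower().split())), t)
              for t in KNOWLEDGE_BASE]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [t for score, t in scored[:k] if score > 0]

def build_infused_prompt(question: str) -> str:
    """Serialize the retrieved triples and prepend them to the question."""
    facts = [f"{s} {r} {o}." for s, r, o in retrieve_relevant(question)]
    return "Known facts:\n" + "\n".join(facts) + f"\nQuestion: {question}\nAnswer:"

print(build_infused_prompt("What does a pitot tube measure?"))
```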

Negative Lexical Constraints in Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2308.03601
  • repo_url: None
  • paper_authors: Josef Jon, Dušan Variš, Michal Novák, João Paulo Aires, Ondřej Bojar
  • for: This paper explores negative lexical constraining in English-to-Czech neural machine translation, i.e., prohibiting the translation model from generating certain words or expressions.
  • methods: The authors compare methods based on modifying either the decoding process or the training data, evaluated on two tasks: paraphrasing and feedback-based translation refinement. They also study how these methods "evade" the given constraints (usually supplied in dictionary form) by generating a different surface form of a constrained word.
  • results: They propose training with stemmed negative constraints to counter the model's ability to produce surface-form variants that bypass a constraint (see the sketch below), and show that this improves constraining, although the problem still persists in many cases.
    Abstract This paper explores negative lexical constraining in English to Czech neural machine translation. Negative lexical constraining is used to prohibit certain words or expressions in the translation produced by the neural translation model. We compared various methods based on modifying either the decoding process or the training data. The comparison was performed on two tasks: paraphrasing and feedback-based translation refinement. We also studied to which extent these methods "evade" the constraints presented to the model (usually in the dictionary form) by generating a different surface form of a given constraint.We propose a way to mitigate the issue through training with stemmed negative constraints to counter the model's ability to induce a variety of the surface forms of a word that can result in bypassing the constraint. We demonstrate that our method improves the constraining, although the problem still persists in many cases.
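The mitigation described above applies stemming to negative constraints so that inflected surface forms of a banned word are also covered. The sketch below only illustrates the stem-matching idea with a crude suffix-stripping stand-in for a real Czech stemmer; it is not the paper's implementation.

```python
def toy_stem(word: str) -> str:
    """Strip a few common endings; a real system would use a language-specific stemmer."""
    word = word.lower()
    for suffix in ("ovi", "ech", "ami", "em", "u", "y", "a", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def violates_constraints(translation: str, negative_constraints: list) -> bool:
    """Return True if any word in the translation shares a stem with a constraint."""
    banned_stems = {toy_stem(c) for c in negative_constraints}
    return any(toy_stem(tok) in banned_stems for tok in translation.split())

# "domem" is a different surface form of the constrained word "domu",
# yet it is still caught once both are reduced to the shared stem.
print(violates_constraints("stáli před domem", ["domu"]))  # True under this toy stemmer
```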

WIKITIDE: A Wikipedia-Based Timestamped Definition Pairs Dataset

  • paper_url: http://arxiv.org/abs/2308.03582
  • repo_url: None
  • paper_authors: Hsuvas Borkakoty, Luis Espinosa-Anke
  • for: This work aims to improve how reliably language models can track changes in language and in the world, by extracting pairs of timestamped definitions from Wikipedia.
  • methods: The authors build the WikiTiDe dataset with a fully automatic end-to-end method that uses a bootstrapping algorithm to gradually refine a seed version into a high-quality dataset, which is then used to fine-tune models.
  • results: Bootstrapping the seed version of WikiTiDe leads to better fine-tuned models, and the fine-tuned models show promising results against competitive baselines on several downstream tasks.
    Abstract A fundamental challenge in the current NLP context, dominated by language models, comes from the inflexibility of current architectures to 'learn' new information. While model-centric solutions like continual learning or parameter-efficient fine tuning are available, the question still remains of how to reliably identify changes in language or in the world. In this paper, we propose WikiTiDe, a dataset derived from pairs of timestamped definitions extracted from Wikipedia. We argue that such resource can be helpful for accelerating diachronic NLP, specifically, for training models able to scan knowledge resources for core updates concerning a concept, an event, or a named entity. Our proposed end-to-end method is fully automatic, and leverages a bootstrapping algorithm for gradually creating a high-quality dataset. Our results suggest that bootstrapping the seed version of WikiTiDe leads to better fine-tuned models. We also leverage fine-tuned models in a number of downstream tasks, showing promising results with respect to competitive baselines.

Towards Controllable Natural Language Inference through Lexical Inference Types

  • paper_url: http://arxiv.org/abs/2308.03581
  • repo_url: None
  • paper_authors: Yingji Zhang, Danilo S. Carvalho, Ian Pratt-Hartmann, Andre Freitas
  • for: This paper aims to provide a mechanism for producing explanatory (abductive) inference chains that ground claims to their supporting premises.
  • methods: Prior work (EntailmentBank) employs the T5 model to directly generate an entailment tree that explains how the answer is inferred, but it lacks the ability to explain and control the generation of intermediate steps, which is crucial for multi-hop inference. The authors therefore propose a controlled natural language inference architecture for multi-premise explanatory inference: they define lexical inference types based on Abstract Meaning Representation (AMR) graphs and modify the T5 architecture to learn a latent sentence representation (T5 bottleneck) conditioned on this type information.
  • results: The paper delivers a dataset of approximately 5,000 annotated explanatory inference steps with well-grounded lexical-symbolic operations. Experimental results indicate that the inference typing induced at the T5 bottleneck can help T5 generate a conclusion under explicit control.
    Abstract Explainable natural language inference aims to provide a mechanism to produce explanatory (abductive) inference chains which ground claims to their supporting premises. A recent corpus called EntailmentBank strives to advance this task by explaining the answer to a question using an entailment tree \cite{dalvi2021explaining}. They employ the T5 model to directly generate the tree, which can explain how the answer is inferred. However, it lacks the ability to explain and control the generation of intermediate steps, which is crucial for the multi-hop inference process. In this work, we focus on proposing a controlled natural language inference architecture for multi-premise explanatory inference. To improve control and enable explanatory analysis over the generation, we define lexical inference types based on Abstract Meaning Representation (AMR) graph and modify the architecture of T5 to learn a latent sentence representation (T5 bottleneck) conditioned on said type information. We also deliver a dataset of approximately 5000 annotated explanatory inference steps, with well-grounded lexical-symbolic operations. Experimental results indicate that the inference typing induced at the T5 bottleneck can help T5 to generate a conclusion under explicit control.

Topological Interpretations of GPT-3

  • paper_url: http://arxiv.org/abs/2308.03565
  • repo_url: None
  • paper_authors: Tianyi Sun, Bradley Nelson
  • for: investigate a consistent method for deriving the correlation between sentence vector and semantic meaning of a sentence
  • methods: use three state-of-the-art word/sentence embedding methods (GPT-3, Word2Vec, and Sentence-BERT) to embed plain text sentence strings into high dimensional spaces, and compute the pairwise distance between any possible combination of two sentence vectors in an embedding space
  • results: observe correlations of the same sentence in different embedding spaces and correlations of different sentences in the same embedding space, which are consistent with the hypothesis and provide a foundation for further research
    Abstract This is an experiential study of investigating a consistent method for deriving the correlation between sentence vector and semantic meaning of a sentence. We first used three state-of-the-art word/sentence embedding methods including GPT-3, Word2Vec, and Sentence-BERT, to embed plain text sentence strings into high dimensional spaces. Then we compute the pairwise distance between any possible combination of two sentence vectors in an embedding space and map them into a matrix. Based on each distance matrix, we compute the correlation of distances of a sentence vector with respect to the other sentence vectors in an embedding space. Then we compute the correlation of each pair of the distance matrices. We observed correlations of the same sentence in different embedding spaces and correlations of different sentences in the same embedding space. These observations are consistent with our hypothesis and take us to the next stage.
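A small sketch of the procedure the abstract outlines: embed the same sentences in several spaces, compute pairwise distance matrices, and correlate the distance structures within and across spaces. The random "encoders" below are placeholders for the actual GPT-3, Word2Vec, and Sentence-BERT embeddings.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

sentences = ["The cat sat on the mat.", "A dog barked loudly.", "Stocks fell sharply today."]
rng = np.random.default_rng(0)

# Stand-ins for the real embedding methods; replace with actual encoders.
embed_fns = {
    "modelA": lambda sents: rng.normal(size=(len(sents), 64)),
    "modelB": lambda sents: rng.normal(size=(len(sents), 32)),
}

# One condensed pairwise-distance vector per embedding space.
dist = {name: pdist(fn(sentences), metric="euclidean") for name, fn in embed_fns.items()}

# Correlation between the distance structures of two embedding spaces:
# a high value means the spaces order sentence similarities the same way.
rho, _ = spearmanr(dist["modelA"], dist["modelB"])
print("cross-space distance correlation:", rho)

# Correlation of one sentence's distance profile against another's within a space.
D = squareform(dist["modelA"])
print("within-space profile correlation:", np.corrcoef(D[0], D[1])[0, 1])
```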

Mondrian: Prompt Abstraction Attack Against Large Language Models for Cheaper API Pricing

  • paper_url: http://arxiv.org/abs/2308.03558
  • repo_url: None
  • paper_authors: Wai Man Si, Michael Backes, Yang Zhang
  • for: This paper describes a new attack strategy against natural language processing (NLP) model APIs, the prompt abstraction attack, together with a simple and effective method to carry it out.
  • methods: The proposed method, Mondrian, abstracts (shortens) user queries to lower the cost of using LLM APIs. The adversary sets up a pseudo API with a lower price that acts as a proxy: it uses Mondrian to modify the user query, obtains the abstracted response from the target API, and forwards it back to the end user (see the sketch below).
  • results: Mondrian reduces the token length of user queries by 13% to 23% across tasks such as text classification, generation, and question answering, without significantly affecting the utility of task-specific and general language models like ChatGPT. It also shortens instruction prompts by at least 11% without compromising output quality, so the adversary can profit without bearing the cost of API development and deployment.
    Abstract The Machine Learning as a Service (MLaaS) market is rapidly expanding and becoming more mature. For example, OpenAI's ChatGPT is an advanced large language model (LLM) that generates responses for various queries with associated fees. Although these models can deliver satisfactory performance, they are far from perfect. Researchers have long studied the vulnerabilities and limitations of LLMs, such as adversarial attacks and model toxicity. Inevitably, commercial ML models are also not exempt from such issues, which can be problematic as MLaaS continues to grow. In this paper, we discover a new attack strategy against LLM APIs, namely the prompt abstraction attack. Specifically, we propose Mondrian, a simple and straightforward method that abstracts sentences, which can lower the cost of using LLM APIs. In this approach, the adversary first creates a pseudo API (with a lower established price) to serve as the proxy of the target API (with a higher established price). Next, the pseudo API leverages Mondrian to modify the user query, obtain the abstracted response from the target API, and forward it back to the end user. Our results show that Mondrian successfully reduces user queries' token length ranging from 13% to 23% across various tasks, including text classification, generation, and question answering. Meanwhile, these abstracted queries do not significantly affect the utility of task-specific and general language models like ChatGPT. Mondrian also reduces instruction prompts' token length by at least 11% without compromising output quality. As a result, the prompt abstraction attack enables the adversary to profit without bearing the cost of API development and deployment.
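A schematic of the proxy flow described in the abstract, under the assumption of a very naive abstraction rule (dropping filler words); the abstraction function and the target API client below are only stand-ins for the real Mondrian module and provider API.

```python
STOPWORDS = {"the", "a", "an", "of", "please", "could", "you", "kindly"}

def abstract_query(query: str) -> str:
    """Naive abstraction: drop filler words to shorten the prompt (toy stand-in for Mondrian)."""
    return " ".join(w for w in query.split() if w.lower() not in STOPWORDS)

def call_target_api(prompt: str) -> str:
    """Stand-in for the higher-priced target LLM API."""
    return f"[response to: {prompt}]"

def pseudo_api(user_query: str) -> str:
    """The adversary's cheaper proxy endpoint: abstract, forward, and relay the answer."""
    shortened = abstract_query(user_query)
    saved = len(user_query.split()) - len(shortened.split())
    print(f"forwarded {len(shortened.split())} words, saved {saved}")
    return call_target_api(shortened)

print(pseudo_api("Could you please classify the sentiment of the following review?"))
```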

Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue

  • paper_url: http://arxiv.org/abs/2308.03549
  • repo_url: https://github.com/suprityoung/zhongjing
  • paper_authors: Songhua Yang, Hanjie Zhao, Senbin Zhu, Guangyu Zhou, Hongfei Xu, Yuxiang Jia, Hongying Zan
  • for: This work aims to improve the ability of Large Language Models (LLMs) to understand and respond to questions in the Chinese medical domain.
  • methods: The authors build a LLaMA-based Chinese medical LLM trained with a full pipeline from pre-training to reinforcement learning with human feedback (RLHF), together with CMtMedQA, a Chinese multi-turn medical dialogue dataset of 70,000 authentic doctor-patient dialogues, to strengthen multi-turn dialogue and proactive inquiry initiation.
  • results: The model outperforms baselines across various capacities and matches ChatGPT in some abilities despite far less training data and far fewer parameters; RLHF further improves its instruction-following ability and safety.
    Abstract Recent advances in Large Language Models (LLMs) have achieved remarkable breakthroughs in understanding and responding to user intents. However, their performance lag behind general use cases in some expertise domains, such as Chinese medicine. Existing efforts to incorporate Chinese medicine into LLMs rely on Supervised Fine-Tuning (SFT) with single-turn and distilled dialogue data. These models lack the ability for doctor-like proactive inquiry and multi-turn comprehension and cannot always align responses with safety and professionalism experts. In this work, we introduce Zhongjing, the first Chinese medical LLaMA-based LLM that implements an entire training pipeline from pre-training to reinforcement learning with human feedback (RLHF). Additionally, we introduce a Chinese multi-turn medical dialogue dataset of 70,000 authentic doctor-patient dialogues, CMtMedQA, which significantly enhances the model's capability for complex dialogue and proactive inquiry initiation. We define a refined annotation rule and evaluation criteria given the biomedical domain's unique characteristics. Results show that our model outperforms baselines in various capacities and matches the performance of ChatGPT in a few abilities, despite having 50x training data with previous best model and 100x parameters with ChatGPT. RLHF further improves the model's instruction-following ability and safety.We also release our code, datasets and model for further research.

Knowledge-preserving Pruning for Pre-trained Language Models without Retraining

  • paper_url: http://arxiv.org/abs/2308.03449
  • repo_url: None
  • paper_authors: Seungcheol Park, Hojun Choi, U Kang
  • for: Compressing pre-trained language models without retraining.
  • methods: A knowledge-preserving structured pruning algorithm that identifies and prunes attention heads and neurons deemed superfluous, based on the amount of their inherent knowledge, followed by knowledge reconstruction for each sub-layer (see the sketch below).
  • results: Up to 58.02%p higher F1 score than existing retraining-free pruning algorithms at a high compression rate of 80% on the SQuAD benchmark.
    Abstract Given a pre-trained language model, how can we efficiently compress it without retraining? Retraining-free structured pruning algorithms are crucial in pre-trained language model compression due to their significantly reduced pruning cost and capability to prune large language models. However, existing retraining-free algorithms encounter severe accuracy degradation, as they fail to preserve the useful knowledge of pre-trained models. In this paper, we propose K-pruning (Knowledge-preserving pruning), an accurate retraining-free structured pruning algorithm for pre-trained language models. K-pruning identifies and prunes attention heads and neurons deemed to be superfluous, based on the amount of their inherent knowledge. K-pruning applies an iterative process of pruning followed by knowledge reconstruction for each sub-layer to preserve the knowledge of the pre-trained models. Consequently, K-pruning shows up to 58.02%p higher F1 score than existing retraining-free pruning algorithms under a high compression rate of 80% on the SQuAD benchmark.
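A toy sketch of retraining-free structured pruning in the spirit of the abstract: score attention heads and keep only the most "knowledgeable" ones. The score used here (mean output norm on a calibration set) is a simplified proxy, not the paper's knowledge measure, and the knowledge-reconstruction step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, calib_examples, head_dim = 12, 32, 64
# Pretend head outputs collected on a small calibration set: (heads, examples, dim).
head_outputs = rng.normal(size=(n_heads, calib_examples, head_dim))

def head_scores(outputs: np.ndarray) -> np.ndarray:
    """Proxy knowledge score per head: average L2 norm of its output."""
    return np.linalg.norm(outputs, axis=-1).mean(axis=-1)

def prune_mask(scores: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep the highest-scoring heads; return a boolean keep-mask."""
    k = max(1, int(round(keep_ratio * len(scores))))
    keep = np.argsort(scores)[-k:]
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask

scores = head_scores(head_outputs)
mask = prune_mask(scores, keep_ratio=0.2)   # e.g., aggressive compression of heads
print("kept heads:", np.flatnonzero(mask))
```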

Improving Few-shot and Zero-shot Entity Linking with Coarse-to-Fine Lexicon-based Retriever

  • paper_url: http://arxiv.org/abs/2308.03365
  • repo_url: None
  • paper_authors: Shijue Huang, Bingbing Wang, Libo Qin, Qin Zhao, Ruifeng Xu
  • for: This paper targets Chinese few-shot and zero-shot entity linking, focusing on tail and emerging entities that are harder to link but closer to real-world scenarios.
  • methods: A coarse-to-fine lexicon-based retriever that retrieves entity candidates in two layers: the first layer retrieves coarse-grained candidates using entity names, and the second layer narrows the search to fine-grained candidates using entity descriptions, which helps disambiguate tail or new entities that share names with existing popular entities (see the sketch below).
  • results: The approach obtains superior performance without extensive fine-tuning in the retrieval stage and ranked 1st in NLPCC 2023 Shared Task 6 on Chinese Few-shot and Zero-shot Entity Linking.
    Abstract Few-shot and zero-shot entity linking focus on the tail and emerging entities, which are more challenging but closer to real-world scenarios. The mainstream method is the ''retrieve and rerank'' two-stage framework. In this paper, we propose a coarse-to-fine lexicon-based retriever to retrieve entity candidates in an effective manner, which operates in two layers. The first layer retrieves coarse-grained candidates by leveraging entity names, while the second layer narrows down the search to fine-grained candidates within the coarse-grained ones. In addition, this second layer utilizes entity descriptions to effectively disambiguate tail or new entities that share names with existing popular entities. Experimental results indicate that our approach can obtain superior performance without requiring extensive finetuning in the retrieval stage. Notably, our approach ranks the 1st in NLPCC 2023 Shared Task 6 on Chinese Few-shot and Zero-shot Entity Linking.
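A minimal illustration of the two-layer retrieval idea: a coarse layer filters candidates by entity-name overlap with the mention, and a fine layer reranks them by description overlap with the context. The toy knowledge base and overlap heuristics stand in for the paper's lexicon-based scoring.

```python
KB = [
    {"name": "Mercury", "description": "smallest planet in the solar system"},
    {"name": "Mercury", "description": "chemical element with symbol Hg"},
    {"name": "Mercury Records", "description": "American record label"},
]

def coarse_layer(mention: str, kb: list) -> list:
    """Layer 1: keep entries whose name shares a token with the mention."""
    m_tokens = set(mention.lower().split())
    return [e for e in kb if m_tokens & set(e["name"].lower().split())]

def fine_layer(context: str, candidates: list) -> list:
    """Layer 2: rank remaining candidates by description overlap with the mention's context."""
    c_tokens = set(context.lower().split())
    return sorted(candidates,
                  key=lambda e: len(c_tokens & set(e["description"].lower().split())),
                  reverse=True)

mention, context = "Mercury", "the probe measured the temperature of the planet Mercury"
ranked = fine_layer(context, coarse_layer(mention, KB))
print(ranked[0]["description"])  # -> the planet reading, given this context
```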

Coupling Symbolic Reasoning with Language Modeling for Efficient Longitudinal Understanding of Unstructured Electronic Medical Records

  • paper_url: http://arxiv.org/abs/2308.03360
  • repo_url: None
  • paper_authors: Shivani Shekhar, Simran Tiwari, T. C. Rensink, Ramy Eskander, Wael Salloum
  • for: This work aims to improve the understanding of unstructured electronic medical records, which are disorganized, inconsistent, and redundant, and which transformer-based large language models (LLMs) alone struggle to reason over.
  • methods: The authors couple symbolic reasoning with language modeling to improve the extraction of medical variables from unstructured clinical texts.
  • results: The combination improves the extraction of several medical variables; state-of-the-art commercially-free LLMs show retrieval capabilities comparable to their commercial counterparts; and steering LLMs with symbolic reasoning proves necessary, since the exclusive use of LLMs yields the lowest performance.
    Abstract The application of Artificial Intelligence (AI) in healthcare has been revolutionary, especially with the recent advancements in transformer-based Large Language Models (LLMs). However, the task of understanding unstructured electronic medical records remains a challenge given the nature of the records (e.g., disorganization, inconsistency, and redundancy) and the inability of LLMs to derive reasoning paradigms that allow for comprehensive understanding of medical variables. In this work, we examine the power of coupling symbolic reasoning with language modeling toward improved understanding of unstructured clinical texts. We show that such a combination improves the extraction of several medical variables from unstructured records. In addition, we show that the state-of-the-art commercially-free LLMs enjoy retrieval capabilities comparable to those provided by their commercial counterparts. Finally, we elaborate on the need for LLM steering through the application of symbolic reasoning as the exclusive use of LLMs results in the lowest performance.

LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning

  • paper_url: http://arxiv.org/abs/2308.03303
  • repo_url: None
  • paper_authors: Longteng Zhang, Lin Zhang, Shaohuai Shi, Xiaowen Chu, Bo Li
  • for: Memory-efficient fine-tuning of large language models (LLMs) with the low-rank adaptation (LoRA) method.
  • methods: LoRA-FA freezes the projection-down weight A and updates only the projection-up weight B in each LoRA layer, so weight changes stay in a low-rank space during fine-tuning while the requirement to store full-rank input activations is eliminated, reducing activation memory without performance degradation or expensive recomputation (see the sketch below).
  • results: Across model types (RoBERTa, T5, LLaMA) and scales, LoRA-FA achieves fine-tuning accuracy close to full-parameter fine-tuning and LoRA on different tasks, while reducing the overall memory cost by up to 1.4x compared to LoRA.
    Abstract The low-rank adaptation (LoRA) method can largely reduce the amount of trainable parameters for fine-tuning large language models (LLMs), however, it still requires expensive activation memory to update low-rank weights. Reducing the number of LoRA layers or using activation recomputation could harm the fine-tuning performance or increase the computational overhead. In this work, we present LoRA-FA, a memory-efficient fine-tuning method that reduces the activation memory without performance degradation and expensive recomputation. LoRA-FA chooses to freeze the projection-down weight of $A$ and update the projection-up weight of $B$ in each LoRA layer. It ensures the change of model weight reside in a low-rank space during LLMs fine-tuning, while eliminating the requirement to store full-rank input activations. We conduct extensive experiments across multiple model types (RoBERTa, T5, LLaMA) and model scales. Our results show that LoRA-FA can always achieve close fine-tuning accuracy across different tasks compared to full parameter fine-tuning and LoRA. Furthermore, LoRA-FA can reduce the overall memory cost by up to 1.4$\times$ compared to LoRA.
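A hedged PyTorch sketch of the LoRA-FA idea stated in the abstract: the projection-down matrix A is frozen and only the projection-up matrix B is trained, so weight updates stay in a low-rank space. Initialization, scaling, and shapes here are simplified and not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRAFALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)           # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01,
                              requires_grad=False)       # LoRA-FA: A is frozen too
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # only B is trained
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scale * (x A^T) B^T ; gradients flow only into B
        return self.base(x) + self.scale * (x @ self.A.t()) @ self.B.t()

layer = LoRAFALinear(64, 64)
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
print(trainable)  # ['B'] -- the only parameter updated during fine-tuning
```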

Studying Large Language Model Generalization with Influence Functions

  • paper_url: http://arxiv.org/abs/2308.03296
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman
  • for: Understanding and mitigating the risks associated with machine learning models by asking which training examples most contribute to a given model behavior.
  • methods: Influence functions answer a counterfactual question: how would the model's parameters and outputs change if a given sequence were added to the training set? Because computing the required inverse-Hessian-vector product (IHVP) is difficult for large language models (LLMs), the authors scale influence functions with the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation, and use TF-IDF filtering and query batching to reduce the cost of computing gradients of candidate training sequences (see the sketch below).
  • results: EK-FAC scales influence functions to LLMs with up to 52 billion parameters, matching the accuracy of traditional influence function estimators while computing the IHVP orders of magnitude faster. Influence functions are then used to study generalization patterns of LLMs, including the sparsity of influence, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior; surprisingly, influences decay to near zero when the order of key phrases is flipped.
    Abstract When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
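One of the cost-reduction tricks the abstract mentions is TF-IDF filtering of candidate training sequences before the expensive influence computation. The sketch below shows that filtering step on toy data; the vectorizer settings and corpus are illustrative only, not the paper's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

training_sequences = [
    "The derivative of x squared is two x.",
    "Paris is the capital of France.",
    "To sort a list in Python, call sorted().",
    "Integrals and derivatives are inverse operations.",
]
query = "What is the derivative of x**2?"

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(training_sequences)  # TF-IDF vectors of candidates
query_vec = vectorizer.transform([query])

scores = cosine_similarity(query_vec, doc_matrix).ravel()
top_k = scores.argsort()[::-1][:2]   # survivors passed on to the expensive influence step
for i in top_k:
    print(f"{scores[i]:.2f}  {training_sequences[i]}")
```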

Dialogue Systems Can Generate Appropriate Responses without the Use of Question Marks? – Investigation of the Effects of Question Marks on Dialogue Systems

  • paper_url: http://arxiv.org/abs/2308.03293
  • repo_url: None
  • paper_authors: Tomoya Mizumoto, Takato Yamazaki, Katsumasa Yoshikawa, Masaya Ohagi, Toshiki Kawamoto, Toshinori Sato
  • for: This paper investigates the effect of question marks on dialogue systems, motivated by the fact that many speech recognition engines do not append a question mark to recognized queries.
  • methods: The authors study how the presence or absence of question marks in speech-recognized input affects response generation in spoken dialogue systems.
  • results: Question marks have a significant impact on dialogue systems; specific examples are analyzed to determine which types of utterances are most affected.
    Abstract When individuals engage in spoken discourse, various phenomena can be observed that differ from those that are apparent in text-based conversation. While written communication commonly uses a question mark to denote a query, in spoken discourse, queries are frequently indicated by a rising intonation at the end of a sentence. However, numerous speech recognition engines do not append a question mark to recognized queries, presenting a challenge when creating a spoken dialogue system. Specifically, the absence of a question mark at the end of a sentence can impede the generation of appropriate responses to queries in spoken dialogue systems. Hence, we investigate the impact of question marks on dialogue systems, with the results showing that they have a significant impact. Moreover, we analyze specific examples in an effort to determine which types of utterances have the impact on dialogue systems.

Towards General Text Embeddings with Multi-stage Contrastive Learning

  • paper_url: http://arxiv.org/abs/2308.03281
  • repo_url: None
  • paper_authors: Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang
  • for: This paper presents a general-purpose text embedding model trained with multi-stage contrastive learning.
  • methods: The model is trained with contrastive learning over a diverse mixture of datasets from multiple sources, with a greatly increased amount of training data in both the unsupervised pre-training and supervised fine-tuning stages (see the sketch below).
  • results: With only 110M parameters, GTE_base outperforms OpenAI's black-box embedding API and even 10x larger text embedding models on the massive text embedding benchmark, and, treating code as text, it outperforms previous best code retrievers of similar size without per-language fine-tuning.
    Abstract We present GTE, a general-purpose text embedding model trained with multi-stage contrastive learning. In line with recent advancements in unifying various NLP tasks into a single format, we train a unified text embedding model by employing contrastive learning over a diverse mixture of datasets from multiple sources. By significantly increasing the number of training data during both unsupervised pre-training and supervised fine-tuning stages, we achieve substantial performance gains over existing embedding models. Notably, even with a relatively modest parameter count of 110M, GTE$_\text{base}$ outperforms the black-box embedding API provided by OpenAI and even surpasses 10x larger text embedding models on the massive text embedding benchmark. Furthermore, without additional fine-tuning on each programming language individually, our model outperforms previous best code retrievers of similar size by treating code as text. In summary, our model achieves impressive results by effectively harnessing multi-stage contrastive learning, offering a powerful and efficient text embedding model with broad applicability across various NLP and code-related tasks.
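A sketch of an in-batch contrastive (InfoNCE-style) objective of the kind commonly used to train text embedding models on query-passage pairs: each query is pulled toward its paired passage and pushed away from the other passages in the batch. The random tensors stand in for encoder outputs, and this is not necessarily the paper's exact loss or temperature.

```python
import torch
import torch.nn.functional as F

def info_nce(query_emb: torch.Tensor, passage_emb: torch.Tensor, temperature: float = 0.05):
    """query_emb, passage_emb: (batch, dim); row i of each forms a positive pair."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.t() / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0))          # the diagonal holds the positives
    return F.cross_entropy(logits, labels)

batch, dim = 8, 128
queries = torch.randn(batch, dim)             # stand-in for encoder(query_texts)
passages = torch.randn(batch, dim)            # stand-in for encoder(passage_texts)
loss = info_nce(queries, passages)
print(float(loss))
```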

UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition

  • paper_url: http://arxiv.org/abs/2308.03279
  • repo_url: None
  • paper_authors: Wenxuan Zhou, Sheng Zhang, Yu Gu, Muhao Chen, Hoifung Poon
  • for: Using mission-focused instruction tuning to distill large language models into much more cost-efficient student models that excel in a broad application class, with open named entity recognition (NER) as the case study.
  • methods: ChatGPT is distilled into much smaller UniversalNER models for open NER (see the sketch below); for evaluation the authors assemble the largest NER benchmark to date, comprising 43 datasets across 9 diverse domains such as biomedicine, programming, social media, law, and finance.
  • results: Without any direct supervision, UniversalNER attains remarkable NER accuracy across tens of thousands of entity types, outperforming general instruction-tuned models such as Alpaca and Vicuna by over 30 absolute F1 points on average, exceeding ChatGPT's NER accuracy by 7-9 absolute F1 points with a tiny fraction of the parameters, and even surpassing state-of-the-art multi-task instruction-tuned systems such as InstructUIE, which uses supervised NER examples.
    Abstract Large language models (LLMs) have demonstrated remarkable generalizability, such as understanding arbitrary entities and relations. Instruction tuning has proven effective for distilling LLMs into more cost-efficient models such as Alpaca and Vicuna. Yet such student models still trail the original LLMs by large margins in downstream applications. In this paper, we explore targeted distillation with mission-focused instruction tuning to train student models that can excel in a broad application class such as open information extraction. Using named entity recognition (NER) for case study, we show how ChatGPT can be distilled into much smaller UniversalNER models for open NER. For evaluation, we assemble the largest NER benchmark to date, comprising 43 datasets across 9 diverse domains such as biomedicine, programming, social media, law, finance. Without using any direct supervision, UniversalNER attains remarkable NER accuracy across tens of thousands of entity types, outperforming general instruction-tuned models such as Alpaca and Vicuna by over 30 absolute F1 points in average. With a tiny fraction of parameters, UniversalNER not only acquires ChatGPT's capability in recognizing arbitrary entity types, but also outperforms its NER accuracy by 7-9 absolute F1 points in average. Remarkably, UniversalNER even outperforms by a large margin state-of-the-art multi-task instruction-tuned systems such as InstructUIE, which uses supervised NER examples. We also conduct thorough ablation studies to assess the impact of various components in our distillation approach. We will release the distillation recipe, data, and UniversalNER models to facilitate future research on targeted distillation.
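A rough sketch of how targeted distillation data for open NER might be assembled along the lines the abstract suggests: query a teacher model for entities of arbitrary types and turn its outputs into instruction-tuning examples for the student. The prompt template and the `teacher_extract` stub are hypothetical, not the paper's actual prompts or data format.

```python
def teacher_extract(passage: str, entity_type: str) -> list:
    """Stand-in for querying the teacher model (e.g., ChatGPT) for entities of a given type."""
    canned = {"disease": ["influenza"], "drug": ["oseltamivir"]}
    return canned.get(entity_type, [])

def build_instruction_example(passage: str, entity_type: str) -> dict:
    """One student training example: passage + entity type in, entity list out."""
    entities = teacher_extract(passage, entity_type)
    return {
        "instruction": f"Text: {passage}\nWhat describes {entity_type} in the text?",
        "output": str(entities),
    }

passage = "The patient with influenza was treated with oseltamivir."
dataset = [build_instruction_example(passage, t) for t in ("disease", "drug")]
for ex in dataset:
    print(ex["instruction"].splitlines()[-1], "->", ex["output"])
```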

From Ambiguity to Explicitness: NLP-Assisted 5G Specification Abstraction for Formal Analysis

  • paper_url: http://arxiv.org/abs/2308.03277
  • repo_url: None
  • paper_authors: Shiyu Yuan, Jingda Yang, Sudhanshu Arya, Carlo Lipizzi, Ying Wang
  • for: Improving the efficiency of formal analysis of 5G wireless communication protocols, especially in the design phase, to uncover logical vulnerabilities and support comprehensive security assessment.
  • methods: A hybrid two-step pipeline: NLP tools first construct data from the natural-language protocol documents, and an NLP model then extracts identifiers and formal properties from the constructed data, which are subsequently used for formal analysis.
  • results: Three models with different dependency criteria between identifiers and formal properties were implemented; the best model reaches a valid accuracy of 39% for identifier extraction and 42% for formal property prediction, demonstrating the feasibility of the approach for large-scale, complex specification and protocol analysis.
    Abstract Formal method-based analysis of the 5G Wireless Communication Protocol is crucial for identifying logical vulnerabilities and facilitating an all-encompassing security assessment, especially in the design phase. Natural Language Processing (NLP) assisted techniques and most of the tools are not widely adopted by the industry and research community. Traditional formal verification through a mathematics approach heavily relied on manual logical abstraction prone to being time-consuming, and error-prone. The reason that the NLP-assisted method did not apply in industrial research may be due to the ambiguity in the natural language of the protocol designs nature is controversial to the explicitness of formal verification. To address the challenge of adopting the formal methods in protocol designs, targeting (3GPP) protocols that are written in natural language, in this study, we propose a hybrid approach to streamline the analysis of protocols. We introduce a two-step pipeline that first uses NLP tools to construct data and then uses constructed data to extract identifiers and formal properties by using the NLP model. The identifiers and formal properties are further used for formal analysis. We implemented three models that take different dependencies between identifiers and formal properties as criteria. Our results of the optimal model reach valid accuracy of 39% for identifier extraction and 42% for formal properties predictions. Our work is proof of concept for an efficient procedure in performing formal analysis for largescale complicate specification and protocol analysis, especially for 5G and nextG communications.

Adapter-based Selective Knowledge Distillation for Federated Multi-domain Meeting Summarization

  • paper_url: http://arxiv.org/abs/2308.03275
  • repo_url: None
  • paper_authors: Xiachong Feng, Xiaocheng Feng, Xiyuan Du, Min-Yen Kan, Bing Qin
  • for: This work targets federated learning for meeting summarization, addressing the fact that existing work trains on centralized data while real-world meeting data are too sensitive to collect centrally.
  • methods: An adapter-based summarization model in which two adapters cooperatively facilitate learning with fewer parameters to reduce communication costs, combined with a selective knowledge distillation strategy that helps clients robustly handle domain-focused modelling on their own data while leveraging global parameters learned from non-IID data.
  • results: Extensive experiments on the QMSum benchmark show that AdaFedSelecKD achieves performance comparable to powerful centralized training methods and demonstrates generalizability and robustness.
    Abstract Meeting summarization has emerged as a promising technique for providing users with condensed summaries. However, existing work has focused on training models on centralized data, neglecting real-world scenarios where meeting data are infeasible to collect centrally, due to their sensitive nature. This gap motivates us to explore federated learning for meeting summarization. Two critical challenges impede progress. First, state-of-the-art summarizers are based on parameter-heavy pre-trained models. Exchanging such a model's parameters across clients imposes large bandwidth costs. Second, as real-world meeting data belong to various domains and are distributed across clients, they are instances of non-identically and independently distributed (non-IID). IID assumptions do not hold, which changes which forms of learning algorithms best apply. To address this, we propose Adapter-based Federated Selective Knowledge Distillation (AdaFedSelecKD) for training performant client models. Specifically, we develop an adapter-based summarization model where two adapters cooperatively facilitate learning using fewer parameters to reduce communication costs. Then, we devise a selective knowledge distillation strategy, assisting clients in robustly handling domain-focused modelling on their own data, while leveraging global parameters based on non-IID data. Extensive experiments on the QMSum benchmark demonstrate AdaFedSelecKD can achieve comparable performance with powerful centralized training methods, and shows its generalizability and robustness.

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

  • paper_url: http://arxiv.org/abs/2308.03266
  • repo_url: https://github.com/r1ckshi/seaco-paraformer
  • paper_authors: Xian Shi, Yexin Yang, Zerui Li, Shiliang Zhang
  • for: This paper proposes an ASR system with flexible and effective hotword customization, allowing users to customize names of entities, persons, and other phrases.
  • methods: A novel non-autoregressive (NAR) model that combines the accuracy of AED-based models with the efficiency of NAR models and performs well in contextualization.
  • results: In experiments on 50,000 hours of industrial data, the proposed model outperforms strong baselines on both customization and general ASR tasks; an efficient method for filtering large-scale incoming hotwords is also explored.
    Abstract Hotword customization is one of the important issues remained in ASR field - it is of value to enable users of ASR systems to customize names of entities, persons and other phrases. The past few years have seen both implicit and explicit modeling strategies for ASR contextualization developed. While these approaches have performed adequately, they still exhibit certain shortcomings such as instability in effectiveness. In this paper we propose Semantic-augmented Contextual-Paraformer (SeACo-Paraformer) a novel NAR based ASR system with flexible and effective hotword customization ability. It combines the accuracy of the AED-based model, the efficiency of the NAR model, and the excellent performance in contextualization. In 50,000 hours industrial big data experiments, our proposed model outperforms strong baselines in customization and general ASR tasks. Besides, we explore an efficient way to filter large scale incoming hotwords for further improvement. The source codes and industrial models proposed and compared are all opened as well as two hotword test sets.

Exploring Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning

  • paper_url: http://arxiv.org/abs/2308.03234
  • repo_url: None
  • paper_authors: Hunter McNichols, Wanyong Feng, Jaewook Lee, Alexander Scarlatos, Digory Smith, Simon Woodhead, Andrew Lan
  • for: This paper addresses the automated generation of distractors and corresponding feedback messages for math multiple-choice questions (MCQs), since crafting high-quality distractors by hand is labor-intensive and limits scalability.
  • methods: The authors formulate the two tasks and propose a simple in-context learning-based solution using large language models (see the sketch below), and explore two non-standard metrics for evaluating the quality of the generated distractors and feedback messages.
  • results: Extensive experiments on a real-world MCQ dataset with student response information suggest that there is still a lot of room for improvement in automated distractor and feedback generation; several directions for future work are outlined.
    Abstract Multiple-choice questions (MCQs) are ubiquitous in almost all levels of education since they are easy to administer, grade, and are a reliable format in both assessments and practices. An important aspect of MCQs is the distractors, i.e., incorrect options that are designed to target specific misconceptions or insufficient knowledge among students. To date, the task of crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers, which has limited scalability. In this work, we explore the task of automated distractor and corresponding feedback message generation in math MCQs using large language models. We establish a formulation of these two tasks and propose a simple, in-context learning-based solution. Moreover, we explore using two non-standard metrics to evaluate the quality of the generated distractors and feedback messages. We conduct extensive experiments on these tasks using a real-world MCQ dataset that contains student response information. Our findings suggest that there is a lot of room for improvement in automated distractor and feedback generation. We also outline several directions for future work
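A sketch of the in-context learning setup the abstract describes: a few worked examples of distractors that target specific misconceptions are placed in the prompt, followed by the new question for which distractors should be generated. The example items and `query_llm` are placeholders, not the paper's prompts or dataset.

```python
FEW_SHOT_EXAMPLES = [
    {
        "question": "What is 3/4 + 1/4?",
        "answer": "1",
        "distractors": ["4/8 (added numerators and denominators)",
                        "3/16 (multiplied instead of adding)"],
    },
]

def build_prompt(new_question: str, new_answer: str) -> str:
    """Assemble a few-shot prompt ending with the new question."""
    parts = ["Generate plausible distractors that target common misconceptions.\n"]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Question: {ex['question']}\nCorrect answer: {ex['answer']}\n"
                     f"Distractors: {'; '.join(ex['distractors'])}\n")
    parts.append(f"Question: {new_question}\nCorrect answer: {new_answer}\nDistractors:")
    return "\n".join(parts)

def query_llm(prompt: str) -> str:
    """Stand-in for a call to a large language model."""
    return "2/5 (added numerators and denominators); 1/6 (multiplied instead of adding)"

print(query_llm(build_prompt("What is 1/2 + 1/3?", "5/6")))
```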

Average-Hard Attention Transformers are Constant-Depth Uniform Threshold Circuits

  • paper_url: http://arxiv.org/abs/2308.03212
  • repo_url: None
  • paper_authors: Lena Strobl
  • for: This paper studies the relationship between transformer models and constant-depth threshold circuits.
  • methods: Two assumptions from prior work are considered: average-hard attention, and logarithmic precision for internal computations relative to input length, which relate transformers to constant-depth threshold circuits as language recognizers.
  • results: Average-hard attention transformers recognize languages in the class TC0 and log-precision transformers recognize languages in uniform TC0; this paper extends the first result to yield uniform circuits as well, so both transformer variants can be simulated by constant-depth threshold circuits.
    Abstract Transformers have emerged as a widely used neural network model for various natural language processing tasks. Previous research explored their relationship with constant-depth threshold circuits, making two assumptions: average-hard attention and logarithmic precision for internal computations relative to input length. Merrill et al. (2022) prove that average-hard attention transformers recognize languages that fall within the complexity class TC0, denoting the set of languages that can be recognized by constant-depth polynomial-size threshold circuits. Likewise, Merrill and Sabharwal (2023) show that log-precision transformers recognize languages within the class of uniform TC0. This shows that both transformer models can be simulated by constant-depth threshold circuits, with the latter being more robust due to generating a uniform circuit family. Our paper shows that the first result can be extended to yield uniform circuits as well.

cs.LG - 2023-08-07

Improving FHB Screening in Wheat Breeding Using an Efficient Transformer Model

  • paper_url: http://arxiv.org/abs/2308.03670
  • repo_url: None
  • paper_authors: Babak Azad, Ahmed Abdalla, Kwanghee Won, Ali Mirzakhani Nafchi
  • for: Early detection of Fusarium head blight (FHB) in wheat and barley breeding programs.
  • methods: The paper proposes a new Context Bridge to integrate the local representation capability of the U-Net network into the transformer model, and replaces the standard attention mechanism with Efficient Self-attention.
  • results: Extensive experiments across typical plant image segmentation tasks demonstrate that the proposed transformer-based method is effective for FHB disease detection.
    Abstract Fusarium head blight is a devastating disease that causes significant economic losses annually on small grains. Efficiency, accuracy, and timely detection of FHB in the resistance screening are critical for wheat and barley breeding programs. In recent years, various image processing techniques have been developed using supervised machine learning algorithms for the early detection of FHB. The state-of-the-art convolutional neural network-based methods, such as U-Net, employ a series of encoding blocks to create a local representation and a series of decoding blocks to capture the semantic relations. However, these methods are not often capable of long-range modeling dependencies inside the input data, and their ability to model multi-scale objects with significant variations in texture and shape is limited. Vision transformers as alternative architectures with innate global self-attention mechanisms for sequence-to-sequence prediction, due to insufficient low-level details, may also limit localization capabilities. To overcome these limitations, a new Context Bridge is proposed to integrate the local representation capability of the U-Net network in the transformer model. In addition, the standard attention mechanism of the original transformer is replaced with Efficient Self-attention, which is less complicated than other state-of-the-art methods. To train the proposed network, 12,000 wheat images from an FHB-inoculated wheat field at the SDSU research farm in Volga, SD, were captured. In addition to healthy and unhealthy plants, these images encompass various stages of the disease. A team of expert pathologists annotated the images for training and evaluating the developed model. As a result, the effectiveness of the transformer-based method for FHB-disease detection, through extensive experiments across typical tasks for plant image segmentation, is demonstrated.

Diffusion Model in Causal Inference with Unmeasured Confounders

  • paper_url: http://arxiv.org/abs/2308.03669
  • repo_url: https://github.com/tatsu432/BDCM
  • paper_authors: Tatsuhiro Shimizu
  • for: This work aims to extend the diffusion model to answer causal questions from observational data in the presence of unmeasured confounders.
  • methods: Within Pearl's framework of using a Directed Acyclic Graph (DAG) to capture causal interventions, the previously proposed Diffusion-based Causal Model (DCM) assumes all confounders are observed. The authors propose the Backdoor Criterion based DCM (BDCM), which uses the backdoor criterion to find the variables in the DAG that should be included in the decoding process of the diffusion model, extending DCM to the case with unmeasured confounders.
  • results: Synthetic data experiments demonstrate that the proposed model captures the counterfactual distribution more precisely than DCM under unmeasured confounders.
    Abstract We study how to extend the use of the diffusion model to answer the causal question from the observational data under the existence of unmeasured confounders. In Pearl's framework of using a Directed Acyclic Graph (DAG) to capture the causal intervention, a Diffusion-based Causal Model (DCM) was proposed incorporating the diffusion model to answer the causal questions more accurately, assuming that all of the confounders are observed. However, unmeasured confounders in practice exist, which hinders DCM from being applicable. To alleviate this limitation of DCM, we propose an extended model called Backdoor Criterion based DCM (BDCM), whose idea is rooted in the Backdoor criterion to find the variables in DAG to be included in the decoding process of the diffusion model so that we can extend DCM to the case with unmeasured confounders. Synthetic data experiment demonstrates that our proposed model captures the counterfactual distribution more precisely than DCM under the unmeasured confounders.

Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness

  • paper_url: http://arxiv.org/abs/2308.03666
  • repo_url: None
  • paper_authors: Shide Du, Zihan Fang, Shiyang Lan, Yanchao Tan, Manuel Günther, Shiping Wang, Wenzhong Guo
  • for: Enhancing the trustworthiness of artificial intelligence systems together with their open-world, multi-modal learning capability.
  • methods: Customizing trustworthy networks with specific physical meanings, designing environmental well-being task-interfaces via flexible learning regularizers, and integrating open-world recognition losses with agent mechanisms to improve interpretability, generalization, and robustness.
  • results: Significant performance improvements are observed on open-world multimedia recognition tasks.
    Abstract As researchers strive to narrow the gap between machine intelligence and human through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) Insufficient explanation of predictive results; 2) Inadequate generalization for learning models; 3) Poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios for readers. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) We then design environmental well-being task-interfaces via flexible learning regularizers for improving the generalization of trustworthy learning; 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Eventually, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. These designed open-world protocols are applicable across a wide range of surroundings, under open-world multimedia recognition scenarios with significant performance improvements observed.
    Several challenges can create a crisis of trust in current artificial intelligence systems:
  1. Insufficient explanation of predictive results: it is difficult to understand why the system made a particular prediction or decision.
  2. Inadequate generalization of learning models: the system may not perform well when faced with new or unfamiliar situations.
  3. Poor adaptability to uncertain environments: the system may not be able to handle unexpected events or changes in the environment.
    To address these challenges, a neural program is proposed that bridges trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios. It has three components: (1) design-level interpretability, where trustworthy networks are customized with specific physical meanings so it is easier to understand how the system works and why it makes certain predictions; (2) environmental well-being task-interfaces, where flexible learning regularizers improve the generalization of trustworthy learning and let the system adapt to new situations and environments; (3) open-world recognition programs, where open-world recognition losses are integrated with agent mechanisms to increase robustness to unexpected events and changes in the environment. Enhancing these trustworthy properties yields significant performance improvements across a wide range of surroundings under open-world multimedia recognition scenarios, including image recognition, speech recognition, and natural language processing.

Distributionally Robust Classification on a Data Budget

  • paper_url: http://arxiv.org/abs/2308.03821
  • repo_url: https://github.com/penfever/vlhub
  • paper_authors: Benjamin Feuer, Ameya Joshi, Minh Pham, Chinmay Hegde
  • for: To study whether distributionally robust deep learning models can be trained when data is limited.
  • methods: A collection of new training datasets (JANuS: Joint Annotations and Names Set, with images, labels, and captions) and a series of carefully controlled investigations of the factors contributing to robustness in image classification, compared against findings from a large-scale meta-analysis.
  • results: A standard ResNet-50 trained with the cross-entropy loss on 2.4 million image samples attains robustness comparable to a CLIP ResNet-50 trained on 400 million samples; to the authors' knowledge, this is the first result showing (near) state-of-the-art distributional robustness on a limited data budget.
    Abstract Real world uses of deep learning require predictable model behavior under distribution shifts. Models such as CLIP show emergent natural distributional robustness comparable to humans, but may require hundreds of millions of training samples. Can we train robust learners in a domain where data is limited? To rigorously address this question, we introduce JANuS (Joint Annotations and Names Set), a collection of four new training datasets with images, labels, and corresponding captions, and perform a series of carefully controlled investigations of factors contributing to robustness in image classification, then compare those results to findings derived from a large-scale meta-analysis. Using this approach, we show that standard ResNet-50 trained with the cross-entropy loss on 2.4 million image samples can attain comparable robustness to a CLIP ResNet-50 trained on 400 million samples. To our knowledge, this is the first result showing (near) state-of-the-art distributional robustness on limited data budgets. Our dataset is available at \url{https://huggingface.co/datasets/penfever/JANuS_dataset}, and the code used to reproduce our experiments can be found at \url{https://github.com/penfever/vlhub/}.
    摘要 real-world 应用需要深度学习模型在分布变化下具有预测性的行为。例如,CLIP 模型表现出了自然的分布强度性,但可能需要百万个训练样本。可以在有限数据量的领域中训练强健的学习者吗?为了彻底回答这个问题,我们介绍了 JANuS(共同注释和名称集),一个包含四个新的训练集,包括图像、标签和相应的描述,并进行了一系列严格控制的调查,以研究影响模型强度的因素。我们发现,使用权重平均损失函数,只需训练标准 ResNet-50 模型,可以在240万张图像样本上达到与 CLIP ResNet-50 模型在400万样本上的相似水平的分布强度性。我们认为这是首次在有限数据预算下实现(近)状态时的分布强度性的研究结果。我们的数据集可以在 \url{https://huggingface.co/datasets/penfever/JANuS_dataset} 上下载,并且使用来复制我们的实验代码可以在 \url{https://github.com/penfever/vlhub/} 上找到。

Two-stage Early Prediction Framework of Remaining Useful Life for Lithium-ion Batteries

  • paper_url: http://arxiv.org/abs/2308.03664
  • repo_url: None
  • paper_authors: Dhruv Mittal, Hymalai Bello, Bo Zhou, Mayank Shekhar Jha, Sungho Suh, Paul Lukowicz
  • for: Early prediction of the remaining useful life (RUL) of lithium-ion batteries, to improve the reliability and maintainability of battery management across industries.
  • methods: A two-stage framework: a neural-network-based model first determines the first prediction cycle (FPC) that marks the start of the unhealthy stage, and the degradation pattern after the FPC is then predicted to estimate the remaining useful life as a percentage.
  • results: Experiments show the proposed method outperforms conventional approaches for RUL prediction and is promising for real-world scenarios, offering improved accuracy and applicability for battery management.
    Abstract Early prediction of remaining useful life (RUL) is crucial for effective battery management across various industries, ranging from household appliances to large-scale applications. Accurate RUL prediction improves the reliability and maintainability of battery technology. However, existing methods have limitations, including assumptions of data from the same sensors or distribution, foreknowledge of the end of life (EOL), and neglect to determine the first prediction cycle (FPC) to identify the start of the unhealthy stage. This paper proposes a novel method for RUL prediction of Lithium-ion batteries. The proposed framework comprises two stages: determining the FPC using a neural network-based model to divide the degradation data into distinct health states and predicting the degradation pattern after the FPC to estimate the remaining useful life as a percentage. Experimental results demonstrate that the proposed method outperforms conventional approaches in terms of RUL prediction. Furthermore, the proposed method shows promise for real-world scenarios, providing improved accuracy and applicability for battery management.
    摘要 早期预测电池剩余有用生命(RUL)是跨多个领域的关键,从家用电器到大规模应用。准确的RUL预测提高电池技术的可靠性和维护性。然而,现有方法有限,包括同感知数据的假设、结束生命阶段(EOL)的先知识和忽略第一预测周期(FPC)来确定开始不健康阶段。这篇论文提出了一种新的Li-ion电池RUL预测方法。该框架包括两个阶段:使用神经网络模型将衰减数据分为不同的健康状态,并预测衰减模式以估算剩余有用生命的百分数。实验结果表明,提案方法在RUL预测方面表现出了明显的优势,并在实际场景中展现出了改善的准确性和可用性。
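A minimal sketch of the two-stage idea is shown below: a classifier locates the first cycle flagged as unhealthy (the FPC), and a regressor is then fitted only on post-FPC cycles to predict RUL as a percentage. The synthetic data, the scikit-learn MLP models, and the labeling scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Stage 1: classify each cycle as healthy vs. degraded to locate the FPC.
# Stage 2: after the FPC, regress remaining useful life as a percentage.
rng = np.random.default_rng(0)
n_cycles, n_feats = 500, 8
X = rng.normal(size=(n_cycles, n_feats)) + np.linspace(0, 3, n_cycles)[:, None]
healthy = (np.arange(n_cycles) < 300).astype(int)        # toy health labels

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, healthy)
pred_health = clf.predict(X)
fpc = int(np.argmax(pred_health == 0))                   # first cycle predicted unhealthy

# RUL as a percentage of the life remaining after the FPC (100% at the FPC, 0% at end).
rul_pct = 100 * (n_cycles - 1 - np.arange(n_cycles)) / (n_cycles - 1 - fpc)
reg = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
reg.fit(X[fpc:], rul_pct[fpc:])

print("estimated FPC:", fpc, "| predicted RUL% at cycle 400:", round(reg.predict(X[[400]])[0], 1))
```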

Matrix Completion in Almost-Verification Time

  • paper_url: http://arxiv.org/abs/2308.03661
  • repo_url: None
  • paper_authors: Jonathan A. Kelner, Jerry Li, Allen Liu, Aaron Sidford, Kevin Tian
  • for: A new framework for the fundamental problem of low-rank matrix completion, i.e., approximating a rank-$r$ matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ from random observations.
  • methods: An algorithm that, under no further assumptions on $\mathbf{M}$, completes 99% of its rows and columns from $\approx mr$ samples; when the row and column spans of $\mathbf{M}$ satisfy additional regularity properties, this partial completion is boosted to full matrix completion by aggregating solutions to regression problems involving the observations.
  • results: With incoherent row and column spans, $\mathbf{M}$ is completed to high precision from $mr^{2+o(1)}$ observations in $mr^{3+o(1)}$ time; under a further assumption on the spans, the sample complexity improves to an almost information-theoretically optimal $mr^{1+o(1)}$ and the runtime to $mr^{2+o(1)}$, matching the best known time to verify that a rank-$r$ decomposition $\mathbf{U}\mathbf{V}^\top$ agrees with the sampled observations. Robust variants, given observations from $\mathbf{M} + \mathbf{N}$ with $\|\mathbf{N}\|_F \le \Delta$, complete $\mathbf{M}$ to Frobenius-norm distance $\approx r^{1.5}\Delta$ in the same runtimes.
    Abstract We give a new framework for solving the fundamental problem of low-rank matrix completion, i.e., approximating a rank-$r$ matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ (where $m \ge n$) from random observations. First, we provide an algorithm which completes $\mathbf{M}$ on $99\%$ of rows and columns under no further assumptions on $\mathbf{M}$ from $\approx mr$ samples and using $\approx mr^2$ time. Then, assuming the row and column spans of $\mathbf{M}$ satisfy additional regularity properties, we show how to boost this partial completion guarantee to a full matrix completion algorithm by aggregating solutions to regression problems involving the observations. In the well-studied setting where $\mathbf{M}$ has incoherent row and column spans, our algorithms complete $\mathbf{M}$ to high precision from $mr^{2+o(1)}$ observations in $mr^{3 + o(1)}$ time (omitting logarithmic factors in problem parameters), improving upon the prior state-of-the-art [JN15] which used $\approx mr^5$ samples and $\approx mr^7$ time. Under an assumption on the row and column spans of $\mathbf{M}$ we introduce (which is satisfied by random subspaces with high probability), our sample complexity improves to an almost information-theoretically optimal $mr^{1 + o(1)}$, and our runtime improves to $mr^{2 + o(1)}$. Our runtimes have the appealing property of matching the best known runtime to verify that a rank-$r$ decomposition $\mathbf{U}\mathbf{V}^\top$ agrees with the sampled observations. We also provide robust variants of our algorithms that, given random observations from $\mathbf{M} + \mathbf{N}$ with $\|\mathbf{N}\|_{F} \le \Delta$, complete $\mathbf{M}$ to Frobenius norm distance $\approx r^{1.5}\Delta$ in the same runtimes as the noiseless setting. Prior noisy matrix completion algorithms [CP10] only guaranteed a distance of $\approx \sqrt{n}\Delta$.
    摘要 我们提出了一个新的框架来解决低阶矩阵完成问题,即确 aproximating 一个 Rank-$r$ 矩阵 $\mathbf{M} \in \mathbb{R}^{m \times n}$ (where $m \ge n$) 从Random observations 中。首先,我们提供了一个算法,可以在不进一步假设 $\mathbf{M}$ 的情况下,从 $\approx mr$ 样本中完成 $\mathbf{M}$ 的99% 的行和列。然后,假设 $\mathbf{M}$ 的行和列范围满足其他调和的特性,我们可以通过聚合这些部分完成数据来实现全矩阵完成算法。在广泛研究的设定中,其中 $\mathbf{M}$ 的行和列范围是不对称的,我们的算法可以从 $mr^{2+o(1)}$ 样本中完成 $\mathbf{M}$ 到高精度,比以前的状况下(JN15)使用 $\approx mr^5$ 样本和 $\approx mr^7$ 时间。假设 $\mathbf{M}$ 的行和列范围满足我们引入的一个假设(这个假设在高概率下成立),我们的样本缩减到 almost information-theoretically Optimal $mr^{1+o(1)}$,并且我们的时间缩减到 $mr^{2+o(1)}$。我们的时间有着愉悦的性质,与最好的知识理论时间匹配,以确认 $\mathbf{U}\mathbf{V}^\top$ 是否对样本一致。我们还提供了一些强健的算法,可以在 $\mathbf{M} + \mathbf{N}$ 中的随机样本中完成 $\mathbf{M}$,其中 $\|\mathbf{N}\|_{F} \le \Delta$。这些算法可以在同一个时间中完成这些任务,并且可以保证完成结果的 Frobenius 误差在 $\approx r^{1.5}\Delta$ 之间。与之前的噪音矩阵完成算法(CP10)相比,这些算法可以提供更好的误差保证,即 $\approx \sqrt{n}\Delta$。
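The runtimes quoted above are pinned to the cost of simply checking a candidate factorization against the sampled entries. The NumPy sketch below shows that verification step; the sizes and sampling scheme are illustrative assumptions, and the completion algorithm itself is not reproduced here.

```python
import numpy as np

# Verification step referenced in the abstract: given factors U (m x r) and
# V (n x r), check how well U V^T agrees with M on the sampled entries,
# without ever forming the full m x n product.
rng = np.random.default_rng(1)
m, n, r = 500, 300, 5
U_true, V_true = rng.normal(size=(m, r)), rng.normal(size=(n, r))
M = U_true @ V_true.T

num_obs = 20 * (m + n) * r                       # a set Omega of observed entries
rows = rng.integers(0, m, num_obs)
cols = rng.integers(0, n, num_obs)
obs = M[rows, cols]

def residual_on_samples(U, V):
    """RMS error of U V^T against the observations, at O(|Omega| r) cost."""
    preds = np.einsum("ij,ij->i", U[rows], V[cols])   # (U V^T)[rows, cols]
    return np.sqrt(np.mean((preds - obs) ** 2))

print("true factors  :", residual_on_samples(U_true, V_true))   # ~0
print("random factors:", residual_on_samples(rng.normal(size=(m, r)),
                                              rng.normal(size=(n, r))))
```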

Generative Forests

  • paper_url: http://arxiv.org/abs/2308.03648
  • repo_url: https://github.com/AlCorreia/GeFs
  • paper_authors: Richard Nock, Mathieu Guillame-Bert
  • for: Generation and density modeling of tabular data, addressing three recurring problems with existing approaches (density models that do not yield exact samplers, a lack of powerful generative models for tabular data, and weak guarantees for data generation compared with supervised training).
  • methods: New tree-based generative models, convenient for density modeling and tabular data generation, that improve on the modeling capabilities of recent proposals, together with a training algorithm that simplifies the training setting of previous approaches and displays boosting-compliant convergence; it relies on a supervised scheme implementable with a few tweaks to the most popular two-class decision-tree induction procedure.
  • results: Experiments on missing-data imputation and on comparing generated data with real data show remarkable results, in particular against the state of the art.
    Abstract Tabular data represents one of the most prevalent form of data. When it comes to data generation, many approaches would learn a density for the data generation process, but would not necessarily end up with a sampler, even less so being exact with respect to the underlying density. A second issue is on models: while complex modeling based on neural nets thrives in image or text generation (etc.), less is known for powerful generative models on tabular data. A third problem is the visible chasm on tabular data between training algorithms for supervised learning with remarkable properties (e.g. boosting), and a comparative lack of guarantees when it comes to data generation. In this paper, we tackle the three problems, introducing new tree-based generative models convenient for density modeling and tabular data generation that improve on modeling capabilities of recent proposals, and a training algorithm which simplifies the training setting of previous approaches and displays boosting-compliant convergence. This algorithm has the convenient property to rely on a supervised training scheme that can be implemented by a few tweaks to the most popular induction scheme for decision tree induction with two classes. Experiments are provided on missing data imputation and comparing generated data to real data, displaying the quality of the results obtained by our approach, in particular against state of the art.
    摘要 表格数据表示一种非常常见的数据形式。在数据生成方面,许多方法会学习数据生成过程中的浓度,但并不一定会得到一个抽象,更不一定是对于下面的潜在浓度准确。第二个问题是模型:虽然复杂的模型基于神经网络在图像或文本生成等领域得到了成功,但对于可质量生成模型来说,对于表格数据 menos 知之。第三个问题是表格数据的可见差异,在超vised学习中具有惊人性能的训练算法(例如,提升),而数据生成方面却缺乏保证。在这篇论文中,我们解决了这三个问题,提出了新的树状生成模型,可以增强对表格数据的模型能力,以及一种简化训练设置的训练算法,可以在前一代方法的基础上进行快速启用。这种算法可以通过对最流行的决策树生成算法中的两类训练进行一些修改来实现。我们的实验表明,我们的方法可以在缺失数据填充和生成数据与真实数据比较的情况下表现出色,特别是与现有技术相比。

XFlow: Benchmarking Flow Behaviors over Graphs

  • paper_url: http://arxiv.org/abs/2308.03819
  • repo_url: https://github.com/xgraphing/xflow
  • paper_authors: Zijian Zhang, Zonghan Zhang, Zhiqian Chen
  • for: To provide a new benchmark suite covering tasks, baseline models, graph datasets, and evaluation tools for studying flow (diffusion) behaviors over graphs across domains.
  • methods: A curated benchmark together with a comprehensive analytical framework that offers a generalized approach to numerous flow-related tasks, drawing on baseline models, graph theory, and machine learning methods.
  • results: Empirical results expose the advantages and disadvantages of current foundational models on different graph datasets and highlight potential avenues for further study.
    Abstract The occurrence of diffusion on a graph is a prevalent and significant phenomenon, as evidenced by the spread of rumors, influenza-like viruses, smart grid failures, and similar events. Comprehending the behaviors of flow is a formidable task, due to the intricate interplay between the distribution of seeds that initiate flow propagation, the propagation model, and the topology of the graph. The study of networks encompasses a diverse range of academic disciplines, including mathematics, physics, social science, and computer science. This interdisciplinary nature of network research is characterized by a high degree of specialization and compartmentalization, and the cooperation facilitated by them is inadequate. From a machine learning standpoint, there is a deficiency in a cohesive platform for assessing algorithms across various domains. One of the primary obstacles to current research in this field is the absence of a comprehensive curated benchmark suite to study the flow behaviors under network scenarios. To address this disparity, we propose the implementation of a novel benchmark suite that encompasses a variety of tasks, baseline models, graph datasets, and evaluation tools. In addition, we present a comprehensive analytical framework that offers a generalized approach to numerous flow-related tasks across diverse domains, serving as a blueprint and roadmap. Drawing upon the outcomes of our empirical investigation, we analyze the advantages and disadvantages of current foundational models, and we underscore potential avenues for further study. The datasets, code, and baseline models have been made available for the public at: https://github.com/XGraphing/XFlow
    摘要 Diffusion on graphs is a common and important phenomenon, such as the spread of rumors, influenza-like viruses, and smart grid failures. Understanding the flow behavior is a challenging task due to the complex interplay between the seed distribution, propagation model, and graph topology. Network research is an interdisciplinary field that includes mathematics, physics, social science, and computer science, but this field is characterized by a high degree of specialization and compartmentalization, and cooperation between disciplines is limited. From a machine learning perspective, there is a lack of a comprehensive platform for assessing algorithms across different domains. One of the primary obstacles to current research in this field is the absence of a comprehensive curated benchmark suite to study flow behaviors under network scenarios.To address this gap, we propose the implementation of a novel benchmark suite that includes various tasks, baseline models, graph datasets, and evaluation tools. Additionally, we present a comprehensive analytical framework that provides a generalized approach to numerous flow-related tasks across diverse domains, serving as a blueprint and roadmap. Based on our empirical investigation, we analyze the advantages and disadvantages of current foundational models and highlight potential avenues for further study. The datasets, code, and baseline models have been made publicly available at: .

MedMine: Examining Pre-trained Language Models on Medication Mining

  • paper_url: http://arxiv.org/abs/2308.03629
  • repo_url: https://github.com/hecta-uom/m3
  • paper_authors: Haifa Alrdahi, Lifeng Han, Hendrik Šuvalov, Goran Nenadic
  • for: To examine how current pre-trained language models (PLMs) perform on automatic medication mining, as a reference point for future research.
  • methods: Fine-tuning of the monolingual model Med7 and the multilingual large language model XLM-RoBERTa, compared on the historical medication-mining shared-task datasets from the n2c2-2018 challenges.
  • results: Current PLMs show imbalanced performance across entity types and clinical events; the findings suggest combining their outputs, merging such models, or improving overall accuracy through ensemble learning and data augmentation.
    Abstract Automatic medication mining from clinical and biomedical text has become a popular topic due to its real impact on healthcare applications and the recent development of powerful language models (LMs). However, fully-automatic extraction models still face obstacles to be overcome such that they can be deployed directly into clinical practice for better impacts. Such obstacles include their imbalanced performances on different entity types and clinical events. In this work, we examine current state-of-the-art pre-trained language models (PLMs) on such tasks, via fine-tuning including the monolingual model Med7 and multilingual large language model (LLM) XLM-RoBERTa. We compare their advantages and drawbacks using historical medication mining shared task data sets from n2c2-2018 challenges. We report the findings we get from these fine-tuning experiments such that they can facilitate future research on addressing them, for instance, how to combine their outputs, merge such models, or improve their overall accuracy by ensemble learning and data augmentation. MedMine is part of the M3 Initiative \url{https://github.com/HECTA-UoM/M3}
    摘要 自动药物检索从临床和生物医学文本中的检索已经成为一个流行的话题,这主要归功于它在医疗应用中的实际影响以及最近发展的强大语言模型(LM)。然而,完全自动提取模型仍然需要突破一些障碍,以便在临床实践中直接部署。这些障碍包括它们在不同实体类型和临床事件上的不均衡表现。在这项工作中,我们研究了当前状态的最佳预训练语言模型(PLM)在这些任务上,包括单语言模型Med7和多语言大型语言模型(LLM)XLM-RoBERTa。我们比较了它们的优势和缺陷,使用历史药物检索共同任务数据集。我们报告了这些精度调整实验的结果,以便将来研究如何结合它们的输出,合并这些模型,或者提高它们的总精度 mediante ensemble学习和数据扩展。MedMine是M3Initiave的一部分,详情请参考

A sparse coding approach to inverse problems with application to microwave tomography imaging

  • paper_url: http://arxiv.org/abs/2308.03818
  • repo_url: None
  • paper_authors: Cesar F. Caiafa, Ramiro M. Irastorza
  • for: Solving ill-posed inverse imaging problems, which arise across science and technology, from medical diagnosis to astronomical studies.
  • methods: Sparse representation of images, a realistic, compact, and effective generative model for natural images inspired by the mammalian visual system, trained on a large collection of images to address ill-posed linear inverse problems.
  • results: The application of sparse coding is extended to the non-linear and ill-posed problem of microwave tomography imaging, which could lead to a significant improvement over state-of-the-art algorithms.
    Abstract Inverse imaging problems that are ill-posed can be encountered across multiple domains of science and technology, ranging from medical diagnosis to astronomical studies. To reconstruct images from incomplete and distorted data, it is necessary to create algorithms that can take into account both, the physical mechanisms responsible for generating these measurements and the intrinsic characteristics of the images being analyzed. In this work, the sparse representation of images is reviewed, which is a realistic, compact and effective generative model for natural images inspired by the visual system of mammals. It enables us to address ill-posed linear inverse problems by training the model on a vast collection of images. Moreover, we extend the application of sparse coding to solve the non-linear and ill-posed problem in microwave tomography imaging, which could lead to a significant improvement of the state-of-the-arts algorithms.
    摘要 不适定的逆成像问题可以在多个科学和技术领域中遇到,从医学诊断到天文学研究。为了从不完整和扭曲的数据中重建图像,需要创建既能考虑产生这些测量的物理机制、又能考虑被分析图像内在特征的算法。在这项工作中,我们回顾了图像的稀疏表示,它是受哺乳动物视觉系统启发的、可靠、紧凑且有效的自然图像生成模型。通过在大量图像上训练该模型,可以解决不适定的线性逆问题。此外,我们还将稀疏编码的应用扩展到微波断层成像中的非线性不适定问题,这可能会带来对现有算法的显著改进。
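For readers unfamiliar with the sparse-coding formulation, the short sketch below solves a generic under-determined linear inverse problem with an L1 (sparsity) prior and compares it with a plain least-squares solution. The sensing operator, problem sizes, and use of scikit-learn's Lasso are illustrative assumptions; the paper's microwave tomography problem is non-linear rather than linear.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Generic sparse recovery for an ill-posed linear inverse problem y = A x + noise
# with fewer measurements than unknowns. A sparsity prior regularizes the solution.
rng = np.random.default_rng(0)
n_meas, n_unknowns, n_nonzero = 60, 200, 8

x_true = np.zeros(n_unknowns)
x_true[rng.choice(n_unknowns, n_nonzero, replace=False)] = rng.normal(size=n_nonzero)
A = rng.normal(size=(n_meas, n_unknowns)) / np.sqrt(n_meas)   # sensing operator
y = A @ x_true + 0.01 * rng.normal(size=n_meas)

# L1-regularised (sparse) solution vs. the minimum-norm least-squares solution.
x_sparse = Lasso(alpha=0.01, max_iter=10_000).fit(A, y).coef_
x_lstsq = np.linalg.pinv(A) @ y

print("sparse-coding error:", round(np.linalg.norm(x_sparse - x_true), 3))
print("least-squares error:", round(np.linalg.norm(x_lstsq - x_true), 3))
```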

A Meta-learning based Stacked Regression Approach for Customer Lifetime Value Prediction

  • paper_url: http://arxiv.org/abs/2308.08502
  • repo_url: None
  • paper_authors: Karan Gadgil, Sukhpal Singh Gill, Ahmed M. Abdelmoniem
  • for: The paper is written to propose a new approach to estimating Customer Lifetime Value (CLV) that is both effective and interpretable, using a combination of bagging and boosting models.
  • methods: The proposed approach uses a meta-learning-based stacked regression model that combines the predictions from multiple bagging and boosting models to estimate CLV.
  • results: The paper shows the efficacy of the proposed approach through empirical tests on an openly available Online Retail dataset, demonstrating that it outperforms existing distribution-based and basic models.
    Abstract Companies across the globe are keen on targeting potential high-value customers in an attempt to expand revenue and this could be achieved only by understanding the customers more. Customer Lifetime Value (CLV) is the total monetary value of transactions/purchases made by a customer with the business over an intended period of time and is used as means to estimate future customer interactions. CLV finds application in a number of distinct business domains such as Banking, Insurance, Online-entertainment, Gaming, and E-Commerce. The existing distribution-based and basic (recency, frequency & monetary) based models face a limitation in terms of handling a wide variety of input features. Moreover, the more advanced Deep learning approaches could be superfluous and add an undesirable element of complexity in certain application areas. We, therefore, propose a system which is able to qualify both as effective, and comprehensive yet simple and interpretable. With that in mind, we develop a meta-learning-based stacked regression model which combines the predictions from bagging and boosting models that each is found to perform well individually. Empirical tests have been carried out on an openly available Online Retail dataset to evaluate various models and show the efficacy of the proposed approach.
    摘要 世界各地企业都在努力寻找高值客户,以扩大收入。这可以通过更好地了解客户来实现。客户全生命值(CLV)是指客户在企业与其交易的总财务值,在一定时间范围内,并用于估计未来客户互动。CLV在银行、保险、在线娱乐、游戏和电商等多个业务领域得到应用。现有的分布型和基本(频率、额度和时间)型模型具有处理多种输入特征的限制。此外,更先进的深度学习方法可能会添加不必要的复杂性。因此,我们提出一个能够同时具有效果、全面和简单可解释的系统。为了实现这一目标,我们开发了一种基于元学习的堆叠回归模型,该模型结合了袋型和投射型模型,每一个模型都能够单独表现出色。在一个公开的在线零售数据集上进行了实验,以评估不同模型的效果,并证明了我们的方法的有效性。
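As a concrete, hedged illustration of the stacked (meta-learning) regression idea, the sketch below combines a bagging model and a boosting model through a simple, interpretable meta-learner using scikit-learn. The RFM-style features, synthetic CLV target, and choice of base learners are assumptions for illustration, not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic recency / frequency / monetary style features and a toy CLV target.
rng = np.random.default_rng(0)
n = 2_000
X = np.column_stack([
    rng.integers(1, 365, n),        # recency (days since last purchase)
    rng.poisson(5, n),              # frequency (number of purchases)
    rng.gamma(2.0, 50.0, n),        # monetary (average order value)
])
clv = 0.8 * X[:, 1] * X[:, 2] * np.exp(-X[:, 0] / 200) + rng.normal(0, 20, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, clv, random_state=0)

# Bagging + boosting base learners, combined by an interpretable linear meta-learner.
stack = StackingRegressor(
    estimators=[
        ("bagging", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("boosting", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)
stack.fit(X_tr, y_tr)
print("stacked R^2 on held-out customers:", round(stack.score(X_te, y_te), 3))
```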

Stock Market Price Prediction: A Hybrid LSTM and Sequential Self-Attention based Approach

  • paper_url: http://arxiv.org/abs/2308.04419
  • repo_url: None
  • paper_authors: Karan Pardeshi, Sukhpal Singh Gill, Ahmed M. Abdelmoniem
  • for: Predicting stock prices so that investors can make the best decisions at the right time.
  • methods: A deep learning approach, specifically Long Short-Term Memory combined with a Sequential Self-Attention Mechanism (LSTM-SSAM).
  • results: Extensive experiments on three stock datasets (SBIN, HDFCBANK, BANKBARODA) show the proposed model to be effective and feasible compared with existing models, achieving the best results on the RMSE and R-square evaluation metrics.
    Abstract One of the most enticing research areas is the stock market, and projecting stock prices may help investors profit by making the best decisions at the correct time. Deep learning strategies have emerged as a critical technique in the field of the financial market. The stock market is impacted due to two aspects, one is the geo-political, social and global events on the bases of which the price trends could be affected. Meanwhile, the second aspect purely focuses on historical price trends and seasonality, allowing us to forecast stock prices. In this paper, our aim is to focus on the second aspect and build a model that predicts future prices with minimal errors. In order to provide better prediction results of stock price, we propose a new model named Long Short-Term Memory (LSTM) with Sequential Self-Attention Mechanism (LSTM-SSAM). Finally, we conduct extensive experiments on the three stock datasets: SBIN, HDFCBANK, and BANKBARODA. The experimental results prove the effectiveness and feasibility of the proposed model compared to existing models. The experimental findings demonstrate that the root-mean-squared error (RMSE), and R-square (R2) evaluation indicators are giving the best results.
    摘要 一个非常吸引人的研究领域是股票市场,并且预测股票价格可以帮助投资者获得最佳的决策时间。深度学习策略在财务市场中得到了广泛应用。股票市场受到两个方面的影响:一是地域政治社会和全球事件的影响,这些事件可能影响股票价格走势。另一方面,我们专注于历史价格走势和季节性,以预测股票价格。在这篇论文中,我们的目标是建立一个可以预测股票价格的新模型,并且对现有模型进行比较。为了提供更好的预测结果,我们提议一种名为长期记忆(LSTM)和顺序自注意机制(SSAM)的新模型。最后,我们对三个股票数据集进行了广泛的实验:SBIN、HDFCBANK和BANKBARODA。实验结果证明了我们提出的模型的可行性和有效性,并且比现有模型更好。实验结果表明,使用Root Mean Squared Error(RMSE)和R-square(R2)评价指标,我们的模型在预测股票价格方面得到了最佳的结果。
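A rough PyTorch sketch of the LSTM-plus-sequential-self-attention idea is shown below; the layer sizes, number of attention heads, and exact attention formulation are assumptions and need not match the paper's LSTM-SSAM architecture.

```python
import torch
import torch.nn as nn

class LSTMSelfAttention(nn.Module):
    """Sketch of an LSTM + self-attention regressor for next-day closing prices.
    Hyper-parameters are illustrative, not the paper's LSTM-SSAM configuration."""
    def __init__(self, n_features=1, hidden=64, heads=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, seq_len, n_features)
        h, _ = self.lstm(x)               # contextual state for every day
        a, _ = self.attn(h, h, h)         # self-attention over the sequence
        return self.head(a[:, -1, :])     # predict the next value from the last step

# Tiny smoke test on random 30-day price windows.
model = LSTMSelfAttention()
windows, targets = torch.randn(8, 30, 1), torch.randn(8, 1)
loss = nn.MSELoss()(model(windows), targets)
loss.backward()
print("MSE on a random batch:", float(loss))
```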

Adaptive Semi-Supervised Segmentation of Brain Vessels with Ambiguous Labels

  • paper_url: http://arxiv.org/abs/2308.03613
  • repo_url: None
  • paper_authors: Fengming Lin, Yan Xia, Nishant Ravikumar, Qiongyao Liu, Michael MacRaild, Alejandro F Frangi
  • for: Accurate segmentation of brain vessels, which is crucial for cerebrovascular disease diagnosis and treatment.
  • methods: An adaptive semi-supervised approach incorporating progressive semi-supervised learning, an adaptive training strategy, and boundary enhancement.
  • results: Experiments on 3DRA datasets show superiority over other methods on mesh-based segmentation metrics; by exploiting partially and ambiguously labeled data in which only the main vessels are annotated, the method achieves impressive segmentation of mislabeled fine vessels, showing its potential for clinical applications.
    Abstract Accurate segmentation of brain vessels is crucial for cerebrovascular disease diagnosis and treatment. However, existing methods face challenges in capturing small vessels and handling datasets that are partially or ambiguously annotated. In this paper, we propose an adaptive semi-supervised approach to address these challenges. Our approach incorporates innovative techniques including progressive semi-supervised learning, adaptative training strategy, and boundary enhancement. Experimental results on 3DRA datasets demonstrate the superiority of our method in terms of mesh-based segmentation metrics. By leveraging the partially and ambiguously labeled data, which only annotates the main vessels, our method achieves impressive segmentation performance on mislabeled fine vessels, showcasing its potential for clinical applications.
    摘要 精准分割脑血管是脑血管疾病诊断和治疗中的关键。然而,现有方法在捕捉小血管和处理部分或恶劣标注的数据集时遇到困难。在这篇论文中,我们提出了一种适应式半监督方法来解决这些挑战。我们的方法包括进步式半监督学习、适应性训练策略和边界增强等创新技术。在3DRA数据集上进行实验,我们的方法在基于网格的分割指标上表现出色。通过利用部分和恶劣标注的数据,我们的方法在混乱标注的细血管上具有出色的分割性能,这显示了其在临床应用中的潜力。

A machine-learning sleep-wake classification model using a reduced number of features derived from photoplethysmography and activity signals

  • paper_url: http://arxiv.org/abs/2308.05759
  • repo_url: None
  • paper_authors: Douglas A. Almeida, Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Filipe A. C. Oliveira, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez
  • for: To develop a machine learning sleep-wake classification model based on photoplethysmography (PPG) signals and activity counts, supporting the assessment of sleep quality and overall health.
  • methods: The eXtreme Gradient Boosting (XGBoost) algorithm applied to features extracted from the PPG signal and activity counts.
  • results: Performance comparable to current state-of-the-art methods, with a sensitivity of 91.15 ± 1.16%, specificity of 53.66 ± 1.12%, F1-score of 83.88 ± 0.56%, and kappa of 48.0 ± 0.86%, while using a reduced number of features, making the model suitable for wearable devices with limited computational power.
    Abstract Sleep is a crucial aspect of our overall health and well-being. It plays a vital role in regulating our mental and physical health, impacting our mood, memory, and cognitive function to our physical resilience and immune system. The classification of sleep stages is a mandatory step to assess sleep quality, providing the metrics to estimate the quality of sleep and how well our body is functioning during this essential period of rest. Photoplethysmography (PPG) has been demonstrated to be an effective signal for sleep stage inference, meaning it can be used on its own or in a combination with others signals to determine sleep stage. This information is valuable in identifying potential sleep issues and developing strategies to improve sleep quality and overall health. In this work, we present a machine learning sleep-wake classification model based on the eXtreme Gradient Boosting (XGBoost) algorithm and features extracted from PPG signal and activity counts. The performance of our method was comparable to current state-of-the-art methods with a Sensitivity of 91.15 $\pm$ 1.16%, Specificity of 53.66 $\pm$ 1.12%, F1-score of 83.88 $\pm$ 0.56%, and Kappa of 48.0 $\pm$ 0.86%. Our method offers a significant improvement over other approaches as it uses a reduced number of features, making it suitable for implementation in wearable devices that have limited computational power.
    摘要 睡眠是我们健康和养成的重要方面。它对我们的情绪、身体和肌肉健康产生重要影响,对我们的情绪、记忆和认知功能也产生很大的影响。睡眠阶段的分类是评估睡眠质量的必要步骤,可以提供评估睡眠质量的度量,以及身体在这段睡眠时的功能状况。光学抵抗检测(PPG)已经被证明可以用于睡眠阶段推断,这意味着它可以单独使用或与其他信号组合使用来确定睡眠阶段。这些信息非常重要,可以帮助发现可能存在的睡眠问题,并开发改善睡眠质量和总体健康的策略。在这项工作中,我们提出了基于XTreme Gradient Boosting(XGBoost)算法和PPG信号和活动计数的机器学习睡眠-醒目分类模型。我们的方法的性能与当前状态的方法相当,具有感知率91.15 $\pm$ 1.16%、特异性53.66 $\pm$ 1.12%、F1分数83.88 $\pm$ 0.56%和卡ปα48.0 $\pm$ 0.86%。我们的方法比其他方法更加有优势,因为它使用了减少的特征数,适合在有限计算能力的穿戴式设备中实现。
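The sketch below shows what an XGBoost sleep/wake classifier over a small set of PPG-derived and activity features could look like (it requires the xgboost package). The specific features, synthetic labels, and hyper-parameters are illustrative assumptions rather than the paper's setup.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, cohen_kappa_score

# A handful of per-epoch features derived from PPG (heart rate, an HRV proxy)
# plus wrist activity counts; all data here is synthetic and for illustration only.
rng = np.random.default_rng(0)
n_epochs = 5_000
hr = rng.normal(65, 10, n_epochs)           # mean heart rate per 30 s epoch
hrv = rng.normal(50, 15, n_epochs)          # HRV proxy (e.g. RMSSD)
activity = rng.poisson(3, n_epochs)         # activity counts
X = np.column_stack([hr, hrv, activity])
y = ((hr < 65) & (activity < 4)).astype(int)  # toy labels: 1 = sleep, 0 = wake

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                    eval_metric="logloss")
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("F1:", round(f1_score(y_te, pred), 3),
      "| kappa:", round(cohen_kappa_score(y_te, pred), 3))
```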

  • paper_url: http://arxiv.org/abs/2308.03574
  • repo_url: https://github.com/anonreposit/gesp
  • paper_authors: Etor Arza, Leni K. Le Goff, Emma Hart
  • for: To reduce the computation time of direct policy search tasks, especially when evaluations are carried out in the physical world.
  • methods: A simple early-stopping criterion based only on the objective value at each time step, requiring no problem-specific knowledge.
  • results: Across five direct policy search environments drawn from games, robotics, and classic control, up to 75% of computation time is saved; a comparison with problem-specific stopping criteria shows comparable performance while being more generally applicable.
    Abstract Lengthy evaluation times are common in many optimization problems such as direct policy search tasks, especially when they involve conducting evaluations in the physical world, e.g. in robotics applications. Often, when evaluating a solution over a fixed time period, it becomes clear that the objective value will not increase with additional computation time (for example, when a two-wheeled robot continuously spins on the spot). In such cases, it makes sense to stop the evaluation early to save computation time. However, most approaches to stop the evaluation are problem-specific and need to be specifically designed for the task at hand. Therefore, we propose an early stopping method for direct policy search. The proposed method only looks at the objective value at each time step and requires no problem-specific knowledge. We test the introduced stopping criterion in five direct policy search environments drawn from games, robotics, and classic control domains, and show that it can save up to 75% of the computation time. We also compare it with problem-specific stopping criteria and demonstrate that it performs comparably while being more generally applicable.
    摘要 长时间的评估时间是许多优化问题中的常见问题,如直接策略搜索任务,特别是在物理世界中进行评估,例如在二轮机器人应用中。经常情况下,当评估一解决方案在固定时间段内时,就会发现目标值不会随着计算时间增加(例如,当二轮机器人不断旋转在同一点上)。在这些情况下,可以提前结束评估以降低计算时间。然而,大多数评估结束方法是专门为特定任务设计的,需要具备专门的问题知识。因此,我们提出了一种直接策略搜索中的 early stopping 方法。该方法只需要在每个时间步骤中评估目标值,无需特定任务知识。我们在五个直接策略搜索环境中测试了引入的停止标准,这些环境来自游戏、机器人和 класси控制领域。我们发现,该方法可以将计算时间减少到75%。我们还与专门的停止标准进行比较,并证明它与专门的停止标准相比,性能相似,但更一般适用。
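The core idea, stopping an evaluation once the objective value stops improving, can be sketched in a few lines. The patience-based rule below is a generic illustration under assumed names, not the exact criterion proposed in the paper.

```python
def evaluate_with_early_stopping(step_objective, max_steps, patience):
    """Run one evaluation episode, but stop once the objective value has not
    improved for `patience` consecutive steps. Generic illustration only."""
    best, best_step = float("-inf"), 0
    for t in range(max_steps):
        objective = step_objective(t)          # objective value observed at step t
        if objective > best:
            best, best_step = objective, t
        elif t - best_step >= patience:        # no improvement for `patience` steps
            break
    return best, t + 1                         # objective reached, steps actually used

def make_spinning_robot_episode(useful_steps=300):
    """Toy episode: reward accumulates for a while, then the robot spins in place."""
    total = 0.0
    def step(t):
        nonlocal total
        total += 1.0 if t < useful_steps else 0.0
        return total
    return step

best, used = evaluate_with_early_stopping(make_spinning_robot_episode(),
                                          max_steps=1000, patience=50)
print(f"objective {best:.0f} reached using {used} of 1000 steps")
```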

When Federated Learning meets Watermarking: A Comprehensive Overview of Techniques for Intellectual Property Protection

  • paper_url: http://arxiv.org/abs/2308.03573
  • repo_url: None
  • paper_authors: Mohammed Lansari, Reda Bellafqira, Katarzyna Kapusta, Vincent Thouvenot, Olivier Bettan, Gouenou Coatrieux
  • for: To provide an overview of recent federated learning (FL) watermarking techniques, including the new challenges and opportunities that arise in this setting.
  • methods: A detailed review of DNN watermarking research from the last five years and of the ways these techniques are being adapted to FL and its unique constraints.
  • results: A summary of the state of the art in FL watermarking, clarifying the open challenges of protecting model ownership in federated settings.
    Abstract Federated Learning (FL) is a technique that allows multiple participants to collaboratively train a Deep Neural Network (DNN) without the need of centralizing their data. Among other advantages, it comes with privacy-preserving properties making it attractive for application in sensitive contexts, such as health care or the military. Although the data are not explicitly exchanged, the training procedure requires sharing information about participants' models. This makes the individual models vulnerable to theft or unauthorized distribution by malicious actors. To address the issue of ownership rights protection in the context of Machine Learning (ML), DNN Watermarking methods have been developed during the last five years. Most existing works have focused on watermarking in a centralized manner, but only a few methods have been designed for FL and its unique constraints. In this paper, we provide an overview of recent advancements in Federated Learning watermarking, shedding light on the new challenges and opportunities that arise in this field.
    摘要 受领域限制的 Federated Learning(FL)是一种技术,允许多个参与者共同训练深度神经网络(DNN),不需要中央化数据。FL possess several advantages, including privacy-preserving properties, making it suitable for applications in sensitive contexts, such as healthcare or the military. However, the training process requires sharing information about participants' models, which makes the individual models vulnerable to theft or unauthorized distribution by malicious actors. To address the issue of ownership rights protection in the context of Machine Learning(ML), DNN watermarking methods have been developed over the past five years. Most existing works have focused on watermarking in a centralized manner, but only a few methods have been designed for FL and its unique constraints. In this paper, we provide an overview of recent advancements in Federated Learning watermarking, highlighting the new challenges and opportunities that arise in this field.

Provably Efficient Learning in Partially Observable Contextual Bandit

  • paper_url: http://arxiv.org/abs/2308.03572
  • repo_url: None
  • paper_authors: Xueping Gong, Jiheng Zhang
  • for: Transfer learning in partially observable contextual bandits, where agents have limited knowledge from other agents and only partial information about hidden confounders; the problem is cast as identifying or partially identifying causal effects between actions and rewards through optimization problems.
  • methods: The original functional constraints on unknown distributions are discretized into linear constraints, and compatible causal models are sampled by sequentially solving linear programs, yielding causal bounds that account for estimation error; the sampling algorithms come with desirable convergence guarantees.
  • results: The causally enhanced algorithms provably outperform classical bandit algorithms, with regret improvements that depend on the size of the action set and function space and orders-of-magnitude faster convergence; in the function-approximation setting, which handles general context distributions, the order dependence on function-space size improves over previous literature, and simulations confirm the efficiency of the strategy against state-of-the-art methods.
    Abstract In this paper, we investigate transfer learning in partially observable contextual bandits, where agents have limited knowledge from other agents and partial information about hidden confounders. We first convert the problem to identifying or partially identifying causal effects between actions and rewards through optimization problems. To solve these optimization problems, we discretize the original functional constraints of unknown distributions into linear constraints, and sample compatible causal models via sequentially solving linear programmings to obtain causal bounds with the consideration of estimation error. Our sampling algorithms provide desirable convergence results for suitable sampling distributions. We then show how causal bounds can be applied to improving classical bandit algorithms and affect the regrets with respect to the size of action sets and function spaces. Notably, in the task with function approximation which allows us to handle general context distributions, our method improves the order dependence on function space size compared with previous literatures. We formally prove that our causally enhanced algorithms outperform classical bandit algorithms and achieve orders of magnitude faster convergence rates. Finally, we perform simulations that demonstrate the efficiency of our strategy compared to the current state-of-the-art methods. This research has the potential to enhance the performance of contextual bandit agents in real-world applications where data is scarce and costly to obtain.
    摘要 在这篇论文中,我们研究了在部分可见 контекстual bandit 中的传输学习, agents 具有其他 agents 的有限知识和隐藏的干扰因素的有限信息。我们首先将问题转化为标识或部分标识 causal 效果 между动作和奖励的优化问题。为解这些优化问题,我们将原始的函数约束Unknown Distribution 转化为线性约束,并通过顺序解线性程序来采样兼容 causal 模型,以获得 causal 上下文的约束。我们的抽取算法提供了可靠的收敛结果。然后,我们展示了如何使用 causal 上下文来改进 classical bandit 算法,并对奖励的大小和功能空间的大小产生影响。尤其在功能适应任务中,我们的方法可以处理一般的 context 分布,我们的方法可以在功能空间大小的下降中提高奖励的顺序依赖度。我们正式证明我们的 causally 强化算法在较前文献的算法之上出perform better,并 achiev 许多更快的收敛率。最后,我们在实验中证明了我们的策略在实际应用中比现有的方法更高效。这种研究具有提高实际应用中 contextual bandit 代理的性能的潜在可能性。

Partial identification of kernel based two sample tests with mismeasured data

  • paper_url: http://arxiv.org/abs/2308.03570
  • repo_url: None
  • paper_authors: Ron Nafshi, Maggie Makar
  • for: Using nonparametric two-sample tests such as the Maximum Mean Discrepancy (MMD) in machine learning applications when some samples are mismeasured (erroneously grouped with the wrong distribution).
  • methods: Estimation of the MMD under $\epsilon$-contamination, where a possibly non-random $\epsilon$ proportion of one distribution is erroneously grouped with the other, via partial identification of the MMD.
  • results: A method for estimating sharp upper and lower bounds that contain the true, unknown MMD; the estimates converge to the sharpest possible bounds as sample size grows, faster than alternative approaches, and on three datasets the method gives tight bounds with a low false coverage rate.
    Abstract Nonparametric two-sample tests such as the Maximum Mean Discrepancy (MMD) are often used to detect differences between two distributions in machine learning applications. However, the majority of existing literature assumes that error-free samples from the two distributions of interest are available.We relax this assumption and study the estimation of the MMD under $\epsilon$-contamination, where a possibly non-random $\epsilon$ proportion of one distribution is erroneously grouped with the other. We show that under $\epsilon$-contamination, the typical estimate of the MMD is unreliable. Instead, we study partial identification of the MMD, and characterize sharp upper and lower bounds that contain the true, unknown MMD. We propose a method to estimate these bounds, and show that it gives estimates that converge to the sharpest possible bounds on the MMD as sample size increases, with a convergence rate that is faster than alternative approaches. Using three datasets, we empirically validate that our approach is superior to the alternatives: it gives tight bounds with a low false coverage rate.
    摘要 非参数两个样本测试,如最大均值差(MMD),在机器学习应用中广泛使用以检测两个分布之间的差异。然而,现有的大部分文献假设有误无误的样本来自两个分布。我们松弛这个假设,研究在$\epsilon$-杂化下MMD的估计,其中可能存在非随机的$\epsilon$比例一个分布被误归类为另一个分布。我们表明在$\epsilon$-杂化下,通常的估计MMD是不可靠的。而我们研究了MMD的部分鉴定,并Characterize了包含真实未知MMD的尖锐上下限。我们提议了一种估计这些上下限的方法,并证明它的估计将与样本大小增加而 convergence to the sharpest possible bounds on MMD, with a convergence rate that is faster than alternative approaches。使用三个数据集,我们实验 validate that our approach is superior to the alternatives: it gives tight bounds with a low false coverage rate.
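For reference, the quantity being bounded is the standard kernel MMD. The sketch below computes a biased MMD^2 estimate with an RBF kernel and shows how epsilon-contamination shrinks it; the contamination setup is an illustrative assumption, and the paper's partial-identification bounds are not implemented here.

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=1.0):
    """Biased MMD^2 estimate between samples X and Y using an RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
P = rng.normal(0.0, 1.0, size=(500, 2))
Q = rng.normal(0.5, 1.0, size=(500, 2))
print("clean MMD^2:", round(mmd2_rbf(P, Q), 4))

# eps-contamination: 20% of Q's samples are mistakenly drawn from P,
# which shrinks the naive estimate and makes the usual test unreliable.
eps = 0.2
contaminated = Q.copy()
idx = rng.choice(len(Q), int(eps * len(Q)), replace=False)
contaminated[idx] = rng.normal(0.0, 1.0, size=(len(idx), 2))
print("contaminated MMD^2:", round(mmd2_rbf(P, contaminated), 4))
```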

A Transfer Learning Framework for Proactive Ramp Metering Performance Assessment

  • paper_url: http://arxiv.org/abs/2308.03542
  • repo_url: None
  • paper_authors: Xiaobo Ma, Adrian Cottam, Mohammad Razaur Rahman Shaon, Yao-Jan Wu
  • for: Assessing ramp metering performance when deploying or expanding a ramp metering system, which requires evaluating the strategy's impact on freeway traffic mobility.
  • methods: A transfer-learning-based framework that learns the association between the spatial-temporal features of traffic states in before and after situations on known freeway segments, and transfers this learning to predict traffic parameters (speed, occupancy, flow rate) for new freeway segments.
  • results: Experimental results show the proposed framework is feasible as an alternative for predicting freeway traffic parameters and proactively evaluating ramp metering performance.
    Abstract Transportation agencies need to assess ramp metering performance when deploying or expanding a ramp metering system. The evaluation of a ramp metering strategy is primarily centered around examining its impact on freeway traffic mobility. One way these effects can be explored is by comparing traffic states, such as the speed before and after the ramp metering strategy has been altered. Predicting freeway traffic states for the after scenarios following the implementation of a new ramp metering control strategy could offer valuable insights into the potential effectiveness of the target strategy. However, the use of machine learning methods in predicting the freeway traffic state for the after scenarios and evaluating the effectiveness of transportation policies or traffic control strategies such as ramp metering is somewhat limited in the current literature. To bridge the research gap, this study presents a framework for predicting freeway traffic parameters (speed, occupancy, and flow rate) for the after situations when a new ramp metering control strategy is implemented. By learning the association between the spatial-temporal features of traffic states in before and after situations for known freeway segments, the proposed framework can transfer this learning to predict the traffic parameters for new freeway segments. The proposed framework is built upon a transfer learning model. Experimental results show that the proposed framework is feasible for use as an alternative for predicting freeway traffic parameters to proactively evaluate ramp metering performance.
    摘要 (简化中文)交通机构需要评估扩展或部署匝道控制系统时,需要评估匝道控制策略的表现。匝道控制策略的效果主要是通过评估它对高速公路交通流动性的影响来评估。比较交通状态之前和之后匝道控制策略改变后的情况可以提供有价值的信息。但是,使用机器学习方法来预测高速公路交通状态的后果和评估交通政策或匝道控制策略的效果在当前文献中受限。为了填补这个研究漏洞,本研究提出了一种预测高速公路交通参数(速度、占用率和流速)的后果框架。通过学习知道的匝道段的前后交通状态之间的空间时间特征,该框架可以将这种学习转移到新的匝道段上预测交通参数。该框架基于转移学习模型。实验结果表明,该框架可以作为评估匝道控制性能的代替方法来预测高速公路交通参数。

On-ramp and Off-ramp Traffic Flows Estimation Based on A Data-driven Transfer Learning Framework

  • paper_url: http://arxiv.org/abs/2308.03538
  • repo_url: None
  • paper_authors: Xiaobo Ma, Abolfazl Karimpour, Yao-Jan Wu
  • for: To support the implementation and monitoring of ramp control strategies and the evaluation of traffic performance at freeway weaving areas, which requires on-ramp and off-ramp flows that are not always available to transportation agencies.
  • methods: A data-driven framework built on a transfer learning model that accurately estimates the missing ramp flows using only data collected from loop detectors on freeway mainlines, even when traffic patterns, distributions, and characteristics differ.
  • results: Flow estimation mean absolute errors range from 23.90 to 40.85 veh/h for on-ramps and from 31.58 to 45.31 veh/h for off-ramps, with root mean square errors of 34.55 to 57.77 veh/h and 41.75 to 58.80 veh/h respectively; comparison analysis shows the framework outperforms conventional machine learning models.
    Abstract To develop the most appropriate control strategy and monitor, maintain, and evaluate the traffic performance of the freeway weaving areas, state and local Departments of Transportation need to have access to traffic flows at each pair of on-ramp and off-ramp. However, ramp flows are not always readily available to transportation agencies and little effort has been made to estimate these missing flows in locations where no physical sensors are installed. To bridge this research gap, a data-driven framework is proposed that can accurately estimate the missing ramp flows by solely using data collected from loop detectors on freeway mainlines. The proposed framework employs a transfer learning model. The transfer learning model relaxes the assumption that the underlying data distributions of the source and target domains must be the same. Therefore, the proposed framework can guarantee high-accuracy estimation of on-ramp and off-ramp flows on freeways with different traffic patterns, distributions, and characteristics. Based on the experimental results, the flow estimation mean absolute errors range between 23.90 veh/h to 40.85 veh/h for on-ramps, and 31.58 veh/h to 45.31 veh/h for off-ramps; the flow estimation root mean square errors range between 34.55 veh/h to 57.77 veh/h for on-ramps, and 41.75 veh/h to 58.80 veh/h for off-ramps. Further, the comparison analysis shows that the proposed framework outperforms other conventional machine learning models. The estimated ramp flows based on the proposed method can help transportation agencies to enhance the operations of their ramp control strategies for locations where physical sensors are not installed.
    摘要 为了开发最佳的控制策略和监测、维护和评估高速公路叠加区域的交通表现,国家和地方交通部门需要了解每对侧线和脱离的交通流量。然而,叠加区域的流量并不总是可以向交通机构提供,而且过去很少有人尝试了估计这些缺失的流量。为了填补这一研究漏洞,我们提出了一个数据驱动的框架,可以准确地估计缺失的叠加流量,只使用高速公路主线上的循环检测器收集的数据。我们的框架采用了传输学习模型,该模型放宽了下面数据集的假设,因此我们的框架可以 garantizar高精度地估计叠加流量,无论高速公路的交通模式、分布和特征如何。根据实验结果,估计的叠加流量 Mean Absolute Error 在23.90 veh/h 到 40.85 veh/h之间,Root Mean Square Error 在34.55 veh/h 到 57.77 veh/h之间。此外,比较分析表明,我们的方法在其他传统的机器学习模型之上表现出色。根据我们的估计结果,可以帮助交通机构在没有物理感知器的情况下提高叠加控制策略的运行。

Deep Feature Learning for Wireless Spectrum Data

  • paper_url: http://arxiv.org/abs/2308.03530
  • repo_url: None
  • paper_authors: Ljupcho Milosheski, Gregor Cerar, Blaž Bertalanič, Carolina Fortuna, Mihael Mohorčič
  • for: Fully unsupervised learning of feature representations for wireless transmission clustering, requiring no labels during training.
  • methods: A convolutional neural network-based model that automatically learns a reduced-dimensionality representation of the input data with 99.3% fewer components than a baseline principal component analysis (PCA).
  • results: The automatically learned representation extracts fine-grained clusters containing the shapes of the wireless transmission bursts, whereas the baseline only enables a general separation of the data based on background noise.
    Abstract In recent years, the traditional feature engineering process for training machine learning models is being automated by the feature extraction layers integrated in deep learning architectures. In wireless networks, many studies were conducted in automatic learning of feature representations for domain-related challenges. However, most of the existing works assume some supervision along the learning process by using labels to optimize the model. In this paper, we investigate an approach to learning feature representations for wireless transmission clustering in a completely unsupervised manner, i.e. requiring no labels in the process. We propose a model based on convolutional neural networks that automatically learns a reduced dimensionality representation of the input data with 99.3% less components compared to a baseline principal component analysis (PCA). We show that the automatic representation learning is able to extract fine-grained clusters containing the shapes of the wireless transmission bursts, while the baseline enables only general separability of the data based on the background noise.

AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.03526
  • repo_url: None
  • paper_authors: Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals
  • For: To advance offline reinforcement learning algorithms in the challenging environment of StarCraft II.
  • Methods: A new benchmark, AlphaStar Unplugged, comprising a dataset (a subset of Blizzard's release of millions of human-played games), tools standardizing an API for machine learning methods, and an evaluation protocol, together with baseline agents including behavior cloning and offline variants of actor-critic and MuZero.
  • Results: Using only offline data, the agents improve the state of the art in this domain and achieve a 90% win rate against the previously published AlphaStar behavior cloning agent.
    Abstract StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because Blizzard has released a massive dataset of millions of StarCraft II games played by human players. This paper leverages that and establishes a benchmark, called AlphaStar Unplugged, introducing unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol. We also present baseline agents, including behavior cloning, offline variants of actor-critic and MuZero. We improve the state of the art of agents using only offline data, and we achieve 90% win rate against previously published AlphaStar behavior cloning agent.
    摘要 星际II是一个非常挑战性的规则学习环境,它是部分可见、随机、多智能的,掌握星际II需要长期战略规划,同时在实时低级别执行。它还有活跃的职业竞赛场景。由于星际II的挑战性和Blizzard公司发布了数百万场星际II游戏记录,这使得星际II成为了提前RL算法的进步的 идеaldestination。本文利用了这些数据和API,并实现了一个基准测试(AlphaStar Unplugged),它为Offline RL算法带来了前所未有的挑战。我们定义了一个子集(Blizzard发布的数据),工具和标准API для机器学习方法,以及评价协议。我们还提供了基eline agents,包括行为做法快照、Offline变体的actor-critic和MuZero。我们使用仅Offline数据提高了代理机器的状态,并达到了以前发布的AlphaStar行为快照机器的90%赢率。

Worker Activity Recognition in Manufacturing Line Using Near-body Electric Field

  • paper_url: http://arxiv.org/abs/2308.03514
  • repo_url: None
  • paper_authors: Sungho Suh, Vitor Fortes Rey, Sizhen Bian, Yu-Chi Huang, Jože M. Rožanec, Hooman Tavakoli Ghinani, Bo Zhou, Paul Lukowicz
  • for: To improve production efficiency and product quality in manufacturing by recognizing worker activities on the manufacturing line with advanced sensing.
  • methods: A novel wearable sensing prototype combining IMU and body-capacitance sensing modules, with early and late sensor-data fusion approaches compared for multi-channel time-series convolutional neural networks and deep convolutional LSTMs.
  • results: On data collected and annotated with the proposed prototype and Apple Watches in a manufacturing-line testbed, the proposed methods outperform the baselines; the prototype with the body-capacitance sensor and the feature fusion method improves performance by 6.35%, yielding a 9.38% higher macro F1 score than the prototype without the body-capacitance sensor and the Apple Watch data, respectively.
    Abstract Manufacturing industries strive to improve production efficiency and product quality by deploying advanced sensing and control systems. Wearable sensors are emerging as a promising solution for achieving this goal, as they can provide continuous and unobtrusive monitoring of workers' activities in the manufacturing line. This paper presents a novel wearable sensing prototype that combines IMU and body capacitance sensing modules to recognize worker activities in the manufacturing line. To handle these multimodal sensor data, we propose and compare early, and late sensor data fusion approaches for multi-channel time-series convolutional neural networks and deep convolutional LSTM. We evaluate the proposed hardware and neural network model by collecting and annotating sensor data using the proposed sensing prototype and Apple Watches in the testbed of the manufacturing line. Experimental results demonstrate that our proposed methods achieve superior performance compared to the baseline methods, indicating the potential of the proposed approach for real-world applications in manufacturing industries. Furthermore, the proposed sensing prototype with a body capacitive sensor and feature fusion method improves by 6.35%, yielding a 9.38% higher macro F1 score than the proposed sensing prototype without a body capacitive sensor and Apple Watch data, respectively.
    摘要 制造业为提高生产效率和产品质量而努力,通常会使用先进的感测和控制系统。穿戴式感测器正在成为制造业中实现这一目标的有望解决方案,因为它们可以提供不间断和不干扰的工作者活动监测。本文介绍了一种新的穿戴式感测原型,该原型结合IMU和身体电容感测模块,以认识制造线上工作者的活动。为处理这些多模式感测数据,我们提出并比较了早期和晚期感测数据融合方法,用于多渠道时间序列卷积神经网络和深度卷积LSTM。我们通过使用提议的硬件和神经网络模型,对收集和标注感测数据的Apple Watch和测试准系中的感测数据进行评估。实验结果表明,我们的提议方法在比基准方法的情况下表现出色,这表明了我们的方法在实际应用中的潜在可能性。此外,结合身体电容感测器和特征融合方法的提议感测器提高了6.35%,对比没有身体电容感测器和Apple Watch数据的情况下,提议感测器的macro F1分数提高了9.38%。
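The early versus late fusion comparison mentioned above can be sketched as two small PyTorch models: one concatenates the raw IMU and body-capacitance channels before a shared CNN, the other trains per-modality CNNs and concatenates their features before the classifier. Channel counts, window length, and layer sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

IMU_CH, CAP_CH, WIN = 6, 1, 128   # IMU channels, body-capacitance channel, window length

def conv_block(in_ch):
    # Small 1D CNN feature extractor producing a 32-d embedding per window.
    return nn.Sequential(nn.Conv1d(in_ch, 32, 5, padding=2), nn.ReLU(),
                         nn.AdaptiveAvgPool1d(1), nn.Flatten())

class EarlyFusion(nn.Module):
    def __init__(self, n_classes=6):
        super().__init__()
        self.net = nn.Sequential(conv_block(IMU_CH + CAP_CH), nn.Linear(32, n_classes))
    def forward(self, imu, cap):                      # fuse raw channels first
        return self.net(torch.cat([imu, cap], dim=1))

class LateFusion(nn.Module):
    def __init__(self, n_classes=6):
        super().__init__()
        self.imu_net, self.cap_net = conv_block(IMU_CH), conv_block(CAP_CH)
        self.head = nn.Linear(64, n_classes)          # fuse learned features
    def forward(self, imu, cap):
        return self.head(torch.cat([self.imu_net(imu), self.cap_net(cap)], dim=1))

imu, cap = torch.randn(4, IMU_CH, WIN), torch.randn(4, CAP_CH, WIN)
print(EarlyFusion()(imu, cap).shape, LateFusion()(imu, cap).shape)
```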

A data-driven approach to predict decision point choice during normal and evacuation wayfinding in multi-story buildings

  • paper_url: http://arxiv.org/abs/2308.03511
  • repo_url: None
  • paper_authors: Yan Feng, Panchamy Krishnakumari
  • for: To understand and predict pedestrian decision-point choice during normal and emergency wayfinding in multi-story buildings, in order to ensure pedestrian safety.
  • methods: A data-driven approach: an indoor network representation is built, VR coordinates are mapped onto it, and a random forest (RF) model predicts pedestrian decision-point choice along a route across four wayfinding tasks, using behavioral data collected in a virtual reality experiment.
  • results: The RF model predicts decision points with much higher accuracy than logistic regression (93% on average, up to 96% for task 3); personal characteristics were found not to affect decision-point choice, demonstrating the potential of machine learning for studying pedestrian route choice in complex indoor buildings.
    Abstract Understanding pedestrian route choice behavior in complex buildings is important to ensure pedestrian safety. Previous studies have mostly used traditional data collection methods and discrete choice modeling to understand the influence of different factors on pedestrian route and exit choice, particularly in simple indoor environments. However, research on pedestrian route choice in complex buildings is still limited. This paper presents a data-driven approach for understanding and predicting the pedestrian decision point choice during normal and emergency wayfinding in a multi-story building. For this, we first built an indoor network representation and proposed a data mapping technique to map VR coordinates to the indoor representation. We then used a well-established machine learning algorithm, namely the random forest (RF) model to predict pedestrian decision point choice along a route during four wayfinding tasks in a multi-story building. Pedestrian behavioral data in a multi-story building was collected by a Virtual Reality experiment. The results show a much higher prediction accuracy of decision points using the RF model (i.e., 93% on average) compared to the logistic regression model. The highest prediction accuracy was 96% for task 3. Additionally, we tested the model performance combining personal characteristics and we found that personal characteristics did not affect decision point choice. This paper demonstrates the potential of applying a machine learning algorithm to study pedestrian route choice behavior in complex indoor buildings.
    摘要 理解步行者路径选择行为在复杂的建筑物中是重要的,以确保步行者的安全。先前的研究通常使用传统的数据采集方法和精确选择模型来理解不同因素对步行者路径和出口选择的影响,特别是在简单的室内环境中。然而,关于步行者路径选择在复杂的建筑物中的研究仍然有限。本文提出了一种数据驱动的方法,用于理解和预测步行者决策点选择在正常和紧急导航中的多层建筑物中。为此,我们首先建立了一个室内网络表示,并提出了一种数据映射技术来将VR坐标映射到室内表示中。然后,我们使用一种已有的机器学习算法,即随机森林(RF)模型来预测步行者决策点选择的路径中的决策点。在一个多层建筑物中的步行者行为数据被通过虚拟现实实验收集。结果显示,使用RF模型(即93%的平均预测精度)的预测精度远高于逻辑回归模型。最高的预测精度是96%的任务3。此外,我们测试了模型性能,结果表明个人特征不会影响决策点选择。本文示范了应用机器学习算法研究步行者路径选择行为在复杂室内建筑物中的可能性。

Balanced Face Dataset: Guiding StyleGAN to Generate Labeled Synthetic Face Image Dataset for Underrepresented Group

  • paper_url: http://arxiv.org/abs/2308.03495
  • repo_url: None
  • paper_authors: Kidist Amde Mekonnen
  • for: To generate a robust, labeled synthetic face image dataset that is balanced across demographic groups, using the StyleGAN model.
  • methods: The StyleGAN generation process is controlled to produce synthetic face images with a balanced distribution across demographic groups, and the images are annotated for different downstream tasks.
  • results: A balanced synthetic dataset is obtained, demonstrating the effectiveness of synthetic data generation and active labeling for reducing bias in machine learning while lowering manual labeling costs.
    Abstract For a machine learning model to generalize effectively to unseen data within a particular problem domain, it is well-understood that the data needs to be of sufficient size and representative of real-world scenarios. Nonetheless, real-world datasets frequently have overrepresented and underrepresented groups. One solution to mitigate bias in machine learning is to leverage a diverse and representative dataset. Training a model on a dataset that covers all demographics is crucial to reducing bias in machine learning. However, collecting and labeling large-scale datasets has been challenging, prompting the use of synthetic data generation and active labeling to decrease the costs of manual labeling. The focus of this study was to generate a robust face image dataset using the StyleGAN model. In order to achieve a balanced distribution of the dataset among different demographic groups, a synthetic dataset was created by controlling the generation process of StyleGaN and annotated for different downstream tasks.
    摘要 为了让机器学习模型在特定问题领域 generaleffectively,需要确保数据够大且符合实际情况。然而,实际世界数据集经常具有过度和不足的分布。一种解决偏见问题的方法是利用多样化和代表性的数据集。训练机器学习模型需要覆盖所有民族,这有助于减少偏见。然而,收集和标注大规模数据集的成本高昂,因此常用生成数据和活动标注来减少手动标注成本。这个研究的目标是通过StyleGAN模型生成一个可靠的人脸图像集。为了实现数据集的均衡分布,我们控制了StyleGAN生成过程,并对不同下游任务进行了标注。

Exploring the Physical World Adversarial Robustness of Vehicle Detection

  • paper_url: http://arxiv.org/abs/2308.03476
  • repo_url: None
  • paper_authors: Wei Jiang, Tianyuan Zhang, Shuangcheng Liu, Weiyu Ji, Zichao Zhang, Gang Xiao
  • For: The paper is written to highlight the significance of adversarial attacks in real-world contexts and to introduce a new dataset (DCI) for evaluating the robustness of detection models under these attacks.
  • Methods: The paper uses an innovative instant-level data generation pipeline using the CARLA simulator to create the DCI dataset, which enables comprehensive experiments involving three detection models and three physical adversarial attacks.
  • Results: The paper finds that Yolo v6 demonstrates remarkable resilience to adversarial attacks, while the ASA attack yields a substantial average AP reduction of 14.51%. The study also notes that static scenes yield higher recognition AP values and that outcomes remain relatively consistent across varying weather conditions. Additionally, the study suggests that advancements in adversarial attack algorithms may be approaching their "limitation".
    Abstract Adversarial attacks can compromise the robustness of real-world detection models. However, evaluating these models under real-world conditions poses challenges due to resource-intensive experiments. Virtual simulations offer an alternative, but the absence of standardized benchmarks hampers progress. Addressing this, we propose an innovative instant-level data generation pipeline using the CARLA simulator. Through this pipeline, we establish the Discrete and Continuous Instant-level (DCI) dataset, enabling comprehensive experiments involving three detection models and three physical adversarial attacks. Our findings highlight diverse model performances under adversarial conditions. Yolo v6 demonstrates remarkable resilience, experiencing just a marginal 6.59% average drop in average precision (AP). In contrast, the ASA attack yields a substantial 14.51% average AP reduction, twice the effect of other algorithms. We also note that static scenes yield higher recognition AP values, and outcomes remain relatively consistent across varying weather conditions. Intriguingly, our study suggests that advancements in adversarial attack algorithms may be approaching its ``limitation''.In summary, our work underscores the significance of adversarial attacks in real-world contexts and introduces the DCI dataset as a versatile benchmark. Our findings provide valuable insights for enhancing the robustness of detection models and offer guidance for future research endeavors in the realm of adversarial attacks.

How to forecast power generation in wind farms? Insights from leveraging hierarchical structure

  • paper_url: http://arxiv.org/abs/2308.03472
  • repo_url: None
  • paper_authors: Lucas English, Mahdi Abolghasemi
  • for: 预测可再生能源生产,帮助决策全球减排。
  • methods: 使用层次预测和协调,以提高预测质量。
  • results: 跨时间和空间协调预测方法可以提高预测精度,特别是在多个时间层级。 linear regression 可以在大多数水平上超过机器学习模型的性能。
    Abstract Forecasting of renewable energy generation provides key insights which may help with decision-making towards global decarbonisation. Renewable energy generation can often be represented through cross-sectional hierarchies, whereby a single farm may have multiple individual generators. Hierarchical forecasting through reconciliation has demonstrated a significant increase in the quality of forecasts both theoretically and empirically. However, it is not evident whether forecasts generated by individual temporal and cross-sectional aggregation can be superior to integrated cross-temporal forecasts and to individual forecasts on more granular data. In this study, we investigate the accuracies of different cross-sectional and cross-temporal reconciliation methods using both linear regression and gradient boosting machine learning for forecasting wind farm power generation. We found that cross-temporal reconciliation is superior to individual cross-sectional reconciliation at multiple temporal aggregations. Cross-temporally reconciled machine learning base forecasts also demonstrated a high accuracy at coarser temporal granularities, which may encourage adoption for short-term wind forecasts. We also show that linear regression can outperform machine learning models across most levels in cross-sectional wind time series.
    摘要 预测可再生能源生产提供关键的视野,帮助决策全球减灰化。可再生能源生产经常可以用分层结构表示,一个农场可以有多个个体发电机。层次预测通过协调可以提高预测质量, both theoretically and empirically。然而,不清楚 Whether forecasts generated by individual temporal and cross-sectional aggregation can be superior to integrated cross-temporal forecasts and to individual forecasts on more granular data.本研究 investigate 不同的横截和时间层整合预测方法的准确性,使用线性回归和梯度拟合机器学习模型预测风力农场电力生产。发现cross-temporal reconciliation 在多个时间层上胜过单个横截协调。同时,梯度拟合机器学习基础预测也在更粗细的时间层上表现出高准确性,这可能推动短期风预测的采用。此外,我们还发现了线性回归在大多数水平上超过机器学习模型的准确性。
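
To make the reconciliation idea concrete, the sketch below applies a simple OLS projection onto the coherent subspace of a toy one-farm/three-turbine hierarchy; the summing matrix, the base forecasts, and the choice of OLS weights are illustrative assumptions rather than the paper's cross-temporal setup.

```python
# A minimal sketch of cross-sectional forecast reconciliation for a toy wind-farm
# hierarchy (1 farm total = 3 turbines). The OLS projection is an assumption for
# illustration; the paper compares several cross-sectional and cross-temporal schemes.
import numpy as np

# Summing matrix S: rows = [farm_total, turbine_1, turbine_2, turbine_3],
# columns = bottom-level series (the individual turbines).
S = np.array([
    [1, 1, 1],   # farm total is the sum of the three turbines
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
], dtype=float)

# Independent ("base") forecasts for every series; note they are incoherent:
# 9.5 != 3.1 + 3.4 + 2.7.
y_base = np.array([9.5, 3.1, 3.4, 2.7])

# OLS reconciliation: project the base forecasts onto the coherent subspace of S.
P = np.linalg.inv(S.T @ S) @ S.T          # maps all-series forecasts to the bottom level
y_tilde = S @ (P @ y_base)                # coherent forecasts for every series

print("reconciled:", y_tilde)
print("coherent:", np.isclose(y_tilde[0], y_tilde[1:].sum()))   # total now equals the sum
```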

Wide Gaps and Clustering Axioms

  • paper_url: http://arxiv.org/abs/2308.03464
  • repo_url: None
  • paper_authors: Mieczysław A. Kłopotek
  • for: 本研究旨在探讨k-means算法是否遵循克林伯格的聚类axiomaatic系统,并提出一些改进方案来使k-means更加符合这个系统。
  • methods: 本研究使用了两种新的聚类性特征:变量k-分割性和剩余k-分割性,并证明了k-means算法在欧几何或非欧几何空间中遵循克林伯格的一致性axioma。
  • results: 研究发现,k-means算法在某些情况下会violate克林伯格的一致性axioma,这是因为数据本身不符合聚类axioma。为了解决这个问题,研究提出了一些改进方案,包括一种基于变量k-分割性和剩余k-分割性的k-means算法。这些方案可以在欧几何和非欧几何空间中实现,并且可以使k-means算法更加符合克林伯格的聚类axioma。
    Abstract The widely applied k-means algorithm produces clusterings that violate our expectations with respect to high/low similarity/density and is in conflict with Kleinberg's axiomatic system for distance based clustering algorithms that formalizes those expectations in a natural way. k-means violates in particular the consistency axiom. We hypothesise that this clash is due to the not explicated expectation that the data themselves should have the property of being clusterable in order to expect the algorithm clustering hem to fit a clustering axiomatic system. To demonstrate this, we introduce two new clusterability properties, variational k-separability and residual k-separability and show that then the Kleinberg's consistency axiom holds for k-means operating in the Euclidean or non-Euclidean space. Furthermore, we propose extensions of k-means algorithm that fit approximately the Kleinberg's richness axiom that does not hold for k-means. In this way, we reconcile k-means with Kleinberg's axiomatic framework in Euclidean and non-Euclidean settings. Besides contribution to the theory of axiomatic frameworks of clustering and for clusterability theory, practical contribution is the possibility to construct {datasets for testing purposes of algorithms optimizing k-means cost function. This includes a method of construction of {clusterable data with known in advance global optimum.
    摘要 广泛应用的 k-means 算法产生的聚类结果不符合我们对高/低相似度与密度的预期,并与克林伯格(Kleinberg)针对基于距离的聚类算法提出的公理体系相冲突,后者以自然的方式将这些预期形式化。k-means 尤其违反了一致性公理。我们假设这一冲突源于一个未被明确说明的前提:只有当数据本身具备可聚类性时,才能期望算法的聚类结果满足聚类公理体系。为证明这一点,我们引入了两个新的可聚类性性质:变分 k-可分性(variational k-separability)和残差 k-可分性(residual k-separability),并证明在这些条件下,k-means 在欧氏或非欧氏空间中满足克林伯格的一致性公理。此外,我们提出了 k-means 的扩展,使其近似满足 k-means 原本不满足的丰富性公理。通过这种方式,我们在欧氏和非欧氏设定下将 k-means 与克林伯格的公理框架加以调和。除了对聚类公理框架理论和可聚类性理论的贡献外,本文的实际贡献在于可以构造用于测试优化 k-means 代价函数的算法的数据集,其中包括一种构造全局最优解事先已知的可聚类数据的方法。

High-Resolution Cranial Defect Reconstruction by Iterative, Low-Resolution, Point Cloud Completion Transformers

  • paper_url: http://arxiv.org/abs/2308.03813
  • repo_url: https://github.com/MWod/DeepImplant_MICCAI_2023
  • paper_authors: Marek Wodzinski, Mateusz Daniol, Daria Hemmerling, Miroslaw Socha
  • for: automatic cranial defect reconstruction
  • methods: iterative, transformer-based method
  • results: superior performance in terms of GPU memory consumption while maintaining high quality of the reconstructed defects
    Abstract Each year thousands of people suffer from various types of cranial injuries and require personalized implants whose manual design is expensive and time-consuming. Therefore, an automatic, dedicated system to increase the availability of personalized cranial reconstruction is highly desirable. The problem of the automatic cranial defect reconstruction can be formulated as the shape completion task and solved using dedicated deep networks. Currently, the most common approach is to use the volumetric representation and apply deep networks dedicated to image segmentation. However, this approach has several limitations and does not scale well into high-resolution volumes, nor takes into account the data sparsity. In our work, we reformulate the problem into a point cloud completion task. We propose an iterative, transformer-based method to reconstruct the cranial defect at any resolution while also being fast and resource-efficient during training and inference. We compare the proposed methods to the state-of-the-art volumetric approaches and show superior performance in terms of GPU memory consumption while maintaining high-quality of the reconstructed defects.
    摘要 每年有数千人因不同类型的颅骨损伤需要个性化植入物,而人工设计植入物既昂贵又耗时。因此,一个自动化的专用系统可以大幅提高个性化颅骨重建的可用性。该问题可以表述为形状补全任务,并使用专门的深度网络求解。目前最常见的方法是采用体素(volumetric)表示,并应用面向图像分割的深度网络。但这种方法存在一些限制:难以扩展到高分辨率体积,也没有考虑数据的稀疏性。在我们的工作中,我们将该问题重新表述为点云补全任务,并提出一种迭代的、基于 Transformer 的方法,可以在任意分辨率下重建颅骨缺损,同时在训练和推理阶段快速且资源高效。我们与当前最先进的体素方法进行比较,结果显示所提方法在 GPU 内存占用方面具有显著优势,同时保持了重建缺损的高质量。

Redesigning Out-of-Distribution Detection on 3D Medical Images

  • paper_url: http://arxiv.org/abs/2308.07324
  • repo_url: None
  • paper_authors: Anton Vasiliuk, Daria Frolova, Mikhail Belyaev, Boris Shirokikh
  • for: 本研究旨在解决验证医学影像分割中的异常样本检测问题,特别是由于缺乏明确的异常数据定义,导致许多人 искусственно设定问题而无法测量临床影响。
  • methods: 本研究提出了一种根据医学影像三维数据特点和下游任务(例如分割)重新定义异常样本检测问题。通过利用下游模型的性能来定义异常样本,我们可以无需明确ID/OOD分类来衡量不同样本的影响。我们称这种方法为预期性能下降(EPD)。
  • results: 在11种CT和MRI异常样本检测挑战中,我们示出了EPD的效果,并证明EPD可以根据临床影响来排序方法。
    Abstract Detecting out-of-distribution (OOD) samples for trusted medical image segmentation remains a significant challenge. The critical issue here is the lack of a strict definition of abnormal data, which often results in artificial problem settings without measurable clinical impact. In this paper, we redesign the OOD detection problem according to the specifics of volumetric medical imaging and related downstream tasks (e.g., segmentation). We propose using the downstream model's performance as a pseudometric between images to define abnormal samples. This approach enables us to weigh different samples based on their performance impact without an explicit ID/OOD distinction. We incorporate this weighting in a new metric called Expected Performance Drop (EPD). EPD is our core contribution to the new problem design, allowing us to rank methods based on their clinical impact. We demonstrate the effectiveness of EPD-based evaluation in 11 CT and MRI OOD detection challenges.
    摘要 检测分布外(OOD)样本对于可信的医学影像分割仍是一个重要挑战。关键问题在于缺乏对异常数据的严格定义,这经常导致人为设定的问题缺乏可衡量的临床影响。在本文中,我们根据三维医学影像及其下游任务(如分割)的特点重新设计了OOD检测问题。我们提议使用下游模型的性能作为图像之间的伪度量(pseudometric),以定义异常样本。这种方法允许我们根据不同样本对性能的影响为其加权,而无需显式的ID/OOD划分。我们将这种加权纳入一个新指标,称为预期性能下降(EPD)。EPD是我们对新问题设计的核心贡献,它允许我们按照临床影响对方法进行排名。我们在11个CT和MRI的OOD检测挑战中展示了基于EPD的评价的有效性。
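
The sketch below illustrates one plausible reading of an expected-performance-drop style evaluation: each test case is weighted by how much it degrades downstream Dice, and a detector is credited for rejecting the damaging cases. The exact EPD definition, the rejection budget, and the synthetic scores here are assumptions for illustration only.

```python
# A hedged sketch of an "expected performance drop"-style evaluation: instead of
# ID/OOD labels, each test volume is weighted by how much it degrades the downstream
# segmentation model, and a detector is scored by the residual drop after it rejects
# its top-scoring samples. The exact EPD definition is given in the paper.
import numpy as np

def expected_performance_drop(ood_scores, dice, dice_id_mean, reject_frac=0.1):
    """Average Dice drop on the samples kept after rejecting the highest-scoring fraction."""
    n_reject = int(round(reject_frac * len(ood_scores)))
    keep = np.argsort(ood_scores)[: len(ood_scores) - n_reject]   # lowest scores are kept
    drop = np.clip(dice_id_mean - dice[keep], 0.0, None)          # per-sample performance drop
    return drop.mean()

rng = np.random.default_rng(0)
dice = np.concatenate([rng.normal(0.90, 0.02, 80),    # in-distribution-like cases
                       rng.normal(0.60, 0.10, 20)])   # cases that hurt the model
scores = np.concatenate([rng.normal(0.0, 1.0, 80),    # detector scores (higher = more OOD)
                         rng.normal(2.0, 1.0, 20)])
print("EPD:", expected_performance_drop(scores, dice, dice_id_mean=0.90))
```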

Cross-Silo Prototypical Calibration for Federated Learning with Non-IID Data

  • paper_url: http://arxiv.org/abs/2308.03457
  • repo_url: https://github.com/qizhuang-qz/FedCSPC
  • paper_authors: Zhuang Qi, Lei Meng, Zitan Chen, Han Hu, Hui Lin, Xiangxu Meng
  • for: 这篇论文的目的是提出一种跨数据孤岛的原型校准方法 (FedCSPC),在保护隐私的前提下训练全局模型,并能对来自不同数据源的数据进行一致性校准。
  • methods: 本论文使用了 Data Prototypical Modeling (DPM) 模组和 Cross-silo Prototypical Calibration (CSPC) 模组,DPM 模组可以帮助获取数据模式,而 CSPC 模组可以将不同来源的数据调整到一个共同的特征空间中,并且可以实现一致性调整。
  • results: 实验结果显示,FedCSPC 方法可以在不同资料来源上学习一致的特征,并且比起现有的方法有更好的性能。
    Abstract Federated Learning aims to learn a global model on the server side that generalizes to all clients in a privacy-preserving manner, by leveraging the local models from different clients. Existing solutions focus on either regularizing the objective functions among clients or improving the aggregation mechanism for the improved model generalization capability. However, their performance is typically limited by the dataset biases, such as the heterogeneous data distributions and the missing classes. To address this issue, this paper presents a cross-silo prototypical calibration method (FedCSPC), which takes additional prototype information from the clients to learn a unified feature space on the server side. Specifically, FedCSPC first employs the Data Prototypical Modeling (DPM) module to learn data patterns via clustering to aid calibration. Subsequently, the cross-silo prototypical calibration (CSPC) module develops an augmented contrastive learning method to improve the robustness of the calibration, which can effectively project cross-source features into a consistent space while maintaining clear decision boundaries. Moreover, the CSPC module's ease of implementation and plug-and-play characteristics make it even more remarkable. Experiments were conducted on four datasets in terms of performance comparison, ablation study, in-depth analysis and case study, and the results verified that FedCSPC is capable of learning the consistent features across different data sources of the same class under the guidance of calibrated model, which leads to better performance than the state-of-the-art methods. The source codes have been released at https://github.com/qizhuang-qz/FedCSPC.
    摘要 联邦学习的目标是在服务器端以保护隐私的方式学习一个能够泛化到所有客户端的全局模型,并利用来自不同客户端的本地模型。现有解决方案通常是对客户端的目标函数进行正则化,或改进聚合机制以提升模型的泛化能力。然而,这些方法的性能往往受到数据偏差的限制,例如异质的数据分布和缺失类别。为解决这一问题,本文提出了跨孤岛原型校准方法(FedCSPC),该方法利用客户端提供的额外原型信息,在服务器端学习一个统一的特征空间。具体来说,FedCSPC 首先使用数据原型建模(DPM)模块,通过聚类学习数据模式以辅助校准;随后,跨孤岛原型校准(CSPC)模块采用一种增强的对比学习方法来提升校准的鲁棒性,能够将跨源特征有效地投影到一致的空间中,同时保持清晰的决策边界。此外,CSPC 模块易于实现且可即插即用。我们在四个数据集上进行了性能比较、消融研究、深入分析和案例研究,结果表明在校准模型的引导下,FedCSPC 能够在同一类别的不同数据源之间学习一致的特征,从而取得优于最先进方法的性能。代码已发布于 https://github.com/qizhuang-qz/FedCSPC。

Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces

  • paper_url: http://arxiv.org/abs/2308.03443
  • repo_url: https://github.com/tatsu432/DR-estimator-OPE-large-action
  • paper_authors: Tatsuhiro Shimizu, Laura Forastiere
  • for: Off-Policy Evaluation (OPE) in contextual bandit settings with large action spaces.
  • methods: 使用 Marginalized Inverse Propensity Scoring (MIPS) 和 Marginalized Doubly Robust (MDR) estimator.
  • results: 提供了一种更加精度的 estimator, 并且在实验中证明了其超过了现有的 estimator.
    Abstract We study Off-Policy Evaluation (OPE) in contextual bandit settings with large action spaces. The benchmark estimators suffer from severe bias and variance tradeoffs. Parametric approaches suffer from bias due to difficulty specifying the correct model, whereas ones with importance weight suffer from variance. To overcome these limitations, Marginalized Inverse Propensity Scoring (MIPS) was proposed to mitigate the estimator's variance via embeddings of an action. To make the estimator more accurate, we propose the doubly robust estimator of MIPS called the Marginalized Doubly Robust (MDR) estimator. Theoretical analysis shows that the proposed estimator is unbiased under weaker assumptions than MIPS while maintaining variance reduction against IPS, which was the main advantage of MIPS. The empirical experiment verifies the supremacy of MDR against existing estimators.
    摘要 我们研究在Contextual Bandit设置下的Off-Policy评估(OPE),它们的标准估计器受到严重的偏见和方差交易的影响。参数化方法受到模型难以准确地特定的偏见,而重要性Weighted方法受到方差的影响。为了解决这些限制,我们提出了Embeddings of an action的Marginalized Inverse Propensity Scoring(MIPS)来减少估计器的方差。为了使估计器更准确,我们提出了MIPS的双重Robust(MDR)估计器。理论分析表明,我们的估计器在较弱的假设下具有不偏性,同时维持IPS的方差减少。实验证明了MDR的超越性。
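
The following numpy sketch contrasts vanilla IPS with a marginalized, doubly-robust style estimate on a toy logged-bandit problem, using a discrete action category in place of the paper's action embeddings; the data-generating process and the category-level weighting are simplifying assumptions, not the paper's exact estimator.

```python
# Simplified illustration: actions are grouped by a category (standing in for the
# action embedding); importance weights are computed on categories rather than
# actions, and a fitted reward model q_hat serves as a control variate.
import numpy as np

rng = np.random.default_rng(0)
n, n_actions, n_cats = 5000, 100, 5
cat_of_action = rng.integers(0, n_cats, size=n_actions)     # action -> category map

# Logging and evaluation policies (context-free for brevity).
pi0 = rng.dirichlet(np.ones(n_actions))
pi_e = rng.dirichlet(np.ones(n_actions))

# Logged data: actions from pi0, rewards depend only on the category.
a = rng.choice(n_actions, size=n, p=pi0)
true_q = rng.uniform(0.2, 0.8, size=n_cats)                  # expected reward per category
r = rng.binomial(1, true_q[cat_of_action[a]])

# Marginal (category-level) importance weights.
p0_cat = np.bincount(cat_of_action, weights=pi0, minlength=n_cats)
pe_cat = np.bincount(cat_of_action, weights=pi_e, minlength=n_cats)
w_cat = pe_cat / p0_cat

# Reward model fitted on logged data (empirical mean reward per category).
q_hat = np.array([r[cat_of_action[a] == c].mean() for c in range(n_cats)])

ips = np.mean((pi_e[a] / pi0[a]) * r)                                    # vanilla IPS
direct = np.sum(pe_cat * q_hat)                                          # direct method
dr = direct + np.mean(w_cat[cat_of_action[a]] * (r - q_hat[cat_of_action[a]]))

print(f"true value ~ {np.sum(pe_cat * true_q):.3f}  IPS {ips:.3f}  DR-style {dr:.3f}")
```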

  • paper_url: http://arxiv.org/abs/2308.03417
  • repo_url: https://github.com/purl-sanitizer/purl
  • paper_authors: Shaoor Munir, Patrick Lee, Umar Iqbal, Zubair Shafiq, Sandra Siby
  • for: 防止追踪浏览器新增防御策略,novel tracking方法继续出现。
  • methods: 利用机器学习方法,检测和净化链接装饰中的追踪信息。
  • results: PURL可以准确地检测和净化链接装饰,比现有Countermeasure更高效和可靠,并对常见欺骗技术有较好的鲁棒性。
    Abstract While privacy-focused browsers have taken steps to block third-party cookies and browser fingerprinting, novel tracking methods that bypass existing defenses continue to emerge. Since trackers need to exfiltrate information from the client- to server-side through link decoration regardless of the tracking technique they employ, a promising orthogonal approach is to detect and sanitize tracking information in decorated links. We present PURL, a machine-learning approach that leverages a cross-layer graph representation of webpage execution to safely and effectively sanitize link decoration. Our evaluation shows that PURL significantly outperforms existing countermeasures in terms of accuracy and reducing website breakage while being robust to common evasion techniques. We use PURL to perform a measurement study on top-million websites. We find that link decorations are widely abused by well-known advertisers and trackers to exfiltrate user information collected from browser storage, email addresses, and scripts involved in fingerprinting.
    摘要 “对于隐私浏览器的尝试,第三方Cookie和浏览器指纹都已经被防止,但新的追踪方法继续出现,这些方法可以跳过现有的防护措施。因为追踪者需要将信息从客户端传到服务器端,因此一个有效的对策是检测和清理链接装饰。我们提出了PURL,一种机器学习方法,利用页面执行的跨层图表示来安全地和有效地检测和清理链接装饰。我们的评估显示,PURL比现有的对策更高度精度和减少网站损坏,同时具有对常见的逃脱技术的抗性。我们使用PURL进行了顶千个网站的测量研究,发现链接装饰被知名的广告商和追踪者广泛运用,以将用户信息从浏览器储存、电子邮件地址和脚本散发扫描撷取到。”Note: The translation is in Simplified Chinese, which is the standard Chinese writing system used in mainland China and Singapore.
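
As a rough illustration of the final classification step, the sketch below featurizes individual query-string parameters and trains a random forest to flag likely tracking decorations; the features, example URLs, and labels are toy assumptions, and the paper's actual features come from a much richer cross-layer graph of page execution.

```python
# Toy sketch: turn each query-string "decoration" into a small feature vector and
# classify it as tracking / non-tracking. Parameter names and labels are made up.
import math
from urllib.parse import urlsplit, parse_qsl
from sklearn.ensemble import RandomForestClassifier

def entropy(s: str) -> float:
    probs = [s.count(c) / len(s) for c in set(s)] if s else []
    return -sum(p * math.log2(p) for p in probs)

def featurize(url: str):
    feats = []
    for key, value in parse_qsl(urlsplit(url).query):
        feats.append((key, [len(value), entropy(value), value.isalnum(), key.startswith("utm_")]))
    return feats

urls_and_labels = [
    ("https://example.com/?utm_source=news&uid=a8f3e9c2b7d14f06", [0, 1]),  # uid looks like an identifier
    ("https://example.com/?page=2&sort=price", [0, 0]),
]

X, y = [], []
for url, labels in urls_and_labels:
    for (_, vec), lab in zip(featurize(url), labels):
        X.append(vec)
        y.append(lab)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X))   # per-parameter predictions: 1 = likely tracking decoration
```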

Noncompact uniform universal approximation

  • paper_url: http://arxiv.org/abs/2308.03812
  • repo_url: None
  • paper_authors: Teun D. H. van Nuland
  • for: 这篇论文将universal approximation theorem推广到非紧致输入空间 $\mathbb R^n$ 上的一致收敛。
  • methods: 这个论文使用了神经网络来对所有在 $\mathbb R^n$ 上连续函数进行uniform approximation。
  • results: 研究发现,对于所有非零 activation function $\varphi$ 和所有 $n$ 和 $l\geq2$, THEN $\mathcal{N}_\varphi^l(\mathbb R^n)$ 是一个 vector space,且对于左限和右限不同的 $\varphi$,这个 vector space 独立于 $\varphi$ 和 $l$,且等于 sigmoid compose with one-dimensional projection 的闭 span。对于左限和右限相同的 $\varphi$,这个 vector space 等于 commutative resolvent algebra,一个 C*-algebra,且独立于 $l\geq1$。
    Abstract The universal approximation theorem is generalised to uniform convergence on the (noncompact) input space $\mathbb R^n$. All continuous functions that vanish at infinity can be uniformly approximated by neural networks with one hidden layer, for all continuous activation functions $\varphi\neq0$ with asymptotically linear behaviour at $\pm\infty$. When $\varphi$ is moreover bounded, we exactly determine which functions can be uniformly approximated by neural networks, with the following unexpected results. Let $\overline{\mathcal{N}_\varphi^l(\mathbb R^n)}$ denote the vector space of functions that are uniformly approximable by neural networks with $l$ hidden layers and $n$ inputs. For all $n$ and all $l\geq2$, $\overline{\mathcal{N}_\varphi^l(\mathbb R^n)}$ turns out to be an algebra under the pointwise product. If the left limit of $\varphi$ differs from its right limit (for instance, when $\varphi$ is sigmoidal) the algebra $\overline{\mathcal{N}_\varphi^l(\mathbb R^n)}$ ($l\geq2$) is independent of $\varphi$ and $l$, and equals the closed span of products of sigmoids composed with one-dimensional projections. If the left limit of $\varphi$ equals its right limit, $\overline{\mathcal{N}_\varphi^l(\mathbb R^n)}$ ($l\geq1$) equals the (real part of the) commutative resolvent algebra, a C*-algebra which is used in mathematical approaches to quantum theory. In the latter case, the algebra is independent of $l\geq1$, whereas in the former case $\overline{\mathcal{N}_\varphi^2(\mathbb R^n)}$ is strictly bigger than $\overline{\mathcal{N}_\varphi^1(\mathbb R^n)}$.
    摘要 “universal approximation theorem”被推广到非 compat 输入空间 $\mathbb R^n$ 上的 uniform convergence。所有在 infinities 处消失的连续函数可以由一层神经网络 uniform approximation,对所有非零连续激活函数 $\varphi$ 的 asymptotically linear behavior at $\pm\infty$。当 $\varphi$ moreover bounded 时,我们可以准确地确定可以 uniform approximation 的函数,并且有以下意外的结果。具有 $n$ 输入和 $l$ 层神经网络的函数空间 $\mathcal{N}_\varphi^l(\mathbb R^n)$ 被定义为可以通过神经网络 uniform approximation 的函数空间。对所有 $n$ 和 $l\geq2$,$\mathcal{N}_\varphi^l(\mathbb R^n)$ 是一个点wise product 的代数。如果左限的 $\varphi$ 不同于右限(例如,sigmoid 函数), то $\mathcal{N}_\varphi^l(\mathbb R^n)$ ($l\geq2$) 是 $\varphi$ 和 $l$ 的独立的代数,等于sigmoid compose with one-dimensional projection 的闭 span。如果左限的 $\varphi$ 等于右限,那么 $\mathcal{N}_\varphi^l(\mathbb R^n)$ ($l\geq1$) 是一个 commutative resolvent algebra,一个 C*-algebra,这种代数在数学方法中用于量子理论。在后者情况下,这个代数是 $l\geq1$ 的独立的,而在前者情况下,$\mathcal{N}_\varphi^2(\mathbb R^n)$ 是 $\mathcal{N}_\varphi^1(\mathbb R^n)$ 的 strictly bigger。

Applied metamodelling for ATM performance simulations

  • paper_url: http://arxiv.org/abs/2308.03404
  • repo_url: None
  • paper_authors: Christoffer Riis, Francisco N. Antunes, Tatjana Bolić, Gérald Gurtner, Andrew Cook, Carlos Lima Azevedo, Francisco Câmara Pereira
  • for: 提高ATM simulator的计划和运作的决策支持
  • methods: integrate active learning和SHAP值进行模拟мета模型
  • results: 比XGBoost模型具有更好的解释能力,并且可以更好地揭示输入和输出变量之间的隐藏关系。
    Abstract The use of Air traffic management (ATM) simulators for planing and operations can be challenging due to their modelling complexity. This paper presents XALM (eXplainable Active Learning Metamodel), a three-step framework integrating active learning and SHAP (SHapley Additive exPlanations) values into simulation metamodels for supporting ATM decision-making. XALM efficiently uncovers hidden relationships among input and output variables in ATM simulators, those usually of interest in policy analysis. Our experiments show XALM's predictive performance comparable to the XGBoost metamodel with fewer simulations. Additionally, XALM exhibits superior explanatory capabilities compared to non-active learning metamodels. Using the `Mercury' (flight and passenger) ATM simulator, XALM is applied to a real-world scenario in Paris Charles de Gaulle airport, extending an arrival manager's range and scope by analysing six variables. This case study illustrates XALM's effectiveness in enhancing simulation interpretability and understanding variable interactions. By addressing computational challenges and improving explainability, XALM complements traditional simulation-based analyses. Lastly, we discuss two practical approaches for reducing the computational burden of the metamodelling further: we introduce a stopping criterion for active learning based on the inherent uncertainty of the metamodel, and we show how the simulations used for the metamodel can be reused across key performance indicators, thus decreasing the overall number of simulations needed.
    摘要 使用空交通管理(ATM)模拟器进行规划和运行可能会面临模型复杂性挑战。本文介绍XALM(可解释主动学习元模型),一个三步框架,将活动学习和SHAP(SHapley Additive exPlanations)值 integrate到 simulation元模型中,以支持ATM决策。XALM能够效率地揭示ATM模拟器中输入和输出变量之间的隐藏关系,通常是政策分析中的关键点。我们的实验表明,XALM的预测性能与XGBoost元模型相当,而且XALM的解释能力比非活动学习元模型更高。使用Mercury(飞机和乘客)ATM模拟器,XALM在法国巴黎查理·德·古尔机场的一个实际场景中应用,分析了六个变量。这个案例示出了XALM在提高模拟解释性和理解变量互动方面的效果。通过解决计算挑战和提高解释性,XALM补充了传统的模拟分析。最后,我们介绍了两种实用的计算压力减轻方法:基于元模型内在不确定性的活动学习停止 criterion,以及可以将模拟用于元模型中的 simulation reuse across key performance indicators,从而降低总的模拟数量。

Towards Machine Learning-based Fish Stock Assessment

  • paper_url: http://arxiv.org/abs/2308.03403
  • repo_url: None
  • paper_authors: Stefan Lüdtke, Maria E. Pierce
  • for: 提高可持续性渔业管理中鱼类资源的准确评估
  • methods: 使用机器学习模型改进鱼类资源参数的估计和预测
  • results: 对五种不同的鱼类资源进行实验,发现预测减降率和繁殖种群质量的准确率有很大改善
    Abstract The accurate assessment of fish stocks is crucial for sustainable fisheries management. However, existing statistical stock assessment models can have low forecast performance of relevant stock parameters like recruitment or spawning stock biomass, especially in ecosystems that are changing due to global warming and other anthropogenic stressors. In this paper, we investigate the use of machine learning models to improve the estimation and forecast of such stock parameters. We propose a hybrid model that combines classical statistical stock assessment models with supervised ML, specifically gradient boosted trees. Our hybrid model leverages the initial estimate provided by the classical model and uses the ML model to make a post-hoc correction to improve accuracy. We experiment with five different stocks and find that the forecast accuracy of recruitment and spawning stock biomass improves considerably in most cases.
    摘要 准确评估鱼类资源对于可持续渔业管理至关重要。然而,现有的统计资源评估模型对补充量(recruitment)或产卵群体生物量等关键参数的预测性能可能较低,尤其是在因全球变暖及其他人为压力而不断变化的生态系统中。在这篇论文中,我们研究了使用机器学习模型来改进此类资源参数的估计与预测。我们提出一种混合模型,将传统的统计资源评估模型与监督学习(具体为梯度提升树)相结合。该混合模型利用传统模型提供的初始估计,并使用机器学习模型进行事后校正以提高准确性。我们在五个不同的鱼类资源上进行实验,发现补充量和产卵群体生物量的预测准确性在大多数情况下有显著提升。
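
A minimal sketch of the hybrid idea follows: a stand-in "classical" estimate is corrected post hoc by a gradient-boosted model fitted to its residuals; the synthetic covariates and the residual-learning setup are illustrative assumptions, not the paper's stocks or features.

```python
# Hybrid sketch: classical estimate + gradient-boosted post-hoc correction.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 400
covariates = rng.normal(size=(n, 3))                  # e.g., temperature, salinity, effort
true_recruitment = 10 + covariates @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.5, n)

# Stand-in for the classical model: roughly unbiased but missing covariate effects.
classical_estimate = 10 + 0.5 * covariates[:, 0] + rng.normal(0, 1.0, n)

train, test = slice(0, 300), slice(300, None)

corrector = GradientBoostingRegressor(random_state=0)
corrector.fit(
    np.column_stack([classical_estimate[train], covariates[train]]),
    true_recruitment[train] - classical_estimate[train],        # learn the residual
)

hybrid = classical_estimate[test] + corrector.predict(
    np.column_stack([classical_estimate[test], covariates[test]])
)
for name, pred in [("classical", classical_estimate[test]), ("hybrid", hybrid)]:
    rmse = np.sqrt(np.mean((pred - true_recruitment[test]) ** 2))
    print(f"{name:9s} RMSE: {rmse:.2f}")
```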

Enhancing Nucleus Segmentation with HARU-Net: A Hybrid Attention Based Residual U-Blocks Network

  • paper_url: http://arxiv.org/abs/2308.03382
  • repo_url: None
  • paper_authors: Junzhou Chen, Qian Huang, Yulin Chen, Linyi Qian, Chengyuan Yu
  • for: 这个研究主要旨在提高核体像素化的精度和效率,以便于生物医学分析、诊断和分类中使用。
  • methods: 我们提出了一个基于双支分支网络的混合注意力残差U-块方法,可以同时预测目标信息和目标 outline。我们还提出了一个后处理方法,可以结合目标信息和目标 outline来区别遮蔽的核体和生成实例分割图像。
  • results: 我们的方法在各个数据集上进行了广泛的量化评估,结果显示我们的方法在与现有方法比较时表现出色,特别是在应用于不规则核体的情况下。
    Abstract Nucleus image segmentation is a crucial step in the analysis, pathological diagnosis, and classification, which heavily relies on the quality of nucleus segmentation. However, the complexity of issues such as variations in nucleus size, blurred nucleus contours, uneven staining, cell clustering, and overlapping cells poses significant challenges. Current methods for nucleus segmentation primarily rely on nuclear morphology or contour-based approaches. Nuclear morphology-based methods exhibit limited generalization ability and struggle to effectively predict irregular-shaped nuclei, while contour-based extraction methods face challenges in accurately segmenting overlapping nuclei. To address the aforementioned issues, we propose a dual-branch network using hybrid attention based residual U-blocks for nucleus instance segmentation. The network simultaneously predicts target information and target contours. Additionally, we introduce a post-processing method that combines the target information and target contours to distinguish overlapping nuclei and generate an instance segmentation image. Within the network, we propose a context fusion block (CF-block) that effectively extracts and merges contextual information from the network. Extensive quantitative evaluations are conducted to assess the performance of our method. Experimental results demonstrate the superior performance of the proposed method compared to state-of-the-art approaches on the BNS, MoNuSeg, CoNSeg, and CPM-17 datasets.
    摘要 核心图像分割是生物体分析、病理诊断和分类中的关键步骤,它的质量直接影响下游应用的结果。然而,核心图像分割过程面临着许多复杂的问题,如核心大小变化、核心渐圆、不均颜色、细胞堆叠和重叠细胞等。现有的核心图像分割方法主要基于核心形态或边沿基本方法。核心形态基本方法具有限定泛化能力,难以预测不规则形状的核心,而边沿基本方法在重叠细胞分割上存在困难。为解决以上问题,我们提出了一种基于双支网络的核心实例分割方法。该网络同时预测目标信息和目标极值。此外,我们引入了一种 combining 目标信息和目标极值的后处理方法,以分辨重叠的核心并生成实例分割图像。在网络中,我们提出了一种Context Fusion块(CF-块),可以有效地提取和融合网络中的Contextual信息。我们对方法进行了广泛的量化评估,并发现方法的性能在BNS、MoNuSeg、CoNSeg和CPM-17等数据集上都显著超过了现有方法。
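
One common way to realize the "combine foreground and contour predictions to separate touching nuclei" post-processing is marker-controlled watershed, sketched below on synthetic probability maps; the paper's exact post-processing may differ in details.

```python
# Subtract the contour map from the foreground map to obtain seed regions, then grow
# instances back with marker-controlled watershed. Probability maps here are synthetic.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed
from skimage.draw import disk

fg = np.zeros((64, 64), dtype=float)        # synthetic foreground probability
contour = np.zeros_like(fg)                 # synthetic contour probability
for center in [(24, 20), (24, 44)]:         # two touching nuclei
    rr, cc = disk(center, 13, shape=fg.shape)
    fg[rr, cc] = 1.0
    ring = np.zeros_like(fg)
    ring[rr, cc] = 1.0
    irr, icc = disk(center, 10, shape=fg.shape)
    ring[irr, icc] = 0.0                    # keep only the boundary ring
    contour = np.maximum(contour, ring)

# Seeds = confident foreground that is not contour; instances = watershed from seeds.
seeds = (fg > 0.5) & (contour < 0.5)
markers, _ = ndi.label(seeds)
instances = watershed(-fg, markers=markers, mask=fg > 0.5)
print("number of separated nuclei:", instances.max())   # 2, despite the merged blob
```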

A reading survey on adversarial machine learning: Adversarial attacks and their understanding

  • paper_url: http://arxiv.org/abs/2308.03363
  • repo_url: None
  • paper_authors: Shashank Kotyan
  • for: 本研究旨在探讨和理解针对神经网络的攻击方法,以系统化的方式掌握攻击方法的类别和特点。
  • methods: 本文使用了多种攻击方法,包括随机攻击、梯度攻击、缺失攻击、噪声攻击等,以测试神经网络的抵御能力。
  • results: 本文通过对多种神经网络模型进行攻击和防御测试,发现攻击方法的多样性和神经网络模型的抵御能力强度不同,并提出了一些未来研究方向。
    Abstract Deep Learning has empowered us to train neural networks for complex data with high performance. However, with the growing research, several vulnerabilities in neural networks have been exposed. A particular branch of research, Adversarial Machine Learning, exploits and understands some of the vulnerabilities that cause the neural networks to misclassify for near original input. A class of algorithms called adversarial attacks is proposed to make the neural networks misclassify for various tasks in different domains. With the extensive and growing research in adversarial attacks, it is crucial to understand the classification of adversarial attacks. This will help us understand the vulnerabilities in a systematic order and help us to mitigate the effects of adversarial attacks. This article provides a survey of existing adversarial attacks and their understanding based on different perspectives. We also provide a brief overview of existing adversarial defences and their limitations in mitigating the effect of adversarial attacks. Further, we conclude with a discussion on the future research directions in the field of adversarial machine learning.
    摘要 深度学习使我们能够针对复杂数据训练出高性能的神经网络。然而,随着研究的深入,神经网络的若干漏洞也被陆续揭示。其中,对抗机器学习这一研究分支利用并研究了导致神经网络对近似原始输入产生错误分类的部分漏洞。为使神经网络在不同领域的各类任务中产生错误分类,人们提出了一类被称为对抗攻击的算法。随着对抗攻击研究的广泛开展与不断增长,理解对抗攻击的分类变得至关重要,这有助于我们以系统的方式理解这些漏洞,并帮助我们减轻对抗攻击的影响。本文从不同角度对现有的对抗攻击及其机理进行了综述,并简要概述了现有的对抗防御方法及其在减轻对抗攻击影响方面的局限性。最后,我们讨论了对抗机器学习领域未来的研究方向。
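
To make one of the surveyed attack families concrete, the sketch below implements the classic fast gradient sign method (FGSM) against a tiny placeholder model; the model and input are random stand-ins, not a benchmark setup.

```python
# FGSM: perturb the input in the sign of the loss gradient under an L-infinity budget.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))

def fgsm(model, x, label, eps=0.1):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    # Step in the gradient-sign direction, then clip back to the valid pixel range.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

x = torch.rand(1, 1, 28, 28)          # placeholder input image
y = torch.tensor([3])                 # placeholder true label
x_adv = fgsm(model, x, y, eps=0.1)

print("clean prediction:", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```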

Solving Falkner-Skan type equations via Legendre and Chebyshev Neural Blocks

  • paper_url: http://arxiv.org/abs/2308.03337
  • repo_url: None
  • paper_authors: Alireza Afzal Aghaei, Kourosh Parand, Ali Nikkhah, Shakila Jaberi
  • for: 解决非线性法克-斯坦方程
  • methods: 使用Legendre和Chebyshev神经块,利用正交多项式(orthogonal polynomials)增强人工神经网络的近似能力
  • results: 通过模拟不同的法克-斯坦方程配置,实现了提高计算效率和准确率的目的
    Abstract In this paper, a new deep-learning architecture for solving the non-linear Falkner-Skan equation is proposed. Using Legendre and Chebyshev neural blocks, this approach shows how orthogonal polynomials can be used in neural networks to increase the approximation capability of artificial neural networks. In addition, utilizing the mathematical properties of these functions, we overcome the computational complexity of the backpropagation algorithm by using the operational matrices of the derivative. The efficiency of the proposed method is carried out by simulating various configurations of the Falkner-Skan equation.
    摘要 在本文中,一种新的深度学习架构,用于解决非线性法克纳-斯坦方程,被提出。使用Legendre和Chebyshev神经块,这种方法展示了如何在神经网络中使用正交多项式增加人工神经网络的近似能力。此外,利用这些函数的数学性质,我们超越了反射算法的计算复杂性,使用操作矩阵的导数。提出的方法的效率被通过 simulate多种法克纳-斯坦方程的配置进行证明。
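
A minimal sketch of what a "Chebyshev neural block" might look like is given below: the input is expanded with Chebyshev polynomials via the three-term recurrence before a learned layer. The block design is an assumption for illustration, and the physics-informed loss for the Falkner-Skan boundary-value problem is not reproduced.

```python
# Expand a scalar input with Chebyshev polynomials T_0..T_{K-1} and feed the
# expansion to a small learned layer.
import torch

class ChebyshevBlock(torch.nn.Module):
    def __init__(self, degree: int, out_features: int):
        super().__init__()
        self.degree = degree
        self.linear = torch.nn.Linear(degree, out_features)

    def forward(self, x):                     # x: (batch, 1), assumed scaled to [-1, 1]
        polys = [torch.ones_like(x), x]
        for _ in range(self.degree - 2):
            polys.append(2 * x * polys[-1] - polys[-2])   # T_{k+1} = 2x T_k - T_{k-1}
        features = torch.cat(polys[: self.degree], dim=1)
        return torch.tanh(self.linear(features))

block = ChebyshevBlock(degree=8, out_features=16)
eta = torch.linspace(-1, 1, 5).unsqueeze(1)   # scaled similarity variable
print(block(eta).shape)                       # torch.Size([5, 16])
```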

Non-Convex Bilevel Optimization with Time-Varying Objective Functions

  • paper_url: http://arxiv.org/abs/2308.03811
  • repo_url: None
  • paper_authors: Sen Lin, Daouda Sow, Kaiyi Ji, Yingbin Liang, Ness Shroff
  • for: 本研究强调在在线应用中实现精细化优化,满足流动数据和时间变化函数的需求。
  • methods: 我们提出了一种基于单 Loop 的在线双层优化器(SOBOW),通过窗口均值来更新外层决策,不需要知道过去函数。我们还开发了一种新的分析技术,用于综合分析决策变量之间的复杂 Coupling,并且精细控制了 hypergradient 估计误差。
  • results: 我们证明 SOBOW 可以在某些条件下实现幂等级的双层本地 regret。广泛的实验结果证明 SOBOW 的效果。
    Abstract Bilevel optimization has become a powerful tool in a wide variety of machine learning problems. However, the current nonconvex bilevel optimization considers an offline dataset and static functions, which may not work well in emerging online applications with streaming data and time-varying functions. In this work, we study online bilevel optimization (OBO) where the functions can be time-varying and the agent continuously updates the decisions with online streaming data. To deal with the function variations and the unavailability of the true hypergradients in OBO, we propose a single-loop online bilevel optimizer with window averaging (SOBOW), which updates the outer-level decision based on a window average of the most recent hypergradient estimations stored in the memory. Compared to existing algorithms, SOBOW is computationally efficient and does not need to know previous functions. To handle the unique technical difficulties rooted in single-loop update and function variations for OBO, we develop a novel analytical technique that disentangles the complex couplings between decision variables, and carefully controls the hypergradient estimation error. We show that SOBOW can achieve a sublinear bilevel local regret under mild conditions. Extensive experiments across multiple domains corroborate the effectiveness of SOBOW.
    摘要 bilateral 优化已成为机器学习问题中的一种强大工具。然而,当前的非凸 bilateral 优化假设了一个离线数据集和静止函数,这可能不适用于新般的在线应用程序中的流动数据和时间变化函数。在这种工作中,我们研究在线 bilateral 优化(OBO),其中函数可以是时间变化的,代理人在线流动数据中不断更新决策。为了处理函数的变化和真实的梯度不可知,我们提议了一种带窗口平均(SOBOW)的单loop在线 bilateral 优化器,其在内存中保存最近的梯度估计,并基于窗口平均更新外层决策。相比现有算法,SOBOW具有计算效率和不需要知道前一个函数的优点。为了处理单 loop 更新和函数变化对 OBO 的独特技术难点,我们开发了一种新的分析技术,可以分解决策变量之间的复杂 Coupling,并且精心控制梯度估计错误。我们表明,SOBOW 可以在某些条件下 achieve 下降的 bilateral 地方 regret。广泛的实验证明了 SOBOW 的有效性。

Expediting Neural Network Verification via Network Reduction

  • paper_url: http://arxiv.org/abs/2308.03330
  • repo_url: None
  • paper_authors: Yuyi Zhong, Ruiwei Wang, Siau-Cheng Khoo
  • for: 验证深度神经网络的安全性属性,以确保神经网络在关键应用中正常工作。
  • methods: 提出了多种验证方法,以验证神经网络的安全性。但是,许多已知的验证工具仍然无法处理复杂的网络架构和大型神经网络。本文提出了一种网络减少技术作为验证前置处理方法。
  • results: 我们在大量的 benchmark 上实验表明,提posed 的减少技术可以减少神经网络,并使现有的验证工具更快速。此外,实验结果还表明,网络减少可以提高现有验证工具对许多神经网络的可用性。
    Abstract A wide range of verification methods have been proposed to verify the safety properties of deep neural networks ensuring that the networks function correctly in critical applications. However, many well-known verification tools still struggle with complicated network architectures and large network sizes. In this work, we propose a network reduction technique as a pre-processing method prior to verification. The proposed method reduces neural networks via eliminating stable ReLU neurons, and transforming them into a sequential neural network consisting of ReLU and Affine layers which can be handled by the most verification tools. We instantiate the reduction technique on the state-of-the-art complete and incomplete verification tools, including alpha-beta-crown, VeriNet and PRIMA. Our experiments on a large set of benchmarks indicate that the proposed technique can significantly reduce neural networks and speed up existing verification tools. Furthermore, the experiment results also show that network reduction can improve the availability of existing verification tools on many networks by reducing them into sequential neural networks.
    摘要 深度神经网络的安全性特性的验证方法有很多已经被提出,以确保神经网络在关键应用中正确地工作。然而,许多知名的验证工具仍然无法处理复杂的网络架构和大型网络。在这种情况下,我们提出了一种网络减少技术作为预处理方法,以降低验证工具的难度。我们的方法利用稳定的ReLU神经元消除和转换为一个包含ReLU和Affine层的顺序神经网络,这种网络可以被大多数验证工具处理。我们在 alpha-beta-crown、VeriNet 和 PRIMA 等当今最佳实践中的完整和 incomplete 验证工具上实现了这种减少技术。我们对一组大型标准 benchmark 进行了实验,结果表明,我们的方法可以减少神经网络,并使现有的验证工具在许多网络上提高可用性。此外,实验结果还表明,网络减少可以提高现有验证工具对许多网络的验证能力。
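
The sketch below illustrates the core reduction idea on one layer: interval bounds identify ReLUs that are stably active or inactive over the input region, and, when every unit is stable, the surrounding affine layers can be folded together. The paper's reduction handles more general cases; this is only a one-layer illustration on random weights.

```python
# Interval bound propagation to find "stable" ReLUs, then fold stably-active units
# (where ReLU acts as the identity) into a single affine layer.
import numpy as np

def interval_affine(lo, hi, W, b):
    """Interval bounds of W x + b for x in the box [lo, hi]."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(6, 4)), rng.normal(size=6) + 3.0   # bias shift: units likely always on
W2, b2 = rng.normal(size=(3, 6)), rng.normal(size=3)

x_lo, x_hi = -np.ones(4) * 0.1, np.ones(4) * 0.1             # the input region to verify
pre_lo, pre_hi = interval_affine(x_lo, x_hi, W1, b1)

always_on = pre_lo >= 0                                       # ReLU acts as identity here
always_off = pre_hi <= 0                                      # ReLU outputs exactly 0 here
unstable = ~(always_on | always_off)
print("stable neurons:", int(always_on.sum() + always_off.sum()), "of", len(b1))

if unstable.sum() == 0:
    # Every ReLU is stable on this region: the two layers collapse to one affine map.
    mask = always_on.astype(float)
    W_merged = W2 @ (mask[:, None] * W1)
    b_merged = W2 @ (mask * b1) + b2
    print("merged layer shape:", W_merged.shape)
```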

AFN: Adaptive Fusion Normalization via Encoder-Decoder Framework

  • paper_url: http://arxiv.org/abs/2308.03321
  • repo_url: https://github.com/huanranchen/ASRNorm
  • paper_authors: Zikai Zhou, Huanran Chen
  • for: 该论文目的是提出一种统一的Normalization函数,以减少不同Normalization函数的缺点。
  • methods: 该论文使用了多种Normalization函数,并通过对比这些函数的优缺点,提出了一种新的Adaptive Fusion Normalization函数。
  • results: 实验结果显示,AFN函数在领域泛化和图像分类任务中表现较好,超过了之前的Normalization技术。
    Abstract The success of deep learning is inseparable from normalization layers. Researchers have proposed various normalization functions, and each of them has both advantages and disadvantages. In response, efforts have been made to design a unified normalization function that combines all normalization procedures and mitigates their weaknesses. We also proposed a new normalization function called Adaptive Fusion Normalization. Through experiments, we demonstrate AFN outperforms the previous normalization techniques in domain generalization and image classification tasks.
    摘要 深度学习的成功与归一化层密不可分。研究人员已经提出了多种归一化函数,每种都有其优点和缺点。为此,人们尝试设计一种统一的归一化函数,既能整合所有归一化过程,又能缓解它们各自的弱点。我们提出了一种新的归一化函数,称为自适应融合归一化(AFN)。实验表明,AFN 在领域泛化和图像分类任务中的表现优于以往的归一化技术。
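
The sketch below shows the general "fuse several normalizers with learned weights" idea (in the spirit of switchable normalization) as a rough stand-in; the actual AFN formulation produces the fusion adaptively through an encoder-decoder, which is not modeled here.

```python
# Softmax-weighted mix of batch-, instance- and layer-style normalization statistics.
import torch

class FusedNorm(torch.nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(3))             # fusion weights (BN / IN / LN)
        self.gamma = torch.nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = torch.nn.Parameter(torch.zeros(1, channels, 1, 1))

    @staticmethod
    def _standardize(x, dims):
        mean = x.mean(dim=dims, keepdim=True)
        var = x.var(dim=dims, keepdim=True, unbiased=False)
        return (x - mean) / (var + 1e-5).sqrt()

    def forward(self, x):                                            # x: (N, C, H, W)
        candidates = torch.stack([
            self._standardize(x, (0, 2, 3)),                         # batch-norm statistics
            self._standardize(x, (2, 3)),                            # instance-norm statistics
            self._standardize(x, (1, 2, 3)),                         # layer-norm statistics
        ])
        weights = torch.softmax(self.logits, dim=0).view(3, 1, 1, 1, 1)
        return self.gamma * (weights * candidates).sum(dim=0) + self.beta

x = torch.randn(8, 16, 32, 32)
print(FusedNorm(16)(x).shape)                                        # torch.Size([8, 16, 32, 32])
```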

Binary Federated Learning with Client-Level Differential Privacy

  • paper_url: http://arxiv.org/abs/2308.03320
  • repo_url: None
  • paper_authors: Lumin Liu, Jun Zhang, Shenghui Song, Khaled B. Letaief
  • for: 提高 Federated Learning 系统的隐私保护和性能。
  • methods: 使用 binary neural networks (BNNs) 和离散噪声来实现 client-level 隐私保护,并且通过减少模型参数的精度来提高通信效率。
  • results: 实验结果基于 MNIST 和 Fashion-MNIST 数据集表明,提议的训练算法可以实现客户端级隐私保护而同时具有低通信开销的优势。
    Abstract Federated learning (FL) is a privacy-preserving collaborative learning framework, and differential privacy can be applied to further enhance its privacy protection. Existing FL systems typically adopt Federated Average (FedAvg) as the training algorithm and implement differential privacy with a Gaussian mechanism. However, the inherent privacy-utility trade-off in these systems severely degrades the training performance if a tight privacy budget is enforced. Besides, the Gaussian mechanism requires model weights to be of high-precision. To improve communication efficiency and achieve a better privacy-utility trade-off, we propose a communication-efficient FL training algorithm with differential privacy guarantee. Specifically, we propose to adopt binary neural networks (BNNs) and introduce discrete noise in the FL setting. Binary model parameters are uploaded for higher communication efficiency and discrete noise is added to achieve the client-level differential privacy protection. The achieved performance guarantee is rigorously proved, and it is shown to depend on the level of discrete noise. Experimental results based on MNIST and Fashion-MNIST datasets will demonstrate that the proposed training algorithm achieves client-level privacy protection with performance gain while enjoying the benefits of low communication overhead from binary model updates.
    摘要 federated learning(FL)是一种隐私保护的协作学习框架,可以通过减少隐私泄露来进一步增强隐私保护。现有的FL系统通常采用联邦平均(FedAvg)作为训练算法,并通过高精度的模型权重来实现减少隐私的目的。然而,这种隐私性和用途之间的质量负担在这些系统中严重地降低了训练性能,特别是当强制实施严格的隐私预算时。此外,高精度的模型权重需要高精度的数据。为了提高通信效率和实现更好的隐私性和用途之间的质量负担,我们提议一种通信高效的FL训练算法,并且保证隐私性。具体来说,我们提议采用二进制神经网络(BNN)和在FL设定中引入离散噪声。二进制模型参数上传以提高通信效率,而离散噪声可以实现客户端级别的隐私保护。我们通过 teorema 证明了性能保证,并证明其取决于离散噪声的水平。实验结果基于 MNIST 和 Fashion-MNIST 数据集表明,提议的训练算法可以实现客户端级别的隐私保护,同时享受到低通信开销的 binary 模型更新的好处。
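
As a loose illustration of the two ingredients, the sketch below binarizes client updates to signs and adds discrete noise in the form of random sign flips before a majority-vote aggregation; this mechanism and its privacy implications are assumptions for illustration and do not reproduce the paper's construction or its differential-privacy accounting.

```python
# Clients upload 1-bit (sign) updates; discrete noise = independent random sign flips.
import numpy as np

rng = np.random.default_rng(0)

def binarize_and_privatize(update, flip_prob=0.1):
    signs = np.sign(update).astype(np.int8)          # 1-bit-per-weight update
    flips = rng.random(update.shape) < flip_prob     # discrete noise: random sign flips
    return np.where(flips, -signs, signs)

# Server aggregation: majority vote over the clients' noisy sign updates.
client_updates = [rng.normal(size=1000) for _ in range(20)]
noisy_bits = np.stack([binarize_and_privatize(u) for u in client_updates])
aggregate_direction = np.sign(noisy_bits.sum(axis=0))

clean_majority = np.sign(np.stack([np.sign(u) for u in client_updates]).sum(axis=0))
print("agreement with clean majority vote:", np.mean(aggregate_direction == clean_majority))
```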

HomOpt: A Homotopy-Based Hyperparameter Optimization Method

  • paper_url: http://arxiv.org/abs/2308.03317
  • repo_url: https://github.com/jeffkinnison/shadho
  • paper_authors: Sophia J. Abraham, Kehelwala D. G. Maduranga, Jeffery Kinnison, Zachariah Carmichael, Jonathan D. Hauenstein, Walter J. Scheirer
  • for: 提高机器学习模型的性能和效率,即 Hyperparameter Optimization (HPO) 问题。
  • methods: 提出一种新的数据驱动的 Hyperparameter Optimization 方法,基于 Generalized Additive Model (GAM) 函数和 homotopy 优化。
  • results: 对多种优化技术(如 Random Search、TPE、Bayes 和 SMAC)进行比较,并在多个标准机器学习Benchmark和开放集ognition任务上显示出更好的目标性能。
    Abstract Machine learning has achieved remarkable success over the past couple of decades, often attributed to a combination of algorithmic innovations and the availability of high-quality data available at scale. However, a third critical component is the fine-tuning of hyperparameters, which plays a pivotal role in achieving optimal model performance. Despite its significance, hyperparameter optimization (HPO) remains a challenging task for several reasons. Many HPO techniques rely on naive search methods or assume that the loss function is smooth and continuous, which may not always be the case. Traditional methods, like grid search and Bayesian optimization, often struggle to quickly adapt and efficiently search the loss landscape. Grid search is computationally expensive, while Bayesian optimization can be slow to prime. Since the search space for HPO is frequently high-dimensional and non-convex, it is often challenging to efficiently find a global minimum. Moreover, optimal hyperparameters can be sensitive to the specific dataset or task, further complicating the search process. To address these issues, we propose a new hyperparameter optimization method, HomOpt, using a data-driven approach based on a generalized additive model (GAM) surrogate combined with homotopy optimization. This strategy augments established optimization methodologies to boost the performance and effectiveness of any given method with faster convergence to the optimum on continuous, discrete, and categorical domain spaces. We compare the effectiveness of HomOpt applied to multiple optimization techniques (e.g., Random Search, TPE, Bayes, and SMAC) showing improved objective performance on many standardized machine learning benchmarks and challenging open-set recognition tasks.
    摘要 机器学习在过去几十年内取得了很大成功,经常归功于算法创新和大规模数据的可用性。然而,一个第三要 componenet是细化参数的调整,它在实现优化模型性能中扮演着关键的角色。尽管其重要性,但参数优化(HPO)仍然是一个具有挑战性的任务,主要因为以下几个原因:多数HPO技术利用粗暴的搜索方法,或者假设损失函数是连续的,这并不总是情况。传统的方法,如格里德搜索和bayesian优化,经常难以快速适应和有效地搜索损失函数的 landscape。格里德搜索 computationally expensive,而bayesian优化可能需要很长时间来 prime。由于搜索空间 дляHPOfrequently高维和非 convex,因此寻找全局最优点是有很大挑战。此外,优化参数可能会受到特定的数据集或任务的影响,这进一步增加了搜索过程的复杂性。为解决这些问题,我们提出了一种新的参数优化方法,HomOpt,使用基于通用添加模型(GAM)的数据驱动方法,并结合抽象优化。这种策略可以增强现有优化方法的性能和有效性,并在不同的域空间上提供更快的趋势。我们对HomOpt应用于多种优化技术(例如Random Search、TPE、Bayes和SMAC),并在许多标准化机器学习benchmark和开放集成任务上显示出了提高了目标性能。

Deep Q-Network for Stochastic Process Environments

  • paper_url: http://arxiv.org/abs/2308.03316
  • repo_url: None
  • paper_authors: Kuangheng He
  • for: 本研究使用强化学习方法在随机过程环境中训练最优策略以解决复杂问题。
  • methods: 本研究使用深度Q学习网络,并评估不同结构的网络在随机过程环境中的性能。
  • results: 研究结果表明,使用特定网络结构可以在随机过程环境中获得更好的性能。
    Abstract Reinforcement learning is a powerful approach for training an optimal policy to solve complex problems in a given system. This project aims to demonstrate the application of reinforcement learning in stochastic process environments with missing information, using Flappy Bird and a newly developed stock trading environment as case studies. We evaluate various structures of Deep Q-learning networks and identify the most suitable variant for the stochastic process environment. Additionally, we discuss the current challenges and propose potential improvements for further work in environment-building and reinforcement learning techniques.
    摘要 强化学习是一种强大的方法,用于在给定系统中训练最优策略以解决复杂问题。本项目旨在以 Flappy Bird 和新开发的股票交易环境作为案例研究,展示强化学习在信息缺失的随机过程环境中的应用。我们评估了多种结构的深度Q学习网络,并确定了最适合随机过程环境的变体。此外,我们还讨论了当前的挑战,并提出了在环境构建和强化学习技术方面进一步改进的可能方向。
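
For concreteness, the sketch below shows the core deep Q-learning update (Q-network, target network, one-step TD target) on placeholder transitions; the Flappy Bird and stock-trading environments themselves are omitted.

```python
# One gradient step of deep Q-learning on a batch of placeholder transitions.
import torch

def make_qnet(obs_dim=4, n_actions=2):
    return torch.nn.Sequential(
        torch.nn.Linear(obs_dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, n_actions)
    )

q_net, target_net = make_qnet(), make_qnet()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

# A batch as it would come from a replay buffer (random placeholders here).
obs = torch.randn(32, 4)
action = torch.randint(0, 2, (32,))
reward = torch.randn(32)
next_obs = torch.randn(32, 4)
done = torch.randint(0, 2, (32,)).float()

with torch.no_grad():                       # one-step TD target from the target network
    target = reward + gamma * (1 - done) * target_net(next_obs).max(dim=1).values

q_taken = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
loss = torch.nn.functional.mse_loss(q_taken, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"TD loss: {loss.item():.3f}")
```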

Symmetry-Preserving Program Representations for Learning Code Semantics

  • paper_url: http://arxiv.org/abs/2308.03312
  • repo_url: None
  • paper_authors: Kexin Pei, Weichen Li, Qirui Jin, Shuyang Liu, Scott Geng, Lorenzo Cavallaro, Junfeng Yang, Suman Jana
  • for: 本研究旨在提高自动化程序理解的能力,尤其是安全任务中的核心问题。
  • methods: 我们Draw inspiration from examples of convolution layers exploiting translation symmetry,探讨如何使用代码 симметрии提高 LL M 架构。我们提出了一种正式的群理论框架,准确地定义代码 симметрии为 semantics-preserving 变换,并提供了 precisione reasoning 技术来保证在 LL M 架构中Symmetry preservation。
  • results: 我们 introduce a novel variant of self-attention that preserves program symmetries,并通过详细的实验评估,证明其在泛化和Robustness 方面的效果。总的来说,我们的代码 Symmetry 框架提供了正式和有力的理由技术,可以导向将来的特циалиzed LL M 的发展,并推动 LL M 驱动的程序理解任务的进步。
    Abstract Large Language Models (LLMs) have shown promise in automated program reasoning, a crucial aspect of many security tasks. However, existing LLM architectures for code are often borrowed from other domains like natural language processing, raising concerns about their generalization and robustness to unseen code. A key generalization challenge is to incorporate the knowledge of code semantics, including control and data flow, into the LLM architectures. Drawing inspiration from examples of convolution layers exploiting translation symmetry, we explore how code symmetries can enhance LLM architectures for program analysis and modeling. We present a rigorous group-theoretic framework that formally defines code symmetries as semantics-preserving transformations and provides techniques for precisely reasoning about symmetry preservation within LLM architectures. Using this framework, we introduce a novel variant of self-attention that preserves program symmetries, demonstrating its effectiveness in generalization and robustness through detailed experimental evaluations across different binary and source code analysis tasks. Overall, our code symmetry framework offers rigorous and powerful reasoning techniques that can guide the future development of specialized LLMs for code and advance LLM-guided program reasoning tasks.
    摘要 Inspired by the use of convolution layers that exploit translation symmetry, we explore how code symmetries can enhance LLM architectures for program analysis and modeling. We provide a rigorous group-theoretic framework that defines code symmetries as semantics-preserving transformations and provides techniques for precisely reasoning about symmetry preservation within LLM architectures.Using this framework, we introduce a novel variant of self-attention that preserves program symmetries, which we demonstrate to be effective in terms of generalization and robustness through detailed experimental evaluations across different binary and source code analysis tasks. Overall, our code symmetry framework offers rigorous and powerful reasoning techniques that can guide the future development of specialized LLMs for code and advance LLM-guided program reasoning tasks.

Implicit Graph Neural Diffusion Based on Constrained Dirichlet Energy Minimization

  • paper_url: http://arxiv.org/abs/2308.03306
  • repo_url: None
  • paper_authors: Guoji Fu, Mohammed Haroon Dupty, Yanfei Dong, Lee Wee Sun
  • for: This paper aims to address the issues of over-smoothing and limited adaptability in implicit graph neural networks (GNNs) by introducing a geometric framework for designing implicit graph diffusion layers.
  • methods: The paper proposes a parameterized graph Laplacian operator to learn the geometry of vertex and edge spaces, as well as the graph gradient operator from data. The implicit graph diffusion layer is viewed as the fixed-point solution of a Dirichlet energy minimization problem, and the authors design a solution with constraints on vertex features to trade off smoothing with the preservation of node feature information.
  • results: The paper demonstrates better performance than leading implicit and explicit GNNs on benchmark datasets for node and graph classification tasks, with substantial accuracy improvements observed for some datasets.
    Abstract Implicit graph neural networks (GNNs) have emerged as a potential approach to enable GNNs to capture long-range dependencies effectively. However, poorly designed implicit GNN layers can experience over-smoothing or may have limited adaptability to learn data geometry, potentially hindering their performance in graph learning problems. To address these issues, we introduce a geometric framework to design implicit graph diffusion layers based on a parameterized graph Laplacian operator. Our framework allows learning the geometry of vertex and edge spaces, as well as the graph gradient operator from data. We further show how implicit GNN layers can be viewed as the fixed-point solution of a Dirichlet energy minimization problem and give conditions under which it may suffer from over-smoothing. To overcome the over-smoothing problem, we design our implicit graph diffusion layer as the solution of a Dirichlet energy minimization problem with constraints on vertex features, enabling it to trade off smoothing with the preservation of node feature information. With an appropriate hyperparameter set to be larger than the largest eigenvalue of the parameterized graph Laplacian, our framework guarantees a unique equilibrium and quick convergence. Our models demonstrate better performance than leading implicit and explicit GNNs on benchmark datasets for node and graph classification tasks, with substantial accuracy improvements observed for some datasets.
    摘要 匿名图 neural networks (GNNs) 已经出现为一种可能的方法,以便 GNNs 可以有效地捕捉长距离依赖关系。然而,如果设计不当的匿名 GNN 层,可能会导致过滤或有限适应性,从而妨碍它们在图学习问题中表现。为了解决这些问题,我们提出了一个几何框架,用于设计基于参数化图拉普拉斯运算符的匿名图扩散层。我们的框架允许学习顶点和边空间的几何结构,以及图的梯度运算符从数据中学习。此外,我们还证明了匿名 GNN 层可以视为 Dirichlet 能量最小化问题的固定点解,并给出了避免过滤的条件。为了超越过滤问题,我们设计了一种基于顶点特征的 Dirichlet 能量最小化问题的约束,使得匿名图扩散层能够平衡平滑与保留顶点特征信息之间的权衡。在适当的超参数设置为大于最大 eigenvalues of 参数化图拉普拉斯运算符时,我们的框架保证唯一的平衡点和快速收敛。我们的模型在 benchmark 数据集上 для 节点和图分类任务中表现出色,与领先的匿名 GNN 和Explicit GNN 相比,具有显著的准确率提高。
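
A minimal numpy sketch of the underlying fixed-point structure is shown below: iterating a normalized-adjacency diffusion with a fidelity term converges to the equilibrium that balances a Dirichlet-energy smoothness term against staying close to the input features. The paper's learned Laplacian, graph gradient operator, and vertex-feature constraints are not modeled here.

```python
# Fixed-point iteration X <- alpha * A_hat @ X + (1 - alpha) * X0 and its closed-form
# equilibrium on a tiny undirected graph.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
A_hat = A / np.sqrt(deg)[:, None] / np.sqrt(deg)[None, :]     # symmetric normalization

X0 = np.array([[1.0, 0.0], [0.9, 0.1], [0.2, 0.8], [0.0, 1.0]])   # input node features
alpha = 0.7                                                        # smoothing vs. fidelity trade-off

X = X0.copy()
for _ in range(100):                                               # fixed-point iteration
    X = alpha * (A_hat @ X) + (1 - alpha) * X0

# The closed-form equilibrium agrees with the iteration.
X_star = np.linalg.solve(np.eye(4) - alpha * A_hat, (1 - alpha) * X0)
print(np.allclose(X, X_star, atol=1e-6))                           # True
```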

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

  • paper_url: http://arxiv.org/abs/2308.03300
  • repo_url: https://github.com/cecile-hi/regularized-adaptive-weight-modification
  • paper_authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Chuyuan Zhang
  • for: 这个论文的目的是解决伪音标注检测算法在不同数据集上表现下降的问题。
  • methods: 我们提出了一种持续学习算法,叫做Regularized Adaptive Weight Modification(RAWM),可以避免伪阳性检测算法中的溃败性忘记。当调整检测网络时,我们的方法会根据伪音和真音的比例进行适应的 modificaitondirection。
  • results: 我们在多个数据集上进行了跨数据集实验,结果表明我们的方法可以提高伪音标注检测的表现。另外,我们还引入了一个规化因素,以保持网络对于不同的音响环境中的真音标注的记忆。
    Abstract Current fake audio detection algorithms have achieved promising performances on most datasets. However, their performance may be significantly degraded when dealing with audio of a different dataset. The orthogonal weight modification to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets. To overcome this limitation, we propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting, called Regularized Adaptive Weight Modification (RAWM). When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine utterances and fake utterances. The adaptive modification direction ensures the network can effectively detect fake audio on the new dataset while preserving its knowledge of old model, thus mitigating catastrophic forgetting. In addition, genuine audio collected from quite different acoustic conditions may skew their feature distribution, so we introduce a regularization constraint to force the network to remember the old distribution in this regard. Our method can easily be generalized to related fields, like speech emotion recognition. We also evaluate our approach across multiple datasets and obtain a significant performance improvement on cross-dataset experiments.
    摘要 当前的伪造音频检测算法在大多数数据集上已取得可观的性能,但在面对来自不同数据集的音频时,其性能可能显著下降。为克服灾难性遗忘而提出的正交权重修改方法没有考虑不同数据集中真实音频之间的相似性。为此,我们提出了一种用于伪造音频检测的持续学习算法,称为正则化自适应权重修改(RAWM)。在微调检测网络时,我们的方法根据真实语音与伪造语音的比例自适应地计算权重修改的方向,从而保证网络在新数据集上能有效检测伪造音频,同时保留旧模型的知识,缓解灾难性遗忘。此外,来自差异较大声学环境的真实音频可能会使特征分布发生偏移,因此我们引入了一个正则化约束,迫使网络在这方面记住旧的分布。我们的方法可以方便地推广到相关领域,例如语音情感识别。我们在多个数据集上评估了该方法,并在跨数据集实验中观察到显著的性能提升。

Studying Large Language Model Generalization with Influence Functions

  • paper_url: http://arxiv.org/abs/2308.03296
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman
  • for: 了解和 mitigate Machine Learning 模型中关联的风险
  • methods: 使用 Influence Functions 来回答一个 counterfactual:如果一个序列被添加到训练集中,如何改变模型的参数和输出?
  • results: 使用 Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) 方法可以在大型语言模型 (LLMs) 中扩展 Influence Functions,并且可以在几乎实时内计算 inverse-Hessian-vector product (IHVP)。我们的实验表明,EK-FAC 可以达到类似于传统的 Influence Functions 估计器的准确性,即使 IHVP 计算在下面的许多 órders of magnitude 快。
    Abstract When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
    摘要 当尝试更好地了解一个机器学习模型以便理解和避免相关风险时,一个有价值的证据来源是:哪些训练示例最大程度地对模型的行为做出贡献?影响函数的目的是回答一个Counterfactual问题:如果给定的序列添加到训练集中, THEN 模型的参数(以及其输出)如何改变?虽然影响函数已经生成了一些启示,但是它们难以扩展到大型自然语言模型(LLM),因为计算 inverse-Hessian-vector product(IHVP)的困难。我们使用Eigenvalue-corrected Kronecker-Factored Approximate Curvature(EK-FAC)的方法来扩展影响函数到 LLM 中,并在520亿参数下实现了类似的准确率。我们运行了两种算法技术来减少计算候选训练序列的梯度的成本:TF-IDF 筛选和查询批处理。我们使用影响函数来调查大语言模型的泛化模式,包括泛化 Patterns的稀缺性、逐渐增加的抽象级别、数学和编程能力、跨语言泛化和角色扮演行为。尽管有很多复杂的泛化形式,但我们发现一个Surprising limitation:影响的 decay 到 near-zero 当键phrase 的顺序被反转。总之,影响函数为我们研究大语言模型的泛化性质提供了一个强大的新工具。
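
The tiny example below computes exact influence scores for ridge regression, where the Hessian is available in closed form, to show the quantity (a gradient/inverse-Hessian/gradient product) that EK-FAC approximates at LLM scale; nothing model- or dataset-specific from the paper is reproduced.

```python
# influence(z_train, z_test) = -grad_test^T H^{-1} grad_train for ridge regression.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1e-2
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Fit ridge regression: minimize (1/n)||Xw - y||^2 + lam * ||w||^2.
w = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
H = 2 * (X.T @ X / n + lam * np.eye(d))                  # Hessian of the training objective

def grad_point(x, target):                               # gradient of one example's squared loss
    return 2 * (x @ w - target) * x

x_test, y_test = rng.normal(size=d), 0.0
g_test = grad_point(x_test, y_test)

influences = np.array([-g_test @ np.linalg.solve(H, grad_point(X[i], y[i])) for i in range(n)])
top = np.argsort(-np.abs(influences))[:3]
print("most influential training indices:", top, "scores:", influences[top].round(3))
```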

DOMINO: Domain-invariant Hyperdimensional Classification for Multi-Sensor Time Series Data

  • paper_url: http://arxiv.org/abs/2308.03295
  • repo_url: None
  • paper_authors: Junyao Wang, Luke Chen, Mohammad Abdullah Al Faruque
  • for: 这个研究是为了解决智能网络的资料驱动机器学学习方法中的分布偏移问题。
  • methods: 这个研究使用了脑海算法(HDC)来解决分布偏移问题,并且提出了一个名为DOMINO的新的学习框架。
  • results: 这个研究的结果显示,DOMINO比前一代的预测方法高出2.04%的精度,并且在训练和测试过程中比前一代的预测方法快得多了16.34倍和2.89倍。此外,DOMINO在部分标签和高度不均的资料上进行学习时表现特别出色,与硬件噪音相比,DOMINO的Robustness提高了10.93倍。
    Abstract With the rapid evolution of the Internet of Things, many real-world applications utilize heterogeneously connected sensors to capture time-series information. Edge-based machine learning (ML) methodologies are often employed to analyze locally collected data. However, a fundamental issue across data-driven ML approaches is distribution shift. It occurs when a model is deployed on a data distribution different from what it was trained on, and can substantially degrade model performance. Additionally, increasingly sophisticated deep neural networks (DNNs) have been proposed to capture spatial and temporal dependencies in multi-sensor time series data, requiring intensive computational resources beyond the capacity of today's edge devices. While brain-inspired hyperdimensional computing (HDC) has been introduced as a lightweight solution for edge-based learning, existing HDCs are also vulnerable to the distribution shift challenge. In this paper, we propose DOMINO, a novel HDC learning framework addressing the distribution shift problem in noisy multi-sensor time-series data. DOMINO leverages efficient and parallel matrix operations on high-dimensional space to dynamically identify and filter out domain-variant dimensions. Our evaluation on a wide range of multi-sensor time series classification tasks shows that DOMINO achieves on average 2.04% higher accuracy than state-of-the-art (SOTA) DNN-based domain generalization techniques, and delivers 16.34x faster training and 2.89x faster inference. More importantly, DOMINO performs notably better when learning from partially labeled and highly imbalanced data, providing 10.93x higher robustness against hardware noises than SOTA DNNs.
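
To make the hyperdimensional-computing alternative concrete, the sketch below encodes flattened sensor windows with a fixed random projection and sign quantization, bundles class prototypes, and classifies by cosine similarity; DOMINO's key step of identifying and filtering domain-variant dimensions is not reproduced, and the toy data is an assumption.

```python
# Minimal HDC classifier: random-projection encoding, prototype bundling, cosine matching.
import numpy as np

rng = np.random.default_rng(0)
dim_hv, window = 10_000, 3 * 50                      # hypervector size; 3 sensors x 50 steps

def encode(x, projection):
    return np.sign(projection @ x)                   # bipolar hypervector

projection = rng.normal(size=(dim_hv, window))

def sample(cls, n):                                  # toy data: classes differ by a sensor offset
    return rng.normal(loc=cls * 0.5, size=(n, window))

prototypes = np.zeros((2, dim_hv))
for cls in (0, 1):
    for x in sample(cls, 50):
        prototypes[cls] += encode(x, projection)     # bundling = elementwise addition

def classify(x):
    hv = encode(x, projection)
    sims = prototypes @ hv / (np.linalg.norm(prototypes, axis=1) * np.linalg.norm(hv))
    return int(np.argmax(sims))

test_x = sample(1, 1)[0]
print("predicted class:", classify(test_x))
```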

SynJax: Structured Probability Distributions for JAX

  • paper_url: http://arxiv.org/abs/2308.03291
  • repo_url: https://github.com/deepmind/synjax
  • paper_authors: Miloš Stanojević, Laurent Sartran
  • for: 这个论文是为了提高深度学习模型中的结构化对象处理而写的。
  • methods: 这篇论文为结构化分布的推理算法提供了高效的向量化实现,以便在现代硬件加速器上构建大规模可微模型。
  • results: 这篇论文通过SynJax库实现了大规模可导模型,并且可以高效地处理结构化对象,如树和分割。
    Abstract The development of deep learning software libraries enabled significant progress in the field by allowing users to focus on modeling, while letting the library to take care of the tedious and time-consuming task of optimizing execution for modern hardware accelerators. However, this has benefited only particular types of deep learning models, such as Transformers, whose primitives map easily to the vectorized computation. The models that explicitly account for structured objects, such as trees and segmentations, did not benefit equally because they require custom algorithms that are difficult to implement in a vectorized form. SynJax directly addresses this problem by providing an efficient vectorized implementation of inference algorithms for structured distributions covering alignment, tagging, segmentation, constituency trees and spanning trees. With SynJax we can build large-scale differentiable models that explicitly model structure in the data. The code is available at https://github.com/deepmind/synjax.
    摘要 深度学习软件库的发展使用户能够专注于建模,而将针对现代硬件加速器优化执行这一繁琐耗时的任务交给库来完成。然而,受益的主要是 Transformer 这类基本运算可以直接映射到向量化计算的模型。显式建模结构化对象(如树和分割)的模型则没有同等受益,因为它们需要难以向量化实现的定制算法。SynJax 直接解决了这一问题,为结构化分布(涵盖对齐、标注、分割、成分树和生成树)的推理算法提供了高效的向量化实现。借助 SynJax,我们可以构建显式建模数据结构的大规模可微模型。代码见 https://github.com/deepmind/synjax。
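As a flavor of the kind of structured distribution such a library vectorizes, the sketch below computes a differentiable log-partition function over rooted spanning trees with the Matrix-Tree theorem in plain JAX. This is not the SynJax API (see the repository for that); the root choice and random weights are illustrative.

```python
# Illustrative only: a differentiable distribution over rooted spanning trees,
# computed with the Matrix-Tree theorem in plain JAX. This is NOT the SynJax
# API; it just shows the kind of vectorized structured computation it packages.
import jax
import jax.numpy as jnp

def spanning_tree_log_partition(log_w):
    """log_w: (n, n) log-weights for directed edges i -> j (root fixed at node 0)."""
    w = jnp.exp(log_w)
    w = w * (1.0 - jnp.eye(w.shape[0]))        # no self-loops
    laplacian = jnp.diag(w.sum(axis=0)) - w    # in-degree Laplacian
    minor = laplacian[1:, 1:]                  # delete the root row/column
    _, logdet = jnp.linalg.slogdet(minor)
    return logdet                              # log of the partition function

log_w = jax.random.normal(jax.random.PRNGKey(0), (5, 5))
log_z = spanning_tree_log_partition(log_w)
# For exponential families, d logZ / d log_w gives edge marginals "for free".
edge_marginals = jax.grad(spanning_tree_log_partition)(log_w)
print(float(log_z), edge_marginals.shape)
```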

  • paper_url: http://arxiv.org/abs/2308.03290
  • repo_url: None
  • paper_authors: Jordan Dotzel, Gang Wu, Andrew Li, Muhammad Umar, Yun Ni, Mohamed S. Abdelfattah, Zhiru Zhang, Liqun Cheng, Martin G. Dixon, Norman P. Jouppi, Quoc V. Le, Sheng Li
  • for: 这个研究旨在提出一种单次(one-shot)混合精度量化搜索方法,以获得高品质且低成本的模型。
  • methods: 该方法在整数(integer)与低精度浮点(low-precision floating point)两类数值格式上进行单次混合精度量化搜索,无需重新训练即可得到量化模型。
  • results: 该方法得到的模型优于统一精度、手工混合精度以及近期的整数量化搜索方法。在 ImageNet 上,在相同模型成本下,该方法将 ResNet-18 的精度提升 1.31 个百分点、ResNet-50 提升 0.90 个百分点。此外,该方法首次探索了混合精度浮点搜索,相比此前最佳的 FP8 模型将 MobileNetV2 的精度提升至多 0.98 个百分点;并进一步将搜索扩展到量化与神经网络架构的联合空间,在相近模型成本下将 ImageNet 精度提升 2.69 个百分点。
    Abstract Quantization has become a mainstream compression technique for reducing model size, computational requirements, and energy consumption for modern deep neural networks (DNNs). With the improved numerical support in recent hardware, including multiple variants of integer and floating point, mixed-precision quantization has become necessary to achieve high-quality results with low model cost. Prior mixed-precision quantization methods have performed a post-training quantization search, which compromises on accuracy, or a differentiable quantization search, which leads to high memory usage from branching. Therefore, we propose the first one-shot mixed-precision quantization search that eliminates the need for retraining in both integer and low-precision floating point models. We evaluate our floating-point and integer quantization search (FLIQS) on multiple convolutional networks and vision transformer models to discover Pareto-optimal models. Our approach discovers models that improve upon uniform precision, manual mixed-precision, and recent integer quantization search methods. With the proposed integer quantization search, we increase the accuracy of ResNet-18 on ImageNet by 1.31% points and ResNet-50 by 0.90% points with equivalent model cost over previous methods. Additionally, for the first time, we explore a novel mixed-precision floating-point search and improve MobileNetV2 by up to 0.98% points compared to prior state-of-the-art FP8 models. Finally, we extend FLIQS to simultaneously search a joint quantization and neural architecture space and improve the ImageNet accuracy by 2.69% points with similar model cost on a MobileNetV2 search space.
    摘要 量化已成为降低现代深度神经网络(DNN)模型大小、计算需求和能耗的主流压缩技术。随着近期硬件对多种整数和浮点格式的支持不断完善,混合精度量化成为以低模型成本获得高质量结果的必要手段。此前的混合精度量化方法要么进行训练后量化搜索(牺牲精度),要么采用可微量化搜索(分支导致内存占用过高)。因此,我们提出了首个单次(one-shot)混合精度量化搜索方法,在整数和低精度浮点模型中均无需重新训练。我们在多个卷积网络和视觉 Transformer 模型上评估了所提出的浮点与整数量化搜索(FLIQS),以发现帕累托最优模型。该方法得到的模型优于统一精度、手工混合精度以及近期的整数量化搜索方法。借助所提出的整数量化搜索,在相同模型成本下,我们将 ResNet-18 在 ImageNet 上的精度提升 1.31 个百分点、ResNet-50 提升 0.90 个百分点。此外,我们首次探索了一种新的混合精度浮点搜索,相比此前最佳的 FP8 模型将 MobileNetV2 的精度提升至多 0.98 个百分点。最后,我们将 FLIQS 扩展到同时搜索量化与神经网络架构的联合空间,在 MobileNetV2 搜索空间上以相近的模型成本将 ImageNet 精度提升 2.69 个百分点。
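The primitive underneath any mixed-precision search is a quantize-dequantize ("fake quantization") operator whose bit-width the search assigns per layer. The sketch below is a generic symmetric per-channel integer version, not FLIQS itself; the bit-widths and per-channel axis are illustrative.

```python
# A small building block, not FLIQS itself: symmetric per-channel "fake"
# integer quantization of a weight tensor. A mixed-precision search would pick
# a bit-width like this per layer and weigh the resulting cost/accuracy trade-off.
import numpy as np

def fake_quantize(weights, bits=8, axis=0):
    """Quantize-dequantize `weights` to signed integers with per-channel scales."""
    qmax = 2 ** (bits - 1) - 1
    reduce_axes = tuple(i for i in range(weights.ndim) if i != axis)
    max_abs = np.max(np.abs(weights), axis=reduce_axes, keepdims=True)
    scale = np.maximum(max_abs / qmax, 1e-12)
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale  # dequantized values, same shape as input

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 3, 3, 3))            # e.g. a small conv kernel
for bits in (8, 6, 4):
    err = np.mean((w - fake_quantize(w, bits)) ** 2)
    print(f"{bits}-bit quantization MSE: {err:.2e}")
```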

High-rate discretely-modulated continuous-variable quantum key distribution using quantum machine learning

  • paper_url: http://arxiv.org/abs/2308.03283
  • repo_url: None
  • paper_authors: Qin Liao, Jieyu Liu, Anqi Huang, Lei Huang, Zhuoying Fei, Xiquan Fu
  • for: 该研究旨在提出一种高码率的离散调制连续变量量子密钥分发(DM CVQKD)方案,以提升 CVQKD 系统的安全性和效率。
  • methods: 该研究使用量子机器学习技术,将 CVQKD 系统分为三部分:初始化部分、预测部分和数据后处理部分。在预测部分,使用低复杂度的量子 k 近邻(QkNN)分类器来预测 Bob 端经受损耗的离散调制相干态。
  • results: 研究表明,所提出的基于 QkNN 的 CVQKD 具有较高的安全性和效率,并且可以通过增大调制方差进一步提升密钥率。数值仿真结果表明,该方案的密钥率明显高于现有的 DM CVQKD 协议。
    Abstract We propose a high-rate scheme for discretely-modulated continuous-variable quantum key distribution (DM CVQKD) using quantum machine learning technologies, which divides the whole CVQKD system into three parts, i.e., the initialization part that is used for training and estimating quantum classifier, the prediction part that is used for generating highly correlated raw keys, and the data-postprocessing part that generates the final secret key string shared by Alice and Bob. To this end, a low-complexity quantum k-nearest neighbor (QkNN) classifier is designed for predicting the lossy discretely-modulated coherent states (DMCSs) at Bob's side. The performance of the proposed QkNN-based CVQKD especially in terms of machine learning metrics and complexity is analyzed, and its theoretical security is proved by using semi-definite program (SDP) method. Numerical simulation shows that the secret key rate of our proposed scheme is explicitly superior to the existing DM CVQKD protocols, and it can be further enhanced with the increase of modulation variance.
    摘要 我们提出了一种基于量子机器学习技术的高码率离散调制连续变量量子密钥分发(DM CVQKD)方案,将整个 CVQKD 系统分为三部分:用于训练和估计量子分类器的初始化部分、用于生成高度相关原始密钥的预测部分,以及用于生成 Alice 与 Bob 最终共享密钥串的数据后处理部分。为此,我们设计了一种低复杂度的量子 k 近邻(QkNN)分类器,用于预测 Bob 端经受损耗的离散调制相干态(DMCS)。我们分析了所提出的基于 QkNN 的 CVQKD 的性能(包括机器学习指标和复杂度),并利用半定规划(SDP)方法证明了其理论安全性。数值仿真表明,该方案的密钥率明显优于现有的 DM CVQKD 协议,并且可随调制方差的增大而进一步提升。

Knowledge Distilled Ensemble Model for sEMG-based Silent Speech Interface

  • paper_url: http://arxiv.org/abs/2308.06533
  • repo_url: None
  • paper_authors: Wenqiang Lai, Qihan Yang, Ye Mao, Endong Sun, Jiangnan Ye
  • for: 这个论文是为了解决语音疾病的问题而写的。
  • methods: 这个论文使用了轻量级深度学习知识蒸馏集成模型(KDE-SSI),以解决基于表面肌电的无声语音接口(sEMG-based SSI)的局限。
  • results: 该模型可以对包含 3900 个样本的 26 个北约音标字母数据集进行分类,从而通过拼写无歧义地生成任何英语单词,测试准确率达 85.9%。
    Abstract Voice disorders affect millions of people worldwide. Surface electromyography-based Silent Speech Interfaces (sEMG-based SSIs) have been explored as a potential solution for decades. However, previous works were limited by small vocabularies and manually extracted features from raw data. To address these limitations, we propose a lightweight deep learning knowledge-distilled ensemble model for sEMG-based SSI (KDE-SSI). Our model can classify a 26 NATO phonetic alphabets dataset with 3900 data samples, enabling the unambiguous generation of any English word through spelling. Extensive experiments validate the effectiveness of KDE-SSI, achieving a test accuracy of 85.9\%. Our findings also shed light on an end-to-end system for portable, practical equipment.
    摘要 语音障碍影响着全球数百万人。基于表面肌电(sEMG)的无声语音接口(SSI)作为一种潜在解决方案已被研究数十年。然而,以往工作受限于小词汇量以及需要从原始数据中人工提取特征。为了解决这些限制,我们提出了一种轻量级的深度学习知识蒸馏集成模型(KDE-SSI)。该模型可以对包含 3900 个样本的 26 个北约音标字母数据集进行分类,从而通过拼写无歧义地生成任何英语单词。大量实验验证了 KDE-SSI 的有效性,测试准确率达到 85.9%。我们的发现也为便携、实用设备的端到端系统提供了启示。
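The "knowledge distilled" part of such a model can be made concrete with the standard distillation objective below, which blends soft teacher (e.g. ensemble) targets with hard labels. The temperature, weighting and toy tensors are illustrative assumptions, not the paper's exact loss.

```python
# A generic knowledge-distillation objective (Hinton-style), shown only to make
# the "knowledge distilled" ingredient concrete; the paper's exact loss and
# ensemble teacher are not reproduced here.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

batch, n_classes = 8, 26                      # 26 NATO phonetic classes
student = torch.randn(batch, n_classes, requires_grad=True)
teacher = torch.randn(batch, n_classes)       # e.g. averaged ensemble logits
labels = torch.randint(0, n_classes, (batch,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(float(loss))
```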

DSformer: A Double Sampling Transformer for Multivariate Time Series Long-term Prediction

  • paper_url: http://arxiv.org/abs/2308.03274
  • repo_url: None
  • paper_authors: Chengqing Yu, Fei Wang, Zezhi Shao, Tao Sun, Lin Wu, Yongjun Xu
  • for: 预测多变量时间序列长期变化,提供决策参考。
  • methods: 提议使用双重采样变换器(DSformer),包括双重采样(DS)块和时间变量注意(TVA)块。DS块使用下采样和分割采样将原始序列转换为具有全球信息和本地信息注意的特征向量。然后,TVA块使用时间注意和变量注意来挖掘这些特征向量的不同维度信息,并提取关键信息。
  • results: 实验结果表明,DSformer 在九个真实世界数据集上优于八个基线模型。
    Abstract Multivariate time series long-term prediction, which aims to predict the change of data in a long time, can provide references for decision-making. Although transformer-based models have made progress in this field, they usually do not make full use of three features of multivariate time series: global information, local information, and variables correlation. To effectively mine the above three features and establish a high-precision prediction model, we propose a double sampling transformer (DSformer), which consists of the double sampling (DS) block and the temporal variable attention (TVA) block. Firstly, the DS block employs down sampling and piecewise sampling to transform the original series into feature vectors that focus on global information and local information respectively. Then, TVA block uses temporal attention and variable attention to mine these feature vectors from different dimensions and extract key information. Finally, based on a parallel structure, DSformer uses multiple TVA blocks to mine and integrate different features obtained from DS blocks respectively. The integrated feature information is passed to the generative decoder based on a multi-layer perceptron to realize multivariate time series long-term prediction. Experimental results on nine real-world datasets show that DSformer can outperform eight existing baselines.
    摘要 多变量时间序列长期预测,目的是预测数据在长期内的变化,可以提供决策参考。虽然基于转换器模型在这个领域已经取得了进步,但它们通常不充分利用多变量时间序列的三个特征:全局信息、本地信息和变量相关性。为了有效利用这些特征并建立高精度预测模型,我们提议了双重采样变换器(DSformer)。DSformer包括双重采样(DS)块和时间变量注意(TVA)块。首先,DS块使用下采样和分割采样将原始序列转化为特征向量,其中专注于全局信息和本地信息。然后,TVA块使用时间注意和变量注意来挖掘这些特征向量从不同维度,提取关键信息。最后,基于并行结构,DSformer使用多个TVA块来挖掘和集成不同维度的特征信息,并将其传递给基于多层感知机器的生成解码器,实现多变量时间序列长期预测。实验结果表明,DSformer可以在九个真实世界数据集上超过八个基准值。
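To make the DS block's two samplings concrete, the sketch below derives a down-sampled (global) view and a piecewise segmented (local) view from one multivariate series. The stride and shapes are illustrative; the paper's exact tensor layout may differ.

```python
# Sketch of the two "double sampling" views: a down-sampled view that keeps
# global trends and a piecewise (segmented) view that keeps local patterns.
import numpy as np

def double_sample(series, stride=4):
    """series: (T, num_variables). Returns (down_sampled, piecewise) views."""
    T = (series.shape[0] // stride) * stride
    series = series[:T]
    down_sampled = series[::stride]                      # global view: (T/stride, V)
    piecewise = series.reshape(T // stride, stride, -1)  # local view: segments of length `stride`
    return down_sampled, piecewise

x = np.random.randn(96, 7)          # 96 time steps, 7 variables
g, l = double_sample(x)
print(g.shape, l.shape)             # (24, 7) (24, 4, 7)
```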

Local Structure-aware Graph Contrastive Representation Learning

  • paper_url: http://arxiv.org/abs/2308.03271
  • repo_url: None
  • paper_authors: Kai Yang, Yuan Liu, Zijuan Zhao, Peijin Ding, Wenqian Zhao
  • for: 本研究提出了一种Local Structure-aware Graph Contrastive representation Learning方法(LS-GCL),用于模型节点的多视图结构信息。
  • methods: 本方法首先为每个目标节点构建语义子图,且不限于一阶邻居。随后,将每个目标节点的语义子图输入共享 GNN 编码器,得到子图级别的节点嵌入;最后使用池化函数生成子图级别的图嵌入。
  • results: 在五个数据集上的实验结果表明,与现有的图表示学习方法相比,LS-GCL 在节点分类和链路预测任务中均表现出色。
    Abstract Traditional Graph Neural Network (GNN), as a graph representation learning method, is constrained by label information. However, Graph Contrastive Learning (GCL) methods, which tackle the label problem effectively, mainly focus on the feature information of the global graph or small subgraph structure (e.g., the first-order neighborhood). In the paper, we propose a Local Structure-aware Graph Contrastive representation Learning method (LS-GCL) to model the structural information of nodes from multiple views. Specifically, we construct the semantic subgraphs that are not limited to the first-order neighbors. For the local view, the semantic subgraph of each target node is input into a shared GNN encoder to obtain the target node embeddings at the subgraph-level. Then, we use a pooling function to generate the subgraph-level graph embeddings. For the global view, considering the original graph preserves indispensable semantic information of nodes, we leverage the shared GNN encoder to learn the target node embeddings at the global graph-level. The proposed LS-GCL model is optimized to maximize the common information among similar instances at three various perspectives through a multi-level contrastive loss function. Experimental results on five datasets illustrate that our method outperforms state-of-the-art graph representation learning approaches for both node classification and link prediction tasks.
    摘要 传统的图神经网络(GNN)作为一种图表示学习方法,受限于标签信息。而能够有效缓解标签问题的图对比学习(GCL)方法,则主要关注全图或小型子图结构(如一阶邻域)的特征信息。本文提出了一种局部结构感知的图对比表示学习方法(LS-GCL),从多个视角建模节点的结构信息。具体而言,我们构建了不限于一阶邻居的语义子图。在局部视角下,将每个目标节点的语义子图输入共享 GNN 编码器,得到子图级别的目标节点嵌入;随后使用池化函数生成子图级别的图嵌入。在全局视角下,考虑到原始图保留了节点不可或缺的语义信息,我们利用共享 GNN 编码器在全图级别学习目标节点嵌入。LS-GCL 通过多级对比损失函数,最大化三个不同视角下相似实例之间的共同信息。在五个数据集上的实验结果表明,我们的方法在节点分类和链路预测任务上均优于当前最先进的图表示学习方法。
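One level of such a multi-level contrastive objective can be illustrated with a standard InfoNCE term between two embedding views of the same node (e.g. subgraph-level vs. global-level), sketched below. The temperature and toy embeddings are assumptions, not the paper's exact loss.

```python
# A standard InfoNCE term between two embedding views of the same nodes,
# standing in for one level of the paper's multi-level contrastive loss.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """z1, z2: (N, d) embeddings where row i of z1 and z2 form a positive pair."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature      # (N, N) cosine similarities
    targets = torch.arange(z1.size(0))      # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

sub_emb = torch.randn(128, 64, requires_grad=True)   # subgraph-level node embeddings
glob_emb = torch.randn(128, 64)                      # global-graph node embeddings
loss = info_nce(sub_emb, glob_emb)
loss.backward()
print(float(loss))
```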

Simple Rule Injection for ComplEx Embeddings

  • paper_url: http://arxiv.org/abs/2308.03269
  • repo_url: None
  • paper_authors: Haodi Ma, Anthony Colas, Yuejie Wang, Ali Sadeghian, Daisy Zhe Wang
  • for: 本研究旨在结合逻辑规则和知识图embedding以获得优化的知识图推理结果。
  • methods: 本研究提出了一种名为InjEx的机制,可以通过简单的约束来插入多种逻辑规则,以捕捉Definite Horn规则。
  • results: 实验结果表明,InjEx 在知识图补全(KGC)和小样本知识图补全(FKGC)任务上均优于基线 KGC 模型和专门的小样本模型,同时保持了可扩展性和效率,并为嵌入空间注入了可解释的先验知识。
    Abstract Recent works in neural knowledge graph inference attempt to combine logic rules with knowledge graph embeddings to benefit from prior knowledge. However, they usually cannot avoid rule grounding, and injecting a diverse set of rules has still not been thoroughly explored. In this work, we propose InjEx, a mechanism to inject multiple types of rules through simple constraints, which capture definite Horn rules. To start, we theoretically prove that InjEx can inject such rules. Next, to demonstrate that InjEx infuses interpretable prior knowledge into the embedding space, we evaluate InjEx on both the knowledge graph completion (KGC) and few-shot knowledge graph completion (FKGC) settings. Our experimental results reveal that InjEx outperforms both baseline KGC models as well as specialized few-shot models while maintaining its scalability and efficiency.
    摘要 近期的神经知识图推理工作尝试将逻辑规则与知识图嵌入相结合,以利用先验知识。然而,这些方法通常无法避免规则实例化(grounding),而且注入多样化规则集合的做法也尚未得到充分探索。在本工作中,我们提出了 InjEx 机制,通过简单的约束注入多种类型的规则,以捕捉确定性 Horn 规则。首先,我们从理论上证明 InjEx 能够注入此类规则。随后,为了证明 InjEx 能将可解释的先验知识注入嵌入空间,我们在知识图补全(KGC)和小样本知识图补全(FKGC)两种设定下对其进行评估。实验结果表明,InjEx 在保持可扩展性和效率的同时,优于基线 KGC 模型以及专门的小样本模型。

Exploring Different Time-series-Transformer (TST) Architectures: A Case Study in Battery Life Prediction for Electric Vehicles (EVs)

  • paper_url: http://arxiv.org/abs/2308.03260
  • repo_url: None
  • paper_authors: Niranjan Sitapure, Atharva Kulkarni
  • for: The paper aims to develop accurate battery life prediction models for electric vehicles (EVs) using a data-driven approach and novel transformer-based architectures.
  • methods: The paper uses time-series-transformers (TSTs) and long short-term memory (LSTM) models to predict battery state-of-charge (SOC) and temperature in EVs, incorporating environmental, battery, vehicle driving, and heating circuit data.
  • results: The paper explores and compares novel TST architectures, including encoder TST + decoder LSTM and a hybrid TST-LSTM, to create accurate battery life prediction models for EVs.
    Abstract In recent years, battery technology for electric vehicles (EVs) has been a major focus, with a significant emphasis on developing new battery materials and chemistries. However, accurately predicting key battery parameters, such as state-of-charge (SOC) and temperature, remains a challenge for constructing advanced battery management systems (BMS). Existing battery models do not comprehensively cover all parameters affecting battery performance, including non-battery-related factors like ambient temperature, cabin temperature, elevation, and regenerative braking during EV operation. Due to the difficulty of incorporating these auxiliary parameters into traditional models, a data-driven approach is suggested. Time-series-transformers (TSTs), leveraging multiheaded attention and parallelization-friendly architecture, are explored alongside LSTM models. Novel TST architectures, including encoder TST + decoder LSTM and a hybrid TST-LSTM, are also developed and compared against existing models. A dataset comprising 72 driving trips in a BMW i3 (60 Ah) is used to address battery life prediction in EVs, aiming to create accurate TST models that incorporate environmental, battery, vehicle driving, and heating circuit data to predict SOC and battery temperature for future time steps.
    摘要 近年来,电动汽车(EV)的电池技术备受关注,新的电池材料与化学体系的开发也成为重点。然而,准确预测电池的关键参数(如荷电状态 SOC 和温度)仍是构建先进电池管理系统(BMS)的难题。现有电池模型并未全面涵盖影响电池性能的所有参数,包括环境温度、车舱温度、海拔以及 EV 行驶过程中的再生制动等非电池因素。由于这些辅助参数难以纳入传统模型,本文建议采用数据驱动方法。我们探讨了利用多头注意力且便于并行化的时序 Transformer(TST),并与 LSTM 模型进行对比;同时开发并比较了新的 TST 架构,包括编码器 TST + 解码器 LSTM 以及 TST-LSTM 混合结构。我们使用一辆 BMW i3(60 Ah)上 72 次行驶的数据来研究 EV 电池寿命预测,目标是构建融合环境、电池、车辆行驶和加热回路数据的高精度 TST 模型,以预测未来时间步的 SOC 与电池温度。

Optimal Approximation and Learning Rates for Deep Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2308.03259
  • repo_url: None
  • paper_authors: Shao-Bo Lin
  • for: 这篇论文主要针对深度卷积神经网络的抽象和学习性能分析。
  • methods: 论文使用了零填充和最大池化来分析深度卷积神经网络的抽象和学习性能。
  • results: 论文证明,用深度为 $L$ 的深度卷积神经网络逼近 $r$-光滑函数时,逼近率为 $ (L^2/\log L)^{-2r/d} $,在相差一个对数因子的意义下是最优的。此外,论文还给出了在深度卷积神经网络上实现经验风险最小化的近乎最优学习速率。
    Abstract This paper focuses on approximation and learning performance analysis for deep convolutional neural networks with zero-padding and max-pooling. We prove that, to approximate $r$-smooth function, the approximation rates of deep convolutional neural networks with depth $L$ are of order $ (L^2/\log L)^{-2r/d} $, which is optimal up to a logarithmic factor. Furthermore, we deduce almost optimal learning rates for implementing empirical risk minimization over deep convolutional neural networks.
    摘要 本文关注采用零填充和最大池化的深度卷积神经网络的逼近与学习性能分析。我们证明,用深度为 $L$ 的深度卷积神经网络逼近 $r$-光滑函数时,其逼近率为 $(L^2/\log L)^{-2r/d}$,在相差一个对数因子的意义下是最优的。此外,我们还推导出在深度卷积神经网络上实现经验风险最小化的近乎最优学习速率。

Unsupervised Adversarial Detection without Extra Model: Training Loss Should Change

  • paper_url: http://arxiv.org/abs/2308.03243
  • repo_url: https://github.com/cyclebooster/unsupervised-adversarial-detection-without-extra-model
  • paper_authors: Chien Cheng Chyou, Hung-Ting Su, Winston H. Hsu
  • for: 提高深度学习模型对抗攻击的可靠性
  • methods: 提出新的训练损失函数和无需依赖于攻击类型的检测方法
  • results: 检测率高于93.9%,false positive率低于2.5%,在所有攻击类型下都有良好表现
    Abstract Adversarial robustness poses a critical challenge in the deployment of deep learning models for real-world applications. Traditional approaches to adversarial training and supervised detection rely on prior knowledge of attack types and access to labeled training data, which is often impractical. Existing unsupervised adversarial detection methods identify whether the target model works properly, but they suffer from bad accuracies owing to the use of common cross-entropy training loss, which relies on unnecessary features and strengthens adversarial attacks. We propose new training losses to reduce useless features and the corresponding detection method without prior knowledge of adversarial attacks. The detection rate (true positive rate) against all given white-box attacks is above 93.9% except for attacks without limits (DF($\infty$)), while the false positive rate is barely 2.5%. The proposed method works well in all tested attack types and the false positive rates are even better than the methods good at certain types.
    摘要 对抗鲁棒性是深度学习模型在实际应用部署中的关键挑战。传统的对抗训练和有监督检测方法依赖于对攻击类型的先验知识以及带标注的训练数据,这在实践中往往不可行。现有的无监督对抗检测方法虽然能够判断目标模型是否正常工作,但由于使用了常见的交叉熵训练损失(它依赖不必要的特征并会强化对抗攻击),检测精度较差。我们提出了新的训练损失以削减无用特征,并给出了无需攻击先验知识的相应检测方法。除无限制攻击(DF($\infty$))外,针对所有给定白盒攻击的检测率(真阳性率)均高于 93.9%,而假阳性率仅约 2.5%。该方法在所有测试攻击类型上均表现良好,其假阳性率甚至优于那些只擅长特定攻击类型的方法。

Asynchronous Decentralized Q-Learning: Two Timescale Analysis By Persistence

  • paper_url: http://arxiv.org/abs/2308.03239
  • repo_url: None
  • paper_authors: Bora Yongacoglu, Gürdal Arslan, Serdar Yüksel
  • for: 这篇论文主要探讨多智能体强化学习(MARL)中的非平稳性挑战,以及解决这一挑战的不同方式。
  • methods: 这篇论文研究去中心化 Q 学习算法的异步变体,并给出了保证该异步算法以高概率将博弈驱动至均衡的充分条件。
  • results: 研究发现,在 Q 因子更新中使用常数学习率是放宽先前工作同步性假设的关键。此外,该分析还适用于 regret testing 传统中一系列其他算法的异步推广。
    Abstract Non-stationarity is a fundamental challenge in multi-agent reinforcement learning (MARL), where agents update their behaviour as they learn. Many theoretical advances in MARL avoid the challenge of non-stationarity by coordinating the policy updates of agents in various ways, including synchronizing times at which agents are allowed to revise their policies. Synchronization enables analysis of many MARL algorithms via multi-timescale methods, but such synchrony is infeasible in many decentralized applications. In this paper, we study an asynchronous variant of the decentralized Q-learning algorithm, a recent MARL algorithm for stochastic games. We provide sufficient conditions under which the asynchronous algorithm drives play to equilibrium with high probability. Our solution utilizes constant learning rates in the Q-factor update, which we show to be critical for relaxing the synchrony assumptions of earlier work. Our analysis also applies to asynchronous generalizations of a number of other algorithms from the regret testing tradition, whose performance is analyzed by multi-timescale methods that study Markov chains obtained via policy update dynamics. This work extends the applicability of the decentralized Q-learning algorithm and its relatives to settings in which parameters are selected in an independent manner, and tames non-stationarity without imposing the coordination assumptions of prior work.
    摘要 非平稳性是多智能体强化学习(MARL)中的一个根本挑战:智能体在学习过程中会不断更新自身行为。许多 MARL 的理论进展通过各种方式协调智能体的策略更新来回避非平稳性问题,包括同步智能体被允许修改策略的时刻。同步化使得许多 MARL 算法可以用多时间尺度方法进行分析,但在许多去中心化应用中这种同步是不可行的。本文研究去中心化 Q 学习算法(一种针对随机博弈的 MARL 算法)的异步变体,并给出了充分条件,使该异步算法能够以高概率将博弈驱动至均衡。我们的解法在 Q 因子更新中采用常数学习率,并证明这是放宽先前工作同步性假设的关键。我们的分析同样适用于源自 regret testing 传统的一系列其他算法的异步推广,这些算法的性能通常由研究策略更新动态所产生的马尔可夫链的多时间尺度方法来分析。本工作将去中心化 Q 学习算法及其相关算法的适用范围扩展到参数独立选择的设定,并在不引入先前工作所需协调假设的情况下克服了非平稳性。
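The Q-factor update with a constant learning rate, which the analysis identifies as the key ingredient, looks as follows in a minimal single-agent tabular sketch. The toy environment is made up, and the paper's multi-agent asynchronous scheduling and policy-revision dynamics are omitted.

```python
# The basic Q-factor update with a *constant* learning rate. The multi-agent,
# asynchronous scheduling and equilibrium-seeking policy updates are omitted.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9          # constant step size, discount factor

def step(state, action):
    """Toy environment: random transition, reward favors action 0 in state 0."""
    next_state = int(rng.integers(n_states))
    reward = 1.0 if (state == 0 and action == 0) else 0.0
    return next_state, reward

state = 0
for _ in range(5000):
    action = int(rng.integers(n_actions)) if rng.random() < 0.2 else int(Q[state].argmax())
    next_state, reward = step(state, action)
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])   # constant alpha
    state = next_state

print(np.round(Q, 2))
```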

AdaER: An Adaptive Experience Replay Approach for Continual Lifelong Learning

  • paper_url: http://arxiv.org/abs/2308.03810
  • repo_url: None
  • paper_authors: Xingyu Li, Bo Tang, Haifeng Li
  • for: 这篇论文旨在解决持续终身学习问题:学习器需要以序列方式不断获取新知识,而流式训练数据的非平稳性会导致灾难性遗忘。
  • methods: 这篇论文提出了一种名为自适应经验回放(AdaER)的新算法,包括记忆回放和记忆更新两个阶段。在记忆回放阶段,AdaER 采用情境线索记忆召回(C-CMR)策略,选择性地回放在数据和任务两个层面上与当前输入最冲突的记忆。此外,AdaER 还引入熵均衡蓄水池采样(E-BRS)策略,通过最大化信息熵来提升记忆缓冲区的性能。
  • results: 实验表明,AdaER 优于现有的持续终身学习基线,在缓解灾难性遗忘和提升学习性能方面效果显著。
    Abstract Continual lifelong learning is an machine learning framework inspired by human learning, where learners are trained to continuously acquire new knowledge in a sequential manner. However, the non-stationary nature of streaming training data poses a significant challenge known as catastrophic forgetting, which refers to the rapid forgetting of previously learned knowledge when new tasks are introduced. While some approaches, such as experience replay (ER), have been proposed to mitigate this issue, their performance remains limited, particularly in the class-incremental scenario which is considered natural and highly challenging. In this paper, we present a novel algorithm, called adaptive-experience replay (AdaER), to address the challenge of continual lifelong learning. AdaER consists of two stages: memory replay and memory update. In the memory replay stage, AdaER introduces a contextually-cued memory recall (C-CMR) strategy, which selectively replays memories that are most conflicting with the current input data in terms of both data and task. Additionally, AdaER incorporates an entropy-balanced reservoir sampling (E-BRS) strategy to enhance the performance of the memory buffer by maximizing information entropy. To evaluate the effectiveness of AdaER, we conduct experiments on established supervised continual lifelong learning benchmarks, specifically focusing on class-incremental learning scenarios. The results demonstrate that AdaER outperforms existing continual lifelong learning baselines, highlighting its efficacy in mitigating catastrophic forgetting and improving learning performance.
    摘要 持续终身学习是一种受人类学习启发的机器学习框架,学习器需要以序列方式不断获取新知识。然而,流式训练数据的非平稳性带来了灾难性遗忘这一重大挑战,即在引入新任务时模型会快速遗忘已学知识。虽然已有经验回放(ER)等方法被提出以缓解该问题,但其性能仍然有限,尤其是在被认为最自然也最具挑战性的类增量场景中。本文提出了一种新算法——自适应经验回放(AdaER)。AdaER 包括两个阶段:记忆回放与记忆更新。在记忆回放阶段,AdaER 采用情境线索记忆召回(C-CMR)策略,选择性地回放在数据和任务两个层面上与当前输入最冲突的记忆。此外,AdaER 还引入熵均衡蓄水池采样(E-BRS)策略,通过最大化信息熵来提升记忆缓冲区的效果。为评估 AdaER 的有效性,我们在已有的有监督持续终身学习基准上进行了实验,重点关注类增量学习场景。结果表明,AdaER 优于现有的持续终身学习基线,在缓解灾难性遗忘和提升学习性能方面效果显著。
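The E-BRS strategy builds on reservoir sampling for the replay memory; the sketch below shows only the plain reservoir mechanism (the entropy-balancing refinement is not reproduced).

```python
# Plain reservoir sampling for a fixed-size replay memory. AdaER's E-BRS
# additionally balances the buffer by information entropy; that refinement is
# not shown here -- this is only the base mechanism it builds on.
import random

class ReservoirBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Every example seen so far ends up stored with prob capacity/seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

buf = ReservoirBuffer(capacity=100)
for i in range(10_000):
    buf.add((f"x_{i}", i % 10))        # (input id, class label)
print(len(buf.data), buf.data[:3])
```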

G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima

  • paper_url: http://arxiv.org/abs/2308.03236
  • repo_url: None
  • paper_authors: Xingyu Li, Bo Tang
  • for: 提高深度神经网络(DNN)的泛化能力,特别是在训练数据有限时。
  • methods: 将 Mixup 与 SAM(Sharpness-Aware Minimization)技术相结合,以提升 DNN 训练过程中的泛化能力。
  • results: 提出了两种新算法:Binary G-Mix 和 Decomposed G-Mix,可进一步优化 DNN 性能。实验结果表明,这两种算法可在多个数据集和模型上提升泛化性能,达到最先进水平。
    Abstract Deep neural networks (DNNs) have demonstrated promising results in various complex tasks. However, current DNNs encounter challenges with over-parameterization, especially when there is limited training data available. To enhance the generalization capability of DNNs, the Mixup technique has gained popularity. Nevertheless, it still produces suboptimal outcomes. Inspired by the successful Sharpness-Aware Minimization (SAM) approach, which establishes a connection between the sharpness of the training loss landscape and model generalization, we propose a new learning framework called Generalized-Mixup, which combines the strengths of Mixup and SAM for training DNN models. The theoretical analysis provided demonstrates how the developed G-Mix framework enhances generalization. Additionally, to further optimize DNN performance with the G-Mix framework, we introduce two novel algorithms: Binary G-Mix and Decomposed G-Mix. These algorithms partition the training data into two subsets based on the sharpness-sensitivity of each example to address the issue of "manifold intrusion" in Mixup. Both theoretical explanations and experimental results reveal that the proposed BG-Mix and DG-Mix algorithms further enhance model generalization across multiple datasets and models, achieving state-of-the-art performance.
    摘要 深度神经网络(DNN)在各种复杂任务中展现了可观的效果。然而,当前的 DNN 面临过参数化的挑战,尤其是在可用训练数据有限时。为提升 DNN 的泛化能力,Mixup 技术已被广泛采用,但其结果仍不够理想。受 Sharpness-Aware Minimization(SAM)方法(它建立了训练损失地形的锐度与模型泛化之间的联系)的成功启发,我们提出了一种新的学习框架 Generalized-Mixup,将 Mixup 与 SAM 的优势结合用于 DNN 训练。理论分析展示了所提出的 G-Mix 框架如何提升泛化能力。此外,为进一步优化 G-Mix 框架下的 DNN 性能,我们提出了两种新算法:Binary G-Mix 与 Decomposed G-Mix。它们根据每个样本的锐度敏感性将训练数据划分为两个子集,以解决 Mixup 中的"流形入侵"问题。理论解释与实验结果均表明,所提出的 BG-Mix 和 DG-Mix 算法能在多个数据集和模型上进一步提升模型泛化能力,达到最先进的性能。
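The vanilla Mixup step that G-Mix generalizes is sketched below; the Beta parameter is illustrative, and the SAM-style sharpness-aware update and the BG-/DG-Mix sharpness-based data partitioning are not shown.

```python
# The vanilla Mixup step: convex combinations of inputs and one-hot labels with
# a Beta-sampled coefficient. G-Mix's SAM-style update is not reproduced here.
import torch

def mixup_batch(x, y, num_classes, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_onehot = torch.nn.functional.one_hot(y, num_classes).float()
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix

x = torch.randn(32, 3, 32, 32)               # e.g. CIFAR-sized images
y = torch.randint(0, 10, (32,))
x_mix, y_mix = mixup_batch(x, y, num_classes=10)
print(x_mix.shape, y_mix.shape)
```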

Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining

  • paper_url: http://arxiv.org/abs/2308.03235
  • repo_url: https://github.com/zekaouinoureddine/Opinion-Transformers
  • paper_authors: Nour Eddine Zekaoui, Siham Yousfi, Maryem Rhanoui, Mounia Mikram
  • for: 本研究的目的是研究Transformer型语言模型在情感分析领域的表现,并对这些模型进行比较,以便为生产工程师和研究人员提供指导。
  • methods: 本研究使用了Transformer型语言模型进行情感分析,并对这些模型进行了比较。
  • results: 研究发现,Transformer型语言模型在情感分析任务上具有出色的表现,具有较快的处理速度和更高的准确率。
    Abstract Opinion mining, also known as sentiment analysis, is a subfield of natural language processing (NLP) that focuses on identifying and extracting subjective information in textual material. This can include determining the overall sentiment of a piece of text (e.g., positive or negative), as well as identifying specific emotions or opinions expressed in the text, that involves the use of advanced machine and deep learning techniques. Recently, transformer-based language models make this task of human emotion analysis intuitive, thanks to the attention mechanism and parallel computation. These advantages make such models very powerful on linguistic tasks, unlike recurrent neural networks that spend a lot of time on sequential processing, making them prone to fail when it comes to processing long text. The scope of our paper aims to study the behaviour of the cutting-edge Transformer-based language models on opinion mining and provide a high-level comparison between them to highlight their key particularities. Additionally, our comparative study shows leads and paves the way for production engineers regarding the approach to focus on and is useful for researchers as it provides guidelines for future research subjects.
    摘要 观点挖掘(又称情感分析)是自然语言处理(NLP)的一个子领域,关注从文本材料中识别和提取主观信息,包括判断一段文本的整体情感倾向(如积极或消极),以及识别文本中表达的具体情绪或观点,这通常需要借助先进的机器学习与深度学习技术。近年来,得益于注意力机制和并行计算,基于 Transformer 的语言模型使人类情感分析任务变得更加直观。这些优势使此类模型在语言任务上非常强大,而循环神经网络由于需要大量时间进行顺序处理,在处理长文本时容易失效。本文旨在研究最前沿的基于 Transformer 的语言模型在观点挖掘上的表现,并对它们进行高层次比较,以突出各自的关键特性。此外,我们的对比研究为生产工程师选择研究方向提供了线索,也为研究人员的后续研究课题提供了指导。

Imbalanced Large Graph Learning Framework for FPGA Logic Elements Packing Prediction

  • paper_url: http://arxiv.org/abs/2308.03231
  • repo_url: None
  • paper_authors: Zhixiong Di, Runzhe Tao, Lin Chen, Qiang Wu, Yibo Lin
  • for: 预测 FPGA 逻辑元素在布局后是否会被打包,以指导设计优化并加速设计收敛。
  • methods: 提出了一种不均衡大图学习框架 ImLG,通过专门的特征提取和特征聚合方法来增强电路图的节点表示学习。针对已打包与未打包逻辑元素分布不均衡的问题,进一步提出了图过采样和小批量训练等技术来处理这一不均衡学习任务。
  • results: 实验结果表明,与最近的基于高斯的预测方法相比,我们的框架可将 F1 分数提升 42.82%。物理设计结果表明,该方法可帮助布局器将布线后的线长改善 0.93%、SLICE 占用率改善 0.89%。
    Abstract Packing is a required step in a typical FPGA CAD flow. It has high impacts to the performance of FPGA placement and routing. Early prediction of packing results can guide design optimization and expedite design closure. In this work, we propose an imbalanced large graph learning framework, ImLG, for prediction of whether logic elements will be packed after placement. Specifically, we propose dedicated feature extraction and feature aggregation methods to enhance the node representation learning of circuit graphs. With imbalanced distribution of packed and unpacked logic elements, we further propose techniques such as graph oversampling and mini-batch training for this imbalanced learning task in large circuit graphs. Experimental results demonstrate that our framework can improve the F1 score by 42.82% compared to the most recent Gaussian-based prediction method. Physical design results show that the proposed method can assist the placer in improving routed wirelength by 0.93% and SLICE occupation by 0.89%.
    摘要 打包(packing)是典型 FPGA CAD 流程中的必需步骤,对 FPGA 布局和布线的性能有重大影响。尽早预测打包结果可以指导设计优化并加速设计收敛。在本工作中,我们提出了一种不均衡大图学习框架 ImLG,用于预测逻辑元素在布局后是否会被打包。具体而言,我们提出了专门的特征提取与特征聚合方法,以增强电路图的节点表示学习。针对已打包与未打包逻辑元素分布不均衡的问题,我们进一步提出了图过采样和小批量训练等技术,用于大规模电路图上的不均衡学习任务。实验结果表明,与最近的基于高斯的预测方法相比,我们的框架可将 F1 分数提升 42.82%。物理设计结果表明,所提出的方法可帮助布局器将布线后的线长改善 0.93%、SLICE 占用率改善 0.89%。

Tractability of approximation by general shallow networks

  • paper_url: http://arxiv.org/abs/2308.03230
  • repo_url: None
  • paper_authors: Hrushikesh Mhaskar, Tong Mao
  • for: 本文针对形如 $ x\mapsto\int_{\mathbb{Y}} G( x, y)d\tau( y)$ 的函数的逼近给出了更精细的界,其中 $\mathbb{X}$ 和 $\mathbb{Y}$ 是紧致度量空间。
  • methods: 本文使用 $G$-网络,即 $ x\mapsto \sum_{k=1}^n a_kG( x, y_k) $,其中 $ y_1,\cdots, y_n\in\mathbb{Y} $,$ a_1,\cdots, a_n\in\mathbb{R} $。
  • results: 本文以覆盖数定义 $\mathbb{X}$ 和 $\mathbb{Y}$ 的维度,给出了关于 $n$ 的与维度无关的逼近度界,其中涉及的常数至多以多项式方式依赖于维度。应用包括幂修正线性单元网络、带状函数网络、某些径向基函数网络的逼近,以及将函数延拓到更高维空间这一重要问题。
    Abstract In this paper, we present a sharper version of the results in the paper Dimension independent bounds for general shallow networks; Neural Networks, \textbf{123} (2020), 142-152. Let $\mathbb{X}$ and $\mathbb{Y}$ be compact metric spaces. We consider approximation of functions of the form $ x\mapsto\int_{\mathbb{Y}} G( x, y)d\tau( y)$, $ x\in\mathbb{X}$, by $G$-networks of the form $ x\mapsto \sum_{k=1}^n a_kG( x, y_k)$, $ y_1,\cdots, y_n\in\mathbb{Y}$, $a_1,\cdots, a_n\in\mathbb{R}$. Defining the dimensions of $\mathbb{X}$ and $\mathbb{Y}$ in terms of covering numbers, we obtain dimension independent bounds on the degree of approximation in terms of $n$, where also the constants involved are all dependent at most polynomially on the dimensions. Applications include approximation by power rectified linear unit networks, zonal function networks, certain radial basis function networks as well as the important problem of function extension to higher dimensional spaces.
    摘要 本文给出了论文《Dimension independent bounds for general shallow networks》(Neural Networks, 123 (2020), 142-152)中结果的一个更精细版本。设 $\mathbb{X}$ 和 $\mathbb{Y}$ 为紧致度量空间。我们考虑用形如 $ x\mapsto \sum_{k=1}^n a_kG( x, y_k)$($ y_1,\cdots, y_n\in\mathbb{Y}$,$ a_1,\cdots, a_n\in\mathbb{R}$)的 $G$-网络来逼近形如 $ x\mapsto\int_{\mathbb{Y}} G( x, y)d\tau( y)$($ x\in\mathbb{X}$)的函数。我们以覆盖数定义 $\mathbb{X}$ 和 $\mathbb{Y}$ 的维度,得到了关于 $n$ 的与维度无关的逼近度界,其中涉及的常数至多以多项式方式依赖于维度。应用包括幂修正线性单元网络、带状函数网络、某些径向基函数网络的逼近,以及将函数延拓到更高维空间这一重要问题。
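A toy numerical instance of the $G$-network form $x\mapsto\sum_{k=1}^n a_kG(x,y_k)$ is sketched below, using a Gaussian kernel and least-squares coefficients. The kernel, the centers $y_k$ and the target function are illustrative choices, not the paper's.

```python
# A concrete instance of the G-network form  x -> sum_k a_k G(x, y_k),
# with a Gaussian kernel G and coefficients fit by least squares.
import numpy as np

rng = np.random.default_rng(0)
d, n_centers, n_samples = 3, 50, 400

def G(x, y, width=1.0):
    # x: (N, d), y: (M, d) -> (N, M) kernel matrix
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dist / (2 * width ** 2))

def target(x):
    return np.sin(x.sum(axis=1))           # some smooth function on X

X = rng.uniform(-1, 1, (n_samples, d))     # samples from the compact set X
Y = rng.uniform(-1, 1, (n_centers, d))     # centers y_1..y_n in Y
design = G(X, Y)                           # (N, M)
coeffs, *_ = np.linalg.lstsq(design, target(X), rcond=None)

X_test = rng.uniform(-1, 1, (200, d))
err = np.max(np.abs(G(X_test, Y) @ coeffs - target(X_test)))
print(f"sup-norm error on test points: {err:.3f}")
```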

Why Linguistics Will Thrive in the 21st Century: A Reply to Piantadosi (2023)

  • paper_url: http://arxiv.org/abs/2308.03228
  • repo_url: None
  • paper_authors: Jordan Kodner, Sarah Payne, Jeffrey Heinz
  • for: 本文从四个主要方面批判 Piantadosi(2023)提出的"现代语言模型推翻了乔姆斯基的语言研究路线"这一论断。
  • methods: 本文在承认大语言模型(LLMs)表现出色且实用的前提下,逐一评估其是否推翻了乔姆斯基的语言学方法。
  • results: 本文得出结论:大语言模型无法解决语言学习的核心谜题,即儿童仅凭少量语言输入便能习得母语;此外,LLMs 也无法构成关于语言和语言习得的科学理论,因为它们只能提供预测,而非可解释的说明。
    Abstract We present a critical assessment of Piantadosi's (2023) claim that "Modern language models refute Chomsky's approach to language," focusing on four main points. First, despite the impressive performance and utility of large language models (LLMs), humans achieve their capacity for language after exposure to several orders of magnitude less data. The fact that young children become competent, fluent speakers of their native languages with relatively little exposure to them is the central mystery of language learning to which Chomsky initially drew attention, and LLMs currently show little promise of solving this mystery. Second, what can the artificial reveal about the natural? Put simply, the implications of LLMs for our understanding of the cognitive structures and mechanisms underlying language and its acquisition are like the implications of airplanes for understanding how birds fly. Third, LLMs cannot constitute scientific theories of language for several reasons, not least of which is that scientific theories must provide interpretable explanations, not just predictions. This leads to our final point: to even determine whether the linguistic and cognitive capabilities of LLMs rival those of humans requires explicating what humans' capacities actually are. In other words, it requires a separate theory of language and cognition; generative linguistics provides precisely such a theory. As such, we conclude that generative linguistics as a scientific discipline will remain indispensable throughout the 21st century and beyond.
    摘要 我们对 Piantadosi(2023)提出的"现代语言模型推翻了乔姆斯基的语言研究路线"这一论断进行批判性评估,聚焦于四个要点。首先,尽管大语言模型(LLMs)表现出色且实用,人类却是在接触少几个数量级的数据之后获得语言能力的。幼儿在相对很少的语言输入下便能成为母语的流利使用者,这正是乔姆斯基最初提请关注的语言学习核心谜题,而 LLMs 目前几乎看不出解决这一谜题的希望。其次,人工之物能揭示自然之物什么?简单地说,LLMs 对于理解语言及其习得背后的认知结构与机制的意义,就如同飞机对于理解鸟类如何飞行的意义。第三,LLMs 出于多种原因不能构成语言的科学理论,其中最重要的一点是:科学理论必须提供可解释的说明,而不仅仅是预测。这引出我们的最后一点:哪怕只是判断 LLMs 的语言与认知能力是否可与人类匹敌,也需要先阐明人类的能力究竟是什么;换言之,这需要一套独立的语言与认知理论,而生成语言学恰恰提供了这样的理论。因此,我们认为生成语言学作为一门科学学科,在 21 世纪乃至更远的未来仍将不可或缺。

Local Consensus Enhanced Siamese Network with Reciprocal Loss for Two-view Correspondence Learning

  • paper_url: http://arxiv.org/abs/2308.03217
  • repo_url: None
  • paper_authors: Linbo Wang, Jing Wu, Xianyong Fang, Zhengyi Liu, Chenjie Cao, Yanwei Fu
  • for: 提高两视对匹配学习框架的精度和稳定性。
  • methods: 提出一个局部特征共识(Local Feature Consensus, LFC)插件模块来增强现有模型的特征,并将现有模型扩展为孪生(Siamese)网络,利用互惠损失(reciprocal loss)来利用相互投影的监督信息。
  • results: 通过实验,在标准基准数据集上达到了最先进的性能。
    Abstract Recent studies of two-view correspondence learning usually establish an end-to-end network to jointly predict correspondence reliability and relative pose. We improve such a framework from two aspects. First, we propose a Local Feature Consensus (LFC) plugin block to augment the features of existing models. Given a correspondence feature, the block augments its neighboring features with mutual neighborhood consensus and aggregates them to produce an enhanced feature. As inliers obey a uniform cross-view transformation and share more consistent learned features than outliers, feature consensus strengthens inlier correlation and suppresses outlier distraction, which makes output features more discriminative for classifying inliers/outliers. Second, existing approaches supervise network training with the ground truth correspondences and essential matrix projecting one image to the other for an input image pair, without considering the information from the reverse mapping. We extend existing models to a Siamese network with a reciprocal loss that exploits the supervision of mutual projection, which considerably promotes the matching performance without introducing additional model parameters. Building upon MSA-Net, we implement the two proposals and experimentally achieve state-of-the-art performance on benchmark datasets.
    摘要 最近的两视对应学习研究通常构建端到端网络,同时预测对应的可靠性与相对位姿。我们从两个方面改进了这类框架。首先,我们提出了一个局部特征共识(LFC)插件模块来增强现有模型的特征:给定一个对应特征,该模块利用相互邻域共识增强其邻近特征,并将它们聚合以生成增强后的特征。由于内点服从统一的跨视图变换,其学习到的特征比外点更一致,特征共识因而强化了内点之间的相关性并抑制了外点的干扰,使输出特征在内点/外点分类上更具判别力。其次,现有方法在训练网络时仅使用真实对应以及将一幅图像投影到另一幅图像的本质矩阵作为监督,而没有考虑反向映射所包含的信息。我们将现有模型扩展为孪生网络,并引入利用相互投影监督的互惠损失,在不引入额外模型参数的情况下显著提升了匹配性能。基于 MSA-Net,我们实现了上述两项改进,并在基准数据集上通过实验取得了最先进的性能。

The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning

  • paper_url: http://arxiv.org/abs/2308.03215
  • repo_url: None
  • paper_authors: Nikhil Ghosh, Spencer Frei, Wooseok Ha, Bin Yu
  • for: 这个论文研究了使用随机梯度下降(SGD)训练单神经元自编码器的动态,并研究了不同批量大小对这一非凸问题的影响。
  • methods: 该论文使用了随机初始化的 SGD 算法,并研究了不同批处理大小对于解的影响。
  • results: 研究发现,无论批量大小如何,SGD 都能成功找到全局最小值,但找到的具体全局最小值取决于批量大小。在全批量情形下,解是稠密的(即不稀疏),并且与初始方向高度一致,说明此时几乎没有发生特征学习。相反,任何严格小于样本数的批量都会使 SGD 找到一个稀疏且与初始化几乎正交的全局最小值,这表明随机梯度的随机性在该设定下诱发了一种性质上不同的"特征选择"。此外,若以 Hessian 的迹来度量最小值的锐度,全批量梯度下降找到的最小值比更小批量找到的更平坦,这与此前认为大批量会导致更锐利最小值的研究相反。
    Abstract In this work, we investigate the dynamics of stochastic gradient descent (SGD) when training a single-neuron autoencoder with linear or ReLU activation on orthogonal data. We show that for this non-convex problem, randomly initialized SGD with a constant step size successfully finds a global minimum for any batch size choice. However, the particular global minimum found depends upon the batch size. In the full-batch setting, we show that the solution is dense (i.e., not sparse) and is highly aligned with its initialized direction, showing that relatively little feature learning occurs. On the other hand, for any batch size strictly smaller than the number of samples, SGD finds a global minimum which is sparse and nearly orthogonal to its initialization, showing that the randomness of stochastic gradients induces a qualitatively different type of "feature selection" in this setting. Moreover, if we measure the sharpness of the minimum by the trace of the Hessian, the minima found with full batch gradient descent are flatter than those found with strictly smaller batch sizes, in contrast to previous works which suggest that large batches lead to sharper minima. To prove convergence of SGD with a constant step size, we introduce a powerful tool from the theory of non-homogeneous random walks which may be of independent interest.
    摘要 在这项工作中,我们研究了在正交数据上用线性或 ReLU 激活训练单神经元自编码器时随机梯度下降(SGD)的动态。我们证明,对于这个非凸问题,随机初始化、常数步长的 SGD 无论批量大小如何都能成功找到全局最小值,但找到的具体全局最小值取决于批量大小。在全批量情形下,解是稠密的(即不稀疏),并且与初始方向高度一致,说明此时几乎没有发生特征学习;相反,任何严格小于样本数的批量都会使 SGD 找到一个稀疏且与初始化几乎正交的全局最小值,这表明随机梯度的随机性在该设定下诱发了一种性质上不同的"特征选择"。此外,若以 Hessian 的迹来度量最小值的锐度,全批量梯度下降找到的最小值比更小批量找到的更平坦,这与此前认为大批量会导致更锐利最小值的工作相反。为证明常数步长 SGD 的收敛性,我们引入了非齐次随机游走理论中的一个有力工具,该工具可能具有独立的研究价值。

Average-Hard Attention Transformers are Constant-Depth Uniform Threshold Circuits

  • paper_url: http://arxiv.org/abs/2308.03212
  • repo_url: None
  • paper_authors: Lena Strobl
  • for: This paper explores the relationship between transformer models and constant-depth threshold circuits, and demonstrates that transformers can be simulated by constant-depth threshold circuits.
  • methods: The paper uses two assumptions: average-hard attention and logarithmic precision for internal computations relative to input length.
  • results: The paper shows that both transformer models can be simulated by constant-depth threshold circuits, with the latter being more robust due to generating a uniform circuit family. Additionally, the paper extends the first result to yield uniform circuits as well.
  • for: 这篇论文研究了 transformer 模型与常深度阈值电路之间的关系,并证明 transformer 可以被模拟为常深度阈值电路。
  • methods: 该论文使用了两个假设:平均困难的注意力和对输入长度的对数精度。
  • results: 该论文表明,这两类 transformer 模型都可以被常数深度阈值电路模拟,其中第二个结论更强,因为它给出的是一致(uniform)电路族。此外,论文还将第一个结果推广,使其同样给出一致电路。
    Abstract Transformers have emerged as a widely used neural network model for various natural language processing tasks. Previous research explored their relationship with constant-depth threshold circuits, making two assumptions: average-hard attention and logarithmic precision for internal computations relative to input length. Merrill et al. (2022) prove that average-hard attention transformers recognize languages that fall within the complexity class TC0, denoting the set of languages that can be recognized by constant-depth polynomial-size threshold circuits. Likewise, Merrill and Sabharwal (2023) show that log-precision transformers recognize languages within the class of uniform TC0. This shows that both transformer models can be simulated by constant-depth threshold circuits, with the latter being more robust due to generating a uniform circuit family. Our paper shows that the first result can be extended to yield uniform circuits as well.
    摘要 Transformer 已成为各类自然语言处理任务中广泛使用的神经网络模型。此前的研究探讨了它们与常数深度阈值电路之间的关系,并采用两个假设:平均难注意力(average-hard attention)以及内部计算相对输入长度的对数精度。Merrill 等(2022)证明,采用平均难注意力的 transformer 能识别的语言落在复杂性类 TC0 之内,即可由常数深度、多项式规模阈值电路识别的语言集合。类似地,Merrill 与 Sabharwal(2023)证明了对数精度 transformer 能识别的语言属于一致(uniform)TC0 类。这表明两类 transformer 模型都可以被常数深度阈值电路模拟,而后者由于给出一致电路族而更为稳健。本文表明,第一个结果同样可以扩展为给出一致电路。

Time-Parameterized Convolutional Neural Networks for Irregularly Sampled Time Series

  • paper_url: http://arxiv.org/abs/2308.03210
  • repo_url: None
  • paper_authors: Chrysoula Kosma, Giannis Nikolentzos, Michalis Vazirgiannis
  • for: 这篇论文主要关注于如何对不规则数据进行模型化和预测,尤其是在多变量时间序列中。
  • methods: 本文提出了一种名为时间参数化卷积神经网(TPCNN)的新型神经网络模型,它运用时间参数化的卷积 kernel 来实现不规则数据的模型化。
  • results: 根据实验结果,TPCNN 模型在涉及真实世界不规则采样多变量时间序列的插值与分类任务中表现出竞争力和高效性,能够有效地对不规则采样数据进行建模和预测。
    Abstract Irregularly sampled multivariate time series are ubiquitous in several application domains, leading to sparse, not fully-observed and non-aligned observations across different variables. Standard sequential neural network architectures, such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs), consider regular spacing between observation times, posing significant challenges to irregular time series modeling. While most of the proposed architectures incorporate RNN variants to handle irregular time intervals, convolutional neural networks have not been adequately studied in the irregular sampling setting. In this paper, we parameterize convolutional layers by employing time-explicitly initialized kernels. Such general functions of time enhance the learning process of continuous-time hidden dynamics and can be efficiently incorporated into convolutional kernel weights. We, thus, propose the time-parameterized convolutional neural network (TPCNN), which shares similar properties with vanilla convolutions but is carefully designed for irregularly sampled time series. We evaluate TPCNN on both interpolation and classification tasks involving real-world irregularly sampled multivariate time series datasets. Our experimental results indicate the competitive performance of the proposed TPCNN model which is also significantly more efficient than other state-of-the-art methods. At the same time, the proposed architecture allows the interpretability of the input series by leveraging the combination of learnable time functions that improve the network performance in subsequent tasks and expedite the inaugural application of convolutions in this field.
    摘要 不规则采样的多变量时间序列在多个应用领域中普遍存在,导致不同变量之间的观测稀疏、不完整且不对齐。标准的序列神经网络架构,如循环神经网络(RNN)和卷积神经网络(CNN),假设观测时间之间的间隔是均匀的,这给不规则时间序列建模带来了重大挑战。多数已有架构引入 RNN 变体来处理不规则时间间隔,而卷积神经网络在不规则采样设定下尚未得到充分研究。在本文中,我们通过以时间显式初始化的卷积核对卷积层进行参数化。这类关于时间的一般函数可以增强连续时间隐藏动态的学习过程,并可以被高效地纳入卷积核权重。由此,我们提出了时间参数化卷积神经网络(TPCNN),它与普通卷积具有相似的性质,但针对不规则采样时间序列进行了精心设计。我们在涉及真实世界不规则采样多变量时间序列数据集的插值与分类任务上评估了 TPCNN。实验结果表明,所提出的 TPCNN 模型性能具有竞争力,同时显著高效于其他最先进方法。此外,该架构通过组合可学习的时间函数,使输入序列具有可解释性,既提升了网络在后续任务中的性能,也开启了卷积在该领域的首次应用。

Communication-Free Distributed GNN Training with Vertex Cut

  • paper_url: http://arxiv.org/abs/2308.03209
  • repo_url: None
  • paper_authors: Kaidi Cao, Rui Deng, Shirley Wu, Edward W Huang, Karthik Subbian, Jure Leskovec
  • for: 加速图 neural network(GNN)在实际图中的训练,以便应对实际图中的巨量数据和复杂结构。
  • methods: 提出了一种新的分布式训练框架 CoFree-GNN,通过免通信训练来加速训练过程,并采用顶点切分(Vertex Cut)划分来保持图结构。
  • results: 在真实世界网络上进行了大量实验,结果表明 CoFree-GNN 相比现有最先进方法可将 GNN 训练加速至多 10 倍。
    Abstract Training Graph Neural Networks (GNNs) on real-world graphs consisting of billions of nodes and edges is quite challenging, primarily due to the substantial memory needed to store the graph and its intermediate node and edge features, and there is a pressing need to speed up the training process. A common approach to achieve speed up is to divide the graph into many smaller subgraphs, which are then distributed across multiple GPUs in one or more machines and processed in parallel. However, existing distributed methods require frequent and substantial cross-GPU communication, leading to significant time overhead and progressively diminishing scalability. Here, we introduce CoFree-GNN, a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training. The framework utilizes a Vertex Cut partitioning, i.e., rather than partitioning the graph by cutting the edges between partitions, the Vertex Cut partitions the edges and duplicates the node information to preserve the graph structure. Furthermore, the framework maintains high model accuracy by incorporating a reweighting mechanism to handle a distorted graph distribution that arises from the duplicated nodes. We also propose a modified DropEdge technique to further speed up the training process. Using an extensive set of experiments on real-world networks, we demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
    摘要 在包含数十亿节点和边的真实世界图上训练图神经网络(GNN)相当困难,主要原因是存储图及其中间节点和边特征需要大量内存,因此亟需加速训练过程。一种常见的加速方法是将图划分为许多较小的子图,分发到一台或多台机器的多块 GPU 上并行处理。然而,现有的分布式方法需要频繁且大量的跨 GPU 通信,导致显著的时间开销,并使可扩展性不断下降。在此,我们提出了 CoFree-GNN,一种新的分布式 GNN 训练框架,通过免通信训练显著加速训练过程。该框架采用顶点切分(Vertex Cut)划分:不是通过切断分区之间的边来划分图,而是划分边并复制节点信息以保持图结构。此外,框架通过重加权机制处理节点复制带来的图分布失真,从而保持较高的模型精度。我们还提出了改进的 DropEdge 技术以进一步加速训练。通过在真实世界网络上的大量实验,我们证明 CoFree-GNN 相比现有最先进的 GNN 训练方法可将训练过程加速至多 10 倍。
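A minimal sketch of a vertex-cut style partition, in which edges are assigned to workers and shared nodes are duplicated, is shown below. The hash-based edge assignment is a placeholder, and the GNN training, reweighting and modified DropEdge steps are not reproduced.

```python
# Sketch of a vertex-cut partition: edges are split across workers and any node
# incident to edges in several partitions is duplicated ("mirrored") on each.
from collections import defaultdict

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4)]
num_parts = 2

part_edges = defaultdict(list)
part_nodes = defaultdict(set)
for u, v in edges:
    p = hash((u, v)) % num_parts        # placeholder edge-assignment rule
    part_edges[p].append((u, v))
    part_nodes[p].update((u, v))

# Nodes appearing in more than one partition are the duplicated ones whose
# distorted frequencies a reweighting mechanism would have to correct for.
counts = defaultdict(int)
for nodes in part_nodes.values():
    for n in nodes:
        counts[n] += 1
duplicated = sorted(n for n, c in counts.items() if c > 1)

for p in range(num_parts):
    print(f"partition {p}: edges={part_edges[p]} nodes={sorted(part_nodes[p])}")
print("duplicated nodes:", duplicated)
```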

Microvasculature Segmentation in Human BioMolecular Atlas Program (HuBMAP)

  • paper_url: http://arxiv.org/abs/2308.03203
  • repo_url: None
  • paper_authors: Youssef Sultan, Yongqiang Wang, James Scanlon, Lisa D’lima
  • for: 这个研究旨在为 HuBMAP 项目提供细胞分割技术,以创建详细的人体细胞地图。
  • methods: 该研究以 FastAI U-Net 模型为基础,并在此之上探索了多种变体,包括不同的骨干架构、更深的模型以及特征金字塔网络(Feature Pyramid Networks)。
  • results: 研究将各类方法与基线 U-Net 模型进行基准对比,对其性能进行了严格评估,为后续研究提供了有价值的探索方向。
    Abstract Image segmentation serves as a critical tool across a range of applications, encompassing autonomous driving's pedestrian detection and pre-operative tumor delineation in the medical sector. Among these applications, we focus on the National Institutes of Health's (NIH) Human BioMolecular Atlas Program (HuBMAP), a significant initiative aimed at creating detailed cellular maps of the human body. In this study, we concentrate on segmenting various microvascular structures in human kidneys, utilizing 2D Periodic Acid-Schiff (PAS)-stained histology images. Our methodology begins with a foundational FastAI U-Net model, upon which we investigate alternative backbone architectures, delve into deeper models, and experiment with Feature Pyramid Networks. We rigorously evaluate these varied approaches by benchmarking their performance against our baseline U-Net model. This study thus offers a comprehensive exploration of cutting-edge segmentation techniques, providing valuable insights for future research in the field.
    摘要 图像分割是众多应用中的关键工具,涵盖自动驾驶中的行人检测以及医疗领域的术前肿瘤勾画。在这些应用中,我们聚焦于美国国立卫生研究院(NIH)的人类生物分子图谱计划(HuBMAP),这是一项旨在构建人体详细细胞图谱的重要计划。在本研究中,我们利用 2D 过碘酸希夫(PAS)染色的组织学图像,对人类肾脏中的多种微血管结构进行分割。我们的方法以 FastAI U-Net 模型为基础,在此之上探索了不同的骨干架构、更深的模型以及特征金字塔网络。我们将这些不同方法与基线 U-Net 模型进行基准对比,对其性能进行了严格评估。本研究对前沿分割技术进行了全面探索,为该领域的后续研究提供了有价值的参考。

Source-free Domain Adaptive Human Pose Estimation

  • paper_url: http://arxiv.org/abs/2308.03202
  • repo_url: https://github.com/davidpengucf/sfdahpe
  • paper_authors: Qucheng Peng, Ce Zheng, Chen Chen
  • for: 本研究旨在解决人体姿态估计(HPE)中的数据隐私和安全问题,提出了一个新任务:无源域自适应 HPE。
  • methods: 本研究提出了一种新框架,包括三个模型:源模型、中间模型和目标模型,从源保护和目标相关两个角度解决问题。源保护模块在抵抗噪声的同时更好地保留源信息;目标相关模块通过构建新的空间概率空间来降低空间表示的稀疏性,并在该空间上提出姿态特定的对比学习与信息最大化。
  • results: 在多个域自适应 HPE 基准上进行了广泛实验,结果表明所提方法相比现有方法有显著提升。代码可在 https://github.com/davidpengucf/SFDAHPE 下载。
    Abstract Human Pose Estimation (HPE) is widely used in various fields, including motion analysis, healthcare, and virtual reality. However, the great expenses of labeled real-world datasets present a significant challenge for HPE. To overcome this, one approach is to train HPE models on synthetic datasets and then perform domain adaptation (DA) on real-world data. Unfortunately, existing DA methods for HPE neglect data privacy and security by using both source and target data in the adaptation process. To this end, we propose a new task, named source-free domain adaptive HPE, which aims to address the challenges of cross-domain learning of HPE without access to source data during the adaptation process. We further propose a novel framework that consists of three models: source model, intermediate model, and target model, which explores the task from both source-protect and target-relevant perspectives. The source-protect module preserves source information more effectively while resisting noise, and the target-relevant module reduces the sparsity of spatial representations by building a novel spatial probability space, and pose-specific contrastive learning and information maximization are proposed on the basis of this space. Comprehensive experiments on several domain adaptive HPE benchmarks show that the proposed method outperforms existing approaches by a considerable margin. The codes are available at https://github.com/davidpengucf/SFDAHPE.
    摘要 人体姿态估计(HPE)在运动分析、医疗和虚拟现实等多个领域得到广泛应用。然而,带标注的真实世界数据集代价高昂,这给 HPE 带来了重大挑战。一种解决思路是在合成数据集上训练 HPE 模型,再对真实世界数据进行领域自适应(DA)。然而,现有的 HPE 领域自适应方法在适应过程中同时使用源数据和目标数据,忽视了数据隐私与安全。为此,我们提出了一个新任务——无源域自适应 HPE,旨在在适应过程中无需访问源数据的前提下解决 HPE 的跨域学习问题。我们进一步提出了一个包含源模型、中间模型和目标模型三个模型的新框架,从源保护和目标相关两个角度研究该任务。源保护模块在抵抗噪声的同时更有效地保留源信息;目标相关模块通过构建新的空间概率空间降低空间表示的稀疏性,并在该空间的基础上提出姿态特定的对比学习与信息最大化。在多个域自适应 HPE 基准上的全面实验表明,所提方法相比现有方法有显著提升。代码可在 https://github.com/davidpengucf/SFDAHPE 下载。

Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

  • paper_url: http://arxiv.org/abs/2308.03188
  • repo_url: https://github.com/teacherpeterpan/self-correction-llm-papers
  • paper_authors: Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang
  • for: This paper aims to provide a comprehensive review of techniques for self-correction in large language models (LLMs) to address undesired behaviors such as hallucination, unfaithful reasoning, and toxic content.
  • methods: The paper reviews and taxonomizes recent work utilizing self-correction techniques, including training-time, generation-time, and post-hoc correction methods.
  • results: The paper summarizes the major applications of self-correction techniques in LLMs and discusses future directions and challenges in this emerging area of research.
  • for: 这篇论文旨在对大语言模型(LLM)的自我纠正技术进行全面综述,以应对其不良行为,如幻觉、不忠实的推理和有害内容。
  • methods: 论文回顾和分类了最近几年利用自我修复技术的研究,包括训练时、生成时和后续修复方法。
  • results: 论文总结了自我修复技术在LLM中的主要应用场景,并讨论未来的发展和挑战。
    Abstract Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent behaviors, including hallucination, unfaithful reasoning, and toxic content. A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. Techniques leveraging automated feedback -- either produced by the LLM itself or some external system -- are of particular interest as they are a promising way to make LLM-based solutions more practical and deployable with minimal human feedback. This paper presents a comprehensive review of this emerging class of techniques. We analyze and taxonomize a wide array of recent work utilizing these strategies, including training-time, generation-time, and post-hoc correction. We also summarize the major applications of this strategy and conclude by discussing future directions and challenges.
    摘要 大语言模型(LLMs)在众多 NLP 任务上展现了卓越性能,但其有效性也受到各种不良且不一致行为的削弱,包括幻觉、不忠实的推理以及有害内容。修正这些缺陷的一条有前景的路径是自我纠正,即提示或引导 LLM 自行修复其输出中的问题。其中,利用自动反馈(由 LLM 自身或外部系统产生)的技术尤其值得关注,因为它们有望在几乎不需要人工反馈的情况下,使基于 LLM 的解决方案更加实用、更易部署。本文对这一新兴技术类别进行了全面综述:我们对利用这些策略的大量近期工作进行了分析和分类,涵盖训练时、生成时与事后纠正;同时总结了该策略的主要应用,并在最后讨论了未来方向与挑战。

A Lightweight Method for Modeling Confidence in Recommendations with Learned Beta Distributions

  • paper_url: http://arxiv.org/abs/2308.03186
  • repo_url: https://github.com/nkny/confidencerecsys2023
  • paper_authors: Norman Knyazev, Harrie Oosterhuis
  • for: 提供一种简单实用的推荐方法,可以提供推荐结果的信息度量。
  • methods: 使用学习 beta 分布来预测用户喜好,该方法可以简单实现,同时可以提供明确的信息度量。
  • results: 对比 existed 方法,本方法可以保持竞争性的准确率,同时信息度量和准确率之间存在显著正相关性。此外,在高精度目标推荐任务中,本方法的表现更高。
    Abstract Most Recommender Systems (RecSys) do not provide an indication of confidence in their decisions. Therefore, they do not distinguish between recommendations of which they are certain, and those where they are not. Existing confidence methods for RecSys are either inaccurate heuristics, conceptually complex or computationally very expensive. Consequently, real-world RecSys applications rarely adopt these methods, and thus, provide no confidence insights in their behavior. In this work, we propose learned beta distributions (LBD) as a simple and practical recommendation method with an explicit measure of confidence. Our main insight is that beta distributions predict user preferences as probability distributions that naturally model confidence on a closed interval, yet can be implemented with the minimal model-complexity. Our results show that LBD maintains competitive accuracy to existing methods while also having a significantly stronger correlation between its accuracy and confidence. Furthermore, LBD has higher performance when applied to a high-precision targeted recommendation task. Our work thus shows that confidence in RecSys is possible without sacrificing simplicity or accuracy, and without introducing heavy computational complexity. Thereby, we hope it enables better insight into real-world RecSys and opens the door for novel future applications.
    摘要 大多数推荐系统(RecSys)不会给出其决策的置信度,因此无法区分哪些推荐是确定的、哪些是不确定的。现有的RecSys置信度方法要么是不准确的启发式方法,要么概念复杂,要么计算代价高昂,因此现实世界中的RecSys应用很少采用这些方法,也就无法提供置信度信息。在这项工作中,我们提出学习Beta分布(LBD)作为一种简单而实用、带有显式置信度度量的推荐方法。我们的核心观察是:Beta分布将用户偏好预测为概率分布,能在闭区间上自然地刻画置信度,且只需极小的模型复杂度即可实现。结果表明,LBD的精度与现有方法相当,同时其精度与置信度之间的相关性显著更强;在高精度目标推荐任务中,LBD表现更优。因此,我们的工作表明,在不牺牲简单性或准确性、也不引入沉重计算复杂度的情况下,RecSys中的置信度建模是可行的,这有助于更好地洞察实际RecSys,并为新的未来应用打开大门。
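As a rough illustration of the idea behind LBD (not the authors' implementation; the embedding sizes, network head, and loss below are assumptions), a recommender can predict the two parameters of a Beta distribution over a rating rescaled to [0, 1]; the distribution mean then serves as the point prediction and its concentration as an explicit confidence measure:

```python
import torch
import torch.nn as nn

class BetaRecommender(nn.Module):
    """Toy learned-Beta-distribution rating head (hypothetical sketch)."""
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, users, items):
        h = torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)
        # softplus keeps alpha and beta strictly positive
        alpha, beta = torch.nn.functional.softplus(self.head(h)).unbind(-1)
        return alpha + 1e-3, beta + 1e-3

def beta_nll(alpha, beta, rating01):
    # negative log-likelihood of ratings rescaled to (0, 1) under Beta(alpha, beta)
    dist = torch.distributions.Beta(alpha, beta)
    return -dist.log_prob(rating01.clamp(1e-3, 1 - 1e-3)).mean()

model = BetaRecommender(n_users=1000, n_items=500)
users = torch.randint(0, 1000, (64,))
items = torch.randint(0, 500, (64,))
ratings01 = torch.rand(64)                     # toy ratings rescaled to [0, 1)
alpha, beta = model(users, items)
loss = beta_nll(alpha, beta, ratings01)
prediction = alpha / (alpha + beta)            # point estimate
confidence = alpha + beta                      # higher concentration = more confident
```

In this toy setup accuracy and confidence come from the same head, trained jointly by maximizing the Beta log-likelihood of observed ratings.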

A Critical Review of Physics-Informed Machine Learning Applications in Subsurface Energy Systems

  • paper_url: http://arxiv.org/abs/2308.04457
  • repo_url: None
  • paper_authors: Abdeldjalil Latrach, Mohamed Lamine Malki, Misael Morales, Mohamed Mehana, Minou Rabiei
  • For: The paper is written for researchers and practitioners in the field of machine learning, particularly in the area of physics-informed machine learning (PIML), to provide a comprehensive review of its applications in subsurface energy systems, such as the oil and gas industry.
  • Methods: The paper uses a literature review to discuss the current state of PIML techniques and their applications in various fields, including seismic applications, reservoir simulation, hydrocarbons production forecasting, and intelligent decision-making in the exploration and production stages.
  • Results: The paper highlights the successful utilization of PIML for tasks related to subsurface energy systems, demonstrating its ability to provide more accurate and reliable predictions for resource management and operational efficiency. Additionally, it shows the potential of PIML to revolutionize the oil and gas industry and other emerging areas of interest, such as carbon and hydrogen storage, and geothermal systems.
    Abstract Machine learning has emerged as a powerful tool in various fields, including computer vision, natural language processing, and speech recognition. It can unravel hidden patterns within large data sets and reveal unparalleled insights, revolutionizing many industries and disciplines. However, machine and deep learning models lack interpretability and limited domain-specific knowledge, especially in applications such as physics and engineering. Alternatively, physics-informed machine learning (PIML) techniques integrate physics principles into data-driven models. By combining deep learning with domain knowledge, PIML improves the generalization of the model, abidance by the governing physical laws, and interpretability. This paper comprehensively reviews PIML applications related to subsurface energy systems, mainly in the oil and gas industry. The review highlights the successful utilization of PIML for tasks such as seismic applications, reservoir simulation, hydrocarbons production forecasting, and intelligent decision-making in the exploration and production stages. Additionally, it demonstrates PIML's capabilities to revolutionize the oil and gas industry and other emerging areas of interest, such as carbon and hydrogen storage; and geothermal systems by providing more accurate and reliable predictions for resource management and operational efficiency.
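To make the idea of physics-informed learning concrete, the following minimal sketch (a toy example of our own, not taken from the paper) fits a small network to sparse observations governed by a 1-D diffusion equation, a crude stand-in for a single-phase pressure-diffusion model, while penalizing the PDE residual at collocation points:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

def pde_residual(x, t, diffusivity=1.0):
    # residual of u_t - D * u_xx = 0, evaluated with autograd
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = net(torch.cat([x, t], dim=1))
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t - diffusivity * u_xx

# sparse synthetic "measurements" and PDE collocation points (placeholders)
x_obs, t_obs = torch.rand(50, 1), torch.rand(50, 1)
u_obs = torch.sin(torch.pi * x_obs) * torch.exp(-t_obs)
x_col, t_col = torch.rand(500, 1), torch.rand(500, 1)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    data_loss = ((net(torch.cat([x_obs, t_obs], 1)) - u_obs) ** 2).mean()
    phys_loss = (pde_residual(x_col, t_col) ** 2).mean()
    loss = data_loss + phys_loss      # the physics term regularizes the data fit
    loss.backward()
    opt.step()
```

The physics residual plays the role described above: it constrains the learned model to respect the governing law even where data are sparse.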

Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience

  • paper_url: http://arxiv.org/abs/2308.03175
  • repo_url: None
  • paper_authors: Rongguang Wang, Guray Erus, Pratik Chaudhari, Christos Davatzikos
  • for: 这篇论文目的是为了解决机器学习在医疗领域中的可重现性问题,特别是在医学中。
  • methods: 这篇论文使用了权重的机制实现零偏好的学习方法,将来自不同来源的数据结合来预测目标群的结果。
  • results: 这篇论文的结果显示,这种方法可以在多来源数据上建立更好的预测模型,并且可以在不同的扫描器、实验室和人口特征下进行可重现性的预测。
    Abstract Machine learning (ML) has shown great promise for revolutionizing a number of areas, including healthcare. However, it is also facing a reproducibility crisis, especially in medicine. ML models that are carefully constructed from and evaluated on a training set might not generalize well on data from different patient populations or acquisition instrument settings and protocols. We tackle this problem in the context of neuroimaging of Alzheimer's disease (AD), schizophrenia (SZ) and brain aging. We develop a weighted empirical risk minimization approach that optimally combines data from a source group, e.g., subjects are stratified by attributes such as sex, age group, race and clinical cohort to make predictions on a target group, e.g., other sex, age group, etc. using a small fraction (10%) of data from the target group. We apply this method to multi-source data of 15,363 individuals from 20 neuroimaging studies to build ML models for diagnosis of AD and SZ, and estimation of brain age. We found that this approach achieves substantially better accuracy than existing domain adaptation techniques: it obtains area under curve greater than 0.95 for AD classification, area under curve greater than 0.7 for SZ classification and mean absolute error less than 5 years for brain age prediction on all target groups, achieving robustness to variations of scanners, protocols, and demographic or clinical characteristics. In some cases, it is even better than training on all data from the target group, because it leverages the diversity and size of a larger training set. We also demonstrate the utility of our models for prognostic tasks such as predicting disease progression in individuals with mild cognitive impairment. Critically, our brain age prediction models lead to new clinical insights regarding correlations with neurophysiological tests.
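A minimal sketch of the general recipe, assuming a simple linear classifier and toy data (the paper's actual models, weighting scheme, and neuroimaging features are not reproduced here): pool the source groups with a small labelled target split, weight the per-sample losses by group, and choose the group weights by validating on the small target fraction:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_weighted_erm(X_src, y_src, groups_src, X_tgt, y_tgt, group_weights):
    """Pool source groups with a small labelled target split, weighting each
    source sample by its group weight (target samples get weight 1)."""
    w_src = np.array([group_weights[g] for g in groups_src], dtype=float)
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    w = np.concatenate([w_src, np.ones(len(y_tgt))])
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)

rng = np.random.default_rng(0)
X_src = rng.normal(size=(400, 20)); y_src = rng.integers(0, 2, 400)
groups_src = np.repeat([0, 1], 200)                  # e.g. two scanner/site/cohort groups
X_tgt = rng.normal(0.3, 1.0, size=(200, 20)); y_tgt = rng.integers(0, 2, 200)
tgt_fit, tgt_val = slice(0, 20), slice(20, 40)       # the small (~10%) labelled target fraction

best = None
for w0 in [0.25, 0.5, 1.0, 2.0]:                     # candidate weight for source group 0
    clf = fit_weighted_erm(X_src, y_src, groups_src,
                           X_tgt[tgt_fit], y_tgt[tgt_fit], {0: w0, 1: 1.0})
    acc = clf.score(X_tgt[tgt_val], y_tgt[tgt_val])
    if best is None or acc > best[0]:
        best = (acc, w0, clf)
```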

Two Sides of Miscalibration: Identifying Over and Under-Confidence Prediction for Network Calibration

  • paper_url: http://arxiv.org/abs/2308.03172
  • repo_url: https://github.com/aoshuang92/miscalibration_ts
  • paper_authors: Shuang Ao, Stefan Rueger, Advaith Siddharthan
  • for: 本研究旨在解决深度神经网络的准确预测问题中的可靠性问题,即训练数据集中的模型可靠性问题。
  • methods: 本研究提出了一种新的评估方法,即误差评估指标,以评估模型的整体和分类准确率。此外,本研究还提出了一种基于分类误差评估指标的calibration技术,可以解决模型过于自信和不够自信的问题。
  • results: 本研究的实验结果表明,提出的误差评估指标和calibration技术可以substantially outperform现有的calibration技术。此外,在一个自动失败检测任务中,我们的方法也提高了模型的失败检测和可靠性。
    Abstract Proper confidence calibration of deep neural networks is essential for reliable predictions in safety-critical tasks. Miscalibration can lead to model over-confidence and/or under-confidence; i.e., the model's confidence in its prediction can be greater or less than the model's accuracy. Recent studies have highlighted the over-confidence issue by introducing calibration techniques and demonstrated success on various tasks. However, miscalibration through under-confidence has not yet to receive much attention. In this paper, we address the necessity of paying attention to the under-confidence issue. We first introduce a novel metric, a miscalibration score, to identify the overall and class-wise calibration status, including being over or under-confident. Our proposed metric reveals the pitfalls of existing calibration techniques, where they often overly calibrate the model and worsen under-confident predictions. Then we utilize the class-wise miscalibration score as a proxy to design a calibration technique that can tackle both over and under-confidence. We report extensive experiments that show our proposed methods substantially outperforming existing calibration techniques. We also validate our proposed calibration technique on an automatic failure detection task with a risk-coverage curve, reporting that our methods improve failure detection as well as trustworthiness of the model. The code are available at \url{https://github.com/AoShuang92/miscalibration_TS}.
    摘要 深度神经网络的正确信度调整是关键 для可靠的预测结果。不当的信度调整可能导致模型过度自信或者不足自信,即模型对其预测结果的自信度高于或低于实际精度。当前的研究主要关注过度自信的问题,并已经在不同的任务上展现了成功。然而,下降自信的问题尚未得到充分的注意。在这篇论文中,我们强调了对下降自信的需要,并提出了一种新的评价指标——信度混乱分数,用于评估模型的总体和分类准确程度。我们的提议的评价指标 revelas了现有的准则化技术的缺陷,它们经常对模型进行过度准则化,从而使下降自信的预测结果更加差。然后,我们使用分类准确度下降的指标作为代理,设计了一种能够杜绝过度和下降自信的准则化技术。我们进行了广泛的实验,并证明了我们的提议方法在现有的准则化技术之上显著超越。此外,我们还验证了我们的提议准则化技术在自动故障检测任务中的可靠性和信任性,通过发布了一个风险覆盖曲线。代码可以在 \url{https://github.com/AoShuang92/miscalibration_TS} 中找到。
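As a simplified proxy for the kind of class-wise miscalibration score described above (the exact metric in the paper may differ), one can measure the signed gap between mean confidence and accuracy among the samples predicted as each class, so positive values flag over-confidence and negative values under-confidence:

```python
import numpy as np

def classwise_miscalibration(probs, labels, n_classes):
    """Signed gap between mean confidence and accuracy per predicted class:
    > 0 indicates over-confidence, < 0 under-confidence (a simplified proxy)."""
    preds = probs.argmax(1)
    conf = probs.max(1)
    scores = np.zeros(n_classes)
    for c in range(n_classes):
        mask = preds == c
        if mask.any():
            scores[c] = conf[mask].mean() - (labels[mask] == c).mean()
    return scores

# toy example with random logits and labels
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
probs = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)
labels = rng.integers(0, 10, 1000)
print(classwise_miscalibration(probs, labels, 10))
```

Such a per-class score can then drive a calibration step that softens over-confident classes and sharpens under-confident ones, which is the direction the paper argues for.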

Detection of Anomalies in Multivariate Time Series Using Ensemble Techniques

  • paper_url: http://arxiv.org/abs/2308.03171
  • repo_url: None
  • paper_authors: Anastasios Iliopoulos, John Violos, Christos Diou, Iraklis Varlamis
  • for: 这篇论文主要关注于多变量时间序列异常探测,以解决许多领域中的问题。
  • methods: 本论文提出了一种基于深度神经网络的方法,包括LSTM、自动Encoder和嵌入式自动Encoder等。这些方法在具有偏差数据的情况下表现出色。然而,当应用到多变量时间序列时,异常可能从一小subset的特征集中发生。为了提高这些基本模型的表现,我们提出了一种特征袋包技术,将特征集分成多个子集,并对每个子集进行适当的变数转换。
  • results: 本论文的实验结果显示,提出的组合技术可以对SKAB资料集进行异常探测,并且在不监控和半监控情况下都有着良好的表现。具体来说,这篇论文的数据显示,使用组合技术可以对SKAB资料集进行异常探测,并且在不监控情况下,异常探测精度提高了2%,而在半监控情况下,异常探测精度提高了10%以上。
    Abstract Anomaly Detection in multivariate time series is a major problem in many fields. Due to their nature, anomalies sparsely occur in real data, thus making the task of anomaly detection a challenging problem for classification algorithms to solve. Methods that are based on Deep Neural Networks such as LSTM, Autoencoders, Convolutional Autoencoders etc., have shown positive results in such imbalanced data. However, the major challenge that algorithms face when applied to multivariate time series is that the anomaly can arise from a small subset of the feature set. To boost the performance of these base models, we propose a feature-bagging technique that considers only a subset of features at a time, and we further apply a transformation that is based on nested rotation computed from Principal Component Analysis (PCA) to improve the effectiveness and generalization of the approach. To further enhance the prediction performance, we propose an ensemble technique that combines multiple base models toward the final decision. In addition, a semi-supervised approach using a Logistic Regressor to combine the base models' outputs is proposed. The proposed methodology is applied to the Skoltech Anomaly Benchmark (SKAB) dataset, which contains time series data related to the flow of water in a closed circuit, and the experimental results show that the proposed ensemble technique outperforms the basic algorithms. More specifically, the performance improvement in terms of anomaly detection accuracy reaches 2% for the unsupervised and at least 10% for the semi-supervised models.
    摘要 异常检测在多变量时间序列中是许多领域的主要问题。由于异常事件罕见,因此对分类算法来说是一个困难的问题。基于深度神经网络的方法,如LSTM、Autoencoder、Convolutional Autoencoder等,在这样的不均衡数据中显示出了正面的效果。然而,在多变量时间序列中,异常可能来自一小部分特征集。为了提高基本模型的性能,我们提议一种特征袋包技术,该技术只考虑特定的子集特征,并应用基于Principal Component Analysis(PCA)的嵌入式旋转变换来提高效果和泛化性。此外,我们还提议一种集成技术,将多个基本模型的输出集成到最终决策中。此外,我们还提出了一种半监督方法,使用Logistic Regressor将基本模型的输出集成到最终决策中。我们对Skoltech异常数据集(SKAB)进行了实验,该数据集包含关于水流在关闭环circuit中的时间序列数据,实验结果显示,我们的ensemble方法在异常检测精度方面与基本算法相比,提高了2%(不监督)和至少10%(半监督)。
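A minimal sketch of the feature-bagging idea, with an IsolationForest standing in for the paper's autoencoder-style base detectors and a plain PCA rotation in place of the nested rotation (both substitutions are simplifying assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

def feature_bagging_scores(X_train, X_test, n_models=10, subset_frac=0.6, seed=0):
    """Average anomaly scores over detectors trained on random feature subsets,
    each rotated by PCA before fitting the base detector."""
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    k = max(2, int(subset_frac * n_features))
    scores = np.zeros(len(X_test))
    for _ in range(n_models):
        idx = rng.choice(n_features, size=k, replace=False)
        pca = PCA(n_components=k).fit(X_train[:, idx])
        det = IsolationForest(random_state=int(rng.integers(1_000_000)))
        det.fit(pca.transform(X_train[:, idx]))
        # negate score_samples so that higher means more anomalous
        scores += -det.score_samples(pca.transform(X_test[:, idx]))
    return scores / n_models

rng = np.random.default_rng(1)
X_train = rng.normal(size=(2000, 8))                          # normal operating data
X_test = np.vstack([rng.normal(size=(200, 8)), rng.normal(3, 1, size=(20, 8))])
print(feature_bagging_scores(X_train, X_test)[-5:])           # injected anomalies score higher
```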

FireFly: A Synthetic Dataset for Ember Detection in Wildfire

  • paper_url: http://arxiv.org/abs/2308.03164
  • repo_url: https://github.com/ergowho/firefly2.0
  • paper_authors: Yue Hu, Xinan Ye, Yifei Liu, Souvik Kundu, Gourav Datta, Srikar Mutnuri, Namo Asavisanu, Nora Ayanian, Konstantinos Psounis, Peter Beerel
  • for: 本研究构建了一个名为FireFly的合成数据集,用于野火中飞火余烬(ember)的检测。
  • methods: 本研究使用Unreal Engine 4(UE4)自动生成带标注的合成数据集,生成工具支持可调参数以覆盖多种环境条件;并利用训练好的模型对真实余烬帧进行半自动标注。
  • results: 该数据集共包含19,273帧,并在四个流行的目标检测模型上进行了评估;在真实野火场景中,相较于仅用少量真实数据训练的模型,平均精度(mAP)最高提升8.57%。
    Abstract This paper presents "FireFly", a synthetic dataset for ember detection created using Unreal Engine 4 (UE4), designed to overcome the current lack of ember-specific training resources. To create the dataset, we present a tool that allows the automated generation of the synthetic labeled dataset with adjustable parameters, enabling data diversity from various environmental conditions, making the dataset both diverse and customizable based on user requirements. We generated a total of 19,273 frames that have been used to evaluate FireFly on four popular object detection models. Further to minimize human intervention, we leveraged a trained model to create a semi-automatic labeling process for real-life ember frames. Moreover, we demonstrated an up to 8.57% improvement in mean Average Precision (mAP) in real-world wildfire scenarios compared to models trained exclusively on a small real dataset.
    摘要 这份论文提出了“火萝虫”,一个使用Unreal Engine 4(UE4)创建的人工数据集,用于缺乏ember特有训练资源的缺陷。为创建这个数据集,我们提供了一个自动生成Synthetic标注数据集的工具,可以根据用户需求进行自定义,以实现数据集的多样性和自定义。我们总共生成了19273帧,用于评估FireFly在四种流行的物体检测模型上。此外,我们还利用一个已经训练的模型来创建一种半自动的标注过程,以便为真实的萝虫框架进行标注。此外,我们还证明了在实际野外爆发火情况下,FireFly比只在小型实际数据集上训练的模型提高了8.57%的平均准确率。

eess.IV - 2023-08-07

SoilNet: An Attention-based Spatio-temporal Deep Learning Framework for Soil Organic Carbon Prediction with Digital Soil Mapping in Europe

  • paper_url: http://arxiv.org/abs/2308.03586
  • repo_url: None
  • paper_authors: Nafiseh Kakhani, Moien Rangzan, Ali Jamali, Sara Attarchi, Seyed Kazem Alavipanah, Thomas Scholten
  • for: 这个研究旨在精确地描述土壤属性的空间分布,并且运用深度学习技术来预测土壤碳数量。
  • methods: 本研究提出了一种新的架构,将卷积神经网络(CNN)模型与空间注意力机制相结合,并利用长短期记忆(LSTM)网络建模气候时间序列信息,用于预测欧洲范围内的土壤有机碳含量。
  • results: 研究结果显示,提案的架构比普遍使用的机器学习方法(如随机森林)精度高,具有较低的根平方差误差(RMSE)。这个模型可以作为预测土壤碳数量的robust工具,并且可以应用到其他土壤特性的预测中。
    Abstract Digital soil mapping (DSM) is an advanced approach that integrates statistical modeling and cutting-edge technologies, including machine learning (ML) methods, to accurately depict soil properties and their spatial distribution. Soil organic carbon (SOC) is a crucial soil attribute providing valuable insights into soil health, nutrient cycling, greenhouse gas emissions, and overall ecosystem productivity. This study highlights the significance of spatial-temporal deep learning (DL) techniques within the DSM framework. A novel architecture is proposed, incorporating spatial information using a base convolutional neural network (CNN) model and spatial attention mechanism, along with climate temporal information using a long short-term memory (LSTM) network, for SOC prediction across Europe. The model utilizes a comprehensive set of environmental features, including Landsat-8 images, topography, remote sensing indices, and climate time series, as input features. Results demonstrate that the proposed framework outperforms conventional ML approaches like random forest commonly used in DSM, yielding lower root mean square error (RMSE). This model is a robust tool for predicting SOC and could be applied to other soil properties, thereby contributing to the advancement of DSM techniques and facilitating land management and decision-making processes based on accurate information.
    摘要 数字土壤制图(DSM)是一种先进的方法,通过统计建模和包括机器学习(ML)在内的前沿技术,准确刻画土壤属性及其空间分布。土壤有机碳(SOC)是一项重要的土壤属性,可为土壤健康、养分循环、温室气体排放以及生态系统生产力提供有价值的信息。本研究强调了在DSM框架中使用时空深度学习(DL)技术的重要性,并提出了一种新的架构:利用基础卷积神经网络(CNN)模型和空间注意力机制提取空间信息,并利用长短期记忆(LSTM)网络建模气候时间序列信息,用于预测欧洲范围内的SOC。该模型以Landsat-8影像、地形、遥感指数和气候时间序列等环境特征作为输入。结果表明,所提框架优于DSM中常用的传统ML方法(如随机森林),均方根误差(RMSE)更低。该模型是预测SOC的稳健工具,也可推广到其他土壤属性,从而推动DSM技术的发展,并为基于准确信息的土地管理和决策提供支持。
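A toy PyTorch sketch of a CNN + spatial-attention + LSTM regressor in the spirit of the described architecture; the input channels, layer sizes, and climate sequence length below are illustrative assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn

class SoilCNNLSTM(nn.Module):
    """Toy CNN + spatial-attention + LSTM regressor (hypothetical sketch)."""
    def __init__(self, in_ch=6, clim_dim=4, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.attn = nn.Conv2d(64, 1, 1)                 # produces a spatial attention map
        self.lstm = nn.LSTM(clim_dim, hidden, batch_first=True)
        self.head = nn.Linear(64 + hidden, 1)

    def forward(self, patch, climate_seq):
        f = self.cnn(patch)                                    # (B, 64, H, W)
        a = torch.softmax(self.attn(f).flatten(2), dim=-1)     # (B, 1, H*W)
        spatial = (f.flatten(2) * a).sum(-1)                   # attention-pooled features (B, 64)
        _, (h, _) = self.lstm(climate_seq)                     # climate time series
        return self.head(torch.cat([spatial, h[-1]], dim=1)).squeeze(-1)

model = SoilCNNLSTM()
patch = torch.randn(8, 6, 32, 32)     # e.g. Landsat bands + indices + terrain channels
climate = torch.randn(8, 24, 4)       # e.g. 24 monthly climate variables
soc_pred = model(patch, climate)      # (8,) predicted SOC values
```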

Quantitative MR Image Reconstruction using Parameter-Specific Dictionary Learning with Adaptive Dictionary-Size and Sparsity-Level Choice

  • paper_url: http://arxiv.org/abs/2308.03460
  • repo_url: None
  • paper_authors: Andreas Kofler, Kirsten Miriam Kerkering, Laura Göschel, Ariane Fillmer, Cristoph Kolbitsch
  • for: 提出一种方法用于量化磁共振成像(QMRI)中的参数地图重建。
  • methods: 使用字典学习(DL)和稀疏编码(SC)算法自动计算参数地图的最佳字典大小和稀疏程度,并对每个参数地图进行自适应调整。
  • results: 相比MAP方法和其他基于稀疏性的方法(TV、Wl、Sh),提出的方法在PSNR和RMSE上表现更好,同时能够加速重建过程约七倍。
    Abstract Objective: We propose a method for the reconstruction of parameter-maps in Quantitative Magnetic Resonance Imaging (QMRI). Methods: Because different quantitative parameter-maps differ from each other in terms of local features, we propose a method where the employed dictionary learning (DL) and sparse coding (SC) algorithms automatically estimate the optimal dictionary-size and sparsity level separately for each parameter-map. We evaluated the method on a $T_1$-mapping QMRI problem in the brain using the BrainWeb data as well as in-vivo brain images acquired on an ultra-high field 7T scanner. We compared it to a model-based acceleration for parameter mapping (MAP) approach, other sparsity-based methods using total variation (TV), Wavelets (Wl) and Shearlets (Sh), and to a method which uses DL and SC to reconstruct qualitative images, followed by a non-linear (DL+Fit). Results: Our algorithm surpasses MAP, TV, Wl and Sh in terms of RMSE and PSNR. It yields better or comparable results to DL+Fit by additionally significantly accelerating the reconstruction by a factor of approximately seven. Conclusion: The proposed method outperforms the reported methods of comparison and yields accurate $T_1$-maps. Although presented for $T_1$-mapping in the brain, our method's structure is general and thus most probably also applicable for the the reconstruction of other quantitative parameters in other organs. Significance: From a clinical perspective, the obtained $T_1$-maps could be utilized to differentiate between healthy subjects and patients with Alzheimer's disease. From a technical perspective, the proposed unsupervised method could be employed to obtain ground-truth data for the development of data-driven methods based on supervised learning.+
    摘要 Methods: 因为不同的量化参数地图之间的本地特征不同,我们提议使用自适应词库学习(DL)和稀疏编码(SC)算法自动计算参数地图的优化词库大小和稀疏性水平。我们对BrainWeb数据集和7T磁共振成像机上实验取得的生物体内部图像进行评估。我们与参数映射(MAP)方法、总变量(TV)、波лет(Wl)和扭变(Sh)等其他稀疏方法进行比较,以及使用DL和SC重建质量图像,然后使用非线性(DL+Fit)方法。Results: 我们的算法在RMSE和PSNR方面都高于MAP、TV、Wl和Sh,并且与DL+Fit相比,同时提供了约七倍的加速。Conclusion: 我们提出的方法在$T_1$-mapping问题上表现出色,并且可以在脑部其他参数的重建中使用。尽管我们只对脑部的$T_1$-mapping进行了评估,但我们的方法结构是通用的,因此可能适用于其他器官的量化参数重建。Significance: 从临床角度来看,获得的$T_1$-地图可能用于识别健康人群和患有阿尔茨海默病的患者。从技术角度来看,我们提出的无监督方法可以用于获得数据驱动学习方法的基准数据。
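The core dictionary-learning and sparse-coding step can be sketched with scikit-learn as below; here the dictionary size and sparsity level are fixed by hand, whereas the paper selects them adaptively per parameter map, and the toy array stands in for a real T1 map:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

def dictionary_denoise(param_map, n_atoms=64, n_nonzero=3, patch=8):
    """Patch-wise dictionary learning + OMP sparse coding of one parameter map.
    n_atoms and n_nonzero play the role of the dictionary size and sparsity level
    that the paper chooses adaptively (fixed here for simplicity)."""
    patches = extract_patches_2d(param_map, (patch, patch))
    shape = patches.shape
    X = patches.reshape(shape[0], -1)
    mean = X.mean(axis=1, keepdims=True)
    dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0,
                                       transform_algorithm='omp',
                                       transform_n_nonzero_coefs=n_nonzero,
                                       random_state=0)
    codes = dico.fit_transform(X - mean)
    recon = codes @ dico.components_ + mean
    return reconstruct_from_patches_2d(recon.reshape(shape), param_map.shape)

rng = np.random.default_rng(0)
t1_map = np.clip(rng.normal(1.0, 0.2, size=(64, 64)), 0, None)   # noisy toy T1 map (s)
t1_denoised = dictionary_denoise(t1_map)
```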

Lighting Every Darkness in Two Pairs: A Calibration-Free Pipeline for RAW Denoising

  • paper_url: http://arxiv.org/abs/2308.03448
  • repo_url: https://github.com/srameo/led
  • paper_authors: Xin Jin, Jia-Wen Xiao, Ling-Hao Han, Chunle Guo, Ruixun Zhang, Xialei Liu, Chongyi Li
  • for: 提高低光环境下 RAW 图像减噪性能,不需要干扰式减噪方法。
  • methods: 基于减噪策略,对目标摄像机进行几次匹配数据和微调,以适应不同摄像机和数字增量。同时,通过特殊的结构修改,解决synthetic noise和实际噪声之间的域距离问题。
  • results: 与其他干扰式减噪方法相比,本方法在6对增量数据和0.5%迭代后,在低光环境下达到了更高的性能。
    Abstract Calibration-based methods have dominated RAW image denoising under extremely low-light environments. However, these methods suffer from several main deficiencies: 1) the calibration procedure is laborious and time-consuming, 2) denoisers for different cameras are difficult to transfer, and 3) the discrepancy between synthetic noise and real noise is enlarged by high digital gain. To overcome the above shortcomings, we propose a calibration-free pipeline for Lighting Every Drakness (LED), regardless of the digital gain or camera sensor. Instead of calibrating the noise parameters and training repeatedly, our method could adapt to a target camera only with few-shot paired data and fine-tuning. In addition, well-designed structural modification during both stages alleviates the domain gap between synthetic and real noise without any extra computational cost. With 2 pairs for each additional digital gain (in total 6 pairs) and 0.5% iterations, our method achieves superior performance over other calibration-based methods. Our code is available at https://github.com/Srameo/LED .
    摘要 准确性基于方法在极低照度环境中对RAW图像净化得到了广泛应用。然而,这些方法受到以下主要缺点的影响:1)准备和实施准化过程是时间consuming和劳动密集的;2)描述器对不同的相机难以传递;3)高度数字增强会使 synthetic 噪声与实际噪声之间的差距变大。为了缓解以上缺点,我们提出了不需要准化的激光照明每个点(LED)管道,无论相机感知器或数字增强。而不是在每次训练中重复准化噪声参数,我们的方法可以适应目标相机只需要几个数据对和微调。此外,我们在两个阶段中设计了结构修改,以避免噪声生成器和实际噪声之间的领域差距,无需额外计算成本。使用2对每个附加的数字增强(总共6对)和0.5%迭代,我们的方法可以在其他准化基于方法上达到更高的性能。我们的代码可以在https://github.com/Srameo/LED 上找到。

Energy-Guided Diffusion Model for CBCT-to-CT Synthesis

  • paper_url: http://arxiv.org/abs/2308.03354
  • repo_url: None
  • paper_authors: Linjie Fu, Xia Li, Xiuding Cai, Dong Miao, Yu Yao, Yali Shen
  • for: 提高CBCT图像质量和Hounsfield单位精度,以便更好地用于放射治疗
  • methods: 基于能量导向扩散模型(EGDiff),从CBCT图像生成synthetic CT(sCT)
  • results: 对胸腔肿瘤数据集进行实验,得到了具有较高精度和视觉质量的sCT图像,并且超过了现有无监督合成方法的性能。
    Abstract Cone Beam CT (CBCT) plays a crucial role in Adaptive Radiation Therapy (ART) by accurately providing radiation treatment when organ anatomy changes occur. However, CBCT images suffer from scatter noise and artifacts, making relying solely on CBCT for precise dose calculation and accurate tissue localization challenging. Therefore, there is a need to improve CBCT image quality and Hounsfield Unit (HU) accuracy while preserving anatomical structures. To enhance the role and application value of CBCT in ART, we propose an energy-guided diffusion model (EGDiff) and conduct experiments on a chest tumor dataset to generate synthetic CT (sCT) from CBCT. The experimental results demonstrate impressive performance with an average absolute error of 26.87$\pm$6.14 HU, a structural similarity index measurement of 0.850$\pm$0.03, a peak signal-to-noise ratio of the sCT of 19.83$\pm$1.39 dB, and a normalized cross-correlation of the sCT of 0.874$\pm$0.04. These results indicate that our method outperforms state-of-the-art unsupervised synthesis methods in accuracy and visual quality, producing superior sCT images.
    摘要 锥形束CT(CBCT)在自适应放射治疗(ART)中发挥关键作用,可在器官解剖结构发生变化时准确指导放射治疗。然而,CBCT图像存在散射噪声和伪影,仅凭CBCT难以实现精确的剂量计算和组织定位。因此,需要在保留解剖结构的同时提高CBCT图像质量和Hounsfield单位(HU)精度。为提升CBCT在ART中的作用和应用价值,我们提出一种能量引导扩散模型(EGDiff),并在胸部肿瘤数据集上进行实验,由CBCT生成合成CT(sCT)。实验结果表明,该方法平均绝对误差为26.87$\pm$6.14 HU,结构相似性指数为0.850$\pm$0.03,sCT峰值信噪比为19.83$\pm$1.39 dB,归一化互相关为0.874$\pm$0.04,在精度和视觉质量上均优于现有的无监督合成方法,生成的sCT图像更为出色。
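The reported figures of merit (MAE in HU, SSIM, PSNR, and normalized cross-correlation) are straightforward to compute for any sCT/CT pair; the sketch below uses scikit-image and NumPy on placeholder arrays:

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate_sct(sct, ct):
    """MAE (HU), SSIM, PSNR and normalized cross-correlation between a synthetic CT
    and the reference CT slice."""
    data_range = ct.max() - ct.min()
    mae = np.abs(sct - ct).mean()
    ssim = structural_similarity(ct, sct, data_range=data_range)
    psnr = peak_signal_noise_ratio(ct, sct, data_range=data_range)
    ncc = np.corrcoef(ct.ravel(), sct.ravel())[0, 1]
    return mae, ssim, psnr, ncc

rng = np.random.default_rng(0)
ct = rng.uniform(-1000, 1000, size=(128, 128))      # placeholder HU slice
sct = ct + rng.normal(0, 30, size=ct.shape)          # placeholder synthetic CT
print(evaluate_sct(sct, ct))
```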

A Hybrid CNN-Transformer Architecture with Frequency Domain Contrastive Learning for Image Deraining

  • paper_url: http://arxiv.org/abs/2308.03340
  • repo_url: None
  • paper_authors: Cheng Wang, Wei Li
  • for: 图像恢复(image restoration),即修复受损图像中的雨线 Streaks 的问题。
  • methods: 该论文使用了 Deep Learning 技术,特别是 Convolutional Neural Networks (CNNs) 和 Generative Adversarial Networks (GANs),以实现图像恢复。
  • results: 该论文实现了高效的图像恢复,可以减少或完全消除雨线 Streaks,并保持图像的原始细节和颜色彩虹。
    Abstract Image deraining is a challenging task that involves restoring degraded images affected by rain streaks.
    摘要 图像去雨是一项具有挑战性的任务,旨在修复被雨条纹破坏的退化图像。

cs.SD - 2023-08-06

SoK: Acoustic Side Channels

  • paper_url: http://arxiv.org/abs/2308.03806
  • repo_url: None
  • paper_authors: Ping Wang, Shishir Nagaraja, Aurélien Bourquard, Haichang Gao, Jeff Yan
  • for: 本研究准确地分析了音频侧通道,涵盖了全部主要的学术研究领域,讨论了它们的安全意义和防范措施,并确定了未来研究的方向。
  • methods: 本研究使用了多种方法,包括侧通道分析、反向问题分析和机器学习等,以探讨音频侧通道的安全性和可靠性。
  • results: 本研究得到了许多有价值的结果,包括发现了一些新的侧通道攻击和防范策略,以及确定了音频侧通道和反向问题之间的深刻联系。
    Abstract We provide a state-of-the-art analysis of acoustic side channels, cover all the significant academic research in the area, discuss their security implications and countermeasures, and identify areas for future research. We also make an attempt to bridge side channels and inverse problems, two fields that appear to be completely isolated from each other but have deep connections.
    摘要 我们提供了最新的分析方法,涵盖了全部重要的学术研究领域,讨论了他们的安全影响和防范措施,并确定了未来研究的方向。我们还尝试将侧频渠道和反问题两个领域联系起来,这两个领域之前被视为完全不相关的。

Characterization of cough sounds using statistical analysis

  • paper_url: http://arxiv.org/abs/2308.03019
  • repo_url: None
  • paper_authors: Naveenkumar Vodnala, Pratap Reddy Lankireddy, Padmasai Yarlagadda
  • For: The paper aims to characterize cough sounds with voiced content and cough sounds without voiced content, and to compare cough sound characteristics with those of speech signals.
  • Methods: The proposed method uses spectral roll-off, spectral entropy, spectral flatness, spectral flux, zero crossing rate, spectral centroid, and spectral bandwidth to describe cough sounds in terms of the respiratory system, glottal information, and the voice model; these attributes are then analysed statistically using the minimum, maximum, mean, median, and standard deviation.
  • Results: The mean and frequency distribution of spectral roll-off, spectral centroid, and spectral bandwidth are higher for cough sounds than for speech signals; spectral flatness in cough sounds rises to about 0.22, spectral flux varies between 0.3 and 0.6, and the zero crossing rate of most cough frames lies between 0.05 and 0.4. These attributes contribute significant information when characterizing cough sounds.
    Abstract Cough is a primary symptom of most respiratory diseases, and changes in cough characteristics provide valuable information for diagnosing respiratory diseases. The characterization of cough sounds still lacks concrete evidence, which makes it difficult to accurately distinguish between different types of coughs and other sounds. The objective of this research work is to characterize cough sounds with voiced content and cough sounds without voiced content. Further, the cough sound characteristics are compared with the characteristics of speech. The proposed method to achieve this goal utilized spectral roll-off, spectral entropy, spectral flatness, spectral flux, zero crossing rate, spectral centroid, and spectral bandwidth attributes which describe the cough sounds related to the respiratory system, glottal information, and voice model. These attributes are then subjected to statistical analysis using the measures of minimum, maximum, mean, median, and standard deviation. The experimental results show that the mean and frequency distribution of spectral roll-off, spectral centroid, and spectral bandwidth are found to be higher for cough sounds than for speech signals. Spectral flatness levels in cough sounds will rise to 0.22, whereas spectral flux varies between 0.3 and 0.6. The Zero Crossing Rate (ZCR) of most frames of cough sounds is between 0.05 and 0.4. These attributes contribute significant information while characterizing cough sounds.
    摘要 咳嗽是大多数呼吸系统疾病的主要症状之一,咳嗽声特征的变化可为呼吸疾病的诊断提供有价值的信息。然而,目前对咳嗽声的特征化仍缺乏确凿依据,难以准确区分不同类型的咳嗽及其他声音。本研究的目的是对含浊音成分与不含浊音成分的咳嗽声进行特征化,并将咳嗽声特征与语音特征进行比较。所提方法利用谱滚降、谱熵、谱平坦度、谱通量、过零率、谱质心和谱带宽等属性来描述与呼吸系统、声门信息和嗓音模型相关的咳嗽声,随后用最小值、最大值、均值、中位数和标准差进行统计分析。实验结果表明,咳嗽声的谱滚降、谱质心和谱带宽的均值及频率分布高于语音信号;咳嗽声的谱平坦度可升至0.22,谱通量在0.3至0.6之间变化,大多数咳嗽帧的过零率介于0.05与0.4之间。这些属性在咳嗽声特征化中提供了重要信息。
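The listed attributes can be extracted per frame with librosa plus a little NumPy (spectral flux and spectral entropy are computed directly from the magnitude spectrogram since librosa has no built-in for them); the file name below is a placeholder, and this is only a sketch of the kind of feature pipeline the paper describes:

```python
import numpy as np
import librosa

def cough_features(path):
    """Frame-level spectral attributes of a cough recording plus the summary
    statistics used above (min, max, mean, median, std)."""
    y, sr = librosa.load(path, sr=None)
    S = np.abs(librosa.stft(y))
    feats = {
        "rolloff": librosa.feature.spectral_rolloff(y=y, sr=sr)[0],
        "centroid": librosa.feature.spectral_centroid(y=y, sr=sr)[0],
        "bandwidth": librosa.feature.spectral_bandwidth(y=y, sr=sr)[0],
        "flatness": librosa.feature.spectral_flatness(y=y)[0],
        "zcr": librosa.feature.zero_crossing_rate(y)[0],
    }
    P = S / (S.sum(axis=0, keepdims=True) + 1e-12)                # per-frame spectrum as a pmf
    feats["entropy"] = -(P * np.log2(P + 1e-12)).sum(axis=0)      # spectral entropy
    feats["flux"] = np.sqrt((np.diff(S, axis=1) ** 2).sum(axis=0))  # spectral flux
    return {k: dict(min=v.min(), max=v.max(), mean=v.mean(),
                    median=np.median(v), std=v.std()) for k, v in feats.items()}

# stats = cough_features("cough.wav")   # "cough.wav" is a placeholder path
```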

DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

  • paper_url: http://arxiv.org/abs/2308.02915
  • repo_url: None
  • paper_authors: Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan
  • for: 这 paper 的目的是生成真实的舞蹈序列,以便与输入的音乐进行有效的Alignment。
  • methods: 这 paper 使用了一种新的层次动态扩散模型,称为DiffDance,来生成高分辨率、长形 dance sequence。该模型包括一个音乐到舞蹈扩散模型和一个序列超分辨率扩散模型。为了将音乐和动作空间联系起来,DiffDance 使用了一个预训练的音频表示学习模型来提取音乐嵌入,并通过对比损失对其嵌入空间进行对齐。
  • results: 通过对 AIST++ benchmark 数据集进行了广泛的实验,DiffDance 能够生成真实的舞蹈序列,并与输入音乐进行有效的Alignment。这些结果与现有的排序法相当。
    Abstract When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation. This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model. To bridge the gap between music and motion for conditional generation, DiffDance employs a pretrained audio representation learning model to extract music embeddings and further align its embedding space to motion via contrastive loss. During training our cascaded diffusion model, we also incorporate multiple geometric losses to constrain the model outputs to be physically plausible and add a dynamic loss weight that adaptively changes over diffusion timesteps to facilitate sample diversity. Through comprehensive experiments performed on the benchmark dataset AIST++, we demonstrate that DiffDance is capable of generating realistic dance sequences that align effectively with the input music. These results are comparable to those achieved by state-of-the-art autoregressive methods.

Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

  • paper_url: http://arxiv.org/abs/2308.04455
  • repo_url: https://github.com/deep-privacy/SA-toolkit
  • paper_authors: Pierre Champion
  • for: 本论文旨在解决语音数据隐私问题,提出了一种语音匿名化方法以保护用户的隐私。
  • methods: 本论文使用的方法包括语音转换、量化变换等,以减少语音数据中的 speaker PPI。
  • results: 本论文的研究结果显示,量化变换可以减少 speaker PPI 而不影响语音信号的用处。同时,本论文也提出了一种新的攻击方法来逆转匿名化。
    Abstract The growing use of voice user interfaces has led to a surge in the collection and storage of speech data. While data collection allows for the development of efficient tools powering most speech services, it also poses serious privacy issues for users as centralized storage makes private personal speech data vulnerable to cyber threats. With the increasing use of voice-based digital assistants like Amazon's Alexa, Google's Home, and Apple's Siri, and with the increasing ease with which personal speech data can be collected, the risk of malicious use of voice-cloning and speaker/gender/pathological/etc. recognition has increased. This thesis proposes solutions for anonymizing speech and evaluating the degree of the anonymization. In this work, anonymization refers to making personal speech data unlinkable to an identity while maintaining the usefulness (utility) of the speech signal (e.g., access to linguistic content). We start by identifying several challenges that evaluation protocols need to consider to evaluate the degree of privacy protection properly. We clarify how anonymization systems must be configured for evaluation purposes and highlight that many practical deployment configurations do not permit privacy evaluation. Furthermore, we study and examine the most common voice conversion-based anonymization system and identify its weak points before suggesting new methods to overcome some limitations. We isolate all components of the anonymization system to evaluate the degree of speaker PPI associated with each of them. Then, we propose several transformation methods for each component to reduce as much as possible speaker PPI while maintaining utility. We promote anonymization algorithms based on quantization-based transformation as an alternative to the most-used and well-known noise-based approach. Finally, we endeavor a new attack method to invert anonymization.
    摘要 voice用户界面的使用量在增长,导致了对话数据的收集和存储。这种数据收集可以为语音服务的开发提供效率的工具,但也会对用户造成严重的隐私问题,因为中央存储的私人个人对话数据容易受到网络攻击。随着语音基于的数字助手like Amazon的Alexa、Google的Home和Apple的Siri的使用的增加,以及对个人对话数据的收集变得更加容易,隐私抹革和 speaker/性别/疾病等识别的风险也在增加。本论文提出了对话数据的匿名化和评估其匿名化度的解决方案。在这个过程中,匿名化指的是让个人对话数据与身份分离开来,保持语音信号的有用性(如访问语言内容)。我们开始于评估协议中需要考虑的挑战,并且明确匿名化系统的配置方式,并指出了许多实际部署配置不允许隐私评估。此外,我们研究了最常用的语音转换基于匿名化系统,并发现其弱点,然后建议新的方法来解决一些限制。我们分解了匿名化系统的每个组件,并评估它们中 speaker PPI 的度量。然后,我们提议了一些转换方法,以减少 speaker PPI 的度量,同时保持Utility。我们推荐使用量化变换的匿名化算法,而不是最常用的噪音基本方法。最后,我们提出了一种新的攻击方法,用于逆转匿名化。
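As a simplified stand-in for the quantization-based transformations advocated in the thesis (the actual features and codebook design are not reproduced here), speaker-related vectors can be collapsed onto a small k-means codebook so that fine-grained speaker information is discarded while the coarse, content-bearing structure is kept:

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_features(train_feats, feats, n_codes=48):
    """Replace continuous speaker-related vectors (e.g. frame-level embedding or
    pitch features) by their nearest codebook centroid; the coarse codebook acts
    as a quantization-based anonymizing transformation (simplified sketch)."""
    codebook = KMeans(n_clusters=n_codes, n_init=10, random_state=0).fit(train_feats)
    return codebook.cluster_centers_[codebook.predict(feats)]

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(5000, 16))     # placeholder pool of feature vectors
utt_feats = rng.normal(size=(300, 16))        # one utterance's features
anon_feats = quantize_features(train_feats, utt_feats)
```

The fewer the codebook entries, the less residual speaker PPI each vector can carry, at the cost of some utility, which is exactly the trade-off the evaluation protocols above are meant to measure.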

cs.CV - 2023-08-06

Nest-DGIL: Nesterov-optimized Deep Geometric Incremental Learning for CS Image Reconstruction

  • paper_url: http://arxiv.org/abs/2308.03807
  • repo_url: https://github.com/fanxiaohong/Nest-DGIL
  • paper_authors: Xiaohong Fan, Yin Yang, Ke Chen, Yujie Feng, Jianping Zhang
  • for: This paper proposes a deep geometric incremental learning framework for image reconstruction that effectively alleviates artifacts and guarantees the reconstruction of geometric texture details.
  • methods: The proposed method combines second Nesterov proximal gradient optimization with a cascade geometric incremental learning module, inspired by the overlap-tile strategy, to compensate for missing texture information from different geometric spectral decomposition domains. All parameters in the model are learnable, and an adaptive initialization technique for the physical parameters keeps the model flexible and ensures smooth convergence.
  • results: The proposed method demonstrates superior reconstruction performance compared with existing state-of-the-art methods, and avoids the risk of intermediate reconstruction results falling outside the geometric decomposition domains.
    Abstract Proximal gradient-based optimization is one of the most common strategies for solving image inverse problems as well as easy to implement. However, these techniques often generate heavy artifacts in image reconstruction. One of the most popular refinement methods is to fine-tune the regularization parameter to alleviate such artifacts, but it may not always be sufficient or applicable due to increased computational costs. In this work, we propose a deep geometric incremental learning framework based on second Nesterov proximal gradient optimization. The proposed end-to-end network not only has the powerful learning ability for high/low frequency image features,but also can theoretically guarantee that geometric texture details will be reconstructed from preliminary linear reconstruction.Furthermore, it can avoid the risk of intermediate reconstruction results falling outside the geometric decomposition domains and achieve fast convergence. Our reconstruction framework is decomposed into four modules including general linear reconstruction, cascade geometric incremental restoration, Nesterov acceleration and post-processing. In the image restoration step,a cascade geometric incremental learning module is designed to compensate for the missing texture information from different geometric spectral decomposition domains. Inspired by overlap-tile strategy, we also develop a post-processing module to remove the block-effect in patch-wise-based natural image reconstruction. All parameters in the proposed model are learnable,an adaptive initialization technique of physical-parameters is also employed to make model flexibility and ensure converging smoothly. We compare the reconstruction performance of the proposed method with existing state-of-the-art methods to demonstrate its superiority. Our source codes are available at https://github.com/fanxiaohong/Nest-DGIL.
    摘要 基于近端梯度的优化是求解图像逆问题最常用且易于实现的策略之一,但往往在重建图像中产生明显伪影;通过微调正则化参数来缓解伪影会增加计算成本,且并不总是可行。为此,我们提出一种基于第二类Nesterov近端梯度优化的深度几何增量学习框架。该端到端网络不仅对高/低频图像特征具有强大的学习能力,还能在理论上保证从初步线性重建中恢复几何纹理细节,同时避免中间重建结果落在几何分解域之外,并实现快速收敛。重建框架分为四个模块:通用线性重建、级联几何增量恢复、Nesterov加速和后处理。在图像恢复步骤中,级联几何增量学习模块用于补偿来自不同几何谱分解域的缺失纹理信息;受overlap-tile策略启发,后处理模块用于消除基于图像块的自然图像重建中的块效应。模型中所有参数均可学习,并采用物理参数的自适应初始化以保证模型灵活性和平稳收敛。与现有最优方法的对比表明了所提方法的优越性。源代码见 https://github.com/fanxiaohong/Nest-DGIL 。
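The second Nesterov proximal gradient step that the framework unrolls builds on the classical FISTA iteration; a minimal NumPy version for the generic problem min_x 0.5||Ax - y||^2 + lam * ||x||_1 is sketched below (the measurement operator, sparsifying transform, and learned modules of Nest-DGIL are not modelled):

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista(A, y, lam=0.1, n_iter=200):
    """Nesterov-accelerated proximal gradient (FISTA) for an l1-regularised least-squares problem."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1]); z = x.copy(); t = 1.0
    for _ in range(n_iter):
        x_new = soft_threshold(z - A.T @ (A @ z - y) / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)   # Nesterov momentum step
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(128, 256)) / np.sqrt(128)      # toy compressive measurement operator
x_true = np.zeros(256); x_true[rng.choice(256, 10, replace=False)] = rng.normal(size=10)
y = A @ x_true + 0.01 * rng.normal(size=128)
x_hat = fista(A, y)
```

Unfolded networks of the kind described above replace parts of this iteration (step sizes, thresholds, or whole proximal operators) with learned modules while keeping the same overall structure.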

PNN: From proximal algorithms to robust unfolded image denoising networks and Plug-and-Play methods

  • paper_url: http://arxiv.org/abs/2308.03139
  • repo_url: None
  • paper_authors: Hoang Trieu Vy Le, Audrey Repetti, Nelly Pustelnik
  • for: 这篇论文的目的是提出一种基于拟合优化理论的神经网络架构,用于解决频率噪声约束的图像恢复问题。
  • methods: 这篇论文使用了迭代 proximal 算法,并将其与深度学习策略结合,以提高估计质量。具体来说,这篇论文提出了一种基于 proximal 算法的神经网络架构,称为 proximal neural network (PNN),可以解决任何基于 proximal 算法的图像恢复任务。
  • results: 作者们在这篇论文中提出了一种基于 dual-FB 和 primal-dual Chambolle-Pock 算法的 PNN 架构,并证明了这种架构可以在图像恢复任务中实现更好的效果。此外,作者们还提出了一些不同的学习策略,并对其的稳定性(Lipschitz 性)和恢复效果进行了 исследование。最后,作者们证明了这种 PNN 架构在一种镜像减震问题中的稳定性。
    Abstract A common approach to solve inverse imaging problems relies on finding a maximum a posteriori (MAP) estimate of the original unknown image, by solving a minimization problem. In thiscontext, iterative proximal algorithms are widely used, enabling to handle non-smooth functions and linear operators. Recently, these algorithms have been paired with deep learning strategies, to further improve the estimate quality. In particular, proximal neural networks (PNNs) have been introduced, obtained by unrolling a proximal algorithm as for finding a MAP estimate, but over a fixed number of iterations, with learned linear operators and parameters. As PNNs are based on optimization theory, they are very flexible, and can be adapted to any image restoration task, as soon as a proximal algorithm can solve it. They further have much lighter architectures than traditional networks. In this article we propose a unified framework to build PNNs for the Gaussian denoising task, based on both the dual-FB and the primal-dual Chambolle-Pock algorithms. We further show that accelerated inertial versions of these algorithms enable skip connections in the associated NN layers. We propose different learning strategies for our PNN framework, and investigate their robustness (Lipschitz property) and denoising efficiency. Finally, we assess the robustness of our PNNs when plugged in a forward-backward algorithm for an image deblurring problem.
    摘要 一般来说,解决反射图像问题的常用方法是找到最大 posteriori (MAP) 估计原始未知图像,解决一个最小化问题。在这种情况下,迭代 proximal 算法广泛使用,以处理非滑动函数和线性运算员。最近,这些算法与深度学习策略相结合,以进一步改善估计质量。特别是, proximal 神经网络 (PNNs) 已经引入,它们是通过固定数量的迭代器来找到 MAP 估计,但是具有学习的线性运算员和参数。由于 PNNs 基于优化理论,它们非常灵活,可以适应任何图像恢复任务,只要可以使用 proximal 算法解决它。此外,它们的架构非常轻量级,比传统神经网络更加轻量级。在这篇文章中,我们提出一个统一的框架来建立 PNNs для Gaussian 噪声问题,基于 dual-FB 和 primal-dual Chambolle-Pock 算法。我们还证明,使用加速增量版本的这些算法可以在相关的 CNN 层中添加跳过连接。我们提出了不同的学习策略,并investigate 其Robustness 和噪声除去效率。最后,我们评估了我们的 PNNs 在一个前向-后向算法中的稳定性。

E-CLIP: Towards Label-efficient Event-based Open-world Understanding by CLIP

  • paper_url: http://arxiv.org/abs/2308.03135
  • repo_url: None
  • paper_authors: Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu, Lin Wang
  • for: 提高 event-based 图像识别 task 的性能
  • methods: 提出了一个新的框架 E-CLIP,通过在 event 数据上模型缺失的大规模数据集和模式差异,挖掘 CLIP 的潜在能力
  • results: 在 N-Caltech 数据集上取得了 +3.94% 和 +4.62% 的提升,在细化设定和少量示例设定中都达到了最佳性能
    Abstract Contrasting Language-image pertaining (CLIP) has recently shown promising open-world and few-shot performance on 2D image-based recognition tasks. However, the transferred capability of CLIP to the novel event camera data still remains under-explored. In particular, due to the modality gap with the image-text data and the lack of large-scale datasets, achieving this goal is non-trivial and thus requires significant research innovation. In this paper, we propose E-CLIP, a novel and effective framework that unleashes the potential of CLIP for event-based recognition to compensate for the lack of large-scale event-based datasets. Our work addresses two crucial challenges: 1) how to generalize CLIP's visual encoder to event data while fully leveraging events' unique properties, e.g., sparsity and high temporal resolution; 2) how to effectively align the multi-modal embeddings, i.e., image, text, and events. To this end, we first introduce a novel event encoder that subtly models the temporal information from events and meanwhile generates event prompts to promote the modality bridging. We then design a text encoder that generates content prompts and utilizes hybrid text prompts to enhance the E-CLIP's generalization ability across diverse datasets. With the proposed event encoder, text encoder, and original image encoder, a novel Hierarchical Triple Contrastive Alignment (HTCA) module is introduced to jointly optimize the correlation and enable efficient knowledge transfer among the three modalities. We conduct extensive experiments on two recognition benchmarks, and the results demonstrate that our E-CLIP outperforms existing methods by a large margin of +3.94% and +4.62% on the N-Caltech dataset, respectively, in both fine-tuning and few-shot settings. Moreover, our E-CLIP can be flexibly extended to the event retrieval task using both text or image queries, showing plausible performance.
    摘要 另一个挑战是如何将CLIP的视觉编码器应用到事件数据上,并充分利用事件的特有特征,如稀疏性和高时间分辨率。为此,我们首先引入了一种新的事件编码器,它灵活地模拟了事件中的时间信息,同时生成了事件提示,以便模式桥接。然后,我们设计了一个文本编码器,它生成了内容提示,并使用混合文本提示来提高E-CLIP的泛化能力。最后,我们引入了一个新的层次 triple contrastive alignment(HTCA)模块,以同时优化相关性和各模态之间的知识传递。我们在两个认证标准列表上进行了广泛的实验,结果表明,我们的E-CLIP在精度调整和少量调整下比 EXISTS的方法提高了+3.94%和+4.62%的margin。此外,我们的E-CLIP可以灵活地扩展到事件检索任务,使用文本或图像查询,表现可靠。

NNVISR: Bring Neural Network Video Interpolation and Super Resolution into Video Processing Framework

  • paper_url: http://arxiv.org/abs/2308.03121
  • repo_url: https://github.com/tongyuantongyu/vs-nnvisr
  • paper_authors: Yuan Tong, Mengshun Hu, Zheng Wang
  • for: 这个论文是为了提供一个基于神经网络的视频提高工具,用于进行各种视频提高任务,包括噪声除除、超分辨、 interpolate 和空间时间超分辨。
  • methods: 该论文使用的方法是基于神经网络的,可以接受任何能够提高一组帧的神经网络,并处理所有其他网络不依赖的细节 during 视频处理。
  • results: 该论文的实验结果表明,NNVISR 可以高效地进行视频提高任务,并且可以提供比较高的图像质量。
    Abstract We present NNVISR - an open-source filter plugin for the VapourSynth video processing framework, which facilitates the application of neural networks for various kinds of video enhancing tasks, including denoising, super resolution, interpolation, and spatio-temporal super-resolution. NNVISR fills the gap between video enhancement neural networks and video processing pipelines, by accepting any network that enhances a group of frames, and handling all other network agnostic details during video processing. NNVISR is publicly released at https://github.com/tongyuantongyu/vs-NNVISR.
    摘要 我们发布NNVISR,一款面向VapourSynth视频处理框架的开源滤镜插件,便于将神经网络应用于各类视频增强任务,包括去噪、超分辨率、插帧以及时空超分辨率。NNVISR弥合了视频增强神经网络与视频处理流水线之间的差距:它可接受任何对一组帧进行增强的网络,并在视频处理过程中处理所有与具体网络无关的细节。NNVISR已公开发布于 https://github.com/tongyuantongyu/vs-NNVISR 。

SAAM: Stealthy Adversarial Attack on Monocular Depth Estimation

  • paper_url: http://arxiv.org/abs/2308.03108
  • repo_url: None
  • paper_authors: Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique
  • for: 本研究探讨了基于深度学习的MDE(多层感知探测)系统在面对抗尾攻击时的漏洞。
  • methods: 我们提出了一种新的隐蔽式抗尾攻击方法,称为SAAM(隐蔽式抗尾攻击),该方法可以让MDE系统伪造物体的深度估计。我们的实验结果表明,我们设计的隐蔽式抗尾攻击贴图成功地使DNN基于MDE系统产生深度错误。具体来说,我们的设计的隐蔽式抗尾攻击贴图可以在99%的影响区域内达到60%的深度错误。
  • results: 我们的实验结果表明,我们的SAAM方法可以成功地使MDE系统产生深度错误,并且这些错误具有自然的外观,使其难以被人类识别。我们认为这种威胁对MDE系统在边缘设备上的应用产生了重要的影响,并且希望这种威胁能够引起社区的关注,并促进更多的robust和适应性的防御机制的研究。
    Abstract In this paper, we investigate the vulnerability of MDE to adversarial patches. We propose a novel \underline{S}tealthy \underline{A}dversarial \underline{A}ttacks on \underline{M}DE (SAAM) that compromises MDE by either corrupting the estimated distance or causing an object to seamlessly blend into its surroundings. Our experiments, demonstrate that the designed stealthy patch successfully causes a DNN-based MDE to misestimate the depth of objects. In fact, our proposed adversarial patch achieves a significant 60\% depth error with 99\% ratio of the affected region. Importantly, despite its adversarial nature, the patch maintains a naturalistic appearance, making it inconspicuous to human observers. We believe that this work sheds light on the threat of adversarial attacks in the context of MDE on edge devices. We hope it raises awareness within the community about the potential real-life harm of such attacks and encourages further research into developing more robust and adaptive defense mechanisms.
    摘要 在这篇论文中,我们研究了多光谱探测(MDE)对恶意质patch的抵触性。我们提出了一种新的隐蔽的恶意质patch(SAAM),该patch可以让MDE估算误差或使物体顺滑地融入到它所在的环境中。我们的实验表明,我们设计的隐蔽patch成功地使得基于DNN的MDE估算深度错误。事实上,我们的提案的恶意质patch实现了60%的深度错误率,99%的affected region。重要的是,即使具有恶意目的,patch仍然保持自然的外观,使其对人类观察者难以发现。我们认为,这项工作着光了MDE在边缘设备上的攻击风险,并希望这项研究会引起社区对此类攻击的关注,并促进更多的robust和适应性的防御机制的研究。
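The generic recipe behind such patch attacks can be sketched as follows: optimize the patch pixels so that the depth predicted inside the patched region moves as far as possible from the clean prediction. The depth network below is a dummy stand-in (a real attack would load a pretrained MDE model), and the loss, patch size, and placement are illustrative assumptions rather than SAAM's actual design:

```python
import torch
import torch.nn as nn

# dummy stand-in for a monocular depth network; shows the optimisation loop only
depth_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 3, padding=1))

def apply_patch(images, patch, top=80, left=80):
    out = images.clone()
    out[:, :, top:top + patch.shape[-2], left:left + patch.shape[-1]] = patch
    return out

patch = torch.rand(3, 48, 48, requires_grad=True)
opt = torch.optim.Adam([patch], lr=0.01)
images = torch.rand(4, 3, 224, 224)                   # placeholder scene batch
with torch.no_grad():
    clean_depth = depth_net(images)

for step in range(100):
    opt.zero_grad()
    adv_depth = depth_net(apply_patch(images, patch))
    # push predicted depth away from the clean prediction inside the patch region
    region = adv_depth[:, :, 80:128, 80:128]
    loss = -(region - clean_depth[:, :, 80:128, 80:128]).abs().mean()
    loss.backward()
    opt.step()
    with torch.no_grad():
        patch.clamp_(0, 1)                            # keep the patch a valid image
```

A stealthiness constraint (e.g. an extra loss pulling the patch toward a natural-looking texture) would be added on top of this basic loop to obtain the inconspicuous appearance discussed above.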

Incorporating Pre-training Data Matters in Unsupervised Domain Adaptation

  • paper_url: http://arxiv.org/abs/2308.03097
  • repo_url: None
  • paper_authors: Yinsong Xu, Aidong Men, Yang Liu, Qingchao Chen
  • for: 本研究旨在探讨半监督领域适应(UDA)和无源领域适应(SFUDA)方法中的问题,具体来说是研究图像Net、源频谱和目标频谱之间的相互关系,以及在这些频谱上进行 pré-训练后,对目标风险的影响。
  • methods: 本研究使用了一种名为TriDA的新框架,它在 fine-tuning 过程中保持了预训练集(图像Net)的 semantic 结构,以提高适应性能。
  • results: 实验结果显示,TriDA 可以在多个 UDA 和 SFUDA 标准测试集上达到当前最佳性能。
    Abstract Unsupervised domain adaptation(UDA) and Source-free UDA(SFUDA) methods formulate the problem involving two domains: source and target. They typically employ a standard training approach that begins with models pre-trained on large-scale datasets e.g., ImageNet, while rarely discussing its effect. Recognizing this gap, we investigate the following research questions: (1) What is the correlation among ImageNet, the source, and the target domain? (2) How does pre-training on ImageNet influence the target risk? To answer the first question, we empirically observed an interesting Spontaneous Pulling (SP) Effect in fine-tuning where the discrepancies between any two of the three domains (ImageNet, Source, Target) decrease but at the cost of the impaired semantic structure of the pre-train domain. For the second question, we put forward a theory to explain SP and quantify that the target risk is bound by gradient disparities among the three domains. Our observations reveal a key limitation of existing methods: it hinders the adaptation performance if the semantic cluster structure of the pre-train dataset (i.e.ImageNet) is impaired. To address it, we incorporate ImageNet as the third domain and redefine the UDA/SFUDA as a three-player game. Specifically, inspired by the theory and empirical findings, we present a novel framework termed TriDA which additionally preserves the semantic structure of the pre-train dataset during fine-tuning. Experimental results demonstrate that it achieves state-of-the-art performance across various UDA and SFUDA benchmarks.
    摘要 Unsupervised domain adaptation(UDA)和Source-free UDA(SFUDA)方法通常假设有两个领域:源领域和目标领域。它们通常采用标准训练方法,开始于在大规模数据集上预训练模型,如ImageNet,而rarely探讨其效果。认可这个空隙,我们调查以下研究问题:(1)ImageNet、源领域和目标领域之间有哪些相互关系?(2)预训练在ImageNet上会对目标风险产生何种影响?为回答第一个问题,我们观察了一种有趣的自然抽象(SP)效应在细化中,其中任何两个领域之间的差异都会减少,但是会导致预训练频率的结构受损。为回答第二个问题,我们提出了一种理论来解释SP和量化目标风险受到三个领域的梯度差异的限制。我们的观察表明现有方法的一个重要限制:如果预训练数据集(即ImageNet)的 semantic cluster structure被破坏,那么适应性会受到影响。为解决这个限制,我们将ImageNet作为第三个领域,并重新定义UDA/SFUDA为三个玩家的游戏。具体来说,我们提出了一种新的框架,称为TriDA,它在细化过程中保持预训练数据集的semantic结构。实验结果表明,TriDA可以在多个UDA和SFUDA benchmark上实现状态机器人的性能。

ECT: Fine-grained Edge Detection with Learned Cause Tokens

  • paper_url: http://arxiv.org/abs/2308.03092
  • repo_url: https://github.com/daniellli/ect
  • paper_authors: Shaocong Xu, Xiaoxue Chen, Yuhang Zheng, Guyue Zhou, Yurong Chen, Hongbin Zha, Hao Zhao
  • for: 本研究强调细化边检测任务,即预测由反射、照明、正常和深度变化引起的特定边。
  • methods: 我们提出了一种基于转换器的两阶段网络,先预测通用边,然后预测细化边,使用注意机制实现全局响应野。我们还使用可学习的 causal 令和边绑定损失来保证边的一致性。
  • results: 我们在公共测试集 BSDS-RIND 上以及一些新定义的测试集上进行了评估,并实现了新的状态计算结果。我们的代码、数据和模型都公开在 GitHub 上( https://github.com/Daniellli/ECT.git)。
    Abstract In this study, we tackle the challenging fine-grained edge detection task, which refers to predicting specific edges caused by reflectance, illumination, normal, and depth changes, respectively. Prior methods exploit multi-scale convolutional networks, which are limited in three aspects: (1) Convolutions are local operators while identifying the cause of edge formation requires looking at far away pixels. (2) Priors specific to edge cause are fixed in prediction heads. (3) Using separate networks for generic and fine-grained edge detection, and the constraint between them may be violated. To address these three issues, we propose a two-stage transformer-based network sequentially predicting generic edges and fine-grained edges, which has a global receptive field thanks to the attention mechanism. The prior knowledge of edge causes is formulated as four learnable cause tokens in a cause-aware decoder design. Furthermore, to encourage the consistency between generic edges and fine-grained edges, an edge aggregation and alignment loss is exploited. We evaluate our method on the public benchmark BSDS-RIND and several newly derived benchmarks, and achieve new state-of-the-art results. Our code, data, and models are publicly available at https://github.com/Daniellli/ECT.git.
    摘要 在本研究中,我们面向细粒度边缘检测任务,即分别预测由反射率、光照、法向和深度变化引起的特定边缘。已有方法采用多尺度卷积网络,其局限性有三:(1)卷积是局部算子,而判断边缘成因需要观察较远的像素;(2)与边缘成因相关的先验被固定在预测头中;(3)通用边缘与细粒度边缘使用不同的网络检测,二者之间的约束可能被违反。为解决这三个问题,我们提出了一个基于Transformer的两阶段网络,先预测通用边缘,再预测细粒度边缘,并借助注意力机制获得全局感受野。边缘成因的先验知识被表示为成因感知解码器中的四个可学习的成因token。此外,为促进通用边缘与细粒度边缘之间的一致性,我们引入了边缘聚合与对齐损失。我们在公开基准BSDS-RIND及若干新构建的基准上评估了该方法,并取得了新的最优(state-of-the-art)结果。代码、数据和模型公开于 https://github.com/Daniellli/ECT.git 。

Study for Performance of MobileNetV1 and MobileNetV2 Based on Breast Cancer

  • paper_url: http://arxiv.org/abs/2308.03076
  • repo_url: None
  • paper_authors: Jiuqi Yan
  • for: 这个实验的目的是比较MobileNetV1和MobileNetV2模型在分析乳腺癌图像方面的表现。
  • methods: 这个实验使用了Kaggle上下载的 histopathological 图像集进行训练,并使用了 MobileNetV1 和 MobileNetV2 模型进行分类。
  • results: 实验结果显示,在处理这个数据集时,MobileNetV1 模型表现更好,其验证精度和过拟合性也较高。
    Abstract Artificial intelligence is constantly evolving and can provide effective help in all aspects of people's lives. The experiment is mainly to study the use of artificial intelligence in the field of medicine. The purpose of this experiment was to compare which of MobileNetV1 and MobileNetV2 models was better at detecting histopathological images of the breast downloaded at Kaggle. When the doctor looks at the pathological image, there may be errors that lead to errors in judgment, and the observation speed is slow. Rational use of artificial intelligence can effectively reduce the error of doctor diagnosis in breast cancer judgment and speed up doctor diagnosis. The dataset was downloaded from Kaggle and then normalized. The basic principle of the experiment is to let the neural network model learn the downloaded data set. Then find the pattern and be able to judge on your own whether breast tissue is cancer. In the dataset, benign tumor pictures and malignant tumor pictures have been classified, of which 198738 are benign tumor pictures and 78, 786 are malignant tumor pictures. After calling MobileNetV1 and MobileNetV2, the dataset is trained separately, the training accuracy and validation accuracy rate are obtained, and the image is drawn. It can be observed that MobileNetV1 has better validation accuracy and overfit during MobileNetV2 training. From the experimental results, it can be seen that in the case of processing this dataset, MobileNetV1 is much better than MobileNetV2.
    摘要 人工智能在不断发展,能够在人们生活的方方面面提供有效帮助。本实验主要研究人工智能在医学领域的应用,目的是比较MobileNetV1与MobileNetV2模型在识别从Kaggle下载的乳腺组织病理图像上的表现。医生观察病理图像时可能出现判读错误,且观察速度较慢;合理使用人工智能可以有效降低医生在乳腺癌判断中的误诊率,并加快诊断速度。数据集下载自Kaggle并经过归一化处理。实验的基本思路是让神经网络模型学习该数据集,找出规律,从而自行判断乳腺组织是否癌变。数据集中良性肿瘤图片与恶性肿瘤图片已经分类,其中良性肿瘤图片198,738张,恶性肿瘤图片78,786张。分别调用MobileNetV1和MobileNetV2对数据集进行训练,得到训练精度和验证精度并绘制曲线。可以观察到,MobileNetV1的验证精度更高,而MobileNetV2在训练中出现了过拟合。实验结果表明,在处理该数据集时,MobileNetV1明显优于MobileNetV2。
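A comparison of this kind is easy to reproduce with Keras transfer learning; the sketch below builds the same binary benign/malignant head on top of ImageNet-pretrained MobileNet and MobileNetV2 backbones (input size, optimizer, and training schedule are assumptions, and the dataset pipelines are left as placeholders):

```python
import tensorflow as tf

def build_classifier(backbone_fn, input_shape=(224, 224, 3)):
    """Binary benign/malignant classifier on an ImageNet-pretrained backbone;
    the same head is used for both MobileNet versions so only the backbone differs."""
    base = backbone_fn(include_top=False, weights="imagenet", input_shape=input_shape)
    base.trainable = False
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

v1 = build_classifier(tf.keras.applications.MobileNet)
v2 = build_classifier(tf.keras.applications.MobileNetV2)
# train_ds / val_ds would be tf.data pipelines over the normalized histopathology patches:
# v1.fit(train_ds, validation_data=val_ds, epochs=10)
# v2.fit(train_ds, validation_data=val_ds, epochs=10)
```

Comparing the two models' training and validation curves on the same splits is what supports the conclusion above that MobileNetV1 generalizes better on this dataset while MobileNetV2 overfits.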

M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition

  • paper_url: http://arxiv.org/abs/2308.03063
  • repo_url: None
  • paper_authors: Hao Tang, Jun Liu, Shuanglin Yan, Rui Yan, Zechao Li, Jinhui Tang
  • for: The paper addresses few-shot fine-grained action recognition, specifically the challenges of capturing subtle action details and learning from limited data with high intra-class variance and inter-class similarity.
  • methods: The proposed M$^3$Net framework incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision making across multiple viewpoints; various matching functions integrate instance-specific, category-specific, and task-specific perspectives, and multi-task collaborative learning is used to enhance embedding generalizability.
  • results: Experimental results on three challenging benchmarks demonstrate the superiority of M$^3$Net in capturing fine-grained action details and achieving state-of-the-art performance for few-shot fine-grained action recognition.
    Abstract Due to the scarcity of manually annotated data required for fine-grained video understanding, few-shot fine-grained (FS-FG) action recognition has gained significant attention, with the aim of classifying novel fine-grained action categories with only a few labeled instances. Despite the progress made in FS coarse-grained action recognition, current approaches encounter two challenges when dealing with the fine-grained action categories: the inability to capture subtle action details and the insufficiency of learning from limited data that exhibit high intra-class variance and inter-class similarity. To address these limitations, we propose M$^3$Net, a matching-based framework for FS-FG action recognition, which incorporates \textit{multi-view encoding}, \textit{multi-view matching}, and \textit{multi-view fusion} to facilitate embedding encoding, similarity matching, and decision making across multiple viewpoints. \textit{Multi-view encoding} captures rich contextual details from the intra-frame, intra-video, and intra-episode perspectives, generating customized higher-order embeddings for fine-grained data. \textit{Multi-view matching} integrates various matching functions enabling flexible relation modeling within limited samples to handle multi-scale spatio-temporal variations by leveraging the instance-specific, category-specific, and task-specific perspectives. \textit{Multi-view fusion} consists of matching-predictions fusion and matching-losses fusion over the above views, where the former promotes mutual complementarity and the latter enhances embedding generalizability by employing multi-task collaborative learning. Explainable visualizations and experimental results on three challenging benchmarks demonstrate the superiority of M$^3$Net in capturing fine-grained action details and achieving state-of-the-art performance for FS-FG action recognition.
    摘要 由于细粒度视频理解所需的人工标注数据稀缺,少样本细粒度(FS-FG)动作识别受到了广泛关注,其目标是仅凭少量标注样本对新的细粒度动作类别进行分类。尽管少样本粗粒度动作识别已取得进展,现有方法在处理细粒度动作类别时面临两个挑战:难以捕捉细微的动作细节,以及难以从类内差异大、类间相似度高的有限数据中学习。为解决这些局限,我们提出了基于匹配的 FS-FG 动作识别框架 M$^3$Net,它包含多视图编码、多视图匹配和多视图融合,以便在多个视角上进行嵌入编码、相似度匹配和决策。多视图编码从帧内、视频内和回合(episode)内的角度捕捉丰富的上下文细节,为细粒度数据生成定制的高阶嵌入。多视图匹配整合多种匹配函数,借助实例级、类别级和任务级视角,在有限样本内进行灵活的关系建模,以应对多尺度时空变化。多视图融合包括上述视角上的匹配预测融合与匹配损失融合,前者促进相互补充,后者通过多任务协同学习增强嵌入的泛化能力。可解释的可视化结果以及在三个具有挑战性的基准上的实验结果表明,M$^3$Net 能够出色地捕捉细粒度动作细节,并在 FS-FG 动作识别上取得最先进的性能。

InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild

  • paper_url: http://arxiv.org/abs/2308.03061
  • repo_url: None
  • paper_authors: Yanyan Shao, Qi Ye, Wenhan Luo, Kaihao Zhang, Jiming Chen
  • for: 本研究旨在解决人机交互中物体识别的问题,即在受到遮挡、背景噪音和干扰物体的情况下,准确地识别人与物体之间的交互。
  • methods: 本研究提出一种基于手-物交互时空信息的交互物体跟踪方法,无需预先知道要跟踪的物体类别,即可在不同的交互场景下准确地发现并跟踪交互物体(发现步骤的示意见本条目末尾)。
  • results: 对比实验结果表明,所提方法在持续交互场景下明显优于现有方法;具体而言,在基于 100DOH 构建的视频级手-物交互评测数据集上,我们的方法在 AP 指标上取得了约 10% 的提升。此外,定性结果也表明我们的方法能够为交互物体生成更连续的轨迹。
    Abstract Understanding human interaction with objects is an important research topic for embodied Artificial Intelligence and identifying the objects that humans are interacting with is a primary problem for interaction understanding. Existing methods rely on frame-based detectors to locate interacting objects. However, this approach is subjected to heavy occlusions, background clutter, and distracting objects. To address the limitations, in this paper, we propose to leverage spatio-temporal information of hand-object interaction to track interactive objects under these challenging cases. Without prior knowledge of the general objects to be tracked like object tracking problems, we first utilize the spatial relation between hands and objects to adaptively discover the interacting objects from the scene. Second, the consistency and continuity of the appearance of objects between successive frames are exploited to track the objects. With this tracking formulation, our method also benefits from training on large-scale general object-tracking datasets. We further curate a video-level hand-object interaction dataset for testing and evaluation from 100DOH. The quantitative results demonstrate that our proposed method outperforms the state-of-the-art methods. Specifically, in scenes with continuous interaction with different objects, we achieve an impressive improvement of about 10% as evaluated using the Average Precision (AP) metric. Our qualitative findings also illustrate that our method can produce more continuous trajectories for interacting objects.
    摘要 理解人与物体之间的交互是具身人工智能的重要研究课题,而确定人正在交互的物体是交互理解的首要问题。现有方法依赖基于帧的检测器来定位交互物体,但这种方式容易受到严重遮挡、背景杂乱和干扰物体的影响。为了克服这些限制,本文提出利用手-物交互的时空信息,在上述具有挑战性的情况下跟踪交互物体。在不预先知道要跟踪的通用物体的前提下,我们首先利用手与物体之间的空间关系,自适应地从场景中发现交互物体;其次,利用物体外观在相邻帧之间的一致性和连续性来跟踪物体。得益于这种跟踪形式,我们的方法还能受益于在大规模通用目标跟踪数据集上的训练。我们进一步基于 100DOH 构建了一个视频级手-物交互数据集用于测试和评估。定量结果表明,我们提出的方法优于最先进的方法;特别是在与不同物体持续交互的场景中,以平均精度(AP)指标衡量取得了约 10% 的显著提升。定性结果也表明我们的方法可以为交互物体生成更连续的轨迹。
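The discovery step described above (using the spatial relation between the hand and candidate objects to pick the interacting object) can be illustrated with a simple geometric heuristic. The sketch below is only an approximation of that intuition with hypothetical helper functions; the actual InterTracker discovery is learned from spatio-temporal cues.

```python
# Illustrative sketch only: selecting the object proposal most likely to be
# interacting with a detected hand, using simple box overlap/distance cues.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def center_distance(a: Box, b: Box) -> float:
    ca = ((a[0] + a[2]) / 2, (a[1] + a[3]) / 2)
    cb = ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    return ((ca[0] - cb[0]) ** 2 + (ca[1] - cb[1]) ** 2) ** 0.5

def discover_interacting_object(hand: Box, proposals: List[Box]) -> int:
    # Score each class-agnostic proposal by overlap with the hand, breaking
    # ties with proximity; return the index of the best candidate.
    best_idx, best_score = -1, float("-inf")
    for i, obj in enumerate(proposals):
        score = iou(hand, obj) - 1e-3 * center_distance(hand, obj)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```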

TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment

  • paper_url: http://arxiv.org/abs/2308.03060
  • repo_url: https://github.com/chaofengc/iqa-pytorch
  • paper_authors: Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin
  • for: 这个论文主要针对图像质量评估(IQA)领域,旨在提高图像质量评估的精度和效率。
  • methods: 该方法借鉴人类视觉系统的特点,使用多尺度特征(全局与局部特征),并通过更充分地利用语义信息来提升表示能力。具体来说,提出一种自顶向下的方法(TOPIQ),利用高层语义信息引导低层特征聚焦于重要的失真区域(跨尺度注意力的示意见本条目末尾),从而增强表示能力。
  • results: 为此设计了一个由粗到细网络(CFANet),可同时用于全参考(FR)和无参考(NR)IQA;与现有基于视觉 Transformer 的方法相比,在大多数公开的 FR 和 NR 基准上表现更优或具有竞争力,同时效率高得多。
    Abstract Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks. Inspired by the characteristics of the human visual system, existing methods typically use a combination of global and local representations (\ie, multi-scale features) to achieve superior performance. However, most of them adopt simple linear fusion of multi-scale features, and neglect their possibly complex relationship and interaction. In contrast, humans typically first form a global impression to locate important regions and then focus on local details in those regions. We therefore propose a top-down approach that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions, named as \emph{TOPIQ}. Our approach to IQA involves the design of a heuristic coarse-to-fine network (CFANet) that leverages multi-scale features and progressively propagates multi-level semantic information to low-level representations in a top-down manner. A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower level features guided by higher level features. This mechanism emphasizes active semantic regions for low-level distortions, thereby improving performance. CFANet can be used for both Full-Reference (FR) and No-Reference (NR) IQA. We use ResNet50 as its backbone and demonstrate that CFANet achieves better or competitive performance on most public FR and NR benchmarks compared with state-of-the-art methods based on vision transformers, while being much more efficient (with only ${\sim}13\%$ FLOPS of the current best FR method). Codes are released at \url{https://github.com/chaofengc/IQA-PyTorch}.
    摘要 图像质量评估(IQA)是计算机视觉中的基本任务,在深度神经网络的推动下取得了显著进展。受人类视觉系统特点的启发,现有方法通常结合全局与局部表示(即多尺度特征)来获得更优的性能。然而,它们大多只对多尺度特征做简单的线性融合,忽略了特征之间可能存在的复杂关系与交互。与此相反,人类通常先形成全局印象以定位重要区域,再关注这些区域中的局部细节。因此,我们提出一种自顶向下的方法 TOPIQ,利用高层语义引导 IQA 网络关注语义上重要的局部失真区域。我们设计了一个启发式的由粗到细网络(CFANet),利用多尺度特征,并以自顶向下的方式将多层语义信息逐步传播到低层表示。该方法的关键组件是跨尺度注意力机制,它在高层特征的引导下为低层特征计算注意力图,从而突出与低层失真相关的活跃语义区域,提升性能。CFANet 可同时用于全参考(FR)和无参考(NR)IQA。我们以 ResNet50 为骨干网络,实验表明 CFANet 在大多数公开的 FR 和 NR 基准上取得了优于或可比于基于视觉 Transformer 的最先进方法的性能,同时效率高得多(仅为当前最佳 FR 方法约 13% 的 FLOPS)。代码已发布于 https://github.com/chaofengc/IQA-PyTorch。
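A minimal PyTorch sketch of the cross-scale attention idea, where a high-level (semantic) feature map produces an attention map that re-weights a low-level feature map, is given below. The channel sizes, the single-channel attention head, and the residual fusion are assumptions, not the published CFANet design.

```python
# Sketch of cross-scale attention: semantic features guide attention over
# lower-level features so that distortion-relevant regions are emphasised.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleAttention(nn.Module):
    def __init__(self, high_ch: int, low_ch: int):
        super().__init__()
        self.to_attn = nn.Conv2d(high_ch, 1, kernel_size=1)   # semantic -> attention logits
        self.proj = nn.Conv2d(low_ch, low_ch, kernel_size=3, padding=1)

    def forward(self, low_feat: torch.Tensor, high_feat: torch.Tensor) -> torch.Tensor:
        # Upsample the high-level map to the low-level resolution.
        attn = self.to_attn(high_feat)
        attn = F.interpolate(attn, size=low_feat.shape[-2:], mode="bilinear",
                             align_corners=False)
        attn = torch.sigmoid(attn)                    # emphasise semantically active regions
        return self.proj(low_feat) * attn + low_feat  # residual keeps the original details

# Example with ResNet50-like stage shapes.
low = torch.randn(2, 256, 56, 56)    # early stage: local distortions
high = torch.randn(2, 2048, 7, 7)    # late stage: global semantics
out = CrossScaleAttention(2048, 256)(low, high)
print(out.shape)  # torch.Size([2, 256, 56, 56])
```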

Multi-scale Alternated Attention Transformer for Generalized Stereo Matching

  • paper_url: http://arxiv.org/abs/2308.03048
  • repo_url: None
  • paper_authors: Wei Miao, Hong Zhao, Tongjia Chen, Wei Huang, Changyan Xiao
  • for: 提升立体匹配的泛化性能,提出一种新的简单而有效的网络结构。
  • methods: 采用交替注意力 U 形 Transformer(AAUformer)结构,引入窗口自注意力(见本条目末尾的示意代码)与多尺度交替注意力骨干网络,以增强单视图特征的表达能力和匹配精度。
  • results: 在 Scene Flow 数据集上达到最先进性能,在 KITTI 2015 数据集上微调后表现具有竞争力,并在合成与真实数据集之间的跨域泛化实验中表现出色。
    Abstract Recent stereo matching networks achieves dramatic performance by introducing epipolar line constraint to limit the matching range of dual-view. However, in complicated real-world scenarios, the feature information based on intra-epipolar line alone is too weak to facilitate stereo matching. In this paper, we present a simple but highly effective network called Alternated Attention U-shaped Transformer (AAUformer) to balance the impact of epipolar line in dual and single view respectively for excellent generalization performance. Compared to other models, our model has several main designs: 1) to better liberate the local semantic features of the single-view at pixel level, we introduce window self-attention to break the limits of intra-row self-attention and completely replace the convolutional network for denser features before cross-matching; 2) the multi-scale alternated attention backbone network was designed to extract invariant features in order to achieves the coarse-to-fine matching process for hard-to-discriminate regions. We performed a series of both comparative studies and ablation studies on several mainstream stereo matching datasets. The results demonstrate that our model achieves state-of-the-art on the Scene Flow dataset, and the fine-tuning performance is competitive on the KITTI 2015 dataset. In addition, for cross generalization experiments on synthetic and real-world datasets, our model outperforms several state-of-the-art works.
    摘要 近期的立体匹配网络通过引入极线约束来限制双视图的匹配范围,取得了显著的性能。然而,在复杂的真实场景中,仅依赖极线内的特征信息过于薄弱,难以支撑立体匹配。本文提出一种简单而高效的网络,称为交替注意力 U 形 Transformer(AAUformer),以分别平衡极线约束在双视图与单视图中的影响,从而获得出色的泛化性能。与其他模型相比,我们的模型包含以下主要设计:1)为在像素级更好地释放单视图的局部语义特征,我们引入窗口自注意力以突破行内自注意力的限制,并在交叉匹配之前完全取代卷积网络以获得更稠密的特征;2)设计了多尺度交替注意力骨干网络以提取不变特征,从而对难以区分的区域实现由粗到细的匹配过程。我们在多个主流立体匹配数据集上进行了对比实验与消融实验。结果表明,我们的模型在 Scene Flow 数据集上达到了最先进水平,在 KITTI 2015 数据集上的微调性能也具有竞争力;此外,在合成与真实数据集之间的跨域泛化实验中,我们的模型超越了多个最先进的方法。

Prototypes-oriented Transductive Few-shot Learning with Conditional Transport

  • paper_url: http://arxiv.org/abs/2308.03047
  • repo_url: None
  • paper_authors: Long Tian, Jingyi Feng, Wenchao Chen, Xiaoqiang Chai, Liming Wang, Xiyang Liu, Bo Chen
  • for: 提升转导式少样本学习(TFSL)模型在类别不均衡情况下的性能。
  • methods: 提出一种基于条件传输(Conditional Transport,CT)的不均衡 TFSL 模型 PUTM,充分利用不均衡 query 样本的无偏统计信息,并以前向和后向导航器作为传输矩阵,在均匀分布与数据驱动的自适应分布之间平衡每个类别 query 样本的先验(平衡过程的示意见本条目末尾)。
  • results: 在 miniImageNet、tieredImageNet、CUB 和 CIFAR-FS 四个标准基准上的实验表明,我们的模型在类别不均衡情况下的泛化表现优于其他模型。
    Abstract Transductive Few-Shot Learning (TFSL) has recently attracted increasing attention since it typically outperforms its inductive peer by leveraging statistics of query samples. However, previous TFSL methods usually encode uniform prior that all the classes within query samples are equally likely, which is biased in imbalanced TFSL and causes severe performance degradation. Given this pivotal issue, in this work, we propose a novel Conditional Transport (CT) based imbalanced TFSL model called {\textbf P}rototypes-oriented {\textbf U}nbiased {\textbf T}ransfer {\textbf M}odel (PUTM) to fully exploit unbiased statistics of imbalanced query samples, which employs forward and backward navigators as transport matrices to balance the prior of query samples per class between uniform and adaptive data-driven distributions. For efficiently transferring statistics learned by CT, we further derive a closed form solution to refine prototypes based on MAP given the learned navigators. The above two steps of discovering and transferring unbiased statistics follow an iterative manner, formulating our EM-based solver. Experimental results on four standard benchmarks including miniImageNet, tieredImageNet, CUB, and CIFAR-FS demonstrate superiority of our model in class-imbalanced generalization.
    摘要 转导式少样本学习(TFSL)近年来受到越来越多的关注,因为它通过利用 query 样本的统计信息,通常优于归纳式方法。然而,以往的 TFSL 方法通常假设 query 样本中各类别等可能出现的均匀先验,这在类别不均衡的 TFSL 中存在偏差,会导致性能严重下降。针对这一关键问题,本文提出一种基于条件传输(Conditional Transport,CT)的不均衡 TFSL 模型,称为 Prototypes-oriented Unbiased Transfer Model(PUTM),以充分利用不均衡 query 样本的无偏统计信息。该模型将前向与后向导航器作为传输矩阵,在均匀分布与自适应的数据驱动分布之间平衡每个类别 query 样本的先验。为了高效迁移 CT 学得的统计信息,我们进一步推导出一个闭式解,在已学得导航器的基础上通过 MAP 对原型进行精化。上述发现与迁移无偏统计信息的两个步骤以迭代方式交替进行,构成了我们基于 EM 的求解器。在 miniImageNet、tieredImageNet、CUB 和 CIFAR-FS 四个标准基准上的实验结果表明,我们的模型在类别不均衡的泛化上更具优势。
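The prior-balancing idea can be illustrated with a generic Sinkhorn-style rescaling of query-to-prototype assignments toward an assumed class prior. This is not PUTM's conditional-transport formulation or its navigators; it is only a sketch of how pseudo-labels can be balanced away from a uniform prior. The temperature, iteration count, and prior are assumptions.

```python
# Illustrative sketch: balance query-to-prototype assignments so that the
# resulting pseudo-labels follow an assumed (possibly long-tailed) class prior.
import torch

def balanced_pseudo_labels(queries: torch.Tensor, prototypes: torch.Tensor,
                           class_prior: torch.Tensor, n_iters: int = 20,
                           temperature: float = 0.1) -> torch.Tensor:
    """queries: (N, d), prototypes: (C, d), class_prior: (C,) summing to 1."""
    q = torch.nn.functional.normalize(queries, dim=1)
    p = torch.nn.functional.normalize(prototypes, dim=1)
    logits = q @ p.t() / temperature                 # (N, C) similarity scores
    plan = torch.softmax(logits, dim=1)              # initial transport plan
    row_marginal = torch.full((queries.size(0),), 1.0 / queries.size(0))
    for _ in range(n_iters):
        # Alternately rescale columns to the class prior and rows to uniform mass.
        plan = plan * (class_prior / plan.sum(dim=0).clamp_min(1e-9)).unsqueeze(0)
        plan = plan * (row_marginal / plan.sum(dim=1).clamp_min(1e-9)).unsqueeze(1)
    return plan / plan.sum(dim=1, keepdim=True)      # per-query pseudo-label distribution

# Toy usage: 10 queries, 3 classes with a long-tailed prior.
q = torch.randn(10, 64)
protos = torch.randn(3, 64)
prior = torch.tensor([0.6, 0.3, 0.1])
labels = balanced_pseudo_labels(q, protos, prior)
print(labels.shape, labels.sum(dim=1))  # (10, 3), each row sums to 1
```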

Learning Fine-Grained Features for Pixel-wise Video Correspondences

  • paper_url: http://arxiv.org/abs/2308.03040
  • repo_url: https://github.com/qianduoduolr/fgvc
  • paper_authors: Rui Li, Shenglong Zhou, Dong Liu
  • for: 学习 pixel-wise 视频对应关系,提高视频分析效果。
  • methods: 受光流与自监督特征学习的启发,同时利用有标注的合成视频和无标注的真实视频进行学习,并采用对抗学习方案提升特征的泛化能力。
  • results: 在多种对应任务上达到了 state-of-the-art 精度和效率。
    Abstract Video analysis tasks rely heavily on identifying the pixels from different frames that correspond to the same visual target. To tackle this problem, recent studies have advocated feature learning methods that aim to learn distinctive representations to match the pixels, especially in a self-supervised fashion. Unfortunately, these methods have difficulties for tiny or even single-pixel visual targets. Pixel-wise video correspondences were traditionally related to optical flows, which however lead to deterministic correspondences and lack robustness on real-world videos. We address the problem of learning features for establishing pixel-wise correspondences. Motivated by optical flows as well as the self-supervised feature learning, we propose to use not only labeled synthetic videos but also unlabeled real-world videos for learning fine-grained representations in a holistic framework. We adopt an adversarial learning scheme to enhance the generalization ability of the learned features. Moreover, we design a coarse-to-fine framework to pursue high computational efficiency. Our experimental results on a series of correspondence-based tasks demonstrate that the proposed method outperforms state-of-the-art rivals in both accuracy and efficiency.
    摘要 视频分析任务高度依赖于识别不同帧中对应同一视觉目标的像素。为解决这一问题,近期研究提倡采用特征学习方法,尤其是以自监督的方式学习具有判别性的表示来匹配像素;然而,这类方法在面对微小甚至单像素的视觉目标时存在困难。传统上,像素级视频对应与光流相关,但光流给出的是确定性的对应,并且在真实视频上缺乏鲁棒性。我们研究为建立像素级对应而学习特征的问题。受光流以及自监督特征学习的启发,我们提出在一个整体框架中,同时利用有标注的合成视频和无标注的真实视频来学习细粒度表示。我们采用对抗学习方案来增强所学特征的泛化能力,并设计了由粗到细的框架以追求较高的计算效率。在一系列基于对应的任务上的实验结果表明,我们的方法在准确率和效率上均优于最先进的对比方法。

FourLLIE: Boosting Low-Light Image Enhancement by Fourier Frequency Information

  • paper_url: http://arxiv.org/abs/2308.03033
  • repo_url: https://github.com/wangchx67/fourllie
  • paper_authors: Chenxi Wang, Hongjun Wu, Zhi Jin
  • for: 本文主要针对低光照图像的亮度提升问题,利用傅里叶频率信息。
  • methods: 提出的 FourLLIE 网络采用两阶段方法:第一阶段在傅里叶空间中估计振幅变换图以提升亮度;第二阶段引入信噪比(SNR)图,融合全局傅里叶频域信息与局部空间信息以恢复细节(振幅/相位分解的示意见本条目末尾)。
  • results: FourLLIE 在四个代表性数据集上优于现有的最先进(SOTA)低光照增强方法,同时保持良好的模型效率。
    Abstract Recently, Fourier frequency information has attracted much attention in Low-Light Image Enhancement (LLIE). Some researchers noticed that, in the Fourier space, the lightness degradation mainly exists in the amplitude component and the rest exists in the phase component. By incorporating both the Fourier frequency and the spatial information, these researchers proposed remarkable solutions for LLIE. In this work, we further explore the positive correlation between the magnitude of amplitude and the magnitude of lightness, which can be effectively leveraged to improve the lightness of low-light images in the Fourier space. Moreover, we find that the Fourier transform can extract the global information of the image, and does not introduce massive neural network parameters like Multi-Layer Perceptrons (MLPs) or Transformer. To this end, a two-stage Fourier-based LLIE network (FourLLIE) is proposed. In the first stage, we improve the lightness of low-light images by estimating the amplitude transform map in the Fourier space. In the second stage, we introduce the Signal-to-Noise-Ratio (SNR) map to provide the prior for integrating the global Fourier frequency and the local spatial information, which recovers image details in the spatial space. With this ingenious design, FourLLIE outperforms the existing state-of-the-art (SOTA) LLIE methods on four representative datasets while maintaining good model efficiency.
    摘要 近来,傅里叶频域信息在低光照图像增强(LLIE)中受到了广泛关注。一些研究者注意到,在傅里叶空间中,亮度退化主要存在于振幅分量中,其余则存在于相位分量中。通过结合傅里叶频域信息与空间信息,这些研究者提出了出色的 LLIE 解决方案。在本工作中,我们进一步探索振幅大小与亮度大小之间的正相关关系,并利用这一关系在傅里叶空间中有效提升低光照图像的亮度。此外,我们发现傅里叶变换可以提取图像的全局信息,且不会像多层感知机(MLP)或 Transformer 那样引入大量网络参数。为此,我们提出了一个两阶段的基于傅里叶的 LLIE 网络(FourLLIE)。在第一阶段,我们通过在傅里叶空间中估计振幅变换图来提升低光照图像的亮度;在第二阶段,我们引入信噪比(SNR)图作为先验,融合全局傅里叶频域信息与局部空间信息,在空间域中恢复图像细节。凭借这一精巧设计,FourLLIE 在四个代表性数据集上超越了现有的最先进(SOTA)LLIE 方法,同时保持良好的模型效率。
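The amplitude/phase manipulation at the heart of FourLLIE's first stage can be sketched with torch.fft. Note that the constant gain used here makes the operation equivalent to a global brightness scaling; the paper instead estimates a spatially varying amplitude transform map, so this block only demonstrates the decomposition mechanics.

```python
# Sketch of Fourier-space brightening: amplify the amplitude spectrum while
# keeping the phase. The constant gain stands in for the learned amplitude
# transform map and is purely an assumption.
import torch

def fourier_brighten(img: torch.Tensor, gain: float = 1.8) -> torch.Tensor:
    """img: (B, C, H, W) low-light image in [0, 1]."""
    spec = torch.fft.fft2(img)                 # complex spectrum
    amplitude = torch.abs(spec) * gain         # lightness lives mostly in the amplitude
    phase = torch.angle(spec)                  # structure lives mostly in the phase
    spec_new = torch.polar(amplitude, phase)   # recombine amplitude and phase
    out = torch.fft.ifft2(spec_new).real
    return out.clamp(0.0, 1.0)

low_light = torch.rand(1, 3, 64, 64) * 0.2     # synthetic dark image
print(fourier_brighten(low_light).mean() > low_light.mean())  # tensor(True)
```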

Brighten-and-Colorize: A Decoupled Network for Customized Low-Light Image Enhancement

  • paper_url: http://arxiv.org/abs/2308.03029
  • repo_url: None
  • paper_authors: Chenxi Wang, Zhi Jin
  • for: 提升低光照图像的感知质量。
  • methods: 提出一种“增亮-上色”(brighten-and-colorize)网络 BCNet,将低光照增强解耦为亮度提升与色度上色两个子任务,在获得准确色彩的同时,支持按用户偏好调整饱和度与色彩风格的个性化增强(解耦思路的示意见本条目末尾)。
  • results: 实验结果表明,所提方法在实现最先进(SOTA)性能的同时,支持用户友好的个性化增强。
    Abstract Low-Light Image Enhancement (LLIE) aims to improve the perceptual quality of an image captured in low-light conditions. Generally, a low-light image can be divided into lightness and chrominance components. Recent advances in this area mainly focus on the refinement of the lightness, while ignoring the role of chrominance. It easily leads to chromatic aberration and, to some extent, limits the diverse applications of chrominance in customized LLIE. In this work, a ``brighten-and-colorize'' network (called BCNet), which introduces image colorization to LLIE, is proposed to address the above issues. BCNet can accomplish LLIE with accurate color and simultaneously enables customized enhancement with varying saturations and color styles based on user preferences. Specifically, BCNet regards LLIE as a multi-task learning problem: brightening and colorization. The brightening sub-task aligns with other conventional LLIE methods to get a well-lit lightness. The colorization sub-task is accomplished by regarding the chrominance of the low-light image as color guidance like the user-guide image colorization. Upon completion of model training, the color guidance (i.e., input low-light chrominance) can be simply manipulated by users to acquire customized results. This customized process is optional and, due to its decoupled nature, does not compromise the structural and detailed information of lightness. Extensive experiments on the commonly used LLIE datasets show that the proposed method achieves both State-Of-The-Art (SOTA) performance and user-friendly customization.
    摘要 低光照图像提升(LLIE)的目标是提高低光照图像的感知质量。通常,低光照图像可以分为亮度和色彩组成部分。现有的研究主要关注亮度的精细调整,而忽略了色彩的角色。这可能会导致彩色偏差和限制了自定义LLIE的多样化应用。在这项工作中,我们提出了一种“炬光化和彩色”网络(BCNet),该网络引入图像彩色化,以解决上述问题。BCNet可以同时完成LLIE和自定义增强,并提供了不同的饱和度和颜色风格的个性化调整,基于用户的偏好。具体来说,BCNet将LLIE视为多任务学习问题:炬光和彩色。炬光子任务与其他传统的LLIE方法一样,以获得良好的亮度。彩色子任务是基于低光照图像的色彩作为颜色指南,与用户指南图像彩色相似。在模型训练完成后,用户可以简单地 manipulate 输入低光照图像的色彩,以获得自定义结果。这个自定义过程是可选的,并且由于其分离的性质,不会对图像的结构和细节信息产生影响。我们对常用的LLIE数据集进行了广泛的实验,结果表明,我们提出的方法同时实现了State-Of-The-Art(SOTA)性能和用户友好的自定义。
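The lightness/chrominance decoupling that the brighten-and-colorize formulation relies on can be sketched in CIELAB: brighten the L channel only, and treat the chrominance as user-adjustable guidance at recombination. The gamma and saturation values below are placeholders for what the network learns or exposes to the user; this is not the BCNet pipeline itself.

```python
# Sketch of decoupled enhancement: brighten lightness, rescale chrominance.
import numpy as np
from skimage import color

def brighten_and_colorize(rgb: np.ndarray, gamma: float = 0.6,
                          saturation: float = 1.2) -> np.ndarray:
    """rgb: float image in [0, 1], shape (H, W, 3)."""
    lab = color.rgb2lab(rgb)
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    # "Brighten": gamma-lift the lightness channel only (L is in [0, 100]).
    L_bright = 100.0 * (L / 100.0) ** gamma
    # "Colorize": treat the low-light chrominance as guidance and rescale it.
    lab_out = np.stack([L_bright, a * saturation, b * saturation], axis=-1)
    return np.clip(color.lab2rgb(lab_out), 0.0, 1.0)

dark = np.clip(np.random.rand(64, 64, 3) * 0.2, 0, 1)
enhanced = brighten_and_colorize(dark, gamma=0.5, saturation=1.5)
print(enhanced.mean() > dark.mean())  # True
```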

Causal Disentanglement Hidden Markov Model for Fault Diagnosis

  • paper_url: http://arxiv.org/abs/2308.03027
  • repo_url: None
  • paper_authors: Rihao Chang, Yongtao Ma, Weizhi Nie, Jie Nie, An-an Liu
  • for: 本研究旨在提出一种基于 causal disentanglement hidden markov model (CDHM) 的FAULT DIAGNOSIS方法,以便更好地捕捉 bearing 故障机制的特征,并实现更加精准的FAULT TYPE预测。
  • methods: 本研究使用了时间序列数据,逐步分离了振荡信号,将 fault-relevant 和 fault-irrelevant 因素分离出来。并使用了 ELBO 优化方法来学习 causal disentanglement markov model。此外,还采用了无监督领域适应技术来传递学习的拟合表示。
  • results: 在 CWRU 数据集和 IMS 数据集上进行了实验,结果表明,提出的方法能够更好地捕捉 bearing 故障机制的特征,并实现更加精准的FAULT TYPE预测。
    Abstract In modern industries, fault diagnosis has been widely applied with the goal of realizing predictive maintenance. The key issue for the fault diagnosis system is to extract representative characteristics of the fault signal and then accurately predict the fault type. In this paper, we propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism and thus, capture their characteristics to achieve a more robust representation. Specifically, we make full use of the time-series data and progressively disentangle the vibration signal into fault-relevant and fault-irrelevant factors. The ELBO is reformulated to optimize the learning of the causal disentanglement Markov model. Moreover, to expand the scope of the application, we adopt unsupervised domain adaptation to transfer the learned disentangled representations to other working environments. Experiments were conducted on the CWRU dataset and IMS dataset. Relevant results validate the superiority of the proposed method.
    摘要 在现代工业中,故障诊断已被广泛应用,其目标是实现预测性维护。故障诊断系统的关键问题在于提取故障信号的代表性特征,进而准确预测故障类型。本文提出一种因果解耦隐马尔可夫模型(CDHM),用于学习轴承故障机制中的因果关系,从而捕捉其特征并获得更加鲁棒的表示。具体而言,我们充分利用时间序列数据,逐步将振动信号解耦为与故障相关和与故障无关的因子,并重新表述 ELBO 以优化因果解耦马尔可夫模型的学习。此外,为扩大应用范围,我们采用无监督领域自适应,将学到的解耦表示迁移到其他工作环境。在 CWRU 数据集和 IMS 数据集上的实验结果验证了所提方法的优越性。

All-in-one Multi-degradation Image Restoration Network via Hierarchical Degradation Representation

  • paper_url: http://arxiv.org/abs/2308.03021
  • repo_url: None
  • paper_authors: Cheng Zhang, Yu Zhu, Qingsen Yan, Jinqiu Sun, Yanning Zhang
  • for: restore high-quality images from distorted ones, especially on mobile devices
  • methods: progressively construct a tree structure through clustering to learn degradation representation, and design a feature transform block (FTB) to align domains and refine features
  • results: demonstrate the effectiveness of the method and its advantages over state-of-the-art restoration methods through extensive experiments on multiple distorted datasets
    Abstract The aim of image restoration is to recover high-quality images from distorted ones. However, current methods usually focus on a single task (\emph{e.g.}, denoising, deblurring or super-resolution) which cannot address the needs of real-world multi-task processing, especially on mobile devices. Thus, developing an all-in-one method that can restore images from various unknown distortions is a significant challenge. Previous works have employed contrastive learning to learn the degradation representation from observed images, but this often leads to representation drift caused by deficient positive and negative pairs. To address this issue, we propose a novel All-in-one Multi-degradation Image Restoration Network (AMIRNet) that can effectively capture and utilize accurate degradation representation for image restoration. AMIRNet learns a degradation representation for unknown degraded images by progressively constructing a tree structure through clustering, without any prior knowledge of degradation information. This tree-structured representation explicitly reflects the consistency and discrepancy of various distortions, providing a specific clue for image restoration. To further enhance the performance of the image restoration network and overcome domain gaps caused by unknown distortions, we design a feature transform block (FTB) that aligns domains and refines features with the guidance of the degradation representation. We conduct extensive experiments on multiple distorted datasets, demonstrating the effectiveness of our method and its advantages over state-of-the-art restoration methods both qualitatively and quantitatively.
    摘要 图像恢复的目的是从退化图像中恢复高质量图像。然而,当前方法通常只关注单一任务(例如去噪、去模糊或超分辨率),无法满足真实世界中多任务处理的需求,尤其是在移动设备上。因此,开发一种能够从多种未知退化中恢复图像的通用方法是一项重大挑战。先前的工作使用对比学习从观测图像中学习退化表示,但由于正负样本对不足,往往会导致表示漂移。为解决这一问题,我们提出一种新的一体化多退化图像恢复网络(AMIRNet),能够有效地捕捉并利用准确的退化表示进行图像恢复。AMIRNet 在不需要任何退化先验知识的情况下,通过聚类逐步构建树状结构,为未知退化图像学习退化表示。这一树状表示明确反映了各种退化之间的一致性与差异性,为图像恢复提供了具体线索。为进一步提升恢复网络的性能并克服未知退化带来的域差,我们设计了特征变换块(FTB),在退化表示的指导下对齐不同域并精炼特征。我们在多个退化数据集上进行了广泛的实验,在定性与定量两方面证明了所提方法的有效性及其相对最先进恢复方法的优势。

Recurrent Spike-based Image Restoration under General Illumination

  • paper_url: http://arxiv.org/abs/2308.03018
  • repo_url: https://github.com/bit-vision/rsir
  • paper_authors: Lin Zhu, Yunlong Zheng, Mengyue Geng, Lizhi Wang, Hua Huang
  • for: 该研究旨在开拓基于脉冲(spike)数组的视觉感知领域,提高视觉Task的高速重建和准确率。
  • methods: 提出了一种基于循环神经网络的脉冲图像修复(RSIR)网络,通过physical-based脉冲噪声模型和适应脉冲转换模块、循环时间特征融合模块和频率基于脉冲噪声消除模块来修复脉冲图像。
  • results: 通过实验表明,该网络能够在不同的照明条件下修复清晰的图像,并且可以在脉冲数组中 recursive 地处理脉冲信息,以便更好地利用脉冲时间信息。
    Abstract Spike camera is a new type of bio-inspired vision sensor that records light intensity in the form of a spike array with high temporal resolution (20,000 Hz). This new paradigm of vision sensor offers significant advantages for many vision tasks such as high speed image reconstruction. However, existing spike-based approaches typically assume that the scenes are with sufficient light intensity, which is usually unavailable in many real-world scenarios such as rainy days or dusk scenes. To unlock more spike-based application scenarios, we propose a Recurrent Spike-based Image Restoration (RSIR) network, which is the first work towards restoring clear images from spike arrays under general illumination. Specifically, to accurately describe the noise distribution under different illuminations, we build a physical-based spike noise model according to the sampling process of the spike camera. Based on the noise model, we design our RSIR network which consists of an adaptive spike transformation module, a recurrent temporal feature fusion module, and a frequency-based spike denoising module. Our RSIR can process the spike array in a recursive manner to ensure that the spike temporal information is well utilized. In the training process, we generate the simulated spike data based on our noise model to train our network. Extensive experiments on real-world datasets with different illuminations demonstrate the effectiveness of the proposed network. The code and dataset are released at https://github.com/BIT-Vision/RSIR.
    摘要 脉冲相机(spike camera)是一种新型的仿生视觉传感器,以脉冲阵列的形式记录光强,具有极高的时间分辨率(20,000 Hz)。这种新的视觉传感范式为高速图像重建等许多视觉任务带来了显著优势。然而,现有的基于脉冲的方法通常假设场景具有充足的光强,而这在雨天或黄昏等许多真实场景中往往无法满足。为了拓展更多基于脉冲的应用场景,我们提出了一种循环脉冲图像恢复(RSIR)网络,这是首个在一般光照条件下从脉冲阵列恢复清晰图像的工作。为了准确刻画不同光照下的噪声分布,我们根据脉冲相机的采样过程构建了基于物理的脉冲噪声模型。基于该噪声模型,我们设计了 RSIR 网络,其包含自适应脉冲转换模块、循环时间特征融合模块和基于频率的脉冲去噪模块。RSIR 以递归方式处理脉冲阵列,从而充分利用脉冲的时间信息。在训练过程中,我们基于噪声模型生成模拟脉冲数据来训练网络。在不同光照条件的真实数据集上的大量实验证明了所提网络的有效性。代码和数据集已发布于 https://github.com/BIT-Vision/RSIR。

Early Detection and Localization of Pancreatic Cancer by Label-Free Tumor Synthesis

  • paper_url: http://arxiv.org/abs/2308.03008
  • repo_url: https://github.com/mrgiovanni/synthetictumors
  • paper_authors: Bowen Li, Yu-Cheng Chou, Shuwen Sun, Hualin Qiao, Alan Yuille, Zongwei Zhou
  • for: 通过早期检测和定位胰腺癌,将患者的 5 年生存率从 8.5% 提高到 20%。
  • methods: 使用人工智能(AI)模型辅助放射科医生早期发现胰腺癌,并提出一种无需人工标注的肿瘤合成方法生成训练样本。
  • results: 实验表明,基于合成肿瘤训练的 AI 模型在胰腺肿瘤检测上可与基于真实肿瘤训练的模型相当;更重要的是,我们的方法显著提高了小肿瘤的检测率。最后,我们证明该方法可以提升 AI 在不同医院 CT 扫描数据上进行胰腺肿瘤检测与定位的泛化能力。
    Abstract Early detection and localization of pancreatic cancer can increase the 5-year survival rate for patients from 8.5% to 20%. Artificial intelligence (AI) can potentially assist radiologists in detecting pancreatic tumors at an early stage. Training AI models require a vast number of annotated examples, but the availability of CT scans obtaining early-stage tumors is constrained. This is because early-stage tumors may not cause any symptoms, which can delay detection, and the tumors are relatively small and may be almost invisible to human eyes on CT scans. To address this issue, we develop a tumor synthesis method that can synthesize enormous examples of small pancreatic tumors in the healthy pancreas without the need for manual annotation. Our experiments demonstrate that the overall detection rate of pancreatic tumors, measured by Sensitivity and Specificity, achieved by AI trained on synthetic tumors is comparable to that of real tumors. More importantly, our method shows a much higher detection rate for small tumors. We further investigate the per-voxel segmentation performance of pancreatic tumors if AI is trained on a combination of CT scans with synthetic tumors and CT scans with annotated large tumors at an advanced stage. Finally, we show that synthetic tumors improve AI generalizability in tumor detection and localization when processing CT scans from different hospitals. Overall, our proposed tumor synthesis method has immense potential to improve the early detection of pancreatic cancer, leading to better patient outcomes.
    摘要 早期检测和肿瘤位置确定普罗大肠癌可以提高病人5年存活率从8.5%提高到20%。人工智能(AI)可能能够帮助放射学家在早期检测肿瘤。训练AI模型需要庞大的标注示例,但获得早期肿瘤的CT扫描数据受限。这是因为早期肿瘤可能不会产生任何症状,这可能会延迟检测,而且肿瘤也可能很小,使其在人类眼里几乎不可见。为解决这个问题,我们开发了一种肿瘤合成方法,可以在健康的肠部中合成庞大的小肿瘤示例,无需手动标注。我们的实验表明,由AI训练的总检测率(敏感性和特异性)与实际肿瘤相比,合成肿瘤的检测率相对较高。更重要的是,我们发现合成肿瘤的检测率对小肿瘤是非常高。我们进一步研究了使用合成肿瘤和已知大肿瘤的CT扫描数据训练AI的每个像素分割性能。最后,我们证明合成肿瘤可以提高AI在不同医院的CT扫描数据处理中的普适性。总的来说,我们的肿瘤合成方法具有极大的潜力,可以提高普罗大肠癌的早期检测,从而提高病人的存活率。

High-Resolution Vision Transformers for Pixel-Level Identification of Structural Components and Damage

  • paper_url: http://arxiv.org/abs/2308.03006
  • repo_url: None
  • paper_authors: Kareem Eltouny, Seyedomid Sajedi, Xiao Liang
  • for: 本研究旨在利用高分辨率图像和深度学习技术,提高土木结构视觉检查的效率和准确率。
  • methods: 提出的框架使用基于视觉 Transformer 的语义分割网络和拉普拉斯金字塔缩放网络来解析高分辨率视觉检查图像;网络在保留局部细节与全局上下文信息的同时提高了计算效率。
  • results: 在桥梁检查报告图像数据集上进行了全面实验,并以多种指标评估逐像素材料检测;结果表明该框架能够高效处理高分辨率视觉数据并准确检测图像中的材料。
    Abstract Visual inspection is predominantly used to evaluate the state of civil structures, but recent developments in unmanned aerial vehicles (UAVs) and artificial intelligence have increased the speed, safety, and reliability of the inspection process. In this study, we develop a semantic segmentation network based on vision transformers and Laplacian pyramids scaling networks for efficiently parsing high-resolution visual inspection images. The massive amounts of collected high-resolution images during inspections can slow down the investigation efforts. And while there have been extensive studies dedicated to the use of deep learning models for damage segmentation, processing high-resolution visual data can pose major computational difficulties. Traditionally, images are either uniformly downsampled or partitioned to cope with computational demands. However, the input is at risk of losing local fine details, such as thin cracks, or global contextual information. Inspired by super-resolution architectures, our vision transformer model learns to resize high-resolution images and masks to retain both the valuable local features and the global semantics without sacrificing computational efficiency. The proposed framework has been evaluated through comprehensive experiments on a dataset of bridge inspection report images using multiple metrics for pixel-wise materials detection.
    摘要 视觉检查主要用于评估土木结构的状态,而近年来无人机(UAV)和人工智能的发展提高了检查过程的速度、安全性和可靠性。在本研究中,我们开发了一种基于视觉 Transformer 和拉普拉斯金字塔缩放网络的语义分割网络,用于高效解析高分辨率视觉检查图像。检查过程中采集的大量高分辨率图像可能会拖慢调查进程;尽管已有大量研究致力于使用深度学习模型进行损伤分割,处理高分辨率视觉数据仍会带来巨大的计算困难。传统做法是对图像进行统一下采样或分块,以降低计算开销,但这会使输入面临丢失细小裂缝等局部细节或全局上下文信息的风险。受超分辨率结构的启发,我们的视觉 Transformer 模型学习对高分辨率图像和掩码进行尺度变换,在不牺牲计算效率的前提下同时保留有价值的局部特征和全局语义。我们在桥梁检查报告图像数据集上以多种逐像素材料检测指标对所提框架进行了全面的实验评估。

MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2308.03005
  • repo_url: https://github.com/xulianuwa/mctformer
  • paper_authors: Lian Xu, Mohammed Bennamoun, Farid Boussaid, Hamid Laga, Wanli Ouyang, Dan Xu
  • for: 通过生成准确的类别特定目标定位图作为伪标签,提升弱监督语义分割(WSSS)的效果。
  • methods: 基于 Transformer,提出一种多类 token Transformer,通过引入多个类别 token 实现类别感知的类-图像块交互,并利用对比类 token(CCT)模块增强判别性类 token 的学习。
  • results: 借助不同类别 token 对应的类-图像块注意力(提取方式的示意见本条目末尾),并结合图像块级成对亲和度进行细化,可有效生成类别判别性的目标定位图;与类激活映射(CAM)方法结合后,在 PASCAL VOC 2012 和 MS COCO 2014 数据集上显著提升了 WSSS 性能。
    Abstract This paper proposes a novel transformer-based framework that aims to enhance weakly supervised semantic segmentation (WSSS) by generating accurate class-specific object localization maps as pseudo labels. Building upon the observation that the attended regions of the one-class token in the standard vision transformer can contribute to a class-agnostic localization map, we explore the potential of the transformer model to capture class-specific attention for class-discriminative object localization by learning multiple class tokens. We introduce a Multi-Class Token transformer, which incorporates multiple class tokens to enable class-aware interactions with the patch tokens. To achieve this, we devise a class-aware training strategy that establishes a one-to-one correspondence between the output class tokens and the ground-truth class labels. Moreover, a Contrastive-Class-Token (CCT) module is proposed to enhance the learning of discriminative class tokens, enabling the model to better capture the unique characteristics and properties of each class. As a result, class-discriminative object localization maps can be effectively generated by leveraging the class-to-patch attentions associated with different class tokens. To further refine these localization maps, we propose the utilization of patch-level pairwise affinity derived from the patch-to-patch transformer attention. Furthermore, the proposed framework seamlessly complements the Class Activation Mapping (CAM) method, resulting in significantly improved WSSS performance on the PASCAL VOC 2012 and MS COCO 2014 datasets. These results underline the importance of the class token for WSSS.
    摘要 这篇论文提出了一种基于 Transformer 的新框架,通过生成准确的类别特定目标定位图作为伪标签来增强弱监督语义分割(WSSS)。该框架基于如下观察:标准视觉 Transformer 中单个类 token 所关注的区域可构成与类别无关的定位图;在此基础上,作者提出学习多个类别 token,以捕捉类别特定的注意力,实现类别判别性的目标定位。为此,作者提出多类 token Transformer,引入多个类别 token 实现与图像块 token 的类别感知交互,并设计类别感知的训练策略,使输出的类别 token 与真实类别标签一一对应。此外,作者提出对比类 token(CCT)模块,以增强判别性类别 token 的学习,使模型能够更好地捕捉每个类别的独特特征。借助不同类别 token 对应的类-图像块注意力,可以有效生成类别判别性的目标定位图;作者还进一步利用源自 patch-to-patch Transformer 注意力的图像块级成对亲和度来细化这些定位图。最后,所提框架可以与类激活映射(CAM)方法无缝结合,在 PASCAL VOC 2012 和 MS COCO 2014 数据集上显著提升了 WSSS 性能。这些结果证明了类别 token 对 WSSS 的重要性。
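The class-to-patch attention extraction that underlies the class-specific localization maps can be sketched as below. The token layout (class tokens placed before patch tokens) and the min-max normalization are assumptions about one plausible implementation, not the released MCTformer code.

```python
# Sketch: turn class-token-to-patch attention into class-specific localization maps.
import torch

def class_to_patch_maps(attn: torch.Tensor, num_classes: int,
                        h: int, w: int) -> torch.Tensor:
    """attn: (B, heads, T, T) self-attention with T = num_classes + h*w tokens.
    Returns (B, num_classes, h, w) localization maps."""
    attn = attn.mean(dim=1)                               # average over heads
    # Rows = queries (class tokens), columns = keys (patch tokens).
    c2p = attn[:, :num_classes, num_classes:]             # (B, C, h*w)
    maps = c2p.reshape(-1, num_classes, h, w)
    # Normalize each class map to [0, 1] so it can be thresholded into a pseudo label.
    flat = maps.flatten(2)
    mins = flat.min(-1, keepdim=True).values
    maxs = flat.max(-1, keepdim=True).values
    flat = (flat - mins) / (maxs - mins + 1e-6)
    return flat.reshape(-1, num_classes, h, w)

# Toy example: 20 classes, 14x14 patch grid, 6 attention heads.
attn = torch.rand(2, 6, 20 + 14 * 14, 20 + 14 * 14)
print(class_to_patch_maps(attn, 20, 14, 14).shape)  # torch.Size([2, 20, 14, 14])
```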

Weakly supervised segmentation of intracranial aneurysms using a 3D focal modulation UNet

  • paper_url: http://arxiv.org/abs/2308.03001
  • repo_url: None
  • paper_authors: Amirhossein Rasoulian, Soorena Salari, Yiming Xiao
  • for: 本文旨在提出一种基于弱监督学习的未破裂颅内动脉瘤(UIA)自动分割技术,以便更好地对这种脑血管疾病进行风险评估和治疗决策。
  • methods: 使用一种新的 3D 焦点调制 UNet(FocalSegNet)并结合条件随机场(CRF)后处理,利用粗略标签提升 UIA 的自动分割效果。
  • results: 所提方法实现了 UIA 的精确分割,并与现有的 3D UNet 和 Swin-UNETR 进行了比较,证明了该方法的优越性。
    Abstract Accurate identification and quantification of unruptured intracranial aneurysms (UIAs) are essential for the risk assessment and treatment decisions of this cerebrovascular disorder. Current assessment based on 2D manual measures of aneurysms on 3D magnetic resonance angiography (MRA) is sub-optimal and time-consuming. Automatic 3D measures can significantly benefit the clinical workflow and treatment outcomes. However, one major issue in medical image segmentation is the need for large well-annotated data, which can be expensive to obtain. Techniques that mitigate the requirement, such as weakly supervised learning with coarse labels are highly desirable. In this paper, we leverage coarse labels of UIAs from time-of-flight MRAs to obtain refined UIAs segmentation using a novel 3D focal modulation UNet, called FocalSegNet and conditional random field (CRF) postprocessing, with a Dice score of 0.68 and 95% Hausdorff distance of 0.95 mm. We evaluated the performance of the proposed algorithms against the state-of-the-art 3D UNet and Swin-UNETR, and demonstrated the superiority of the proposed FocalSegNet and the benefit of focal modulation for the task.
    摘要 准确识别和量化未破裂颅内动脉瘤(UIA)对这种脑血管疾病的风险评估和治疗决策至关重要。目前基于 3D 磁共振血管造影(MRA)对动脉瘤进行 2D 手动测量的评估方式既不理想又耗时。自动化的 3D 测量可以显著改善临床工作流程和治疗效果。然而,医学图像分割的一个主要问题是需要大量高质量标注数据,而这类数据的获取成本高昂,因此弱监督学习等利用粗略标签来降低标注需求的技术非常有价值。本文利用时间飞跃(time-of-flight)MRA 中 UIA 的粗略标签,通过一种新的 3D 焦点调制 UNet(FocalSegNet)和条件随机场(CRF)后处理得到精细的 UIA 分割结果,Dice 分数为 0.68,95% Hausdorff 距离为 0.95 mm。我们将所提算法与最先进的 3D UNet 和 Swin-UNETR 进行了对比评估,证明了 FocalSegNet 的优越性以及焦点调制对该任务的作用。

StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning

  • paper_url: http://arxiv.org/abs/2308.03000
  • repo_url: https://github.com/liuxianyi/styleedl
  • paper_authors: Peiguang Jing, Xianyi Liu, Ji Wang, Yinwei Wei, Liqiang Nie, Yuting Su
  • for: 图像情感分布学习。
  • methods: 风格引导的高阶注意力网络,包括基于 GRAM 的风格表示(计算示意见本条目末尾)、受对抗约束的高阶注意力机制,以及动态生成依赖内容的情感表示的风格化图卷积网络。
  • results: 与最先进方法相比能够更有效地进行图像情感分布学习,并通过在多个基准数据集上的大量实验得到验证。
    Abstract Emotion distribution learning has gained increasing attention with the tendency to express emotions through images. As for emotion ambiguity arising from humans' subjectivity, substantial previous methods generally focused on learning appropriate representations from the holistic or significant part of images. However, they rarely consider establishing connections with the stylistic information although it can lead to a better understanding of images. In this paper, we propose a style-guided high-order attention network for image emotion distribution learning termed StyleEDL, which interactively learns stylistic-aware representations of images by exploring the hierarchical stylistic information of visual contents. Specifically, we consider exploring the intra- and inter-layer correlations among GRAM-based stylistic representations, and meanwhile exploit an adversary-constrained high-order attention mechanism to capture potential interactions between subtle visual parts. In addition, we introduce a stylistic graph convolutional network to dynamically generate the content-dependent emotion representations to benefit the final emotion distribution learning. Extensive experiments conducted on several benchmark datasets demonstrate the effectiveness of our proposed StyleEDL compared to state-of-the-art methods. The implementation is released at: https://github.com/liuxianyi/StyleEDL.
    摘要 随着人们越来越倾向于通过图像表达情感,情感分布学习受到了日益增长的关注。针对由人的主观性带来的情感歧义问题,以往方法大多着眼于从图像的整体或显著部分学习合适的表示,却很少考虑与风格信息建立联系,而风格信息有助于更好地理解图像。本文提出一种风格引导的高阶注意力网络 StyleEDL,用于图像情感分布学习,通过挖掘视觉内容的层次化风格信息,交互式地学习图像的风格感知表示。具体而言,我们探索基于 GRAM 的风格表示在层内与层间的相关性,同时利用受对抗约束的高阶注意力机制来捕捉细微视觉部分之间的潜在交互。此外,我们引入风格化图卷积网络,动态生成依赖于内容的情感表示,以促进最终的情感分布学习。在多个基准数据集上的大量实验表明,StyleEDL 的效果优于最先进的方法。实现代码发布于 https://github.com/liuxianyi/StyleEDL。
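The GRAM-based stylistic representation referenced above is the classic Gram matrix of CNN activations, which captures channel-wise feature correlations as a proxy for style; a minimal implementation is shown below.

```python
# Gram matrix of a convolutional feature map: a standard stylistic descriptor.
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """features: (B, C, H, W) activations from some CNN layer.
    Returns (B, C, C) normalized Gram matrices."""
    b, c, h, w = features.shape
    f = features.reshape(b, c, h * w)
    gram = torch.bmm(f, f.transpose(1, 2))   # channel-by-channel correlations
    return gram / (c * h * w)                # normalize by the number of elements

feat = torch.randn(4, 256, 28, 28)           # e.g., an intermediate backbone stage
print(gram_matrix(feat).shape)               # torch.Size([4, 256, 256])
```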

Novel Class Discovery for Long-tailed Recognition

  • paper_url: http://arxiv.org/abs/2308.02989
  • repo_url: https://github.com/kleinzcy/ncdlr
  • paper_authors: Zhang Chuyu, Xu Ruijie, He Xuming
  • for: 本文研究了一种更加实际的新类发现问题,其中新类和已知类的分布呈长尾型。
  • methods: 本文提出了一种适应自动标注策略,基于类均匀代表。该方法通过解决一个松弛优化运动问题,生成高质量的假标签,有效地减少了类偏见。
  • results: 对CIFAR100、ImageNet100、Herbarium19和大规模iNaturalist18 dataset进行了广泛的实验,结果表明本方法具有优异性。代码可以在https://github.com/kleinzcy/NCDLR上下载。
    Abstract While the novel class discovery has recently made great progress, existing methods typically focus on improving algorithms on class-balanced benchmarks. However, in real-world recognition tasks, the class distributions of their corresponding datasets are often imbalanced, which leads to serious performance degeneration of those methods. In this paper, we consider a more realistic setting for novel class discovery where the distributions of novel and known classes are long-tailed. One main challenge of this new problem is to discover imbalanced novel classes with the help of long-tailed known classes. To tackle this problem, we propose an adaptive self-labeling strategy based on an equiangular prototype representation of classes. Our method infers high-quality pseudo-labels for the novel classes by solving a relaxed optimal transport problem and effectively mitigates the class biases in learning the known and novel classes. We perform extensive experiments on CIFAR100, ImageNet100, Herbarium19 and large-scale iNaturalist18 datasets, and the results demonstrate the superiority of our method. Our code is available at https://github.com/kleinzcy/NCDLR.
    摘要 新类发现方法在近几年取得了大量进展,但现有方法通常着眼于在类别均衡的基准上改进算法。然而,在实际识别任务中,数据集的类别分布通常是不均衡的,这会导致这些方法的性能严重下降。在这篇论文中,我们考虑了更真实的新类发现问题,其中新类和已知类的分布都是长尾分布。我们的主要挑战是借助长尾分布的已知类,发现不均衡的新类。为解决这个问题,我们提出了一种基于类别等角原型表示的自适应自标注策略。我们的方法通过求解一个松弛的最优传输问题生成高质量的伪标签,并有效缓解在学习已知类和新类时的类别偏差。我们在 CIFAR100、ImageNet100、Herbarium19 和大规模的 iNaturalist18 数据集上进行了广泛的实验,结果表明我们的方法具有优越性。我们的代码可以在 https://github.com/kleinzcy/NCDLR 上获取。
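The equiangular prototype representation mentioned in the methods can be made concrete with a simplex equiangular tight frame (ETF): C unit-norm prototypes whose pairwise cosine similarity is exactly -1/(C-1). The random orthonormal rotation below is an assumption; any orthonormal frame works, and this is only a sketch of the construction, not the paper's full self-labeling pipeline.

```python
# Sketch: construct equiangular (simplex ETF) class prototypes.
import torch

def equiangular_prototypes(num_classes: int, dim: int) -> torch.Tensor:
    """Returns (num_classes, dim) unit-norm prototypes with pairwise cosine
    -1/(C-1). Assumes dim >= num_classes."""
    C = num_classes
    # Random orthonormal frame U: (dim, C), via QR decomposition.
    U, _ = torch.linalg.qr(torch.randn(dim, C))
    centering = torch.eye(C) - torch.full((C, C), 1.0 / C)
    M = (C / (C - 1)) ** 0.5 * U @ centering        # (dim, C) simplex ETF
    return M.t()                                     # (C, dim), rows are prototypes

protos = equiangular_prototypes(num_classes=10, dim=128)
cos = protos @ protos.t()
print(torch.allclose(cos.diagonal(), torch.ones(10), atol=1e-5))        # unit norms
print(torch.allclose(cos[0, 1], torch.tensor(-1.0 / 9.0), atol=1e-5))   # equal angles
```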

Introducing Feature Attention Module on Convolutional Neural Network for Diabetic Retinopathy Detection

  • paper_url: http://arxiv.org/abs/2308.02985
  • repo_url: None
  • paper_authors: Susmita Ghosh, Abhiroop Chatterjee
  • for: The paper is written for proposing a new methodology for more accurate detection of diabetic retinopathy (DR) using deep learning models.
  • methods: The proposed method integrates a feature attention module with a pretrained VGG19 convolutional neural network (CNN) to enhance the discriminative power of the CNN. The feature attention module selectively highlights salient features from images and fuses them with the original input, which improves the network’s ability to focus on relevant information.
  • results: The proposed method achieves an accuracy of 95.70% on the APTOS (Asia Pacific Tele-Ophthalmology Society) DR Dataset, which is higher than the accuracy achieved by other state-of-the-art approaches.
    Abstract Diabetic retinopathy (DR) is a leading cause of blindness among diabetic patients. Deep learning models have shown promising results in automating the detection of DR. In the present work, we propose a new methodology that integrates a feature attention module with a pretrained VGG19 convolutional neural network (CNN) for more accurate DR detection. Here, the pretrained net is fine-tuned with the proposed feature attention block. The proposed module aims to leverage the complementary information from various regions of fundus images to enhance the discriminative power of the CNN. The said feature attention module incorporates an attention mechanism which selectively highlights salient features from images and fuses them with the original input. The simultaneous learning of attention weights for the features and thereupon the combination of attention-modulated features within the feature attention block facilitates the network's ability to focus on relevant information while reducing the impact of noisy or irrelevant features. Performance of the proposed method has been evaluated on a widely used dataset for diabetic retinopathy classification e.g., the APTOS (Asia Pacific Tele-Ophthalmology Society) DR Dataset. Results are compared with/without attention module, as well as with other state-of-the-art approaches. Results confirm that the introduction of the fusion module (fusing of feature attention module with CNN) improves the accuracy of DR detection achieving an accuracy of 95.70%.
    摘要 糖尿病视网膜病变(DR)是糖尿病患者失明的主要原因之一。深度学习模型在 DR 自动检测方面已展现出良好的前景。在本工作中,我们提出一种新方法,将特征注意力模块与预训练的 VGG19 卷积神经网络(CNN)相结合,以实现更准确的 DR 检测,其中预训练网络与所提出的特征注意力块一同进行微调。该模块旨在利用眼底图像中不同区域的互补信息来增强 CNN 的判别能力。特征注意力模块包含一种注意力机制,它有选择地突出图像中的显著特征并将其与原始输入融合;通过同时学习特征的注意力权重,并在特征注意力块内组合经注意力调制的特征,网络能够聚焦于相关信息,同时降低噪声或无关特征的影响。所提方法在广泛使用的糖尿病视网膜病变分类数据集 APTOS(Asia Pacific Tele-Ophthalmology Society)上进行了评估,并与不使用注意力模块的情况以及其他最先进方法进行了比较。结果证实,引入融合模块(将特征注意力模块与 CNN 融合)可以提高 DR 检测的准确率,达到 95.70%。
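A hedged sketch of the overall idea (a pretrained VGG19 backbone followed by a small attention block that highlights salient locations and fuses them back into the features) is given below. The attention block design and the five-class output are assumptions, not the paper's exact module.

```python
# Sketch: VGG19 features + a simple feature-attention block for DR grading.
import torch
import torch.nn as nn
from torchvision import models

class FeatureAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(channels, channels // 8, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 8, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.score(x)          # (B, 1, H, W) saliency map
        return x + x * attn           # fuse attended features with the original input

class VGG19WithAttention(nn.Module):
    def __init__(self, num_classes: int = 5):   # e.g., APTOS severity grades (assumption)
        super().__init__()
        vgg = models.vgg19(weights=None)         # use VGG19_Weights.DEFAULT for pretraining
        self.features = vgg.features             # convolutional backbone
        self.attention = FeatureAttention(512)   # VGG19 features end with 512 channels
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.attention(self.features(x))
        return self.classifier(self.pool(x).flatten(1))

model = VGG19WithAttention()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 5])
```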

Focus the Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.02983
  • repo_url: https://github.com/xcyao00/fod
  • paper_authors: Xincheng Yao, Ruoqi Li, Zefeng Qian, Yan Luo, Chongyang Zhang
  • for: 本文提出了一种新的异常检测方法,即 FOcus-the-Discrepancy(FOD),用于同时检测图像中异常点的patch-wise、intra-和inter-异常。
  • methods: 本文使用了Transformer模型,并对其进行修改,以便更好地捕捉图像中异常点的patch-wise和inter-image correlations。特别是,本文提出了一种新的自我相关映射修改方法,称为Intra-Inter-Correlation(I2Correlation),用于同时建立图像中patch-wise和inter-image的相关性。
  • results: 本文的实验结果表明,FOD方法可以在三个Unsupervised Real-World AD benchmark上达到非常高的异常检测性能。 Code将会在https://github.com/xcyao00/FOD上提供。
    Abstract Humans recognize anomalies through two aspects: larger patch-wise representation discrepancies and weaker patch-to-normal-patch correlations. However, the previous AD methods didn't sufficiently combine the two complementary aspects to design AD models. To this end, we find that Transformer can ideally satisfy the two aspects as its great power in the unified modeling of patch-wise representations and patch-to-patch correlations. In this paper, we propose a novel AD framework: FOcus-the-Discrepancy (FOD), which can simultaneously spot the patch-wise, intra- and inter-discrepancies of anomalies. The major characteristic of our method is that we renovate the self-attention maps in transformers to Intra-Inter-Correlation (I2Correlation). The I2Correlation contains a two-branch structure to first explicitly establish intra- and inter-image correlations, and then fuses the features of two-branch to spotlight the abnormal patterns. To learn the intra- and inter-correlations adaptively, we propose the RBF-kernel-based target-correlations as learning targets for self-supervised learning. Besides, we introduce an entropy constraint strategy to solve the mode collapse issue in optimization and further amplify the normal-abnormal distinguishability. Extensive experiments on three unsupervised real-world AD benchmarks show the superior performance of our approach. Code will be available at https://github.com/xcyao00/FOD.
    摘要 人类通过两个方面识别异常:一是较大的图像块级表示差异,二是异常块与正常块之间较弱的相关性。然而,以往的异常检测(AD)方法没有充分结合这两个互补的方面来设计模型。为此,我们发现 Transformer 能够理想地同时满足这两个方面,因为它擅长对图像块级表示和块与块之间的相关性进行统一建模。在这篇文章中,我们提出了一种新的异常检测框架 FOcus-the-Discrepancy(FOD),它可以同时发现异常的块级差异、图像内差异和图像间差异。我们方法的主要特点是将 Transformer 的自注意力图改造为 Intra-Inter-Correlation(I2Correlation):它采用双分支结构,先显式建立图像内与图像间的相关性,再融合两个分支的特征以突出异常模式。为了自适应地学习图像内与图像间的相关性,我们提出以基于 RBF 核的目标相关性作为自监督学习的目标;此外,我们引入熵约束策略来解决优化过程中的模式坍塌问题,并进一步放大正常与异常之间的可区分性。在三个无监督的真实世界异常检测基准上的大量实验表明了我们方法的优越性能。代码将在 https://github.com/xcyao00/FOD 上提供。

Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation

  • paper_url: http://arxiv.org/abs/2308.02982
  • repo_url: https://github.com/mr-neko/jm3d
  • paper_authors: Haowei Wang, Jiji Tang, Jiayi Ji, Xiaoshuai Sun, Rongsheng Zhang, Yiwei Ma, Minda Zhao, Lincheng Li, zeng zhao, Tangjie Lv, Rongrong Ji
  • for: 本文提出了一种多视图共同模式匹配方法,以解决现有方法所导致的信息损失和共同匹配问题。
  • methods: 本文提出了一种新的结构多模态组织器(SMO)和一种联合多模态匹配(JMA)两部分。SMO通过将多视图图像和层次文本纳入一起,提高了视觉和语言模式的表示。JMA通过将语言知识 incorporated into the visual modality,实现了多模态之间的共同匹配。
  • results: 对于 ModelNet40 和 ScanObjectNN 两个数据集,本文提出的 JM3D 在零样本 3D 分类上实现了最先进的性能。与 ULIP 相比,JM3D 在 PointMLP 上提升约 4.3%,在 PointNet++ 上的 top-1 准确率最多提升 6.5%。
    Abstract In recent years, 3D representation learning has turned to 2D vision-language pre-trained models to overcome data scarcity challenges. However, existing methods simply transfer 2D alignment strategies, aligning 3D representations with single-view 2D images and coarse-grained parent category text. These approaches introduce information degradation and insufficient synergy issues, leading to performance loss. Information degradation arises from overlooking the fact that a 3D representation should be equivalent to a series of multi-view images and more fine-grained subcategory text. Insufficient synergy neglects the idea that a robust 3D representation should align with the joint vision-language space, rather than independently aligning with each modality. In this paper, we propose a multi-view joint modality modeling approach, termed JM3D, to obtain a unified representation for point cloud, text, and image. Specifically, a novel Structured Multimodal Organizer (SMO) is proposed to address the information degradation issue, which introduces contiguous multi-view images and hierarchical text to enrich the representation of vision and language modalities. A Joint Multi-modal Alignment (JMA) is designed to tackle the insufficient synergy problem, which models the joint modality by incorporating language knowledge into the visual modality. Extensive experiments on ModelNet40 and ScanObjectNN demonstrate the effectiveness of our proposed method, JM3D, which achieves state-of-the-art performance in zero-shot 3D classification. JM3D outperforms ULIP by approximately 4.3% on PointMLP and achieves an improvement of up to 6.5% accuracy on PointNet++ in top-1 accuracy for zero-shot 3D classification on ModelNet40. The source code and trained models for all our experiments are publicly available at https://github.com/Mr-Neko/JM3D.
    摘要 近年来,3D 表示学习转向借助 2D 视觉-语言预训练模型来克服数据稀缺问题。然而,现有方法只是简单地迁移 2D 对齐策略,将 3D 表示与单视图 2D 图像和粗粒度的父类别文本进行对齐。这些方法带来信息退化和协同不足两个问题,从而导致性能下降。信息退化源于忽略了 3D 表示应当对应一系列多视图图像和更细粒度的子类别文本;协同不足则是因为忽略了稳健的 3D 表示应与联合的视觉-语言空间对齐,而不是分别与每个模态独立对齐。在本文中,我们提出了一种多视图联合模态建模方法,称为 JM3D,以获得点云、文本和图像的统一表示。具体来说,我们提出了新的结构化多模态组织器(SMO)来解决信息退化问题,通过引入连续的多视图图像和层次文本来丰富视觉与语言模态的表示;并设计了联合多模态对齐(JMA)来解决协同不足问题,将语言知识融入视觉模态以建模联合模态。我们在 ModelNet40 和 ScanObjectNN 上进行了广泛的实验,证明所提出的 JM3D 在零样本 3D 分类中达到了最先进的性能:在 ModelNet40 的零样本 3D 分类上,JM3D 在 PointMLP 上比 ULIP 高出约 4.3%,在 PointNet++ 上的 top-1 准确率最多提升 6.5%。我们在 https://github.com/Mr-Neko/JM3D 上提供了所有实验代码和训练模型。

Robust estimation of exposure ratios in multi-exposure image stacks

  • paper_url: http://arxiv.org/abs/2308.02968
  • repo_url: https://github.com/gfxdisp/hdrutils
  • paper_authors: Param Hanji, Rafał K. Mantiuk
  • for: The paper is written for those who need to merge multi-exposure image stacks into a high dynamic range (HDR) image and want to eliminate banding artifacts caused by inaccurate exposure times.
  • methods: The paper proposes to estimate exposure ratios directly from the input images, formulating the estimation as an optimization problem that minimizes the error caused by camera noise. In the logarithmic domain the problem reduces to a linear solve, and the method is made robust to pixel misalignment caused by camera or object motion (a minimal sketch follows this entry).
  • results: The proposed method eliminates banding artifacts in popular datasets and is essential for applications that require physically accurate reconstructions, such as measuring the modulation transfer function of a display. The code for the method is available.
    Abstract Merging multi-exposure image stacks into a high dynamic range (HDR) image requires knowledge of accurate exposure times. When exposure times are inaccurate, for example, when they are extracted from a camera's EXIF metadata, the reconstructed HDR images reveal banding artifacts at smooth gradients. To remedy this, we propose to estimate exposure ratios directly from the input images. We derive the exposure time estimation as an optimization problem, in which pixels are selected from pairs of exposures to minimize estimation error caused by camera noise. When pixel values are represented in the logarithmic domain, the problem can be solved efficiently using a linear solver. We demonstrate that the estimation can be easily made robust to pixel misalignment caused by camera or object motion by collecting pixels from multiple spatial tiles. The proposed automatic exposure estimation and alignment eliminates banding artifacts in popular datasets and is essential for applications that require physically accurate reconstructions, such as measuring the modulation transfer function of a display. The code for the method is available.

Generative Approach for Probabilistic Human Mesh Recovery using Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.02963
  • repo_url: https://github.com/hanbyel0105/diff-hmr
  • paper_authors: Hanbyel Cho, Junmo Kim
  • for: Diff-HMR reconstructs a 3D human body mesh from a 2D image while accounting for the multiple plausible solutions inherent to the task.
  • methods: Diff-HMR uses a denoising diffusion process: during training, SMPL parameters are diffused from the ground-truth values toward a random distribution and the model learns the reverse process; at inference, random SMPL parameters are progressively refined to match the input image.
  • results: Diff-HMR generates diverse results for the same input image as the input noise varies, and experiments show the framework effectively models the inherent ambiguity of human mesh recovery in a probabilistic manner.
    Abstract This work focuses on the problem of reconstructing a 3D human body mesh from a given 2D image. Despite the inherent ambiguity of the task of human mesh recovery, most existing works have adopted a method of regressing a single output. In contrast, we propose a generative approach framework, called "Diffusion-based Human Mesh Recovery (Diff-HMR)" that takes advantage of the denoising diffusion process to account for multiple plausible outcomes. During the training phase, the SMPL parameters are diffused from ground-truth parameters to random distribution, and Diff-HMR learns the reverse process of this diffusion. In the inference phase, the model progressively refines the given random SMPL parameters into the corresponding parameters that align with the input image. Diff-HMR, being a generative approach, is capable of generating diverse results for the same input image as the input noise varies. We conduct validation experiments, and the results demonstrate that the proposed framework effectively models the inherent ambiguity of the task of human mesh recovery in a probabilistic manner. The code is available at https://github.com/hanbyel0105/Diff-HMR
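The following is a minimal sketch of the training objective implied by the abstract: SMPL parameters are diffused toward noise and a conditional denoiser learns the reverse process. The noise schedule, parameter dimensionality, and the denoiser's conditioning interface are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of diffusing SMPL parameters during training (DDPM-style).
# Assumptions: the 1000-step linear schedule, tensor shapes, and the denoiser
# interface are illustrative, not the Diff-HMR setup.
import torch

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(denoiser, smpl_gt, img_feat):
    """smpl_gt: (B, P) ground-truth SMPL pose/shape vector, img_feat: (B, C)."""
    B = smpl_gt.size(0)
    t = torch.randint(0, T, (B,))
    noise = torch.randn_like(smpl_gt)
    a_bar = alphas_bar[t].unsqueeze(-1)
    # Forward process: diffuse ground-truth parameters toward a random distribution.
    x_t = a_bar.sqrt() * smpl_gt + (1.0 - a_bar).sqrt() * noise
    # The network learns the reverse process, conditioned on image features.
    pred_noise = denoiser(x_t, t, img_feat)
    return torch.nn.functional.mse_loss(pred_noise, noise)
```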

DermoSegDiff: A Boundary-aware Segmentation Diffusion Model for Skin Lesion Delineation

  • paper_url: http://arxiv.org/abs/2308.02959
  • repo_url: https://github.com/mindflow-institue/dermosegdiff
  • paper_authors: Afshin Bozorgpour, Yousef Sadegheih, Amirhossein Kazerouni, Reza Azad, Dorit Merhof
  • for: Early detection and accurate diagnosis of dermatological conditions through skin lesion segmentation.
  • methods: Uses Denoising Diffusion Probabilistic Models (DDPMs) with a U-Net-based denoising network, incorporating boundary information during learning via a loss that gradually reduces the significance of non-boundary regions.
  • results: Achieves better effectiveness and generalization than existing CNN, transformer, and diffusion-based methods on multiple skin segmentation datasets.
    Abstract Skin lesion segmentation plays a critical role in the early detection and accurate diagnosis of dermatological conditions. Denoising Diffusion Probabilistic Models (DDPMs) have recently gained attention for their exceptional image-generation capabilities. Building on these advancements, we propose DermoSegDiff, a novel framework for skin lesion segmentation that incorporates boundary information during the learning process. Our approach introduces a novel loss function that prioritizes the boundaries during training, gradually reducing the significance of other regions. We also introduce a novel U-Net-based denoising network that proficiently integrates noise and semantic information inside the network. Experimental results on multiple skin segmentation datasets demonstrate the superiority of DermoSegDiff over existing CNN, transformer, and diffusion-based approaches, showcasing its effectiveness and generalization in various scenarios. The implementation is publicly accessible on \href{https://github.com/mindflow-institue/dermosegdiff}{GitHub}
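A minimal sketch of a boundary-prioritizing loss in the spirit described above: a thin band around the lesion contour receives a larger weight, and the weight of non-boundary regions is gradually reduced during training. The morphological band extraction, weight values, and annealing schedule are illustrative assumptions, not the paper's exact loss.

```python
# Minimal sketch of a boundary-weighted segmentation loss.  Assumptions: the
# max-pool boundary extraction, weight values, and linear annealing schedule
# are illustrative choices, not the DermoSegDiff loss.
import torch
import torch.nn.functional as F

def boundary_weight(mask, width=5, w_boundary=5.0, w_other=1.0):
    """mask: (B, 1, H, W) binary lesion mask -> per-pixel loss weights."""
    pad = width // 2
    dilated = F.max_pool2d(mask, width, stride=1, padding=pad)
    eroded = 1.0 - F.max_pool2d(1.0 - mask, width, stride=1, padding=pad)
    boundary = (dilated - eroded).clamp(0, 1)          # thin band around the contour
    return w_other + (w_boundary - w_other) * boundary

def weighted_loss(pred, target, step, total_steps):
    # Gradually reduce the significance of non-boundary regions during training.
    w_other = max(1.0 - step / total_steps, 0.1)
    w = boundary_weight(target, w_other=w_other)
    pixel_loss = F.binary_cross_entropy_with_logits(pred, target, reduction="none")
    return (w * pixel_loss).mean()
```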

K-band: Self-supervised MRI Reconstruction via Stochastic Gradient Descent over K-space Subsets

  • paper_url: http://arxiv.org/abs/2308.02958
  • repo_url: https://github.com/mikgroup/k-band
  • paper_authors: Frederic Wang, Han Qi, Alfredo De Goyeneche, Reinhard Heckel, Michael Lustig, Efrat Shimron
  • for: Enabling deep learning reconstruction methods to be trained using only partial, limited-resolution k-space data.
  • methods: Introduces a new mathematical framework, dubbed k-band, for training deep learning models on limited-resolution k-space data; in each training iteration, gradients are computed using only a small portion (a band) of k-space.
  • results: Numerical experiments on raw MRI data show that k-band outperforms other methods trained on limited-resolution data and performs comparably to state-of-the-art (SoTA) methods trained on high-resolution data; k-band thus obtains SoTA performance without requiring high-quality training data.
    Abstract Although deep learning (DL) methods are powerful for solving inverse problems, their reliance on high-quality training data is a major hurdle. This is significant in high-dimensional (dynamic/volumetric) magnetic resonance imaging (MRI), where acquisition of high-resolution fully sampled k-space data is impractical. We introduce a novel mathematical framework, dubbed k-band, that enables training DL models using only partial, limited-resolution k-space data. Specifically, we introduce training with stochastic gradient descent (SGD) over k-space subsets. In each training iteration, rather than using the fully sampled k-space for computing gradients, we use only a small k-space portion. This concept is compatible with different sampling strategies; here we demonstrate the method for k-space "bands", which have limited resolution in one dimension and can hence be acquired rapidly. We prove analytically that our method stochastically approximates the gradients computed in a fully-supervised setup, when two simple conditions are met: (i) the limited-resolution axis is chosen randomly-uniformly for every new scan, hence k-space is fully covered across the entire training set, and (ii) the loss function is weighed with a mask, derived here analytically, which facilitates accurate reconstruction of high-resolution details. Numerical experiments with raw MRI data indicate that k-band outperforms two other methods trained on limited-resolution data and performs comparably to state-of-the-art (SoTA) methods trained on high-resolution data. k-band hence obtains SoTA performance, with the advantage of training using only limited-resolution data. This work hence introduces a practical, easy-to-implement, self-supervised training framework, which involves fast acquisition and self-supervised reconstruction and offers theoretical guarantees.
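A minimal sketch of one k-band training step: a random limited-resolution band is sampled per scan, and the self-supervised data-consistency loss is computed only over the acquired band. The 2D single-coil setting, the band geometry, and the flat in-band weight (in place of the analytically derived loss mask) are illustrative assumptions.

```python
# Minimal sketch of SGD over k-space band subsets.  Assumptions: 2D single-coil
# data, a centered horizontal or vertical band, and a flat in-band weight
# standing in for the paper's analytically derived loss mask.
import torch

def sample_band_mask(H, W, band_frac=0.25):
    mask = torch.zeros(H, W)
    if torch.rand(()) < 0.5:                      # randomly pick the limited-resolution axis
        h = int(H * band_frac)
        mask[(H - h) // 2:(H + h) // 2, :] = 1.0  # band with limited resolution along ky
    else:
        w = int(W * band_frac)
        mask[:, (W - w) // 2:(W + w) // 2] = 1.0  # band with limited resolution along kx
    return mask

def kband_step(model, kspace_band, mask):
    """kspace_band: (B, H, W) complex k-space, nonzero only inside `mask`."""
    zero_filled = torch.fft.ifft2(kspace_band)
    recon = model(zero_filled)                    # image-domain reconstruction
    pred_k = torch.fft.fft2(recon)
    # Self-supervised loss: data consistency only where k-space was acquired.
    diff = (pred_k - kspace_band) * mask
    return diff.abs().pow(2).mean()
```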

Multispectral Quantitative Phase Imaging Using a Diffractive Optical Network

  • paper_url: http://arxiv.org/abs/2308.02952
  • repo_url: None
  • paper_authors: Che-Yung Shen, Jingxi Li, Deniz Mengu, Aydogan Ozcan
  • for: A diffractive processor that enables high-throughput, power-efficient quantitative phase imaging (QPI) of transparent specimens for applications in biology, materials science, and engineering.
  • methods: Uses spatially engineered diffractive layers, optimized through deep learning, to encode the phase profile of the input object at a predetermined set of wavelengths into intensity variations at the output plane, enabling snapshot multispectral QPI with a monochrome focal plane array.
  • results: Numerical simulations demonstrate diffractive multispectral processors that simultaneously perform quantitative phase imaging at 9 and 16 target spectral bands, maintaining uniform performance across all wavelength channels with decent QPI quality at each target wavelength.
    Abstract As a label-free imaging technique, quantitative phase imaging (QPI) provides optical path length information of transparent specimens for various applications in biology, materials science, and engineering. Multispectral QPI measures quantitative phase information across multiple spectral bands, permitting the examination of wavelength-specific phase and dispersion characteristics of samples. Here, we present the design of a diffractive processor that can all-optically perform multispectral quantitative phase imaging of transparent phase-only objects in a snapshot. Our design utilizes spatially engineered diffractive layers, optimized through deep learning, to encode the phase profile of the input object at a predetermined set of wavelengths into spatial intensity variations at the output plane, allowing multispectral QPI using a monochrome focal plane array. Through numerical simulations, we demonstrate diffractive multispectral processors to simultaneously perform quantitative phase imaging at 9 and 16 target spectral bands in the visible spectrum. These diffractive multispectral processors maintain uniform performance across all the wavelength channels, revealing a decent QPI performance at each target wavelength. The generalization of these diffractive processor designs is validated through numerical tests on unseen objects, including thin Pap smear images. Due to its all-optical processing capability using passive dielectric diffractive materials, this diffractive multispectral QPI processor offers a compact and power-efficient solution for high-throughput quantitative phase microscopy and spectroscopy. This framework can operate at different parts of the electromagnetic spectrum and be used for a wide range of phase imaging and sensing applications.

MomentaMorph: Unsupervised Spatial-Temporal Registration with Momenta, Shooting, and Correction

  • paper_url: http://arxiv.org/abs/2308.02949
  • repo_url: None
  • paper_authors: Zhangxing Bian, Shuwen Wei, Yihao Liu, Junyu Chen, Jiachen Zhuo, Fangxu Xing, Jonghye Woo, Aaron Carass, Jerry L. Prince
  • for: This paper aims to address the challenges of registering magnetic resonance imaging (MRI) data with large motion and repetitive patterns, which can lead to motion estimation errors.
  • methods: The proposed method uses a "momenta, shooting, and correction" framework grounded in Lie algebra and Lie group principles. This framework accumulates momenta in the tangent vector space and employs exponential mapping in the diffeomorphic space for rapid approximation towards true optima, circumventing local optima.
  • results: The method is demonstrated to be efficient in estimating accurate, dense, and diffeomorphic 2D/3D motion fields amidst large motion and repetitive patterns on both a 2D synthetic dataset and a real 3D tMRI dataset.
    Abstract Tagged magnetic resonance imaging (tMRI) has been employed for decades to measure the motion of tissue undergoing deformation. However, registration-based motion estimation from tMRI is difficult due to the periodic patterns in these images, particularly when the motion is large. With a larger motion the registration approach gets trapped in a local optima, leading to motion estimation errors. We introduce a novel "momenta, shooting, and correction" framework for Lagrangian motion estimation in the presence of repetitive patterns and large motion. This framework, grounded in Lie algebra and Lie group principles, accumulates momenta in the tangent vector space and employs exponential mapping in the diffeomorphic space for rapid approximation towards true optima, circumventing local optima. A subsequent correction step ensures convergence to true optima. The results on a 2D synthetic dataset and a real 3D tMRI dataset demonstrate our method's efficiency in estimating accurate, dense, and diffeomorphic 2D/3D motion fields amidst large motion and repetitive patterns.

Blind Motion Deblurring with Pixel-Wise Kernel Estimation via Kernel Prediction Networks

  • paper_url: http://arxiv.org/abs/2308.02947
  • repo_url: https://github.com/guillermocarbajal/j-mkpd
  • paper_authors: Guillermo Carbajal, Patricia Vitoria, José Lezama, Pablo Musé
  • for: Removing motion blur from photographs captured with a camera.
  • methods: A deep learning approach that first estimates dense per-pixel motion blur kernels and then applies non-blind deconvolution.
  • results: Produces restorations of real blurred images that are competitive with or superior to those obtained with existing end-to-end deep learning methods.
    Abstract In recent years, the removal of motion blur in photographs has seen impressive progress in the hands of deep learning-based methods, trained to map directly from blurry to sharp images. For this reason, approaches that explicitly use a forward degradation model received significantly less attention. However, a well-defined specification of the blur genesis, as an intermediate step, promotes the generalization and explainability of the method. Towards this goal, we propose a learning-based motion deblurring method based on dense non-uniform motion blur estimation followed by a non-blind deconvolution approach. Specifically, given a blurry image, a first network estimates the dense per-pixel motion blur kernels using a lightweight representation composed of a set of image-adaptive basis motion kernels and the corresponding mixing coefficients. Then, a second network trained jointly with the first one, unrolls a non-blind deconvolution method using the motion kernel field estimated by the first network. The model-driven aspect is further promoted by training the networks on sharp/blurry pairs synthesized according to a convolution-based, non-uniform motion blur degradation model. Qualitative and quantitative evaluation shows that the kernel prediction network produces accurate motion blur estimates, and that the deblurring pipeline leads to restorations of real blurred images that are competitive or superior to those obtained with existing end-to-end deep learning-based methods. Code and trained models are available at https://github.com/GuillermoCarbajal/J-MKPD/.
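The per-pixel kernel representation described above can be made concrete as a mixture of a few basis kernels with spatially varying mixing coefficients: the blurry image is a coefficient-weighted sum of the sharp image convolved with each basis kernel. A minimal sketch follows; shapes and kernel sizes are illustrative assumptions.

```python
# Minimal sketch of the per-pixel kernel representation: the blur at pixel x is
# a mixture sum_i c_i(x) * B_i of K basis kernels, so the blurry image is a
# coefficient-weighted sum of K convolutions of the sharp image.
# Assumptions: grayscale images and shapes chosen for illustration.
import torch
import torch.nn.functional as F

def synthesize_blur(sharp, basis, coeffs):
    """sharp: (B, 1, H, W), basis: (K, k, k), coeffs: (B, K, H, W) summing to 1 over K."""
    K, k, _ = basis.shape
    kernels = basis.view(K, 1, k, k)
    # One convolution per basis kernel (output channel i = sharp * B_i).
    per_basis = F.conv2d(sharp, kernels, padding=k // 2)      # (B, K, H, W)
    return (coeffs * per_basis).sum(dim=1, keepdim=True)      # (B, 1, H, W)

sharp = torch.rand(2, 1, 64, 64)
basis = torch.softmax(torch.randn(4, 9 * 9), dim=-1).view(4, 9, 9)   # normalized kernels
coeffs = torch.softmax(torch.randn(2, 4, 64, 64), dim=1)
blurry = synthesize_blur(sharp, basis, coeffs)
print(blurry.shape)   # torch.Size([2, 1, 64, 64])
```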

Automatic registration with continuous pose updates for marker-less surgical navigation in spine surgery

  • paper_url: http://arxiv.org/abs/2308.02917
  • repo_url: None
  • paper_authors: Florentin Liebmann, Marco von Atzigen, Dominik Stütz, Julian Wolf, Lukas Zingg, Daniel Suter, Laura Leoty, Hooman Esfandiari, Jess G. Snedeker, Martin R. Oswald, Marc Pollefeys, Mazda Farshad, Philipp Fürnstahl
  • for: Developing a marker-less surgical navigation system that improves pedicle screw placement accuracy in lumbar spinal fusion surgery.
  • methods: A deep neural network automatically registers the lumbar spine in a radiation-free manner, with per-vertebra refinement updated in real time using GPU acceleration and integrated into an augmented reality navigation system.
  • results: On a public dataset, 96% successful registrations, a target registration error of 2.73 mm, a screw trajectory error of 1.79°, and a screw entry point error of 2.43 mm; an ex-vivo surgery additionally yielded 100% screw accuracy and 1.20 mm registration accuracy.
    Abstract Established surgical navigation systems for pedicle screw placement have been proven to be accurate, but still reveal limitations in registration or surgical guidance. Registration of preoperative data to the intraoperative anatomy remains a time-consuming, error-prone task that includes exposure to harmful radiation. Surgical guidance through conventional displays has well-known drawbacks, as information cannot be presented in-situ and from the surgeon's perspective. Consequently, radiation-free and more automatic registration methods with subsequent surgeon-centric navigation feedback are desirable. In this work, we present an approach that automatically solves the registration problem for lumbar spinal fusion surgery in a radiation-free manner. A deep neural network was trained to segment the lumbar spine and simultaneously predict its orientation, yielding an initial pose for preoperative models, which then is refined for each vertebra individually and updated in real-time with GPU acceleration while handling surgeon occlusions. An intuitive surgical guidance is provided thanks to the integration into an augmented reality based navigation system. The registration method was verified on a public dataset with a mean of 96\% successful registrations, a target registration error of 2.73 mm, a screw trajectory error of 1.79{\deg} and a screw entry point error of 2.43 mm. Additionally, the whole pipeline was validated in an ex-vivo surgery, yielding a 100\% screw accuracy and a registration accuracy of 1.20 mm. Our results meet clinical demands and emphasize the potential of RGB-D data for fully automatic registration approaches in combination with augmented reality guidance.

DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

  • paper_url: http://arxiv.org/abs/2308.02915
  • repo_url: None
  • paper_authors: Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan
  • for: Generating high-resolution, long-form dance sequences that align effectively with the input music.
  • methods: A cascaded motion diffusion model (DiffDance) comprising a music-to-dance diffusion model and a sequence super-resolution diffusion model; for conditional generation, DiffDance extracts music embeddings with a pretrained audio representation learning model and aligns the embedding space to motion via a contrastive loss.
  • results: Extensive experiments on the AIST++ dataset show that DiffDance generates realistic dance sequences that align effectively with the input music, with results comparable to state-of-the-art autoregressive methods.
    Abstract When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation. This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model. To bridge the gap between music and motion for conditional generation, DiffDance employs a pretrained audio representation learning model to extract music embeddings and further align its embedding space to motion via contrastive loss. During training our cascaded diffusion model, we also incorporate multiple geometric losses to constrain the model outputs to be physically plausible and add a dynamic loss weight that adaptively changes over diffusion timesteps to facilitate sample diversity. Through comprehensive experiments performed on the benchmark dataset AIST++, we demonstrate that DiffDance is capable of generating realistic dance sequences that align effectively with the input music. These results are comparable to those achieved by state-of-the-art autoregressive methods.
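A minimal sketch of the contrastive alignment between music and motion embeddings mentioned above, using a standard symmetric InfoNCE objective over paired clips; the embedding dimension and temperature are illustrative assumptions, and the pretrained audio encoder is taken as given.

```python
# Minimal sketch of contrastive music-motion alignment (symmetric InfoNCE).
# Assumptions: embedding dimension and temperature are illustrative; the
# pretrained audio representation model producing music_emb is taken as given.
import torch
import torch.nn.functional as F

def music_motion_contrastive(music_emb, motion_emb, temperature=0.1):
    """music_emb, motion_emb: (B, D) embeddings of paired music/dance clips."""
    m = F.normalize(music_emb, dim=-1)
    d = F.normalize(motion_emb, dim=-1)
    logits = m @ d.t() / temperature
    labels = torch.arange(m.size(0), device=m.device)
    # Pull each music clip toward its own dance clip and push away the rest.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
```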

cs.AI - 2023-08-06

AI-GOMS: Large AI-Driven Global Ocean Modeling System

  • paper_url: http://arxiv.org/abs/2308.03152
  • repo_url: None
  • paper_authors: Wei Xiong, Yanfei Xiang, Hao Wu, Shuyi Zhou, Yuze Sun, Muyuan Ma, Xiaomeng Huang
  • for: Proposing a large AI-driven global ocean modeling system (AI-GOMS) for accurate and efficient global ocean daily prediction.
  • methods: A backbone model with a Fourier-based Masked Autoencoder structure for basic ocean variable prediction, plus lightweight fine-tuning models incorporating regional downscaling, wave decoding, and biochemistry coupling modules.
  • results: Achieves the best performance for 30-day prediction of global ocean basic variables with 15 depth layers at 1/4° spatial resolution; beyond statistical metrics, AI-GOMS also simulates mesoscale eddies in the Kuroshio region at 1/12° resolution and ocean stratification in the tropical Pacific Ocean.
    Abstract Ocean modeling is a powerful tool for simulating the physical, chemical, and biological processes of the ocean, which is the foundation for marine science research and operational oceanography. Modern numerical ocean modeling mainly consists of governing equations and numerical algorithms. Nonlinear instability, computational expense, low reusability efficiency and high coupling costs have gradually become the main bottlenecks for the further development of numerical ocean modeling. Recently, artificial intelligence-based modeling in scientific computing has shown revolutionary potential for digital twins and scientific simulations, but the bottlenecks of numerical ocean modeling have not been further solved. Here, we present AI-GOMS, a large AI-driven global ocean modeling system, for accurate and efficient global ocean daily prediction. AI-GOMS consists of a backbone model with the Fourier-based Masked Autoencoder structure for basic ocean variable prediction and lightweight fine-tuning models incorporating regional downscaling, wave decoding, and biochemistry coupling modules. AI-GOMS has achieved the best performance in 30 days of prediction for the global ocean basic variables with 15 depth layers at 1/4{\deg} spatial resolution. Beyond the good performance in statistical metrics, AI-GOMS realizes the simulation of mesoscale eddies in the Kuroshio region at 1/12{\deg} spatial resolution and ocean stratification in the tropical Pacific Ocean. AI-GOMS provides a new backbone-downstream paradigm for Earth system modeling, which makes the system transferable, scalable and reusable.

“We care”: Improving Code Mixed Speech Emotion Recognition in Customer-Care Conversations

  • paper_url: http://arxiv.org/abs/2308.03150
  • repo_url: None
  • paper_authors: N V S Abhishek, Pushpak Bhattacharyya
  • for: Improving speech emotion recognition (SER) for noisy, code-mixed customer-care conversations.
  • methods: Builds the Natural Speech Emotion Dataset (NSED), a natural code-mixed conversational dataset annotated with emotion, sentiment, and valence-arousal-dominance (VAD) values, and incorporates word-level VAD values into the SER task.
  • results: Incorporating word-level VAD values improves SER accuracy on NSED by 2% for negative emotions over the baseline.
    Abstract Speech Emotion Recognition (SER) is the task of identifying the emotion expressed in a spoken utterance. Emotion recognition is essential in building robust conversational agents in domains such as law, healthcare, education, and customer support. Most of the studies published on SER use datasets created by employing professional actors in a noise-free environment. In natural settings such as a customer care conversation, the audio is often noisy with speakers regularly switching between different languages as they see fit. We have worked in collaboration with a leading unicorn in the Conversational AI sector to develop Natural Speech Emotion Dataset (NSED). NSED is a natural code-mixed speech emotion dataset where each utterance in a conversation is annotated with emotion, sentiment, valence, arousal, and dominance (VAD) values. In this paper, we show that by incorporating word-level VAD value we improve on the task of SER by 2%, for negative emotions, over the baseline value for NSED. High accuracy for negative emotion recognition is essential because customers expressing negative opinions/views need to be pacified with urgency, lest complaints and dissatisfaction snowball and get out of hand. Escalation of negative opinions speedily is crucial for business interests. Our study then can be utilized to develop conversational agents which are more polite and empathetic in such situations.

Towards socially-competent and culturally-adaptive artificial agents Expressive order, interactional disruptions and recovery strategies

  • paper_url: http://arxiv.org/abs/2308.03146
  • repo_url: None
  • paper_authors: Chiara Bassetti, Enrico Blanzieri, Stefano Borgo, Sofia Marangon
  • for: Enabling artificial agents to display social skills and knowledge of (local) social norms in varying multi-party interaction situations.
  • methods: Distinguishes the expressive and functional orders of interaction and proposes a framework, focused on social capability, relational role, and proximity, for making artificial agents socially competent beyond dyadic interaction and individual user personalization.
  • results: By classifying functional and social interactional disruptions and their recovery strategies, and by examining how End-to-end Data-driven and Modular architectures could exploit this knowledge, the paper derives a list of general requirements for such agents.
    Abstract The development of artificial agents for social interaction pushes to enrich robots with social skills and knowledge about (local) social norms. One possibility is to distinguish the expressive and the functional orders during a human-robot interaction. The overarching aim of this work is to set a framework to make the artificial agent socially-competent beyond dyadic interaction-interaction in varying multi-party social situations-and beyond individual-based user personalization, thereby enlarging the current conception of "culturally-adaptive". The core idea is to provide the artificial agent with the capability to handle different kinds of interactional disruptions, and associated recovery strategies, in microsociology. The result is obtained by classifying functional and social disruptions, and by investigating the requirements a robot's architecture should satisfy to exploit such knowledge. The paper also highlights how this level of competence is achieved by focusing on just three dimensions: (i) social capability, (ii) relational role, and (iii) proximity, leaving aside the further complexity of full-fledged human-human interactions. Without going into technical aspects, End-to-end Data-driven Architectures and Modular Architectures are discussed to evaluate the degree to which they can exploit this new set of social and cultural knowledge. Finally, a list of general requirements for such agents is proposed.

Embedding-based Retrieval with LLM for Effective Agriculture Information Extracting from Unstructured Data

  • paper_url: http://arxiv.org/abs/2308.03107
  • repo_url: None
  • paper_authors: Ruoling Peng, Kang Liu, Po Yang, Zhipeng Yuan, Shunbao Li
  • for: Using a domain-agnostic, general pre-trained large language model (LLM) to extract structured data from agricultural documents, supporting pest identification with minimal or no human intervention.
  • methods: Embedding-based text retrieval and filtering, followed by LLM question answering to automatically extract entities and attributes from the documents and transform them into structured data.
  • results: Compared with existing methods, the approach achieves consistently better accuracy on the benchmark while maintaining efficiency.
    Abstract Pest identification is a crucial aspect of pest control in agriculture. However, most farmers are not capable of accurately identifying pests in the field, and there is a limited number of structured data sources available for rapid querying. In this work, we explored using domain-agnostic general pre-trained large language model(LLM) to extract structured data from agricultural documents with minimal or no human intervention. We propose a methodology that involves text retrieval and filtering using embedding-based retrieval, followed by LLM question-answering to automatically extract entities and attributes from the documents, and transform them into structured data. In comparison to existing methods, our approach achieves consistently better accuracy in the benchmark while maintaining efficiency.
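A minimal sketch of the retrieve-then-extract pipeline described above. The `embed` and `ask_llm` functions are placeholder hooks (assumptions) for whatever embedding model and LLM endpoint are used, and the pest schema and prompt wording are illustrative.

```python
# Minimal sketch of embedding-based retrieval followed by LLM question
# answering for structured extraction.  Assumptions: `embed` and `ask_llm` are
# placeholders for the actual embedding model and LLM client; the JSON schema
# and prompt wording are illustrative.
import json
import numpy as np

def embed(texts):                      # placeholder: returns an (N, D) array of unit vectors
    raise NotImplementedError

def ask_llm(prompt):                   # placeholder: returns the LLM's text reply
    raise NotImplementedError

def extract_pest_records(query, passages, top_k=3):
    # 1) Embedding-based retrieval: keep only the most relevant passages.
    q, P = embed([query])[0], embed(passages)
    scores = P @ q                                  # cosine similarity for unit vectors
    top = [passages[i] for i in np.argsort(scores)[::-1][:top_k]]
    # 2) LLM question answering turns unstructured text into structured data.
    prompt = (
        "From the passages below, extract every pest mentioned as JSON with the "
        'fields "pest_name", "host_crop", and "symptoms".\n\n' + "\n\n".join(top)
    )
    return json.loads(ask_llm(prompt))
```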

Language-based Photo Color Adjustment for Graphic Designs

  • paper_url: http://arxiv.org/abs/2308.03059
  • repo_url: None
  • paper_authors: Zhenwei Wang, Nanxuan Zhao, Gerhard Hancke, Rynson W. H. Lau
  • for: A language-based photo recoloring method for graphic designs, helping a design deliver its message effectively and remain aesthetically pleasing.
  • methods: A language-based model that assists both experts and novices: given an instruction, it predicts the source colors and the target regions in the photo, then recolors the target regions with the source colors.
  • results: The method recolors photos accurately, supports multi-granularity instructions that yield either a specific result or multiple plausible ones, and preserves the original image semantics by recoloring only semantically meaningful local regions.
    Abstract Adjusting the photo color to associate with some design elements is an essential way for a graphic design to effectively deliver its message and make it aesthetically pleasing. However, existing tools and previous works face a dilemma between the ease of use and level of expressiveness. To this end, we introduce an interactive language-based approach for photo recoloring, which provides an intuitive system that can assist both experts and novices on graphic design. Given a graphic design containing a photo that needs to be recolored, our model can predict the source colors and the target regions, and then recolor the target regions with the source colors based on the given language-based instruction. The multi-granularity of the instruction allows diverse user intentions. The proposed novel task faces several unique challenges, including: 1) color accuracy for recoloring with exactly the same color from the target design element as specified by the user; 2) multi-granularity instructions for parsing instructions correctly to generate a specific result or multiple plausible ones; and 3) locality for recoloring in semantically meaningful local regions to preserve original image semantics. To address these challenges, we propose a model called LangRecol with two main components: the language-based source color prediction module and the semantic-palette-based photo recoloring module. We also introduce an approach for generating a synthetic graphic design dataset with instructions to enable model training. We evaluate our model via extensive experiments and user studies. We also discuss several practical applications, showing the effectiveness and practicality of our approach. Code and data for this paper are at: https://zhenwwang.github.io/langrecol.

Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models

  • paper_url: http://arxiv.org/abs/2308.05176
  • repo_url: None
  • paper_authors: Md. Simul Hasan Talukder, Rejwan Bin Sulaiman
  • for: Predicting epileptic seizures from EEG recordings.
  • methods: Comparative analysis of five machine learning models (Random Forest, Decision Tree, Extra Trees, Logistic Regression, and Gradient Boosting) trained on carefully preprocessed EEG data.
  • results: The Extra Trees model performs best, reaching 99.29% accuracy and surpassing both the other models and previously published results.
    Abstract Epilepsy is a prevalent neurological disorder characterized by recurrent and unpredictable seizures, necessitating accurate prediction for effective management and patient care. Application of machine learning (ML) on electroencephalogram (EEG) recordings, along with its ability to provide valuable insights into brain activity during seizures, is able to make accurate and robust seizure prediction an indispensable component in relevant studies. In this research, we present a comprehensive comparative analysis of five machine learning models - Random Forest (RF), Decision Tree (DT), Extra Trees (ET), Logistic Regression (LR), and Gradient Boosting (GB) - for the prediction of epileptic seizures using EEG data. The dataset underwent meticulous preprocessing, including cleaning, normalization, outlier handling, and oversampling, ensuring data quality and facilitating accurate model training. These preprocessing techniques played a crucial role in enhancing the models' performance. The results of our analysis demonstrate the performance of each model in terms of accuracy. The LR classifier achieved an accuracy of 56.95%, while GB and DT both attained 97.17% accuracy. RT achieved a higher accuracy of 98.99%, while the ET model exhibited the best performance with an accuracy of 99.29%. Our findings reveal that the ET model outperformed not only the other models in the comparative analysis but also surpassed the state-of-the-art results from previous research. The superior performance of the ET model makes it a compelling choice for accurate and robust epileptic seizure prediction using EEG data.
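A minimal scikit-learn sketch of the five-model comparison, assuming the EEG feature matrix `X` and labels `y` are already loaded; the train/test split, standardization, and simple minority-class oversampling stand in for the paper's full preprocessing (cleaning, normalization, outlier handling, oversampling).

```python
# Minimal sketch of comparing the five classifiers.  Assumptions: X (features)
# and y (labels) are preloaded from the EEG dataset; the split ratio and the
# simple duplication-based oversampling are illustrative.
import numpy as np
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def oversample(X, y):
    """Duplicate minority-class rows until both classes have equal counts."""
    classes, counts = np.unique(y, return_counts=True)
    if counts.min() == counts.max():
        return X, y
    minority = classes[np.argmin(counts)]
    extra_n = int(counts.max() - counts.min())
    X_extra = resample(X[y == minority], replace=True, n_samples=extra_n, random_state=0)
    return np.vstack([X, X_extra]), np.hstack([y, np.full(extra_n, minority)])

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "ET": ExtraTreesClassifier(random_state=0),
}

def compare(X, y):
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    scaler = StandardScaler().fit(Xtr)
    Xtr, Xte = scaler.transform(Xtr), scaler.transform(Xte)
    Xtr, ytr = oversample(Xtr, ytr)
    for name, model in models.items():
        print(name, model.fit(Xtr, ytr).score(Xte, yte))   # test accuracy per model
```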

Weakly Supervised Multi-Task Representation Learning for Human Activity Analysis Using Wearables

  • paper_url: http://arxiv.org/abs/2308.03805
  • repo_url: None
  • paper_authors: Taoran Sheng, Manfred Huber
  • for: Proposing a weakly supervised multi-output siamese network that maps wearable sensor data into multiple representation spaces so several tasks can be addressed simultaneously.
  • methods: A multi-output siamese network in which each output focuses on a different aspect of the data; representation vectors are positioned so that samples with the same semantic meaning in a given aspect lie close together in the corresponding space.
  • results: Experiments show the model can address multiple tasks simultaneously and in many situations outperform single-task supervised methods; further analyses examine the architecture, the effect of multiple tasks, scalability to additional tasks, and the ability to combine data for which only partial relationship information is available.
    Abstract Sensor data streams from wearable devices and smart environments are widely studied in areas like human activity recognition (HAR), person identification, or health monitoring. However, most of the previous works in activity and sensor stream analysis have been focusing on one aspect of the data, e.g. only recognizing the type of the activity or only identifying the person who performed the activity. We instead propose an approach that uses a weakly supervised multi-output siamese network that learns to map the data into multiple representation spaces, where each representation space focuses on one aspect of the data. The representation vectors of the data samples are positioned in the space such that the data with the same semantic meaning in that aspect are closely located to each other. Therefore, as demonstrated with a set of experiments, the trained model can provide metrics for clustering data based on multiple aspects, allowing it to address multiple tasks simultaneously and even to outperform single task supervised methods in many situations. In addition, further experiments are presented that in more detail analyze the effect of the architecture and of using multiple tasks within this framework, that investigate the scalability of the model to include additional tasks, and that demonstrate the ability of the framework to combine data for which only partial relationship information with respect to the target tasks is available.

Serverless Federated AUPRC Optimization for Multi-Party Collaborative Imbalanced Data Mining

  • paper_url: http://arxiv.org/abs/2308.03035
  • repo_url: https://github.com/xidongwu/d-auprc
  • paper_authors: Xidong Wu, Zhengmian Hu, Jian Pei, Heng Huang
  • for: Multi-party collaborative training on imbalanced data tasks, with the goal of maximizing the Area Under the Precision-Recall Curve (AUPRC).
  • methods: A new ServerLess biAsed sTochastic gradiEnt (SLATE) algorithm that directly optimizes AUPRC, together with a momentum-based variance-reduction variant (SLATE-M) that improves the convergence rate.
  • results: SLATE-M matches the best theoretical convergence rate reached by single-machine online methods while reducing communication costs in the serverless multi-party setting.
    Abstract Multi-party collaborative training, such as distributed learning and federated learning, is used to address the big data challenges. However, traditional multi-party collaborative training algorithms were mainly designed for balanced data mining tasks and are intended to optimize accuracy (\emph{e.g.}, cross-entropy). The data distribution in many real-world applications is skewed and classifiers, which are trained to improve accuracy, perform poorly when applied to imbalanced data tasks since models could be significantly biased toward the primary class. Therefore, the Area Under Precision-Recall Curve (AUPRC) was introduced as an effective metric. Although single-machine AUPRC maximization methods have been designed, multi-party collaborative algorithm has never been studied. The change from the single-machine to the multi-party setting poses critical challenges. To address the above challenge, we study the serverless multi-party collaborative AUPRC maximization problem since serverless multi-party collaborative training can cut down the communications cost by avoiding the server node bottleneck, and reformulate it as a conditional stochastic optimization problem in a serverless multi-party collaborative learning setting and propose a new ServerLess biAsed sTochastic gradiEnt (SLATE) algorithm to directly optimize the AUPRC. After that, we use the variance reduction technique and propose ServerLess biAsed sTochastic gradiEnt with Momentum-based variance reduction (SLATE-M) algorithm to improve the convergence rate, which matches the best theoretical convergence result reached by the single-machine online method. To the best of our knowledge, this is the first work to solve the multi-party collaborative AUPRC maximization problem.

Pre-Trained Large Language Models for Industrial Control

  • paper_url: http://arxiv.org/abs/2308.03028
  • repo_url: None
  • paper_authors: Lei Song, Chuheng Zhang, Li Zhao, Jiang Bian
  • for: Investigating whether a foundation model (GPT-4) can serve as the controller for a building's heating, ventilation, and air conditioning (HVAC) system.
  • methods: HVAC control is wrapped as a language game: at each step GPT-4 receives a short task description, several selected demonstrations, and the current observation, and the actions it returns are executed.
  • results: GPT-4 controls HVAC with performance comparable to RL methods while using few samples and incurring low technical debt, generalizes across different scenarios, and the experiments analyze how different parts of the text context affect performance.
    Abstract For industrial control, developing high-performance controllers with few samples and low technical debt is appealing. Foundation models, possessing rich prior knowledge obtained from pre-training with Internet-scale corpus, have the potential to be a good controller with proper prompts. In this paper, we take HVAC (Heating, Ventilation, and Air Conditioning) building control as an example to examine the ability of GPT-4 (one of the first-tier foundation models) as the controller. To control HVAC, we wrap the task as a language game by providing text including a short description for the task, several selected demonstrations, and the current observation to GPT-4 on each step and execute the actions responded by GPT-4. We conduct series of experiments to answer the following questions: 1)~How well can GPT-4 control HVAC? 2)~How well can GPT-4 generalize to different scenarios for HVAC control? 3) How different parts of the text context affect the performance? In general, we found GPT-4 achieves the performance comparable to RL methods with few samples and low technical debt, indicating the potential of directly applying foundation models to industrial control tasks.
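A minimal sketch of wrapping HVAC control as a language game, as described above: each step builds a prompt from a short task description, selected demonstrations, and the current observation, then executes the action returned by the model. The `query_gpt4` hook, the gym-style `env`, and the observation/action formats are illustrative assumptions.

```python
# Minimal sketch of the prompt-and-act loop for language-model HVAC control.
# Assumptions: `query_gpt4` stands in for the actual chat-completion client,
# `env` is a gym-style simulator, and the action set/observation format are
# illustrative, not the paper's exact prompt design.
TASK = ("You control a building HVAC system. At each step, pick one action from "
        "{heat_up, cool_down, hold} to keep the zone temperature near the setpoint "
        "while minimizing energy use. Reply with the action name only.")

def query_gpt4(prompt):               # placeholder for the actual LLM API call
    raise NotImplementedError

def build_prompt(demonstrations, observation):
    demo_text = "\n".join(
        f"Observation: {obs} -> Action: {act}" for obs, act in demonstrations
    )
    return f"{TASK}\n\nExamples:\n{demo_text}\n\nObservation: {observation}\nAction:"

def control_loop(env, demonstrations, steps=24):
    obs = env.reset()
    for _ in range(steps):
        # Short description + selected demonstrations + current observation.
        action = query_gpt4(build_prompt(demonstrations, obs)).strip()
        obs, reward, done, info = env.step(action)
        if done:
            break
```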

Towards Scene-Text to Scene-Text Translation

  • paper_url: http://arxiv.org/abs/2308.03024
  • repo_url: None
  • paper_authors: Onkar Susladkar, Prajwal Gatti, Anand Mishra
  • for: "Visually" translating scene text from a source language (e.g., English) to a target language (e.g., Chinese).
  • methods: A novel conditional diffusion-based method, VTNet, addressing challenges such as interpolating fonts to unseen characters and preserving text size and background, trained on a synthetic cross-lingual dataset of 600K scene-text images in six languages.
  • results: Extensive experiments and comparisons show that the model surpasses previous state-of-the-art results on conventional scene-text editing benchmarks and generalizes well to unseen words and fonts.
    Abstract In this work, we study the task of ``visually" translating scene text from a source language (e.g., English) to a target language (e.g., Chinese). Visual translation involves not just the recognition and translation of scene text but also the generation of the translated image that preserves visual features of the text, such as font, size, and background. There are several challenges associated with this task, such as interpolating font to unseen characters and preserving text size and the background. To address these, we introduce VTNet, a novel conditional diffusion-based method. To train the VTNet, we create a synthetic cross-lingual dataset of 600K samples of scene text images in six popular languages, including English, Hindi, Tamil, Chinese, Bengali, and German. We evaluate the performance of VTnet through extensive experiments and comparisons to related methods. Our model also surpasses the previous state-of-the-art results on the conventional scene-text editing benchmarks. Further, we present rigorous qualitative studies to understand the strengths and shortcomings of our model. Results show that our approach generalizes well to unseen words and fonts. We firmly believe our work can benefit real-world applications, such as text translation using a phone camera and translating educational materials. Code and data will be made publicly available.

SAPIEN: Affective Virtual Agents Powered by Large Language Models

  • paper_url: http://arxiv.org/abs/2308.03022
  • repo_url: None
  • paper_authors: Masum Hasan, Cengiz Ozel, Sammy Potter, Ehsan Hoque
  • for: A platform for high-fidelity virtual agents driven by large language models that can hold open-domain conversations in 13 languages and display emotions through facial expressions and voice.
  • methods: LLM-driven virtual agents whose personality, background, and conversation premise can be customized by the user for a rich, immersive interaction; after the virtual meeting, the conversation can be analyzed to provide actionable feedback on communication skills.
  • results: The paper gives an overview of the platform and its application domains, ranging from entertainment to mental health, communication training, language learning, education, and healthcare, and discusses the ethical implications of realistic virtual agents and the challenges of ensuring responsible use.
    Abstract In this demo paper, we introduce SAPIEN, a platform for high-fidelity virtual agents driven by large language models that can hold open domain conversations with users in 13 different languages, and display emotions through facial expressions and voice. The platform allows users to customize their virtual agent's personality, background, and conversation premise, thus providing a rich, immersive interaction experience. Furthermore, after the virtual meeting, the user can choose to get the conversation analyzed and receive actionable feedback on their communication skills. This paper illustrates an overview of the platform and discusses the various application domains of this technology, ranging from entertainment to mental health, communication training, language learning, education, healthcare, and beyond. Additionally, we consider the ethical implications of such realistic virtual agent representations and the potential challenges in ensuring responsible use.

Cal-SFDA: Source-Free Domain-adaptive Semantic Segmentation with Differentiable Expected Calibration Error

  • paper_url: http://arxiv.org/abs/2308.03003
  • repo_url: https://github.com/jo-wang/cal-sfda
  • paper_authors: Zixin Wang, Yadan Luo, Zhi Chen, Sen Wang, Zi Huang
  • for: Source-free domain-adaptive semantic segmentation that avoids source-domain data leakage.
  • methods: A calibration-guided self-training framework (Cal-SFDA) that estimates the expected calibration error (ECE) from segmentation predictions to guide source checkpoint selection and class-balanced pseudo-labeling of high-confidence regions during target adaptation.
  • results: Surpasses the previous state of the art by up to 5.25% mIoU on two synthetic-to-real transfer tasks while enabling fair model selection.
    Abstract The prevalence of domain adaptive semantic segmentation has prompted concerns regarding source domain data leakage, where private information from the source domain could inadvertently be exposed in the target domain. To circumvent the requirement for source data, source-free domain adaptation has emerged as a viable solution that leverages self-training methods to pseudo-label high-confidence regions and adapt the model to the target data. However, the confidence scores obtained are often highly biased due to over-confidence and class-imbalance issues, which render both model selection and optimization problematic. In this paper, we propose a novel calibration-guided source-free domain adaptive semantic segmentation (Cal-SFDA) framework. The core idea is to estimate the expected calibration error (ECE) from the segmentation predictions, serving as a strong indicator of the model's generalization capability to the unlabeled target domain. The estimated ECE scores, in turn, assist the model training and fair selection in both source training and target adaptation stages. During model pre-training on the source domain, we ensure the differentiability of the ECE objective by leveraging the LogSumExp trick and using ECE scores to select the best source checkpoints for adaptation. To enable ECE estimation on the target domain without requiring labels, we train a value net for ECE estimation and apply statistic warm-up on its BatchNorm layers for stability. The estimated ECE scores assist in determining the reliability of prediction and enable class-balanced pseudo-labeling by positively guiding the adaptation progress and inhibiting potential error accumulation. Extensive experiments on two widely-used synthetic-to-real transfer tasks show that the proposed approach surpasses previous state-of-the-art by up to 5.25% of mIoU with fair model selection criteria.
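For reference, a minimal sketch of the standard binned expected calibration error over segmentation predictions. The paper goes further by making the objective differentiable via the LogSumExp trick and by training a value net to predict ECE on unlabeled target data; this sketch only shows the plain labeled-data estimate.

```python
# Minimal sketch of binned Expected Calibration Error for segmentation.
# Assumptions: 10 equal-width confidence bins; this is the standard labeled
# ECE, not the paper's differentiable or value-net-predicted variants.
import torch

def segmentation_ece(logits, labels, n_bins=10):
    """logits: (B, C, H, W), labels: (B, H, W) -> scalar ECE in [0, 1]."""
    probs = torch.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)                 # per-pixel confidence and prediction
    conf, pred, labels = conf.flatten(), pred.flatten(), labels.flatten()
    correct = (pred == labels).float()
    ece = torch.zeros(())
    edges = torch.linspace(0, 1, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = (conf[in_bin].mean() - correct[in_bin].mean()).abs()
            ece += in_bin.float().mean() * gap    # bin weight * |confidence - accuracy|
    return ece
```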

Spanish Pre-trained BERT Model and Evaluation Data

  • paper_url: http://arxiv.org/abs/2308.02976
  • repo_url: https://github.com/dccuchile/beto
  • paper_authors: José Cañete, Gabriel Chaperon, Rodrigo Fuentes, Jou-Hui Ho, Hojin Kang, Jorge Pérez
  • for: Providing a BERT-based Spanish language model and gathering Spanish-specific tasks into a single repository for model training and evaluation.
  • methods: BERT pre-training performed exclusively on Spanish data, together with a GLUE-style compilation of Spanish benchmarks.
  • results: After fine-tuning, the Spanish model outperforms other BERT-based models pre-trained on multilingual corpora on most tasks, and even achieves a new state of the art on some of them.
    Abstract The Spanish language is one of the top 5 spoken languages in the world. Nevertheless, finding resources to train or evaluate Spanish language models is not an easy task. In this paper we help bridge this gap by presenting a BERT-based language model pre-trained exclusively on Spanish data. As a second contribution, we also compiled several tasks specifically for the Spanish language in a single repository much in the spirit of the GLUE benchmark. By fine-tuning our pre-trained Spanish model, we obtain better results compared to other BERT-based models pre-trained on multilingual corpora for most of the tasks, even achieving a new state-of-the-art on some of them. We have publicly released our model, the pre-training data, and the compilation of the Spanish benchmarks.
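A minimal sketch of fine-tuning the released Spanish BERT on a sentence-classification task with Hugging Face transformers. The checkpoint name (BETO is commonly distributed as dccuchile/bert-base-spanish-wwm-cased), the number of labels, and the toy in-memory dataset are assumptions for illustration.

```python
# Minimal fine-tuning sketch.  Assumptions: the checkpoint identifier, the
# binary label set, and the two-example toy dataset are illustrative.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

checkpoint = "dccuchile/bert-base-spanish-wwm-cased"   # assumed BETO hub name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

train = Dataset.from_dict({
    "text": ["Me encantó la película.", "El servicio fue pésimo."],
    "label": [1, 0],
}).map(lambda ex: tokenizer(ex["text"], truncation=True,
                            padding="max_length", max_length=64))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="beto-finetune", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train,
)
trainer.train()
```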

Understanding User Intent Modeling for Conversational Recommender Systems: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2308.08496
  • repo_url: None
  • paper_authors: Siamak Farshidi, Kiyan Rezaee, Sara Mazaheri, Amir Hossein Rahimi, Ali Dadashzadeh, Morteza Ziabakhsh, Sadegh Eskandari, Slinger Jansen
  • for: Helping researchers select the most suitable user intent models for their conversational recommender systems, enabling more personalized responses.
  • methods: A systematic literature review collects the models typically employed in designing conversational recommender systems; from these data, a decision model is developed, and two case studies evaluate its effectiveness.
  • results: The study analyzes 59 distinct models and 74 commonly used features, providing insights into potential model combinations, trends in model selection, quality concerns, evaluation measures, and frequently used training and evaluation datasets.
    Abstract Context: User intent modeling is a crucial process in Natural Language Processing that aims to identify the underlying purpose behind a user's request, enabling personalized responses. With a vast array of approaches introduced in the literature (over 13,000 papers in the last decade), understanding the related concepts and commonly used models in AI-based systems is essential. Method: We conducted a systematic literature review to gather data on models typically employed in designing conversational recommender systems. From the collected data, we developed a decision model to assist researchers in selecting the most suitable models for their systems. Additionally, we performed two case studies to evaluate the effectiveness of our proposed decision model. Results: Our study analyzed 59 distinct models and identified 74 commonly used features. We provided insights into potential model combinations, trends in model selection, quality concerns, evaluation measures, and frequently used datasets for training and evaluating these models. Contribution: Our study contributes practical insights and a comprehensive understanding of user intent modeling, empowering the development of more effective and personalized conversational recommender systems. With the Conversational Recommender System, researchers can perform a more systematic and efficient assessment of fitting intent modeling frameworks.
    摘要 背景:用户意图建模是自然语言处理中的一个关键过程,旨在识别用户请求背后的意图,从而提供个性化的回答。鉴于文献中提出的方法数量庞大(过去十年超过13,000篇论文),了解相关概念以及基于人工智能的系统中常用的模型十分重要。方法:我们进行了系统性的文献综述,收集了设计对话推荐系统时常用的模型数据。基于收集到的数据,我们开发了一个决策模型,以帮助研究人员为其系统选择最合适的模型。此外,我们还完成了两个案例研究,以评估所提出的决策模型的有效性。结果:我们的研究分析了59种不同的模型,并识别出74个常用特征。我们提供了关于潜在模型组合、模型选择趋势、质量问题、评价指标以及用于训练和评估这些模型的常用数据集的见解。贡献:我们的研究提供了实用的见解和对用户意图建模的全面理解,有助于开发更有效、更个性化的对话推荐系统。借助该决策模型,研究人员可以更加系统、高效地评估适合的意图建模框架。

Science and engineering for what? A large-scale analysis of students’ projects in science fairs

  • paper_url: http://arxiv.org/abs/2308.02962
  • repo_url: None
  • paper_authors: Adelmo Eloy, Thomas Palmeira Ferraz, Fellip Silva Alves, Roseli de Deus Lopes
  • for: The paper is written to analyze the themes and topics that have driven students’ inquiry and design in science fair projects over the past 20 years in Brazil.
  • methods: The paper uses topic modeling to identify the main topics being explored in the projects, and to examine variations over time, region, and school setting.
  • results: The analysis found a broad range of topics being explored, with significant variations over time, region, and school setting. The authors argue that the results and proposed methodology can support further research and inform instruction and resource design for open inquiry experiences in different settings.
  • for: 这个论文是为了分析过去20年在巴西的科学展上学生的探索和设计主题。
  • methods: 这篇论文使用主题分析来确定学生的探索和设计主题,并分析时间、地区和学校背景的变化。
  • results: 分析发现学生探索的主题十分广泛,且随时间、地区和学校背景存在显著差异;作者认为这些结果和所提出的方法可以支持后续研究,并为不同情境下开放探究体验的教学与资源设计提供参考。
    Abstract Science and Engineering fairs offer K-12 students opportunities to engage with authentic STEM practices. Particularly, students are given the chance to experience authentic and open inquiry processes, by defining which themes, questions and approaches will guide their scientific endeavors. In this study, we analyzed data from over 5,000 projects presented at a nationwide science fair in Brazil over the past 20 years using topic modeling to identify the main topics that have driven students' inquiry and design. Our analysis identified a broad range of topics being explored, with significant variations over time, region, and school setting. We argue those results and proposed methodology can not only support further research in the context of science fairs, but also inform instruction and design of contexts-specific resources to support students in open inquiry experiences in different settings.
    摘要 科学与工程博览会为K-12学生提供了参与真实STEM实践的机会。特别是,学生有机会经历真实、开放的探究过程,自主确定引导其科学探索的主题、问题和方法。在这项研究中,我们使用主题建模方法,分析了过去20年巴西一项全国性科学博览会上超过5,000个项目的数据,识别出驱动学生探究与设计的主要主题。我们的分析发现了广泛的主题,并且这些主题随时间、地区和学校背景存在显著差异。我们认为,这些结果和所提出的方法不仅可以支持科学博览会背景下的后续研究,还可以为不同情境下的教学以及面向开放探究体验的资源设计提供参考。
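
The paper applies topic modeling to project descriptions; as a rough illustration of that kind of analysis, the sketch below fits a small LDA model with scikit-learn and prints the top words per topic. LDA, the toy documents, and the number of topics are stand-in assumptions; the paper does not necessarily use this exact method or configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy project titles standing in for the science-fair corpus.
docs = [
    "solar panel efficiency under different light conditions",
    "water filtration with activated charcoal for rural communities",
    "machine learning model to classify recyclable waste",
    "effect of fertilizer type on bean plant growth",
    "low-cost air quality sensor network for schools",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(X)

# Inspect topics via their highest-weight words.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```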

Data Fusion for Multi-Task Learning of Building Extraction and Height Estimation

  • paper_url: http://arxiv.org/abs/2308.02960
  • repo_url: https://github.com/SaadAhmedJamal/IEEE_DFC2023
  • paper_authors: Saad Ahmed Jamal, Arioluwa Aribisala
  • for: 这个论文是针对都市重建问题进行的一种多任务学习方法,利用光学和雷达卫星图像进行建筑物提取和高度估算。
  • methods: 这个论文使用多任务学习方法,将建筑物提取和高度估算作为两个独立的任务进行实现,并在这两个任务之间设置约束。
  • results: 根据设计实验结果,论文的基准结果在建筑物提取和高度估算方面得到了显著提高。
    Abstract In accordance with the urban reconstruction problem proposed by the DFC23 Track 2 Contest, this paper attempts a multitask-learning method of building extraction and height estimation using both optical and radar satellite imagery. Contrary to the initial goal of multitask learning which could potentially give a superior solution by reusing features and forming implicit constraints between multiple tasks, this paper reports the individual implementation of the building extraction and height estimation under constraints. The baseline results for the building extraction and the height estimation significantly increased after designed experiments.
    摘要 针对DFC23 Track 2竞赛提出的城市重建问题,本文尝试了一种多任务学习方法,利用光学和雷达卫星图像进行建筑物提取和高度估计。多任务学习的初衷是通过特征复用以及在多个任务之间形成隐式约束来获得更优的解;与此不同,本文报告的是在约束条件下对建筑物提取和高度估计的分别实现。经过设计的实验后,建筑物提取和高度估计的基准结果均显著提升。

A criterion for Artificial General Intelligence: hypothetic-deductive reasoning, tested on ChatGPT

  • paper_url: http://arxiv.org/abs/2308.02950
  • repo_url: None
  • paper_authors: Louis Vervoort, Vitaliy Mizyakov, Anastasia Ugleva
  • for: 这篇论文探讨了一种逻辑能力,即假设推理能力,是否能让AI达到人类智能水平。
  • methods: 作者提出了一些简单的测试方法来评估AI的假设推理能力,并应用到了ChatGPT上。
  • results: 研究发现,目前ChatGPT在较复杂的问题上的假设推理能力有限,但如果AI可以在多种情况下表现出这种逻辑能力,那么它就可以被视为人类智能水平。
    Abstract We argue that a key reasoning skill that any advanced AI, say GPT-4, should master in order to qualify as 'thinking machine', or AGI, is hypothetic-deductive reasoning. Problem-solving or question-answering can quite generally be construed as involving two steps: hypothesizing that a certain set of hypotheses T applies to the problem or question at hand, and deducing the solution or answer from T - hence the term hypothetic-deductive reasoning. An elementary proxy of hypothetic-deductive reasoning is causal reasoning. We propose simple tests for both types of reasoning, and apply them to ChatGPT. Our study shows that, at present, the chatbot has a limited capacity for either type of reasoning, as soon as the problems considered are somewhat complex. However, we submit that if an AI would be capable of this type of reasoning in a sufficiently wide range of contexts, it would be an AGI.
    摘要 我们认为,任何高级AI,如GPT-4,以“思考机器”或AGI的标准,应具备推理能力。我们认为,这种推理能力应包括假设推理。问题解释或问题回答通常可以分为两步:首先假设一个假设集T适用于问题或问题,然后从T中推理出解释或答案。因此,我们称这种推理为假设推理。我们提出了两种这种推理的简单测验,并将其应用到ChatGPT。我们的研究表明,现在,这个 chatbot 仅有限的能力进行这些类型的推理,只有在问题变得相当复杂时。但我们认为,如果AI能够在充分广泛的上下文中进行这种推理,则它将是一个AGI。

dPASP: A Comprehensive Differentiable Probabilistic Answer Set Programming Environment For Neurosymbolic Learning and Reasoning

  • paper_url: http://arxiv.org/abs/2308.02944
  • repo_url: None
  • paper_authors: Renato Lui Geh, Jonas Gonçalves, Igor Cataneo Silveira, Denis Deratani Mauá, Fabio Gagliardi Cozman
  • for: This paper proposes a new framework called dPASP for differentiable neuro-symbolic reasoning, which allows for the combination of discrete probabilistic models, neural predicates, logic constraints, and interval-valued probabilistic choices.
  • methods: The paper discusses several semantics for probabilistic logic programs that can express nondeterministic, contradictory, incomplete, and/or statistical knowledge, and how gradient-based learning can be performed with neural predicates and probabilistic choices under these semantics.
  • results: The paper describes an implemented package that supports inference and learning in the language, along with several example programs, and demonstrates that the package allows for end-to-end training of rather sophisticated models and loss functions with minimal user knowledge of deep learning system’s inner workings.
    Abstract We present dPASP, a novel declarative probabilistic logic programming framework for differentiable neuro-symbolic reasoning. The framework allows for the specification of discrete probabilistic models with neural predicates, logic constraints and interval-valued probabilistic choices, thus supporting models that combine low-level perception (images, texts, etc), common-sense reasoning, and (vague) statistical knowledge. To support all such features, we discuss the several semantics for probabilistic logic programs that can express nondeterministic, contradictory, incomplete and/or statistical knowledge. We also discuss how gradient-based learning can be performed with neural predicates and probabilistic choices under selected semantics. We then describe an implemented package that supports inference and learning in the language, along with several example programs. The package requires minimal user knowledge of deep learning system's inner workings, while allowing end-to-end training of rather sophisticated models and loss functions.
    摘要 我们介绍了dPASP,一种新颖的声明式概率逻辑编程框架,用于可微分的神经符号推理。该框架允许用户指定带有神经谓词、逻辑约束和区间值概率选择的离散概率模型,从而支持将低层感知(图像、文本等)、常识推理和(模糊的)统计知识结合在一起的模型。为支持上述特性,我们讨论了能够表达非确定性、矛盾、不完整和/或统计知识的概率逻辑程序的多种语义,并讨论了在选定语义下如何利用神经谓词和概率选择进行基于梯度的学习。随后,我们描述了一个已实现的软件包,它支持该语言的推理和学习,并给出了若干示例程序。该软件包只需用户对深度学习系统的内部机制有极少的了解,同时支持对相当复杂的模型和损失函数进行端到端训练。

Dark-Skin Individuals Are at More Risk on the Street: Unmasking Fairness Issues of Autonomous Driving Systems

  • paper_url: http://arxiv.org/abs/2308.02935
  • repo_url: None
  • paper_authors: Xinyue Li, Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Ying Zhang, Xuanzhe Liu
  • for: 本研究检测了自动驾驶系统中自动行人检测器的公平性问题,这是一个尚未得到充分研究的问题。
  • methods: 我们对8种被广泛研究的行人检测器在不同人口群体上进行了公平性测试,并对大规模真实世界数据集进行了大量标注,包括16,070个性别标签、20,115个年龄标签和3,513个肤色标签。
  • results: 我们的发现表明,年龄和肤色方面存在显著的公平性问题。成年人与儿童之间的检测精度差异为19.67%,浅肤色与深肤色人群之间的检测精度差异为7.52%,而性别之间仅有1.1%的差异。此外,我们发现在文献中常见的自动驾驶测试场景(低对比度和低亮度)下,对深肤色行人的偏差显著增加。我们公开发布了代码、数据和结果,以支持未来关于自动驾驶公平性的研究。
    Abstract This paper conducts fairness testing on automated pedestrian detection, a crucial but under-explored issue in autonomous driving systems. We evaluate eight widely-studied pedestrian detectors across demographic groups on large-scale real-world datasets. To enable thorough fairness testing, we provide extensive annotations for the datasets, resulting in 8,311 images with 16,070 gender labels, 20,115 age labels, and 3,513 skin tone labels. Our findings reveal significant fairness issues related to age and skin tone. The detection accuracy for adults is 19.67% higher compared to children, and there is a 7.52% accuracy disparity between light-skin and dark-skin individuals. Gender, however, shows only a 1.1% difference in detection accuracy. Additionally, we investigate common scenarios explored in the literature on autonomous driving testing, and find that the bias towards dark-skin pedestrians increases significantly under scenarios of low contrast and low brightness. We publicly release the code, data, and results to support future research on fairness in autonomous driving.
    摘要 本文对自动行人检测进行了公平性测试,这是自动驾驶系统中一个关键但研究不足的问题。我们在大规模真实世界数据集上,针对不同人口群体评估了8种被广泛研究的行人检测器。为了进行充分的公平性测试,我们为数据集提供了大量标注,共计8,311张图像,包含16,070个性别标签、20,115个年龄标签和3,513个肤色标签。我们的发现揭示了与年龄和肤色相关的显著公平性问题:成年人的检测精度比儿童高19.67%,浅肤色与深肤色个体之间存在7.52%的精度差异,而性别之间的检测精度差异仅为1.1%。此外,我们考察了自动驾驶测试文献中常见的场景,发现在低对比度和低亮度场景下,对深肤色行人的偏差显著增加。我们公开发布了代码、数据和结果,以支持未来关于自动驾驶公平性的研究。
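
A minimal sketch of the kind of group-wise accuracy comparison reported above (e.g., the 19.67% adult-child gap) is shown below, assuming per-pedestrian detection outcomes joined with demographic annotations. The data frame contents and grouping scheme are hypothetical; they only illustrate how such disparities can be computed.

```python
import pandas as pd

# Hypothetical per-pedestrian detection outcomes with demographic labels,
# mimicking the kind of annotations released with the study.
df = pd.DataFrame({
    "attr":     ["age", "age", "age", "age", "skin", "skin", "skin", "skin"],
    "group":    ["adult", "adult", "child", "child", "light", "light", "dark", "dark"],
    "detected": [1, 1, 0, 1, 1, 1, 0, 1],   # 1 = detector found the pedestrian
})

# Detection rate per demographic group.
rates = df.groupby(["attr", "group"])["detected"].mean()
print(rates)

# Disparity within each protected attribute: max minus min group detection rate.
for attr, sub in rates.groupby(level="attr"):
    print(f"{attr}: disparity = {sub.max() - sub.min():.2%}")
```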

ConvFormer: Revisiting Transformer for Sequential User Modeling

  • paper_url: http://arxiv.org/abs/2308.02925
  • repo_url: None
  • paper_authors: Hao Wang, Jianxun Lian, Mingqi Wu, Haoxuan Li, Jiajun Fan, Wanyue Xu, Chaozhuo Li, Xing Xie
  • for: 这篇论文的目的是提高个性化推荐系统中的序列用户模型,以更好地理解用户行为序列。
  • methods: 该论文使用了改进的Transformer结构,探讨了 item-to-item 机制在序列用户模型中的效果,并从实验分析中提出了三个关键指南。
  • results: 实验结果表明,该模型在四个公共数据集上实现了state-of-the-art的结果,并证实了提出的三个指南的有用性。
    Abstract Sequential user modeling, a critical task in personalized recommender systems, focuses on predicting the next item a user would prefer, requiring a deep understanding of user behavior sequences. Despite the remarkable success of Transformer-based models across various domains, their full potential in comprehending user behavior remains untapped. In this paper, we re-examine Transformer-like architectures aiming to advance state-of-the-art performance. We start by revisiting the core building blocks of Transformer-based methods, analyzing the effectiveness of the item-to-item mechanism within the context of sequential user modeling. After conducting a thorough experimental analysis, we identify three essential criteria for devising efficient sequential user models, which we hope will serve as practical guidelines to inspire and shape future designs. Following this, we introduce ConvFormer, a simple but powerful modification to the Transformer architecture that meets these criteria, yielding state-of-the-art results. Additionally, we present an acceleration technique to minimize the complexity associated with processing extremely long sequences. Experiments on four public datasets showcase ConvFormer's superiority and confirm the validity of our proposed criteria.
    摘要 序列用户建模是个性化推荐系统中的一项关键任务,旨在预测用户接下来会偏好的物品,这需要对用户行为序列有深入的理解。尽管基于Transformer的模型在各个领域都取得了显著成功,但其在理解用户行为方面的全部潜力尚未被充分挖掘。本文重新审视了类Transformer架构,以进一步提升最新性能。我们首先回顾基于Transformer方法的核心构件,分析item-to-item机制在序列用户建模情境下的有效性。经过深入的实验分析,我们确定了设计高效序列用户模型的三条关键准则,希望它们能作为实用指南,启发并塑造未来的设计。在此基础上,我们提出了ConvFormer,一种对Transformer架构简单而强大的改进,它满足上述准则并取得了最新的最佳结果。此外,我们还提出了一种加速技术,以降低处理超长序列所带来的复杂度。在四个公共数据集上的实验展示了ConvFormer的优越性,并验证了我们所提准则的有效性。
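
As a rough sketch of the general idea suggested by the name, the block below replaces the self-attention token mixer of a Transformer layer with a depthwise 1-D convolution over the interaction sequence. This is an assumption-laden illustration, not the paper's actual ConvFormer architecture or its acceleration technique.

```python
import torch
import torch.nn as nn

class ConvSequenceBlock(nn.Module):
    """Transformer-style block whose token mixer is a depthwise 1-D convolution.

    Only a sketch of the general idea; the real ConvFormer design details
    (and its long-sequence acceleration) are not reproduced here.
    """
    def __init__(self, d_model: int, kernel_size: int = 5):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mixer = nn.Conv1d(d_model, d_model, kernel_size,
                               padding=kernel_size // 2, groups=d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        h = self.norm1(x).transpose(1, 2)      # conv expects (batch, d_model, seq_len)
        x = x + self.mixer(h).transpose(1, 2)  # residual token mixing
        return x + self.ffn(self.norm2(x))     # residual position-wise FFN

# Toy usage: a batch of 8 users, each with 50 item embeddings of size 64.
block = ConvSequenceBlock(d_model=64)
out = block(torch.randn(8, 50, 64))
print(out.shape)  # torch.Size([8, 50, 64])
```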

Adversarial Erasing with Pruned Elements: Towards Better Graph Lottery Ticket

  • paper_url: http://arxiv.org/abs/2308.02916
  • repo_url: https://github.com/wangyuwen0627/ace-glt
  • paper_authors: Yuwen Wang, Shunyu Liu, Kaixuan Chen, Tongtian Zhu, Ji Qiao, Mengjie Shi, Yuanyu Wan, Mingli Song
  • for: 提高大输入图的深度图神经网络计算成本,并保持原始性能。
  • methods: 提出了一种新的敌对补做法(ACE),通过在剪枝过程中挖掘价值信息,提高GLT的性能。
  • results: 实验结果显示,我们的ACE-GLT在多种任务中表现出色,超过了现有方法。
    Abstract Graph Lottery Ticket (GLT), a combination of core subgraph and sparse subnetwork, has been proposed to mitigate the computational cost of deep Graph Neural Networks (GNNs) on large input graphs while preserving original performance. However, the winning GLTs in exisiting studies are obtained by applying iterative magnitude-based pruning (IMP) without re-evaluating and re-considering the pruned information, which disregards the dynamic changes in the significance of edges/weights during graph/model structure pruning, and thus limits the appeal of the winning tickets. In this paper, we formulate a conjecture, i.e., existing overlooked valuable information in the pruned graph connections and model parameters which can be re-grouped into GLT to enhance the final performance. Specifically, we propose an adversarial complementary erasing (ACE) framework to explore the valuable information from the pruned components, thereby developing a more powerful GLT, referred to as the ACE-GLT. The main idea is to mine valuable information from pruned edges/weights after each round of IMP, and employ the ACE technique to refine the GLT processing. Finally, experimental results demonstrate that our ACE-GLT outperforms existing methods for searching GLT in diverse tasks. Our code will be made publicly available.
    摘要 Graph Lottery Ticket (GLT), 一种组合核心子图和稀疏子网络的方法,被提出以减少深度图神经网络(GNNs)中输入图的计算成本而保持原始性能。然而,现有的赢家GLT在存在的研究中通常通过不重新评估和重新考虑被剪除的信息来获得赢家票,这会忽略图/模型结构剪除过程中边/权重的动态变化,从而限制赢家票的吸引力。在这篇论文中,我们提出一个 conjecture,即现有的忽略了有价值信息的剪除graph连接和模型参数,可以重新组织成GLT,以提高最终性能。具体来说,我们提出一种对抗补偿抹除(ACE)框架,以探索剪除后的有价值信息,并使用ACE技术来练化GLT处理。最后,我们通过实验结果发现,我们的ACE-GLT在多种任务中超过了现有的搜索GLT方法。我们将代码公开发布。

Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

  • paper_url: http://arxiv.org/abs/2308.04455
  • repo_url: https://github.com/deep-privacy/SA-toolkit
  • paper_authors: Pierre Champion
  • for: 防止语音数据隐私泄露
  • methods: 使用量化变换提高匿名化精度,并对各组件进行独立评估
  • results: 提出新的匿名化方法,并对现有系统进行攻击和破解分析
    Abstract The growing use of voice user interfaces has led to a surge in the collection and storage of speech data. While data collection allows for the development of efficient tools powering most speech services, it also poses serious privacy issues for users as centralized storage makes private personal speech data vulnerable to cyber threats. With the increasing use of voice-based digital assistants like Amazon's Alexa, Google's Home, and Apple's Siri, and with the increasing ease with which personal speech data can be collected, the risk of malicious use of voice-cloning and speaker/gender/pathological/etc. recognition has increased. This thesis proposes solutions for anonymizing speech and evaluating the degree of the anonymization. In this work, anonymization refers to making personal speech data unlinkable to an identity while maintaining the usefulness (utility) of the speech signal (e.g., access to linguistic content). We start by identifying several challenges that evaluation protocols need to consider to evaluate the degree of privacy protection properly. We clarify how anonymization systems must be configured for evaluation purposes and highlight that many practical deployment configurations do not permit privacy evaluation. Furthermore, we study and examine the most common voice conversion-based anonymization system and identify its weak points before suggesting new methods to overcome some limitations. We isolate all components of the anonymization system to evaluate the degree of speaker PPI associated with each of them. Then, we propose several transformation methods for each component to reduce as much as possible speaker PPI while maintaining utility. We promote anonymization algorithms based on quantization-based transformation as an alternative to the most-used and well-known noise-based approach. Finally, we endeavor a new attack method to invert anonymization.
    摘要 语音用户界面的日益普及导致语音数据的采集和存储激增。虽然数据采集使得支撑大多数语音服务的高效工具得以开发,但集中式存储使个人隐私语音数据容易受到网络威胁,从而给用户带来严重的隐私问题。随着Amazon Alexa、Google Home和Apple Siri等基于语音的数字助手的广泛使用,以及个人语音数据采集变得愈发容易,语音克隆以及说话人/性别/病理等识别被恶意利用的风险也随之增加。本论文提出了语音匿名化的解决方案,以及评估匿名化程度的方法。在本工作中,匿名化指的是使个人语音数据无法与身份关联,同时保持语音信号的可用性(效用),例如对语言内容的访问。我们首先指出了评估协议在正确评估隐私保护程度时需要考虑的若干挑战,阐明了匿名化系统在评估时应如何配置,并强调许多实际部署配置并不允许进行隐私评估。此外,我们研究并检验了最常见的基于语音转换的匿名化系统,找出其薄弱环节,并提出新方法以克服部分局限。我们将匿名化系统的各个组件分离开来,评估每个组件所关联的说话人个人可识别信息(PPI)的程度;随后,针对每个组件提出多种变换方法,在尽量保持效用的同时最大限度地降低说话人PPI。我们提倡基于量化变换的匿名化算法,作为最常用且广为人知的基于噪声方法的替代方案。最后,我们尝试了一种新的攻击方法来逆转匿名化。

Anomaly Detection in Global Financial Markets with Graph Neural Networks and Nonextensive Entropy

  • paper_url: http://arxiv.org/abs/2308.02914
  • repo_url: None
  • paper_authors: Kleyton da Costa
  • for: 本研究探讨了在全球金融市场中检测异常现象的能力,特别是在多变量系统中。
  • methods: 本研究使用图神经网络(GNN)来检测异常,并考虑由非广延熵(nonextensive entropy)度量的不确定性情景。
  • results: 主要发现表明,在危机时期,高度相关资产的复杂结构会减弱;并且在危机前、危机中和危机后,不同非广延熵参数下检测到的异常数量在统计上存在显著差异。
    Abstract Anomaly detection is a challenging task, particularly in systems with many variables. Anomalies are outliers that statistically differ from the analyzed data and can arise from rare events, malfunctions, or system misuse. This study investigated the ability to detect anomalies in global financial markets through Graph Neural Networks (GNN) considering an uncertainty scenario measured by a nonextensive entropy. The main findings show that the complex structure of highly correlated assets decreases in a crisis, and the number of anomalies is statistically different for nonextensive entropy parameters considering before, during, and after crisis.
    摘要 异常检测是一项具有挑战性的任务,尤其是在包含众多变量的系统中。异常是指在统计上偏离所分析数据的离群值,可能源于罕见事件、故障或系统滥用。本研究利用图神经网络(GNN),在由非广延熵度量的不确定性情景下,考察了在全球金融市场中检测异常的能力。主要发现表明,在危机时期,高度相关资产的复杂结构会减弱;并且在危机前、危机中和危机后,不同非广延熵参数下的异常数量在统计上存在显著差异。
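
The nonextensive (Tsallis) entropy used to parameterize uncertainty can be written as $S_q = (1 - \sum_i p_i^q)/(q - 1)$, which recovers the Shannon entropy as $q \to 1$. A minimal sketch follows; the probability vector and entropic indices are illustrative, and the paper's GNN pipeline is not reproduced.

```python
import numpy as np

def tsallis_entropy(p, q=1.5):
    """Nonextensive (Tsallis) entropy S_q = (1 - sum_i p_i^q) / (q - 1).

    Reduces to the Shannon entropy in the limit q -> 1.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                     # ensure a valid probability vector
    if np.isclose(q, 1.0):
        return -np.sum(p[p > 0] * np.log(p[p > 0]))
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

# Toy usage: a distribution over a few market states.
p = [0.5, 0.3, 0.15, 0.05]
for q in (0.5, 1.0, 1.5, 2.0):
    print(f"q = {q}: S_q = {tsallis_entropy(p, q):.4f}")
```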

cs.CL - 2023-08-06

Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models

  • paper_url: http://arxiv.org/abs/2308.03151
  • repo_url: https://github.com/aaronma2020/Food500-Cap
  • paper_authors: Zheng Ma, Mianzhi Pan, Wenhan Wu, Kanzhi Cheng, Jianbing Zhang, Shujian Huang, Jiajun Chen
  • for: 这篇论文旨在探讨 популяр的视觉语言模型(VLM)在特定领域中的能力。
  • methods: 该论文使用了多种探测方法,包括零 shot 设定下的评估方法,以检测 VLM 的局限性。
  • results: 实验结果显示,Popular VLM 在食品领域下表现较差,而且对不同地区的食品 Item 的处理能力具有偏见。
    Abstract Vision-language models (VLMs) have shown impressive performance in substantial downstream multi-modal tasks. However, only comparing the fine-tuned performance on downstream tasks leads to the poor interpretability of VLMs, which is adverse to their future improvement. Several prior works have identified this issue and used various probing methods under a zero-shot setting to detect VLMs' limitations, but they all examine VLMs using general datasets instead of specialized ones. In practical applications, VLMs are usually applied to specific scenarios, such as e-commerce and news fields, so the generalization of VLMs in specific domains should be given more attention. In this paper, we comprehensively investigate the capabilities of popular VLMs in a specific field, the food domain. To this end, we build a food caption dataset, Food-500 Cap, which contains 24,700 food images with 494 categories. Each image is accompanied by a detailed caption, including fine-grained attributes of food, such as the ingredient, shape, and color. We also provide a culinary culture taxonomy that classifies each food category based on its geographic origin in order to better analyze the performance differences of VLM in different regions. Experiments on our proposed datasets demonstrate that popular VLMs underperform in the food domain compared with their performance in the general domain. Furthermore, our research reveals severe bias in VLMs' ability to handle food items from different geographic regions. We adopt diverse probing methods and evaluate nine VLMs belonging to different architectures to verify the aforementioned observations. We hope that our study will bring researchers' attention to VLM's limitations when applying them to the domain of food or culinary cultures, and spur further investigations to address this issue.
    摘要 视力语模型(VLM)在多Modal任务中表现出色,但只是对下游任务的细致 fine-tuning 可能导致 VLM 的解释性差,这对其未来改进带来障碍。一些先前的研究已经发现这个问题,并使用了不同的探测方法来检测 VLM 的局限性,但这些研究都使用了通用的数据集而不是专门的数据集。在实际应用中,VLM 通常被应用于特定场景,如电商和新闻领域,因此 VLM 在特定领域的泛化性应该得更多的注意。在这篇论文中,我们广泛探讨了流行的 VLM 在食品领域的能力。为此,我们建立了一个食品描述集合,即 Food-500 Cap,该集合包含 24,700 个食品图像,其中每个图像都有 494 个类别。每个图像都有详细的描述,包括食品的成分、形状和颜色。我们还提供了一个culinary culture taxonomy,该税onomy分类每个食品类别基于其地理起源,以便更好地分析 VLM 在不同地区的表现差异。我们对我们提posed的数据集进行实验,发现流行的 VLM 在食品领域的表现落后于其在通用领域的表现。此外,我们的研究发现 VLM 对不同地区的食品项目存在严重的偏见。我们采用了多种探测方法,并评估了九种不同架构的 VLM,以确认以上观察。我们希望通过这种研究,引起研究者对 VLM 在食品或culinary cultures领域的应用中的局限性的注意,并促进进一步的调查以解决这一问题。
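
To illustrate the kind of zero-shot probing described above, the sketch below scores a food image against a few candidate category prompts with CLIP via Hugging Face Transformers. CLIP is used only as a convenient example VLM, the labels are a tiny subset standing in for the 494 Food-500 Cap categories, and the blank test image is a placeholder for a real food photo.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# A handful of labels standing in for the 494 fine-grained categories.
labels = ["sushi", "biryani", "croissant", "dim sum"]
prompts = [f"a photo of {label}, a type of food" for label in labels]

# Placeholder image; replace with a real photo, e.g. Image.open("food.jpg").
image = Image.new("RGB", (224, 224), color="white")

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image      # (1, num_labels) similarities
probs = logits.softmax(dim=-1).squeeze(0)

for label, p in zip(labels, probs.tolist()):
    print(f"{label:10s} {p:.3f}")
```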

Towards Multiple References Era – Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation

  • paper_url: http://arxiv.org/abs/2308.03131
  • repo_url: https://github.com/sefazeng/llm-ref
  • paper_authors: Xianfeng Zeng, Yijin Liu, Fandong Meng, Jie Zhou
  • for: 提高 matching-based 评估 metric 和人类评估的相关性, 特别是对比 neural-based metric 如 BLEURT.
  • methods: 使用多个参考文本来增强 matching-based metric 的一致性。
  • results: 在 WMT Metrics benchmark 中,多参考 F200spBLEU 与单参考 F200spBLEU 之间的准确率提高为 7.2%,并且超过 neural-based BERTscore 的准确率提高为 3.9%。此外,我们发现 LLM 中的数据泄露问题可以通过我们的多参考 metric 减少到一定程度。
    Abstract N-gram matching-based evaluation metrics, such as BLEU and chrF, are widely utilized across a range of natural language generation (NLG) tasks. However, recent studies have revealed a weak correlation between these matching-based metrics and human evaluations, especially when compared with neural-based metrics like BLEURT. In this paper, we conjecture that the performance bottleneck in matching-based metrics may be caused by the limited diversity of references. To address this issue, we propose to utilize \textit{multiple references} to enhance the consistency between these metrics and human evaluations. Within the WMT Metrics benchmarks, we observe that the multi-references F200spBLEU surpasses the conventional single-reference one by an accuracy improvement of 7.2\%. Remarkably, it also exceeds the neural-based BERTscore by an accuracy enhancement of 3.9\%. Moreover, we observe that the data leakage issue in large language models (LLMs) can be mitigated to a large extent by our multi-reference metric. We release the code and data at \url{https://github.com/SefaZeng/LLM-Ref}
    摘要 在各类自然语言生成(NLG)任务中,基于N-gram匹配的评估指标(如BLEU和chrF)被广泛使用。然而,最近的研究表明,这些基于匹配的指标与人工评估之间的相关性较弱,尤其是与BLEURT等基于神经网络的指标相比。在本文中,我们推测这种性能瓶颈可能源于参考文本的多样性有限。为了解决这一问题,我们提出利用多个参考文本来提高这些指标与人工评估之间的一致性。在WMT Metrics基准上,我们观察到多参考的F200spBLEU相比传统的单参考版本,准确率提升了7.2%;值得注意的是,它还超过了基于神经网络的BERTscore,准确率提升3.9%。此外,我们观察到大语言模型(LLM)中的数据泄露问题可以在很大程度上通过我们的多参考指标得到缓解。我们在 \url{https://github.com/SefaZeng/LLM-Ref} 发布了代码和数据。
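
A minimal sketch of single- versus multi-reference BLEU with the sacreBLEU library is shown below. The toy sentences are illustrative, and the paper's F200spBLEU additionally relies on a specific SentencePiece tokenization that this sketch does not reproduce.

```python
import sacrebleu

# One system output and two reference sets aligned with it.
hypotheses = ["the cat sat on the mat"]
refs_a     = ["the cat sat on the mat"]
refs_b     = ["a cat was sitting on the mat"]

single_ref = sacrebleu.corpus_bleu(hypotheses, [refs_a])
multi_ref  = sacrebleu.corpus_bleu(hypotheses, [refs_a, refs_b])

print("single-reference BLEU:", round(single_ref.score, 2))
print("multi-reference BLEU: ", round(multi_ref.score, 2))
```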

“Kurosawa”: A Script Writer’s Assistant

  • paper_url: http://arxiv.org/abs/2308.03122
  • repo_url: None
  • paper_authors: Prerak Gandhi, Vishal Pramanik, Pushpak Bhattacharyya
  • for: 这 paper 的目的是提出一种基于 AI 的剧本创作工具箱,以便自动生成剧本和剧场。
  • methods: 该工具箱使用 GPT-3 进行微调,并使用 manually annotated 的剧本和剧场数据进行训练。
  • results: 经过评估后,这些自动生成的剧本和剧场被一家著名的娱乐平台 ErosNow 的编剧人员用于创作。
    Abstract Storytelling is the lifeline of the entertainment industry -- movies, TV shows, and stand-up comedies, all need stories. A good and gripping script is the lifeline of storytelling and demands creativity and resource investment. Good scriptwriters are rare to find and often work under severe time pressure. Consequently, entertainment media are actively looking for automation. In this paper, we present an AI-based script-writing workbench called KUROSAWA which addresses the tasks of plot generation and script generation. Plot generation aims to generate a coherent and creative plot (600-800 words) given a prompt (15-40 words). Script generation, on the other hand, generates a scene (200-500 words) in a screenplay format from a brief description (15-40 words). Kurosawa needs data to train. We use a 4-act structure of storytelling to annotate the plot dataset manually. We create a dataset of 1000 manually annotated plots and their corresponding prompts/storylines and a gold-standard dataset of 1000 scenes with four main elements -- scene headings, action lines, dialogues, and character names -- tagged individually. We fine-tune GPT-3 with the above datasets to generate plots and scenes. These plots and scenes are first evaluated and then used by the scriptwriters of a large and famous media platform ErosNow. We release the annotated datasets and the models trained on these datasets as a working benchmark for automatic movie plot and script generation.
    摘要 讲故事是娱乐产业的生命线——电影、电视节目和单口喜剧都需要故事。一个精彩而引人入胜的剧本是讲故事的关键,它需要创造力和资源投入。优秀的编剧十分稀缺,并且常常在严苛的时间压力下工作。因此,娱乐媒体正在积极寻求自动化手段。在本文中,我们介绍了一个基于人工智能的剧本创作工作台KUROSAWA,它面向情节生成和剧本生成两项任务。情节生成旨在根据一个提示(15-40词)生成连贯且富有创意的情节(600-800词);剧本生成则根据简短描述(15-40词)生成一个以剧本格式呈现的场景(200-500词)。KUROSAWA需要数据进行训练。我们采用讲故事的四幕结构对情节数据集进行人工标注,构建了包含1,000个人工标注情节及其对应提示/故事线的数据集,以及一个包含1,000个场景的黄金标准数据集,其中场景标题、动作行、对白和角色名四个主要元素均被分别标注。我们利用上述数据集对GPT-3进行微调,以生成情节和场景。这些情节和场景首先经过评估,随后被大型知名媒体平台ErosNow的编剧使用。我们发布了标注数据集以及在这些数据集上训练的模型,作为自动电影情节和剧本生成的工作基准。

PromptSum: Parameter-Efficient Controllable Abstractive Summarization

  • paper_url: http://arxiv.org/abs/2308.03117
  • repo_url: None
  • paper_authors: Mathieu Ravaut, Hailin Chen, Ruochen Zhao, Chengwei Qin, Shafiq Joty, Nancy Chen
  • for: 提高摘要生成的性能和可控性,同时实现参数效率和数据效率。
  • methods: combinig prompt tuning (PT) 技术和多任务目标,以及使用明确的实体提示。
  • results: 在 популяр的摘要生成benchmark上达到了竞争性ROUGE成绩,同时具有强的可控性,只需要 Parameters的几个数量级下锻炼。
    Abstract Prompt tuning (PT), a parameter-efficient technique that only tunes the additional prompt embeddings while keeping the backbone pre-trained language model (PLM) frozen, has shown promising results in language understanding tasks, especially in low-resource scenarios. However, effective prompt design methods suitable for generation tasks such as summarization are still lacking. At the same time, summarization guided through instructions (discrete prompts) can achieve a desirable double objective of high quality and controllability in summary generation. Towards a goal of strong summarization performance under the triple conditions of parameter-efficiency, data-efficiency, and controllability, we introduce PromptSum, a method combining PT with a multi-task objective and discrete entity prompts for abstractive summarization. Our model achieves competitive ROUGE results on popular abstractive summarization benchmarks coupled with a strong level of controllability through entities, all while only tuning several orders of magnitude less parameters.
    摘要 Prompt tuning(PT)是一种参数高效的技术,它只调整额外的提示嵌入,而保持预训练语言模型(PLM)主干冻结。该技术在语言理解任务中,尤其是低资源场景下,已经显示出良好的效果。然而,适用于摘要等生成任务的有效提示设计方法仍然缺乏。与此同时,通过指令(离散提示)引导的摘要生成可以同时实现高质量和可控性这一双重目标。为了在参数高效、数据高效和可控性三个条件下实现强大的摘要性能,我们提出了PromptSum,一种将PT与多任务目标以及离散实体提示相结合的抽象式摘要方法。我们的模型在常用的抽象式摘要基准上取得了具有竞争力的ROUGE分数,同时通过实体实现了很强的可控性,而所需调整的参数量要少好几个数量级。
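
A minimal sketch of parameter-efficient prompt tuning for a seq2seq summarizer, using the PEFT library, is shown below. It only illustrates the soft-prompt mechanism PromptSum builds on; the multi-task objective and entity prompts of the paper are not reproduced, and the backbone, prompt length, and toy batch are assumptions.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PromptTuningConfig, TaskType, get_peft_model

base = "t5-small"                          # illustrative backbone choice
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

peft_config = PromptTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,       # soft prompts prepended to the input
    num_virtual_tokens=100,                # only these embeddings are trained
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()         # a tiny fraction of the backbone

# A single illustrative training step on a toy example.
batch = tokenizer(["summarize: The quick brown fox jumps over the lazy dog."],
                  return_tensors="pt")
labels = tokenizer(["A fox jumps over a dog."], return_tensors="pt").input_ids
loss = model(**batch, labels=labels).loss
loss.backward()
print("loss:", loss.item())
```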

Improving Domain-Specific Retrieval by NLI Fine-Tuning

  • paper_url: http://arxiv.org/abs/2308.03103
  • repo_url: None
  • paper_authors: Roman Dušek, Aleksander Wawer, Christopher Galias, Lidia Wojciechowska
  • for: investigate the fine-tuning potential of natural language inference (NLI) data to improve information retrieval and ranking.
  • methods: employ both monolingual and multilingual sentence encoders fine-tuned by a supervised method utilizing contrastive loss and NLI data.
  • results: NLI fine-tuning increases the performance of the models in both tasks and both languages, with the potential to improve mono- and multilingual models.
  • for: 这篇论文旨在研究利用自然语言推理(NLI)数据进行微调的潜力,以改进信息检索和排序。
  • methods: 我们使用一种利用对比损失和NLI数据的有监督方法,对单语言和多语言句子编码器进行微调。
  • results: NLI微调提高了模型在两种任务和两种语言中的性能,并有望同时改进单语言和多语言模型。
    Abstract The aim of this article is to investigate the fine-tuning potential of natural language inference (NLI) data to improve information retrieval and ranking. We demonstrate this for both English and Polish languages, using data from one of the largest Polish e-commerce sites and selected open-domain datasets. We employ both monolingual and multilingual sentence encoders fine-tuned by a supervised method utilizing contrastive loss and NLI data. Our results point to the fact that NLI fine-tuning increases the performance of the models in both tasks and both languages, with the potential to improve mono- and multilingual models. Finally, we investigate uniformity and alignment of the embeddings to explain the effect of NLI-based fine-tuning for an out-of-domain use-case.
    摘要 本文旨在研究利用自然语言推理(NLI)数据进行微调以改进信息检索与排序的潜力。我们在英语和波兰语两种语言上进行了验证,数据来自波兰最大的电商网站之一以及若干开放领域数据集。我们采用了利用对比损失和NLI数据的有监督方法,对单语言和多语言句子编码器进行微调。结果表明,NLI微调在两种任务和两种语言上都提升了模型性能,并有望同时改进单语言和多语言模型。最后,我们通过分析嵌入的均匀性(uniformity)与对齐性(alignment),解释了基于NLI的微调在域外应用场景中的作用。
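
The abstract describes supervised fine-tuning of sentence encoders with a contrastive loss on NLI data. A minimal sketch with the sentence-transformers library is shown below; the multilingual checkpoint, the in-batch-negatives loss, and the toy English/Polish triplets are illustrative assumptions rather than the paper's exact setup.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Contrastive fine-tuning on NLI-style (anchor, entailed, contradicted) triplets.
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

train_examples = [
    InputExample(texts=["A man is playing guitar.",
                        "Someone is making music.",
                        "Nobody is holding an instrument."]),
    InputExample(texts=["Kup buty do biegania.",           # Polish: "buy running shoes"
                        "Obuwie sportowe do biegania.",
                        "Krem do opalania na lato."]),
]
loader = DataLoader(train_examples, batch_size=2, shuffle=True)

# MultipleNegativesRankingLoss treats the second text as a positive and,
# with triplets, the third as a hard negative -- a common NLI setup.
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=0)
```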

LARCH: Large Language Model-based Automatic Readme Creation with Heuristics

  • paper_url: http://arxiv.org/abs/2308.03099
  • repo_url: https://github.com/hitachi-nlp/larch
  • paper_authors: Yuta Koreeda, Terufumi Morishita, Osamu Imaichi, Yasuhiro Sogawa
  • for: This paper aims to demonstrate the ability of large language models (LLMs) to generate coherent and factually correct readmes for software development projects, and to introduce a new tool called LARCH (LLM-based Automatic Readme Creation with Heuristics) that leverages representative code identification with heuristics and weak supervision to achieve this goal.
  • methods: The authors use a dataset of 100 open-source projects to train and evaluate LARCH, and compare its performance with a baseline that does not rely on representative code identification. They use human and automated evaluations to assess the quality of the generated readmes, and show that LARCH outperforms the baseline in the majority of cases.
  • results: The authors report that LARCH is capable of generating coherent and factually correct readmes in the majority of cases, and that it outperforms the baseline in terms of readability, accuracy, and completeness. They also provide a demo video showcasing LARCH's capabilities, which is available at https://youtu.be/ZUKkh5ED-O4.
    Abstract Writing a readme is a crucial aspect of software development as it plays a vital role in managing and reusing program code. Though it is a pain point for many developers, automatically creating one remains a challenge even with the recent advancements in large language models (LLMs), because it requires generating an abstract description from thousands of lines of code. In this demo paper, we show that LLMs are capable of generating a coherent and factually correct readmes if we can identify a code fragment that is representative of the repository. Building upon this finding, we developed LARCH (LLM-based Automatic Readme Creation with Heuristics) which leverages representative code identification with heuristics and weak supervision. Through human and automated evaluations, we illustrate that LARCH can generate coherent and factually correct readmes in the majority of cases, outperforming a baseline that does not rely on representative code identification. We have made LARCH open-source and provided a cross-platform Visual Studio Code interface and command-line interface, accessible at https://github.com/hitachi-nlp/larch. A demo video showcasing LARCH's capabilities is available at https://youtu.be/ZUKkh5ED-O4.
    摘要 撰写readme文档是软件开发中的一个重要环节,它在代码的管理和复用中扮演着关键角色。尽管这是许多开发者的痛点,但即使在大语言模型(LLM)取得最新进展的情况下,自动生成readme仍然是一项挑战,因为这需要从数千行代码中生成抽象描述。在这篇演示论文中,我们展示了只要能够识别出能代表整个代码仓库的代码片段,LLM就能够生成连贯且事实正确的readme。基于这一发现,我们开发了LARCH(LLM-based Automatic Readme Creation with Heuristics),它结合启发式规则和弱监督来进行代表性代码识别。通过人工和自动评估,我们表明LARCH在大多数情况下能够生成连贯且事实正确的readme,优于不依赖代表性代码识别的基线方法。我们已将LARCH开源,并提供了跨平台的Visual Studio Code界面和命令行接口,可在https://github.com/hitachi-nlp/larch获取。展示LARCH功能的演示视频见https://youtu.be/ZUKkh5ED-O4。

System-Initiated Transitions from Chit-Chat to Task-Oriented Dialogues with Transition Info Extractor and Transition Sentence Generator

  • paper_url: http://arxiv.org/abs/2308.03098
  • repo_url: None
  • paper_authors: Ye Liu, Stefan Ultes, Wolfgang Minker, Wolfgang Maier
  • for: investigate how a unified dialogue model can take the initiative during the dialogue mode transition from chit-chat to task-oriented in a coherent and cooperative manner.
  • methods: built a {transition information extractor} (TIE) and a {transition sentence generator} (TSG) through efficient Adapter tuning and transition prompt learning.
  • results: achieved promising performance regarding the proactive transitions and improved the TIE model by utilizing Conditional Random Fields (CRF). The TSG can flexibly generate transition sentences while maintaining the unified capabilities of normal chit-chat and task-oriented response generation.
    Abstract In this work, we study dialogue scenarios that start from chit-chat but eventually switch to task-related services, and investigate how a unified dialogue model, which can engage in both chit-chat and task-oriented dialogues, takes the initiative during the dialogue mode transition from chit-chat to task-oriented in a coherent and cooperative manner. We firstly build a {transition info extractor} (TIE) that keeps track of the preceding chit-chat interaction and detects the potential user intention to switch to a task-oriented service. Meanwhile, in the unified model, a {transition sentence generator} (TSG) is extended through efficient Adapter tuning and transition prompt learning. When the TIE successfully finds task-related information from the preceding chit-chat, such as a transition domain, then the TSG is activated automatically in the unified model to initiate this transition by generating a transition sentence under the guidance of transition information extracted by TIE. The experimental results show promising performance regarding the proactive transitions. We achieve an additional large improvement on TIE model by utilizing Conditional Random Fields (CRF). The TSG can flexibly generate transition sentences while maintaining the unified capabilities of normal chit-chat and task-oriented response generation.
    摘要 在这项研究中,我们研究了从小聊天转移到任务相关服务的对话场景,并研究了一个统一对话模型,可以同时参与小聊天和任务准备对话。我们首先建立了一个{"transition信息抽取器"}(TIE),以跟踪先前的小聊天交互并检测用户可能的任务转换意图。在统一模型中,我们通过高效的Adapter调整和转换提示学习扩展了{转换句生成器}(TSG)。当TIE成功检测到前一个小聊天中的任务相关信息,例如转换领域,那么TSG会自动在统一模型中被启动,通过生成一个转换句来实现转换,并且在TIE提供的转换信息的指导下进行转换句生成。实验结果表明,我们在掌握前一个小聊天中的任务相关信息后,可以通过TSG生成转换句来实现掌握任务的跳转,并且可以保持统一的对话能力。此外,我们还通过使用 Conditional Random Fields (CRF) 来提高TIE模型的性能,并且TSG可以灵活地生成转换句,同时保持统一的对话能力。

TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties

  • paper_url: http://arxiv.org/abs/2308.03051
  • repo_url: None
  • paper_authors: Karima Kadaoui, Samar M. Magdy, Abdul Waheed, Md Tawkat Islam Khondaker, Ahmed Oumar El-Shangiti, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed
  • for: 这项研究旨在评估Google Bard和OpenAI ChatGPT等大型语言模型在十种阿拉伯语变体上的机器翻译能力。
  • methods: 研究评估了这些模型在古典阿拉伯语、现代标准阿拉伯语和若干方言变体上的机器翻译表现;此外,研究还进行了以人为中心的实验,以评估Bard模型在翻译任务中遵循人类指令的能力。
  • results: 研究发现,LLMs在某些阿拉伯语方言上存在困难,特别是公共数据极少的方言,如阿尔及利亚和毛里塔尼亚方言;但在更常见的方言上表现尚可,尽管有时落后于Google Translate等商业系统。此外,研究还发现Bard模型在翻译任务中遵循人类指令的能力有限。总体而言,研究表明现有的LLMs在包容性方面仍有不足,难以充分照顾不同社区的语言和文化特点。
    Abstract Large language models (LLMs) finetuned to follow human instructions have recently emerged as a breakthrough in AI. Models such as Google Bard and OpenAI ChatGPT, for example, are surprisingly powerful tools for question answering, code debugging, and dialogue generation. Despite the purported multilingual proficiency of these models, their linguistic inclusivity remains insufficiently explored. Considering this constraint, we present a thorough assessment of Bard and ChatGPT (encompassing both GPT-3.5 and GPT-4) regarding their machine translation proficiencies across ten varieties of Arabic. Our evaluation covers diverse Arabic varieties such as Classical Arabic, Modern Standard Arabic, and several nuanced dialectal variants. Furthermore, we undertake a human-centric study to scrutinize the efficacy of the most recent model, Bard, in following human instructions during translation tasks. Our exhaustive analysis indicates that LLMs may encounter challenges with certain Arabic dialects, particularly those for which minimal public data exists, such as Algerian and Mauritanian dialects. However, they exhibit satisfactory performance with more prevalent dialects, albeit occasionally trailing behind established commercial systems like Google Translate. Additionally, our analysis reveals a circumscribed capability of Bard in aligning with human instructions in translation contexts. Collectively, our findings underscore that prevailing LLMs remain far from inclusive, with only limited ability to cater for the linguistic and cultural intricacies of diverse communities.
    摘要 经过微调以遵循人类指令的大型语言模型(LLM),如Google Bard和OpenAI ChatGPT,近来成为人工智能领域的一项突破,在问答、代码调试和对话生成等方面表现出惊人的能力。然而,尽管这些模型号称具备多语言能力,其语言包容性仍未得到充分探索。鉴于这一局限,我们对Bard和ChatGPT(包括GPT-3.5和GPT-4)在十种阿拉伯语变体上的机器翻译能力进行了全面评估,涵盖古典阿拉伯语、现代标准阿拉伯语以及若干细分的方言变体。此外,我们还开展了以人为中心的研究,以考察最新的Bard模型在翻译任务中遵循人类指令的能力。我们的详尽分析表明,LLMs在某些阿拉伯语方言上可能会遇到困难,特别是公共数据极少的方言,如阿尔及利亚和毛里塔尼亚方言;但它们在更常见的方言上表现尚可,尽管有时落后于Google Translate等成熟的商业系统。此外,我们的分析还揭示了Bard在翻译情境下遵循人类指令的能力有限。总体而言,我们的发现强调,当前的LLMs距离真正的包容仍有很大差距,只能在有限程度上照顾不同社区的语言和文化特点。

3D-EX : A Unified Dataset of Definitions and Dictionary Examples

  • paper_url: http://arxiv.org/abs/2308.03043
  • repo_url: https://github.com/f-almeman/3d-ex
  • paper_authors: Fatemah Almeman, Hadi Sheikhi, Luis Espinosa-Anke
  • for: 这篇论文的目的是为了提供一个中央知识库, combin ing 英语资源,以便在 NLP 任务中使用。
  • methods: 这篇论文使用了 <term, definition, example> triplets 来填充 lexical 资源的 gap,并提供了一个统一的评估框架,以避免 memorization。
  • results: 实验结果表明,这些数据可以在下游 NLP 任务中有效地应用。
    Abstract Definitions are a fundamental building block in lexicography, linguistics and computational semantics. In NLP, they have been used for retrofitting word embeddings or augmenting contextual representations in language models. However, lexical resources containing definitions exhibit a wide range of properties, which has implications in the behaviour of models trained and evaluated on them. In this paper, we introduce 3D-EX, a dataset that aims to fill this gap by combining well-known English resources into one centralized knowledge repository in the form of <term, definition, example> triples. 3D-EX is a unified evaluation framework with carefully pre-computed train/validation/test splits to prevent memorization. We report experimental results that suggest that this dataset could be effectively leveraged in downstream NLP tasks. Code and data are available at https://github.com/F-Almeman/3D-EX .
    摘要 定义是词典学、语言学和计算语义学中的基本构件。在自然语言处理中,定义被用于改进词向量或增强语言模型的上下文表示。然而,包含定义的词汇资源在性质上差异很大,这会影响在其上训练和评估的模型的行为。本文介绍了3D-EX数据集,它以<术语, 定义, 例句>三元组的形式,将多个知名的英语资源整合到一个集中式知识库中,以填补这一空白。3D-EX提供了一个统一的评估框架,并精心预先划分了训练/验证/测试集,以避免记忆效应。实验结果表明,该数据集可以在下游NLP任务中得到有效利用。代码和数据见https://github.com/F-Almeman/3D-EX。

Multi-Source (Pre-)Training for Cross-Domain Measurement, Unit and Context Extraction

  • paper_url: http://arxiv.org/abs/2308.02951
  • repo_url: https://github.com/liy140/multidomain-measextract-corpus
  • paper_authors: Yueling Li, Sebastian Martschat, Simone Paolo Ponzetto
  • for: 本研究旨在开发一种跨Domain的自动量测和上下文提取方法,利用预训练语言模型。
  • methods: 我们构建了多源多Domain的语料库,并在这个语料库上训练了一个端到端提取管道。然后,我们进行多源任务适应性预训练和细化调整,以评估我们的模型在跨Domain的总体化能力。
  • results: 我们的结果表明,多源训练导致最佳总体结果,而单源训练对各个域的结果具有最佳效果。虽然我们的设置可以提取量值和单位,但需要进一步研究以提高上下文实体的提取。我们将在线上发布使用的跨Domain语料库。
    Abstract We present a cross-domain approach for automated measurement and context extraction based on pre-trained language models. We construct a multi-source, multi-domain corpus and train an end-to-end extraction pipeline. We then apply multi-source task-adaptive pre-training and fine-tuning to benchmark the cross-domain generalization capability of our model. Further, we conceptualize and apply a task-specific error analysis and derive insights for future work. Our results suggest that multi-source training leads to the best overall results, while single-source training yields the best results for the respective individual domain. While our setup is successful at extracting quantity values and units, more research is needed to improve the extraction of contextual entities. We make the cross-domain corpus used in this work available online.
    摘要 我们提出了跨领域方法,用于自动测量和上下文提取,基于预训练语言模型。我们构建了多源多领域 corpora,并训练了端到端提取管道。然后,我们实施多源任务适应预训练和精度调整,以评估我们模型的跨领域一致性。此外,我们提出了任务特定错误分析,并从中提取了未来工作的想法。我们的结果表明,多源训练得到了最佳总结果,而单源训练得到了每个固定领域的最佳结果。虽然我们的设置可以提取量值和单位,但更多的研究是需要进一步提高上下文实体的提取。我们在这里发布了用于这项工作的跨领域 corpus。

Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval

  • paper_url: http://arxiv.org/abs/2308.02926
  • repo_url: https://github.com/Haoxiang-WasedaU/Towards-Consistency-Filtering-Free-Unsupervised-Learning-for-Dense-Retrieval
  • paper_authors: Haoxiang Shi, Sumio Fujita, Tetsuya Sakai
  • for: 这个研究是为了解决现代神经信息搜寻(IR)中的领域转移问题。
  • methods: 这个研究使用了不同的方法来取代过去常用的领域专门的手动标注和人工生成的统计数据,以提高rankere的效率和表现。
  • results: 研究结果显示,使用TextRank基于pseudo relevance feedback的方法可以更好地超越其他方法,而且训练和测试效率都能持续提高。
    Abstract Domain transfer is a prevalent challenge in modern neural Information Retrieval (IR). To overcome this problem, previous research has utilized domain-specific manual annotations and synthetic data produced by consistency filtering to finetune a general ranker and produce a domain-specific ranker. However, training such consistency filters are computationally expensive, which significantly reduces the model efficiency. In addition, consistency filtering often struggles to identify retrieval intentions and recognize query and corpus distributions in a target domain. In this study, we evaluate a more efficient solution: replacing the consistency filter with either direct pseudo-labeling, pseudo-relevance feedback, or unsupervised keyword generation methods for achieving consistent filtering-free unsupervised dense retrieval. Our extensive experimental evaluations demonstrate that, on average, TextRank-based pseudo relevance feedback outperforms other methods. Furthermore, we analyzed the training and inference efficiency of the proposed paradigm. The results indicate that filtering-free unsupervised learning can continuously improve training and inference efficiency while maintaining retrieval performance. In some cases, it can even improve performance based on particular datasets.
    摘要 域名转移是现代神经信息检索(IR)中的一大挑战。以前的研究使用了域名特定的手动标注和生成的人工数据来训练一个通用排名器,并生成一个域名特定的排名器。然而,训练这些一致性筛选器是计算机代价高昂,这会明显降低模型效率。另外,一致性筛选器经常难以识别检索意图和查询和文献分布在目标域中。在这种研究中,我们评估了一种更高效的解决方案:取代一致性筛选器,使用直接 pseudo-标注、 pseudo-相关反馈或无监督关键词生成方法来实现一致性自由无监督排名。我们对这些方法进行了广泛的实验评估,结果显示,在平均情况下,基于 TextRank 的 pseudo-相关反馈方法表现较好。此外,我们还分析了我们提议的训练和推理效率。结果表明,无监督无筛选的学习可以不断提高训练和推理效率,同时维持检索性能。在某些情况下,它可以超越基于特定数据集的表现。
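
Since TextRank-based pseudo-relevance feedback performs best in the study, the sketch below shows a generic TextRank-style keyword extractor (PageRank over a word co-occurrence graph) that could mine expansion terms from top-retrieved passages. The tokenization, window size, and filtering are illustrative choices, not the paper's exact pipeline.

```python
import networkx as nx

def textrank_keywords(text, window=2, top_k=5):
    """Rank words by PageRank over a co-occurrence graph (TextRank-style)."""
    words = [w.lower().strip(".,") for w in text.split()]
    words = [w for w in words if len(w) > 3]          # crude stopword/length filter
    graph = nx.Graph()
    for i, w in enumerate(words):
        for other in words[i + 1: i + 1 + window]:    # link words within a window
            graph.add_edge(w, other)
    scores = nx.pagerank(graph)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy usage: mine expansion terms from a top-retrieved passage, standing in for
# the pseudo-relevance-feedback loop described in the paper.
passage = ("Domain adaptation for dense retrieval often relies on synthetic "
           "queries and consistency filtering to select training pairs.")
print(textrank_keywords(passage))
```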

cs.LG - 2023-08-06

AI-GOMS: Large AI-Driven Global Ocean Modeling System

  • paper_url: http://arxiv.org/abs/2308.03152
  • repo_url: None
  • paper_authors: Wei Xiong, Yanfei Xiang, Hao Wu, Shuyi Zhou, Yuze Sun, Muyuan Ma, Xiaomeng Huang
  • for: 这个论文的目的是为了提出一种基于人工智能的全球海洋模拟系统,以实现精确和高效的全球海洋日常预测。
  • methods: 该论文使用了一种基于 Fourier-based Masked Autoencoder 结构的基本海洋变量预测模型,以及一些轻量级的细化预测模型,包括地区下降、涟式解码和生物化 Coupling 模块。
  • results: 该论文实现了在30天预测期间,全球海洋基本变量的最佳性能,并且可以正确 simulate mesoscale eddies 在日本热带区域和海洋层次分布在赤道太平洋区域。该系统还实现了对 Earth system 模型的新脊梁下游方式,使其可以易于转移、扩展和重用。
    Abstract Ocean modeling is a powerful tool for simulating the physical, chemical, and biological processes of the ocean, which is the foundation for marine science research and operational oceanography. Modern numerical ocean modeling mainly consists of governing equations and numerical algorithms. Nonlinear instability, computational expense, low reusability efficiency and high coupling costs have gradually become the main bottlenecks for the further development of numerical ocean modeling. Recently, artificial intelligence-based modeling in scientific computing has shown revolutionary potential for digital twins and scientific simulations, but the bottlenecks of numerical ocean modeling have not been further solved. Here, we present AI-GOMS, a large AI-driven global ocean modeling system, for accurate and efficient global ocean daily prediction. AI-GOMS consists of a backbone model with the Fourier-based Masked Autoencoder structure for basic ocean variable prediction and lightweight fine-tuning models incorporating regional downscaling, wave decoding, and biochemistry coupling modules. AI-GOMS has achieved the best performance in 30 days of prediction for the global ocean basic variables with 15 depth layers at 1/4{\deg} spatial resolution. Beyond the good performance in statistical metrics, AI-GOMS realizes the simulation of mesoscale eddies in the Kuroshio region at 1/12{\deg} spatial resolution and ocean stratification in the tropical Pacific Ocean. AI-GOMS provides a new backbone-downstream paradigm for Earth system modeling, which makes the system transferable, scalable and reusable.
    摘要 海洋模拟是一种强大的工具,用于模拟海洋物理、化学和生物过程,这是海洋科学研究和操作海洋学的基础。现代数值海洋模拟主要由管理方程和数值算法组成。不线性不稳定、计算成本高、 reuse效率低和对接成本高逐渐成为现代数值海洋模拟的主要瓶颈。近年来,基于人工智能的科学计算方法在数值海洋模拟中表现出革命性的潜力,但现代数值海洋模拟中的瓶颈问题尚未得到解决。在这种情况下,我们提出了AI-GOMS,一个大规模的人工智能驱动的全球海洋模拟系统,用于准确和高效地预测全球海洋日常变化。AI-GOMS包括一个基础模型,其结构基于 Fourier-based Masked Autoencoder,用于预测基本海洋变量。此外,AI-GOMS还包括一些轻量级的精度增强模型,包括地区下降、浪谱解码和生物化学相互作用模块。AI-GOMS在30天预测全球海洋基本变量的15层深度分辨率下达到了最佳性能。除了在统计指标上的良好表现外,AI-GOMS还能够模拟kuroshio区域的宏观瑞度涟潮和热带太平洋海洋的层次分布。AI-GOMS提供了一个新的基础-下游模式,用于地球系统模拟,这使得系统可转移、可扩展和可重用。

Nest-DGIL: Nesterov-optimized Deep Geometric Incremental Learning for CS Image Reconstruction

  • paper_url: http://arxiv.org/abs/2308.03807
  • repo_url: https://github.com/fanxiaohong/Nest-DGIL
  • paper_authors: Xiaohong Fan, Yin Yang, Ke Chen, Yujie Feng, Jianping Zhang
  • for: 提高图像恢复精度和速度,并减少artefacts。
  • methods: 基于第二代Nesterov proximal梯度优化的深度凝师增量学习框架。
  • results: 提出了一种可 theoretically guarantee geometric texture details的恢复方法,并且可以快速 converges。
    Abstract Proximal gradient-based optimization is one of the most common strategies for solving image inverse problems as well as easy to implement. However, these techniques often generate heavy artifacts in image reconstruction. One of the most popular refinement methods is to fine-tune the regularization parameter to alleviate such artifacts, but it may not always be sufficient or applicable due to increased computational costs. In this work, we propose a deep geometric incremental learning framework based on second Nesterov proximal gradient optimization. The proposed end-to-end network not only has the powerful learning ability for high/low frequency image features,but also can theoretically guarantee that geometric texture details will be reconstructed from preliminary linear reconstruction.Furthermore, it can avoid the risk of intermediate reconstruction results falling outside the geometric decomposition domains and achieve fast convergence. Our reconstruction framework is decomposed into four modules including general linear reconstruction, cascade geometric incremental restoration, Nesterov acceleration and post-processing. In the image restoration step,a cascade geometric incremental learning module is designed to compensate for the missing texture information from different geometric spectral decomposition domains. Inspired by overlap-tile strategy, we also develop a post-processing module to remove the block-effect in patch-wise-based natural image reconstruction. All parameters in the proposed model are learnable,an adaptive initialization technique of physical-parameters is also employed to make model flexibility and ensure converging smoothly. We compare the reconstruction performance of the proposed method with existing state-of-the-art methods to demonstrate its superiority. Our source codes are available at https://github.com/fanxiaohong/Nest-DGIL.
    摘要 基于近端梯度的优化是求解图像逆问题最常用的策略之一,且易于实现。然而,这类技术在图像重建中常常产生严重的伪影。最常见的改进方法之一是微调正则化参数以减轻此类伪影,但由于计算成本增加,这种做法并不总是足够或可行。在这项工作中,我们提出了一种基于第二类Nesterov近端梯度优化的深度几何增量学习框架。所提出的端到端网络不仅对高/低频图像特征具有强大的学习能力,还能够在理论上保证从初步线性重建中恢复几何纹理细节。此外,它可以避免中间重建结果落在几何分解域之外的风险,并实现快速收敛。我们的重建框架分解为四个模块:一般线性重建、级联几何增量恢复、Nesterov加速和后处理。在图像恢复步骤中,我们设计了级联几何增量学习模块,以补偿来自不同几何谱分解域的缺失纹理信息。受overlap-tile策略启发,我们还开发了后处理模块,以消除基于图像块的自然图像重建中的块效应。所提模型中的所有参数均可学习,同时采用物理参数的自适应初始化技术,使模型更加灵活并确保平稳收敛。我们将所提方法的重建性能与现有最新方法进行了比较,以证明其优越性。源代码见https://github.com/fanxiaohong/Nest-DGIL。
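
The Nesterov-accelerated proximal gradient iteration underlying the framework can be sketched on a toy sparse-recovery problem, as below. This FISTA-style loop with an $\ell_1$ proximal operator only illustrates the optimization building block; the paper's learned cascade of geometric modules is not reproduced.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the l1 norm (a generic sparsity regularizer)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def nesterov_proximal_gradient(A, y, lam=0.05, n_iter=100):
    """FISTA-style accelerated proximal gradient for min 0.5||Ax-y||^2 + lam||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1]); z = x.copy(); t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)
        x_new = soft_threshold(z - grad / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)   # Nesterov momentum step
        x, t = x_new, t_new
    return x

# Toy compressed-sensing problem: recover a sparse signal from few measurements.
rng = np.random.default_rng(0)
A = rng.normal(size=(64, 256))
x_true = np.zeros(256); x_true[rng.choice(256, 8, replace=False)] = 1.0
y = A @ x_true
x_hat = nesterov_proximal_gradient(A, y)
print("recovery error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```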

Self-Directed Linear Classification

  • paper_url: http://arxiv.org/abs/2308.03142
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis
  • for: 这个论文研究了在线分类中learner预测标签的顺序选择问题,以实现最小化总错误数。
  • methods: 作者使用了自适应顺序选择方法,并设计了两个主要结果:一是对于uniformly随机从单位球体上取样的$X$数据集,设计了一个高效的自适应学习者,其错误数为$O(d \log \log n)$;二是对于任意$d$-维数据集$X$,设计了一个高效的自适应学习者,可以预测$X$中99%的点标签,错误数与$n$无关。
  • results: 作者的研究表明,在线分类中,采用自适应顺序选择方法可以实现较低的错误率,比如worst-order和Random-order学习方法的至少$\Omega(d \log n)$错误率。
    Abstract In online classification, a learner is presented with a sequence of examples and aims to predict their labels in an online fashion so as to minimize the total number of mistakes. In the self-directed variant, the learner knows in advance the pool of examples and can adaptively choose the order in which predictions are made. Here we study the power of choosing the prediction order and establish the first strong separation between worst-order and random-order learning for the fundamental task of linear classification. Prior to our work, such a separation was known only for very restricted concept classes, e.g., one-dimensional thresholds or axis-aligned rectangles. We present two main results. If $X$ is a dataset of $n$ points drawn uniformly at random from the $d$-dimensional unit sphere, we design an efficient self-directed learner that makes $O(d \log \log(n))$ mistakes and classifies the entire dataset. If $X$ is an arbitrary $d$-dimensional dataset of size $n$, we design an efficient self-directed learner that predicts the labels of $99\%$ of the points in $X$ with mistake bound independent of $n$. In contrast, under a worst- or random-ordering, the number of mistakes must be at least $\Omega(d \log n)$, even when the points are drawn uniformly from the unit sphere and the learner only needs to predict the labels for $1\%$ of them.
    摘要 在在线分类中,学习者会依次看到一系列样本,并以在线方式预测其标签,以最小化总错误数。在自主导向(self-directed)变体中,学习者事先知道样本池,并可以自适应地选择进行预测的顺序。我们研究选择预测顺序所带来的能力,并针对线性分类这一基本任务,首次证明了最坏顺序学习与随机顺序学习之间的强分离。在我们的工作之前,这种分离仅对非常受限的概念类已知,例如一维阈值或轴对齐矩形。我们给出两个主要结果。如果$X$是从$d$维单位球面上均匀随机抽取的$n$个点构成的数据集,我们设计了一个高效的自主导向学习者,它只犯$O(d \log \log(n))$次错误,并对整个数据集完成分类。如果$X$是任意的$d$维、大小为$n$的数据集,我们设计了一个高效的自主导向学习者,它能够预测$X$中$99\%$的点的标签,且错误上界与$n$无关。相比之下,在最坏顺序或随机顺序下,即使这些点是从单位球面上均匀抽取的,并且学习者只需预测其中$1\%$的点的标签,错误数也至少为$\Omega(d \log n)$。
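
A toy illustration of the self-directed setting (the learner knows the pool and picks which point to predict next) is sketched below with a perceptron that always predicts its currently most confident point first. This heuristic is an assumption made for illustration; it is not the algorithm analyzed in the paper and does not attain its mistake bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 2_000
X = rng.normal(size=(n, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
w_star = rng.normal(size=d)
y = np.sign(X @ w_star)                    # labels from a hidden halfspace

def run(order_by_confidence: bool):
    """Perceptron over the full pool; the learner may choose the prediction order."""
    w = np.zeros(d)
    remaining = list(range(n))
    mistakes = 0
    while remaining:
        if order_by_confidence:
            # Self-directed choice: predict the point we are currently most sure about.
            i = max(remaining, key=lambda j: abs(X[j] @ w))
        else:
            i = remaining[rng.integers(len(remaining))]   # random order
        remaining.remove(i)
        pred = np.sign(X[i] @ w) or 1.0
        if pred != y[i]:
            mistakes += 1
            w += y[i] * X[i]                # standard perceptron update
    return mistakes

print("self-directed mistakes:", run(order_by_confidence=True))
print("random-order mistakes: ", run(order_by_confidence=False))
```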

Iterative Magnitude Pruning as a Renormalisation Group: A Study in The Context of The Lottery Ticket Hypothesis

  • paper_url: http://arxiv.org/abs/2308.03128
  • repo_url: None
  • paper_authors: Abu-Al Hassan
  • for: This paper explores the Lottery Ticket Hypothesis (LTH) in Deep Neural Networks (DNNs), which suggests that smaller, trainable subnetworks within extensive DNNs can achieve performance comparable to the full model.
  • methods: The paper uses Iterative Magnitude Pruning (IMP) to identify and eliminate minimal weights in DNNs, emulating stepwise learning. The authors also investigate the “universality” of winning tickets and their applicability to other similar problems.
  • results: The paper bridges the gap between IMP and the Renormalisation Group (RG) theory in physics, providing a more rigorous understanding of IMP and its potential applications in DNNs.
    Abstract This thesis delves into the intricate world of Deep Neural Networks (DNNs), focusing on the exciting concept of the Lottery Ticket Hypothesis (LTH). The LTH posits that within extensive DNNs, smaller, trainable subnetworks termed "winning tickets", can achieve performance comparable to the full model. A key process in LTH, Iterative Magnitude Pruning (IMP), incrementally eliminates minimal weights, emulating stepwise learning in DNNs. Once we identify these winning tickets, we further investigate their "universality". In other words, we check if a winning ticket that works well for one specific problem could also work well for other, similar problems. We also bridge the divide between the IMP and the Renormalisation Group (RG) theory in physics, promoting a more rigorous understanding of IMP.
    摘要 这个论文探讨了深度神经网络(DNN)的复杂世界,特别关注了赢家票假设(LTH)。LTH认为,在广泛的DNN中,更小的、可训练的子网络(赢家票)可以达到相同的性能。我们在LTH中使用增量大小减少(IMP)来逐渐减少最小的权重,模拟了DNN中的步骤学习。一旦我们identified这些赢家票,我们进一步调查它们的“通用性”。即我们检查一个赢家票在一个特定问题上能够达到高性能是否也能够在其他相似问题上达到高性能。我们还将IMP与物理学RG理论相连接,以促进IMP的更加准确的理解。
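
A minimal sketch of the iterative magnitude pruning loop studied in the thesis is shown below, using PyTorch's pruning utilities: train, prune the smallest surviving weights, rewind the survivors to their initialization, and repeat. The tiny MLP, random data, pruning fraction, and number of rounds are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small MLP stands in for the "extensive DNN".
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
init_weights = {k: v.clone() for k, v in model.state_dict().items()}

def train_one_round(m):
    """Placeholder training loop on random data; substitute a real dataset."""
    opt = torch.optim.SGD(m.parameters(), lr=0.1)
    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    for _ in range(10):
        opt.zero_grad()
        nn.functional.cross_entropy(m(x), y).backward()
        opt.step()

prunable = {"0": model[0], "2": model[2]}        # the two Linear layers
for round_idx in range(3):                       # remove 20% of survivors each round
    train_one_round(model)
    for layer in prunable.values():
        prune.l1_unstructured(layer, name="weight", amount=0.2)
    # Rewind surviving weights to their original initialization ("winning ticket").
    with torch.no_grad():
        for key, layer in prunable.items():
            layer.weight_orig.copy_(init_weights[f"{key}.weight"])

sparsity = 1 - model[0].weight.count_nonzero().item() / model[0].weight.numel()
print(f"layer-0 sparsity after 3 rounds: {sparsity:.2%}")
```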

  • paper_url: http://arxiv.org/abs/2308.03102
  • repo_url: None
  • paper_authors: Max McGuinness
  • for: 本研究探讨了两种最近的学习率优化方法:D-Adaptation(arXiv:2301.07733)和概率线搜(arXiv:1502.02846)。这两种方法强调缓解选择初始学习率的负担,通过距离度量和 Gaussian 过程 posterior 估计,分别实现了。
  • methods: 本研究使用了 D-Adaptation 方法和概率线搜方法。D-Adaptation 方法基于距离度量,可以在不同的批处理大小下选择最佳学习率。概率线搜方法则使用 Gaussian 过程 posterior 估计来估计学习率的变化范围。
  • results: 本研究通过对两种方法的比较,发现 D-Adaptation 方法在某些情况下可以提供更高的准确率,而概率线搜方法在其他情况下可以提供更快的收敛速率。此外,本研究还发现了这两种方法在不同的批处理大小下的表现。
    Abstract This paper explores two recent methods for learning rate optimisation in stochastic gradient descent: D-Adaptation (arXiv:2301.07733) and probabilistic line search (arXiv:1502.02846). These approaches aim to alleviate the burden of selecting an initial learning rate by incorporating distance metrics and Gaussian process posterior estimates, respectively. In this report, I provide an intuitive overview of both methods, discuss their shared design goals, and devise scope for merging the two algorithms.
    摘要 这份报告探讨了两种最近的学习率优化方法:D-Adaptation(arXiv:2301.07733)和概率线搜索(arXiv:1502.02846)。这两种方法目的是减轻选择初始学习率的负担,通过距离度量和 Gaussian 过程 posterior 估计,分别提供了一种简单的概念概述、讨论这两种方法的共同设计目标,并提出将这两种算法合并的范围。

Gradient Coding through Iterative Block Leverage Score Sampling

  • paper_url: http://arxiv.org/abs/2308.03096
  • repo_url: None
  • paper_authors: Neophytos Charalambides, Mert Pilanci, Alfred Hero
  • for: 这个论文是为了解决分布式计算中的失败问题(即慢进度),通过使用采样技术和编码计算方法来加速线性回归。
  • methods: 论文使用了一种基于采样的方法,即采样转换后的数据,以获得一个近似的$\ell_2$子空间嵌入。此外,它还使用了一种称为“编码计算”的方法,来加速线性回归。
  • results: 论文得到了一些有用的结果,包括:1) 采样技术可以在分布式计算中减少计算量,同时保持Solution的质量;2) 编码计算方法可以在分布式计算中加速线性回归,并且可以与采样技术结合使用以获得更好的性能。
    Abstract We generalize the leverage score sampling sketch for $\ell_2$-subspace embeddings, to accommodate sampling subsets of the transformed data, so that the sketching approach is appropriate for distributed settings. This is then used to derive an approximate coded computing approach for first-order methods; known as gradient coding, to accelerate linear regression in the presence of failures in distributed computational networks, \textit{i.e.} stragglers. We replicate the data across the distributed network, to attain the approximation guarantees through the induced sampling distribution. The significance and main contribution of this work, is that it unifies randomized numerical linear algebra with approximate coded computing, while attaining an induced $\ell_2$-subspace embedding through uniform sampling. The transition to uniform sampling is done without applying a random projection, as in the case of the subsampled randomized Hadamard transform. Furthermore, by incorporating this technique to coded computing, our scheme is an iterative sketching approach to approximately solving linear regression. We also propose weighting when sketching takes place through sampling with replacement, for further compression.
    摘要 我们将用于 $\ell_2$ 子空间嵌入的杠杆得分采样草图加以推广,使其可以对变换后数据的子集进行采样,从而适用于分布式计算环境。在此基础上,我们推导出一种用于一阶方法的近似编码计算方案,即梯度编码,以在分布式计算网络存在掉队节点(stragglers)时加速线性回归。我们将数据在分布式网络中复制,通过诱导出的采样分布获得近似保证。本工作的意义和主要贡献在于,它将随机数值线性代数与近似编码计算统一起来,并通过均匀采样获得诱导的 $\ell_2$ 子空间嵌入;与子采样随机 Hadamard 变换不同,向均匀采样的转换无需随机投影。此外,将该技术引入编码计算后,我们的方案成为一种用于近似求解线性回归的迭代草图方法。我们还提出在有放回采样时加权,以进一步压缩。
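
As a small illustration of the sketching idea the abstract builds on, the following Python sketch performs plain leverage score sampling for an approximate $\ell_2$ least-squares solve. It is only the textbook single-machine version, not the paper's block/iterative or coded-computing scheme; matrix sizes and the sample budget are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 20
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Leverage scores are the squared row norms of an orthonormal basis of A's column space.
Q, _ = np.linalg.qr(A)
lev = np.sum(Q**2, axis=1)            # leverage scores, they sum to d
p = lev / lev.sum()                   # sampling distribution

# Sample s rows with replacement and rescale (importance weighting keeps the sketch unbiased).
s = 500
idx = rng.choice(n, size=s, replace=True, p=p)
w = 1.0 / np.sqrt(s * p[idx])
SA = w[:, None] * A[idx]
Sb = w * b[idx]

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
print("relative error:", np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```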

Control-aware echo state networks (Ca-ESN) for the suppression of extreme events

  • paper_url: http://arxiv.org/abs/2308.03095
  • repo_url: None
  • paper_authors: Alberto Racca, Luca Magri
  • for: 这篇论文是为了控制无序非线性系统中的极端事件而写的。
  • methods: 这篇论文提出了控制感知回声状态网络(Ca-ESN),将控制策略(如比例-积分-微分控制和模型预测控制)与 ESN 融合在一起,以抑制极端事件的发生。
  • results: 实验显示,使用 Ca-ESN 可以将极端事件的发生率相对传统方法降低两个数量级,为非线性系统的高效控制开辟了新的可能性。
    Abstract Extreme events are sudden large-amplitude changes in the state or observables of chaotic nonlinear systems, which characterize many scientific phenomena. Because of their violent nature, extreme events typically have adverse consequences, which call for methods to prevent the events from happening. In this work, we introduce the control-aware echo state network (Ca-ESN) to seamlessly combine ESNs and control strategies, such as proportional-integral-derivative and model predictive control, to suppress extreme events. The methodology is showcased on a chaotic-turbulent flow, in which we reduce the occurrence of extreme events with respect to traditional methods by two orders of magnitude. This work opens up new possibilities for the efficient control of nonlinear systems with neural networks.
    摘要 极端事件是混沌非线性系统状态或可观测量的突然大幅变化,刻画了许多科学现象。由于其剧烈性,极端事件通常带来不良后果,因此需要防止其发生的方法。在这项工作中,我们提出控制感知回声状态网络(Ca-ESN),将 ESN 与控制策略(如比例-积分-微分控制和模型预测控制)无缝结合,以抑制极端事件。该方法在一个混沌湍流流动上得到验证:与传统方法相比,极端事件的发生率降低了两个数量级。这项工作为利用神经网络高效控制非线性系统开辟了新的可能性。
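
For intuition, here is a toy Python sketch of the two ingredients the abstract combines: an echo state network forecaster trained by ridge regression, and a PID law acting on its one-step prediction. The reservoir size, signal, and gains are arbitrary placeholders; this is not the Ca-ESN architecture or the chaotic-turbulent benchmark from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Echo state network: fixed random reservoir, ridge-regression readout ---
N, rho, ridge = 200, 0.9, 1e-6
Win = rng.uniform(-0.5, 0.5, size=(N, 1))
W = rng.normal(size=(N, N))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # rescale spectral radius

def run_reservoir(u_seq):
    x, states = np.zeros(N), []
    for u in u_seq:
        x = np.tanh(Win[:, 0] * u + W @ x)
        states.append(x.copy())
    return np.array(states)

# Train the readout to predict the next sample of a toy signal.
t = np.linspace(0, 60, 3000)
u = np.sin(t) + 0.3 * np.sin(2.7 * t)
X = run_reservoir(u[:-1])
y = u[1:]
Wout = np.linalg.solve(X.T @ X + ridge * np.eye(N), X.T @ y)

# --- PID acting on the predicted deviation from a setpoint (toy closed loop) ---
Kp, Ki, Kd, setpoint = 1.0, 0.1, 0.05, 0.0
integral, prev_err = 0.0, 0.0
x, state_u = np.zeros(N), u[-1]
for _ in range(100):
    x = np.tanh(Win[:, 0] * state_u + W @ x)
    pred = x @ Wout                     # one-step-ahead forecast
    err = setpoint - pred
    integral += err
    action = Kp * err + Ki * integral + Kd * (err - prev_err)
    prev_err = err
    state_u = pred + action             # actuate on the forecast (stand-in for a real plant)
```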

Visualization of Extremely Sparse Contingency Table by Taxicab Correspondence Analysis: A Case Study of Textual Data

  • paper_url: http://arxiv.org/abs/2308.03079
  • repo_url: None
  • paper_authors: V. Choulakian, J. Allard
  • for: 这篇论文描述了对应分析的一种稳健变体——出租车度量(taxicab)对应分析,用于可视化极其稀疏的列联表。
  • methods: 论文在一个 590 行 × 8265 列、由 8 部圣书残篇构成的极稀疏文本数据集上进行可视化;该数据集此前已被 (12+1) 种降维方法(t-SNE、UMAP、PHATE 等)详细研究过。
  • results: 论文表明,出租车对应分析能够有效地可视化这类极其稀疏的列联表。
    Abstract We present an overview of taxicab correspondence analysis, a robust variant of correspondence analysis, for visualization of extremely sparse contingency tables. In particular we visualize an extremely sparse textual data set of size 590 by 8265 concerning fragments of 8 sacred books recently introduced by Sah and Fokoué (2019) and studied quite in detail by (12 + 1) dimension reduction methods (t-SNE, UMAP, PHATE, ...) by Ma, Sun and Zou (2022).
    摘要 我们提供了taxicab对应分析的概述,这是对对应分析的一种鲁棒variant,用于可见化极其稀疏的对应表。特别是我们对590行8265列的一个极其稀疏的文本数据集进行可见化,这个数据集是由 Sah和Fokoué(2019)引入的8种圣书的残篇,并由(12+1)维度减少方法(t-SNE、UMAP、PHATE等)进行了详细研究。这个研究是由Ma、Sun和Zou(2022)进行的。

Study for Performance of MobileNetV1 and MobileNetV2 Based on Breast Cancer

  • paper_url: http://arxiv.org/abs/2308.03076
  • repo_url: None
  • paper_authors: Jiuqi Yan
  • for: 本实验主要目的是研究人工智能在医学领域中的应用,具体来说是比较MobileNetV1和MobileNetV2模型在分类乳腺癌病理图像方面的表现。
  • methods: 本实验使用了Kaggle上下载的乳腺癌病理图像集进行训练,并对数据集进行归一化。然后,使用人工智能模型学习下载的数据集,找出图像中的特征并判断乳腺癌是否存在。
  • results: 实验结果显示,在处理这个数据集时,MobileNetV1表现得更好, validation accuracy和overfit问题在MobileNetV2训练中出现。这表明,在这种情况下,MobileNetV1比MobileNetV2更适合处理乳腺癌病理图像。
    Abstract Artificial intelligence is constantly evolving and can provide effective help in all aspects of people's lives. The experiment is mainly to study the use of artificial intelligence in the field of medicine. The purpose of this experiment was to compare which of the MobileNetV1 and MobileNetV2 models was better at detecting histopathological images of the breast downloaded from Kaggle. When the doctor looks at the pathological image, there may be errors in judgment, and the observation speed is slow. Rational use of artificial intelligence can effectively reduce the error of doctor diagnosis in breast cancer judgment and speed up doctor diagnosis. The dataset was downloaded from Kaggle and then normalized. The basic principle of the experiment is to let the neural network model learn the downloaded dataset, find the patterns, and judge on its own whether breast tissue is cancerous. In the dataset, benign tumor pictures and malignant tumor pictures have been classified, of which 198,738 are benign tumor pictures and 78,786 are malignant tumor pictures. After calling MobileNetV1 and MobileNetV2, the dataset is trained separately, the training accuracy and validation accuracy are obtained, and the curves are plotted. It can be observed that MobileNetV1 achieves better validation accuracy, while overfitting occurs during MobileNetV2 training. From the experimental results, it can be seen that in the case of processing this dataset, MobileNetV1 is much better than MobileNetV2.
    摘要 人工智能不断发展,可以在人们生活的各个方面提供有效帮助。本实验主要研究人工智能在医学领域的应用,目的是比较MobileNetV1和MobileNetV2模型在识别Kaggle上下载的乳腺病理图像方面哪个表现更好。医生查看病理图像时可能出现判断错误,而且观察速度较慢;合理使用人工智能可以有效减少医生在乳腺癌判断中的诊断错误并加快诊断速度。数据集来自Kaggle,随后进行了归一化。实验的基本原则是让神经网络模型学习下载的数据集,找出图像的模式,并能够独立判断乳腺组织是否为癌。数据集中良性肿瘤图像与恶性肿瘤图像已被分类,其中良性肿瘤图像198,738张,恶性肿瘤图像78,786张。随后分别调用MobileNetV1和MobileNetV2对数据集进行训练,得到训练精度和验证精度并绘制曲线。可以看到,MobileNetV1的验证精度更高,而MobileNetV2在训练中出现了过拟合。由实验结果可知,在处理该数据集时,MobileNetV1明显优于MobileNetV2。
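
A minimal tf.keras sketch of the comparison described above: the same head and training loop applied to MobileNet (V1) and MobileNetV2 backbones for binary benign/malignant classification. The directory layout, image size, and epoch count are placeholders rather than the authors' setup.

```python
import tensorflow as tf

IMG_SIZE, BATCH = (224, 224), 32
# Placeholder directories of histopathology patches arranged as one subfolder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "breast_histopathology/train", image_size=IMG_SIZE, batch_size=BATCH)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "breast_histopathology/val", image_size=IMG_SIZE, batch_size=BATCH)

def build(backbone_cls):
    base = backbone_cls(include_top=False, weights="imagenet",
                        input_shape=IMG_SIZE + (3,), pooling="avg")
    inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
    x = tf.keras.layers.Rescaling(1.0 / 255)(inputs)              # simple normalization
    x = base(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # benign vs malignant
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

histories = {}
for name, cls in [("MobileNetV1", tf.keras.applications.MobileNet),
                  ("MobileNetV2", tf.keras.applications.MobileNetV2)]:
    histories[name] = build(cls).fit(train_ds, validation_data=val_ds, epochs=5)
```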

Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models

  • paper_url: http://arxiv.org/abs/2308.05176
  • repo_url: None
  • paper_authors: Md. Simul Hasan Talukder, Rejwan Bin Sulaiman
  • for: 预测癫痫症诊断
  • methods: 使用五种机器学习模型(Random Forest、Decision Tree、Extra Trees、Logistic Regression和Gradient Boosting)对脑电图(EEG)记录进行预测
  • results: 研究发现,Extra Trees模型在预测癫痫发作中表现最佳,准确率为99.29%,高于其他模型以及先前研究的最新(state-of-the-art)结果。
    Abstract Epilepsy is a prevalent neurological disorder characterized by recurrent and unpredictable seizures, necessitating accurate prediction for effective management and patient care. Application of machine learning (ML) on electroencephalogram (EEG) recordings, along with its ability to provide valuable insights into brain activity during seizures, is able to make accurate and robust seizure prediction an indispensable component in relevant studies. In this research, we present a comprehensive comparative analysis of five machine learning models - Random Forest (RF), Decision Tree (DT), Extra Trees (ET), Logistic Regression (LR), and Gradient Boosting (GB) - for the prediction of epileptic seizures using EEG data. The dataset underwent meticulous preprocessing, including cleaning, normalization, outlier handling, and oversampling, ensuring data quality and facilitating accurate model training. These preprocessing techniques played a crucial role in enhancing the models' performance. The results of our analysis demonstrate the performance of each model in terms of accuracy. The LR classifier achieved an accuracy of 56.95%, while GB and DT both attained 97.17% accuracy. RF achieved a higher accuracy of 98.99%, while the ET model exhibited the best performance with an accuracy of 99.29%. Our findings reveal that the ET model outperformed not only the other models in the comparative analysis but also surpassed the state-of-the-art results from previous research. The superior performance of the ET model makes it a compelling choice for accurate and robust epileptic seizure prediction using EEG data.
    摘要 癫痫是一种常见的神经系统疾病,以反复且不可预测的发作为特征,因此需要精准的预测以便有效地管理和照护病人。将机器学习(ML)应用于脑电图(EEG)记录,能够为发作期间的大脑活动提供有价值的洞察,使准确而稳健的发作预测成为相关研究中不可或缺的组成部分。在这项研究中,我们对五种机器学习模型(Random Forest、Decision Tree、Extra Trees、Logistic Regression 和 Gradient Boosting)在基于 EEG 数据的癫痫发作预测上进行了全面的比较分析。数据集经过了细致的清洗、归一化、异常值处理和过采样,以保证数据质量并促进模型训练,这些预处理技术在提升模型表现方面发挥了关键作用。分析结果显示,LR 分类器的准确率为 56.95%,GB 和 DT 均达到 97.17%,RF 达到 98.99%,而 ET 模型表现最佳,准确率为 99.29%。我们的发现表明,ET 模型不仅在比较分析中优于其他模型,还超越了以往研究的最新成果,是利用 EEG 数据进行准确、稳健癫痫发作预测的有力选择。
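
A compact sketch of the described pipeline (normalization, oversampling, and a five-model comparison) using scikit-learn, assuming the imbalanced-learn package for SMOTE. The feature/label file names are placeholders for an EEG feature matrix.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier)
from sklearn.metrics import accuracy_score
from imblearn.over_sampling import SMOTE  # assumes imbalanced-learn is installed

# X: EEG feature matrix, y: seizure / non-seizure labels (placeholder files).
X, y = np.load("eeg_features.npy"), np.load("eeg_labels.npy")
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_tr)                            # normalization
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
X_tr, y_tr = SMOTE(random_state=0).fit_resample(X_tr, y_tr)    # oversampling

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(n_estimators=200),
    "ET": ExtraTreesClassifier(n_estimators=200),
    "GB": GradientBoostingClassifier(),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```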

TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties

  • paper_url: http://arxiv.org/abs/2308.03051
  • repo_url: None
  • paper_authors: Karima Kadaoui, Samar M. Magdy, Abdul Waheed, Md Tawkat Islam Khondaker, Ahmed Oumar El-Shangiti, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed
  • for: The paper assesses the machine translation proficiencies of large language models (LLMs) like Google Bard and OpenAI ChatGPT across ten varieties of Arabic, including Classical Arabic and several dialectal variants.
  • methods: The paper evaluates the performance of LLMs in machine translation tasks, using diverse Arabic varieties and a human-centric study to scrutinize the models’ ability to follow human instructions.
  • results: The paper finds that LLMs exhibit satisfactory performance with more prevalent Arabic dialects, but encounter challenges with certain dialects, such as Algerian and Mauritanian, which have limited public data. Additionally, the paper reveals that Bard has limited ability to align with human instructions in translation contexts.
    Abstract Large language models (LLMs) finetuned to follow human instructions have recently emerged as a breakthrough in AI. Models such as Google Bard and OpenAI ChatGPT, for example, are surprisingly powerful tools for question answering, code debugging, and dialogue generation. Despite the purported multilingual proficiency of these models, their linguistic inclusivity remains insufficiently explored. Considering this constraint, we present a thorough assessment of Bard and ChatGPT (encompassing both GPT-3.5 and GPT-4) regarding their machine translation proficiencies across ten varieties of Arabic. Our evaluation covers diverse Arabic varieties such as Classical Arabic, Modern Standard Arabic, and several nuanced dialectal variants. Furthermore, we undertake a human-centric study to scrutinize the efficacy of the most recent model, Bard, in following human instructions during translation tasks. Our exhaustive analysis indicates that LLMs may encounter challenges with certain Arabic dialects, particularly those for which minimal public data exists, such as Algerian and Mauritanian dialects. However, they exhibit satisfactory performance with more prevalent dialects, albeit occasionally trailing behind established commercial systems like Google Translate. Additionally, our analysis reveals a circumscribed capability of Bard in aligning with human instructions in translation contexts. Collectively, our findings underscore that prevailing LLMs remain far from inclusive, with only limited ability to cater for the linguistic and cultural intricacies of diverse communities.
    摘要 大型语言模型(LLM),如Google Bard和OpenAI ChatGPT,最近在人工智能领域受到了突破。这些模型在问答、代码调试和对话生成等方面表现出了惊人的能力。然而,这些模型在语言多样性方面的探索仍然不充分。为了解决这个问题,我们对Bard和ChatGPT进行了详细的评估,包括GPT-3.5和GPT-4在内的多种阿拉伯语种。我们的评估覆盖了多种阿拉伯语种,包括古典阿拉伯语、现代标准阿拉伯语以及一些细腻的地方语言变体。此外,我们进行了人类中心的研究,以评估Bard在翻译任务中遵循人类指令的能力。我们的详细分析表明,LLMs可能会在某些阿拉伯语言口语中遇到困难,特别是那些具有少量公共数据的语言,如阿尔及利亚和毛里塔尼亚口语。然而,它们在更常见的口语上表现得更好,尽管 occasionally 落后于商业系统如Google Translate。此外,我们的分析还发现了Bard在翻译任务中遵循人类指令的能力有限。总的来说,我们的发现表明,现有的LLMs仍然远离包容,它们只能部分地适应不同社区的语言和文化特点。
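
A hedged sketch of how one variety could be scored in such an evaluation: wrap whichever LLM is under test behind a placeholder translate_fn and compute corpus BLEU/chrF with sacrebleu. The prompt wording and the varieties dictionary are illustrative, not the paper's protocol or its human-evaluation setup.

```python
import sacrebleu  # assumes the sacrebleu package is available

def evaluate_variety(translate_fn, sources, references):
    """Score one Arabic variety. translate_fn is a placeholder wrapper around
    whichever LLM (Bard, ChatGPT, ...) is being evaluated."""
    hypotheses = [translate_fn(f"Translate the following into English:\n{s}") for s in sources]
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    chrf = sacrebleu.corpus_chrf(hypotheses, [references])
    return {"BLEU": bleu.score, "chrF": chrf.score}

# varieties maps a dialect name to (source sentences, reference translations), e.g. loaded
# from a parallel test set; both the mapping and translate_fn are assumptions here.
# results = {name: evaluate_variety(my_llm, src, ref) for name, (src, ref) in varieties.items()}
```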

Weakly Supervised Multi-Task Representation Learning for Human Activity Analysis Using Wearables

  • paper_url: http://arxiv.org/abs/2308.03805
  • repo_url: None
  • paper_authors: Taoran Sheng, Manfred Huber
  • for: 这篇论文主要研究如何使用弱监督多输出孪生(Siamese)网络,对可穿戴设备采集的活动与传感器数据流进行分类和识别。
  • methods: 该方法使用弱监督多输出孪生网络,将数据映射到多个表示空间中,每个表示空间强调数据的一个特定方面。
  • results: 经过实验 validate,该模型可以同时解决多个任务,并且在许多情况下可以超越单任务监督方法的性能。此外, paper还进一步分析了模型的架构和多任务内部的效果,以及模型的可扩展性。
    Abstract Sensor data streams from wearable devices and smart environments are widely studied in areas like human activity recognition (HAR), person identification, or health monitoring. However, most of the previous works in activity and sensor stream analysis have been focusing on one aspect of the data, e.g. only recognizing the type of the activity or only identifying the person who performed the activity. We instead propose an approach that uses a weakly supervised multi-output siamese network that learns to map the data into multiple representation spaces, where each representation space focuses on one aspect of the data. The representation vectors of the data samples are positioned in the space such that the data with the same semantic meaning in that aspect are closely located to each other. Therefore, as demonstrated with a set of experiments, the trained model can provide metrics for clustering data based on multiple aspects, allowing it to address multiple tasks simultaneously and even to outperform single task supervised methods in many situations. In addition, further experiments are presented that in more detail analyze the effect of the architecture and of using multiple tasks within this framework, that investigate the scalability of the model to include additional tasks, and that demonstrate the ability of the framework to combine data for which only partial relationship information with respect to the target tasks is available.
    摘要 来自可穿戴设备和智能环境的传感器数据流在人体活动识别(HAR)、人员识别和健康监测等领域得到广泛研究。然而,以往大多数关于活动与传感器数据流分析的工作只关注数据的一个方面,例如只识别活动类型,或只识别执行活动的人。我们提出一种使用弱监督多输出孪生网络的方法,将数据映射到多个表示空间,每个表示空间关注数据的一个方面;数据样本的表示向量在空间中的位置,使得在该方面具有相同语义的数据彼此接近。因此,实验证明,训练后的模型可以按多个方面对数据进行聚类,从而同时解决多个任务,甚至在许多情况下超越单任务的监督方法。此外,进一步的实验更细致地分析了网络结构以及在该框架中使用多任务的影响,考察了模型向更多任务扩展的可扩展性,并展示了该框架能够组合那些仅有部分目标任务关联信息的数据。
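
A minimal PyTorch sketch of a weakly supervised multi-output Siamese model in the spirit of the abstract: one shared encoder, one projection head per aspect, and a per-aspect contrastive loss on weak pair labels. Channel counts, the architecture, and the margin are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiOutputSiamese(nn.Module):
    """Shared encoder with one projection head per aspect (e.g. activity, person)."""
    def __init__(self, in_channels, n_aspects, emb_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_channels, 32, 5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.heads = nn.ModuleList([nn.Linear(64, emb_dim) for _ in range(n_aspects)])

    def forward(self, x):                       # x: (batch, channels, time)
        h = self.encoder(x)
        return [F.normalize(head(h), dim=1) for head in self.heads]

def contrastive_loss(za, zb, same, margin=1.0):
    """Pull pairs with the same (weak) label together, push different pairs apart."""
    d = F.pairwise_distance(za, zb)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

model = MultiOutputSiamese(in_channels=6, n_aspects=2)   # e.g. accel+gyro, two aspects
xa, xb = torch.randn(8, 6, 128), torch.randn(8, 6, 128)
same = torch.randint(0, 2, (2, 8)).float()                # one weak pair label per aspect
loss = sum(contrastive_loss(za, zb, same[i])
           for i, (za, zb) in enumerate(zip(model(xa), model(xb))))
```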

Machine learning methods for the search for L&T brown dwarfs in the data of modern sky surveys

  • paper_url: http://arxiv.org/abs/2308.03045
  • repo_url: https://github.com/iamaleksandra/ml-brown-dwarfs
  • paper_authors: Aleksandra Avdeeva
  • for: 这篇论文的目的是开发基于机器学习的方法,在现代巡天数据中把 L 型和 T 型褐矮星与其他光谱及光度类型的天体区分开来。
  • methods: 论文在 PanStarrs DR1、2MASS 和 WISE 数据上使用 Random Forest 分类器、XGBoost、SVM 分类器和 TabNet 等机器学习算法来分类 L/T 褐矮星,并讨论了模型的解释。
  • results: 结果表明,机器学习方法能够有效地识别 L/T 褐矮星,并且比经典的颜色判据(决策规则)更高效、更适用。
    Abstract According to various estimates, brown dwarfs (BD) should account for up to 25 percent of all objects in the Galaxy. However, few of them have been discovered and well studied, both individually and as a population. Homogeneous and complete samples of brown dwarfs are needed for these kinds of studies. Because brown dwarfs are so faint, spectral studies of them are rather laborious, and creating a sizeable, reliable sample of brown dwarfs confirmed by spectroscopic observations seems unattainable at the moment. Numerous attempts have been made to search for and create a set of brown dwarfs using their colours as a decision rule applied to a vast amount of survey data. In this work, we use machine learning methods such as Random Forest Classifier, XGBoost, SVM Classifier and TabNet on PanStarrs DR1, 2MASS and WISE data to distinguish L and T brown dwarfs from objects of other spectral and luminosity classes. The explanation of the models is discussed. We also compare our models with classical decision rules, proving their efficiency and relevance.
    摘要 根据不同的估计,褐矮星(BD)最多可占银河系中全部天体的 25%。然而,无论是个体还是作为一个星族,被发现并得到充分研究的褐矮星都很少;开展这类研究需要同质且完备的褐矮星样本。由于褐矮星十分暗弱,其光谱研究相当费力,因此目前看来,建立一个经光谱观测确认的大规模可靠褐矮星样本尚难以实现。人们已多次尝试以颜色作为判据,在海量巡天数据中搜寻并构建褐矮星样本。在这项工作中,我们在 PanStarrs DR1、2MASS 和 WISE 数据上使用 Random Forest 分类器、XGBoost、SVM 分类器和 TabNet 等机器学习方法,将 L 型和 T 型褐矮星与其他光谱及光度类型的天体区分开来,并讨论了模型的解释。我们还将这些模型与经典判据进行比较,证明了其高效性与适用性。
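
A small scikit-learn sketch of the colour-based classification idea: build colours from cross-matched photometry and train a Random Forest to separate L/T dwarfs from other objects. The catalogue file and column names are hypothetical placeholders.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Placeholder catalogue with cross-matched PanSTARRS/2MASS/WISE photometry;
# column names below are illustrative, not the paper's feature set.
df = pd.read_csv("photometry_catalogue.csv")
bands = ["i", "z", "y", "J", "H", "K", "W1", "W2"]
for a, b in zip(bands[:-1], bands[1:]):
    df[f"{a}-{b}"] = df[a] - df[b]              # adjacent-band colours as features
features = [c for c in df.columns if "-" in c]

X_tr, X_te, y_tr, y_te = train_test_split(
    df[features], df["is_LT_dwarf"], stratify=df["is_LT_dwarf"], random_state=0)
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```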

Machine Learning for Infectious Disease Risk Prediction: A Survey

  • paper_url: http://arxiv.org/abs/2308.03037
  • repo_url: None
  • paper_authors: Mutong Liu, Yang Liu, Jiming Liu
  • for: 这篇论文主要是为了探讨机器学习如何在抑制传染疾病方面发挥作用,以帮助更好地预测传染疾病风险。
  • methods: 这篇论文使用了不同的机器学习模型来预测传染疾病风险,包括统计预测、数据驱动机器学习和epidemiology-inspired机器学习。
  • results: 论文结果表明,机器学习可以帮助量化疾病传播模式,并准确预测传染疾病风险。但是,在使用机器学习模型时,需要注意输入数据的问题、设计任务目标和评估模型性能等问题。
    Abstract Infectious diseases, either emerging or long-lasting, place numerous people at risk and bring heavy public health burdens worldwide. In the process against infectious diseases, predicting the epidemic risk by modeling the disease transmission plays an essential role in assisting with preventing and controlling disease transmission in a more effective way. In this paper, we systematically describe how machine learning can play an essential role in quantitatively characterizing disease transmission patterns and accurately predicting infectious disease risks. First, we introduce the background and motivation of using machine learning for infectious disease risk prediction. Next, we describe the development and components of various machine learning models for infectious disease risk prediction. Specifically, existing models fall into three categories: Statistical prediction, data-driven machine learning, and epidemiology-inspired machine learning. Subsequently, we discuss challenges encountered when dealing with model inputs, designing task-oriented objectives, and conducting performance evaluation. Finally, we conclude with a discussion of open questions and future directions.
    摘要 无论是新发的还是长期流行的传染病,都使大量人群处于风险之中,并给全球公共卫生带来沉重负担。在防控传染病的过程中,通过对疾病传播建模来预测流行风险,对更有效地预防和控制疾病传播起着至关重要的作用。在这篇文章中,我们系统地阐述了机器学习如何在定量刻画疾病传播模式、准确预测传染病风险方面发挥关键作用。首先,我们介绍了使用机器学习进行传染病风险预测的背景与动机;接着,我们描述了各类用于传染病风险预测的机器学习模型的发展与组成部分,现有模型可分为三类:统计预测、数据驱动的机器学习以及受流行病学启发的机器学习;随后,我们讨论了在处理模型输入、设计面向任务的目标以及进行性能评估时遇到的挑战;最后,我们以尚未解决的问题和未来方向作结。

Serverless Federated AUPRC Optimization for Multi-Party Collaborative Imbalanced Data Mining

  • paper_url: http://arxiv.org/abs/2308.03035
  • repo_url: https://github.com/xidongwu/d-auprc
  • paper_authors: Xidong Wu, Zhengmian Hu, Jian Pei, Heng Huang
  • for: 这个论文主要针对多方合作训练(分布式学习和联合学习)在巨量数据挑战中提供解决方案。
  • methods: 本文提出了无服务器(serverless)多方协作AUPRC最大化问题,并将其重述为无服务器多方协作学习中的条件随机优化问题;进而提出了一种直接优化AUPRC的新算法——无服务器有偏随机梯度(SLATE),并提出其基于动量的方差缩减版本(SLATE-M)以提高收敛速率。
  • results: 本文的实验结果表明,SLATE-M算法可以在多方合作学习 Setting中实现更高的AUPRC最大化,并且与单机端Online方法的最优性较高。此外,SLATE-M算法还可以降低了通信成本,提高了计算效率。
    Abstract Multi-party collaborative training, such as distributed learning and federated learning, is used to address the big data challenges. However, traditional multi-party collaborative training algorithms were mainly designed for balanced data mining tasks and are intended to optimize accuracy (\emph{e.g.}, cross-entropy). The data distribution in many real-world applications is skewed and classifiers, which are trained to improve accuracy, perform poorly when applied to imbalanced data tasks since models could be significantly biased toward the primary class. Therefore, the Area Under Precision-Recall Curve (AUPRC) was introduced as an effective metric. Although single-machine AUPRC maximization methods have been designed, multi-party collaborative algorithm has never been studied. The change from the single-machine to the multi-party setting poses critical challenges. To address the above challenge, we study the serverless multi-party collaborative AUPRC maximization problem since serverless multi-party collaborative training can cut down the communications cost by avoiding the server node bottleneck, and reformulate it as a conditional stochastic optimization problem in a serverless multi-party collaborative learning setting and propose a new ServerLess biAsed sTochastic gradiEnt (SLATE) algorithm to directly optimize the AUPRC. After that, we use the variance reduction technique and propose ServerLess biAsed sTochastic gradiEnt with Momentum-based variance reduction (SLATE-M) algorithm to improve the convergence rate, which matches the best theoretical convergence result reached by the single-machine online method. To the best of our knowledge, this is the first work to solve the multi-party collaborative AUPRC maximization problem.
    摘要 多方合作训练,如分布式学习和联邦学习,用于解决大数据挑战。然而,传统的多方合作训练算法主要是为了均衡数据挖掘任务而设计,并且是为了提高准确率(例如,交叉熵)。然而,在多数实际应用中,数据分布是偏斜的,因此用于偏斜数据任务的分类器可能会在应用于不均衡数据任务时表现差。因此, Area Under Precision-Recall Curve(AUPRC)被引入为有效的指标。虽然单机AUPRC最大化方法已经设计,但多方合作算法尚未被研究。在这种多机到多机的转换中,存在关键挑战。为解决以上挑战,我们研究了无服务器多方合作AUPRC最大化问题,因为无服务器多方合作训练可以避免服务器瓶颈,从而减少通信成本。然后,我们将问题重新定义为无服务器多方合作学习中的Conditional Stochastic Optimization问题,并提出了一个新的ServerLess biAsed sTochastic gradiEnt(SLATE)算法,以直接优化AUPRC。之后,我们使用了偏移量降低技术,并提出了ServerLess biAsed sTochastic gradiEnt with Momentum-based variance reduction(SLATE-M)算法,以提高收敛率,与单机在线方法的最佳理论收敛率相匹配。到目前为止,这是首次解决多方合作AUPRC最大化问题的研究。

Causal Disentanglement Hidden Markov Model for Fault Diagnosis

  • paper_url: http://arxiv.org/abs/2308.03027
  • repo_url: None
  • paper_authors: Rihao Chang, Yongtao Ma, Weizhi Nie, Jie Nie, An-an Liu
  • for: 这 paper 的目的是提出一种基于 causal disentanglement hidden Markov model (CDHM) 的 fault diagnosis方法,以实现更加准确的维修预测。
  • methods: 该方法 使用了时间序列数据,逐步分解震动信号为相关 fault 和无关 fault 因素,并使用 ELBO 优化学习 causal disentanglement Markov model。此外,该方法还采用了无监督领域适应,将学习的分解表示转移到其他工作环境中。
  • results: 实验结果表明,提出的方法能够在 CWRU 数据集和 IMS 数据集上提供更高的预测精度和维修效率,证明了该方法的优势。
    Abstract In modern industries, fault diagnosis has been widely applied with the goal of realizing predictive maintenance. The key issue for the fault diagnosis system is to extract representative characteristics of the fault signal and then accurately predict the fault type. In this paper, we propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism and thus, capture their characteristics to achieve a more robust representation. Specifically, we make full use of the time-series data and progressively disentangle the vibration signal into fault-relevant and fault-irrelevant factors. The ELBO is reformulated to optimize the learning of the causal disentanglement Markov model. Moreover, to expand the scope of the application, we adopt unsupervised domain adaptation to transfer the learned disentangled representations to other working environments. Experiments were conducted on the CWRU dataset and IMS dataset. Relevant results validate the superiority of the proposed method.
    摘要 在现代工业中,故障诊断已被广泛应用,其目标是实现预测性维护。故障诊断系统的关键问题是提取故障信号的代表性特征,进而准确预测故障类型。在这篇论文中,我们提出了因果解缠隐马尔可夫模型(CDHM),用于学习轴承故障机理中的因果关系,从而捕捉其特征,获得更加稳健的表示。具体而言,我们充分利用时间序列数据,逐步将振动信号解缠为与故障相关和与故障无关的因素,并重新表述 ELBO 以优化因果解缠马尔可夫模型的学习。此外,为扩展应用范围,我们采用无监督领域自适应,将学习到的解缠表示迁移到其他工况。在 CWRU 数据集和 IMS 数据集上进行的实验结果证明了所提方法的优越性。

Early Detection and Localization of Pancreatic Cancer by Label-Free Tumor Synthesis

  • paper_url: http://arxiv.org/abs/2308.03008
  • repo_url: https://github.com/mrgiovanni/synthetictumors
  • paper_authors: Bowen Li, Yu-Cheng Chou, Shuwen Sun, Hualin Qiao, Alan Yuille, Zongwei Zhou
  • for: 早期检测和定位胰腺癌可以将患者的5年生存率从8.5%提高到20%。
  • methods: 我们提出了一种使用人工智能(AI)帮助放射科医生早期检测胰腺癌的方法;由于训练AI模型需要大量标注样例,而早期肿瘤的CT样例十分有限,我们开发了一种无需人工标注的肿瘤合成方法,在健康胰腺中合成大量小肿瘤样例。
  • results: 实验表明,使用合成肿瘤训练的AI模型,其胰腺肿瘤检测率(敏感性和特异性)以及逐体素分割性能与使用真实肿瘤训练的模型相当,且对小肿瘤的检测率明显更高。
    Abstract Early detection and localization of pancreatic cancer can increase the 5-year survival rate for patients from 8.5% to 20%. Artificial intelligence (AI) can potentially assist radiologists in detecting pancreatic tumors at an early stage. Training AI models require a vast number of annotated examples, but the availability of CT scans obtaining early-stage tumors is constrained. This is because early-stage tumors may not cause any symptoms, which can delay detection, and the tumors are relatively small and may be almost invisible to human eyes on CT scans. To address this issue, we develop a tumor synthesis method that can synthesize enormous examples of small pancreatic tumors in the healthy pancreas without the need for manual annotation. Our experiments demonstrate that the overall detection rate of pancreatic tumors, measured by Sensitivity and Specificity, achieved by AI trained on synthetic tumors is comparable to that of real tumors. More importantly, our method shows a much higher detection rate for small tumors. We further investigate the per-voxel segmentation performance of pancreatic tumors if AI is trained on a combination of CT scans with synthetic tumors and CT scans with annotated large tumors at an advanced stage. Finally, we show that synthetic tumors improve AI generalizability in tumor detection and localization when processing CT scans from different hospitals. Overall, our proposed tumor synthesis method has immense potential to improve the early detection of pancreatic cancer, leading to better patient outcomes.
    摘要 早期发现和确定胰腺癌的患者5年生存率可以从8.5%提高到20%.人工智能(AI)可能能够帮助放射学家早期发现胰腺肿瘤。训练AI模型需要巨量的标注示例,但获得早期阶段肿瘤的CT扫描数据受限。这是因为早期阶段的肿瘤可能没有任何症状,这会延迟发现,同时肿瘤也很小,可能对人类眼不可见在CT扫描中。为解决这个问题,我们开发了一种肿瘤合成方法,可以在健康胰腺中合成巨量的小胰腺肿瘤示例,无需人工标注。我们的实验表明,使用合成肿瘤来训练AI的总检测率(敏感度和特异度)和小肿瘤检测率均与实际肿瘤相当。此外,我们还发现,将合成肿瘤与已知大肿瘤的CT扫描数据组合训练AI,可以提高每个voxel的分 segmentation性能。最后,我们证明了合成肿瘤可以提高AI在不同医院CT扫描数据处理中的普适性。总之,我们提出的肿瘤合成方法具有巨大的潜力,可以提高胰腺癌的早期发现,从而提高病人生存率。
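
A toy numpy/scipy sketch of label-free lesion synthesis in the spirit of the abstract: drop a small, blurred hypodense sphere at a random location inside a pancreas mask and return the edited CT plus the free synthetic label. Radii and intensity offsets are arbitrary; the paper's actual synthesis procedure is more elaborate.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_tumor(ct, pancreas_mask, radius_vox=4, intensity_drop=60.0, rng=None):
    """Insert one small, slightly blurred hypodense sphere inside the pancreas mask.
    ct: 3D array in HU; pancreas_mask: boolean 3D array of the same shape."""
    rng = rng or np.random.default_rng()
    zz, yy, xx = np.nonzero(pancreas_mask)
    i = rng.integers(len(zz))                       # random seed point inside the organ
    center = np.array([zz[i], yy[i], xx[i]])

    grid = np.indices(ct.shape).astype(float)
    dist2 = sum((grid[d] - center[d]) ** 2 for d in range(3))
    blob = (dist2 <= radius_vox ** 2).astype(float)
    blob = gaussian_filter(blob, sigma=1.0)         # soften the lesion boundary
    blob *= pancreas_mask                           # keep the lesion inside the organ

    tumor_ct = ct - intensity_drop * blob           # hypodense (darker) lesion
    tumor_label = blob > 0.5                        # synthetic segmentation label, for free
    return tumor_ct, tumor_label
```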

Deep Polar Codes

  • paper_url: http://arxiv.org/abs/2308.03004
  • repo_url: https://github.com/HzFu/MNet_DeepCDR
  • paper_authors: Geon Choi, Namyoon Lee
  • for: 这篇论文提出了一类新的预变换极化码,称为深度极化码。
  • methods: 论文采用由多层、不同尺寸极化变换构成的深度编码器,并提出一种低复杂度译码算法——带反向传播奇偶校验的连续消除列表译码(SCL-BPC)。
  • results: 仿真表明,深度极化码在短码长、多种码率下的误块率优于现有的预变换极化码,同时保持较低的编译码复杂度;与循环冗余校验码级联后,在某些情形下可逼近有限码长容量的 meta-converse 界 0.4 dB 以内。
    Abstract In this paper, we introduce a novel class of pre-transformed polar codes, termed as deep polar codes. We first present a deep polar encoder that harnesses a series of multi-layered polar transformations with varying sizes. Our approach to encoding enables a low-complexity implementation while significantly enhancing the weight distribution of the code. Moreover, our encoding method offers flexibility in rate-profiling, embracing a wide range of code rates and blocklengths. Next, we put forth a low-complexity decoding algorithm called successive cancellation list with backpropagation parity checks (SCL-BPC). This decoding algorithm leverages the parity check equations in the reverse process of the multi-layered pre-transformed encoding for SCL decoding. Additionally, we present a low-latency decoding algorithm that employs parallel-SCL decoding by treating partially pre-transformed bit patterns as additional frozen bits. Through simulations, we demonstrate that deep polar codes outperform existing pre-transformed polar codes in terms of block error rates across various code rates under short block lengths, while maintaining low encoding and decoding complexity. Furthermore, we show that concatenating deep polar codes with cyclic-redundancy-check codes can achieve the meta-converse bound of the finite block length capacity within 0.4 dB in some instances.
    摘要 在这篇论文中,我们介绍了一种新的预变扩展极码,称为深度极码。我们首先描述了一种深度极码编码器,利用多层极化转换来实现低复杂性实现,同时有效地改善极码的重量分布。此外,我们的编码方法支持范围广适的代码速率和块长度。接下来,我们提出了一种低复杂度解码算法,称为顺序取消列表归并后传递检查(SCL-BPC)。这种解码算法利用了反向的多层预变扩展编码中的严格检查方程,并且可以在低复杂度下实现高效的解码。此外,我们还提出了一种低延迟解码算法,通过并行执行SCL解码来处理部分预变扩展的位 Pattern。通过实验,我们证明了深度极码在不同代码速率下的块错误率较低,同时保持低编码和解码复杂度。此外,我们还显示了将深度极码与循环检查码 concatenate 可以实现finite block length capacity的meta-converse bound within 0.4 dB 的情况。
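
For background, a short Python sketch of the plain (non-pre-transformed) polar encoding that deep polar codes build on: the recursive Arıkan transform over GF(2) with frozen positions set to zero. The frozen-set choice below is illustrative, and bit-reversal reordering as well as the paper's multi-layered pre-transform and SCL-BPC decoder are omitted.

```python
import numpy as np

def polar_transform(u):
    """Recursive Arikan transform x = u * F^{(x)n} over GF(2), natural ordering."""
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    top = [a ^ b for a, b in zip(u[:half], u[half:])]   # (u1 + u2, u2) kernel
    return polar_transform(top) + polar_transform(list(u[half:]))

def encode(info_bits, frozen_mask):
    """frozen_mask[i] == True means position i carries a frozen zero."""
    u = np.zeros(len(frozen_mask), dtype=int)
    u[~frozen_mask] = info_bits
    return polar_transform(list(u))

# Example: N = 8, rate 1/2, with an illustrative (not reliability-optimized) frozen set.
frozen = np.array([True, True, True, False, True, False, False, False])
print(encode([1, 0, 1, 1], frozen))
```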

Spanish Pre-trained BERT Model and Evaluation Data

  • paper_url: http://arxiv.org/abs/2308.02976
  • repo_url: https://github.com/dccuchile/beto
  • paper_authors: José Cañete, Gabriel Chaperon, Rodrigo Fuentes, Jou-Hui Ho, Hojin Kang, Jorge Pérez
  • for: bridging the gap of Spanish language resources for training and evaluating Spanish language models.
  • methods: using BERT-based language model pre-trained exclusively on Spanish data, and compiling several tasks specifically for the Spanish language in a single repository.
  • results: fine-tuning the pre-trained Spanish model achieves better results compared to other BERT-based models pre-trained on multilingual corpora for most tasks, and achieves a new state-of-the-art on some tasks.
    Abstract The Spanish language is one of the top 5 spoken languages in the world. Nevertheless, finding resources to train or evaluate Spanish language models is not an easy task. In this paper we help bridge this gap by presenting a BERT-based language model pre-trained exclusively on Spanish data. As a second contribution, we also compiled several tasks specifically for the Spanish language in a single repository much in the spirit of the GLUE benchmark. By fine-tuning our pre-trained Spanish model, we obtain better results compared to other BERT-based models pre-trained on multilingual corpora for most of the tasks, even achieving a new state-of-the-art on some of them. We have publicly released our model, the pre-training data, and the compilation of the Spanish benchmarks.
    摘要 西班牙语是全球前5大常用语言之一,但找到用于训练或评估西班牙语模型的资源并不容易。在这篇论文中,我们帮助填补这个差距,提出了基于BERT的西班牙语语言模型,并且在单个存储中集成了许多西班牙语任务。经过练练后的西班牙语模型,在大多数任务上比其他基于多语言Corpus预训练的BERT模型获得更好的结果,甚至达到了一些任务的新状态。我们将公开发布我们的模型、预训练数据和西班牙语benchmark集。
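
A minimal Hugging Face transformers sketch of fine-tuning a Spanish BERT checkpoint on a binary classification task; the checkpoint name, toy batch, and hyperparameters are illustrative rather than the paper's benchmark setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Checkpoint name is illustrative; substitute the released Spanish BERT weights.
name = "dccuchile/bert-base-spanish-wwm-cased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["me encantó la película", "el servicio fue pésimo"]   # toy labelled batch
labels = torch.tensor([1, 0])
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")

model.train()
out = model(**batch, labels=labels)     # HF models return the cross-entropy loss directly
out.loss.backward()
opt.step()
```

In practice this single step would sit inside the usual epoch/batch loop over each benchmark task, with a held-out split for evaluation.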

Generalized Oversampling for Learning from Imbalanced datasets and Associated Theory

  • paper_url: http://arxiv.org/abs/2308.02966
  • repo_url: None
  • paper_authors: Samuel Stocksieker, Denys Pommeret, Arthur Charpentier
  • for: 这项研究旨在解决不平衡数据集带来的问题,尤其是回归任务中少数目标值样本不足的问题。
  • methods: 本研究提出了一个基于核密度估计的数据增强方法,称为GOLIATH算法,可以应用于预测和回归任务。这个方法包括两大家族的人工增加:基于扰动,如 Gaussian Noise,和基于插值,如 SMOTE。它还提供了这些机器学习算法的明确形式和条件密度表达,特别是SMOTE。
  • results: 本研究评估了GOLIATH算法在不平衡回归情形下的性能,并与现有的最新技术进行比较。结果显示,GOLIATH算法带来了显著的改善。
    Abstract In supervised learning, it is quite frequent to be confronted with real imbalanced datasets. This situation leads to a learning difficulty for standard algorithms. Research and solutions in imbalanced learning have mainly focused on classification tasks. Despite its importance, very few solutions exist for imbalanced regression. In this paper, we propose a data augmentation procedure, the GOLIATH algorithm, based on kernel density estimates which can be used in classification and regression. This general approach encompasses two large families of synthetic oversampling: those based on perturbations, such as Gaussian Noise, and those based on interpolations, such as SMOTE. It also provides an explicit form of these machine learning algorithms and an expression of their conditional densities, in particular for SMOTE. New synthetic data generators are deduced. We apply GOLIATH in imbalanced regression combining such generator procedures with a wild-bootstrap resampling technique for the target values. We evaluate the performance of the GOLIATH algorithm in imbalanced regression situations. We empirically evaluate and compare our approach and demonstrate significant improvement over existing state-of-the-art techniques.
    摘要 在超级vised学习中,很常遇到实际上的不均衡数据集。这种情况会导致标准算法学习困难。研究和解决不均衡学习问题的研究主要集中在分类任务上。尽管它的重要性,实际上很少解决不均衡回归问题的解决方案存在。在这篇论文中,我们提出了一种数据扩充过程,named GOLIATH algorithm,基于kernel density estimates,可以在分类和回归中使用。这种通用的方法包括两大家族的人工增加:基于扰动,如 Gaussian Noise,和基于 interpolations,如 SMOTE。它还提供了这些机器学习算法的明确形式,特别是SMOTE的表达。新的人工数据生成器被推导出来。我们在不均衡回归中结合了这些生成器过程和野生bootstrap抽样技术来针对目标值。我们对GOLIATH算法在不均衡回归情况下的性能进行了实验性评估和比较,并证明了我们的方法在现有状态的技术上具有显著的改善。
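
A small sketch in the spirit of kernel-density-based synthetic oversampling for an imbalanced regression target: fit a Gaussian KDE on the joint (features, target) of the rare region and sample new pairs from it. This is a simplification, not the GOLIATH algorithm or its wild-bootstrap step.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def kde_oversample(X, y, rare_mask, n_new, bandwidth=0.2):
    """Draw synthetic (x, y) pairs from a Gaussian KDE fitted on the rare region.
    rare_mask selects the under-represented target values (e.g. the extreme tail of y)."""
    Z = np.column_stack([X[rare_mask], y[rare_mask]])   # joint (features, target)
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(Z)
    samples = kde.sample(n_new, random_state=0)
    return samples[:, :-1], samples[:, -1]

# Toy data with a rare upper tail in y.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=1000)
rare = y > np.quantile(y, 0.95)
X_new, y_new = kde_oversample(X, y, rare, n_new=200)
X_aug, y_aug = np.vstack([X, X_new]), np.concatenate([y, y_new])
```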

Data Fusion for Multi-Task Learning of Building Extraction and Height Estimation

  • paper_url: http://arxiv.org/abs/2308.02960
  • repo_url: https://github.com/SaadAhmedJamal/IEEE_DFC2023
  • paper_authors: Saad Ahmed Jamal, Arioluwa Aribisala
  • for: 这篇论文是为了解决城市重建问题,使用 optic 和 radar 卫星影像进行多任务学习,实现建筑物抽出和高度估计。
  • methods: 本论文使用多任务学习方法,将 optic 和 radar 卫星影像融合,实现建筑物抽出和高度估计。
  • results: 根据设计实验结果,本论文的基准结果显著提高了建筑物抽出和高度估计的精度。
    Abstract In accordance with the urban reconstruction problem proposed by the DFC23 Track 2 Contest, this paper attempts a multitask-learning method of building extraction and height estimation using both optical and radar satellite imagery. Contrary to the initial goal of multitask learning which could potentially give a superior solution by reusing features and forming implicit constraints between multiple tasks, this paper reports the individual implementation of the building extraction and height estimation under constraints. The baseline results for the building extraction and the height estimation significantly increased after designed experiments.
    摘要 根据DFC23 Track 2 contest提出的城市重建问题,本文提出了一种多任务学习方法,使用光学和雷达卫星影像进行建筑物提取和高度估计。与初始目标的多任务学习不同,本文对每个任务进行独立实现,并在限制下进行建筑物提取和高度估计。经过设计实验,基准结果显著提高。

K-band: Self-supervised MRI Reconstruction via Stochastic Gradient Descent over K-space Subsets

  • paper_url: http://arxiv.org/abs/2308.02958
  • repo_url: https://github.com/mikgroup/k-band
  • paper_authors: Frederic Wang, Han Qi, Alfredo De Goyeneche, Reinhard Heckel, Michael Lustig, Efrat Shimron
  • for: This paper aims to develop a novel mathematical framework for training deep learning (DL) models using only partial, limited-resolution k-space data in high-dimensional magnetic resonance imaging (MRI).
  • methods: The proposed method, called k-band, uses stochastic gradient descent (SGD) over k-space subsets, where only a small k-space portion is used in each training iteration to compute gradients. The method is compatible with different sampling strategies, and the authors demonstrate its effectiveness using k-space “bands” with limited resolution in one dimension.
  • results: The authors prove analytically that their method stochastically approximates the gradients computed in a fully-supervised setup, as long as two conditions are met: (i) the limited-resolution axis is chosen randomly-uniformly for every new scan, and (ii) the loss function is weighed with a mask that facilitates accurate reconstruction of high-resolution details. Numerical experiments with raw MRI data show that k-band outperforms two other methods trained on limited-resolution data and performs comparably to state-of-the-art methods trained on high-resolution data.
    Abstract Although deep learning (DL) methods are powerful for solving inverse problems, their reliance on high-quality training data is a major hurdle. This is significant in high-dimensional (dynamic/volumetric) magnetic resonance imaging (MRI), where acquisition of high-resolution fully sampled k-space data is impractical. We introduce a novel mathematical framework, dubbed k-band, that enables training DL models using only partial, limited-resolution k-space data. Specifically, we introduce training with stochastic gradient descent (SGD) over k-space subsets. In each training iteration, rather than using the fully sampled k-space for computing gradients, we use only a small k-space portion. This concept is compatible with different sampling strategies; here we demonstrate the method for k-space "bands", which have limited resolution in one dimension and can hence be acquired rapidly. We prove analytically that our method stochastically approximates the gradients computed in a fully-supervised setup, when two simple conditions are met: (i) the limited-resolution axis is chosen randomly-uniformly for every new scan, hence k-space is fully covered across the entire training set, and (ii) the loss function is weighed with a mask, derived here analytically, which facilitates accurate reconstruction of high-resolution details. Numerical experiments with raw MRI data indicate that k-band outperforms two other methods trained on limited-resolution data and performs comparably to state-of-the-art (SoTA) methods trained on high-resolution data. k-band hence obtains SoTA performance, with the advantage of training using only limited-resolution data. This work hence introduces a practical, easy-to-implement, self-supervised training framework, which involves fast acquisition and self-supervised reconstruction and offers theoretical guarantees.
    摘要 尽管深度学习(DL)方法在求解逆问题方面十分强大,但其对高质量训练数据的依赖是一大障碍。这在高维(动态/体积)磁共振成像(MRI)中尤为突出,因为获取高分辨率、全采样的 k 空间数据并不现实。我们提出一个新的数学框架,称为 k-band,使得仅用部分、有限分辨率的 k 空间数据即可训练 DL 模型。具体而言,我们在 k 空间子集上使用随机梯度下降(SGD)进行训练:每次迭代只用一小部分 k 空间来计算梯度。该思想与不同采样策略兼容;我们以在一个维度上分辨率受限、因而可快速采集的 k 空间"带"为例进行演示。我们从理论上证明,当满足两个简单条件时,该方法能随机逼近全监督设置下计算的梯度:(i) 每次新扫描时随机均匀地选择受限分辨率的轴,使整个训练集覆盖完整的 k 空间;(ii) 损失函数用我们解析推导的掩模加权,以便准确重建高分辨率细节。在真实 MRI 数据上的数值实验表明,k-band 优于另外两种在有限分辨率数据上训练的方法,并与在高分辨率数据上训练的最新方法表现相当。因此,k-band 在仅用有限分辨率数据训练的前提下达到了最新水平。这项工作提出了一个实用、易于实现的自监督训练框架,兼具快速采集与自监督重建,并给出理论保证。
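
A toy PyTorch sketch of the core training idea: each iteration draws a k-space "band" with a randomly chosen limited-resolution axis and computes a mask-weighted self-supervised loss only on that band. The band width, the weighting, and the reconstruction model are placeholders, not the released k-band implementation.

```python
import torch

def random_band_mask(shape, band_frac=0.25, gen=None):
    """Full resolution along one randomly chosen axis, a centred low-frequency band along the other."""
    gen = gen or torch.Generator().manual_seed(0)
    h, w = shape
    mask = torch.zeros(shape)
    axis = torch.randint(0, 2, (1,), generator=gen).item()   # limited-resolution axis
    if axis == 0:
        half = int(h * band_frac / 2)
        mask[h // 2 - half: h // 2 + half, :] = 1.0
    else:
        half = int(w * band_frac / 2)
        mask[:, w // 2 - half: w // 2 + half] = 1.0
    return mask

def band_loss(model, kspace, weight):
    """Self-supervised loss on one acquired band.
    `model` (image-to-image network) and `weight` (analytic loss mask) are placeholders."""
    mask = random_band_mask(kspace.shape[-2:])
    measured = kspace * mask
    image_in = torch.fft.ifft2(torch.fft.ifftshift(measured)).abs()
    recon = model(image_in.unsqueeze(0).unsqueeze(0)).squeeze()
    recon_k = torch.fft.fftshift(torch.fft.fft2(recon))
    diff = (recon_k - measured) * mask * weight
    return diff.abs().pow(2).mean()
```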

An Empirical Study of AI-based Smart Contract Creation

  • paper_url: http://arxiv.org/abs/2308.02955
  • repo_url: None
  • paper_authors: Rabimba Karanjai, Edward Li, Lei Xu, Weidong Shi
  • for: 本研究旨在评估大语言模型(LLM)如ChatGPT和Google Palm2在智能合约生成方面的可能性。
  • methods: 本研究使用了LLMs对智能合约的生成,并评估了生成代码的质量。
  • results: 研究发现,通过LLMs生成的合约存在安全漏洞,且代码质量和正确性受到输入参数质量的影响。但是,还有一些可能的改进方向。
    Abstract The introduction of large language models (LLMs) like ChatGPT and Google Palm2 for smart contract generation seems to be the first well-established instance of an AI pair programmer. LLMs have access to a large number of open-source smart contracts, enabling them to utilize more extensive code in Solidity than other code generation tools. Although the initial and informal assessments of LLMs for smart contract generation are promising, a systematic evaluation is needed to explore the limits and benefits of these models. The main objective of this study is to assess the quality of generated code provided by LLMs for smart contracts. We also aim to evaluate the impact of the quality and variety of input parameters fed to LLMs. To achieve this aim, we created an experimental setup for evaluating the generated code in terms of validity, correctness, and efficiency. Our study finds crucial evidence of security bugs getting introduced in the generated smart contracts as well as the overall quality and correctness of the code getting impacted. However, we also identified the areas where it can be improved. The paper also proposes several potential research directions to improve the process, quality and safety of generated smart contract codes.
    摘要 以ChatGPT和Google Palm2为代表的大型语言模型(LLM)用于智能合约生成,可以说是AI结对程序员的首个成熟实例。LLM可以访问大量开源智能合约,因而能够利用比其他代码生成工具更丰富的Solidity代码。尽管对LLM生成智能合约的初步、非正式评估令人鼓舞,但仍需系统性评估来探索这类模型的局限与优势。本研究的主要目的是评估LLM为智能合约生成的代码质量,并评估输入参数的质量与多样性对生成结果的影响。为此,我们构建了一个实验平台,从有效性、正确性和效率三方面评估生成的代码。我们的研究发现,生成的智能合约中会引入安全漏洞,代码的整体质量与正确性也会受到影响;同时我们也指出了可改进之处。论文最后提出了若干潜在研究方向,以改进生成智能合约代码的流程、质量与安全性。

dPASP: A Comprehensive Differentiable Probabilistic Answer Set Programming Environment For Neurosymbolic Learning and Reasoning

  • paper_url: http://arxiv.org/abs/2308.02944
  • repo_url: None
  • paper_authors: Renato Lui Geh, Jonas Gonçalves, Igor Cataneo Silveira, Denis Deratani Mauá, Fabio Gagliardi Cozman
  • for: 本文描述了一种新的宣言式概率逻辑编程框架dPASP,用于不同推理和统计知识的结合。
  • methods: 本文使用了逻辑约束、间值概率选择和神经预测来表示不确定、矛盾、不完整和统计知识。
  • results: 本文介绍了一种可以执行推理和学习的实现包,包括一些示例程序,并讨论了在不同语义下的梯度下降学习。
    Abstract We present dPASP, a novel declarative probabilistic logic programming framework for differentiable neuro-symbolic reasoning. The framework allows for the specification of discrete probabilistic models with neural predicates, logic constraints and interval-valued probabilistic choices, thus supporting models that combine low-level perception (images, texts, etc), common-sense reasoning, and (vague) statistical knowledge. To support all such features, we discuss the several semantics for probabilistic logic programs that can express nondeterministic, contradictory, incomplete and/or statistical knowledge. We also discuss how gradient-based learning can be performed with neural predicates and probabilistic choices under selected semantics. We then describe an implemented package that supports inference and learning in the language, along with several example programs. The package requires minimal user knowledge of deep learning system's inner workings, while allowing end-to-end training of rather sophisticated models and loss functions.
    摘要 我们介绍了dpasp,一种新的声明型概率逻辑编程框架,用于可微分神经符号逻辑推理。这个框架允许用户指定混合低级感知(图像、文本等)、通用理智、混乱统计知识的概率模型。为支持这些特点,我们讨论了几种概率逻辑程序的 semantics,可以表达非束缚、矛盾、不完整和/或统计知识。我们还讨论了如何在选定 semantics 下使用神经 predicate 和概率选择来进行梯度基于学习。然后,我们描述了一个实现的包,包括推理和学习语言中的许多示例程序。这个包需要最少的用户知识 deep learning 系统的内部工作,同时允许用户实现较复杂的模型和损失函数的整体训练。

Towards the Development of an Uncertainty Quantification Protocol for the Natural Gas Industry

  • paper_url: http://arxiv.org/abs/2308.02941
  • repo_url: None
  • paper_authors: Babajide Kolade
  • for: 这个论文的目的是为了开发一种用于评估机器学习和机理模型预测结果的不确定性评估协议。
  • methods: 这个论文使用了机器学习模型和机理模型来进行预测,并使用了不确定性评估协议来评估模型的可靠性。
  • results: 该论文通过应用不确定性评估协议来评估机器学习和机理模型的预测结果的不确定性,并提供了一些可靠性评估的方法和技术。
    Abstract Simulations using machine learning (ML) models and mechanistic models are often run to inform decision-making processes. Uncertainty estimates of simulation results are critical to the decision-making process because simulation results of specific scenarios may have wide, but unspecified, confidence bounds that may impact subsequent analyses and decisions. The objective of this work is to develop a protocol to assess uncertainties in predictions of machine learning and mechanistic simulation models. The protocol will outline an uncertainty quantification workflow that may be used to establish credible bounds of predictability on computed quantities of interest and to assess model sufficiency. The protocol identifies key sources of uncertainties in machine learning and mechanistic modeling, defines applicable methods of uncertainty propagation for these sources, and includes statistically rational estimators for output uncertainties. The work applies the protocol to test cases relevant to the gas distribution industry and presents learnings from its application. The paper concludes with a brief discussion outlining a pathway to the wider adoption of uncertainty quantification within the industry
    摘要 模拟使用机器学习(ML)模型和机理模型经常用于决策过程中,以便更好地了解不同情况下的结果。模拟结果中的uncertainty estimate是决策过程中非常重要的,因为模拟结果的特定情况可能有很宽,但不具体的信任范围,这可能会影响后续分析和决策。本工作的目标是开发一个协议,用于评估机器学习和机理模型预测结果中的不确定性。协议将 outline一个不确定性评估工作流程,可以用来确定计算量表达的可靠范围,并评估模型的充分性。协议将列出机器学习和机理模型中的主要不确定性源泉,并定义这些源泉上适用的不确定性传播方法,并包括输出不确定性的统计合理估计。本工作将应用协议到与天然气分布业有关的测试 случа例,并显示其应用的教训。文章结束于对在业界更广泛采用不确定性评估的道路的简要讨论。
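
One common ingredient of such a protocol is an ensemble-based estimate of output uncertainty. The sketch below, a simplification rather than the proposed protocol itself, trains a bootstrap ensemble and reports a mean prediction together with a simple percentile interval as the credible bound of predictability.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.utils import resample

def bootstrap_ensemble_predict(X_train, y_train, X_test, n_members=20, seed=0):
    """Train an ensemble on bootstrap resamples and return (mean, lower, upper)
    for each test point, using a 95% percentile interval as the uncertainty estimate."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_members):
        Xb, yb = resample(X_train, y_train, random_state=int(rng.integers(1_000_000)))
        model = GradientBoostingRegressor().fit(Xb, yb)
        preds.append(model.predict(X_test))
    preds = np.array(preds)
    mean = preds.mean(axis=0)
    lo, hi = np.percentile(preds, [2.5, 97.5], axis=0)
    return mean, lo, hi
```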

Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval

  • paper_url: http://arxiv.org/abs/2308.02926
  • repo_url: https://github.com/Haoxiang-WasedaU/Towards-Consistency-Filtering-Free-Unsupervised-Learning-for-Dense-Retrieval
  • paper_authors: Haoxiang Shi, Sumio Fujita, Tetsuya Sakai
  • for: overcome domain transfer challenge in modern neural Information Retrieval (IR)
  • methods: replace consistency filter with direct pseudo-labeling, pseudo-relevance feedback, or unsupervised keyword generation methods
  • results: TextRank-based pseudo relevance feedback outperforms other methods, and filtering-free unsupervised learning can continuously improve training and inference efficiency while maintaining retrieval performance.
    Abstract Domain transfer is a prevalent challenge in modern neural Information Retrieval (IR). To overcome this problem, previous research has utilized domain-specific manual annotations and synthetic data produced by consistency filtering to finetune a general ranker and produce a domain-specific ranker. However, training such consistency filters are computationally expensive, which significantly reduces the model efficiency. In addition, consistency filtering often struggles to identify retrieval intentions and recognize query and corpus distributions in a target domain. In this study, we evaluate a more efficient solution: replacing the consistency filter with either direct pseudo-labeling, pseudo-relevance feedback, or unsupervised keyword generation methods for achieving consistent filtering-free unsupervised dense retrieval. Our extensive experimental evaluations demonstrate that, on average, TextRank-based pseudo relevance feedback outperforms other methods. Furthermore, we analyzed the training and inference efficiency of the proposed paradigm. The results indicate that filtering-free unsupervised learning can continuously improve training and inference efficiency while maintaining retrieval performance. In some cases, it can even improve performance based on particular datasets.
    摘要 域名转移是现代神经信息检索(IR)中的一大挑战。以前的研究使用域名特定的手动标注和生成的域名特定排序器来训练一个通用排序器,以便在目标域名中提高检索性能。然而,训练这些一致性筛选器是计算机Expensive,这会significantly reduces the model efficiency。另外,一致性筛选器经常难以识别检索目的和查询和文献库的分布。在这种研究中,我们评估了一种更有效的解决方案:取代一致性筛选器,使用直接 Pseudo-labeling、pseudo relevance feedback 或无监督关键词生成方法来实现一致性自由无监督检索。我们进行了广泛的实验评估,结果表明,使用 TextRank 基于 Pseudo relevance feedback 的方法在 average 上超过其他方法。此外,我们还分析了提议的训练和执行效率。结果表明,无监督自由学习可以不断提高训练和执行效率,同时保持检索性能。在某些情况下,它甚至可以提高基于特定数据集的性能。
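
A rough sketch of TextRank-style pseudo-relevance feedback: build a term co-occurrence graph over the initially retrieved documents, rank terms with PageRank (via networkx), and expand the query with the top terms. The tokenization and window size are simplistic placeholders; the paper's pipeline around the dense retriever is not reproduced.

```python
import networkx as nx

def textrank_keywords(docs, window=3, top_k=10):
    """Build a co-occurrence graph over the top retrieved documents and rank terms with PageRank."""
    g = nx.Graph()
    for doc in docs:
        tokens = [t.lower() for t in doc.split() if t.isalpha() and len(t) > 2]
        for i, t in enumerate(tokens):
            for u in tokens[i + 1: i + window]:
                if t != u:
                    w = g.get_edge_data(t, u, default={"weight": 0})["weight"]
                    g.add_edge(t, u, weight=w + 1)
    scores = nx.pagerank(g, weight="weight")
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

def expand_query(query, retrieved_docs, n_terms=5):
    """Pseudo-relevance feedback: append TextRank terms mined from the initial top results."""
    return query + " " + " ".join(textrank_keywords(retrieved_docs)[:n_terms])
```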

An AI-Enabled Framework to Defend Ingenious MDT-based Attacks on the Emerging Zero Touch Cellular Networks

  • paper_url: http://arxiv.org/abs/2308.02923
  • repo_url: None
  • paper_authors: Aneeqa Ijaz, Waseem Raza, Hasan Farooq, Marvin Manalastas, Ali Imran
  • for: This paper aims to address the security threats in deeply automated wireless networks and IoT devices, specifically the vulnerability of MDT reports to adversarial attacks.
  • methods: The paper proposes a novel Malicious MDT Reports Identification framework (MRIF) using Machine Learning to detect and eliminate malicious MDT reports, and verifies its effectiveness through a use-case.
  • results: The paper highlights the detrimental repercussions of adversarial attacks on MDT reports on the performance of common network automation functions, and proposes a countermeasure to defend against such attacks.
    Abstract Deep automation provided by self-organizing network (SON) features and their emerging variants such as zero touch automation solutions is a key enabler for increasingly dense wireless networks and pervasive Internet of Things (IoT). To realize their objectives, most automation functionalities rely on the Minimization of Drive Test (MDT) reports. The MDT reports are used to generate inferences about network state and performance, thus dynamically change network parameters accordingly. However, the collection of MDT reports from commodity user devices, particularly low cost IoT devices, make them a vulnerable entry point to launch an adversarial attack on emerging deeply automated wireless networks. This adds a new dimension to the security threats in the IoT and cellular networks. Existing literature on IoT, SON, or zero touch automation does not address this important problem. In this paper, we investigate an impactful, first of its kind adversarial attack that can be launched by exploiting the malicious MDT reports from the compromised user equipment (UE). We highlight the detrimental repercussions of this attack on the performance of common network automation functions. We also propose a novel Malicious MDT Reports Identification framework (MRIF) as a countermeasure to detect and eliminate the malicious MDT reports using Machine Learning and verify it through a use-case. Thus, the defense mechanism can provide the resilience and robustness for zero touch automation SON engines against the adversarial MDT attacks
    摘要 深层自动化由自组织网络(SON)特点和其出现的变种,如零Touch自动化解决方案,是无线网络和互联网物联网(IoT)的关键加速器。为实现这些目标,大多数自动化功能都依赖于推定测试(MDT)报告。MDT报告可以生成对网络状态和性能的推理,因此动态改变网络参数。然而,从低成本IoT设备收集MDT报告,特别是低成本IoT设备,使得它们成为发动对新型深层自动化无线网络的敌意攻击的易受到攻击的入口点。这添加了新的安全隐患到互联网和无线网络中。现有的文献中关于IoT、SON或零Touch自动化没有讨论这个重要问题。在这篇论文中,我们调查了一种新型的攻击,可以通过利用受到恶意MDT报告的用户设备(UE)进行攻击。我们强调了这种攻击对常见网络自动化功能的负面影响。我们还提出了一个新的恶意MDT报告标识框架(MRIF)作为一种对抗手段,通过机器学习来检测和消除恶意MDT报告,并通过用例验证。因此,防御机制可以为零Touch自动化SON引擎提供防御力和坚固性,抵御攻击。
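
As a hedged illustration of what an ML-based filter for malicious MDT reports could look like, the sketch below runs an unsupervised IsolationForest over standardized report features and keeps only reports flagged as normal. The feature file and contamination rate are assumptions; the paper's MRIF framework is not claimed to work this way.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Each row is one MDT report, e.g. RSRP, RSRQ, SINR, reported location, serving cell id
# (placeholder file and feature layout).
reports = np.load("mdt_reports.npy")

X = StandardScaler().fit_transform(reports)
detector = IsolationForest(n_estimators=200, contamination=0.05, random_state=0)
flags = detector.fit_predict(X)               # -1 = suspected malicious report, 1 = normal
clean_reports = reports[flags == 1]           # feed only retained reports to SON functions
```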

Structured Low-Rank Tensors for Generalized Linear Models

  • paper_url: http://arxiv.org/abs/2308.02922
  • repo_url: None
  • paper_authors: Batoul Taki, Anand D. Sarwate, Waheed U. Bajwa
  • for: 这个论文研究了一种新的低级别矩阵模型(LSR),用于通用线性模型(GLM)问题。
  • methods: 该论文提出了一种块坐标下降算法,用于LSR结构张量GLM问题的参数估计。
  • results: 论文推导了张量GLM问题中系数张量估计误差的极小极大下界。该下界与LSR张量GLM问题的内在自由度成正比,表明其样本复杂度可能显著低于向量化GLM。此外,论文还进行了数值分析,在合成数据上对三类回归(线性、逻辑和泊松)进行了实验,并在若干医学影像数据集上进行了验证。
    Abstract Recent works have shown that imposing tensor structures on the coefficient tensor in regression problems can lead to more reliable parameter estimation and lower sample complexity compared to vector-based methods. This work investigates a new low-rank tensor model, called Low Separation Rank (LSR), in Generalized Linear Model (GLM) problems. The LSR model -- which generalizes the well-known Tucker and CANDECOMP/PARAFAC (CP) models, and is a special case of the Block Tensor Decomposition (BTD) model -- is imposed onto the coefficient tensor in the GLM model. This work proposes a block coordinate descent algorithm for parameter estimation in LSR-structured tensor GLMs. Most importantly, it derives a minimax lower bound on the error threshold on estimating the coefficient tensor in LSR tensor GLM problems. The minimax bound is proportional to the intrinsic degrees of freedom in the LSR tensor GLM problem, suggesting that its sample complexity may be significantly lower than that of vectorized GLMs. This result can also be specialised to lower bound the estimation error in CP and Tucker-structured GLMs. The derived bounds are comparable to tight bounds in the literature for Tucker linear regression, and the tightness of the minimax lower bound is further assessed numerically. Finally, numerical experiments on synthetic datasets demonstrate the efficacy of the proposed LSR tensor model for three regression types (linear, logistic and Poisson). Experiments on a collection of medical imaging datasets demonstrate the usefulness of the LSR model over other tensor models (Tucker and CP) on real, imbalanced data with limited available samples.
    摘要 近期研究表明,在回归问题中对系数张量施加张量结构,可以比基于向量的方法获得更可靠的参数估计和更低的样本复杂度。本文研究一种新的低秩张量模型——低分离秩(Low Separation Rank, LSR)模型在广义线性模型(GLM)问题中的应用。LSR模型推广了著名的Tucker和CANDECOMP/PARAFAC(CP)模型,并且是块张量分解(BTD)模型的特例;本文将其施加于GLM模型的系数张量上,并提出一种块坐标下降算法用于参数估计。最重要的是,本文推导了LSR结构张量GLM问题中系数张量估计误差的极小极大下界。该下界与LSR张量GLM问题的内在自由度成正比,表明其样本复杂度可能显著低于向量化GLM;该结果也可以特化为CP和Tucker结构GLM的估计误差下界。所推导的界与文献中Tucker线性回归的紧致界相当,其紧致性还通过数值实验进一步评估。最后,在合成数据上的数值实验验证了LSR张量模型在三类回归(线性、逻辑和泊松)中的有效性;在一组医学影像数据集上的实验则表明,在样本有限、类别不平衡的真实数据上,LSR模型优于其他张量模型(Tucker和CP)。

Spectral Ranking Inferences based on General Multiway Comparisons

  • paper_url: http://arxiv.org/abs/2308.02918
  • repo_url: None
  • paper_authors: Jianqing Fan, Zhipeng Lou, Weichen Wang, Mengxin Yu
  • for: 这 paper 研究 spectral method 在对比Entities的偏好分数的估计和不确定性评估中的性能。
  • methods: paper 使用 spectral method 在一个非常通用和更真实的设定中,其中比较图包含可能不同大小的高阶约束,并且可能只有一个比较。这种设定在实际应用中非常普遍,因此不需要指定图的随机性和PL/BTL模型中的均匀采样假设。
  • results: paper 发现,在适用 BTL/PL 模型时,spectral estimator 和 Maximum Likelihood Estimator (MLE) 之间存在关系。furthermore, paper 提出了一种two-step spectral method,可以达到 MLE 的同等效率。此外,paper 还提出了一个涵盖一 Sample和两 Sample 排名推论的完整框架,可以应用于固定图和随机图设定。这是首次提出了有效的两 Sample rank testing 方法。最后,paper 通过了详细的数学实验和应用于统计期刊和电影排名等。
    Abstract This paper studies the performance of the spectral method in the estimation and uncertainty quantification of the unobserved preference scores of compared entities in a very general and more realistic setup in which the comparison graph consists of hyper-edges of possible heterogeneous sizes and the number of comparisons can be as low as one for a given hyper-edge. Such a setting is pervasive in real applications, circumventing the need to specify the graph randomness and the restrictive homogeneous sampling assumption imposed in the commonly-used Bradley-Terry-Luce (BTL) or Plackett-Luce (PL) models. Furthermore, in the scenarios when the BTL or PL models are appropriate, we unravel the relationship between the spectral estimator and the Maximum Likelihood Estimator (MLE). We discover that a two-step spectral method, where we apply the optimal weighting estimated from the equal weighting vanilla spectral method, can achieve the same asymptotic efficiency as the MLE. Given the asymptotic distributions of the estimated preference scores, we also introduce a comprehensive framework to carry out both one-sample and two-sample ranking inferences, applicable to both fixed and random graph settings. It is noteworthy that it is the first time effective two-sample rank testing methods are proposed. Finally, we substantiate our findings via comprehensive numerical simulations and subsequently apply our developed methodologies to perform statistical inferences on statistics journals and movie rankings.
    摘要 本文研究在一个非常一般、也更贴近实际的设定下,谱方法在估计被比较对象的潜在偏好得分及其不确定性量化方面的表现:比较图由尺寸可能不同的超边构成,且每条超边的比较次数可以低至一次。这类设定在实际应用中十分普遍,从而无需指定图的随机性,也无需常用的 Bradley-Terry-Luce(BTL)或 Plackett-Luce(PL)模型中限制性的均匀采样假设。此外,在 BTL 或 PL 模型适用的情形下,我们揭示了谱估计量与极大似然估计量(MLE)之间的关系,并发现一种两步谱方法——利用等权谱方法估计出的最优加权——可以达到与 MLE 相同的渐近效率。基于估计得分的渐近分布,我们进一步提出了一个完整的单样本与两样本排序推断框架,适用于固定图与随机图两种设定;值得注意的是,这是首次提出有效的两样本排序检验方法。最后,我们通过充分的数值模拟验证了这些结论,并将所提方法应用于统计期刊与电影排名的统计推断。
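
For intuition, here is a numpy sketch of the vanilla spectral estimator for plain pairwise comparisons: build a comparison Markov chain and read preference scores from its stationary distribution. The multiway hyper-edge setting, the two-step optimal weighting, and the inference procedures from the paper are not reproduced.

```python
import numpy as np

def spectral_scores(wins, eps=1e-9):
    """wins[i, j] = number of times item j beat item i in pairwise comparisons."""
    n = wins.shape[0]
    totals = wins + wins.T                            # comparisons between i and j
    P = np.where(totals > 0, wins / np.maximum(totals, eps), 0.0)
    P = P / n                                         # lazy random walk over items
    np.fill_diagonal(P, 0.0)
    P += np.diag(1.0 - P.sum(axis=1))                 # make each row sum to one
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = np.abs(pi) / np.abs(pi).sum()
    return pi                                         # larger value = stronger item

wins = np.array([[0, 2, 8],
                 [6, 0, 7],
                 [1, 3, 0]])
print(spectral_scores(wins))
```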

Adversarial Erasing with Pruned Elements: Towards Better Graph Lottery Ticket

  • paper_url: http://arxiv.org/abs/2308.02916
  • repo_url: https://github.com/wangyuwen0627/ace-glt
  • paper_authors: Yuwen Wang, Shunyu Liu, Kaixuan Chen, Tongtian Zhu, Ji Qiao, Mengjie Shi, Yuanyu Wan, Mingli Song
  • for: 这篇论文旨在降低图神经网络(GNN)在大规模输入图上的计算成本,同时不损失性能。
  • methods: 论文围绕由核心子图与稀疏子网络组合而成的图彩票(Graph Lottery Ticket, GLT)展开,在迭代幅值剪枝(IMP)的基础上提出对抗互补擦除(ACE)框架,从被剪除的边和权重中挖掘有价值的信息,得到更强的 ACE-GLT;现有研究仅把中奖彩票视为最终目标,忽略了剪枝过程中边/权重重要性的动态变化,限制了中奖彩票的潜力。
  • results: 实验结果显示,ACE-GLT 在多种任务上均优于现有的 GLT 搜索方法。
    Abstract Graph Lottery Ticket (GLT), a combination of core subgraph and sparse subnetwork, has been proposed to mitigate the computational cost of deep Graph Neural Networks (GNNs) on large input graphs while preserving original performance. However, the winning GLTs in existing studies are obtained by applying iterative magnitude-based pruning (IMP) without re-evaluating and re-considering the pruned information, which disregards the dynamic changes in the significance of edges/weights during graph/model structure pruning, and thus limits the appeal of the winning tickets. In this paper, we formulate a conjecture, i.e., that there is overlooked valuable information in the pruned graph connections and model parameters which can be re-grouped into the GLT to enhance the final performance. Specifically, we propose an adversarial complementary erasing (ACE) framework to explore the valuable information from the pruned components, thereby developing a more powerful GLT, referred to as the ACE-GLT. The main idea is to mine valuable information from pruned edges/weights after each round of IMP, and employ the ACE technique to refine the GLT processing. Finally, experimental results demonstrate that our ACE-GLT outperforms existing methods for searching GLT in diverse tasks. Our code will be made publicly available.
    摘要 Graph Lottery Ticket(GLT),一种将核心子图和稀疏子网络组合的方法,已被提出来降低深度图神经网络(GNNs)在大输入图上的计算成本,保持原始性能。然而,现有的赢家GLT通常通过不重新评估和重新考虑被剪除的信息来获得,这会忽略图/模型结构剪除中边Edge/权重的动态变化,从而限制赢家票的吸引力。在这篇论文中,我们提出一个假设,即现有的被过look的有价信息在剪除后的图连接和模型参数中,可以重新组织成GLT,以提高最终性能。 Specifically,我们提出一种对抗补做(ACE)框架,以挖掘剪除后的有价信息,并使用ACE技术来练级GLT处理。最后,我们的ACE-GLT在多种任务中的实验结果表明,我们的ACE-GLT比现有的GLT搜索方法更高效。我们的代码将在公共网上公布。
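
The sketch below illustrates the general flavour of iterative magnitude pruning with a re-grouping step that revives a few pruned entries, using a precomputed saliency array as a stand-in for the value ACE-GLT mines from erased edges/weights. It is a toy NumPy illustration under those assumptions, not the paper's adversarial erasing objective or its GNN/graph setting.

```python
import numpy as np

def imp_with_regrouping(weights, saliency, rounds=3, prune_frac=0.2, regroup_frac=0.05):
    """Toy iterative magnitude pruning (IMP) with a re-grouping step.

    After each pruning round, a small fraction of *pruned* entries with the
    highest saliency is revived, mimicking the idea of recovering value from
    erased elements.  `saliency` is a hypothetical precomputed score (e.g.
    gradient magnitude), not the paper's adversarial erasing criterion.
    """
    mask = np.ones_like(weights, dtype=bool)
    for _ in range(rounds):
        kept = np.abs(weights[mask])
        if kept.size == 0:
            break
        threshold = np.quantile(kept, prune_frac)      # magnitude-based pruning
        mask &= np.abs(weights) > threshold
        pruned_idx = np.flatnonzero(~mask)             # candidates to revive
        k = int(regroup_frac * weights.size)
        if k and pruned_idx.size:
            revive = pruned_idx[np.argsort(saliency.ravel()[pruned_idx])[-k:]]
            mask.ravel()[revive] = True                # re-group the most salient
    return mask

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))                            # stand-in weight matrix
S = rng.random(size=(8, 8))                            # hypothetical saliency scores
print(f"kept fraction: {imp_with_regrouping(W, S).mean():.2f}")
```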

eess.IV - 2023-08-06

FourLLIE: Boosting Low-Light Image Enhancement by Fourier Frequency Information

  • paper_url: http://arxiv.org/abs/2308.03033
  • repo_url: https://github.com/wangchx67/fourllie
  • paper_authors: Chenxi Wang, Hongjun Wu, Zhi Jin
  • for: Improving the lightness and detail of low-light images.
  • methods: Exploits the positive correlation between Fourier amplitude and lightness; a two-stage Fourier-based LLIE network (FourLLIE) first estimates an amplitude transform map in the Fourier space to brighten the image, then uses a Signal-to-Noise-Ratio (SNR) map to fuse global Fourier frequency information with local spatial information and recover detail (a toy amplitude-scaling sketch follows this entry).
  • results: FourLLIE outperforms existing SOTA methods on four representative benchmarks while maintaining good model efficiency.
    Abstract Recently, Fourier frequency information has attracted much attention in Low-Light Image Enhancement (LLIE). Some researchers noticed that, in the Fourier space, the lightness degradation mainly exists in the amplitude component and the rest exists in the phase component. By incorporating both the Fourier frequency and the spatial information, these researchers proposed remarkable solutions for LLIE. In this work, we further explore the positive correlation between the magnitude of amplitude and the magnitude of lightness, which can be effectively leveraged to improve the lightness of low-light images in the Fourier space. Moreover, we find that the Fourier transform can extract the global information of the image, and does not introduce massive neural network parameters like Multi-Layer Perceptrons (MLPs) or Transformer. To this end, a two-stage Fourier-based LLIE network (FourLLIE) is proposed. In the first stage, we improve the lightness of low-light images by estimating the amplitude transform map in the Fourier space. In the second stage, we introduce the Signal-to-Noise-Ratio (SNR) map to provide the prior for integrating the global Fourier frequency and the local spatial information, which recovers image details in the spatial space. With this ingenious design, FourLLIE outperforms the existing state-of-the-art (SOTA) LLIE methods on four representative datasets while maintaining good model efficiency.
    摘要 近期,傅里叶频率信息在低光照图像提升(LLIE)中受到了很多注意。一些研究人员发现,在傅里叶空间中,亮度减退主要存在于幅度组件中,剩下的存在于相位组件中。通过结合傅里叶频率和空间信息,这些研究人员提出了有优势的解决方案。在这个工作中,我们进一步探索幅度组件的积分和亮度之间的正相关关系,可以有效地提高低光照图像的亮度在傅里叶空间中。此外,我们发现傅里叶变换可以提取图像的全局信息,不需要大量的神经网络参数如多层感知器(MLP)或变换器。为此,我们提出了一个两Stage的傅里叶基于LLIE网络(FourLLIE)。在第一stage中,我们使用傅里叶变换map来提高低光照图像的亮度。在第二stage中,我们引入信号噪比(SNR)地图,以提供亮度提升的优先级,并将全局傅里叶频率和本地空间信息集成起来,以恢复图像的细节。通过这种独特的设计,FourLLIE在四个代表性的数据集上比前一些SOTA LLIE方法表现出色,同时保持了好的模型效率。
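
As a rough illustration of the first-stage idea, the sketch below scales the Fourier amplitude of a dark image while keeping its phase, using a hand-crafted low-frequency gain as a stand-in for the learned amplitude transform map; the SNR-guided second stage is omitted.

```python
import numpy as np

def brighten_via_fourier_amplitude(img, base_gain=1.5, dc_gain=3.0):
    """Scale the Fourier amplitude of a dark image while keeping its phase.

    The hand-crafted gain map (stronger at low frequencies) stands in for the
    amplitude transform map that FourLLIE's first stage estimates.
    """
    spectrum = np.fft.fft2(img)
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    gain = base_gain + (dc_gain - base_gain) * np.exp(-(radius / 0.05) ** 2)
    enhanced = np.fft.ifft2(gain * amplitude * np.exp(1j * phase))
    return np.clip(enhanced.real, 0.0, 1.0)

low_light = np.random.rand(64, 64) * 0.2                  # synthetic dark image
print(brighten_via_fourier_amplitude(low_light).mean())   # noticeably brighter
```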

Recurrent Spike-based Image Restoration under General Illumination

  • paper_url: http://arxiv.org/abs/2308.03018
  • repo_url: https://github.com/bit-vision/rsir
  • paper_authors: Lin Zhu, Yunlong Zheng, Mengyue Geng, Lizhi Wang, Hua Huang
  • for: Spike cameras record light intensity as spike arrays with very high temporal resolution (20,000 Hz), enabling tasks such as high-speed image reconstruction, but existing spike-based methods assume sufficient illumination, which fails in many real-world scenes such as rain or dusk.
  • methods: A Recurrent Spike-based Image Restoration (RSIR) network, the first to restore clear images from spike arrays under general illumination. A physics-based spike noise model derived from the camera's sampling process underpins the design, which combines an adaptive spike transformation module, a recurrent temporal feature fusion module, and a frequency-based spike denoising module; the spike array is processed recursively so that its temporal information is fully used.
  • results: Extensive experiments on real-world datasets under different illuminations demonstrate the effectiveness of the method. Code and data are released at https://github.com/BIT-Vision/RSIR.
    Abstract Spike camera is a new type of bio-inspired vision sensor that records light intensity in the form of a spike array with high temporal resolution (20,000 Hz). This new paradigm of vision sensor offers significant advantages for many vision tasks such as high speed image reconstruction. However, existing spike-based approaches typically assume that the scenes are with sufficient light intensity, which is usually unavailable in many real-world scenarios such as rainy days or dusk scenes. To unlock more spike-based application scenarios, we propose a Recurrent Spike-based Image Restoration (RSIR) network, which is the first work towards restoring clear images from spike arrays under general illumination. Specifically, to accurately describe the noise distribution under different illuminations, we build a physical-based spike noise model according to the sampling process of the spike camera. Based on the noise model, we design our RSIR network which consists of an adaptive spike transformation module, a recurrent temporal feature fusion module, and a frequency-based spike denoising module. Our RSIR can process the spike array in a recursive manner to ensure that the spike temporal information is well utilized. In the training process, we generate the simulated spike data based on our noise model to train our network. Extensive experiments on real-world datasets with different illuminations demonstrate the effectiveness of the proposed network. The code and dataset are released at https://github.com/BIT-Vision/RSIR.
    摘要 新型的蜂巢相机(Spike camera)是一种基于生物体的视觉传感器,它记录光度的变化形式为高度精度的蜂巢数组(20,000 Hz)。这种新的视觉传感器 paradigma提供了许多视觉任务的高速重建优势,但现有的蜂巢基本方法通常假设场景中有足够的光度,这通常不符合实际情况,如雨天或晚上场景。为了拓展更多的蜂巢应用场景,我们提出了一种基于蜂巢的图像修复网络(RSIR),这是首个在普通照明下修复清晰图像的工作。 Specifically, 我们建立了基于采样过程的物理基于蜂巢噪声模型,以描述不同照明下噪声分布。根据噪声模型,我们设计了我们的 RSIR 网络,该网络包括自适应蜂巢变换模块、回归时间特征融合模块和频率基于蜂巢噪声除净模块。我们的 RSIR 可以 recursive 地处理蜂巢数组,以确保蜂巢时间信息得到好好利用。在训练过程中,我们根据我们的噪声模型生成了模拟的蜂巢数据来训练我们的网络。广泛的实验表明,我们的方法可以在不同的照明下进行高效的图像修复。代码和数据集可以在 上下载。
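
For intuition about recovering intensity from spike trains, the toy sketch below treats each pixel's firing rate as proportional to light intensity and fuses frames with a running exponential average, a loose stand-in for RSIR's recurrent temporal fusion; the physics-based noise model and learned denoising modules are not reproduced.

```python
import numpy as np

def spikes_to_intensity(spikes, decay=0.9):
    """Recursively fuse a binary spike array (T, H, W) into an intensity image.

    A pixel's firing rate is roughly proportional to light intensity, so a
    running exponential average over frames recovers a smooth estimate; the
    final division corrects the bias of the truncated average.
    """
    estimate = np.zeros(spikes.shape[1:], dtype=float)
    for frame in spikes.astype(float):
        estimate = decay * estimate + (1.0 - decay) * frame
    return estimate / max(1.0 - decay ** len(spikes), 1e-8)

rng = np.random.default_rng(0)
true_intensity = np.linspace(0.05, 0.6, 32 * 32).reshape(32, 32)   # dim scene
spike_train = rng.random((200, 32, 32)) < true_intensity           # Bernoulli spikes
print(np.abs(spikes_to_intensity(spike_train) - true_intensity).mean())
```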

High-Resolution Vision Transformers for Pixel-Level Identification of Structural Components and Damage

  • paper_url: http://arxiv.org/abs/2308.03006
  • repo_url: None
  • paper_authors: Kareem Eltouny, Seyedomid Sajedi, Xiao Liang
  • for: Faster and more reliable parsing of high-resolution visual inspection images of civil structures such as bridges.
  • methods: A semantic segmentation network that combines vision transformers with Laplacian-pyramid scaling networks, learning to resize high-resolution images and masks so that both local fine details (e.g., thin cracks) and global contextual information are retained without excessive computation.
  • results: Comprehensive experiments on bridge inspection report images show accurate pixel-wise material detection across multiple metrics.
    Abstract Visual inspection is predominantly used to evaluate the state of civil structures, but recent developments in unmanned aerial vehicles (UAVs) and artificial intelligence have increased the speed, safety, and reliability of the inspection process. In this study, we develop a semantic segmentation network based on vision transformers and Laplacian pyramids scaling networks for efficiently parsing high-resolution visual inspection images. The massive amounts of collected high-resolution images during inspections can slow down the investigation efforts. And while there have been extensive studies dedicated to the use of deep learning models for damage segmentation, processing high-resolution visual data can pose major computational difficulties. Traditionally, images are either uniformly downsampled or partitioned to cope with computational demands. However, the input is at risk of losing local fine details, such as thin cracks, or global contextual information. Inspired by super-resolution architectures, our vision transformer model learns to resize high-resolution images and masks to retain both the valuable local features and the global semantics without sacrificing computational efficiency. The proposed framework has been evaluated through comprehensive experiments on a dataset of bridge inspection report images using multiple metrics for pixel-wise materials detection.
    摘要 视觉检查主要用于评估公共建筑物,但最近无人飞行器(UAV)和人工智能技术的发展已经提高了检查过程的速度、安全性和可靠性。在这项研究中,我们开发了基于视觉变换器和拉普拉斯金字塔缩放网络的语义分割网络,用于高效地解析高分辨率视觉检查图像。检查中收集的大量高分辨率图像可能会拖慢调查工作。虽然已有大量关于深度学习模型用于损伤分割的研究,但处理高分辨率视觉数据会带来很大的计算困难。传统上,图像会被统一下采样或分块,以降低计算压力,但这样输入可能会丢失局部细节(如细小裂缝)或全局上下文信息。受超分辨率架构启发,我们的视觉变换器模型学习对高分辨率图像和掩码进行缩放,在不牺牲计算效率的前提下同时保留有价值的局部特征和全局语义。我们在桥梁检查报告图像数据集上进行了全面实验,使用多种指标评估像素级材料检测。
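
The sketch below builds a simple Laplacian pyramid of the kind the paper couples with a vision transformer, so that a high-resolution inspection image is decomposed into high-frequency residuals (thin cracks) plus a coarse global view; average pooling and nearest-neighbour upsampling are simplifications of proper Gaussian filtering, and the network itself is not shown.

```python
import numpy as np

def laplacian_pyramid(img, levels=3):
    """Decompose an image into high-frequency residuals plus a coarse level.

    Average pooling / nearest-neighbour upsampling are used in place of proper
    Gaussian filtering; image sides must be divisible by 2**levels.
    """
    pyramid, current = [], img.astype(float)
    for _ in range(levels):
        h, w = current.shape
        down = current.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        up = np.repeat(np.repeat(down, 2, axis=0), 2, axis=1)
        pyramid.append(current - up)        # local detail (e.g. thin cracks)
        current = down
    pyramid.append(current)                 # coarse global context
    return pyramid

image = np.random.rand(256, 256)            # stand-in for an inspection photo
print([level.shape for level in laplacian_pyramid(image)])
```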

Weakly supervised segmentation of intracranial aneurysms using a 3D focal modulation UNet

  • paper_url: http://arxiv.org/abs/2308.03001
  • repo_url: None
  • paper_authors: Amirhossein Rasoulian, Soorena Salari, Yiming Xiao
  • for: Accurate and efficient identification and quantification of unruptured intracranial aneurysms (UIAs) to support risk assessment and treatment decisions for this cerebrovascular disorder.
  • methods: Weakly supervised learning with coarse labels from time-of-flight MRAs; a 3D focal modulation UNet (FocalSegNet) with Conditional Random Field (CRF) post-processing produces refined UIA segmentations.
  • results: The method achieves a Dice score of 0.68 and a 95% Hausdorff distance of 0.95 mm, outperforming the state-of-the-art 3D UNet and Swin-UNETR and demonstrating the benefit of focal modulation for the task.
    Abstract Accurate identification and quantification of unruptured intracranial aneurysms (UIAs) are essential for the risk assessment and treatment decisions of this cerebrovascular disorder. Current assessment based on 2D manual measures of aneurysms on 3D magnetic resonance angiography (MRA) is sub-optimal and time-consuming. Automatic 3D measures can significantly benefit the clinical workflow and treatment outcomes. However, one major issue in medical image segmentation is the need for large well-annotated data, which can be expensive to obtain. Techniques that mitigate the requirement, such as weakly supervised learning with coarse labels are highly desirable. In this paper, we leverage coarse labels of UIAs from time-of-flight MRAs to obtain refined UIAs segmentation using a novel 3D focal modulation UNet, called FocalSegNet and conditional random field (CRF) postprocessing, with a Dice score of 0.68 and 95% Hausdorff distance of 0.95 mm. We evaluated the performance of the proposed algorithms against the state-of-the-art 3D UNet and Swin-UNETR, and demonstrated the superiority of the proposed FocalSegNet and the benefit of focal modulation for the task.
    摘要 精准识别和量化非ruptured intracranial aneurysms (UIAs) 是脑血管疾病风险评估和治疗决策中的关键。现有的评估方法基于2D手动测量在3D磁共振成像(MRA)上的动脉瘤是次优化的和时间consuming。自动3D测量可以帮助优化诊断和治疗结果。然而,医疗图像分割的一个主要问题是需要大量高质量标注数据,这可以是成本高的。我们在这篇论文中利用了时间反射MRAs中的UIAs粗略标注来获得精细的UIAs分割,使用了一种新的3D焦点修饰UNet(FocalSegNet)和条件Random Field(CRF)后处理,得到了0.68的Dice分数和0.95毫米的95% Hausdorff距离。我们对已有的3D UNets和Swin-UNETR进行了比较,并证明了我们提出的FocalSegNet的优越性和焦点修饰的好处。
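
The evaluation metrics quoted above are straightforward to reproduce; the sketch below computes the Dice score and a mask-based approximation of the 95% Hausdorff distance for binary segmentations. It illustrates the metrics only, not FocalSegNet or the CRF post-processing, and the toy masks are illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dice_score(pred, gt):
    """Dice overlap between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / max(pred.sum() + gt.sum(), 1)

def hausdorff95(pred, gt, spacing=1.0):
    """Mask-based approximation of the 95th-percentile symmetric Hausdorff
    distance, in the units given by `spacing` (e.g. mm per voxel)."""
    if not pred.any() or not gt.any():
        return float("inf")
    dist_to_gt = distance_transform_edt(~gt, sampling=spacing)
    dist_to_pred = distance_transform_edt(~pred, sampling=spacing)
    surface_dists = np.concatenate([dist_to_gt[pred], dist_to_pred[gt]])
    return float(np.percentile(surface_dists, 95))

gt = np.zeros((64, 64), dtype=bool)
gt[20:40, 20:40] = True                      # toy aneurysm mask
pred = np.zeros_like(gt)
pred[22:42, 21:41] = True                    # slightly shifted prediction
print(dice_score(pred, gt), hausdorff95(pred, gt))
```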

DermoSegDiff: A Boundary-aware Segmentation Diffusion Model for Skin Lesion Delineation

  • paper_url: http://arxiv.org/abs/2308.02959
  • repo_url: https://github.com/mindflow-institue/dermosegdiff
  • paper_authors: Afshin Bozorgpour, Yousef Sadegheih, Amirhossein Kazerouni, Reza Azad, Dorit Merhof
  • for: Early detection and accurate diagnosis of dermatological conditions through skin lesion segmentation.
  • methods: A boundary-aware segmentation diffusion model that incorporates boundary information during learning, with a novel loss that prioritizes boundaries during training (gradually reducing the weight of other regions) and a U-Net-based denoising network that integrates noise and semantic information.
  • results: Experiments on multiple skin segmentation datasets show that DermoSegDiff outperforms existing CNN-, transformer-, and diffusion-based approaches in effectiveness and generalization.
    Abstract Skin lesion segmentation plays a critical role in the early detection and accurate diagnosis of dermatological conditions. Denoising Diffusion Probabilistic Models (DDPMs) have recently gained attention for their exceptional image-generation capabilities. Building on these advancements, we propose DermoSegDiff, a novel framework for skin lesion segmentation that incorporates boundary information during the learning process. Our approach introduces a novel loss function that prioritizes the boundaries during training, gradually reducing the significance of other regions. We also introduce a novel U-Net-based denoising network that proficiently integrates noise and semantic information inside the network. Experimental results on multiple skin segmentation datasets demonstrate the superiority of DermoSegDiff over existing CNN, transformer, and diffusion-based approaches, showcasing its effectiveness and generalization in various scenarios. The implementation is publicly accessible on \href{https://github.com/mindflow-institue/dermosegdiff}{GitHub}
    摘要 皮肤 lesion 分割在早期检测和准确诊断皮肤病理中扮演了关键角色。 reciently, Denoising Diffusion Probabilistic Models (DDPMs) 在图像生成方面受到了广泛关注。 基于这些进步,我们提出了 DermoSegDiff,一种新的皮肤 lesion 分割框架,它在学习过程中引入边界信息。我们的方法引入了一种新的损失函数,在训练过程中优先级是边界区域,逐渐减少其他区域的重要性。我们还引入了一种基于 U-Net 的混合噪声和semantic信息的denoising网络。多个皮肤分割数据集的实验结果表明,DermoSegDiff 在不同场景下比核心 CNN、transformer 和 diffusion 基本上表现出色,展示其效果和泛化能力。实现可以在 \href{https://github.com/mindflow-institue/dermosegdiff}{GitHub} 上获取。
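
To illustrate what "prioritizing the boundaries" can look like, the sketch below derives per-pixel weights that peak at the lesion boundary and plugs them into a weighted cross-entropy. The exact loss and the diffusion-model training of DermoSegDiff are not reproduced, and the Gaussian fall-off is an assumption for illustration.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def boundary_weight_map(mask, sigma=3.0):
    """Per-pixel weights that peak at the mask boundary and decay with distance."""
    dist_out = distance_transform_edt(~mask)      # background: distance to lesion
    dist_in = distance_transform_edt(mask)        # lesion: distance to background
    dist = np.where(mask, dist_in, dist_out)
    return 1.0 + np.exp(-(dist / sigma) ** 2)     # in [1, 2], largest near the boundary

def weighted_bce(pred_prob, target, weights, eps=1e-7):
    """Binary cross-entropy with per-pixel boundary weights."""
    p = np.clip(pred_prob, eps, 1.0 - eps)
    ce = -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    return float((weights * ce).mean())

mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1.0                          # toy lesion mask
weights = boundary_weight_map(mask.astype(bool))
pred = np.clip(mask + 0.1 * np.random.rand(64, 64), 0.0, 1.0)
print(weighted_bce(pred, mask, weights))
```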

MomentaMorph: Unsupervised Spatial-Temporal Registration with Momenta, Shooting, and Correction

  • paper_url: http://arxiv.org/abs/2308.02949
  • repo_url: None
  • paper_authors: Zhangxing Bian, Shuwen Wei, Yihao Liu, Junyu Chen, Jiachen Zhuo, Fangxu Xing, Jonghye Woo, Aaron Carass, Jerry L. Prince
  • for: Estimating Lagrangian motion fields from tagged MRI (tMRI) in the presence of repetitive patterns and large motion, where registration-based methods get trapped in local optima.
  • methods: A "momenta, shooting, and correction" framework grounded in Lie group and Lie algebra principles: momenta are accumulated in the tangent vector space, exponential mapping in the diffeomorphic space shoots rapidly towards the true optimum, and a subsequent correction step ensures convergence (a minimal scaling-and-squaring sketch follows this entry).
  • results: On a 2D synthetic dataset and a real 3D tMRI dataset, the method estimates accurate, dense, and diffeomorphic 2D/3D motion fields despite large motion and repetitive patterns.
    Abstract Tagged magnetic resonance imaging (tMRI) has been employed for decades to measure the motion of tissue undergoing deformation. However, registration-based motion estimation from tMRI is difficult due to the periodic patterns in these images, particularly when the motion is large. With a larger motion the registration approach gets trapped in a local optima, leading to motion estimation errors. We introduce a novel "momenta, shooting, and correction" framework for Lagrangian motion estimation in the presence of repetitive patterns and large motion. This framework, grounded in Lie algebra and Lie group principles, accumulates momenta in the tangent vector space and employs exponential mapping in the diffeomorphic space for rapid approximation towards true optima, circumventing local optima. A subsequent correction step ensures convergence to true optima. The results on a 2D synthetic dataset and a real 3D tMRI dataset demonstrate our method's efficiency in estimating accurate, dense, and diffeomorphic 2D/3D motion fields amidst large motion and repetitive patterns.
    摘要 标记的核磁共振成像(tMRI)已经在数十年内用于测量软组织的运动。然而,基于准确的注册的运动估计从tMRI中很难进行,尤其是当运动较大时。大量运动会让注册方法被困在本地最佳点,导致运动估计错误。我们介绍了一种新的“动量、射击和修正”框架,用于在具有重复模式和大运动的情况下进行拉格朗日运动估计。这个框架基于李代数和李群原理,在 tangent 空间中积累动量,并使用 exponential mapping 在 diffeomorphic 空间中快速地逼近真正的最佳点, circumventing 本地最佳点。后续的修正步骤确保了真正的最佳点准确性。 synthetic 数据集和真实的 3D tMRI 数据集的结果表明,我们的方法可以快速、高精度地估计软组织的 2D/3D 运动场,即使在大运动和重复模式的情况下。
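
The "shooting" ingredient, mapping tangent-space momenta to a diffeomorphic deformation via the exponential map, is commonly approximated by scaling and squaring; the sketch below shows that standard construction for a 2D stationary velocity field. The tag-specific registration and the correction step of MomentaMorph are not included, and the toy velocity field is synthetic.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def exp_velocity_field(velocity, steps=6):
    """Scaling-and-squaring exponential map for a 2D stationary velocity field.

    `velocity` has shape (2, H, W) and plays the role of tangent-space momenta;
    the returned displacement field approximates a diffeomorphism.
    """
    disp = velocity / (2 ** steps)                 # small initial step
    h, w = velocity.shape[1:]
    grid = np.mgrid[0:h, 0:w].astype(float)
    for _ in range(steps):                         # repeated self-composition
        warped = np.stack([
            map_coordinates(disp[c], grid + disp, order=1, mode="nearest")
            for c in range(2)
        ])
        disp = disp + warped
    return disp

rng = np.random.default_rng(0)
velocity = 5.0 * np.stack(
    [gaussian_filter(rng.standard_normal((32, 32)), sigma=4) for _ in range(2)]
)
displacement = exp_velocity_field(velocity)
print(displacement.shape, float(np.abs(displacement).max()))
```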

cs.SD - 2023-08-05

ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging

  • paper_url: http://arxiv.org/abs/2308.02870
  • repo_url: None
  • paper_authors: Fangyuan Wang, Ming Hao, Yuhai Shi, Bo Xu
  • for: This paper aims to improve the conventional recipe for Automatic Speech Recognition (ASR) models by rethinking and updating the early stopping and checkpoint averaging methods from the perspective of the bias-variance tradeoff.
  • methods: The proposed method, called Approximated Bias-Variance Tradeoff (ApproBiVT), uses the training loss and validation loss as proxies of bias and variance to guide the early stopping and checkpoint averaging.
  • results: When evaluated on the AISHELL-1 and AISHELL-2 datasets, the proposed recipe provided a CER reduction of 2.5%-3.7% and 3.1%-4.6%, respectively, compared to the conventional recipe.
    Abstract The conventional recipe for Automatic Speech Recognition (ASR) models is to 1) train multiple checkpoints on a training set while relying on a validation set to prevent overfitting using early stopping and 2) average several last checkpoints or that of the lowest validation losses to obtain the final model. In this paper, we rethink and update the early stopping and checkpoint averaging from the perspective of the bias-variance tradeoff. Theoretically, the bias and variance represent the fitness and variability of a model and the tradeoff of them determines the overall generalization error. But, it's impractical to evaluate them precisely. As an alternative, we take the training loss and validation loss as proxies of bias and variance and guide the early stopping and checkpoint averaging using their tradeoff, namely an Approximated Bias-Variance Tradeoff (ApproBiVT). When evaluating with advanced ASR models, our recipe provides 2.5%-3.7% and 3.1%-4.6% CER reduction on the AISHELL-1 and AISHELL-2, respectively.
    摘要 传统的自动语音识别(ASR)模型制作流程是:1)在训练集上训练多个Checkpoint,并且使用验证集来防止过拟合,使用早期停止和Checkpoint平均来获得最终模型。在这篇论文中,我们重新思考和更新了早期停止和Checkpoint平均的方法,从偏差-变差质量的角度来考虑。在理论上,偏差和变差代表模型的适应度和多样性,它们之间的质量评价是模型的总泛化误差的关键因素。但是,很难准确地评价它们。因此,我们使用训练损失和验证损失作为偏差和变差的代理,并使用它们之间的质量评价来引导早期停止和Checkpoint平均,即 Approximated Bias-Variance Tradeoff(ApproBiVT)。在使用高级ASR模型进行评估时,我们的制作流程可以提供2.5%-3.7%和3.1%-4.6%的CER减少在AISHELL-1和AISHELL-2上。
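
A minimal sketch of the recipe's shape is given below: a tradeoff score built from the training loss and the train-validation gap drives early stopping, and the parameters of the best-scoring checkpoints are averaged. The equal-weight score and the selection heuristic are assumptions for illustration, not the paper's exact ApproBiVT criterion, and the "checkpoints" are toy NumPy dictionaries rather than real ASR models.

```python
import numpy as np

def tradeoff_score(train_loss, val_loss):
    """Proxy tradeoff: training loss stands in for bias, the train-validation
    gap for variance.  The equal-weight sum is an assumption for illustration."""
    return train_loss + abs(val_loss - train_loss)

def select_and_average(checkpoints, train_losses, val_losses, k=5, patience=3):
    """Stop when the tradeoff score stops improving, then average the
    parameters of the k best-scoring checkpoints seen so far."""
    scores = [tradeoff_score(t, v) for t, v in zip(train_losses, val_losses)]
    best, since_best, stop = float("inf"), 0, len(scores)
    for i, s in enumerate(scores):
        if s < best:
            best, since_best = s, 0
        else:
            since_best += 1
            if since_best >= patience:
                stop = i + 1                        # early stopping point
                break
    chosen = np.argsort(scores[:stop])[:k]
    averaged = {name: np.mean([checkpoints[i][name] for i in chosen], axis=0)
                for name in checkpoints[0]}
    return averaged, chosen

rng = np.random.default_rng(0)
ckpts = [{"w": rng.normal(size=(4, 4))} for _ in range(10)]   # toy checkpoints
train = np.linspace(1.0, 0.2, 10)
val = train + np.linspace(0.0, 0.4, 10)                       # growing generalisation gap
avg, used = select_and_average(ckpts, train, val)
print(used, avg["w"].shape)
```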

A Systematic Exploration of Joint-training for Singing Voice Synthesis

  • paper_url: http://arxiv.org/abs/2308.02867
  • repo_url: None
  • paper_authors: Yuning Wu, Yifeng Yu, Jiatong Shi, Tao Qian, Qin Jin
  • for: Improving the joint training of end-to-end Singing Voice Synthesis (SVS) systems.
  • methods: A systematic investigation of jointly training the acoustic model and the vocoder, rather than optimizing them separately.
  • results: Extensive experiments on multiple datasets show that the joint-training strategy outperforms baselines, achieving more stable performance while increasing the interpretability of the overall framework.
    Abstract There has been a growing interest in using end-to-end acoustic models for singing voice synthesis (SVS). Typically, these models require an additional vocoder to transform the generated acoustic features into the final waveform. However, since the acoustic model and the vocoder are not jointly optimized, a gap can exist between the two models, leading to suboptimal performance. Although a similar problem has been addressed in the TTS systems by joint-training or by replacing acoustic features with a latent representation, adopting corresponding approaches to SVS is not an easy task. How to improve the joint-training of SVS systems has not been well explored. In this paper, we conduct a systematic investigation of how to better perform a joint-training of an acoustic model and a vocoder for SVS. We carry out extensive experiments and demonstrate that our joint-training strategy outperforms baselines, achieving more stable performance across different datasets while also increasing the interpretability of the entire framework.

Bootstrapping Contrastive Learning Enhanced Music Cold-Start Matching

  • paper_url: http://arxiv.org/abs/2308.02844
  • repo_url: None
  • paper_authors: Xinping Zhao, Ying Zhang, Qiang Xiao, Yuming Ren, Yingchun Yang
  • for: Music Cold-Start Matching: given a cold-start song, retrieve songs with similar audiences and quickly push the new song to those audiences to warm it up.
  • methods: During offline training, a Bootstrapping Contrastive Learning (BCL) paradigm adds contrastive regularization to counter the skewed, power-law-distributed supervision signals and learn higher-quality song representations. During online serving, Clustering-based Audience Targeting (CAT) clusters audience representations and locates target audiences by measuring relevance to the cluster centroids (sketched after this entry).
  • results: Extensive offline and online experiments demonstrate the effectiveness and efficiency of the method, which has been deployed on NetEase Cloud Music and affects millions of users.
    Abstract We study a particular matching task we call Music Cold-Start Matching. In short, given a cold-start song request, we expect to retrieve songs with similar audiences and then fastly push the cold-start song to the audiences of the retrieved songs to warm up it. However, there are hardly any studies done on this task. Therefore, in this paper, we will formalize the problem of Music Cold-Start Matching detailedly and give a scheme. During the offline training, we attempt to learn high-quality song representations based on song content features. But, we find supervision signals typically follow power-law distribution causing skewed representation learning. To address this issue, we propose a novel contrastive learning paradigm named Bootstrapping Contrastive Learning (BCL) to enhance the quality of learned representations by exerting contrastive regularization. During the online serving, to locate the target audiences more accurately, we propose Clustering-based Audience Targeting (CAT) that clusters audience representations to acquire a few cluster centroids and then locate the target audiences by measuring the relevance between the audience representations and the cluster centroids. Extensive experiments on the offline dataset and online system demonstrate the effectiveness and efficiency of our method. Currently, we have deployed it on NetEase Cloud Music, affecting millions of users. Code will be released in the future.
    摘要 我们研究了一个特定的匹配任务,称之为音乐冷启始匹配(Music Cold-Start Matching)。简而言之,给定一个冷启始歌曲请求,我们期望检索到与其相似的听众,然后快速推广冷启始歌曲到检索到的听众中,以便让其热身。然而,现有的研究对此任务的研究非常有限。因此,在这篇论文中,我们将Music Cold-Start Matching问题进行详细化,并提出一种方案。在线上训练时,我们尝试学习高质量的歌曲表示,基于歌曲内容特征。然而,我们发现监督信号通常遵循力量分布,导致表示学习受到扭曲的影响。为解决这个问题,我们提出了一种新的对比学习方法,称之为启动对比学习(Bootstrapping Contrastive Learning,BCL),以提高学习的质量。在线上服务时,我们提出了分组听众定向(Clustering-based Audience Targeting,CAT),将听众表示分组为一些集中点,然后通过测量听众表示和集中点之间的相似度来确定目标听众。我们对历史数据集和在线系统进行了广泛的实验,证明了我们的方法的有效性和效率。目前,我们已经将其部署到NetEase Cloud Music上,影响了数百万名用户。代码将在未来发布。
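
The online CAT step can be sketched directly: cluster audience embeddings, score the centroids against the cold-start song's embedding, and target the audiences in the best clusters. The embeddings, cosine similarity, and cluster counts below are illustrative assumptions; the BCL training that would produce the embeddings is not shown.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def target_audiences(audience_emb, song_emb, n_clusters=8, top_clusters=2):
    """Cluster audience embeddings, rank the centroids against the cold-start
    song's embedding by cosine similarity, and return the audience indices
    belonging to the best-matching clusters."""
    centroids, labels = kmeans2(audience_emb, n_clusters, minit="++")
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b) + 1e-9)
    best = np.argsort(cosine(centroids, song_emb))[-top_clusters:]
    return np.flatnonzero(np.isin(labels, best))

rng = np.random.default_rng(0)
audiences = rng.normal(size=(500, 16))      # hypothetical audience embeddings
song = rng.normal(size=16)                  # cold-start song embedding
print(target_audiences(audiences, song)[:10])
```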

Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision

  • paper_url: http://arxiv.org/abs/2308.02774
  • repo_url: https://github.com/alibaba-damo-academy/3D-Speaker
  • paper_authors: Yafeng Chen, Siqi Zheng, Qian Chen
  • for: Training robust, speaker-discriminative verification systems without speaker labels.
  • methods: A Self-Distillation network with Ensemble Prototypes (SDEP) that learns speaker representations in a self-supervised manner, without any labeled data.
  • results: On the VoxCeleb1 evaluation benchmark, SDEP sets a new SOTA with equal error rates of 1.94%, 1.99%, and 3.77% on the Vox1-O, Vox1-E, and Vox1-H trials, respectively, without using any speaker labels during training.
    Abstract Training speaker-discriminative and robust speaker verification systems without speaker labels is still challenging and worthwhile to explore. Previous studies have noted a substantial performance disparity between self-supervised and fully supervised approaches. In this paper, we propose an effective Self-Distillation network with Ensemble Prototypes (SDEP) to facilitate self-supervised speaker representation learning. A range of experiments conducted on the VoxCeleb datasets demonstrate the superiority of the SDEP framework in speaker verification. SDEP achieves a new SOTA on Voxceleb1 speaker verification evaluation benchmark ( i.e., equal error rate 1.94\%, 1.99\%, and 3.77\% for trial Vox1-O, Vox1-E and Vox1-H , respectively), discarding any speaker labels in the training phase. Code will be publicly available at https://github.com/alibaba-damo-academy/3D-Speaker.
    摘要 使用无标签的语音训练说话人识别系统仍然是一项挑战,但也是值得探索的。过去的研究表明自我超vised和完全超vised方法之间存在很大的性能差异。在这篇论文中,我们提出了一种有效的自我蒸馏网络与集成观察者(SDEP),以便无标签语音表征学习。对于VoxCeleb数据集进行了一系列实验, demonstarted SDEP框架在说话人识别中的优越性。SDEP在Voxceleb1说话人识别评价标准(即错误率1.94%、1.99%和3.77%)上达到了新的最佳性能,不使用任何说话人标签在训练阶段。代码将在https://github.com/alibaba-damo-academy/3D-Speaker上公开。Note that Simplified Chinese is used in the translation, as it is the more widely used standard in mainland China. If you prefer Traditional Chinese, I can provide that as well.