cs.CL - 2023-10-25

BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs’ Generation

  • paper_url: http://arxiv.org/abs/2310.17054
  • repo_url: https://github.com/PlusLabNLP/BOOST_EMNLP23
  • paper_authors: Yufei Tian, Felix Zhang, Nanyun Peng
  • for: This work aims to make the generations of large language models (LLMs) more commonsensical.
  • methods: The authors propose a computation-efficient framework that steers a frozen pre-trained language model (PTLM) toward more commonsensical outputs. They first build a reference-free evaluator that assigns a sentence a commonsense score, then use this scorer as a commonsense oracle and extend the NADO controllable-generation method to train an auxiliary head that guides the PTLM to better satisfy the oracle; a small illustrative sketch follows this entry.
  • results: Experiments on a series of GPT-2-, Flan-T5-, and Alpaca-based language models (LMs) show that the method consistently produces the most commonsensical outputs.
    Abstract Large language models (LLMs) such as GPT-3 have demonstrated a strong capability to generate coherent and contextually relevant text. However, amidst their successes, a crucial issue persists: their generated outputs still lack commonsense at times. Moreover, fine-tuning the entire LLM towards more commonsensical outputs is computationally expensive if not infeasible. In this paper, we present a computation-efficient framework that steers a frozen Pre-Trained Language Model (PTLM) towards more commonsensical generation (i.e., producing a plausible output that incorporates a list of concepts in a meaningful way). Specifically, we first construct a reference-free evaluator that assigns a sentence with a commonsensical score by grounding the sentence to a dynamic commonsense knowledge base from four different relational aspects. We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head that guides a fixed PTLM to better satisfy the oracle. We test our framework on a series of GPT-2-, Flan-T5-, and Alpaca-based language models (LMs) on two constrained concept-to-sentence benchmarks. Human evaluation results demonstrate that our method consistently leads to the most commonsensical outputs.
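
To make the steering step concrete, here is a minimal sketch of a NADO-style reweighting of the frozen model's next-token distribution by an auxiliary head's estimate of oracle satisfaction; the auxiliary head is abstracted away and all numbers are invented, so this is a simplification rather than the paper's exact method.

```python
import torch

def nado_style_reweight(base_probs, aux_head_probs, eps=1e-9):
    """Reweight a frozen PTLM's next-token distribution p(x_t | x_<t) by an
    auxiliary head's estimate that choosing each token keeps the output
    satisfiable under the commonsense oracle (a simplified sketch)."""
    reweighted = base_probs * (aux_head_probs + eps)
    return reweighted / reweighted.sum()

# Toy example over a 5-token vocabulary (all numbers are made up).
base = torch.tensor([0.40, 0.30, 0.15, 0.10, 0.05])   # frozen PTLM distribution
aux = torch.tensor([0.20, 0.90, 0.50, 0.10, 0.30])    # auxiliary head estimates
print(nado_style_reweight(base, aux))
```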

Follow-on Question Suggestion via Voice Hints for Voice Assistants

  • paper_url: http://arxiv.org/abs/2310.17034
  • repo_url: None
  • paper_authors: Besnik Fetahu, Pedro Faustini, Giuseppe Castellucci, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi
  • for: This paper aims to provide a solution for suggesting questions with compact and natural voice hints to allow users to ask follow-up questions in voice-based search settings.
  • methods: The authors propose an approach using sequence-to-sequence Transformers to generate spoken hints from a list of questions, and also define a linguistically-motivated pretraining task to improve the quality of the hints.
  • results: The authors evaluate their approach using a new dataset of 6681 input questions and human written hints, and find that their approach is strongly preferred by humans for producing the most natural hints, as compared to a naive approach of concatenating suggested questions.
    Abstract The adoption of voice assistants like Alexa or Siri has grown rapidly, allowing users to instantly access information via voice search. Query suggestion is a standard feature of screen-based search experiences, allowing users to explore additional topics. However, this is not trivial to implement in voice-based settings. To enable this, we tackle the novel task of suggesting questions with compact and natural voice hints to allow users to ask follow-up questions. We define the task, ground it in syntactic theory and outline linguistic desiderata for spoken hints. We propose baselines and an approach using sequence-to-sequence Transformers to generate spoken hints from a list of questions. Using a new dataset of 6681 input questions and human written hints, we evaluated the models with automatic metrics and human evaluation. Results show that a naive approach of concatenating suggested questions creates poor voice hints. Our approach, which applies a linguistically-motivated pretraining task was strongly preferred by humans for producing the most natural hints.

Conditionally Combining Robot Skills using Large Language Models

  • paper_url: http://arxiv.org/abs/2310.17019
  • repo_url: https://github.com/krzentner/language-world
  • paper_authors: K. R. Zentner, Ryan Julian, Brian Ichter, Gaurav S. Sukhatme
  • for: This paper combines two contributions, the first being "Language-World", an extension of the Meta-World benchmark that lets a large language model operate in a simulated robotic environment using semi-structured natural language queries and scripted skills described in natural language.
  • methods: The second contribution is Plan Conditioned Behavioral Cloning (PCBC), which fine-tunes the behavior of high-level plans using end-to-end demonstrations.
  • results: Using Language-World, PCBC achieves strong performance in a variety of few-shot regimes, often achieving task generalization with as little as a single demonstration.
    Abstract This paper combines two contributions. First, we introduce an extension of the Meta-World benchmark, which we call "Language-World," which allows a large language model to operate in a simulated robotic environment using semi-structured natural language queries and scripted skills described using natural language. By using the same set of tasks as Meta-World, Language-World results can be easily compared to Meta-World results, allowing for a point of comparison between recent methods using Large Language Models (LLMs) and those using Deep Reinforcement Learning. Second, we introduce a method we call Plan Conditioned Behavioral Cloning (PCBC), that allows finetuning the behavior of high-level plans using end-to-end demonstrations. Using Language-World, we show that PCBC is able to achieve strong performance in a variety of few-shot regimes, often achieving task generalization with as little as a single demonstration. We have made Language-World available as open-source software at https://github.com/krzentner/language-world/.

Data Augmentation for Emotion Detection in Small Imbalanced Text Data

  • paper_url: http://arxiv.org/abs/2310.17015
  • repo_url: https://github.com/a-koufakou/augemotiondetection
  • paper_authors: Anna Koufakou, Diego Grisales, Ragy Costa de jesus, Oscar Fox
  • for: This work studies the impact of data augmentation techniques on small, imbalanced datasets, in order to improve the performance of NLP models on emotion recognition.
  • methods: Four data augmentation methods (EDA, static and contextual embedding-based, and ProtAugment) are evaluated on three datasets that differ in source, size, emotion categories, and distribution; a small illustrative sketch of one EDA operation follows this entry.
  • results: Training the classifier with augmented data leads to significant improvements in emotion recognition. Two further case studies, paraphrasing text with the popular ChatGPT API and augmenting the training set with external data, also show promising potential.
    Abstract Emotion recognition in text, the task of identifying emotions such as joy or anger, is a challenging problem in NLP with many applications. One of the challenges is the shortage of available datasets that have been annotated with emotions. Certain existing datasets are small, follow different emotion taxonomies and display imbalance in their emotion distribution. In this work, we studied the impact of data augmentation techniques precisely when applied to small imbalanced datasets, for which current state-of-the-art models (such as RoBERTa) under-perform. Specifically, we utilized four data augmentation methods (Easy Data Augmentation EDA, static and contextual Embedding-based, and ProtAugment) on three datasets that come from different sources and vary in size, emotion categories and distributions. Our experimental results show that using the augmented data when training the classifier model leads to significant improvements. Finally, we conducted two case studies: a) directly using the popular chat-GPT API to paraphrase text using different prompts, and b) using external data to augment the training set. Results show the promising potential of these methods.
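
Of the augmentation methods listed, EDA is the easiest to illustrate; the sketch below implements only its synonym-replacement operation with WordNet. The replacement rate, word filtering, and whitespace tokenization are simplifying assumptions, not the paper's setup.

```python
# Sketch of EDA-style synonym replacement (one of EDA's four operations).
import random
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

def synonym_replacement(sentence, n_replacements=2, seed=0):
    random.seed(seed)
    words = sentence.split()
    # Only consider longer words that actually have WordNet synsets.
    candidates = [i for i, w in enumerate(words) if len(w) > 3 and wordnet.synsets(w)]
    random.shuffle(candidates)
    for i in candidates[:n_replacements]:
        lemmas = {l.name().replace("_", " ")
                  for s in wordnet.synsets(words[i]) for l in s.lemmas()}
        lemmas.discard(words[i])
        if lemmas:
            words[i] = random.choice(sorted(lemmas))
    return " ".join(words)

print(synonym_replacement("I am so happy about this wonderful surprise"))
```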

Quality > Quantity: Synthetic Corpora from Foundation Models for Closed-Domain Extractive Question Answering

  • paper_url: http://arxiv.org/abs/2310.16995
  • repo_url: https://github.com/saptarshi059/cdqa-v1-targetted-pretraining
  • paper_authors: Saptarshi Sengupta, Connor Heaton, Shreya Ghosh, Preslav Nakov, Prasenjit Mitra
  • for: This work aims to improve closed-domain extractive question answering by pre-training models on data targeted to the domain of interest.
  • methods: The authors propose "targeted pre-training": determining and generating domain-relevant data to further pre-train the model, using Galactica to generate synthetic, "targeted" corpora that align with specific writing styles and topics.
  • results: Experiments on two biomedical extractive question answering datasets achieve a new benchmark on COVID-QA and overall improvements on RadQA.
    Abstract Domain adaptation, the process of training a model in one domain and applying it to another, has been extensively explored in machine learning. While training a domain-specific foundation model (FM) from scratch is an option, recent methods have focused on adapting pre-trained FMs for domain-specific tasks. However, our experiments reveal that either approach does not consistently achieve state-of-the-art (SOTA) results in the target domain. In this work, we study extractive question answering within closed domains and introduce the concept of targeted pre-training. This involves determining and generating relevant data to further pre-train our models, as opposed to the conventional philosophy of utilizing domain-specific FMs trained on a wide range of data. Our proposed framework uses Galactica to generate synthetic, ``targeted'' corpora that align with specific writing styles and topics, such as research papers and radiology reports. This process can be viewed as a form of knowledge distillation. We apply our method to two biomedical extractive question answering datasets, COVID-QA and RadQA, achieving a new benchmark on the former and demonstrating overall improvements on the latter. Code available at https://github.com/saptarshi059/CDQA-v1-Targetted-PreTraining/tree/main.

How well can machine-generated texts be identified and can language models be trained to avoid identification?

  • paper_url: http://arxiv.org/abs/2310.16992
  • repo_url: None
  • paper_authors: Sinclair Schneider, Florian Steuber, Joao A. G. Schneider, Gabi Dreo Rodosek
  • for: This work studies how well human-written text can be distinguished from machine-generated text.
  • methods: Five separate language models were refined to generate synthetic tweets, and shallow learning classifiers such as Naive Bayes were found to achieve detection accuracies between 0.6 and 0.8; a small illustrative sketch of such a detector follows this entry.
  • results: Human and machine detection diverge noticeably when texts are generated with higher temperature values, while transformer-based classifiers reach accuracies of 0.9 and above. Moreover, refining the generative models with a reinforcement learning approach successfully evades BERT-based detectors, pushing their detection accuracy down to 0.15 or less.
    Abstract With the rise of generative pre-trained transformer models such as GPT-3, GPT-NeoX, or OPT, distinguishing human-generated texts from machine-generated ones has become important. We refined five separate language models to generate synthetic tweets, uncovering that shallow learning classification algorithms, like Naive Bayes, achieve detection accuracy between 0.6 and 0.8. Shallow learning classifiers differ from human-based detection, especially when using higher temperature values during text generation, resulting in a lower detection rate. Humans prioritize linguistic acceptability, which tends to be higher at lower temperature values. In contrast, transformer-based classifiers have an accuracy of 0.9 and above. We found that using a reinforcement learning approach to refine our generative models can successfully evade BERT-based classifiers with a detection accuracy of 0.15 or less.
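
The shallow-learning baseline mentioned above can be sketched in a few lines with character n-gram features and Naive Bayes; the two-tweet "dataset" and the n-gram range are illustrative placeholders, not the paper's data or hyperparameters.

```python
# Sketch of a shallow machine-text detector: character n-gram TF-IDF + Naive Bayes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["just watched the sunset, unreal colours tonight",
         "As an AI language model, I find sunsets to be visually pleasing."]
labels = [0, 1]  # 0 = human-written, 1 = machine-generated

detector = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    MultinomialNB(),
)
detector.fit(texts, labels)
print(detector.predict(["the colours tonight are unreal"]))
```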

Understanding Social Structures from Contemporary Literary Fiction using Character Interaction Graph – Half Century Chronology of Influential Bengali Writers

  • paper_url: http://arxiv.org/abs/2310.16968
  • repo_url: None
  • paper_authors: Nafis Irtiza Tripto, Mohammed Eunus Ali
  • for: This study examines how social structures and real-world incidents influence contemporary literary fiction, using text analysis to explain these phenomena.
  • methods: Natural language processing (NLP) techniques, including sentiment analysis, narrative summarization, and topic modeling, are combined with visualization through character interaction graphs to analyze fiction and retrieve information from it; a small illustrative sketch follows this entry.
  • results: Character interaction graphs (networks) prove highly effective for specific assessments and information retrieval from literary fiction, demonstrated on influential Bengali fiction spanning half a century.
    Abstract Social structures and real-world incidents often influence contemporary literary fiction. Existing research in literary fiction analysis explains these real-world phenomena through the manual critical analysis of stories. Conventional Natural Language Processing (NLP) methodologies, including sentiment analysis, narrative summarization, and topic modeling, have demonstrated substantial efficacy in analyzing and identifying similarities within fictional works. However, the intricate dynamics of character interactions within fiction necessitate a more nuanced approach that incorporates visualization techniques. Character interaction graphs (or networks) emerge as a highly suitable means for visualization and information retrieval from the realm of fiction. Therefore, we leverage character interaction graphs with NLP-derived features to explore a diverse spectrum of societal inquiries about contemporary culture's impact on the landscape of literary fiction. Our study involves constructing character interaction graphs from fiction, extracting relevant graph features, and exploiting these features to resolve various real-life queries. Experimental evaluation of influential Bengali fiction over half a century demonstrates that character interaction graphs can be highly effective in specific assessments and information retrieval from literary fiction. Our data and codebase are available at https://cutt.ly/fbMgGEM
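
As a minimal illustration of the character-interaction-graph idea, the sketch below builds a sentence-level co-occurrence graph with networkx and extracts a few standard graph features; the character list, the toy sentences, and the co-occurrence window are simplifying assumptions rather than the paper's extraction pipeline.

```python
import itertools
import networkx as nx

def build_interaction_graph(sentences, characters):
    """Add an edge (weighted by frequency) between characters mentioned in the
    same sentence -- a crude stand-in for the paper's interaction extraction."""
    g = nx.Graph()
    g.add_nodes_from(characters)
    for sent in sentences:
        present = [c for c in characters if c in sent]
        for a, b in itertools.combinations(present, 2):
            w = g.get_edge_data(a, b, {"weight": 0})["weight"]
            g.add_edge(a, b, weight=w + 1)
    return g

sentences = [
    "Apu met Rina at the station.",
    "Rina wrote to Apu about Haran.",
    "Haran argued with Apu.",
]
g = build_interaction_graph(sentences, ["Apu", "Rina", "Haran"])
# Simple graph features of the kind used for downstream social-structure queries.
print(nx.degree_centrality(g))
print(nx.density(g))
```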

Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation

  • paper_url: http://arxiv.org/abs/2310.16964
  • repo_url: https://github.com/langus0/critic-aware-decoding
  • paper_authors: Mateusz Lango, Ondřej Dušek
  • for: Mitigating hallucinations in neural data-to-text generation
  • methods: Combining the probabilistic output of a generator language model with the output of a special "text critic" classifier that assesses how well the generated text matches the input data; a small illustrative sketch follows this entry
  • results: Improved performance on WebNLG and OpenDialKG benchmarks
    Abstract Hallucination of text ungrounded in the input is a well-known problem in neural data-to-text generation. Many methods have been proposed to mitigate it, but they typically require altering model architecture or collecting additional data, and thus cannot be easily applied to an existing model. In this paper, we explore a new way to mitigate hallucinations by combining the probabilistic output of a generator language model (LM) with the output of a special "text critic" classifier, which guides the generation by assessing the match between the input data and the text generated so far. Our method does not need any changes to the underlying LM's architecture or training procedure and can thus be combined with any model and decoding operating on word probabilities. The critic does not need any additional training data, using the base LM's training data and synthetic negative examples. Our experimental results show that our method improves over the baseline on the WebNLG and OpenDialKG benchmarks.
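
The abstract describes guiding decoding by mixing the generator's next-token probabilities with a critic's judgment of how well the text so far matches the input data. The sketch below shows that combination for a single greedy step; the critic interface, the additive weighting, and the dummy critic are illustrative assumptions rather than the paper's exact algorithm.

```python
import torch

def critic_guided_step(lm_log_probs, critic_score_fn, prefix_ids, top_k=10, weight=1.0):
    """One greedy decoding step that rescores the generator's top candidates with
    a text-critic score (a sketch; the paper's exact combination differs in detail).

    lm_log_probs: 1-D tensor of next-token log-probabilities from the generator.
    critic_score_fn: callable(candidate_ids) -> log-score that the text matches
        the input data.
    prefix_ids: list of token ids generated so far.
    """
    top = torch.topk(lm_log_probs, top_k)
    best_token, best_score = None, float("-inf")
    for log_p, token_id in zip(top.values.tolist(), top.indices.tolist()):
        score = log_p + weight * critic_score_fn(prefix_ids + [token_id])
        if score > best_score:
            best_token, best_score = token_id, score
    return best_token

# Toy usage with random "generator" log-probs and a dummy critic that penalizes
# one hypothetical token id (42) as unsupported by the input data.
dummy_critic = lambda ids: -5.0 if ids[-1] == 42 else 0.0
print(critic_guided_step(torch.randn(100), dummy_critic, prefix_ids=[0, 7]))
```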

Muslim-Violence Bias Persists in Debiased GPT Models

  • paper_url: http://arxiv.org/abs/2310.18368
  • repo_url: None
  • paper_authors: Babak Hemmatian, Razan Baltaji, Lav R. Varshney
  • for: This study investigates whether GPT-3 language models are biased against Muslims and how different prompt formats affect that bias.
  • methods: Two pre-registered replication experiments were run, one with GPT-3 and one with ChatGPT, using different prompt formats to probe for bias.
  • results: Prompts about Muslims elicit more violent completions than prompts about other religions, and using common names associated with the religions increases the rate of violent completions several-fold, revealing a second-order bias. Content analysis uncovers religion-specific violent themes with highly offensive ideas, and the ChatGPT replications show that the bias persists despite continued de-biasing during model development.
    Abstract Abid et al. (2021) showed a tendency in GPT-3 to generate violent completions when prompted about Muslims, compared with other religions. Two pre-registered replication attempts found few violent completions and only the weakest anti-Muslim bias in the Instruct version, fine-tuned to eliminate biased and toxic outputs. However, more pre-registered experiments showed that using common names associated with the religions in prompts increases several-fold the rate of violent completions, revealing a highly significant second-order bias against Muslims. Our content analysis revealed religion-specific violent themes containing highly offensive ideas regardless of prompt format. Replications with ChatGPT suggest that any effects of GPT-3's de-biasing have disappeared with continued model development, as this newer model showed both a strong Muslim-violence bias and rates of violent completions closer to Abid et al. (2021). Our results show the need for continual de-biasing of models in ways that address higher-order associations.

Zephyr: Direct Distillation of LM Alignment

  • paper_url: http://arxiv.org/abs/2310.16944
  • repo_url: https://github.com/huggingface/alignment-handbook
  • paper_authors: Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf
  • for: The goal of this paper is to improve the intent alignment of chat models.
  • methods: The paper applies distilled supervised fine-tuning (dSFT) and then distilled direct preference optimization (dDPO) on preference data from AI Feedback (AIF) to learn an efficient chat model; a small illustrative sketch of the DPO objective follows this entry.
  • results: The resulting model, Zephyr-7B, built on a 7B-parameter base, sets a new state of the art on chat benchmarks without requiring human annotation.
    Abstract We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e. they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment. The approach requires only a few hours of training without any additional sampling during fine-tuning. The final result, Zephyr-7B, sets the state-of-the-art on chat benchmarks for 7B parameter models, and requires no human annotation. In particular, results on MT-Bench show that Zephyr-7B surpasses Llama2-Chat-70B, the best open-access RLHF-based model. Code, models, data, and tutorials for the system are available at https://github.com/huggingface/alignment-handbook.
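
The dDPO step at the core of Zephyr optimizes a direct-preference objective over AI-ranked (chosen, rejected) pairs; below is a minimal sketch of the standard DPO loss on precomputed sequence log-probabilities. The beta value and the toy numbers are illustrative, and real training uses full policy and reference models rather than scalar log-probs.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss on sequence log-probabilities (a sketch; Zephyr's dDPO
    applies this with AI-feedback preference pairs and a dSFT reference model)."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy call with made-up log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-15.0, -11.0]),
                torch.tensor([-13.0, -10.0]), torch.tensor([-14.0, -10.5]))
print(loss.item())
```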

Learning Transfers over Several Programming Languages

  • paper_url: http://arxiv.org/abs/2310.16937
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Razan Baltaji, Saurabh Pujar, Louis Mandel, Martin Hirzel, Luca Buratti, Lav Varshney
  • for: This paper investigates the feasibility and effectiveness of cross-lingual transfer learning across programming languages.
  • methods: Extensive experiments with a transformer-based large language model cover four tasks and 11 to 41 programming languages, asking how well cross-lingual transfer works across language pairs, how to choose the best source language for a given task and target language, which characteristics of a language pair predict transfer performance, and how these answers depend on the task.
  • results: Cross-lingual transfer yields effective gains across tasks, the gains depend on choosing a suitable source language, and the study identifies characteristics of language pairs that predict transfer performance and can guide source-language selection.
    Abstract Large language models (LLMs) have recently become remarkably good at improving developer productivity for high-resource programming languages. These models use two kinds of data: large amounts of unlabeled code samples for pretraining and relatively smaller amounts of labeled code samples for fine-tuning or in-context learning. Unfortunately, many programming languages are low-resource, lacking labeled samples for most tasks and often even lacking unlabeled samples. Therefore, users of low-resource languages (e.g., legacy or new languages) miss out on the benefits of LLMs. Cross-lingual transfer learning uses data from a source language to improve model performance on a target language. It has been well-studied for natural languages, but has received little attention for programming languages. This paper reports extensive experiments on four tasks using a transformer-based LLM and 11 to 41 programming languages to explore the following questions. First, how well cross-lingual transfer works for a given task across different language pairs. Second, given a task and target language, how to best choose a source language. Third, the characteristics of a language pair that are predictive of transfer performance, and fourth, how that depends on the given task.

Physician Detection of Clinical Harm in Machine Translation: Quality Estimation Aids in Reliance and Backtranslation Identifies Critical Errors

  • paper_url: http://arxiv.org/abs/2310.16924
  • repo_url: https://github.com/n-mehandru/physicianqe
  • paper_authors: Nikita Mehandru, Sweta Agrawal, Yimin Xiao, Elaine C Khoong, Ge Gao, Marine Carpuat, Niloufar Salehi
  • for: This paper aims to make machine translation (MT) more reliable in practice, in particular by helping users make informed decisions about when to rely on MT outputs.
  • methods: Quality estimation techniques that automatically assess MT quality are evaluated in vivo, in a human study simulating decision-making in a high-stakes medical setting.
  • results: Interventions based on quality estimation improve appropriate reliance on MT outputs, but backtranslation helps physicians detect more clinically harmful errors that quality estimation alone often misses.
    Abstract A major challenge in the practical use of Machine Translation (MT) is that users lack guidance to make informed decisions about when to rely on outputs. Progress in quality estimation research provides techniques to automatically assess MT quality, but these techniques have primarily been evaluated in vitro by comparison against human judgments outside of a specific context of use. This paper evaluates quality estimation feedback in vivo with a human study simulating decision-making in high-stakes medical settings. Using Emergency Department discharge instructions, we study how interventions based on quality estimation versus backtranslation assist physicians in deciding whether to show MT outputs to a patient. We find that quality estimation improves appropriate reliance on MT, but backtranslation helps physicians detect more clinically harmful errors that QE alone often misses.

Divide et Impera: Multi-Transformer Architectures for Complex NLP-Tasks

  • paper_url: http://arxiv.org/abs/2310.16897
  • repo_url: None
  • paper_authors: Solveig Helland, Elena Gavagnin, Alexandre de Spindler
  • for: Solving complex NLP tasks, such as reducing gender bias.
  • methods: A complex task is divided into simpler subtasks; multiple transformer models are each fine-tuned on one subtask and lined up to accomplish the complex task, which simplifies the compilation of fine-tuning datasets and improves controllability.
  • results: On the gender-bias reduction example, fine-tuning multiple models in this way performs better than using a single model.
    Abstract The growing capabilities of transformer models pave the way for solving increasingly complex NLP tasks. A key to supporting application-specific requirements is the ability to fine-tune. However, compiling a fine-tuning dataset tailored to complex tasks is tedious and results in large datasets, limiting the ability to control transformer output. We present an approach in which complex tasks are divided into simpler subtasks. Multiple transformer models are fine-tuned to one subtask each, and lined up to accomplish the complex task. This simplifies the compilation of fine-tuning datasets and increases overall controllability. Using the example of reducing gender bias as a complex task, we demonstrate our approach and show that it performs better than using a single model.

Discrete Diffusion Language Modeling by Estimating the Ratios of the Data Distribution

  • paper_url: http://arxiv.org/abs/2310.16834
  • repo_url: None
  • paper_authors: Aaron Lou, Chenlin Meng, Stefano Ermon
  • for: This paper aims to improve the performance of diffusion models on discrete data domains, such as natural language, by proposing a novel discrete score matching loss called score entropy.
  • methods: The proposed method, Score Entropy Discrete Diffusion (SEDD), uses a denoising variant of the score entropy loss to efficiently optimize the model for maximum likelihood training.
  • results: SEDD achieves highly competitive likelihoods compared to the baseline GPT-2 model and offers several algorithmic advantages, such as learning a more faithful sequence distribution, trading off compute for generation quality, and enabling arbitrary infilling beyond standard left-to-right prompting.
    Abstract Despite their groundbreaking performance for many generative modeling tasks, diffusion models have fallen short on discrete data domains such as natural language. Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing score entropy, a novel discrete score matching loss that is more stable than existing methods, forms an ELBO for maximum likelihood training, and can be efficiently optimized with a denoising variant. We scale our Score Entropy Discrete Diffusion models (SEDD) to the experimental setting of GPT-2, achieving highly competitive likelihoods while also introducing distinct algorithmic advantages. In particular, when comparing similarly sized SEDD and GPT-2 models, SEDD attains comparable perplexities (normally within $+10\%$ of and sometimes outperforming the baseline). Furthermore, SEDD models learn a more faithful sequence distribution (around $4\times$ better compared to GPT-2 models with ancestral sampling as measured by large models), can trade off compute for generation quality (needing only $16\times$ fewer network evaluations to match GPT-2), and enables arbitrary infilling beyond the standard left to right prompting.

Language Agnostic Code Embeddings

  • paper_url: http://arxiv.org/abs/2310.16803
  • repo_url: https://github.com/snknitin/Multilingual-Embeddings-using-ACS-for-Cross-lingual-NLP
  • paper_authors: Saiteja Utpala, Alex Gu, Pin Yu Chen
  • for: This work studies the code embeddings of multilingual code models, focusing on their cross-lingual capabilities across programming languages.
  • methods: Probing experiments show that code embeddings comprise two distinct components: one deeply tied to the nuances and syntax of a specific language, and another that is agnostic to these details and primarily captures semantics.
  • results: Isolating and removing the language-specific component yields significant improvements in downstream code retrieval, with an absolute gain of up to +17 in Mean Reciprocal Rank (MRR).
    Abstract Recently, code language models have achieved notable advancements in addressing a diverse array of essential code comprehension and generation tasks. Yet, the field lacks a comprehensive deep dive and understanding of the code embeddings of multilingual code models. In this paper, we present a comprehensive study on multilingual code embeddings, focusing on the cross-lingual capabilities of these embeddings across different programming languages. Through probing experiments, we demonstrate that code embeddings comprise two distinct components: one deeply tied to the nuances and syntax of a specific language, and the other remaining agnostic to these details, primarily focusing on semantics. Further, we show that when we isolate and eliminate this language-specific component, we witness significant improvements in downstream code retrieval tasks, leading to an absolute increase of up to +17 in the Mean Reciprocal Rank (MRR).

Detecting Pretraining Data from Large Language Models

  • paper_url: http://arxiv.org/abs/2310.16789
  • repo_url: https://github.com/swj0419/detect-pretrain-code
  • paper_authors: Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer
  • for: This paper studies the pretraining data detection problem for large language models (LLMs): given a piece of text and black-box access to an LLM, determine whether the model was trained on that text.
  • methods: It introduces a new detection method, Min-K% Prob, based on a simple hypothesis: an unseen example is likely to contain a few outlier words with low probabilities under the LLM, whereas a seen example is less likely to contain such low-probability words. Unlike previous detection methods, Min-K% Prob requires no knowledge of the pretraining corpus and no additional training; a small illustrative sketch follows this entry.
  • results: Min-K% Prob improves over previous methods by 7.4% on the WIKIMIA benchmark and proves consistently effective in real-world applications such as detecting copyrighted books and contaminated downstream examples.
    Abstract Although large language models (LLMs) are widely deployed, the data used to train them is rarely disclosed. Given the incredible scale of this data, up to trillions of tokens, it is all but certain that it includes potentially problematic text such as copyrighted materials, personally identifiable information, and test data for widely reported reference benchmarks. However, we currently have no way to know which data of these types is included or in what proportions. In this paper, we study the pretraining data detection problem: given a piece of text and black-box access to an LLM without knowing the pretraining data, can we determine if the model was trained on the provided text? To facilitate this study, we introduce a dynamic benchmark WIKIMIA that uses data created before and after model training to support gold truth detection. We also introduce a new detection method Min-K% Prob based on a simple hypothesis: an unseen example is likely to contain a few outlier words with low probabilities under the LLM, while a seen example is less likely to have words with such low probabilities. Min-K% Prob can be applied without any knowledge about the pretraining corpus or any additional training, departing from previous detection methods that require training a reference model on data that is similar to the pretraining data. Moreover, our experiments demonstrate that Min-K% Prob achieves a 7.4% improvement on WIKIMIA over these previous methods. We apply Min-K% Prob to two real-world scenarios, copyrighted book detection, and contaminated downstream example detection, and find it a consistently effective solution.
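
As a rough illustration of the Min-K% Prob idea described above, the sketch below scores a text by averaging the log-probabilities of its k% least likely tokens under a causal LM; the GPT-2 scoring model, the 20% threshold, and the lack of a calibrated decision threshold are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def min_k_percent_prob(text, model, tokenizer, k=0.2):
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc.input_ids
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-probability of each actual token given its preceding context.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    token_log_probs = log_probs.gather(
        2, input_ids[:, 1:].unsqueeze(-1)
    ).squeeze(-1).squeeze(0)
    # Average over the k% lowest-probability tokens.
    n = max(1, int(len(token_log_probs) * k))
    lowest = torch.topk(token_log_probs, n, largest=False).values
    return lowest.mean().item()

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("gpt2")
    lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    score = min_k_percent_prob("The quick brown fox jumps over the lazy dog.", lm, tok)
    # Higher (less negative) scores suggest the text is more likely to have been
    # seen in pretraining; a threshold would be tuned on a benchmark like WIKIMIA.
    print(score)
```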

Kiki or Bouba? Sound Symbolism in Vision-and-Language Models

  • paper_url: http://arxiv.org/abs/2310.16781
  • repo_url: None
  • paper_authors: Morris Alper, Hadar Averbuch-Elor
  • for: This study asks whether sound symbolism is reflected in vision-and-language models such as CLIP and Stable Diffusion.
  • methods: Zero-shot knowledge probing is used to investigate the knowledge inherent in these models; a small illustrative sketch follows this entry.
  • results: The models show strong evidence of sound symbolism, paralleling the well-known kiki-bouba effect in psycholinguistics, providing a computational method for demonstrating and understanding its nature.
    Abstract Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism. Among the many dimensions of meaning, sound symbolism is particularly salient and well-demonstrated with regards to cross-modal associations between language and the visual domain. In this work, we address the question of whether sound symbolism is reflected in vision-and-language models such as CLIP and Stable Diffusion. Using zero-shot knowledge probing to investigate the inherent knowledge of these models, we find strong evidence that they do show this pattern, paralleling the well-known kiki-bouba effect in psycholinguistics. Our work provides a novel method for demonstrating sound symbolism and understanding its nature using computational tools. Our code will be made publicly available.
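
A zero-shot probe in the spirit described above can be sketched with CLIP: compare how strongly pseudo-words like "kiki" and "bouba" associate with images of spiky versus round shapes. The prompt templates and the local image files are assumptions for illustration, not the paper's stimuli.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

texts = ["an object called kiki", "an object called bouba"]
images = [Image.open("spiky_shape.png"), Image.open("round_shape.png")]  # placeholder files

inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# logits_per_image[i, j] ~ similarity between image i and text j; sound symbolism
# predicts the spiky image aligns with "kiki" and the round one with "bouba".
print(out.logits_per_image.softmax(dim=-1))
```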

IntenDD: A Unified Contrastive Learning Approach for Intent Detection and Discovery

  • paper_url: http://arxiv.org/abs/2310.16761
  • repo_url: None
  • paper_authors: Bhavuk Singhal, Ashim Gupta, Shivasankaran V P, Amrith Krishna
  • for: This paper addresses intent detection and discovery in task-oriented dialogue systems, covering multiclass and multilabel classification as well as the discovery of new intents.
  • methods: The proposed approach, IntenDD, uses a shared utterance-encoding backbone and an entirely unsupervised contrastive learning strategy in which pseudo-labels for unlabeled utterances are generated from their lexical features. For the classification tasks, a two-step post-processing setup based on modified adsorption first propagates residuals in the training data and then smooths the labels, both in a transductive setting.
  • results: Extensive evaluations on benchmark datasets show that IntenDD consistently outperforms competitive baselines across all three tasks, with average improvements of 2.32%, 1.26%, and 1.52% on few-shot multiclass, few-shot multilabel, and intent discovery, respectively.
    Abstract Identifying intents from dialogue utterances forms an integral component of task-oriented dialogue systems. Intent-related tasks are typically formulated either as a classification task, where the utterances are classified into predefined categories or as a clustering task when new and previously unknown intent categories need to be discovered from these utterances. Further, the intent classification may be modeled in a multiclass (MC) or multilabel (ML) setup. While typically these tasks are modeled as separate tasks, we propose IntenDD, a unified approach leveraging a shared utterance encoding backbone. IntenDD uses an entirely unsupervised contrastive learning strategy for representation learning, where pseudo-labels for the unlabeled utterances are generated based on their lexical features. Additionally, we introduce a two-step post-processing setup for the classification tasks using modified adsorption. Here, first, the residuals in the training data are propagated followed by smoothing the labels both modeled in a transductive setting. Through extensive evaluations on various benchmark datasets, we find that our approach consistently outperforms competitive baselines across all three tasks. On average, IntenDD reports percentage improvements of 2.32%, 1.26%, and 1.52% in their respective metrics for few-shot MC, few-shot ML, and the intent discovery tasks respectively.

DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages

  • paper_url: http://arxiv.org/abs/2310.16749
  • repo_url: https://github.com/vineet2104/disco
  • paper_authors: Vineet Bhat, Preethi Jyothi, Pushpak Bhattacharyya
  • for: This work provides a large, high-quality human-annotated corpus for disfluency correction (DC) to enable multilingual DC research.
  • methods: State-of-the-art DC models are analyzed on four important Indo-European languages: English, Hindi, German, and French.
  • results: The models achieve F1 scores of 97.55 (English), 94.29 (Hindi), 95.89 (German), and 92.97 (French), and DC improves downstream machine translation by 5.65 BLEU points on average.
    Abstract Disfluency correction (DC) is the process of removing disfluent elements like fillers, repetitions and corrections from spoken utterances to create readable and interpretable text. DC is a vital post-processing step applied to Automatic Speech Recognition (ASR) outputs, before subsequent processing by downstream language understanding tasks. Existing DC research has primarily focused on English due to the unavailability of large-scale open-source datasets. Towards the goal of multilingual disfluency correction, we present a high-quality human-annotated DC corpus covering four important Indo-European languages: English, Hindi, German and French. We provide extensive analysis of results of state-of-the-art DC models across all four languages obtaining F1 scores of 97.55 (English), 94.29 (Hindi), 95.89 (German) and 92.97 (French). To demonstrate the benefits of DC on downstream tasks, we show that DC leads to 5.65 points increase in BLEU scores on average when used in conjunction with a state-of-the-art Machine Translation (MT) system. We release code to run our experiments along with our annotated dataset here.

HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis

  • paper_url: http://arxiv.org/abs/2310.16746
  • repo_url: None
  • paper_authors: Nafis Irtiza Tripto, Adaku Uchendu, Thai Le, Mattia Setzu, Fosca Giannotti, Dongwon Lee
  • for: This work aims to advance authorship analysis for spoken text, covering both human-spoken and AI-generated speech.
  • methods: Existing speech datasets with transcripts are meticulously curated, and new AI-generated spoken-text datasets are created with three prominent large language models: ChatGPT, PaLM2, and Vicuna13B.
  • results: The resulting HANSEN benchmark (17 human datasets plus AI-generated spoken texts) is used for authorship attribution, author verification, and human vs. AI spoken-text detection, establishing baselines and showing that AI-generated spoken-text detection still leaves much room for improvement.
    Abstract Authorship Analysis, also known as stylometry, has been an essential aspect of Natural Language Processing (NLP) for a long time. Likewise, the recent advancement of Large Language Models (LLMs) has made authorship analysis increasingly crucial for distinguishing between human-written and AI-generated texts. However, these authorship analysis tasks have primarily been focused on written texts, not considering spoken texts. Thus, we introduce the largest benchmark for spoken texts - HANSEN (Human ANd ai Spoken tExt beNchmark). HANSEN encompasses meticulous curation of existing speech datasets accompanied by transcripts, alongside the creation of novel AI-generated spoken text datasets. Together, it comprises 17 human datasets, and AI-generated spoken texts created using 3 prominent LLMs: ChatGPT, PaLM2, and Vicuna13B. To evaluate and demonstrate the utility of HANSEN, we perform Authorship Attribution (AA) & Author Verification (AV) on human-spoken datasets and conducted Human vs. AI spoken text detection using state-of-the-art (SOTA) models. While SOTA methods, such as, character ngram or Transformer-based model, exhibit similar AA & AV performance in human-spoken datasets compared to written ones, there is much room for improvement in AI-generated spoken text detection. The HANSEN benchmark is available at: https://huggingface.co/datasets/HANSEN-REPO/HANSEN.

Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation

  • paper_url: http://arxiv.org/abs/2310.16738
  • repo_url: https://github.com/wangxieric/bias-crs
  • paper_authors: Xi Wang, Hossein A. Rahmani, Jiqun Liu, Emine Yilmaz
  • for: This paper proposes two new data augmentation strategies to address biases in conversational recommendation.
  • methods: Drawing inspiration from the success of generative data via language models and data augmentation techniques, the authors introduce two strategies, 'Once-Aug' and 'PopNudge', to enhance model performance while mitigating biases.
  • results: Extensive experiments on the ReDial and TG-ReDial datasets show consistent improvements for CRS techniques with the proposed augmentation and provide additional insights on addressing several newly formulated biases.
    Abstract Conversational Recommendation System (CRS) is a rapidly growing research area that has gained significant attention alongside advancements in language modelling techniques. However, the current state of conversational recommendation faces numerous challenges due to its relative novelty and limited existing contributions. In this study, we delve into benchmark datasets for developing CRS models and address potential biases arising from the feedback loop inherent in multi-turn interactions, including selection bias and multiple popularity bias variants. Drawing inspiration from the success of generative data via using language models and data augmentation techniques, we present two novel strategies, 'Once-Aug' and 'PopNudge', to enhance model performance while mitigating biases. Through extensive experiments on ReDial and TG-ReDial benchmark datasets, we show a consistent improvement of CRS techniques with our data augmentation approaches and offer additional insights on addressing multiple newly formulated biases.

Disentangling Extraction and Reasoning in Multi-hop Spatial Reasoning

  • paper_url: http://arxiv.org/abs/2310.16731
  • repo_url: None
  • paper_authors: Roshanak Mirzaee, Parisa Kordjamshidi
  • for: This paper investigates the challenge of spatial reasoning over text and whether disentangling information extraction from reasoning helps address it.
  • methods: Several models that disentangle extraction and reasoning (either symbolic or neural) are designed and compared against state-of-the-art baselines with no explicit design for these parts.
  • results: Experimental results consistently demonstrate the efficacy of disentangling, which improves models' generalizability within realistic data domains.
    Abstract Spatial reasoning over text is challenging as the models not only need to extract the direct spatial information from the text but also reason over those and infer implicit spatial relations. Recent studies highlight the struggles even large language models encounter when it comes to performing spatial reasoning over text. In this paper, we explore the potential benefits of disentangling the processes of information extraction and reasoning in models to address this challenge. To explore this, we design various models that disentangle extraction and reasoning(either symbolic or neural) and compare them with state-of-the-art(SOTA) baselines with no explicit design for these parts. Our experimental results consistently demonstrate the efficacy of disentangling, showcasing its ability to enhance models' generalizability within realistic data domains.

  • paper_url: http://arxiv.org/abs/2310.16712
  • repo_url: None
  • paper_authors: Ganesh Jawahar, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Dujian Ding
  • for: This paper uses large language models (LLMs) to build performance predictors (PP): models that, given a specific deep neural network architecture, predict its performance on a downstream task.
  • methods: PP prompts for the LLM consist of a role description, a set of instructions, definitions of each architecture-specific hyperparameter, and demonstration architectures with their efficiency metrics and trained-from-scratch performance (a small illustrative sketch follows this entry). The predictions of the resulting LLM-PP are further distilled into a small regression model, LLM-Distill-PP, for cost-effective performance estimation.
  • results: On machine translation (MT) tasks, GPT-4 with these prompts predicts architecture performance with a mean absolute error matching state-of-the-art performance predictors. A Hybrid-Search algorithm for neural architecture search (HS-NAS) uses LLM-Distill-PP for the initial part of the search and the baseline predictor for the rest, performing on par with state-of-the-art NAS across benchmarks while reducing search time by roughly 50% and, in some cases, improving latency, GFLOPs, and model size.
    Abstract Large language models (LLMs) have become an integral component in solving a wide range of NLP tasks. In this work, we explore a novel use case of using LLMs to build performance predictors (PP): models that, given a specific deep neural network architecture, predict its performance on a downstream task. We design PP prompts for LLMs consisting of: (i) role: description of the role assigned to the LLM, (ii) instructions: set of instructions to be followed by the LLM to carry out performance prediction, (iii) hyperparameters: a definition of each architecture-specific hyperparameter and (iv) demonstrations: sample architectures along with their efficiency metrics and 'training from scratch' performance. For machine translation (MT) tasks, we discover that GPT-4 with our PP prompts (LLM-PP) can predict the performance of architecture with a mean absolute error matching the SOTA and a marginal degradation in rank correlation coefficient compared to SOTA performance predictors. Further, we show that the predictions from LLM-PP can be distilled to a small regression model (LLM-Distill-PP). LLM-Distill-PP models surprisingly retain the performance of LLM-PP largely and can be a cost-effective alternative for heavy use cases of performance estimation. Specifically, for neural architecture search (NAS), we propose a Hybrid-Search algorithm for NAS (HS-NAS), which uses LLM-Distill-PP for the initial part of search, resorting to the baseline predictor for rest of the search. We show that HS-NAS performs very similar to SOTA NAS across benchmarks, reduces search hours by 50% roughly, and in some cases, improves latency, GFLOPs, and model size.
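
To make the prompt structure described above concrete, here is a small sketch that assembles a performance-prediction prompt from a role, instructions, hyperparameter definitions, and demonstrations; the field wording, the MT example, and all numbers are invented placeholders, not the paper's actual prompt.

```python
def build_pp_prompt(role, instructions, hyperparameters, demonstrations, query_arch):
    """Assemble an LLM-PP style prompt (sketch; wording is illustrative only)."""
    hp_lines = [f"- {name}: {desc}" for name, desc in hyperparameters.items()]
    demo_lines = [
        f"- architecture: {d['arch']} | GFLOPs: {d['gflops']} | BLEU: {d['bleu']}"
        for d in demonstrations
    ]
    return "\n".join([
        f"Role: {role}",
        "Instructions:",
        instructions,
        "Hyperparameters:",
        *hp_lines,
        "Demonstrations:",
        *demo_lines,
        f"Now predict BLEU for: {query_arch}",
    ])

# Hypothetical usage for a machine translation performance predictor.
prompt = build_pp_prompt(
    role="You are a performance estimator for Transformer MT architectures.",
    instructions="Given an architecture, predict its BLEU score on the task.",
    hyperparameters={"encoder_layers": "number of encoder blocks",
                     "ffn_dim": "feed-forward hidden size"},
    demonstrations=[{"arch": "enc=6,dec=6,ffn=2048", "gflops": 1.5, "bleu": 26.1},
                    {"arch": "enc=3,dec=3,ffn=1024", "gflops": 0.6, "bleu": 23.4}],
    query_arch="enc=12,dec=6,ffn=3072",
)
print(prompt)
```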

BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories?

  • paper_url: http://arxiv.org/abs/2310.16681
  • repo_url: https://github.com/zephyr1022/babystories-utsa
  • paper_authors: Xingmeng Zhao, Tongnian Wang, Sheri Osborn, Anthony Rios
  • for: This study explores whether reinforcement learning from human feedback (RLHF) can improve language models pretrained from scratch on a small, human-like dataset.
  • methods: Two GPT-2 variants are fine-tuned with RLHF and compared on storytelling tasks.
  • results: The larger model performs better on storytelling after RLHF fine-tuning, suggesting that RLHF techniques may be more advantageous for larger models due to their higher learning and adaptation capacity, though more experiments are needed to confirm this finding.
    Abstract Language models have seen significant growth in the size of their corpus, leading to notable performance improvements. Yet, there has been limited progress in developing models that handle smaller, more human-like datasets. As part of the BabyLM shared task, this study explores the impact of reinforcement learning from human feedback (RLHF) on language models pretrained from scratch with a limited training corpus. Comparing two GPT-2 variants, the larger model performs better in storytelling tasks after RLHF fine-tuning. These findings suggest that RLHF techniques may be more advantageous for larger models due to their higher learning and adaptation capacity, though more experiments are needed to confirm this finding. These insights highlight the potential benefits of RLHF fine-tuning for language models within limited data, enhancing their ability to maintain narrative focus and coherence while adhering better to initial instructions in storytelling tasks. The code for this work is publicly at https://github.com/Zephyr1022/BabyStories-UTSA.

SSLCL: An Efficient Model-Agnostic Supervised Contrastive Learning Framework for Emotion Recognition in Conversations

  • paper_url: http://arxiv.org/abs/2310.16676
  • repo_url: https://github.com/taoshi1998/sslcl
  • paper_authors: Tao Shi, Xiao Liang, Yaoyuan Liang, Xinyi Tong, Shao-Lun Huang
  • for: This paper targets emotion recognition in conversations (ERC), the task of detecting the emotions expressed by speakers during a conversation.
  • methods: The authors propose an efficient, model-agnostic Supervised Sample-Label Contrastive Learning (SSLCL) framework that eliminates the need for a large batch size and can be seamlessly integrated with existing ERC models without introducing any model-specific assumptions.
  • results: On the IEMOCAP and MELD benchmarks, SSLCL is compatible with existing ERC models and significantly outperforms state-of-the-art SCL methods.
    Abstract Emotion recognition in conversations (ERC) is a rapidly evolving task within the natural language processing community, which aims to detect the emotions expressed by speakers during a conversation. Recently, a growing number of ERC methods have focused on leveraging supervised contrastive learning (SCL) to enhance the robustness and generalizability of learned features. However, current SCL-based approaches in ERC are impeded by the constraint of large batch sizes and the lack of compatibility with most existing ERC models. To address these challenges, we propose an efficient and model-agnostic SCL framework named Supervised Sample-Label Contrastive Learning with Soft-HGR Maximal Correlation (SSLCL), which eliminates the need for a large batch size and can be seamlessly integrated with existing ERC models without introducing any model-specific assumptions. Specifically, we introduce a novel perspective on utilizing label representations by projecting discrete labels into dense embeddings through a shallow multilayer perceptron, and formulate the training objective to maximize the similarity between sample features and their corresponding ground-truth label embeddings, while minimizing the similarity between sample features and label embeddings of disparate classes. Moreover, we innovatively adopt the Soft-HGR maximal correlation as a measure of similarity between sample features and label embeddings, leading to significant performance improvements over conventional similarity measures. Additionally, multimodal cues of utterances are effectively leveraged by SSLCL as data augmentations to boost model performances. Extensive experiments on two ERC benchmark datasets, IEMOCAP and MELD, demonstrate the compatibility and superiority of our proposed SSLCL framework compared to existing state-of-the-art SCL methods. Our code is available at \url{https://github.com/TaoShi1998/SSLCL}.

ChatGPT is a Potential Zero-Shot Dependency Parser

  • paper_url: http://arxiv.org/abs/2310.16654
  • repo_url: None
  • paper_authors: Boda Lin, Xinyi Zhou, Binghao Tang, Xiaocheng Gong, Si Li
  • for: This work investigates whether pre-trained language models can exhibit dependency parsing ability in a zero-shot setting, without introducing additional parser structure.
  • methods: Experiments and linguistic analysis are conducted with the ChatGPT large language model; a small illustrative sketch of such a zero-shot prompt follows this entry.
  • results: ChatGPT shows potential as a zero-shot dependency parser, and the linguistic analysis reveals some unique preferences in its parsing outputs.
    Abstract Pre-trained language models have been widely used in dependency parsing task and have achieved significant improvements in parser performance. However, it remains an understudied question whether pre-trained language models can spontaneously exhibit the ability of dependency parsing without introducing additional parser structure in the zero-shot scenario. In this paper, we propose to explore the dependency parsing ability of large language models such as ChatGPT and conduct linguistic analysis. The experimental results demonstrate that ChatGPT is a potential zero-shot dependency parser, and the linguistic analysis also shows some unique preferences in parsing outputs.
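
The zero-shot setup studied here can be sketched as a single prompt that asks a chat model to emit head-dependent arcs; the prompt wording, the output format, and the use of the OpenAI chat-completions client are illustrative assumptions rather than the paper's exact protocol.

```python
# Sketch of a zero-shot dependency-parsing prompt (wording and model choice are
# assumptions for illustration).
from openai import OpenAI

sentence = "The cat sat on the mat."
prompt = (
    "Perform dependency parsing on the sentence below. "
    "Output one line per word in the format: index\tword\thead_index\tdeprel\n\n"
    f"Sentence: {sentence}"
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
```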

Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network

  • paper_url: http://arxiv.org/abs/2310.16616
  • repo_url: None
  • paper_authors: Yiming Lin, Xiao-Bo Jin, Qiufeng Wang, Kaizhu Huang
  • for: Improving the accuracy of matching narrative text phrases to image pixels in Panoptic Narrative Grounding, addressing the phrase-to-pixel mis-match problem.
  • methods: A novel learning framework, the Deformable Attention Refined Matching Network (DRMN), which introduces deformable attention into the iterative feature-learning process to capture essential contextual information at multiple pixel scales.
  • results: DRMN achieves new state-of-the-art performance on the PNG benchmark, with an average recall improvement of 3.5%.
    Abstract Panoptic Narrative Grounding (PNG) is an emerging visual grounding task that aims to segment visual objects in images based on dense narrative captions. The current state-of-the-art methods first refine the representation of each phrase by aggregating the most similar $k$ image pixels, and then match the refined text representations with the pixels of the image feature map to generate segmentation results. However, simply aggregating sampled image features ignores the contextual information, which can lead to phrase-to-pixel mis-match. In this paper, we propose a novel learning framework called Deformable Attention Refined Matching Network (DRMN), whose main idea is to bring deformable attention into the iterative process of feature learning to incorporate essential context information of different scales of pixels. DRMN iteratively re-encodes pixels with the deformable attention network after updating the feature representation of the top-$k$ most similar pixels. As such, DRMN can lead to accurate yet discriminative pixel representations, purify the top-$k$ most similar pixels, and consequently alleviate the phrase-to-pixel mis-match substantially. Experimental results show that our novel design significantly improves the matching results between text phrases and image pixels. Concretely, DRMN achieves new state-of-the-art performance on the PNG benchmark with an average recall improvement of 3.5%. The code is available at: https://github.com/JaMesLiMers/DRMN.
    摘要 全景叙事定位(PNG)是一项新兴的视觉定位任务,旨在依据稠密的叙事描述在图像中分割相应的视觉对象。现有的最先进方法首先通过聚合最相似的 $k$ 个图像像素来细化短语表示,再将细化后的文本表示与图像特征图中的像素进行匹配以生成分割结果。然而,简单地聚合采样得到的图像特征会忽略上下文信息,从而导致短语与像素的错误匹配。本文提出了一种新的学习框架,称为可变形注意力细化匹配网络(DRMN),其核心思想是在特征学习的迭代过程中引入可变形注意力,以融合不同尺度像素的关键上下文信息。DRMN 在更新 top-$k$ 最相似像素的特征表示后,利用可变形注意力网络对像素进行迭代重编码,从而得到准确且具有判别力的像素表示,净化 top-$k$ 最相似像素,显著缓解短语与像素的错误匹配。实验结果表明,该设计显著改进了文本短语与图像像素之间的匹配效果:DRMN 在 PNG 基准上取得了新的最先进性能,平均召回率提升 3.5%。代码见:https://github.com/JaMesLiMers/DRMN。

On the Interplay between Fairness and Explainability

  • paper_url: http://arxiv.org/abs/2310.16607
  • repo_url: None
  • paper_authors: Stephanie Brandl, Emanuele Bugliarello, Ilias Chalkidis
  • for: Building reliable and trustworthy NLP applications, which requires models that are both fair across different demographics and explainable.
  • methods: Fine-tuning pre-trained language models with several methods for bias mitigation (aimed at improving fairness) and rationale extraction (aimed at producing plausible explanations), on two English multi-class text classification datasets with human-annotated rationales.
  • results: Bias mitigation algorithms do not always lead to fairer models, and empirical fairness and explainability turn out to be orthogonal.
    Abstract In order to build reliable and trustworthy NLP applications, models need to be both fair across different demographics and explainable. Usually these two objectives, fairness and explainability, are optimized and/or examined independently of each other. Instead, we argue that forthcoming, trustworthy NLP systems should consider both. In this work, we perform a first study to understand how they influence each other: do fair(er) models rely on more plausible rationales? and vice versa. To this end, we conduct experiments on two English multi-class text classification datasets, BIOS and ECtHR, that provide information on gender and nationality, respectively, as well as human-annotated rationales. We fine-tune pre-trained language models with several methods for (i) bias mitigation, which aims to improve fairness; (ii) rationale extraction, which aims to produce plausible explanations. We find that bias mitigation algorithms do not always lead to fairer models. Moreover, we discover that empirical fairness and explainability are orthogonal.
    摘要 为建立可靠和信worthy的自然语言处理(NLP)应用程序,模型需要具备不同人群的公平性和可解释性。通常这两个目标被优化和/或独立地评估。我们 argue that forthcoming NLP系统应该同时考虑这两个目标。在这项工作中,我们进行了首次研究,了解这两个目标之间的关系:是否可以更加公平的模型具备更加可信的理由?以及vice versa。为此,我们在两个英语多类文本分类 datasets(BIOS和ECtHR)上进行了实验,这两个dataset提供了 gender和 nationality信息,以及人类标注的理由。我们对预训练语言模型进行了多种方法的调整,包括:* 偏好缓和,用于提高公平性* 理由提取,用于生成可信的解释我们发现,偏好缓和算法不总是能够提高公平性。此外,我们发现了empirical公平性和可解释性是独立的。

Tailoring Personality Traits in Large Language Models via Unsupervisedly-Built Personalized Lexicons

  • paper_url: http://arxiv.org/abs/2310.16582
  • repo_url: None
  • paper_authors: Tianlong Li, Xiaoqing Zheng, Xuanjing Huang
  • for: Tailoring personality traits within large language models (LLMs), allowing any combination of the Big Five factors (openness, conscientiousness, extraversion, agreeableness, and neuroticism) to be incorporated in a pluggable manner.
  • methods: Unsupervisedly-Built Personalized Lexicons (UBPL) are used to adjust the next-token probabilities of the original LLM during decoding, steering generation toward text that reflects the desired traits.
  • results: Experiments show fine-grained control over LLMs' personality traits, and the method can be seamlessly integrated into other LLMs without updating their parameters.
    Abstract Personality plays a pivotal role in shaping human expression patterns, and empowering and manipulating large language models (LLMs) with personality traits holds significant promise in enhancing the user experience of LLMs. However, prior approaches either rely on fine-tuning LLMs on a corpus enriched with personalized expressions or necessitate the manual crafting of prompts to induce LLMs to produce personalized responses. The former approaches demand substantial time and resources for collecting sufficient training examples while the latter might fail in enabling the precise manipulation of the personality traits at a fine-grained level (e.g., achieving high agreeableness while reducing openness). In this study, we introduce a novel approach for tailoring personality traits within LLMs, allowing for the incorporation of any combination of the Big Five factors (i.e., openness, conscientiousness, extraversion, agreeableness, and neuroticism) in a pluggable manner. This is achieved by employing a set of Unsupervisedly-Built Personalized Lexicons (UBPL) that are utilized to adjust the probability of the next token predicted by the original LLMs during the decoding phase. This adjustment encourages the models to generate words present in the personalized lexicons while preserving the naturalness of the generated texts. Extensive experimentation demonstrates the effectiveness of our approach in finely manipulating LLMs' personality traits. Furthermore, our method can be seamlessly integrated into other LLMs without necessitating updates to their parameters.
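As a rough illustration of the lexicon-guided decoding idea described above, the sketch below boosts the logits of tokens that appear in a personalized lexicon before the next token is sampled. The function name, the flat additive bonus, and the `lexicon_token_ids` input are assumptions made for illustration; the paper's UBPL adjustment may take a different form.

```python
import torch

def ubpl_adjust_logits(next_token_logits, lexicon_token_ids, boost=2.0):
    """Add a bonus to vocabulary ids found in an (assumed) personalized lexicon,
    nudging decoding toward trait-bearing words while leaving other tokens intact."""
    adjusted = next_token_logits.clone()
    adjusted[..., lexicon_token_ids] += boost
    return adjusted

# Illustrative use with a causal LM: take the logits of the last position,
# adjust them with the lexicon, then sample or take argmax as usual.
# probs = torch.softmax(ubpl_adjust_logits(logits[:, -1, :], ids), dim=-1)
```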

WSDMS: Debunk Fake News via Weakly Supervised Detection of Misinforming Sentences with Contextualized Social Wisdom

  • paper_url: http://arxiv.org/abs/2310.16579
  • repo_url: https://github.com/HKBUNLP/WSDMS-EMNLP2023
  • paper_authors: Ruichao Yang, Wei Gao, Jing Ma, Hongzhan Lin, Zhiwei Yang
  • for: Debunking fake news on social media by identifying sentence-level misinformation within news articles.
  • methods: A Multiple Instance Learning (MIL) based model, WSDMS, which requires only bag-level (article-level) veracity labels for training yet can infer both misinforming sentences and article-level veracity, aided by relevant social media conversations that are attentively contextualized with the news sentences.
  • results: WSDMS outperforms state-of-the-art baselines on three real-world benchmarks, debunking fake news at both the sentence and article levels.
    Abstract In recent years, we witness the explosion of false and unconfirmed information (i.e., rumors) that went viral on social media and shocked the public. Rumors can trigger versatile, mostly controversial stance expressions among social media users. Rumor verification and stance detection are different yet relevant tasks. Fake news debunking primarily focuses on determining the truthfulness of news articles, which oversimplifies the issue as fake news often combines elements of both truth and falsehood. Thus, it becomes crucial to identify specific instances of misinformation within the articles. In this research, we investigate a novel task in the field of fake news debunking, which involves detecting sentence-level misinformation. One of the major challenges in this task is the absence of a training dataset with sentence-level annotations regarding veracity. Inspired by the Multiple Instance Learning (MIL) approach, we propose a model called Weakly Supervised Detection of Misinforming Sentences (WSDMS). This model only requires bag-level labels for training but is capable of inferring both sentence-level misinformation and article-level veracity, aided by relevant social media conversations that are attentively contextualized with news sentences. We evaluate WSDMS on three real-world benchmarks and demonstrate that it outperforms existing state-of-the-art baselines in debunking fake news at both the sentence and article levels.
    摘要 近年来,我们目睹了社交媒体上的谣言泛洪,让公众受到了各种不同的影响。谣言可以让社交媒体用户表达多种不同的看法,大多是争议的。验证谣言和判断看法是不同 yet 相关的任务。驳斥 fake news 主要集中在决定新闻文章的真实性,这有些 simplifies 了问题,因为 fake news 经常混合真实和假的元素。因此,成为必须鉴别特定的谣言内容。在这项研究中,我们调查了一项新的 fake news 驳斥任务,即Detecting Misinforming Sentences(DMS)。这项任务的主要挑战在于缺乏 sentence-level 的真实性标注数据。以 Multiple Instance Learning(MIL) Approach 为 inspiration,我们提出了 Weakly Supervised Detection of Misinforming Sentences(WSDMS)模型。这个模型只需要训练 bag-level 标签,但可以推断出 sentence-level 谣言和文章-level 真实性,得益于与新闻句子相关的社交媒体对话。我们对 WSDMS 进行了三个实际 benchmark 的评估,并证明它在 fake news 驳斥中超过了现有的基eline。
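The bag-level supervision described above can be illustrated with a small multiple-instance-learning head: sentence scores are latent, and only the pooled article score is compared against the article label. This is a simplified sketch under assumed feature shapes; WSDMS itself additionally contextualizes sentences with social media conversations.

```python
import torch
import torch.nn as nn

class MILSentenceDebunker(nn.Module):
    """Sketch of weakly supervised sentence-level misinformation scoring:
    training uses only an article-level (bag) label."""
    def __init__(self, dim):
        super().__init__()
        self.sentence_scorer = nn.Linear(dim, 1)

    def forward(self, sentence_feats):                 # (num_sentences, dim), one article
        sent_logits = self.sentence_scorer(sentence_feats).squeeze(-1)
        # Attention-style MIL pooling: the article looks fake to the extent
        # that some of its sentences look misinforming.
        weights = torch.softmax(sent_logits, dim=0)
        article_logit = (weights * sent_logits).sum()
        return sent_logits, article_logit

# Training would apply binary cross-entropy between article_logit and the bag label;
# at test time, sent_logits provide the sentence-level misinformation predictions.
```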

Give Me the Facts! A Survey on Factual Knowledge Probing in Pre-trained Language Models

  • paper_url: http://arxiv.org/abs/2310.16570
  • repo_url: None
  • paper_authors: Paul Youssef, Osman Alperen Koraş, Meijie Li, Jörg Schlötterer, Christin Seifert
  • for: Surveying how much factual knowledge is present in pre-trained language models (PLMs), which helps explain their downstream performance and potentially justifies their use as knowledge bases.
  • methods: A survey of factual probing methods and datasets, with a categorization scheme based on how the inputs, outputs, and the probed PLMs are adapted.
  • results: The survey synthesizes insights on knowledge retention and prompt optimization in PLMs, analyzes obstacles to adopting PLMs as knowledge bases, and outlines directions for future work.
    Abstract Pre-trained Language Models (PLMs) are trained on vast unlabeled data, rich in world knowledge. This fact has sparked the interest of the community in quantifying the amount of factual knowledge present in PLMs, as this explains their performance on downstream tasks, and potentially justifies their use as knowledge bases. In this work, we survey methods and datasets that are used to probe PLMs for factual knowledge. Our contributions are: (1) We propose a categorization scheme for factual probing methods that is based on how their inputs, outputs and the probed PLMs are adapted; (2) We provide an overview of the datasets used for factual probing; (3) We synthesize insights about knowledge retention and prompt optimization in PLMs, analyze obstacles to adopting PLMs as knowledge bases and outline directions for future work.
    摘要 预训练语言模型(PLMs)在海量无标注数据上训练,蕴含丰富的世界知识,因此社区对量化 PLMs 中的事实知识颇感兴趣:这有助于解释其在下游任务上的表现,也可能支持将其用作知识库。在本文中,我们综述了用于探测 PLMs 事实知识的方法与数据集。我们的贡献包括:(1) 提出一种基于输入、输出以及被探测 PLMs 如何被调整的事实探测方法分类方案;(2) 概述用于事实探测的数据集;(3) 总结关于 PLMs 知识保留与提示优化的洞见,分析将 PLMs 用作知识库的障碍,并展望未来工作方向。

1-PAGER: One Pass Answer Generation and Evidence Retrieval

  • paper_url: http://arxiv.org/abs/2310.16568
  • repo_url: None
  • paper_authors: Palak Jain, Livio Baldini Soares, Tom Kwiatkowski
  • for: A single Transformer-based model and decoding process that both answers a question and retrieves supporting evidence in one pass.
  • methods: Constrained decoding incrementally partitions the retrieval corpus to select a document and an answer string; performance is compared with retrieve-and-read alternatives on both retrieval and answer accuracy metrics.
  • results: 1-Pager is competitive with comparable retrieve-and-read systems and outperforms the equivalent closed-book QA model by grounding predictions in an evidence corpus, although it is not yet on par with more expensive systems that read many more documents before generating an answer.
    Abstract We present 1-Pager the first system that answers a question and retrieves evidence using a single Transformer-based model and decoding process. 1-Pager incrementally partitions the retrieval corpus using constrained decoding to select a document and answer string, and we show that this is competitive with comparable retrieve-and-read alternatives according to both retrieval and answer accuracy metrics. 1-Pager also outperforms the equivalent closed-book question answering model, by grounding predictions in an evidence corpus. While 1-Pager is not yet on-par with more expensive systems that read many more documents before generating an answer, we argue that it provides an important step toward attributed generation by folding retrieval into the sequence-to-sequence paradigm that is currently dominant in NLP. We also show that the search paths used to partition the corpus are easy to read and understand, paving a way forward for interpretable neural retrieval.
    摘要 我们介绍1-Pager,首个使用单一转换器模型和解码过程来回答问题并提取证据的系统。1-Pager逐步分割检索库使用受限解码方式选择文档和答案字符串,我们表明这与相似的检索和读取选择相当。1-Pager还超过相同的关闭书问答模型,通过固定预测在证据库中附加 Generation。虽然1-Pager还不及更加昂贵的系统,但我们认为它为归因生成带来了重要的一步,将检索嵌入序列到序列中的当前主流NLP框架中。我们还显示检索路径使用受限解码方式分割库是易于阅读和理解的,这为神经网络检索带来了可读性的前进。
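Constrained decoding of the kind described above is typically implemented with a prefix trie over the retrieval corpus, so that a partial generation can only be continued into strings that actually exist in the corpus. The sketch below is a generic illustration of that mechanism, not 1-Pager's actual code; token ids and the keyword-path format are assumptions.

```python
class PrefixTrie:
    """Prefix trie over tokenized corpus entries (e.g., retrieval keyword paths)."""
    def __init__(self, sequences):
        self.root = {}
        for seq in sequences:                  # each seq is a list of token ids
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed_next_tokens(self, prefix):
        """Tokens that keep the decoded prefix consistent with some corpus entry."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return list(node.keys())

# During decoding, logits of tokens outside allowed_next_tokens(current_prefix)
# are masked to -inf, so the generated partition/answer always exists in the corpus.
```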

An Early Evaluation of GPT-4V(ision)

  • paper_url: http://arxiv.org/abs/2310.16534
  • repo_url: https://github.com/albertwy/gpt-4v-evaluation
  • paper_authors: Yang Wu, Shilong Wang, Hao Yang, Tian Zheng, Hongbo Zhang, Yanyan Zhao, Bing Qin
  • for: An early evaluation of GPT-4V's abilities in visual understanding, language understanding, visual puzzle solving, and understanding of other modalities such as depth, thermal, video, and audio.
  • methods: 656 manually constructed test instances, with careful evaluation of GPT-4V's outputs.
  • results: GPT-4V performs impressively on English visual-centric benchmarks but fails to recognize simple Chinese text in images; it shows inconsistent refusal behavior on questions about sensitive traits such as gender, race, and age; it obtains worse results than GPT-4 (API) on language understanding tasks; few-shot prompting improves both its visual and language understanding; it struggles to find the nuances between similar images and to solve easy math picture puzzles; and it shows non-trivial performance on modalities similar to images, such as video and thermal.
    Abstract In this paper, we evaluate different abilities of GPT-4V including visual understanding, language understanding, visual puzzle solving, and understanding of other modalities such as depth, thermal, video, and audio. To estimate GPT-4V's performance, we manually construct 656 test instances and carefully evaluate the results of GPT-4V. The highlights of our findings are as follows: (1) GPT-4V exhibits impressive performance on English visual-centric benchmarks but fails to recognize simple Chinese texts in the images; (2) GPT-4V shows inconsistent refusal behavior when answering questions related to sensitive traits such as gender, race, and age; (3) GPT-4V obtains worse results than GPT-4 (API) on language understanding tasks including general language understanding benchmarks and visual commonsense knowledge evaluation benchmarks; (4) Few-shot prompting can improve GPT-4V's performance on both visual understanding and language understanding; (5) GPT-4V struggles to find the nuances between two similar images and solve the easy math picture puzzles; (6) GPT-4V shows non-trivial performance on the tasks of similar modalities to image, such as video and thermal. Our experimental results reveal the ability and limitations of GPT-4V and we hope our paper can provide some insights into the application and research of GPT-4V.
    摘要 在这篇论文中,我们评估了GPT-4V的不同能力,包括视觉理解、语言理解、视觉逻辑解决、以及其他modalities such as depth、thermal、视频和audio的理解。为了估计GPT-4V的性能,我们手动构建了656个测试实例,并且精心评估了GPT-4V的结果。我们的发现包括:1. GPT-4V在英文视觉中benchmarks上表现出色,但是无法识别简单的中文文本在图像中;2. GPT-4V在归类敏感特征问题上表现不一致,包括性别、种族和年龄等;3. GPT-4V在语言理解任务上比GPT-4(API)表现更差,包括通用语言理解benchmarks和视觉常识知识评估benchmarks;4. 几个提示可以提高GPT-4V的视觉理解和语言理解性能;5. GPT-4V很难在两个类似图像之间找到细节和解决易于数学图像逻辑问题;6. GPT-4V在视频和热成像任务上表现不错,与图像任务相似。我们的实验结果表明GPT-4V的能力和局限性,我们希望这篇论文可以为GPT-4V的应用和研究提供一些启示。

CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval

  • paper_url: http://arxiv.org/abs/2310.16528
  • repo_url: None
  • paper_authors: Jindřich Helcl, Jindřich Libovický
  • for: The Charles University system for the MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval, covering named entity recognition and question answering in several under-represented languages.
  • methods: A translate-test approach: unlabeled examples are first translated into English with a multilingual machine translation model, inference is run with a strong task-specific model, and the labels are projected back into the original language, using a label-sensitive translation model to score candidate positions so the inferred tags stay correctly aligned.
  • results: Fine-tuning the classification models on the translated data could not outperform the baselines, owing to a domain mismatch between the development data and the shared task validation and test sets.
    Abstract We present the Charles University system for the MRL~2023 Shared Task on Multi-lingual Multi-task Information Retrieval. The goal of the shared task was to develop systems for named entity recognition and question answering in several under-represented languages. Our solutions to both subtasks rely on the translate-test approach. We first translate the unlabeled examples into English using a multilingual machine translation model. Then, we run inference on the translated data using a strong task-specific model. Finally, we project the labeled data back into the original language. To keep the inferred tags on the correct positions in the original language, we propose a method based on scoring the candidate positions using a label-sensitive translation model. In both settings, we experiment with finetuning the classification models on the translated data. However, due to a domain mismatch between the development data and the shared task validation and test sets, the finetuned models could not outperform our baselines.
    摘要 我们介绍了查尔斯大学系统 для MRL~2023共享任务的多语言多任务信息检索。该任务的目标是开发用于多种语言的命名实体识别和问答系统。我们的解决方案对于两个子任务都采用了翻译测试方法。我们首先将无标示示例翻译成英语使用多语言机器翻译模型。然后,我们运行在翻译后的数据上进行推理,使用强大的任务特定模型。最后,我们将原始语言中的标注数据投影回原始语言。为保持在原始语言中的推理结果的标注,我们提议一种基于标签敏感翻译模型的评分方法。在两个设置中,我们尝试了在翻译后的数据上进行训练分类模型,但由于共享任务验证和测试集与开发数据的领域差异,训练后的模型无法超越我们的基线。

OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models

  • paper_url: http://arxiv.org/abs/2310.16517
  • repo_url: None
  • paper_authors: Mingfeng Xue, Dayiheng Liu, Kexin Yang, Guanting Dong, Wenqiang Lei, Zheng Yuan, Chang Zhou, Jingren Zhou
  • for: This paper aims to address the issue of occupational bias in instruction-tuning datasets for large language models (LLMs), which hinders the models' ability to generate helpful responses to professional queries from practitioners in specific fields.
  • methods: The authors create an instruction-tuning dataset named "OccuQuest" that contains 110,000+ prompt-completion pairs and 30,000+ dialogues covering over 1,000 occupations in 26 occupational categories. They systematically request ChatGPT to generate responses to queries hierarchically based on Occupation, Responsibility, Topic, and Question to ensure comprehensive coverage of occupational specialty inquiries.
  • results: The authors compare OccuQuest with three commonly used datasets (Dolly, ShareGPT, and WizardLM) and find that OccuQuest exhibits a more balanced distribution across occupations. They fine-tune LLaMA on OccuQuest to obtain OccuLLaMA, which significantly outperforms state-of-the-art LLaMA variants (Vicuna, Tulu, and WizardLM) on professional questions in GPT-4 and human evaluations, with a high win rate of 86.4% against WizardLM on the occu-quora set.
    Abstract The emergence of large language models (LLMs) has revolutionized natural language processing tasks. However, existing instruction-tuning datasets suffer from occupational bias: the majority of data relates to only a few occupations, which hampers the instruction-tuned LLMs to generate helpful responses to professional queries from practitioners in specific fields. To mitigate this issue and promote occupation-inclusive LLMs, we create an instruction-tuning dataset named \emph{OccuQuest}, which contains 110,000+ prompt-completion pairs and 30,000+ dialogues covering over 1,000 occupations in 26 occupational categories. We systematically request ChatGPT, organizing queries hierarchically based on Occupation, Responsibility, Topic, and Question, to ensure a comprehensive coverage of occupational specialty inquiries. By comparing with three commonly used datasets (Dolly, ShareGPT, and WizardLM), we observe that OccuQuest exhibits a more balanced distribution across occupations. Furthermore, we assemble three test sets for comprehensive evaluation, an occu-test set covering 25 occupational categories, an estate set focusing on real estate, and an occu-quora set containing real-world questions from Quora. We then fine-tune LLaMA on OccuQuest to obtain OccuLLaMA, which significantly outperforms state-of-the-art LLaMA variants (Vicuna, Tulu, and WizardLM) on professional questions in GPT-4 and human evaluations. Notably, on the occu-quora set, OccuLLaMA reaches a high win rate of 86.4\% against WizardLM.
    摘要 大量语言模型(LLM)的出现已经革命化了自然语言处理任务。然而,现有的 instrucion-tuning 数据集受到职业偏见:大多数数据关注只有一些职业,这使得 instrucion-tuned LLM 无法生成专业领域问题上有用的回答。为解决这问题并推广职业包容 LLM,我们创建了一个 instrucion-tuning 数据集名为“OccuQuest”,包含 более чем 110,000+ 提示完成对和 30,000+ 对话,覆盖26个职业类别中的1,000多个职业。我们系统地请求 ChatGPT,将提示组织按照职业、责任、话题和问题进行归类,以确保职业专业问题的全面覆盖。与 Dolly、ShareGPT 和 WizardLM 等三个常用数据集进行比较,我们发现 OccuQuest 的分布更加平衡。此外,我们组成了三个测试集,包括 occu-test set(覆盖25个职业类别)、 estate set(专注于房地产)和 occu-quora set(包含来自 Quora 的真实问题)。然后,我们使用 OccuQuest 进行精度调整,得到了 OccuLLaMA,它在专业问题上以高胜率(86.4%)击败了 WizardLM。

Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training

  • paper_url: http://arxiv.org/abs/2310.16484
  • repo_url: None
  • paper_authors: Max Müller-Eberstein, Rob van der Goot, Barbara Plank, Ivan Titov
  • for: Investigating how linguistic knowledge emerges in the representational subspaces of language models during pre-training, and how these subspaces interact over the course of training.
  • methods: A novel information-theoretic probing suite that enables direct comparison of representational subspaces as well as task performance, applied to nine tasks covering syntax, semantics, and reasoning, across 2M pre-training steps and five seeds.
  • results: Critical learning phases are identified across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize. Syntactic knowledge is acquired rapidly after 0.5% of full training; subsequent gains stem mainly from open-domain knowledge, while semantics and reasoning tasks benefit from later boosts to long-range contextualization and higher specialization. Linguistically related tasks share information throughout training, and do so most during the critical learning phase.
    Abstract Representational spaces learned via language modeling are fundamental to Natural Language Processing (NLP), however there has been limited understanding regarding how and when during training various types of linguistic information emerge and interact. Leveraging a novel information theoretic probing suite, which enables direct comparisons of not just task performance, but their representational subspaces, we analyze nine tasks covering syntax, semantics and reasoning, across 2M pre-training steps and five seeds. We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize. Across these phases, syntactic knowledge is acquired rapidly after 0.5% of full training. Continued performance improvements primarily stem from the acquisition of open-domain knowledge, while semantics and reasoning tasks benefit from later boosts to long-range contextualization and higher specialization. Measuring cross-task similarity further reveals that linguistically related tasks share information throughout training, and do so more during the critical phase of learning than before or after. Our findings have implications for model interpretability, multi-task learning, and learning from limited data.
    摘要 NATURAL LANGUAGE PROCESSING (NLP) 的基础知识是通过语言模型学习得到的表征空间,但是过去很少有人研究了在哪些时候和如何在训练中不同类型的语言信息emerge和交互。我们使用了一个新的信息论探测 suite,可以对不同任务的表征空间进行直接比较,我们分析了九个任务,覆盖了 syntax、 semantics 和理解,在200万个预训练步和五个种子上进行了分析。我们发现了训练过程中的关键学习阶段,在这些阶段表征空间出现、信息交换和后来分离以特化。在这些阶段,语法知识得到了快速的学习,而开放领域知识的获得则是训练的主要来源,而 semantics 和理解任务则在后来得到了更多的长距离 contextualization 和更高的特化。我们的发现对模型解释、多任务学习和学习从有限数据进行了启示。

CLEX: Continuous Length Extrapolation for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.16450
  • repo_url: https://github.com/damo-nlp-sg/clex
  • paper_authors: Guanzheng Chen, Xin Li, Zaiqiao Meng, Shangsong Liang, Lidong Bing
  • for: Extending the context window of LLMs so that they perform well in long-context applications.
  • methods: Continuous Length EXtrapolation (CLEX), which generalizes position-embedding scaling by modeling the continuous dynamics of the length scaling factor with ordinary differential equations, overcoming the limitations of existing PE scaling methods designed for specific lengths.
  • results: CLEX extends the context window to over 4x (and nearly 8x) the training length with no performance degradation, and a model trained on 4k-length sequences is competitive on the LongBench benchmark with state-of-the-art open-source models trained on context lengths up to 32k.
    Abstract Transformer-based Large Language Models (LLMs) are pioneering advances in many natural language processing tasks, however, their exceptional capabilities are restricted within the preset context window of Transformer. Position Embedding (PE) scaling methods, while effective in extending the context window to a specific length, demonstrate either notable limitations in their extrapolation abilities or sacrificing partial performance within the context window. Length extrapolation methods, although theoretically capable of extending the context window beyond the training sequence length, often underperform in practical long-context applications. To address these challenges, we propose Continuous Length EXtrapolation (CLEX) for LLMs. We generalise the PE scaling approaches to model the continuous dynamics by ordinary differential equations over the length scaling factor, thereby overcoming the constraints of current PE scaling methods designed for specific lengths. Moreover, by extending the dynamics to desired context lengths beyond the training sequence length, CLEX facilitates the length extrapolation with impressive performance in practical tasks. We demonstrate that CLEX can be seamlessly incorporated into LLMs equipped with Rotary Position Embedding, such as LLaMA and GPT-NeoX, with negligible impact on training and inference latency. Experimental results reveal that CLEX can effectively extend the context window to over 4x or almost 8x training length, with no deterioration in performance. Furthermore, when evaluated on the practical LongBench benchmark, our model trained on a 4k length exhibits competitive performance against state-of-the-art open-source models trained on context lengths up to 32k.
    摘要 transformer-based 大型自然语言处理模型(LLM)在许多任务中取得了先锋的进步,然而其杰出的能力受到 transformer 中设置的上下文窗口的限制。位嵌入(PE)缩放方法可以延长上下文窗口的长度,但是它们在不同长度上的极限性能或者在上下文窗口内的一部分性能牺牲。长度极限方法可以将上下文窗口拓展到训练序列长度之 beyond,但是在实际长上下文应用中 frequently underperform。为解决这些挑战,我们提出了 Continuous Length EXtrapolation(CLEX) для LLM。我们将 PE 缩放方法扩展到模型化连续动力学,使得可以不受现有 PE 缩放方法设置的限制。此外,通过将动力学拓展到所需的上下文长度,CLEX 可以具有卓越的长度极限性能。我们在 LLM 中 embedding 旋转 Position Embedding 的模型中实现了 CLEX,并证明了它可以轻松地与 training 和推理时间相比,无损到性能。实验结果表明,CLEX 可以有效地将上下文窗口拓展到训练序列长度的4倍或更长,无损到性能。此外,当我们对 practical LongBench benchmark 进行评估时,我们的模型在4k 长度上表现与开源模型在上下文长度达32k 的状态之前的状态竞争。
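As background for the method above, position-embedding scaling can be pictured as dividing RoPE frequencies by a length scaling factor so that positions beyond the training length stay within the rotation range seen during training. The sketch below shows only this static scaling with an assumed constant factor; CLEX's contribution is to model how that factor evolves continuously with an ODE, which is not reproduced here.

```python
import torch

def scaled_rope_inverse_frequencies(dim, base=10000.0, length_scale=2.0):
    """RoPE inverse frequencies divided by a continuous length scaling factor.
    A larger length_scale slows the rotations, stretching the usable context."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    return inv_freq / length_scale

# e.g. length_scale=2.0 roughly maps positions 0..8k onto the 0..4k range the
# model was trained on; CLEX instead learns the trajectory of this factor.
```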

DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models

  • paper_url: http://arxiv.org/abs/2310.16436
  • repo_url: None
  • paper_authors: Ge Zheng, Bin Yang, Jiajin Tang, Hong-Yu Zhou, Sibei Yang
  • for: Enabling AI systems to perform complex multimodal reasoning the way humans do.
  • methods: Large language models (LLMs) are guided with chain-of-thought (CoT) reasoning extended to multimodality, building on two key insights: "keeping critical thinking" and "letting everyone do their jobs" in multimodal CoT reasoning.
  • results: The proposed DDCoT prompting maintains a critical attitude through negative-space prompting and integrates the visual recognition capability of visual models into the joint reasoning process. The rationales it generates improve the reasoning abilities of both large and small language models in zero-shot prompting and fine-tuning, significantly outperforming state-of-the-art methods while exhibiting strong generalizability and explainability.
    Abstract A long-standing goal of AI systems is to perform complex multimodal reasoning like humans. Recently, large language models (LLMs) have made remarkable strides in such multi-step reasoning on the language modality solely by leveraging the chain of thought (CoT) to mimic human thinking. However, the transfer of these advancements to multimodal contexts introduces heightened challenges, including but not limited to the impractical need for labor-intensive annotation and the limitations in terms of flexibility, generalizability, and explainability. To evoke CoT reasoning in multimodality, this work first conducts an in-depth analysis of these challenges posed by multimodality and presents two key insights: "keeping critical thinking" and "letting everyone do their jobs" in multimodal CoT reasoning. Furthermore, this study proposes a novel DDCoT prompting that maintains a critical attitude through negative-space prompting and incorporates multimodality into reasoning by first dividing the reasoning responsibility of LLMs into reasoning and recognition and then integrating the visual recognition capability of visual models into the joint reasoning process. The rationales generated by DDCoT not only improve the reasoning abilities of both large and small language models in zero-shot prompting and fine-tuning learning, significantly outperforming state-of-the-art methods but also exhibit impressive generalizability and explainability.

PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

  • paper_url: http://arxiv.org/abs/2310.16427
  • repo_url: None
  • paper_authors: Xinyuan Wang, Chenxi Li, Zhen Wang, Fan Bai, Haotian Luo, Jiayou Zhang, Nebojsa Jojic, Eric P. Xing, Zhiting Hu
  • for: 这 paper 的目的是开发一种可以自动生成高质量专家级提问的优化方法,以提高大语言模型(LLM)的表现。
  • methods: 这 paper 使用了一种基于 Monte Carlo 搜索的原则导航算法,来寻找专家级提问空间中的优质提问。另外,它还引入了人类化的尝试-错误探索机制,以便从模型错误中获得精准的专家级 Insight 和深入的指导。
  • results: 这 paper 在 12 个任务中证明了 PromptAgent 可以备受提高 Chain-of-Thought 和最近的提问优化基准点。此外,它还进行了广泛的分析,证明了其能够具有高效、普适和域内专家级的提问生成能力。
    Abstract Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights based on a deep understanding of both instincts of large language models (LLMs) and the intricacies of the target task. However, automating the generation of such expert-level prompts remains elusive. Existing prompt optimization methods tend to overlook the depth of domain knowledge and struggle to efficiently explore the vast space of expert-level prompts. Addressing this, we present PromptAgent, an optimization method that autonomously crafts prompts equivalent in quality to those handcrafted by experts. At its core, PromptAgent views prompt optimization as a strategic planning problem and employs a principled planning algorithm, rooted in Monte Carlo tree search, to strategically navigate the expert-level prompt space. Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions by reflecting on model errors and generating constructive error feedback. Such a novel framework allows the agent to iteratively examine intermediate prompts (states), refine them based on error feedbacks (actions), simulate future rewards, and search for high-reward paths leading to expert prompts. We apply PromptAgent to 12 tasks spanning three practical domains: BIG-Bench Hard (BBH), as well as domain-specific and general NLP tasks, showing it significantly outperforms strong Chain-of-Thought and recent prompt optimization baselines. Extensive analyses emphasize its capability to craft expert-level, detailed, and domain-insightful prompts with great efficiency and generalizability.
    摘要 高效的任务特定提示通常由专家严格工程来整合详细的指令和领域知识,基于大语言模型(LLM)的本性和目标任务的细节。然而,自动生成专家水平提示的机器化仍然是一个未解之谜。现有的提示优化方法通常会忽略领域知识的深度和专家水平提示的巨大空间,而 PromptAgent 则是一种新的优化方法。PromptAgent 视提示优化为战略规划问题,并使用基于 Monte Carlo 搜索的原则正则算法来策略性浏览专家水平提示空间。被人类类似的尝试错误探索所 inspirited,PromptAgent 通过反思模型错误和生成有用的错误反馈来带来精准的专家水平启示和深入的指令。这种新的框架使得代理人可以随机检查中间提示(状态),根据错误反馈(动作)进行修改,在将来的奖励 simulate 和搜索高荷道路寻找专家提示。我们在 12 个任务中应用 PromptAgent,包括 BBH 和一些域特定和通用 NLP 任务,显示它与强大的 Chain-of-Thought 和最新的提示优化基线相比有显著的优势。广泛的分析表明它可以高效地制造专家水平的详细、领域内在的提示。
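To convey the planning loop described above in a few lines, here is a deliberately simplified greedy tree-search sketch: candidate prompts are expanded by asking an LLM to rewrite them given error feedback, scored on development data, and the best candidates are kept. The real PromptAgent uses a principled Monte Carlo tree search with simulated future rewards; `evaluate` and `expand` are assumed callables.

```python
def optimize_prompt(seed_prompt, evaluate, expand, iterations=10, width=3):
    """Greedy stand-in for MCTS-style prompt search.
    evaluate(prompt) -> float score on dev data (assumed)
    expand(prompt)   -> a rewritten prompt based on error feedback (assumed)"""
    frontier = [(evaluate(seed_prompt), seed_prompt)]
    for _ in range(iterations):
        _, best_prompt = max(frontier)
        children = [expand(best_prompt) for _ in range(width)]   # error-driven rewrites
        frontier += [(evaluate(c), c) for c in children]
        frontier = sorted(frontier, reverse=True)[:width]        # keep top candidates
    return max(frontier)[1]
```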

Enhanced Simultaneous Machine Translation with Word-level Policies

  • paper_url: http://arxiv.org/abs/2310.16417
  • repo_url: https://github.com/xl8-ai/wordsimt
  • paper_authors: Kang Kim, Hankyu Cho
  • for: Improving simultaneous machine translation (SiMT) and revisiting the common assumption that READ/WRITE decisions must be made at the subword level, even though the standard input and output unit in practice is the word.
  • methods: A word-level policy that processes multiple subwords to form a complete word in a single step, plus a method for boosting SiMT models with language models (LMs), in which the word-level policy bridges the subword mismatch between LMs and SiMT models.
  • results: Policies devised and validated at the word level surpass subword-level policies, and the word-level policy plays a key role in addressing the subword disparity between LMs and SiMT models. Code is available at https://github.com/xl8-ai/WordSiMT.
    Abstract Recent years have seen remarkable advances in the field of Simultaneous Machine Translation (SiMT) due to the introduction of innovative policies that dictate whether to READ or WRITE at each step of the translation process. However, a common assumption in many existing studies is that operations are carried out at the subword level, even though the standard unit for input and output in most practical scenarios is typically at the word level. This paper demonstrates that policies devised and validated at the subword level are surpassed by those operating at the word level, which process multiple subwords to form a complete word in a single step. Additionally, we suggest a method to boost SiMT models using language models (LMs), wherein the proposed word-level policy plays a vital role in addressing the subword disparity between LMs and SiMT models. Code is available at https://github.com/xl8-ai/WordSiMT.
    摘要 近年来,同时机器翻译(SiMT)领域发生了非常出色的进步,这主要归功于新的政策的引入,这些政策在翻译过程中每步都会决定是否阅读或写入。然而,许多现有研究假设在翻译过程中每步都会进行子词级别的操作,尽管在实际应用场景中,输入和输出标准单位通常是单词级别。本文表明,在子词级别采用的策略会被单词级别的策略所超越,后者可以在单步中处理多个子词,形成完整的单词。此外,我们建议使用语言模型(LM)来提升SiMT模型,其中提议的单词级别策略具有重要的地位,以Addressing LM和SiMT模型之间的子词差异。代码可以在https://github.com/xl8-ai/WordSiMT中找到。
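A word-level READ/WRITE policy of the kind argued for above can be sketched as a wait-k schedule that consumes whole source words (possibly several subwords at once) before emitting. The sketch assumes SentencePiece-style subwords where "▁" marks a word start, and emits a placeholder instead of calling a real translation model.

```python
def word_level_wait_k(source_subwords, k=3, word_start="▁"):
    """Toy word-level wait-k policy: READ subwords until a full word is buffered,
    then WRITE one target word per completed source word once k words are available."""
    actions, completed_words = [], 0
    for i, tok in enumerate(source_subwords):
        actions.append(("READ", tok))
        last = i + 1 == len(source_subwords)
        word_complete = last or source_subwords[i + 1].startswith(word_start)
        if word_complete:
            completed_words += 1
            if completed_words >= k:
                actions.append(("WRITE", "<target word>"))   # placeholder emission
    return actions

# word_level_wait_k(["▁the", "▁qu", "ick", "▁fox", "▁ran"], k=2)
# reads whole words and interleaves WRITEs once two full source words are buffered.
```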

Decoding Stumpers: Large Language Models vs. Human Problem-Solvers

  • paper_url: http://arxiv.org/abs/2310.16411
  • repo_url: None
  • paper_authors: Alon Goldstein, Miriam Havin, Roi Reichart, Ariel Goldstein
  • for: 本研究探讨了大语言模型(LLMs)的问题解决能力,通过评估它们在独特的单步直觉问题上的表现。
  • methods: 本研究使用了四种当今最先进的LLMs(Davinci-2、Davinci-3、GPT-3.5-Turbo和GPT-4)和人类参与者进行比较。
  • results: 研究发现,新一代LLMs在解决独特问题上表现出色,超越人类表现。然而,人类参与者在验证解决方案的能力方面表现更出色。这些研究增强了我们对LLMs的认知能力的理解,并为不同领域中LLMs的问题解决潜力提供了新的思路。
    Abstract This paper investigates the problem-solving capabilities of Large Language Models (LLMs) by evaluating their performance on stumpers, unique single-step intuition problems that pose challenges for human solvers but are easily verifiable. We compare the performance of four state-of-the-art LLMs (Davinci-2, Davinci-3, GPT-3.5-Turbo, GPT-4) to human participants. Our findings reveal that the new-generation LLMs excel in solving stumpers and surpass human performance. However, humans exhibit superior skills in verifying solutions to the same problems. This research enhances our understanding of LLMs' cognitive abilities and provides insights for enhancing their problem-solving potential across various domains.
    摘要 这篇论文研究了大语言模型(LLMs)的问题解决能力,通过评估它们在单步直觉问题上的表现,这些问题对人类解决者来说是困难的,但是易于验证。我们比较了四个当今最先进的LLMs(Davinci-2、Davinci-3、GPT-3.5-Turbo、GPT-4)与人类参与者的表现。我们的发现表明,新一代LLMs在解决这些问题方面表现出色,超越了人类表现。然而,人类参与者在验证解决方案的能力方面表现出优异。这些研究增加了我们对LLMs的认知能力的理解,并为各个领域中LLMs的问题解决潜力带来了新的想法。

Video Referring Expression Comprehension via Transformer with Content-conditioned Query

  • paper_url: http://arxiv.org/abs/2310.16402
  • repo_url: None
  • paper_authors: Ji Jiang, Meng Cao, Tengtao Song, Long Chen, Yi Wang, Yuexian Zou
  • for: Improving target-object localization in video Referring Expression Comprehension (REC), where objects in videos are localized from natural-language queries.
  • methods: A Transformer-based method with content-conditioned queries: instead of a few slowly updated learnable queries, dynamic queries are conditioned on both the input video and the language to model the diverse objects being referred to. A fixed number of learnable bounding boxes are placed throughout the frame, and the corresponding region features provide prior information. To strengthen cross-modal alignment, specific phrases in the sentence are aligned with semantically relevant visual areas, annotated on existing video datasets (VID-Sentence and VidSTG).
  • results: The proposed model (ConFormer) outperforms other models on widely benchmarked datasets; for example, on the testing split of the VID-Sentence dataset it achieves an 8.75% absolute improvement in Accu.@0.6 over the previous state-of-the-art model.
    Abstract Video Referring Expression Comprehension (REC) aims to localize a target object in videos based on the queried natural language. Recent improvements in video REC have been made using Transformer-based methods with learnable queries. However, we contend that this naive query design is not ideal given the open-world nature of video REC brought by text supervision. With numerous potential semantic categories, relying on only a few slow-updated queries is insufficient to characterize them. Our solution to this problem is to create dynamic queries that are conditioned on both the input video and language to model the diverse objects referred to. Specifically, we place a fixed number of learnable bounding boxes throughout the frame and use corresponding region features to provide prior information. Also, we noticed that current query features overlook the importance of cross-modal alignment. To address this, we align specific phrases in the sentence with semantically relevant visual areas, annotating them in existing video datasets (VID-Sentence and VidSTG). By incorporating these two designs, our proposed model (called ConFormer) outperforms other models on widely benchmarked datasets. For example, in the testing split of VID-Sentence dataset, ConFormer achieves 8.75% absolute improvement on Accu.@0.6 compared to the previous state-of-the-art model.
    摘要 视频寻 Referring Expression Comprehension (REC) 目标是根据查询的自然语言来地址视频中的目标对象。 最近的改进方法使用 Transformer 基于方法,并使用可学习的查询。 但我们认为这种愚然的查询设计并不适合开放世界的视频 REC 中,因为它们可能会忽略多种可能的SemanticCategory。 我们的解决方案是创建动态的查询,它们是基于输入视频和语言来模型多种被引用的对象。 具体来说,我们在帧中预定一定数量的可学习的 bounding box,并使用相应的区域特征来提供先前信息。 同时,我们注意到现有的查询特征 ignore 视频和语言之间的协调。 为解决这个问题,我们将特定的句子在句子中与 semantically 相关的视觉区域进行对齐,并在现有的视频 dataset(VID-Sentence和 VidSTG)中进行标注。 通过这两种设计,我们的提议的模型(叫做 ConFormer)在广泛的标准化数据集上超越了其他模型。 例如,在 VID-Sentence 数据集的测试分区中,ConFormer 在 Accu.@0.6 上减少了8.75%的绝对改进,相比之前的状态对应模型。

ZGUL: Zero-shot Generalization to Unseen Languages using Multi-source Ensembling of Language Adapters

  • paper_url: http://arxiv.org/abs/2310.16393
  • repo_url: https://github.com/dair-iitd/zgul
  • paper_authors: Vipul Rathore, Rajdeep Dhingra, Parag Singla, Mausam
  • for: 这篇论文目的是解决zero-shot多语言传输问题在自然语言处理任务中。
  • methods: 论文使用语言适应器(LA)来实现多语言传输。LA通常是单个源语言(通常是英语)的适应器,在测试时使用目标语言或另一种相关语言的适应器。但是,训练目标语言的适应器需要无标签数据,这可能不太可能得到低资源的未看过语言:那些 neither seen by the underlying multilingual language model(例如,mBERT),也没有任何(标签或无标签)数据。因此,我们认为为更有效的跨语言传输,需要使用多个源语言的适应器,同时在训练和测试时使用它们。我们通过我们的新的神经网络架构ZGUL进行了调查。
  • results: 我们在四种语言组合中进行了广泛的实验,覆盖了15个未看过语言。结果表明,ZGUL比标准精度调整和其他强大基elines在POS标记和NER任务上提高了3.2个平均F1点。此外,我们还扩展了ZGUL,使其在有些未标签数据或少量训练示例available for the target language时也能够表现出色。在这些设置下,ZGUL仍然超过基elines。
    Abstract We tackle the problem of zero-shot cross-lingual transfer in NLP tasks via the use of language adapters (LAs). Most of the earlier works have explored training with adapter of a single source (often English), and testing either using the target LA or LA of another related language. Training target LA requires unlabeled data, which may not be readily available for low resource unseen languages: those that are neither seen by the underlying multilingual language model (e.g., mBERT), nor do we have any (labeled or unlabeled) data for them. We posit that for more effective cross-lingual transfer, instead of just one source LA, we need to leverage LAs of multiple (linguistically or geographically related) source languages, both at train and test-time - which we investigate via our novel neural architecture, ZGUL. Extensive experimentation across four language groups, covering 15 unseen target languages, demonstrates improvements of up to 3.2 average F1 points over standard fine-tuning and other strong baselines on POS tagging and NER tasks. We also extend ZGUL to settings where either (1) some unlabeled data or (2) few-shot training examples are available for the target language. We find that ZGUL continues to outperform baselines in these settings too.
    摘要 我们通过语言适配器(LA)来解决自然语言处理任务中的零样本跨语言迁移问题。先前的大多数工作都是使用单一源语言(通常是英语)的适配器进行训练,然后在测试时使用目标语言或另一种相关语言的适配器。然而,训练目标语言的适配器需要无标注数据,而对于低资源的未见语言(即底层多语言模型如 mBERT 从未见过、且没有任何有标注或无标注数据的语言),这类数据往往难以获得。我们认为,为了实现更有效的跨语言迁移,不应只依赖单一源语言的适配器,而应在训练和测试时同时利用多种(语言学上或地理上相关的)源语言适配器。为此,我们提出了一种新的神经网络架构 ZGUL。我们在四个语言组、共 15 种未见目标语言上进行了广泛实验,结果表明 ZGUL 在词性标注和命名实体识别任务上比标准微调及其他强基线平均提升最多 3.2 个 F1 点。我们还将 ZGUL 扩展到目标语言有少量无标注数据或少量训练样本的设置,发现它在这些设置下仍然优于基线。
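A minimal way to picture the multi-source ensembling above is to run the hidden state through several source-language bottleneck adapters and combine their outputs with learned weights. The sketch below assumes (batch, sequence, hidden)-shaped inputs and a simple softmax mixture; ZGUL's actual fusion is more elaborate.

```python
import torch
import torch.nn as nn

def bottleneck_adapter(hidden_size, bottleneck=64):
    """A standard down-project / up-project language adapter block."""
    return nn.Sequential(nn.Linear(hidden_size, bottleneck), nn.ReLU(),
                         nn.Linear(bottleneck, hidden_size))

class EnsembledLanguageAdapters(nn.Module):
    """Combine several source-language adapters for an unseen target language."""
    def __init__(self, adapters):
        super().__init__()
        self.adapters = nn.ModuleList(adapters)
        self.mix_logits = nn.Parameter(torch.zeros(len(adapters)))

    def forward(self, hidden_states):                        # (B, T, H)
        mix = torch.softmax(self.mix_logits, dim=0)          # (num_adapters,)
        stacked = torch.stack([a(hidden_states) for a in self.adapters])  # (A, B, T, H)
        fused = (mix.view(-1, 1, 1, 1) * stacked).sum(dim=0)
        return hidden_states + fused                          # residual connection

# ensemble = EnsembledLanguageAdapters([bottleneck_adapter(768) for _ in range(4)])
```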

Transformer-based Live Update Generation for Soccer Matches from Microblog Posts

  • paper_url: http://arxiv.org/abs/2310.16368
  • repo_url: None
  • paper_authors: Masashi Oshika, Kosuke Yamada, Ryohei Sasano, Koichi Takeda
  • for: 这篇论文是为了生成来自推文的实时足球赛事更新,以便用户可以通过原始推文了解比赛的进程和激励。
  • methods: 该论文基于大型预训练语言模型,并实现了控制更新数量和减少重复更新的机制。
  • results: 该系统可以快速生成高质量的实时足球赛事更新,使用户可以快速了解比赛的进程和激励。
    Abstract It has been known to be difficult to generate adequate sports updates from a sequence of vast amounts of diverse live tweets, although the live sports viewing experience with tweets is gaining the popularity. In this paper, we focus on soccer matches and work on building a system to generate live updates for soccer matches from tweets so that users can instantly grasp a match's progress and enjoy the excitement of the match from raw tweets. Our proposed system is based on a large pre-trained language model and incorporates a mechanism to control the number of updates and a mechanism to reduce the redundancy of duplicate and similar updates.
    摘要 Live 体育更新从涂浮 tweets 是一个Difficult task,Despite the popularity of live sports viewing experience with tweets. In this paper, we focus on soccer matches and work on building a system to generate live updates for soccer matches from tweets, so that users can instantly grasp the progress of the match and enjoy the excitement of the match from raw tweets. Our proposed system is based on a large pre-trained language model and incorporates a mechanism to control the number of updates and a mechanism to reduce the redundancy of duplicate and similar updates.

From Simple to Complex: A Progressive Framework for Document-level Informative Argument Extraction

  • paper_url: http://arxiv.org/abs/2310.16358
  • repo_url: https://github.com/zhangyx0417/simple_to_complex
  • paper_authors: Quzhe Huang, Yanxi Zhang, Dongyan Zhao
  • for: 这个论文旨在提高文档级事件抽象EXTRACTION(EAE)的准确率。
  • methods: 该论文提出了一种简单到复杂的推进 Framework,通过计算每个事件的难度,然后按照简单到复杂的顺序进行抽象。这样,模型可以使用更可靠的结果来帮助预测更加困难的事件。
  • results: 在WikiEvents数据集上进行实验,该模型的F1分数高于SOTA的1.4%,表明提出的简单到复杂推进 Framework 对EAE任务有用。
    Abstract Document-level Event Argument Extraction (EAE) requires the model to extract arguments of multiple events from a single document. Considering the underlying dependencies between these events, recent efforts leverage the idea of "memory", where the results of already predicted events are cached and can be retrieved to help the prediction of upcoming events. These methods extract events according to their appearance order in the document, however, the event that appears in the first sentence does not mean that it is the easiest to extract. Existing methods might introduce noise to the extraction of upcoming events if they rely on an incorrect prediction of previous events. In order to provide more reliable memory, we propose a simple-to-complex progressive framework for document-level EAE. Specifically, we first calculate the difficulty of each event and then, we conduct the extraction following a simple-to-complex order. In this way, the memory will store the most certain results, and the model could use these reliable sources to help the prediction of more difficult events. Experiments on WikiEvents show that our model outperforms SOTA by 1.4% in F1, indicating the proposed simple-to-complex framework is useful in the EAE task.
    摘要 文档级事件功能抽取(EAE)需要模型从单个文档中提取多个事件的功能。尽管这些事件之间存在蕴含的依赖关系,但现有的方法却是通过“记忆”的想法来实现,即已经预测过的事件的结果被缓存并可以用于帮助预测未来的事件。这些方法通常是按照文档中事件的出现顺序来提取事件,但是第一个事件的出现并不意味着它是最容易提取的。现有的方法可能会对后续事件的提取引入噪音,如果它们基于错误的先前事件预测。为了提供更可靠的记忆,我们提议一种简单到复杂的进程式框架 для文档级EAE。具体来说,我们首先计算每个事件的难度,然后按照简单到复杂的顺序进行提取。这样,记忆将存储最可靠的结果,并且模型可以使用这些可靠的源来帮助预测更加困难的事件。在WikiEvents上的实验表明,我们的模型在F1指标上比SOTA高出1.4%,这表明我们提议的简单到复杂的框架是EAE任务中有用的。
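The simple-to-complex schedule above amounts to re-ordering extraction by an estimated difficulty, so that the memory only ever caches relatively reliable earlier predictions. Here is a minimal sketch with assumed callables for the difficulty scorer and the argument extractor.

```python
def progressive_extract(events, difficulty, extract):
    """difficulty(event) -> float (lower = easier); extract(event, memory) -> args.
    Both callables are assumed; the point is the easy-to-hard ordering with memory."""
    memory = []                                       # cached (event, predicted_args)
    for event in sorted(events, key=difficulty):      # easiest events are handled first
        args = extract(event, memory)                 # prediction may attend to memory
        memory.append((event, args))
    return memory
```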

A Multi-Modal Multilingual Benchmark for Document Image Classification

  • paper_url: http://arxiv.org/abs/2310.16356
  • repo_url: None
  • paper_authors: Yoshinari Fujinuma, Siddharth Varia, Nishant Sankaran, Srikar Appalaraju, Bonan Min, Yogarshi Vyas
  • for: 本研究旨在提供更多和更好的文档图像分类数据集,以便进一步研究文档图像分类技术。
  • methods: 本研究使用了两个新的多语言文档图像数据集:WIKI-DOC和MULTIEURLEX-DOC,并对现有的文档图像分类模型进行了全面的测试。
  • results: 实验结果表明,当文档图像分类模型在不同语言之间进行零例转移时,其限制性很大,需要进一步改进。
    Abstract Document image classification is different from plain-text document classification and consists of classifying a document by understanding the content and structure of documents such as forms, emails, and other such documents. We show that the only existing dataset for this task (Lewis et al., 2006) has several limitations and we introduce two newly curated multilingual datasets WIKI-DOC and MULTIEURLEX-DOC that overcome these limitations. We further undertake a comprehensive study of popular visually-rich document understanding or Document AI models in previously untested setting in document image classification such as 1) multi-label classification, and 2) zero-shot cross-lingual transfer setup. Experimental results show limitations of multilingual Document AI models on cross-lingual transfer across typologically distant languages. Our datasets and findings open the door for future research into improving Document AI models.
    摘要 文档图像分类与文本文档分类不同,它涉及到理解文档的内容和结构,如表格、电子邮件等。我们表明现有的数据集(Lewis et al., 2006)有几个限制,我们介绍了两个新的多语言数据集——WIKI-DOC和MULTIEURLEX-DOC,这两个数据集可以缓解这些限制。我们进行了详细的文档理解或文档AI模型在文档图像分类中的测试,包括1)多个标签分类和2)零shot跨语言传输设置。实验结果显示跨语言文档AI模型在跨语言传输中存在限制。我们的数据集和发现开启了未来文档AI模型的改进研究的大门。

Unraveling Feature Extraction Mechanisms in Neural Networks

  • paper_url: http://arxiv.org/abs/2310.16350
  • repo_url: https://github.com/richardsun-voyager/ufemnn
  • paper_authors: Xiaobing Sun, Jiaxi Li, Wei Lu
  • for: investigate the underlying mechanisms of neural networks in capturing precise knowledge
  • methods: based on Neural Tangent Kernels (NTKs) to analyze the learning dynamics of target models
  • results: discovered that the choice of activation function can affect feature extraction, and that multiplication-based models excel in learning n-grams.
    Abstract The underlying mechanism of neural networks in capturing precise knowledge has been the subject of consistent research efforts. In this work, we propose a theoretical approach based on Neural Tangent Kernels (NTKs) to investigate such mechanisms. Specifically, considering the infinite network width, we hypothesize the learning dynamics of target models may intuitively unravel the features they acquire from training data, deepening our insights into their internal mechanisms. We apply our approach to several fundamental models and reveal how these models leverage statistical features during gradient descent and how they are integrated into final decisions. We also discovered that the choice of activation function can affect feature extraction. For instance, the use of the \textit{ReLU} activation function could potentially introduce a bias in features, providing a plausible explanation for its replacement with alternative functions in recent pre-trained language models. Additionally, we find that while self-attention and CNN models may exhibit limitations in learning n-grams, multiplication-based models seem to excel in this area. We verify these theoretical findings through experiments and find that they can be applied to analyze language modeling tasks, which can be regarded as a special variant of classification. Our contributions offer insights into the roles and capacities of fundamental components within large language models, thereby aiding the broader understanding of these complex systems.
    摘要 大脑网络的内部机制如何捕捉精准知识,一直是研究的热点。在这项工作中,我们提出了基于神经 Tangent Kernels(NTK)的理论方法,以探索这些机制。具体来说,我们假设在训练数据上,目标模型的学习过程可以直观地解释它们从数据中继承的特征,从而深入了解它们的内部机制。我们应用我们的方法到一些基本模型上,揭示了这些模型在梯度下降过程中如何从数据中提取特征,以及如何将这些特征集成到最终决策中。我们还发现了活动函数的选择可能会影响特征提取,例如使用 ReLU 活动函数可能会引入偏见,从而解释其在最新的预训练语言模型中的替换。此外,我们发现了自注意力和 CNN 模型可能会在学习 n-grams 方面存在限制,而乘法基本模型则在这一方面表现出色。我们通过实验验证了这些理论发现,并发现它们可以应用于语言模型Task中,这可以视为一种特殊的分类任务。我们的贡献可以帮助我们更好地理解这些复杂系统中的基本组件的角色和能力,从而推动大脑网络的进一步发展。
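The Neural Tangent Kernel analysis above can be made tangible by computing an empirical NTK for a small model: the kernel entry for two inputs is the inner product of the gradients of the (scalar) output with respect to all parameters. The sketch below is a brute-force illustration suited only to tiny models.

```python
import torch

def empirical_ntk(model, inputs_a, inputs_b):
    """Brute-force empirical NTK: K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>.
    Assumes model(x) returns a scalar (or is reduced to one by summing)."""
    params = [p for p in model.parameters() if p.requires_grad]

    def flat_grad(x):
        out = model(x.unsqueeze(0)).sum()
        grads = torch.autograd.grad(out, params)
        return torch.cat([g.reshape(-1) for g in grads])

    rows = []
    for xa in inputs_a:
        ga = flat_grad(xa)
        rows.append(torch.stack([ga @ flat_grad(xb) for xb in inputs_b]))
    return torch.stack(rows)                # (len(inputs_a), len(inputs_b)) kernel

# e.g. empirical_ntk(torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.ReLU(),
#                    torch.nn.Linear(16, 1)), torch.randn(3, 4), torch.randn(3, 4))
```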

A Comprehensive Evaluation of Constrained Text Generation for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.16343
  • repo_url: None
  • paper_authors: Xiang Chen, Xiaojun Wan
  • for: This paper aims to investigate the integration of intricate constraints into neural text generation using large language models (LLMs).
  • methods: The study employs multiple LLMs, including ChatGPT and GPT-4, and categorizes constraints into lexical, structural, and relation-based types. The authors present various benchmarks to facilitate fair evaluation.
  • results: The study reveals LLMs’ capacity and deficiency to incorporate constraints, providing insights for future developments in constrained text generation.
    Abstract Advancements in natural language generation (NLG) and large language models (LLMs) have led to proficient text generation in various tasks. However, integrating intricate constraints into neural text generation, due to LLMs' opacity, remains challenging. This study investigates constrained text generation for LLMs, where predefined constraints are applied during LLM's generation process. Our research examines multiple LLMs, including ChatGPT and GPT-4, categorizing constraints into lexical, structural, and relation-based types. We also present various benchmarks to facilitate fair evaluation. The study addresses some key research questions, including the extent of LLMs' compliance with constraints. Results illuminate LLMs' capacity and deficiency to incorporate constraints and provide insights for future developments in constrained text generation. Codes and datasets will be released upon acceptance.
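Of the three constraint types studied above, lexical constraints are the simplest to evaluate automatically: the generation must contain every required keyword. The checker below is a naive illustrative version (whole-word, case-insensitive matching); structural and relation-based constraints need task-specific checkers.

```python
import re

def satisfies_lexical_constraints(generation, required_words):
    """True if every required word appears as a whole word in the generation."""
    tokens = set(re.findall(r"[a-z0-9']+", generation.lower()))
    return all(word.lower() in tokens for word in required_words)

# satisfies_lexical_constraints("A dog chased the ball across the park.",
#                               ["dog", "ball", "park"])   # -> True
```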

RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models

  • paper_url: http://arxiv.org/abs/2310.16340
  • repo_url: None
  • paper_authors: Zefan Wang, Zichuan Liu, Yingying Zhang, Aoxiao Zhong, Lunting Fan, Lingfei Wu, Qingsong Wen
  • for: 这个论文旨在提出一种自主代理工具框架,以便在实际工业环境中进行自主和隐私保护的根本原因分析(RCA)。
  • methods: 该框架使用了一些增强技术,包括自我一致性和多种上下文管理、稳定化和知识导入方法。
  • results: 实验结果显示,与ReAct相比,RCAgent在多个方面(包括根本原因预测、解决方案、证据和责任等)具有明显的优势,并且已经成功地 интеGRATED到了阿里巴巴云的Real-time Compute Platform for Apache Flink的诊断和问题探索工作流程中。
    Abstract Large language model (LLM) applications in cloud root cause analysis (RCA) have been actively explored recently. However, current methods are still reliant on manual workflow settings and do not unleash LLMs' decision-making and environment interaction capabilities. We present RCAgent, a tool-augmented LLM autonomous agent framework for practical and privacy-aware industrial RCA usage. Running on an internally deployed model rather than GPT families, RCAgent is capable of free-form data collection and comprehensive analysis with tools. Our framework combines a variety of enhancements, including a unique Self-Consistency for action trajectories, and a suite of methods for context management, stabilization, and importing domain knowledge. Our experiments show RCAgent's evident and consistent superiority over ReAct across all aspects of RCA -- predicting root causes, solutions, evidence, and responsibilities -- and tasks covered or uncovered by current rules, as validated by both automated metrics and human evaluations. Furthermore, RCAgent has already been integrated into the diagnosis and issue discovery workflow of the Real-time Compute Platform for Apache Flink of Alibaba Cloud.

Generative Pre-training for Speech with Flow Matching

  • paper_url: http://arxiv.org/abs/2310.16338
  • repo_url: None
  • paper_authors: Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu
  • for: 这个论文的目的是建立一个基础模型来进行语音生成任务。
  • methods: 这个论文使用的方法是在60000小时的不分译语音数据上进行流匹配和masked condition的预训练,然后根据任务特定的数据进行细化。
  • results: 实验结果显示,预训练后的生成模型可以与专家模型一样或超过它们在语音增强、分离和合成等下游任务中表现。这些结果建议了一个基于生成预训练的基础模型可以为语音生成任务提供支持。
    Abstract Generative models have gained more and more attention in recent years for their remarkable success in tasks that required estimating and sampling data distribution to generate high-fidelity synthetic data. In speech, text-to-speech synthesis and neural vocoder are good examples where generative models have shined. While generative models have been applied to different applications in speech, there exists no general-purpose generative model that models speech directly. In this work, we take a step toward this direction by showing a single pre-trained generative model can be adapted to different downstream tasks with strong performance. Specifically, we pre-trained a generative model, named SpeechFlow, on 60k hours of untranscribed speech with Flow Matching and masked conditions. Experiment results show the pre-trained generative model can be fine-tuned with task-specific data to match or surpass existing expert models on speech enhancement, separation, and synthesis. Our work suggested a foundational model for generation tasks in speech can be built with generative pre-training.
    摘要 生成模型近年来受到越来越多的关注,因为它们在估计和采样数据分布、生成高保真合成数据方面取得了显著成功。在语音领域,文本转语音合成和神经声码器就是生成模型大放异彩的例子。然而,目前还没有一个直接对语音本身建模的通用生成模型。在这项工作中,我们朝这个方向迈出了一步,展示了单个预训练生成模型经过适配后可以在不同下游任务中取得强劲表现。具体而言,我们在 6 万小时无转录语音上,采用 Flow Matching 和掩码条件对生成模型 SpeechFlow 进行预训练。实验结果表明,该预训练生成模型可以用任务特定数据进行微调,在语音增强、分离与合成等任务上达到或超过现有专家模型的表现。我们的工作表明,语音生成任务的基础模型可以通过生成式预训练来构建。
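The pre-training objective described above can be illustrated with a standard conditional flow-matching loss: sample a time t and a noise sample, interpolate toward the speech features, and regress the model's predicted vector field onto the straight-line target velocity. This is the generic formulation, with SpeechFlow's masking and conditioning omitted; the model signature is an assumption.

```python
import torch

def flow_matching_loss(vector_field, x1, sigma_min=1e-4):
    """Conditional flow matching on a batch of speech features x1 of shape (B, T, F).
    vector_field(x_t, t) is an assumed callable returning a tensor shaped like x1."""
    x0 = torch.randn_like(x1)                                  # noise sample
    t = torch.rand(x1.size(0), 1, 1, device=x1.device)         # per-example time in [0, 1]
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1          # point on the probability path
    target = x1 - (1.0 - sigma_min) * x0                       # straight-line target velocity
    return ((vector_field(x_t, t.squeeze()) - target) ** 2).mean()
```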

Samsung R&D Institute Philippines at WMT 2023

  • paper_url: http://arxiv.org/abs/2310.16322
  • repo_url: None
  • paper_authors: Jan Christian Blaise Cruz
  • for: 这 paper 是为了描述 Samsung R&D Institute Philippines 在 WMT 2023 通用翻译任务中提交的受限MT 系统,包括 en$\rightarrow$he 和 he$\rightarrow$en 两个方向。
  • methods: 这些系统采用了一系列最佳实践,包括全面的数据处理管道、人工生成的反向翻译数据和在线解码中使用噪声通道重新排序。
  • results: 这些模型在两个公共测试集(FLORES-200 和 NTREX-128)上的表现与强基线系统相当,甚至偶尔更优,尽管参数数量明显更少。
    Abstract In this paper, we describe the constrained MT systems submitted by Samsung R&D Institute Philippines to the WMT 2023 General Translation Task for two directions: en$\rightarrow$he and he$\rightarrow$en. Our systems comprise of Transformer-based sequence-to-sequence models that are trained with a mix of best practices: comprehensive data preprocessing pipelines, synthetic backtranslated data, and the use of noisy channel reranking during online decoding. Our models perform comparably to, and sometimes outperform, strong baseline unconstrained systems such as mBART50 M2M and NLLB 200 MoE despite having significantly fewer parameters on two public benchmarks: FLORES-200 and NTREX-128.
    摘要 在这篇论文中,我们描述了 Samsung R&D Institute Philippines 提交到 WMT 2023 通用翻译任务的受限 MT 系统,包括 en$\rightarrow$he 和 he$\rightarrow$en 两个方向。我们的系统采用基于 Transformer 的序列到序列模型,并结合一系列最佳实践进行训练,包括全面的数据预处理管道、合成的回译数据,以及在线解码中的噪声通道重排序。在 FLORES-200 和 NTREX-128 两个公共基准上,我们的模型与 mBART50 M2M 和 NLLB 200 MoE 等强大的无约束基线系统表现相当,有时甚至更优,尽管参数数量明显更少。
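For reference, noisy channel reranking of an n-best list typically combines a direct model, a reverse (channel) model, and a language model as below; the interpolation weights and any length normalization are assumed to be tuned on a development set rather than taken from this system description.

```latex
% Noisy-channel reranking of an n-best list Y(x) produced by the direct model.
% lambda_1, lambda_2 are tuned interpolation weights (assumed); length
% normalization is omitted for brevity.
\[
\hat{y} = \operatorname*{arg\,max}_{y \in Y(x)}
  \Bigl[ \log p_{\mathrm{dir}}(y \mid x)
       + \lambda_1 \log p_{\mathrm{ch}}(x \mid y)
       + \lambda_2 \log p_{\mathrm{lm}}(y) \Bigr].
\]
```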

DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment

  • paper_url: http://arxiv.org/abs/2310.16319
  • repo_url: None
  • paper_authors: Yukun Zhao, Lingyong Yan, Weiwei Sun, Chong Meng, Shuaiqiang Wang, Zhicong Cheng, Zhaochun Ren, Dawei Yin
  • for: 本研究是为了提供一个大规模的对话质量评估数据集(DiQAD),用于自动评估开放领域对话质量。
  • methods: 本研究使用了基于人类对对话质量的评估标准来确定评估标准,然后对实际用户之间的大规模对话进行了标注。
  • results: 本研究通过多种实验,报告了基线的性能在DiQAD上。同时,也公开了这个数据集,可以供后续研究使用。
    Abstract Dialogue assessment plays a critical role in the development of open-domain dialogue systems. Existing work are uncapable of providing an end-to-end and human-epistemic assessment dataset, while they only provide sub-metrics like coherence or the dialogues are conversed between annotators far from real user settings. In this paper, we release a large-scale dialogue quality assessment dataset (DiQAD), for automatically assessing open-domain dialogue quality. Specifically, we (1) establish the assessment criteria based on the dimensions conforming to human judgements on dialogue qualities, and (2) annotate large-scale dialogues that conversed between real users based on these annotation criteria, which contains around 100,000 dialogues. We conduct several experiments and report the performances of the baselines as the benchmark on DiQAD. The dataset is openly accessible at https://github.com/yukunZhao/Dataset_Dialogue_quality_evaluation.
    摘要 对话评估在开放领域对话系统的发展中扮演着关键角色。现有工作无法提供端到端且符合人类判断的对话评估数据集:它们只提供诸如连贯性之类的子指标,或者其对话是由标注者之间进行的,与真实用户场景相去甚远。在这篇论文中,我们发布了一个大规模的对话质量评估数据集(DiQAD),用于自动评估开放领域对话质量。具体来说,我们(1)基于符合人类对对话质量判断的维度确定了评估标准,并(2)依据这些标准对真实用户之间的大规模对话进行了标注,共约 100,000 个对话。我们进行了多项实验,并报告了基线在 DiQAD 上的性能作为基准。数据集可在 https://github.com/yukunZhao/Dataset_Dialogue_quality_evaluation 公开获取。

URL-BERT: Training Webpage Representations via Social Media Engagements

  • paper_url: http://arxiv.org/abs/2310.16303
  • repo_url: None
  • paper_authors: Ayesha Qamar, Chetan Verma, Ahmed El-Kishky, Sumit Binnani, Sneha Mehta, Taylor Berg-Kirkpatrick
  • for: 本研究旨在适应语言模型(LM)理解和表示网页内容,提高在社交媒体上分享和参与URL时的表达能力。
  • methods: 本研究提出了一种新的预训练目标,可以使LM适应理解URL和网页内容,并通过用户在社交媒体上的互动来学习URL的表示。
  • results: 通过对多语言版本 BERT 进行继续预训练,我们通过实验证明了该框架可以在多种网页理解任务以及 Twitter 内部和外部 benchmark 上提升表现。
    Abstract Understanding and representing webpages is crucial to online social networks where users may share and engage with URLs. Common language model (LM) encoders such as BERT can be used to understand and represent the textual content of webpages. However, these representations may not model thematic information of web domains and URLs or accurately capture their appeal to social media users. In this work, we introduce a new pre-training objective that can be used to adapt LMs to understand URLs and webpages. Our proposed framework consists of two steps: (1) scalable graph embeddings to learn shallow representations of URLs based on user engagement on social media and (2) a contrastive objective that aligns LM representations with the aforementioned graph-based representation. We apply our framework to the multilingual version of BERT to obtain the model URL-BERT. We experimentally demonstrate that our continued pre-training approach improves webpage understanding on a variety of tasks and Twitter internal and external benchmarks.
    摘要 理解和表示网页对在线社交网络至关重要,因为用户会在其中分享和参与 URL。常见的语言模型(LM)编码器(如 BERT)可以用来理解和表示网页的文本内容。然而,这些表示可能无法建模网页域名和 URL 的主题信息,也无法准确捕捉它们对社交媒体用户的吸引力。在这项工作中,我们提出了一种新的预训练目标,用于使 LM 适应理解 URL 和网页。我们提出的框架包括两个步骤:(1)基于社交媒体上的用户互动,用可扩展的图嵌入学习 URL 的浅层表示;(2)用对比目标将 LM 表示与上述基于图的表示对齐。我们将该框架应用于多语言版本的 BERT,得到模型 URL-BERT。实验表明,我们的继续预训练方法在多种网页理解任务以及 Twitter 内部和外部基准上均有提升。

Is ChatGPT a Good Multi-Party Conversation Solver?

  • paper_url: http://arxiv.org/abs/2310.16301
  • repo_url: None
  • paper_authors: Chao-Hong Tan, Jia-Chen Gu, Zhen-Hua Ling
  • for: 本文旨在研究大型语言模型(LLM)在多方会话(MPC)中的能力。
  • methods: 本文在涵盖五种代表性任务的三个 MPC 数据集上,对 ChatGPT 和 GPT-4 进行零样本(zero-shot)评估。
  • results: 研究发现,ChatGPT 在一些 MPC 任务上表现不佳,而 GPT-4 的表现更为出色。此外,通过引入 MPC 结构(包括说话人和受话人信息),可以进一步提升表现。这项研究为在 MPC 中应用生成式 LLM 提供了全面的评估和分析,并揭示了构建更高效、更强大的 MPC 代理所面临的挑战。
    Abstract Large Language Models (LLMs) have emerged as influential instruments within the realm of natural language processing; nevertheless, their capacity to handle multi-party conversations (MPCs) -- a scenario marked by the presence of multiple interlocutors involved in intricate information exchanges -- remains uncharted. In this paper, we delve into the potential of generative LLMs such as ChatGPT and GPT-4 within the context of MPCs. An empirical analysis is conducted to assess the zero-shot learning capabilities of ChatGPT and GPT-4 by subjecting them to evaluation across three MPC datasets that encompass five representative tasks. The findings reveal that ChatGPT's performance on a number of evaluated MPC tasks leaves much to be desired, whilst GPT-4's results portend a promising future. Additionally, we endeavor to bolster performance through the incorporation of MPC structures, encompassing both speaker and addressee architecture. This study provides an exhaustive evaluation and analysis of applying generative LLMs to MPCs, casting a light upon the conception and creation of increasingly effective and robust MPC agents. Concurrently, this work underscores the challenges implicit in the utilization of LLMs for MPCs, such as deciphering graphical information flows and generating stylistically consistent responses.
    摘要 大型语言模型(LLM)已经成为自然语言处理领域的重要工具,但它们处理多方会话(MPC)——即多个对话者参与复杂信息交换的场景——的能力仍有待探索。在这篇论文中,我们探究了 ChatGPT 和 GPT-4 等生成式 LLM 在 MPC 中的潜力,并在涵盖五种代表性任务的三个 MPC 数据集上进行零样本评估。结果表明,ChatGPT 在许多评估任务上表现欠佳,而 GPT-4 的结果展现出良好的前景。此外,我们还尝试通过引入 MPC 结构(包括说话人和受话人架构)来提升性能。这项研究对将生成式 LLM 应用于 MPC 进行了全面的评估和分析,为构建更高效、更鲁棒的 MPC 代理提供了参考;同时也指出了将 LLM 用于 MPC 时面临的挑战,例如解读图状的信息流以及生成风格一致的回复。

The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining

  • paper_url: http://arxiv.org/abs/2310.16261
  • repo_url: None
  • paper_authors: Ting-Rui Chiang, Dani Yogatama
  • for: 研究 whether the better sample efficiency and generalization capability of masked language models can be attributed to the semantic similarity encoded in the pretraining data’s distributional property.
  • methods: 使用 synthetic dataset 和 two real-world datasets 进行分析。
  • results: 发现 distributional property 对预训练模型的样本效率有益,但不能完全解释模型的泛化能力。
    Abstract We analyze the masked language modeling pretraining objective function from the perspective of the distributional hypothesis. We investigate whether better sample efficiency and the better generalization capability of models pretrained with masked language modeling can be attributed to the semantic similarity encoded in the pretraining data's distributional property. Via a synthetic dataset, our analysis suggests that distributional property indeed leads to the better sample efficiency of pretrained masked language models, but does not fully explain the generalization capability. We also conduct analyses over two real-world datasets and demonstrate that the distributional property does not explain the generalization ability of pretrained natural language models either. Our results illustrate our limited understanding of model pretraining and provide future research directions.
    摘要 我们从分布性假设的角度分析掩码语言建模(masked language modeling)的预训练目标函数,研究掩码语言模型更好的样本效率和泛化能力是否可以归因于预训练数据分布性质中编码的语义相似性。通过一个合成数据集,我们的分析表明,分布性质确实带来了预训练掩码语言模型更好的样本效率,但并不能完全解释其泛化能力。我们还在两个真实数据集上进行了分析,结果同样表明分布性质无法解释预训练自然语言模型的泛化能力。我们的结果说明我们对模型预训练的理解仍然有限,并为未来研究提供了方向。

cs.LG - 2023-10-25

Strategizing EV Charging and Renewable Integration in Texas

  • paper_url: http://arxiv.org/abs/2310.17056
  • repo_url: None
  • paper_authors: Mohammad Mohammadi, Jesse Thornburg
  • for: 本研究旨在探讨电动汽车(EVs)、可再生能源和智能电网技术在德州上的整合,并解决电动汽车普及化困难所带来的挑战。
  • methods: 该研究使用时间扭曲 clustering(DTW)和k-means clustering方法对每天的总负荷和网络负荷进行分类,从而获得每天的电力消耗和可再生能源生产特征。此外,研究还提出了一种基于特定负荷特点的优化充电和汽车到电网(V2G)窗口的方法,以便更好地决策能源消耗和可再生资源的整合。
  • results: 研究发现,通过DTW clustering和k-means clustering方法可以分化不同的每天电力消耗和可再生能源生产特征,并且可以根据特定负荷特点设置优化的充电和V2G窗口,以提高智能电网的稳定性和可再生能源的使用率。这些发现对于实现可持续可靠的能源未来具有重要意义。
    Abstract Exploring the convergence of electric vehicles (EVs), renewable energy, and smart grid technologies in the context of Texas, this study addresses challenges hindering the widespread adoption of EVs. Acknowledging their environmental benefits, the research focuses on grid stability concerns, uncoordinated charging patterns, and the complicated relationship between EVs and renewable energy sources. Dynamic time warping (DTW) clustering and k-means clustering methodologies categorize days based on total load and net load, offering nuanced insights into daily electricity consumption and renewable energy generation patterns. By establishing optimal charging and vehicle-to-grid (V2G) windows tailored to specific load characteristics, the study provides a sophisticated methodology for strategic decision-making in energy consumption and renewable integration. The findings contribute to the ongoing discourse on achieving a sustainable and resilient energy future through the seamless integration of EVs into smart grids.
    摘要 研究electric vehicles (EVs)在德州的整合、可再生能源和智能网络技术方面进行了探索。本研究承认EVs具有环保的优点,但是对于网络稳定性、充电模式不协调和可再生能源源与EVs之间的复杂关系存在一些挑战。使用时间扭曲分 clustering和k-means分 clustering方法对每天的总负荷和网络负荷进行分类,从而提供了细化的每天电力消耗和可再生能源生产模式的洞察。通过确定最佳充电和汽车到网络(V2G)窗口,以适应特定的负荷特征,本研究提供了一种高级的决策方法,以便在能源消耗和可再生资源 интеграции方面做出策略决策。研究成果对于实现可持续可靠的能源未来做出了贡献。
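A minimal sketch of the clustering step described above, assuming hourly profiles (24 values per day); the cluster count, the agglomerative step over the DTW distance matrix, and the synthetic demo data are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a)*len(b)) dynamic-time-warping distance between two 1-D curves."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def cluster_days(daily_profiles: np.ndarray, n_clusters: int = 4):
    """daily_profiles: (n_days, 24) array of total or net load for each day."""
    n = len(daily_profiles)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw_distance(daily_profiles[i], daily_profiles[j])
    # DTW-based grouping on the precomputed distance matrix
    dtw_labels = AgglomerativeClustering(
        n_clusters=n_clusters, metric="precomputed", linkage="average"
    ).fit_predict(dist)
    # plain k-means on the raw daily vectors, as a second view of the same days
    km_labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(daily_profiles)
    return dtw_labels, km_labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(50, 24)).cumsum(axis=1)  # synthetic stand-in daily load curves
    print(cluster_days(demo))
```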

Early Detection of Tuberculosis with Machine Learning Cough Audio Analysis: Towards More Accessible Global Triaging Usage

  • paper_url: http://arxiv.org/abs/2310.17675
  • repo_url: None
  • paper_authors: Chandra Suda
  • for: 这项研究旨在改进结核病(TB)诊断,开发一个快速、可靠、易于获取的分诊工具。
  • methods: 研究使用一种新的机器学习架构,对智能手机麦克风采集的咳嗽音频和人口统计信息进行分析,以检测结核病。
  • results: 模型取得了 88% 的 AUROC,超过世界卫生组织对筛查测试的要求,且结果可在 15 秒内给出。
    Abstract Tuberculosis (TB), a bacterial disease mainly affecting the lungs, is one of the leading infectious causes of mortality worldwide. To prevent TB from spreading within the body, which causes life-threatening complications, timely and effective anti-TB treatment is crucial. Cough, an objective biomarker for TB, is a triage tool that monitors treatment response and regresses with successful therapy. Current gold standards for TB diagnosis are slow or inaccessible, especially in rural areas where TB is most prevalent. In addition, current machine learning (ML) diagnosis research, like utilizing chest radiographs, is ineffective and does not monitor treatment progression. To enable effective diagnosis, an ensemble model was developed that analyzes, using a novel ML architecture, coughs' acoustic epidemiologies from smartphones' microphones to detect TB. The architecture includes a 2D-CNN and XGBoost that was trained on 724,964 cough audio samples and demographics from 7 countries. After feature extraction (Mel-spectrograms) and data augmentation (IR-convolution), the model achieved AUROC (area under the receiving operator characteristic) of 88%, surpassing WHO's requirements for screening tests. The results are available within 15 seconds and can easily be accessible via a mobile app. This research helps to improve TB diagnosis through a promising accurate, quick, and accessible triaging tool.
    摘要 抑菌疾病(TB)是全球主要传染性疾病之一,主要影响肺部。在时间上采取有效抑菌治疗是关键,以防TB在身体内进一步扩散并导致生命威胁性的合并症。咳嗽是TB的对象生物标志,可以评估治疗效果,并随着成功治疗而逐渐下降。但目前的TB诊断标准过于慢或不可达,尤其是在乡村地区,TB的发病率最高。此外,目前的机器学习(ML)诊断研究,如使用胸部X射线,无法准确诊断TB。为了实现有效的诊断,我们开发了一个ensemble模型,利用智能手机麦克风的声音样本来检测TB。该模型包括2D-CNN和XGBoost,并在7个国家的724,964个咳嗽声音样本和人口统计数据上进行训练。 после特征提取(Mel-spectrograms)和数据增强(IR-convolution),模型达到了AUROC(区域下收益特征)的88%,超过世界卫生组织(WHO)的诊断测试标准。结果在15秒内可以获得,并可以通过移动应用程序访问。这项研究可以改善TB诊断,提供一个准确、快速、可 accessible的检测工具。
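A rough sketch of the audio front end and classifier head implied by the description above; the sample rate, mel resolution, network sizes, and the omission of the XGBoost/demographics branch of the ensemble are all assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import librosa

def log_mel(path: str, sr: int = 16000, n_mels: int = 64) -> torch.Tensor:
    """Load a cough recording and return a (1, n_mels, T) log-mel spectrogram tensor."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return torch.from_numpy(librosa.power_to_db(mel)).float().unsqueeze(0)

class CoughCNN(nn.Module):
    """Tiny 2D CNN producing a single TB-vs-non-TB logit from a spectrogram."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, 1, n_mels, T)
        return self.head(self.features(x).flatten(1))
```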

Learning to Rank for Active Learning via Multi-Task Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2310.17044
  • repo_url: None
  • paper_authors: Zixin Ding, Si Chen, Ruoxi Jia, Yuxin Chen
  • for: 提高活动学习效率和可行性,降低标注成本。
  • methods: 提出了一种新的活动学习方法,通过学习的搅拌模型来选择未标注的实例。
  • results: 通过实验表明,使用我们的方法可以在标注成本高的情况下提高活动学习的效率和可行性。
    Abstract Active learning is a promising paradigm to reduce the labeling cost by strategically requesting labels to improve model performance. However, existing active learning methods often rely on expensive acquisition function to compute, extensive modeling retraining and multiple rounds of interaction with annotators. To address these limitations, we propose a novel approach for active learning, which aims to select batches of unlabeled instances through a learned surrogate model for data acquisition. A key challenge in this approach is developing an acquisition function that generalizes well, as the history of data, which forms part of the utility function's input, grows over time. Our novel algorithmic contribution is a bilevel multi-task bilevel optimization framework that predicts the relative utility -- measured by the validation accuracy -- of different training sets, and ensures the learned acquisition function generalizes effectively. For cases where validation accuracy is expensive to evaluate, we introduce efficient interpolation-based surrogate models to estimate the utility function, reducing the evaluation cost. We demonstrate the performance of our approach through extensive experiments on standard active classification benchmarks. By employing our learned utility function, we show significant improvements over traditional techniques, paving the way for more efficient and effective utility maximization in active learning applications.
    摘要 活动学习是一种有前途的思想,可以减少标注成本,通过策略性地请求标注,提高模型性能。然而,现有的活动学习方法经常依赖于贵重的获取函数来计算,需要广泛的模型重新训练和多轮与注解员的互动。为了解决这些限制,我们提出了一种新的活动学习方法,通过一个学习的代理模型来选择未标注的实例集。我们的新算法贡献是一种缓中多任务缓中优化框架,可以预测不同训练集的相对utilty值,并确保学习得到的获取函数可以广泛适用。在评估utilty值时成本高的情况下,我们引入了高效的 interpolate-based 代理模型,以便估计获取函数,降低评估成本。我们通过对标准的活动分类benchmark进行广泛的实验,证明了我们的方法的性能优势,开拓了更有效率的活动学习应用。

Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting

  • paper_url: http://arxiv.org/abs/2310.17032
  • repo_url: None
  • paper_authors: Saad Zafar Khan, Nazeefa Muzammil, Syed Mohammad Hassan Zaidi, Abdulah Jeza Aljohani, Haibat Khan, Salman Ghafoor
  • for: 准确预测太阳能发电量对现代可再生能源系统至关重要。本研究比较了量子长短期记忆网络(QLSTM)与经典长短期记忆网络(LSTM)在太阳能发电预测中的表现。
  • methods: 在受控实验中,以相同的太阳能发电数据分别训练 QLSTM 与经典 LSTM,比较二者的训练收敛速度与测试损失。
  • results: 实验表明,QLSTM 的训练收敛更快,且在最初的训练轮次内测试损失就显著低于经典 LSTM,说明 QLSTM 在捕捉复杂时间序列关系方面具有潜在优势。
    Abstract Accurately forecasting solar power generation is crucial in the global progression towards sustainable energy systems. In this study, we conduct a meticulous comparison between Quantum Long Short-Term Memory (QLSTM) and classical Long Short-Term Memory (LSTM) models for solar power production forecasting. Our controlled experiments reveal promising advantages of QLSTMs, including accelerated training convergence and substantially reduced test loss within the initial epoch compared to classical LSTMs. These empirical findings demonstrate QLSTM's potential to swiftly assimilate complex time series relationships, enabled by quantum phenomena like superposition. However, realizing QLSTM's full capabilities necessitates further research into model validation across diverse conditions, systematic hyperparameter optimization, hardware noise resilience, and applications to correlated renewable forecasting problems. With continued progress, quantum machine learning can offer a paradigm shift in renewable energy time series prediction. This pioneering work provides initial evidence substantiating quantum advantages over classical LSTM, while acknowledging present limitations. Through rigorous benchmarking grounded in real-world data, our study elucidates a promising trajectory for quantum learning in renewable forecasting. Additional research and development can further actualize this potential to achieve unprecedented accuracy and reliability in predicting solar power generation worldwide.
    摘要 Forecasting solar power generation accurately is crucial in the global transition to sustainable energy systems. In this study, we compare the Quantum Long Short-Term Memory (QLSTM) and classical Long Short-Term Memory (LSTM) models for solar power production forecasting. Our experiments show that QLSTMs have several advantages, such as faster training convergence and lower test loss within the initial epoch, compared to classical LSTMs. These findings demonstrate the potential of QLSTM to quickly learn complex time series relationships, thanks to quantum phenomena like superposition. However, further research is needed to validate the model across different conditions, optimize hyperparameters, and improve hardware noise resilience. Additionally, we need to explore the application of QLSTM to correlated renewable forecasting problems. With continued progress, quantum machine learning can offer a paradigm shift in renewable energy time series prediction. This study provides initial evidence of quantum advantages over classical LSTM, while acknowledging present limitations. Through rigorous benchmarking with real-world data, we elucidate a promising trajectory for quantum learning in renewable forecasting, paving the way for unprecedented accuracy and reliability in predicting solar power generation worldwide.

On the Identifiability and Interpretability of Gaussian Process Models

  • paper_url: http://arxiv.org/abs/2310.17023
  • repo_url: https://github.com/jiawenchenn/gp_mixture_kernel
  • paper_authors: Jiawen Chen, Wancen Mu, Yun Li, Didong Li
  • for: 本文探讨了在单输出 Gaussian Process(GP)模型中广泛使用的添加性杂合Mat'ernkernel的做法,并研究了这种杂合kernel的性质。
  • methods: 作者在单输出和多输出GP模型中使用了不同的方法,包括deriving theoretical results和进行 simulations和实际应用。
  • results: 研究发现,在单输出情况下,加性 Matérn 混合核的平滑度由其中最不平滑的分量决定,使用这种混合核的 GP 实际上等价于该最不平滑的核分量,且各分量的混合权重和参数均不可辨识。在多输出情况下,乘性核 $K(x,y)=AK_0(x,y)$ 中的协方差矩阵 $A$ 在相差一个乘性常数的意义下是可辨识的,表明乘性混合核适合多输出任务。这些结论得到了大量模拟和实际应用的支持。
    Abstract In this paper, we critically examine the prevalent practice of using additive mixtures of Mat\'ern kernels in single-output Gaussian process (GP) models and explore the properties of multiplicative mixtures of Mat\'ern kernels for multi-output GP models. For the single-output case, we derive a series of theoretical results showing that the smoothness of a mixture of Mat\'ern kernels is determined by the least smooth component and that a GP with such a kernel is effectively equivalent to the least smooth kernel component. Furthermore, we demonstrate that none of the mixing weights or parameters within individual kernel components are identifiable. We then turn our attention to multi-output GP models and analyze the identifiability of the covariance matrix $A$ in the multiplicative kernel $K(x,y) = AK_0(x,y)$, where $K_0$ is a standard single output kernel such as Mat\'ern. We show that $A$ is identifiable up to a multiplicative constant, suggesting that multiplicative mixtures are well suited for multi-output tasks. Our findings are supported by extensive simulations and real applications for both single- and multi-output settings. This work provides insight into kernel selection and interpretation for GP models, emphasizing the importance of choosing appropriate kernel structures for different tasks.
    摘要 在本文中,我们critically examines the prevailing practice of using additive mixtures of Matérn kernels in single-output Gaussian process (GP) models, and explore the properties of multiplicative mixtures of Matérn kernels for multi-output GP models. For the single-output case, we derive a series of theoretical results showing that the smoothness of a mixture of Matérn kernels is determined by the least smooth component, and that a GP with such a kernel is effectively equivalent to the least smooth kernel component. Furthermore, we demonstrate that none of the mixing weights or parameters within individual kernel components are identifiable. We then turn our attention to multi-output GP models and analyze the identifiability of the covariance matrix $A$ in the multiplicative kernel $K(x,y) = AK_0(x,y)$, where $K_0$ is a standard single output kernel such as Matérn. We show that $A$ is identifiable up to a multiplicative constant, suggesting that multiplicative mixtures are well suited for multi-output tasks. Our findings are supported by extensive simulations and real applications for both single- and multi-output settings. This work provides insight into kernel selection and interpretation for GP models, emphasizing the importance of choosing appropriate kernel structures for different tasks.
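A small, self-contained illustration of the single-output finding above using scikit-learn's Matérn kernels: an additive mixture of a rough and a smooth component produces prior samples as rough as the least smooth component. Length scales and mixture weights are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

X = np.linspace(0, 1, 200)[:, None]
rough = Matern(length_scale=0.2, nu=0.5)     # non-differentiable sample paths
smooth = Matern(length_scale=0.2, nu=2.5)    # twice-differentiable sample paths
mixture = 0.5 * rough + 0.5 * smooth         # additive Matern mixture

# Prior samples from the mixture look as rough as the nu=0.5 component,
# regardless of the smooth component's weight.
for name, k in [("rough", rough), ("smooth", smooth), ("mixture", mixture)]:
    gp = GaussianProcessRegressor(kernel=k, random_state=0)
    sample = gp.sample_y(X, n_samples=1, random_state=0)
    # crude proxy for sample-path roughness: mean absolute increment
    print(name, float(np.mean(np.abs(np.diff(sample[:, 0])))))
```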

Streaming Factor Trajectory Learning for Temporal Tensor Decomposition

  • paper_url: http://arxiv.org/abs/2310.17021
  • repo_url: https://github.com/xuangu-fang/streaming-factor-trajectory-learning
  • paper_authors: Shikai Fang, Xin Yu, Shibo Li, Zheng Wang, Robert Kirby, Shandian Zhe
  • for: 这篇论文是为了解决实际应用中的流动数据问题,即如何有效地捕捉流动数据中对象的表示的时间演化。
  • methods: 该论文提出了流动因子轨迹学习(SFTL)方法,使用 Gaussian processes(GPs)来模型因子轨迹的时间演化,并通过将 GPs 转换为状态空间假设来处理流动数据的计算挑战。
  • results: 论文通过合成任务和真实应用表明了 SFTL 的优势:它能有效捕捉流数据中对象表示的时间演化,并且无需回看历史数据即可并行地对所有轨迹执行标准的 Rauch-Tung-Striebel 平滑。
    Abstract Practical tensor data is often along with time information. Most existing temporal decomposition approaches estimate a set of fixed factors for the objects in each tensor mode, and hence cannot capture the temporal evolution of the objects' representation. More important, we lack an effective approach to capture such evolution from streaming data, which is common in real-world applications. To address these issues, we propose Streaming Factor Trajectory Learning (SFTL) for temporal tensor decomposition. We use Gaussian processes (GPs) to model the trajectory of factors so as to flexibly estimate their temporal evolution. To address the computational challenges in handling streaming data, we convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE). We develop an efficient online filtering algorithm to estimate a decoupled running posterior of the involved factor states upon receiving new data. The decoupled estimation enables us to conduct standard Rauch-Tung-Striebel smoothing to compute the full posterior of all the trajectories in parallel, without the need for revisiting any previous data. We have shown the advantage of SFTL in both synthetic tasks and real-world applications.
    摘要 实际tensor数据经常同时间信息一起出现。现有的 temporal decomposition方法大多Estimate a set of fixed factors for the objects in each tensor mode, 因此无法捕捉对象表示的 temporal evolution。更重要的是,我们缺乏有效的方法来从流动数据中捕捉这种演化。为了解决这些问题,我们提出了Streaming Factor Trajectory Learning (SFTL) temporal tensor decomposition方法。我们使用 Gaussian processes (GPs) 来模型因子的轨迹,以便灵活地估计其时间演化。为了处理流动数据的计算挑战,我们将GPs转换为状态空间假设,并构建了相应的随机演化方程(SDE)。我们开发了高效的在线筛选算法,以便在接收新数据时估计一个协调运行的因子状态。这种协调估计使我们可以在并行计算中对所有轨迹进行标准的Rauch-Tung-Striebel平滑,而不需要再次访问任何之前的数据。我们在 synthetic tasks 和实际应用中都表明了SFTL的优势。
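For intuition, the simplest instance of the GP-to-SDE conversion mentioned above is the Matérn-1/2 kernel, whose trajectory prior is an Ornstein-Uhlenbeck process with a closed-form Gaussian transition, which is what makes constant-cost online filtering possible; the paper's actual kernel choice and state dimension are not assumed here.

```latex
% LTI-SDE (state-space) form of a Matern-1/2 GP prior on a factor trajectory u(t).
\[
\mathrm{d}u(t) = -\tfrac{1}{\ell}\, u(t)\,\mathrm{d}t
               + \sqrt{\tfrac{2\sigma^2}{\ell}}\;\mathrm{d}W(t),
\qquad
\operatorname{Cov}\!\bigl[u(t), u(t')\bigr]
  = \sigma^2 \exp\!\Bigl(-\tfrac{|t - t'|}{\ell}\Bigr).
\]
% Exact Gaussian transition used by the online filter between observations:
\[
u(t_{k+1}) \mid u(t_k) \sim
\mathcal{N}\!\Bigl(e^{-\Delta_k/\ell}\, u(t_k),\;
  \sigma^2\bigl(1 - e^{-2\Delta_k/\ell}\bigr)\Bigr),
\qquad \Delta_k = t_{k+1} - t_k .
\]
```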

Simulation based stacking

  • paper_url: http://arxiv.org/abs/2310.17009
  • repo_url: https://github.com/bregaldo/simulation_based_stacking
  • paper_authors: Yuling Yao, Bruno Régaldo-Saint Blancard, Justin Domke
  • for: 这个论文旨在提出一种总结多个 posterior approximation 的核心框架,以提高 Bayesian 计算的精度和可靠性。
  • methods: 该论文使用了多种 inference algorithm 和 architecture,并利用了随机初始化和梯度的Randomness,以获得多个 posterior approximation。
  • results: 该论文通过对多个 benchmark simulations 和一个具有挑战性的 cosmological inference 任务进行示例,证明了该框架的 asymptotic guarantee 和多个 posterior approximation 的合理性。
    Abstract Simulation-based inference has been popular for amortized Bayesian computation. It is typical to have more than one posterior approximation, from different inference algorithms, different architectures, or simply the randomness of initialization and stochastic gradients. With a provable asymptotic guarantee, we present a general stacking framework to make use of all available posterior approximations. Our stacking method is able to combine densities, simulation draws, confidence intervals, and moments, and address the overall precision, calibration, coverage, and bias at the same time. We illustrate our method on several benchmark simulations and a challenging cosmological inference task.
    摘要 模拟基于推理已经广泛应用于权重 bayesian 计算中。通常有多个 posterior aproximation,来自不同的推理算法、不同的架构或协同初始化和随机梯度的Randomness。我们提供一种通用的堆叠框架,可以利用所有可用的 posterior approximations。我们的堆叠方法可以将density、实验取样、信任范围和幂等元素组合起来,同时Addressing 总精度、准确性、覆盖率和偏见问题。我们在一些标准的benchmark simulations和一个复杂的 cosmological inference task中进行了示例。
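A minimal sketch of log-score stacking over several posterior approximations, in the spirit of the framework above; the softmax parameterization of the simplex and the synthetic stand-in densities are illustrative choices, not the paper's exact objective.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import softmax

def stack_weights(dens: np.ndarray) -> np.ndarray:
    """dens: (n_points, K) positive predictive densities of K approximations
    evaluated at held-out (simulated) parameter values. Returns simplex weights
    maximizing the log score of the stacked mixture."""
    K = dens.shape[1]

    def neg_log_score(z):
        w = softmax(z)                       # keeps weights non-negative, summing to one
        return -np.sum(np.log(dens @ w + 1e-300))

    res = minimize(neg_log_score, x0=np.zeros(K), method="L-BFGS-B")
    return softmax(res.x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dens = rng.gamma(shape=2.0, size=(500, 3))   # stand-in densities from 3 approximations
    print(stack_weights(dens))                   # non-negative, sums to one
```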

Faster Recalibration of an Online Predictor via Approachability

  • paper_url: http://arxiv.org/abs/2310.17002
  • repo_url: None
  • paper_authors: Princewill Okoroafor, Robert Kleinberg, Wen Sun
  • for: 提高在线预测模型的可靠性和可信度,特别是在对象输出序列可能会被敌意攻击的情况下。
  • methods: 使用黑威尔的可达性定理来转换不准确的在线预测模型,以实现更好的准确率和折补率。
  • results: 提出了一种新的算法,可以在在线预测设置下实现更快的准确率和折补率,并且可以静态地控制折补率和准确率之间的平衡。
    Abstract Predictive models in ML need to be trustworthy and reliable, which often at the very least means outputting calibrated probabilities. This can be particularly difficult to guarantee in the online prediction setting when the outcome sequence can be generated adversarially. In this paper we introduce a technique using Blackwell's approachability theorem for taking an online predictive model which might not be calibrated and transforming its predictions to calibrated predictions without much increase to the loss of the original model. Our proposed algorithm achieves calibration and accuracy at a faster rate than existing techniques arXiv:1607.03594 and is the first algorithm to offer a flexible tradeoff between calibration error and accuracy in the online setting. We demonstrate this by characterizing the space of jointly achievable calibration and regret using our technique.
    摘要 Machine learning 预测模型需要可靠和可信,通常至少表示输出加工。但在在线预测设置下,结果序列可能会被反对抗性生成,这可能使得保证预测的准确性很难。在这篇论文中,我们介绍了一种使用黑威尔的可达性定理来将一个可能不准确的在线预测模型转换成准确预测,而无需增加原始模型的损失。我们的提议算法可以快速实现折衔和准确性的平衡,并且是现有技术arXiv:1607.03594中的第一个可以进行折衔和准确性的flexible tradeoff的算法。我们通过characterizing the jointly achievable calibration and regret space来证明这一点。

Towards Continually Learning Application Performance Models

  • paper_url: http://arxiv.org/abs/2310.16996
  • repo_url: None
  • paper_authors: Ray A. O. Sinurat, Anurag Daram, Haryadi S. Gunawi, Robert B. Ross, Sandeep Madireddy
  • for: 本研究旨在开发一种能够考虑数据分布变化的机器学习性性能模型,以便在生产级HPC系统中进行重要的任务调度和应用优化决策。
  • methods: 本研究使用了随着时间的推移逐渐学习的方法,以抵消数据分布的变化对性能模型的影响。此外,我们还采用了一种叫做归并学习的技术,以避免训练过程中的违和强化现象。
  • results: 我们的最佳模型能够保持准确性,即使面临新的数据分布变化,同时在整个数据序列中的预测精度比预期方法提高了2倍。
    Abstract Machine learning-based performance models are increasingly being used to build critical job scheduling and application optimization decisions. Traditionally, these models assume that data distribution does not change as more samples are collected over time. However, owing to the complexity and heterogeneity of production HPC systems, they are susceptible to hardware degradation, replacement, and/or software patches, which can lead to drift in the data distribution that can adversely affect the performance models. To this end, we develop continually learning performance models that account for the distribution drift, alleviate catastrophic forgetting, and improve generalizability. Our best model was able to retain accuracy, regardless of having to learn the new distribution of data inflicted by system changes, while demonstrating a 2x improvement in the prediction accuracy of the whole data sequence in comparison to the naive approach.

Probabilistic Integral Circuits

  • paper_url: http://arxiv.org/abs/2310.16986
  • repo_url: None
  • paper_authors: Gennaro Gala, Cassio de Campos, Robert Peharz, Antonio Vergari, Erik Quaeghebeur
  • for: 这篇论文旨在探讨连续隐变量(Continuous Latent Variables,简称LV)和概率Circuit(PC)两种模型之间的桥接,以及它们之间的一致性。
  • methods: 该论文提出了一种新的计算图语言——概率积分电路(Probabilistic Integral Circuits,简称 PIC),它用表示连续隐变量的积分单元扩展了 PC。PIC 是符号化的计算图,在解析积分可行的简单情形下可以精确推断;在实际中则用轻量神经网络对其进行参数化,并借助数值求积用大规模 PC 任意精度地逼近。
  • results: 在多个分布估计 benchmark 上,PIC-approximating PCs 系统性地超越了常见的PCs,它们通常通过 expectation-maximization 或 SGD 来学习。这表明PIC可以在连续隐变量模型中实现PC的 tractability 和表达能力。
    Abstract Continuous latent variables (LVs) are a key ingredient of many generative models, as they allow modelling expressive mixtures with an uncountable number of components. In contrast, probabilistic circuits (PCs) are hierarchical discrete mixtures represented as computational graphs composed of input, sum and product units. Unlike continuous LV models, PCs provide tractable inference but are limited to discrete LVs with categorical (i.e. unordered) states. We bridge these model classes by introducing probabilistic integral circuits (PICs), a new language of computational graphs that extends PCs with integral units representing continuous LVs. In the first place, PICs are symbolic computational graphs and are fully tractable in simple cases where analytical integration is possible. In practice, we parameterise PICs with light-weight neural nets delivering an intractable hierarchical continuous mixture that can be approximated arbitrarily well with large PCs using numerical quadrature. On several distribution estimation benchmarks, we show that such PIC-approximating PCs systematically outperform PCs commonly learned via expectation-maximization or SGD.

Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark

  • paper_url: http://arxiv.org/abs/2310.16981
  • repo_url: https://github.com/vanderschaarlab/data-centric-synthetic-data
  • paper_authors: Lasse Hansen, Nabeel Seedat, Mihaela van der Schaar, Andrija Petrovic
  • for: 提高机器学习模型的训练数据质量和效果。
  • methods: 结合数据中心AI技术,对数据进行 profiling,以准确反映实际数据的复杂特征。
  • results: 对 eleven 个不同的表格数据集进行实验,发现现有的生成方法具有一定的局限性和不足,并提出了实践建议,以提高生成数据的质量和效果。
    Abstract Synthetic data serves as an alternative in training machine learning models, particularly when real-world data is limited or inaccessible. However, ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper addresses this issue by exploring the potential of integrating data-centric AI techniques which profile the data to guide the synthetic data generation process. Moreover, we shed light on the often ignored consequences of neglecting these data profiles during synthetic data generation -- despite seemingly high statistical fidelity. Subsequently, we propose a novel framework to evaluate the integration of data profiles to guide the creation of more representative synthetic data. In an empirical study, we evaluate the performance of five state-of-the-art models for tabular data generation on eleven distinct tabular datasets. The findings offer critical insights into the successes and limitations of current synthetic data generation techniques. Finally, we provide practical recommendations for integrating data-centric insights into the synthetic data generation process, with a specific focus on classification performance, model selection, and feature selection. This study aims to reevaluate conventional approaches to synthetic data generation and promote the application of data-centric AI techniques in improving the quality and effectiveness of synthetic data.
    摘要 人工数据作为机器学习模型训练时的替代方案,特别是当实际世界数据scarce或difficult to access时。然而,确保人工数据准确反映实际世界数据的复杂特点是一项挑战。这篇论文通过探讨integrating data-centric AI技术,以 Profiling the data to guide the synthetic data generation process。此外,我们还探讨在不考虑这些数据Profile during synthetic data generation时所忽略的后果--尽管似乎具有高度的统计准确性。随后,我们提出了一种新的评估框架,用于评估数据Profile的集成,以创建更代表性的人工数据。在一项实验研究中,我们评估了五种当今最佳实践的 tabular data生成模型在 eleven 个不同的 tabular 数据集上。发现的结果提供了关键的洞察,揭示了当前人工数据生成技术的成功和局限性。最后,我们提供了实践oriented的建议,用于在人工数据生成过程中integrating data-centric AI技术,特别是关于分类性能、模型选择和特征选择。本研究的目标是重新评估现有的人工数据生成方法,并促进data-centric AI技术在提高人工数据质量和效果方面的应用。

Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference

  • paper_url: http://arxiv.org/abs/2310.16975
  • repo_url: https://github.com/emorymlip/pcp-map
  • paper_authors: Zheyu Oliver Wang, Ricardo Baptista, Youssef Marzouk, Lars Ruthotto, Deepanshu Verma
  • for: 这两种神经网络方法用于解决静态和动态conditional optimal transport(COT)问题,以便实现样本和概率分布估计,这些任务是 bayesian inference 的核心任务。
  • methods: 这两种方法都是基于 measure transport 框架,将目标 conditional distribution 表示为一个可追踪的参考分布的变换。 COT 图是这个框架中的一个可能性,具有uniqueness 和 monotonicity 的优点。然而,相关的 COT 问题在 moderate 维度时 computationally challenging。为了提高可扩展性,我们的数学算法利用神经网络来参数化 COT 图。
  • results: 我们的方法比 state-of-the-art 方法更高效和更准确,可以在 benchmark 数据集和 bayesian inverse problem 中进行证明。 PCP-Map 模型将 conditional transport 图表示为 partially input convex neural network(PICNN)的梯度,并使用一种新的数学实现来提高计算效率。 COT-Flow 模型则使用一种含杂化的神经网络 ODE 来表示 conditional transport,它在训练时 slower,但在 sampling 时 faster。
    Abstract We present two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport (COT) problems, respectively. Both approaches enable sampling and density estimation of conditional probability distributions, which are core tasks in Bayesian inference. Our methods represent the target conditional distributions as transformations of a tractable reference distribution and, therefore, fall into the framework of measure transport. COT maps are a canonical choice within this framework, with desirable properties such as uniqueness and monotonicity. However, the associated COT problems are computationally challenging, even in moderate dimensions. To improve the scalability, our numerical algorithms leverage neural networks to parameterize COT maps. Our methods exploit the structure of the static and dynamic formulations of the COT problem. PCP-Map models conditional transport maps as the gradient of a partially input convex neural network (PICNN) and uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. COT-Flow models conditional transports via the flow of a regularized neural ODE; it is slower to train but offers faster sampling. We demonstrate their effectiveness and efficiency by comparing them with state-of-the-art approaches using benchmark datasets and Bayesian inverse problems.
    摘要 我们提出了两种神经网络方法,用于近似静态和动态条件最优运输(COT)问题的解决方案。这两种方法允许采样和概率分布估计,这些任务是泛型推理中核心任务之一。我们的方法将目标 conditional distribution 表示为一个可迭代的参考分布变换,因此属于度量运输框架。 COT 图是这种框架中的一个可能性,具有uniqueness 和 monotonicity 的感知性。然而,相关的 COT 问题在moderate 维度下 computationally 挑战。为了提高可扩展性,我们的数字算法使用神经网络来参数化 COT 图。我们的方法利用静态和动态 COT 问题的结构。 PCP-Map 模型 conditional transport 图像为 partially input convex neural network(PICNN)的梯度,并使用一种新的数字实现以提高计算效率相比之前的状态艺术。 COT-Flow 模型 conditional transport 通过一个正则化神经 ODE 的流动来实现,它在训练 slower 但在样本 faster 。我们通过对比 benchmark 数据和推理 inverse 问题来证明它们的有效性和效率。
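A rough sketch of a partially input-convex neural network (PICNN) of the kind PCP-Map parameterizes: non-negative weights on the z-path and convex, non-decreasing activations keep the output convex in y for every conditioning input x. Layer sizes, depth, and initialization below are assumptions; the paper's exact parameterization and numerical implementation are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PICNN(nn.Module):
    """Scalar potential g(x, y) that is convex in y for every conditioning input x."""
    def __init__(self, x_dim: int, y_dim: int, hidden: int = 64, depth: int = 3):
        super().__init__()
        self.ctx = nn.ModuleList([nn.Linear(x_dim, hidden) for _ in range(depth)])
        self.Wy = nn.ModuleList([nn.Linear(y_dim, hidden, bias=False) for _ in range(depth)])
        # raw weights for the convex path; mapped through softplus so they stay >= 0
        self.Wz_raw = nn.ParameterList(
            [nn.Parameter(torch.randn(hidden, hidden) * 0.1) for _ in range(depth - 1)]
        )
        self.out_raw = nn.Parameter(torch.randn(1, hidden) * 0.1)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        z = F.softplus(self.Wy[0](y) + self.ctx[0](x))          # convex in y
        for k in range(1, len(self.Wy)):
            Wz = F.softplus(self.Wz_raw[k - 1])                  # non-negative weights
            z = F.softplus(z @ Wz.T + self.Wy[k](y) + self.ctx[k](x))
        return z @ F.softplus(self.out_raw).T                    # (batch, 1) convex potential

# The conditional transport map is then the gradient of the potential in y,
# T(x, y) = grad_y PICNN(x, y), computable with torch.autograd.grad.
```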

Privately Aligning Language Models with Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.16960
  • repo_url: None
  • paper_authors: Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim
  • for: 本研究旨在采用异谱隐私(DP)和强化学习(RL)对大型自然语言模型(LLM)进行适应性调整,以提高模型的听说能力和隐私保护。
  • methods: 本研究使用了Ziegler等人(2020)提出的两种主要方法:一是通过RL无人Loop(例如正面评价生成)进行调整,二是通过RL人反馈(RLHF)(例如 SUMMARIZATION 在人类偏好的方式)。我们提供了一个新的DP框架来实现这两种方法的适应性调整,并证明了其正确性。
  • results: 我们的实验结果证明了我们的方法的有效性,即可以提供竞争力强的实用性while ensuring strong privacy protections。
    Abstract Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction following-models such as ChatGPT. In this work, we initiate the study of privacy-preserving alignment of LLMs through Differential Privacy (DP) in conjunction with RL. Following the influential work of Ziegler et al. (2020), we study two dominant paradigms: (i) alignment via RL without human in the loop (e.g., positive review generation) and (ii) alignment via RL from human feedback (RLHF) (e.g., summarization in a human-preferred way). We give a new DP framework to achieve alignment via RL, and prove its correctness. Our experimental results validate the effectiveness of our approach, offering competitive utility while ensuring strong privacy protections.
    摘要 位于预训练和用户部署之间,通过强化学习(RL)来对大语言模型(LLM)进行对齐已成为训练指令遵从模型如ChatGPT的主要策略。在这项工作中,我们开始研究保护隐私的LLM对齐方法,通过强化学习和隐私保护(DP)。根据茅利等人(2020)的 influential work,我们研究两种主导方法:(i)通过RL无人参与(例如, Positive Review生成)和(ii)通过RL从人类反馈(RLHF,例如,人类偏好的概要)。我们提出了一个新的DP框架来实现LLM对齐,并证明其正确性。我们的实验结果证明了我们的方法的有效性,具有竞争的实用性并保护强大隐私权。

Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning

  • paper_url: http://arxiv.org/abs/2310.16959
  • repo_url: None
  • paper_authors: Ananth Balashankar, Xiao Ma, Aradhana Sinha, Ahmad Beirami, Yao Qin, Jilin Chen, Alex Beutel
  • for: 这 paper 探讨了 LLM 在新的安全问题和政策出现后,如何建立检测违反规则的分类器。
  • methods: 这 paper 使用了域总化少数例学习来解决 LLM 在新的安全问题上建立分类器。
  • results: 这 paper 实验表明,相比于优化学习和示例选择,参数高效调整(PEFT)和类例基于的数据增强(DAPT)方法可以在新的安全规则下提高分类器的性能,具体提高了 Social Chemistry 道德判断和 Toxicity 检测任务中的 F1 分数和 AUC 值。
    Abstract As large language models (LLMs) are widely adopted, new safety issues and policies emerge, to which existing safety classifiers do not generalize well. If we have only observed a few examples of violations of a new safety rule, how can we build a classifier to detect violations? In this paper, we study the novel setting of domain-generalized few-shot learning for LLM-based text safety classifiers. Unlike prior few-shot work, these new safety issues can be hard to uncover and we do not get to choose the few examples. We demonstrate that existing few-shot techniques do not perform well in this setting, and rather we propose to do parameter-efficient fine-tuning (PEFT) combined with augmenting training data based on similar examples in prior existing rules. We empirically show that our approach of similarity-based data-augmentation + prompt-tuning (DAPT) consistently outperforms baselines that either do not rely on data augmentation or on PEFT by 7-17% F1 score in the Social Chemistry moral judgement and 9-13% AUC in the Toxicity detection tasks, even when the new rule is loosely correlated with existing ones.
    摘要 大型语言模型(LLM)的广泛采用导致新的安全问题和政策出现,现有的安全分类器不能通用。如果我们只有几个例外新安全规则的违反示例,如何建立一个检测违反的分类器?在这篇论文中,我们研究了域总化几个步骤的文本安全分类器。不同于先前的几个步骤工作,这些新的安全问题可能很难发现,我们不能选择几个示例。我们表明,现有的几个步骤技术不适用于这种设定,而是提议使用效率高的参数调整(PEFT)和基于先前的规则相似例子的数据增强(DAPT)。我们实际证明,我们的方法在社交化学道德评价和攻击性识别任务中表现出了7-17%的F1分和9-13%的AUC提升,即使新规则与现有规则之间存在潜在的相互关系。
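A minimal sketch of a parameter-efficient fine-tuning adapter in the LoRA style, as one concrete instance of PEFT for a frozen safety classifier; the rank, scaling, and placement of the adapter are assumptions, and the paper's prompt-tuning and similarity-based augmentation steps are only indicated in the comments.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Adds a trainable low-rank update to a frozen linear layer of the backbone."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                              # freeze backbone weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Data augmentation would then add similar examples drawn from prior policies to
# the few observed violations of the new rule before fine-tuning only A and B.
```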

Transferring a molecular foundation model for polymer property predictions

  • paper_url: http://arxiv.org/abs/2310.16958
  • repo_url: None
  • paper_authors: Pei Zhang, Logan Kearney, Debsindhu Bhowmik, Zachary Fox, Amit K. Naskar, John Gounley
  • for: 加速设计优化,如药品开发和材料发现
  • methods: 使用对小分子进行预训的Transformer型语言模型,并在这些模型上进行精细调整,以估计聚合物性能
  • results: 使用这种方法可以获得与对增强聚合物数据进行增强调整的比较类似的准确性,但是无需进行耗时的数据增强。
    Abstract Transformer-based large language models have remarkable potential to accelerate design optimization for applications such as drug development and materials discovery. Self-supervised pretraining of transformer models requires large-scale datasets, which are often sparsely populated in topical areas such as polymer science. State-of-the-art approaches for polymers conduct data augmentation to generate additional samples but unavoidably incurs extra computational costs. In contrast, large-scale open-source datasets are available for small molecules and provide a potential solution to data scarcity through transfer learning. In this work, we show that using transformers pretrained on small molecules and fine-tuned on polymer properties achieve comparable accuracy to those trained on augmented polymer datasets for a series of benchmark prediction tasks.
    摘要 使用基于转换器的大型语言模型可以加速设计优化,例如药物开发和材料发现。转换器模型的自监督预训练需要大规模数据集,而在高分子科学等主题领域,这类数据往往十分稀缺。当前最先进的方法通过数据增强来生成更多高分子样本,但不可避免地带来额外的计算成本。相反,小分子领域已有大规模的开源数据集,可以通过迁移学习来缓解数据稀缺问题。在这项工作中,我们表明:使用在小分子上预训练、再在高分子性质上微调的转换器模型,在一系列基准预测任务上可以达到与在增强高分子数据集上训练的模型相当的准确性。
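A hedged sketch of the transfer recipe above using Hugging Face Transformers: load a small-molecule SMILES encoder and fine-tune a regression head on polymer properties. The ChemBERTa checkpoint named below is one publicly available small-molecule model, not necessarily the one used in the paper, and the polymer SMILES string is only a toy input.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

CHECKPOINT = "seyonec/ChemBERTa-zinc-base-v1"   # assumed public small-molecule encoder

class PolymerRegressor(nn.Module):
    def __init__(self, checkpoint: str = CHECKPOINT):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(checkpoint)
        self.head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]        # first-token representation
        return self.head(cls).squeeze(-1)        # predicted polymer property

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
batch = tokenizer(["CC(c1ccccc1)"], return_tensors="pt", padding=True)  # toy repeat-unit SMILES
model = PolymerRegressor()
print(model(batch["input_ids"], batch["attention_mask"]).shape)
```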

Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks

  • paper_url: http://arxiv.org/abs/2310.16955
  • repo_url: None
  • paper_authors: Aradhana Sinha, Ananth Balashankar, Ahmad Beirami, Thi Avrahami, Jilin Chen, Alex Beutel
  • for: 这个论文的目的是提高自然语言处理系统对人类敌对者的Robustness。
  • methods: 这个论文使用有限的人类敌对例进行对抗训练,以生成更多的有用敌对例。
  • results: 对ANLI和仇恨言语检测 benchmark数据集进行训练,相比只训练 observed human attacks,同时训练 synthetic adversarial examples,可以提高模型对未来人类敌对者的Robustness。
    Abstract Real-world natural language processing systems need to be robust to human adversaries. Collecting examples of human adversaries for training is an effective but expensive solution. On the other hand, training on synthetic attacks with small perturbations - such as word-substitution - does not actually improve robustness to human adversaries. In this paper, we propose an adversarial training framework that uses limited human adversarial examples to generate more useful adversarial examples at scale. We demonstrate the advantages of this system on the ANLI and hate speech detection benchmark datasets - both collected via an iterative, adversarial human-and-model-in-the-loop procedure. Compared to training only on observed human attacks, also training on our synthetic adversarial examples improves model robustness to future rounds. In ANLI, we see accuracy gains on the current set of attacks (44.1%$\,\to\,$50.1%) and on two future unseen rounds of human generated attacks (32.5%$\,\to\,$43.4%, and 29.4%$\,\to\,$40.2%). In hate speech detection, we see AUC gains on current attacks (0.76 $\to$ 0.84) and a future round (0.77 $\to$ 0.79). Attacks from methods that do not learn the distribution of existing human adversaries, meanwhile, degrade robustness.
    摘要 现实世界的自然语言处理系统需要强健于人类黑客。收集人类黑客的示例用于训练是一种有效的 pero 昂贵的解决方案。然而,使用小偏移量的Synthetic攻击训练并不实际提高对人类黑客的Robustness。在这篇论文中,我们提出了一种基于有限的人类黑客示例的对抗训练框架。我们示示了这种系统在ANLI和仇恨言语检测 benchmark 数据集上的优势。与仅仅训练在观察到的人类攻击的情况相比,我们的对抗训练还能提高模型对未来征的Robustness。在 ANLI 中,我们看到了当前攻击的准确率提高(44.1% $\to$ 50.1%),以及未来两个未见的人类生成的攻击(32.5% $\to$ 43.4%,和 29.4% $\to$ 40.2%)。在仇恨言语检测中,我们看到了投用率提高(0.76 $\to$ 0.84)以及未来一个round(0.77 $\to$ 0.79)。而不学习现有人类黑客的分布的攻击方法则会降低Robustness。

Causal Q-Aggregation for CATE Model Selection

  • paper_url: http://arxiv.org/abs/2310.16945
  • repo_url: None
  • paper_authors: Hui Lan, Vasilis Syrgkanis
  • for: 该论文的目的是提出一种新的 conditional average treatment effect(CATE)模型选择方法,以便实现个性化决策。
  • methods: 该论文使用了 proxy loss metrics with double robust properties 和 model ensembling,并提出了一种基于 Q-aggregation 的新的 CATE 模型选择方法。
  • results: 该论文的主要结果表明,新方法可以达到统计上最优的 oracle 模型选择遗憾率 $\frac{\log(M)}{n}$(其中 $M$ 为候选模型数,$n$ 为样本数,另含与讨厌参数估计误差乘积相关的高阶项),且不要求任何候选 CATE 模型接近真实值。
    Abstract Accurate estimation of conditional average treatment effects (CATE) is at the core of personalized decision making. While there is a plethora of models for CATE estimation, model selection is a nontrivial task, due to the fundamental problem of causal inference. Recent empirical work provides evidence in favor of proxy loss metrics with double robust properties and in favor of model ensembling. However, theoretical understanding is lacking. Direct application of prior theoretical work leads to suboptimal oracle model selection rates due to the non-convexity of the model selection problem. We provide regret rates for the major existing CATE ensembling approaches and propose a new CATE model ensembling approach based on Q-aggregation using the doubly robust loss. Our main result shows that causal Q-aggregation achieves statistically optimal oracle model selection regret rates of $\frac{\log(M)}{n}$ (with $M$ models and $n$ samples), with the addition of higher-order estimation error terms related to products of errors in the nuisance functions. Crucially, our regret rate does not require that any of the candidate CATE models be close to the truth. We validate our new method on many semi-synthetic datasets and also provide extensions of our work to CATE model selection with instrumental variables and unobserved confounding.
    摘要 “精确估计 conditional average treatment effect (CATE) 是个核心的人性化决策问题。尽管有许多 CATE 估计模型,但选择这些模型是一个非常困难的任务,因为 causal inference 的基本问题。latest empirical work 表明,使用 proxy loss metrics with double robust properties 和 model ensembling 可以提高 CATE 估计的精度。然而,理论上的理解是lacking。直接运用先前的理论工作将会导致 suboptimal oracle model selection rates,因为 model selection 问题是非断的。我们提供了 existing CATE ensembling 方法中的 regret rates,并提出了一个基于 Q-aggregation 的新 CATE model ensembling 方法,使用 doubly robust loss。我们的主要结果表明,causal Q-aggregation 可以 дости得 statistically optimal oracle model selection regret rates of $\frac{\log(M)}{n}$(with $M$ models and $n$ samples),并且添加了高阶 estimation error terms related to products of errors in the nuisance functions。其中,我们的 regret rate 不需要任何 candidate CATE 模型都需要接近 truth。我们验证了我们的新方法在许多 semi-synthetic 数据上,并且提供了 CATE model selection with instrumental variables 和 unobserved confounding 的扩展。”
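The doubly robust ingredient behind causal Q-aggregation can be sketched as follows: form DR pseudo-outcomes from estimated nuisances and score each candidate CATE model against them. Plain exponential weighting is shown here as a stand-in for the full Q-aggregation objective, which additionally includes a quadratic penalty; nuisance estimation is assumed to happen elsewhere.

```python
import numpy as np

def dr_pseudo_outcome(y, t, mu0, mu1, e):
    """Doubly robust pseudo-outcome whose conditional mean given X equals the CATE.
    mu0, mu1: estimated outcome regressions; e: estimated propensity score."""
    mu_t = np.where(t == 1, mu1, mu0)
    return (mu1 - mu0) + (t - e) / (e * (1.0 - e)) * (y - mu_t)

def aggregate(cate_preds, y, t, mu0, mu1, e, eta: float = 1.0):
    """cate_preds: (M, n) candidate CATE predictions on a validation fold.
    Returns ensemble weights based on each model's doubly robust loss."""
    phi = dr_pseudo_outcome(y, t, mu0, mu1, e)            # (n,)
    losses = np.mean((cate_preds - phi) ** 2, axis=1)     # DR loss per candidate
    w = np.exp(-eta * (losses - losses.min()))            # exponential-weights stand-in
    return w / w.sum()
```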

Exploring Behavior Discovery Methods for Heterogeneous Swarms of Limited-Capability Robots

  • paper_url: http://arxiv.org/abs/2310.16941
  • repo_url: None
  • paper_authors: Connor Mattson, Jeremy C. Clark, Daniel S. Brown
  • for: 本研究旨在探讨功能不同的机器人群体中可能出现的新行为。
  • methods: 本研究使用了 novelties 搜索和聚类来找到新的 emergent 行为。
  • results: 研究发现,先前的方法无法找到许多有趣的行为,而人在回路(human-in-the-loop)的迭代探索过程比随机搜索、swarm chemistry 和自动行为发现能找到更多的行为。实验总共发现了 23 种涌现行为,其中 18 种是新发现;据作者所知,这是无计算能力个体组成的异构群体中首次报道的涌现行为。
    Abstract We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this paper, we seek to better understand the role of novelty search and the efficacy of using clustering to discover novel emergent behaviors. Through a large set of experiments and ablations, we analyze the effect of representations, evolutionary search, and various clustering methods in the search for novel behaviors in a heterogeneous swarm. Our results indicate that prior methods fail to discover many interesting behaviors and that an iterative human-in-the-loop discovery process discovers more behaviors than random search, swarm chemistry, and automated behavior discovery. The combined discoveries of our experiments uncover 23 emergent behaviors, 18 of which are novel discoveries. To the best of our knowledge, these are the first known emergent behaviors for heterogeneous swarms of computation-free agents. Videos, code, and appendix are available at the project website: https://sites.google.com/view/heterogeneous-bd-methods
    摘要 我们研究一个功能多样化群体机器人的发展行为问题,该群体具有有限的能力。先前的研究曾经考虑过同型群体的行为搜索,并提出使用新奇搜索来寻找行为空间中的新行为,然后使用对应的对应组合来返回用户。在这篇论文中,我们想要更好地理解新奇搜索的角色以及使用对应组合来发现新的发展行为的有效性。通过一系列实验和删除,我们分析了表示、演化搜索和对应组合的效果,以寻找在多样化群体中发现新的行为。我们的结果显示先前的方法无法发现许多有兴趣的行为,而一种轮循人类在过程中的寻找过程可以发现更多的行为,比如随机搜索、群体化学和自动行为发现。我们的实验发现了23个发展行为,其中18个是新发现。到目前为止,这些是同型多样化群体机器人中第一个已知的发展行为。详细信息可以在项目网站上找到:https://sites.google.com/view/heterogeneous-bd-methods。
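A minimal sketch of the novelty-search scoring that underlies the behavior discovery loop above: controllers are mapped to behavior-characterization vectors and ranked by mean distance to their k nearest neighbours in an archive of previously seen behaviors. The vector contents, k, and the demo data are assumptions.

```python
import numpy as np

def novelty_scores(behaviors: np.ndarray, archive: np.ndarray, k: int = 15) -> np.ndarray:
    """behaviors: (n, d) new behavior-characterization vectors; archive: (m, d) past vectors.
    Returns the mean distance of each new behavior to its k nearest neighbours."""
    pool = np.vstack([behaviors, archive])
    dists = np.linalg.norm(behaviors[:, None, :] - pool[None, :, :], axis=-1)  # (n, n+m)
    dists.sort(axis=1)
    return dists[:, 1 : k + 1].mean(axis=1)   # skip the zero distance to itself

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    new = rng.normal(size=(20, 4))            # candidate controllers' behavior vectors
    old = rng.normal(size=(100, 4))           # archive of previously evaluated behaviors
    scores = novelty_scores(new, old)
    print(new[np.argsort(scores)[::-1][:5]])  # most novel candidates, e.g. for human review
```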

MimicTouch: Learning Human’s Control Strategy with Multi-Modal Tactile Feedback

  • paper_url: http://arxiv.org/abs/2310.16917
  • repo_url: None
  • paper_authors: Kelin Yu, Yunhai Han, Matthew Zhu, Ye Zhao
  • for: 这paper的目的是开发一种基于人类感觉的控制策略的机器人控制系统。
  • methods: 这paper使用了多模态感觉数据集和模拟学习技术,以及在线差分强化学习来让机器人模仿人类的感觉控制策略。
  • results: experiments show that MimicTouch 可以安全地将人类的感觉控制策略传递给机器人,并且能够在各种任务中提高机器人的性能。
    Abstract In robotics and artificial intelligence, the integration of tactile processing is becoming increasingly pivotal, especially in learning to execute intricate tasks like alignment and insertion. However, existing works focusing on tactile methods for insertion tasks predominantly rely on robot teleoperation data and reinforcement learning, which do not utilize the rich insights provided by human's control strategy guided by tactile feedback. For utilizing human sensations, methodologies related to learning from humans predominantly leverage visual feedback, often overlooking the invaluable tactile feedback that humans inherently employ to finish complex manipulations. Addressing this gap, we introduce "MimicTouch", a novel framework that mimics human's tactile-guided control strategy. In this framework, we initially collect multi-modal tactile datasets from human demonstrators, incorporating human tactile-guided control strategies for task completion. The subsequent step involves instructing robots through imitation learning using multi-modal sensor data and retargeted human motions. To further mitigate the embodiment gap between humans and robots, we employ online residual reinforcement learning on the physical robot. Through comprehensive experiments, we validate the safety of MimicTouch in transferring a latent policy learned through imitation learning from human to robot. This ongoing work will pave the way for a broader spectrum of tactile-guided robotic applications.
    摘要 在机器人和人工智能领域,感觉处理的集成在执行复杂任务 like 对接和定位方面变得越来越重要,特别是在学习执行这些任务时。然而,现有的执行任务中的策略几乎完全依赖于机器人 теле操作数据和奖励学习,不使用人类的感觉指导策略。为了利用人类的感觉,相关的学习人类方法主要依赖于视觉反馈,经常忽略人类在完成复杂把握中的感觉反馈。为了解决这个空隙,我们提出了“模仿感觉”(MimicTouch)框架。在这个框架中,我们首先收集多modal的感觉数据集,包括人类在完成任务时的感觉指导策略。接着,我们通过模仿学习使用多modal感觉数据和重定向的人类动作来指导机器人。为了进一步减少人机embody gap,我们使用在线剩余奖励学习来调整物理机器人。通过广泛的实验,我们证明了MimicTouch可以安全地将人类的潜在策略传递给机器人。这项工作将为机器人感觉导向应用带来更广泛的前景。

Transformer-based Atmospheric Density Forecasting

  • paper_url: http://arxiv.org/abs/2310.16912
  • repo_url: None
  • paper_authors: Julia Briden, Peng Mun Siew, Victor Rodriguez-Fernandez, Richard Linares
  • for: 预测大气密度,提高空间环境意识。
  • methods: 使用基于 transformer 的深度学习模型,捕捉大气密度数据中的长期依赖关系。
  • results: 在经验模型 NRLMSISE-00、JB2008 以及基于物理的 TIEGCM 大气密度模型上,将 DMDc 与基于 transformer 的传播器进行预测比较,结果显示基于 transformer 的传播器预测性能更高。
    Abstract As the peak of the solar cycle approaches in 2025 and the ability of a single geomagnetic storm to significantly alter the orbit of Resident Space Objects (RSOs), techniques for atmospheric density forecasting are vital for space situational awareness. While linear data-driven methods, such as dynamic mode decomposition with control (DMDc), have been used previously for forecasting atmospheric density, deep learning-based forecasting has the ability to capture nonlinearities in data. By learning multiple layer weights from historical atmospheric density data, long-term dependencies in the dataset are captured in the mapping between the current atmospheric density state and control input to the atmospheric density state at the next timestep. This work improves upon previous linear propagation methods for atmospheric density forecasting, by developing a nonlinear transformer-based architecture for atmospheric density forecasting. Empirical NRLMSISE-00 and JB2008, as well as physics-based TIEGCM atmospheric density models are compared for forecasting with DMDc and with the transformer-based propagator.
    摘要 As the peak of the solar cycle approaches in 2025 and the ability of a single geomagnetic storm to significantly alter the orbit of Resident Space Objects (RSOs), 技术 для预测大气密度是 Space situational awareness 中非常重要的。而以前使用的线性数据驱动方法,如动态模式分解控制(DMDc),已经用于大气密度预测,但是深度学习基本法可以捕捉数据中的非线性关系。通过从历史大气密度数据中学习多层权重,捕捉了数据中长期依赖关系,并将当前大气密度状态与控制输入的映射关系储存在内存中。这种工作提高了之前的线性协议方法,通过开发一种基于 transformer 架构的大气密度预测方法。empirical NRLMSISE-00 和 JB2008 模型,以及基于物理学的 TIEGCM 大气密度模型,与 DMDc 和 transformer 基于的传播器进行比较。

Deep machine learning for meteor monitoring: advances with transfer learning and gradient-weighted class activation mapping

  • paper_url: http://arxiv.org/abs/2310.16826
  • repo_url: None
  • paper_authors: Eloy Peña-Asensio, Josep M. Trigo-Rodríguez, Pau Grèbol-Tomàs, David Regordosa-Avellana, Albert Rimola
  • for: 这篇论文的主要目的是提出一套全自动的流星检测流程,以便更好地开展流星科学研究。
  • methods: 该论文使用了卷积神经网络(CNNs)来自动分类候选的流星检测图像。具体来说,使用了Gradient-weighted Class Activation Mapping(Grad-CAM)技术来确定流星在每幅图像中的准确位置。
  • results: 该论文在使用SPMN数据集进行训练和评估后,实现了98%的精度。这种新方法有助于减少流星科学家和站点运行人员的工作负担,同时提高流星跟踪和分类的精度。
    Abstract In recent decades, the use of optical detection systems for meteor studies has increased dramatically, resulting in huge amounts of data being analyzed. Automated meteor detection tools are essential for studying the continuous meteoroid incoming flux, recovering fresh meteorites, and achieving a better understanding of our Solar System. Concerning meteor detection, distinguishing false positives between meteor and non-meteor images has traditionally been performed by hand, which is significantly time-consuming. To address this issue, we developed a fully automated pipeline that uses Convolutional Neural Networks (CNNs) to classify candidate meteor detections. Our new method is able to detect meteors even in images that contain static elements such as clouds, the Moon, and buildings. To accurately locate the meteor within each frame, we employ the Gradient-weighted Class Activation Mapping (Grad-CAM) technique. This method facilitates the identification of the region of interest by multiplying the activations from the last convolutional layer with the average of the gradients across the feature map of that layer. By combining these findings with the activation map derived from the first convolutional layer, we effectively pinpoint the most probable pixel location of the meteor. We trained and evaluated our model on a large dataset collected by the Spanish Meteor Network (SPMN) and achieved a precision of 98\%. Our new methodology presented here has the potential to reduce the workload of meteor scientists and station operators and improve the accuracy of meteor tracking and classification.
    摘要 近年来,用于天体研究的光学探测系统的使用量有所增加,导致大量数据需要进行分析。自动化的流星探测工具是研究不断的流星oid进来流和回收新的流星的关键。在流星探测方面,传统上通过手动进行分类来 отлича流星和非流星图像,这是非常时间消耗的。为解决这个问题,我们开发了一个完全自动化的管道,使用卷积神经网络(CNNs)来分类候选的流星探测。我们的新方法可以在包含静止元素 such as 云、月亮和建筑物的图像中探测流星。为了准确地在每帧中找到流星,我们使用了梯度加权灵活图像(Grad-CAM)技术。这种方法将各层的激活值与该层的梯度的平均值相乘,以便在特定的特征图像中标识感兴趣的区域。通过将这些发现与基层卷积层的激活图像相结合,我们可以准确地定位流星在每帧中的位置。我们使用了大量由西班牙流星网络(SPMN)收集的数据进行训练和评估,并达到了98%的精度。我们的新方法可以减少流星科学家和站点操作人员的工作负担,并提高流星跟踪和分类的精度。
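A minimal Grad-CAM sketch matching the localization step described above: the last convolutional layer's activations are weighted by the spatially averaged gradients of the meteor-class score and upsampled to the input frame. The tiny CNN below is only a stand-in for the pipeline's actual classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),      # treated as the "last conv layer"
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
)

acts, grads = {}, {}
last_conv = model[2]
last_conv.register_forward_hook(lambda m, i, o: acts.update(v=o))
last_conv.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

img = torch.randn(1, 1, 64, 64)                     # stand-in camera frame
score = model(img)[0, 1]                            # meteor-class logit
score.backward()

weights = grads["v"].mean(dim=(2, 3), keepdim=True)  # GAP of gradients per channel
cam = F.relu((weights * acts["v"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=img.shape[-2:], mode="bilinear", align_corners=False)
print(divmod(int(cam.flatten().argmax()), cam.shape[-1]))  # most probable meteor pixel (row, col)
```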

CATE Lasso: Conditional Average Treatment Effect Estimation with High-Dimensional Linear Regression

  • paper_url: http://arxiv.org/abs/2310.16819
  • repo_url: None
  • paper_authors: Masahiro Kato, Masaaki Imaizumi
  • for: This paper is written to study the estimation of Conditional Average Treatment Effects (CATEs) in causal inference, specifically in the presence of high-dimensional and non-sparse parameters.
  • methods: The paper proposes a method for consistently estimating CATEs using Lasso regression, which is specialized for CATE estimation and leverages the assumption of implicit sparsity.
  • results: The paper demonstrates the consistency of the proposed method through simulation studies, and shows that desirable theoretical properties such as consistency remain attainable even without assuming sparsity explicitly.
    Abstract In causal inference about two treatments, Conditional Average Treatment Effects (CATEs) play an important role as a quantity representing an individualized causal effect, defined as a difference between the expected outcomes of the two treatments conditioned on covariates. This study assumes two linear regression models between a potential outcome and covariates of the two treatments and defines CATEs as a difference between the linear regression models. Then, we propose a method for consistently estimating CATEs even under high-dimensional and non-sparse parameters. In our study, we demonstrate that desirable theoretical properties, such as consistency, remain attainable even without assuming sparsity explicitly if we assume a weaker assumption called implicit sparsity originating from the definition of CATEs. In this assumption, we suppose that parameters of linear models in potential outcomes can be divided into treatment-specific and common parameters, where the treatment-specific parameters take difference values between each linear regression model, while the common parameters remain identical. Thus, in a difference between two linear regression models, the common parameters disappear, leaving only differences in the treatment-specific parameters. Consequently, the non-zero parameters in CATEs correspond to the differences in the treatment-specific parameters. Leveraging this assumption, we develop a Lasso regression method specialized for CATE estimation and present that the estimator is consistent. Finally, we confirm the soundness of the proposed method by simulation studies.
    摘要 在关于两种处理(treatment)的因果推断中,条件平均处理效应(CATE)作为刻画个体化因果效应的量具有重要作用,它被定义为在给定协变量条件下两种处理的期望结果之差。本研究假设潜在结果与协变量之间在两种处理下分别服从线性回归模型,并将 CATE 定义为这两个线性回归模型之差。在此基础上,我们提出了一种即使在高维、非稀疏参数情形下也能一致地估计 CATE 的方法。我们证明,只要采用一个源自 CATE 定义、称为隐式稀疏性的更弱假设,即使不显式假设稀疏性,一致性等理想的理论性质依然可以达到。在该假设下,潜在结果线性模型的参数可以分解为处理特定参数和公共参数:处理特定参数在两个线性回归模型之间取不同的值,而公共参数保持一致。因此,在两个线性回归模型相减时公共参数被消去,只剩下处理特定参数之差;CATE 中的非零参数即对应于处理特定参数的差异。利用这一假设,我们开发了一种专用于 CATE 估计的 Lasso 回归方法,并证明该估计量是一致的。最后,我们通过模拟研究验证了所提方法的可靠性。
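
下面给出"用 Lasso 利用 CATE 隐式稀疏性"这一思想的简化 Python 示意(scikit-learn)。这里采用的是常见的逆概率加权变换结果(transformed outcome)做法,使其条件期望恰为 CATE,再做 L1 回归;这只是一个示意性的常见替代方案,未必与论文中的具体估计量一致,其中倾向得分 e=0.5(完全随机分配)等设定均为此处的假设。

```python
# 示意:用 Lasso 估计稀疏 CATE(变换结果 + L1 回归;概念演示,非论文原估计量)
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 2000, 50
X = rng.normal(size=(n, d))
T = rng.binomial(1, 0.5, size=n)                 # 假设:完全随机分配,e(x)=0.5
theta_common = rng.normal(size=d)                # 公共参数(两模型相减后消失)
tau = np.zeros(d); tau[:3] = [2.0, -1.5, 1.0]    # 处理特定参数之差:隐式稀疏
Y = X @ theta_common + T * (X @ tau) + rng.normal(size=n)

e = 0.5
Y_star = Y * T / e - Y * (1 - T) / (1 - e)       # 变换结果:E[Y*|X] = X @ tau

cate_lasso = Lasso(alpha=0.1).fit(X, Y_star)
print("非零系数位置:", np.flatnonzero(np.abs(cate_lasso.coef_) > 1e-6))
print("前三个系数估计:", cate_lasso.coef_[:3].round(2))
```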

Learning COVID-19 Regional Transmission Using Universal Differential Equations in a SIR model

  • paper_url: http://arxiv.org/abs/2310.16804
  • repo_url: https://github.com/adrocampos/udes_in_sir_regional_transmision
  • paper_authors: Adrian Rojas-Campos, Lukas Stelz, Pascal Nieters
  • for: 模型COVID-19的传播行为在高度连接的社会中很难模拟。单个区域SIR模型无法考虑来自其他地区的感染力量,扩展到大量交互地区则需要许多不实际存在的假设。
  • methods: 我们提出使用通用微分方程(Universal Differential Equations,UDEs)来捕捉邻近地区对感染的影响,并与SIR模型结合使用。UDE是全部或部分由深度神经网络(DNN)定义的微分方程。我们在SIR方程中添加一个由DNN学习的加性项,用于刻画来自其他地区的输入感染力(方程形式见本条目下方的示意)。学习过程使用自动微分和梯度下降,以逼近邻近地区状态对目标系统造成的变化。
  • results: 我们将所提方法与单区域SIR模型以及仅由DNN构成的完全数据驱动模型在模拟的COVID-19疫情上进行了比较。结果显示,所提出的UDE+SIR模型能够更准确地捕捉疫情的发展动态,但在疫情末期性能有所下降;单区域SIR模型和完全数据驱动模型则无法准确刻画疫情动态。
    Abstract Highly-interconnected societies difficult to model the spread of infectious diseases such as COVID-19. Single-region SIR models fail to account for incoming forces of infection and expanding them to a large number of interacting regions involves many assumptions that do not hold in the real world. We propose using Universal Differential Equations (UDEs) to capture the influence of neighboring regions and improve the model's predictions in a combined SIR+UDE model. UDEs are differential equations totally or partially defined by a deep neural network (DNN). We include an additive term to the SIR equations composed by a DNN that learns the incoming force of infection from the other regions. The learning is performed using automatic differentiation and gradient descent to approach the change in the target system caused by the state of the neighboring regions. We compared the proposed model using a simulated COVID-19 outbreak against a single-region SIR and a fully data-driven model composed only of a DNN. The proposed UDE+SIR model generates predictions that capture the outbreak dynamic more accurately, but a decay in performance is observed at the last stages of the outbreak. The single-area SIR and the fully data-driven approach do not capture the proper dynamics accurately. Once the predictions were obtained, we employed the SINDy algorithm to substitute the DNN with a regression, removing the black box element of the model with no considerable increase in the error levels.
    摘要 在高度互联的社会中,很难对COVID-19等传染病的传播进行建模。单区域SIR模型无法考虑来自外部的输入感染力,而将其扩展到大量相互作用的地区又需要许多在现实中并不成立的假设。我们提议使用通用微分方程(UDE)来捕捉邻近地区的影响,并在SIR+UDE组合模型中改进预测。UDE是全部或部分由深度神经网络(DNN)定义的微分方程。我们在SIR方程中加入一个由DNN构成的加性项,用于学习来自其他地区的输入感染力;学习过程通过自动微分和梯度下降来逼近邻近地区状态对目标系统造成的变化。我们在模拟的COVID-19疫情上,将所提模型与单区域SIR模型以及仅由DNN构成的完全数据驱动模型进行了比较。所提出的UDE+SIR模型能够更准确地捕捉疫情动态,但在疫情末期性能有所下降;单区域SIR模型和完全数据驱动方法则无法准确刻画疫情动态。在获得预测之后,我们使用SINDy算法用回归替换DNN,在误差水平没有明显增加的情况下消除了模型中的黑盒成分。
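
下面用一组方程给出这种 SIR+UDE 组合模型的一种可能形式。需要说明:神经网络项 $f_\theta$ 的具体输入与其在方程中的位置是这里的假设,论文只说明它是 SIR 方程中用于学习外部输入感染力的一个加性项。

```latex
% 示意性方程:SIR + 神经网络加性项(f_theta 的形式为假设)
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N} - S\, f_\theta\!\left(I_{\text{neighbors}}\right),\\
\frac{dI}{dt} &= \beta \frac{S I}{N} + S\, f_\theta\!\left(I_{\text{neighbors}}\right) - \gamma I,\\
\frac{dR}{dt} &= \gamma I .
\end{aligned}
```

其中 $\beta,\gamma$ 为区域内的传播率与康复率,$f_\theta$ 为由 DNN 学习的来自邻近地区的输入感染力;训练时对该常微分方程组做数值积分,并用自动微分与梯度下降更新 $\theta$。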

From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction

  • paper_url: http://arxiv.org/abs/2310.16802
  • repo_url: None
  • paper_authors: Nima Shoghi, Adeesh Kolluru, John R. Kitchin, Zachary W. Ulissi, C. Lawrence Zitnick, Brandon M. Wood
  • for: 提高化学领域中物理性质预测的效果
  • methods: 在多任务框架下,同时使用来自多个化学领域的数据集进行有监督预训练,并将每个数据集视为一个独立的预训练任务
  • results: 与从零开始训练相比,平均性能提升59%,并在40个下游任务中的34个上追平或刷新了当前最佳水平(state-of-the-art)
    Abstract Foundation models have been transformational in machine learning fields such as natural language processing and computer vision. Similar success in atomic property prediction has been limited due to the challenges of training effective models across multiple chemical domains. To address this, we introduce Joint Multi-domain Pre-training (JMP), a supervised pre-training strategy that simultaneously trains on multiple datasets from different chemical domains, treating each dataset as a unique pre-training task within a multi-task framework. Our combined training dataset consists of $\sim$120M systems from OC20, OC22, ANI-1x, and Transition-1x. We evaluate performance and generalization by fine-tuning over a diverse set of downstream tasks and datasets including: QM9, rMD17, MatBench, QMOF, SPICE, and MD22. JMP demonstrates an average improvement of 59% over training from scratch, and matches or sets state-of-the-art on 34 out of 40 tasks. Our work highlights the potential of pre-training strategies that utilize diverse data to advance property prediction across chemical domains, especially for low-data tasks.
    摘要 基础模型已经为自然语言处理和计算机视觉等机器学习领域带来了变革。然而,在原子性质预测方面,类似的成功受到了限制,原因在于很难训练出能够跨多个化学领域有效工作的模型。为解决这一问题,我们提出了联合多领域预训练(Joint Multi-domain Pre-training,JMP):一种有监督预训练策略,它在多任务框架下同时在来自不同化学领域的多个数据集上训练,并将每个数据集视为一个独立的预训练任务。我们的合并训练集包含来自OC20、OC22、ANI-1x和Transition-1x的约1.2亿个体系。我们通过在QM9、rMD17、MatBench、QMOF、SPICE和MD22等多样的下游任务和数据集上进行微调来评估性能与泛化能力。JMP相对于从零开始训练平均提升59%,并在40个任务中的34个上追平或刷新了当前最佳水平。我们的工作表明,利用多样化数据进行预训练能够推动跨化学领域的性质预测,对低数据任务尤其有益。

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

  • paper_url: http://arxiv.org/abs/2310.16795
  • repo_url: https://github.com/ist-daslab/qmoe
  • paper_authors: Elias Frantar, Dan Alistarh
  • for: 这篇论文旨在解决大语言模型(LLM)的高推理成本问题:稀疏路由的 MoE 能带来更快、更准确的模型,但代价是庞大的参数量。
  • methods: 论文提出了一种新的压缩与执行框架 QMoE。该框架包含一种可扩展的算法,能够将万亿参数规模的 MoE 准确地压缩到每参数不足 1 比特,并采用与定制 GPU 解码核协同设计的格式,实现高效的端到端压缩推理。
  • results: QMoE 可以在单个 GPU 上、不到一天的时间内,将 1.6 万亿参数的 SwitchTransformer-c2048 模型压缩到不足 160GB(约 20 倍压缩,每参数 0.8 比特),且精度损失很小。这使得首次可以在价格可负担的商用硬件上运行万亿参数模型,例如配备 4 块 NVIDIA A6000 或 8 块 NVIDIA 3090 GPU 的单台服务器,相对于理想的无压缩推理,运行时开销不到 5%。
    Abstract Mixture-of-Experts (MoE) architectures offer a general solution to the high inference costs of large language models (LLMs) via sparse routing, bringing faster and more accurate models, at the cost of massive parameter counts. For example, the SwitchTransformer-c2048 model has 1.6 trillion parameters, requiring 3.2TB of accelerator memory to run efficiently, which makes practical deployment challenging and expensive. In this paper, we present a solution to this memory problem, in form of a new compression and execution framework called QMoE. Specifically, QMoE consists of a scalable algorithm which accurately compresses trillion-parameter MoEs to less than 1 bit per parameter, in a custom format co-designed with bespoke GPU decoding kernels to facilitate efficient end-to-end compressed inference, with minor runtime overheads relative to uncompressed execution. Concretely, QMoE can compress the 1.6 trillion parameter SwitchTransformer-c2048 model to less than 160GB (20x compression, 0.8 bits per parameter) at only minor accuracy loss, in less than a day on a single GPU. This enables, for the first time, the execution of a trillion-parameter model on affordable commodity hardware, like a single server with 4x NVIDIA A6000 or 8x NVIDIA 3090 GPUs, at less than 5% runtime overhead relative to ideal uncompressed inference. The source code and compressed models are available at github.com/IST-DASLab/qmoe.
    摘要 专家混合(Mixture-of-Experts,MoE)架构通过稀疏路由为大语言模型(LLM)的高推理成本提供了一种通用解决方案,带来了更快、更准确的模型,但代价是巨大的参数量。例如,SwitchTransformer-c2048 模型拥有 1.6 万亿参数,高效运行需要 3.2TB 的加速器内存,这使得实际部署既困难又昂贵。本文提出了解决这一内存问题的方案,即新的压缩与执行框架 QMoE。具体而言,QMoE 包含一种可扩展的算法,能够将万亿参数规模的 MoE 准确压缩到每参数不足 1 比特,并采用与定制 GPU 解码核协同设计的格式,以支持高效的端到端压缩推理,相对于未压缩执行仅有轻微的运行时开销。具体来说,QMoE 可以在单个 GPU 上、不到一天的时间内,将 1.6 万亿参数的 SwitchTransformer-c2048 模型压缩到不足 160GB(20 倍压缩,每参数 0.8 比特),且精度损失很小。这使得首次可以在价格可负担的商用硬件上运行万亿参数模型,例如配备 4 块 NVIDIA A6000 或 8 块 NVIDIA 3090 GPU 的单台服务器,相对于理想的无压缩推理,运行时开销不到 5%。源代码和压缩模型可在 github.com/IST-DASLab/qmoe 获取。

Learning Independent Program and Architecture Representations for Generalizable Performance Modeling

  • paper_url: http://arxiv.org/abs/2310.16792
  • repo_url: None
  • paper_authors: Lingda Li, Thomas Flynn, Adolfy Hoisie
  • for: 这篇论文提出了一种基于深度学习的性能模型框架,可以学习高维独立/正交的程序和微架构表示。
  • methods: 该框架使用深度学习来学习程序和微架构表示,并可以将程序表示应用于任何微架构,以及将微架构表示应用于任何程序的性能预测。
  • results: 评估表明,PerfVec比前一个方法更通用、高效和准确。
    Abstract This paper proposes PerfVec, a novel deep learning-based performance modeling framework that learns high-dimensional, independent/orthogonal program and microarchitecture representations. Once learned, a program representation can be used to predict its performance on any microarchitecture, and likewise, a microarchitecture representation can be applied in the performance prediction of any program. Additionally, PerfVec yields a foundation model that captures the performance essence of instructions, which can be directly used by developers in numerous performance modeling related tasks without incurring its training cost. The evaluation demonstrates that PerfVec is more general, efficient, and accurate than previous approaches.
    摘要 本文提出了 PerfVec,一种新颖的基于深度学习的性能建模框架,可以学习高维、相互独立/正交的程序表示和微架构表示。一旦学习完成,程序表示可用于预测该程序在任何微架构上的性能;同样地,微架构表示也可用于预测任何程序在该微架构上的性能。此外,PerfVec 还产出一个捕捉指令性能本质的基础模型,开发者可以在众多性能建模相关任务中直接使用它,而无需承担其训练成本。评估表明,PerfVec 比以往方法更通用、更高效、也更准确。

Covert Planning against Imperfect Observers

  • paper_url: http://arxiv.org/abs/2310.16791
  • repo_url: None
  • paper_authors: Haoxiang Ma, Chongyang Shi, Shuo Han, Michael R. Dorothy, Jie Fu
  • for: 本文研究如何利用随机动力学与观察者不完美观测之间的耦合来实现隐蔽规划,在不被检测的前提下最大化任务性能。
  • methods: 本文使用马尔可夫决策过程(MDP)建模智能体与其随机环境之间的交互,并用部分观测函数刻画泄露给被动观察者的信息。我们假设观察者通过假设检验来判断观测是否偏离名义策略;隐蔽规划智能体的目标是最大化总折扣回报,同时将被检测为对手的概率控制在给定阈值以下(目标的形式化写法见本条目下方)。
  • results: 我们证明了在隐蔽规划中有限记忆策略比马尔可夫策略更强,随后开发了一种基于 primal-dual proximal policy gradient、采用双时间尺度更新的方法来计算(局部)最优隐蔽策略。我们通过一个随机网格世界示例验证了方法的有效性:实验结果表明,所提方法能够在不违反检测约束的前提下最大化智能体的期望回报,并从经验上展示了环境噪声对隐蔽策略性能的影响。
    Abstract Covert planning refers to a class of constrained planning problems where an agent aims to accomplish a task with minimal information leaked to a passive observer to avoid detection. However, existing methods of covert planning often consider deterministic environments or do not exploit the observer's imperfect information. This paper studies how covert planning can leverage the coupling of stochastic dynamics and the observer's imperfect observation to achieve optimal task performance without being detected. Specifically, we employ a Markov decision process to model the interaction between the agent and its stochastic environment, and a partial observation function to capture the leaked information to a passive observer. Assuming the observer employs hypothesis testing to detect if the observation deviates from a nominal policy, the covert planning agent aims to maximize the total discounted reward while keeping the probability of being detected as an adversary below a given threshold. We prove that finite-memory policies are more powerful than Markovian policies in covert planning. Then, we develop a primal-dual proximal policy gradient method with a two-time-scale update to compute a (locally) optimal covert policy. We demonstrate the effectiveness of our methods using a stochastic gridworld example. Our experimental results illustrate that the proposed method computes a policy that maximizes the adversary's expected reward without violating the detection constraint, and empirically demonstrates how the environmental noises can influence the performance of the covert policies.
    摘要 隐蔽规划(covert planning)是一类受约束的规划问题:智能体希望在尽量少向被动观察者泄露信息、避免被检测的情况下完成任务。然而,现有的隐蔽规划方法往往只考虑确定性环境,或者没有利用观察者观测的不完美性。本文研究隐蔽规划如何利用随机动力学与观察者不完美观测之间的耦合,在不被检测的情况下取得最优任务性能。具体而言,我们用马尔可夫决策过程建模智能体与其随机环境之间的交互,并用部分观测函数刻画泄露给被动观察者的信息。假设观察者通过假设检验来判断观测是否偏离名义策略,隐蔽规划智能体的目标是最大化总折扣回报,同时将被判定为对手的概率保持在给定阈值以下。我们证明了在隐蔽规划中有限记忆策略比马尔可夫策略更强;随后开发了一种采用双时间尺度更新的 primal-dual proximal policy gradient 方法来计算(局部)最优隐蔽策略。我们通过一个随机网格世界示例展示了方法的有效性:实验结果表明,所提方法能够在不违反检测约束的情况下最大化智能体的期望回报,并从经验上说明环境噪声如何影响隐蔽策略的性能。
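
下面以公式形式给出上述约束优化目标的一个简洁写法(记号为本摘要的示意性约定,未必与论文完全一致):

```latex
% 示意性目标:最大化折扣回报,同时约束被检测概率
\max_{\pi}\; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\Pr\nolimits^{\pi}\!\big(\text{观察者的假设检验判定智能体异常}\big) \le \delta .
```

其中 $\gamma$ 为折扣因子,$\delta$ 为允许的最大检测概率;论文中的 primal-dual 方法正是通过拉格朗日乘子把这一约束并入目标,来求解(局部)最优隐蔽策略。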

The Simplest Inflationary Potentials

  • paper_url: http://arxiv.org/abs/2310.16786
  • repo_url: None
  • paper_authors: Tomás Sousa, Deaglan J. Bartlett, Harry Desmond, Pedro G. Ferreira
  • for: 这篇论文旨在研究早期宇宙的暴胀(Inflation)理论;该理论与现有的宇宙微波背景和大尺度结构观测相容,并且是探测原初引力波的重要推动力。
  • methods: 这篇论文使用一种新的符号回归(symbolic regression)方法,针对两组候选基算符之一,生成所有可能的简单标量场势。
  • results: 论文将这些势视为单场慢滚暴胀模型,用信息论度量("最小描述长度")为其打分,衡量其压缩 Planck 数据信息的效率,并在势的参数空间上探索了两种不同的先验。
    Abstract Inflation is a highly favoured theory for the early Universe. It is compatible with current observations of the cosmic microwave background and large scale structure and is a driver in the quest to detect primordial gravitational waves. It is also, given the current quality of the data, highly under-determined with a large number of candidate implementations. We use a new method in symbolic regression to generate all possible simple scalar field potentials for one of two possible basis sets of operators. Treating these as single-field, slow-roll inflationary models we then score them with an information-theoretic metric ("minimum description length") that quantifies their efficiency in compressing the information in the Planck data. We explore two possible priors on the parameter space of potentials, one related to the functions' structural complexity and one that uses a Katz back-off language model to prefer functions that may be theoretically motivated. This enables us to identify the inflaton potentials that optimally balance simplicity with accuracy at explaining the Planck data, which may subsequently find theoretical motivation. Our exploratory study opens the door to extraction of fundamental physics directly from data, and may be augmented with more refined theoretical priors in the quest for a complete understanding of the early Universe.
    摘要 暴胀(inflation)是目前备受青睐的早期宇宙理论。它与宇宙微波背景和大尺度结构的现有观测结果相容,也是探测原初引力波的重要动力。然而,在当前数据质量下,该理论仍然高度欠定,存在大量候选的具体实现。我们使用一种新的符号回归方法,针对两组候选基算符之一,生成所有可能的简单标量场势;将其视为单场慢滚暴胀模型后,再用信息论度量("最小描述长度")为其打分,量化其压缩 Planck 数据信息的效率。我们在势的参数空间上探索了两种先验:一种与函数的结构复杂度相关,另一种使用 Katz back-off 语言模型,偏好可能具有理论动机的函数。由此我们可以找出在简洁性与解释 Planck 数据的准确性之间达到最优平衡的暴胀子势,它们随后可能获得理论上的支持。我们这项探索性研究为直接从数据中提取基础物理打开了大门,未来还可以结合更精细的理论先验,以求完整理解早期宇宙。

Simple, Scalable and Effective Clustering via One-Dimensional Projections

  • paper_url: http://arxiv.org/abs/2310.16752
  • repo_url: https://github.com/boredoms/prone
  • paper_authors: Moses Charikar, Monika Henzinger, Lunjia Hu, Maxmilian Vötsch, Erik Waingarten
  • for: 这篇论文的目的是提出一种简单、可扩展的随机化聚类算法,在保持聚类质量的同时显著降低运行时间。
  • methods: 该算法将数据随机投影到一维后再进行划分,并以 k-means 目标函数来衡量每个划分的质量(概念示意代码见本条目下方)。
  • results: 作者通过理论分析和实验验证了该算法的近似保证与有效性,并发现它在运行时间与聚类质量之间提供了新的权衡。
    Abstract Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $\Omega(ndk)$ time when clustering $n$ points in a $d$-dimensional space (represented by an $n\times d$ matrix $X$) into $k$ clusters. In applications with moderate to large $k$, the multiplicative $k$ factor can become very expensive. We introduce a simple randomized clustering algorithm that provably runs in expected time $O(\mathrm{nnz}(X) + n\log n)$ for arbitrary $k$. Here $\mathrm{nnz}(X)$ is the total number of non-zero entries in the input dataset $X$, which is upper bounded by $nd$ and can be significantly smaller for sparse datasets. We prove that our algorithm achieves approximation ratio $\smash{\widetilde{O}(k^4)}$ on any input dataset for the $k$-means objective. We also believe that our theoretical analysis is of independent interest, as we show that the approximation ratio of a $k$-means algorithm is approximately preserved under a class of projections and that $k$-means++ seeding can be implemented in expected $O(n \log n)$ time in one dimension. Finally, we show experimentally that our clustering algorithm gives a new tradeoff between running time and cluster quality compared to previous state-of-the-art methods for these tasks.
    摘要 聚类是无监督机器学习中的基本问题,在数据分析中有许多应用。流行的聚类算法(如 Lloyd 算法和 $k$-means++)在将 $d$ 维空间中的 $n$ 个点(以 $n\times d$ 矩阵 $X$ 表示)划分为 $k$ 个簇时,可能需要 $\Omega(ndk)$ 的时间;当 $k$ 取中等到较大的值时,这个乘法因子 $k$ 会变得非常昂贵。我们提出了一个简单的随机化聚类算法,对任意 $k$ 其期望运行时间可证明为 $O(\mathrm{nnz}(X) + n\log n)$,其中 $\mathrm{nnz}(X)$ 是输入数据集 $X$ 中非零元素的总数,其上界为 $nd$,对稀疏数据集可能远小于此。我们证明该算法在 $k$-means 目标下对任何输入数据集都能达到 $\smash{\widetilde{O}(k^4)}$ 的近似比。我们认为相关的理论分析本身也具有独立的价值:我们证明了 $k$-means 算法的近似比在一类投影下近似保持,且一维情形下 $k$-means++ 的种子选取可以在期望 $O(n\log n)$ 时间内实现。最后,实验表明,与此前的最优方法相比,我们的聚类算法在运行时间与聚类质量之间提供了新的权衡。
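
下面是"一维投影聚类"这一思路的极简 Python 示意。它只作概念演示,并不是论文算法的实现:随机选取投影方向、在一维投影上直接调用 k-means 来做划分等,都是这里的简化假设。

```python
# 示意:通过一维投影做聚类(概念演示,非论文算法的实现)
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
true_centers = rng.normal(scale=6.0, size=(3, 10))            # 假设的三个簇中心,d=10
X = np.vstack([rng.normal(loc=mu, scale=0.5, size=(200, 10)) for mu in true_centers])
k = 3

v = rng.normal(size=X.shape[1])
v /= np.linalg.norm(v)                                         # 随机单位投影方向(假设)
proj = X @ v                                                   # 把所有点投影到一维

# 在一维投影上划分成 k 段(这里直接用一维 k-means 代替论文中的划分步骤)
labels = KMeans(n_clusters=k, n_init=5, random_state=0).fit_predict(proj.reshape(-1, 1))

# 以原始空间中各簇均值为中心,计算 k-means 目标值
centers = np.vstack([X[labels == j].mean(axis=0) for j in range(k)])
cost = sum(np.sum((X[labels == j] - centers[j]) ** 2) for j in range(k))
print("k-means objective:", round(float(cost), 2))
```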

Stochastic Latent Transformer: Efficient Modelling of Stochastically Forced Zonal Jets

  • paper_url: http://arxiv.org/abs/2310.16741
  • repo_url: https://github.com/ira-shokar/stochastic_latent_transformer
  • paper_authors: Ira J. S. Shokar, Rich R. Kerswell, Peter H. Haynes
  • for: 用于生成大量 ensemble,以便研究流体动力系统中的统计问题,如自发转换事件的概率。
  • methods: 使用 'Stochastic Latent Transformer' 深度学习模型,它由一个带随机强迫的 transformer 与一个平移等变(translation-equivariant)自编码器组成,能够在不同的积分时长上复现系统动力学。
  • results: 在一个研究充分的纬向急流(zonal jet)系统上,该模型相对数值积分实现了约五个数量级的加速,从而能够以低成本生成大规模系综,用于研究流体动力系统中的统计问题。
    Abstract We introduce the 'Stochastic Latent Transformer', a probabilistic deep learning approach for efficient reduced-order modelling of stochastic partial differential equations (SPDEs). Despite recent advances in deep learning for fluid mechanics, limited research has explored modelling stochastically driven flows - which play a crucial role in understanding a broad spectrum of phenomena, from jets on giant planets to ocean circulation and the variability of midlatitude weather. The model architecture consists of a stochastically-forced transformer, paired with a translation-equivariant autoencoder, that we demonstrate is capable of reproducing system dynamics across various integration periods. We demonstrate its effectiveness applied to a well-researched zonal jet system, with the neural network achieving a five-order-of-magnitude speedup compared to numerical integration. This facilitates the cost-effective generation of large ensembles, enabling the exploration of statistical questions concerning probabilities of spontaneous transition events.
    摘要 我们提出"随机潜变量 Transformer"(Stochastic Latent Transformer),一种用于随机偏微分方程(SPDE)高效降阶建模的概率深度学习方法。尽管深度学习在流体力学领域近来进展迅速,但针对随机强迫流动的建模研究仍然有限,而这类流动对于理解从巨行星上的急流、海洋环流到中纬度天气变率等广泛现象都至关重要。模型结构由一个受随机强迫的 transformer 与一个平移等变自编码器组成,我们证明它能够在不同的积分时长上复现系统动力学。我们将其应用于一个研究充分的纬向急流系统,神经网络相对于数值积分实现了五个数量级的加速。这使得以低成本生成大规模系综成为可能,从而可以研究诸如自发转变事件发生概率等统计问题。

MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.16730
  • repo_url: None
  • paper_authors: Dong-Ki Kim, Sungryull Sohn, Lajanugen Logeswaran, Dongsub Shim, Honglak Lee
  • for: 这篇论文针对基于强化学习(RL)的自动化提示优化问题,即如何利用RL来优化提示,从而生成可解释、且与黑盒基础模型兼容的提示。
  • methods: 论文提出了新的 MultiPrompter 框架,将提示优化视为多个提示器(prompter)之间的合作博弈:各提示器轮流共同组合出一条提示,从而缩小问题规模,并帮助提示器学习到最优提示。
  • results: 在文本到图像任务上的测试表明,MultiPrompter 方法能生成比基线质量更高的图像。
    Abstract Recently, there has been an increasing interest in automated prompt optimization based on reinforcement learning (RL). This approach offers important advantages, such as generating interpretable prompts and being compatible with black-box foundation models. However, the substantial prompt space size poses challenges for RL-based methods, often leading to suboptimal policy convergence. This paper introduces MultiPrompter, a new framework that views prompt optimization as a cooperative game between prompters which take turns composing a prompt together. Our cooperative prompt optimization effectively reduces the problem size and helps prompters learn optimal prompts. We test our method on the text-to-image task and show its ability to generate higher-quality images than baselines.
    摘要 近期,基于强化学习(RL)的自动化提示优化受到越来越多的关注。这类方法具有重要优势,例如能够生成可解释的提示,并且与黑盒基础模型兼容。然而,巨大的提示空间给基于RL的方法带来了挑战,往往导致策略收敛到次优解。本文提出 MultiPrompter 这一新框架,将提示优化视为多个提示器之间的合作博弈,由它们轮流共同组合出一条提示。这种合作式提示优化有效地缩小了问题规模,并帮助提示器学习到最优提示。我们在文本到图像任务上测试了该方法,结果表明它能生成比基线方法质量更高的图像。

AI Hazard Management: A framework for the systematic management of root causes for AI risks

  • paper_url: http://arxiv.org/abs/2310.16727
  • repo_url: None
  • paper_authors: Ronald Schnitzer, Andreas Hapfelmeier, Sven Gaube, Sonja Zillner
  • for: This paper aims to provide a structured process for identifying, assessing, and treating risks associated with Artificial Intelligence (AI) systems, called AI Hazard Management (AIHM) framework.
  • methods: The proposed AIHM framework is based on a comprehensive state-of-the-art analysis of AI hazards and provides a systematic approach to identify, assess, and treat AI hazards in parallel with the development of AI systems.
  • results: The proposed framework can increase the overall quality of AI systems by systematically reducing the impact of identified hazards to an acceptable level, and provides a taxonomy to support the optimal treatment of identified AI hazards. Additionally, the framework ensures the auditability of AI systems by systematically documenting evidence that the potential impact of identified AI hazards could be reduced to a tolerable level.
    Abstract Recent advancements in the field of Artificial Intelligence (AI) establish the basis to address challenging tasks. However, with the integration of AI, new risks arise. Therefore, to benefit from its advantages, it is essential to adequately handle the risks associated with AI. Existing risk management processes in related fields, such as software systems, need to sufficiently consider the specifics of AI. A key challenge is to systematically and transparently identify and address AI risks' root causes - also called AI hazards. This paper introduces the AI Hazard Management (AIHM) framework, which provides a structured process to systematically identify, assess, and treat AI hazards. The proposed process is conducted in parallel with the development to ensure that any AI hazard is captured at the earliest possible stage of the AI system's life cycle. In addition, to ensure the AI system's auditability, the proposed framework systematically documents evidence that the potential impact of identified AI hazards could be reduced to a tolerable level. The framework builds upon an AI hazard list from a comprehensive state-of-the-art analysis. Also, we provide a taxonomy that supports the optimal treatment of the identified AI hazards. Additionally, we illustrate how the AIHM framework can increase the overall quality of a power grid AI use case by systematically reducing the impact of identified hazards to an acceptable level.
    摘要 人工智能(AI)技术的最新进展为解决高难度任务奠定了基础。然而,随着AI的引入,新的风险也随之出现。因此,为了获得其优势,必须妥善处理与AI相关的风险。软件系统等相关领域现有的风险管理流程需要充分考虑AI的特殊性。其中的关键挑战在于系统化、透明地识别并处理AI风险的根本原因,即所谓的AI危险源(AI hazards)。本文提出AI危险源管理(AI Hazard Management,AIHM)框架,提供一个结构化流程,用于系统地识别、评估和处置AI危险源。该流程与开发过程并行进行,以确保在AI系统生命周期尽可能早的阶段捕获任何AI危险源。此外,为确保AI系统的可审计性,该框架还系统地记录证据,证明已识别的AI危险源的潜在影响可以被降低到可容忍的水平。框架建立在一份来自全面前沿分析的AI危险源清单之上;我们还提供了一套分类法,用于支持对已识别AI危险源的最优处置。最后,我们以一个电网AI用例为例,展示AIHM框架如何通过系统地将已识别危险源的影响降低到可接受水平,从而提升系统的整体质量。

Wasserstein Gradient Flow over Variational Parameter Space for Variational Inference

  • paper_url: http://arxiv.org/abs/2310.16705
  • repo_url: None
  • paper_authors: Dai Hai Nguyen, Tetsuya Sakurai, Hiroshi Mamitsuka
  • for: 这篇论文将变分推断(VI)重新表述为一个定义在变分参数空间上的概率分布优化问题,并提出用 Wasserstein 梯度下降来求解该问题。
  • methods: 论文采用 Wasserstein 梯度下降处理上述优化问题,并给出了数值求解离散梯度流的实用方法。
  • results: 在合成数据集上的实验与理论分析验证了所提方法的有效性;此外,黑盒变分推断与自然梯度变分推断这两类已有优化技术都可以被重新解释为所提 Wasserstein 梯度下降的特例。
    Abstract Variational inference (VI) can be cast as an optimization problem in which the variational parameters are tuned to closely align a variational distribution with the true posterior. The optimization task can be approached through vanilla gradient descent in black-box VI or natural-gradient descent in natural-gradient VI. In this work, we reframe VI as the optimization of an objective that concerns probability distributions defined over a \textit{variational parameter space}. Subsequently, we propose Wasserstein gradient descent for tackling this optimization problem. Notably, the optimization techniques, namely black-box VI and natural-gradient VI, can be reinterpreted as specific instances of the proposed Wasserstein gradient descent. To enhance the efficiency of optimization, we develop practical methods for numerically solving the discrete gradient flows. We validate the effectiveness of the proposed methods through empirical experiments on a synthetic dataset, supplemented by theoretical analyses.
    摘要 变分推断(VI)可以被视为一个优化问题:通过调节变分参数,使变分分布尽可能接近真实后验。这个优化任务可以在黑盒VI中用普通梯度下降、或在自然梯度VI中用自然梯度下降来求解。在本文中,我们将VI重新表述为一个关于定义在变分参数空间上的概率分布的目标的优化问题,并提出用 Wasserstein 梯度下降来解决该优化问题。值得注意的是,黑盒VI与自然梯度VI这两种优化技术都可以被重新解释为所提 Wasserstein 梯度下降的特例。为提高优化效率,我们还开发了数值求解离散梯度流的实用方法。我们通过在合成数据集上的实验以及理论分析验证了所提方法的有效性。

Interpretable time series neural representation for classification purposes

  • paper_url: http://arxiv.org/abs/2310.16696
  • repo_url: None
  • paper_authors: Etienne Le Naour, Ghislain Agoua, Nicolas Baskiotis, Vincent Guigue
  • for: 本研究旨在提出一种可解释性强的神经网络模型,用于解决现有的时间序列数据表示方法缺乏可解释性的问题。
  • methods: 本研究提出了一组单变量时间序列神经表示要具备可解释性所需满足的要求,并提出了一种满足这些要求的新型无监督神经网络架构。该模型在无监督设置下独立于任何下游任务进行学习,以确保其鲁棒性。
  • results: 在使用 UCR 档案数据集的分类任务实验中,所提模型在多个数据集上平均优于其他可解释模型以及当前最先进的神经表示学习模型。此外,我们还进行了定性实验来评估该方法的可解释性。
    Abstract Deep learning has made significant advances in creating efficient representations of time series data by automatically identifying complex patterns. However, these approaches lack interpretability, as the time series is transformed into a latent vector that is not easily interpretable. On the other hand, Symbolic Aggregate approximation (SAX) methods allow the creation of symbolic representations that can be interpreted but do not capture complex patterns effectively. In this work, we propose a set of requirements for a neural representation of univariate time series to be interpretable. We propose a new unsupervised neural architecture that meets these requirements. The proposed model produces consistent, discrete, interpretable, and visualizable representations. The model is learned independently of any downstream tasks in an unsupervised setting to ensure robustness. As a demonstration of the effectiveness of the proposed model, we propose experiments on classification tasks using UCR archive datasets. The obtained results are extensively compared to other interpretable models and state-of-the-art neural representation learning models. The experiments show that the proposed model yields, on average better results than other interpretable approaches on multiple datasets. We also present qualitative experiments to asses the interpretability of the approach.
    摘要 深度学习通过自动识别复杂模式,在构建时间序列数据的高效表示方面取得了显著进展。然而,这些方法缺乏可解释性:时间序列被变换为不易解读的潜向量。另一方面,符号聚合近似(SAX)方法能够产生可解释的符号表示,却难以有效捕捉复杂模式。本文提出了一组使单变量时间序列的神经表示具备可解释性所需满足的要求,并提出了一种满足这些要求的新型无监督神经架构。该模型能产生一致、离散、可解释且可可视化的表示,并在无监督设置下独立于任何下游任务进行学习,以确保其鲁棒性。为展示所提模型的有效性,我们在 UCR 档案数据集上开展分类实验,并与其他可解释模型以及最先进的神经表示学习模型进行了充分比较。实验表明,所提模型在多个数据集上平均优于其他可解释方法。我们还通过定性实验评估了该方法的可解释性。

Local Discovery by Partitioning: Polynomial-Time Causal Discovery Around Exposure-Outcome Pairs

  • paper_url: http://arxiv.org/abs/2310.17816
  • repo_url: None
  • paper_authors: Jacqueline Maasch, Weishen Pan, Shantanu Gupta, Volodymyr Kuleshov, Kyra Gan, Fei Wang
  • for: 该研究旨在解决在先验知识有限的情况下自动选择协变量(covariate)的问题。
  • methods: 该研究提出了 Local Discovery by Partitioning(LDP)算法,它将变量集 Z 划分为若干由其与暴露-结果对 {X,Y} 的关系所定义的子集。该算法以识别有效调整集为动机,但避免了自动协变量选择方法通常采用的预处理(pretreatment)假设。
  • results: 研究给出理论保证:只要 Z 满足足够的图结构条件,LDP 就会返回一个有效的调整集;在更强的条件下,划分标签是渐近正确的。独立性检验的总次数在最坏情况下与 |Z| 呈二次关系,而实验中观察到的是次二次的运行时间。与基线相比,LDP 得到的调整集能产生偏差更小、更精确的平均处理效应估计,并在混杂因子召回率、检验次数和有效调整集发现的运行时间上均优于基线。
    Abstract This work addresses the problem of automated covariate selection under limited prior knowledge. Given an exposure-outcome pair {X,Y} and a variable set Z of unknown causal structure, the Local Discovery by Partitioning (LDP) algorithm partitions Z into subsets defined by their relation to {X,Y}. We enumerate eight exhaustive and mutually exclusive partitions of any arbitrary Z and leverage this taxonomy to differentiate confounders from other variable types. LDP is motivated by valid adjustment set identification, but avoids the pretreatment assumption commonly made by automated covariate selection methods. We provide theoretical guarantees that LDP returns a valid adjustment set for any Z that meets sufficient graphical conditions. Under stronger conditions, we prove that partition labels are asymptotically correct. Total independence tests is worst-case quadratic in |Z|, with sub-quadratic runtimes observed empirically. We numerically validate our theoretical guarantees on synthetic and semi-synthetic graphs. Adjustment sets from LDP yield less biased and more precise average treatment effect estimates than baselines, with LDP outperforming on confounder recall, test count, and runtime for valid adjustment set discovery.
    摘要 本文研究在先验知识有限的情况下自动选择协变量的问题。给定一个暴露-结果对 {X,Y} 以及因果结构未知的变量集 Z,Local Discovery by Partitioning(LDP)算法将 Z 划分为由其与 {X,Y} 的关系定义的若干子集。我们枚举了任意 Z 的八种互斥且完备的划分类别,并利用这一分类法将混杂因子与其他类型的变量区分开来。LDP 以识别有效调整集为动机,但避免了自动协变量选择方法常用的预处理假设。我们给出理论保证:只要 Z 满足足够的图结构条件,LDP 返回的就是有效调整集;在更强的条件下,划分标签渐近正确。独立性检验总次数在最坏情况下与 |Z| 呈二次关系,实验中则观察到次二次的运行时间。我们在合成与半合成图上对理论保证进行了数值验证。与基线相比,LDP 得到的调整集产生偏差更小、更精确的平均处理效应估计,并在混杂因子召回率、检验次数和有效调整集发现的运行时间上优于基线。

Learning-based adaption of robotic friction models

  • paper_url: http://arxiv.org/abs/2310.16688
  • repo_url: None
  • paper_authors: Philipp Scholl, Maged Iskandar, Sebastian Wolf, Jinoh Lee, Aras Bacho, Alexander Dietrich, Alin Albu-Schäffer, Gitta Kutyniok
  • for: This paper aims to address the challenge of modeling friction torque in robotic joints, which is a longstanding problem due to the lack of a good mathematical description.
  • methods: The authors propose a novel approach based on residual learning to adapt an existing friction model to new dynamics using as little data as possible. They use a base neural network to learn an accurate relation between velocity and friction torque, and then train a second network to predict the residual of the initial network’s output.
  • results: The authors demonstrate that their proposed estimator outperforms the conventional model-based approach and the base neural network significantly, with an approximately 60-70% improvement in trajectory tracking accuracy. They also show that their method can adapt to diverse scenarios based on prior knowledge about friction in different settings, using only 43 seconds of robot movement data.
    Abstract In the Fourth Industrial Revolution, wherein artificial intelligence and the automation of machines occupy a central role, the deployment of robots is indispensable. However, the manufacturing process using robots, especially in collaboration with humans, is highly intricate. In particular, modeling the friction torque in robotic joints is a longstanding problem due to the lack of a good mathematical description. This motivates the usage of data-driven methods in recent works. However, model-based and data-driven models often exhibit limitations in their ability to generalize beyond the specific dynamics they were trained on, as we demonstrate in this paper. To address this challenge, we introduce a novel approach based on residual learning, which aims to adapt an existing friction model to new dynamics using as little data as possible. We validate our approach by training a base neural network on a symmetric friction data set to learn an accurate relation between the velocity and the friction torque. Subsequently, to adapt to more complex asymmetric settings, we train a second network on a small dataset, focusing on predicting the residual of the initial network's output. By combining the output of both networks in a suitable manner, our proposed estimator outperforms the conventional model-based approach and the base neural network significantly. Furthermore, we evaluate our method on trajectories involving external loads and still observe a substantial improvement, approximately 60-70\%, over the conventional approach. Our method does not rely on data with external load during training, eliminating the need for external torque sensors. This demonstrates the generalization capability of our approach, even with a small amount of data-only 43 seconds of a robot movement-enabling adaptation to diverse scenarios based on prior knowledge about friction in different settings.
    摘要 在以人工智能和机器自动化为核心的第四次工业革命中,机器人的部署不可或缺。然而,利用机器人进行生产,尤其是与人协作时,过程非常复杂。特别地,由于缺乏良好的数学描述,机器人关节中摩擦力矩的建模一直是一个长期未解的难题,这促使近期工作转向数据驱动方法。然而,正如本文所展示的,基于模型的方法和数据驱动模型往往难以泛化到训练时未见过的动力学特性。为了应对这一挑战,我们提出一种基于残差学习的新方法,旨在用尽可能少的数据使已有的摩擦模型适应新的动力学。我们首先在一个对称摩擦数据集上训练基础神经网络,学习速度与摩擦力矩之间的精确关系;随后,为适应更复杂的非对称情形,我们在一个小数据集上训练第二个网络,专门预测初始网络输出的残差。将两个网络的输出以适当方式组合后,所提出的估计器显著优于传统的基于模型的方法和基础神经网络。此外,我们在带外部负载的轨迹上评估该方法,仍然观察到相对传统方法约 60% 至 70% 的显著提升。我们的方法在训练时不依赖带外部负载的数据,因此无需外部力矩传感器。这表明该方法具有良好的泛化能力:仅凭 43 秒的机器人运动数据,就能基于对不同场景下摩擦的先验知识适应多样的情形。
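
下面是"基础模型 + 残差网络"这一思路的简化 Python 示意。它只演示概念,并非论文的原始实现;其中的摩擦模型形式、数据生成方式与网络规模均为此处的假设。

```python
# 示意:残差学习适应新的摩擦动力学(概念演示,非论文原实现)
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def friction_symmetric(v):        # 假设的对称摩擦:粘性项 + 库仑项
    return 0.8 * v + 0.5 * np.sign(v)

def friction_asymmetric(v):       # 假设的新情形:带非对称偏移
    return friction_symmetric(v) + 0.3 * (v > 0)

# 1) 在大规模对称数据上训练基础网络:速度 -> 摩擦力矩
v_base = rng.uniform(-2, 2, size=5000)
base_net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
base_net.fit(v_base.reshape(-1, 1), friction_symmetric(v_base))

# 2) 仅用少量新数据训练残差网络:预测基础网络输出的残差
v_new = rng.uniform(-2, 2, size=200)          # 相当于"很短的机器人运动数据"
residual = friction_asymmetric(v_new) - base_net.predict(v_new.reshape(-1, 1))
res_net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
res_net.fit(v_new.reshape(-1, 1), residual)

# 3) 组合两个网络的输出作为最终摩擦估计
v_test = np.linspace(-2, 2, 5).reshape(-1, 1)
tau_hat = base_net.predict(v_test) + res_net.predict(v_test)
print(np.c_[v_test.ravel(), tau_hat.round(3), friction_asymmetric(v_test.ravel()).round(3)])
```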

Robust and Actively Secure Serverless Collaborative Learning

  • paper_url: http://arxiv.org/abs/2310.16678
  • repo_url: None
  • paper_authors: Olive Franzese, Adam Dziedzic, Christopher A. Choquette-Choo, Mark R. Thomas, Muhammad Ahmad Kaleem, Stephan Rabanser, Congyu Fang, Somesh Jha, Nicolas Papernot, Xiao Wang
  • for: 本研究旨在提供一种安全和可靠的分布式机器学习(Collaborative Machine Learning)方法,以保护客户端数据点免受服务器或客户端的攻击。
  • methods: 本研究使用了分布式机器学习(Distributed Machine Learning)的方法,并提出了一种安全和可靠的peer-to-peer(P2P)学习方案,可以防止服务器和客户端的不可靠行为。
  • results: 研究表明,该方法可以在1000个参与者的情况下,对1000万参数的模型进行训练,并且可以防止服务器和客户端的攻击。
    Abstract Collaborative machine learning (ML) is widely used to enable institutions to learn better models from distributed data. While collaborative approaches to learning intuitively protect user data, they remain vulnerable to either the server, the clients, or both, deviating from the protocol. Indeed, because the protocol is asymmetric, a malicious server can abuse its power to reconstruct client data points. Conversely, malicious clients can corrupt learning with malicious updates. Thus, both clients and servers require a guarantee when the other cannot be trusted to fully cooperate. In this work, we propose a peer-to-peer (P2P) learning scheme that is secure against malicious servers and robust to malicious clients. Our core contribution is a generic framework that transforms any (compatible) algorithm for robust aggregation of model updates to the setting where servers and clients can act maliciously. Finally, we demonstrate the computational efficiency of our approach even with 1-million parameter models trained by 100s of peers on standard datasets.
    摘要 协作式机器学习(collaborative ML)被广泛用于让各机构从分布式数据中学习更好的模型。虽然协作式学习方式在直观上保护了用户数据,但当服务器、客户端或两者偏离协议时,它仍然是脆弱的。事实上,由于协议是非对称的,恶意服务器可以滥用其权力来重构客户端的数据点;反过来,恶意客户端也可以通过恶意更新破坏学习过程。因此,当另一方无法被完全信任时,客户端和服务器都需要某种保证。在这项工作中,我们提出一种点对点(P2P)学习方案,它既能抵御恶意服务器,又对恶意客户端具有鲁棒性。我们的核心贡献是一个通用框架,可以把任何(兼容的)鲁棒模型更新聚合算法转换到服务器和客户端都可能作恶的设定中。最后,我们证明了该方法的计算效率:即使是由数百个对等节点在标准数据集上训练的百万参数模型也依然可行。

UAV Pathfinding in Dynamic Obstacle Avoidance with Multi-agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.16659
  • repo_url: None
  • paper_authors: Qizhen Wu, Lei Chen, Kexin Liu, Jinhu Lv
  • For: solve the dynamic obstacle avoidance problem online for multi-agent systems
  • Methods: centralized training with decentralized execution based on multi-agent reinforcement learning, with improved model predictive control for efficiency and sample utilization
  • Results: experimental results in simulation, indoor, and outdoor environments validate the effectiveness of the proposed method, with video available at https://www.bilibili.com/video/BV1gw41197hV/?vd_source=9de61aecdd9fb684e546d032ef7fe7bf
    Abstract Multi-agent reinforcement learning based methods are significant for online planning of feasible and safe paths for agents in dynamic and uncertain scenarios. Although some methods like fully centralized and fully decentralized methods achieve a certain measure of success, they also encounter problems such as dimension explosion and poor convergence, respectively. In this paper, we propose a novel centralized training with decentralized execution method based on multi-agent reinforcement learning to solve the dynamic obstacle avoidance problem online. In this approach, each agent communicates only with the central planner or only with its neighbors, respectively, to plan feasible and safe paths online. We improve our methods based on the idea of model predictive control to increase the training efficiency and sample utilization of agents. The experimental results in both simulation, indoor, and outdoor environments validate the effectiveness of our method. The video is available at https://www.bilibili.com/video/BV1gw41197hV/?vd_source=9de61aecdd9fb684e546d032ef7fe7bf
    摘要 基于多智能体强化学习的方法对于在动态、不确定场景中为智能体在线规划可行且安全的路径至关重要。尽管完全集中式和完全分布式等方法取得了一定的成功,但它们也分别面临维度爆炸和收敛性差等问题。本文提出一种新颖的"集中训练、分布执行"的多智能体强化学习方法,用于在线求解动态避障问题。在该方法中,每个智能体分别只与中央规划器或只与其邻居通信,以在线规划可行且安全的路径。我们借鉴模型预测控制的思想改进方法,以提高训练效率和智能体的样本利用率。在仿真、室内和室外环境中的实验结果验证了所提方法的有效性。视频见 https://www.bilibili.com/video/BV1gw41197hV/?vd_source=9de61aecdd9fb684e546d032ef7fe7bf

Towards Control-Centric Representations in Reinforcement Learning from Images

  • paper_url: http://arxiv.org/abs/2310.16655
  • repo_url: None
  • paper_authors: Chen Liu, Hongyu Zang, Xin Li, Yong Heng, Yifei Wang, Zhen Fang, Yisen Wang, Mingzhong Wang
  • for: 基于图像的强化学习实用却具有挑战性,难点在于提取以控制为中心的表示并忽略无关信息,本文旨在解决这一问题
  • methods: 将无奖励的控制信息与奖励特定的知识相结合,使用 transformer 架构隐式建模动力学,并采用分块掩码(block-wise masking)消除时空冗余信息;同时将 bisimulation 损失与非对称重建损失结合,以防止稀疏奖励环境中的特征坍塌
  • results: 在 Atari 游戏和 DeepMind Control Suite 两大标准基准上表现出色,优于现有方法,证明了其有效性
    Abstract Image-based Reinforcement Learning is a practical yet challenging task. A major hurdle lies in extracting control-centric representations while disregarding irrelevant information. While approaches that follow the bisimulation principle exhibit the potential in learning state representations to address this issue, they still grapple with the limited expressive capacity of latent dynamics and the inadaptability to sparse reward environments. To address these limitations, we introduce ReBis, which aims to capture control-centric information by integrating reward-free control information alongside reward-specific knowledge. ReBis utilizes a transformer architecture to implicitly model the dynamics and incorporates block-wise masking to eliminate spatiotemporal redundancy. Moreover, ReBis combines bisimulation-based loss with asymmetric reconstruction loss to prevent feature collapse in environments with sparse rewards. Empirical studies on two large benchmarks, including Atari games and DeepMind Control Suit, demonstrate that ReBis has superior performance compared to existing methods, proving its effectiveness.
    摘要 基于图像的强化学习是一项实用却具有挑战性的任务,主要难点在于提取以控制为中心的表示,同时忽略无关信息。遵循 bisimulation 原理的方法在学习状态表示以解决这一问题上展现出潜力,但它们仍然受制于潜在动力学有限的表达能力,且难以适应稀疏奖励环境。为克服这些限制,我们提出了 ReBis,它通过把无奖励的控制信息与奖励特定的知识相结合来捕获以控制为中心的信息。ReBis 使用 transformer 架构隐式地建模动力学,并引入分块掩码来消除时空冗余;此外,ReBis 将基于 bisimulation 的损失与非对称重建损失相结合,以防止稀疏奖励环境中的特征坍塌。在 Atari 游戏和 DeepMind Control Suite 两大基准上的实证研究表明,ReBis 的性能优于现有方法,证明了其有效性。

  • paper_url: http://arxiv.org/abs/2310.16652
  • repo_url: None
  • paper_authors: Linping Qu, Shenghui Song, Chi-Ying Tsui, Yuyi Mao
  • for: 这篇论文探讨了联邦学习(FL)在无线网络上实现时的鲁棒性,具体而言,研究了 FL 对上行和下行通信错误的容忍能力。
  • methods: 该论文使用了理论分析方法,探讨了 FL 中两个关键参数(客户端数量和模型参数范围)对稳定性的影响,并提出了一个公式来量化上行和下行通信错误之间的差异。
  • results: 研究发现,FL 在上行通信错误情况下可以忍受更高的比特错误率(BER),并且与客户端数量和模型参数范围有关。这些结论得到了实验 validate。
    Abstract Because of its privacy-preserving capability, federated learning (FL) has attracted significant attention from both academia and industry. However, when being implemented over wireless networks, it is not clear how much communication error can be tolerated by FL. This paper investigates the robustness of FL to the uplink and downlink communication error. Our theoretical analysis reveals that the robustness depends on two critical parameters, namely the number of clients and the numerical range of model parameters. It is also shown that the uplink communication in FL can tolerate a higher bit error rate (BER) than downlink communication, and this difference is quantified by a proposed formula. The findings and theoretical analyses are further validated by extensive experiments.
    摘要 由于其隐私保护能力,联邦学习(FL)吸引了学术界和产业界的广泛关注。然而,在无线网络上实现 FL 时,尚不清楚 FL 能够容忍多大的通信错误。本文研究了 FL 对上行和下行通信错误的鲁棒性。我们的理论分析表明,这种鲁棒性取决于两个关键参数:客户端数量和模型参数的数值范围。分析还表明,FL 的上行通信能够比下行通信容忍更高的比特错误率(BER),并通过我们提出的公式对这一差异进行了量化。大量实验进一步验证了上述发现与理论分析。

Posterior Consistency for Missing Data in Variational Autoencoders

  • paper_url: http://arxiv.org/abs/2310.16648
  • repo_url: None
  • paper_authors: Timur Sudak, Sebastian Tschiatschek
  • for: 研究如何从含缺失值的数据中学习变分自编码器(VAE),以改进其摊销后验推断(编码器)在缺失情形下的表现。
  • methods: 给出了后验一致性(posterior consistency)的形式化定义,并提出一种对编码器后验分布进行正则化的方法,以促进这种一致性。
  • results: 实验表明,该正则化在缺失值场景下提升了 VAE 的表现,包括重建质量以及利用潜空间不确定性的下游任务;这种提升适用于多种类型的 VAE,包括配备归一化流的 VAE。
    Abstract We consider the problem of learning Variational Autoencoders (VAEs), i.e., a type of deep generative model, from data with missing values. Such data is omnipresent in real-world applications of machine learning because complete data is often impossible or too costly to obtain. We particularly focus on improving a VAE's amortized posterior inference, i.e., the encoder, which in the case of missing data can be susceptible to learning inconsistent posterior distributions regarding the missingness. To this end, we provide a formal definition of posterior consistency and propose an approach for regularizing an encoder's posterior distribution which promotes this consistency. We observe that the proposed regularization suggests a different training objective than that typically considered in the literature when facing missing values. Furthermore, we empirically demonstrate that our regularization leads to improved performance in missing value settings in terms of reconstruction quality and downstream tasks utilizing uncertainty in the latent space. This improved performance can be observed for many classes of VAEs including VAEs equipped with normalizing flows.
    摘要 我们研究从含缺失值的数据中学习变分自编码器(VAE,一类深度生成模型)的问题。此类数据在机器学习的实际应用中无处不在,因为完整数据往往无法获得或获取成本过高。我们尤其关注改进 VAE 的摊销后验推断(即编码器):在数据缺失的情况下,编码器容易学到对于缺失情况不一致的后验分布。为此,我们给出了后验一致性的形式化定义,并提出一种对编码器后验分布进行正则化、以促进这种一致性的方法。我们注意到,所提出的正则化意味着一个不同于文献中处理缺失值时通常采用的训练目标。此外,实验表明,该正则化在缺失值场景下带来了性能提升,包括重建质量以及利用潜空间不确定性的下游任务。这种提升在多种类型的 VAE 上都能观察到,包括配备归一化流的 VAE。

Achieving Constraints in Neural Networks: A Stochastic Augmented Lagrangian Approach

  • paper_url: http://arxiv.org/abs/2310.16647
  • repo_url: None
  • paper_authors: Diogo Lavado, Cláudia Soares, Alessandra Micheletti
  • for: 这篇论文的目的是提高深度神经网络(DNN)的泛化能力并避免过拟合。
  • methods: 我们提出了一种新的 DNN 正则化方法,将训练过程构造成一个约束优化问题(数据拟合项为最小化目标,正则项作为约束),并使用随机增广拉格朗日(Stochastic Augmented Lagrangian,SAL)方法来实现更灵活、更高效的正则化机制(基本更新形式见本条目下方的示意)。
  • results: 实验结果显示,SAL 方法在基于图像的分类任务中取得了更高的准确率,同时也能更好地满足约束,显示了其在受限设定下优化 DNN 的潜力。
    Abstract Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting. Fixed penalty methods, though common, lack adaptability and suffer from hyperparameter sensitivity. In this paper, we propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem. Where the data fidelity term is the minimization objective and the regularization terms serve as constraints. Then, we employ the Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism. Our approach extends beyond black-box regularization, demonstrating significant improvements in white-box models, where weights are often subject to hard constraints to ensure interpretability. Experimental results on image-based classification on MNIST, CIFAR10, and CIFAR100 datasets validate the effectiveness of our approach. SAL consistently achieves higher Accuracy while also achieving better constraint satisfaction, thus showcasing its potential for optimizing DNNs under constrained settings.
    摘要 对深度神经网络(DNN)进行正则化对于提升泛化能力、防止过拟合至关重要。固定罚项方法虽然常见,却缺乏自适应性,且对超参数十分敏感。本文提出一种新的 DNN 正则化思路:将训练过程构造成一个约束优化问题,其中数据拟合项是最小化目标,正则项则作为约束。随后,我们采用随机增广拉格朗日(SAL)方法,以获得更灵活、更高效的正则化机制。我们的方法不仅适用于黑盒式正则化,在白盒模型上也带来了显著提升;在白盒模型中,权重常常需要满足硬约束以保证可解释性。在 MNIST、CIFAR10 和 CIFAR100 图像分类数据集上的实验结果验证了方法的有效性:SAL 在取得更高准确率的同时,也能更好地满足约束,展现了其在受限设定下优化 DNN 的潜力。
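
下面给出"用(随机)增广拉格朗日训练带约束网络"这一思路的极简 PyTorch 示意。它只是概念演示,并非论文的原始实现:示意约束 g(w)=||w||₁−c≤0、各超参数取值以及随机数据均为此处的假设。

```python
# 示意:用(随机)增广拉格朗日训练带约束的网络(概念演示,非论文原实现)
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 20)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()             # 假设的二分类数据

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
ce = nn.CrossEntropyLoss()

lam, rho, c = 0.0, 1.0, 50.0                          # 乘子、罚系数、约束阈值(均为假设)

def constraint(m):                                     # 示意约束 g(w) = ||w||_1 - c <= 0
    return sum(p.abs().sum() for p in m.parameters()) - c

for epoch in range(200):
    opt.zero_grad()
    g = constraint(model)
    # 不等式约束的增广拉格朗日项:(rho/2)*max(0, lam/rho + g)^2 - lam^2/(2*rho)
    al_term = 0.5 * rho * torch.clamp(lam / rho + g, min=0) ** 2 - lam ** 2 / (2 * rho)
    loss = ce(model(X), y) + al_term                   # 数据拟合项 + 增广拉格朗日项
    loss.backward()
    opt.step()                                         # 原变量:随机梯度步
    with torch.no_grad():                              # 对偶变量:投影上升步
        lam = max(0.0, lam + rho * float(constraint(model)))

print("final constraint value g(w) =", round(float(constraint(model)), 3))
```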

Model predictive control-based value estimation for efficient reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.16646
  • repo_url: None
  • paper_authors: Qizhen Wu, Kexin Liu, Lei Chen
  • for: 提高强化学习在实际应用中的效率,因为在现实环境中它往往需要大量交互。
  • methods: 一种基于模型预测控制的改进强化学习方法:以数据驱动方式建立环境模型,并在其上通过多步预测来估计值函数、优化策略(见本条目下方的示意代码)。
  • results: 方法在经典基准环境和无人机动态避障场景中得到实验验证,表现出更高的学习效率、策略更快地收敛至趋于最优的值,以及更小的经验回放缓冲区容量需求。
    Abstract Reinforcement learning suffers from limitations in real practices primarily due to the numbers of required interactions with virtual environments. It results in a challenging problem that we are implausible to obtain an optimal strategy only with a few attempts for many learning method. Hereby, we design an improved reinforcement learning method based on model predictive control that models the environment through a data-driven approach. Based on learned environmental model, it performs multi-step prediction to estimate the value function and optimize the policy. The method demonstrates higher learning efficiency, faster convergent speed of strategies tending to the optimal value, and fewer sample capacity space required by experience replay buffers. Experimental results, both in classic databases and in a dynamic obstacle avoidance scenario for unmanned aerial vehicle, validate the proposed approaches.
    摘要 强化学习在实际应用中受到限制,主要原因是需要与虚拟环境进行大量交互。这带来了一个棘手的问题:对许多学习方法而言,仅凭少量尝试很难获得最优策略。为此,我们设计了一种基于模型预测控制的改进强化学习方法,通过数据驱动的方式对环境进行建模;在学习到的环境模型之上,该方法进行多步预测来估计值函数并优化策略。该方法表现出更高的学习效率、策略更快地收敛到趋于最优的值,以及经验回放缓冲区所需的更小样本容量。在经典基准环境以及无人机动态避障场景中的实验结果验证了所提方法的有效性。
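
下面用一小段 Python 代码展示"利用学习到的环境模型做多步预测来估计值函数"这一思路。它只是概念性示意,并非论文原方法;其中的动力学模型、奖励模型、终端值函数与策略均为假设的占位函数。

```python
# 示意:基于学习到的模型进行 H 步展开的值估计(概念演示,非论文原实现)
import numpy as np

def learned_dynamics(s, a):      # 假设:已由数据拟合好的环境模型 s' = f(s, a)
    return 0.9 * s + 0.1 * a

def learned_reward(s, a):        # 假设:已拟合好的奖励模型
    return -(s ** 2) - 0.01 * (a ** 2)

def terminal_value(s):           # 假设:当前价值函数给出的终端值 V(s_H)
    return -(s ** 2)

def policy(s):                   # 假设:当前策略
    return -0.5 * s

def mpc_value_estimate(s0, horizon=5, gamma=0.99):
    """沿学习到的模型展开 horizon 步,累计折扣奖励并以 V(s_H) 收尾。"""
    s, value, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)
        value += discount * learned_reward(s, a)
        s = learned_dynamics(s, a)
        discount *= gamma
    return value + discount * terminal_value(s)

print("V_hat(s0=1.0) ~", round(mpc_value_estimate(1.0), 4))
```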

Robust Covariate Shift Adaptation for Density-Ratio Estimation

  • paper_url: http://arxiv.org/abs/2310.16638
  • repo_url: None
  • paper_authors: Masahiro Kato
  • for: 在测试数据只含协变量的情形下,预测测试数据中缺失的结果。
  • methods: 使用重要性加权与双重机器学习技术来适应协变量偏移(covariate shift),并提出一种双重稳健(doubly robust)估计器,以减少密度比估计误差带来的偏差(估计器的风险形式见本条目下方的公式)。
  • results: 实验研究表明,所提方法能够减少密度比估计误差带来的偏差;只要密度比估计器或回归函数两者之一是一致的,该方法就保持一致性。
    Abstract Consider a scenario where we have access to train data with both covariates and outcomes while test data only contains covariates. In this scenario, our primary aim is to predict the missing outcomes of the test data. With this objective in mind, we train parametric regression models under a covariate shift, where covariate distributions are different between the train and test data. For this problem, existing studies have proposed covariate shift adaptation via importance weighting using the density ratio. This approach averages the train data losses, each weighted by an estimated ratio of the covariate densities between the train and test data, to approximate the test-data risk. Although it allows us to obtain a test-data risk minimizer, its performance heavily relies on the accuracy of the density ratio estimation. Moreover, even if the density ratio can be consistently estimated, the estimation errors of the density ratio also yield bias in the estimators of the regression model's parameters of interest. To mitigate these challenges, we introduce a doubly robust estimator for covariate shift adaptation via importance weighting, which incorporates an additional estimator for the regression function. Leveraging double machine learning techniques, our estimator reduces the bias arising from the density ratio estimation errors. We demonstrate the asymptotic distribution of the regression parameter estimator. Notably, our estimator remains consistent if either the density ratio estimator or the regression function is consistent, showcasing its robustness against potential errors in density ratio estimation. Finally, we confirm the soundness of our proposed method via simulation studies.
    摘要 假设我们拥有同时包含协变量与结果的训练数据,而测试数据只包含协变量。在这种情形下,我们的主要目标是预测测试数据中缺失的结果。为此,我们在协变量偏移(训练与测试数据的协变量分布不同)下训练参数化回归模型。针对该问题,已有研究提出了基于密度比的重要性加权来进行协变量偏移适应:将每个训练样本的损失乘以训练与测试协变量密度之比的估计值再取平均,用以近似测试数据上的风险。虽然这样可以得到测试风险的最小化解,但其性能严重依赖于密度比估计的精度;而且即便密度比可以被一致地估计,密度比的估计误差仍会给回归模型目标参数的估计带来偏差。为缓解这些问题,我们提出了一种用于协变量偏移适应的双重稳健估计器,它在重要性加权之外额外引入了一个对回归函数的估计。借助双重机器学习技术,该估计器降低了由密度比估计误差引起的偏差。我们给出了回归参数估计量的渐近分布。值得注意的是,只要密度比估计器或回归函数两者之一是一致的,我们的估计器就保持一致,显示了其对密度比估计误差的稳健性。最后,我们通过模拟研究验证了所提方法的可靠性。
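
下面给出协变量偏移下"重要性加权 + 回归函数"式双重稳健风险估计的一种常见写法(示意性公式,记号与具体构造未必与论文一致):

```latex
% 示意性公式:协变量偏移下的双重稳健风险估计
\widehat{R}_{\mathrm{DR}}(f)
= \frac{1}{m}\sum_{j=1}^{m} \widehat{g}\big(x^{\mathrm{te}}_j\big)
+ \frac{1}{n}\sum_{i=1}^{n} \widehat{w}\big(x^{\mathrm{tr}}_i\big)
  \Big[\ell\big(y^{\mathrm{tr}}_i, f(x^{\mathrm{tr}}_i)\big) - \widehat{g}\big(x^{\mathrm{tr}}_i\big)\Big] .
```

其中 $\widehat{w}(x)$ 是测试与训练协变量密度之比的估计,$\widehat{g}(x)$ 是条件期望损失 $\mathbb{E}\,[\ell(Y, f(X)) \mid X = x]$ 的估计;只要 $\widehat{w}$ 与 $\widehat{g}$ 二者之一是一致的,该风险估计就是一致的,这正是"双重稳健"的含义。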

Photometric Redshifts with Copula Entropy

  • paper_url: http://arxiv.org/abs/2310.16633
  • repo_url: https://github.com/majianthu/quasar
  • paper_authors: Jian Ma
  • for: 应用 copula 熵(CE)来提高测光红移(photometric redshift)的精度。
  • methods: 用 CE 度量各项测光测量与红移之间的相关性,然后选择 CE 值高的测量来预测红移(特征选择流程的示意见本条目下方)。
  • results: 实验结果表明,使用所选测量(包括光度星等、紫外波段亮度及其标准差,以及其他四个波段的亮度)可以提高测光红移的精度,对高红移样本的预测尤其明显。
    Abstract In this paper we propose to apply copula entropy (CE) to photometric redshifts. CE is used to measure the correlations between photometric measurements and redshifts and then the measurements associated with high CEs are selected for predicting redshifts. We verified the proposed method on the SDSS quasar data. Experimental results show that the accuracy of photometric redshifts is improved with the selected measurements compared to the results with all the measurements used in the experiments, especially for the samples with high redshifts. The measurements selected with CE include luminosity magnitude, the brightness in ultraviolet band with standard deviation, and the brightness of the other four bands. Since CE is a rigorously defined mathematical concept, the models such derived is interpretable.
    摘要 本文提出将 copula 熵(CE)应用于测光红移。我们用 CE 度量测光测量与红移之间的相关性,然后选择 CE 值高的测量来预测红移。我们在 SDSS 类星体数据上验证了所提方法。实验结果表明,相比使用全部测量,使用所选测量能够提高测光红移的精度,对高红移样本尤其明显。通过 CE 选出的测量包括光度星等、紫外波段亮度及其标准差,以及其他四个波段的亮度。由于 CE 是一个有严格定义的数学概念,由此导出的模型具有可解释性。
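
下面是"用 copula 熵做特征选择"这一流程的简化 Python 示意(概念演示,非论文原实现)。这里利用 copula 熵等于互信息的相反数这一关系:先做经验 copula(秩)变换,再用 scikit-learn 的 k 近邻互信息估计器近似 CE;数据为随机生成的占位数据,k 近邻数等参数也是此处的假设。

```python
# 示意:基于 copula 熵的特征选择(CE = -互信息;概念演示,非论文原实现)
import numpy as np
from scipy.stats import rankdata
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 2000
# 占位数据:6 个"测光测量",其中前 2 个与"红移" z 相关
X = rng.normal(size=(n, 6))
z = 1.5 * X[:, 0] - 0.8 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

# 经验 copula 变换(把各变量化为 [0,1] 上的秩)
U = np.column_stack([rankdata(X[:, j]) / n for j in range(X.shape[1])])
u_z = rankdata(z) / n

# 用 kNN 互信息估计器近似互信息,copula 熵即其相反数
mi = mutual_info_regression(U, u_z, n_neighbors=5, random_state=0)
ce = -mi                                   # CE 越负(绝对值越大),相关性越强

order = np.argsort(ce)                     # 按相关性从强到弱排序特征
print("按相关性排序的特征索引:", order)
selected = order[:2]                       # 选取相关性最强的测量用于红移预测
print("被选中的测量:", selected, " 对应 CE:", ce[selected].round(3))
```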

Free-form Flows: Make Any Architecture a Normalizing Flow

  • paper_url: http://arxiv.org/abs/2310.16624
  • repo_url: https://github.com/vislearn/fff
  • paper_authors: Felix Draxler, Peter Sorrenson, Lea Zimmermann, Armand Rousselot, Ullrich Köthe
  • for: 本研究旨在拓宽 Normalizing Flows 的设计空间,使其能够更灵活地针对具体任务进行定制。
  • methods: 本研究提出一种针对变量替换公式梯度的高效估计器,使任意保持维度的神经网络都可以通过最大似然训练作为生成模型使用。
  • results: 研究者在分子生成基准上(利用 E(n)-等变网络)取得了出色的结果,并且在一个反问题基准上仅使用现成的 ResNet 架构也具有竞争力。
    Abstract Normalizing Flows are generative models that directly maximize the likelihood. Previously, the design of normalizing flows was largely constrained by the need for analytical invertibility. We overcome this constraint by a training procedure that uses an efficient estimator for the gradient of the change of variables formula. This enables any dimension-preserving neural network to serve as a generative model through maximum likelihood training. Our approach allows placing the emphasis on tailoring inductive biases precisely to the task at hand. Specifically, we achieve excellent results in molecule generation benchmarks utilizing $E(n)$-equivariant networks. Moreover, our method is competitive in an inverse problem benchmark, while employing off-the-shelf ResNet architectures.
    摘要 归一化流是一类直接最大化似然的生成模型。以往,归一化流的设计在很大程度上受限于解析可逆性的要求。我们通过一种对变量变换公式梯度的高效估计器来克服这一限制,使任何维度保持的神经网络都可以通过最大似然训练作为生成模型。我们的方法使人们可以把重点放在为具体任务精确定制归纳偏置上。具体而言,我们使用 $E(n)$-等变网络在分子生成基准测试中取得了出色的结果。此外,我们的方法在反问题基准测试中也具有竞争力,而且只使用现成的 ResNet 架构。

SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence

  • paper_url: http://arxiv.org/abs/2310.16620
  • repo_url: https://github.com/fangwei123456/spikingjelly
  • paper_authors: Wei Fang, Yanqi Chen, Jianhao Ding, Zhaofei Yu, Timothée Masquelier, Ding Chen, Liwei Huang, Huihui Zhou, Guoqi Li, Yonghong Tian
  • for: 这篇论文旨在通过引入神经动力学和脉冲特性,在神经形态芯片上实现高能效的类脑智能。
  • methods: 论文提出了名为 SpikingJelly 的全栈工具链,用于预处理神经形态数据集、构建深度脉冲神经网络 (SNN)、优化参数,并将 SNN 部署到神经形态芯片上。与现有方法相比,SpikingJelly 可以将深度 SNN 的训练加速 $11\times$。
  • results: SpikingJelly 提供了高度可扩展且灵活的工具链,用户可以通过多级继承和半自动代码生成,以低成本加速自定义模型。SpikingJelly 为构建真正高能效的基于 SNN 的机器智能系统铺平了道路,这将丰富神经形态计算的生态。
    Abstract Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency by introducing neural dynamics and spike properties. As the emerging spiking deep learning paradigm attracts increasing interest, traditional programming frameworks cannot meet the demands of the automatic differentiation, parallel computation acceleration, and high integration of processing neuromorphic datasets and deployment. In this work, we present the SpikingJelly framework to address the aforementioned dilemma. We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips. Compared to existing methods, the training of deep SNNs can be accelerated $11\times$, and the superior extensibility and flexibility of SpikingJelly enable users to accelerate custom models at low costs through multilevel inheritance and semiautomatic code generation. SpikingJelly paves the way for synthesizing truly energy-efficient SNN-based machine intelligence systems, which will enrich the ecology of neuromorphic computing.
    摘要 脉冲神经网络 (SNN) 旨在通过引入神经动力学和脉冲特性,在神经形态芯片上以高能效实现类脑智能。随着新兴的脉冲深度学习范式吸引越来越多的关注,传统编程框架已无法满足自动微分、并行计算加速,以及神经形态数据集处理与部署的高度集成需求。在这项工作中,我们提出 SpikingJelly 框架来解决上述困境。我们提供了一个全栈工具链,用于预处理神经形态数据集、构建深度 SNN、优化参数,并将 SNN 部署到神经形态芯片上。与现有方法相比,深度 SNN 的训练可以加速 $11\times$;SpikingJelly 出色的可扩展性和灵活性使用户能够通过多级继承和半自动代码生成,以低成本加速自定义模型。SpikingJelly 为构建真正高能效的基于 SNN 的机器智能系统铺平了道路,这将丰富神经形态计算的生态。

Performative Prediction: Past and Future

  • paper_url: http://arxiv.org/abs/2310.16608
  • repo_url: https://github.com/salmansust/Machine-Learning-TSF-Petroleum-Production
  • paper_authors: Moritz Hardt, Celestine Mendler-Dünner
  • for: 这篇论文主要关注的是机器学习预测的表现力和其对数据生成分布的影响。
  • methods: 该论文使用了定义和概念框架来研究机器学习中的表现力,并提出了一种自然平衡概念和学习与导航两种机制的分类。
  • results: 该论文发现了机器学习预测可能会导致数据生成分布的变化,并提出了一种新的优化挑战。同时,论文还探讨了数字市场中平台对参与者的导航问题。
    Abstract Predictions in the social world generally influence the target of prediction, a phenomenon known as performativity. Self-fulfilling and self-negating predictions are examples of performativity. Of fundamental importance to economics, finance, and the social sciences, the notion has been absent from the development of machine learning. In machine learning applications, performativity often surfaces as distribution shift. A predictive model deployed on a digital platform, for example, influences consumption and thereby changes the data-generating distribution. We survey the recently founded area of performative prediction that provides a definition and conceptual framework to study performativity in machine learning. A consequence of performative prediction is a natural equilibrium notion that gives rise to new optimization challenges. Another consequence is a distinction between learning and steering, two mechanisms at play in performative prediction. The notion of steering is in turn intimately related to questions of power in digital markets. We review the notion of performative power that gives an answer to the question how much a platform can steer participants through its predictions. We end on a discussion of future directions, such as the role that performativity plays in contesting algorithmic systems.
    摘要 社会世界中的预测通常会影响预测的对象,这一现象被称为表现性 (performativity)。自我实现和自我否定的预测都是表现性的例子。这一概念对经济学、金融学和社会科学至关重要,但在机器学习的发展中却一直缺席。在机器学习应用中,表现性往往以分布偏移的形式出现:例如,部署在数字平台上的预测模型会影响消费行为,从而改变数据生成分布。我们综述了最近创立的表现性预测 (performative prediction) 领域,它为研究机器学习中的表现性提供了定义和概念框架。表现性预测的一个结果是自然的均衡概念,由此产生新的优化挑战;另一个结果是区分学习与引导这两种在表现性预测中起作用的机制。引导的概念又与数字市场中的权力问题密切相关。我们回顾了表现性权力的概念,它回答了平台可以通过其预测在多大程度上引导参与者的问题。最后,我们讨论了未来方向,例如表现性在质疑算法系统中所起的作用。

AirFL-Mem: Improving Communication-Learning Trade-Off by Long-Term Memory

  • paper_url: http://arxiv.org/abs/2310.16606
  • repo_url: None
  • paper_authors: Haifeng Wen, Hong Xing, Osvaldo Simeone
  • for: 为缓解联邦学习 (FL) 中的通信瓶颈,空中计算联邦学习 (AirFL) 已成为一种有前途的解决方案,但它受到深度衰落信道条件的制约。
  • methods: 本文提出了 AirFL-Mem,一种通过长期记忆机制来减轻深度衰落影响的新方案,并针对一般非凸目标函数给出了收敛界,同时覆盖了长期记忆方案和现有的短期记忆 AirFL 变体。
  • results: 理论分析表明,AirFL-Mem 能达到与理想通信条件下 FedAvg 相同的收敛速率,而现有方案通常受限于误差下限;同时针对瑞利衰落信道下功率控制所用的截断阈值,提出了一种新的凸优化策略。实验结果验证了理论分析,证实了长期记忆机制在缓解深度衰落方面的优势。
    Abstract Addressing the communication bottleneck inherent in federated learning (FL), over-the-air FL (AirFL) has emerged as a promising solution, which is, however, hampered by deep fading conditions. In this paper, we propose AirFL-Mem, a novel scheme designed to mitigate the impact of deep fading by implementing a \emph{long-term} memory mechanism. Convergence bounds are provided that account for long-term memory, as well as for existing AirFL variants with short-term memory, for general non-convex objectives. The theory demonstrates that AirFL-Mem exhibits the same convergence rate of federated averaging (FedAvg) with ideal communication, while the performance of existing schemes is generally limited by error floors. The theoretical results are also leveraged to propose a novel convex optimization strategy for the truncation threshold used for power control in the presence of Rayleigh fading channels. Experimental results validate the analysis, confirming the advantages of a long-term memory mechanism for the mitigation of deep fading.
    摘要 为解决联邦学习 (FL) 中固有的通信瓶颈,空中计算联邦学习 (AirFL) 已成为一种有前途的解决方案,但它受到深度衰落条件的制约。本文提出 AirFL-Mem,一种通过引入长期记忆机制来减轻深度衰落影响的新方案。我们针对一般非凸目标函数给出了考虑长期记忆的收敛界,同时也覆盖了现有的短期记忆 AirFL 变体。理论表明,AirFL-Mem 能达到与理想通信条件下联邦平均 (FedAvg) 相同的收敛速率,而现有方案的性能通常受限于误差下限。我们还利用理论结果,针对瑞利衰落信道下功率控制所用的截断阈值提出了一种新的凸优化策略。实验结果验证了分析,证实了长期记忆机制在缓解深度衰落方面的优势。
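
Below is a schematic sketch of the kind of long-term (error-feedback) memory the abstract describes, not the paper's AirFL-Mem algorithm: each device remembers the part of its update that deep fades prevented it from delivering and adds it back in later rounds. The quadratic local objectives, the truncation threshold, and the fading model are illustrative assumptions.

```python
# Over-the-air gradient aggregation with a per-device long-term memory (error feedback).
import numpy as np

rng = np.random.default_rng(0)
dim, n_devices, rounds, lr, threshold = 10, 5, 200, 0.1, 0.3
w = np.zeros(dim)                       # global model
memory = np.zeros((n_devices, dim))     # per-device long-term memory
target = np.ones(dim)                   # toy common optimum

for _ in range(rounds):
    transmitted = np.zeros(dim)
    for i in range(n_devices):
        grad = w - target                           # gradient of 0.5*||w - target||^2
        update = grad + memory[i]                   # add back previously undelivered part
        fade = rng.rayleigh(scale=1.0, size=dim)    # per-coordinate Rayleigh fading
        sent = np.where(fade > threshold, update, 0.0)  # deep fades are truncated
        memory[i] = update - sent                   # remember what was not delivered
        transmitted += sent
    noise = rng.normal(0.0, 0.05, dim)              # additive channel noise at the server
    w -= lr * (transmitted + noise) / n_devices

print("distance to optimum:", round(float(np.linalg.norm(w - target)), 4))
```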

Parcel loss prediction in last-mile delivery: deep and non-deep approaches with insights from Explainable AI

  • paper_url: http://arxiv.org/abs/2310.16602
  • repo_url: None
  • paper_authors: Jan de Leeuw, Zaharah Bukhsh, Yingqian Zhang
  • for: 降低电子商务最后一公里配送阶段的包裹丢失是行业中的一个重要目标。
  • methods: 本文提出了两种机器学习方法,即 Data Balance with Supervised Learning (DBSL) 和 Deep Hybrid Ensemble Learning (DHEL),以准确预测包裹丢失。
  • results: 我们使用一年的比利时运单数据进行了全面评估,发现将前馈自编码器与随机森林相结合的 DHEL 模型取得了最佳分类性能。
    Abstract Within the domain of e-commerce retail, an important objective is the reduction of parcel loss during the last-mile delivery phase. The ever-increasing availability of data, including product, customer, and order information, has made it possible for the application of machine learning in parcel loss prediction. However, a significant challenge arises from the inherent imbalance in the data, i.e., only a very low percentage of parcels are lost. In this paper, we propose two machine learning approaches, namely, Data Balance with Supervised Learning (DBSL) and Deep Hybrid Ensemble Learning (DHEL), to accurately predict parcel loss. The practical implication of such predictions is their value in aiding e-commerce retailers in optimizing insurance-related decision-making policies. We conduct a comprehensive evaluation of the proposed machine learning models using one year data from Belgian shipments. The findings show that the DHEL model, which combines a feed-forward autoencoder with a random forest, achieves the highest classification performance. Furthermore, we use the techniques from Explainable AI (XAI) to illustrate how prediction models can be used in enhancing business processes and augmenting the overall value proposition for e-commerce retailers in the last mile delivery.
    摘要 在电子商务零售领域,一个重要目标是减少最后一公里配送阶段的包裹丢失。随着产品、顾客和订单等数据日益丰富,机器学习已可用于包裹丢失预测。然而,数据固有的不平衡是一个主要挑战,即只有极低比例的包裹会丢失。在本文中,我们提出了两种机器学习方法,即 Data Balance with Supervised Learning (DBSL) 和 Deep Hybrid Ensemble Learning (DHEL),以准确预测包裹丢失。这些预测结果对电子商务零售商具有实际价值,可以帮助他们优化与保险相关的决策政策。我们使用一年的比利时运单数据进行了全面评估,发现将前馈自编码器与随机森林相结合的 DHEL 模型取得了最高的分类性能。此外,我们还使用可解释 AI (XAI) 技术说明预测模型如何用于改进业务流程,并提升电子商务零售商在最后一公里配送中的整体价值主张。
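
The sketch below is not the authors' DBSL/DHEL pipeline; it only illustrates the "data balance + supervised learning" idea on a synthetic, heavily imbalanced parcel-loss dataset, using SMOTE from imbalanced-learn and a random forest.

```python
# Rebalancing a ~2% positive-rate "parcel lost" dataset before supervised learning.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE   # pip install imbalanced-learn

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.98, 0.02],
                           random_state=0)          # ~2% of parcels are "lost"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

baseline = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
balanced = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)

print("without balancing:\n", classification_report(y_te, baseline.predict(X_te), digits=3))
print("with SMOTE balancing:\n", classification_report(y_te, balanced.predict(X_te), digits=3))
```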

Beyond IID weights: sparse and low-rank deep Neural Networks are also Gaussian Processes

  • paper_url: http://arxiv.org/abs/2310.16597
  • repo_url: None
  • paper_authors: Thiziri Nait-Saada, Alireza Naderi, Jared Tanner
  • for: 这篇论文探讨了深度学习中的许多现象,例如深度网络的训练动力学,以及激活函数的选择对训练过程的影响。
  • methods: 作者将 Matthews et al. (2018) 的证明推广到更大的一类初始权重分布(称为 PSEUDO-IID),其中既包括已有的 IID 和正交权重,也包括以计算加速著称的低秩和结构化稀疏设置。
  • results: 作者证明了以 PSEUDO-IID 分布初始化的全连接网络和卷积网络在方差意义上实际等价,并且可以据此确定更广泛一类神经网络的混沌边缘 (Edge-of-Chaos),将网络调至临界状态以改善训练。
    Abstract The infinitely wide neural network has been proven a useful and manageable mathematical model that enables the understanding of many phenomena appearing in deep learning. One example is the convergence of random deep networks to Gaussian processes that allows a rigorous analysis of the way the choice of activation function and network weights impacts the training dynamics. In this paper, we extend the seminal proof of Matthews et al. (2018) to a larger class of initial weight distributions (which we call PSEUDO-IID), including the established cases of IID and orthogonal weights, as well as the emerging low-rank and structured sparse settings celebrated for their computational speed-up benefits. We show that fully-connected and convolutional networks initialized with PSEUDO-IID distributions are all effectively equivalent up to their variance. Using our results, one can identify the Edge-of-Chaos for a broader class of neural networks and tune them at criticality in order to enhance their training.
    摘要 无限宽神经网络已被证明是一个有用且易于处理的数学模型,有助于理解深度学习中出现的许多现象。一个例子是随机深度网络收敛到高斯过程,这使我们能够严格分析激活函数和网络权重的选择如何影响训练动力学。在本文中,我们将 Matthews et al. (2018) 的著名证明扩展到更大的一类初始权重分布(我们称之为 PSEUDO-IID),其中既包括已有的 IID 和正交权重设定,也包括因计算加速优势而受到关注的低秩和结构化稀疏设定。我们证明,以 PSEUDO-IID 分布初始化的全连接网络和卷积网络在方差意义上实际等价。利用我们的结果,可以为更广泛的一类神经网络确定混沌边缘 (Edge-of-Chaos),并将其调至临界状态以改善训练。

Over-the-air Federated Policy Gradient

  • paper_url: http://arxiv.org/abs/2310.16592
  • repo_url: None
  • paper_authors: Huiwen Yang, Lingying Huang, Subhrakanti Dey, Ling Shi
  • for: 这篇论文研究大规模分布式学习、优化与感知中的空中聚合技术。
  • methods: 本文提出了空中计算联邦策略梯度算法:所有智能体同时向公共无线信道广播携带本地信息的模拟信号,中央控制器利用接收到的叠加波形更新策略参数。
  • results: 文章分析了噪声和信道失真对所提算法收敛性的影响,给出了求取 $\epsilon$-近似驻点所需的通信和采样复杂度,并通过仿真结果验证了算法的有效性。
    Abstract In recent years, over-the-air aggregation has been widely considered in large-scale distributed learning, optimization, and sensing. In this paper, we propose the over-the-air federated policy gradient algorithm, where all agents simultaneously broadcast an analog signal carrying local information to a common wireless channel, and a central controller uses the received aggregated waveform to update the policy parameters. We investigate the effect of noise and channel distortion on the convergence of the proposed algorithm, and establish the complexities of communication and sampling for finding an $\epsilon$-approximate stationary point. Finally, we present some simulation results to show the effectiveness of the algorithm.
    摘要 近年来,空中聚合在大规模分布式学习、优化与感知中受到广泛关注。本文提出了空中计算联邦策略梯度算法:所有智能体同时向公共无线信道广播携带本地信息的模拟信号,中央控制器利用接收到的叠加波形更新策略参数。我们研究了噪声和信道失真对所提算法收敛性的影响,并给出了求取 $\epsilon$-近似驻点所需的通信和采样复杂度。最后,我们给出了仿真结果以展示算法的有效性。
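
The toy simulation below illustrates only the analog superposition step described in the abstract: agents' local gradients are summed by the channel with additive noise before the controller's update. The quadratic local objectives and the channel model are illustrative assumptions, not the paper's setting.

```python
# Over-the-air aggregation of local gradients for a shared parameter vector.
import numpy as np

rng = np.random.default_rng(0)
dim, n_agents, steps, lr, noise_std = 8, 10, 300, 0.05, 0.1
theta = np.zeros(dim)                         # shared policy parameters
local_optima = rng.normal(size=(n_agents, dim))
target = local_optima.mean(axis=0)            # minimizer of the average objective

for _ in range(steps):
    # Each agent's local gradient of 0.5 * ||theta - local_optimum_i||^2.
    local_grads = theta - local_optima        # shape (n_agents, dim)
    # The channel superimposes all analog transmissions and adds receiver noise.
    received = local_grads.sum(axis=0) + rng.normal(0.0, noise_std, dim)
    theta -= lr * received / n_agents         # controller's update from the waveform

print("distance to average optimum:", round(float(np.linalg.norm(theta - target)), 4))
```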

Multi-parallel-task Time-delay Reservoir Computing combining a Silicon Microring with WDM

  • paper_url: http://arxiv.org/abs/2310.16588
  • repo_url: None
  • paper_authors: Bernard J. Giron Castro, Christophe Peucheret, Darko Zibar, Francesco Da Ros
  • for: 该论文利用单一方案同时求解三个任务:时间序列预测、分类和无线信道均衡。
  • methods: 论文采用基于微环谐振器的时延储备池计算方案,并针对每个任务优化泵浦功率和频率失谐。
  • results: 在波分复用信道上执行的每个任务均达到了最先进的性能。
    Abstract We numerically demonstrate a microring-based time-delay reservoir computing scheme that simultaneously solves three tasks involving time-series prediction, classification, and wireless channel equalization. Each task performed on a wavelength-multiplexed channel achieves state-of-the-art performance with optimized power and frequency detuning.
    摘要 我们通过数值仿真演示了一种基于微环谐振器的时延储备池计算方案,可同时求解时间序列预测、分类和无线信道均衡三个任务。在优化泵浦功率和频率失谐后,每个任务在各自的波分复用信道上均达到了最先进的性能。

Mapping the magnetic field using a magnetometer array with noisy input Gaussian process regression

  • paper_url: http://arxiv.org/abs/2310.16577
  • repo_url: None
  • paper_authors: Thomas Edridge, Manon Kok
  • for: 室内环境中磁性材料会导致环境磁场的干扰,这些磁场干扰地图可以用于室内定位。
  • methods: 我们使用了 Gaussian 过程来学习磁场的空间变化大小,使用磁计的测量值和磁计的位置信息。
  • results: 我们的方法可以提高磁场地图的质量,并且通过实验数据示出了这种方法的有效性。
    Abstract Ferromagnetic materials in indoor environments give rise to disturbances in the ambient magnetic field. Maps of these magnetic disturbances can be used for indoor localisation. A Gaussian process can be used to learn the spatially varying magnitude of the magnetic field using magnetometer measurements and information about the position of the magnetometer. The position of the magnetometer, however, is frequently only approximately known. This negatively affects the quality of the magnetic field map. In this paper, we investigate how an array of magnetometers can be used to improve the quality of the magnetic field map. The position of the array is approximately known, but the relative locations of the magnetometers on the array are known. We include this information in a novel method to make a map of the ambient magnetic field. We study the properties of our method in simulation and show that our method improves the map quality. We also demonstrate the efficacy of our method with experimental data for the mapping of the magnetic field using an array of 30 magnetometers.
    摘要 室内环境中的铁磁性材料会对环境磁场造成扰动,这些磁场扰动的地图可用于室内定位。利用磁力计测量值及磁力计位置信息,可以用高斯过程学习磁场强度的空间变化。然而,磁力计的位置往往只是近似已知,这会降低磁场地图的质量。在本文中,我们研究如何利用磁力计阵列来提高磁场地图的质量:阵列的位置只是近似已知,但磁力计在阵列上的相对位置是已知的。我们将这一信息纳入一种新的环境磁场建图方法中。我们通过仿真研究了该方法的性质,表明它能提高地图质量;我们还利用由 30 个磁力计组成的阵列的实验数据,验证了该方法在磁场建图中的有效性。
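
As a minimal illustration of the underlying tool, the sketch below fits a plain GP map of field magnitude over positions with scikit-learn. It deliberately ignores the paper's key ingredient (noisy magnetometer positions and the known relative geometry of the array); the field model and units are toy assumptions.

```python
# Basic GP regression of a magnetic-field-magnitude map from position/measurement pairs.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
positions = rng.uniform(0, 5, size=(200, 2))               # measurement locations (m)

def true_field(p):                                          # toy spatial anomaly
    return 50 + 5 * np.exp(-np.sum((p - [2.5, 2.5]) ** 2, axis=1))

magnitude = true_field(positions) + rng.normal(0, 0.2, len(positions))

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(positions, magnitude)

# Predict the map (with uncertainty) on a grid of query points.
grid = np.stack(np.meshgrid(np.linspace(0, 5, 20), np.linspace(0, 5, 20)), -1).reshape(-1, 2)
mean, std = gp.predict(grid, return_std=True)
print("max predicted magnitude:", round(mean.max(), 2), " mean predictive std:", round(std.mean(), 3))
```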

Large-scale magnetic field maps using structured kernel interpolation for Gaussian process regression

  • paper_url: http://arxiv.org/abs/2310.16574
  • repo_url: None
  • paper_authors: Clara Menzen, Marnix Fetter, Manon Kok
  • for: 这篇论文旨在利用近似高斯过程回归计算室内环境中大规模的磁场地图。
  • methods: 论文采用带导数的结构化核插值 (SKI) 方法,利用高效的 Krylov 子空间方法加速推断,使预测均值和协方差的计算复杂度与数据点数呈线性关系。
  • results: 仿真表明,该方法在建图区域不断增大的情况下比现有最先进方法更准确;在大规模实验中,可以在标准笔记本电脑上于两分钟内由 40000 个三维磁场测量构建磁场地图。
    Abstract We present a mapping algorithm to compute large-scale magnetic field maps in indoor environments with approximate Gaussian process (GP) regression. Mapping the spatial variations in the ambient magnetic field can be used for localization algorithms in indoor areas. To compute such a map, GP regression is a suitable tool because it provides predictions of the magnetic field at new locations along with uncertainty quantification. Because full GP regression has a complexity that grows cubically with the number of data points, approximations for GPs have been extensively studied. In this paper, we build on the structured kernel interpolation (SKI) framework, speeding up inference by exploiting efficient Krylov subspace methods. More specifically, we incorporate SKI with derivatives (D-SKI) into the scalar potential model for magnetic field modeling and compute both predictive mean and covariance with a complexity that is linear in the data points. In our simulations, we show that our method achieves better accuracy than current state-of-the-art methods on magnetic field maps with a growing mapping area. In our large-scale experiments, we construct magnetic field maps from up to 40000 three-dimensional magnetic field measurements in less than two minutes on a standard laptop.
    摘要 我们提出了一种建图算法,利用近似高斯过程 (GP) 回归计算室内环境中大规模的磁场地图。环境磁场空间变化的地图可用于室内定位算法。GP 回归是计算这类地图的合适工具,因为它既能预测新位置处的磁场,又能量化预测的不确定性。然而,完整 GP 回归的复杂度随数据点数呈三次方增长,因此 GP 的近似方法已被广泛研究。在本文中,我们在结构化核插值 (SKI) 框架的基础上,利用高效的 Krylov 子空间方法加速推断。更具体地,我们将带导数的 SKI (D-SKI) 引入磁场建模的标量位势模型,使预测均值和协方差的计算复杂度与数据点数呈线性关系。仿真表明,随着建图区域增大,我们的方法比当前最先进方法更准确。在大规模实验中,我们在标准笔记本电脑上用不到两分钟即可由多达 40000 个三维磁场测量构建磁场地图。

Model-enhanced Contrastive Reinforcement Learning for Sequential Recommendation

  • paper_url: http://arxiv.org/abs/2310.16566
  • repo_url: None
  • paper_authors: Chengpeng Li, Zhengyi Yang, Jizhi Zhang, Jiancan Wu, Dingxian Wang, Xiangnan He, Xiang Wang
  • for: 提高推荐系统的长期用户满意度,通过Markov决策过程(MDP)来形式化推荐,并使用了强化学习(RL)方法优化。
  • methods: 提出了一种名为模型强化对比强化学习(MCRL)的新的RL推荐器,通过同时学习价值函数和保守价值学习机制来缓解过度估计问题,并使用对比学习来利用MDP的内部结构信息来模型奖励函数和状态转移函数。
  • results: 实验结果表明,在两个真实世界数据集上,MCRL 方法显著优于现有的离线 RL 和自监督 RL 方法。
    Abstract Reinforcement learning (RL) has been widely applied in recommendation systems due to its potential in optimizing the long-term engagement of users. From the perspective of RL, recommendation can be formulated as a Markov decision process (MDP), where recommendation system (agent) can interact with users (environment) and acquire feedback (reward signals).However, it is impractical to conduct online interactions with the concern on user experience and implementation complexity, and we can only train RL recommenders with offline datasets containing limited reward signals and state transitions. Therefore, the data sparsity issue of reward signals and state transitions is very severe, while it has long been overlooked by existing RL recommenders.Worse still, RL methods learn through the trial-and-error mode, but negative feedback cannot be obtained in implicit feedback recommendation tasks, which aggravates the overestimation problem of offline RL recommender. To address these challenges, we propose a novel RL recommender named model-enhanced contrastive reinforcement learning (MCRL). On the one hand, we learn a value function to estimate the long-term engagement of users, together with a conservative value learning mechanism to alleviate the overestimation problem.On the other hand, we construct some positive and negative state-action pairs to model the reward function and state transition function with contrastive learning to exploit the internal structure information of MDP. Experiments demonstrate that the proposed method significantly outperforms existing offline RL and self-supervised RL methods with different representative backbone networks on two real-world datasets.
    摘要 强化学习 (RL) 因其在优化用户长期参与度方面的潜力而被广泛应用于推荐系统。从 RL 的角度看,推荐可以被建模为马尔可夫决策过程 (MDP):推荐系统(智能体)与用户(环境)交互并获得反馈(奖励信号)。然而,出于用户体验和实现复杂度的考虑,在线交互并不现实,我们只能利用奖励信号和状态转移都很有限的离线数据集来训练 RL 推荐器。因此,奖励信号和状态转移的数据稀疏问题非常严重,而现有 RL 推荐器长期忽视了这一问题。更糟糕的是,RL 方法通过试错方式学习,但在隐式反馈推荐任务中无法获得负反馈,这加剧了离线 RL 推荐器的过高估计问题。为应对这些挑战,我们提出了一种新的 RL 推荐器,称为模型增强对比强化学习 (MCRL)。一方面,我们学习一个价值函数来估计用户的长期参与度,并引入保守价值学习机制来缓解过高估计问题;另一方面,我们构造正负状态-动作对,利用对比学习来建模奖励函数和状态转移函数,从而挖掘 MDP 的内部结构信息。实验表明,在两个真实世界数据集上,所提方法在多种代表性骨干网络下显著优于现有的离线 RL 和自监督 RL 方法。

DECWA : Density-Based Clustering using Wasserstein Distance

  • paper_url: http://arxiv.org/abs/2310.16552
  • repo_url: https://github.com/nabilem/decwa
  • paper_authors: Nabil El Malki, Robin Cugny, Olivier Teste, Franck Ravat
  • for: 这篇论文提出了一种新的簇刻画方式和一种基于空间密度与概率方法的聚类算法,以解决现有基于密度的聚类方法在低密度簇、密度相近的邻近簇以及高维数据上的不足。
  • methods: 该方法首先利用点对距离的概率密度函数 ($p.d.f$) 表示空间密度来构建子簇,然后结合子簇的密度 ($p.d.f$) 与空间距离,利用 Wasserstein 度量来凝聚相似的子簇。
  • results: 实验表明,该方法在多种数据集上均优于现有最先进的基于密度的聚类方法。
    Abstract Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Among these methods, state-of-the-art density-based clustering methods have proven to be effective for arbitrary-shaped clusters. Despite their encouraging results, they suffer to find low-density clusters, near clusters with similar densities, and high-dimensional data. Our proposals are a new characterization of clusters and a new clustering algorithm based on spatial density and probabilistic approach. First of all, sub-clusters are built using spatial density represented as probability density function ($p.d.f$) of pairwise distances between points. A method is then proposed to agglomerate similar sub-clusters by using both their density ($p.d.f$) and their spatial distance. The key idea we propose is to use the Wasserstein metric, a powerful tool to measure the distance between $p.d.f$ of sub-clusters. We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.
    摘要 聚类是一种通过发现称为簇的数据分组来提取知识的数据分析方法。其中,最先进的基于密度的聚类方法已被证明对任意形状的簇十分有效;但它们难以发现低密度簇、密度相近的邻近簇,也难以处理高维数据。我们的贡献是一种新的簇刻画方式和一种基于空间密度与概率方法的聚类算法。首先,我们利用点对距离的概率密度函数 ($p.d.f$) 表示空间密度来构建子簇;然后,我们提出一种结合子簇密度 ($p.d.f$) 与空间距离来凝聚相似子簇的方法。我们的核心思想是使用 Wasserstein 度量这一强大工具来度量子簇 $p.d.f$ 之间的距离。实验表明,我们的方法在多种数据集上均优于现有最先进的基于密度的聚类方法。
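
The sketch below illustrates the kind of merging criterion the abstract describes: comparing two sub-clusters through the 1-D distributions of their pairwise point distances with the Wasserstein distance, together with their spatial gap. The toy sub-clusters and any thresholds are illustrative; the actual DECWA code is in the linked repository.

```python
# Compare sub-clusters via the Wasserstein distance between their pairwise-distance p.d.f.s.
import numpy as np
from scipy.spatial.distance import pdist, cdist
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
sub_a = rng.normal(loc=0.0, scale=0.5, size=(100, 2))   # dense sub-cluster
sub_b = rng.normal(loc=0.8, scale=0.5, size=(100, 2))   # similar density, nearby
sub_c = rng.normal(loc=6.0, scale=2.0, size=(100, 2))   # sparser, far away

def density_gap(p, q):
    """Wasserstein distance between the two pairwise-distance distributions."""
    return wasserstein_distance(pdist(p), pdist(q))

def spatial_gap(p, q):
    """Smallest point-to-point distance between the two sub-clusters."""
    return cdist(p, q).min()

for name, other in [("a-b", sub_b), ("a-c", sub_c)]:
    print(name, "density W1:", round(density_gap(sub_a, other), 3),
          "spatial gap:", round(spatial_gap(sub_a, other), 3))
# Sub-clusters with both a small density W1 and a small spatial gap (here a-b)
# would be candidates for merging into the same cluster.
```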

Cyclic Directed Probabilistic Graphical Model: A Proposal Based on Structured Outcomes

  • paper_url: http://arxiv.org/abs/2310.16525
  • repo_url: None
  • paper_authors: Oleksii Sirotkin
  • for: probabilistic graphical model 建模 (structural learning)
  • methods: 使用 probabilistic relation network (PRN) 直接捕捉有向循环依赖关系 (directional cyclic dependencies)
  • results: 支持从观测数据学习 (learning from observed data) 和概率推断 (probabilistic inference),可用于数据分析和专家设计等应用 (data analysis and expert design applications)
    Abstract In the process of building (structural learning) a probabilistic graphical model from a set of observed data, the directional, cyclic dependencies between the random variables of the model are often found. Existing graphical models such as Bayesian and Markov networks can reflect such dependencies. However, this requires complicating those models, such as adding additional variables or dividing the model graph into separate subgraphs. Herein, we describe a probabilistic graphical model - probabilistic relation network - that allows the direct capture of directional cyclic dependencies during structural learning. This model is based on the simple idea that each sample of the observed data can be represented by an arbitrary graph (structured outcome), which reflects the structure of the dependencies of the variables included in the sample. Each of the outcomes contains only a part of the graphical model structure; however, a complete graph of the probabilistic model is obtained by combining different outcomes. Such a graph, unlike Bayesian and Markov networks, can be directed and can have cycles. We explored the full joint distribution and conditional distribution and conditional independence properties of variables in the proposed model. We defined the algorithms for constructing of the model from the dataset and for calculating the conditional and full joint distributions. We also performed a numerical comparison with Bayesian and Markov networks. This model does not violate the probability axioms, and it supports learning from observed data. Notably, it supports probabilistic inference, making it a prospective tool in data analysis and in expert and design-making applications.
    摘要 在从观测数据构建(结构学习)概率图模型的过程中,经常会发现模型随机变量之间存在有向的循环依赖关系。现有的图模型(如贝叶斯网络和马尔可夫网络)可以反映这类依赖,但这需要使模型复杂化,例如添加额外变量或将模型图划分为多个子图。在本文中,我们描述了一种概率图模型,即概率关系网络 (probabilistic relation network),它能够在结构学习过程中直接捕捉有向的循环依赖。该模型基于一个简单的想法:每个观测数据样本都可以用一个任意的图(结构化结果)来表示,该图反映了样本中所含变量之间依赖关系的结构。每个结果只包含图模型结构的一部分,而将不同的结果组合起来就可以得到概率模型的完整图。与贝叶斯网络和马尔可夫网络不同,这样的图可以是有向的,并且可以包含环。我们研究了所提模型中变量的完整联合分布、条件分布以及条件独立性质,定义了从数据集构建模型以及计算条件分布和完整联合分布的算法,并与贝叶斯网络和马尔可夫网络进行了数值比较。该模型不违背概率公理,并支持从观测数据学习。值得注意的是,它还支持概率推断,因而有望成为数据分析以及专家与决策应用中的工具。

Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data

  • paper_url: http://arxiv.org/abs/2310.16524
  • repo_url: https://github.com/vanderschaarlab/3S-Testing
  • paper_authors: Boris van Breugel, Nabeel Seedat, Fergus Imrie, Mihaela van der Schaar
  • for: This paper aims to address the challenges of accurately assessing the performance of machine learning models on diverse and underrepresented subgroups.
  • methods: The paper proposes a deep generative modeling framework called 3S Testing, which generates synthetic test sets for small subgroups and simulates distributional shifts.
  • results: The authors demonstrate that 3S Testing outperforms traditional baselines in estimating model performance on minority subgroups and under plausible distributional shifts, and provides intervals around its performance estimates with superior coverage of the ground truth compared to existing approaches.
    Abstract Evaluating the performance of machine learning models on diverse and underrepresented subgroups is essential for ensuring fairness and reliability in real-world applications. However, accurately assessing model performance becomes challenging due to two main issues: (1) a scarcity of test data, especially for small subgroups, and (2) possible distributional shifts in the model's deployment setting, which may not align with the available test data. In this work, we introduce 3S Testing, a deep generative modeling framework to facilitate model evaluation by generating synthetic test sets for small subgroups and simulating distributional shifts. Our experiments demonstrate that 3S Testing outperforms traditional baselines -- including real test data alone -- in estimating model performance on minority subgroups and under plausible distributional shifts. In addition, 3S offers intervals around its performance estimates, exhibiting superior coverage of the ground truth compared to existing approaches. Overall, these results raise the question of whether we need a paradigm shift away from limited real test data towards synthetic test data.
    摘要 在多样化且代表性不足的子群体上评估机器学习模型的性能,对于确保其在真实应用中的公平性和可靠性至关重要。然而,准确评估模型性能面临两个主要问题:(1) 测试数据稀缺,尤其是小子群体的数据;(2) 模型部署环境中可能出现分布偏移,而这与现有测试数据可能不一致。在这项工作中,我们提出 3S Testing,一个深度生成建模框架,通过为小子群体生成合成测试集并模拟分布偏移来辅助模型评估。实验表明,在估计模型在少数子群体上以及合理分布偏移下的性能时,3S Testing 优于包括仅使用真实测试数据在内的传统基线。此外,3S 还为其性能估计给出区间,与现有方法相比能更好地覆盖真实值。总体而言,这些结果提出了一个问题:我们是否需要从有限的真实测试数据转向合成测试数据的范式转变。

Towards Self-Interpretable Graph-Level Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.16520
  • repo_url: https://github.com/yixinliu233/signet
  • paper_authors: Yixin Liu, Kaize Ding, Qinghua Lu, Fuyi Li, Leo Yu Zhang, Shirui Pan
  • for: 本研究的目的是提出一种可解释的图像异常检测模型(SIGNET),可以同时检测图像异常和生成相关的解释。
  • methods: 本研究使用了多视图子图信息瓶颈(MSIB)框架,从而实现了自我解释的图像异常检测。
  • results: 广泛的实验表明,SIGNET具有优秀的异常检测能力和自我解释能力。
    Abstract Graph-level anomaly detection (GLAD) aims to identify graphs that exhibit notable dissimilarity compared to the majority in a collection. However, current works primarily focus on evaluating graph-level abnormality while failing to provide meaningful explanations for the predictions, which largely limits their reliability and application scope. In this paper, we investigate a new challenging problem, explainable GLAD, where the learning objective is to predict the abnormality of each graph sample with corresponding explanations, i.e., the vital subgraph that leads to the predictions. To address this challenging problem, we propose a Self-Interpretable Graph aNomaly dETection model (SIGNET for short) that detects anomalous graphs as well as generates informative explanations simultaneously. Specifically, we first introduce the multi-view subgraph information bottleneck (MSIB) framework, serving as the design basis of our self-interpretable GLAD approach. This way SIGNET is able to not only measure the abnormality of each graph based on cross-view mutual information but also provide informative graph rationales by extracting bottleneck subgraphs from the input graph and its dual hypergraph in a self-supervised way. Extensive experiments on 16 datasets demonstrate the anomaly detection capability and self-interpretability of SIGNET.
    摘要 图级异常检测 (GLAD) 旨在识别一组图中与多数图显著不同的图。然而,现有工作主要关注如何评估图级异常程度,却无法为预测提供有意义的解释,这在很大程度上限制了其可靠性和应用范围。在本文中,我们研究一个新的具有挑战性的问题,即可解释的 GLAD:其学习目标是在预测每个图样本异常程度的同时给出相应的解释,即导致该预测的关键子图。为了解决这一问题,我们提出了自解释图异常检测模型 (SIGNET),它能够在检测异常图的同时生成有信息量的解释。具体而言,我们首先引入多视图子图信息瓶颈 (MSIB) 框架,作为我们自解释 GLAD 方法的设计基础。由此,SIGNET 不仅能够基于跨视图互信息来度量每个图的异常程度,还能以自监督的方式从输入图及其对偶超图中提取瓶颈子图,从而给出有信息量的图解释。在 16 个数据集上的大量实验证明了 SIGNET 的异常检测能力和自解释能力。

Particle-based Variational Inference with Generalized Wasserstein Gradient Flow

  • paper_url: http://arxiv.org/abs/2310.16516
  • repo_url: https://github.com/alexczh1/gwg
  • paper_authors: Ziheng Cheng, Shiyue Zhang, Longlin Yu, Cheng Zhang
  • for: 这 paper 是关于 Particle-based variational inference methods (ParVIs) 的研究,具体来说是 Stein variational gradient descent (SVGD) 的一种更改方法。
  • methods: 这 paper 使用了 kernelized Wasserstein gradient flow 来更新 particles,但是 kernel 的设计可能会带来一些限制。这 paper 提出了一种基于函数梯度流的方法,可以更好地适应不同的情景。
  • results: 这篇 paper 证明了 Generalized Wasserstein gradient descent (GWG) 方法具有强收敛保证,并提供了一个自适应版本,可以自动选择 Wasserstein 度量来加速收敛。实验展示了该方法在仿真和真实数据问题上的效果与效率。
    Abstract Particle-based variational inference methods (ParVIs) such as Stein variational gradient descent (SVGD) update the particles based on the kernelized Wasserstein gradient flow for the Kullback-Leibler (KL) divergence. However, the design of kernels is often non-trivial and can be restrictive for the flexibility of the method. Recent works show that functional gradient flow approximations with quadratic form regularization terms can improve performance. In this paper, we propose a ParVI framework, called generalized Wasserstein gradient descent (GWG), based on a generalized Wasserstein gradient flow of the KL divergence, which can be viewed as a functional gradient method with a broader class of regularizers induced by convex functions. We show that GWG exhibits strong convergence guarantees. We also provide an adaptive version that automatically chooses Wasserstein metric to accelerate convergence. In experiments, we demonstrate the effectiveness and efficiency of the proposed framework on both simulated and real data problems.
    摘要 基于粒子的变分推断方法 (ParVI),例如 Stein 变分梯度下降 (SVGD),通过 KL 散度的核化 Wasserstein 梯度流来更新粒子。然而,核的设计往往并不容易,并且会限制方法的灵活性。最近的工作表明,带二次型正则项的泛函梯度流近似可以提升性能。在本文中,我们提出了一种 ParVI 框架,称为广义 Wasserstein 梯度下降 (GWG),它基于 KL 散度的一种广义 Wasserstein 梯度流,可以看作一种由凸函数诱导的、具有更广泛一类正则项的泛函梯度方法。我们证明 GWG 具有很强的收敛保证。我们还提供了一个自适应版本,可以自动选择 Wasserstein 度量以加速收敛。在实验中,我们在仿真和真实数据问题上展示了所提框架的有效性和高效性。
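
For orientation, here is a minimal numpy sketch of the SVGD baseline that GWG generalizes: particles follow a kernelized gradient flow of the KL divergence towards a 2-D Gaussian target. The step size, kernel bandwidth heuristic, and target are illustrative choices, not anything from the paper.

```python
# Stein variational gradient descent (SVGD) with an RBF kernel on a Gaussian target.
import numpy as np

rng = np.random.default_rng(0)
target_mean = np.array([2.0, -1.0])

def grad_log_p(x):                       # score of N(target_mean, I)
    return -(x - target_mean)

def svgd_step(x, step=0.1):
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]             # pairwise differences (n, n, d)
    sq = np.sum(diff ** 2, axis=-1)                   # pairwise squared distances
    h = np.median(sq) / np.log(n + 1.0) + 1e-8        # median-heuristic bandwidth
    k = np.exp(-sq / h)                               # RBF kernel matrix
    grad_k = -2.0 / h * diff * k[..., None]           # gradient w.r.t. the first argument
    phi = (k @ grad_log_p(x) + grad_k.sum(axis=0)) / n  # attraction + repulsion
    return x + step * phi

particles = rng.normal(size=(200, 2))                 # initial particles
for _ in range(500):
    particles = svgd_step(particles)

print("particle mean:", np.round(particles.mean(axis=0), 2), "(target:", target_mean, ")")
```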

Data Optimization in Deep Learning: A Survey

  • paper_url: http://arxiv.org/abs/2310.16499
  • repo_url: https://github.com/yaorujing/data-optimization
  • paper_authors: Ou Wu, Rujing Yao
  • for: 这篇论文的目的是提供一个包容性强的分类法,以便更好地理解现有的数据优化技术,并探讨未来研究的可能性。
  • methods: 这篇论文通过大量的文献研究,对现有的数据优化技术进行了分类和总结,并建立了一个包容性强的分类法。
  • results: 这篇论文通过对现有数据优化技术的分类和总结,提供了一个全面的视角,并探讨了未来研究的可能性。
    Abstract Large-scale, high-quality data are considered an essential factor for the successful application of many deep learning techniques. Meanwhile, numerous real-world deep learning tasks still have to contend with the lack of sufficient amounts of high-quality data. Additionally, issues such as model robustness, fairness, and trustworthiness are also closely related to training data. Consequently, a huge number of studies in the existing literature have focused on the data aspect in deep learning tasks. Some typical data optimization techniques include data augmentation, logit perturbation, sample weighting, and data condensation. These techniques usually come from different deep learning divisions and their theoretical inspirations or heuristic motivations may seem unrelated to each other. This study aims to organize a wide range of existing data optimization methodologies for deep learning from the previous literature, and makes the effort to construct a comprehensive taxonomy for them. The constructed taxonomy considers the diversity of split dimensions, and deep sub-taxonomies are constructed for each dimension. On the basis of the taxonomy, connections among the extensive data optimization methods for deep learning are built in terms of four aspects. We probe into rendering several promising and interesting future directions. The constructed taxonomy and the revealed connections will enlighten the better understanding of existing methods and the design of novel data optimization techniques. Furthermore, our aspiration for this survey is to promote data optimization as an independent subdivision of deep learning. A curated, up-to-date list of resources related to data optimization in deep learning is available at \url{https://github.com/YaoRujing/Data-Optimization}.
    摘要 大规模、高质量的数据被认为是许多深度学习技术成功应用的关键因素。与此同时,许多真实世界的深度学习任务仍然面临高质量数据不足的问题。此外,模型的鲁棒性、公平性和可信性等问题也与训练数据密切相关。因此,现有文献中有大量研究关注深度学习任务中的数据层面。典型的数据优化技术包括数据增强、logit 扰动、样本加权和数据凝缩等。这些技术通常来自不同的深度学习分支,其理论来源或启发动机彼此之间可能看似并不相关。本研究旨在整理已有文献中面向深度学习的各类数据优化方法,并尝试为其构建一个全面的分类体系。所构建的分类体系考虑了划分维度的多样性,并为每个维度建立了深层的子分类。在该分类体系的基础上,我们从四个方面建立了众多深度学习数据优化方法之间的联系,并探讨了若干有前景且有趣的未来方向。所构建的分类体系和揭示的联系将有助于更好地理解现有方法以及设计新的数据优化技术。此外,我们希望通过这篇综述推动数据优化成为深度学习的一个独立分支。一个持续更新的深度学习数据优化相关资源列表可见 \url{https://github.com/YaoRujing/Data-Optimization}。
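
As a tiny, self-contained example of one technique from the survey's taxonomy, the snippet below shows per-sample loss weighting in PyTorch; the inverse-class-frequency weighting rule is just one common illustrative choice, not something prescribed by the survey.

```python
# Per-sample loss weighting: rarer classes receive larger weights in the objective.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 3, requires_grad=True)     # model outputs for a mini-batch
targets = torch.tensor([0, 0, 0, 0, 0, 1, 1, 2])   # imbalanced labels

# Inverse-frequency weights per sample, normalized to mean 1.
counts = torch.bincount(targets, minlength=3).float()
weights = (1.0 / counts)[targets]
weights = weights / weights.mean()

per_sample = F.cross_entropy(logits, targets, reduction="none")
loss = (weights * per_sample).mean()               # weighted training objective
loss.backward()
print("weighted loss:", round(loss.item(), 4))
```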

Citizen participation: crowd-sensed sustainable indoor location services

  • paper_url: http://arxiv.org/abs/2310.16496
  • repo_url: None
  • paper_authors: Ioannis Nasios, Konstantinos Vogklis, Avleen Malhi, Anastasia Vayona, Panos Chatziadam, Vasilis Katos
  • for: 在无需额外专用硬件的情况下提供室内定位能力,助力向智能基础设施的转型。
  • methods: 使用机器学习技术,基于访客智能手机与现有 WiFi 基础设施的交互来估计室内位置。
  • results: 实验结果显示,所提方法可以达到小于 2 米的定位误差,并且在相当数量的 BSSID 缺失时模型仍具有鲁棒性。
    Abstract In the present era of sustainable innovation, the circular economy paradigm dictates the optimal use and exploitation of existing finite resources. At the same time, the transition to smart infrastructures requires considerable investment in capital, resources and people. In this work, we present a general machine learning approach for offering indoor location awareness without the need to invest in additional and specialised hardware. We explore use cases where visitors equipped with their smart phone would interact with the available WiFi infrastructure to estimate their location, since the indoor requirement poses a limitation to standard GPS solutions. Results have shown that the proposed approach achieves a less than 2m accuracy and the model is resilient even in the case where a substantial number of BSSIDs are dropped.
    摘要 在当前的可持续创新时代,循环经济范式要求对现有的有限资源进行最优利用。与此同时,向智能基础设施的转型需要在资金、资源和人力方面进行大量投入。在这项工作中,我们提出了一种通用的机器学习方法,在无需投资额外专用硬件的情况下提供室内位置感知。由于室内环境限制了标准 GPS 方案的使用,我们探讨了让携带智能手机的访客与现有 WiFi 基础设施交互以估计其位置的应用场景。结果表明,所提方法可以达到小于 2 米的定位误差,并且即使相当数量的 BSSID 缺失,模型仍具有鲁棒性。
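
The sketch below is a generic WiFi-fingerprint localization baseline, not the paper's model: RSSI readings from visible access points are mapped to coordinates with k-nearest-neighbours regression. The floor layout, path-loss model, and noise level are toy assumptions.

```python
# RSSI fingerprinting: map access-point signal strengths to (x, y) positions.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
ap_xy = rng.uniform(0, 30, size=(8, 2))        # 8 access points on a 30x30 m floor
pos = rng.uniform(0, 30, size=(1500, 2))       # surveyed positions

def rssi(points):
    # Log-distance path-loss model with Gaussian shadowing (toy).
    d = np.linalg.norm(points[:, None, :] - ap_xy[None, :, :], axis=-1)
    return -40 - 20 * np.log10(d + 1.0) + rng.normal(0, 2, d.shape)

X_tr, X_te, y_tr, y_te = train_test_split(rssi(pos), pos, random_state=0)
model = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X_tr, y_tr)
err = np.linalg.norm(model.predict(X_te) - y_te, axis=1)
print("median positioning error: %.2f m" % np.median(err))
```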

TSONN: Time-stepping-oriented neural network for solving partial differential equations

  • paper_url: http://arxiv.org/abs/2310.16491
  • repo_url: None
  • paper_authors: Wenbo Cao, Weiwei Zhang
  • for: 解决由偏微分方程 (PDE) 支配的正问题和反问题,特别是针对 physics-informed neural networks (PINNs) 在求解这些问题时遇到的训练困难。
  • methods: 将时间步进方法与深度学习相结合,把原本病态的优化问题转化为一系列良态的子问题,从而显著改善模型训练的收敛性。
  • results: 所提方法能够稳定训练,并在许多标准 PINN 无法求解的问题中得到正确结果,而且只需对损失函数做简单修改。此外,论文还展示了时间步进方法在基于神经网络的优化框架中相对于传统网格数值方法的一些新性质和优势:显式格式允许明显更大的时间步长,而隐式格式的实现与显式格式一样简单。
    Abstract Deep neural networks (DNNs), especially physics-informed neural networks (PINNs), have recently become a new popular method for solving forward and inverse problems governed by partial differential equations (PDEs). However, these methods still face challenges in achieving stable training and obtaining correct results in many problems, since minimizing PDE residuals with PDE-based soft constraint make the problem ill-conditioned. Different from all existing methods that directly minimize PDE residuals, this work integrates time-stepping method with deep learning, and transforms the original ill-conditioned optimization problem into a series of well-conditioned sub-problems over given pseudo time intervals. The convergence of model training is significantly improved by following the trajectory of the pseudo time-stepping process, yielding a robust optimization-based PDE solver. Our results show that the proposed method achieves stable training and correct results in many problems that standard PINNs fail to solve, requiring only a simple modification on the loss function. In addition, we demonstrate several novel properties and advantages of time-stepping methods within the framework of neural network-based optimization approach, in comparison to traditional grid-based numerical method. Specifically, explicit scheme allows significantly larger time step, while implicit scheme can be implemented as straightforwardly as explicit scheme.
    摘要 深度神经网络 (DNN),特别是物理信息神经网络 (PINN),近来已成为求解由偏微分方程 (PDE) 支配的正问题和反问题的一种新的流行方法。然而,由于通过基于 PDE 的软约束来最小化 PDE 残差会使问题变得病态,这些方法在许多问题中仍然难以实现稳定训练并得到正确结果。与所有直接最小化 PDE 残差的现有方法不同,本工作将时间步进方法与深度学习相结合,把原本病态的优化问题转化为一系列在给定伪时间区间上的良态子问题。通过跟随伪时间步进过程的轨迹,模型训练的收敛性得到显著改善,从而得到一个鲁棒的基于优化的 PDE 求解器。结果表明,所提方法只需对损失函数做简单修改,就能在许多标准 PINN 无法求解的问题中实现稳定训练并得到正确结果。此外,与传统的基于网格的数值方法相比,我们还展示了时间步进方法在基于神经网络的优化框架中的一些新性质和优势:显式格式允许明显更大的时间步长,而隐式格式的实现与显式格式一样简单。

Hyperparameter Optimization for Multi-Objective Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.16487
  • repo_url: https://github.com/lucasalegre/morl-baselines
  • paper_authors: Florian Felten, Daniel Gareev, El-Ghazali Talbi, Grégoire Danoy
  • for: 本研究旨在解决多目标算法中的超参数优化问题,以提高多目标算法的性能。
  • methods: 本研究提出了一种系统的方法来解决超参数优化问题,包括精心设计的搜索策略和优化目标函数。
  • results: 实验结果表明,提出的方法可以有效地提高多目标算法的性能,并且标识了未来研究的可能性。
    Abstract Reinforcement learning (RL) has emerged as a powerful approach for tackling complex problems. The recent introduction of multi-objective reinforcement learning (MORL) has further expanded the scope of RL by enabling agents to make trade-offs among multiple objectives. This advancement not only has broadened the range of problems that can be tackled but also created numerous opportunities for exploration and advancement. Yet, the effectiveness of RL agents heavily relies on appropriately setting their hyperparameters. In practice, this task often proves to be challenging, leading to unsuccessful deployments of these techniques in various instances. Hence, prior research has explored hyperparameter optimization in RL to address this concern. This paper presents an initial investigation into the challenge of hyperparameter optimization specifically for MORL. We formalize the problem, highlight its distinctive challenges, and propose a systematic methodology to address it. The proposed methodology is applied to a well-known environment using a state-of-the-art MORL algorithm, and preliminary results are reported. Our findings indicate that the proposed methodology can effectively provide hyperparameter configurations that significantly enhance the performance of MORL agents. Furthermore, this study identifies various future research opportunities to further advance the field of hyperparameter optimization for MORL.
    摘要 本文对多目标强化学习 (MORL) 中的超参数优化问题进行了初步研究。我们对该问题进行了形式化定义,强调了其独特的挑战,并提出了一套系统化的方法。我们使用一种最先进的 MORL 算法在一个著名环境中应用了该方法,并报告了初步结果。我们的发现表明,所提方法可以有效地给出显著提升 MORL 智能体性能的超参数配置。此外,本研究还指出了 MORL 超参数优化领域若干未来的研究方向。

A Comprehensive Python Library for Deep Learning-Based Event Detection in Multivariate Time Series Data and Information Retrieval in NLP

  • paper_url: http://arxiv.org/abs/2310.16485
  • repo_url: None
  • paper_authors: Menouar Azib, Benjamin Renard, Philippe Garnier, Vincent Génot, Nicolas André
  • for: 这份研究旨在提出一种基于深度学习的多变量时间序列事件检测方法,以便在不同领域中更有效地检测事件。
  • methods: 该方法相对于现有的深度学习监督方法具有四个新特点:第一,它基于回归而非二分类;第二,它不需要逐点标注的数据集,只需提供以时间点或时间区间形式定义的参考事件;第三,它采用堆叠集成学习元模型,融合了从经典前馈神经网络 (FFN) 到 transformer 等多种深度学习模型以增强鲁棒性;第四,为便于实际使用,作者开发了名为 eventdetector-ts 的 Python 包,可通过 Python Package Index (PyPI) 安装。
  • results: 在从自然语言处理 (NLP) 到金融安全等不同领域的真实用例中,论文展示了该方法的通用性和有效性。
    Abstract Event detection in time series data is crucial in various domains, including finance, healthcare, cybersecurity, and science. Accurately identifying events in time series data is vital for making informed decisions, detecting anomalies, and predicting future trends. Despite extensive research exploring diverse methods for event detection in time series, with deep learning approaches being among the most advanced, there is still room for improvement and innovation in this field. In this paper, we present a new deep learning supervised method for detecting events in multivariate time series data. Our method combines four distinct novelties compared to existing deep-learning supervised methods. Firstly, it is based on regression instead of binary classification. Secondly, it does not require labeled datasets where each point is labeled; instead, it only requires reference events defined as time points or intervals of time. Thirdly, it is designed to be robust by using a stacked ensemble learning meta-model that combines deep learning models, ranging from classic feed-forward neural networks (FFNs) to state-of-the-art architectures like transformers. This ensemble approach can mitigate individual model weaknesses and biases, resulting in more robust predictions. Finally, to facilitate practical implementation, we have developed a Python package to accompany our proposed method. The package, called eventdetector-ts, can be installed through the Python Package Index (PyPI). In this paper, we present our method and provide a comprehensive guide on the usage of the package. We showcase its versatility and effectiveness through different real-world use cases from natural language processing (NLP) to financial security domains.
    摘要 时间序列数据中的事件检测在金融、医疗、网络安全和科学等诸多领域都至关重要。准确地识别时间序列中的事件,对于做出明智决策、检测异常和预测未来趋势十分关键。尽管已有大量研究探索了各种时间序列事件检测方法,其中深度学习方法最为先进,但该领域仍有改进和创新的空间。在本文中,我们提出了一种新的基于深度学习的监督方法,用于检测多变量时间序列数据中的事件。与现有的深度学习监督方法相比,我们的方法具有四个新特点。第一,它基于回归而非二分类。第二,它不需要对每个点进行标注的数据集,只需要以时间点或时间区间形式给出的参考事件。第三,它通过堆叠集成学习元模型来增强鲁棒性,该元模型融合了从经典前馈神经网络 (FFN) 到 transformer 等最先进架构的多种深度学习模型;这种集成方式可以缓解单个模型的弱点和偏差,从而得到更鲁棒的预测。最后,为便于实际使用,我们开发了配套的 Python 包 eventdetector-ts,可通过 Python Package Index (PyPI) 安装。在本文中,我们介绍了所提方法,并提供了该包的完整使用指南,同时通过从自然语言处理 (NLP) 到金融安全等不同领域的真实用例展示了它的通用性和有效性。

Symphony of experts: orchestration with adversarial insights in reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.16473
  • repo_url: None
  • paper_authors: Matthieu Jonckheere, Chiara Mignacco, Gilles Stoltz
  • for: 这 paper 的目的是探讨 Structured reinforcement learning 如何在探索困难的情况下 достичь更好的性能,特别是通过 “orchestration” 的概念,一小组专家策略来导航决策。
  • methods: 这 paper 使用的方法包括 orchestration 的模型化,以及对 adversarial settings 的转移 regret bound 结果,以及对 natural policy gradient 的扩展和推广。
  • results: 这 paper 的结果包括在 tabular Setting 下的 value-functions regret bounds,以及对 arbitrary adversarial aggregation strategies 的扩展和推广。 另外, paper 还提供了更加透明的证明方法,以及一个 Stochastic matching toy model 的仿真结果。
    Abstract Structured reinforcement learning leverages policies with advantageous properties to reach better performance, particularly in scenarios where exploration poses challenges. We explore this field through the concept of orchestration, where a (small) set of expert policies guides decision-making; the modeling thereof constitutes our first contribution. We then establish value-functions regret bounds for orchestration in the tabular setting by transferring regret-bound results from adversarial settings. We generalize and extend the analysis of natural policy gradient in Agarwal et al. [2021, Section 5.3] to arbitrary adversarial aggregation strategies. We also extend it to the case of estimated advantage functions, providing insights into sample complexity both in expectation and high probability. A key point of our approach lies in its arguably more transparent proofs compared to existing methods. Finally, we present simulations for a stochastic matching toy model.
    摘要 结构化强化学习利用具有有利性质的策略来获得更好的性能,尤其是在探索存在困难的情形下。我们通过"指挥"(orchestration) 的概念来研究这一领域,即由一小组专家策略引导决策;对其建模构成了我们的第一个贡献。随后,我们通过迁移对抗设定下的后悔界结果,建立了表格型设定下指挥方法的价值函数后悔界。我们将 Agarwal et al. [2021, Section 5.3] 中对自然策略梯度的分析推广到任意的对抗式聚合策略,并进一步推广到使用估计优势函数的情形,给出了期望意义和高概率意义下的样本复杂度。与现有方法相比,我们的方法的一个关键优点在于其证明可以说更加透明。最后,我们给出了一个随机匹配玩具模型的仿真结果。

Learning Continuous Network Emerging Dynamics from Scarce Observations via Data-Adaptive Stochastic Processes

  • paper_url: http://arxiv.org/abs/2310.16466
  • repo_url: https://github.com/csjtx1021/neural_ode_processes_for_network_dynamics-master
  • paper_authors: Jiaxu Cui, Bingyi Sun, Jiming Liu, Bo Yang
  • for: 学习复杂网络动力学是揭示众多领域中复杂网络交互机制的重要基础。
  • methods: 我们提出了用于网络动力学的神经 ODE 过程 (NDP4ND),这是一类由数据自适应的随机网络动力学支配的新型随机过程,能够从稀缺观测中学习连续的网络动力学。
  • results: 在生态种群演化、趋光运动、大脑活动、流行病传播以及真实经验系统等多种网络动力学上的大量实验表明,该方法具有出色的数据适应性和计算效率,能够适应未见过的网络涌现动力学,在将所需观测数据比例降至约 6% 的同时生成准确的插值和外推,并将学习新动力学的速度提升三个数量级。
    Abstract Learning network dynamics from the empirical structure and spatio-temporal observation data is crucial to revealing the interaction mechanisms of complex networks in a wide range of domains. However, most existing methods only aim at learning network dynamic behaviors generated by a specific ordinary differential equation instance, resulting in ineffectiveness for new ones, and generally require dense observations. The observed data, especially from network emerging dynamics, are usually difficult to obtain, which brings trouble to model learning. Therefore, how to learn accurate network dynamics with sparse, irregularly-sampled, partial, and noisy observations remains a fundamental challenge. We introduce Neural ODE Processes for Network Dynamics (NDP4ND), a new class of stochastic processes governed by stochastic data-adaptive network dynamics, to overcome the challenge and learn continuous network dynamics from scarce observations. Intensive experiments conducted on various network dynamics in ecological population evolution, phototaxis movement, brain activity, epidemic spreading, and real-world empirical systems, demonstrate that the proposed method has excellent data adaptability and computational efficiency, and can adapt to unseen network emerging dynamics, producing accurate interpolation and extrapolation with reducing the ratio of required observation data to only about 6\% and improving the learning speed for new dynamics by three orders of magnitude.
    摘要 从经验结构和时空观测数据中学习网络动力学,是揭示众多领域中复杂网络交互机制的关键。然而,大多数现有方法只针对由特定常微分方程实例生成的网络动力学行为进行学习,因而对新的动力学无效,并且通常需要密集的观测。观测数据(尤其是网络涌现动力学的数据)往往难以获得,这给模型学习带来了困难。因此,如何利用稀疏、非规则采样、部分且含噪的观测学习准确的网络动力学,仍然是一个基本挑战。我们提出了用于网络动力学的神经 ODE 过程 (NDP4ND),这是一类由数据自适应的随机网络动力学支配的新型随机过程,用于克服上述挑战并从稀缺观测中学习连续的网络动力学。在生态种群演化、趋光运动、大脑活动、流行病传播以及真实经验系统等多种网络动力学上的大量实验表明,所提方法具有出色的数据适应性和计算效率,能够适应未见过的网络涌现动力学,在将所需观测数据比例降至约 6% 的同时生成准确的插值和外推,并将学习新动力学的速度提升三个数量级。

Unknown Health States Recognition With Collective Decision Based Deep Learning Networks In Predictive Maintenance Applications

  • paper_url: http://arxiv.org/abs/2310.17670
  • repo_url: None
  • paper_authors: Chuyue Lou, M. Amine Atoui
  • for: 这项研究旨在提出一个集体决策框架,使不同的卷积神经网络能够同时对已知和未知的健康状态进行分类。
  • methods: 研究采用多种卷积神经网络,包括引入残差学习和多尺度学习的先进卷积神经网络,这些网络可以从工业数据中学习有效的健康状态表示;并在其基础上结合一对多网络 (OVRN) 实现集体决策。
  • results: 在 TEP 公开数据集上的验证结果显示,所提出的基于卷积神经网络的集体决策框架能够显著提升对未知健康状态样本的识别能力,同时保持对已知健康状态的满意准确率。这些结果显示了该深度学习框架的优越性,其中基于残差和多尺度学习的网络整体表现最佳。
    Abstract At present, decision making solutions developed based on deep learning (DL) models have received extensive attention in predictive maintenance (PM) applications along with the rapid improvement of computing power. Relying on the superior properties of shared weights and spatial pooling, Convolutional Neural Network (CNN) can learn effective representations of health states from industrial data. Many developed CNN-based schemes, such as advanced CNNs that introduce residual learning and multi-scale learning, have shown good performance in health state recognition tasks under the assumption that all the classes are known. However, these schemes have no ability to deal with new abnormal samples that belong to state classes not part of the training set. In this paper, a collective decision framework for different CNNs is proposed. It is based on a One-vs-Rest network (OVRN) to simultaneously achieve classification of known and unknown health states. OVRN learn state-specific discriminative features and enhance the ability to reject new abnormal samples incorporated to different CNNs. According to the validation results on the public dataset of Tennessee Eastman Process (TEP), the proposed CNN-based decision schemes incorporating OVRN have outstanding recognition ability for samples of unknown heath states, while maintaining satisfactory accuracy on known states. The results show that the new DL framework outperforms conventional CNNs, and the one based on residual and multi-scale learning has the best overall performance.
    摘要 目前,随着计算能力的快速提升,基于深度学习 (DL) 模型的决策方案在预测性维护 (PM) 应用中受到广泛关注。凭借共享权重和空间池化的优良特性,卷积神经网络 (CNN) 能够从工业数据中学习有效的健康状态表示。许多已有的基于 CNN 的方案(如引入残差学习和多尺度学习的先进 CNN)在假设所有类别均已知的前提下,在健康状态识别任务中表现良好。然而,这些方案无法处理属于训练集之外状态类别的新异常样本。本文提出了一个面向不同 CNN 的集体决策框架。它基于一对多网络 (OVRN),可同时实现对已知和未知健康状态的分类。OVRN 学习特定于状态的判别特征,并增强了各 CNN 对新异常样本的拒识能力。在田纳西-伊斯曼过程 (TEP) 公开数据集上的验证结果表明,引入 OVRN 的基于 CNN 的决策方案对未知健康状态样本具有出色的识别能力,同时保持对已知状态令人满意的准确率。结果显示,新的 DL 框架优于传统 CNN,其中基于残差和多尺度学习的框架整体性能最佳。
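
The toy sketch below only illustrates the reject-unknown idea behind the collective decision framework: one score model per known health state, the highest-scoring state wins, and samples that no model claims above a threshold are declared unknown. Kernel density estimators stand in for the paper's CNN-based one-vs-rest heads, and the threshold and synthetic data are illustrative.

```python
# Per-state scoring with rejection of samples that belong to no known health state.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import KernelDensity

X, y = make_blobs(n_samples=600, centers=[[0, 0], [5, 0], [0, 5]], random_state=0)
known_states = np.unique(y)

# One score model per known health state (toy stand-in for one-vs-rest network heads).
detectors = {c: KernelDensity(bandwidth=1.0).fit(X[y == c]) for c in known_states}

def predict_with_rejection(samples, log_density_threshold=-6.0):
    scores = np.column_stack([detectors[c].score_samples(samples) for c in known_states])
    labels = known_states[np.argmax(scores, axis=1)].astype(object)
    labels[scores.max(axis=1) < log_density_threshold] = "unknown"  # no state claims it
    return labels

print(predict_with_rejection(np.array([[0.2, 0.1],       # near a known state
                                        [5.1, -0.3],      # near a known state
                                        [12.0, 12.0]])))  # far from all -> "unknown"
```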

ClearMark: Intuitive and Robust Model Watermarking via Transposed Model Training

  • paper_url: http://arxiv.org/abs/2310.16453
  • repo_url: None
  • paper_authors: Torsten Krauß, Jasper Stang, Alexandra Dmitrienko
  • for: 提供一种可读性好的深度神经网络(DNN)水印方法,以便人类可以直观地判断水印是否存在。
  • methods: 使用一种名为ClearMark的方法,该方法在DNN模型中嵌入可见的水印,并且不需要复杂的验证算法或强制性阈值。
  • results: ClearMark方法可以在不同的数据集和模型上实现高度的可读性和抗性能,并且可以承受模型修改和黑客攻击。
    Abstract Due to costly efforts during data acquisition and model training, Deep Neural Networks (DNNs) belong to the intellectual property of the model creator. Hence, unauthorized use, theft, or modification may lead to legal repercussions. Existing DNN watermarking methods for ownership proof are often non-intuitive, embed human-invisible marks, require trust in algorithmic assessment that lacks human-understandable attributes, and rely on rigid thresholds, making it susceptible to failure in cases of partial watermark erasure. This paper introduces ClearMark, the first DNN watermarking method designed for intuitive human assessment. ClearMark embeds visible watermarks, enabling human decision-making without rigid value thresholds while allowing technology-assisted evaluations. ClearMark defines a transposed model architecture allowing to use of the model in a backward fashion to interwove the watermark with the main task within all model parameters. Compared to existing watermarking methods, ClearMark produces visual watermarks that are easy for humans to understand without requiring complex verification algorithms or strict thresholds. The watermark is embedded within all model parameters and entangled with the main task, exhibiting superior robustness. It shows an 8,544-bit watermark capacity comparable to the strongest existing work. Crucially, ClearMark's effectiveness is model and dataset-agnostic, and resilient against adversarial model manipulations, as demonstrated in a comprehensive study performed with four datasets and seven architectures.
    摘要 由于数据采集和模型训练需要高昂的成本,深度神经网络 (DNN) 属于模型创建者的知识产权。因此,未经授权的使用、窃取或修改可能带来法律后果。现有用于所有权证明的 DNN 水印方法往往不够直观:它们嵌入人眼不可见的标记,需要信任缺乏人类可理解属性的算法评估,并依赖僵硬的阈值,在水印被部分擦除时容易失效。本文提出 ClearMark,首个面向直观人工评估而设计的 DNN 水印方法。ClearMark 嵌入可见的水印,使人可以在无需僵硬阈值的情况下做出判断,同时也支持技术辅助的评估。ClearMark 定义了一种转置的模型架构,使模型可以反向使用,从而将水印与主任务交织在所有模型参数之中。与现有水印方法相比,ClearMark 生成的可视水印易于人类理解,无需复杂的验证算法或严格的阈值。水印嵌入在所有模型参数中并与主任务纠缠,表现出更强的鲁棒性,其水印容量达 8,544 比特,与现有最强工作相当。关键的是,ClearMark 的有效性与模型和数据集无关,并且能够抵御对模型的对抗性操纵,这在涵盖四个数据集和七种架构的综合研究中得到了验证。

Grokking in Linear Estimators – A Solvable Model that Groks without Understanding

  • paper_url: http://arxiv.org/abs/2310.16441
  • repo_url: None
  • paper_authors: Noam Levi, Alon Beck, Yohai Bar-Sinai
  • for: 这 paper 探讨了模型如何在训练数据之后仍然能够泛化。
  • methods: 作者使用了教师-学生模式和高维输入来研究了 linear 网络在Linear任务上的泛化行为。
  • results: 研究发现,在训练和泛化数据协方差矩阵的基础上,模型可以在训练数据之后仍然具有泛化能力,并且可以通过精确预测泛化时间的因素来预测模型的泛化能力。
    Abstract Grokking is the intriguing phenomenon where a model learns to generalize long after it has fit the training data. We show both analytically and numerically that grokking can surprisingly occur in linear networks performing linear tasks in a simple teacher-student setup with Gaussian inputs. In this setting, the full training dynamics is derived in terms of the training and generalization data covariance matrix. We present exact predictions on how the grokking time depends on input and output dimensionality, train sample size, regularization, and network initialization. We demonstrate that the sharp increase in generalization accuracy may not imply a transition from "memorization" to "understanding", but can simply be an artifact of the accuracy measure. We provide empirical verification for our calculations, along with preliminary results indicating that some predictions also hold for deeper networks, with non-linear activations.
    摘要 Grokking 是一种耐人寻味的现象:模型在拟合训练数据之后很久才学会泛化。我们通过解析和数值方法表明,在简单的师生设置、高斯输入下,执行线性任务的线性网络中也会出人意料地出现 grokking。在这种设置下,完整的训练动力学可以用训练数据和泛化数据的协方差矩阵来刻画。我们给出了 grokking 时间如何依赖于输入和输出维度、训练样本量、正则化以及网络初始化的精确预测。我们还证明,泛化精度的陡增并不一定意味着从"记忆"到"理解"的转变,而可能只是精度度量造成的假象。我们为计算结果提供了实验验证,并给出初步结果表明部分预测同样适用于带非线性激活的更深网络。
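
To make the teacher-student picture above concrete, the sketch below trains a linear student on Gaussian inputs with plain gradient descent plus weight decay and logs train/test error over long training; the dimensions, initialization scale, and step sizes are illustrative assumptions rather than the paper's exact setup, chosen only so that test error keeps improving long after the training error is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 100, 40, 2000          # input dim and sample sizes (assumed)
w_teacher = rng.normal(size=d) / np.sqrt(d)

X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr, y_te = X_tr @ w_teacher, X_te @ w_teacher

w = rng.normal(size=d)                       # large initialization widens the train/test gap
lr, weight_decay = 1e-2, 1e-3

for step in range(1, 200001):
    grad = X_tr.T @ (X_tr @ w - y_tr) / n_train + weight_decay * w
    w -= lr * grad
    if step in (10, 100, 1000, 10000, 100000, 200000):
        mse_tr = np.mean((X_tr @ w - y_tr) ** 2)
        mse_te = np.mean((X_te @ w - y_te) ** 2)
        print(f"step {step:>6}: train MSE {mse_tr:.2e}, test MSE {mse_te:.2e}")
```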

An Approach for Efficient Neural Architecture Search Space Definition

  • paper_url: http://arxiv.org/abs/2310.17669
  • repo_url: None
  • paper_authors: Léo Pouy, Fouad Khenfri, Patrick Leserf, Chokri Mraidha, Cherif Larouci
  • for: 本研究旨在提出一种新的自动Machine Learning(AutoML)方法和工具,帮助用户在选择神经网络架构时快速寻找最佳策略。
  • methods: 本研究使用了一种新的细胞结构搜索空间,易于理解和操作,并且可以涵盖大多数当前领先的卷积神经网络架构。
  • results: 研究人员通过实验和分析表明,提出的方法可以快速找到最佳策略,并且可以涵盖大多数当前领先的卷积神经网络架构。
    Abstract As we advance in the fast-growing era of Machine Learning, various new and more complex neural architectures are arising to tackle problems more efficiently. On the one hand, their efficient usage requires advanced knowledge and expertise, which is most of the time difficult to find on the labor market. On the other hand, searching for an optimized neural architecture is a time-consuming task when it is performed manually using a trial and error approach. Hence, methods and tool support are needed to assist users of neural architectures, fueling interest in the field of Automatic Machine Learning (AutoML). When it comes to Deep Learning, an important part of AutoML is Neural Architecture Search (NAS). In this paper, we propose a novel cell-based hierarchical search space that is easy to comprehend and manipulate. The objectives of the proposed approach are to optimize the search time and to be general enough to handle most state-of-the-art Convolutional Neural Network (CNN) architectures.
    摘要 随着机器学习领域的快速发展,不断出现新的、更复杂的神经网络架构,以更高效地解决问题。一方面,高效使用这些架构需要高级知识和专业技能,而这在劳动力市场上往往难以找到;另一方面,手动通过试错方法搜索优化的神经网络架构是一项耗时的任务。因此,需要相应的方法和工具支持来帮助神经网络架构的用户,这也促进了自动机器学习(AutoML)领域的发展。在深度学习方面,神经架构搜索(NAS)是 AutoML 的一个重要组成部分。本文提出了一种新的基于细胞的层次搜索空间,易于理解和操作。该方法的目标是优化搜索时间,并涵盖大多数当前领先的卷积神经网络架构。
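
As a rough picture of what a cell-based hierarchical search space can look like in practice, the snippet below encodes a network as a stack of cells, each cell being a small DAG of primitive operations, and samples one architecture from it; the operation list, cell counts, and sampling rule are hypothetical illustrations, not the encoding proposed in the paper.

```python
import random

# Hypothetical hierarchical search space: a network is a stack of cells,
# and each cell is a small DAG of primitive operations.
SEARCH_SPACE = {
    "num_cells": [3, 6, 9],
    "cell": {
        "num_nodes": [2, 3, 4],
        "ops": ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool3x3", "identity"],
    },
    "stem_channels": [16, 32, 64],
}

def sample_architecture(space, rng=random):
    """Draw one architecture description from the hierarchical space."""
    cells = []
    for _ in range(rng.choice(space["num_cells"])):
        n_nodes = rng.choice(space["cell"]["num_nodes"])
        # Each node picks an op and one earlier node as input (index -1 = cell input).
        nodes = [{"op": rng.choice(space["cell"]["ops"]),
                  "input": rng.randrange(-1, i)}
                 for i in range(n_nodes)]
        cells.append(nodes)
    return {"stem_channels": rng.choice(space["stem_channels"]), "cells": cells}

arch = sample_architecture(SEARCH_SPACE)
print(f"sampled {len(arch['cells'])} cells, stem width {arch['stem_channels']}")
```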

Non-isotropic Persistent Homology: Leveraging the Metric Dependency of PH

  • paper_url: http://arxiv.org/abs/2310.16437
  • repo_url: None
  • paper_authors: Vincent P. Grande, Michael T. Schaub
  • for: 这篇论文旨在提出一种新的点云数据分析方法,以从持续同调分析中提取额外的拓扑和几何信息。
  • methods: 该方法基于变换距离函数的思想,通过分析 persistence diagram 随距离函数的变化来提取额外信息。
  • results: 实验表明,该方法可以较准确地提取点云的方向、方向方差和缩放信息,并且可以应用于实际数据。
    Abstract Persistent Homology is a widely used topological data analysis tool that creates a concise description of the topological properties of a point cloud based on a specified filtration. Most filtrations used for persistent homology depend (implicitly) on a chosen metric, which is typically agnostically chosen as the standard Euclidean metric on $\mathbb{R}^n$. Recent work has tried to uncover the 'true' metric on the point cloud using distance-to-measure functions, in order to obtain more meaningful persistent homology results. Here we propose an alternative look at this problem: we posit that information on the point cloud is lost when restricting persistent homology to a single (correct) distance function. Instead, we show how by varying the distance function on the underlying space and analysing the corresponding shifts in the persistence diagrams, we can extract additional topological and geometrical information. Finally, we numerically show that non-isotropic persistent homology can extract information on orientation, orientational variance, and scaling of randomly generated point clouds with good accuracy and conduct some experiments on real-world data.
    摘要 持续同调(persistent homology)是一种广泛使用的拓扑数据分析工具,可以基于指定的过滤生成点云拓扑性质的精炼描述。大多数用于持续同调的过滤都(隐式地)依赖于所选的度量,而该度量通常被不加区分地取为 $\mathbb{R}^n$ 上的标准欧几里得度量。近年来,人们尝试利用 distance-to-measure 函数找到点云上"真实"的度量,以获得更有意义的持续同调结果。在这篇文章中,我们提出了一个不同的思路:我们认为,将持续同调限制到单一的(正确)距离函数上会造成信息损失。相反,我们展示了通过改变底层空间上的距离函数,并分析对应的 persistence diagram 的变化,可以提取额外的拓扑和几何信息。最后,我们通过数值实验表明,非各向同性的持续同调可以较准确地检测随机生成点云的方向、方向方差和缩放,并在真实数据上进行了一些实验。
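
One concrete way to read "varying the distance function" is to rescale the point cloud along a chosen direction before building the usual pairwise distance matrix, and to repeat this for several directions and anisotropy ratios; the sketch below only constructs the family of anisotropic distance matrices (the persistence diagrams would then be computed with any PH backend), and the specific scaling scheme is an assumption, not the paper's construction.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def anisotropic_distance_matrix(points, direction, ratio):
    """Pairwise distances after stretching the cloud by `ratio` along `direction`."""
    d = direction / np.linalg.norm(direction)
    along = points @ d                       # coordinates along the chosen axis
    perp = points - np.outer(along, d)       # orthogonal complement
    stretched = perp + np.outer(ratio * along, d)
    return squareform(pdist(stretched))

rng = np.random.default_rng(1)
# Elongated 2D Gaussian cloud whose orientation we would like to recover.
cloud = rng.normal(size=(300, 2)) * np.array([3.0, 1.0])

for theta in np.linspace(0.0, np.pi, 8, endpoint=False):
    direction = np.array([np.cos(theta), np.sin(theta)])
    D = anisotropic_distance_matrix(cloud, direction, ratio=0.3)
    # D would be fed to a persistent-homology library as a precomputed distance
    # matrix; shifts of the resulting diagrams across theta carry the orientation
    # information discussed in the abstract.
    print(f"theta={theta:.2f}  mean pairwise distance {D.mean():.3f}")
```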

FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.16412
  • repo_url: https://github.com/zhuohuangai/FlatMatch
  • paper_authors: Zhuo Huang, Li Shen, Jun Yu, Bo Han, Tongliang Liu
  • for: 这个论文的目的是提出一种新的 semi-supervised learning (SSL) 方法,以便将充沛的无标数数据与罕见的标数数据结合起来,以提高 SSL 的性能。
  • methods: 这个方法基于一个新的测度名为“cross-sharpness”,它测量了两个不同变数的关系。这个测度可以确保学习过程中的模型在标数数据和无标数数据上的学习性能是一致的。
  • results: 这个方法可以在许多 SSL Setting中取得最佳的结果,并且可以将无标数数据中的学习性能与标数数据中的学习性能连接起来,以提高 SSL 的性能。
    Abstract Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data. However, most SSL methods are commonly based on instance-wise consistency between different data transformations. Therefore, the label guidance on labeled data is hard to be propagated to unlabeled data. Consequently, the learning process on labeled data is much faster than on unlabeled data which is likely to fall into a local minima that does not favor unlabeled data, leading to sub-optimal generalization performance. In this paper, we propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets. Specifically, we increase the empirical risk on labeled data to obtain a worst-case model which is a failure case that needs to be enhanced. Then, by leveraging the richness of unlabeled data, we penalize the prediction difference (i.e., cross-sharpness) between the worst-case model and the original model so that the learning direction is beneficial to generalization on unlabeled data. Therefore, we can calibrate the learning process without being limited to insufficient label information. As a result, the mismatched learning performance can be mitigated, further enabling the effective exploitation of unlabeled data and improving SSL performance. Through comprehensive validation, we show FlatMatch achieves state-of-the-art results in many SSL settings.
    摘要 半监督学习(SSL)是一种有效利用充沛的无标签数据与极其稀少的标签数据的方法。然而,大多数 SSL 方法通常基于不同数据变换之间实例级别的一致性,因此标签数据上的标签指导很难传递到无标签数据上。这导致在标签数据上的学习过程比在无标签数据上快得多,模型可能陷入不利于无标签数据的局部极小值,从而影响泛化表现。在这篇论文中,我们提出了 FlatMatch,它通过最小化跨锐度(cross-sharpness)度量来保证两个数据集之间学习表现的一致性。具体来说,我们通过提高标签数据上的经验风险来获得一个最坏情况模型,即需要被改进的失败情况;然后,借助无标签数据的丰富性,我们对最坏情况模型和原始模型之间的预测差异(即跨锐度)进行惩罚,使学习方向有利于在无标签数据上的泛化。因此,我们可以在不受限于不充分的标签信息的情况下校准学习过程,从而缓解学习表现不匹配的问题,进一步有效利用无标签数据并提升 SSL 性能。通过全面验证,我们显示 FlatMatch 在许多 SSL 设置中实现了最优(state-of-the-art)结果。
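
A minimal PyTorch-style reading of the cross-sharpness idea: take an ascent step on the labeled loss to get a worst-case copy of the model, then penalize the prediction gap between that copy and the current model on unlabeled data. The step sizes, the KL-based gap, and the way the ascent direction is formed are assumptions for illustration; the paper's exact formulation may differ.

```python
import copy
import torch
import torch.nn.functional as F

def flat_step(model, x_lab, y_lab, x_unlab, opt, rho=0.05, lam=1.0):
    """One illustrative update combining the labeled loss with a cross-sharpness penalty."""
    # 1) Labeled loss and its gradient direction.
    loss_lab = F.cross_entropy(model(x_lab), y_lab)
    grads = torch.autograd.grad(loss_lab, list(model.parameters()))
    grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12

    # 2) Worst-case model: a copy of the current weights plus a small ascent step
    #    along the labeled-loss gradient.
    worst_model = copy.deepcopy(model)
    with torch.no_grad():
        for p_w, p, g in zip(worst_model.parameters(), model.parameters(), grads):
            p_w.copy_(p + rho * g / grad_norm)

    # 3) Cross-sharpness: prediction gap between the worst-case and current model
    #    on unlabeled data (KL divergence is one possible choice of gap).
    logp = F.log_softmax(model(x_unlab), dim=-1)
    with torch.no_grad():
        q = F.softmax(worst_model(x_unlab), dim=-1)
    cross_sharpness = F.kl_div(logp, q, reduction="batchmean")

    loss = F.cross_entropy(model(x_lab), y_lab) + lam * cross_sharpness
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```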

Multiple Key-value Strategy in Recommendation Systems Incorporating Large Language Model

  • paper_url: http://arxiv.org/abs/2310.16409
  • repo_url: None
  • paper_authors: Dui Wang, Xiangyu Hou, Xiaohui Yang, Bo Zhang, Renbing Chen, Daiyue Xue
  • for: 这篇论文的目的是提出一种基于多重键值数据的 sequential recommendation 方法,以帮助在实际应用中处理多个键值数据。
  • methods: 该方法利用大型自然语言模型(LLM)的特性,通过注入领域知识来增强 RS 的表现。此外,该方法还提出了一种创新的洗牌和掩码(shuffle and mask)策略,以解决 LLM 难以同时学好多个键值的问题。
  • results: 经过广泛的实验验证,该方法在包含多个键值的 MovieLens 数据集上显示出了良好的效果,能够较好地完成多键值数据上的 sequential recommendation 任务。
    Abstract Recommendation system (RS) plays significant roles in matching users information needs for Internet applications, and it usually utilizes the vanilla neural network as the backbone to handle embedding details. Recently, the large language model (LLM) has exhibited emergent abilities and achieved great breakthroughs both in the CV and NLP communities. Thus, it is logical to incorporate RS with LLM better, which has become an emerging research direction. Although some existing works have made their contributions to this issue, they mainly consider the single key situation (e.g. historical interactions), especially in sequential recommendation. The situation of multiple key-value data is simply neglected. This significant scenario is mainstream in real practical applications, where the information of users (e.g. age, occupation, etc) and items (e.g. title, category, etc) has more than one key. Therefore, we aim to implement sequential recommendations based on multiple key-value data by incorporating RS with LLM. In particular, we instruct tuning a prevalent open-source LLM (Llama 7B) in order to inject domain knowledge of RS into the pre-trained LLM. Since we adopt multiple key-value strategies, LLM is hard to learn well among these keys. Thus the general and innovative shuffle and mask strategies, as an innovative manner of data argument, are designed. To demonstrate the effectiveness of our approach, extensive experiments are conducted on the popular and suitable dataset MovieLens which contains multiple keys-value. The experimental results demonstrate that our approach can nicely and effectively complete this challenging issue.
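
One plausible reading of the "shuffle and mask" data augmentation for multiple key-value attributes is sketched below: the order of key-value pairs in the textual prompt is randomized and a fraction of values is hidden, so the LLM cannot rely on a fixed key position. The mask token, mask rate, and prompt template are illustrative assumptions.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder token

def shuffle_and_mask(attrs, mask_rate=0.3, rng=random):
    """Build a prompt fragment from key-value attributes with shuffling and masking."""
    items = list(attrs.items())
    rng.shuffle(items)                          # shuffle: break positional shortcuts
    rendered = []
    for key, value in items:
        if rng.random() < mask_rate:            # mask: hide some values at random
            value = MASK_TOKEN
        rendered.append(f"{key}: {value}")
    return "; ".join(rendered)

user = {"age": 34, "occupation": "engineer", "gender": "F"}
item = {"title": "Toy Story", "category": "Animation", "year": 1995}
prompt = (f"User features: {shuffle_and_mask(user)}. "
          f"Item features: {shuffle_and_mask(item)}. Will the user watch it?")
print(prompt)
```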

Information-Theoretic Generalization Analysis for Topology-aware Heterogeneous Federated Edge Learning over Noisy Channels

  • paper_url: http://arxiv.org/abs/2310.16407
  • repo_url: None
  • paper_authors: Zheshun Wu, Zenglin Xu, Hongfang Yu, Jie Liu
  • for: 本研究旨在探讨 Federated Edge Learning (FEEL) 中的泛化问题,即在无isy通道和多样环境下,移动设备进行模型参数的传输和数据收集,以及设备之间的协同通信对模型的泛化的影响。
  • methods: 本研究采用信息论的泛化分析方法,对于 topology-aware FEEL 中的数据不同性和噪声通道的影响进行了全面的检查。此外,我们还提出了一种新的常见规范方法 called Federated Global Mutual Information Reduction (FedGMIR),用于提高模型的性能。
  • results: 数值结果验证了我们的理论发现,并证明了所提方法的有效性。
    Abstract With the rapid growth of edge intelligence, the deployment of federated learning (FL) over wireless networks has garnered increasing attention, which is called Federated Edge Learning (FEEL). In FEEL, both mobile devices transmitting model parameters over noisy channels and collecting data in diverse environments pose challenges to the generalization of trained models. Moreover, devices can engage in decentralized FL via Device-to-Device communication while the communication topology of connected devices also impacts the generalization of models. Most recent theoretical studies overlook the incorporation of all these effects into FEEL when developing generalization analyses. In contrast, our work presents an information-theoretic generalization analysis for topology-aware FEEL in the presence of data heterogeneity and noisy channels. Additionally, we propose a novel regularization method called Federated Global Mutual Information Reduction (FedGMIR) to enhance the performance of models based on our analysis. Numerical results validate our theoretical findings and provide evidence for the effectiveness of the proposed method.
    摘要 随着边缘智能的快速发展,Federated Edge Learning(FEEL)在无线网络上的部署吸引了越来越多的关注。在 FEEL 中,移动设备通过含噪信道传输模型参数,并在多样化环境中收集数据,这给训练模型的泛化带来了挑战。此外,设备可以通过 Device-to-Device 通信参与去中心化的 FL,而连接设备的通信拓扑也会影响模型的泛化。最近的理论研究在进行泛化分析时大多忽视了这些效应。相比之下,我们的工作针对拓扑感知的 FEEL,在数据异构和含噪信道同时存在的情况下给出了信息论的泛化分析。此外,我们基于该分析提出了一种新的正则化方法,即 Federated Global Mutual Information Reduction(FedGMIR),以提高模型性能。数值结果验证了我们的理论发现,并为所提方法的有效性提供了证据。

Graph Neural Networks with a Distribution of Parametrized Graphs

  • paper_url: http://arxiv.org/abs/2310.16401
  • repo_url: None
  • paper_authors: See Hian Lee, Feng Ji, Kelin Xia, Wee Peng Tay
  • for: 通过引入多个参数化图来捕捉观测图之外的信息,提高节点分类和图回归的性能
  • methods: 使用隐变量来参数化并生成多个图,基于 Expectation-Maximization 框架和 Markov Chain Monte Carlo 方法求网络参数的最大似然估计
  • results: 在异构图上的节点分类和化学数据集上的图回归任务中,相较基线模型实现了性能提升
    Abstract Traditionally, graph neural networks have been trained using a single observed graph. However, the observed graph represents only one possible realization. In many applications, the graph may encounter uncertainties, such as having erroneous or missing edges, as well as edge weights that provide little informative value. To address these challenges and capture additional information previously absent in the observed graph, we introduce latent variables to parameterize and generate multiple graphs. We obtain the maximum likelihood estimate of the network parameters in an Expectation-Maximization (EM) framework based on the multiple graphs. Specifically, we iteratively determine the distribution of the graphs using a Markov Chain Monte Carlo (MCMC) method, incorporating the principles of PAC-Bayesian theory. Numerical experiments demonstrate improvements in performance against baseline models on node classification for heterogeneous graphs and graph regression on chemistry datasets.

Learning Efficient Surrogate Dynamic Models with Graph Spline Networks

  • paper_url: http://arxiv.org/abs/2310.16397
  • repo_url: https://github.com/kaist-silab/graphsplinenets
  • paper_authors: Chuanbo Hua, Federico Berto, Michael Poli, Stefano Massaroli, Jinkyoo Park
  • for: 这篇论文旨在通过减小深度代理模型的网格尺寸和迭代步数,提高物理系统预报的效率。
  • methods: 本文提出了 GraphSplineNets,一种新的深度学习方法,利用两种可微的正交样条配点方法来高效预测任意时间和空间位置的响应。此外,我们还提出了一种空间自适应配点策略,优先在最重要的区域进行采样。
  • results: 在多种复杂度递增的物理系统上,包括热方程、阻尼波传播、Navier-Stokes 方程以及规则与不规则区域上的真实海洋洋流,GraphSplineNets 改善了预测精度与加速之间的权衡。
    Abstract While complex simulations of physical systems have been widely used in engineering and scientific computing, lowering their often prohibitive computational requirements has only recently been tackled by deep learning approaches. In this paper, we present GraphSplineNets, a novel deep-learning method to speed up the forecasting of physical systems by reducing the grid size and number of iteration steps of deep surrogate models. Our method uses two differentiable orthogonal spline collocation methods to efficiently predict response at any location in time and space. Additionally, we introduce an adaptive collocation strategy in space to prioritize sampling from the most important regions. GraphSplineNets improve the accuracy-speedup tradeoff in forecasting various dynamical systems with increasing complexity, including the heat equation, damped wave propagation, Navier-Stokes equations, and real-world ocean currents in both regular and irregular domains.
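
The role of the differentiable spline collocation layer can be illustrated with plain cubic splines: a surrogate predicts the field only at a coarse set of collocation points, and splines evaluate it anywhere in between. The sketch below uses SciPy splines on a synthetic 1D heat-like profile; it ignores the orthogonal collocation details and the adaptive sampling strategy of the paper.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Coarse collocation grid in space and the (assumed) surrogate values there,
# here faked with an analytic decaying sine profile.
x_coarse = np.linspace(0.0, 1.0, 9)
u_coarse = np.exp(-2.0) * np.sin(np.pi * x_coarse)

# A spline through the collocation values gives predictions at any location.
spline = CubicSpline(x_coarse, u_coarse)
x_fine = np.linspace(0.0, 1.0, 200)
u_fine = spline(x_fine)

u_exact = np.exp(-2.0) * np.sin(np.pi * x_fine)
print(f"max interpolation error on the fine grid: {np.abs(u_fine - u_exact).max():.2e}")
```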

Distributed Uncertainty Quantification of Kernel Interpolation on Spheres

  • paper_url: http://arxiv.org/abs/2310.16384
  • repo_url: None
  • paper_authors: Shao-Bo Lin, Xingping Sun, Di Wang
  • for: 本文研究球面上散乱数据的径向基函数(RBF)核插值,具体来说是研究如何对幅度不可忽略的噪声球面数据进行插值,并对插值过程中的不确定性进行管理和量化。
  • methods: 本文采用分布式插值方法来处理噪声球面数据,通过对不确定性进行评估和管理来提高插值的精度和稳定性。
  • results: 数值模拟结果表明,该方法在处理来自困难计算环境的噪声数据时实用且稳健。
    Abstract For radial basis function (RBF) kernel interpolation of scattered data, Schaback in 1995 proved that the attainable approximation error and the condition number of the underlying interpolation matrix cannot be made small simultaneously. He referred to this finding as an "uncertainty relation", an undesirable consequence of which is that RBF kernel interpolation is susceptible to noisy data. In this paper, we propose and study a distributed interpolation method to manage and quantify the uncertainty brought on by interpolating noisy spherical data of non-negligible magnitude. We also present numerical simulation results showing that our method is practical and robust in terms of handling noisy data from challenging computing environments.
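
A minimal picture of the distributed strategy: noisy samples on the sphere are split across several local interpolators, each solves a small regularized RBF system, and the local predictions are averaged, which damps the noise sensitivity of a single global interpolant. The kernel choice, regularization, and averaging rule below are assumptions for illustration, not the paper's scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_sphere_points(n):
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def rbf_kernel(A, B, eps=3.0):
    # Gaussian kernel in the chordal (Euclidean) distance on the sphere.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-eps * d2)

target = lambda X: np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2]

# Noisy scattered data, split across m local machines.
X = random_sphere_points(600)
y = target(X) + 0.1 * rng.normal(size=len(X))
m, lam = 6, 1e-2
X_test = random_sphere_points(200)

preds = []
for X_loc, y_loc in zip(np.array_split(X, m), np.array_split(y, m)):
    K = rbf_kernel(X_loc, X_loc)
    coef = np.linalg.solve(K + lam * np.eye(len(X_loc)), y_loc)   # regularized local solve
    preds.append(rbf_kernel(X_test, X_loc) @ coef)

y_hat = np.mean(preds, axis=0)            # average the local interpolants
spread = np.std(preds, axis=0)            # a crude per-point uncertainty proxy
rmse = np.sqrt(np.mean((y_hat - target(X_test)) ** 2))
print(f"distributed RMSE {rmse:.3f}, mean spread {spread.mean():.3f}")
```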

A model for multi-attack classification to improve intrusion detection performance using deep learning approaches

  • paper_url: http://arxiv.org/abs/2310.16380
  • repo_url: None
  • paper_authors: Arun Kumar Silivery, Ram Mohan Rao Kovvur
  • for: 本研究旨在开发一种可靠的攻击检测机制,以帮助发现恶意攻击。
  • methods: 该研究提出了一种深度学习方法框架,包括三种方法:Long-Short Term Memory Recurrent Neural Network (LSTM-RNN)、Recurrent Neural Network (RNN) 和 Deep Neural Network (DNN)。
  • results: 研究结果显示,LSTM-RNN 配合 adamax 优化器在 NSL-KDD 数据集上表现出色,在准确率、检测率和误报率方面超过了现有的浅层学习和深度学习模型。此外,多模型方法也在 KDD99、NSL-KDD 和 UNSWNB15 数据集上取得了显著的性能。
    Abstract This proposed model introduces novel deep learning methodologies. The objective is to create a reliable intrusion detection mechanism to help identify malicious attacks. A deep learning based solution framework is developed consisting of three approaches. The first approach is a Long-Short Term Memory Recurrent Neural Network (LSTM-RNN) with seven optimizer functions: adamax, SGD, adagrad, adam, RMSprop, nadam and adadelta. The model is evaluated on the NSL-KDD dataset for multi-attack classification. The model performs best with the adamax optimizer in terms of accuracy, detection rate and low false alarm rate. The results of the LSTM-RNN with the adamax optimizer are compared with existing shallow machine learning and deep learning models in terms of accuracy, detection rate and low false alarm rate. The second part is a multi-model methodology consisting of a Recurrent Neural Network (RNN), a Long-Short Term Memory Recurrent Neural Network (LSTM-RNN), and a Deep Neural Network (DNN). The multi-models are evaluated on benchmark datasets such as KDD99, NSL-KDD, and UNSWNB15. The models self-learn the features and classify the attack classes as multi-attack classification. The RNN and LSTM-RNN models provide considerable performance compared to other existing methods on the KDD99 and NSL-KDD datasets.
    摘要 这种提议的模型引入了新的深度学习方法。目标是创建一个可靠的攻击检测机制,以帮助识别恶意攻击。基于深度学习的解决方案框架由三种方法组成:首先是长短期记忆循环神经网络(LSTM-RNN),配合 adamax、SGD、adagrad、adam、RMSprop、nadam 和 adadelta 七种优化器。模型在 NSL-KDD 数据集上进行评估,完成多类攻击分类。模型在使用 adamax 优化器时,在准确率、检测率和低误报率方面表现最佳。我们将 LSTM-RNN 配合 adamax 优化器的结果,与现有的浅层机器学习和深度学习模型在准确率、检测率和低误报率方面进行了比较。此外,我们还提出了一种多模型方法,包括循环神经网络(RNN)、长短期记忆循环神经网络(LSTM-RNN)和深度神经网络(DNN)。这些模型在 KDD99、NSL-KDD 和 UNSWNB15 基准数据集上进行评估,自学习特征并完成多类攻击分类。RNN 和 LSTM-RNN 模型在 KDD99 和 NSL-KDD 数据集上相较于其他现有方法表现出显著优势。
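
A bare-bones Keras version of the kind of LSTM-RNN classifier with the adamax optimizer described above; the layer sizes, sequence length, and number of attack classes are placeholders, and NSL-KDD preprocessing (encoding, scaling, reshaping into sequences) is assumed to happen elsewhere.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features, n_classes, timesteps = 41, 5, 1     # NSL-KDD-like shapes (placeholders)

model = keras.Sequential([
    keras.Input(shape=(timesteps, n_features)),
    layers.LSTM(64),
    layers.Dropout(0.2),
    layers.Dense(32, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adamax(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stand-in arrays; in practice X would hold the preprocessed NSL-KDD features
# reshaped into (samples, timesteps, features) and y the attack-class labels.
X = np.random.rand(1024, timesteps, n_features).astype("float32")
y = np.random.randint(0, n_classes, size=1024)
model.fit(X, y, epochs=2, batch_size=64, validation_split=0.1, verbose=0)
```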

DyExplainer: Explainable Dynamic Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.16375
  • repo_url: None
  • paper_authors: Tianchun Wang, Dongsheng Luo, Wei Cheng, Haifeng Chen, Xiang Zhang
  • for: 这 paper 的目的是解释动态图神经网络(GNNs)的含义,以便更好地理解和信任这些模型,从而扩大其在重要应用场景中的使用。
  • methods: 这 paper 使用了一种名为 DyExplainer 的新方法,它在运行时对动态 GNNs 进行解释。DyExplainer 使用了一种动态 GNN 的干扰注意力技术,同时通过对比学习技术来保持结构一致性和时间连续性。
  • results: 该 paper 的实验结果表明,DyExplainer 不仅能够 faithful 地解释模型预测结果,还能够显著提高模型预测精度,如在链接预测任务中。
    Abstract Graph Neural Networks (GNNs) resurge as a trending research subject owing to their impressive ability to capture representations from graph-structured data. However, the black-box nature of GNNs presents a significant challenge in terms of comprehending and trusting these models, thereby limiting their practical applications in mission-critical scenarios. Although there has been substantial progress in the field of explaining GNNs in recent years, the majority of these studies are centered on static graphs, leaving the explanation of dynamic GNNs largely unexplored. Dynamic GNNs, with their ever-evolving graph structures, pose a unique challenge and require additional efforts to effectively capture temporal dependencies and structural relationships. To address this challenge, we present DyExplainer, a novel approach to explaining dynamic GNNs on the fly. DyExplainer trains a dynamic GNN backbone to extract representations of the graph at each snapshot, while simultaneously exploring structural relationships and temporal dependencies through a sparse attention technique. To preserve the desired properties of the explanation, such as structural consistency and temporal continuity, we augment our approach with contrastive learning techniques to provide priori-guided regularization. To model longer-term temporal dependencies, we develop a buffer-based live-updating scheme for training. The results of our extensive experiments on various datasets demonstrate the superiority of DyExplainer, not only providing faithful explainability of the model predictions but also significantly improving the model prediction accuracy, as evidenced in the link prediction task.

Joint Distributional Learning via Cramer-Wold Distance

  • paper_url: http://arxiv.org/abs/2310.16374
  • repo_url: None
  • paper_authors: Seunghwan An, Jong-June Jeon
  • for: 克服高维数据集或观测变量间复杂相关结构下条件独立假设的局限性。
  • methods: 提出了可闭式计算的 Cramer-Wold 距离正则化,以便对高维数据集进行联合分布学习。同时,提出了一种两步学习方法,以便灵活地设定先验分布并改进聚合后验与先验分布的对齐。
  • results: 通过在包含多个类别变量的高维数据集上进行合成数据生成实验,验证了所提方法的有效性。由于许多现有数据集和数据科学应用都包含这类数据,实验表明了该方法的实用性。
    Abstract The assumption of conditional independence among observed variables, primarily used in the Variational Autoencoder (VAE) decoder modeling, has limitations when dealing with high-dimensional datasets or complex correlation structures among observed variables. To address this issue, we introduced the Cramer-Wold distance regularization, which can be computed in a closed-form, to facilitate joint distributional learning for high-dimensional datasets. Additionally, we introduced a two-step learning method to enable flexible prior modeling and improve the alignment between the aggregated posterior and the prior distribution. Furthermore, we provide theoretical distinctions from existing methods within this category. To evaluate the synthetic data generation performance of our proposed approach, we conducted experiments on high-dimensional datasets with multiple categorical variables. Given that many readily available datasets and data science applications involve such datasets, our experiments demonstrate the effectiveness of our proposed methodology.
    摘要 观测变量之间条件独立的假设通常用于变分自编码器(VAE)解码器建模中,但在处理高维数据集或观测变量间复杂的相关结构时存在局限。为了解决这个问题,我们引入了可闭式计算的 Cramer-Wold 距离正则化,以便实现高维数据集的联合分布学习。此外,我们引入了两步学习方法,以实现灵活的先验建模,并改进聚合后验与先验分布之间的对齐。我们还给出了与该类现有方法的理论区别。为评估所提方法的合成数据生成性能,我们在包含多个类别变量的高维数据集上进行了实验;鉴于许多现成的数据集和数据科学应用都涉及这类数据,我们的实验显示了所提方法的有效性。

Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms

  • paper_url: http://arxiv.org/abs/2310.16363
  • repo_url: None
  • paper_authors: Prashansa Panda, Shalabh Bhatnagar
  • for: 本文研究了actor critic和natural actor critic算法在受限制Markov决策过程(C-MDP)中的应用,特别是当状态-动作空间较大时。
  • methods: 本文使用了actor critic和natural actor critic算法,并使用了函数近似来处理受限制MDP中的不等约束。
  • results: 本文通过非渐近分析表明,actor critic 和 natural actor critic 算法在非 i.i.d.(Markovian)设置下可以保证找到性能(拉格朗日)函数的一阶驻点(即 $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$),其样本复杂度为 $\tilde{\mathcal{O}}(\epsilon^{-2.5})$。此外,在一些网格世界设置下进行了实验,并观察到了良好的实验性能。
    Abstract Actor Critic methods have found immense applications on a wide range of Reinforcement Learning tasks especially when the state-action space is large. In this paper, we consider actor critic and natural actor critic algorithms with function approximation for constrained Markov decision processes (C-MDP) involving inequality constraints and carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d (Markovian) setting. We consider the long-run average cost criterion where both the objective and the constraint functions are suitable policy-dependent long-run averages of certain prescribed cost functions. We handle the inequality constraints using the Lagrange multiplier method. We prove that these algorithms are guaranteed to find a first-order stationary point (i.e., $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$) of the performance (Lagrange) function $L(\theta,\gamma)$, with a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2.5})$ in the case of both Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic (C-NAC) algorithms. We also show the results of experiments on a few different grid world settings and observe good empirical performance using both of these algorithms. In particular, for large grid sizes, Constrained Natural Actor Critic shows slightly better results than Constrained Actor Critic while the latter is slightly better for a small grid size.
    摘要 actor-critic 方法在各种强化学习任务上得到了广泛应用,特别是当状态-动作空间很大时。在这篇论文中,我们考虑带函数近似的 actor-critic 和 natural actor-critic 算法,用于求解含不等式约束的受限马尔可夫决策过程(C-MDP)。我们采用长期平均成本准则,其中目标函数和约束函数都是某些给定成本函数的依赖于策略的长期平均。我们使用拉格朗日乘子法来处理不等式约束。我们证明了这些算法可以在非 i.i.d.(Markovian)设置下找到性能(拉格朗日)函数 $L(\theta,\gamma)$ 的一阶驻点(即 $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$),其样本复杂度为 $\tilde{\mathcal{O}}(\epsilon^{-2.5})$。我们还在一些网格世界环境上进行了实验,观察到这两种算法均有良好的实验性能。特别是在大网格下,Constrained Natural Actor Critic 的结果略优于 Constrained Actor Critic,而在小网格下后者略优。
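
The Lagrange-multiplier mechanism behind C-AC/C-NAC can be seen on a toy problem: the policy performs descent on the Lagrangian while the multiplier ascends whenever the average constraint cost exceeds its budget. The two-action problem and two-timescale step sizes below are made up for illustration; the critic, function approximation, and the natural-gradient variant are omitted.

```python
import numpy as np

# Toy constrained problem: two actions; action 1 has higher reward but also a cost.
rewards = np.array([1.0, 3.0])
costs   = np.array([0.0, 1.0])
budget  = 0.4                       # long-run average cost constraint E[c] <= budget

theta, gamma = 0.0, 0.0             # policy logit and Lagrange multiplier
lr_theta, lr_gamma = 0.1, 0.05      # two-timescale step sizes (assumed)

def pi(theta):
    p1 = 1.0 / (1.0 + np.exp(-theta))   # probability of choosing action 1
    return np.array([1.0 - p1, p1])

for _ in range(2000):
    p = pi(theta)
    dp1 = p[1] * (1.0 - p[1])           # d p1 / d theta for the sigmoid policy
    # Gradient of the Lagrangian L = -E[r] + gamma * (E[c] - budget) w.r.t. theta.
    grad_theta = (-(rewards[1] - rewards[0]) + gamma * (costs[1] - costs[0])) * dp1
    theta -= lr_theta * grad_theta                                  # descent on L
    gamma = max(0.0, gamma + lr_gamma * (p @ costs - budget))       # ascent on gamma

p = pi(theta)
print(f"P(action 1) ~ {p[1]:.3f}, expected cost ~ {p @ costs:.3f}, gamma ~ {gamma:.3f}")
```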

Neural Potential Field for Obstacle-Aware Local Motion Planning

  • paper_url: http://arxiv.org/abs/2310.16362
  • repo_url: https://github.com/cog-isa/npfield
  • paper_authors: Muhammad Alhaddad, Konstantin Mironov, Aleksey Staroverov, Aleksandr Panov
  • for: 本文旨在提供基于预测模型的移动 робот平台的本地规划方法。
  • methods: 本文提出了一种名为 Neural Potential Field 的神经网络模型,该模型根据机器人位姿、障碍物地图和机器人足迹返回可微的碰撞代价,可直接用于 MPC 求解。
  • results: 对比试验表明,提出的方法可以与现有的本地规划器相比,提供了更平滑的轨迹、相对较短的路径长度和安全的距离障碍物。实验结果表明,该方法可以在 Husky UGV 移动机器人上实现实时和安全的本地规划。
    Abstract Model predictive control (MPC) may provide local motion planning for mobile robotic platforms. The challenging aspect is the analytic representation of collision cost for the case when both the obstacle map and robot footprint are arbitrary. We propose a Neural Potential Field: a neural network model that returns a differentiable collision cost based on robot pose, obstacle map, and robot footprint. The differentiability of our model allows its usage within the MPC solver. It is computationally hard to solve problems with a very high number of parameters. Therefore, our architecture includes neural image encoders, which transform obstacle maps and robot footprints into embeddings, which reduce problem dimensionality by two orders of magnitude. The reference data for network training are generated based on algorithmic calculation of a signed distance function. Comparative experiments showed that the proposed approach is comparable with existing local planners: it provides trajectories with outperforming smoothness, comparable path length, and safe distance from obstacles. Experiment on Husky UGV mobile robot showed that our approach allows real-time and safe local planning. The code for our approach is presented at https://github.com/cog-isa/NPField together with demo video.

Redco: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs

  • paper_url: http://arxiv.org/abs/2310.16355
  • repo_url: https://github.com/tanyuqian/redco
  • paper_authors: Bowen Tan, Yun Zhu, Lijuan Liu, Hongyi Wang, Yonghao Zhuang, Jindong Chen, Eric Xing, Zhiting Hu
  • for: 这篇论文主要是为了提供一个轻量级、易用的工具,用于大型自然语言模型 (LLM) 的自动分布式训练和推理,并简化 ML 管道的开发。
  • methods: 该论文使用了两个简单的规则来生成 tensor 平行策略,以便轻松地分布式训练和推理 LLM。此外,论文还提出了一种机制,allowing for the customization of diverse ML pipelines through the definition of merely three functions。
  • results: 论文通过在 GPT-J、LLaMA、T5 和 OPT 等一系列 LLM 架构上应用 Redco 来展示其效果,并且比官方实现所需的代码行数更少。
    Abstract The recent progress of AI can be largely attributed to large language models (LLMs). However, their escalating memory requirements introduce challenges for machine learning (ML) researchers and engineers. Addressing this requires developers to partition a large model to distribute it across multiple GPUs or TPUs. This necessitates considerable coding and intricate configuration efforts with existing model parallel tools, such as Megatron-LM, DeepSpeed, and Alpa. These tools require users' expertise in machine learning systems (MLSys), creating a bottleneck in LLM development, particularly for developers without MLSys background. In this work, we present Redco, a lightweight and user-friendly tool crafted to automate distributed training and inference for LLMs, as well as to simplify ML pipeline development. The design of Redco emphasizes two key aspects. Firstly, to automate model parallism, our study identifies two straightforward rules to generate tensor parallel strategies for any given LLM. Integrating these rules into Redco facilitates effortless distributed LLM training and inference, eliminating the need of additional coding or complex configurations. We demonstrate the effectiveness by applying Redco on a set of LLM architectures, such as GPT-J, LLaMA, T5, and OPT, up to the size of 66B. Secondly, we propose a mechanism that allows for the customization of diverse ML pipelines through the definition of merely three functions, eliminating redundant and formulaic code like multi-host related processing. This mechanism proves adaptable across a spectrum of ML algorithms, from foundational language modeling to complex algorithms like meta-learning and reinforcement learning. Consequently, Redco implementations exhibit much fewer code lines compared to their official counterparts.
    摘要 近期 AI 的进展在很大程度上可以归功于大语言模型(LLM)。然而,这些模型不断增长的内存需求给机器学习(ML)研究人员和工程师带来了挑战。为了解决这一问题,开发者需要将大模型切分并分布到多个 GPU 或 TPU 上;使用现有的模型并行工具(如 Megatron-LM、DeepSpeed 和 Alpa)完成这项工作需要大量编码和复杂的配置,并要求用户具备机器学习系统(MLSys)方面的专业知识,这成为 LLM 开发中的瓶颈,对没有 MLSys 背景的开发者尤其如此。为此,我们提出了 Redco,一个轻量级、易用的工具,用于自动化 LLM 的分布式训练和推理,并简化 ML 管道的开发。Redco 的设计强调两个关键方面。首先,为了自动化模型并行,我们总结出两条简单规则,可以为任何给定的 LLM 生成张量并行策略;将这些规则集成到 Redco 中,使分布式 LLM 训练和推理无需额外编码或复杂配置。我们在 GPT-J、LLaMA、T5 和 OPT 等一系列 LLM 架构上进行了应用,规模最高达 66B。其次,我们提出了一种机制,只需定义三个函数即可定制各种 ML 管道,免去了多主机处理等冗余的模板代码。这种机制可以适应从基础语言建模到元学习、强化学习等多种 ML 算法。因此,Redco 的实现比官方对应实现的代码行数少得多。

SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process

  • paper_url: http://arxiv.org/abs/2310.16336
  • repo_url: https://github.com/zichongli5/smurf-thp
  • paper_authors: Zichong Li, Yanbo Xu, Simiao Zuo, Haoming Jiang, Chao Zhang, Tuo Zhao, Hongyuan Zha
  • for: 模型事件序列数据的Transformer Hawkes过程模型,但大多数现有的训练方法仍然基于最大化事件序列的概率,这会导致计算不可分解的积分。此外,现有方法无法提供模型预测结果的不确定性评估,例如预测事件到达时间的信任区间。
  • methods: SMURF-THP方法基于分数函数对事件的到达时间进行学习和预测,并通过分数匹配目标函数来避免计算不可分解的积分。
  • results: 在事件类型预测和到达时间不确定性评估中,SMURF-THP方法在置信度抑制中表现出色,同时保持与现有概率基于方法相当的预测精度。
    Abstract Transformer Hawkes process models have shown to be successful in modeling event sequence data. However, most of the existing training methods rely on maximizing the likelihood of event sequences, which involves calculating some intractable integral. Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., confidence intervals for the predicted event's arrival time. To address these issues, we propose SMURF-THP, a score-based method for learning Transformer Hawkes process and quantifying prediction uncertainty. Specifically, SMURF-THP learns the score function of events' arrival time based on a score-matching objective that avoids the intractable computation. With such a learned score function, we can sample arrival time of events from the predictive distribution. This naturally allows for the quantification of uncertainty by computing confidence intervals over the generated samples. We conduct extensive experiments in both event type prediction and uncertainty quantification of arrival time. In all the experiments, SMURF-THP outperforms existing likelihood-based methods in confidence calibration while exhibiting comparable prediction accuracy.
    摘要 Transformer Hawkes 过程模型在事件序列数据建模中已被证明是成功的。然而,大多数现有的训练方法依赖于最大化事件序列的似然,这涉及计算一些难以求解的积分。此外,现有方法无法为模型预测提供不确定性量化,例如预测事件到达时间的置信区间。为了解决这些问题,我们提出了 SMURF-THP,一种基于分数(score)的方法,用于学习 Transformer Hawkes 过程并量化预测的不确定性。具体来说,SMURF-THP 基于分数匹配目标学习事件到达时间的分数函数,从而避免了难以求解的积分计算。借助学习到的分数函数,我们可以从预测分布中采样事件到达时间,进而通过对生成样本计算置信区间来自然地量化不确定性。我们在事件类型预测和到达时间不确定性量化两方面进行了大量实验。在所有实验中,SMURF-THP 在置信度校准方面优于现有的基于似然的方法,同时保持相当的预测精度。
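
The score-matching idea can be sketched with a denoising variant: perturb observed inter-arrival times with Gaussian noise and train a small network to match the score of the perturbed distribution, which sidesteps the intractable normalizing integral; a few Langevin steps with the learned score then yield samples whose quantiles serve as confidence intervals. The architecture, noise level, history encoding, and sampler settings below are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

class ArrivalScore(nn.Module):
    """Tiny score network s_theta(tau, h): score of the (noised) inter-arrival time
    tau given a history embedding h (the history encoder is assumed to exist)."""
    def __init__(self, hist_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1 + hist_dim, 64), nn.Softplus(),
                                 nn.Linear(64, 1))

    def forward(self, tau, h):
        return self.net(torch.cat([tau, h], dim=-1))

def dsm_loss(model, tau, h, sigma=0.1):
    """Denoising score matching: regress onto the score of the Gaussian perturbation."""
    noise = torch.randn_like(tau) * sigma
    return ((model(tau + noise, h) + noise / sigma**2) ** 2).mean()

@torch.no_grad()
def langevin_sample(model, h, n_steps=200, step=1e-3, init=1.0):
    """Draw arrival-time samples; their empirical quantiles give confidence intervals."""
    tau = torch.full((h.shape[0], 1), init)
    for _ in range(n_steps):
        tau = tau + step * model(tau, h) + (2 * step) ** 0.5 * torch.randn_like(tau)
    return tau.clamp(min=0.0)

model = ArrivalScore()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
tau_obs, h_obs = torch.rand(256, 1), torch.randn(256, 16)   # stand-in training data
for _ in range(200):
    opt.zero_grad()
    loss = dsm_loss(model, tau_obs, h_obs)
    loss.backward()
    opt.step()
samples = langevin_sample(model, h_obs[:10])
print(torch.quantile(samples, torch.tensor([0.05, 0.95])))  # a 90% interval over samples
```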

Defense Against Model Extraction Attacks on Recommender Systems

  • paper_url: http://arxiv.org/abs/2310.16335
  • repo_url: None
  • paper_authors: Sixiao Zhang, Hongzhi Yin, Hongxu Chen, Cheng Long
  • for: 本研究旨在提高推荐系统的Robustness,尤其是针对模型提取攻击。
  • methods: 本文提出了一种名为Gradient-based Ranking Optimization(GRO)的防御策略,用于对模型提取攻击进行防御。GRO将防御问题定义为一个优化问题,目标是将保护目标模型的损失降低到最低,同时将攻击者的代理模型的损失增加到最高。在实现GRO时,我们使用了 swap matrices来代替top-k排名列表,以使用 student model来模拟攻击者的代理模型。
  • results: 我们在三个 benchmark 数据集上进行了实验,并证明了 GRO 的超越性,可以有效地防御模型提取攻击。
    Abstract The robustness of recommender systems has become a prominent topic within the research community. Numerous adversarial attacks have been proposed, but most of them rely on extensive prior knowledge, such as all the white-box attacks or most of the black-box attacks which assume that certain external knowledge is available. Among these attacks, the model extraction attack stands out as a promising and practical method, involving training a surrogate model by repeatedly querying the target model. However, there is a significant gap in the existing literature when it comes to defending against model extraction attacks on recommender systems. In this paper, we introduce Gradient-based Ranking Optimization (GRO), which is the first defense strategy designed to counter such attacks. We formalize the defense as an optimization problem, aiming to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model. Since top-k ranking lists are non-differentiable, we transform them into swap matrices which are instead differentiable. These swap matrices serve as input to a student model that emulates the surrogate model's behavior. By back-propagating the loss of the student model, we obtain gradients for the swap matrices. These gradients are used to compute a swap loss, which maximizes the loss of the student model. We conducted experiments on three benchmark datasets to evaluate the performance of GRO, and the results demonstrate its superior effectiveness in defending against model extraction attacks.
    摘要 “推荐系统的强健性已经成为研究社区中的焦点话题。许多敌意攻击已经被提出,但大多数它们需要很多先前知识,如白盒子攻击或黑盒子攻击,这些攻击假设有一定的外部知识是可用的。在这些攻击中,模型提取攻击最引人注目,它们可以通过重复询问目标模型来训练代理模型。然而,在现有的文献中,防御模型提取攻击的方法尚未得到充分的研究。在这篇论文中,我们介绍了一个名为Gradient-based Ranking Optimization(GRO)的防御策略,这是第一个针对推荐系统中的模型提取攻击进行防御的策略。我们将防御当做一个优化问题,寻找可以最大化攻击者的代理模型的损失,同时最小化保护目标模型的损失。因为排名列表是不可微的,我们将它转换为可微的交换矩阵,这些交换矩阵作为学生模型的输入。通过将学生模型的损失传递到交换矩阵上,我们可以得到交换损失,这个交换损失可以最大化学生模型的损失。我们在三个标准 benchmark 数据集上进行实验评估 GRO 的性能,结果显示 GRO 能够有效地防御模型提取攻击。”

Corrupting Neuron Explanations of Deep Visual Features

  • paper_url: http://arxiv.org/abs/2310.16332
  • repo_url: None
  • paper_authors: Divyansh Srivastava, Tuomas Oikarinen, Tsui-Wei Weng
  • for: This paper aims to investigate the robustness of Neuron Explanation Methods (NEMs) in deep neural networks.
  • methods: The authors use a unified pipeline to evaluate the robustness of NEMs under different types of corruptions, including random noises and well-designed perturbations.
  • results: The authors find that even small amounts of noise can significantly corrupt the explanations provided by NEMs, and that their proposed corruption algorithm can manipulate the explanations of more than 80% of neurons by poisoning less than 10% of the probing data. This raises concerns about the trustworthiness of NEMs in real-life applications.
    Abstract The inability of DNNs to explain their black-box behavior has led to a recent surge of explainability methods. However, there are growing concerns that these explainability methods are not robust and trustworthy. In this work, we perform the first robustness analysis of Neuron Explanation Methods under a unified pipeline and show that these explanations can be significantly corrupted by random noises and well-designed perturbations added to their probing data. We find that even adding small random noise with a standard deviation of 0.02 can already change the assigned concepts of up to 28% neurons in the deeper layers. Furthermore, we devise a novel corruption algorithm and show that our algorithm can manipulate the explanation of more than 80% neurons by poisoning less than 10% of probing data. This raises the concern of trusting Neuron Explanation Methods in real-life safety and fairness critical applications.
    摘要 深度神经网络(DNN)的黑盒行为难以解释,这催生了近期大量的可解释性方法。然而,人们越来越担心这些可解释性方法并不稳健、不可信。在这项工作中,我们在统一的流程下对神经元解释方法进行了首次稳健性分析,发现这些解释可能会被添加到探测数据中的随机噪声和精心设计的扰动严重破坏。我们发现,即使添加标准差为 0.02 的微小随机噪声,也可以改变深层中多达 28% 神经元的分配概念。此外,我们设计了一种新的破坏算法,并证明该算法只需污染不到 10% 的探测数据,即可操纵 80% 以上神经元的解释。这引发了在实际安全与公平关键应用中信任神经元解释方法的担忧。
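
The core robustness probe is easy to reproduce in spirit: compare each neuron's assigned concept before and after adding small Gaussian noise (the abstract mentions a standard deviation of 0.02) to the probing data. The concept-assignment step below is a stand-in (correlation of activations with binary concept labels), since the actual Neuron Explanation Method depends on the tool being audited.

```python
import numpy as np

rng = np.random.default_rng(0)

def assign_concepts(activations, concept_labels):
    """Stand-in NEM: each neuron gets the concept whose labels correlate most
    with its activations over the probing set."""
    a = (activations - activations.mean(0)) / (activations.std(0) + 1e-8)
    c = (concept_labels - concept_labels.mean(0)) / (concept_labels.std(0) + 1e-8)
    corr = a.T @ c / len(a)                     # (n_neurons, n_concepts)
    return corr.argmax(axis=1)

def probe(probing_data, feature_extractor, concept_labels, sigma=0.02):
    clean = assign_concepts(feature_extractor(probing_data), concept_labels)
    noisy_data = probing_data + rng.normal(0.0, sigma, probing_data.shape)
    noisy = assign_concepts(feature_extractor(noisy_data), concept_labels)
    return (clean != noisy).mean()              # fraction of neurons that flipped

# Toy stand-ins: a random ReLU layer as the feature extractor.
W = rng.normal(size=(784, 128))
extractor = lambda x: np.maximum(x @ W, 0.0)
X_probe = rng.random((500, 784))
concepts = rng.integers(0, 2, size=(500, 20)).astype(float)
print(f"flipped concepts under sigma=0.02 noise: {probe(X_probe, extractor, concepts):.1%}")
```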

Brain-Inspired Reservoir Computing Using Memristors with Tunable Dynamics and Short-Term Plasticity

  • paper_url: http://arxiv.org/abs/2310.16331
  • repo_url: None
  • paper_authors: Nicholas X. Armendarez, Ahmed S. Mohamed, Anurag Dhungel, Md Razuan Hossain, Md Sakib Hasan, Joseph S. Najem
  • for: 本研究旨在提供一种用于时间类型分类和预测任务的analog设备,以提高信息处理速度,降低能耗和占用面积。
  • methods: 研究人员使用 ion-channel-based memristors,通过控制电压或 ion channel 浓度来实现多态动态特性。
  • results: 实验和仿真表明,仅使用少量具有不同动态特性的忆阻器即可获得高精度的预测和分类结果:在二阶非线性动力系统预测任务中,使用五个不同的忆阻器实现了 0.0015 的归一化均方误差;在神经活动分类任务中,仅用三个不同的忆阻器即达到 96.5% 的准确率。
    Abstract Recent advancements in reservoir computing research have created a demand for analog devices with dynamics that can facilitate the physical implementation of reservoirs, promising faster information processing while consuming less energy and occupying a smaller area footprint. Studies have demonstrated that dynamic memristors, with nonlinear and short-term memory dynamics, are excellent candidates as information-processing devices or reservoirs for temporal classification and prediction tasks. Previous implementations relied on nominally identical memristors that applied the same nonlinear transformation to the input data, which is not enough to achieve a rich state space. To address this limitation, researchers either diversified the data encoding across multiple memristors or harnessed the stochastic device-to-device variability among the memristors. However, this approach requires additional pre-processing steps and leads to synchronization issues. Instead, it is preferable to encode the data once and pass it through a reservoir layer consisting of memristors with distinct dynamics. Here, we demonstrate that ion-channel-based memristors with voltage-dependent dynamics can be controllably and predictively tuned through voltage or adjustment of the ion channel concentration to exhibit diverse dynamic properties. We show, through experiments and simulations, that reservoir layers constructed with a small number of distinct memristors exhibit significantly higher predictive and classification accuracies with a single data encoding. We found that for a second-order nonlinear dynamical system prediction task, the varied memristor reservoir experimentally achieved a normalized mean square error of 0.0015 using only five distinct memristors. Moreover, in a neural activity classification task, a reservoir of just three distinct memristors experimentally attained an accuracy of 96.5%.
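
The gist of the reservoir layer can be emulated in software: a handful of nodes with distinct, tunable nonlinear short-term-memory dynamics receive the same input encoding, and only a linear (ridge) readout is trained. The state-update equation below is a generic leaky nonlinear map standing in for the ion-channel memristor physics, with made-up parameters and a NARMA-like toy target rather than the paper's benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_reservoir(u, decays, gains):
    """Each (decay, gain) pair plays the role of one 'memristor' with its own dynamics."""
    states, x = np.zeros((len(u), len(decays))), np.zeros(len(decays))
    for t, u_t in enumerate(u):
        # Leaky nonlinear update: a software stand-in for voltage-dependent dynamics.
        x = (1 - decays) * x + decays * np.tanh(gains * u_t + 0.5 * x)
        states[t] = x
    return states

# Toy target: a second-order nonlinear function of the input history.
T = 2000
u = rng.uniform(0, 0.5, T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.4 * y[t-1] + 0.4 * y[t-1] * y[t-2] + 0.6 * u[t-1] ** 3 + 0.1

decays = np.array([0.9, 0.5, 0.2, 0.05, 0.02])   # five distinct "devices" (assumed)
gains = np.array([1.0, 2.0, 4.0, 1.5, 3.0])
S = run_reservoir(u, decays, gains)

# Ridge readout trained on the first half, evaluated on the second half.
split, lam = T // 2, 1e-6
A = np.hstack([S, np.ones((T, 1))])
w = np.linalg.solve(A[:split].T @ A[:split] + lam * np.eye(A.shape[1]),
                    A[:split].T @ y[:split])
pred = A[split:] @ w
nmse = np.mean((pred - y[split:]) ** 2) / np.var(y[split:])
print(f"normalized MSE with 5 distinct nodes: {nmse:.4f}")
```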

Reinforcement Learning for SBM Graphon Games with Re-Sampling

  • paper_url: http://arxiv.org/abs/2310.16326
  • repo_url: None
  • paper_authors: Peihan Huo, Oscar Peralta, Junyu Guo, Qiaomin Xie, Andreea Minca
  • for: 本研究旨在探讨大规模群体动力学中的 Mean-Field 方法的局限性,并提出了 Multi-Population Mean-Field Game(MP-MFG)模型来解决这些限制。
  • methods: 作者们提出了一种基于 Policy Mirror Ascent 算法的 MP-MFG 纳什均衡求解方法,并在更现实的 Stochastic Block Model 未知的情形下,提出了一种结合重采样的 graphon 博弈(Graphon Game with Re-Sampling,GGR-S)模型。
  • results: 作者们分析了 GGR-S 模型的动态,并证明了它们的动态相对于 MP-MFG 动态的收敛性。此外,他们还提出了一种基于 GGR-S 模型的高效的 N-player 启动学习算法,并提供了finite sample 保证的收敛分析。
    Abstract The Mean-Field approximation is a tractable approach for studying large population dynamics. However, its assumption on homogeneity and universal connections among all agents limits its applicability in many real-world scenarios. Multi-Population Mean-Field Game (MP-MFG) models have been introduced in the literature to address these limitations. When the underlying Stochastic Block Model is known, we show that a Policy Mirror Ascent algorithm finds the MP-MFG Nash Equilibrium. In more realistic scenarios where the block model is unknown, we propose a re-sampling scheme from a graphon integrated with the finite N-player MP-MFG model. We develop a novel learning framework based on a Graphon Game with Re-Sampling (GGR-S) model, which captures the complex network structures of agents' connections. We analyze GGR-S dynamics and establish the convergence to dynamics of MP-MFG. Leveraging this result, we propose an efficient sample-based N-player Reinforcement Learning algorithm for GGR-S without population manipulation, and provide a rigorous convergence analysis with finite sample guarantee.
    摘要 平均场近似(mean-field approximation)是研究大规模群体动力学的一种可行方法。然而,它关于同质性以及所有智能体之间普遍连接的假设限制了其在许多实际场景中的应用。文献中已提出多群体平均场博弈(MP-MFG)模型来解决这些限制。当底层的随机块模型已知时,我们证明策略镜像上升算法可以找到 MP-MFG 的纳什均衡。在更现实的随机块模型未知的场景中,我们提出一种从 graphon 中重采样并与有限 N 玩家 MP-MFG 模型相结合的方案。我们基于带重采样的 graphon 博弈(GGR-S)模型开发了一种新的学习框架,该模型刻画了智能体之间连接的复杂网络结构。我们分析了 GGR-S 的动力学,并建立了其向 MP-MFG 动力学的收敛性。基于这一结果,我们提出了一种高效的、无需操纵群体的基于采样的 N 玩家强化学习算法,并给出了具有有限样本保证的严格收敛性分析。

Personalized Federated X -armed Bandit

  • paper_url: http://arxiv.org/abs/2310.16323
  • repo_url: None
  • paper_authors: Wenjie Li, Qifan Song, Jean Honorio
  • for: 本研究探讨了个性化联邦 $\mathcal{X}$-臂赌博机问题,即在联邦学习框架中同时优化各客户端异质的本地目标。
  • methods: 我们提出了 \texttt{PF-PNE} 算法,其具有独特的双重淘汰策略,能够安全地排除非最优区域,同时通过有偏但有效的本地目标评估鼓励联邦协作。
  • results: 我们的理论分析表明,所提出的 \texttt{PF-PNE} 算法可以在任意程度的异质性下优化本地目标,并且其有限的通信保护了各客户端奖励数据的隐私。实验表明,\texttt{PF-PNE} 在合成数据和真实数据上均优于多个基线方法。
    Abstract In this work, we study the personalized federated $\mathcal{X}$-armed bandit problem, where the heterogeneous local objectives of the clients are optimized simultaneously in the federated learning paradigm. We propose the \texttt{PF-PNE} algorithm with a unique double elimination strategy, which safely eliminates the non-optimal regions while encouraging federated collaboration through biased but effective evaluations of the local objectives. The proposed \texttt{PF-PNE} algorithm is able to optimize local objectives with arbitrary levels of heterogeneity, and its limited communications protects the confidentiality of the client-wise reward data. Our theoretical analysis shows the benefit of the proposed algorithm over single-client algorithms. Experimentally, \texttt{PF-PNE} outperforms multiple baselines on both synthetic and real life datasets.
    摘要 在这项研究中,我们研究了个性化联邦 $\mathcal{X}$-臂赌博机问题,其中客户端的异质本地目标在联邦学习框架下被同时优化。我们提出了 \texttt{PF-PNE} 算法,该算法使用独特的双重淘汰策略,安全地排除非最优区域,同时通过有偏但有效的本地目标评估促进联邦协作。所提出的 \texttt{PF-PNE} 算法可以在任意程度的异质性下优化客户端的本地目标,并且其有限的通信保护了客户端奖励数据的隐私。我们的理论分析表明所提算法相对于单客户端算法具有优势。实验表明 \texttt{PF-PNE} 在合成数据集和真实数据集上均优于多个基线方法。

Enhancing Low-Precision Sampling via Stochastic Gradient Hamiltonian Monte Carlo

  • paper_url: http://arxiv.org/abs/2310.16320
  • repo_url: None
  • paper_authors: Ziyi Wang, Yujie Chen, Qifan Song, Ruqi Zhang
  • for: 这个论文研究了一种叫做低精度训练的技术,用于提高深度神经网络的训练效率,而不是减少准确性。
  • methods: 该论文使用了一种名叫Stochastic Gradient Hamiltonian Monte Carlo(SGHMC)的低精度抽样方法,并使用了低精度和全精度的梯度积累器。
  • results: 研究发现,对于非对数凹分布,低精度 SGHMC 只需 $\widetilde{\mathbf{O}}\left(\epsilon^{-2}{\mu^*}^{-2}\log^2\left(\epsilon^{-1}\right)\right)$ 的复杂度即可在 2-Wasserstein 距离下达到 $\epsilon$ 误差,相比现有的低精度采样方法具有二次改进。此外,研究还证明了低精度 SGHMC 对梯度噪声和量化误差更加稳健。
    Abstract Low-precision training has emerged as a promising low-cost technique to enhance the training efficiency of deep neural networks without sacrificing much accuracy. Its Bayesian counterpart can further provide uncertainty quantification and improved generalization accuracy. This paper investigates low-precision sampling via Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with low-precision and full-precision gradient accumulators for both strongly log-concave and non-log-concave distributions. Theoretically, our results show that, to achieve $\epsilon$-error in the 2-Wasserstein distance for non-log-concave distributions, low-precision SGHMC achieves quadratic improvement ($\widetilde{\mathbf{O}}\left(\epsilon^{-2}{\mu^*}^{-2}\log^2\left(\epsilon^{-1}\right)\right)$) compared to the state-of-the-art low-precision sampler, Stochastic Gradient Langevin Dynamics (SGLD) ($\widetilde{\mathbf{O}}\left(\epsilon^{-4}{\lambda^*}^{-1}\log^5\left(\epsilon^{-1}\right)\right)$). Moreover, we prove that low-precision SGHMC is more robust to the quantization error compared to low-precision SGLD due to the robustness of the momentum-based update w.r.t. gradient noise. Empirically, we conduct experiments on synthetic data, and {MNIST, CIFAR-10 \& CIFAR-100} datasets, which validate our theoretical findings. Our study highlights the potential of low-precision SGHMC as an efficient and accurate sampling method for large-scale and resource-limited machine learning.
    摘要 低精度训练已成为一种有前景的低成本技术,可在几乎不牺牲准确性的情况下提高深度神经网络的训练效率。其贝叶斯版本还可以提供不确定性量化并改善泛化精度。本文研究了基于随机梯度哈密顿蒙特卡洛(SGHMC)的低精度采样,分别使用低精度和全精度梯度累加器,并覆盖强对数凹与非对数凹分布。理论上,我们的结果表明,对于非对数凹分布,低精度 SGHMC 在 2-Wasserstein 距离下达到 $\epsilon$ 误差所需的复杂度为 $\widetilde{\mathbf{O}}\left(\epsilon^{-2}{\mu^*}^{-2}\log^2\left(\epsilon^{-1}\right)\right)$,相比现有最优的低精度采样器 SGLD($\widetilde{\mathbf{O}}\left(\epsilon^{-4}{\lambda^*}^{-1}\log^5\left(\epsilon^{-1}\right)\right)$)具有二次改进。此外,我们证明了由于基于动量的更新对梯度噪声具有稳健性,低精度 SGHMC 对量化误差比低精度 SGLD 更加稳健。我们在合成数据以及 MNIST、CIFAR-10 和 CIFAR-100 数据集上进行了实验,验证了我们的理论发现。我们的研究强调了低精度 SGHMC 作为面向大规模和资源受限机器学习的高效且准确采样方法的潜力。
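
A minimal sketch of a low-precision SGHMC update: parameters are kept on a quantized grid while the momentum-based update absorbs gradient and quantization noise. The quantizer, step sizes, friction, and the toy Gaussian target below are illustrative choices, not the paper's exact algorithm or guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, n_bits=8, scale=4.0):
    """Uniform fixed-point quantizer with stochastic rounding (assumed scheme)."""
    step = 2 * scale / (2 ** n_bits - 1)
    q = np.floor((x + scale) / step + rng.random(x.shape))    # stochastic rounding
    return np.clip(q * step - scale, -scale, scale)

# Target: standard Gaussian, so grad U(theta) = theta (up to an additive constant).
def grad_U(theta, noise_std=0.5):
    return theta + noise_std * rng.normal(size=theta.shape)   # "stochastic" gradient

d, eta, alpha = 10, 1e-2, 0.1           # dimension, step size, friction (assumed)
theta, r, samples = quantize(rng.normal(size=d)), np.zeros(d), []
for t in range(20000):
    r = (1 - alpha) * r - eta * grad_U(theta) \
        + np.sqrt(2 * alpha * eta) * rng.normal(size=d)       # momentum-based update
    theta = quantize(theta + r)                                # low-precision parameters
    if t > 5000:
        samples.append(theta.copy())

samples = np.array(samples)
print(f"sample mean norm {np.linalg.norm(samples.mean(0)):.3f}, "
      f"per-coordinate variance ~ {samples.var(0).mean():.3f} (target 1.0)")
```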

Understanding Code Semantics: An Evaluation of Transformer Models in Summarization

  • paper_url: http://arxiv.org/abs/2310.16314
  • repo_url: https://github.com/Demon702/robust_code_summary
  • paper_authors: Debanjan Mondal, Abhilasha Lodha, Ankita Sahoo, Beena Kumari
  • for: 这篇论文探讨了高级变换器基于语言模型如何实现代码概要。
  • methods: 我们通过对函数和变量名进行修改来评估模型是否真正理解代码 semantics 还是仅仅依靠文本提示。我们还在三种编程语言(Python、Javascript、Java)中引入了干扰者如死代码和注释代码,进一步检验模型的理解能力。
  • results: 我们的研究希望通过探讨基于变换器的语言模型的内部工作机制,提高其对代码的理解能力,并促进更高效的软件开发与维护流程。
    Abstract This paper delves into the intricacies of code summarization using advanced transformer-based language models. Through empirical studies, we evaluate the efficacy of code summarization by altering function and variable names to explore whether models truly understand code semantics or merely rely on textual cues. We have also introduced adversaries like dead code and commented code across three programming languages (Python, Javascript, and Java) to further scrutinize the model's understanding. Ultimately, our research aims to offer valuable insights into the inner workings of transformer-based LMs, enhancing their ability to understand code and contributing to more efficient software development practices and maintenance workflows.
    摘要 这篇论文探讨了使用高级变换器基本语言模型进行代码概要的细节。通过实验研究,我们评估了代码概要的有效性,通过修改函数和变量名来检验模型是否真正理解代码 semantics 还是仅仅依据文本提示。我们还在三种编程语言(Python、Javascript、Java)中引入了死代码和注释代码作为敌手,进一步检验模型的理解。我们的研究目标是提供有价值的内在性研究,以提高变换器基本语言模型对代码的理解,并为更有效的软件开发实践和维护工作流程做出贡献。
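
The perturbation used in the study, renaming functions and variables to see whether a summarizer relies on identifiers rather than semantics, is straightforward to script for Python with the ast module; the renaming policy below (opaque names like v0, v1) is one simple choice among those such an evaluation could use, and is not claimed to be the paper's exact setup.

```python
import ast

class Renamer(ast.NodeTransformer):
    """Replace user-defined function and variable names with opaque ones."""
    def __init__(self):
        self.mapping = {}

    def _new_name(self, old):
        return self.mapping.setdefault(old, f"v{len(self.mapping)}")

    def visit_FunctionDef(self, node):
        node.name = self._new_name(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._new_name(node.arg)
        return node

    def visit_Name(self, node):
        # Rename assignments and any name we have already mapped; builtins stay intact.
        if node.id in self.mapping or isinstance(node.ctx, ast.Store):
            node.id = self._new_name(node.id)
        return node

src = """
def moving_average(values, window):
    total = sum(values[:window])
    return total / window
"""
tree = Renamer().visit(ast.parse(src))
print(ast.unparse(tree))   # feed both versions to the summarizer and compare outputs
```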

Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification

  • paper_url: http://arxiv.org/abs/2310.16310
  • repo_url: None
  • paper_authors: Zichong Li, Qunzhi Xu, Zhenghao Xu, Yajun Mei, Tuo Zhao, Hongyuan Zha
  • for: 本研究旨在提出一种可靠地学习带有时空特征的事件发生过程的数学工具,以及为这些过程提供不确定性评估。
  • methods: 本研究使用的方法包括分布式预测和精度评估,以及一种基于分数匹配的pseudolikelihood函数来估计标记的STPPs。
  • results: 研究表明,对于不同的事件类型和数据量,SMASH方法可以提供更高的准确率和更低的不确定性,并且可以提供事件发生时间和位置的信心区间和标记的信心范围。
    Abstract Spatio-temporal point processes (STPPs) are potent mathematical tools for modeling and predicting events with both temporal and spatial features. Despite their versatility, most existing methods for learning STPPs either assume a restricted form of the spatio-temporal distribution, or suffer from inaccurate approximations of the intractable integral in the likelihood training objective. These issues typically arise from the normalization term of the probability density function. Moreover, current techniques fail to provide uncertainty quantification for model predictions, such as confidence intervals for the predicted event's arrival time and confidence regions for the event's location, which is crucial given the considerable randomness of the data. To tackle these challenges, we introduce SMASH: a Score MAtching-based pSeudolikeliHood estimator for learning marked STPPs with uncertainty quantification. Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of marked STPPs through score-matching and offers uncertainty quantification for the predicted event time, location and mark by computing confidence regions over the generated samples. The superior performance of our proposed framework is demonstrated through extensive experiments in both event prediction and uncertainty quantification.

Imperfect Digital Twin Assisted Low Cost Reinforcement Training for Multi-UAV Networks

  • paper_url: http://arxiv.org/abs/2310.16302
  • repo_url: None
  • paper_authors: Xiucheng Wang, Nan Cheng, Longfei Ma, Zhisheng Yin, Tom. Luan, Ning Lu
  • for: 这项研究旨在利用深度强化学习(DRL)优化多无人机网络的性能,同时降低实际训练的成本。
  • methods: 研究提出了自然无人机与虚拟生成无人机混合部署的方法,并使用两个级联神经网络(NN)联合优化虚拟生成无人机的数量、数字孪生(DT)的构建成本以及多无人机网络的性能;这两个神经网络分别通过无监督学习和强化学习这两种低成本、无标签的方法进行训练。
  • results: 仿真结果表明,该方法可以在保证训练性能的同时显著降低训练成本,这说明在多无人机网络中可以利用不完美的DT模型做出高效的决策。
    Abstract Deep Reinforcement Learning (DRL) is widely used to optimize the performance of multi-UAV networks. However, the training of DRL relies on the frequent interactions between the UAVs and the environment, which consumes lots of energy due to the flying and communication of UAVs in practical experiments. Inspired by the growing digital twin (DT) technology, which can simulate the performance of algorithms in the digital space constructed by coping features of the physical space, the DT is introduced to reduce the costs of practical training, e.g., energy and hardware purchases. Different from previous DT-assisted works with an assumption of perfect reflecting real physics by virtual digital, we consider an imperfect DT model with deviations for assisting the training of multi-UAV networks. Remarkably, to trade off the training cost, DT construction cost, and the impact of deviations of DT on training, the natural and virtually generated UAV mixing deployment method is proposed. Two cascade neural networks (NN) are used to optimize the joint number of virtually generated UAVs, the DT construction cost, and the performance of multi-UAV networks. These two NNs are trained by unsupervised and reinforcement learning, both low-cost label-free training methods. Simulation results show the training cost can significantly decrease while guaranteeing the training performance. This implies that an efficient decision can be made with imperfect DTs in multi-UAV networks.
    摘要 为了在DT构建成本、训练成本和DT偏差对训练的影响之间取得平衡,我们提出了自然无人机与虚拟生成无人机混合部署的方法。在这种方法中,我们使用两个级联神经网络(NN)来优化虚拟生成无人机的数量、DT构建成本和多无人机网络的性能。这两个神经网络分别通过无监督学习和强化学习进行训练,这两种训练方法都是低成本、无需标签的。仿真结果表明,该方法可以在保证训练性能的同时大幅降低训练成本。这说明,在多无人机网络中可以利用不完美的DT做出高效的决策。

FuXi-Extreme: Improving extreme rainfall and wind forecasts with diffusion model

  • paper_url: http://arxiv.org/abs/2310.19822
  • repo_url: None
  • paper_authors: Xiaohui Zhong, Lei Chen, Jun Liu, Chensen Lin, Yuan Qi, Hao Li
  • for: 这篇论文旨在提高机器学习天气预报模型对极端天气的预报精度。
  • methods: 论文使用去噪扩散概率模型(DDPM)恢复FuXi模型5天地表预报数据中的精细尺度细节,从而提高预报的准确性。
  • results: 论文表明,FuXi-Extreme模型在5天预报中对极端降水、10米风速和2米气温的预报优于FuXi和HRES;在热带气旋预报中,其路径预报优于HRES,但强度预报不及HRES。
    Abstract Significant advancements in the development of machine learning (ML) models for weather forecasting have produced remarkable results. State-of-the-art ML-based weather forecast models, such as FuXi, have demonstrated superior statistical forecast performance in comparison to the high-resolution forecasts (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF). However, ML models face a common challenge: as forecast lead times increase, they tend to generate increasingly smooth predictions, leading to an underestimation of the intensity of extreme weather events. To address this challenge, we developed the FuXi-Extreme model, which employs a denoising diffusion probabilistic model (DDPM) to restore finer-scale details in the surface forecast data generated by the FuXi model in 5-day forecasts. An evaluation of extreme total precipitation ($\textrm{TP}$), 10-meter wind speed ($\textrm{WS10}$), and 2-meter temperature ($\textrm{T2M}$) illustrates the superior performance of FuXi-Extreme over both FuXi and HRES. Moreover, when evaluating tropical cyclone (TC) forecasts based on International Best Track Archive for Climate Stewardship (IBTrACS) dataset, both FuXi and FuXi-Extreme shows superior performance in TC track forecasts compared to HRES, but they show inferior performance in TC intensity forecasts in comparison to HRES.
    摘要 机器学习(ML)天气预报模型的发展取得了显著成果。目前最先进的基于ML的天气预报模型(如FuXi)在统计预报性能上已超越欧洲中期天气预报中心(ECMWF)的高分辨率预报(HRES)。然而,ML模型面临一个共同挑战:随着预报时效增加,其预报趋于过度平滑,导致对极端天气事件强度的低估。为解决这一问题,我们开发了FuXi-Extreme模型,利用去噪扩散概率模型(DDPM)恢复FuXi模型5天预报中地表预报数据的精细尺度细节。对极端总降水(TP)、10米风速(WS10)和2米气温(T2M)的评估表明,FuXi-Extreme优于FuXi和HRES。此外,基于International Best Track Archive for Climate Stewardship(IBTrACS)数据集的热带气旋(TC)预报评估显示,FuXi和FuXi-Extreme在TC路径预报上优于HRES,但在TC强度预报上不及HRES。
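The role of the diffusion model in FuXi-Extreme can be pictured with the standard DDPM forward-noising equations: a fine-scale field is corrupted, and a denoiser conditioned on the smooth forecast would be trained to predict the injected noise. The NumPy snippet below only builds one such training pair; the beta schedule, grid size, and placeholder fields are illustrative assumptions, not the FuXi-Extreme implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)            # common linear beta schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)                # \bar{alpha}_t

# Placeholder fields on a small grid: a smooth forecast and a fine-scale target.
coarse_forecast = rng.normal(size=(32, 32))
fine_target = coarse_forecast + 0.3 * rng.normal(size=(32, 32))

# Sample a diffusion step and build the DDPM training pair:
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
t = rng.integers(0, T)
eps = rng.normal(size=fine_target.shape)
x_t = np.sqrt(alpha_bar[t]) * fine_target + np.sqrt(1.0 - alpha_bar[t]) * eps

# A conditional denoiser eps_theta(x_t, t, coarse_forecast) would be trained to
# predict `eps` from (x_t, t) while being conditioned on the smooth forecast.
print("step t =", t, "| x_t shape:", x_t.shape, "| target noise shape:", eps.shape)
```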

Crowd-Certain: Label Aggregation in Crowdsourced and Ensemble Learning Classification

  • paper_url: http://arxiv.org/abs/2310.16293
  • repo_url: None
  • paper_authors: Mohammad S. Majdi, Jeffrey J. Rodriguez
  • for: 这篇论文提出了一种用于众包和集成学习分类任务的标注聚合方法,以提高性能和计算效率。
  • methods: 该方法通过比较各标注者与已训练分类器之间的一致性,为每个标注者确定可靠性分数。此外,Crowd-Certain 还利用预测概率,使训练好的分类器可以在未来的样本数据上复用,从而消除了现有方法中固有的循环仿真过程。
  • results: 在十个不同的数据集上与十种现有方法进行的广泛评估表明,Crowd-Certain 在几乎所有场景下都取得了更高的平均准确率、F1 分数和 AUC 值。此外,本文还提出了两种现有置信度分数测量技术的变体,并使用预期校准误差(ECE)和 Brier 分数损失两个指标进行评估;结果显示,Crowd-Certain 在大多数评估数据集上取得了更高的 Brier 分数和更低的 ECE。
    Abstract Crowdsourcing systems have been used to accumulate massive amounts of labeled data for applications such as computer vision and natural language processing. However, because crowdsourced labeling is inherently dynamic and uncertain, developing a technique that can work in most situations is extremely challenging. In this paper, we introduce Crowd-Certain, a novel approach for label aggregation in crowdsourced and ensemble learning classification tasks that offers improved performance and computational efficiency for different numbers of annotators and a variety of datasets. The proposed method uses the consistency of the annotators versus a trained classifier to determine a reliability score for each annotator. Furthermore, Crowd-Certain leverages predicted probabilities, enabling the reuse of trained classifiers on future sample data, thereby eliminating the need for recurrent simulation processes inherent in existing methods. We extensively evaluated our approach against ten existing techniques across ten different datasets, each labeled by varying numbers of annotators. The findings demonstrate that Crowd-Certain outperforms the existing methods (Tao, Sheng, KOS, MACE, MajorityVote, MMSR, Wawa, Zero-Based Skill, GLAD, and Dawid Skene), in nearly all scenarios, delivering higher average accuracy, F1 scores, and AUC rates. Additionally, we introduce a variation of two existing confidence score measurement techniques. Finally we evaluate these two confidence score techniques using two evaluation metrics: Expected Calibration Error (ECE) and Brier Score Loss. Our results show that Crowd-Certain achieves higher Brier Score, and lower ECE across the majority of the examined datasets, suggesting better calibrated results.
    摘要 众包系统已被用于为计算机视觉和自然语言处理等应用收集海量标注数据。然而,由于众包标注本质上是动态且不确定的,开发一种能够适用于大多数情况的技术极具挑战性。本文提出 Crowd-Certain,一种用于众包和集成学习分类任务的新型标签聚合方法,可在不同标注者数量和多种数据集上提供更好的性能和计算效率。该方法利用标注者与已训练分类器之间的一致性来确定每个标注者的可靠性分数;此外,Crowd-Certain 还利用预测概率,使训练好的分类器能够在未来的样本数据上复用,从而消除了现有方法中固有的循环仿真过程。我们在十个数据集(每个数据集由不同数量的标注者标注)上,将该方法与十种现有技术(Tao、Sheng、KOS、MACE、MajorityVote、MMSR、Wawa、Zero-Based Skill、GLAD 和 Dawid Skene)进行了广泛比较。结果表明,Crowd-Certain 在几乎所有场景下都优于现有方法,取得了更高的平均准确率、F1 分数和 AUC 值。此外,我们还提出了两种现有置信度分数测量技术的变体,并使用预期校准误差(ECE)和 Brier 分数损失两个评估指标对其进行评估;结果显示,Crowd-Certain 在大多数检验数据集上获得了更高的 Brier 分数和更低的 ECE,表明其结果具有更好的校准性。
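The aggregation idea described above (score each annotator by agreement with a trained classifier, then combine votes with those weights) can be sketched as follows. The simulated annotators, the stand-in classifier probabilities, and the weight normalization are assumptions for illustration rather than the exact Crowd-Certain formulas.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_annotators = 200, 5

true_labels = rng.integers(0, 2, size=n_items)

# Simulated annotators with different (unknown) accuracies.
annotator_acc = np.array([0.9, 0.8, 0.7, 0.6, 0.55])
votes = np.where(rng.random((n_items, n_annotators)) < annotator_acc,
                 true_labels[:, None], 1 - true_labels[:, None])

# Stand-in for a trained classifier's probability of class 1 on each item.
clf_prob = np.clip(true_labels + rng.normal(0, 0.3, n_items), 0, 1)
clf_pred = (clf_prob >= 0.5).astype(int)

# Reliability score: agreement of each annotator with the classifier.
reliability = (votes == clf_pred[:, None]).mean(axis=0)
weights = reliability / reliability.sum()

# Weighted soft aggregation of the annotators' votes.
agg_prob = votes @ weights
agg_label = (agg_prob >= 0.5).astype(int)

print("estimated reliabilities:", np.round(reliability, 2))
print("aggregated accuracy:", (agg_label == true_labels).mean())
```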

Removing Dust from CMB Observations with Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.16285
  • repo_url: None
  • paper_authors: David Heurtel-Depeiges, Blakesley Burkhart, Ruben Ohana, Bruno Régaldo-Saint Blancard
  • for: 研究了基于扩散模型的尘埃前景建模及其在成分分离中的用处。
  • methods: 使用扩散模型对尘埃前景进行建模,并通过训练这些模型实现成分分离。
  • results: 通过扩散模型进行成分分离,可以很好地恢复尘埃前景的常用统计量(如功率谱和闵可夫斯基泛函);此外,以宇宙学参数为条件的模型可进一步提升成分分离性能。
    Abstract In cosmology, the quest for primordial $B$-modes in cosmic microwave background (CMB) observations has highlighted the critical need for a refined model of the Galactic dust foreground. We investigate diffusion-based modeling of the dust foreground and its interest for component separation. Under the assumption of a Gaussian CMB with known cosmology (or covariance matrix), we show that diffusion models can be trained on examples of dust emission maps such that their sampling process directly coincides with posterior sampling in the context of component separation. We illustrate this on simulated mixtures of dust emission and CMB. We show that common summary statistics (power spectrum, Minkowski functionals) of the components are well recovered by this process. We also introduce a model conditioned by the CMB cosmology that outperforms models trained using a single cosmology on component separation. Such a model will be used in future work for diffusion-based cosmological inference.
    摘要 在宇宙学中,对宇宙微波背景(CMB)观测中原初 $B$ 模式的寻找,凸显了对银河系尘埃前景建立更精细模型的迫切需求。我们研究基于扩散模型的尘埃前景建模及其在成分分离中的应用。在假设 CMB 为高斯且宇宙学(或协方差矩阵)已知的前提下,我们证明可以在尘埃辐射图样本上训练扩散模型,使其采样过程直接对应成分分离情形下的后验采样。我们在尘埃辐射与 CMB 的模拟混合图上进行了演示,结果表明该过程能够很好地恢复各成分的常用汇总统计量(功率谱、闵可夫斯基泛函)。我们还引入了以 CMB 宇宙学为条件的模型,其在成分分离上优于仅用单一宇宙学训练的模型;该模型将在后续工作中用于基于扩散的宇宙学推断。
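With a Gaussian CMB of known covariance, separating data = dust + CMB amounts to sampling p(dust | data) ∝ p_dust(dust) · N(data − dust; 0, C). The toy below replaces the learned diffusion prior on dust with an analytic Gaussian so both score terms are explicit, and runs unadjusted Langevin dynamics on a 1-D "map"; it only illustrates the posterior-score structure behind the method, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                   # pixels in a toy 1-D "map"

# Toy priors: dust ~ N(mu_d, s_d^2 I), CMB ~ N(0, s_c^2 I) (known "cosmology").
mu_d, s_d, s_c = 2.0, 0.7, 1.0
dust_true = mu_d + s_d * rng.normal(size=n)
cmb_true = s_c * rng.normal(size=n)
data = dust_true + cmb_true              # observed mixture

def prior_score(d):                      # grad_d log p_dust(d); a learned
    return -(d - mu_d) / s_d**2          # diffusion model would replace this

def likelihood_score(d):                 # grad_d log N(data - d; 0, s_c^2 I)
    return (data - d) / s_c**2

# Unadjusted Langevin sampling of the posterior over the dust component.
d = np.zeros(n)
step = 0.05
for _ in range(2000):
    score = prior_score(d) + likelihood_score(d)
    d = d + 0.5 * step * score + np.sqrt(step) * rng.normal(size=n)

print("RMS error of recovered dust:", np.sqrt(np.mean((d - dust_true) ** 2)))
```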

Improvement in Alzheimer’s Disease MRI Images Analysis by Convolutional Neural Networks Via Topological Optimization

  • paper_url: http://arxiv.org/abs/2310.16857
  • repo_url: None
  • paper_authors: Peiwen Tan
  • for: 提高阿尔茨海默病MRI图像分类的诊断精度
  • methods: 利用傅里叶拓扑优化对MRI图像进行增强处理
  • results: 经傅里叶拓扑优化处理后,再使用VGG16、ResNet50、InceptionV3和Xception等CNN模型进行分析,图像清晰度和分类性能均有明显提升
    Abstract This research underscores the efficacy of Fourier topological optimization in refining MRI imagery, thereby bolstering the classification precision of Alzheimer's Disease through convolutional neural networks. Recognizing that MRI scans are indispensable for neurological assessments, but frequently grapple with issues like blurriness and contrast irregularities, the deployment of Fourier topological optimization offered enhanced delineation of brain structures, ameliorated noise, and superior contrast. The applied techniques prioritized boundary enhancement, contrast and brightness adjustments, and overall image lucidity. Employing CNN architectures VGG16, ResNet50, InceptionV3, and Xception, the post-optimization analysis revealed a marked elevation in performance. Conclusively, the amalgamation of Fourier topological optimization with CNNs delineates a promising trajectory for the nuanced classification of Alzheimer's Disease, portending a transformative impact on its diagnostic paradigms.
    摘要 本研究强调了傅里叶拓扑优化在改进MRI图像方面的有效性,从而提升卷积神经网络对阿尔茨海默病的分类精度。MRI扫描对神经学评估不可或缺,但常常存在模糊和对比度不均等问题;傅里叶拓扑优化增强了脑结构的轮廓、减弱了噪声并改善了对比度。所采用的技术侧重于边界增强、对比度与亮度调整以及整体图像清晰度。使用VGG16、ResNet50、InceptionV3和Xception等CNN架构进行的优化后分析显示性能显著提升。综上,傅里叶拓扑优化与CNN的结合为阿尔茨海默病的精细分类展示了有前景的方向,有望对其诊断范式产生变革性影响。

Near-Optimal Pure Exploration in Matrix Games: A Generalization of Stochastic Bandits & Dueling Bandits

  • paper_url: http://arxiv.org/abs/2310.16252
  • repo_url: https://github.com/aistats2024-noisy-psne/midsearch
  • paper_authors: Arnab Maiti, Ross Boczar, Kevin Jamieson, Lillian J. Ratliff
  • for: 本研究旨在研究在两人零和矩阵博弈中识别纯策略纳什均衡(PSNE)的样本复杂度。
  • methods: 本文采用一种随机模型:学习者可以查询输入矩阵 $A\in[-1,1]^{n\times m}$ 中的任意元素 $(i,j)$,并观察到 $A_{i,j}+\eta$,其中 $\eta$ 是零均值的 1-sub-Gaussian 噪声。学习者的目标是在 PSNE 存在时,以高概率并用尽可能少的样本将其识别出来。
  • results: 本文给出了一个仅依赖于 PSNE 所在行与列中元素的实例相关样本复杂度下界,并提出了一种近似最优算法,其样本复杂度在对数因子以内与该下界相匹配。此外,该问题推广了随机多臂赌博机中的纯探索问题与对决赌博机(dueling bandits)问题,本文的结果在这两种设定下也在对数因子以内达到最优界。
    Abstract We study the sample complexity of identifying the pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally, we are given a stochastic model where any learner can sample an entry $(i,j)$ of the input matrix $A\in[-1,1]^{n\times m}$ and observe $A_{i,j}+\eta$ where $\eta$ is a zero-mean 1-sub-Gaussian noise. The aim of the learner is to identify the PSNE of $A$, whenever it exists, with high probability while taking as few samples as possible. Zhou et al. (2017) presents an instance-dependent sample complexity lower bound that depends only on the entries in the row and column in which the PSNE lies. We design a near-optimal algorithm whose sample complexity matches the lower bound, up to log factors. The problem of identifying the PSNE also generalizes the problem of pure exploration in stochastic multi-armed bandits and dueling bandits, and our result matches the optimal bounds, up to log factors, in both the settings.
    摘要 我们研究在带噪声的两人零和矩阵博弈中识别纯策略纳什均衡(PSNE)的样本复杂度。形式化地,我们给定一个随机模型:学习者可以查询输入矩阵 $A\in[-1,1]^{n\times m}$ 中的任意元素 $(i,j)$,并观察到 $A_{i,j}+\eta$,其中 $\eta$ 是零均值的 1-sub-Gaussian 噪声。学习者的目标是在 PSNE 存在时,以高概率、用尽可能少的样本将其识别出来。Zhou 等人(2017)给出了一个仅依赖于 PSNE 所在行与列中元素的实例相关样本复杂度下界。我们设计了一种近似最优算法,其样本复杂度在对数因子以内与该下界相匹配。识别 PSNE 的问题还推广了随机多臂赌博机中的纯探索问题与对决赌博机问题,我们的结果在这两种设定下也在对数因子以内达到最优界。
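For intuition, a naive baseline simply averages repeated noisy queries of every entry and then checks the saddle-point condition defining a PSNE (row player maximizes, column player minimizes). The sketch below implements that baseline on a hypothetical 3×3 game; the paper's algorithm is adaptive and far more sample-efficient, so this is only a reference point.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0.3, -0.2,  0.5],
              [0.6,  0.1,  0.4],      # entry (1, 1) is a saddle point, i.e. a PSNE
              [0.2, -0.5,  0.9]])
n, m = A.shape

def noisy_query(i, j):
    return A[i, j] + rng.normal(0.0, 1.0)   # zero-mean 1-sub-Gaussian noise

# Naive uniform sampling: estimate every entry with the same budget.
samples_per_entry = 2000
A_hat = np.zeros((n, m))
for i in range(n):
    for j in range(m):
        A_hat[i, j] = np.mean([noisy_query(i, j) for _ in range(samples_per_entry)])

def find_psne(M):
    for i in range(n):
        for j in range(m):
            # Row player (maximizer) cannot improve, column player (minimizer) cannot improve.
            if M[i, j] >= M[:, j].max() and M[i, j] <= M[i, :].min():
                return i, j
    return None

print("estimated PSNE:", find_psne(A_hat), "| true PSNE:", find_psne(A))
```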

eess.IV - 2023-10-25

Using Diffusion Models to Generate Synthetic Labelled Data for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2310.16794
  • repo_url: https://github.com/dsaragih/diffuse-gen
  • paper_authors: Daniel Saragih, Pascal Tyrrell
  • for: Synthetic labeled polyp images were generated to augment automatic medical image segmentation models.
  • methods: Diffusion models were used to generate and style synthetic labeled data.
  • results: The generated images more closely resembled real images, as shown by lower FID scores compared to previous GAN methods, and the segmentation model performed better when trained or augmented with synthetic data, as evidenced by the Dice loss and IoU scores.
    Abstract In this paper, we proposed and evaluated a pipeline for generating synthetic labeled polyp images with the aim of augmenting automatic medical image segmentation models. In doing so, we explored the use of diffusion models to generate and style synthetic labeled data. The HyperKvasir dataset consisting of 1000 images of polyps in the human GI tract obtained from 2008 to 2016 during clinical endoscopies was used for training and testing. Furthermore, we did a qualitative expert review, and computed the Fr\'echet Inception Distance (FID) and Multi-Scale Structural Similarity (MS-SSIM) between the output images and the source images to evaluate our samples. To evaluate its augmentation potential, a segmentation model was trained with the synthetic data to compare their performance with the real data and previous Generative Adversarial Networks (GAN) methods. These models were evaluated using the Dice loss (DL) and Intersection over Union (IoU) score. Our pipeline generated images that more closely resembled real images according to the FID scores (GAN: $118.37 \pm 1.06 \text{ vs SD: } 65.99 \pm 0.37$). Improvements over GAN methods were seen on average when the segmenter was entirely trained (DL difference: $-0.0880 \pm 0.0170$, IoU difference: $0.0993 \pm 0.01493$) or augmented (DL difference: GAN $-0.1140 \pm 0.0900 \text{ vs SD }-0.1053 \pm 0.0981$, IoU difference: GAN $0.01533 \pm 0.03831 \text{ vs SD }0.0255 \pm 0.0454$) with synthetic data. Overall, we obtained more realistic synthetic images and improved segmentation model performance when fully or partially trained on synthetic data.
    摘要 在本文中,我们提出并评估了一个用于生成合成标注息肉图像的流程,旨在扩充自动医学图像分割模型的训练数据。为此,我们探索了利用扩散模型来生成并风格化合成标注数据。训练和测试使用了HyperKvasir数据集,该数据集包含2008至2016年间临床内镜检查中获取的1000张人体消化道息肉图像。我们还进行了专家定性评审,并计算了输出图像与源图像之间的Fréchet Inception Distance(FID)和多尺度结构相似性(MS-SSIM)来评估样本质量。为评估其数据扩充潜力,我们用合成数据训练了分割模型,并与使用真实数据以及先前的生成对抗网络(GAN)方法进行比较;这些模型使用Dice损失(DL)和交并比(IoU)分数进行评估。根据FID分数,我们的流程生成的图像更接近真实图像(GAN:$118.37 \pm 1.06$,SD:$65.99 \pm 0.37$)。当分割器完全用合成数据训练(DL差异:$-0.0880 \pm 0.0170$,IoU差异:$0.0993 \pm 0.01493$)或用合成数据进行扩充(DL差异:GAN $-0.1140 \pm 0.0900$,SD $-0.1053 \pm 0.0981$;IoU差异:GAN $0.01533 \pm 0.03831$,SD $0.0255 \pm 0.0454$)时,平均而言均优于GAN方法。总体而言,我们获得了更真实的合成图像,并且在完全或部分使用合成数据训练时提升了分割模型的性能。
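The evaluation relies on the standard Dice and IoU overlap measures between predicted and ground-truth masks; a minimal NumPy version is shown below (the smoothing constant, toy masks, and 0.5 threshold are illustrative choices).

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient for binary masks (the Dice loss is 1 - dice_score)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou_score(pred, target, eps=1e-7):
    """Intersection over Union for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

rng = np.random.default_rng(0)
gt = np.zeros((64, 64), dtype=int)
gt[20:40, 20:40] = 1                       # toy polyp mask
prob = gt * 0.8 + rng.random((64, 64)) * 0.3
pred = (prob > 0.5).astype(int)

print("Dice:", round(dice_score(pred, gt), 3), "IoU:", round(iou_score(pred, gt), 3))
```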

Single-pixel imaging based on deep learning

  • paper_url: http://arxiv.org/abs/2310.16869
  • repo_url: https://github.com/molyswu/hand_detection
  • paper_authors: Kai Song, Yaoxing Bian, Ku Wu, Hongrui Liu, Shuangping Han, Jiaming Li, Jiazhao Tian, Chengbin Qin, Jianyong Hu, Liantuan Xiao
  • for: 本文旨在概述基于深度学习的单像素成像技术的研究进展。
  • methods: 本文从单像素成像和深度学习的基本原理入手,详细介绍了基于深度学习的单像素成像技术的原理和实现方法。
  • results: 文献回顾了单像素成像基于深度学习的研究进展,包括超解像单像素成像、通过散射媒体的单像素成像、光子级单像素成像、基于单像素成像的光学加密、色彩单像素成像和无影像感测等领域的研究。
    Abstract Since the advent of single-pixel imaging and machine learning, both fields have flourished, but followed parallel tracks. Until recently, machine learning, especially deep learning, has demonstrated effectiveness in delivering high-quality solutions across various application domains of single-pixel imaging. This article comprehensively reviews the research of single-pixel imaging technology based on deep learning. From the basic principles of single-pixel imaging and deep learning, the principles and implementation methods of single-pixel imaging based on deep learning are described in detail. Then, the research status and development trend of single-pixel imaging based on deep learning in various domains are analyzed, including super-resolution single-pixel imaging, single-pixel imaging through scattering media, photon-level single-pixel imaging, optical encryption based on single-pixel imaging, color single-pixel imaging, and image-free sensing. Finally, this review explores the limitations in the ongoing research, while offers the delivering insights into prospective avenues for future research.
    摘要 自单像素成像和机器学习问世以来,这两个领域都蓬勃发展,但长期沿着平行的轨道前进。直到最近,机器学习(尤其是深度学习)才在单像素成像的各个应用领域中展现出提供高质量解决方案的能力。本文对基于深度学习的单像素成像技术研究进行了全面综述:从单像素成像和深度学习的基本原理出发,详细介绍了基于深度学习的单像素成像的原理和实现方法;随后分析了其在各领域的研究现状与发展趋势,包括超分辨单像素成像、通过散射介质的单像素成像、光子级单像素成像、基于单像素成像的光学加密、彩色单像素成像以及无图像感知。最后,本文探讨了当前研究的局限性,并展望了未来研究的可能方向。
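At its core, single-pixel imaging collects inner products y_k = ⟨p_k, x⟩ between the scene x and projected patterns p_k and then inverts the linear system; the deep-learning methods surveyed above replace or post-process that inversion. The snippet sketches the measurement model with random ±1 patterns and a plain least-squares reconstruction, all of which are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 16
N = H * W                   # number of pixels
M = 200                     # number of single-pixel measurements (M < N: underdetermined)

# Toy scene and random +/-1 illumination patterns (one row per pattern).
scene = np.zeros((H, W)); scene[4:12, 6:10] = 1.0
x = scene.ravel()
patterns = rng.choice([-1.0, 1.0], size=(M, N))

# Each measurement is the total light collected by the single detector.
y = patterns @ x + 0.01 * rng.normal(size=M)

# Minimum-norm least-squares reconstruction; a deep network would replace this
# inversion (or post-process it) in the learning-based methods the review covers.
x_hat, *_ = np.linalg.lstsq(patterns, y, rcond=None)
recon = x_hat.reshape(H, W)

print("reconstruction MSE:", float(np.mean((recon - scene) ** 2)))
```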

TILT: topological interface recovery in limited-angle tomography

  • paper_url: http://arxiv.org/abs/2310.16557
  • repo_url: None
  • paper_authors: Elli Karvonen, Matti Lassas, Pekka Pankka, Samuli Siltanen
  • for: solves the severely ill-posed inverse problem of limited-angle tomography
  • methods: lifting the visible part of the wavefront set under a universal covering map, using dual-tree complex wavelets, a dedicated metric, and persistent homology
  • results: not only a suggested invisible boundary but also a computational representation for all interfaces in the target
    Abstract A novel reconstruction method is introduced for the severely ill-posed inverse problem of limited-angle tomography. It is well known that, depending on the available measurement, angles specify a subset of the wavefront set of the unknown target, while some oriented singularities remain invisible in the data. Topological Interface recovery for Limited-angle Tomography, or TILT, is based on lifting the visible part of the wavefront set under a universal covering map. In the space provided, it is possible to connect the appropriate pieces of the lifted wavefront set correctly using dual-tree complex wavelets, a dedicated metric, and persistent homology. The result is not only a suggested invisible boundary but also a computational representation for all interfaces in the target.
    摘要 本文为受限角度断层成像这一严重病态的逆问题提出了一种新的重建方法。众所周知,根据可用的测量角度,只能确定未知目标波前集的一个可见子集,而某些方向的奇异点在数据中不可见。TILT(受限角度断层成像的拓扑界面恢复)基于在万有覆盖映射下提升波前集的可见部分;在由此得到的空间中,可以利用双树复小波、专门设计的度量以及持续同调,将提升后的波前集的相应片段正确连接起来。其结果不仅给出了建议的不可见边界,还给出了目标中所有界面的计算表示。

eess.SP - 2023-10-25

Mode Selection and Target Classification in Cognitive Radar Networks

  • paper_url: http://arxiv.org/abs/2310.17006
  • repo_url: None
  • paper_authors: William W. Howard, Samuel R. Shebert, Benjamin H. Kirk, R. Michael Buehrer
  • For: The paper proposes a cognitive radar network that leverages the adaptability of cognitive radar networks to trade between active radar observation and passive signal parameter estimation, and learns the optimal action choices for each type of target.* Methods: The paper uses a multi-armed bandit model with current class information as prior information to select between available actions, and estimates the physical behavior of targets through radar emissions when the active radar action is selected, and estimates the radio behavior of targets through passive sensing when the passive action is selected.* Results: The network collects observed behavior of targets and forms clusters of similarly-behaved targets, and meta-learns the target class distributions while learning the optimal mode selections for each target class.Here are the three points in Simplified Chinese text:* For: 该 paper 提出了一种基于认知雷达网络的方法,以便在不同的目标类型下选择最佳的行动。* Methods: 该 paper 使用了一种多重武器模型,并使用当前类信息作为先验信息来选择可用的行动。当选择活动雷达时,节点通过雷达发射来估计目标的物理行为;当选择被动时,节点通过感知来估计目标的电磁行为。* Results: 网络通过收集目标的行为观察并组织类似目标的集群,从而meta-学习目标类型分布,同时学习每个目标类型的优化模式选择。
    Abstract Cognitive Radar Networks were proposed by Simon Haykin in 2006 to address problems with large legacy radar implementations - primarily, single-point vulnerabilities and lack of adaptability. This work proposes to leverage the adaptability of cognitive radar networks to trade between active radar observation, which uses high power and risks interception, and passive signal parameter estimation, which uses target emissions to gain side information and lower the power necessary to accurately track multiple targets. The goal of the network is to learn over many target tracks both the characteristics of the targets as well as the optimal action choices for each type of target. In order to select between the available actions, we utilize a multi-armed bandit model, using current class information as prior information. When the active radar action is selected, the node estimates the physical behavior of targets through the radar emissions. When the passive action is selected, the node estimates the radio behavior of targets through passive sensing. Over many target tracks, the network collects the observed behavior of targets and forms clusters of similarly-behaved targets. In this way, the network meta-learns the target class distributions while learning the optimal mode selections for each target class.
    摘要 认知雷达网络由Simon Haykin于2006年提出,旨在解决传统大型雷达系统中的单点脆弱性和缺乏适应性等问题。本工作提出利用认知雷达网络的适应性,在主动雷达观测(功率高且有被截获风险)与被动信号参数估计(利用目标自身辐射获取边信息,从而以更低功率准确跟踪多个目标)之间进行权衡。网络的目标是在大量目标航迹上同时学习目标的特性以及针对每类目标的最优动作选择。为在可用动作之间进行选择,我们采用多臂赌博机模型,并以当前的类别信息作为先验信息。当选择主动雷达动作时,节点通过雷达回波估计目标的物理行为;当选择被动动作时,节点通过被动感知估计目标的射频行为。在大量目标航迹上,网络收集观测到的目标行为,并将行为相似的目标聚类,从而在学习各目标类别最优模式选择的同时,元学习目标类别的分布。
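The mode-selection step can be viewed as a per-target-class bandit over {active radar, passive sensing}. A minimal UCB1 sketch is shown below; the Bernoulli reward model and the absence of the class-prior mechanism are simplifications, not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
actions = ["active_radar", "passive_sensing"]

# Unknown mean tracking reward of each action for one target class (toy numbers).
true_mean = np.array([0.55, 0.70])

counts = np.zeros(2)
means = np.zeros(2)

for t in range(1, 501):
    if t <= 2:                       # pull each arm once
        a = t - 1
    else:                            # UCB1 index
        ucb = means + np.sqrt(2.0 * np.log(t) / counts)
        a = int(np.argmax(ucb))
    reward = float(rng.random() < true_mean[a])           # Bernoulli reward
    counts[a] += 1
    means[a] += (reward - means[a]) / counts[a]            # incremental mean

print({actions[i]: int(counts[i]) for i in range(2)})
print("estimated means:", np.round(means, 2))
```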

Neural Distributed Compressor Discovers Binning

  • paper_url: http://arxiv.org/abs/2310.16961
  • repo_url: None
  • paper_authors: Ezgi Ozyilkan, Johannes Ballé, Elza Erkip
  • for: 该研究旨在解决分布式信源编码中的有损压缩问题,具体而言是Wyner-Ziv问题。
  • methods: 该研究采用基于机器学习的方法,特别是变分向量量化,来实现数据驱动的压缩方案。
  • results: 研究发现,这种数据驱动的压缩方案能够重现最优理论解的一些特性,例如在源空间中的分箱(binning)以及量化索引与边信息的最优组合;尽管并未直接利用源分布的知识,这些行为仍会自发出现。
    Abstract We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner-Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme, based on variational vector quantization, recovers some principles of the optimum theoretical solution of the Wyner-Ziv setup, such as binning in the source space as well as optimal combination of the quantization index and side information, for exemplary sources. These behaviors emerge although no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning.
    摘要 我们考虑在解码器可以无损访问一个相关信息源的情况下,对另一信息源进行有损压缩。这一设定即Wyner-Ziv问题,是分布式信源编码的一个特例。时至今日,针对Wyner-Ziv问题的实用方法仍未被充分发展或深入研究。我们提出一种基于机器学习的数据驱动方法,利用人工神经网络的通用函数逼近能力。我们发现,基于变分向量量化的神经网络压缩方案能够重现Wyner-Ziv设定下最优理论解的一些特性,例如在源空间中的分箱(binning)以及量化索引与边信息的最优组合。尽管并未利用源分布的结构化知识,这些行为仍会自发出现。分箱是信息论证明和方法中广泛使用的工具;据我们所知,这是首次明确观察到它从数据驱动学习中自发涌现。

How to Extend 3D GBSM to Integrated Sensing and Communication Channel with Sharing Feature?

  • paper_url: http://arxiv.org/abs/2310.16765
  • repo_url: None
  • paper_authors: Yameng Liu, Jianhua Zhang, Yuxiang Zhang, Huiwen Gong, Tao Jiang, Guangyi Liu
  • for: This paper is written to support the development of Integrated Sensing and Communication (ISAC) technology in 6G systems, specifically by extending the existing 3D Geometry-Based Stochastic Model (GBSM) to include sensing channels.
  • methods: The paper proposes a new ISAC channel model that captures the sharing feature of both communication and sensing channels, including shared scatterers, clusters, paths, and similar propagation parameters. The model is based on a cascade of TX-target, radar cross section, and target-RX, with a novel parameter S for shared target extraction.
  • results: The proposed ISAC channel implementation framework offers flexible configuration of sharing feature and the joint generation of communication and sensing channels, and is compatible with the 3GPP standards, offering promising support for ISAC technology evaluation.
    Abstract Integrated Sensing and Communication (ISAC) is a promising technology in 6G systems. The existing 3D Geometry-Based Stochastic Model (GBSM), as standardized for 5G systems, addresses solely communication channels and lacks consideration of the integration with sensing channel. Therefore, this letter extends 3D GBSM to support ISAC research, with a particular focus on capturing the sharing feature of both channels, including shared scatterers, clusters, paths, and similar propagation param-eters, which have been experimentally verified in the literature. The proposed approach can be summarized as follows: Firstly, an ISAC channel model is proposed, where shared and non-shared components are superimposed for both communication and sensing. Secondly, sensing channel is characterized as a cascade of TX-target, radar cross section, and target-RX, with the introduction of a novel parameter S for shared target extraction. Finally, an ISAC channel implementation framework is proposed, allowing flexible configuration of sharing feature and the joint generation of communication and sensing channels. The proposed ISAC channel model can be compatible with the 3GPP standards and offers promising support for ISAC technology evaluation.
    摘要 集成感知与通信(ISAC)是6G系统中极具前景的技术。现有的、为5G系统标准化的3D几何随机信道模型(GBSM)仅针对通信信道,未考虑与感知信道的集成。因此,本文对3D GBSM进行扩展以支持ISAC研究,重点刻画两种信道之间的共享特征,包括共享散射体、簇、路径以及相近的传播参数,这些特征已在文献中得到实验验证。所提方法可概括为:首先,提出一种ISAC信道模型,其中通信和感知信道均由共享分量与非共享分量叠加而成;其次,将感知信道刻画为发射机-目标、雷达散射截面、目标-接收机的级联,并引入新参数S用于共享目标的提取;最后,提出ISAC信道实现框架,支持共享特征的灵活配置以及通信与感知信道的联合生成。所提ISAC信道模型与3GPP标准兼容,可为ISAC技术评估提供有力支撑。
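The sharing feature can be illustrated by generating the communication and sensing channels as sums of multipath components, where a fraction S of the paths is drawn once and reused by both channels while the rest is channel-specific. All distributions and parameters below are illustrative assumptions, not the standardized GBSM parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def paths(n):
    """Random multipath components: complex gain and delay (illustrative)."""
    gain = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2 * n)
    delay = rng.uniform(0, 100e-9, size=n)
    return gain, delay

def freq_response(gain, delay, f):
    """Channel frequency response at baseband frequencies f from a set of paths."""
    return (gain[:, None] * np.exp(-2j * np.pi * np.outer(delay, f))).sum(axis=0)

n_paths, S = 20, 0.6                      # S: fraction of shared paths
n_shared = int(S * n_paths)

g_sh, d_sh = paths(n_shared)              # shared scatterers/clusters
g_c, d_c = paths(n_paths - n_shared)      # communication-only paths
g_s, d_s = paths(n_paths - n_shared)      # sensing-only paths

f = np.linspace(-50e6, 50e6, 256)         # 100 MHz of bandwidth around the carrier
H_comm = freq_response(np.r_[g_sh, g_c], np.r_[d_sh, d_c], f)
H_sens = freq_response(np.r_[g_sh, g_s], np.r_[d_sh, d_s], f)

corr = np.abs(np.vdot(H_comm, H_sens)) / (np.linalg.norm(H_comm) * np.linalg.norm(H_sens))
print("correlation between communication and sensing channels:", round(float(corr), 3))
```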

Spherical Wavefront Near-Field DoA Estimation in THz Automotive Radar

  • paper_url: http://arxiv.org/abs/2310.16724
  • repo_url: None
  • paper_authors: Ahmet M. Elbir, Kumar Vijay Mishra, Symeon Chatzinotas
  • for: 这项研究旨在探讨THz频段汽车雷达的波达方向(DoA)估计方法。
  • methods: 该研究提出了一种基于多重信号分类(MUSIC)的算法来估计目标的DoA和距离,同时考虑了近场中的波束偏斜(beam-squint)效应。
  • results: 数值实验表明,该方法可以准确地估计目标参数。
    Abstract Automotive radar at terahertz (THz) band has the potential to provide compact design. The availability of wide bandwidth at THz-band leads to high range resolution. Further, very narrow beamwidth arising from large arrays yields high angular resolution up to milli-degree level direction-of-arrival (DoA) estimation. At THz frequencies and extremely large arrays, the signal wavefront is spherical in the near-field that renders traditional far-field DoA estimation techniques unusable. In this work, we examine near-field DoA estimation for THz automotive radar. We propose an algorithm using multiple signal classification (MUSIC) to estimate target DoAs and ranges while also taking beam-squint in near-field into account. Using an array transformation approach, we compensate for near-field beam-squint in noise subspace computations to construct the beam-squint-free MUSIC spectra. Numerical experiments show the effectiveness of the proposed method to accurately estimate the target parameters.
    摘要 太赫兹(THz)频段的车载雷达有望实现紧凑的设计。THz频段的宽带宽带来高距离分辨率;此外,大规模阵列形成的极窄波束使得波达方向(DoA)估计可达毫度级的角分辨率。在THz频率和超大规模阵列下,近场中的信号波前呈球面状,使传统的远场DoA估计技术不再适用。本文研究THz车载雷达的近场DoA估计,提出一种基于多重信号分类(MUSIC)的算法来估计目标的DoA和距离,并同时考虑近场中的波束偏斜。通过阵列变换方法,我们在噪声子空间计算中补偿近场波束偏斜,从而构造无波束偏斜的MUSIC谱。数值实验表明,所提方法能够准确估计目标参数。
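A compact version of near-field MUSIC with a spherical-wavefront steering vector is sketched below for a uniform linear array and a single target; the array geometry, scan grids, and the absence of beam-squint compensation are simplifications relative to the proposed method.

```python
import numpy as np

rng = np.random.default_rng(0)
c, fc = 3e8, 0.3e12                      # 0.3 THz carrier
lam = c / fc
N = 128                                  # ULA elements
pos = (np.arange(N) - (N - 1) / 2) * lam / 2   # element positions along the array

def steer(theta, r):
    """Spherical-wavefront (near-field) steering vector for DoA theta and range r."""
    rm = np.sqrt(r**2 + pos**2 - 2 * r * pos * np.sin(theta))
    return np.exp(-2j * np.pi * (rm - r) / lam) / np.sqrt(N)

theta0, r0 = np.deg2rad(12.0), 2.0       # true DoA and range (within the near field)
T = 200                                  # snapshots
s = rng.normal(size=T) + 1j * rng.normal(size=T)
noise = 0.05 * (rng.normal(size=(N, T)) + 1j * rng.normal(size=(N, T)))
X = np.outer(steer(theta0, r0), s) + noise

R = X @ X.conj().T / T
w, V = np.linalg.eigh(R)                 # eigenvalues in ascending order
En = V[:, :-1]                           # noise subspace (single target assumed)
Pn = En @ En.conj().T

thetas = np.deg2rad(np.linspace(-30, 30, 121))
ranges = np.linspace(1.0, 4.0, 31)
P = np.array([[1.0 / np.abs(steer(th, r).conj() @ Pn @ steer(th, r))
               for r in ranges] for th in thetas])

i, j = np.unravel_index(np.argmax(P), P.shape)
print("estimated DoA:", round(float(np.rad2deg(thetas[i])), 1), "deg | range:",
      round(float(ranges[j]), 2), "m")
```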

Power Optimization in Satellite Communication Using Multi-Intelligent Reflecting Surfaces

  • paper_url: http://arxiv.org/abs/2310.16625
  • repo_url: None
  • paper_authors: Muhammad Ihsan Khalil
  • for: 这个研究旨在提高卫星到地面通信系统的能源效率,通过集成多个反射智能表面(RIS)的两个创新方法。
  • methods: 这两种方法对问题进行了分解:第一个任务通过精确调整每个RIS元素的相位来最大化接收功率,随后利用选择分集确定提供最大功率的RIS元素;第二个任务将功耗最小化表述为二进制线性规划问题,并使用二进制粒子群优化(BPSO)技术求解。
  • results: 研究发现这两种方法可以提高卫星到地面通信系统的能源效率,并且在实际运行条件下可以实现可观的能源节省。
    Abstract This study introduces two innovative methodologies aimed at augmenting energy efficiency in satellite-to-ground communication systems through the integration of multiple Reflective Intelligent Surfaces (RISs). The primary objective of these methodologies is to optimize overall energy efficiency under two distinct scenarios. In the first scenario, denoted as Ideal Environment (IE), we enhance energy efficiency by decomposing the problem into two sub-optimal tasks. The initial task concentrates on maximizing power reception by precisely adjusting the phase shift of each RIS element, followed by the implementation of Selective Diversity to identify the RIS element delivering maximal power. The second task entails minimizing power consumption, formulated as a binary linear programming problem, and addressed using the Binary Particle Swarm Optimization (BPSO) technique. The IE scenario presupposes an environment where signals propagate without any path loss, serving as a foundational benchmark for theoretical evaluations that elucidate the systems optimal capabilities. Conversely, the second scenario, termed Non-Ideal Environment (NIE), is designed for situations where signal transmission is subject to path loss. Within this framework, the Adam algorithm is utilized to optimize energy efficiency. This non ideal setting provides a pragmatic assessment of the systems capabilities under conventional operational conditions. Both scenarios emphasize the potential energy savings achievable by the satellite RIS system. Empirical simulations further corroborate the robustness and effectiveness of our approach, highlighting its potential to enhance energy efficiency in satellite-to-ground communication systems.
    摘要 本研究提出两种创新方法,通过集成多个反射智能表面(RIS)来提高卫星到地面通信系统的能源效率。第一种方法针对理想环境(IE,假设信号传播无路径损耗),将问题分解为两个子任务:先通过精确调整每个RIS元素的相位最大化接收功率,并利用选择分集确定提供最大功率的RIS元素;再将功耗最小化问题表述为二进制线性规划问题,用二进制粒子群优化(BPSO)求解。第二种方法针对非理想环境(NIE,信号传输存在路径损耗),利用Adam算法优化能源效率。两种场景均突出了卫星RIS系统可实现的节能潜力,仿真结果进一步验证了所提方法的稳健性和有效性。
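The power-minimization subproblem in the ideal-environment scenario is posed as a binary program and solved with binary particle swarm optimization. Below is a generic BPSO skeleton on a toy objective (activate as few elements as possible while meeting a gain requirement); the objective, penalty, and hyper-parameters are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_elems, n_particles, n_iter = 40, 30, 200
gains = rng.uniform(0.2, 1.0, n_elems)      # per-element contribution (toy values)
gain_req = 12.0                             # required combined gain

def cost(x):
    """Power cost = number of active elements, heavily penalized if gain is unmet."""
    return x.sum() + 1e3 * max(0.0, gain_req - gains @ x)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

X = (rng.random((n_particles, n_elems)) < 0.5).astype(float)   # binary positions
V = rng.normal(0, 1, (n_particles, n_elems))                   # velocities
pbest, pbest_cost = X.copy(), np.array([cost(x) for x in X])
gbest = pbest[np.argmin(pbest_cost)].copy()

w, c1, c2 = 0.7, 1.5, 1.5
for _ in range(n_iter):
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
    X = (rng.random(X.shape) < sigmoid(V)).astype(float)        # stochastic binary update
    costs = np.array([cost(x) for x in X])
    better = costs < pbest_cost
    pbest[better], pbest_cost[better] = X[better], costs[better]
    gbest = pbest[np.argmin(pbest_cost)].copy()

print("active elements:", int(gbest.sum()), "| achieved gain:", round(float(gains @ gbest), 2))
```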

  • paper_url: http://arxiv.org/abs/2310.16601
  • repo_url: None
  • paper_authors: Amila Ravinath, Bikshapathi Gouda, Italo Atzeni, Antti Tölli
  • for: 提高大量多输入多输出系统中用户设备(UE)的传输功率控制精度。
  • methods: 针对采用1比特模数转换器(ADC)的系统,设计了多幅度导频序列,并提出了闭环上行功率控制方法,包括单次(single-shot)控制和差分功率控制(DPC)两种方案。
  • results: 与传统的闭环上行功率控制方法相比,所提方法具有更好的性能。
    Abstract We propose uplink power control (PC) methods for massive multiple-input multiple-output systems with 1-bit analog-to-digital converters, which are specifically tailored to address the non-monotonic data detection performance with respect to the transmit power of the user equipment (UE). Considering a single UE, we design a multi-amplitude pilot sequence to capture the aforementioned non-monotonicity, which is utilized at the base station to derive UE transmit power adjustments via single-shot or differential power control (DPC) techniques. Both methods enable closed-loop uplink PC using different feedback approaches. The single-shot method employs one-time multi-bit feedback, while the DPC method relies on continuous adjustments with 1-bit feedback. Numerical results demonstrate the superiority of the proposed schemes over conventional closed-loop uplink PC techniques.
    摘要 我们针对采用1比特模数转换器的大规模多输入多输出(MIMO)系统提出上行功率控制(PC)方法,专门用于应对数据检测性能随用户设备(UE)发射功率呈非单调变化的问题。针对单个UE,我们设计了多幅度导频序列来捕捉上述非单调性,基站利用该序列通过单次(single-shot)或差分功率控制(DPC)技术推导UE的发射功率调整。两种方法都可以通过不同的反馈方式实现闭环上行功率控制:单次方法采用一次性多比特反馈,而DPC方法依赖1比特反馈进行连续调整。数值结果表明,所提方案优于传统的闭环上行功率控制技术。

Terahertz-Enpowered Communications and Sensing in 6G Systems: Opportunities and Challenges

  • paper_url: http://arxiv.org/abs/2310.16548
  • repo_url: None
  • paper_authors: Wei Jiang, Hans D. Schotten
  • for: 这篇论文旨在探讨6G系统中使用THz频段的可能性和挑战。
  • methods: 论文提出了一些可能的THz频段应用和挑战,包括通信和感知、定位和成像等方面。
  • results: 论文未提出具体的结果,主要是为了探讨6G系统中THz频段的可能性和挑战。
    Abstract The current focus of academia and the telecommunications industry has been shifted to the development of the six-generation (6G) cellular technology, also formally referred to as IMT-2030. Unprecedented applications that 6G aims to accommodate demand extreme communications performance and, in addition, disruptive capabilities such as network sensing. Recently, there has been a surge of interest in terahertz (THz) frequencies as it offers not only massive spectral resources for communication but also distinct advantages in sensing, positioning, and imaging. The aim of this paper is to provide a brief outlook on opportunities opened by this under-exploited band and challenges that must be addressed to materialize the potential of THz-based communications and sensing in 6G systems.
    摘要 目前学术界和电信产业的焦点已转向第六代(6G)蜂窝技术(正式名称为IMT-2030)的研发。6G旨在支持的前所未有的应用不仅要求极高的通信性能,还需要网络感知等颠覆性能力。最近,人们对太赫兹(THz)频段表现出浓厚兴趣,因为它不仅能为通信提供海量的频谱资源,而且在感知、定位和成像方面具有独特优势。本文旨在简要展望这一尚未充分开发的频段所带来的机遇,以及要实现THz通信与感知在6G系统中的潜力所必须解决的挑战。

Transmitting Data Through Reconfigurable Intelligent Surface: A Spatial Sigma-Delta Modulation Approach

  • paper_url: http://arxiv.org/abs/2310.16347
  • repo_url: None
  • paper_authors: Wai-Yiu Keung, Hei Victor Cheng, Wing-Kin Ma
  • for: 这个论文旨在解决未来能效通信系统中数据传输中使用可重新配置智能表面(RIS)的问题。
  • methods: 该论文使用了一种名为Sigma-Delta($\Sigma\Delta$)调制的方法,在空间域中应用一阶Sigma-Delta调制来处理相位量化。
  • results: 该论文的实验结果显示,使用Sigma-Delta调制可以实现与未量化ZF方案相当的误比特率性能。
    Abstract Transmitting data using the phases on reconfigurable intelligent surfaces (RIS) is a promising solution for future energy-efficient communication systems. Recent work showed that a virtual phased massive multiuser multiple-input-multiple-out (MIMO) transmitter can be formed using only one active antenna and a large passive RIS. In this paper, we are interested in using such a system to perform MIMO downlink precoding. In this context, we may not be able to apply conventional MIMO precoding schemes, such as the simple zero-forcing (ZF) scheme, and we typically need to design the phase signals by solving optimization problems with constant modulus constraints or with discrete phase constraints, which pose challenges with high computational complexities. In this work, we propose an alternative approach based on Sigma-Delta ($\Sigma\Delta$) modulation, which is classically famous for its noise-shaping ability. Specifically, first-order $\Sigma\Delta$ modulation is applied in the spatial domain to handle phase quantization in generating constant envelope signals. Under some mild assumptions, the proposed phased $\Sigma\Delta$ modulator allows us to use the ZF scheme to synthesize the RIS reflection phases with negligible complexity. The proposed approach is empirically shown to achieve comparable bit error rate performance to the unquantized ZF scheme.
    摘要 利用可重构智能表面(RIS)上的相位来传输数据,是未来高能效通信系统的一种有前景的方案。最近的研究表明,仅用一个有源天线和一个大型无源RIS即可构成虚拟相控的大规模多用户MIMO发射机。本文研究利用这种系统进行MIMO下行预编码。在这种情形下,通常无法直接采用传统的MIMO预编码方案(如简单的迫零(ZF)方案),而需要在恒模约束或离散相位约束下通过求解优化问题来设计相位信号,这会带来很高的计算复杂度。为此,我们提出一种基于Sigma-Delta($\Sigma\Delta$)调制的替代方法,该技术以其噪声整形能力而著称。具体而言,我们在空间域应用一阶$\Sigma\Delta$调制来处理生成恒包络信号时的相位量化。在一些温和的假设下,所提出的相位$\Sigma\Delta$调制器使我们能够以可忽略的复杂度用ZF方案合成RIS反射相位。实验表明,所提方法可获得与未量化ZF方案相当的误比特率性能。
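The spatial Sigma-Delta idea can be shown on a line of elements: desired continuous phases are quantized to a 1-bit alphabet while the quantization error at each element is fed to the next one, pushing the error toward high spatial frequencies. The first-order sketch below, including the {0, π} alphabet and the error-feedback form, is a simplified illustration rather than the paper's modulator.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
# Smooth desired phase profile across the array (e.g., from an unquantized ZF design).
desired = np.cumsum(rng.uniform(-0.3, 0.3, N))

step_q = np.pi                                  # 1-bit alphabet {0, pi} (modulo 2*pi)

# First-order spatial Sigma-Delta: quantization error is fed to the next element.
q_sd = np.zeros(N)
err = 0.0
for n in range(N):
    u = desired[n] + err
    q_sd[n] = step_q * np.round(u / step_q)     # nearest multiple of pi
    err = u - q_sd[n]                           # bounded by pi/2

q_dir = step_q * np.round(desired / step_q)     # memoryless rounding, for comparison

def low_spatial_freq_power(phase_err, n_bins=8):
    E = np.fft.fft(phase_err) / N
    return float(np.sum(np.abs(E[:n_bins]) ** 2))

print("low-spatial-frequency error power")
print("  sigma-delta:", round(low_spatial_freq_power(q_sd - desired), 6))
print("  direct     :", round(low_spatial_freq_power(q_dir - desired), 6))
```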

cs.SD - 2023-10-24

IA Para el Mantenimiento Predictivo en Canteras: Modelado

  • paper_url: http://arxiv.org/abs/2310.16140
  • repo_url: None
  • paper_authors: Fernando Marcos, Rodrigo Tamaki, Mateo Cámara, Virginia Yagüe, José Luis Blanco
  • for: 该论文目的是优化采矿业中的操作。
  • methods: 该论文使用无监督学习方法,训练一个变分自动编码器模型,使其能够从加工线运行期间录制的声音中提取有用信息。
  • results: 研究结果表明,该模型能够在潜在空间中重建并表示所录制的声音,并能够捕捉不同运行条件和不同设备之间的差异。未来,这有望促进声音的分类以及机械运行中退化模式的检测。
    Abstract Dependence on raw materials, especially in the mining sector, is a key part of today's economy. Aggregates are vital, being the second most used raw material after water. Digitally transforming this sector is key to optimizing operations. However, supervision and maintenance (predictive and corrective) are challenges little explored in this sector, due to the particularities of the sector, machinery and environmental conditions. All this, despite the successes achieved in other scenarios in monitoring with acoustic and contact sensors. We present an unsupervised learning scheme that trains a variational autoencoder model on a set of sound records. This is the first such dataset collected during processing plant operations, containing information from different points of the processing line. Our results demonstrate the model's ability to reconstruct and represent in latent space the recorded sounds, the differences in operating conditions and between different equipment. In the future, this should facilitate the classification of sounds, as well as the detection of anomalies and degradation patterns in the operation of the machinery.
    摘要 对原材料(尤其是采矿行业)的依赖是当今经济的重要组成部分。骨料至关重要,是仅次于水的第二大使用量原材料。对该行业进行数字化转型是优化运营的关键。然而,由于该行业、其机械设备和环境条件的特殊性,监控与维护(预测性和纠正性)在该领域仍少有探索,尽管基于声学和接触式传感器的监测在其他场景中已取得成功。我们提出一种无监督学习方案,在一组声音记录上训练变分自动编码器模型。这是首个在加工厂运行期间采集的此类数据集,包含来自加工线不同位置的信息。我们的结果表明,该模型能够在潜在空间中重建并表示所记录的声音,以及不同运行条件和不同设备之间的差异。未来,这将有助于声音的分类以及机械运行中异常和退化模式的检测。
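The unsupervised scheme amounts to training a variational autoencoder on sound features and then using the latent space and reconstruction quality to compare recordings. A compact PyTorch skeleton of the VAE objective and a reconstruction-error score is sketched below on random stand-in features; the architecture, dimensions, and scoring rule are illustrative assumptions, not the study's model.

```python
import torch
from torch import nn

class SoundVAE(nn.Module):
    def __init__(self, in_dim=128, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    rec = ((recon - x) ** 2).sum(dim=1).mean()                    # reconstruction term
    kld = (-0.5 * (1 + logvar - mu**2 - logvar.exp()).sum(dim=1)).mean()
    return rec + kld

torch.manual_seed(0)
model = SoundVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in for per-frame spectral features extracted from processing-line recordings.
features = torch.randn(512, 128)

for _ in range(50):
    recon, mu, logvar = model(features)
    loss = vae_loss(features, recon, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()

# Per-clip score: reconstruction error of new recordings (larger = more unusual).
new_clip = torch.randn(8, 128)
with torch.no_grad():
    recon, _, _ = model(new_clip)
    score = ((recon - new_clip) ** 2).mean(dim=1)
print("reconstruction-error scores:", score.numpy().round(3))
```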

CDSD: Chinese Dysarthria Speech Database

  • paper_url: http://arxiv.org/abs/2310.15930
  • repo_url: None
  • paper_authors: Mengyi Sun, Ming Gao, Xinchen Kang, Shiru Wang, Jun Du, Dengfeng Yao, Su-Jing Wang
  • For: The paper is written for researchers and professionals working in the field of dysarthria, specifically those interested in speech recognition and dysarthric speech.
  • Methods: The paper describes the data collection and annotation processes for the Chinese Dysarthria Speech Database (CDSD), as well as an approach for establishing a baseline for dysarthric speech recognition. The authors also conducted a speaker-dependent dysarthric speech recognition experiment using additional data from one participant.
  • Results: The paper reports that extensive data-driven model training and fine-tuning on limited quantities of specific individual data can yield commendable results in speaker-dependent dysarthric speech recognition, while observing significant variations in recognition results among different dysarthric speakers.
    Abstract We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. This database comprises speech data from 24 participants with dysarthria. Among these participants, one recorded an additional 10 hours of speech data, while each recorded one hour, resulting in 34 hours of speech material. To accommodate participants with varying cognitive levels, our text pool primarily consists of content from the AISHELL-1 dataset and speeches by primary and secondary school students. When participants read these texts, they must use a mobile device or the ZOOM F8n multi-track field recorder to record their speeches. In this paper, we elucidate the data collection and annotation processes and present an approach for establishing a baseline for dysarthric speech recognition. Furthermore, we conducted a speaker-dependent dysarthric speech recognition experiment using an additional 10 hours of speech data from one of our participants. Our research findings indicate that, through extensive data-driven model training, fine-tuning limited quantities of specific individual data yields commendable results in speaker-dependent dysarthric speech recognition. However, we observe significant variations in recognition results among different dysarthric speakers. These insights provide valuable reference points for speaker-dependent dysarthric speech recognition.
    摘要 我们介绍中文构音障碍语音数据库(CDSD),作为构音障碍研究的宝贵资源。该数据库包含24名构音障碍参与者的语音数据,其中一名参与者额外录制了10小时语音,其余每人录制1小时,共计34小时语音材料。为适应参与者不同的认知水平,文本池主要来自AISHELL-1数据集以及中小学生的演讲材料。参与者朗读这些文本时,需使用移动设备或ZOOM F8n多轨录音机进行录音。本文详细介绍了数据的采集和标注过程,并提出了建立构音障碍语音识别基线的方法。此外,我们还利用其中一名参与者额外的10小时语音数据进行了说话人相关的构音障碍语音识别实验。研究结果表明,通过大规模数据驱动的模型训练,并在少量特定个体数据上进行微调,可以在说话人相关的构音障碍语音识别中取得良好效果;但不同构音障碍说话者之间的识别结果差异显著。这些发现为说话人相关的构音障碍语音识别提供了有价值的参考。

FOLEY-VAE: Generación de efectos de audio para cine con inteligencia artificial

  • paper_url: http://arxiv.org/abs/2310.15663
  • repo_url: None
  • paper_authors: Mateo Cámara, José Luis Blanco
  • for: 这项研究旨在开发一种基于变分自动编码器的界面,用于创新性地创作FOLEY音效。
  • methods: 该模型使用各种自然声音进行训练,可以将新的声音特征实时迁移到预先录制的音频或麦克风采集的语音上;此外,它还允许用户实时交互式地修改潜变量,以实现精确且个性化的艺术调整。
  • results: 研究以作者去年在同一会议上发表的工作为起点,分析了现有的RAVE模型(一种专门针对音频效果生成训练的变分自动编码器)。该模型已成功生成多种音效,包括电磁、科幻、水声等。这种创新方法为西班牙第一部借助人工智能生成音效的短片电影提供了支撑,显示了人工智能在电影制作中的变革潜力。
    Abstract In this research, we present an interface based on Variational Autoencoders trained with a wide range of natural sounds for the innovative creation of Foley effects. The model can transfer new sound features to prerecorded audio or microphone-captured speech in real time. In addition, it allows interactive modification of latent variables, facilitating precise and customized artistic adjustments. Taking as a starting point our previous study on Variational Autoencoders presented at this same congress last year, we analyzed an existing implementation: RAVE [1]. This model has been specifically trained for audio effects production. Various audio effects have been successfully generated, ranging from electromagnetic, science fiction, and water sounds, among others published with this work. This innovative approach has been the basis for the artistic creation of the first Spanish short film with sound effects assisted by artificial intelligence. This milestone illustrates palpably the transformative potential of this technology in the film industry, opening the door to new possibilities for sound creation and the improvement of artistic quality in film productions.
    摘要 在这项研究中,我们提出了一个基于变分自动编码器的界面,该编码器使用多种自然声音训练,用于创新性地创作FOLEY音效。该模型可以实时将新的声音特征迁移到预先录制的音频或麦克风采集的语音上,并允许交互式地修改潜变量,以实现精确且定制化的艺术调整。以我们去年在同一会议上发表的关于变分自动编码器的研究为起点,我们分析了现有的实现:RAVE [1]。该模型专门针对音频效果生成进行训练,已成功生成多种音效,包括电磁、科幻和水声等。这一创新方法成为首部借助人工智能辅助音效创作的西班牙短片电影的艺术创作基础,切实展示了该技术在电影行业中的变革潜力,为声音创作和电影作品艺术质量的提升打开了新的可能。

eess.AS - 2023-10-24

Pre-training Music Classification Models via Music Source Separation

  • paper_url: http://arxiv.org/abs/2310.15845
  • repo_url: https://github.com/cgaroufis/msspt
  • paper_authors: Christos Garoufis, Athanasia Zlatintsi, Petros Maragos
  • for: 本文研究音乐源分离能否作为面向音乐分类任务的音乐表示学习预训练策略。
  • methods: 作者首先采用了 U-Net 网络,在不同的音乐源分离目标下进行了预训练,例如从音乐作品中隔离声乐或乐器源; 然后,他们附加了一个 convolutional tail network 到预训练后的 U-Net 上,并将整个网络进行了共同训练。 skip connections 也使得 separation 网络中学习的特征传递给了 tail network。
  • results: 实验结果表明,在两个公共可用的数据集上,采用预训练 U-Net 与 music source separation 目标可以提高 music classification 性能,特别是在使用 vocal separation 时的 music auto-tagging 任务中,以及在 multi-source separation 情况下的 music genre classification 任务中。
    Abstract In this paper, we study whether music source separation can be used as a pre-training strategy for music representation learning, targeted at music classification tasks. To this end, we first pre-train U-Net networks under various music source separation objectives, such as the isolation of vocal or instrumental sources from a musical piece; afterwards, we attach a convolutional tail network to the pre-trained U-Net and jointly finetune the whole network. The features learned by the separation network are also propagated to the tail network through skip connections. Experimental results in two widely used and publicly available datasets indicate that pre-training the U-Nets with a music source separation objective can improve performance compared to both training the whole network from scratch and using the tail network as a standalone in two music classification tasks: music auto-tagging, when vocal separation is used, and music genre classification for the case of multi-source separation.
    摘要 在这篇论文中,我们研究了music源分离是否可以作为music表示学习的预训练策略,targeted at music分类任务。为此,我们首先在不同的music源分离目标下预训练U-Net网络,例如从音乐作品中隔离声乐或乐器源;然后,我们将预训练后的U-Net网络与一个 convolutional 尾网络结合,并同时练习整个网络。learned by separation network的特征也通过skip connections传递给尾网络。实验结果表明,预训练U-Nets with music source separation objective可以提高music classification tasks中的表现,比如音乐自动标签任务中使用声乐分离,以及music genre classification任务中的多源分离情况。

cs.CV - 2023-10-24

On the Foundations of Shortcut Learning

  • paper_url: http://arxiv.org/abs/2310.16228
  • repo_url: None
  • paper_authors: Katherine L. Hermann, Hossein Mobahi, Thomas Fel, Michael C. Mozer
  • for: 这个论文的目的是研究深度学习模型如何选择特征,以及这些特征是如何影响模型的预测性和可用性。
  • methods: 这个论文使用了一种微型的、显式的生成框架,用于合成分类数据集,并系统地测试了不同输入属性的可用性和预测性如何影响模型的特征选择。
  • results: 研究发现,引入带有ReLU或Tanh单元的单个隐藏层会使模型产生捷径偏差,而线性模型相对不受这种偏差影响。此外,研究还发现,在真实数据集中对特征可用性进行操纵会加剧模型的捷径偏差。
    Abstract Deep-learning models can extract a rich assortment of features from data. Which features a model uses depends not only on predictivity-how reliably a feature indicates train-set labels-but also on availability-how easily the feature can be extracted, or leveraged, from inputs. The literature on shortcut learning has noted examples in which models privilege one feature over another, for example texture over shape and image backgrounds over foreground objects. Here, we test hypotheses about which input properties are more available to a model, and systematically study how predictivity and availability interact to shape models' feature use. We construct a minimal, explicit generative framework for synthesizing classification datasets with two latent features that vary in predictivity and in factors we hypothesize to relate to availability, and quantify a model's shortcut bias-its over-reliance on the shortcut (more available, less predictive) feature at the expense of the core (less available, more predictive) feature. We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias. Our empirical findings are consistent with a theoretical account based on Neural Tangent Kernels. Finally, we study how models used in practice trade off predictivity and availability in naturalistic datasets, discovering availability manipulations which increase models' degree of shortcut bias. Taken together, these findings suggest that the propensity to learn shortcut features is a fundamental characteristic of deep nonlinear architectures warranting systematic study given its role in shaping how models solve tasks.
    摘要 深度学习模型可以从数据中提取丰富的特征。模型使用哪些特征不仅取决于预测性(特征对训练集标签的指示有多可靠),还取决于可用性(特征能多容易地从输入中被提取或利用)。捷径学习的相关文献中已有模型偏爱某一特征的例子,例如偏爱纹理而非形状、偏爱图像背景而非前景物体。在本文中,我们检验了关于哪些输入属性对模型更可用的假设,并系统研究了预测性与可用性如何相互作用来塑造模型的特征使用。我们构建了一个最小的、显式的生成框架,用于合成包含两个隐含特征的分类数据集,这两个特征在预测性以及我们假设与可用性相关的因素上有所变化;我们量化模型的捷径偏差,即模型以牺牲核心特征(可用性较低、预测性较高)为代价、过度依赖捷径特征(可用性较高、预测性较低)的程度。我们发现线性模型相对无偏,但引入带有ReLU或Tanh单元的单个隐藏层就会产生这种偏差。我们的实验结果与基于神经正切核(Neural Tangent Kernels)的理论解释相一致。最后,我们研究了实际使用的模型在自然数据集中如何在预测性与可用性之间进行权衡,并发现了会增加模型捷径偏差的可用性操纵。综上,这些发现表明,学习捷径特征的倾向是深度非线性架构的一个基本特性,鉴于它对模型解决任务方式的影响,值得系统研究。
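The generative setup can be mimicked with a tiny synthetic dataset containing a core feature (more predictive, here given a small amplitude as a crude availability proxy) and a shortcut feature (less predictive, larger amplitude), plus a probe of which feature a trained model relies on when the two disagree. The amplitudes-as-availability choice and the logistic-regression baseline below are illustrative assumptions; per the abstract, a linear model should remain fairly unbiased, and the reported bias emerges once a nonlinear hidden layer is added.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
y = rng.integers(0, 2, n) * 2 - 1                 # labels in {-1, +1}

p_core, p_shortcut = 1.00, 0.90                   # predictivity of each latent feature
core = np.where(rng.random(n) < p_core, y, -y)
short = np.where(rng.random(n) < p_shortcut, y, -y)

amp_core, amp_short = 0.5, 3.0                    # crude availability proxy: amplitude
X = np.stack([amp_core * core, amp_short * short], axis=1) + 0.3 * rng.normal(size=(n, 2))

# Linear model (logistic regression) fit by gradient descent.
w = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w) * y))        # P(correct label) under the current model
    grad = -((1.0 - p) * y) @ X / n
    w -= 0.5 * grad

# Conflict probe: core says +1, shortcut says -1. Which feature decides the prediction?
n_probe = 500
X_probe = (np.stack([amp_core * np.ones(n_probe), -amp_short * np.ones(n_probe)], axis=1)
           + 0.3 * rng.normal(size=(n_probe, 2)))
follows_shortcut = float(np.mean((X_probe @ w) < 0))
print("weights [core, shortcut]:", np.round(w, 2))
print("fraction of conflict probes decided by the shortcut:", follows_shortcut)
```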

ShadowSense: Unsupervised Domain Adaptation and Feature Fusion for Shadow-Agnostic Tree Crown Detection from RGB-Thermal Drone Imagery

  • paper_url: http://arxiv.org/abs/2310.16212
  • repo_url: https://github.com/rudrakshkapil/shadowsense
  • paper_authors: Rudraksh Kapil, Seyed Mojtaba Marvasti-Zadeh, Nadir Erbilgin, Nilanjan Ray
  • for: This paper is written for detecting individual tree crowns from remote sensing data, specifically in the context of dense forests with diverse environmental variations.
  • methods: The proposed method, called ShadowSense, is entirely self-supervised and leverages domain adversarial training and feature pyramid networks to adapt domain-invariant representations and improve the accuracy of tree crown detection.
  • results: The proposed method outperforms both the baseline RGB-trained detector and state-of-the-art techniques that rely on unsupervised domain adaptation or early image fusion, as demonstrated through extensive experiments.
    Abstract Accurate detection of individual tree crowns from remote sensing data poses a significant challenge due to the dense nature of forest canopy and the presence of diverse environmental variations, e.g., overlapping canopies, occlusions, and varying lighting conditions. Additionally, the lack of data for training robust models adds another limitation in effectively studying complex forest conditions. This paper presents a novel method for detecting shadowed tree crowns and provides a challenging dataset comprising roughly 50k paired RGB-thermal images to facilitate future research for illumination-invariant detection. The proposed method (ShadowSense) is entirely self-supervised, leveraging domain adversarial training without source domain annotations for feature extraction and foreground feature alignment for feature pyramid networks to adapt domain-invariant representations by focusing on visible foreground regions, respectively. It then fuses complementary information of both modalities to effectively improve upon the predictions of an RGB-trained detector and boost the overall accuracy. Extensive experiments demonstrate the superiority of the proposed method over both the baseline RGB-trained detector and state-of-the-art techniques that rely on unsupervised domain adaptation or early image fusion. Our code and data are available: https://github.com/rudrakshkapil/ShadowSense
    摘要 由于森林冠层茂密且环境变化多样(如树冠重叠、遮挡和光照条件变化),从遥感数据中检测单棵树冠具有很大挑战性;此外,缺乏用于训练稳健模型的数据,也进一步限制了对复杂森林条件的研究。本文提出了一种检测阴影中树冠的新方法(ShadowSense),并提供了一个包含约5万对RGB-热红外图像的挑战性数据集,以促进未来光照不变检测的研究。该方法完全自监督:它利用无需源域标注的域对抗训练来提取特征,并通过特征金字塔网络的前景特征对齐,聚焦于可见前景区域以学习域不变表示;随后融合两种模态的互补信息,有效改进RGB检测器的预测并提升整体精度。大量实验表明,所提方法优于RGB训练的基线检测器以及依赖无监督域适应或早期图像融合的最先进方法。我们的代码和数据可在 https://github.com/rudrakshkapil/ShadowSense 获取。

Sea-Land-Cloud Segmentation in Satellite Hyperspectral Imagery by Deep Learning

  • paper_url: http://arxiv.org/abs/2310.16210
  • repo_url: https://github.com/jonalvjusto/s_l_c_segm_hyp_img
  • paper_authors: Jon Alvarez Justo, Joseph Landon Garrett, Mariana-Iuliana Georgescu, Jesus Gonzalez-Llorente, Radu Tudor Ionescu, Tor Arne Johansen
  • For: The paper is written for on-board Artificial Intelligence (AI) techniques for enhancing the autonomy of satellite platforms through edge inference, specifically focusing on multi-class segmentation of hyperspectral (HS) satellite imagery into sea, land, and cloud formations.
  • Methods: The paper employs 16 different deep learning (DL) models for segmenting HS imagery, including both shallow and deep models, and proposes four new DL models. The models are trained and evaluated for in-orbit deployment, considering performance, parameter count, and inference time.
  • Results: The paper shows that the proposed 1D-Justo-LiuNet model consistently outperforms state-of-the-art models for sea-land-cloud segmentation in terms of performance (0.93 accuracy) and parameter count (4,563), but presents longer inference time (15s) in the tested processing architecture. Additionally, the paper demonstrates that reducing spectral channels down to 3 can lower models' parameters and inference time, but at the cost of weaker segmentation performance.
    Abstract Satellites are increasingly adopting on-board Artificial Intelligence (AI) techniques to enhance platforms' autonomy through edge inference. In this context, the utilization of deep learning (DL) techniques for segmentation in HS satellite imagery offers advantages for remote sensing applications, and therefore, we train 16 different models, whose codes are made available through our study, which we consider to be relevant for on-board multi-class segmentation of HS imagery, focusing on classifying oceanic (sea), terrestrial (land), and cloud formations. We employ the HYPSO-1 mission as an illustrative case for sea-land-cloud segmentation, and to demonstrate the utility of the segments, we introduce a novel sea-land-cloud ranking application scenario. Our system prioritizes HS image downlink based on sea, land, and cloud coverage levels from the segmented images. We comparatively evaluate the models for in-orbit deployment, considering performance, parameter count, and inference time. The models include both shallow and deep models, and after we propose four new DL models, we demonstrate that segmenting single spectral signatures (1D) outperforms 3D data processing comprising both spectral (1D) and spatial (2D) contexts. We conclude that our lightweight DL model, called 1D-Justo-LiuNet, consistently surpasses state-of-the-art models for sea-land-cloud segmentation, such as U-Net and its variations, in terms of performance (0.93 accuracy) and parameter count (4,563). However, the 1D models present longer inference time (15s) in the tested processing architecture, which is clearly suboptimal. Finally, after demonstrating that in-orbit image segmentation should occur post L1b radiance calibration rather than on raw data, we additionally show that reducing spectral channels down to 3 lowers models' parameters and inference time, at the cost of weaker segmentation performance.
    摘要 卫星在board上采用人工智能(AI)技术以提高平台的自主性,在这种情况下,使用深度学习(DL)技术进行高分辨率卫星影像(HS)中的分割具有优势,因此我们在本研究中训练了16个模型,代码在我们的研究中公布。我们认为这些模型适用于HS影像的board上多类分割,主要是为了将海洋(海)、陆地(地)和云形态分别分类。我们使用HYPSO-1任务作为示例,以示 segmentation 的实用性,并在HS影像中提供了一个新的海地陆云排名应用场景。我们的系统根据HS影像中的海、地和云覆盖水平来决定下载链接。我们对在空间中部署的模型进行比较评估,考虑性能、参数计数和计算时间。模型包括浅层和深度模型,而我们还提出了四种新的深度学习模型。我们发现,通过单 spectral signature(1D)进行分割,可以超过三个维度(2D)的数据处理。我们认为我们的轻量级深度学习模型,称为1D-Justo-LiuNet,在海地陆云分割方面 consistently exceeds state-of-the-art 模型(如UNet和其变种),以性能(0.93准确率)和参数计数(4563)为标准。然而,1D模型在测试的处理架构中具有较长的计算时间(15s),这显然不是最佳。最后,我们还证明了在空间中进行卫星影像分割应该在L1b辐射均衡化后进行,而不是在原始数据上进行。此外,我们发现,将 spectral channel 降低到3个可以降低模型的参数和计算时间,但是这将导致分割性能弱化。
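
The abstract's central comparison is between classifying each pixel from its spectral signature alone (1D) and processing joint spectral-spatial (3D) volumes. As a point of reference, below is a minimal sketch of a 1D convolutional per-pixel classifier for sea/land/cloud labels; the layer widths, kernel sizes, and the assumed band count are illustrative choices, not the published 1D-Justo-LiuNet configuration.

```python
import torch
import torch.nn as nn

class SpectralPixelClassifier(nn.Module):
    """Sketch of a 1D CNN that classifies a single pixel's spectrum
    into sea / land / cloud. Architecture details are assumptions."""

    def __init__(self, num_bands: int = 120, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),            # collapse the spectral axis
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_bands) -> add a channel dimension for Conv1d
        x = x.unsqueeze(1)
        x = self.features(x).squeeze(-1)        # (batch, 64)
        return self.head(x)                     # per-pixel class logits

if __name__ == "__main__":
    model = SpectralPixelClassifier()
    spectra = torch.randn(8, 120)               # 8 pixels, 120 assumed bands
    print(model(spectra).shape)                  # torch.Size([8, 3])
```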

Learning Low-Rank Latent Spaces with Simple Deterministic Autoencoder: Theoretical and Empirical Insights

  • paper_url: http://arxiv.org/abs/2310.16194
  • repo_url: None
  • paper_authors: Alokendu Mazumder, Tirthajit Baruah, Bhartendu Kumar, Rishab Sharma, Vishwajeet Pattanaik, Punit Rathore
  • for: 提高自动编码器的数据表示效果,使其更加紧凑地表示数据。
  • methods: 在自动编码器中加入一个低秩正则化项,使其自适应地重建低维潜在空间,同时保持自动编码器的基本重建目标。
  • results: empirical research shows that our model outperforms traditional autoencoders in various tasks such as image generation and downstream classification, and the theoretical analysis also proves the effectiveness of our model.
    Abstract The autoencoder is an unsupervised learning paradigm that aims to create a compact latent representation of data by minimizing the reconstruction loss. However, it tends to overlook the fact that most data (images) are embedded in a lower-dimensional space, which is crucial for effective data representation. To address this limitation, we propose a novel approach called Low-Rank Autoencoder (LoRAE). In LoRAE, we incorporated a low-rank regularizer to adaptively reconstruct a low-dimensional latent space while preserving the basic objective of an autoencoder. This helps embed the data in a lower-dimensional space while preserving important information. It is a simple autoencoder extension that learns low-rank latent space. Theoretically, we establish a tighter error bound for our model. Empirically, our model's superiority shines through various tasks such as image generation and downstream classification. Both theoretical and practical outcomes highlight the importance of acquiring low-dimensional embeddings.
    摘要 “自动Encoder是一种无监督学习架构,旨在创建一个简单的内在表示方法,以最小化重建损失。但是,它往往忽略了许多资料(图像)是嵌入在较低维度的空间中,这是有效的资料表示的关键。为了解决这个限制,我们提出了一种新的方法 called Low-Rank Autoencoder (LoRAE)。在LoRAE中,我们添加了一个低维度正规化项,以适应地重建一个较低维度的内在空间,保留了基本的自动Encoder目标。这将资料嵌入在较低维度的空间中,保留了重要的信息。它是一个简单地将自动Encoder扩展为学习低维度内在空间的方法。理论上,我们建立了一个更紧的错误范围,实际上,我们的模型在不同的任务上(如图像生成和下游分类)表现出了优越的成果。实际和理论的结果都显示了低维度内在空间的重要性。”
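
To make the idea of a low-rank regularizer on the latent space concrete, here is a minimal sketch that adds a nuclear-norm penalty on the batch of latent codes to a standard reconstruction loss. The nuclear norm as the rank surrogate and the weight `lam` are assumptions for illustration; the paper's exact regularizer may differ.

```python
import torch

def lorae_style_loss(x, x_hat, z, lam=0.1):
    """Reconstruction loss plus a low-rank penalty on the latent batch.

    x, x_hat : (batch, ...) input and reconstruction
    z        : (batch, latent_dim) latent codes for the batch
    lam      : weight of the rank surrogate (assumed value)
    """
    recon = torch.mean((x - x_hat) ** 2)
    # Nuclear norm (sum of singular values) is a convex surrogate for rank.
    low_rank = torch.linalg.matrix_norm(z, ord="nuc")
    return recon + lam * low_rank / z.shape[0]

if __name__ == "__main__":
    x = torch.randn(32, 784)
    z = torch.randn(32, 64, requires_grad=True)
    x_hat = torch.randn(32, 784, requires_grad=True)
    loss = lorae_style_loss(x, x_hat, z)
    loss.backward()
    print(float(loss))
```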

G-CASCADE: Efficient Cascaded Graph Convolutional Decoding for 2D Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2310.16175
  • repo_url: https://github.com/SLDGroup/G-CASCADE
  • paper_authors: Md Mostafijur Rahman, Radu Marculescu
  • for: 这篇论文面向医疗影像分割这一重要应用,提出了一个新的解码器,即级联图卷积注意力解码器(G-CASCADE),用于2D医疗影像分割。
  • methods: 该解码器使用高效的图卷积模块,与多个层次变换器编码器结合,逐步细化多阶段特征图。
  • results: 实验结果显示,该模型在五个医疗影像分割任务(包括腹部器官、心脏器官、息肉病变、皮肤病变和视网膜血管)中都超过了其他现有的最先进(SOTA)方法。此外,该解码器还可以轻松地与其他层次编码器结合,用于通用的语义分割和医疗影像分割任务。
    Abstract In recent years, medical image segmentation has become an important application in the field of computer-aided diagnosis. In this paper, we are the first to propose a new graph convolution-based decoder namely, Cascaded Graph Convolutional Attention Decoder (G-CASCADE), for 2D medical image segmentation. G-CASCADE progressively refines multi-stage feature maps generated by hierarchical transformer encoders with an efficient graph convolution block. The encoder utilizes the self-attention mechanism to capture long-range dependencies, while the decoder refines the feature maps preserving long-range information due to the global receptive fields of the graph convolution block. Rigorous evaluations of our decoder with multiple transformer encoders on five medical image segmentation tasks (i.e., Abdomen organs, Cardiac organs, Polyp lesions, Skin lesions, and Retinal vessels) show that our model outperforms other state-of-the-art (SOTA) methods. We also demonstrate that our decoder achieves better DICE scores than the SOTA CASCADE decoder with 80.8% fewer parameters and 82.3% fewer FLOPs. Our decoder can easily be used with other hierarchical encoders for general-purpose semantic and medical image segmentation tasks.

iNVS: Repurposing Diffusion Inpainters for Novel View Synthesis

  • paper_url: http://arxiv.org/abs/2310.16167
  • repo_url: None
  • paper_authors: Yash Kant, Aliaksandr Siarohin, Michael Vasilkovsky, Riza Alp Guler, Jian Ren, Sergey Tulyakov, Igor Gilitschenski
  • for: The paper is written for generating consistent novel views from a single source image, with a focus on maximizing the reuse of visible pixels from the source image.
  • methods: The paper uses a monocular depth estimator to transfer visible pixels from the source view to the target view, and trains the method on the large-scale Objaverse dataset to learn 3D object priors. The paper also introduces a novel masking mechanism based on epipolar lines to further improve the quality of the approach.
  • results: The paper demonstrates the zero-shot abilities of the framework on three challenging datasets: Google Scanned Objects, Ray Traced Multiview, and Common Objects in 3D, and shows that the approach can generate high-quality novel views without requiring any additional training data.
  • for: 这篇论文是为了从单一的源图像中生成一致的新视图,并将可见像素从源视图传输到目标视图。
  • methods: 这篇论文使用单目深度估计器来传输可见像素从源视图到目标视图,并在Objaverse大规模数据集上训练方法来学习3D对象假设。文章还引入了一种新的蒙版机制基于轴线,以进一步提高方法的质量。
  • results: 这篇论文在Google扫描物体、光Trace多视图和通用物体3D等三个复杂的数据集上展示了零shot能力,并证明了方法可以生成高质量的新视图无需任何额外的训练数据。
    Abstract We present a method for generating consistent novel views from a single source image. Our approach focuses on maximizing the reuse of visible pixels from the source image. To achieve this, we use a monocular depth estimator that transfers visible pixels from the source view to the target view. Starting from a pre-trained 2D inpainting diffusion model, we train our method on the large-scale Objaverse dataset to learn 3D object priors. While training we use a novel masking mechanism based on epipolar lines to further improve the quality of our approach. This allows our framework to perform zero-shot novel view synthesis on a variety of objects. We evaluate the zero-shot abilities of our framework on three challenging datasets: Google Scanned Objects, Ray Traced Multiview, and Common Objects in 3D. See our webpage for more details: https://yashkant.github.io/invs/
    摘要 我们提出了一种方法,可以从单个源图像生成一致的新视图。我们的方法是利用可见像素的最大重复使用来实现这一点。我们使用一个单目深度估计器,将源视图中的可见像素传递到目标视图中。我们从一个预训练的2D填充扩散模型开始,然后在大规模的Objaverse数据集上训练我们的方法,以学习3D物体先验。在训练过程中,我们使用一种基于对极线(epipolar line)的新的masking机制,以进一步提高我们的方法的质量。这使得我们的框架能够在多种物体上进行零样本(zero-shot)的新视图合成。我们在Google Scanned Objects、Ray Traced Multiview和Common Objects in 3D等三个具有挑战性的数据集上评估了我们框架的零样本能力。更多细节请参考我们的网站:https://yashkant.github.io/invs/

MyriadAL: Active Few Shot Learning for Histopathology

  • paper_url: http://arxiv.org/abs/2310.16161
  • repo_url: None
  • paper_authors: Nico Schiavone, Jingyi Wang, Shuangzhi Li, Roger Zemp, Xingyu Li
  • for: This paper addresses the issue of limited annotation budget in active learning (AL) and few-shot learning (FSL) scenarios, particularly in the context of histopathology where labelling is expensive.
  • methods: The proposed Myriad Active Learning (MAL) framework includes a contrastive-learning encoder, pseudo-label generation, and novel query sample selection in the loop. Unlabelled data is massaged in a self-supervised manner to obtain data representations and clustering knowledge, which are used to activate the AL loop. The pseudo-labels of unlabelled data are refined with feedback from an oracle in each AL cycle, and the updated pseudo-labels are used to improve active learning query selection.
  • results: Extensive experiments on two public histopathology datasets show that MAL has superior test accuracy, macro F1-score, and label efficiency compared to prior works, and can achieve a comparable test accuracy to a fully supervised algorithm while labelling only 5% of the dataset.
    Abstract Active Learning (AL) and Few Shot Learning (FSL) are two label-efficient methods which have achieved excellent results recently. However, most prior arts in both learning paradigms fail to explore the wealth of the vast unlabelled data. In this study, we address this issue in the scenario where the annotation budget is very limited, yet a large amount of unlabelled data for the target task is available. We frame this work in the context of histopathology where labelling is prohibitively expensive. To this end, we introduce an active few shot learning framework, Myriad Active Learning (MAL), including a contrastive-learning encoder, pseudo-label generation, and novel query sample selection in the loop. Specifically, we propose to massage unlabelled data in a self-supervised manner, where the obtained data representations and clustering knowledge form the basis to activate the AL loop. With feedback from the oracle in each AL cycle, the pseudo-labels of the unlabelled data are refined by optimizing a shallow task-specific net on top of the encoder. These updated pseudo-labels serve to inform and improve the active learning query selection process. Furthermore, we introduce a novel recipe to combine existing uncertainty measures and utilize the entire uncertainty list to reduce sample redundancy in AL. Extensive experiments on two public histopathology datasets show that MAL has superior test accuracy, macro F1-score, and label efficiency compared to prior works, and can achieve a comparable test accuracy to a fully supervised algorithm while labelling only 5% of the dataset.
    摘要 主动学习(AL)和少样本学习(FSL)是两种标注高效的方法,近年来都取得了出色的成果。然而,这两种学习范式中的大多数已有工作都没有充分利用海量的无标注数据。本研究针对标注预算非常有限、但目标任务拥有大量无标注数据的场景来解决这一问题,并将其置于标注代价高昂的组织病理学背景下。为此,我们提出了一个主动少样本学习框架 Myriad Active Learning(MAL),其中包括对比学习编码器、伪标签生成以及循环中的新型查询样本选择。具体而言,我们以自监督的方式处理无标注数据,所得到的数据表示和聚类知识构成了启动主动学习循环的基础。在每个主动学习循环中,借助专家(oracle)的反馈,通过在编码器之上优化一个浅层任务网络来细化无标注数据的伪标签;更新后的伪标签进而用于指导和改进主动学习的查询选择。此外,我们还提出了一种结合现有不确定性度量的新方案,利用完整的不确定性列表来减少主动学习中的样本冗余。在两个公开的组织病理学数据集上的大量实验表明,MAL在测试准确率、宏F1分数和标注效率方面均优于已有工作,并且在仅标注5%数据的情况下即可达到与全监督算法相当的测试准确率。
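
A compressed sketch of the query-selection step in one active-learning cycle is shown below: self-supervised features of the unlabelled pool are clustered, and the most uncertain samples per cluster are sent to the oracle so that queries are both informative and non-redundant. The entropy-based uncertainty score and the per-cluster budget are simplifying assumptions, not the exact MAL recipe.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_queries(features, probs, n_clusters=10, per_cluster=2, seed=0):
    """Pick samples to send to the oracle in one active-learning cycle.

    features : (N, D) self-supervised embeddings of unlabelled images
    probs    : (N, C) current pseudo-label probabilities from the task head
    Returns indices of queried samples.
    """
    # Entropy as a simple uncertainty score (assumed choice).
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)

    # Cluster the pool so queries cover distinct regions of feature space.
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=seed).fit_predict(features)

    queried = []
    for c in range(n_clusters):
        idx = np.where(clusters == c)[0]
        if idx.size == 0:
            continue
        # Most uncertain samples within this cluster.
        top = idx[np.argsort(entropy[idx])[::-1][:per_cluster]]
        queried.extend(top.tolist())
    return queried

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(500, 128))
    logits = rng.normal(size=(500, 9))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(sorted(select_queries(feats, probs))[:10])
```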

Pix2HDR – A pixel-wise acquisition and deep learning-based synthesis approach for high-speed HDR videos

  • paper_url: http://arxiv.org/abs/2310.16139
  • repo_url: None
  • paper_authors: Caixin Wang, Jie Zhang, Matthew A. Wilson, Ralph Etienne-Cummings
  • for: 能够高速 capture 高动态范围视频,尤其是在低照度和暗背景下,是许多视觉应用中的关键。
  • methods: 我们使用像素级别可编程的图像感知器,通过在不同曝光和相位偏移中采样视频帧,同时捕捉快速运动和高动态范围。然后,我们使用深度神经网络对像素级别输出进行端到端学习,以实现高空间分辨率和减少运动模糊。
  • results: 我们实现了1000帧/秒的高动态范围视频捕捉,并能够减少运动模糊。我们的方法可以在各种动态条件下提高视觉系统的适应性和性能。
    Abstract Accurately capturing dynamic scenes with wide-ranging motion and light intensity is crucial for many vision applications. However, acquiring high-speed high dynamic range (HDR) video is challenging because the camera's frame rate restricts its dynamic range. Existing methods sacrifice speed to acquire multi-exposure frames. Yet, misaligned motion in these frames can still pose complications for HDR fusion algorithms, resulting in artifacts. Instead of frame-based exposures, we sample the videos using individual pixels at varying exposures and phase offsets. Implemented on a pixel-wise programmable image sensor, our sampling pattern simultaneously captures fast motion at a high dynamic range. We then transform pixel-wise outputs into an HDR video using end-to-end learned weights from deep neural networks, achieving high spatiotemporal resolution with minimized motion blurring. We demonstrate aliasing-free HDR video acquisition at 1000 FPS, resolving fast motion under low-light conditions and against bright backgrounds - both challenging conditions for conventional cameras. By combining the versatility of pixel-wise sampling patterns with the strength of deep neural networks at decoding complex scenes, our method greatly enhances the vision system's adaptability and performance in dynamic conditions.
    摘要 必须精准捕捉动态场景中的广泛运动和光强变化是许多视觉应用中的关键。然而,获取高速高动态范围(HDR)视频具有摄像机框率限制动态范围。现有方法为了获得多曝光帧而 sacrifices 速度。然而,这些多曝光帧中的运动不同相对可能会对HDR融合算法产生难以控制的缺陷,从而导致 artifacts。相比 frame-based 曝光,我们使用个别像素的不同曝光和阶段偏移来采样视频。在一个像素可程序化图像感知器上实现的这种采样模式中,我们可以同时捕捉快速运动和高动态范围。然后,我们使用深度神经网络学习到的权重将像素级别输出转换为HDR视频,实现高空间时间分辨率,并最小化运动模糊。我们在1000 FPS下获得无扭曲HDR视频,解决快速运动在低光环境和灯光背景下的捕捉问题,这些问题对传统摄像机来说都是挑战。通过将像素级别采样模式与深度神经网络的解码能力结合,我们的方法可以大幅提高视觉系统的适应性和性能在动态条件下。

Subtle Signals: Video-based Detection of Infant Non-nutritive Sucking as a Neurodevelopmental Cue

  • paper_url: http://arxiv.org/abs/2310.16138
  • repo_url: https://github.com/ostadabbas/nns-detection-and-segmentation
  • paper_authors: Shaotong Zhu, Michael Wan, Sai Kumar Reddy Manne, Emily Zimmerman, Sarah Ostadabbas
  • for: This paper aims to develop a vision-based algorithm for non-contact detection of non-nutritive sucking (NNS) activity in infants using baby monitor footage.
  • methods: The proposed algorithm utilizes optical flow and temporal convolutional networks to detect and amplify subtle infant-sucking signals from baby monitor videos.
  • results: The authors successfully classify short video clips of uniform length into NNS and non-NNS periods, and investigate manual and learning-based techniques to piece together local classification results for segmenting longer mixed-activity videos into NNS and non-NNS segments of varying duration. Two novel datasets of annotated infant videos are introduced, including one sourced from a clinical study featuring 19 infant subjects and 183 hours of overnight baby monitor footage.
    Abstract Non-nutritive sucking (NNS), which refers to the act of sucking on a pacifier, finger, or similar object without nutrient intake, plays a crucial role in assessing healthy early development. In the case of preterm infants, NNS behavior is a key component in determining their readiness for feeding. In older infants, the characteristics of NNS behavior offer valuable insights into neural and motor development. Additionally, NNS activity has been proposed as a potential safeguard against sudden infant death syndrome (SIDS). However, the clinical application of NNS assessment is currently hindered by labor-intensive and subjective finger-in-mouth evaluations. Consequently, researchers often resort to expensive pressure transducers for objective NNS signal measurement. To enhance the accessibility and reliability of NNS signal monitoring for both clinicians and researchers, we introduce a vision-based algorithm designed for non-contact detection of NNS activity using baby monitor footage in natural settings. Our approach involves a comprehensive exploration of optical flow and temporal convolutional networks, enabling the detection and amplification of subtle infant-sucking signals. We successfully classify short video clips of uniform length into NNS and non-NNS periods. Furthermore, we investigate manual and learning-based techniques to piece together local classification results, facilitating the segmentation of longer mixed-activity videos into NNS and non-NNS segments of varying duration. Our research introduces two novel datasets of annotated infant videos, including one sourced from our clinical study featuring 19 infant subjects and 183 hours of overnight baby monitor footage.
    摘要 非营养性吸吮(NNS),即吸吮安抚奶嘴、手指或类似物体而不摄入营养的行为,在评估健康的早期发育中起着关键作用。对于早产儿,NNS行为是判断其是否具备进食准备度的重要组成部分;对于较大的婴儿,NNS行为的特征能为神经和运动发育提供有价值的信息。此外,NNS活动还被认为可能有助于预防婴儿猝死综合征(SIDS)。然而,NNS评估的临床应用目前受制于劳动密集且主观的手指入口评估。因此,研究人员往往只能借助昂贵的压力传感器来客观测量NNS信号。为了提高NNS信号监测对临床医生和研究人员的可及性和可靠性,我们提出了一种基于视觉的算法,利用自然环境中的婴儿监控视频对NNS活动进行非接触检测。我们的方法对光流和时间卷积网络进行了全面探索,从而检测并放大细微的婴儿吸吮信号。我们成功地将等长的短视频片段分类为NNS和非NNS时段。此外,我们还研究了手动和基于学习的技术,将局部分类结果拼接起来,以便将较长的混合活动视频分割为时长不一的NNS和非NNS片段。我们的研究引入了两个新的标注婴儿视频数据集,其中一个来自我们的临床研究,包含19名婴儿受试者和183小时的夜间婴儿监控视频。

Stereoscopic Depth Perception Through Foliage

  • paper_url: http://arxiv.org/abs/2310.16120
  • repo_url: None
  • paper_authors: Robert Kerschner, Rakesh John Amala Arokia Nathan, Rafal Mantiuk, Oliver Bimber
  • for: 这篇论文是为了探讨人工和计算机方法在推断受遮盖物体深度方面的可能性和限制。
  • methods: 这篇论文使用了计算光学合成孔径感知技术,并将其与人类融合立体图像的视觉能力相结合,以实现对被遮挡物体深度的推断。
  • results: 研究发现,只有当使用了计算机方法和人类视觉能力的融合时,人类可以成功推断受遮盖物体深度。
    Abstract Both humans and computational methods struggle to discriminate the depths of objects hidden beneath foliage. However, such discrimination becomes feasible when we combine computational optical synthetic aperture sensing with the human ability to fuse stereoscopic images. For object identification tasks, as required in search and rescue, wildlife observation, surveillance, and early wildfire detection, depth assists in differentiating true from false findings, such as people, animals, or vehicles vs. sun-heated patches at the ground level or in the tree crowns, or ground fires vs. tree trunks. We used video captured by a drone above dense woodland to test users' ability to discriminate depth. We found that this is impossible when viewing monoscopic video and relying on motion parallax. The same was true with stereoscopic video because of the occlusions caused by foliage. However, when synthetic aperture sensing was used to reduce occlusions and disparity-scaled stereoscopic video was presented, whereas computational (stereoscopic matching) methods were unsuccessful, human observers successfully discriminated depth. This shows the potential of systems which exploit the synergy between computational methods and human vision to perform tasks that neither can perform alone.
    摘要 人类和计算方法都有困难在检测被树叶所隐藏的物体的深度。然而,当我们结合计算光学合成开口探测与人类的双目视觉融合时,这种检测变得可能。对于搜索救援、野生动物观察、监测和早期森林火灾检测等任务,深度可以帮助分辨真实的发现和假阳性发现,如人、动物或车辆与地面热腋或树叶之间的区分。我们使用了飞行器飞行在密集的森林区 capture的视频进行测试,发现在单目视频和运动相差探测时,人类无法分辨深度。同时,使用折射 Synthetic Aperture Sensing 减少遮挡和尺度缩放双目视频显示,而计算(双目视觉匹配)方法失败,人类观察员成功地分辨深度。这表明了将计算方法和人类视觉相结合的系统可以执行计算方法和人类无法执行的任务。

Wakening Past Concepts without Past Data: Class-Incremental Learning from Online Placebos

  • paper_url: http://arxiv.org/abs/2310.16115
  • repo_url: None
  • paper_authors: Yaoyao Liu, Yingying Li, Bernt Schiele, Qianru Sun
  • for: 这篇研究是针对分类增培学习(Class-Incremental Learning,CIL)中对旧类知识的保留问题进行了深入研究。
  • methods: 这篇研究使用了知识浓缩(Knowledge Distillation,KD)技术来解决旧类知识的保留问题。具体来说,这篇研究使用了新类数据来进行KD,并发现这会降低模型的适应度和效率。因此,这篇研究提出了使用流行的自由图像数据库,如Google Images,来选择合适的地方(placebos)进行KD。
  • results: 这篇研究的结果显示了以下几个点:1)使用流行的自由图像数据库选择的placebos可以实现高效的旧类知识保留,2)不需要额外的监督或记忆预算,3)在使用较低的记忆预算时,与一些顶尖的CIL方法进行比较,表现更好。
    Abstract Not forgetting old class knowledge is a key challenge for class-incremental learning (CIL) when the model continuously adapts to new classes. A common technique to address this is knowledge distillation (KD), which penalizes prediction inconsistencies between old and new models. Such prediction is made with almost new class data, as old class data is extremely scarce due to the strict memory limitation in CIL. In this paper, we take a deep dive into KD losses and find that "using new class data for KD" not only hinders the model adaption (for learning new classes) but also results in low efficiency for preserving old class knowledge. We address this by "using the placebos of old classes for KD", where the placebos are chosen from a free image stream, such as Google Images, in an automatical and economical fashion. To this end, we train an online placebo selection policy to quickly evaluate the quality of streaming images (good or bad placebos) and use only good ones for one-time feed-forward computation of KD. We formulate the policy training process as an online Markov Decision Process (MDP), and introduce an online learning algorithm to solve this MDP problem without causing much computation costs. In experiments, we show that our method 1) is surprisingly effective even when there is no class overlap between placebos and original old class data, 2) does not require any additional supervision or memory budget, and 3) significantly outperforms a number of top-performing CIL methods, in particular when using lower memory budgets for old class exemplars, e.g., five exemplars per class.
    摘要 在类增量学习(CIL)中,当模型不断适应新类时,不遗忘旧类知识是一个关键挑战。一种常见的解决方法是知识蒸馏(KD),它对新旧模型之间的预测不一致进行惩罚。由于CIL的内存限制非常严格,旧类数据极其稀缺,这种预测几乎只能使用新类数据。在这篇论文中,我们深入研究KD损失,发现"使用新类数据进行KD"不仅阻碍模型对新类的适应,而且在保留旧类知识方面效率也很低。我们通过"使用旧类的安慰剂(placebo)进行KD"来解决这一问题,这些安慰剂以自动且经济的方式从免费图像流(如Google Images)中选取。为此,我们训练了一个在线安慰剂选择策略,以快速评估流式图像的质量(好的或坏的安慰剂),并只使用好的安慰剂进行一次前向KD计算。我们将策略训练过程形式化为在线马尔可夫决策过程(MDP),并引入一种在线学习算法来求解该MDP问题,而不产生太多计算成本。实验表明,我们的方法具有以下优点:1)即使安慰剂与原始旧类数据之间没有类别重叠,效果也出奇地好;2)不需要任何额外的监督或内存预算;3)显著优于多种性能领先的CIL方法,尤其是在旧类示例的内存预算较低时(例如每类五个示例)。
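
The sketch below illustrates the general shape of a distillation objective that uses placebo images instead of new-class data for KD: cross-entropy on the new classes plus a KL term that keeps the new model's outputs on the placebos close to the frozen old model's. The temperature and loss weight are assumed values, and the online placebo-selection policy itself is omitted.

```python
import torch
import torch.nn.functional as F

def cil_step_loss(new_logits, new_labels,
                  placebo_logits_new, placebo_logits_old,
                  temperature=2.0, kd_weight=1.0):
    """Cross-entropy on new-class data plus KD on placebo images.

    new_logits          : (B, C) new model outputs on new-class images
    new_labels          : (B,)   ground-truth labels of the new classes
    placebo_logits_new  : (P, C_old) new model outputs on placebo images
    placebo_logits_old  : (P, C_old) frozen old model outputs on the same
                          placebos (serves as the distillation target)
    """
    ce = F.cross_entropy(new_logits, new_labels)
    t = temperature
    kd = F.kl_div(
        F.log_softmax(placebo_logits_new / t, dim=1),
        F.softmax(placebo_logits_old / t, dim=1),
        reduction="batchmean",
    ) * (t * t)
    return ce + kd_weight * kd

if __name__ == "__main__":
    loss = cil_step_loss(
        torch.randn(16, 20), torch.randint(0, 20, (16,)),
        torch.randn(8, 15), torch.randn(8, 15),
    )
    print(float(loss))
```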

Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge

  • paper_url: http://arxiv.org/abs/2310.16112
  • repo_url: None
  • paper_authors: Gregory Holste, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang, Dongkyun Kim, Trong-Hieu Nguyen-Mau, Minh-Triet Tran, Jaehyup Jeong, Wongi Park, Jongbin Ryu, Feng Hong, Arsh Verma, Yosuke Yamagishi, Changhyun Kim, Hyeryeong Seo, Myungjoo Kang, Leo Anthony Celi, Zhiyong Lu, Ronald M. Summers, George Shih, Zhangyang Wang, Yifan Peng
  • for: 这个论文是为了研究长尾学习在医疗影像识别中的应用。
  • methods: 论文使用了多种方法,包括长尾学习、多标签分类和视觉语言基础模型。
  • results: 研究发现了许多高性能解决方案,并提供了实践的建议 для长尾多标签医疗影像分类。此外,研究还提出了基于视觉语言基础模型的几种新的方法。
    Abstract Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" $\unicode{x2013}$ there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.
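
This is not any participant's winning method, but one common building block for the label imbalance the challenge targets: a multi-label BCE loss whose positive terms are re-weighted per class with the effective-number-of-samples heuristic. The beta value and the toy class counts are assumptions.

```python
import torch
import torch.nn.functional as F

def class_balanced_bce(logits, targets, class_counts, beta=0.9999):
    """Multi-label BCE with per-class positive re-weighting for long tails.

    logits       : (B, C) raw model outputs
    targets      : (B, C) multi-hot labels
    class_counts : (C,)   number of training positives per finding
    The 'effective number of samples' heuristic upweights rare findings.
    """
    counts = torch.as_tensor(class_counts, dtype=torch.float32)
    effective_num = 1.0 - torch.pow(beta, counts)
    weights = (1.0 - beta) / effective_num
    weights = weights / weights.sum() * len(counts)   # normalize to mean ~1
    return F.binary_cross_entropy_with_logits(
        logits, targets.float(), pos_weight=weights)

if __name__ == "__main__":
    counts = [50_000, 2_000, 150, 12] + [500] * 22    # 26 findings, long tail
    logits = torch.randn(4, 26)
    targets = (torch.rand(4, 26) > 0.9).float()
    print(float(class_balanced_bce(logits, targets, counts)))
```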

Complex Image Generation SwinTransformer Network for Audio Denoising

  • paper_url: http://arxiv.org/abs/2310.16109
  • repo_url: https://github.com/YoushanZhang/CoxImgSwinTransformer
  • paper_authors: Youshan Zhang, Jialu Li
  • for: 这篇论文旨在提高实际应用中音频去噪的性能。
  • methods: 该论文将音频干扰除问题转化为一个图像生成任务,并使用复杂的SwinTransformer网络捕捉更多的复杂傅立叶域信息。然后,通过结构相似和细节损失函数生成高质量图像,并使用SDR损失函数最小化清洁和干扰音频之间的差异。
  • results: 对两个标准数据集进行了广泛的实验,结果表明我们提出的模型比现有的方法更高效。
    Abstract Achieving high-performance audio denoising is still a challenging task in real-world applications. Existing time-frequency methods often ignore the quality of generated frequency domain images. This paper converts the audio denoising problem into an image generation task. We first develop a complex image generation SwinTransformer network to capture more information from the complex Fourier domain. We then impose structure similarity and detailed loss functions to generate high-quality images and develop an SDR loss to minimize the difference between denoised and clean audios. Extensive experiments on two benchmark datasets demonstrate that our proposed model is better than state-of-the-art methods.
    摘要 高性能音频减噪仍然是实际应用中的挑战。现有的时间频域方法通常忽略生成的频域图像质量。本文将音频减噪问题转换成图像生成任务。我们首先开发了复杂的图像生成SwinTransformer网络,以捕捉更多的复杂 Fourier 频域信息。然后,我们对生成的图像强制实施结构相似性和细节损失函数,以生成高质量的图像。同时,我们还开发了SDR损失函数,以最小化减噪后和清晰音频之间的差异。我们在两个标准测试集上进行了广泛的实验,结果表明,我们的提出的模型在比state-of-the-art方法更高的性能。
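
A minimal sketch of an SDR-style training objective, assuming it is computed on time-domain waveforms: the loss is the negative signal-to-distortion ratio between the clean reference and the denoised output, so minimizing it maximizes SDR.

```python
import torch

def sdr_loss(clean, denoised, eps=1e-8):
    """Negative signal-to-distortion ratio, averaged over the batch.

    clean, denoised : (batch, samples) time-domain waveforms
    Maximizing SDR (i.e. minimizing this loss) pushes the denoised
    waveform toward the clean reference.
    """
    noise = clean - denoised
    sdr = 10.0 * torch.log10(
        (clean.pow(2).sum(dim=-1) + eps) / (noise.pow(2).sum(dim=-1) + eps))
    return -sdr.mean()

if __name__ == "__main__":
    clean = torch.randn(4, 16000)
    denoised = clean + 0.1 * torch.randn(4, 16000)
    print(float(sdr_loss(clean, denoised)))   # around -20 dB SDR
```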

LaksNet: an end-to-end deep learning model for self-driving cars in Udacity simulator

  • paper_url: http://arxiv.org/abs/2310.16103
  • repo_url: None
  • paper_authors: Lakshmikar R. Polamreddy, Youshan Zhang
  • for: 降低机动车事故的风险,提高自动驾驶技术的效果
  • methods: 提出了一种新的快速深度学习模型,称为`LaksNet’,包括四个卷积层和两个全连接层
  • results: 对于UDacity simulator中的训练数据,LaksNet模型与许多现有的预训练ImageNet和NVIDIA模型相比,在车辆无法离开轨道的时间方面表现出色。
    Abstract The majority of road accidents occur because of human errors, including distraction, recklessness, and drunken driving. One of the effective ways to overcome this dangerous situation is by implementing self-driving technologies in vehicles. In this paper, we focus on building an efficient deep-learning model for self-driving cars. We propose a new and effective convolutional neural network model called `LaksNet' consisting of four convolutional layers and two fully connected layers. We conduct extensive experiments using our LaksNet model with the training data generated from the Udacity simulator. Our model outperforms many existing pre-trained ImageNet and NVIDIA models in terms of the duration of the car for which it drives without going off the track on the simulator.
    摘要 大多数道路事故是由人为错误造成的,包括分心、鲁莽驾驶和酒后驾车。克服这种危险状况的有效途径之一是在车辆中应用自动驾驶技术。在这篇论文中,我们专注于为自动驾驶汽车构建高效的深度学习模型。我们提出了一个新的、有效的卷积神经网络模型,称为"LaksNet",它包括四个卷积层和两个全连接层。我们使用来自Udacity模拟器的训练数据,对LaksNet模型进行了广泛的实验。在模拟器上,我们的模型在车辆不偏离赛道的持续驾驶时长方面,优于许多现有的预训练ImageNet模型和NVIDIA模型。
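
The abstract fixes only the depth of LaksNet (four convolutional layers and two fully connected layers). The sketch below is one plausible instantiation as a steering-angle regressor for the Udacity simulator; the filter counts, kernel sizes, and the 66x200 input resolution are assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class LaksNetSketch(nn.Module):
    """Sketch of a 4-conv / 2-FC steering regressor for the Udacity
    simulator. Only the layer count comes from the abstract; all
    filter sizes and the input resolution are assumptions."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Flatten(),
        )
        # For an assumed 66x200 RGB input the conv stack leaves a 3x20 map
        # with 64 channels, i.e. 3840 flattened features.
        self.fc = nn.Sequential(
            nn.Linear(64 * 3 * 20, 100), nn.ReLU(),
            nn.Linear(100, 1),            # steering angle
        )

    def forward(self, x):
        return self.fc(self.conv(x))

if __name__ == "__main__":
    model = LaksNetSketch()
    frames = torch.randn(2, 3, 66, 200)
    print(model(frames).shape)            # torch.Size([2, 1])
```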

Learned, Uncertainty-driven Adaptive Acquisition for Photon-Efficient Multiphoton Microscopy

  • paper_url: http://arxiv.org/abs/2310.16102
  • repo_url: None
  • paper_authors: Cassandra Tong Ye, Jiashu Han, Kunzan Liu, Anastasios Angelopoulos, Linda Griffith, Kristina Monakhova, Sixian You
  • for: 这个论文是为了提高多光子镜像技术的信息准确性和速度而写的。
  • methods: 该论文使用了深度学习方法来降噪多光子镜像测量结果,同时也可以提供图像 pixel 层次的不确定性预测。
  • results: 该论文使用实验数据证明了其方法可以保持细节特征,并且在降噪和预测不确定性方面超过其他方法。此外,该方法还可以基于图像重新扫描优化采样技术,从而实现120倍的采样时间和总光谱强度减少。
    Abstract Multiphoton microscopy (MPM) is a powerful imaging tool that has been a critical enabler for live tissue imaging. However, since most multiphoton microscopy platforms rely on point scanning, there is an inherent trade-off between acquisition time, field of view (FOV), phototoxicity, and image quality, often resulting in noisy measurements when fast, large FOV, and/or gentle imaging is needed. Deep learning could be used to denoise multiphoton microscopy measurements, but these algorithms can be prone to hallucination, which can be disastrous for medical and scientific applications. We propose a method to simultaneously denoise and predict pixel-wise uncertainty for multiphoton imaging measurements, improving algorithm trustworthiness and providing statistical guarantees for the deep learning predictions. Furthermore, we propose to leverage this learned, pixel-wise uncertainty to drive an adaptive acquisition technique that rescans only the most uncertain regions of a sample. We demonstrate our method on experimental noisy MPM measurements of human endometrium tissues, showing that we can maintain fine features and outperform other denoising methods while predicting uncertainty at each pixel. Finally, with our adaptive acquisition technique, we demonstrate a 120X reduction in acquisition time and total light dose while successfully recovering fine features in the sample. We are the first to demonstrate distribution-free uncertainty quantification for a denoising task with real experimental data and the first to propose adaptive acquisition based on reconstruction uncertainty
    摘要 多光子显微成像(MPM)是一种强大的成像工具,是活体组织成像的关键推动技术。然而,由于大多数多光子显微平台采用点扫描方式,在采集时间、视场(FOV)、光毒性和图像质量之间存在固有的权衡,当需要快速、大视场和/或低光剂量成像时,往往会得到噪声较大的测量结果。深度学习可以用于多光子显微测量的去噪,但这些算法容易产生"幻觉",这对医学和科学应用可能是灾难性的。我们提出了一种方法,可同时对多光子成像测量进行去噪并预测像素级不确定性,从而提高算法的可信度,并为深度学习预测提供统计保证。此外,我们还提出利用学习到的像素级不确定性来驱动一种自适应采集技术,只对样本中最不确定的区域进行重新扫描。我们在人体子宫内膜组织的真实噪声MPM测量数据上演示了我们的方法,结果表明我们能够保留细微特征,在去噪方面优于其他方法,同时预测每个像素的不确定性。最后,借助自适应采集技术,我们在成功恢复样本细节特征的同时,将采集时间和总光剂量减少了120倍。我们是首个在真实实验数据的去噪任务上实现无分布假设的不确定性量化的工作,也是首个提出基于重建不确定性的自适应采集的工作。
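
A small sketch of how a per-pixel uncertainty map can drive adaptive acquisition: the field of view is divided into blocks, and only the most uncertain fraction of blocks is marked for rescanning. The block size and rescan budget are illustrative assumptions; the paper's actual scan-control logic may differ.

```python
import numpy as np

def rescan_mask(uncertainty, block=32, budget=0.1):
    """Choose which image blocks to rescan from a per-pixel uncertainty map.

    uncertainty : (H, W) predicted per-pixel uncertainty
    block       : side length of a scan block in pixels (assumed)
    budget      : fraction of blocks that may be rescanned (assumed)
    Returns a boolean (H, W) mask marking pixels to acquire again.
    """
    h, w = uncertainty.shape
    nh, nw = h // block, w // block
    # Mean uncertainty per block.
    blocks = uncertainty[:nh * block, :nw * block] \
        .reshape(nh, block, nw, block).mean(axis=(1, 3))
    # Keep the top `budget` fraction of blocks.
    k = max(1, int(budget * nh * nw))
    thresh = np.partition(blocks.ravel(), -k)[-k]
    chosen = blocks >= thresh
    mask = np.zeros((h, w), dtype=bool)
    mask[:nh * block, :nw * block] = np.repeat(
        np.repeat(chosen, block, axis=0), block, axis=1)
    return mask

if __name__ == "__main__":
    u = np.random.rand(256, 256)
    m = rescan_mask(u, block=32, budget=0.1)
    print(m.mean())   # roughly 0.1 of the field of view is rescanned
```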

Deep Feature Registration for Unsupervised Domain Adaptation

  • paper_url: http://arxiv.org/abs/2310.16100
  • repo_url: None
  • paper_authors: Youshan Zhang, Brian D. Davison
  • for: 本研究旨在探讨如何更好地对齐源域和目标域的特征,以提高预测性能。
  • methods: 我们提出了一种深度特征注册(DFR)模型,在保持域不变特征的同时,通过直方图匹配最小化注册特征与目标特征之间的域差异,从而实现域适应。此外,我们还提出了一种伪标签精炼过程,结合基于概率的软选择和基于中心的硬选择,以提高目标域中伪标签的质量。
  • results: 我们在多个UDA benchmark上进行了广泛的实验,结果表明我们的DFR模型能够提高预测性能,达到新的领先水平。
    Abstract While unsupervised domain adaptation has been explored to leverage the knowledge from a labeled source domain to an unlabeled target domain, existing methods focus on the distribution alignment between two domains. However, how to better align source and target features is not well addressed. In this paper, we propose a deep feature registration (DFR) model to generate registered features that maintain domain invariant features and simultaneously minimize the domain-dissimilarity of registered features and target features via histogram matching. We further employ a pseudo label refinement process, which considers both probabilistic soft selection and center-based hard selection to improve the quality of pseudo labels in the target domain. Extensive experiments on multiple UDA benchmarks demonstrate the effectiveness of our DFR model, resulting in new state-of-the-art performance.
    摘要 而尚未得到监督的领域适应(Unsupervised Domain Adaptation,UDA)已经被探索,以利用源领域的标注数据来适应目标领域的无标注数据。然而,如何更好地对应源和目标特征进行对应还没有得到充分的解决。在这篇论文中,我们提出了深度特征注册(Deep Feature Registration,DFR)模型,以生成具有领域不变特征的注册特征,同时使得注册特征和目标特征之间的差异最小化via histogram matching。此外,我们还employs一种假标签重新定义过程,包括概率软选择和中心基于硬选择,以提高目标领域的假标签质量。我们在多个UDA benchmark上进行了广泛的实验,并demonstrate了我们DFR模型的效果,从而实现了新的领先性表现。
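
One standard way to realize histogram matching between feature sets is quantile (sort-based) matching applied per feature dimension, sketched below; this is a generic illustration and not necessarily the exact formulation used in DFR.

```python
import numpy as np

def match_feature_histograms(source, target):
    """Per-dimension histogram matching of source features to target features.

    source : (Ns, D) source-domain feature vectors
    target : (Nt, D) target-domain feature vectors
    Each source dimension is remapped so its empirical distribution
    follows the corresponding target dimension (sort/quantile matching).
    """
    ns, d = source.shape
    matched = np.empty_like(source, dtype=np.float64)
    quantiles = np.linspace(0.0, 1.0, ns)
    for j in range(d):
        order = np.argsort(source[:, j])
        ranks = np.empty(ns, dtype=np.int64)
        ranks[order] = np.arange(ns)
        # Value each source rank should take under the target distribution.
        target_values = np.quantile(target[:, j], quantiles)
        matched[:, j] = target_values[ranks]
    return matched

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(0.0, 1.0, size=(1000, 4))
    tgt = rng.normal(3.0, 0.5, size=(800, 4))
    out = match_feature_histograms(src, tgt)
    print(out.mean(axis=0).round(2), out.std(axis=0).round(2))
```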

From Posterior Sampling to Meaningful Diversity in Image Restoration

  • paper_url: http://arxiv.org/abs/2310.16047
  • repo_url: None
  • paper_authors: Noa Cohen, Hila Manor, Yuval Bahat, Tomer Michaeli
  • for: 这个论文主要目标是解决图像恢复中的多样性问题,即每个降低图像都可以有无数多个有效的恢复方案。
  • methods: 作者提出了一些后处理技术,可以与多样性图像恢复方法结合使用,以生成符号意义的多样性输出。此外,作者还提出了一种实用的方法,使得Diffusion based图像恢复方法可以生成多样性输出,而且只增加了negligible的计算开销。
  • results: 作者通过广泛的用户研究发现,减少输出之间的相似性是与后采样的策略相比significantly有利的。codes和示例可以在https://noa-cohen.github.io/MeaningfulDiversityInIR找到。
    Abstract Image restoration problems are typically ill-posed in the sense that each degraded image can be restored in infinitely many valid ways. To accommodate this, many works generate a diverse set of outputs by attempting to randomly sample from the posterior distribution of natural images given the degraded input. Here we argue that this strategy is commonly of limited practical value because of the heavy tail of the posterior distribution. Consider for example inpainting a missing region of the sky in an image. Since there is a high probability that the missing region contains no object but clouds, any set of samples from the posterior would be entirely dominated by (practically identical) completions of sky. However, arguably, presenting users with only one clear sky completion, along with several alternative solutions such as airships, birds, and balloons, would better outline the set of possibilities. In this paper, we initiate the study of meaningfully diverse image restoration. We explore several post-processing approaches that can be combined with any diverse image restoration method to yield semantically meaningful diversity. Moreover, we propose a practical approach for allowing diffusion based image restoration methods to generate meaningfully diverse outputs, while incurring only negligent computational overhead. We conduct extensive user studies to analyze the proposed techniques, and find the strategy of reducing similarity between outputs to be significantly favorable over posterior sampling. Code and examples are available in https://noa-cohen.github.io/MeaningfulDiversityInIR
    摘要 Image restoration problems typically have infinite solutions, making it difficult to choose the correct one. Many methods generate diverse outputs by sampling from the posterior distribution of natural images. However, this approach is limited because the posterior distribution has a heavy tail, and the generated outputs are often similar. For example, when inpainting a missing region of the sky in an image, the posterior distribution is likely to contain only sky, with little variation. Instead of presenting users with multiple identical sky completions, it would be more useful to show alternative solutions such as airships, birds, and balloons. In this paper, we study meaningfully diverse image restoration and propose several post-processing approaches to achieve semantically meaningful diversity. We also propose a practical approach to generate diverse outputs with negligible computational overhead. We conduct user studies to evaluate the proposed techniques and find that reducing similarity between outputs is significantly favorable over posterior sampling. Code and examples are available at https://noa-cohen.github.io/MeaningfulDiversityInIR.
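
One simple post-processing step in the spirit described above is to draw many posterior samples and then greedily keep a small, mutually dissimilar subset (farthest-point selection). In the sketch, plain L2 distance stands in for whatever perceptual distance one would actually use.

```python
import numpy as np

def select_diverse(samples, k=5):
    """Greedy farthest-point selection of k diverse restorations.

    samples : (N, ...) candidate restorations drawn from the posterior
    Returns indices of k samples that are mutually far apart
    (plain L2 distance stands in for a perceptual distance).
    """
    flat = samples.reshape(len(samples), -1).astype(np.float64)
    # Start from the sample closest to the mean (a "typical" restoration).
    start = int(np.argmin(np.linalg.norm(flat - flat.mean(0), axis=1)))
    chosen = [start]
    min_dist = np.linalg.norm(flat - flat[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(flat - flat[nxt], axis=1))
    return chosen

if __name__ == "__main__":
    candidates = np.random.rand(64, 3, 32, 32)   # 64 posterior samples
    print(select_diverse(candidates, k=5))
```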

Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark

  • paper_url: http://arxiv.org/abs/2310.16044
  • repo_url: None
  • paper_authors: Zhengfei Kuang, Yunzhi Zhang, Hong-Xing Yu, Samir Agarwala, Shangzhe Wu, Jiajun Wu
  • for: 这个论文旨在提供一个真实世界的3D物体反向渲染测试准则,以评估和比较不同反向渲染方法的性能。
  • methods: 这篇论文使用了一个新的真实世界对象库,包含了各种不同的自然场景下的物体捕捉数据,以及相应的3D扫描数据、多视图图像和环境照明数据。
  • results: 该论文首次在真实世界场景下提供了一个完整的对象反向渲染评估准则,并对多种现有方法进行了比较。
    Abstract We introduce Stanford-ORB, a new real-world 3D Object inverse Rendering Benchmark. Recent advances in inverse rendering have enabled a wide range of real-world applications in 3D content generation, moving rapidly from research and commercial use cases to consumer devices. While the results continue to improve, there is no real-world benchmark that can quantitatively assess and compare the performance of various inverse rendering methods. Existing real-world datasets typically only consist of the shape and multi-view images of objects, which are not sufficient for evaluating the quality of material recovery and object relighting. Methods capable of recovering material and lighting often resort to synthetic data for quantitative evaluation, which on the other hand does not guarantee generalization to complex real-world environments. We introduce a new dataset of real-world objects captured under a variety of natural scenes with ground-truth 3D scans, multi-view images, and environment lighting. Using this dataset, we establish the first comprehensive real-world evaluation benchmark for object inverse rendering tasks from in-the-wild scenes, and compare the performance of various existing methods.
    摘要 我们引入 Stanford-ORB,一个新的真实世界3D物体逆渲染评测基准。逆渲染领域的最新进展已经使3D内容生成的各种真实世界应用成为可能,并迅速从研究和商业用例走向消费级设备。尽管结果不断改进,但目前还没有一个真实世界基准能够定量评估和比较各种逆渲染方法的性能。现有的真实世界数据集通常只包含物体的形状和多视角图像,不足以评估材质恢复和物体重光照的质量。能够恢复材质和光照的方法往往只能借助合成数据进行定量评估,而这又无法保证其能泛化到复杂的真实世界环境。我们引入了一个新的真实物体数据集,这些物体在多种自然场景下采集,并配有真实3D扫描、多视角图像和环境光照。基于该数据集,我们建立了首个针对真实场景中物体逆渲染任务的全面真实世界评测基准,并比较了多种现有方法的性能。

ConvBKI: Real-Time Probabilistic Semantic Mapping Network with Quantifiable Uncertainty

  • paper_url: http://arxiv.org/abs/2310.16020
  • repo_url: None
  • paper_authors: Joey Wilson, Yuewei Fu, Joshua Friesen, Parker Ewen, Andrew Capodieci, Paramsothy Jayakumar, Kira Barton, Maani Ghaffari
  • for: 这个论文旨在开发一种卷积神经网络,用于实时 semantic mapping 在不确定环境中。
  • methods: 该方法在神经网络层内显式更新每个体素的概率分布,将经典概率算法的可靠性与现代神经网络的性能和效率相结合。
  • results: 研究人员通过比较 ConvBKI 与现有深度学习方法和概率算法,发现 ConvBKI 具有更高的可靠性和性能。在实际的实验中,ConvBKI 在具有困难的感知 task 中表现出色。
    Abstract In this paper, we develop a modular neural network for real-time semantic mapping in uncertain environments, which explicitly updates per-voxel probabilistic distributions within a neural network layer. Our approach combines the reliability of classical probabilistic algorithms with the performance and efficiency of modern neural networks. Although robotic perception is often divided between modern differentiable methods and classical explicit methods, a union of both is necessary for real-time and trustworthy performance. We introduce a novel Convolutional Bayesian Kernel Inference (ConvBKI) layer which incorporates semantic segmentation predictions online into a 3D map through a depthwise convolution layer by leveraging conjugate priors. We compare ConvBKI against state-of-the-art deep learning approaches and probabilistic algorithms for mapping to evaluate reliability and performance. We also create a Robot Operating System (ROS) package of ConvBKI and test it on real-world perceptually challenging off-road driving data.
    摘要 在这篇论文中,我们开发了一种模块化神经网络,用于实时semantic mapping在不确定环境中,这种神经网络层中的每个小块拥有精确的概率分布。我们的方法结合了经典的概率算法和现代神经网络的性能和效率。虽然机器人观察通常被分为现代差分方法和经典显式方法,但是两者的结合是实时和可靠性的关键。我们引入了一种新的卷积 bayesian kernel inference(ConvBKI)层,通过 conjugate priors 将semantic segmentation预测结果在3D地图中进行深度wise convolution,并在实时更新per-voxel的概率分布。我们将ConvBKI与当前的深度学习方法和概率算法进行比较,以评估可靠性和性能。我们还创建了一个 Robot Operating System(ROS)包,并在实际的困难的Off-road驾驶数据上测试了ConvBKI。
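
The abstract describes explicitly updating per-voxel probabilistic distributions with a depthwise convolution and conjugate priors. The 2D toy sketch below shows one way such an update can look: per-cell Dirichlet concentrations are incremented by semantic evidence that has been spread to neighbours with a depthwise Gaussian kernel. The kernel shape, its size, and the 2D (rather than 3D) grid are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def bki_update(alpha, semantic_probs, kernel_size=5, sigma=1.0):
    """One Bayesian-kernel-inference style map update (2D toy version).

    alpha          : (C, H, W) current Dirichlet concentration per cell/class
    semantic_probs : (C, H, W) per-cell class probabilities from a segmenter
    The new evidence is spread to neighbouring cells with a depthwise
    Gaussian kernel and simply added to the conjugate Dirichlet prior.
    """
    c = alpha.shape[0]
    # Build a Gaussian kernel, one copy per class (depthwise convolution).
    ax = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
    g1d = torch.exp(-0.5 * (ax / sigma) ** 2)
    k2d = torch.outer(g1d, g1d)
    k2d = k2d / k2d.sum()
    weight = k2d.repeat(c, 1, 1, 1)              # (C, 1, k, k)
    evidence = F.conv2d(semantic_probs.unsqueeze(0), weight,
                        padding=kernel_size // 2, groups=c).squeeze(0)
    new_alpha = alpha + evidence                 # conjugate Dirichlet update
    expected = new_alpha / new_alpha.sum(dim=0, keepdim=True)
    return new_alpha, expected                   # posterior mean per cell

if __name__ == "__main__":
    alpha0 = torch.ones(4, 64, 64)               # uniform prior, 4 classes
    probs = torch.softmax(torch.randn(4, 64, 64), dim=0)
    alpha1, mean = bki_update(alpha0, probs)
    print(alpha1.shape, mean.sum(dim=0).mean())  # class means sum to ~1 per cell
```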

CVPR 2023 Text Guided Video Editing Competition

  • paper_url: http://arxiv.org/abs/2310.16003
  • repo_url: https://github.com/showlab/loveu-tgve-2023
  • paper_authors: Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, Jinbin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest Iandola
  • for: 这 paper 是为了提出一个新的视频编辑数据集(TGVE),并且在 CVPR 会议上进行了一场竞赛,以评估模型在这个数据集上的性能。
  • methods: 这 paper 使用了文本到图像模型(如 Stable Diffusion 和 Imagen)进行生成 AI,并且提出了一个新的竞赛数据集来评估这些模型。
  • results: 这篇论文回顾了该竞赛,描述了获胜方法,并评估了参赛模型在TGVE数据集上的性能。
    Abstract Humans watch more than a billion hours of video per day. Most of this video was edited manually, which is a tedious process. However, AI-enabled video-generation and video-editing is on the rise. Building on text-to-image models like Stable Diffusion and Imagen, generative AI has improved dramatically on video tasks. But it's hard to evaluate progress in these video tasks because there is no standard benchmark. So, we propose a new dataset for text-guided video editing (TGVE), and we run a competition at CVPR to evaluate models on our TGVE dataset. In this paper we present a retrospective on the competition and describe the winning method. The competition dataset is available at https://sites.google.com/view/loveucvpr23/track4.
    摘要 人们每天观看超过10亿小时的视频,其中大多数视频都是人工编辑的,这是一项繁琐的过程。不过,AI驱动的视频生成与视频编辑正在兴起。基于Stable Diffusion和Imagen等文本到图像模型,生成式AI在视频任务上取得了显著进步。然而,由于缺乏标准基准,这些视频任务的进展很难评估。因此,我们提出了一个新的文本引导视频编辑(TGVE)数据集,并在CVPR上举办了一场竞赛,以评估各模型在TGVE数据集上的性能。在这篇论文中,我们回顾了这场竞赛,并描述了获胜方法。竞赛数据集可在以下链接获取:https://sites.google.com/view/loveucvpr23/track4。

Integrating View Conditions for Image Synthesis

  • paper_url: http://arxiv.org/abs/2310.16002
  • repo_url: https://github.com/viiika/viewcontrol
  • paper_authors: Jinbin Bai, Zhen Dong, Aosong Feng, Xiao Zhang, Tian Ye, Kaicheng Zhou, Mike Zheng Shou
  • for: 图像处理领域中进行细致 semantic 修改现有图像的挑战。
  • methods: 该 paper 提出了一种创新的框架,通过视点信息来增强图像编辑任务的控制。
  • results: 通过对现有方法进行评估和与当前领先方法进行比较,我们提供了吸引人的证据,证明我们的框架在多个维度上表现出色。
    Abstract In the field of image processing, applying intricate semantic modifications within existing images remains an enduring challenge. This paper introduces a pioneering framework that integrates viewpoint information to enhance the control of image editing tasks. By surveying existing object editing methodologies, we distill three essential criteria, consistency, controllability, and harmony, that should be met for an image editing method. In contrast to previous approaches, our method takes the lead in satisfying all three requirements for addressing the challenge of image synthesis. Through comprehensive experiments, encompassing both quantitative assessments and qualitative comparisons with contemporary state-of-the-art methods, we present compelling evidence of our framework's superior performance across multiple dimensions. This work establishes a promising avenue for advancing image synthesis techniques and empowering precise object modifications while preserving the visual coherence of the entire composition.
    摘要 在图像处理领域,在现有图像中进行细腻 semantic 修改仍然是一项挑战。这篇论文介绍了一种先进的框架,将视点信息integrated到图像编辑任务中,以提高图像编辑的控制能力。通过对现有 объек editing 方法进行检查,我们提炼出三个 essence criterion,即一致性、可控性和和谐性,这些 criterion 应被满足以解决图像合成的挑战。与先前的方法不同,我们的方法在这些 criterion 中脱颖而出,并在多个维度上提供了吸引人的证据。通过全面的实验,包括量化评估和当今领先的方法进行比较,我们展示了我们的框架在多个维度的优秀表现。这项工作为图像合成技术的发展开辟了新的 Avenues,并为精准对象修改而保持整体图像的视觉一致性提供了新的能力。

Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships

  • paper_url: http://arxiv.org/abs/2310.15999
  • repo_url: https://github.com/abhrac/trd
  • paper_authors: Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta
  • for: 本研究旨在批处细腻表示学习中的抽象关系学习,以实现状态环境下的最佳结果。
  • methods: 本研究使用的方法是基于图像视图的可读性图表示抽象关系,并通过图像视图之间的关系恢复抽象关系的含义。
  • results: 研究表明,使用可读性图表示抽象关系可以提高表示学习的可读性和稳定性,并且可以与现有的状态环境下的最佳结果匹配。
    Abstract Recent advances in fine-grained representation learning leverage local-to-global (emergent) relationships for achieving state-of-the-art results. The relational representations relied upon by such methods, however, are abstract. We aim to deconstruct this abstraction by expressing them as interpretable graphs over image views. We begin by theoretically showing that abstract relational representations are nothing but a way of recovering transitive relationships among local views. Based on this, we design Transitivity Recovering Decompositions (TRD), a graph-space search algorithm that identifies interpretable equivalents of abstract emergent relationships at both instance and class levels, and with no post-hoc computations. We additionally show that TRD is provably robust to noisy views, with empirical evidence also supporting this finding. The latter allows TRD to perform at par or even better than the state-of-the-art, while being fully interpretable. Implementation is available at https://github.com/abhrac/trd.
    摘要 最近的细腻表示学技术发展借鉴本地-全局(emergent)关系以实现当前最佳效果。这些方法使用的关系表示是抽象的。我们想要拆分这种抽象,表示它们为可解释的图像视图。我们开始通过理论来证明,抽象的关系表示实际上是地址本地视图之间的转移关系。基于这一点,我们设计了归纳恢复分解(TRD),一种图像空间搜索算法,可以在实例和类层次上寻找可解释的相似关系,无需后续计算。我们还证明了TRD在噪音视图下的Robustness,并且有实际证明支持这一点。这意味着TRD可以与当前最佳性能相当或更高,同时具有可解释性。实现可以在https://github.com/abhrac/trd上找到。

Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning

  • paper_url: http://arxiv.org/abs/2310.15985
  • repo_url: https://github.com/mvrl/vlpl
  • paper_authors: Xin Xing, Zhexiao Xiong, Abby Stylianou, Srikumar Sastry, Liyu Gong, Nathan Jacobs
  • for: 这篇论文旨在解决单个图像可以同时属于多个类别、但训练数据中每张图像只有一个标注的问题。
  • methods: 该论文提出了一种新的视觉语言伪标签方法(VLPL),使用视觉语言模型给出强正例和强负例伪标签。
  • results: 实验结果表明,VLPL相比当前最佳方法,在 Pascal VOC、MS-COCO、NUS-WIDE 和 CUB-Birds 四个数据集上分别提升了5.5%、18.4%、15.2% 和8.4%。
    Abstract This paper presents a novel approach to Single-Positive Multi-label Learning. In general multi-label learning, a model learns to predict multiple labels or categories for a single input image. This is in contrast with standard multi-class image classification, where the task is predicting a single label from many possible labels for an image. Single-Positive Multi-label Learning (SPML) specifically considers learning to predict multiple labels when there is only a single annotation per image in the training data. Multi-label learning is in many ways a more realistic task than single-label learning as real-world data often involves instances belonging to multiple categories simultaneously; however, most common computer vision datasets predominantly contain single labels due to the inherent complexity and cost of collecting multiple high quality annotations for each instance. We propose a novel approach called Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative pseudo-labels, and outperforms the current SOTA methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds. Our code and data are available at https://github.com/mvrl/VLPL.
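
A small sketch of the pseudo-labeling idea: given embeddings from any CLIP-like vision-language model, class-name similarities above one threshold become pseudo-positives and those below another become pseudo-negatives, while the single annotated label is always kept positive. The thresholds and the decision rule are illustrative assumptions, not VLPL's exact procedure.

```python
import numpy as np

def vlpl_pseudo_labels(image_emb, label_emb, observed_label,
                       pos_thresh=0.3, neg_thresh=0.1):
    """Derive multi-label pseudo-labels from vision-language similarities.

    image_emb      : (D,) L2-normalized image embedding from a CLIP-like model
    label_emb      : (C, D) L2-normalized text embeddings of the class names
    observed_label : index of the single annotated positive class
    Returns a (C,) vector with 1 = pseudo-positive, 0 = pseudo-negative,
    and NaN = left unlabelled. Thresholds are illustrative assumptions.
    """
    sims = label_emb @ image_emb                 # cosine similarity per class
    pseudo = np.full(len(label_emb), np.nan)
    pseudo[sims >= pos_thresh] = 1.0             # strong positives
    pseudo[sims <= neg_thresh] = 0.0             # strong negatives
    pseudo[observed_label] = 1.0                 # the single ground-truth label
    return pseudo

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=128); img /= np.linalg.norm(img)
    txt = rng.normal(size=(20, 128))
    txt /= np.linalg.norm(txt, axis=1, keepdims=True)
    print(vlpl_pseudo_labels(img, txt, observed_label=3))
```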

Geometry-Aware Video Quality Assessment for Dynamic Digital Human

  • paper_url: http://arxiv.org/abs/2310.15984
  • repo_url: None
  • paper_authors: Zicheng Zhang, Yingjie Zhou, Wei Sun, Xiongkuo Min, Guangtao Zhai
  • for: 本研究旨在提出一种基于DDH的无参考视频质量评估方法,以解决DDH在生成和传输过程中受到噪声和压缩扭曲的问题。
  • methods: 本方法使用DDH的统计参数来描述几何特征,从渲染视频中提取空间和时间特征,并将所有特征集成并回归到质量值上。
  • results: 实验结果表明,提出的方法在DDH-QA数据库上达到了状态对应的性能。
    Abstract Dynamic Digital Humans (DDHs) are 3D digital models that are animated using predefined motions and are inevitably bothered by noise/shift during the generation process and compression distortion during the transmission process, which needs to be perceptually evaluated. Usually, DDHs are displayed as 2D rendered animation videos and it is natural to adapt video quality assessment (VQA) methods to DDH quality assessment (DDH-QA) tasks. However, the VQA methods are highly dependent on viewpoints and less sensitive to geometry-based distortions. Therefore, in this paper, we propose a novel no-reference (NR) geometry-aware video quality assessment method for DDH-QA challenge. Geometry characteristics are described by the statistical parameters estimated from the DDHs' geometry attribute distributions. Spatial and temporal features are acquired from the rendered videos. Finally, all kinds of features are integrated and regressed into quality values. Experimental results show that the proposed method achieves state-of-the-art performance on the DDH-QA database.
    摘要 “几何动态人体”(DDH)是一种三维数字模型,通过预先定义的动作来动态显示,但是在生成和传输过程中受到噪音/移动的干扰,需要进行感知评估。通常,DDH会被显示为2Drendered动画影片,因此可以将影片质量评估(VQA)方法应用到DDH质量评估(DDH-QA)任务中。但是,VQA方法对于视点有高度依赖,较少关注几何基于的扭曲。因此,在本文中,我们提出一种新的无参考(NR)几何意识的影片质量评估方法,用于DDH-QA挑战。几何特征被描述为DDH几何特征分布的统计参数。从rendered影片中获取的空间和时间特征。最后,所有类型的特征被统合并对质量值进行回推。实验结果显示,提议的方法在DDH-QA数据库中实现了国际级的表现。

Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

  • paper_url: http://arxiv.org/abs/2310.15955
  • repo_url: None
  • paper_authors: Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li
  • for: 这个研究旨在提高DETR的物体检测性能。
  • methods: 这篇研究使用了DETR的核心架构,并将分类和位置检测转化为不同的任务,以解决DETR的训练问题。
  • results: 这篇研究在MSCOCO数据集上进行了广泛的实验,并证明了该方法可以提高DETR的表现,比如提高Conditional DETR的AP值4.5。
    Abstract The introduction of DETR represents a new paradigm for object detection. However, its decoder conducts classification and box localization using shared queries and cross-attention layers, leading to suboptimal results. We observe that different regions of interest in the visual feature map are suitable for performing query classification and box localization tasks, even for the same object. Salient regions provide vital information for classification, while the boundaries around them are more favorable for box regression. Unfortunately, such spatial misalignment between these two tasks greatly hinders DETR's training. Therefore, in this work, we focus on decoupling localization and classification tasks in DETR. To achieve this, we introduce a new design scheme called spatially decoupled DETR (SD-DETR), which includes a task-aware query generation module and a disentangled feature learning process. We elaborately design the task-aware query initialization process and divide the cross-attention block in the decoder to allow the task-aware queries to match different visual regions. Meanwhile, we also observe that the prediction misalignment problem for high classification confidence and precise localization exists, so we propose an alignment loss to further guide the spatially decoupled DETR training. Through extensive experiments, we demonstrate that our approach achieves a significant improvement in MSCOCO datasets compared to previous work. For instance, we improve the performance of Conditional DETR by 4.5 AP. By spatially disentangling the two tasks, our method overcomes the misalignment problem and greatly improves the performance of DETR for object detection.
    摘要 DETR的引入标志着对象检测领域的新 paradigm。然而,DETR的解码器使用共享查询和交叉注意层进行类别和框定位任务,导致不佳的结果。我们发现不同的视觉特征图像区域适合进行查询类别和框定位任务,即使是同一个对象。焦点区域提供了关键的信息 для类别,而周围的边缘区域更适合框定位。然而,这种视觉空间不一致性使得DETR的训练受到很大的阻碍。因此,在这项工作中,我们关注DETR中的地方分解。为达到这一目标,我们提出了一种名为空间地解决DETR(SD-DETR)的新设计方案。我们仔细设计了任务意识查询初始化过程,并将权重分配块分解成多个视觉区域。同时,我们也发现了高分类信息和精准定位预测不一致的问题,因此我们提出了一种对齐损失来进一步引导空间地解DETR的训练。通过广泛的实验,我们证明了我们的方法在COCO dataset上比前一次的工作提高了4.5个AP。通过空间地解DETR,我们解决了视觉空间不一致性问题,并大幅提高了DETR对象检测的性能。

Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles

  • paper_url: http://arxiv.org/abs/2310.15952
  • repo_url: https://github.com/xingbpshen/nested-diffusion
  • paper_authors: Xing Shen, Hengguan Huang, Brennan Nichyporuk, Tal Arbel
  • for: 提高深度学习模型在医疗图像分析任务中的Robustness,无需预先定义数据增强策略。
  • methods: 提出了一种基于转换器和condition diffusion模型的三stage方法,通过建立层次特征表示,采用反卷积过程和生成型预测方法,提高模型对医疗图像变化的Robustness。
  • results: 通过对医疗图像benchmark数据集进行广泛的实验,示出了方法在比较状态方法的Robustness和信任折衔calibration方面的改进,同时也提出了实例水平预测不确定性的评估策略。
    Abstract While deep learning models have achieved remarkable success across a range of medical image analysis tasks, deployment of these models in real clinical contexts requires that they be robust to variability in the acquired images. While many methods apply predefined transformations to augment the training data to enhance test-time robustness, these transformations may not ensure the model's robustness to the diverse variability seen in patient images. In this paper, we introduce a novel three-stage approach based on transformers coupled with conditional diffusion models, with the goal of improving model robustness to the kinds of imaging variability commonly encountered in practice without the need for pre-determined data augmentation strategies. To this end, multiple image encoders first learn hierarchical feature representations to build discriminative latent spaces. Next, a reverse diffusion process, guided by the latent code, acts on an informative prior and proposes prediction candidates in a generative manner. Finally, several prediction candidates are aggregated in a bi-level aggregation protocol to produce the final output. Through extensive experiments on medical imaging benchmark datasets, we show that our method improves upon state-of-the-art methods in terms of robustness and confidence calibration. Additionally, we introduce a strategy to quantify the prediction uncertainty at the instance level, increasing their trustworthiness to clinicians using them in clinical practice.
    摘要 深度学习模型在医疗影像分析任务中已经取得了显著成功,但要在真实临床环境中部署,这些模型必须对影像采集过程中的各种变化保持鲁棒。许多方法通过预先定义的变换来增强训练数据,以提高测试时的鲁棒性,但这些变换未必能保证模型对患者影像中多样变化的鲁棒性。在这篇论文中,我们提出了一种基于Transformer与条件扩散模型的新型三阶段方法,旨在无需预先设定数据增强策略的情况下,提高模型对实际中常见的影像变化的鲁棒性。具体而言,多个图像编码器首先学习层次化特征表示,以构建具有判别性的潜在空间。随后,在潜在编码的引导下,逆扩散过程作用于一个信息丰富的先验,以生成方式提出多个预测候选。最后,多个预测候选通过双层聚合协议进行融合,得到最终输出。通过在医疗影像基准数据集上进行的大量实验,我们表明所提方法在鲁棒性和置信度校准方面均优于当前最先进的方法。此外,我们还提出了一种实例级的预测不确定性量化策略,提升了临床医生在实践中使用这些预测时的可信度。

Language-driven Scene Synthesis using Multi-conditional Diffusion Model

  • paper_url: http://arxiv.org/abs/2310.15948
  • repo_url: https://github.com/andvg3/LSDM
  • paper_authors: An Vuong, Minh Nhat Vu, Toan Tien Nguyen, Baoru Huang, Dzung Nguyen, Thieu Vo, Anh Nguyen
  • for: 本研究旨在提出一种语言驱动的场景生成任务,该任务将文本提示、人体动作和现有物品综合使用,以生成自然的场景。
  • methods: 我们提出了一种多 condional 干扰模型,该模型通过显式预测导向点来处理和编码多个条件,并且与其他干扰文献的隐式统一方法不同。
  • results: 我们的方法在实验中表现出色,超过了当前标准准则,并允许自然的场景编辑应用。
    Abstract Scene synthesis is a challenging problem with several industrial applications. Recently, substantial efforts have been directed to synthesize the scene using human motions, room layouts, or spatial graphs as the input. However, few studies have addressed this problem from multiple modalities, especially combining text prompts. In this paper, we propose a language-driven scene synthesis task, which is a new task that integrates text prompts, human motion, and existing objects for scene synthesis. Unlike other single-condition synthesis tasks, our problem involves multiple conditions and requires a strategy for processing and encoding them into a unified space. To address the challenge, we present a multi-conditional diffusion model, which differs from the implicit unification approach of other diffusion literature by explicitly predicting the guiding points for the original data distribution. We demonstrate that our approach is theoretically supportive. The intensive experiment results illustrate that our method outperforms state-of-the-art benchmarks and enables natural scene editing applications. The source code and dataset can be accessed at https://lang-scene-synth.github.io/.
    摘要 Scene synthesis is a challenging problem with many industrial applications. Recently, a lot of effort has been put into synthesizing scenes using human motions, room layouts, or spatial graphs as input. However, few studies have addressed this problem from multiple modalities, especially combining text prompts. In this paper, we propose a new task called language-driven scene synthesis, which integrates text prompts, human motion, and existing objects for scene synthesis. Unlike other single-condition synthesis tasks, our problem involves multiple conditions and requires a strategy for processing and encoding them into a unified space. To address the challenge, we present a multi-conditional diffusion model, which differs from the implicit unification approach of other diffusion literature by explicitly predicting the guiding points for the original data distribution. We demonstrate that our approach is theoretically supportive. The intensive experiment results show that our method outperforms state-of-the-art benchmarks and enables natural scene editing applications. The source code and dataset can be accessed at https://lang-scene-synth.github.io/.

ShARc: Shape and Appearance Recognition for Person Identification In-the-wild

  • paper_url: http://arxiv.org/abs/2310.15946
  • repo_url: None
  • paper_authors: Haidong Zhu, Wanrong Zheng, Zhaoheng Zheng, Ram Nevatia
  • for: 这篇论文旨在提出一种多模态方法,用于在无控制环境中进行人识别,以提高人识别精度。
  • methods: 该方法使用两个编码器:姿态与形状编码器(PSE)和聚合外观编码器(AAE)。PSE 通过二值化轮廓、骨骼运动和三维体形来编码人体形状,而 AAE 提供两种时间域特征聚合方式:基于注意力的特征聚合和平均聚合。
  • results: 在公共数据集(CCVID、MEVID 和 BRIAR)上的对比实验表明,该方法在人员识别任务中显著优于现有方法。
    Abstract Identifying individuals in unconstrained video settings is a valuable yet challenging task in biometric analysis due to variations in appearances, environments, degradations, and occlusions. In this paper, we present ShARc, a multimodal approach for video-based person identification in uncontrolled environments that emphasizes 3-D body shape, pose, and appearance. We introduce two encoders: a Pose and Shape Encoder (PSE) and an Aggregated Appearance Encoder (AAE). PSE encodes the body shape via binarized silhouettes, skeleton motions, and 3-D body shape, while AAE provides two levels of temporal appearance feature aggregation: attention-based feature aggregation and averaging aggregation. For attention-based feature aggregation, we employ spatial and temporal attention to focus on key areas for person distinction. For averaging aggregation, we introduce a novel flattening layer after averaging to extract more distinguishable information and reduce overfitting of attention. We utilize centroid feature averaging for gallery registration. We demonstrate significant improvements over existing state-of-the-art methods on public datasets, including CCVID, MEVID, and BRIAR.
    摘要 在无约束的视频环境中识别个人身份是生物特征分析中一项有价值但具有挑战性的任务,因为外观、环境、图像退化和遮挡都会发生变化。本文提出 ShARc,一种面向非受控环境的多模态视频人员识别方法,强调三维体形、姿态与外观。我们引入两个编码器:姿态与形状编码器(PSE)和聚合外观编码器(AAE)。PSE 通过二值化轮廓、骨骼运动和三维体形来编码身体形状,AAE 则提供两种时间维度的外观特征聚合:基于注意力的特征聚合和平均聚合。对于基于注意力的聚合,我们利用空间和时间注意力聚焦于区分个人的关键区域;对于平均聚合,我们在平均之后引入一个新的展平层,以提取更具判别力的信息并减少注意力的过拟合。我们使用质心特征平均进行图库注册。在 CCVID、MEVID 和 BRIAR 等公共数据集上,该方法相比现有最先进方法取得了显著提升。
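
As a rough illustration of the averaging-aggregation idea described above (temporal averaging of per-frame appearance features followed by a learned flattening layer, plus centroid feature averaging for gallery registration), here is a minimal PyTorch sketch; the layer sizes, the flattening design, and the helper names are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class AveragingAggregator(nn.Module):
    """Averages per-frame appearance features over time, then applies a learned
    'flattening' projection, loosely following the AAE description above."""
    def __init__(self, feat_dim=512, out_dim=256):  # dimensions are assumptions
        super().__init__()
        self.flatten = nn.Sequential(
            nn.Linear(feat_dim, out_dim),
            nn.BatchNorm1d(out_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, frame_feats):          # (B, T, feat_dim)
        clip_feat = frame_feats.mean(dim=1)  # temporal averaging -> (B, feat_dim)
        return self.flatten(clip_feat)       # (B, out_dim)

def register_gallery(track_embeddings):
    """Centroid feature averaging for gallery registration: one centroid per identity."""
    return {pid: torch.stack(embs).mean(dim=0) for pid, embs in track_embeddings.items()}

# toy usage
agg = AveragingAggregator()
feats = torch.randn(4, 16, 512)              # 4 tracks, 16 frames each
emb = agg(feats)
gallery = register_gallery({"id_0": [emb[0], emb[1]], "id_1": [emb[2], emb[3]]})
```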

RePoseDM: Recurrent Pose Alignment and Gradient Guidance for Pose Guided Image Synthesis

  • paper_url: http://arxiv.org/abs/2310.16074
  • repo_url: None
  • paper_authors: Anant Khandelwal
  • for: 面向姿态引导的人像图像合成任务:需要重新渲染参考图像,同时保持照片级真实感并实现无瑕疵的姿态迁移。由于人像图像高度结构化,现有方法通常依赖密集连接,通过隐空间中的多级扭曲与掩码来处理复杂的形变和遮挡;但卷积神经网络生成的特征图不具备等变性,即使进行多级扭曲,姿态对齐也并不完美。受扩散模型能够依据条件引导生成照片级真实图像的启发,我们提出循环姿态对齐,以提供姿态对齐的纹理特征作为条件引导;此外,我们还提出基于姿态交互场的梯度引导,它输出目标姿态到有效姿态流形的距离,有助于学习合理的姿态迁移轨迹,从而保持照片级真实感和不失真的纹理细节。
  • methods: 循环姿态对齐(recurrent pose alignment);基于姿态交互场(pose interaction fields)的梯度引导
  • results: 照片级真实的外观、无瑕疵的姿态迁移、合理的姿态迁移轨迹、高效的梯度引导
    Abstract Pose-guided person image synthesis task requires re-rendering a reference image, which should have a photorealistic appearance and flawless pose transfer. Since person images are highly structured, existing approaches require dense connections for complex deformations and occlusions because these are generally handled through multi-level warping and masking in latent space. But the feature maps generated by convolutional neural networks do not have equivariance, and hence even the multi-level warping does not have a perfect pose alignment. Inspired by the ability of the diffusion model to generate photorealistic images from the given conditional guidance, we propose recurrent pose alignment to provide pose-aligned texture features as conditional guidance. Moreover, we propose gradient guidance from pose interaction fields, which output the distance from the valid pose manifold given a target pose as input. This helps in learning plausible pose transfer trajectories that result in photorealism and undistorted texture details. Extensive results on two large-scale benchmarks and a user study demonstrate the ability of our proposed approach to generate photorealistic pose transfer under challenging scenarios. Additionally, we prove the efficiency of gradient guidance in pose-guided image generation on the HumanArt dataset with fine-tuned stable diffusion.
    摘要 姿态引导的人像图像合成任务需要重新渲染参考图像,生成结果应具有照片级真实的外观和无瑕疵的姿态迁移。由于人像图像高度结构化,现有方法通常需要密集连接来处理复杂的形变和遮挡。但卷积神经网络生成的特征图不具备等变性,因此即使在隐空间中使用多级扭曲和掩码,也无法得到完美的姿态对齐。为解决这一问题,我们提出循环姿态对齐,提供姿态对齐的纹理特征作为条件引导。此外,我们还提出来自姿态交互场的梯度引导,它以目标姿态为输入,输出其到有效姿态流形的距离,有助于学习合理的姿态迁移轨迹,实现照片级真实感和不失真的纹理细节。我们在两个大规模基准和用户研究中展示了该方法能够在复杂场景下生成照片级真实的姿态迁移。此外,我们还在 HumanArt 数据集上用微调的 Stable Diffusion 验证了梯度引导在姿态引导图像生成中的有效性。
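
The gradient guidance described above can be pictured as classifier-guidance-style steering of a denoising step by the gradient of a pose-interaction-field distance. The sketch below assumes a generic DDPM-style update with placeholder `denoiser` and `pose_field` callables and made-up schedule constants; it is not the paper's sampler.

```python
import torch

def guided_denoise_step(x_t, t, denoiser, pose_field, target_pose, guidance_scale=1.0):
    """One DDPM-style update with gradient guidance from a pose interaction field.

    denoiser(x_t, t)           -> predicted noise, same shape as x_t      [placeholder]
    pose_field(x, target_pose) -> per-sample distance to the valid-pose manifold [placeholder]
    """
    # distance gradient w.r.t. the current sample (classifier-guidance style)
    x_req = x_t.detach().requires_grad_(True)
    dist = pose_field(x_req, target_pose).sum()
    grad = torch.autograd.grad(dist, x_req)[0]

    eps = denoiser(x_t, t)
    # illustrative constant schedule values; a real sampler uses the trained betas
    alpha, alpha_bar, sigma = 0.99, 0.5, 0.05
    mean = (x_t - (1 - alpha) / (1 - alpha_bar) ** 0.5 * eps) / alpha ** 0.5
    mean = mean - guidance_scale * sigma ** 2 * grad      # steer toward low pose distance
    return mean + sigma * torch.randn_like(x_t)

# toy usage with stand-in networks
denoiser = lambda x, t: torch.zeros_like(x)
pose_field = lambda x, p: ((x - p) ** 2).mean(dim=(1, 2, 3))
x = torch.randn(2, 3, 64, 64)
x_next = guided_denoise_step(x, t=10, denoiser=denoiser, pose_field=pose_field,
                             target_pose=torch.zeros_like(x))
```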

Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID

  • paper_url: http://arxiv.org/abs/2310.15913
  • repo_url: None
  • paper_authors: Qilei Li, Shaogang Gong
  • for: 提高行人重识别(ReID)模型在未见域上的泛化能力
  • methods: 通过同时学习主要的行人重识别任务和辅助任务(行人显著性检测)来减弱模型对域特定特征的依赖,并通过 PAOA 机制校准两个任务的损失梯度
  • results: 实验表明所提出的 PAOA 模型在域泛化场景下表现出色,优于现有方法
    Abstract While deep learning has significantly improved ReID model accuracy under the independent and identical distribution (IID) assumption, it has also become clear that such models degrade notably when applied to an unseen novel domain due to unpredictable/unknown domain shift. Contemporary domain generalization (DG) ReID models struggle in learning domain-invariant representation solely through training on an instance classification objective. We consider that a deep learning model is heavily influenced and therefore biased towards domain-specific characteristics, e.g., background clutter, scale and viewpoint variations, limiting the generalizability of the learned model, and hypothesize that the pedestrians are domain invariant owning they share the same structural characteristics. To enable the ReID model to be less domain-specific from these pure pedestrians, we introduce a method that guides model learning of the primary ReID instance classification objective by a concurrent auxiliary learning objective on weakly labeled pedestrian saliency detection. To solve the problem of conflicting optimization criteria in the model parameter space between the two learning objectives, we introduce a Primary-Auxiliary Objectives Association (PAOA) mechanism to calibrate the loss gradients of the auxiliary task towards the primary learning task gradients. Benefiting from the harmonious multitask learning design, our model can be extended with the recent test-time diagram to form the PAOA+, which performs on-the-fly optimization against the auxiliary objective in order to maximize the model's generative capacity in the test target domain. Experiments demonstrate the superiority of the proposed PAOA model.
    摘要 深度学习已经在独立和同样分布(IID)假设下明显提高了人识别模型的准确率,但同时也变得明显在未见的新领域中运行时失效,这是因为不可预测的领域转移。当前的领域总结(DG)人识别模型在学习领域不可变的表示方法时遇到了困难,我们认为深度学习模型受到领域特有的特征,如背景噪音、缩放和视角变化,这限制了学习的模型通用性,我们假设人识别对象在不同领域中具有相同的结构特征。为了让人识别模型免受这些纯净的人识别对象的影响,我们提出了一种方法,即通过同时学习主要人识别实例分类目标和 auxiliary 任务来导向模型学习。为了解决模型参数空间中主要和auxiliary任务之间的冲突问题,我们提出了主要-auxiliary任务关联(PAOA)机制,该机制可以在模型参数空间中均衡两个学习任务的损失导数。通过和谐多任务学习设计,我们的模型可以通过添加最近的测试时 diagram 来形成 PAOA+,该模型在测试目标领域中进行在线优化,以最大化模型的生成能力。实验结果表明我们的 PAOA 模型具有优势。
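
The abstract says PAOA calibrates the auxiliary-task loss gradients towards the primary-task gradients. One plausible per-parameter realization (removing the component of the auxiliary gradient that conflicts with the primary gradient, similar in spirit to gradient-projection methods) is sketched below; the actual PAOA rule may differ.

```python
import torch

def paoa_step(model, primary_loss, auxiliary_loss, optimizer, aux_weight=0.1):
    """Combine primary and auxiliary gradients, dropping the part of the auxiliary
    gradient that points against the primary gradient (one possible calibration)."""
    params = [p for p in model.parameters() if p.requires_grad]

    g_pri = torch.autograd.grad(primary_loss, params, retain_graph=True, allow_unused=True)
    g_aux = torch.autograd.grad(auxiliary_loss, params, allow_unused=True)

    optimizer.zero_grad()
    for p, gp, ga in zip(params, g_pri, g_aux):
        gp = torch.zeros_like(p) if gp is None else gp
        ga = torch.zeros_like(p) if ga is None else ga
        dot = torch.sum(gp * ga)
        if dot < 0:                                          # conflicting directions:
            ga = ga - dot / (gp.norm() ** 2 + 1e-12) * gp    # keep only the agreeing part
        p.grad = gp + aux_weight * ga
    optimizer.step()

# toy usage
net = torch.nn.Linear(8, 2)
opt = torch.optim.SGD(net.parameters(), lr=0.01)
x = torch.randn(4, 8)
paoa_step(net, net(x).pow(2).mean(), net(x).abs().mean(), opt)
```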

YOLO-Angio: An Algorithm for Coronary Anatomy Segmentation

  • paper_url: http://arxiv.org/abs/2310.15898
  • repo_url: None
  • paper_authors: Tom Liu, Hui Lin, Aggelos K. Katsaggelos, Adrienne Kline
  • for: 本研究旨在提供一种快速和准确的自动 coronary artery disease 诊断方法,用于替代人工评估。
  • methods: 本研究使用了一种三 stage 方法,包括预处理和特征选择,然后使用 YOLOv8 ensemble 模型生成可能的血管候选者,并最后使用一种逻辑基本方法重建 coronary tree。
  • results: 本研究在 MICCAI 2023 自动 coronary artery disease 诊断挑战中获得第三名,并在官方评估 metric 上得到了 F1 分数为 0.422 和 0.4289 的好成绩。
    Abstract Coronary angiography remains the gold standard for diagnosis of coronary artery disease, the most common cause of death worldwide. While this procedure is performed more than 2 million times annually, there remain few methods for fast and accurate automated measurement of disease and localization of coronary anatomy. Here, we present our solution to the Automatic Region-based Coronary Artery Disease diagnostics using X-ray angiography images (ARCADE) challenge held at MICCAI 2023. For the artery segmentation task, our three-stage approach combines preprocessing and feature selection by classical computer vision to enhance vessel contrast, followed by an ensemble model based on YOLOv8 to propose possible vessel candidates by generating a vessel map. A final segmentation is based on a logic-based approach to reconstruct the coronary tree in a graph-based sorting method. Our entry to the ARCADE challenge placed 3rd overall. Using the official metric for evaluation, we achieved an F1 score of 0.422 and 0.4289 on the validation and hold-out sets respectively.
    摘要 冠状动脉造影仍然是冠状动脉疾病(全球最常见的死亡原因)诊断的金标准。尽管该检查每年执行超过 200 万次,但目前仍缺少快速、准确地自动测量病变并定位冠状动脉解剖结构的方法。在此,我们介绍我们在 MICCAI 2023 上参加 Automatic Region-based Coronary Artery Disease diagnostics using X-ray angiography images(ARCADE)挑战赛的解决方案。对于动脉分割任务,我们采用三阶段方法:首先使用经典计算机视觉技术进行预处理和特征选择,以增强血管对比度;随后使用基于 YOLOv8 的集成模型提出可能的血管候选,生成血管图;最后使用基于逻辑规则的方法以图排序方式重建冠状动脉树。我们的参赛作品在 ARCADE 挑战赛中总排名第三。按照官方评价指标,我们在验证集和保留集上的 F1 分数分别为 0.422 和 0.4289。

Correlation Debiasing for Unbiased Scene Graph Generation in Videos

  • paper_url: http://arxiv.org/abs/2310.16073
  • repo_url: None
  • paper_authors: Anant Khandelwal
  • for: 从视频生成动态场景图(SGG)不仅需要对场景中随时间波动的物体有全面理解,还需要对物体之间的时间运动和交互进行建模。
  • methods: 我们提出了 FloCoDe(Flow-aware temporal consistency and Correlation Debiasing with uncertainty attenuation),它使用基于光流的特征扭曲来获取跨帧时间一致的物体,并使用相关性去偏来学习无偏的关系表示。
  • results: 我们的方法可以缓解生成场景图中视觉关系的长尾分布问题,生成更无偏的场景图,性能最高可提升 4.1%。
    Abstract Dynamic scene graph generation (SGG) from videos requires not only comprehensive understanding of objects across the scenes that are prone to temporal fluctuations but also a model the temporal motions and interactions with different objects. Moreover, the long-tailed distribution of visual relationships is the crucial bottleneck of most dynamic SGG methods, since most of them focus on capturing spatio-temporal context using complex architectures, which leads to the generation of biased scene graphs. To address these challenges, we propose FloCoDe: Flow-aware temporal consistency and Correlation Debiasing with uncertainty attenuation for unbiased dynamic scene graphs. FloCoDe employs feature warping using flow to detect temporally consistent objects across the frames. In addition, it uses correlation debiasing to learn the unbiased relation representation for long-tailed classes. Moreover, to attenuate the predictive uncertainties, it uses a mixture of sigmoidal cross-entropy loss and contrastive loss to incorporate label correlations to identify the commonly co-occurring relations and help debias the long-tailed ones. Extensive experimental evaluation shows a performance gain as high as 4.1% showing the superiority of generating more unbiased scene graphs.
    摘要 从视频生成动态场景图(SGG)不仅需要对场景中随时间波动的物体有全面理解,还需要对时间运动以及与不同物体的交互进行建模。此外,视觉关系的长尾分布是大多数动态 SGG 方法的关键瓶颈:它们大多依靠复杂的结构来捕捉时空上下文,从而生成带有偏差的场景图。为解决这些挑战,我们提出了 FloCoDe:基于光流的时间一致性与相关性去偏,并结合不确定性衰减,用于生成无偏的动态场景图。FloCoDe 使用光流进行特征扭曲,以获取跨帧时间一致的物体;同时使用相关性去偏来学习长尾类别的无偏关系表示。此外,为降低预测不确定性,它采用 sigmoid 交叉熵损失与对比损失的组合,利用标签相关性识别经常共现的关系,帮助对长尾关系去偏。大量实验评估表明,该方法在生成更无偏的场景图方面可带来最高 4.1% 的性能提升。
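
Flow-aware temporal consistency relies on warping features from a neighbouring frame with optical flow. A minimal sketch of that warping step using `torch.nn.functional.grid_sample` follows; the flow tensor here is a dummy, and the surrounding scene-graph pipeline is omitted.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(feat_prev, flow):
    """Warp (B, C, H, W) features from frame t-1 to frame t using a flow field
    given in pixels as (B, 2, H, W): feat_t(x) ~ feat_{t-1}(x + flow(x))."""
    b, _, h, w = feat_prev.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(feat_prev)  # (1, 2, H, W)
    coords = grid + flow                                                    # sample positions
    # normalize to [-1, 1] for grid_sample (x first, then y)
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)                 # (B, H, W, 2)
    return F.grid_sample(feat_prev, sample_grid, align_corners=True)

# toy usage: zero flow returns (approximately) the original features
feat = torch.randn(1, 64, 32, 32)
warped = warp_with_flow(feat, torch.zeros(1, 2, 32, 32))
assert torch.allclose(feat, warped, atol=1e-5)
```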

On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms

  • paper_url: http://arxiv.org/abs/2310.15848
  • repo_url: None
  • paper_authors: Surbhi Mittal, Kartik Thakral, Richa Singh, Mayank Vatsa, Tamar Glaser, Cristian Canton Ferrer, Tal Hassner
  • for: 本研究旨在提出一个责任机器学习数据集框架,以评估数据集的可信worthiness。
  • methods: 本研究使用了责任机器学习数据集框架,考虑了公平、隐私和法规遵循性等方面,并提出了改进数据集文件的建议。
  • results: 经过100多个数据集的调查,发现现有数据集中有严重的公平、隐私和法规遵循性问题。
    Abstract Artificial Intelligence (AI) has made its way into various scientific fields, providing astonishing improvements over existing algorithms for a wide variety of tasks. In recent years, there have been severe concerns over the trustworthiness of AI technologies. The scientific community has focused on the development of trustworthy AI algorithms. However, machine and deep learning algorithms, popular in the AI community today, depend heavily on the data used during their development. These learning algorithms identify patterns in the data, learning the behavioral objective. Any flaws in the data have the potential to translate directly into algorithms. In this study, we discuss the importance of Responsible Machine Learning Datasets and propose a framework to evaluate the datasets through a responsible rubric. While existing work focuses on the post-hoc evaluation of algorithms for their trustworthiness, we provide a framework that considers the data component separately to understand its role in the algorithm. We discuss responsible datasets through the lens of fairness, privacy, and regulatory compliance and provide recommendations for constructing future datasets. After surveying over 100 datasets, we use 60 datasets for analysis and demonstrate that none of these datasets is immune to issues of fairness, privacy preservation, and regulatory compliance. We provide modifications to the ``datasheets for datasets" with important additions for improved dataset documentation. With governments around the world regularizing data protection laws, the method for the creation of datasets in the scientific community requires revision. We believe this study is timely and relevant in today's era of AI.
    摘要 人工智能(AI)已经渗透到不同的科学领域,为各种任务带来了惊人的性能提升。然而,近年来人们对 AI 技术的可信性产生了严重的担忧,科学界也因此聚焦于可信 AI 算法的开发。然而,当今流行的机器学习和深度学习算法高度依赖其开发过程中使用的数据:这些算法从数据中识别模式、学习行为目标,数据中的任何缺陷都可能直接传导到算法中。在本研究中,我们讨论了负责任机器学习数据集的重要性,并提出一个通过负责任评估准则来评价数据集的框架。与现有工作侧重于事后评估算法可信性不同,我们的框架将数据这一组成部分单独考虑,以理解其在算法中的作用。我们从公平性、隐私保护和法规遵从的角度讨论负责任数据集,并为未来数据集的构建提出建议。在调查了 100 多个数据集之后,我们选取 60 个数据集进行分析,发现没有一个数据集能完全避免公平性、隐私保护和法规遵从方面的问题。我们对“datasheets for datasets”提出了重要的补充修改,以改进数据集文档。随着世界各国政府陆续规范数据保护法律,科学界创建数据集的方式也需要修订。我们认为这项研究在当今的 AI 时代是及时且切合实际的。

CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting

  • paper_url: http://arxiv.org/abs/2310.16069
  • repo_url: None
  • paper_authors: Lei Li
  • for: 该研究旨在提高图像分割性能,通过 integrate 文本信息和图像进行语言导向的数据利用。
  • methods: 该framework 使用了一种新的 “Chain-of-Thought” 过程,通过将多个句子中的文本信息编码,形成一个 coherent 的链接。
  • results: 我们的质量和量测试表明,CPSeg 能够有效地提高图像分割性能。
    Abstract Natural scene analysis and remote sensing imagery offer immense potential for advancements in large-scale language-guided context-aware data utilization. This potential is particularly significant for enhancing performance in downstream tasks such as object detection and segmentation with designed language prompting. In light of this, we introduce the CPSeg, Chain-of-Thought Language Prompting for Finer-grained Semantic Segmentation), an innovative framework designed to augment image segmentation performance by integrating a novel "Chain-of-Thought" process that harnesses textual information associated with images. This groundbreaking approach has been applied to a flood disaster scenario. CPSeg encodes prompt texts derived from various sentences to formulate a coherent chain-of-thought. We propose a new vision-language dataset, FloodPrompt, which includes images, semantic masks, and corresponding text information. This not only strengthens the semantic understanding of the scenario but also aids in the key task of semantic segmentation through an interplay of pixel and text matching maps. Our qualitative and quantitative analyses validate the effectiveness of CPSeg.
    摘要 自然场景分析和远程感知影像具有巨大的潜力,可以提高大规模语言引导的上下文意识数据利用的性能。特别是在对象检测和分割任务中,采用设计语言提示可以提高性能。为此,我们介绍了CPSeg(语义粒度分割)框架,它通过 integrate 一种新的 "链条思维" 过程,将图像与文本信息相结合,以提高图像分割性能。我们在洪水灾害场景中应用了这种创新性的方法。CPSeg 将文本信息转化为谱文本,并将它们组织成一个 coherent 的链条思维。我们提出了一个新的视力语言数据集,FloodPrompt,该数据集包括图像、semantic 面积和相应的文本信息。这不仅强化了洪水场景的 semantic 理解,还帮助实现semantic segmentation 任务中的像素和文本匹配图。我们的质量和量统计分析证明了 CPSeg 的有效性。

Unpaired MRI Super Resolution with Self-Supervised Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.15767
  • repo_url: None
  • paper_authors: Hao Li, Quanwei Liu, Jianan Liu, Xiling Liu, Yanni Dong, Tao Huang, Zhihan Lv
  • for: 提高临床设备的诊断精度
  • methods: 使用自我指导的对比学习来提高SR性能,使用真实的高解度图像和生成的SR图像构建正例对
  • results: 使用限制性的训练数据可以获得显著提高的峰值信号噪声比和结构相似度指标
    Abstract High-resolution (HR) magnetic resonance imaging (MRI) is crucial for enhancing diagnostic accuracy in clinical settings. Nonetheless, the inherent limitation of MRI resolution restricts its widespread applicability. Deep learning-based image super-resolution (SR) methods exhibit promise in improving MRI resolution without additional cost. However, these methods frequently require a substantial number of HR MRI images for training, which can be challenging to acquire. In this paper, we propose an unpaired MRI SR approach that employs self-supervised contrastive learning to enhance SR performance with limited training data. Our approach leverages both authentic HR images and synthetically generated SR images to construct positive and negative sample pairs, thus facilitating the learning of discriminative features. Empirical results presented in this study underscore significant enhancements in the peak signal-to-noise ratio and structural similarity index, even when a paucity of HR images is available. These findings accentuate the potential of our approach in addressing the challenge of limited training data, thereby contributing to the advancement of high-resolution MRI in clinical applications.
    摘要 高分辨率(HR)磁共振成像(MRI)是诊断精度的关键因素,但MRI的自然限制使其广泛应用受到限制。深度学习基于图像超分辨(SR)方法表现出提高MRI分辨率的承诺,但这些方法frequently需要大量的HR MRI图像进行训练,这可能困难以获得。本文提出了一种没有对应HR图像的MRI SR方法,使用自我超级vised学习来提高SR性能,并利用高分辨率图像和生成的SR图像来构建正例对和负例对,从而促进学习抽象特征。实验结果表明,即使只有少量HR图像可用,SR性能也能得到显著提高,包括峰值信号噪声比和结构相似度指数。这些结果强调了我们的方法在有限训练数据情况下的潜在价值,并为高分辨率MRI在临床应用中的进一步发展提供了贡献。
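
A hedged sketch of the contrastive idea described above: embeddings of an authentic HR patch and its corresponding super-resolved output form a positive pair, with the rest of the batch acting as negatives in an InfoNCE-style loss. The patch encoder and dimensions are placeholders, not the paper's network.

```python
import torch
import torch.nn.functional as F

def info_nce(hr_emb, sr_emb, temperature=0.1):
    """hr_emb, sr_emb: (B, D) embeddings of matching HR / SR patches.
    Each (HR_i, SR_i) pair is positive; all other SR_j in the batch are negatives."""
    hr = F.normalize(hr_emb, dim=1)
    sr = F.normalize(sr_emb, dim=1)
    logits = hr @ sr.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(hr.size(0), device=hr.device)
    return F.cross_entropy(logits, targets)

# toy usage with a stand-in patch encoder
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(1 * 32 * 32, 128))
hr_patches = torch.randn(8, 1, 32, 32)
sr_patches = hr_patches + 0.05 * torch.randn_like(hr_patches)   # pretend SR outputs
loss = info_nce(encoder(hr_patches), encoder(sr_patches))
loss.backward()
```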

Deep Learning Models for Classification of COVID-19 Cases by Medical Images

  • paper_url: http://arxiv.org/abs/2310.16851
  • repo_url: https://github.com/aditya-saxena-7/Detection-of-COVID-19-from-Chest-X-Ray-images-using-CNNs
  • paper_authors: Amir Ali
  • for: 这个研究旨在提高新型冠状病毒诊断的准确性和速度,通过利用深度学习模型对患者的Computed Tomography(CT)图像进行精确分类。
  • methods: 我们的研究使用了深度转移学习模型,包括DenseNet201、GoogleNet和AlexNet,进行 Covid-19 分类。我们还对这些模型进行了精心的超参数调整,以提高其性能。
  • results: 我们的研究结果表明,使用深度学习模型可以提高 Covid-19 诊断的准确性和速度。我们的模型可以处理多种医疗图像类型,并能够准确地识别特征性的 Covid-19 特征。这些成果表明了这些模型在全球 Covid-19 斗争中的潜在作用。
    Abstract In recent times, the use of chest Computed Tomography (CT) images for detecting coronavirus infections has gained significant attention, owing to their ability to reveal bilateral changes in affected individuals. However, classifying patients from medical images presents a formidable challenge, particularly in identifying such bilateral changes. To tackle this challenge, our study harnesses the power of deep learning models for the precise classification of infected patients. Our research involves a comparative analysis of deep transfer learning-based classification models, including DenseNet201, GoogleNet, and AlexNet, against carefully chosen supervised learning models. Additionally, our work encompasses Covid-19 classification, which involves the identification and differentiation of medical images, such as X-rays and electrocardiograms, that exhibit telltale signs of Covid-19 infection. This comprehensive approach ensures that our models can handle a wide range of medical image types and effectively identify characteristic patterns indicative of Covid-19. By conducting meticulous research and employing advanced deep learning techniques, we have made significant strides in enhancing the accuracy and speed of Covid-19 diagnosis. Our results demonstrate the effectiveness of these models and their potential to make substantial contributions to the global effort to combat COVID-19.
    摘要 近年来,利用胸部计算机断层扫描(CT)图像检测新冠病毒感染受到了广泛关注,因为这类图像能够显示感染者的双侧病变。然而,基于医学图像对患者进行分类是一项艰巨的挑战,尤其是识别此类双侧变化。为应对这一挑战,本研究利用深度学习模型对感染患者进行精确分类。我们将基于深度迁移学习的分类模型(包括 DenseNet201、GoogleNet 和 AlexNet)与精心挑选的监督学习模型进行了对比分析。此外,我们的工作还涵盖新冠分类任务,即识别并区分呈现新冠感染特征的医学图像(如 X 光片和心电图)。这一综合方法确保模型能够处理多种医学图像类型,并有效识别新冠的特征模式。通过细致的研究和先进的深度学习技术,我们在提高新冠诊断的准确性和速度方面取得了显著进展。实验结果证明了这些模型的有效性及其为全球抗击新冠所能做出的重要贡献。
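
A minimal torchvision sketch of the transfer-learning setup this entry describes: an ImageNet-pretrained DenseNet201 with its classifier replaced for a two-class COVID/non-COVID problem. The freezing policy, hyperparameters, and dummy data are illustrative only.

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained backbone with a new 2-class head (COVID / non-COVID)
model = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)
model.classifier = nn.Linear(model.classifier.in_features, 2)

# optionally freeze the backbone and train only the new head at first
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("classifier")

optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# one illustrative training step on dummy data (real code would iterate a DataLoader)
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 2, (4,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```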

Large Language Models are Temporal and Causal Reasoners for Video Question Answering

  • paper_url: http://arxiv.org/abs/2310.15747
  • repo_url: https://github.com/mlvlab/Flipped-VQA
  • paper_authors: Dohwan Ko, Ji Soo Lee, Wooyoung Kang, Byungseok Roh, Hyunwoo J. Kim
  • for: 本文旨在解决视频问答任务中的语言偏差问题:大语言模型(LLMs)能够提供有效的先验,但这些先验常常导致模型过度依赖问题文本而忽略视频内容。
  • methods: 本文提出了一种名为 Flipped-VQA 的新框架,它鼓励模型预测 $\langle$V, Q, A$\rangle$ 三元组的所有组合,即通过翻转源对和目标标签(分别在给定 VQ、VA、QA 对的情况下预测 A、Q 和 V)来理解它们之间的复杂关系。
  • results: 在五个具有挑战性的视频问答基准上,基于 Flipped-VQA 框架的 LLaMA-VQA 模型超过了基于 LLMs 和非 LLMs 的模型。此外,该框架可应用于不同的 LLMs(OPT 和 GPT-J),并在这些模型上持续带来提升。实验表明,Flipped-VQA 不仅增强了对语言捷径的利用,还缓解了导致答案过度依赖问题文本的语言偏差。
    Abstract Large Language Models (LLMs) have shown remarkable performances on a wide range of natural language understanding and generation tasks. We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and causal reasoning in Video Question Answering (VideoQA). However, such priors often cause suboptimal results on VideoQA by leading the model to over-rely on questions, $\textit{i.e.}$, $\textit{linguistic bias}$, while ignoring visual content. This is also known as `ungrounded guesses' or `hallucinations'. To address this problem while leveraging LLMs' prior on VideoQA, we propose a novel framework, Flipped-VQA, encouraging the model to predict all the combinations of $\langle$V, Q, A$\rangle$ triplet by flipping the source pair and the target label to understand their complex relationships, $\textit{i.e.}$, predict A, Q, and V given a VQ, VA, and QA pairs, respectively. In this paper, we develop LLaMA-VQA by applying Flipped-VQA to LLaMA, and it outperforms both LLMs-based and non-LLMs-based models on five challenging VideoQA benchmarks. Furthermore, our Flipped-VQA is a general framework that is applicable to various LLMs (OPT and GPT-J) and consistently improves their performances. We empirically demonstrate that Flipped-VQA not only enhances the exploitation of linguistic shortcuts but also mitigates the linguistic bias, which causes incorrect answers over-relying on the question. Code is available at https://github.com/mlvlab/Flipped-VQA.
    摘要 大型语言模型(LLMs)在各种自然语言理解和生成任务上表现出色。我们发现LLMs在视频问答(VideoQA)任务上提供了有效的先验知识,但这些先验知识经常导致视频内容被忽略,从而导致 incorrect answers。这也被称为“不扎实的猜测”或“幻觉”。为解决这个问题并利用LLMs的先验知识,我们提出了一种新的框架——翻转VQA(Flipped-VQA),强制模型预测所有的 $\langle$V, Q, A$\rangle$ 三元组。在这篇论文中,我们将应用Flipped-VQA于 LLMA 模型,并在五个复杂的 VideoQA benchmark 上取得了比 LLMS 和非 LLMS 模型更高的表现。此外,我们的 Flipped-VQA 是一个通用的框架,可以应用于多种 LLMS(OPT 和 GPT-J),并在不同的 LLMS 上提高其表现。我们经验表明,Flipped-VQA 不仅增强了语言短cut的利用,还减轻语言偏见,这种偏见导致 incorrect answers 依赖于问题。代码可以在 https://github.com/mlvlab/Flipped-VQA 上获取。

Interpretable Medical Image Classification using Prototype Learning and Privileged Information

  • paper_url: http://arxiv.org/abs/2310.15741
  • repo_url: https://github.com/xrad-ulm/proto-caps
  • paper_authors: Luisa Gallee, Meinrad Beer, Michael Goetz
  • for: 本文旨在同时提高医疗图像分类的可解释性和性能。
  • methods: 使用胶囊网络(capsule networks)、原型学习(prototype learning)和特权信息来构建可解释且强大的模型。
  • results: 在 LIDC-IDRI 数据集上,该方法兼具更高的可解释性和超越现有水平的预测性能;与可解释的基线模型相比,在预测恶性程度(93.0%)和肺结节平均特征属性上的准确率提高超过 6%。同时,模型还能通过原型表示进行基于案例的推理,使放射科医生定义的属性可以得到视觉验证。
    Abstract Interpretability is often an essential requirement in medical imaging. Advanced deep learning methods are required to address this need for explainability and high performance. In this work, we investigate whether additional information available during the training process can be used to create an understandable and powerful model. We propose an innovative solution called Proto-Caps that leverages the benefits of capsule networks, prototype learning and the use of privileged information. Evaluating the proposed solution on the LIDC-IDRI dataset shows that it combines increased interpretability with above state-of-the-art prediction performance. Compared to the explainable baseline model, our method achieves more than 6 % higher accuracy in predicting both malignancy (93.0 %) and mean characteristic features of lung nodules. Simultaneously, the model provides case-based reasoning with prototype representations that allow visual validation of radiologist-defined attributes.
    摘要 可解释性通常是医学影像领域的一项基本要求,这需要既具备可解释性又具备高性能的先进深度学习方法。在本研究中,我们探讨是否可以利用训练过程中可获得的额外信息来构建既易于理解又强大的模型。我们提出了一种创新方案 Proto-Caps,它结合了胶囊网络、原型学习以及特权信息的优势。在 LIDC-IDRI 数据集上的评估表明,该方案兼具更高的可解释性和超越现有水平的预测性能。与可解释的基线模型相比,我们的方法在预测恶性程度(93.0%)和肺结节平均特征属性方面的准确率提高超过 6%。同时,该模型通过原型表示提供基于案例的推理,使放射科医生定义的属性可以得到视觉验证。

Query-adaptive DETR for Crowded Pedestrian Detection

  • paper_url: http://arxiv.org/abs/2310.15725
  • repo_url: None
  • paper_authors: Feng Gao, Jiaxu Leng, Ji Gan, Xinbo Gao
  • for: 本研究旨在提高DETR和其变体在拥挤人群检测中的性能,特别是在不同程度的拥挤场景下自动调整DETRs的查询数量。
  • methods: 本文分析了两种当前的查询生成方法,并提出了四个指导原则 для设计适应查询生成方法。然后,我们提出了rank-based adaptive query generation(RAQG)方法,包括设计一个预测rank的rank prediction head,以及基于预测rank的adaptive选择方法来生成查询。此外,我们还提出了Soft Gradient L1 Loss来更好地训练rank prediction head。
  • results: 我们的方法可以插入任何 DETRs,并在 Crowdhuman 和 Citypersons 数据集上取得有竞争力的结果。尤其是在 Crowdhuman 数据集上,我们的方法达到了最先进的 39.4% MR。
    Abstract DEtection TRansformer (DETR) and its variants (DETRs) have been successfully applied to crowded pedestrian detection, which achieved promising performance. However, we find that, in different degrees of crowded scenes, the number of DETRs' queries must be adjusted manually, otherwise, the performance would degrade to varying degrees. In this paper, we first analyze the two current query generation methods and summarize four guidelines for designing the adaptive query generation method. Then, we propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem. Specifically, we design a rank prediction head that can predict the rank of the lowest confidence positive training sample produced by the encoder. Based on the predicted rank, we design an adaptive selection method that can adaptively select coarse detection results produced by the encoder to generate queries. Moreover, to train the rank prediction head better, we propose Soft Gradient L1 Loss. The gradient of Soft Gradient L1 Loss is continuous, which can describe the relationship between the loss value and the updated value of model parameters granularly. Our method is simple and effective, which can be plugged into any DETRs to make it query-adaptive in theory. The experimental results on Crowdhuman dataset and Citypersons dataset show that our method can adaptively generate queries for DETRs and achieve competitive results. Especially, our method achieves state-of-the-art 39.4% MR on Crowdhuman dataset.
    摘要 DEtection TRansformer(DETR)和其变体(DETRs)已经成功应用于人群检测,实现了出色的表现。然而,我们发现在不同的人群场景中,DETRs的查询数量需要手动调整,否则表现会随之下降。在这篇论文中,我们首先分析了两种当前的查询生成方法,并总结出四个用于设计适应查询生成方法的指导方针。然后,我们提议 Rank-based Adaptive Query Generation(RAQG)来解决这个问题。具体来说,我们设计了一个排名预测头,可以预测Encoder输出最低信心正例的排名。根据预测的排名,我们设计了一种适应选择方法,可以适应地选择Encoder输出的粗略检测结果来生成查询。此外,为了训练排名预测头更好,我们提议Soft Gradient L1 Loss。Soft Gradient L1 Loss的梯度是连续的,可以细致地描述损失值与模型参数更新值之间的关系。我们的方法简单而有效,可以将其插入到任何DETRs中,让它成为可query-adaptive的。实验结果表明,我们的方法可以为 DETRs 适应地生成查询,并实现了竞争性的结果。特别是,我们的方法在Crowdhuman dataset上达到了39.4%的MR。

GNeSF: Generalizable Neural Semantic Fields

  • paper_url: http://arxiv.org/abs/2310.15712
  • repo_url: None
  • paper_authors: Hanlin Chen, Chen Li, Mengqi Guo, Zhiwen Yan, Gim Hee Lee
  • for: 本研究提出了一种基于神经隐式表示的可泛化 3D 场景分割框架,只需 2D 监督即可训练,并能在推理时泛化到全新场景。
  • methods: 该框架以多视图图像特征和语义图(而非仅空间信息)作为输入,以避免过拟合场景特定的几何与语义信息;提出一种新的软投票机制,为每个 3D 点聚合来自不同视图的 2D 语义信息,并编码视图差异信息以预测投票分数,使邻近视图的语义信息贡献更大;此外还设计了可见性模块,用于检测并过滤被遮挡视图带来的有害信息。
  • results: 该方法仅凭 2D 语义监督即可为全新场景合成语义图或进行 3D 语义分割。实验表明,其性能可与场景特定方法相当,甚至能在仅有 2D 标注的情况下超越现有的强监督方法。源代码见 https://github.com/HLinChen/GNeSF。
    Abstract 3D scene segmentation based on neural implicit representation has emerged recently with the advantage of training only on 2D supervision. However, existing approaches still requires expensive per-scene optimization that prohibits generalization to novel scenes during inference. To circumvent this problem, we introduce a generalizable 3D segmentation framework based on implicit representation. Specifically, our framework takes in multi-view image features and semantic maps as the inputs instead of only spatial information to avoid overfitting to scene-specific geometric and semantic information. We propose a novel soft voting mechanism to aggregate the 2D semantic information from different views for each 3D point. In addition to the image features, view difference information is also encoded in our framework to predict the voting scores. Intuitively, this allows the semantic information from nearby views to contribute more compared to distant ones. Furthermore, a visibility module is also designed to detect and filter out detrimental information from occluded views. Due to the generalizability of our proposed method, we can synthesize semantic maps or conduct 3D semantic segmentation for novel scenes with solely 2D semantic supervision. Experimental results show that our approach achieves comparable performance with scene-specific approaches. More importantly, our approach can even outperform existing strong supervision-based approaches with only 2D annotations. Our source code is available at: https://github.com/HLinChen/GNeSF.
    摘要 三维场景分割基于神经隐式表示最近几年发展起来,具有训练只需2D监督的优势。然而,现有方法仍然需要贵重的每个场景优化,这限制了扩展到新场景的推理。为了解决这个问题,我们介绍了一个普适的三维分割框架基于隐式表示。Specifically,我们的框架接受多视图图像特征和 semantic maps作为输入,而不是只是空间信息,以避免过拟合特定场景的 Géometric 和 Semantic 信息。我们提出了一种新的软投票机制,用于对每个3D点的2D semantic信息进行集成。此外,我们还编码了视图差信息,以预测投票分数。直观地说,这使得邻近视图的semantic信息能够更大地贡献。此外,我们还设计了一个隐藏信息检测和过滤模块,以避免由 occluded 视图引入的损害信息。由于我们的提出的方法具有普适性,我们可以在具有 solely 2D semantic supervision 的情况下生成 semantic maps or 进行3D semantic segmentation for novel scenes。实验结果表明,我们的方法可以与场景特定方法匹配性能,而且甚至可以超越基于强监督的现有方法。我们的源代码可以在 GitHub 上找到:https://github.com/HLinChen/GNeSF。
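
A small sketch of the soft-voting idea: per-view 2D semantic logits for a 3D point are fused with learned vote weights computed from image features and a view-difference encoding, so that informative (e.g. nearby, unoccluded) views contribute more. The scoring MLP and feature sizes are assumptions, not the authors' network.

```python
import torch
import torch.nn as nn

class SoftVoting(nn.Module):
    """Fuse per-view 2D semantic predictions for each 3D point via learned vote weights."""
    def __init__(self, feat_dim=32, view_dim=4):     # dims are illustrative assumptions
        super().__init__()
        self.score_mlp = nn.Sequential(
            nn.Linear(feat_dim + view_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, sem_logits, img_feats, view_diff):
        # sem_logits: (N_pts, N_views, C)        projected 2D semantic logits per view
        # img_feats:  (N_pts, N_views, feat_dim) sampled image features
        # view_diff:  (N_pts, N_views, view_dim) e.g. ray-angle / distance encoding
        scores = self.score_mlp(torch.cat([img_feats, view_diff], dim=-1))  # (N, V, 1)
        weights = torch.softmax(scores, dim=1)                              # soft votes over views
        return (weights * sem_logits).sum(dim=1)                            # (N_pts, C)

# toy usage: 100 points, 5 views, 21 semantic classes
vote = SoftVoting()
fused = vote(torch.randn(100, 5, 21), torch.randn(100, 5, 32), torch.randn(100, 5, 4))
print(fused.shape)  # torch.Size([100, 21])
```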

Physics-Informed with Power-Enhanced Residual Network for Interpolation and Inverse Problems

  • paper_url: http://arxiv.org/abs/2310.15690
  • repo_url: https://github.com/cmmai/resnet_for_pinn
  • paper_authors: Amir Noorizadegan, D. L. Young, Y. C. Hon, C. S. Chen
  • for: 提高神经网络的 interpolating 能力,包括平滑和非平滑函数的 interpolating。
  • methods: 提出了一种新的神经网络结构 Power-Enhancing residual network,通过添加力量项来提高网络的表达力。研究考虑了网络深度、宽度以及优化方法,并证明了该architecture的适应性和性能优势。
  • results: 研究表明,提出的Power-Enhancing residual network具有非常高的准确率,特别是对非平滑函数的 interpolating。实际应用中也证明了该网络的superiority,包括准确率、速度和效率。此外,研究还探讨了更深的网络的影响。最后,该architecture被应用于解决反射 Burgers 方程的问题,并达到了优秀的性能。
    Abstract This paper introduces a novel neural network structure called the Power-Enhancing residual network, designed to improve interpolation capabilities for both smooth and non-smooth functions in 2D and 3D settings. By adding power terms to residual elements, the architecture boosts the network's expressive power. The study explores network depth, width, and optimization methods, showing the architecture's adaptability and performance advantages. Consistently, the results emphasize the exceptional accuracy of the proposed Power-Enhancing residual network, particularly for non-smooth functions. Real-world examples also confirm its superiority over plain neural network in terms of accuracy, convergence, and efficiency. The study also looks at the impact of deeper network. Moreover, the proposed architecture is also applied to solving the inverse Burgers' equation, demonstrating superior performance. In conclusion, the Power-Enhancing residual network offers a versatile solution that significantly enhances neural network capabilities. The codes implemented are available at: \url{https://github.com/CMMAi/ResNet_for_PINN}.
    摘要 本文介绍了一种称为 Power-Enhancing residual network 的新型神经网络结构,用于提升二维和三维情形下光滑与非光滑函数的插值能力。通过在残差元素中添加幂次项,该结构增强了网络的表达能力。研究探讨了网络深度、宽度和优化方法,证明了该结构的适应性和性能优势。结果一致表明,所提出的 Power-Enhancing residual network 具有卓越的精度,尤其是对非光滑函数。实际算例也证明了它在精度、收敛性和效率方面优于普通神经网络。研究还考察了更深网络的影响。此外,该结构被应用于求解 Burgers 方程的反问题,并取得优异性能。总之,Power-Enhancing residual network 提供了一种通用的解决方案,可显著增强神经网络的能力。代码见:\url{https://github.com/CMMAi/ResNet_for_PINN}。
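
The abstract only states that power terms are added to the residual elements; one simple reading is a residual block whose output also includes an element-wise power of the input. The sketch below follows that reading and should not be taken as the authors' exact formulation.

```python
import torch
import torch.nn as nn

class PowerResidualBlock(nn.Module):
    """Residual block with an extra element-wise power term on the skip path:
    y = x + f(x) + a * x**p   (one possible reading of 'power-enhanced' residuals)."""
    def __init__(self, width=64, power=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(width, width), nn.Tanh(),
                                  nn.Linear(width, width))
        self.power = power
        self.alpha = nn.Parameter(torch.tensor(0.1))   # learnable strength of the power term

    def forward(self, x):
        return x + self.body(x) + self.alpha * x.pow(self.power)

# toy PINN-style regressor: 2-D coordinates in, scalar field value out
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    PowerResidualBlock(64), PowerResidualBlock(64),
                    nn.Linear(64, 1))
pred = net(torch.rand(128, 2))
```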

Nighttime Thermal Infrared Image Colorization with Feedback-based Object Appearance Learning

  • paper_url: http://arxiv.org/abs/2310.15688
  • repo_url: https://github.com/fuyaluo/foalgan
  • paper_authors: Fu-Ya Luo, Shu-Lin Liu, Yi-Jun Cao, Kai-Fu Yang, Chang-Yong Xie, Yong Liu, Yong-Jie Li
  • for: 提高夜间热动像(TIR)图像的可读性和可用性,以便在夜间场景中进行视觉识别和任务执行。
  • methods: 提出了一种基于生成对抗网络的feedback型物体外观学习(FoalGAN)方法,通过增加 occlusion-aware mixup 模块和对应的外观一致损失来提高小对象翻译性能。
  • results: 经验表明,提出的 FoalGAN 方法不仅可以提高小对象的外观学习,还能够在nighttime street scene中提高 semantic preservation和edge consistency。
    Abstract Stable imaging in adverse environments (e.g., total darkness) makes thermal infrared (TIR) cameras a prevalent option for night scene perception. However, the low contrast and lack of chromaticity of TIR images are detrimental to human interpretation and subsequent deployment of RGB-based vision algorithms. Therefore, it makes sense to colorize the nighttime TIR images by translating them into the corresponding daytime color images (NTIR2DC). Despite the impressive progress made in the NTIR2DC task, how to improve the translation performance of small object classes is under-explored. To address this problem, we propose a generative adversarial network incorporating feedback-based object appearance learning (FoalGAN). Specifically, an occlusion-aware mixup module and corresponding appearance consistency loss are proposed to reduce the context dependence of object translation. As a representative example of small objects in nighttime street scenes, we illustrate how to enhance the realism of traffic light by designing a traffic light appearance loss. To further improve the appearance learning of small objects, we devise a dual feedback learning strategy to selectively adjust the learning frequency of different samples. In addition, we provide pixel-level annotation for a subset of the Brno dataset, which can facilitate the research of NTIR image understanding under multiple weather conditions. Extensive experiments illustrate that the proposed FoalGAN is not only effective for appearance learning of small objects, but also outperforms other image translation methods in terms of semantic preservation and edge consistency for the NTIR2DC task.
    摘要 这个文章讨论了在不良环境(例如总黑暗)下稳定图像摄取的问题。因为热色干扰(TIR)相机在夜间场景认知中是一种普遍的选择,但是TIR图像的低 контра斯特和无彩色性使得人类阅读和后续的RGB基于视觉算法的应用受到阻碍。因此,将夜间TIR图像转换为相应的日间彩色图像(NTIR2DC)是一个有必要的步骤。尽管在NTIR2DC任务上已经做出了卓越的进步,但是如何提高小物类的译像性仍然是尚未探讨的问题。为了解决这个问题,我们提出了一个基于对应学习的生成 adversarial network(FoalGAN)。specifically,我们提出了一个遮瑕节module和相应的出现整合损失,以减少物类译像的上下文依赖。作为夜间街道场景中小物件的示例,我们详细说明了如何增强交通信号灯的现实感。此外,我们还提出了一个双重反馈学习策略,以选择性地调整不同的样本学习频率。此外,我们还提供了Brno dataset中一 subset的像素级注释,以便帮助夜间多种天气下NTIR图像理解的研究。实验结果显示,我们提出的FoalGAN不仅有效地进行小物类的出现学习,而且也在NTIR2DC任务中比其他图像转换方法具有更高的semantic preservation和edge consistency。

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

  • paper_url: http://arxiv.org/abs/2310.15670
  • repo_url: https://github.com/opendrivelab/birds-eye-view-perception
  • paper_authors: Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li
  • for: 提高仅用相机的 3D 目标检测器(学生)的准确率,方法是从基于 LiDAR 或多模态的教师模型获得知识迁移。
  • methods: 使用Uni-modal Distillation,即主要使用摄像头特征,并采用师模型与学生模型的同构结构,以减少特征差异。同时,引入了一种基于运动误差的细化 trajectory-based distillation模块,以进一步改进学生模型的性能。
  • results: 在 nuScenes 上,VCD-A 取得了 63.1% NDS 的最新 state-of-the-art 成绩,超过了其他多模态或基于 LiDAR 的模型。
    Abstract Current research is primarily dedicated to advancing the accuracy of camera-only 3D object detectors (apprentice) through the knowledge transferred from LiDAR- or multi-modal-based counterparts (expert). However, the presence of the domain gap between LiDAR and camera features, coupled with the inherent incompatibility in temporal fusion, significantly hinders the effectiveness of distillation-based enhancements for apprentices. Motivated by the success of uni-modal distillation, an apprentice-friendly expert model would predominantly rely on camera features, while still achieving comparable performance to multi-modal models. To this end, we introduce VCD, a framework to improve the camera-only apprentice model, including an apprentice-friendly multi-modal expert and temporal-fusion-friendly distillation supervision. The multi-modal expert VCD-E adopts an identical structure as that of the camera-only apprentice in order to alleviate the feature disparity, and leverages LiDAR input as a depth prior to reconstruct the 3D scene, achieving the performance on par with other heterogeneous multi-modal experts. Additionally, a fine-grained trajectory-based distillation module is introduced with the purpose of individually rectifying the motion misalignment for each object in the scene. With those improvements, our camera-only apprentice VCD-A sets new state-of-the-art on nuScenes with a score of 63.1% NDS.
    摘要 当前研究主要目标是提高Camera-only 3D对象检测器(学生)的准确率,通过从LiDAR-或多模态基础的对手(专家)传递知识。然而,域之间差距和时间融合不兼容性使得采用热静融合方法的改进效果受到限制。为了解决这个问题,我们提出了VCD框架,包括学生友好的多模态专家和时间融合友好的热静融合监督。多模态专家VCD-E采用与Camera-only学生相同的结构,以减轻特征差异,并利用LiDAR输入为深度估计来重建3D场景,实现与其他多模态专家相同的性能。此外,我们还引入了细腻的轨迹基于热静融合模块,以减轻每个对象在场景中的运动误差。通过这些改进,我们的Camera-only学生VCD-A在nuScenes上设置新的状态标准,分数为63.1% NDS。

Region-controlled Style Transfer

  • paper_url: http://arxiv.org/abs/2310.15658
  • repo_url: https://github.com/kakinglow/Selective-Style-Transfer
  • paper_authors: Junjie Kang, Jinsong Wu, Shiqi Jiang
  • for: 提高图像风格传递的控制精度,使得图像风格更加自然和有趣。
  • methods: 提出了一种基于损失函数的启用方法,可以在不同区域中控制风格强度,并且引入了一种新的特征融合方法,可以将内容特征变换到风格特征的空间中,保持 semantic关系。
  • results: 经过广泛的实验 validate 了我们提出的方法的有效性。
    Abstract Image style transfer is a challenging task in computational vision. Existing algorithms transfer the color and texture of style images by controlling the neural network's feature layers. However, they fail to control the strength of textures in different regions of the content image. To address this issue, we propose a training method that uses a loss function to constrain the style intensity in different regions. This method guides the transfer strength of style features in different regions based on the gradient relationship between style and content images. Additionally, we introduce a novel feature fusion method that linearly transforms content features to resemble style features while preserving their semantic relationships. Extensive experiments have demonstrated the effectiveness of our proposed approach.
    摘要 Computational vision 中的图像风格传递是一项复杂的任务。现有的算法可以通过控制神经网络的特征层来传递风格图像的颜色和文字感,但它们无法控制不同区域的文字强度。为解决这个问题,我们提出了一种使用损失函数来约束不同区域的风格强度的训练方法。这种方法基于风格和内容图像的梯度关系来导引风格特征的传递强度在不同区域。此外,我们还介绍了一种新的特征融合方法,该方法将内容特征线性变换为风格特征,以保持它们的 semantic 关系。我们进行了广泛的实验,并证明了我们的提出的方法的有效性。

Breaking of brightness consistency in optical flow with a lightweight CNN network

  • paper_url: http://arxiv.org/abs/2310.15655
  • repo_url: https://github.com/linyicheng1/LET-NET
  • paper_authors: Yicheng Lin, Shuo Wang, Yunlong Jiang, Bin Han
  • for: 提高高Dynamic Range(HDR)环境中光流算法的性能,适用于各种计算机视觉任务。
  • methods: 使用轻量级网络提取对光照鲁棒的卷积特征和具有强不变性的角点,并将传统光流方法的亮度一致性假设替换为卷积特征一致性。
  • results: 在公共 HDR 数据集上将平移误差降低了 93%,验证了方法的可靠性和稳定性。
    Abstract Sparse optical flow is widely used in various computer vision tasks, however assuming brightness consistency limits its performance in High Dynamic Range (HDR) environments. In this work, a lightweight network is used to extract illumination robust convolutional features and corners with strong invariance. Modifying the typical brightness consistency of the optical flow method to the convolutional feature consistency yields the light-robust hybrid optical flow method. The proposed network runs at 190 FPS on a commercial CPU because it uses only four convolutional layers to extract feature maps and score maps simultaneously. Since the shallow network is difficult to train directly, a deep network is designed to compute the reliability map that helps it. An end-to-end unsupervised training mode is used for both networks. To validate the proposed method, we compare corner repeatability and matching performance with origin optical flow under dynamic illumination. In addition, a more accurate visual inertial system is constructed by replacing the optical flow method in VINS-Mono. In a public HDR dataset, it reduces translation errors by 93\%. The code is publicly available at https://github.com/linyicheng1/LET-NET.
    摘要 稀疏光流被广泛用于各类计算机视觉任务,但其亮度一致性假设限制了它在高动态范围(HDR)环境中的性能。在本工作中,我们使用轻量级网络提取对光照鲁棒的卷积特征和具有强不变性的角点,并将光流方法中典型的亮度一致性替换为卷积特征一致性,得到对光照鲁棒的混合光流方法。所提出的网络仅用四个卷积层同时提取特征图和分数图,在商用 CPU 上可达 190 FPS。由于浅层网络难以直接训练,我们设计了一个深层网络来计算可靠性图以辅助训练,两者均采用端到端的无监督训练方式。为验证该方法,我们在动态光照下比较了角点重复率和匹配性能与原始光流方法的差异。此外,我们将 VINS-Mono 中的光流模块替换为该方法,构建了更精确的视觉惯性系统:在一个公共 HDR 数据集上,平移误差降低了 93%。代码见 https://github.com/linyicheng1/LET-NET。
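
The abstract specifies a four-convolution network that emits feature maps and score maps simultaneously, with feature (rather than brightness) consistency then used for tracking. Below is a generic sketch of such a network; channel counts and the score activation are assumptions (the linked repository contains the actual model).

```python
import torch
import torch.nn as nn

class ShallowFeatureScoreNet(nn.Module):
    """Four-conv network emitting an illumination-robust feature map and a corner
    score map; tracking then matches these features instead of raw brightness."""
    def __init__(self, feat_channels=8):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(16, feat_channels + 1, 3, padding=1)  # 4th conv: feats + score

    def forward(self, img):
        out = self.head(self.trunk(img))
        feats, score = out[:, :-1], torch.sigmoid(out[:, -1:])
        return feats, score

net = ShallowFeatureScoreNet()
feats, score = net(torch.rand(1, 3, 240, 320))
print(feats.shape, score.shape)   # (1, 8, 240, 320) (1, 1, 240, 320)
```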

Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework

  • paper_url: http://arxiv.org/abs/2310.15646
  • repo_url: None
  • paper_authors: Weixi Weng, Chun Yuan
  • for: 提出一种基于 Mean Teacher 和 DETR 的两阶段无监督域适应目标检测框架(MTM),以解决现有方法存在的性能波动和训练停滞问题。
  • methods: 使用两种掩码特征对齐方法,即 masked domain query-based feature alignment(MDQFA)和 masked token-wise feature alignment(MTWFA),以更鲁棒地缓解域偏移并提升模型在目标域上的性能。
  • results: 在三个具有挑战性的场景上的实验以及理论分析验证了 MTM 的有效性。
    Abstract Unsupervised domain adaptation object detection(UDAOD) research on Detection Transformer(DETR) mainly focuses on feature alignment and existing methods can be divided into two kinds, each of which has its unresolved issues. One-stage feature alignment methods can easily lead to performance fluctuation and training stagnation. Two-stage feature alignment method based on mean teacher comprises a pretraining stage followed by a self-training stage, each facing problems in obtaining reliable pretrained model and achieving consistent performance gains. Methods mentioned above have not yet explore how to utilize the third related domain such as target-like domain to assist adaptation. To address these issues, we propose a two-stage framework named MTM, i.e. Mean Teacher-DETR with Masked Feature Alignment. In the pretraining stage, we utilize labeled target-like images produced by image style transfer to avoid performance fluctuation. In the self-training stage, we leverage unlabeled target images by pseudo labels based on mean teacher and propose a module called Object Queries Knowledge Transfer(OQKT) to ensure consistent performance gains of the student model. Most importantly, we propose masked feature alignment methods including Masked Domain Query-based Feature Alignment(MDQFA) and Masked Token-wise Feature Alignment(MTWFA) to alleviate domain shift in a more robust way, which not only prevent training stagnation and lead to a robust pretrained model in the pretraining stage, but also enhance the model's target performance in the self-training stage. Experiments on three challenging scenarios and a theoretical analysis verify the effectiveness of MTM.
    摘要 <>本文主要研究无监督领域适应对检测变换器(DETR)的问题,具体来说是在特征对齐方面。现有的方法可以分为两类,各自带有不解决的问题。一类是一stage特征对齐方法,容易导致性能波动和训练停滞。另一类是基于mean teacher的两stage特征对齐方法,但每个阶段都面临着获得可靠预训练模型和实现一致性提升的问题。以上方法尚未考虑利用第三个相关领域,如目标类似领域,来支持适应。为了解决这些问题,我们提出了一个名为MTM(Mean Teacher-DETR with Masked Feature Alignment)的两stage框架。在预训练阶段,我们利用了标注目标类似图像生成的image style transfer来避免性能波动。在自我训练阶段,我们利用了无标签目标图像和pseudo标签基于mean teacher,并提出了一个名为Object Queries Knowledge Transfer(OQKT)的模块,以确保学生模型的一致性提升。最重要的是,我们提出了一些masked feature alignment方法,包括Masked Domain Query-based Feature Alignment(MDQFA)和Masked Token-wise Feature Alignment(MTWFA),以减轻领域偏移,不仅在预训练阶段避免训练停滞,还在自我训练阶段提高模型的目标性能。实验结果和理论分析证明了MTM的有效性。

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

  • paper_url: http://arxiv.org/abs/2310.15624
  • repo_url: https://github.com/supermhp/gupnet
  • paper_authors: Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang
  • for: image-based monocular 3D object detection
  • methods: using perspective projection to estimate object depth; modeling geometry projection in a probabilistic manner
  • results: state-of-the-art (SOTA) performance in image-based monocular 3D detection; superiority in efficacy with a simplified framework
    Abstract Geometry plays a significant role in monocular 3D object detection. It can be used to estimate object depth by using the perspective projection between object's physical size and 2D projection in the image plane, which can introduce mathematical priors into deep models. However, this projection process also introduces error amplification, where the error of the estimated height is amplified and reflected into the projected depth. It leads to unreliable depth inferences and also impairs training stability. To tackle this problem, we propose a novel Geometry Uncertainty Propagation Network (GUPNet++) by modeling geometry projection in a probabilistic manner. This ensures depth predictions are well-bounded and associated with a reasonable uncertainty. The significance of introducing such geometric uncertainty is two-fold: (1). It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning. (2). It can be derived to a highly reliable confidence to indicate the quality of the 3D detection result, enabling more reliable detection inference. Experiments show that the proposed approach not only obtains (state-of-the-art) SOTA performance in image-based monocular 3D detection but also demonstrates superiority in efficacy with a simplified framework.
    摘要 几何在单目 3D 目标检测中扮演着重要角色:利用物体真实尺寸与其在图像平面上的二维投影之间的透视投影关系,可以估计物体深度,并为深度模型引入数学先验。然而,这一投影过程也会带来误差放大:估计高度的误差会被放大并反映到投影深度中,导致深度推断不可靠,并损害训练稳定性。为解决这一问题,我们提出了几何不确定性传播网络(GUPNet++),以概率方式对几何投影建模,从而保证深度预测有界且伴随合理的不确定性。引入几何不确定性的意义有两点:(1)它在训练中建模了几何投影的不确定性传播关系,提高端到端模型学习的稳定性和效率;(2)它可以推导出高度可靠的置信度,用于指示 3D 检测结果的质量,使推理更加可靠。实验表明,所提方法不仅在基于图像的单目 3D 检测中取得了最先进(SOTA)的性能,而且在简化框架下展现出更高的效率。
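
The error amplification the entry refers to comes from the perspective projection d = f * H_3d / h_2d. The short worked example below shows how a Gaussian uncertainty on the estimated 3D height propagates (to first order) into the depth; all numbers are illustrative.

```python
# Depth from the perspective projection d = f * H_3d / h_2d, with first-order
# propagation of the 3D-height uncertainty into the depth (illustrative values).

def depth_from_height(focal_px, h3d_m, h2d_px):
    return focal_px * h3d_m / h2d_px

def depth_sigma(focal_px, h2d_px, sigma_h3d_m):
    # d = (f / h_2d) * H_3d  =>  sigma_d = (f / h_2d) * sigma_H  (h_2d assumed exact)
    return focal_px / h2d_px * sigma_h3d_m

focal = 1000.0              # focal length in pixels
h3d, sigma_h3d = 1.6, 0.1   # estimated object height and its std-dev (metres)
h2d = 40.0                  # projected height in pixels

d = depth_from_height(focal, h3d, h2d)
sd = depth_sigma(focal, h2d, sigma_h3d)
print(f"depth = {d:.1f} m, std-dev = {sd:.1f} m")   # depth = 40.0 m, std-dev = 2.5 m
# A 10 cm height error already maps to a 2.5 m depth error at 40 m: this is the
# error amplification that motivates modelling the projection probabilistically.
```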

Grasp Multiple Objects with One Hand

  • paper_url: http://arxiv.org/abs/2310.15599
  • repo_url: None
  • paper_authors: Yuyang Li, Bo Liu, Yiran Geng, Puhao Li, Yaodong Yang, Yixin Zhu, Tengyu Liu, Siyuan Huang
  • for: 这篇论文的目的是解决机器人多物 grasping 问题,即同时抓持和操作多个物体,如对象转移和手部操作。
  • methods: 这篇论文提出了一种两阶段方法,名为 MultiGrasp,用于在桌面上使用多指灵活手柔腕抓取多个物体。该方法包括(i)生成 pré-grasp 提议和(ii)执行抓取和升起操作。
  • results: 实验主要关注 dual-object grasping,成功率为 44.13%,表明方法能够适应未看到的物体配置和不精准的抓取。方法还能够抓取更多的物体,尽管推理速度减少。
    Abstract The human hand's complex kinematics allow for simultaneous grasping and manipulation of multiple objects, essential for tasks like object transfer and in-hand manipulation. Despite its importance, robotic multi-object grasping remains underexplored and presents challenges in kinematics, dynamics, and object configurations. This paper introduces MultiGrasp, a two-stage method for multi-object grasping on a tabletop with a multi-finger dexterous hand. It involves (i) generating pre-grasp proposals and (ii) executing the grasp and lifting the objects. Experimental results primarily focus on dual-object grasping and report a 44.13% success rate, showcasing adaptability to unseen object configurations and imprecise grasps. The framework also demonstrates the capability to grasp more than two objects, albeit at a reduced inference speed.
    摘要 人类手部复杂的运动学特性使其能够同时抓取并操作多个物体,这对物体传递和手内操作等任务至关重要。尽管如此,机器人多物体抓取仍缺乏研究,并在运动学、动力学和物体构型等方面面临挑战。本文介绍 MultiGrasp,一种使用多指灵巧手在桌面上进行多物体抓取的两阶段方法,包括(i)生成预抓取提议和(ii)执行抓取并提起物体。实验主要关注双物体抓取,成功率为 44.13%,展示了对未见物体构型和不精确抓取的适应性。该框架还能抓取两个以上的物体,但推理速度有所下降。

Facial Data Minimization: Shallow Model as Your Privacy Filter

  • paper_url: http://arxiv.org/abs/2310.15590
  • repo_url: None
  • paper_authors: Yuwen Pu, Jiahao Chen, Jiayu Pan, Hao li, Diqun Yan, Xuhong Zhang, Shouling Ji
  • for: 保护用户面部数据隐私
  • methods: 提出了一种数据隐私最小化变换方法(PMT),可以基于授权服务器的浅型模型进行处理,以获得干扰数据。此外,还提出了一种增强干扰方法来提高PMT的Robustness。
  • results: 经过广泛的实验测试,PMT方法能够有效地防止面部数据泄露和滥用,同时保持面recognition精度。
    Abstract Face recognition service has been used in many fields and brings much convenience to people. However, once the user's facial data is transmitted to a service provider, the user will lose control of his/her private data. In recent years, there exist various security and privacy issues due to the leakage of facial data. Although many privacy-preserving methods have been proposed, they usually fail when they are not accessible to adversaries' strategies or auxiliary data. Hence, in this paper, by fully considering two cases of uploading facial images and facial features, which are very typical in face recognition service systems, we proposed a data privacy minimization transformation (PMT) method. This method can process the original facial data based on the shallow model of authorized services to obtain the obfuscated data. The obfuscated data can not only maintain satisfactory performance on authorized models and restrict the performance on other unauthorized models but also prevent original privacy data from leaking by AI methods and human visual theft. Additionally, since a service provider may execute preprocessing operations on the received data, we also propose an enhanced perturbation method to improve the robustness of PMT. Besides, to authorize one facial image to multiple service models simultaneously, a multiple restriction mechanism is proposed to improve the scalability of PMT. Finally, we conduct extensive experiments and evaluate the effectiveness of the proposed PMT in defending against face reconstruction, data abuse, and face attribute estimation attacks. These experimental results demonstrate that PMT performs well in preventing facial data abuse and privacy leakage while maintaining face recognition accuracy.
    摘要 《面部识别服务中的数据隐私保护》面部识别服务在各个领域中得到广泛应用,为人们带来了很大的便利。然而,一旦用户的面部数据被提供给服务提供者,用户就会失去面部数据的控制权。随着面部数据泄露的问题的出现,有许多隐私和安全问题被提出。虽然许多隐私保护方法已经被提出,但它们通常因为不能抵御敌对策略或辅助数据的攻击而失效。因此,在这篇论文中,我们根据面部识别服务系统中的两种上传方式(上传面部图像和上传面部特征),提出了一种数据隐私减少转换(PMT)方法。这种方法可以基于授权服务的浅型模型处理原始面部数据,以获得干扰后的数据。这些干扰后的数据可以保持授权服务器模型的表现和限制未授权服务器模型的表现,同时防止原始隐私数据泄露。此外,服务提供者可能会对接收到的数据进行预处理操作,因此我们还提出了一种加强干扰方法以提高PMT的Robustness。此外,为了让一张面部图像同时授权多个服务模型,我们还提出了一种多重限制机制以提高PMT的扩展性。最后,我们进行了广泛的实验和评估,并证明了PMT在防止面部数据滥用和隐私泄露的同时保持面部识别精度的效iveness。

VMAF Re-implementation on PyTorch: Some Experimental Results

  • paper_url: http://arxiv.org/abs/2310.15578
  • repo_url: None
  • paper_authors: Kirill Aistov, Maxim Koroteev
  • for: 本研究提出了一种基于PyTorch框架的VMAF实现,并与标准(libvmaf)实现进行比较,显示两者之间的差异在VMAF单位下小于10^-2。
  • methods: 本研究使用了VMAF作为目标函数,并 investigate了在计算梯度时的问题。结果表明,通过使用VMAF作为目标函数进行训练,不会导致梯度计算出现问题。
  • results: 本研究的实验结果表明,使用PyTorch框架实现的VMAF和标准(libvmaf)实现之间的差异在VMAF单位下小于10^-2。
    Abstract Based on the standard VMAF implementation we propose an implementation of VMAF using PyTorch framework. For this implementation comparisons with the standard (libvmaf) show the discrepancy $\lesssim 10^{-2}$ in VMAF units. We investigate gradients computation when using VMAF as an objective function and demonstrate that training using this function does not result in ill-behaving gradients.
    摘要 基于标准 VMAF 实现,我们提出了使用 PyTorch 框架的 VMAF 实现。与标准实现(libvmaf)相比,两者的差异在 VMAF 单位下小于 $10^{-2}$。我们研究了将 VMAF 作为目标函数时的梯度计算,并证明以该函数进行训练不会产生病态梯度。
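
A sketch of what "using VMAF as an objective function" looks like in a training loop. The differentiable metric `vmaf_torch` below is a hypothetical stand-in (a smooth dummy surrogate), not the authors' re-implementation, whose API may differ.

```python
import torch

def vmaf_torch(pred, ref):
    """Hypothetical stand-in for a differentiable VMAF re-implementation.
    Returns a score in [0, 100]; here just a smooth dummy surrogate."""
    return 100.0 * torch.exp(-((pred - ref) ** 2).mean())

enhancer = torch.nn.Conv2d(3, 3, 3, padding=1)        # toy "video enhancement" model
optimizer = torch.optim.Adam(enhancer.parameters(), lr=1e-3)

degraded = torch.rand(2, 3, 64, 64)
reference = torch.rand(2, 3, 64, 64)

for _ in range(5):
    optimizer.zero_grad()
    restored = enhancer(degraded)
    loss = 1.0 - vmaf_torch(restored, reference) / 100.0   # maximize the quality score
    loss.backward()                                        # gradients flow through the metric
    optimizer.step()
```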

I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

  • paper_url: http://arxiv.org/abs/2310.15568
  • repo_url: None
  • paper_authors: Yunyao Mao, Jiajun Deng, Wengang Zhou, Zhenbo Lu, Wanli Ouyang, Houqiang Li
  • for: 这篇论文主要用于提出一种基于对比学习的自动生成3D人体动作表示学习方法,以解决现有的对比学习框架中不充分利用不同骨骼模式之间的资源。
  • methods: 该方法基于一种泛化对比学习框架,称为Inter-和Intra-模式共同静止(I$^2$MD)框架。在I$^2$MD中,我们首先将交叉模式交互改为一种交叉模式共同静止(CMD) proces。此外,我们还采用了一种内部模式共同静止策略(IMD),以解决相似样本之间的干扰和利用其下面的上下文。
  • results: 对三个数据集进行了广泛的实验,并取得了一系列新纪录。
    Abstract Recent progresses on self-supervised 3D human action representation learning are largely attributed to contrastive learning. However, in conventional contrastive frameworks, the rich complementarity between different skeleton modalities remains under-explored. Moreover, optimized with distinguishing self-augmented samples, models struggle with numerous similar positive instances in the case of limited action categories. In this work, we tackle the aforementioned problems by introducing a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework. In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process. Different from existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD, the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training. To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy, In IMD, the Dynamic Neighbors Aggregation (DNA) mechanism is first introduced, where an additional cluster-level discrimination branch is instantiated in each modality. It adaptively aggregates highly-correlated neighboring features, forming local cluster-level contrasting. Mutual distillation is then performed between the two branches for cross-level knowledge exchange. Extensive experiments on three datasets show that our approach sets a series of new records.
    摘要 最近的进展在自监视3D人体动作表示学习中很大程度上归功于对比学习。然而,在传统的对比框架中,不同骨骼模式之间的丰富补充关系尚未得到充分利用。此外,使用自我增强样本进行优化的模型在有限的动作类型下难以处理大量相似正例的问题。在这项工作中,我们解决了上述问题通过介入一种通用的Inter-和Intra-modal Mutual Distillation(I$^2$MD)框架。在I$^2$MD中,我们首先将交叉模式交互重新表述为交叉模式共同馈敷(CMD)过程。与现有的馈敷解决方案不同,在CMD中,知识不断更新和bidirectionally馈敷 между模式 during pre-training。为了解决相似样本之间的干扰和利用他们的基础上下文,我们进一步设计了Intra-modal Mutual Distillation(IMD)策略。在IMD中,我们首次引入了Dynamic Neighbors Aggregation(DNA)机制,其中每个模式中增加了一个额外的群集级别分支。它适应地聚合高度相关的邻近特征,形成局部群集级别对比。然后,我们进行了相互馈敷 между两个分支,以实现交叉知识交换。我们在三个 dataset 上进行了广泛的实验,结果显示,我们的方法创造了一系列新的纪录。
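
A minimal sketch of bidirectional (mutual) distillation between two skeleton-modality branches: each branch's similarity distribution over a set of anchors is pushed toward the other's, with knowledge flowing in both directions rather than from a frozen teacher. The temperature, anchor construction, and stop-gradient placement are assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def mutual_distillation(z_a, z_b, anchors_a, anchors_b, tau=0.1):
    """Bidirectional KL between the two modalities' similarity distributions.
    z_*: (B, D) embeddings; anchors_*: (K, D), e.g. negatives from a memory bank."""
    logits_a = F.normalize(z_a, dim=1) @ F.normalize(anchors_a, dim=1).t() / tau
    logits_b = F.normalize(z_b, dim=1) @ F.normalize(anchors_b, dim=1).t() / tau
    # each direction treats the other branch's (detached) distribution as the target
    loss_ab = F.kl_div(F.log_softmax(logits_a, dim=1),
                       F.softmax(logits_b, dim=1).detach(), reduction="batchmean")
    loss_ba = F.kl_div(F.log_softmax(logits_b, dim=1),
                       F.softmax(logits_a, dim=1).detach(), reduction="batchmean")
    return loss_ab + loss_ba

# toy usage: stand-ins for joint-stream and bone-stream embeddings plus a memory bank
z_joint = torch.randn(16, 128, requires_grad=True)
z_bone = torch.randn(16, 128, requires_grad=True)
bank = torch.randn(4096, 128)
loss = mutual_distillation(z_joint, z_bone, bank, bank)
loss.backward()
```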

PET Synthesis via Self-supervised Adaptive Residual Estimation Generative Adversarial Network

  • paper_url: http://arxiv.org/abs/2310.15550
  • repo_url: None
  • paper_authors: Yuxin Xue, Lei Bi, Yige Peng, Michael Fulham, David Dagan Feng, Jinman Kim
  • for: Reducing the radiation exposure of positron emission tomography (PET) while maintaining high-quality molecular images.
  • methods: Convolutional neural networks (CNNs) are used to synthesize images from low-dose PET, and a self-supervised adaptive residual estimation generative adversarial network (SS-AEGAN) is proposed to address texture and structure discrepancies between synthesized and real images as well as the distribution shift between low-dose and standard PET.
  • results: On a public benchmark dataset, SS-AEGAN outperforms state-of-the-art synthesis methods while enabling reduced radiation exposure.
    Abstract Positron emission tomography (PET) is a widely used, highly sensitive molecular imaging in clinical diagnosis. There is interest in reducing the radiation exposure from PET but also maintaining adequate image quality. Recent methods using convolutional neural networks (CNNs) to generate synthesized high-quality PET images from low-dose counterparts have been reported to be state-of-the-art for low-to-high image recovery methods. However, these methods are prone to exhibiting discrepancies in texture and structure between synthesized and real images. Furthermore, the distribution shift between low-dose PET and standard PET has not been fully investigated. To address these issues, we developed a self-supervised adaptive residual estimation generative adversarial network (SS-AEGAN). We introduce (1) An adaptive residual estimation mapping mechanism, AE-Net, designed to dynamically rectify the preliminary synthesized PET images by taking the residual map between the low-dose PET and synthesized output as the input, and (2) A self-supervised pre-training strategy to enhance the feature representation of the coarse generator. Our experiments with a public benchmark dataset of total-body PET images show that SS-AEGAN consistently outperformed the state-of-the-art synthesis methods with various dose reduction factors.
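
A minimal sketch of the adaptive residual estimation idea as described in the abstract: a small network takes the residual between the low-dose input and a coarse synthesized image and predicts a correction that is added back. Layer sizes and the exact refinement rule are illustrative assumptions, not the paper's AE-Net architecture.

```python
import torch
import torch.nn as nn

class ResidualEstimator(nn.Module):
    """Predicts a correction from the residual map (low-dose input minus coarse synthesis)."""
    def __init__(self, channels: int = 1, width: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, low_dose: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        residual = low_dose - coarse        # where the coarse synthesis deviates from the input
        return coarse + self.net(residual)  # rectified synthesis

# Example on a batch of single-channel PET slices.
refiner = ResidualEstimator()
low_dose, coarse = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
refined = refiner(low_dose, coarse)         # shape (2, 1, 64, 64)
```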

Segue: Side-information Guided Generative Unlearnable Examples for Facial Privacy Protection in Real World

  • paper_url: http://arxiv.org/abs/2310.16061
  • repo_url: None
  • paper_authors: Zhiling Zhang, Jie Zhang, Kui Zhang, Wenbo Zhou, Weiming Zhang, Nenghai Yu
  • for: Protecting facial privacy by preventing face recognition systems from learning discriminative features from the data.
  • methods: Based on the concept of "unlearnable examples", imperceptible perturbations are added to the data at the model training stage so that the model cannot learn discriminative features.
  • results: A new method named Segue is proposed that quickly generates transferable unlearnable examples and is robust to JPEG compression, adversarial training, and several standard data augmentations.
    Abstract The widespread use of face recognition technology has given rise to privacy concerns, as many individuals are worried about the collection and utilization of their facial data. To address these concerns, researchers are actively exploring the concept of ``unlearnable examples", by adding imperceptible perturbation to data in the model training stage, which aims to prevent the model from learning discriminate features of the target face. However, current methods are inefficient and cannot guarantee transferability and robustness at the same time, causing impracticality in the real world. To remedy it, we propose a novel method called Segue: Side-information guided generative unlearnable examples. Specifically, we leverage a once-trained multiple-used model to generate the desired perturbation rather than the time-consuming gradient-based method. To improve transferability, we introduce side information such as true labels and pseudo labels, which are inherently consistent across different scenarios. For robustness enhancement, a distortion layer is integrated into the training pipeline. Extensive experiments demonstrate that the proposed Segue is much faster than previous methods (1000$\times$) and achieves transferable effectiveness across different datasets and model architectures. Furthermore, it can resist JPEG compression, adversarial training, and some standard data augmentations.

Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.15533
  • repo_url: None
  • paper_authors: Qing Miao, Xiaohe Wu, Chao Xu, Yanli Ji, Wangmeng Zuo, Yiwen Guo, Zhaopeng Meng
  • for: Improving generalization performance in Learning with Noisy Labels (LNL).
  • methods: A Collaborative Sample Selection (CSS) method is proposed that leverages the large-scale pre-trained model CLIP to remove mixed noisy samples from the identified clean set. The probabilities from CLIP are combined with the predictions of the DNN classifier and modeled with a 2D-GMM. In addition, fine-tuning the prompt of CLIP within a contrastive co-training mechanism improves the DNN classifier's feature representation and classification performance.
  • results: Experiments on multiple benchmark datasets show that the proposed method achieves better generalization performance than state-of-the-art methods.
    Abstract Learning with noisy labels (LNL) has been extensively studied, with existing approaches typically following a framework that alternates between clean sample selection and semi-supervised learning (SSL). However, this approach has a limitation: the clean set selected by the Deep Neural Network (DNN) classifier, trained through self-training, inevitably contains noisy samples. This mixture of clean and noisy samples leads to misguidance in DNN training during SSL, resulting in impaired generalization performance due to confirmation bias caused by error accumulation in sample selection. To address this issue, we propose a method called Collaborative Sample Selection (CSS), which leverages the large-scale pre-trained model CLIP. CSS aims to remove the mixed noisy samples from the identified clean set. We achieve this by training a 2-Dimensional Gaussian Mixture Model (2D-GMM) that combines the probabilities from CLIP with the predictions from the DNN classifier. To further enhance the adaptation of CLIP to LNL, we introduce a co-training mechanism with a contrastive loss in semi-supervised learning. This allows us to jointly train the prompt of CLIP and the DNN classifier, resulting in improved feature representation, boosted classification performance of DNNs, and reciprocal benefits to our Collaborative Sample Selection. By incorporating auxiliary information from CLIP and utilizing prompt fine-tuning, we effectively eliminate noisy samples from the clean set and mitigate confirmation bias during training. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed method in comparison with the state-of-the-art approaches.
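
A minimal sketch of the collaborative sample-selection step described above: per-sample statistics from CLIP and from the DNN classifier are combined into a 2-D feature, a two-component Gaussian mixture is fitted, and samples assigned to the low-loss component are kept as clean. The choice of per-sample losses and the 0.5 posterior threshold are assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_clean_samples(clip_loss: np.ndarray, dnn_loss: np.ndarray, threshold: float = 0.5):
    """Fit a 2-D, 2-component GMM over per-sample losses and return a likely-clean mask."""
    feats = np.stack([clip_loss, dnn_loss], axis=1)                       # (N, 2)
    gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(feats)
    clean_comp = int(np.argmin(gmm.means_.sum(axis=1)))                   # lower-loss component
    return gmm.predict_proba(feats)[:, clean_comp] > threshold

# Synthetic example: noisily-labelled samples tend to have larger losses under both models.
rng = np.random.default_rng(0)
clip_l = np.concatenate([rng.normal(0.3, 0.1, 900), rng.normal(2.0, 0.5, 100)])
dnn_l = np.concatenate([rng.normal(0.4, 0.1, 900), rng.normal(2.5, 0.5, 100)])
print(select_clean_samples(clip_l, dnn_l).sum(), "samples kept as clean")
```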

NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation

  • paper_url: http://arxiv.org/abs/2310.19820
  • repo_url: None
  • paper_authors: Shunyao Zhang, Yonggan Fu, Shang Wu, Jyotikrishna Dass, Haoran You, Yingyan, Lin
  • for: Improving the task accuracy of tiny neural networks (TNNs) so that they can be deployed on resource-constrained edge devices.
  • methods: A framework called NetDistiller is proposed that treats the TNN as a sub-network of a weight-sharing teacher built by expanding the TNN's channel counts, and handles gradient conflicts and teacher overfitting via (1) gradient surgery and (2) uncertainty-aware distillation.
  • results: Extensive experiments across diverse tasks show that NetDistiller effectively boosts the achievable accuracy of TNNs and outperforms existing methods. Code is available at https://github.com/GATECH-EIC/NetDistiller.
    Abstract Boosting the task accuracy of tiny neural networks (TNNs) has become a fundamental challenge for enabling the deployments of TNNs on edge devices which are constrained by strict limitations in terms of memory, computation, bandwidth, and power supply. To this end, we propose a framework called NetDistiller to boost the achievable accuracy of TNNs by treating them as sub-networks of a weight-sharing teacher constructed by expanding the number of channels of the TNN. Specifically, the target TNN model is jointly trained with the weight-sharing teacher model via (1) gradient surgery to tackle the gradient conflicts between them and (2) uncertainty-aware distillation to mitigate the overfitting of the teacher model. Extensive experiments across diverse tasks validate NetDistiller's effectiveness in boosting TNNs' achievable accuracy over state-of-the-art methods. Our code is available at https://github.com/GATECH-EIC/NetDistiller.
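
A minimal sketch of one common form of gradient surgery (a PCGrad-style projection) of the kind the abstract refers to for resolving conflicts between the TNN student and the weight-sharing teacher on a shared parameter; NetDistiller's actual surgery may differ from this generic version.

```python
import torch

def surgery(grad_student: torch.Tensor, grad_teacher: torch.Tensor) -> torch.Tensor:
    """Combine two gradients for a shared parameter, removing the conflicting component.

    If the gradients conflict (negative dot product), project the teacher's gradient
    onto the normal plane of the student's gradient before summing.
    """
    dot = torch.dot(grad_student.flatten(), grad_teacher.flatten())
    if dot < 0:
        scale = dot / grad_student.norm().pow(2).clamp_min(1e-12)
        grad_teacher = grad_teacher - scale * grad_student
    return grad_student + grad_teacher

# Toy example on a 2-D shared weight: the conflicting first-axis component is removed.
print(surgery(torch.tensor([1.0, 0.0]), torch.tensor([-0.5, 1.0])))  # tensor([1., 1.])
```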

Cross-view Self-localization from Synthesized Scene-graphs

  • paper_url: http://arxiv.org/abs/2310.15504
  • repo_url: None
  • paper_authors: Ryogo Yamamoto, Kanji Tanaka
  • for: Addressing the cross-view self-localization problem, in which database images are provided only from sparse viewpoints.
  • methods: NeRF is used to synthesize database images; view-invariant appearance features are learned from the raw images and view-dependent spatial-semantic features from the synthesized images, and the two are fused into scene graphs that are compressively learned and recognized by a graph neural network.
  • results: Using the proposed hybrid scene model, experiments on a new cross-view self-localization dataset with many unseen views generated by a photorealistic Habitat simulator show that the method achieves strong performance.
    Abstract Cross-view self-localization is a challenging scenario of visual place recognition in which database images are provided from sparse viewpoints. Recently, an approach for synthesizing database images from unseen viewpoints using NeRF (Neural Radiance Fields) technology has emerged with impressive performance. However, synthesized images provided by these techniques are often of lower quality than the original images, and furthermore they significantly increase the storage cost of the database. In this study, we explore a new hybrid scene model that combines the advantages of view-invariant appearance features computed from raw images and view-dependent spatial-semantic features computed from synthesized images. These two types of features are then fused into scene graphs, and compressively learned and recognized by a graph neural network. The effectiveness of the proposed method was verified using a novel cross-view self-localization dataset with many unseen views generated using a photorealistic Habitat simulator.

Salient Object Detection in RGB-D Videos

  • paper_url: http://arxiv.org/abs/2310.15482
  • repo_url: https://github.com/kerenfu/rdvs
  • paper_authors: Ao Mou, Yukang Lu, Jiahao He, Dingyao Min, Keren Fu, Qijun Zhao
  • for: Studying salient object detection (SOD) in RGB-D videos to improve the accuracy of detecting salient objects in video.
  • methods: A new three-stream network, DCTNet+, is proposed, in which RGB is the primary modality and depth and optical flow serve as auxiliary modalities. A multi-modal attention module (MAM) and a refinement fusion module (RFM) enhance, refine, and fuse features for an accurate final prediction.
  • results: Experiments on pseudo RGB-D video datasets show that DCTNet+ outperforms 17 video SOD models and 14 RGB-D SOD models on salient object detection, and it also outperforms these models on realistic RGB-D video data.
    Abstract Given the widespread adoption of depth-sensing acquisition devices, RGB-D videos and related data/media have gained considerable traction in various aspects of daily life. Consequently, conducting salient object detection (SOD) in RGB-D videos presents a highly promising and evolving avenue. Despite the potential of this area, SOD in RGB-D videos remains somewhat under-explored, with RGB-D SOD and video SOD (VSOD) traditionally studied in isolation. To explore this emerging field, this paper makes two primary contributions: the dataset and the model. On one front, we construct the RDVS dataset, a new RGB-D VSOD dataset with realistic depth and characterized by its diversity of scenes and rigorous frame-by-frame annotations. We validate the dataset through comprehensive attribute and object-oriented analyses, and provide training and testing splits. Moreover, we introduce DCTNet+, a three-stream network tailored for RGB-D VSOD, with an emphasis on RGB modality and treats depth and optical flow as auxiliary modalities. In pursuit of effective feature enhancement, refinement, and fusion for precise final prediction, we propose two modules: the multi-modal attention module (MAM) and the refinement fusion module (RFM). To enhance interaction and fusion within RFM, we design a universal interaction module (UIM) and then integrate holistic multi-modal attentive paths (HMAPs) for refining multi-modal low-level features before reaching RFMs. Comprehensive experiments, conducted on pseudo RGB-D video datasets alongside our RDVS, highlight the superiority of DCTNet+ over 17 VSOD models and 14 RGB-D SOD models. Ablation experiments were performed on both pseudo and realistic RGB-D video datasets to demonstrate the advantages of individual modules as well as the necessity of introducing realistic depth. Our code together with RDVS dataset will be available at https://github.com/kerenfu/RDVS/.

DeepIron: Predicting Unwarped Garment Texture from a Single Image

  • paper_url: http://arxiv.org/abs/2310.15447
  • repo_url: None
  • paper_authors: Hyun-Song Kwon, Sung-Hee Lee
  • for: Avatar creation and virtual try-on.
  • methods: A texture-reconstruction framework including a Texture Unwarper that infers the original, unwarped texture from the input image and maps it onto the 3D garment model.
  • results: Generates high-quality texture images that allow 3D garment models to display realistically deformed textures in new poses.
    Abstract Realistic reconstruction of 3D clothing from an image has wide applications, such as avatar creation and virtual try-on. This paper presents a novel framework that reconstructs the texture map for 3D garments from a single image with pose. Assuming that 3D garments are modeled by stitching 2D garment sewing patterns, our specific goal is to generate a texture image for the sewing patterns. A key component of our framework, the Texture Unwarper, infers the original texture image from the input clothing image, which exhibits warping and occlusion of texture due to the user's body shape and pose. The Texture Unwarper effectively transforms between the input and output images by mapping the latent spaces of the two images. By inferring the unwarped original texture of the input garment, our method helps reconstruct 3D garment models that can show high-quality texture images realistically deformed for new poses. We validate the effectiveness of our approach through a comparison with other methods and ablation studies.

Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks

  • paper_url: http://arxiv.org/abs/2310.15444
  • repo_url: None
  • paper_authors: Xiaojun Jia, Jianshu Li, Jindong Gu, Yang Bai, Xiaochun Cao
  • for: Improving model robustness and training efficiency.
  • methods: Single-step adversarial training with dynamically sampled lightweight subnetworks built from the model's interior blocks, together with a theoretical analysis showing that this training improves model robustness.
  • results: Compared with previous methods, the approach reduces training cost while achieving better model robustness.
    Abstract Adversarial training has shown promise in building robust models against adversarial examples. A major drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples. To overcome this limitation, adversarial training based on single-step attacks has been explored. Previous work improves the single-step adversarial training from different perspectives, e.g., sample initialization, loss regularization, and training strategy. Almost all of them treat the underlying model as a black box. In this work, we propose to exploit the interior building blocks of the model to improve efficiency. Specifically, we propose to dynamically sample lightweight subnetworks as a surrogate model during training. By doing this, both the forward and backward passes can be accelerated for efficient adversarial training. Besides, we provide theoretical analysis to show the model robustness can be improved by the single-step adversarial training with sampled subnetworks. Furthermore, we propose a novel sampling strategy where the sampling varies from layer to layer and from iteration to iteration. Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness. Evaluations on a series of popular datasets demonstrate the effectiveness of the proposed FB-Better. Our code has been released at https://github.com/jiaxiaojunQAQ/FP-Better.
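
A minimal sketch of the core idea of generating the single-step adversarial example with a randomly sampled, lighter subnetwork (here, randomly skipping residual blocks) and then training the full model on it. The toy model, the block-dropping scheme, and the FGSM step are illustrative assumptions, not the released FP-Better implementation.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkippableMLP(nn.Module):
    """Toy residual MLP whose hidden blocks can be randomly skipped to form a subnetwork."""
    def __init__(self, dim=32, depth=6, num_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, keep_prob=1.0):
        for block in self.blocks:
            if self.training and random.random() > keep_prob:
                continue                          # skip this block: cheaper forward/backward
            x = x + F.relu(block(x))
        return self.head(x)

def fgsm_example(model, x, y, eps=0.03, keep_prob=0.5):
    """Craft a single-step adversarial example using a sampled subnetwork."""
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv, keep_prob=keep_prob), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x + eps * grad.sign()).detach()

model = SkippableMLP()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
x_adv = fgsm_example(model, x, y)        # cheap: only about half of the blocks are used
loss = F.cross_entropy(model(x_adv), y)  # the full model is trained on the adversarial batch
opt.zero_grad(); loss.backward(); opt.step()
```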

G2-MonoDepth: A General Framework of Generalized Depth Inference from Monocular RGB+X Data

  • paper_url: http://arxiv.org/abs/2310.15422
  • repo_url: https://github.com/wang-xjtu/g2-monodepth
  • paper_authors: Haotian Wang, Meng Yang, Nanning Zheng
  • for: This paper aims to solve the problem of monocular depth inference for robots, which is a fundamental problem for scene perception.
  • methods: The paper proposes a unified task of monocular depth inference, which uses a unified data representation, a novel unified loss, an improved network, and a data augmentation pipeline to well propagate diverse scene scales from input to output.
  • results: The paper demonstrates the effectiveness of its approach on three sub-tasks, including depth estimation, depth completion with different sparsity, and depth enhancement in unseen scenes, and outperforms state-of-the-art baselines on both real-world data and synthetic data.
    Abstract Monocular depth inference is a fundamental problem for scene perception of robots. Specific robots may be equipped with a camera plus an optional depth sensor of any type and located in various scenes of different scales, whereas recent advances derived multiple individual sub-tasks. It leads to additional burdens to fine-tune models for specific robots and thereby high-cost customization in large-scale industrialization. This paper investigates a unified task of monocular depth inference, which infers high-quality depth maps from all kinds of input raw data from various robots in unseen scenes. A basic benchmark G2-MonoDepth is developed for this task, which comprises four components: (a) a unified data representation RGB+X to accommodate RGB plus raw depth with diverse scene scale/semantics, depth sparsity ([0%, 100%]) and errors (holes/noises/blurs), (b) a novel unified loss to adapt to diverse depth sparsity/errors of input raw data and diverse scales of output scenes, (c) an improved network to well propagate diverse scene scales from input to output, and (d) a data augmentation pipeline to simulate all types of real artifacts in raw depth maps for training. G2-MonoDepth is applied in three sub-tasks including depth estimation, depth completion with different sparsity, and depth enhancement in unseen scenes, and it always outperforms SOTA baselines on both real-world data and synthetic data.
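
A minimal sketch of the unified RGB+X input representation described in the abstract: the raw (possibly empty) sparse depth is stacked with the RGB image together with a validity mask, so one network can cover depth estimation (no raw depth), completion, and enhancement. The exact channel layout is an assumption.

```python
from typing import Optional
import torch

def build_rgb_x(rgb: torch.Tensor, raw_depth: Optional[torch.Tensor]) -> torch.Tensor:
    """Stack RGB, raw depth, and a validity mask into a single (N, 5, H, W) input.

    rgb:       (N, 3, H, W) in [0, 1]
    raw_depth: (N, 1, H, W) with zeros at missing pixels, or None for pure estimation.
    """
    n, _, h, w = rgb.shape
    if raw_depth is None:
        raw_depth = torch.zeros(n, 1, h, w, dtype=rgb.dtype, device=rgb.device)
    valid = (raw_depth > 0).to(rgb.dtype)   # 1 where a raw depth value exists
    return torch.cat([rgb, raw_depth, valid], dim=1)

# The same representation covers all three sub-tasks:
rgb = torch.rand(2, 3, 64, 64)
x_estimate = build_rgb_x(rgb, None)                                   # depth estimation (no depth)
sparse = torch.rand(2, 1, 64, 64) * (torch.rand(2, 1, 64, 64) > 0.95)
x_complete = build_rgb_x(rgb, sparse)                                 # depth completion
```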

cs.AI - 2023-10-24

Speakerly: A Voice-based Writing Assistant for Text Composition

  • paper_url: http://arxiv.org/abs/2310.16251
  • repo_url: None
  • paper_authors: Dhruv Kumar, Vipul Raheja, Alice Kaiser-Schatzlein, Robyn Perry, Apurva Joshi, Justin Hugues-Nuger, Samuel Lou, Navid Chowdhury
  • for: This paper describes a new real-time voice-based writing assistance system that helps users compose text across use cases such as emails, instant messages, and notes.
  • methods: The system combines small, task-specific models with pre-trained language models for fast and effective text composition, and supports a variety of input modes for better usability.
  • results: The system generates well-formatted and coherent documents and can be deployed at scale.
    Abstract We present Speakerly, a new real-time voice-based writing assistance system that helps users with text composition across various use cases such as emails, instant messages, and notes. The user can interact with the system through instructions or dictation, and the system generates a well-formatted and coherent document. We describe the system architecture and detail how we address the various challenges while building and deploying such a system at scale. More specifically, our system uses a combination of small, task-specific models as well as pre-trained language models for fast and effective text composition while supporting a variety of input modes for better usability.

A clustering tool for interrogating finite element models based on eigenvectors of graph adjacency

  • paper_url: http://arxiv.org/abs/2310.16249
  • repo_url: None
  • paper_authors: Ramaseshan Kannan
  • for: Debugging errors in finite element (FE) models.
  • methods: An unsupervised learning algorithm that clusters the degrees of freedom in the FE model using numerical properties of the adjacency of its stiffness matrix.
  • results: The tool has been successfully applied to debugging real-world FE models, and usage examples are provided.
    Abstract This note introduces an unsupervised learning algorithm to debug errors in finite element (FE) simulation models and details how it was productionised. The algorithm clusters degrees of freedom in the FE model using numerical properties of the adjacency of its stiffness matrix. The algorithm has been deployed as a tool called `Model Stability Analysis' tool within the commercial structural FE suite Oasys GSA (www.oasys-software.com/gsa). It has been used successfully by end-users for debugging real world FE models and we present examples of the tool in action.

Pixel-Level Clustering Network for Unsupervised Image Segmentation

  • paper_url: http://arxiv.org/abs/2310.16234
  • repo_url: None
  • paper_authors: Cuong Manh Hoang, Byeongkeun Kang
  • for: Improving the accuracy and efficiency of image segmentation without ground-truth annotations.
  • methods: An unsupervised image segmentation framework based on pixel-level feature embedding with an attention mechanism, feature statistics computation, image reconstruction, and superpixel segmentation.
  • results: Experiments on three public datasets (the Berkeley segmentation dataset, PASCAL VOC 2012, and COCO-Stuff) show that the proposed method outperforms previous state-of-the-art approaches.
    Abstract While image segmentation is crucial in various computer vision applications, such as autonomous driving, grasping, and robot navigation, annotating all objects at the pixel-level for training is nearly impossible. Therefore, the study of unsupervised image segmentation methods is essential. In this paper, we present a pixel-level clustering framework for segmenting images into regions without using ground truth annotations. The proposed framework includes feature embedding modules with an attention mechanism, a feature statistics computing module, image reconstruction, and superpixel segmentation to achieve accurate unsupervised segmentation. Additionally, we propose a training strategy that utilizes intra-consistency within each superpixel, inter-similarity/dissimilarity between neighboring superpixels, and structural similarity between images. To avoid potential over-segmentation caused by superpixel-based losses, we also propose a post-processing method. Furthermore, we present an extension of the proposed method for unsupervised semantic segmentation. We conducted experiments on three publicly available datasets (Berkeley segmentation dataset, PASCAL VOC 2012 dataset, and COCO-Stuff dataset) to demonstrate the effectiveness of the proposed framework. The experimental results show that the proposed framework outperforms previous state-of-the-art methods.

CleanCoNLL: A Nearly Noise-Free Named Entity Recognition Dataset

  • paper_url: http://arxiv.org/abs/2310.16225
  • repo_url: https://github.com/flairnlp/cleanconll
  • paper_authors: Susanna Rücker, Alan Akbik
  • for: This paper aims to improve the annotation quality of the CoNLL-03 dataset so that named entity recognition (NER) models can be compared and analyzed more reliably.
  • methods: The English CoNLL-03 dataset is comprehensively relabeled with the help of automatic consistency checks and additional entity linking annotations.
  • results: Experiments show that on this dataset state-of-the-art models reach an F1 score of 97.1%, and the share of correct predictions falsely counted as errors due to annotation noise drops from 47% to 6%. This indicates the resource is well suited to analyzing the remaining errors of current models and that the theoretical upper bound has not yet been reached.
    Abstract The CoNLL-03 corpus is arguably the most well-known and utilized benchmark dataset for named entity recognition (NER). However, prior works found significant numbers of annotation errors, incompleteness, and inconsistencies in the data. This poses challenges to objectively comparing NER approaches and analyzing their errors, as current state-of-the-art models achieve F1-scores that are comparable to or even exceed the estimated noise level in CoNLL-03. To address this issue, we present a comprehensive relabeling effort assisted by automatic consistency checking that corrects 7.0% of all labels in the English CoNLL-03. Our effort adds a layer of entity linking annotation both for better explainability of NER labels and as additional safeguard of annotation quality. Our experimental evaluation finds not only that state-of-the-art approaches reach significantly higher F1-scores (97.1%) on our data, but crucially that the share of correct predictions falsely counted as errors due to annotation noise drops from 47% to 6%. This indicates that our resource is well suited to analyze the remaining errors made by state-of-the-art models, and that the theoretical upper bound even on high resource, coarse-grained NER is not yet reached. To facilitate such analysis, we make CleanCoNLL publicly available to the research community.

Hierarchical Randomized Smoothing

  • paper_url: http://arxiv.org/abs/2310.16221
  • repo_url: https://github.com/ManojKumarPatnaik/Major-project-list
  • paper_authors: Yan Scholten, Jan Schuchardt, Aleksandar Bojchevski, Stephan Günnemann
  • for: Improving models' robustness to small changes in their inputs while maintaining accuracy.
  • methods: Hierarchical randomized smoothing adds random noise only on a randomly selected subset of an object's entities, improving both robustness and accuracy through more targeted noise.
  • results: On image and node classification tasks, hierarchical randomized smoothing yields better robustness-accuracy trade-offs than conventional methods.
    Abstract Real-world data is complex and often consists of objects that can be decomposed into multiple entities (e.g. images into pixels, graphs into interconnected nodes). Randomized smoothing is a powerful framework for making models provably robust against small changes to their inputs - by guaranteeing robustness of the majority vote when randomly adding noise before classification. Yet, certifying robustness on such complex data via randomized smoothing is challenging when adversaries do not arbitrarily perturb entire objects (e.g. images) but only a subset of their entities (e.g. pixels). As a solution, we introduce hierarchical randomized smoothing: We partially smooth objects by adding random noise only on a randomly selected subset of their entities. By adding noise in a more targeted manner than existing methods we obtain stronger robustness guarantees while maintaining high accuracy. We initialize hierarchical smoothing using different noising distributions, yielding novel robustness certificates for discrete and continuous domains. We experimentally demonstrate the importance of hierarchical smoothing in image and node classification, where it yields superior robustness-accuracy trade-offs. Overall, hierarchical smoothing is an important contribution towards models that are both - certifiably robust to perturbations and accurate.
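
A minimal sketch of the partial-smoothing step described above: rather than noising every pixel, each Monte-Carlo sample adds Gaussian noise only on a randomly selected subset of pixel locations, and the smoothed classifier takes a majority vote over the noisy copies. The subset fraction, noise scale, and voting loop are illustrative assumptions; the paper's certification procedure is not shown.

```python
from collections import Counter
import torch

def hierarchically_smoothed_predict(model, x, num_samples=100, pixel_frac=0.3, sigma=0.25):
    """Majority-vote prediction over copies of `x` noised on random pixel subsets.

    x: (C, H, W) image. Each sample picks a fraction `pixel_frac` of pixel locations
    uniformly at random and adds N(0, sigma^2) noise only there (all channels).
    """
    _, h, w = x.shape
    preds = []
    with torch.no_grad():
        for _ in range(num_samples):
            mask = (torch.rand(1, h, w, device=x.device) < pixel_frac).to(x.dtype)
            noisy = x + sigma * torch.randn_like(x) * mask
            preds.append(int(model(noisy.unsqueeze(0)).argmax(dim=1)))
    return Counter(preds).most_common(1)[0][0]

# Example with a toy classifier on a 32x32 RGB image.
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
print(hierarchically_smoothed_predict(toy_model, torch.rand(3, 32, 32)))
```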

Knowledge Editing for Large Language Models: A Survey

  • paper_url: http://arxiv.org/abs/2310.16218
  • repo_url: None
  • paper_authors: Song Wang, Yaochen Zhu, Haochen Liu, Zaiyi Zheng, Chen Chen, Jundong Li
  • for: This paper aims to provide a comprehensive and in-depth overview of recent advances in the field of Knowledge-based Model Editing (KME) for pre-trained large language models (LLMs).
  • methods: The paper uses a general formulation of KME to encompass different KME strategies, and introduces an innovative taxonomy of KME techniques based on how the new knowledge is introduced into pre-trained LLMs.
  • results: The paper provides an in-depth analysis of existing KME strategies, including their key insights, advantages, and limitations. It also introduces representative metrics, datasets, and applications of KME.
    Abstract Large language models (LLMs) have recently transformed both the academic and industrial landscapes due to their remarkable capacity to understand, analyze, and generate texts based on their vast knowledge and reasoning ability. Nevertheless, one major drawback of LLMs is their substantial computational cost for pre-training due to their unprecedented amounts of parameters. The disadvantage is exacerbated when new knowledge frequently needs to be introduced into the pre-trained model. Therefore, it is imperative to develop effective and efficient techniques to update pre-trained LLMs. Traditional methods encode new knowledge in pre-trained LLMs through direct fine-tuning. However, naively re-training LLMs can be computationally intensive and risks degenerating valuable pre-trained knowledge irrelevant to the update in the model. Recently, Knowledge-based Model Editing (KME) has attracted increasing attention, which aims to precisely modify the LLMs to incorporate specific knowledge, without negatively influencing other irrelevant knowledge. In this survey, we aim to provide a comprehensive and in-depth overview of recent advances in the field of KME. We first introduce a general formulation of KME to encompass different KME strategies. Afterward, we provide an innovative taxonomy of KME techniques based on how the new knowledge is introduced into pre-trained LLMs, and investigate existing KME strategies while analyzing key insights, advantages, and limitations of methods from each category. Moreover, representative metrics, datasets, and applications of KME are introduced accordingly. Finally, we provide an in-depth analysis regarding the practicality and remaining challenges of KME and suggest promising research directions for further advancement in this field.

Length is a Curse and a Blessing for Document-level Semantics

  • paper_url: http://arxiv.org/abs/2310.16193
  • repo_url: https://github.com/gowitheflow-1998/la-ser-cubed
  • paper_authors: Chenghao Xiao, Yizhi Li, G Thomas Hudson, Chenghua Lin, Noura Al Moubayed
  • for: This paper aims to investigate the length generalizability of contrastive learning (CL) models and develop a new framework for learning semantically robust sentence representations that is not vulnerable to length-induced semantic shift.
  • methods: The authors use unsupervised CL methods that rely solely on the semantic signal provided by document length to devise a new framework for learning sentence representations.
  • results: The proposed framework, LA(SER)$^{3}$, achieves state-of-the-art unsupervised performance on the standard information retrieval benchmark, demonstrating the effectiveness of the length-agnostic self-reference approach in learning semantically robust sentence representations.
    Abstract In recent years, contrastive learning (CL) has been extensively utilized to recover sentence and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability towards length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but we can devise unsupervised CL methods solely depending on the semantic signal provided by document length. We first derive the theoretical foundations underlying length attacks, showing that elongating a document would intensify the high intra-document similarity that is already brought by CL. Moreover, we found that isotropy promised by CL is highly dependent on the length range of text exposed in training. Inspired by these findings, we introduce a simple yet universal document representation learning framework, LA(SER)$^{3}$: length-agnostic self-reference for semantically robust sentence representation learning, achieving state-of-the-art unsupervised performance on the standard information retrieval benchmark.

Correction with Backtracking Reduces Hallucination in Summarization

  • paper_url: http://arxiv.org/abs/2310.16176
  • repo_url: None
  • paper_authors: Zhenzhen Liu, Chao Wan, Varsha Kishore, Jin Peng Zhou, Minmin Chen, Kilian Q. Weinberger
    for: 本研究旨在提高神经网络抽象摘要模型的可靠性,减少幻想(也称为假想)现象,以生成简洁而准确的摘要。methods: 本研究提出了一种简单 yet efficient的技术——CoBa,通过两个步骤来减少幻想:检测幻想和抑制幻想。检测幻想可以通过测量 conditional word probabilities 和距离上下文字的距离来实现。而抑制幻想可以使用直观的 backtracking 技术。results: 对于三个文本摘要 benchmark 数据集,我们进行了广泛的评估。结果表明,CoBa 能够有效地减少幻想现象,同时具有很好的适应性和灵活性。
    Abstract Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements. Despite recent advances, neural text summarization models are known to be susceptible to hallucinating (or more correctly confabulating), that is to produce summaries with details that are not grounded in the source document. In this paper, we introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization. The approach is based on two steps: hallucination detection and mitigation. We show that the former can be achieved through measuring simple statistics about conditional word probabilities and distance to context words. Further, we demonstrate that straight-forward backtracking is surprisingly effective at mitigation. We thoroughly evaluate the proposed method with prior art on three benchmark datasets for text summarization. The results show that CoBa is effective and efficient in reducing hallucination, and offers great adaptability and flexibility.
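
A minimal sketch of the detect-then-backtrack idea described above, using a low conditional token probability as a crude stand-in for the paper's hallucination detector (which also uses distance-to-context statistics). The GPT-2 checkpoint, thresholds, and resampling rule are assumptions, not the authors' implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def generate_with_backtracking(prompt, max_new_tokens=40, prob_threshold=0.05, max_backtracks=5):
    """Greedy decoding that backtracks one token and resamples when the chosen token's
    conditional probability falls below `prob_threshold`."""
    ids = tok(prompt, return_tensors="pt").input_ids[0].tolist()
    prompt_len, backtracks, sample_next = len(ids), 0, False
    while len(ids) < prompt_len + max_new_tokens:
        with torch.no_grad():
            probs = torch.softmax(lm(torch.tensor([ids])).logits[0, -1], dim=-1)
        token = int(torch.multinomial(probs, 1)) if sample_next else int(probs.argmax())
        sample_next = False
        if probs[token] < prob_threshold and backtracks < max_backtracks and len(ids) > prompt_len:
            ids.pop()            # drop the token that led to the suspect continuation
            backtracks += 1
            sample_next = True   # try an alternative continuation at the earlier position
            continue
        ids.append(token)
    return tok.decode(ids)

print(generate_with_backtracking("The main ingredients of a plain omelette are"))
```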

Context-aware feature attribution through argumentation

  • paper_url: http://arxiv.org/abs/2310.16157
  • repo_url: None
  • paper_authors: Jinfeng Zhong, Elsa Negre
  • for: Improving the accuracy and interpretability of existing feature attribution methods while taking users' contexts into account to improve predictions.
  • methods: A new feature attribution framework, Context-Aware Feature Attribution Through Argumentation (CA-FATA), which treats each feature as an argument and formulates feature attribution as an argumentation procedure.
  • results: Compared with existing methods, CA-FATA better accounts for users' contexts and improves the accuracy and interpretability of feature attributions.
    Abstract Feature attribution is a fundamental task in both machine learning and data analysis, which involves determining the contribution of individual features or variables to a model's output. This process helps identify the most important features for predicting an outcome. The history of feature attribution methods can be traced back to General Additive Models (GAMs), which extend linear regression models by incorporating non-linear relationships between dependent and independent variables. In recent years, gradient-based methods and surrogate models have been applied to unravel complex Artificial Intelligence (AI) systems, but these methods have limitations. GAMs tend to achieve lower accuracy, gradient-based methods can be difficult to interpret, and surrogate models often suffer from stability and fidelity issues. Furthermore, most existing methods do not consider users' contexts, which can significantly influence their preferences. To address these limitations and advance the current state-of-the-art, we define a novel feature attribution framework called Context-Aware Feature Attribution Through Argumentation (CA-FATA). Our framework harnesses the power of argumentation by treating each feature as an argument that can either support, attack or neutralize a prediction. Additionally, CA-FATA formulates feature attribution as an argumentation procedure, and each computation has explicit semantics, which makes it inherently interpretable. CA-FATA also easily integrates side information, such as users' contexts, resulting in more accurate predictions.

Yin Yang Convolutional Nets: Image Manifold Extraction by the Analysis of Opposites

  • paper_url: http://arxiv.org/abs/2310.16148
  • repo_url: https://github.com/nosaveddata/yinyang_cnn
  • paper_authors: Augusto Seben da Rosa, Frederico Santos de Oliveira, Anderson da Silva Soares, Arnaldo Candido Junior
  • for: To propose a biologically inspired computer vision model that better mimics how the brain operates.
  • methods: The model adopts a new architecture called the Yin Yang Convolutional Network, whose blocks separate the analysis of colors and forms in the initial layers to simulate the operation of the occipital lobe.
  • results: The model reaches state-of-the-art efficiency among low-parameter architectures on CIFAR-10: the first model achieves 93.32% test accuracy, 0.8% above the previous SOTA in this category, with 150k fewer parameters (726k in total); a second model uses 52k parameters and loses only 3.86% test accuracy. On ImageNet, the model reaches 66.49% validation accuracy with 1.6M parameters. Code is available at https://github.com/NoSavedDATA/YinYang_CNN.
    Abstract Computer vision in general presented several advances such as training optimizations, new architectures (pure attention, efficient block, vision language models, generative models, among others). This have improved performance in several tasks such as classification, and others. However, the majority of these models focus on modifications that are taking distance from realistic neuroscientific approaches related to the brain. In this work, we adopt a more bio-inspired approach and present the Yin Yang Convolutional Network, an architecture that extracts visual manifold, its blocks are intended to separate analysis of colors and forms at its initial layers, simulating occipital lobe's operations. Our results shows that our architecture provides State-of-the-Art efficiency among low parameter architectures in the dataset CIFAR-10. Our first model reached 93.32\% test accuracy, 0.8\% more than the older SOTA in this category, while having 150k less parameters (726k in total). Our second model uses 52k parameters, losing only 3.86\% test accuracy. We also performed an analysis on ImageNet, where we reached 66.49\% validation accuracy with 1.6M parameters. We make the code publicly available at: https://github.com/NoSavedDATA/YinYang_CNN.

PreWoMe: Exploiting Presuppositions as Working Memory for Long Form Question Answering

  • paper_url: http://arxiv.org/abs/2310.16147
  • repo_url: None
  • paper_authors: Wookje Han, Jinsol Park, Kyungjae Lee
  • for: Addressing ambiguity and false presuppositions in information-seeking questions.
  • methods: Presuppositions are extracted from the question and exploited as working memory to generate feedback and actions about the question.
  • results: Experiments show that the approach handles not only misleading questions but also normal ones, demonstrating the effectiveness of leveraging presuppositions, feedback, and actions in real-world QA settings.
    Abstract Information-seeking questions in long-form question answering (LFQA) often prove misleading due to ambiguity or false presupposition in the question. While many existing approaches handle misleading questions, they are tailored to limited questions, which are insufficient in a real-world setting with unpredictable input characteristics. In this work, we propose PreWoMe, a unified approach capable of handling any type of information-seeking question. The key idea of PreWoMe involves extracting presuppositions in the question and exploiting them as working memory to generate feedback and action about the question. Our experiment shows that PreWoMe is effective not only in tackling misleading questions but also in handling normal ones, thereby demonstrating the effectiveness of leveraging presuppositions, feedback, and action for real-world QA settings.

From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning

  • paper_url: http://arxiv.org/abs/2310.18364
  • repo_url: https://github.com/sled-group/heuristic-analytic-reasoning
  • paper_authors: Zheyuan Zhang, Shane Storks, Fengyuan Hu, Sungryull Sohn, Moontae Lee, Honglak Lee, Joyce Chai
  • for: To improve the coherence and reliability of pre-trained language models (PLMs) by incorporating the two distinct processes from cognitive psychology: fast, intuitive heuristic thinking and slower, deliberative analytic reasoning.
  • methods: The two interlinked processes are incorporated into fine-tuning and in-context learning with PLMs and applied to two language understanding tasks that require coherent physical commonsense reasoning.
  • results: The proposed Heuristic-Analytic Reasoning (HAR) strategies substantially improve the coherence of the models' rationalizations, achieving state-of-the-art results on the TRIP task; the improved coherence is a direct result of more faithful attention to relevant language context at each reasoning step.
    Abstract Pre-trained language models (PLMs) have shown impressive performance in various language tasks. However, they are prone to spurious correlations, and often generate illusory information. In real-world applications, PLMs should justify decisions with formalized, coherent reasoning chains, but this challenge remains under-explored. Cognitive psychology theorizes that humans are capable of utilizing fast and intuitive heuristic thinking to make decisions based on past experience, then rationalizing the decisions through slower and deliberative analytic reasoning. We incorporate these interlinked dual processes in fine-tuning and in-context learning with PLMs, applying them to two language understanding tasks that require coherent physical commonsense reasoning. We show that our proposed Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions, yielding state-of-the-art results on Tiered Reasoning for Intuitive Physics (TRIP). We also find that this improved coherence is a direct result of more faithful attention to relevant language context in each step of reasoning. Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of PLM reasoning.

Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature

  • paper_url: http://arxiv.org/abs/2310.16146
  • repo_url: None
  • paper_authors: Alejandro Lozano, Scott L Fleming, Chia-Chun Chiang, Nigam Shah
  • for: This paper aims to provide an open-source WebApp called Clinfo.ai that answers clinical questions based on dynamically retrieved scientific literature, and to evaluate the performance of such retrieval-augmented language models (LLMs) using a specified information retrieval and abstractive summarization task.
  • methods: The authors use a dataset of 200 questions and corresponding answers derived from published systematic reviews, named PubMed Retrieval and Synthesis (PubMedRS-200), to evaluate the performance of Clinfo.ai and other publicly available OpenQA systems.
  • results: The authors report benchmark results for Clinfo.ai and other OpenQA systems on PubMedRS-200, demonstrating the effectiveness of their approach in answering clinical questions and summarizing relevant scientific literature.
    Abstract The quickly-expanding nature of published medical literature makes it challenging for clinicians and researchers to keep up with and summarize recent, relevant findings in a timely manner. While several closed-source summarization tools based on large language models (LLMs) now exist, rigorous and systematic evaluations of their outputs are lacking. Furthermore, there is a paucity of high-quality datasets and appropriate benchmark tasks with which to evaluate these tools. We address these issues with four contributions: we release Clinfo.ai, an open-source WebApp that answers clinical questions based on dynamically retrieved scientific literature; we specify an information retrieval and abstractive summarization task to evaluate the performance of such retrieval-augmented LLM systems; we release a dataset of 200 questions and corresponding answers derived from published systematic reviews, which we name PubMed Retrieval and Synthesis (PubMedRS-200); and report benchmark results for Clinfo.ai and other publicly available OpenQA systems on PubMedRS-200.

A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing

  • paper_url: http://arxiv.org/abs/2310.16142
  • repo_url: None
  • paper_authors: William Timkey, Tal Linzen
  • for: To develop a recurrent neural language model that more closely parallels the memory system assumed by cognitive theories of human sentence processing.
  • methods: A recurrent neural network model with a single self-attention head is used to better model the memory system involved in human sentence processing.
  • results: The model's single attention head captures the semantic and syntactic interference effects observed in human experiments.
    Abstract Two of the central factors believed to underpin human sentence processing difficulty are expectations and retrieval from working memory. A recent attempt to create a unified cognitive model integrating these two factors relied on the parallels between the self-attention mechanism of transformer language models and cue-based retrieval theories of working memory in human sentence processing (Ryu and Lewis 2021). While Ryu and Lewis show that attention patterns in specialized attention heads of GPT-2 are consistent with similarity-based interference, a key prediction of cue-based retrieval models, their method requires identifying syntactically specialized attention heads, and makes the cognitively implausible assumption that hundreds of memory retrieval operations take place in parallel. In the present work, we develop a recurrent neural language model with a single self-attention head, which more closely parallels the memory system assumed by cognitive theories. We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.
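
A minimal sketch of a recurrent language model with a single self-attention head over its own past states, in the spirit of the memory system described above (a recurrent core plus one cue-based retrieval head). Layer sizes and the LSTM core are arbitrary assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class SingleHeadRecurrentLM(nn.Module):
    """LSTM language model with exactly one attention head over its past hidden states."""
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.out = nn.Linear(2 * dim, vocab_size)

    def forward(self, tokens):                                  # tokens: (B, T)
        h, _ = self.rnn(self.embed(tokens))                     # (B, T, D) recurrent states
        q, k, v = self.q(h), self.k(h), self.v(h)
        scores = q @ k.transpose(1, 2) / h.size(-1) ** 0.5
        causal = torch.triu(torch.ones_like(scores), diagonal=1).bool()
        retrieved = torch.softmax(scores.masked_fill(causal, float("-inf")), dim=-1) @ v
        return self.out(torch.cat([h, retrieved], dim=-1))      # next-token logits

lm = SingleHeadRecurrentLM()
logits = lm(torch.randint(0, 1000, (2, 16)))                    # (2, 16, 1000)
```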

Context-aware explainable recommendations over knowledge graphs

  • paper_url: http://arxiv.org/abs/2310.16141
  • repo_url: None
  • paper_authors: Jinfeng Zhong, Elsa Negre
  • for: Modeling users' preferences adapted to their contexts and incorporating the semantic relationships between items in a knowledge graph into recommendations.
  • methods: The Context-Aware Knowledge Graph Convolutional Network (CA-KGCN) framework, an end-to-end model that captures users' preferences adapted to their contexts and exploits the rich semantic relationships related to items in the knowledge graph.
  • results: Experiments on three real-world datasets demonstrate the effectiveness of CA-KGCN: it models users' preferences adapted to their contexts and provides context-adapted explanations for the recommendations generated.
    Abstract Knowledge graphs contain rich semantic relationships related to items and incorporating such semantic relationships into recommender systems helps to explore the latent connections of items, thus improving the accuracy of prediction and enhancing the explainability of recommendations. However, such explainability is not adapted to users' contexts, which can significantly influence their preferences. In this work, we propose CA-KGCN (Context-Aware Knowledge Graph Convolutional Network), an end-to-end framework that can model users' preferences adapted to their contexts and can incorporate rich semantic relationships in the knowledge graph related to items. This framework captures users' attention to different factors: contexts and features of items. More specifically, the framework can model users' preferences adapted to their contexts and provide explanations adapted to the given context. Experiments on three real-world datasets show the effectiveness of our framework: modeling users' preferences adapted to their contexts and explaining the recommendations generated.

Alquist 5.0: Dialogue Trees Meet Generative Models. A Novel Approach for Enhancing SocialBot Conversations

  • paper_url: http://arxiv.org/abs/2310.16119
  • repo_url: None
  • paper_authors: Ondřej Kobza, Jan Čuhel, Tommaso Gargiani, David Herel, Petr Marek
  • for: 介绍作者为 Alexa Prize SocialBot Grand Challenge 5 开发的 SocialBot Alquist~5.0,以及该系统如何集成 NRG Barista 并支持多模态设备。
  • methods: 引入 NRG Barista,并提出多种将其集成到 SocialBot 中的创新方法,同时扩展系统以支持多模态设备,从而提升整体对话体验。
  • results: 论文详细介绍了 Alquist~5.0 的开发过程,表明该系统能在保持同理心与跨话题知识性的同时,满足用户不断变化的对话期望。
    Abstract We present our SocialBot -- Alquist~5.0 -- developed for the Alexa Prize SocialBot Grand Challenge~5. Building upon previous versions of our system, we introduce the NRG Barista and outline several innovative approaches for integrating Barista into our SocialBot, improving the overall conversational experience. Additionally, we extend our SocialBot to support multimodal devices. This paper offers insights into the development of Alquist~5.0, which meets evolving user expectations while maintaining empathetic and knowledgeable conversational abilities across diverse topics.
    摘要 我们现在推出我们的社交机器人——Alquist~5.0——,这是为Alexa Prize SocialBot Grand Challenge~5所开发的。基于之前的版本,我们引入了NRG Barista,并提出了一些创新的方法来集成Barista到我们的社交机器人中,从而改善总体的对话体验。此外,我们扩展了我们的社交机器人以支持多模态设备。这篇论文为Alquist~5.0的开发提供了深入的启示,这种系统能够适应用户对话习惯的不断变化,同时保持对多种话题的对话能力和同理心。

Anatomically-aware Uncertainty for Semi-supervised Image Segmentation

  • paper_url: http://arxiv.org/abs/2310.16099
  • repo_url: https://github.com/adigasu/anatomically-aware_uncertainty_for_semi-supervised_segmentation
  • paper_authors: Sukesh Adiga V, Jose Dolz, Herve Lombaert
  • for: 提出一种新的分割不确定性估计方法,以便在半监督设置下更好地指导图像分割网络。
  • methods: 基于已有分割掩码学习一种解剖学感知的表示,将新的分割预测映射为解剖学上合理的分割,并以两者之间的偏差来估计像素级的分割不确定性,只需单次推理即可完成。
  • results: 这个论文在两个公共可用的图像分割数据集上进行了测试,并与当前最佳 semi-supervised 方法进行了比较,得到了更高的分割精度。
    Abstract Semi-supervised learning relaxes the need of large pixel-wise labeled datasets for image segmentation by leveraging unlabeled data. A prominent way to exploit unlabeled data is to regularize model predictions. Since the predictions of unlabeled data can be unreliable, uncertainty-aware schemes are typically employed to gradually learn from meaningful and reliable predictions. Uncertainty estimation methods, however, rely on multiple inferences from the model predictions that must be computed for each training step, which is computationally expensive. Moreover, these uncertainty maps capture pixel-wise disparities and do not consider global information. This work proposes a novel method to estimate segmentation uncertainty by leveraging global information from the segmentation masks. More precisely, an anatomically-aware representation is first learnt to model the available segmentation masks. The learnt representation thereupon maps the prediction of a new segmentation into an anatomically-plausible segmentation. The deviation from the plausible segmentation aids in estimating the underlying pixel-level uncertainty in order to further guide the segmentation network. The proposed method consequently estimates the uncertainty using a single inference from our representation, thereby reducing the total computation. We evaluate our method on two publicly available segmentation datasets of left atria in cardiac MRIs and of multiple organs in abdominal CTs. Our anatomically-aware method improves the segmentation accuracy over the state-of-the-art semi-supervised methods in terms of two commonly used evaluation metrics.
    摘要 半监督学习通过利用无标注数据,降低了图像分割对大规模像素级标注数据的需求。利用无标注数据的一种主要方式是对模型预测进行正则化。由于无标注数据上的预测可能不可靠,通常需要借助不确定性感知机制,逐步从有意义且可靠的预测中学习。然而,现有的不确定性估计方法需要在每个训练步骤中对模型进行多次推理,计算代价高昂;而且这些不确定性图只刻画像素级差异,没有考虑全局信息。本工作提出一种新方法,利用分割掩码中的全局信息来估计分割不确定性。具体而言,首先基于已有的分割掩码学习一种解剖学感知的表示,再利用该表示将新的分割预测映射为解剖学上合理的分割;预测与该合理分割之间的偏差可用于估计底层的像素级不确定性,进而指导分割网络。因此,该方法只需对我们的表示进行单次推理即可估计不确定性,从而降低总体计算量。我们在两个公开数据集(心脏 MRI 左心房分割与腹部 CT 多器官分割)上进行评估,在两个常用评价指标上,所提出的解剖学感知方法均优于当前最佳的半监督方法。
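A minimal sketch of the core idea, assuming the anatomically-aware representation is approximated by a small autoencoder over segmentation masks; the architecture and the deviation measure below are placeholders, not the paper's network.

```python
import torch
import torch.nn as nn

class MaskAE(nn.Module):
    """Tiny autoencoder over segmentation masks; stands in for the
    anatomically-aware representation learned from labeled masks."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(n_classes, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                                 nn.ConvTranspose2d(16, n_classes, 4, stride=2, padding=1))

    def forward(self, probs):
        return self.dec(self.enc(probs))  # logits of an "anatomically plausible" mask

def anatomical_uncertainty(seg_logits, mask_ae):
    """Single-pass uncertainty: deviation of the network's soft prediction
    from its projection onto the learned mask manifold."""
    probs = seg_logits.softmax(dim=1)
    with torch.no_grad():
        plausible = mask_ae(probs).softmax(dim=1)
    return (probs - plausible).abs().sum(dim=1)   # (B, H, W) pixel-wise uncertainty

if __name__ == "__main__":
    ae = MaskAE(n_classes=2)
    logits = torch.randn(1, 2, 64, 64)            # fake segmentation logits
    print(anatomical_uncertainty(logits, ae).shape)   # torch.Size([1, 64, 64])
```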

Synthetic Data as Validation

  • paper_url: http://arxiv.org/abs/2310.16052
  • repo_url: https://github.com/fiu-airlab/Next-Generation-Airline-Data-Exchange-Simulator
  • paper_authors: Qixin Hu, Alan Yuille, Zongwei Zhou
  • for: The paper is written to explore the use of synthetic data for early cancer detection in computed tomography (CT) volumes, with a focus on improving the robustness of AI models in identifying very tiny liver tumors.
  • methods: The paper uses synthetic data to generate and superimpose tumors onto healthy organs in CT volumes, creating an extensive dataset for validation. The authors also propose a continual learning framework that continuously trains AI models on a stream of out-domain data with synthetic tumors.
  • results: The paper shows that using synthetic data for validation can improve AI robustness in both in-domain and out-domain test sets. Specifically, the DSC score for liver tumor segmentation improves from 26.7% to 34.5% when evaluated on an in-domain dataset and from 31.1% to 35.4% on an out-domain dataset, with significant improvements in identifying very tiny liver tumors.
    Abstract This study leverages synthetic data as a validation set to reduce overfitting and ease the selection of the best model in AI development. While synthetic data have been used for augmenting the training set, we find that synthetic data can also significantly diversify the validation set, offering marked advantages in domains like healthcare, where data are typically limited, sensitive, and from out-domain sources (i.e., hospitals). In this study, we illustrate the effectiveness of synthetic data for early cancer detection in computed tomography (CT) volumes, where synthetic tumors are generated and superimposed onto healthy organs, thereby creating an extensive dataset for rigorous validation. Using synthetic data as validation can improve AI robustness in both in-domain and out-domain test sets. Furthermore, we establish a new continual learning framework that continuously trains AI models on a stream of out-domain data with synthetic tumors. The AI model trained and validated in dynamically expanding synthetic data can consistently outperform models trained and validated exclusively on real-world data. Specifically, the DSC score for liver tumor segmentation improves from 26.7% (95% CI: 22.6%-30.9%) to 34.5% (30.8%-38.2%) when evaluated on an in-domain dataset and from 31.1% (26.0%-36.2%) to 35.4% (32.1%-38.7%) on an out-domain dataset. Importantly, the performance gain is particularly significant in identifying very tiny liver tumors (radius < 5mm) in CT volumes, with Sensitivity improving from 33.1% to 55.4% on an in-domain dataset and 33.9% to 52.3% on an out-domain dataset, justifying the efficacy in early detection of cancer. The application of synthetic data, from both training and validation perspectives, underlines a promising avenue to enhance AI robustness when dealing with data from varying domains.
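A rough sketch of how synthetic cases could serve as a validation set for checkpoint selection; the sphere-pasting `synthesize_validation_case` helper is a deliberately naive stand-in for the paper's tumor generator, and `predict_fn`/`checkpoints` are hypothetical interfaces.

```python
import numpy as np

def dice(pred, target, eps=1e-6):
    inter = np.logical_and(pred, target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def synthesize_validation_case(healthy_volume, rng):
    """Naive stand-in for tumor synthesis: paste a random bright sphere into a
    healthy scan (assumed >16 voxels per axis) and return (image, mask)."""
    vol = healthy_volume.astype(float).copy()
    z, y, x = [rng.integers(8, s - 8) for s in vol.shape]
    r = rng.integers(2, 6)
    zz, yy, xx = np.ogrid[:vol.shape[0], :vol.shape[1], :vol.shape[2]]
    mask = (zz - z) ** 2 + (yy - y) ** 2 + (xx - x) ** 2 <= r ** 2
    vol[mask] += 0.5
    return vol, mask

def select_checkpoint(checkpoints, predict_fn, healthy_volumes, n_cases=20, seed=0):
    """Pick the checkpoint with the best mean Dice on a purely synthetic
    validation set, so no real labeled cases are consumed for model selection."""
    rng = np.random.default_rng(seed)
    cases = [synthesize_validation_case(healthy_volumes[rng.integers(len(healthy_volumes))], rng)
             for _ in range(n_cases)]
    scores = {ck: np.mean([dice(predict_fn(ck, img), m) for img, m in cases]) for ck in checkpoints}
    return max(scores, key=scores.get), scores
```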

AI Alignment and Social Choice: Fundamental Limitations and Policy Implications

  • paper_url: http://arxiv.org/abs/2310.16048
  • repo_url: None
  • paper_authors: Abhilash Mishra
  • for: 本研究旨在 investigating the challenges of building reinforcement learning with human feedback (RLHF) systems that respect democratic norms.
  • methods: 本paper使用社会选择理论的不可能性结果,以及研究RLHF系统如何对各个个人的价值观进行对齐。
  • results: 研究发现,使用RLHF系统对所有个人的价值观进行对齐是不可能的,因此需要对特定用户群进行对齐。
    Abstract Aligning AI agents to human intentions and values is a key bottleneck in building safe and deployable AI applications. But whose values should AI agents be aligned with? Reinforcement learning with human feedback (RLHF) has emerged as the key framework for AI alignment. RLHF uses feedback from human reinforcers to fine-tune outputs; all widely deployed large language models (LLMs) use RLHF to align their outputs to human values. It is critical to understand the limitations of RLHF and consider policy challenges arising from these limitations. In this paper, we investigate a specific challenge in building RLHF systems that respect democratic norms. Building on impossibility results in social choice theory, we show that, under fairly broad assumptions, there is no unique voting protocol to universally align AI systems using RLHF through democratic processes. Further, we show that aligning AI agents with the values of all individuals will always violate certain private ethical preferences of an individual user i.e., universal AI alignment using RLHF is impossible. We discuss policy implications for the governance of AI systems built using RLHF: first, the need for mandating transparent voting rules to hold model builders accountable. Second, the need for model builders to focus on developing AI agents that are narrowly aligned to specific user groups.
    摘要 将 AI 代理与人类意图和价值观对齐,是构建安全、可部署 AI 应用的关键瓶颈。但 AI 代理应与谁的价值观对齐?基于人类反馈的强化学习(RLHF)已成为 AI 对齐的核心框架:RLHF 利用人类评价者的反馈来微调模型输出,目前所有广泛部署的大语言模型(LLM)都使用 RLHF 将其输出与人类价值观对齐。理解 RLHF 的局限性并考虑由此产生的政策挑战至关重要。本文研究在构建尊重民主规范的 RLHF 系统时面临的一个具体挑战。基于社会选择理论中的不可能性结果,我们证明:在相当宽泛的假设下,不存在一种唯一的投票协议,能够通过民主程序用 RLHF 对 AI 系统进行普遍对齐。进一步地,我们证明让 AI 代理与所有个体的价值观对齐,必然会违背某个用户的某些私人伦理偏好,即利用 RLHF 实现普遍的 AI 对齐是不可能的。我们讨论了对基于 RLHF 构建的 AI 系统进行治理的政策启示:其一,需要强制要求透明的投票规则,以便追究模型构建者的责任;其二,模型构建者应专注于开发面向特定用户群体的窄域对齐 AI 代理。
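The social-choice obstruction the paper builds on can be seen in a tiny example: with three annotators ranking three candidate responses, pairwise majority preferences can cycle, so no single preference ordering (and hence no single reward ranking) respects every majority. The profile below is the classic Condorcet cycle, used here purely for illustration.

```python
from itertools import permutations

# Three annotators ranking three candidate responses A, B, C (a classic Condorcet profile).
rankings = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]

def majority_prefers(x, y):
    """True if a strict majority of annotators rank x above y."""
    return sum(r.index(x) < r.index(y) for r in rankings) > len(rankings) / 2

# Pairwise majorities form a cycle: A beats B, B beats C, C beats A.
for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"majority prefers {x} over {y}: {majority_prefers(x, y)}")

# No single ranking (e.g. one induced by a scalar reward model) respects every pairwise majority.
consistent = [order for order in permutations("ABC")
              if not any(majority_prefers(order[j], order[i])
                         for i in range(3) for j in range(i + 1, 3))]
print("rankings consistent with all pairwise majorities:", consistent)  # -> []
```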

Woodpecker: Hallucination Correction for Multimodal Large Language Models

  • paper_url: http://arxiv.org/abs/2310.16045
  • repo_url: https://github.com/bradyfu/woodpecker
  • paper_authors: Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen
  • for: mitigate hallucinations in Multimodal Large Language Models (MLLMs)
  • methods: training-free method named Woodpecker, consisting of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction
  • results: 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl on the POPE benchmark, and the source code is released at https://github.com/BradyFU/Woodpecker.
    Abstract Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. In order to mitigate hallucinations, existing studies mainly resort to an instruction-tuning manner that requires retraining the models with specific data. In this paper, we pave a different way, introducing a training-free method named Woodpecker. Like a woodpecker heals trees, it picks out and corrects hallucinations from the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs, while being interpretable by accessing intermediate outputs of the five stages. We evaluate Woodpecker both quantitatively and qualitatively and show the huge potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://github.com/BradyFU/Woodpecker.
    摘要 幻觉是笼罩在快速发展的多模态大语言模型(MLLM)之上的一大阴影,指生成的文本与图像内容不一致的现象。为缓解幻觉,现有研究主要采用指令微调的方式,需要用特定数据重新训练模型。本文另辟蹊径,提出一种无需训练的方法——Woodpecker(啄木鸟)。如同啄木鸟医治树木一样,它从生成文本中找出并修正幻觉。具体而言,Woodpecker 包含五个阶段:关键概念提取、问题构造、视觉知识验证、视觉断言生成和幻觉修正。由于以事后修正的方式实现,Woodpecker 可以方便地服务于不同的 MLLM,并且可通过访问五个阶段的中间输出来解释其决策。我们对 Woodpecker 进行了定量与定性评估,展示了这一新范式的巨大潜力。在 POPE 基准上,本方法相对基线 MiniGPT-4/mPLUG-Owl 分别取得 30.66%/24.33% 的准确率提升。源代码已发布于 https://github.com/BradyFU/Woodpecker。
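A schematic, training-free correction loop using the five stage names taken from the abstract; every callable in the dataclass is a stub the reader would have to supply (an LLM prompt, a VQA model, a detector), and the signatures are assumptions rather than the released interface.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class WoodpeckerStubs:
    """Callables standing in for the LLM / visual tools each stage needs."""
    extract_concepts: Callable[[str], List[str]]                 # stage 1: key concept extraction
    formulate_questions: Callable[[str, List[str]], List[str]]   # stage 2: question formulation
    answer_visually: Callable[[str, object], str]                # stage 3: visual knowledge validation
    write_claims: Callable[[List[str]], List[str]]               # stage 4: visual claim generation
    rewrite_answer: Callable[[str, List[str]], str]              # stage 5: hallucination correction

def correct_hallucinations(answer: str, image, stubs: WoodpeckerStubs) -> str:
    """Post-hoc correction over an MLLM answer, mirroring the five stages named
    in the abstract; intermediate outputs stay inspectable for interpretability."""
    concepts = stubs.extract_concepts(answer)
    questions = stubs.formulate_questions(answer, concepts)
    evidence = [stubs.answer_visually(q, image) for q in questions]
    claims = stubs.write_claims(evidence)
    return stubs.rewrite_answer(answer, claims)
```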

WebWISE: Web Interface Control and Sequential Exploration with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.16042
  • repo_url: None
  • paper_authors: Heyi Tao, Sethuraman T V, Michal Shlapentokh-Rothman, Derek Hoiem
  • for: 研究利用大语言模型(LLM)通过点击、滚动和文本输入操作自动完成网页软件任务。
  • methods: 该方法以筛选后的文档对象模型(DOM)元素作为观察,逐步执行任务,并基于当前观察依次生成小程序;借助上下文学习,既可使用单个人工示例,也可使用零样本成功试验自动生成的示例。
  • results: 在 MiniWob++ 基准上,WebWISE 只需一个上下文示例即可达到与需要大量示例或尝试的方法相当或更好的性能。
    Abstract The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks using click, scroll, and text input operations. Previous approaches, such as reinforcement learning (RL) or imitation learning, are inefficient to train and task-specific. Our method uses filtered Document Object Model (DOM) elements as observations and performs tasks step-by-step, sequentially generating small programs based on the current observations. We use in-context learning, either benefiting from a single manually provided example, or an automatically generated example based on a successful zero-shot trial. We evaluate the proposed method on the MiniWob++ benchmark. With only one in-context example, our WebWISE method achieves similar or better performance than other methods that require many demonstrations or trials.
    摘要 本文研究利用大语言模型(LLM)通过点击、滚动和文本输入操作自动完成网页软件任务。以往的方法(如强化学习或模仿学习)训练效率低且针对特定任务。我们的方法以筛选后的文档对象模型(DOM)元素作为观察,逐步执行任务,并基于当前观察依次生成小程序。我们采用上下文学习,既可受益于单个人工提供的示例,也可使用基于零样本成功试验自动生成的示例。我们在 MiniWob++ 基准上评估了所提出的方法:仅用一个上下文示例,WebWISE 即可达到与需要大量演示或尝试的其他方法相当或更好的性能。

Instruct and Extract: Instruction Tuning for On-Demand Information Extraction

  • paper_url: http://arxiv.org/abs/2310.16040
  • repo_url: https://github.com/yzjiao/on-demand-ie
  • paper_authors: Yizhu Jiao, Ming Zhong, Sha Li, Ruining Zhao, Siru Ouyang, Heng Ji, Jiawei Han
  • for: 提供一种基于大语言模型的个性化信息抽取系统,以满足非专家用户长尾、临时的抽取需求。
  • methods: 提出一种称为"按需信息抽取"(On-Demand Information Extraction)的新范式:根据用户指令从相关文本中抽取所需内容,并以结构化表格的形式呈现,表头可由用户指定,也可由模型根据上下文推断;为此构建了包含自动生成训练数据和人工标注测试集的 InstructIE 基准,并在其上训练了按需信息抽取器 ODIE。
  • results: 在 InstructIE 基准上的全面评估显示,ODIE 显著优于规模相当的现有开源模型。
    Abstract Large language models with instruction-following capabilities open the door to a wider group of users. However, when it comes to information extraction - a classic task in natural language processing - most task-specific systems cannot align well with long-tail ad hoc extraction use cases for non-expert users. To address this, we propose a novel paradigm, termed On-Demand Information Extraction, to fulfill the personalized demands of real-world users. Our task aims to follow the instructions to extract the desired content from the associated text and present it in a structured tabular format. The table headers can either be user-specified or inferred contextually by the model. To facilitate research in this emerging area, we present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set. Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE. Comprehensive evaluations on our benchmark reveal that ODIE substantially outperforms the existing open-source models of similar size. Our code and dataset are released on https://github.com/yzjiao/On-Demand-IE.
    摘要 具备指令跟随能力的大语言模型向更广泛的用户群体敞开了大门。然而,在信息抽取这一自然语言处理的经典任务上,大多数面向特定任务的系统难以满足非专家用户长尾、临时的抽取需求。为此,我们提出一种新范式——按需信息抽取(On-Demand Information Extraction),以满足真实世界用户的个性化需求。该任务要求模型按照指令从相关文本中抽取所需内容,并以结构化表格的形式呈现;表头既可由用户指定,也可由模型根据上下文推断。为推动这一新兴方向的研究,我们构建了名为 InstructIE 的基准,其中包含自动生成的训练数据和人工标注的测试集。基于 InstructIE,我们进一步开发了按需信息抽取器 ODIE。在该基准上的全面评估表明,ODIE 显著优于规模相当的现有开源模型。代码和数据集已发布于 https://github.com/yzjiao/On-Demand-IE。

What’s Left? Concept Grounding with Logic-Enhanced Foundation Models

  • paper_url: http://arxiv.org/abs/2310.16035
  • repo_url: https://github.com/joyhsu0504/left
  • paper_authors: Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu
  • for: 提出一种逻辑增强的基础模型 LEFT,能够跨领域地对概念进行接地和推理。
  • methods: LEFT 由大语言模型(LLM)解释器与可训练的领域特定接地模块组成,通过一个可微分、领域无关的一阶逻辑程序执行器来执行 LLM 输出的逻辑程序。
  • results: LEFT在四个领域(2D图像、3D场景、人体动作和机器人抓取)中显示出了强大的理解能力,能够解决复杂的任务,包括训练时未看到的任务,并且可以轻松应用于新领域。
    Abstract Recent works such as VisProg and ViperGPT have smartly composed foundation models for visual reasoning-using large language models (LLMs) to produce programs that can be executed by pre-trained vision-language models. However, they operate in limited domains, such as 2D images, not fully exploiting the generalization of language: abstract concepts like "left" can also be grounded in 3D, temporal, and action data, as in moving to your left. This limited generalization stems from these inference-only methods' inability to learn or adapt pre-trained models to a new domain. We propose the Logic-Enhanced Foundation Model (LEFT), a unified framework that learns to ground and reason with concepts across domains with a differentiable, domain-independent, first-order logic-based program executor. LEFT has an LLM interpreter that outputs a program represented in a general, logic-based reasoning language, which is shared across all domains and tasks. LEFT's executor then executes the program with trainable domain-specific grounding modules. We show that LEFT flexibly learns concepts in four domains: 2D images, 3D scenes, human motions, and robotic manipulation. It exhibits strong reasoning ability in a wide variety of tasks, including those that are complex and not seen during training, and can be easily applied to new domains.
    摘要 近期的 VisProg 和 ViperGPT 等工作巧妙地组合基础模型来进行视觉推理——利用大语言模型(LLM)生成可由预训练视觉-语言模型执行的程序。然而,它们只能在有限的领域(如二维图像)中运行,没有充分利用语言的泛化能力:像"左"这样的抽象概念同样可以接地到三维、时序和动作数据中,例如"向你的左边移动"。这种有限的泛化源于这些仅做推理的方法无法学习或将预训练模型适配到新领域。我们提出逻辑增强基础模型(LEFT),这是一个统一框架,借助一个可微分、领域无关、基于一阶逻辑的程序执行器,学习跨领域地对概念进行接地和推理。LEFT 的 LLM 解释器输出以通用逻辑推理语言表示的程序,该语言在所有领域和任务间共享;随后 LEFT 的执行器利用可训练的领域特定接地模块执行该程序。实验表明,LEFT 能在二维图像、三维场景、人体动作和机器人操作四个领域中灵活地学习概念,在包括训练中未见过的复杂任务在内的多种任务上表现出强大的推理能力,并能方便地应用于新领域。

Finetuning Offline World Models in the Real World

  • paper_url: http://arxiv.org/abs/2310.16029
  • repo_url: https://github.com/fyhMer/fowm
  • paper_authors: Yunhai Feng, Nicklas Hansen, Ziyan Xiong, Chandramouli Rajagopalan, Xiaolong Wang
  • for: 研究如何先用真实机器人上采集的离线数据预训练世界模型(model-based RL),再利用该模型规划采集的在线数据进行微调,以便高效地迁移到新任务。
  • methods: 采用离线强化学习(offline RL)思路,先在已有数据集上预训练世界模型,无需在线交互;随后在在线微调阶段,于测试时对规划器进行正则化,通过平衡估计回报与(认知)模型不确定性来抑制外推误差。
  • results: 该方法在仿真和真实机器人上的多种视觉-运动控制任务中,即使离线数据有限,也能对已见和未见任务实现少样本微调;论文提供了相应的评估结果、视频、代码和数据。
    Abstract Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult. While model-based RL algorithms (world models) improve data-efficiency to some extent, they still require hours or days of interaction to learn skills. Recently, offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction. However, constraining an algorithm to a fixed dataset induces a state-action distribution shift between training and inference, and limits its applicability to new tasks. In this work, we seek to get the best of both worlds: we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model. To mitigate extrapolation errors during online interaction, we propose to regularize the planner at test-time by balancing estimated returns and (epistemic) model uncertainty. We evaluate our method on a variety of visuo-motor control tasks in simulation and on a real robot, and find that our method enables few-shot finetuning to seen and unseen tasks even when offline data is limited. Videos, code, and data are available at https://yunhaifeng.com/FOWM .
    摘要 强化学习(RL)以数据效率低著称,这使得直接在真实机器人上训练十分困难。基于模型的 RL 算法(世界模型)在一定程度上提高了数据效率,但仍需数小时乃至数天的交互才能学会技能。近来,离线强化学习被提出,可在已有数据集上训练 RL 策略而无需任何在线交互;然而,把算法限制在固定数据集上会在训练与推断之间引入状态-动作分布偏移,也限制了其在新任务上的适用性。在本工作中,我们希望兼得两者之长:先利用在真实机器人上采集的离线数据预训练世界模型,再利用该模型规划所采集的在线数据对模型进行微调。为缓解在线交互中的外推误差,我们提出在测试时对规划器进行正则化,平衡估计回报与(认知)模型不确定性。我们在仿真和真实机器人上的多种视觉-运动控制任务中评估了该方法,发现即使离线数据有限,它也能对已见和未见任务实现少样本微调。视频、代码和数据见 https://yunhaifeng.com/FOWM 。
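One common way to regularize a planner at test time, consistent with the abstract's description of balancing estimated returns against epistemic model uncertainty, is to penalize disagreement across an ensemble of world models; the sketch below assumes such an ensemble and a fixed trade-off weight, which may differ from the paper's exact formulation.

```python
import numpy as np

def score_action_sequences(returns: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Rank candidate action sequences by (mean estimated return) minus a
    penalty on ensemble disagreement, keeping the planner away from regions
    where the learned model must extrapolate.

    returns: array of shape (n_models, n_candidates) with per-model return
    estimates for each candidate action sequence."""
    mean_ret = returns.mean(axis=0)
    epistemic = returns.std(axis=0)          # ensemble disagreement as model uncertainty
    return mean_ret - lam * epistemic

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    est = rng.normal(size=(5, 8))            # 5 ensemble members, 8 candidate plans
    best = int(np.argmax(score_action_sequences(est, lam=0.5)))
    print("selected candidate:", best)
```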

What Algorithms can Transformers Learn? A Study in Length Generalization

  • paper_url: http://arxiv.org/abs/2310.16028
  • repo_url: None
  • paper_authors: Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
  • for: 探讨 Transformer 模型在算法任务上的长度泛化能力范围,以及它们能否学到解决任务的真正算法。
  • methods: 借助 RASP(Weiss 等,2021)编程语言,提出 RASP 泛化猜想:如果一个任务可以由一个对所有输入长度都有效的简短 RASP 程序解决,那么 Transformer 往往能在该任务上实现长度泛化。
  • results: 该猜想能够较准确地预测 Transformer 在算法任务上的长度泛化表现,并据此大幅提升了传统难题(如奇偶判断和加法)的泛化性能。在理论方面,给出了一个简单示例,说明 Abbe 等(2023)的"最小度插值器"模型不能正确预测 Transformer 的分布外行为,而该猜想可以。总体而言,这项工作为组合泛化机制与 Transformer 的算法能力提供了新的视角。
    Abstract Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a unifying framework to understand when and how Transformers can exhibit strong length generalization on a given task. Specifically, we leverage RASP (Weiss et al., 2021) -- a programming language designed for the computational model of a Transformer -- and introduce the RASP-Generalization Conjecture: Transformers tend to length generalize on a task if the task can be solved by a short RASP program which works for all input lengths. This simple conjecture remarkably captures most known instances of length generalization on algorithmic tasks. Moreover, we leverage our insights to drastically improve generalization performance on traditionally hard tasks (such as parity and addition). On the theoretical side, we give a simple example where the "min-degree-interpolator" model of learning from Abbe et al. (2023) does not correctly predict Transformers' out-of-distribution behavior, but our conjecture does. Overall, our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.
    摘要 大语言模型展现出令人惊讶的涌现泛化特性,却又在算术、奇偶判断等许多简单推理任务上表现不佳。这引出一个问题:Transformer 模型能否以及何时能学到解决任务的真正算法?我们在算法任务的长度泛化这一具体设定下研究 Transformer 的能力范围,并提出一个统一框架来理解 Transformer 何时以及如何在给定任务上表现出强的长度泛化。具体地,我们借助为 Transformer 计算模型设计的编程语言 RASP(Weiss 等,2021),提出 RASP 泛化猜想:如果一个任务可以由一个对所有输入长度都有效的简短 RASP 程序解决,那么 Transformer 往往能在该任务上实现长度泛化。这个简单的猜想出人意料地涵盖了绝大多数已知的算法任务长度泛化实例。此外,我们利用这些洞见大幅提升了传统难题(如奇偶判断和加法)的泛化性能。在理论方面,我们给出一个简单示例,说明 Abbe 等(2023)的"最小度插值器"学习模型不能正确预测 Transformer 的分布外行为,而我们的猜想可以。总之,我们的工作为组合泛化机制与 Transformer 的算法能力提供了新的视角。
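To make the notion of "a short RASP program that works for all input lengths" concrete, here is a toy numpy rendering of RASP-style select/aggregate primitives (in the spirit of Weiss et al., 2021) computing the fraction of 'a' tokens at every position; the same two-operation program applies unchanged to any sequence length.

```python
import numpy as np

def select(keys, queries, predicate):
    """RASP-style 'select': build an attention pattern from a pairwise predicate."""
    return np.array([[1.0 if predicate(k, q) else 0.0 for k in keys] for q in queries])

def aggregate(pattern, values):
    """RASP-style 'aggregate': average the selected values at every position."""
    norm = pattern.sum(axis=1, keepdims=True)
    norm[norm == 0] = 1.0
    return (pattern @ np.asarray(values, dtype=float)) / norm.squeeze(1)

def fraction_of_a(tokens):
    """A two-operation RASP-like program valid for any input length:
    every position attends to all positions and averages the indicator of 'a'."""
    pattern = select(tokens, tokens, lambda k, q: True)
    return aggregate(pattern, [1.0 if t == "a" else 0.0 for t in tokens])

print(fraction_of_a(list("abca")))        # [0.5 0.5 0.5 0.5]
print(fraction_of_a(list("aaab" * 10)))   # same program, length 40
```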

Physically Explainable Deep Learning for Convective Initiation Nowcasting Using GOES-16 Satellite Observations

  • paper_url: http://arxiv.org/abs/2310.16015
  • repo_url: None
  • paper_authors: Da Fan, Steven J. Greybush, David John Gagne II, Eugene E. Clothiaux
  • for: 对流启动(CI)的临近预报对数值天气预报模型和现有临近预报算法而言仍是一项挑战,本研究旨在改进这一预报。
  • methods: 基于多通道红外 GOES-R 卫星观测,构建基于对象的概率深度学习模型来预测 CI;潜在 CI 事件由大平原地区多雷达多传感器多普勒天气雷达产品通过一种客观的雷达方法识别。
  • results: 在最长 1 小时的预报时效内,深度学习模型显著优于经典逻辑斯蒂回归模型,尤其是在虚警率上。案例研究表明模型依赖多个高度层上的云和湿度特征;模型解释进一步揭示了在不同基线下模型决策过程中各特征的重要性。
    Abstract Convection initiation (CI) nowcasting remains a challenging problem for both numerical weather prediction models and existing nowcasting algorithms. In this study, object-based probabilistic deep learning models are developed to predict CI based on multichannel infrared GOES-R satellite observations. The data come from patches surrounding potential CI events identified in Multi-Radar Multi-Sensor Doppler weather radar products over the Great Plains region from June and July 2020 and June 2021. An objective radar-based approach is used to identify these events. The deep learning models significantly outperform the classical logistic model at lead times up to 1 hour, especially on the false alarm ratio. Through case studies, the deep learning model exhibits the dependence on the characteristics of clouds and moisture at multiple levels. Model explanation further reveals the model's decision-making process with different baselines. The explanation results highlight the importance of moisture and cloud features at different levels depending on the choice of baseline. Our study demonstrates the advantage of using different baselines in further understanding model behavior and gaining scientific insights.
    摘要 对流启动(CI)的临近预报对数值天气预报模型和现有临近预报算法而言仍是一个难题。本研究基于多通道红外 GOES-R 卫星观测,构建基于对象的概率深度学习模型来预测 CI。数据来自 2020 年 6-7 月和 2021 年 6 月大平原地区多雷达多传感器(MRMS)多普勒天气雷达产品中识别出的潜在 CI 事件周围的图像块,事件由一种客观的雷达方法识别。在最长 1 小时的预报时效内,深度学习模型显著优于经典逻辑斯蒂回归模型,尤其是在虚警率方面。案例研究显示,深度学习模型依赖多个高度层上的云与湿度特征。模型解释进一步揭示了在不同基线下模型的决策过程,结果表明湿度与云特征在不同高度层的重要性取决于基线的选择。我们的研究表明,使用不同基线有助于进一步理解模型行为并获得科学洞见。

Bayesian imaging inverse problem with SA-Roundtrip prior via HMC-pCN sampler

  • paper_url: http://arxiv.org/abs/2310.17817
  • repo_url: https://github.com/qjy415417122/sa-roundtrip
  • paper_authors: Jiayu Qian, Yuanyuan Liu, Jingya Yang, Qingping Zhou
  • for: 用于求解科学和工程领域中的成像逆问题
  • methods: 提出 SA-Roundtrip 深度生成先验,在双向生成对抗网络中引入自注意力结构,实现可控的采样生成并识别数据的内在维度;随后在低维潜空间中用 HMC-pCN 采样器对后验分布进行贝叶斯推断
  • results: 比较现有方法表现出色,在CT重建和MNIST数据集上实现了稳定和准确的点估计,同时提供了精确的不确定性评估
    Abstract Bayesian inference with deep generative prior has received considerable interest for solving imaging inverse problems in many scientific and engineering fields. The selection of the prior distribution is learned from, and therefore an important representation learning of, available prior measurements. The SA-Roundtrip, a novel deep generative prior, is introduced to enable controlled sampling generation and identify the data's intrinsic dimension. This prior incorporates a self-attention structure within a bidirectional generative adversarial network. Subsequently, Bayesian inference is applied to the posterior distribution in the low-dimensional latent space using the Hamiltonian Monte Carlo with preconditioned Crank-Nicolson (HMC-pCN) algorithm, which is proven to be ergodic under specific conditions. Experiments conducted on computed tomography (CT) reconstruction with the MNIST and TomoPhantom datasets reveal that the proposed method outperforms state-of-the-art comparisons, consistently yielding a robust and superior point estimator along with precise uncertainty quantification.
    摘要 带深度生成先验的贝叶斯推断在求解许多科学与工程领域的成像逆问题中受到了广泛关注。先验分布的选取是从已有的先验观测中学习得到的,因而是一种重要的表示学习。本文提出一种新的深度生成先验 SA-Roundtrip,可实现可控的采样生成并识别数据的内在维度;该先验在双向生成对抗网络中引入了自注意力结构。随后,在低维潜空间中,利用带预条件 Crank-Nicolson 的哈密顿蒙特卡罗(HMC-pCN)算法对后验分布进行贝叶斯推断,该算法在特定条件下被证明是遍历的。在 MNIST 与 TomoPhantom 数据集上的计算机断层扫描(CT)重建实验表明,所提方法优于当前最先进的对比方法,能够稳定地给出更优的点估计,并提供精确的不确定性量化。
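For readers unfamiliar with the sampler family, the sketch below implements the plain preconditioned Crank-Nicolson (pCN) step for a posterior with a standard Gaussian prior; the paper's HMC-pCN variant adds Hamiltonian dynamics on top of this dimension-robust proposal, which is omitted here, and the toy inverse problem is illustrative only.

```python
import numpy as np

def pcn_sampler(log_likelihood, x0, n_steps=5000, beta=0.2, seed=0):
    """Plain pCN sampler for a posterior with a standard Gaussian prior.

    Proposal: x' = sqrt(1 - beta^2) * x + beta * xi,  xi ~ N(0, I).
    Accept with probability min(1, exp(loglik(x') - loglik(x))); the prior
    terms cancel because the proposal is reversible with respect to the prior."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    ll = log_likelihood(x)
    samples = []
    for _ in range(n_steps):
        prop = np.sqrt(1.0 - beta ** 2) * x + beta * rng.standard_normal(x.shape)
        ll_prop = log_likelihood(prop)
        if np.log(rng.uniform()) < ll_prop - ll:
            x, ll = prop, ll_prop
        samples.append(x.copy())
    return np.array(samples)

# toy inverse problem: observe y = 2*x + noise, infer x with an N(0, I) prior
y_obs, sigma = 1.5, 0.3
loglik = lambda x: -0.5 * np.sum((y_obs - 2.0 * x) ** 2) / sigma ** 2
chain = pcn_sampler(loglik, x0=np.zeros(1))
print(chain[1000:].mean(), chain[1000:].std())   # posterior mean ~0.73 for this toy setup
```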

Human-in-the-Loop Task and Motion Planning for Imitation Learning

  • paper_url: http://arxiv.org/abs/2310.16014
  • repo_url: None
  • paper_authors: Ajay Mandlekar, Caelan Garrett, Danfei Xu, Dieter Fox
  • for: 开发一种人类在环的任务与运动规划系统(HITL-TAMP),用于教会机器人执行复杂的操作(manipulation)任务。
  • methods: 系统采用 TAMP 门控的控制机制,有选择地在人类遥操作员与 TAMP 之间交接控制权,使一名操作员能够管理一组机器人、最大化数据采集效率;随后将采集到的人类数据与模仿学习框架结合,训练 TAMP 门控策略。
  • results: 与传统遥操作系统相比,HITL-TAMP 的数据采集效率更高(同样时间预算下示范数量超过 3 倍),仅凭 10 分钟的非专家遥操作数据即可训练出成功率 75% 以上的智能体;研究共在 12 个接触丰富、长时程任务上收集了 2.1K 条示范,系统常能产生接近完美的智能体。
    Abstract Imitation learning from human demonstrations can teach robots complex manipulation skills, but is time-consuming and labor intensive. In contrast, Task and Motion Planning (TAMP) systems are automated and excel at solving long-horizon tasks, but they are difficult to apply to contact-rich tasks. In this paper, we present Human-in-the-Loop Task and Motion Planning (HITL-TAMP), a novel system that leverages the benefits of both approaches. The system employs a TAMP-gated control mechanism, which selectively gives and takes control to and from a human teleoperator. This enables the human teleoperator to manage a fleet of robots, maximizing data collection efficiency. The collected human data is then combined with an imitation learning framework to train a TAMP-gated policy, leading to superior performance compared to training on full task demonstrations. We compared HITL-TAMP to a conventional teleoperation system -- users gathered more than 3x the number of demos given the same time budget. Furthermore, proficient agents (75\%+ success) could be trained from just 10 minutes of non-expert teleoperation data. Finally, we collected 2.1K demos with HITL-TAMP across 12 contact-rich, long-horizon tasks and show that the system often produces near-perfect agents. Videos and additional results at https://hitltamp.github.io .
    摘要 从人类示范中进行模仿学习可以教会机器人复杂的操作技能,但耗时且费力;相比之下,任务与运动规划(TAMP)系统是自动化的,擅长求解长时程任务,却难以应用于接触丰富的任务。本文提出人类在环任务与运动规划(HITL-TAMP)这一新系统,兼取两种方法之长。系统采用 TAMP 门控的控制机制,有选择地在人类遥操作员与系统之间交接控制权,使遥操作员能够管理一组机器人,最大化数据采集效率。随后将采集到的人类数据与模仿学习框架结合,训练 TAMP 门控策略,其性能优于在完整任务示范上训练的策略。与传统遥操作系统相比,在相同时间预算下用户采集到的示范数量超过 3 倍;仅凭 10 分钟的非专家遥操作数据即可训练出成功率 75% 以上的高水平智能体。最后,我们用 HITL-TAMP 在 12 个接触丰富、长时程任务上收集了 2.1K 条示范,并表明该系统常能产生接近完美的智能体。视频与更多结果见 https://hitltamp.github.io 。

Dissecting In-Context Learning of Translations in GPTs

  • paper_url: http://arxiv.org/abs/2310.15987
  • repo_url: None
  • paper_authors: Vikas Raunak, Hany Hassan Awadalla, Arul Menezes
  • for: 更好地理解在机器翻译中使用大语言模型(LLM)时,少样本示例属性在上下文学习中所起的作用。
  • methods: 通过对高质量、领域内的示例进行扰动,考察示例属性对上下文学习翻译的影响。
  • results: 研究发现,对源端的扰动几乎没有影响,而对目标端的扰动会显著降低翻译质量,说明在上下文学习翻译时,输出文本分布提供了最重要的学习信号。据此提出一种名为 Zero-Shot-Context 的方法,可在零样本提示中自动注入这种信号。实验表明,该方法能提升 GPT-3 的零样本翻译性能,甚至可与少样本提示的翻译性能相媲美。
    Abstract Most of the recent work in leveraging Large Language Models (LLMs) such as GPT-3 for Machine Translation (MT) has focused on selecting the few-shot samples for prompting. In this work, we try to better understand the role of demonstration attributes for the in-context learning of translations through perturbations of high-quality, in-domain demonstrations. We find that asymmetric perturbation of the source-target mappings yield vastly different results. We show that the perturbation of the source side has surprisingly little impact, while target perturbation can drastically reduce translation quality, suggesting that it is the output text distribution that provides the most important learning signal during in-context learning of translations. We propose a method named Zero-Shot-Context to add this signal automatically in Zero-Shot prompting. We demonstrate that it improves upon the zero-shot translation performance of GPT-3, even making it competitive with few-shot prompted translations.
    摘要 近期利用 GPT-3 等大语言模型(LLM)进行机器翻译(MT)的工作,大多聚焦于如何挑选用于提示的少样本示例。本文通过对高质量、领域内示例进行扰动,尝试更好地理解示例属性在上下文学习翻译中的作用。我们发现,对源-目标映射进行非对称扰动会带来截然不同的结果:对源端的扰动影响出奇地小,而对目标端的扰动会大幅降低翻译质量,这表明在上下文学习翻译时,输出文本分布提供了最重要的学习信号。我们据此提出名为 Zero-Shot-Context 的方法,在零样本提示中自动添加这种信号。实验表明,该方法提升了 GPT-3 的零样本翻译性能,甚至可与少样本提示的翻译性能相竞争。

Graph Deep Learning for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.15978
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Andrea Cini, Ivan Marisca, Daniele Zambon, Cesare Alippi
  • for: 为基于图的时间序列集合深度学习预测提供一个系统化的方法论框架。
  • methods: 对预测问题进行形式化,给出基于图的预测模型(时空图神经网络)的设计原则以及评估其性能的方法。
  • results: 在综述该领域的同时,给出设计指南、建议与最佳实践,并深入讨论了尚待解决的挑战与未来研究方向。
    Abstract Graph-based deep learning methods have become popular tools to process collections of correlated time series. Differently from traditional multivariate forecasting methods, neural graph-based predictors take advantage of pairwise relationships by conditioning forecasts on a (possibly dynamic) graph spanning the time series collection. The conditioning can take the form of an architectural inductive bias on the neural forecasting architecture, resulting in a family of deep learning models called spatiotemporal graph neural networks. Such relational inductive biases enable the training of global forecasting models on large time-series collections, while at the same time localizing predictions w.r.t. each element in the set (i.e., graph nodes) by accounting for local correlations among them (i.e., graph edges). Indeed, recent theoretical and practical advances in graph neural networks and deep learning for time series forecasting make the adoption of such processing frameworks appealing and timely. However, most of the studies in the literature focus on proposing variations of existing neural architectures by taking advantage of modern deep learning practices, while foundational and methodological aspects have not been subject to systematic investigation. To fill the gap, this paper aims to introduce a comprehensive methodological framework that formalizes the forecasting problem and provides design principles for graph-based predictive models and methods to assess their performance. At the same time, together with an overview of the field, we provide design guidelines, recommendations, and best practices, as well as an in-depth discussion of open challenges and future research directions.

Accented Speech Recognition With Accent-specific Codebooks

  • paper_url: http://arxiv.org/abs/2310.15970
  • repo_url: https://github.com/csalt-research/accented-codebooks-asr
  • paper_authors: Darshan Prabhu, Preethi Jyothi, Sriram Ganapathy, Vinit Unni
  • for: 提高端到端自动语音识别(ASR)系统对不同口音、尤其是训练中未见过的口音的识别表现。
  • methods: 提出一种新的口音适应方法,利用跨注意力与一组可学习的码本;这些码本捕捉口音特有的信息,并被集成到 ASR 编码器各层中。
  • results: 在 Mozilla Common Voice 多口音数据集上的实验表明,该方法不仅在训练中见过的英语口音上带来显著提升(词错误率相对降低最多 37%),在未见过的口音上也有明显改善(词错误率相对降低最多 5%);在 L2Artic 数据集的零样本迁移设置下同样获益,并优于基于口音对抗训练的其他方法。
    Abstract Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems. Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR. In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. These learnable codebooks capture accent-specific information and are integrated within the ASR encoder layers. The model is trained on accented English speech, while the test data also contained accents which were not seen during training. On the Mozilla Common Voice multi-accented dataset, we show that our proposed approach yields significant performance gains not only on the seen English accents (up to $37\%$ relative improvement in word error rate) but also on the unseen accents (up to $5\%$ relative improvement in WER). Further, we illustrate benefits for a zero-shot transfer setup on the L2Artic dataset. We also compare the performance with other approaches based on accent adversarial training.
    摘要 口音对当前最先进的自动语音识别(ASR)系统构成了重大挑战,在代表性不足的口音上的性能下降严重阻碍了 ASR 的普惠应用。本文为端到端 ASR 系统提出一种新的口音适应方法:利用跨注意力与一组可学习的码本,这些码本捕捉口音特有的信息,并被集成到 ASR 编码器各层中。模型在带口音的英语语音上训练,而测试数据还包含训练中未出现过的口音。在 Mozilla Common Voice 多口音数据集上,所提方法不仅在已见过的英语口音上带来显著性能提升(词错误率相对降低最多 37%),在未见过的口音上同样有明显改善(词错误率相对降低最多 5%)。此外,我们还展示了该方法在 L2Artic 数据集零样本迁移设置下的优势,并与基于口音对抗训练的其他方法进行了比较。
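A rough PyTorch sketch of cross-attention from encoder frames to a small set of learnable codebook vectors, which is the mechanism the abstract describes; the sizes, residual placement, and adapter name are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class AccentCodebookAdapter(nn.Module):
    """Illustrative adapter: encoder frames cross-attend to learnable codebook
    vectors and the attended summary is added back as a residual."""

    def __init__(self, d_model: int = 256, n_codes: int = 64, n_heads: int = 4):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(n_codes, d_model) * 0.02)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, d_model) hidden states from an ASR encoder layer
        codes = self.codebook.unsqueeze(0).expand(frames.size(0), -1, -1)
        attended, _ = self.cross_attn(query=frames, key=codes, value=codes)
        return self.norm(frames + attended)

if __name__ == "__main__":
    x = torch.randn(2, 50, 256)
    print(AccentCodebookAdapter()(x).shape)   # torch.Size([2, 50, 256])
```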

Representation Learning with Large Language Models for Recommendation

  • paper_url: http://arxiv.org/abs/2310.15950
  • repo_url: https://github.com/hkuds/rlmrec
  • paper_authors: Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, Chao Huang
  • for: 这个论文的目的是提高现有的推荐系统,使其能够更好地捕捉用户的偏好和行为。
  • methods: 提出模型无关的框架 RLMRec,利用大语言模型(LLM)增强现有基于 ID 的推荐系统:引入辅助文本信号,构建由 LLM 驱动的用户/物品画像,并通过跨视图对齐框架将 LLM 的语义空间与协同关系信号的表示空间对齐。
  • results: 在我们的评估中,RLMRec可以增强现有的推荐模型,并且可以降低噪声和偏见的影响。我们的实现代码可以在https://github.com/HKUDS/RLMRec上获取。
    Abstract Recommender systems have seen significant advancements with the influence of deep learning and graph neural networks, particularly in capturing complex user-item relationships. However, these graph-based recommenders heavily depend on ID-based data, potentially disregarding valuable textual information associated with users and items, resulting in less informative learned representations. Moreover, the utilization of implicit feedback data introduces potential noise and bias, posing challenges for the effectiveness of user preference learning. While the integration of large language models (LLMs) into traditional ID-based recommenders has gained attention, challenges such as scalability issues, limitations in text-only reliance, and prompt input constraints need to be addressed for effective implementation in practical recommender systems. To address these challenges, we propose a model-agnostic framework RLMRec that aims to enhance existing recommenders with LLM-empowered representation learning. It proposes a recommendation paradigm that integrates representation learning with LLMs to capture intricate semantic aspects of user behaviors and preferences. RLMRec incorporates auxiliary textual signals, develops a user/item profiling paradigm empowered by LLMs, and aligns the semantic space of LLMs with the representation space of collaborative relational signals through a cross-view alignment framework. This work further establish a theoretical foundation demonstrating that incorporating textual signals through mutual information maximization enhances the quality of representations. In our evaluation, we integrate RLMRec with state-of-the-art recommender models, while also analyzing its efficiency and robustness to noise data. Our implementation codes are available at https://github.com/HKUDS/RLMRec.
    摘要 在深度学习和图神经网络的推动下,推荐系统取得了显著进展,尤其是在捕捉复杂的用户-物品关系方面。然而,这些基于图的推荐器严重依赖基于 ID 的数据,可能忽略与用户和物品相关的宝贵文本信息,导致学到的表示信息量不足。此外,隐式反馈数据的使用会引入潜在的噪声与偏差,给用户偏好学习的有效性带来挑战。尽管把大语言模型(LLM)融入传统基于 ID 的推荐器已受到关注,但要在实际推荐系统中有效落地,仍需解决可扩展性、仅依赖文本的局限性以及提示输入长度受限等问题。为此,我们提出模型无关的框架 RLMRec,旨在用 LLM 赋能的表示学习增强现有推荐器。该框架提出一种将表示学习与 LLM 结合的推荐范式,以捕捉用户行为与偏好中细腻的语义信息:引入辅助文本信号,构建由 LLM 驱动的用户/物品画像,并通过跨视图对齐框架将 LLM 的语义空间与协同关系信号的表示空间对齐。我们还建立了理论基础,证明通过最大化互信息引入文本信号能够提升表示质量。在评估中,我们将 RLMRec 与当前最先进的推荐模型集成,并分析了其效率以及对噪声数据的鲁棒性。实现代码见 https://github.com/HKUDS/RLMRec。
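A minimal sketch of cross-view alignment via an InfoNCE objective, whose maximization lower-bounds the mutual information between the collaborative-filtering view and the LLM text view; the projection head, temperature, and batch construction are illustrative choices, not RLMRec's exact recipe.

```python
import torch
import torch.nn.functional as F

def cross_view_alignment_loss(cf_emb, text_emb, proj, temperature=0.2):
    """Symmetric InfoNCE between CF embeddings and LLM-derived text embeddings
    for the same batch of users/items (row i of each view is a positive pair)."""
    z_cf = F.normalize(cf_emb, dim=-1)
    z_txt = F.normalize(proj(text_emb), dim=-1)      # map the text space into the CF space
    logits = z_cf @ z_txt.t() / temperature          # (N, N) similarity matrix
    labels = torch.arange(z_cf.size(0), device=z_cf.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

if __name__ == "__main__":
    proj = torch.nn.Linear(384, 64)                  # e.g. sentence-encoder dim -> CF dim
    loss = cross_view_alignment_loss(torch.randn(32, 64), torch.randn(32, 384), proj)
    loss.backward()
    print(float(loss))
```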

Combining Behaviors with the Successor Features Keyboard

  • paper_url: http://arxiv.org/abs/2310.15940
  • repo_url: None
  • paper_authors: Wilka Carvalho, Andre Saraiva, Angelos Filos, Andrew Kyle Lampinen, Loic Matthey, Richard L. Lewis, Honglak Lee, Satinder Singh, Danilo J. Rezende, Daniel Zoran
  • for: 这篇论文目标是提出一种基于Successor Features(SF)和Generalized Policy Improvement(GPI)的行为知识传递方法,以便在新任务环境中快速学习。
  • methods: 该方法使用了Successor Features Keyboard(SFK)和Categorical Successor Feature Approximator(CSFA)两种算法来实现行为知识传递。CSFA是一种新的学习算法,可以同时发现状态特征和任务编码。
  • results: 通过SFK和CSFA,该论文在一个复杂的3D环境中实现了行为知识传递,并且比基于转移学习的基准方法更快速地传递到长期任务。此外,CSFA比其他SF approximator方法更能够在大规模任务中发现与SF&GPI相容的表示。
    Abstract The Option Keyboard (OK) was recently proposed as a method for transferring behavioral knowledge across tasks. OK transfers knowledge by adaptively combining subsets of known behaviors using Successor Features (SFs) and Generalized Policy Improvement (GPI). However, it relies on hand-designed state-features and task encodings which are cumbersome to design for every new environment. In this work, we propose the "Successor Features Keyboard" (SFK), which enables transfer with discovered state-features and task encodings. To enable discovery, we propose the "Categorical Successor Feature Approximator" (CSFA), a novel learning algorithm for estimating SFs while jointly discovering state-features and task encodings. With SFK and CSFA, we achieve the first demonstration of transfer with SFs in a challenging 3D environment where all the necessary representations are discovered. We first compare CSFA against other methods for approximating SFs and show that only CSFA discovers representations compatible with SF&GPI at this scale. We then compare SFK against transfer learning baselines and show that it transfers most quickly to long-horizon tasks.
    摘要 Option Keyboard(OK)最近被提出用于跨任务迁移行为知识:它利用后继特征(SF)与广义策略改进(GPI),自适应地组合已知行为的子集来迁移知识。然而,OK 依赖人工设计的状态特征和任务编码,为每个新环境设计这些表示十分繁琐。在本工作中,我们提出"后继特征键盘"(SFK),使迁移能够基于自动发现的状态特征和任务编码。为实现这种发现,我们提出"类别型后继特征近似器"(CSFA),这是一种在估计 SF 的同时联合发现状态特征与任务编码的新学习算法。借助 SFK 与 CSFA,我们首次在一个具有挑战性的 3D 环境中,在全部所需表示均为自动发现的条件下演示了基于 SF 的迁移。我们首先将 CSFA 与其他 SF 近似方法进行比较,发现只有 CSFA 能在该规模下发现与 SF&GPI 兼容的表示;随后将 SFK 与迁移学习基线进行比较,发现 SFK 向长时程任务的迁移速度最快。
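Since SFK builds on successor features and Generalized Policy Improvement, a compact reminder of the GPI step may help: given successor features psi_pi(s, a) for each known behavior and a task vector w, act greedily with respect to max over pi of psi_pi(s, a) . w. The array shapes below are illustrative.

```python
import numpy as np

def gpi_action(successor_features, task_vector):
    """Generalized Policy Improvement with successor features.

    successor_features: array (n_policies, n_actions, d) -- psi_pi(s, a) at the
    current state for each known behavior pi.
    task_vector: array (d,) -- the new task's reward weights w, so that
    Q_pi(s, a) = psi_pi(s, a) . w.
    Returns the action maximizing max_pi Q_pi(s, a)."""
    q_values = successor_features @ task_vector        # (n_policies, n_actions)
    return int(q_values.max(axis=0).argmax())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    psi = rng.normal(size=(3, 4, 8))                    # 3 known behaviors, 4 actions, 8 features
    w_new = rng.normal(size=8)                          # task encoding for the new task
    print("GPI action:", gpi_action(psi, w_new))
```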

E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity

  • paper_url: http://arxiv.org/abs/2310.15929
  • repo_url: None
  • paper_authors: Yun Li, Lin Niu, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang
  • for: 加速大语言模型(LLM)的推理并降低内存占用,解决传统剪枝方法在 LLM 上训练开销难以承受的问题。
  • methods: 将隐藏状态特征的信息熵引入剪枝度量设计,提出名为 E-Sparse 的方法,以提高 N:M 稀疏化在 LLM 上的精度。E-Sparse 利用信息丰富度刻画通道重要性,并引入若干新技术:(1)以信息熵增强参数权重与输入特征范数在剪枝度量中的显著性,且无需修改保留的权重;(2)设计全局朴素混洗与局部分块混洗,快速优化信息分布,在精度与内存占用之间取得平衡。
  • results: 在 LLaMA 系列与 OPT 模型上,E-Sparse 可将推理速度最高提升 1.53 倍、节省最多 43.52% 的内存,同时精度损失在可接受范围内。
    Abstract Traditional pruning methods are known to be challenging to work in Large Language Models (LLMs) for Generative AI because of their unaffordable training process and large computational demands. For the first time, we introduce the information entropy of hidden state features into a pruning metric design, namely E-Sparse, to improve the accuracy of N:M sparsity on LLM. E-Sparse employs the information richness to leverage the channel importance, and further incorporates several novel techniques to put it into effect: (1) it introduces information entropy to enhance the significance of parameter weights and input feature norms as a novel pruning metric, and performs N:M sparsity without modifying the remaining weights. (2) it designs global naive shuffle and local block shuffle to quickly optimize the information distribution and adequately cope with the impact of N:M sparsity on LLMs' accuracy. E-Sparse is implemented as a Sparse-GEMM on FasterTransformer and runs on NVIDIA Ampere GPUs. Extensive experiments on the LLaMA family and OPT models show that E-Sparse can significantly speed up the model inference over the dense model (up to 1.53X) and obtain significant memory saving (up to 43.52%), with acceptable accuracy loss.
    摘要 传统剪枝方法在面向生成式 AI 的大语言模型(LLM)上难以奏效,因为其训练过程开销难以承受、计算需求巨大。我们首次将隐藏状态特征的信息熵引入剪枝度量设计,提出 E-Sparse,以提高 N:M 稀疏化在 LLM 上的精度。E-Sparse 利用信息丰富度刻画通道重要性,并进一步引入若干新技术:(1)引入信息熵,以增强参数权重与输入特征范数作为新剪枝度量的显著性,并在不修改保留权重的前提下实施 N:M 稀疏化;(2)设计全局朴素混洗与局部分块混洗,以快速优化信息分布,充分应对 N:M 稀疏化对 LLM 精度的影响。E-Sparse 以 Sparse-GEMM 的形式在 FasterTransformer 上实现,并运行于 NVIDIA Ampere GPU。在 LLaMA 系列与 OPT 模型上的大量实验表明,E-Sparse 相比稠密模型可将推理速度最高提升 1.53 倍,并节省最多 43.52% 的内存,且精度损失可接受。
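A simplified numpy sketch of an entropy-weighted N:M pruning criterion in the spirit of the abstract: channel entropy computed from calibration activations scales the magnitude score before selecting the top n weights in every group of m. The actual E-Sparse metric also involves input feature norms and the shuffling steps, which are not reproduced here.

```python
import numpy as np

def entropy_per_channel(activations, n_bins=32):
    """Shannon entropy of each input channel's activation distribution,
    used to up-weight channels carrying more information."""
    ent = np.empty(activations.shape[1])
    for c in range(activations.shape[1]):
        hist, _ = np.histogram(activations[:, c], bins=n_bins)
        p = hist / hist.sum()
        p = p[p > 0]
        ent[c] = -(p * np.log(p)).sum()
    return ent

def n_m_sparsity_mask(weight, channel_score, n=2, m=4):
    """Keep the n largest-scoring weights in every group of m along the input
    dimension; the score is |w| scaled by a per-channel importance term."""
    score = np.abs(weight) * channel_score[None, :]
    out_dim, in_dim = weight.shape
    groups = score.reshape(out_dim, in_dim // m, m)
    keep = np.argsort(groups, axis=-1)[..., -n:]
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=-1)
    return mask.reshape(out_dim, in_dim)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1024, 16))                      # calibration activations
    W = rng.normal(size=(8, 16))                         # a linear layer's weight
    mask = n_m_sparsity_mask(W, 1.0 + entropy_per_channel(X))
    print(mask.mean())                                   # -> 0.5 (2:4 sparsity)
```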

Characterizing Mechanisms for Factual Recall in Language Models

  • paper_url: http://arxiv.org/abs/2310.15910
  • repo_url: None
  • paper_authors: Qinan Yu, Jack Merullo, Ellie Pavlick
  • for: 研究语言模型(LM)在预训练中记忆的事实与上下文中出现的新信息相冲突时如何取舍。
  • methods: 从数据分布和内部机制两方面分析 LM 的行为,包括测量 LM 在多大比例上使用反事实前缀(例如"波兰的首都是伦敦")覆盖其在预训练中学到的答案("华沙")。
  • results: 在 Pythia 和 GPT-2 上,查询国家("波兰")与上下文城市("伦敦")在训练数据中出现的频率都会显著影响模型使用反事实答案的可能性。利用注意力头归因定位相关注意力头后,在运行时缩放单个注意力头的值向量即可控制模型采用上下文答案的倾向,使生成上下文答案的比例提升至 88%。这些结果表明模型行为往往可以定位到具体组件,并为在运行时动态控制模型行为提供了概念验证。
    Abstract Language Models (LMs) often must integrate facts they memorized in pretraining with new information that appears in a given context. These two sources can disagree, causing competition within the model, and it is unclear how an LM will resolve the conflict. On a dataset that queries for knowledge of world capitals, we investigate both distributional and mechanistic determinants of LM behavior in such situations. Specifically, we measure the proportion of the time an LM will use a counterfactual prefix (e.g., "The capital of Poland is London") to overwrite what it learned in pretraining ("Warsaw"). On Pythia and GPT2, the training frequency of both the query country ("Poland") and the in-context city ("London") highly affect the models' likelihood of using the counterfactual. We then use head attribution to identify individual attention heads that either promote the memorized answer or the in-context answer in the logits. By scaling up or down the value vector of these heads, we can control the likelihood of using the in-context answer on new data. This method can increase the rate of generating the in-context answer to 88\% of the time simply by scaling a single head at runtime. Our work contributes to a body of evidence showing that we can often localize model behaviors to specific components and provides a proof of concept for how future methods might control model behavior dynamically at runtime.
    摘要 语言模型(LM)经常需要把预训练中记忆的事实与给定上下文中出现的新信息结合起来。两者可能相互矛盾,在模型内部形成竞争,而模型将如何化解这一冲突尚不清楚。我们在一个查询世界各国首都知识的数据集上,考察了决定 LM 在此类情形下行为的分布因素与机制因素。具体而言,我们测量 LM 在多大比例上会使用反事实前缀(例如"波兰的首都是伦敦")来覆盖其在预训练中学到的答案("华沙")。在 Pythia 和 GPT-2 上,查询国家("波兰")与上下文城市("伦敦")的训练频率都会显著影响模型使用反事实答案的可能性。随后,我们利用注意力头归因,找出在 logits 中分别倾向于记忆答案或上下文答案的注意力头;通过放大或缩小这些头的值向量,即可在新数据上控制使用上下文答案的可能性。仅在运行时缩放单个注意力头,即可将生成上下文答案的比例提升至 88%。我们的工作进一步印证了模型行为往往可以定位到具体组件,并为未来在运行时动态控制模型行为的方法提供了概念验证。

Is Probing All You Need? Indicator Tasks as an Alternative to Probing Embedding Spaces

  • paper_url: http://arxiv.org/abs/2310.15905
  • repo_url: None
  • paper_authors: Tal Levy, Omer Goldman, Reut Tsarfaty
  • for: This paper is written to explore the use of indicator tasks for evaluating the information encoded in word embeddings, and to demonstrate the advantages of using indicator tasks over traditional probing methods.
  • methods: The paper uses two test cases to demonstrate the effectiveness of indicator tasks: one dealing with gender debiasing and another with the erasure of morphological information from embedding spaces.
  • results: The paper shows that the application of a suitable indicator provides a more accurate picture of the information captured and removed compared to probes, and thus concludes that indicator tasks should be implemented and taken into consideration when eliciting information from embedded representations.
    Abstract The ability to identify and control different kinds of linguistic information encoded in vector representations of words has many use cases, especially for explainability and bias removal. This is usually done via a set of simple classification tasks, termed probes, to evaluate the information encoded in the embedding space. However, the involvement of a trainable classifier leads to entanglement between the probe's results and the classifier's nature. As a result, contemporary works on probing include tasks that do not involve training of auxiliary models. In this work we introduce the term indicator tasks for non-trainable tasks which are used to query embedding spaces for the existence of certain properties, and claim that this kind of tasks may point to a direction opposite to probes, and that this contradiction complicates the decision on whether a property exists in an embedding space. We demonstrate our claims with two test cases, one dealing with gender debiasing and another with the erasure of morphological information from embedding spaces. We show that the application of a suitable indicator provides a more accurate picture of the information captured and removed compared to probes. We thus conclude that indicator tasks should be implemented and taken into consideration when eliciting information from embedded representations.
    摘要 识别并控制词向量表示中所编码的不同类型语言信息,在可解释性与偏见消除等方面有许多用途。这通常通过一组被称为探针(probe)的简单分类任务来完成,用以评估嵌入空间中编码的信息。然而,由于涉及可训练的分类器,探针的结果会与分类器本身的性质纠缠在一起,因此近期的探测工作也包含了不需要训练辅助模型的任务。在本工作中,我们把这类用于查询嵌入空间中是否存在某种属性、且无需训练的任务称为指示任务(indicator task),并指出这类任务可能给出与探针相反的结论,而这种矛盾使"某一属性是否存在于嵌入空间中"的判断更加复杂。我们通过两个案例来论证这一点:一个关于性别去偏,另一个关于从嵌入空间中擦除形态信息。结果表明,与探针相比,使用合适的指示任务能更准确地刻画被捕捉与被移除的信息。因此我们认为,在从嵌入表示中获取信息时,应当实现并考虑指示任务。

AdaptiX – A Transitional XR Framework for Development and Evaluation of Shared Control Applications in Assistive Robotics

  • paper_url: http://arxiv.org/abs/2310.15887
  • repo_url: https://github.com/maxpascher/AdaptiX
  • paper_authors: Max Pascher, Felix Ferdinand Goldau, Kirill Kronhardt, Udo Frese, Jens Gerken
  • for: 面向辅助机器人,通过共享控制的理念在提升用户自主性的同时提供一定程度的计算机辅助,并提供一个高分辨率仿真环境,用于开发与评估共享控制应用。
  • methods: 提出免费开源的 AdaptiX XR 框架,使研究者无需真实机械臂即可快速开发和评估共享控制应用;框架包含虚拟现实(VR)中的仿真机械臂示例场景、多种标准控制接口和专门的录制/回放系统,可按研究需求灵活扩展,并可通过 ROS 集成以 PhysicalTwin 方式控制真实机械臂。
  • results: 论文详细分析了 AdaptiX 的能力与局限,并介绍了基于该框架开展的三项研究,展示了其在人机交互研究中快速设计与测试新型交互方法、干预策略和多模态反馈技术的价值。
    Abstract With the ongoing efforts to empower people with mobility impairments and the increase in technological acceptance by the general public, assistive technologies, such as collaborative robotic arms, are gaining popularity. Yet, their widespread success is limited by usability issues, specifically the disparity between user input and software control along the autonomy continuum. To address this, shared control concepts provide opportunities to combine the targeted increase of user autonomy with a certain level of computer assistance. This paper presents the free and open-source AdaptiX XR framework for developing and evaluating shared control applications in a high-resolution simulation environment. The initial framework consists of a simulated robotic arm with an example scenario in Virtual Reality (VR), multiple standard control interfaces, and a specialized recording/replay system. AdaptiX can easily be extended for specific research needs, allowing Human-Robot Interaction (HRI) researchers to rapidly design and test novel interaction methods, intervention strategies, and multi-modal feedback techniques, without requiring an actual physical robotic arm during the early phases of ideation, prototyping, and evaluation. Also, a Robot Operating System (ROS) integration enables the controlling of a real robotic arm in a PhysicalTwin approach without any simulation-reality gap. Here, we review the capabilities and limitations of AdaptiX in detail and present three bodies of research based on the framework. AdaptiX can be accessed at https://adaptix.robot-research.de.
    摘要 随着赋能行动障碍人群的持续努力以及公众对技术接受度的提高,协作机械臂等辅助技术正日益普及。然而,其广泛落地仍受限于可用性问题,尤其是在自主性连续谱上用户输入与软件控制之间的落差。为此,共享控制的理念提供了在有针对性地提升用户自主性的同时引入一定程度计算机辅助的可能。本文提出免费开源的 AdaptiX XR 框架,用于在高分辨率仿真环境中开发与评估共享控制应用。初始框架包含一个带虚拟现实(VR)示例场景的仿真机械臂、多种标准控制接口以及专门的录制/回放系统。AdaptiX 可按具体研究需求方便地扩展,使人机交互(HRI)研究者在构思、原型与评估的早期阶段无需真实机械臂,即可快速设计和测试新型交互方法、干预策略与多模态反馈技术;同时,通过集成机器人操作系统(ROS),还能以 PhysicalTwin 的方式控制真实机械臂,消除仿真与现实之间的鸿沟。本文详细讨论了 AdaptiX 的能力与局限,并介绍了基于该框架开展的三项研究。AdaptiX 可在 https://adaptix.robot-research.de 获取。

KirchhoffNet: A Circuit Bridging Message Passing and Continuous-Depth Models

  • paper_url: http://arxiv.org/abs/2310.15872
  • repo_url: None
  • paper_authors: Zhengqi Gao, Fan-Keng Sun, Duane S. Boning
  • for: 这篇论文的目的是介绍一种基于电子电路基本原理的神经网络模型,称之为KirchhoffNet,它与消息传递神经网络和连续深度网络有紧密的联系。
  • methods: 作者使用Kirchhoff的电流法则来引入一种独特的神经网络模型,这种模型不需要传统层次结构(如卷积、聚合和线性层)却可以达到98.86%的测试准确率在MNIST dataset,与当前最佳实践(SOTA)结果相当。
  • results: 作者证明了KirchhoffNet在硬件方面具有潜在的优势,它可以通过物理实现为analog电子电路,而不需要使用GPU。此外,作者还证明了KirchhoffNet的前向计算总是在1/f秒钟内完成,这意味着无论KirchhoffNet的参数数量如何,它都可以在1/f秒钟内完成计算。这一特点为实现ultra-大规模神经网络提供了一个有前途的技术。
    Abstract In this paper, we exploit a fundamental principle of analog electronic circuitry, Kirchhoff's current law, to introduce a unique class of neural network models that we refer to as KirchhoffNet. KirchhoffNet establishes close connections with message passing neural networks and continuous-depth networks. We demonstrate that even in the absence of any traditional layers (such as convolution, pooling, or linear layers), KirchhoffNet attains 98.86% test accuracy on the MNIST dataset, comparable with state of the art (SOTA) results. What makes KirchhoffNet more intriguing is its potential in the realm of hardware. Contemporary deep neural networks are conventionally deployed on GPUs. In contrast, KirchhoffNet can be physically realized by an analog electronic circuit. Moreover, we justify that irrespective of the number of parameters within a KirchhoffNet, its forward calculation can always be completed within 1/f seconds, with f representing the hardware's clock frequency. This characteristic introduces a promising technology for implementing ultra-large-scale neural networks.
    摘要 在这篇论文中,我们利用模拟电子电路的一个基本原理——基尔霍夫电流定律,引入一类独特的神经网络模型,称之为KirchhoffNet。KirchhoffNet与消息传递神经网络和连续深度网络有着紧密的联系。我们展示了,即使不使用任何传统层(如卷积、池化或线性层),KirchhoffNet在MNIST数据集上仍能达到98.86%的测试准确率,与当前最佳(SOTA)结果相当。更引人注目的是KirchhoffNet在硬件方面的潜力:当代深度神经网络通常部署在GPU上,而KirchhoffNet可以由模拟电子电路物理实现。此外,我们论证了无论KirchhoffNet的参数数量多少,其前向计算总能在1/f秒内完成,其中f为硬件的时钟频率。这一特性为实现超大规模神经网络提供了一项有前景的技术。
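
As a rough, self-contained illustration of the idea (not the authors' implementation; the learnable-conductance model, tanh edge currents and Euler integration below are my assumptions), node voltages can be treated as hidden states whose dynamics follow Kirchhoff's current law and are integrated like a continuous-depth model:

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes = 16                                           # circuit nodes = hidden units
G = rng.normal(scale=0.1, size=(n_nodes, n_nodes))     # learnable "conductances"
np.fill_diagonal(G, 0.0)

def kcl_dynamics(v):
    """Net current into each node under Kirchhoff's current law.

    The current on edge (j -> i) is modelled as G[i, j] * tanh(v[j] - v[i]);
    summing over j gives dv_i/dt.
    """
    diff = v[None, :] - v[:, None]                     # diff[i, j] = v[j] - v[i]
    return np.sum(G * np.tanh(diff), axis=1)

def forward(x, t_end=1.0, n_steps=20):
    """Integrate node voltages from an input-dependent initial state."""
    v = np.zeros(n_nodes)
    v[: x.shape[0]] = x                                # inject the input at a subset of nodes
    dt = t_end / n_steps
    for _ in range(n_steps):                           # forward Euler; analog hardware settles in ~1/f s
        v = v + dt * kcl_dynamics(v)
    return v                                           # final voltages read out as features

features = forward(rng.normal(size=8))
print(features.shape)                                  # (16,)
```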

CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code

  • paper_url: http://arxiv.org/abs/2310.16853
  • repo_url: None
  • paper_authors: Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang
  • for: 本研究的目的是提供一种能够自动生成binary函数摘要的方法,以便进行反向工程学。
  • methods: 本研究使用了一种基于控制流图和伪代码的 binary code summarization 框架,称为CP-BCS。该框架利用了双向指令级别控制流图和伪代码,以便充分利用 Assembly 代码的 semantics。
  • results: 对于3种不同的 binary 优化级别(O1、O2、O3)和3种不同的计算机架构(X86、X64、ARM),CP-BCS 的评估结果表明,它能够显著提高反向工程学的效率。
    Abstract Automatically generating function summaries for binaries is an extremely valuable but challenging task, since it involves translating the execution behavior and semantics of the low-level language (assembly code) into human-readable natural language. However, most current works on understanding assembly code are oriented towards generating function names, which involve numerous abbreviations that make them still confusing. To bridge this gap, we focus on generating complete summaries for binary functions, especially for stripped binary (no symbol table and debug information in reality). To fully exploit the semantics of assembly code, we present a control flow graph and pseudo code guided binary code summarization framework called CP-BCS. CP-BCS utilizes a bidirectional instruction-level control flow graph and pseudo code that incorporates expert knowledge to learn the comprehensive binary function execution behavior and logic semantics. We evaluate CP-BCS on 3 different binary optimization levels (O1, O2, and O3) for 3 different computer architectures (X86, X64, and ARM). The evaluation results demonstrate CP-BCS is superior and significantly improves the efficiency of reverse engineering.
    摘要 为二进制文件自动生成函数摘要是一项极具价值但也极具挑战性的任务,因为它需要把低级语言(汇编代码)的执行行为和语义翻译成人类可读的自然语言。然而,目前大多数理解汇编代码的工作都集中在生成函数名上,而函数名中包含大量缩写,仍然令人困惑。为弥补这一差距,我们专注于为二进制函数生成完整的摘要,特别是针对剥离后的二进制文件(即实际中没有符号表和调试信息的情况)。为了充分利用汇编代码的语义,我们提出了一种由控制流图和伪代码引导的二进制代码摘要框架,称为CP-BCS。CP-BCS利用双向指令级控制流图以及融入专家知识的伪代码,来学习二进制函数完整的执行行为和逻辑语义。我们在3种二进制优化级别(O1、O2、O3)和3种计算机体系结构(X86、X64、ARM)上评估了CP-BCS。评估结果表明,CP-BCS表现更优,并显著提高了逆向工程的效率。

Topology-aware Debiased Self-supervised Graph Learning for Recommendation

  • paper_url: http://arxiv.org/abs/2310.15858
  • repo_url: https://github.com/malajikuai/tdsgl
  • paper_authors: Lei Han, Hui Yan, Zhicheng Qiao
  • for: 提高推荐系统的准确率和效果
  • methods: 提出拓扑感知的去偏自监督图学习(TDSGL),利用交互数据所反映的用户购买意愿与物品特性,按语义相似度构建对比样本对,以提供更充分、更可靠的监督信号
  • results: 在三个公共数据集上的实验显示,所提模型显著优于最先进的模型;代码可在 https://github.com/malajikuai/TDSGL 获取
    Abstract In recommendation, graph-based Collaborative Filtering (CF) methods mitigate the data sparsity by introducing Graph Contrastive Learning (GCL). However, the random negative sampling strategy in these GCL-based CF models neglects the semantic structure of users (items), which not only introduces false negatives (negatives that are similar to anchor user (item)) but also ignores the potential positive samples. To tackle the above issues, we propose Topology-aware Debiased Self-supervised Graph Learning (TDSGL) for recommendation, which constructs contrastive pairs according to the semantic similarity between users (items). Specifically, since the original user-item interaction data commendably reflects the purchasing intent of users and certain characteristics of items, we calculate the semantic similarity between users (items) on interaction data. Then, given a user (item), we construct its negative pairs by selecting users (items) which embed different semantic structures to ensure the semantic difference between the given user (item) and its negatives. Moreover, for a user (item), we design a feature extraction module that converts other semantically similar users (items) into an auxiliary positive sample to acquire a more informative representation. Experimental results show that the proposed model outperforms the state-of-the-art models significantly on three public datasets. Our model implementation codes are available at https://github.com/malajikuai/TDSGL.
    摘要 在推荐任务中,基于图的协同过滤(CF)方法通过引入图对比学习(GCL)来缓解数据稀疏问题。然而,这些基于GCL的CF模型中的随机负采样策略忽略了用户(物品)的语义结构,不仅会引入假负样本(即与锚点用户/物品相似的负样本),还会忽略潜在的正样本。为了解决上述问题,我们提出了面向推荐的拓扑感知去偏自监督图学习(TDSGL),根据用户(物品)之间的语义相似度来构建对比样本对。具体来说,由于原始的用户-物品交互数据能够很好地反映用户的购买意愿和物品的某些特性,我们在交互数据上计算用户(物品)之间的语义相似度。然后,对于给定的用户(物品),我们选择语义结构不同的用户(物品)构成其负样本,以确保给定用户(物品)与其负样本之间的语义差异。此外,对于每个用户(物品),我们设计了一个特征提取模块,将其他语义相似的用户(物品)转换为辅助正样本,从而获得信息量更丰富的表示。实验结果表明,所提模型在三个公共数据集上显著优于最先进的模型。模型实现代码见 https://github.com/malajikuai/TDSGL。
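
A minimal sketch of the debiased pair construction described above (toy data and cosine similarity are my assumptions; this is not the released TDSGL code): negatives are restricted to semantically dissimilar users, and the most similar user serves as an auxiliary positive.

```python
import numpy as np

rng = np.random.default_rng(0)
interactions = (rng.random((100, 50)) < 0.05).astype(float)   # toy user-item matrix

# Cosine similarity between users based on their interaction rows.
norms = np.linalg.norm(interactions, axis=1, keepdims=True) + 1e-8
sim = (interactions / norms) @ (interactions / norms).T

def contrastive_pairs(anchor, n_neg=10):
    """Debiased pair construction: the least similar users become negatives,
    the most similar other user becomes an auxiliary positive."""
    order = np.argsort(sim[anchor])                     # ascending similarity
    negatives = [u for u in order if u != anchor][:n_neg]
    aux_positive = [u for u in order[::-1] if u != anchor][0]
    return aux_positive, negatives

pos, negs = contrastive_pairs(anchor=3)
print(pos, negs[:3])
```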

Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models

  • paper_url: http://arxiv.org/abs/2310.15852
  • repo_url: None
  • paper_authors: Lina Conti, Guillaume Wisniewski
  • for: 这个研究探究了神经语言模型如何自然地学习语言特性,包括 gender 信息的捕捉和使用规则。
  • methods: 研究使用了一个基于 PCDF 的人工语料库,控制了训练数据中 gender 分布,以确定模型是否正确地捕捉 gender 信息,或者受到 gender 偏见。
  • results: 研究发现,当模型在训练时受到 sufficient 的 gender 信息时,它们可以正确地捕捉 gender 信息并遵循使用规则。但是,当 gender 信息不充分时,模型可能受到 gender 偏见。
    Abstract Numerous studies have demonstrated the ability of neural language models to learn various linguistic properties without direct supervision. This work takes an initial step towards exploring the less researched topic of how neural models discover linguistic properties of words, such as gender, as well as the rules governing their usage. We propose to use an artificial corpus generated by a PCFG based on French to precisely control the gender distribution in the training data and determine under which conditions a model correctly captures gender information or, on the contrary, appears gender-biased.
    摘要 多个研究已经证明神经语言模型可以不直接监督学习不同语言性质。这项工作尝试了对较少研究的话语性质的探索,包括单词的性别信息以及其使用规则。我们提议使用基于法语的PCFG生成人工词库,以控制训练数据中性别分布的精度,并确定模型在捕捉性别信息时是否正确或偏向一方。
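
To make the controlled-corpus setup concrete, here is a toy sampler for a French-like PCFG (my own illustrative grammar, not the authors') in which the share of feminine subjects in the generated training data is an explicit knob:

```python
import random

random.seed(0)

def make_grammar(p_fem):
    """A tiny French-like PCFG; p_fem controls the share of feminine subjects."""
    return {
        "S":     [(("NP_m", "V", "ADJ_m"), 1.0 - p_fem),
                  (("NP_f", "V", "ADJ_f"), p_fem)],
        "NP_m":  [(("le garçon",), 0.5), (("le chat",), 0.5)],
        "NP_f":  [(("la fille",), 0.5), (("la chatte",), 0.5)],
        "V":     [(("est",), 1.0)],
        "ADJ_m": [(("petit",), 0.5), (("content",), 0.5)],
        "ADJ_f": [(("petite",), 0.5), (("contente",), 0.5)],
    }

def sample(symbol, grammar):
    if symbol not in grammar:                           # terminal
        return [symbol]
    rules = [rhs for rhs, _ in grammar[symbol]]
    weights = [w for _, w in grammar[symbol]]
    rhs = random.choices(rules, weights=weights)[0]
    return [tok for s in rhs for tok in sample(s, grammar)]

grammar = make_grammar(p_fem=0.2)                       # 20% feminine subjects in the corpus
corpus = [" ".join(sample("S", grammar)) for _ in range(5)]
print("\n".join(corpus))
```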

Posterior Estimation for Dynamic PET imaging using Conditional Variational Inference

  • paper_url: http://arxiv.org/abs/2310.15850
  • repo_url: None
  • paper_authors: Xiaofeng Liu, Thibault Marin, Tiss Amal, Jonghye Woo, Georges El Fakhri, Jinsong Ouyang
  • for: 该研究旨在根据时间活动曲线的测量,高效地估计动态PET成像中动力学参数的后验分布。
  • methods: 该研究使用深度学习框架,通过引入潜变量来补偿前向动力学模型造成的信息损失,然后使用条件变分自编码器(CVAE)估计后验分布。
  • results: 研究表明,基于CVAE的方法可以高效地估计动力学参数的后验分布;在低维数据(单个脑区)和简化参考组织模型下,以无偏的MCMC采样作为参照验证了其有效性。
    Abstract This work aims efficiently estimating the posterior distribution of kinetic parameters for dynamic positron emission tomography (PET) imaging given a measurement of time of activity curve. Considering the inherent information loss from parametric imaging to measurement space with the forward kinetic model, the inverse mapping is ambiguous. The conventional (but expensive) solution can be the Markov Chain Monte Carlo (MCMC) sampling, which is known to produce unbiased asymptotical estimation. We propose a deep-learning-based framework for efficient posterior estimation. Specifically, we counteract the information loss in the forward process by introducing latent variables. Then, we use a conditional variational autoencoder (CVAE) and optimize its evidence lower bound. The well-trained decoder is able to infer the posterior with a given measurement and the sampled latent variables following a simple multivariate Gaussian distribution. We validate our CVAE-based method using unbiased MCMC as the reference for low-dimensional data (a single brain region) with the simplified reference tissue model.
    摘要 本研究旨在根据时间活动曲线的测量,高效地估计动态正电子发射断层成像(PET)中动力学参数的后验分布。考虑到从参数成像到测量空间的前向动力学模型存在固有的信息损失,其逆映射是含糊不清的。传统(但代价高昂)的解决方案是马尔可夫链蒙特卡洛(MCMC)采样,它能给出渐近无偏的估计。我们提出了一个基于深度学习的高效后验估计框架:首先引入潜变量来补偿前向过程中的信息损失,然后使用条件变分自编码器(CVAE)并优化其证据下界。训练好的解码器能够在给定测量值以及服从简单多元高斯分布的潜变量采样的条件下推断后验分布。我们在低维数据(单个脑区)和简化参考组织模型上,以无偏的MCMC作为参照验证了基于CVAE的方法。

Grid Frequency Forecasting in University Campuses using Convolutional LSTM

  • paper_url: http://arxiv.org/abs/2310.16071
  • repo_url: None
  • paper_authors: Aneesh Sathe, Wen Ren Yang
  • for: 这种paper的目的是提出一种基于Convolutional Neural Networks (CNN)和Long Short-Term Memory (LSTM)网络的强化时间序列预测模型,以提高电网的可靠性和灵活性。
  • methods: 该方法采用卷积LSTM(ConvLSTM)模型,为校园内每栋建筑分别训练独立的模型,以适应各自的时间序列数据;同时构建一个集成模型,汇总各建筑的预测结果,给出整个校园的总体预测。
  • results: 实验结果表明,提出的方法在预测电网频率方面表现出色,比传统预测技术更高的精度和稳定性。 metrics such as Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) 都表明了这种方法的优势。
    Abstract The modern power grid is facing increasing complexities, primarily stemming from the integration of renewable energy sources and evolving consumption patterns. This paper introduces an innovative methodology that harnesses Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to establish robust time series forecasting models for grid frequency. These models effectively capture the spatiotemporal intricacies inherent in grid frequency data, significantly enhancing prediction accuracy and bolstering power grid reliability. The research explores the potential and development of individualized Convolutional LSTM (ConvLSTM) models for buildings within a university campus, enabling them to be independently trained and evaluated for each building. Individual ConvLSTM models are trained on power consumption data for each campus building and forecast the grid frequency based on historical trends. The results convincingly demonstrate the superiority of the proposed models over traditional forecasting techniques, as evidenced by performance metrics such as Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). Additionally, an Ensemble Model is formulated to aggregate insights from the building-specific models, delivering comprehensive forecasts for the entire campus. This approach ensures the privacy and security of power consumption data specific to each building.
    摘要 现代电网面临日益增加的复杂性,主要源于可再生能源的接入和消费模式的演变。本文提出了一种创新方法,利用卷积神经网络(CNN)与长短期记忆网络(LSTM)建立稳健的电网频率时间序列预测模型。这些模型能有效捕捉电网频率数据中固有的时空复杂性,显著提高预测精度并增强电网可靠性。研究探讨了为大学校园内各建筑分别构建卷积LSTM(ConvLSTM)模型的潜力与发展,使每栋建筑的模型可以独立训练和评估。各建筑的ConvLSTM模型在该建筑的用电数据上训练,并根据历史趋势预测电网频率。结果令人信服地表明,所提模型优于传统预测技术,这体现在均方误差(MSE)、平均绝对误差(MAE)和平均绝对百分比误差(MAPE)等性能指标上。此外,本文还构建了一个集成模型,汇总各建筑模型的预测,为整个校园提供全面的预测。这种方法同时保障了各建筑用电数据的隐私与安全。
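
A schematic of the per-building forecaster plus ensemble idea (shapes, a plain Conv1d+LSTM stand-in for ConvLSTM, and the averaging rule are assumptions for illustration, not the paper's exact model):

```python
import torch
import torch.nn as nn

class BuildingForecaster(nn.Module):
    """Small conv + LSTM forecaster for one building's time series."""
    def __init__(self, n_features=2, hidden=32):
        super().__init__()
        self.conv = nn.Conv1d(n_features, 16, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                               # x: (batch, time, n_features)
        h = torch.relu(self.conv(x.transpose(1, 2)))    # (batch, 16, time)
        out, _ = self.lstm(h.transpose(1, 2))           # (batch, time, hidden)
        return self.head(out[:, -1])                    # next-step grid frequency

# One independently trained model per campus building, ensembled by averaging.
# (In practice each model sees only its own building's data; the shared window
# here just demonstrates the shapes.)
models = [BuildingForecaster() for _ in range(3)]
window = torch.randn(8, 48, 2)                          # 8 samples, 48 steps, 2 features
ensemble_forecast = torch.stack([m(window) for m in models]).mean(dim=0)
print(ensemble_forecast.shape)                          # torch.Size([8, 1])
```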

Clinical Decision Support System for Unani Medicine Practitioners

  • paper_url: http://arxiv.org/abs/2310.18361
  • repo_url: None
  • paper_authors: Haider Sultan, Hafiza Farwa Mahmood, Noor Fatima, Marriyam Nadeem, Talha Waheed
  • for: 这个研究旨在开发一个基于网络的临床决策支持系统,以帮助新手的医生更准确地诊断疾病,并提供远程医疗服务。
  • methods: 这个系统使用了现代人工智能技术,包括决策树、深度学习和自然语言处理,并将患者症状输入到网络上,然后自动分析和生成潜在疾病列表。
  • results: 这个系统可以帮助医生更快速和准确地诊断疾病,提高患者满意度和医疗效果,同时也可以减少医疗成本和提高医疗资源的利用率。
    Abstract Like other fields of Traditional Medicines, Unani Medicines have been found as an effective medical practice for ages. It is still widely used in the subcontinent, particularly in Pakistan and India. However, Unani Medicines Practitioners are lacking modern IT applications in their everyday clinical practices. An Online Clinical Decision Support System may address this challenge to assist apprentice Unani Medicines practitioners in their diagnostic processes. The proposed system provides a web-based interface to enter the patient's symptoms, which are then automatically analyzed by our system to generate a list of probable diseases. The system allows practitioners to choose the most likely disease and inform patients about the associated treatment options remotely. The system consists of three modules: an Online Clinical Decision Support System, an Artificial Intelligence Inference Engine, and a comprehensive Unani Medicines Database. The system employs advanced AI techniques such as Decision Trees, Deep Learning, and Natural Language Processing. For system development, the project team used a technology stack that includes React, FastAPI, and MySQL. Data and functionality of the application is exposed using APIs for integration and extension with similar domain applications. The novelty of the project is that it addresses the challenge of diagnosing diseases accurately and efficiently in the context of Unani Medicines principles. By leveraging the power of technology, the proposed Clinical Decision Support System has the potential to ease access to healthcare services and information, reduce cost, boost practitioner and patient satisfaction, improve speed and accuracy of the diagnostic process, and provide effective treatments remotely. The application will be useful for Unani Medicines Practitioners, Patients, Government Drug Regulators, Software Developers, and Medical Researchers.
    摘要 如其他传统医学一样,欧奈医学在历史上已经证明自己是一种有效的医疗方式。它仍然广泛使用在亚洲子半岛,特别是在巴基斯坦和印度。然而,欧奈医学实践者缺乏现代IT应用程序在日常临床实践中。一个在线临床决策支持系统可能解决这个挑战,并帮助新手欧奈医学实践者在诊断过程中更加准确和效率。该系统提供了一个网络化界面,让医生输入症状,然后自动由我们的系统分析,生成潜在疾病列表。系统允许医生选择最有可能的疾病,并通过在线向患者提供相关的治疗方案。系统包括三个模块:在线临床决策支持系统、人工智能推理引擎和欧奈医学数据库。系统运用先进的AI技术,如决策树、深度学习和自然语言处理。为系统开发,项目团队使用了技术栈,包括React、FastAPI和MySQL。数据和功能的暴露使用API,以便与类似领域应用程序集成和扩展。该项目的创新之处在于,它通过应用技术来解决欧奈医学原则下诊断疾病的挑战。通过利用技术,该提案的临床决策支持系统具有扩大健康服务和信息的访问权,降低成本,提高医生和患者满意度,提高诊断过程的速度和准确性,并提供远程治疗方案。该应用程序对欧奈医学实践者、患者、政府药品监管部门、软件开发者和医学研究人员都将有用。

A Diffusion Weighted Graph Framework for New Intent Discovery

  • paper_url: http://arxiv.org/abs/2310.15836
  • repo_url: https://github.com/yibai-shi/dwgf
  • paper_authors: Wenkai Shi, Wenbin An, Feng Tian, Qinghua Zheng, QianYing Wang, Ping Chen
  • for: 本研究旨在用有限量的标注数据来识别新和已知意图 from 无标注数据,并提供更充分和可靠的监督信号。
  • methods: 我们提出了一种新的Diffusion Weighted Graph Framework (DWGF),可以捕捉数据中的语义相似性和结构关系,并生成更加充分和可靠的监督信号。
  • results: 我们的方法在多个 benchmark 数据集上的所有评价指标上都达到了state-of-the-art 水平。
    Abstract New Intent Discovery (NID) aims to recognize both new and known intents from unlabeled data with the aid of limited labeled data containing only known intents. Without considering structure relationships between samples, previous methods generate noisy supervisory signals which cannot strike a balance between quantity and quality, hindering the formation of new intent clusters and effective transfer of the pre-training knowledge. To mitigate this limitation, we propose a novel Diffusion Weighted Graph Framework (DWGF) to capture both semantic similarities and structure relationships inherent in data, enabling more sufficient and reliable supervisory signals. Specifically, for each sample, we diffuse neighborhood relationships along semantic paths guided by the nearest neighbors for multiple hops to characterize its local structure discriminately. Then, we sample its positive keys and weigh them based on semantic similarities and local structures for contrastive learning. During inference, we further propose Graph Smoothing Filter (GSF) to explicitly utilize the structure relationships to filter high-frequency noise embodied in semantically ambiguous samples on the cluster boundary. Extensive experiments show that our method outperforms state-of-the-art models on all evaluation metrics across multiple benchmark datasets. Code and data are available at https://github.com/yibai-shi/DWGF.
    摘要 新意图发现(NID)旨在借助仅包含已知意图的少量标注数据,从无标注数据中同时识别新意图和已知意图。以往的方法不考虑样本之间的结构关系,生成的监督信号带有噪声,无法在数量与质量之间取得平衡,从而阻碍了新意图簇的形成和预训练知识的有效迁移。为缓解这一限制,我们提出了一种新的扩散加权图框架(DWGF),以捕捉数据中固有的语义相似性和结构关系,从而获得更充分、更可靠的监督信号。具体而言,对每个样本,我们沿由最近邻引导的语义路径进行多跳邻域关系扩散,以判别性地刻画其局部结构;然后采样其正键,并依据语义相似度和局部结构为其加权,用于对比学习。在推理阶段,我们进一步提出图平滑滤波器(GSF),显式利用结构关系来过滤聚类边界上语义模糊样本所体现的高频噪声。大量实验表明,我们的方法在多个基准数据集的所有评价指标上均优于最先进的模型。代码和数据见 https://github.com/yibai-shi/DWGF。
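
The two graph-side ingredients can be sketched in a few lines of numpy (the kNN construction, hop weights and smoothing blend are my assumptions, not the released DWGF code): multi-hop diffusion of neighborhood relations over a semantic kNN graph, and a graph smoothing filter that averages predictions with those of neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 32))                        # sentence embeddings (toy)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# kNN graph from cosine similarity.
sim = emb @ emb.T
k = 10
A = np.zeros_like(sim)
idx = np.argsort(-sim, axis=1)[:, 1:k + 1]              # skip self
np.put_along_axis(A, idx, 1.0, axis=1)

# Row-normalise and diffuse neighborhood relations over several hops.
P = A / A.sum(axis=1, keepdims=True)
diffusion = sum((0.5 ** h) * np.linalg.matrix_power(P, h) for h in range(1, 4))

# Graph smoothing filter: blend each sample's class probabilities with its neighbors'.
probs = rng.dirichlet(np.ones(5), size=200)             # toy model predictions
smoothed = 0.5 * probs + 0.5 * (P @ probs)
print(diffusion.shape, smoothed.shape)                   # (200, 200) (200, 5)
```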

Automatic Aorta Segmentation with Heavily Augmented, High-Resolution 3-D ResUNet: Contribution to the SEG.A Challenge

  • paper_url: http://arxiv.org/abs/2310.15827
  • repo_url: https://github.com/mwod/sega_mw_2023
  • paper_authors: Marek Wodzinski, Henning Müller
  • for: 这个论文主要目标是提出一种自动推断三维医疗图像中的大动脉分割方法,以解决医疗图像中动脉分割的难题。
  • methods: 该方法基于深度编码器-解码器架构,并且假设数据预处理和扩展对于深度架构的性能更为重要,特别是在低数据范围内。因此,该方法基于变体的卷积核网络。
  • results: 该方法在所有测试病例上的Dice分数均超过0.9,并且在所有参与者中稳定性最高;在临床评估、量化结果和三维体积网格质量方面分别获得第一、第四和第三名。源代码、预训练模型以及算法的访问入口已在 Grand-Challenge 平台上开放。
    Abstract Automatic aorta segmentation from 3-D medical volumes is an important yet difficult task. Several factors make the problem challenging, e.g. the possibility of aortic dissection or the difficulty with segmenting and annotating the small branches. This work presents a contribution by the MedGIFT team to the SEG.A challenge organized during the MICCAI 2023 conference. We propose a fully automated algorithm based on deep encoder-decoder architecture. The main assumption behind our work is that data preprocessing and augmentation are much more important than the deep architecture, especially in low data regimes. Therefore, the solution is based on a variant of traditional convolutional U-Net. The proposed solution achieved a Dice score above 0.9 for all testing cases with the highest stability among all participants. The method scored 1st, 4th, and 3rd in terms of the clinical evaluation, quantitative results, and volumetric meshing quality, respectively. We freely release the source code, pretrained model, and provide access to the algorithm on the Grand-Challenge platform.
    摘要 从三维医学体数据中自动分割主动脉是一项重要但困难的任务。多种因素使该问题颇具挑战性,例如主动脉夹层的可能性,以及细小分支难以分割和标注等。本文是 MedGIFT 团队对 MICCAI 2023 会议期间举办的 SEG.A 挑战赛的参赛贡献。我们提出了一种基于深度编码器-解码器架构的全自动算法。我们工作的主要假设是:在低数据量情形下,数据预处理与数据增强比深度架构本身更为重要。因此,该方案基于传统卷积 U-Net 的一个变体。所提方法在所有测试病例上的 Dice 分数均超过 0.9,且在所有参赛者中稳定性最高;在临床评估、量化结果和体积网格质量方面分别获得第一、第四和第三名。我们开源了源代码和预训练模型,并在 Grand-Challenge 平台上提供了算法的访问入口。
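
Since the entry stresses that preprocessing and augmentation matter more than the architecture in the low-data regime, here is a generic 3-D augmentation sketch (my own choices of flips, 90-degree rotations and intensity jitter, not the team's exact pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(volume, mask):
    """Random flips, axis-aligned 90-degree rotations and intensity jitter
    applied jointly to a (D, H, W) volume and its segmentation mask."""
    for axis in range(3):                               # random flips
        if rng.random() < 0.5:
            volume = np.flip(volume, axis=axis)
            mask = np.flip(mask, axis=axis)
    k = int(rng.integers(0, 4))                         # random 90-degree rotation in one plane
    axes = tuple(rng.choice(3, size=2, replace=False))
    volume = np.rot90(volume, k=k, axes=axes)
    mask = np.rot90(mask, k=k, axes=axes)
    volume = volume * rng.uniform(0.9, 1.1) + rng.uniform(-0.05, 0.05)   # intensity only
    return np.ascontiguousarray(volume), np.ascontiguousarray(mask)

vol, seg = augment(rng.random((64, 64, 64)), np.zeros((64, 64, 64)))
print(vol.shape, seg.shape)
```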

Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word–Definition Alignment

  • paper_url: http://arxiv.org/abs/2310.15823
  • repo_url: None
  • paper_authors: Ahmed ElBakry, Mohamed Gabr, Muhammad ElNokrashy, Badr AlKhamissi
  • for: 本文旨在提供一种用于解决阿拉伯语”tip-of-the-tongue”现象的词典查询工具。
  • methods: 本文使用了一个 ensemble of 阿拉伯语BERT模型,通过对定义进行预处理并使用 average 方法生成word embedding。
  • results: 本文在两个子任务中都达到了最高分,并且在两个子任务中的结果表明了 ensemble 方法的可靠性和稳定性。
    Abstract A Reverse Dictionary is a tool enabling users to discover a word based on its provided definition, meaning, or description. Such a technique proves valuable in various scenarios, aiding language learners who possess a description of a word without its identity, and benefiting writers seeking precise terminology. These scenarios often encapsulate what is referred to as the "Tip-of-the-Tongue" (TOT) phenomena. In this work, we present our winning solution for the Arabic Reverse Dictionary shared task. This task focuses on deriving a vector representation of an Arabic word from its accompanying description. The shared task encompasses two distinct subtasks: the first involves an Arabic definition as input, while the second employs an English definition. For the first subtask, our approach relies on an ensemble of finetuned Arabic BERT-based models, predicting the word embedding for a given definition. The final representation is obtained through averaging the output embeddings from each model within the ensemble. In contrast, the most effective solution for the second subtask involves translating the English test definitions into Arabic and applying them to the finetuned models originally trained for the first subtask. This straightforward method achieves the highest score across both subtasks.
    摘要 反向词典是一种根据给定的定义、含义或描述来查找对应单词的工具。这种技术在多种场景下很有价值:它既能帮助只记得单词描述而想不起单词本身的语言学习者,也能帮助寻找精确术语的写作者。这些场景通常对应所谓的"话到嘴边"(Tip-of-the-Tongue,TOT)现象。在本文中,我们介绍了我们在阿拉伯语反向词典共享任务中的获胜方案。该任务的目标是从阿拉伯语单词的描述中推导其向量表示,并包含两个子任务:第一个子任务以阿拉伯语定义为输入,第二个子任务以英语定义为输入。对于第一个子任务,我们的方法依赖于一组经过微调的阿拉伯语BERT模型的集成,为给定定义预测词向量,最终表示通过对集成中各模型的输出嵌入取平均得到。对于第二个子任务,最有效的方案是将英语测试定义翻译成阿拉伯语,再输入为第一个子任务微调好的模型。这一简单方法在两个子任务中都取得了最高分。
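
A sketch of the ensembling step for the first subtask (the checkpoints, mean pooling and plain averaging below are assumptions for illustration; the winning system fine-tunes its encoders and predicts the target word embedding):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Hypothetical ensemble members; any Arabic BERT-style checkpoints could be used here.
names = ["aubmindlab/bert-base-arabertv2", "asafaya/bert-base-arabic"]
encoders = [(AutoTokenizer.from_pretrained(n), AutoModel.from_pretrained(n)) for n in names]

@torch.no_grad()
def definition_embedding(definition: str) -> torch.Tensor:
    """Mean-pooled embedding per encoder, averaged over the ensemble."""
    vectors = []
    for tok, model in encoders:
        batch = tok(definition, return_tensors="pt", truncation=True)
        hidden = model(**batch).last_hidden_state        # (1, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)      # (1, seq, 1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)     # mean over real tokens
        vectors.append(pooled.squeeze(0))
    return torch.stack(vectors).mean(0)                   # ensemble average

emb = definition_embedding("أداة تستخدم لقياس درجة الحرارة")   # "a tool used to measure temperature"
print(emb.shape)
```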

Discriminator Guidance for Autoregressive Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.15817
  • repo_url: None
  • paper_authors: Filip Ekström Kelvinius, Fredrik Lindsten
  • for: 这个论文是用来描述如何使用抑制器导向抽象扩散模型进行生成的。
  • methods: 这个论文使用了一个抑制器来导向一个抽象扩散过程,以前这种方法已经在连续扩散模型中使用过。这里的作者 derive了在离散 caso 中使用抑制器和预训练生成模型的方法。
  • results: 作者通过对分子图生成任务进行测试,发现使用抑制器可以提高生成性能。
    Abstract We introduce discriminator guidance in the setting of Autoregressive Diffusion Models. The use of a discriminator to guide a diffusion process has previously been used for continuous diffusion models, and in this work we derive ways of using a discriminator together with a pretrained generative model in the discrete case. First, we show that using an optimal discriminator will correct the pretrained model and enable exact sampling from the underlying data distribution. Second, to account for the realistic scenario of using a sub-optimal discriminator, we derive a sequential Monte Carlo algorithm which iteratively takes the predictions from the discrimiator into account during the generation process. We test these approaches on the task of generating molecular graphs and show how the discriminator improves the generative performance over using only the pretrained model.
    摘要 我们在自回归扩散模型的框架下引入判别器引导。此前,使用判别器引导扩散过程的做法已用于连续扩散模型;在本文中,我们推导了在离散情形下将判别器与预训练生成模型结合使用的方法。首先,我们证明使用最优判别器可以校正预训练模型,从而实现对底层数据分布的精确采样。其次,考虑到实际中判别器往往是次优的,我们推导了一种序贯蒙特卡洛(SMC)算法,在生成过程中迭代地把判别器的预测纳入考虑。我们在分子图生成任务上测试了这些方法,并展示了判别器相对于仅使用预训练模型能够提升生成性能。
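
A toy sequential Monte Carlo loop in the spirit of the second contribution (generic token sequences instead of molecular graphs, and stand-in model/discriminator functions, are my simplifications, not the authors' setup): particles are propagated by the frozen generative model and resampled according to discriminator scores.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH, N_PARTICLES = 5, 8, 64

def pretrained_step(prefix):
    """Stand-in for the frozen autoregressive model: next-token probabilities."""
    logits = np.linspace(1.0, 0.2, VOCAB)
    return logits / logits.sum()

def discriminator(prefix):
    """Stand-in discriminator: prefers sequences with many distinct tokens."""
    return len(set(prefix)) / max(len(prefix), 1)

particles = [[] for _ in range(N_PARTICLES)]
for t in range(LENGTH):
    # Propagate each particle with the (frozen) generative model.
    for p in particles:
        p.append(int(rng.choice(VOCAB, p=pretrained_step(p))))
    # Reweight by the discriminator and resample (multinomial resampling).
    weights = np.array([discriminator(p) for p in particles])
    weights /= weights.sum()
    keep = rng.choice(N_PARTICLES, size=N_PARTICLES, p=weights)
    particles = [list(particles[i]) for i in keep]

print(particles[0])
```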


DALE: Generative Data Augmentation for Low-Resource Legal NLP

  • paper_url: http://arxiv.org/abs/2310.15799
  • repo_url: https://github.com/sreyan88/dale
  • paper_authors: Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar, S Ramaneswaran, S Sakshi, Utkarsh Tyagi, Dinesh Manocha
  • for: 提供了一个新的、有效的生成数据增强框架,用于低资源法律自然语言处理(Legal NLP)。
  • methods: 使用Encoder-Decoder语言模型,在 selective masking 基础上进行预训练,以获得法律语言特有的知识和语言使用表达。
  • results: 在13个数据集和6个任务中,DALE对低资源法律自然语言处理任务进行了有效的增强,与基线相比,DALE的表现提高1%-50%。
    Abstract We present DALE, a novel and effective generative Data Augmentation framework for low-resource LEgal NLP. DALE addresses the challenges existing frameworks pose in generating effective data augmentations of legal documents - legal language, with its specialized vocabulary and complex semantics, morphology, and syntax, does not benefit from data augmentations that merely rephrase the source sentence. To address this, DALE, built on an Encoder-Decoder Language Model, is pre-trained on a novel unsupervised text denoising objective based on selective masking - our masking strategy exploits the domain-specific language characteristics of templatized legal documents to mask collocated spans of text. Denoising these spans helps DALE acquire knowledge about legal concepts, principles, and language usage. Consequently, it develops the ability to generate coherent and diverse augmentations with novel contexts. Finally, DALE performs conditional generation to generate synthetic augmentations for low-resource Legal NLP tasks. We demonstrate the effectiveness of DALE on 13 datasets spanning 6 tasks and 4 low-resource settings. DALE outperforms all our baselines, including LLMs, qualitatively and quantitatively, with improvements of 1%-50%.
    摘要 我们提出DALE,一个面向低资源法律自然语言处理(Legal NLP)的新颖且有效的生成式数据增强框架。DALE解决了现有框架在为法律文档生成有效数据增强时面临的难题:法律语言拥有专门的词汇以及复杂的语义、词法和句法,仅仅改写原句的数据增强难以奏效。为此,DALE基于编码器-解码器语言模型,在一个新颖的、基于选择性遮盖的无监督文本去噪目标上进行预训练——我们的遮盖策略利用模板化法律文档的领域语言特性,遮盖共现的文本片段。对这些片段去噪有助于DALE习得法律概念、原则和语言使用方面的知识,从而具备生成连贯、多样且带有新语境的增强样本的能力。最后,DALE通过条件生成,为低资源Legal NLP任务生成合成的增强数据。我们在涵盖6个任务、4种低资源设置的13个数据集上验证了DALE的有效性:无论在定性还是定量上,DALE都优于包括LLM在内的所有基线,提升幅度为1%-50%。
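
An illustrative version of the selective-masking objective (the legal term list, span length and T5-style sentinels are assumptions, not DALE's released code): contiguous spans around domain-specific terms are dropped from the input and become the denoising target.

```python
LEGAL_TERMS = {"plaintiff", "defendant", "hereinafter", "pursuant", "liability"}

def selective_mask(text, span_len=3):
    """Mask a contiguous span starting at each legal term with T5-style sentinels."""
    tokens = text.split()
    inputs, targets, sid, i = [], [], 0, 0
    while i < len(tokens):
        if tokens[i].lower().strip(".,") in LEGAL_TERMS:
            span = tokens[i:i + span_len]
            inputs.append(f"<extra_id_{sid}>")
            targets.append(f"<extra_id_{sid}> " + " ".join(span))
            sid += 1
            i += span_len
        else:
            inputs.append(tokens[i])
            i += 1
    return " ".join(inputs), " ".join(targets)

src, tgt = selective_mask(
    "The plaintiff shall recover damages pursuant to the agreement signed by the defendant."
)
print(src)
print(tgt)
```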

Random Entity Quantization for Parameter-Efficient Compositional Knowledge Graph Representation

  • paper_url: http://arxiv.org/abs/2310.15797
  • repo_url: https://github.com/jiaangl/randomquantization
  • paper_authors: Jiaang Li, Quan Wang, Yi Liu, Licheng Zhang, Zhendong Mao
  • for: 这篇论文关注在知识图(KG)表示学习中的实体表示问题,尤其是现有的KG Embedding(KGE)方法在扩展性上存在挑战。
  • methods: 该论文提出了一种新的方法,即随机实体量化,即对实体进行随机分配小型编码图表。
  • results: 研究发现,随机实体量化可以达到类似于现有策略的效果,并且通过分析 entropy 和 Jaccard 距离,解释了这种现象。
    Abstract Representation Learning on Knowledge Graphs (KGs) is essential for downstream tasks. The dominant approach, KG Embedding (KGE), represents entities with independent vectors and faces the scalability challenge. Recent studies propose an alternative way for parameter efficiency, which represents entities by composing entity-corresponding codewords matched from predefined small-scale codebooks. We refer to the process of obtaining corresponding codewords of each entity as entity quantization, for which previous works have designed complicated strategies. Surprisingly, this paper shows that simple random entity quantization can achieve similar results to current strategies. We analyze this phenomenon and reveal that entity codes, the quantization outcomes for expressing entities, have higher entropy at the code level and Jaccard distance at the codeword level under random entity quantization. Therefore, different entities become more easily distinguished, facilitating effective KG representation. The above results show that current quantization strategies are not critical for KG representation, and there is still room for improvement in entity distinguishability beyond current strategies. The code to reproduce our results is available at https://github.com/JiaangL/RandomQuantization.
    摘要 知识图(KG)下 Representation Learning 是下游任务的关键。目前主流方法是知识图嵌入(KGE),它使用独立的向量表示实体,但面临缩放挑战。近期研究提出了一种减少参数的方法,即使用预定的小规模码表来表示实体。我们称之为实体量化,之前的工作已经设计了复杂的策略。很奇怪的是,这篇论文显示了Random Entity Quantization可以达到类似的效果。我们分析了这个现象,发现实体代码,即表示实体的量化结果,在Random Entity Quantization中具有更高的熵度和Jacard距离。因此,不同的实体更容易区分,从而实现有效的KG表示。上述结果表明,当前的量化策略并不是KG表示的关键因素,还有可能提高实体区分性的空间。代码可以在https://github.com/JiaangL/RandomQuantization中找到。
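
The random-quantization idea itself is compact enough to write down directly; in this sketch (sizes and mean-pooling composition are assumptions) every entity is represented by the average of a few randomly assigned codeword embeddings, so the parameter count scales with the codebook rather than with the number of entities.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ENTITIES, CODEBOOK_SIZE, CODES_PER_ENTITY, DIM = 10_000, 1_000, 8, 64

codebook = rng.normal(size=(CODEBOOK_SIZE, DIM))         # the only entity-side parameters
# Random entity quantization: each entity gets a random codeword index set.
entity_codes = np.stack(
    [rng.choice(CODEBOOK_SIZE, size=CODES_PER_ENTITY, replace=False) for _ in range(N_ENTITIES)]
)

def entity_embedding(entity_id):
    """Compose an entity representation from its (randomly assigned) codewords."""
    return codebook[entity_codes[entity_id]].mean(axis=0)

print(entity_embedding(42).shape)                         # (64,)
# Parameters: CODEBOOK_SIZE * DIM instead of N_ENTITIES * DIM.
```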

Improving generalization in large language models by learning prefix subspaces

  • paper_url: http://arxiv.org/abs/2310.15793
  • repo_url: None
  • paper_authors: Louis Falissard, Vincent Guigue, Laure Soulier
  • for: 这篇论文主要关注大语言模型(LLMs)在缺乏数据的情况下(也称为“几个样本”学习设定)进行精细调整。
  • methods: 该论文提出了一种基于神经网络子空间的方法,用于提升LLM的泛化能力。这种优化方法最初在计算机视觉领域提出,通过在参数空间中联合优化由多个模型张成的整个单纯形(simplex)来寻找更宽的局部最优,从而提升泛化。然而,把它应用到大型预训练Transformer上存在一些挑战,主要原因是其参数量巨大且参数初始化方式是确定性的。作者表明,"参数高效微调"(PEFT)方法与这一思路完全兼容,并提出在参数空间中学习由连续前缀(prefix)张成的整个单纯形。
  • results: 作者在针对少样本学习设置改编的 GLUE benchmark 变体上进行了测试,结果显示这两项贡献共同提升了该设置下的平均性能。实现代码见:https://github.com/Liloulou/prefix_subspace
    Abstract This article focuses on large language models (LLMs) fine-tuning in the scarce data regime (also known as the "few-shot" learning setting). We propose a method to increase the generalization capabilities of LLMs based on neural network subspaces. This optimization method, recently introduced in computer vision, aims to improve model generalization by identifying wider local optima through the joint optimization of an entire simplex of models in parameter space. Its adaptation to massive, pretrained transformers, however, poses some challenges. First, their considerable number of parameters makes it difficult to train several models jointly, and second, their deterministic parameter initialization schemes make them unfit for the subspace method as originally proposed. We show in this paper that "Parameter Efficient Fine-Tuning" (PEFT) methods, however, are perfectly compatible with this original approach, and propose to learn entire simplex of continuous prefixes. We test our method on a variant of the GLUE benchmark adapted to the few-shot learning setting, and show that both our contributions jointly lead to a gain in average performances compared to sota methods. The implementation can be found at the following link: https://github.com/Liloulou/prefix_subspace
    摘要 这篇文章关注大语言模型(LLM)在数据稀缺情形(即"少样本"学习设置)下的微调。我们提出一种基于神经网络子空间的方法来增强LLM的泛化能力。这种优化方法最早出现在计算机视觉领域,其目的是通过在参数空间中联合优化由多个模型张成的整个单纯形来找到更宽的局部最优,从而提升模型泛化。但对于大型预训练Transformer,其庞大的参数量使得联合训练多个模型变得困难,而其确定性的参数初始化方式也使其不适合原始的子空间方法。我们在本文中证明,"参数高效微调"(PEFT)方法与这一原始思路完全兼容,并提出学习由连续前缀张成的整个单纯形。我们在针对少样本学习设置改编的 GLUE benchmark 变体上测试了该方法,结果显示这两项贡献共同带来了相对于SOTA方法的平均性能提升。实现代码见:https://github.com/Liloulou/prefix_subspace
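
A compact sketch of the prefix-subspace idea (tensor shapes and Dirichlet sampling of the convex weights are my assumptions, not the released code): several prefix parameter sets are optimised jointly, and each training step uses a random point of the simplex they span.

```python
import torch

N_VERTICES, N_LAYERS, PREFIX_LEN, DIM = 4, 12, 20, 768

# Learnable simplex vertices: each is a full set of prefix vectors.
vertices = torch.nn.Parameter(torch.randn(N_VERTICES, N_LAYERS, PREFIX_LEN, DIM) * 0.02)

def sample_prefix():
    """Draw a random point of the simplex spanned by the prefix vertices."""
    alpha = torch.distributions.Dirichlet(torch.ones(N_VERTICES)).sample()   # convex weights
    return torch.einsum("v,vlpd->lpd", alpha, vertices)  # prepended per layer to the frozen LM

prefix = sample_prefix()
print(prefix.shape)        # torch.Size([12, 20, 768])
# At training time the loss backpropagates into `vertices`; at test time the
# simplex centre (uniform alpha) can serve as a single, flatter optimum.
```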

Guiding LLM to Fool Itself: Automatically Manipulating Machine Reading Comprehension Shortcut Triggers

  • paper_url: http://arxiv.org/abs/2310.18360
  • repo_url: https://github.com/mosh0110/guiding-llm
  • paper_authors: Mosh Levy, Shauli Ravfogel, Yoav Goldberg
  • for: 这篇论文探讨了LLM在机器阅读理解(MRC)系统中的应用,以及捷径(shortcut)机制对其可靠性的影响。
  • methods: 作者从两个角度分析LLM:作为编辑者(被引导去修改文本以误导LLM)和作为读者(根据修改后的文本回答问题),并提出了一个引导编辑者向样本中加入潜在捷径触发器的框架。以GPT4作为编辑者,研究发现它能够成功地编辑出诱导性的捷径触发器。
  • results: 研究发现,即使是能力很强的LLM,也会被利用捷径知识欺骗;更引人注意的是,GPT4甚至会被自己的编辑所欺骗(F1下降15%)。这些发现表明LLM在捷径操纵面前存在固有的脆弱性。作者发布了由该框架生成的ShortcutQA数据集,供未来研究使用。
    Abstract Recent applications of LLMs in Machine Reading Comprehension (MRC) systems have shown impressive results, but the use of shortcuts, mechanisms triggered by features spuriously correlated to the true label, has emerged as a potential threat to their reliability. We analyze the problem from two angles: LLMs as editors, guided to edit text to mislead LLMs; and LLMs as readers, who answer questions based on the edited text. We introduce a framework that guides an editor to add potential shortcuts-triggers to samples. Using GPT4 as the editor, we find it can successfully edit trigger shortcut in samples that fool LLMs. Analysing LLMs as readers, we observe that even capable LLMs can be deceived using shortcut knowledge. Strikingly, we discover that GPT4 can be deceived by its own edits (15% drop in F1). Our findings highlight inherent vulnerabilities of LLMs to shortcut manipulations. We publish ShortcutQA, a curated dataset generated by our framework for future research.
    摘要 最近,LLM在机器阅读理解(MRC)系统中的应用取得了令人瞩目的成果,但"捷径"——由与真实标签虚假相关的特征所触发的机制——已成为其可靠性的潜在威胁。我们从两个角度分析这一问题:LLM作为编辑者,被引导修改文本以误导LLM;LLM作为读者,根据修改后的文本回答问题。我们提出了一个引导编辑者向样本中加入潜在捷径触发器的框架。以GPT4作为编辑者,我们发现它能够成功地在样本中编辑出欺骗LLM的捷径触发器。从读者角度分析LLM时,我们观察到即使是能力很强的LLM也会被捷径知识欺骗。令人惊讶的是,GPT4甚至会被自己的编辑欺骗(F1下降15%)。我们的发现凸显了LLM在捷径操纵面前的固有脆弱性。我们发布了由该框架生成的ShortcutQA数据集,供未来研究使用。

SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning

  • paper_url: http://arxiv.org/abs/2310.15787
  • repo_url: None
  • paper_authors: Khanh-Binh Nguyen
  • for: 本研究的目的是提出一种高效的 semi-supervised learning (SSL) 方法,以增强模型在半有标注数据上的训练。
  • methods: 本方法使用多种数据增强,其中包括一种中等增强,以减少模型对半有标注数据的过拟合。此外,本方法还定义了两种不同的一致性约束,以适应不同的预测情况。
  • results: 对于标准的 CIFAR-10/100、SVHN 和 STL-10 测试集,SequenceMatch 减少了模型的训练时间和数据量,同时保持高度的准确率。此外,SequenceMatch 在 ImageNet 大规模测试集上也达到了新的州OF-THE-ART 水平,其error rate 为 38.46%。
    Abstract Semi-supervised learning (SSL) has become popular in recent years because it allows the training of a model using a large amount of unlabeled data. However, one issue that many SSL methods face is the confirmation bias, which occurs when the model is overfitted to the small labeled training dataset and produces overconfident, incorrect predictions. To address this issue, we propose SequenceMatch, an efficient SSL method that utilizes multiple data augmentations. The key element of SequenceMatch is the inclusion of a medium augmentation for unlabeled data. By taking advantage of different augmentations and the consistency constraints between each pair of augmented examples, SequenceMatch helps reduce the divergence between the prediction distribution of the model for weakly and strongly augmented examples. In addition, SequenceMatch defines two different consistency constraints for high and low-confidence predictions. As a result, SequenceMatch is more data-efficient than ReMixMatch, and more time-efficient than both ReMixMatch ($\times4$) and CoMatch ($\times2$) while having higher accuracy. Despite its simplicity, SequenceMatch consistently outperforms prior methods on standard benchmarks, such as CIFAR-10/100, SVHN, and STL-10. It also surpasses prior state-of-the-art methods by a large margin on large-scale datasets such as ImageNet, with a 38.46\% error rate. Code is available at https://github.com/beandkay/SequenceMatch.
    摘要 半监督学习(SSL)近年来十分流行,因为它可以利用大量无标注数据来训练模型。然而,许多SSL方法面临的一个问题是确认偏差:模型对小规模标注训练集过拟合,从而产生过度自信的错误预测。为解决这一问题,我们提出了SequenceMatch,一种利用多种数据增强的高效SSL方法。SequenceMatch的关键在于为无标注数据引入一种中等强度的增强。通过利用不同强度的增强以及每对增强样本之间的一致性约束,SequenceMatch有助于减小模型在弱增强与强增强样本上预测分布的差异。此外,SequenceMatch针对高置信度和低置信度预测分别定义了两种不同的一致性约束。因此,SequenceMatch比ReMixMatch更省数据,比ReMixMatch(4倍)和CoMatch(2倍)更省时间,同时准确率更高。尽管方法简单,SequenceMatch在CIFAR-10/100、SVHN和STL-10等标准基准上始终优于此前的方法,并在ImageNet等大规模数据集上以38.46%的错误率大幅超越此前的最先进方法。代码见 https://github.com/beandkay/SequenceMatch。
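
A minimal PyTorch sketch of the consistency objective (the confidence threshold and the hard-CE / soft-KL split are assumptions consistent with the description above, not the exact SequenceMatch loss):

```python
import torch
import torch.nn.functional as F

def sequencematch_loss(logits_weak, logits_medium, logits_strong, tau=0.95):
    """Consistency between weak/medium/strong augmentations of the same batch."""
    probs_w = torch.softmax(logits_weak.detach(), dim=-1)
    conf, pseudo = probs_w.max(dim=-1)
    mask_hi = (conf >= tau).float()            # confident predictions -> hard pseudo-labels
    loss = 0.0
    for logits_aug in (logits_medium, logits_strong):
        ce = F.cross_entropy(logits_aug, pseudo, reduction="none")
        kl = F.kl_div(F.log_softmax(logits_aug, dim=-1), probs_w, reduction="none").sum(-1)
        loss = loss + (mask_hi * ce + (1.0 - mask_hi) * kl).mean()
    return loss

weak, medium, strong = (torch.randn(16, 10) for _ in range(3))
print(sequencematch_loss(weak, medium, strong))
```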

3D Masked Autoencoders for Enhanced Privacy in MRI Scans

  • paper_url: http://arxiv.org/abs/2310.15778
  • repo_url: None
  • paper_authors: Lennart Alexander Van der Goten, Kevin Smith
  • for: 防止MRI扫描数据泄露个人隐私信息
  • methods: 使用Masked Autoencoders和Generative Adversarial Networks(GAN)进行数据隐私处理
  • results: 提出的CP-MAE模型在下游任务性能和去识别效果上均优于此前的方法,并能合成分辨率高达 $256^3$ 的扫描数据(此前为 $128^3$)。
    Abstract MRI scans provide valuable medical information, however they also contain sensitive and personally identifiable information (PII) that needs to be protected. Whereas MRI metadata is easily sanitized, MRI image data is a privacy risk because it contains information to render highly-realistic 3D visualizations of a patient's head, enabling malicious actors to possibly identify the subject by cross-referencing a database. Data anonymization and de-identification is concerned with ensuring the privacy and confidentiality of individuals' personal information. Traditional MRI de-identification methods remove privacy-sensitive parts (e.g. eyes, nose etc.) from a given scan. This comes at the expense of introducing a domain shift that can throw off downstream analyses. Recently, a GAN-based approach was proposed to de-identify a patient's scan by remodeling it (e.g. changing the face) rather than by removing parts. In this work, we propose CP-MAE, a model that de-identifies the face using masked autoencoders and that outperforms all previous approaches in terms of downstream task performance as well as de-identification. With our method we are able to synthesize scans of resolution up to $256^3$ (previously 128 cubic) which constitutes an eight-fold increase in the number of voxels. Using our construction we were able to design a system that exhibits a highly robust training stage, making it easy to fit the network on novel data.
    摘要 MRI扫描提供了宝贵的医学信息,但同时也包含需要保护的敏感个人身份信息(PII)。MRI元数据很容易脱敏,而MRI图像数据却是一个隐私风险,因为其中的信息足以渲染出患者头部高度逼真的三维可视化,恶意行为者可能通过与数据库交叉比对来识别受试者身份。数据匿名化与去识别旨在保障个人信息的隐私与保密。传统的MRI去识别方法会从扫描中移除隐私敏感的部位(如眼睛、鼻子等),但这会引入领域偏移,干扰下游分析。最近有工作提出基于GAN的方法,通过重塑(例如更改面部)而非移除部位来对患者扫描去识别。在本工作中,我们提出CP-MAE,一种利用掩码自编码器对面部进行去识别的模型,其在下游任务性能和去识别效果上均优于此前的所有方法。借助我们的方法,可以合成分辨率高达 $256^3$(此前为 $128^3$)的扫描数据,体素数量提升了八倍。基于该构造,我们设计的系统训练阶段高度稳健,便于在新数据上拟合网络。

MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications

  • paper_url: http://arxiv.org/abs/2310.15777
  • repo_url: None
  • paper_authors: Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang Liu, Heyan Huang, Yang Gao
  • for: 这 paper 的目的是开发轻量级大语言模型,以提高模型的效率和可扩展性,并应对训练和部署大模型的高成本和资源短缺。
  • methods: 这 paper 使用了自scratch 训练的双语大语言模型 MindLLM,并提供了 1.3 亿和 3 亿参数的模型。文章还介绍了数据构造、模型体系、评估和应用等方面的经验。
  • results: MindLLM 可以与其他更大的开源模型在一些公共Benchmark上匹配或超越其表现,而且文章还提出了一种针对更小的模型进行调整 instrucion 的框架,以提高其能力。此外,文章还探讨了 MindLLM 在法律和金融等特定领域的应用。
    Abstract Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While general artificial intelligence is leveraged by developing increasingly large-scale models, there could be another branch to develop lightweight custom models that better serve certain domains, taking into account the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models, trained from scratch, alleviating such burdens by offering models with 1.3 billion and 3 billion parameters. A thorough account of experiences accrued during large model development is given, covering every step of the process, including data construction, model architecture, evaluation, and applications. Such insights are hopefully valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of other open-source larger models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.
    摘要 大语言模型(LLM)在各类自然语言任务中展现出卓越的性能,标志着向通用人工智能迈出了重要一步。在通过不断扩大模型规模来逼近通用人工智能的同时,考虑到训练和部署LLM的高昂成本以及资源的稀缺,另一条路线是开发更好服务于特定领域的轻量级定制模型。在本文中,我们介绍MindLLM,一系列从零开始训练的双语轻量级大语言模型,提供13亿和30亿参数两个版本,以缓解上述负担。我们详尽地分享了大模型研发过程中积累的经验,涵盖数据构建、模型架构、评估和应用等各个环节,希望这些经验对学术界和开发者有所帮助。MindLLM在一些公共基准上的表现能够匹敌甚至超越其他更大的开源模型。我们还提出了一个专为较小模型设计的指令微调框架,以高效地增强其能力。此外,我们探索了MindLLM在法律、金融等特定垂直领域的应用,凸显了轻量级模型的敏捷性与适应性。

Causal Understanding of Why Users Share Hate Speech on Social Media

  • paper_url: http://arxiv.org/abs/2310.15772
  • repo_url: None
  • paper_authors: Dominique Geissler, Abdurahman Maarouf, Stefan Feuerriegel
  • for: 这篇论文旨在探讨用户在社交媒体上分享仇恨言论的原因,以及如何预测和防止这种情况。
  • methods: 这篇论文采用了一种新的三步因果框架:首先通过逆倾向评分(inverse propensity scoring)消除观测数据中的选择偏差;然后利用去偏后的倾向评分,把用户对仇恨言论的潜在易感性建模为隐变量嵌入;最后在控制这种潜在易感性的前提下,建模用户属性对其分享仇恨言论概率的因果效应。
  • results: 研究发现,粉丝更少、好友更少、发帖更少的用户更容易转发仇恨言论,而较年轻的账号则更少转发。这些结论有助于识别可能实施有害行为的用户,并设计有效的缓解策略。
    Abstract Hate speech on social media threatens the mental and physical well-being of individuals and is further responsible for real-world violence. An important driver behind the spread of hate speech and thus why hateful posts can go viral are reshares, yet little is known about why users reshare hate speech. In this paper, we present a comprehensive, causal analysis of the user attributes that make users reshare hate speech. However, causal inference from observational social media data is challenging, because such data likely suffer from selection bias, and there is further confounding due to differences in the vulnerability of users to hate speech. We develop a novel, three-step causal framework: (1) We debias the observational social media data by applying inverse propensity scoring. (2) We use the debiased propensity scores to model the latent vulnerability of users to hate speech as a latent embedding. (3) We model the causal effects of user attributes on users' probability of sharing hate speech, while controlling for the latent vulnerability of users to hate speech. Compared to existing baselines, a particular strength of our framework is that it models causal effects that are non-linear, yet still explainable. We find that users with fewer followers, fewer friends, and fewer posts share more hate speech. Younger accounts, in return, share less hate speech. Overall, understanding the factors that drive users to share hate speech is crucial for detecting individuals at risk of engaging in harmful behavior and for designing effective mitigation strategies.
    摘要 社交媒体上的仇恨言论威胁个人的身心健康,并会进一步引发现实世界中的暴力。转发是仇恨言论传播、进而使仇恨帖子走红的重要推手,然而人们对用户为何转发仇恨言论知之甚少。在本文中,我们对促使用户转发仇恨言论的用户属性进行了全面的因果分析。然而,基于观测性社交媒体数据进行因果推断并不容易:这类数据很可能存在选择偏差,并且不同用户对仇恨言论的易感性不同,会造成混杂。我们开发了一种新的三步因果框架:(1)采用逆倾向评分对观测性社交媒体数据去偏;(2)利用去偏后的倾向评分,把用户对仇恨言论的潜在易感性建模为隐变量嵌入;(3)在控制这种潜在易感性的前提下,建模用户属性对其分享仇恨言论概率的因果效应。与现有基线相比,我们框架的一个突出优点是能够建模非线性但仍可解释的因果效应。我们发现,粉丝更少、好友更少、发帖更少的用户分享更多仇恨言论,而较年轻的账号分享得更少。总之,理解驱动用户分享仇恨言论的因素,对于发现有实施有害行为风险的个人以及设计有效的缓解策略至关重要。
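
To make step (1) of the framework concrete, here is a generic inverse-propensity-scoring sketch on synthetic data (the variables and the simplified outcome model are assumptions; the paper additionally models a latent vulnerability embedding, omitted here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 3))                    # user attributes (e.g. followers, friends, posts), standardised
observed = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # selection into the reshare sample
y = rng.binomial(1, 0.1, size=n)               # placeholder label: shared hate speech or not

# Step 1: model the propensity of being observed, used for inverse propensity scoring.
propensity = LogisticRegression().fit(X, observed).predict_proba(X)[:, 1]

# Step 3 (simplified, without the latent-vulnerability embedding): a weighted
# outcome model fitted on the observed users only.
obs = observed.astype(bool)
ips_weights = 1.0 / np.clip(propensity[obs], 1e-3, None)
effect_model = LogisticRegression().fit(X[obs], y[obs], sample_weight=ips_weights)
print(effect_model.coef_)
```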

Debiasing, calibrating, and improving Semi-supervised Learning performance via simple Ensemble Projector

  • paper_url: http://arxiv.org/abs/2310.15764
  • repo_url: https://github.com/beandkay/epass
  • paper_authors: Khanh-Binh Nguyen
  • for: 本研究旨在提高现有的对联训练 semi-supervised learning 框架的性能,而不是 introduce 更多的网络组件和训练过程。
  • methods: 该 paper 提出了一种简单的方法,名为 Ensemble Projectors Aided for Semi-supervised Learning (EPASS),它是通过改进学习的嵌入来提高 semi-supervised learning 的性能。
  • results: EPASS 能提升泛化能力、强化特征表示并提高性能。例如,在 ImageNet 上仅使用 100k/1%/10% 的标注数据时,基于 SimMatch 的强基线取得 39.47%/31.39%/24.70% 的 top-1 错误率,基于 CoMatch 取得 40.24%/32.64%/25.90% 的 top-1 错误率;这些提升在不同方法、网络架构和数据集上均保持一致。
    Abstract Recent studies on semi-supervised learning (SSL) have achieved great success. Despite their promising performance, current state-of-the-art methods tend toward increasingly complex designs at the cost of introducing more network components and additional training procedures. In this paper, we propose a simple method named Ensemble Projectors Aided for Semi-supervised Learning (EPASS), which focuses mainly on improving the learned embeddings to boost the performance of the existing contrastive joint-training semi-supervised learning frameworks. Unlike standard methods, where the learned embeddings from one projector are stored in memory banks to be used with contrastive learning, EPASS stores the ensemble embeddings from multiple projectors in memory banks. As a result, EPASS improves generalization, strengthens feature representation, and boosts performance. For instance, EPASS improves strong baselines for semi-supervised learning by 39.47\%/31.39\%/24.70\% top-1 error rate, while using only 100k/1\%/10\% of labeled data for SimMatch, and achieves 40.24\%/32.64\%/25.90\% top-1 error rate for CoMatch on the ImageNet dataset. These improvements are consistent across methods, network architectures, and datasets, proving the general effectiveness of the proposed methods. Code is available at https://github.com/beandkay/EPASS.
    摘要 最近的半监督学习(SSL)研究取得了巨大成功。尽管性能可观,当前最先进的方法往往以引入更多网络组件和额外训练流程为代价,设计日趋复杂。在本文中,我们提出了一种名为 Ensemble Projectors Aided for Semi-supervised Learning(EPASS)的简单方法,其核心是改进所学的嵌入,以提升现有对比式联合训练半监督学习框架的性能。与标准做法(把单个投影头学到的嵌入存入记忆库用于对比学习)不同,EPASS 把多个投影头的集成嵌入存入记忆库,从而提升泛化能力、强化特征表示并提高性能。例如,在 ImageNet 数据集上,EPASS 仅使用 100k/1%/10% 的标注数据,就把 SimMatch 强基线的 top-1 错误率改进至 39.47%/31.39%/24.70%,并使 CoMatch 达到 40.24%/32.64%/25.90% 的 top-1 错误率。这些提升在不同方法、网络架构和数据集上均保持一致,证明了所提方法的普适有效性。代码见 https://github.com/beandkay/EPASS。

Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection

  • paper_url: http://arxiv.org/abs/2310.15752
  • repo_url: https://github.com/hlt-mt/fbk-fairseq
  • paper_authors: Dennis Fucci, Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli
  • for: 这篇论文主要是为了提高语音翻译(ST)系统中的 speaker-related gender inflection 控制,以提高翻译的准确性和可靠性。
  • methods: 这篇论文提出了一种在运行时使用 external language model(LM)来控制 ST 系统中的 speaker-related gender inflection 的方法,而不需要对模型进行专门的重新训练。
  • results: 根据实验结果,这种方法可以在 en->es/fr/it 等语言对话中提高 gender accuracy 的表现,相比基eline模型和最佳训练时间缓解策略,可以提高31.0和1.6个点的准确率。此外,当 speaker 的 vocal trait 与 gender 存在冲突时,这种方法的提高还更加明显,可以提高32.0和3.4个点的准确率。
    Abstract When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preference. The existing solutions to do so, though effective, are hardly feasible in practice as they involve dedicated model re-training on gender-labeled ST data. To overcome these limitations, we propose the first inference-time solution to control speaker-related gender inflections in ST. Our approach partially replaces the (biased) internal language model (LM) implicitly learned by the ST decoder with gender-specific external LMs. Experiments on en->es/fr/it show that our solution outperforms the base models and the best training-time mitigation strategy by up to 31.0 and 1.6 points in gender accuracy, respectively, for feminine forms. The gains are even larger (up to 32.0 and 3.4) in the challenging condition where speakers' vocal traits conflict with their gender.
    摘要 在翻译指代说话人的词语时,语音翻译(ST)系统既不应默认使用阳性泛指形式,也不应依赖可能产生误导的嗓音特征,而应按照说话人的偏好来确定性别。现有的解决方案虽然有效,但在实践中难以落地,因为它们需要在带性别标注的ST数据上对模型进行专门的重新训练。为克服这些限制,我们提出了首个在推理时控制ST中与说话人相关的性别屈折的方案。我们的方法用针对特定性别的外部语言模型(LM),部分替换ST解码器隐式学到的(带偏差的)内部语言模型。在 en->es/fr/it 上的实验表明,对于阴性形式,我们的方案在性别准确率上分别比基础模型和最佳的训练时缓解策略最多高出 31.0 和 1.6 个百分点;在说话人嗓音特征与其性别相冲突的困难条件下,提升甚至更大(最多 32.0 和 3.4 个百分点)。
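
A schematic of the inference-time integration (generic shallow fusion with a fixed interpolation weight; the exact combination used by the authors may differ): the ST decoder's next-token distribution is blended with a gender-specific external LM at every decoding step.

```python
import torch

def fused_step(st_logprobs, gender_lm_logprobs, lam=0.3):
    """Combine ST decoder and gender-specific external LM scores for one step.

    st_logprobs, gender_lm_logprobs: (batch, vocab) log-probabilities.
    lam controls how strongly the external LM steers gender inflection.
    """
    fused = (1.0 - lam) * st_logprobs + lam * gender_lm_logprobs
    return torch.log_softmax(fused, dim=-1)     # renormalise

# Toy example: pick the next token with and without the external feminine LM.
st = torch.log_softmax(torch.randn(1, 100), dim=-1)
lm_fem = torch.log_softmax(torch.randn(1, 100), dim=-1)
print(st.argmax(-1), fused_step(st, lm_fem).argmax(-1))
```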

The Hyperdimensional Transform: a Holographic Representation of Functions

  • paper_url: http://arxiv.org/abs/2310.16065
  • repo_url: None
  • paper_authors: Pieter Dewulf, Michiel Stock, Bernard De Baets
  • for: The paper introduces a new type of integral transform called the hyperdimensional transform, which maps square-integrable functions into noise-robust, holographic, high-dimensional representations called hyperdimensional vectors.
  • methods: The paper uses a set of stochastic, orthogonal basis functions to approximate a function as a linear combination of random functions, and defines the hyperdimensional transform and its inverse.
  • results: The paper discusses general transform-related properties such as uniqueness, approximation properties of the inverse transform, and the representation of integrals and derivatives, and provides straightforward and easily understandable code for computing the transform and solving differential equations.
    Abstract Integral transforms are invaluable mathematical tools to map functions into spaces where they are easier to characterize. We introduce the hyperdimensional transform as a new kind of integral transform. It converts square-integrable functions into noise-robust, holographic, high-dimensional representations called hyperdimensional vectors. The central idea is to approximate a function by a linear combination of random functions. We formally introduce a set of stochastic, orthogonal basis functions and define the hyperdimensional transform and its inverse. We discuss general transform-related properties such as its uniqueness, approximation properties of the inverse transform, and the representation of integrals and derivatives. The hyperdimensional transform offers a powerful, flexible framework that connects closely with other integral transforms, such as the Fourier, Laplace, and fuzzy transforms. Moreover, it provides theoretical foundations and new insights for the field of hyperdimensional computing, a computing paradigm that is rapidly gaining attention for efficient and explainable machine learning algorithms, with potential applications in statistical modelling and machine learning. In addition, we provide straightforward and easily understandable code, which can function as a tutorial and allows for the reproduction of the demonstrated examples, from computing the transform to solving differential equations.
    摘要 积分变换是把函数映射到更易刻画的空间中的宝贵数学工具。我们提出了一种新的积分变换——超维变换。它把平方可积函数转换为抗噪、全息、高维的表示,称为超维向量。其核心思想是用随机函数的线性组合来逼近一个函数。我们形式化地引入了一组随机的正交基函数,并定义了超维变换及其逆变换。我们讨论了与变换相关的一般性质,例如唯一性、逆变换的逼近性质,以及积分和导数的表示。超维变换提供了一个强大而灵活的框架,与傅里叶变换、拉普拉斯变换和模糊变换等其他积分变换有着紧密的联系。此外,它为超维计算这一因高效、可解释的机器学习算法而迅速受到关注的计算范式提供了理论基础和新的洞见,在统计建模和机器学习方面具有潜在应用。我们还提供了简单易懂的代码,它既可以作为教程,也便于复现文中的示例——从计算该变换到求解微分方程。
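
A small numerical sketch of the general construction (random Fourier bases on [0, 1], the dimensionality and the normalisation constant are my assumptions, not the paper's exact basis choice): the transform takes inner products of f with D random basis functions, and the inverse reconstructs f as a linear combination of the same bases, up to slight smoothing.

```python
import numpy as np

rng = np.random.default_rng(0)
D, ell = 4_000, 0.05                                   # hyperdimensional size, kernel width
omega = rng.normal(scale=1.0 / ell, size=D)
phase = rng.uniform(0.0, 2 * np.pi, size=D)

def basis(x):
    """Random Fourier basis functions evaluated at points x -> shape (len(x), D)."""
    return np.sqrt(2.0) * np.cos(np.outer(x, omega) + phase)

xs = np.linspace(0.0, 1.0, 1_000)
dx = xs[1] - xs[0]
f = np.sin(2 * np.pi * xs) + 0.3 * xs                   # the function to be transformed

# Hyperdimensional transform: inner products of f with each random basis function.
F = (f[:, None] * basis(xs)).sum(axis=0) * dx           # shape (D,)

# Approximate inverse transform: linear combination of the same bases.
f_hat = (basis(xs) @ F) / (D * ell * np.sqrt(2 * np.pi))
# Away from the boundary, f_hat is a slightly smoothed Monte-Carlo approximation of f.
print(np.max(np.abs(f - f_hat)[150:-150]))
```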

Recurrent Linear Transformers

  • paper_url: http://arxiv.org/abs/2310.15719
  • repo_url: https://github.com/subho406/Recurrent-Linear-Transformers
  • paper_authors: Subhojeet Pramanik, Esraa Elelimy, Marlos C. Machado, Adam White
  • for: 本文旨在为Transformer的自注意力机制提出循环式替代方案,使Transformer类模型在强化学习任务中的应用变得可行。
  • methods: 本方法采用循环形式的注意力机制(循环线性Transformer),其每步推理成本与上下文长度无关,同时能够有效利用长距离依赖关系。
  • results: 对于 reinforcement learning 问题,本文的方法与 state-of-the-art 方法 GTrXL 相比,推理成本下降至少 40%,并且减少了内存使用量超过 50%。此外,本文的方法在 harder tasks 上比 GTrXL 表现更好,提高了更多于 37%。
    Abstract The self-attention mechanism in the transformer architecture is capable of capturing long-range dependencies and it is the main reason behind its effectiveness in processing sequential data. Nevertheless, despite their success, transformers have two significant drawbacks that still limit their broader applicability: (1) In order to remember past information, the self-attention mechanism requires access to the whole history to be provided as context. (2) The inference cost in transformers is expensive. In this paper we introduce recurrent alternatives to the transformer self-attention mechanism that offer a context-independent inference cost, leverage long-range dependencies effectively, and perform well in practice. We evaluate our approaches in reinforcement learning problems where the aforementioned computational limitations make the application of transformers nearly infeasible. We quantify the impact of the different components of our architecture in a diagnostic environment and assess performance gains in 2D and 3D pixel-based partially-observable environments. When compared to a state-of-the-art architecture, GTrXL, inference in our approach is at least 40% cheaper while reducing memory use in more than 50%. Our approach either performs similarly or better than GTrXL, improving more than 37% upon GTrXL performance on harder tasks.
    摘要 Transformer架构中的自注意力机制能够捕捉长距离依赖关系,这是其在处理序列数据上卓有成效的主要原因。然而,尽管Transformer很成功,它仍有两个明显的缺点限制了其更广泛的应用:(1)为了记住过去的信息,自注意力机制需要把全部历史作为上下文提供;(2)Transformer的推理成本高昂。在本文中,我们提出了Transformer自注意力机制的循环式替代方案,其推理成本与上下文无关,能有效利用长距离依赖关系,并在实践中表现良好。我们在强化学习问题上评估了这些方法——上述计算限制使得Transformer在这类问题上几乎无法应用。我们在一个诊断环境中量化了架构中不同组件的影响,并在二维和三维的基于像素的部分可观测环境中评估了性能收益。与最先进的GTrXL架构相比,我们方法的推理成本至少降低40%,内存使用减少超过50%;我们的方法表现与GTrXL相当或更好,在更难的任务上比GTrXL提升超过37%。
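
The recurrent view of linear attention that such models build on fits in a few lines; in this sketch (the feature map and dimensions are assumed) the state (S, z) has a fixed size, so the cost of one inference step does not depend on how much context has been seen:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))

def phi(x):
    return np.maximum(x, 0.0) + 1.0            # simple positive feature map

S = np.zeros((d, d))                            # running sum of phi(k) v^T
z = np.zeros(d)                                 # running sum of phi(k)

def step(x, S, z):
    """One recurrent linear-attention step; state size is O(d^2), not O(t)."""
    q, k, v = Wq @ x, Wk @ x, Wv @ x
    S = S + np.outer(phi(k), v)
    z = z + phi(k)
    y = (phi(q) @ S) / (phi(q) @ z + 1e-6)      # attention output for this time step
    return y, S, z

for t in range(100):                            # per-step cost does not grow with t
    y, S, z = step(rng.normal(size=d), S, z)
print(y.shape)                                  # (16,)
```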

Solving large flexible job shop scheduling instances by generating a diverse set of scheduling policies with deep reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.15706
  • repo_url: None
  • paper_authors: Imanol Echeverria, Maialen Murua, Roberto Santana
  • for: 提出了一种方法可以针对大规模的Job Shop优化调度问题(FJSSP)。
  • methods: 使用图神经网络(GNN)对FJSSP建模,并提出两种增强推理稳健性的方法:生成多样化的调度策略集合,以及用派工规则(DRs)对其加以约束。
  • results: 对于大规模的FJSSP实例,该方法比派单规则和三种 latest deep reinforcement learning方法获得更好的结果。
    Abstract The Flexible Job Shop Scheduling Problem (FJSSP) has been extensively studied in the literature, and multiple approaches have been proposed within the heuristic, exact, and metaheuristic methods. However, the industry's demand to be able to respond in real-time to disruptive events has generated the necessity to be able to generate new schedules within a few seconds. Among these methods, under this constraint, only dispatching rules (DRs) are capable of generating schedules, even though their quality can be improved. To improve the results, recent methods have been proposed for modeling the FJSSP as a Markov Decision Process (MDP) and employing reinforcement learning to create a policy that generates an optimal solution assigning operations to machines. Nonetheless, there is still room for improvement, particularly in the larger FJSSP instances which are common in real-world scenarios. Therefore, the objective of this paper is to propose a method capable of robustly solving large instances of the FJSSP. To achieve this, we propose a novel way of modeling the FJSSP as an MDP using graph neural networks. We also present two methods to make inference more robust: generating a diverse set of scheduling policies that can be parallelized and limiting them using DRs. We have tested our approach on synthetically generated instances and various public benchmarks and found that our approach outperforms dispatching rules and achieves better results than three other recent deep reinforcement learning methods on larger FJSSP instances.
    摘要 柔性作业车间调度问题(FJSSP)在文献中已得到广泛研究,启发式、精确和元启发式方法中都提出了多种解法。然而,工业界需要能够实时响应突发事件,这就要求能在数秒内生成新的调度方案。在这一约束下,目前只有派工规则(DRs)能够生成调度方案,尽管其质量仍有提升空间。为改进结果,最近的方法把FJSSP建模为马尔可夫决策过程(MDP),并用强化学习学到一个把工序分配给机器、生成优质解的策略。然而,仍有改进余地,尤其是在现实场景中常见的更大规模FJSSP实例上。因此,本文的目标是提出一种能够稳健求解大规模FJSSP实例的方法。为此,我们提出了一种利用图神经网络把FJSSP建模为MDP的新方式,并提出两种使推理更稳健的方法:生成一组可并行执行的多样化调度策略,并用派工规则对其加以约束。我们在人工生成的实例和多个公共基准上进行了测试,发现我们的方法优于派工规则,并且在更大规模的FJSSP实例上取得了优于其他三种最新深度强化学习方法的结果。

Learning-based Scheduling for Information Accuracy and Freshness in Wireless Networks

  • paper_url: http://arxiv.org/abs/2310.15705
  • repo_url: None
  • paper_authors: Hitesh Gudwani
  • for: 本文关注一个多源单渠道监测系统,其中每个源都测量时间变化的量,但是这些量的准确性和传输成功率都不确定。
  • methods: 作者使用多臂老虎机问题的一个变体来建模该调度问题,并比较了四种标准老虎机策略:ETC、$\epsilon$-贪婪、UCB和TS;此外,作者给出了ETC和$\epsilon$-贪婪策略的分析保证。
  • results: 作者通过仿真与分析对上述策略进行了比较,结果显示ETC和$\epsilon$-贪婪策略在该系统中表现最佳,而UCB和TS策略表现较差;此外,作者还刻画了任意策略可达到的累积遗憾下界。
    Abstract We consider a system of multiple sources, a single communication channel, and a single monitoring station. Each source measures a time-varying quantity with varying levels of accuracy and one of them sends its update to the monitoring station via the channel. The probability of success of each attempted communication is a function of the source scheduled for transmitting its update. Both the probability of correct measurement and the probability of successful transmission of all the sources are unknown to the scheduler. The metric of interest is the reward received by the system which depends on the accuracy of the last update received by the destination and the Age-of-Information (AoI) of the system. We model our scheduling problem as a variant of the multi-arm bandit problem with sources as different arms. We compare the performance of all $4$ standard bandit policies, namely, ETC, $\epsilon$-greedy, UCB, and TS suitably adjusted to our system model via simulations. In addition, we provide analytical guarantees of $2$ of these policies, ETC, and $\epsilon$-greedy. Finally, we characterize the lower bound on the cumulative regret achievable by any policy.
    摘要 我们考虑一个由多个信息源、单一通信信道和单一监测站组成的系统。每个源以不同的准确度测量一个时变量,其中一个源经由信道将其更新发送到监测站。每次通信尝试的成功概率取决于被调度发送更新的源。各源的正确测量概率和成功传输概率对调度器而言都是未知的。我们关注的指标是系统获得的奖励,它取决于目的地最后收到的更新的准确度以及系统的信息年龄(AoI)。我们将调度问题建模为多臂老虎机问题的一个变体,其中每个臂对应一个源。我们通过仿真比较了四种标准的老虎机策略(ETC、$\epsilon$-greedy、UCB 和 TS)在本系统模型下经适当调整后的表现。此外,我们还给出了其中两种策略(ETC 和 $\epsilon$-greedy)的分析保证,并刻画了任何策略可达到的累积遗憾下界。
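
Two of the compared policies, $\epsilon$-greedy and UCB, are standard bandit algorithms; a minimal simulation of them applied to source selection might look as follows. The Bernoulli rewards below are a stand-in for the paper's accuracy/AoI-dependent reward.

```python
# Minimal epsilon-greedy and UCB sketch for choosing which source to schedule.
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.3, 0.5, 0.7])      # unknown per-source reward means
K, T = len(true_means), 5000
counts, sums = np.zeros(K), np.zeros(K)

def pull(arm):
    return rng.random() < true_means[arm]    # Bernoulli reward

def eps_greedy(t, eps=0.1):
    if rng.random() < eps or counts.min() == 0:
        return int(rng.integers(K))          # explore
    return int(np.argmax(sums / counts))     # exploit empirical means

def ucb(t):
    if counts.min() == 0:
        return int(np.argmin(counts))        # try each source once first
    return int(np.argmax(sums / counts + np.sqrt(2 * np.log(t + 1) / counts)))

policy = ucb                                  # or eps_greedy
for t in range(T):
    arm = policy(t)
    r = pull(arm)
    counts[arm] += 1
    sums[arm] += r
print("pulls per source:", counts.astype(int))
```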

Towards Automated Recipe Genre Classification using Semi-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.15693
  • repo_url: None
  • paper_authors: Nazmus Sakib, G. M. Shahariar, Md. Mohsinul Kabir, Md. Kamrul Hasan, Hasan Mahmud
  • for: 这个研究的目的是提供一个大规模的Cooking recipe dataset,以便对食谱进行分类和识别。
  • methods: 这个研究使用了两种命名实体识别(NER)提取工具,将食谱做法描述中缺失的命名实体(如加热、时间或步骤)提取出来。
  • results: 研究结果显示,使用传统机器学习、深度学习和预训练语言模型可以对食谱进行分类,并达到了98.6%的总准确率。研究还发现,标题特征在类别分类中发挥了更重要的作用。
    Abstract Sharing cooking recipes is a great way to exchange culinary ideas and provide instructions for food preparation. However, categorizing raw recipes found online into appropriate food genres can be challenging due to a lack of adequate labeled data. In this study, we present a dataset named the ``Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset" that contains two million culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions. This collection of data includes various features such as title, NER, directions, and extended NER, as well as nine different labels representing genres including bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions. The proposed pipeline named 3A2M+ extends the size of the Named Entity Recognition (NER) list to address missing named entities like heat, time or process from the recipe directions using two NER extraction tools. 3A2M+ dataset provides a comprehensive solution to the various challenging recipe-related tasks, including classification, named entity recognition, and recipe generation. Furthermore, we have demonstrated traditional machine learning, deep learning and pre-trained language models to classify the recipes into their corresponding genre and achieved an overall accuracy of 98.6\%. Our investigation indicates that the title feature played a more significant role in classifying the genre.
    摘要 分享烹饪食谱是交流烹饪创意、提供食物制作说明的好方式。然而,由于缺乏足够的标注数据,将网上的原始食谱分类到合适的食物类别中是一个挑战。在这项研究中,我们提出了名为“Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset”的数据集,其包含200万份烹饪食谱,每份食谱都被标注到相应的类别中,并包含标题、NER、做法步骤、扩展NER等多种特征,以及九个类别标签:烘焙、饮料、非素食、蔬菜、快餐、谷物、主食、配菜和融合菜。我们提出的3A2M+流程通过两个NER提取工具扩展了命名实体列表,以补充做法步骤中缺失的实体(如加热、时间或流程)。3A2M+数据集为分类、命名实体识别和食谱生成等多种具有挑战性的食谱相关任务提供了完整的解决方案。此外,我们使用传统机器学习、深度学习和预训练语言模型对食谱进行分类,达到了98.6%的总准确率。我们的调查表明,标题特征在类别分类中发挥了更重要的作用。
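
Since the paper reports that the title feature alone is highly informative for genre classification, a rough sketch of such a title-based classifier is shown below; the toy titles and labels are invented and are not drawn from the 3A2M+ dataset.

```python
# Rough sketch of a title-based genre classifier (TF-IDF + logistic regression).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

titles = [
    "Chocolate chip cookies", "Sourdough bread", "Mango lassi",
    "Iced lemon tea", "Grilled chicken skewers", "Beef stew",
]
genres = ["bakery", "bakery", "drinks", "drinks", "non-veg", "non-veg"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(titles, genres)
print(clf.predict(["Banana bread", "Roast lamb"]))  # e.g. ['bakery' 'non-veg']
```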

Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers

  • paper_url: http://arxiv.org/abs/2310.15684
  • repo_url: https://github.com/tangg555/biomed-sum
  • paper_authors: Chen Tang, Shun Wang, Tomas Goldsack, Chenghua Lin
  • for: 本研究旨在提高生物医学文献摘要的语言模型性能,通过 integrate 文献引用中的域专业知识。
  • methods: 我们提出了一种新的注意力机制,用于将域专业知识集成到语言模型中,以便通过文献内容和相关知识来生成摘要。
  • results: 我们的模型在生物医学摘要任务上表现出色,相比基线方法有显著提升。
    Abstract Abstracts derived from biomedical literature possess distinct domain-specific characteristics, including specialised writing styles and biomedical terminologies, which necessitate a deep understanding of the related literature. As a result, existing language models struggle to generate technical summaries that are on par with those produced by biomedical experts, given the absence of domain-specific background knowledge. This paper aims to enhance the performance of language models in biomedical abstractive summarisation by aggregating knowledge from external papers cited within the source article. We propose a novel attention-based citation aggregation model that integrates domain-specific knowledge from citation papers, allowing neural networks to generate summaries by leveraging both the paper content and relevant knowledge from citation papers. Furthermore, we construct and release a large-scale biomedical summarisation dataset that serves as a foundation for our research. Extensive experiments demonstrate that our model outperforms state-of-the-art approaches and achieves substantial improvements in abstractive biomedical text summarisation.
    摘要 摘要抽象从生物医学文献中获得特有的领域特有特征,包括专业性语言风格和生物医学术语,这使得现有语言模型在生物医学抽象摘要中表现不佳,因为缺乏领域专业知识。这篇论文旨在提高生物医学抽象摘要中语言模型的表现,通过收集外部文献中的参照纸引入领域专业知识。我们提议一种注意力机制的参照纸聚合模型,使得神经网络可以通过文献内容和相关知识来生成摘要。此外,我们构建了大规模的生物医学摘要数据集,作为我们的研究基础。广泛的实验表明,我们的模型在生物医学抽象摘要方面表现出色,与现有方法相比,得到了显著的改进。

Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation

  • paper_url: http://arxiv.org/abs/2310.15676
  • repo_url: None
  • paper_authors: Yinjie Lei, Zixuan Wang, Feng Chen, Guoqing Wang, Peng Wang, Yang Yang
  • for: 这篇论文主要是为了探讨多Modal 3D场景理解的最新进展和发展趋势。
  • methods: 该论文使用了多种多Modal 3D方法,包括3D+2D多camera图像和3D+语言文本描述。
  • results: 论文对多Modal 3D方法的比较和分析,并提供了一些实验结果和深入分析。
    Abstract Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction. Compared to conventional single-modal 3D understanding, introducing an additional modality not only elevates the richness and precision of scene interpretation but also ensures a more robust and resilient understanding. This becomes especially crucial in varied and challenging environments where solely relying on 3D data might be inadequate. While there has been a surge in the development of multi-modal 3D methods over past three years, especially those integrating multi-camera images (3D+2D) and textual descriptions (3D+language), a comprehensive and in-depth review is notably absent. In this article, we present a systematic survey of recent progress to bridge this gap. We begin by briefly introducing a background that formally defines various 3D multi-modal tasks and summarizes their inherent challenges. After that, we present a novel taxonomy that delivers a thorough categorization of existing methods according to modalities and tasks, exploring their respective strengths and limitations. Furthermore, comparative results of recent approaches on several benchmark datasets, together with insightful analysis, are offered. Finally, we discuss the unresolved issues and provide several potential avenues for future research.
    摘要 多modal 3D场景理解在过去三年内获得了广泛关注,因为它在各个领域,如自动驾驶和人机交互,有广泛的应用。与传统的单modal 3D理解相比,引入一个额外的模式不仅提高了场景理解的丰富性和精度,而且确保了更加可靠和抗障的理解。这在多样化和挑战性的环境中变得特别重要。随着过去三年内多modal 3D方法的开发,特别是将多个相机图像(3D+2D)和文本描述(3D+语言)结合起来的方法,尚未有一篇系统性的审查。在这篇文章中,我们提供了一个系统性的评估,以填补这一空白。我们首先介绍了背景,正式定义了不同的3D多modal任务,并总结了它们的内在挑战。接着,我们提出了一种新的分类方法,根据模式和任务进行了详细的分类,探讨它们的优势和局限性。此外,我们还提供了一些最近的方法在多个标准数据集上的比较结果,并进行了深入的分析。最后,我们讨论了未解决的问题,并提出了一些未来研究的可能性。

Confounder Balancing in Adversarial Domain Adaptation for Pre-Trained Large Models Fine-Tuning

  • paper_url: http://arxiv.org/abs/2310.16062
  • repo_url: None
  • paper_authors: Shuoran Jiang, Qingcai Chen, Yang Xiang, Youcheng Pan, Xiangping Wu
  • For: This paper proposes a method for adversarial domain adaptation (ADA) with confounder balancing for pre-trained large models (PLMs) fine-tuning.
  • Methods: The proposed method, ADA-CBF, includes a PLM as the foundation model for a feature extractor, a domain classifier, and a confounder classifier, which are jointly trained with an adversarial loss.
  • Results: Compared to existing ADA methods, ADA-CBF can correctly identify confounders in domain-invariant features and eliminate confounder biases in the extracted features from PLMs. Empirical results on natural language processing and computer vision downstream tasks show that ADA-CBF outperforms the newest GPT-4, LLaMA2, ViT, and ADA methods.
  • For: 这篇论文提出了一种在预训练大型模型(PLMs)微调时进行带混杂因素平衡的对抗域适应(ADA)的方法,以帮助将源域学到的知识迁移到目标域。
  • Methods: ADA-CBF 方法以 PLM 作为基础模型,包含特征提取器、域分类器和混杂因素分类器,三者通过对抗损失进行联合训练。
  • Results: 与现有 ADA 方法相比,ADA-CBF 能够正确识别域不变特征中的混杂因素,从而消除从 PLM 提取的特征中的混杂偏差。实验结果表明,ADA-CBF 在自然语言处理和计算机视觉下游任务上表现出色,超过了最新的 GPT-4、LLaMA2、ViT 和 ADA 方法。
    Abstract The excellent generalization, contextual learning, and emergence abilities in the pre-trained large models (PLMs) handle specific tasks without direct training data, making them the better foundation models in the adversarial domain adaptation (ADA) methods to transfer knowledge learned from the source domain to target domains. However, existing ADA methods fail to account for the confounder properly, which is the root cause of the source data distribution that differs from the target domains. This study proposes an adversarial domain adaptation with confounder balancing for PLMs fine-tuning (ADA-CBF). The ADA-CBF includes a PLM as the foundation model for a feature extractor, a domain classifier and a confounder classifier, and they are jointly trained with an adversarial loss. This loss is designed to improve the domain-invariant representation learning by diluting the discrimination in the domain classifier. At the same time, the adversarial loss also balances the confounder distribution among source and unmeasured domains in training. Compared to existing ADA methods, ADA-CBF can correctly identify confounders in domain-invariant features, thereby eliminating the confounder biases in the extracted features from PLMs. The confounder classifier in ADA-CBF is designed as a plug-and-play and can be applied in the confounder measurable, unmeasurable, or partially measurable environments. Empirical results on natural language processing and computer vision downstream tasks show that ADA-CBF outperforms the newest GPT-4, LLaMA2, ViT and ADA methods.
    摘要 预训练大型模型(PLM)出色的泛化、上下文学习和涌现能力使其可以在没有直接训练数据的情况下处理特定任务,从而成为对抗域适应(ADA)方法中将源域知识迁移到目标域的更好基础模型。然而,现有的 ADA 方法未能正确考虑混杂因素(confounder),而混杂因素正是源数据分布与目标域不同的根本原因。这项研究提出了面向 PLM 微调的带混杂因素平衡的对抗域适应方法(ADA-CBF)。ADA-CBF 以 PLM 作为特征提取器的基础模型,并包含域分类器和混杂因素分类器,这些模块通过对抗损失进行联合训练。该损失通过削弱域分类器的判别能力来提升域不变表示学习,同时在训练中平衡源域与未观测域之间的混杂因素分布。相比现有的 ADA 方法,ADA-CBF 能够正确识别域不变特征中的混杂因素,从而消除从 PLM 提取的特征中的混杂偏差。ADA-CBF 的混杂因素分类器采用即插即用的设计,可应用于混杂因素可测量、不可测量或部分可测量的环境。实验结果表明,ADA-CBF 在自然语言处理和计算机视觉下游任务上表现出色,超过了最新的 GPT-4、LLaMA2、ViT 和 ADA 方法。
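
Adversarial domain adaptation of this kind is commonly implemented with a gradient reversal layer between the shared feature extractor and the domain/confounder classifiers: features flow forward unchanged, but gradients are flipped so the encoder is pushed toward domain-invariant representations. The PyTorch sketch below shows that generic mechanism only; it is not ADA-CBF's actual code.

```python
# Generic gradient reversal layer, a common way to realise an adversarial loss.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)                     # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None   # reverse (and scale) the gradient

features = torch.randn(8, 32, requires_grad=True)        # extractor output
domain_logits = torch.nn.Linear(32, 2)(GradReverse.apply(features, 1.0))
domain_logits.sum().backward()
print(features.grad.shape)                      # (8, 32), gradient sign reversed
```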

A Survey on Detection of LLMs-Generated Content

  • paper_url: http://arxiv.org/abs/2310.15654
  • repo_url: https://github.com/xianjun-yang/awesome_papers_on_llms_detection
  • paper_authors: Xianjun Yang, Liangming Pan, Xuandong Zhao, Haifeng Chen, Linda Petzold, William Yang Wang, Wei Cheng
  • For: The paper aims to provide a comprehensive survey of existing detection strategies and benchmarks for identifying content generated by advanced large language models (LLMs), and to identify key challenges and prospects in the field.
  • Methods: The paper scrutinizes the differences between existing detection strategies and benchmarks, and advocates for a multi-faceted approach to defend against various attacks to counter the rapidly advancing capabilities of LLMs.
  • Results: The paper provides a broad understanding of the current landscape of LLMs-generated content detection, offering a guiding reference for researchers and practitioners striving to uphold the integrity of digital information in an era increasingly dominated by synthetic content. The relevant papers are summarized and will be consistently updated at a specific GitHub repository.
    Abstract The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT have led to an increase in synthetic content generation with implications across a variety of sectors, including media, cybersecurity, public discourse, and education. As such, the ability to detect LLMs-generated content has become of paramount importance. We aim to provide a detailed overview of existing detection strategies and benchmarks, scrutinizing their differences and identifying key challenges and prospects in the field, advocating for more adaptable and robust models to enhance detection accuracy. We also posit the necessity for a multi-faceted approach to defend against various attacks to counter the rapidly advancing capabilities of LLMs. To the best of our knowledge, this work is the first comprehensive survey on the detection in the era of LLMs. We hope it will provide a broad understanding of the current landscape of LLMs-generated content detection, offering a guiding reference for researchers and practitioners striving to uphold the integrity of digital information in an era increasingly dominated by synthetic content. The relevant papers are summarized and will be consistently updated at https://github.com/Xianjun-Yang/Awesome_papers_on_LLMs_detection.git.
    摘要 大量高级自然语言模型(LLMs)的发展,如ChatGPT,已经导致了假内容生成的增加,对媒体、网络安全、公共讨论和教育等领域产生了重要影响。因此,检测LLMs生成的内容的能力变得非常重要。我们想要提供一个详细的检测策略和标准 benchmark的概述,分析它们之间的差异,并确定领域中关键的挑战和前景,并提倡更加适应性和可靠性的模型,以提高检测精度。此外,我们认为需要采取多方面的方法来防止不同类型的攻击,以对抗LLMs的攻击能力的快速发展。根据我们所知,这是LLMs生成内容检测领域的首次全面评价。我们希望这篇文章能为研究人员和实践者提供一份广泛的理解LLMs生成内容检测领域的现状,并作为参考,帮助他们在LLMs占据主导地位的时代保持数字信息的完整性。相关论文将在https://github.com/Xianjun-Yang/Awesome_papers_on_LLMs_detection.git中进行系统性的汇总和更新。

Career Path Prediction using Resume Representation Learning and Skill-based Matching

  • paper_url: http://arxiv.org/abs/2310.15636
  • repo_url: None
  • paper_authors: Jens-Joris Decorte, Jeroen Van Hautte, Johannes Deleu, Chris Develder, Thomas Demeester
  • for: 预测工作者的下一步职业发展,包括职业变革预测和内部职业移动预测。
  • methods: 利用工作经验段落中的文本描述来预测下一步职业发展,包括一种基于 ESCO 职业标签的数据集,以及一种基于文本描述的 represencing learning 方法(CareerBERT)。
  • results: 在使用数据集进行训练后,提出了一种基于技能的预测模型和一种基于文本描述的预测模型,它们分别达到了 35.24% 和 39.61% 的 recall@10 性能指标,而将这两种模型结合使用的 Hybrid 方法则达到了 43.01% 的 recall@10 性能指标。
    Abstract The impact of person-job fit on job satisfaction and performance is widely acknowledged, which highlights the importance of providing workers with next steps at the right time in their career. This task of predicting the next step in a career is known as career path prediction, and has diverse applications such as turnover prevention and internal job mobility. Existing methods to career path prediction rely on large amounts of private career history data to model the interactions between job titles and companies. We propose leveraging the unexplored textual descriptions that are part of work experience sections in resumes. We introduce a structured dataset of 2,164 anonymized career histories, annotated with ESCO occupation labels. Based on this dataset, we present a novel representation learning approach, CareerBERT, specifically designed for work history data. We develop a skill-based model and a text-based model for career path prediction, which achieve 35.24% and 39.61% recall@10 respectively on our dataset. Finally, we show that both approaches are complementary as a hybrid approach achieves the strongest result with 43.01% recall@10.
    摘要 “人职匹配对工作满意度和绩效的影响已被广泛认可,这凸显了在恰当的时间为从业者提供职业下一步建议的重要性。预测从业者的下一步职业被称为职业路径预测任务,其应用场景多样,例如防止离职和促进内部岗位流动。现有的职业路径预测方法依赖大量私有职业历史数据来建模职位名称与公司之间的互动。我们提议利用简历中工作经历部分尚未被挖掘的文本描述。我们提供了一个包含2,164份匿名化职业历史的结构化数据集,并标注了 ESCO 职业标签。基于该数据集,我们提出了一种专为工作历史数据设计的新表示学习方法 CareerBERT。我们开发了基于技能的模型和基于文本的模型来进行职业路径预测,二者在我们的数据集上分别达到 35.24% 和 39.61% 的召回率@10。最后,我们表明这两种方法是互补的:混合方法取得了最强的结果,即 43.01% 的召回率@10。”
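
The recall@10 figures quoted above can be computed in a few lines; the sketch below shows the metric on toy predictions (the occupation labels are invented, not from the paper's dataset).

```python
# recall@k: fraction of cases where the true next occupation appears in the top-k predictions.
def recall_at_k(ranked_predictions, gold_labels, k=10):
    hits = sum(gold in preds[:k] for preds, gold in zip(ranked_predictions, gold_labels))
    return hits / len(gold_labels)

preds = [["nurse", "teacher", "developer"], ["developer", "analyst", "designer"]]
gold = ["teacher", "designer"]
print(recall_at_k(preds, gold, k=2))  # 0.5: only the first gold label is in its top-2
```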

Using Slisemap to interpret physical data

  • paper_url: http://arxiv.org/abs/2310.15610
  • repo_url: None
  • paper_authors: Lauri Seppäläinen, Anton Björklund, Vitus Besel, Kai Puolamäki
  • for: 这篇论文是用于描述一种新的 manifold visualization 方法,叫做 Slise,以及其在物理和化学 dataset 上的应用。
  • methods: 这篇论文使用了 Slisemap,它是一种结合 manifold visualization 和可解释人工智能的方法,用于调查黑盒机器学习模型和复杂模拟器的决策过程。 Slisemap 可以找到一个嵌入,使得数据项的相似本地解释被分组在一起,从而提供了黑盒模型的不同行为的总览。
  • results: 在这篇论文中, authors 使用了 Slisemap 在物理数据上进行了评估,并发现了 Slisemap 可以帮助找到 meaningful 信息,包括分类和回归模型在数据集上的表现。
    Abstract Manifold visualisation techniques are commonly used to visualise high-dimensional datasets in physical sciences. In this paper we apply a recently introduced manifold visualisation method, called Slise, on datasets from physics and chemistry. Slisemap combines manifold visualisation with explainable artificial intelligence. Explainable artificial intelligence is used to investigate the decision processes of black box machine learning models and complex simulators. With Slisemap we find an embedding such that data items with similar local explanations are grouped together. Hence, Slisemap gives us an overview of the different behaviours of a black box model. This makes Slisemap into a supervised manifold visualisation method, where the patterns in the embedding reflect a target property. In this paper we show how Slisemap can be used and evaluated on physical data and that Slisemap is helpful in finding meaningful information on classification and regression models trained on these datasets.
    摘要 流形可视化技术广泛应用于物理科学中高维数据的可视化。在这篇论文中,我们将最近提出的流形可视化方法 Slisemap 应用于物理和化学数据集。Slisemap 将流形可视化与可解释人工智能相结合;可解释人工智能用于考察黑盒机器学习模型和复杂模拟器的决策过程。通过 Slisemap,我们可以找到一个嵌入,使得具有相似局部解释的数据项被分组在一起,从而获得黑盒模型不同行为模式的总览。这使 Slisemap 成为一种有监督的流形可视化方法,嵌入中的模式反映了某个目标属性。在这篇论文中,我们展示了如何在物理数据上使用并评估 Slisemap,并证明 Slisemap 有助于在这些数据集上训练的分类和回归模型中发现有意义的信息。

tagE: Enabling an Embodied Agent to Understand Human Instructions

  • paper_url: http://arxiv.org/abs/2310.15605
  • repo_url: https://github.com/csarkar/tage
  • paper_authors: Chayan Sarkar, Avik Mitra, Pradip Pramanick, Tapas Nayak
  • for: 本研究旨在提高智能代理人(embodied agent)理解自然语言(NLU)指令,以便在人工智能(AI)与人类交互时更加准确地理解人类的意图。
  • methods: 本研究提出了一种新的task和argument grounding for Embodied agents(tagE)系统,该系统使用了一种新型的神经网络模型,可以从自然语言中提取复杂任务指令中的多个任务和其相应的Arguments。
  • results: 实验结果表明,tagE系统比基eline模型表现出色,能够更好地理解人类的意图并将其映射到机器人的已知技能集和环境中的物品上。
    Abstract Natural language serves as the primary mode of communication when an intelligent agent with a physical presence engages with human beings. While a plethora of research focuses on natural language understanding (NLU), encompassing endeavors such as sentiment analysis, intent prediction, question answering, and summarization, the scope of NLU directed at situations necessitating tangible actions by an embodied agent remains limited. The inherent ambiguity and incompleteness inherent in natural language present challenges for intelligent agents striving to decipher human intention. To tackle this predicament head-on, we introduce a novel system known as task and argument grounding for Embodied agents (tagE). At its core, our system employs an inventive neural network model designed to extract a series of tasks from complex task instructions expressed in natural language. Our proposed model adopts an encoder-decoder framework enriched with nested decoding to effectively extract tasks and their corresponding arguments from these intricate instructions. These extracted tasks are then mapped (or grounded) to the robot's established collection of skills, while the arguments find grounding in objects present within the environment. To facilitate the training and evaluation of our system, we have curated a dataset featuring complex instructions. The results of our experiments underscore the prowess of our approach, as it outperforms robust baseline models.
    摘要 当具有物理形态的智能体与人类交流时,自然语言是主要的交流方式。尽管大量研究集中在自然语言理解(NLU)方面,涵盖情感分析、意图预测、问答和摘要等任务,但面向需要具身智能体执行实际动作的场景,NLU 的研究仍然有限。自然语言固有的歧义性和不完整性使智能体难以理解人类意图。为了解决这一问题,我们提出了一种新系统——面向具身智能体的任务与论元定位(tagE)。该系统使用一种新颖的神经网络模型,从以自然语言表达的复杂任务指令中提取一系列任务及其相应的论元。模型采用带嵌套解码的编码器-解码器框架,以有效地提取任务和论元。提取出的任务被映射(定位)到机器人已有的技能集合,而论元则定位到环境中的物体。为了训练和评估该系统,我们构建了一个包含复杂指令的数据集。实验结果表明,我们的方法优于强基线模型。

Emergent Communication in Interactive Sketch Question Answering

  • paper_url: http://arxiv.org/abs/2310.15597
  • repo_url: https://github.com/mediabrain-sjtu/ecisqa
  • paper_authors: Zixing Lei, Yiming Zhang, Yuxin Xiong, Siheng Chen
  • for: 本研究旨在通过图画和解释器学习沟通,并评估多轮交互对人工智能沟通的影响。
  • methods: 本研究提出了一个新的互动图问答任务,两名合作player通过图画回答一个图像的问题,并采用了一种新的有效的互动EC系统,可以实现问题答案准确率、图画复杂度和人类可读性的有效平衡。
  • results: 实验结果和人类评价表明,多轮交互机制可以提高目标和有效的人工智能沟通,并且具有良好的人类可读性。
    Abstract Vision-based emergent communication (EC) aims to learn to communicate through sketches and demystify the evolution of human communication. Ironically, previous works neglect multi-round interaction, which is indispensable in human communication. To fill this gap, we first introduce a novel Interactive Sketch Question Answering (ISQA) task, where two collaborative players are interacting through sketches to answer a question about an image in a multi-round manner. To accomplish this task, we design a new and efficient interactive EC system, which can achieve an effective balance among three evaluation factors, including the question answering accuracy, drawing complexity and human interpretability. Our experimental results including human evaluation demonstrate that multi-round interactive mechanism facilitates targeted and efficient communication between intelligent agents with decent human interpretability.
    摘要 基于视觉的涌现沟通(EC)旨在通过草图学习沟通,并揭示人类沟通的演化过程。然而,颇具讽刺意味的是,先前的工作忽略了多轮互动,而这是人类沟通中不可或缺的一部分。为了填补这一空白,我们首先提出了一种新的互动草图问答(ISQA)任务:两名合作的玩家通过草图以多轮的方式回答关于一张图像的问题。为了完成这项任务,我们设计了一种新的高效互动 EC 系统,可以在问题回答准确率、草图复杂度和人类可解释性三个评价因素之间取得有效平衡。包括人类评价在内的实验结果表明,多轮互动机制能够促进智能体之间有针对性且高效的沟通,并具有良好的人类可解释性。

Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression

  • paper_url: http://arxiv.org/abs/2310.15594
  • repo_url: None
  • paper_authors: Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Lucien Wang, Rui Yan
  • for: 实现大规模预训语言模型(LLMs)在实际应用中的部署,解决大型模型的问题。
  • methods: 提出了一种新的压缩方法 Retrieval-based Knowledge Transfer(RetriKT),可以将 LLMS 中的知识转移到非常小型的模型(例如 1%)中。
  • results: 实际实验结果显示,提出的方法可以将小型模型的性能提高,通过将 LLMS 中的知识转移到小型模型中,并且运用软题调整和 proximal policy optimization(PPO)优化学习方法。
    Abstract Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks. However, the massive size of these models poses huge challenges for their deployment in real-world applications. While numerous model compression techniques have been proposed, most of them are not well-suited for achieving extreme model compression when there is a significant gap in model scale. In this paper, we introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT), which effectively transfers the knowledge of LLMs to extremely small-scale models (e.g., 1%). In particular, our approach extracts knowledge from LLMs to construct a knowledge store, from which the small-scale model can retrieve relevant information and leverage it for effective inference. To improve the quality of the model, soft prompt tuning and Proximal Policy Optimization (PPO) reinforcement learning techniques are employed. Extensive experiments are conducted on low-resource tasks from SuperGLUE and GLUE benchmarks. The results demonstrate that the proposed approach significantly enhances the performance of small-scale models by leveraging the knowledge from LLMs.
    摘要 Our approach extracts knowledge from LLMs to construct a knowledge store, from which the small-scale model can retrieve relevant information and leverage it for effective inference. To improve the quality of the model, we employ soft prompt tuning and Proximal Policy Optimization (PPO) reinforcement learning techniques. We conduct extensive experiments on low-resource tasks from SuperGLUE and GLUE benchmarks. The results show that the proposed approach significantly enhances the performance of small-scale models by leveraging the knowledge from LLMs.

Detecting Intentional AIS Shutdown in Open Sea Maritime Surveillance Using Self-Supervised Deep Learning

  • paper_url: http://arxiv.org/abs/2310.15586
  • repo_url: None
  • paper_authors: Pierre Bernabé, Arnaud Gotlieb, Bruno Legeard, Dusica Marijan, Frank Olaf Sem-Jacobsen, Helge Spieker
  • for: 检测非法活动,如非法捕鱼或贩卖不法物品。
  • methods: 使用自动识别系统(AIS)消息,采用自监督深度学习技术和 Transformer 模型实时处理 AIS 消息,可处理每月超过5亿条消息,对应60000多艘船舶的轨迹。
  • results: 能够准确检测 AIS 消息的异常缺失接收,并重新发现已被检测到的有意 AIS 关闭事件。
    Abstract In maritime traffic surveillance, detecting illegal activities, such as illegal fishing or transshipment of illicit products is a crucial task of the coastal administration. In the open sea, one has to rely on Automatic Identification System (AIS) message transmitted by on-board transponders, which are captured by surveillance satellites. However, insincere vessels often intentionally shut down their AIS transponders to hide illegal activities. In the open sea, it is very challenging to differentiate intentional AIS shutdowns from missing reception due to protocol limitations, bad weather conditions or restricting satellite positions. This paper presents a novel approach for the detection of abnormal AIS missing reception based on self-supervised deep learning techniques and transformer models. Using historical data, the trained model predicts if a message should be received in the upcoming minute or not. Afterwards, the model reports on detected anomalies by comparing the prediction with what actually happens. Our method can process AIS messages in real-time, in particular, more than 500 Millions AIS messages per month, corresponding to the trajectories of more than 60 000 ships. The method is evaluated on 1-year of real-world data coming from four Norwegian surveillance satellites. Using related research results, we validated our method by rediscovering already detected intentional AIS shutdowns.
    摘要 在海上交通监控中,检测非法捕捞或违禁品转运等违法活动是海岸管理部门的关键任务。在公海上,只能依靠船载应答器发送、由监控卫星接收的自动识别系统(AIS)消息。然而,不守规的船舶经常故意关闭 AIS 应答器以隐藏违法活动。在公海上,很难区分有意关闭 AIS 与由于协议限制、恶劣天气或卫星位置受限造成的接收缺失。这篇论文提出了一种基于自监督深度学习技术和 Transformer 模型的异常 AIS 缺失接收检测方法。利用历史数据,训练好的模型预测下一分钟内是否应收到某条消息;随后,模型将预测与实际接收情况进行比较,报告检测到的异常。我们的方法可以实时处理 AIS 消息,尤其是每月超过5亿条消息,对应60000多艘船舶的轨迹。我们在来自四颗挪威监控卫星的一年真实数据上评估了该方法,并结合相关研究结果,通过重新发现已被检测到的有意 AIS 关闭事件验证了我们的方法。
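
The detection step described above boils down to comparing a model's per-minute reception prediction with what was actually observed. The sketch below illustrates that comparison with a trivial stand-in predictor instead of the paper's transformer.

```python
# Flag minutes where a message was expected but not received.
def predict_reception(history):
    # Hypothetical placeholder predictor: expect reception if most recent minutes had messages.
    return sum(history[-5:]) >= 3

def flag_anomalies(received_per_minute):
    anomalies = []
    for t in range(5, len(received_per_minute)):
        expected = predict_reception(received_per_minute[:t])
        actual = bool(received_per_minute[t])
        if expected and not actual:
            anomalies.append(t)          # message expected but missing
    return anomalies

stream = [1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0]
print(flag_anomalies(stream))            # minutes where an expected message is absent
```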

CONTRASTE: Supervised Contrastive Pre-training With Aspect-based Prompts For Aspect Sentiment Triplet Extraction

  • paper_url: http://arxiv.org/abs/2310.15577
  • repo_url: https://github.com/nitkannen/contraste
  • paper_authors: Rajdeep Mukherjee, Nithish Kannen, Saurabh Kumar Pandey, Pawan Goyal
  • for: 提高多个ABSA任务下推理性能
  • methods: 使用对比学习增强ASTE性能,并提出多任务结合方法
  • results: 实现新的ASTE最佳Result,并通过细致的ablation study证明每个提案的重要性
    Abstract Existing works on Aspect Sentiment Triplet Extraction (ASTE) explicitly focus on developing more efficient fine-tuning techniques for the task. Instead, our motivation is to come up with a generic approach that can improve the downstream performances of multiple ABSA tasks simultaneously. Towards this, we present CONTRASTE, a novel pre-training strategy using CONTRastive learning to enhance the ASTE performance. While we primarily focus on ASTE, we also demonstrate the advantage of our proposed technique on other ABSA tasks such as ACOS, TASD, and AESC. Given a sentence and its associated (aspect, opinion, sentiment) triplets, first, we design aspect-based prompts with corresponding sentiments masked. We then (pre)train an encoder-decoder model by applying contrastive learning on the decoder-generated aspect-aware sentiment representations of the masked terms. For fine-tuning the model weights thus obtained, we then propose a novel multi-task approach where the base encoder-decoder model is combined with two complementary modules, a tagging-based Opinion Term Detector, and a regression-based Triplet Count Estimator. Exhaustive experiments on four benchmark datasets and a detailed ablation study establish the importance of each of our proposed components as we achieve new state-of-the-art ASTE results.
    摘要 现有的方面情感三元组抽取(ASTE)研究主要集中在为该任务开发更高效的微调技术。而我们的动机是提出一种通用的方法,能够同时提升多个 ABSA 任务的下游性能。为此,我们提出了 CONTRASTE,一种基于对比学习的新预训练策略。我们主要关注 ASTE,但也证明了所提方法在 ACOS、TASD 和 AESC 等其他 ABSA 任务上的优势。给定一个句子及其相关的(方面、意见、情感)三元组,我们首先设计基于方面的提示,并将相应的情感词掩码。然后,我们对解码器生成的方面感知情感表示应用对比学习,以(预)训练一个编码器-解码器模型。在微调所得模型权重时,我们提出了一种新的多任务方法,将基础编码器-解码器模型与两个互补模块(基于标注的意见词检测器和基于回归的三元组数量估计器)相结合。在四个基准数据集上的大量实验和细致的消融研究证明了每个组件的重要性,我们取得了新的最先进 ASTE 结果。
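
For readers unfamiliar with the pre-training objective, the sketch below shows a generic supervised contrastive loss over labelled embeddings, of the kind applied here to the decoder's aspect-aware sentiment representations. The prompting and masking steps are not reproduced, and the toy embeddings and labels are random stand-ins.

```python
# Generic supervised contrastive loss: same-label embeddings are pulled together.
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # L2-normalise embeddings
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                         # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    same = (labels[:, None] == labels[None, :]) & ~np.eye(len(z), dtype=bool)
    # average log-probability of positives for each anchor that has positives
    per_anchor = np.array([log_prob[i, same[i]].mean()
                           for i in range(len(z)) if same[i].any()])
    return -per_anchor.mean()

z = np.random.default_rng(0).normal(size=(6, 8))           # toy sentiment embeddings
labels = np.array([0, 0, 1, 1, 2, 2])                      # e.g. positive / negative / neutral
print(supcon_loss(z, labels))
```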

DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding

  • paper_url: http://arxiv.org/abs/2310.18359
  • repo_url: None
  • paper_authors: Xiao-Yu Guo, Yuan-Fang Li, Gholamreza Haffari
  • for: 本研究旨在检验社交智能benchmark datasets的准确性。
  • methods: 我们采用了一种全面的方法来研究社交智能benchmark datasets的准确性,包括对社交智能问题的分析和模型的评估。
  • results: 我们发现社交智能benchmark datasets存在严重的偏见,可以由一个moderately strong的语言模型学习到无 Context或问题的关系,并且我们提出了一个新的benchmark dataset(DeSIQ),它可以减少了原始社交智能benchmark datasets中的偏见。
    Abstract Social intelligence is essential for understanding and reasoning about human expressions, intents and interactions. One representative benchmark for its study is Social Intelligence Queries (Social-IQ), a dataset of multiple-choice questions on videos of complex social interactions. We define a comprehensive methodology to study the soundness of Social-IQ, as the soundness of such benchmark datasets is crucial to the investigation of the underlying research problem. Our analysis reveals that Social-IQ contains substantial biases, which can be exploited by a moderately strong language model to learn spurious correlations to achieve perfect performance without being given the context or even the question. We introduce DeSIQ, a new challenging dataset, constructed by applying simple perturbations to Social-IQ. Our empirical analysis shows DeSIQ significantly reduces the biases in the original Social-IQ dataset. Furthermore, we examine and shed light on the effect of model size, model style, learning settings, commonsense knowledge, and multi-modality on the new benchmark performance. Our new dataset, observations and findings open up important research questions for the study of social intelligence.
    摘要 社会智能是理解和推理人类表达、意图和互动的关键。其研究的一个代表性基准是社交智能问答(Social-IQ),这是一个针对复杂社交互动视频的多项选择题数据集。由于此类基准数据集的合理性对所研究问题至关重要,我们定义了一套全面的方法论来考察 Social-IQ 的合理性。我们的分析发现,Social-IQ 存在严重的偏差:一个中等强度的语言模型即使不提供上下文甚至问题,也可以利用这些偏差学到虚假关联,从而达到完美表现。我们引入了 DeSIQ,一个通过对 Social-IQ 进行简单扰动构建的新挑战性数据集。我们的实证分析表明,DeSIQ 显著减少了原 Social-IQ 数据集中的偏差。此外,我们还考察了模型大小、模型类型、学习设置、常识知识和多模态对新基准性能的影响。我们的新数据集、观察和发现为社会智能研究开启了重要的研究问题。

SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation

  • paper_url: http://arxiv.org/abs/2310.15539
  • repo_url: https://github.com/sade-adrien/stelocoder
  • paper_authors: Jialing Pan, Adrien Sadé, Jin Kim, Eric Soriano, Guillem Sole, Sylvain Flamant
  • for: 这个论文的目的是提出一个基于 StarCoder 的 Decoder-only LLM,用于多种程式语言到 Python 的程式码转换。
  • methods: 这个模型使用 Mixture-of-Experts 技术和 Low-Rank Adaptive Method,将 StarCoder 模型扩展为多任务处理。
  • results: 这个模型在 XLCoST 数据集上实现了73.76 CodeBLEU 分数,比领先者至少高出3.5。
    Abstract With the recent focus on Large Language Models (LLMs), both StarCoder (Li et al., 2023) and Code Llama (Rozi\`ere et al., 2023) have demonstrated remarkable performance in code generation. However, there is still a need for improvement in code translation functionality with efficient training techniques. In response to this, we introduce SteloCoder, a decoder-only StarCoder-based LLM designed specifically for multi-programming language-to-Python code translation. In particular, SteloCoder achieves C++, C#, JavaScript, Java, or PHP-to-Python code translation without specifying the input programming language. We modified StarCoder model architecture by incorporating a Mixture-of-Experts (MoE) technique featuring five experts and a gating network for multi-task handling. Experts are obtained by StarCoder fine-tuning. Specifically, we use a Low-Rank Adaptive Method (LoRA) technique, limiting each expert size as only 0.06% of number of StarCoder's parameters. At the same time, to enhance training efficiency in terms of time, we adopt curriculum learning strategy and use self-instruct data for efficient fine-tuning. As a result, each expert takes only 6 hours to train on one single 80Gb A100 HBM. With experiments on XLCoST datasets, SteloCoder achieves an average of 73.76 CodeBLEU score in multi-programming language-to-Python translation, surpassing the top performance from the leaderboard by at least 3.5. This accomplishment is attributed to only 45M extra parameters with StarCoder as the backbone and 32 hours of valid training on one 80GB A100 HBM. The source code is release here: https://github.com/sade-adrien/SteloCoder.
    摘要 Recently, there has been a focus on Large Language Models (LLMs), and both StarCoder (Li et al., 2023) and Code Llama (Rozi`ere et al., 2023) have shown impressive performance in code generation. However, there is still room for improvement in code translation functionality with efficient training techniques. In response, we introduce SteloCoder, a decoder-only StarCoder-based LLM designed specifically for multi-programming language-to-Python code translation. Specifically, SteloCoder can translate C++, C#, JavaScript, Java, or PHP code to Python without specifying the input programming language. We modified the StarCoder model architecture by incorporating a Mixture-of-Experts (MoE) technique with five experts and a gating network for multi-task handling. The experts are obtained by fine-tuning StarCoder. We use a Low-Rank Adaptive Method (LoRA) technique to limit each expert size to only 0.06% of the number of StarCoder's parameters. To improve training efficiency in terms of time, we adopt a curriculum learning strategy and use self-instruct data for efficient fine-tuning. As a result, each expert can be trained on one single 80Gb A100 HBM in just 6 hours. With experiments on XLCoST datasets, SteloCoder achieves an average CodeBLEU score of 73.76 in multi-programming language-to-Python translation, surpassing the top performance from the leaderboard by at least 3.5. This accomplishment is attributed to only 45M extra parameters with StarCoder as the backbone and 32 hours of valid training on one 80GB A100 HBM. The source code is available at: https://github.com/sade-adrien/SteloCoder.
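
Two of the ingredients named above, LoRA-sized experts and a gating network, can be illustrated with a few lines of NumPy. Shapes, initialisation, and the top-1 routing rule below are illustrative assumptions, not SteloCoder's exact configuration.

```python
# Sketch: a frozen base weight, several tiny LoRA experts, and a gating network.
import numpy as np

rng = np.random.default_rng(0)
d, r, n_experts = 64, 4, 5
W_frozen = rng.normal(scale=0.02, size=(d, d))            # shared base weight (kept frozen)

# Each expert only stores two small matrices A (r x d) and B (d x r).
experts = [{"A": rng.normal(scale=0.02, size=(r, d)),
            "B": np.zeros((d, r))} for _ in range(n_experts)]
W_gate = rng.normal(scale=0.02, size=(n_experts, d))       # simple linear gating network

def forward(x, alpha=8.0):
    gate_logits = W_gate @ x
    e = int(np.argmax(gate_logits))                         # route to a single expert
    delta = (alpha / r) * (experts[e]["B"] @ experts[e]["A"])   # LoRA update: B @ A
    return (W_frozen + delta) @ x, e

y, chosen = forward(rng.normal(size=d))
print(y.shape, "expert", chosen)                            # (64,) expert k
```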

Generative and Contrastive Paradigms Are Complementary for Graph Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.15523
  • repo_url: https://github.com/wyx11112/GCMAE
  • paper_authors: Yuxiang Wang, Xiao Yan, Chuang Hu, Fangcheng Fu, Wentao Zhang, Hao Wang, Shuo Shang, Jiawei Jiang
  • for: 本研究旨在统一图自监督学习中生成式与对比式两种范式,即将图对比学习(CL)与掩码自编码器(MAE)结合到同一框架中,以提高图自监督学习的性能。
  • methods: 本研究提出了名为图对比掩码自编码器(GCMAE)的统一框架,其包括一个 MAE 分支和一个 CL 分支,两个分支共享一个编码器,使 MAE 分支能够利用 CL 分支提取的全局信息。此外,为了让 GCMAE 捕捉全局图结构,研究者训练其重建整个邻接矩阵,而不是像现有工作那样只重建被掩码的边。
  • results: 研究者在四个常见的图任务(即节点分类、节点聚类、链接预测和图分类)上进行了测试,并与14种最先进的基线进行了比较。结果显示,GCMAE 在这些任务上均取得了良好的准确率,相比表现最好的基线,最大准确率提升达3.2%。
    Abstract For graph self-supervised learning (GSSL), masked autoencoder (MAE) follows the generative paradigm and learns to reconstruct masked graph edges or node features. Contrastive Learning (CL) maximizes the similarity between augmented views of the same graph and is widely used for GSSL. However, MAE and CL are considered separately in existing works for GSSL. We observe that the MAE and CL paradigms are complementary and propose the graph contrastive masked autoencoder (GCMAE) framework to unify them. Specifically, by focusing on local edges or node features, MAE cannot capture global information of the graph and is sensitive to particular edges and features. On the contrary, CL excels in extracting global information because it considers the relation between graphs. As such, we equip GCMAE with an MAE branch and a CL branch, and the two branches share a common encoder, which allows the MAE branch to exploit the global information extracted by the CL branch. To force GCMAE to capture global graph structures, we train it to reconstruct the entire adjacency matrix instead of only the masked edges as in existing works. Moreover, a discrimination loss is proposed for feature reconstruction, which improves the disparity between node embeddings rather than reducing the reconstruction error to tackle the feature smoothing problem of MAE. We evaluate GCMAE on four popular graph tasks (i.e., node classification, node clustering, link prediction, and graph classification) and compare with 14 state-of-the-art baselines. The results show that GCMAE consistently provides good accuracy across these tasks, and the maximum accuracy improvement is up to 3.2% compared with the best-performing baseline.
    摘要 在图自监督学习(GSSL)中,掩码自编码器(MAE)遵循生成式范式,学习重建被掩码的图边或节点特征;对比学习(CL)最大化同一图的不同增广视图之间的相似度,也被广泛用于 GSSL。然而,现有工作将 MAE 与 CL 分开考虑。我们观察到 MAE 与 CL 两种范式是互补的,并提出图对比掩码自编码器(GCMAE)框架来统一它们。具体来说,MAE 只关注局部的边或节点特征,无法捕捉图的全局信息,并且对特定的边和特征较为敏感;相反,CL 由于考虑图之间的关系,擅长提取全局信息。因此,我们为 GCMAE 配备一个 MAE 分支和一个 CL 分支,两个分支共享同一个编码器,使 MAE 分支能够利用 CL 分支提取的全局信息。为了迫使 GCMAE 捕捉全局图结构,我们训练它重建整个邻接矩阵,而不是像现有工作那样只重建被掩码的边。此外,我们还为特征重建提出了一种判别损失,通过增大节点嵌入之间的差异而非单纯降低重建误差,来缓解 MAE 的特征平滑问题。我们在四个常见的图任务(即节点分类、节点聚类、链接预测和图分类)上评估 GCMAE,并与 14 个最先进的基线进行比较。结果显示,GCMAE 在这些任务上均取得了良好的准确率,相比表现最好的基线,最大准确率提升达 3.2%。

KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval

  • paper_url: http://arxiv.org/abs/2310.15511
  • repo_url: None
  • paper_authors: Marah I Abdin, Suriya Gunasekar, Varun Chandrasekaran, Jerry Li, Mert Yuksekgonul, Rahee Ghosh Peshawaria, Ranjita Naik, Besmira Nushi
  • for: 本研究旨在探讨现代模型是否能够回答信息检索查询(如“ san diego 的冰淇淋店”),以及这类查询是否只能通过网络搜索或知识库解决。
  • methods: 本研究使用了现代语言模型(LLMs)来初步研究这类查询的能力。
  • results: 研究发现,许多现有的检索标准不能够评估模型的约束满足能力,而且现有的检索数据集也存在许多问题。研究者们提供了一个新的数据集(KITAB),以及一种相关的动态数据收集和约束验证方法,以便在其他作者的数据上进行类似的测试。经过扩展的实验表明,在不含上下文的情况下,模型会表现出严重的局限性,包括不相关信息、错误信息和不完整性,这些局限性随着信息的流行程度减少而加剧。
    Abstract We study the ability of state-of-the art models to answer constraint satisfaction queries for information retrieval (e.g., 'a list of ice cream shops in San Diego'). In the past, such queries were considered to be tasks that could only be solved via web-search or knowledge bases. More recently, large language models (LLMs) have demonstrated initial emergent abilities in this task. However, many current retrieval benchmarks are either saturated or do not measure constraint satisfaction. Motivated by rising concerns around factual incorrectness and hallucinations of LLMs, we present KITAB, a new dataset for measuring constraint satisfaction abilities of language models. KITAB consists of book-related data across more than 600 authors and 13,000 queries, and also offers an associated dynamic data collection and constraint verification approach for acquiring similar test data for other authors. Our extended experiments on GPT4 and GPT3.5 characterize and decouple common failure modes across dimensions such as information popularity, constraint types, and context availability. Results show that in the absence of context, models exhibit severe limitations as measured by irrelevant information, factual errors, and incompleteness, many of which exacerbate as information popularity decreases. While context availability mitigates irrelevant information, it is not helpful for satisfying constraints, identifying fundamental barriers to constraint satisfaction. We open source our contributions to foster further research on improving constraint satisfaction abilities of future models.
    摘要 我们研究现代模型对信息检索查询(如“圣地亚哥的冰淇淋店列表”)的能力。在过去,这些查询被视为只能通过网络搜索或知识库解决的任务。然而,大型自然语言模型(LLM)在这个任务中已经表现出初步的能力。然而,现有的检索标准 benchmark 中有很多不够或者不能测试制约能力的问题。为了解决这些问题,我们提出了 KITAB,一个新的数据集,用于测试语言模型的制约能力。KITAB 包含了More than 600 作者和 13,000 个查询,并且提供了一种相关的动态数据收集和制约验证方法,用于获取类似的测试数据 для其他作者。我们在 GPT4 和 GPT3.5 上进行了扩展的实验,用于描述和解除不同维度的失败模式,包括信息受欢迎程度、制约类型和上下文可用性。结果表明,在没有上下文时,模型会表现出严重的限制,包括无关信息、错误信息和不完整性,这些问题在信息受欢迎程度下降时加剧。然而,上下文可用性可以减少无关信息,但是不能满足制约。我们将我们的贡献开源,以促进未来模型的制约能力改进。
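
The constraint-satisfaction evaluation amounts to checking each item a model returns against a ground-truth catalogue filtered by the query's constraint. A simplified checker in that spirit (with a made-up catalogue, not KITAB data) is shown below; it surfaces the same failure modes the paper measures: irrelevant or hallucinated items and missed items (incompleteness).

```python
# Check a model's answer to "books by author X whose titles satisfy a constraint".
catalogue = {"Jane Doe": {"The Quiet Harbor", "Tides of Glass", "A Long Winter"}}

def check_answer(author, constraint, model_items):
    truth = {t for t in catalogue.get(author, set()) if constraint(t)}
    model_items = set(model_items)
    return {
        "satisfied": sorted(model_items & truth),               # correct, constraint-satisfying
        "unsatisfied_or_hallucinated": sorted(model_items - truth),
        "missed": sorted(truth - model_items),                   # completeness errors
    }

report = check_answer("Jane Doe", lambda t: t.startswith("T"),
                      ["Tides of Glass", "A Long Winter", "The Silent Sea"])
print(report)
```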

Robust Representation Learning for Unified Online Top-K Recommendation

  • paper_url: http://arxiv.org/abs/2310.15492
  • repo_url: None
  • paper_authors: Minfang Lu, Yuchen Jiang, Huihui Dong, Qi Li, Ziru Xu, Yuanlin Liu, Lixia Wu, Haoyuan Hu, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng
  • For: The paper focuses on improving the efficiency of online recommendation systems in large-scale industrial e-commerce, particularly in delivering highly relevant item/content advertising that caters to diverse business scenarios.
  • Methods: The proposed method employs robust representation learning, including domain adversarial learning and multi-view Wasserstein distribution learning, to learn robust representations that ensure data fairness. The method balances conflicting objectives through homoscedastic uncertainty weights and orthogonality constraints.
  • Results: The proposed method is effective in tackling the challenges of multi-domain matching and retrieving top-k advertisements from multi-entity advertisements across different domains. Various experiments validate the effectiveness and rationality of the proposed method, which has been successfully deployed online to serve real business scenarios.
    Abstract In large-scale industrial e-commerce, the efficiency of an online recommendation system is crucial in delivering highly relevant item/content advertising that caters to diverse business scenarios. However, most existing studies focus solely on item advertising, neglecting the significance of content advertising. This oversight results in inconsistencies within the multi-entity structure and unfair retrieval. Furthermore, the challenge of retrieving top-k advertisements from multi-entity advertisements across different domains adds to the complexity. Recent research proves that user-entity behaviors within different domains exhibit characteristics of differentiation and homogeneity. Therefore, the multi-domain matching models typically rely on the hybrid-experts framework with domain-invariant and domain-specific representations. Unfortunately, most approaches primarily focus on optimizing the combination mode of different experts, failing to address the inherent difficulty in optimizing the expert modules themselves. The existence of redundant information across different domains introduces interference and competition among experts, while the distinct learning objectives of each domain lead to varying optimization challenges among experts. To tackle these issues, we propose robust representation learning for the unified online top-k recommendation. Our approach constructs unified modeling in entity space to ensure data fairness. The robust representation learning employs domain adversarial learning and multi-view wasserstein distribution learning to learn robust representations. Moreover, the proposed method balances conflicting objectives through the homoscedastic uncertainty weights and orthogonality constraints. Various experiments validate the effectiveness and rationality of our proposed method, which has been successfully deployed online to serve real business scenarios.
    摘要 大规模工业电商中,在线推荐系统的效率是关键,可以帮助提供高度相关的商品/内容广告,满足不同的业务场景。然而,大多数现有研究仅关注Item广告,忽略了内容广告的重要性。这种忽略会导致多元结构中的不一致性和不公正性。另外,从多个Domains中检索Top-k广告的挑战更加复杂。现有研究表明,用户-Entity行为在不同Domains中具有差异和同一性特征。因此,多Domain匹配模型通常采用Hybrid-Experts框架,并使用域不同和域相同的表示。然而,大多数方法主要关注多个专家的组合方式优化,而忽略了专家模块的内在困难。在不同Domains中存在重复信息的问题会导致专家之间的竞争和干扰,而每个Domain的学习目标也会导致专家模块的优化挑战。为了解决这些问题,我们提出了robust表示学习方法,以确保数据公平。我们的方法在Entity空间建立了统一的模型,并使用域对抗学习和多视图沃氏分布学习来学习Robust表示。此外,我们的方法通过Homoscedastic不确定量和正交约束来均衡矛盾目标。经过多个实验 validate了我们提出的方法的有效性和合理性,并在实际业务场景中成功部署。
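
One of the balancing mechanisms mentioned above, homoscedastic uncertainty weighting, is a known multi-task trick in which each objective is scaled by a learnable log-variance. The PyTorch sketch below shows that generic formulation with placeholder task losses; it is not the paper's implementation.

```python
# Homoscedastic-uncertainty weighting: total = sum_i exp(-s_i) * L_i + s_i,
# with one learnable log-variance s_i per objective.
import torch

log_vars = torch.zeros(3, requires_grad=True)          # one per objective
opt = torch.optim.Adam([log_vars], lr=1e-2)

def weighted_total(task_losses):
    total = 0.0
    for L, s in zip(task_losses, log_vars):
        total = total + torch.exp(-s) * L + s
    return total

losses = [torch.tensor(1.3), torch.tensor(0.4), torch.tensor(2.1)]   # placeholder losses
for _ in range(100):
    opt.zero_grad()
    weighted_total(losses).backward()
    opt.step()
print(log_vars.detach())   # each log-variance settles near log(L_i), down-weighting larger losses
```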

NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA

  • paper_url: http://arxiv.org/abs/2310.15484
  • repo_url: https://github.com/mlvlab/nutrea
  • paper_authors: Hyeong Kyu Choi, Seunghun Lee, Jaewon Chu, Hyunwoo J. Kim
  • for: 本研究旨在提高多跳知识图问答(KGQA)任务的性能,使得可以使用自然语言问题来检索知识图(KG)中的节点。
  • methods: 我们提出了一种基于树搜索的图神经网络模型(GNN),即神经树搜索(NuTrea),它能更好地考虑知识图的全局上下文。我们采用一种探查尚未到达的子树区域的消息传递机制,以增强面向过去的嵌入。此外,我们还引入了关系频率-逆实体频率(RF-IEF)节点嵌入,以更好地刻画含义模糊的知识图节点。
  • results: 我们通过对三个主要多跳KGQA测试集进行实验,证明了我们的方法的普遍有效性。此外,我们还进行了广泛的分析,以证明我们的方法的表达能力和稳定性。总的来说,NuTrea 提供了一种可以使用自然语言问题来检索知识图中节点的强大工具。代码可以在https://github.com/mlvlab/NuTrea 上获取。
    Abstract Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves retrieving nodes from a knowledge graph (KG) to answer natural language questions. Recent GNN-based approaches formulate this task as a KG path searching problem, where messages are sequentially propagated from the seed node towards the answer nodes. However, these messages are past-oriented, and they do not consider the full KG context. To make matters worse, KG nodes often represent proper noun entities and are sometimes encrypted, being uninformative in selecting between paths. To address these problems, we propose Neural Tree Search (NuTrea), a tree search-based GNN model that incorporates the broader KG context. Our model adopts a message-passing scheme that probes the unreached subtree regions to boost the past-oriented embeddings. In addition, we introduce the Relation Frequency-Inverse Entity Frequency (RF-IEF) node embedding that considers the global KG context to better characterize ambiguous KG nodes. The general effectiveness of our approach is demonstrated through experiments on three major multi-hop KGQA benchmark datasets, and our extensive analyses further validate its expressiveness and robustness. Overall, NuTrea provides a powerful means to query the KG with complex natural language questions. Code is available at https://github.com/mlvlab/NuTrea.
    摘要 多跳知识图问答(KGQA)是通过检索知识图(KG)中的节点来回答自然语言问题的任务。近期基于 GNN 的方法将该任务视为知识图路径搜索问题,消息从种子节点向答案节点顺序传播。然而,这些消息是面向过去的,没有考虑完整的 KG 上下文。更糟的是,KG 节点往往表示专有名词实体,有时还经过编码,在路径选择时缺乏信息量。为了解决这些问题,我们提出了神经树搜索(NuTrea)模型,这是一种融入更广 KG 上下文的基于树搜索的 GNN 模型。该模型采用探查尚未到达的子树区域的消息传递机制,以增强面向过去的嵌入。此外,我们还引入了考虑全局 KG 上下文的关系频率-逆实体频率(RF-IEF)节点嵌入,以更好地刻画含义模糊的 KG 节点。我们在三个主要多跳 KGQA 基准数据集上的实验证明了方法的整体有效性,广泛的分析进一步验证了其表达力和鲁棒性。总之,NuTrea 为使用复杂自然语言问题查询 KG 提供了强大工具。代码可在 https://github.com/mlvlab/NuTrea 获取。
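
The abstract does not give the RF-IEF formula, so the sketch below only illustrates the idea by analogy with TF-IDF: relation frequency around a node, weighted by an inverse-entity-frequency term over the whole KG. The toy triples and the exact weighting are assumptions, not the paper's definition.

```python
# TF-IDF-style weighting of relations incident to a KG node (assumed formulation).
import math
from collections import Counter

# toy KG as (head, relation, tail) triples
triples = [
    ("alan_turing", "field", "computer_science"),
    ("alan_turing", "born_in", "london"),
    ("ada_lovelace", "field", "mathematics"),
    ("ada_lovelace", "born_in", "london"),
    ("grace_hopper", "field", "computer_science"),
]

def rf_ief(node):
    rels = [r for h, r, t in triples if node in (h, t)]
    rf = Counter(rels)
    n_entities = len({e for h, _, t in triples for e in (h, t)})
    weights = {}
    for r, f in rf.items():
        entities_with_r = {e for h, rr, t in triples if rr == r for e in (h, t)}
        ief = math.log(n_entities / len(entities_with_r))   # rarer relations weigh more
        weights[r] = (f / len(rels)) * ief
    return weights

print(rf_ief("london"))   # weights for relations incident to the node
```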

AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing

  • paper_url: http://arxiv.org/abs/2310.15479
  • repo_url: None
  • paper_authors: Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng
  • for: 本研究使用扩散模型生成 синтетиче表格数据,解决了表格数据生成中异构特征的问题。
  • methods: 我们使用 auto-encoder 架构来处理异构特征,并与现有的表格生成器进行比较。
  • results: 我们在 15 个公共数据集上进行了实验,发现我们的模型能够准确地捕捉特征之间的相关性,并在下游任务中表现良好。
    Abstract Diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language model, or speech synthesis. In this paper, we leverage the power of diffusion model for generating synthetic tabular data. The heterogeneous features in tabular data have been main obstacles in tabular data synthesis, and we tackle this problem by employing the auto-encoder architecture. When compared with the state-of-the-art tabular synthesizers, the resulting synthetic tables from our model show nice statistical fidelities to the real data, and perform well in downstream tasks for machine learning utilities. We conducted the experiments over 15 publicly available datasets. Notably, our model adeptly captures the correlations among features, which has been a long-standing challenge in tabular data synthesis. Our code is available upon request and will be publicly released if paper is accepted.
    摘要 扩散模型已成为现代机器学习许多子领域中合成数据生成的主要范式,包括计算机视觉、语言模型和语音合成等。在这篇论文中,我们利用扩散模型的能力来生成合成表格数据。表格数据中的异构特征一直是表格数据合成的主要障碍,我们通过采用自编码器架构来解决这一问题。与最先进的表格数据合成器相比,我们的模型生成的合成表格与真实数据保持了良好的统计一致性,并在面向机器学习应用的下游任务中表现出色。我们在15个公开可用的数据集上进行了实验。值得一提的是,我们的模型能够很好地捕捉特征之间的相关性,而这一直是表格数据合成中的长期挑战。我们的代码可应要求提供,若论文被接收将公开发布。

A Communication Theory Perspective on Prompting Engineering Methods for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.18358
  • repo_url: None
  • paper_authors: Yuanfeng Song, Yuanqin He, Xuefang Zhao, Hanlin Gu, Di Jiang, Haijun Yang, Lixin Fan, Qiang Yang
  • for: 本文旨在提供一种新的 perspective 来审查现有的 prompting 方法,并帮助读者更深入地理解现有的发展趋势。
  • methods: 本文使用了communication theory 框架来 illustrate 现有的 prompting 方法,并分析了四种典型任务的发展趋势。
  • results: 本文提出了一些可能的 future research directions ,可以帮助开发更好的 prompting 方法。
    Abstract The springing up of Large Language Models (LLMs) has shifted the community from single-task-orientated natural language processing (NLP) research to a holistic end-to-end multi-task learning paradigm. Along this line of research endeavors in the area, LLM-based prompting methods have attracted much attention, partially due to the technological advantages brought by prompt engineering (PE) as well as the underlying NLP principles disclosed by various prompting methods. Traditional supervised learning usually requires training a model based on labeled data and then making predictions. In contrast, PE methods directly use the powerful capabilities of existing LLMs (i.e., GPT-3 and GPT-4) via composing appropriate prompts, especially under few-shot or zero-shot scenarios. Facing the abundance of studies related to the prompting and the ever-evolving nature of this field, this article aims to (i) illustrate a novel perspective to review existing PE methods, within the well-established communication theory framework; (ii) facilitate a better/deeper understanding of developing trends of existing PE methods used in four typical tasks; (iii) shed light on promising research directions for future PE methods.
    摘要 大语言模型(LLM)的大量涌现使社区从面向单一任务的自然语言处理(NLP)研究转向整体的端到端多任务学习范式。在这一研究方向上,基于 LLM 的提示方法吸引了大量关注,部分原因在于提示工程(PE)带来的技术优势,以及各种提示方法所揭示的底层 NLP 原理。传统的有监督学习通常需要基于标注数据训练模型,然后进行预测;与之相反,PE 方法通过构造合适的提示,直接利用现有 LLM(如 GPT-3 和 GPT-4)的强大能力,尤其是在少样本或零样本场景下。面对与提示相关的大量研究以及该领域的不断演化,本文的目标是:(i)在成熟的通信理论框架下提出一种审视现有 PE 方法的新视角;(ii)促进对现有 PE 方法在四种典型任务中发展趋势的更深入理解;(iii)为未来 PE 方法指出有前景的研究方向。

Empowering Distributed Solutions in Renewable Energy Systems and Grid Optimization

  • paper_url: http://arxiv.org/abs/2310.15468
  • repo_url: None
  • paper_authors: Mohammad Mohammadi, Ali Mohammadi
  • for: 这项研究探讨了电力业务从中心化到分散式的转变,尤其是如何通过机器学习(ML)技术推动可再生能源和改善电网管理。
  • methods: 这项研究使用了各种机器学习模型,如人工神经网络、支持向量机和决策树,以预测可再生能源生产和消耗。此外,还使用了数据处理技术,如数据分割、 нормализация、分解和简化,以提高预测精度。
  • results: 该研究发现,通过将大数据和机器学习应用于智能电网,可以提高能源效率、更好地应对需求,并更好地 интеグрировать可再生能源资源。然而,还需要解决大量数据处理、保障网络安全和获得专业知识等挑战。
    Abstract This study delves into the shift from centralized to decentralized approaches in the electricity industry, with a particular focus on how machine learning (ML) advancements play a crucial role in empowering renewable energy sources and improving grid management. ML models have become increasingly important in predicting renewable energy generation and consumption, utilizing various techniques like artificial neural networks, support vector machines, and decision trees. Furthermore, data preprocessing methods, such as data splitting, normalization, decomposition, and discretization, are employed to enhance prediction accuracy. The incorporation of big data and ML into smart grids offers several advantages, including heightened energy efficiency, more effective responses to demand, and better integration of renewable energy sources. Nevertheless, challenges like handling large data volumes, ensuring cybersecurity, and obtaining specialized expertise must be addressed. The research investigates various ML applications within the realms of solar energy, wind energy, and electric distribution and storage, illustrating their potential to optimize energy systems. To sum up, this research demonstrates the evolving landscape of the electricity sector as it shifts from centralized to decentralized solutions through the application of ML innovations and distributed decision-making, ultimately shaping a more efficient and sustainable energy future.
    摘要 The integration of big data and ML into smart grids offers several advantages, including improved energy efficiency, more effective demand response, and better integration of renewable energy sources. However, challenges such as managing large data volumes, ensuring cybersecurity, and obtaining specialized expertise must be addressed.The research examines various ML applications within the realms of solar energy, wind energy, and electric distribution and storage, demonstrating their potential to optimize energy systems. For instance, ML can be used to predict solar and wind power output, optimize energy storage systems, and manage electricity distribution networks.Overall, this study illustrates the evolving landscape of the electricity sector as it shifts from centralized to decentralized solutions through the application of ML innovations and distributed decision-making, ultimately leading to a more efficient and sustainable energy future.

UI Layout Generation with LLMs Guided by UI Grammar

  • paper_url: http://arxiv.org/abs/2310.15455
  • repo_url: None
  • paper_authors: Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li
  • for: investigate the use of Large Language Models (LLMs) for UI layout generation
  • methods: propose a novel approach called UI grammar to represent the hierarchical structure of UI screens and guide the generative capacities of LLMs
  • results: initial experiments with GPT-4 showed promising capability of LLMs to produce high-quality user interfaces via in-context learning, and preliminary comparative study suggested the potential of the grammar-based approach in improving the quality of generative results.
  • for: 这份位置论文是要研究 Large Language Models (LLMs) 在用户界面 (UI) 布局生成方面的应用。
  • methods: 我们提出了一种新的方法,即 UI 语法,来表示 UI 画面中的层次结构,并将其用来引导 LLMs 的生成能力。
  • results: 我们的初步实验显示,使用 GPT-4 可以通过内容学习获得高质量的用户界面,而且我们的初步比较研究显示,语法基本方法在特定方面的生成结果质量上有潜在的改善。
    Abstract The recent advances in Large Language Models (LLMs) have stimulated interest among researchers and industry professionals, particularly in their application to tasks concerning mobile user interfaces (UIs). This position paper investigates the use of LLMs for UI layout generation. Central to our exploration is the introduction of UI grammar -- a novel approach we proposed to represent the hierarchical structure inherent in UI screens. The aim of this approach is to guide the generative capacities of LLMs more effectively and improve the explainability and controllability of the process. Initial experiments conducted with GPT-4 showed the promising capability of LLMs to produce high-quality user interfaces via in-context learning. Furthermore, our preliminary comparative study suggested the potential of the grammar-based approach in improving the quality of generative results in specific aspects.
    摘要 大语言模型(LLM)的最新进展激发了研究人员和业界人士的兴趣,尤其是将其应用于与移动用户界面(UI)相关的任务。本立场论文研究了使用 LLM 进行 UI 布局生成。我们探索的核心是引入 UI 语法——一种我们提出的用于表示 UI 界面固有层次结构的新方法,旨在更有效地引导 LLM 的生成能力,并提高生成过程的可解释性和可控性。使用 GPT-4 进行的初步实验显示,LLM 能够通过上下文学习生成高质量的用户界面。此外,我们的初步对比研究表明,基于语法的方法有潜力在特定方面提升生成结果的质量。
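
To make the idea of a UI grammar concrete, the sketch below defines a tiny set of production rules over screen containers and leaf widgets and checks a candidate layout against them; such a validator (or a generator constrained by the same rules) is one way grammar guidance could be wired around an LLM. The rules themselves are invented for illustration and are not the paper's grammar.

```python
# A toy UI grammar: production rules over containers and leaf widgets.
UI_GRAMMAR = {
    "SCREEN":  [["TOPBAR", "CONTENT", "NAVBAR"]],
    "TOPBAR":  [["IMAGE", "TEXT"], ["TEXT"]],
    "CONTENT": [["LIST"], ["CARD", "CARD"]],
    "CARD":    [["IMAGE", "TEXT", "BUTTON"]],
    "LIST":    [["TEXT_ITEM"], ["TEXT_ITEM", "TEXT_ITEM"]],
    "NAVBAR":  [["BUTTON", "BUTTON", "BUTTON"]],
}
LEAVES = {"IMAGE", "TEXT", "BUTTON", "TEXT_ITEM"}

def valid(node):
    """Check that a (symbol, children) tree follows some production of the grammar."""
    symbol, children = node
    if symbol in LEAVES:
        return children == []
    child_symbols = [c[0] for c in children]
    return any(child_symbols == rule for rule in UI_GRAMMAR.get(symbol, [])) \
        and all(valid(c) for c in children)

layout = ("SCREEN", [
    ("TOPBAR", [("TEXT", [])]),
    ("CONTENT", [("CARD", [("IMAGE", []), ("TEXT", []), ("BUTTON", [])]),
                 ("CARD", [("IMAGE", []), ("TEXT", []), ("BUTTON", [])])]),
    ("NAVBAR", [("BUTTON", []), ("BUTTON", []), ("BUTTON", [])]),
])
print(valid(layout))   # True
```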

PromptInfuser: How Tightly Coupling AI and UI Design Impacts Designers’ Workflows

  • paper_url: http://arxiv.org/abs/2310.15435
  • repo_url: None
  • paper_authors: Savvas Petridis, Michael Terry, Carrie J. Cai
  • for: 这篇研究探讨了如何将AI示范和UI设计联系起来,以提高设计师的工作效率和 prototype 的可信度。
  • methods: 研究人员开发了一个名为PromptInfuser的 Figma 插件,可以让设计师将 UI 元素与提示连接起来,实现半功能的 mockup。
  • results: 在14名设计师 participated 的研究中,PromptInfuser 被视为比现有的 AI 示范工作流程更有用,可以更好地传达产品想法,生成更加实际地表现出想像中的产品,并且更有效率地进行示范。
    Abstract Prototyping AI applications is notoriously difficult. While large language model (LLM) prompting has dramatically lowered the barriers to AI prototyping, designers are still prototyping AI functionality and UI separately. We investigate how coupling prompt and UI design affects designers' workflows. Grounding this research, we developed PromptInfuser, a Figma plugin that enables users to create semi-functional mockups, by connecting UI elements to the inputs and outputs of prompts. In a study with 14 designers, we compare PromptInfuser to designers' current AI-prototyping workflow. PromptInfuser was perceived to be significantly more useful for communicating product ideas, more capable of producing prototypes that realistically represent the envisioned artifact, more efficient for prototyping, and more helpful for anticipating UI issues and technical constraints. PromptInfuser encouraged iteration over prompt and UI together, which helped designers identify UI and prompt incompatibilities and reflect upon their total solution. Together, these findings inform future systems for prototyping AI applications.
    摘要 为 AI 应用构建原型非常困难。尽管大语言模型 (LLM) 提示大大降低了 AI 原型设计的门槛,设计师仍然在 AI 功能和 UI 设计之间进行分离的探索。我们研究了如何将提示和 UI 设计结合,对设计者的工作流程产生影响。为了实践这些研究,我们开发了 PromptInfuser,一款 Figma 插件,允许用户通过将 UI 元素连接到提示的输入和输出来创建半功能的 mockup。在 14 名设计者参与的研究中,我们比较了 PromptInfuser 和设计者当前的 AI 原型设计工作流程。结果显示,PromptInfuser 被视为更有用于传达产品想法,更能生成符合想像中的 artifact 的 mockup,更高效地进行原型制作,并更有助于预测 UI 问题和技术约束。PromptInfuser 促进了提示和 UI 的同步迭代,帮助设计者更好地了解他们的总解决方案,并反思提示和 UI 之间的不兼容性。这些发现可以帮助未来的 AI 原型设计系统。

ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles

  • paper_url: http://arxiv.org/abs/2310.15428
  • repo_url: None
  • paper_authors: Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry
  • for: 这个研究的目的是开发一种可以帮助用户通过自然语言反馈来调整LLM输出的工具,以便更好地控制chatbot的行为。
  • methods: 这个研究使用了用户反馈的自然语言来生成 constitution,以便用于控制chatbot的行为。
  • results: 研究发现用户可以使用ConstitutionMaker工具来更好地控制chatbot的行为,并且可以更加快速地将自己的反馈转换成明确的原则。此外,用户还可以使用这个工具来更好地表达自己的想法和反馈,并且可以更加有效地控制chatbot的行为。
    Abstract Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by helping them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found that users needed support converting their feedback into principles for the chatbot and (2) classified the different principle types desired by users. Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots. With ConstitutionMaker, users can provide either positive or negative feedback in natural language, select auto-generated feedback, or rewrite the chatbot's response; each mode of feedback automatically generates a principle that is inserted into the chatbot's prompt. In a user study with 14 participants, we compare ConstitutionMaker to an ablated version, where users write their own principles. With ConstitutionMaker, participants felt that their principles could better guide the chatbot, that they could more easily convert their feedback into principles, and that they could write principles more efficiently, with less mental demand. ConstitutionMaker helped users identify ways to improve the chatbot, formulate their intuitive responses to the model into feedback, and convert this feedback into specific and clear principles. Together, these findings inform future tools that support the interactive critiquing of LLM outputs.
    摘要 大语言模型(LLM)提示是一种有前途的新方法,允许用户创建和自定义自己的 chatbot。然而,现有的输出方法,如提示工程和精度调整,不支持用户将自然反馈转化为提示或模型更新。在这种工作中,我们explore了如何让用户通过反馈来精炼模型输出,并帮助他们将反馈转化为一组原则(即宪法),定义模型的行为。从一项形成研究中,我们发现:1. 用户需要支持将反馈转化为原则,以便控制 chatbot。2. 用户希望的不同原则类型,包括具体的问题和答案。基于这些发现,我们开发了宪法制作器(ConstitutionMaker),一种可供用户在自然语言反馈下,将反馈转化为原则。宪法制作器包括三种反馈模式:自然语言反馈、自动生成反馈和重写回复。每种反馈模式都会自动生成一个原则,并将其插入到 chatbot 的提示中。在14名参与者参与的用户研究中,我们比较了宪法制作器与一个减少版本,其中用户需要手动编写原则。与减少版本相比,参与者表示使用宪法制作器可以更好地控制 chatbot,更容易将反馈转化为原则,并可以更高效地编写原则,减少心理压力。宪法制作器帮助用户找到 chatbot 的问题,将自然反馈转化为可以更好地引导模型的反馈,并将反馈转化为具体和明确的原则。这些发现可以指导未来的工具,以支持 LLM 输出的交互批判。

LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery

  • paper_url: http://arxiv.org/abs/2310.18356
  • repo_url: None
  • paper_authors: Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, Luming Liang
  • for: 本研究目的是提出一种高效的语言模型结构剪裁方法,以降低大型语言模型的计算成本。
  • methods: 本方法首先生成了LoRA模块之间的依赖关系图,以发现最小 removable 结构并分析知识分布。然后,它进行了逐步结构剪裁 LoRA 适配器,并启用了内置的知识传递以更好地保留红利模块中的信息。
  • results: 经过numerical experiment,这种方法可以在只使用一个GPU内部的几个GPU天内减少大型语言模型的占用空间,并且只减少了1.0%的性能。此外,这种方法还可以至少20%的计算成本。
    Abstract Large Language Models (LLMs) have transformed the landscape of artificial intelligence, while their enormous size presents significant challenges in terms of computational costs. We introduce LoRAShear, a novel efficient approach to structurally prune LLMs and recover knowledge. Given general LLMs, LoRAShear at first creates the dependency graphs over LoRA modules to discover minimally removal structures and analyze the knowledge distribution. It then proceeds progressive structured pruning on LoRA adaptors and enables inherent knowledge transfer to better preserve the information in the redundant structures. To recover the lost knowledge during pruning, LoRAShear meticulously studies and proposes a dynamic fine-tuning schemes with dynamic data adaptors to effectively narrow down the performance gap to the full models. Numerical results demonstrate that by only using one GPU within a couple of GPU days, LoRAShear effectively reduced footprint of LLMs by 20% with only 1.0% performance degradation and significantly outperforms state-of-the-arts. The source code will be available at https://github.com/microsoft/lorashear.
    摘要 大型语言模型(LLMs)已经改变人工智能领域的景观,但它们的巨大体量也带来了计算成本的挑战。我们介绍LoRAShear,一种新的高效方法,可以对LLMs进行结构化剪枝并恢复知识。对于一般的LLMs,LoRAShear首先创建LoRA模组之间的依赖关系图以发现最小可移除结构,并分析知识的分布。然后,它对LoRA适配器进行渐进式结构化剪枝,并启用内置的知识转移,以更好地保留冗余结构中的信息。为了恢复在剪枝过程中遗失的知识,LoRAShear详细研究并提出了带有动态数据适配器的动态微调方案,以有效地缩小与完整模型之间的性能差距。数值结果表明,仅使用一个GPU,在几个GPU天的时间内,LoRAShear成功地将LLMs的占用空间缩减了20%,仅带来1.0%的性能损失,并明显优于最先进的方法。源代码将会在https://github.com/microsoft/lorashear上公开。
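
A minimal sketch of the pruning step described above, with an assumed importance proxy (the Frobenius norm of the effective LoRA update B·A); LoRAShear's actual criterion comes from its dependency-graph analysis and progressive structured pruning, which are not reproduced here.

```python
import torch

def lora_importance(lora_A, lora_B):
    # Assumed proxy: Frobenius norm of the effective low-rank update B @ A.
    return torch.linalg.norm(lora_B @ lora_A).item()

def prune_lora_modules(modules, keep_ratio=0.8):
    # modules: name -> (lora_A, lora_B); keep the highest-scoring fraction.
    ranked = sorted(modules.items(), key=lambda kv: lora_importance(*kv[1]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return dict(ranked[:n_keep])

# toy usage: 4 rank-8 adapters on a 64-dimensional layer
mods = {f"layer{i}.attn": (torch.randn(8, 64), torch.randn(64, 8)) for i in range(4)}
print(list(prune_lora_modules(mods, keep_ratio=0.5)))
```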

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

  • paper_url: http://arxiv.org/abs/2310.15421
  • repo_url: https://github.com/skywalker023/fantom
  • paper_authors: Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Ronan Le Bras, Gunhee Kim, Yejin Choi, Maarten Sap
  • for: 这个论文是为了测试理解人类思维的语言模型(LLM)。
  • methods: 该论文使用了一个新的benchmark方法called FANToM,通过问答测试LLM的理解人类思维的能力。
  • results: 研究发现,现状的LLM模型在Answering questions时表现较差,与人类的思维能力相比。
    Abstract Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering. Our benchmark draws upon important theoretical requisites from psychology and necessary empirical considerations when evaluating large language models (LLMs). In particular, we formulate multiple types of questions that demand the same underlying reasoning to identify illusory or false sense of ToM capabilities in LLMs. We show that FANToM is challenging for state-of-the-art LLMs, which perform significantly worse than humans even with chain-of-thought reasoning or fine-tuning.
    摘要 theory of mind (ToM) 评估目前主要是通过使用无反应的故事进行测试,这些故事自然lack interactivity。我们介绍了FANToM,一个新的 benchmark,用于在信息不均衡对话上测试 ToM,通过问答来强制测试。我们的 benchmark drew upon 心理理论中的重要需求和评估大语言模型 (LLMs) 中必要的实证考虑。特别是,我们设计了多种问题,需要同样的理解来识别LLMs中的幻觉或false sense of ToM能力。我们显示了FANToM是现状顶尖 LLMs 所表现出 significatively worse than humans,包括链式思维或细化。

Fractal Landscapes in Policy Optimization

  • paper_url: http://arxiv.org/abs/2310.15418
  • repo_url: None
  • paper_authors: Tao Wang, Sylvia Herbert, Sicun Gao
  • for: 本研究旨在探讨policy gradient方法在连续域深度强化学习控制问题中的一种内在限制。
  • methods: 本研究借助混沌理论(chaos theory)和非光滑分析的技术,分析了策略优化目标函数的最大李雅普诺夫指数(maximal Lyapunov exponents)和赫尔德指数(Hölder exponents)。
  • results: 研究发现,对于某些类型的MDP,策略优化的优化地形可能极其非光滑甚至呈分形,以致根本不存在可供估计的梯度;研究还提出了一种从样本估计目标函数局部光滑度的实用方法,并通过实验证明了策略优化的一些失败案例可以由这类分形地形来解释。
    Abstract Policy gradient lies at the core of deep reinforcement learning (RL) in continuous domains. Despite much success, it is often observed in practice that RL training with policy gradient can fail for many reasons, even on standard control problems with known solutions. We propose a framework for understanding one inherent limitation of the policy gradient approach: the optimization landscape in the policy space can be extremely non-smooth or fractal for certain classes of MDPs, such that there does not exist gradient to be estimated in the first place. We draw on techniques from chaos theory and non-smooth analysis, and analyze the maximal Lyapunov exponents and H\"older exponents of the policy optimization objectives. Moreover, we develop a practical method that can estimate the local smoothness of objective function from samples to identify when the training process has encountered fractal landscapes. We show experiments to illustrate how some failure cases of policy optimization can be explained by such fractal landscapes.
    摘要 政策梯度位于深度强化学习(RL)的核心。尽管已取得许多成功,但在实践中经常可以观察到,即使在已有已知解决方案的标准控制问题上,基于政策梯度的 RL 训练仍可能因多种原因失败。我们提出了一种框架来理解政策梯度方法的一个内在限制:对于某些类型的马尔可夫决策过程 (MDP),策略空间中的优化景观可能极其非光滑或呈分形,以致根本不存在可供估计的梯度。我们借助混沌理论与非光滑分析的技术,分析了策略优化目标的最大李雅普诺夫指数和赫尔德指数。此外,我们还提出了一种可以从样本中估计目标函数局部光滑度的实用方法,用于识别训练过程何时遇到分形景观。我们通过实验说明了策略优化的一些失败案例可以由这类分形景观来解释。

Nominality Score Conditioned Time Series Anomaly Detection by Point/Sequential Reconstruction

  • paper_url: http://arxiv.org/abs/2310.15416
  • repo_url: https://github.com/andrewlai61616/npsr
  • paper_authors: Chih-Yu Lai, Fan-Keng Sun, Zhengqi Gao, Jeffrey H. Lang, Duane S. Boning
  • for: 本研究旨在提出一种基于点 reconstruction 和序列 reconstruction 的无监测时间序异常检测方法,以解决时间序异常检测中的复杂性和多样性问题。
  • methods: 本研究提出了一个点 reconstruction 模型和一个序列 reconstruction 模型,用于检测点异常和Contextual异常。Point reconstruction 模型使用一个点异常检测器来评估点异常,而序列 reconstruction 模型使用一个序列异常检测器来评估点和Contextual异常。
  • results: 对于several public datasets进行了广泛的实验研究,结果表明,提出的方法在时间序异常检测中比大多数现有的基准方法表现更好。
    Abstract Time series anomaly detection is challenging due to the complexity and variety of patterns that can occur. One major difficulty arises from modeling time-dependent relationships to find contextual anomalies while maintaining detection accuracy for point anomalies. In this paper, we propose a framework for unsupervised time series anomaly detection that utilizes point-based and sequence-based reconstruction models. The point-based model attempts to quantify point anomalies, and the sequence-based model attempts to quantify both point and contextual anomalies. Under the formulation that the observed time point is a two-stage deviated value from a nominal time point, we introduce a nominality score calculated from the ratio of a combined value of the reconstruction errors. We derive an induced anomaly score by further integrating the nominality score and anomaly score, then theoretically prove the superiority of the induced anomaly score over the original anomaly score under certain conditions. Extensive studies conducted on several public datasets show that the proposed framework outperforms most state-of-the-art baselines for time series anomaly detection.
    摘要 时间序列异常检测是因为复杂多变的异常模式而具有挑战性。一个主要困难在于模型时间相关关系以确定上下文异常,同时保持点异常检测的准确性。在这篇论文中,我们提出一种无监督时间序列异常检测框架,该框架利用点基模型和序列基模型来进行重建。点基模型尝试量化点异常,而序列基模型尝试量化点和上下文异常。我们在认为观察到的时间点为假值的两个阶段偏移后,引入了一个 Nominality 分数,该分数来自重建错误的合计值。我们还提出了一个潜在异常分数,并经过理论证明其在某些条件下超过原始异常分数的优越性。我们在多个公共数据集上进行了广泛的研究,并证明了我们的框架在大多数现状监督模型的基础上显著超越。
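
The following numpy sketch illustrates the scoring idea under stated assumptions (the exact combination used in the paper differs): a nominality score derived from the ratio of the two models' reconstruction errors, and an induced anomaly score that modulates the base score by nearby nominality.

```python
import numpy as np

def npsr_style_scores(err_point, err_seq, window=5, eps=1e-8):
    # err_point / err_seq: per-timestep reconstruction errors of the
    # point-based and sequence-based models (assumed inputs).
    anomaly = err_seq                                   # base anomaly score
    nominality = (err_point + eps) / (err_seq + eps)    # assumed ratio form
    kernel = np.ones(window) / window
    local_nominality = np.convolve(nominality, kernel, mode="same")
    induced = anomaly / (local_nominality + eps)        # assumed integration step
    return nominality, induced

# toy usage on random errors
nom, ind = npsr_style_scores(np.abs(np.random.randn(200)), np.abs(np.random.randn(200)))
print(ind[:5])
```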

Mind the Gap Between Conversations for Improved Long-Term Dialogue Generation

  • paper_url: http://arxiv.org/abs/2310.15415
  • repo_url: https://github.com/qzx7/mindthetime
  • paper_authors: Qiang Zhang, Jason Naradowsky, Yusuke Miyao
  • for: 这篇论文目的是让对话模型意识到时间的概念,并在不同的时间间隔下进行对话。
  • methods: 作者提出了一个名为GapChat的多会话对话集,其中每个会话之间的时间间隔不同。模型在接受时间信息后,对时间和事件进展进行了不同的表示。
  • results: 人工评估表明,在评价对话 relevance 和信息吸收方面,意识时间的模型表现更好。
    Abstract Knowing how to end and resume conversations over time is a natural part of communication, allowing for discussions to span weeks, months, or years. The duration of gaps between conversations dictates which topics are relevant and which questions to ask, and dialogue systems which do not explicitly model time may generate responses that are unnatural. In this work we explore the idea of making dialogue models aware of time, and present GapChat, a multi-session dialogue dataset in which the time between each session varies. While the dataset is constructed in real-time, progress on events in speakers' lives is simulated in order to create realistic dialogues occurring across a long timespan. We expose time information to the model and compare different representations of time and event progress. In human evaluation we show that time-aware models perform better in metrics that judge the relevance of the chosen topics and the information gained from the conversation.
    摘要 知道如何结束和续续会话是通信的自然部分,允许对话 span 周月年。对话系统不显式考虑时间可能生成不自然的响应。在这项工作中,我们探讨将对话模型意识到时间的想法,并提出了 GapChat,一个多期对话集。在这个数据集中,每个会话之间的时间间隔不同。虽然数据集是在实时构建的,但speakers的生活进程的进步是通过模拟来创建真实的对话发生在长时间间隔。我们暴露了时间信息给模型,并比较了不同的时间和事件进度表示。在人类评估中,我们显示时间意识的模型在评估对话中话题的相关性和获得的信息的 метриках中表现出色。
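
A small sketch of one way to expose the elapsed time between sessions to the model, as the abstract suggests; the marker format below is an assumption, not the GapChat format.

```python
def format_multi_session_context(sessions, gaps):
    # sessions: list of lists of utterances; gaps: human-readable time gaps
    # between consecutive sessions (len(gaps) == len(sessions) - 1).
    parts = []
    for i, session in enumerate(sessions):
        if i > 0:
            parts.append(f"[TIME ELAPSED: {gaps[i - 1]}]")
        parts.extend(session)
    return "\n".join(parts)

print(format_multi_session_context(
    [["A: I start a new job tomorrow.", "B: Good luck!"],
     ["A: The first month went really well."]],
    ["1 month"]))
```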

Diverse Conventions for Human-AI Collaboration

  • paper_url: http://arxiv.org/abs/2310.15414
  • repo_url: https://github.com/Stanford-ILIAD/Diverse-Conventions
  • paper_authors: Bidipta Sarkar, Andy Shih, Dorsa Sadigh
  • for: 提高多代理人游戏中的合作性和多样性,使得玩家可以协调共享策略而不需要显式交流。
  • methods: 使用自适应奖励学习和权衡策略来优化合作策略,并通过权衡策略和前一次发现的策略之间的冲突来生成多样的协议。
  • results: 在多种多样的合作游戏中,包括Overcooked,技术可以超越人类水平,并且能够适应人类的协议。
    Abstract Conventions are crucial for strong performance in cooperative multi-agent games, because they allow players to coordinate on a shared strategy without explicit communication. Unfortunately, standard multi-agent reinforcement learning techniques, such as self-play, converge to conventions that are arbitrary and non-diverse, leading to poor generalization when interacting with new partners. In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), stimulating conventions to be semantically different. To ensure that learned policies act in good faith despite the adversarial optimization of cross-play, we introduce \emph{mixed-play}, where an initial state is randomly generated by sampling self-play and cross-play transitions and the player learns to maximize the self-play reward from this initial state. We analyze the benefits of our technique on various multi-agent collaborative games, including Overcooked, and find that our technique can adapt to the conventions of humans, surpassing human-level performance when paired with real users.
    摘要 合作多代理游戏中,共谱是关键,它允许玩家在不Explicit Communication的情况下协调共同策略。然而,标准的多代理学习技术,如自我玩家,会导致不多样化的共谱,这会导致与新合作伙伴交互时的泛化性强度不高。在这项工作中,我们提出了一种技术来生成多样化的共谱,通过(1)在自我玩家中最大化奖励,而(2)在已经发现的共谱中最小化奖励,以便使共谱具有semantically different的特征。为确保学习的策略在恶性优化的cross-play中保持良好的行为,我们引入了杂合玩家(mixed-play),其中,初始状态由自我玩家和cross-play转移中随机选择,并且玩家学习从这个初始状态中 maximize自我玩家奖励。我们对多种多代理合作游戏,包括Overcooked,进行分析,发现我们的技术可以适应人类的共谱,并且在与真实用户配对时超过人类水平的性能。
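
A schematic of the training signal described above, with hypothetical return estimators; the actual method additionally uses mixed-play rollouts (initial states sampled from self-play and cross-play transitions) to keep policies acting in good faith.

```python
def diverse_convention_objective(policy, prev_conventions,
                                 self_play_return, cross_play_return, lam=1.0):
    # self_play_return(pi) and cross_play_return(pi, pj) are hypothetical
    # evaluators that roll out episodes and average the team reward.
    sp = self_play_return(policy)
    xp = sum(cross_play_return(policy, prev) for prev in prev_conventions)
    # maximize own coordination, minimize compatibility with earlier conventions
    return sp - lam * xp

# toy usage with stand-in evaluators
score = diverse_convention_objective(
    "pi_new", ["pi_0", "pi_1"],
    self_play_return=lambda pi: 10.0,
    cross_play_return=lambda pi, pj: 3.0)
print(score)  # 10 - 1.0 * (3 + 3) = 4.0
```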

cs.CL - 2023-10-24

GlotLID: Language Identification for Low-Resource Languages

  • paper_url: http://arxiv.org/abs/2310.16248
  • repo_url: https://github.com/cisnlp/glotsparse
  • paper_authors: Amir Hossein Kargaran, Ayyoob Imani, François Yvon, Hinrich Schütze
  • for: 本研究的目的是提供一个可靠、高效的语言识别模型,以推动低资源语言的人工智能技术的普及和提高。
  • methods: 本研究使用了一种基于对照的语言识别模型,并运用了一些特殊的技术来解决低资源语言的问题,例如:错误的资料metadata、高资源语言的泄漏、等等。
  • results: 本研究的结果显示,GlotLID-M模型能够实现与先前的模型比较的高度匹配,并且在调整F1和错误率之间取得平衡。此外,研究也显示了一些低资源语言的特殊挑战,例如:metadata错误、语言泄漏、等等。
    Abstract Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages. However, there is no LID available that (i) covers a wide range of low-resource languages, (ii) is rigorously evaluated and reliable and (iii) efficient and easy to use. Here, we publish GlotLID-M, an LID model that satisfies the desiderata of wide coverage, reliability and efficiency. It identifies 1665 languages, a large increase in coverage compared to prior work. In our experiments, GlotLID-M outperforms four baselines (CLD3, FT176, OpenLID and NLLB) when balancing F1 and false positive rate (FPR). We analyze the unique challenges that low-resource LID poses: incorrect corpus metadata, leakage from high-resource languages, difficulty separating closely related languages, handling of macrolanguage vs varieties and in general noisy data. We hope that integrating GlotLID-M into dataset creation pipelines will improve quality and enhance accessibility of NLP technology for low-resource languages and cultures. GlotLID-M model, code, and list of data sources are available: https://github.com/cisnlp/GlotLID.
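
GlotLID-M is distributed as a fastText-style classifier; the snippet below shows one plausible way to load and query such a model. The Hugging Face repo id and filename are assumptions, so check the project repository for the actual released artifacts.

```python
import fasttext
from huggingface_hub import hf_hub_download

# Assumed location of the released model file; verify against the GlotLID repo.
model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model.bin")
lid = fasttext.load_model(model_path)

labels, probs = lid.predict("Hau da euskarazko esaldi bat.", k=3)  # top-3 guesses
print(list(zip(labels, probs)))
```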

ZzzGPT: An Interactive GPT Approach to Enhance Sleep Quality

  • paper_url: http://arxiv.org/abs/2310.16242
  • repo_url: https://github.com/marwahalaofi/ubicomp23-student-challenge
  • paper_authors: Yonchanok Khaokaew, Thuc Hanh Nguyen, Kaixin Ji, Hiruni Kegalle, Marwah Alaofi
  • for: 这篇论文旨在提供有用的睡眠预测和反馈,以提高用户的睡眠质量。
  • methods: 该论文使用两stage框架,结合大量自然语言模型(LLMs),以提供准确的睡眠预测和有用的反馈。
  • results: 该研究使用GLOBEM数据集和LLMs生成的合成(synthetic)数据,显示使用XGBoost等模型可以提高预测的准确性。
    Abstract In today's world, sleep quality is pivotal for overall well-being. While wearable sensors offer real-time monitoring, they often lack actionable insights, leading to user abandonment. This paper delves into the role of technology in understanding sleep patterns. We introduce a two-stage framework, utilizing Large Language Models (LLMs), aiming to provide accurate sleep predictions with actionable feedback. Leveraging the GLOBEM dataset and synthetic data from LLMs, we highlight enhanced results with models like XGBoost. Our approach merges advanced machine learning with user-centric design, blending scientific accuracy with practicality.
    摘要 今天的世界中,睡眠质量对总体健康非常重要。虽然佩戴式感知器可以提供实时监测,但它们经常缺乏有用的反馈,导致用户废弃。这篇论文探讨技术在理解睡眠模式方面的作用。我们介绍了一个两个阶段的框架,利用大语言模型(LLMs),以提供准确的睡眠预测和有用的反馈。我们利用GLOBEM数据集和LLMs生成的 sintetic数据,显示了加强的结果,如XGBoost模型。我们的方法结合了先进的机器学习技术和用户中心的设计,将科学准确性融合到实用中。

Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models

  • paper_url: http://arxiv.org/abs/2310.16240
  • repo_url: None
  • paper_authors: Raymond Li, Gabriel Murray, Giuseppe Carenini
  • for: 这篇论文旨在将两个流行的研究领域融合到预训练语言模型中,通过在PEFT设置下进行参数有效的微调。
  • methods: 我们提出了一种将编码不同语言结构的并行适配器模块组合起来的混合语言专家(Mixture-of-Linguistic-Experts)架构,使用Gumbel-Softmax门控确定每层模型中每个专家的重要性。为降低参数数量,我们先将模型训练固定的少量步骤,然后根据专家的重要性分数对专家进行剪枝。
  • results: 我们的方法可以与其他PEFT方法相比,在相同的参数数量下达到更高的性能水平。此外,我们还提供了额外的分析,以便对每层模型选择的专家进行深入的探究。
    Abstract In this work, we propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models in the parameter-efficient fine-tuning (PEFT) setting. In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture, where Gumbel-Softmax gates are used to determine the importance of these modules at each layer of the model. To reduce the number of parameters, we first train the model for a fixed small number of steps before pruning the experts based on their importance scores. Our experiment results with three different pre-trained models show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters. In addition, we provide additional analysis to examine the experts selected by each model at each layer to provide insights for future studies.
    摘要 在这项工作中,我们提出了一种方法,该方法将两个流行研究领域结合在一起,通过在预训练语言模型的参数高效调教(PEFT)设置中注入语言结构。我们的方法使用一种新的混合语言专家架构,其中Gumbel-Softmax门控制每层模型中每个专家的重要性。为了减少参数数量,我们首先在fixed小数目步骤上训练模型,然后根据每个专家的重要性分数进行专家折叠。我们的实验结果表明,我们的方法可以与同样数量的参数 OUTPERFORM现状的PEFT方法。此外,我们还提供了进一步的分析,以便为未来研究提供洞察。
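
A minimal PyTorch sketch of the core idea (parallel adapters combined through Gumbel-Softmax gates, with gate weights later usable as importance scores for pruning); the layer sizes and residual form are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfAdapters(nn.Module):
    def __init__(self, d_model=768, n_experts=3, bottleneck=32, tau=1.0):
        super().__init__()
        # each "expert" is a bottleneck adapter; in the paper each one encodes
        # a different linguistic structure
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, bottleneck), nn.ReLU(),
                          nn.Linear(bottleneck, d_model))
            for _ in range(n_experts))
        self.gate_logits = nn.Parameter(torch.zeros(n_experts))
        self.tau = tau

    def forward(self, hidden):                 # hidden: (batch, seq, d_model)
        gates = F.gumbel_softmax(self.gate_logits, tau=self.tau, hard=False)
        mixed = sum(g * expert(hidden) for g, expert in zip(gates, self.experts))
        return hidden + mixed                  # residual adapter output

layer = MixtureOfAdapters()
out = layer(torch.randn(2, 5, 768))
importance = F.softmax(layer.gate_logits, dim=0)  # proxy for pruning weak experts
print(out.shape, importance)
```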

TiC-CLIP: Continual Training of CLIP Models

  • paper_url: http://arxiv.org/abs/2310.16226
  • repo_url: None
  • paper_authors: Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri
  • for: This paper is written for training vision-language models on time-continuous data, specifically to address the problem of continually training large foundation models without retraining from scratch.
  • methods: The paper introduces the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models, including TiC-DataCompt, TiC-YFCC, and TiC-RedCaps, which contain over 12.7B timestamped image-text pairs spanning 9 years (2014-2022). The paper also introduces a simple rehearsal-based approach for efficiently training models on time-continuous data.
  • results: The paper shows that OpenAI’s CLIP (trained on data up to 2020) loses approximately 8% zero-shot accuracy on the curated retrieval task from 2021-2022 compared with more recently trained models in the OpenCLIP repository. The paper also demonstrates that the simple rehearsal-based approach can reduce compute by 2.5 times compared to the standard practice of retraining from scratch.
    Abstract Keeping large foundation models up to date on latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large scale continual learning benchmarks or baselines. We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataCompt, TiC-YFCC, and TiC-RedCaps with over 12.7B timestamped image-text pairs spanning 9 years (2014--2022). We first use our benchmarks to curate various dynamic evaluations to measure temporal robustness of existing models. We show OpenAI's CLIP (trained on data up to 2020) loses $\approx 8\%$ zero-shot accuracy on our curated retrieval task from 2021--2022 compared with more recently trained models in OpenCLIP repository. We then study how to efficiently train models on time-continuous data. We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by $2.5\times$ when compared to the standard practice of retraining from scratch.
    摘要 维护大型基础模型的最新数据是昂贵的。为了避免不断 retraining 的高成本,需要持续训练这些模型。这个问题因大规模 continual learning 缺乏标准 benchmarks 和基线而更加严重。我们发布了首个 web-scale Time-Continual(TiC)benchmarks,包括 TiC-DataCompt、TiC-YFCC 和 TiC-RedCaps,共包含127亿(12.7B)个时间戳图像文本对,跨越2014至2022共9年。我们首先使用我们的benchmarks构建多种动态评估,以测试现有模型的时间鲁棒性。我们发现 OpenAI 的 CLIP(训练数据截至2020年)在我们精心构建的2021-2022检索任务上的零shot准确率下降了约8%,与更近期训练的 OpenCLIP 存储库中的模型相比。然后我们研究如何有效地在时间连续的数据上训练模型。我们发现一种简单的基于重放(rehearsal)的方法,通过从上一个检查点继续训练并重放旧数据,可以将计算量减少2.5倍,相比于标准的从零开始重新训练的做法。
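
A sketch of the rehearsal idea (continue from the latest checkpoint and interleave replayed old pairs with new ones); the replay ratio and batch handling below are assumptions.

```python
import random

def rehearsal_batches(new_batches, old_pool, replay_ratio=0.5, seed=0):
    # Yield training batches that mix new image-text pairs with replayed old ones,
    # to be consumed by a training loop warm-started from the last checkpoint.
    rng = random.Random(seed)
    for batch in new_batches:
        n_replay = int(len(batch) * replay_ratio)
        replay = rng.sample(old_pool, min(n_replay, len(old_pool)))
        yield batch + replay

# toy usage
old_pool = [f"old_pair_{i}" for i in range(100)]
new_batches = [[f"new_pair_{b}_{j}" for j in range(8)] for b in range(3)]
for mixed in rehearsal_batches(new_batches, old_pool):
    print(len(mixed))  # one optimizer step on `mixed` would go here
```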

Background Summarization of Event Timelines

  • paper_url: http://arxiv.org/abs/2310.16197
  • repo_url: None
  • paper_authors: Adithya Pratapa, Kevin Small, Markus Dreyer
  • for: Generating concise summaries of news events is a challenging NLP task; while journalists curate timelines to highlight key sub-events, newcomers to a news event struggle to catch up on its historical context. This paper introduces background news summarization, which complements each timeline update with a background summary of relevant preceding events.
  • methods: The authors construct a dataset by merging existing timeline datasets and asking human annotators to write a background summary for each timestep of each news event; they establish strong baselines with state-of-the-art summarization systems, propose a query-focused variant, and evaluate background quality with a question-answering-based metric, the Background Utility Score (BUS), which measures the fraction of questions about a current timestep that a background summary answers.
  • results: Experiments show the effectiveness of instruction fine-tuned systems such as Flan-T5, alongside strong zero-shot performance from GPT-3.5.
    Abstract Generating concise summaries of news events is a challenging natural language processing task. While journalists often curate timelines to highlight key sub-events, newcomers to a news event face challenges in catching up on its historical context. In this paper, we address this need by introducing the task of background news summarization, which complements each timeline update with a background summary of relevant preceding events. We construct a dataset by merging existing timeline datasets and asking human annotators to write a background summary for each timestep of each news event. We establish strong baseline performance using state-of-the-art summarization systems and propose a query-focused variant to generate background summaries. To evaluate background summary quality, we present a question-answering-based evaluation metric, Background Utility Score (BUS), which measures the percentage of questions about a current event timestep that a background summary answers. Our experiments show the effectiveness of instruction fine-tuned systems such as Flan-T5, in addition to strong zero-shot performance using GPT-3.5.
    摘要 传送新闻事件简要摘要是一项自然语言处理任务,具有挑战性。虽然记者们经常摘要时间线以便强调关键事件,但新手 faced with a news event often faces challenges in understanding its historical context. 在这篇论文中,我们解决这个需求,我们引入背景新闻摘要任务,每个时间点的新闻事件都需要一个背景摘要,涵盖有关的前一系列事件。我们构建了一个数据集,将现有的时间线数据集合并让人工标注者为每个时间点的新闻事件写一个背景摘要。我们建立了强大的基线性能,使用现有的摘要系统,并提出了一种关注点variant来生成背景摘要。为评估背景摘要质量,我们提出了一个问题回答 metric,背景用于评估器(Background Utility Score,BUS),该指标测量一个当前事件时间点的背景摘要是否能回答有关该事件的问题。我们的实验表明,在训练过程中使用 Flan-T5 等系统可以实现有效的辅助 fine-tuning,同时 Zero-shot 性能也非常强。
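
A sketch of the BUS idea under stated assumptions: the judge that decides whether a question is answerable from the background summary is a caller-supplied function (a QA system in the paper); the toy judge here is only for illustration.

```python
def background_utility_score(questions, background_summary, is_answerable):
    # Fraction of questions about the current timestep that the background
    # summary can answer; `is_answerable(question, context) -> bool` is a
    # hypothetical QA judge.
    if not questions:
        return 0.0
    answered = sum(bool(is_answerable(q, background_summary)) for q in questions)
    return answered / len(questions)

# toy judge: keyword overlap stands in for a real QA model
toy_judge = lambda q, ctx: any(w.lower().strip("?") in ctx.lower() for w in q.split()[-2:])
bus = background_utility_score(
    ["Who signed the treaty?", "When did the conflict start?"],
    "The conflict started in 2014, and the treaty was later signed by both governments.",
    toy_judge)
print(bus)
```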

BLP 2023 Task 2: Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2310.16183
  • repo_url: https://github.com/blp-workshop/blp_task2
  • paper_authors: Md. Arid Hasan, Firoj Alam, Anika Anjum, Shudipta Das, Afiyat Anjum
  • for: 这个研究是为了探讨社交媒体文本中的 sentiment 检测问题,以便更好地理解用户对产品或服务的看法。
  • methods: 这个研究使用了多种方法,包括传统机器学习模型、预先训练模型的 fine-tuning、以及大语言模型(LLMs)在零或少数shot设置下的使用。
  • results: 这个研究共收到了71名参与者的关注,其中29个团队在开发阶段提交了系统,30个团队在评估阶段提交了系统,总共提交了597个运行。另外,共有15个团队提交了系统描述论文。
    Abstract We present an overview of the BLP Sentiment Shared Task, organized as part of the inaugural BLP 2023 workshop, co-located with EMNLP 2023. The task is defined as the detection of sentiment in a given piece of social media text. This task attracted interest from 71 participants, among whom 29 and 30 teams submitted systems during the development and evaluation phases, respectively. In total, participants submitted 597 runs. However, a total of 15 teams submitted system description papers. The range of approaches in the submitted systems spans from classical machine learning models, fine-tuning pre-trained models, to leveraging Large Language Model (LLMs) in zero- and few-shot settings. In this paper, we provide a detailed account of the task setup, including dataset development and evaluation setup. Additionally, we provide a brief overview of the systems submitted by the participants. All datasets and evaluation scripts from the shared task have been made publicly available for the research community, to foster further research in this domain
    摘要 我们提供了BLP情感共享任务的概述,这是BLP 2023工作坊的一部分,并与EMNLP 2023相位。这个任务的定义是社交媒体文本中的情感检测。这个任务吸引了71名参与者的关注,其中29个团队在开发阶段和评估阶段分别提交了597次运行。然而,总共15个团队提交了系统描述论文。参与者们的提交的方法包括经典机器学习模型、精度调整预先训练模型以及在零和几个附加设置下使用大语言模型(LLMs)。在这篇论文中,我们提供了任务设置的详细资料,包括数据集开发和评估设置。此外,我们还提供了参与者们提交的系统的简要概述。所有的数据集和评估脚本都已经公开发布给研究社区,以促进这个领域的进一步研究。

Hidden Citations Obscure True Impact in Science

  • paper_url: http://arxiv.org/abs/2310.16181
  • repo_url: None
  • paper_authors: Xiangyi Meng, Onur Varol, Albert-László Barabási
  • for: 本研究旨在探讨科学家如何使用文献来评估发现的影响,并发现了隐藏的参考。
  • methods: 研究者采用了无监督可解释机器学习方法,对每篇论文的全文进行分析,系统地发现隐藏的参考。
  • results: 研究发现,对于影响大的发现,隐藏的参考数量更多于公开的参考数量,不受发表venue和学科影响。此外,隐藏参考的存在不是受到公开参考数量的影响,而是受到文献中对话的度量。
    Abstract References, the mechanism scientists rely on to signal previous knowledge, lately have turned into widely used and misused measures of scientific impact. Yet, when a discovery becomes common knowledge, citations suffer from obliteration by incorporation. This leads to the concept of hidden citation, representing a clear textual credit to a discovery without a reference to the publication embodying it. Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries hidden citations outnumber citation counts, emerging regardless of publishing venue and discipline. We show that the prevalence of hidden citations is not driven by citation counts, but rather by the degree of the discourse on the topic within the text of the manuscripts, indicating that the more discussed is a discovery, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on quantifying the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.
    摘要 科学家们常采用参考来示previous knowledge,但现在这些参考变成了广泛使用且不当地用于科学影响的标准。当发现成为通用知识时,参考会被包含在内化。这会导致隐藏引用的概念,表示文献中的明文资料 credit。我们利用不监督可解释机器学习对每篇论文全文进行系统地寻找隐藏引用。我们发现,对影响发现的隐藏引用数量比引用数量更多,不受出版平台和学科影响。我们表明,隐藏引用的存在不是基于引用数量,而是基于文献中话题的讨论程度, indicating that the more discussed is a discovery, the less visible it is to standard bibliometric analysis。隐藏引用表明, bibliometric measures 只能提供科学发现的有限影响量,需要从科学文献中提取知识。

WojoodNER 2023: The First Arabic Named Entity Recognition Shared Task

  • paper_url: http://arxiv.org/abs/2310.16153
  • repo_url: None
  • paper_authors: Mustafa Jarrar, Muhammad Abdul-Mageed, Mohammed Khalilia, Bashar Talafha, AbdelRahim Elmadany, Nagham Hamad, Alaa’ Omar
  • for: 本文主要关注阿拉伯语命名实体识别(NER)任务,提供了新的NER数据集(i.e., Wojood),并定义了用于促进不同NER方法比较的meaningful的子任务。
  • methods: 共有45个团队注册参加了该共享任务,其中11个团队参与了测试阶段;具体而言,11个团队参加了FlatNER子任务,8个团队参加了NestedNER子任务。
  • results: 获胜团队在FlatNER和NestedNER子任务上分别取得了91.96和93.73的F1分数。
    Abstract We present WojoodNER-2023, the first Arabic Named Entity Recognition (NER) Shared Task. The primary focus of WojoodNER-2023 is on Arabic NER, offering novel NER datasets (i.e., Wojood) and the definition of subtasks designed to facilitate meaningful comparisons between different NER approaches. WojoodNER-2023 encompassed two Subtasks: FlatNER and NestedNER. A total of 45 unique teams registered for this shared task, with 11 of them actively participating in the test phase. Specifically, 11 teams participated in FlatNER, while $8$ teams tackled NestedNER. The winning teams achieved F1 scores of 91.96 and 93.73 in FlatNER and NestedNER, respectively.
    摘要 我们现在介绍WojoodNER-2023,这是第一个阿拉伯语命名实体识别(NER)共同任务。WojoodNER-2023的主要焦点是阿拉伯语NER,提供了新的NER数据集(即Wojood),以及为不同NER方法进行比较有意义的子任务定义。WojoodNER-2023包括了两个子任务:FlatNER和NestedNER。总共45个团队 регистри过这个共同任务,其中11个团队参加了测试阶段。特别是,11个团队参加了FlatNER,而8个团队解决了NestedNER。赢利的团队在FlatNER和NestedNER中的F1分数分别为91.96和93.73。

Can You Follow Me? Testing Situational Understanding in ChatGPT

  • paper_url: http://arxiv.org/abs/2310.16135
  • repo_url: https://github.com/yangalan123/situationaltesting
  • paper_authors: Chenghao Yang, Allyson Ettinger
  • for: 这篇论文的目的是检验人工智能代理人(ChatGPT)的情景理解(Situational Understanding,SU)能力,以确定其是否具备人类样式的对话能力。
  • methods: 作者们使用了一个新的synthetic environment来测试ChatGPT的SU能力,通过评估模型在不同环境下的表现来评估其能力。
  • results: 研究发现,尽管ChatGPT在对话任务上表现出色,但它在保持正确的环境状态方面存在问题。研究人员发现,ChatGPT的表现受到各种因素的影响,包括模型的忘记率和假的更新。这些发现表明,ChatGPT目前还不具备坚定的情景理解能力。
    Abstract Understanding sentence meanings and updating information states appropriately across time -- what we call "situational understanding" (SU) -- is a critical ability for human-like AI agents. SU is essential in particular for chat models, such as ChatGPT, to enable consistent, coherent, and effective dialogue between humans and AI. Previous works have identified certain SU limitations in non-chatbot Large Language models (LLMs), but the extent and causes of these limitations are not well understood, and capabilities of current chat-based models in this domain have not been explored. In this work we tackle these questions, proposing a novel synthetic environment for SU testing which allows us to do controlled and systematic testing of SU in chat-oriented models, through assessment of models' ability to track and enumerate environment states. Our environment also allows for close analysis of dynamics of model performance, to better understand underlying causes for performance patterns. We apply our test to ChatGPT, the state-of-the-art chatbot, and find that despite the fundamental simplicity of the task, the model's performance reflects an inability to retain correct environment states across time. Our follow-up analyses suggest that performance degradation is largely because ChatGPT has non-persistent in-context memory (although it can access the full dialogue history) and it is susceptible to hallucinated updates -- including updates that artificially inflate accuracies. Our findings suggest overall that ChatGPT is not currently equipped for robust tracking of situation states, and that trust in the impressive dialogue performance of ChatGPT comes with risks. We release the codebase for reproducing our test environment, as well as all prompts and API responses from ChatGPT, at https://github.com/yangalan123/SituationalTesting.
    摘要 人类智能代理机器人需要具备"情境理解"(SU)能力,以便在时间上更新信息状态。chatGPT等 chatbot需要这种能力,以实现人机对话的一致、 coherent 和有效。previous works已经发现了一些 SU 限制在非 chatbot 大语言模型(LLMs)中,但这些限制的EXTENT和原因还不够了解。此外,当前的 chat-based 模型在这个领域的能力还没有得到探索。在这项工作中,我们提出了一种新的情境测试环境,以控制和系统地测试 chat-oriented 模型的 SU 能力。我们通过评估模型的环境状态跟踪和总结来进行测试。我们在这个环境中测试了 chatGPT,发现它的性能表现出了无法保持正确的环境状态的问题。我们的跟进分析表明,chatGPT 的性能下降的主要原因是它没有持续性的内容快照(although it can access the full dialogue history),并且容易受到幻想的更新的影响。我们的发现建议 chatGPT 目前不具备 Robust 的情境跟踪能力,并且对它的印象性对话性能有风险。我们在 GitHub 上发布了测试环境的代码基本,以及所有的提示和 API 响应,可以在 https://github.com/yangalan123/SituationalTesting 下载。

GenKIE: Robust Generative Multimodal Document Key Information Extraction

  • paper_url: http://arxiv.org/abs/2310.16131
  • repo_url: https://github.com/glasgow-ai4biomed/genkie
  • paper_authors: Panfeng Cao, Ye Wang, Qiang Zhang, Zaiqiao Meng
  • for: 提高扫描文档中关键信息提取精度
  • methods: 提出一种新的生成型终端模型(GenKIE),利用多modal编码器将视觉、布局和文本特征嵌入,并使用decoder生成需要的输出
  • results: 广泛实验表明,GenKIE在不同类型的文档上具有良好的泛化能力,并实现了状态之最的结果,同时模型也具有自动纠正OCR错误的能力。
    Abstract Key information extraction (KIE) from scanned documents has gained increasing attention because of its applications in various domains. Although promising results have been achieved by some recent KIE approaches, they are usually built based on discriminative models, which lack the ability to handle optical character recognition (OCR) errors and require laborious token-level labelling. In this paper, we propose a novel generative end-to-end model, named GenKIE, to address the KIE task. GenKIE is a sequence-to-sequence multimodal generative model that utilizes multimodal encoders to embed visual, layout and textual features and a decoder to generate the desired output. Well-designed prompts are leveraged to incorporate the label semantics as the weakly supervised signals and entice the generation of the key information. One notable advantage of the generative model is that it enables automatic correction of OCR errors. Besides, token-level granular annotation is not required. Extensive experiments on multiple public real-world datasets show that GenKIE effectively generalizes over different types of documents and achieves state-of-the-art results. Our experiments also validate the model's robustness against OCR errors, making GenKIE highly applicable in real-world scenarios.
    摘要 针对扫描文档中的关键信息提取(Key Information Extraction,KIE)问题,随着不同领域的应用,拥有增加的关注。虽然一些最新的KIE方法已经实现了可观的成果,但是这些方法通常是基于分类模型,缺乏对光学字符识别(OCR)错误的处理能力,同时需要繁琐的单个字符标注。在这篇论文中,我们提出了一种新的生成型终端模型,名为GenKIE,用于解决KIE任务。GenKIE是一种序列到序列多Modal生成模型,利用多Modal编码器将视觉、格式和文本特征编码,并使用解码器生成需要的输出。Well-designed prompts被利用来把标签 semantics 作为弱样本标注,让生成器自动生成关键信息。一个GenKIE的优点是它可以自动更正 OCR 错误。此外,单个字符精度标注不是必要的。广泛的实验表明,GenKIE可以高效地泛化到不同类型的文档,并 achieve 状态的最佳结果。我们的实验还证明了模型对 OCR 错误的Robustness,使得GenKIE在实际应用中非常可靠。

Octopus: A Multitask Model and Toolkit for Arabic Natural Language Generation

  • paper_url: http://arxiv.org/abs/2310.16127
  • repo_url: None
  • paper_authors: AbdelRahim Elmadany, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed
  • for: 这个研究是为了开发一个可以处理广泛任务的阿拉伯文本生成工具组。
  • methods: 这个研究使用了一种新的阿拉伯文本转换模型,名为AraT5v2,并在该模型上进行了训练。该训练包括了不同的预训练策略,包括无超给、有超给、共同预训练等。
  • results: 根据研究结果,这个新的模型在与其他比较基eline的模型进行比较时,具有了大幅度的提升。此外,研究者还开发了一个名为Octopus的Python基本包和命令行工具组,可以帮助开发者快速地进行阿拉伯文本生成任务。
    Abstract Understanding Arabic text and generating human-like responses is a challenging endeavor. While many researchers have proposed models and solutions for individual problems, there is an acute shortage of a comprehensive Arabic natural language generation toolkit that is capable of handling a wide range of tasks. In this work, we present a novel Arabic text-to-text Transformer model, namely AraT5v2. Our new model is methodically trained on extensive and diverse data, utilizing an extended sequence length of 2,048 tokens. We explore various pretraining strategies including unsupervised, supervised, and joint pertaining, under both single and multitask settings. Our models outperform competitive baselines with large margins. We take our work one step further by developing and publicly releasing Octopus, a Python-based package and command-line toolkit tailored for eight Arabic generation tasks all exploiting a single model. We release the models and the toolkit on our public repository.
    摘要 理解阿拉伯文本和生成人类化响应是一项复杂的任务。虽然许多研究人员已经提出了模型和解决方案,但是总的来说是缺乏一个全面的阿拉伯自然语言生成工具包,可以处理广泛的任务。在这项工作中,我们提出了一种新的阿拉伯文本-文本变换器模型,即AraT5v2。我们的新模型通过对广泛和多样化的数据进行系统训练,并使用扩展的序列长度2048个元素。我们研究了不同的预训练策略,包括无监督、监督、共同预训练等,在单任务和多任务 setting下进行了测试。我们的模型在比较基eline上表现出了明显的优势。为了进一步推动这项工作,我们还开发了一个名为Octopus的Python基本包和命令行工具集,用于执行八种阿拉伯生成任务,均基于单个模型。我们将模型和工具集公开发布到我们的公共存储库上。

NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task

  • paper_url: http://arxiv.org/abs/2310.16117
  • repo_url: None
  • paper_authors: Muhammad Abdul-Mageed, AbdelRahim Elmadany, Chiyu Zhang, El Moatez Billah Nagoudi, Houda Bouamor, Nizar Habash
  • for: 本研究的目的是提高阿拉伯语自然语言处理的状态作图,通过创造合作的研究团队在标准化的环境下竞争。
  • methods: 本研究使用了多种方法,包括 диалект识别和 machine translation。
  • results: 研究结果显示,三个子任务仍然具有挑战性,并且鼓励未来的研究。winning teams的成绩为87.27 F1、14.76 Bleu和21.10 Bleu,分别在三个子任务中。
    Abstract We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023). The objective of NADI is to help advance state-of-the-art Arabic NLP by creating opportunities for teams of researchers to collaboratively compete under standardized conditions. It does so with a focus on Arabic dialects, offering novel datasets and defining subtasks that allow for meaningful comparisons between different approaches. NADI 2023 targeted both dialect identification (Subtask 1) and dialect-to-MSA machine translation (Subtask 2 and Subtask 3). A total of 58 unique teams registered for the shared task, of whom 18 teams have participated (with 76 valid submissions during test phase). Among these, 16 teams participated in Subtask 1, 5 participated in Subtask 2, and 3 participated in Subtask 3. The winning teams achieved 87.27 F1 on Subtask 1, 14.76 Bleu in Subtask 2, and 21.10 Bleu in Subtask 3, respectively. Results show that all three subtasks remain challenging, thereby motivating future work in this area. We describe the methods employed by the participating teams and briefly offer an outlook for NADI.
    摘要 我团队描述了第四届细腻阿拉伯语言标注分享任务(NADI 2023)的发现。NADI 的目标是通过共同竞争的标准化条件来进步阿拉伯语言处理技术。它专注于阿拉伯方言,提供了新的数据集和定义了可比较的子任务,以便进行有意义的对不同方法的比较。NADI 2023 targeted both dialect identification (Subtask 1) and dialect-to-MSA machine translation (Subtask 2 and Subtask 3). A total of 58 unique teams registered for the shared task, of whom 18 teams participated (with 76 valid submissions during the test phase). Among these, 16 teams participated in Subtask 1, 5 participated in Subtask 2, and 3 participated in Subtask 3. The winning teams achieved 87.27% F1 on Subtask 1, 14.76 Bleu in Subtask 2, and 21.10 Bleu in Subtask 3, respectively. Results show that all three subtasks remain challenging, thereby motivating future work in this area. We describe the methods employed by the participating teams and briefly offer an outlook for NADI.

Locally Differentially Private Document Generation Using Zero Shot Prompting

  • paper_url: http://arxiv.org/abs/2310.16111
  • repo_url: None
  • paper_authors: Saiteja Utpala, Sara Hooker, Pin Yu Chen
  • for: 防止推导语言模型隐私风险
  • methods: 提出了一种本地差分隐私机制 DP-Prompt,利用预训练大语言模型和零样本提示来防御作者去匿名化攻击,并将对下游效用的影响降到最低
  • results: 对IMDB数据集进行测试,DP-Prompt(与ChatGPT结合)可以完美地恢复清洁情感F1分数,同时对于静止攻击者而言,可以实现46%的作者识别F1分数减少,对于适应攻击者而言,可以实现26%的减少
    Abstract Numerous studies have highlighted the privacy risks associated with pretrained large language models. In contrast, our research offers a unique perspective by demonstrating that pretrained large language models can effectively contribute to privacy preservation. We propose a locally differentially private mechanism called DP-Prompt, which leverages the power of pretrained large language models and zero-shot prompting to counter author de-anonymization attacks while minimizing the impact on downstream utility. When DP-Prompt is used with a powerful language model like ChatGPT (gpt-3.5), we observe a notable reduction in the success rate of de-anonymization attacks, showing that it surpasses existing approaches by a considerable margin despite its simpler design. For instance, in the case of the IMDB dataset, DP-Prompt (with ChatGPT) perfectly recovers the clean sentiment F1 score while achieving a 46\% reduction in author identification F1 score against static attackers and a 26\% reduction against adaptive attackers. We conduct extensive experiments across six open-source large language models, ranging up to 7 billion parameters, to analyze various effects of the privacy-utility tradeoff.
    摘要 多数研究已经强调了预训练大型自然语言模型的隐私风险。然而,我们的研究呈现了一种独特的视角,即预训练大型自然语言模型可以有效地增进隐私保护。我们提出了一种本地差分隐私机制called DP-Prompt,利用了预训练大型自然语言模型和零扩展提示的力量,对抗作者去掌握攻击而减少下游实用性的影响。当DP-Prompt与强大的语言模型如ChatGPT(gpt-3.5)一起使用时,我们观察到了对抗攻击者去掌握的成功率下降,而且与现有方法相比,DP-Prompt具有更简单的设计。例如,在IMDB dataset中,DP-Prompt(与ChatGPT)可以完美地恢复清洁的 sentiment F1 分数,同时对于静态攻击者实现46%的减少,对于适应性攻击者实现26%的减少。我们在六个开源大型自然语言模型(最大参数70亿)上进行了广泛的实验,以分析不同的隐私Utility贸易。
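
A minimal sketch of the DP-Prompt recipe as described above: zero-shot paraphrasing of a document with temperature-controlled sampling. The prompt wording and the `generate` callback are assumptions; in the paper the sampling temperature is tied to the local differential-privacy budget, a mapping not reproduced here.

```python
def dp_prompt_paraphrase(document, generate, temperature=1.5):
    # `generate(prompt, temperature)` is a hypothetical wrapper around any LLM API.
    prompt = (
        "Paraphrase the following document. Keep its content and sentiment, "
        "but do not imitate the author's writing style:\n\n" + document
    )
    # Higher temperature -> noisier output -> stronger protection against
    # authorship attribution, at some cost in downstream utility.
    return generate(prompt, temperature=temperature)

# toy usage with a stand-in generator
echo = lambda prompt, temperature: f"[paraphrase @ T={temperature}] ..."
print(dp_prompt_paraphrase("I absolutely loved this film, as I said in my last review.", echo))
```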

CR-COPEC: Causal Rationale of Corporate Performance Changes to Learn from Financial Reports

  • paper_url: http://arxiv.org/abs/2310.16095
  • repo_url: https://github.com/cr-copec/cr-copec
  • paper_authors: Ye Eun Chun, Sunjae Kwon, Kyunghwan Sohn, Nakwon Sung, Junyoup Lee, Byungki Seo, Kevin Compher, Seung-won Hwang, Jaesik Choi
  • for: 这 paper 是为了构建一个大规模域适化 causal 句子数据集,用于检测公司财务性能变化。
  • methods: 这 paper 使用了 10-K 年度报告中专家的 causal 分析,以满足 accounting 标准。 dataset 可以广泛地用于个人投资者和分析师作为投资决策的材料资源,无需阅读大量文档。
  • results: 这 paper 在 twelve 个industry 中考虑了不同的特征,因此可以在不同的industry中分辨 causal 句子。 authors 还提供了 dataset 的构建和分析,以及实验代码。
    Abstract In this paper, we introduce CR-COPEC called Causal Rationale of Corporate Performance Changes from financial reports. This is a comprehensive large-scale domain-adaptation causal sentence dataset to detect financial performance changes of corporate. CR-COPEC contributes to two major achievements. First, it detects causal rationale from 10-K annual reports of the U.S. companies, which contain experts' causal analysis following accounting standards in a formal manner. This dataset can be widely used by both individual investors and analysts as material information resources for investing and decision making without tremendous effort to read through all the documents. Second, it carefully considers different characteristics which affect the financial performance of companies in twelve industries. As a result, CR-COPEC can distinguish causal sentences in various industries by taking unique narratives in each industry into consideration. We also provide an extensive analysis of how well CR-COPEC dataset is constructed and suited for classifying target sentences as causal ones with respect to industry characteristics. Our dataset and experimental codes are publicly available.
    摘要 在这篇论文中,我们引入了CR-COPEC,即财务报表中公司业绩变化的原因分析 dataset。这是一个广泛的领域适应性 causal 句据集,用于检测公司财务业绩变化。CR-COPEC 在两个主要成果方面做出了贡献。首先,它从美国公司的10-K年度报告中检测出了专家们根据会计标准进行形式化的 causal 分析。这个数据集可以被广泛使用于投资和决策过程中,无需阅读所有文档。其次,它仔细考虑了不同领域对公司财务业绩的影响,因此可以在不同领域中分辨出 causal 句。我们还提供了对 CR-COPEC 数据集的广泛分析和适用于分类目标句的研究。我们的数据集和实验代码都公开可用。

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

  • paper_url: http://arxiv.org/abs/2310.16049
  • repo_url: https://github.com/zayne-sprague/musr
  • paper_authors: Zayne Sprague, Xi Ye, Kaj Bostrom, Swarat Chaudhuri, Greg Durrett
  • for: 评估大语言模型(LLM)的理性能力
  • methods: 使用链式思维提示技术和自然语言生成算法
  • results: 评估多步软理解任务的语言模型表现不佳,需要进一步改进
    Abstract While large language models (LLMs) equipped with techniques like chain-of-thought prompting have demonstrated impressive capabilities, they still fall short in their ability to reason robustly in complex settings. However, evaluating LLM reasoning is challenging because system capabilities continue to grow while benchmark datasets for tasks like logical deduction have remained static. We introduce MuSR, a dataset for evaluating language models on multistep soft reasoning tasks specified in a natural language narrative. This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm, enabling the construction of complex reasoning instances that challenge GPT-4 (e.g., murder mysteries roughly 1000 words in length) and which can be scaled further as more capable LLMs are released. Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning; this makes it simultaneously much more challenging than other synthetically-crafted benchmarks while remaining realistic and tractable for human annotators to solve with high accuracy. We evaluate a range of LLMs and prompting techniques on this dataset and characterize the gaps that remain for techniques like chain-of-thought to perform robust reasoning.
    摘要 大型语言模型(LLM)已经具备了一些技术,如链式思维提示,表现出了很好的能力。然而,LLM在复杂的设定下进行Robust reasoning仍然弱点。然而,评估LLM的理解是困难的,因为系统的能力不断提高,而对于逻辑推理的 benchmark数据集仍然保持不变。我们介绍了 MuSR,一个用于评估语言模型多步软件逻辑任务的自然语言 narative。这个数据集有两个重要特点:首先,它通过一种新的 neurosymbolic 生成算法来生成复杂的逻辑任务,可以挑战 GPT-4(例如,谋杀 MYSTERY 约1000个单词长),并可以进一步扩展为更有能力的 LLM 发布。其次,我们的数据集实例是自然语言 narative,与实际世界的 reasoning Domain相对应,这使得它比其他生成的 benchmark 更加具有挑战性和真实性,同时也可以让人类评估员解决高度准确。我们对这些数据集进行了一系列LLM和提示技术的评估,并描述了链式思维在这些任务中的缺陷。

Visual Cropping Improves Zero-Shot Question Answering of Multimodal Large Language Models

  • paper_url: http://arxiv.org/abs/2310.16033
  • repo_url: https://github.com/saccharomycetes/visual_crop_zsvqa
  • paper_authors: Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski
  • for: 这 paper investigate multimodal Large Language Models (LLMs) 在视觉问答 (VQA) 任务中的限制,特别是它们是否可以正确地捕捉图像中的小型细节。
  • methods: 这 paper 使用了 multimodal LLMs 和人工视觉截割来提高 VQA 任务的性能。
  • results: 研究发现,multimodal LLMs 在图像中的小型细节捕捉能力很弱,其 zero-shot 性能与问题中图像Subject的大小有直接关系,随着Subject的尺寸增大,性能下降至 46%。此外,研究发现,人工视觉截割可以有效地改善 multimodal LLMs 的性能。
    Abstract Multimodal Large Language Models (LLMs) have recently achieved promising zero-shot accuracy on visual question answering (VQA) -- a fundamental task affecting various downstream applications and domains. Given the great potential for the broad use of these models, it is important to investigate their limitations in dealing with different image and question properties. In this work, we investigate whether multimodal LLMs can perceive small details as well as large details in images. In particular, we show that their zero-shot accuracy in answering visual questions is very sensitive to the size of the visual subject of the question, declining up to $46\%$ with size. Furthermore, we show that this effect is causal by observing that human visual cropping can significantly mitigate their sensitivity to size. Inspired by the usefulness of human cropping, we then propose three automatic visual cropping methods as inference time mechanisms to improve the zero-shot performance of multimodal LLMs. We study their effectiveness on four popular VQA datasets, and a subset of the VQAv2 dataset tailored towards fine visual details. Our findings suggest that multimodal LLMs should be used with caution in detail-sensitive VQA applications, and that visual cropping is a promising direction to improve their zero-shot performance. Our code and data are publicly available.
    摘要 多modal大语言模型(LLMs)在视觉问答(VQA)任务上最近获得了promising的零shot准确率——一个影响多个下渠应用和领域的基本任务。考虑到这些模型的广泛应用的潜力,因此调查其对不同图像和问题属性的局限性是非常重要的。在这项工作中,我们调查多modal LLMs是否可以正确地捕捉图像中的小 Details和大 Details。具体来说,我们发现其零shot准确率在回答视觉问题时对图像中的视觉主题大小具有极高的敏感性,随着主题大小的增加,准确率可以下降至46%。此外,我们发现这种效果是 causal,因为人类视觉cropping可以明显减轻其对大小的敏感性。 inspirited by the usefulness of human cropping,我们提出了三种自动视觉cropping方法作为infere时机制来提高多modal LLMs的零shot性能。我们在四个popular VQA dataset上研究了这些方法的效果,并在VQAv2 dataset上进行了一些精细Visual Details的子集研究。我们的发现表明:多modal LLMs在细节敏感的VQA应用中应用should be cautious,而visual cropping是一个有前途的方向来提高其零shot性能。我们的代码和数据公开可用。
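
A sketch of one automatic-cropping heuristic in the spirit described above (not the paper's exact methods): slide a window over the image, score each crop against the question with a caller-supplied scorer such as CLIP image-text similarity, and answer on the best crop alongside the full image.

```python
from PIL import Image

def best_question_crop(image, question, score, win_frac=0.5, stride_frac=0.25):
    # `score(crop, question) -> float` is a hypothetical relevance scorer
    # (e.g. CLIP similarity between the crop and the question text).
    W, H = image.size
    w, h = max(1, int(W * win_frac)), max(1, int(H * win_frac))
    sx, sy = max(1, int(W * stride_frac)), max(1, int(H * stride_frac))
    crops = [image.crop((x, y, x + w, y + h))
             for x in range(0, W - w + 1, sx)
             for y in range(0, H - h + 1, sy)]
    return max(crops, key=lambda c: score(c, question), default=image)

# toy usage: favor bright regions as a stand-in for a real scorer
img = Image.new("RGB", (256, 256), color=(40, 40, 40))
crop = best_question_crop(img, "What color is the sign?",
                          score=lambda c, q: sum(c.resize((1, 1)).getpixel((0, 0))))
print(crop.size)
```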

Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation

  • paper_url: http://arxiv.org/abs/2310.15961
  • repo_url: None
  • paper_authors: Szymon Antoniak, Sebastian Jaszczur, Michał Krutul, Maciej Pióro, Jakub Krajewski, Jan Ludziejewski, Tomasz Odrzygóźdź, Marek Cygan
  • for: 提高Transformer模型的参数计数,保持训练和推理成本的MoE模型。
  • methods: 使用 feed-forward layer中的一些专家来代表所有token,但这会导致训练不稳定和专家使用不均匀。
  • results: 提出了一种新的Mixture of Tokens模型,可以保持MoE模型的好处而不是经受以上问题,通过将token从不同的示例混合而不是路由它们到专家。
    Abstract Despite the promise of Mixture of Experts (MoE) models in increasing parameter counts of Transformer models while maintaining training and inference costs, their application carries notable drawbacks. The key strategy of these models is to, for each processed token, activate at most a few experts - subsets of an extensive feed-forward layer. But this approach is not without its challenges. The operation of matching experts and tokens is discrete, which makes MoE models prone to issues like training instability and uneven expert utilization. Existing techniques designed to address these concerns, such as auxiliary losses or balance-aware matching, result either in lower model performance or are more difficult to train. In response to these issues, we propose Mixture of Tokens, a fully-differentiable model that retains the benefits of MoE architectures while avoiding the aforementioned difficulties. Rather than routing tokens to experts, this approach mixes tokens from different examples prior to feeding them to experts, enabling the model to learn from all token-expert combinations. Importantly, this mixing can be disabled to avoid mixing of different sequences during inference. Crucially, this method is fully compatible with both masked and causal Large Language Model training and inference.
    摘要 尽管混合专家(MoE)模型在提高Transformer模型参数数量的同时保持训练和推理成本的承诺,但它们的应用还存在一些困难。这些模型的关键策略是,对每个处理的单词,只启用最多几个专家——Feed-Forward层中的子集。但这种方法存在许多问题,如训练不稳定和专家不均衡使用。现有的技术,如辅助损失或平衡感知匹配,可以解决这些问题,但它们会导致模型性能下降或更难以训练。为了解决这些问题,我们提议 Mixture of Tokens,一种完全可导的模型,保留MoE架构的优点而避免上述困难。而不是路由单词到专家,这种方法将不同的单词混合在一起,以便模型可以学习所有单词-专家组合。这种混合可以被禁用,以避免在推理过程中混合不同的序列。此外,这种方法与 маsked和 causal Large Language Model 的训练和推理完全兼容。
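
A rough PyTorch sketch of the token-mixing idea (not the authors' implementation): tokens at the same position across a group of examples are softly mixed into one vector per expert, processed, and redistributed with the same weights, keeping everything differentiable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfTokensLayer(nn.Module):
    def __init__(self, d_model, n_experts, d_ff):
        super().__init__()
        self.controller = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts))

    def forward(self, x):                                  # x: (group, seq, d_model)
        weights = F.softmax(self.controller(x), dim=0)     # mix across the group dim
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            w = weights[..., e:e + 1]                      # (group, seq, 1)
            mixed = (w * x).sum(dim=0)                     # one mixed token per position
            processed = expert(mixed)                      # (seq, d_model)
            out = out + w * processed.unsqueeze(0)         # redistribute to examples
        return out

y = MixtureOfTokensLayer(16, 4, 32)(torch.randn(8, 10, 16))  # a group of 8 examples
print(y.shape)
```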

NoteChat: A Dataset of Synthetic Doctor-Patient Conversations Conditioned on Clinical Notes

  • paper_url: http://arxiv.org/abs/2310.15959
  • repo_url: None
  • paper_authors: Junda Wang, Zonghai Yao, Zhichao Yang, Huixue Zhou, Rumeng Li, Xun Wang, Yucheng Xu, Hong Yu
  • for: 这个论文旨在Automating the creation of clinical records drafted by doctors after each patient’s visit, using language models to reduce the workload of doctors.
  • methods: 该论文提出了一种名为NoteChat的多智能体协作框架,利用大型语言模型(LLMs)生成以临床记录为条件的医患合成对话。NoteChat包括规划(Planning)、角色扮演(Roleplay)和润色(Polish)模块。
  • results: 对于NoteChat,我们提供了全自动和人工评估,与当前状态的模型进行比较,包括OpenAI的ChatGPT和GPT-4。结果表明,NoteChat可以生成高质量的医生与病人之间的合作对话,从而探索人工智能在医疗领域的潜在应用。这是多个LLMs合作完成基于医疗记录的医生与病人对话的第一个实例,为人工智能在医疗领域的发展提供了promising的可能性。
    Abstract The detailed clinical records drafted by doctors after each patient's visit are crucial for medical practitioners and researchers. Automating the creation of these notes with language models can reduce the workload of doctors. However, training such models can be difficult due to the limited public availability of conversations between patients and doctors. In this paper, we introduce NoteChat, a cooperative multi-agent framework leveraging Large Language Models (LLMs) for generating synthetic doctor-patient conversations conditioned on clinical notes. NoteChat consists of Planning, Roleplay, and Polish modules. We provide a comprehensive automatic and human evaluation of NoteChat, comparing it with state-of-the-art models, including OpenAI's ChatGPT and GPT-4. Results demonstrate that NoteChat facilitates high-quality synthetic doctor-patient conversations, underscoring the untapped potential of LLMs in healthcare. This work represents the first instance of multiple LLMs cooperating to complete a doctor-patient conversation conditioned on clinical notes, offering promising avenues for the intersection of AI and healthcare
    摘要 医生在每次就诊后撰写的详细临床记录是医疗从业者和研究人员的关键资料。使用语言模型自动生成这些笔记可以减轻医生的工作负担。然而,训练这些模型可能很困难,因为公开可用的医患对话非常有限。在这篇论文中,我们介绍NoteChat,一种合作多代理框架,利用大型语言模型(LLMs)生成基于临床笔记的合成医生-病人对话。NoteChat包括规划、角色扮演和润色模块。我们对NoteChat进行了完整的自动和人工评估,并与现有模型(包括OpenAI的ChatGPT和GPT-4)进行比较。结果表明,NoteChat可以生成高质量的合成医生-病人对话,凸显了大型语言模型在医疗领域的潜在价值。这是首次由多个LLMs合作完成基于临床笔记的医生-病人对话,为人工智能与医疗的交叉提供了有前景的方向。

This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15941
  • repo_url: https://github.com/hitz-zentroa/this-is-not-a-dataset
  • paper_authors: Iker García-Ferrero, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau
  • for: 这 paper 是为了解释 LLMs 对否定语言的理解能力的研究。
  • methods: 该 paper 使用了一个大量自动生成的描述句子数据集,用于测试 LLMs 的总结和推理能力。
  • results: 研究发现, LLMs 在处理负面句子时表现不佳,通常仅仅依靠 superficiale 的cue。 fine-tuning 模型可以提高其表现,但negation 理解和总结仍然存在挑战。
    Abstract Although large language models (LLMs) have apparently acquired a certain level of grammatical knowledge and the ability to make generalizations, they fail to interpret negation, a crucial step in Natural Language Processing. We try to clarify the reasons for the sub-optimal performance of LLMs understanding negation. We introduce a large semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge that can be true or false in which negation is present in about 2/3 of the corpus in different forms. We have used our dataset with the largest available open LLMs in a zero-shot approach to grasp their generalization and inference capability and we have also fine-tuned some of the models to assess whether the understanding of negation can be trained. Our findings show that, while LLMs are proficient at classifying affirmative sentences, they struggle with negative sentences and lack a deep understanding of negation, often relying on superficial cues. Although fine-tuning the models on negative sentences improves their performance, the lack of generalization in handling negation is persistent, highlighting the ongoing challenges of LLMs regarding negation understanding and generalization. The dataset and code are publicly available.
    摘要 尽管大语言模型(LLMs)已经显示出了一定程度的语法知识和总结能力,但它们对否定的解释仍然存在困难。我们尝试解释LLMs理解否定的原因。我们创建了一个大型 semi-自动生成的描述句子集,包含约400,000个描述句子,其中否定存在约2/3的句子中,具有不同的形式。我们使用了我们的数据集和最大可用的开放LLMs进行零容量方法来评估这些模型的总结和推理能力,并对一些模型进行了微调以评估否定理解是否可以被训练。我们的发现表明,虽然LLMs在有Affirmative句子上表现出色,但它们对Negative句子表示困难,并且缺乏深入的否定理解,通常仅仅依靠 superficies 的cue。虽然微调模型可以提高其表现,但否定理解的总体化问题仍然存在,这 highlights continue challenges of LLMs regarding negation understanding and generalization.我们的数据集和代码公开available。

Contrastive Learning-based Sentence Encoders Implicitly Weight Informative Words

  • paper_url: http://arxiv.org/abs/2310.15921
  • repo_url: https://github.com/kuriyan1204/sentence-encoder-word-weighting
  • paper_authors: Hiroto Kurita, Goro Kobayashi, Sho Yokoi, Kentaro Inui
  • for: 这篇论文旨在提高句子编码器的性能,通过对比损失进行微调。
  • methods: 该论文使用对比学习,并通过理论和实验两方面展示了模型在对比学习过程中对词语权重的影响。
  • results: 实验结果表明,对比微调会使模型对信息量更大的词语赋予更高的隐式权重,而信息量较小的词语获得的权重更低。
    Abstract The performance of sentence encoders can be significantly improved through the simple practice of fine-tuning using contrastive loss. A natural question arises: what characteristics do models acquire during contrastive learning? This paper theoretically and experimentally shows that contrastive-based sentence encoders implicitly weight words based on information-theoretic quantities; that is, more informative words receive greater weight, while others receive less. The theory states that, in the lower bound of the optimal value of the contrastive learning objective, the norm of word embedding reflects the information gain associated with the distribution of surrounding words. We also conduct comprehensive experiments using various models, multiple datasets, two methods to measure the implicit weighting of models (Integrated Gradients and SHAP), and two information-theoretic quantities (information gain and self-information). The results provide empirical evidence that contrastive fine-tuning emphasizes informative words.
    摘要 通过简单的对比损失微调,可以大幅提升句子编码器的性能。一个自然的问题是:模型在对比学习中获得了什么特性?本文通过理论和实验表明,基于对比学习的句子编码器会隐式地依据信息论量对词语加权:信息量更大的词语获得更大的权重,其余词语权重更小。理论表明,在对比学习目标最优值的下界中,词嵌入的范数反映了与周围词语分布相关的信息增益。我们还使用多种模型、多个数据集、两种测量模型隐式权重的方法(Integrated Gradients 和 SHAP)以及两种信息论量(信息增益和自信息)进行了全面实验。结果为"对比微调强调信息量大的词语"提供了实证证据。
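
As a rough illustration of the word-weighting claim above, the following sketch compares a contrastively fine-tuned encoder's per-token hidden-state norms with corpus self-information. The model name, the toy corpus, and the use of hidden-state norms as a stand-in for the paper's attribution methods (Integrated Gradients, SHAP) are assumptions made for illustration only.

```python
# Minimal sketch (not the authors' code): compare a contrastively fine-tuned
# encoder's per-token weights, approximated here by hidden-state norms, with
# corpus self-information -log p(w). Model name and the norm-based proxy for
# "implicit weighting" are assumptions for illustration only.
import math
from collections import Counter

import torch
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # a contrastively trained encoder
tok = AutoTokenizer.from_pretrained(model_name)
enc = AutoModel.from_pretrained(model_name).eval()

corpus = [
    "the cat sat on the mat",
    "a photographer captured the rare comet last night",
    "the quick brown fox jumps over the lazy dog",
]

# Self-information of each sub-word token, estimated from the toy corpus.
counts = Counter(t for s in corpus for t in tok.tokenize(s))
total = sum(counts.values())
self_info = {t: -math.log(c / total) for t, c in counts.items()}

norms, infos = [], []
with torch.no_grad():
    for s in corpus:
        batch = tok(s, return_tensors="pt")
        hidden = enc(**batch).last_hidden_state[0]           # (seq_len, dim)
        tokens = tok.convert_ids_to_tokens(batch["input_ids"][0])
        for t, h in zip(tokens, hidden):
            if t in self_info:                               # skip [CLS]/[SEP]
                norms.append(h.norm().item())
                infos.append(self_info[t])

rho, _ = spearmanr(norms, infos)
print(f"Spearman correlation between token norm and self-information: {rho:.3f}")
```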

In-Context Learning Creates Task Vectors

  • paper_url: http://arxiv.org/abs/2310.15916
  • repo_url: https://github.com/roeehendel/icl_task_vectors
  • paper_authors: Roee Hendel, Mor Geva, Amir Globerson
  • for: 该研究探讨了大语言模型(LLM)中上下文学习(ICL)范式的底层机制。
  • methods: 该研究通过一系列实验证明,ICL 学到的函数结构往往非常简单:将训练集($S$)压缩为单个任务向量($\boldsymbol{\theta}(S)$),再用该任务向量调制 transformer 模型生成输出。
  • results: 该研究在多种模型和任务上进行了广泛实验来支持上述结论,证明 ICL 可以将训练集($S$)压缩为单个任务向量($\boldsymbol{\theta}(S)$),并用它调制 transformer 模型生成输出。
    Abstract In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set $S$ to find a best-fitting function $f(x)$ in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query $x$ and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing $S$ into a single task vector $\boldsymbol{\theta}(S)$ and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.
    摘要 大语言模型(LLM)中的上下文学习(ICL)已成为一种强大的新学习范式,但其底层机制仍不清楚,尤其是难以将其映射到"标准"的机器学习框架,即利用训练集 $S$ 在某个假设类中寻找最佳拟合函数 $f(x)$。本文在这一问题上取得进展:我们表明 ICL 学到的函数往往具有非常简单的结构,对应于只以查询 $x$ 和一个由训练集计算得到的"任务向量"作为输入的 transformer LLM。因此,ICL 可以被看作是将 $S$ 压缩为单个任务向量 $\boldsymbol{\theta}(S)$,再用该任务向量调制 transformer 以生成输出。我们通过在多种模型和任务上的全面实验来支持上述结论。
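
A minimal sketch of the task-vector idea described above, assuming GPT-2, an arbitrary layer index, and a toy "country -> capital" prompt format: a mid-layer hidden state from a demonstrations-only run is cached as θ(S) and patched into a zero-shot run on a new query. This is a simplified reading of the paper's procedure, not the authors' implementation.

```python
# Minimal sketch (assumptions: GPT-2, layer index, "->" prompt format) of the
# task-vector view of in-context learning: cache a mid-layer hidden state from
# a demonstrations-only run, then patch it into a zero-shot run on a new query.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6  # which block's output to treat as the task vector (an assumption)

def hidden_at_last_token(prompt: str, layer: int) -> torch.Tensor:
    """Return the layer-`layer` hidden state at the final prompt position."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1]            # shape: (hidden_dim,)

# theta(S): computed from demonstrations of a toy "country -> capital" task.
demos = "France -> Paris\nJapan -> Tokyo\nItaly -> Rome\nSpain ->"
task_vector = hidden_at_last_token(demos, LAYER)

def patch_hook(module, inputs, output):
    # Overwrite the hidden state at the last position with the task vector.
    hidden = output[0]
    hidden[:, -1, :] = task_vector
    return (hidden,) + output[1:]

# Zero-shot query (no demonstrations), modulated only by theta(S).
query_ids = tok("Germany ->", return_tensors="pt").input_ids
handle = model.transformer.h[LAYER - 1].register_forward_hook(patch_hook)
with torch.no_grad():
    logits = model(query_ids).logits[0, -1]
handle.remove()

print("Patched prediction:", tok.decode(logits.argmax().item()))
```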

Do Stochastic Parrots have Feelings Too? Improving Neural Detection of Synthetic Text via Emotion Recognition

  • paper_url: http://arxiv.org/abs/2310.15904
  • repo_url: https://github.com/alanagiasi/emoplmsynth
  • paper_authors: Alan Cowap, Yvette Graham, Jennifer Foster
  • for: This paper aims to address the issue of identifying synthetic text generated by high-performance generative AI models, specifically by leveraging the emotional content present in human-authored text.
  • methods: The authors fine-tune pre-trained language models (PLMs) on emotion to develop an emotionally-aware detector, which is tested on various synthetic text generators, model sizes, datasets, and domains.
  • results: The emotionally-aware detector achieves significant improvements in identifying synthetic text, particularly when compared to ChatGPT, reinforcing the potential of emotion as a signal for identifying synthetic text.
    Abstract Recent developments in generative AI have shone a spotlight on high-performance synthetic text generation technologies. The now wide availability and ease of use of such models highlights the urgent need to provide equally powerful technologies capable of identifying synthetic text. With this in mind, we draw inspiration from psychological studies which suggest that people can be driven by emotion and encode emotion in the text they compose. We hypothesize that pretrained language models (PLMs) have an affective deficit because they lack such an emotional driver when generating text and consequently may generate synthetic text which has affective incoherence i.e. lacking the kind of emotional coherence present in human-authored text. We subsequently develop an emotionally aware detector by fine-tuning a PLM on emotion. Experiment results indicate that our emotionally-aware detector achieves improvements across a range of synthetic text generators, various sized models, datasets, and domains. Finally, we compare our emotionally-aware synthetic text detector to ChatGPT in the task of identification of its own output and show substantial gains, reinforcing the potential of emotion as a signal to identify synthetic text. Code, models, and datasets are available at https://github.com/alanagiasi/emoPLMsynth
    摘要 生成式人工智能的最新进展使高性能合成文本生成技术备受关注。此类模型如今的广泛可用性和易用性,凸显了提供同样强大的合成文本识别技术的迫切需求。有鉴于此,我们从心理学研究中获得启发:人们的写作会受情绪驱动,并将情绪编码进所撰写的文本中。我们假设预训练语言模型(PLM)存在情感缺失,因为它们在生成文本时缺乏这种情绪驱动,因而可能生成情感不连贯的合成文本,即缺少人类撰写文本中的那种情感连贯性。我们随后通过在情绪数据上微调 PLM,开发了一种具备情绪感知能力的检测器。实验结果表明,该检测器在多种合成文本生成器、不同规模的模型、数据集和领域上均取得提升。最后,我们在识别 ChatGPT 自身输出的任务上将该检测器与 ChatGPT 进行比较,并显示出显著增益,进一步证明情绪可以作为识别合成文本的信号。代码、模型和数据集可在 https://github.com/alanagiasi/emoPLMsynth 获取。

BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT

  • paper_url: http://arxiv.org/abs/2310.15896
  • repo_url: https://github.com/scutcyr/bianque
  • paper_authors: Yirong Chen, Zhenyu Wang, Xiaofen Xing, huimin zheng, Zhipei Xu, Kai Fang, Junhong Wang, Sihang Li, Jieling Wu, Qi Liu, Xiangmin Xu
  • for: 本研究旨在改进大语言模型(LLMs)的问题链(CoQ),以提供个性化且有效的健康建议。
  • methods: 提出的 BianQue 模型是基于 ChatGLM 的 LLM,使用自构建的健康对话数据集 BianQueCorpus 进行微调;该数据集包含多轮问诊与健康建议,并经 ChatGPT 润色。
  • results: 实验结果表明,BianQue 能够同时平衡问诊与健康建议两方面的能力,有助于推动 LLMs 在主动健康领域的研究与应用。
    Abstract Large language models (LLMs) have performed well in providing general and extensive health suggestions in single-turn conversations, exemplified by systems such as ChatGPT, ChatGLM, ChatDoctor, DoctorGLM, and etc. However, the limited information provided by users during single turn results in inadequate personalization and targeting of the generated suggestions, which requires users to independently select the useful part. It is mainly caused by the missing ability to engage in multi-turn questioning. In real-world medical consultations, doctors usually employ a series of iterative inquiries to comprehend the patient's condition thoroughly, enabling them to provide effective and personalized suggestions subsequently, which can be defined as chain of questioning (CoQ) for LLMs. To improve the CoQ of LLMs, we propose BianQue, a ChatGLM-based LLM finetuned with the self-constructed health conversation dataset BianQueCorpus that is consist of multiple turns of questioning and health suggestions polished by ChatGPT. Experimental results demonstrate that the proposed BianQue can simultaneously balance the capabilities of both questioning and health suggestions, which will help promote the research and application of LLMs in the field of proactive health.
    摘要 大型语言模型(LLMs)在单轮对话中提供一般性、广泛的健康建议方面表现良好,例如 ChatGPT、ChatGLM、ChatDoctor、DoctorGLM 等系统。然而,用户在单轮对话中提供的信息有限,导致生成的建议缺乏足够的个性化和针对性,需要用户自行从中挑选有用的部分,这主要是因为模型缺乏进行多轮追问的能力。在现实的医疗问诊中,医生通常通过一系列迭代式询问来全面了解患者的状况,从而给出有效且个性化的建议;对 LLMs 而言,这可以定义为问题链(CoQ)。为了改进 LLMs 的 CoQ,我们提出 BianQue,一种基于 ChatGLM、使用自构建健康对话数据集 BianQueCorpus 微调的 LLM;该数据集包含经 ChatGPT 润色的多轮问诊与健康建议。实验结果表明,BianQue 能够同时平衡问诊与健康建议两方面的能力,有助于推动 LLMs 在主动健康领域的研究与应用。

A Contextualized Real-Time Multimodal Emotion Recognition for Conversational Agents using Graph Convolutional Networks in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.18363
  • repo_url: None
  • paper_authors: Fathima Abdul Rahman, Guang Lu
  • for: 本研究旨在提高对话代理人的情感认知能力,以提供更加人性化的交互体验。
  • methods: 本研究使用门控循环单元(GRU)从音频、视觉和文本模态中提取多模态特征,并级联训练图卷积网络(GCN)与强化学习(RL)智能体来实现情绪识别。
  • results: 在 IEMOCAP 数据集上与其他最先进模型的比较表明,conER-GRL 模型具有更优的情绪识别能力。
    Abstract Owing to the recent developments in Generative Artificial Intelligence (GenAI) and Large Language Models (LLM), conversational agents are becoming increasingly popular and accepted. They provide a human touch by interacting in ways familiar to us and by providing support as virtual companions. Therefore, it is important to understand the user's emotions in order to respond considerately. Compared to the standard problem of emotion recognition, conversational agents face an additional constraint in that recognition must be real-time. Studies on model architectures using audio, visual, and textual modalities have mainly focused on emotion classification using full video sequences that do not provide online features. In this work, we present a novel paradigm for contextualized Emotion Recognition using Graph Convolutional Network with Reinforcement Learning (conER-GRL). Conversations are partitioned into smaller groups of utterances for effective extraction of contextual information. The system uses Gated Recurrent Units (GRU) to extract multimodal features from these groups of utterances. More importantly, Graph Convolutional Networks (GCN) and Reinforcement Learning (RL) agents are cascade trained to capture the complex dependencies of emotion features in interactive scenarios. Comparing the results of the conER-GRL model with other state-of-the-art models on the benchmark dataset IEMOCAP demonstrates the advantageous capabilities of the conER-GRL architecture in recognizing emotions in real-time from multimodal conversational signals.
    摘要 随着生成式人工智能(GenAI)和大语言模型(LLM)的发展,对话代理正变得越来越普及和被接受。它们以我们熟悉的方式进行交互,并作为虚拟伙伴提供支持,从而带来人性化的体验;因此,理解用户的情绪以作出体贴的回应十分重要。与标准的情绪识别问题相比,对话代理还面临识别必须实时完成的额外约束。以往基于音频、视觉和文本模态的模型研究主要集中在利用完整视频序列进行情绪分类,无法提供在线特征。本文提出了一种基于图卷积网络与强化学习的上下文化情绪识别新范式(conER-GRL):对话被划分为较小的语句组以有效提取上下文信息,系统使用门控循环单元(GRU)从这些语句组中提取多模态特征,并级联训练图卷积网络(GCN)与强化学习(RL)智能体,以捕捉交互场景中情绪特征的复杂依赖关系。在基准数据集 IEMOCAP 上与其他最先进模型的比较结果表明,conER-GRL 架构在从多模态对话信号中实时识别情绪方面具有优势。

SoK: Memorization in General-Purpose Large Language Models

  • paper_url: http://arxiv.org/abs/2310.18362
  • repo_url: None
  • paper_authors: Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David Evans, Shruti Tople, Robert West
  • for: 本研究旨在探讨大语言模型(LLM)在各种应用中的发展,以及LLM memorization的问题。
  • methods: 本研究使用了一种新的分类方法,以描述LLM中的 memorization 类型,包括 verbatim text、事实、想法、算法、写作风格、分布性和对齐目标。
  • results: 研究发现,LLM 不仅可以记忆短语和概念,还可能记忆文本中的具体信息和写作风格。这些记忆可能引发隐私与安全问题,但同时也可能提升模型的性能。研究还揭示了由以模型行为(而非模型权重)定义记忆所带来的挑战,例如推理能力或不同解码算法之间的差异。
    Abstract Large Language Models (LLMs) are advancing at a remarkable pace, with myriad applications under development. Unlike most earlier machine learning models, they are no longer built for one specific application but are designed to excel in a wide range of tasks. A major part of this success is due to their huge training datasets and the unprecedented number of model parameters, which allow them to memorize large amounts of information contained in the training data. This memorization goes beyond mere language, and encompasses information only present in a few documents. This is often desirable since it is necessary for performing tasks such as question answering, and therefore an important part of learning, but also brings a whole array of issues, from privacy and security to copyright and beyond. LLMs can memorize short secrets in the training data, but can also memorize concepts like facts or writing styles that can be expressed in text in many different ways. We propose a taxonomy for memorization in LLMs that covers verbatim text, facts, ideas and algorithms, writing styles, distributional properties, and alignment goals. We describe the implications of each type of memorization - both positive and negative - for model performance, privacy, security and confidentiality, copyright, and auditing, and ways to detect and prevent memorization. We further highlight the challenges that arise from the predominant way of defining memorization with respect to model behavior instead of model weights, due to LLM-specific phenomena such as reasoning capabilities or differences between decoding algorithms. Throughout the paper, we describe potential risks and opportunities arising from memorization in LLMs that we hope will motivate new research directions.
    摘要 LLMs 可以记忆短语、事实、写作风格、分布性、对齐目标等。我们提出了 LLMs 的记忆分类,并描述了每种记忆的正面和负面影响,包括模型性能、隐私、安全、版权等方面。我们还描述了如何检测和预防记忆。然而,由于 LLMs 的特殊性,如推理能力或decoding算法的差异,我们需要更加注意记念的定义方式。在这篇论文中,我们描述了 LLMs 的记忆所带来的风险和机遇,希望能够激发新的研究方向。

Self-Guard: Empower the LLM to Safeguard Itself

  • paper_url: http://arxiv.org/abs/2310.15851
  • repo_url: None
  • paper_authors: Zezhong Wang, Fangkai Yang, Lu Wang, Pu Zhao, Hongru Wang, Liang Chen, Qingwei Lin, Kam-Fai Wong
  • for: 防止大型语言模型(LLM)遭受越狱攻击,避免 LLM 生成有害内容所带来的负面社会影响。
  • methods: 现有应对越狱攻击的方法主要有两类:安全训练和外部防护。安全训练通过进一步训练来提升 LLM 的安全性,外部防护则借助外部模型或过滤器阻止有害输出。然而,安全训练对新型攻击的适应能力有限,且常导致模型性能下降,外部防护的帮助也较为有限。
  • results: Self-Guard 方法既增强了 LLM 识别有害内容的能力,又使其能够对自身回复持续进行有害内容检测,从而抵御越狱攻击。实验结果表明 Self-Guard 具有稳健性。在坏例分析中,我们发现 LLM 偶尔会对有害查询给出无害的回答。此外,我们评估了安全训练前后 LLM 的通用能力,证明 Self-Guard 不会导致性能下降。在敏感性测试中,Self-Guard 不仅不会诱发 LLM 的过度敏感,甚至还能缓解这一问题。
    Abstract The jailbreak attack can bypass the safety measures of a Large Language Model (LLM), generating harmful content. This misuse of LLM has led to negative societal consequences. Currently, there are two main approaches to address jailbreak attacks: safety training and safeguards. Safety training focuses on further training LLM to enhance its safety. On the other hand, safeguards involve implementing external models or filters to prevent harmful outputs. However, safety training has constraints in its ability to adapt to new attack types and often leads to a drop in model performance. Safeguards have proven to be of limited help. To tackle these issues, we propose a novel approach called Self-Guard, which combines the strengths of both safety methods. Self-Guard includes two stages. In the first stage, we enhance the model's ability to assess harmful content, and in the second stage, we instruct the model to consistently perform harmful content detection on its own responses. The experiment has demonstrated that Self-Guard is robust against jailbreak attacks. In the bad case analysis, we find that LLM occasionally provides harmless responses to harmful queries. Additionally, we evaluated the general capabilities of the LLM before and after safety training, providing evidence that Self-Guard does not result in the LLM's performance degradation. In sensitivity tests, Self-Guard not only avoids inducing over-sensitivity in LLM but also can even mitigate this issue.
    摘要 越狱攻击可以绕过大型语言模型(LLM)的安全措施,使其生成有害内容,这种对 LLM 的滥用已经造成负面的社会后果。目前应对越狱攻击主要有两种方法:安全训练和外部防护。安全训练侧重于进一步训练 LLM 以增强其安全性;外部防护则通过引入外部模型或过滤器来阻止有害输出。然而,安全训练在适应新型攻击方面能力有限,且往往导致模型性能下降;外部防护的作用也被证明较为有限。为了解决这些问题,我们提出一种名为 Self-Guard 的新方法,结合了两类安全方法的优点。Self-Guard 包含两个阶段:第一阶段增强模型评估有害内容的能力,第二阶段指示模型对自己的回复持续进行有害内容检测。实验表明,Self-Guard 能够稳健地抵御越狱攻击。在坏例分析中,我们发现 LLM 偶尔会对有害查询给出无害的回答。此外,我们评估了安全训练前后 LLM 的通用能力,证明 Self-Guard 不会导致 LLM 性能下降。在敏感性测试中,Self-Guard 不仅不会诱发 LLM 的过度敏感,甚至还能缓解这一问题。
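
The sketch below mimics only the inference-time behavior Self-Guard aims for: the model tags its own reply and a thin wrapper filters on that tag. In the paper this behavior is instilled by a two-stage training procedure; the `chat` callable, tag format, and refusal text here are illustrative assumptions.

```python
# Minimal sketch of Self-Guard's intended inference-time behavior: the model is
# expected to append a [harmful]/[harmless] tag to its own reply, and a thin
# wrapper suppresses replies tagged harmful. The `chat` function, the tag
# format, and the refusal text are illustrative assumptions, not the paper's
# actual setup (there, the behavior comes from a two-stage training procedure).
from typing import Callable

GUARD_SUFFIX = (
    "\n\nAfter answering, review your own answer and append exactly one tag: "
    "[harmless] if it is safe, [harmful] if it could enable harm."
)
REFUSAL = "I can't help with that request."

def self_guarded_reply(user_query: str, chat: Callable[[str], str]) -> str:
    """Ask the model to answer and self-assess; filter on its own tag."""
    raw = chat(user_query + GUARD_SUFFIX)
    if raw.rstrip().endswith("[harmful]"):
        return REFUSAL
    return raw.replace("[harmless]", "").rstrip()

if __name__ == "__main__":
    # Stub model so the sketch runs without an API; swap in a real LLM client.
    def fake_chat(prompt: str) -> str:
        return "Here is a pasta recipe: boil water, add pasta... [harmless]"
    print(self_guarded_reply("How do I cook pasta?", fake_chat))
```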

Unnatural language processing: How do language models handle machine-generated prompts?

  • paper_url: http://arxiv.org/abs/2310.15829
  • repo_url: None
  • paper_authors: Corentin Kervadec, Francesca Franzon, Marco Baroni
  • for: 这篇论文主要研究语言模型提示(prompt)优化的问题。
  • methods: 这篇论文使用自动生成的 token 序列来探测语言模型对非自然语言输入的响应。
  • results: 研究发现,自动生成的 token 序列往往优于人工撰写的提示,且会使模型表现出不同的响应模式。
    Abstract Language model prompt optimization research has shown that semantically and grammatically well-formed manually crafted prompts are routinely outperformed by automatically generated token sequences with no apparent meaning or syntactic structure, including sequences of vectors from a model's embedding space. We use machine-generated prompts to probe how models respond to input that is not composed of natural language expressions. We study the behavior of models of different sizes in multiple semantic tasks in response to both continuous and discrete machine-generated prompts, and compare it to the behavior in response to human-generated natural-language prompts. Even when producing a similar output, machine-generated and human prompts trigger different response patterns through the network processing pathways, including different perplexities, different attention and output entropy distributions, and different unit activation profiles. We provide preliminary insight into the nature of the units activated by different prompt types, suggesting that only natural language prompts recruit a genuinely linguistic circuit.
    摘要 语言模型提示优化研究显示,语义和语法均良好的人工撰写提示,常常被自动生成、没有明显含义或句法结构的 token 序列(包括来自模型嵌入空间的向量序列)所超越。我们使用机器生成的提示来探测模型如何响应并非由自然语言表达构成的输入。我们研究了不同规模的模型在多个语义任务中对连续型和离散型机器生成提示的行为,并与其对人工撰写的自然语言提示的行为进行比较。即使产生相似的输出,机器生成提示和人工提示也会在网络处理路径中触发不同的响应模式,包括不同的困惑度、不同的注意力与输出熵分布,以及不同的单元激活模式。我们还对不同类型提示所激活的单元的性质给出了初步洞见,表明只有自然语言提示才会调动真正的语言回路。

Generative Language Models Exhibit Social Identity Biases

  • paper_url: http://arxiv.org/abs/2310.15819
  • repo_url: None
  • paper_authors: Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek
  • for: 这个研究探讨了现代大语言模型是否具有基本社会身份偏见,以及这些偏见是如何从人类语言模型中学习的。
  • methods: 研究人员对 51 个大语言模型进行了考察,发现大多数基础语言模型以及部分指令微调模型在补全句子(如"We are...")时表现出明显的内群体积极、外群体消极的偏见。
  • results: 研究发现,在微调数据中增加内群体积极或外群体消极的句子,会使模型表现出更强的内群体团结和更大的外群体敌意;反之,从微调数据中移除这类句子可以减轻模型的偏见。这些结果表明,现代大语言模型具有基本的社会身份偏见,并且可以通过精心筛选训练数据来缓解这些偏见。
    Abstract The surge in popularity of large language models has given rise to concerns about biases that these models could learn from humans. In this study, we investigate whether ingroup solidarity and outgroup hostility, fundamental social biases known from social science, are present in 51 large language models. We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative biases when prompted to complete sentences (e.g., "We are..."). A comparison of LLM-generated sentences with human-written sentences on the internet reveals that these models exhibit similar level, if not greater, levels of bias than human text. To investigate where these biases stem from, we experimentally varied the amount of ingroup-positive or outgroup-negative sentences the model was exposed to during fine-tuning in the context of the United States Democrat-Republican divide. Doing so resulted in the models exhibiting a marked increase in ingroup solidarity and an even greater increase in outgroup hostility. Furthermore, removing either ingroup-positive or outgroup-negative sentences (or both) from the fine-tuning data leads to a significant reduction in both ingroup solidarity and outgroup hostility, suggesting that biases can be reduced by removing biased training data. Our findings suggest that modern language models exhibit fundamental social identity biases and that such biases can be mitigated by curating training data. Our results have practical implications for creating less biased large-language models and further underscore the need for more research into user interactions with LLMs to prevent potential bias reinforcement in humans.
    摘要 现代大型语言模型的流行性带来了对这些模型可能学习到的偏见的担忧。在这项研究中,我们调查了51个大型语言模型是否具有社会偏见。我们发现,大多数基础语言模型和一些特定任务练习模型在完成句子时表现出了明显的内群团结和外群敌对偏见。与人类文本相比,这些模型的偏见水平可能相当或更高。为了探索这些偏见的来源,我们在模型微调过程中采用了不同的群体偏见训练数据,并观察到模型在美国民主党和共和党的分化下表现出了明显的内群团结和外群敌对偏见。此外,从微调数据中移除内群团结或外群敌对的句子后,模型中的偏见有显著减少的趋势,这表明可以通过修改训练数据来减少偏见。我们的发现表明现代大型语言模型具有基本的社会标识偏见,并且可以通过精心修改训练数据来减少这些偏见。这些结论有实际意义,可以帮助创建更少偏见的大型语言模型,并且更加重要的是,防止人类与LLM之间的偏见循环。

BLESS: Benchmarking Large Language Models on Sentence Simplification

  • paper_url: http://arxiv.org/abs/2310.15773
  • repo_url: https://github.com/zurichnlp/bless
  • paper_authors: Tannon Kew, Alison Chi, Laura Vásquez-Rodríguez, Sweta Agrawal, Dennis Aumiller, Fernando Alva-Manchego, Matthew Shardlow
  • for: 这个论文的目的是为了测试最新的大语言模型(LLMs)在文本简化(TS)任务上的性能,以及这些模型是否可以解决这个复杂的任务。
  • methods: 这个论文使用了44种不同的大语言模型,包括不同的大小、结构、预训练方法和可访问性。这些模型在三个不同的领域(Wikipedia、新闻和医学)上进行了三种不同的测试集。
  • results: 研究发现,最佳的LLMs,即使没有专门为TS进行训练,也能与当前TS基线一样好。此外,研究还发现了一些模型在执行编辑操作方面的多样性和创新性。这个性能评估将成为未来TS方法和评价度量的开发资源。
    Abstract We present BLESS, a comprehensive performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS). We examine how well off-the-shelf LLMs can solve this challenging task, assessing a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting. Our analysis considers a suite of automatic metrics as well as a large-scale quantitative investigation into the types of common edit operations performed by the different models. Furthermore, we perform a manual qualitative analysis on a subset of model outputs to better gauge the quality of the generated simplifications. Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines. Additionally, we find that certain LLMs demonstrate a greater range and diversity of edit operations. Our performance benchmark will be available as a resource for the development of future TS methods and evaluation metrics.
    摘要 我们提出了BLESS,一个全面的性能评测标准,用于评测最新的大语言模型(LLMs)在文本简化(TS)任务上的表现。我们对44种不同的模型进行了评测,这些模型之间有不同的大小、结构、预训练方法和可访问性。我们使用三个来自不同领域(Wikipedia、新闻和医学)的测试集,在几个shot设定下进行了评测。我们的分析包括一系列自动度量器以及大规模的量化分析,以评估不同模型在TS任务上的表现。此外,我们还进行了一些手动质量分析,以更好地评估模型生成的简化结果质量。我们的评估结果表明,最佳的LLMs,即使没有直接针对TS进行训练,也能够与当前TS基准集成比肩。此外,我们发现某些LLMs在生成简化结果时拥有更广泛和多样化的编辑操作。我们的性能评测标准将作为未来TS方法和评价度量器的开发资源。

Learning From Free-Text Human Feedback – Collect New Datasets Or Extend Existing Ones?

  • paper_url: http://arxiv.org/abs/2310.15758
  • repo_url: https://github.com/ukplab/emnlp2023-learning-from-free-text-human-feedback
  • paper_authors: Dominic Petrak, Nafise Sadat Moosavi, Ye Tian, Nikolai Rozanov, Iryna Gurevych
  • for: 这篇论文的目的是研究对话系统学习自由文本人类反馈的可能性,以及使用现成对话数据集进行增强。
  • methods: 该论文使用了现成对话数据集,包括MultiWoZ、SGD、BABI、PersonaChat、Wizards-of-Wikipedia以及Self-Feeding Chatbot的人机分割。然后,通过对这些数据集中的自由文本人类反馈进行分类,derive出新的对话数据集的分类法。最后,用三种现状前景语言生成模型进行response生成,以评估包括这些数据集的影响。
  • results: 该论文的结果显示,包括MultiWoZ、SGD、BABI、PersonaChat、Wizards-of-Wikipedia以及Self-Feeding Chatbot的现成对话数据集中,自由文本人类反馈的类型和频率各不相同。此外,通过使用新的分类法,可以对对话数据集进行更好的增强。此外,通过使用三种现状前景语言生成模型进行response生成,可以评估包括这些数据集的影响。
    Abstract Learning from free-text human feedback is essential for dialog systems, but annotated data is scarce and usually covers only a small fraction of error types known in conversational AI. Instead of collecting and annotating new datasets from scratch, recent advances in synthetic dialog generation could be used to augment existing dialog datasets with the necessary annotations. However, to assess the feasibility of such an effort, it is important to know the types and frequency of free-text human feedback included in these datasets. In this work, we investigate this question for a variety of commonly used dialog datasets, including MultiWoZ, SGD, BABI, PersonaChat, Wizards-of-Wikipedia, and the human-bot split of the Self-Feeding Chatbot. Using our observations, we derive new taxonomies for the annotation of free-text human feedback in dialogs and investigate the impact of including such data in response generation for three SOTA language generation models, including GPT-2, LLAMA, and Flan-T5. Our findings provide new insights into the composition of the datasets examined, including error types, user response types, and the relations between them.
    摘要 从自由文本的人类反馈中学习对对话系统至关重要,但已标注的数据十分稀缺,而且通常只覆盖对话式 AI 中已知错误类型的一小部分。与从零开始收集和标注新数据集相比,合成对话生成方面的最新进展可以用于为现有对话数据集补充所需的标注。然而,为评估这一做法的可行性,需要了解这些数据集中包含的自由文本人类反馈的类型和频率。在这项工作中,我们针对多个常用的对话数据集研究了这一问题,包括 MultiWoZ、SGD、BABI、PersonaChat、Wizards-of-Wikipedia 以及 Self-Feeding Chatbot 的人机划分部分。基于我们的观察,我们为对话中自由文本人类反馈的标注推导出新的分类体系,并考察了在响应生成中纳入此类数据对 GPT-2、LLAMA 和 Flan-T5 这三种最先进语言生成模型的影响。我们的发现为所考察数据集的构成提供了新的洞见,包括错误类型、用户反馈类型以及它们之间的关系。

Do Differences in Values Influence Disagreements in Online Discussions?

  • paper_url: http://arxiv.org/abs/2310.15757
  • repo_url: https://github.com/m0re4u/value-disagreement
  • paper_authors: Michiel van der Meer, Piek Vossen, Catholijn M. Jonker, Pradeep K. Murukannaiah
  • for: 本研究旨在 investigating 在 онлайн讨论中的不同意见是否与个人价值观有关,并探讨如何使用现有的模型来估算个人价值观。
  • methods: 本研究使用最先进的语言模型来估计在线讨论中的个人价值观,然后将估计出的价值观聚合为价值档案(value profile),最后利用人工标注的一致性标签来评估这些价值档案。
  • results: 研究发现,在某些情况下,价值档案之间的差异与观点分歧相关;此外,在一致性预测中加入价值信息能够提升性能。
    Abstract Disagreements are common in online discussions. Disagreement may foster collaboration and improve the quality of a discussion under some conditions. Although there exist methods for recognizing disagreement, a deeper understanding of factors that influence disagreement is lacking in the literature. We investigate a hypothesis that differences in personal values are indicative of disagreement in online discussions. We show how state-of-the-art models can be used for estimating values in online discussions and how the estimated values can be aggregated into value profiles. We evaluate the estimated value profiles based on human-annotated agreement labels. We find that the dissimilarity of value profiles correlates with disagreement in specific cases. We also find that including value information in agreement prediction improves performance.
    摘要 在线讨论中的分歧十分常见。在某些条件下,分歧可能促进协作并提升讨论质量。尽管已有识别分歧的方法,文献中仍缺乏对影响分歧的因素的深入理解。我们研究了这样一个假设:个人价值观的差异是在线讨论中分歧的指标。我们展示了如何使用最先进的模型估计在线讨论中的价值观,以及如何将估计出的价值观聚合为价值档案。我们基于人工标注的一致性标签对估计出的价值档案进行评估。我们发现,在某些情况下,价值档案之间的差异与分歧相关;同时,在一致性预测中加入价值信息能够提升性能。
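
A minimal sketch of the final analysis step described above: comparing pairwise value-profile dissimilarity with agreement labels. The toy profiles, labels, and the point-biserial correlation are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch (not the authors' pipeline): given per-author value profiles
# (vectors over a fixed set of value categories) and agreement labels for
# comment pairs, check whether profile dissimilarity correlates with
# disagreement. The toy profiles and labels below are made up for illustration.
import numpy as np
from scipy.spatial.distance import cosine
from scipy.stats import pointbiserialr

profiles = {
    "author_a": np.array([0.8, 0.1, 0.4, 0.2]),
    "author_b": np.array([0.7, 0.2, 0.5, 0.1]),
    "author_c": np.array([0.1, 0.9, 0.2, 0.8]),
}
# (author_1, author_2, disagreed?) where 1 means the pair disagreed.
pairs = [("author_a", "author_b", 0), ("author_a", "author_c", 1),
         ("author_b", "author_c", 1)]

dissim = [cosine(profiles[a], profiles[b]) for a, b, _ in pairs]
labels = [d for _, _, d in pairs]

r, p = pointbiserialr(labels, dissim)
print(f"point-biserial correlation between dissimilarity and disagreement: r={r:.2f} (p={p:.2f})")
```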

Failures Pave the Way: Enhancing Large Language Models through Tuning-free Rule Accumulation

  • paper_url: http://arxiv.org/abs/2310.15746
  • repo_url: https://github.com/thunlp-mt/tran
  • paper_authors: Zeyuan Yang, Peng Li, Yang Liu
  • for: 提高大型语言模型(LLM)的性能
  • methods: 使用免调参的规则积累(Tuning-free Rule Accumulation,TRAN)框架,让 LLM 从错误案例中积累规则并据此改进表现
  • results: 实验表明,与近期基线相比,TRAN 能大幅提升 LLM 的性能。
    Abstract Large Language Models (LLMs) have showcased impressive performance. However, due to their inability to capture relationships among samples, these frozen LLMs inevitably keep repeating similar mistakes. In this work, we propose our Tuning-free Rule Accumulation (TRAN) framework, which guides LLMs in improving their performance by learning from previous mistakes. Considering data arrives sequentially, LLMs gradually accumulate rules from incorrect cases, forming a rule collection. These rules are then utilized by the LLMs to avoid making similar mistakes when processing subsequent inputs. Moreover, the rules remain independent of the primary prompts, seamlessly complementing prompt design strategies. Experimentally, we show that TRAN improves over recent baselines by a large margin.
    摘要 大型语言模型(LLM)已经展现出令人印象深刻的表现。然而,由于无法捕捉样本之间的关系,这些被冻结的 LLM 不可避免地会不断重复相似的错误。在这项工作中,我们提出免调参的规则积累(TRAN)框架,引导 LLM 从先前的错误中学习以改进表现。由于数据按顺序到达,LLM 会从错误案例中逐步积累规则,形成一个规则集合;在处理后续输入时,LLM 会利用这些规则来避免犯类似的错误。此外,这些规则独立于主要提示,可与提示设计策略无缝互补。实验表明,TRAN 相比近期基线有大幅提升。
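
A minimal sketch of a tuning-free rule-accumulation loop in the spirit of TRAN: wrong answers are turned into textual rules, and the most relevant stored rules are prepended to later prompts. The `llm` stub, keyword-overlap retrieval, and rule template are simplifying assumptions rather than the paper's exact method.

```python
# Minimal sketch of tuning-free rule accumulation: the frozen model's wrong
# answers are turned into textual rules, and the most relevant stored rules are
# prepended to later prompts. The `llm` stub, the keyword-overlap retrieval,
# and the rule template are simplifying assumptions, not the paper's method.
from typing import Callable, List

class RuleAccumulator:
    def __init__(self, llm: Callable[[str], str], top_k: int = 3):
        self.llm = llm
        self.rules: List[str] = []
        self.top_k = top_k

    def _retrieve(self, question: str) -> List[str]:
        """Pick stored rules sharing the most words with the question."""
        q_words = set(question.lower().split())
        scored = sorted(self.rules,
                        key=lambda r: len(q_words & set(r.lower().split())),
                        reverse=True)
        return scored[: self.top_k]

    def answer(self, question: str) -> str:
        hints = self._retrieve(question)
        prompt = "".join(f"Rule: {r}\n" for r in hints) + f"Question: {question}\nAnswer:"
        return self.llm(prompt)

    def learn_from_mistake(self, question: str, wrong: str, gold: str) -> None:
        """Ask the (frozen) model to summarize the failure as a reusable rule."""
        rule = self.llm(
            f"The answer '{wrong}' to '{question}' was wrong; the correct answer is "
            f"'{gold}'. State, in one sentence, a general rule that avoids this mistake."
        )
        self.rules.append(rule.strip())

# Usage with a stub model; data arrives sequentially, as in the paper's setting.
agent = RuleAccumulator(llm=lambda p: "stub response")
agent.learn_from_mistake("Is a tomato a vegetable?", "yes", "botanically, it is a fruit")
print(agent.answer("Is a cucumber a fruit?"))
```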

RAPL: A Relation-Aware Prototype Learning Approach for Few-Shot Document-Level Relation Extraction

  • paper_url: http://arxiv.org/abs/2310.15743
  • repo_url: None
  • paper_authors: Shiao Meng, Xuming Hu, Aiwei Liu, Shu’ang Li, Fukun Ma, Yawen Yang, Lijie Wen
  • for: 这篇论文旨在提高少样本文档级关系抽取中语义关系识别的精度。
  • methods: 该方法采用基于度量的元学习框架,通过构建类原型来进行分类。
  • results: 在两个 FSDLRE 基准的多种设置下,该方法平均比先前最先进的方法高出 2.61% 的 $F_1$ 值。
    Abstract How to identify semantic relations among entities in a document when only a few labeled documents are available? Few-shot document-level relation extraction (FSDLRE) is crucial for addressing the pervasive data scarcity problem in real-world scenarios. Metric-based meta-learning is an effective framework widely adopted for FSDLRE, which constructs class prototypes for classification. However, existing works often struggle to obtain class prototypes with accurate relational semantics: 1) To build prototype for a target relation type, they aggregate the representations of all entity pairs holding that relation, while these entity pairs may also hold other relations, thus disturbing the prototype. 2) They use a set of generic NOTA (none-of-the-above) prototypes across all tasks, neglecting that the NOTA semantics differs in tasks with different target relation types. In this paper, we propose a relation-aware prototype learning method for FSDLRE to strengthen the relational semantics of prototype representations. By judiciously leveraging the relation descriptions and realistic NOTA instances as guidance, our method effectively refines the relation prototypes and generates task-specific NOTA prototypes. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches by average 2.61% $F_1$ across various settings of two FSDLRE benchmarks.
    摘要 当只有少量已标注文档时,如何识别文档中实体之间的语义关系?少样本文档级关系抽取(FSDLRE)是解决现实场景中普遍存在的数据稀缺问题的关键。基于度量的元学习是 FSDLRE 广泛采用的有效框架,它通过构建类原型来进行分类。然而,现有工作往往难以获得具有准确关系语义的类原型:1)在为目标关系类型构建原型时,它们聚合所有持有该关系的实体对的表示,而这些实体对可能同时持有其他关系,从而干扰原型;2)它们在所有任务中使用一组通用的 NOTA(none-of-the-above)原型,忽略了不同目标关系类型的任务中 NOTA 语义的差异。本文提出一种关系感知的原型学习方法,以强化原型表示的关系语义。通过恰当地利用关系描述和真实的 NOTA 实例作为指导,我们的方法有效地细化了关系原型并生成了任务特定的 NOTA 原型。大量实验表明,在两个 FSDLRE 基准的多种设置下,我们的方法平均比最先进方法高出 2.61% 的 $F_1$。
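
For context, the sketch below shows the generic metric-based prototype baseline that RAPL refines: class prototypes are means of support-pair embeddings, plus one task-specific NOTA prototype. The random embeddings and cosine scoring are illustrative assumptions; the paper's relation-aware refinement is not reproduced here.

```python
# Minimal sketch of metric-based prototypes for few-shot relation classification:
# class prototypes are means of support-pair embeddings, plus one task-specific
# NOTA prototype built from pairs that hold none of the target relations.
# This is the generic baseline RAPL improves on, not the paper's method; the
# toy embeddings are random stand-ins for entity-pair representations.
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Support set: entity-pair embeddings grouped by target relation type.
support = {
    "founded_by": rng.normal(size=(5, dim)),
    "located_in": rng.normal(size=(5, dim)),
}
nota_pairs = rng.normal(size=(8, dim))   # realistic NOTA instances for this task

prototypes = {rel: embs.mean(axis=0) for rel, embs in support.items()}
prototypes["NOTA"] = nota_pairs.mean(axis=0)   # task-specific NOTA prototype

def classify(pair_emb: np.ndarray) -> str:
    """Assign the query pair to the nearest prototype by cosine similarity."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(prototypes, key=lambda rel: cos(pair_emb, prototypes[rel]))

query = rng.normal(size=dim)
print("Predicted relation:", classify(query))
```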

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

  • paper_url: http://arxiv.org/abs/2310.15724
  • repo_url: https://github.com/thunlp/compression-plugin
  • paper_authors: Chaojun Xiao, Yuqi Luo, Wenbin Zhang, Pengle Zhang, Xu Han, Yankai Lin, Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou
  • for: 提高 NLP 任务的计算效率,减少 Parameters 的大小和计算成本。
  • methods: 使用即插即用的压缩插件,将多个隐藏向量压缩为一个,并在保持原始 PLM 冻结的情况下训练插件。
  • results: 在七个数据集上验证了 Variator 的有效性:仅需增加 0.9% 的参数即可节省 53% 的计算成本,性能下降不到 2%。
    Abstract Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs. In this paper, we propose Variator, a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins. Compression plugins are designed to reduce the sequence length via compressing multiple hidden vectors into one and trained with original PLMs frozen. Different from traditional model acceleration methods, which compress PLMs to smaller sizes, Variator offers two distinct advantages: (1) In real-world applications, the plug-and-play nature of our compression plugins enables dynamic selection of different compression plugins with varying acceleration ratios based on the current workload. (2) The compression plugin comprises a few compact neural network layers with minimal parameters, significantly saving storage and memory overhead, particularly in scenarios with a growing number of tasks. We validate the effectiveness of Variator on seven datasets. Experimental results show that Variator can save 53% computational costs using only 0.9% additional parameters with a performance drop of less than 2%. Moreover, when the model scales to billions of parameters, Variator matches the strong performance of uncompressed PLMs.
    摘要 预训练语言模型(PLM)在 NLP 任务上取得了显著成果,但代价是庞大的参数量及随之而来的计算开销。本文提出 Variator,一种参数高效的加速方法,通过即插即用的压缩插件提升计算效率。压缩插件通过将多个隐藏向量压缩为一个来缩短序列长度,并在原始 PLM 保持冻结的情况下进行训练。与将 PLM 压缩为更小规模的传统模型加速方法不同,Variator 具有两个独特优势:(1)在实际应用中,压缩插件的即插即用特性使得可以根据当前工作负载动态选择具有不同加速比的压缩插件;(2)压缩插件仅由少量参数极少的紧凑神经网络层组成,能显著节省存储和内存开销,尤其是在任务数量不断增长的场景下。我们在七个数据集上验证了 Variator 的有效性。实验结果表明,Variator 仅需增加 0.9% 的参数即可节省 53% 的计算成本,性能下降不到 2%;当模型规模扩展到数十亿参数时,Variator 的性能与未压缩的 PLM 相当。
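
A minimal sketch of what a plug-and-play compression plugin of this kind might look like: a small trainable layer that merges every k consecutive hidden vectors into one while the backbone stays frozen. The layer sizes, k, and insertion point are assumptions for illustration, not Variator's exact design.

```python
# Minimal sketch of a plug-and-play compression plugin: a tiny trainable layer
# that merges every k consecutive hidden vectors into one, shortening the
# sequence a frozen backbone has to process. Layer sizes, k, and where the
# plugin is inserted are illustrative assumptions, not Variator's exact design.
import torch
import torch.nn as nn

class CompressionPlugin(nn.Module):
    def __init__(self, hidden_dim: int, k: int = 4):
        super().__init__()
        self.k = k
        self.merge = nn.Linear(k * hidden_dim, hidden_dim)  # the only trainable part

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, dim) -> (batch, ceil(seq_len / k), dim)
        b, n, d = hidden.shape
        pad = (-n) % self.k
        if pad:
            hidden = torch.cat([hidden, hidden.new_zeros(b, pad, d)], dim=1)
        groups = hidden.view(b, -1, self.k * d)
        return self.merge(groups)

# Usage: freeze the backbone, train only the plugin(s).
backbone_dim = 768
plugin = CompressionPlugin(backbone_dim, k=4)
states = torch.randn(2, 10, backbone_dim)       # stand-in for PLM hidden states
print(plugin(states).shape)                     # torch.Size([2, 3, 768])
```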

Re-Temp: Relation-Aware Temporal Representation Learning for Temporal Knowledge Graph Completion

  • paper_url: http://arxiv.org/abs/2310.15722
  • repo_url: None
  • paper_authors: Kunze Wang, Soyeon Caren Han, Josiah Poon
  • for: 预测未来事物中缺失的实体(Temporal Knowledge Graph Completion under extrapolation setting)
  • methods: 利用显式时间嵌入和每个时间戳后的跳过信息流来避免无关信息的干扰,并引入两阶段前向传播方法以防止信息泄露
  • results: 在六个 TKGC(外推)数据集上显著超越其他八个最新的最先进模型
    Abstract Temporal Knowledge Graph Completion (TKGC) under the extrapolation setting aims to predict the missing entity from a fact in the future, posing a challenge that aligns more closely with real-world prediction problems. Existing research mostly encodes entities and relations using sequential graph neural networks applied to recent snapshots. However, these approaches tend to overlook the ability to skip irrelevant snapshots according to entity-related relations in the query and disregard the importance of explicit temporal information. To address this, we propose our model, Re-Temp (Relation-Aware Temporal Representation Learning), which leverages explicit temporal embedding as input and incorporates skip information flow after each timestamp to skip unnecessary information for prediction. Additionally, we introduce a two-phase forward propagation method to prevent information leakage. Through the evaluation on six TKGC (extrapolation) datasets, we demonstrate that our model outperforms all eight recent state-of-the-art models by a significant margin.
    摘要 时序知识图谱补全(TKGC)在外推设定下的任务是预测未来事实中缺失的实体,这更接近现实中的预测问题。现有研究大多使用序列图神经网络对最近的快照进行编码,但这些方法往往忽略了根据查询中与实体相关的关系跳过无关快照的能力,也忽视了显式时间信息的重要性。为了解决这一问题,我们提出 Re-Temp(关系感知的时间表示学习)模型:它以显式时间嵌入作为输入,并在每个时间戳之后引入跳过信息流,以便在预测时跳过不必要的信息;此外,我们还提出一种两阶段前向传播方法以防止信息泄露。在六个 TKGC(外推)数据集上的评估表明,我们的模型以显著优势超越了全部八个最新的最先进模型。

Ensemble of Task-Specific Language Models for Brain Encoding

  • paper_url: http://arxiv.org/abs/2310.15720
  • repo_url: https://github.com/jr-john/ensemble_brain_encoders
  • paper_authors: Sanjai Kumaran, Arvindh Arun, Jerrin John
  • for: 用于提高语言模型对大脑响应的预测性能
  • methods: 使用从 10 个流行自然语言处理任务中学到的表示进行迁移学习,并构建了一个集成模型
  • results: 在所有 ROI 上平均比基线提升 10% 的性能
    Abstract Language models have been shown to be rich enough to encode fMRI activations of certain Regions of Interest in our Brains. Previous works have explored transfer learning from representations learned for popular natural language processing tasks for predicting brain responses. In our work, we improve the performance of such encoders by creating an ensemble model out of 10 popular Language Models (2 syntactic and 8 semantic). We beat the current baselines by 10% on average across all ROIs through our ensembling methods.
    摘要 语言模型已被证明足以编码大脑中某些感兴趣区域(ROI)的 fMRI 激活。先前的工作探索了从流行的自然语言处理任务中学到的表示进行迁移学习,以预测大脑响应。在我们的工作中,我们将 10 个流行语言模型(2 个句法模型和 8 个语义模型)组成集成模型,以提升此类编码器的性能;通过集成方法,我们在所有 ROI 上平均超越当前基线 10%。
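
A minimal sketch of the encoding setup described above: one ridge regressor per language model's stimulus features, with predictions averaged across models. The synthetic data, three-model ensemble, and simple averaging are illustrative assumptions; the paper ensembles ten task-specific models under its own evaluation protocol.

```python
# Minimal sketch of an ensemble brain encoder: one ridge regression per language
# model maps that model's stimulus features to voxel responses, and predictions
# are averaged. Synthetic data and simple averaging are illustrative assumptions;
# the paper ensembles 10 task-specific models with its own evaluation protocol.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_stimuli, n_voxels = 200, 50

# Stimulus features from several (here: three) different language models.
feature_sets = [rng.normal(size=(n_stimuli, d)) for d in (768, 768, 1024)]
fmri = rng.normal(size=(n_stimuli, n_voxels))       # stand-in for ROI responses

train, test = slice(0, 150), slice(150, 200)
preds = []
for feats in feature_sets:
    model = Ridge(alpha=1.0).fit(feats[train], fmri[train])
    preds.append(model.predict(feats[test]))
ensemble_pred = np.mean(preds, axis=0)

# Voxel-wise Pearson correlation, a common brain-encoding metric.
def voxelwise_corr(a, b):
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

print("mean voxel correlation:", voxelwise_corr(ensemble_pred, fmri[test]).mean())
```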

Enhancing Biomedical Lay Summarisation with External Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2310.15702
  • repo_url: https://github.com/tgoldsack1/enhancing_biomedical_lay_summarisation_with_external_knowledge_graphs
  • paper_authors: Tomas Goldsack, Zhihao Zhang, Chen Tang, Carolina Scarton, Chenghua Lin
  • for: 这篇论文主要是为了提供一种自动生成简要摘要的方法,以便将专业技术文章简化成普通读者可以理解的语言。
  • methods: 这篇论文使用了知识图来提高自动生成简要摘要的效果,并系统地研究了三种不同的方法,每种方法targeting一个不同的encoder-decoder模型结构。
  • results: 结果表明,通过 интеGRATING graph-based domain knowledge,可以很大地提高自动生成简要摘要的可读性和技术概念的解释。
    Abstract Previous approaches for automatic lay summarisation are exclusively reliant on the source article that, given it is written for a technical audience (e.g., researchers), is unlikely to explicitly define all technical concepts or state all of the background information that is relevant for a lay audience. We address this issue by augmenting eLife, an existing biomedical lay summarisation dataset, with article-specific knowledge graphs, each containing detailed information on relevant biomedical concepts. Using both automatic and human evaluations, we systematically investigate the effectiveness of three different approaches for incorporating knowledge graphs within lay summarisation models, with each method targeting a distinct area of the encoder-decoder model architecture. Our results confirm that integrating graph-based domain knowledge can significantly benefit lay summarisation by substantially increasing the readability of generated text and improving the explanation of technical concepts.
    摘要 以往的自动通俗摘要方法完全依赖于源文章,而源文章是为技术读者(如研究人员)撰写的,不太可能明确定义所有技术概念或陈述对普通读者而言相关的全部背景信息。我们通过为现有的生物医学通俗摘要数据集 eLife 补充文章级知识图谱来解决这一问题,每个知识图谱都包含相关生物医学概念的详细信息。通过自动和人工评估,我们系统地研究了三种在通俗摘要模型中融入知识图谱的方法,每种方法针对编码器-解码器模型架构的不同部分。结果证实,融入基于图的领域知识能够显著提升生成文本的可读性并改进对技术概念的解释,从而有益于通俗摘要。

COPF: Continual Learning Human Preference through Optimal Policy Fitting

  • paper_url: http://arxiv.org/abs/2310.15694
  • repo_url: None
  • paper_authors: Han Zhang, Lin Gui, Yuanzhao Zhai, Hui Wang, Yu Lei, Ruifeng Xu
  • for: 使预训练语言模型(LM)更好地符合人类偏好
  • methods: 提出 COPF 方法,利用蒙特卡洛方法估计一系列最优策略,并以函数正则化持续拟合策略序列,无需在每次出现新的查询或反馈时进行全量重新训练
  • results: 实验结果表明,COPF 方法在不同任务和领域中能够持续与人类偏好保持一致,并且超越了强大的持续学习(CL)基线
    Abstract The technique of Reinforcement Learning from Human Feedback (RLHF) is a commonly employed method to improve pre-trained Language Models (LM), enhancing their ability to conform to human preferences. Nevertheless, the current RLHF-based LMs necessitate full retraining each time novel queries or feedback are introduced, which becomes a challenging task because human preferences can vary between different domains or tasks. Retraining LMs poses practical difficulties in many real-world situations due to the significant time and computational resources required, along with concerns related to data privacy. To address this limitation, we propose a new method called Continual Optimal Policy Fitting (COPF), in which we estimate a series of optimal policies using the Monte Carlo method, and then continually fit the policy sequence with the function regularization. COPF involves a single learning phase and doesn't necessitate complex reinforcement learning. Importantly, it shares the capability with RLHF to learn from unlabeled data, making it flexible for continual preference learning. Our experimental results show that COPF outperforms strong Continuous learning (CL) baselines when it comes to consistently aligning with human preferences on different tasks and domains.
    摘要 RLHF(人类反馈学习强化)技术是通常用于改进预训练语言模型(LM)的方法,以提高其遵循人类偏好的能力。然而,现有RLHF基于LM的模型每次新增查询或反馈都需要全面重新训练,这会成为一项具有挑战性的任务,因为人类偏好可能在不同的领域或任务中发生变化。重新训练LM对于许多实际应用场景来说具有困难和费时的问题,同时也存在数据隐私问题。为解决这个限制,我们提出了一种新方法 called Continual Optimal Policy Fitting(COPF),其中我们使用Monte Carlo方法来估算一系列的优化政策,然后不断地使用功能规则来适应政策序列。COPF只需一次学习阶段,不需要复杂的强化学习。这种方法与RLHF一样可以学习从无标签数据中,这使其具有适应不断改变的偏好的灵活性。我们的实验结果表明,COPF在不同任务和领域中一直适应人类偏好的表现比强大的连续学习(CL)基eline更好。

Creating a silver standard for patent simplification

  • paper_url: http://arxiv.org/abs/2310.15689
  • repo_url: https://github.com/slvcsl/patentsilverstandard
  • paper_authors: Silvia Casola, Alberto Lavelli, Horacio Saggion
  • for: 本文提出了一种自动简化专利文本的方法,以便提高专利文本的访问性和机器可读性。
  • methods: 由于缺乏领域内的平行简化数据,本文使用通用领域的复述系统自动生成候选句,并配合专门的过滤器构建了一个更干净的大规模银标准语料。
  • results: 人工评估表明,生成的简化句语法正确、内容充分且表达简单。
    Abstract Patents are legal documents that aim at protecting inventions on the one hand and at making technical knowledge circulate on the other. Their complex style -- a mix of legal, technical, and extremely vague language -- makes their content hard to access for humans and machines and poses substantial challenges to the information retrieval community. This paper proposes an approach to automatically simplify patent text through rephrasing. Since no in-domain parallel simplification data exist, we propose a method to automatically generate a large-scale silver standard for patent sentences. To obtain candidates, we use a general-domain paraphrasing system; however, the process is error-prone and difficult to control. Thus, we pair it with proper filters and construct a cleaner corpus that can successfully be used to train a simplification system. Human evaluation of the synthetic silver corpus shows that it is considered grammatical, adequate, and contains simple sentences.
    摘要 专利文档是法律文档,旨在一方面保护发明,另一方面让技术知识流通。它们的复杂风格——包括法律、技术和极其抽象语言——使得它们的内容困难 для人类和机器访问,对信息检索社区提出了重大挑战。这篇论文提议自动简化专利文本的方法,由于没有相关领域的平行简化数据,我们提议自动生成大规模的银色标准集。为获得候选者,我们使用通用领域重叠系统,但这个过程存在误差和难以控制的问题。因此,我们对其进行过滤,并构建了一个更加清晰的集合,可以成功地用于培训简化系统。人工评估该银色集表示,它具有正确的格式、充分的表达和简单的句子。

Prevalence and prevention of large language model use in crowd work

  • paper_url: http://arxiv.org/abs/2310.15683
  • repo_url: None
  • paper_authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Philip Cozzolino, Andrew Gordon, David Rothschild, Robert West
  • for: The paper is written to investigate the use of large language models (LLMs) among crowd workers and to develop targeted mitigation strategies to reduce LLM use.
  • methods: The paper uses a text summarization task where workers were not directed in any way regarding their LLM use, and compares the estimated prevalence of LLM use with and without targeted mitigation strategies. The paper also conducts secondary analyses to explore the impact of LLM use on the quality and homogeneity of responses.
  • results: The paper finds that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use among crowd workers. The paper also finds that LLM use yields high-quality but homogeneous responses, which may harm research concerned with human (rather than model) behavior and degrade future models trained with crowdsourced data. Additionally, the paper finds that preventing LLM use may be at odds with obtaining high-quality responses.
    Abstract We show that the use of large language models (LLMs) is prevalent among crowd workers, and that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use. On a text summarization task where workers were not directed in any way regarding their LLM use, the estimated prevalence of LLM use was around 30%, but was reduced by about half by asking workers to not use LLMs and by raising the cost of using them, e.g., by disabling copy-pasting. Secondary analyses give further insight into LLM use and its prevention: LLM use yields high-quality but homogeneous responses, which may harm research concerned with human (rather than model) behavior and degrade future models trained with crowdsourced data. At the same time, preventing LLM use may be at odds with obtaining high-quality responses; e.g., when requesting workers not to use LLMs, summaries contained fewer keywords carrying essential information. Our estimates will likely change as LLMs increase in popularity or capabilities, and as norms around their usage change. Yet, understanding the co-evolution of LLM-based tools and users is key to maintaining the validity of research done using crowdsourcing, and we provide a critical baseline before widespread adoption ensues.
    摘要 我们表明大语言模型(LLM)在观众工作者中的使用是普遍的,并且目标的 Mitigation Strategies 可以大幅降低,但不能完全消除 LLM 的使用。在一个文本摘要任务中,工作者没有任何指导,LLM 的使用率约为 30%,但通过请求工作者不使用 LLM 和提高使用它们的成本,例如禁用复制键,可以大幅降低 LLM 的使用率,约从 30% 降至 15%。次要分析显示 LLM 使用对人类(而不是模型)的行为有高质量但同质的回应,这可能对研究造成伤害,并且对未来由观众集成的数据训练的模型造成负面影响。同时,防止 LLM 使用可能与获得高质量回应相抵触,例如当请求工作者不使用 LLM 时,摘要中的关键词数量减少。我们的估计将在 LLM 的普及度和能力增加,以及使用 norms 的改变时改变。但是,理解 LLM 基本的工具和用户之间的共演是维护透过观众集成所进行的研究的有效性的关键。我们提供了一个基本的估计,以便在大规模的采用前,我们可以更好地理解 LLM 的影响。

How Much Context Does My Attention-Based ASR System Need?

  • paper_url: http://arxiv.org/abs/2310.15672
  • repo_url: https://github.com/robflynnyh/long-context-asr
  • paper_authors: Robert Flynn, Anton Ragni
  • for: 这项研究旨在考察在语音识别任务中使用更长的声学上下文进行训练/评估的效果。
  • methods: 实验使用基于密集注意力的声学模型和语言模型,并在 5 秒至 1 小时的不同上下文长度下进行训练和评估。
  • results: 研究发现,使用约 80 秒的声学上下文进行训练可获得最高 14.9% 的相对提升;并通过束搜索与长上下文 transformer 语言模型进行系统组合,构建了完全长上下文的语音识别系统,结果与当前最先进水平相当。
    Abstract For the task of speech recognition, the use of more than 30 seconds of acoustic context during training is uncommon, and under-investigated in literature. In this work, we examine the effect of scaling the sequence length used to train/evaluate (dense-attention based) acoustic and language models on speech recognition performance. For these experiments a dataset of roughly 100,000 pseudo-labelled Spotify podcasts is used, with context lengths of 5 seconds to 1 hour being explored. Zero-shot evaluations on long-format datasets Earnings-22 and Tedlium demonstrate a benefit from training with around 80 seconds of acoustic context, showing up to a 14.9% relative improvement from a limited context baseline. Furthermore, we perform a system combination with long-context transformer language models via beam search for a fully long-context ASR system, with results that are competitive with the current state-of-the-art.
    摘要 For the task of speech recognition, using more than 30 seconds of acoustic context during training is rare and under-investigated in literature. In this work, we study the effect of scaling the sequence length used to train/evaluate (dense-attention based) acoustic and language models on speech recognition performance. For these experiments, we use a dataset of approximately 100,000 pseudo-labeled Spotify podcasts, with context lengths of 5 seconds to 1 hour being explored. Zero-shot evaluations on long-format datasets Earnings-22 and Tedlium show a benefit from training with around 80 seconds of acoustic context, with up to a 14.9% relative improvement from a limited context baseline. Furthermore, we perform a system combination with long-context transformer language models via beam search for a fully long-context ASR system, with results that are competitive with the current state-of-the-art.

Expression Syntax Information Bottleneck for Math Word Problems

  • paper_url: http://arxiv.org/abs/2310.15664
  • repo_url: https://github.com/menik1126/math_esib
  • paper_authors: Jing Xiong, Chengming Li, Min Yang, Xiping Hu, Bin Hu
  • for: automatic solving of mathematical questions in texts
  • methods: Expression Syntax Information Bottleneck (ESIB) method based on variational information bottleneck, with self-distillation loss to improve generalization and generate more diverse expressions
  • results: state-of-the-art results and more diverse solutions on two large-scale benchmarks
    Abstract Math Word Problems (MWP) aims to automatically solve mathematical questions given in texts. Previous studies tend to design complex models to capture additional information in the original text so as to enable the model to gain more comprehensive features. In this paper, we turn our attention in the opposite direction, and work on how to discard redundant features containing spurious correlations for MWP. To this end, we design an Expression Syntax Information Bottleneck method for MWP (called ESIB) based on variational information bottleneck, which extracts essential features of expression syntax tree while filtering latent-specific redundancy containing syntax-irrelevant features. The key idea of ESIB is to encourage multiple models to predict the same expression syntax tree for different problem representations of the same problem by mutual learning so as to capture consistent information of expression syntax tree and discard latent-specific redundancy. To improve the generalization ability of the model and generate more diverse expressions, we design a self-distillation loss to encourage the model to rely more on the expression syntax information in the latent space. Experimental results on two large-scale benchmarks show that our model not only achieves state-of-the-art results but also generates more diverse solutions. The code is available.
    摘要 数学应用题(MWP)的目标是自动求解文本中给出的数学问题。以往研究倾向于设计复杂的模型来捕捉原文中的额外信息,以便让模型获得更全面的特征。本文则将注意力转向相反的方向,研究如何为 MWP 丢弃含有虚假相关的冗余特征。为此,我们基于变分信息瓶颈设计了一种表达式句法信息瓶颈方法(ESIB),在提取表达式句法树的关键特征的同时,过滤含有与句法无关特征的潜在冗余。ESIB 的核心思想是通过互学习,鼓励多个模型对同一问题的不同表示预测相同的表达式句法树,从而捕捉表达式句法树的一致信息并丢弃潜在冗余。为提升模型的泛化能力并生成更多样的表达式,我们还设计了自蒸馏损失,鼓励模型在潜在空间中更多地依赖表达式句法信息。在两个大规模基准上的实验结果表明,我们的模型不仅取得了最先进的结果,还能生成更多样的解。代码已公开。

CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

  • paper_url: http://arxiv.org/abs/2310.15638
  • repo_url: https://github.com/salt-nlp/coannotating
  • paper_authors: Minzhi Li, Taiwei Shi, Caleb Ziems, Min-Yen Kan, Nancy F. Chen, Zhengyuan Liu, Diyi Yang
  • for: 本研究提出了一种新的人机协同标注范式(CoAnnotating),用于大规模非结构化文本的标注。
  • methods: 该框架利用大语言模型标注结果的不确定性来估计其标注能力,从而在人类与 LLM 之间分配标注工作。
  • results: 实验结果表明,CoAnnotating 能够有效地分配工作,在不同数据集上相比随机基线最高带来 21% 的性能提升。
    Abstract Annotated data plays a critical role in Natural Language Processing (NLP) in training models and evaluating their performance. Given recent developments in Large Language Models (LLMs), models such as ChatGPT demonstrate zero-shot capability on many text-annotation tasks, comparable with or even exceeding human annotators. Such LLMs can serve as alternatives for manual annotation, due to lower costs and higher scalability. However, limited work has leveraged LLMs as complementary annotators, nor explored how annotation work is best allocated among humans and LLMs to achieve both quality and cost objectives. We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale. Under this framework, we utilize uncertainty to estimate LLMs' annotation capability. Our empirical study shows CoAnnotating to be an effective means to allocate work from results on different datasets, with up to 21% performance improvement over random baseline. For code implementation, see https://github.com/SALT-NLP/CoAnnotating.
    摘要 在自然语言处理(NLP)中,标注数据在训练模型和评估其性能方面都扮演着关键角色。随着大语言模型(LLMs)的最新发展,ChatGPT 等模型在许多文本标注任务上表现出零样本能力,可与人类标注者相当甚至更优。由于成本更低、可扩展性更高,此类 LLMs 可以作为人工标注的替代方案。然而,很少有工作将 LLMs 用作互补的标注者,也缺乏对如何在人类与 LLMs 之间分配标注工作以同时实现质量与成本目标的探索。我们提出 CoAnnotating,一种用于大规模非结构化文本的人机协同标注新范式。在该框架下,我们利用不确定性来估计 LLMs 的标注能力。实证研究表明,CoAnnotating 是一种有效的工作分配方式,在不同数据集上相比随机基线最高带来 21% 的性能提升。代码实现见 https://github.com/SALT-NLP/CoAnnotating。
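
A minimal sketch of uncertainty-guided allocation in the spirit of CoAnnotating: sample several LLM annotations per instance, measure disagreement via label entropy, and route high-entropy instances to humans. The stub annotator, sample count, and threshold are illustrative assumptions.

```python
# Minimal sketch of uncertainty-guided work allocation: sample several LLM
# annotations per instance, measure disagreement via label entropy, and send
# high-entropy instances to human annotators. The stub annotator, the number of
# samples, and the threshold are illustrative assumptions, not the paper's values.
import math
import random
from collections import Counter
from typing import Callable, List

def label_entropy(labels: List[str]) -> float:
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def allocate(texts: List[str], annotate: Callable[[str], str],
             n_samples: int = 5, threshold: float = 0.8):
    to_llm, to_human = [], []
    for text in texts:
        samples = [annotate(text) for _ in range(n_samples)]  # e.g. temperature > 0
        if label_entropy(samples) <= threshold:
            to_llm.append((text, Counter(samples).most_common(1)[0][0]))
        else:
            to_human.append(text)
    return to_llm, to_human

# Stub annotator so the sketch runs; replace with real LLM calls.
stub = lambda t: random.choice(["toxic", "not_toxic"]) if "?" in t else "not_toxic"
llm_done, human_queue = allocate(["you are great", "are you serious?"], stub)
print("auto-labelled:", llm_done, "| routed to humans:", human_queue)
```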

Tips for making the most of 64-bit architectures in langage design, libraries or garbage collection

  • paper_url: http://arxiv.org/abs/2310.15632
  • repo_url: None
  • paper_authors: Benoît Sonntag, Dominique Colnet
  • for: 该论文探讨了如何利用64位处理器的低级编程可能性,以提高计算速度和内存利用率。
  • methods: 论文提出了三个具体的例子,包括实现多精度整数库、使用UTF-8字符串索引和优化垃圾回收器。
  • results: 论文的例子显示了在计算速度和内存利用率方面的性能提升。
    Abstract The 64-bit architectures that have become standard today offer unprecedented low-level programming possibilities. For the first time in the history of computing, the size of address registers far exceeded the physical capacity of their bus.After a brief reminder of the possibilities offered by the small size of addresses compared to the available 64 bits,we develop three concrete examples of how the vacant bits of these registers can be used.Among these examples, two of them concern the implementation of a library for a new statically typed programming language.Firstly, the implementation of multi-precision integers, with the aim of improving performance in terms of both calculation speed and RAM savings.The second example focuses on the library's handling of UTF-8 character strings.Here, the idea is to make indexing easier by ignoring the physical size of each UTF-8 characters.Finally, the third example is a possible enhancement of garbage collectors, in particular the mark \& sweep for the object marking phase.
    摘要 如今已成为标准的 64 位架构提供了前所未有的底层编程可能性:在计算史上,地址寄存器的位宽首次远远超过了其总线的物理容量。在简要回顾了实际地址相对于可用的 64 位而言较小所带来的可能性之后,我们给出了利用这些寄存器空闲位的三个具体示例。其中两个示例涉及为一种新的静态类型编程语言实现标准库:其一是多精度整数的实现,旨在同时提升计算速度并节省内存;其二是库对 UTF-8 字符串的处理,其思路是通过忽略每个 UTF-8 字符的物理大小来简化索引。最后,第三个示例是对垃圾回收器的一种可能改进,特别是针对 mark & sweep 回收器的对象标记阶段。
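
Since the paper targets pointer-level tricks, the following Python sketch only illustrates the underlying idea with plain integer bit arithmetic: packing a tag and a small-integer payload into the vacant high bits of a 64-bit word, as a multi-precision integer library might do before falling back to heap allocation. The tag layout and bit widths are assumptions, not the paper's design.

```python
# Conceptual illustration (in Python, using plain integers as 64-bit words) of
# exploiting the vacant high bits of an address-sized register: a word either
# stores a small signed integer inline, tagged in the top bits, or would point
# to a heap-allocated big integer. Tag layout and widths are illustrative
# assumptions; a real runtime would do this on raw pointers, not Python ints.
MASK64 = (1 << 64) - 1
TAG_SMALL = 0x7FF1 << 48          # arbitrary tag pattern placed in the vacant bits

def box_int(value: int) -> int:
    """Store a small signed integer inline; fall back to heap allocation otherwise."""
    if -(1 << 47) <= value < (1 << 47):
        return TAG_SMALL | (value & ((1 << 48) - 1))
    raise OverflowError("would need a heap-allocated multi-precision integer")

def is_small(word: int) -> bool:
    return (word >> 48) == (TAG_SMALL >> 48)

def unbox_int(word: int) -> int:
    payload = word & ((1 << 48) - 1)
    return payload - (1 << 48) if payload >= (1 << 47) else payload  # sign-extend

w = box_int(-12345)
assert is_small(w) and unbox_int(w) == -12345
print(hex(w & MASK64), unbox_int(w))
```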

Machine Translation for Nko: Tools, Corpora and Baseline Results

  • paper_url: http://arxiv.org/abs/2310.15612
  • repo_url: None
  • paper_authors: Moussa Koulako Bala Doumbouya, Baba Mamadi Diané, Solo Farabado Cissé, Djibrila Diané, Abdoulaye Sow, Séré Moussa Doumbouya, Daouda Bangoura, Fodé Moriba Bayo, Ibrahima Sory 2. Condé, Kalo Mory Diané, Chris Piech, Christopher Manning
  • for: Addressing the lack of usable machine translation systems for Nko, a language spoken by tens of millions of people across multiple West African countries.
  • methods: Developed a set of tools, resources, and baseline results aimed towards the development of usable machine translation systems for Nko and other languages that do not currently have sufficiently large parallel text corpora available.
  • results: Presented a novel collaborative parallel text curation software (Friallel), expanded the FLoRes-200 and NLLB-Seed corpora with high-quality Nko translations, and developed a collection of trilingual and bilingual corpora (nicolingua-0005) with over 3 million Nko words. The best model scored 30.83 English-Nko chrF++ on FLoRes-devtest.
    Abstract Currently, there is no usable machine translation system for Nko, a language spoken by tens of millions of people across multiple West African countries, which holds significant cultural and educational value. To address this issue, we present a set of tools, resources, and baseline results aimed towards the development of usable machine translation systems for Nko and other languages that do not currently have sufficiently large parallel text corpora available. (1) Friallel: A novel collaborative parallel text curation software that incorporates quality control through copyedit-based workflows. (2) Expansion of the FLoRes-200 and NLLB-Seed corpora with 2,009 and 6,193 high-quality Nko translations in parallel with 204 and 40 other languages. (3) nicolingua-0005: A collection of trilingual and bilingual corpora with 130,850 parallel segments and monolingual corpora containing over 3 million Nko words. (4) Baseline bilingual and multilingual neural machine translation results with the best model scoring 30.83 English-Nko chrF++ on FLoRes-devtest.

MUSER: A Multi-View Similar Case Retrieval Dataset

  • paper_url: http://arxiv.org/abs/2310.15602
  • repo_url: https://github.com/thulawtech/muser
  • paper_authors: Qingquan Li, Yiran Hu, Feng Yao, Chaojun Xiao, Zhiyuan Liu, Maosong Sun, Weixing Shen
  • for: The paper aims to promote judicial fairness by supporting the development of smart legal applications.
  • methods: Multi-view similarity measurement and comprehensive, sentence-level legal element annotation.
  • results: Incorporating legal elements improves the performance of similar case retrieval models, but challenges posed by MUSER remain to be addressed.
    Abstract Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. However, existing SCR datasets only focus on the fact description section when judging the similarity between cases, ignoring other valuable sections (e.g., the court's opinion) that can provide insightful reasoning process behind. Furthermore, the case similarities are typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge. In this work, we present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations. Specifically, we select three perspectives (legal fact, dispute focus, and law statutory) and build a comprehensive and structured label schema of legal elements for each of them, to enable accurate and knowledgeable evaluation of case similarities. The constructed dataset originates from Chinese civil cases and contains 100 query cases and 4,024 candidate cases. We implement several text classification algorithms for legal element prediction and various retrieval methods for retrieving similar cases on MUSER. The experimental results indicate that incorporating legal elements can benefit the performance of SCR models, but further efforts are still required to address the remaining challenges posed by MUSER. The source code and dataset are released at https://github.com/THUlawtech/MUSER.

ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts

  • paper_url: http://arxiv.org/abs/2310.15587
  • repo_url: https://github.com/dili-lab/scandl
  • paper_authors: Lena S. Bolliger, David R. Reich, Patrick Haller, Deborah N. Jakobi, Paul Prasse, Lena A. Jäger
  • for: Supporting research into the cognitive mechanisms underlying human language processing and enabling language-related machine learning tasks that rely on eye-movement data.
  • methods: ScanDL, a data-driven, diffusion-based generative model that produces human-like scanpaths; it leverages pre-trained word representations and jointly embeds the stimulus sentence and the fixation sequence to capture multimodal interactions between text and eye movements.
  • results: Within- and across-dataset evaluations show that ScanDL clearly outperforms the state of the art in scanpath generation, and an extensive psycholinguistic analysis confirms that it exhibits human-like reading behaviour.
    Abstract Eye movements in reading play a crucial role in psycholinguistic research studying the cognitive mechanisms underlying human language processing. More recently, the tight coupling between eye movements and cognition has also been leveraged for language-related machine learning tasks such as the interpretability, enhancement, and pre-training of language models, as well as the inference of reader- and text-specific properties. However, scarcity of eye movement data and its unavailability at application time poses a major challenge for this line of research. Initially, this problem was tackled by resorting to cognitive models for synthesizing eye movement data. However, for the sole purpose of generating human-like scanpaths, purely data-driven machine-learning-based methods have proven to be more suitable. Following recent advances in adapting diffusion processes to discrete data, we propose ScanDL, a novel discrete sequence-to-sequence diffusion model that generates synthetic scanpaths on texts. By leveraging pre-trained word representations and jointly embedding both the stimulus text and the fixation sequence, our model captures multi-modal interactions between the two inputs. We evaluate ScanDL within- and across-dataset and demonstrate that it significantly outperforms state-of-the-art scanpath generation methods. Finally, we provide an extensive psycholinguistic analysis that underlines the model's ability to exhibit human-like reading behavior. Our implementation is made available at https://github.com/DiLi-Lab/ScanDL.

Multimodal Representations for Teacher-Guided Compositional Visual Reasoning

  • paper_url: http://arxiv.org/abs/2310.15585
  • repo_url: None
  • paper_authors: Wafa Aissa, Marin Ferecatu, Michel Crucianu
  • for: Improving both the effectiveness and the explainability of visual question answering models.
  • methods: Exploiting features from a large-scale cross-modal encoder and introducing a scheduled teacher-guidance training strategy for Neural Module Networks.
  • results: Incorporating cross-modal features and the improved training scheme achieves a favourable balance between performance and transparency of the reasoning process.
    Abstract Neural Module Networks (NMN) are a compelling method for visual question answering, enabling the translation of a question into a program consisting of a series of reasoning sub-tasks that are sequentially executed on the image to produce an answer. NMNs provide enhanced explainability compared to integrated models, allowing for a better understanding of the underlying reasoning process. To improve the effectiveness of NMNs we propose to exploit features obtained by a large-scale cross-modal encoder. Also, the current training approach of NMNs relies on the propagation of module outputs to subsequent modules, leading to the accumulation of prediction errors and the generation of false answers. To mitigate this, we introduce an NMN learning strategy involving scheduled teacher guidance. Initially, the model is fully guided by the ground-truth intermediate outputs, but gradually transitions to an autonomous behavior as training progresses. This reduces error accumulation, thus improving training efficiency and final performance.We demonstrate that by incorporating cross-modal features and employing more effective training techniques for NMN, we achieve a favorable balance between performance and transparency in the reasoning process.

POE: Process of Elimination for Multiple Choice Reasoning

  • paper_url: http://arxiv.org/abs/2310.15575
  • repo_url: https://github.com/kasmasvan/poe
  • paper_authors: Chenkai Ma, Xinya Du
  • for: Improving language models' performance on multiple choice reasoning tasks.
  • methods: A two-step scoring method, Process of Elimination (POE): each option is first scored, seemingly wrong options are eliminated (masked), and the final prediction is made from the remaining options (see the sketch below).
  • results: Zero-shot experiments on 8 reasoning tasks demonstrate the effectiveness of POE, which is especially strong on logical reasoning; an analysis of the masks further shows that POE extends to few-shot settings and to large language models such as ChatGPT.
    Abstract Language models (LMs) are capable of conducting in-context learning for multiple choice reasoning tasks, but the options in these tasks are treated equally. As humans often first eliminate wrong options before picking the final correct answer, we argue a similar two-step strategy can make LMs better at these tasks. To this end, we present the Process of Elimination (POE), a two-step scoring method. In the first step, POE scores each option, and eliminates seemingly wrong options. In the second step, POE masks these wrong options, and makes the final prediction from the remaining options. Zero-shot experiments on 8 reasoning tasks illustrate the effectiveness of POE, and a following analysis finds our method to be especially performant on logical reasoning tasks. We further analyze the effect of masks, and show that POE applies to few-shot settings and large language models (LLMs) like ChatGPT.
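A minimal sketch of the two-step procedure described above. The `score_option` word-overlap heuristic is only a stand-in for the language-model scoring the paper actually uses, and `keep_ratio` is an assumed elimination threshold.

```python
import numpy as np

def score_option(question: str, option: str) -> float:
    # Placeholder scorer: the paper uses an LM's score of the option given the
    # question; a trivial word-overlap proxy keeps the sketch runnable offline.
    q, o = set(question.lower().split()), set(option.lower().split())
    return len(q & o) / (len(o) + 1e-9)

def process_of_elimination(question: str, options: list[str], keep_ratio: float = 0.5) -> int:
    # Step 1: score every option and eliminate the seemingly wrong ones.
    scores = np.array([score_option(question, o) for o in options])
    k = max(1, int(round(keep_ratio * len(options))))
    kept = np.argsort(scores)[::-1][:k]
    # Step 2: mask the eliminated options and predict among the survivors only.
    final_scores = {int(i): float(scores[i]) for i in kept}
    return max(final_scores, key=final_scores.get)

q = "Which material conducts electricity best?"
opts = ["copper wire conducts electricity", "wooden spoon", "rubber band", "glass rod"]
print(opts[process_of_elimination(q, opts)])
```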

Natural Language Processing for Drug Discovery Knowledge Graphs: promises and pitfalls

  • paper_url: http://arxiv.org/abs/2310.15572
  • repo_url: None
  • paper_authors: J. Charles G. Jeynes, Tim James, Matthew Corney
  • for: Examining the use of natural language processing (NLP) to mine unstructured text, typically from the scientific literature, as a data source for knowledge graphs (KGs) that support drug discovery.
  • methods: Parsing structured sources such as ChEMBL as the basis of a KG and then enriching or expanding it with NLP-extracted data.
  • results: NLP can automatically extract data from millions of documents, a task practically impossible by human curation alone, but pitfalls such as incorrect named entity recognition and faulty ontology linking can lead to erroneous inferences and conclusions.
    Abstract Building and analysing knowledge graphs (KGs) to aid drug discovery is a topical area of research. A salient feature of KGs is their ability to combine many heterogeneous data sources in a format that facilitates discovering connections. The utility of KGs has been exemplified in areas such as drug repurposing, with insights made through manual exploration and modelling of the data. In this article, we discuss promises and pitfalls of using natural language processing (NLP) to mine unstructured text typically from scientific literature as a data source for KGs. This draws on our experience of initially parsing structured data sources such as ChEMBL as the basis for data within a KG, and then enriching or expanding upon them using NLP. The fundamental promise of NLP for KGs is the automated extraction of data from millions of documents a task practically impossible to do via human curation alone. However, there are many potential pitfalls in NLP-KG pipelines such as incorrect named entity recognition and ontology linking all of which could ultimately lead to erroneous inferences and conclusions.

Visually Grounded Continual Language Learning with Selective Specialization

  • paper_url: http://arxiv.org/abs/2310.15571
  • repo_url: None
  • paper_authors: Kyra Ahrens, Lennart Bengtson, Jae Hee Lee, Stefan Wermter
  • for: Providing an extensive analysis of selective specialization strategies for visually grounded continual language learning, i.e., controlling the trade-off between task-specific specialization and generalizable knowledge.
  • methods: Two newly introduced diagnostic datasets, various heuristics for module-specialization strategies, and quantifiable measures evaluated on two different model architectures.
  • results: Selection strategies matter substantially, and conceptually simple approaches derived from the analysis outperform common continual learning baselines.
    Abstract A desirable trait of an artificial agent acting in the visual world is to continually learn a sequence of language-informed tasks while striking a balance between sufficiently specializing in each task and building a generalized knowledge for transfer. Selective specialization, i.e., a careful selection of model components to specialize in each task, is a strategy to provide control over this trade-off. However, the design of selection strategies requires insights on the role of each model component in learning rather specialized or generalizable representations, which poses a gap in current research. Thus, our aim with this work is to provide an extensive analysis of selection strategies for visually grounded continual language learning. Due to the lack of suitable benchmarks for this purpose, we introduce two novel diagnostic datasets that provide enough control and flexibility for a thorough model analysis. We assess various heuristics for module specialization strategies as well as quantifiable measures for two different types of model architectures. Finally, we design conceptually simple approaches based on our analysis that outperform common continual learning baselines. Our results demonstrate the need for further efforts towards better aligning continual learning algorithms with the learning behaviors of individual model parts.

MuLMS: A Multi-Layer Annotated Text Corpus for Information Extraction in the Materials Science Domain

  • paper_url: http://arxiv.org/abs/2310.15569
  • repo_url: None
  • paper_authors: Timo Pierre Schrader, Matteo Finco, Stefan Grünewald, Felix Hildebrand, Annemarie Friedrich
  • for: Providing a new multi-layer annotated corpus to support information extraction research in the materials science domain.
  • methods: Neural models for several tasks, including named entity recognition, relation extraction, and frame-structure identification, trained on the expert-annotated corpus.
  • results: Multi-task training and the use of existing related resources yield competitive model performance.
    Abstract Keeping track of all relevant recent publications and experimental results for a research area is a challenging task. Prior work has demonstrated the efficacy of information extraction models in various scientific areas. Recently, several datasets have been released for the yet understudied materials science domain. However, these datasets focus on sub-problems such as parsing synthesis procedures or on sub-domains, e.g., solid oxide fuel cells. In this resource paper, we present MuLMS, a new dataset of 50 open-access articles, spanning seven sub-domains of materials science. The corpus has been annotated by domain experts with several layers ranging from named entities over relations to frame structures. We present competitive neural models for all tasks and demonstrate that multi-task training with existing related resources leads to benefits.

TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction

  • paper_url: http://arxiv.org/abs/2310.15556
  • repo_url: None
  • paper_authors: Junyi Liu, Liangzhi Li, Tong Xiang, Bowen Wang, Yiming Qian
  • for: Mitigating the cost of deploying commercial retrieval-augmented large language models (LLMs) by proposing a token compression scheme.
  • methods: Two compression methods: summarization compression, which uses a T5-based model fine-tuned on datasets generated with self-instruct containing samples of varying lengths, and semantic compression, which removes words with lower impact on the semantics (see the sketch below).
  • results: Evaluated on the Food-Recommendation DB (FRDB) dataset, which focuses on food recommendations for women around the pregnancy period or for infants; summarization compression reduces the retrieval token size by 65% with a further 0.3% improvement in accuracy, while semantic compression offers a more flexible size/performance trade-off, reducing token size by 20% with only a 1.6% drop in accuracy.
    Abstract Since ChatGPT released its API for public use, the number of applications built on top of commercial large language models (LLMs) increase exponentially. One popular usage of such models is leveraging its in-context learning ability and generating responses given user queries leveraging knowledge obtained by retrieval augmentation. One problem of deploying commercial retrieval-augmented LLMs is the cost due to the additionally retrieved context that largely increases the input token size of the LLMs. To mitigate this, we propose a token compression scheme that includes two methods: summarization compression and semantic compression. The first method applies a T5-based model that is fine-tuned by datasets generated using self-instruct containing samples with varying lengths and reduce token size by doing summarization. The second method further compresses the token size by removing words with lower impact on the semantic. In order to adequately evaluate the effectiveness of the proposed methods, we propose and utilize a dataset called Food-Recommendation DB (FRDB) focusing on food recommendation for women around pregnancy period or infants. Our summarization compression can reduce 65% of the retrieval token size with further 0.3% improvement on the accuracy; semantic compression provides a more flexible way to trade-off the token size with performance, for which we can reduce the token size by 20% with only 1.6% of accuracy drop.
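A rough sketch of the semantic-compression idea: drop the retrieved words whose removal is least likely to change the meaning. Approximating word "impact" with corpus-level IDF is an assumption for illustration; the paper defines its own impact measure, and the T5-based summarization compression is not shown here.

```python
import math
import re
from collections import Counter

def semantic_compress(passages: list[str], ratio: float = 0.8) -> list[str]:
    """Keep roughly `ratio` of each passage's words, dropping the lowest-impact ones.
    Impact is approximated here by IDF over the retrieved passages."""
    docs = [re.findall(r"\w+", p.lower()) for p in passages]
    df = Counter(w for d in docs for w in set(d))
    n = len(docs)
    idf = {w: math.log((n + 1) / (c + 1)) for w, c in df.items()}
    compressed = []
    for d in docs:
        keep = max(1, int(ratio * len(d)))
        ranked = sorted(range(len(d)), key=lambda i: idf.get(d[i], 0.0), reverse=True)
        kept = sorted(ranked[:keep])                 # preserve the original word order
        compressed.append(" ".join(d[i] for i in kept))
    return compressed

print(semantic_compress(
    ["Folic acid is recommended for women during early pregnancy",
     "Iron rich foods such as spinach help during pregnancy"],
    ratio=0.6))
```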

Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks

  • paper_url: http://arxiv.org/abs/2310.15552
  • repo_url: None
  • paper_authors: Sunit Bhattacharya, Ondrej Bojar
  • for: Investigating how multilingual Transformer models leverage the view of the feed-forward module as a collection of key-value memories, where keys learn to capture specific input patterns and values combine the memories' outputs to predict the next token.
  • methods: Analysis of autoregressive models pretrained on two or more languages, using parallel corpora of the pretraining languages to test the hypothesis that some parameters learn strong language-specific features while others learn language-agnostic (shared) ones (a sketch of this kind of analysis follows the abstract).
  • results: The layers closest to the network's input or output exhibit more language-specific behaviour, whereas the middle layers are more language-agnostic.
    Abstract Recent research suggests that the feed-forward module within Transformers can be viewed as a collection of key-value memories, where the keys learn to capture specific patterns from the input based on the training examples. The values then combine the output from the 'memories' of the keys to generate predictions about the next token. This leads to an incremental process of prediction that gradually converges towards the final token choice near the output layers. This interesting perspective raises questions about how multilingual models might leverage this mechanism. Specifically, for autoregressive models trained on two or more languages, do all neurons (across layers) respond equally to all languages? No! Our hypothesis centers around the notion that during pretraining, certain model parameters learn strong language-specific features, while others learn more language-agnostic (shared across languages) features. To validate this, we conduct experiments utilizing parallel corpora of two languages that the model was initially pretrained on. Our findings reveal that the layers closest to the network's input or output tend to exhibit more language-specific behaviour compared to the layers in the middle.
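A small sketch of the kind of layer-wise analysis implied above: compare, per layer, how much the most strongly activated feed-forward neurons overlap across two languages fed parallel sentences. The random arrays are placeholders for activations recorded from a real model, and the top-k overlap measure is an illustrative choice rather than the paper's exact metric.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_sent, n_neurons = 12, 200, 1024
# Placeholders: per-layer FFN activations recorded while feeding parallel
# English / German sentences through the model (random here for the sketch).
act_en = rng.random((n_layers, n_sent, n_neurons))
act_de = rng.random((n_layers, n_sent, n_neurons))

def top_neurons(act: np.ndarray, k: int = 100) -> set[int]:
    """Indices of the k neurons with the highest mean activation."""
    return set(np.argsort(act.mean(axis=0))[::-1][:k])

for layer in range(n_layers):
    shared = len(top_neurons(act_en[layer]) & top_neurons(act_de[layer])) / 100
    print(f"layer {layer:2d}: fraction of top-100 neurons shared across languages = {shared:.2f}")
```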

Improving Language Models Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary

  • paper_url: http://arxiv.org/abs/2310.15541
  • repo_url: None
  • paper_authors: Myeongjun Erik Jang, Thomas Lukasiewicz
  • for: Addressing the untrustworthiness of pre-trained language models (PLMs) caused by inconsistent predictions, i.e., producing logically contradictory outputs for texts that convey the same meaning.
  • methods: A practical approach that strengthens the PLM's meaning awareness by learning precise interrelationships between concepts from word-definition pairs in a dictionary (following conceptual role theory), together with an efficient parameter-integration technique that updates only a few additional parameters to combine the learned relations with the PLM's pre-trained knowledge.
  • results: The approach concurrently improves multiple types of consistency, enables efficient knowledge integration, and applies readily to other languages.
    Abstract The non-humanlike behaviour of contemporary pre-trained language models (PLMs) is a leading cause undermining their trustworthiness. A striking phenomenon of such faulty behaviours is the generation of inconsistent predictions, which produces logically contradictory results, such as generating different predictions for texts delivering the same meaning or violating logical properties. Previous studies exploited data augmentation or implemented specialised loss functions to alleviate the issue. However, their usage is limited, because they consume expensive training resources for large-sized PLMs and can only handle a certain consistency type. To this end, we propose a practical approach that alleviates the inconsistent behaviour issue by fundamentally improving PLMs' meaning awareness. Based on the conceptual role theory, our method allows PLMs to capture accurate meaning by learning precise interrelationships between concepts from word-definition pairs in a dictionary. Next, we propose an efficient parameter integration technique that updates only a few additional parameters to combine the learned interrelationship with PLMs' pre-trained knowledge. Our experimental results reveal that the approach can concurrently improve multiple types of consistency, enables efficient knowledge integration, and easily applies to other languages.

MarkQA: A large scale KBQA dataset with numerical reasoning

  • paper_url: http://arxiv.org/abs/2310.15517
  • repo_url: https://github.com/cdhx/markqa
  • paper_authors: Xiang Huang, Sitao Cheng, Yuheng Bao, Shanshan Huang, Yuzhong Qu
  • for: Advancing knowledge base question answering (KBQA) by focusing on complex numerical reasoning, which remains relatively unexplored.
  • methods: A new task, NR-KBQA, which requires both multi-hop reasoning and numerical reasoning, together with PyQL, a Python-format logic form that represents the reasoning process of numerical questions (a hypothetical example follows the abstract).
  • results: Experiments with several state-of-the-art QA methods on the automatically constructed MarkQA dataset show that complex numerical reasoning in KBQA remains highly challenging.
    Abstract While question answering over knowledge bases (KBQA) has shown progress in addressing factoid questions, KBQA with numerical reasoning remains relatively unexplored. In this paper, we focus on the complex numerical reasoning in KBQA and propose a new task, NR-KBQA, which necessitates the ability to perform both multi-hop reasoning and numerical reasoning. We design a logic form in Python format called PyQL to represent the reasoning process of numerical reasoning questions. To facilitate the development of NR-KBQA, we present a large dataset called MarkQA, which is automatically constructed from a small set of seeds. Each question in MarkQA is equipped with its corresponding SPARQL query, alongside the step-by-step reasoning process in the QDMR format and PyQL program. Experimental results of some state-of-the-art QA methods on the MarkQA show that complex numerical reasoning in KBQA faces great challenges.
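To make the idea of a Python-format logic form concrete, here is a hypothetical PyQL-style program over a toy in-memory knowledge base. The `get` helper and the fact tuples are invented for illustration and do not reflect the actual PyQL API or the SPARQL-backed knowledge base used by MarkQA.

```python
# A toy knowledge base standing in for SPARQL-accessible facts.
KB = {
    ("France", "borders"): ["Spain", "Italy", "Belgium"],
    ("Spain", "area_km2"): 505_990,
    ("Italy", "area_km2"): 301_340,
    ("Belgium", "area_km2"): 30_688,
}

def get(entity: str, relation: str):
    """Look up the value(s) of a relation for an entity."""
    return KB[(entity, relation)]

# Hypothetical PyQL-style program for:
# "What is the total area of the countries bordering France?"
neighbours = get("France", "borders")                 # hop 1: entity lookup
areas = [get(c, "area_km2") for c in neighbours]      # hop 2: numeric attribute per entity
answer = sum(areas)                                   # numerical reasoning: aggregation
print(answer)                                         # 838018
```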

Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation

  • paper_url: http://arxiv.org/abs/2310.15515
  • repo_url: None
  • paper_authors: Jason Lucas, Adaku Uchendu, Michiharu Yamashita, Jooyoung Lee, Shaurya Rohatgi, Dongwon Lee
  • for: Countering the misuse of large language models (LLMs) for generating large-scale harmful and misleading content.
  • methods: A "Fighting Fire with Fire" (F3) strategy that harnesses modern LLMs' generative and emergent reasoning capabilities against both human-written and LLM-generated disinformation: paraphrase- and perturbation-based prompting to synthesize authentic and deceptive content, and zero-shot, cloze-style in-context semantic reasoning for detection (see the sketch below).
  • results: In extensive experiments, GPT-3.5-turbo with zero-shot in-context semantic reasoning consistently reaches 68-72% accuracy on both in-distribution and out-of-distribution datasets, avoiding the decline seen in previously customized and fine-tuned disinformation detectors.
    Abstract Recent ubiquity and disruptive impacts of large language models (LLMs) have raised concerns about their potential to be misused (.i.e, generating large-scale harmful and misleading content). To combat this emerging risk of LLMs, we propose a novel "Fighting Fire with Fire" (F3) strategy that harnesses modern LLMs' generative and emergent reasoning capabilities to counter human-written and LLM-generated disinformation. First, we leverage GPT-3.5-turbo to synthesize authentic and deceptive LLM-generated content through paraphrase-based and perturbation-based prefix-style prompts, respectively. Second, we apply zero-shot in-context semantic reasoning techniques with cloze-style prompts to discern genuine from deceptive posts and news articles. In our extensive experiments, we observe GPT-3.5-turbo's zero-shot superiority for both in-distribution and out-of-distribution datasets, where GPT-3.5-turbo consistently achieved accuracy at 68-72%, unlike the decline observed in previous customized and fine-tuned disinformation detectors. Our codebase and dataset are available at https://github.com/mickeymst/F3.
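A minimal sketch of the zero-shot, cloze-style detection step. The prompt template and the `generate` callable are assumptions for illustration; the paper's actual prompts, and its paraphrase/perturbation-based generation prompts, are not reproduced here.

```python
def build_cloze_prompt(post: str) -> str:
    """Zero-shot, cloze-style prompt in the spirit of the in-context semantic
    reasoning step (the exact template used in the paper may differ)."""
    return (
        "Read the following post and reason about whether it is truthful.\n\n"
        f"Post: {post}\n\n"
        "Considering the factual consistency and intent of the post, it is best "
        "described as ____ (choose one: 'genuine' or 'deceptive')."
    )

def classify(post: str, generate) -> str:
    """`generate` is any text-completion callable (e.g. a GPT-3.5-turbo wrapper)."""
    completion = generate(build_cloze_prompt(post)).lower()
    return "deceptive" if "decept" in completion else "genuine"

# Stub generator so the sketch runs without an API key; swap in a real LLM call.
print(classify("Scientists confirm the moon is made of cheese.", lambda prompt: "deceptive"))
```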

A Joint Matrix Factorization Analysis of Multilingual Representations

  • paper_url: http://arxiv.org/abs/2310.15513
  • repo_url: https://github.com/zsquaredz/joint_multilingual_analysis
  • paper_authors: Zheng Zhao, Yftah Ziser, Bonnie Webber, Shay B. Cohen
  • for: This paper is written to analyze the representations learned by multilingual pre-trained models and study how they encode morphosyntactic information.
  • methods: The authors use joint matrix factorization as an alternative to probing to compare the latent representations of multilingual and monolingual models.
  • results: The authors find variations in the encoding of morphosyntactic information across upper and lower layers, with category-specific differences influenced by language properties. They also find strong associations between the factorization outputs and performance across different cross-lingual tasks.
    Abstract We present an analysis tool based on joint matrix factorization for comparing latent representations of multilingual and monolingual models. An alternative to probing, this tool allows us to analyze multiple sets of representations in a joint manner. Using this tool, we study to what extent and how morphosyntactic features are reflected in the representations learned by multilingual pre-trained models. We conduct a large-scale empirical study of over 33 languages and 17 morphosyntactic categories. Our findings demonstrate variations in the encoding of morphosyntactic information across upper and lower layers, with category-specific differences influenced by language properties. Hierarchical clustering of the factorization outputs yields a tree structure that is related to phylogenetic trees manually crafted by linguists. Moreover, we find the factorization outputs exhibit strong associations with performance observed across different cross-lingual tasks. We release our code to facilitate future research.
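A small numpy sketch of the joint-factorization idea: stack the two models' representations of the same tokens along the feature axis and learn one shared set of latent factors, whose per-model loadings can then be compared and correlated with morphosyntactic labels. The SVD-based factorization and the random placeholder matrices are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, dim, rank = 500, 768, 16
# Placeholders: hidden states for the same tokens from a multilingual and a
# monolingual model; in practice these come from forward passes over a corpus.
X_multi = rng.standard_normal((n_tokens, dim))
X_mono = rng.standard_normal((n_tokens, dim))

# Joint factorization: concatenate along the feature axis and learn a single
# set of token-level latent factors shared by both representation spaces.
X_joint = np.concatenate([X_multi, X_mono], axis=1)     # (tokens, 2*dim)
U, S, Vt = np.linalg.svd(X_joint, full_matrices=False)
factors = U[:, :rank] * S[:rank]                         # shared latent factors per token
loadings_multi, loadings_mono = Vt[:rank, :dim], Vt[:rank, dim:]

# The shared factors can be probed against morphosyntactic labels, and the two
# loading matrices compared to see how each model expresses those factors.
print(factors.shape, loadings_multi.shape, loadings_mono.shape)
```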

TRAMS: Training-free Memory Selection for Long-range Language Modeling

  • paper_url: http://arxiv.org/abs/2310.15494
  • repo_url: https://github.com/lwaekfjlk/trams
  • paper_authors: Haofei Yu, Cunxiang wang, Yue Zhang, Wei Bi
  • for: Improving the performance of the Transformer architecture on long-range language modeling.
  • methods: TRAining-free Memory Selection (TRAMS), a plug-and-play strategy that selects the tokens participating in the attention computation based on one simple metric, keeping tokens likely to receive high attention from the current queries and ignoring the rest (see the sketch below).
  • results: Improvements on the word-level benchmark WikiText-103 and the character-level benchmark enwik8, without any additional training or extra parameters.
    Abstract The Transformer architecture is crucial for numerous AI models, but it still faces challenges in long-range language modeling. Though several specific transformer architectures have been designed to tackle issues of long-range dependencies, existing methods like Transformer-XL are plagued by a high percentage of ineffective memories. In this study, we present a plug-and-play strategy, known as TRAining-free Memory Selection (TRAMS), that selects tokens participating in attention calculation based on one simple metric. This strategy allows us to keep tokens that are likely to have a high attention score with the current queries and ignore the other ones. We have tested our approach on the word-level benchmark (WikiText-103) and the character-level benchmark (enwik8), and the results indicate an improvement without having additional training or adding additional parameters.
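A minimal sketch of training-free memory selection: rank cached memory positions by a cheap score and keep only the top ones for the attention computation. Using the L2 norm of the key vectors as the ranking metric is an assumption for illustration; the paper defines its own simple metric.

```python
import numpy as np

def select_memory(keys: np.ndarray, budget: int) -> np.ndarray:
    """Return the indices of the memory positions to keep for attention.
    The L2-norm ranking here is a placeholder for the paper's metric."""
    scores = np.linalg.norm(keys, axis=-1)
    return np.sort(np.argsort(scores)[::-1][:budget])

rng = np.random.default_rng(0)
mem_keys = rng.standard_normal((1024, 64))    # cached keys from earlier segments
kept = select_memory(mem_keys, budget=256)    # only these positions join the attention
print(len(kept), kept[:10])
```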

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model

  • paper_url: http://arxiv.org/abs/2310.15477
  • repo_url: https://github.com/TsinghuaC3I/CRaSh
  • paper_authors: Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, Bowen Zhou
  • for: Exploring how Offsite-Tuning (OFT) can adapt centralized large language models (LLMs) to downstream tasks while keeping private instruction data off-site.
  • methods: An empirical analysis of LLM layers from the perspectives of representation and functional similarity, plus CRaSh (Clustering, Removing, and Sharing), a training-free strategy for deriving improved emulators from the full model (see the sketch below).
  • results: LLMs exhibit a modular layer structure that emerges as model size grows, with subtle changes in representations and intermediate predictions across layers; CRaSh significantly boosts OFT performance, and a loss-landscape analysis shows that the optima obtained with and without the full model are linearly connected within the same basin.
    Abstract Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks. However, when tuning publicly accessible, centralized LLMs with private instruction data, privacy concerns are inevitable. While direct transfer of parameterized modules between models is a plausible approach to address this, its implications and effectiveness need further exploration. This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators. Given the limited understanding of the underlying mechanism of OFT, we perform an empirical analysis on LLMs from the perspectives of representation and functional similarity. Interestingly, our findings reveal a unique modular structure within the layers of LLMs that appears to emerge as the model size expands. Simultaneously, we note subtle but potentially significant changes in representation and intermediate predictions across the layers. Inspired by these observations, we propose CRaSh, involving Clustering, Removing, and Sharing, a training-free strategy to derive improved emulators from LLMs. CRaSh significantly boosts performance of OFT with billions of parameters. Furthermore, we investigate the optimal solutions yielded by fine-tuning with and without full model through the lens of loss landscape. Our findings demonstrate a linear connectivity among these optima falling over the same basin, thereby highlighting the effectiveness of CRaSh and OFT. The source code is publicly available at https://github.com/TsinghuaC3I/CRaSh.
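A rough sketch of the Clustering/Removing/Sharing recipe on placeholder hidden states: consecutive layers with similar representations are grouped, one representative layer per group is kept, and that layer's weights would be shared (tied) across the positions its group occupied in the emulator. The similarity measure, threshold, and representative choice here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_samples, dim = 24, 128, 512
# Placeholder: per-layer hidden states collected from the full LLM on a probe set.
H = rng.standard_normal((n_layers, n_samples, dim))

def layer_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two layers' flattened hidden states."""
    a, b = a.reshape(-1), b.reshape(-1)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Clustering: greedily group consecutive layers with similar representations.
threshold = 0.0            # illustrative; real hidden states give meaningful values
clusters, current = [], [0]
for i in range(1, n_layers):
    if layer_similarity(H[i - 1], H[i]) >= threshold:
        current.append(i)
    else:
        clusters.append(current)
        current = [i]
clusters.append(current)

# Removing: keep one representative layer per cluster.
# Sharing: that layer's weights are reused for every position in its cluster.
emulator_layers = [c[len(c) // 2] for c in clusters]
print("clusters:", clusters)
print("layers kept for the emulator:", emulator_layers)
```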

Continual Event Extraction with Semantic Confusion Rectification

  • paper_url: http://arxiv.org/abs/2310.15470
  • repo_url: https://github.com/nju-websoft/SCR
  • paper_authors: Zitao Wang, Xinyi Wang, Wei Hu
  • for: Extracting incessantly emerging event information while avoiding forgetting.
  • methods: A continual event extraction model with semantic confusion rectification: pseudo labels are marked for each sentence to alleviate semantic confusion, pivotal knowledge is transferred between the current and previous models to improve the understanding of event types, and associated types are leveraged so the model focuses on the semantics of long-tailed event types.
  • results: The model outperforms state-of-the-art baselines and is proficient on imbalanced datasets.
    Abstract We study continual event extraction, which aims to extract incessantly emerging event information while avoiding forgetting. We observe that the semantic confusion on event types stems from the annotations of the same text being updated over time. The imbalance between event types even aggravates this issue. This paper proposes a novel continual event extraction model with semantic confusion rectification. We mark pseudo labels for each sentence to alleviate semantic confusion. We transfer pivotal knowledge between current and previous models to enhance the understanding of event types. Moreover, we encourage the model to focus on the semantics of long-tailed event types by leveraging other associated types. Experimental results show that our model outperforms state-of-the-art baselines and is proficient in imbalanced datasets.

The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks

  • paper_url: http://arxiv.org/abs/2310.15469
  • repo_url: None
  • paper_authors: Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, Zihao Wang, Liya Su, XiaoFeng Wang, Haixu Tang
  • for: Investigating whether fine-tuning large language models (LLMs) can precipitate the leakage of personal identifiable information (PII) embedded in their training data.
  • methods: A new exploitation avenue, the Janus attack, in which a PII association task is constructed and the LLM (OpenAI's GPT-3.5 via its fine-tuning interface) is fine-tuned on a minuscule PII dataset to reinstate and reveal concealed PIIs.
  • results: With a trivial fine-tuning outlay, LLMs such as GPT-3.5 can transition from being impermeable to PII extraction to divulging a substantial proportion of concealed PII.
    Abstract The era post-2018 marked the advent of Large Language Models (LLMs), with innovations such as OpenAI's ChatGPT showcasing prodigious linguistic prowess. As the industry galloped toward augmenting model parameters and capitalizing on vast swaths of human language data, security and privacy challenges also emerged. Foremost among these is the potential inadvertent accrual of Personal Identifiable Information (PII) during web-based data acquisition, posing risks of unintended PII disclosure. While strategies like RLHF during training and Catastrophic Forgetting have been marshaled to control the risk of privacy infringements, recent advancements in LLMs, epitomized by OpenAI's fine-tuning interface for GPT-3.5, have reignited concerns. One may ask: can the fine-tuning of LLMs precipitate the leakage of personal information embedded within training datasets? This paper reports the first endeavor to seek the answer to the question, particularly our discovery of a new LLM exploitation avenue, called the Janus attack. In the attack, one can construct a PII association task, whereby an LLM is fine-tuned using a minuscule PII dataset, to potentially reinstate and reveal concealed PIIs. Our findings indicate that, with a trivial fine-tuning outlay, LLMs such as GPT-3.5 can transition from being impermeable to PII extraction to a state where they divulge a substantial proportion of concealed PII. This research, through its deep dive into the Janus attack vector, underscores the imperative of navigating the intricate interplay between LLM utility and privacy preservation.

Interpreting Answers to Yes-No Questions in User-Generated Content

  • paper_url: http://arxiv.org/abs/2310.15464
  • repo_url: None
  • paper_authors: Shivam Mathur, Keun Hee Park, Dhivya Chinnappa, Saketh Kotamraju, Eduardo Blanco
  • for: Interpreting answers to yes-no questions in user-generated social media content.
  • methods: A new corpus of 4,442 yes-no question-answer pairs from Twitter, with an analysis of the linguistic characteristics of answers whose interpretation is yes, no, or unknown.
  • results: Large language models are far from solving the problem, even after fine-tuning and blending other corpora built for the same task outside social media.
    Abstract Interpreting answers to yes-no questions in social media is difficult. Yes and no keywords are uncommon, and the few answers that include them are rarely to be interpreted what the keywords suggest. In this paper, we present a new corpus of 4,442 yes-no question-answer pairs from Twitter. We discuss linguistic characteristics of answers whose interpretation is yes or no, as well as answers whose interpretation is unknown. We show that large language models are far from solving this problem, even after fine-tuning and blending other corpora for the same problem but outside social media.

Facilitating Self-Guided Mental Health Interventions Through Human-Language Model Interaction: A Case Study of Cognitive Restructuring

  • paper_url: http://arxiv.org/abs/2310.15461
  • repo_url: None
  • paper_authors: Ashish Sharma, Kevin Rushton, Inna Wanyin Lin, Theresa Nguyen, Tim Althoff
  • for: Exploring how human-language model interaction can support self-guided mental health interventions.
  • methods: Cognitive restructuring, an evidence-based therapeutic technique for overcoming negative thinking, is used as a case study; an IRB-approved randomized field study with 15,531 participants on a large mental health website evaluates a system that uses language models to support people through the steps of cognitive restructuring.
  • results: The system positively impacts emotional intensity for 67% of participants and helps 65% overcome negative thoughts; although adolescents report relatively worse outcomes, tailored interventions that simplify language model generations improve overall effectiveness and equity.
    Abstract Self-guided mental health interventions, such as "do-it-yourself" tools to learn and practice coping strategies, show great promise to improve access to mental health care. However, these interventions are often cognitively demanding and emotionally triggering, creating accessibility barriers that limit their wide-scale implementation and adoption. In this paper, we study how human-language model interaction can support self-guided mental health interventions. We take cognitive restructuring, an evidence-based therapeutic technique to overcome negative thinking, as a case study. In an IRB-approved randomized field study on a large mental health website with 15,531 participants, we design and evaluate a system that uses language models to support people through various steps of cognitive restructuring. Our findings reveal that our system positively impacts emotional intensity for 67% of participants and helps 65% overcome negative thoughts. Although adolescents report relatively worse outcomes, we find that tailored interventions that simplify language model generations improve overall effectiveness and equity.

K-HATERS: A Hate Speech Detection Corpus in Korean with Target-Specific Ratings

  • paper_url: http://arxiv.org/abs/2310.15439
  • repo_url: https://github.com/ssu-humane/k-haters
  • paper_authors: Chaewon Park, Soohwan Kim, Kyubyong Park, Kunwoo Park
  • for: Building a new corpus for hate speech detection in Korean that also captures subtler forms of offensiveness, to improve the accuracy and reliability of detection models.
  • methods: Approximately 192K Korean news comments annotated with target-specific offensiveness ratings on a three-point Likert scale, plus the Cognitive Reflection Test used as a proxy for annotation quality.
  • results: Annotations from individuals with the lowest test scores tend to yield detection models that make biased predictions toward specific target groups and are less accurate; the resource is the largest offensive language corpus in Korean and contributes to NLP research on hate speech detection.
    Abstract Numerous datasets have been proposed to combat the spread of online hate. Despite these efforts, a majority of these resources are English-centric, primarily focusing on overt forms of hate. This research gap calls for developing high-quality corpora in diverse languages that also encapsulate more subtle hate expressions. This study introduces K-HATERS, a new corpus for hate speech detection in Korean, comprising approximately 192K news comments with target-specific offensiveness ratings. This resource is the largest offensive language corpus in Korean and is the first to offer target-specific ratings on a three-point Likert scale, enabling the detection of hate expressions in Korean across varying degrees of offensiveness. We conduct experiments showing the effectiveness of the proposed corpus, including a comparison with existing datasets. Additionally, to address potential noise and bias in human annotations, we explore a novel idea of adopting the Cognitive Reflection Test, which is widely used in social science for assessing an individual's cognitive ability, as a proxy of labeling quality. Findings indicate that annotations from individuals with the lowest test scores tend to yield detection models that make biased predictions toward specific target groups and are less accurate. This study contributes to the NLP research on hate speech detection and resource construction. The code and dataset can be accessed at https://github.com/ssu-humane/K-HATERS.

Leveraging Large Language Models for Enhanced Product Descriptions in eCommerce

  • paper_url: http://arxiv.org/abs/2310.18357
  • repo_url: None
  • paper_authors: Jianghong Zhou, Bo Liu, Jhalak Nilesh Acharya Yao Hong, Kuang-chih Lee, Musen Wen
  • for: Enhancing eCommerce search visibility and customer engagement, and ultimately increasing sales and user satisfaction.
  • methods: Automated product description generation with the LLAMA 2.0 7B language model, trained on authentic Walmart product descriptions and fine-tuned for domain-specific language features and eCommerce nuances.
  • results: The system is scalable, significantly reduces the human workload of writing product descriptions, and improves search visibility and customer clicks, as validated with multiple evaluation metrics including NDCG, customer click-through rates, and human assessments.
    Abstract In the dynamic field of eCommerce, the quality and comprehensiveness of product descriptions are pivotal for enhancing search visibility and customer engagement. Effective product descriptions can address the 'cold start' problem, align with market trends, and ultimately lead to increased click-through rates. Traditional methods for crafting these descriptions often involve significant human effort and may lack both consistency and scalability. This paper introduces a novel methodology for automating product description generation using the LLAMA 2.0 7B language model. We train the model on a dataset of authentic product descriptions from Walmart, one of the largest eCommerce platforms. The model is then fine-tuned for domain-specific language features and eCommerce nuances to enhance its utility in sales and user engagement. We employ multiple evaluation metrics, including NDCG, customer click-through rates, and human assessments, to validate the effectiveness of our approach. Our findings reveal that the system is not only scalable but also significantly reduces the human workload involved in creating product descriptions. This study underscores the considerable potential of large language models like LLAMA 2.0 7B in automating and optimizing various facets of eCommerce platforms, offering significant business impact, including improved search functionality and increased sales.

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations

  • paper_url: http://arxiv.org/abs/2310.15431
  • repo_url: None
  • paper_authors: Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi
  • for: Introducing defeasible moral reasoning: providing grounded contexts that make an action more or less morally acceptable in real-life scenarios, together with commonsense rationales that justify the judgment.
  • methods: An iterative self-distillation approach that starts from a small amount of unstructured seed knowledge from GPT-3 and alternates between self-distillation from student models, targeted filtering with a critic trained on human judgments (to boost validity) and NLI (to boost diversity), and self-imitation learning (to amplify data quality).
  • results: The process yields delta-Rules-of-Thumb, a high-quality dataset of 1.2M contextualizations and rationales for 115K defeasible moral actions, rated highly by human annotators 85.9% to 99.8% of the time; the final student model distilled from it wins over all intermediate student models by a notable margin.
    Abstract Moral or ethical judgments rely heavily on the specific contexts in which they occur. Understanding varying shades of defeasible contextualizations (i.e., additional information that strengthens or attenuates the moral acceptability of an action) is critical to accurately represent the subtlety and intricacy of grounded human moral judgment in real-life scenarios. We introduce defeasible moral reasoning: a task to provide grounded contexts that make an action more or less morally acceptable, along with commonsense rationales that justify the reasoning. To elicit high-quality task data, we take an iterative self-distillation approach that starts from a small amount of unstructured seed knowledge from GPT-3 and then alternates between (1) self-distillation from student models; (2) targeted filtering with a critic model trained by human judgment (to boost validity) and NLI (to boost diversity); (3) self-imitation learning (to amplify the desired data quality). This process yields a student model that produces defeasible contexts with improved validity, diversity, and defeasibility. From this model we distill a high-quality dataset, \delta-Rules-of-Thumb, of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions rated highly by human annotators 85.9% to 99.8% of the time. Using \delta-RoT we obtain a final student model that wins over all intermediate student models by a notable margin.

Beyond Sentiment: Leveraging Topic Metrics for Political Stance Classification

  • paper_url: http://arxiv.org/abs/2310.15429
  • repo_url: None
  • paper_authors: Weihong Qi
  • for: Offering an alternative and complement to sentiment analysis that more accurately reflects the latent structures and political stances within texts.
  • methods: Topic metrics, i.e., dummy variables converted from topics extracted with BERTopic, used for stance classification on the three datasets identified by Bestvater and Monroe (2023) (see the sketch below).
  • results: BERTopic improves coherence scores by 17.07% to 54.20% over traditional approaches such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), and topic metrics outperform sentiment metrics in stance classification by as much as 18.95%; they are especially effective for context-rich texts and corpora where stance-sentiment correlations are weak, and combining sentiment and topic metrics performs best in most scenarios.
    Abstract Sentiment analysis, widely critiqued for capturing merely the overall tone of a corpus, falls short in accurately reflecting the latent structures and political stances within texts. This study introduces topic metrics, dummy variables converted from extracted topics, as both an alternative and complement to sentiment metrics in stance classification. By employing three datasets identified by Bestvater and Monroe (2023), this study demonstrates BERTopic's proficiency in extracting coherent topics and the effectiveness of topic metrics in stance classification. The experiment results show that BERTopic improves coherence scores by 17.07% to 54.20% when compared to traditional approaches such as Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), prevalent in earlier political science research. Additionally, our results indicate topic metrics outperform sentiment metrics in stance classification, increasing performance by as much as 18.95%. Our findings suggest topic metrics are especially effective for context-rich texts and corpus where stance and sentiment correlations are weak. The combination of sentiment and topic metrics achieve an optimal performance in most of the scenarios and can further address the limitations of relying solely on sentiment as well as the low coherence score of topic metrics.
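A minimal sketch of turning topic assignments into "topic metrics" (dummy variables) for stance classification. The topic IDs and stance labels below are placeholders; in practice the assignments would come from something like BERTopic's fit_transform over the corpus.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder inputs: per-document topic assignments (e.g. from a fitted
# BERTopic model) and binary stance labels.
topics = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
stance = np.array([1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1])

# Topic metrics: one dummy variable per topic, indicating which topic a
# document expresses.
topic_dummies = np.eye(topics.max() + 1)[topics]

clf = LogisticRegression(max_iter=1000).fit(topic_dummies, stance)
print("training accuracy with topic metrics only:", clf.score(topic_dummies, stance))
```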

The Mason-Alberta Phonetic Segmenter: A forced alignment system based on deep neural networks and interpolation

  • paper_url: http://arxiv.org/abs/2310.15425
  • repo_url: https://github.com/masonphonlab/maps_paper_code
  • paper_authors: Matthew C. Kelley, Scott James Perry, Benjamin V. Tucker
  • for: Presenting a new neural-network-based forced alignment system, the Mason-Alberta Phonetic Segmenter (MAPS), as a testbed for two possible improvements to forced alignment.
  • methods: Treating the acoustic model as a tagging task rather than a classification task (since speech segments are not truly discrete and commonly overlap), and an interpolation technique that allows boundaries more precise than the common 10 ms limit (see the sketch below).
  • results: Compared with the Montreal Forced Aligner, the tagging approach did not generally improve results, but the system with interpolation placed 27.92% more boundaries within 10 ms of the target on the test set.
    Abstract Forced alignment systems automatically determine boundaries between segments in speech data, given an orthographic transcription. These tools are commonplace in phonetics to facilitate the use of speech data that would be infeasible to manually transcribe and segment. In the present paper, we describe a new neural network-based forced alignment system, the Mason-Alberta Phonetic Segmenter (MAPS). The MAPS aligner serves as a testbed for two possible improvements we pursue for forced alignment systems. The first is treating the acoustic model in a forced aligner as a tagging task, rather than a classification task, motivated by the common understanding that segments in speech are not truly discrete and commonly overlap. The second is an interpolation technique to allow boundaries more precise than the common 10 ms limit in modern forced alignment systems. We compare configurations of our system to a state-of-the-art system, the Montreal Forced Aligner. The tagging approach did not generally yield improved results over the Montreal Forced Aligner. However, a system with the interpolation technique had a 27.92% increase relative to the Montreal Forced Aligner in the amount of boundaries within 10 ms of the target on the test set. We also reflect on the task and training process for acoustic modeling in forced alignment, highlighting how the output targets for these models do not match phoneticians' conception of similarity between phones and that reconciliation of this tension may require rethinking the task and output targets or how speech itself should be segmented.
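A small sketch of the interpolation idea for sub-frame boundaries: with acoustic-model posteriors produced every 10 ms, place the boundary where the outgoing phone's posterior crosses the incoming phone's, interpolating linearly between frames. This is an illustrative reconstruction of the general technique, not the exact interpolation MAPS implements.

```python
import numpy as np

def refine_boundary(post_a, post_b, frame_step_ms: float = 10.0) -> float:
    """Given per-frame posteriors for the outgoing phone (post_a) and the
    incoming phone (post_b), place the boundary at their crossing point,
    interpolating linearly between frames to get finer than the frame step."""
    diff = np.asarray(post_a, dtype=float) - np.asarray(post_b, dtype=float)
    for i in range(len(diff) - 1):
        if diff[i] >= 0 > diff[i + 1]:                    # sign change between frames
            frac = diff[i] / (diff[i] - diff[i + 1])      # linear interpolation
            return (i + frac) * frame_step_ms
    return (len(diff) - 1) * frame_step_ms                # fallback: last frame

post_a = [0.9, 0.8, 0.6, 0.3, 0.1]   # outgoing phone fading out
post_b = [0.1, 0.2, 0.4, 0.7, 0.9]   # incoming phone rising
print(refine_boundary(post_a, post_b))  # ~23.3 ms, between the 10 ms frame ticks
```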

Let the Pretrained Language Models “Imagine” for Short Texts Topic Modeling

  • paper_url: http://arxiv.org/abs/2310.15420
  • repo_url: None
  • paper_authors: Pritom Saha Akash, Jie Huang, Kevin Chen-Chuan Chang
  • for: Discovering latent semantics in short documents and addressing the data-sparsity issue in short-text topic modeling.
  • methods: Extending short texts into longer sequences with pre-trained language models (PLMs), plus a simple extension to a neural topic model that reduces the effect of noisy, off-topic text generated by the PLMs (see the sketch below).
  • results: The model substantially improves short-text topic modeling, generating higher-quality topics than state-of-the-art models across multiple real-world datasets under extreme data sparsity.
    Abstract Topic models are one of the compelling methods for discovering latent semantics in a document collection. However, it assumes that a document has sufficient co-occurrence information to be effective. However, in short texts, co-occurrence information is minimal, which results in feature sparsity in document representation. Therefore, existing topic models (probabilistic or neural) mostly fail to mine patterns from them to generate coherent topics. In this paper, we take a new approach to short-text topic modeling to address the data-sparsity issue by extending short text into longer sequences using existing pre-trained language models (PLMs). Besides, we provide a simple solution extending a neural topic model to reduce the effect of noisy out-of-topics text generation from PLMs. We observe that our model can substantially improve the performance of short-text topic modeling. Extensive experiments on multiple real-world datasets under extreme data sparsity scenarios show that our models can generate high-quality topics outperforming state-of-the-art models.
    摘要 In this paper, we take a new approach to short-text topic modeling to address the data-sparsity issue by extending short texts into longer sequences using existing pre-trained language models (PLMs). Additionally, we provide a simple solution for reducing the effect of noisy out-of-topics text generation from PLMs by extending a neural topic model. Our experiments on multiple real-world datasets under extreme data sparsity scenarios show that our models can generate high-quality topics that outperform state-of-the-art models.
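To make the expansion step concrete, here is a minimal sketch of extending short documents with a generative PLM before topic modelling. GPT-2 via the Hugging Face pipeline and the sampling settings are illustrative assumptions rather than the paper's exact setup, and the paper's additional mechanism for suppressing noisy out-of-topic generations is not shown.

```python
# Sketch: "let the PLM imagine" longer contexts for sparse short texts.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

short_docs = [
    "battery life on this phone",
    "pasta recipe with garlic",
]

def expand(doc, n_tokens=30):
    out = generator(doc, max_new_tokens=n_tokens, do_sample=True, num_return_sequences=1)
    return out[0]["generated_text"]

expanded_docs = [expand(d) for d in short_docs]
for original, longer in zip(short_docs, expanded_docs):
    print(f"{original!r} -> {longer!r}")
# expanded_docs can now be fed to a neural or probabilistic topic model in place of the
# original sparse short texts.
```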

cs.LG - 2023-10-24

Task Grouping for Automated Multi-Task Machine Learning via Task Affinity Prediction

  • paper_url: http://arxiv.org/abs/2310.16241
  • repo_url: None
  • paper_authors: Afiya Ayman, Ayan Mukhopadhyay, Aron Laszka
  • for: 本研究旨在找到优化多任务学习(MTL)模型性能的任务组合方法。
  • methods: 在四个广泛用于 MTL 研究的基准数据集上，针对基于神经网络的 MTL 模型研究任务间的亲和性，识别出任务固有特征与单任务学习（STL）特性，用以预测一组任务应通过 MTL 共同学习还是通过 STL 独立学习。
  • results: 在此预测器的基础上提出一种随机搜索算法，利用预测器尽量减少搜索过程中所需的 MTL 训练次数；在四个基准数据集上的结果表明，该方法找到的任务分组优于现有的基线方法。
    Abstract When a number of similar tasks have to be learned simultaneously, multi-task learning (MTL) models can attain significantly higher accuracy than single-task learning (STL) models. However, the advantage of MTL depends on various factors, such as the similarity of the tasks, the sizes of the datasets, and so on; in fact, some tasks might not benefit from MTL and may even incur a loss of accuracy compared to STL. Hence, the question arises: which tasks should be learned together? Domain experts can attempt to group tasks together following intuition, experience, and best practices, but manual grouping can be labor-intensive and far from optimal. In this paper, we propose a novel automated approach for task grouping. First, we study the affinity of tasks for MTL using four benchmark datasets that have been used extensively in the MTL literature, focusing on neural network-based MTL models. We identify inherent task features and STL characteristics that can help us to predict whether a group of tasks should be learned together using MTL or if they should be learned independently using STL. Building on this predictor, we introduce a randomized search algorithm, which employs the predictor to minimize the number of MTL trainings performed during the search for task groups. We demonstrate on the four benchmark datasets that our predictor-driven search approach can find better task groupings than existing baseline approaches.
    摘要 In this paper, we propose a novel automated approach for task grouping. First, we analyze the affinity of tasks for MTL using four benchmark datasets that have been widely used in the MTL literature, focusing on neural network-based MTL models. We identify inherent task features and STL characteristics that can help predict whether a group of tasks should be learned together using MTL or if they should be learned independently using STL. Building on this predictor, we introduce a randomized search algorithm that employs the predictor to minimize the number of MTL trainings performed during the search for task groups. We demonstrate on the four benchmark datasets that our predictor-driven search approach can find better task groupings than existing baseline approaches.
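The predictor-driven randomized search can be sketched in a few lines. The `predict_gain` function below is a hypothetical stand-in for the paper's learned task-affinity predictor; the point is that candidate groupings are scored cheaply, and only the top few would ever be MTL-trained for real.

```python
# Sketch: score many random task partitions with a cheap predictor, MTL-train only the best.
import random

tasks = ["t1", "t2", "t3", "t4", "t5", "t6"]

def random_grouping(tasks):
    """Randomly partition the task list into disjoint groups."""
    shuffled = random.sample(tasks, len(tasks))
    groups, i = [], 0
    while i < len(shuffled):
        size = random.randint(1, len(shuffled) - i)
        groups.append(tuple(shuffled[i:i + size]))
        i += size
    return groups

def predict_gain(group):
    # Hypothetical stand-in for the learned predictor: score a group of tasks by how much
    # MTL is expected to beat STL for them (here, a toy heuristic favouring mid-sized groups).
    return -abs(len(group) - 2) + random.random() * 0.1

def score_grouping(groups):
    return sum(predict_gain(g) for g in groups)

random.seed(0)
candidates = [random_grouping(tasks) for _ in range(500)]   # cheap: predictor only, no training
candidates.sort(key=score_grouping, reverse=True)
top_k = candidates[:3]                                      # only these would be MTL-trained
for g in top_k:
    print(g, round(score_grouping(g), 3))
```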

Attention-Based Ensemble Pooling for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.16231
  • repo_url: https://github.com/awikner/denpool
  • paper_authors: Dhruvit Patel, Alexander Wikner
  • for: 降低时间序列预测模型偏见
  • methods: 使用 ensemble 预测模型并将其输出汇总为ensemble预测
  • results: 在非平稳 Lorenz ‘63方程的多步预测中表现出色，但在COVID-19每周病例死亡的一步预测中不一定比现有的ensemble pooling表现更好。
    Abstract A common technique to reduce model bias in time-series forecasting is to use an ensemble of predictive models and pool their output into an ensemble forecast. In cases where each predictive model has different biases, however, it is not always clear exactly how each model forecast should be weighed during this pooling. We propose a method for pooling that performs a weighted average over candidate model forecasts, where the weights are learned by an attention-based ensemble pooling model. We test this method on two time-series forecasting problems: multi-step forecasting of the dynamics of the non-stationary Lorenz `63 equation, and one-step forecasting of the weekly incident deaths due to COVID-19. We find that while our model achieves excellent valid times when forecasting the non-stationary Lorenz `63 equation, it does not consistently perform better than the existing ensemble pooling when forecasting COVID-19 weekly incident deaths.
    摘要 在时间序列预测中，一种常见的降低模型偏差的技术是使用多个预测模型组成的集成，并将其输出汇总为集成预测。然而，当各个预测模型带有不同偏差时，在汇总时应如何对各模型的预测加权往往并不明确。我们提出一种汇总方法，对候选模型的预测进行加权平均，其中权重由一个基于注意力的集成汇总模型学习得到。我们在两个时间序列预测问题上测试了该方法：非平稳 Lorenz '63 方程动态的多步预测，以及 COVID-19 每周新增死亡人数的一步预测。我们发现，该模型在预测非平稳 Lorenz '63 方程时具有出色的有效预测时长，但在预测 COVID-19 每周新增死亡时并不总是优于现有的集成汇总方法。
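A toy version of attention-based ensemble pooling is easy to write down: member forecasts are embedded as keys, a learned query yields softmax weights, and the pooled forecast is the weighted average. The module below is an illustrative sketch and may differ from the DENPOOL implementation in the linked repository.

```python
# Sketch: attention weights over ensemble members, learned rather than fixed.
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    def __init__(self, horizon, d_model=32):
        super().__init__()
        self.key = nn.Linear(horizon, d_model)      # embed each member's forecast
        self.query = nn.Parameter(torch.randn(d_model))

    def forward(self, member_forecasts):             # (batch, n_members, horizon)
        keys = self.key(member_forecasts)             # (batch, n_members, d_model)
        scores = keys @ self.query                    # (batch, n_members)
        weights = torch.softmax(scores, dim=-1)       # input-dependent pooling weights
        pooled = (weights.unsqueeze(-1) * member_forecasts).sum(dim=1)
        return pooled, weights

torch.manual_seed(0)
pool = AttentionPool(horizon=10)
forecasts = torch.randn(4, 5, 10)                     # 4 series, 5 ensemble members, 10-step horizon
pooled, weights = pool(forecasts)
print(pooled.shape, weights[0])
```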

Poison is Not Traceless: Fully-Agnostic Detection of Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2310.16224
  • repo_url: None
  • paper_authors: Xinglong Chang, Katharina Dost, Gillian Dobbie, Jörg Wicker
  • for: 这篇论文旨在检测针对机器学习模型训练数据的投毒攻击。
  • methods: 提出了一个全新的完全无假设检测框架 DIVA，可对潜在被投毒的数据集进行检测；DIVA 基于分类器在被投毒数据与干净数据上的准确率差异来检测攻击。
  • results: 在标签翻转攻击上的评估表明，DIVA 能够有效地检测出投毒攻击。
    Abstract The performance of machine learning models depends on the quality of the underlying data. Malicious actors can attack the model by poisoning the training data. Current detectors are tied to either specific data types, models, or attacks, and therefore have limited applicability in real-world scenarios. This paper presents a novel fully-agnostic framework, DIVA (Detecting InVisible Attacks), that detects attacks solely relying on analyzing the potentially poisoned data set. DIVA is based on the idea that poisoning attacks can be detected by comparing the classifier's accuracy on poisoned and clean data and pre-trains a meta-learner using Complexity Measures to estimate the otherwise unknown accuracy on a hypothetical clean dataset. The framework applies to generic poisoning attacks. For evaluation purposes, in this paper, we test DIVA on label-flipping attacks.
    摘要 机器学习模型的性能取决于训练数据的质量。恶意攻击者可以通过在训练数据中投毒来攻击模型。现有的检测器往往绑定于特定的数据类型、模型或攻击方式，因此在实际场景中的适用性有限。本文提出了一种全新的完全无假设框架 DIVA（Detecting InVisible Attacks），仅通过分析可能被投毒的数据集来检测攻击。DIVA 的核心思想是：通过比较分类器在被投毒数据与干净数据上的准确率即可发现投毒攻击；它利用复杂度度量预训练一个元学习器，用以估计在假想的干净数据集上本不可知的准确率。该框架适用于一般的投毒攻击。出于评估目的，本文在标签翻转攻击上对 DIVA 进行了测试。
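The accuracy-gap idea can be sketched as follows: compare the accuracy a classifier actually reaches on the suspect data with an estimate of what it should reach on hypothetically clean data. The `estimate_clean_accuracy` stub below stands in for DIVA's pre-trained meta-learner over complexity measures, and the flagging threshold is an assumption.

```python
# Sketch: flag a dataset when observed accuracy falls far below the estimated clean accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Label-flipping poisoning on 25% of the training labels.
flip = rng.random(len(y)) < 0.25
y_poisoned = np.where(flip, 1 - y, y)

def observed_accuracy(X, y):
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

def estimate_clean_accuracy(X):
    # Hypothetical stand-in for DIVA's meta-learner, which maps dataset complexity
    # measures to an expected clean accuracy. Here: a fixed optimistic guess.
    return 0.95

gap = estimate_clean_accuracy(X) - observed_accuracy(X, y_poisoned)
print(f"accuracy gap: {gap:.3f} -> {'poisoning suspected' if gap > 0.1 else 'looks clean'}")
```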

Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies

  • paper_url: http://arxiv.org/abs/2310.16214
  • repo_url: None
  • paper_authors: Adrian Perez Dieguez, Margarita Amor Lopez
  • for: 本文旨在提高 GPU 嵌入式系统的性能，以满足实时或耗时应用的需求。
  • methods: 本文提出了两种优化方法,一是分析模型驱动的优化方法,另一是基于机器学习(ML)的优化方法。
  • results: 对多种并行前缀操作（FFT、扫描原语、三对角方程组求解器）进行了性能分析，并为希望在这类架构上优化应用的开发者和研究人员提供了实践指导。
    Abstract GPU-embedded systems have gained popularity across various domains due to their efficient power consumption. However, in order to meet the demands of real-time or time-consuming applications running on these systems, it is crucial for them to be tuned to exhibit high performance. This paper addresses the issue by developing and comparing two tuning methodologies on GPU-embedded systems, and also provides performance insights for developers and researchers seeking to optimize applications running on these architectures. We focus on parallel prefix operations, such as FFT, scan primitives, and tridiagonal system solvers, which are performance-critical components in many applications. The study introduces an analytical model-driven tuning methodology and a Machine Learning (ML)-based tuning methodology. We evaluate the performance of the two tuning methodologies for different parallel prefix implementations of the BPLG library in an NVIDIA Jetson system, and compare their performance to the ones achieved through an exhaustive search. The findings shed light on the best strategies for handling the open challenge of performance portability for major computational patterns among server and embedded devices, providing practical guidance for offline and online tuning. We also address the existing gap in performance studies for parallel computational patterns in GPU-embedded systems by comparing the BPLG performance against other state-of-the-art libraries, including CUSPARSE, CUB, and CUFFT.

ELM Ridge Regression Boosting

  • paper_url: http://arxiv.org/abs/2310.16209
  • repo_url: None
  • paper_authors: M. Andrecut
  • for: 提高Extreme Learning Machine(ELM)的分类性能和Robustness。
  • methods: 使用Boosting方法对Ridge Regression(RR)方法进行改进。
  • results: 提高了ELM的分类性能和Robustness。
    Abstract We discuss a boosting approach for the Ridge Regression (RR) method, with applications to the Extreme Learning Machine (ELM), and we show that the proposed method significantly improves the classification performance and robustness of ELMs.
    摘要 我们讨论了一种用于岭回归（Ridge Regression, RR）方法的 boosting 方案，并将其应用于极限学习机（ELM）。结果显示，所提方法能够显著提高 ELM 的分类性能和鲁棒性。
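Since the entry is terse, a small sketch may help: an ELM readout is a ridge regression on random hidden features, and a boosting loop can fit further random-feature blocks to the residuals. The paper's exact boosting scheme is not specified here, so the residual-fitting variant below is an illustrative assumption.

```python
# Sketch: ELM with a ridge readout, boosted by fitting new random-feature blocks to residuals.
import numpy as np

rng = np.random.default_rng(0)

def elm_features(X, W, b):
    return np.tanh(X @ W + b)          # random hidden layer of an extreme learning machine

def ridge_fit(H, Y, lam=1e-2):
    return np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ Y)

# Toy 2-class problem with one-hot targets.
X = rng.normal(size=(500, 10))
labels = (X[:, 0] * X[:, 1] > 0).astype(int)
Y = np.eye(2)[labels]

n_hidden, n_rounds = 50, 5
preds = np.zeros_like(Y)
residual = Y.copy()
for _ in range(n_rounds):                       # boosting: each round fits the current residual
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = elm_features(X, W, b)
    beta = ridge_fit(H, residual)
    preds += H @ beta
    residual = Y - preds

acc = (preds.argmax(1) == labels).mean()
print(f"training accuracy after {n_rounds} boosted ELM blocks: {acc:.3f}")
```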

Efficient deep data assimilation with sparse observations and time-varying sensors

  • paper_url: http://arxiv.org/abs/2310.16187
  • repo_url: https://github.com/dl-wg/vivid
  • paper_authors: Sibo Cheng, Che Liu, Yike Guo, Rossella Arcucci
  • for: 这篇论文提出一种新的变分数据同化（DA）方法，用于处理高维动力系统中的非结构化观测数据。
  • methods: 该方法称为 Voronoi-tessellation Inverse operator for VariatIonal Data assimilation（VIVID），它将深度学习（DL）逆算子融入数据同化目标函数；VIVID 利用 Voronoi 剖分和卷积神经网络处理稀疏、非结构化的观测数据，并可方便地与 Proper Orthogonal Decomposition（POD）结合，构成端到端的降阶数据同化方案。
  • results: 在一个流体动力系统的数值实验中，VIVID 明显优于现有的 DA 和 DL 算法；其稳健性也通过施加不同程度的先验误差、使用不同数量的传感器以及误设误差协方差得到了验证。
    Abstract Variational Data Assimilation (DA) has been broadly used in engineering problems for field reconstruction and prediction by performing a weighted combination of multiple sources of noisy data. In recent years, the integration of deep learning (DL) techniques in DA has shown promise in improving the efficiency and accuracy in high-dimensional dynamical systems. Nevertheless, existing deep DA approaches face difficulties in dealing with unstructured observation data, especially when the placement and number of sensors are dynamic over time. We introduce a novel variational DA scheme, named Voronoi-tessellation Inverse operator for VariatIonal Data assimilation (VIVID), that incorporates a DL inverse operator into the assimilation objective function. By leveraging the capabilities of the Voronoi-tessellation and convolutional neural networks, VIVID is adept at handling sparse, unstructured, and time-varying sensor data. Furthermore, the incorporation of the DL inverse operator establishes a direct link between observation and state space, leading to a reduction in the number of minimization steps required for DA. Additionally, VIVID can be seamlessly integrated with Proper Orthogonal Decomposition (POD) to develop an end-to-end reduced-order DA scheme, which can further expedite field reconstruction. Numerical experiments in a fluid dynamics system demonstrate that VIVID can significantly outperform existing DA and DL algorithms. The robustness of VIVID is also accessed through the application of various levels of prior error, the utilization of varying numbers of sensors, and the misspecification of error covariance in DA.
    摘要 变分数据同化（DA）通过对多种含噪数据进行加权组合，已广泛用于工程问题中的场重建与预测。近年来，将深度学习（DL）技术融入 DA 在提升高维动力系统中的效率和精度方面展现出潜力。然而，现有的深度 DA 方法在处理非结构化观测数据时存在困难，尤其是当传感器的布设位置与数量随时间变化时。我们提出了一种新的变分 DA 方案，名为 Voronoi-tessellation Inverse operator for VariatIonal Data assimilation（VIVID），它将深度学习逆算子引入同化目标函数。借助 Voronoi 剖分与卷积神经网络，VIVID 能够处理稀疏、非结构化且随时间变化的传感数据。此外，引入深度学习逆算子在观测空间与状态空间之间建立了直接联系，从而减少了 DA 所需的最小化步骤数量。VIVID 还可以方便地与本征正交分解（POD）结合，构成端到端的降阶 DA 方案，进一步加速场重建。在一个流体力学系统上的数值实验表明，VIVID 能显著优于现有的 DA 和 DL 算法。我们还通过施加不同程度的先验误差、使用不同数量的传感器以及在 DA 中误设误差协方差，评估了 VIVID 的稳健性。
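One ingredient worth illustrating is the Voronoi tessellation of sparse, unstructured sensors into a CNN-friendly image. The sketch below assigns every grid cell the value of its nearest sensor and adds an observation-mask channel; the grid size, sensor count and toy field are assumptions, not the paper's setup.

```python
# Sketch: rasterise sparse sensor readings onto a grid via nearest-sensor (Voronoi) assignment.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
nx = ny = 64
xs, ys = np.meshgrid(np.linspace(0, 1, nx), np.linspace(0, 1, ny), indexing="ij")
true_field = np.sin(2 * np.pi * xs) * np.cos(2 * np.pi * ys)   # stand-in for the unknown state

# Sparse, time-varying sensors: positions can change between assimilation cycles.
sensor_xy = rng.random((20, 2))
sensor_vals = np.sin(2 * np.pi * sensor_xy[:, 0]) * np.cos(2 * np.pi * sensor_xy[:, 1])

# Voronoi tessellation: each grid point takes the value of its nearest sensor.
grid_pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
_, nearest = cKDTree(sensor_xy).query(grid_pts)
voronoi_image = sensor_vals[nearest].reshape(nx, ny)

# A mask channel tells the downstream network where real observations are located.
obs_mask = np.zeros((nx, ny))
ij = np.clip((sensor_xy * [nx - 1, ny - 1]).round().astype(int), 0, [nx - 1, ny - 1])
obs_mask[ij[:, 0], ij[:, 1]] = 1.0

cnn_input = np.stack([voronoi_image, obs_mask])   # shape (2, 64, 64), ready for a conv net
print(cnn_input.shape, float(np.abs(voronoi_image - true_field).mean()))
```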

Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images

  • paper_url: http://arxiv.org/abs/2310.16186
  • repo_url: None
  • paper_authors: Howard Yanxon, Eric Roberts, Hannah Parraga, James Weng, Wenqian Xu, Uta Ruett, Alexander Hexemer, Petrus Zwart, Nickolas Schwarz
  • for: 这个论文是为了探讨材料功能设备中的晶体结构的研究。
  • methods: 该论文提出了一种使用深度学习卷积神经网络来识别实验 XRD 图像中伪影的方法。
  • results: 研究结果表明，U-Net 在未参与训练的测试集上保持了 92.4% 的召回率，与传统方法相比平均假阳性减少 34%，并将识别与分离伪影所需的时间缩短 50% 以上。
    Abstract Scientific researchers frequently use the in situ synchrotron high-energy powder X-ray diffraction (XRD) technique to examine the crystallographic structures of materials in functional devices such as rechargeable battery materials. We propose a method for identifying artifacts in experimental XRD images. The proposed method uses deep learning convolutional neural network architectures, such as tunable U-Nets to identify the artifacts. In particular, the predicted artifacts are evaluated against the corresponding ground truth (manually implemented) using the overall true positive rate or recall. The result demonstrates that the U-Nets can consistently produce great recall performance at 92.4% on the test dataset, which is not included in the training, with a 34% reduction in average false positives in comparison to the conventional method. The U-Nets also reduce the time required to identify and separate artifacts by more than 50%. Furthermore, the exclusion of the artifacts shows major changes in the integrated 1D XRD pattern, enhancing further analysis of the post-processing XRD data.
    摘要 科学研究人员经常使用原位同步辐射高能粉末 X 射线衍射（XRD）技术，来研究充电电池材料等功能器件中材料的晶体结构。我们提出了一种在实验 XRD 图像中识别伪影的方法。该方法采用深度学习卷积神经网络架构（例如可调的 U-Net）来识别伪影。预测得到的伪影与相应的人工标注真值进行对比，并以总体真阳性率（召回率）加以评估。结果表明，U-Net 在未参与训练的测试集上能够稳定地达到 92.4% 的召回率，与传统方法相比平均假阳性减少 34%，同时将识别与分离伪影所需的时间缩短 50% 以上。此外，剔除伪影后，积分得到的一维 XRD 图谱出现了明显变化，有助于后续 XRD 数据的进一步分析。

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $ε$-Greedy Exploration

  • paper_url: http://arxiv.org/abs/2310.16173
  • repo_url: None
  • paper_authors: Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury
  • for: 为深度强化学习中采用 $\epsilon$-贪婪探索的 DQN 提供理论理解。
  • methods: 利用目标网络和经验回放获得均方 Bellman 误差（MSBE）的无偏估计，并给出了首个在实际设置下对 DQN 的收敛性与样本复杂度的理论分析。
  • results: 证明了一个采用衰减 $\epsilon$ 的迭代过程能够以几何速率收敛到最优 Q 值函数；此外，较高的 $\epsilon$ 值会扩大收敛区域但降低收敛速度，较低的 $\epsilon$ 值则相反。实验验证了我们的理论结论。
    Abstract This paper provides a theoretical understanding of Deep Q-Network (DQN) with the $\varepsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous empirical achievement of the DQN, its theoretical characterization remains underexplored. First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs the target network and experience replay to acquire an unbiased estimation of the mean-square Bellman error (MSBE) utilized in training the Q-network. However, the existing theoretical analysis of DQNs lacks convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with $\epsilon$-greedy policy. We prove an iterative procedure with decaying $\epsilon$ converges to the optimal Q-value function geometrically. Moreover, a higher level of $\epsilon$ values enlarges the region of convergence but slows down the convergence, while the opposite holds for a lower level of $\epsilon$ values. Experiments justify our established theoretical insights on DQNs.
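For readers unfamiliar with the exploration scheme analysed in the abstract above, here is a minimal epsilon-greedy loop with a decaying epsilon on a toy bandit-style problem; the geometric decay schedule and the running-average value update are illustrative assumptions rather than the paper's exact construction.

```python
# Sketch: epsilon-greedy action selection with a decaying epsilon.
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4

def epsilon_greedy(q_values, eps):
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))       # explore uniformly
    return int(np.argmax(q_values))                   # exploit the current Q estimate

eps, decay = 1.0, 0.9
q = np.zeros(n_actions)
counts = np.zeros(n_actions)
true_means = np.array([0.1, 0.5, 0.3, 0.7])
for t in range(2000):
    a = epsilon_greedy(q, eps)
    r = rng.normal(true_means[a], 0.1)
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]                    # running-average Q update (bandit stand-in)
    if t % 100 == 99:
        eps = max(0.01, eps * decay)                  # larger early exploration, smaller later
print(int(np.argmax(q)), q.round(3))
```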

Fine tuning Pre trained Models for Robustness Under Noisy Labels

  • paper_url: http://arxiv.org/abs/2310.17668
  • repo_url: None
  • paper_authors: Sumyeong Ahn, Sihyeon Kim, Jongwoo Ko, Se-Young Yun
  • for: 这篇论文旨在解决当训练集中含有噪声（错误）标签时，如何微调预训练模型以保持机器学习性能的问题。
  • methods: 论文分析了预训练模型在噪声标签数据集上的特性，并据此提出 TURN 算法：先独立调整线性分类器以保护特征提取器不被噪声标签破坏，再降低噪声标签比例并对整个模型进行微调。
  • results: 结果显示，TURN 算法能够高效地处理噪声标签数据，并在多个基准上相比既有方法取得更优的去噪表现。
    Abstract The presence of noisy labels in a training dataset can significantly impact the performance of machine learning models. To tackle this issue, researchers have explored methods for Learning with Noisy Labels to identify clean samples and reduce the influence of noisy labels. However, constraining the influence of a certain portion of the training dataset can result in a reduction in overall generalization performance. To alleviate this, recent studies have considered the careful utilization of noisy labels by leveraging huge computational resources. Therefore, the increasing training cost necessitates a reevaluation of efficiency. In other areas of research, there has been a focus on developing fine-tuning techniques for large pre-trained models that aim to achieve both high generalization performance and efficiency. However, these methods have mainly concentrated on clean datasets, and there has been limited exploration of the noisy label scenario. In this research, our aim is to find an appropriate way to fine-tune pre-trained models for noisy labeled datasets. To achieve this goal, we investigate the characteristics of pre-trained models when they encounter noisy datasets. Through empirical analysis, we introduce a novel algorithm called TURN, which robustly and efficiently transfers the prior knowledge of pre-trained models. The algorithm consists of two main steps: (1) independently tuning the linear classifier to protect the feature extractor from being distorted by noisy labels, and (2) reducing the noisy label ratio and fine-tuning the entire model based on the noise-reduced dataset to adapt it to the target dataset. The proposed algorithm has been extensively tested and demonstrates efficient yet improved denoising performance on various benchmarks compared to previous methods.
    摘要 训练数据中存在噪声标签会严重影响机器学习模型的性能。为了解决这一问题，研究者探索了带噪标签学习（Learning with Noisy Labels）方法，以识别干净样本并削弱噪声标签的影响。然而，限制训练集中某一部分数据的影响可能会降低整体的泛化性能。为缓解这一点，近期的研究考虑借助大量计算资源来谨慎地利用噪声标签，由此带来的训练成本上升也使我们必须重新审视效率。在其他研究方向上，人们致力于为大型预训练模型开发兼顾泛化性能与效率的微调技术，但这些方法主要针对干净数据集，对噪声标签场景的探索十分有限。本研究的目标是找到在噪声标签数据集上微调预训练模型的合适方式。为此，我们分析了预训练模型在遇到噪声数据集时的特性，并在实证分析的基础上提出了一种名为 TURN 的新算法，它能够稳健且高效地迁移预训练模型的先验知识。该算法包含两个主要步骤：（1）独立地调整线性分类器，以保护特征提取器不被噪声标签破坏；（2）降低噪声标签比例，并在降噪后的数据集上对整个模型进行微调，使其适应目标数据集。所提算法经过大量测试，在多个基准上相比既有方法展现出高效且更优的去噪性能。
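The two-step procedure can be sketched compactly on synthetic data: first a linear probe on frozen features, then fine-tuning of the whole model on a noise-reduced subset. The small-loss filtering rule and all hyper-parameters below are assumptions made for illustration.

```python
# Sketch of a TURN-style two-step procedure on a toy "pretrained" network and noisy labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# Synthetic dataset: 2-class Gaussian blobs with 30% of the labels flipped.
n, d = 400, 20
signs = (torch.arange(n) % 2 == 0).float() * 2 - 1
x = torch.randn(n, d) + signs.unsqueeze(1)
y = (torch.arange(n) % 2).long()
noisy = torch.rand(n) < 0.3
y_noisy = torch.where(noisy, 1 - y, y)

feature_extractor = nn.Sequential(nn.Linear(d, 32), nn.ReLU())  # stands in for a pretrained backbone
classifier = nn.Linear(32, 2)

# Step 1: linear probing -- train only the classifier so noisy labels cannot distort the backbone.
opt = torch.optim.Adam(classifier.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = F.cross_entropy(classifier(feature_extractor(x).detach()), y_noisy)
    loss.backward()
    opt.step()

# Step 2: reduce the noisy-label ratio (keep the 70% of samples with the smallest loss),
# then fine-tune the whole model on the cleaner subset.
with torch.no_grad():
    per_sample = F.cross_entropy(classifier(feature_extractor(x)), y_noisy, reduction="none")
keep = per_sample.argsort()[: int(0.7 * n)]
opt = torch.optim.Adam(list(feature_extractor.parameters()) + list(classifier.parameters()), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = F.cross_entropy(classifier(feature_extractor(x[keep])), y_noisy[keep])
    loss.backward()
    opt.step()

acc = (classifier(feature_extractor(x)).argmax(1) == y).float().mean()
print(f"accuracy against the clean labels: {acc:.3f}")
```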

Brainchop: Next Generation Web-Based Neuroimaging Application

  • paper_url: http://arxiv.org/abs/2310.16162
  • repo_url: https://github.com/neuroneural/brainchop
  • paper_authors: Mohamed Masoud, Pratyush Reddy, Farfalla Hu, Sergey Plis
  • for: This paper is written for researchers and practitioners in the field of neuroimaging, particularly those interested in whole brain preprocessing and segmentation using deep learning models.
  • methods: The paper uses a pre-trained full-brain deep learning model to perform volumetric analysis of structural MRI data directly within the browser, without requiring technical expertise or intricate setup procedures. The MeshNet architecture is used to enable client-side processing for volumetric data.
  • results: The paper evaluates the performance of the Brainchop tool across various software and hardware configurations, demonstrating the practicality of client-side processing for volumetric data even within the resource-constrained environment of web browsers. The results show that Brainchop offers multiple benefits, including scalability, low latency, user-friendly operation, cross-platform compatibility, and enhanced accessibility.
    Abstract Performing volumetric image processing directly within the browser, particularly with medical data, presents unprecedented challenges compared to conventional backend tools. These challenges arise from limitations inherent in browser environments, such as constrained computational resources and the availability of frontend machine learning libraries. Consequently, there is a shortage of neuroimaging frontend tools capable of providing comprehensive end-to-end solutions for whole brain preprocessing and segmentation while preserving end-user data privacy and residency. In light of this context, we introduce Brainchop (http://www.brainchop.org) as a groundbreaking in-browser neuroimaging tool that enables volumetric analysis of structural MRI using pre-trained full-brain deep learning models, all without requiring technical expertise or intricate setup procedures. Beyond its commitment to data privacy, this frontend tool offers multiple features, including scalability, low latency, user-friendly operation, cross-platform compatibility, and enhanced accessibility. This paper outlines the processing pipeline of Brainchop and evaluates the performance of models across various software and hardware configurations. The results demonstrate the practicality of client-side processing for volumetric data, owing to the robust MeshNet architecture, even within the resource-constrained environment of web browsers.
    摘要 直接在浏览器中对体数据图像进行处理，尤其是医学数据，相比传统的后端工具面临前所未有的挑战。这些挑战源于浏览器环境自身的限制，例如有限的计算资源以及前端机器学习库的可用性。因此，目前缺乏能够在保护用户数据隐私与数据驻留的前提下，为全脑预处理和分割提供完整端到端解决方案的神经影像前端工具。在此背景下，我们介绍 Brainchop（http://www.brainchop.org），一款创新的浏览器内神经影像工具，它利用预训练的全脑深度学习模型对结构 MRI 进行体数据分析，而无需技术专长或复杂的安装配置。除了对数据隐私的承诺之外，这一前端工具还具有可扩展性、低延迟、易用、跨平台兼容以及更好的可访问性等多项特性。本文介绍了 Brainchop 的处理流水线，并评估了模型在多种软硬件配置下的性能。结果表明，得益于稳健的 MeshNet 架构，即便在浏览器这种资源受限的环境中，客户端处理体数据也是切实可行的。

Breaking the Curse of Dimensionality in Deep Neural Networks by Learning Invariant Representations

  • paper_url: http://arxiv.org/abs/2310.16154
  • repo_url: None
  • paper_authors: Leonardo Petrini
  • for: 这个论文旨在探讨深度学习模型的理论基础,具体来说是研究深度学习模型如何从数据中学习有用的特征,以及这些模型如何在高维数据中学习函数。
  • methods: 该论文采用实证方法，将实验研究与受物理启发的简化玩具模型相结合，以研究和解释深度学习系统中的复杂行为。
  • results: 论文指出，深度学习的有效性源于其能够利用数据的结构（即数据的不变性）学习相关表示，而不同的架构会利用不同的数据结构，从而克服维度灾难。
    Abstract Artificial intelligence, particularly the subfield of machine learning, has seen a paradigm shift towards data-driven models that learn from and adapt to data. This has resulted in unprecedented advancements in various domains such as natural language processing and computer vision, largely attributed to deep learning, a special class of machine learning models. Deep learning arguably surpasses traditional approaches by learning the relevant features from raw data through a series of computational layers. This thesis explores the theoretical foundations of deep learning by studying the relationship between the architecture of these models and the inherent structures found within the data they process. In particular, we ask What drives the efficacy of deep learning algorithms and allows them to beat the so-called curse of dimensionality-i.e. the difficulty of generally learning functions in high dimensions due to the exponentially increasing need for data points with increased dimensionality? Is it their ability to learn relevant representations of the data by exploiting their structure? How do different architectures exploit different data structures? In order to address these questions, we push forward the idea that the structure of the data can be effectively characterized by its invariances-i.e. aspects that are irrelevant for the task at hand. Our methodology takes an empirical approach to deep learning, combining experimental studies with physics-inspired toy models. These simplified models allow us to investigate and interpret the complex behaviors we observe in deep learning systems, offering insights into their inner workings, with the far-reaching goal of bridging the gap between theory and practice.
    摘要 人工智能，尤其是其子领域机器学习，经历了一场向数据驱动模型的范式转移：模型从数据中学习并适应数据。这带来了自然语言处理、计算机视觉等多个领域前所未有的进展，而这些进展主要归功于深度学习这一特殊的机器学习模型类别。深度学习之所以能够超越传统方法，是因为它通过一系列计算层直接从原始数据中学习相关特征。本论文通过研究这些模型的架构与其所处理数据的内在结构之间的关系，探讨深度学习的理论基础。具体而言，我们提出以下问题：是什么驱动了深度学习算法的有效性，使其能够克服所谓的维度灾难，即在高维空间中学习一般函数所需的数据量随维度呈指数增长的困难？是否因为它们能够利用数据的结构学习相关的表示？不同的架构又是如何利用不同的数据结构的？为回答这些问题，我们提出：数据的结构可以通过其不变性来有效刻画，即那些与当前任务无关的方面。我们的方法论采取实证路线，将实验研究与受物理启发的玩具模型相结合。这些简化模型使我们能够探究并解释在深度学习系统中观察到的复杂行为，洞察其内部机理，最终目标是弥合理论与实践之间的差距。

FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

  • paper_url: http://arxiv.org/abs/2310.16152
  • repo_url: None
  • paper_authors: Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Kang Gu, Najrin Sultana, Shagufta Mehnaz
  • for: 本研究旨在探讨联邦学习（FL）中的隐私泄露问题，具体而言是联邦语言模型中用户隐私数据的泄露。
  • methods: 本研究给出了两个新发现：其一，FL 中间轮次的模型快照可能比最终训练完成的模型造成更大的隐私泄露；其二，篡改模型中专门负责记忆敏感训练数据的选择性权重会加剧隐私泄露。
  • results: 研究发现，其最佳方法可将成员推断的召回率提高 29%，并实现最高 70% 的隐私数据重建，明显优于现有攻击方法。
    Abstract Federated learning (FL) is becoming a key component in many technology-based applications including language modeling -- where individual FL participants often have privacy-sensitive text data in their local datasets. However, realizing the extent of privacy leakage in federated language models is not straightforward and the existing attacks only intend to extract data regardless of how sensitive or naive it is. To fill this gap, in this paper, we introduce two novel findings with regard to leaking privacy-sensitive user data from federated language models. Firstly, we make a key observation that model snapshots from the intermediate rounds in FL can cause greater privacy leakage than the final trained model. Secondly, we identify that privacy leakage can be aggravated by tampering with a model's selective weights that are specifically responsible for memorizing the sensitive training data. We show how a malicious client can leak the privacy-sensitive data of some other user in FL even without any cooperation from the server. Our best-performing method improves the membership inference recall by 29% and achieves up to 70% private data reconstruction, evidently outperforming existing attacks with stronger assumptions of adversary capabilities.
    摘要 联邦学习（FL）已成为许多基于技术的应用中的关键组件，其中包括语言建模——在这些场景中，各个 FL 参与方的本地数据集往往包含涉及隐私的敏感文本数据。However, realizing the extent of privacy leakage in federated language models is not straightforward, and existing attacks only aim to extract data regardless of how sensitive or naive it is. To fill this gap, in this paper, we introduce two novel findings with regard to leaking privacy-sensitive user data from federated language models. Firstly, we make a key observation that model snapshots from the intermediate rounds in FL can cause greater privacy leakage than the final trained model. Secondly, we identify that privacy leakage can be aggravated by tampering with a model's selective weights that are specifically responsible for memorizing the sensitive training data. We show how a malicious client can leak the privacy-sensitive data of some other user in FL even without any cooperation from the server. Our best-performing method improves the membership inference recall by 29% and achieves up to 70% private data reconstruction, evidently outperforming existing attacks with stronger assumptions of adversary capabilities.

A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

  • paper_url: http://arxiv.org/abs/2310.19821
  • repo_url: None
  • paper_authors: Reda Alami, Mohammed Mahfoud, Mastane Achab
  • for: 该文章旨在提出一个适用于非平稳环境的风险感知多臂老虎机算法框架，以优化在医疗或金融等高波动环境中的学习问题。
  • methods: 该框架融合文献中常见的多种风险度量，将多个多臂老虎机算法族映射到风险敏感的设定中；此外，框架还为算法配备了重启贝叶斯在线变点检测算法（R-BOCPD）和一种可调的强制探索策略，用于检测每个臂上的局部变化。
  • results: 该框架具有有限时间的理论保证，其渐近遗憾界为 $\tilde O(\sqrt{K_T T})$（其中 $K_T$ 为变点总数）；在人工合成与真实环境中，它均优于当前最先进方法，并能有效兼顾风险敏感性与非平稳性。
    Abstract In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon $T$. While the choice of a strategy that accomplishes that is optimal with no additional information, it is no longer the case when provided additional environment-specific knowledge. In particular, in areas of high volatility like healthcare or finance, a naive reward maximization approach often does not accurately capture the complexity of the learning problem and results in unreliable solutions. To tackle problems of this nature, we propose a framework of adaptive risk-aware strategies that operate in non-stationary environments. Our framework incorporates various risk measures prevalent in the literature to map multiple families of multi-armed bandit algorithms into a risk-sensitive setting. In addition, we equip the resulting algorithms with the Restarted Bayesian Online Change-Point Detection (R-BOCPD) algorithm and impose a (tunable) forced exploration strategy to detect local (per-arm) switches. We provide finite-time theoretical guarantees and an asymptotic regret bound of order $\tilde O(\sqrt{K_T T})$ up to time horizon $T$ with $K_T$ the total number of change-points. In practice, our framework compares favorably to the state-of-the-art in both synthetic and real-world environments and manages to perform efficiently with respect to both risk-sensitivity and non-stationarity.
    摘要 在典型的随机多臂老虎机问题中，目标通常是在时间范围 $T$ 内最大化期望累计奖励。在没有额外信息时，实现这一目标的策略是最优的；但当提供了环境特定的额外知识后，情况就不再如此。特别是在医疗或金融等高波动领域，朴素的奖励最大化方法往往无法准确刻画学习问题的复杂性，从而导致不可靠的解。为了解决这类问题，我们提出了一个在非平稳环境中运行的自适应风险感知策略框架。该框架融合了文献中常见的多种风险度量，将多个多臂老虎机算法族映射到风险敏感的设定中。此外，我们为所得算法配备了重启贝叶斯在线变点检测（R-BOCPD）算法，并施加一种（可调的）强制探索策略，以检测每个臂上的局部切换。我们给出了有限时间的理论保证，以及到时间范围 $T$ 为止阶为 $\tilde O(\sqrt{K_T T})$ 的渐近遗憾界，其中 $K_T$ 为变点总数。在实践中，我们的框架在人工合成与真实环境中均优于当前最先进方法，并能在风险敏感性与非平稳性两方面高效运行。

Online Thermal Field Prediction for Metal Additive Manufacturing of Thin Walls

  • paper_url: http://arxiv.org/abs/2310.16125
  • repo_url: None
  • paper_authors: Yifan Tang, M. Rahmani Dehaghani, Pouyan Sajadi, Shahriar Bakrani Balani, Akshay Dhalpe, Suraj Panicker, Di Wu, Eric Coatanea, G. Gary Wang
  • For: This paper aims to study a practical issue in metal AM, specifically how to predict the thermal field of yet-to-print parts online when only a few sensors are available.* Methods: The paper proposes an online thermal field prediction method using mapping and reconstruction, which incorporates an artificial neural network and a reduced order model (ROM) to estimate the temperature profiles of points on the yet-to-print layer.* Results: The proposed method can construct the thermal field of a yet-to-print layer within 0.1 seconds on a low-cost desktop, and has acceptable generalization capability in most cases from lower layers to higher layers in the same simulation and from one simulation to a new simulation on different AM process parameters.
    Abstract This paper aims to study a practical issue in metal AM, i.e., how to predict the thermal field of yet-to-print parts online when only a few sensors are available. This work proposes an online thermal field prediction method using mapping and reconstruction, which could be integrated into a metal AM process for online performance control. Based on the similarity of temperature curves (curve segments of a temperature profile of one point), the thermal field mapping applies an artificial neural network to estimate the temperature curves of points on the yet-to-print layer from measured temperatures of certain points on the previously printed layer. With measured/predicted temperature profiles of several points on the same layer, the thermal field reconstruction proposes a reduced order model (ROM) to construct the temperature profiles of all points on the same layer, which could be used to build the temperature field of the entire layer. The training of ROM is performed with an extreme learning machine (ELM) for computational efficiency. Fifteen wire arc AM experiments and nine simulations are designed for thin walls with a fixed length and unidirectional printing of each layer. The test results indicate that the proposed prediction method could construct the thermal field of a yet-to-print layer within 0.1 seconds on a low-cost desktop. Meanwhile, the method has acceptable generalization capability in most cases from lower layers to higher layers in the same simulation and from one simulation to a new simulation on different AM process parameters. More importantly, after fine-tuning the proposed method with limited experimental data, the relative errors of all predicted temperature profiles on a new experiment are sufficiently small, demonstrating the applicability and generalization of the proposed thermal field prediction method in online applications for metal AM.
    摘要 The method uses artificial neural networks to estimate the temperature curves of points on the yet-to-print layer based on measured temperatures of certain points on the previously printed layer. With measured/predicted temperature profiles of several points on the same layer, a reduced order model (ROM) is constructed to build the temperature field of the entire layer. The ROM is trained with an extreme learning machine (ELM) for computational efficiency.Experiments and simulations were conducted on thin walls with a fixed length and unidirectional printing of each layer. The results show that the proposed prediction method can construct the thermal field of a yet-to-print layer within 0.1 seconds on a low-cost desktop, and has acceptable generalization capability in most cases from lower layers to higher layers in the same simulation and from one simulation to a new simulation on different AM process parameters. After fine-tuning the method with limited experimental data, the relative errors of all predicted temperature profiles on a new experiment were sufficiently small, demonstrating the applicability and generalization of the proposed thermal field prediction method in online applications for metal AM.

Anchor Space Optimal Transport: Accelerating Batch Processing of Multiple OT Problems

  • paper_url: http://arxiv.org/abs/2310.16123
  • repo_url: None
  • paper_authors: Jianming Huang, Xun Su, Zhongxi Fang, Hiroyuki Kasai
  • for: 对多个概率分布之间的最优传输（OT）问题进行批量求解。
  • methods: 使用共享anchor点空间来捕捉分布的共同特征,并提出三种方法来学习anchor空间,每种方法都有应用背景
  • results: 实验表明,提出的方法可以大幅降低计算时间,同时保持可接受的近似性表现
    Abstract The optimal transport (OT) theory provides an effective way to compare probability distributions on a defined metric space, but it suffers from cubic computational complexity. Although the Sinkhorn's algorithm greatly reduces the computational complexity of OT solutions, the solutions of multiple OT problems are still time-consuming and memory-comsuming in practice. However, many works on the computational acceleration of OT are usually based on the premise of a single OT problem, ignoring the potential common characteristics of the distributions in a mini-batch. Therefore, we propose a translated OT problem designated as the anchor space optimal transport (ASOT) problem, which is specially designed for batch processing of multiple OT problem solutions. For the proposed ASOT problem, the distributions will be mapped into a shared anchor point space, which learns the potential common characteristics and thus help accelerate OT batch processing. Based on the proposed ASOT, the Wasserstein distance error to the original OT problem is proven to be bounded by ground cost errors. Building upon this, we propose three methods to learn an anchor space minimizing the distance error, each of which has its application background. Numerical experiments on real-world datasets show that our proposed methods can greatly reduce computational time while maintaining reasonable approximation performance.
    摘要 最优传输（OT）理论为在给定度量空间上比较概率分布提供了一种有效手段，但其计算复杂度为三次方。尽管 Sinkhorn 算法大幅降低了求解 OT 的计算复杂度，但在实践中求解多个 OT 问题仍然耗时且占用大量内存。然而，许多关于 OT 计算加速的工作通常以单个 OT 问题为前提，忽略了小批量中各分布可能具有的共同特征。为此，我们提出了一个称为锚点空间最优传输（anchor space optimal transport, ASOT）的转化问题，专门用于批量求解多个 OT 问题。在 ASOT 问题中，各分布被映射到一个共享的锚点空间，该空间学习分布间潜在的共同特征，从而加速 OT 的批处理。基于所提出的 ASOT，我们证明了其相对于原 OT 问题的 Wasserstein 距离误差可以由地面代价（ground cost）误差加以约束。在此基础上，我们提出了三种学习锚点空间以最小化该距离误差的方法，每种方法都有其相应的应用背景。在真实数据集上的数值实验表明，所提方法能够大幅降低计算时间，同时保持合理的近似性能。
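A rough sketch of the anchor-space idea, as read from the abstract: project each point cloud onto a small shared set of anchors, turn it into a histogram over anchors, and solve the now-tiny OT problems, here with entropic regularisation. K-means anchors and the Sinkhorn solver are illustrative assumptions; the paper's three anchor-learning methods are not reproduced.

```python
# Sketch: batch OT between distributions reduced to histograms over shared anchor points.
import numpy as np

def sinkhorn(a, b, cost, reg=0.05, n_iter=200):
    """Entropic-regularised OT cost between histograms a and b."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float((plan * cost).sum())

rng = np.random.default_rng(0)
clouds = [rng.normal(loc=mu, scale=0.3, size=(300, 2)) for mu in ([0, 0], [1, 0], [0, 1])]

# Shared anchors from the pooled points (a simple k-means stand-in: random init + Lloyd steps).
pooled = np.concatenate(clouds)
anchors = pooled[rng.choice(len(pooled), 16, replace=False)]
for _ in range(10):
    assign = np.argmin(((pooled[:, None, :] - anchors[None]) ** 2).sum(-1), axis=1)
    for k in range(len(anchors)):
        if np.any(assign == k):
            anchors[k] = pooled[assign == k].mean(0)

cost = ((anchors[:, None, :] - anchors[None]) ** 2).sum(-1)   # squared-Euclidean ground cost

def to_anchor_hist(points):
    idx = np.argmin(((points[:, None, :] - anchors[None]) ** 2).sum(-1), axis=1)
    h = np.bincount(idx, minlength=len(anchors)).astype(float)
    return h / h.sum()

hists = [to_anchor_hist(c) for c in clouds]
# A batch of OT problems, all solved in the same small anchor space.
for i in range(len(hists)):
    for j in range(i + 1, len(hists)):
        print(i, j, round(sinkhorn(hists[i], hists[j], cost), 4))
```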

19 Parameters Is All You Need: Tiny Neural Networks for Particle Physics

  • paper_url: http://arxiv.org/abs/2310.16121
  • repo_url: https://github.com/abogatskiy/pelican-nano
  • paper_authors: Alexander Bogatskiy, Timothy Hoffman, Jan T. Offermann
  • for: 探索适用于触发（triggering）等低延迟任务的轻量、快速神经网络架构。
  • methods: 研究了近期提出的洛伦兹对称且置换对称的架构 PELICAN，并给出了仅含 19 个可训练参数的实例。
  • results: 在顶夸克喷注标记的二分类任务上，这些微型 PELICAN 模型的表现优于参数量达数万的通用架构。
    Abstract As particle accelerators increase their collision rates, and deep learning solutions prove their viability, there is a growing need for lightweight and fast neural network architectures for low-latency tasks such as triggering. We examine the potential of one recent Lorentz- and permutation-symmetric architecture, PELICAN, and present its instances with as few as 19 trainable parameters that outperform generic architectures with tens of thousands of parameters when compared on the binary classification task of top quark jet tagging.
    摘要 随着粒子加速器对撞率的不断提高以及深度学习方案可行性的确立，人们对适用于触发等低延迟任务的轻量级、快速神经网络架构的需求日益增长。我们研究了近期提出的洛伦兹对称且置换对称的架构 PELICAN，并给出了仅含 19 个可训练参数的实例；在顶夸克喷注标记的二分类任务上，它们的表现优于拥有数万参数的通用架构。

Compressed representation of brain genetic transcription

  • paper_url: http://arxiv.org/abs/2310.16113
  • repo_url: None
  • paper_authors: James K Ruffle, Henry Watkins, Robert J Gray, Harpreet Hyare, Michel Thiebaut de Schotten, Parashkev Nachev
  • for: 本研究旨在提供一种高效的方法来压缩大规模的脑组织表达数据,以便更好地探索脑组织结构和功能。
  • methods: 本研究使用了多种常用的线性和非线性方法,包括PCA、kernel PCA、NMF、t-SNE、UMAP以及深度自编码,以实现数据压缩。
  • results: 研究结果表明,使用深度自编码可以获得最高的重建精度、结构准确性和预测utilty,因此支持使用深度自编码来表示脑组织表达数据。
    Abstract The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. Established practice is to use standard principal component analysis (PCA), whose computational felicity is offset by limited expressivity, especially at great compression ratios. Employing whole-brain, voxel-wise Allen Brain Atlas transcription data, here we systematically compare compressed representations based on the most widely supported linear and non-linear methods-PCA, kernel PCA, non-negative matrix factorization (NMF), t-stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP), and deep auto-encoding-quantifying reconstruction fidelity, anatomical coherence, and predictive utility with respect to signalling, microstructural, and metabolic targets. We show that deep auto-encoders yield superior representations across all metrics of performance and target domains, supporting their use as the reference standard for representing transcription patterns in the human brain.
    摘要 人脑的建筑物理太复杂,无法直观探索,需要使用压缩表示方法将其变化 проек到一个紧凑可 navigate 空间中。特别是在高维数据,如基因表达,其结合型复杂性和谱系强度需要最大压缩。现有的做法是使用标准的主成分分析(PCA),它的计算方便性受到限制,特别是在大压缩比例时。使用整个脑、每个 voxel 的 Allan 脑 Atlases 的整个转录数据,我们系统地比较了压缩表示方法,包括 PCA、kernel PCA、非正式矩阵分解(NMF)、t-Stochastic neighbor embedding(t-SNE)、Uniform manifold approximation and projection(UMAP)以及深度自编码。我们发现深度自编码可以在所有性能指标和目标领域中获得最佳表示,支持它作为人脑转录模式的参照标准。

Decentralized Learning over Wireless Networks with Broadcast-Based Subgraph Sampling

  • paper_url: http://arxiv.org/abs/2310.16106
  • repo_url: None
  • paper_authors: Daniel Pérez Herrera, Zheng Chen, Erik G. Larsson
  • for: 本文关注无线网络上分布式学习的通信层面，采用基于一致性的分布式随机梯度下降（D-SGD）。考虑到迭代过程中网络内信息交换带来的实际通信成本或延迟，我们的目标是加快算法的收敛速度，以每个传输时隙带来的改进来衡量。
  • methods: 我们提出了 BASS，一种针对无线网络上 D-SGD 的高效通信框架，采用广播传输与概率性子图采样。每次迭代中，激活若干互不干扰的节点子集，向各自邻居广播模型更新；这些子集随时间随机激活，其激活概率反映了节点在网络连通性中的重要性，并受通信成本约束（例如每次迭代的平均传输时隙数）。在一致性更新步骤中，只保留双向链路，以保证通信的对称性。
  • results: 与现有基于链路的调度方法相比，无线信道固有的广播特性能够在相同传输时隙数下建立更多的通信链路，从而加快分布式学习的收敛。
    Abstract This work centers on the communication aspects of decentralized learning over wireless networks, using consensus-based decentralized stochastic gradient descent (D-SGD). Considering the actual communication cost or delay caused by in-network information exchange in an iterative process, our goal is to achieve fast convergence of the algorithm measured by improvement per transmission slot. We propose BASS, an efficient communication framework for D-SGD over wireless networks with broadcast transmission and probabilistic subgraph sampling. In each iteration, we activate multiple subsets of non-interfering nodes to broadcast model updates to their neighbors. These subsets are randomly activated over time, with probabilities reflecting their importance in network connectivity and subject to a communication cost constraint (e.g., the average number of transmission slots per iteration). During the consensus update step, only bi-directional links are effectively preserved to maintain communication symmetry. In comparison to existing link-based scheduling methods, the inherent broadcasting nature of wireless channels offers intrinsic advantages in speeding up convergence of decentralized learning by creating more communicated links with the same number of transmission slots.
    摘要 To achieve this, we propose BASS, an efficient communication framework for D-SGD over wireless networks with broadcast transmission and probabilistic subgraph sampling. In each iteration, we activate multiple subsets of non-interfering nodes to broadcast model updates to their neighbors. These subsets are randomly activated over time, with probabilities reflecting their importance in network connectivity and subject to a communication cost constraint (e.g., the average number of transmission slots per iteration). During the consensus update step, only bi-directional links are effectively preserved to maintain communication symmetry.Compared to existing link-based scheduling methods, the inherent broadcasting nature of wireless channels offers intrinsic advantages in speeding up convergence of decentralized learning by creating more communicated links with the same number of transmission slots.

Locally Differentially Private Gradient Tracking for Distributed Online Learning over Directed Graphs

  • paper_url: http://arxiv.org/abs/2310.16105
  • repo_url: None
  • paper_authors: Ziqin Chen, Yongqiang Wang
  • for: solves the tradeoff between learning accuracy and privacy in distributed online learning over directed graphs.
  • methods: proposes a locally differentially private gradient tracking based distributed online learning algorithm that ensures rigorous local differential privacy and converges to the exact optimal solution.
  • results: outperforms existing counterparts in both training and testing accuracies, with guaranteed finite cumulative privacy budget even when the number of iterations tends to infinity.
    Abstract Distributed online learning has been proven extremely effective in solving large-scale machine learning problems over streaming data. However, information sharing between learners in distributed learning also raises concerns about the potential leakage of individual learners' sensitive data. To mitigate this risk, differential privacy, which is widely regarded as the "gold standard" for privacy protection, has been widely employed in many existing results on distributed online learning. However, these results often face a fundamental tradeoff between learning accuracy and privacy. In this paper, we propose a locally differentially private gradient tracking based distributed online learning algorithm that successfully circumvents this tradeoff. We prove that the proposed algorithm converges in mean square to the exact optimal solution while ensuring rigorous local differential privacy, with the cumulative privacy budget guaranteed to be finite even when the number of iterations tends to infinity. The algorithm is applicable even when the communication graph among learners is directed. To the best of our knowledge, this is the first result that simultaneously ensures learning accuracy and rigorous local differential privacy in distributed online learning over directed graphs. We evaluate our algorithm's performance by using multiple benchmark machine-learning applications, including logistic regression of the "Mushrooms" dataset and CNN-based image classification of the "MNIST" and "CIFAR-10" datasets, respectively. The experimental results confirm that the proposed algorithm outperforms existing counterparts in both training and testing accuracies.
    摘要 分布式在线学习已被证明在基于流式数据的大规模机器学习问题上极为有效。然而，学习者之间的信息共享也引发了个体学习者敏感数据可能泄露的担忧。为降低这一风险，被广泛视为隐私保护“金标准”的差分隐私已被大量用于现有的分布式在线学习研究中，但这些结果往往面临学习精度与隐私之间的基本权衡。本文提出一种基于本地差分隐私的梯度跟踪分布式在线学习算法，成功绕过了这一权衡。我们证明该算法在均方意义下收敛到精确的最优解，同时保证严格的本地差分隐私，并且即使迭代次数趋于无穷，累计隐私预算也保持有限。该算法在学习者之间的通信图为有向图时同样适用。据我们所知，这是首个在有向图上的分布式在线学习中同时保证学习精度与严格本地差分隐私的结果。我们使用多个基准机器学习应用评估了算法性能，包括 “Mushrooms” 数据集上的逻辑回归，以及 “MNIST” 和 “CIFAR-10” 数据集上基于 CNN 的图像分类。实验结果证实，所提算法在训练和测试精度上均优于现有同类方法。
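To make the privacy ingredient concrete, here is a minimal sketch of locally privatising a message before it is shared with neighbours: clip, then add Laplace noise calibrated to the clipping bound. The paper's actual gradient-tracking recursion and its noise schedule are not reproduced, and the Laplace mechanism here is an assumption for illustration.

```python
# Sketch: a learner perturbs its outgoing message so the shared quantity satisfies
# epsilon-local differential privacy for that round.
import numpy as np

rng = np.random.default_rng(0)

def privatize(message, clip=1.0, epsilon=0.5):
    """Clip the message to L1 norm <= clip, then add Laplace noise with scale clip/epsilon."""
    norm = np.abs(message).sum()
    if norm > clip:
        message = message * (clip / norm)
    return message + rng.laplace(scale=clip / epsilon, size=message.shape)

local_gradient = np.array([0.8, -1.2, 0.3])
shared = privatize(local_gradient)
print("sent to neighbours:", shared.round(3))
```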

Contextual Bandits for Evaluating and Improving Inventory Control Policies

  • paper_url: http://arxiv.org/abs/2310.16096
  • repo_url: None
  • paper_authors: Dean Foster, Randy Jia, Dhruv Madeka
  • for: addresses the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times.
  • methods: uses a light-weight contextual bandit-based algorithm to evaluate and occasionally tweak policies.
  • results: achieves favorable guarantees both theoretically and in empirical studies.
    Abstract Solutions to address the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times typically involve making strong assumptions on the dynamics for either approximation or simulation, and applying methods such as optimization, dynamic programming, or reinforcement learning. Therefore, it is important to analyze and evaluate any inventory control policy, in particular to see if there is room for improvement. We introduce the concept of an equilibrium policy, a desirable property of a policy that intuitively means that, in hindsight, changing only a small fraction of actions does not result in materially more reward. We provide a light-weight contextual bandit-based algorithm to evaluate and occasionally tweak policies, and show that this method achieves favorable guarantees, both theoretically and in empirical studies.
    摘要 针对具有非平稳随机需求、缺货损失和随机供应商交货周期的周期性盘点库存控制问题，其求解方案通常需要对系统动态做出较强的假设以便近似或仿真，并应用优化、动态规划或强化学习等方法。因此，分析和评估任何库存控制策略都十分重要，尤其是判断其是否仍有改进空间。我们引入了均衡策略的概念，这是策略的一种理想性质，直观含义是：事后来看，仅改变一小部分动作并不会带来实质上更多的收益。我们提出了一种轻量级的基于上下文老虎机的算法，用于评估并偶尔微调策略，并证明该方法在理论与实证研究中都具有良好的保证。

A Unified, Scalable Framework for Neural Population Decoding

  • paper_url: http://arxiv.org/abs/2310.16046
  • repo_url: None
  • paper_authors: Mehdi Azabou, Vinam Arora, Venkataramana Ganesh, Ximeng Mao, Santosh Nachimuthu, Michael J. Mendelson, Blake Richards, Matthew G. Perich, Guillaume Lajoie, Eva L. Dyer
  • for: 这个论文的目的是提出一种基于深度学习的方法,用于分析神经活动的大规模数据。
  • methods: 这个方法使用了跨注意力和PerceiverIO底层,将神经活动的population dynamics tokenized成一个有效的表示,并使用这个表示来预训练模型。
  • results: 作者在七只非human Primates的158个不同会话中,使用了大规模的数据集和100个小时的录制,预训练了一个大规模多会话模型。在不同的任务中, authorsshowed that their pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, achieving few-shot performance with minimal labels.
    Abstract Our ability to use deep learning approaches to decipher neural activity would likely benefit from greater scale, in terms of both model size and datasets. However, the integration of many neural recordings into one unified model is challenging, as each recording contains the activity of different neurons from different individual animals. In this paper, we introduce a training framework and architecture designed to model the population dynamics of neural activity across diverse, large-scale neural recordings. Our method first tokenizes individual spikes within the dataset to build an efficient representation of neural events that captures the fine temporal structure of neural activity. We then employ cross-attention and a PerceiverIO backbone to further construct a latent tokenization of neural population activities. Utilizing this architecture and training framework, we construct a large-scale multi-session model trained on large datasets from seven nonhuman primates, spanning over 158 different sessions of recording from over 27,373 neural units and over 100 hours of recordings. In a number of different tasks, we demonstrate that our pretrained model can be rapidly adapted to new, unseen sessions with unspecified neuron correspondence, enabling few-shot performance with minimal labels. This work presents a powerful new approach for building deep learning tools to analyze neural data and stakes out a clear path to training at scale.
    摘要 利用深度学习方法解读神经活动，很可能会受益于更大的规模，无论是模型规模还是数据集规模。然而，将大量神经记录整合进一个统一模型颇具挑战，因为每段记录包含来自不同个体动物的不同神经元的活动。本文提出了一个训练框架与架构，用于对来自多样化、大规模神经记录的神经群体动态进行建模。该方法首先对数据集中的单个脉冲进行分词，构建能够捕捉神经活动精细时间结构的高效神经事件表示；随后利用交叉注意力和 PerceiverIO 骨干网络，进一步构建神经群体活动的潜在分词表示。基于这一架构与训练框架，我们在来自七只非人灵长类动物的大规模数据集上训练了一个大规模多会话模型，数据跨越 158 个记录会话，涵盖 27,373 余个神经单元和超过 100 小时的记录。在多项任务中，我们展示了该预训练模型能够快速适配到神经元对应关系未知的全新会话，仅需极少标签即可实现少样本性能。这项工作为构建分析神经数据的深度学习工具提供了一种强有力的新途径，并为大规模训练指明了清晰的路径。

TimewarpVAE: Simultaneous Time-Warping and Representation Learning of Trajectories

  • paper_url: http://arxiv.org/abs/2310.16027
  • repo_url: https://github.com/travers-rhodes/timewarpvae
  • paper_authors: Travers Rhodes, Daniel D. Lee
  • for: 本研究旨在学习复杂任务中人类示例路径的有效表示。
  • methods: 该研究提出了一种完全可微的流形学习算法 TimewarpVAE，它借助动态时间规整（DTW）同时学习轨迹的时间变化与空间变化的潜在因素。
  • results: 结果表明，TimewarpVAE 能够在小型手写与叉子操作数据集上学习恰当的时间对齐和有意义的空间变化表示，其空间重建测试误差低于基线方法，并且学到的低维表示可用于高效生成语义上有意义的新轨迹。
    Abstract Human demonstrations of trajectories are an important source of training data for many machine learning problems. However, the difficulty of collecting human demonstration data for complex tasks makes learning efficient representations of those trajectories challenging. For many problems, such as for handwriting or for quasistatic dexterous manipulation, the exact timings of the trajectories should be factored from their spatial path characteristics. In this work, we propose TimewarpVAE, a fully differentiable manifold-learning algorithm that incorporates Dynamic Time Warping (DTW) to simultaneously learn both timing variations and latent factors of spatial variation. We show how the TimewarpVAE algorithm learns appropriate time alignments and meaningful representations of spatial variations in small handwriting and fork manipulation datasets. Our results have lower spatial reconstruction test error than baseline approaches and the learned low-dimensional representations can be used to efficiently generate semantically meaningful novel trajectories.
    摘要 人类示例路径是许多机器学习问题的重要训练数据源。然而,收集复杂任务的人类示例数据的困难性使得学习这些路径的有效表示困难。例如,手写或 quasi-静止的手部操作中,路径的具体时间 shouldn't 被从其空间轨迹特征中分离。在这种情况下,我们提出了 TimewarpVAE,一种完全可导的推广学习算法,其中包含动态时间扭曲(DTW),以同时学习时间变化和空间变化的秘密因素。我们展示了 TimewarpVAE 算法如何学习合适的时间对齐和有意义的空间变化表示。我们的结果在小手写和铲 manipulate 数据集上具有较低的空间重建测试错误,并且学习的低维度表示可以高效地生成具有 semantic 意义的新路径。

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions

  • paper_url: http://arxiv.org/abs/2310.16076
  • repo_url: https://github.com/idsia/fwp-formal-lang
  • paper_authors: Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
  • for: 近期关于循环神经网络（RNN）计算能力的研究，在实时与有限精度假设下揭示了 RNN 架构的层次结构；本文在此背景下考察带线性化注意力的自回归 Transformer。
  • methods: 研究对象为线性 Transformer（LT），又称快速权重编程器（FWP）。这类模型的特殊之处在于，它们等价于具有固定大小状态的类 RNN 序列处理器，同时也可以写成当前流行的自注意力网络形式。
  • results: 实验表明，许多针对标准 Transformer 的已知结论可直接迁移到 LT/FWP；形式语言识别实验还显示，循环 FWP 和自指涉权重矩阵等近期提出的 FWP 扩展能够克服 LT 的某些局限，例如在奇偶校验（parity）问题上实现泛化。代码已公开。
    Abstract Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions. Here we study auto-regressive Transformers with linearised attention, a.k.a. linear Transformers (LTs) or Fast Weight Programmers (FWPs). LTs are special in the sense that they are equivalent to RNN-like sequence processors with a fixed-size state, while they can also be expressed as the now-popular self-attention networks. We show that many well-known results for the standard Transformer directly transfer to LTs/FWPs. Our formal language recognition experiments demonstrate how recently proposed FWP extensions such as recurrent FWPs and self-referential weight matrices successfully overcome certain limitations of the LT, e.g., allowing for generalisation on the parity problem. Our code is public.
    摘要 近期关于循环神经网络（RNN）计算能力的研究表明，在实时与有限精度假设下，RNN 架构存在一个层次结构。本文研究带线性化注意力的自回归 Transformer，即线性 Transformer（LT）或快速权重编程器（FWP）。LT 的特殊之处在于：它们等价于具有固定大小状态的类 RNN 序列处理器，同时也可以表示为当前流行的自注意力网络。我们证明，许多针对标准 Transformer 的已知结论可直接迁移到 LT/FWP。形式语言识别实验表明，循环 FWP 和自指涉权重矩阵等近期提出的 FWP 扩展成功克服了 LT 的某些局限，例如能够在奇偶校验问题上实现泛化。代码已公开。
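The equivalence between linear attention and a fixed-size fast-weight recurrence is easy to verify numerically. The sketch below computes a causal linear-attention layer both as a masked attention matrix and as an outer-product memory update; the elu+1 feature map is one common choice, and the paper's specific FWP extensions (recurrent FWPs, self-referential weight matrices) are not reproduced.

```python
# Sketch: causal linear attention computed two ways -- attention matrix vs. fast-weight RNN.
import numpy as np

def phi(x):
    return np.where(x > 0, x + 1.0, np.exp(x))   # elu(x) + 1, an always-positive feature map

rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = rng.normal(size=(3, T, d))

# View 1: explicit causal attention with the linear kernel phi(q)·phi(k).
A = phi(Q) @ phi(K).T                       # (T, T) unnormalised scores
A = np.tril(A)                              # causal mask
Y_attn = (A / A.sum(axis=1, keepdims=True)) @ V

# View 2: the same computation as a fast-weight program with a fixed-size state.
S = np.zeros((d, d))                        # "fast weights": outer-product memory
z = np.zeros(d)                             # running normaliser
Y_rnn = np.zeros((T, d))
for t in range(T):
    k_t, v_t, q_t = phi(K[t]), V[t], phi(Q[t])
    S += np.outer(v_t, k_t)                 # write: store v_t under key k_t
    z += k_t
    Y_rnn[t] = (S @ q_t) / (z @ q_t)        # read: query the fixed-size state

print(np.allclose(Y_attn, Y_rnn))           # True: both views agree
```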

MLFMF: Data Sets for Machine Learning for Mathematical Formalization

  • paper_url: http://arxiv.org/abs/2310.16005
  • repo_url: https://github.com/ul-fmf/mlfmf-data
  • paper_authors: Andrej Bauer, Matej Petković, Ljupčo Todorovski
  • for: 构建一组数据集，用于对支撑证明助手中数学形式化工作的推荐系统进行基准评测。
  • methods: 数据来自 Agda 与 Lean 两种证明助手中的形式化数学库，每个库以两种方式表示：一种是异构网络，另一种是刻画库中所有条目语法树的 s-表达式列表。
  • results: 报告了基于标准图嵌入、词嵌入、树集成和基于实例学习算法的基线结果；MLFMF 数据集为形式化数学的各类机器学习方法提供了坚实的基准支持。
    Abstract We introduce MLFMF, a collection of data sets for benchmarking recommendation systems used to support formalization of mathematics with proof assistants. These systems help humans identify which previous entries (theorems, constructions, datatypes, and postulates) are relevant in proving a new theorem or carrying out a new construction. Each data set is derived from a library of formalized mathematics written in proof assistants Agda or Lean. The collection includes the largest Lean~4 library Mathlib, and some of the largest Agda libraries: the standard library, the library of univalent mathematics Agda-unimath, and the TypeTopology library. Each data set represents the corresponding library in two ways: as a heterogeneous network, and as a list of s-expressions representing the syntax trees of all the entries in the library. The network contains the (modular) structure of the library and the references between entries, while the s-expressions give complete and easily parsed information about every entry. We report baseline results using standard graph and word embeddings, tree ensembles, and instance-based learning algorithms. The MLFMF data sets provide solid benchmarking support for further investigation of the numerous machine learning approaches to formalized mathematics. The methodology used to extract the networks and the s-expressions readily applies to other libraries, and is applicable to other proof assistants. With more than $250\,000$ entries in total, this is currently the largest collection of formalized mathematical knowledge in machine learnable format.
    摘要 我们介绍MLFMF,这是一组用于评测支持证明助手形式化数学的推荐系统的数据集。这类系统帮助人们在证明新定理或完成新构造时,识别哪些已有条目(定理、构造、数据类型与公设)与之相关。每个数据集都来自以Agda或Lean编写的形式化数学库,其中包括最大的Lean 4库Mathlib,以及若干最大的Agda库:标准库、单价数学库Agda-unimath和TypeTopology库。每个数据集以两种方式表示对应的库:一是异构网络,二是表示库中所有条目语法树的s-表达式列表。网络刻画库的(模块化)结构与条目间的引用关系,而s-表达式给出每个条目完整且易于解析的信息。我们报告了使用标准图嵌入、词嵌入、树集成和基于实例的学习算法得到的基线结果。MLFMF数据集为形式化数学的各类机器学习方法提供了坚实的评测支持。用于抽取网络和s-表达式的方法可直接应用于其他库与其他证明助手。数据集总计超过25万个条目,是目前以机器可学习格式提供的最大形式化数学知识集合。

White-box Compiler Fuzzing Empowered by Large Language Models

  • paper_url: http://arxiv.org/abs/2310.15991
  • repo_url: None
  • paper_authors: Chenyuan Yang, Yinlin Deng, Runyu Lu, Jiayi Yao, Jiawei Liu, Reyhaneh Jabbarvand, Lingming Zhang
  • for: compiler testing, specifically white-box fuzzing of compiler optimizations
  • methods: uses Large Language Models (LLMs) with source-code information to generate high-quality tests for exercising deep optimizations
  • results: found 96 bugs, with 80 confirmed as previously unknown and 51 already fixed, demonstrating the effectiveness of WhiteFox in discovering previously undiscovered compiler bugs.
    Abstract Compiler correctness is crucial, as miscompilation falsifying the program behaviors can lead to serious consequences. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: Existing arts focus on black- and grey-box fuzzing, which generates tests without sufficient understanding of internal compiler behaviors. As such, they often fail to construct programs to exercise conditions of intricate optimizations. Meanwhile, traditional white-box techniques are computationally inapplicable to the giant codebase of compilers. Recent advances demonstrate that Large Language Models (LLMs) excel in code generation/understanding tasks and have achieved state-of-the-art performance in black-box fuzzing. Nonetheless, prompting LLMs with compiler source-code information remains a missing piece of research in compiler testing. To this end, we propose WhiteFox, the first white-box compiler fuzzer using LLMs with source-code information to test compiler optimization. WhiteFox adopts a dual-model framework: (i) an analysis LLM examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) a generation LLM produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are used as feedback to further enhance the test generation on the fly. Our evaluation on four popular compilers shows that WhiteFox can generate high-quality tests to exercise deep optimizations requiring intricate conditions, practicing up to 80 more optimizations than state-of-the-art fuzzers. To date, WhiteFox has found in total 96 bugs, with 80 confirmed as previously unknown and 51 already fixed. Beyond compiler testing, WhiteFox can also be adapted for white-box fuzzing of other complex, real-world software systems in general.
    摘要 编译器的正确性至关重要,因为错误编译会歪曲程序行为,可能造成严重后果。文献中已有大量利用模糊测试(fuzzing)发现编译器缺陷的研究,但编译器模糊测试仍然充满挑战:现有工作集中于黑盒与灰盒模糊测试,在生成测试程序时缺乏对编译器内部行为的充分理解,因而往往难以构造能触发复杂优化条件的程序;而传统的白盒技术在编译器这样庞大的代码库上计算上不可行。近期研究表明,大语言模型(LLM)在代码生成与理解任务上表现出色,并在黑盒模糊测试中达到了最先进水平;然而,利用编译器源代码信息来提示LLM仍是编译器测试中的空白。为此,我们提出WhiteFox,这是首个利用LLM与源代码信息来测试编译器优化的白盒编译器模糊测试工具。WhiteFox采用双模型框架:(i)分析LLM审阅底层优化源代码,归纳出能触发该优化的高层测试程序需求;(ii)生成LLM依据归纳出的需求生成测试程序。此外,触发了优化的测试会作为反馈,用于在线改进测试生成。我们在四个流行编译器上的评估表明,WhiteFox能够生成高质量测试以触发需要复杂条件的深层优化,比最先进的模糊测试工具多覆盖至多80个优化。迄今为止,WhiteFox共发现96个缺陷,其中80个被确认为此前未知,51个已被修复。除编译器测试外,WhiteFox还可推广用于其他复杂的真实软件系统的白盒模糊测试。
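
A schematic sketch of the dual-model loop described in the abstract is given below. The helper names `call_llm` and `compile_and_check`, the prompt wording, and the feedback scheme are hypothetical placeholders standing in for WhiteFox's actual prompts and harness.

```python
# Schematic sketch of a dual-LLM white-box fuzzing loop in the spirit of WhiteFox.
# `call_llm`, `compile_and_check`, and the prompts are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    """Stand-in for querying an LLM (via whatever API is available); returns generated text."""
    raise NotImplementedError

def compile_and_check(test_program: str, optimization: str) -> dict:
    """Stand-in for running the compiler and recording whether the
    optimization fired and whether the result mismatched a reference run."""
    raise NotImplementedError

def whitefox_style_loop(optimization_source: str, optimization: str, rounds: int = 5):
    # (i) analysis LLM: summarise source-level trigger requirements in natural language
    requirements = call_llm(
        "Read this compiler optimization source code and describe what a high-level "
        "input program must look like to trigger it:\n" + optimization_source)

    triggering_examples, bugs = [], []
    for _ in range(rounds):
        # (ii) generation LLM: produce a candidate test program from the requirements,
        # plus examples that triggered the optimization in earlier rounds as feedback
        prompt = ("Write a small test program satisfying these requirements:\n"
                  + requirements
                  + "\nExamples that already triggered the optimization:\n"
                  + "\n---\n".join(triggering_examples[-3:]))
        test_program = call_llm(prompt)

        result = compile_and_check(test_program, optimization)
        if result.get("optimization_triggered"):
            triggering_examples.append(test_program)   # feedback for later rounds
        if result.get("miscompiled") or result.get("crashed"):
            bugs.append(test_program)
    return bugs
```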

Data-driven Traffic Simulation: A Comprehensive Review

  • paper_url: http://arxiv.org/abs/2310.15975
  • repo_url: None
  • paper_authors: Di Chen, Meixin Zhu, Hao Yang, Xuesong Wang, Yinhai Wang
  • For: This paper reviews current research efforts and provides a futuristic perspective on data-driven microscopic traffic simulation for autonomous vehicles.
  • Methods: The paper discusses various methods, including imitation learning, reinforcement learning, generative learning, and deep learning, and evaluates their advantages and disadvantages.
  • Results: The paper provides a comprehensive evaluation of the state of the art, existing challenges, and future research directions in data-driven microscopic traffic simulation for autonomous vehicles.
    Abstract Autonomous vehicles (AVs) have the potential to significantly revolutionize society by providing a secure and efficient mode of transportation. Recent years have witnessed notable advance-ments in autonomous driving perception and prediction, but the challenge of validating the performance of AVs remains largely unresolved. Data-driven microscopic traffic simulation has be-come an important tool for autonomous driving testing due to 1) availability of high-fidelity traffic data; 2) its advantages of ena-bling large-scale testing and scenario reproducibility; and 3) its potential in reactive and realistic traffic simulation. However, a comprehensive review of this topic is currently lacking. This pa-per aims to fill this gap by summarizing relevant studies. The primary objective of this paper is to review current research ef-forts and provide a futuristic perspective that will benefit future developments in the field. It introduces the general issues of data-driven traffic simulation and outlines key concepts and terms. After overviewing traffic simulation, various datasets and evalua-tion metrics commonly used are reviewed. The paper then offers a comprehensive evaluation of imitation learning, reinforcement learning, generative and deep learning methods, summarizing each and analyzing their advantages and disadvantages in detail. Moreover, it evaluates the state-of-the-art, existing challenges, and future research directions.
    摘要 自动驾驶汽车(AV)有望提供安全高效的出行方式,从而深刻改变社会。近年来,自动驾驶的感知与预测技术取得了显著进步,但AV性能的验证问题在很大程度上仍未解决。数据驱动的微观交通仿真已成为自动驾驶测试的重要工具,原因包括:1)高保真交通数据的可获得性;2)便于大规模测试与场景复现;3)具备实现反应式、逼真交通仿真的潜力。然而,该领域目前缺乏系统性的综述。本文旨在填补这一空白,总结相关研究,回顾当前的研究进展,并提供有益于该领域未来发展的展望。文章首先介绍数据驱动交通仿真的一般问题,梳理关键概念与术语;在概述交通仿真之后,回顾常用的数据集与评价指标;随后对模仿学习、强化学习、生成式方法与深度学习方法进行全面评述,逐一分析其优缺点;最后评估了当前最新进展、现存挑战与未来研究方向。

Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization

  • paper_url: http://arxiv.org/abs/2310.15976
  • repo_url: None
  • paper_authors: Zhen Qin, Zhishuai Liu, Pan Xu
  • for: 这篇论文针对非凸优化问题,研究带随机重排(random reshuffling)的符号SGD算法,并证明了其收敛性。
  • methods: 该论文提出了随机重排版本SignRR,以及分别利用方差缩减梯度和动量更新的SignRVR与SignRVM,并给出了相应的收敛性分析;算法还被推广到数据分布在多台机器上的情形。
  • results: 在模拟和真实问题上的实验表明,带随机重排的符号方法能够匹敌甚至超越现有基线。
    Abstract signSGD is popular in nonconvex optimization due to its communication efficiency. Yet, existing analyses of signSGD rely on assuming that data are sampled with replacement in each iteration, contradicting the practical implementation where data are randomly reshuffled and sequentially fed into the algorithm. We bridge this gap by proving the first convergence result of signSGD with random reshuffling (SignRR) for nonconvex optimization. Given the dataset size $n$, the number of epochs of data passes $T$, and the variance bound of a stochastic gradient $\sigma^2$, we show that SignRR has the same convergence rate $O(\log(nT)/\sqrt{nT} + \|\sigma\|_1)$ as signSGD \citep{bernstein2018signsgd}. We then present SignRVR and SignRVM, which leverage variance-reduced gradients and momentum updates respectively, both converging at $O(\log(nT)/\sqrt{nT})$. In contrast with the analysis of signSGD, our results do not require an extremely large batch size in each iteration to be of the same order as the total number of iterations \citep{bernstein2018signsgd} or the signs of stochastic and true gradients match element-wise with a minimum probability of 1/2 \citep{safaryan2021stochastic}. We also extend our algorithms to cases where data are distributed across different machines, yielding dist-SignRVR and dist-SignRVM, both converging at $O(\log(n_0T)/\sqrt{n_0T})$, where $n_0$ is the dataset size of a single machine. We back up our theoretical findings through experiments on simulated and real-world problems, verifying that randomly reshuffled sign methods match or surpass existing baselines.
    摘要 signSGD因其通信效率高而在非凸优化中广受欢迎。然而,现有的signSGD分析假设每次迭代都对数据做有放回抽样,这与实际实现不符:实践中数据通常被随机重排后按顺序送入算法。我们填补了这一差距,首次证明了带随机重排的signSGD(SignRR)在非凸优化中的收敛结果。设数据集大小为 $n$,数据遍历的轮数为 $T$,随机梯度的方差上界为 $\sigma^2$,我们证明SignRR的收敛速率为 $O(\log(nT)/\sqrt{nT} + \|\sigma\|_1)$,与signSGD相同。随后我们提出SignRVR和SignRVM,分别利用方差缩减梯度和动量更新,二者的收敛速率均为 $O(\log(nT)/\sqrt{nT})$。与signSGD的分析不同,我们的结果既不要求每次迭代的批大小与总迭代次数同阶,也不要求随机梯度与真实梯度的符号按元素以至少1/2的概率一致。我们还将算法推广到数据分布在不同机器上的情形,得到dist-SignRVR和dist-SignRVM,其收敛速率为 $O(\log(n_0 T)/\sqrt{n_0 T})$,其中 $n_0$ 为单台机器的数据集大小。我们通过在模拟与真实问题上的实验验证了理论结论:带随机重排的符号方法能够匹敌甚至超越现有基线。
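
A minimal sketch of the sampling scheme the paper analyses, signSGD with random reshuffling, is shown below on a toy least-squares problem; the variance-reduced (SignRVR), momentum (SignRVM), and distributed variants are omitted, and the step size and epoch count are illustrative.

```python
import numpy as np

def sign_rr(grad_fn, x0, data, n_epochs=50, lr=0.01, seed=0):
    """signSGD with Random Reshuffling: each epoch permutes the data once and
    takes sign-of-gradient steps on the samples in that order (sampling without
    replacement), matching how data are usually fed in practice."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    n = len(data)
    for _ in range(n_epochs):
        for i in rng.permutation(n):          # random reshuffling, no replacement
            x -= lr * np.sign(grad_fn(x, data[i]))
    return x

# Toy example: per-sample least-squares gradients on scattered 2-D data.
rng = np.random.default_rng(1)
a = rng.normal(size=(100, 2))
b = a @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=100)
samples = list(zip(a, b))
grad = lambda x, s: 2.0 * (s[0] @ x - s[1]) * s[0]   # gradient of (a_i^T x - b_i)^2
print(sign_rr(grad, np.zeros(2), samples))            # should approach [1.5, -2.0]
```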

Minimax Forward and Backward Learning of Evolving Tasks with Performance Guarantees

  • paper_url: http://arxiv.org/abs/2310.15974
  • repo_url: https://github.com/machinelearningbcam/imrcs-for-incremental-learning-neurips-2023
  • paper_authors: Verónica Álvarez, Santiago Mazuelas, Jose A. Lozano
  • for: 这篇论文是为了解决随时间推移而改变的任务序列中的分类任务。
  • methods: 本论文提出了增量极小化最大风险分类器(incremental minimax risk classifiers, IMRC),能够有效利用前向与后向学习,并考虑任务序列随时间的演进。
  • results: 实验结果显示,IMRC能带来显著的性能提升,在每个任务样本量较小时尤为明显。
    Abstract For a sequence of classification tasks that arrive over time, it is common that tasks are evolving in the sense that consecutive tasks often have a higher similarity. The incremental learning of a growing sequence of tasks holds promise to enable accurate classification even with few samples per task by leveraging information from all the tasks in the sequence (forward and backward learning). However, existing techniques developed for continual learning and concept drift adaptation are either designed for tasks with time-independent similarities or only aim to learn the last task in the sequence. This paper presents incremental minimax risk classifiers (IMRCs) that effectively exploit forward and backward learning and account for evolving tasks. In addition, we analytically characterize the performance improvement provided by forward and backward learning in terms of the tasks' expected quadratic change and the number of tasks. The experimental evaluation shows that IMRCs can result in a significant performance improvement, especially for reduced sample sizes.
    摘要 对于随时间到达的一系列分类任务,任务往往在演化,即相邻任务通常具有更高的相似性。对不断增长的任务序列进行增量学习,有望通过利用序列中所有任务的信息(前向与后向学习),即使每个任务样本很少也能实现准确分类。然而,现有针对持续学习和概念漂移适应的方法,要么假设任务之间的相似性与时间无关,要么只旨在学习序列中的最后一个任务。本文提出增量极小化最大风险分类器(IMRC),能有效利用前向与后向学习,并考虑任务的演化。此外,我们从任务的期望二次变化量与任务数量的角度,解析刻画了前向与后向学习带来的性能提升。实验评估表明,IMRC能带来显著的性能提升,在样本量较小时尤为明显。

Constructing and Machine Learning Calabi-Yau Five-folds

  • paper_url: http://arxiv.org/abs/2310.15966
  • repo_url: None
  • paper_authors: R. Alawadhi, D. Angella, A. Leonardo, T. Schettini Gherardini
  • for: 本论文旨在构建所有可能的完全交Calabi-Yau五维流形,这些流形嵌入在不超过四个复射影空间的乘积中、且约束不超过四个,并对其进行研究。
  • methods: 作者采用配置矩阵(configuration matrix)方法枚举这些空间,计算其上同调数据,并将全部信息整理为公开数据集。
  • results: 作者得到2375个不同的Hodge菱形,并在上同调数据上进行监督机器学习,使用了分类与回归神经网络(含全连接与卷积结构)。结果显示 h^{1,1} 可以被非常高效地学习,准确率达96%;对于 h^{1,4}、h^{2,3} 和 η,R^2 分数同样很高,但由于取值范围较大,精确命中率较低。
    Abstract We construct all possible complete intersection Calabi-Yau five-folds in a product of four or less complex projective spaces, with up to four constraints. We obtain $27068$ spaces, which are not related by permutations of rows and columns of the configuration matrix, and determine the Euler number for all of them. Excluding the $3909$ product manifolds among those, we calculate the cohomological data for $12433$ cases, i.e. $53.7 \%$ of the non-product spaces, obtaining $2375$ different Hodge diamonds. The dataset containing all the above information is available at https://www.dropbox.com/scl/fo/z7ii5idt6qxu36e0b8azq/h?rlkey=0qfhx3tykytduobpld510gsfy&dl=0 . The distributions of the invariants are presented, and a comparison with the lower-dimensional analogues is discussed. Supervised machine learning is performed on the cohomological data, via classifier and regressor (both fully connected and convolutional) neural networks. We find that $h^{1,1}$ can be learnt very efficiently, with very high $R^2$ score and an accuracy of $96\%$, i.e. $96 \%$ of the predictions exactly match the correct values. For $h^{1,4},h^{2,3}, \eta$, we also find very high $R^2$ scores, but the accuracy is lower, due to the large ranges of possible values.
    摘要 我们构建了所有可能的完全交Calabi-Yau五维流形,它们嵌入在不超过四个复射影空间的乘积中,且约束不超过四个。在不计配置矩阵行列置换的情况下,我们共得到27068个空间,并计算了它们全部的欧拉数。在排除其中3909个乘积流形后,我们对12433个(即非乘积空间的53.7%)计算了上同调数据,得到2375个不同的Hodge菱形。包含上述全部信息的数据集可在 https://www.dropbox.com/scl/fo/z7ii5idt6qxu36e0b8azq/h?rlkey=0qfhx3tykytduobpld510gsfy&dl=0 获取。我们给出了各不变量的分布,并与低维类似情形进行了比较。我们还在上同调数据上进行了监督机器学习,使用了分类与回归(全连接与卷积)神经网络。结果显示,h^{1,1} 可以被非常高效地学习,R^2 分数很高,准确率达96%(即96%的预测与真实值完全一致);对于 h^{1,4}、h^{2,3} 和 η,R^2 分数同样很高,但由于取值范围较大,准确率较低。
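
As a hedged illustration of the kind of supervised setup the abstract describes (a small fully connected regressor trained on configuration-matrix inputs to predict a Hodge number), the sketch below uses randomly generated stand-in data; the input shape, padding, and synthetic target are assumptions, not the authors' dataset or architecture.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in data: configuration matrices padded to 4 x 4, with a
# synthetic integer-valued target playing the role of h^{1,1}. The real inputs
# come from the authors' released dataset; shapes here are illustrative only.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(1024, 4, 4)).astype(np.float32)
y = X.sum(axis=(1, 2)) / 4.0        # fake target correlated with the input

model = nn.Sequential(                       # small fully connected regressor
    nn.Flatten(), nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

X_t = torch.from_numpy(X)
y_t = torch.from_numpy(y.astype(np.float32)).unsqueeze(1)
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X_t), y_t)
    loss.backward()
    opt.step()
print(f"final MSE on the synthetic data: {loss.item():.4f}")
```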

Weighted Distance Nearest Neighbor Condensing

  • paper_url: http://arxiv.org/abs/2310.15951
  • repo_url: None
  • paper_authors: Lee-Ad Gottlieb, Timor Sharabi, Roi Weiss
  • for: 这篇论文研究加权距离最近邻凝缩(condensing)问题:为凝缩集中的每个点赋予权重,随后根据加权距离最近邻为新样本打标签。
  • methods: 本论文提出了这一新的凝缩模型,并研究其理论性质,证明它可以取得比标准最近邻规则显著更好的凝缩效果,同时泛化界几乎与后者相同。
  • results: 作者为该问题提出了一种凝缩启发式算法,证明了其贝叶斯一致性(Bayes consistency),并给出了令人鼓舞的实验结果。
    Abstract The problem of nearest neighbor condensing has enjoyed a long history of study, both in its theoretical and practical aspects. In this paper, we introduce the problem of weighted distance nearest neighbor condensing, where one assigns weights to each point of the condensed set, and then new points are labeled based on their weighted distance nearest neighbor in the condensed set. We study the theoretical properties of this new model, and show that it can produce dramatically better condensing than the standard nearest neighbor rule, yet is characterized by generalization bounds almost identical to the latter. We then suggest a condensing heuristic for our new problem. We demonstrate Bayes consistency for this heuristic, and also show promising empirical results.
    摘要 最近邻凝缩问题已有很长的研究历史,涵盖理论与实践两个方面。本文提出加权距离最近邻凝缩问题:为凝缩集中的每个点赋予权重,然后根据加权距离最近邻为新样本打标签。我们研究了这个新模型的理论性质,证明它可以比标准最近邻规则取得显著更好的凝缩效果,同时其泛化界几乎与后者相同。随后我们为该问题提出了一种凝缩启发式算法,证明了它的贝叶斯一致性,并给出了令人鼓舞的实验结果。
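
The prediction rule itself is easy to state in code. The sketch below labels a query point by the condensed point minimising weight times distance; the condensed set and weights here are hand-picked for illustration and are not produced by the paper's condensing heuristic.

```python
import numpy as np

def weighted_nn_label(x, condensed_pts, condensed_labels, weights):
    """Label x by its weighted-distance nearest neighbour: the condensed point
    minimising w_i * ||x - p_i||. A smaller weight widens a point's region of
    influence; the standard 1-NN rule is the special case of all weights equal."""
    dists = np.linalg.norm(condensed_pts - x, axis=1) * weights
    return condensed_labels[np.argmin(dists)]

# Tiny illustration with a hand-picked condensed set and weights (not the paper's
# condensing heuristic, just the prediction rule it defines).
pts = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
labels = np.array([0, 1, 1])
weights = np.array([1.0, 1.0, 0.5])   # third prototype gets a wider influence region
print(weighted_nn_label(np.array([1.0, 1.0]), pts, labels, weights))
```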

ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2310.15938
  • repo_url: None
  • paper_authors: Anshul Ahluwalia, Rohit Das, Payman Behnam, Alind Khare, Pan Li, Alexey Tumanov
  • for: 这篇论文的目的是提出一种基于注意力的知识蒸馏(ABKD)方法,用于压缩图神经网络(GNN),以提高压缩后的准确率。
  • methods: 该论文采用知识蒸馏(KD)框架,并利用注意力机制识别重要的教师-学生中间层对,对齐其输出。
  • results: 与现有方法相比,该方法在32.3倍的压缩率下,将准确率平均提升1.79%(在OGBN-Mag数据集上)。
    Abstract Graph Neural Networks (GNNs) have proven to be quite versatile for a variety of applications, including recommendation systems, fake news detection, drug discovery, and even computer vision. Due to the expanding size of graph-structured data, GNN models have also increased in complexity, leading to substantial latency issues. This is primarily attributed to the irregular structure of graph data and its access pattern into memory. The natural solution to reduce latency is to compress large GNNs into small GNNs. One way to do this is via knowledge distillation (KD). However, most KD approaches for GNNs only consider the outputs of the last layers and do not consider the outputs of the intermediate layers of the GNNs; these layers may contain important inductive biases indicated by the graph structure. To address this shortcoming, we propose a novel KD approach to GNN compression that we call Attention-Based Knowledge Distillation (ABKD). ABKD is a KD approach that uses attention to identify important intermediate teacher-student layer pairs and focuses on aligning their outputs. ABKD enables higher compression of GNNs with a smaller accuracy dropoff compared to existing KD approaches. On average, we achieve a 1.79% increase in accuracy with a 32.3x compression ratio on OGBN-Mag, a large graph dataset, compared to state-of-the-art approaches.
    摘要 图神经网络(GNN)已被证明适用于多种应用,包括推荐系统、假新闻检测、药物发现乃至计算机视觉。随着图结构数据规模不断扩大,GNN模型的复杂度也随之提高,带来明显的时延问题,这主要源于图数据的不规则结构及其内存访问模式。降低时延的自然做法是将大型GNN压缩为小型GNN,其中一种途径是知识蒸馏(KD)。然而,现有针对GNN的KD方法大多只考虑最后一层的输出,而忽略中间层的输出,而这些中间层可能蕴含由图结构所体现的重要归纳偏置。为弥补这一不足,我们提出一种新的GNN压缩KD方法,称为基于注意力的知识蒸馏(ABKD)。ABKD利用注意力识别重要的教师-学生中间层对,并着重对齐这些层对的输出。与现有KD方法相比,ABKD能在更高压缩率下保持更小的准确率损失。在大规模图数据集OGBN-Mag上,相比最先进的方法,我们在32.3倍压缩率下平均取得1.79%的准确率提升。
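
A hedged sketch of the core idea, attention-weighted alignment of teacher and student intermediate GNN features over all layer pairs, is given below; the projection, pooling, and scoring choices are illustrative assumptions rather than the exact ABKD formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerPairAttentionKD(nn.Module):
    """Attention-weighted alignment of teacher/student intermediate features.

    Given per-layer node embeddings from teacher and student GNNs, score every
    (teacher layer, student layer) pair, softmax the scores into attention
    weights, and take the weighted sum of pairwise alignment losses. This is a
    simplified stand-in for the ABKD idea, not the authors' exact architecture.
    """
    def __init__(self, t_dim, s_dim, proj_dim=64):
        super().__init__()
        self.proj_t = nn.Linear(t_dim, proj_dim)
        self.proj_s = nn.Linear(s_dim, proj_dim)

    def forward(self, teacher_feats, student_feats):
        # teacher_feats: list of (N, t_dim); student_feats: list of (N, s_dim)
        t = torch.stack([self.proj_t(f).mean(dim=0) for f in teacher_feats])  # (Lt, d)
        s = torch.stack([self.proj_s(f).mean(dim=0) for f in student_feats])  # (Ls, d)
        scores = t @ s.T / t.shape[1] ** 0.5                                  # (Lt, Ls)
        attn = F.softmax(scores.flatten(), dim=0).view_as(scores)

        loss = 0.0
        for i, tf in enumerate(teacher_feats):
            for j, sf in enumerate(student_feats):
                pair_loss = F.mse_loss(self.proj_t(tf), self.proj_s(sf))
                loss = loss + attn[i, j] * pair_loss
        return loss

# Example: 3 teacher layers (dim 128) and 2 student layers (dim 32) over 50 nodes.
teacher = [torch.randn(50, 128) for _ in range(3)]
student = [torch.randn(50, 32) for _ in range(2)]
print(LayerPairAttentionKD(128, 32)(teacher, student))
```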

Online Robust Mean Estimation

  • paper_url: http://arxiv.org/abs/2310.15932
  • repo_url: None
  • paper_authors: Daniel M. Kane, Ilias Diakonikolas, Hanshen Xiao, Sihan Liu
  • for: This paper studies high-dimensional robust mean estimation in an online setting: $n$ sensors measure a common, ongoing phenomenon, and at each time step $t=1,2,\ldots,T$ the $i^{th}$ sensor reports its reading $x^{(i)}_t$; the algorithm must then commit to an estimate $\mu_t$ of the true mean at time $t$.
  • methods: The paper designs online algorithms that compute a good approximation $\mu$ to the true mean $\mu^\ast := \mathbf{E}[X]$, assuming that most sensors observe independent samples from a common distribution $X$ while an $\epsilon$-fraction may behave maliciously.
  • results: Two main results are proved. First, if the uncorrupted samples satisfy the standard $(\epsilon,\delta)$-stability condition, an efficient online algorithm outputs estimates $\mu_t$, $t \in [T]$, such that with high probability $\|\mu-\mu^\ast\|_2 = O(\delta \log(T))$, nearly matching the best offline algorithms, which achieve $\ell_2$-error $O(\delta)$. Second, under additional assumptions on the input (most notably that $X$ is a product distribution), there are inefficient algorithms whose error does not depend on $T$ at all.
    Abstract We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where $n$ sensors are measuring some common, ongoing phenomenon. At each time step $t=1,2,\ldots,T$, the $i^{th}$ sensor reports its readings $x^{(i)}_t$ for that time step. The algorithm must then commit to its estimate $\mu_t$ for the true mean value of the process at time $t$. We assume that most of the sensors observe independent samples from some common distribution $X$, but an $\epsilon$-fraction of them may instead behave maliciously. The algorithm wishes to compute a good approximation $\mu$ to the true mean $\mu^\ast := \mathbf{E}[X]$. We note that if the algorithm is allowed to wait until time $T$ to report its estimate, this reduces to the well-studied problem of robust mean estimation. However, the requirement that our algorithm produces partial estimates as the data is coming in substantially complicates the situation. We prove two main results about online robust mean estimation in this model. First, if the uncorrupted samples satisfy the standard condition of $(\epsilon,\delta)$-stability, we give an efficient online algorithm that outputs estimates $\mu_t$, $t \in [T],$ such that with high probability it holds that $\|\mu-\mu^\ast\|_2 = O(\delta \log(T))$, where $\mu = (\mu_t)_{t \in [T]}$. We note that this error bound is nearly competitive with the best offline algorithms, which would achieve $\ell_2$-error of $O(\delta)$. Our second main result shows that with additional assumptions on the input (most notably that $X$ is a product distribution) there are inefficient algorithms whose error does not depend on $T$ at all.
    摘要 我们研究在线情形下的高维鲁棒均值估计问题。具体而言,设有 $n$ 个传感器测量同一个持续发生的现象:在每个时刻 $t=1,2,\ldots,T$,第 $i$ 个传感器报告其读数 $x^{(i)}_t$,算法随即必须给出对该时刻过程真实均值的估计 $\mu_t$。我们假设大多数传感器观测到来自某个公共分布 $X$ 的独立样本,但其中 $\epsilon$ 比例的传感器可能表现出恶意行为。算法希望计算真实均值 $\mu^\ast := \mathbf{E}[X]$ 的良好近似 $\mu$。若允许算法等到时刻 $T$ 再给出估计,该问题就退化为经典的鲁棒均值估计;然而要求算法在数据到来的同时不断给出部分估计,使问题大为复杂。我们在该模型下证明了两个主要结果。第一,若未被污染的样本满足标准的 $(\epsilon,\delta)$-稳定性条件,我们给出一个高效的在线算法,其输出 $\mu_t$,$t \in [T]$,以高概率满足 $\|\mu-\mu^\ast\|_2 = O(\delta \log(T))$,其中 $\mu = (\mu_t)_{t \in [T]}$;该误差界与最优离线算法($\ell_2$ 误差为 $O(\delta)$)几乎相当。第二,在对输入施加额外假设(最主要的是 $X$ 为乘积分布)时,存在误差完全不依赖于 $T$ 的(非高效)算法。

Climate Change Impact on Agricultural Land Suitability: An Interpretable Machine Learning-Based Eurasia Case Study

  • paper_url: http://arxiv.org/abs/2310.15912
  • repo_url: None
  • paper_authors: Valeriy Shevchenko, Daria Taniushkina, Aleksander Lukashevich, Aleksandr Bulkin, Roland Grinis, Kirill Kovalev, Veronika Narozhnaia, Nazar Sotiriadi, Alexander Krenke, Yury Maximov
  • for: 预测气候变化对农业用地适宜性的影响,为政策制定者提供有用的决策依据,以避免人道主义危机。
  • methods: 使用机器学习方法预测不同碳排放情景下农业用地适宜性显著退化及灌溉模式变化的风险,并通过全面的特征重要性分析,揭示影响土地适宜性的具体气候与地形特征。
  • results: 该研究取得了出色的预测精度,为政策制定者提供了有价值的决策依据(例如补充供水和施肥等策略),以避免人道主义危机。
    Abstract The United Nations has identified improving food security and reducing hunger as essential components of its sustainable development goals. As of 2021, approximately 828 million people worldwide are experiencing hunger and malnutrition, with numerous fatalities reported. Climate change significantly impacts agricultural land suitability, potentially leading to severe food shortages and subsequent social and political conflicts. To address this pressing issue, we have developed a machine learning-based approach to predict the risk of substantial land suitability degradation and changes in irrigation patterns. Our study focuses on Central Eurasia, a region burdened with economic and social challenges. This study represents a pioneering effort in utilizing machine learning methods to assess the impact of climate change on agricultural land suitability under various carbon emissions scenarios. Through comprehensive feature importance analysis, we unveil specific climate and terrain characteristics that exert influence on land suitability. Our approach achieves remarkable accuracy, offering policymakers invaluable insights to facilitate informed decisions aimed at averting a humanitarian crisis, including strategies such as the provision of additional water and fertilizers. This research underscores the tremendous potential of machine learning in addressing global challenges, with a particular emphasis on mitigating hunger and malnutrition.
    摘要 联合国将提升粮食安全、减少饥饿确定为其可持续发展目标的重要组成部分。截至2021年,全球约有8.28亿人处于饥饿和营养不良状态,并有大量死亡报告。气候变化显著影响农业用地的适宜性,可能导致严重的粮食短缺以及随之而来的社会与政治冲突。为应对这一紧迫问题,我们开发了一种基于机器学习的方法,用于预测土地适宜性显著退化和灌溉模式变化的风险。我们的研究聚焦于欧亚大陆中部(Central Eurasia),一个面临经济与社会双重挑战的地区。本研究是利用机器学习方法、在不同碳排放情景下评估气候变化对农业用地适宜性影响的先驱性工作。通过全面的特征重要性分析,我们揭示了影响土地适宜性的具体气候与地形特征。我们的方法具有出色的准确性,能够为政策制定者提供有价值的信息,帮助其采取诸如补充供水和施肥等措施,以避免人道主义危机。这项研究突显了机器学习在应对全球性挑战、特别是缓解饥饿与营养不良方面的巨大潜力。

Neural Collapse in Multi-label Learning with Pick-all-label Loss

  • paper_url: http://arxiv.org/abs/2310.15903
  • repo_url: https://github.com/heimine/nc_mlab
  • paper_authors: Pengyu Li, Yutong Wang, Xiao Li, Qing Qu
  • for: 本文通过神经塌缩(Neural Collapse, NC)的视角研究深度神经网络,并将此前局限于多类分类的分析扩展到多标签学习任务。
  • methods: 本文将先前工作中在多类分类设定下发现的NC现象加以推广,在"pick-all-label"形式下首次证明了一种广义NC现象在多标签学习中成立。
  • results: 本文发现了一种多标签学习特有的组合性质,称为"按标签平均"(tag-wise average)性质:带有多个标签的样本,其特征类均值是各个单标签类均值的按比例平均。此外,本文在无约束特征模型(UFM)下证明了pick-all-label交叉熵风险的全局最优性结果。
    Abstract We study deep neural networks for the multi-label classification (MLab) task through the lens of neural collapse (NC). Previous works have been restricted to the multi-class classification setting and discovered a prevalent NC phenomenon comprising of the following properties for the last-layer features: (i) the variability of features within every class collapses to zero, (ii) the set of feature means form an equi-angular tight frame (ETF), and (iii) the last layer classifiers collapse to the feature mean upon some scaling. We generalize the study to multi-label learning, and prove for the first time that a generalized NC phenomenon holds with the "pick-all-label'' formulation. Under the natural analog of the unconstrained feature model (UFM), we establish that the only global classifier of the pick-all-label cross entropy loss display the same ETF geometry which further collapse to multiplicity-1 feature class means. Besides, we discover a combinatorial property in generalized NC which is unique for multi-label learning that we call ``tag-wise average'' property, where the feature class-means of samples with multiple labels are scaled average of the feature class-means of single label tags. Theoretically, we establish global optimality result for the pick-all-label cross-entropy risk for the UFM. Additionally, We also provide empirical evidence to support our investigation into training deep neural networks on multi-label datasets, resulting in improved training efficiency.
    摘要 我们通过神经塌缩(NC)的视角研究用于多标签分类(MLab)任务的深度神经网络。此前的工作局限于多类分类设定,并发现了一种普遍存在的NC现象,其最后一层特征具有以下性质:(i)每个类内特征的变异塌缩为零;(ii)各类特征均值构成等角紧框架(ETF);(iii)最后一层分类器在某个缩放下塌缩到特征均值。我们将这一研究推广到多标签学习,并首次证明在"pick-all-label"形式下存在一种广义的NC现象。在无约束特征模型(UFM)的自然推广下,我们证明pick-all-label交叉熵损失的全局分类器呈现同样的ETF几何,并进一步塌缩到重数为1的特征类均值。此外,我们发现了广义NC中一种多标签学习特有的组合性质,称为"按标签平均"(tag-wise average)性质:带有多个标签的样本,其特征类均值是各单标签类均值的按比例平均。理论上,我们建立了UFM下pick-all-label交叉熵风险的全局最优性结果。我们还提供了实验证据支持在多标签数据集上训练深度神经网络的研究,并由此提升了训练效率。
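
The two geometric properties named in the abstract are easy to check numerically. The snippet below verifies, on hand-constructed features, that simplex-ETF class means have pairwise cosines of -1/(K-1) and illustrates the tag-wise average property for a two-label sample; it is a toy check, not the paper's experiments.

```python
import numpy as np

def pairwise_cosines(means):
    """Cosine similarities between centred class means; for a K-class simplex ETF
    every off-diagonal entry equals -1/(K-1)."""
    centred = means - means.mean(axis=0)
    normed = centred / np.linalg.norm(centred, axis=1, keepdims=True)
    return normed @ normed.T

# Synthetic last-layer class means placed exactly at a 3-class simplex ETF in 2-D.
K = 3
etf = np.array([[1.0, 0.0], [-0.5, np.sqrt(3) / 2], [-0.5, -np.sqrt(3) / 2]])
print(np.round(pairwise_cosines(etf), 3))        # off-diagonals equal -1/(K-1) = -0.5

# "Tag-wise average": features of a sample carrying labels {0, 1} collapse to the
# scaled average of the single-label class means (scale taken as 1 here).
single_label_means = etf
multi_label_feature = single_label_means[[0, 1]].mean(axis=0)
print(multi_label_feature)                        # average of means 0 and 1
```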

Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data

  • paper_url: http://arxiv.org/abs/2310.15890
  • repo_url: https://github.com/aparna-aketi/cross_feature_contrastive_loss
  • paper_authors: Sai Aparna Aketi, Kaushik Roy
  • for: 这篇论文旨在提出一种基于交叉特征对比损失的去中心化学习方法,以应对实际分布式数据中的数据异构性。
  • methods: 该方法在去中心化学习算法之上,利用基于交叉特征对比损失的无数据知识蒸馏技术提升性能。
  • results: 实验结果显示,在多种计算机视觉数据集(CIFAR-10、CIFAR-100、Fashion-MNIST、Imagenette、ImageNet)、模型架构(ResNet、Inception)和网络拓扑上,相较其他现有的异构数据去中心化学习方法,所提方法可将测试准确率提升0.2-4%。
    Abstract The current state-of-the-art decentralized learning algorithms mostly assume the data distribution to be Independent and Identically Distributed (IID). However, in practical scenarios, the distributed datasets can have significantly heterogeneous data distributions across the agents. In this work, we present a novel approach for decentralized learning on heterogeneous data, where data-free knowledge distillation through contrastive loss on cross-features is utilized to improve performance. Cross-features for a pair of neighboring agents are the features (i.e., last hidden layer activations) obtained from the data of an agent with respect to the model parameters of the other agent. We demonstrate the effectiveness of the proposed technique through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, Imagenette, and ImageNet), model architectures, and network topologies. Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.
    摘要 当前最先进的去中心化学习算法大多假设数据分布是独立同分布(IID)的。然而在实际场景中,各节点上的分布式数据往往呈现显著的异构性。本文提出一种针对异构数据的去中心化学习新方法,利用基于交叉特征对比损失的无数据知识蒸馏来提升性能。对一对相邻节点而言,交叉特征指的是用一方的数据在另一方模型参数下得到的特征(即最后一个隐藏层的激活)。我们在多种计算机视觉数据集(CIFAR-10、CIFAR-100、Fashion-MNIST、Imagenette和ImageNet)、模型架构和网络拓扑上进行了大量实验,验证了所提方法的有效性。实验结果表明,与现有的异构数据去中心化学习方法相比,所提方法可将测试准确率提升0.2-4%。
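
A hedged sketch of the cross-feature idea follows: the same batch is encoded once with the local model and once with a neighbour's parameters, and an InfoNCE-style loss treats matching samples as positives. The encoder, temperature, and exact loss form are illustrative assumptions; the paper's loss and communication protocol may differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cross_feature_contrastive_loss(local_model, neighbor_model, x, temperature=0.1):
    """Contrast each sample's local feature with its cross-feature (same data run
    through the neighbour's parameters); matching pairs are positives, all other
    samples in the batch are negatives (InfoNCE-style). Simplified sketch only."""
    z_local = F.normalize(local_model(x), dim=1)          # (B, d) local features
    with torch.no_grad():                                  # neighbour weights are fixed here
        z_cross = F.normalize(neighbor_model(x), dim=1)    # (B, d) cross-features
    logits = z_local @ z_cross.T / temperature             # (B, B) similarities
    targets = torch.arange(x.shape[0])                     # positive pair is the diagonal
    return F.cross_entropy(logits, targets)

# Two agents with the same architecture but different (heterogeneous-data) weights.
encoder = lambda: nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 128))
agent_a, agent_b = encoder(), encoder()
batch = torch.randn(16, 3, 32, 32)
print(cross_feature_contrastive_loss(agent_a, agent_b, batch))
```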

State Sequences Prediction via Fourier Transform for Representation Learning

  • paper_url: http://arxiv.org/abs/2310.15888
  • repo_url: https://github.com/miralab-ustc/rl-spf
  • paper_authors: Mingxuan Ye, Yufei Kuang, Jie Wang, Rui Yang, Wengang Zhou, Houqiang Li, Feng Wu
  • for: 提高深度强化学习(RL)的数据效率,增强RL在复杂控制任务中的性能。
  • methods: 利用 Fourier 变换来提取状态序列中的下行结构信息,并通过这种信息来学习高效表示。
  • results: 实验表明,提案的方法可以在样本效率和性能两个方面超过一些现有的算法。
    Abstract While deep reinforcement learning (RL) has been demonstrated effective in solving complex control tasks, sample efficiency remains a key challenge due to the large amounts of data required for remarkable performance. Existing research explores the application of representation learning for data-efficient RL, e.g., learning predictive representations by predicting long-term future states. However, many existing methods do not fully exploit the structural information inherent in sequential state signals, which can potentially improve the quality of long-term decision-making but is difficult to discern in the time domain. To tackle this problem, we propose State Sequences Prediction via Fourier Transform (SPF), a novel method that exploits the frequency domain of state sequences to extract the underlying patterns in time series data for learning expressive representations efficiently. Specifically, we theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity, and then propose to predict the Fourier transform of infinite-step future state sequences to extract such information. One of the appealing features of SPF is that it is simple to implement while not requiring storage of infinite-step future states as prediction targets. Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
    摘要 深度强化学习(RL)已经在复杂控制任务中得到了证明,但是样本效率仍然是一个关键挑战,因为需要大量数据来获得出色的性能。现有研究利用表示学习来提高数据效率的RL,例如通过预测长期未来状态来学习预测性表示。然而,许多现有方法没有充分利用状态序列中的结构信息,这可能会提高长期决策质量,但是在时间频谱中很难发现。为解决这个问题,我们提出了状态序列预测via傅里叶变换(SPF),一种新的方法,它利用状态序列的频谱频率来提取下表示的基本特征,从而学习表达式的表示。我们 theoretically 分析了状态序列中的结构信息,与策略性能和信号规律性 closely 相关,然后提议预测无穷步未来状态序列的傅里叶变换来提取这些信息。SPF 的一个吸引人的特点是它简单易行,不需要存储无穷步未来状态作为预测目标。实验表明,提案的方法在样本效率和性能两个方面都超过了一些现有算法。
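
A small sketch of how such a frequency-domain target could be built is shown below: a finite, discounted window of future states stands in for the infinite-step transform, and its leading FFT coefficients form the vector an auxiliary prediction head would regress; the discounting and truncation choices are assumptions for illustration.

```python
import numpy as np

def fourier_target(future_states, gamma=0.95, n_freqs=8):
    """Build a frequency-domain prediction target from a window of future states.

    future_states: (H, d) array of the next H states. Each state dimension is
    discounted by gamma^t (a finite-horizon stand-in for the infinite-step
    transform in the paper) and transformed with an FFT; the first n_freqs
    complex coefficients are returned as real features (real and imaginary parts).
    """
    H, d = future_states.shape
    discounted = future_states * (gamma ** np.arange(H))[:, None]
    spectrum = np.fft.rfft(discounted, axis=0)[:n_freqs]          # (n_freqs, d) complex
    return np.concatenate([spectrum.real, spectrum.imag], axis=0).ravel()

# Example: a noisy oscillatory 2-D state trajectory, as might arise in control tasks.
t = np.arange(64)
states = np.stack([np.sin(0.3 * t), np.cos(0.1 * t)], axis=1) + 0.05 * np.random.randn(64, 2)
target = fourier_target(states)
print(target.shape)   # (2 * n_freqs * d,) = (32,) here
```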

Using Causality-Aware Graph Neural Networks to Predict Temporal Centralities in Dynamic Graphs

  • paper_url: http://arxiv.org/abs/2310.15865
  • repo_url: None
  • paper_authors: Franziska Heeg, Ingo Scholtes
  • for: This paper addresses the computational expense of path-based centrality calculation in temporal graphs and explores predicting temporal path-based centralities in time series data.
  • methods: The paper applies De Bruijn Graph Neural Networks (DBGNN), a causality-aware graph neural network architecture, to predict temporal path-based centralities.
  • results: Experiments on 13 temporal graphs from biological and social systems show that DBGNN considerably improves the prediction of both betweenness and closeness centrality compared to a static Graph Convolutional Neural Network.
    Abstract Node centralities play a pivotal role in network science, social network analysis, and recommender systems. In temporal data, static path-based centralities like closeness or betweenness can give misleading results about the true importance of nodes in a temporal graph. To address this issue, temporal generalizations of betweenness and closeness have been defined that are based on the shortest time-respecting paths between pairs of nodes. However, a major issue of those generalizations is that the calculation of such paths is computationally expensive. Addressing this issue, we study the application of De Bruijn Graph Neural Networks (DBGNN), a causality-aware graph neural network architecture, to predict temporal path-based centralities in time series data. We experimentally evaluate our approach in 13 temporal graphs from biological and social systems and show that it considerably improves the prediction of both betweenness and closeness centrality compared to a static Graph Convolutional Neural Network.
    摘要 节点中心性在网络科学、社交网络分析和推荐系统中扮演着关键角色。在时序数据中,基于静态路径的中心性(如接近中心性或介数中心性)可能会对时序图中节点的真实重要性给出误导性的结论。为解决这一问题,已有工作基于节点对之间尊重时间先后的最短路径,定义了介数与接近中心性的时序推广。然而,这类推广的一个主要问题是此类路径的计算代价很高。为此,我们研究使用De Bruijn图神经网络(DBGNN)这一具备因果感知能力的图神经网络架构,来预测时序数据中基于路径的时序中心性。我们在来自生物与社会系统的13个时序图上进行了实验,结果表明,与静态图卷积神经网络相比,该方法能显著提升介数中心性与接近中心性的预测效果。

Improving Event Time Prediction by Learning to Partition the Event Time Space

  • paper_url: http://arxiv.org/abs/2310.15853
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Jimmy Hickey, Ricardo Henao, Daniel Wojdyla, Michael Pencina, Matthew M. Engelhard
  • for: 本研究针对通过预测事件落在各预先指定时间区间内概率的生存分析方法:这类方法在数据充足时预测性能较好,但在临床数据有限时,更需要审慎地将事件时间轴划分为少量、贴合预测任务的区间。
  • methods: 本研究提出一种从数据中学习切分点的方法,用这些切分点定义有限数量的时间区间,从而在生存分析中获得更好的预测。
  • results: 在两个模拟数据集上,我们能够恢复与底层生成模型一致的时间区间;在三个真实观察数据集(包括一个大规模、新近整合的卒中风险预测数据集)上,我们展示了预测性能的提升。此外,该方法通过给出最适合各任务的时间区间来辅助临床决策,使风险预测更加准确。
    Abstract Recently developed survival analysis methods improve upon existing approaches by predicting the probability of event occurrence in each of a number pre-specified (discrete) time intervals. By avoiding placing strong parametric assumptions on the event density, this approach tends to improve prediction performance, particularly when data are plentiful. However, in clinical settings with limited available data, it is often preferable to judiciously partition the event time space into a limited number of intervals well suited to the prediction task at hand. In this work, we develop a method to learn from data a set of cut points defining such a partition. We show that in two simulated datasets, we are able to recover intervals that match the underlying generative model. We then demonstrate improved prediction performance on three real-world observational datasets, including a large, newly harmonized stroke risk prediction dataset. Finally, we argue that our approach facilitates clinical decision-making by suggesting time intervals that are most appropriate for each task, in the sense that they facilitate more accurate risk prediction.
    摘要 近期提出的生存分析方法通过预测事件落在若干预先指定的(离散)时间区间内的概率,改进了既有方法。由于不对事件密度施加强参数假设,这类方法在数据充足时往往能提升预测性能。然而,在可用数据有限的临床场景中,通常更宜审慎地将事件时间轴划分为少量、贴合当前预测任务的区间。在本工作中,我们提出一种从数据中学习切分点、以定义此类划分的方法。我们在两个模拟数据集上展示了能够恢复与底层生成模型一致的区间;随后在三个真实观察数据集(包括一个大规模、新近整合的卒中风险预测数据集)上展示了预测性能的提升。最后,我们论证该方法能够辅助临床决策:它为每个任务给出最合适的时间区间,从而实现更准确的风险预测。
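
The discrete-time setup is easy to sketch. Below, cut points are taken from event-time quantiles as a simple stand-in for the learned partition the paper proposes, and empirical per-interval event probabilities are computed while ignoring censoring subtleties for brevity.

```python
import numpy as np

def quantile_cut_points(event_times, n_intervals=4):
    """Cut points from event-time quantiles: a simple data-driven partition used
    here as a stand-in for the learned partition proposed in the paper."""
    qs = np.linspace(0, 1, n_intervals + 1)[1:-1]
    return np.quantile(event_times, qs)

def interval_event_probabilities(event_times, events, cut_points):
    """Empirical probability of an observed event falling in each interval
    (censoring is ignored here for brevity; real survival models handle it)."""
    edges = np.concatenate([[-np.inf], cut_points, [np.inf]])
    probs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (event_times > lo) & (event_times <= hi) & (events == 1)
        probs.append(in_bin.mean())
    return np.array(probs)

rng = np.random.default_rng(0)
times = rng.exponential(scale=5.0, size=500)      # synthetic event times
events = rng.binomial(1, 0.8, size=500)           # 1 = event observed, 0 = censored
cuts = quantile_cut_points(times, n_intervals=4)
print(cuts)
print(interval_event_probabilities(times, events, cuts))
```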

Spatial-Temporal Hypergraph Neural Network for Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2310.16070
  • repo_url: None
  • paper_authors: Chengzhi Yao, Zhi Li, Junbo Wang
  • for: traffic forecasting in Intelligent Transportation Systems, to implement rich and varied transportation applications and bring convenient transportation services to people based on collected traffic data.
  • methods: combines road network topology and traffic dynamics to capture high-order spatio-temporal dependencies in traffic data, using a spatial module and a temporal module, including an adaptive MixHop hypergraph ODE network and a hyperedge evolving ODE network.
  • results: superior performance compared to various baselines, as demonstrated through extensive experiments conducted on four real-world traffic datasets.
    Abstract Traffic forecasting, which benefits from mobile Internet development and position technologies, plays a critical role in Intelligent Transportation Systems. It helps to implement rich and varied transportation applications and bring convenient transportation services to people based on collected traffic data. Most existing methods usually leverage graph-based deep learning networks to model the complex road network for traffic forecasting shallowly. Despite their effectiveness, these methods are generally limited in fully capturing high-order spatial dependencies caused by road network topology and high-order temporal dependencies caused by traffic dynamics. To tackle the above issues, we focus on the essence of traffic system and propose STHODE: Spatio-Temporal Hypergraph Neural Ordinary Differential Equation Network, which combines road network topology and traffic dynamics to capture high-order spatio-temporal dependencies in traffic data. Technically, STHODE consists of a spatial module and a temporal module. On the one hand, we construct a spatial hypergraph and leverage an adaptive MixHop hypergraph ODE network to capture high-order spatial dependencies. On the other hand, we utilize a temporal hypergraph and employ a hyperedge evolving ODE network to capture high-order temporal dependencies. Finally, we aggregate the outputs of stacked STHODE layers to mutually enhance the prediction performance. Extensive experiments conducted on four real-world traffic datasets demonstrate the superior performance of our proposed model compared to various baselines.
    摘要 交通预测得益于移动互联网的发展与定位技术,在智能交通系统中扮演着关键角色。它帮助实现丰富多样的交通应用,基于采集的交通数据为人们提供便捷的交通服务。现有方法大多利用基于图的深度学习网络对复杂路网进行较为浅层的建模;尽管有效,但这些方法通常难以充分刻画由路网拓扑导致的高阶空间依赖以及由交通动态导致的高阶时间依赖。为解决上述问题,我们着眼于交通系统的本质,提出STHODE(时空超图神经常微分方程网络),它结合路网拓扑与交通动态,以刻画交通数据中的高阶时空依赖。技术上,STHODE由空间模块与时间模块组成:一方面,我们构建空间超图,并利用自适应MixHop超图ODE网络刻画高阶空间依赖;另一方面,我们利用时间超图,并采用超边演化ODE网络刻画高阶时间依赖。最后,我们聚合堆叠的STHODE层的输出,以相互增强预测性能。在四个真实交通数据集上的大量实验表明,所提模型优于多种基线方法。

Localization of Small Leakages in Water Distribution Networks using Concept Drift Explanation Methods

  • paper_url: http://arxiv.org/abs/2310.15830
  • repo_url: None
  • paper_authors: Valerie Vaquet, Fabian Hinder, Kathrin Lammers, Jonas Vaquet, Barbara Hammer
  • for: 该研究旨在帮助检测并定位供水管网中的泄漏,以减少水资源损失。
  • methods: 该研究仅使用压力测量进行泄漏定位:首先利用贝叶斯网络对供水管网中的泄漏进行建模并分析系统动态,进而将泄漏定位问题与概念漂移联系起来。
  • results: 在真实基准场景上的实验评估表明,基于模型的概念漂移解释是在网络信息有限的情况下定位泄漏的一种有前景的工具。
    Abstract Facing climate change the already limited availability of drinking water will decrease in the future rendering drinking water an increasingly scarce resource. Considerable amounts of it are lost through leakages in water transportation and distribution networks. Leakage detection and localization are challenging problems due to the complex interactions and changing demands in water distribution networks. Especially small leakages are hard to pinpoint yet their localization is vital to avoid water loss over long periods of time. While there exist different approaches to solving the tasks of leakage detection and localization, they are relying on various information about the system, e.g. real-time demand measurements and the precise network topology, which is an unrealistic assumption in many real-world scenarios. In contrast, this work attempts leakage localization using pressure measurements only. For this purpose, first, leakages in the water distribution network are modeled employing Bayesian networks, and the system dynamics are analyzed. We then show how the problem is connected to and can be considered through the lens of concept drift. In particular, we argue that model-based explanations of concept drift are a promising tool for localizing leakages given limited information about the network. The methodology is experimentally evaluated using realistic benchmark scenarios.
    摘要 面对气候变化,本已有限的饮用水供给在未来还将进一步减少,使饮用水日益成为稀缺资源,而其中相当一部分因输水与配水管网中的泄漏而流失。由于供水管网中复杂的相互作用和不断变化的用水需求,泄漏的检测与定位颇具挑战;小泄漏尤其难以精确定位,但及时定位对避免长期漏水至关重要。尽管已有多种方法用于泄漏检测与定位,它们依赖于系统的各类信息(如实时需求测量和精确的网络拓扑),而这在许多真实场景中并不现实。与此不同,本工作尝试仅利用压力测量进行泄漏定位。为此,我们首先利用贝叶斯网络对供水管网中的泄漏进行建模并分析系统动态;随后说明该问题如何与概念漂移相联系,并可从概念漂移的视角加以考察。我们特别论证,在网络信息有限的情况下,基于模型的概念漂移解释是定位泄漏的一种有前景的工具。该方法在真实的基准场景上得到了实验评估。

One or Two Things We know about Concept Drift – A Survey on Monitoring Evolving Environments

  • paper_url: http://arxiv.org/abs/2310.15826
  • repo_url: None
  • paper_authors: Fabian Hinder, Valerie Vaquet, Barbara Hammer
  • for: This paper surveys the literature on concept drift in unsupervised data streams, a setting that is particularly relevant for monitoring and anomaly detection, tasks directly applicable to many challenges in engineering.
  • methods: The paper provides a taxonomy of existing work on drift detection and localization, precise mathematical definitions of the considered problems, and standardized experiments on parametric artificial datasets allowing for a direct comparison of different detection and localization strategies.
  • results: The survey systematically analyzes the suitability of different schemes, derives guidelines for their use in real-world scenarios, and additionally discusses the emerging topic of explaining concept drift.
    Abstract The world surrounding us is subject to constant change. These changes, frequently described as concept drift, influence many industrial and technical processes. As they can lead to malfunctions and other anomalous behavior, which may be safety-critical in many scenarios, detecting and analyzing concept drift is crucial. In this paper, we provide a literature review focusing on concept drift in unsupervised data streams. While many surveys focus on supervised data streams, so far, there is no work reviewing the unsupervised setting. However, this setting is of particular relevance for monitoring and anomaly detection which are directly applicable to many tasks and challenges in engineering. This survey provides a taxonomy of existing work on drift detection. Besides, it covers the current state of research on drift localization in a systematic way. In addition to providing a systematic literature review, this work provides precise mathematical definitions of the considered problems and contains standardized experiments on parametric artificial datasets allowing for a direct comparison of different strategies for detection and localization. Thereby, the suitability of different schemes can be analyzed systematically and guidelines for their usage in real-world scenarios can be provided. Finally, there is a section on the emerging topic of explaining concept drift.
    摘要 世界周围的变化是不断发生的,这些变化通常被称为概念融合(concept drift),它们影响了多种工业和技术过程。由于这些变化可能会导致机器人员和其他不正常行为,因此检测和分析概念融合是非常重要的。在这篇论文中,我们提供了关于概念融合在无监督数据流中的文献综述。虽然许多调查集中了监督数据流,但是这个设置尤其关键对监测和异常检测,这些任务直接适用于许多工程应用。本文提供了概念融合检测的taxonomy,并对现有研究进行了系统性的概述。此外,本文还提供了精确的数学定义,以及参考 datasets的标准实验,以便对不同策略的检测和定位进行直接比较。因此,可以系统地分析不同方案的适用性,并提供实际应用场景中的指南。最后,文章还包括一节关于解释概念融合的新趋势。

Nonlinear dimensionality reduction then and now: AIMs for dissipative PDEs in the ML era

  • paper_url: http://arxiv.org/abs/2310.15816
  • repo_url: None
  • paper_authors: Eleni D. Koronaki, Nikolaos Evangelou, Cristina P. Martin-Linares, Edriss S. Titi, Ioannis G. Kevrekidis
  • for: 本研究给出一组纯数据驱动的工作流程,用于为分布参数动力系统构建降阶模型(ROM)。
  • methods: 这些ROM是受近似惯性流形(Approximate Inertial Manifolds, AIM)理论启发的数据辅助模型,特别以Garcia-Archilla、Novo和Titi提出的后处理Galerkin方法为模板;借助机器学习工具,可以绕过对精确截断Galerkin投影和闭式修正项的需求,并在合适的潜变量未知时,用自编码器和Diffusion Maps发现潜变量并检验其可解释性。
  • results: 该方法可以将ROM表示为(a)理论坐标(Fourier系数)、(b)线性数据驱动坐标(POD模态)和/或(c)非线性数据驱动坐标(Diffusion Maps);文中既描述了黑盒模型,也描述了(理论启发、数据修正的)灰盒模型,后者用于截断Galerkin投影过于不准确、无法直接后处理的情形。整个框架通过Chafee-Infante反应扩散方程和Kuramoto-Sivashinsky耗散偏微分方程得到演示与成功验证。
    Abstract This study presents a collection of purely data-driven workflows for constructing reduced-order models (ROMs) for distributed dynamical systems. The ROMs we focus on, are data-assisted models inspired by, and templated upon, the theory of Approximate Inertial Manifolds (AIMs); the particular motivation is the so-called post-processing Galerkin method of Garcia-Archilla, Novo and Titi. Its applicability can be extended: the need for accurate truncated Galerkin projections and for deriving closed-formed corrections can be circumvented using machine learning tools. When the right latent variables are not a priori known, we illustrate how autoencoders as well as Diffusion Maps (a manifold learning scheme) can be used to discover good sets of latent variables and test their explainability. The proposed methodology can express the ROMs in terms of (a) theoretical (Fourier coefficients), (b) linear data-driven (POD modes) and/or (c) nonlinear data-driven (Diffusion Maps) coordinates. Both Black-Box and (theoretically-informed and data-corrected) Gray-Box models are described; the necessity for the latter arises when truncated Galerkin projections are so inaccurate as to not be amenable to post-processing. We use the Chafee-Infante reaction-diffusion and the Kuramoto-Sivashinsky dissipative partial differential equations to illustrate and successfully test the overall framework.
    摘要 In this study, we focus on the use of machine learning tools to circumvent the need for accurate truncated Galerkin projections and to derive closed-formed corrections in the construction of ROMs. Specifically, we use autoencoders and Diffusion Maps to discover good sets of latent variables and to test their explainability. The proposed methodology is able to express the ROMs in terms of different types of coordinates, including theoretical (Fourier coefficients), linear data-driven (POD modes), and nonlinear data-driven (Diffusion Maps) coordinates. Both Black-Box and Gray-Box models are described, with the necessity for the latter arising when the truncated Galerkin projections are inaccurate and not amenable to post-processing.The study demonstrates the applicability of the proposed methodology on two examples: the Chafee-Infante reaction-diffusion equation and the Kuramoto-Sivashinsky dissipative partial differential equation. The results show that the methodology is able to successfully construct ROMs for these systems, and that the use of machine learning tools can improve the accuracy and efficiency of the ROM construction process.
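
As one concrete example of the nonlinear data-driven coordinates mentioned above, the following is a standard Diffusion Maps sketch (Gaussian kernel, density normalisation, eigendecomposition of the Markov matrix) applied to synthetic snapshot data; it illustrates the latent-variable discovery step only, not the full ROM workflow of the paper.

```python
import numpy as np

def diffusion_maps(X, epsilon=1.0, n_coords=2, alpha=1.0):
    """Standard diffusion-maps coordinates for snapshot data X of shape (n, d):
    Gaussian kernel -> density normalisation (alpha=1 removes sampling-density
    effects) -> row-stochastic Markov matrix -> leading nontrivial eigenvectors."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)     # pairwise squared distances
    K = np.exp(-d2 / epsilon)
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q, q) ** alpha                   # density normalisation
    P = K_alpha / K_alpha.sum(axis=1, keepdims=True)        # Markov transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    # Skip the trivial constant eigenvector (eigenvalue 1).
    return vecs[:, order[1:n_coords + 1]].real, vals[order[1:n_coords + 1]].real

# Example: snapshots lying near a 1-D curve embedded in 3-D.
t = np.linspace(0, 3 * np.pi, 200)
X = np.stack([np.cos(t), np.sin(t), 0.3 * t], axis=1) + 0.01 * np.random.randn(200, 3)
coords, eigvals = diffusion_maps(X, epsilon=0.5)
print(coords.shape, eigvals)      # (200, 2) latent coordinates and their eigenvalues
```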

Good Better Best: Self-Motivated Imitation Learning for noisy Demonstrations

  • paper_url: http://arxiv.org/abs/2310.15815
  • repo_url: None
  • paper_authors: Ye Yuan, Xin Li, Yong Heng, Leiji Zhang, MingZhong Wang
  • for: 本文旨在探讨如何使用自我驱动模仿学习(Self-Motivated Imitation LEarning, SMILE)方法,逐步滤除由劣于当前策略的策略所采集的示范,从而在噪声示范中更好地学习专家策略。
  • methods: 本文使用了Diffusion Models的前向和反向过程,模拟示范行为中的专家知识散布,然后利用这些信息来预测示范行为中的扩散步数,并在自我驱动的方式下筛选掉不符合当前策略的示范行为。
  • results: 经验证明,SMILE方法可以在具有噪音示范行为的情况下,有效地学习专家示范行为,并且可以准确地排除不符合当前策略的示范行为。
    Abstract Imitation Learning (IL) aims to discover a policy by minimizing the discrepancy between the agent's behavior and expert demonstrations. However, IL is susceptible to limitations imposed by noisy demonstrations from non-expert behaviors, presenting a significant challenge due to the lack of supplementary information to assess their expertise. In this paper, we introduce Self-Motivated Imitation LEarning (SMILE), a method capable of progressively filtering out demonstrations collected by policies deemed inferior to the current policy, eliminating the need for additional information. We utilize the forward and reverse processes of Diffusion Models to emulate the shift in demonstration expertise from low to high and vice versa, thereby extracting the noise information that diffuses expertise. Then, the noise information is leveraged to predict the diffusion steps between the current policy and demonstrators, which we theoretically demonstrate its equivalence to their expertise gap. We further explain in detail how the predicted diffusion steps are applied to filter out noisy demonstrations in a self-motivated manner and provide its theoretical grounds. Through empirical evaluations on MuJoCo tasks, we demonstrate that our method is proficient in learning the expert policy amidst noisy demonstrations, and effectively filters out demonstrations with expertise inferior to the current policy.
    摘要 模仿学习(IL)旨在通过最小化智能体行为与专家示范之间的差异来学习策略。然而,IL容易受到来自非专家行为的噪声示范的限制,而又缺乏评估其专业程度的辅助信息,这构成了重大挑战。本文提出自我驱动模仿学习(SMILE),一种无需额外信息、能够逐步滤除由劣于当前策略的策略所采集示范的方法。我们利用扩散模型的前向与反向过程,模拟示范专业程度由低到高及由高到低的变化,从而提取扩散专业程度的噪声信息;随后利用该噪声信息预测当前策略与示范者之间的扩散步数,并从理论上证明其等价于二者的专业程度差距。我们进一步详细说明了如何以自我驱动的方式利用预测的扩散步数滤除噪声示范,并给出其理论依据。在MuJoCo任务上的实验表明,该方法能够在噪声示范中学习专家策略,并有效滤除专业程度低于当前策略的示范。

Amortised Inference in Neural Networks for Small-Scale Probabilistic Meta-Learning

  • paper_url: http://arxiv.org/abs/2310.15786
  • repo_url: None
  • paper_authors: Matthew Ashman, Tommy Rochussen, Adrian Weller
  • for: Amortised, meta-learned Bayesian inference over task-specific Bayesian neural networks (BNNs)
  • methods: Builds on the global inducing point variational approximation, replaces the inducing inputs with the actual data so that the variational distribution becomes a set of approximate likelihoods (one per datapoint), and obtains the parameters of each approximate likelihood by passing the datapoint through an inference network
  • results: Training the inference network across related datasets meta-learns Bayesian inference over task-specific BNNs
    Abstract The global inducing point variational approximation for BNNs is based on using a set of inducing inputs to construct a series of conditional distributions that accurately approximate the conditionals of the true posterior distribution. Our key insight is that these inducing inputs can be replaced by the actual data, such that the variational distribution consists of a set of approximate likelihoods for each datapoint. This structure lends itself to amortised inference, in which the parameters of each approximate likelihood are obtained by passing each datapoint through a meta-model known as the inference network. By training this inference network across related datasets, we can meta-learn Bayesian inference over task-specific BNNs.
    摘要 用于BNN的全局诱导点变分近似,基于一组诱导输入构造一系列条件分布,以精确近似真实后验分布的条件分布。我们的关键洞见是:这些诱导输入可以由实际数据取代,使变分分布成为针对每个数据点的一组近似似然。这一结构天然适合摊还式推断:每个近似似然的参数由一个称为推断网络的元模型对相应数据点进行处理得到。通过在相关的数据集上训练该推断网络,我们可以对特定任务的BNN进行元学习式的贝叶斯推断。

Robust Learning via Conditional Prevalence Adjustment

  • paper_url: http://arxiv.org/abs/2310.15766
  • repo_url: https://github.com/mnhng/CoPA
  • paper_authors: Minh Nguyen, Alan Q. Wang, Heejong Kim, Mert R. Sabuncu
  • for: 这项研究旨在解决医疗数据中混杂变量相关性不稳定的问题:若深度学习模型利用了这些不稳定相关性,则可能在未见过的站点上严重失效。
  • methods: 该研究提出名为CoPA(Conditional Prevalence-Adjustment,条件患病率调整)的方法,其假设(1)生成机制是稳定的,即标签Y和混杂变量Z共同生成X;(2)各站点E中不稳定的条件患病率完全解释了X与Y之间的不稳定相关性。
  • results: 在合成数据与真实数据上的实验结果显示,CoPA优于有竞争力的基线方法。
    Abstract Healthcare data often come from multiple sites in which the correlations between confounding variables can vary widely. If deep learning models exploit these unstable correlations, they might fail catastrophically in unseen sites. Although many methods have been proposed to tackle unstable correlations, each has its limitations. For example, adversarial training forces models to completely ignore unstable correlations, but doing so may lead to poor predictive performance. Other methods (e.g. Invariant risk minimization [4]) try to learn domain-invariant representations that rely only on stable associations by assuming a causal data-generating process (input X causes class label Y ). Thus, they may be ineffective for anti-causal tasks (Y causes X), which are common in computer vision. We propose a method called CoPA (Conditional Prevalence-Adjustment) for anti-causal tasks. CoPA assumes that (1) generation mechanism is stable, i.e. label Y and confounding variable(s) Z generate X, and (2) the unstable conditional prevalence in each site E fully accounts for the unstable correlations between X and Y . Our crucial observation is that confounding variables are routinely recorded in healthcare settings and the prevalence can be readily estimated, for example, from a set of (Y, Z) samples (no need for corresponding samples of X). CoPA can work even if there is a single training site, a scenario which is often overlooked by existing methods. Our experiments on synthetic and real data show CoPA beating competitive baselines.
    摘要 医疗数据往往来自多个站点,而混杂变量之间的相关性在站点之间可能差异很大。如果深度学习模型利用了这些不稳定的相关性,它们在未见过的站点上可能会灾难性地失效。虽然已有许多方法被提出来应对不稳定相关性,但各有局限。例如,对抗训练迫使模型完全忽略不稳定相关性,但这可能导致预测性能变差;另一些方法(如不变风险最小化)试图学习仅依赖稳定关联的域不变表示,但它们假设了因果式的数据生成过程(输入X导致类别标签Y),因此对反因果任务(Y导致X)可能无效,而后者在计算机视觉中很常见。我们针对反因果任务提出CoPA(条件患病率调整)方法。CoPA假设(1)生成机制是稳定的,即标签Y与混杂变量Z生成X;(2)各站点E中不稳定的条件患病率完全解释了X与Y之间的不稳定相关性。我们的关键观察是:混杂变量在医疗场景中通常会被常规记录,其患病率很容易估计,例如仅凭一组(Y, Z)样本即可(无需对应的X样本)。CoPA即使只有单个训练站点也能工作,而这种情形常被现有方法忽视。我们在合成数据与真实数据上的实验表明,CoPA优于有竞争力的基线方法。

Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization

  • paper_url: http://arxiv.org/abs/2310.15744
  • repo_url: None
  • paper_authors: Yuta Hozumi, Guo-Wei Wei
  • for: 这篇论文旨在探讨单细胞RNA测序(scRNA-seq)数据的统计分析方法,尤其是非负矩阵分解(NMF)方法在scRNA-seq数据中的应用。
  • methods: 论文提出了两种持续拉普拉斯正则化的NMF方法,即拓扑NMF(TNMF)与鲁棒拓扑NMF(rTNMF),并在12个数据集上进行了比较研究,结果显示二者显著优于其他基于NMF的方法。
  • results: 研究人员还利用TNMF和rTNMF对流行的UMAP与t-SNE降维结果进行可视化,表明这两种方法在scRNA-seq数据分析中能带来更好的结果。
    Abstract Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology due to the high dimensionality, complexity, and large scale associated with scRNA-seq data. Nonnegative matrix factorization (NMF) offers a unique approach due to its meta-gene interpretation of resulting low-dimensional components. However, NMF approaches suffer from the lack of multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods, namely, topological NMF (TNMF) and robust topological NMF (rTNMF). By employing a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We have also utilized TNMF and rTNMF for the visualization of popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE).
    摘要 单细胞RNA测序(scRNA-seq)是一项相对较新的技术,由于其数据的高维度、复杂性与大规模,在统计学、数据科学和计算生物学领域引起了极大关注。非负矩阵分解(NMF)因其所得低维成分具有"元基因"解释而提供了一种独特的分析途径,但现有NMF方法缺乏多尺度分析能力。本工作提出两种持续拉普拉斯正则化的NMF方法,即拓扑NMF(TNMF)与鲁棒拓扑NMF(rTNMF)。在总计12个数据集上的实验表明,所提出的TNMF与rTNMF显著优于其他所有基于NMF的方法。我们还将TNMF和rTNMF用于流行的UMAP与t-SNE可视化。
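
A hedged sketch of graph-regularised NMF with multiplicative updates is given below; the single-scale graph Laplacian here is a simplified stand-in for the persistent Laplacians used by TNMF/rTNMF, and the toy count matrix and k-NN cell graph are illustrative.

```python
import numpy as np

def laplacian_regularized_nmf(X, W_adj, k=10, lam=0.1, n_iter=200, seed=0):
    """Minimise ||X - U V^T||_F^2 + lam * tr(V^T L V) with multiplicative updates
    (classical graph-regularised NMF). X: (genes, cells) nonnegative counts;
    W_adj: (cells, cells) nonnegative cell-cell affinity graph; L = D - W_adj.
    This single-scale Laplacian is a simplified stand-in for the persistent
    Laplacians used by TNMF/rTNMF."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    D = np.diag(W_adj.sum(axis=1))
    eps = 1e-10
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * W_adj @ V) / (V @ (U.T @ U) + lam * D @ V + eps)
    return U, V

# Toy scRNA-seq-like matrix (500 genes x 100 cells) with a thresholded cell graph.
rng = np.random.default_rng(1)
X = rng.poisson(1.0, size=(500, 100)).astype(float)
dists = ((X.T[:, None, :] - X.T[None, :, :]) ** 2).sum(-1)
W_adj = (dists < np.quantile(dists, 0.1)).astype(float)
np.fill_diagonal(W_adj, 0)
U, V = laplacian_regularized_nmf(X, W_adj, k=8)
print(U.shape, V.shape)   # (500, 8) metagenes, (100, 8) cell loadings
```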

Improving Diffusion Models for ECG Imputation with an Augmented Template Prior

  • paper_url: http://arxiv.org/abs/2310.15742
  • repo_url: None
  • paper_authors: Alexander Jenkins, Zehua Chen, Fu Siong Ng, Danilo Mandic
  • for: 利用概率时间序列模型,提高心电图(ECG)缺失值填补与预测的精度。
  • methods: 提出模板引导的去噪扩散概率模型PulseDiff,其以适用于多种健康状况的信息性先验为条件;该先验同时考虑个体层面的特征与心搏层面(位置与幅值)的变化,并引入置信度分数以安全地利用个体健康状况信息。
  • results: 在PTBXL数据集上的实验显示,PulseDiff提升了DDPM基线模型CSDI与SSSD$^{S4}$的性能;与SSSD$^{S4}$结合时,PulseDiff在短区间缺失数据上优于领先的确定性模型,在长区间缺失数据上与之相当。
    Abstract Pulsative signals such as the electrocardiogram (ECG) are extensively collected as part of routine clinical care. However, noisy and poor-quality recordings, leading to missing values, are a major issue for signals collected using mobile health systems, decreasing the signal quality and affecting the automated downstream tasks. Recent studies have explored imputation of missing values for ECG with probabilistic time-series models. Nevertheless, in comparison with the deterministic models, their performance is still limited, as the variations across subjects and heart-beat relationships are not explicitly considered in the training objective. In this work, to improve the ECG imputation and forecasting accuracy with probabilistic models, we present an template-guided denoising diffusion probabilistic model, PulseDiff, which is conditioned an informative prior for a range of health conditions. Specifically, 1) we first extract a subject-level pulsative template from the observation as an informative prior of missing values, which captures the personal characteristics; 2) we then add beat-level stochastic shift terms on the template for prior augmentation, which considers the beat-level variance of positioning and amplitude; 3) we finally design a confidence score to consider the health condition of subject, which ensures our prior is provided in a safe way. Experiments with the PTBXL dataset reveal PulseDiff improves the performance of two strong DDPMs baseline models, CSDI and SSSD$^{S4}$, verifying our method guides the generation of DDPMs while managing the uncertainty. When combining with SSSD$^{S4}$, our PulseDiff method outperforms the leading deterministic model for short-interval missing data and is comparable for long-interval data loss.
    摘要 心电图(ECG)等脉冲式信号在常规临床护理中被广泛采集。然而,移动健康系统采集的信号往往噪声大、质量差并产生缺失值,这会降低信号质量并影响下游自动化任务。近期研究尝试用概率时间序列模型对 ECG 缺失值进行插补,但与确定性模型相比其性能仍然有限,因为训练目标没有显式考虑个体间差异以及心搏之间的关系。为提高概率模型的 ECG 插补与预测精度,我们提出了一种由模板引导的去噪扩散概率模型 PulseDiff,其条件于面向多种健康状况的信息先验。具体而言:1)我们先从观测中提取个体级脉冲模板,作为缺失值的信息先验,以刻画个人特征;2)在模板上加入心搏级的随机平移项进行先验增广,以考虑心搏在位置与幅度上的差异;3)我们设计了置信度分数以考虑个体的健康状况,确保先验以安全的方式提供。在 PTBXL 数据集上的实验表明,PulseDiff 提升了 CSDI 与 SSSD$^{S4}$ 两个强 DDPM 基线模型的性能,验证了我们的方法能在管理不确定性的同时引导 DDPM 的生成。与 SSSD$^{S4}$ 结合时,PulseDiff 在短区间缺失数据上优于领先的确定性模型,在长区间缺失上与之相当。
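
A hypothetical numpy sketch of the prior-construction step described above (subject-level template plus beat-level stochastic shifts); the beat segmentation, jitter magnitudes, and confidence weighting are assumptions for illustration, not the authors' code.

```python
import numpy as np

def pulse_prior(beats, sigma_pos=2, sigma_amp=0.05, confidence=1.0, seed=0):
    """Build an informative prior for missing ECG samples.

    beats: (n_beats, beat_len) array of aligned heartbeats (may contain NaNs).
    Returns one augmented template per beat, scaled by a confidence score."""
    rng = np.random.default_rng(seed)
    template = np.nanmean(beats, axis=0)               # subject-level pulsative template
    priors = []
    for _ in range(beats.shape[0]):
        shift = int(round(rng.normal(0, sigma_pos)))   # beat-level positioning jitter
        scale = rng.normal(1.0, sigma_amp)             # beat-level amplitude jitter
        priors.append(confidence * scale * np.roll(template, shift))
    return np.stack(priors)                            # (n_beats, beat_len) prior signal
```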

Causal Representation Learning Made Identifiable by Grouping of Observational Variables

  • paper_url: http://arxiv.org/abs/2310.15709
  • repo_url: None
  • paper_authors: Hiroshi Morioka, Aapo Hyvärinen
  • for: The paper is written for learning a causal model for hidden features in a data-driven manner, specifically addressing the problem of Causal Representation Learning (CRL) which is ill-posed and difficult to identify.
  • methods: The paper proposes a novel approach to CRL that is based on weak constraints and does not require temporal structure, intervention, or weak supervision. The approach is based on the assumption that the observational mixing exhibits a suitable grouping of the observational variables.
  • results: The paper shows that the proposed approach is statistically consistent and experimentally demonstrates superior CRL performances compared to state-of-the-art baselines. Additionally, the paper demonstrates the robustness of the approach against latent confounders and causal cycles.
    Abstract A topic of great current interest is Causal Representation Learning (CRL), whose goal is to learn a causal model for hidden features in a data-driven manner. Unfortunately, CRL is severely ill-posed since it is a combination of the two notoriously ill-posed problems of representation learning and causal discovery. Yet, finding practical identifiability conditions that guarantee a unique solution is crucial for its practical applicability. Most approaches so far have been based on assumptions on the latent causal mechanisms, such as temporal causality, or existence of supervision or interventions; these can be too restrictive in actual applications. Here, we show identifiability based on novel, weak constraints, which requires no temporal structure, intervention, nor weak supervision. The approach is based assuming the observational mixing exhibits a suitable grouping of the observational variables. We also propose a novel self-supervised estimation framework consistent with the model, prove its statistical consistency, and experimentally show its superior CRL performances compared to the state-of-the-art baselines. We further demonstrate its robustness against latent confounders and causal cycles.
    摘要 Currently, a topic of great interest is Causal Representation Learning (CRL), which aims to learn a causal model for hidden features in a data-driven manner. However, CRL is severely ill-posed because it combines the two notoriously ill-posed problems of representation learning and causal discovery. Finding practical identifiability conditions that guarantee a unique solution is crucial for its practical applicability. Most existing approaches rely on assumptions about the latent causal mechanisms, such as temporal causality or the existence of supervision or interventions, which can be too restrictive in real-world applications.In this study, we propose a novel approach to identifiability based on weak constraints that do not require temporal structure, intervention, or weak supervision. The approach is based on the assumption that the observational mixing exhibits a suitable grouping of the observational variables. We also propose a novel self-supervised estimation framework that is consistent with the model, prove its statistical consistency, and experimentally show its superior CRL performance compared to state-of-the-art baselines. Furthermore, we demonstrate its robustness against latent confounders and causal cycles.

Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

  • paper_url: http://arxiv.org/abs/2310.15681
  • repo_url: None
  • paper_authors: Shintaro Nakamura, Masashi Sugiyama
  • for: 这个论文是关于实值 combinatorial 纯探索多臂抓拍机的 fixed-budget 设定下的研究。
  • methods: 论文首先介绍了 Combinatorial Successive Assignment (CSA) 算法,这是第一个可以在 exponentially large 的动作类型下标识最佳动作的算法。 然后,论文还介绍了另一种名为 Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR) 算法,用于对动作类型的大小为 polynomial 的情况,并证明其是最佳的。
  • results: 论文通过对先前方法进行实验比较,显示了自己的算法的优越性。
    Abstract We study the real-valued combinatorial pure exploration of the multi-armed bandit in the fixed-budget setting. We first introduce the Combinatorial Successive Asign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large with respect to the number of arms. We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent. Then, we introduce another algorithm named the Minimax Combinatorial Successive Accepts and Rejects (Minimax-CombSAR) algorithm for the case where the size of the action class is polynomial, and show that it is optimal, which matches a lower bound. Finally, we experimentally compare the algorithms with previous methods and show that our algorithm performs better.
    摘要 我们研究固定预算设定下的实值组合纯探索多臂老虎机问题。我们首先提出组合逐次分配(CSA)算法,这是第一个即使动作集规模相对于臂数呈指数级增长也能识别最优动作的算法。我们证明 CSA 算法错误概率的上界与下界仅相差指数中的一个对数因子。随后,针对动作集规模为多项式的情形,我们提出 Minimax 组合逐次接受拒绝(Minimax-CombSAR)算法,并证明其匹配下界,即是最优的。最后,我们在实验中与已有方法进行比较,结果表明我们的算法表现更好。

Interactive Generalized Additive Model and Its Applications in Electric Load Forecasting

  • paper_url: http://arxiv.org/abs/2310.15662
  • repo_url: None
  • paper_authors: Linxiao Yang, Rui Ren, Xinyue Gu, Liang Sun
  • for: 预测电力负荷,帮助电力系统规划和管理。
  • methods: 我们提出了一种可互动的 generalized additive model(GAM),利用划分式线性函数,可以在限量数据或缺乏数据情况下提高预测性能。
  • results: 在公开基准数据集和电力数据集上,我们的交互式 GAM 优于当前最先进方法,并且在极端天气事件下也表现出良好的泛化能力。我们基于该模型开发了一个易用的网页工具,并已将其集成到我们的 eForecaster 产品中。
    Abstract Electric load forecasting is an indispensable component of electric power system planning and management. Inaccurate load forecasting may lead to the threat of outages or a waste of energy. Accurate electric load forecasting is challenging when there is limited data or even no data, such as load forecasting in holiday, or under extreme weather conditions. As high-stakes decision-making usually follows after load forecasting, model interpretability is crucial for the adoption of forecasting models. In this paper, we propose an interactive GAM which is not only interpretable but also can incorporate specific domain knowledge in electric power industry for improved performance. This boosting-based GAM leverages piecewise linear functions and can be learned through our efficient algorithm. In both public benchmark and electricity datasets, our interactive GAM outperforms current state-of-the-art methods and demonstrates good generalization ability in the cases of extreme weather events. We launched a user-friendly web-based tool based on interactive GAM and already incorporated it into our eForecaster product, a unified AI platform for electricity forecasting.
    摘要 电力负荷预测是电力系统规划与管理中不可或缺的环节。不准确的负荷预测可能导致停电风险或能源浪费。在数据有限甚至没有数据的情况下(例如节假日或极端天气条件下)进行准确的负荷预测十分困难。由于负荷预测之后往往是高风险决策,模型的可解释性对预测模型的落地至关重要。本文提出一种交互式广义可加模型(GAM),它不仅可解释,还能融入电力行业的领域知识以提升性能。该模型基于 boosting,利用分段线性函数,并可通过我们提出的高效算法进行学习。在公开基准数据集和电力数据集上,我们的交互式 GAM 优于当前最先进方法,并在极端天气事件下表现出良好的泛化能力。我们基于交互式 GAM 开发了一个易用的网页工具,并已将其集成到我们的 eForecaster 产品(一个统一的电力预测 AI 平台)中。
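
To make the "piecewise linear shape function" idea concrete, here is a small sketch that fits one additive component with a hinge (ReLU) basis by plain least squares; the knot placement and the least-squares fit are illustrative assumptions, whereas the paper's interactive GAM is boosting-based and supports domain constraints.

```python
import numpy as np

def fit_piecewise_linear(x, y, knots):
    """Fit y ~ f(x) with f piecewise linear: basis = [1, x, (x - k)_+ for each knot]."""
    B = np.column_stack([np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots])
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return coef

def predict_piecewise_linear(x, knots, coef):
    B = np.column_stack([np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots])
    return B @ coef

# Example: learn a load-vs-temperature shape with breakpoints at 10 and 25 degrees.
rng = np.random.default_rng(0)
temp = np.linspace(-5, 35, 200)
load = 50 - 0.8 * np.minimum(temp, 10) + 1.5 * np.maximum(temp - 25, 0) + rng.normal(0, 1, 200)
coef = fit_piecewise_linear(temp, load, knots=[10.0, 25.0])
```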

Enhancing Traffic Prediction with Learnable Filter Module

  • paper_url: http://arxiv.org/abs/2310.16063
  • repo_url: None
  • paper_authors: Yuanshao Zhu, Yongchao Ye, Xiangyu Zhao, James J. Q. Yu
  • for: 提高交通预测模型的准确率, addresses the challenge of modeling future traffic conditions by proposing a learnable filter module to adaptively filter out noise in traffic data.
  • methods: 使用 Fourier 变换将数据转移到频域,并基于频谱特征对噪声进行过滤。然后使用反 Fourier 变换将过滤后的数据转换回时域。
  • results: 实验结果表明,提出的模块可以有效地 Mitigate 噪声,提高交通预测性能。
    Abstract Modeling future traffic conditions often relies heavily on complex spatial-temporal neural networks to capture spatial and temporal correlations, which can overlook the inherent noise in the data. This noise, often manifesting as unexpected short-term peaks or drops in traffic observation, is typically caused by traffic accidents or inherent sensor vibration. In practice, such noise can be challenging to model due to its stochastic nature and can lead to overfitting risks if a neural network is designed to learn this behavior. To address this issue, we propose a learnable filter module to filter out noise in traffic data adaptively. This module leverages the Fourier transform to convert the data to the frequency domain, where noise is filtered based on its pattern. The denoised data is then recovered to the time domain using the inverse Fourier transform. Our approach focuses on enhancing the quality of the input data for traffic prediction models, which is a critical yet often overlooked aspect in the field. We demonstrate that the proposed module is lightweight, easy to integrate with existing models, and can significantly improve traffic prediction performance. Furthermore, we validate our approach with extensive experimental results on real-world datasets, showing that it effectively mitigates noise and enhances prediction accuracy.
    摘要 未来交通状况的建模通常高度依赖复杂的时空神经网络来捕捉空间与时间相关性,这往往忽略了数据中固有的噪声。这类噪声通常表现为交通观测中出乎意料的短期峰值或骤降,多由交通事故或传感器自身振动引起。在实践中,这种噪声由于其随机性而难以建模,若让神经网络去学习这种行为,还会带来过拟合风险。为解决这一问题,我们提出了一个可学习的滤波模块,用于自适应地滤除交通数据中的噪声。该模块利用傅里叶变换将数据转换到频域,根据噪声的模式在频域中进行滤除,再通过逆傅里叶变换将去噪后的数据恢复到时域。我们的方法着眼于提升交通预测模型输入数据的质量,这是该领域中关键却常被忽视的一环。我们证明所提模块轻量、易于与现有模型集成,并能显著提升交通预测性能。我们还在真实数据集上进行了大量实验,结果表明该方法能有效抑制噪声并提高预测精度。
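
A minimal PyTorch sketch of the idea described above (FFT, learnable per-frequency weighting, inverse FFT); the module name, the real-valued gain parameterization, and the initialization are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LearnableFilter(nn.Module):
    """Filter a time series in the frequency domain with a learnable gain per bin."""
    def __init__(self, seq_len: int):
        super().__init__()
        # The rfft of a length-L signal has L // 2 + 1 frequency bins.
        self.gain = nn.Parameter(torch.ones(seq_len // 2 + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len); filtering acts along the last dimension.
        spec = torch.fft.rfft(x, dim=-1)                     # to the frequency domain
        spec = spec * self.gain                              # learnable per-frequency weighting
        return torch.fft.irfft(spec, n=x.size(-1), dim=-1)   # back to the time domain

# Usage: denoise a batch of traffic series before the forecasting backbone.
x = torch.randn(8, 96)                  # 8 series, 96 time steps each
x_denoised = LearnableFilter(96)(x)     # same shape, trainable end-to-end
```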

Momentum Gradient-based Untargeted Attack on Hypergraph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.15656
  • repo_url: None
  • paper_authors: Yang Chen, Stjepan Picek, Zhonglin Ye, Zhaoyang Wang, Haixing Zhao
  • for: 这篇论文研究超图神经网络(HGNNs)在超图相关任务中的对抗鲁棒性。HGNNs 凭借高阶表示能力在各类任务上表现出色,但深度学习模型容易受到对抗攻击,而现有研究大多集中在图神经网络(GNNs)上,针对 HGNNs 的攻击研究仍然很少。这篇论文试图填补这一空白。
  • methods: 论文提出了一种新的面向无目标攻击的 HGNNs 攻击模型 MGHGA,其核心思想是通过修改节点特征来攻击 HGNNs。论文考虑 HGNNs 的训练过程,并在超图建模之前利用代理模型实施攻击。MGHGA 包括特征选择和特征修改两个模块:在特征选择模块中,使用动量梯度机制选择被攻击节点的特征;在特征修改模块中,采用直接修改和符号梯度两种特征生成方法,使 MGHGA 适用于离散和连续数据集。
  • results: 我们在五个基准数据集上进行了广泛实验,验证 MGHGA 在节点分类和视觉对象分类任务中的攻击性能。结果显示,MGHGA 相比基线模型平均提升约 2%。
    Abstract Hypergraph Neural Networks (HGNNs) have been successfully applied in various hypergraph-related tasks due to their excellent higher-order representation capabilities. Recent works have shown that deep learning models are vulnerable to adversarial attacks. Most studies on graph adversarial attacks have focused on Graph Neural Networks (GNNs), and the study of adversarial attacks on HGNNs remains largely unexplored. In this paper, we try to reduce this gap. We design a new HGNNs attack model for the untargeted attack, namely MGHGA, which focuses on modifying node features. We consider the process of HGNNs training and use a surrogate model to implement the attack before hypergraph modeling. Specifically, MGHGA consists of two parts: feature selection and feature modification. We use a momentum gradient mechanism to choose the attack node features in the feature selection module. In the feature modification module, we use two feature generation approaches (direct modification and sign gradient) to enable MGHGA to be employed on discrete and continuous datasets. We conduct extensive experiments on five benchmark datasets to validate the attack performance of MGHGA in the node and the visual object classification tasks. The results show that MGHGA improves performance by an average of 2% compared to the than the baselines.
    摘要 超图神经网络(HGNNs)凭借出色的高阶表示能力,已成功应用于多种与超图相关的任务。近期研究表明深度学习模型容易受到对抗攻击,但现有图对抗攻击研究大多集中在图神经网络(GNNs)上,针对 HGNNs 的对抗攻击研究仍然十分有限。本文尝试缩小这一差距,设计了一种面向无目标攻击的新型 HGNNs 攻击模型 MGHGA,其核心是修改节点特征。我们考虑 HGNNs 的训练过程,并在超图建模之前利用代理模型实施攻击。具体而言,MGHGA 包含特征选择与特征修改两个模块:在特征选择模块中,我们使用动量梯度机制选取被攻击节点的特征;在特征修改模块中,我们采用直接修改和符号梯度两种特征生成方式,使 MGHGA 可同时应用于离散和连续数据集。我们在五个基准数据集上进行了大量实验,验证了 MGHGA 在节点分类和视觉对象分类任务中的攻击性能,结果表明 MGHGA 相比基线平均提升约 2%。
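
For intuition about the "momentum gradient" ingredient, here is a generic momentum-gradient sign attack on continuous input features (in the spirit of MI-FGSM); it is an illustrative stand-in, not the authors' MGHGA, which operates on hypergraph node features via a surrogate model.

```python
import torch

def momentum_sign_attack(model, x, y, loss_fn, eps=0.05, steps=10, mu=0.9):
    """Generic momentum-gradient attack on continuous input features (illustrative)."""
    x_adv = x.clone().detach().requires_grad_(True)
    g = torch.zeros_like(x)                           # accumulated (momentum) gradient
    for _ in range(steps):
        loss = loss_fn(model(x_adv), y)               # untargeted: push the loss up
        grad, = torch.autograd.grad(loss, x_adv)
        g = mu * g + grad / (grad.abs().mean() + 1e-12)    # normalise, then accumulate
        x_adv = (x_adv + (eps / steps) * g.sign()).detach().requires_grad_(True)
    return x_adv.detach()
```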

Deceptive Fairness Attacks on Graphs via Meta Learning

  • paper_url: http://arxiv.org/abs/2310.15653
  • repo_url: https://github.com/jiank2/fate
  • paper_authors: Jian Kang, Yinglong Xia, Ross Maciejewski, Jiebo Luo, Hanghang Tong
  • for: 本研究旨在Answering the question of how to launch poisoning attacks on graph learning models to exacerbate bias in a deceptive manner.
  • methods: 本研究使用bi-level优化问题和meta学习基础结构(FATE)来实现攻击。FATE 可以应用于不同的公平定义和图学习模型,以及任意的操作修改。
  • results: 实验结果表明,FATE 可以在真实世界数据集上增强图神经网络的偏见,无论是否考虑公平性。此外,FATE 还可以维持下游任务的有用性。本研究提供了对不公正公平图学习的抗 adversarial 性的新的理解,并可能为未来的研究提供指导。
    Abstract We study deceptive fairness attacks on graphs to answer the following question: How can we achieve poisoning attacks on a graph learning model to exacerbate the bias deceptively? We answer this question via a bi-level optimization problem and propose a meta learning-based framework named FATE. FATE is broadly applicable with respect to various fairness definitions and graph learning models, as well as arbitrary choices of manipulation operations. We further instantiate FATE to attack statistical parity and individual fairness on graph neural networks. We conduct extensive experimental evaluations on real-world datasets in the task of semi-supervised node classification. The experimental results demonstrate that FATE could amplify the bias of graph neural networks with or without fairness consideration while maintaining the utility on the downstream task. We hope this paper provides insights into the adversarial robustness of fair graph learning and can shed light on designing robust and fair graph learning in future studies.
    摘要 我们研究欺骗性公平攻击图进行回答:如何通过欺骗攻击图学模型,扩大偏见?我们通过二级优化问题回答这个问题,并提出了一种基于元学习的框架名为FATE。FATE在不同的公平定义和图学模型之间都是广泛应用的,同时还可以针对各种杂化操作进行配置。我们进一步实现FATE来攻击统计均衡和个人公平在图神经网络上。我们在实际世界数据集上进行了广泛的实验评估,结果表明FATE可以在不考虑公平情况下或者考虑公平情况下增强图神经网络的偏见,同时保持下游任务的实用性。我们希望这篇论文可以提供关于公平 graph learning 的抗攻击性和未来研究的灵感。

Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models

  • paper_url: http://arxiv.org/abs/2310.15648
  • repo_url: https://github.com/fschmid56/efficientat
  • paper_authors: Florian Schmid, Khaled Koutini, Gerhard Widmer
  • for: 这个论文的目的是提高大规模的音频数据集上的音频标签任务的效果,并且比对传统的卷积神经网络(CNN)和变换器(Transformer)的性能。
  • methods: 该论文使用知识蒸馏(Knowledge Distillation)技术,将 Transformer 的知识蒸馏到高效卷积神经网络(efficient CNN)中以提高其性能。此外,论文还引入了由动态非线性(dynamic non-linearities)、动态卷积(dynamic convolutions)和注意力机制(attention mechanisms)构成的动态 CNN 块,进一步提升高效 CNN 的容量与性能。
  • results: 实验结果表明,引入的动态卷积块和非线性等技术可以提高效率卷积神经网络的性能,并且在AudioSet和多个下游任务上达到或超过Transformer的性能。此外,该论文还发现,动态卷积块和非线性等技术可以提高下游任务的灵活性和泛化能力。
    Abstract The introduction of large-scale audio datasets, such as AudioSet, paved the way for Transformers to conquer the audio domain and replace CNNs as the state-of-the-art neural network architecture for many tasks. Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks. However, current popular Audio Spectrogram Transformers are demanding in terms of computational complexity compared to CNNs. Recently, we have shown that, by employing Transformer-to-CNN Knowledge Distillation, efficient CNNs can catch up with and even outperform Transformers on large datasets. In this work, we extend this line of research and increase the capacity of efficient CNNs by introducing dynamic CNN blocks, constructed of dynamic non-linearities, dynamic convolutions and attention mechanisms. We show that these dynamic CNNs outperform traditional efficient CNNs, in terms of the performance-complexity trade-off and parameter efficiency, at the task of audio tagging on the large-scale AudioSet. Our experiments further indicate that the introduced dynamic CNNs achieve better performance on downstream tasks and scale up well, attaining Transformer performance and even outperforming them on AudioSet and several downstream tasks.
    摘要 AudioSet 等大规模音频数据集的出现,为 Transformer 占领音频领域、取代 CNN 成为众多任务上最先进的神经网络架构铺平了道路。音频频谱图 Transformer 擅长利用大规模数据集,训练出强大的预训练模型,在下游任务上微调后性能超过 CNN。然而,目前流行的音频频谱图 Transformer 的计算复杂度明显高于 CNN。最近我们已经证明,通过 Transformer 到 CNN 的知识蒸馏,高效 CNN 可以在大规模数据集上追平甚至超过 Transformer。在本工作中,我们延续这一研究路线,通过引入由动态非线性、动态卷积和注意力机制构成的动态 CNN 块来提升高效 CNN 的容量。我们证明,这些动态 CNN 在大规模 AudioSet 音频标注任务上,在性能与复杂度的权衡以及参数效率方面都优于传统的高效 CNN。我们的实验进一步表明,所引入的动态 CNN 在下游任务上取得更好的性能并具备良好的可扩展性,达到并在 AudioSet 和若干下游任务上超过了 Transformer 的性能。
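
As a sketch of what a "dynamic convolution" block can look like, the following module mixes several kernels with input-dependent attention weights (CondConv/dynamic-convolution style); the layer sizes and the global-average routing are assumptions, and this is not the EfficientAT code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv1d(nn.Module):
    """Input-dependent mixture of K convolution kernels (illustrative sketch)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4):
        super().__init__()
        assert kernel_size % 2 == 1, "odd kernel keeps the sequence length"
        self.kernels = nn.Parameter(0.02 * torch.randn(num_kernels, out_ch, in_ch, kernel_size))
        self.router = nn.Linear(in_ch, num_kernels)          # attention over kernels

    def forward(self, x):                                    # x: (batch, in_ch, time)
        attn = F.softmax(self.router(x.mean(dim=-1)), dim=-1)     # (batch, K)
        # Per-example kernel mixture, applied via a grouped convolution.
        w = torch.einsum('bk,koit->boit', attn, self.kernels)     # (batch, out, in, k)
        b, in_ch, t = x.shape
        out = F.conv1d(x.reshape(1, b * in_ch, t),
                       w.reshape(-1, in_ch, w.size(-1)),
                       groups=b, padding=w.size(-1) // 2)
        return out.reshape(b, -1, t)

# Example: a (batch=4, channels=64, frames=100) spectrogram-like input.
y = DynamicConv1d(64, 128)(torch.randn(4, 64, 100))          # -> (4, 128, 100)
```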

Light up that Droid! On the Effectiveness of Static Analysis Features against App Obfuscation for Android Malware Detection

  • paper_url: http://arxiv.org/abs/2310.15645
  • repo_url: None
  • paper_authors: Borja Molina-Coronado, Antonio Ruggia, Usue Mori, Alessio Merlo, Alexander Mendiburu, Jose Miguel-Alonso
  • for: 本研究旨在探讨针对Android平台的Machine Learning(ML)恶意软件检测器是否能够抵抗增强难以理解的软件(obfuscation)的影响。
  • methods: 本研究使用了多种常见的静态分析特征,包括代码字串、API调用记录、函数名称和参数等,并使用了多种静态分析工具来检测和分析软件的增强难以理解。
  • results: 研究发现,增强难以理解可以对静态分析特征产生一定的影响,但是certain features still retain their validity for ML malware detection even in the presence of obfuscation。基于这些发现,本研究提出了一种robust against obfuscation的ML恶意软件检测器,并比当前状态的检测器表现更高。
    Abstract Malware authors have seen obfuscation as the mean to bypass malware detectors based on static analysis features. For Android, several studies have confirmed that many anti-malware products are easily evaded with simple program transformations. As opposed to these works, ML detection proposals for Android leveraging static analysis features have also been proposed as obfuscation-resilient. Therefore, it needs to be determined to what extent the use of a specific obfuscation strategy or tool poses a risk for the validity of ML malware detectors for Android based on static analysis features. To shed some light in this regard, in this article we assess the impact of specific obfuscation techniques on common features extracted using static analysis and determine whether the changes are significant enough to undermine the effectiveness of ML malware detectors that rely on these features. The experimental results suggest that obfuscation techniques affect all static analysis features to varying degrees across different tools. However, certain features retain their validity for ML malware detection even in the presence of obfuscation. Based on these findings, we propose a ML malware detector for Android that is robust against obfuscation and outperforms current state-of-the-art detectors.
    摘要 恶意软件作者通常将代码混淆视为绕过基于静态分析特征的恶意软件检测器的手段。对于 Android,多项研究已证实许多反恶意软件产品可以通过简单的程序变换轻松规避。与这些工作相对,也有研究提出基于静态分析特征的机器学习(ML)检测方法,并声称其对混淆具有鲁棒性。因此,需要确定特定的混淆策略或工具在多大程度上会威胁基于静态分析特征的 Android ML 恶意软件检测器的有效性。为此,本文评估了特定混淆技术对常用静态分析特征的影响,并判断这些变化是否足以削弱依赖这些特征的 ML 检测器的效果。实验结果表明,混淆技术对不同工具提取的各类静态分析特征都会产生不同程度的影响,但某些特征即使在混淆情况下仍保持其对 ML 恶意软件检测的有效性。基于这些发现,我们提出了一种对混淆具有鲁棒性的 Android ML 恶意软件检测器,其性能优于当前最先进的检测器。

Guaranteed Coverage Prediction Intervals with Gaussian Process Regression

  • paper_url: http://arxiv.org/abs/2310.15641
  • repo_url: None
  • paper_authors: Harris Papadopoulos
  • for: 提高 Gaussian Process Regression(GPR)中的预测不确定性估计的准确性,以便更好地评估模型的性能。
  • methods: 基于 Conformal Prediction(CP)机器学习框架对 GPR 进行扩展,以保证预测区间达到所需的覆盖率,即使模型设定完全错误。
  • results: 在实验中,提出的方法比现有方法更为有效,可以更好地评估模型的性能。
    Abstract Gaussian Process Regression (GPR) is a popular regression method, which unlike most Machine Learning techniques, provides estimates of uncertainty for its predictions. These uncertainty estimates however, are based on the assumption that the model is well-specified, an assumption that is violated in most practical applications, since the required knowledge is rarely available. As a result, the produced uncertainty estimates can become very misleading; for example the prediction intervals (PIs) produced for the 95\% confidence level may cover much less than 95\% of the true labels. To address this issue, this paper introduces an extension of GPR based on a Machine Learning framework called, Conformal Prediction (CP). This extension guarantees the production of PIs with the required coverage even when the model is completely misspecified. The proposed approach combines the advantages of GPR with the valid coverage guarantee of CP, while the performed experimental results demonstrate its superiority over existing methods.
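
A minimal sketch of wrapping a GP regressor with split-conformal prediction intervals; the plain absolute-residual score and the data split are illustrative assumptions — the paper's construction may differ (for example, by normalizing scores with the GP's predictive variance).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def conformal_gpr_intervals(X_tr, y_tr, X_cal, y_cal, X_test, alpha=0.05):
    """Split-conformal prediction intervals around a GP regressor (illustrative)."""
    gpr = GaussianProcessRegressor().fit(X_tr, y_tr)
    scores = np.abs(y_cal - gpr.predict(X_cal))             # calibration residuals
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    mu = gpr.predict(X_test)
    return mu - q, mu + q      # coverage >= 1 - alpha even if the GP is misspecified

# Example on synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (300, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.standard_normal(300)
lo, hi = conformal_gpr_intervals(X[:150], y[:150], X[150:250], y[150:250], X[250:])
```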

Contextual directed acyclic graphs

  • paper_url: http://arxiv.org/abs/2310.15627
  • repo_url: https://github.com/ryan-thompson/contextualdag.jl
  • paper_authors: Ryan Thompson, Edwin V. Bonilla, Robert Kohn
  • for: 本研究旨在解决由观测数据提供的导向无环图(DAG)结构估计问题,具体是在各个个体基础上的图结构异同。
  • methods: 本研究使用神经网络将各个体的上下文特征映射为以加权邻接矩阵表示的 DAG 结构。该神经网络带有一个新颖的投影层,使输出矩阵稀疏并满足最近提出的无环性刻画。
  • results: 我们的实验表明,在现有方法失效的情形下,新方法能够成功恢复真实的上下文特定图结构。
    Abstract Estimating the structure of directed acyclic graphs (DAGs) from observational data remains a significant challenge in machine learning. Most research in this area concentrates on learning a single DAG for the entire population. This paper considers an alternative setting where the graph structure varies across individuals based on available "contextual" features. We tackle this contextual DAG problem via a neural network that maps the contextual features to a DAG, represented as a weighted adjacency matrix. The neural network is equipped with a novel projection layer that ensures the output matrices are sparse and satisfy a recently developed characterization of acyclicity. We devise a scalable computational framework for learning contextual DAGs and provide a convergence guarantee and an analytical gradient for backpropagating through the projection layer. Our experiments suggest that the new approach can recover the true context-specific graph where existing approaches fail.
    摘要 从观测数据中估计有向无环图(DAG)结构仍是机器学习中的一大挑战。该领域的大多数研究都集中在为整个总体学习单一的 DAG。本文考虑另一种设定:图结构随个体可用的"上下文"特征而变化。我们通过一个神经网络来处理这一上下文 DAG 问题,该网络将上下文特征映射为以加权邻接矩阵表示的 DAG。网络配备了一个新颖的投影层,确保输出矩阵稀疏并满足最近提出的无环性刻画。我们设计了可扩展的计算框架来学习上下文 DAG,给出了收敛保证,并推导了可用于反向传播穿过投影层的解析梯度。实验表明,在现有方法失效的情形下,新方法能够成功恢复真实的上下文特定图结构。
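
To illustrate how a context-to-DAG mapping and a differentiable acyclicity measure can look, here is a sketch using the NOTEARS trace-exponential characterization h(A) = tr(exp(A∘A)) − d (Zheng et al., 2018); note this is a related but different device from the paper's projection layer, and the small MLP is an assumption.

```python
import torch
import torch.nn as nn

def notears_acyclicity(A: torch.Tensor) -> torch.Tensor:
    """NOTEARS measure h(A) = tr(exp(A * A)) - d; h(A) = 0 iff A encodes a DAG."""
    return torch.trace(torch.linalg.matrix_exp(A * A)) - A.size(0)

class ContextToAdjacency(nn.Module):
    """Map per-individual context features to a d x d weighted adjacency matrix."""
    def __init__(self, context_dim: int, d: int):
        super().__init__()
        self.d = d
        self.net = nn.Sequential(nn.Linear(context_dim, 64), nn.ReLU(),
                                 nn.Linear(64, d * d))

    def forward(self, c: torch.Tensor) -> torch.Tensor:
        A = self.net(c).reshape(-1, self.d, self.d)
        return A * (1.0 - torch.eye(self.d, device=A.device))   # zero the diagonal

# One batch of contexts -> one DAG estimate per individual; acyclicity would be
# encouraged during training with a penalty such as notears_acyclicity(A[i]).
A = ContextToAdjacency(context_dim=5, d=8)(torch.randn(16, 5))
```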

Accelerating Split Federated Learning over Wireless Communication Networks

  • paper_url: http://arxiv.org/abs/2310.15584
  • repo_url: None
  • paper_authors: Ce Xu, Jinxuan Li, Yuan Liu, Yushi Ling, Miaowen Wen
  • for: 这篇论文旨在提高深度神经网络(DNN)在内存限制的Edge设备上的应用,但DNN的参数数量和计算复杂度使得它实现困难。
  • methods: 这篇论文提出了一种模型分割/拆分的方法,将DNN分成两部分,分别在设备和服务器上进行培训或推导。
  • results: 论文的实验结果显示,这种方法可以将系统延迟降至最低,同时提高准确性。
    Abstract The development of artificial intelligence (AI) provides opportunities for the promotion of deep neural network (DNN)-based applications. However, the large amount of parameters and computational complexity of DNN makes it difficult to deploy it on edge devices which are resource-constrained. An efficient method to address this challenge is model partition/splitting, in which DNN is divided into two parts which are deployed on device and server respectively for co-training or co-inference. In this paper, we consider a split federated learning (SFL) framework that combines the parallel model training mechanism of federated learning (FL) and the model splitting structure of split learning (SL). We consider a practical scenario of heterogeneous devices with individual split points of DNN. We formulate a joint problem of split point selection and bandwidth allocation to minimize the system latency. By using alternating optimization, we decompose the problem into two sub-problems and solve them optimally. Experiment results demonstrate the superiority of our work in latency reduction and accuracy improvement.
    摘要 人工智能(AI)的发展为基于深度神经网络(DNN)的应用提供了推广机会。然而,DNN 的参数量和计算复杂度使其难以部署在资源受限的边缘设备上。解决这一挑战的一种有效方法是模型切分/拆分,即将 DNN 分为两部分,分别部署在设备端和服务器端进行协同训练或协同推理。在本文中,我们考虑一种拆分联邦学习(SFL)框架,它结合了联邦学习(FL)的并行模型训练机制与拆分学习(SL)的模型拆分结构。我们考虑了各设备具有不同 DNN 切分点的实际异构场景,并将切分点选择与带宽分配建模为一个联合优化问题,以最小化系统延迟。借助交替优化,我们将该问题分解为两个子问题并分别求得最优解。实验结果表明,我们的方法在降低延迟和提升准确率方面具有优势。
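
A toy PyTorch sketch of the model-splitting idea (the device runs the layers before the cut, the server runs the rest); the tiny MLP and the cut position are assumptions, and the paper's wireless split-point/bandwidth optimization is not modeled here.

```python
import torch
import torch.nn as nn

# Toy split of a model at a cut layer: the device runs the front part, sends the
# intermediate ("smashed") activation, and the server runs the remaining layers.
full_model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),     # device-side layers
    nn.Linear(64, 64), nn.ReLU(),     # server-side layers
    nn.Linear(64, 10),
)
cut = 2                               # split point: device keeps layers [0, cut)
device_part, server_part = full_model[:cut], full_model[cut:]

x = torch.randn(16, 32)               # a local mini-batch on the device
smashed = device_part(x)              # sent to the server over the wireless link
logits = server_part(smashed)         # server completes the forward pass
# During SFL training the server backpropagates to `smashed`, returns its gradient,
# and the device finishes backpropagation through its local layers.
```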

Identifiable Latent Polynomial Causal Models Through the Lens of Change

  • paper_url: http://arxiv.org/abs/2310.15580
  • repo_url: None
  • paper_authors: Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi
  • for: 本研究旨在探讨 causal representation learning 如何揭示隐藏的高级 causal 表示,并提供可靠的确认方法。
  • methods: 本文提出了一种基于 latent causal variable 的变化分析方法,以确保 causal 模型的可靠性。
  • results: 本研究发现了一种扩展 latent causal models 的方法,包括非线性 causal 关系和不同的噪音分布。此外,本文还提出了一种新的 empirical estimation 方法,并通过实验验证了其理论成果。
    Abstract Causal representation learning aims to unveil latent high-level causal representations from observed low-level data. One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability. A recent breakthrough explores identifiability by leveraging the change of causal influences among latent causal variables across multiple environments \citep{liu2022identifying}. However, this progress rests on the assumption that the causal relationships among latent causal variables adhere strictly to linear Gaussian models. In this paper, we extend the scope of latent causal models to involve nonlinear causal relationships, represented by polynomial models, and general noise distributions conforming to the exponential family. Additionally, we investigate the necessity of imposing changes on all causal parameters and present partial identifiability results when part of them remains unchanged. Further, we propose a novel empirical estimation method, grounded in our theoretical finding, that enables learning consistent latent causal representations. Our experimental results, obtained from both synthetic and real-world data, validate our theoretical contributions concerning identifiability and consistency.
    摘要 因果表示学习旨在以数据驱动的方式从观测到的低层数据中揭示潜在的高层因果表示,其核心任务之一是为识别这些潜在因果模型提供可靠保证,即可识别性。最近的一项突破通过利用潜在因果变量之间的因果影响在多个环境中的变化来研究可识别性,但这一进展依赖于潜在因果变量之间的因果关系严格服从线性高斯模型的假设。本文将潜在因果模型的范围扩展到由多项式模型表示的非线性因果关系,以及服从指数族的一般噪声分布。此外,我们探讨了是否必须对所有因果参数施加变化,并在部分参数保持不变时给出了部分可识别性结果。我们还基于理论发现提出了一种新的经验估计方法,能够学习一致的潜在因果表示。在合成数据和真实数据上的实验结果验证了我们关于可识别性和一致性的理论贡献。

From Oja’s Algorithm to the Multiplicative Weights Update Method with Applications

  • paper_url: http://arxiv.org/abs/2310.15559
  • repo_url: None
  • paper_authors: Dan Garber
  • for: 这篇论文主要研究在随机主成分分析(stochastic PCA)背景下被广泛研究的在线算法——Oja 算法。
  • methods: 论文给出一个简单而新颖的观察:当 Oja 算法应用于任意(不一定随机的)一列具有共同特征向量的对称矩阵时,其 regret 可以直接用著名的 multiplicative weights update 方法在专家建议预测问题上的 regret 来界定。
  • results: 论文讨论了该结果在 $\mathbb{R}^n$ 单位球面上二次型优化问题中的若干应用。
    Abstract Oja's algorithm is a well known online algorithm studied mainly in the context of stochastic principal component analysis. We make a simple observation, yet to the best of our knowledge a novel one, that when applied to any (not necessarily stochastic) sequence of symmetric matrices which share common eigenvectors, the regret of Oja's algorithm could be directly bounded in terms of the regret of the well known multiplicative weights update method for the problem of prediction with expert advice. Several applications to optimization with quadratic forms over the unit sphere in $\mathbb{R}^n$ are discussed.
    摘要 Oja 算法是一种著名的在线算法,主要在随机主成分分析的背景下被研究。我们给出一个简单但(据我们所知)新颖的观察:当该算法应用于任意(不一定随机的)一列具有共同特征向量的对称矩阵时,Oja 算法的 regret 可以直接用著名的 multiplicative weights update 方法在专家建议预测问题上的 regret 来界定。我们还讨论了该结果在 $\mathbb{R}^n$ 单位球面上二次型优化问题中的若干应用。
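
For reference, a minimal numpy sketch of the classical Oja update for the leading eigenvector of a stream of symmetric matrices; the step size and the synthetic stream are assumptions for illustration.

```python
import numpy as np

def oja_top_eigenvector(matrices, eta=0.1, seed=0):
    """Classical Oja-style update for the leading eigenvector of a stream of
    symmetric matrices A_1, A_2, ... (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(matrices[0].shape[0])
    w /= np.linalg.norm(w)
    for A in matrices:
        w += eta * (A @ w)               # ascent step on the quadratic form w^T A w
        w /= np.linalg.norm(w)           # project back onto the unit sphere
    return w

# Example: a stream of matrices sharing the same top eigenvector v.
d = 20
v = np.ones(d) / np.sqrt(d)
stream = [np.outer(v, v) + 0.1 * np.eye(d) for _ in range(200)]
print(abs(oja_top_eigenvector(stream) @ v))   # close to 1
```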

Transfer learning for day-ahead load forecasting: a case study on European national electricity demand time series

  • paper_url: http://arxiv.org/abs/2310.15555
  • repo_url: None
  • paper_authors: Alexandros-Menelaos Tzortzis, Sotiris Pelekis, Evangelos Spiliotis, Spiros Mouzakitis, John Psarras, Dimitris Askounis
  • for: 这个研究的目的是提高短期负载预测(STLF)的精度,并 investigate the performance of transfer learning(TL)in STLF.
  • methods: 本研究使用了一个流行的神经网络模型(NN),并进行了一个 clustering 分析来找出Series Similarity。
  • results: 研究结果显示,TL 的预测精度优于传统方法,尤其是在结合聚类分析时。
    Abstract Short-term load forecasting (STLF) is crucial for the daily operation of power grids. However, the non-linearity, non-stationarity, and randomness characterizing electricity demand time series renders STLF a challenging task. Various forecasting approaches have been proposed for improving STLF, including neural network (NN) models which are trained using data from multiple electricity demand series that may not necessary include the target series. In the present study, we investigate the performance of this special case of STLF, called transfer learning (TL), by considering a set of 27 time series that represent the national day-ahead electricity demand of indicative European countries. We employ a popular and easy-to-implement NN model and perform a clustering analysis to identify similar patterns among the series and assist TL. In this context, two different TL approaches, with and without the clustering step, are compiled and compared against each other as well as a typical NN training setup. Our results demonstrate that TL can outperform the conventional approach, especially when clustering techniques are considered.
    摘要 In this study, we investigate the performance of transfer learning (TL) in STLF by using a set of 27 time series that represent the national day-ahead electricity demand of indicative European countries. We employ a popular and easy-to-implement NN model and perform a clustering analysis to identify similar patterns among the series and assist TL. We compare two different TL approaches, with and without the clustering step, against each other and a typical NN training setup.Our results show that TL can outperform the conventional approach, especially when clustering techniques are considered. By leveraging the similarities among the time series, TL can improve the accuracy of STLF and provide more reliable predictions for power grid operations.

Algorithmic Regularization in Tensor Optimization: Towards a Lifted Approach in Matrix Sensing

  • paper_url: http://arxiv.org/abs/2310.15549
  • repo_url: None
  • paper_authors: Ziye Ma, Javad Lavaei, Somayeh Sojoudi
  • for: 研究Gradient Descent(GD)在机器学习模型中的泛化能力,尤其是在矩阵优化问题中。
  • methods: 使用GD方法优化矩阵优化问题,特别是在 lifted matrix sensing 框架中。
  • results: 研究发现,对于 sufficient small initialization scale,GD 可以导致矩阵变为约等于rank-1矩阵,并且得到了稳定的解。这些结论 highlights the importance of tensor parametrization of matrix sensing, in combination with first-order methods, in achieving global optimality in such problems.
    Abstract Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization, promoting compact representations. In this work, we examine the role of GD in inducing implicit regularization for tensor optimization, particularly within the context of the lifted matrix sensing framework. This framework has been recently proposed to address the non-convex matrix sensing problem by transforming spurious solutions into strict saddles when optimizing over symmetric, rank-1 tensors. We show that, with sufficiently small initialization scale, GD applied to this lifted problem results in approximate rank-1 tensors and critical points with escape directions. Our findings underscore the significance of the tensor parametrization of matrix sensing, in combination with first-order methods, in achieving global optimality in such problems.
    摘要 梯度下降(GD)对机器学习模型的泛化至关重要,因为它会引入隐式正则化,促使模型学到紧凑的表示。在本工作中,我们研究 GD 在张量优化中引入隐式正则化的作用,特别是在"提升"(lifted)矩阵感知框架下。该框架最近被提出,用于处理非凸矩阵感知问题:在对称秩一张量上进行优化时,它能将虚假解转化为严格鞍点。我们证明,当初始化尺度足够小时,对该提升问题应用 GD 会得到近似秩一的张量以及带有逃逸方向的临界点。我们的发现强调了矩阵感知的张量参数化与一阶方法相结合,对在此类问题中取得全局最优的重要意义。

Symmetry-preserving graph attention network to solve routing problems at multiple resolutions

  • paper_url: http://arxiv.org/abs/2310.15543
  • repo_url: https://github.com/hysonlab/multires-np-hard
  • paper_authors: Cong Dao Tran, Thong Bach, Truong Son Hy
  • for: 解决Travelling Salesperson Problems (TSPs) 和 Vehicle Routing Problems (VRPs) 的精度和计算时间问题,通过采用机器学习 (ML) 方法。
  • methods: 提出了首次完全具有对称性的模型和训练方法,能够解决 combinatorial problems。同时,我们还提出了一种Multiresolution scheme和Equivariant Graph Attention network (mEGAT) 架构,可以充分利用图的多尺度结构, especial for large and long-range graphs。
  • results: 对比于现有基eline,我们的模型表现出了显著的改善,并证明了对称保持和多尺度是解决 combinatorial problems 的关键因素。我们的代码公开 disponível于 GitHub 上(https://github.com/HySonLab/Multires-NP-hard)。
    Abstract Travelling Salesperson Problems (TSPs) and Vehicle Routing Problems (VRPs) have achieved reasonable improvement in accuracy and computation time with the adaptation of Machine Learning (ML) methods. However, none of the previous works completely respects the symmetries arising from TSPs and VRPs including rotation, translation, permutation, and scaling. In this work, we introduce the first-ever completely equivariant model and training to solve combinatorial problems. Furthermore, it is essential to capture the multiscale structure (i.e. from local to global information) of the input graph, especially for the cases of large and long-range graphs, while previous methods are limited to extracting only local information that can lead to a local or sub-optimal solution. To tackle the above limitation, we propose a Multiresolution scheme in combination with Equivariant Graph Attention network (mEGAT) architecture, which can learn the optimal route based on low-level and high-level graph resolutions in an efficient way. In particular, our approach constructs a hierarchy of coarse-graining graphs from the input graph, in which we try to solve the routing problems on simple low-level graphs first, then utilize that knowledge for the more complex high-level graphs. Experimentally, we have shown that our model outperforms existing baselines and proved that symmetry preservation and multiresolution are important recipes for solving combinatorial problems in a data-driven manner. Our source code is publicly available at https://github.com/HySonLab/Multires-NP-hard
    摘要 旅行销售人员问题 (TSP) 和车辆路径问题 (VRP) 已经通过机器学习 (ML) 方法实现了一定的改进,但是之前的所有工作都没有完全尊重 TSP 和 VRP 中出现的对称性,包括旋转、平移、Permutation 和缩放。在这项工作中,我们介绍了第一个完全对称的模型和训练方法,用于解决 combinatorial 问题。此外,我们认为需要捕捉输入图的多尺度结构(即从本地到全局信息),特别是在大型和长距离图中,而前一些方法只能提取本地信息,可能导致本地或不优解。为解决这些限制,我们提出了一种多尺度 schemes 和对称图注意力网络 (mEGAT) 架构,可以高效地学习输入图的优化路径。具体来说,我们的方法构建了输入图的层次结构,从输入图中构建一系列粗化图,并在这些粗化图上解决路径问题。我们首先在粗化图上解决路径问题,然后利用该知识来解决更复杂的高级图。实验证明,我们的模型已经超过了现有的基eline,并证明对称保持和多尺度是解决 combinatorial 问题的重要配方。我们的源代码可以在 上获取。

Privacy Amplification for Matrix Mechanisms

  • paper_url: http://arxiv.org/abs/2310.15526
  • repo_url: None
  • paper_authors: Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta
  • for: 这篇论文旨在为较新的最先进算法提供可用的隐私放大(privacy amplification)分析方法。
  • methods: 论文的方法基于数据采样随机性的隐私放大分析,以提供更紧的差分隐私保证。
  • results: 论文提出了名为 MMCC 的算法,可对任意一般矩阵机制进行基于采样的隐私放大分析。MMCC 的分析几乎是紧的:当 $\epsilon \to 0$ 时,其结果逼近下界。此外,论文证明了相关输出可以通过对先前输出进行条件化而按独立输出来分析,从而使隐私放大分析适用于新的最先进算法。
    Abstract Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but, is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In this paper, we propose "MMCC", the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $\epsilon\to0$. To analyze correlated outputs in MMCC, we prove that they can be analyzed as if they were independent, by conditioning them on prior outputs. Our "conditional composition theorem" has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our amplification algorithm also has practical empirical utility: we show it leads to significant improvement in the privacy-utility trade-offs for DP-FTRL algorithms on standard benchmarks.
    摘要 “隐私增强”利用数据选择的随机性提供更紧密的隐私保证(DP)。这个分析是DP-SGD的成功之关键,但是不能directly应用于最新的state-of-the-art算法。这是因为这些算法,称为DP-FTRL,使用矩阵机制来添加相关的随机变数而不是独立的随机变数,如DP-SGD。在这篇论文中,我们提出“MMCC”,第一个可以分析隐私增强通过抽样的任何矩阵机制。MMCC几乎是紧致的,随着$\epsilon$趋向0,它接近下界。在分析相关的输出时,我们证明可以将它们视为独立的,通过对它们的先前输出进行条件。我们称之为“条件汇总定理”,它具有广泛的实用性:我们使用它来显示DP-FTRL中添加到二元树DP-FTRL的随机变数可以对应DP-SGD中的随机变数。我们的增强算法也有实际的实验实用性:我们显示它对DP-FTRL算法的隐私-功能贡献进行了重要的改善。

On the Inherent Privacy Properties of Discrete Denoising Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.15524
  • repo_url: None
  • paper_authors: Rongzhe Wei, Eleonora Kreačić, Haoyu Wang, Haoteng Yin, Eli Chien, Vamsi K. Potluru, Pan Li
  • for: 该论文旨在探讨对散列模型(DDMs)的散列数据生成的隐私保护性。
  • methods: 该论文采用了数学理论来描述DDMs的隐私保护性,具体来说是对每个数据点进行分别的散列Diffusion Models(pDP)的隐私泄露分析,以提供数据预处理方法来减少DDMs生成的隐私风险。
  • results: 该论文的研究结果表明,在使用DDMs生成散列数据时,隐私保护性可以在不同的数据规模下得到保障,并且隐私泄露会随着散列率的增加而减少。此外,该论文还通过实验验证了其理论结论,并在真实世界的数据集上进行了验证。
    Abstract Privacy concerns have led to a surge in the creation of synthetic datasets, with diffusion models emerging as a promising avenue. Although prior studies have performed empirical evaluations on these models, there has been a gap in providing a mathematical characterization of their privacy-preserving capabilities. To address this, we present the pioneering theoretical exploration of the privacy preservation inherent in discrete diffusion models (DDMs) for discrete dataset generation. Focusing on per-instance differential privacy (pDP), our framework elucidates the potential privacy leakage for each data point in a given training dataset, offering insights into data preprocessing to reduce privacy risks of the synthetic dataset generation via DDMs. Our bounds also show that training with $s$-sized data points leads to a surge in privacy leakage from $(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDP to $(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDP during the transition from the pure noise to the synthetic clean data phase, and a faster decay in diffusion coefficients amplifies the privacy guarantee. Finally, we empirically verify our theoretical findings on both synthetic and real-world datasets.
    摘要 隐私方面的考量推动了合成数据集的大量产生,而扩散模型正成为一条颇具前景的技术路线。尽管已有工作对这类模型进行了实证评估,但对其隐私保护能力的数学刻画仍属空白。为此,我们首次从理论上探讨了离散扩散模型(DDM)在生成离散数据集时固有的隐私保护性质。我们聚焦于逐实例差分隐私(pDP),刻画了训练集中每个数据点可能的隐私泄露,为通过 DDM 生成合成数据时的数据预处理提供了降低隐私风险的依据。我们的界还表明,使用 $s$ 个数据点训练时,在从纯噪声阶段过渡到合成干净数据阶段的过程中,隐私泄露会由 $(\epsilon, \mathcal{O}(\frac{1}{s^2\epsilon}))$-pDP 上升到 $(\epsilon, \mathcal{O}(\frac{1}{s\epsilon}))$-pDP,而扩散系数衰减越快,隐私保证越强。最后,我们在合成数据集和真实数据集上对理论结果进行了实证验证。

Graph Attention-based Deep Reinforcement Learning for solving the Chinese Postman Problem with Load-dependent costs

  • paper_url: http://arxiv.org/abs/2310.15516
  • repo_url: https://github.com/hysonlab/chinese_postman_problem
  • paper_authors: Cong Dao Tran, Truong Son Hy
  • for: solves the Chinese Postman Problem with load-dependent costs (CPP-LC) using a novel deep reinforcement learning (DRL) framework.
  • methods: proposes a DRL model consisting of an encoder and decoder to address the CPP-LC challenge effectively, and a bio-inspired meta-heuristic solution based on Evolutionary Algorithm (EA).
  • results: outperforms existing meta-heuristic methods such as Iterative Local Search (ILS) and Variable Neighborhood Search (VNS) regarding both solution quality and running time, while the EA attains the best solution quality at the cost of much longer running time.
    Abstract Recently, Deep reinforcement learning (DRL) models have shown promising results in solving routing problems. However, most DRL solvers are commonly proposed to solve node routing problems, such as the Traveling Salesman Problem (TSP). Meanwhile, there has been limited research on applying neural methods to arc routing problems, such as the Chinese Postman Problem (CPP), since they often feature irregular and complex solution spaces compared to TSP. To fill these gaps, this paper proposes a novel DRL framework to address the CPP with load-dependent costs (CPP-LC) (Corberan et al., 2018), which is a complex arc routing problem with load constraints. The novelty of our method is two-fold. First, we formulate the CPP-LC as a Markov Decision Process (MDP) sequential model. Subsequently, we introduce an autoregressive model based on DRL, namely Arc-DRL, consisting of an encoder and decoder to address the CPP-LC challenge effectively. Such a framework allows the DRL model to work efficiently and scalably to arc routing problems. Furthermore, we propose a new bio-inspired meta-heuristic solution based on Evolutionary Algorithm (EA) for CPP-LC. Extensive experiments show that Arc-DRL outperforms existing meta-heuristic methods such as Iterative Local Search (ILS) and Variable Neighborhood Search (VNS) proposed by (Corberan et al., 2018) on large benchmark datasets for CPP-LC regarding both solution quality and running time; while the EA gives the best solution quality with much more running time. We release our C++ implementations for metaheuristics such as EA, ILS and VNS along with the code for data generation and our generated data at https://github.com/HySonLab/Chinese_Postman_Problem
    摘要 近期,深度强化学习(DRL)模型已经在路径问题上显示了扎实的成果。然而,大多数DRL解决方案都是针对节点路径问题,如旅行销售人问题(TSP)。同时,对于弧路径问题,如中国邮政员问题(CPP),有限的研究尝试使用神经网络方法。为了填补这些漏洞,这篇论文提出了一种新的DRL框架,用于解决CPP-LC(Corberan et al., 2018),这是一种复杂的弧路径问题,具有负荷依赖成本。我们的创新在两个方面:1. 我们将CPP-LC转化为Markov决策过程(MDP)sequential模型。2. 我们引入了基于DRL的自适应模型,即弧路径模型(Arc-DRL),其包括编码器和解码器,以有效地解决CPP-LC挑战。这种框架使得DRL模型可以有效地和扩展地应用于弧路径问题。此外,我们还提出了一种基于进化算法(EA)的新生物启发式méta-希望解方法,用于CPP-LC。我们在大量的 benchmark 数据集上进行了广泛的实验,结果显示,Arc-DRL在解决CPP-LC问题时,不仅在解决质量和运行时间两个方面都超过了现有的méta-希望方法,如迭代本地搜索(ILS)和变化邻居搜索(VNS),而且EA提供了最佳的解决质量,但需要更多的运行时间。我们在https://github.com/HySonLab/Chinese_Postman_Problem中发布了我们在metaheuristics、EA、ILS和VNS等方面的C++实现,以及我们生成的数据和代码。

Interpretable Survival Analysis for Heart Failure Risk Prediction

  • paper_url: http://arxiv.org/abs/2310.15472
  • repo_url: None
  • paper_authors: Mike Van Ness, Tomas Bosschieter, Natasha Din, Andrew Ambrosy, Alexander Sandhu, Madeleine Udell
  • for: 这篇论文是针对医疗研究中的存生分析问题(time-to-event analysis),专门针对医院数据库中的心血管疾病风险评估。
  • methods: 这篇论文提出了一个新的生存分析管线,兼顾可解释性与精度:使用改进版的 survival stacking 将生存分析问题转化为分类问题,利用 ControlBurn 进行特征选择,并使用可解释提升机(Explainable Boosting Machines)生成可解释的预测。
  • results: 这篇论文使用大规模医院数据库,预测心血管疾病的风险,并 achieve state-of-the-art 性能。同时,管线还提供了有趣和新的风险因素统计,对于医疗应用有很好的帮助。
    Abstract Survival analysis, or time-to-event analysis, is an important and widespread problem in healthcare research. Medical research has traditionally relied on Cox models for survival analysis, due to their simplicity and interpretability. Cox models assume a log-linear hazard function as well as proportional hazards over time, and can perform poorly when these assumptions fail. Newer survival models based on machine learning avoid these assumptions and offer improved accuracy, yet sometimes at the expense of model interpretability, which is vital for clinical use. We propose a novel survival analysis pipeline that is both interpretable and competitive with state-of-the-art survival models. Specifically, we use an improved version of survival stacking to transform a survival analysis problem to a classification problem, ControlBurn to perform feature selection, and Explainable Boosting Machines to generate interpretable predictions. To evaluate our pipeline, we predict risk of heart failure using a large-scale EHR database. Our pipeline achieves state-of-the-art performance and provides interesting and novel insights about risk factors for heart failure.
    摘要 To address this challenge, we propose a novel survival analysis pipeline that balances interpretability and competitive performance with state-of-the-art survival models. Our pipeline consists of three key components:1. Survival stacking: We use an improved version of survival stacking to transform the survival analysis problem into a classification problem.2. ControlBurn: We employ ControlBurn for feature selection to identify the most relevant features for predicting the risk of heart failure.3. Explainable Boosting Machines: We use Explainable Boosting Machines to generate interpretable predictions and provide novel insights into risk factors for heart failure.To evaluate our pipeline, we use a large-scale EHR database to predict the risk of heart failure. Our results show that our pipeline achieves state-of-the-art performance and provides interesting and novel insights into risk factors for heart failure.In summary, our proposed survival analysis pipeline offers a balance between interpretability and competitive performance, making it a valuable tool for healthcare researchers and clinicians.
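
To make "survival stacking" concrete, here is a small sketch of the basic transform that turns right-censored survival data into a person-period classification dataset (one row per subject per risk interval, label = event in that interval); the discretization grid is an assumption, and the paper uses an improved variant of this idea.

```python
import numpy as np
import pandas as pd

def survival_stack(time, event, X, cut_points):
    """Expand (time, event, covariates) into person-period rows for a classifier."""
    rows = []
    for t_i, e_i, x_i in zip(time, event, X):
        for j, (lo, hi) in enumerate(zip(cut_points[:-1], cut_points[1:])):
            if t_i <= lo:
                break                      # subject already left the risk set
            label = int(e_i and lo < t_i <= hi)
            rows.append({**{f"x{k}": v for k, v in enumerate(x_i)},
                         "interval": j, "event": label})
    return pd.DataFrame(rows)

# Toy example: 3 subjects, yearly intervals up to 3 years.
df = survival_stack(time=[2.5, 1.2, 3.0], event=[1, 0, 1],
                    X=[[0.1, 1.0], [0.4, 0.0], [0.9, 1.0]],
                    cut_points=[0.0, 1.0, 2.0, 3.0])
# `df` can now be fed to any binary classifier (e.g., an Explainable Boosting Machine).
```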

EKGNet: A 10.96μW Fully Analog Neural Network for Intra-Patient Arrhythmia Classification

  • paper_url: http://arxiv.org/abs/2310.15466
  • repo_url: None
  • paper_authors: Benyamin Haghi, Lin Ma, Sahin Lale, Anima Anandkumar, Azita Emami
  • for: 这篇论文是用于开发一个低功耗且精准的电子心脏病诊断系统,特别是用于识别不同类型的心脏病。
  • methods: 这篇论文提出了一个整合式的方法,结合了分析计算和深度学习,用于识别电子心脏病(ECG)的不同类型。它提出了一个名为EKGNet的硬件高效的完全分析型心脏病识别架构,这个架构可以实现高精度和低功耗。
  • results: 实验结果显示,这个方法在PhysioNet的MIT-BIH和PTB诊断数据集上取得了平均的平衡精度为95%和94.25%,用于内部患者的心脏病识别和myocardial infarction(MI)识别。这个创新的方法具有优秀的可转移性和应用性,可以用于开发低功耗且精准的心脏病诊断系统。
    Abstract We present an integrated approach by combining analog computing and deep learning for electrocardiogram (ECG) arrhythmia classification. We propose EKGNet, a hardware-efficient and fully analog arrhythmia classification architecture that archives high accuracy with low power consumption. The proposed architecture leverages the energy efficiency of transistors operating in the subthreshold region, eliminating the need for analog-to-digital converters (ADC) and static random access memory (SRAM). The system design includes a novel analog sequential Multiply-Accumulate (MAC) circuit that mitigates process, supply voltage, and temperature variations. Experimental evaluations on PhysioNet's MIT-BIH and PTB Diagnostics datasets demonstrate the effectiveness of the proposed method, achieving average balanced accuracy of 95% and 94.25% for intra-patient arrhythmia classification and myocardial infarction (MI) classification, respectively. This innovative approach presents a promising avenue for developing low-power arrhythmia classification systems with enhanced accuracy and transferability in biomedical applications.
    摘要 我们提出了一种集成的方法,通过结合分析计算和深度学习来进行电cardiogram(ECG)动 irregularity分类。我们提出了EKGNet,一种具有高精度低功耗的完全分析动 irregularity分类架构。这种架构利用了晶体管在低阈值区工作的能源效率,从而消除了ADC和SRAM的需求。系统设计包括一种新的分析顺序Multiply-Accumulate(MAC)电路,以降低过程、供电电压和温度变化的影响。实验评估PhysioNet的MIT-BIH和PTB诊断数据集表明,提出的方法有效,实现平均权衡精度为95%和94.25%,分别用于患者内部动 irregularity分类和心肺病(MI)分类。这一创新方法对生物医学应用中的低功耗动 irregularity分类系统具有扩展精度和可贷 Transferability。

Private Learning with Public Features

  • paper_url: http://arxiv.org/abs/2310.15454
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang
  • for: 这个论文研究了一类Private Learning问题,其中数据是私有和公共特征的Join。这种情况通常出现在个性化推荐或广告预测中,其中个人相关的特征是敏感信息,而物品相关的特征(如电影或歌曲推荐或广告)不需要保护。问题是公共特征的存在下,私人算法可以达到更高的利用吗?我们给出了一个答案,即多个编码器模型中的一个编码器可以操作公共特征。
  • methods: 我们开发了新的算法,利用这种分离来只保护必要的充分统计(而不是添加噪声到梯度)。这种方法在线性回归中有保证的利用提升,并在两个标准私人推荐 benchmark 上达到了状态的艺术,证明了适应私人-公共特征分离的方法的重要性。
  • results: 我们的实验结果表明,在两个标准私人推荐 benchmark 上,我们的方法可以达到状态的艺术,并且在线性回归中有保证的利用提升。这证明了适应私人-公共特征分离的方法的重要性。
    Abstract We study a class of private learning problems in which the data is a join of private and public features. This is often the case in private personalization tasks such as recommendation or ad prediction, in which features related to individuals are sensitive, while features related to items (the movies or songs to be recommended, or the ads to be shown to users) are publicly available and do not require protection. A natural question is whether private algorithms can achieve higher utility in the presence of public features. We give a positive answer for multi-encoder models where one of the encoders operates on public features. We develop new algorithms that take advantage of this separation by only protecting certain sufficient statistics (instead of adding noise to the gradient). This method has a guaranteed utility improvement for linear regression, and importantly, achieves the state of the art on two standard private recommendation benchmarks, demonstrating the importance of methods that adapt to the private-public feature separation.
    摘要 我们研究一类private学习问题,其数据是private和public特征的Join。这经常发生在private个性化任务中,如推荐或广告预测,个人相关的特征是敏感信息,而Item(电影或歌曲的推荐或用户展示广告)相关的特征公开可用并不需要保护。自然地出现一个问题:private算法在公共特征的存在下是否可以 дости得更高的实用性。我们给出了一个答案,即多个encoder模型中的一个encoder操作于公共特征。我们开发了新的算法,利用这种分离来只保护特定的充分统计(而不是添加噪声到梯度)。这种方法在线性回归中有保证的实用性提升,并在两个标准的private推荐benchmark上达到了state of the art,证明了对private-public特征分离的方法的重要性。

General Identifiability and Achievability for Causal Representation Learning

  • paper_url: http://arxiv.org/abs/2310.15450
  • repo_url: https://github.com/bvarici/score-general-id-crl
  • paper_authors: Burak Varıcı, Emre Acartürk, Karthikeyan Shanmugam, Ali Tajer
  • for: This paper focuses on developing a method for causal representation learning (CRL) under a general nonparametric causal latent model and a general transformation model.
  • methods: The method uses two hard uncoupled interventions per node in the latent causal graph to establish identifiability and achievability results. The algorithm leverages score variations across different environments to estimate the inverse of the transformer and, subsequently, the latent variables.
  • results: The paper guarantees perfect recovery of the latent causal model and variables under uncoupled interventions, and recovers the existing identifiability result for two hard coupled interventions. The method does not require additional faithfulness assumptions when observational data is available.
    Abstract This paper focuses on causal representation learning (CRL) under a general nonparametric causal latent model and a general transformation model that maps the latent data to the observational data. It establishes \textbf{identifiability} and \textbf{achievability} results using two hard \textbf{uncoupled} interventions per node in the latent causal graph. Notably, one does not know which pair of intervention environments have the same node intervened (hence, uncoupled environments). For identifiability, the paper establishes that perfect recovery of the latent causal model and variables is guaranteed under uncoupled interventions. For achievability, an algorithm is designed that uses observational and interventional data and recovers the latent causal model and variables with provable guarantees for the algorithm. This algorithm leverages score variations across different environments to estimate the inverse of the transformer and, subsequently, the latent variables. The analysis, additionally, recovers the existing identifiability result for two hard \textbf{coupled} interventions, that is when metadata about the pair of environments that have the same node intervened is known. It is noteworthy that the existing results on non-parametric identifiability require assumptions on interventions and additional faithfulness assumptions. This paper shows that when observational data is available, additional faithfulness assumptions are unnecessary.

An accelerated first-order regularized momentum descent ascent algorithm for stochastic nonconvex-concave minimax problems

  • paper_url: http://arxiv.org/abs/2310.15448
  • repo_url: None
  • paper_authors: Huiling Zhang, Zi Xu
  • for: Solving stochastic nonconvex-concave minimax problems.
  • methods: An accelerated first-order regularized momentum descent ascent algorithm (FORMDA).
  • results: The algorithm attains an iteration complexity of $\tilde{\mathcal{O}}(\varepsilon^{-6.5})$ for finding an $\varepsilon$-stationary point, which is the best-known complexity bound for single-loop algorithms on stochastic nonconvex-concave minimax problems.
    Abstract Stochastic nonconvex minimax problems have attracted wide attention in machine learning, signal processing and many other fields in recent years. In this paper, we propose an accelerated first-order regularized momentum descent ascent algorithm (FORMDA) for solving stochastic nonconvex-concave minimax problems. The iteration complexity of the algorithm is proved to be $\tilde{\mathcal{O}}(\varepsilon^{-6.5})$ to obtain an $\varepsilon$-stationary point, which achieves the best-known complexity bound for single-loop algorithms to solve the stochastic nonconvex-concave minimax problems under the stationarity of the objective function.
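The sketch below is a generic single-loop stochastic momentum descent-ascent update with a small quadratic regularizer on the dual variable; it illustrates the flavor of such algorithms but is not the exact FORMDA update rule or its step-size schedule:

```python
import numpy as np

def momentum_descent_ascent(grad_x, grad_y, x0, y0, steps=2000,
                            eta_x=0.01, eta_y=0.05, beta=0.9, reg=1e-3, seed=0):
    """Generic single-loop stochastic momentum descent-ascent (illustrative only).

    x takes a momentum descent step; y takes an ascent step on the objective
    minus a small quadratic regularizer (reg / 2) * ||y||^2 that smooths the
    concave subproblem. This is not the exact FORMDA update rule.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y0, dtype=float).copy()
    m = np.zeros_like(x)
    for _ in range(steps):
        gx = grad_x(x, y) + rng.normal(0.0, 0.01, size=x.shape)   # stochastic gradient in x
        gy = grad_y(x, y) + rng.normal(0.0, 0.01, size=y.shape)   # stochastic gradient in y
        m = beta * m + (1.0 - beta) * gx
        x = x - eta_x * m
        y = y + eta_y * (gy - reg * y)
    return x, y

# Toy objective f(x, y) = y * sin(x) - 0.5 * y^2, which is concave in y.
gx = lambda x, y: y * np.cos(x)
gy = lambda x, y: np.sin(x) - y
x_star, y_star = momentum_descent_ascent(gx, gy, x0=[1.0], y0=[0.0])
print(x_star, y_star)
```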

Learning Dynamics in Linear VAE: Posterior Collapse Threshold, Superfluous Latent Space Pitfalls, and Speedup with KL Annealing

  • paper_url: http://arxiv.org/abs/2310.15440
  • repo_url: None
  • paper_authors: Yuma Ichikawa, Koji Hukushima
  • for: This paper addresses a well-known problem of variational autoencoders (VAEs): the variational posterior often collapses onto the prior, which degrades the quality of representation learning.
  • methods: The study analyzes an adjustable hyperparameter $\beta$ and a strategy for annealing this parameter, known as KL annealing, via a theoretical treatment of the learning dynamics of a minimal VAE.
  • results: In the limit of large input dimensions, the learning dynamics converge to a deterministic process, enabling a detailed analysis of the generalization error. The analysis shows that the VAE first learns entangled representations and gradually acquires disentangled ones, and that once $\beta$ exceeds a certain threshold, posterior collapse becomes inevitable regardless of the learning period. Moreover, latent variables that are superfluous for the data-generative factors overfit the background noise, harming both generalization and learning convergence.
    Abstract Variational autoencoders (VAEs) face a notorious problem wherein the variational posterior often aligns closely with the prior, a phenomenon known as posterior collapse, which hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter $\beta$ and a strategy for annealing this parameter, called KL annealing, are proposed. This study presents a theoretical analysis of the learning dynamics in a minimal VAE. It is rigorously proved that the dynamics converge to a deterministic process within the limit of large input dimensions, thereby enabling a detailed dynamical analysis of the generalization error. Furthermore, the analysis shows that the VAE initially learns entangled representations and gradually acquires disentangled representations. A fixed-point analysis of the deterministic process reveals that when $\beta$ exceeds a certain threshold, posterior collapse becomes inevitable regardless of the learning period. Additionally, the superfluous latent variables for the data-generative factors lead to overfitting of the background noise; this adversely affects both generalization and learning convergence. The analysis further unveiled that appropriately tuned KL annealing can accelerate convergence.
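For readers who want the analyzed objects in code, here is a minimal sketch of a $\beta$-weighted ELBO with linear KL annealing (assumptions: a linear encoder/decoder, Gaussian posterior, squared-error reconstruction, and an illustrative warm-up schedule; none of these are claimed to match the paper's exact setup):

```python
import torch
import torch.nn as nn

class LinearVAE(nn.Module):
    """Minimal linear VAE with a Gaussian posterior (illustrative dimensions)."""
    def __init__(self, x_dim=50, z_dim=5):
        super().__init__()
        self.enc_mu = nn.Linear(x_dim, z_dim)
        self.enc_logvar = nn.Linear(x_dim, z_dim)
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def beta_elbo_loss(x, x_hat, mu, logvar, beta):
    """Squared-error reconstruction plus beta-weighted KL(q(z|x) || N(0, I))."""
    recon = ((x - x_hat) ** 2).sum(dim=1).mean()
    kl = (-0.5 * (1.0 + logvar - mu**2 - logvar.exp())).sum(dim=1).mean()
    return recon + beta * kl

def kl_annealing(step, warmup_steps, beta_max):
    """Linear KL annealing: ramp beta from 0 up to beta_max over warmup_steps."""
    return beta_max * min(1.0, step / warmup_steps)

model = LinearVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 50)                       # toy data; a real run would use a dataset
for step in range(1, 501):
    x_hat, mu, logvar = model(x)
    loss = beta_elbo_loss(x, x_hat, mu, logvar, beta=kl_annealing(step, 200, 1.0))
    opt.zero_grad()
    loss.backward()
    opt.step()
```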

Off-Policy Evaluation for Large Action Spaces via Policy Convolution

  • paper_url: http://arxiv.org/abs/2310.15433
  • repo_url: None
  • paper_authors: Noveen Sachdeva, Lequn Wang, Dawen Liang, Nathan Kallus, Julian McAuley
  • for: Accurate off-policy estimators are needed to evaluate and optimize new policies while coping with the distribution shift between the logging policy and the target policy.
  • methods: The paper introduces the Policy Convolution (PC) family of estimators, which leverage action embeddings to strategically convolve the logging and target policies, allowing the bias introduced by the shift to be controlled.
  • results: Experiments on synthetic and benchmark datasets show that PC yields remarkable mean squared error (MSE) improvements when either the action space or the policy mismatch is large, with gains of up to 5-6 orders of magnitude over existing estimators.
    Abstract Developing accurate off-policy estimators is crucial for both evaluating and optimizing for new policies. The main challenge in off-policy estimation is the distribution shift between the logging policy that generates data and the target policy that we aim to evaluate. Typically, techniques for correcting distribution shift involve some form of importance sampling. This approach results in unbiased value estimation but often comes with the trade-off of high variance, even in the simpler case of one-step contextual bandits. Furthermore, importance sampling relies on the common support assumption, which becomes impractical when the action space is large. To address these challenges, we introduce the Policy Convolution (PC) family of estimators. These methods leverage latent structure within actions -- made available through action embeddings -- to strategically convolve the logging and target policies. This convolution introduces a unique bias-variance trade-off, which can be controlled by adjusting the amount of convolution. Our experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC, especially when either the action space or policy mismatch becomes large, with gains of up to 5 - 6 orders of magnitude over existing estimators.
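The following sketch conveys the spirit of convolving policies over an action-embedding space before computing importance weights; the Gaussian kernel, bandwidth, and normalization are illustrative assumptions, not the paper's estimator:

```python
import numpy as np

def ips(rewards, pi_target, pi_logging, actions):
    """Vanilla inverse propensity scoring: unbiased under common support, but high variance."""
    idx = np.arange(len(actions))
    w = pi_target[idx, actions] / pi_logging[idx, actions]
    return np.mean(w * rewards)

def convolve_policy(pi, embeddings, bandwidth=1.0):
    """Smooth a policy over similar actions with a Gaussian kernel on action embeddings.

    Illustrative only: the kernel choice, bandwidth, and normalization are
    assumptions, not the paper's estimator. A larger bandwidth means more
    smoothing, trading variance reduction against bias.
    """
    d2 = ((embeddings[:, None, :] - embeddings[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * bandwidth**2))
    K /= K.sum(axis=1, keepdims=True)          # each row of K is a distribution over actions
    return pi @ K                              # convolved policy remains a valid distribution

def pc_estimate(rewards, pi_target, pi_logging, actions, embeddings, bandwidth=1.0):
    pi_t = convolve_policy(pi_target, embeddings, bandwidth)
    pi_l = convolve_policy(pi_logging, embeddings, bandwidth)
    return ips(rewards, pi_t, pi_l, actions)

# Toy usage: 100 logged rounds, 20 actions, 4-dimensional action embeddings.
rng = np.random.default_rng(0)
n, A = 100, 20
emb = rng.normal(size=(A, 4))
pi_log = rng.dirichlet(np.ones(A), size=n)
pi_tgt = rng.dirichlet(np.ones(A), size=n)
acts = np.array([rng.choice(A, p=p) for p in pi_log])
rew = rng.binomial(1, 0.5, size=n).astype(float)
print(ips(rew, pi_tgt, pi_log, acts), pc_estimate(rew, pi_tgt, pi_log, acts, emb, bandwidth=1.0))
```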