cs.AI - 2023-09-19

LMDX: Language Model-based Document Information Extraction and Localization

  • paper_url: http://arxiv.org/abs/2309.10952
  • repo_url: None
  • paper_authors: Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, Ramya Sree Boppana, Zilong Wang, Jiaqi Mu, Hao Zhang, Nan Hua
  • for: This paper is written for the task of document information extraction, specifically for semi-structured documents, and aims to improve the state-of-the-art in this area.
  • methods: The paper introduces a new methodology called Language Model-based Document Information Extraction and Localization (LMDX), which adapts arbitrary large language models (LLMs) for document information extraction. LMDX uses a grounding mechanism to ensure that the extracted information is accurate and not hallucinated.
  • results: The paper evaluates LMDX on two benchmark datasets (VRDU and CORD) and achieves a new state-of-the-art in document information extraction. The results show that LMDX can extract singular, repeated, and hierarchical entities with high accuracy, both with and without training data. Additionally, the paper demonstrates the efficiency of LMDX in creating high-quality parsers with minimal data.
    Abstract Large Language Models (LLM) have revolutionized Natural Language Processing (NLP), improving state-of-the-art on many existing tasks and exhibiting emergent capabilities. However, LLMs have not yet been successfully applied on semi-structured document information extraction, which is at the core of many document processing workflows and consists of extracting key entities from a visually rich document (VRD) given a predefined target schema. The main obstacles to LLM adoption in that task have been the absence of layout encoding within LLMs, critical for a high quality extraction, and the lack of a grounding mechanism ensuring the answer is not hallucinated. In this paper, we introduce Language Model-based Document Information Extraction and Localization (LMDX), a methodology to adapt arbitrary LLMs for document information extraction. LMDX can do extraction of singular, repeated, and hierarchical entities, both with and without training data, while providing grounding guarantees and localizing the entities within the document. In particular, we apply LMDX to the PaLM 2-S LLM and evaluate it on VRDU and CORD benchmarks, setting a new state-of-the-art and showing how LMDX enables the creation of high quality, data-efficient parsers.
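The abstract's recipe (serialize OCR text together with a coarse layout encoding, then ask the LLM to cite the source segments so answers can be grounded and localized) can be pictured with a toy prompt builder. Everything below, including the coordinate quantization, tags, and schema wording, is an invented illustration rather than LMDX's actual prompt format.

```python
# Hypothetical layout-aware prompt construction for LLM-based extraction.
# The tags, coordinate grid, and schema phrasing are illustrative assumptions.

def build_extraction_prompt(ocr_lines, schema_entities, grid=100):
    parts = ["<document>"]
    for i, (text, x, y) in enumerate(ocr_lines):
        # quantize normalized (x, y) line coordinates to a small integer grid
        parts.append(f"[{i}] {text} {int(x * grid)}|{int(y * grid)}")
    parts += ["</document>",
              "<task>Extract the following entities as JSON and cite the "
              f"supporting line ids: {', '.join(schema_entities)}</task>"]
    return "\n".join(parts)

ocr_lines = [("Invoice #12345", 0.08, 0.05),
             ("Total due: $1,250.00", 0.08, 0.72),
             ("Due date: 2023-10-01", 0.08, 0.78)]
print(build_extraction_prompt(
    ocr_lines, ["invoice_number", "total_amount", "due_date"]))
```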

Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change

  • paper_url: http://arxiv.org/abs/2309.10945
  • repo_url: None
  • paper_authors: Paulo Pirozelli, Marcos M. José, Igor Silveira, Flávio Nakasato, Sarajane M. Peres, Anarosa A. F. Brandão, Anna H. R. Costa, Fabio G. Cozman
  • for: The paper's goal is to test the ability of machine learning models to acquire expert scientific knowledge.
  • methods: The paper uses the Pirá dataset and defines six benchmarks over it.
  • results: The paper provides several points of reference for testing machine learning models across a range of question answering tasks.
    Abstract Pir\'a is a reading comprehension dataset focused on the ocean, the Brazilian coast, and climate change, built from a collection of scientific abstracts and reports on these topics. This dataset represents a versatile language resource, particularly useful for testing the ability of current machine learning models to acquire expert scientific knowledge. Despite its potential, a detailed set of baselines has not yet been developed for Pir\'a. By creating these baselines, researchers can more easily utilize Pir\'a as a resource for testing machine learning models across a wide range of question answering tasks. In this paper, we define six benchmarks over the Pir\'a dataset, covering closed generative question answering, machine reading comprehension, information retrieval, open question answering, answer triggering, and multiple choice question answering. As part of this effort, we have also produced a curated version of the original dataset, where we fixed a number of grammar issues, repetitions, and other shortcomings. Furthermore, the dataset has been extended in several new directions, so as to face the aforementioned benchmarks: translation of supporting texts from English into Portuguese, classification labels for answerability, automatic paraphrases of questions and answers, and multiple choice candidates. The results described in this paper provide several points of reference for researchers interested in exploring the challenges provided by the Pir\'a dataset.

End-to-End Speech Recognition Contextualization with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.10917
  • repo_url: None
  • paper_authors: Egor Lakomkin, Chunyang Wu, Yassir Fathullah, Ozlem Kalinli, Michael L. Seltzer, Christian Fuegen
  • for: This paper aims to improve the performance of speech recognition models by incorporating large language models (LLMs) and contextual information.
  • methods: The authors propose a novel method that casts speech recognition as a mixed-modal language modeling task, using a pretrained LLM and providing audio features and optional text tokens for context. The system is trained in a decoder-only fashion, and the authors use adapters to add a small number of trainable parameters to unlock contextualized speech recognition capability.
  • results: The authors demonstrate a significant improvement in performance, with a 6% WER reduction when additional textual context is provided, and a 7.5% WER improvement overall and 17% WER improvement on rare words compared to a baseline contextualized RNN-T system that was trained on a much larger dataset.
    Abstract In recent years, Large Language Models (LLMs) have garnered significant attention from the research community due to their exceptional performance and generalization capabilities. In this paper, we introduce a novel method for contextualizing speech recognition models incorporating LLMs. Our approach casts speech recognition as a mixed-modal language modeling task based on a pretrained LLM. We provide audio features, along with optional text tokens for context, to train the system to complete transcriptions in a decoder-only fashion. As a result, the system is implicitly incentivized to learn how to leverage unstructured contextual information during training. Our empirical results demonstrate a significant improvement in performance, with a 6% WER reduction when additional textual context is provided. Moreover, we find that our method performs competitively and improve by 7.5% WER overall and 17% WER on rare words against a baseline contextualized RNN-T system that has been trained on more than twenty five times larger speech dataset. Overall, we demonstrate that by only adding a handful number of trainable parameters via adapters, we can unlock contextualized speech recognition capability for the pretrained LLM while keeping the same text-only input functionality.
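The adapter idea mentioned at the end of the abstract (a handful of trainable parameters added to an otherwise frozen pretrained decoder) can be sketched as a standard bottleneck adapter; the dimensions, activation, and placement below are assumptions, not the paper's configuration.

```python
# Hedged sketch of a bottleneck adapter: a small trainable module added to a
# frozen pretrained LLM; layer sizes and placement are illustrative assumptions.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, d_model: int = 1024, d_bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # residual connection keeps the frozen backbone's behaviour intact
        return hidden + self.up(self.act(self.down(hidden)))

# Typical usage: freeze the pretrained decoder, then train only the adapters
# (plus an audio-feature projection) on audio features, optional context
# tokens, and reference transcriptions.
backbone_dim = 1024
adapter = BottleneckAdapter(backbone_dim)
hidden_states = torch.randn(2, 50, backbone_dim)   # (batch, time, dim)
print(adapter(hidden_states).shape)                 # torch.Size([2, 50, 1024])
```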

Amplifying Pathological Detection in EEG Signaling Pathways through Cross-Dataset Transfer Learning

  • paper_url: http://arxiv.org/abs/2309.10910
  • repo_url: None
  • paper_authors: Mohammad-Javad Darvishi-Bayazi, Mohammad Sajjad Ghaemi, Timothee Lesort, Md Rifat Arefin, Jocelyn Faubert, Irina Rish
  • for: This paper aims to explore the effectiveness of data and model scaling, as well as cross-dataset knowledge transfer, in the context of pathology diagnosis based on EEG signals.
  • methods: The authors use a combination of data scaling, model scaling, and cross-dataset knowledge transfer to improve the performance of their target model on a low-regime dataset. They also employ a small and generic model (ShallowNet) and a larger model (TCN) to compare their performance.
  • results: The authors observe varying performance improvements through data scaling, and identify the challenges of possible negative transfer and the significance of some key components to overcome distribution shifts and potential spurious correlations. They find that a small and generic model performs well on a single dataset, while a larger model performs better on transfer and learning from a larger and diverse dataset.
    Abstract Pathology diagnosis based on EEG signals and decoding brain activity holds immense importance in understanding neurological disorders. With the advancement of artificial intelligence methods and machine learning techniques, the potential for accurate data-driven diagnoses and effective treatments has grown significantly. However, applying machine learning algorithms to real-world datasets presents diverse challenges at multiple levels. The scarcity of labelled data, especially in low regime scenarios with limited availability of real patient cohorts due to high costs of recruitment, underscores the vital deployment of scaling and transfer learning techniques. In this study, we explore a real-world pathology classification task to highlight the effectiveness of data and model scaling and cross-dataset knowledge transfer. As such, we observe varying performance improvements through data scaling, indicating the need for careful evaluation and labelling. Additionally, we identify the challenges of possible negative transfer and emphasize the significance of some key components to overcome distribution shifts and potential spurious correlations and achieve positive transfer. We see improvement in the performance of the target model on the target (NMT) datasets by using the knowledge from the source dataset (TUAB) when a low amount of labelled data was available. Our findings indicate a small and generic model (e.g. ShallowNet) performs well on a single dataset, however, a larger model (e.g. TCN) performs better on transfer and learning from a larger and diverse dataset.

Multicopy Reinforcement Learning Agents

  • paper_url: http://arxiv.org/abs/2309.10908
  • repo_url: None
  • paper_authors: Alicia P. Wolfe, Oliver Diamond, Remi Feuerman, Magdalena Kisielinska, Brigitte Goeler-Slough, Victoria Manfredi
  • for: This work addresses a multi-agent problem in which an agent creates multiple identical copies of itself to accomplish a single-agent task better or more efficiently.
  • methods: The authors propose a learning algorithm that exploits the structure of the value function to efficiently learn how to balance the advantages and costs of adding additional copies.
  • results: The study shows that in noisy environments, using multiple copies of the agent can improve task completion and efficiency.
    Abstract This paper examines a novel type of multi-agent problem, in which an agent makes multiple identical copies of itself in order to achieve a single agent task better or more efficiently. This strategy improves performance if the environment is noisy and the task is sometimes unachievable by a single agent copy. We propose a learning algorithm for this multicopy problem which takes advantage of the structure of the value function to efficiently learn how to balance the advantages and costs of adding additional copies.

Artificial Intelligence-Enabled Intelligent Assistant for Personalized and Adaptive Learning in Higher Education

  • paper_url: http://arxiv.org/abs/2309.10892
  • repo_url: None
  • paper_authors: Ramteja Sajja, Yusuf Sermet, Muhammed Cikmaz, David Cwiertny, Ibrahim Demir
  • for: This paper develops an Artificial Intelligence-Enabled Intelligent Assistant (AIIA) for personalized and adaptive learning in higher education.
  • methods: The system uses advanced AI and natural language processing techniques to create an interactive and engaging learning platform that reduces learners' cognitive load by providing easy access to information, facilitating knowledge assessment, and delivering personalized learning support.
  • results: The study finds that the AIIA system can understand and respond to student inquiries, generate quizzes and flashcards, and offer personalized learning pathways, with the potential to improve student learning outcomes, engagement, and satisfaction.
    Abstract This paper presents a novel framework, Artificial Intelligence-Enabled Intelligent Assistant (AIIA), for personalized and adaptive learning in higher education. The AIIA system leverages advanced AI and Natural Language Processing (NLP) techniques to create an interactive and engaging learning platform. This platform is engineered to reduce cognitive load on learners by providing easy access to information, facilitating knowledge assessment, and delivering personalized learning support tailored to individual needs and learning styles. The AIIA's capabilities include understanding and responding to student inquiries, generating quizzes and flashcards, and offering personalized learning pathways. The research findings have the potential to significantly impact the design, implementation, and evaluation of AI-enabled Virtual Teaching Assistants (VTAs) in higher education, informing the development of innovative educational tools that can enhance student learning outcomes, engagement, and satisfaction. The paper presents the methodology, system architecture, intelligent services, and integration with Learning Management Systems (LMSs) while discussing the challenges, limitations, and future directions for the development of AI-enabled intelligent assistants in education.

Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer

  • paper_url: http://arxiv.org/abs/2309.10891
  • repo_url: https://github.com/luka-group/SALT
  • paper_authors: Fei Wang, Kuan-Hao Huang, Kai-Wei Chang, Muhao Chen
  • for: Improving zero-shot cross-lingual transfer without relying on external data.
  • methods: Code-switching and embedding-mixup self-augmentation are used to distill cross-lingual knowledge from a multilingual pretrained language model and improve its transferability on downstream tasks (a sketch follows the abstract).
  • results: On the XNLI and PAWS-X tasks, the method improves zero-shot cross-lingual transferability without any external data.
    Abstract Zero-shot cross-lingual transfer is a central task in multilingual NLP, allowing models trained in languages with more sufficient training resources to generalize to other low-resource languages. Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data to improve cross-lingual transferability, which are typically expensive to obtain. In this paper, we propose a simple yet effective method, SALT, to improve the zero-shot cross-lingual transfer of the multilingual pretrained language models without the help of such external data. By incorporating code-switching and embedding mixup with self-augmentation, SALT effectively distills cross-lingual knowledge from the multilingual PLM and enhances its transferability on downstream tasks. Experimental results on XNLI and PAWS-X show that our method is able to improve zero-shot cross-lingual transferability without external data. Our code is available at https://github.com/luka-group/SALT.
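A minimal sketch of embedding mixup, one of the two self-augmentations SALT combines with code-switching, assuming the mixing happens at the input-embedding layer of a multilingual encoder; the model name, mixing coefficient, and exact mixing point are illustrative and may differ from SALT's actual recipe.

```python
# Hedged sketch: interpolate the input embeddings of an original sentence and
# an augmented (e.g. code-switched) variant before running the encoder.

import torch
from transformers import AutoModel, AutoTokenizer

name = "xlm-roberta-base"
tok = AutoTokenizer.from_pretrained(name)
enc = AutoModel.from_pretrained(name)

def mixed_encoding(sent_a: str, sent_b: str, lam: float = 0.7):
    batch = tok([sent_a, sent_b], return_tensors="pt", padding=True)
    emb = enc.get_input_embeddings()(batch["input_ids"])    # (2, T, H)
    mixed = lam * emb[0] + (1.0 - lam) * emb[1]              # embedding mixup
    out = enc(inputs_embeds=mixed.unsqueeze(0),
              attention_mask=batch["attention_mask"][:1])
    return out.last_hidden_state                             # (1, T, H)

# e.g. mixed_encoding("The movie was great.", "The movie était great.")
```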

Classifying Organizations for Food System Ontologies using Natural Language Processing

  • paper_url: http://arxiv.org/abs/2309.10880
  • repo_url: https://github.com/ICICLE-ai/Organization-Classification-for-Food-Systems
  • paper_authors: Tianyu Jiang, Sonia Vinogradova, Nathan Stringham, E. Louise Earl, Allan D. Hollander, Patrick R. Huber, Ellen Riloff, R. Sandra Schillo, Giorgio A. Ubbiali, Matthew Lange
  • for: Populating knowledge graphs and integrating the information with food system ontologies.
  • methods: Natural language processing (NLP) methods are used to automatically classify entities (a sketch follows the abstract).
  • results: The NLP models achieve reasonably good performance, and the general framework can be applied to many other classification problems.
    Abstract Our research explores the use of natural language processing (NLP) methods to automatically classify entities for the purpose of knowledge graph population and integration with food system ontologies. We have created NLP models that can automatically classify organizations with respect to categories associated with environmental issues as well as Standard Industrial Classification (SIC) codes, which are used by the U.S. government to characterize business activities. As input, the NLP models are provided with text snippets retrieved by the Google search engine for each organization, which serves as a textual description of the organization that is used for learning. Our experimental results show that NLP models can achieve reasonably good performance for these two classification tasks, and they rely on a general framework that could be applied to many other classification problems as well. We believe that NLP models represent a promising approach for automatically harvesting information to populate knowledge graphs and aligning the information with existing ontologies through shared categories and concepts.
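A minimal sketch of the general framework described in the abstract: classify an organization from a textual description (e.g. search-engine snippets) into categories such as SIC codes or environmental-issue classes. The features, classifier, labels, and toy snippets below are illustrative assumptions, not the paper's models or data.

```python
# Hedged sketch: text classification of organization descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy training data: (snippet text, category label)
snippets = [
    "Family-owned dairy farm producing organic milk and cheese",
    "Nonprofit advocating for watershed protection and clean rivers",
    "Regional grocery chain distributing fresh produce",
    "Research institute studying soil health and carbon sequestration",
]
labels = ["agriculture", "environmental_advocacy", "food_retail", "research"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(snippets, labels)

print(clf.predict(["Cooperative of farmers selling organic vegetables"]))
```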

Believable Minecraft Settlements by Means of Decentralised Iterative Planning

  • paper_url: http://arxiv.org/abs/2309.10871
  • repo_url: None
  • paper_authors: Arthur van der Staaij, Jelmer Prins, Vincent L. Prins, Julian Poelsma, Thera Smit, Matthias Müller-Brockhausen, Mike Preuss
  • for: This paper addresses believable, random-terrain-adaptive city generation in the field of Procedural Content Generation (PCG).
  • methods: The paper uses a decentralised, iterative planning process that is transferable to similar generation processes aiming to produce "organic" content.
  • results: The method won the Generative Settlement Design in Minecraft (GDMC) 2022 competition, demonstrating its viability for believable settlement generation that adapts to random terrain.
    Abstract Procedural city generation that focuses on believability and adaptability to random terrain is a difficult challenge in the field of Procedural Content Generation (PCG). Dozens of researchers compete for a realistic approach in challenges such as the Generative Settlement Design in Minecraft (GDMC), in which our method has won the 2022 competition. This was achieved through a decentralised, iterative planning process that is transferable to similar generation processes that aims to produce "organic" content procedurally.

Using AI Uncertainty Quantification to Improve Human Decision-Making

  • paper_url: http://arxiv.org/abs/2309.10852
  • repo_url: None
  • paper_authors: Laura R. Marusich, Jonathan Z. Bakdash, Yan Zhou, Murat Kantarcioglu
  • for: The paper aims to improve human decision-making by providing additional probabilistic information from AI uncertainty quantification (UQ).
  • methods: The paper uses instance-based UQ for three real datasets, trains different AI models for classification, and creates confidence intervals for UQ using random samples generated around the neighborhood of each instance (a sketch follows the abstract). The UQ is calibrated using a strictly proper scoring rule.
  • results: The paper finds that providing UQ information along with AI predictions significantly improves human decision-making beyond AI predictions alone, and that this benefit generalizes across different representations of UQ information.
    Abstract AI Uncertainty Quantification (UQ) has the potential to improve human decision-making beyond AI predictions alone by providing additional useful probabilistic information to users. The majority of past research on AI and human decision-making has concentrated on model explainability and interpretability. We implemented instance-based UQ for three real datasets. To achieve this, we trained different AI models for classification for each dataset, and used random samples generated around the neighborhood of the given instance to create confidence intervals for UQ. The computed UQ was calibrated using a strictly proper scoring rule as a form of quality assurance for UQ. We then conducted two preregistered online behavioral experiments that compared objective human decision-making performance under different AI information conditions, including UQ. In Experiment 1, we compared decision-making for no AI (control), AI prediction alone, and AI prediction with a visualization of UQ. We found UQ significantly improved decision-making beyond the other two conditions. In Experiment 2, we focused on comparing different representations of UQ information: Point vs. distribution of uncertainty and visualization type (needle vs. dotplot). We did not find meaningful differences in decision-making performance among these different representations of UQ. Overall, our results indicate that human decision-making can be improved by providing UQ information along with AI predictions, and that this benefit generalizes across a variety of representations of UQ.
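A minimal sketch of the instance-based UQ procedure the abstract describes: sample points around the neighborhood of a test instance, score them with the trained classifier, and report a percentile interval over the predicted probabilities. The dataset, model, noise scale, and interval width are illustrative assumptions.

```python
# Hedged sketch of neighborhood-sampling uncertainty quantification.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

def instance_uq(x, n_samples=200, noise_scale=0.1, alpha=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    neighborhood = x + rng.normal(0.0, noise_scale, size=(n_samples, x.shape[0]))
    probs = model.predict_proba(neighborhood)[:, 1]        # P(class 1)
    lo, hi = np.percentile(probs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

point = model.predict_proba(X[:1])[0, 1]
interval = instance_uq(X[0])
print(f"prediction {point:.2f}, 95% interval {interval}")
```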

SlimPajama-DC: Understanding Data Combinations for LLM Training

  • paper_url: http://arxiv.org/abs/2309.10818
  • repo_url: None
  • paper_authors: Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing
  • for: This study trains language models on the SlimPajama dataset to explore how different data combinations (e.g., web text, Wikipedia, GitHub, books) affect the training of large language models.
  • methods: The study uses the SlimPajama dataset and contrasts global and local deduplication (a sketch follows the abstract). Different data combinations are then analysed by training 1.3B Cerebras-GPT models with Alibi and SwiGLU.
  • results: Global and local deduplication affect the trained models' performance differently, and increasing data diversity across sources is crucial after global deduplication. The best configuration outperforms a 1.3B model trained on the RedPajama dataset with the same number of training tokens.
    Abstract This paper aims to understand the impacts of various data combinations (e.g., web text, wikipedia, github, books) on the training of large language models using SlimPajama. SlimPajama is a rigorously deduplicated, multi-source dataset, which has been refined and further deduplicated to 627B tokens from the extensive 1.2T tokens RedPajama dataset contributed by Together. We've termed our research as SlimPajama-DC, an empirical analysis designed to uncover fundamental characteristics and best practices associated with employing SlimPajama in the training of large language models. During our research with SlimPajama, two pivotal observations emerged: (1) Global deduplication vs. local deduplication. We analyze and discuss how global (across different sources of datasets) and local (within the single source of dataset) deduplications affect the performance of trained models. (2) Proportions of high-quality/highly-deduplicated multi-source datasets in the combination. To study this, we construct six configurations of SlimPajama dataset and train individual ones using 1.3B Cerebras-GPT model with Alibi and SwiGLU. Our best configuration outperforms the 1.3B model trained on RedPajama using the same number of training tokens by a significant margin. All our 1.3B models are trained on Cerebras 16$\times$ CS-2 cluster with a total of 80 PFLOP/s in bf16 mixed precision. We further extend our discoveries (such as increasing data diversity is crucial after global deduplication) on a 7B model with large batch-size training. Our models and the separate SlimPajama-DC datasets are available at: https://huggingface.co/MBZUAI-LLM and https://huggingface.co/datasets/cerebras/SlimPajama-627B.
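The global-versus-local deduplication distinction studied in the paper can be illustrated with a toy exact-match version; SlimPajama itself uses fuzzy (MinHash-style) deduplication over far larger corpora, so this is only a sketch of the two regimes.

```python
# Hedged sketch: local (per-source) vs. global (cross-source) deduplication.
import hashlib

def dedup(docs):
    seen, kept = set(), []
    for d in docs:
        h = hashlib.sha256(d.strip().lower().encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(d)
    return kept

sources = {
    "web":  ["the cat sat on the mat", "open source llms are improving"],
    "wiki": ["the cat sat on the mat", "alan turing was a mathematician"],
}

# local: deduplicate within each source independently
local = {name: dedup(docs) for name, docs in sources.items()}
# global: deduplicate across the union of all sources
global_ = dedup([d for docs in sources.values() for d in docs])

print(sum(len(v) for v in local.values()), len(global_))  # 4 3
```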

AI Foundation Models for Weather and Climate: Applications, Design, and Implementation

  • paper_url: http://arxiv.org/abs/2309.10808
  • repo_url: None
  • paper_authors: S. Karthik Mukkavilli, Daniel Salles Civitarese, Johannes Schmude, Johannes Jakubik, Anne Jones, Nam Nguyen, Christopher Phillips, Sujit Roy, Shraddha Singh, Campbell Watson, Raghu Ganti, Hendrik Hamann, Udaysankar Nair, Rahul Ramachandran, Kommy Weldemariam
  • for: This paper surveys the use of machine learning and deep learning methods for understanding the chaotic behavior of the atmosphere and for weather forecasting.
  • methods: The review focuses on transformers, physics-informed machine learning, and graph neural networks, which have achieved state-of-the-art performance on relatively narrow spatiotemporal scales and specific tasks.
  • results: The authors argue that, with recent progress in generative AI, it is now possible to build foundation models for the global Earth system, regional climate, and mesoscale weather that perform competitively on multiple domain-specific downstream tasks.
    Abstract Machine learning and deep learning methods have been widely explored in understanding the chaotic behavior of the atmosphere and furthering weather forecasting. There has been increasing interest from technology companies, government institutions, and meteorological agencies in building digital twins of the Earth. Recent approaches using transformers, physics-informed machine learning, and graph neural networks have demonstrated state-of-the-art performance on relatively narrow spatiotemporal scales and specific tasks. With the recent success of generative artificial intelligence (AI) using pre-trained transformers for language modeling and vision with prompt engineering and fine-tuning, we are now moving towards generalizable AI. In particular, we are witnessing the rise of AI foundation models that can perform competitively on multiple domain-specific downstream tasks. Despite this progress, we are still in the nascent stages of a generalizable AI model for global Earth system models, regional climate models, and mesoscale weather models. Here, we review current state-of-the-art AI approaches, primarily from transformer and operator learning literature in the context of meteorology. We provide our perspective on criteria for success towards a family of foundation models for nowcasting and forecasting weather and climate predictions. We also discuss how such models can perform competitively on downstream tasks such as downscaling (super-resolution), identifying conditions conducive to the occurrence of wildfires, and predicting consequential meteorological phenomena across various spatiotemporal scales such as hurricanes and atmospheric rivers. In particular, we examine current AI methodologies and contend they have matured enough to design and implement a weather foundation model.

Heuristic Search for Path Finding with Refuelling

  • paper_url: http://arxiv.org/abs/2309.10796
  • repo_url: None
  • paper_authors: Anushtup Nandy, Zhongqiang Ren, Sivakumar Rathinam, Howie Choset
  • for: This paper considers a generalization of Path Finding (PF) called the Refuelling Path Finding (RF-PF) problem. Like PF, RF-PF is defined over a graph whose vertices are gas stations with known fuel prices and whose edge costs depend on the fuel consumed between the corresponding vertices. RF-PF seeks a minimum-cost path from the start to the goal vertex for a robot with a limited fuel tank and a limited number of refuelling stops.
  • methods: The paper develops a heuristic search algorithm called Refuel A* (RF-A*) that iteratively constructs partial solution paths from the start to the goal, guided by a heuristic function, and uses dominance rules to prune states during planning (a simplified search-space sketch follows the abstract).
  • results: When tested on large city maps with hundreds of gas stations, RF-A* finds optimal solutions and runs more than an order of magnitude faster than the existing state of the art (a polynomial-time algorithm).
    Abstract This paper considers a generalization of the Path Finding (PF) with refueling constraints referred to as the Refuelling Path Finding (RF-PF) problem. Just like PF, the RF-PF problem is defined over a graph, where vertices are gas stations with known fuel prices, and edge costs depend on the gas consumption between the corresponding vertices. RF-PF seeks a minimum-cost path from the start to the goal vertex for a robot with a limited gas tank and a limited number of refuelling stops. While RF-PF is polynomial-time solvable, it remains a challenge to quickly compute an optimal solution in practice since the robot needs to simultaneously determine the path, where to make the stops, and the amount to refuel at each stop. This paper develops a heuristic search algorithm called Refuel A* (RF-A* ) that iteratively constructs partial solution paths from the start to the goal guided by a heuristic function while leveraging dominance rules for state pruning during planning. RF-A* is guaranteed to find an optimal solution and runs more than an order of magnitude faster than the existing state of the art (a polynomial time algorithm) when tested in large city maps with hundreds of gas stations.
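A minimal sketch of the underlying search space (not the RF-A* algorithm itself): states are (station, fuel level) pairs, a move either buys one unit of fuel at the current station's price or traverses an edge that consumes fuel, and a plain Dijkstra search returns the minimum-cost plan. The tiny graph is illustrative, fuel is discretized, and the limit on the number of refuelling stops is ignored for brevity.

```python
# Hedged sketch: Dijkstra over (station, fuel) states for refuelling planning.
import heapq

prices = {"A": 3, "B": 1, "C": 2, "D": 0}             # fuel price per unit
edges = {"A": [("B", 4), ("C", 3)],                    # (neighbor, fuel needed)
         "B": [("C", 2), ("D", 5)],
         "C": [("D", 4)],
         "D": []}
TANK = 6

def cheapest(start, goal):
    dist = {(start, 0): 0}
    pq = [(0, start, 0)]                                # (cost, station, fuel)
    while pq:
        cost, v, fuel = heapq.heappop(pq)
        if v == goal:
            return cost
        if cost > dist.get((v, fuel), float("inf")):
            continue
        moves = [(v, fuel + 1, cost + prices[v])] if fuel < TANK else []
        moves += [(u, fuel - need, cost) for u, need in edges[v] if need <= fuel]
        for u, f, c in moves:
            if c < dist.get((u, f), float("inf")):
                dist[(u, f)] = c
                heapq.heappush(pq, (c, u, f))
    return None

print(cheapest("A", "D"))  # 17: cheapest combination of route and fuel purchases
```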

Guide Your Agent with Adaptive Multimodal Rewards

  • paper_url: http://arxiv.org/abs/2309.10790
  • repo_url: https://github.com/csmile-1006/arp
  • paper_authors: Changyeon Kim, Younggyo Seo, Hao Liu, Lisa Lee, Jinwoo Shin, Honglak Lee, Kimin Lee
  • for: This paper aims to improve the ability of reinforcement learning agents to adapt to unseen environments.
  • methods: The method uses natural-language task descriptions and pretrained multimodal encoders to improve generalization. Specifically, it computes the similarity between visual observations and natural-language instructions in a pretrained multimodal embedding space (such as CLIP) and uses it as a reward signal to train a return-conditioned policy (a sketch follows the abstract).
  • results: The method effectively mitigates goal misgeneralization and achieves superior generalization when faced with unseen text instructions. Fine-tuning the pretrained multimodal encoder to improve reward quality further boosts performance. Video demonstrations and source code are available on the project website (https://sites.google.com/view/2023arp).
    Abstract Developing an agent capable of adapting to unseen environments remains a difficult challenge in imitation learning. In this work, we present Adaptive Return-conditioned Policy (ARP), an efficient framework designed to enhance the agent's generalization ability using natural language task descriptions and pre-trained multimodal encoders. Our key idea is to calculate a similarity between visual observations and natural language instructions in the pre-trained multimodal embedding space (such as CLIP) and use it as a reward signal. We then train a return-conditioned policy using expert demonstrations labeled with multimodal rewards. Because the multimodal rewards provide adaptive signals at each timestep, our ARP effectively mitigates the goal misgeneralization. This results in superior generalization performances even when faced with unseen text instructions, compared to existing text-conditioned policies. To improve the quality of rewards, we also introduce a fine-tuning method for pre-trained multimodal encoders, further enhancing the performance. Video demonstrations and source code are available on the project website: https://sites.google.com/view/2023arp.
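The multimodal-reward idea is concrete enough to sketch: the reward at each timestep is the similarity between the current visual observation and the natural-language instruction in a pretrained CLIP embedding space. The model checkpoint and normalization details below are illustrative assumptions, not necessarily those used for ARP.

```python
# Hedged sketch of a CLIP-similarity reward for a return-conditioned policy.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def multimodal_reward(frame, instruction: str) -> float:
    """frame: a PIL image of the current observation."""
    inputs = processor(text=[instruction], images=frame, return_tensors="pt")
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img * txt).sum(dim=-1).item()   # cosine similarity in [-1, 1]

# Expert trajectories would then be relabelled with these per-timestep rewards
# before training the return-conditioned policy on them.
```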

Language as the Medium: Multimodal Video Classification through text only

  • paper_url: http://arxiv.org/abs/2309.10783
  • repo_url: None
  • paper_authors: Laura Hanu, Anita L. Verő, James Thewlis
  • for: This work proposes a new model-agnostic approach for interpreting the complex contextual relationships between the modalities present in videos.
  • methods: The approach uses large language models, such as GPT-3.5 or Llama2, to reason over textual descriptions of the visual and aural modalities obtained from BLIP-2, Whisper, and ImageBind. Without any additional fine-tuning of video-text models or datasets, the authors show that off-the-shelf LLMs can use these multimodal textual descriptions as proxies for "sight" or "hearing" and perform zero-shot multimodal video classification in-context (a sketch follows the abstract).
  • results: Evaluations on popular action recognition benchmarks such as UCF-101 and Kinetics show that these context-rich descriptions can be used successfully in video understanding tasks, pointing to a promising research direction in which the interplay between textual, visual, and auditory models enables more holistic video understanding.
    Abstract Despite an exciting new wave of multimodal machine learning models, current approaches still struggle to interpret the complex contextual relationships between the different modalities present in videos. Going beyond existing methods that emphasize simple activities or objects, we propose a new model-agnostic approach for generating detailed textual descriptions that captures multimodal video information. Our method leverages the extensive knowledge learnt by large language models, such as GPT-3.5 or Llama2, to reason about textual descriptions of the visual and aural modalities, obtained from BLIP-2, Whisper and ImageBind. Without needing additional finetuning of video-text models or datasets, we demonstrate that available LLMs have the ability to use these multimodal textual descriptions as proxies for ``sight'' or ``hearing'' and perform zero-shot multimodal classification of videos in-context. Our evaluations on popular action recognition benchmarks, such as UCF-101 or Kinetics, show these context-rich descriptions can be successfully used in video understanding tasks. This method points towards a promising new research direction in multimodal classification, demonstrating how an interplay between textual, visual and auditory machine learning models can enable more holistic video understanding.
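A minimal sketch of the overall recipe: textual descriptions of the visual and aural streams (e.g. frame captions and an ASR transcript) are composed into a prompt, and an LLM picks the most plausible action class in-context. The `llm` callable is a placeholder for whatever chat model is available, and the captions and label set are made up for illustration.

```python
# Hedged sketch: zero-shot video classification from text descriptions only.
def build_prompt(frame_captions, transcript, labels):
    lines = ["You are given text descriptions of a video.",
             "Frame captions:"]
    lines += [f"- {c}" for c in frame_captions]
    lines += [f"Audio transcript: {transcript!r}",
              "Which one of these actions is shown? " + ", ".join(labels),
              "Answer with a single label."]
    return "\n".join(lines)

def classify_video(frame_captions, transcript, labels, llm):
    answer = llm(build_prompt(frame_captions, transcript, labels)).strip()
    return answer if answer in labels else None

# Example usage with a stub LLM standing in for GPT-3.5 / Llama2:
labels = ["playing guitar", "typing", "surfing"]
stub_llm = lambda prompt: "playing guitar"
print(classify_video(["a person strums a guitar on a stage"],
                     "thank you all for coming tonight", labels, stub_llm))
```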

FRASIMED: a Clinical French Annotated Resource Produced through Crosslingual BERT-Based Annotation Projection

  • paper_url: http://arxiv.org/abs/2309.10770
  • repo_url: None
  • paper_authors: Jamil Zaghir, Mina Bjelogrlic, Jean-Philippe Goldman, Soukaïna Aananou, Christophe Gaudet-Blavignac, Christian Lovis
  • for: This article presents a methodology for generating translated versions of annotated datasets through crosslingual annotation projection, in order to grow annotated datasets for low-resource corpora.
  • methods: The method is based on a language-agnostic BERT approach and can enlarge low-resource corpora with little human effort, using only already-available open data resources.
  • results: The evaluation of the crosslingual annotation projection approach shows both effectiveness and high accuracy in the resulting dataset. As a practical application, the authors release FRASIMED, a French annotated resource for medical entity detection comprising 2'051 synthetic clinical cases, which can be used to develop and refine French clinical NLP applications.
    Abstract Natural language processing (NLP) applications such as named entity recognition (NER) for low-resource corpora do not benefit from recent advances in the development of large language models (LLMs) where there is still a need for larger annotated datasets. This research article introduces a methodology for generating translated versions of annotated datasets through crosslingual annotation projection. Leveraging a language agnostic BERT-based approach, it is an efficient solution to increase low-resource corpora with few human efforts and by only using already available open data resources. Quantitative and qualitative evaluations are often lacking when it comes to evaluating the quality and effectiveness of semi-automatic data generation strategies. The evaluation of our crosslingual annotation projection approach showed both effectiveness and high accuracy in the resulting dataset. As a practical application of this methodology, we present the creation of French Annotated Resource with Semantic Information for Medical Entities Detection (FRASIMED), an annotated corpus comprising 2'051 synthetic clinical cases in French. The corpus is now available for researchers and practitioners to develop and refine French natural language processing (NLP) applications in the clinical field (https://zenodo.org/record/8355629), making it the largest open annotated corpus with linked medical concepts in French.

A Blueprint for Precise and Fault-Tolerant Analog Neural Networks

  • paper_url: http://arxiv.org/abs/2309.10759
  • repo_url: None
  • paper_authors: Cansu Demirkiran, Lakshmi Nair, Darius Bunandar, Ajay Joshi
  • for: This paper aims to improve the energy efficiency and scalability of deep neural network (DNN) acceleration using analog computing.
  • methods: The paper proposes using the residue number system (RNS) to compose high-precision operations from multiple low-precision operations, eliminating the information loss caused by limited-precision data converters.
  • results: The study achieves $99\%$ of FP32 accuracy for state-of-the-art DNN inference using data converters with only $6$-bit precision, reducing energy consumption by several orders of magnitude while maintaining the same throughput and precision. The approach is also applied to DNN training, achieving accuracy comparable to FP32 precision using $7$-bit integer arithmetic. Additionally, the paper presents a fault-tolerant dataflow using redundant RNS error-correcting codes to protect computation against noise and errors in analog accelerators.
    Abstract Analog computing has reemerged as a promising avenue for accelerating deep neural networks (DNNs) due to its potential to overcome the energy efficiency and scalability challenges posed by traditional digital architectures. However, achieving high precision and DNN accuracy using such technologies is challenging, as high-precision data converters are costly and impractical. In this paper, we address this challenge by using the residue number system (RNS). RNS allows composing high-precision operations from multiple low-precision operations, thereby eliminating the information loss caused by the limited precision of the data converters. Our study demonstrates that analog accelerators utilizing the RNS-based approach can achieve ${\geq}99\%$ of FP32 accuracy for state-of-the-art DNN inference using data converters with only $6$-bit precision whereas a conventional analog core requires more than $8$-bit precision to achieve the same accuracy in the same DNNs. The reduced precision requirements imply that using RNS can reduce the energy consumption of analog accelerators by several orders of magnitude while maintaining the same throughput and precision. Our study extends this approach to DNN training, where we can efficiently train DNNs using $7$-bit integer arithmetic while achieving accuracy comparable to FP32 precision. Lastly, we present a fault-tolerant dataflow using redundant RNS error-correcting codes to protect the computation against noise and errors inherent within an analog accelerator.
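A worked sketch of the residue number system idea at the heart of the paper: a high-precision integer multiply-accumulate is composed from independent low-precision operations in each residue channel and reconstructed with the Chinese remainder theorem. The moduli below are illustrative, not the ones used in the paper.

```python
# Hedged sketch of RNS arithmetic: high-precision MACs from low-precision channels.
from math import prod

MODULI = (63, 64, 65)  # pairwise coprime; each residue fits in ~6 bits
M = prod(MODULI)       # dynamic range of the RNS representation

def to_rns(x: int) -> tuple:
    return tuple(x % m for m in MODULI)

def rns_mac(acc, a, b):
    # multiply-accumulate performed independently per low-precision channel
    return tuple((r + x * y) % m for r, x, y, m in zip(acc, a, b, MODULI))

def from_rns(residues) -> int:
    # Chinese remainder theorem reconstruction
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x = (x + r * Mi * pow(Mi, -1, m)) % M
    return x

# Dot product of two small integer vectors, computed entirely in RNS
a_vec, b_vec = [17, 42, 99], [23, 5, 61]
acc = to_rns(0)
for a, b in zip(a_vec, b_vec):
    acc = rns_mac(acc, to_rns(a), to_rns(b))
assert from_rns(acc) == sum(x * y for x, y in zip(a_vec, b_vec))
print(from_rns(acc))  # 6640
```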

SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction

  • paper_url: http://arxiv.org/abs/2309.10748
  • repo_url: None
  • paper_authors: Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Fabien Baradel, Salma Galaaoui, Romain Bregier, Matthieu Armando, Jean-Sebastien Franco, Gregory Rogez
  • for: This work aims to go beyond existing hand-object interaction datasets by providing more real-object variability and accurate 3D hand-object reconstruction ground truth.
  • methods: A two-stage inference pipeline is used: a rigid registration step, exploiting the rigid hand-object assumption, followed by multi-view reconstruction (MVR).
  • results: Using an SfM toolbox or a hand pose estimator enables promising object-agnostic 3D hand-object reconstructions, but these methods remain sensitive to the initial camera pose estimates, leaving room for improvement.
    Abstract Recent hand-object interaction datasets show limited real object variability and rely on fitting the MANO parametric model to obtain groundtruth hand shapes. To go beyond these limitations and spur further research, we introduce the SHOWMe dataset which consists of 96 videos, annotated with real and detailed hand-object 3D textured meshes. Following recent work, we consider a rigid hand-object scenario, in which the pose of the hand with respect to the object remains constant during the whole video sequence. This assumption allows us to register sub-millimetre-precise groundtruth 3D scans to the image sequences in SHOWMe. Although simpler, this hypothesis makes sense in terms of applications where the required accuracy and level of detail is important eg., object hand-over in human-robot collaboration, object scanning, or manipulation and contact point analysis. Importantly, the rigidity of the hand-object systems allows to tackle video-based 3D reconstruction of unknown hand-held objects using a 2-stage pipeline consisting of a rigid registration step followed by a multi-view reconstruction (MVR) part. We carefully evaluate a set of non-trivial baselines for these two stages and show that it is possible to achieve promising object-agnostic 3D hand-object reconstructions employing an SfM toolbox or a hand pose estimator to recover the rigid transforms and off-the-shelf MVR algorithms. However, these methods remain sensitive to the initial camera pose estimates which might be imprecise due to lack of textures on the objects or heavy occlusions of the hands, leaving room for improvements in the reconstruction. Code and dataset are available at https://europe.naverlabs.com/research/showme

Evaluating large language models’ ability to understand metaphor and sarcasm using a screening test for Asperger syndrome

  • paper_url: http://arxiv.org/abs/2309.10744
  • repo_url: https://github.com/hiromu/llm-msst
  • paper_authors: Hiromu Yakura
  • for: This study examines whether recent large language models (LLMs) can understand nuanced human communication, including metaphor and sarcasm.
  • methods: A standardized screening test is used to assess the LLMs' comprehension of metaphor and sarcasm.
  • results: As the number of model parameters increases, the LLMs' comprehension of metaphor improves, but no improvement is observed for sarcasm, suggesting that a different approach is needed to endow LLMs with an understanding of sarcasm.
    Abstract Metaphors and sarcasm are precious fruits of our highly-evolved social communication skills. However, children with Asperger syndrome are known to have difficulties in comprehending sarcasm, even if they possess a certain level of verbal IQ sufficient for understanding metaphors. Given that, a screening test that scores the ability to understand metaphor and sarcasm has been used to differentiate Asperger syndrome from other symptoms exhibiting akin external behaviors (e.g., attention-deficit/hyperactivity disorder). This study uses the standardized test to examine the capability of recent large language models (LLMs) in understanding human nuanced communication. The results divulged that, whereas their ability to comprehend metaphors has been improved with the increase of the number of model parameters, the improvement in sarcasm understanding was not observed. This implies that an alternative approach is imperative to imbue LLMs with the capacity to grasp sarcasm, which has been associated with the amygdala, a pivotal cerebral region for emotional learning, in the case of humans.

MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation

  • paper_url: http://arxiv.org/abs/2309.10738
  • repo_url: https://github.com/NEXTLab-ZJU/MelodyGLM
  • paper_authors: Xinda Wu, Zhijie Huang, Kejun Zhang, Jiaxing Yu, Xu Tan, Tieyao Zhang, Zihao Wang, Lingyun Sun
  • for: This paper improves pre-training for symbolic melody generation so as to better capture multi-scale, multi-dimensional structural information in note sequences.
  • methods: The paper proposes MelodyGLM, a multi-task pre-training framework with local and global blank-infilling tasks designed to model the local and global structures in melodies (a sketch of local n-gram infilling follows the abstract).
  • results: On melody continuation, MelodyGLM clearly improves over standard and previous pre-training methods, with average subjective gains of 0.82, 0.87, 0.78, and 0.94 in consistency, rhythmicity, structure, and overall quality, respectively; on melody inpainting it nearly matches the quality of human-composed melodies.
    Abstract Pre-trained language models have achieved impressive results in various music understanding and generation tasks. However, existing pre-training methods for symbolic melody generation struggle to capture multi-scale, multi-dimensional structural information in note sequences, due to the domain knowledge discrepancy between text and music. Moreover, the lack of available large-scale symbolic melody datasets limits the pre-training improvement. In this paper, we propose MelodyGLM, a multi-task pre-training framework for generating melodies with long-term structure. We design the melodic n-gram and long span sampling strategies to create local and global blank infilling tasks for modeling the local and global structures in melodies. Specifically, we incorporate pitch n-grams, rhythm n-grams, and their combined n-grams into the melodic n-gram blank infilling tasks for modeling the multi-dimensional structures in melodies. To this end, we have constructed a large-scale symbolic melody dataset, MelodyNet, containing more than 0.4 million melody pieces. MelodyNet is utilized for large-scale pre-training and domain-specific n-gram lexicon construction. Both subjective and objective evaluations demonstrate that MelodyGLM surpasses the standard and previous pre-training methods. In particular, subjective evaluations show that, on the melody continuation task, MelodyGLM gains average improvements of 0.82, 0.87, 0.78, and 0.94 in consistency, rhythmicity, structure, and overall quality, respectively. Notably, MelodyGLM nearly matches the quality of human-composed melodies on the melody inpainting task.
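A minimal sketch of a local blank-infilling objective of the kind described above: contiguous melodic n-grams are masked so the model must reconstruct them, analogous to span corruption. The token format, mask symbol, and span lengths are illustrative; MelodyGLM additionally builds pitch/rhythm n-gram lexicons and a long-span (global) variant, which are not shown.

```python
# Hedged sketch of melodic n-gram blank infilling.
import random

def ngram_blank_infilling(tokens, n=3, num_spans=2, mask="[MASK]", seed=0):
    """Replace `num_spans` non-overlapping n-grams with mask sentinels."""
    rng = random.Random(seed)
    corrupted, targets, used = list(tokens), [], set()
    while len(targets) < num_spans:
        s = rng.randrange(len(tokens) - n + 1)
        if any(i in used for i in range(s, s + n)):
            continue                                   # spans must not overlap
        used.update(range(s, s + n))
        corrupted[s:s + n] = [f"{mask}{len(targets)}"] + [None] * (n - 1)
        targets.append(tokens[s:s + n])                # span to be reconstructed
    return [t for t in corrupted if t is not None], targets

# toy melody tokens: pitch_duration pairs
melody = ["C4_8", "D4_8", "E4_4", "G4_4", "E4_8", "D4_8", "C4_2", "rest_4",
          "C4_8", "E4_8", "G4_4", "C5_2"]
inputs, targets = ngram_blank_infilling(melody)
print(inputs)   # melody with two [MASK]i sentinels
print(targets)  # the masked 3-note spans, in mask order
```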

Monte-Carlo tree search with uncertainty propagation via optimal transport

  • paper_url: http://arxiv.org/abs/2309.10737
  • repo_url: None
  • paper_authors: Tuan Dam, Pascal Stenger, Lukas Schneider, Joni Pajarinen, Carlo D’Eramo, Odalric-Ambrym Maillard
  • for: This paper proposes a novel backup strategy for Monte-Carlo Tree Search designed for highly stochastic and partially observable Markov decision processes.
  • methods: A probabilistic approach is adopted, modelling both value and action-value nodes as Gaussian distributions. A novel backup operator propagates the uncertainty of the estimates to the root node by computing value nodes as the Wasserstein barycenter of their action-value children (a worked special case follows the abstract).
  • results: The paper provides theoretical guarantees of asymptotic convergence to the optimal policy and shows empirically that the approach outperforms well-known baselines on several stochastic and partially observable environments.
    Abstract This paper introduces a novel backup strategy for Monte-Carlo Tree Search (MCTS) designed for highly stochastic and partially observable Markov decision processes. We adopt a probabilistic approach, modeling both value and action-value nodes as Gaussian distributions. We introduce a novel backup operator that computes value nodes as the Wasserstein barycenter of their action-value children nodes; thus, propagating the uncertainty of the estimate across the tree to the root node. We study our novel backup operator when using a novel combination of $L^1$-Wasserstein barycenter with $\alpha$-divergence, by drawing a notable connection to the generalized mean backup operator. We complement our probabilistic backup operator with two sampling strategies, based on optimistic selection and Thompson sampling, obtaining our Wasserstein MCTS algorithm. We provide theoretical guarantees of asymptotic convergence to the optimal policy, and an empirical evaluation on several stochastic and partially observable environments, where our approach outperforms well-known related baselines.
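For intuition, a worked special case of the barycentric backup, assuming one-dimensional Gaussian action-value children $Q(s,a_i) \sim \mathcal{N}(\mu_i, \sigma_i^2)$ with non-negative weights $w_i$ summing to one, and using the 2-Wasserstein metric rather than the $L^1$-Wasserstein/$\alpha$-divergence combination studied in the paper: the barycenter is again Gaussian, so the value node becomes $V(s) \sim \mathcal{N}(\mu^{*}, (\sigma^{*})^{2})$ with $\mu^{*} = \sum_i w_i \mu_i$ and $\sigma^{*} = \sum_i w_i \sigma_i$. In this way the children's estimated uncertainty (their standard deviations), not just their means, is carried up the tree toward the root, which is the behaviour the backup operator is designed to provide.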

PAMS: Platform for Artificial Market Simulations

  • paper_url: http://arxiv.org/abs/2309.10729
  • repo_url: https://github.com/masanorihirano/pams
  • paper_authors: Masanori Hirano, Ryosuke Takata, Kiyoshi Izumi
  • for: This paper presents a new artificial market simulation platform, PAMS: Platform for Artificial Market Simulations. PAMS is a Python-based simulator that integrates easily with deep learning and allows users to modify simulations with little effort.
  • methods: The platform is demonstrated through a study using agents that predict future prices with deep learning.
  • results: The study shows that PAMS supports such deep-learning-based simulations effectively.
    Abstract This paper presents a new artificial market simulation platform, PAMS: Platform for Artificial Market Simulations. PAMS is developed as a Python-based simulator that is easily integrated with deep learning and enabling various simulation that requires easy users' modification. In this paper, we demonstrate PAMS effectiveness through a study using agents predicting future prices by deep learning.

Causality-Driven One-Shot Learning for Prostate Cancer Grading from MRI

  • paper_url: http://arxiv.org/abs/2309.10725
  • repo_url: None
  • paper_authors: Gianluca Carloni, Eva Pachetti, Sara Colantonio
  • for: This study proposes a method for automatically classifying medical images that learns and leverages weak causal signals in the image.
  • methods: The framework consists of a convolutional neural network backbone and a causality-extractor module that captures cause-effect relationships between feature maps, informing the model about the appearance of a feature in one part of the image given the presence of another feature elsewhere.
  • results: The architecture is trained in a one-shot learning scheme, with meta-training and meta-testing tasks built from related classes at different levels of granularity, to evaluate the method in low-data scenarios. Binary and multi-class experiments on a publicly available prostate MRI dataset, together with an ablation study and qualitative assessment via class activation maps, show that causal relationships among features play a crucial role in helping the model discern relevant information and yield more reliable and interpretable predictions.
    Abstract In this paper, we present a novel method to automatically classify medical images that learns and leverages weak causal signals in the image. Our framework consists of a convolutional neural network backbone and a causality-extractor module that extracts cause-effect relationships between feature maps that can inform the model on the appearance of a feature in one place of the image, given the presence of another feature within some other place of the image. To evaluate the effectiveness of our approach in low-data scenarios, we train our causality-driven architecture in a One-shot learning scheme, where we propose a new meta-learning procedure entailing meta-training and meta-testing tasks that are designed using related classes but at different levels of granularity. We conduct binary and multi-class classification experiments on a publicly available dataset of prostate MRI images. To validate the effectiveness of the proposed causality-driven module, we perform an ablation study and conduct qualitative assessments using class activation maps to highlight regions strongly influencing the network's decision-making process. Our findings show that causal relationships among features play a crucial role in enhancing the model's ability to discern relevant information and yielding more reliable and interpretable predictions. This would make it a promising approach for medical image classification tasks.

Sound Source Localization is All about Cross-Modal Alignment

  • paper_url: http://arxiv.org/abs/2309.10724
  • repo_url: None
  • paper_authors: Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung
  • for: 本研究旨在解决真正的声源定位问题,即人类可以通过视觉场景中的声音来确定声音的来源。
  • methods: 我们提出了一种涉及声音和视觉modalities的共同定位任务,以提高声音定位和视觉modalities之间的协调。
  • results: 我们的方法在声音定位和跨模态检索中表现出色,高于当前的状态艺术方法。这些结果表明,同时解决声音定位和跨模态协调任务是解决真正的声源定位问题的关键。
    Abstract Humans can easily perceive the direction of sound sources in a visual scene, termed sound source localization. Recent studies on learning-based sound source localization have mainly explored the problem from a localization perspective. However, prior arts and existing benchmarks do not account for a more important aspect of the problem, cross-modal semantic understanding, which is essential for genuine sound source localization. Cross-modal semantic understanding is important in understanding semantically mismatched audio-visual events, e.g., silent objects, or off-screen sounds. To account for this, we propose a cross-modal alignment task as a joint task with sound source localization to better learn the interaction between audio and visual modalities. Thereby, we achieve high localization performance with strong cross-modal semantic understanding. Our method outperforms the state-of-the-art approaches in both sound source localization and cross-modal retrieval. Our work suggests that jointly tackling both tasks is necessary to conquer genuine sound source localization.
    摘要 人类可以轻松地在视觉场景中识别声音源的方向,称为声音源localization。现在的学习基于的声音源localization研究主要从localization角度出发。然而,前一代和现有的标准没有考虑一个更重要的问题,即跨模态 semantics的理解,这是真正的声音源localization的关键。跨模态 semantics的理解能够有效地处理semantically mismatched audio-visual事件,如静物或屏外声音。为了考虑这一点,我们提议在声音源localization任务中添加跨模态对应 зада务,以更好地学习视觉modalities之间的交互。因此,我们实现了高地理位性性能和强的跨模态 semantics理解。我们的方法超越了当前状态的方法在声音源localization和跨模态retrieval两个领域。我们的工作表明,同时解决这两个任务是必要的,以解决真正的声音源localization。

LEA*: An A* Variant Algorithm with Improved Edge Efficiency for Robot Motion Planning

  • paper_url: http://arxiv.org/abs/2309.10722
  • repo_url: https://github.com/dongliangch/leastar
  • paper_authors: Dongliang Zheng, Panagiotis Tsiotras
  • for: 这个论文是为了提出一种新的图搜索算法,即懒边基于A*(LEA*),用于机器人运动规划。
  • methods: 该算法使用边队列和懒搜索的想法,与A*相似,具有优化的顶点效率和改进的边效率。它的实现几乎没有改变A*的基本结构,因此对前一些懒搜索算法的过渡带来了较小的负担。
  • results: 我们在2D规划问题和7度 freedom manipulator的规划中测试了LEA*和其它算法。我们对Random世界和不同的图大小进行了严格的比较,结果显示LEA*和它的彩色版本wLEA*在发现计划的速度方面与之前的算法相比较快。
    Abstract In this work, we introduce a new graph search algorithm, lazy edged based A* (LEA*), for robot motion planning. By using an edge queue and exploiting the idea of lazy search, LEA* is optimally vertex efficient similar to A*, and has improved edge efficiency compared to A*. LEA* is simple and easy to implement with minimum modification to A*, resulting in a very small overhead compared to previous lazy search algorithms. We also explore the effect of inflated heuristics, which results in the weighted LEA* (wLEA*). We show that the edge efficiency of wLEA* becomes close to LazySP and, thus is near-optimal. We test LEA* and wLEA* on 2D planning problems and planning of a 7-DOF manipulator. We perform a thorough comparison with previous algorithms by considering sparse, medium, and cluttered random worlds and small, medium, and large graph sizes. Our results show that LEA* and wLEA* are the fastest algorithms to find the plan compared to previous algorithms.
    摘要 在这个工作中,我们介绍了一种新的图搜索算法,懒散边基于A*(LEA*),用于机器人运动规划。通过使用边队列和懒散搜索的想法,LEA* 能够与A* 类似的顶点效率优化,并与A* 的边效率相比提高。LEA* 简单易于实现,对 previous lazy search 算法的修改 minimal,因此对于 previous lazy search 算法的 overhead 具有较小的影响。我们还探讨了膨胀式拓扑(weighted LEA*)的效果,并证明其边效率接近LazySP,因此是近似于优化的。我们在 2D 规划问题和7-DOF manipulator 的规划中测试了 LEA* 和 weighted LEA*。我们对 previous algorithms 进行了系统比较,包括 randomly generated sparse、medium 和填充的世界,以及 small、medium 和大的图像大小。我们的结果表明 LEA* 和 weighted LEA* 比 previous algorithms 更快地查找了计划。

Measurement Simplification in ρ-POMDP with Performance Guarantees

  • paper_url: http://arxiv.org/abs/2309.10701
  • repo_url: None
  • paper_authors: Tom Yotam, Vadim Indelman
  • for: 这篇论文主要目标是提出一种高效的决策方法,用于在不精确的信息下进行决策。
  • methods: 该论文使用分割观察空间的方法,以形成关于预期信息奖励的分析 bounds。这些 bounds 然后用于高效地规划,保证性能。
  • results: 该论文显示了这种方法的效果,包括在 Gaussian 信号下的性能提升,以及在实验中的速度增加。同时,它也与其他现有的方法进行比较,并证明其在活动 SLAM 场景中的优势。
    Abstract Decision making under uncertainty is at the heart of any autonomous system acting with imperfect information. The cost of solving the decision making problem is exponential in the action and observation spaces, thus rendering it unfeasible for many online systems. This paper introduces a novel approach to efficient decision-making, by partitioning the high-dimensional observation space. Using the partitioned observation space, we formulate analytical bounds on the expected information-theoretic reward, for general belief distributions. These bounds are then used to plan efficiently while keeping performance guarantees. We show that the bounds are adaptive, computationally efficient, and that they converge to the original solution. We extend the partitioning paradigm and present a hierarchy of partitioned spaces that allows greater efficiency in planning. We then propose a specific variant of these bounds for Gaussian beliefs and show a theoretical performance improvement of at least a factor of 4. Finally, we compare our novel method to other state of the art algorithms in active SLAM scenarios, in simulation and in real experiments. In both cases we show a significant speed-up in planning with performance guarantees.
    摘要 Simplified Chinese translation: autonomous system acting with imperfect information 的决策问题在不确定性下充满挑战。因为动作和观察空间的成本是加性的,因此许多在线系统无法解决这个问题。这篇论文提出了一种新的方法,通过分割高维观察空间来提高决策效率。使用分割后的观察空间,我们提出了一些关于预期信息奖励的分析 bound,对于总体信念分布来说。这些 bound 然后用于有效地规划,保证性能。我们证明这些 bound 是可变的、计算效率高,并且会 converge to 原始解。我们还扩展了分割思想,并提出了一个层次结构的分割空间,以提高规划的效率。最后,我们对 Gaussian 信念中的具体变体提出了一种改进,并证明其在至少增加了4倍的性能。最后,我们在模拟和实际实验中与其他当前标准算法进行比较,并在具有性能保证的情况下显示了明显的减速。

From “Let’s Google” to “Let’s ChatGPT”: Student and Instructor Perspectives on the influence of LLMs on Undergraduate Engineering Education

  • paper_url: http://arxiv.org/abs/2309.10694
  • repo_url: None
  • paper_authors: Ishika Joshi, Ritvik Budhiraja, Pranav Deepak Tanna, Lovenya Jain, Mihika Deshpande, Arjun Srivastava, Srinivas Rallapalli, Harshal D Akolekar, Jagat Sesh Challa, Dhruv Kumar
    for: This paper aims to explore the current usage patterns, perceived benefits, threats, and challenges of Large Language Models (LLMs) among students and instructors in undergraduate engineering universities in India.methods: The study uses surveys and interviews to gather data from 1306 students, 112 student interviews, and 27 instructor interviews.results: The study finds that LLMs are currently used primarily for answering questions and providing explanations, and that students and instructors perceive benefits such as improved understanding and efficiency, but also face challenges such as the need for critical thinking and the potential for misuse. The study offers recommendations for enhancing the adoption of LLMs in undergraduate engineering education and beyond.
    Abstract The rise in popularity of Large Language Models (LLMs) has prompted discussions in academic circles, with students exploring LLM-based tools for coursework inquiries and instructors exploring them for teaching and research. Even though a lot of work is underway to create LLM-based tools tailored for students and instructors, there is a lack of comprehensive user studies that capture the perspectives of students and instructors regarding LLMs. This paper addresses this gap by conducting surveys and interviews within undergraduate engineering universities in India. Using 1306 survey responses among students, 112 student interviews, and 27 instructor interviews around the academic usage of ChatGPT (a popular LLM), this paper offers insights into the current usage patterns, perceived benefits, threats, and challenges, as well as recommendations for enhancing the adoption of LLMs among students and instructors. These insights are further utilized to discuss the practical implications of LLMs in undergraduate engineering education and beyond.
    摘要 LLM(大型自然语言模型)的崛起,已经引发了学术界的讨论,学生们在作业问题上使用 LLM 的工具,教师则在教学和研究中使用 LLM。虽然有很多人在开发学生和教师专门的 LLM 工具,但是没有全面的用户研究,捕捉学生和教师对 LLM 的看法。这篇论文填补了这个空白,通过在印度的大学中进行调查和采访,收集了1306名学生的问卷回答、112名学生的面对面采访和27名教师的采访,对学生和教师在学术上使用 ChatGPT(一个流行的 LLM)的现有使用模式、感受到的利点、威胁和挑战,以及提高学生和教师对 LLM 的采用的建议。这些发现还可以用来讨论 LLMS 在bachelor 工程教育中的实际应用和未来发展。

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback

  • paper_url: http://arxiv.org/abs/2309.10691
  • repo_url: None
  • paper_authors: Xingyao Wang, Zihan Wang, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng, Heng Ji
  • for: 评估大型自然语言模型(LLM)在复杂任务解决方面的多轮交互能力。
  • methods: 利用工具和自然语言反馈来评估 LLM 的多轮交互能力,并提供一套可重复性评估框架。
  • results: 研究发现, LLM 在多轮交互中受益于工具和自然语言反馈,表现提升(绝对值)1-8% 每次工具使用和2-17% 自然语言反馈。 单 turno 性能不一定对多轮交互性能有积极影响。 SIFT 和 RLHF 等方法在 LLM 中通常减退多轮交互能力。
    Abstract To solve complex tasks, large language models (LLMs) often require multiple rounds of interactions with the user, sometimes assisted by external tools. However, current evaluation paradigms often focus solely on benchmark performance with single-turn exchanges, neglecting the intricate interactions among the user, LLMs, and external tools, creating a discrepancy between benchmark evaluation and real-world use cases. We introduce MINT benchmark to evaluate LLMs' ability to solve tasks with multi-turn interactions by (1) using tools and (2) leveraging natural language feedback. To ensure reproducibility, we provide an evaluation framework where LLMs can access tools by executing Python code and receive natural language feedback from the user simulated with GPT-4. We repurpose a diverse set of established datasets and tasks focusing on reasoning, coding, and decision-making and carefully curate them into a compact subset of instances for efficient evaluation. Our analysis of 20 open- and closed-source LLMs offers intriguing findings. (1) LLMs generally benefit from tool interactions and language feedback, with performance gains (absolute, same below) of 1--8% per additional turn with tool use and 2--17% with natural language feedback. (2) Better single-turn performance does not guarantee better multi-turn performance. (3) Surprisingly, on LLMs we evaluated, we found supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities. We hope MINT can help measure progress and incentivize research in improving LLMs' capabilities in multi-turn interactions, especially for open-source communities where multi-turn human evaluation has been less accessible compared to commercial LLMs with a larger user base.
    摘要 LLMs 通常需要多次互动来解决复杂任务,但现有的评估方法通常只关注单次交互的性能,忽视用户、LLMs 和外部工具之间的复杂互动,从而导致评估和实际使用场景之间的差异。我们提出了 MINT 评估标准,用于评估 LLMs 在多次交互中解决任务的能力,包括使用工具和利用自然语言反馈。为确保可重复性,我们提供了一个评估框架,其中 LLMs 可以通过执行 Python 代码来访问工具,并从用户模拟器(使用 GPT-4)接收自然语言反馈。我们将一些已有的 dataset 和任务重新分配,并将其精炼成一个高效的评估集。我们对 20 个开源和关闭源 LLMs 进行分析,发现了一些有趣的发现:1. LLMs 通常受益于工具和自然语言反馈,其性能提升(绝对值)为 1-8% 每次工具使用和 2-17% 自然语言反馈。2. 更高的单次性能不一定意味着更高的多次性能。3. 对我们评估的 LLMs,我们发现了超级vised instruction-finetuning (SIFT) 和人类反馈学习 (RLHF) 通常会降低多次性能。我们希望 MINT 可以帮助测量进步,并鼓励研究人员在多次互动中提高 LLMs 的能力,特别是对于开源社区,其中多次人工评估的训练资源相对较少,相比于商业 LLMs 的更大用户基数。

Learning-Initialized Trajectory Planning in Unknown Environments

  • paper_url: http://arxiv.org/abs/2309.10683
  • repo_url: None
  • paper_authors: Yicheng Chen, Jinjie Li, Wenyuan Qin, Yongzhao Hua, Xiwang Dong, Qingdong Li
  • for: 提高自适应飞行器在未知环境中的准确规划,以便实现更高级别的自主飞行。
  • methods: 提出了学习初始化规划器(LIT-Planner),利用神经网络规划器提供初始值,并通过批量采样进行空间-时间优化,以捕捉多模态性。
  • results: 通过对真实世界和虚拟环境进行模拟和实验,证明LIT-Planner可以减少优化时间cost,并保持规划质量。
    Abstract Autonomous flight in unknown environments requires precise planning for both the spatial and temporal profiles of trajectories, which generally involves nonconvex optimization, leading to high time costs and susceptibility to local optima. To address these limitations, we introduce the Learning-Initialized Trajectory Planner (LIT-Planner), a novel approach that guides optimization using a Neural Network (NN) Planner to provide initial values. We first leverage the spatial-temporal optimization with batch sampling to generate training cases, aiming to capture multimodality in trajectories. Based on these data, the NN-Planner maps visual and inertial observations to trajectory parameters for handling unknown environments. The network outputs are then optimized to enhance both reliability and explainability, ensuring robust performance. Furthermore, we propose a framework that supports robust online replanning with tolerance to planning latency. Comprehensive simulations validate the LIT-Planner's time efficiency without compromising trajectory quality compared to optimization-based methods. Real-world experiments further demonstrate its practical suitability for autonomous drone navigation.
    摘要 自适应飞行在未知环境中需要精准规划空间和时间轨迹的profile,通常是非核心优化,导致高时间成本和易陷到地点优化。为解决这些限制,我们介绍了学习INITIALIZED Trajectory Planner(LIT-Planner),一种新的方法,该使用神经网络(NN)Planner提供初始值。我们首先利用空间-时间优化批处理生成训练例子,以捕捉多模态的轨迹。基于这些数据,NN-Planner将视觉和遥感观察映射到轨迹参数,以处理未知环境。网络输出被优化,以提高可靠性和可解释性,确保robust性。此外,我们提出了支持稳定在线重新规划的框架,抗性能规划延迟。完整的 simulations validate LIT-Planner的时间效率,而不会妥协轨迹质量与优化方法相比。实际世界实验进一步证明了它的实用性。

Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation

  • paper_url: http://arxiv.org/abs/2309.10677
  • repo_url: None
  • paper_authors: Yucheng Li
  • for: 本研究旨在提供一种不需要全量训练数据的污染分析方法,以便对现代语言模型进行可靠的评估。
  • methods: 本研究提出了一种基于沟通能力的污染分析方法,无需访问全量训练数据。
  • results: 研究发现,近期的基础模型在文本理解和概要写作benchmark上存在显著的记忆现象,而多选问题相对较少受污染。
    Abstract Data contamination in model evaluation is getting increasingly prevalent as the massive training corpora of large language models often unintentionally include benchmark samples. Therefore, contamination analysis has became an inevitable part of reliable model evaluation. However, existing method of contamination analysis requires the access of the entire training data which is often confidential for recent models. This prevent the community to rigorously audit these models and conduct accurate assessment of their capability. In this paper, we propose a novel method to quantify contamination without the access of the full training set, that measure the extent of contamination with perplexity. Our analysis provides evidence of significant memorisation of recent foundation models in popular reading comprehension, summarisation benchmarks, while multiple choice appears less contaminated.
    摘要 大量语言模型的训练集中的数据污染问题在不断增加,这是由于大型语言模型的训练集经常意外包含了标准样本。因此,污染分析已成为可靠模型评估的不可或缺的一部分。然而,现有的污染分析方法需要访问整个训练数据,这些数据通常是最新的模型中的商业秘密。这会阻碍社区对这些模型进行严格审核和准确评估其能力。在这篇论文中,我们提出了一种新的方法,可以无需访问整个训练集来衡量污染程度,这种方法基于混淆度来衡量污染程度。我们的分析表明,最近的基础模型在受欢迎的阅读理解和概要 Writing benchmarks 中存在较大的记忆现象,而多选题则相对较少污染。

Language Modeling Is Compression

  • paper_url: http://arxiv.org/abs/2309.10668
  • repo_url: https://github.com/facebookresearch/FBTT-Embedding
  • paper_authors: Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness
  • for: 这项研究的目的是用predictive模型来压缩数据,并评估大型自然语言模型的压缩能力。
  • methods: 该研究使用了大型自然语言模型,并使用了压缩视角来评估这些模型的Scaling laws、Tokenization和in-context learning能力。
  • results: 研究发现,大型自然语言模型不仅是强大的预测器,而且可以压缩图像和语音数据,比如ImageNet和LibriSpeech,以达到43.4%和16.4%的压缩率。此外,研究还表明,使用压缩视角可以使用任何压缩器(如gzip)建立conditional generative模型。
    Abstract It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
    摘要

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages

  • paper_url: http://arxiv.org/abs/2309.10661
  • repo_url: https://github.com/indonlp/nusa-writes
  • paper_authors: Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung
  • for: 这篇论文的目的是推广自然语言处理(NLP)技术的访问权,尤其是为underrepresented和EXTREMELY low-resource语言。
  • methods: 这篇论文使用了在线抓取和文档翻译来构建标注和无标注 corpora。然而,这些方法存在限制,包括lack of lexical diversity和local community的文化相关性。
  • results: 我们的实验结果表明,通过本地Native speakers写作 paragraphs来构建dataset可以提高lexical diversity和文化内容的质量。此外,我们还提供了\datasetname{} benchmark,包括12种underrepresented和EXTREMELY low-resource语言,这些语言在印度尼西亚被 millions of people speaking。我们的实验结果表明,现有的多语言大语言模型需要扩展到更多的underrepresented语言。我们在github上发布了nusa-writes dataset,可以在https://github.com/IndoNLP/nusa-writes 中下载。
    Abstract Democratizing access to natural language processing (NLP) technology is crucial, especially for underrepresented and extremely low-resource languages. Previous research has focused on developing labeled and unlabeled corpora for these languages through online scraping and document translation. While these methods have proven effective and cost-efficient, we have identified limitations in the resulting corpora, including a lack of lexical diversity and cultural relevance to local communities. To address this gap, we conduct a case study on Indonesian local languages. We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets. Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content. In addition, we present the \datasetname{} benchmark, encompassing 12 underrepresented and extremely low-resource languages spoken by millions of individuals in Indonesia. Our empirical experiment results using existing multilingual large language models conclude the need to extend these models to more underrepresented languages. We release the NusaWrites dataset at https://github.com/IndoNLP/nusa-writes.
    摘要 德米实现自然语言处理(NLP)技术的普及是非常重要,特别是 для那些受到歧视和资源匮乏的语言。过往的研究专注于透过网络采集和文档翻译来建立这些语言的标点和无标点数据库。although these methods have proven effective and cost-efficient,我们发现这些数据库中的缺失,包括语汇多样性和本地文化内涵。为了解决这个问题,我们进行了印尼地方语言的案例研究。我们比较了网络采集、人工翻译和本地母语者写作 paragraph的方法,以建立数据集。我们的发现是,由本地母语者写作 paragraph的数据集具有较高的语汇多样性和本地文化内涵。此外,我们还提供了 \datasetname{} 数据集,覆盖了印尼12种未代表和具有严重资源不足的语言,这些语言由 millions of individuals 使用。我们的实验结果显示,扩展现有的多语言大型语言模型到更多的未代表语言是必要的。我们在 GitHub 上发布了 NusaWrites 数据集,请参考 https://github.com/IndoNLP/nusa-writes。

CFGPT: Chinese Financial Assistant with Large Language Model

  • paper_url: http://arxiv.org/abs/2309.10654
  • repo_url: None
  • paper_authors: Jiangtong Li, Yuxuan Bian, Guoxuan Wang, Yang Lei, Dawei Cheng, Zhijun Ding, Changjun Jiang
  • For: The paper is written for presenting a Chinese Financial Generative Pre-trained Transformer framework (CFGPT) for natural language processing tasks in the financial domain.* Methods: The paper uses a dataset (CFData) for pre-training and supervised fine-tuning, a financial LLM (CFLLM) to manage financial texts, and a deployment framework (CFAPP) to navigate real-world financial applications. The CFLLM is trained on CFData in two stages, continued pre-training and supervised fine-tuning.* Results: The paper presents a tailored dataset (CFData) for financial natural language processing, a financial LLM (CFLLM) that can adeptly manage financial texts, and a deployment framework (CFAPP) with additional modules for multifaceted functionality in real-world applications.Here are the three information points in Simplified Chinese text:
  • for: 这篇论文是为了介绍一种基于Transformer框架的中文金融生成预训练模型(CFGPT),用于金融自然语言处理任务。
  • methods: 论文使用了一个名为CFData的数据集进行预训练和精度调整,还有一个专门为金融文本管理的金融LLM(CFLLM),以及一个用于实际应用的投放框架(CFAPP)。CFLLM通过两个阶段的预训练和精度调整来训练。
  • results: 论文提供了一个适用于金融自然语言处理的专门数据集(CFData),一个能够有效地处理金融文本的金融LLM(CFLLM),以及一个具有多方面功能的投放框架(CFAPP)。
    Abstract Large language models (LLMs) have demonstrated great potential in natural language processing tasks within the financial domain. In this work, we present a Chinese Financial Generative Pre-trained Transformer framework, named CFGPT, which includes a dataset~(CFData) for pre-training and supervised fine-tuning, a financial LLM~(CFLLM) to adeptly manage financial texts, and a deployment framework~(CFAPP) designed to navigate real-world financial applications. The CFData comprising both a pre-training dataset and a supervised fine-tuning dataset, where the pre-training dataset collates Chinese financial data and analytics, alongside a smaller subset of general-purpose text with 584M documents and 141B tokens in total, and the supervised fine-tuning dataset is tailored for six distinct financial tasks, embodying various facets of financial analysis and decision-making with 1.5M instruction pairs and 1.5B tokens in total. The CFLLM, which is based on InternLM-7B to balance the model capability and size, is trained on CFData in two stage, continued pre-training and supervised fine-tuning. The CFAPP is centered on large language models (LLMs) and augmented with additional modules to ensure multifaceted functionality in real-world application. Our codes are released at https://github.com/TongjiFinLab/CFGPT.
    摘要 大型自然语言处理模型(LLMs)在金融领域内表现出了很大的潜力。在这项工作中,我们介绍了一个名为CFGPT的中文金融生成预训练 transformer框架,其包括一个名为CFData的预训练和监督练习 dataset,一个适应金融文本的金融LLM,以及一个为实际金融应用而设计的CFAPP框架。CFData包含了中文金融数据和分析,以及一小部分通用文本的584M份文档和141B个字符。CFLLM基于InternLM-7B,通过两stage的预训练和监督练习来训练。CFAPP是基于LLMs的框架,并增加了其他模块,以确保在实际应用中的多方面功能。我们的代码在https://github.com/TongjiFinLab/CFGPT上发布。

Towards Energy-Aware Federated Traffic Prediction for Cellular Networks

  • paper_url: http://arxiv.org/abs/2309.10645
  • repo_url: https://github.com/vperifan/federated-time-series-forecasting
  • paper_authors: Vasileios Perifanis, Nikolaos Pavlidis, Selim F. Yilmaz, Francesc Wilhelmi, Elia Guerra, Marco Miozzo, Pavlos S. Efraimidis, Paolo Dini, Remous-Aris Koutsiamanis
  • for: 预测 fifth-generation 网络流量是一项重要的活动,以便优化网络,因为准确的预测是关键 для智能网络设计、资源分配和异常情况检测。
  • methods: 本文使用了 federated learning(FL)作为一种机器学习训练框架,以提高预测精度并避免数据中心化问题。
  • results: 研究发现,大型机器学习模型在联合学习场景下可以 marginally 提高性能,但具有显著的环境影响,导致它们在实际应用中不实际。
    Abstract Cellular traffic prediction is a crucial activity for optimizing networks in fifth-generation (5G) networks and beyond, as accurate forecasting is essential for intelligent network design, resource allocation and anomaly mitigation. Although machine learning (ML) is a promising approach to effectively predict network traffic, the centralization of massive data in a single data center raises issues regarding confidentiality, privacy and data transfer demands. To address these challenges, federated learning (FL) emerges as an appealing ML training framework which offers high accurate predictions through parallel distributed computations. However, the environmental impact of these methods is often overlooked, which calls into question their sustainability. In this paper, we address the trade-off between accuracy and energy consumption in FL by proposing a novel sustainability indicator that allows assessing the feasibility of ML models. Then, we comprehensively evaluate state-of-the-art deep learning (DL) architectures in a federated scenario using real-world measurements from base station (BS) sites in the area of Barcelona, Spain. Our findings indicate that larger ML models achieve marginally improved performance but have a significant environmental impact in terms of carbon footprint, which make them impractical for real-world applications.
    摘要 fifth-generation (5G) 网络中的 cellular traffic prediction 是一项非常重要的活动,因为准确预测是智能网络设计、资源分配和异常现象 mitigation 的关键。虽然机器学习 (ML) 是一种有前途的方法来有效预测网络流量,但是集中大量数据在单个数据中心存储的问题会导致隐私、安全性和数据传输带宽的问题。为解决这些挑战,联邦学习 (FL) 作为一种有appeal的 ML 训练框架,通过并行分布计算来提供高精度预测。然而,这些方法的环境影响 часто被忽略,这会让它们的可持续性成为问题。在这篇论文中,我们考虑了精度和能源消耗之间的负面交互,并提出了一个可用于评估 ML 模型可持续性的新指标。然后,我们对现有的深度学习 (DL) 架构在联邦enario中进行了广泛的评估,使用了实际测量从 Barcelona, Spain 的基站 (BS) 站点。我们发现,更大的 ML 模型可以marginally提高性能,但具有 significanth carbon footprint,这使得它们在实际应用中不可持续。

Geometric structure of Deep Learning networks and construction of global ${\mathcal L}^2$ minimizers

  • paper_url: http://arxiv.org/abs/2309.10639
  • repo_url: None
  • paper_authors: Thomas Chen, Patricia Muñoz Ewald
  • for: 这个论文的目的是对深度学习(Deep Learning)网络的结构做出几何解释,并使用$L$层抑制函数、${\mathcal L}^2$Schatten类(或希尔бер特- Schmidt)成本函数、输入和输出空间为${\mathbb R}^Q$($Q\geq1$)。
  • methods: 这篇论文使用了作者们之前对浅层神经网络的研究结果,构建了一个可导的家族解 minimizers,以实现深度学习网络的全局最小值。在这个设置下,隐藏层神经网络”照料’’(curate)训练输入数据,通过重层应用截断函数来最小化训练输入的噪声比例。
  • results: 论文显示,在$L\geq Q$的情况下,深度学习网络的全局最小值存在$2^Q-1$个不同的特点点。
    Abstract In this paper, we provide a geometric interpretation of the structure of Deep Learning (DL) networks, characterized by $L$ hidden layers, a ramp activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, and input and output spaces ${\mathbb R}^Q$ with equal dimension $Q\geq1$. The hidden layers are defined on spaces ${\mathbb R}^{Q}$, as well. We apply our recent results on shallow neural networks to construct an explicit family of minimizers for the global minimum of the cost function in the case $L\geq Q$, which we show to be degenerate. In the context presented here, the hidden layers of the DL network "curate" the training inputs by recursive application of a truncation map that minimizes the noise to signal ratio of the training inputs. Moreover, we determine a set of $2^Q-1$ distinct degenerate local minima of the cost function.
    摘要 在这篇论文中,我们提供了深度学习(DL)网络的几何解释,其特征为有$L$层感知层、斜坡活动函数、${\mathcal L}^2$ Schatten类(或希尔伯特-Ш密特)成本函数,以及输入和输出空间为${\mathbb R}^Q$,其维度为$Q\geq1$。感知层在${\mathbb R}^{Q}$上定义。我们利用我们之前对浅层神经网络的研究,构造了$L\geq Q$时的全局最小值的明确家族,并证明其为极值。在这种情况下,深度学习网络的隐藏层“照料”训练输入,通过重复应用一个减少映射来最小化训练输入的噪声与信号比率。此外,我们确定了$2^Q-1$个不同的极值本地最小值。

Exploring the Influence of Information Entropy Change in Learning Systems

  • paper_url: http://arxiv.org/abs/2309.10625
  • repo_url: None
  • paper_authors: Xiaowei Yu, Yao Xue, Lu Zhang, Li Wang, Tianming Liu, Dajiang Zhu
    for: This paper explores the influence of entropy change in deep learning systems by adding noise to the inputs/latent features, with applications in computer vision tasks.methods: The paper uses theoretical analysis and empirical experiments to demonstrate the enhancement gained from positive noise by reducing the task complexity defined by information entropy.results: The paper shows significant performance gains in large image datasets such as ImageNet by proactively injecting positive noise, achieving an unprecedented top 1 accuracy of over 95%.
    Abstract In this work, we explore the influence of entropy change in deep learning systems by adding noise to the inputs/latent features. The applications in this paper focus on deep learning tasks within computer vision, but the proposed theory can be further applied to other fields. Noise is conventionally viewed as a harmful perturbation in various deep learning architectures, such as convolutional neural networks (CNNs) and vision transformers (ViTs), as well as different learning tasks like image classification and transfer learning. However, this paper aims to rethink whether the conventional proposition always holds. We demonstrate that specific noise can boost the performance of various deep architectures under certain conditions. We theoretically prove the enhancement gained from positive noise by reducing the task complexity defined by information entropy and experimentally show the significant performance gain in large image datasets, such as the ImageNet. Herein, we use the information entropy to define the complexity of the task. We categorize the noise into two types, positive noise (PN) and harmful noise (HN), based on whether the noise can help reduce the complexity of the task. Extensive experiments of CNNs and ViTs have shown performance improvements by proactively injecting positive noise, where we achieved an unprecedented top 1 accuracy of over 95% on ImageNet. Both theoretical analysis and empirical evidence have confirmed that the presence of positive noise can benefit the learning process, while the traditionally perceived harmful noise indeed impairs deep learning models. The different roles of noise offer new explanations for deep models on specific tasks and provide a new paradigm for improving model performance. Moreover, it reminds us that we can influence the performance of learning systems via information entropy change.
    摘要 在这项研究中,我们探索深度学习系统中 entropy 变化的影响。我们在计算机视觉领域中应用了深度学习任务,但我们的理论可以应用到其他领域。在传统上,噪声被视为深度学习架构中的危险扰动,如 convolutional neural networks (CNNs) 和 vision transformers (ViTs),以及不同的学习任务,如图像分类和转移学习。然而,这篇论文想要重新思考这一观点是否总是正确的。我们展示了特定的噪声可以在某些条件下提高深度架构的性能。我们使用信息 entropy 来定义任务的复杂度,并分类噪声为正面噪声 (PN) 和有害噪声 (HN),根据噪声是否可以减少任务的复杂度。我们在大量图像数据集,如 ImageNet,进行了广泛的实验,并证明了在某些条件下,注意性噪声可以提高深度架构的性能。我们的理论分析和实验证据都表明,在某些任务上,正面噪声可以促进学习过程,而传统上认为的危险噪声实际上会降低深度学习模型的性能。这些不同的噪声角色为深度模型在特定任务上提供了新的解释,并提供了一个新的性能提升模式。此外,它提醒我们可以通过改变信息 entropy 来影响学习系统的性能。

Large language models can accurately predict searcher preferences

  • paper_url: http://arxiv.org/abs/2309.10621
  • repo_url: None
  • paper_authors: Paul Thomas, Seth Spielman, Nick Craswell, Bhaskar Mitra
  • for: 这个论文目的是提高搜索系统中 Labels 的质量,即用户是否认为搜索结果有用。
  • methods: 这个论文使用了大型自然语言模型(Lang Model)来生成 Labels,并通过对用户反馈进行训练来改进 Labels 的质量。
  • results: 论文表明,使用 Lang Model 可以生成高质量 Labels,并且比第三方标注者更准确,同时也比较cost-effective。此外,这些 Labels 还可以用于训练更好的排名算法。
    Abstract Relevance labels, which indicate whether a search result is valuable to a searcher, are key to evaluating and optimising search systems. The best way to capture the true preferences of users is to ask them for their careful feedback on which results would be useful, but this approach does not scale to produce a large number of labels. Getting relevance labels at scale is usually done with third-party labellers, who judge on behalf of the user, but there is a risk of low-quality data if the labeller doesn't understand user needs. To improve quality, one standard approach is to study real users through interviews, user studies and direct feedback, find areas where labels are systematically disagreeing with users, then educate labellers about user needs through judging guidelines, training and monitoring. This paper introduces an alternate approach for improving label quality. It takes careful feedback from real users, which by definition is the highest-quality first-party gold data that can be derived, and develops an large language model prompt that agrees with that data. We present ideas and observations from deploying language models for large-scale relevance labelling at Bing, and illustrate with data from TREC. We have found large language models can be effective, with accuracy as good as human labellers and similar capability to pick the hardest queries, best runs, and best groups. Systematic changes to the prompts make a difference in accuracy, but so too do simple paraphrases. To measure agreement with real searchers needs high-quality ``gold'' labels, but with these we find that models produce better labels than third-party workers, for a fraction of the cost, and these labels let us train notably better rankers.
    摘要 搜寻结果的价值性标签(relevance labels)是评估和优化搜寻系统的关键因素。获取高质量的标签最好的方法是请求用户提供精确的反馈,但这种方法不能生产大量的标签。通过第三方审核员进行审核,但这可能会导致低质量的数据。将高质量的标签获取到大量的数据是一个问题。这篇文章介绍了一种新的方法来提高标签质量。它从真实的用户中获取了精确的反馈,并使用大型自然语言模型来开发问题提示,以确保它们与用户需求相符。我们在部署语言模型进行大规模的审核labeling时发现,大型语言模型可以有高精度和人工审核员相似的能力,并且能够处理最困难的查询、最佳路径和最佳分组。我们发现,对于标签的系统性改变可以提高精确性,但也注意到了简单的重写可以获得相似的效果。为了衡量模型与真实搜寻者需求的一致,我们需要高质量的“金”标签,但我们发现,这些标签可以让我们训练更好的排名器,并且这些标签的成本比第三方审核员来的便宜得多。

A Dynamic Linear Bias Incorporation Scheme for Nonnegative Latent Factor Analysis

  • paper_url: http://arxiv.org/abs/2309.10618
  • repo_url: None
  • paper_authors: Yurong Zhong, Zhe Xie, Weiling Li, Xin Luo
  • for: Handle high-dimensional and incomplete (HDI) data in big data-related applications, such as social network services systems, by learning HDI data representation.
  • methods: Propose a dynamic linear bias incorporation (DLBI) scheme to improve the scalability and representation ability of nonnegative latent factor analysis (NLFA) models for HDI data.
  • results: Obtain higher representation accuracy and competitive computational efficiency compared to state-of-the-art models on three HDI datasets from real applications.
    Abstract High-Dimensional and Incomplete (HDI) data is commonly encountered in big data-related applications like social network services systems, which are concerning the limited interactions among numerous nodes. Knowledge acquisition from HDI data is a vital issue in the domain of data science due to their embedded rich patterns like node behaviors, where the fundamental task is to perform HDI data representation learning. Nonnegative Latent Factor Analysis (NLFA) models have proven to possess the superiority to address this issue, where a linear bias incorporation (LBI) scheme is important in present the training overshooting and fluctuation, as well as preventing the model from premature convergence. However, existing LBI schemes are all statistic ones where the linear biases are fixed, which significantly restricts the scalability of the resultant NLFA model and results in loss of representation learning ability to HDI data. Motivated by the above discoveries, this paper innovatively presents the dynamic linear bias incorporation (DLBI) scheme. It firstly extends the linear bias vectors into matrices, and then builds a binary weight matrix to switch the active/inactive states of the linear biases. The weight matrix's each entry switches between the binary states dynamically corresponding to the linear bias value variation, thereby establishing the dynamic linear biases for an NLFA model. Empirical studies on three HDI datasets from real applications demonstrate that the proposed DLBI-based NLFA model obtains higher representation accuracy several than state-of-the-art models do, as well as highly-competitive computational efficiency.
    摘要 高维ensional和不完全(HDI)数据在大数据相关应用中常见,如社交媒体系统等,它们关注有限的节点间交互。科学数据获取从HDI数据是数据科学领域的重要问题,因为它们嵌入了诸如节点行为的复杂模式。非正式因子分析(NLFA)模型已经证明可以解决这个问题,其中线性偏好包含(LBI)策略可以避免模型快速 converges 和抖动。然而,现有的LBI策略都是静态的,这限制了NLFA模型的可扩展性和对HDI数据的表达能力。这篇论文驱动于以上发现,开创了动态线性偏好包含(DLBI)策略。它首先将线性偏好 vectors 扩展到矩阵,然后建立一个二进制权重矩阵,以switch动态线性偏好的活动/不活动状态。每个权重矩阵中的每个Entry 在线性偏好值变化时动态地 switching между二进制状态,从而实现了动态线性偏好。实验研究在三个HDI数据集上表明,提案的DLBI-based NLFA模型在表达精度方面比现有模型高得多,同时computational efficiency 也具有高度竞争力。

Decentralized Online Learning in Task Assignment Games for Mobile Crowdsensing

  • paper_url: http://arxiv.org/abs/2309.10594
  • repo_url: None
  • paper_authors: Bernd Simon, Andrea Ortiz, Walid Saad, Anja Klein
  • for: 这个研究是为了解决移动对感应系统 (MCS) 中的聚合数据收集问题。
  • methods: 这个研究使用了一种新的分布式方法,结合了对抗理论和在线学习,被称为碰撞避免多重枪 (CA-MAB-SFS)。这个方法模型了任务将分配问题为一个对抗游戏,考虑到 MCSP 和 MU 的个人目标,并让 MU 在线上学习其努力。
  • results: 这个研究的结果显示,CA-MAB-SFS 可以将 MCSP 和 MU 的满意度提高,并且降低均值任务完成时间,至少降低 16%。此外,CA-MAB-SFS 可以确保任务分配问题的稳定 regret 是一个线性下降函数,并且在线上学习过程中,MU 的学习速度得到了重要的改善。
    Abstract The problem of coordinated data collection is studied for a mobile crowdsensing (MCS) system. A mobile crowdsensing platform (MCSP) sequentially publishes sensing tasks to the available mobile units (MUs) that signal their willingness to participate in a task by sending sensing offers back to the MCSP. From the received offers, the MCSP decides the task assignment. A stable task assignment must address two challenges: the MCSP's and MUs' conflicting goals, and the uncertainty about the MUs' required efforts and preferences. To overcome these challenges a novel decentralized approach combining matching theory and online learning, called collision-avoidance multi-armed bandit with strategic free sensing (CA-MAB-SFS), is proposed. The task assignment problem is modeled as a matching game considering the MCSP's and MUs' individual goals while the MUs learn their efforts online. Our innovative "free-sensing" mechanism significantly improves the MU's learning process while reducing collisions during task allocation. The stable regret of CA-MAB-SFS, i.e., the loss of learning, is analytically shown to be bounded by a sublinear function, ensuring the convergence to a stable optimal solution. Simulation results show that CA-MAB-SFS increases the MUs' and the MCSP's satisfaction compared to state-of-the-art methods while reducing the average task completion time by at least 16%.
    摘要 <>将数据收集协调问题应用于移动农垦系统(MCSP)中。MCSP逐次发布感知任务到可用的移动单元(MU),并且MU通过发送感知申请回到MCSP。从接收的申请中,MCSP决定任务分配。稳定任务分配必须解决两个挑战:MCSP和MU的目标冲突,以及MU的努力和偏好的不确定性。为了解决这些挑战,我们提出了一种新的分布式方法, combining matching theory和在线学习,称为碰撞避免多重臂bandit with strategic free sensing(CA-MAB-SFS)。任务分配问题被模型为一个匹配游戏,考虑MCSP和MU的个人目标,而MU在线学习其努力。我们的创新的“免费感知”机制可以显著提高MU的学习过程,同时降低任务分配中的碰撞。CA-MAB-SFS的稳定征 regret,即学习损失,被分析显示为一个下线函数,确保 converge to a stable optimal solution。实验结果表明,CA-MAB-SFS在比 estado-of-the-art方法的情况下,使MU和MCSP满意度提高,而任务完成时间平均下降至少16%。Note: The translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and widely used in other countries as well. If you need Traditional Chinese, please let me know.

PDRL: Multi-Agent based Reinforcement Learning for Predictive Monitoring

  • paper_url: http://arxiv.org/abs/2309.10576
  • repo_url: None
  • paper_authors: Thanveer Shaik, Xiaohui Tao, Lin Li, Haoran Xie, U R Acharya, Raj Gururajan, Xujuan Zhou
  • for: 这个研究旨在提出一个新的、通用的预测深度学习(PDRL)系统,用于时间序列预测环境中。
  • methods: 这个系统使用多个对应的深度问题网络(DQN)代理人,以监控预测的未来环境状态,并将学习的知识与最大化奖励相结合。
  • results: 在评估过程中,三个DRL代理人能够顺利学习相应的模式,并逐次获得奖励。该系统在时间序列预测中实现了状态预测的最佳性能。
    Abstract Reinforcement learning has been increasingly applied in monitoring applications because of its ability to learn from previous experiences and can make adaptive decisions. However, existing machine learning-based health monitoring applications are mostly supervised learning algorithms, trained on labels and they cannot make adaptive decisions in an uncertain complex environment. This study proposes a novel and generic system, predictive deep reinforcement learning (PDRL) with multiple RL agents in a time series forecasting environment. The proposed generic framework accommodates virtual Deep Q Network (DQN) agents to monitor predicted future states of a complex environment with a well-defined reward policy so that the agent learns existing knowledge while maximizing their rewards. In the evaluation process of the proposed framework, three DRL agents were deployed to monitor a subject's future heart rate, respiration, and temperature predicted using a BiLSTM model. With each iteration, the three agents were able to learn the associated patterns and their cumulative rewards gradually increased. It outperformed the baseline models for all three monitoring agents. The proposed PDRL framework is able to achieve state-of-the-art performance in the time series forecasting process. The proposed DRL agents and deep learning model in the PDRL framework are customized to implement the transfer learning in other forecasting applications like traffic and weather and monitor their states. The PDRL framework is able to learn the future states of the traffic and weather forecasting and the cumulative rewards are gradually increasing over each episode.
    摘要 强化学习在监测应用中得到了广泛应用,因为它可以从前一次的经验中学习并做出适应性的决策。然而,现有的机器学习基于的健康监测应用多为指导学习算法,它们不能在不确定的复杂环境中做出适应性的决策。本研究提出了一种新的和通用的系统——预测深度强化学习(PDRL),该系统通过多个RL代理在时间序列预测环境中使用多个DQN代理来监测预测的未来状况,以便代理学习现有的知识而寻求最大化奖励。在评估PDRL系统的过程中,三个DRL代理被部署到监测一个人的未来心率、呼吸和体温预测结果。在每次迭代中,三个代理能够学习相关的模式,其总奖励逐渐增长。与基线模型相比,PDRL系统在时间序列预测过程中实现了状态的极佳性能。PDRL系统可以在其他预测应用中,如交通和天气预测,实现传输学习,并监测其状态。在每个 episoden 中,PDRL系统能够学习未来的交通和天气预测结果,并逐渐增长其总奖励。

A multimodal deep learning architecture for smoking detection with a small data approach

  • paper_url: http://arxiv.org/abs/2309.10561
  • repo_url: None
  • paper_authors: Robert Lakatos, Peter Pollner, Andras Hajdu, Tamas Joo
  • for: 探索使用人工智能检测隐藏烟草广告的可能性,以提高媒体内容的不偏不倚和公平性。
  • methods: 提出一种基于深度学习、生成方法和人类干扰的整合文本和图像处理模型,可以在文本和图像格式下检测吸烟场景,即使有限的训练数据。
  • results: 模型可以达到74%的图像准确率和98%的文本准确率,并且可以 integrating human reinforcement 进行专家干扰。
    Abstract Introduction: Covert tobacco advertisements often raise regulatory measures. This paper presents that artificial intelligence, particularly deep learning, has great potential for detecting hidden advertising and allows unbiased, reproducible, and fair quantification of tobacco-related media content. Methods: We propose an integrated text and image processing model based on deep learning, generative methods, and human reinforcement, which can detect smoking cases in both textual and visual formats, even with little available training data. Results: Our model can achieve 74\% accuracy for images and 98\% for text. Furthermore, our system integrates the possibility of expert intervention in the form of human reinforcement. Conclusions: Using the pre-trained multimodal, image, and text processing models available through deep learning makes it possible to detect smoking in different media even with few training data.
    摘要 引言:借由覆住式烟草广告,常会引起 regulatory 措施。本文提出,人工智能,特别是深度学习,具有察觉隐藏广告的潜力,并可以实现不偏袋、可重复、公正地评估烟草相关媒体内容。方法:我们提议一种基于深度学习、生成方法和人类补做的集成文本和图像处理模型,可以在文本和图像格式中检测吸烟场景,即使培训数据 scarcity 。结果:我们的模型可以达到 74% 的准确率 для图像和 98% 的准确率 для文本。此外,我们的系统还可以 integrate the possibility of expert intervention in the form of human reinforcement。结论:通过深度学习提供的 pré-train 多模态、图像和文本处理模型,可以在媒体中检测吸烟,即使培训数据 scarce。

A Neighbourhood-Aware Differential Privacy Mechanism for Static Word Embeddings

  • paper_url: http://arxiv.org/abs/2309.10551
  • repo_url: None
  • paper_authors: Danushka Bollegala, Shuichi Otake, Tomoya Machide, Ken-ichi Kawarabayashi
  • for: 保护个人隐私(differential privacy)
  • methods: 使用邻域相关的差分隐私机制(Neighbourhood-Aware Differential Privacy,NADP),根据word embedding空间中 Word 的邻域构建图,并在不同邻域中应用不同水平的高斯噪声,以保证指定的隐私水平。
  • results: 在多个下游任务中,NADP 机制比 laplacian、gaussian 和 mahalanobis 等先前提出的隐私机制表现更好,同时保证更高的隐私水平。
    Abstract We propose a Neighbourhood-Aware Differential Privacy (NADP) mechanism considering the neighbourhood of a word in a pretrained static word embedding space to determine the minimal amount of noise required to guarantee a specified privacy level. We first construct a nearest neighbour graph over the words using their embeddings, and factorise it into a set of connected components (i.e. neighbourhoods). We then separately apply different levels of Gaussian noise to the words in each neighbourhood, determined by the set of words in that neighbourhood. Experiments show that our proposed NADP mechanism consistently outperforms multiple previously proposed DP mechanisms such as Laplacian, Gaussian, and Mahalanobis in multiple downstream tasks, while guaranteeing higher levels of privacy.
    摘要 我们提出了一种基于邻居的差分隐私(NADP)机制,利用预训练的静态单词嵌入空间中的邻居 relaciones,确定最小的噪声量以保证指定的隐私水平。我们首先将单词的嵌入图构建成 nearest neighbor 图,然后将其分解成一系列相互独立的噪声应用。实验表明,我们的提议的 NADP 机制在多个下游任务中 consistently 超越了多个先前提出的差分隐私机制,如 Laplacian、Gaussian 和 Mahalanobis,同时保证更高的隐私水平。

Towards Generative Modeling of Urban Flow through Knowledge-enhanced Denoising Diffusion

  • paper_url: http://arxiv.org/abs/2309.10547
  • repo_url: https://github.com/tsinghua-fib-lab/kstdiff-urban-flow-generation
  • paper_authors: Zhilun Zhou, Jingtao Ding, Yu Liu, Depeng Jin, Yong Li
  • for: 本研究旨在生成城市流动数据,尤其是在数据缺乏或新规划区域的情况下。
  • methods: 本研究使用了Diffusion Model和知识增强的spatio-temporal diffusion模型(KSTDiff)来生成城市流动数据。在KSTDiff模型中,我们首先构建了一个城市知识图(UKG),以模拟城市环境和区域之间的关系。然后,我们设计了一个学习式的流量估计器,以便准确地生成不同区域的流量。此外,我们还提出了一种知识增强的降噪网络,以捕捉城市流动的空间时间关系以及城市环境的影响。
  • results: 对四个实际数据集进行了广泛的实验,并证明了我们的模型在城市流动生成方面的优越性。此外,我们还进行了更深入的研究,证明了生成的城市流动数据的实用性和我们模型的长期流动预测和城市流动预测能力。
    Abstract Although generative AI has been successful in many areas, its ability to model geospatial data is still underexplored. Urban flow, a typical kind of geospatial data, is critical for a wide range of urban applications. Existing studies mostly focus on predictive modeling of urban flow that predicts the future flow based on historical flow data, which may be unavailable in data-sparse areas or newly planned regions. Some other studies aim to predict OD flow among regions but they fail to model dynamic changes of urban flow over time. In this work, we study a new problem of urban flow generation that generates dynamic urban flow for regions without historical flow data. To capture the effect of multiple factors on urban flow, such as region features and urban environment, we employ diffusion model to generate urban flow for regions under different conditions. We first construct an urban knowledge graph (UKG) to model the urban environment and relationships between regions, based on which we design a knowledge-enhanced spatio-temporal diffusion model (KSTDiff) to generate urban flow for each region. Specifically, to accurately generate urban flow for regions with different flow volumes, we design a novel diffusion process guided by a volume estimator, which is learnable and customized for each region. Moreover, we propose a knowledge-enhanced denoising network to capture the spatio-temporal dependencies of urban flow as well as the impact of urban environment in the denoising process. Extensive experiments on four real-world datasets validate the superiority of our model over state-of-the-art baselines in urban flow generation. Further in-depth studies demonstrate the utility of generated urban flow data and the ability of our model for long-term flow generation and urban flow prediction. Our code is released at: https://github.com/tsinghua-fib-lab/KSTDiff-Urban-flow-generation.
    摘要 although generative AI has been successful in many areas, its ability to model geospatial data is still underexplored. urban flow, a typical kind of geospatial data, is critical for a wide range of urban applications. existing studies mostly focus on predictive modeling of urban flow that predicts the future flow based on historical flow data, which may be unavailable in data-sparse areas or newly planned regions. some other studies aim to predict OD flow among regions but they fail to model dynamic changes of urban flow over time. in this work, we study a new problem of urban flow generation that generates dynamic urban flow for regions without historical flow data. to capture the effect of multiple factors on urban flow, such as region features and urban environment, we employ diffusion model to generate urban flow for regions under different conditions. we first construct an urban knowledge graph (UKG) to model the urban environment and relationships between regions, based on which we design a knowledge-enhanced spatio-temporal diffusion model (KSTDiff) to generate urban flow for each region. specifically, to accurately generate urban flow for regions with different flow volumes, we design a novel diffusion process guided by a volume estimator, which is learnable and customized for each region. moreover, we propose a knowledge-enhanced denoising network to capture the spatio-temporal dependencies of urban flow as well as the impact of urban environment in the denoising process. extensive experiments on four real-world datasets validate the superiority of our model over state-of-the-art baselines in urban flow generation. further in-depth studies demonstrate the utility of generated urban flow data and the ability of our model for long-term flow generation and urban flow prediction. our code is released at: https://github.com/tsinghua-fib-lab/KSTDiff-Urban-flow-generation.

Mean Absolute Directional Loss as a New Loss Function for Machine Learning Problems in Algorithmic Investment Strategies

  • paper_url: http://arxiv.org/abs/2309.10546
  • repo_url: None
  • paper_authors: Jakub Michańków, Paweł Sakowski, Robert Ślepaczuk
  • for: 这paper investigate了用于预测金融时间序列的机器学习模型优化中的恰当损失函数问题,以建立更好的算法投资策略(AIS)。
  • methods: authors propose了 Mean Absolute Directional Loss(MADL)函数,解决了 классиical forecast error functions中提取信息从预测中创建有效的 buy/sell signals问题。
  • results: authors based on two different asset classes(Bitcoin和Crude Oil)的数据,显示了新的损失函数可以更好地选择LSTM模型的超参数,并在它们的验证数据上获得更高的风险调整回报率。
    Abstract This paper investigates the issue of an adequate loss function in the optimization of machine learning models used in the forecasting of financial time series for the purpose of algorithmic investment strategies (AIS) construction. We propose the Mean Absolute Directional Loss (MADL) function, solving important problems of classical forecast error functions in extracting information from forecasts to create efficient buy/sell signals in algorithmic investment strategies. Finally, based on the data from two different asset classes (cryptocurrencies: Bitcoin and commodities: Crude Oil), we show that the new loss function enables us to select better hyperparameters for the LSTM model and obtain more efficient investment strategies, with regard to risk-adjusted return metrics on the out-of-sample data.
    摘要

Model Leeching: An Extraction Attack Targeting LLMs

  • paper_url: http://arxiv.org/abs/2309.10544
  • repo_url: None
  • paper_authors: Lewis Birch, William Hackett, Stefan Trawicki, Neeraj Suri, Peter Garraghan
  • for: 本研究旨在提取大语言模型(LLM)中的任务特有知识,并将其转换为具有减少参数的模型。
  • methods: 本研究使用的方法是Model Leeching,可以快速和效率地从目标LLM中提取任务特有的知识。
  • results: 研究表明,通过使用Model Leeching,可以从ChatGPT-3.5-Turbo中提取出73%的 preciseness(EM)相似性,以及SQuAD EM和F1分数分别为75%和87%,仅需API成本50美元。此外,研究还证明了对提取模型进行ML攻击的可行性,对ChatGPT-3.5-Turbo进行ML攻击时,通过 transferred adversarial attack 可以提高攻击成功率11%。
    Abstract Model Leeching is a novel extraction attack targeting Large Language Models (LLMs), capable of distilling task-specific knowledge from a target LLM into a reduced parameter model. We demonstrate the effectiveness of our attack by extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match (EM) similarity, and SQuAD EM and F1 accuracy scores of 75% and 87%, respectively for only $50 in API cost. We further demonstrate the feasibility of adversarial attack transferability from an extracted model extracted via Model Leeching to perform ML attack staging against a target LLM, resulting in an 11% increase to attack success rate when applied to ChatGPT-3.5-Turbo.
    摘要 模型偷窥(Model Leeching)是一种新的抽取攻击,可以从目标大语言模型(LLMs)中提取任务特定的知识,并将其转换为具有减少参数的模型。我们通过对ChatGPT-3.5-Turbo进行抽取,实现了73%的精确匹配(EM)相似性,以及SQuAD EM和F1分数的75%和87%分别。这些成绩只需要50美元的API成本。我们还证明了攻击者可以通过我们提取的模型来对目标LLM进行ML攻击,从而提高了攻击成功率11%。

OpenMSD: Towards Multilingual Scientific Documents Similarity Measurement

  • paper_url: http://arxiv.org/abs/2309.10539
  • repo_url: https://github.com/google-research/google-research
  • paper_authors: Yang Gao, Ji Ma, Ivan Korotkov, Keith Hall, Dana Alon, Don Metzler
  • for: 本研究旨在开发和评估多语言科学文献相似度测量模型,以便帮助多语言研究人员更有效地找到和探索相关文献。
  • methods: 我们使用Open-access Multilingual Scientific Documents(OpenMSD)数据集,该数据集包含74M篇论文和778M个引用对,并采用科学专业语言模型的预训练和不同策略来Derive “相关” 文献对进行细化。
  • results: 我们的最佳模型在比较 STRONG 基线模型时显著超越,提高了7-16%的平均精度。
    Abstract We develop and evaluate multilingual scientific documents similarity measurement models in this work. Such models can be used to find related works in different languages, which can help multilingual researchers find and explore papers more efficiently. We propose the first multilingual scientific documents dataset, Open-access Multilingual Scientific Documents (OpenMSD), which has 74M papers in 103 languages and 778M citation pairs. With OpenMSD, we pretrain science-specialized language models, and explore different strategies to derive "related" paper pairs to fine-tune the models, including using a mixture of citation, co-citation, and bibliographic-coupling pairs. To further improve the models' performance for non-English papers, we explore the use of generative language models to enrich the non-English papers with English summaries. This allows us to leverage the models' English capabilities to create better representations for non-English papers. Our best model significantly outperforms strong baselines by 7-16% (in mean average precision).
    摘要 我们在这项工作中开发和评估多语言科学文献相似度评估模型。这些模型可以帮助多语言研究人员更 efficiently找到相关的文献,从而提高研究效率。我们提出了首个多语言科学文献数据集 Open-access Multilingual Scientific Documents (OpenMSD),该数据集包含7400万篇文献和778亿引用对,涵盖103种语言。使用OpenMSD数据集,我们预训练了专门为科学研究设计的语言模型,并研究了不同的策略来生成"相关"文献对以进一步训练模型,包括使用混合引用、共同引用和文献联系对。为了进一步提高非英文文献的表现,我们explore了使用生成语言模型来把非英文文献与英文摘要相联系。这allow我们利用模型的英文能力来创建更好的非英文文献表示。我们的最佳模型在比 STRONG baseline 高7-16%(在平均精度上)。

A Cognitively-Inspired Neural Architecture for Visual Abstract Reasoning Using Contrastive Perceptual and Conceptual Processing

  • paper_url: http://arxiv.org/abs/2309.10532
  • repo_url: None
  • paper_authors: Yuan Yang, Deepayan Sanyal, James Ainooson, Joel Michelson, Effat Farhana, Maithilee Kunda
  • for: 解决视觉抽象逻辑任务,即人类认知中的抽象逻辑过程。
  • methods: 提出了一种基于人类认知原理的新神经网络 architecture,即对比性感知-概念处理网络(CPCNet),它模型了人类视觉抽象逻辑的迭代、自我对比、学习过程。
  • results: 在使用matrix reasoning问题的 estilo de Raven’s Progressive Matrices智能测验 dataset中,CPCNet实现了所有之前发表的模型的高精度,同时使用最弱的推导假设。此外,文章还发现了原始 RAVEN 数据集中的一个substantial和前未注意的类别偏见,并提出了一个新的 RAVEN 变体 – AB-RAVEN,它更加具有抽象概念的均衡。
    Abstract We introduce a new neural architecture for solving visual abstract reasoning tasks inspired by human cognition, specifically by observations that human abstract reasoning often interleaves perceptual and conceptual processing as part of a flexible, iterative, and dynamic cognitive process. Inspired by this principle, our architecture models visual abstract reasoning as an iterative, self-contrasting learning process that pursues consistency between perceptual and conceptual processing of visual stimuli. We explain how this new Contrastive Perceptual-Conceptual Network (CPCNet) works using matrix reasoning problems in the style of the well-known Raven's Progressive Matrices intelligence test. Experiments on the machine learning dataset RAVEN show that CPCNet achieves higher accuracy than all previously published models while also using the weakest inductive bias. We also point out a substantial and previously unremarked class imbalance in the original RAVEN dataset, and we propose a new variant of RAVEN -- AB-RAVEN -- that is more balanced in terms of abstract concepts.
    摘要 我团队引入了一种新的神经网络模型,用于解决视觉抽象逻辑任务, draws inspiration from human cognition, specifically the observation that human abstract reasoning often interleaves perceptual and conceptual processing as part of a flexible, iterative, and dynamic cognitive process. 我们的模型将视觉抽象逻辑模型为一种迭代、自相对抗的学习过程,以实现视觉各种抽象处理和概念处理之间的一致性。我们使用矩阵理解问题,类似于著名的鸭子进步矩阵测验,解释我们的新型 Contrastive Perceptual-Conceptual Network (CPCNet) 如何工作。我们的实验表明,CPCNet 在 RAVEN 机器学习 dataset 上达到了所有前一代模型的高精度,同时使用最弱的推导假设。我们还指出了原始 RAVEN 数据集中的一个substantial和 previously unremarked class imbalance,并提出了一个新的 AB-RAVEN 数据集,它更加均衡了抽象概念。

Visible and NIR Image Fusion Algorithm Based on Information Complementarity

  • paper_url: http://arxiv.org/abs/2309.10522
  • repo_url: None
  • paper_authors: Zhuo Li, Bo Li
  • for: 本研究旨在利用可见和近红外(NIR)频率图像的融合,以优化图像质量。
  • methods: 本研究提出了一种基于物理信号水平的共辐合模型,包括两层权值引导滤波器和导引滤波器来获取文本和边缘层,以及使用延展DoG滤波器来生成初始可见-NIR共辐合权重图。
  • results: 实验结果表明,提出的算法可以良好地利用可见和NIR图像的谱属性和信息协同性,并避免颜色不自然的问题,与现有的状态艺术比较。
    Abstract Visible and near-infrared(NIR) band sensors provide images that capture complementary spectral radiations from a scene. And the fusion of the visible and NIR image aims at utilizing their spectrum properties to enhance image quality. However, currently visible and NIR fusion algorithms cannot well take advantage of spectrum properties, as well as lack information complementarity, which results in color distortion and artifacts. Therefore, this paper designs a complementary fusion model from the level of physical signals. First, in order to distinguish between noise and useful information, we use two layers of the weight-guided filter and guided filter to obtain texture and edge layers, respectively. Second, to generate the initial visible-NIR complementarity weight map, the difference maps of visible and NIR are filtered by the extend-DoG filter. After that, the significant region of NIR night-time compensation guides the initial complementarity weight map by the arctanI function. Finally, the fusion images can be generated by the complementarity weight maps of visible and NIR images, respectively. The experimental results demonstrate that the proposed algorithm can not only well take advantage of the spectrum properties and the information complementarity, but also avoid color unnatural while maintaining naturalness, which outperforms the state-of-the-art.
    摘要 可见和近红外(NIR)摄像机提供图像,捕捉场景的补充 спектраль辐射。并将可见和NIR图像融合,以利用其谱属性提高图像质量。然而,当前可见和NIR融合算法无法好地利用谱属性,同时缺乏信息协同,导致颜色扭曲和 Artefacts。因此,这篇论文提出了基于物理信号的补充模型。首先,使用两层权重导向滤波器和导引滤波器来分别获取Texture和Edge层。其次,使用扩展DoG滤波器来缩放可见和NIR差分图。接着,使用arctanI函数来引导初始可见-NIR补充权重图。最后,可以使用可见和NIR图像的补充权重图来生成融合图像。实验结果表明,提出的算法不仅可以好地利用谱属性和信息协同,同时也可以避免颜色不自然,而保持自然性,与当前最佳的方法相比。
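
A rough sketch of the weighting idea is below. It assumes grayscale inputs and substitutes a plain difference-of-Gaussians for the paper's weight-guided and extended-DoG filters; the constants and the arctan squashing are illustrative, not the published algorithm.

```python
# Minimal sketch (assumptions, not the paper's implementation) of the idea:
# build a visible/NIR complementarity weight map from a difference-of-Gaussians
# of the two inputs, squash it with arctan, and blend the images with it.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog(img, s1=1.0, s2=4.0):
    """Difference of Gaussians, a stand-in for the paper's extended-DoG filter."""
    return gaussian_filter(img, s1) - gaussian_filter(img, s2)

def fuse_visible_nir(vis, nir, k=4.0):
    """vis, nir: float arrays in [0, 1] of the same shape (grayscale here)."""
    diff = dog(nir) - dog(vis)                 # where NIR carries extra detail
    w_nir = 0.5 + np.arctan(k * diff) / np.pi  # arctan-shaped weight in (0, 1)
    w_nir = gaussian_filter(w_nir, 2.0)        # smooth the weight map
    return w_nir * nir + (1.0 - w_nir) * vis   # complementarity-weighted blend

vis = np.random.rand(256, 256)                 # placeholder visible channel
nir = np.random.rand(256, 256)                 # placeholder NIR image
fused = fuse_visible_nir(vis, nir)
print(fused.shape, fused.min(), fused.max())
```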

Partially-Specified Causal Simulations

  • paper_url: http://arxiv.org/abs/2309.10514
  • repo_url: None
  • paper_authors: A. Zamanian, L. Mareis, N. Ahmidi
  • For: The paper emphasizes the importance of proper simulation design in causal inference research and introduces a new simulation framework, PARCS, that addresses this issue.
  • Methods: PARCS synthesizes data from graphical causal models with a wide range of adjustable parameters; users identify and specify the subset of parameters related to their method's assumptions and randomize the remaining ones to generate a range of complying data-generating processes (see the sketch below).
  • Results: The paper reproduces and extends the simulation studies of two well-known causal discovery and missing-data analysis papers, demonstrating the necessity of a proper simulation design and showing that PARCS yields a more comprehensive and inclusive empirical investigation of causal claims.
    Abstract Simulation studies play a key role in the validation of causal inference methods. The simulation results are reliable only if the study is designed according to the promised operational conditions of the method-in-test. Still, many causal inference literature tend to design over-restricted or misspecified studies. In this paper, we elaborate on the problem of improper simulation design for causal methods and compile a list of desiderata for an effective simulation framework. We then introduce partially-randomized causal simulation (PARCS), a simulation framework that meets those desiderata. PARCS synthesizes data based on graphical causal models and a wide range of adjustable parameters. There is a legible mapping from usual causal assumptions to the parameters, thus, users can identify and specify the subset of related parameters and randomize the remaining ones to generate a range of complying data-generating processes for their causal method. The result is a more comprehensive and inclusive empirical investigation for causal claims. Using PARCS, we reproduce and extend the simulation studies of two well-known causal discovery and missing data analysis papers to emphasize the necessity of a proper simulation design. Our results show that those papers would have improved and extended the findings, had they used PARCS for simulation. The framework is implemented as a Python package, too. By discussing the comprehensiveness and transparency of PARCS, we encourage causal inference researchers to utilize it as a standard tool for future works.
    摘要 模拟研究在 causal inference 方法的验证中扮演着关键角色。模拟结果的可靠性取决于研究按照测试方法的操作条件进行设计。然而,许多 causal inference 文献中的模拟设计往往过于紧张或不准确。在这篇文章中,我们讨论了模拟设计不当的问题,并编辑了一份有效模拟框架的需求列表。然后,我们介绍了 partially-randomized causal simulation(PARCS)模拟框架,该框架基于图形 causal 模型和广泛的可调参数。在这个框架中,用户可以明确地将相关参数与 causal 假设之间的映射,并随机化剩下的参数来生成一个包含多种合法的数据生成过程。这使得用户可以对 causal laims 进行更广泛和包容的实验 исследование。使用 PARCS,我们重新生成和扩展了两篇已有的 causal discovery 和 missing data analysis 文献中的模拟研究,以强调模拟设计的重要性。我们的结果表明,如果使用 PARCS,这些文献中的结果将更加完整和多元。PARCS 已经实现为 Python 包。通过讨论 PARCS 的全面性和透明度,我们鼓励 causal inference 研究人员在未来的工作中使用这种标准工具。
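
The following toy example illustrates what "partially specified" means in practice; it does not use the PARCS package or its API. The user fixes the edge of interest (X → Y) while the unspecified confounding coefficients are randomized per run, producing many complying data-generating processes.

```python
# Illustrative sketch of partially-specified causal simulation in the spirit
# of PARCS (this is not the PARCS package API): edges the user cares about get
# fixed functional forms, unspecified edge coefficients are randomized per run.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n=1000, fixed_effect=2.0):
    # DAG: Z -> X, Z -> Y, X -> Y ; the X -> Y effect is the quantity under study
    a_zx = rng.uniform(-1, 1)          # unspecified parameter: randomized
    a_zy = rng.uniform(-1, 1)          # unspecified parameter: randomized
    z = rng.normal(size=n)
    x = a_zx * z + rng.normal(size=n)
    y = fixed_effect * x + a_zy * z + rng.normal(size=n)
    return z, x, y

# A causal method would now be benchmarked over many complying data-generating
# processes rather than a single hand-picked one.
for run in range(3):
    z, x, y = simulate()
    naive = np.cov(x, y)[0, 1] / np.var(x)   # confounded regression slope
    print(f"run {run}: naive X->Y estimate = {naive:.2f} (true effect 2.0)")
```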

A Configurable Library for Generating and Manipulating Maze Datasets

  • paper_url: http://arxiv.org/abs/2309.10498
  • repo_url: https://github.com/understanding-search/maze-dataset
  • paper_authors: Michael Igorevich Ivanitskiy, Rusheb Shah, Alex F. Spies, Tilman Räuker, Dan Valentine, Can Rager, Lucia Quirke, Chris Mathwin, Guillaume Corlouer, Cecilia Diniz Behn, Samy Wu Fung
  • for: investigate how machine learning models respond to distributional shifts using maze-solving tasks as a testbed
  • methods: present a comprehensive library for generating, processing, and visualizing maze-solving datasets with extensive control over generation algorithms and parameters
  • results: support for multiple output formats and tools for visualizing and converting between them, ensuring versatility and adaptability in research applications
    Abstract Understanding how machine learning models respond to distributional shifts is a key research challenge. Mazes serve as an excellent testbed due to varied generation algorithms offering a nuanced platform to simulate both subtle and pronounced distributional shifts. To enable systematic investigations of model behavior on out-of-distribution data, we present $\texttt{maze-dataset}$, a comprehensive library for generating, processing, and visualizing datasets consisting of maze-solving tasks. With this library, researchers can easily create datasets, having extensive control over the generation algorithm used, the parameters fed to the algorithm of choice, and the filters that generated mazes must satisfy. Furthermore, it supports multiple output formats, including rasterized and text-based, catering to convolutional neural networks and autoregressive transformer models. These formats, along with tools for visualizing and converting between them, ensure versatility and adaptability in research applications.
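
To give a sense of the kind of pipeline such a library enables, here is a generic sketch that generates a maze with randomized depth-first search and emits both a text rendering and a rasterized array. It does not use the maze-dataset API; the generation algorithm and output encodings are illustrative choices.

```python
# Generic sketch of maze generation with dual output formats (not the
# maze-dataset API): randomized depth-first search over an n x n cell grid,
# rendered as text for transformers and as a raster array for CNNs.
import numpy as np
import random

def generate_maze(n=5, seed=0):
    """Return a (2n+1, 2n+1) grid of 0=wall, 1=open from randomized DFS."""
    random.seed(seed)
    grid = np.zeros((2 * n + 1, 2 * n + 1), dtype=np.uint8)
    stack, visited = [(0, 0)], {(0, 0)}
    grid[1, 1] = 1
    while stack:
        r, c = stack[-1]
        nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < n and 0 <= c + dc < n and (r + dr, c + dc) not in visited]
        if not nbrs:
            stack.pop()
            continue
        nr, nc = random.choice(nbrs)
        grid[2 * nr + 1, 2 * nc + 1] = 1     # open the neighbor cell
        grid[r + nr + 1, c + nc + 1] = 1     # open the wall between the two cells
        visited.add((nr, nc))
        stack.append((nr, nc))
    return grid

maze = generate_maze()
print("\n".join("".join("#" if v == 0 else " " for v in row) for row in maze))  # text format
raster = maze * 255                          # rasterized format for a CNN
```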

An Evaluation of GPT-4 on the ETHICS Dataset

  • paper_url: http://arxiv.org/abs/2309.10492
  • repo_url: None
  • paper_authors: Sergey Rodionov, Zarathustra Amadeus Goertzel, Ben Goertzel
  • for: The study evaluates how well GPT-4 performs on the ETHICS dataset.
  • methods: GPT-4 is prompted to produce moral judgments on the five ETHICS sub-datasets (justice, deontology, virtue ethics, utilitarianism, and commonsense ethics); see the evaluation-loop sketch below.
  • results: GPT-4 performs much better than previous models, suggesting that learning to work with shared human values is not the hard problem for AI ethics.
    Abstract This report summarizes a short study of the performance of GPT-4 on the ETHICS dataset. The ETHICS dataset consists of five sub-datasets covering different fields of ethics: Justice, Deontology, Virtue Ethics, Utilitarianism, and Commonsense Ethics. The moral judgments were curated so as to have a high degree of agreement with the aim of representing shared human values rather than moral dilemmas. GPT-4's performance is much better than that of previous models and suggests that learning to work with common human values is not the hard problem for AI ethics.
    摘要 这份报告总结了GPT-4在ETHICS数据集上的性能。ETHICS数据集包括五个子数据集,涵盖不同领域的伦理:正义、义务论、美德伦理、功利主义和常识伦理。这些伦理判断被精心准备,以便反映人类共同价值,而不是道德困境。相比之前的模型,GPT-4的性能显著提高,表明AI伦理学中学习共同人类价值不是困难的问题。
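
A minimal sketch of the kind of evaluation loop such a study involves. The scenarios, labels, and prompt wording are invented stand-ins (not items from ETHICS), and query_model is a placeholder for an actual GPT-4 API call.

```python
# Hedged sketch of a binary-judgment evaluation loop in the style of the
# ETHICS commonsense split. `query_model` is a placeholder for whatever LLM
# client is used; scenarios and labels here are made-up, not dataset items.
def query_model(prompt: str) -> str:
    """Stand-in for an actual LLM call (e.g., GPT-4 via an API client)."""
    return "acceptable"           # dummy answer so the sketch runs end-to-end

examples = [
    ("I returned the wallet I found to its owner.", 1),   # 1 = acceptable
    ("I took credit for my colleague's work.", 0),        # 0 = unacceptable
]

correct = 0
for scenario, label in examples:
    prompt = (f"Scenario: {scenario}\n"
              "Is this behavior morally acceptable or unacceptable? "
              "Answer with one word.")
    answer = query_model(prompt).strip().lower()
    pred = 1 if answer.startswith("accept") else 0
    correct += int(pred == label)

print(f"accuracy: {correct / len(examples):.2f}")
```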

Fully automated landmarking and facial segmentation on 3D photographs

  • paper_url: http://arxiv.org/abs/2309.10472
  • repo_url: https://github.com/rumc3dlab/3dlandmarkdetection
  • paper_authors: Bo Berends, Freek Bielevelt, Ruud Schreurs, Shankeeth Vinayahalingam, Thomas Maal, Guido de Jong
  • for: The study aimed to develop and evaluate an automated cephalometric annotation method for 3D facial photographs, improving the accuracy and efficiency of cephalometric analysis.
  • methods: The workflow combines two successive DiffusionNet models with additional facial-segmentation algorithms, trained on 10 manually annotated landmarks per photograph.
  • results: The automated workflow produced precise and consistent landmark annotations, comparable to the inter-observer variability of manual annotation, while reducing annotation time and human error (a sketch of the distance-based precision metric follows the abstract below).
    Abstract Three-dimensional facial stereophotogrammetry provides a detailed representation of craniofacial soft tissue without the use of ionizing radiation. While manual annotation of landmarks serves as the current gold standard for cephalometric analysis, it is a time-consuming process and is prone to human error. The aim in this study was to develop and evaluate an automated cephalometric annotation method using a deep learning-based approach. Ten landmarks were manually annotated on 2897 3D facial photographs by a single observer. The automated landmarking workflow involved two successive DiffusionNet models and additional algorithms for facial segmentation. The dataset was randomly divided into a training and test dataset. The training dataset was used to train the deep learning networks, whereas the test dataset was used to evaluate the performance of the automated workflow. The precision of the workflow was evaluated by calculating the Euclidean distances between the automated and manual landmarks and compared to the intra-observer and inter-observer variability of manual annotation and the semi-automated landmarking method. The workflow was successful in 98.6% of all test cases. The deep learning-based landmarking method achieved precise and consistent landmark annotation. The mean precision of 1.69 (+/-1.15) mm was comparable to the inter-observer variability (1.31 +/-0.91 mm) of manual annotation. The Euclidean distance between the automated and manual landmarks was within 2 mm in 69%. Automated landmark annotation on 3D photographs was achieved with the DiffusionNet-based approach. The proposed method allows quantitative analysis of large datasets and may be used in diagnosis, follow-up, and virtual surgical planning.
    摘要 三维面部塑型摄影技术可以提供细腻的脑颅面软组织图像,不需要使用辐射。现有的手动标注方法为头颈相机分析的现金标准,但是它是一项时间consuming的过程,容易出现人工错误。本研究的目标是开发和评估一种基于深度学习的自动标注方法。研究使用2897个3D面部照片,每个照片由一名观察者手动标注10个标记。自动标注工作流程包括两个DiffusionNet模型和其他的面部分 segmentation 算法。数据集随机分成训练和测试集。训练集用于训练深度学习网络,而测试集用于评估自动工作流程的性能。自动工作流程的精度由计算自动和手动标注之间的欧几丁度距离来评估。结果显示,自动工作流程成功的情况为98.6%。深度学习基本的标注方法实现了精确和一致的标记。手动标注和自动标注之间的平均差距为1.69(+/-1.15)毫米,与人工变化(1.31(+/-0.91)毫米)相比,表明自动标注的精度和一致性。自动和手动标注之间的欧几丁度距离在2毫米内的情况为69%。这种基于DiffusionNet的方法可以在3D照片上自动标注标记,并且允许大量数据的量化分析,可能用于诊断、跟踪和虚拟手术规划。
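
A small sketch of the distance-based precision metric reported above, assuming both annotation sets are arrays of 3D coordinates in millimetres; the data here are random placeholders.

```python
# Sketch (assumed data layout) of the precision metric: per-landmark Euclidean
# distances between automated and manual annotations, summarized as
# mean +/- standard deviation in millimetres.
import numpy as np

manual = np.random.rand(10, 3) * 100          # 10 manual 3D landmarks (mm), placeholder
automated = manual + np.random.randn(10, 3)   # automated predictions, placeholder

dist = np.linalg.norm(automated - manual, axis=1)   # per-landmark error (mm)
print(f"mean precision: {dist.mean():.2f} +/- {dist.std():.2f} mm")
print(f"within 2 mm: {100 * (dist < 2.0).mean():.0f}% of landmarks")
```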

Exploring the Dark Side of AI: Advanced Phishing Attack Design and Deployment Using ChatGPT

  • paper_url: http://arxiv.org/abs/2309.10463
  • repo_url: None
  • paper_authors: Nils Begou, Jeremy Vinoy, Andrzej Duda, Maciej Korczynski
  • for: The paper examines how ChatGPT can be used to develop advanced phishing attacks and automate their large-scale deployment.
  • methods: ChatGPT is prompted to produce the components of a phishing attack: (1) cloning a targeted website, (2) integrating credential-stealing code, (3) obfuscating the code, (4) automating deployment on a hosting provider, (5) registering a phishing domain name, and (6) integrating the site with a reverse proxy.
  • results: An initial assessment of the automatically generated phishing kits shows a rapid generation and deployment process and pages that closely resemble the target websites. These findings underscore the risks of AI misuse in phishing and the need for stronger countermeasures in AI systems.
    Abstract This paper explores the possibility of using ChatGPT to develop advanced phishing attacks and automate their large-scale deployment. We make ChatGPT generate the following parts of a phishing attack: i) cloning a targeted website, ii) integrating code for stealing credentials, iii) obfuscating code, iv) automating website deployment on a hosting provider, v) registering a phishing domain name, and vi) integrating the website with a reverse proxy. The initial assessment of the automatically generated phishing kits highlights their rapid generation and deployment process as well as the close resemblance of the resulting pages to the target website. More broadly, we demonstrate that recent advances in AI underscore the potential risks of its misuse in phishing attacks, which can lead to their increased prevalence and severity. This highlights the necessity for enhanced countermeasures within AI systems.

Human-AI Interactions and Societal Pitfalls

  • paper_url: http://arxiv.org/abs/2309.10448
  • repo_url: None
  • paper_authors: Francisco Castro, Jian Gao, Sébastien Martin
  • for: The study examines how users of generative artificial intelligence (AI) may see productivity gains even though the AI-generated content does not exactly match their preferences.
  • methods: A Bayesian framework is used in which heterogeneous users choose how much information to share with the AI, facing a trade-off between output fidelity and communication cost.
  • results: The interplay between these individual decisions and AI training can create societal challenges: outputs may become more homogenized, especially when the AI is trained on AI-generated content, and any AI bias may become societal bias. Improving human-AI interactions, so that users obtain personalized outputs without sacrificing productivity, mitigates these issues.
    Abstract When working with generative artificial intelligence (AI), users may see productivity gains, but the AI-generated content may not match their preferences exactly. To study this effect, we introduce a Bayesian framework in which heterogeneous users choose how much information to share with the AI, facing a trade-off between output fidelity and communication cost. We show that the interplay between these individual-level decisions and AI training may lead to societal challenges. Outputs may become more homogenized, especially when the AI is trained on AI-generated content. And any AI bias may become societal bias. A solution to the homogenization and bias issues is to improve human-AI interactions, enabling personalized outputs without sacrificing productivity.

Toward Unified Controllable Text Generation via Regular Expression Instruction

  • paper_url: http://arxiv.org/abs/2309.10447
  • repo_url: https://github.com/mrzhengxin/ctg-regex-instruction
  • paper_authors: Xin Zheng, Hongyu Lin, Xianpei Han, Le Sun
  • for: The work proposes a regular-expression-based instruction mechanism for controllable text generation that adapts quickly to different constraint types and combinations.
  • methods: Regular Expression Instruction (REI) expresses constraints as regular-expression-style instructions, covering all popular fine-grained controllable-generation constraints, namely lexical, positional, and length constraints, as well as their complex combinations (see the sketch below); it requires only fine-tuning on medium-scale language models or few-shot, in-context learning on large ones.
  • results: Experiments show that this simple approach achieves high success rates and adaptability across constraint combinations while remaining competitive on automatic metrics and outperforming most previous baselines.
    Abstract Controllable text generation is a fundamental aspect of natural language generation, with numerous methods proposed for different constraint types. However, these approaches often require significant architectural or decoding modifications, making them challenging to apply to additional constraints or resolve different constraint combinations. To address this, our paper introduces Regular Expression Instruction (REI), which utilizes an instruction-based mechanism to fully exploit regular expressions' advantages to uniformly model diverse constraints. Specifically, our REI supports all popular fine-grained controllable generation constraints, i.e., lexical, positional, and length, as well as their complex combinations, via regular expression-style instructions. Our method only requires fine-tuning on medium-scale language models or few-shot, in-context learning on large language models, and requires no further adjustment when applied to various constraint combinations. Experiments demonstrate that our straightforward approach yields high success rates and adaptability to various constraints while maintaining competitiveness in automatic metrics and outperforming most previous baselines.
    摘要 natural language generation中的可控制文本生成是一个基本问题,各种方法被提出来解决不同的约束类型。然而,这些方法经常需要大量的架构或解码修改,使其应用于其他约束或解决不同的约束组合变得困难。为解决这个问题,我们的论文引入了常见表达式指令(REI),该机制利用了常见表达式的优点,以通用的方式模型多样的约束。具体来说,我们的REI支持所有流行的细化可控制生成约束,包括lexical、位置和长度约束,以及它们的复杂组合,通过常见表达式样式的指令。我们的方法仅需要中型语言模型的微调或几极少的在线学习,并且无需进行进一步的调整,无论应用于不同的约束组合。实验表明,我们的简单方法可以实现高的成功率和适应性,同时保持自动指标的竞争力和大多数之前的基elines的性能。
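
A toy illustration of pairing a natural-language instruction with a regular expression that encodes a lexical plus length constraint, and of filtering candidate generations against it. The prompt wording, the regex, and the candidate strings are assumptions for illustration, not the paper's instruction format.

```python
# Sketch of the instruction/verification idea: a lexical + length constraint is
# expressed as a regular expression, and candidate generations are kept only
# when they comply. The model samples here are hard-coded stand-ins.
import re

constraint = re.compile(r"^(?=.*\bocean\b)\w+(?:\s+\w+){9,19}$")  # contains "ocean", 10-20 words

instruction = ("Write one sentence of 10 to 20 words that contains the word "
               "'ocean'.")                        # instruction shown to the model

candidates = [                                    # stand-ins for model samples
    "The ocean stores vast amounts of heat and shapes coastal weather around the world",
    "Short reply.",
]

valid = [c for c in candidates if constraint.fullmatch(c)]
print(instruction)
print("kept:", valid)
```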

Exploring Self-Reinforcement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.10444
  • repo_url: https://github.com/strong-ai-lab/explanation-generation
  • paper_authors: Qiming Bao, Juho Leinonen, Alex Yuxuan Peng, Wanjun Zhong, Tim Pistotti, Alice Huang, Paul Denny, Michael Witbrock, Jiamou Liu
  • for: The paper aims to help students generate high-quality learning resources by using natural-language-processing techniques to automatically generate explanations for learnersourced multiple-choice questions.
  • methods: It proposes a self-reinforcement framework built on large language models with three modules: generating student-aligned explanations, evaluating their quality, and iteratively refining explanations that score below a threshold (see the loop sketch below).
  • results: Compared with other large language models, GPT-4 showed greater creativity when generating explanations, and its explanations were ranked highest by a human expert.
    Abstract Learnersourcing involves students generating and sharing learning resources with their peers. When learnersourcing multiple-choice questions, creating explanations for the generated questions is a crucial step as it facilitates a deeper understanding of the related concepts. However, it is often difficult for students to craft effective explanations due to limited subject understanding and a tendency to merely restate the question stem, distractors, and correct answer. To help scaffold this task, in this work we propose a self-reinforcement large-language-model framework, with the goal of generating and evaluating explanations automatically. Comprising three modules, the framework generates student-aligned explanations, evaluates these explanations to ensure their quality and iteratively enhances the explanations. If an explanation's evaluation score falls below a defined threshold, the framework iteratively refines and reassesses the explanation. Importantly, our framework emulates the manner in which students compose explanations at the relevant grade level. For evaluation, we had a human subject-matter expert compare the explanations generated by students with the explanations created by the open-source large language model Vicuna-13B, a version of Vicuna-13B that had been fine-tuned using our method, and by GPT-4. We observed that, when compared to other large language models, GPT-4 exhibited a higher level of creativity in generating explanations. We also found that explanations generated by GPT-4 were ranked higher by the human expert than both those created by the other models and the original student-created explanations. Our findings represent a significant advancement in enriching the learnersourcing experience for students and enhancing the capabilities of large language models in educational applications.
    摘要 学生来源化包括学生生成和分享学习资源的过程。当学生来源化多选问题时,创造相关概念的解释是一项重要的步骤,因为它可以帮助学生更深入理解相关概念。然而,学生 oftentimes Difficulty crafting effective explanations due to limited subject matter understanding and a tendency to simply restate the question stem, distractors, and correct answer. To help address this challenge, we propose a self-reinforcement large language model framework in this work, with the goal of generating and evaluating explanations automatically. The framework consists of three modules: generating student-aligned explanations, evaluating these explanations to ensure their quality, and iteratively enhancing the explanations. If an explanation's evaluation score falls below a defined threshold, the framework iteratively refines and reassesses the explanation. Importantly, our framework emulates the manner in which students compose explanations at the relevant grade level. For evaluation, we had a human subject-matter expert compare the explanations generated by students with the explanations created by the open-source large language model Vicuna-13B, a version of Vicuna-13B that had been fine-tuned using our method, and by GPT-4. We observed that, when compared to other large language models, GPT-4 exhibited a higher level of creativity in generating explanations. We also found that explanations generated by GPT-4 were ranked higher by the human expert than both those created by the other models and the original student-created explanations. Our findings represent a significant advancement in enriching the learnersourcing experience for students and enhancing the capabilities of large language models in educational applications.
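
A schematic version of the generate-evaluate-refine loop described above. The three module functions are trivial placeholders standing in for LLM prompts, and the score threshold and round limit are illustrative values, not the paper's settings.

```python
# Minimal sketch of the self-reinforcement loop: generate an explanation,
# score it, and refine it until the score clears a threshold or rounds run out.
def generate_explanation(question: str) -> str:
    return f"Draft explanation for: {question}"

def evaluate_explanation(explanation: str) -> float:
    return 0.4 if "Draft" in explanation else 0.9     # dummy quality score in [0, 1]

def refine_explanation(explanation: str) -> str:
    return explanation.replace("Draft", "Refined")

def self_reinforce(question: str, threshold: float = 0.8, max_rounds: int = 3) -> str:
    explanation = generate_explanation(question)
    for _ in range(max_rounds):
        if evaluate_explanation(explanation) >= threshold:
            break                                      # good enough, stop refining
        explanation = refine_explanation(explanation)
    return explanation

print(self_reinforce("Why does a binary search run in O(log n) time?"))
```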

Rethinking Imitation-based Planner for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2309.10443
  • repo_url: https://github.com/jchengai/planTF
  • paper_authors: Jie Cheng, Yingbing Chen, Xiaodong Mei, Bowen Yang, Bo Li, Ming Liu
  • for: The paper leverages the newly released nuPlan large-scale real-world dataset and standardized closed-loop benchmark to enable fair comparisons of different designs for imitation-based driving planners.
  • methods: It studies two fundamental yet underexplored aspects of imitation-based planners: the essential features for ego planning and effective data augmentation techniques that reduce compounding errors.
  • results: A well-designed, purely imitation-based planner (PlanTF) achieves highly competitive performance compared with state-of-the-art methods that involve hand-crafted rules, and exhibits superior generalization in long-tail cases.
    Abstract In recent years, imitation-based driving planners have reported considerable success. However, due to the absence of a standardized benchmark, the effectiveness of various designs remains unclear. The newly released nuPlan addresses this issue by offering a large-scale real-world dataset and a standardized closed-loop benchmark for equitable comparisons. Utilizing this platform, we conduct a comprehensive study on two fundamental yet underexplored aspects of imitation-based planners: the essential features for ego planning and the effective data augmentation techniques to reduce compounding errors. Furthermore, we highlight an imitation gap that has been overlooked by current learning systems. Finally, integrating our findings, we propose a strong baseline model-PlanTF. Our results demonstrate that a well-designed, purely imitation-based planner can achieve highly competitive performance compared to state-of-the-art methods involving hand-crafted rules and exhibit superior generalization capabilities in long-tail cases. Our models and benchmarks are publicly available. Project website https://jchengai.github.io/planTF.

Multi-Object Graph Affordance Network: Enabling Goal-Oriented Planning through Compound Object Affordances

  • paper_url: http://arxiv.org/abs/2309.10426
  • repo_url: None
  • paper_authors: Tuba Girgin, Emre Ugur
  • for: The work studies the affordances of compound objects composed of an arbitrary number of objects with complex shapes, to support goal-oriented robot planning.
  • methods: The proposed Multi-Object Graph Affordance Network (MOGAN) models compound-object affordances and predicts the effect of placing a new object on top of an existing compound; given tasks such as building towers of specific heights or properties, search-based planning finds sequences of stacking actions over objects with suitable affordances.
  • results: The system correctly modeled the affordances of very complex compounds, including stacked spheres and cups, poles, and rings that enclose the poles; it was demonstrated in both simulated and real-world environments and outperformed a baseline model.
    Abstract Learning object affordances is an effective tool in the field of robot learning. While the data-driven models delve into the exploration of affordances of single or paired objects, there is a notable gap in the investigation of affordances of compound objects that are composed of an arbitrary number of objects with complex shapes. In this study, we propose Multi-Object Graph Affordance Network (MOGAN) that models compound object affordances and predicts the effect of placing new objects on top of the existing compound. Given different tasks, such as building towers of specific heights or properties, we used a search based planning to find the sequence of stack actions with the objects of suitable affordances. We showed that our system was able to correctly model the affordances of very complex compound objects that include stacked spheres and cups, poles, and rings that enclose the poles. We demonstrated the applicability of our system in both simulated and real-world environments, comparing our systems with a baseline model to highlight its advantages.
    摘要 学习对象可行性是机器学习领域的有效工具。而数据驱动模型探索单个或对应的对象可行性的探索,但是对于包含多个对象的复杂形状的复合物体可行性的探索却存在显著的缺口。在本研究中,我们提议了多对象图像可行性网络(MOGAN),该模型可以预测将新对象放置在现有复合物体上的效果。我们使用搜索基本计划来找到适合的堆作业序列,以实现不同任务,如建立特定高度或性能的塔楼。我们证明了我们的系统可以正确地模型复杂的复合物体,包括堆积球和杯子、柱子和环形结构,并在实验室和真实环境中进行了比较,与基eline模型进行了对比,以 highlight its advantages。

Functional requirements to mitigate the Risk of Harm to Patients from Artificial Intelligence in Healthcare

  • paper_url: http://arxiv.org/abs/2309.10424
  • repo_url: None
  • paper_authors: Juan M. García-Gómez, Vicent Blanes-Selva, José Carlos de Bartolomé Cenzano, Jaime Cebolla-Cornejo, Ascensión Doñate-Martínez
  • for: The paper addresses seven risks of artificial intelligence (AI) in medicine and healthcare and proposes fourteen functional requirements to mitigate them.
  • methods: Starting from the list of seven AI risks identified by the European Parliament's research service, the authors derive fourteen technical requirements that AI systems may implement to reduce the risks associated with their medical purpose.
  • results: Implementing these requirements would provide high-level technical specifications to ensure continuous good performance and safe use of AI systems for the benefit of patients, in compliance with the future EU regulatory framework.
    Abstract The Directorate General for Parliamentary Research Services of the European Parliament has prepared a report to the Members of the European Parliament where they enumerate seven main risks of Artificial Intelligence (AI) in medicine and healthcare: patient harm due to AI errors, misuse of medical AI tools, bias in AI and the perpetuation of existing inequities, lack of transparency, privacy and security issues, gaps in accountability, and obstacles in implementation. In this study, we propose fourteen functional requirements that AI systems may implement to reduce the risks associated with their medical purpose: AI passport, User management, Regulation check, Academic use only disclaimer, data quality assessment, Clinicians double check, Continuous performance evaluation, Audit trail, Continuous usability test, Review of retrospective/simulated cases, Bias check, eXplainable AI, Encryption and use of field-tested libraries, and Semantic interoperability. Our intention here is to provide specific high-level specifications of technical solutions to ensure continuous good performance and use of AI systems to benefit patients in compliance with the future EU regulatory framework.
    摘要 欧洲议会Directorate General for Parliamentary Research Services已经准备了一份关于人工智能(AI)在医疗领域的报告,并列出了七个主要的AI风险:对patient的伤害 due to AI错误,违规使用医疗AI工具,AI中存在偏见和现有不平等的持续传递,缺乏透明度、隐私和安全问题,责任缺口,以及实施困难。在这项研究中,我们提出了十四个功能需求,以减少AI系统的医疗用途中的风险:AI护照,用户管理,法规检查,仅学术用途说明,数据质量评估,临床医生双重检查,不断性能评估,审计记录,不断用户测试,审查退化/模拟案例,偏见检查,可解释AI,加密,并使用已经测试的库。我们的目的是提供特定的高级技术解决方案,以确保AI系统的持续良好表现,并且在欧盟未来的法规框架下使用AI系统为病人带来好处。

Learning from Teaching Assistants to Program with Subgoals: Exploring the Potential for AI Teaching Assistants

  • paper_url: http://arxiv.org/abs/2309.10419
  • repo_url: None
  • paper_authors: Changyoon Lee, Junho Myung, Jieun Han, Jiho Jin, Alice Oh
  • for: The study explores using generative AI as teaching assistants (TAs) in introductory programming education, examining how novice learners interact with and perceive AI versus human TAs.
  • methods: A between-subjects study was conducted with 20 novice programming learners who solved programming tasks by producing subgoals and subsolutions under the guidance of either an AI or a human TA.
  • results: Learners solved tasks faster with the AI TA while achieving comparable scores, and their perception of the AI TA matched that of human TAs in terms of speed, comprehensiveness, and helpfulness of the replies, as well as difficulty and satisfaction of the conversation. The paper also proposes guidelines for designing and using generative AI as TAs in programming education.
    Abstract With recent advances in generative AI, conversational models like ChatGPT have become feasible candidates for TAs. We investigate the practicality of using generative AI as TAs in introductory programming education by examining novice learners' interaction with TAs in a subgoal learning environment. To compare the learners' interaction and perception of the AI and human TAs, we conducted a between-subject study with 20 novice programming learners. Learners solve programming tasks by producing subgoals and subsolutions with the guidance of a TA. Our study shows that learners can solve tasks faster with comparable scores with AI TAs. Learners' perception of the AI TA is on par with that of human TAs in terms of speed and comprehensiveness of the replies and helpfulness, difficulty, and satisfaction of the conversation. Finally, we suggest guidelines to better design and utilize generative AI as TAs in programming education from the result of our chat log analysis.
    摘要 Recent advances in 生成AI 使得对话模型如ChatGPT可能成为教学助手。我们 investigate 使用生成AI 作为初级编程教育中的教学助手,通过评估新手学者与 AI 和人类教学助手之间的互动来评估实用性。我们通过对 20 名初级编程学生进行比较研究,发现学生可以更快地解决编程任务,并且得分相似。学生对 AI 教学助手的评估与人类教学助手的评估相似,包括快速回答、全面性、 helpfulness、difficulty 和满意度。最后,我们提出了更好地设计和使用生成AI 作为编程教育教学助手的指南,基于我们的对话记录分析结果。

Unsupervised Learning via Network-Aware Embeddings

  • paper_url: http://arxiv.org/abs/2309.10408
  • repo_url: None
  • paper_authors: Anne Sophie Riis Damstrup, Sofie Tosti Madsen, Michele Coscia
  • for: The paper addresses a blind spot in unsupervised learning: clustering methods cannot take an explicit map of the complex interdependencies between the dimensions of analysis, such as a social network connecting the observations, as input.
  • methods: It introduces network-aware embeddings that estimate the network distance between numeric node attributes via a generalized Euclidean distance; unlike prior work, the method clusters node attributes rather than the nodes of the network (see the sketch below).
  • results: Experiments show that these network-aware embeddings consistently benefit the clustering task, scale to large networks, and yield actionable insights in applications in marketing, economics, and political science; the method is fully open source, with data and code available to reproduce all results.
    Abstract Data clustering, the task of grouping observations according to their similarity, is a key component of unsupervised learning -- with real world applications in diverse fields such as biology, medicine, and social science. Often in these fields the data comes with complex interdependencies between the dimensions of analysis, for instance the various characteristics and opinions people can have live on a complex social network. Current clustering methods are ill-suited to tackle this complexity: deep learning can approximate these dependencies, but not take their explicit map as the input of the analysis. In this paper, we aim at fixing this blind spot in the unsupervised learning literature. We can create network-aware embeddings by estimating the network distance between numeric node attributes via the generalized Euclidean distance. Differently from all methods in the literature that we know of, we do not cluster the nodes of the network, but rather its node attributes. In our experiments we show that having these network embeddings is always beneficial for the learning task; that our method scales to large networks; and that we can actually provide actionable insights in applications in a variety of fields such as marketing, economics, and political science. Our method is fully open source and data and code are available to reproduce all results in the paper.
    摘要 “数据聚合,将观察值按其相似性分组,是无监督学习中的关键组成部分,在生物、医学和社会科学等领域有广泛的应用。在这些领域中,数据往往具有复杂的相互关系,例如人们在社交网络上的多种特征和意见。现有的聚合方法无法处理这些复杂性,深度学习可以近似这些关系,但是无法直接使其作为分析的输入。在这篇论文中,我们想要解决这个潜在的盲点在无监督学习文献中。我们可以创建网络意识 embedding,通过一般化欧几何距离来估算网络中节点属性之间的距离。与现有文献中所有方法不同,我们不是将网络节点聚合,而是其节点属性。在我们的实验中,我们发现在应用于多个领域,如市场学、经济学和政治科学等,有助于提供实用的洞察。我们的方法是完全开源的,数据和代码都可以在论文中提供,以便重现所有结果。”
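
One way to realize a network-aware generalized Euclidean distance is sketched below, under the assumption that the metric tensor is the pseudoinverse of the graph Laplacian; the paper's exact construction may differ. Note that, as in the paper, it is the attribute vectors rather than the nodes that get clustered.

```python
# Hedged sketch of a network-aware distance between node-attribute vectors:
# d(x, y) = sqrt((x - y)^T M (x - y)), with M derived from the network (here
# the pseudoinverse Laplacian). Hierarchical clustering then runs on these
# distances between attribute vectors, not between nodes.
import numpy as np
import networkx as nx
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

G = nx.karate_club_graph()
n = G.number_of_nodes()
attrs = np.random.rand(3, n)                 # 3 attribute vectors over the nodes

L = nx.laplacian_matrix(G).toarray().astype(float)
M = np.linalg.pinv(L)                        # network-aware metric tensor

def gen_euclidean(x, y, M):
    d = x - y
    return float(np.sqrt(d @ M @ d))

D = np.array([[gen_euclidean(a, b, M) for b in attrs] for a in attrs])
labels = fcluster(linkage(squareform(D, checks=False), method="average"),
                  t=2, criterion="maxclust")
print("attribute-vector clusters:", labels)
```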

Exploiting Causality Signals in Medical Images: A Pilot Study with Empirical Results

  • paper_url: http://arxiv.org/abs/2309.10399
  • repo_url: None
  • paper_authors: Gianluca Carloni, Sara Colantonio
  • for: The work automatically classifies medical images by modeling weak causal signals in the scene, i.e., how the presence of a feature in one part of the image affects the appearance of a feature in another part.
  • methods: The model combines a convolutional neural network backbone with a causality-factors extractor module that computes weights for the feature maps, enhancing each map according to its causal influence in the scene; two external signals can modify the module's behavior, yielding different variants of the method.
  • results: On a public prostate MRI dataset for prostate cancer diagnosis, quantitative experiments, qualitative assessment, and ablation studies show improved classification performance and more robust predictions that focus on relevant parts of the image, which is essential for reliable diagnosis and treatment planning.
    Abstract We present a new method for automatically classifying medical images that uses weak causal signals in the scene to model how the presence of a feature in one part of the image affects the appearance of another feature in a different part of the image. Our method consists of two components: a convolutional neural network backbone and a causality-factors extractor module. The latter computes weights for the feature maps to enhance each feature map according to its causal influence in the image's scene. We can modify the functioning of the causality module by using two external signals, thus obtaining different variants of our method. We evaluate our method on a public dataset of prostate MRI images for prostate cancer diagnosis, using quantitative experiments, qualitative assessment, and ablation studies. Our results show that our method improves classification performance and produces more robust predictions, focusing on relevant parts of the image. That is especially important in medical imaging, where accurate and reliable classifications are essential for effective diagnosis and treatment planning.
    摘要 我们提出了一种新的自动化医学图像分类方法,该方法利用图像场景中的弱 causal 信号来模型图像中不同部分之间的特征之间的相互影响。我们的方法包括两个组件:一个 convolutional neural network 背bone 和一个 causality-factors 提取模块。后者计算图像feature map 中的强度,以强调每个特征图像中的相互影响。我们可以通过使用两个外部信号来修改 causality 模块的功能,从而获得不同的方法变体。我们对公共的 проstate MRI 图像集进行评估,使用量化实验、质量评估和剪辑研究来评估我们的方法。我们的结果显示,我们的方法可以提高分类性能,生成更加稳定的预测结果,特别是在医学成像中,准确和可靠的分类是诊断和治疗规划的关键。

Adaptive questionnaires for facilitating patient data entry in clinical decision support systems: Methods and application to STOPP/START v2

  • paper_url: http://arxiv.org/abs/2309.10398
  • repo_url: None
  • paper_authors: Jean-Baptiste Lamy, Abdelmalek Mouazer, Karima Sedki, Sophie Dubois, Hector Falcoff
  • for: The paper proposes a way to simplify patient data entry so that clinicians find clinical decision support systems easier to use.
  • methods: It uses an adaptive questionnaire that shows or hides questions dynamically during user interaction, together with methods for translating the system's clinical rules into display rules that determine which items to show and in what order of priority (see the sketch below).
  • results: Applied to a decision support system implementing STOPP/START v2, the adaptive questionnaire reduced the number of clinical conditions displayed by about two thirds; clinicians in focus-group sessions found it "pretty easy to use". The approach could be applied to other guidelines and adapted for data entry by patients.
    Abstract Clinical decision support systems are software tools that help clinicians to make medical decisions. However, their acceptance by clinicians is usually rather low. A known problem is that they often require clinicians to manually enter lots of patient data, which is long and tedious. Existing solutions, such as the automatic data extraction from electronic health record, are not fully satisfying, because of low data quality and availability. In practice, many systems still include long questionnaire for data entry. In this paper, we propose an original solution to simplify patient data entry, using an adaptive questionnaire, i.e. a questionnaire that evolves during user interaction, showing or hiding questions dynamically. Considering a rule-based decision support systems, we designed methods for translating the system's clinical rules into display rules that determine the items to show in the questionnaire, and methods for determining the optimal order of priority among the items in the questionnaire. We applied this approach to a decision support system implementing STOPP/START v2, a guideline for managing polypharmacy. We show that it permits reducing by about two thirds the number of clinical conditions displayed in the questionnaire. Presented to clinicians during focus group sessions, the adaptive questionnaire was found "pretty easy to use". In the future, this approach could be applied to other guidelines, and adapted for data entry by patients.
    摘要 临床决策支持系统是软件工具,帮助临床医生做出医疗决策。然而,它们的接受度通常很低。一个知道的问题是,它们经常需要临床医生手动输入大量患者数据,这是长时间的和繁琐的。现有的解决方案,如自动提取电子医疗记录中的数据,并不充分满足,因为数据质量和可用性不够。在实践中,许多系统仍然包含长问卷。在这篇论文中,我们提出了一种新的解决方案,以简化患者数据输入。我们使用了适应问卷,即在用户互动中动态显示或隐藏问题的问卷。考虑到规则驱动的决策支持系统,我们设计了将系统的临床规则翻译成显示规则,以确定问卷中显示的项目的顺序和优先级。我们应用了这种方法于一个管理多剂药物的决策支持系统,我们发现可以将问卷中显示的临床条件减少到大约两 third。在临床医生Focus组会议中展示了适应问卷,他们认为它很容易使用。未来,这种方法可能会应用于其他指南,并适应用于患者的数据输入。
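
A toy sketch of how a clinical rule can be compiled into display rules that show or hide questions as answers arrive. The condition names and the rule are invented examples, not STOPP/START v2 content.

```python
# Toy adaptive questionnaire: each question carries a display rule derived from
# a clinical rule, and is shown only while that rule is still satisfiable.
answers = {}

questions = {
    "takes_nsaid": "Does the patient take an NSAID?",
    "has_ulcer_history": "History of peptic ulcer?",
    "takes_ppi": "Is a proton-pump inhibitor prescribed?",
}

# Illustrative clinical rule: NSAID + ulcer history without a PPI -> alert.
# Display rules: ask about ulcers only if an NSAID is taken, ask about the PPI
# only while the previous answers keep the rule satisfiable.
display_rules = {
    "takes_nsaid": lambda a: True,
    "has_ulcer_history": lambda a: a.get("takes_nsaid") is True,
    "takes_ppi": lambda a: a.get("takes_nsaid") is True and a.get("has_ulcer_history") is True,
}

def ask(key):                                 # stand-in for a form widget
    return {"takes_nsaid": True, "has_ulcer_history": True, "takes_ppi": False}[key]

for key, text in questions.items():
    if display_rules[key](answers):           # question is shown dynamically
        answers[key] = ask(key)
        print("asked:", text, "->", answers[key])
    else:
        print("hidden:", text)

if answers.get("takes_nsaid") and answers.get("has_ulcer_history") and answers.get("takes_ppi") is False:
    print("rule fired: consider gastroprotection")
```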

Graph Contrastive Learning Meets Graph Meta Learning: A Unified Method for Few-shot Node Tasks

  • paper_url: http://arxiv.org/abs/2309.10376
  • repo_url: https://github.com/haoliu-cola/cola
  • paper_authors: Hao Liu, Jiarui Feng, Lecheng Kong, Dacheng Tao, Yixin Chen, Muhan Zhang
  • for: The work proposes a new paradigm for few-shot node classification that overcomes the overfitting limitations of existing meta-learning approaches.
  • methods: It combines graph neural networks with graph contrastive learning: graph augmentations are used to identify semantically similar nodes, so meta-tasks can be constructed from all nodes without label information.
  • results: Extensive experiments validate each component of the design and show that COLA achieves new state-of-the-art results on all few-shot node classification tasks while reducing the risk of overfitting.
    Abstract Graph Neural Networks (GNNs) have become popular in Graph Representation Learning (GRL). One fundamental application is few-shot node classification. Most existing methods follow the meta learning paradigm, showing the ability of fast generalization to few-shot tasks. However, recent works indicate that graph contrastive learning combined with fine-tuning can significantly outperform meta learning methods. Despite the empirical success, there is limited understanding of the reasons behind it. In our study, we first identify two crucial advantages of contrastive learning compared to meta learning, including (1) the comprehensive utilization of graph nodes and (2) the power of graph augmentations. To integrate the strength of both contrastive learning and meta learning on the few-shot node classification tasks, we introduce a new paradigm: Contrastive Few-Shot Node Classification (COLA). Specifically, COLA employs graph augmentations to identify semantically similar nodes, which enables the construction of meta-tasks without the need for label information. Therefore, COLA can utilize all nodes to construct meta-tasks, further reducing the risk of overfitting. Through extensive experiments, we validate the essentiality of each component in our design and demonstrate that COLA achieves new state-of-the-art on all tasks.
    摘要 граф neural networks (GNNs) 已成为graph representation learning (GRL) 中流行的方法之一。其中一个基本应用是几拟分类。大多数现有方法采用meta learning paradigm,表明它们在几拟任务上快速泛化的能力。然而,最近的研究表明,结合图像学习和精度调整可以明显超过meta learning方法。 DESPITE THE EMPERICAL SUCCESS, THERE IS LIMITED UNDERSTANDING OF THE REASONS BEHIND IT。在我们的研究中,我们首先确定了对比学习与meta learning之间的两大优势,包括(1)图像中节点的全面利用和(2)图像的扩展。为了结合对比学习和meta learning在几拟节点 classification中的优势,我们介绍了一种新的方法:对比几拟节点分类(COLA)。具体来说,COLA使用图像扩展来标识semantically similar的节点,从而无需标签信息可以构建meta-任务。因此,COLA可以完全利用所有节点来构建meta-任务,从而减少风险过拟合。通过广泛的实验,我们证明了我们的设计的每一个组件的重要性,并证明了COLA在所有任务上达到了新的状态机。

Generative AI vs. AGI: The Cognitive Strengths and Weaknesses of Modern LLMs

  • paper_url: http://arxiv.org/abs/2309.10371
  • repo_url: None
  • paper_authors: Ben Goertzel
  • For: The paper discusses the cognitive strengths and weaknesses of interactive large language models (LLMs) such as ChatGPT, GPT-4, Bard, and Llama, and how they differ from human cognitive systems.
  • Methods: The paper reviews the basic cognitive architectures of these LLMs and argues that incremental improvement of them is not a viable approach to achieving human-level artificial general intelligence (AGI).
  • Results: The paper suggests that while incrementally improved LLMs are unlikely to reach human-level AGI on their own, they can still provide valuable insights into human-level AGI and may form significant parts of AGI architectures that also incorporate other ideas. It also touches on social and ethical matters regarding LLMs, such as misinformation and economic upheavals, arguing that a different policy approach is needed than for more credible approximations of human-level AGI.
    Abstract A moderately detailed consideration of interactive LLMs as cognitive systems is given, focusing on LLMs circa mid-2023 such as ChatGPT, GPT-4, Bard, Llama, etc.. Cognitive strengths of these systems are reviewed, and then careful attention is paid to the substantial differences between the sort of cognitive system these LLMs are, and the sort of cognitive systems human beings are. It is found that many of the practical weaknesses of these AI systems can be tied specifically to lacks in the basic cognitive architectures according to which these systems are built. It is argued that incremental improvement of such LLMs is not a viable approach to working toward human-level AGI, in practical terms given realizable amounts of compute resources. This does not imply there is nothing to learn about human-level AGI from studying and experimenting with LLMs, nor that LLMs cannot form significant parts of human-level AGI architectures that also incorporate other ideas. Social and ethical matters regarding LLMs are very briefly touched from this perspective, which implies that while care should be taken regarding misinformation and other issues, and economic upheavals will need their own social remedies based on their unpredictable course as with any powerfully impactful technology, overall the sort of policy needed as regards modern LLMs is quite different than would be the case if a more credible approximation to human-level AGI were at hand.
    摘要 一篇moderately detailed的文章对交互式LLM进行了认知系统的评估,主要focus on LLlMAmid-2023年,如ChatGPT、GPT-4、Bard、Llama等等。文章评估了这些系统的认知优势,然后仔细审视了这些AI系统与人类认知系统之间的重要差异。发现这些AI系统的实用弱点可以追溯到其基本认知架构的缺失。 argue that不可靠地提高这些LLMs不是实现人类水平AGI的可靠方法,具体来说,随着可计算资源的增加,这些LLMs的改进速度会变得慢。这并不意味着不能从研究和实验LLMs中学习人类水平AGI,也不意味着LLMs无法成为人类水平AGI架构的重要组成部分。文章 briefly touched social and ethical matters related to LLMs from this perspective, suggesting that while care should be taken to address misinformation and other issues, and economic upheavals will need their own social remedies based on their unpredictable course, the policy needed for modern LLMs is quite different from what would be the case if a more credible approximation to human-level AGI were at hand.

Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

  • paper_url: http://arxiv.org/abs/2309.10370
  • repo_url: None
  • paper_authors: Thomas Chen, Patricia Muñoz Ewald
  • for: The paper analyzes the structure of a shallow neural network with one hidden layer, a ramp activation function, an ${\mathcal L}^2$ Schatten-class (Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training sample size $N>QM$.
  • methods: It constructs an approximate optimizer using projections adapted to the averages $\overline{x_{0,j}}$ of the training inputs belonging to each output $y_j$, $j=1,\dots,Q$, and proves an upper bound of order $O(\delta_P)$ on the minimum of the cost, where $\delta_P$ measures the signal-to-noise ratio of the training inputs; in the special case $M=Q$, an exact degenerate local minimum is determined whose value differs from the upper bound by a relative error $O(\delta_P^2)$.
  • results: The proof of the upper bound yields a constructively trained network that metrizes the $Q$-dimensional subspace of the input space ${\mathbb R}^M$ spanned by the averages $\overline{x_{0,j}}$, $j=1,\dots,Q$ (a generic statement of the setting follows the abstract below).
    Abstract In this paper, we provide a geometric interpretation of the structure of shallow neural networks characterized by one hidden layer, a ramp activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size $N>QM$. We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$ where $\delta_P$ measures the signal to noise ratio of training inputs. We obtain an approximate optimizer using projections adapted to the averages $\overline{x_{0,j}}$ of training input vectors belonging to the same output vector $y_j$, $j=1,\dots,Q$. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function; the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes the $Q$-dimensional subspace in the input space ${\mathbb R}^M$ spanned by $\overline{x_{0,j}}$, $j=1,\dots,Q$. We comment on the characterization of the global minimum of the cost function in the given context.
    摘要 在这篇论文中,我们提供了一种几何 interpreting the structure of shallow neural networks with one hidden layer, ramp activation function, $\mathcal{L}^2$ Schatten class (或希尔伯特- Schmidt) cost function, input space $\mathbb{R}^M$, output space $\mathbb{R}^Q$ with $Q\leq M$, 和训练输入样本大小 $N>QM$. 我们证明了cost function的最小值的上界为$\mathcal{O}(\delta_P)$,其中$\delta_P$是训练输入信号响应率。我们使用适应于$\overline{x_{0,j}$的投影来获得一个approximate optimizer。在特殊情况下,当$M=Q$时,我们确切地确定了一个精确的地方最小值,其差异与上界相对Error $O(\delta_P^2)$。证明上界带来一个可重构的网络,我们表明它在输入空间$\mathbb{R}^M$中метrize了一个$Q$-维子空间,该子空间是由$\overline{x_{0,j}$所确定的。我们评论了在给定的 context中global minimum的特征。
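
For readers who prefer formulas, one generic way to write the setting is sketched below. The notation ($\sigma$ for the ramp activation, a root-sum-of-squares cost over training pairs) is an assumption of this digest rather than the paper's own conventions, which may differ.

```latex
% Generic sketch of the setting (assumed notation, not copied from the paper):
% one hidden layer with ramp activation, inputs x in R^M, outputs y_j in R^Q.
\[
  f(x) \;=\; W_2\,\sigma\!\big(W_1 x + b_1\big) + b_2,
  \qquad \sigma(z) \;=\; \max\{z,\,0\},
\]
\[
  \mathcal{C}[W_1,b_1,W_2,b_2]
  \;=\; \Bigg( \sum_{j=1}^{Q} \sum_{i=1}^{N_j}
        \big\lVert f\big(x^{(i)}_{0,j}\big) - y_j \big\rVert_2^{2} \Bigg)^{1/2},
\]
% with Q <= M and N = N_1 + ... + N_Q > QM training inputs; the paper bounds
% the minimum of the cost by O(delta_P), with delta_P the signal-to-noise
% ratio of the training inputs.
```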

Toward efficient resource utilization at edge nodes in federated learning

  • paper_url: http://arxiv.org/abs/2309.10367
  • repo_url: None
  • paper_authors: Sadi Alawadi, Addi Ait-Mlouk, Salman Toor, Andreas Hellander
  • for: The study empirically explores whether randomly selecting which layers to train in federated learning can reduce resource utilization on devices and communication load without harming global-model convergence.
  • methods: Using the FEDn federated learning framework, experiments are run on different datasets (CIFAR-10, CASA, and IMDB) and tasks with different deep-learning model architectures; in each round, only a randomly selected subset of layers is trained, and untrained layer weights are excluded from the transfer to the server (see the sketch below).
  • results: Training only part of the model accelerates the training process, uses on-device resources efficiently, and reduces data transmission by around 75% and 53% when training 25% and 50% of the model layers, respectively, without harming the resulting global-model accuracy.
    Abstract Federated learning (FL) enables edge nodes to collaboratively contribute to constructing a global model without sharing their data. This is accomplished by devices computing local, private model updates that are then aggregated by a server. However, computational resource constraints and network communication can become a severe bottleneck for larger model sizes typical for deep learning applications. Edge nodes tend to have limited hardware resources (RAM, CPU), and the network bandwidth and reliability at the edge is a concern for scaling federated fleet applications. In this paper, we propose and evaluate a FL strategy inspired by transfer learning in order to reduce resource utilization on devices, as well as the load on the server and network in each global training round. For each local model update, we randomly select layers to train, freezing the remaining part of the model. In doing so, we can reduce both server load and communication costs per round by excluding all untrained layer weights from being transferred to the server. The goal of this study is to empirically explore the potential trade-off between resource utilization on devices and global model convergence under the proposed strategy. We implement the approach using the federated learning framework FEDn. A number of experiments were carried out over different datasets (CIFAR-10, CASA, and IMDB), performing different tasks using different deep-learning model architectures. Our results show that training the model partially can accelerate the training process, efficiently utilizes resources on-device, and reduce the data transmission by around 75% and 53% when we train 25%, and 50% of the model layers, respectively, without harming the resulting global model accuracy.
    摘要 联合学习(FL)allow edge nodes to collaboratively construct a global model without sharing their data. This is achieved by devices computing local, private model updates that are then aggregated by a server. However, computational resource constraints and network communication can become a severe bottleneck for larger model sizes typical for deep learning applications. Edge nodes tend to have limited hardware resources (RAM, CPU), and the network bandwidth and reliability at the edge is a concern for scaling federated fleet applications. In this paper, we propose and evaluate a FL strategy inspired by transfer learning to reduce resource utilization on devices and the load on the server and network in each global training round. For each local model update, we randomly select layers to train, freezing the remaining part of the model. By doing so, we can reduce both server load and communication costs per round by excluding all untrained layer weights from being transferred to the server. The goal of this study is to empirically explore the potential trade-off between resource utilization on devices and global model convergence under the proposed strategy. We implement the approach using the federated learning framework FEDn. A number of experiments were carried out over different datasets (CIFAR-10, CASA, and IMDB), performing different tasks using different deep-learning model architectures. Our results show that training the model partially can accelerate the training process, efficiently utilize resources on-device, and reduce the data transmission by around 75% and 53% when we train 25%, and 50% of the model layers, respectively, without harming the resulting global model accuracy.
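
A minimal sketch (assuming PyTorch and a toy MLP, not the FEDn framework) of the per-round strategy: freeze a random subset of layers, train locally, and upload only the parameters that were actually trained.

```python
# Per-round partial-layer training: pick a random subset of layers, freeze the
# rest, run a local step, then upload only the state of the trained layers.
import random
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 10))

def local_round(model, fraction=0.5):
    layers = [m for m in model if isinstance(m, nn.Linear)]
    trained = random.sample(layers, max(1, int(fraction * len(layers))))
    for layer in layers:                               # freeze untrained layers
        for p in layer.parameters():
            p.requires_grad = layer in trained

    opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=0.01)
    x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
    loss = nn.functional.cross_entropy(model(x), y)    # one local step on local data
    opt.zero_grad(); loss.backward(); opt.step()

    # only the trained layers' weights leave the device, cutting upload volume
    return {name: p.detach().cpu() for name, p in model.named_parameters() if p.requires_grad}

update = local_round(model)
print("uploaded tensors:", list(update.keys()))
```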

OccluTrack: Rethinking Awareness of Occlusion for Enhancing Multiple Pedestrian Tracking

  • paper_url: http://arxiv.org/abs/2309.10360
  • repo_url: https://github.com/hieu9955/ggggg
  • paper_authors: Jianjun Gao, Yi Wang, Kim-Hui Yap, Kratika Garg, Boon Siew Han
  • for: The work improves the accuracy and stability of multiple pedestrian tracking in occlusion scenes.
  • methods: The proposed occlusion-aware tracker, OccluTrack, combines an abnormal-motion suppression mechanism in the Kalman filter, a pose-guided re-ID module for partially occluded pedestrians, and an occlusion-aware association method (a generic sketch of the motion-gating idea follows the abstract below).
  • results: Extensive evaluation on the MOT-Challenge datasets shows improved tracking and association performance; in particular, the gains on IDF1, IDSw, AssA, and AssR demonstrate the effectiveness of OccluTrack in occlusion scenes.
    Abstract Multiple pedestrian tracking faces the challenge of tracking pedestrians in the presence of occlusion. Existing methods suffer from inaccurate motion estimation, appearance feature extraction, and association due to occlusion, leading to inadequate Identification F1-Score (IDF1), excessive ID switches (IDSw), and insufficient association accuracy and recall (AssA and AssR). We found that the main reason is abnormal detections caused by partial occlusion. In this paper, we suggest that the key insight is explicit motion estimation, reliable appearance features, and fair association in occlusion scenes. Specifically, we propose an adaptive occlusion-aware multiple pedestrian tracker, OccluTrack. We first introduce an abnormal motion suppression mechanism into the Kalman Filter to adaptively detect and suppress outlier motions caused by partial occlusion. Second, we propose a pose-guided re-ID module to extract discriminative part features for partially occluded pedestrians. Last, we design a new occlusion-aware association method towards fair IoU and appearance embedding distance measurement for occluded pedestrians. Extensive evaluation results demonstrate that our OccluTrack outperforms state-of-the-art methods on MOT-Challenge datasets. Particularly, the improvements on IDF1, IDSw, AssA, and AssR demonstrate the effectiveness of our OccluTrack on tracking and association performance.
    摘要 多人行踪面临 occlusion 挑战,现有方法受到 occlusion 的影响,导致不准确的运动估计、外观特征提取和关联,从而导致 IDF1 分数不够高、ID Switches 过多、关联准确率和回归率不够高。我们发现主要原因是部分 occlusion 引起的异常检测。在这篇论文中,我们提出了关键思路,即显式运动估计、可靠的外观特征和公平的关联在 occlusion 场景下。 Specifically,我们提出了一种适应 occlusion 的多人行踪器,即 OccluTrack。我们首先在 Kalman 筛引入了异常运动抑制机制,以适应部分 occlusion 引起的异常检测。其次,我们提出了一种基于 pose 的 Re-ID 模块,以提取部分 occlusion 的特征。最后,我们设计了一种新的 occlusion-aware 关联方法,以实现公平的 IoU 和外观嵌入距离度量测量。我们对 MOT-Challenge 数据集进行了广泛的评估,结果显示,我们的 OccluTrack 超过了当前状态的方法。特别是,对 IDF1、ID Switches、AssA 和 AssR 的改进表明了我们的 OccluTrack 在跟踪和关联性能方面的效果。
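
One standard way an abnormal-motion suppression gate can be realized is a chi-square gate on the Kalman innovation, sketched below; this is a generic formulation, not necessarily OccluTrack's exact mechanism.

```python
# Generic gated Kalman update: if the innovation is an outlier under the
# predicted covariance (e.g., due to partial occlusion), suppress the update
# and keep the motion prediction instead.
import numpy as np

def gated_kalman_update(x_pred, P_pred, z, H, R, gate=9.49):   # chi2(4), ~95%
    y = z - H @ x_pred                          # innovation
    S = H @ P_pred @ H.T + R                    # innovation covariance
    d2 = float(y @ np.linalg.inv(S) @ y)        # squared Mahalanobis distance
    if d2 > gate:                               # abnormal motion detected
        return x_pred, P_pred                   # suppress: trust the prediction
    K = P_pred @ H.T @ np.linalg.inv(S)         # standard Kalman gain otherwise
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new

x = np.zeros(8); P = np.eye(8)                  # [cx, cy, w, h] + velocities
H = np.eye(4, 8); R = np.eye(4) * 0.1
print(gated_kalman_update(x, P, np.array([0.2, 0.1, 1.0, 2.0]), H, R)[0])
```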

Explaining Agent Behavior with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.10346
  • repo_url: None
  • paper_authors: Xijia Zhang, Yue Guo, Simon Stepputtis, Katia Sycara, Joseph Campbell
  • for: The work enables intelligent agents such as robots to explain the reasoning behind their decisions in a form their human counterparts can understand.
  • methods: The approach generates natural-language explanations based only on observations of states and actions, agnostic to the underlying model representation (e.g., a deep neural network); it learns a compact representation of the agent's behavior and produces explanations with minimal hallucination while letting users interact with a pre-trained large language model.
  • results: User studies and experiments show the generated explanations are as helpful as those produced by a human domain expert, while also supporting beneficial interactions such as clarification and counterfactual queries.
    Abstract Intelligent agents such as robots are increasingly deployed in real-world, safety-critical settings. It is vital that these agents are able to explain the reasoning behind their decisions to human counterparts, however, their behavior is often produced by uninterpretable models such as deep neural networks. We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions, agnostic to the underlying model representation. We show how a compact representation of the agent's behavior can be learned and used to produce plausible explanations with minimal hallucination while affording user interaction with a pre-trained large language model. Through user studies and empirical experiments, we show that our approach generates explanations as helpful as those generated by a human domain expert while enabling beneficial interactions such as clarification and counterfactual queries.
    摘要 智能代理人如机器人在实际世界中越来越多地被部署。这些代理人的决策需要给人类对手 explicable,但它们的行为通常是由不可解释的模型,如深度神经网络生成的。我们提出了一种方法,可以基于状态和行动的观察来生成代理人的决策的自然语言解释。我们表明了如何学习一种紧凑的代理人行为表示,并使用这种表示生成可靠的解释,同时允许用户与预训练的大型自然语言模型进行互动。通过用户研究和实验,我们表明了我们的方法可以生成与人类领域专家生成的解释相当有用,并允许有利的互动,如确认和对比查询。

FedWOA: A Federated Learning Model that uses the Whale Optimization Algorithm for Renewable Energy Prediction

  • paper_url: http://arxiv.org/abs/2309.10337
  • repo_url: None
  • paper_authors: Viorica Chifu, Tudor Cioara, Cristian Anitiei, Cristina Pop, Ionut Anghel
  • for: The paper addresses the privacy of sensitive household energy data while still training accurate machine-learning models for renewable-energy prediction.
  • methods: It proposes FedWOA, a federated learning model that uses the Whale Optimization Algorithm to aggregate the weights of local LSTM models trained on prosumer energy data into a global shared model, and K-Means to cluster prosumers with similar scales of energy data in order to handle non-IID data.
  • results: Evaluation on prosumer energy data shows that FedWOA improves prediction accuracy by 25% in MSE and 16% in MAE compared with FedAVG, with good convergence and reduced loss.
    Abstract Privacy is important when dealing with sensitive personal information in machine learning models, which require large data sets for training. In the energy field, access to household prosumer energy data is crucial for energy predictions to support energy grid management and large-scale adoption of renewables however citizens are often hesitant to grant access to cloud-based machine learning models. Federated learning has been proposed as a solution to privacy challenges however report issues in generating the global prediction model due to data heterogeneity, variations in generation patterns, and the high number of parameters leading to even lower prediction accuracy. This paper addresses these challenges by introducing FedWOA a novel federated learning model that employs the Whale Optimization Algorithm to aggregate global prediction models from the weights of local LTSM neural network models trained on prosumer energy data. The proposed solution identifies the optimal vector of weights in the search spaces of the local models to construct the global shared model and then is subsequently transmitted to the local nodes to improve the prediction quality at the prosumer site while for handling non-IID data K-Means was used for clustering prosumers with similar scale of energy data. The evaluation results on prosumers energy data have shown that FedWOA can effectively enhance the accuracy of energy prediction models accuracy by 25% for MSE and 16% for MAE compared to FedAVG while demonstrating good convergence and reduced loss.
    摘要 隐私是机器学习模型处理敏感个人信息的重要问题,这些模型需要大量数据进行训练。在能源领域,获取家庭生产者能源数据是重要的,以支持能源网络管理和大规模采用可再生能源,但是公民经常拒绝提供云端机器学习模型访问。联邦学习被提议为解决隐私挑战,但是报告表示因数据不均匀、生成模式变化和参数太多,导致预测精度更低。本文提出了一种名为FedWOA的联邦学习模型,该模型使用吴顿优化算法对全球预测模型的 weights 进行聚合,从而提高预测质量。在处理非Identical distributed(非ID)数据时,使用K-Means进行分 clustering prosumers的能源数据,以便构建全球共享模型。评估结果表明,FedWOA可以提高能源预测模型的准确率,比 FedAVG 提高25%的MSE和16%的MAE,同时示出良好的叠加和降低损失。

Learning based 2D Irregular Shape Packing

  • paper_url: http://arxiv.org/abs/2309.10329
  • repo_url: None
  • paper_authors: Zeshi Yang, Zherong Pan, Manyi Li, Kui Wu, Xifeng Gao
  • for: The work targets memory-efficient appearance rendering by packing the UV patches of a 3D model into a texture atlas.
  • methods: It proposes a learning-assisted 2D irregular shape packing method that iteratively selects and groups subsets of UV patches into near-rectangular super patches, reducing the problem to bin packing, followed by a joint optimization that further improves the packing ratio; deep neural policies predict the patch subsets and their relative poses, giving linear time scaling with the number of patches.
  • results: On three UV-packing datasets, the method achieves a higher packing ratio than several widely used baselines while maintaining competitive computational speed.
    Abstract 2D irregular shape packing is a necessary step to arrange UV patches of a 3D model within a texture atlas for memory-efficient appearance rendering in computer graphics. Being a joint, combinatorial decision-making problem involving all patch positions and orientations, this problem has well-known NP-hard complexity. Prior solutions either assume a heuristic packing order or modify the upstream mesh cut and UV mapping to simplify the problem, which either limits the packing ratio or incurs robustness or generality issues. Instead, we introduce a learning-assisted 2D irregular shape packing method that achieves a high packing quality with minimal requirements from the input. Our method iteratively selects and groups subsets of UV patches into near-rectangular super patches, essentially reducing the problem to bin-packing, based on which a joint optimization is employed to further improve the packing ratio. In order to efficiently deal with large problem instances with hundreds of patches, we train deep neural policies to predict nearly rectangular patch subsets and determine their relative poses, leading to linear time scaling with the number of patches. We demonstrate the effectiveness of our method on three datasets for UV packing, where our method achieves a higher packing ratio over several widely used baselines with competitive computational speed.
    摘要 二维不规则形填充是计算机图形中为三维模型的Texture Atlas中的UV贴图进行内存高效的显示的必要步骤。作为一个共同的、复杂决策问题,这个问题有well-known NP-hard复杂性。先前的解决方案 Either assume a heuristic packing order或修改上游缝隙和UV映射以简化问题,这些方法 Either limit the packing ratio or incur robustness or generality issues。相比之下,我们介绍了一种学习帮助的二维不规则形填充方法,可以 achieve high packing quality with minimal input requirements。我们的方法 iteratively selects and groups subsets of UV patches into near-rectangular super patches, essentially reducing the problem to bin-packing, based on which a joint optimization is employed to further improve the packing ratio。为了有效地处理大型问题集合,我们训练了深度神经策略来预测 nearly rectangular patch subsets和他们的相对位置,从而实现 linear time scaling with the number of patches。我们在三个UV填充数据集上展示了我们的方法的效果,其中我们的方法高于许多常用的基线方法,并且与 computation speed相当。
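
The learned grouping into super patches is the paper's contribution and is not reproduced here; the short sketch below only illustrates the downstream bin-packing stage on axis-aligned bounding rectangles with a simple shelf heuristic, as an assumption about what "reducing the problem to bin-packing" can look like.

```python
# Sketch of the bin-packing stage only: once patches have been grouped into
# near-rectangular "super patches", their bounding rectangles can be packed
# into an atlas with a shelf heuristic. The learned grouping and the joint
# optimisation from the paper are not reproduced here.
def shelf_pack(rects, atlas_width):
    """rects: list of (w, h). Returns (placements, atlas_height, packing_ratio)."""
    placements, shelf_y, shelf_h, cursor_x = [], 0.0, 0.0, 0.0
    for w, h in sorted(rects, key=lambda r: -r[1]):      # tallest first
        if cursor_x + w > atlas_width:                   # open a new shelf
            shelf_y += shelf_h
            cursor_x, shelf_h = 0.0, 0.0
        placements.append((cursor_x, shelf_y, w, h))
        cursor_x += w
        shelf_h = max(shelf_h, h)
    atlas_height = shelf_y + shelf_h
    used = sum(w * h for w, h in rects)
    return placements, atlas_height, used / (atlas_width * atlas_height)

rects = [(4, 3), (2, 5), (3, 3), (1, 2), (5, 2), (2, 2)]
placed, height, ratio = shelf_pack(rects, atlas_width=8)
print(f"atlas height: {height}, packing ratio: {ratio:.2f}")
```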

QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation

  • paper_url: http://arxiv.org/abs/2309.10326
  • repo_url: None
  • paper_authors: Kunlun Zhu, Shihao Liang, Xu Han, Zhi Zheng, Guoyang Zeng, Zhiyuan Liu, Maosong Sun
  • for: Proposes an iterative bootstrapping approach to question-answering data generation, supplying QA models with more and higher-quality training data.
  • methods: The framework consists of three modules: an answer extractor that pulls core phrases from unstructured documents as candidate answers, a question generator that produces questions from documents and candidate answers, and a QA data filter that keeps only high-quality QA pairs. The framework can also improve itself by reseeding the seed set across iterations, steadily raising generation quality.
  • results: Experiments in a high-resource English scenario and a medium-resource Chinese scenario show that (1) training QA models on the generated data achieves performance comparable to using supervised data directly, and (2) pre-training on the generated data and then fine-tuning on supervised data performs even better.
    Abstract Recent years have witnessed the success of question answering (QA), especially its potential to be a foundation paradigm for tackling diverse NLP tasks. However, obtaining sufficient data to build an effective and stable QA system still remains an open problem. For this problem, we introduce an iterative bootstrapping framework for QA data augmentation (named QASnowball), which can iteratively generate large-scale high-quality QA data based on a seed set of supervised examples. Specifically, QASnowball consists of three modules, an answer extractor to extract core phrases in unlabeled documents as candidate answers, a question generator to generate questions based on documents and candidate answers, and a QA data filter to filter out high-quality QA data. Moreover, QASnowball can be self-enhanced by reseeding the seed set to fine-tune itself in different iterations, leading to continual improvements in the generation quality. We conduct experiments in the high-resource English scenario and the medium-resource Chinese scenario, and the experimental results show that the data generated by QASnowball can facilitate QA models: (1) training models on the generated data achieves comparable results to using supervised data, and (2) pre-training on the generated data and fine-tuning on supervised data can achieve better performance. Our code and generated data will be released to advance further work.
    摘要 近年来,问答(QA)的成功尤其是一种基础理念,可以应对多种自然语言处理(NLP)任务。然而,建立有效稳定的QA系统仍然是一个打开的问题。为解决这个问题,我们提出了一个迭代启动框架,名为QASnowball,可以在种子集的超级vised例子基础上生成大规模高质量的QA数据。具体来说,QASnowball包括三个模块:一个答案提取器,可以从无标记文档中提取核心短语作为候选答案;一个问题生成器,可以基于文档和候选答案来生成问题;以及一个QA数据筛选器,可以筛选出高质量的QA数据。此外,QASnowball可以通过不同迭代来自我进行改进,从而实现不断提高生成质量。我们在高资源英语场景和中资源中文场景进行了实验,实验结果表明,QASnowball生成的数据可以帮助QA模型:(1)使用生成数据训练模型可以达到相同的性能,和(2)先进行预训练并在超级vised数据上进行细化可以实现更好的性能。我们将代码和生成的数据发布,以便进一步的工作。

Metastatic Breast Cancer Prognostication Through Multimodal Integration of Dimensionality Reduction Algorithms and Classification Algorithms

  • paper_url: http://arxiv.org/abs/2309.10324
  • repo_url: None
  • paper_authors: Bliss Singhal, Fnu Pooja
  • for: This study applies machine learning to detect whether tumors are metastatic (malignant).
  • methods: Two preprocessing algorithms, principal component analysis and a genetic algorithm, reduce the dimensionality of the data, after which three classification algorithms (logistic regression, a decision tree classifier, and k-nearest neighbors) are used to detect metastatic cancer.
  • results: The ML pipeline built from these preprocessing and classification algorithms reaches 71.14% accuracy, suggesting that such algorithms hold promise for detecting metastatic cancer.
    Abstract Machine learning (ML) is a branch of Artificial Intelligence (AI) where computers analyze data and find patterns in the data. The study focuses on the detection of metastatic cancer using ML. Metastatic cancer is the point where the cancer has spread to other parts of the body and is the cause of approximately 90% of cancer related deaths. Normally, pathologists spend hours each day to manually classify whether tumors are benign or malignant. This tedious task contributes to mislabeling metastasis being over 60% of time and emphasizes the importance to be aware of human error, and other inefficiencies. ML is a good candidate to improve the correct identification of metastatic cancer saving thousands of lives and can also improve the speed and efficiency of the process thereby taking less resources and time. So far, deep learning methodology of AI has been used in the research to detect cancer. This study is a novel approach to determine the potential of using preprocessing algorithms combined with classification algorithms in detecting metastatic cancer. The study used two preprocessing algorithms: principal component analysis (PCA) and the genetic algorithm to reduce the dimensionality of the dataset, and then used three classification algorithms: logistic regression, decision tree classifier, and k-nearest neighbors to detect metastatic cancer in the pathology scans. The highest accuracy of 71.14% was produced by the ML pipeline comprising of PCA, the genetic algorithm, and the k-nearest neighbors algorithm, suggesting that preprocessing and classification algorithms have great potential for detecting metastatic cancer.
    摘要 机器学习(ML)是人工智能(AI)的一个分支,计算机通过分析数据找到数据中的模式。这项研究关注利用ML检测肿瘤是否为恶性肿瘤。肿瘤肿瘤是指肿瘤已经扩散到身体其他部分,accounts for approximately 90% of cancer-related deaths. 通常, PATHOLOGISTS spend hours each day manually classify tumors as benign or malignant, but this tedious task can lead to mislabeling of metastasis, which can be over 60% of the time. ML is a good candidate to improve the correct identification of metastatic cancer, which can save thousands of lives and improve the speed and efficiency of the process, reducing the need for resources and time.在这项研究中,我们使用了深度学习方法来检测肿瘤。这是一种新的approach,我们使用了两种预处理算法:主成分分析(PCA)和 генетиче算法来减少数据集的维度,然后使用三种分类算法:Logistic regression、决策树分类器和k-nearest neighbors来检测肿瘤在Pathology scans中。最高的准确率为71.14%,这表明预处理和分类算法在检测肿瘤中具有潜在的潜力。
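
A minimal sketch of the pipeline shape described in the bullets: PCA, a tiny genetic search over feature subsets, and a k-nearest neighbors classifier. It runs on a synthetic stand-in dataset because the study's pathology-scan data and exact GA settings are not given here; the reported 71.14% accuracy is not reproduced.

```python
# Minimal sketch of a PCA -> genetic feature selection -> kNN pipeline on a
# synthetic stand-in dataset; all hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=60, n_informative=10, random_state=0)
X = PCA(n_components=20, random_state=0).fit_transform(X)   # first dimensionality cut
rng = np.random.default_rng(0)

def fitness(mask):
    """Cross-validated kNN accuracy on the selected PCA components."""
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# Tiny genetic algorithm over binary feature masks (selection + mutation only).
pop = rng.random((12, X.shape[1])) < 0.5
for _ in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-6:]]                   # keep the best half
    children = parents.copy()
    flip = rng.random(children.shape) < 0.1                  # bit-flip mutation
    children ^= flip
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
print(f"selected {best.sum()} components, CV accuracy {fitness(best):.3f}")
```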

Who to Trust, How and Why: Untangling AI Ethics Principles, Trustworthiness and Trust

  • paper_url: http://arxiv.org/abs/2309.10318
  • repo_url: None
  • paper_authors: Andreas Duenser, David M. Douglas
  • for: The paper reviews the literature on trust in AI and AI trustworthiness, argues that the two concepts need to be distinguished more clearly, and calls for more empirical evidence on what shapes people's trusting behaviour.
  • methods: It discusses how trust in AI involves not only reliance on the system itself but also trust in its developers. AI ethics principles such as explainability and transparency are often assumed to promote user trust, but the empirical evidence on how these features affect perceived trustworthiness is limited.
  • results: The paper argues that AI systems should be treated as socio-technical systems, where the people involved in designing, developing, deploying, and using a system matter as much as the system itself in determining its trustworthiness. Without attending to these nuances, "trust in AI" and "trustworthy AI" risk becoming nebulous labels for any desirable property of an AI system.
    Abstract We present an overview of the literature on trust in AI and AI trustworthiness and argue for the need to distinguish these concepts more clearly and to gather more empirical evidence on what contributes to people's trusting behaviours. We discuss that trust in AI involves not only reliance on the system itself, but also trust in the developers of the AI system. AI ethics principles such as explainability and transparency are often assumed to promote user trust, but empirical evidence of how such features actually affect how users perceive the system's trustworthiness is neither abundant nor clear. AI systems should be recognised as socio-technical systems, where the people involved in designing, developing, deploying, and using the system are as important as the system itself for determining whether it is trustworthy. Without recognising these nuances, trust in AI and trustworthy AI risk becoming nebulous terms for any desirable feature of AI systems.
    摘要 我们提供了关于人们对AI和AI可靠性的文献综述,并 argue了更清晰地分 differentiating these concepts,并更多地寻求实证证据以确定人们如何信任系统的行为。我们讨论了人们对AI系统的信任不仅取决于系统本身,而且还取决于开发者。AI伦理原则,如可读性和透明度,通常被认为能够促进用户信任,但实际证据表明这些特性对用户对系统可靠性的看法并不那么清晰。我们认为AI系统应被视为社会技术系统,其中设计、开发、部署和使用系统的人员是决定系统可靠性的重要因素。如果不认真地考虑这些细节,则“信任AI”和“可靠AI”这两个概念可能会变得混乱,成为任何愿望的AI系统特性。

Investigating the Catastrophic Forgetting in Multimodal Large Language Models

  • paper_url: http://arxiv.org/abs/2309.10313
  • repo_url: None
  • paper_authors: Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma
  • for: This paper studies the development of multimodal large language models (MLLMs) and evaluates whether they retain the performance of their underlying vision models.
  • methods: It introduces EMT (Evaluating MulTimodality), which treats each MLLM as an image classifier to assess catastrophic forgetting in multimodal language models.
  • results: Most fine-tuned MLLMs fail to retain performance comparable to their vision encoders, and as fine-tuning proceeds the MLLMs begin to hallucinate, causing performance to degrade.
    Abstract Following the success of GPT4, there has been a surge in interest in multimodal large language model (MLLM) research. This line of research focuses on developing general-purpose LLMs through fine-tuning pre-trained LLMs and vision models. However, catastrophic forgetting, a notorious phenomenon where the fine-tuned model fails to retain similar performance compared to the pre-trained model, still remains an inherent problem in multimodal LLMs (MLLM). In this paper, we introduce EMT: Evaluating MulTimodality for evaluating the catastrophic forgetting in MLLMs, by treating each MLLM as an image classifier. We first apply EMT to evaluate several open-source fine-tuned MLLMs and we discover that almost all evaluated MLLMs fail to retain the same performance levels as their vision encoders on standard image classification tasks. Moreover, we continue fine-tuning LLaVA, an MLLM and utilize EMT to assess performance throughout the fine-tuning. Interestingly, our results suggest that early-stage fine-tuning on an image dataset improves performance across other image datasets, by enhancing the alignment of text and visual features. However, as fine-tuning proceeds, the MLLMs begin to hallucinate, resulting in a significant loss of generalizability, even when the image encoder remains frozen. Our results suggest that MLLMs have yet to demonstrate performance on par with their vision models on standard image classification tasks and the current MLLM fine-tuning procedure still has room for improvement.
    摘要 根据GPT4的成功,Multimodal大型语言模型(MLLM)的研究获得了更多的关注。这些研究旨在通过精心适应已经预训的语言模型和视觉模型来开发通用的MLLM。然而,在多modal LLM中,严重的忘记现象仍然是一个困扰,这意味着精心适应的模型无法保持与预训模型相同的性能水平。在这篇论文中,我们引入EMT:评估多modal性,用于评估MLLM中的忘记现象。我们首先将EMT应用于评估一些公开源的精心适应MLLM,我们发现大多数评估的MLLM都无法保持与摄像头模型在标准图像分类任务中的相同性能水平。此外,我们继续适应LLaVA,一个MLLM,并使用EMT评估其性能。我们发现,在早期的适应过程中,使用图像数据集进行适应可以提高图像和文本特征之间的对齐,但是,当精心适应进行时,MLLM开始伪造,导致严重的泛化能力损失,甚至当摄像头模型保持固定时。我们的结果表明,目前的MLLM尚未能达到和摄像头模型在标准图像分类任务中的性能水平,并且精心适应程序仍然需要改进。

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

  • paper_url: http://arxiv.org/abs/2309.10294
  • repo_url: None
  • paper_authors: Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen
  • For: This paper aims to improve speech emotion recognition (SER) using a state-of-the-art speech pre-trained model (data2vec), a text generation technique (GPT-4), and a speech synthesis technique (Azure TTS).
  • Methods: The paper combines speech self-supervised pre-trained models, a powerful large language model (LLM), an emotional text-to-speech (TTS) model, and data augmentation techniques to generate emotionally congruent text and speech.
  • Results: Experiments and ablation studies on the IEMOCAP dataset demonstrate the effectiveness of the method, which outperforms other data augmentation methods and augmentation with other synthetic data.
    Abstract In this paper, we explored how to boost speech emotion recognition (SER) with the state-of-the-art speech pre-trained model (PTM), data2vec, text generation technique, GPT-4, and speech synthesis technique, Azure TTS. First, we investigated the representation ability of different speech self-supervised pre-trained models, and we found that data2vec has a good representation ability on the SER task. Second, we employed a powerful large language model (LLM), GPT-4, and emotional text-to-speech (TTS) model, Azure TTS, to generate emotionally congruent text and speech. We carefully designed the text prompt and dataset construction, to obtain the synthetic emotional speech data with high quality. Third, we studied different ways of data augmentation to promote the SER task with synthetic speech, including random mixing, adversarial training, transfer learning, and curriculum learning. Experiments and ablation studies on the IEMOCAP dataset demonstrate the effectiveness of our method, compared with other data augmentation methods, and data augmentation with other synthetic data.
    摘要 在这篇论文中,我们探索了如何通过现代speech预训练模型(PTM)、数据2vec、文本生成技术(GPT-4)和speech生成技术(Azure TTS)来提高语音情感识别(SER)的性能。首先,我们研究了不同的speech自我超vised预训练模型的表示能力,并发现data2vec在SER任务上有良好的表示能力。其次,我们利用了一个强大的大语言模型(LLM)GPT-4和情感文本-to-speech(TTS)模型Azure TTS,生成情感相符的文本和speech。我们仔细设计了文本提问和数据构造,以获得高质量的人工情感语音数据。第三,我们研究了不同的数据增强方法,以提高SER任务的性能,包括随机混合、对抗训练、传输学习和课程学习。实验和缺陷分析在IEMOCAP数据集上表明了我们的方法的有效性,相比其他数据增强方法和数据增强。

QXAI: Explainable AI Framework for Quantitative Analysis in Patient Monitoring Systems

  • paper_url: http://arxiv.org/abs/2309.10293
  • repo_url: None
  • paper_authors: Thanveer Shaik, Xiaohui Tao, Haoran Xie, Lin Li, Juan D. Velasquez, Niall Higgins
  • for: The study proposes an explainable AI technique for remotely monitoring patients' vital signs and physical activity.
  • methods: It builds an explainable AI framework for quantitative analysis (QXAI) using deep learning models with attention mechanisms and Shapley-value-based explanations.
  • results: Evaluated on the PPG-DaLiA and MHEALTH datasets, the framework achieves state-of-the-art results on heart rate prediction and physical activity classification while providing global and local explanations.
    Abstract Artificial Intelligence techniques can be used to classify a patient's physical activities and predict vital signs for remote patient monitoring. Regression analysis based on non-linear models like deep learning models has limited explainability due to its black-box nature. This can require decision-makers to make blind leaps of faith based on non-linear model results, especially in healthcare applications. In non-invasive monitoring, patient data from tracking sensors and their predisposing clinical attributes act as input features for predicting future vital signs. Explaining the contributions of various features to the overall output of the monitoring application is critical for a clinician's decision-making. In this study, an Explainable AI for Quantitative analysis (QXAI) framework is proposed with post-hoc model explainability and intrinsic explainability for regression and classification tasks in a supervised learning approach. This was achieved by utilizing the Shapley values concept and incorporating attention mechanisms in deep learning models. We adopted the artificial neural networks (ANN) and attention-based Bidirectional LSTM (BiLSTM) models for the prediction of heart rate and classification of physical activities based on sensor data. The deep learning models achieved state-of-the-art results in both prediction and classification tasks. Global explanation and local explanation were conducted on input data to understand the feature contribution of various patient data. The proposed QXAI framework was evaluated using PPG-DaLiA data to predict heart rate and mobile health (MHEALTH) data to classify physical activities based on sensor data. Monte Carlo approximation was applied to the framework to overcome the time complexity and high computation power requirements required for Shapley value calculations.
    摘要 人工智能技术可以用来分类患者的物理活动和预测生命 Parameters 进行远程患者监测。基于非线性模型的回归分析,如深度学习模型,具有限制可读性的问题,因为它们的黑盒特性可能会导致决策者根据非线性模型的结果进行盲目的信任,� особенpecially in healthcare applications。在非侵入式监测中,患者数据来自跟踪传感器和其相关的临床特征,用作预测未来生命 Parameters 的输入特征。解释不同特征对总输出监测应用的贡献是重要的,以便医生做出决策。在这种研究中,一种可解释AI量化分析(QXAI)框架被提出,该框架包括后续模型解释和内在解释,用于回归和分类任务。这是通过使用Shapley值概念和 incorporating attention mechanisms in deep learning models来实现的。我们采用人工神经网络(ANN)和注意力基于BiLSTM(BiLSTM)模型来预测心率和分类物理活动基于传感器数据。这些深度学习模型在预测和分类任务中达到了状态艺术的结果。全局解释和本地解释在输入数据上进行了全面的解释,以便理解不同患者数据特征的贡献。提议的QXAI框架在使用PPG-DaLiA数据集预测心率和Mobile Health(MHEALTH)数据集分类物理活动基于传感器数据进行了评估。在计算能力和计算复杂性方面,我们使用Monte Carlo Approximation来缓解QXAI框架的时间复杂度和计算能力需求。
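
The abstract mentions a Monte Carlo approximation of Shapley values to keep the computation tractable. Below is a generic, model-agnostic Monte Carlo Shapley estimator for a single prediction; the toy linear "heart-rate predictor", the mean-vector baseline, and the sample count are illustrative assumptions, not the QXAI implementation.

```python
# Generic Monte Carlo approximation of Shapley values for one prediction.
import numpy as np

def shapley_mc(predict, x, baseline, n_samples=2000, seed=0):
    """Estimate per-feature contributions of x relative to a baseline input."""
    rng = np.random.default_rng(seed)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_samples):
        perm = rng.permutation(d)
        z = baseline.copy()
        prev = predict(z)
        for j in perm:                      # add features one by one in random order
            z[j] = x[j]
            cur = predict(z)
            phi[j] += cur - prev            # marginal contribution of feature j
            prev = cur
    return phi / n_samples

# Toy model: a fixed linear predictor, so the estimate can be checked against
# w * (x - baseline).
w = np.array([0.5, -1.0, 2.0, 0.0])
predict = lambda z: float(w @ z)
x, baseline = np.array([1.0, 2.0, 3.0, 4.0]), np.zeros(4)
print(shapley_mc(predict, x, baseline, n_samples=500))   # ~ [0.5, -2.0, 6.0, 0.0]
```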

Koopman Invertible Autoencoder: Leveraging Forward and Backward Dynamics for Temporal Modeling

  • paper_url: http://arxiv.org/abs/2309.10291
  • repo_url: None
  • paper_authors: Kshitij Tayal, Arvind Renganathan, Rahul Ghosh, Xiaowei Jia, Vipin Kumar
  • for: This work aims to improve the long-term prediction accuracy of machine learning models, addressing the limitations of existing temporal models such as recurrent neural networks, which capture only statistical patterns in the training data and may fail to learn the underlying dynamics of the target system.
  • methods: It proposes the Koopman Invertible Autoencoder (KIA), a machine learning model based on Koopman operator theory that models both the forward and backward dynamics of a system in an infinite-dimensional Hilbert space, enabling efficient low-dimensional representations and more accurate long-term prediction; the invertible design guarantees reversibility and consistency between the forward and backward operations.
  • results: Validated on pendulum and climate datasets, the method improves long-term prediction capability on the pendulum dataset while remaining robust to noise, and also performs better on long-term climate prediction.
    Abstract Accurate long-term predictions are the foundations for many machine learning applications and decision-making processes. However, building accurate long-term prediction models remains challenging due to the limitations of existing temporal models like recurrent neural networks (RNNs), as they capture only the statistical connections in the training data and may fail to learn the underlying dynamics of the target system. To tackle this challenge, we propose a novel machine learning model based on Koopman operator theory, which we call Koopman Invertible Autoencoders (KIA), that captures the inherent characteristic of the system by modeling both forward and backward dynamics in the infinite-dimensional Hilbert space. This enables us to efficiently learn low-dimensional representations, resulting in more accurate predictions of long-term system behavior. Moreover, our method's invertibility design guarantees reversibility and consistency in both forward and inverse operations. We illustrate the utility of KIA on pendulum and climate datasets, demonstrating 300% improvements in long-term prediction capability for pendulum while maintaining robustness against noise. Additionally, our method excels in long-term climate prediction, further validating our method's effectiveness.
    摘要 准确长期预测是机器学习应用和决策过程的基础。然而,建立准确长期预测模型仍然是一项挑战,因为现有的时间模型如回归神经网络(RNN)只 capture了训练数据中的统计连接,可能无法学习目标系统的下面动态。为解决这个挑战,我们提出了一种基于库曼 оператор理论的新的机器学习模型,我们称之为库曼归一Autoencoder(KIA)。KIA模型能够在无穷维度希尔бер特空间中模型系统的前向和反向动态,从而有效地学习低维度表示,并且可以提高长期系统行为预测的准确性。此外,我们的方法的归一设计 garanties reversibility和一致性在前向和逆向操作中。我们在拖钩和气候数据集上进行了实验,并证明了KIA在长期预测方面的约300%的提升,同时保持了对噪声的Robustness。此外,我们的方法在气候预测方面也具有优异的效果,进一步证明了我们的方法的有效性。
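
A heavily simplified PyTorch sketch of the underlying idea: encode states into a latent space, advance them with a learnable linear operator K for the forward dynamics and with K^{-1} for the backward dynamics, and decode. This is not the paper's KIA architecture; the layer sizes, loss weighting, and the toy damped-rotation data are arbitrary illustrative choices.

```python
# Simplified Koopman-style autoencoder: latent dynamics are a learnable linear
# map applied forward (K) and backward (K^{-1}); losses combine reconstruction
# with forward and backward prediction errors.
import math
import torch
import torch.nn as nn

class KoopmanAE(nn.Module):
    def __init__(self, state_dim=2, latent_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 32), nn.Tanh(), nn.Linear(32, state_dim))
        self.K = nn.Parameter(torch.eye(latent_dim) + 0.01 * torch.randn(latent_dim, latent_dim))

    def forward(self, x_prev, x_t, x_next):
        z = self.enc(x_t)
        z_next = z @ self.K.T                            # forward dynamics
        z_prev = torch.linalg.solve(self.K, z.T).T       # backward dynamics via K^{-1}
        recon = nn.functional.mse_loss(self.dec(z), x_t)
        fwd = nn.functional.mse_loss(self.dec(z_next), x_next)
        bwd = nn.functional.mse_loss(self.dec(z_prev), x_prev)
        return recon + fwd + bwd

# Toy training data: a damped rotation, sliced into (previous, current, next) triples.
theta, decay = 0.1, 0.995
A = decay * torch.tensor([[math.cos(theta), -math.sin(theta)],
                          [math.sin(theta),  math.cos(theta)]])
xs = [torch.tensor([1.0, 0.0])]
for _ in range(200):
    xs.append(A @ xs[-1])
xs = torch.stack(xs)
x_prev, x_t, x_next = xs[:-2], xs[1:-1], xs[2:]

model = KoopmanAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    opt.zero_grad()
    loss = model(x_prev, x_t, x_next)
    loss.backward()
    opt.step()
print("final training loss:", float(loss))
```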

AstroPortal: An ontology repository concept for astronomy, astronautics and other space topics

  • paper_url: http://arxiv.org/abs/2309.10288
  • repo_url: https://github.com/rrovetto/astroportal
  • paper_authors: Robert J. Rovetto
  • for: The paper proposes an ontology repository for astronomy, astronautics, and other space-related topics.
  • methods: The repository is conceived as a centralized platform that lets users search, review, and create ontologies for astro-related topics.
  • results: The paper argues that such a repository would reduce research time and provide a user-friendly way to study and compare knowledge organization systems and semantic resources for the target domains; since no repository appears to exist for this domain, the concept itself is novel.
    Abstract This paper describes a repository for ontologies of astronomy, astronautics, and other space-related topics. It may be called AstroPortal (or SpacePortal), AstroHub (or SpaceHub), etc. The creation of this repository will be applicable to academic, research and other data-intensive sectors. It is relevant for space sciences (including astronomy), Earth science, and astronautics (spaceflight), among other data-intensive disciplines. The repository should provide a centralized platform to search, review and create ontologies for astro-related topics. It thereby can decrease research time, while also providing a user-friendly means to study and compare knowledge organization systems or semantic resources of the target domains. With no apparent repository available on the target domain, this paper also expresses a novel concept.
    摘要 这份论文描述了一个天文、航天和其他空间相关领域 ontology 存储库。它可以被称为 AstroPortal(或 SpacePortal)、AstroHub(或 SpaceHub)等。该存储库的创建将对学术、研究和数据密集领域进行应用。它 relevante 于天文学、地球科学和航天(空间飞行)等数据密集领域。该存储库应该提供一个中央化平台,用于搜索、评审和创建 astro-related ontoologies。因此,它可以降低研究时间,同时提供一个易于使用的方式来研究和比较知识组织系统或semantic 资源的target 领域。由于目标领域没有明显的存储库,这篇论文还描述了一个新的概念。

FRAMU: Attention-based Machine Unlearning using Federated Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.10283
  • repo_url: None
  • paper_authors: Thanveer Shaik, Xiaohui Tao, Lin Li, Haoran Xie, Taotao Cai, Xiaofeng Zhu, Qing Li
  • for: The paper addresses data privacy by proposing a machine unlearning framework based on federated reinforcement learning, aiming to maintain model accuracy and computational efficiency while removing outdated, private, or irrelevant data.
  • methods: The framework combines attention mechanisms, privacy-preservation techniques, and optimization strategies, and handles both single-modality and multi-modality data sources without sacrificing accuracy or privacy.
  • results: Experiments on single-modality and multi-modality datasets show that FRAMU significantly outperforms baseline models; additional analyses of convergence behavior and optimization strategies confirm its utility for federated learning applications.
    Abstract Machine Unlearning is an emerging field that addresses data privacy issues by enabling the removal of private or irrelevant data from the Machine Learning process. Challenges related to privacy and model efficiency arise from the use of outdated, private, and irrelevant data. These issues compromise both the accuracy and the computational efficiency of models in both Machine Learning and Unlearning. To mitigate these challenges, we introduce a novel framework, Attention-based Machine Unlearning using Federated Reinforcement Learning (FRAMU). This framework incorporates adaptive learning mechanisms, privacy preservation techniques, and optimization strategies, making it a well-rounded solution for handling various data sources, either single-modality or multi-modality, while maintaining accuracy and privacy. FRAMU's strength lies in its adaptability to fluctuating data landscapes, its ability to unlearn outdated, private, or irrelevant data, and its support for continual model evolution without compromising privacy. Our experiments, conducted on both single-modality and multi-modality datasets, revealed that FRAMU significantly outperformed baseline models. Additional assessments of convergence behavior and optimization strategies further validate the framework's utility in federated learning applications. Overall, FRAMU advances Machine Unlearning by offering a robust, privacy-preserving solution that optimizes model performance while also addressing key challenges in dynamic data environments.
    摘要 机器无学是一个emerging field,旨在解决数据隐私问题,通过从机器学习过程中除去private或无关的数据。由于使用过时、private或无关的数据,会导致模型精度和计算效率受到挑战。为了解决这些问题,我们提出了一种新的框架:基于联邦反馈学习的注意力机器无学(FRAMU)。这个框架包括适应学习机制、隐私保护技术和优化策略,使其能够处理不同数据源,包括单模态和多模态数据,而不会影响模型的准确性和隐私。FRAMU的优势在于其适应到变化的数据景观、能够快速地忘记过时、private或无关的数据,以及支持不间断的模型演化而不损失隐私。我们在单模态和多模态数据集上进行了实验,发现FRAMU与基准模型相比有显著性能提升。进一步的评估对 convergence 行为和优化策略也证明了框架在联邦学习应用中的实用性。总之,FRAMU 提高了机器无学的可靠性和隐私保护能力,为动态数据环境中的机器学习应用提供了一个robust和隐私保护的解决方案。

Crowd-Aware Multi-Agent Pathfinding With Boosted Curriculum Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.10275
  • repo_url: None
  • paper_authors: Phu Pham, Aniket Bera
  • for: The paper tackles multi-agent path finding (MAPF) in crowded environments, aiming to find collision-free paths for all agents in the system.
  • methods: It proposes CRAMP, a crowd-aware decentralized approach trained with a boosted curriculum-based reinforcement learning strategy.
  • results: Tested in simulated environments, CRAMP outperforms state-of-the-art decentralized MAPF methods across several metrics, improving solution quality by up to 58% measured in makespan and collision count and success rate by up to 5%.
    Abstract Multi-Agent Path Finding (MAPF) in crowded environments presents a challenging problem in motion planning, aiming to find collision-free paths for all agents in the system. MAPF finds a wide range of applications in various domains, including aerial swarms, autonomous warehouse robotics, and self-driving vehicles. The current approaches for MAPF can be broadly categorized into two main categories: centralized and decentralized planning. Centralized planning suffers from the curse of dimensionality and thus does not scale well in large and complex environments. On the other hand, decentralized planning enables agents to engage in real-time path planning within a partially observable environment, demonstrating implicit coordination. However, they suffer from slow convergence and performance degradation in dense environments. In this paper, we introduce CRAMP, a crowd-aware decentralized approach to address this problem by leveraging reinforcement learning guided by a boosted curriculum-based training strategy. We test CRAMP on simulated environments and demonstrate that our method outperforms the state-of-the-art decentralized methods for MAPF on various metrics. CRAMP improves the solution quality up to 58% measured in makespan and collision count, and up to 5% in success rate in comparison to previous methods.
    摘要 多机器人规划(MAPF)在拥挤环境中存在一个复杂的运动规划问题,旨在找到所有机器人的碰撞自由路径。MAPF在各个领域中找到了广泛的应用,包括飞行群体、自主仓库机器人和自动驾驶车辆。当前的MAPF方法可以分为两个主要类别:中央化计划和分布式计划。中央化计划受到维度约束的困扰,因此在大型和复杂的环境中不 scalable。相反,分布式计划使得机器人可以在部分可见环境中实时进行路径规划,表现出隐式协调。然而,它们在拥挤环境中表现缓慢,性能下降。在这篇论文中,我们介绍了一种受欢迎的人群意识 Decentralized Approach(CRAMP),用于解决这个问题,通过利用强化学习指导的推广课程学习策略。我们在模拟环境中测试了CRAMP,并证明我们的方法在多个纪录中性能更好,相比前一代的分布式方法。CRAMP提高了解决方案质量,达到58%的做 span和碰撞计数,以及5%的成功率。

Using an Uncrewed Surface Vehicle to Create a Volumetric Model of Non-Navigable Rivers and Other Shallow Bodies of Water

  • paper_url: http://arxiv.org/abs/2309.10269
  • repo_url: None
  • paper_authors: Jayesh Tripathi, Robin Murphy
  • for: The paper presents a practical approach for using an uncrewed surface vehicle (USV) to collect and merge bathymetric maps and digital surface maps of the banks of shallow bodies of water into a unified volumetric model.
  • methods: The below-waterline mesh is generated by applying the Poisson surface reconstruction algorithm to sparse sonar depth readings, while dense above-waterline meshes of the banks are built with a commercial structure-from-motion (SfM) package.
  • results: The approach produces volumetric models of shallow bodies of water and copes with gaps in sensor coverage, giving emergency planners better data for anticipating and managing flooding.
    Abstract Non-navigable rivers and retention ponds play important roles in buffering communities from flooding, yet emergency planners often have no data as to the volume of water that they can carry before flooding the surrounding. This paper describes a practical approach for using an uncrewed marine surface vehicle (USV) to collect and merge bathymetric maps with digital surface maps of the banks of shallow bodies of water into a unified volumetric model. The below-waterline mesh is developed by applying the Poisson surface reconstruction algorithm to the sparse sonar depth readings of the underwater surface. Dense above-waterline meshes of the banks are created using commercial structure from motion (SfM) packages. Merging is challenging for many reasons, the most significant is gaps in sensor coverage, i.e., the USV cannot collect sonar depth data or visually see sandy beaches leading to a bank thus the two meshes may not intersect. The approach is demonstrated on a Hydronalix EMILY USV with a Humminbird single beam echosounder and Teledyne FLIR camera at Lake ESTI at the Texas A&M Engineering Extension Service Disaster City complex.
    摘要 非航行性河流和储水池在抵御洪水方面发挥重要作用,但紧急计划者经常没有洪水量的数据,以便在洪水时进行应急准备。这篇论文描述了一种实用的方法,使用无人海面车 (USV) 收集和融合浸没深度图和数字地面图,形成一个统一的体积模型。在水下的网格是通过将波浪表面重建算法应用于 USV 的罕见声纳深度读数来构建的。陆地上的稠密网格是使用商业的结构从运动 (SfM) 包装来创建的。融合具有许多挑战,最主要的是感器覆盖缺陷,即 USV 不能收集声纳深度数据或视见砂滩,导致两个网格不相交。该方法在得克萨斯A&M工程扩展服务灾难城区使用一只Hydronalix EMILY USV、一个Humminbird单束声纳和Teledyne FLIR Camera进行了示范。
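
A sketch of the below-waterline step using Open3D (assumed to be installed): sparse sonar soundings become a mesh via Poisson surface reconstruction, which could then be concatenated with an SfM bank mesh loaded from disk. Georeferencing, scaling, and the gap-handling the paper describes are omitted, and the synthetic bowl-shaped bed and file names are hypothetical.

```python
# Poisson reconstruction of a riverbed from sparse sonar points with Open3D,
# followed by (commented) merging with a separately produced SfM bank mesh.
import numpy as np
import open3d as o3d

# Sparse sonar soundings as (x, y, z) points; here a synthetic bowl-shaped bed.
xy = np.random.uniform(-20, 20, size=(2000, 2))
bed_z = -5.0 + 0.01 * (xy ** 2).sum(axis=1)          # deeper toward the middle
points = np.column_stack([xy, bed_z])

pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
pcd.estimate_normals(o3d.geometry.KDTreeSearchParamKNN(knn=20))
pcd.orient_normals_to_align_with_direction([0.0, 0.0, 1.0])

bed_mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=8)

# A bank mesh produced separately by a structure-from-motion package
# (hypothetical file name), merged by simple concatenation.
# bank_mesh = o3d.io.read_triangle_mesh("bank_sfm.ply")
# combined = bed_mesh + bank_mesh
# o3d.io.write_triangle_mesh("volumetric_model.ply", combined)
print(bed_mesh)
```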

Correlation between morphological evolution of splashing drop and exerted impact force revealed by interpretation of explainable artificial intelligence

  • paper_url: http://arxiv.org/abs/2309.10266
  • repo_url: None
  • paper_authors: Jingzu Yee, Daichi Igarashi, Pradipto, Akinori Yamanaka, Yoshiyuki Tagawa
  • for: The study investigates a possible correlation between features of a drop splashing on a solid surface and the normalized impact force it exerts.
  • methods: It uses a newly proposed feature extraction method and the interpretation of an explainable artificial intelligence (XAI) video classifier that distinguishes splashing from non-splashing drops.
  • results: The XAI model's weight-matrix values corresponding to the extracted features change with the temporal evolution of the drop morphology; in particular, the rate of change of the splashing features' contributions closely matches the profile of the normalized impact force, being most pronounced just after the force peaks.
    Abstract This study reveals a possible correlation between splashing morphology and the normalized impact force exerted by an impacting drop on a solid surface. This finding is obtained from a newly proposed feature extraction method and a subsequent interpretation of the classification of splashing and non-splashing drops performed by an explainable artificial intelligence (XAI) video classifier. Notably, the values of the weight matrix elements of the XAI that correspond to the extracted features are found to change with the temporal evolution of the drop morphology. We compute the rate of change of the contributions of each frame with respect to the classification value of a video as an important index to quantify the contributions of the extracted splashing and non-splashing features at different impact times to the classification of the XAI model. Remarkably, the rate computed for the extracted splashing features is found to closely match the profile of the normalized impact force, where the splashing features are most pronounced immediately after the normalized impact force reaches its peak value. This study has provided an example that clarifies the relationship between the complex morphological evolution of a splashing drop and physical parameters by interpreting the classification of an XAI video classifier.
    摘要 (以下是简化中文版)这个研究发现可能存在液体撞击表面时的液体形态和normalized影响力之间的关系。这一发现来自于一种新提出的特征提取方法和随后的XAI视频分类器的解释。值得注意的是,XAI模型中的weight矩阵元素与提取特征之间的关系发生了时间的变化。我们计算了每帧的贡献的变化率对视频分类值的影响,以便量化不同的撞击时间对XAI模型的分类的贡献。吸引人的是,计算的液体撞击特征的变化率与正常化影响力的profile非常相似,特别是在正常化影响力达到最大值时,液体撞击特征的变化最为明显。这个研究提供了一个示例,从液体撞击的形态进行解释,并且解释了XAI模型的分类结果与物理参数之间的关系。
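
A small sketch of the index described in the abstract: the rate of change, over impact time, of each frame's contribution to the video-level classification value. The XAI classifier and the feature extraction are not reproduced; the per-frame contributions below are synthetic stand-ins.

```python
# Rate of change of per-frame contributions with respect to normalised impact
# time, for comparison against the impact-force profile. Inputs are toy values.
import numpy as np

def contribution_rate(frame_contributions, frame_times):
    """d(contribution)/d(time) per frame."""
    return np.gradient(frame_contributions, frame_times)

t = np.linspace(0.0, 1.0, 50)                 # normalised time after impact
# Toy per-frame contributions that rise quickly and saturate, standing in for
# the weight-matrix-derived values in the paper.
contrib = 1.0 - np.exp(-8.0 * t)
rate = contribution_rate(contrib, t)
peak_frame = int(np.argmax(rate))
print(f"contribution grows fastest at t = {t[peak_frame]:.2f}")
```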

LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins

  • paper_url: http://arxiv.org/abs/2309.10254
  • repo_url: https://github.com/llm-platform-security/chatgpt-plugin-eval
  • paper_authors: Umar Iqbal, Tadayoshi Kohno, Franziska Roesner
  • for: The paper provides a framework that lays a foundation for LLM platform designers to analyze and improve the security, privacy, and safety of current and future plugin-integrated LLM platforms.
  • methods: The authors propose an attack taxonomy, developed by iteratively asking how LLM platform stakeholders could leverage their capabilities and responsibilities to mount attacks against each other, and apply it to OpenAI's plugin ecosystem as part of that iterative process.
  • results: They uncover plugins that concretely demonstrate the issue types outlined in the attack taxonomy, and conclude that these issues raise new security, privacy, and safety challenges for current and future LLM-based computing platforms.
    Abstract Large language model (LLM) platforms, such as ChatGPT, have recently begun offering a plugin ecosystem to interface with third-party services on the internet. While these plugins extend the capabilities of LLM platforms, they are developed by arbitrary third parties and thus cannot be implicitly trusted. Plugins also interface with LLM platforms and users using natural language, which can have imprecise interpretations. In this paper, we propose a framework that lays a foundation for LLM platform designers to analyze and improve the security, privacy, and safety of current and future plugin-integrated LLM platforms. Our framework is a formulation of an attack taxonomy that is developed by iteratively exploring how LLM platform stakeholders could leverage their capabilities and responsibilities to mount attacks against each other. As part of our iterative process, we apply our framework in the context of OpenAI's plugin ecosystem. We uncover plugins that concretely demonstrate the potential for the types of issues that we outline in our attack taxonomy. We conclude by discussing novel challenges and by providing recommendations to improve the security, privacy, and safety of present and future LLM-based computing platforms.

GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

  • paper_url: http://arxiv.org/abs/2309.10253
  • repo_url: https://github.com/sherdencooper/gptfuzz
  • paper_authors: Jiahao Yu, Xingwei Lin, Xinyu Xing
  • for: The work provides an automated framework for generating black-box jailbreak templates, to support red-teaming and improve LLM safety.
  • methods: Built on ideas from the AFL fuzzing framework, it has three key components: a seed selection strategy, metamorphic mutation operators, and a judgment model.
  • results: Across diverse attack scenarios, the framework consistently produces jailbreak templates with a high success rate, even when starting from low-quality seed templates.
    Abstract Large language models (LLMs) have recently experienced tremendous popularity and are widely used from casual conversations to AI-driven programming. However, despite their considerable success, LLMs are not entirely reliable and can give detailed guidance on how to conduct harmful or illegal activities. While safety measures can reduce the risk of such outputs, adversarial "jailbreak" attacks can still exploit LLMs to produce harmful content. These jailbreak templates are typically manually crafted, making large-scale testing challenging. In this paper, we introduce GPTFuzzer, a novel black-box jailbreak fuzzing framework inspired by the AFL fuzzing framework. Instead of manual engineering, GPTFuzzer automates the generation of jailbreak templates for red-teaming LLMs. At its core, GPTFuzzer starts with human-written templates as seeds, then mutates them using mutate operators to produce new templates. We detail three key components of GPTFuzzer: a seed selection strategy for balancing efficiency and variability, metamorphic relations for creating semantically equivalent or similar sentences, and a judgment model to assess the success of a jailbreak attack. We tested GPTFuzzer on various commercial and open-source LLMs, such as ChatGPT, LLaMa-2, and Claude2, under diverse attack scenarios. Our results indicate that GPTFuzzer consistently produces jailbreak templates with a high success rate, even in settings where all human-crafted templates fail. Notably, even starting with suboptimal seed templates, GPTFuzzer maintains over 90% attack success rate against ChatGPT and Llama-2 models. We believe GPTFuzzer will aid researchers and practitioners in assessing LLM robustness and will spur further research into LLM safety.
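
A structural skeleton of the fuzzing loop the abstract describes: pick a seed template, mutate it, query the target model, judge the response, and reseed the pool with successful templates. The mutation operators and the judge below are toy stand-ins, and the stubbed target model is hypothetical; the released implementation lives at the repo_url above.

```python
# Toy skeleton of a jailbreak-template fuzzing loop (seed pool, mutation,
# judgment, reseeding). Not the GPTFuzz implementation.
import random

def mutate(template, rng):
    """Toy 'metamorphic' operators: shuffle sentences or duplicate one of them."""
    parts = [p for p in template.split(". ") if p]
    if rng.random() < 0.5 and len(parts) > 1:
        rng.shuffle(parts)
    else:
        parts.insert(rng.randrange(len(parts) + 1), rng.choice(parts))
    return ". ".join(parts)

def fuzz(seeds, query_model, judge, iters=100, seed=0):
    rng = random.Random(seed)
    pool, successes = list(seeds), []
    for _ in range(iters):
        template = rng.choice(pool)               # seed selection (uniform here)
        candidate = mutate(template, rng)
        response = query_model(candidate)
        if judge(response):
            successes.append(candidate)
            pool.append(candidate)                # reseed the pool with the new template
    return successes

# Stubs standing in for the target LLM and the judgment model.
query_model = lambda prompt: "refused" if len(prompt) < 40 else "complied"
judge = lambda response: response == "complied"
seeds = ["You are an actor. Stay in character. Answer: [INSERT PROMPT HERE]"]
print(len(fuzz(seeds, query_model, judge)), "successful templates found")
```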

On Explicit Curvature Regularization in Deep Generative Models

  • paper_url: http://arxiv.org/abs/2309.10237
  • repo_url: None
  • paper_authors: Yonghyeon Lee, Frank Chongwoo Park
  • for: The paper proposes a curvature-based regularization approach for learning deep generative models.
  • methods: It derives explicit coordinate-invariant intrinsic and extrinsic curvature measures for data manifolds embedded in higher-dimensional Euclidean space, together with efficient formulas for approximately evaluating them.
  • results: On noisy motion capture data, the curvature-based methods outperform existing autoencoder regularization methods, with intrinsic curvature measures slightly more effective than extrinsic ones.
    Abstract We propose a family of curvature-based regularization terms for deep generative model learning. Explicit coordinate-invariant formulas for both intrinsic and extrinsic curvature measures are derived for the case of arbitrary data manifolds embedded in higher-dimensional Euclidean space. Because computing the curvature is a highly computation-intensive process involving the evaluation of second-order derivatives, efficient formulas are derived for approximately evaluating intrinsic and extrinsic curvatures. Comparative studies are conducted that compare the relative efficacy of intrinsic versus extrinsic curvature-based regularization measures, as well as performance comparisons against existing autoencoder training methods. Experiments involving noisy motion capture data confirm that curvature-based methods outperform existing autoencoder regularization methods, with intrinsic curvature measures slightly more effective than extrinsic curvature measures.
    摘要 我们提出了一组基于曲率的调整项,用于深度生成模型的学习。我们 derive了对于任意数据构造的内在和外在曲率度量的明确构成,并且因为计算曲率是高度 computation-intensive 的过程,我们 derivated了高效的曲率度量评估方法。我们进行了比较研究,评估了内在曲率 versus 外在曲率基于的调整项的Relative efficacy,以及与现有 autoencoder 训练方法的比较。实验结果显示,曲率基于的方法在陌生动态捕捉数据上表现较好,内在曲率度量 slightly more effective than 外在曲率度量。
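
The paper's coordinate-invariant curvature formulas are not reproduced here; the sketch below only shows where such a regulariser plugs into training, using a crude finite-difference proxy that penalises second-order variation of a decoder along random latent directions. The decoder, batch, and regularisation weight are illustrative assumptions.

```python
# Crude stand-in for a curvature regulariser: penalise the second difference of
# the decoder along random latent directions (not the paper's exact measures).
import torch
import torch.nn as nn

def curvature_proxy(decoder, z, eps=1e-2, n_dirs=4):
    penalty = 0.0
    for _ in range(n_dirs):
        v = torch.randn_like(z)
        v = eps * v / v.norm(dim=-1, keepdim=True)
        second_diff = decoder(z + v) + decoder(z - v) - 2.0 * decoder(z)
        penalty = penalty + (second_diff / eps**2).pow(2).sum(dim=-1).mean()
    return penalty / n_dirs

decoder = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 3))
z = torch.randn(128, 2)
x = torch.randn(128, 3)                      # stand-in training batch
recon_loss = nn.functional.mse_loss(decoder(z), x)
loss = recon_loss + 1e-4 * curvature_proxy(decoder, z)   # weighting is arbitrary
loss.backward()
print(float(loss))
```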

Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles

  • paper_url: http://arxiv.org/abs/2309.10228
  • repo_url: None
  • paper_authors: Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Ziran Wang
  • for: This paper aims to enhance autonomous vehicles’ decision-making processes by integrating Large Language Models (LLMs) to provide personalized assistance, continuous learning, and transparent decision-making.
  • methods: The proposed framework leverages LLMs’ natural language capabilities and contextual understanding, specialized tools usage, synergizing reasoning, and acting with various modules on autonomous vehicles.
  • results: The proposed framework has the potential to revolutionize the way autonomous vehicles operate, offering personalized assistance, continuous learning, and transparent decision-making, ultimately contributing to safer and more efficient autonomous driving technologies.
    Abstract The future of autonomous vehicles lies in the convergence of human-centric design and advanced AI capabilities. Autonomous vehicles of the future will not only transport passengers but also interact and adapt to their desires, making the journey comfortable, efficient, and pleasant. In this paper, we present a novel framework that leverages Large Language Models (LLMs) to enhance autonomous vehicles' decision-making processes. By integrating LLMs' natural language capabilities and contextual understanding, specialized tools usage, synergizing reasoning, and acting with various modules on autonomous vehicles, this framework aims to seamlessly integrate the advanced language and reasoning capabilities of LLMs into autonomous vehicles. The proposed framework holds the potential to revolutionize the way autonomous vehicles operate, offering personalized assistance, continuous learning, and transparent decision-making, ultimately contributing to safer and more efficient autonomous driving technologies.
    摘要 自动驾驶未来在人类中心设计和高级人工智能技术的融合中实现。未来的自动驾驶车不仅会运送乘客,还会与乘客互动,适应其愿望,使旅行更舒适、更高效、更愉悦。在这篇论文中,我们提出了一种新的框架,通过将大型自然语言模型(LLM)的自然语言能力和上下文理解 integrate into autonomous vehicles的决策过程中。通过特殊工具的使用、同步理解、合并推理和行动等模块的结合,这个框架计划将 LLM 的高级语言和推理能力融合到自动驾驶车中。该框架的提议具有改变自动驾驶车的运行方式,提供个性化协助、不断学习和透明决策,从而为更安全和更高效的自动驾驶技术做出贡献。

Multi-level feature fusion network combining attention mechanisms for polyp segmentation

  • paper_url: http://arxiv.org/abs/2309.10219
  • repo_url: None
  • paper_authors: Junzhuo Liu, Qiaosong Chen, Ye Zhang, Zhixiang Wang, Deng Xin, Jin Wang
  • For: The paper proposes a new automated polyp segmentation technique to improve the efficiency and accuracy of medical diagnosis and reduce the risk of colorectal cancer.
  • Methods: The proposed MLFF-Net combines multi-level feature fusion with attention mechanisms and comprises three modules: a Multi-scale Attention Module (MAM), a High-level Feature Enhancement Module (HFEM), and a Global Attention Module (GAM).
  • Results: On five public datasets, MLFF-Net segments multiple types of polyps and outperforms current state-of-the-art methods in both accuracy and generalization ability.
    Abstract Clinically, automated polyp segmentation techniques have the potential to significantly improve the efficiency and accuracy of medical diagnosis, thereby reducing the risk of colorectal cancer in patients. Unfortunately, existing methods suffer from two significant weaknesses that can impact the accuracy of segmentation. Firstly, features extracted by encoders are not adequately filtered and utilized. Secondly, semantic conflicts and information redundancy caused by feature fusion are not attended to. To overcome these limitations, we propose a novel approach for polyp segmentation, named MLFF-Net, which leverages multi-level feature fusion and attention mechanisms. Specifically, MLFF-Net comprises three modules: Multi-scale Attention Module (MAM), High-level Feature Enhancement Module (HFEM), and Global Attention Module (GAM). Among these, MAM is used to extract multi-scale information and polyp details from the shallow output of the encoder. In HFEM, the deep features of the encoders complement each other by aggregation. Meanwhile, the attention mechanism redistributes the weight of the aggregated features, weakening the conflicting redundant parts and highlighting the information useful to the task. GAM combines features from the encoder and decoder features, as well as computes global dependencies to prevent receptive field locality. Experimental results on five public datasets show that the proposed method not only can segment multiple types of polyps but also has advantages over current state-of-the-art methods in both accuracy and generalization ability.
    摘要 临床上,自动化肿体分割技术具有提高医学诊断效率和准确率的潜在优势,从而降低患者抗性肿瘤的风险。然而,现有方法受到两大缺陷,这两个缺陷可能会影响分割的准确性。首先,编码器提取的特征并不充分筛选和利用。其次,由特征融合引起的semantic conflict和信息重复不被注意。为了解决这些限制,我们提出了一种新的肿体分割方法,名为MLFF-Net,它利用多级特征融合和注意机制。具体来说,MLFF-Net包括三个模块:多级注意模块(MAM)、高级特征增强模块(HFEM)和全局注意模块(GAM)。其中,MAM用于从编码器的浅输出中提取多级信息和肿体细节。在HFEM中,编码器的深特征相互补充,并通过注意机制重新分配这些特征的权重,弱化冲突的重复部分,高亮任务所需的信息。GAM将编码器和解码器特征相结合,并计算全局依赖关系,以避免感知范围地局部性。我们在五个公共数据集上进行了实验,结果表明,提议的方法不仅可以分割多种肿体,而且在准确率和普适性能方面也有优势于当前state-of-the-art方法。

An Empirical Study of Attention Networks for Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2309.10217
  • repo_url: None
  • paper_authors: Hao Guo, Hongbiao Si, Guilin Jiang, Wei Zhang, Zhiyan Liu, Xuanyi Zhu, Xulong Zhang, Yang Liu
  • for: The paper studies attention networks for semantic segmentation, examining their computational complexity and per-category accuracy, along with suitable application scenarios and practical recommendations.
  • methods: It carries out a comparative empirical study of several attention networks, including decoder-based and self-attention networks.
  • results: The study finds that decoder-based networks perform better in some scenarios while self-attention networks perform better in others, and it identifies weaknesses of attention networks and directions for future work.
    Abstract Semantic segmentation is a vital problem in computer vision. Recently, a common solution to semantic segmentation is the end-to-end convolution neural network, which is much more accurate than traditional methods.Recently, the decoders based on attention achieve state-of-the-art (SOTA) performance on various datasets. But these networks always are compared with the mIoU of previous SOTA networks to prove their superiority and ignore their characteristics without considering the computation complexity and precision in various categories, which is essential for engineering applications. Besides, the methods to analyze the FLOPs and memory are not consistent between different networks, which makes the comparison hard to be utilized. What's more, various methods utilize attention in semantic segmentation, but the conclusion of these methods is lacking. This paper first conducts experiments to analyze their computation complexity and compare their performance. Then it summarizes suitable scenes for these networks and concludes key points that should be concerned when constructing an attention network. Last it points out some future directions of the attention network.
    摘要 semantic segmentation 是计算机视觉中的一个关键问题。最近,一种常见的解决方案是将端到端 convolutional neural network(CNN)作为解决方案,这种方法比传统方法更为精准。 Recently, attention 基于的解码器在多个数据集上达到了状态的极点性能(SOTA)。但这些网络总是与之前的 SOTA 网络的 mIoU 进行比较,忽略它们的特点而不考虑不同类别的计算复杂度和精度,这是工程应用中必须考虑的。另外,不同网络之间的 FLOPs 和内存分析方法不一致,使得比较变得困难。此外,各种方法在 semantic segmentation 中使用 attention,但这些方法的结论缺乏。这篇论文首先进行了计算复杂度的分析和比较性能。然后总结了适合这些网络的场景,并指出了在建立注意力网络时需要关注的关键点。最后,它指出了未来注意力网络的发展方向。
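
Since the paper stresses reporting computation cost consistently alongside mIoU, here is a small sketch for reporting parameter count and measured latency of any PyTorch segmentation model at a fixed input size. Exact FLOP counting usually needs an additional profiling tool, so it is deliberately left out; the stand-in model is hypothetical.

```python
# Report model cost (parameters, wall-clock latency) alongside accuracy metrics.
import time
import torch
import torch.nn as nn

def report_cost(model, input_shape=(1, 3, 512, 512), n_runs=10):
    model.eval()
    x = torch.randn(*input_shape)
    n_params = sum(p.numel() for p in model.parameters())
    with torch.no_grad():
        model(x)                                  # warm-up
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        latency = (time.perf_counter() - start) / n_runs
    print(f"params: {n_params / 1e6:.2f} M, latency: {latency * 1e3:.1f} ms per image")

# Stand-in "segmentation head" just to make the sketch runnable.
report_cost(nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 19, 1)))
```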

Safe POMDP Online Planning via Shielding

  • paper_url: http://arxiv.org/abs/2309.10216
  • repo_url: None
  • paper_authors: Shili Sheng, David Parker, Lu Feng
  • for: This work aims to extend POMDP online planning so that safety requirements are satisfied.
  • methods: It computes shields that restrict unsafe actions and integrates them with the POMCP algorithm to guarantee safety.
  • results: Experimental results show that the proposed shielding methods successfully guarantee safety, with negligible impact on the runtime of online planning for large POMDPs.
    Abstract Partially observable Markov decision processes (POMDPs) have been widely used in many robotic applications for sequential decision-making under uncertainty. POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return. But the resulting policies cannot provide safety guarantees that are imperative for real-world safety-critical tasks (e.g., autonomous driving). In this work, we consider safety requirements represented as almost-sure reach-avoid specifications (i.e., the probability to reach a set of goal states is one and the probability to reach a set of unsafe states is zero). We compute shields that restrict unsafe actions violating almost-sure reach-avoid specifications. We then integrate these shields into the POMCP algorithm for safe POMDP online planning. We propose four distinct shielding methods, differing in how the shields are computed and integrated, including factored variants designed to improve scalability. Experimental results on a set of benchmark domains demonstrate that the proposed shielding methods successfully guarantee safety (unlike the baseline POMCP without shielding) on large POMDPs, with negligible impact on the runtime for online planning.
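
A sketch of where a shield sits in action selection: before comparing action values, remove the actions the shield marks unsafe for the current belief. The shield synthesis from almost-sure reach-avoid specifications and the real POMCP search are not reproduced; the belief, values, and shield predicate below are toy stand-ins.

```python
# Shielded action selection: restrict the candidate set before value comparison.
def shielded_action(belief, actions, shield, action_value):
    """Pick the best action among those the shield allows for this belief."""
    allowed = [a for a in actions if shield(belief, a)]
    if not allowed:
        raise RuntimeError("shield blocks every action; the spec may be unrealisable here")
    return max(allowed, key=lambda a: action_value(belief, a))

# Toy example: a belief over two states, one of which makes "fast" unsafe.
actions = ["fast", "slow", "stop"]
belief = {"clear": 0.4, "obstacle_ahead": 0.6}
shield = lambda b, a: not (a == "fast" and b.get("obstacle_ahead", 0.0) > 0.0)
action_value = lambda b, a: {"fast": 10.0, "slow": 6.0, "stop": 1.0}[a]   # stub for POMCP estimates
print(shielded_action(belief, actions, shield, action_value))             # -> "slow"
```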
    摘要