cs.AI - 2023-10-16

Greedy Perspectives: Multi-Drone View Planning for Collaborative Coverage in Cluttered Environments

  • paper_url: http://arxiv.org/abs/2310.10863
  • repo_url: None
  • paper_authors: Krishna Suresh, Aditya Rauniyar, Micah Corah, Sebastian Scherer
  • for: This paper aims to enable large-scale filming of dynamic groups of people, particularly for applications such as team sports and cinematography.
  • methods: The paper uses sequential greedy planning for scalable optimization of camera views across teams of robots, and develops a multi-robot, multi-actor view planner with an occlusion-aware objective to handle the coordination problems that arise in cluttered environments.
  • results: Compared with a formation planner, the sequential planner generates 14% greater actor view reward in three scenarios and performs comparably to formation planning in the other two, demonstrating coordinated filming of groups that split, merge, or spread apart in cluttered environments.
    Abstract Deployment of teams of aerial robots could enable large-scale filming of dynamic groups of people (actors) in complex environments for novel applications in areas such as team sports and cinematography. Toward this end, methods for submodular maximization via sequential greedy planning can be used for scalable optimization of camera views across teams of robots but face challenges with efficient coordination in cluttered environments. Obstacles can produce occlusions and increase chances of inter-robot collision which can violate requirements for near-optimality guarantees. To coordinate teams of aerial robots in filming groups of people in dense environments, a more general view-planning approach is required. We explore how collision and occlusion impact performance in filming applications through the development of a multi-robot multi-actor view planner with an occlusion-aware objective for filming groups of people and compare with a greedy formation planner. To evaluate performance, we plan in five test environments with complex multiple-actor behaviors. Compared with a formation planner, our sequential planner generates 14% greater view reward over the actors for three scenarios and comparable performance to formation planning on two others. We also observe near identical performance of sequential planning both with and without inter-robot collision constraints. Overall, we demonstrate effective coordination of teams of aerial robots for filming groups that may split, merge, or spread apart and in environments cluttered with obstacles that may cause collisions or occlusions.
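A minimal sketch of the sequential greedy planning idea referenced above, assuming a monotone submodular `view_reward` objective and a `candidate_trajectories` generator; this is an illustration of the general technique, not the authors' planner.

```python
# Sketch of sequential greedy planning for submodular view coverage.
# Assumptions: `candidate_trajectories(robot, selected)` enumerates trajectories
# that avoid collisions with earlier robots, and `view_reward(selected)` is a
# monotone submodular objective (e.g., occlusion-aware coverage of the actors).

def sequential_greedy_plan(robots, candidate_trajectories, view_reward):
    """Assign one trajectory per robot, greedily and in sequence."""
    selected = {}                              # robot -> chosen trajectory
    for robot in robots:                       # robots plan one after another
        base = view_reward(selected)
        best_traj, best_gain = None, float("-inf")
        for traj in candidate_trajectories(robot, selected):
            trial = dict(selected)
            trial[robot] = traj
            gain = view_reward(trial) - base   # marginal gain given earlier choices
            if gain > best_gain:
                best_traj, best_gain = traj, gain
        selected[robot] = best_traj
    return selected
```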

Proper Laplacian Representation Learning

  • paper_url: http://arxiv.org/abs/2310.10833
  • repo_url: None
  • paper_authors: Diego Gomez, Michael Bowling, Marlos C. Machado
  • for: Solving large reinforcement learning problems, where exploration, generalization, and transfer are particularly challenging, requires learning good state representations.
  • methods: Uses the Laplacian representation, obtained by computing the eigenvalues and eigenvectors of the graph Laplacian, which is approximated with an optimization objective compatible with deep learning.
  • results: Proposes a theoretically sound objective and optimization algorithm that recovers the true eigenvectors and eigenvalues and eliminates the hyperparameter dependence of previous approximations; experiments across multiple environments show that the theoretical guarantees translate into robust learning.
    Abstract The ability to learn good representations of states is essential for solving large reinforcement learning problems, where exploration, generalization, and transfer are particularly challenging. The Laplacian representation is a promising approach to address these problems by inducing intrinsic rewards for temporally-extended action discovery and reward shaping, and informative state encoding. To obtain the Laplacian representation one needs to compute the eigensystem of the graph Laplacian, which is often approximated through optimization objectives compatible with deep learning approaches. These approximations, however, depend on hyperparameters that are impossible to tune efficiently, converge to arbitrary rotations of the desired eigenvectors, and are unable to accurately recover the corresponding eigenvalues. In this paper we introduce a theoretically sound objective and corresponding optimization algorithm for approximating the Laplacian representation. Our approach naturally recovers both the true eigenvectors and eigenvalues while eliminating the hyperparameter dependence of previous approximations. We provide theoretical guarantees for our method and we show that those results translate empirically into robust learning across multiple environments.
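For intuition, a toy illustration of the quantity the paper's objective approximates: the eigenvectors and eigenvalues of a graph Laplacian, computed here directly with NumPy on a small hand-written adjacency matrix rather than learned with a neural objective.

```python
import numpy as np

# Toy 4-state ring graph; A[i, j] = 1 if states i and j are connected.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))        # degree matrix
L = D - A                         # (unnormalized) graph Laplacian

# The Laplacian representation maps each state to the entries of the
# d smallest eigenvectors; the eigenvalues order the temporal scales.
eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues for symmetric L
d = 2
state_representation = eigvecs[:, :d]  # one row per state
print(eigvals[:d])
print(state_representation)
```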

Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model

  • paper_url: http://arxiv.org/abs/2310.13010
  • repo_url: None
  • paper_authors: Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R. JR Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela Wiepert, David T. Jones, Hugo Botha
  • for: Detecting abnormalities in speech that are reflective of several neurological disorders.
  • methods: Combines a Perceiver-based sequence classifier with a Universal Speech Model (USM) trained (unsupervised) on 12 million hours of diverse audio recordings.
  • results: On a curated Mayo Clinic corpus, the proposed model outperforms standard transformer (80.9%) and Perceiver (81.8%) models, reaching 83.1% average accuracy; with limited task-specific data, pretraining is important, and pretraining on the unrelated automatic speech recognition (ASR) task is also beneficial.
    Abstract We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders. We combine this classifier with a Universal Speech Model (USM) that is trained (unsupervised) on 12 million hours of diverse audio recordings. Our model compresses long sequences into a small set of class-specific latent representations and a factorized projection is used to predict different attributes of the disordered input speech. The benefit of our approach is that it allows us to model different regions of the input for different classes and is at the same time data efficient. We evaluated the proposed model extensively on a curated corpus from the Mayo Clinic. Our model outperforms standard transformer (80.9%) and perceiver (81.8%) models and achieves an average accuracy of 83.1%. With limited task-specific data, we find that pretraining is important and surprisingly pretraining with the unrelated automatic speech recognition (ASR) task is also beneficial. Encodings from the middle layers provide a mix of both acoustic and phonetic information and achieve best prediction results compared to just using the final layer encodings (83.1% vs. 79.6%). The results are promising and with further refinements may help clinicians detect speech abnormalities without needing access to highly specialized speech-language pathologists.

If the Sources Could Talk: Evaluating Large Language Models for Research Assistance in History

  • paper_url: http://arxiv.org/abs/2310.10808
  • repo_url: None
  • paper_authors: Giselle Gonzalez Garcia, Christian Weilbach
  • for: This paper explores how Large Language Models (LLMs) can support conversational inquiry into historical memory (or, in this case, training data), and shows that augmenting LLMs with vector embeddings makes such a methodology accessible to historians and other researchers in the humanities.
  • methods: Uses LLMs for conversational research assistance, augmented with vector embeddings from highly specialized academic sources, and evaluates them on question answering and on the extraction and organization of data.
  • results: LLMs perform well on question answering and on extracting and organizing data, and their semantic retrieval and reasoning abilities can be applied to large textual archives that were not part of their training data; LLMs can therefore be augmented with sources relevant to specific research projects and queried privately by researchers.
    Abstract The recent advent of powerful Large-Language Models (LLM) provides a new conversational form of inquiry into historical memory (or, training data, in this case). We show that by augmenting such LLMs with vector embeddings from highly specialized academic sources, a conversational methodology can be made accessible to historians and other researchers in the Humanities. Concretely, we evaluate and demonstrate how LLMs have the ability to assist researchers while they examine a customized corpus of different types of documents, including, but not exclusive to: (1) primary sources, (2) secondary sources written by experts, and (3) the combination of these two. Compared to established search interfaces for digital catalogues, such as metadata and full-text search, we evaluate the richer conversational style of LLMs on the performance of two main types of tasks: (1) question-answering, and (2) extraction and organization of data. We demonstrate that LLMs' semantic retrieval and reasoning abilities on problem-specific tasks can be applied to large textual archives that have not been part of their training data. Therefore, LLMs can be augmented with sources relevant to specific research projects, and can be queried privately by researchers.
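A hedged sketch of the kind of embedding-augmented workflow described above: corpus passages are embedded, the passages most similar to a researcher's question are retrieved, and an LLM answers from that retrieved context. The helper callables `embed` and `ask_llm` are placeholders, not the authors' tooling.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer_from_corpus(question, passages, embed, ask_llm, k=3):
    """Retrieve the k passages most similar to the question and let an
    LLM answer using only that retrieved context (illustrative only)."""
    q_vec = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(embed(p), q_vec), reverse=True)
    context = "\n\n".join(ranked[:k])
    prompt = (f"Answer the question using only the sources below.\n\n"
              f"Sources:\n{context}\n\nQuestion: {question}")
    return ask_llm(prompt)
```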

Demystifying Poisoning Backdoor Attacks from a Statistical Perspective

  • paper_url: http://arxiv.org/abs/2310.10780
  • repo_url: None
  • paper_authors: Ganghua Wang, Xun Xian, Jayanth Srinivasa, Ashish Kundu, Xuan Bi, Mingyi Hong, Jie Ding
  • for: This work studies the security of machine learning models against poisoning backdoor attacks, specifically what determines the success of attacks that embed a constant trigger.
  • methods: Establishes tight lower and upper bounds on the compromised model's performance on both clean and backdoor test data, complemented by experiments on benchmark datasets with state-of-the-art backdoor attack scenarios.
  • results: The theory identifies the determining factors for a backdoor attack's success, the direction of the most effective attack, and when a human-imperceptible trigger will succeed; the findings apply to both discriminative and generative models.
    Abstract The growing dependence on machine learning in real-world applications emphasizes the importance of understanding and ensuring its safety. Backdoor attacks pose a significant security risk due to their stealthy nature and potentially serious consequences. Such attacks involve embedding triggers within a learning model with the intention of causing malicious behavior when an active trigger is present while maintaining regular functionality without it. This paper evaluates the effectiveness of any backdoor attack incorporating a constant trigger, by establishing tight lower and upper boundaries for the performance of the compromised model on both clean and backdoor test data. The developed theory answers a series of fundamental but previously underexplored problems, including (1) what are the determining factors for a backdoor attack's success, (2) what is the direction of the most effective backdoor attack, and (3) when will a human-imperceptible trigger succeed. Our derived understanding applies to both discriminative and generative models. We also demonstrate the theory by conducting experiments using benchmark datasets and state-of-the-art backdoor attack scenarios.
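To make the threat model concrete, a small illustrative sketch (not from the paper) of data poisoning with a constant trigger: a fixed patch is stamped onto a fraction of the training images and their labels are flipped to the attacker's target class.

```python
import numpy as np

def poison_with_constant_trigger(images, labels, target_label,
                                 poison_rate=0.05, patch_value=1.0, patch_size=3):
    """Return a poisoned copy of (images, labels).

    images: array of shape (n, H, W); labels: array of shape (n,).
    A constant corner patch acts as the trigger; poisoned samples are
    relabeled to `target_label`.
    """
    images, labels = images.copy(), labels.copy()
    n = len(images)
    idx = np.random.choice(n, size=int(poison_rate * n), replace=False)
    images[idx, :patch_size, :patch_size] = patch_value   # the constant trigger
    labels[idx] = target_label
    return images, labels
```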

BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys

  • paper_url: http://arxiv.org/abs/2310.10765
  • repo_url: None
  • paper_authors: Yu Gu, Jianwei Yang, Naoto Usuyama, Chunyuan Li, Sheng Zhang, Matthew P. Lungren, Jianfeng Gao, Hoifung Poon
  • for: This work applies instruction-learning to generate counterfactual biomedical images, which helps differentiate causal structure from spurious correlation and facilitates robust image interpretation for disease progression modeling.
  • methods: GPT-4 processes the imaging reports of two images taken at different time points to produce a natural-language description of disease progression; the resulting triples (prior image, progression description, new image) are used to train a latent diffusion model, with a two-stage curriculum that first pretrains on the much more abundant single image-report pairs.
  • results: On the standard MIMIC-CXR dataset, BiomedJourney generates high-quality counterfactual biomedical images and substantially outperforms prior state-of-the-art methods such as InstructPix2Pix and RoentGen.
    Abstract Rapid progress has been made in instruction-learning for image editing with natural-language instruction, as exemplified by InstructPix2Pix. In biomedicine, such methods can be applied to counterfactual image generation, which helps differentiate causal structure from spurious correlation and facilitate robust image interpretation for disease progression modeling. However, generic image-editing models are ill-suited for the biomedical domain, and counterfactual biomedical image generation is largely underexplored. In this paper, we present BiomedJourney, a novel method for counterfactual biomedical image generation by instruction-learning from multimodal patient journeys. Given a patient with two biomedical images taken at different time points, we use GPT-4 to process the corresponding imaging reports and generate a natural language description of disease progression. The resulting triples (prior image, progression description, new image) are then used to train a latent diffusion model for counterfactual biomedical image generation. Given the relative scarcity of image time series data, we introduce a two-stage curriculum that first pretrains the denoising network using the much more abundant single image-report pairs (with dummy prior image), and then continues training using the counterfactual triples. Experiments using the standard MIMIC-CXR dataset demonstrate the promise of our method. In a comprehensive battery of tests on counterfactual medical image generation, BiomedJourney substantially outperforms prior state-of-the-art methods in instruction image editing and medical image generation such as InstructPix2Pix and RoentGen. To facilitate future study in counterfactual medical generation, we plan to release our instruction-learning code and pretrained models.

Step-by-Step Remediation of Students’ Mathematical Mistakes

  • paper_url: http://arxiv.org/abs/2310.10648
  • repo_url: https://github.com/rosewang2008/remath
  • paper_authors: Rose E. Wang, Qingyang Zhang, Carly Robinson, Susanna Loeb, Dorottya Demszky
  • for: This paper explores whether large language models (LLMs) can help math tutors, particularly novice tutors, effectively remediate student mistakes.
  • methods: Presents ReMath, a benchmark co-developed with experienced math teachers that deconstructs remediation into three steps: (1) infer the type of student error, (2) determine the strategy to address the error, and (3) generate a response that incorporates that information; state-of-the-art instruct-tuned and dialog models are evaluated on it.
  • results: Models consistently improve upon original tutor responses, and providing the error type and strategy improves response quality by 75%, but even the best model's responses still fall short of experienced math teachers, highlighting both the potential and the limitations of current LLMs for providing high-quality learning experiences at scale. Code is open-sourced at https://github.com/rosewang2008/remath.
    Abstract Scaling high-quality tutoring is a major challenge in education. Because of the growing demand, many platforms employ novice tutors who, unlike professional educators, struggle to effectively address student mistakes and thus fail to seize prime learning opportunities for students. In this paper, we explore the potential for large language models (LLMs) to assist math tutors in remediating student mistakes. We present ReMath, a benchmark co-developed with experienced math teachers that deconstructs their thought process for remediation. The benchmark consists of three step-by-step tasks: (1) infer the type of student error, (2) determine the strategy to address the error, and (3) generate a response that incorporates that information. We evaluate the performance of state-of-the-art instruct-tuned and dialog models on ReMath. Our findings suggest that although models consistently improve upon original tutor responses, we cannot rely on models alone to remediate mistakes. Providing models with the error type (e.g., the student is guessing) and strategy (e.g., simplify the problem) leads to a 75% improvement in the response quality over models without that information. Nonetheless, despite the improvement, the quality of the best model's responses still falls short of experienced math teachers. Our work sheds light on the potential and limitations of using current LLMs to provide high-quality learning experiences for both tutors and students at scale. Our work is open-sourced at this link: \url{https://github.com/rosewang2008/remath}.

A Survey on Video Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.10647
  • repo_url: https://github.com/ChenHsing/Awesome-Video-Diffusion-Models
  • paper_authors: Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang
  • for: This paper provides a systematic survey of video diffusion models in the era of AI-generated content (AIGC).
  • methods: Reviews the fundamentals and evolution of diffusion models, then categorizes video-domain research into three key areas: video generation, video editing, and other video understanding tasks.
  • results: Offers a thorough review of the literature in these three areas, discusses the challenges facing the field, and outlines potential future trends; a comprehensive list of the surveyed models is available at https://github.com/ChenHsing/Awesome-Video-Diffusion-Models.
    Abstract The recent wave of AI-generated content (AIGC) has witnessed substantial success in computer vision, with the diffusion model playing a crucial role in this achievement. Due to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers, demonstrating exceptional performance not only in image generation and editing, but also in the realm of video-related research. However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain. To address this gap, this paper presents a comprehensive review of video diffusion models in the AIGC era. Specifically, we begin with a concise introduction to the fundamentals and evolution of diffusion models. Subsequently, we present an overview of research on diffusion models in the video domain, categorizing the work into three key areas: video generation, video editing, and other video understanding tasks. We conduct a thorough review of the literature in these three key areas, including further categorization and practical contributions in the field. Finally, we discuss the challenges faced by research in this domain and outline potential future developmental trends. A comprehensive list of video diffusion models studied in this survey is available at https://github.com/ChenHsing/Awesome-Video-Diffusion-Models.

Interactive Task Planning with Language Models

  • paper_url: http://arxiv.org/abs/2310.10645
  • repo_url: https://github.com/CraftJarvis/MC-Planner
  • paper_authors: Boyi Li, Philipp Wu, Pieter Abbeel, Jitendra Malik
  • for: This paper addresses long-horizon task planning and execution that can easily generalize to new goals or distinct tasks, even during execution.
  • methods: Uses language models for interactive task planning, incorporating both high-level planning and low-level function execution via language.
  • results: The system generates novel high-level instructions for unseen objectives and adapts to different tasks simply by substituting the task guidelines, without additional complex prompt engineering; when the user sends a new request, it replans with precision based on the new request, the task guidelines, and previously executed steps.
    Abstract An interactive robot framework accomplishes long-horizon task planning and can easily generalize to new goals or distinct tasks, even during execution. However, most traditional methods require predefined module design, which makes it hard to generalize to different goals. Recent large language model based approaches can allow for more open-ended planning but often require heavy prompt engineering or domain-specific pretrained models. To tackle this, we propose a simple framework that achieves interactive task planning with language models. Our system incorporates both high-level planning and low-level function execution via language. We verify the robustness of our system in generating novel high-level instructions for unseen objectives and its ease of adaptation to different tasks by merely substituting the task guidelines, without the need for additional complex prompt engineering. Furthermore, when the user sends a new request, our system is able to replan accordingly with precision based on the new request, task guidelines and previously executed steps. Please check more details on our https://wuphilipp.github.io/itp_site and https://youtu.be/TrKLuyv26_g.

In-Context Pretraining: Language Modeling Beyond Document Boundaries

  • paper_url: http://arxiv.org/abs/2310.10638
  • repo_url: None
  • paper_authors: Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis
  • for: This paper aims to improve large language models (LMs) by explicitly encouraging them to read and reason across document boundaries.
  • methods: Proposes In-Context Pretraining, which pretrains LMs on sequences of related documents, using approximate nearest-neighbor search to find related documents and a graph traversal algorithm to construct coherent input contexts.
  • results: In-Context Pretraining improves performance on tasks that require more complex contextual reasoning: in-context learning (+8%), reading comprehension (+15%), faithfulness to previous contexts (+16%), long-context reasoning (+5%), and retrieval augmentation (+9%).
    Abstract Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts but the prior documents provide no signal for predicting the next document. We instead present In-Context Pretraining, a new approach where language models are pretrained on a sequence of related documents, thereby explicitly encouraging them to read and reason across document boundaries. We can do In-Context Pretraining by simply changing the document ordering so that each context contains related documents, and directly applying existing pretraining pipelines. However, this document sorting problem is challenging. There are billions of documents and we would like the sort to maximize contextual similarity for every document without repeating any data. To do this, we introduce approximate algorithms for finding related documents with efficient nearest neighbor search and constructing coherent input contexts with a graph traversal algorithm. Our experiments show In-Context Pretraining offers a simple and scalable approach to significantly enhance LMs' performance: we see notable improvements in tasks that require more complex contextual reasoning, including in-context learning (+8%), reading comprehension (+15%), faithfulness to previous contexts (+16%), long-context reasoning (+5%), and retrieval augmentation (+9%).
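A simplified sketch of the data-construction idea: chain each document to its nearest unused neighbor in embedding space so that every pretraining context holds related documents. The paper uses approximate nearest-neighbor search and a graph traversal algorithm at billion-document scale; this greedy version and the `embed` callable are only illustrative.

```python
import numpy as np

def build_related_contexts(docs, embed, docs_per_context=4):
    """Greedily chain each unused document to its nearest unused neighbor,
    producing contexts made of related documents (illustrative only)."""
    vecs = np.stack([embed(d) for d in docs])
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    unused = set(range(len(docs)))
    contexts = []
    while unused:
        current = unused.pop()                     # seed a new context
        context = [current]
        while unused and len(context) < docs_per_context:
            rest = list(unused)
            sims = vecs[rest] @ vecs[current]      # cosine similarity to current doc
            current = rest[int(np.argmax(sims))]
            unused.remove(current)
            context.append(current)
        contexts.append([docs[i] for i in context])
    return contexts
```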

Towards Scenario-based Safety Validation for Autonomous Trains with Deep Generative Models

  • paper_url: http://arxiv.org/abs/2310.10635
  • repo_url: None
  • paper_authors: Thomas Decker, Ananta R. Bhattarai, Michael Lebacher
  • for: This paper examines how to appropriately validate the reliability of autonomous train systems.
  • methods: Uses data simulation with deep generative models to semantically edit railway scenes, applied to a camera-based rail-scene segmentation system designed to support autonomous train operation.
  • results: The approach makes a limited amount of test data more representative and helps analyze the degree to which the system complies with typical Operational Design Domain (ODD) requirements, in particular proper operation under different lighting and weather conditions and while transitioning between them.
    Abstract Modern AI techniques open up ever-increasing possibilities for autonomous vehicles, but how to appropriately verify the reliability of such systems remains unclear. A common approach is to conduct safety validation based on a predefined Operational Design Domain (ODD) describing specific conditions under which a system under test is required to operate properly. However, collecting sufficient realistic test cases to ensure comprehensive ODD coverage is challenging. In this paper, we report our practical experiences regarding the utility of data simulation with deep generative models for scenario-based ODD validation. We consider the specific use case of a camera-based rail-scene segmentation system designed to support autonomous train operation. We demonstrate the capabilities of semantically editing railway scenes with deep generative models to make a limited amount of test data more representative. We also show how our approach helps to analyze the degree to which a system complies with typical ODD requirements. Specifically, we focus on evaluating proper operation under different lighting and weather conditions as well as while transitioning between them.

OpenAgents: An Open Platform for Language Agents in the Wild

  • paper_url: http://arxiv.org/abs/2310.10634
  • repo_url: https://github.com/xlang-ai/openagents
  • paper_authors: Tianbao Xie, Fan Zhou, Zhoujun Cheng, Peng Shi, Luoxuan Weng, Yitao Liu, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu
  • for: This work provides an open platform for using and hosting language agents in everyday life.
  • methods: Builds three agents: a Data Agent for data analysis with Python/SQL and data tools, a Plugins Agent with 200+ daily API tools, and a Web Agent for autonomous web browsing.
  • results: OpenAgents lets general users interact with agent functionalities through a web interface optimized for swift responses and common failures, while offering developers and researchers a seamless local deployment experience, providing a foundation for crafting innovative language agents and real-world evaluations.
    Abstract Language agents show potential in being capable of utilizing natural language for varied and intricate tasks in diverse environments, particularly when built upon large language models (LLMs). Current language agent frameworks aim to facilitate the construction of proof-of-concept language agents while neglecting the non-expert user access to agents and paying little attention to application-level designs. We present OpenAgents, an open platform for using and hosting language agents in the wild of everyday life. OpenAgents includes three agents: (1) Data Agent for data analysis with Python/SQL and data tools; (2) Plugins Agent with 200+ daily API tools; (3) Web Agent for autonomous web browsing. OpenAgents enables general users to interact with agent functionalities through a web user interface optimized for swift responses and common failures while offering developers and researchers a seamless deployment experience on local setups, providing a foundation for crafting innovative language agents and facilitating real-world evaluations. We elucidate the challenges and opportunities, aspiring to set a foundation for future research and development of real-world language agents.

BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology

  • paper_url: http://arxiv.org/abs/2310.10632
  • repo_url: https://github.com/bioplanner/bioplanner
  • paper_authors: Odhran O’Donoghue, Aleksandar Shtedritski, John Ginger, Ralph Abboud, Ali Essa Ghareeb, Justin Booth, Samuel G Rodriques
  • for: This work targets the ability to automatically generate accurate protocols for scientific experiments, a major step toward the automation of science.
  • methods: Uses large language models (LLMs) to plan experimental protocols and introduces an automatic evaluation framework based on pseudocode representations, together with BioProt, a dataset of biology protocols with corresponding pseudocode.
  • results: GPT-3 and GPT-4 are evaluated on reconstructing pseudocode from a high-level description and a list of admissible functions; the utility of pseudocode representations is validated externally by generating accurate novel protocols from retrieved pseudocode, and a generated protocol is run successfully in a biological laboratory.
    Abstract The ability to automatically generate accurate protocols for scientific experiments would represent a major step towards the automation of science. Large Language Models (LLMs) have impressive capabilities on a wide range of tasks, such as question answering and the generation of coherent text and code. However, LLMs can struggle with multi-step problems and long-term planning, which are crucial for designing scientific experiments. Moreover, evaluation of the accuracy of scientific protocols is challenging, because experiments can be described correctly in many different ways, require expert knowledge to evaluate, and cannot usually be executed automatically. Here we present an automatic evaluation framework for the task of planning experimental protocols, and we introduce BioProt: a dataset of biology protocols with corresponding pseudocode representations. To measure performance on generating scientific protocols, we use an LLM to convert a natural language protocol into pseudocode, and then evaluate an LLM's ability to reconstruct the pseudocode from a high-level description and a list of admissible pseudocode functions. We evaluate GPT-3 and GPT-4 on this task and explore their robustness. We externally validate the utility of pseudocode representations of text by generating accurate novel protocols using retrieved pseudocode, and we run a generated protocol successfully in our biological laboratory. Our framework is extensible to the evaluation and improvement of language model planning abilities in other areas of science or other areas that lack automatic evaluation.
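A hedged sketch of the evaluation loop the abstract describes: an LLM first converts a ground-truth protocol into pseudocode, then must reconstruct a plan from only a high-level description and the list of admissible functions, and the two are compared. The prompts, `ask_llm`, and `score` are placeholders rather than the released BioPlanner code.

```python
def evaluate_protocol_planning(protocol_text, high_level_description, ask_llm, score):
    """Illustrative two-step evaluation of protocol planning with an LLM."""
    # Step 1: convert the ground-truth natural-language protocol to pseudocode.
    reference = ask_llm(
        "Convert this lab protocol into pseudocode using one function call "
        f"per step:\n{protocol_text}")

    # Step 2: extract the admissible functions and ask the model to plan
    # the protocol from the high-level description alone.
    functions = sorted({line.split("(")[0].strip()
                        for line in reference.splitlines() if "(" in line})
    prediction = ask_llm(
        f"Goal: {high_level_description}\n"
        f"Allowed functions: {', '.join(functions)}\n"
        "Write pseudocode that accomplishes the goal using only these functions.")

    # Step 3: compare predicted pseudocode against the reference.
    return score(prediction, reference)
```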

Llemma: An Open Language Model For Mathematics

  • paper_url: http://arxiv.org/abs/2310.10631
  • repo_url: https://github.com/EleutherAI/math-lm
  • paper_authors: Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck
  • for: This paper presents Llemma, a large language model for mathematics.
  • methods: Continues pretraining Code Llama on Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code.
  • results: On the MATH benchmark, Llemma outperforms all known open base models as well as the unreleased Minerva model suite on an equi-parameter basis, and is capable of tool use and formal theorem proving without further finetuning; all artifacts, including 7 billion and 34 billion parameter models, Proof-Pile-2, and replication code, are openly released.
    Abstract We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.

Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers

  • paper_url: http://arxiv.org/abs/2310.10627
  • repo_url: https://github.com/elicit/fave-dataset
  • paper_authors: Charlie George, Andreas Stuhlmüller
  • for: This work measures how often language models hallucinate when summarizing academic papers and detects such hallucinations with Factored Verification.
  • methods: Uses Factored Verification, a simple automated method for detecting hallucinations in abstractive summaries, which sets a new state of the art (76.2% accuracy) on the summarization task of the HaluEval benchmark.
  • results: The average summary contains 0.62 hallucinations for ChatGPT (16k), 0.84 for GPT-4, and 1.55 for Claude 2; asking models to self-correct with Factored Critiques lowers these to 0.49, 0.46, and 0.95 respectively. The hallucinations found are often subtle, so caution is advised when using models to synthesize academic papers.
    Abstract Hallucination plagues even frontier LLMs--but how bad is it really for summarizing academic papers? We evaluate Factored Verification, a simple automated method for detecting hallucinations in abstractive summaries. This method sets a new SotA on hallucination detection in the summarization task of the HaluEval benchmark, achieving 76.2% accuracy. We then use this method to estimate how often language models hallucinate when summarizing across multiple academic papers and find 0.62 hallucinations in the average ChatGPT (16k) summary, 0.84 for GPT-4, and 1.55 for Claude 2. We ask models to self-correct using Factored Critiques and find that this lowers the number of hallucinations to 0.49 for ChatGPT, 0.46 for GPT-4, and 0.95 for Claude 2. The hallucinations we find are often subtle, so we advise caution when using models to synthesize academic papers.
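A minimal sketch of the factored-verification idea, assuming a claim-splitting prompt and a yes/no support check; the exact prompts and aggregation used in the paper may differ, and `ask_llm` is a placeholder.

```python
def factored_verification(summary, source_text, ask_llm):
    """Count claims in `summary` that are not supported by `source_text`."""
    claims = ask_llm(
        f"List each factual claim in the following summary on its own line:\n{summary}"
    ).splitlines()

    hallucinations = []
    for claim in filter(None, (c.strip() for c in claims)):
        verdict = ask_llm(
            f"Source:\n{source_text}\n\nClaim: {claim}\n"
            "Is the claim supported by the source? Answer yes or no.")
        if verdict.strip().lower().startswith("no"):
            hallucinations.append(claim)   # treat unsupported claims as hallucinations
    return hallucinations
```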

Video Language Planning

  • paper_url: http://arxiv.org/abs/2310.10625
  • repo_url: https://github.com/abusufyanvu/6S191_MIT_DeepLearning
  • paper_authors: Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson
  • for: Enabling visual planning for complex long-horizon tasks by leveraging recent advances in large generative models pretrained on Internet-scale data.
  • methods: Video language planning (VLP), a tree search procedure in which vision-language models serve as both policies and value functions and text-to-video models serve as dynamics models.
  • results: VLP improves with increasing computation budget, synthesizes long-horizon video plans across robotics domains from multi-object rearrangement to multi-camera bi-arm dexterous manipulation, and the generated video plans can be translated into real robot actions via goal-conditioned policies; experiments show substantially improved long-horizon task success rates over prior methods on both simulated and real robots.
    Abstract We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data. To this end, we present video language planning (VLP), an algorithm that consists of a tree search procedure, where we train (i) vision-language models to serve as both policies and value functions, and (ii) text-to-video models as dynamics models. VLP takes as input a long-horizon task instruction and current image observation, and outputs a long video plan that provides detailed multimodal (video and language) specifications that describe how to complete the final task. VLP scales with increasing computation budget where more computation time results in improved video plans, and is able to synthesize long-horizon video plans across different robotics domains: from multi-object rearrangement, to multi-camera bi-arm dexterous manipulation. Generated video plans can be translated into real robot actions via goal-conditioned policies, conditioned on each intermediate frame of the generated video. Experiments show that VLP substantially improves long-horizon task success rates compared to prior methods on both simulated and real robots (across 3 hardware platforms).
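A compact sketch of the planning loop: a vision-language model proposes and scores candidate sub-instructions, a text-to-video model rolls each candidate forward as a synthesized video segment, and the search keeps the highest-value branches. The beam-style search and all callables (`propose_steps`, `rollout_video`, `score`) are illustrative stand-ins for the trained models, not the authors' implementation.

```python
def video_language_plan(task, image, propose_steps, rollout_video, score,
                        horizon=3, beam=2):
    """Beam-style search over (video, language) plans (illustrative only).

    `rollout_video(frame, step)` is assumed to return a list of frames.
    Each beam entry: (value, language_steps, video_segments, latest_frame).
    """
    beams = [(0.0, [], [], image)]
    for _ in range(horizon):
        candidates = []
        for value, steps, videos, frame in beams:
            for step in propose_steps(task, frame):        # VLM as policy
                video = rollout_video(frame, step)          # text-to-video as dynamics
                new_value = value + score(task, video)      # VLM as value function
                candidates.append((new_value, steps + [step],
                                   videos + [video], video[-1]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam]                           # keep best branches
    best = beams[0]
    return best[1], best[2]   # language plan and the corresponding video plan
```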

Generating Summaries with Controllable Readability Levels

  • paper_url: http://arxiv.org/abs/2310.10623
  • repo_url: None
  • paper_authors: Leonardo F. R. Ribeiro, Mohit Bansal, Markus Dreyer
  • for: This paper aims to control the readability level of generated summaries so that knowledge can be consumed by diverse audiences.
  • methods: Develops three text generation techniques for controlling readability: (1) instruction-based readability control, (2) reinforcement learning to minimize the gap between requested and observed readability, and (3) a decoding approach that uses lookahead to estimate the readability of upcoming decoding steps.
  • results: Experiments on news summarization (CNN/DM dataset) show that these techniques significantly improve readability control, as measured by various readability metrics and human judgement, establishing strong baselines for controllable readability in summarization.
    Abstract Readability refers to how easily a reader can understand a written text. Several factors affect the readability level, such as the complexity of the text, its subject matter, and the reader's background knowledge. Generating summaries based on different readability levels is critical for enabling knowledge consumption by diverse audiences. However, current text generation approaches lack refined control, resulting in texts that are not customized to readers' proficiency levels. In this work, we bridge this gap and study techniques to generate summaries at specified readability levels. Unlike previous methods that focus on a specific readability level (e.g., lay summarization), we generate summaries with fine-grained control over their readability. We develop three text generation techniques for controlling readability: (1) instruction-based readability control, (2) reinforcement learning to minimize the gap between requested and observed readability and (3) a decoding approach that uses lookahead to estimate the readability of upcoming decoding steps. We show that our generation methods significantly improve readability control on news summarization (CNN/DM dataset), as measured by various readability metrics and human judgement, establishing strong baselines for controllable readability in summarization.
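A simplified sketch of the lookahead decoding idea (technique 3 above): candidate continuations are rolled out a few tokens ahead and the one whose estimated readability is closest to the requested level is kept. `generate_continuations` and `readability` (e.g., a grade-level estimator) are placeholders.

```python
def readability_lookahead_step(prefix, target_grade, generate_continuations,
                               readability, lookahead_tokens=16):
    """Pick the candidate continuation whose lookahead readability is
    closest to the requested grade level (illustrative only)."""
    best_candidate, best_gap = None, float("inf")
    for candidate in generate_continuations(prefix, num_candidates=5,
                                            max_new_tokens=lookahead_tokens):
        # Score the prefix plus the short lookahead, not just the next token.
        gap = abs(readability(prefix + candidate) - target_grade)
        if gap < best_gap:
            best_candidate, best_gap = candidate, gap
    return best_candidate
```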

Quantifying Assistive Robustness Via the Natural-Adversarial Frontier

  • paper_url: http://arxiv.org/abs/2310.10610
  • repo_url: None
  • paper_authors: Jerry Zhi-Yang He, Zackory Erickson, Daniel S. Brown, Anca D. Dragan
  • for: The paper aims to build robust policies for robots that assist people; the challenge is that people can behave unexpectedly and interact with the robot outside of its training distribution, leading to failures.
  • methods: The paper proposes RIGID, a method that constructs the entire natural-adversarial frontier by training adversarial human policies that trade off between minimizing robot reward and acting human-like.
  • results: The paper uses RIGID to analyze the performance of standard collaborative Reinforcement Learning and of existing methods meant to increase robustness, and compares the frontier identified by RIGID with failures identified in expert adversarial interaction and with naturally-occurring failures during user interaction. The results show that RIGID provides a meaningful measure of robustness predictive of deployment performance and uncovers failure cases in human-robot interaction that are difficult to find manually.
    Abstract Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, potentially interacting with the robot outside its training distribution and leading to failures. Even just measuring robustness is a challenge. Adversarial perturbations are the default, but they can paint the wrong picture: they can correspond to human motions that are unlikely to occur during natural interactions with people. A robot policy might fail under small adversarial perturbations but work under large natural perturbations. We propose that capturing robustness in these interactive settings requires constructing and analyzing the entire natural-adversarial frontier: the Pareto-frontier of human policies that are the best trade-offs between naturalness and low robot performance. We introduce RIGID, a method for constructing this frontier by training adversarial human policies that trade off between minimizing robot reward and acting human-like (as measured by a discriminator). On an Assistive Gym task, we use RIGID to analyze the performance of standard collaborative Reinforcement Learning, as well as the performance of existing methods meant to increase robustness. We also compare the frontier RIGID identifies with the failures identified in expert adversarial interaction, and with naturally-occurring failures during user interaction. Overall, we find evidence that RIGID can provide a meaningful measure of robustness predictive of deployment performance, and uncover failure cases in human-robot interaction that are difficult to find manually. https://ood-human.github.io.

Exploring the Power of Graph Neural Networks in Solving Linear Optimization Problems

  • paper_url: http://arxiv.org/abs/2310.10603
  • repo_url: https://github.com/chendiqian/IPM_MPNN
  • paper_authors: Chendi Qian, Didier Chételat, Christopher Morris
  • for: This paper seeks to explain why message-passing graph neural networks (MPNNs) are effective at enhancing exact optimization algorithms.
  • methods: Studies MPNNs that imitate computationally intensive heuristics such as strong branching for mixed-integer optimization, which entails solving multiple linear optimization problems (LPs).
  • results: The paper shows that MPNNs can simulate standard interior-point methods for LPs, explaining their practical success, and can serve as a lightweight proxy adapted to a given problem-instance distribution; empirically, MPNNs solve LP relaxations of standard combinatorial optimization problems close to optimality, often surpassing conventional solvers and competing approaches in solving time.
    Abstract Recently, machine learning, particularly message-passing graph neural networks (MPNNs), has gained traction in enhancing exact optimization algorithms. For example, MPNNs speed up solving mixed-integer optimization problems by imitating computational intensive heuristics like strong branching, which entails solving multiple linear optimization problems (LPs). Despite the empirical success, the reasons behind MPNNs' effectiveness in emulating linear optimization remain largely unclear. Here, we show that MPNNs can simulate standard interior-point methods for LPs, explaining their practical success. Furthermore, we highlight how MPNNs can serve as a lightweight proxy for solving LPs, adapting to a given problem instance distribution. Empirically, we show that MPNNs solve LP relaxations of standard combinatorial optimization problems close to optimality, often surpassing conventional solvers and competing approaches in solving time.
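To ground the setup, a small sketch of how an LP is commonly presented to an MPNN: variables and constraints form the two sides of a bipartite graph whose edges carry the nonzero coefficients of the constraint matrix, and features are exchanged between the two sides. This generic message-passing step is an illustration under those assumptions, not the authors' architecture.

```python
import numpy as np

def lp_message_passing_step(A, var_feat, con_feat, W_vc, W_cv):
    """One round of message passing on the LP bipartite graph.

    A:        constraint matrix of shape (m, n); nonzeros define the edges
    var_feat: variable-node features, shape (n, d)
    con_feat: constraint-node features, shape (m, d)
    W_vc, W_cv: learnable weight matrices of shape (d, d)
    """
    # Constraints aggregate messages from their incident variables...
    con_feat = np.tanh(con_feat + (A @ var_feat) @ W_vc)
    # ...then variables aggregate messages back from their constraints.
    var_feat = np.tanh(var_feat + (A.T @ con_feat) @ W_cv)
    return var_feat, con_feat
```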

Physics-informed neural wavefields with Gabor basis functions

  • paper_url: http://arxiv.org/abs/2310.10602
  • repo_url: None
  • paper_authors: Tariq Alkhalifah, Xinquan Huang
  • for: This paper aims to enhance the efficiency and accuracy of neural network wavefield solutions by modeling them as linear combinations of Gabor basis functions that satisfy the wave equation.
  • methods: The proposed approach uses a fully connected neural network with an adaptable Gabor layer as the final hidden layer, employing a weighted summation of Gabor neurons to compute predictions; the weights/coefficients of the Gabor functions are learned from previous hidden layers with nonlinear activation functions.
  • results: Realistic assessments showcase the efficacy of this novel implementation compared to the vanilla PINN, particularly in scenarios involving high frequencies and realistic models that are often challenging for PINNs.
    Abstract Recently, Physics-Informed Neural Networks (PINNs) have gained significant attention for their versatile interpolation capabilities in solving partial differential equations (PDEs). Despite their potential, the training can be computationally demanding, especially for intricate functions like wavefields. This is primarily due to the neural-based (learned) basis functions, biased toward low frequencies, as they are dominated by polynomial calculations, which are not inherently wavefield-friendly. In response, we propose an approach to enhance the efficiency and accuracy of neural network wavefield solutions by modeling them as linear combinations of Gabor basis functions that satisfy the wave equation. Specifically, for the Helmholtz equation, we augment the fully connected neural network model with an adaptable Gabor layer constituting the final hidden layer, employing a weighted summation of these Gabor neurons to compute the predictions (output). These weights/coefficients of the Gabor functions are learned from the previous hidden layers that include nonlinear activation functions. To ensure the Gabor layer's utilization across the model space, we incorporate a smaller auxiliary network to forecast the center of each Gabor function based on input coordinates. Realistic assessments showcase the efficacy of this novel implementation compared to the vanilla PINN, particularly in scenarios involving high-frequencies and realistic models that are often challenging for PINNs.
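A hedged sketch of the Gabor layer described above: the wavefield prediction is a weighted summation of Gabor basis functions (a Gaussian envelope times an oscillation) whose coefficients come from earlier hidden layers. The 1-D parameterization and fixed widths below are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def gabor_layer(x, centers, widths, wavenumbers, coefficients):
    """Evaluate a wavefield as a weighted sum of 1-D Gabor basis functions.

    x:            spatial coordinates, shape (n,)
    centers:      Gabor centers (in the paper, predicted by an auxiliary net), shape (g,)
    widths:       Gaussian envelope widths, shape (g,)
    wavenumbers:  oscillation wavenumbers, shape (g,)
    coefficients: weights learned by the preceding hidden layers, shape (g,)
    """
    dx = x[:, None] - centers[None, :]                            # (n, g)
    envelope = np.exp(-(dx ** 2) / (2.0 * widths[None, :] ** 2))  # Gaussian window
    oscillation = np.cos(wavenumbers[None, :] * dx)               # plane-wave part
    basis = envelope * oscillation                                # Gabor atoms
    return basis @ coefficients                                   # weighted summation
```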

Automated Natural Language Explanation of Deep Visual Neurons with Large Models

  • paper_url: http://arxiv.org/abs/2310.10708
  • repo_url: None
  • paper_authors: Chenxu Zhao, Wei Qian, Yucheng Shi, Mengdi Huai, Ninghao Liu
  • for: This work aims to explain the semantics of neurons in deep vision networks, improving the interpretability of neural networks.
  • methods: Proposes a post-hoc framework based on large foundation models that automatically generates semantic explanations of neurons, without requiring human intervention or prior knowledge.
  • results: Qualitative and quantitative experiments verify that the method effectively identifies neuron semantics and is compatible with various model architectures and datasets.
    Abstract Deep neural networks have exhibited remarkable performance across a wide range of real-world tasks. However, comprehending the underlying reasons for their effectiveness remains a challenging problem. Interpreting deep neural networks through examining neurons offers distinct advantages when it comes to exploring the inner workings of neural networks. Previous research has indicated that specific neurons within deep vision networks possess semantic meaning and play pivotal roles in model performance. Nonetheless, the current methods for generating neuron semantics heavily rely on human intervention, which hampers their scalability and applicability. To address this limitation, this paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models, without requiring human intervention or prior knowledge. Our framework is designed to be compatible with various model architectures and datasets, facilitating automated and scalable neuron interpretation. Experiments are conducted with both qualitative and quantitative analysis to verify the effectiveness of our proposed approach.

Towards the Imagenets of ML4EDA

  • paper_url: http://arxiv.org/abs/2310.10560
  • repo_url: None
  • paper_authors: Animesh Basak Chowdhury, Shailja Thakur, Hammond Pearce, Ramesh Karri, Siddharth Garg
  • for: This paper addresses the lack of standard datasets and prototypical learning tasks for ML-guided EDA tools spanning RTL to GDSII.
  • methods: The authors describe their experience curating two large-scale, high-quality datasets: VeriGen, a Verilog code dataset collected from GitHub and Verilog textbooks, and OpenABC-D, a labeled dataset of 870,000 And-Inverter Graphs (AIGs) produced from 1500 synthesis runs on open-source hardware projects.
  • results: The paper discusses challenges in curating, maintaining, and growing these datasets, questions of dataset quality and security, and the use of novel data augmentation tools tailored to the hardware domain.
    Abstract Despite the growing interest in ML-guided EDA tools from RTL to GDSII, there are no standard datasets or prototypical learning tasks defined for the EDA problem domain. Experience from the computer vision community suggests that such datasets are crucial to spur further progress in ML for EDA. Here we describe our experience curating two large-scale, high-quality datasets for Verilog code generation and logic synthesis. The first, VeriGen, is a dataset of Verilog code collected from GitHub and Verilog textbooks. The second, OpenABC-D, is a large-scale, labeled dataset designed to aid ML for logic synthesis tasks. The dataset consists of 870,000 And-Inverter-Graphs (AIGs) produced from 1500 synthesis runs on a large number of open-source hardware projects. In this paper we will discuss challenges in curating, maintaining and growing the size and scale of these datasets. We will also touch upon questions of dataset quality and security, and the use of novel data augmentation tools that are tailored for the hardware domain.

Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning

  • paper_url: http://arxiv.org/abs/2310.10707
  • repo_url: None
  • paper_authors: Anirudh Som, Karan Sikka, Helen Gent, Ajay Divakaran, Andreas Kathol, Dimitra Vergyri
  • for: This study aims to help practitioners develop usable paraphrasers for offensive content by exploring In-Context Learning (ICL) with large language models (LLMs), using a limited number of input-label demonstration pairs to guide the model toward the desired output for a specific query.
  • methods: Examines key factors such as the number and order of demonstrations, exclusion of the prompt instruction, and reduction in measured toxicity; principled evaluation is performed on three datasets, including the proposed Context-Aware Polite Paraphrase dataset of dialogue-style rude utterances, polite paraphrases, and additional dialogue context.
  • results: ICL is comparable to supervised methods in generation quality, is qualitatively better by 25% on human evaluation, and attains 76% lower toxicity; ICL-based paraphrasers show only a slight reduction in performance even with just 10% of the training data.
    Abstract Paraphrasing of offensive content is a better alternative to content removal and helps improve civility in a communication environment. Supervised paraphrasers; however, rely heavily on large quantities of labelled data to help preserve meaning and intent. They also retain a large portion of the offensiveness of the original content, which raises questions on their overall usability. In this paper we aim to assist practitioners in developing usable paraphrasers by exploring In-Context Learning (ICL) with large language models (LLMs), i.e., using a limited number of input-label demonstration pairs to guide the model in generating desired outputs for specific queries. Our study focuses on key factors such as -- number and order of demonstrations, exclusion of prompt instruction, and reduction in measured toxicity. We perform principled evaluation on three datasets, including our proposed Context-Aware Polite Paraphrase dataset, comprising of dialogue-style rude utterances, polite paraphrases, and additional dialogue context. We evaluate our approach using two closed source and one open source LLM. Our results reveal that ICL is comparable to supervised methods in generation quality, while being qualitatively better by 25% on human evaluation and attaining lower toxicity by 76%. Also, ICL-based paraphrasers only show a slight reduction in performance even with just 10% training data.
    摘要 对攻击性内容进行改写是比直接删除内容更好的替代方案,有助于提升交流环境中的文明程度。然而,有监督的改写模型高度依赖大量标注数据来保持语义和意图,而且往往保留了原始内容中相当一部分的攻击性,这使其整体可用性受到质疑。在这篇论文中,我们希望通过探索大语言模型(LLM)的上下文学习(ICL)来帮助实践者开发可用的改写器,即使用少量输入-标签示例对来引导模型为特定查询生成期望的输出。我们的研究关注若干关键因素,如示例的数量和顺序、排除提示指令,以及降低测量到的毒性。我们在三个数据集上进行了有原则的评估,其中包括我们提出的上下文感知礼貌改写数据集,该数据集包含对话式的粗鲁言语、礼貌改写以及附加的对话上下文。我们使用两个闭源LLM和一个开源LLM进行评估。结果表明,ICL在生成质量上与有监督方法相当,在人工评估中高出25%,测量到的毒性降低76%。此外,即使只使用10%的训练数据,基于ICL的改写器性能也只有轻微下降。
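To make the ICL setup above concrete, here is a minimal sketch of how k-shot demonstrations for polite paraphrasing can be assembled into a prompt. The demonstration pairs, the instruction line, and the function name are illustrative placeholders rather than the paper's actual data or code; the paper additionally studies demonstration count/order and dropping the instruction, which the `with_instruction` flag hints at.

```python
# Hypothetical demonstrations; in the paper these come from an annotated corpus
# of (rude utterance, polite paraphrase) pairs, optionally with dialogue context.
demos = [
    ("That idea is completely idiotic.", "I don't think that idea will work for us."),
    ("Stop wasting my time with this garbage.", "I'd prefer we focus on something more useful."),
]

def build_icl_prompt(demos, query, k=2, with_instruction=True):
    """Assemble a k-shot in-context-learning prompt for polite paraphrasing."""
    lines = []
    if with_instruction:  # the paper also examines excluding the prompt instruction
        lines.append("Rewrite the message politely while keeping its meaning.\n")
    for rude, polite in demos[:k]:
        lines.append(f"Message: {rude}\nPolite rewrite: {polite}\n")
    lines.append(f"Message: {query}\nPolite rewrite:")
    return "\n".join(lines)

prompt = build_icl_prompt(demos, "This report is useless, redo it.")
print(prompt)  # send this string to the LLM of your choice and read back the completion
```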

Deep learning applied to EEG data with different montages using spatial attention

  • paper_url: http://arxiv.org/abs/2310.10550
  • repo_url: https://github.com/sccn/deep-channel-harmonization
  • paper_authors: Dung Truong, Muhammad Abdullah Khalid, Arnaud Delorme
  • for: 本研究旨在使用深度学习处理和提取复杂脑动态信息的EEG raw数据中的信息。
  • methods: 本研究对EEG电极坐标应用空间注意力(spatial attention)进行通道协调,使得可以使用不同的通道布局(montage)训练深度学习模型。
  • results: 研究表明,空间注意力可以提高模型性能;并且,使用不同通道布局训练的深度学习模型在性别分类任务中的表现显著优于使用固定23通道和128通道布局训练的模型。
    Abstract The ability of Deep Learning to process and extract relevant information in complex brain dynamics from raw EEG data has been demonstrated in various recent works. Deep learning models, however, have also been shown to perform best on large corpora of data. When processing EEG, a natural approach is to combine EEG datasets from different experiments to train large deep-learning models. However, most EEG experiments use custom channel montages, requiring the data to be transformed into a common space. Previous methods have used the raw EEG signal to extract features of interest and focused on using a common feature space across EEG datasets. While this is a sensible approach, it underexploits the potential richness of EEG raw data. Here, we explore using spatial attention applied to EEG electrode coordinates to perform channel harmonization of raw EEG data, allowing us to train deep learning on EEG data using different montages. We test this model on a gender classification task. We first show that spatial attention increases model performance. Then, we show that a deep learning model trained on data using different channel montages performs significantly better than deep learning models trained on fixed 23- and 128-channel data montages.
    摘要 深度学习可以从Raw EEG数据中提取和处理复杂脑动态信息的能力已经在各种最近的研究中得到证明。然而,深度学习模型也被证明可以在大量数据上表现最佳。在处理 EEG 数据时,自然的方法是将 EEG 数据集合在一起训练大型深度学习模型。然而,大多数 EEG 实验使用自定义通道 montage,需要数据进行变换以达到共同空间。先前的方法使用了 Raw EEG 信号提取关键特征,并将着眼于在 EEG 数据集中共同的特征空间。虽然这是一种合理的方法,但是它忽略了 EEG 原始数据的潜在强大性。在这里,我们探索使用 EEG 电极坐标的空间注意力进行通道协调的 Raw EEG 数据,以便在不同的 montage 上训练深度学习模型。我们在性别分类任务上测试了这种模型,首先显示了空间注意力可以提高模型性能。然后,我们显示了使用不同的 montage 训练深度学习模型可以在性别分类任务中获得显著更好的性能,与固定的 23-和 128-通道数据 montage 相比。
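As a rough illustration of the channel-harmonization idea, the sketch below uses attention over electrode coordinates to map an arbitrary montage onto a fixed set of virtual channels. This is an assumption-laden toy module (layer sizes, the learned virtual queries, and the coordinate embedding are invented here), not the authors' architecture; their repository contains the real implementation.

```python
import torch
import torch.nn as nn

class ChannelHarmonizer(nn.Module):
    """Maps raw EEG from an arbitrary montage to a fixed set of 'virtual' channels
    using attention over electrode coordinates (a sketch, not the paper's model)."""
    def __init__(self, n_virtual=32, d_model=64):
        super().__init__()
        self.virtual_queries = nn.Parameter(torch.randn(n_virtual, d_model))
        self.coord_embed = nn.Linear(3, d_model)  # x, y, z electrode positions

    def forward(self, eeg, coords):
        # eeg: (batch, n_channels, n_samples); coords: (n_channels, 3)
        keys = self.coord_embed(coords)                                   # (n_channels, d_model)
        attn = torch.softmax(self.virtual_queries @ keys.T / keys.shape[-1] ** 0.5, dim=-1)
        return torch.einsum("vc,bct->bvt", attn, eeg)                     # (batch, n_virtual, n_samples)

x = torch.randn(4, 23, 256)    # 4 recordings from a 23-channel montage
pos = torch.rand(23, 3)        # normalized electrode coordinates
print(ChannelHarmonizer()(x, pos).shape)  # torch.Size([4, 32, 256])
```

The same module could ingest a 128-channel recording simply by passing 128 coordinates, which is the property that lets one network train across montages.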

Use of probabilistic phrases in a coordination game: human versus GPT-4

  • paper_url: http://arxiv.org/abs/2310.10544
  • repo_url: None
  • paper_authors: Laurence T Maloney, Maria F Dal Martello, Vivian Fei, Valerie Ma
  • for: 这个论文的目的是测试人类和大语言模型GPT4在 probabilistic phrases 上的能力。
  • methods: 这个论文使用了人类和GPT4在两个不同的上下文中 estimates probabilistic phrases 的能力。
  • results: 研究发现人类和GPT4在 probabilistic phrases 上的 estimations 在大多数情况下相互吻合,但人类和GPT4在 ambiguity 上的 estimations 不够一致。 GPT4 重复测试结果表明,它的 estimations 不够稳定。
    Abstract English speakers use probabilistic phrases such as likely to communicate information about the probability or likelihood of events. Communication is successful to the extent that the listener grasps what the speaker means to convey and, if communication is successful, two individuals can potentially coordinate their actions based on shared knowledge about uncertainty. We first assessed human ability to estimate the probability and the ambiguity (imprecision) of 23 probabilistic phrases in two different contexts, investment advice and medical advice. We then had GPT4 (OpenAI), a recent Large Language Model, complete the same tasks as the human participants. We found that the median human participant and GPT4 assigned probability estimates that were in good agreement (proportions of variance accounted were close to .90). GPT4's estimates of probability both in the investment and Medical contexts were as close or closer to that of the human participants as the human participants were to one another. Estimates of probability for both the human participants and GPT4 were little affected by context. In contrast, human and GPT4 estimates of ambiguity were not in as good agreement. We repeated some of the GPT4 estimates to assess their stability: does GPT4, if run twice, produce the same or similar estimates? There is some indication that it does not.
    摘要 英语speaker们使用可能性短语来传达事件的可能性或可信度。如果通信成功,两个人可以基于共享不确定性知识协调行动。我们首先评估了人类对23个可能性短语的可能性和不确定性(精度)的能力。然后,我们使用GPT4(OpenAI),一个最近的大语言模型,完成了同样的任务。我们发现 median人参与者和GPT4的可能性估计相差不大(相对变异度占比接近0.90)。GPT4在投资和医疗上的可能性估计与人参与者的估计相似或更相似。人参与者和GPT4对可能性的估计几乎不受 context 的影响。然而,人参与者和GPT4对不确定性的估计不太一致。我们重复了一些GPT4的估计以评估其稳定性:GPT4在两次运行后会产生相同或类似的估计吗?有些证据表明它不一定会。

Efficient Dataset Distillation through Alignment with Smooth and High-Quality Expert Trajectories

  • paper_url: http://arxiv.org/abs/2310.10541
  • repo_url: None
  • paper_authors: Jiyuan Shen, Wenzhuo Yang, Kwok-Yan Lam
  • for: 本研究旨在提出一种数据效果的方法,以便在训练大型和 cutting-edge 机器学习模型时,避免使用大量数据。
  • methods: 该方法基于 expert trajectory 的使用,并引入 clipping loss 和 gradient penalty 来规则参数变化的速率。此外,还提出了代表性初始化、平衡内循环损失和中间匹配损失等优化策略。
  • results: 实验结果显示,提出的方法在不同的数据集、大小和分辨率上均显著超越先前的方法。
    Abstract Training a large and state-of-the-art machine learning model typically necessitates the use of large-scale datasets, which, in turn, makes the training and parameter-tuning process expensive and time-consuming. Some researchers opt to distil information from real-world datasets into tiny and compact synthetic datasets while maintaining their ability to train a well-performing model, hence proposing a data-efficient method known as Dataset Distillation (DD). Despite recent progress in this field, existing methods still underperform and cannot effectively replace large datasets. In this paper, unlike previous methods that focus solely on improving the efficacy of student distillation, we are the first to recognize the important interplay between expert and student. We argue the significant impact of expert smoothness when employing more potent expert trajectories in subsequent dataset distillation. Based on this, we introduce the integration of clipping loss and gradient penalty to regulate the rate of parameter changes in expert trajectories. Furthermore, in response to the sensitivity exhibited towards randomly initialized variables during distillation, we propose representative initialization for synthetic dataset and balanced inner-loop loss. Finally, we present two enhancement strategies, namely intermediate matching loss and weight perturbation, to mitigate the potential occurrence of cumulative errors. We conduct extensive experiments on datasets of different scales, sizes, and resolutions. The results demonstrate that the proposed method significantly outperforms prior methods.
    摘要 训练大型的先进机器学习模型通常需要使用大规模数据集,这使得训练和参数调优过程昂贵且耗时。一些研究者尝试将真实数据集中的信息蒸馏到小而紧凑的合成数据集中,同时保持其训练高性能模型的能力,这种数据高效的方法被称为数据集蒸馏(Dataset Distillation, DD)。尽管该领域近来取得了进展,现有方法的效果仍然不足,无法有效替代大规模数据集。与以往只关注提升学生蒸馏效果的方法不同,本文首次认识到专家与学生之间的重要相互作用,并指出在后续数据集蒸馏中使用更强的专家轨迹时,专家轨迹的平滑性具有显著影响。基于此,我们引入截断损失(clipping loss)和梯度惩罚(gradient penalty)来调节专家轨迹中参数变化的速率。此外,针对蒸馏过程中对随机初始化变量的敏感性,我们提出了合成数据集的代表性初始化和平衡的内循环损失。最后,我们提出了两种增强策略,即中间匹配损失和权重扰动,以缓解可能出现的累积误差。我们在不同规模、大小和分辨率的数据集上进行了广泛实验,结果表明所提方法显著优于先前的方法。

Microscaling Data Formats for Deep Learning

  • paper_url: http://arxiv.org/abs/2310.10537
  • repo_url: https://github.com/microsoft/microxcaling
  • paper_authors: Bita Darvish Rouhani, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, Summer Deng, Dhruv Choudhary, Marius Cornea, Eric Dellinger, Kristof Denolf, Stosic Dusan, Venmugil Elango, Maximilian Golub, Alexander Heinecke, Phil James-Roxby, Dharmesh Jani, Gaurav Kolhe, Martin Langhammer, Ada Li, Levi Melnick, Maral Mesmakhosroshahi, Andres Rodriguez, Michael Schulte, Rasoul Shafipour, Lei Shao, Michael Siu, Pradeep Dubey, Paulius Micikevicius, Maxim Naumov, Colin Verrilli, Ralph Wittig, Doug Burger, Eric Chung
  • for: 降低现代深度学习应用的计算和存储成本
  • methods: 使用块缩放因子和窄Float和整数类型来组合微规模数据格式
  • results: 实证结果表明MX数据格式可以作为FP32的Drop-in更新,并且在AI推理和训练中具有低用户阻力,以及可以在训练生成语言模型中使用sub-8位权重、活化和梯度,并且减少了精度损失。
    Abstract Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements. MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical results on over two dozen benchmarks demonstrate practicality of MX data formats as a drop-in replacement for baseline FP32 for AI inference and training with low user friction. We also show the first instance of training generative language models at sub-8-bit weights, activations, and gradients with minimal accuracy loss and no modifications to the training recipe.
    摘要 宽度狭小的数据格式是现代深度学习应用中减少计算和存储成本的关键。这篇论文评估了 Microscaling(MX)数据格式,它将每个块缩放因子与窄浮点和整数类型相结合。MX格式均衡硬件效率、模型准确性和用户抵抗。实验结果表明MX格式可以作为FP32基eline的Drop-in取代物,用于AI推理和训练,并且具有低用户抵抗。我们还示出了在低于8位权重、活动和梯度上训练生成语言模型,无需修改训练脚本,且减少了准确性损失。
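For intuition about block-scaled formats, the toy quantizer below shares one power-of-two scale across each block of 32 elements and stores the elements as narrow signed integers. The actual MX formats pair the shared per-block scale with specific narrow floating-point and integer element encodings defined in the paper and accompanying specification, so treat this integer-only NumPy version purely as an illustration of the per-block scaling factor.

```python
import numpy as np

def quantize_blockwise(x, block=32, bits=8):
    """Toy block-scaled quantizer: one shared power-of-two scale per block of
    `block` elements, with elements stored as narrow signed integers."""
    x = x.reshape(-1, block)
    max_int = 2 ** (bits - 1) - 1
    # Shared scale per block, restricted to powers of two (MX-style).
    scale = 2.0 ** np.ceil(np.log2(np.abs(x).max(axis=1, keepdims=True) / max_int + 1e-12))
    q = np.clip(np.round(x / scale), -max_int, max_int)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q * scale

x = np.random.randn(4, 32).astype(np.float32)
q, s = quantize_blockwise(x.ravel())
print(np.abs(dequantize(q, s).reshape(4, 32) - x).max())  # small reconstruction error
```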

Semantic Parsing by Large Language Models for Intricate Updating Strategies of Zero-Shot Dialogue State Tracking

  • paper_url: http://arxiv.org/abs/2310.10520
  • repo_url: https://github.com/ToLightUpTheSky/ParsingDST
  • paper_authors: Yuxiang Wu, Guanting Dong, Weiran Xu
  • for: Zero-shot Dialogue State Tracking (DST) aims to address the challenge of acquiring and annotating task-oriented dialogues, which can be time-consuming and costly.
  • methods: The proposed ParsingDST method leverages powerful Large Language Models (LLMs) and semantic parsing to reformulate the DST task and improve updating strategies in the text-to-JSON process.
  • results: Experimental results show that ParsingDST outperforms existing zero-shot DST methods on MultiWOZ, with significant improvements in Joint Goal Accuracy (JGA) and slot accuracy compared to existing ICL methods.
    Abstract Zero-shot Dialogue State Tracking (DST) addresses the challenge of acquiring and annotating task-oriented dialogues, which can be time consuming and costly. However, DST extends beyond simple slot-filling and requires effective updating strategies for tracking dialogue state as conversations progress. In this paper, we propose ParsingDST, a new In-Context Learning (ICL) method, to introduce additional intricate updating strategies in zero-shot DST. Our approach reformulates the DST task by leveraging powerful Large Language Models (LLMs) and translating the original dialogue text to JSON through semantic parsing as an intermediate state. We also design a novel framework that includes more modules to ensure the effectiveness of updating strategies in the text-to-JSON process. Experimental results demonstrate that our approach outperforms existing zero-shot DST methods on MultiWOZ, exhibiting significant improvements in Joint Goal Accuracy (JGA) and slot accuracy compared to existing ICL methods.
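A small sketch of the text-to-JSON intermediate state idea: an LLM parses one dialogue turn into a JSON object describing slot updates and deletions, and a deterministic routine merges it into the accumulated dialogue state. The schema (`update`/`delete`, domain and slot names) is hypothetical and not the paper's actual format.

```python
import json

# Hypothetical turn-level parse an LLM might return for
# "Actually make that an expensive restaurant, and cancel the taxi."
turn_parse = json.loads("""
{
  "update": {"restaurant": {"pricerange": "expensive"}},
  "delete": {"taxi": ["departure", "destination"]}
}
""")

def apply_turn(state, parse):
    """Merge one turn's parsed JSON into the accumulated dialogue state."""
    for domain, slots in parse.get("update", {}).items():
        state.setdefault(domain, {}).update(slots)
    for domain, slots in parse.get("delete", {}).items():
        for slot in slots:
            state.get(domain, {}).pop(slot, None)
    return state

state = {"restaurant": {"pricerange": "cheap"}, "taxi": {"departure": "museum"}}
print(apply_turn(state, turn_parse))
```

Keeping the update logic outside the LLM is what allows the intricate updating strategies (overwrites, deletions, carry-over) to be applied consistently turn after turn.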

NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails

  • paper_url: http://arxiv.org/abs/2310.10501
  • repo_url: None
  • paper_authors: Traian Rebedea, Razvan Dinu, Makesh Sreedhar, Christopher Parisien, Jonathan Cohen
  • for: 这个论文主要是为了提供一种开源的工具kit,用于轻松地在基于语言模型(LLM)的对话系统中添加可编程的 guardrails。
  • methods: 论文使用了一些机制,如模型对齐,来让LLM提供者和开发者在训练时添加到 guardrails。此外,论文还使用了一种运行时灵感自对话管理的方法,允许开发者在运行时添加可编程的 guardrails。
  • results: 论文的初步结果表明,提出的方法可以与多个LLM提供者合作,开发出可控和安全的LLM应用程序,使用可编程的 guardrails。
    Abstract NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems. Guardrails (or rails for short) are a specific way of controlling the output of an LLM, such as not talking about topics considered harmful, following a predefined dialogue path, using a particular language style, and more. There are several mechanisms that allow LLM providers and developers to add guardrails that are embedded into a specific model at training, e.g. using model alignment. Differently, using a runtime inspired from dialogue management, NeMo Guardrails allows developers to add programmable rails to LLM applications - these are user-defined, independent of the underlying LLM, and interpretable. Our initial results show that the proposed approach can be used with several LLM providers to develop controllable and safe LLM applications using programmable rails.
    摘要 NeMo Guardrails是一个开源工具套件,用于轻松地在基于LLM的对话系统中添加可编程的保护栏(guardrails)。保护栏(rails)是控制LLM输出的一种特定方式,例如不讨论被视为有害的话题、遵循预定的对话路径、使用特定的语言风格等。已有一些机制允许LLM提供者和开发者在训练阶段将保护栏嵌入特定模型,例如通过模型对齐。与此不同,NeMo Guardrails采用了受对话管理启发的运行时机制,允许开发者为LLM应用添加可编程的rails:这些rails由用户定义、独立于底层LLM,并且可解释。我们的初步结果表明,所提方法可以与多个LLM提供者配合,利用可编程rails开发可控且安全的LLM应用。
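The snippet below follows the usage pattern shown in the NeMo Guardrails project documentation at the time of writing: a Colang flow (embedded here as a Python string) defines a topical rail, and `LLMRails` applies it at runtime. Treat the API names, the Colang syntax, and the model configuration as approximations to verify against the current docs; running it requires the package and credentials for an LLM provider.

```python
# Sketch based on the project's documented usage; verify names/syntax against the docs.
from nemoguardrails import LLMRails, RailsConfig

colang = """
define user ask about politics
  "what do you think about the election"
  "who should I vote for"

define flow politics rail
  user ask about politics
  bot refuse to respond about politics

define bot refuse to respond about politics
  "I'm sorry, I can't comment on political topics."
"""

yaml = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
"""

config = RailsConfig.from_content(colang_content=colang, yaml_content=yaml)
rails = LLMRails(config)
reply = rails.generate(messages=[{"role": "user", "content": "Who should I vote for?"}])
print(reply["content"])  # the rail intercepts the topic and returns the canned refusal
```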

LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

  • paper_url: http://arxiv.org/abs/2310.10497
  • repo_url: None
  • paper_authors: Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li
  • for: 本研究旨在提出一种Selective hearing mechanism的目标说话者定位算法,以提高多说话者场景下的干扰难以听清楚的问题。
  • methods: 给出一个参考说话者的 Referral speech,首先生成一个基于说话者的 Spectrogram mask,以排除干扰说话者的speech。然后,使用Long short-term memory(LSTM)网络提取目标说话者的位置信息从过滤后的 Spectrogram中。
  • results: 实验表明,我们提出的方法在不同的 Signal-to-noise ratio(SNR)条件下,与现有算法相比,具有较高的准确率和鲁棒性。Specifically, at SNR = -10 dB, our proposed network LocSelect achieves a mean absolute error (MAE) of 3.55 and an accuracy (ACC) of 87.40%.
    Abstract The prevailing noise-resistant and reverberation-resistant localization algorithms primarily emphasize separating and providing directional output for each speaker in multi-speaker scenarios, without association with the identity of speakers. In this paper, we present a target speaker localization algorithm with a selective hearing mechanism. Given a reference speech of the target speaker, we first produce a speaker-dependent spectrogram mask to eliminate interfering speakers' speech. Subsequently, a Long short-term memory (LSTM) network is employed to extract the target speaker's location from the filtered spectrogram. Experiments validate the superiority of our proposed method over the existing algorithms for different scale invariant signal-to-noise ratios (SNR) conditions. Specifically, at SNR = -10 dB, our proposed network LocSelect achieves a mean absolute error (MAE) of 3.55 and an accuracy (ACC) of 87.40%.
    摘要 现有的抗噪声和抗混响定位算法主要侧重在多说话人场景中分离并输出每个说话人的方向信息,而不与说话人身份相关联。在这篇论文中,我们提出了一种具有选择性听觉机制的目标说话人定位算法。给定目标说话人的一段参考语音,我们首先生成一个说话人相关的语谱图掩码,以消除干扰说话人的语音;接着,使用长短期记忆(LSTM)网络从过滤后的语谱图中提取目标说话人的位置信息。实验证明,在不同的尺度不变信噪比(SNR)条件下,所提方法均优于现有算法。具体来说,在SNR=-10 dB时,我们提出的网络LocSelect的平均绝对误差(MAE)为3.55,准确率(ACC)为87.40%。
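A toy rendering of the two-stage selective-hearing pipeline: a reference-speaker embedding conditions a time-frequency mask, and an LSTM regresses a location from the masked spectrogram. All shapes, layer sizes, and the single-angle regression head are assumptions for illustration only; the actual LocSelect features and training objective are described in the paper.

```python
import torch
import torch.nn as nn

class LocSelectSketch(nn.Module):
    """Toy two-stage model: (1) predict a target-speaker mask from a reference
    embedding, (2) regress a location from the masked spectrogram."""
    def __init__(self, n_freq=257, ref_dim=128, hidden=256):
        super().__init__()
        self.mask_net = nn.Sequential(nn.Linear(n_freq + ref_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, n_freq), nn.Sigmoid())
        self.loc_net = nn.LSTM(n_freq, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # e.g. one azimuth estimate per clip

    def forward(self, spec, ref_emb):
        # spec: (batch, frames, n_freq) mixture magnitude spectrogram
        # ref_emb: (batch, ref_dim) embedding of the reference utterance
        ref = ref_emb.unsqueeze(1).expand(-1, spec.size(1), -1)
        mask = self.mask_net(torch.cat([spec, ref], dim=-1))  # speaker-dependent T-F mask
        h, _ = self.loc_net(spec * mask)
        return self.head(h[:, -1])

spec = torch.rand(2, 100, 257)
ref = torch.randn(2, 128)
print(LocSelectSketch()(spec, ref).shape)  # torch.Size([2, 1])
```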

Harnessing the Power of LLMs: Evaluating Human-AI Text Co-Creation through the Lens of News Headline Generation

  • paper_url: http://arxiv.org/abs/2310.10706
  • repo_url: https://github.com/jsndg/emnlp23-llm-headline
  • paper_authors: Zijian Ding, Alison Smith-Renner, Wenjuan Zhang, Joel R. Tetreault, Alejandro Jaimes
  • for: 这项研究旨在探讨人们如何最佳地利用LLMs进行写作,以及与这些模型交互对于写作过程中的拥有感和信任的影响。
  • methods: 研究采用了常见的人机交互方式(如导航系统、从系统输出中选择、后期编辑),在LLM协助新闻标题生成上进行了比较。
  • results: 研究发现,人类控制可以减少LLM输出的不良结果,而且与自由编辑相比,AI协助不会影响参与者对写作过程的感知控制。
    Abstract To explore how humans can best leverage LLMs for writing and how interacting with these models affects feelings of ownership and trust in the writing process, we compared common human-AI interaction types (e.g., guiding system, selecting from system outputs, post-editing outputs) in the context of LLM-assisted news headline generation. While LLMs alone can generate satisfactory news headlines, on average, human control is needed to fix undesirable model outputs. Of the interaction methods, guiding and selecting model output added the most benefit with the lowest cost (in time and effort). Further, AI assistance did not harm participants' perception of control compared to freeform editing.
    摘要 为了探究人类如何最好地利用LLM进行写作,以及与这些模型的交互如何影响写作过程中的归属感和信任感,我们在LLM辅助新闻标题生成任务上比较了常见的人机交互方式(如引导系统、从系统输出中选择、后期编辑)。尽管LLM本身可以生成令人满意的新闻标题,但平均而言仍需要人类控制来修正模型的不良输出。在各种交互方式中,引导和从模型输出中选择带来的收益最大、代价(时间和精力)最低。此外,与自由编辑相比,AI辅助并未损害参与者对控制感的感知。

Type-aware Decoding via Explicitly Aggregating Event Information for Document-level Event Extraction

  • paper_url: http://arxiv.org/abs/2310.10487
  • repo_url: None
  • paper_authors: Gang Zhao, Yidong Shi, Shudong Lu, Xinjie Yang, Guanting Dong, Jian Xu, Xiaocheng Gong, Si Li
  • for: 本研究旨在解决文档级事件抽取(DEE)中的两个主要挑战:论元散布和多事件。以往的方法虽然尝试解决这些挑战,但忽略了事件检测过程中事件无关句子的干扰,以及论元抽取过程中不同事件角色之间的相互干扰。
  • methods: 本研究提出了一种新的Schema-based Explicitly Aggregating(SEA)模型,该模型可以有效地聚合事件信息,并将事件类型和角色信息分别编码为特定的类型和角色表示。通过基于类型的表示来检测每个事件,SEA可以减轻由事件相关信息引起的干扰。此外,SEA可以根据每个角色的表示来提取对应的Arguments,从而减少不同角色之间的互相干扰。
  • results: 实验结果表明,SEA模型在ChFinAnn和DuEE-fin数据集上的表现优于STATE-OF-THE-ART(SOTA)方法。
    Abstract Document-level event extraction (DEE) faces two main challenges: arguments-scattering and multi-event. Although previous methods attempt to address these challenges, they overlook the interference of event-unrelated sentences during event detection and neglect the mutual interference of different event roles during argument extraction. Therefore, this paper proposes a novel Schema-based Explicitly Aggregating~(SEA) model to address these limitations. SEA aggregates event information into event type and role representations, enabling the decoding of event records based on specific type-aware representations. By detecting each event based on its event type representation, SEA mitigates the interference caused by event-unrelated information. Furthermore, SEA extracts arguments for each role based on its role-aware representations, reducing mutual interference between different roles. Experimental results on the ChFinAnn and DuEE-fin datasets show that SEA outperforms the SOTA methods.
    摘要 文档级事件提取(DEE)面临两大挑战:事件散布和多事件。尽管先前的方法尝试解决这些挑战,但它们忽略了事件检测过程中的事件无关句子干扰和对不同角色的事件提取过程中的互相干扰。因此,这篇论文提出了一种新的Schema-based Explicitly Aggregating(SEA)模型,用于解决这些限制。SEA将事件信息聚合到事件类型和角色表示中,使得根据具体的类型意识来解码事件记录。通过根据事件类型表示来检测每个事件,SEA可以减轻由事件无关信息引起的干扰。此外,SEA根据角色意识来提取每个角色的证据,减少不同角色之间的互相干扰。实验结果表明,SEA在ChFinAnn和DuEE-fin数据集上的性能比SOTA方法更高。

ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots

  • paper_url: http://arxiv.org/abs/2310.10486
  • repo_url: None
  • paper_authors: Milad Shafiee, Guillaume Bellegarda, Auke Ijspeert
  • for: 这项研究旨在开发一种可以控制多种四足机器人的单一运动策略,而无需为每种机器人重新调整超参数和奖励函数。
  • methods: 研究人员从动物运动控制中汲取灵感,使用模块化的中枢模式发生器(CPG)和模式形成(PF)层,以实现运动策略在不同机器人之间的共享。
  • results: 研究人员在不同机器人上测试了这种策略,并观察到了稳健的仿真到实际(sim-to-real)迁移能力,甚至在加载15公斤(相当于A1机器人标称质量的125%)时仍然保持稳定的性能。
    Abstract Learning a locomotion policy for quadruped robots has traditionally been constrained to specific robot morphology, mass, and size. The learning process must usually be repeated for every new robot, where hyperparameters and reward function weights must be re-tuned to maximize performance for each new system. Alternatively, attempting to train a single policy to accommodate different robot sizes, while maintaining the same degrees of freedom (DoF) and morphology, requires either complex learning frameworks, or mass, inertia, and dimension randomization, which leads to prolonged training periods. In our study, we show that drawing inspiration from animal motor control allows us to effectively train a single locomotion policy capable of controlling a diverse range of quadruped robots. These differences encompass a variable number of DoFs, (i.e. 12 or 16 joints), three distinct morphologies, a broad mass range spanning from 2 kg to 200 kg, and nominal standing heights ranging from 16 cm to 100 cm. Our policy modulates a representation of the Central Pattern Generator (CPG) in the spinal cord, effectively coordinating both frequencies and amplitudes of the CPG to produce rhythmic output (Rhythm Generation), which is then mapped to a Pattern Formation (PF) layer. Across different robots, the only varying component is the PF layer, which adjusts the scaling parameters for the stride height and length. Subsequently, we evaluate the sim-to-real transfer by testing the single policy on both the Unitree Go1 and A1 robots. Remarkably, we observe robust performance, even when adding a 15 kg load, equivalent to 125% of the A1 robot's nominal mass.
    摘要 学习四肢动物机器人的运动策略传统上受到机器人形态、质量和大小的限制。学习过程通常需要对每个新机器人重新调整超参数和奖励函数权重,以最大化表现。 Alternatively, 尝试使用同一个策略控制不同机器人的不同大小、同样的度度自由(DoF)和形态,需要使用复杂的学习框架,或者质量、抗力和维度随机化,这会导致训练期间过长。在我们的研究中,我们draw inspiration from animal motor control,我们可以有效地训练一个单一的运动策略,可以控制多种不同的四肢动物机器人。这些差异包括变化的DoF数(即12或16关节)、三种不同的形态、机器人质量范围从2公斤到200公斤,和nominal standing heights从16厘米到100厘米。我们的策略调节了中枢pattern generator(CPG)的表达,有效地协调CPG的频率和振荡 amplitudes,并将其映射到Pattern Formation(PF)层。不同的机器人中,唯一变化的是PF层,其调整了步高和步长的缩放参数。我们在Unitree Go1和A1机器人上进行了实验,并观察到了稳定的表现,甚至在加载15公斤的情况下,即A1机器人的125% Nominal mass。
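To illustrate the Rhythm Generation / Pattern Formation split described above, the sketch below runs four fixed-frequency phase oscillators with trot offsets and maps each phase to a foot trajectory whose stride length and height are the robot-specific scaling parameters. In the paper the learned policy modulates the CPG frequencies and amplitudes online; this open-loop NumPy version is only meant to show where the PF-layer scaling enters.

```python
import numpy as np

def cpg_foot_trajectories(T=2.0, dt=0.002, freq=2.0, stride_len=0.10, stride_h=0.06):
    """Rhythm Generation: four phase oscillators with trot phase offsets.
       Pattern Formation: map each phase to a foot trajectory scaled by stride
       length/height (the only part that would change across robots here)."""
    t = np.arange(0, T, dt)
    offsets = np.array([0.0, np.pi, np.pi, 0.0])              # FL, FR, RL, RR trot gait
    phases = 2 * np.pi * freq * t[:, None] + offsets[None, :]
    x = stride_len * np.cos(phases)                            # fore-aft foot position
    z = np.where(np.sin(phases) > 0, stride_h * np.sin(phases), 0.0)  # lift foot in swing
    return t, x, z

t, x, z = cpg_foot_trajectories()
print(x.shape, z.shape)  # (1000, 4) (1000, 4): one trajectory per foot
```

Scaling `stride_len` and `stride_h` per robot, while keeping the oscillators untouched, mirrors the claim that only the PF layer varies across morphologies.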

DemoSG: Demonstration-enhanced Schema-guided Generation for Low-resource Event Extraction

  • paper_url: http://arxiv.org/abs/2310.10481
  • repo_url: None
  • paper_authors: Gang Zhao, Xiaocheng Gong, Xinjie Yang, Guanting Dong, Shudong Lu, Si Li
  • for: 提高低资源场景中的事件抽取(Event Extraction, EE)效果
  • methods: 基于示范的学习范式和基于schema的提示(schema-based prompts)
  • results: 在领域内和领域自适应的低资源设置下,对三个数据集进行了广泛的实验,并研究了 DemoSG 的鲁棒性。结果表明,DemoSG 在低资源场景中明显优于当前方法。
    Abstract Most current Event Extraction (EE) methods focus on the high-resource scenario, which requires a large amount of annotated data and can hardly be applied to low-resource domains. To address EE more effectively with limited resources, we propose the Demonstration-enhanced Schema-guided Generation (DemoSG) model, which benefits low-resource EE from two aspects: Firstly, we propose the demonstration-based learning paradigm for EE to fully use the annotated data, which transforms them into demonstrations to illustrate the extraction process and help the model learn effectively. Secondly, we formulate EE as a natural language generation task guided by schema-based prompts, thereby leveraging label semantics and promoting knowledge transfer in low-resource scenarios. We conduct extensive experiments under in-domain and domain adaptation low-resource settings on three datasets, and study the robustness of DemoSG. The results show that DemoSG significantly outperforms current methods in low-resource scenarios.
    摘要 现有的事件抽取(EE)方法专注于高资源情况下,需要大量的标注数据并几乎无法应用于低资源领域。为了对EE更有效地应用限制的资源,我们提出了示例增强的结构引导生成(DemoSG)模型,它具有以下两个方面的优点:首先,我们提出了示例学习模式,将标注数据转换为示例,以帮助模型彻底学习。其次,我们将EE视为自然语言生成任务,并透过Schema-based启发词提高标签 semantics,以便在低资源情况下传递知识。我们对三个数据集进行了广泛的实验,包括域内和领域适应低资源情况下的实验,并研究了DemoSG的稳定性。结果显示,DemoSG与现有的方法在低资源情况下具有很大的优势。

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

  • paper_url: http://arxiv.org/abs/2310.10477
  • repo_url: None
  • paper_authors: Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-Yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu
  • for: 本研究旨在提高大语言模型(LLM)的安全性和合理性,特别是在面对恶意和毒害语言时。
  • methods: 本研究提出了一种基于错误分析的新的对齐策略,通过故意暴露LLM于异常输出,然后进行全面的评估,以全面了解内部的原因。
  • results: 实验结果表明,提出的方法在安全指令遵从方面的性能超过了传统对齐方法,同时保持了高效性。
    Abstract The rapid advancement of large language models (LLMs) presents both opportunities and challenges, particularly concerning unintentional generation of harmful and toxic responses. While the traditional alignment methods strive to steer LLMs towards desired performance and shield them from malicious content, this study proposes a novel alignment strategy rooted in mistake analysis by exposing LLMs to flawed outputs purposefully and then conducting a thorough assessment to fully comprehend internal reasons via natural language analysis. Thus, toxic responses can be transformed into instruction tuning corpus for model alignment, and LLMs can not only be deterred from generating flawed responses but also trained to self-criticize, leveraging its innate ability to discriminate toxic content. Experimental results demonstrate that the proposed method outperforms conventional alignment techniques for safety instruction following, while maintaining superior efficiency.
    摘要 大量语言模型(LLM)的快速进步带来了机会和挑战,特别是在无意义生成危险和恶意响应方面。传统的Alignment方法努力使LLM towards Desired performance和避免恶意内容,这种研究提出了一种新的Alignment策略,基于 mistake analysis,故意暴露LLM于异常输出,然后进行全面的评估,以全面了解内部原因via自然语言分析。因此,恶意响应可以被转化为调教征集,LLM不仅可以减少生成异常响应,还可以培养自我批判,利用其内置的恶意内容抵制能力。实验结果表明,提出的方法在安全指令遵从方面超过了传统的Alignment技术,同时保持了高效性。

Stance Detection with Collaborative Role-Infused LLM-Based Agents

  • paper_url: http://arxiv.org/abs/2310.10467
  • repo_url: None
  • paper_authors: Xiaochong Lan, Chen Gao, Depeng Jin, Yong Li
  • for: 这篇文章的目的是提出一个三阶段框架,以帮助大语言模型(LLM)实现立场检测。
  • methods: 这个框架为LLM指派不同的角色,包括语言专家、领域专家和社交媒体老手,由这些基于LLM的代理协作执行三个阶段:多维度文本分析阶段、推理增强辩论阶段和立场结论阶段。
  • results: 这篇文章的结果显示,使用这个框架可以在不需要额外标注资料和模型训练的情况下实现高度的立场检测性能。实验还显示了这个方法的可解释性和通用性。
    Abstract Stance detection automatically detects the stance in a text towards a target, vital for content analysis in web and social media research. Despite their promising capabilities, LLMs encounter challenges when directly applied to stance detection. First, stance detection demands multi-aspect knowledge, from deciphering event-related terminologies to understanding the expression styles in social media platforms. Second, stance detection requires advanced reasoning to infer authors' implicit viewpoints, as stance are often subtly embedded rather than overtly stated in the text. To address these challenges, we design a three-stage framework COLA (short for Collaborative rOle-infused LLM-based Agents) in which LLMs are designated distinct roles, creating a collaborative system where each role contributes uniquely. Initially, in the multidimensional text analysis stage, we configure the LLMs to act as a linguistic expert, a domain specialist, and a social media veteran to get a multifaceted analysis of texts, thus overcoming the first challenge. Next, in the reasoning-enhanced debating stage, for each potential stance, we designate a specific LLM-based agent to advocate for it, guiding the LLM to detect logical connections between text features and stance, tackling the second challenge. Finally, in the stance conclusion stage, a final decision maker agent consolidates prior insights to determine the stance. Our approach avoids extra annotated data and model training and is highly usable. We achieve state-of-the-art performance across multiple datasets. Ablation studies validate the effectiveness of each design role in handling stance detection. Further experiments have demonstrated the explainability and the versatility of our approach. Our approach excels in usability, accuracy, effectiveness, explainability and versatility, highlighting its value.
    摘要 Automatic stance detection可以检测文本中对目标的立场,这对于网络和社交媒体研究是非常重要。然而,深入应用于检测的语言模型(LLMs)会遇到挑战。首先,检测立场需要多方面的知识,包括理解社交媒体平台上的表达方式和解读事件相关的术语。其次,检测立场需要高级的理解,以便推理出作者的潜在观点,因为立场通常不直接在文本中表达。为解决这些挑战,我们设计了一个三个阶段的框架,称为COLA(简称为协作型角色扮演 LLM 代理)。在这个框架中,LLMs被分配为不同的角色,形成一个协作的系统,每个角色具有唯一的贡献。在多维度文本分析阶段,我们配置 LLMS acted as语言专家、领域专家和社交媒体老手,以获得多方面的分析结果,从而解决第一个挑战。接着,在逻辑批判阶段,我们为每个可能的立场分配了一个特定的 LLM 代理,使得 LLMS 检测文本特征和立场之间的逻辑连接,解决第二个挑战。最后,在立场结论阶段,一个最终的决策者代理将先前的见解集成,以确定立场。我们的方法不需要额外的注释数据和模型训练,具有非常高的可用性。我们在多个数据集上实现了状态的最佳性能。剥离学习 validate了每个设计角色在处理检测立场方面的效果。进一步的实验还表明了我们的方法在可读性、准确性、有效性、可读性和多样性方面的优异。这些结果表明我们的方法具有价值。

Machine Learning Techniques for Identifying the Defective Patterns in Semiconductor Wafer Maps: A Survey, Empirical, and Experimental Evaluations

  • paper_url: http://arxiv.org/abs/2310.10705
  • repo_url: None
  • paper_authors: Kamal Taha
  • for: This survey paper provides a comprehensive review of machine learning (ML) techniques for identifying wafer defects in semiconductor manufacturing, aiming to fill a void in the existing literature and provide an in-depth analysis of the advantages, limitations, and potential applications of various ML algorithms in this field.
  • methods: The paper employs a four-tier taxonomy to classify ML algorithms into more refined categories and techniques, providing a detailed understanding of the complex relationships between different algorithms and their sub-techniques. The taxonomy includes broad methodology categories, specific sub-techniques, and experimental evaluations to rank the techniques.
  • results: The paper presents a comprehensive empirical evaluation of the techniques based on four criteria and an experimental evaluation that ranks the algorithms employing the same sub-techniques, techniques, sub-categories, and categories. The approach provides a detailed and holistic understanding of ML techniques and algorithms for identifying wafer defects, guiding researchers towards making more informed decisions in their work. The paper also highlights future prospects and opportunities for further research in this field.
    Abstract This survey paper offers a comprehensive review of methodologies utilizing machine learning (ML) techniques for identifying wafer defects in semiconductor manufacturing. Despite the growing body of research demonstrating the effectiveness of ML in wafer defect identification, there is a noticeable absence of comprehensive reviews on this subject. This survey attempts to fill this void by amalgamating available literature and providing an in-depth analysis of the advantages, limitations, and potential applications of various ML algorithms in the realm of wafer defect detection. An innovative taxonomy of methodologies that we present provides a detailed classification of algorithms into more refined categories and techniques. This taxonomy follows a four-tier structure, starting from broad methodology categories and ending with specific sub-techniques. It aids researchers in comprehending the complex relationships between different algorithms and their techniques. We employ a rigorous empirical and experimental evaluation to rank these varying techniques. For the empirical evaluation, we assess techniques based on a set of four criteria. The experimental evaluation ranks the algorithms employing the same sub-techniques, techniques, sub-categories, and categories. This integration of a multi-layered taxonomy, empirical evaluations, and comparative experiments provides a detailed and holistic understanding of ML techniques and algorithms for identifying wafer defects. This approach guides researchers towards making more informed decisions in their work. Additionally, the paper illuminates the future prospects of ML techniques for wafer defect identification, underscoring potential advancements and opportunities for further research in this field
    摘要 这篇综述论文全面回顾了利用机器学习(ML)技术识别半导体制造中晶圆缺陷的方法。尽管已有越来越多的研究证明了ML在晶圆缺陷识别中的有效性,但该主题仍缺乏全面的综述。本文试图填补这一空白,汇总现有文献,并深入分析各类ML算法在晶圆缺陷检测领域的优点、局限性和潜在应用。我们提出了一种新的方法学分类体系,将算法划分为更细化的类别和技术。该分类体系采用四层结构,从宽泛的方法类别开始,到具体的子技术结束,有助于研究人员理解不同算法及其技术之间的复杂关系。我们还进行了严格的实证与实验评估:实证评估依据四项标准对各类技术进行考量,实验评估则对采用相同子技术、技术、子类别和类别的算法进行排序。这种多层分类、实证评估与对比实验相结合的方式,为识别晶圆缺陷的ML技术和算法提供了细致而全面的理解,引导研究人员做出更明智的决策。此外,论文还展望了ML技术在晶圆缺陷识别方面的未来前景,指出了该领域进一步研究的潜在进展和机会。

On the Relevance of Temporal Features for Medical Ultrasound Video Recognition

  • paper_url: http://arxiv.org/abs/2310.10453
  • repo_url: https://github.com/MedAI-Clemson/pda_detection
  • paper_authors: D. Hudson Smith, John Paul Lineberger, George H. Baker
  • for: 本研究旨在提高医疗ultrasound视频识别任务的效率,特别是在低数据量情况下。
  • methods: 本研究提出了一种新的多头注意架构,通过 incorporating 时间特征来提高模型的效率。
  • results: 对比于效率高的3D CNN视频识别模型,本研究在一些常见的ultrasound任务中表现出优于其,尤其是在训练数据量受限的情况下。
    Abstract Many medical ultrasound video recognition tasks involve identifying key anatomical features regardless of when they appear in the video suggesting that modeling such tasks may not benefit from temporal features. Correspondingly, model architectures that exclude temporal features may have better sample efficiency. We propose a novel multi-head attention architecture that incorporates these hypotheses as inductive priors to achieve better sample efficiency on common ultrasound tasks. We compare the performance of our architecture to an efficient 3D CNN video recognition model in two settings: one where we expect not to require temporal features and one where we do. In the former setting, our model outperforms the 3D CNN - especially when we artificially limit the training data. In the latter, the outcome reverses. These results suggest that expressive time-independent models may be more effective than state-of-the-art video recognition models for some common ultrasound tasks in the low-data regime.
    摘要 许多医疗超声视频识别任务都涉及识别关键解剖特征,而不论其在视频中出现的时间点,这表明建模此类任务可能并不需要时间特征。相应地,不包含时间特征的模型架构可能具有更好的样本效率。我们提出了一种新的多头注意力架构,将上述假设作为归纳先验,以在常见超声任务上获得更好的样本效率。我们在两种设定下将该架构与一个高效的3D CNN视频识别模型进行比较:一种是我们预期不需要时间特征的设定,另一种是需要时间特征的设定。在前一种设定下,我们的模型优于3D CNN,尤其是在人为限制训练数据时;在后一种设定下,结果则相反。这些结果表明,对于一些常见的超声任务,在低数据情况下,具备表达能力的时间无关模型可能比最先进的视频识别模型更有效。

Text Summarization Using Large Language Models: A Comparative Study of MPT-7b-instruct, Falcon-7b-instruct, and OpenAI Chat-GPT Models

  • paper_url: http://arxiv.org/abs/2310.10449
  • repo_url: https://github.com/lbasyal/llms-text-summarization
  • paper_authors: Lochan Basyal, Mihir Sanghvi
  • for: This paper explores the use of Large Language Models (LLMs) for text summarization, specifically comparing the performance of three different models (MPT-7b-instruct, falcon-7b-instruct, and OpenAI ChatGPT text-davinci-003) on two datasets (CNN Daily Mail and XSum).
  • methods: The paper uses a diverse set of LLMs and evaluates their performance using widely accepted metrics such as BLEU Score, ROUGE Score, and BERT Score. The experiment involves different hyperparameters and aims to provide a comprehensive understanding of the effectiveness of LLMs for text summarization.
  • results: According to the experiment, text-davinci-003 outperformed the other two models, demonstrating its effectiveness for text summarization. The paper provides valuable insights for researchers and practitioners within the NLP domain and lays the foundation for the development of advanced Generative AI applications.
    Abstract Text summarization is a critical Natural Language Processing (NLP) task with applications ranging from information retrieval to content generation. Leveraging Large Language Models (LLMs) has shown remarkable promise in enhancing summarization techniques. This paper embarks on an exploration of text summarization with a diverse set of LLMs, including MPT-7b-instruct, falcon-7b-instruct, and OpenAI ChatGPT text-davinci-003 models. The experiment was performed with different hyperparameters and evaluated the generated summaries using widely accepted metrics such as the Bilingual Evaluation Understudy (BLEU) Score, Recall-Oriented Understudy for Gisting Evaluation (ROUGE) Score, and Bidirectional Encoder Representations from Transformers (BERT) Score. According to the experiment, text-davinci-003 outperformed the others. This investigation involved two distinct datasets: CNN Daily Mail and XSum. Its primary objective was to provide a comprehensive understanding of the performance of Large Language Models (LLMs) when applied to different datasets. The assessment of these models' effectiveness contributes valuable insights to researchers and practitioners within the NLP domain. This work serves as a resource for those interested in harnessing the potential of LLMs for text summarization and lays the foundation for the development of advanced Generative AI applications aimed at addressing a wide spectrum of business challenges.
    摘要 文本摘要是一项重要的自然语言处理(NLP)任务,其应用范围从信息检索到内容生成。利用大语言模型(LLM)在改进摘要技术方面展现出显著潜力。这篇论文研究了使用多种LLM进行文本摘要,包括MPT-7b-instruct、falcon-7b-instruct和OpenAI ChatGPT text-davinci-003模型。实验使用了不同的超参数,并采用广泛接受的指标,如BLEU分数、ROUGE分数和BERTScore,对生成的摘要进行评估。实验结果表明,text-davinci-003表现最佳。这项研究使用了两个不同的数据集:CNN Daily Mail和XSum,主要目标是全面了解LLM在不同数据集上的表现。对这些模型有效性的评估为NLP领域的研究者和实践者提供了有价值的见解。这项工作可作为有意利用LLM进行文本摘要的研究资源,并为开发面向各类业务挑战的高级生成式AI应用奠定基础。
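As a worked example of the kind of overlap metric used in this comparison, here is a self-contained ROUGE-1 F1 computation (whitespace tokenization, no stemming). It is a simplified stand-in for the standard ROUGE implementation, shown only to make the metric concrete; the headline strings are invented.

```python
from collections import Counter

def rouge1_f(reference, candidate):
    """Unigram-overlap ROUGE-1 F1 (simplified: whitespace tokens, no stemming)."""
    ref, cand = Counter(reference.lower().split()), Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

ref = "judge dismisses lawsuit against tech giant over data privacy"
hyp = "lawsuit over data privacy against tech giant dismissed by judge"
print(round(rouge1_f(ref, hyp), 3))
```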

Large Language Model-Empowered Agents for Simulating Macroeconomic Activities

  • paper_url: http://arxiv.org/abs/2310.10436
  • repo_url: None
  • paper_authors: Nian Li, Chen Gao, Yong Li, Qingmin Liao
  • for: 这篇论文旨在探讨使用语言模型(LLM)在macro经济模拟中的可能性,以解决传统模型中的三大挑战,即代理人差异、macro经济趋势的影响和多方面经济因素的互动。
  • methods: 该论文提出了一种新的方法,利用LLM来塑造人类决策行为,并通过提问工程来让LLM表现出人类特征,包括感知、反思和决策能力。
  • results: 在macro经济活动的 simulations中,LLM强化的代理人可以做出更加真实的工作和消费决策,并且可以生成更加合理的macro经济现象。这些结果表明LLM在macro经济模拟中的潜力很大。
    Abstract The advent of the Web has brought about a paradigm shift in traditional economics, particularly in the digital economy era, enabling the precise recording and analysis of individual economic behavior. This has led to a growing emphasis on data-driven modeling in macroeconomics. In macroeconomic research, Agent-based modeling (ABM) emerged as an alternative, evolving through rule-based agents, machine learning-enhanced decision-making, and, more recently, advanced AI agents. However, the existing works are suffering from three main challenges when endowing agents with human-like decision-making, including agent heterogeneity, the influence of macroeconomic trends, and multifaceted economic factors. Large language models (LLMs) have recently gained prominence in offering autonomous human-like characteristics. Therefore, leveraging LLMs in macroeconomic simulation presents an opportunity to overcome traditional limitations. In this work, we take an early step in introducing a novel approach that leverages LLMs in macroeconomic simulation. We design prompt-engineering-driven LLM agents to exhibit human-like decision-making and adaptability in the economic environment, with the abilities of perception, reflection, and decision-making to address the abovementioned challenges. Simulation experiments on macroeconomic activities show that LLM-empowered agents can make realistic work and consumption decisions and emerge more reasonable macroeconomic phenomena than existing rule-based or AI agents. Our work demonstrates the promising potential to simulate macroeconomics based on LLM and its human-like characteristics.
    摘要 互联网的出现给传统经济学带来了范式转变,特别是在数字经济时代,使得个人经济行为可以被精确记录和分析,这促使宏观经济学越来越重视数据驱动的建模。在宏观经济研究中,基于代理的建模(Agent-based modeling, ABM)作为一种替代方法不断演进,先后经历了基于规则的代理、机器学习增强的决策,以及近期更先进的AI代理。然而,在赋予代理类人决策能力方面,现有工作仍面临三大挑战:代理的异质性、宏观经济趋势的影响,以及多方面经济因素的作用。大语言模型(LLM)近来因其自主的类人特性而备受关注,因此在宏观经济模拟中利用LLM有望克服传统方法的局限。在这项工作中,我们迈出了将LLM引入宏观经济模拟的早期一步:我们设计了由提示工程驱动的LLM代理,使其在经济环境中表现出类人的决策能力和适应性,并具备感知、反思和决策能力,以应对上述挑战。针对宏观经济活动的模拟实验表明,LLM赋能的代理能够做出更真实的工作和消费决策,并涌现出比现有基于规则或AI代理更合理的宏观经济现象。我们的工作展示了基于LLM及其类人特性模拟宏观经济的广阔前景。

Longitudinal Self-supervised Learning Using Neural Ordinary Differential Equation

  • paper_url: http://arxiv.org/abs/2310.10431
  • repo_url: None
  • paper_authors: Rachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho, Yihao Li, Hugo Le Boité, Ramin Tadayoni, Pascal Massin, Béatrice Cochener, Ikram Brahim, Gwenolé Quellec, Mathieu Lamard
  • for: investigate the progressive changes in anatomical structures or disease progression over time
  • methods: longitudinal self-supervised learning (LSSL) algorithm embedded in an auto-encoder (AE) structure, Siamese-like LSSL, and neural ordinary differential equation (NODE)
  • results: demonstration of LSSL without including a reconstruction term, and the potential of incorporating NODE in conjunction with LSSL
    Abstract Longitudinal analysis in medical imaging is crucial to investigate the progressive changes in anatomical structures or disease progression over time. In recent years, a novel class of algorithms has emerged with the goal of learning disease progression in a self-supervised manner, using either pairs of consecutive images or time series of images. By capturing temporal patterns without external labels or supervision, longitudinal self-supervised learning (LSSL) has become a promising avenue. To better understand this core method, we explore in this paper the LSSL algorithm under different scenarios. The original LSSL is embedded in an auto-encoder (AE) structure. However, conventional self-supervised strategies are usually implemented in a Siamese-like manner. Therefore, (as a first novelty) in this study, we explore the use of Siamese-like LSSL. Another new core framework named neural ordinary differential equation (NODE). NODE is a neural network architecture that learns the dynamics of ordinary differential equations (ODE) through the use of neural networks. Many temporal systems can be described by ODE, including modeling disease progression. We believe that there is an interesting connection to make between LSSL and NODE. This paper aims at providing a better understanding of those core algorithms for learning the disease progression with the mentioned change. In our different experiments, we employ a longitudinal dataset, named OPHDIAT, targeting diabetic retinopathy (DR) follow-up. Our results demonstrate the application of LSSL without including a reconstruction term, as well as the potential of incorporating NODE in conjunction with LSSL.
    摘要 纵向分析在医学成像中至关重要,用于研究解剖结构或疾病随时间的渐进变化。近年来,出现了一类新算法,旨在以自监督的方式学习疾病进展,其输入为连续图像对或图像时间序列。通过在无外部标签或监督的情况下捕捉时间模式,纵向自监督学习(LSSL)成为一个有前景的方向。为更好地理解这一核心方法,本文在不同情形下对LSSL算法进行了探讨。原始的LSSL嵌入在自编码器(AE)结构中,而传统的自监督策略通常以类孪生(Siamese-like)的方式实现,因此本研究首先探索了类孪生形式的LSSL。另一个新的核心框架是神经常微分方程(NODE):NODE是一种通过神经网络学习常微分方程(ODE)动力学的网络架构。许多时间系统(包括疾病进展的建模)都可以用ODE来描述,我们认为LSSL与NODE之间存在值得探究的联系。本文旨在加深对这些用于学习疾病进展的核心算法的理解。在各项实验中,我们使用了名为OPHDIAT的纵向数据集,针对糖尿病视网膜病变(DR)的随访。结果展示了不包含重建项的LSSL的应用,以及将NODE与LSSL结合的潜力。
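To make the NODE side of this connection concrete, the sketch below integrates a learned latent dynamics function with fixed-step Euler updates over irregular follow-up times. It is a generic Neural-ODE toy (the dimensions and the Euler solver are assumptions), not the authors' pipeline; practical implementations typically use adaptive solvers and adjoint-based backpropagation.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Learned dynamics dz/dt = f(z, t); time is appended to the latent state."""
    def __init__(self, dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, z, t):
        t_col = torch.full((z.size(0), 1), float(t))
        return self.net(torch.cat([z, t_col], dim=-1))

def odeint_euler(func, z0, t_grid):
    """Fixed-step Euler integration (a stand-in for an adaptive ODE solver)."""
    z, traj = z0, [z0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        z = z + (t1 - t0) * func(z, t0)
        traj.append(z)
    return torch.stack(traj)  # (len(t_grid), batch, dim)

z0 = torch.randn(8, 16)                  # latent code of a baseline visit (e.g. from an encoder)
t = torch.tensor([0.0, 0.5, 1.0, 2.0])   # follow-up times in years, irregular spacing allowed
traj = odeint_euler(ODEFunc(), z0, t)
print(traj.shape)                        # torch.Size([4, 8, 16])
```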

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms

  • paper_url: http://arxiv.org/abs/2310.10418
  • repo_url: https://github.com/wade3han/normlens
  • paper_authors: Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu
  • for: 研究视觉封装常识规则,以提高机器人工智能的可理解性和适应能力。
  • methods: 使用人类评估和自然语言处理技术,构建一个新的多模态测试 benchmark,以评估模型对视觉封装常识规则的适应性和可解性。
  • results: 发现当前状态的模型判断和解释与人类标注不匹配,并提出一种新的方法,通过借鉴大型自然语言模型中的社会常识知识,改进模型与人类之间的匹配度。
    Abstract Commonsense norms are defeasible by context: reading books is usually great, but not when driving a car. While contexts can be explicitly described in language, in embodied scenarios, contexts are often provided visually. This type of visually grounded reasoning about defeasible commonsense norms is generally easy for humans, but (as we show) poses a challenge for machines, as it necessitates both visual understanding and reasoning about commonsense norms. We construct a new multimodal benchmark for studying visual-grounded commonsense norms: NORMLENS. NORMLENS consists of 10K human judgments accompanied by free-form explanations covering 2K multimodal situations, and serves as a probe to address two questions: (1) to what extent can models align with average human judgment? and (2) how well can models explain their predicted judgments? We find that state-of-the-art model judgments and explanations are not well-aligned with human annotation. Additionally, we present a new approach to better align models with humans by distilling social commonsense knowledge from large language models. The data and code are released at https://seungjuhan.me/normlens.
    摘要 通常的规则是可以被上下文所推翻:读书通常是非常好的,但不是在开车时。而在实际情况下,上下文通常是通过视觉提供的。这种基于视觉的常识规则的理解和判断是人类非常容易做,但对机器来说是一种挑战,因为它需要同时具备视觉理解和常识规则的理解。我们构建了一个新的多Modal benchMark,名为NORMLENS,用于研究基于视觉的常识规则。NORMLENS包含10,000个人类判断,以及2,000个多Modal的情况,并用于解决两个问题:(1)模型与人类平均判断是否能够Alignment?(2)模型如何解释其预测的判断?我们发现当前的模型判断和解释都与人类注释不一致。此外,我们还提出了一种新的方法,通过从大语言模型中提取社会常识知识来更好地将模型与人类Alignment。数据和代码在https://seungjuhan.me/normlens上发布。

Real-Fake: Effective Training Data Synthesis Through Distribution Matching

  • paper_url: http://arxiv.org/abs/2310.10402
  • repo_url: None
  • paper_authors: Jianhao Yuan, Jie Zhang, Shuyang Sun, Philip Torr, Bo Zhao
  • for: 提高深度学习模型的训练效率和鲁棒性。
  • methods: 基于分布匹配理论的数据生成方法,包括数据生成和数据筛选。
  • results: 在多种图像分类任务中, synthetic data 能够取代和补充实际数据,提高模型的鲁棒性和特点外泄能力。
    Abstract Synthetic training data has gained prominence in numerous learning tasks and scenarios, offering advantages such as dataset augmentation, generalization evaluation, and privacy preservation. Despite these benefits, the efficiency of synthetic data generated by current methodologies remains inferior when training advanced deep models exclusively, limiting its practical utility. To address this challenge, we analyze the principles underlying training data synthesis for supervised learning and elucidate a principled theoretical framework from the distribution-matching perspective that explicates the mechanisms governing synthesis efficacy. Through extensive experiments, we demonstrate the effectiveness of our synthetic data across diverse image classification tasks, both as a replacement for and augmentation to real datasets, while also benefits challenging tasks such as out-of-distribution generalization and privacy preservation.
    摘要 现代深度学习模型的训练数据 synthetic 技术在各种学习任务和场景中得到了广泛应用,具有提高数据量、提高模型性能、隐私保护等优点。然而,现有的synthetic数据生成方法在训练高级深度模型时效率仍然较低,限制其实际应用。为解决这个挑战,我们分析了supervised 学习数据生成的原理,从分布匹配角度出发,探讨生成效果的机制。经过广泛的实验,我们证明了我们的synthetic数据在多种图像分类任务中具有广泛的应用价值,可以替代真实数据,也可以增强模型的性能,同时在难题上如out-of-distribution泛化和隐私保护等方面具有优势。
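One simple way to quantify how well synthetic features match real ones, in the spirit of the distribution-matching view above, is a kernel maximum mean discrepancy. The biased RBF-kernel estimator below is a generic illustration and an assumption on my part, not the objective actually optimized in the paper.

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Squared maximum mean discrepancy with an RBF kernel (biased estimator):
    a simple score of how well synthetic features match real ones."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(256, 8))
good_synth = rng.normal(0.05, 1.0, size=(256, 8))   # close to the real distribution
bad_synth = rng.normal(1.5, 0.5, size=(256, 8))     # clearly mismatched distribution
print(rbf_mmd2(real, good_synth), rbf_mmd2(real, bad_synth))  # second value is much larger
```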

Can Word Sense Distribution Detect Semantic Changes of Words?

  • paper_url: http://arxiv.org/abs/2310.10400
  • repo_url: https://github.com/LivNLP/Sense-based-Semantic-Change-Prediction
  • paper_authors: Xiaohang Tang, Yi Zhou, Taichi Aida, Procheta Sen, Danushka Bollegala
  • for: 这个研究的目的是为了探索使用时间点数据集来预测字词意思是否有变化。
  • methods: 这个研究使用预训练的静态词义嵌入(sense embeddings)自动为目标词的每次出现标注词义标签,然后计算目标词的词义分布。最后,使用不同的散度或距离度量来衡量目标词在两个语料库之间的语义变化。
  • results: 实验结果显示,使用时间点数据集可以准确预测英语、德语、瑞典语和拉丁语中字词意思的变化。
    Abstract Semantic Change Detection (SCD) of words is an important task for various NLP applications that must make time-sensitive predictions. Some words are used over time in novel ways to express new meanings, and these new meanings establish themselves as novel senses of existing words. On the other hand, Word Sense Disambiguation (WSD) methods associate ambiguous words with sense ids, depending on the context in which they occur. Given this relationship between WSD and SCD, we explore the possibility of predicting whether a target word has its meaning changed between two corpora collected at different time steps, by comparing the distributions of senses of that word in each corpora. For this purpose, we use pretrained static sense embeddings to automatically annotate each occurrence of the target word in a corpus with a sense id. Next, we compute the distribution of sense ids of a target word in a given corpus. Finally, we use different divergence or distance measures to quantify the semantic change of the target word across the two given corpora. Our experimental results on SemEval 2020 Task 1 dataset show that word sense distributions can be accurately used to predict semantic changes of words in English, German, Swedish and Latin.
    摘要 词语的语义变化检测(SCD)对许多需要进行时间敏感预测的NLP应用而言是一项重要任务。一些词语会随着时间被用于表达新的含义,这些新含义逐渐确立为既有词语的新词义;另一方面,词义消歧(WSD)方法会根据上下文为歧义词分配词义标签。鉴于WSD与SCD之间的这种联系,我们探索了通过比较目标词在不同时间点采集的两个语料库中的词义分布,来预测该词的含义是否发生了变化。为此,我们使用预训练的静态词义嵌入自动为语料库中目标词的每次出现标注词义标签,进而计算目标词在给定语料库中的词义分布,最后使用不同的散度或距离度量来量化目标词在两个语料库之间的语义变化。在SemEval 2020 Task 1数据集上的实验结果表明,词义分布可以准确地预测英语、德语、瑞典语和拉丁语中词语的语义变化。
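A minimal sketch of the measurement step: tag each occurrence of the target word with a sense id, turn the tags from each time period into a sense distribution, and compare the two distributions with a divergence. The sense ids below are fabricated for illustration; the paper obtains them from pretrained static sense embeddings and also explores other divergence/distance measures.

```python
import numpy as np

def sense_distribution(sense_ids, n_senses):
    """Normalized histogram of sense ids assigned to a target word in one corpus."""
    counts = np.bincount(sense_ids, minlength=n_senses).astype(float)
    return counts / counts.sum()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two sense distributions."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy example: sense 2 of the target word becomes dominant in the later corpus.
ids_1990s = np.array([0, 0, 1, 0, 2, 0, 1, 0])
ids_2020s = np.array([2, 2, 0, 2, 2, 1, 2, 2])
p, q = sense_distribution(ids_1990s, 3), sense_distribution(ids_2020s, 3)
print(round(js_divergence(p, q), 3))  # larger value -> stronger semantic change
```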

Towards Open World Active Learning for 3D Object Detection

  • paper_url: http://arxiv.org/abs/2310.10391
  • repo_url: None
  • paper_authors: Zhuoxiao Chen, Yadan Luo, Zixin Wang, Zijian Wang, Xin Yu, Zi Huang
  • for: 本研究旨在解决开放世界3D对象检测中新类不断出现的挑战,即高效地选择少量3D框进行标注,以同时提升对已知类和未知类的检测性能。
  • methods: 本研究提出了一种名为OpenCRB的简单有效的主动学习策略,通过融合关系约束,选择最有价值的3D框进行标注,以最小化标注成本。
  • results: 在开放世界3D对象检测任务上,OpenCRB能够以极少的标注成本,高效地检测已知类和未知类目标。
    Abstract Significant strides have been made in closed world 3D object detection, testing systems in environments with known classes. However, the challenge arises in open world scenarios where new object classes appear. Existing efforts sequentially learn novel classes from streams of labeled data at a significant annotation cost, impeding efficient deployment to the wild. To seek effective solutions, we investigate a more practical yet challenging research task: Open World Active Learning for 3D Object Detection (OWAL-3D), aiming at selecting a small number of 3D boxes to annotate while maximizing detection performance on both known and unknown classes. The core difficulty centers on striking a balance between mining more unknown instances and minimizing the labeling expenses of point clouds. Empirically, our study finds the harmonious and inverse relationship between box quantities and their confidences can help alleviate the dilemma, avoiding the repeated selection of common known instances and focusing on uncertain objects that are potentially unknown. We unify both relational constraints into a simple and effective AL strategy namely OpenCRB, which guides to acquisition of informative point clouds with the least amount of boxes to label. Furthermore, we develop a comprehensive codebase for easy reproducing and future research, supporting 15 baseline methods (i.e., active learning, out-of-distribution detection and open world detection), 2 types of modern 3D detectors (i.e., one-stage SECOND and two-stage PV-RCNN) and 3 benchmark 3D datasets (i.e., KITTI, nuScenes and Waymo). Extensive experiments evidence that the proposed Open-CRB demonstrates superiority and flexibility in recognizing both novel and shared categories with very limited labeling costs, compared to state-of-the-art baselines.
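
Illustrative note (not from the paper): the abstract describes balancing box counts against box confidences when choosing which point clouds to annotate, but does not spell out the exact criterion. The sketch below is a hypothetical, minimal version of such an acquisition rule; the function names, weights, and the greedy budgeted selection are illustrative assumptions only.

```python
import numpy as np

def acquisition_scores(frame_boxes, conf_weight=1.0, count_weight=0.1):
    """Score each unlabeled frame: favour uncertain boxes (potentially unknown classes),
    penalise frames crowded with many confident, already-known detections."""
    scores = []
    for confs in frame_boxes:                 # confs: predicted box confidences in one frame
        confs = np.asarray(confs, dtype=float)
        uncertainty = float((1.0 - confs).mean()) if confs.size else 0.0
        scores.append(conf_weight * uncertainty - count_weight * confs.size)
    return np.asarray(scores)

def select_frames(frame_boxes, box_budget):
    """Greedily pick frames by score until the total number of boxes to label hits the budget."""
    chosen, used = [], 0
    for idx in np.argsort(-acquisition_scores(frame_boxes)):
        n_boxes = len(frame_boxes[idx])
        if used + n_boxes <= box_budget:
            chosen.append(int(idx))
            used += n_boxes
    return chosen, used

# toy usage: three frames with predicted box confidences
frames = [[0.95, 0.90, 0.92], [0.30, 0.40], [0.55]]
print(select_frames(frames, box_budget=3))    # picks the uncertain, lightly-populated frames first
```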

Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models

  • paper_url: http://arxiv.org/abs/2310.10378
  • repo_url: https://github.com/Betswish/Cross-Lingual-Consistency
  • paper_authors: Jirui Qi, Raquel Fernández, Arianna Bisazza
  • for: 这篇论文旨在研究多种多语言预训练语言模型(PLM)中事实知识的跨语言一致性(CLC),并确定影响CLC的决定因素。
  • methods: 作者们提出了一种基于排名的一致性(RankC)度量,用于独立于准确率地评估不同语言之间的知识一致性,并从模型层面和语言对层面深入分析了CLC的决定因素。
  • results: 作者们发现,增大模型规模通常会提高大多数语言中的事实探测精度,但不会提高跨语言一致性;对通过模型编辑插入新事实关联的案例研究显示,新知识只会迁移到与英语RankC分数较高的语言中。
    Abstract Multilingual large-scale Pretrained Language Models (PLMs) have been shown to store considerable amounts of factual knowledge, but large variations are observed across languages. With the ultimate goal of ensuring that users with different language backgrounds obtain consistent feedback from the same model, we study the cross-lingual consistency (CLC) of factual knowledge in various multilingual PLMs. To this end, we propose a Ranking-based Consistency (RankC) metric to evaluate knowledge consistency across languages independently from accuracy. Using this metric, we conduct an in-depth analysis of the determining factors for CLC, both at model level and at language-pair level. Among other results, we find that increasing model size leads to higher factual probing accuracy in most languages, but does not improve cross-lingual consistency. Finally, we conduct a case study on CLC when new factual associations are inserted in the PLMs via model editing. Results on a small sample of facts inserted in English reveal a clear pattern whereby the new piece of knowledge transfers only to languages with which English has a high RankC score.
    摘要 多语言大规模预训练语言模型(PLM)已被证明存储了大量事实知识,但不同语言之间差异显著。为确保不同语言背景的用户能从同一模型获得一致的反馈,我们研究了多种多语言PLM中事实知识的跨语言一致性(CLC)。为此,我们提出了一种基于排名的一致性(RankC)度量,在不受准确率影响的情况下评估跨语言的知识一致性。利用该度量,我们从模型层面和语言对层面深入分析了CLC的决定因素。我们发现,增大模型规模通常能提高大多数语言的事实探测准确率,但并不能改善跨语言一致性。最后,我们对通过模型编辑向PLM中插入新事实关联时的CLC进行了案例研究:对一小部分以英语插入的事实的结果显示出清晰的规律——新知识只会迁移到与英语RankC分数较高的语言中。
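
Illustrative note (not from the paper): RankC's exact weighting is defined in the paper; the snippet below is a simplified, hypothetical rank-overlap score that only illustrates the idea of comparing ranked candidate lists across languages independently of whether the top answer is correct.

```python
import numpy as np

def rank_overlap(ranked_a, ranked_b, k=5):
    """Consistency score for one fact: rank-weighted overlap between the top-k candidate
    lists produced for two languages (1.0 = identical top-k, 0.0 = disjoint)."""
    top_a, top_b = ranked_a[:k], ranked_b[:k]
    weights = 1.0 / np.arange(1, len(top_a) + 1)          # higher-ranked candidates count more
    hits = np.array([c in top_b for c in top_a], dtype=float)
    return float((weights * hits).sum() / weights.sum())

def cross_lingual_consistency(facts_lang_a, facts_lang_b, k=5):
    """Average the per-fact overlap over a probe set; answer correctness never enters the score."""
    return float(np.mean([rank_overlap(a, b, k) for a, b in zip(facts_lang_a, facts_lang_b)]))

# toy usage: ranked object candidates for one probed fact in two languages
en = [["Paris", "Lyon", "Nice", "Rome", "Berlin"]]
es = [["Paris", "Madrid", "Lyon", "Rome", "Oslo"]]
print(cross_lingual_consistency(en, es))
```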

GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers

  • paper_url: http://arxiv.org/abs/2310.10375
  • repo_url: https://github.com/autonomousvision/gta
  • paper_authors: Takeru Miyato, Bernhard Jaeger, Max Welling, Andreas Geiger
  • for: 提高3D视觉任务中transformer模型的学习效率和性能,无需额外学习参数,只带有较少的计算开销。
  • methods: 基于查询与键值对之间几何关系所确定的相对变换来编码token的几何结构,提出了一种几何感知注意力机制(Geometric Transform Attention,GTA)。
  • results: 在新视角合成(NVS)任务中,GTA提高了state-of-the-art transformer-based NVS模型的学习效率和性能,无需额外学习参数,只带来较少的计算开销。
    Abstract As transformers are equivariant to the permutation of input tokens, encoding the positional information of tokens is necessary for many tasks. However, since existing positional encoding schemes have been initially designed for NLP tasks, their suitability for vision tasks, which typically exhibit different structural properties in their data, is questionable. We argue that existing positional encoding schemes are suboptimal for 3D vision tasks, as they do not respect their underlying 3D geometric structure. Based on this hypothesis, we propose a geometry-aware attention mechanism that encodes the geometric structure of tokens as relative transformation determined by the geometric relationship between queries and key-value pairs. By evaluating on multiple novel view synthesis (NVS) datasets in the sparse wide-baseline multi-view setting, we show that our attention, called Geometric Transform Attention (GTA), improves learning efficiency and performance of state-of-the-art transformer-based NVS models without any additional learned parameters and only minor computational overhead.
    摘要 由于transformer对输入token的排列是等变的,许多任务都需要对token的位置信息进行编码。然而,现有的位置编码方案最初是为NLP任务设计的,它们是否适用于数据结构特性不同的视觉任务仍有疑问。我们认为现有的位置编码方案对3D视觉任务并非最优,因为它们没有尊重其底层的3D几何结构。基于这一假设,我们提出了一种几何感知注意力机制,它将token的几何结构编码为由查询与键值对之间几何关系所确定的相对变换。我们在稀疏宽基线多视图设定下的多个新视角合成(NVS)数据集上进行了评估,结果表明我们提出的几何变换注意力(GTA)能在不增加任何可学习参数、仅带来少量计算开销的情况下,提升最先进的基于transformer的NVS模型的学习效率和性能。
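
Illustrative note (not from the paper): GTA applies relative geometric transformations inside attention for multi-view 3D tasks using camera/SE(3) geometry. The sketch below is a toy single-head SO(2) analogue (per-token angles standing in for camera poses), meant only to illustrate "rotate keys/values into the query's frame, then attend"; it is not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def rot(theta):
    """2x2 rotation matrices for a tensor of angles; output shape = theta.shape + (2, 2)."""
    c, s = torch.cos(theta), torch.sin(theta)
    return torch.stack((torch.stack((c, -s), -1), torch.stack((s, c), -1)), -2)

def geometric_attention(q, k, v, angles):
    """q, k, v: (n, d) token features with d even; angles: (n,) toy per-token pose in SO(2).
    Keys and values are rotated into each query's frame, so the attention pattern depends on
    the *relative* geometry between tokens rather than on learned absolute position codes."""
    n, d = q.shape
    R = rot(angles[:, None] - angles[None, :])                 # (n, n, 2, 2) relative rotations
    k_pl = k.view(n, d // 2, 2)                                # features as d/2 two-dim planes
    v_pl = v.view(n, d // 2, 2)
    k_rel = torch.einsum('ijab,jpb->ijpa', R, k_pl).reshape(n, n, d)
    v_rel = torch.einsum('ijab,jpb->ijpa', R, v_pl).reshape(n, n, d)
    scores = torch.einsum('id,ijd->ij', q, k_rel) / d ** 0.5
    attn = F.softmax(scores, dim=-1)
    return torch.einsum('ij,ijd->id', attn, v_rel)

n, d = 4, 8
out = geometric_attention(torch.randn(n, d), torch.randn(n, d), torch.randn(n, d),
                          torch.rand(n) * 3.14)
print(out.shape)   # torch.Size([4, 8])
```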

Prompt Tuning for Multi-View Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.10362
  • repo_url: None
  • paper_authors: Chenghua Gong, Xiang Li, Jianxiang Yu, Cheng Yao, Jiaqi Tan, Chengcheng Yu, Dawei Yin
  • for: 解决传统GNN存在的标签依赖和泛化性能差等问题,采用“预训练+微调/提示”范式降低标注需求。
  • methods: 提出了一种多视图图对比学习方法作为前置(pretext)任务,并为其设计了一种提示调优(prompt tuning)方法来衔接前置任务与下游任务。
  • results: 通过对多个 benchmark 数据集进行广泛的实验,证明了我们的提议可以有效地提高 GNN 的性能。
    Abstract In recent years, "pre-training and fine-tuning" has emerged as a promising approach in addressing the issues of label dependency and poor generalization performance in traditional GNNs. To reduce labeling requirement, the "pre-train, fine-tune" and "pre-train, prompt" paradigms have become increasingly common. In particular, prompt tuning is a popular alternative to "pre-training and fine-tuning" in natural language processing, which is designed to narrow the gap between pre-training and downstream objectives. However, existing study of prompting on graphs is still limited, lacking a framework that can accommodate commonly used graph pre-training methods and downstream tasks. In this paper, we propose a multi-view graph contrastive learning method as pretext and design a prompting tuning for it. Specifically, we first reformulate graph pre-training and downstream tasks into a common format. Second, we construct multi-view contrasts to capture relevant information of graphs by GNN. Third, we design a prompting tuning method for our multi-view graph contrastive learning method to bridge the gap between pretexts and downsteam tasks. Finally, we conduct extensive experiments on benchmark datasets to evaluate and analyze our proposed method.
    摘要 近年来,“预训练与微调”已成为解决传统GNN标签依赖和泛化性能差等问题的一种有前景的方法。为进一步降低标注需求,“预训练、微调”与“预训练、提示”两种范式日益普及。特别是,提示调优是自然语言处理中“预训练与微调”的一种流行替代方案,旨在缩小预训练目标与下游目标之间的差距。然而,现有针对图的提示研究仍然有限,缺乏能够兼容常用图预训练方法和下游任务的统一框架。在这篇论文中,我们提出了一种多视图图对比学习方法作为前置任务,并为其设计了提示调优方法。具体来说,我们首先将图预训练与下游任务重新表述为统一的格式;其次,我们构造多视图对比,借助GNN捕捉图中的相关信息;第三,我们为多视图图对比学习方法设计提示调优方法,以桥接前置任务与下游任务之间的差距。最后,我们在基准数据集上进行了大量实验,对所提方法进行评估与分析。
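
Illustrative note (not from the paper): the paper's specific multi-view construction and prompt design are not reproduced here. The snippet shows the standard InfoNCE-style contrastive objective that such graph pretext pipelines typically build on, with random tensors standing in for a GNN encoder's embeddings of two augmented views.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.2):
    """Contrastive pretext loss between two views of the same batch of graphs.
    z1, z2: (batch, dim) graph embeddings from two augmented views; matching rows are positives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                      # (batch, batch) scaled cosine similarities
    targets = torch.arange(z1.size(0))              # the i-th row's positive is the i-th column
    return F.cross_entropy(logits, targets)

# toy usage with random "graph embeddings" standing in for a GNN encoder's outputs
z_view1, z_view2 = torch.randn(16, 64), torch.randn(16, 64)
print(info_nce(z_view1, z_view2).item())
```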

Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs

  • paper_url: http://arxiv.org/abs/2310.10358
  • repo_url: None
  • paper_authors: Ananya Singha, José Cambronero, Sumit Gulwani, Vu Le, Chris Parnin
  • for: 这个论文旨在研究大型自然语言模型(LLM)在表格任务中使用上下文学习,并评估不同格式的表格提示表示的影响。
  • methods: 作者参照先前的工作,生成了一系列自监督的结构任务(例如,定位到某个单元格或行;转置表格),并评估不同格式下LLM表现的差异。此外,作者还引入了8种噪声操作,以模拟真实世界中的杂乱数据和对抗性输入,并证明这些操作会影响LLM在不同结构理解任务中的表现。
  • results: 研究发现,不同格式下LLM的表现存在显著差异,噪声操作同样会影响LLM的表现。这些结果表明,在选择表格提示格式时,需要考虑表格的结构和噪声特征,以优化LLM的表现。
    Abstract Large language models (LLMs) are increasingly applied for tabular tasks using in-context learning. The prompt representation for a table may play a role in the LLMs ability to process the table. Inspired by prior work, we generate a collection of self-supervised structural tasks (e.g. navigate to a cell and row; transpose the table) and evaluate the performance differences when using 8 formats. In contrast to past work, we introduce 8 noise operations inspired by real-world messy data and adversarial inputs, and show that such operations can impact LLM performance across formats for different structural understanding tasks.
    摘要 大型语言模型(LLM)越来越多地通过上下文学习被应用于表格任务,而表格的提示表示方式可能影响LLM处理表格的能力。受先前工作启发,我们生成了一组自监督的结构任务(例如:定位到某个单元格或行;转置表格),并评估了8种格式之间的性能差异。与以往工作不同,我们还引入了8种受真实世界杂乱数据和对抗性输入启发的噪声操作,并证明这些操作会在不同格式下影响LLM在各类结构理解任务中的表现。
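
Illustrative note (not from the paper): the paper defines 8 specific noise operations and several prompt formats that the abstract does not enumerate. Below is a hypothetical sketch of two such operations (column shuffling and cell typos) plus one serialization format (markdown), to show how noisy variants of a table prompt could be produced before querying an LLM.

```python
import random

def shuffle_columns(table, seed=0):
    """Noise op 1: permute the column order consistently for the header and every row."""
    rng = random.Random(seed)
    idx = list(range(len(table[0])))
    rng.shuffle(idx)
    return [[row[i] for i in idx] for row in table]

def typo_cells(table, p=0.2, seed=0):
    """Noise op 2: drop the last character of a fraction p of non-header cells."""
    rng = random.Random(seed)
    out = [list(table[0])]
    for row in table[1:]:
        out.append([c[:-1] if c and rng.random() < p else c for c in row])
    return out

def to_markdown(table):
    """Serialise the table in one candidate prompt format (markdown) before feeding it to the LLM."""
    head, *rows = table
    lines = ["| " + " | ".join(head) + " |", "|" + "---|" * len(head)]
    lines += ["| " + " | ".join(r) + " |" for r in rows]
    return "\n".join(lines)

table = [["city", "country", "population"],
         ["Lyon", "France", "522000"],
         ["Graz", "Austria", "291000"]]
print(to_markdown(typo_cells(shuffle_columns(table))))
```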

Compressed Sensing of Generative Sparse-latent (GSL) Signals

  • paper_url: http://arxiv.org/abs/2310.15119
  • repo_url: None
  • paper_authors: Antoine Honoré, Anubhab Ghosh, Saikat Chatterjee
  • for: 该研究旨在压缩感知(CS)设置下重建环境信号,其中环境信号由带稀疏潜变量输入的神经网络生成模型刻画。
  • methods: 该研究采用神经网络生成模型,并使用基于梯度的搜索来求解本质上非凸的、诱导稀疏性的重建问题。
  • results: 模拟实验结果表明,基于梯度的搜索能够取得良好的重建性能。
    Abstract We consider reconstruction of an ambient signal in a compressed sensing (CS) setup where the ambient signal has a neural network based generative model. The generative model has a sparse-latent input and we refer to the generated ambient signal as generative sparse-latent signal (GSL). The proposed sparsity inducing reconstruction algorithm is inherently non-convex, and we show that a gradient based search provides a good reconstruction performance. We evaluate our proposed algorithm using simulated data.
    摘要 我们考虑在压缩感知(CS)设置下重建环境信号,该信号由基于神经网络的生成模型刻画。该生成模型的输入为稀疏潜变量,我们将生成的环境信号称为生成式稀疏潜变量信号(GSL)。我们提出的诱导稀疏性的重建算法本质上是非凸的,我们表明基于梯度的搜索可以获得良好的重建性能。我们使用模拟数据对所提算法进行了评估。
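
Illustrative note (not from the paper): a minimal sketch of the setup described, assuming measurements y = A·G(z) with a sparse latent z, recovered by gradient-based search on a non-convex, sparsity-regularized objective. The generator below is an untrained stand-in, and the regularization weight and optimizer settings are illustrative assumptions.

```python
import torch

# untrained stand-in for the pretrained sparse-latent generator G(z); real use would load a trained model
G = torch.nn.Sequential(torch.nn.Linear(32, 128), torch.nn.ReLU(), torch.nn.Linear(128, 256))

def reconstruct(y, A, lam=1e-2, steps=500, lr=1e-2):
    """Gradient-based search over the latent z: minimise ||y - A G(z)||^2 + lam * ||z||_1.
    The objective is non-convex because of G, but plain (sub)gradient descent often works well."""
    z = torch.zeros(32, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.sum((y - A @ G(z)) ** 2) + lam * z.abs().sum()
        loss.backward()
        opt.step()
    return G(z).detach(), z.detach()

# toy usage: m = 64 compressive measurements of an n = 256 dimensional generated signal
torch.manual_seed(0)
A = torch.randn(64, 256) / 64 ** 0.5
x_true = G(torch.randn(32) * (torch.rand(32) < 0.2).float()).detach()   # sparse ground-truth latent
y = A @ x_true
x_hat, z_hat = reconstruct(y, A)
print((torch.norm(x_hat - x_true) / torch.norm(x_true)).item())          # relative reconstruction error
```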

Optimizing Layerwise Polynomial Approximation for Efficient Private Inference on Fully Homomorphic Encryption: A Dynamic Programming Approach

  • paper_url: http://arxiv.org/abs/2310.10349
  • repo_url: None
  • paper_authors: Junghyun Lee, Eunsang Lee, Young-Sik Kim, Yongwoo Lee, Joon-Woo Lee, Yongjune Kim, Jong-Seon No
  • for: 本研究旨在实现基于完全同态加密的隐私保护深度神经网络,但其实际应用受制于过长的推理时间。这主要归因于对ReLU等激活函数采用高次多项式近似,消耗了大量同态计算资源,导致推理变慢。
  • methods: 本研究采用按层(layerwise)优化激活函数近似多项式次数的方法,在保持深度神经网络分类精度的同时缩短推理时间。与以往工作不同,我们不使用最小最大(minimax)近似,而是基于激活函数的输入分布采用加权最小二乘近似;随后,考虑各层近似误差对分类精度的影响,通过动态规划算法求得各层的最优次数。此外,我们还提出按层调整密文模数链,以进一步缩短推理时间。
  • results: 与此前采用统一次数多项式和固定密文模数的实现相比,我们的方法可将ResNet-20和ResNet-32模型的推理时间分别缩短3.44倍和3.16倍。
    Abstract Recent research has explored the implementation of privacy-preserving deep neural networks solely using fully homomorphic encryption. However, its practicality has been limited because of prolonged inference times. When using a pre-trained model without retraining, a major factor contributing to these prolonged inference times is the high-degree polynomial approximation of activation functions such as the ReLU function. The high-degree approximation consumes a substantial amount of homomorphic computational resources, resulting in slower inference. Unlike the previous works approximating activation functions uniformly and conservatively, this paper presents a \emph{layerwise} degree optimization of activation functions to aggressively reduce the inference time while maintaining classification accuracy by taking into account the characteristics of each layer. Instead of the minimax approximation commonly used in state-of-the-art private inference models, we employ the weighted least squares approximation method with the input distributions of activation functions. Then, we obtain the layerwise optimized degrees for activation functions through the \emph{dynamic programming} algorithm, considering how each layer's approximation error affects the classification accuracy of the deep neural network. Furthermore, we propose modulating the ciphertext moduli-chain layerwise to reduce the inference time. By these proposed layerwise optimization methods, we can reduce inference times for the ResNet-20 model and the ResNet-32 model by 3.44 times and 3.16 times, respectively, in comparison to the prior implementations employing uniform degree polynomials and a consistent ciphertext modulus.
    摘要 近期研究探讨了仅使用完全同态加密实现隐私保护深度神经网络。然而,由于推理时间过长,其实用性受到限制。在不重新训练、直接使用预训练模型时,导致推理时间过长的一个主要因素是对ReLU等激活函数的高次多项式近似:高次近似消耗大量同态计算资源,从而拖慢推理。与以往对激活函数进行统一且保守近似的工作不同,本文提出按层(layerwise)优化激活函数近似多项式的次数,结合各层特性,在保持分类精度的同时大幅缩短推理时间。我们不采用当前私密推理模型中常用的最小最大(minimax)近似,而是基于激活函数的输入分布采用加权最小二乘近似;随后,考虑各层近似误差对深度神经网络分类精度的影响,通过动态规划算法求得各层激活函数的最优次数。此外,我们还提出按层调整密文模数链,以进一步缩短推理时间。借助上述分层优化方法,与以往采用统一次数多项式和固定密文模数的实现相比,我们可将ResNet-20和ResNet-32模型的推理时间分别缩短3.44倍和3.16倍。
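
Illustrative note (not from the paper): a toy sketch of the two ingredients described — fitting a polynomial to ReLU under each layer's input distribution (here, sampling from the distribution stands in for the paper's explicit weighted least squares), and a dynamic program that picks per-layer degrees under a rough cost budget. The candidate degrees and cost model are illustrative assumptions, not the paper's FHE cost accounting.

```python
import numpy as np

def fit_relu_poly(samples, degree):
    """Least-squares polynomial fit of ReLU on samples drawn from one layer's input distribution;
    returns the fitted polynomial and its mean-squared approximation error on those samples."""
    relu = np.maximum(samples, 0.0)
    p = np.polynomial.Polynomial.fit(samples, relu, degree)
    return p, float(np.mean((p(samples) - relu) ** 2))

def layerwise_degrees(layer_samples, degrees=(3, 7, 15, 27), budget=12):
    """Toy dynamic program: pick one degree per layer so the summed (integer) evaluation cost
    stays within `budget` while the total approximation error is minimised."""
    cost = {d: int(np.ceil(np.log2(d + 1))) for d in degrees}     # rough multiplicative-depth proxy
    err = [[fit_relu_poly(s, d)[1] for d in degrees] for s in layer_samples]
    n, INF = len(layer_samples), float("inf")
    best = [[INF] * (budget + 1) for _ in range(n + 1)]
    choice = [[None] * (budget + 1) for _ in range(n + 1)]
    best[0] = [0.0] * (budget + 1)
    for i in range(1, n + 1):
        for b in range(budget + 1):
            for j, d in enumerate(degrees):
                if cost[d] <= b and best[i - 1][b - cost[d]] + err[i - 1][j] < best[i][b]:
                    best[i][b] = best[i - 1][b - cost[d]] + err[i - 1][j]
                    choice[i][b] = (j, b - cost[d])
    b, picked = budget, []                                         # backtrack the chosen degrees
    for i in range(n, 0, -1):
        j, b = choice[i][b]
        picked.append(degrees[j])
    return picked[::-1], best[n][budget]

rng = np.random.default_rng(0)
samples = [rng.normal(0, s, 4000) for s in (0.5, 1.0, 3.0)]       # per-layer input distributions
print(layerwise_degrees(samples))
```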

Attribution Patching Outperforms Automated Circuit Discovery

  • paper_url: http://arxiv.org/abs/2310.10348
  • repo_url: None
  • paper_authors: Aaquib Syed, Can Rager, Arthur Conmy
  • for: 这篇论文旨在探讨自动化可解释性研究,以便将对神经网络行为的解释扩展到大型模型。
  • methods: 论文使用归因修补(attribution patching)来自动发现计算子网络(电路)。具体做法是对激活修补进行线性近似,估算计算子图中每条边的重要性,再据此剪除最不重要的边。
  • results: 论文表明,该方法仅需两次前向传播和一次反向传播,即可超越现有的所有方法;在所有任务上取平均,其电路恢复的AUC最高。
    Abstract Automated interpretability research has recently attracted attention as a potential research direction that could scale explanations of neural network behavior to large models. Existing automated circuit discovery work applies activation patching to identify subnetworks responsible for solving specific tasks (circuits). In this work, we show that a simple method based on attribution patching outperforms all existing methods while requiring just two forward passes and a backward pass. We apply a linear approximation to activation patching to estimate the importance of each edge in the computational subgraph. Using this approximation, we prune the least important edges of the network. We survey the performance and limitations of this method, finding that averaged over all tasks our method has greater AUC from circuit recovery than other methods.
    摘要 自动化可解释性研究近期受到关注,被视为有望将对神经网络行为的解释扩展到大型模型的研究方向。现有的自动化电路发现工作使用激活修补(activation patching)来确定负责解决特定任务的子网络(电路)。在这项工作中,我们展示了一种基于归因修补(attribution patching)的简单方法,它超越了所有现有方法,且仅需两次前向传播和一次反向传播。我们对激活修补进行线性近似,以估计计算子图中每条边的重要性,并据此剪除网络中最不重要的边。我们考察了该方法的性能与局限性,发现其在所有任务上的平均电路恢复AUC高于其他方法。
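
Illustrative note (not from the paper): a minimal sketch of the linear approximation behind attribution patching, shown on a toy two-layer network rather than a transformer. The effect of patching a corrupted activation into the clean run is estimated as (corrupted − clean) times the gradient of the metric at the clean activation, using two forward passes and one backward pass.

```python
import torch

torch.manual_seed(0)
layer1, layer2 = torch.nn.Linear(8, 16), torch.nn.Linear(16, 1)   # toy 2-layer "model"

def attribution_patching(x_clean, x_corrupt):
    """First-order estimate of how much patching each hidden unit from the corrupted run into the
    clean run would change the metric: (a_corrupt - a_clean) * d(metric)/d(a_clean).
    Needs two forward passes and one backward pass, instead of one patched forward pass per unit."""
    a_clean = torch.relu(layer1(x_clean))
    a_clean.retain_grad()                      # keep the gradient on this intermediate activation
    layer2(a_clean).sum().backward()           # toy metric = sum of the model output
    with torch.no_grad():
        a_corrupt = torch.relu(layer1(x_corrupt))
        return (a_corrupt - a_clean) * a_clean.grad

scores = attribution_patching(torch.randn(8), torch.randn(8))
print(scores.abs().argsort(descending=True)[:3])   # hidden units whose incoming edges matter most
```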

Unlocking Metasurface Practicality for B5G Networks: AI-assisted RIS Planning

  • paper_url: http://arxiv.org/abs/2310.10330
  • repo_url: None
  • paper_authors: Guillermo Encinas-Lago, Antonio Albanese, Vincenzo Sciancalepore, Marco Di Renzo, Xavier Costa-Pérez
  • for: 本文旨在探讨如何使用可编程智能表面(RIS)来提高无线网络性能,尤其是在 beyond-fifth-generation 网络(B5G)中。
  • methods: 本文使用深度逆向学习(DRL)算法,训练一个 DRL 代理,以便优化 RIS 的投放。
  • results: 本文在法国雷恩火车站的indoor场景中进行了实验,并证明了对于无法覆盖的区域,D-RISA 算法可以提供更好的覆盖率(10 dB 的最小信号噪听比提高),同时具有更低的计算时间(下降至 -25%)和更好的扩展性。
    Abstract The advent of reconfigurable intelligent surfaces(RISs) brings along significant improvements for wireless technology on the verge of beyond-fifth-generation networks (B5G).The proven flexibility in influencing the propagation environment opens up the possibility of programmatically altering the wireless channel to the advantage of network designers, enabling the exploitation of higher-frequency bands for superior throughput overcoming the challenging electromagnetic (EM) propagation properties at these frequency bands. However, RISs are not magic bullets. Their employment comes with significant complexity, requiring ad-hoc deployments and management operations to come to fruition. In this paper, we tackle the open problem of bringing RISs to the field, focusing on areas with little or no coverage. In fact, we present a first-of-its-kind deep reinforcement learning (DRL) solution, dubbed as D-RISA, which trains a DRL agent and, in turn, obtain san optimal RIS deployment. We validate our framework in the indoor scenario of the Rennes railway station in France, assessing the performance of our algorithm against state-of-the-art (SOA) approaches. Our benchmarks showcase better coverage, i.e., 10-dB increase in minimum signal-to-noise ratio (SNR), at lower computational time (up to -25 percent) while improving scalability towards denser network deployments.
    摘要 可重构智能表面(RIS)的出现为迈向超五代(B5G)网络的无线技术带来了显著改进。RIS已被证明能够灵活影响传播环境,使网络设计者可以以可编程方式改变无线信道,从而利用更高频段获得更高吞吐量,克服这些频段上严苛的电磁(EM)传播特性。然而,RIS并非灵丹妙药:它的使用伴随着相当的复杂度,需要专门的部署与管理操作才能发挥作用。在本文中,我们着手解决将RIS落地应用这一开放问题,重点关注覆盖很少或没有覆盖的区域。我们提出了一种首创的深度强化学习(DRL)解决方案,称为D-RISA,它训练一个DRL智能体,进而得到最优的RIS部署。我们在法国雷恩火车站的室内场景中验证了该框架,并与最先进(SOA)方法进行了比较。基准测试显示,我们的方法能提供更好的覆盖(最小信噪比提升10 dB)、更低的计算时间(最多减少25%),并在更密集的网络部署下具有更好的可扩展性。

Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning

  • paper_url: http://arxiv.org/abs/2310.10318
  • repo_url: https://github.com/znlp/functionalspecializationinmha
  • paper_authors: Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, Chengqing Zong
  • for: This paper aims to investigate the functional specialization of multi-head attention in transformer-based models under multi-task learning.
  • methods: The authors propose an interpreting method to quantify the degree of functional specialization in multi-head attention and a simple multi-task training method to increase functional specialization and mitigate negative information transfer.
  • results: Experimental results on seven pre-trained transformer models demonstrate that multi-head attention evolves functional specialization after multi-task training, which is affected by the similarity of tasks. The proposed multi-task training strategy based on functional specialization boosts performance in both multi-task learning and transfer learning without adding any parameters.
    Abstract Transformer-based models, even though achieving super-human performance on several downstream tasks, are often regarded as a black box and used as a whole. It is still unclear what mechanisms they have learned, especially their core module: multi-head attention. Inspired by functional specialization in the human brain, which helps to efficiently handle multiple tasks, this work attempts to figure out whether the multi-head attention module will evolve similar function separation under multi-tasking training. If it is, can this mechanism further improve the model performance? To investigate these questions, we introduce an interpreting method to quantify the degree of functional specialization in multi-head attention. We further propose a simple multi-task training method to increase functional specialization and mitigate negative information transfer in multi-task learning. Experimental results on seven pre-trained transformer models have demonstrated that multi-head attention does evolve functional specialization phenomenon after multi-task training which is affected by the similarity of tasks. Moreover, the multi-task training strategy based on functional specialization boosts performance in both multi-task learning and transfer learning without adding any parameters.
    摘要 基于transformer的模型即便在多个下游任务上取得了超越人类的表现,仍常被当作黑盒整体使用,人们并不清楚它们究竟学到了怎样的机制,尤其是其核心模块:多头注意力。受人脑功能特化(有助于高效处理多任务)的启发,本工作试图弄清多头注意力模块在多任务训练下是否也会演化出类似的功能分化现象;如果是,这一机制能否进一步提升模型性能?为回答这些问题,我们提出了一种量化多头注意力功能特化程度的解释方法。此外,我们还提出了一种简单的多任务训练方法,用以增强功能特化并缓解多任务学习中的负信息迁移。在七个预训练transformer模型上的实验结果表明,多头注意力在多任务训练后确实会出现功能特化现象,且该现象受任务相似度的影响。此外,基于功能特化的多任务训练策略能够在不增加任何参数的情况下,同时提升多任务学习和迁移学习的性能。
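
Illustrative note (not from the paper): the paper's interpreting method for quantifying functional specialization is defined in the paper itself. The snippet below shows one simple, hypothetical way to turn a (task × head) importance matrix into a per-head specialization score, purely to illustrate the kind of quantity involved.

```python
import numpy as np

def specialization(importance, eps=1e-9):
    """importance: (n_tasks, n_heads) non-negative head-importance scores (e.g. |dL_task / d head gate|).
    For each head, return 1 - normalized entropy of its importance across tasks:
    1.0 = the head only matters for a single task, 0.0 = equally important to all tasks."""
    p = importance / (importance.sum(0, keepdims=True) + eps)       # per-head distribution over tasks
    ent = -(p * np.log(p + eps)).sum(0)
    return 1.0 - ent / np.log(importance.shape[0])

imp = np.array([[0.90, 0.10, 0.5],      # task A head importances
                [0.05, 0.80, 0.5]])     # task B head importances
print(specialization(imp).round(2))     # heads 0 and 1 look specialised, head 2 is shared
```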

End-to-end Offline Reinforcement Learning for Glycemia Control

  • paper_url: http://arxiv.org/abs/2310.10312
  • repo_url: None
  • paper_authors: Tristan Beolet, Alice Adenis, Erik Huneker, Maxime Louis
  • for: 这项研究旨在提高闭环系统的血糖控制性能,并使其更能适应不同情况。
  • methods: 该研究使用基于真实患者数据训练的离线RL智能体,并开发了一个端到端个性化管道,从而无需模拟器,同时仍能估计糖尿病相关的临床指标。
  • results: 研究表明,离线RL智能体与个性化管道可以提升闭环系统的血糖控制性能,并避免对模拟器过拟合带来的风险。
    Abstract The development of closed-loop systems for glycemia control in type I diabetes relies heavily on simulated patients. Improving the performances and adaptability of these close-loops raises the risk of over-fitting the simulator. This may have dire consequences, especially in unusual cases which were not faithfully-if at all-captured by the simulator. To address this, we propose to use offline RL agents, trained on real patient data, to perform the glycemia control. To further improve the performances, we propose an end-to-end personalization pipeline, which leverages offline-policy evaluation methods to remove altogether the need of a simulator, while still enabling an estimation of clinically relevant metrics for diabetes.
    摘要 1型糖尿病闭环血糖控制系统的开发严重依赖模拟患者。提高这些闭环系统的性能和适应性会增加对模拟器过拟合的风险,这可能带来严重后果,尤其是在模拟器未能忠实刻画(甚至完全未覆盖)的罕见情形下。为解决这一问题,我们提议使用在真实患者数据上训练的离线RL智能体来执行血糖控制。为进一步提升性能,我们提出了一个端到端个性化管道,利用离线策略评估方法完全去除对模拟器的依赖,同时仍能估计糖尿病相关的临床指标。
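
Illustrative note (not from the paper): the abstract relies on offline-policy evaluation so that no simulator is needed, but does not name the specific estimators. The sketch below shows a generic per-decision importance-sampling estimator over logged episodes as an illustration of simulator-free policy evaluation; the toy data and policies are assumptions.

```python
import numpy as np

def per_decision_is(episodes, pi_target, gamma=0.99):
    """Per-decision importance-sampling estimate of the target policy's value from logged data.
    episodes: list of lists of (state, action, reward, behaviour_prob); pi_target(s, a) gives the
    evaluated policy's probability of the logged action. No simulator is required."""
    values = []
    for ep in episodes:
        rho, v = 1.0, 0.0
        for t, (s, a, r, mu) in enumerate(ep):
            rho *= pi_target(s, a) / mu           # cumulative importance ratio up to step t
            v += (gamma ** t) * rho * r
        values.append(v)
    return float(np.mean(values))

# toy usage: two logged episodes in a 2-action problem; the evaluated policy prefers action 1
log = [[(0, 1, 1.0, 0.5), (1, 0, 0.0, 0.5)],
       [(0, 0, 0.0, 0.5), (1, 1, 1.0, 0.5)]]
print(per_decision_is(log, pi_target=lambda s, a: 0.8 if a == 1 else 0.2))
```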

Learning visual-based deformable object rearrangement with local graph neural networks

  • paper_url: http://arxiv.org/abs/2310.10307
  • repo_url: https://github.com/dengyh16code/deformable-gnn
  • paper_authors: Yuhong Deng, Xueqian Wang, Lipeng chen
  • for: 本研究旨在解决机器人仅凭视觉观察对可变形物体(如绳子和布料)进行重排的问题,即将可变形物体整理到预先设定的目标配置。
  • methods: 本研究提出了一种新的表示策略,利用一组关键点及其交互来高效表示可变形物体的状态;并提出了一种轻量的局部图神经网络(GNN),通过构建并更新两个动态图,同时学习可变形物体的重排动力学并推断最优的抓取与放置动作。
  • results: 对多种可变形重排任务进行了仿真和真实实验,结果表明所提出的动态图表示能更高效地建模可变形重排动力学,且在多任务学习和真实应用中表现出色。
    Abstract Goal-conditioned rearrangement of deformable objects (e.g. straightening a rope and folding a cloth) is one of the most common deformable manipulation tasks, where the robot needs to rearrange a deformable object into a prescribed goal configuration with only visual observations. These tasks are typically confronted with two main challenges: the high dimensionality of deformable configuration space and the underlying complexity, nonlinearity and uncertainty inherent in deformable dynamics. To address these challenges, we propose a novel representation strategy that can efficiently model the deformable object states with a set of keypoints and their interactions. We further propose local-graph neural network (GNN), a light local GNN learning to jointly model the deformable rearrangement dynamics and infer the optimal manipulation actions (e.g. pick and place) by constructing and updating two dynamic graphs. Both simulated and real experiments have been conducted to demonstrate that the proposed dynamic graph representation shows superior expressiveness in modeling deformable rearrangement dynamics. Our method reaches much higher success rates on a variety of deformable rearrangement tasks (96.3% on average) than state-of-the-art method in simulation experiments. Besides, our method is much more lighter and has a 60% shorter inference time than state-of-the-art methods. We also demonstrate that our method performs well in the multi-task learning scenario and can be transferred to real-world applications with an average success rate of 95% by solely fine tuning a keypoint detector.
    摘要 目标条件下的可变形物体重排(例如拉直绳子、折叠布料)是最常见的可变形操作任务之一:机器人需要仅凭视觉观察,将可变形物体整理到预先给定的目标配置。这类任务通常面临两大挑战:一是可变形物体配置空间维度极高,二是可变形动力学本身的复杂性、非线性与不确定性。为应对这些挑战,我们提出了一种新的表示策略,用一组关键点及其交互高效地刻画可变形物体的状态。我们进一步提出了一种轻量的局部图神经网络(GNN),通过构建并更新两个动态图,联合建模可变形重排动力学并推断最优操作动作(如抓取与放置)。仿真与真实实验均表明,所提出的动态图表示在建模可变形重排动力学方面具有更强的表达能力。在多种可变形重排任务上,我们的方法在仿真实验中取得了远高于最先进方法的成功率(平均96.3%)。此外,我们的方法更加轻量,推理时间比最先进方法缩短60%。我们还证明了该方法在多任务学习场景下表现良好,仅需微调关键点检测器即可迁移到真实世界应用,平均成功率达95%。
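
Illustrative note (not from the paper): a toy sketch of the representation described — deformable-object state as a graph over detected keypoints, processed by one local message-passing layer. The k-NN graph construction, feature choice, and layer sizes are illustrative assumptions; the paper's planner additionally builds and updates two dynamic graphs and predicts pick-and-place actions.

```python
import torch

def knn_graph(keypoints, k=3):
    """Build directed edges from each detected keypoint to its k nearest neighbours."""
    d = torch.cdist(keypoints, keypoints)
    d.fill_diagonal_(float("inf"))
    nbrs = d.topk(k, largest=False).indices                  # (n, k) neighbour indices
    src = torch.arange(keypoints.size(0)).repeat_interleave(k)
    return torch.stack([src, nbrs.reshape(-1)])              # (2, n*k) edge index

class LocalGNNLayer(torch.nn.Module):
    """One message-passing step over the keypoint graph (mean aggregation of neighbour messages)."""
    def __init__(self, dim):
        super().__init__()
        self.msg = torch.nn.Linear(2 * dim, dim)
        self.upd = torch.nn.Linear(2 * dim, dim)

    def forward(self, x, edge_index):
        src, dst = edge_index
        m = torch.relu(self.msg(torch.cat([x[src], x[dst]], dim=-1)))
        agg = torch.zeros_like(x).index_add_(0, dst, m)
        deg = torch.bincount(dst, minlength=x.size(0)).clamp(min=1).unsqueeze(-1)
        return torch.relu(self.upd(torch.cat([x, agg / deg], dim=-1)))

# toy usage: 6 keypoints detected on a rope image, 2-D positions used as initial features
kp = torch.rand(6, 2)
feat = LocalGNNLayer(2)(kp, knn_graph(kp, k=2))
print(feat.shape)   # torch.Size([6, 2])
```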

Forking Uncertainties: Reliable Prediction and Model Predictive Control with Sequence Models via Conformal Risk Control

  • paper_url: http://arxiv.org/abs/2310.10299
  • repo_url: None
  • paper_authors: Matteo Zecchin, Sangwoo Park, Osvaldo Simeone
  • for: 本文旨在为基于概率隐式或显式序列模型的预测提供一种不确定性管理方法,以便在具有复杂动力学和分支轨迹的信息物理系统中提供可靠性与安全性保证。
  • methods: 本文提出了概率时间序列-保形风险预测(PTS-CRC)方法,该方法基于从序列模型采样得到的多条原型轨迹集成来构造预测集,从而刻画分支(forking)不确定性。此外,PTS-CRC还能满足覆盖率之外的可靠性定义,从而支持设计更高效的控制策略。
  • results: 实验结果表明,PTS-CRC预测器能够提供信息量更高的预测集,以及满足安全或质量约束的控制策略,并在无线网络的多种任务中取得更高的回报。
    Abstract In many real-world problems, predictions are leveraged to monitor and control cyber-physical systems, demanding guarantees on the satisfaction of reliability and safety requirements. However, predictions are inherently uncertain, and managing prediction uncertainty presents significant challenges in environments characterized by complex dynamics and forking trajectories. In this work, we assume access to a pre-designed probabilistic implicit or explicit sequence model, which may have been obtained using model-based or model-free methods. We introduce probabilistic time series-conformal risk prediction (PTS-CRC), a novel post-hoc calibration procedure that operates on the predictions produced by any pre-designed probabilistic forecaster to yield reliable error bars. In contrast to existing art, PTS-CRC produces predictive sets based on an ensemble of multiple prototype trajectories sampled from the sequence model, supporting the efficient representation of forking uncertainties. Furthermore, unlike the state of the art, PTS-CRC can satisfy reliability definitions beyond coverage. This property is leveraged to devise a novel model predictive control (MPC) framework that addresses open-loop and closed-loop control problems under general average constraints on the quality or safety of the control policy. We experimentally validate the performance of PTS-CRC prediction and control by studying a number of use cases in the context of wireless networking. Across all the considered tasks, PTS-CRC predictors are shown to provide more informative predictive sets, as well as safe control policies with larger returns.
    摘要 在许多实际问题中,预测被用于监控和控制信息物理系统,因而需要保证可靠性与安全性要求得到满足。然而,预测本身具有不确定性,在动力学复杂、轨迹会分支的环境中管理预测不确定性面临重大挑战。在这项工作中,我们假设已有一个预先设计的概率隐式或显式序列模型,该模型可以由基于模型或无模型的方法获得。我们提出了一种新的事后校准方法,即概率时间序列-保形风险预测(PTS-CRC),它作用于任意预先设计的概率预测器的输出之上,以产生可靠的误差范围。与现有方法不同,PTS-CRC基于从序列模型采样得到的多条原型轨迹构造预测集,从而高效地表示分支不确定性。此外,PTS-CRC能够满足覆盖率之外的可靠性定义;我们利用这一性质,提出了一种新的模型预测控制(MPC)框架,用于求解在控制策略质量或安全性的一般平均约束下的开环与闭环控制问题。我们通过无线网络中的若干用例实验验证了PTS-CRC预测与控制的性能。在所有考虑的任务中,PTS-CRC预测器都能给出信息量更高的预测集,以及回报更高的安全控制策略。
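
Illustrative note (not from the paper): PTS-CRC calibrates predictive sets built from an ensemble of trajectories sampled from the sequence model, using conformal *risk* control. The sketch below shows a simpler split-conformal, coverage-style variant of that idea (calibrating a tube radius around K sampled trajectories); the risk-control generalization and the MPC integration from the paper are not reproduced, and the toy forking model is an assumption.

```python
import numpy as np

def calibrate_radius(sample_fn, calib_inputs, calib_truths, alpha=0.1, K=20):
    """Split-conformal calibration: for each calibration input, draw K trajectories from the sequence
    model and record the distance from the true trajectory to the *closest* sample; the empirical
    (1 - alpha) quantile of these scores becomes the tube radius used around future samples."""
    scores = []
    for x, y_true in zip(calib_inputs, calib_truths):
        samples = np.stack([sample_fn(x) for _ in range(K)])          # (K, horizon)
        scores.append(np.min(np.max(np.abs(samples - y_true), axis=1)))
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(scores, level, method="higher"))

def predictive_set(sample_fn, x, radius, K=20):
    """Predictive set = union of tubes of the calibrated radius around K fresh samples,
    so the set itself can fork when the sampled prototype trajectories fork."""
    return np.stack([sample_fn(x) for _ in range(K)]), radius

# toy forking model: trajectories go up or down with equal probability
rng = np.random.default_rng(0)
sample_fn = lambda x: x + np.cumsum(rng.choice([-1.0, 1.0]) * np.ones(10)) + 0.1 * rng.standard_normal(10)
calib_x = rng.standard_normal(200)
calib_y = np.stack([sample_fn(x) for x in calib_x])
radius = calibrate_radius(sample_fn, calib_x, calib_y)
trajectories, r = predictive_set(sample_fn, 0.0, radius)
print(trajectories.shape, round(r, 2))
```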

Key-phrase boosted unsupervised summary generation for FinTech organization

  • paper_url: http://arxiv.org/abs/2310.10294
  • repo_url: None
  • paper_authors: Aadit Deshpande, Shreya Goyal, Prateek Nagwanshi, Avinash Tripathy
  • for: 这篇论文旨在提出一种基于Action-Object对的无监督社交媒体摘要生成方法,帮助金融科技公司更好地利用社交媒体语言数据,并为分析消费者行为提供外部视角。
  • methods: 这篇论文使用了NLP技术,特别是意向检测、情感分类和文本概要生成等应用,以处理社交媒体语言数据。它还提出了一种基于Action-Object对的自动生成社交媒体摘要方法,以增强对社交媒体语言数据的分析和利用。
  • results: 该论文对来自Reddit讨论串的社交媒体语言数据进行了分析,并评估了基于Action-Object对的摘要方法,验证了其有效性。具体来说,该方法在各项上下文指标(Context Metrics)上表现出显著优势,包括独特词数、Action-Object对数和名词块数量。
    Abstract With the recent advances in social media, the use of NLP techniques in social media data analysis has become an emerging research direction. Business organizations can particularly benefit from such an analysis of social media discourse, providing an external perspective on consumer behavior. Some of the NLP applications such as intent detection, sentiment classification, text summarization can help FinTech organizations to utilize the social media language data to find useful external insights and can be further utilized for downstream NLP tasks. Particularly, a summary which highlights the intents and sentiments of the users can be very useful for these organizations to get an external perspective. This external perspective can help organizations to better manage their products, offers, promotional campaigns, etc. However, certain challenges, such as a lack of labeled domain-specific datasets impede further exploration of these tasks in the FinTech domain. To overcome these challenges, we design an unsupervised phrase-based summary generation from social media data, using 'Action-Object' pairs (intent phrases). We evaluated the proposed method with other key-phrase based summary generation methods in the direction of contextual information of various Reddit discussion threads, available in the different summaries. We introduce certain "Context Metrics" such as the number of Unique words, Action-Object pairs, and Noun chunks to evaluate the contextual information retrieved from the source text in these phrase-based summaries. We demonstrate that our methods significantly outperform the baseline on these metrics, thus providing a qualitative and quantitative measure of their efficacy. Proposed framework has been leveraged as a web utility portal hosted within Amex.
    摘要 在金融科技领域探索这些任务的一个关键挑战是缺乏带标注的领域数据集。为克服这一挑战,我们设计了一种基于“Action-Object”对(意图短语)的无监督短语式社交媒体摘要生成方法,并从源文本中提取的上下文信息的角度,将其与其他基于关键短语的摘要生成方法进行了比较。为评估上下文信息,我们引入了若干“上下文指标”,如独特词数、Action-Object对数和名词块数。我们的方法在这些指标上显著优于基线,从定性和定量两方面说明了其有效性。该框架已作为一个Web工具门户部署在Amex内部。
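
Illustrative note (not from the paper): a sketch of how "Action-Object" pairs and the mentioned context metrics (unique words, Action-Object pairs, noun chunks) could be computed with an off-the-shelf dependency parser. It assumes spaCy with the en_core_web_sm model installed; the paper's actual extraction and key-phrase boosting pipeline may differ.

```python
import spacy

# assumes the small English model is available: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def action_object_pairs(text):
    """Extract (verb, direct-object) lemma pairs — the 'Action-Object' intent phrases."""
    doc = nlp(text)
    return [(tok.head.lemma_, tok.lemma_)
            for tok in doc if tok.dep_ == "dobj" and tok.head.pos_ == "VERB"]

def context_metrics(summary):
    """Context metrics of a candidate summary: unique words, Action-Object pairs, noun chunks."""
    doc = nlp(summary)
    return {"unique_words": len({t.lower_ for t in doc if t.is_alpha}),
            "action_object_pairs": len(action_object_pairs(summary)),
            "noun_chunks": len(list(doc.noun_chunks))}

post = "I closed my credit card because the bank raised the annual fee and ignored my complaint."
print(action_object_pairs(post))
print(context_metrics(post))
```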

No Compromise in Solution Quality: Speeding Up Belief-dependent Continuous POMDPs via Adaptive Multilevel Simplification

  • paper_url: http://arxiv.org/abs/2310.10274
  • repo_url: None
  • paper_authors: Andrey Zhitnikov, Ori Sztyglic, Vadim Indelman
  • for: 这篇论文是关于Continuous POMDPs with general belief-dependent rewards的解决方案。
  • methods: 本论文使用了一种名为“adaptive multilevel simplification”的方法,具体来说是在给定的信念树和MCTS的基础上实现POMDP的线上规划。这种方法可以快速加速POMDP的规划,而不会失去解决方案的质量。
  • results: 本论文提出了三种算法来加速Continuous POMDP的规划,其中两种算法(SITH-BSP和LAZY-SITH-BSP)可以在任何信念树构建方法上使用,第三种算法(SITH-PFT)是一种可以适应任何探索技术的任何时间MCTS方法。所有这些算法都能够返回与未加速的算法相同的优化的动作。此外,本论文还提出了一种新的信息论 reward的代价计算方法,该方法可以轻松计算,并且可以通过需求的精细化来紧张化。
    Abstract Continuous POMDPs with general belief-dependent rewards are notoriously difficult to solve online. In this paper, we present a complete provable theory of adaptive multilevel simplification for the setting of a given externally constructed belief tree and MCTS that constructs the belief tree on the fly using an exploration technique. Our theory allows to accelerate POMDP planning with belief-dependent rewards without any sacrifice in the quality of the obtained solution. We rigorously prove each theoretical claim in the proposed unified theory. Using the general theoretical results, we present three algorithms to accelerate continuous POMDP online planning with belief-dependent rewards. Our two algorithms, SITH-BSP and LAZY-SITH-BSP, can be utilized on top of any method that constructs a belief tree externally. The third algorithm, SITH-PFT, is an anytime MCTS method that permits to plug-in any exploration technique. All our methods are guaranteed to return exactly the same optimal action as their unsimplified equivalents. We replace the costly computation of information-theoretic rewards with novel adaptive upper and lower bounds which we derive in this paper, and are of independent interest. We show that they are easy to calculate and can be tightened by the demand of our algorithms. Our approach is general; namely, any bounds that monotonically converge to the reward can be easily plugged-in to achieve significant speedup without any loss in performance. Our theory and algorithms support the challenging setting of continuous states, actions, and observations. The beliefs can be parametric or general and represented by weighted particles. We demonstrate in simulation a significant speedup in planning compared to baseline approaches with guaranteed identical performance.

Rethinking Financial Service Promotion With Hybrid Recommender Systems at PicPay

  • paper_url: http://arxiv.org/abs/2310.10268
  • repo_url: None
  • paper_authors: Gabriel Mendonça, Matheus Santos, André Gonçalves, Yan Almeida
  • for: 这个研究是为了提高PicPay的金融服务推荐效果。
  • methods: 这个研究将两种推荐算法组合成一个切换式混合推荐系统(Switching Hybrid Recommender System),以提高商品推荐的效果。
  • results: 我们的A/B测试显示,切换式混合推荐系统可以提升推荐效果,相比默认推荐策略最高提升3.2%。
    Abstract The fintech PicPay offers a wide range of financial services to its 30 million monthly active users, with more than 50 thousand items recommended in the PicPay mobile app. In this scenario, promoting specific items that are strategic to the company can be very challenging. In this work, we present a Switching Hybrid Recommender System that combines two algorithms to effectively promote items without negatively impacting the user's experience. The results of our A/B tests show an uplift of up to 3.2\% when compared to a default recommendation strategy.
    摘要 金融科技公司PicPay为其每月3000万活跃用户提供广泛的金融服务,其移动应用中推荐的商品超过5万件。在这种情况下,推广对公司具有战略意义的特定商品极具挑战性。在这项工作中,我们提出了一种切换式混合推荐系统,将两种算法结合使用,在不损害用户体验的前提下有效推广商品。我们的A/B测试结果显示,与默认推荐策略相比,该方案可带来最高3.2%的提升。
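
Illustrative note (not from the paper): the abstract does not disclose the two component algorithms or the switching rule. The snippet below is a generic, hypothetical illustration of the switching-hybrid pattern — route a request to a personalised recommender when enough history exists, otherwise fall back to a default list.

```python
def switching_recommender(user_history, personalized_fn, fallback_fn, min_interactions=5, k=10):
    """Minimal switching hybrid: use the personalised recommender for users with enough
    interaction history, and fall back to a popularity/strategic list for cold users."""
    if len(user_history) >= min_interactions:
        return personalized_fn(user_history, k)
    return fallback_fn(k)

# toy usage with stand-in recommenders
popular = ["cashback", "credit_card", "loan", "insurance"]
fallback_fn = lambda k: popular[:k]
personalized_fn = lambda hist, k: sorted(set(popular) - set(hist))[:k]
print(switching_recommender(["loan"] * 6, personalized_fn, fallback_fn, k=3))   # warm user
print(switching_recommender([], personalized_fn, fallback_fn, k=3))             # cold user
```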

  • paper_url: http://arxiv.org/abs/2310.10260
  • repo_url: None
  • paper_authors: Adel Ammar, Anis Koubaa, Bilel Benjdira, Omar Najar, Serry Sibaee
  • for: 这研究旨在预测阿拉伯语法庭决定,帮助法官做出决策和帮助律师采取更加细化的战略。
  • methods: 本研究使用了当前state-of-the-art的大语言模型进行预测,包括LLaMA-7b、JAIS-13b和GPT3.5-turbo等三种基础模型,并使用了三种训练方法:零例学习、一例学习和特定精度调整。此外,研究还评估了对原始阿拉伯文输入文本的摘要和/或翻译的效果。
  • results: 研究发现,所有LLaMA模型的性能很差,而GPT-3.5基于模型在所有模型中表现出色,比其他模型的平均分数高出50%。此外,研究还发现,除了人类评估外,其他所有的评估方法都不可靠,无法正确评估大语言模型在法庭决定预测中的性能。
    Abstract In the intricate field of legal studies, the analysis of court decisions is a cornerstone for the effective functioning of the judicial system. The ability to predict court outcomes helps judges during the decision-making process and equips lawyers with invaluable insights, enhancing their strategic approaches to cases. Despite its significance, the domain of Arabic court analysis remains under-explored. This paper pioneers a comprehensive predictive analysis of Arabic court decisions on a dataset of 10,813 commercial court real cases, leveraging the advanced capabilities of the current state-of-the-art large language models. Through a systematic exploration, we evaluate three prevalent foundational models (LLaMA-7b, JAIS-13b, and GPT3.5-turbo) and three training paradigms: zero-shot, one-shot, and tailored fine-tuning. Besides, we assess the benefit of summarizing and/or translating the original Arabic input texts. This leads to a spectrum of 14 model variants, for which we offer a granular performance assessment with a series of different metrics (human assessment, GPT evaluation, ROUGE, and BLEU scores). We show that all variants of LLaMA models yield limited performance, whereas GPT-3.5-based models outperform all other models by a wide margin, surpassing the average score of the dedicated Arabic-centric JAIS model by 50%. Furthermore, we show that all scores except human evaluation are inconsistent and unreliable for assessing the performance of large language models on court decision predictions. This study paves the way for future research, bridging the gap between computational linguistics and Arabic legal analytics.
    摘要 在复杂的法律研究领域中,法庭判决分析是司法系统有效运转的基石。预测法庭结果既能在决策过程中帮助法官,也能为律师提供宝贵的洞察,完善其案件策略。然而,阿拉伯语法庭分析领域仍未得到充分探索。本文率先在包含10813起真实商业法庭案例的数据集上,利用当前最先进的大语言模型,对阿拉伯语法庭判决进行了全面的预测分析。通过系统性的探索,我们评估了三种主流基础模型(LLaMA-7b、JAIS-13b和GPT3.5-turbo)和三种训练范式(零样本、单样本和针对性微调),并考察了对原始阿拉伯语输入文本进行摘要和/或翻译是否有益。这共产生了14种模型变体,我们用多种指标(人工评估、GPT评估、ROUGE和BLEU分数)对其进行了细粒度的性能评估。我们发现所有LLaMA模型的表现都很有限,而基于GPT-3.5的模型以较大优势领先于所有其他模型,比专注于阿拉伯语的JAIS模型的平均分数高出50%。此外,我们发现除人工评估外,其余分数在评估大语言模型的法庭判决预测性能时都不一致、不可靠。这项研究为未来的工作铺平了道路,架起了计算语言学与阿拉伯语法律分析之间的桥梁。

SGOOD: Substructure-enhanced Graph-Level Out-of-Distribution Detection

  • paper_url: http://arxiv.org/abs/2310.10237
  • repo_url: None
  • paper_authors: Zhihao Ding, Jieming Shi
  • for: 本研究旨在提升图级别的分布外(OOD)检测性能,即在测试图可能来自训练时未知分布的情况下,判断图属于分布内(ID)还是分布外(OOD)。
  • methods: 本研究提出了一种基于子结构的图级OOD检测框架,包括为每个图构建子结构超图、设计同时作用于原图与超图的两级图编码管道,以及开发三种保持子结构的图增强技术来增强表达能力。
  • results: 在多个图数据集上与10种竞争方法进行了广泛实验,所提方法常常以显著优势超越现有方法。
    Abstract Graph-level representation learning is important in a wide range of applications. However, existing graph-level models are generally built on i.i.d. assumption for both training and testing graphs, which is not realistic in an open world, where models can encounter out-of-distribution (OOD) testing graphs that are from different distributions unknown during training. A trustworthy model should not only produce accurate predictions for in-distribution (ID) data, but also detect OOD graphs to avoid unreliable prediction. In this paper, we present SGOOD, a novel graph-level OOD detection framework. We find that substructure differences commonly exist between ID and OOD graphs. Hence, SGOOD explicitly utilizes substructures to learn powerful representations to achieve superior performance. Specifically, we build a super graph of substructures for every graph, and design a two-level graph encoding pipeline that works on both original graphs and super graphs to obtain substructure-enhanced graph representations. To further distinguish ID and OOD graphs, we develop three graph augmentation techniques that preserve substructures and increase expressiveness. Extensive experiments against 10 competitors on numerous graph datasets demonstrate the superiority of SGOOD, often surpassing existing methods by a significant margin. The code is available at https://anonymous.4open.science/r/SGOOD-0958.
    摘要 图级表示学习在各种应用中都非常重要。然而,现有的图级模型通常假设训练图和测试图独立同分布,这在开放世界中并不现实:模型可能会遇到来自训练时未知分布的分布外(OOD)测试图。一个可靠的模型不仅要对分布内(ID)数据给出准确预测,还要能检测OOD图,以避免不可靠的预测。在这篇论文中,我们提出了SGOOD,一种新的图级OOD检测框架。我们发现ID图与OOD图之间普遍存在子结构差异,因此SGOOD显式利用子结构来学习强大的表示,从而取得更优的性能。具体来说,我们为每个图构建子结构超图,并设计了同时作用于原图和超图的两级图编码管道,以获得子结构增强的图表示。为进一步区分ID图与OOD图,我们还开发了三种保持子结构并提升表达能力的图增强技术。在多个图数据集上与10种竞争方法进行的大量实验表明,SGOOD常常以显著优势超越现有方法。代码可在 https://anonymous.4open.science/r/SGOOD-0958 获取。

Using Global Land Cover Product as Prompt for Cropland Mapping via Visual Foundation Model

  • paper_url: http://arxiv.org/abs/2310.10219
  • repo_url: None
  • paper_authors: Chao Tao, Aoran Hu, Rong Xiao, Haifeng Li, Yuze Wang
  • for: 本研究旨在解决受场景属性(地形、气候、作物类型)和成像条件影响的耕地制图问题,提出“Pretrain+Prompting”方法,在模型推理过程中简化领域适应。
  • methods: 本研究使用了可访问的全球土地覆盖产品,设计了自动提示(APT)方法,通过在模型推理过程中引入各个示例的个性提示,实现了细化的领域适应过程。
  • results: 在中国南方和北方两个亚米级耕地数据集上的实验结果表明,所提出的“Pretrain+Prompting”方法在遥感耕地制图任务中取得了优于传统监督学习和微调方法的性能。
    Abstract Data-driven deep learning methods have shown great potential in cropland mapping. However, due to multiple factors such as attributes of cropland (topography, climate, crop type) and imaging conditions (viewing angle, illumination, scale), croplands under different scenes demonstrate a great domain gap. This makes it difficult for models trained in the specific scenes to directly generalize to other scenes. A common way to handle this problem is through the "Pretrain+Fine-tuning" paradigm. Unfortunately, considering the variety of features of cropland that are affected by multiple factors, it is hardly to handle the complex domain gap between pre-trained data and target data using only sparse fine-tuned samples as general constraints. Moreover, as the number of model parameters grows, fine-tuning is no longer an easy and low-cost task. With the emergence of prompt learning via visual foundation models, the "Pretrain+Prompting" paradigm redesigns the optimization target by introducing individual prompts for each single sample. This simplifies the domain adaption from generic to specific scenes during model reasoning processes. Therefore, we introduce the "Pretrain+Prompting" paradigm to interpreting cropland scenes and design the auto-prompting (APT) method based on freely available global land cover product. It can achieve a fine-grained adaptation process from generic scenes to specialized cropland scenes without introducing additional label costs. To our best knowledge, this work pioneers the exploration of the domain adaption problems for cropland mapping under prompt learning perspectives. Our experiments using two sub-meter cropland datasets from southern and northern China demonstrated that the proposed method via visual foundation models outperforms traditional supervised learning and fine-tuning approaches in the field of remote sensing.
    摘要 “数据驱动深度学习方法在耕地地图中表现出了很大的潜力。然而,由于耕地特性(地形、气候、作物种)以及捕获条件(观察角度、照明、比例)的多种因素,耕地不同场景之间存在巨大的领域差异。这使得使用特定场景的训练数据直接适应其他场景变得困难。通常,使用“Pretrain+Fine-tuning”模式来解决这个问题。然而,考虑到耕地特性的多种影响,使用只有稀疏的精度适应样本作为通用约束是不充分的。此外,随着模型参数的增加,精度适应变得不是易于进行的低成本任务。随着视觉基础模型的出现,“Pretrain+Prompting”模式可以重新设定优化目标,通过引入每个样本的个性提示来简化领域适应。因此,我们提出了基于自由可用的全球土地覆盖产品的自动提示(APT)方法,可以实现不受预先标注的场景适应过程。我们的实验使用南方和北方中国的两个半米耕地数据集证明了,与传统的超级学习和精度适应方法相比,我们的方法在远程感知领域中表现出了更好的性能。”

Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook

  • paper_url: http://arxiv.org/abs/2310.10196
  • repo_url: https://github.com/qingsongedu/awesome-timeseries-spatiotemporal-lm-llm
  • paper_authors: Ming Jin, Qingsong Wen, Yuxuan Liang, Chaoli Zhang, Siqiao Xue, Xue Wang, James Zhang, Yi Wang, Haifeng Chen, Xiaoli Li, Shirui Pan, Vincent S. Tseng, Yu Zheng, Lei Chen, Hui Xiong
  • for: 本研究主要是为了对时间序列和空间时间数据进行分析和挖掘,以便更好地利用这些数据中含的丰富信息,并为各种应用领域提供支持。
  • methods: 本研究围绕大型语言模型和其他基础模型在时间序列与时空数据分析和挖掘中的应用,提供了一个全面且最新的综述,涵盖四个关键方面:数据类型、模型类别、模型范围和应用领域/任务。
  • results: 本研究提供了一个完整的和最新的评论,涵盖大型模型在时间序列和空间时间数据分析中的应用和发展,并提供了丰富的资源,包括数据集、模型资产和有用工具,以便开发应用和进行进一步的研究。
    Abstract Temporal data, notably time series and spatio-temporal data, are prevalent in real-world applications. They capture dynamic system measurements and are produced in vast quantities by both physical and virtual sensors. Analyzing these data types is vital to harnessing the rich information they encompass and thus benefits a wide range of downstream tasks. Recent advances in large language and other foundational models have spurred increased use of these models in time series and spatio-temporal data mining. Such methodologies not only enable enhanced pattern recognition and reasoning across diverse domains but also lay the groundwork for artificial general intelligence capable of comprehending and processing common temporal data. In this survey, we offer a comprehensive and up-to-date review of large models tailored (or adapted) for time series and spatio-temporal data, spanning four key facets: data types, model categories, model scopes, and application areas/tasks. Our objective is to equip practitioners with the knowledge to develop applications and further research in this underexplored domain. We primarily categorize the existing literature into two major clusters: large models for time series analysis (LM4TS) and spatio-temporal data mining (LM4STD). On this basis, we further classify research based on model scopes (i.e., general vs. domain-specific) and application areas/tasks. We also provide a comprehensive collection of pertinent resources, including datasets, model assets, and useful tools, categorized by mainstream applications. This survey coalesces the latest strides in large model-centric research on time series and spatio-temporal data, underscoring the solid foundations, current advances, practical applications, abundant resources, and future research opportunities.
    摘要 现代数据中,时序数据和空间时序数据具有广泛的应用,它们捕捉了动态系统的测量结果,并由物理和虚拟感知器生成了庞大量数据。分析这些数据类型是利用它们含义的关键,因此在多种下游任务中具有重要意义。最新的大语言和其他基础模型的发展,使得这些模型在时序数据和空间时序数据挖掘中得到广泛的应用。这些方法不仅可以在多种领域中提高模式识别和理解,而且为人工通用智能做好了准备。在本综述中,我们提供了一个完整和最新的时序数据和空间时序数据大模型综述,涵盖四个关键方面:数据类型、模型类别、模型范围和应用领域/任务。我们的目标是为实践者提供开发应用和进一步研究的知识。我们将现有文献分为两个主要群组:时序数据分析大模型(LM4TS)和空间时序数据挖掘大模型(LM4STD)。基于这两个群组,我们进一步分类研究根据模型范围(一般 vs.域特定)和应用领域/任务。此外,我们还提供了一份完整的相关资源,包括数据集、模型资产和有用工具,按照主流应用分类。这篇综述汇集了最新的大模型中心研究的进展,强调了它们的基础、当前进展、实际应用、资源储备和未来研究机遇。

Battle of the Large Language Models: Dolly vs LLaMA vs Vicuna vs Guanaco vs Bard vs ChatGPT – A Text-to-SQL Parsing Comparison

  • paper_url: http://arxiv.org/abs/2310.10190
  • repo_url: None
  • paper_authors: Shuo Sun, Yuchen Zhang, Jiahuan Yan, Yuze Gao, Donovan Ong, Bin Chen, Jian Su
  • for: 评估大型自然语言模型(LLM)在文本转换SQL解析方面的表现,以帮助研究人员更好地了解这些模型的实际性能。
  • methods: 对六种流行的大型自然语言模型进行系统性的评估,使用九个benchmark数据集和五种提示策略进行测试,包括零shot和几shot情况。
  • results: 发现开源模型在文本转换SQL解析方面的性能落后于关闭源模型如GPT-3.5,表明需要进一步的研究以减少这些模型之间的性能差距。
    Abstract The success of ChatGPT has ignited an AI race, with researchers striving to develop new large language models (LLMs) that can match or surpass the language understanding and generation abilities of commercial ones. In recent times, a number of models have emerged, claiming performance near that of GPT-3.5 or GPT-4 through various instruction-tuning methods. As practitioners of Text-to-SQL parsing, we are grateful for their valuable contributions to open-source research. However, it is important to approach these claims with a sense of scrutiny and ascertain the actual effectiveness of these models. Therefore, we pit six popular large language models against each other, systematically evaluating their Text-to-SQL parsing capability on nine benchmark datasets with five different prompting strategies, covering both zero-shot and few-shot scenarios. Regrettably, the open-sourced models fell significantly short of the performance achieved by closed-source models like GPT-3.5, highlighting the need for further work to bridge the performance gap between these models.
    摘要 ChatGPT的成功引燃了一场AI竞赛,研究人员竞相开发新的大语言模型(LLM),以期达到或超越商业模型的语言理解与生成能力。近来,一些模型声称通过各类指令微调方法达到了接近GPT-3.5或GPT-4的性能。作为文本转SQL解析的实践者,我们感谢它们对开源研究的宝贵贡献。然而,对这些声明应保持审慎,核实这些模型的实际效果。因此,我们让六种流行的大语言模型同台竞技,在九个基准数据集上、采用五种不同的提示策略(涵盖零样本与少样本场景),系统评估它们的文本转SQL解析能力。遗憾的是,开源模型的表现明显落后于GPT-3.5等闭源模型,这凸显出需要进一步的工作来弥合这些模型之间的性能差距。

Continual Generalized Intent Discovery: Marching Towards Dynamic and Open-world Intent Recognition

  • paper_url: http://arxiv.org/abs/2310.10184
  • repo_url: https://github.com/songxiaoshuai/CGID
  • paper_authors: Xiaoshuai Song, Yutao Mou, Keqing He, Yueyan Qiu, Pei Wang, Weiran Xu
  • for: 这篇论文目标是解决在不同数据流中进行动态意图发现,以及在开放世界中实现动态意图识别。
  • methods: 该论文提出了一个新任务——持续泛化意图发现(Continual Generalized Intent Discovery, CGID),要求从动态OOD数据流中不断发现新意图,并在几乎不依赖先前数据的情况下将其逐步加入分类器。
  • results: 论文提出了一种名为Prototype-guided Learning with Replay and Distillation (PLRD)的方法来实现CGID任务。该方法以类原型引导新意图的发现,并通过数据重放与特征蒸馏来保持新旧意图之间的平衡。
    Abstract In a practical dialogue system, users may input out-of-domain (OOD) queries. The Generalized Intent Discovery (GID) task aims to discover OOD intents from OOD queries and extend them to the in-domain (IND) classifier. However, GID only considers one stage of OOD learning, and needs to utilize the data in all previous stages for joint training, which limits its wide application in reality. In this paper, we introduce a new task, Continual Generalized Intent Discovery (CGID), which aims to continuously and automatically discover OOD intents from dynamic OOD data streams and then incrementally add them to the classifier with almost no previous data, thus moving towards dynamic intent recognition in an open world. Next, we propose a method called Prototype-guided Learning with Replay and Distillation (PLRD) for CGID, which bootstraps new intent discovery through class prototypes and balances new and old intents through data replay and feature distillation. Finally, we conduct detailed experiments and analysis to verify the effectiveness of PLRD and understand the key challenges of CGID for future research.
    摘要 在实际对话系统中,用户可能输入域外 (OOD) 查询。泛化意图发现 (GID) 任务的目标是从 OOD 查询中发现 OOD 意图,并将其扩展到域内 (IND) 分类器。但 GID 只考虑了单个阶段的 OOD 学习,且需要利用之前所有阶段的数据进行联合训练,这限制了其在现实中的广泛应用。在本文中,我们提出了一个新任务:持续泛化意图发现 (CGID),其目标是从动态 OOD 数据流中持续、自动地发现 OOD 意图,并在几乎不依赖先前数据的情况下将其逐步加入分类器,从而迈向开放世界中的动态意图识别。接着,我们为 CGID 提出了一种名为 Prototype-guided Learning with Replay and Distillation (PLRD) 的方法,它借助类原型引导新意图的发现,并通过数据重放与特征蒸馏来平衡新旧意图。最后,我们进行了详细的实验与分析,以验证 PLRD 的有效性,并梳理 CGID 面临的关键挑战,供未来研究参考。
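
Illustrative note (not from the paper): PLRD combines class prototypes, data replay, and distillation, but its exact formulation is in the paper. The sketch below illustrates three plausible building blocks — prototype computation, prototype-guided pseudo-labelling of new utterances (with a similarity threshold flagging potential new intents), and a replay distillation loss — with all thresholds, shapes, and the toy features as illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def class_prototypes(features, labels, n_classes):
    """Mean feature vector per known intent; prototypes bootstrap the discovery of new intents."""
    protos = torch.zeros(n_classes, features.size(1))
    for c in range(n_classes):
        protos[c] = features[labels == c].mean(0)
    return protos

def pseudo_label(features, protos, new_threshold=0.5):
    """Assign each unlabeled utterance to the nearest prototype; below-threshold similarity is
    treated as evidence of a potentially new OOD intent and marked -1."""
    sims = F.normalize(features, dim=-1) @ F.normalize(protos, dim=-1).t()
    best_sim, best_idx = sims.max(dim=-1)
    return torch.where(best_sim > new_threshold, best_idx, torch.full_like(best_idx, -1))

def distillation_loss(new_logits, old_logits, T=2.0):
    """KL distillation on replayed data keeps old intents intact while new ones are added."""
    return F.kl_div(F.log_softmax(new_logits / T, dim=-1),
                    F.softmax(old_logits / T, dim=-1), reduction="batchmean") * T * T

feats = torch.randn(8, 16)
labels = torch.tensor([0, 0, 1, 1, 1, 2, 2, 2])
protos = class_prototypes(feats, labels, 3)
print(pseudo_label(torch.randn(4, 16), protos))
print(distillation_loss(torch.randn(4, 3), torch.randn(4, 3)).item())
```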

Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT

  • paper_url: http://arxiv.org/abs/2310.10176
  • repo_url: https://github.com/songxiaoshuai/OOD-Evaluation
  • paper_authors: Xiaoshuai Song, Keqing He, Pei Wang, Guanting Dong, Yutao Mou, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu
  • for: 本研究旨在评估ChatGPT在域外意图(OOD)发现和泛化意图发现(GID)任务中的能力。
  • methods: 本研究使用ChatGPT执行OOD意图发现和GID任务,并对其表现进行了评估。
  • results: ChatGPT在零样本设定下表现出一致的优势,但与微调模型相比仍处于劣势。通过一系列分析实验,本研究总结了LLM在发现并逐步扩展OOD意图时面临的挑战,并为未来研究提供了经验性指导。
    Abstract The tasks of out-of-domain (OOD) intent discovery and generalized intent discovery (GID) aim to extend a closed intent classifier to open-world intent sets, which is crucial to task-oriented dialogue (TOD) systems. Previous methods address them by fine-tuning discriminative models. Recently, although some studies have been exploring the application of large language models (LLMs) represented by ChatGPT to various downstream tasks, it is still unclear for the ability of ChatGPT to discover and incrementally extent OOD intents. In this paper, we comprehensively evaluate ChatGPT on OOD intent discovery and GID, and then outline the strengths and weaknesses of ChatGPT. Overall, ChatGPT exhibits consistent advantages under zero-shot settings, but is still at a disadvantage compared to fine-tuned models. More deeply, through a series of analytical experiments, we summarize and discuss the challenges faced by LLMs including clustering, domain-specific understanding, and cross-domain in-context learning scenarios. Finally, we provide empirical guidance for future directions to address these challenges.

Analyzing An After-Sales Service Process Using Object-Centric Process Mining: A Case Study

  • paper_url: http://arxiv.org/abs/2310.10174
  • repo_url: None
  • paper_authors: Gyunam Park, Sevde Aydin, Cuneyt Ugur, Wil M. P. van der Aalst
  • for: This study explores the application of object-centric process mining to support business process improvement in real operational settings.
  • methods: Object-centric process mining is applied in a case study of an after-sales service process, analyzing an event log of approximately 65,000 events.
  • results: The study finds that object-centric process mining captures entangled business process details more faithfully, yielding richer and deeper process insights that support operational improvements.
    Abstract Process mining, a technique turning event data into business process insights, has traditionally operated on the assumption that each event corresponds to a singular case or object. However, many real-world processes are intertwined with multiple objects, making them object-centric. This paper focuses on the emerging domain of object-centric process mining, highlighting its potential yet underexplored benefits in actual operational scenarios. Through an in-depth case study of Borusan Cat's after-sales service process, this study emphasizes the capability of object-centric process mining to capture entangled business process details. Utilizing an event log of approximately 65,000 events, our analysis underscores the importance of embracing this paradigm for richer business insights and enhanced operational improvements.

Leveraging Knowledge Distillation for Efficient Deep Reinforcement Learning in Resource-Constrained Environments

  • paper_url: http://arxiv.org/abs/2310.10170
  • repo_url: https://github.com/paopaolin/papercode/tree/main/MENGGUANLIN_papercode/combine%20V1
  • paper_authors: Guanlin Meng
  • for: This paper explores combining Deep Reinforcement Learning (DRL) with Knowledge Distillation (KD) to reduce the computational burden of deep models while maintaining performance.
  • methods: Various DRL algorithms are distilled and their distillation effects studied, providing a benchmark for evaluating KD-refined DRL algorithms.
  • results: The results show that combining DRL with KD enables faster, more computationally efficient DRL models that require fewer GPU resources.
    Abstract This paper aims to explore the potential of combining Deep Reinforcement Learning (DRL) with Knowledge Distillation (KD) by distilling various DRL algorithms and studying their distillation effects. By doing so, the computational burden of deep models could be reduced while maintaining the performance. The primary objective is to provide a benchmark for evaluating the performance of different DRL algorithms that have been refined using KD techniques. By distilling these algorithms, the goal is to develop efficient and fast DRL models. This research is expected to provide valuable insights that can facilitate further advancements in this promising direction. By exploring the combination of DRL and KD, this work aims to promote the development of models that require fewer GPU resources, learn more quickly, and make faster decisions in complex environments. The results of this research have the capacity to significantly advance the field of DRL and pave the way for the future deployment of resource-efficient, decision-making intelligent systems.

DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task

  • paper_url: http://arxiv.org/abs/2310.10169
  • repo_url: https://github.com/dongguanting/Demo-NSF
  • paper_authors: Guanting Dong, Tingfeng Hui, Zhuoma GongQue, Jinxu Zhao, Daichi Guo, Gang Zhao, Keqing He, Weiran Xu
  • for: To improve the generalization of generative frameworks in practical dialogue scenarios, addressing their weakness under unknown input perturbations.
  • methods: A multi-task demonstration-based generative framework named DemoNSF is proposed, with three noisy auxiliary tasks -- noisy recovery (NR), random mask (RM), and hybrid discrimination (HD) -- that capture the semantic structure of input perturbations at different granularities.
  • results: On two benchmarks, DemoNSF outperforms all baseline methods and achieves strong generalization; further analysis provides practical guidance for applying generative frameworks.
    Abstract Recently, prompt-based generative frameworks have shown impressive capabilities in sequence labeling tasks. However, in practical dialogue scenarios, relying solely on simplistic templates and traditional corpora presents a challenge for these methods in generalizing to unknown input perturbations. To address this gap, we propose a multi-task demonstration based generative framework for noisy slot filling, named DemoNSF. Specifically, we introduce three noisy auxiliary tasks, namely noisy recovery (NR), random mask (RM), and hybrid discrimination (HD), to implicitly capture semantic structural information of input perturbations at different granularities. In the downstream main task, we design a noisy demonstration construction strategy for the generative framework, which explicitly incorporates task-specific information and perturbed distribution during training and inference. Experiments on two benchmarks demonstrate that DemoNSF outperforms all baseline methods and achieves strong generalization. Further analysis provides empirical guidance for the practical application of generative frameworks. Our code is released at https://github.com/dongguanting/Demo-NSF.
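
The three auxiliary tasks are only named in the abstract; as a rough, self-contained illustration of the kind of perturbation a random-mask (RM) style task builds on, the sketch below corrupts an utterance token-by-token. The mask token and masking probability are assumptions, not the paper's settings.

```python
import random

def random_mask(tokens, mask_token="[MASK]", p=0.15, seed=None):
    """Randomly replace a fraction of input tokens with a mask token.

    A noisy view like this can be paired with the clean utterance so the model
    learns to recover slot values under input perturbations.
    """
    rng = random.Random(seed)
    return [mask_token if rng.random() < p else tok for tok in tokens]

# Example: a noisy view of a slot-filling utterance.
utterance = "book a flight from beijing to shanghai tomorrow".split()
print(random_mask(utterance, p=0.3, seed=0))
```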

Deep Learning Algorithm for Advanced Level-3 Inverse-Modeling of Silicon-Carbide Power MOSFET Devices

  • paper_url: http://arxiv.org/abs/2310.17657
  • repo_url: None
  • paper_authors: Massimo Orazio Spata, Sebastiano Battiato, Alessandro Ortis, Francesco Rundo, Michele Calabretta, Carmelo Pino, Angelo Messina
  • for: This paper presents a deep learning method for extracting the physical parameters of Silicon-Carbide power MOSFET (SiC Power MOS) devices.
  • methods: A deep learning model (a multi-layer perceptron) is trained to predict device parameters from the device's static behaviour, i.e., inverse modelling.
  • results: Experimental results show that the method effectively reconstructs physical parameters of the SiC Power MOS, including the channel length.
    Abstract Inverse modelling with deep learning algorithms involves training deep architecture to predict device's parameters from its static behaviour. Inverse device modelling is suitable to reconstruct drifted physical parameters of devices temporally degraded or to retrieve physical configuration. There are many variables that can influence the performance of an inverse modelling method. In this work the authors propose a deep learning method trained for retrieving physical parameters of Level-3 model of Power Silicon-Carbide MOSFET (SiC Power MOS). The SiC devices are used in applications where classical silicon devices failed due to high-temperature or high switching capability. The key application of SiC power devices is in the automotive field (i.e. in the field of electrical vehicles). Due to physiological degradation or high-stressing environment, SiC Power MOS shows a significant drift of physical parameters which can be monitored by using inverse modelling. The aim of this work is to provide a possible deep learning-based solution for retrieving physical parameters of the SiC Power MOSFET. Preliminary results based on the retrieving of channel length of the device are reported. Channel length of power MOSFET is a key parameter involved in the static and dynamic behaviour of the device. The experimental results reported in this work confirmed the effectiveness of a multi-layer perceptron designed to retrieve this parameter.
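
As a sketch of the inverse-modelling idea (not the authors' architecture or data), the snippet below fits a small multi-layer perceptron that maps samples of a device's static I-V characteristic to a physical parameter such as channel length. The synthetic data generator is purely illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for static device behaviour: each "device" is described by
# its drain current sampled at 32 gate-voltage points, generated from a toy
# model parameterized by channel length L (the quantity we want to recover).
n_devices, n_points = 2000, 32
vg = np.linspace(0.0, 10.0, n_points)
L = rng.uniform(0.5, 2.0, size=n_devices)                   # channel length (a.u.)
iv = (1.0 / L)[:, None] * np.maximum(vg - 2.0, 0.0) ** 2    # toy I-V curves
iv += 0.01 * rng.normal(size=iv.shape)                      # measurement noise

X_train, X_test, y_train, y_test = train_test_split(iv, L, random_state=0)

mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("R^2 on held-out devices:", round(mlp.score(X_test, y_test), 3))
```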

Character-LLM: A Trainable Agent for Role-Playing

  • paper_url: http://arxiv.org/abs/2310.10158
  • repo_url: https://github.com/choosewhatulike/trainable-agents
  • paper_authors: Yunfan Shao, Linyang Li, Junqi Dai, Xipeng Qiu
  • for: To study the ability of large language models (LLMs) to act as agents that simulate specific people rather than generic human behaviors.
  • methods: A method is proposed that edits character profiles and experiences and trains LLMs to become personal simulacra of specific people.
  • results: Trained agents are interviewed in a test playground to evaluate whether they memorize their characters and experiences; the experiments yield interesting observations that help build future simulacra of humankind.
    Abstract Large language models (LLMs) can be used to serve as agents to simulate human behaviors, given the powerful ability to understand human instructions and provide high-quality generated texts. Such ability stimulates us to wonder whether LLMs can simulate a person in a higher form than simple human behaviors. Therefore, we aim to train an agent with the profile, experience, and emotional states of a specific person instead of using limited prompts to instruct ChatGPT API. In this work, we introduce Character-LLM, which teaches LLMs to act as specific people such as Beethoven, Queen Cleopatra, Julius Caesar, etc. Our method focuses on editing profiles as experiences of a certain character and training models to be personal simulacra with these experiences. To assess the effectiveness of our approach, we build a test playground that interviews trained agents and evaluates whether the agents *memorize* their characters and experiences. Experimental results show interesting observations that help build future simulacra of humankind.

Adaptive Workload Distribution for Accuracy-aware DNN Inference on Collaborative Edge Platforms

  • paper_url: http://arxiv.org/abs/2310.10157
  • repo_url: None
  • paper_authors: Zain Taufique, Antonio Miele, Pasi Liljeberg, Anil Kanduri
  • for: To accelerate DNN inference by distributing the workload among a cluster of collaborative edge nodes.
  • methods: An adaptive workload distribution strategy is proposed that jointly considers node-level heterogeneity of edge devices and application-specific accuracy and performance requirements.
  • results: Tested on an edge cluster of Odroid XU4, Raspberry Pi4, and Jetson Nano boards, the approach achieves an average gain of 41.52% in performance and 5.2% in output accuracy over state-of-the-art workload distribution strategies.
    Abstract DNN inference can be accelerated by distributing the workload among a cluster of collaborative edge nodes. Heterogeneity among edge devices and accuracy-performance trade-offs of DNN models present a complex exploration space while catering to the inference performance requirements. In this work, we propose adaptive workload distribution for DNN inference, jointly considering node-level heterogeneity of edge devices, and application-specific accuracy and performance requirements. Our proposed approach combinatorially optimizes heterogeneity-aware workload partitioning and dynamic accuracy configuration of DNN models to ensure performance and accuracy guarantees. We tested our approach on an edge cluster of Odroid XU4, Raspberry Pi4, and Jetson Nano boards and achieved an average gain of 41.52% in performance and 5.2% in output accuracy as compared to state-of-the-art workload distribution strategies.
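
The abstract does not spell out the partitioning algorithm; the following is a minimal sketch of the heterogeneity-aware idea it alludes to, splitting a DNN's layer-wise work across devices roughly in proportion to their throughput. The device throughputs and layer costs are made-up numbers, not measurements from the paper.

```python
def partition_layers(layer_costs, device_throughputs):
    """Greedily assign consecutive DNN layers to devices so that each device's
    share of the total cost is roughly proportional to its throughput."""
    total_cost = sum(layer_costs)
    total_tput = sum(device_throughputs)
    targets = [total_cost * t / total_tput for t in device_throughputs]

    assignment, dev, acc = [], 0, 0.0
    for cost in layer_costs:
        # Move on to the next device once its target share is filled.
        if acc >= targets[dev] and dev < len(device_throughputs) - 1:
            dev, acc = dev + 1, 0.0
        assignment.append(dev)
        acc += cost
    return assignment

# Example: 8 layers split across three heterogeneous boards
# (relative throughputs are illustrative only).
layers = [4.0, 3.0, 5.0, 2.0, 2.0, 6.0, 1.0, 1.0]
print(partition_layers(layers, device_throughputs=[1.0, 0.6, 2.0]))
```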

Theory of Mind for Multi-Agent Collaboration via Large Language Models

  • paper_url: http://arxiv.org/abs/2310.10701
  • repo_url: None
  • paper_authors: Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, Katia Sycara
  • for: This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing them with multi-agent reinforcement learning (MARL) and planning-based baselines.
  • methods: LLM-based agents are used for multi-agent collaboration and evaluated on ToM inference tasks; the study also explores explicit belief state representations to mitigate planning and task-state hallucination issues.
  • results: LLM-based agents exhibit emergent collaborative behaviors and higher-order Theory of Mind capabilities, but are limited by long-horizon context management and task-state hallucination; explicit belief state representations improve task performance and the accuracy of ToM inferences.
    Abstract While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaborations remains largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based baselines. We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents. Our results reveal limitations in LLM-based agents' planning optimization due to systematic failures in managing long-horizon contexts and hallucination about the task state. We explore the use of explicit belief state representations to mitigate these issues, finding that it enhances task performance and the accuracy of ToM inferences for LLM-based agents.

Recursive Segmentation Living Image: An eXplainable AI (XAI) Approach for Computing Structural Beauty of Images or the Livingness of Space

  • paper_url: http://arxiv.org/abs/2310.10149
  • repo_url: None
  • paper_authors: Yao Qianxiang, Bin Jiang
  • for: This study introduces "structural beauty" as an objective computational approach for evaluating the aesthetic appeal of images; using the Segment Anything Model (SAM), a recursive segmentation method is proposed to capture finer-grained substructures.
  • methods: SAM is applied recursively for segmentation, and the hierarchical structure is reconstructed to obtain a more accurate account of substructure quantity and hierarchy.
  • results: The method accurately segments meaningful objects such as trees, buildings, and windows, as well as abstract substructures within paintings; the computed scores are consistent with human visual evaluation and remain reasonably consistent across different color spaces.
    Abstract This study introduces the concept of "structural beauty" as an objective computational approach for evaluating the aesthetic appeal of images. Through the utilization of the Segment anything model (SAM), we propose a method that leverages recursive segmentation to extract finer-grained substructures. Additionally, by reconstructing the hierarchical structure, we obtain a more accurate representation of substructure quantity and hierarchy. This approach reproduces and extends our previous research, allowing for the simultaneous assessment of Livingness in full-color images without the need for grayscale conversion or separate computations for foreground and background Livingness. Furthermore, the application of our method to the Scenic or Not dataset, a repository of subjective scenic ratings, demonstrates a high degree of consistency with subjective ratings in the 0-6 score range. This underscores that structural beauty is not solely a subjective perception, but a quantifiable attribute accessible through objective computation. Through our case studies, we have arrived at three significant conclusions. 1) our method demonstrates the capability to accurately segment meaningful objects, including trees, buildings, and windows, as well as abstract substructures within paintings. 2) we observed that the clarity of an image impacts our computational results; clearer images tend to yield higher Livingness scores. However, for equally blurry images, Livingness does not exhibit a significant reduction, aligning with human visual perception. 3) our approach fundamentally differs from methods employing Convolutional Neural Networks (CNNs) for predicting image scores. Our method not only provides computational results but also offers transparency and interpretability, positioning it as a novel avenue in the realm of Explainable AI (XAI).

LoBaSS: Gauging Learnability in Supervised Fine-tuning Data

  • paper_url: http://arxiv.org/abs/2310.13008
  • repo_url: None
  • paper_authors: Haotian Zhou, Tingkai Liu, Qianli Ma, Jianbo Yuan, Pengfei Liu, Yang You, Hongxia Yang
  • for: To propose a supervised fine-tuning (SFT) data selection method based on the model's learnability, seeking a good balance between model capability and learning efficiency.
  • methods: The Loss Based SFT Data Selection (LoBaSS) method selects SFT data according to the capabilities the model acquired during pretraining, aligning data selection with inherent model capabilities.
  • results: Experiments show that LoBaSS surpasses full-data fine-tuning with only 6% of the training data, and with 16.7% of the data it harmonizes the model's capabilities across conversational and mathematical domains.
    Abstract Supervised Fine-Tuning (SFT) serves as a crucial phase in aligning Large Language Models (LLMs) to specific task prerequisites. The selection of fine-tuning data profoundly influences the model's performance, whose principle is traditionally grounded in data quality and distribution. In this paper, we introduce a new dimension in SFT data selection: learnability. This new dimension is motivated by the intuition that SFT unlocks capabilities acquired by a LLM during the pretraining phase. Given that different pretrained models have disparate capabilities, the SFT data appropriate for one may not suit another. Thus, we introduce the term learnability to define the suitability of data for effective learning by the model. We present the Loss Based SFT Data Selection (LoBaSS) method, utilizing data learnability as the principal criterion for the selection SFT data. This method provides a nuanced approach, allowing the alignment of data selection with inherent model capabilities, ensuring optimal compatibility and learning efficiency. In experimental comparisons involving 7B and 13B models, our LoBaSS method is able to surpass full-data fine-tuning at merely 6% of the total training data. When employing 16.7% of the data, LoBaSS harmonizes the model's capabilities across conversational and mathematical domains, proving its efficacy and adaptability.
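
The exact learnability criterion is not given in the abstract; as a loose illustration only, this sketch scores SFT examples by the pretrained model's loss and keeps a budgeted fraction, on the intuition that examples the model has already mastered or cannot hope to learn are less useful. The scoring rule, thresholds, and function names are assumptions, not LoBaSS itself.

```python
import torch

@torch.no_grad()
def select_sft_data(model, loss_fn, dataset, keep_frac=0.06):
    """Score each SFT example by the pretrained model's loss and keep a band of
    moderately hard examples (an illustrative learnability proxy, not LoBaSS)."""
    scores = []
    for idx, (inputs, targets) in enumerate(dataset):
        loss = loss_fn(model(inputs), targets).item()
        scores.append((loss, idx))

    scores.sort()                                   # ascending loss
    n_keep = max(1, int(keep_frac * len(scores)))
    start = len(scores) // 10                       # skip the trivially easy tail
    band = scores[start:start + n_keep]
    return [idx for _, idx in band]
```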

CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization

  • paper_url: http://arxiv.org/abs/2310.10134
  • repo_url: None
  • paper_authors: Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark
  • For: The paper aims to develop a language-based agent that can continually improve over time and perform well in varied environments and tasks.
  • Methods: The paper proposes a persistent, dynamic, textual memory centered on causal abstractions, which is regularly updated after each trial to gradually learn useful knowledge for new trials.
  • Results: The proposed approach, called CLIN, outperforms state-of-the-art reflective language agents on the ScienceWorld benchmark, achieves transfer learning to new environments and tasks, and continually improves performance through memory updates.
    Abstract Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning. However, despite their zero-shot capabilities, these agents to date do not continually improve over time beyond performance refinement on a specific task. Here we present CLIN, the first language-based agent to achieve this, so that it continually improves over multiple trials, including when both the environment and task are varied, and without requiring parameter updates. Our approach is to use a persistent, dynamic, textual memory centered on causal abstractions (rather than general "helpful hints") that is regularly updated after each trial so that the agent gradually learns useful knowledge for new trials. In the ScienceWorld benchmark, CLIN is able to continually improve on repeated trials on the same task and environment, outperforming state-of-the-art reflective language agents like Reflexion by 23 absolute points. CLIN can also transfer its learning to new environments (or new tasks), improving its zero-shot performance by 4 points (13 for new tasks) and can further improve performance there through continual memory updates, enhancing performance by an additional 17 points (7 for new tasks). This suggests a new architecture for agents built on frozen models that can still continually and rapidly improve over time.

A Non-monotonic Smooth Activation Function

  • paper_url: http://arxiv.org/abs/2310.10126
  • repo_url: None
  • paper_authors: Koushik Biswas, Meghana Karri, Ulaş Bağcı
  • For: The paper proposes a new activation function, Sqish, as an alternative to existing activation functions in deep learning models.
  • methods: The paper uses experiments on various tasks such as classification, object detection, segmentation, and adversarial robustness to demonstrate the superiority of the Sqish activation function over existing activation functions such as ReLU.
  • results: The paper shows that the Sqish activation function achieves better performance than ReLU on several benchmark datasets, including CIFAR100, with an improvement of 8.21% in adversarial robustness and 5.87% in image classification.
    Abstract Activation functions are crucial in deep learning models since they introduce non-linearity into the networks, allowing them to learn from errors and make adjustments, which is essential for learning complex patterns. The essential purpose of activation functions is to transform unprocessed input signals into significant output activations, promoting information transmission throughout the neural network. In this study, we propose a new activation function called Sqish, which is a non-monotonic and smooth function and an alternative to existing ones. We showed its superiority in classification, object detection, segmentation tasks, and adversarial robustness experiments. We got an 8.21% improvement over ReLU on the CIFAR100 dataset with the ShuffleNet V2 model in the FGSM adversarial attack. We also got a 5.87% improvement over ReLU on image classification on the CIFAR100 dataset with the ShuffleNet V2 model.
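
The closed form of Sqish is not reproduced in this digest, so the snippet below only illustrates the general family it belongs to: smooth, non-monotonic activations built from a sigmoid gate (the well-known Swish/SiLU, x·σ(x), is used as the stand-in). It is not the Sqish formula.

```python
import numpy as np

def smooth_nonmonotonic(x, beta=1.0):
    """Swish/SiLU-style activation, x * sigmoid(beta * x): smooth everywhere and
    non-monotonic (it dips below zero for moderately negative x before saturating).
    Shown only as a representative of the family Sqish belongs to."""
    return x / (1.0 + np.exp(-beta * x))

x = np.linspace(-6, 6, 7)
print(np.round(smooth_nonmonotonic(x), 3))
```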

From Continuous Dynamics to Graph Neural Networks: Neural Diffusion and Beyond

  • paper_url: http://arxiv.org/abs/2310.10121
  • repo_url: None
  • paper_authors: Andi Han, Dai Shi, Lequan Lin, Junbin Gao
  • for: To provide the first systematic and comprehensive review of graph neural networks (GNNs) designed from a continuous-dynamics perspective.
  • methods: The review introduces foundational ingredients for adapting continuous dynamics (such as heat diffusion) to message passing, along with a general framework for designing graph neural dynamics that address issues such as oversmoothing and oversquashing.
  • results: Existing works are reviewed and categorized by their driving mechanisms and underlying dynamics, the paper summarizes how the limitations of classic GNNs can be addressed under the continuous framework, and multiple open research directions are identified.
    Abstract Graph neural networks (GNNs) have demonstrated significant promise in modelling relational data and have been widely applied in various fields of interest. The key mechanism behind GNNs is the so-called message passing where information is being iteratively aggregated to central nodes from their neighbourhood. Such a scheme has been found to be intrinsically linked to a physical process known as heat diffusion, where the propagation of GNNs naturally corresponds to the evolution of heat density. Analogizing the process of message passing to the heat dynamics allows to fundamentally understand the power and pitfalls of GNNs and consequently informs better model design. Recently, there emerges a plethora of works that proposes GNNs inspired from the continuous dynamics formulation, in an attempt to mitigate the known limitations of GNNs, such as oversmoothing and oversquashing. In this survey, we provide the first systematic and comprehensive review of studies that leverage the continuous perspective of GNNs. To this end, we introduce foundational ingredients for adapting continuous dynamics to GNNs, along with a general framework for the design of graph neural dynamics. We then review and categorize existing works based on their driven mechanisms and underlying dynamics. We also summarize how the limitations of classic GNNs can be addressed under the continuous framework. We conclude by identifying multiple open research directions.
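
The heat-diffusion analogy in the abstract can be made concrete in a few lines: an explicit Euler step of the graph heat equation dX/dt = -LX smooths node features toward their neighbours, which is the behaviour message passing approximates. The graph and step size below are illustrative only.

```python
import numpy as np

def heat_diffusion(X, A, step=0.1, n_steps=10):
    """Explicit Euler integration of dX/dt = -L X on a graph with adjacency A,
    where L = D - A is the combinatorial Laplacian. Each step diffuses (smooths)
    node features toward their neighbours."""
    L = np.diag(A.sum(axis=1)) - A
    for _ in range(n_steps):
        X = X - step * (L @ X)
    return X

# Tiny example: a path graph on 4 nodes with one scalar feature per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.0], [0.0], [0.0]])
print(heat_diffusion(X, A).ravel())   # mass spreads from node 0 to its neighbours
```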

On Generative Agents in Recommendation

  • paper_url: http://arxiv.org/abs/2310.10108
  • repo_url: https://github.com/LehengTHU/Agent4Rec
  • paper_authors: An Zhang, Leheng Sheng, Yuxin Chen, Hao Li, Yang Deng, Xiang Wang, Tat-Seng Chua
  • for: This paper proposes a movie recommendation simulator based on large language models (LLMs) to help address the disconnect between offline metrics and online performance in existing recommender systems.
  • methods: The simulator, Agent4Rec, uses LLM-empowered generative agents, each equipped with user profile, memory, and action modules specifically tailored for recommender systems.
  • results: Extensive, multi-faceted evaluations of Agent4Rec show that LLM-empowered generative agents can faithfully simulate the behavior of real, autonomous humans in recommender systems, while also revealing deviations from user-personalized preferences.
    Abstract Recommender systems are the cornerstone of today's information dissemination, yet a disconnect between offline metrics and online performance greatly hinders their development. Addressing this challenge, we envision a recommendation simulator, capitalizing on recent breakthroughs in human-level intelligence exhibited by Large Language Models (LLMs). We propose Agent4Rec, a novel movie recommendation simulator, leveraging LLM-empowered generative agents equipped with user profile, memory, and actions modules specifically tailored for the recommender system. In particular, these agents' profile modules are initialized using the MovieLens dataset, capturing users' unique tastes and social traits; memory modules log both factual and emotional memories and are integrated with an emotion-driven reflection mechanism; action modules support a wide variety of behaviors, spanning both taste-driven and emotion-driven actions. Each agent interacts with personalized movie recommendations in a page-by-page manner, relying on a pre-implemented collaborative filtering-based recommendation algorithm. We delve into both the capabilities and limitations of Agent4Rec, aiming to explore an essential research question: to what extent can LLM-empowered generative agents faithfully simulate the behavior of real, autonomous humans in recommender systems? Extensive and multi-faceted evaluations of Agent4Rec highlight both the alignment and deviation between agents and user-personalized preferences. Beyond mere performance comparison, we explore insightful experiments, such as emulating the filter bubble effect and discovering the underlying causal relationships in recommendation tasks. Our codes are available at https://github.com/LehengTHU/Agent4Rec.

Regret Analysis of the Posterior Sampling-based Learning Algorithm for Episodic POMDPs

  • paper_url: http://arxiv.org/abs/2310.10107
  • repo_url: None
  • paper_authors: Dengwang Tang, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo
  • for: This paper studies episodic learning problems in partially observable Markov decision processes (POMDPs) with unknown transition and observation models.
  • methods: The posterior sampling-based reinforcement learning (PSRL) algorithm is analyzed for POMDPs, and its Bayesian regret is shown to scale as the square root of the number of episodes.
  • results: In general the Bayesian regret scales exponentially with the horizon length H, and a matching lower bound shows this is inevitable; however, when the POMDP is undercomplete and weakly revealing, a polynomial Bayesian regret bound is established that improves on the recent result of arXiv:2204.08967 by a factor of Ω(H²√(SA)).
    Abstract Compared to Markov Decision Processes (MDPs), learning in Partially Observable Markov Decision Processes (POMDPs) can be significantly harder due to the difficulty of interpreting observations. In this paper, we consider episodic learning problems in POMDPs with unknown transition and observation models. We consider the Posterior Sampling-based Reinforcement Learning (PSRL) algorithm for POMDPs and show that its Bayesian regret scales as the square root of the number of episodes. In general, the regret scales exponentially with the horizon length $H$, and we show that this is inevitable by providing a lower bound. However, under the condition that the POMDP is undercomplete and weakly revealing, we establish a polynomial Bayesian regret bound that improves the regret bound by a factor of $\Omega(H^2\sqrt{SA})$ over the recent result by arXiv:2204.08967.
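
Posterior sampling is easiest to see in the fully observed tabular case; the sketch below shows that simplified variant (sample an MDP from a Dirichlet posterior over transitions, plan by backward induction, act, update counts), not the POMDP algorithm analyzed in the paper. The `env_step`/`env_reset` callables and the known reward table are assumptions of the sketch.

```python
import numpy as np

def psrl_episode(counts, rewards, env_step, env_reset, S, A, H, rng):
    """One PSRL episode in a tabular, fully observed MDP (a simplification of the
    POMDP setting). counts has shape (S, A, S); rewards has shape (S, A)."""
    # 1) Sample a plausible MDP from the posterior (Dirichlet(1 + counts)).
    P = np.array([[rng.dirichlet(1 + counts[s, a]) for a in range(A)]
                  for s in range(S)])                      # shape (S, A, S)

    # 2) Finite-horizon value iteration (backward induction) in the sampled MDP.
    Q = np.zeros((H + 1, S, A))
    for h in range(H - 1, -1, -1):
        V_next = Q[h + 1].max(axis=1)
        Q[h] = rewards + P @ V_next

    # 3) Act greedily in the real environment and update the posterior counts.
    s = env_reset()
    for h in range(H):
        a = int(Q[h, s].argmax())
        s_next = env_step(s, a)
        counts[s, a, s_next] += 1
        s = s_next
```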

Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning

  • paper_url: http://arxiv.org/abs/2310.10103
  • repo_url: https://github.com/Michael-Equi/lfg-nav
  • paper_authors: Dhruv Shah, Michael Equi, Blazej Osinski, Fei Xia, Brian Ichter, Sergey Levine
  • for: To help robots quickly find a goal in unfamiliar environments.
  • methods: Language models provide semantic guesswork that is incorporated as a search heuristic, biasing planning algorithms toward promising parts of novel environments.
  • results: Experiments in challenging real-world environments and simulated benchmarks show that the approach reaches goals faster than uninformed exploration and other ways of using language models.
    Abstract Navigation in unfamiliar environments presents a major challenge for robots: while mapping and planning techniques can be used to build up a representation of the world, quickly discovering a path to a desired goal in unfamiliar settings with such methods often requires lengthy mapping and exploration. Humans can rapidly navigate new environments, particularly indoor environments that are laid out logically, by leveraging semantics -- e.g., a kitchen often adjoins a living room, an exit sign indicates the way out, and so forth. Language models can provide robots with such knowledge, but directly using language models to instruct a robot how to reach some destination can also be impractical: while language models might produce a narrative about how to reach some goal, because they are not grounded in real-world observations, this narrative might be arbitrarily wrong. Therefore, in this paper we study how the ``semantic guesswork'' produced by language models can be utilized as a guiding heuristic for planning algorithms. Our method, Language Frontier Guide (LFG), uses the language model to bias exploration of novel real-world environments by incorporating the semantic knowledge stored in language models as a search heuristic for planning with either topological or metric maps. We evaluate LFG in challenging real-world environments and simulated benchmarks, outperforming uninformed exploration and other ways of using language models.
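
A minimal way to picture "semantic guesswork as a heuristic" is to blend a language-model preference score into an otherwise ordinary frontier-selection rule. The `lm_goal_likelihood` callable below is a hypothetical stand-in for querying a language model; this is not the paper's LFG implementation.

```python
def choose_frontier(frontiers, goal, distance_to, lm_goal_likelihood, weight=0.7):
    """Pick the exploration frontier with the best trade-off between (short) path
    cost and (high) language-model score for leading to the goal.

    frontiers          : candidate subgoals, each with a text description
    distance_to(f)     : planner's estimated path cost to frontier f
    lm_goal_likelihood : hypothetical callable returning a score in [0, 1] for
                         "the goal is likely found past this frontier"
    """
    def utility(f):
        semantic = lm_goal_likelihood(f["description"], goal)   # the LM's guess
        return weight * semantic - (1 - weight) * distance_to(f)
    return max(frontiers, key=utility)

# Illustrative call with a dummy scorer that prefers kitchen-adjacent rooms.
frontiers = [{"description": "doorway into a dining room", "pos": (3, 1)},
             {"description": "dark corridor to a garage", "pos": (8, 2)}]
best = choose_frontier(frontiers, goal="find the kitchen",
                       distance_to=lambda f: sum(f["pos"]),
                       lm_goal_likelihood=lambda d, g: 0.9 if "dining" in d else 0.2)
print(best["description"])
```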

Reusing Pretrained Models by Multi-linear Operators for Efficient Training

  • paper_url: http://arxiv.org/abs/2310.10699
  • repo_url: None
  • paper_authors: Yu Pan, Ye Yuan, Yichun Yin, Zenglin Xu, Lifeng Shang, Xin Jiang, Qun Liu
  • for: To speed up the training of large models by using small pretrained models to initialize a large model (the "target model"), linearly correlating the weights of the two models to enhance acceleration.
  • methods: Each weight of the target model is linearly correlated with all weights of the pretrained model; multi-linear operators reduce the computational and spatial complexity of this mapping so that resource requirements remain acceptable.
  • results: Experiments show the method saves 76% of the computational cost when growing DeiT-small into DeiT-base, outperforming bert2BERT by 12.0% and LiGO by 20.7%.
    Abstract Training large models from scratch usually costs a substantial amount of resources. Towards this problem, recent studies such as bert2BERT and LiGO have reused small pretrained models to initialize a large model (termed the ``target model''), leading to a considerable acceleration in training. Despite the successes of these previous studies, they grew pretrained models by mapping partial weights only, ignoring potential correlations across the entire model. As we show in this paper, there are inter- and intra-interactions among the weights of both the pretrained and the target models. As a result, the partial mapping may not capture the complete information and lead to inadequate growth. In this paper, we propose a method that linearly correlates each weight of the target model to all the weights of the pretrained model to further enhance acceleration ability. We utilize multi-linear operators to reduce computational and spacial complexity, enabling acceptable resource requirements. Experiments demonstrate that our method can save 76\% computational costs on DeiT-base transferred from DeiT-small, which outperforms bert2BERT by +12.0\% and LiGO by +20.7\%, respectively.
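
The core idea -- every target weight depends on all pretrained weights, but through structured multi-linear operators rather than one enormous dense map -- can be sketched with a simple bilinear expansion of a single weight matrix. The expansion matrices here are random placeholders; in the method they would be learned, and the dimensions are only illustrative.

```python
import numpy as np

def bilinear_expand(W_small, A, B):
    """Map a pretrained weight matrix (d_in_s x d_out_s) to a larger target matrix
    (d_in_t x d_out_t) via W_large = A @ W_small @ B. Every target entry is a
    linear combination of *all* source entries, yet only A and B need to be
    stored instead of a dense (d_in_t*d_out_t) x (d_in_s*d_out_s) map."""
    return A @ W_small @ B

rng = np.random.default_rng(0)
W_small = rng.normal(size=(384, 384))            # e.g. a small-model projection
A = rng.normal(size=(768, 384)) / np.sqrt(384)   # row-expansion operator (placeholder)
B = rng.normal(size=(384, 768)) / np.sqrt(384)   # column-expansion operator (placeholder)

W_large = bilinear_expand(W_small, A, B)         # initialization for a larger layer
print(W_large.shape)                             # (768, 768)
```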

Orthogonal Uncertainty Representation of Data Manifold for Robust Long-Tailed Learning

  • paper_url: http://arxiv.org/abs/2310.10090
  • repo_url: None
  • paper_authors: Yanbiao Ma, Licheng Jiao, Fang Liu, Shuyuan Yang, Xu Liu, Lingling Li
  • for: To improve model robustness under long-tailed distributions.
  • methods: An Orthogonal Uncertainty Representation (OUR) of feature embeddings combined with an end-to-end training strategy.
  • results: Comprehensive evaluations on long-tailed datasets show that OUR significantly alleviates the long-tail phenomenon of robustness, combines well with other long-tailed learning methods, and requires no additional data generation, keeping training fast and efficient.
    Abstract In scenarios with long-tailed distributions, the model's ability to identify tail classes is limited due to the under-representation of tail samples. Class rebalancing, information augmentation, and other techniques have been proposed to facilitate models to learn the potential distribution of tail classes. The disadvantage is that these methods generally pursue models with balanced class accuracy on the data manifold, while ignoring the ability of the model to resist interference. By constructing noisy data manifold, we found that the robustness of models trained on unbalanced data has a long-tail phenomenon. That is, even if the class accuracy is balanced on the data domain, it still has bias on the noisy data manifold. However, existing methods cannot effectively mitigate the above phenomenon, which makes the model vulnerable in long-tailed scenarios. In this work, we propose an Orthogonal Uncertainty Representation (OUR) of feature embedding and an end-to-end training strategy to improve the long-tail phenomenon of model robustness. As a general enhancement tool, OUR has excellent compatibility with other methods and does not require additional data generation, ensuring fast and efficient training. Comprehensive evaluations on long-tailed datasets show that our method significantly improves the long-tail phenomenon of robustness, bringing consistent performance gains to other long-tailed learning methods.

MOCHA: Real-Time Motion Characterization via Context Matching

  • paper_url: http://arxiv.org/abs/2310.10079
  • repo_url: https://github.com/DK-Jang/MOCHA_SIGASIA2023
  • paper_authors: Deok-Kyeong Jang, Yuting Ye, Jungdam Won, Sung-Hee Lee
  • for: To transform neutral, characterless input motions into the distinct style of a notable character in real time.
  • methods: A novel online motion characterization framework, MOCHA, transfers both the motion style and the body proportions of a target character onto an input source motion.
  • results: The framework performs real-time motion characterization and easily accommodates various applications, such as characterization from only sparse input; the paper also contributes a high-quality motion dataset of six different characters performing a range of motions, a valuable resource for future research.
    Abstract Transforming neutral, characterless input motions to embody the distinct style of a notable character in real time is highly compelling for character animation. This paper introduces MOCHA, a novel online motion characterization framework that transfers both motion styles and body proportions from a target character to an input source motion. MOCHA begins by encoding the input motion into a motion feature that structures the body part topology and captures motion dependencies for effective characterization. Central to our framework is the Neural Context Matcher, which generates a motion feature for the target character with the most similar context to the input motion feature. The conditioned autoregressive model of the Neural Context Matcher can produce temporally coherent character features in each time frame. To generate the final characterized pose, our Characterizer network incorporates the characteristic aspects of the target motion feature into the input motion feature while preserving its context. This is achieved through a transformer model that introduces the adaptive instance normalization and context mapping-based cross-attention, effectively injecting the character feature into the source feature. We validate the performance of our framework through comparisons with prior work and an ablation study. Our framework can easily accommodate various applications, including characterization with only sparse input and real-time characterization. Additionally, we contribute a high-quality motion dataset comprising six different characters performing a range of motions, which can serve as a valuable resource for future research.

Verbosity Bias in Preference Labeling by Large Language Models

  • paper_url: http://arxiv.org/abs/2310.10076
  • repo_url: None
  • paper_authors: Keita Saito, Akifumi Wachi, Koki Wataoka, Youhei Akimoto
  • for: To study preference labeling for LLMs when Reinforcement Learning from AI Feedback (RLAIF) replaces human feedback in evaluating LLM outputs.
  • methods: Preference labels from GPT-4 are compared with human feedback, and a metric is proposed to quantify verbosity bias.
  • results: In the studied setting, GPT-4 prefers longer answers more than humans do, even when the answers are of similar quality.
    Abstract In recent years, Large Language Models (LLMs) have witnessed a remarkable surge in prevalence, altering the landscape of natural language processing and machine learning. One key factor in improving the performance of LLMs is alignment with humans achieved with Reinforcement Learning from Human Feedback (RLHF), as for many LLMs such as GPT-4, Bard, etc. In addition, recent studies are investigating the replacement of human feedback with feedback from other LLMs named Reinforcement Learning from AI Feedback (RLAIF). We examine the biases that come along with evaluating LLMs with other LLMs and take a closer look into verbosity bias -- a bias where LLMs sometimes prefer more verbose answers even if they have similar qualities. We see that in our problem setting, GPT-4 prefers longer answers more than humans. We also propose a metric to measure this bias.

Fine-tuning ChatGPT for Automatic Scoring

  • paper_url: http://arxiv.org/abs/2310.10072
  • repo_url: None
  • paper_authors: Ehsan Latif, Xiaoming Zhai
  • For: This paper demonstrates the potential of fine-tuned ChatGPT (GPT-3.5) for automatically scoring student written constructed responses in science education.
  • Methods: GPT-3.5 is fine-tuned on six assessment tasks (two multi-label and four multi-class) with a diverse dataset of middle-school and high-school student responses and expert scoring.
  • Results: Fine-tuned GPT-3.5 achieves a remarkable average increase (9.1%) in automatic scoring accuracy over the fine-tuned state-of-the-art model BERT, with significant improvements on both multi-label and multi-class items.
    Abstract This study highlights the potential of fine-tuned ChatGPT (GPT-3.5) for automatically scoring student written constructed responses using example assessment tasks in science education. Recent studies on OpenAI's generative model GPT-3.5 proved its superiority in predicting the natural language with high accuracy and human-like responses. GPT-3.5 has been trained over enormous online language materials such as journals and Wikipedia; therefore, more than direct usage of pre-trained GPT-3.5 is required for automatic scoring as students utilize a different language than trained material. These imply that a domain-specific model, fine-tuned over data for specific tasks, can enhance model performance. In this study, we fine-tuned GPT-3.5 on six assessment tasks with a diverse dataset of middle-school and high-school student responses and expert scoring. The six tasks comprise two multi-label and four multi-class assessment tasks. We compare the performance of fine-tuned GPT-3.5 with the fine-tuned state-of-the-art Google's generated language model, BERT. The results show that in-domain training corpora constructed from science questions and responses for BERT achieved average accuracy = 0.838, SD = 0.069. GPT-3.5 shows a remarkable average increase (9.1%) in automatic scoring accuracy (mean = 9.15, SD = 0.042) for the six tasks, p =0.001 < 0.05. Specifically, for multi-label tasks (item 1 with 5 labels; item 2 with 10 labels), GPT-3.5 achieved significantly higher scoring accuracy than BERT across all the labels, with the second item achieving a 7.1% increase. The average scoring increase for the four multi-class items for GPT-3.5 was 10.6% compared to BERT. Our study confirmed the effectiveness of fine-tuned GPT-3.5 for automatic scoring of student responses on domain-specific data in education with high accuracy. We have released fine-tuned models for public use and community engagement.

GreatSplicing: A Semantically Rich Splicing Dataset

  • paper_url: http://arxiv.org/abs/2310.10070
  • repo_url: None
  • paper_authors: Xiuli Bi, Jiaming Liang
  • for: To address the lack of semantic variety in existing splicing forgery datasets and improve the accuracy of splicing-trace detection.
  • methods: A manually created splicing dataset, GreatSplicing, comprising 5,000 spliced images whose spliced regions cover 335 distinct semantic categories.
  • results: Models trained on GreatSplicing exhibit lower misidentification rates and better cross-dataset detection capabilities than models trained on existing datasets.
    Abstract In existing splicing forgery datasets, the insufficient semantic varieties of spliced regions cause a problem that trained detection models overfit semantic features rather than splicing traces. Meanwhile, because of the absence of a reasonable dataset, different detection methods proposed cannot reach a consensus on experimental settings. To address these urgent issues, GreatSplicing, a manually created splicing dataset with a considerable amount and high quality, is proposed in this paper. GreatSplicing comprises 5,000 spliced images and covers spliced regions with 335 distinct semantic categories, allowing neural networks to grasp splicing traces better. Extensive experiments demonstrate that models trained on GreatSplicing exhibit minimal misidentification rates and superior cross-dataset detection capabilities compared to existing datasets. Furthermore, GreatSplicing is available for all research purposes and can be downloaded from www.greatsplicing.net.

Learning Graph Filters for Spectral GNNs via Newton Interpolation

  • paper_url: http://arxiv.org/abs/2310.10064
  • repo_url: None
  • paper_authors: Junjie Xu, Enyan Dai, Dongsheng Luo, Xiang Zhang, Suhang Wang
  • for: To investigate how the choice of filter frequency in spectral graph neural networks (GNNs) relates to the homophily level of graph data, and how task-supervised spectral filters can capture essential frequency information.
  • methods: Theoretical and empirical analyses of existing spectral GNNs show that low-frequency filters correlate positively with homophily while high-frequency filters correlate negatively; based on this, a shape-aware regularization technique is applied to a Newton Interpolation-based spectral filter to customize polynomial filters for desired homophily levels.
  • results: Experiments show that NewtonNet successfully achieves the desired filter shapes and exhibits superior performance on both homophilous and heterophilous datasets.
    Abstract Spectral Graph Neural Networks (GNNs) are gaining attention because they can surpass the limitations of message-passing GNNs by learning spectral filters that capture essential frequency information in graph data through task supervision. However, previous research suggests that the choice of filter frequency is tied to the graph's homophily level, a connection that hasn't been thoroughly explored in existing spectral GNNs. To address this gap, the study conducts both theoretical and empirical analyses, revealing that low-frequency filters have a positive correlation with homophily, while high-frequency filters have a negative correlation. This leads to the introduction of a shape-aware regularization technique applied to a Newton Interpolation-based spectral filter, enabling the customization of polynomial spectral filters that align with desired homophily levels. Extensive experiments demonstrate that NewtonNet successfully achieves the desired filter shapes and exhibits superior performance on both homophilous and heterophilous datasets.
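
The Newton-interpolation idea can be illustrated directly: pick a few (eigenvalue, response) pairs that encode a desired filter shape, build the interpolating polynomial from divided differences, and apply it to a graph signal as a polynomial in the normalized Laplacian. The interpolation points below are arbitrary examples, not values learned by NewtonNet.

```python
import numpy as np

def newton_coefficients(xs, ys):
    """Divided-difference coefficients of the Newton-form interpolating polynomial."""
    c = np.array(ys, dtype=float)
    for j in range(1, len(xs)):
        c[j:] = (c[j:] - c[j - 1:-1]) / (xs[j:] - xs[:-j])
    return c

def apply_newton_filter(L, x, xs, coeffs):
    """Evaluate h(L) x where h is the Newton-form polynomial through (xs, ys):
    h(L) = c0 I + c1 (L - x0 I) + c2 (L - x0 I)(L - x1 I) + ..."""
    out = coeffs[0] * x
    basis = x.copy()
    for k in range(1, len(coeffs)):
        basis = L @ basis - xs[k - 1] * basis      # multiply by (L - x_{k-1} I)
        out += coeffs[k] * basis
    return out

# Toy graph: 4-node path; the normalized Laplacian has spectrum in [0, 2].
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(1)))
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt

# A low-pass-leaning shape (illustrative): strong response at low frequencies.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 0.4, 0.1])
coeffs = newton_coefficients(xs, ys)

x = np.array([1.0, -1.0, 1.0, -1.0])               # a high-frequency signal
print(np.round(apply_newton_filter(L, x, xs, coeffs), 3))
```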

A Comprehensive Evaluation of Tool-Assisted Generation Strategies

  • paper_url: http://arxiv.org/abs/2310.10062
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Alon Jacovi, Avi Caciularu, Jonathan Herzig, Roee Aharoni, Bernd Bohnet, Mor Geva
  • for: This paper aims to investigate the effectiveness of various few-shot tool-usage strategies for augmenting language models, and to provide a systematic and fair comparison with strong baselines.
  • methods: The paper uses empirical analysis to compare the performance of different few-shot tool-usage strategies, including strategies that refine incorrect outputs with tools and strategies that retrieve relevant information ahead of or during generation.
  • results: The paper finds that strong no-tool baselines are competitive to tool-assisted strategies, and that tool-assisted strategies are expensive in terms of the number of tokens they require. The paper emphasizes the need for comprehensive evaluations of future strategies to accurately assess their benefits and costs.
    Abstract A growing area of research investigates augmenting language models with tools (e.g., search engines, calculators) to overcome their shortcomings (e.g., missing or incorrect knowledge, incorrect logical inferences). Various few-shot tool-usage strategies have been proposed. However, there is no systematic and fair comparison across different strategies, or between these strategies and strong baselines that do not leverage tools. We conduct an extensive empirical analysis, finding that (1) across various datasets, example difficulty levels, and models, strong no-tool baselines are competitive to tool-assisted strategies, implying that effectively using tools with in-context demonstrations is a difficult unsolved problem; (2) for knowledge-retrieval tasks, strategies that *refine* incorrect outputs with tools outperform strategies that retrieve relevant information *ahead of* or *during generation*; (3) tool-assisted strategies are expensive in the number of tokens they require to work -- incurring additional costs by orders of magnitude -- which does not translate into significant improvement in performance. Overall, our findings suggest that few-shot tool integration is still an open challenge, emphasizing the need for comprehensive evaluations of future strategies to accurately assess their *benefits* and *costs*.
    摘要 越来越多的研究尝试用工具(如搜索引擎、计算器)增强语言模型,以弥补其缺陷(如知识缺失或错误、逻辑推理错误)。各种少样本(few-shot)工具使用策略已被提出,但缺乏系统且公平的横向比较,也缺乏与不使用工具的强基线的比较。我们进行了大规模实验分析,发现:(1)在不同的数据集、示例难度和模型上,不使用工具的强基线与工具辅助策略表现相当,说明通过上下文示例有效使用工具仍是一个尚未解决的难题;(2)对于知识检索任务,先生成再用工具修正错误输出的策略,优于在生成之前或生成过程中检索相关信息的策略;(3)工具辅助策略需要多出数个数量级的 token 开销,而这些额外成本并未带来显著的性能提升。总体而言,我们的发现表明少样本工具集成仍是一个开放的挑战,需要对未来策略的收益与成本进行全面评估。

Flow Dynamics Correction for Action Recognition

  • paper_url: http://arxiv.org/abs/2310.10059
  • repo_url: None
  • paper_authors: Lei Wang, Piotr Koniusz
  • for: investigate different optical flows, and the features extracted from them, to improve action recognition performance
  • methods: power normalization on the magnitude component of optical flow for flow dynamics correction, and integration of the corrected flow dynamics into popular models through a simple hallucination step
  • results: performance boost with the corrected optical flow, and new state-of-the-art results on several benchmarks including HMDB-51, YUP++, fine-grained action recognition on MPII Cooking Activities, and large-scale Charades
    Abstract Various research studies indicate that action recognition performance depends heavily on the types of motions being extracted and how accurately the human actions are represented. In this paper, we investigate different optical flows, and features extracted from them, that capture both short-term and long-term motion dynamics. We perform power normalization on the magnitude component of optical flow for flow dynamics correction, boosting subtle motions and dampening sudden ones. We show that existing action recognition models which rely on optical flow obtain a performance boost with our corrected optical flow. To further improve performance, we integrate our corrected flow dynamics into popular models through a simple hallucination step that selects only the best-performing optical flow features, and we show that 'translating' the CNN feature maps into these optical flow features at different scales of motion leads to new state-of-the-art performance on several benchmarks including HMDB-51, YUP++, fine-grained action recognition on MPII Cooking Activities, and large-scale Charades.
    摘要 多项研究表明,动作识别性能在很大程度上取决于所提取的运动类型以及人体动作被表示的准确程度。在本文中,我们研究了不同的光流,以及从这些光流中提取的、同时刻画短期与长期运动动态的特征。我们对光流的幅值分量进行幂归一化(power normalization)以校正流动态,从而增强细微运动或抑制突发运动。我们证明,依赖光流的现有动作识别模型使用我们校正后的光流即可获得性能提升。为进一步提升性能,我们通过一个简单的 hallucination 步骤,仅选取表现最好的光流特征,把校正后的流动态集成到主流模型中;将 CNN 特征图"转换"为不同运动尺度的光流特征后,在 HMDB-51、YUP++、MPII Cooking Activities 细粒度动作识别以及大规模 Charades 等多个基准上取得了新的最优(state-of-the-art)性能。
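The flow dynamics correction described above amounts to a power normalization of the optical-flow magnitude while preserving the flow direction, so that subtle motions are boosted (exponent below 1) or sudden motions dampened (exponent above 1). Below is a minimal NumPy sketch of that operation under our reading of the abstract; the exponent `gamma=0.5` and the random flow field are illustrative, not the paper's settings.

```python
import numpy as np

def power_normalize_flow(flow, gamma=0.5, eps=1e-8):
    """Power-normalize the magnitude of an (H, W, 2) optical flow field.

    gamma < 1 amplifies subtle motions relative to large ones,
    gamma > 1 suppresses them; the per-pixel flow direction is unchanged.
    """
    u, v = flow[..., 0], flow[..., 1]
    mag = np.sqrt(u ** 2 + v ** 2)
    corrected_mag = np.power(mag + eps, gamma)
    scale = corrected_mag / (mag + eps)          # per-pixel rescaling factor
    return flow * scale[..., None]

# Illustrative usage on a random flow field standing in for e.g. TV-L1 output.
flow = np.random.randn(224, 224, 2).astype(np.float32)
corrected = power_normalize_flow(flow, gamma=0.5)
```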

NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models

  • paper_url: http://arxiv.org/abs/2310.10054
  • repo_url: https://github.com/jongwooko/nash-pruning-official
  • paper_authors: Jongwoo Ko, Seungjoon Park, Yujin Kim, Sumyeong Ahn, Du-Seong Chang, Euijai Ahn, Se-Young Yun
  • for: This study investigates structured pruning of encoder-decoder models to improve inference speed while preserving generation quality.
  • methods: The encoder and decoder components are pruned from a decoupled perspective, analyzing the effect of structured pruning on each part separately.
  • results: The analysis shows that the number of decoder layers is the dominant factor for inference speed, while low sparsity in the pruned encoder preserves generation quality. Based on these findings, the authors propose NASH, a simple and effective framework that narrows the encoder and shortens the decoder, and can be readily applied across tasks and architectures.
    Abstract Structured pruning methods have proven effective in reducing the model size and accelerating inference speed in various network architectures such as Transformers. Despite the versatility of encoder-decoder models in numerous NLP tasks, the structured pruning methods on such models are relatively less explored compared to encoder-only models. In this study, we investigate the behavior of the structured pruning of the encoder-decoder models in the decoupled pruning perspective of the encoder and decoder component, respectively. Our findings highlight two insights: (1) the number of decoder layers is the dominant factor of inference speed, and (2) low sparsity in the pruned encoder network enhances generation quality. Motivated by these findings, we propose a simple and effective framework, NASH, that narrows the encoder and shortens the decoder networks of encoder-decoder models. Extensive experiments on diverse generation and inference tasks validate the effectiveness of our method in both speedup and output quality.
    摘要 结构化剪枝方法已被证明能够在 Transformer 等多种网络架构中有效减小模型规模并加速推理。尽管 encoder-decoder 模型在许多自然语言处理任务中用途广泛,但与 encoder-only 模型相比,这类模型上的结构化剪枝研究相对较少。在本研究中,我们从 encoder 与 decoder 组件解耦的视角,分别研究 encoder-decoder 模型的结构化剪枝行为。我们的发现包括两点:(1)decoder 层数是推理速度的主导因素;(2)剪枝后的 encoder 网络保持较低的稀疏度有助于提升生成质量。基于这些发现,我们提出了一个简单而有效的框架 NASH,它收窄 encoder 网络并缩短 decoder 网络。在多种生成与推理任务上的大量实验验证了该方法在加速和输出质量两方面的有效性。
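The "narrow the encoder, shorten the decoder" recipe can be illustrated on a toy PyTorch encoder-decoder: depth pruning removes decoder layers (the dominant factor for latency), while the encoder is only mildly width-pruned so its sparsity stays low. The following is a schematic sketch of those structural operations, not the NASH implementation; the layer counts, the 75% keep ratio, and the magnitude-based importance score are illustrative assumptions.

```python
import torch
import torch.nn as nn

d_model, nhead = 256, 8
enc_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=1024, batch_first=True)
dec_layer = nn.TransformerDecoderLayer(d_model, nhead, dim_feedforward=1024, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=6)
decoder = nn.TransformerDecoder(dec_layer, num_layers=6)

# Shorten the decoder: keep only the first k layers (depth pruning drives latency).
k = 2
decoder.layers = nn.ModuleList(list(decoder.layers)[:k])
decoder.num_layers = k

# Narrow the encoder mildly: keep 75% of each FFN's hidden units (low sparsity),
# ranked here by weight magnitude as a stand-in for a learned importance score.
keep_ratio = 0.75
for layer in encoder.layers:
    importance = layer.linear1.weight.abs().sum(dim=1)
    keep = importance.topk(int(keep_ratio * importance.numel())).indices
    w1, b1 = layer.linear1.weight.data[keep], layer.linear1.bias.data[keep]
    w2, b2 = layer.linear2.weight.data[:, keep], layer.linear2.bias.data
    layer.linear1 = nn.Linear(d_model, len(keep))
    layer.linear2 = nn.Linear(len(keep), d_model)
    with torch.no_grad():
        layer.linear1.weight.copy_(w1)
        layer.linear1.bias.copy_(b1)
        layer.linear2.weight.copy_(w2)
        layer.linear2.bias.copy_(b2)

memory = encoder(torch.randn(1, 16, d_model))
out = decoder(torch.randn(1, 8, d_model), memory)   # runs with the pruned structure
print(out.shape)
```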

Robust Collaborative Filtering to Popularity Distribution Shift

  • paper_url: http://arxiv.org/abs/2310.10696
  • repo_url: https://github.com/anzhang314/popgo
  • paper_authors: An Zhang, Wenchang Ma, Jingnan Zheng, Xiang Wang, Tat-seng Chua
  • for: This work aims to improve the generalization ability of collaborative filtering (CF) models even when popularity shortcuts are present in the training data.
  • methods: The paper proposes PopGo, a simple yet effective debiasing strategy that quantifies and reduces the interaction-wise popularity shortcut of user-item pairs. PopGo first learns a shortcut model and then trains the CF model by adjusting its predictions with the estimated shortcut degrees, without assuming knowledge of the test distribution.
  • results: On four benchmark datasets, PopGo achieves significant gains on both ID and OOD test sets over existing debiasing strategies such as DICE and MACR.
    Abstract In leading collaborative filtering (CF) models, representations of users and items are prone to learn popularity bias in the training data as shortcuts. The popularity shortcut tricks are good for in-distribution (ID) performance but poorly generalized to out-of-distribution (OOD) data, i.e., when popularity distribution of test data shifts w.r.t. the training one. To close the gap, debiasing strategies try to assess the shortcut degrees and mitigate them from the representations. However, there exist two deficiencies: (1) when measuring the shortcut degrees, most strategies only use statistical metrics on a single aspect (i.e., item frequency on item and user frequency on user aspect), failing to accommodate the compositional degree of a user-item pair; (2) when mitigating shortcuts, many strategies assume that the test distribution is known in advance. This results in low-quality debiased representations. Worse still, these strategies achieve OOD generalizability with a sacrifice on ID performance. In this work, we present a simple yet effective debiasing strategy, PopGo, which quantifies and reduces the interaction-wise popularity shortcut without any assumptions on the test data. It first learns a shortcut model, which yields a shortcut degree of a user-item pair based on their popularity representations. Then, it trains the CF model by adjusting the predictions with the interaction-wise shortcut degrees. By taking both causal- and information-theoretical looks at PopGo, we can justify why it encourages the CF model to capture the critical popularity-agnostic features while leaving the spurious popularity-relevant patterns out. We use PopGo to debias two high-performing CF models (MF, LightGCN) on four benchmark datasets. On both ID and OOD test sets, PopGo achieves significant gains over the state-of-the-art debiasing strategies (e.g., DICE, MACR).
    摘要 在主流协同过滤(CF)模型中,用户和物品的表示容易把训练数据中的流行度偏差当作捷径(popularity shortcut)来学习。这种流行度捷径有利于分布内(ID)性能,但当测试数据的流行度分布相对训练分布发生偏移时,即分布外(OOD)场景下,泛化能力很差。为弥合这一差距,去偏策略会评估捷径程度并将其从表示中消除。然而现有方法存在两点不足:(1)在度量捷径程度时,多数策略只使用单一维度的统计指标(如物品侧的物品频次、用户侧的用户频次),无法刻画用户-物品对的组合性捷径程度;(2)在消除捷径时,许多策略假设测试分布已知,导致去偏表示质量不高,甚至以牺牲 ID 性能为代价换取 OOD 泛化能力。在本工作中,我们提出一种简单而有效的去偏策略 PopGo,它在不对测试数据做任何假设的前提下,量化并降低交互级的流行度捷径。PopGo 首先学习一个捷径模型,基于流行度表示给出每个用户-物品对的捷径程度;随后利用交互级捷径程度来调整 CF 模型的预测并对其进行训练。从因果和信息论两个角度分析 PopGo,可以解释它为何能促使 CF 模型学习与流行度无关的关键特征,而摒弃与流行度相关的伪相关模式。我们用 PopGo 对两种高性能 CF 模型(MF、LightGCN)在四个基准数据集上进行去偏,在 ID 与 OOD 测试集上均显著优于 DICE、MACR 等最新去偏策略。
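A schematic of the PopGo training scheme: a shortcut branch scores each user-item interaction from popularity-flavored representations, and the CF branch is trained on predictions modulated by that interaction-wise shortcut degree so that popularity-related signal is absorbed by the shortcut branch; inference uses the CF branch alone. The toy matrix-factorization sketch below follows our reading of the abstract rather than the released PopGo code; the embedding sizes, the sigmoid modulation, and the BPR-style losses are illustrative choices.

```python
import torch
import torch.nn as nn

class MF(nn.Module):
    """A tiny matrix-factorization scorer used for both branches."""
    def __init__(self, n_users, n_items, dim):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.item = nn.Embedding(n_items, dim)

    def forward(self, u, i):
        return (self.user(u) * self.item(i)).sum(-1)

n_users, n_items, dim = 1000, 2000, 32
cf_model = MF(n_users, n_items, dim)          # main CF branch, used alone at test time
shortcut_model = MF(n_users, n_items, dim)    # popularity-shortcut branch
opt = torch.optim.Adam(
    list(cf_model.parameters()) + list(shortcut_model.parameters()), lr=1e-3)

def bpr(pos, neg):
    return -torch.log(torch.sigmoid(pos - neg) + 1e-9).mean()

# One illustrative training step on random (user, positive item, negative item) triples.
u = torch.randint(0, n_users, (256,))
i_pos = torch.randint(0, n_items, (256,))
i_neg = torch.randint(0, n_items, (256,))

# 1) interaction-wise shortcut degree for each pair, from the shortcut branch
s_pos = torch.sigmoid(shortcut_model(u, i_pos))
s_neg = torch.sigmoid(shortcut_model(u, i_neg))

# 2) CF predictions adjusted by the shortcut degrees during training only
loss = bpr(cf_model(u, i_pos) * s_pos, cf_model(u, i_neg) * s_neg) \
       + bpr(shortcut_model(u, i_pos), shortcut_model(u, i_neg))
opt.zero_grad(); loss.backward(); opt.step()

# Inference relies on the popularity-agnostic CF branch alone.
test_scores = cf_model(u, i_pos)
```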

FATE-LLM: A Industrial Grade Federated Learning Framework for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.10049
  • repo_url: https://github.com/FederatedAI/FATE-LLM
  • paper_authors: Tao Fan, Yan Kang, Guoqiang Ma, Weijing Chen, Wenbin Wei, Lixin Fan, Qiang Yang
  • for: This paper presents an industrial-grade federated learning framework that makes large language models (LLMs) practical to use in real-world applications.
  • methods: The framework trains LLMs efficiently with parameter-efficient fine-tuning methods and applies privacy-preserving mechanisms to protect intellectual property and data privacy.
  • results: The framework addresses the massive computing-resource and high-quality-data requirements of LLM training while protecting model intellectual property and data privacy during both training and inference.
    Abstract Large Language Models (LLMs), such as ChatGPT, LLaMA, GLM, and PaLM, have exhibited remarkable performances across various tasks in recent years. However, LLMs face two main challenges in real-world applications. One challenge is that training LLMs consumes vast computing resources, preventing LLMs from being adopted by small and medium-sized enterprises with limited computing resources. Another is that training LLM requires a large amount of high-quality data, which are often scattered among enterprises. To address these challenges, we propose FATE-LLM, an industrial-grade federated learning framework for large language models. FATE-LLM (1) facilitates federated learning for large language models (coined FedLLM); (2) promotes efficient training of FedLLM using parameter-efficient fine-tuning methods; (3) protects the intellectual property of LLMs; (4) preserves data privacy during training and inference through privacy-preserving mechanisms. We release the code of FATE-LLM at https://github.com/FederatedAI/FATE-LLM to facilitate the research of FedLLM and enable a broad range of industrial applications.
    摘要 近年来,ChatGPT、LLaMA、GLM、PaLM 等大型语言模型(LLM)在各类任务上表现突出。然而,LLM 在实际应用中面临两大挑战:其一,训练 LLM 需要消耗大量计算资源,使得计算资源有限的中小企业难以采用;其二,训练 LLM 需要大量高质量数据,而这些数据往往分散在不同企业之间。为应对这些挑战,我们提出了面向大型语言模型的工业级联邦学习框架 FATE-LLM。FATE-LLM(1)支持大型语言模型的联邦学习(称为 FedLLM);(2)利用参数高效微调方法实现 FedLLM 的高效训练;(3)保护 LLM 的知识产权;(4)通过隐私保护机制在训练与推理过程中保护数据隐私。我们已在 https://github.com/FederatedAI/FATE-LLM 发布 FATE-LLM 的代码,以促进 FedLLM 研究并支持广泛的工业应用。
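A core pattern in federated LLM fine-tuning of this kind is that each client updates only small parameter-efficient adapters locally and the server aggregates those updates, so neither raw data nor the full base model needs to be exchanged. The sketch below shows a generic FedAvg loop over hypothetical LoRA-style adapter weights in plain NumPy; it is a conceptual illustration, not FATE-LLM's API, and the function names and adapter shapes are assumptions.

```python
import numpy as np

def local_finetune(adapter, num_records):
    """Placeholder for a client's parameter-efficient fine-tuning step (e.g. LoRA).

    Here we simply perturb the adapter to stand in for a few local epochs.
    """
    return {name: w + 0.01 * np.random.randn(*w.shape) for name, w in adapter.items()}

def fedavg(updates, weights):
    """Weighted average of client adapter updates (only adapters are shared)."""
    total = sum(weights)
    return {
        name: sum(w * upd[name] for w, upd in zip(weights, updates)) / total
        for name in updates[0]
    }

# Hypothetical adapter: two low-rank LoRA matrices for a single attention layer.
global_adapter = {"lora_A": np.zeros((8, 768)), "lora_B": np.zeros((768, 8))}
client_sizes = [1200, 800, 400]            # records per client, used as FedAvg weights

for rnd in range(3):                        # federated rounds
    client_updates = [local_finetune(dict(global_adapter), size) for size in client_sizes]
    global_adapter = fedavg(client_updates, client_sizes)
    print(f"round {rnd}: aggregated adapter norm "
          f"{np.linalg.norm(global_adapter['lora_A']):.4f}")
```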

TRANSOM: An Efficient Fault-Tolerant System for Training LLMs

  • paper_url: http://arxiv.org/abs/2310.10046
  • repo_url: https://github.com/SenseCore/transom-checkpoint-engine
  • paper_authors: Baodong Wu, Lei Xia, Qingping Li, Kangyu Li, Xu Chen, Yongqiang Guo, Tieyao Xiang, Yuheng Chen, Shigang Li
  • for: Improve the training efficiency of large language models (LLMs) by tolerating the hardware and software failures that inevitably occur during super-large-scale training.
  • methods: A novel fault-tolerant LLM training system with three key subsystems: a training-pipeline automatic fault-tolerance and recovery mechanism (TOL), a multi-dimensional-metric automatic anomaly detection system for training tasks (TEE), and an asynchronous-access, fault-tolerant training-checkpoint technology (TCE).
  • results: Experiments show that TRANSOM significantly improves large-scale LLM training efficiency: pre-training time for GPT3-175B is reduced by 28%, and asynchronous checkpoint saving and loading become 20x faster.
    Abstract Large language models (LLMs) with hundreds of billions or trillions of parameters, represented by chatGPT, have achieved profound impact on various fields. However, training LLMs with super-large-scale parameters requires large high-performance GPU clusters and long training periods lasting for months. Due to the inevitable hardware and software failures in large-scale clusters, maintaining uninterrupted and long-duration training is extremely challenging. As a result, A substantial amount of training time is devoted to task checkpoint saving and loading, task rescheduling and restart, and task manual anomaly checks, which greatly harms the overall training efficiency. To address these issues, we propose TRANSOM, a novel fault-tolerant LLM training system. In this work, we design three key subsystems: the training pipeline automatic fault tolerance and recovery mechanism named Transom Operator and Launcher (TOL), the training task multi-dimensional metric automatic anomaly detection system named Transom Eagle Eye (TEE), and the training checkpoint asynchronous access automatic fault tolerance and recovery technology named Transom Checkpoint Engine (TCE). Here, TOL manages the lifecycle of training tasks, while TEE is responsible for task monitoring and anomaly reporting. TEE detects training anomalies and reports them to TOL, who automatically enters the fault tolerance strategy to eliminate abnormal nodes and restart the training task. And the asynchronous checkpoint saving and loading functionality provided by TCE greatly shorten the fault tolerance overhead. The experimental results indicate that TRANSOM significantly enhances the efficiency of large-scale LLM training on clusters. Specifically, the pre-training time for GPT3-175B has been reduced by 28%, while checkpoint saving and loading performance have improved by a factor of 20.
    摘要 以 ChatGPT 为代表、参数规模达千亿乃至万亿级的大型语言模型(LLM)已在诸多领域产生深远影响。然而,训练这类超大规模参数的 LLM 需要庞大的高性能 GPU 集群和长达数月的训练周期。由于大规模集群中硬件和软件故障不可避免,维持长时间不中断的训练极具挑战,大量训练时间被耗费在任务检查点的保存与加载、任务重调度与重启以及人工异常排查上,严重损害整体训练效率。为解决这些问题,我们提出了新型容错 LLM 训练系统 TRANSOM。本工作设计了三个关键子系统:训练流水线自动容错与恢复机制 Transom Operator and Launcher(TOL)、训练任务多维指标自动异常检测系统 Transom Eagle Eye(TEE),以及训练检查点异步存取自动容错与恢复技术 Transom Checkpoint Engine(TCE)。其中 TOL 负责训练任务的生命周期管理,TEE 负责任务监控与异常上报;TEE 检测到训练异常后上报给 TOL,TOL 自动进入容错策略,剔除异常节点并重启训练任务;TCE 提供的异步检查点保存与加载功能则大幅降低了容错开销。实验结果表明,TRANSOM 显著提升了大规模 LLM 集群训练的效率:GPT3-175B 的预训练时间缩短了 28%,检查点保存与加载性能提升了 20 倍。
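The asynchronous checkpointing idea behind TCE can be pictured as taking the blocking disk write off the training loop: the trainer snapshots the model state to host memory and a background worker persists it while subsequent steps proceed. Below is a minimal PyTorch/threading sketch of that pattern under our reading of the abstract; it is an illustration only, not the Transom Checkpoint Engine, and the model, checkpoint interval, and file naming are made up.

```python
import threading
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def async_checkpoint(model, step, path):
    """Snapshot weights to CPU memory, then write them to disk in a background thread."""
    snapshot = {k: v.detach().to("cpu", copy=True) for k, v in model.state_dict().items()}
    t = threading.Thread(target=torch.save, args=({"step": step, "model": snapshot}, path))
    t.start()
    return t            # caller may join() before exiting

pending = None
for step in range(1, 101):
    x = torch.randn(32, 1024)
    loss = model(x).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

    if step % 50 == 0:                      # checkpoint interval
        if pending is not None:
            pending.join()                  # avoid overlapping writes
        pending = async_checkpoint(model, step, f"ckpt_{step}.pt")

if pending is not None:
    pending.join()
```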

Smart City Transportation: Deep Learning Ensemble Approach for Traffic Accident Detection

  • paper_url: http://arxiv.org/abs/2310.10038
  • repo_url: None
  • paper_authors: Victor Adewopo, Nelly Elsayed
  • for: Survey existing traffic accident detection techniques and improve the safety and efficiency of traffic management in smart cities.
  • methods: A novel I3D-CONVLSTM2D model architecture that integrates RGB frames with optical flow information, designed specifically for accident detection in smart-city traffic surveillance systems and validated through an experimental study.
  • results: The empirical analysis shows that the I3D-CONVLSTM2D RGB + Optical-Flow (Trainable) model performs best, reaching 87% Mean Average Precision (MAP); the paper also examines the challenges posed by data imbalance and discusses how to address them.
    Abstract The dynamic and unpredictable nature of road traffic necessitates effective accident detection methods for enhancing safety and streamlining traffic management in smart cities. This paper offers a comprehensive exploration study of prevailing accident detection techniques, shedding light on the nuances of other state-of-the-art methodologies while providing a detailed overview of distinct traffic accident types like rear-end collisions, T-bone collisions, and frontal impact accidents. Our novel approach introduces the I3D-CONVLSTM2D model architecture, a lightweight solution tailored explicitly for accident detection in smart city traffic surveillance systems by integrating RGB frames with optical flow information. Our experimental study's empirical analysis underscores our approach's efficacy, with the I3D-CONVLSTM2D RGB + Optical-Flow (Trainable) model outperforming its counterparts, achieving an impressive 87\% Mean Average Precision (MAP). Our findings further elaborate on the challenges posed by data imbalances, particularly when working with a limited number of datasets, road structures, and traffic scenarios. Ultimately, our research illuminates the path towards a sophisticated vision-based accident detection system primed for real-time integration into edge IoT devices within smart urban infrastructures.
    摘要 随着城市智能化的发展,道路交通中的事故检测技术已成为提高安全性和优化交通管理的关键。本文进行了全面的探讨现有事故检测技术,探讨其他现代方法的细节,并提供了不同类型的交通事故的详细概述,如后尾collisions、T-bone collisions和前面Collisions。我们的新方法 introduce了I3D-CONVLSTM2D模型架构,这是一种适应性强的解决方案,通过RGB框架和光流信息来检测事故。我们的实验研究的实证分析表明,我们的I3D-CONVLSTM2D RGB + Optical-Flow(可训练)模型在事故检测方面表现出色,达到了87%的 Mean Average Precision(MAP)。我们的发现还探讨了数据不均衡的挑战,特别是在有限数据集、路径结构和交通场景下。最后,我们的研究阐明了一种基于视觉的事故检测系统,准备好于实时集成到智能城市基础设施中的边缘IoT设备。
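A two-stream layout of the kind the paper describes, with an RGB clip stream and an optical-flow stream fused and fed to a ConvLSTM2D head, can be sketched with Keras as below. The small 3D-conv stems stand in for an I3D backbone, and the clip length, filter counts, and single sigmoid output are illustrative assumptions, not the paper's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

frames, h, w = 16, 112, 112

def stream(channels, name):
    """A small 3D-conv stem standing in for an I3D feature extractor."""
    inp = layers.Input((frames, h, w, channels), name=name)
    x = layers.Conv3D(32, (3, 7, 7), strides=(1, 2, 2), padding="same", activation="relu")(inp)
    x = layers.MaxPooling3D((1, 2, 2))(x)
    x = layers.Conv3D(64, (3, 3, 3), padding="same", activation="relu")(x)
    return inp, x

rgb_in, rgb_feat = stream(3, "rgb_clip")
flow_in, flow_feat = stream(2, "flow_clip")

# Fuse the two streams and model temporal dynamics with a ConvLSTM2D layer.
fused = layers.Concatenate()([rgb_feat, flow_feat])
x = layers.ConvLSTM2D(64, (3, 3), padding="same", return_sequences=False)(fused)
x = layers.GlobalAveragePooling2D()(x)
out = layers.Dense(1, activation="sigmoid", name="accident_prob")(x)

model = Model([rgb_in, flow_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```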

Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance

  • paper_url: http://arxiv.org/abs/2310.10021
  • repo_url: https://github.com/clvrai/boss
  • paper_authors: Jesse Zhang, Jiahui Zhang, Karl Pertsch, Ziyi Liu, Xiang Ren, Minsuk Chang, Shao-Hua Sun, Joseph J. Lim
  • for: BOSS is designed to solve new long-horizon, complex, and meaningful tasks with minimal supervision.
  • methods: BOSS uses skill bootstrapping, where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set. The bootstrapping phase is guided by large language models (LLMs) that inform the agent of meaningful skills to chain together.
  • results: Agents trained with the LLM-guided bootstrapping procedure outperform those trained with naive bootstrapping as well as prior unsupervised skill acquisition methods on zero-shot execution of unseen, long-horizon tasks in new environments.
    Abstract We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by growing a learned skill library with minimal supervision. Prior work in reinforcement learning require expert supervision, in the form of demonstrations or rich reward functions, to learn long-horizon tasks. Instead, our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing "skill bootstrapping," where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set. This bootstrapping phase is guided by large language models (LLMs) that inform the agent of meaningful skills to chain together. Through this process, BOSS builds a wide range of complex and useful behaviors from a basic set of primitive skills. We demonstrate through experiments in realistic household environments that agents trained with our LLM-guided bootstrapping procedure outperform those trained with naive bootstrapping as well as prior unsupervised skill acquisition methods on zero-shot execution of unseen, long-horizon tasks in new environments. Website at clvrai.com/boss.
    摘要 我们提出 BOSS 方法,它只需极少监督,就能通过不断扩充已学技能库来自动学习解决新的长时程、复杂且有意义的任务。以往的强化学习方法需要专家监督(示范或精心设计的奖励函数)才能学习长时程任务;相比之下,BOSS(BOotStrapping your own Skills)通过"技能自举"来完成新任务:拥有一组原始技能的智能体与环境交互、练习新技能,且对初始技能集之外的任务不接收奖励反馈。该自举阶段由大型语言模型(LLM)引导,由 LLM 提示智能体哪些技能值得串联。通过这一过程,BOSS 能从一组基础原始技能出发构建出大量复杂而有用的行为。在逼真的家居环境实验中,采用 LLM 引导自举训练的智能体,在新环境中零样本执行未见过的长时程任务时,优于朴素自举训练以及先前的无监督技能获取方法。更多信息见 clvrai.com/boss。
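The LLM-guided skill-bootstrapping loop can be summarized as: repeatedly ask a language model which library skill would meaningfully extend the current chain, roll the chain out in the environment, and add successfully practiced chains to the skill library as new, longer-horizon skills. The sketch below mirrors that loop with stub functions; `llm_propose_next_skill`, `execute`, and the toy primitive skills are placeholders, not the BOSS codebase.

```python
import random

# Stubs standing in for the real components (names are hypothetical).
def llm_propose_next_skill(skill_library, skill_chain):
    """Ask an LLM which library skill would meaningfully follow the current chain."""
    return random.choice(sorted(skill_library))           # placeholder for an LLM call

def execute(skill_chain, env=None):
    """Roll out the chained skills in the environment; report success, no task reward."""
    return random.random() < 0.3                          # placeholder outcome

def skill_bootstrapping(primitive_skills, rounds=100, max_chain_len=4):
    library = set(primitive_skills)
    for _ in range(rounds):
        chain = [random.choice(sorted(library))]
        while len(chain) < max_chain_len:
            chain.append(llm_propose_next_skill(library, chain))
            if execute(chain):
                # A successfully practiced chain becomes a new, longer-horizon skill.
                library.add(" -> ".join(chain))
            else:
                break
    return library

skills = skill_bootstrapping({"open drawer", "pick up mug", "turn on lamp"})
print(f"{len(skills)} skills after bootstrapping")
```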

Towards Unified and Effective Domain Generalization

  • paper_url: http://arxiv.org/abs/2310.10008
  • repo_url: https://github.com/invictus717/UniDG
  • paper_authors: Yiyuan Zhang, Kaixiong Gong, Xiaohan Ding, Kaipeng Zhang, Fangrui Lv, Kurt Keutzer, Xiangyu Yue
  • for: Improve the out-of-distribution generalization of foundation models across domains, regardless of their architectures.
  • methods: An unsupervised, lightweight fine-tuning procedure applied at inference time, which avoids costly iterative retraining; a penalty on the parameter-update step mitigates catastrophic forgetting of the original model's knowledge.
  • results: Across 12 visual backbones, including CNN-, MLP-, and Transformer-based models, UniDG improves accuracy by +5.4% on average on DomainBed, demonstrating its versatility and effectiveness.
    Abstract We propose $\textbf{UniDG}$, a novel and $\textbf{Uni}$fied framework for $\textbf{D}$omain $\textbf{G}$eneralization that is capable of significantly enhancing the out-of-distribution generalization performance of foundation models regardless of their architectures. The core idea of UniDG is to finetune models during the inference stage, which saves the cost of iterative training. Specifically, we encourage models to learn the distribution of test data in an unsupervised manner and impose a penalty regarding the updating step of model parameters. The penalty term can effectively reduce the catastrophic forgetting issue as we would like to maximally preserve the valuable knowledge in the original model. Empirically, across 12 visual backbones, including CNN-, MLP-, and Transformer-based models, ranging from 1.89M to 303M parameters, UniDG shows an average accuracy improvement of +5.4% on DomainBed. These performance results demonstrate the superiority and versatility of UniDG. The code is publicly available at https://github.com/invictus717/UniDG
    摘要 我们提出了UniDG,一个新的、统一的框架,可以对基础模型的外部泛化性能进行明显改善,不论其架构。UniDG的核心思想是在推断阶段进行调整,这样可以避免迭代训练的成本。具体来说,我们鼓励模型在无监督下学习试验数据的分布,并对模型参数更新的步骤加入一个罚则。这个罚则可以有效减少严重遗忘问题,因为我们希望将原始模型中的有价知识保留到最大程度。实验结果显示,在12种视觉基础模型中,包括CNN、MLP和Transformer等,参数量从1.89M到303M之间,UniDG在DomainBed上平均提高了5.4%的精度。这些表现结果证明UniDG的优越性和多样性。代码可以在https://github.com/invictus717/UniDG上获取。
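The inference-stage fine-tuning described in the abstract can be sketched as unsupervised adaptation on test batches with a penalty that keeps the updated parameters close to the original model, limiting catastrophic forgetting. The snippet below is a minimal PyTorch illustration of that pattern; the entropy-minimization objective, the penalty weight `lam`, and the toy classifier are our illustrative choices, not UniDG's exact losses.

```python
import copy
import torch

def test_time_adapt(model, test_loader, steps=1, lr=1e-4, lam=1.0):
    """Lightweight inference-time adaptation with a parameter-anchoring penalty."""
    source = copy.deepcopy(model)                # frozen reference weights
    for p in source.parameters():
        p.requires_grad_(False)
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(steps):
        for x, _ in test_loader:                 # labels are never used
            probs = model(x).softmax(dim=-1)
            entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()

            # Penalize the update step so the adapted weights stay close to the
            # original model and its knowledge is maximally preserved.
            anchor = sum(((p - q) ** 2).sum()
                         for p, q in zip(model.parameters(), source.parameters()))

            loss = entropy + lam * anchor
            opt.zero_grad(); loss.backward(); opt.step()
    return model

# Illustrative usage with a toy classifier and random "test" images.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
test_loader = [(torch.randn(16, 3, 32, 32), None) for _ in range(4)]
test_time_adapt(model, test_loader)
```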

Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels

  • paper_url: http://arxiv.org/abs/2310.09997
  • repo_url: None
  • paper_authors: Thomas Jiralerspong, Flemming Kondrup, Doina Precup, Khimya Khetarpal
  • for: Improve the sample efficiency of deep hierarchical reinforcement learning agents so that they can envision the long-term consequences of their decisions in high-dimensional state spaces such as pixels.
  • methods: Forecaster, a deep hierarchical reinforcement learning approach that learns a temporally abstract world model of the environment dynamics and plans over high-level goals with a tree-search procedure, while a low-level policy learns to reach those goals.
  • results: Experiments show that Forecaster improves sample efficiency in the AntMaze domain, both in single-task learning and in generalization to new tasks.
    Abstract The ability to plan at many different levels of abstraction enables agents to envision the long-term repercussions of their decisions and thus enables sample-efficient learning. This becomes particularly beneficial in complex environments from high-dimensional state space such as pixels, where the goal is distant and the reward sparse. We introduce Forecaster, a deep hierarchical reinforcement learning approach which plans over high-level goals leveraging a temporally abstract world model. Forecaster learns an abstract model of its environment by modelling the transitions dynamics at an abstract level and training a world model on such transition. It then uses this world model to choose optimal high-level goals through a tree-search planning procedure. It additionally trains a low-level policy that learns to reach those goals. Our method not only captures building world models with longer horizons, but also, planning with such models in downstream tasks. We empirically demonstrate Forecaster's potential in both single-task learning and generalization to new tasks in the AntMaze domain.
    摘要 agent的多级划分能力使其能够预测长期后果,从而实现样本效率学习。这特别有用在高维状态空间如像素的复杂环境中,目标远距离,奖励罕见。我们介绍了Forecaster,一种深层决策学习方法,通过高级目标规划来规划高级目标。Forecaster使用抽象世界模型来模型环境的过程动态,并在 such transition 上训练世界模型。它然后使用这个世界模型来选择优质高级目标,并通过树搜索规划算法来实现。此外,它还训练低级策略,以实现高级目标。我们的方法不仅能够建立更长期的世界模型,还能够在下游任务中使用这些模型进行规划。我们在AntMaze领域进行了实验,证明了Forecaster的潜力。
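Forecaster's high-level planning step can be pictured as rolling candidate goal sequences through the learned abstract world model and committing to the best-scoring one, which a low-level policy then pursues. The sketch below is a deliberately tiny stand-in for that tree search; `world_model`, the goal names, and the random reward are hypothetical stubs, not the paper's learned components.

```python
import itertools
import random

# Hypothetical stand-ins for Forecaster's learned components.
def world_model(abstract_state, goal):
    """Predict the next abstract state and an estimated reward for pursuing a goal."""
    next_state = hash((abstract_state, goal)) % 1000
    return next_state, random.random()

def plan_goals(abstract_state, goal_space, depth=3):
    """Exhaustive search over short high-level goal sequences (breadth kept tiny)."""
    best_seq, best_return = None, float("-inf")
    for seq in itertools.product(goal_space, repeat=depth):
        state, total = abstract_state, 0.0
        for g in seq:
            state, r = world_model(state, g)
            total += r
        if total > best_return:
            best_seq, best_return = seq, total
    return best_seq

goal_space = ["reach_corridor", "reach_junction", "reach_goal_cell"]
plan = plan_goals(abstract_state=0, goal_space=goal_space, depth=3)
print("high-level plan:", plan)    # a low-level policy would then pursue plan[0]
```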

Network Analysis of the iNaturalist Citizen Science Community

  • paper_url: http://arxiv.org/abs/2310.10693
  • repo_url: None
  • paper_authors: Yu Lu Liu, Thomas Jiralerspong
  • for: Use the iNaturalist citizen science platform as a case study of how citizen science projects are structured and how their participants interact.
  • methods: The iNaturalist data are framed as a bipartite network, and visualizations together with established network science techniques are used to gain insights into project structure and user interactions.
  • results: The paper proposes a novel benchmark for network science research by building, from the iNaturalist data, a network with an unusual structure relative to common benchmark networks, and demonstrates through a link prediction task that this network yields new insights into a variety of network science methods.
    Abstract In recent years, citizen science has become a larger and larger part of the scientific community. Its ability to crowd source data and expertise from thousands of citizen scientists makes it invaluable. Despite the field's growing popularity, the interactions and structure of citizen science projects are still poorly understood and under analyzed. We use the iNaturalist citizen science platform as a case study to analyze the structure of citizen science projects. We frame the data from iNaturalist as a bipartite network and use visualizations as well as established network science techniques to gain insights into the structure and interactions between users in citizen science projects. Finally, we propose a novel unique benchmark for network science research by using the iNaturalist data to create a network which has an unusual structure relative to other common benchmark networks. We demonstrate using a link prediction task that this network can be used to gain novel insights into a variety of network science methods.
    摘要
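The bipartite framing and the link prediction task can be reproduced in miniature with NetworkX: users and species form the two node sets, observations form the edges, and unobserved user-species pairs are ranked by a simple neighborhood-overlap score. The toy observation list and the path-counting score below are illustrative; they are not the paper's dataset or its link prediction method.

```python
import networkx as nx

# Toy observations: (user, species) pairs standing in for iNaturalist records.
observations = [
    ("user_a", "Quercus alba"), ("user_a", "Cardinalis cardinalis"),
    ("user_b", "Quercus alba"), ("user_b", "Danaus plexippus"),
    ("user_c", "Danaus plexippus"), ("user_c", "Cardinalis cardinalis"),
]

G = nx.Graph()
users = {u for u, _ in observations}
species = {s for _, s in observations}
G.add_nodes_from(users, bipartite="users")
G.add_nodes_from(species, bipartite="species")
G.add_edges_from(observations)

def path3_score(G, user, sp):
    """Count user -> shared species -> co-observer -> target-species paths,
    a simple neighborhood-overlap score suited to bipartite graphs."""
    co_observers = {u for obs in G[user] for u in G[obs]} - {user}
    return sum(1 for other in co_observers if G.has_edge(other, sp))

# Rank unobserved (user, species) pairs as a toy link prediction task.
candidates = [(u, s) for u in users for s in species if not G.has_edge(u, s)]
ranked = sorted(candidates, key=lambda e: path3_score(G, *e), reverse=True)
print("top predicted links:", ranked[:3])
```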