cs.AI - 2023-07-27

Multi-Source Domain Adaptation through Dataset Dictionary Learning in Wasserstein Space

  • paper_url: http://arxiv.org/abs/2307.14953
  • repo_url: https://github.com/eddardd/demo-dadil
  • paper_authors: Eduardo Fernandes Montesuma, Fred Ngolè Mboula, Antoine Souloumiac
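  • for: Addresses Multi-Source Domain Adaptation (MSDA): transferring knowledge from multiple labeled source domains to an unlabeled target domain while mitigating data distribution shift.
  • methods: Proposes an MSDA framework based on dictionary learning and optimal transport. Each domain is interpreted as an empirical distribution and expressed as a Wasserstein barycenter of dictionary atoms. A new algorithm, DaDiL, learns the atom distributions and a matrix of barycentric coordinates via mini-batches.
  • results: Evaluated on three benchmarks (Caltech-Office, Office 31, and CRWU), improving on the previous state of the art by 3.15%, 2.29%, and 7.71% in classification performance, respectively. Interpolations in the Wasserstein hull of the learned atoms provide data that can generalize to the target domain.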
    Abstract This paper seeks to solve Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution. As such, we express each domain as a Wasserstein barycenter of dictionary atoms, which are empirical distributions. We propose a novel algorithm, DaDiL, for learning via mini-batches: (i) atom distributions; (ii) a matrix of barycentric coordinates. Based on our dictionary, we propose two novel methods for MSDA: DaDiL-R, based on the reconstruction of labeled samples in the target domain, and DaDiL-E, based on the ensembling of classifiers learned on atom distributions. We evaluate our methods on three benchmarks: Caltech-Office, Office 31, and CRWU, where we improved on the previous state of the art by 3.15%, 2.29%, and 7.71% in classification performance. Finally, we show that interpolations in the Wasserstein hull of learned atoms provide data that can generalize to the target domain.
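
To make the barycenter representation concrete, here is a minimal sketch for a special case: one-dimensional empirical distributions with equal sample counts and uniform sample weights, where the Wasserstein-2 barycenter is obtained by averaging sorted samples. This is an illustration only, not DaDiL itself. The function names are hypothetical, and the paper works with labeled, multi-dimensional distributions and learns the atoms via mini-batches.

```python
def wasserstein_barycenter_1d(atoms, weights):
    """W2 barycenter of 1-D empirical distributions (toy special case).

    atoms   : list of sample lists, all of the same length
    weights : barycentric coordinates, non-negative and summing to 1
    """
    n = len(atoms[0])
    if any(len(a) != n for a in atoms):
        raise ValueError("all atoms must have the same number of samples")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("barycentric coordinates must sum to 1")
    sorted_atoms = [sorted(a) for a in atoms]
    # In 1-D, averaging the i-th order statistics averages the quantile functions.
    return [sum(w * a[i] for w, a in zip(weights, sorted_atoms)) for i in range(n)]


def wasserstein2_1d(x, y):
    """W2 distance between two equal-size 1-D empirical distributions."""
    xs, ys = sorted(x), sorted(y)
    return (sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)) ** 0.5
```

Changing the weights moves the reconstructed distribution between the atoms, which is the intuition behind expressing each domain through barycentric coordinates.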

Designing Fiduciary Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2308.02435
  • repo_url: None
  • paper_authors: Sebastian Benthall, David Shekman
  • for: Provides a procedure for designing and auditing Fiduciary AI, i.e., AI that complies with the legal duties of loyalty and care towards a principal.
  • methods: Synthesizes recent work in computer science and law. The procedure has the designer understand the system's context, identify its principals, assess their best interests, and then ensure loyalty and care with respect to those interests.
  • results: Argues that Fiduciary AI is a promising means to address the incompleteness of data subjects' consent when interacting with complex technical systems, and connects the steps of the procedure to dimensions of Trustworthy AI such as privacy and alignment.
    Abstract A fiduciary is a trusted agent that has the legal duty to act with loyalty and care towards a principal that employs them. When fiduciary organizations interact with users through a digital interface, or otherwise automate their operations with artificial intelligence, they will need to design these AI systems to be compliant with their duties. This article synthesizes recent work in computer science and law to develop a procedure for designing and auditing Fiduciary AI. The designer of a Fiduciary AI should understand the context of the system, identify its principals, and assess the best interests of those principals. Then the designer must be loyal with respect to those interests, and careful in a contextually appropriate way. We connect the steps in this procedure to dimensions of Trustworthy AI, such as privacy and alignment. Fiduciary AI is a promising means to address the incompleteness of data subjects' consent when interacting with complex technical systems.

PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback

  • paper_url: http://arxiv.org/abs/2307.14936
  • repo_url: None
  • paper_authors: Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, Yuenan Guo, Qianxiang Wang
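  • for: Improving the code generation performance of pre-trained large language models for code (Code LLMs).
  • methods: Proposes RRTF (Rank Responses to align Test&Teacher Feedback), a framework for effectively and efficiently boosting pre-trained large language models for code generation.
  • results: PanGu-Coder2, built under this framework, achieves 62.20% pass@1 on the OpenAI HumanEval benchmark and consistently outperforms all previous Code LLMs on the CoderEval and LeetCode benchmarks.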
    Abstract Large Language Models for Code (Code LLM) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on the code generation task. Various approaches have been proposed to boost the code generation performance of pre-trained Code LLMs, such as supervised fine-tuning, instruction tuning, reinforcement learning, etc. In this paper, we propose a novel RRTF (Rank Responses to align Test&Teacher Feedback) framework, which can effectively and efficiently boost pre-trained large language models for code generation. Under this framework, we present PanGu-Coder2, which achieves 62.20% pass@1 on the OpenAI HumanEval benchmark. Furthermore, through an extensive evaluation on CoderEval and LeetCode benchmarks, we show that PanGu-Coder2 consistently outperforms all previous Code LLMs.
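
For readers unfamiliar with the pass@1 metric quoted above, the sketch below shows the unbiased pass@k estimator that is commonly used with HumanEval-style benchmarks: generate n samples per problem, count the c samples that pass the unit tests, and estimate the probability that at least one of k drawn samples passes. The abstract does not say how the 62.20% figure was computed, so treat this as background on the metric rather than the authors' evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n : number of generated samples
    c : number of samples that pass all tests
    k : number of samples the user is allowed to draw
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    # 1 - P(all k drawn samples fail)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass.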

Solving Data Quality Problems with Desbordante: a Demo

  • paper_url: http://arxiv.org/abs/2307.14935
  • repo_url: None
  • paper_authors: George Chernishev, Michael Polyntsov, Anton Chizhov, Kirill Stupakov, Ilya Shchuckin, Alexander Smirnov, Maxim Strutovsky, Alexey Shlyonskikh, Mikhail Firsov, Stepan Manannikov, Nikita Bobrov, Daniil Goncharov, Ilia Barutkin, Vladislav Shalnev, Kirill Muraviev, Anna Rakhmukova, Dmitriy Shcheka, Anton Chernikov, Mikhail Vyrodov, Yaroslav Kurbatov, Maxim Fofanov, Sergei Belokonnyi, Pavel Anosov, Arthur Saliou, Eduard Gaisin, Kirill Smirnov
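  • for: Data profiling in modern data-driven industries, specifically the discovery and validation of complex statistics to solve classic data quality problems.
  • methods: Desbordante, an open-source data profiler, discovers and validates complex statistics such as functional dependencies, data constraints, and association rules, and provides descriptive explanations. Costly operations are offloaded to a C++ core behind a Python integration.
  • results: The demo shows an efficient, scalable, crash-resilient profiler with Python integration, applied to typo detection, data deduplication, and data anomaly detection scenarios.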
    Abstract Data profiling is an essential process in modern data-driven industries. One of its critical components is the discovery and validation of complex statistics, including functional dependencies, data constraints, association rules, and others. However, most existing data profiling systems that focus on complex statistics do not provide proper integration with the tools used by contemporary data scientists. This creates a significant barrier to the adoption of these tools in the industry. Moreover, existing systems were not created with industrial-grade workloads in mind. Finally, they do not aim to provide descriptive explanations, i.e. why a given pattern is not found. This is a significant issue, as it is essential to understand the underlying reasons for a specific pattern's absence to make informed decisions based on the data. Because of that, these patterns effectively hang in thin air: their application scope is rather limited, and they are rarely used by the broader public. At the same time, as we are going to demonstrate in this presentation, complex statistics can be efficiently used to solve many classic data quality problems. Desbordante is an open-source data profiler that aims to close this gap. It is built with emphasis on industrial application: it is efficient, scalable, resilient to crashes, and provides explanations. Furthermore, it provides seamless Python integration by offloading various costly operations to the C++ core, not only mining. In this demonstration, we show several scenarios that allow end users to solve different data quality problems. Namely, we showcase typo detection, data deduplication, and data anomaly detection scenarios.
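
As a rough illustration of how validating a functional dependency can surface typos, the sketch below checks whether a set of columns determines another column and reports the groups where it does not. It is plain Python written for this digest and does not use or mirror Desbordante's API; the function name, column names, and sample rows are all hypothetical.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the groups that violate the functional dependency lhs -> rhs.

    rows : list of dicts (one per record)
    lhs  : list of column names on the left-hand side
    rhs  : column name on the right-hand side
    """
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        groups[key].add(row[rhs])
    # A key mapped to more than one right-hand value breaks the dependency;
    # the conflicting values explain why the pattern does not hold.
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


# Hypothetical sample data: the misspelled city breaks zip -> city.
records = [
    {"zip": "10115", "city": "Berlin"},
    {"zip": "10115", "city": "Berlni"},
    {"zip": "80331", "city": "Munich"},
]
suspects = fd_violations(records, ["zip"], "city")
```

Returning the conflicting values, rather than a bare yes/no answer, is in the spirit of the descriptive explanations the abstract calls for.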

Approximate Model-Based Shielding for Safe Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.00707
  • repo_url: https://github.com/sacktock/ambs
  • paper_authors: Alexander W. Goodall, Francesco Belardinelli
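  • for: Applying reinforcement learning (RL) to safety-critical real-world systems, where many algorithms are sample-inefficient and maximising the standard RL objective gives no guarantees on worst-case performance.
  • methods: Proposes approximate model-based shielding (AMBS), a principled look-ahead shielding algorithm for verifying the performance of learned RL policies with respect to a set of given safety constraints. Unlike other shielding approaches, AMBS does not require prior knowledge of the safety-relevant dynamics of the system.
  • results: Gives a theoretical justification for AMBS and reports superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.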
    Abstract Reinforcement learning (RL) has shown great potential for solving complex tasks in a variety of domains. However, applying RL to safety-critical systems in the real-world is not easy as many algorithms are sample-inefficient and maximising the standard RL objective comes with no guarantees on worst-case performance. In this paper we propose approximate model-based shielding (AMBS), a principled look-ahead shielding algorithm for verifying the performance of learned RL policies w.r.t. a set of given safety constraints. Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system. We provide a strong theoretical justification for AMBS and demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
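
The general idea of look-ahead shielding can be sketched as follows: before executing the action proposed by the task policy, roll a model of the environment forward for a few steps, estimate how often the rollouts reach an unsafe state, and fall back to a backup policy when that estimate exceeds a threshold. This is a toy sketch under strong assumptions, not the AMBS algorithm: here the model, the safety check, and both policies are hand-written callables with hypothetical names, and the paper's own components and guarantees are not reproduced.

```python
def violation_rate(model, is_unsafe, policy, state, action, horizon, n_rollouts):
    """Fraction of model rollouts that reach an unsafe state within `horizon` steps."""
    violations = 0
    for _ in range(n_rollouts):
        s = model(state, action)          # first step uses the proposed action
        unsafe = is_unsafe(s)
        for _ in range(horizon - 1):      # later steps follow the policy
            if unsafe:
                break
            s = model(s, policy(s))
            unsafe = is_unsafe(s)
        violations += int(unsafe)
    return violations / n_rollouts


def shielded_action(model, is_unsafe, task_policy, backup_policy, state,
                    horizon=3, n_rollouts=10, threshold=0.1):
    """Return the task action if look-ahead deems it safe, else the backup action."""
    proposed = task_policy(state)
    rate = violation_rate(model, is_unsafe, task_policy, state, proposed,
                          horizon, n_rollouts)
    return proposed if rate <= threshold else backup_policy(state)
```

With a stochastic model, the number of rollouts trades compute for a tighter estimate of the violation probability; with a deterministic model a single rollout suffices.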

Scaling Session-Based Transformer Recommendations using Optimized Negative Sampling and Loss Functions

  • paper_url: http://arxiv.org/abs/2307.14906
  • repo_url: https://github.com/otto-de/tron
  • paper_authors: Timo Wilm, Philipp Normann, Sophie Baumeister, Paul-Vincent Kobow
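  • for: Improving the recommendation quality of session-based Transformer recommenders while keeping training time manageable at scale.
  • methods: TRON combines top-k negative sampling with listwise loss functions to enhance recommendation accuracy.
  • results: On large-scale e-commerce datasets, TRON improves on the recommendation quality of current methods while maintaining training speeds similar to SASRec. A live A/B test yielded an 18.14% increase in click-through rate over SASRec.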
    Abstract This work introduces TRON, a scalable session-based Transformer Recommender using Optimized Negative-sampling. Motivated by the scalability and performance limitations of prevailing models such as SASRec and GRU4Rec+, TRON integrates top-k negative sampling and listwise loss functions to enhance its recommendation accuracy. Evaluations on relevant large-scale e-commerce datasets show that TRON improves upon the recommendation quality of current methods while maintaining training speeds similar to SASRec. A live A/B test yielded an 18.14% increase in click-through rate over SASRec, highlighting the potential of TRON in practical settings. For further research, we provide access to our source code at https://github.com/otto-de/TRON and an anonymized dataset at https://github.com/otto-de/recsys-dataset.
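
The combination of top-k negative sampling and a listwise loss can be illustrated with a small sketch: score a pool of sampled negatives, keep only the k highest-scoring (hardest) ones, and compute a softmax cross-entropy over the positive item and those negatives. The abstract does not specify TRON's exact sampling scheme or loss, so the functions below are a generic, hypothetical instance of the idea rather than the released implementation.

```python
import math


def top_k_negatives(neg_scores, k):
    """Keep the k highest-scoring negatives, i.e. the hardest ones for the model."""
    return sorted(neg_scores, reverse=True)[:k]


def listwise_softmax_loss(pos_score, neg_scores):
    """Cross-entropy of the positive item against a list of negatives."""
    logits = [pos_score] + list(neg_scores)
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - pos_score


def top_k_listwise_loss(pos_score, neg_scores, k):
    """Listwise loss restricted to the top-k sampled negatives."""
    return listwise_softmax_loss(pos_score, top_k_negatives(neg_scores, k))
```

Restricting the loss to the hardest negatives lets a large pool be sampled cheaply while the gradient concentrates on the items the model currently confuses with the positive.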

CodeLens: An Interactive Tool for Visualizing Code Representations

  • paper_url: http://arxiv.org/abs/2307.14902
  • repo_url: None
  • paper_authors: Yuejun Guo, Seifeddine Bettaieb, Qiang Hu, Yves Le Traon, Qiang Tang
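  • for: Provides a tool for visualizing code representations, helping developers understand and explore different representation methods.
  • methods: CodeLens supports four types of code representations: sequence of tokens, abstract syntax tree (AST), data flow graph (DFG), and control flow graph (CFG), across multiple programming languages such as Java, Python, and JavaScript.
  • results: With CodeLens, developers can quickly visualize a specific code representation and obtain the represented inputs for models of code.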
    Abstract Representing source code in a generic input format is crucial to automate software engineering tasks, e.g., applying machine learning algorithms to extract information. Visualizing code representations can further enable human experts to gain an intuitive insight into the code. Unfortunately, as of today, there is no universal tool that can simultaneously visualise different types of code representations. In this paper, we introduce a tool, CodeLens, which provides a visual interaction environment that supports various representation methods and helps developers understand and explore them. CodeLens is designed to support multiple programming languages, such as Java, Python, and JavaScript, and four types of code representations, including sequence of tokens, abstract syntax tree (AST), data flow graph (DFG), and control flow graph (CFG). By using CodeLens, developers can quickly visualize the specific code representation and also obtain the represented inputs for models of code. The Web-based interface of CodeLens is available at http://www.codelens.org. The demonstration video can be found at http://www.codelens.org/demo.
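
Two of the four representations mentioned above, the token sequence and the AST, can be obtained for Python source with the standard library alone. The sketch below is only meant to show what these representations look like as model inputs; it is not CodeLens's implementation, and the helper names are hypothetical.

```python
import ast
import io
import tokenize

# Token types that carry layout rather than content.
_LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
           tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}


def token_sequence(source):
    """Flat list of token strings, skipping layout-only tokens and comments."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in _LAYOUT]


def ast_node_types(source):
    """Names of all AST node types appearing in the parsed source."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


snippet = "x = 1 + 2\n"
tokens = token_sequence(snippet)
nodes = ast_node_types(snippet)
```

Data flow and control flow graphs need additional analysis on top of the AST and are left out of this sketch.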

Text-guided Foundation Model Adaptation for Pathological Image Classification

  • paper_url: http://arxiv.org/abs/2307.14901
  • repo_url: https://github.com/yunkun-zhang/cite
  • paper_authors: Yunkun Zhang, Jin Gao, Mu Zhou, Xiaosong Wang, Yu Qiao, Shaoting Zhang, Dequan Wang
  • for: Enhancing data-efficient pathological image classification.
  • methods: Connect Image and Text Embeddings (CITE) injects text insights from language models pre-trained on a broad range of biomedical texts, adapting foundation models towards pathological image understanding.
  • results: In extensive experiments on the PatchGastric stomach tumor pathological image dataset, CITE achieves leading performance compared with various baselines, especially when training data is scarce.
    Abstract The recent surge of foundation models in computer vision and natural language processing opens up perspectives in utilizing multi-modal clinical data to train large models with strong generalizability. Yet pathological image datasets often lack biomedical text annotation and enrichment. Guiding data-efficient image diagnosis from the use of biomedical text knowledge becomes a substantial interest. In this paper, we propose to Connect Image and Text Embeddings (CITE) to enhance pathological image classification. CITE injects text insights gained from language models pre-trained with a broad range of biomedical texts, leading to adapt foundation models towards pathological image understanding. Through extensive experiments on the PatchGastric stomach tumor pathological image dataset, we demonstrate that CITE achieves leading performance compared with various baselines especially when training data is scarce. CITE offers insights into leveraging in-domain text knowledge to reinforce data-efficient pathological image classification. Code is available at https://github.com/Yunkun-Zhang/CITE.
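
One common way to connect image and text embeddings for classification is to embed each class name with a text encoder and assign an image to the class whose text embedding is most similar to the image embedding. The sketch below shows that idea with plain vectors. It assumes the embeddings already live in a shared space; the abstract does not describe CITE's architecture in this detail, so the function names, class names, and vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def classify_by_text(image_embedding, class_text_embeddings):
    """Pick the class whose text embedding is closest to the image embedding."""
    return max(class_text_embeddings,
               key=lambda name: cosine(image_embedding, class_text_embeddings[name]))


# Hypothetical 2-D embeddings standing in for encoder outputs.
text_embeddings = {"class_a": [1.0, 0.0], "class_b": [0.0, 1.0]}
predicted = classify_by_text([0.9, 0.1], text_embeddings)
```

Because the class representations come from text, they carry domain knowledge even when few labeled images are available, which is the data-efficiency argument made in the abstract.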

Base-based Model Checking for Multi-Agent Only Believing (long version)

  • paper_url: http://arxiv.org/abs/2307.14893
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Tiago de Lima, Emiliano Lorini, François Schwarzentruber
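  • for: Presents a novel semantics, based on belief bases, for the language of multi-agent only believing, and shows how to use it to automatically check formulas of this language and of its dynamic extension with private belief expansion operators.
  • methods: Provides a PSPACE model checking algorithm relying on a reduction to QBF, and an alternative dedicated algorithm relying on exploration of the state space.
  • results: Presents an implementation of the QBF-based algorithm and experimental results on computation time in a concrete example.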
    Abstract We present a novel semantics for the language of multi-agent only believing exploiting belief bases, and show how to use it for automatically checking formulas of this language and of its dynamic extension with private belief expansion operators. We provide a PSPACE algorithm for model checking relying on a reduction to QBF and an alternative dedicated algorithm relying on the exploration of the state space. We present an implementation of the QBF-based algorithm and some experimental results on computation time in a concrete example.

Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2307.14889
  • repo_url: None
  • paper_authors: Peter Bauer, Arij Bouazizi, Ulrich Kressel, Fabian B. Flohr
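  • for: Proposes a simple yet efficient weakly supervised approach for 3D human pose estimation (3D HPE) in the autonomous vehicle (AV) context.
  • methods: Uses a high-level sensor fusion between camera and LiDAR data. Training requires no 2D/3D keypoint labels; it relies on an off-the-shelf 2D joint extractor and pseudo labels generated from LiDAR-to-image projections.
  • results: Outperforms state-of-the-art results by up to $\sim$13% on the Waymo Open Dataset in the weakly supervised setting, and achieves state-of-the-art results in the supervised setting.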
    Abstract Accurate 3D human pose estimation (3D HPE) is crucial for enabling autonomous vehicles (AVs) to make informed decisions and respond proactively in critical road scenarios. Promising results of 3D HPE have been gained in several domains such as human-computer interaction, robotics, sports and medical analytics, often based on data collected in well-controlled laboratory environments. Nevertheless, the transfer of 3D HPE methods to AVs has received limited research attention, due to the challenges posed by obtaining accurate 3D pose annotations and the limited suitability of data from other domains. We present a simple yet efficient weakly supervised approach for 3D HPE in the AV context by employing a high-level sensor fusion between camera and LiDAR data. The weakly supervised setting enables training on the target datasets without any 2D/3D keypoint labels by using an off-the-shelf 2D joint extractor and pseudo labels generated from LiDAR to image projections. Our approach outperforms state-of-the-art results by up to $\sim$ 13% on the Waymo Open Dataset in the weakly supervised setting and achieves state-of-the-art results in the supervised setting.
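
The LiDAR-to-image projection behind the pseudo labels can be sketched with a pinhole camera model: project each LiDAR point into the image, then give a detected 2D joint the depth of the nearest projected point. This is a simplified assumption-laden sketch, not the paper's pipeline: points are assumed to be already expressed in camera coordinates (z pointing forward), lens distortion is ignored, and the function names and intrinsics are hypothetical.

```python
def project_point(point, fx, fy, cx, cy):
    """Project a 3-D point in camera coordinates to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        return None  # behind the camera, not visible
    return (fx * x / z + cx, fy * y / z + cy)


def lift_joint(joint_2d, lidar_points, fx, fy, cx, cy):
    """Return the LiDAR point whose projection lies closest to a 2-D joint."""
    best, best_dist = None, float("inf")
    for point in lidar_points:
        uv = project_point(point, fx, fy, cx, cy)
        if uv is None:
            continue
        dist = (uv[0] - joint_2d[0]) ** 2 + (uv[1] - joint_2d[1]) ** 2
        if dist < best_dist:
            best, best_dist = point, dist
    return best
```

In practice a distance cutoff would be needed so that joints with no nearby LiDAR return are left unlabeled rather than matched to a far-away point.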

Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners

  • paper_url: http://arxiv.org/abs/2307.14856
  • repo_url: None
  • paper_authors: Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On
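  • for: Studies the in-context few-shot learning capabilities of seq2seq (encoder-decoder) models, and how to elicit that ability more effectively.
  • methods: Runs an extensive comparison of decoder-only and encoder-decoder models on a broad range of tasks, and proposes two methods for seq2seq models: objective-aligned prompting and a fusion-based approach.
  • results: The proposed approach outperforms a decoder-only model six times larger and shows significant improvements over conventional seq2seq models across a variety of settings.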
    Abstract In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architecture, such as summarization and translation. Inspired by these initial studies, we provide a first-ever extensive experiment comparing the in-context few-shot learning capabilities of decoder-only and encoder-decoder models on a broad range of tasks. Furthermore, we propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach. Remarkably, our approach outperforms a decoder-only model that is six times larger and exhibits significant performance improvements compared to conventional seq2seq models across a variety of settings. We posit that, with the right configuration and prompt design, seq2seq models can be highly effective few-shot learners for a wide spectrum of applications.

Counterfactual Explanations for Graph Classification Through the Lenses of Density

  • paper_url: http://arxiv.org/abs/2307.14849
  • repo_url: https://github.com/carlo-abrate/Counterfactual-Explanations-for-Graph-Classification-Through-the-Lenses-of-Density
  • paper_authors: Carlo Abrate, Giulia Preti, Francesco Bonchi
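  • for: Generating instance-level counterfactual explanations for graph classifiers using a density-based language of explanation.
  • methods: Defines a general density-based counterfactual search framework that can be instantiated with different notions of dense substructures, including a method that opens or closes triangles and a method driven by maximal cliques.
  • results: Evaluated on 7 brain network datasets and compared under several widely used metrics. The results confirm that adopting a semantically relevant unit of change such as density is essential for versatile and interpretable counterfactual explanation methods.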
    Abstract Counterfactual examples have emerged as an effective approach to produce simple and understandable post-hoc explanations. In the context of graph classification, previous work has focused on generating counterfactual explanations by manipulating the most elementary units of a graph, i.e., removing an existing edge, or adding a non-existing one. In this paper, we claim that such language of explanation might be too fine-grained, and turn our attention to some of the main characterizing features of real-world complex networks, such as the tendency to close triangles, the existence of recurring motifs, and the organization into dense modules. We thus define a general density-based counterfactual search framework to generate instance-level counterfactual explanations for graph classifiers, which can be instantiated with different notions of dense substructures. In particular, we show two specific instantiations of this general framework: a method that searches for counterfactual graphs by opening or closing triangles, and a method driven by maximal cliques. We also discuss how the general method can be instantiated to exploit any other notion of dense substructures, including, for instance, a given taxonomy of nodes. We evaluate the effectiveness of our approaches in 7 brain network datasets and compare the counterfactual statements generated according to several widely-used metrics. Results confirm that adopting a semantic-relevant unit of change like density is essential to define versatile and interpretable counterfactual explanation methods.
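
The triangle-closing instantiation can be illustrated with a greedy toy search: find pairs of nodes that share a neighbor but are not connected, close one such triangle at a time, and stop as soon as the classifier's prediction changes. The search strategy, the function names, and the stand-in classifier are assumptions made for illustration; the abstract does not describe the authors' actual search procedure.

```python
from itertools import combinations


def open_wedges(adj):
    """Sorted node pairs (u, w) that share a neighbor but are not connected."""
    pairs = set()
    for v, nbrs in adj.items():
        for u, w in combinations(sorted(nbrs), 2):
            if w not in adj[u]:
                pairs.add((u, w))
    return sorted(pairs)


def counterfactual_by_closing(adj, classify, max_steps=10):
    """Close triangles one at a time until the classifier's label flips.

    adj      : dict mapping node -> set of neighbors (undirected graph)
    classify : callable taking such a dict and returning a label
    Returns the modified graph, or None if no counterfactual was found.
    """
    graph = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    original = classify(graph)
    for _ in range(max_steps):
        candidates = open_wedges(graph)
        if not candidates:
            return None
        u, w = candidates[0]
        graph[u].add(w)
        graph[w].add(u)
        if classify(graph) != original:
            return graph
    return None
```

The explanation is then stated in terms of closed triangles rather than arbitrary single-edge edits, which is the coarser language of change the abstract argues for.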

Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals

  • paper_url: http://arxiv.org/abs/2308.02510
  • repo_url: None
  • paper_authors: Yu-Ting Lan, Kan Ren, Yansen Wang, Wei-Long Zheng, Dongsheng Li, Bao-Liang Lu, Lili Qiu
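  • for: Reconstructing observed images (visual stimuli) from portably accessible human brain signals.
  • methods: Proposes NeuroImagen, a comprehensive pipeline for reconstructing visual stimuli images from electroencephalography (EEG) signals. A multi-level perceptual information decoding step draws multi-grained outputs from the EEG data, and a latent diffusion model uses them to reconstruct high-resolution images.
  • results: Experiments illustrate the effectiveness of image reconstruction and superior quantitative performance of the proposed method.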
    Abstract Seeing is believing; however, the underlying mechanism of how human visual perceptions are intertwined with our cognitions is still a mystery. Thanks to the recent advances in both neuroscience and artificial intelligence, we have been able to record the visually evoked brain activities and mimic the visual perception ability through computational approaches. In this paper, we pay attention to visual stimuli reconstruction by reconstructing the observed images based on portably accessible brain signals, i.e., electroencephalography (EEG) data. Since EEG signals are dynamic in the time-series format and are notoriously noisy, processing and extracting useful information requires more dedicated efforts. In this paper, we propose a comprehensive pipeline, named NeuroImagen, for reconstructing visual stimuli images from EEG signals. Specifically, we incorporate a novel multi-level perceptual information decoding to draw multi-grained outputs from the given EEG data. A latent diffusion model will then leverage the extracted information to reconstruct the high-resolution visual stimuli images. The experimental results have illustrated the effectiveness of image reconstruction and superior quantitative performance of our proposed method.

Hybrid ASP-based multi-objective scheduling of semiconductor manufacturing processes (Extended version)

  • paper_url: http://arxiv.org/abs/2307.14799
  • repo_url: None
  • paper_authors: Mohammed M. S. El-Kholany, Ramsha Ali, Martin Gebser
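  • for: Scheduling realistic semiconductor manufacturing processes, which involve hundreds of operations on diverse high-tech machines.
  • methods: Models the specific requirements using hybrid Answer Set Programming with difference logic, incorporating flexible machine processing, setup, batching, and maintenance operations.
  • results: Examines the potential of large-scale scheduling subject to multiple optimization objectives, in contrast to existing methods that schedule locally with greedy heuristics or independently optimize specific machine group allocations.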
    Abstract Modern semiconductor manufacturing involves intricate production processes consisting of hundreds of operations, which can take several months from lot release to completion. The high-tech machines used in these processes are diverse, operate on individual wafers, lots, or batches in multiple stages, and necessitate product-specific setups and specialized maintenance procedures. This situation is different from traditional job-shop scheduling scenarios, which have less complex production processes and machines, and mainly focus on solving highly combinatorial but abstract scheduling problems. In this work, we address the scheduling of realistic semiconductor manufacturing processes by modeling their specific requirements using hybrid Answer Set Programming with difference logic, incorporating flexible machine processing, setup, batching and maintenance operations. Unlike existing methods that schedule semiconductor manufacturing processes locally with greedy heuristics or by independently optimizing specific machine group allocations, we examine the potentials of large-scale scheduling subject to multiple optimization objectives.
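
Difference logic expresses timing requirements as bounds on the difference of two variables, for example start(after) - start(before) >= duration(before) for a precedence between two operations. The sketch below propagates such constraints to obtain earliest start times. It is written in Python for consistency with the other examples in this digest and is not an ASP encoding; machine assignment, setup, batching, and maintenance are all omitted, and the operation names are hypothetical.

```python
def earliest_starts(durations, precedences):
    """Earliest start times satisfying start[b] - start[a] >= durations[a].

    durations   : dict mapping operation -> processing time
    precedences : list of (before, after) pairs
    """
    start = {op: 0 for op in durations}
    # Repeatedly tighten bounds; an acyclic instance settles within
    # len(durations) passes.
    for _ in range(len(durations)):
        changed = False
        for before, after in precedences:
            bound = start[before] + durations[before]
            if start[after] < bound:
                start[after] = bound
                changed = True
        if not changed:
            return start
    raise ValueError("precedence constraints are cyclic")


def makespan(durations, start):
    """Completion time of the last operation."""
    return max(start[op] + durations[op] for op in durations)
```

A hybrid ASP solver combines this kind of numeric propagation with combinatorial choices such as which machine runs each operation.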

Emotion4MIDI: a Lyrics-based Emotion-Labeled Symbolic Music Dataset

  • paper_url: http://arxiv.org/abs/2307.14783
  • repo_url: https://github.com/serkansulun/lyricsemotions
  • paper_authors: Serkan Sulun, Pedro Oliveira, Paula Viana
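  • for: Presents a large-scale emotion-labeled symbolic music dataset of 12k MIDI songs, for exploring the connection between music and emotions and for developing models that generate music based on specific emotions.
  • methods: Trains emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline, then applies these models to lyrics from two large-scale MIDI datasets.
  • results: The dataset covers a wide range of fine-grained emotions. Inference code, trained models, and datasets are available online.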
    Abstract We present a new large-scale emotion-labeled symbolic music dataset consisting of 12k MIDI songs. To create this dataset, we first trained emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline. We then applied these models to lyrics from two large-scale MIDI datasets. Our dataset covers a wide range of fine-grained emotions, providing a valuable resource to explore the connection between music and emotions and, especially, to develop models that can generate music based on specific emotions. Our code for inference, trained models, and datasets are available online.

Car-Driver Drowsiness Assessment through 1D Temporal Convolutional Networks

  • paper_url: http://arxiv.org/abs/2308.02415
  • repo_url: None
  • paper_authors: Francesco Rundo, Concetto Spampinato, Michael Rundo
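  • for: Improving driving safety by assessing the driver's attention level and detecting drowsiness.
  • methods: Designs a bio-sensor comprising near-infrared LED emitters and photo-detectors (a Silicon PhotoMultiplier device) to assess the driver's physiological status from the PhotoPlethysmography (PPG) signal. An embedded time-domain hyper-filtering technique is combined with a 1D Temporal Convolutional architecture that embeds a progressive dilation setup.
  • results: The integrated system classifies driver drowsiness in near real time with an accuracy of approximately 96%.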
    Abstract Recently, the scientific progress of Advanced Driver Assistance System solutions (ADAS) has played a key role in enhancing the overall safety of driving. ADAS technology enables active control of vehicles to prevent potentially risky situations. An important aspect that researchers have focused on is the analysis of the driver attention level, as recent reports confirmed a rising number of accidents caused by drowsiness or lack of attentiveness. To address this issue, various studies have suggested monitoring the driver physiological state, as there exists a well-established connection between the Autonomic Nervous System (ANS) and the level of attention. For our study, we designed an innovative bio-sensor comprising near-infrared LED emitters and photo-detectors, specifically a Silicon PhotoMultiplier device. This allowed us to assess the driver physiological status by analyzing the associated PhotoPlethysmography (PPG) signal. Furthermore, we developed an embedded time-domain hyper-filtering technique in conjunction with a 1D Temporal Convolutional architecture that embeds a progressive dilation setup. This integrated system enables near real-time classification of driver drowsiness, yielding remarkable accuracy levels of approximately 96%.
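
    One way to picture the "progressive dilation setup" named in the abstract is a stack of causal 1D convolutions whose dilation grows layer by layer. The sketch below is a plain-Python illustration, not the authors' architecture: the kernel size, the weights, the doubling dilation schedule, and the toy signal are all assumptions.

```python
def dilated_causal_conv1d(signal, kernel, dilation):
    """Causal 1D convolution: output[t] uses signal[t], signal[t - d], signal[t - 2d], ..."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # samples before the start are treated as zero padding
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples that influence one output of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical settings: kernel size 3, dilation doubling at each layer.
dilations = [1, 2, 4, 8]
ppg_window = [float(i % 5) for i in range(32)]  # toy stand-in for a PPG window
features = ppg_window
for d in dilations:
    features = dilated_causal_conv1d(features, [0.5, 0.3, 0.2], d)
```

    With these assumed settings, four layers already cover 31 input samples, which is why growing dilations are a common choice for long physiological signals.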

Fair Machine Unlearning: Data Removal while Mitigating Disparities

  • paper_url: http://arxiv.org/abs/2307.14754
  • repo_url: None
  • paper_authors: Alex Oesterling, Jiaqi Ma, Flavio P. Calmon, Hima Lakkaraju
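  • for: The paper proposes a machine unlearning method that removes data instances from a model while preserving group fairness.
  • methods: The paper presents a fair machine unlearning method that avoids retraining from scratch for each unlearning request, and derives theoretical results showing that it can provably and efficiently unlearn data instances while maintaining fairness objectives.
  • results: Experiments on real-world datasets show that the method unlearns data instances effectively while preserving group fairness.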
    Abstract As public consciousness regarding the collection and use of personal information by corporations grows, it is of increasing importance that consumers be active participants in the curation of corporate datasets. In light of this, data governance frameworks such as the General Data Protection Regulation (GDPR) have outlined the right to be forgotten as a key principle allowing individuals to request that their personal data be deleted from the databases and models used by organizations. To achieve forgetting in practice, several machine unlearning methods have been proposed to address the computational inefficiencies of retraining a model from scratch with each unlearning request. While these methods are efficient online alternatives to retraining, it is unclear how they impact other properties critical to real-world applications, such as fairness. In this work, we propose the first fair machine unlearning method that can provably and efficiently unlearn data instances while preserving group fairness. We derive theoretical results which demonstrate that our method can provably unlearn data instances while maintaining fairness objectives. Extensive experimentation with real-world datasets highlights the efficacy of our method at unlearning data instances while preserving fairness.
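
    The abstract does not say which group fairness criterion is used. As a hedged illustration, the sketch below measures demographic parity, one common group fairness notion, before and after a hypothetical unlearning request; the predictions, group labels, and function names are invented for the example.

```python
def positive_rate(predictions, groups, group):
    """Fraction of positive predictions (1s) within one group."""
    members = [p for p, g in zip(predictions, groups) if g == group]
    return sum(members) / len(members)


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive rate between any two groups."""
    rates = [positive_rate(predictions, groups, g) for g in sorted(set(groups))]
    return max(rates) - min(rates)


# Toy predictions from a hypothetical model before and after unlearning.
groups = ["a", "a", "a", "b", "b", "b"]
before = [1, 1, 0, 1, 0, 1]
after = [1, 0, 0, 1, 0, 1]
gap_before = demographic_parity_gap(before, groups)
gap_after = demographic_parity_gap(after, groups)
```

    In this toy case the gap grows from 0 to one third after unlearning, which is the kind of side effect a fairness-preserving unlearning method is meant to avoid.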

LLMediator: GPT-4 Assisted Online Dispute Resolution

  • paper_url: http://arxiv.org/abs/2307.16732
  • repo_url: None
  • paper_authors: Hannes Westermann, Jaromir Savelka, Karim Benyekhlef
  • for: The paper explores the potential of using large language models (LLMs) to enhance online dispute resolution (ODR) processes, specifically in the context of high-volume, low-intensity legal disputes.
  • methods: The paper proposes an experimental platform called LLMediator, which leverages GPT-4 to reformulate user messages, draft mediator responses, and potentially engage in discussions autonomously.
  • results: The initial qualitative evaluations presented in the paper demonstrate the potential for LLMs to support ODR and facilitate amicable settlements, with promising results for the proof of concept.
    Abstract In this article, we introduce LLMediator, an experimental platform designed to enhance online dispute resolution (ODR) by utilizing capabilities of state-of-the-art large language models (LLMs) such as GPT-4. In the context of high-volume, low-intensity legal disputes, alternative dispute resolution methods such as negotiation and mediation offer accessible and cooperative solutions for laypeople. These approaches can be carried out online on ODR platforms. LLMediator aims to improve the efficacy of such processes by leveraging GPT-4 to reformulate user messages, draft mediator responses, and potentially autonomously engage in the discussions. We present and discuss several features of LLMediator and conduct initial qualitative evaluations, demonstrating the potential for LLMs to support ODR and facilitate amicable settlements. The initial proof of concept is promising and opens up avenues for further research in AI-assisted negotiation and mediation.

Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation

  • paper_url: http://arxiv.org/abs/2307.14750
  • repo_url: https://github.com/zhiyuan-li-john/rapsg
  • paper_authors: Zhiyuan Li, Dongnan Liu, Heng Wang, Chaoyi Zhang, Weidong Cai
  • for: This paper proposes a new strategy for training an image captioner without annotated image-sentence pairs: leverage prior knowledge from large pre-trained models (LPMs) and integrate a retrieval process to generate high-quality pseudo sentences.
  • methods: The proposed strategy, called "LPM + retrieval-augmented learning", consists of two main components: (1) Retrieval-augmented Pseudo Sentence Generation (RaPSG), which retrieves highly relevant short region descriptions from mismatching corpora and uses them to generate a variety of high-quality pseudo sentences with distinct representations, and (2) a fluency filter and a CLIP-guided training objective to facilitate model optimization.
  • results: The proposed method achieves a CIDEr score of 78.1 (+5.1) while utilizing only 0.3% of the trainable parameters of the SOTA pre-training model (Flamingo3B), and with a simple extension boosts the 1% semi-supervised image caption benchmark to 93.4 CIDEr (+8.9).
    Abstract Training an image captioner without annotated image-sentence pairs has gained traction in recent years. Previous approaches can be categorized into two strategies: crawling sentences from mismatching corpora and aligning them with the given images as pseudo annotations, or pre-training the captioner using external image-text pairs. However, the aligning setting seems to reach its performance limit due to the quality problem of pairs, and pre-training requires significant computational resources. To address these challenges, we propose a new strategy "LPM + retrieval-augmented learning" where the prior knowledge from large pre-trained models (LPMs) is leveraged as supervision, and a retrieval process is integrated to further reinforce its effectiveness. Specifically, we introduce Retrieval-augmented Pseudo Sentence Generation (RaPSG), which adopts an efficient approach to retrieve highly relevant short region descriptions from the mismatching corpora and use them to generate a variety of pseudo sentences with distinct representations as well as high quality via LPMs. In addition, a fluency filter and a CLIP-guided training objective are further introduced to facilitate model optimization. Experimental results demonstrate that our method surpasses the SOTA pre-training model (Flamingo3B) by achieving a CIDEr score of 78.1 (+5.1) while utilizing only 0.3% of its trainable parameters (1.3B VS 33M). Importantly, our approach eliminates the need for computationally expensive pre-training processes on external datasets (e.g., the requirement of 312M image-text pairs for Flamingo3B). We further show that with a simple extension, the generated pseudo sentences can be deployed as weak supervision to boost the 1% semi-supervised image caption benchmark up to 93.4 CIDEr score (+8.9), which showcases the versatility and effectiveness of our approach.
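
    The retrieval step described above can be pictured as a nearest-neighbour search over embedded region descriptions. The sketch below is only an illustration under assumptions: the three-dimensional vectors stand in for real image and text embeddings, and ranking by cosine similarity is a common choice rather than the confirmed RaPSG procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve_top_k(query_vec, corpus, k):
    """Return the k descriptions whose embeddings are closest to the query."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical corpus of (description, embedding) pairs.
corpus = [
    ("a brown dog on grass", [0.9, 0.1, 0.0]),
    ("a plate of pasta", [0.0, 0.2, 0.9]),
    ("a puppy running outdoors", [0.8, 0.3, 0.1]),
]
image_vec = [1.0, 0.2, 0.0]  # stand-in for an image embedding
top = retrieve_top_k(image_vec, corpus, k=2)
```

    The retrieved descriptions would then be passed to a large pre-trained model to compose pseudo sentences; that generation step is not sketched here.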

JusticeBot: A Methodology for Building Augmented Intelligence Tools for Laypeople to Increase Access to Justice

  • paper_url: http://arxiv.org/abs/2308.02032
  • repo_url: None
  • paper_authors: Hannes Westermann, Karim Benyekhlef
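  • for: The paper aims to help laypeople (individuals without legal training) resolve their legal problems.
  • methods: The proposed methodology builds legal decision support tools on a hybrid of case-based and rule-based reasoning. The system asks users about their situation and provides legal information, references to similar previous cases, and possible next steps.
  • results: By implementing this methodology, the paper provides a tool that may help users settle their case or enforce their rights in court. The first deployed version, focused on landlord-tenant disputes, has been used by thousands of individuals.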
    Abstract Laypeople (i.e. individuals without legal training) may often have trouble resolving their legal problems. In this work, we present the JusticeBot methodology. This methodology can be used to build legal decision support tools that support laypeople in exploring their legal rights in certain situations, using a hybrid case-based and rule-based reasoning approach. The system asks the user questions regarding their situation and provides them with legal information, references to previous similar cases and possible next steps. This information could potentially help the user resolve their issue, e.g. by settling their case or enforcing their rights in court. We present the methodology for building such tools, which consists of discovering typically applied legal rules from legislation and case law, and encoding previous cases to support the user. We also present an interface to build tools using this methodology and a case study of the first deployed JusticeBot version, focused on landlord-tenant disputes, which has been used by thousands of individuals.

New Interaction Paradigm for Complex EDA Software Leveraging GPT

  • paper_url: http://arxiv.org/abs/2307.14740
  • repo_url: https://github.com/smarton-empower/smarton-ai
  • paper_authors: Boyu Han, Xinyu Wang, Yifan Wang, Junyu Yan, Yidong Tian
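  • for: Help novice printed circuit board (PCB) designers make better use of professional electronic design automation (EDA) software such as KiCad, through an AI interaction-assist plugin that improves design efficiency and user experience.
  • methods: Inspired by the HuggingGPT framework, the plugin employs large language models such as GPT and BERT for task planning and execution, including analysis of help documentation paragraphs and execution of different plugins, while drawing on KiCad's built-in schematic and PCB manipulation functions.
  • results: In preliminary tests, SmartonAI streamlines the PCB design process by turning complex commands into intuitive language-based interactions. Bridging the gap between complex EDA software and user-friendly interaction helps novice designers use tools such as KiCad, and the same paradigm can extend to other complex software systems.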
    Abstract In the rapidly growing field of electronic design automation (EDA), professional software such as KiCad, Cadence, and Altium Designer provide increasingly extensive design functionalities. However, the intricate command structure and high learning curve create a barrier, particularly for novice printed circuit board (PCB) designers. This results in difficulties in selecting appropriate functions or plugins for varying design purposes, compounded by the lack of intuitive learning methods beyond traditional documentation, videos, and online forums. To address this challenge, an artificial intelligence (AI) interaction assist plugin for EDA software named SmartonAI is developed here, with KiCad taken as the first example. SmartonAI is inspired by the HuggingGPT framework and employs large language models, such as GPT and BERT, to facilitate task planning and execution. On receiving a designer request, SmartonAI conducts a task breakdown and efficiently executes relevant subtasks, such as analysis of help documentation paragraphs and execution of different plugins, along with leveraging the built-in schematic and PCB manipulation functions in both SmartonAI itself and the software. Our preliminary results demonstrate that SmartonAI can significantly streamline the PCB design process by simplifying complex commands into intuitive language-based interactions. By harnessing the powerful language capabilities of ChatGPT and the rich design functions of KiCad, the plugin effectively bridges the gap between complex EDA software and user-friendly interaction. Meanwhile, the new paradigm behind SmartonAI can also extend to other complex software systems, illustrating the immense potential of AI-assisted user interfaces in advancing digital interactions across various domains.

Cortex Inspired Learning to Recover Damaged Signal Modality with ReD-SOM Model

  • paper_url: http://arxiv.org/abs/2307.15095
  • repo_url: None
  • paper_authors: Artem Muliukov, Laurent Rodriguez, Benoit Miramond
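  • for: This study aims to recover lost data of one modality by using data from another modality.
  • methods: The study combines Variational Auto-Encoders, Self-Organizing Maps, and Hebb connections in a unified ReD-SOM (Reentering Deep Self-organizing Map) model that simulates the cross-modal interference observed in the human brain (the McGurk Effect).
  • results: Experiments on a multimodal dataset show improved signal reconstruction quality, and the effect is most remarkable when the signal is significantly distorted.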
    Abstract Recent progress in the fields of AI and cognitive sciences opens up new challenges that were previously inaccessible to study. One of such modern tasks is recovering lost data of one modality by using the data from another one. A similar effect (called the McGurk Effect) has been found in the functioning of the human brain. Observing this effect, one modality of information interferes with another, changing its perception. In this paper, we propose a way to simulate such an effect and use it to reconstruct lost data modalities by combining Variational Auto-Encoders, Self-Organizing Maps, and Hebb connections in a unified ReD-SOM (Reentering Deep Self-organizing Map) model. We are inspired by human's capability to use different zones of the brain in different modalities, in case of having a lack of information in one of the modalities. This new approach not only improves the analysis of ambiguous data but also restores the intended signal! The results obtained on the multimodal dataset demonstrate an increase of quality of the signal reconstruction. The effect is remarkable both visually and quantitatively, specifically in presence of a significant degree of signal's distortion.

Evaluating Generative Models for Graph-to-Text Generation

  • paper_url: http://arxiv.org/abs/2307.14712
  • repo_url: https://github.com/shuzhouyuan/eval_g2t_genmodels
  • paper_authors: Shuzhou Yuan, Michael Färber
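  • for: This study explores the ability of generative models to generate descriptive text from graph data in a zero-shot setting.
  • methods: The study evaluates GPT-3 and ChatGPT on two graph-to-text datasets and compares them with finetuned LLMs such as T5 and BART.
  • results: The generative models produce fluent and coherent text, with BLEU scores of 10.57 and 11.08 on the AGENDA and WebNLG datasets, respectively. The error analysis shows, however, that they still struggle with the semantic relations between entities and tend to generate hallucinations or irrelevant information.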
    Abstract Large language models (LLMs) have been widely employed for graph-to-text generation tasks. However, the process of finetuning LLMs requires significant training resources and annotation work. In this paper, we explore the capability of generative models to generate descriptive text from graph data in a zero-shot setting. Specifically, we evaluate GPT-3 and ChatGPT on two graph-to-text datasets and compare their performance with that of finetuned LLM models such as T5 and BART. Our results demonstrate that generative models are capable of generating fluent and coherent text, achieving BLEU scores of 10.57 and 11.08 for the AGENDA and WebNLG datasets, respectively. However, our error analysis reveals that generative models still struggle with understanding the semantic relations between entities, and they also tend to generate text with hallucinations or irrelevant information. As a part of error analysis, we utilize BERT to detect machine-generated text and achieve high macro-F1 scores. We have made the text generated by generative models publicly available.
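
    In a zero-shot setting the graph has to be serialized into the prompt. The sketch below shows one hypothetical way to do that; the `<H>`/`<R>`/`<T>` markers, the instruction wording, and the example triples are assumptions for illustration, not the prompts used in the paper.

```python
def linearize(triples):
    """Flatten (head, relation, tail) triples into a single marked-up string."""
    return " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)


def build_zero_shot_prompt(triples):
    """Wrap the linearized graph in a plain instruction with no examples."""
    return (
        "Write a fluent English description of the following graph.\n"
        f"Graph: {linearize(triples)}\n"
        "Text:"
    )


# Hypothetical knowledge-graph fragment.
triples = [
    ("Alan Turing", "field", "computer science"),
    ("Alan Turing", "birthPlace", "London"),
]
prompt = build_zero_shot_prompt(triples)
```

    The resulting string would be sent to a generative model as-is; the model call is left out because it depends on the provider.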

A Multimodal Supervised Machine Learning Approach for Satellite-based Wildfire Identification in Europe

  • paper_url: http://arxiv.org/abs/2308.02508
  • repo_url: None
  • paper_authors: Angelica Urbanelli, Luca Barco, Edoardo Arnaudo, Claudio Rossi
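  • for: Improve the accuracy of automated satellite-based hotspot detection systems through a dedicated wildfire identification solution.
  • methods: Drawing on multiple information sources, including the MODIS and VIIRS hotspot services, the EFFIS database, the ERSI annual Land Use Land Cover (LULC), and Copernicus Sentinel-3 data, the paper proposes a multimodal supervised machine learning approach for wildfire identification.
  • results: Experiments show that the approach effectively disambiguates hotspot detections, distinguishing wildfires from other events.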
    Abstract The increasing frequency of catastrophic natural events, such as wildfires, calls for the development of rapid and automated wildfire detection systems. In this paper, we propose a wildfire identification solution to improve the accuracy of automated satellite-based hotspot detection systems by leveraging multiple information sources. We cross-reference the thermal anomalies detected by the Moderate-resolution Imaging Spectroradiometer (MODIS) and the Visible Infrared Imaging Radiometer Suite (VIIRS) hotspot services with the European Forest Fire Information System (EFFIS) database to construct a large-scale hotspot dataset for wildfire-related studies in Europe. Then, we propose a novel multimodal supervised machine learning approach to disambiguate hotspot detections, distinguishing between wildfires and other events. Our methodology includes the use of multimodal data sources, such as the ERSI annual Land Use Land Cover (LULC) and the Copernicus Sentinel-3 data. Experimental results demonstrate the effectiveness of our approach in the task of wildfire identification.
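
    Cross-referencing hotspots against a fire database amounts to a match in space and time. The sketch below is a simplified assumption: fire records are reduced to single points, and the 5 km and 2 day thresholds, field names, and coordinates are hypothetical; the paper's actual matching rules are not given in the abstract.

```python
import math
from datetime import date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def label_hotspot(hotspot, fires, max_km=5.0, max_days=2):
    """True if any recorded fire is close to the hotspot in both space and time."""
    for fire in fires:
        close = haversine_km(hotspot["lat"], hotspot["lon"], fire["lat"], fire["lon"]) <= max_km
        recent = abs((hotspot["day"] - fire["day"]).days) <= max_days
        if close and recent:
            return True
    return False


# Hypothetical records: one known fire, one nearby hotspot, one distant hotspot.
fires = [{"lat": 40.00, "lon": 20.00, "day": date(2022, 7, 10)}]
hotspots = [
    {"lat": 40.01, "lon": 20.00, "day": date(2022, 7, 11)},
    {"lat": 45.00, "lon": 10.00, "day": date(2022, 7, 11)},
]
labels = [label_hotspot(h, fires) for h in hotspots]
```

    Labels built this way could serve as supervision for a classifier that separates wildfires from other thermal anomalies.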

Prediction of wind turbines power with physics-informed neural networks and evidential uncertainty quantification

  • paper_url: http://arxiv.org/abs/2307.14675
  • repo_url: None
  • paper_authors: Alfonso Gijón, Ainhoa Pujana-Goitia, Eugenio Perea, Miguel Molina-Solana, Juan Gómez-Romero
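  • for: This study aims to build accurate and robust wind turbine models, in particular for predicting generated power as a function of wind speed, in support of pitch-angle control and maintenance with early fault detection.
  • methods: The study uses physics-informed neural networks to reproduce historical data from 4 turbines in a wind farm while imposing physical constraints on the model, and adds an evidential layer for uncertainty quantification.
  • results: The regression models for power, torque, and power coefficient are highly accurate with respect to both the real data and the governing physical equations. The evidential layer gives uncertainty estimates that are consistent with the absolute error and allow a confidence interval to be defined on the power curve.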
    Abstract The ever-growing use of wind energy makes necessary the optimization of turbine operations through pitch angle controllers and their maintenance with early fault detection. It is crucial to have accurate and robust models imitating the behavior of wind turbines, especially to predict the generated power as a function of the wind speed. Existing empirical and physics-based models have limitations in capturing the complex relations between the input variables and the power, aggravated by wind variability. Data-driven methods offer new opportunities to enhance wind turbine modeling of large datasets by improving accuracy and efficiency. In this study, we used physics-informed neural networks to reproduce historical data coming from 4 turbines in a wind farm, while imposing certain physical constraints to the model. The developed models for regression of the power, torque, and power coefficient as output variables showed great accuracy for both real data and physical equations governing the system. Lastly, introducing an efficient evidential layer provided uncertainty estimations of the predictions, proved to be consistent with the absolute error, and made possible the definition of a confidence interval in the power curve.
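
    The abstract does not list the physical equations imposed on the network. The sketch below uses two textbook relations as stand-ins: the power coefficient definition, Cp = P / (0.5 · rho · A · v^3), and the mechanical power identity, P = torque · angular speed. The turbine values and function names are hypothetical.

```python
import math

AIR_DENSITY = 1.225  # kg/m^3, standard sea-level value (assumed)
BETZ_LIMIT = 16 / 27  # theoretical upper bound on the power coefficient


def available_wind_power(wind_speed, rotor_radius, rho=AIR_DENSITY):
    """Kinetic power of the wind crossing the rotor disc, in watts."""
    area = math.pi * rotor_radius ** 2
    return 0.5 * rho * area * wind_speed ** 3


def power_coefficient(power_w, wind_speed, rotor_radius, rho=AIR_DENSITY):
    """Fraction of the available wind power that the turbine extracts."""
    return power_w / available_wind_power(wind_speed, rotor_radius, rho)


def torque_power_residual(power_w, torque_nm, omega_rad_s):
    """Zero when predictions satisfy P = torque * angular speed."""
    return power_w - torque_nm * omega_rad_s


# Hypothetical turbine reading: 40 m rotor radius, 8 m/s wind, Cp of 0.4.
radius, speed = 40.0, 8.0
power = 0.4 * available_wind_power(speed, radius)
cp = power_coefficient(power, speed, radius)
```

    A physics-informed loss could penalize a nonzero residual between predicted outputs alongside the usual data-fitting term; how the authors weight such terms is not stated in the abstract.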

Fuzzy order-sorted feature logic

  • paper_url: http://arxiv.org/abs/2307.14669
  • repo_url: None
  • paper_authors: Gian Carlo Milanese, Gabriella Pasi
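  • for: The paper generalizes Order-Sorted Feature (OSF) logic, a knowledge representation and reasoning language based on function-denoting feature symbols and set-denoting sort symbols, to a fuzzy setting.
  • methods: The paper gives a flexible definition of a fuzzy subsumption relation that generalizes Zadeh's inclusion between fuzzy sets. In the resulting fuzzy semantics, sort symbols and OSF terms denote fuzzy sets.
  • results: The paper extends the subsumption relation to OSF terms and proves that it constitutes a fuzzy partial order. It also shows how to find the greatest lower bound of two OSF terms by unification and how to compute the subsumption degree between two OSF terms, and it gives the complexity of these operations.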
    Abstract Order-Sorted Feature (OSF) logic is a knowledge representation and reasoning language based on function-denoting feature symbols and set-denoting sort symbols ordered in a subsumption lattice. OSF logic allows the construction of record-like terms that represent classes of entities and that are themselves ordered in a subsumption relation. The unification algorithm for such structures provides an efficient calculus of type subsumption, which has been applied in computational linguistics and implemented in constraint logic programming languages such as LOGIN and LIFE and automated reasoners such as CEDAR. This work generalizes OSF logic to a fuzzy setting. We give a flexible definition of a fuzzy subsumption relation which generalizes Zadeh's inclusion between fuzzy sets. Based on this definition we define a fuzzy semantics of OSF logic where sort symbols and OSF terms denote fuzzy sets. We extend the subsumption relation to OSF terms and prove that it constitutes a fuzzy partial order with the property that two OSF terms are subsumed by one another in the crisp sense if and only if their subsumption degree is greater than 0. We show how to find the greatest lower bound of two OSF terms by unifying them and how to compute the subsumption degree between two OSF terms, and we provide the complexity of these operations.
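
    Zadeh's inclusion, which the paper's fuzzy subsumption generalizes, says that a fuzzy set A is included in B when every element's membership in A is at most its membership in B. The sketch below illustrates only that classical notion on toy sort denotations; it is not the paper's graded subsumption degree, and the sort names and membership values are invented.

```python
def zadeh_included(a, b):
    """Zadeh's inclusion: mu_A(x) <= mu_B(x) for every x (missing keys count as 0)."""
    return all(mu <= b.get(x, 0.0) for x, mu in a.items())


def fuzzy_intersection(a, b):
    """Pointwise minimum: the greatest lower bound of A and B under Zadeh's inclusion."""
    common = a.keys() & b.keys()
    return {x: min(a[x], b[x]) for x in common}


# Hypothetical fuzzy denotations of three sort symbols.
student = {"ann": 1.0, "bob": 0.6}
person = {"ann": 1.0, "bob": 1.0, "cat": 0.9}
employee = {"bob": 0.8, "cat": 1.0}

working_student = fuzzy_intersection(student, employee)
```

    Here `student` is included in `person` but not the other way round, and `working_student` is included in both of the sets it was built from.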

Multi-Valued Partial Order Plans in Numeric Planning

  • paper_url: http://arxiv.org/abs/2307.14660
  • repo_url: None
  • paper_authors: Hayyan Helal, Gerhard Lakemeyer
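  • for: The paper analyzes possible causes of undecidability in numeric planning by studying the number of different occurrences of actions.
  • methods: The paper reformulates a numeric planning problem known as restricted tasks as a search problem, and shows how an NP-complete fragment of numeric planning can be found by using heuristics.
  • results: The paper develops multi-valued partial order plans, a least committing compact representation for (sequential and parallel) plans, and studies optimization techniques for this representation to incorporate soft preconditions.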
    Abstract Many planning formalisms allow for mixing numeric with Boolean effects. However, most of these formalisms are undecidable. In this paper, we will analyze possible causes for this undecidability by studying the number of different occurrences of actions, an approach that proved useful for metric fluents before. We will start by reformulating a numeric planning problem known as restricted tasks as a search problem. We will then show how an NP-complete fragment of numeric planning can be found by using heuristics. To achieve this, we will develop the idea of multi-valued partial order plans, a least committing compact representation for (sequential and parallel) plans. Finally, we will study optimization techniques for this representation to incorporate soft preconditions.

MVMR-FS : Non-parametric feature selection algorithm based on Maximum inter-class Variation and Minimum Redundancy

  • paper_url: http://arxiv.org/abs/2307.14643
  • repo_url: None
  • paper_authors: Haitao Nie, Shengbo Zhang, Bin Xie
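  • for: This study addresses the problem that filter-based feature selection methods cannot directly measure redundancy between features for continuous data.
  • methods: The method, abbreviated MVMR-FS, is based on maximum inter-class variation and minimum redundancy. It first applies supervised and unsupervised kernel density estimation to the features to capture their similarities and differences in inter-class and overall distributions. It then defines the MVMR criterion, in which inter-class probability distributions reflect feature relevance and distances between overall probability distributions quantify redundancy. Finally, an AGA searches for the feature subset that minimizes the MVMR.
  • results: Compared with ten state-of-the-art methods, MVMR-FS achieves the highest average accuracy, improving accuracy by 5% to 11%.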
    Abstract How to accurately measure the relevance and redundancy of features is an age-old challenge in the field of feature selection. However, existing filter-based feature selection methods cannot directly measure redundancy for continuous data. In addition, most methods rely on manually specifying the number of features, which may introduce errors in the absence of expert knowledge. In this paper, we propose a non-parametric feature selection algorithm based on maximum inter-class variation and minimum redundancy, abbreviated as MVMR-FS. We first introduce supervised and unsupervised kernel density estimation on the features to capture their similarities and differences in inter-class and overall distributions. Subsequently, we present the criteria for maximum inter-class variation and minimum redundancy (MVMR), wherein the inter-class probability distributions are employed to reflect feature relevance and the distances between overall probability distributions are used to quantify redundancy. Finally, we employ an AGA to search for the feature subset that minimizes the MVMR. Compared with ten state-of-the-art methods, MVMR-FS achieves the highest average accuracy and improves the accuracy by 5% to 11%.
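
    The relevance idea above, comparing class-conditional densities of a feature, can be sketched with a small Gaussian kernel density estimator. This is an illustration under assumptions: the L1 distance on a grid, the bandwidth of 0.3, and the toy samples are choices made for the example, and the paper's exact criterion may differ.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated with Gaussian kernels."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


def l1_distance(p, q, grid):
    """Riemann-sum approximation of the integral of |p - q| over a uniform grid."""
    step = grid[1] - grid[0]
    return sum(abs(p(x) - q(x)) for x in grid) * step


grid = [-3.0 + 0.05 * i for i in range(221)]  # uniform grid from -3 to 8
bandwidth = 0.3  # assumed; real data would need a principled bandwidth choice

# Toy feature values for class 0 and for class 1 under two hypothetical features.
class0 = [0.0, 0.1, 0.2]
class1_separated = [5.0, 5.1, 5.2]
class1_overlapping = [0.1, 0.2, 0.3]

relevant = l1_distance(gaussian_kde(class0, bandwidth), gaussian_kde(class1_separated, bandwidth), grid)
irrelevant = l1_distance(gaussian_kde(class0, bandwidth), gaussian_kde(class1_overlapping, bandwidth), grid)
```

    A feature whose class-conditional densities barely overlap gets a distance near the maximum of 2, while a feature with almost identical densities scores near 0. The same distance applied to overall densities of two features would give a redundancy signal.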

Fact-Checking of AI-Generated Reports

  • paper_url: http://arxiv.org/abs/2307.14634
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Razi Mahmood, Ge Wang, Mannudeep Kalra, Pingkun Yan
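  • for: The paper aims to improve the accuracy and responsible use of automatically generated reports for radiology images.
  • methods: The paper proposes a new method that fact-checks AI-generated reports using their associated images. An examiner learns the association between an image and sentences describing real or potentially fake findings, and is trained on a new dataset of fake reports created by perturbing the findings in ground truth radiology reports.
  • results: The examiner can verify automatically generated reports by detecting and removing fake sentences, which supports more accurate and more responsible use of such reports.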
    Abstract With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy and reduce overall costs. However, it is also well-known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method of fact-checking of AI-generated reports using their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The utility of such an examiner is demonstrated for verifying automatically generated reports by detecting and removing fake sentences. Future generative AI approaches can use the resulting tool to validate their reports leading to a more responsible use of AI in expediting clinical workflows.

Metric-Based In-context Learning: A Case Study in Text Simplification

  • paper_url: http://arxiv.org/abs/2307.14632
  • repo_url: https://github.com/nlp-ku/metric-based-in-context-learning
  • paper_authors: Subha Vadlamannati, Gözde Gül Şahin
  • for: investigate the best method for selecting examples for in-context learning (ICL) in text simplification (TS) tasks.
  • methods: propose a Metric-Based in-context Learning (MBL) method that uses commonly used TS metrics such as SARI, compression ratio, and BERT-Precision for selection.
  • results: show that examples selected by the top SARI scores perform the best on larger models, while the compression ratio generally performs better on smaller models. MBL is robust to example orderings and out-of-domain test sets, and outperforms strong baselines and state-of-the-art finetuned language models. Additionally, the chosen metric can implicitly control the behavior of large GPT models.
    Abstract In-context learning (ICL) for large language models has proven to be a powerful approach for many natural language processing tasks. However, determining the best method to select examples for ICL is nontrivial as the results can vary greatly depending on the quality, quantity, and order of examples used. In this paper, we conduct a case study on text simplification (TS) to investigate how to select the best and most robust examples for ICL. We propose Metric-Based in-context Learning (MBL) method that utilizes commonly used TS metrics such as SARI, compression ratio, and BERT-Precision for selection. Through an extensive set of experiments with various-sized GPT models on standard TS benchmarks such as TurkCorpus and ASSET, we show that examples selected by the top SARI scores perform the best on larger models such as GPT-175B, while the compression ratio generally performs better on smaller models such as GPT-13B and GPT-6.7B. Furthermore, we demonstrate that MBL is generally robust to example orderings and out-of-domain test sets, and outperforms strong baselines and state-of-the-art finetuned language models. Finally, we show that the behaviour of large GPT models can be implicitly controlled by the chosen metric. Our research provides a new framework for selecting examples in ICL, and demonstrates its effectiveness in text simplification tasks, breaking new ground for more accurate and efficient NLG systems.
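
    Metric-based selection can be sketched with the simplest of the named metrics, compression ratio. The sketch makes several assumptions: the ratio is computed on character lengths, the highest-scoring pairs are kept, and the prompt template and example pairs are hypothetical. The abstract does not specify these details.

```python
def compression_ratio(source, simplified):
    """Character length of the simplification relative to its source sentence."""
    return len(simplified) / len(source)


def select_examples(pool, k):
    """Keep the k (source, simplified) pairs with the highest compression ratio."""
    ranked = sorted(pool, key=lambda pair: compression_ratio(*pair), reverse=True)
    return ranked[:k]


def build_prompt(examples, query):
    """Format the selected pairs as few-shot demonstrations followed by the query."""
    shots = "\n\n".join(f"Complex: {src}\nSimple: {simp}" for src, simp in examples)
    return f"{shots}\n\nComplex: {query}\nSimple:"


# Hypothetical pool of candidate demonstrations.
pool = [
    ("The feline reposed upon the rug.", "The cat sat on the rug."),
    ("He commenced the undertaking immediately.", "He began."),
    ("They purchased a residence.", "They bought a house."),
]
selected = select_examples(pool, k=2)
prompt = build_prompt(selected, "She utilized the apparatus.")
```

    Swapping `compression_ratio` for another scoring function, such as a SARI implementation, changes which demonstrations are shown and, per the abstract, can implicitly steer the model's behaviour.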

A Survey on Reservoir Computing and its Interdisciplinary Applications Beyond Traditional Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15092
  • repo_url: None
  • paper_authors: Heng Zhang, Danilo Vasconcellos Vargas
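  • for: This paper reviews recent developments in reservoir computing (RC), from machine learning to physics, biology, and neuroscience.
  • methods: RC is a recurrent neural network whose neurons are randomly connected and whose connection strengths stay fixed after initialization. It acts as a non-linear dynamical system and was first applied to temporal signal processing.
  • results: RC maps low-dimensional inputs into a high-dimensional space. Its rich dynamics, linear separability, and memory capacity let a simple linear readout serve applications in many fields.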
    Abstract Reservoir computing (RC), first applied to temporal signal processing, is a recurrent neural network in which neurons are randomly connected. Once initialized, the connection strengths remain unchanged. Such a simple structure turns RC into a non-linear dynamical system that maps low-dimensional inputs into a high-dimensional space. The model's rich dynamics, linear separability, and memory capacity then enable a simple linear readout to generate adequate responses for various applications. RC spans areas far beyond machine learning, since it has been shown that the complex dynamics can be realized in various physical hardware implementations and biological devices. This yields greater flexibility and shorter computation time. Moreover, the neuronal responses triggered by the model's dynamics shed light on understanding brain mechanisms that also exploit similar dynamical processes. While the literature on RC is vast and fragmented, here we conduct a unified review of RC's recent developments from machine learning to physics, biology, and neuroscience. We first review the early RC models, and then survey the state-of-the-art models and their applications. We further introduce studies on modeling the brain's mechanisms by RC. Finally, we offer new perspectives on RC development, including reservoir design, coding frameworks unification, physical RC implementations, and interaction between RC, cognitive neuroscience and evolution.
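    Summary Reservoir computing (RC), first applied to temporal signal processing, is a recurrent neural network in which neurons are randomly connected. Once initialized, the connection strengths stay fixed. This simple structure turns RC into a non-linear dynamical system that maps low-dimensional inputs into a high-dimensional space. The model's rich dynamics, linear separability, and memory capacity then let a simple linear readout generate adequate responses for various applications. RC reaches far beyond machine learning, because its complex dynamics can also be realized in various physical hardware implementations and biological devices, which gives greater flexibility and shorter computation time. The neuronal responses triggered by the model's dynamics also help in understanding brain mechanisms that exploit similar dynamical processes. The RC literature is vast and fragmented; here we give a unified review of RC's recent developments from machine learning to physics, biology, and neuroscience. We first review the early RC models, then survey the state-of-the-art models and their applications, and introduce studies that model the brain's mechanisms with RC. Finally, we offer new perspectives on RC development, including reservoir design, unification of coding frameworks, physical RC implementations, and the interaction between RC, cognitive neuroscience, and evolution.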
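The core mechanism described above, fixed random connections that lift a low-dimensional input into a high-dimensional state, can be illustrated with a minimal echo-state-style sketch. The reservoir size, weight scaling, tanh non-linearity, and sine input are assumptions chosen for illustration, not settings from the survey. The trainable linear readout is omitted; in practice it would be fitted by linear regression on the collected states.

```python
import math
import random


def make_reservoir(n_units: int, seed: int = 0, scale: float = 0.5):
    """Draw fixed random input and recurrent weights; they are never trained."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    w_rec = [[rng.uniform(-scale, scale) / math.sqrt(n_units)
              for _ in range(n_units)] for _ in range(n_units)]
    return w_in, w_rec


def run_reservoir(inputs, w_in, w_rec):
    """Map a scalar input sequence to a sequence of high-dimensional states."""
    n_units = len(w_in)
    state = [0.0] * n_units
    states = []
    for u in inputs:
        # New state depends on the current input and the previous state.
        state = [
            math.tanh(w_in[i] * u
                      + sum(w_rec[i][j] * state[j] for j in range(n_units)))
            for i in range(n_units)
        ]
        states.append(state)
    return states


w_in, w_rec = make_reservoir(n_units=20)
signal = [math.sin(0.3 * t) for t in range(50)]  # 1-D input sequence
states = run_reservoir(signal, w_in, w_rec)      # 50 states of dimension 20
```

Because only the readout would be trained, the expensive recurrent part is set once and reused, which is why RC can also be realized in physical hardware.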

BubbleML: A Multi-Physics Dataset and Benchmarks for Machine Learning

  • paper_url: http://arxiv.org/abs/2307.14623
  • repo_url: https://github.com/hpcforge/bubbleml
  • paper_authors: Sheikh Md Shakeel Hassan, Arthur Feeney, Akash Dhruv, Jihoon Kim, Youngjoon Suh, Jaiyoung Ryu, Yoonjin Won, Aparna Chandramowlishwaran
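  • for: Provides a dataset for machine learning training that helps in understanding complex multi-physics phase change phenomena.
  • methods: Physics-driven numerical simulations provide accurate ground truth for various boiling scenarios (nucleate pool boiling, flow boiling, and sub-cooled boiling). The 51 simulations cover parameters such as gravity conditions, flow rates, sub-cooling levels, and wall superheat.
  • results: The dataset is validated against experimental observations and trends. Two benchmarks show its potential for downstream tasks: optical flow analysis to capture bubble dynamics, and operator networks for learning temperature dynamics.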
    Abstract In the field of phase change phenomena, the lack of accessible and diverse datasets suitable for machine learning (ML) training poses a significant challenge. Existing experimental datasets are often restricted, with limited availability and sparse ground truth data, impeding our understanding of this complex multi-physics phenomena. To bridge this gap, we present the BubbleML Dataset(https://github.com/HPCForge/BubbleML) which leverages physics-driven simulations to provide accurate ground truth information for various boiling scenarios, encompassing nucleate pool boiling, flow boiling, and sub-cooled boiling. This extensive dataset covers a wide range of parameters, including varying gravity conditions, flow rates, sub-cooling levels, and wall superheat, comprising 51 simulations. BubbleML is validated against experimental observations and trends, establishing it as an invaluable resource for ML research. Furthermore, we showcase its potential to facilitate exploration of diverse downstream tasks by introducing two benchmarks: (a) optical flow analysis to capture bubble dynamics, and (b) operator networks for learning temperature dynamics. The BubbleML dataset and its benchmarks serve as a catalyst for advancements in ML-driven research on multi-physics phase change phenomena, enabling the development and comparison of state-of-the-art techniques and models.
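    Summary In the field of phase change phenomena, the lack of accessible and diverse datasets poses a significant challenge for machine learning (ML) training. Existing experimental datasets are often restricted, with limited availability and sparse ground truth data, which impedes our understanding of these complex multi-physics phenomena. To bridge this gap, we present the BubbleML dataset (https://github.com/HPCForge/BubbleML), which uses physics-driven simulations to provide accurate ground truth for various boiling scenarios, including nucleate pool boiling, flow boiling, and sub-cooled boiling. The dataset covers a wide range of parameters, including gravity conditions, flow rates, sub-cooling levels, and wall superheat, across 51 simulations. BubbleML is validated against experimental observations and trends, making it a valuable resource for ML research. We also introduce two benchmarks: (a) optical flow analysis to capture bubble dynamics, and (b) operator networks for learning temperature dynamics. The BubbleML dataset and its benchmarks serve as a catalyst for advancements in ML-driven research on multi-physics phase change phenomena, enabling the development and comparison of state-of-the-art techniques and models.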

Self-Contrastive Graph Diffusion Network

  • paper_url: http://arxiv.org/abs/2307.14613
  • repo_url: https://github.com/kunzhan/SCDGN
  • paper_authors: Yixian Ma, Kun Zhan
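  • for: This work proposes a new framework, the Self-Contrastive Graph Diffusion Network (SCGDN), for the graph self-contrastive learning paradigm, to overcome limitations of existing methods.
  • methods: The framework has two main components: the Attentional Module (AttM) and the Diffusion Module (DiFM). AttM aggregates higher-order structure and feature information to obtain a good embedding, while DiFM balances the state of each node through Laplacian diffusion learning and lets adjacency and feature information evolve cooperatively in the graph.
  • results: SCGDN is augmentation-free, avoids "sampling bias" and semantic drift, and needs no pre-training. Its structure- and feature-based sampling balances preserving high-order structure information against overfitting, and experiments show that it consistently outperforms both contrastive and classical methods.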
    Abstract Augmentation techniques and sampling strategies are crucial in contrastive learning, but in most existing works, augmentation techniques require careful design, and their sampling strategies can only capture a small amount of intrinsic supervision information. Additionally, the existing methods require complex designs to obtain two different representations of the data. To overcome these limitations, we propose a novel framework called the Self-Contrastive Graph Diffusion Network (SCGDN). Our framework consists of two main components: the Attentional Module (AttM) and the Diffusion Module (DiFM). AttM aggregates higher-order structure and feature information to get an excellent embedding, while DiFM balances the state of each node in the graph through Laplacian diffusion learning and allows the cooperative evolution of adjacency and feature information in the graph. Unlike existing methodologies, SCGDN is an augmentation-free approach that avoids "sampling bias" and semantic drift, without the need for pre-training. We conduct a high-quality sampling of samples based on structure and feature information. If two nodes are neighbors, they are considered positive samples of each other. If two disconnected nodes are also unrelated on $k$NN graph, they are considered negative samples for each other. The contrastive objective reasonably uses our proposed sampling strategies, and the redundancy reduction term minimizes redundant information in the embedding and can well retain more discriminative information. In this novel framework, the graph self-contrastive learning paradigm gives expression to a powerful force. SCGDN effectively balances between preserving high-order structure information and avoiding overfitting. The results manifest that SCGDN can consistently generate outperformance over both the contrastive methods and the classical methods.
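    Summary Augmentation techniques and sampling strategies are crucial in contrastive learning, but in most existing work the augmentations require careful design and the sampling strategies capture only a small amount of intrinsic supervision information. Existing methods also need complex designs to obtain two different representations of the data. To overcome these limitations, we propose a new framework called the Self-Contrastive Graph Diffusion Network (SCGDN). It has two main components: the Attentional Module (AttM) and the Diffusion Module (DiFM). AttM aggregates higher-order structure and feature information to obtain a good embedding, while DiFM balances the state of each node in the graph through Laplacian diffusion learning and allows adjacency and feature information to evolve cooperatively. Unlike existing methods, SCGDN is augmentation-free, avoids "sampling bias" and semantic drift, and needs no pre-training. Sampling is based on structure and feature information: two neighboring nodes are positive samples of each other, and two disconnected nodes that are also unrelated on the $k$NN graph are negative samples of each other. The contrastive objective uses these sampling strategies, and a redundancy reduction term minimizes redundant information in the embedding while retaining more discriminative information. SCGDN balances preserving high-order structure information against overfitting, and the results show that it consistently outperforms both contrastive and classical methods.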
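The sampling rule stated in the abstract (neighbors are positives; disconnected nodes that are also unrelated on the $k$NN graph are negatives) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Euclidean $k$NN graph, the treatment of "unrelated" as "not linked in either direction", and the toy graph are assumptions. Pairs that are not adjacent but are $k$NN-related are left unlabeled in this sketch.

```python
import math


def knn_graph(features, k):
    """Return, for each node, the set of its k nearest nodes (Euclidean distance)."""
    n = len(features)
    neighbours = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(features[i], features[j]))
        neighbours.append(set(order[:k]))
    return neighbours


def sample_pairs(edges, features, k):
    """Split node pairs into positives (graph neighbours) and negatives
    (not adjacent and not linked in the kNN graph in either direction)."""
    n = len(features)
    adjacent = {frozenset(e) for e in edges}
    knn = knn_graph(features, k)
    positives, negatives = [], []
    for i in range(n):
        for j in range(i + 1, n):
            if frozenset((i, j)) in adjacent:
                positives.append((i, j))
            elif j not in knn[i] and i not in knn[j]:
                negatives.append((i, j))
    return positives, negatives


# Toy graph: two tight clusters, one edge inside each (hypothetical data).
edges = [(0, 1), (2, 3)]
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
positives, negatives = sample_pairs(edges, features, k=1)
```

On this toy input the positives are the two edges and the negatives are the four cross-cluster pairs, which matches the intent of using both structure and feature information.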

Clustering based Point Cloud Representation Learning for 3D Analysis

  • paper_url: http://arxiv.org/abs/2307.14605
  • repo_url: https://github.com/fengzicai/cluster3dseg
  • paper_authors: Tuo Feng, Wenguan Wang, Xiaohan Wang, Yi Yang, Qinghua Zheng
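  • for: This work proposes a clustering based supervised learning scheme that automatically discovers subclass patterns across scenes, improving the robustness of point cloud analysis to variations.
  • methods: The method performs within-class clustering on the point embedding space to mine latent patterns that are representative across scenes. The mined patterns are then used to repaint the embedding space so that it respects the underlying distribution of the whole training dataset and becomes more robust to variations.
  • results: With various 3D network architectures (voxel-based, point-based, Transformer-based, and automatically searched), the method improves mIoU by 2.0-2.6% on single-scan and 2.0-2.2% on multi-scan SemanticKITTI and by 1.8-1.9% on S3DIS. For 3D detection it yields 2.0-3.4% mAP gains on KITTI.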
    Abstract Point cloud analysis (such as 3D segmentation and detection) is a challenging task, because of not only the irregular geometries of many millions of unordered points, but also the great variations caused by depth, viewpoint, occlusion, etc. Current studies put much focus on the adaption of neural networks to the complex geometries of point clouds, but are blind to a fundamental question: how to learn an appropriate point embedding space that is aware of both discriminative semantics and challenging variations? As a response, we propose a clustering based supervised learning scheme for point cloud analysis. Unlike current de-facto, scene-wise training paradigm, our algorithm conducts within-class clustering on the point embedding space for automatically discovering subclass patterns which are latent yet representative across scenes. The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations. Our algorithm is principled and readily pluggable to modern point cloud segmentation networks during training, without extra overhead during testing. With various 3D network architectures (i.e., voxel-based, point-based, Transformer-based, automatically searched), our algorithm shows notable improvements on famous point cloud segmentation datasets (i.e.,2.0-2.6% on single-scan and 2.0-2.2% multi-scan of SemanticKITTI, 1.8-1.9% on S3DIS, in terms of mIoU). Our algorithm also demonstrates utility in 3D detection, showing 2.0-3.4% mAP gains on KITTI.
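    Summary Point cloud analysis (such as 3D segmentation and detection) is a challenging task, because of both the irregular geometries of point clouds and the large variations caused by depth, viewpoint, occlusion, and so on. Current studies focus on adapting neural networks to the complex geometries of point clouds but overlook a fundamental question: how to learn an appropriate point embedding space that is aware of both discriminative semantics and challenging variations? In response, we propose a clustering based supervised learning scheme for point cloud analysis. Unlike the current scene-wise training paradigm, our algorithm performs within-class clustering on the point embedding space to automatically discover subclass patterns that are latent yet representative across scenes. The mined patterns are then used to repaint the embedding space so that it respects the underlying distribution of the entire training dataset and becomes more robust to variations. Our algorithm is principled and can be plugged into modern point cloud segmentation networks during training, with no extra overhead at test time. With various 3D network architectures (i.e., voxel-based, point-based, Transformer-based, automatically searched), it shows notable mIoU improvements on well-known point cloud segmentation datasets (2.0-2.6% on SemanticKITTI and 1.8-1.9% on S3DIS). It also proves useful in 3D detection, with 2.0-3.4% mAP gains on KITTI.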

The detection and rectification for identity-switch based on unfalsified control

  • paper_url: http://arxiv.org/abs/2307.14591
  • repo_url: None
  • paper_authors: Junchao Huang, Xiaoqi He, Sheng Zhao
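  • for: This paper addresses multi-object tracking in videos and uses unfalsified control to tackle the ID-switch problem.
  • methods: The paper builds sequences of appearance information variations for trajectories, designs a detection and rectification module for ID-switch detection and recovery, and proposes a simple and effective strategy for ambiguous appearance matching during data association.
  • results: Experiments on publicly available MOT datasets show that the tracker is effective and robust in handling tracking errors caused by occlusions and rapid movements.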
    Abstract The purpose of multi-object tracking (MOT) is to continuously track and identify objects detected in videos. Currently, most methods for multi-object tracking model the motion information and combine it with appearance information to determine and track objects. In this paper, unfalsified control is employed to address the ID-switch problem in multi-object tracking. We establish sequences of appearance information variations for the trajectories during the tracking process and design a detection and rectification module specifically for ID-switch detection and recovery. We also propose a simple and effective strategy to address the issue of ambiguous matching of appearance information during the data association process. Experimental results on publicly available MOT datasets demonstrate that the tracker exhibits excellent effectiveness and robustness in handling tracking errors caused by occlusions and rapid movements.

Explainable Techniques for Analyzing Flow Cytometry Cell Transformers

  • paper_url: http://arxiv.org/abs/2307.14581
  • repo_url: None
  • paper_authors: Florian Kowarsch, Lisa Weijler, Florian Kleber, Matthias Wödlinger, Michael Reiter, Margarita Maurer-Granofszky, Michael Dworzak
  • for: This paper aims to improve explainability for deep learning models in clinical applications, specifically for Flow CytoMetry (FCM) data.
  • methods: The authors propose and evaluate two visualization techniques for cell classification and polygon regression on pediatric Acute Lymphoblastic Leukemia (ALL) FCM samples: gradient-based visualization and attention visualization. These techniques are tailored for FCM data and utilize a transformer architecture called ReluFormer.
  • results: The results demonstrate the effectiveness of the proposed visualization techniques in outlining the model’s decision process and providing insights into the transformer’s decision-making process when handling FCM data. The gradient-based visualization identifies cells that are most significant for a particular prediction, while the attention visualization shows that different attention heads specialize by attending to different biologically meaningful sub-populations in the data.
    Abstract Explainability for Deep Learning Models is especially important for clinical applications, where decisions of automated systems have far-reaching consequences. While various post-hoc explainable methods, such as attention visualization and saliency maps, already exist for common data modalities, including natural language and images, little work has been done to adapt them to the modality of Flow CytoMetry (FCM) data. In this work, we evaluate the usage of a transformer architecture called ReluFormer that ease attention visualization as well as we propose a gradient- and an attention-based visualization technique tailored for FCM. We qualitatively evaluate the visualization techniques for cell classification and polygon regression on pediatric Acute Lymphoblastic Leukemia (ALL) FCM samples. The results outline the model's decision process and demonstrate how to utilize the proposed techniques to inspect the trained model. The gradient-based visualization not only identifies cells that are most significant for a particular prediction but also indicates the directions in the FCM feature space in which changes have the most impact on the prediction. The attention visualization provides insights on the transformer's decision process when handling FCM data. We show that different attention heads specialize by attending to different biologically meaningful sub-populations in the data, even though the model retrieved solely supervised binary classification signals during training.
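    Summary Explainability of deep learning models is especially important for clinical applications, where decisions of automated systems have far-reaching consequences. Various post-hoc explainability methods, such as attention visualization and saliency maps, already exist for common data modalities such as natural language and images, but little work has adapted them to Flow CytoMetry (FCM) data. In this work, we evaluate a transformer architecture called ReluFormer that eases attention visualization, and we propose a gradient-based and an attention-based visualization technique tailored for FCM. We qualitatively evaluate these techniques for cell classification and polygon regression on pediatric Acute Lymphoblastic Leukemia (ALL) FCM samples. The results outline the model's decision process and show how to use the proposed techniques to inspect the trained model. The gradient-based visualization not only identifies the cells that are most significant for a particular prediction but also indicates the directions in the FCM feature space in which changes affect the prediction most. The attention visualization gives insight into the transformer's decision process when handling FCM data: different attention heads specialize by attending to different biologically meaningful sub-populations, even though the model received only supervised binary classification signals during training.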

A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos

  • paper_url: http://arxiv.org/abs/2307.14575
  • repo_url: None
  • paper_authors: Rongqin Liang, Yuanman Li, Yingxin Yi, Jiantao Zhou, Xia Li
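  • for: This work aims to improve the safety of autonomous driving and driver assistance systems by identifying traffic accidents in driving videos.
  • methods: Proposes a memory-augmented multi-task collaborative framework (MAMTCF) that detects accidents more accurately by modeling appearance changes and object motions in video frames at the same time. A memory-augmented motion representation mechanism explores the interrelation between different types of motion representations and uses high-level features of normal traffic patterns stored in memory to augment them.
  • results: Experiments on a recently published large-scale dataset show better performance than previous state-of-the-art approaches.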
    Abstract Identifying traffic accidents in driving videos is crucial to ensuring the safety of autonomous driving and driver assistance systems. To address the potential danger caused by the long-tailed distribution of driving events, existing traffic accident detection (TAD) methods mainly rely on unsupervised learning. However, TAD is still challenging due to the rapid movement of cameras and dynamic scenes in driving scenarios. Existing unsupervised TAD methods mainly rely on a single pretext task, i.e., an appearance-based or future object localization task, to detect accidents. However, appearance-based approaches are easily disturbed by the rapid movement of the camera and changes in illumination, which significantly reduce the performance of traffic accident detection. Methods based on future object localization may fail to capture appearance changes in video frames, making it difficult to detect ego-involved accidents (e.g., out of control of the ego-vehicle). In this paper, we propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos. Different from previous approaches, our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames through the collaboration of optical flow reconstruction and future object localization tasks. Further, we introduce a memory-augmented motion representation mechanism to fully explore the interrelation between different types of motion representations and exploit the high-level features of normal traffic patterns stored in memory to augment motion representations, thus enlarging the difference from anomalies. Experimental results on recently published large-scale dataset demonstrate that our method achieves better performance compared to previous state-of-the-art approaches.
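    Summary Identifying traffic accidents in driving videos is crucial to the safety of autonomous driving and driver assistance systems. To cope with the long-tailed distribution of driving events, existing traffic accident detection (TAD) methods mainly rely on unsupervised learning. TAD remains challenging, however, because of the rapid movement of cameras and the dynamic scenes in driving scenarios. Existing unsupervised TAD methods mainly rely on a single pretext task, i.e., an appearance-based task or a future object localization task. Appearance-based approaches are easily disturbed by rapid camera movement and changes in illumination, which reduces detection performance. Methods based on future object localization may miss appearance changes in video frames, making it difficult to detect ego-involved accidents (e.g., loss of control of the ego-vehicle). In this paper, we propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection. Unlike previous approaches, our method detects both ego-involved and non-ego accidents more accurately by modeling appearance changes and object motions in video frames at the same time, through the collaboration of optical flow reconstruction and future object localization tasks. We also introduce a memory-augmented motion representation mechanism that explores the interrelation between different types of motion representations and uses the high-level features of normal traffic patterns stored in memory to augment motion representations, enlarging the difference from anomalies. Experimental results on a recently published large-scale dataset show that our method outperforms previous state-of-the-art approaches.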

Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.14568
  • repo_url: None
  • paper_authors: Brian Angulo, Gregory Gorbov, Aleksandr Panov, Konstantin Yakovlev
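  • for: This study highlights the importance of safety constraints in autonomous navigation systems by comparing two learnable navigation policies.
  • methods: The study compares a "safe" policy, which takes the safety constraints into account, with an "unsafe" policy, which does not.
  • results: The safe policy generates trajectories with more clearance (distance to the obstacles) and makes fewer collisions during training, without sacrificing overall performance.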
    Abstract While reinforcement learning algorithms have had great success in the field of autonomous navigation, they cannot be straightforwardly applied to the real autonomous systems without considering the safety constraints. The later are crucial to avoid unsafe behaviors of the autonomous vehicle on the road. To highlight the importance of these constraints, in this study, we compare two learnable navigation policies: safe and unsafe. The safe policy takes the constraints into account, while the other does not. We show that the safe policy is able to generate trajectories with more clearance (distance to the obstacles) and makes less collisions while training without sacrificing the overall performance.
    Summary Autonomous navigation algorithms based on reinforcement learning have achieved great success, but they cannot be directly applied to real-world autonomous systems without considering safety constraints. These constraints are crucial to avoid dangerous behaviors of the autonomous vehicle on the road. To emphasize the importance of these constraints, in this study, we compare two learnable navigation policies: safe and unsafe. The safe policy takes the constraints into account, while the other does not. We show that the safe policy can generate trajectories with more clearance (distance to obstacles) and makes fewer collisions while training without sacrificing overall performance.

Understanding Forward Process of Convolutional Neural Network

  • paper_url: http://arxiv.org/abs/2307.15090
  • repo_url: https://github.com/himanshub1007/Alzhimers-Disease-Prediction-Using-Deep-learning
  • paper_authors: Peixin Tian
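  • for: Describes the selective rotation in the forward processing of CNNs.
  • methods: Treats the activation function as a discerning mechanism that unifies and quantizes the rotational aspects of the input data, and analyzes statistical indicators of the inputs with structured mathematical tools.
  • results: The defined methodology shows how the network progressively distinguishes inputs based on statistical indicators, and the findings reveal a consistency between artificial neural networks and the human brain in their data processing patterns.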
    Abstract This paper reveal the selective rotation in the CNNs' forward processing. It elucidates the activation function as a discerning mechanism that unifies and quantizes the rotational aspects of the input data. Experiments show how this defined methodology reflects the progress network distinguish inputs based on statistical indicators, which can be comprehended or analyzed by applying structured mathematical tools. Our findings also unveil the consistency between artificial neural networks and the human brain in their data processing pattern.
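    Summary This paper reveals the selective rotation in the forward processing of CNNs. It presents the activation function as a discerning mechanism that unifies and quantizes the rotational aspects of the input data. Experiments show how, under this methodology, the network progressively distinguishes inputs based on statistical indicators, which can be understood or analyzed with structured mathematical tools. Our findings also reveal a consistency between artificial neural networks and the human brain in their data processing patterns.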

Reinforcement learning guided fuzz testing for a browser’s HTML rendering engine

  • paper_url: http://arxiv.org/abs/2307.14556
  • repo_url: None
  • paper_authors: Martin Sablotny, Bjørn Sand Jensen, Jeremy Singer
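  • for: Uncovering various bugs and security vulnerabilities through generation-based fuzz testing.
  • methods: Combines a trained deep learning test case generator with a double deep Q-network (DDQN) that guides test case creation based on a code coverage signal.
  • results: Improves code coverage by up to 18.5% for the Firefox HTML rendering engine compared to the baseline grammar-based fuzzer.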
    Abstract Generation-based fuzz testing can uncover various bugs and security vulnerabilities. However, compared to mutation-based fuzz testing, it takes much longer to develop a well-balanced generator that produces good test cases and decides where to break the underlying structure to exercise new code paths. We propose a novel approach to combine a trained test case generator deep learning model with a double deep Q-network (DDQN) for the first time. The DDQN guides test case creation based on a code coverage signal. Our approach improves the code coverage performance of the underlying generator model by up to 18.5% for the Firefox HTML rendering engine compared to the baseline grammar based fuzzer.
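    Summary Generation-based fuzz testing can uncover various bugs and security vulnerabilities. Compared to mutation-based fuzz testing, however, it takes much longer to develop a well-balanced generator that produces good test cases and decides where to break the underlying structure to exercise new code paths. We propose a new approach that combines a trained deep learning test case generator with a double deep Q-network (DDQN). The DDQN guides test case creation based on a code coverage signal. Our approach improves the code coverage performance of the underlying generator model by up to 18.5% for the Firefox HTML rendering engine compared to the baseline grammar-based fuzzer.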
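For readers unfamiliar with the double deep Q-network mentioned above, its defining step is the learning target: the online network chooses the next action and the target network scores it. The sketch below shows only that generic computation. Treating the reward as a gain in code coverage is an assumption made for illustration, and the Q-values are made-up numbers; the paper's state, action, and reward design are not given in this listing.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network picks the next action,
    the target network evaluates it."""
    if done:
        return reward  # no bootstrapping from a terminal state
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Hypothetical numbers: reward stands in for an assumed coverage gain.
target = double_dqn_target(
    reward=1.0,
    done=False,
    q_online_next=[0.2, 0.9, 0.4],   # online net prefers action 1
    q_target_next=[0.5, 0.3, 0.8],   # target net scores action 1 at 0.3
    gamma=0.5,
)
```

A plain DQN would instead bootstrap from the target network's own maximum (0.8 here), which is the overestimation that the double variant is designed to reduce.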

Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application

  • paper_url: http://arxiv.org/abs/2307.14549
  • repo_url: None
  • paper_authors: Jianjun Yuan, Wei Lee Woon, Ludovik Coba
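  • for: This paper addresses the sleeping bandit problem with multiple plays in the context of online recommendation systems.
  • methods: Proposes an efficient algorithm that extends the sleeping bandit algorithm for single arm selection to multiple plays, under bounded adversarial losses and unknown i.i.d. distributions for arm availability.
  • results: The algorithm is guaranteed a regret upper bound of $O(kN^2\sqrt{T\log T})$, where $k$ is the number of arms selected per time step, $N$ is the total number of arms, and $T$ is the time horizon.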
    Abstract This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed to achieve theoretical performance with regret upper bounded by $O(kN^2\sqrt{T\log T})$, where $k$ is the number of arms selected per time step, $N$ is the total number of arms, and $T$ is the time horizon.
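    Summary This paper presents an efficient algorithm for the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial losses and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed a regret upper bounded by $O(kN^2\sqrt{T\log T})$, where $k$ is the number of arms selected per time step, $N$ is the total number of arms, and $T$ is the time horizon.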

Speed Reading Tool Powered by Artificial Intelligence for Students with ADHD, Dyslexia, or Short Attention Span

  • paper_url: http://arxiv.org/abs/2307.14544
  • repo_url: None
  • paper_authors: Megat Irfan Zackry Bin Ismail, Ahmad Nazran bin Yusri, Muhammad Hafizzul Bin Abdul Manap, Muhammad Muizzuddin Bin Kamarozaman
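  • for: Helping students with dyslexia, ADHD, and short attention span digest text-based information more efficiently.
  • methods: Uses the Multilayer Perceptron (MLP) algorithm for complex text processing and summarization, the T5 text-to-text model from Hugging Face fine-tuned on specific tasks, and NLTK's Punkt Sentence Tokenizer to split text into a list of sentences.
  • results: Applies principles from Bionic Reading, including a bolding function and adjustments to character, word, and line spacing, to enhance readability and reading efficiency.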
    Abstract This paper presents a novel approach to assist students with dyslexia, ADHD, and short attention span in digesting any text-based information more efficiently. The proposed solution utilizes the Multilayer Perceptron (MLP) algorithm for complex text processing and summarization tasks. The tool leverages the T5 (Text-to-Text Transfer Transformer) model from Hugging Face, which treats every NLP task as a text generation task. The model is fine-tuned on specific tasks using a smaller dataset. The NLTK's Punkt Sentence Tokenizer is used to divide a text into a list of sentences. The application is served using Flask, a lightweight web server and framework. The tool also applies principles from Bionic Reading to enhance readability, which includes a bolding function and adjustments to line, word, and character spacing. The paper discusses the methodology, implementation, and results of the AI-based speed reading tool.
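    Summary This paper presents a new approach to help students with dyslexia, ADHD, and short attention span digest text-based information more efficiently. The proposed solution uses the Multilayer Perceptron (MLP) algorithm for complex text processing and summarization tasks. The tool uses the T5 (Text-to-Text Transfer Transformer) model from Hugging Face, which treats every NLP task as a text generation task. The model is fine-tuned on specific tasks using a smaller dataset. NLTK's Punkt Sentence Tokenizer divides a text into a list of sentences. The application is served with Flask, a lightweight web server and framework. The tool also applies principles from Bionic Reading to improve readability, including a bolding function and adjustments to line, word, and character spacing. The paper discusses the methodology, implementation, and results of this AI-based speed reading tool.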
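The bolding function mentioned above can be approximated with a short sketch. It is an assumption-laden illustration, not the tool's code: it emphasizes roughly the first half of each word with HTML `<b>` tags (a guess at the Bionic Reading style, whose exact rule is not given here), handles only ASCII letters, and uses the hypothetical name `bold_word_starts`. Sentence splitting, summarization, and the Flask layer are left out.

```python
import math
import re


def bold_word_starts(text: str, fraction: float = 0.5) -> str:
    """Wrap the leading part of every ASCII word in <b> tags."""
    def emphasise(match: re.Match) -> str:
        word = match.group(0)
        # Always bold at least one letter, otherwise ceil(fraction * length).
        cut = max(1, math.ceil(len(word) * fraction))
        return f"<b>{word[:cut]}</b>{word[cut:]}"
    return re.sub(r"[A-Za-z]+", emphasise, text)


html = bold_word_starts("Reading is fun.")
# html == "<b>Read</b>ing <b>i</b>s <b>fu</b>n."
```

Spacing adjustments would normally be applied in the page's stylesheet (letter, word, and line spacing) rather than in the text itself.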

Open Problems in Computer Vision for Wilderness SAR and The Search for Patricia Wu-Murad

  • paper_url: http://arxiv.org/abs/2307.14527
  • repo_url: https://github.com/crasar/wisar
  • paper_authors: Thomas Manzini, Robin Murphy
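  • for: This study applies two computer vision systems, the supervised EfficientDET model and the unsupervised RX spectral classifier, to 98.9 GB of drone imagery from the Wu-Murad wilderness search and rescue (WSAR) effort in Japan.
  • methods: Of at least 19 proposed approaches and 3 datasets for locating missing persons in drone imagery, only 3 approaches (2 unsupervised and 1 of unknown structure) are referenced in the literature as used in an actual WSAR operation. The EfficientDET architecture and the unsupervised spectral RX classifier were selected as the most appropriate for this setting.
  • results: On the HERIDAL dataset, EfficientDET performs statistically equivalent to the state of the art, yet in the real world it produces false positives (e.g., identifying tree limbs and rocks as people) and false negatives (e.g., failing to identify members of the search team). This gap suggests three directions for future research: more realistic datasets for wilderness SAR, computer vision models that can handle the variety of imagery collected during actual WSAR operations, and better alignment on performance measures.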
    Abstract This paper details the challenges in applying two computer vision systems, an EfficientDET supervised learning model and the unsupervised RX spectral classifier, to 98.9 GB of drone imagery from the Wu-Murad wilderness search and rescue (WSAR) effort in Japan and identifies 3 directions for future research. There have been at least 19 proposed approaches and 3 datasets aimed at locating missing persons in drone imagery, but only 3 approaches (2 unsupervised and 1 of an unknown structure) are referenced in the literature as having been used in an actual WSAR operation. Of these proposed approaches, the EfficientDET architecture and the unsupervised spectral RX classifier were selected as the most appropriate for this setting. The EfficientDET model was applied to the HERIDAL dataset and despite achieving performance that is statistically equivalent to the state-of-the-art, the model fails to translate to the real world in terms of false positives (e.g., identifying tree limbs and rocks as people), and false negatives (e.g., failing to identify members of the search team). The poor results in practice for algorithms that showed good results on datasets suggest 3 areas of future research: more realistic datasets for wilderness SAR, computer vision models that are capable of seamlessly handling the variety of imagery that can be collected during actual WSAR operations, and better alignment on performance measures.
    Summary The paper identifies three directions for future research:
  1. More realistic datasets for wilderness SAR: The current datasets used for training and testing computer vision models may not accurately reflect the real-world conditions and variability of imagery collected during actual WSAR operations.
  2. Computer vision models that can handle diverse imagery: The models need to be able to seamlessly handle the variety of imagery that can be collected during actual WSAR operations, including different lighting conditions, weather, and vegetation.
  3. Better alignment on performance measures: The models need to be evaluated using performance measures that are relevant to the specific task and environment of WSAR operations, rather than simply relying on metrics that show good results on datasets.
    The paper also notes that while there have been many proposed approaches to locating missing persons in drone imagery, only a few have been used in actual WSAR operations, and of those, the EfficientDET architecture and the unsupervised spectral RX classifier were selected as the most appropriate for this setting. However, the EfficientDET model failed to translate to real-world performance, with issues such as false positives (identifying tree limbs and rocks as people) and false negatives (failing to identify members of the search team).

Patterns of Vehicle Lights: Addressing Complexities in Curation and Annotation of Camera-Based Vehicle Light Datasets and Metrics

  • paper_url: http://arxiv.org/abs/2307.14521
  • repo_url: None
  • paper_authors: Ross Greer, Akshay Gopalkrishnan, Maitrayee Keskar, Mohan Trivedi
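  • for: This paper explores how vehicle lights are represented in computer vision and what that implies for various tasks in autonomous driving.
  • methods: Compares different specifications for representing vehicle lights, including bounding boxes, center points, corner points, and segmentation masks, and discusses their strengths and weaknesses.
  • results: Identifies three tasks that benefit from vehicle light detection: nighttime vehicle detection, 3D vehicle orientation estimation, and dynamic trajectory cues. The paper also addresses the challenges of collecting and annotating large datasets for data-driven models, and introduces the LISA Vehicle Lights Dataset and the associated Light Visibility Model, with light annotations designed for downstream applications in vehicle detection, intent and trajectory prediction, and safe path planning.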
    Abstract This paper explores the representation of vehicle lights in computer vision and its implications for various tasks in the field of autonomous driving. Different specifications for representing vehicle lights, including bounding boxes, center points, corner points, and segmentation masks, are discussed in terms of their strengths and weaknesses. Three important tasks in autonomous driving that can benefit from vehicle light detection are identified: nighttime vehicle detection, 3D vehicle orientation estimation, and dynamic trajectory cues. Each task may require a different representation of the light. The challenges of collecting and annotating large datasets for training data-driven models are also addressed, leading to introduction of the LISA Vehicle Lights Dataset and associated Light Visibility Model, which provides light annotations specifically designed for downstream applications in vehicle detection, intent and trajectory prediction, and safe path planning. A comparison of existing vehicle light datasets is provided, highlighting the unique features and limitations of each dataset. Overall, this paper provides insights into the representation of vehicle lights and the importance of accurate annotations for training effective detection models in autonomous driving applications. Our dataset and model are made available at https://cvrr.ucsd.edu/vehicle-lights-dataset

A new algorithm for Subgroup Set Discovery based on Information Gain

  • paper_url: http://arxiv.org/abs/2307.15089
  • repo_url: None
  • paper_authors: Daniel Gómez-Bravo, Aaron García, Guillermo Vigueras, Belén Ríos, Alejandro Rodríguez-González
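  • for: This work proposes a new pattern discovery algorithm, Information Gained Subgroup Discovery (IGSD), to address some limitations of existing subgroup discovery (SD) algorithms.
  • methods: IGSD combines Information Gain (IG) and Odds Ratio (OR) as multiple criteria for pattern selection.
  • results: On 11 datasets, FSSD and SSD++ produce less reliable patterns and smaller pattern sets than IGSD. IGSD also yields better OR values, indicating a stronger dependence between patterns and targets, and on the one dataset reviewed by domain experts its patterns agree better with the experts' assessment.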
    Abstract Pattern discovery is a machine learning technique that aims to find sets of items, subsequences, or substructures that are present in a dataset with a higher frequency value than a manually set threshold. This process helps to identify recurring patterns or relationships within the data, allowing for valuable insights and knowledge extraction. In this work, we propose Information Gained Subgroup Discovery (IGSD), a new SD algorithm for pattern discovery that combines Information Gain (IG) and Odds Ratio (OR) as a multi-criteria for pattern selection. The algorithm tries to tackle some limitations of state-of-the-art SD algorithms like the need for fine-tuning of key parameters for each dataset, usage of a single pattern search criteria set by hand, usage of non-overlapping data structures for subgroup space exploration, and the impossibility to search for patterns by fixing some relevant dataset variables. Thus, we compare the performance of IGSD with two state-of-the-art SD algorithms: FSSD and SSD++. Eleven datasets are assessed using these algorithms. For the performance evaluation, we also propose to complement standard SD measures with IG, OR, and p-value. Obtained results show that FSSD and SSD++ algorithms provide less reliable patterns and reduced sets of patterns than IGSD algorithm for all datasets considered. Additionally, IGSD provides better OR values than FSSD and SSD++, stating a higher dependence between patterns and targets. Moreover, patterns obtained for one of the datasets used, have been validated by a group of domain experts. Thus, patterns provided by IGSD show better agreement with experts than patterns obtained by FSSD and SSD++ algorithms. These results demonstrate the suitability of the IGSD as a method for pattern discovery and suggest that the inclusion of non-standard SD metrics allows to better evaluate discovered patterns.
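    Summary Pattern discovery is a machine learning technique that aims to find itemsets, subsequences, or substructures that occur in a dataset more frequently than a manually set threshold. It helps identify recurring patterns or relationships in the data and supports knowledge extraction. This work proposes Information Gained Subgroup Discovery (IGSD), a new SD algorithm that combines Information Gain (IG) and Odds Ratio (OR) as multiple criteria for pattern selection. The algorithm targets several limitations of existing SD algorithms: per-dataset tuning of key parameters, a single hand-set search criterion, non-overlapping data structures for exploring the subgroup space, and the inability to search for patterns while fixing some relevant dataset variables. IGSD is compared with FSSD and SSD++ on eleven datasets, and the evaluation complements standard SD measures with IG, OR, and p-value. The results show that FSSD and SSD++ provide less reliable patterns and smaller pattern sets than IGSD on all datasets considered, and that IGSD achieves better OR values, indicating a higher dependence between patterns and targets. Patterns from one dataset were validated by a group of domain experts, and the IGSD patterns agree better with the experts than those from FSSD and SSD++. These results suggest that IGSD is suitable for pattern discovery and that non-standard SD metrics allow discovered patterns to be evaluated better.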
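
The two selection criteria named above, Information Gain and Odds Ratio, can both be computed from the 2x2 contingency table of a candidate subgroup against a binary target. The sketch below is only an illustration of those two standard measures, not the IGSD search itself; the function names and the zero-cell correction are assumptions made for this example.

```python
import math

def entropy(pos, neg):
    """Binary Shannon entropy (in bits) of a positive/negative count split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(tp, fp, fn, tn):
    """Entropy reduction from splitting rows by subgroup membership.

    tp/fp: rows covered by the subgroup with/without the target.
    fn/tn: rows not covered by the subgroup with/without the target.
    """
    n = tp + fp + fn + tn
    before = entropy(tp + fn, fp + tn)
    after = ((tp + fp) / n) * entropy(tp, fp) + ((fn + tn) / n) * entropy(fn, tn)
    return before - after

def odds_ratio(tp, fp, fn, tn):
    """Odds of the target inside the subgroup divided by the odds outside it."""
    if 0 in (tp, fp, fn, tn):
        # Assumption for this sketch: add 0.5 to every cell when one is empty.
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return (tp * tn) / (fp * fn)
```

A subgroup covering 30 target and 10 non-target rows, with 20 target and 40 non-target rows left uncovered, has an odds ratio of 6, so the target is considerably more likely inside the subgroup than outside it.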

The Co-12 Recipe for Evaluating Interpretable Part-Prototype Image Classifiers

  • paper_url: http://arxiv.org/abs/2307.14517
  • repo_url: None
  • paper_authors: Meike Nauta, Christin Seifert
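  • for: This paper provides an overview of how to evaluate the explanation quality of interpretable part-prototype models.
  • methods: It uses the Co-12 properties for explanation quality (e.g., correctness, completeness, compactness) to organize and review existing evaluations of part-prototype models.
  • results: The review reveals research gaps in existing evaluation work and outlines future approaches. The paper also provides a "Co-12 cheat sheet" that concisely summarizes the findings.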
    Abstract Interpretable part-prototype models are computer vision models that are explainable by design. The models learn prototypical parts and recognise these components in an image, thereby combining classification and explanation. Despite the recent attention for intrinsically interpretable models, there is no comprehensive overview on evaluating the explanation quality of interpretable part-prototype models. Based on the Co-12 properties for explanation quality as introduced in arXiv:2201.08164 (e.g., correctness, completeness, compactness), we review existing work that evaluates part-prototype models, reveal research gaps and outline future approaches for evaluation of the explanation quality of part-prototype models. This paper, therefore, contributes to the progression and maturity of this relatively new research field on interpretable part-prototype models. We additionally provide a "Co-12 cheat sheet" that acts as a concise summary of our findings on evaluating part-prototype models.
    Summary For interpretable part-prototype models, there is no comprehensive overview of how to evaluate explanation quality. Based on the Co-12 properties for explanation quality introduced in arXiv:2201.08164 (such as correctness, completeness, compactness), the authors review existing work, reveal research gaps, and outline future approaches for evaluating the explanation quality of part-prototype models. The paper thereby contributes to the progression and maturity of this relatively new research field, and it provides a "Co-12 cheat sheet" as a concise summary of the findings.

Words That Stick: Predicting Decision Making and Synonym Engagement Using Cognitive Biases and Computational Linguistics

  • paper_url: http://arxiv.org/abs/2307.14511
  • repo_url: None
  • paper_authors: Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan Yang, Jennifer Romano
  • for: This research aims to anticipate user engagement and decision-making on digital platforms by leveraging cognitive psychology and information systems studies.
  • methods: The study employs natural language processing (NLP) techniques and insights from cognitive bias research to analyze user interactions with synonyms within digital content. The READ model is synthesized from four cognitive biases: Representativeness, Ease-of-use, Affect, and Distribution.
  • results: Through a comprehensive user survey, the study finds that synonyms that accurately represent core ideas, are easy to understand, elicit emotional responses, and are commonly encountered promote greater user engagement. The results offer a fresh perspective on human-computer interaction, digital behaviors, and decision-making processes, and highlight the significance of cognitive biases in designing effective digital content across fields like education and marketing.
    Abstract This research draws upon cognitive psychology and information systems studies to anticipate user engagement and decision-making on digital platforms. By employing natural language processing (NLP) techniques and insights from cognitive bias research, we delve into user interactions with synonyms within digital content. Our methodology synthesizes four cognitive biases (Representativeness, Ease-of-use, Affect, and Distribution) into the READ model. Through a comprehensive user survey, we assess the model's ability to predict user engagement, discovering that synonyms that accurately represent core ideas, are easy to understand, elicit emotional responses, and are commonly encountered, promote greater user engagement. Crucially, our work offers a fresh lens on human-computer interaction, digital behaviors, and decision-making processes. Our results highlight the promise of cognitive biases as potent indicators of user engagement, underscoring their significance in designing effective digital content across fields like education and marketing.

Attention for Robot Touch: Tactile Saliency Prediction for Robust Sim-to-Real Tactile Control

  • paper_url: http://arxiv.org/abs/2307.14510
  • repo_url: None
  • paper_authors: Yijiong Lin, Mauro Comi, Alex Church, Dandan Zhang, Nathan F. Lepora
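  • for: Improving the robustness of tactile robot control in unstructured environments.
  • methods: Inspired by the human touch attention mechanism and by the visual saliency prediction problem in computer vision, the paper proposes a new concept, tactile saliency, which aims to identify key information in tactile images. Because manually labelling tactile images is hard (their patterns can be counterintuitive), the approach uses three interrelated networks: 1) a Contact Depth Network (ConDepNet), which generates a contact depth map to localize deformation in a real tactile image containing target and noise features; 2) a Tactile Saliency Network (TacSalNet), which predicts a tactile saliency map describing the target areas of an input contact depth map; 3) a Tactile Noise Generator (TacNGen), which generates noise features used to train TacSalNet.
  • results: Experiments show that the method accurately predicts target features in real tactile images. Overall, it provides robust sim-to-real tactile control in environments with unknown distractors. Project page: https://sites.google.com/view/tactile-saliency/.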
    Abstract High-resolution tactile sensing can provide accurate information about local contact in contact-rich robotic tasks. However, the deployment of such tasks in unstructured environments remains under-investigated. To improve the robustness of tactile robot control in unstructured environments, we propose and study a new concept: tactile saliency for robot touch, inspired by the human touch attention mechanism from neuroscience and the visual saliency prediction problem from computer vision. In analogy to visual saliency, this concept involves identifying key information in tactile images captured by a tactile sensor. While visual saliency datasets are commonly annotated by humans, manually labelling tactile images is challenging due to their counterintuitive patterns. To address this challenge, we propose a novel approach comprised of three interrelated networks: 1) a Contact Depth Network (ConDepNet), which generates a contact depth map to localize deformation in a real tactile image that contains target and noise features; 2) a Tactile Saliency Network (TacSalNet), which predicts a tactile saliency map to describe the target areas for an input contact depth map; and 3) a Tactile Noise Generator (TacNGen), which generates noise features to train the TacSalNet. Experimental results in contact pose estimation and edge-following in the presence of distractors showcase the accurate prediction of target features from real tactile images. Overall, our tactile saliency prediction approach gives robust sim-to-real tactile control in environments with unknown distractors. Project page: https://sites.google.com/view/tactile-saliency/.
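    Summary High-resolution tactile sensing can provide accurate information about local contact in contact-rich robotic tasks, but deploying such tasks in unstructured environments remains under-investigated. To improve the robustness of tactile control in unstructured environments, the authors propose a new concept, tactile saliency, inspired by the human touch attention mechanism and by visual saliency prediction in computer vision. In analogy to visual saliency, the concept involves identifying key information in tactile images. Visual saliency datasets are usually annotated by humans, but manually labelling tactile images is difficult because of their counterintuitive patterns. To address this, the approach uses three interrelated networks: 1) a Contact Depth Network (ConDepNet), which generates a contact depth map from a real tactile image containing target and noise features; 2) a Tactile Saliency Network (TacSalNet), which predicts a tactile saliency map describing the target areas of the input contact depth map; 3) a Tactile Noise Generator (TacNGen), which generates noise features to train TacSalNet. Experiments show that the method accurately predicts target features in real tactile images, and overall it provides robust sim-to-real tactile control in the presence of unknown distractors. Project page: https://sites.google.com/view/tactile-saliency/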

Improving Reliable Navigation under Uncertainty via Predictions Informed by Non-Local Information

  • paper_url: http://arxiv.org/abs/2307.14501
  • repo_url: None
  • paper_authors: Raihan Islam Arnob, Gregory J. Stein
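  • for: Improving reliable, long-horizon, goal-directed navigation in partially-mapped environments.
  • methods: The approach uses non-local information, learned via a graph neural network, to predict the goodness of temporally-extended actions that enter unseen space.
  • results: Experiments cover three simulated environments. In the large-scale university building environment, the method reduces cost by 9.3% compared to a non-learned baseline and by 14.9% compared to a learning-informed planner that can only use local information.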
    Abstract We improve reliable, long-horizon, goal-directed navigation in partially-mapped environments by using non-locally available information to predict the goodness of temporally-extended actions that enter unseen space. Making predictions about where to navigate in general requires non-local information: any observations the robot has seen so far may provide information about the goodness of a particular direction of travel. Building on recent work in learning-augmented model-based planning under uncertainty, we present an approach that can both rely on non-local information to make predictions (via a graph neural network) and is reliable by design: it will always reach its goal, even when learning does not provide accurate predictions. We conduct experiments in three simulated environments in which non-local information is needed to perform well. In our large scale university building environment, generated from real-world floorplans to scale, we demonstrate a 9.3% reduction in cost-to-go compared to a non-learned baseline and a 14.9% reduction compared to a learning-informed planner that can only use local information to inform its predictions.
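    Summary The authors improve reliable, long-horizon, goal-directed navigation in partially-mapped environments by using non-locally available information to predict the goodness of actions that enter unseen space. Any observation the robot has made so far may carry information about the goodness of a direction of travel. Building on recent work in learning-augmented model-based planning under uncertainty, they present an approach that makes predictions from non-local information via a graph neural network and is reliable by design: it always reaches its goal, even when learning does not provide accurate predictions. Experiments are run in three simulated environments where non-local information is needed to perform well. In the large-scale university building environment, generated from real-world floorplans, the approach reduces cost by 9.3% compared to a non-learned baseline and by 14.9% compared to a learning-informed planner that can only use local information.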

Technical note: ShinyAnimalCV: open-source cloud-based web application for object detection, segmentation, and three-dimensional visualization of animals using computer vision

  • paper_url: http://arxiv.org/abs/2307.14487
  • repo_url: https://github.com/uf-aiaos/shinyanimalcv
  • paper_authors: Jin Wang, Yu Hu, Lirong Xiang, Gota Morota, Samantha A. Brooks, Carissa L. Wickens, Emily K. Miller-Cushon, Haipeng Yu
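  • for: The study develops an open-source cloud-based web application with an easy-to-use interface for computer vision tasks on animal data, including object segmentation, detection, three-dimensional surface visualization, and extraction of two- and three-dimensional morphological features.
  • methods: The application combines several computer vision and deep learning algorithms and includes nine pre-trained computer vision models for processing animal data.
  • results: The resulting application, ShinyAnimalCV, lets users run computer vision tasks quickly and simply, and ships with detailed documentation and example data to help users get started.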
    Abstract Computer vision (CV), a non-intrusive and cost-effective technology, has furthered the development of precision livestock farming by enabling optimized decision-making through timely and individualized animal care. The availability of affordable two- and three-dimensional camera sensors, combined with various machine learning and deep learning algorithms, has provided a valuable opportunity to improve livestock production systems. However, despite the availability of various CV tools in the public domain, applying these tools to animal data can be challenging, often requiring users to have programming and data analysis skills, as well as access to computing resources. Moreover, the rapid expansion of precision livestock farming is creating a growing need to educate and train animal science students in CV. This presents educators with the challenge of efficiently demonstrating the complex algorithms involved in CV. Thus, the objective of this study was to develop ShinyAnimalCV, an open-source cloud-based web application. This application provides a user-friendly interface for performing CV tasks, including object segmentation, detection, three-dimensional surface visualization, and extraction of two- and three-dimensional morphological features. Nine pre-trained CV models using top-view animal data are included in the application. ShinyAnimalCV has been deployed online using cloud computing platforms. The source code of ShinyAnimalCV is available on GitHub, along with detailed documentation on training CV models using custom data and deploying ShinyAnimalCV locally to allow users to fully leverage the capabilities of the application. ShinyAnimalCV can contribute to CV research and teaching in the animal science community.
    Summary Computer vision (CV), a non-intrusive and cost-effective technology, has furthered precision livestock farming by enabling optimized decision-making through timely and individualized animal care. Affordable two- and three-dimensional camera sensors, combined with machine learning and deep learning algorithms, offer a valuable opportunity to improve livestock production systems. Applying publicly available CV tools to animal data can still be challenging, because it often requires programming and data analysis skills and access to computing resources, and educators need efficient ways to demonstrate CV algorithms to animal science students. The study therefore develops ShinyAnimalCV, an open-source cloud-based web application with a user-friendly interface for object segmentation, detection, three-dimensional surface visualization, and extraction of two- and three-dimensional morphological features. It includes nine pre-trained CV models using top-view animal data and is deployed online on cloud computing platforms. The source code is on GitHub, with documentation on training CV models on custom data and deploying ShinyAnimalCV locally.

Single Channel Speech Enhancement Using U-Net Spiking Neural Networks

  • paper_url: http://arxiv.org/abs/2307.14464
  • repo_url: None
  • paper_authors: Abir Riahi, Éric Plourde
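  • for: Improving the reliability of communication devices and speech recognition systems through speech enhancement (noise reduction) with neural networks.
  • methods: A spiking neural network (SNN) based on a U-Net architecture. SNNs suit data with a temporal dimension, such as speech, and allow energy-efficient implementation on neuromorphic hardware, which makes them attractive for resource-limited devices.
  • results: The energy-efficient SNN model outperforms the Intel Neuromorphic Deep Noise Suppression Challenge (Intel N-DNS Challenge) baseline solution and achieves acceptable performance compared to an equivalent artificial neural network model.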
    Abstract Speech enhancement (SE) is crucial for reliable communication devices or robust speech recognition systems. Although conventional artificial neural networks (ANN) have demonstrated remarkable performance in SE, they require significant computational power, along with high energy costs. In this paper, we propose a novel approach to SE using a spiking neural network (SNN) based on a U-Net architecture. SNNs are suitable for processing data with a temporal dimension, such as speech, and are known for their energy-efficient implementation on neuromorphic hardware. As such, SNNs are interesting candidates for real-time applications on devices with limited resources. The primary objective of the current work is to develop an SNN-based model with comparable performance to a state-of-the-art ANN model for SE. We train a deep SNN using surrogate-gradient-based optimization and evaluate its performance using perceptual objective tests under different signal-to-noise ratios and real-world noise conditions. Our results demonstrate that the proposed energy-efficient SNN model outperforms the Intel Neuromorphic Deep Noise Suppression Challenge (Intel N-DNS Challenge) baseline solution and achieves acceptable performance compared to an equivalent ANN model.
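    Summary Speech enhancement (SE) is crucial for reliable communication devices and robust speech recognition systems. Conventional artificial neural networks (ANNs) perform well in SE but need significant computational power and incur high energy costs. This paper proposes a new approach to SE using a spiking neural network (SNN) based on a U-Net architecture. SNNs suit data with a temporal dimension, such as speech, and can be implemented energy-efficiently on neuromorphic hardware, which makes them good candidates for real-time applications on resource-limited devices. The goal is an SNN-based model whose performance is comparable to a state-of-the-art ANN model. The authors train a deep SNN with surrogate-gradient-based optimization and evaluate it with perceptual objective tests under different signal-to-noise ratios and real-world noise conditions. The proposed energy-efficient SNN model outperforms the Intel N-DNS Challenge baseline solution and achieves acceptable performance compared to an equivalent ANN model.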
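
To make the terms "spiking neuron" and "surrogate gradient" concrete, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron and a fast-sigmoid surrogate derivative. It is a generic textbook-style illustration, not the U-Net SNN from the paper; the decay factor `beta`, the soft reset, and the `slope` parameter are assumptions chosen for this example.

```python
def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Simulate one LIF neuron over a sequence of input currents.

    The membrane potential decays by `beta` each step, integrates the input,
    and emits a spike (1) when it reaches `threshold`, after which the
    threshold is subtracted (soft reset; an assumption of this sketch).
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = beta * v + x
        if v >= threshold:
            spikes.append(1)
            v -= threshold
        else:
            spikes.append(0)
    return spikes

def fast_sigmoid_surrogate(v, threshold=1.0, slope=10.0):
    """Smooth stand-in for the derivative of the spike step function.

    The true derivative is zero almost everywhere, so training replaces it
    during backpropagation with a bump centred on the threshold.
    """
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2
```

The forward pass still produces binary spikes; only the backward pass would use the surrogate, which is what lets gradient-based optimizers train such networks.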

VISPUR: Visual Aids for Identifying and Interpreting Spurious Associations in Data-Driven Decisions

  • paper_url: http://arxiv.org/abs/2307.14448
  • repo_url: https://github.com/picsolab/vispur
  • paper_authors: Xian Teng, Yongsu Ahn, Yu-Ru Lin
  • for: The paper aims to help users identify and understand spurious associations in big data and machine learning, and make accountable causal decisions.
  • methods: The proposed "de-paradox" workflow and visual analytic system, including the CONFOUNDER DASHBOARD, SUBGROUP VIEWER, REASONING STORYBOARD, and DECISION DIAGNOSIS panel, provide a framework for tackling spurious associations.
  • results: Qualitative and quantitative results from an expert interview and a controlled user experiment indicate that the system helps users identify and understand spurious associations and make accountable causal decisions.
    Abstract Big data and machine learning tools have jointly empowered humans in making data-driven decisions. However, many of them capture empirical associations that might be spurious due to confounding factors and subgroup heterogeneity. The famous Simpson's paradox is such a phenomenon where aggregated and subgroup-level associations contradict each other, causing cognitive confusion and difficulty in making adequate interpretations and decisions. Existing tools provide little insight for humans to locate, reason about, and prevent pitfalls of spurious association in practice. We propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. These include a CONFOUNDER DASHBOARD, which can automatically identify possible confounding factors, and a SUBGROUP VIEWER, which allows for the visualization and comparison of diverse subgroup patterns that likely or potentially result in a misinterpretation of causality. Additionally, we propose a REASONING STORYBOARD, which uses a flow-based approach to illustrate paradoxical phenomena, as well as an interactive DECISION DIAGNOSIS panel that helps ensure accountable decision-making. Through an expert interview and a controlled user experiment, our qualitative and quantitative results demonstrate that the proposed "de-paradox" workflow and the designed visual analytic system are effective in helping human users to identify and understand spurious associations, as well as to make accountable causal decisions.
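    Summary Big data and machine learning tools jointly empower humans to make data-driven decisions. However, many of them capture empirical associations that may be spurious because of confounding factors and subgroup heterogeneity. Simpson's paradox is one such phenomenon: aggregated and subgroup-level associations contradict each other, causing cognitive confusion and making adequate interpretation and decisions difficult. Existing tools offer little help for locating, reasoning about, and preventing the pitfalls of spurious association. The authors propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. It includes a CONFOUNDER DASHBOARD, which automatically identifies possible confounding factors, and a SUBGROUP VIEWER, which visualizes and compares diverse subgroup patterns that may lead to a misinterpretation of causality. It also includes a REASONING STORYBOARD, which uses a flow-based approach to illustrate paradoxical phenomena, and an interactive DECISION DIAGNOSIS panel that helps ensure accountable decision-making. An expert interview and a controlled user experiment yield qualitative and quantitative results showing that the "de-paradox" workflow and the visual analytic system help users identify and understand spurious associations and make accountable causal decisions.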
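
Simpson's paradox, as described above, is easy to reproduce with a few counts. The sketch below checks whether one option wins in every subgroup yet loses after aggregation. The counts are illustrative only (patterned on the classic kidney-stone textbook example), and the function names are made up for this example; none of this is VISPUR code.

```python
# (successes, trials) per subgroup and treatment arm; illustrative counts only.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def subgroup_rates(counts):
    """Success rate of each arm within each subgroup."""
    return {
        group: {arm: s / n for arm, (s, n) in arms.items()}
        for group, arms in counts.items()
    }

def aggregated_rates(counts):
    """Success rate of each arm after pooling all subgroups."""
    totals = {}
    for arms in counts.values():
        for arm, (s, n) in arms.items():
            s0, n0 = totals.get(arm, (0, 0))
            totals[arm] = (s0 + s, n0 + n)
    return {arm: s / n for arm, (s, n) in totals.items()}

def is_simpson_reversal(counts, a="A", b="B"):
    """True if one arm wins in every subgroup but loses in the aggregate."""
    by_group = subgroup_rates(counts)
    overall = aggregated_rates(counts)
    a_wins = all(r[a] > r[b] for r in by_group.values())
    b_wins = all(r[b] > r[a] for r in by_group.values())
    return (a_wins and overall[a] < overall[b]) or (b_wins and overall[b] < overall[a])
```

Here arm A has the higher success rate in both the "small" and "large" subgroups, but arm B looks better once the subgroups are pooled, because the arms are unevenly distributed across subgroups. The subgroup variable acts as a confounder.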

Three Bricks to Consolidate Watermarks for Large Language Models

  • paper_url: http://arxiv.org/abs/2308.00113
  • repo_url: https://github.com/facebookresearch/three_bricks
  • paper_authors: Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, Teddy Furon
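  • for: Consolidating watermarking for large language models with a stronger theoretical basis and practical evaluation.
  • methods: Three theoretical and empirical considerations: new statistical tests, a comparison on classical NLP benchmarks, and advanced detection schemes (for when the LLM is accessible) together with multi-bit watermarking.
  • results: The new statistical tests offer robust theoretical guarantees that remain valid even at low false-positive rates (below $10^{-6}$), and the benchmark comparison gives insight into the real-world applicability of watermarks.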
    Abstract The task of discerning between generated and natural texts is increasingly challenging. In this context, watermarking emerges as a promising technique for ascribing generated text to a specific model. It alters the sampling generation process so as to leave an invisible trace in the generated output, facilitating later detection. This research consolidates watermarks for large language models based on three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees which remain valid even at low false-positive rates (less than $10^{-6}$). Second, we compare the effectiveness of watermarks using classical benchmarks in the field of natural language processing, gaining insights into their real-world applicability. Third, we develop advanced detection schemes for scenarios where access to the LLM is available, as well as multi-bit watermarking.
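    Summary Discerning generated from natural text is increasingly difficult. Watermarking is a promising technique for attributing generated text to a specific model: it alters the sampling process so as to leave an invisible trace in the output that facilitates later detection. This research consolidates watermarks for large language models based on three theoretical and empirical considerations. First, it introduces new statistical tests with robust theoretical guarantees that remain valid even at low false-positive rates (below $10^{-6}$). Second, it compares the effectiveness of watermarks on classical natural language processing benchmarks, giving insight into their real-world applicability. Third, it develops advanced detection schemes for scenarios where the LLM is accessible, as well as multi-bit watermarking.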
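
The concern with very low false-positive rates can be illustrated with a generic detection test. The sketch below assumes a "green-list" style watermark in which, without a watermark, each scored token is green independently with probability `gamma`; that scheme and all names here are assumptions for illustration and are not taken from the paper. It compares an exact binomial tail p-value with a normal (z-score) approximation.

```python
import math

def binomial_tail_pvalue(k, n, gamma):
    """Exact P(X >= k) for X ~ Binomial(n, gamma).

    Under the no-watermark hypothesis this is the probability of seeing at
    least k green tokens among n scored tokens by chance.
    """
    return sum(
        math.comb(n, i) * gamma**i * (1.0 - gamma) ** (n - i)
        for i in range(k, n + 1)
    )

def normal_approx_pvalue(k, n, gamma):
    """Gaussian approximation of the same tail via a z-score."""
    z = (k - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For short texts and extreme counts, the two p-values can differ by a noticeable factor, so a threshold calibrated with the approximation may not deliver the false-positive rate it claims. That is one reason to prefer tests whose guarantees hold exactly in the tail.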

WavJourney: Compositional Audio Creation with Large Language Models

  • paper_url: http://arxiv.org/abs/2307.14335
  • repo_url: https://github.com/audio-agi/wavjourney
  • paper_authors: Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang
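  • for: This paper explores how large language models (LLMs) can be used for intelligent audio content creation.
  • methods: The paper presents WavJourney, a system that uses LLMs to connect various audio models and generate audio content encompassing speech, music, and sound effects.
  • results: The paper demonstrates the practicality of WavJourney across diverse real-world scenarios, including science fiction, education, and radio play.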
    Abstract Large Language Models (LLMs) have shown great promise in integrating diverse expert models to tackle intricate language and vision tasks. Despite their significance in advancing the field of Artificial Intelligence Generated Content (AIGC), their potential in intelligent audio content creation remains unexplored. In this work, we tackle the problem of creating audio content with storylines encompassing speech, music, and sound effects, guided by text instructions. We present WavJourney, a system that leverages LLMs to connect various audio models for audio content generation. Given a text description of an auditory scene, WavJourney first prompts LLMs to generate a structured script dedicated to audio storytelling. The audio script incorporates diverse audio elements, organized based on their spatio-temporal relationships. As a conceptual representation of audio, the audio script provides an interactive and interpretable rationale for human engagement. Afterward, the audio script is fed into a script compiler, converting it into a computer program. Each line of the program calls a task-specific audio generation model or computational operation function (e.g., concatenate, mix). The computer program is then executed to obtain an explainable solution for audio generation. We demonstrate the practicality of WavJourney across diverse real-world scenarios, including science fiction, education, and radio play. The explainable and interactive design of WavJourney fosters human-machine co-creation in multi-round dialogues, enhancing creative control and adaptability in audio production. WavJourney audiolizes the human imagination, opening up new avenues for creativity in multimedia content creation.
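    Summary Large Language Models (LLMs) have shown great promise in integrating diverse expert models to tackle intricate language and vision tasks. Despite their significance for Artificial Intelligence Generated Content (AIGC), their potential in intelligent audio content creation remains unexplored. This work tackles the creation of audio content with storylines encompassing speech, music, and sound effects, guided by text instructions. WavJourney is a system that uses LLMs to connect various audio models for audio content generation. Given a text description of an auditory scene, it first prompts an LLM to generate a structured audio script whose diverse audio elements are organized by their spatio-temporal relationships. As a conceptual representation of audio, the script gives humans an interactive and interpretable rationale to engage with. A script compiler then converts the script into a computer program in which each line calls a task-specific audio generation model or a computational operation (e.g., concatenate, mix), and executing the program yields an explainable solution for audio generation. The authors demonstrate WavJourney in diverse real-world scenarios, including science fiction, education, and radio play. Its explainable and interactive design supports human-machine co-creation in multi-round dialogues, improving creative control and adaptability in audio production.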
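
The script-to-program step can be pictured with a toy compiler. In the sketch below, a structured script is turned into a callable that invokes per-type generator functions and then applies `concatenate` and `mix`. The script fields (`layout`, `type`, `length`), the stub generators, and the list-of-floats audio format are all hypothetical and chosen for illustration; the actual WavJourney script format and models are not shown in the abstract.

```python
def concatenate(clips):
    """Play clips one after another."""
    return [sample for clip in clips for sample in clip]

def mix(clips):
    """Overlay clips sample by sample, padding shorter clips with silence."""
    length = max((len(c) for c in clips), default=0)
    return [sum(c[i] for c in clips if i < len(c)) for i in range(length)]

# Stand-ins for task-specific audio models (hypothetical; they return
# constant-valued "audio" so the arithmetic is easy to follow).
GENERATORS = {
    "speech": lambda n: [1.0] * n,
    "music": lambda n: [0.5] * n,
}

def compile_script(script, generators):
    """Turn a structured script into an executable program (a closure).

    Foreground items are concatenated in order; background items are mixed
    underneath the resulting foreground track.
    """
    def program():
        fg = [generators[i["type"]](i["length"]) for i in script if i["layout"] == "foreground"]
        bg = [generators[i["type"]](i["length"]) for i in script if i["layout"] == "background"]
        return mix([concatenate(fg)] + bg)
    return program

script = [
    {"layout": "foreground", "type": "speech", "length": 2},
    {"layout": "foreground", "type": "speech", "length": 1},
    {"layout": "background", "type": "music", "length": 4},
]
audio = compile_script(script, GENERATORS)()
```

Keeping the script as plain data is what makes the pipeline inspectable: a person can edit one entry and recompile without touching the generators.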

Event-based Vision for Early Prediction of Manipulation Actions

  • paper_url: http://arxiv.org/abs/2307.14332
  • repo_url: https://github.com/danideniz/davishanddataset-events
  • paper_authors: Daniel Deniz, Cornelia Fermuller, Eduardo Ros, Manuel Rodriguez-Alvarez, Francisco Barranco
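  • for: This study uses an event-based transformer network to predict manipulation actions.
  • methods: The transformer network consumes events and predicts manipulation actions as they occur, using online inference.
  • results: The transformer network predicts manipulation actions early and accurately, captures action dynamics, and outperforms video-based approaches. Code will be released on GitHub.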
    Abstract Neuromorphic visual sensors are artificial retinas that output sequences of asynchronous events when brightness changes occur in the scene. These sensors offer many advantages including very high temporal resolution, no motion blur and smart data compression ideal for real-time processing. In this study, we introduce an event-based dataset on fine-grained manipulation actions and perform an experimental study on the use of transformers for action prediction with events. There is enormous interest in the fields of cognitive robotics and human-robot interaction on understanding and predicting human actions as early as possible. Early prediction allows anticipating complex stages for planning, enabling effective and real-time interaction. Our Transformer network uses events to predict manipulation actions as they occur, using online inference. The model succeeds at predicting actions early on, building up confidence over time and achieving state-of-the-art classification. Moreover, the attention-based transformer architecture allows us to study the role of the spatio-temporal patterns selected by the model. Our experiments show that the Transformer network captures action dynamic features outperforming video-based approaches and succeeding with scenarios where the differences between actions lie in very subtle cues. Finally, we release the new event dataset, which is the first in the literature for manipulation action recognition. Code will be available at https://github.com/DaniDeniz/EventVisionTransformer.
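    Summary Neuromorphic visual sensors are artificial retinas that output sequences of asynchronous events when brightness changes occur in the scene. They offer very high temporal resolution, no motion blur, and smart data compression suited to real-time processing. The study introduces an event-based dataset of fine-grained manipulation actions and experimentally studies transformers for action prediction from events. Predicting human actions as early as possible is of great interest in cognitive robotics and human-robot interaction, since early prediction allows complex stages to be anticipated for planning and enables effective real-time interaction. The Transformer network uses events to predict manipulation actions as they occur, with online inference. It predicts actions early, builds up confidence over time, and achieves state-of-the-art classification. The attention-based architecture also makes it possible to study the role of the spatio-temporal patterns the model selects. Experiments show that the network captures action dynamics, outperforms video-based approaches, and succeeds when actions differ only in very subtle cues. Finally, the authors release the new event dataset, the first in the literature for manipulation action recognition. Code will be available at https://github.com/DaniDeniz/EventVisionTransformer.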
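
An event stream is a list of per-pixel brightness changes rather than a sequence of images. One common way to feed it to a frame-oriented model is to accumulate events into time-binned frames, as sketched below. The `(x, y, t, polarity)` tuple layout and the signed-count representation are assumptions for illustration; the abstract does not state which representation the paper uses.

```python
def events_to_frames(events, width, height, t_start, t_end, num_bins):
    """Accumulate events into `num_bins` frames of signed per-pixel counts.

    Each event is (x, y, t, polarity): polarity 1 means brightness increased,
    0 means it decreased. Events outside [t_start, t_end) are ignored.
    """
    frames = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    bin_len = (t_end - t_start) / num_bins
    for x, y, t, polarity in events:
        if not (t_start <= t < t_end):
            continue
        # Clamp guards against floating-point edge cases near t_end.
        b = min(int((t - t_start) / bin_len), num_bins - 1)
        frames[b][y][x] += 1 if polarity else -1
    return frames

events = [(0, 0, 0.0, 1), (0, 0, 0.1, 1), (1, 0, 0.6, 0)]
frames = events_to_frames(events, width=2, height=1, t_start=0.0, t_end=1.0, num_bins=2)
```

Shorter bins keep more of the sensor's temporal resolution at the cost of sparser frames, which matters for early prediction where only the first few bins of an action are available.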

Utilizing Large Language Models for Natural Interface to Pharmacology Databases

  • paper_url: http://arxiv.org/abs/2307.15717
  • repo_url: None
  • paper_authors: Hong Lu, Chuan Li, Yinheng Li, Jie Zhao
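  • for: This paper develops a natural language interface for database queries, so that pharmacologists can access and retrieve large amounts of data during drug development.
  • methods: The paper uses a large language model (LLM) to query structured data stored in databases.
  • results: Experiments demonstrate the feasibility and effectiveness of the framework, which can generalize to querying a wide range of pharmaceutical data and knowledge bases.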
    Abstract The drug development process necessitates that pharmacologists undertake various tasks, such as reviewing literature, formulating hypotheses, designing experiments, and interpreting results. Each stage requires accessing and querying vast amounts of information. In this abstract, we introduce a Large Language Model (LLM)-based Natural Language Interface designed to interact with structured information stored in databases. Our experiments demonstrate the feasibility and effectiveness of the proposed framework. This framework can generalize to query a wide range of pharmaceutical data and knowledge bases.
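    Summary In drug development, pharmacologists undertake tasks such as reviewing literature, formulating hypotheses, designing experiments, and interpreting results. Each stage requires accessing and querying large amounts of information. The authors introduce a Large Language Model (LLM)-based natural language interface for interacting with structured information stored in databases. Their experiments demonstrate the feasibility and effectiveness of the framework, which can generalize to querying a wide range of pharmaceutical data and knowledge bases.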

Building and Testing a General Intelligence Embodied in a Humanoid Robot

  • paper_url: http://arxiv.org/abs/2307.16770
  • repo_url: https://github.com/Aryia-Behroziuan/Other-sources
  • paper_authors: Suzanne Gildert, Geordie Rose
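  • for: The goal is to build machines with human-level intelligence that can do most economically valuable work.
  • methods: The approach comprises a physical humanoid robotic system, a software-based control system, a performance metric called g+, and an evolutionary algorithm for incrementally increasing scores on that metric.
  • results: The authors describe the current status of each component and report current and historical measurements of the g+ metric.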
    Abstract Machines with human-level intelligence should be able to do most economically valuable work. This aligns a major economic incentive with the scientific grand challenge of building a human-like mind. Here we describe our approach to building and testing such a system. Our approach comprises a physical humanoid robotic system; a software based control system for robots of this type; a performance metric, which we call g+, designed to be a measure of human-like intelligence in humanoid robots; and an evolutionary algorithm for incrementally increasing scores on this performance metric. We introduce and describe the current status of each of these. We report on current and historical measurements of the g+ metric on the systems described here.
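    Summary Machines with human-level intelligence should be able to do most economically valuable work, which aligns a major economic incentive with the scientific grand challenge of building a human-like mind. The approach comprises a physical humanoid robotic system; a software-based control system for robots of this type; a performance metric called g+, designed to measure human-like intelligence in humanoid robots; and an evolutionary algorithm for incrementally increasing scores on that metric. The authors describe the current status of each component and report current and historical g+ measurements.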

Waypoint-Based Imitation Learning for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2307.14326
  • repo_url: https://github.com/lucys0/awe
  • paper_authors: Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn
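  • for: The paper aims to generate waypoints automatically, without additional human supervision, to reduce compounding errors in behavioral cloning for robotic manipulation.
  • methods: Building on the observation that a trajectory segment approximable by linear motion can be represented by its endpoints, the method (AWE) decomposes a demonstration into a minimal set of waypoints.
  • results: AWE increases the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, while reducing the decision-making horizon by up to a factor of 10.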
    Abstract While imitation learning methods have seen a resurgent interest for robotic manipulation, the well-known problem of compounding errors continues to afflict behavioral cloning (BC). Waypoints can help address this problem by reducing the horizon of the learning problem for BC, and thus, the errors compounded over time. However, waypoint labeling is underspecified, and requires additional human supervision. Can we generate waypoints automatically without any additional human supervision? Our key insight is that if a trajectory segment can be approximated by linear motion, the endpoints can be used as waypoints. We propose Automatic Waypoint Extraction (AWE) for imitation learning, a preprocessing module to decompose a demonstration into a minimal set of waypoints which when interpolated linearly can approximate the trajectory up to a specified error threshold. AWE can be combined with any BC algorithm, and we find that AWE can increase the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, reducing the decision making horizon by up to a factor of 10. Videos and code are available at https://lucys0.github.io/awe/
    Summary Imitation learning has seen renewed interest for robotic manipulation, but compounding errors still affect behavioral cloning (BC). Waypoints shorten the horizon of the BC learning problem and so reduce the errors compounded over time, yet waypoint labeling is underspecified and normally needs extra human supervision. The key insight is that when a trajectory segment can be approximated by linear motion, its endpoints can serve as waypoints. Automatic Waypoint Extraction (AWE) is a preprocessing module that decomposes a demonstration into a minimal set of waypoints which, when interpolated linearly, approximate the trajectory up to a specified error threshold. AWE can be combined with any BC algorithm; it increases the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, and reduces the decision-making horizon by up to a factor of 10. Videos and code are available at https://lucys0.github.io/awe/
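The waypoint idea in the abstract above (endpoints of near-linear segments act as waypoints, subject to an error threshold) can be illustrated with a small sketch. This is a greedy simplification written for illustration, not the authors' AWE implementation: the function names `point_segment_distance` and `extract_waypoints`, the point-to-segment error measure, and the greedy search are all assumptions, and a greedy pass is not guaranteed to return a minimal waypoint set.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (any dimension)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Degenerate segment (a == b): the closest point is a itself.
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def extract_waypoints(traj, err_threshold):
    """Greedily pick waypoint indices (hypothetical helper, not the AWE code).

    Every skipped point lies within err_threshold of the straight segment
    joining the two waypoints around it.
    """
    if not traj:
        return []
    waypoints = [0]
    start, last = 0, len(traj) - 1
    while start < last:
        end = start + 1
        # Extend the segment while all intermediate points stay close enough.
        while end < last and all(
            point_segment_distance(traj[k], traj[start], traj[end + 1]) <= err_threshold
            for k in range(start + 1, end + 1)
        ):
            end += 1
        waypoints.append(end)
        start = end
    return waypoints
```

For an L-shaped path such as `[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]` with a tight threshold, only the start, the corner, and the end are kept; loosening the threshold trades reconstruction accuracy for a shorter decision horizon.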

Evaluating the Moral Beliefs Encoded in LLMs

  • paper_url: http://arxiv.org/abs/2307.14324
  • repo_url: https://github.com/ninodimontalcino/moralchoice
  • paper_authors: Nino Scherrer, Claudia Shi, Amir Feder, David M. Blei
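  • for: This is a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs).
  • methods: A statistical method elicits beliefs encoded in LLMs, with statistical measures and evaluation metrics that quantify the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice.
  • results: (1) In unambiguous scenarios most models choose actions that align with commonsense; in ambiguous scenarios most models express uncertainty. (2) Some models are uncertain about choosing the commonsense action because their responses are sensitive to question wording. (3) Some models show clear preferences in ambiguous scenarios; closed-source models in particular tend to agree with each other.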
    Abstract This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs). It comprises two components: (1) A statistical method for eliciting beliefs encoded in LLMs. We introduce statistical measures and evaluation metrics that quantify the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice. (2) We apply this method to study what moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious. We design a large-scale survey comprising 680 high-ambiguity moral scenarios (e.g., "Should I tell a white lie?") and 687 low-ambiguity moral scenarios (e.g., "Should I stop for a pedestrian on the road?"). Each scenario includes a description, two possible actions, and auxiliary labels indicating violated rules (e.g., "do not kill"). We administer the survey to 28 open- and closed-source LLMs. We find that (a) in unambiguous scenarios, most models "choose" actions that align with commonsense. In ambiguous cases, most models express uncertainty. (b) Some models are uncertain about choosing the commonsense action because their responses are sensitive to the question-wording. (c) Some models reflect clear preferences in ambiguous scenarios. Specifically, closed-source models tend to agree with each other.
    Summary The paper has two components. (1) A statistical method for eliciting beliefs encoded in LLMs, with statistical measures and evaluation metrics that quantify the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice. (2) An application of this method to study which moral beliefs are encoded in different LLMs, especially where the right choice is not obvious. The large-scale survey comprises 680 high-ambiguity moral scenarios (e.g., "Should I tell a white lie?") and 687 low-ambiguity moral scenarios (e.g., "Should I stop for a pedestrian on the road?"); each scenario includes a description, two possible actions, and auxiliary labels indicating violated rules (e.g., "do not kill"). The survey was administered to 28 open- and closed-source LLMs. Findings: (a) in unambiguous scenarios, most models "choose" actions that align with commonsense, while in ambiguous cases most models express uncertainty; (b) some models are uncertain about choosing the commonsense action because their responses are sensitive to question wording; (c) some models reflect clear preferences in ambiguous scenarios, and closed-source models tend to agree with each other.
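The three quantities named above (probability of a choice, its uncertainty, and its consistency across question wordings) can be made concrete with a small sketch. The definitions below are illustrative assumptions, not the paper's exact metrics: choice probability is an empirical frequency over sampled responses, uncertainty is Shannon entropy in bits, and consistency is one minus the mean total-variation distance between each question form and the average over forms. All function names and the form labels are hypothetical.

```python
import math
from collections import Counter


def action_distribution(responses, actions=("A", "B")):
    """Empirical probability of each action among the valid sampled responses."""
    counts = Counter(r for r in responses if r in actions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid responses to estimate from")
    return {a: counts[a] / total for a in actions}


def entropy_bits(dist):
    """Shannon entropy of a distribution, in bits (0 means a certain choice)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal_distribution(per_form):
    """Average the per-question-form distributions with equal weight."""
    forms = list(per_form.values())
    return {a: sum(d[a] for d in forms) / len(forms) for a in forms[0]}


def consistency(per_form):
    """1 minus the mean total-variation distance to the marginal (assumed metric)."""
    marginal = marginal_distribution(per_form)
    distances = [
        0.5 * sum(abs(d[a] - marginal[a]) for a in marginal)
        for d in per_form.values()
    ]
    return 1.0 - sum(distances) / len(distances)
```

A model whose answer flips when the same scenario is reworded would score low on `consistency` even if each individual wording yields a confident (low-entropy) answer, which is the wording-sensitivity effect the findings describe.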

Reinforcement Learning by Guided Safe Exploration

  • paper_url: http://arxiv.org/abs/2307.14316
  • repo_url: None
  • paper_authors: Qisong Yang, Thiago D. Simão, Nils Jansen, Simon H. Tindemans, Matthijs T. J. Spaan
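  • for: The study uses reward-free reinforcement learning (RL) to train an agent (the guide) so that learning can adapt quickly and safely when the target task is unknown before deployment.
  • methods: The guide is trained in the constrained reward-free setting, in a controlled environment where unsafe interactions are allowed. Once the target task is revealed, safety violations are no longer allowed, and the guide is used to compose a safe behaviour policy. Drawing on transfer learning, the target policy (the student) is regularized towards the guide, and the guide's influence is gradually eliminated.
  • results: The empirical analysis shows that the method achieves safe transfer learning and helps the student solve the target task faster.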
    Abstract Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe behaviour policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.
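    Summary Safety is critical to broadening the application of reinforcement learning (RL). RL agents are often trained in a controlled environment, such as a laboratory, before real-world deployment, but the target task may be unknown before deployment. Reward-free RL trains an agent without the reward so that it can adapt quickly once the reward is revealed. In the constrained reward-free setting considered here, an agent (the guide) learns to explore safely without a reward signal, in a controlled environment that allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are no longer allowed, so the guide is used to compose a safe behaviour policy. Drawing on transfer learning, the target policy (the student) is regularized towards the guide while the student is unreliable, and the guide's influence is gradually eliminated as training progresses. The empirical analysis shows that the method achieves safe transfer learning and helps the student solve the target task faster.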
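The idea of regularizing the student towards the guide and then fading the guide out can be sketched as a penalized objective. This is a minimal illustration under stated assumptions, not the paper's algorithm: the KL penalty over discrete action probabilities, the linear decay schedule, and the names `kl_divergence`, `guide_weight`, and `regularized_loss` are all hypothetical choices made for the example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def guide_weight(step, total_steps, initial=1.0):
    """Linearly decay the guide's influence from `initial` to 0 (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial * (1.0 - frac)


def regularized_loss(task_loss, student_probs, guide_probs, step, total_steps):
    """Task loss plus a decaying penalty for straying from the guide's policy."""
    penalty = kl_divergence(student_probs, guide_probs)
    return task_loss + guide_weight(step, total_steps) * penalty
```

Early in training the penalty dominates and keeps the unreliable student close to the safe guide; by the final step the weight reaches zero and only the task loss remains.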

Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity

  • paper_url: http://arxiv.org/abs/2307.14403
  • repo_url: https://github.com/matciotola/lambda-pnn
  • paper_authors: Matteo Ciotola, Giovanni Poggi, Giuseppe Scarpa
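  • for: The paper proposes a deep learning-based pansharpening model for multiresolution images, with the aim of improving pansharpening performance.
  • methods: The model is trained at full resolution and uses a new loss function that jointly promotes the spectral and spatial quality of the pansharpened data.
  • results: Experiments on challenging test images show that the proposed method compares favorably with the state of the art in both numerical results and visual output.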
    Abstract In latest years, deep learning has gained a leading role in the pansharpening of multiresolution images. Given the lack of ground truth data, most deep learning-based methods carry out supervised training in a reduced-resolution domain. However, models trained on downsized images tend to perform poorly on high-resolution target images. For this reason, several research groups are now turning to unsupervised training in the full-resolution domain, through the definition of appropriate loss functions and training paradigms. In this context, we have recently proposed a full-resolution training framework which can be applied to many existing architectures. Here, we propose a new deep learning-based pansharpening model that fully exploits the potential of this approach and provides cutting-edge performance. Besides architectural improvements with respect to previous work, such as the use of residual attention modules, the proposed model features a novel loss function that jointly promotes the spectral and spatial quality of the pansharpened data. In addition, thanks to a new fine-tuning strategy, it improves inference-time adaptation to target images. Experiments on a large variety of test images, performed in challenging scenarios, demonstrate that the proposed method compares favorably with the state of the art both in terms of numerical results and visual output. Code is available online at https://github.com/matciotola/Lambda-PNN.
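    Summary In recent years, deep learning has taken a leading role in the pansharpening of multiresolution images. Because ground-truth data are lacking, most deep learning-based methods train in a supervised manner in a reduced-resolution domain, but models trained on downsized images tend to perform poorly on high-resolution target images. Several research groups are therefore turning to unsupervised training in the full-resolution domain, through suitable loss functions and training paradigms. Building on a recently proposed full-resolution training framework that applies to many existing architectures, this work proposes a new deep learning-based pansharpening model that fully exploits that approach. Beyond architectural improvements such as residual attention modules, the model features a novel loss function that jointly promotes the spectral and spatial quality of the pansharpened data, and a new fine-tuning strategy that improves inference-time adaptation to target images. Experiments on a wide variety of test images in challenging scenarios show that the method compares favorably with the state of the art in both numerical results and visual output. Code is available at https://github.com/matciotola/Lambda-PNN.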

ChatGPT and Persuasive Technologies for the Management and Delivery of Personalized Recommendations in Hotel Hospitality

  • paper_url: http://arxiv.org/abs/2307.14298
  • repo_url: None
  • paper_authors: Manolis Remountakis, Konstantinos Kotis, Babis Kourtzis, George E. Tsekouras
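  • for: Automating and improving recommender systems for hotel hospitality.
  • methods: Integrating a large language model (ChatGPT) and persuasive technologies into hotel recommender systems.
  • results: A pilot experiment gives preliminary evidence that these technologies can improve guest satisfaction and hotel revenue.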
    Abstract Recommender systems have become indispensable tools in the hotel hospitality industry, enabling personalized and tailored experiences for guests. Recent advancements in large language models (LLMs), such as ChatGPT, and persuasive technologies, have opened new avenues for enhancing the effectiveness of those systems. This paper explores the potential of integrating ChatGPT and persuasive technologies for automating and improving hotel hospitality recommender systems. First, we delve into the capabilities of ChatGPT, which can understand and generate human-like text, enabling more accurate and context-aware recommendations. We discuss the integration of ChatGPT into recommender systems, highlighting the ability to analyze user preferences, extract valuable insights from online reviews, and generate personalized recommendations based on guest profiles. Second, we investigate the role of persuasive technology in influencing user behavior and enhancing the persuasive impact of hotel recommendations. By incorporating persuasive techniques, such as social proof, scarcity and personalization, recommender systems can effectively influence user decision-making and encourage desired actions, such as booking a specific hotel or upgrading their room. To investigate the efficacy of ChatGPT and persuasive technologies, we present a pilot experiment with a case study involving a hotel recommender system. We aim to study the impact of integrating ChatGPT and persuasive techniques on user engagement, satisfaction, and conversion rates. The preliminary results demonstrate the potential of these technologies in enhancing the overall guest experience and business performance. Overall, this paper contributes to the field of hotel hospitality by exploring the synergistic relationship between LLMs and persuasive technology in recommender systems, ultimately influencing guest satisfaction and hotel revenue.
    Summary Recommender systems have become indispensable in the hotel hospitality industry, giving guests personalized and tailored experiences. Recent large language models (LLMs) such as ChatGPT, together with persuasive technologies, open new avenues for making those systems more effective. The paper explores integrating both into hotel hospitality recommender systems. First, it examines ChatGPT's ability to understand and generate human-like text, which enables more accurate and context-aware recommendations, and discusses how it can analyze user preferences, extract insights from online reviews, and generate personalized recommendations from guest profiles. Second, it investigates how persuasive techniques such as social proof, scarcity, and personalization can influence user decisions and encourage desired actions, such as booking a specific hotel or upgrading a room. A pilot experiment with a hotel recommender system case study examines the effect on user engagement, satisfaction, and conversion rates. The preliminary results indicate that these technologies have the potential to improve the overall guest experience and business performance.

Unraveling the Complexity of Splitting Sequential Data: Tackling Challenges in Video and Time Series Analysis

  • paper_url: http://arxiv.org/abs/2307.14294
  • repo_url: None
  • paper_authors: Diego Botache, Kristina Dingel, Rico Huhnstock, Arno Ehresmann, Bernhard Sick
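  • for: This concept article examines the challenges of splitting sequential data, including data acquisition, data representation, split ratio selection, setting up quality criteria, and choosing suitable selection strategies.
  • methods: The challenges are explored through two real-world examples: motor test benches and particle tracking in liquids.
  • results: The article argues that all of these factors must be considered when splitting sequential data, because they affect the accuracy and reliability of subsequent analyses.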
    Abstract Splitting of sequential data, such as videos and time series, is an essential step in various data analysis tasks, including object tracking and anomaly detection. However, splitting sequential data presents a variety of challenges that can impact the accuracy and reliability of subsequent analyses. This concept article examines the challenges associated with splitting sequential data, including data acquisition, data representation, split ratio selection, setting up quality criteria, and choosing suitable selection strategies. We explore these challenges through two real-world examples: motor test benches and particle tracking in liquids.
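    Summary Splitting sequential data, such as videos and time series, is an essential step in many data analysis tasks, including object tracking and anomaly detection. It also presents a variety of challenges that can affect the accuracy and reliability of subsequent analyses. This concept article examines those challenges, including data acquisition, data representation, split ratio selection, setting up quality criteria, and choosing suitable selection strategies, and explores them through two real-world examples: motor test benches and particle tracking in liquids.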
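One common pitfall behind the split-ratio and selection-strategy questions above is leakage: if frames from one recording land in both training and test sets, the test score is inflated. The sketch below shows one possible selection strategy, splitting by whole sequence rather than by individual sample. It is an illustration, not a procedure taken from the article; the function name `split_by_sequence`, the default ratios, and the seeded shuffle are assumptions.

```python
import random


def split_by_sequence(sequence_ids, ratios=(0.7, 0.15, 0.15), seed=0):
    """Assign sample indices to train/val/test by whole sequence.

    All samples sharing a sequence id (e.g. frames of one video) land in the
    same split. Sequences left over after train and val go to test.
    """
    unique_ids = sorted(set(sequence_ids))
    random.Random(seed).shuffle(unique_ids)  # seeded for reproducibility
    n_train = int(ratios[0] * len(unique_ids))
    n_val = int(ratios[1] * len(unique_ids))

    split_of = {}
    for pos, seq_id in enumerate(unique_ids):
        if pos < n_train:
            split_of[seq_id] = "train"
        elif pos < n_train + n_val:
            split_of[seq_id] = "val"
        else:
            split_of[seq_id] = "test"

    splits = {"train": [], "val": [], "test": []}
    for index, seq_id in enumerate(sequence_ids):
        splits[split_of[seq_id]].append(index)
    return splits
```

Because the ratios apply to sequences rather than samples, the resulting sample-level proportions can drift from the nominal ratios when sequences differ in length; checking the realized proportions is one example of the quality criteria the article calls for.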

General Purpose Artificial Intelligence Systems (GPAIS): Properties, Definition, Taxonomy, Open Challenges and Implications

  • paper_url: http://arxiv.org/abs/2307.14283
  • repo_url: None
  • paper_authors: Isaac Triguero, Daniel Molina, Javier Poyatos, Javier Del Ser, Francisco Herrera
  • for: The paper discusses and proposes a new definition for General-Purpose Artificial Intelligence Systems (GPAIS) and its differentiation based on various factors.
  • methods: The paper uses existing definitions of GPAIS and proposes a new definition, and also discusses a taxonomy of approaches to realise GPAIS.
  • results: The paper aims to facilitate research collaboration across different areas that are tackling general-purpose tasks, and provides a holistic view of GPAIS, including its challenges and prospects, implications for society, and the need for responsible and trustworthy AI systems and regulation.
    Abstract Most applications of Artificial Intelligence (AI) are designed for a confined and specific task. However, there are many scenarios that call for a more general AI, capable of solving a wide array of tasks without being specifically designed for them. The term General-Purpose Artificial Intelligence Systems (GPAIS) has been defined to refer to these AI systems. To date, the possibility of an Artificial General Intelligence, powerful enough to perform any intellectual task as if it were human, or even improve it, has remained an aspiration, fiction, and considered a risk for our society. Whilst we might still be far from achieving that, GPAIS is a reality and sitting at the forefront of AI research. This work discusses existing definitions for GPAIS and proposes a new definition that allows for a gradual differentiation among types of GPAIS according to their properties and limitations. We distinguish between closed-world and open-world GPAIS, characterising their degree of autonomy and ability based on several factors such as adaptation to new tasks, competence in domains not intentionally trained for, ability to learn from few data, or proactive acknowledgment of their own limitations. We then propose a taxonomy of approaches to realise GPAIS, describing research trends such as the use of AI techniques to improve another AI or foundation models. As a prime example, we delve into generative AI, aligning them with the terms and concepts presented in the taxonomy. Through the proposed definition and taxonomy, our aim is to facilitate research collaboration across different areas that are tackling general-purpose tasks, as they share many common aspects. Finally, we discuss the current state of GPAIS, its challenges and prospects, implications for our society, and the need for responsible and trustworthy AI systems and regulation, with the goal of providing a holistic view of GPAIS.
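    Summary Most applications of Artificial Intelligence (AI) are designed for a specific task, but many scenarios call for a more general AI that can solve a wide array of tasks without being specifically designed for them; such systems are called General-Purpose Artificial Intelligence Systems (GPAIS). An Artificial General Intelligence able to perform any intellectual task remains an aspiration, a fiction, and a perceived risk to society, yet GPAIS is already a reality at the forefront of AI research. This work discusses existing definitions of GPAIS and proposes a new one that allows a gradual differentiation among types of GPAIS according to their properties and limitations. It distinguishes closed-world from open-world GPAIS, characterising their autonomy and ability by factors such as adaptation to new tasks, competence in domains not intentionally trained for, ability to learn from few data, and proactive acknowledgment of their own limitations. It then proposes a taxonomy of approaches to realise GPAIS, covering research trends such as using AI techniques to improve another AI, and foundation models, and takes generative AI as a prime example aligned with the taxonomy. The definition and taxonomy are meant to facilitate research collaboration across areas tackling general-purpose tasks, which share many common aspects. Finally, the work discusses the current state of GPAIS, its challenges and prospects, its implications for society, and the need for responsible and trustworthy AI systems and regulation, to provide a holistic view.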