cs.AI - 2023-09-21

Curriculum Reinforcement Learning via Morphology-Environment Co-Evolution

  • paper_url: http://arxiv.org/abs/2309.12529
  • repo_url: None
  • paper_authors: Shuang Ao, Tianyi Zhou, Guodong Long, Xuan Song, Jing Jiang
  • for: This work aims to train RL agents that can learn and adapt across changing environments, improving their generalization to unseen settings.
  • methods: The authors propose "morphology-environment co-evolution (MECE)", in which the agent's morphology is continually updated to adapt to the changing environment while the environment is progressively modified to pose new challenges. Two learned policies automatically change the morphology and the environment, driven by rewards based on the agent's learning dynamics and a scheduler that decides when each change happens (a toy sketch of this loop follows the abstract below).
  • results: On two classes of tasks, the morphology and policy trained via MECE generalize significantly better to unseen test environments than SOTA morphology optimization methods; ablation studies show that the co-evolution between morphology and environment is the key to this success.
    Abstract Throughout long history, natural species have learned to survive by evolving their physical structures adaptive to the environment changes. In contrast, current reinforcement learning (RL) studies mainly focus on training an agent with a fixed morphology (e.g., skeletal structure and joint attributes) in a fixed environment, which can hardly generalize to changing environments or new tasks. In this paper, we optimize an RL agent and its morphology through ``morphology-environment co-evolution (MECE)'', in which the morphology keeps being updated to adapt to the changing environment, while the environment is modified progressively to bring new challenges and stimulate the improvement of the morphology. This leads to a curriculum to train generalizable RL, whose morphology and policy are optimized for different environments. Instead of hand-crafting the curriculum, we train two policies to automatically change the morphology and the environment. To this end, (1) we develop two novel and effective rewards for the two policies, which are solely based on the learning dynamics of the RL agent; (2) we design a scheduler to automatically determine when to change the environment and the morphology. In experiments on two classes of tasks, the morphology and RL policies trained via MECE exhibit significantly better generalization performance in unseen test environments than SOTA morphology optimization methods. Our ablation studies on the two MECE policies further show that the co-evolution between the morphology and environment is the key to the success.
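A minimal, runnable sketch of the co-evolution loop described above. Everything here is illustrative: the `ToyAgent` stub, the learning-progress thresholds, and the random mutations stand in for the paper's two learned policies and its scheduler.

```python
import random

class ToyAgent:
    """Stand-in for the RL agent; its returns drift upward with noise."""
    def __init__(self):
        self.returns = [0.0]

    def train_step(self, morphology, env):
        self.returns.append(self.returns[-1] + random.gauss(0.1, 0.3))

def learning_progress(agent, window=5):
    recent = agent.returns[-window:]
    return recent[-1] - recent[0]

agent = ToyAgent()
morphology = {"limbs": 4}
env = {"roughness": 0.1}

for step in range(200):
    agent.train_step(morphology, env)
    progress = learning_progress(agent)
    if progress < 0.0:
        # Agent plateaued: the morphology policy mutates the body; in the
        # paper it is rewarded for the learning-progress gain this produces.
        morphology["limbs"] = max(2, morphology["limbs"] + random.choice([-1, 1]))
    elif progress > 1.0:
        # Agent mastered the current setting: the environment policy raises
        # difficulty to keep the curriculum challenging.
        env["roughness"] += 0.05

print(morphology, env)
```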

Knowledge Graph Embedding: An Overview

  • paper_url: http://arxiv.org/abs/2309.12501
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Xiou Ge, Yun-Cheng Wang, Bin Wang, C. -C. Jay Kuo
  • for: This paper surveys the current state of research in knowledge graph completion (KGC), focusing on the two main branches of knowledge graph embedding (KGE) design: distance-based methods and semantic matching-based methods.
  • methods: The authors trace the connections among recently proposed models and identify an underlying trend that may help researchers invent more effective ones; they also discuss an emerging approach to KGC that leverages pre-trained language models (PLMs) together with textual descriptions of entities and relations.
  • results: The paper presents CompoundE and CompoundE3D, which draw inspiration from 2D and 3D affine operations respectively and span both distance-based and semantic-based techniques, and offers insights into integrating KGE methods with PLMs, bringing advantages in explainability and scalability. (A sketch of the distance-based scoring idea follows the abstract below.)
    Abstract Many mathematical models have been leveraged to design embeddings for representing Knowledge Graph (KG) entities and relations for link prediction and many downstream tasks. These mathematically-inspired models are not only highly scalable for inference in large KGs, but also have many explainable advantages in modeling different relation patterns that can be validated through both formal proofs and empirical results. In this paper, we make a comprehensive overview of the current state of research in KG completion. In particular, we focus on two main branches of KG embedding (KGE) design: 1) distance-based methods and 2) semantic matching-based methods. We discover the connections between recently proposed models and present an underlying trend that might help researchers invent novel and more effective models. Next, we delve into CompoundE and CompoundE3D, which draw inspiration from 2D and 3D affine operations, respectively. They encompass a broad spectrum of techniques including distance-based and semantic-based methods. We will also discuss an emerging approach for KG completion which leverages pre-trained language models (PLMs) and textual descriptions of entities and relations and offer insights into the integration of KGE embedding methods with PLMs for KG completion.
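To make the distance-based branch concrete, here is a minimal sketch of TransE-style scoring, one of the classic models such surveys cover; the vectors below are random placeholders rather than trained embeddings.

```python
import numpy as np

def transe_score(h, r, t):
    """Distance-based KGE scoring (TransE): a plausible triple (h, r, t)
    should satisfy h + r ~= t, so a smaller distance means a higher score."""
    return -np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
dim = 50
head, relation, tail = (rng.normal(size=dim) for _ in range(3))
random_tail = rng.normal(size=dim)

# In link prediction, candidate tails are ranked by this score.
print(transe_score(head, relation, tail))
print(transe_score(head, relation, random_tail))
```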

Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation

  • paper_url: http://arxiv.org/abs/2309.12491
  • repo_url: https://github.com/tomlimi/MT-Tokenizer-Bias
  • paper_authors: Bar Iluz, Tomasz Limisiewicz, Gabriel Stanovsky, David Mareček
  • for: This paper focuses on the effect of tokenization on gender bias in machine translation, specifically examining the interactions between the frequency of gendered profession names in training data, their representation in the subword tokenizer’s vocabulary, and gender bias.
  • methods: The authors use a combination of data analysis and machine learning techniques to study the impact of tokenization on gender bias in machine translation. They analyze the subword splits of gendered profession names in the training data and use fine-tuning of the token embedding layer to decrease the gender bias in the model.
  • results: The authors find that the imbalance of gender forms in the model's training corpus is a major factor contributing to gender bias, and that analyzing subword splits provides good estimates of gender-form imbalance in the training data. They also show that fine-tuning just the token embedding layer can decrease the gap in gender prediction accuracy between female and male forms without impairing the translation quality. (A tokenizer-based sketch of the subword-split analysis follows the abstract below.)
    Abstract We study the effect of tokenization on gender bias in machine translation, an aspect that has been largely overlooked in previous works. Specifically, we focus on the interactions between the frequency of gendered profession names in training data, their representation in the subword tokenizer's vocabulary, and gender bias. We observe that female and non-stereotypical gender inflections of profession names (e.g., Spanish "doctora" for "female doctor") tend to be split into multiple subword tokens. Our results indicate that the imbalance of gender forms in the model's training corpus is a major factor contributing to gender bias and has a greater impact than subword splitting. We show that analyzing subword splits provides good estimates of gender-form imbalance in the training data and can be used even when the corpus is not publicly available. We also demonstrate that fine-tuning just the token embedding layer can decrease the gap in gender prediction accuracy between female and male forms without impairing the translation quality.
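The subword-split analysis can be probed in a few lines. This hedged sketch counts how many pieces a tokenizer assigns to masculine vs. feminine profession forms; the model name is just an example of a Marian MT tokenizer, not necessarily the one studied in the paper, and it requires the `transformers` and `sentencepiece` packages.

```python
from transformers import AutoTokenizer

# Any subword tokenizer works; this en->es MT model is illustrative.
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-es")

professions = ["doctor", "doctora", "profesor", "profesora"]
for word in professions:
    pieces = tokenizer.tokenize(word)
    # Words split into more pieces tend to be rarer in the training corpus;
    # the paper uses such splits to estimate gender-form imbalance even when
    # the corpus itself is not publicly available.
    print(f"{word}: {pieces} ({len(pieces)} subword tokens)")
```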

Studying and improving reasoning in humans and machines

  • paper_url: http://arxiv.org/abs/2309.12485
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Nicolas Yax, Hernan Anlló, Stefano Palminteri
  • for: investigate and compare reasoning in large language models (LLM) and humans
  • methods: used cognitive psychology tools traditionally dedicated to the study of (bounded) rationality
  • results: most of the included models presented reasoning errors akin to those ascribed to error-prone, heuristic-based human reasoning; an in-depth comparison nevertheless revealed important differences from human-like reasoning, with model limitations largely disappearing in more recent LLM releases.
    Abstract In the present study, we investigate and compare reasoning in large language models (LLM) and humans using a selection of cognitive psychology tools traditionally dedicated to the study of (bounded) rationality. To do so, we presented to human participants and an array of pretrained LLMs new variants of classical cognitive experiments, and cross-compared their performances. Our results showed that most of the included models presented reasoning errors akin to those frequently ascribed to error-prone, heuristic-based human reasoning. Notwithstanding this superficial similarity, an in-depth comparison between humans and LLMs indicated important differences with human-like reasoning, with models limitations disappearing almost entirely in more recent LLMs releases. Moreover, we show that while it is possible to devise strategies to induce better performance, humans and machines are not equally-responsive to the same prompting schemes. We conclude by discussing the epistemological implications and challenges of comparing human and machine behavior for both artificial intelligence and cognitive psychology.

State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding

  • paper_url: http://arxiv.org/abs/2309.12482
  • repo_url: None
  • paper_authors: Devleena Das, Sonia Chernova, Been Kim
  • for: This work aims to develop explanations of AI decision making that non-AI experts can understand when using AI systems for everyday tasks.
  • methods: The authors use concept-based explanations, first contributing a desiderata for defining "concepts" in sequential decision-making (action-selection) settings, and then learning a joint embedding model between state-action pairs and concept-based explanations.
  • results: Experiments show that the State2Explanation (S2E) framework provides a dual benefit: it informs reward shaping to improve the agent's learning rate and task performance, and it gives non-AI end-users explanations that significantly improve their task performance at deployment.
    Abstract With more complex AI systems used by non-AI experts to complete daily tasks, there is an increasing effort to develop methods that produce explanations of AI decision making understandable by non-AI experts. Towards this effort, leveraging higher-level concepts and producing concept-based explanations have become a popular method. Most concept-based explanations have been developed for classification techniques, and we posit that the few existing methods for sequential decision making are limited in scope. In this work, we first contribute a desiderata for defining "concepts" in sequential decision making settings. Additionally, inspired by the Protege Effect which states explaining knowledge often reinforces one's self-learning, we explore the utility of concept-based explanations providing a dual benefit to the RL agent by improving agent learning rate, and to the end-user by improving end-user understanding of agent decision making. To this end, we contribute a unified framework, State2Explanation (S2E), that involves learning a joint embedding model between state-action pairs and concept-based explanations, and leveraging such learned model to both (1) inform reward shaping during an agent's training, and (2) provide explanations to end-users at deployment for improved task performance. Our experimental validations, in Connect 4 and Lunar Lander, demonstrate the success of S2E in providing a dual-benefit, successfully informing reward shaping and improving agent learning rate, as well as significantly improving end user task performance at deployment time.

HANS, are you clever? Clever Hans Effect Analysis of Neural Systems

  • paper_url: http://arxiv.org/abs/2309.12481
  • repo_url: None
  • paper_authors: Leonardo Ranaldi, Fabio Massimo Zanzotto
  • for: This paper examines how robust Instruction-tuned Large Language Models (It-LLMs) are when the order of answer choices is varied.
  • methods: The authors build assessments of model abilities on four multiple-choice question (MCQ) benchmarks and introduce adversarial examples to probe model reliability.
  • results: Models show a selection bias when the order of choices varies, with the first position disproportionately influencing the chosen answer; in few-shot settings the models appear to rely on structural heuristics in their decision making. Applying the Chain-of-Thought (CoT) technique elicits explicit reasoning and mitigates the bias, yielding more robust models.
    Abstract Instruction-tuned Large Language Models (It-LLMs) have been exhibiting outstanding abilities to reason around cognitive states, intentions, and reactions of all people involved, letting humans guide and comprehend day-to-day social interactions effectively. In fact, several multiple-choice questions (MCQ) benchmarks have been proposed to construct solid assessments of the models' abilities. However, earlier works are demonstrating the presence of inherent "order bias" in It-LLMs, posing challenges to the appropriate evaluation. In this paper, we investigate It-LLMs' resilience abilities towards a series of probing tests using four MCQ benchmarks. Introducing adversarial examples, we show a significant performance gap, mainly when varying the order of the choices, which reveals a selection bias and brings into discussion reasoning abilities. Following a correlation between first positions and model choices due to positional bias, we hypothesized the presence of structural heuristics in the decision-making process of the It-LLMs, strengthened by including significant examples in few-shot scenarios. Finally, by using the Chain-of-Thought (CoT) technique, we elicit the model to reason and mitigate the bias by obtaining more robust models.

SAVME: Efficient Safety Validation for Autonomous Systems Using Meta-Learning

  • paper_url: http://arxiv.org/abs/2309.12474
  • repo_url: None
  • paper_authors: Marc R. Schlichting, Nina V. Boord, Anthony L. Corso, Mykel J. Kochenderfer
  • for: This work aims to quickly discover potential failures of autonomous systems so that risk can be assessed before deployment.
  • methods: The authors propose a Bayesian approach that integrates meta-learning strategies with a multi-armed bandit framework: it learns a distribution over scenario parameters that are prone to triggering failures in the system under test, together with a distribution over simulator fidelity settings that enable fast yet accurate simulation. In the spirit of meta-learning, they also assess whether the learned fidelity-settings distribution speeds up learning the scenario-parameter distributions for new scenarios.
  • results: Using a cutting-edge 3D driving simulator with 16 fidelity settings and an autonomous vehicle stack that includes camera and lidar sensors, and evaluating scenarios based on an autonomous vehicle pre-crash typology, the approach achieves a significant speedup, up to 18x faster than traditional methods that rely solely on a high-fidelity simulator.
    Abstract Discovering potential failures of an autonomous system is important prior to deployment. Falsification-based methods are often used to assess the safety of such systems, but the cost of running many accurate simulation can be high. The validation can be accelerated by identifying critical failure scenarios for the system under test and by reducing the simulation runtime. We propose a Bayesian approach that integrates meta-learning strategies with a multi-armed bandit framework. Our method involves learning distributions over scenario parameters that are prone to triggering failures in the system under test, as well as a distribution over fidelity settings that enable fast and accurate simulations. In the spirit of meta-learning, we also assess whether the learned fidelity settings distribution facilitates faster learning of the scenario parameter distributions for new scenarios. We showcase our methodology using a cutting-edge 3D driving simulator, incorporating 16 fidelity settings for an autonomous vehicle stack that includes camera and lidar sensors. We evaluate various scenarios based on an autonomous vehicle pre-crash typology. As a result, our approach achieves a significant speedup, up to 18 times faster compared to traditional methods that solely rely on a high-fidelity simulator.

Multimodal Deep Learning for Scientific Imaging Interpretation

  • paper_url: http://arxiv.org/abs/2309.12460
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Abdulelah S. Alshehri, Franklin L. Lee, Shihu Wang
  • for: This study aims to linguistically emulate human-like interactions with Scanning Electron Microscopy (SEM) images, specifically of glass materials, and to evaluate the accuracy of such interactions.
  • methods: The approach is a multimodal deep learning framework that distills insights from textual and visual data harvested from peer-reviewed articles, further augmented by GPT-4's capabilities for refined data synthesis and evaluation.
  • results: The resulting model (GlassLLaVA) crafts accurate interpretations, identifies key features, and detects defects in previously unseen SEM images; the authors also introduce versatile evaluation metrics, suitable for a range of scientific imaging applications, that allow benchmarking against research-grounded answers.
    Abstract In the domain of scientific imaging, interpreting visual data often demands an intricate combination of human expertise and deep comprehension of the subject materials. This study presents a novel methodology to linguistically emulate and subsequently evaluate human-like interactions with Scanning Electron Microscopy (SEM) images, specifically of glass materials. Leveraging a multimodal deep learning framework, our approach distills insights from both textual and visual data harvested from peer-reviewed articles, further augmented by the capabilities of GPT-4 for refined data synthesis and evaluation. Despite inherent challenges--such as nuanced interpretations and the limited availability of specialized datasets--our model (GlassLLaVA) excels in crafting accurate interpretations, identifying key features, and detecting defects in previously unseen SEM images. Moreover, we introduce versatile evaluation metrics, suitable for an array of scientific imaging applications, which allows for benchmarking against research-grounded answers. Benefiting from the robustness of contemporary Large Language Models, our model adeptly aligns with insights from research papers. This advancement not only underscores considerable progress in bridging the gap between human and machine interpretation in scientific imaging, but also hints at expansive avenues for future research and broader application.

LongDocFACTScore: Evaluating the Factuality of Long Document Abstractive Summarisation

  • paper_url: http://arxiv.org/abs/2309.12455
  • repo_url: https://github.com/jbshp/longdocfactscore
  • paper_authors: Jennifer A Bishop, Qianqian Xie, Sophia Ananiadou
  • for: This work evaluates automatic text summarization metrics for long document data sets and proposes a new evaluation framework, LongDocFACTScore.
  • methods: The authors assess existing pre-trained-language-model-based factuality metrics against human-annotated data sets, and design LongDocFACTScore so that such metrics can be extended to documents of any length (a sketch of the windowed idea follows the abstract below).
  • results: LongDocFACTScore outperforms existing state-of-the-art metrics in its correlation with human measures of factuality on long document summarization data sets, while remaining comparable to state-of-the-art metrics on short document data sets.
    Abstract Maintaining factual consistency is a critical issue in abstractive text summarisation, however, it cannot be assessed by traditional automatic metrics used for evaluating text summarisation, such as ROUGE scoring. Recent efforts have been devoted to developing improved metrics for measuring factual consistency using pre-trained language models, but these metrics have restrictive token limits, and are therefore not suitable for evaluating long document text summarisation. Moreover, there is limited research evaluating whether existing automatic evaluation metrics are fit for purpose when applied to long document data sets. In this work, we evaluate the efficacy of automatic metrics at assessing factual consistency in long document text summarisation and propose a new evaluation framework LongDocFACTScore. This framework allows metrics to be extended to any length document. This framework outperforms existing state-of-the-art metrics in its ability to correlate with human measures of factuality when used to evaluate long document summarisation data sets. Furthermore, we show LongDocFACTScore has performance comparable to state-of-the-art metrics when evaluated against human measures of factual consistency on short document data sets. We make our code and annotated data publicly available: https://github.com/jbshp/LongDocFACTScore.
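A hedged sketch of the windowed idea behind LongDocFACTScore: restrict each summary sentence's comparison to its nearest source sentences so the underlying metric's token limit is never exceeded. Here plain cosine similarity from `sentence-transformers` stands in for the pre-trained factuality metric the framework actually plugs in, and the model name is illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # retrieval model; illustrative

def long_doc_fact_sketch(summary_sents, source_sents, k=3):
    """Score each summary sentence against only its k most similar source
    sentences, so a short-document factuality metric extends to documents of
    any length. Cosine similarity stands in for the underlying metric."""
    src_emb = model.encode(source_sents)
    scores = []
    for sent in summary_sents:
        sims = util.cos_sim(model.encode(sent), src_emb)[0].numpy()
        top = np.argsort(-sims)[:k]
        evidence = " ".join(source_sents[i] for i in top)
        scores.append(float(util.cos_sim(model.encode(sent),
                                         model.encode(evidence))))
    return float(np.mean(scores))

source = ["The study ran for ten years.", "It enrolled 500 patients.",
          "Results were published in 2020.", "Funding came from the NIH."]
summary = ["A decade-long study of 500 patients reported results in 2020."]
print(long_doc_fact_sketch(summary, source))
```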

Ensemble Neural Networks for Remaining Useful Life (RUL) Prediction

  • paper_url: http://arxiv.org/abs/2309.12445
  • repo_url: None
  • paper_authors: Ahbishek Srinivasan, Juan Carlos Andresen, Anders Holst
  • for: The paper proposes an ensemble neural network approach for probabilistic remaining useful life (RUL) prediction that considers both aleatoric and epistemic uncertainty and decouples the two, yielding predictions that are more accurate and easier to interpret.
  • methods: Ensemble neural networks model the probabilistic nature of RUL predictions; the aleatoric and epistemic uncertainties are decoupled to give a clearer picture of the confidence of each prediction (a common decomposition is sketched after the abstract below).
  • results: The proposed approach is tested on NASA's turbofan jet engine CMAPSS data set, showing how the uncertainties can be modeled and disentangled, and is evaluated on several metrics against current state-of-the-art methods.
    Abstract A core part of maintenance planning is a monitoring system that provides a good prognosis on health and degradation, often expressed as remaining useful life (RUL). Most of the current data-driven approaches for RUL prediction focus on single-point prediction. These point prediction approaches do not include the probabilistic nature of the failure. The few probabilistic approaches to date either include the aleatoric uncertainty (which originates from the system), or the epistemic uncertainty (which originates from the model parameters), or both simultaneously as a total uncertainty. Here, we propose ensemble neural networks for probabilistic RUL predictions which considers both uncertainties and decouples these two uncertainties. These decoupled uncertainties are vital in knowing and interpreting the confidence of the predictions. This method is tested on NASA's turbofan jet engine CMAPSS data-set. Our results show how these uncertainties can be modeled and how to disentangle the contribution of aleatoric and epistemic uncertainty. Additionally, our approach is evaluated on different metrics and compared against the current state-of-the-art methods.
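The decoupling the abstract refers to is commonly done with deep ensembles whose members each output a predictive mean and variance; the sketch below shows that standard decomposition (the paper's exact formulation may differ).

```python
import numpy as np

def decompose_uncertainty(member_means, member_vars):
    """Decouple the two uncertainty sources from an ensemble whose members
    each predict a mean RUL and an (aleatoric) variance:
      aleatoric = average of the members' predicted variances (system noise)
      epistemic = variance of the members' means (model disagreement)"""
    member_means = np.asarray(member_means)
    member_vars = np.asarray(member_vars)
    mean_rul = member_means.mean(axis=0)
    aleatoric = member_vars.mean(axis=0)
    epistemic = member_means.var(axis=0)
    return mean_rul, aleatoric, epistemic

# Toy example: 5 ensemble members predicting RUL (in cycles) for 3 engines.
rng = np.random.default_rng(1)
means = 100 + rng.normal(scale=5, size=(5, 3))
variances = rng.uniform(4, 9, size=(5, 3))
mean_rul, aleatoric, epistemic = decompose_uncertainty(means, variances)
print(mean_rul, aleatoric, epistemic)
```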

Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges

  • paper_url: http://arxiv.org/abs/2309.12426
  • repo_url: None
  • paper_authors: Vinay Samuel, Houda Aynaou, Arijit Ghosh Chowdhury, Karthik Venkat Ramanan, Aman Chadha
  • for: This paper explores whether large language models (LLMs) can augment existing extractive reading comprehension datasets to improve downstream task performance.
  • methods: The authors use GPT-4 as a replacement for human annotators on low-resource reading comprehension tasks, generate synthetic annotations, and fine-tune models on the resulting datasets, comparing performance after fine-tuning as well as the cost of annotation.
  • results: GPT-4-based augmentation improves performance on low-resource reading comprehension tasks while substantially reducing the cost of manual annotation; the study also surfaces unique opportunities and challenges that call for further work, and the authors release augmented versions of the low-resource datasets.
    Abstract Large Language Models (LLMs) have demonstrated impressive zero shot performance on a wide range of NLP tasks, demonstrating the ability to reason and apply commonsense. A relevant application is to use them for creating high quality synthetic datasets for downstream tasks. In this work, we probe whether GPT-4 can be used to augment existing extractive reading comprehension datasets. Automating data annotation processes has the potential to save large amounts of time, money and effort that goes into manually labelling datasets. In this paper, we evaluate the performance of GPT-4 as a replacement for human annotators for low resource reading comprehension tasks, by comparing performance after fine tuning, and the cost associated with annotation. This work serves to be the first analysis of LLMs as synthetic data augmenters for QA systems, highlighting the unique opportunities and challenges. Additionally, we release augmented versions of low resource datasets, that will allow the research community to create further benchmarks for evaluation of generated datasets.

Event Prediction using Case-Based Reasoning over Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2309.12423
  • repo_url: https://github.com/solashirai/www-evcbr
  • paper_authors: Sola Shirai, Debarun Bhattacharjya, Oktie Hassanzadeh
  • for: predicting properties of new consequent events from cause-effect relations in a knowledge graph
  • methods: a case-based reasoning model (EvCBR) that requires no retraining; it identifies similar cause-effect events via statistical measures and performs path-based predictions
  • results: on a dataset of newsworthy events with causal relations curated from Wikidata, EvCBR outperforms baselines including translational-distance-based, GNN-based, and rule-based LP models (a toy sketch of the 2-hop prediction follows the abstract below)
    Abstract Applying link prediction (LP) methods over knowledge graphs (KG) for tasks such as causal event prediction presents an exciting opportunity. However, typical LP models are ill-suited for this task as they are incapable of performing inductive link prediction for new, unseen event entities and they require retraining as knowledge is added or changed in the underlying KG. We introduce a case-based reasoning model, EvCBR, to predict properties about new consequent events based on similar cause-effect events present in the KG. EvCBR uses statistical measures to identify similar events and performs path-based predictions, requiring no training step. To generalize our methods beyond the domain of event prediction, we frame our task as a 2-hop LP task, where the first hop is a causal relation connecting a cause event to a new effect event and the second hop is a property about the new event which we wish to predict. The effectiveness of our method is demonstrated using a novel dataset of newsworthy events with causal relations curated from Wikidata, where EvCBR outperforms baselines including translational-distance-based, GNN-based, and rule-based LP models.
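A toy, self-contained sketch of the 2-hop, case-based prediction described above. The mini knowledge graph, the overlap-based similarity, and the voting are all illustrative stand-ins for EvCBR's statistical measures.

```python
from collections import Counter

# Toy KG: event -> {relation: [targets]}. All names are illustrative.
kg = {
    "earthquake_A": {"causes": ["tsunami_A"], "country": ["JP"]},
    "tsunami_A": {"affected": ["coastal_city"]},
    "earthquake_B": {"causes": ["tsunami_B"], "country": ["JP"]},
    "tsunami_B": {"affected": ["port_town"]},
}

def similar_cases(query_props, kg):
    """Rank cause events by property overlap with the query event (a crude
    stand-in for EvCBR's statistical similarity measures)."""
    scores = {}
    for event, props in kg.items():
        if "causes" in props:
            scores[event] = sum(1 for rel, vals in query_props.items()
                                if set(vals) & set(props.get(rel, [])))
    return sorted(scores, key=scores.get, reverse=True)

def predict_effect_property(query_props, relation, kg):
    """2-hop prediction: follow 'causes' from similar cases, then vote on the
    values their effect events have for the target relation."""
    votes = Counter()
    for case in similar_cases(query_props, kg):
        for effect in kg[case].get("causes", []):
            votes.update(kg.get(effect, {}).get(relation, []))
    return votes.most_common()

print(predict_effect_property({"country": ["JP"]}, "affected", kg))
```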

Constraints First: A New MDD-based Model to Generate Sentences Under Constraints

  • paper_url: http://arxiv.org/abs/2309.12415
  • repo_url: None
  • paper_authors: Alexandre Bonlarron, Aurélie Calabrèse, Pierre Kornprobst, Jean-Charles Régin
  • for: This paper introduces a new approach to generating strongly constrained text, motivated by standardized sentence generation for vision screening.
  • methods: The problem is formalized as discrete combinatorial optimization and solved with multivalued decision diagrams (MDDs), a data structure that computes an exhaustive set of solutions without performing any search; a language model (GPT-2) then keeps the best sentences.
  • results: The method yields hundreds of bona-fide candidate sentences, a major breakthrough compared with the few dozen sentences usually available in the well-known MNREAD vision screening test. Because it can be easily adapted to other languages (the paper details English and French, whose agreement and conjugation rules are more complex), it has the potential to make the MNREAD test even more valuable and usable. (A toy enumeration illustrating the constraints-first idea follows the abstract below.)
    Abstract This paper introduces a new approach to generating strongly constrained texts. We consider standardized sentence generation for the typical application of vision screening. To solve this problem, we formalize it as a discrete combinatorial optimization problem and utilize multivalued decision diagrams (MDD), a well-known data structure to deal with constraints. In our context, one key strength of MDD is to compute an exhaustive set of solutions without performing any search. Once the sentences are obtained, we apply a language model (GPT-2) to keep the best ones. We detail this for English and also for French where the agreement and conjugation rules are known to be more complex. Finally, with the help of GPT-2, we get hundreds of bona-fide candidate sentences. When compared with the few dozen sentences usually available in the well-known vision screening test (MNREAD), this brings a major breakthrough in the field of standardized sentence generation. Also, as it can be easily adapted for other languages, it has the potential to make the MNREAD test even more valuable and usable. More generally, this paper highlights MDD as a convincing alternative for constrained text generation, especially when the constraints are hard to satisfy, but also for many other prospects.
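A toy illustration of the constraints-first pipeline: enumerate the whole constrained language exhaustively, then let a language model pick the best sentences. The word layers and length bounds are invented, and a true MDD would represent the solution set far more compactly by sharing common sub-paths; `itertools.product` is used here only for brevity.

```python
from itertools import product

# Layers of word choices plus a hard character-count constraint
# (MNREAD-style tests fix the printed sentence length).
layers = [["the", "my"], ["small", "old"], ["dog", "cat"], ["sleeps", "runs"]]

def constrained_sentences(layers, min_chars=15, max_chars=18):
    for words in product(*layers):
        sentence = " ".join(words)
        if min_chars <= len(sentence) <= max_chars:
            yield sentence

candidates = list(constrained_sentences(layers))
# In the paper, a language model (GPT-2) then reranks the exhaustive set
# and keeps the most natural sentences.
print(candidates)
```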

ForceSight: Text-Guided Mobile Manipulation with Visual-Force Goals

  • paper_url: http://arxiv.org/abs/2309.12312
  • repo_url: https://github.com/force-sight/forcesight
  • paper_authors: Jeremy A. Collins, Cody Houff, You Liang Tan, Charles C. Kemp
  • for: This paper presents ForceSight, a system for text-guided mobile manipulation that predicts visual-force goals using a deep neural network.
  • methods: Given a single RGBD image and a text prompt, the model predicts a target end-effector pose in the camera frame (kinematic goal) together with the associated forces (force goal); the two components form a visual-force goal.
  • results: Deployed on a mobile manipulator with an eye-in-hand RGBD camera, ForceSight performed tasks such as precision grasps, drawer opening, and object handovers with an 81% success rate in unseen environments with object instances that differed significantly from the training data. In a separate experiment, relying exclusively on visual servoing and ignoring force goals dropped the success rate from 90% to 45%, showing that force goals significantly enhance performance.
    Abstract We present ForceSight, a system for text-guided mobile manipulation that predicts visual-force goals using a deep neural network. Given a single RGBD image combined with a text prompt, ForceSight determines a target end-effector pose in the camera frame (kinematic goal) and the associated forces (force goal). Together, these two components form a visual-force goal. Prior work has demonstrated that deep models outputting human-interpretable kinematic goals can enable dexterous manipulation by real robots. Forces are critical to manipulation, yet have typically been relegated to lower-level execution in these systems. When deployed on a mobile manipulator equipped with an eye-in-hand RGBD camera, ForceSight performed tasks such as precision grasps, drawer opening, and object handovers with an 81% success rate in unseen environments with object instances that differed significantly from the training data. In a separate experiment, relying exclusively on visual servoing and ignoring force goals dropped the success rate from 90% to 45%, demonstrating that force goals can significantly enhance performance. The appendix, videos, code, and trained models are available at https://force-sight.github.io/.

LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

  • paper_url: http://arxiv.org/abs/2309.12311
  • repo_url: None
  • paper_authors: Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, Joyce Chai
  • for: improving the 3D visual grounding abilities of household robots so they can navigate, manipulate objects, and answer questions about their environment
  • methods: a large language model (LLM) decomposes complex natural-language queries into semantic constituents; a visual grounding tool such as OpenScene or LERF identifies candidate objects in the 3D scene; the LLM then evaluates the spatial and commonsense relations among the proposals to make the final grounding decision
  • results: evaluated on the ScanRefer benchmark without any labeled training data, LLM-Grounder generalizes to novel 3D scenes and arbitrary text queries and achieves state-of-the-art zero-shot grounding accuracy; LLMs markedly improve grounding capability, especially for complex language queries, making LLM-Grounder an effective approach for 3D vision-language tasks in robotics
    Abstract 3D visual grounding is a critical skill for household robots, enabling them to navigate, manipulate objects, and answer questions based on their environment. While existing approaches often rely on extensive labeled data or exhibit limitations in handling complex language queries, we propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipeline. LLM-Grounder utilizes an LLM to decompose complex natural language queries into semantic constituents and employs a visual grounding tool, such as OpenScene or LERF, to identify objects in a 3D scene. The LLM then evaluates the spatial and commonsense relations among the proposed objects to make a final grounding decision. Our method does not require any labeled training data and can generalize to novel 3D scenes and arbitrary text queries. We evaluate LLM-Grounder on the ScanRefer benchmark and demonstrate state-of-the-art zero-shot grounding accuracy. Our findings indicate that LLMs significantly improve the grounding capability, especially for complex language queries, making LLM-Grounder an effective approach for 3D vision-language tasks in robotics. Videos and interactive demos can be found on the project website https://chat-with-nerf.github.io/ .

Rehearsal: Simulating Conflict to Teach Conflict Resolution

  • paper_url: http://arxiv.org/abs/2309.12309
  • repo_url: None
  • paper_authors: Omar Shaikh, Valentino Chai, Michele J. Gelfand, Diyi Yang, Michael S. Bernstein
  • for: This paper aims to provide a system for users to practice and learn effective conflict resolution strategies through simulated conversations with a believable interlocutor.
  • methods: The paper introduces a system called Rehearsal, which uses a large language model conditioned on the Interest-Rights-Power (IRP) theory to generate counterfactual scenarios and guide users towards de-escalating difficult conversations.
  • results: In a between-subjects evaluation, participants who received simulated training from Rehearsal significantly improved their performance in unaided conflicts, reducing their use of escalating competitive strategies by 67% and doubling their use of cooperative strategies.
    Abstract Interpersonal conflict is an uncomfortable but unavoidable fact of life. Navigating conflict successfully is a skill -- one that can be learned through deliberate practice -- but few have access to effective training or feedback. To expand this access, we introduce Rehearsal, a system that allows users to rehearse conflicts with a believable simulated interlocutor, explore counterfactual "what if?" scenarios to identify alternative conversational paths, and learn through feedback on how and when to apply specific conflict strategies. Users can utilize Rehearsal to practice handling a variety of predefined conflict scenarios, from office disputes to relationship issues, or they can choose to create their own. To enable Rehearsal, we develop IRP prompting, a method of conditioning output of a large language model on the influential Interest-Rights-Power (IRP) theory from conflict resolution. Rehearsal uses IRP to generate utterances grounded in conflict resolution theory, guiding users towards counterfactual conflict resolution strategies that help de-escalate difficult conversations. In a between-subjects evaluation, 40 participants engaged in an actual conflict with a confederate after training. Compared to a control group with lecture material covering the same IRP theory, participants with simulated training from Rehearsal significantly improved their performance in the unaided conflict: they reduced their use of escalating competitive strategies by an average of 67%, while doubling their use of cooperative strategies. Overall, Rehearsal highlights the potential effectiveness of language models as tools for learning and practicing interpersonal skills.

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

  • paper_url: http://arxiv.org/abs/2309.12307
  • repo_url: https://github.com/dvlab-research/longlora
  • paper_authors: Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia
  • for: extending the context sizes of pre-trained large language models (LLMs) at limited computation cost while retaining their original architectures
  • methods: two techniques enable efficient context extension: (1) fine-tuning with sparse local attention (the proposed shift short attention), which saves non-trivial computation while matching the performance of fine-tuning with vanilla dense attention (a sketch follows the abstract below); and (2) a revisited parameter-efficient fine-tuning regime, showing that LoRA works well for context extension provided the embedding and normalization layers are trainable
  • results: strong empirical results on LLaMA2 models from 7B/13B to 70B across multiple tasks; LongLoRA extends LLaMA2 7B from 4k to 100k context, or LLaMA2 70B to 32k, on a single 8x A100 machine; the authors also collect LongQA, a dataset of more than 3,000 long-context question-answer pairs for supervised fine-tuning
    Abstract We present LongLoRA, an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs), with limited computation cost. Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. For example, training on the context length of 8192 needs 16x computational costs in self-attention layers as that of 2048. In this paper, we speed up the context extension of LLMs in two aspects. On the one hand, although dense global attention is needed during inference, fine-tuning the model can be effectively and efficiently done by sparse local attention. The proposed shift short attention effectively enables context extension, leading to non-trivial computation saving with similar performance to fine-tuning with vanilla attention. Particularly, it can be implemented with only two lines of code in training, while being optional in inference. On the other hand, we revisit the parameter-efficient fine-tuning regime for context expansion. Notably, we find that LoRA for context extension works well under the premise of trainable embedding and normalization. LongLoRA demonstrates strong empirical results on various tasks on LLaMA2 models from 7B/13B to 70B. LongLoRA adopts LLaMA2 7B from 4k context to 100k, or LLaMA2 70B to 32k on a single 8x A100 machine. LongLoRA extends models' context while retaining their original architectures, and is compatible with most existing techniques, like FlashAttention-2. In addition, to make LongLoRA practical, we collect a dataset, LongQA, for supervised fine-tuning. It contains more than 3k long context question-answer pairs.
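The paper says shift short attention takes only two lines of code in training; below is a hedged sketch of the token-shifting step. The tensor shapes and the exact half-head split are assumptions based on the abstract, not a drop-in replacement for the released code.

```python
import torch

def shifted_group_attention_inputs(x, group_size):
    """Sketch of shift short attention (S2-Attn): tokens attend within
    fixed-size groups, but half of the attention heads are shifted by half a
    group so information flows across group boundaries.
    x: (batch, seq_len, num_heads, head_dim)."""
    b, n, h, d = x.shape
    shifted = x.clone()
    shifted[:, :, h // 2:] = torch.roll(x[:, :, h // 2:],
                                        shifts=-group_size // 2, dims=1)
    # Fold each group into the batch dimension so ordinary attention kernels
    # run per group; after attention, reverse the reshape and the roll.
    return shifted.reshape(b * (n // group_size), group_size, h, d)

x = torch.randn(1, 8192, 32, 128)
groups = shifted_group_attention_inputs(x, group_size=2048)
print(groups.shape)  # torch.Size([4, 2048, 32, 128])
```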

Environment-biased Feature Ranking for Novelty Detection Robustness

  • paper_url: http://arxiv.org/abs/2309.12301
  • repo_url: None
  • paper_authors: Stefan Smeu, Elena Burceanu, Emanuela Haller, Andrei Liviu Nicolicioiu
  • for: robust novelty detection, aiming to detect novelties in terms of semantic content while being invariant to changes in other, irrelevant factors.
  • methods: propose a method that starts with a pretrained embedding and a multi-env setup, and ranks features based on their environment-focus, using a per-feature score based on feature distribution variance between envs.
  • results: dropping the highest-scoring (most environment-focused) features removes spurious correlations and improves overall performance by up to 6%, in both covariance-shift and sub-population-shift cases, on both a real and a synthetic benchmark introduced for this task (a sketch follows the abstract below)
    Abstract We tackle the problem of robust novelty detection, where we aim to detect novelties in terms of semantic content while being invariant to changes in other, irrelevant factors. Specifically, we operate in a setup with multiple environments, where we determine the set of features that are associated more with the environments, rather than to the content relevant for the task. Thus, we propose a method that starts with a pretrained embedding and a multi-env setup and manages to rank the features based on their environment-focus. First, we compute a per-feature score based on the feature distribution variance between envs. Next, we show that by dropping the highly scored ones, we manage to remove spurious correlations and improve the overall performance by up to 6%, both in covariance and sub-population shift cases, both for a real and a synthetic benchmark, that we introduce for this task.
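One plausible reading of the per-feature score described above, as a small numpy sketch; the exact statistic in the paper may differ, and the toy data simply plants one environment-biased feature to show the ranking at work.

```python
import numpy as np

def env_focus_scores(features_by_env):
    """Score each embedding feature by how much its distribution shifts across
    environments: here, the variance of the per-environment means. High scores
    flag features that track the environment rather than semantic content."""
    env_means = np.stack([f.mean(axis=0) for f in features_by_env])
    return env_means.var(axis=0)

def drop_env_biased(features, scores, keep_frac=0.8):
    keep = np.argsort(scores)[: int(len(scores) * keep_frac)]
    return features[:, keep]

rng = np.random.default_rng(0)
# Toy pretrained embeddings from 3 environments; feature 0 is env-dependent.
envs = [rng.normal(size=(200, 16)) for _ in range(3)]
for i, f in enumerate(envs):
    f[:, 0] += 5.0 * i
scores = env_focus_scores(envs)
print(scores.argmax())                      # -> 0, the env-biased feature
print(drop_env_biased(envs[0], scores).shape)
```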

See to Touch: Learning Tactile Dexterity through Visual Incentives

  • paper_url: http://arxiv.org/abs/2309.12300
  • repo_url: None
  • paper_authors: Irmak Guzey, Yinlong Dai, Ben Evans, Soumith Chintala, Lerrel Pinto
  • for: improving the dexterity and precision of multi-fingered robot manipulation
  • methods: tactile-based policies are optimized using vision-based rewards (TAVI): contrastive learning yields visual representations, a reward function is constructed from a single human demonstration via optimal-transport-based matching, and online reinforcement learning on the robot maximizes that visual reward
  • results: on six challenging tasks, such as peg pick-and-place, unstacking bowls, and flipping slender objects, TAVI achieves a 73% success rate with a four-fingered Allegro hand; this is 108% higher than policies using tactile and vision-based rewards and 135% higher than policies without tactile observational input
    Abstract Equipping multi-fingered robots with tactile sensing is crucial for achieving the precise, contact-rich, and dexterous manipulation that humans excel at. However, relying solely on tactile sensing fails to provide adequate cues for reasoning about objects' spatial configurations, limiting the ability to correct errors and adapt to changing situations. In this paper, we present Tactile Adaptation from Visual Incentives (TAVI), a new framework that enhances tactile-based dexterity by optimizing dexterous policies using vision-based rewards. First, we use a contrastive-based objective to learn visual representations. Next, we construct a reward function using these visual representations through optimal-transport based matching on one human demonstration. Finally, we use online reinforcement learning on our robot to optimize tactile-based policies that maximize the visual reward. On six challenging tasks, such as peg pick-and-place, unstacking bowls, and flipping slender objects, TAVI achieves a success rate of 73% using our four-fingered Allegro robot hand. The increase in performance is 108% higher than policies using tactile and vision-based rewards and 135% higher than policies without tactile observational input. Robot videos are best viewed on our project website: https://see-to-touch.github.io/.

Learning to Drive Anywhere

  • paper_url: http://arxiv.org/abs/2309.12295
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Ruizhao Zhu, Peng Huang, Eshed Ohn-Bar, Venkatesh Saligrama
  • for: 这个研究旨在开发一个可以在不同地理位置和规律下适应驾驶决策的自动驾驶模型。
  • methods: 这个研究使用了一种名为conditional imitation learning(CIL)的方法,并将高容量的地图位置基于的频道对应 Mechanism引入,以有效地适应地方特点,同时也能够模型区域间的相似性。
  • results: 研究发现,使用AnyD模型可以在多个数据集、城市和扩展方法(如中央、半监督和分布式训练)中表现出色,比基线CIL模型高出14%以上在开 Loop评估中和30%以上在关闭 Loop测试中。
    Abstract Human drivers can seamlessly adapt their driving decisions across geographical locations with diverse conditions and rules of the road, e.g., left vs. right-hand traffic. In contrast, existing models for autonomous driving have been thus far only deployed within restricted operational domains, i.e., without accounting for varying driving behaviors across locations or model scalability. In this work, we propose AnyD, a single geographically-aware conditional imitation learning (CIL) model that can efficiently learn from heterogeneous and globally distributed data with dynamic environmental, traffic, and social characteristics. Our key insight is to introduce a high-capacity geo-location-based channel attention mechanism that effectively adapts to local nuances while also flexibly modeling similarities among regions in a data-driven manner. By optimizing a contrastive imitation objective, our proposed approach can efficiently scale across inherently imbalanced data distributions and location-dependent events. We demonstrate the benefits of our AnyD agent across multiple datasets, cities, and scalable deployment paradigms, i.e., centralized, semi-supervised, and distributed agent training. Specifically, AnyD outperforms CIL baselines by over 14% in open-loop evaluation and 30% in closed-loop testing on CARLA.
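A hedged sketch of what a geo-location-based channel attention module could look like: a per-region embedding gates the feature channels so one shared network can specialize locally. The dimensions, the embedding lookup, and the sigmoid gate are assumptions drawn from the abstract, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GeoChannelAttention(nn.Module):
    """Location-conditioned channel attention: a learned region embedding is
    mapped to per-channel gates in (0, 1) that rescale the feature map."""
    def __init__(self, num_regions, channels, emb_dim=32):
        super().__init__()
        self.region_emb = nn.Embedding(num_regions, emb_dim)
        self.gate = nn.Sequential(nn.Linear(emb_dim, channels), nn.Sigmoid())

    def forward(self, feats, region_id):
        # feats: (batch, channels, H, W); region_id: (batch,)
        g = self.gate(self.region_emb(region_id))   # (batch, channels)
        return feats * g[:, :, None, None]

attn = GeoChannelAttention(num_regions=10, channels=64)
out = attn(torch.randn(2, 64, 8, 8), torch.tensor([3, 7]))
print(out.shape)  # torch.Size([2, 64, 8, 8])
```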

The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”

  • paper_url: http://arxiv.org/abs/2309.12288
  • repo_url: https://github.com/lukasberglund/reversal_curse
  • paper_authors: Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans
  • for: The paper reveals a surprising failure of generalization in auto-regressive large language models (LLMs) when it comes to reversals of statements.
  • methods: The authors train and fine-tune GPT-3 and Llama-1 on fictitious statements, and separately evaluate models on questions about real-world celebrities.
  • results: The models fail to answer correctly when the information is presented in the reverse of the training order, demonstrating a basic failure of logical deduction. GPT-4 correctly answers questions about real-world celebrities 79% of the time in the forward direction, but only 33% of the time in reverse. This failure, dubbed the "Reversal Curse", is robust across model sizes and model families. (A data-construction sketch follows the abstract below.)
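A minimal sketch of how the probe data is structured. The fictitious composer example comes from the abstract; the question templates are illustrative. Models fine-tuned on the training sentence answer the forward question but fail the reverse one, and evaluation would compare the model's likelihood of the correct completion against random names.

```python
# Build forward/reverse probes for the Reversal Curse experiment.
facts = [("Uriah Hawthorne", "Abyssal Melodies")]

def build_probe(person, work):
    train_text = f"{person} is the composer of '{work}'."
    forward_q = (f"Who is {person}?", f"the composer of '{work}'")
    reverse_q = (f"Who composed '{work}'?", person)  # this is what fails
    return train_text, forward_q, reverse_q

for person, work in facts:
    train_text, forward_q, reverse_q = build_probe(person, work)
    print("train:  ", train_text)
    print("forward:", forward_q)
    print("reverse:", reverse_q)
```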

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.12284
  • repo_url: https://github.com/meta-math/MetaMath
  • paper_authors: Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu
  • for: improving the mathematical reasoning ability of large language models (LLMs), which existing open-source LLMs (e.g., LLaMA-2) still handle poorly due to the complex reasoning procedures involved
  • methods: mathematical questions are bootstrapped by rewriting each question from multiple perspectives without extra knowledge, producing a new dataset called MetaMathQA; LLaMA-2 models are then fine-tuned on MetaMathQA (a prompting sketch follows the abstract below)
  • results: on the GSM8K and MATH benchmarks for mathematical reasoning, MetaMath outperforms a suite of open-source LLMs by a significant margin; MetaMath-7B achieves 66.4% on GSM8K and 19.4% on MATH, exceeding state-of-the-art models of the same size by 11.5% and 8.7%, and MetaMath-70B achieves 82.3% on GSM8K, slightly better than GPT-3.5-Turbo; the MetaMathQA dataset, the MetaMath models, and the training code are released publicly
    Abstract Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite the great success, most existing open-source LLMs (e.g., LLaMA-2) are still far away from satisfactory for solving mathematical problem due to the complex reasoning procedures. To bridge this gap, we propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions by rewriting the question from multiple perspectives without extra knowledge, which results in a new dataset called MetaMathQA. Then we fine-tune the LLaMA-2 models on MetaMathQA. Experimental results on two popular benchmarks (i.e., GSM8K and MATH) for mathematical reasoning demonstrate that MetaMath outperforms a suite of open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%. Particularly, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release all the MetaMathQA dataset, the MetaMath models with different model sizes and the training code for public use.
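A hedged sketch of question bootstrapping: rewrite one seed problem from multiple perspectives to grow a MetaMathQA-style dataset. `ask_llm` is a hypothetical helper (substitute any chat-completion client), and the rewrite prompts are illustrative rather than the paper's exact templates.

```python
def ask_llm(prompt: str) -> str:
    # Placeholder: wire up a real LLM client here.
    return f"[LLM rewrite of]: {prompt.splitlines()[-1]}"

REWRITE_PROMPTS = [
    "Rephrase this math problem without changing its answer:\n{q}",
    "Rewrite this problem backward: make one given quantity the unknown and "
    "include the original answer as a given:\n{q}",
]

def bootstrap_questions(question: str) -> list[str]:
    """Generate new training questions from one seed, one per perspective."""
    return [ask_llm(p.format(q=question)) for p in REWRITE_PROMPTS]

seed = "Tom has 3 boxes with 12 apples each. How many apples does he have?"
for new_q in bootstrap_questions(seed):
    print(new_q)
```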

LLMR: Real-time Prompting of Interactive Worlds using Large Language Models

  • paper_url: http://arxiv.org/abs/2309.12276
  • repo_url: https://github.com/asem010/legend-pice
  • paper_authors: Fernanda De La Torre, Cathy Mengying Fang, Han Huang, Andrzej Banburski-Fahey, Judith Amores Fernandez, Jaron Lanier
  • for: This work presents Large Language Model for Mixed Reality (LLMR), a framework for the real-time creation and modification of interactive Mixed Reality experiences using LLMs.
  • methods: LLMR leverages novel strategies for difficult cases where ideal training data is scarce or the design goal requires synthesizing internal dynamics, intuitive analysis, or advanced interactivity; it relies on text interaction and the Unity game engine, incorporating techniques for scene understanding, task planning, self-debugging, and memory management.
  • results: LLMR achieves a 4x lower average error rate than standard GPT-4. The authors demonstrate cross-platform interoperability on several example worlds, evaluate LLMR on a variety of creation and modification tasks involving diverse objects, tools, and scenes, and report a usability study (N=11) in which participants had positive experiences with the system and would use it again.
    Abstract We present Large Language Model for Mixed Reality (LLMR), a framework for the real-time creation and modification of interactive Mixed Reality experiences using LLMs. LLMR leverages novel strategies to tackle difficult cases where ideal training data is scarce, or where the design goal requires the synthesis of internal dynamics, intuitive analysis, or advanced interactivity. Our framework relies on text interaction and the Unity game engine. By incorporating techniques for scene understanding, task planning, self-debugging, and memory management, LLMR outperforms the standard GPT-4 by 4x in average error rate. We demonstrate LLMR's cross-platform interoperability with several example worlds, and evaluate it on a variety of creation and modification tasks to show that it can produce and edit diverse objects, tools, and scenes. Finally, we conducted a usability study (N=11) with a diverse set that revealed participants had positive experiences with the system and would use it again.

Enabling Quartile-based Estimated-Mean Gradient Aggregation As Baseline for Federated Image Classifications

  • paper_url: http://arxiv.org/abs/2309.12267
  • repo_url: None
  • paper_authors: Yusen Wu, Jamie Deng, Hao Chen, Phuong Nguyen, Yelena Yesha
  • for: Proposing a new method that addresses the data heterogeneity and security challenges of federated learning, while providing a fundamental reference point (baseline) for advanced aggregation techniques.
  • methods: Estimated Mean Aggregation (EMA), which handles malicious outliers through trimmed means and uncovers data heterogeneity, ensuring that trained models adapt across diverse client datasets.
  • results: Extensive experiments show that EMA consistently achieves high accuracy and area under the curve (AUC) compared to alternative methods, establishing it as a robust baseline for evaluating the effectiveness and security of FL aggregation methods.
    Abstract Federated Learning (FL) has revolutionized how we train deep neural networks by enabling decentralized collaboration while safeguarding sensitive data and improving model performance. However, FL faces two crucial challenges: the diverse nature of data held by individual clients and the vulnerability of the FL system to security breaches. This paper introduces an innovative solution named Estimated Mean Aggregation (EMA) that not only addresses these challenges but also provides a fundamental reference point as a $\mathsf{baseline}$ for advanced aggregation techniques in FL systems. EMA's significance lies in its dual role: enhancing model security by effectively handling malicious outliers through trimmed means and uncovering data heterogeneity to ensure that trained models are adaptable across various client datasets. Through a wealth of experiments, EMA consistently demonstrates high accuracy and area under the curve (AUC) compared to alternative methods, establishing itself as a robust baseline for evaluating the effectiveness and security of FL aggregation methods. EMA's contributions thus offer a crucial step forward in advancing the efficiency, security, and versatility of decentralized deep learning in the context of FL.
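The core idea, robust aggregation via trimmed means, can be sketched in a few lines of NumPy. This is a minimal coordinate-wise illustration under assumed shapes, not the exact EMA estimator:

```python
import numpy as np

def trimmed_mean_aggregate(client_grads, trim_ratio=0.1):
    """Coordinate-wise trimmed mean over client gradient updates.

    client_grads: array of shape (n_clients, n_params).
    trim_ratio: fraction of the most extreme values dropped at each end
    per coordinate, which suppresses malicious outliers before averaging.
    """
    grads = np.sort(client_grads, axis=0)   # sort each coordinate across clients
    k = int(grads.shape[0] * trim_ratio)    # number of values trimmed per side
    if k > 0:
        grads = grads[k:-k]
    return grads.mean(axis=0)               # robust server-side aggregate

# usage: server_update = trimmed_mean_aggregate(np.stack(updates), 0.1)
```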

SALSA-CLRS: A Sparse and Scalable Benchmark for Algorithmic Reasoning

  • paper_url: http://arxiv.org/abs/2309.12253
  • repo_url: https://github.com/jkminder/salsa-clrs
  • paper_authors: Julian Minder, Florian Grötschla, Joël Mathys, Roger Wattenhofer
  • for: Extending the CLRS algorithmic learning benchmark with an emphasis on scalability and the use of sparse representations.
  • methods: Adapts algorithms from the original CLRS benchmark and introduces new problems drawn from distributed and randomized algorithms.
  • results: A thorough empirical evaluation shows that SALSA-CLRS scales better and exploits sparse representations more effectively than the original CLRS execution model.
    Abstract We introduce an extension to the CLRS algorithmic learning benchmark, prioritizing scalability and the utilization of sparse representations. Many algorithms in CLRS require global memory or information exchange, mirrored in its execution model, which constructs fully connected (not sparse) graphs based on the underlying problem. Despite CLRS's aim of assessing how effectively learned algorithms can generalize to larger instances, the existing execution model becomes a significant constraint due to its demanding memory requirements and runtime (hard to scale). However, many important algorithms do not demand a fully connected graph; these algorithms, primarily distributed in nature, align closely with the message-passing paradigm employed by Graph Neural Networks. Hence, we propose SALSA-CLRS, an extension of the current CLRS benchmark specifically with scalability and sparseness in mind. Our approach includes adapted algorithms from the original CLRS benchmark and introduces new problems from distributed and randomized algorithms. Moreover, we perform a thorough empirical evaluation of our benchmark. Code is publicly available at https://github.com/jkminder/SALSA-CLRS.

Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection

  • paper_url: http://arxiv.org/abs/2309.12247
  • repo_url: https://github.com/ictmcg/arg
  • paper_authors: Beizhe Hu, Qiang Sheng, Juan Cao, Yuhui Shi, Yang Li, Danding Wang, Peng Qi
  • for: Investigating whether and how large language models (LLMs) can help detect fake news.
  • methods: An adaptive rationale guidance network (ARG) in which a fine-tuned small language model (BERT) selectively acquires insights from LLM-generated rationales.
  • results: Experiments show that ARG and its distilled, rationale-free variant ARG-D outperform three types of baselines (SLM-based, LLM-based, and combinations of the two), with ARG-D serving cost-sensitive scenarios that cannot query an LLM.
    Abstract Detecting fake news requires both a delicate sense of diverse clues and a profound understanding of the real-world background, which remains challenging for detectors based on small language models (SLMs) due to their knowledge and capability limitations. Recent advances in large language models (LLMs) have shown remarkable performance in various tasks, but whether and how LLMs could help with fake news detection remains underexplored. In this paper, we investigate the potential of LLMs in fake news detection. First, we conduct an empirical study and find that a sophisticated LLM such as GPT-3.5 could generally expose fake news and provide desirable multi-perspective rationales but still underperforms the basic SLM, fine-tuned BERT. Our subsequent analysis attributes such a gap to the LLM's inability to select and integrate rationales properly to reach a conclusion. Based on these findings, we propose that current LLMs may not substitute for fine-tuned SLMs in fake news detection but can be a good advisor for SLMs by providing multi-perspective instructive rationales. To instantiate this proposal, we design an adaptive rationale guidance network for fake news detection (ARG), in which SLMs selectively acquire insights on news analysis from the LLMs' rationales. We further derive a rationale-free version of ARG by distillation, namely ARG-D, which serves cost-sensitive scenarios without querying LLMs. Experiments on two real-world datasets demonstrate that ARG and ARG-D outperform three types of baseline methods, including SLM-based, LLM-based, and combinations of small and large language models.
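The mechanism of an SLM selectively absorbing LLM rationales can be sketched as a gated fusion of two encodings. The module below is an illustrative reading, not the paper's actual ARG architecture; names and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class RationaleGuidedClassifier(nn.Module):
    """Sketch: fuse a news encoding with an LLM rationale encoding
    through a learned gate, then classify real vs. fake."""
    def __init__(self, dim=768):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.classifier = nn.Linear(dim, 2)

    def forward(self, news_emb, rationale_emb):
        # g decides, per dimension, how much rationale signal to admit
        g = self.gate(torch.cat([news_emb, rationale_emb], dim=-1))
        fused = news_emb + g * rationale_emb
        return self.classifier(fused)
```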

ChaCha: Leveraging Large Language Models to Prompt Children to Share Their Emotions about Personal Events

  • paper_url: http://arxiv.org/abs/2309.12244
  • repo_url: None
  • paper_authors: Woosuk Seo, Chanmo Yang, Young-Ho Kim
  • for: Helping children learn to identify and express emotions by sharing stories and feelings about personal events.
  • methods: A chatbot that combines a state machine with large language models (LLMs) to keep the dialogue on track while carrying on free-form conversations.
  • results: In an exploratory study with 20 children (aged 8-12), participants perceived ChaCha as a close friend and shared stories on various topics, such as family trips and personal achievements.
    Abstract Children typically learn to identify and express emotions through sharing their stories and feelings with others, particularly their family. However, it is challenging for parents or siblings to have emotional communication with children since children are still developing their communication skills. We present ChaCha, a chatbot that encourages and guides children to share personal events and associated emotions. ChaCha combines a state machine and large language models (LLMs) to keep the dialogue on track while carrying on free-form conversations. Through an exploratory study with 20 children (aged 8-12), we examine how ChaCha prompts children to share personal events and guides them to describe associated emotions. Participants perceived ChaCha as a close friend and shared their stories on various topics, such as family trips and personal achievements. Based on the quantitative and qualitative findings, we discuss opportunities for leveraging LLMs to design child-friendly chatbots to support children in sharing their emotions.

Explainable Artificial Intelligence for Drug Discovery and Development – A Comprehensive Survey

  • paper_url: http://arxiv.org/abs/2309.12177
  • repo_url: None
  • paper_authors: Roohallah Alizadehsani, Sadiq Hussain, Rene Ripardo Calixto, Victor Hugo C. de Albuquerque, Mohamad Roshanzamir, Mohamed Rahouti, Senthil Kumar Jagatheesaperumal
  • for: Providing a comprehensive overview of the current state of Explainable Artificial Intelligence (XAI) in drug discovery, including the various XAI methods, their applications, and the challenges and limitations of XAI techniques in this domain.
  • methods: Reviews the application of XAI techniques across drug discovery tasks, including target identification, compound design, and toxicity prediction.
  • results: Summarizes the current state of XAI in drug discovery, highlights the challenges and limitations of existing XAI techniques, and suggests potential future research directions for applying XAI in the field.
    Abstract The field of drug discovery has experienced a remarkable transformation with the advent of artificial intelligence (AI) and machine learning (ML) technologies. However, as these AI and ML models are becoming more complex, there is a growing need for transparency and interpretability of the models. Explainable Artificial Intelligence (XAI) is a novel approach that addresses this issue and provides a more interpretable understanding of the predictions made by machine learning models. In recent years, there has been an increasing interest in the application of XAI techniques to drug discovery. This review article provides a comprehensive overview of the current state-of-the-art in XAI for drug discovery, including various XAI methods, their application in drug discovery, and the challenges and limitations of XAI techniques in drug discovery. The article also covers the application of XAI in drug discovery, including target identification, compound design, and toxicity prediction. Furthermore, the article suggests potential future research directions for the application of XAI in drug discovery. The aim of this review article is to provide a comprehensive understanding of the current state of XAI in drug discovery and its potential to transform the field.

SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap

  • paper_url: http://arxiv.org/abs/2309.12382
  • repo_url: None
  • paper_authors: Daehee Kim, Yoonsik Kim, DongHyun Kim, Yumin Lim, Geewook Kim, Taeho Kil
  • for: Improving language-model-based pre-training and applying it to visual document understanding.
  • methods: A pre-training method called SCOB that uses character-wise supervised contrastive learning with online text rendering to bridge the gap between the document and scene text domains.
  • results: Experiments show that SCOB generally improves vanilla pre-training methods and performs comparably to state-of-the-art approaches, suggesting it can serve read-type pre-training broadly.
    Abstract Inspired by the great success of language model (LM)-based pre-training, recent studies in visual document understanding have explored LM-based pre-training methods for modeling text within document images. Among them, pre-training that reads all text from an image has shown promise, but often exhibits instability and even fails when applied to broader domains, such as those involving both visual documents and scene text images. This is a substantial limitation for real-world scenarios, where the processing of text image inputs in diverse domains is essential. In this paper, we investigate effective pre-training tasks in the broader domains and also propose a novel pre-training method called SCOB that leverages character-wise supervised contrastive learning with online text rendering to effectively pre-train document and scene text domains by bridging the domain gap. Moreover, SCOB enables weakly supervised learning, significantly reducing annotation costs. Extensive benchmarks demonstrate that SCOB generally improves vanilla pre-training methods and achieves comparable performance to state-of-the-art methods. Our findings suggest that SCOB can be served generally and effectively for read-type pre-training methods. The code will be available at https://github.com/naver-ai/scob.

Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features

  • paper_url: http://arxiv.org/abs/2309.12140
  • repo_url: None
  • paper_authors: Travis Zhang, Katie Luo, Cheng Perng Phoo, Yurong You, Wei-Lun Chao, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger
  • for: Improving the accuracy and generalization of 3D object detection systems for self-driving cars.
  • methods: Uses unlabeled repeated traversals of multiple locations to adapt object detectors to new driving environments, with statistics computed from repeated LiDAR scans guiding the adaptation process.
  • results: Achieves significant improvements in detection performance, up to 20 points, especially for pedestrians and distant objects, through spatially quantized historical features and a lightweight regression head.
    Abstract The rapid development of 3D object detection systems for self-driving cars has significantly improved accuracy. However, these systems struggle to generalize across diverse driving environments, which can lead to safety-critical failures in detecting traffic participants. To address this, we propose a method that utilizes unlabeled repeated traversals of multiple locations to adapt object detectors to new driving environments. By incorporating statistics computed from repeated LiDAR scans, we guide the adaptation process effectively. Our approach enhances LiDAR-based detection models using spatial quantized historical features and introduces a lightweight regression head to leverage the statistics for feature regularization. Additionally, we leverage the statistics for a novel self-training process to stabilize the training. The framework is detector model-agnostic and experiments on real-world datasets demonstrate significant improvements, achieving up to a 20-point performance gain, especially in detecting pedestrians and distant objects. Code is available at https://github.com/zhangtravis/Hist-DA.
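The statistics computed from repeated LiDAR traversals can be pictured as per-voxel persistence counts: returns that recur across drives belong to static background, while transient returns hint at movable objects. A toy sketch under assumed inputs (the paper's spatially quantized historical features are richer than this):

```python
import numpy as np

def historical_hit_counts(traversals, voxel_size=1.0):
    """Count, per voxel, how many past traversals observed a LiDAR
    return there. Persistent structures (walls, poles) score high;
    transient objects (pedestrians, cars) score low.

    traversals: list of (N_i, 3) point clouds from repeated drives.
    Returns a dict mapping voxel index -> number of traversals with hits.
    """
    counts = {}
    for points in traversals:
        voxels = set(map(tuple, np.floor(points / voxel_size).astype(int)))
        for v in voxels:
            counts[v] = counts.get(v, 0) + 1
    return counts
```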

On the relationship between Benchmarking, Standards and Certification in Robotics and AI

  • paper_url: http://arxiv.org/abs/2309.12139
  • repo_url: None
  • paper_authors: Alan F. T. Winfield, Matthew Studley
  • for: Examining benchmarking, standards, and certification as closely related processes underpinning responsible innovation in robotics and AI.
  • methods: Develops these themes through examples drawn from benchmarking, standards, and certification practice.
  • results: Argues that these three linked processes are not only useful but vital to the broader practice of Responsible Innovation.
    Abstract Benchmarking, standards and certification are closely related processes. Standards can provide normative requirements that robotics and AI systems may or may not conform to. Certification generally relies upon conformance with one or more standards as the key determinant of granting a certificate to operate. And benchmarks are sets of standardised tests against which robots and AI systems can be measured. Benchmarks therefore can be thought of as informal standards. In this paper we will develop these themes with examples from benchmarking, standards and certification, and argue that these three linked processes are not only useful but vital to the broader practice of Responsible Innovation.

OSN-MDAD: Machine Translation Dataset for Arabic Multi-Dialectal Conversations on Online Social Media

  • paper_url: http://arxiv.org/abs/2309.12137
  • repo_url: None
  • paper_authors: Fatimah Alzamzami, Abdulmotaleb El Saddik
  • for: Improving machine translation for dialectal Arabic to meet the linguistic needs of social media platforms.
  • methods: A contextual translation strategy that translates English tweets into four Arabic dialects: Gulf, Yemeni, Iraqi, and Levantine.
  • results: Neural machine translation models developed for the four dialects validate the authenticity and effectiveness of the dataset.
    Abstract While resources for English language are fairly sufficient to understand content on social media, similar resources in Arabic are still immature. The main reason that the resources in Arabic are insufficient is that Arabic has many dialects in addition to the standard version (MSA). Arabs do not use MSA in their daily communications; rather, they use dialectal versions. Unfortunately, social users transfer this phenomenon into their use of social media platforms, which in turn has raised an urgent need for building suitable AI models for language-dependent applications. Existing machine translation (MT) systems designed for MSA fail to work well with Arabic dialects. In light of this, it is necessary to adapt to the informal nature of communication on social networks by developing MT systems that can effectively handle the various dialects of Arabic. Unlike for MSA that shows advanced progress in MT systems, little effort has been exerted to utilize Arabic dialects for MT systems. While few attempts have been made to build translation datasets for dialectal Arabic, they are domain dependent and are not OSN cultural-language friendly. In this work, we attempt to alleviate these limitations by proposing an online social network-based multidialect Arabic dataset that is crafted by contextually translating English tweets into four Arabic dialects: Gulf, Yemeni, Iraqi, and Levantine. To perform the translation, we followed our proposed guideline framework for content translation, which could be universally applicable for translation between foreign languages and local dialects. We validated the authenticity of our proposed dataset by developing neural MT models for four Arabic dialects. Our results have shown a superior performance of our NMT models trained using our dataset. We believe that our dataset can reliably serve as an Arabic multidialectal translation dataset for informal MT tasks.

A knowledge representation approach for construction contract knowledge modeling

  • paper_url: http://arxiv.org/abs/2309.12132
  • repo_url: None
  • paper_authors: Chunmo Zheng, Saika Wong, Xing Su, Yinqiu Tang
  • for: Using large language models (LLMs) to automate construction contract management, reducing human error, time, and cost.
  • methods: A Nested Contract Knowledge Graph (NCKG) knowledge representation that encodes expert contract knowledge in a structured, nested form to constrain inaccurate or misleading LLM-generated content.
  • results: An LLM-assisted contract review pipeline enhanced with external knowledge from the NCKG achieves promising performance in contract risk review, reducing contract risk through more reliable and interpretable contract management.
    Abstract The emergence of large language models (LLMs) presents an unprecedented opportunity to automate construction contract management, reducing human errors and saving significant time and costs. However, LLMs may produce convincing yet inaccurate and misleading content due to a lack of domain expertise. To address this issue, expert-driven contract knowledge can be represented in a structured manner to constrain the automatic contract management process. This paper introduces the Nested Contract Knowledge Graph (NCKG), a knowledge representation approach that captures the complexity of contract knowledge using a nested structure. It includes a nested knowledge representation framework, a NCKG ontology built on the framework, and an implementation method. Furthermore, we present the LLM-assisted contract review pipeline enhanced with external knowledge in NCKG. Our pipeline achieves a promising performance in contract risk reviewing, shedding light on the combination of LLM and KG towards more reliable and interpretable contract management.
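A nested representation in this sense treats a whole statement as something other statements can reference. A toy sketch of such nestable triples (field names are illustrative assumptions, not the NCKG ontology):

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Triple:
    """A contract statement whose subject or object may itself be a
    statement, letting contractual conditions nest arbitrarily."""
    subject: Union[str, "Triple"]
    predicate: str
    obj: Union[str, "Triple"]

# e.g., "the contractor is entitled to payment unless notice is given
# within 14 days" as one triple nested inside another:
inner = Triple("notice", "given_within", "14_days")
outer = Triple(Triple("contractor", "entitled_to", "payment"), "unless", inner)
```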

Incentivizing Massive Unknown Workers for Budget-Limited Crowdsensing: From Off-Line and On-Line Perspectives

  • paper_url: http://arxiv.org/abs/2309.12113
  • repo_url: None
  • paper_authors: Feng Li, Yuqi Chai, Huan Yang, Pengfei Hu, Lingjie Duan
  • for: Incentivizing massive numbers of unknown workers in crowdsensing, addressing the challenges of a limited budget and a dynamic worker population.
  • methods: A Context-Aware Combinatorial Multi-Armed Bandit (CACI) mechanism that exploits the exploration-exploitation trade-off in a partitioned context space rather than over individual workers, in both off-line and on-line settings.
  • results: Theoretical upper bounds on the regrets, proofs of truthfulness and individual rationality, and experiments demonstrating the effectiveness of the mechanisms.
    Abstract Although the uncertainties of the workers can be addressed by the standard Combinatorial Multi-Armed Bandit (CMAB) framework in existing proposals through a trade-off between exploration and exploitation, we may not have sufficient budget to enable the trade-off among the individual workers, especially when the number of the workers is huge while the budget is limited. Moreover, the standard CMAB usually assumes the workers always stay in the system, whereas the workers may join in or depart from the system over time, such that what we have learnt for an individual worker cannot be applied after the worker leaves. To address the above challenging issues, in this paper, we first propose an off-line Context-Aware CMAB-based Incentive (CACI) mechanism. We innovate in leveraging the exploration-exploitation trade-off in an elaborately partitioned context space instead of the individual workers, to effectively incentivize the massive unknown workers with very limited budget. We also extend the above basic idea to the on-line setting where unknown workers may join in or depart from the systems dynamically, and propose an on-line version of the CACI mechanism. Specifically, by the exploitation-exploration trade-off in the context space, we learn to estimate the sensing ability of any unknown worker (even if it has never appeared in the system before) according to its context information. We perform rigorous theoretical analysis to reveal the upper bounds on the regrets of our CACI mechanisms and to prove their truthfulness and individual rationality, respectively. Extensive experiments on both synthetic and real datasets are also conducted to verify the efficacy of our mechanisms.
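The central trick, running the exploration-exploitation trade-off over a partitioned context space rather than over individual workers, can be sketched with a per-cell UCB index; the discretization and bonus term below are illustrative assumptions:

```python
import math

class ContextCellUCB:
    """Sketch: estimate sensing ability per context cell, not per worker,
    so a brand-new worker is scored by the cell its context falls into."""
    def __init__(self, n_cells):
        self.counts = [0] * n_cells
        self.means = [0.0] * n_cells
        self.t = 0

    def score(self, cell):
        self.t += 1
        if self.counts[cell] == 0:
            return float("inf")          # force initial exploration
        bonus = math.sqrt(2 * math.log(self.t) / self.counts[cell])
        return self.means[cell] + bonus  # optimistic UCB index for this cell

    def update(self, cell, observed_quality):
        self.counts[cell] += 1
        n = self.counts[cell]
        self.means[cell] += (observed_quality - self.means[cell]) / n
```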

PEFTT: Parameter-Efficient Fine-Tuning for low-resource Tibetan pre-trained language models

  • paper_url: http://arxiv.org/abs/2309.12109
  • repo_url: None
  • paper_authors: Zhou Mingjun, Daiqing Zhuoma, Qun Nuo, Nyima Tashi
  • for: Exploring efficient fine-tuning for low-resource Tibetan pre-trained language models, an area that has seen minimal exploration compared to high-resource languages.
  • methods: Three efficient fine-tuning strategies evaluated on the publicly available TNCC-title dataset: prompt tuning, Adapter lightweight fine-tuning, and prompt tuning combined with Adapter fine-tuning.
  • results: Experiments demonstrate significant improvements from these strategies, providing valuable insights for Tibetan language applications built on pre-trained models.
    Abstract In this era of large language models (LLMs), the traditional training of models has become increasingly unimaginable for regular users and institutions. The exploration of efficient fine-tuning for high-resource languages on these models is an undeniable trend that is gradually gaining popularity. However, there has been very little exploration for various low-resource languages, such as Tibetan. Research in Tibetan NLP is inherently scarce and limited. While there is currently no existing large language model for Tibetan due to its low-resource nature, that day will undoubtedly arrive. Therefore, research on efficient fine-tuning for low-resource language models like Tibetan is highly necessary. Our research can serve as a reference to fill this crucial gap. Efficient fine-tuning strategies for pre-trained language models (PLMs) in Tibetan have seen minimal exploration. We conducted three types of efficient fine-tuning experiments on the publicly available TNCC-title dataset: "prompt-tuning," "Adapter lightweight fine-tuning," and "prompt-tuning + Adapter fine-tuning." The experimental results demonstrate significant improvements using these methods, providing valuable insights for advancing Tibetan language applications in the context of pre-trained models.
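Prompt tuning in this sense trains only a small matrix of soft prompt embeddings prepended to a frozen model's input embeddings. A minimal PyTorch sketch, assuming a Hugging-Face-style encoder that accepts `inputs_embeds`:

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Sketch of prompt tuning: the PLM is frozen; only `prompt` trains."""
    def __init__(self, plm, n_prompt_tokens=20, dim=768):
        super().__init__()
        self.plm = plm
        for p in self.plm.parameters():
            p.requires_grad = False                  # freeze the PLM
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, dim) * 0.02)

    def forward(self, input_embeds):
        # prepend the trainable soft prompt to each sequence in the batch
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.plm(inputs_embeds=torch.cat([prompt, input_embeds], dim=1))
```

Only the prompt parameters (here 20 x 768 values) receive gradients, which is what makes the approach affordable for low-resource settings.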

Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation

  • paper_url: http://arxiv.org/abs/2309.12075
  • repo_url: https://github.com/eqtpartners/ptec
  • paper_authors: Valentin Leonhard Buchner, Lele Cao, Jan-Christoph Kalo, Vilhelm von Ehrenheim
  • for: Benchmarking the performance and computational efficiency of Prompt Tuning against baselines for multi-label text classification, applied to classifying companies into an investment firm's proprietary industry taxonomy.
  • methods: Prompt Tuning with Trie Search constrained decoding so that generated labels always match the label taxonomy, and Prompt Tuned Embedding Classification (PTEC), which replaces the PLM's language head with a classification head.
  • results: PTEC significantly improves classification performance while reducing inference costs, and its performance is consistent across both well-known and less-known companies.
    Abstract Prompt Tuning is emerging as a scalable and cost-effective method to fine-tune Pretrained Language Models (PLMs), which are often referred to as Large Language Models (LLMs). This study benchmarks the performance and computational efficiency of Prompt Tuning and baselines for multi-label text classification. This is applied to the challenging task of classifying companies into an investment firm's proprietary industry taxonomy, supporting their thematic investment strategy. Text-to-text classification is frequently reported to outperform task-specific classification heads, but has several limitations when applied to a multi-label classification problem where each label consists of multiple tokens: (a) Generated labels may not match any label in the label taxonomy; (b) The fine-tuning process lacks permutation invariance and is sensitive to the order of the provided labels; (c) The model provides binary decisions rather than appropriate confidence scores. Limitation (a) is addressed by applying constrained decoding using Trie Search, which slightly improves classification performance. All limitations (a), (b), and (c) are addressed by replacing the PLM's language head with a classification head, which is referred to as Prompt Tuned Embedding Classification (PTEC). This improves performance significantly, while also reducing computational costs during inference. In our industrial application, the training data is skewed towards well-known companies. We confirm that the model's performance is consistent across both well-known and less-known companies. Our overall results indicate the continuing need to adapt state-of-the-art methods to domain-specific tasks, even in the era of PLMs with strong generalization abilities. We release our codebase and a benchmarking dataset at https://github.com/EQTPartners/PTEC.
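Replacing the language head with a classification head, as PTEC does, makes the output order-invariant, keeps predictions inside the taxonomy, and yields per-label confidences. A minimal sketch with assumed dimensions:

```python
import torch
import torch.nn as nn

class PromptTunedEmbeddingClassifier(nn.Module):
    """Sketch: pooled PLM embedding -> multi-label classification head.
    One independent sigmoid score per industry sector, so outputs are
    permutation-invariant and always lie inside the label taxonomy."""
    def __init__(self, hidden_dim=768, n_labels=50):
        super().__init__()
        self.head = nn.Linear(hidden_dim, n_labels)

    def forward(self, pooled_embedding):
        return torch.sigmoid(self.head(pooled_embedding))

# usage: probs = model(plm_pooled_output); predicted = probs > 0.5
```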

Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam

  • paper_url: http://arxiv.org/abs/2309.12071
  • repo_url: None
  • paper_authors: Matheus L. O. Santos, Cláudio E. C. Campelo
  • for: Evaluating the performance of quantized large language models (LLMs) based on the 7- and 13-billion-parameter LLaMA models running on home hardware.
  • methods: A database of 1,006 questions from the ENEM (Brazilian National Secondary School Exam) is used to evaluate the effectiveness of the models (Alpaca, Koala, and Vicuna), along with measurements of their computational efficiency.
  • results: The best-performing models achieved an accuracy of approximately 46% on the original Portuguese questions and 49% on their English translations; the 7- and 13-billion-parameter LLMs took roughly 20 and 50 seconds, respectively, to process a query on a machine with an AMD Ryzen 5 3600x processor.
    Abstract Although Large Language Models (LLMs) represent a revolution in the way we interact with computers, allowing the construction of complex questions and the ability to reason over a sequence of statements, their use is restricted due to the need for dedicated hardware for execution. In this study, we evaluate the performance of LLMs based on the 7 and 13 billion LLaMA models, subjected to a quantization process and run on home hardware. The models considered were Alpaca, Koala, and Vicuna. To evaluate the effectiveness of these models, we developed a database containing 1,006 questions from the ENEM (Brazilian National Secondary School Exam). Our analysis revealed that the best performing models achieved an accuracy of approximately 46% for the original texts of the Portuguese questions and 49% on their English translations. In addition, we evaluated the computational efficiency of the models by measuring the time required for execution. On average, the 7 and 13 billion LLMs took approximately 20 and 50 seconds, respectively, to process the queries on a machine equipped with an AMD Ryzen 5 3600x processor.
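A measurement of this kind is easy to reproduce with a locally quantized model; the sketch below uses the llama-cpp-python bindings with a placeholder model path and prompt format (the paper does not specify its exact runtime stack):

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path to a locally quantized 7B LLaMA-family model.
llm = Llama(model_path="./models/vicuna-7b-q4.gguf", n_ctx=2048)

def answer_and_time(question):
    """Return the model's answer and the wall-clock seconds it took."""
    start = time.perf_counter()
    out = llm(f"Question: {question}\nAnswer:", max_tokens=64)
    elapsed = time.perf_counter() - start
    return out["choices"][0]["text"], elapsed
```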

  • paper_url: http://arxiv.org/abs/2309.12067
  • repo_url: None
  • paper_authors: Karolina Seweryn, Anna Wróblewska, Szymon Łukasik
  • for: Providing a comprehensive overview of action scene understanding in soccer, divided into action recognition, spotting, and spatio-temporal action localization, with particular emphasis on the modalities used and multimodal methods.
  • methods: Reviews publicly available data sources and the metrics used to evaluate model performance, and surveys recent state-of-the-art methods, covering both deep learning techniques and traditional approaches, with a focus on methods that integrate multiple sources (e.g., video and audio) or represent one source in several ways.
  • results: Discusses the advantages and limitations of existing methods and their potential for improving accuracy and robustness, and highlights open research questions and future directions, including the potential of multimodal methods to advance the field.
    Abstract Action scene understanding in soccer is a challenging task due to the complex and dynamic nature of the game, as well as the interactions between players. This article provides a comprehensive overview of this task divided into action recognition, spotting, and spatio-temporal action localization, with a particular emphasis on the modalities used and multimodal methods. We explore the publicly available data sources and metrics used to evaluate models' performance. The article reviews recent state-of-the-art methods that leverage deep learning techniques and traditional methods. We focus on multimodal methods, which integrate information from multiple sources, such as video and audio data, and also those that represent one source in various ways. The advantages and limitations of methods are discussed, along with their potential for improving the accuracy and robustness of models. Finally, the article highlights some of the open research questions and future directions in the field of soccer action recognition, including the potential for multimodal methods to advance this field. Overall, this survey provides a valuable resource for researchers interested in the field of action scene understanding in soccer.

An Efficient Consolidation of Word Embedding and Deep Learning Techniques for Classifying Anticancer Peptides: FastText+BiLSTM

  • paper_url: http://arxiv.org/abs/2309.12058
  • repo_url: None
  • paper_authors: Onur Karakaya, Zeynep Hilal Kilimci
  • for: Developing a high-accuracy prediction model for classifying anticancer peptides (ACPs).
  • methods: Word2Vec and FastText are used as word embedding techniques for peptide sequences, whose outputs are fed into CNN, LSTM, and BiLSTM deep learning models for classification.
  • results: The proposed FastText+BiLSTM combination achieves 92.50% accuracy on the ACPs250 dataset and 96.15% on the Independent dataset, establishing a new state of the art on these widely used benchmarks.
    Abstract Anticancer peptides (ACPs) are a group of peptides that exhibit antineoplastic properties. The utilization of ACPs in cancer prevention can present a viable substitute for conventional cancer therapeutics, as they possess a higher degree of selectivity and safety. Recent scientific advancements have generated interest in peptide-based therapies, which offer the advantage of efficiently treating intended cells without negatively impacting normal cells. However, as the number of peptide sequences continues to increase rapidly, developing a reliable and precise prediction model becomes a challenging task. In this work, our motivation is to advance an efficient model for categorizing anticancer peptides employing the consolidation of word embedding and deep learning models. First, Word2Vec and FastText are evaluated as word embedding techniques for the purpose of extracting peptide sequences. Then, the outputs of the word embedding models are fed into the deep learning approaches CNN, LSTM, and BiLSTM. To demonstrate the contribution of the proposed framework, extensive experiments are carried out on widely-used datasets in the literature, ACPs250 and Independent. Experimental results show that the usage of the proposed model enhances classification accuracy compared to state-of-the-art studies. The proposed combination, FastText+BiLSTM, exhibits 92.50% accuracy on the ACPs250 dataset and 96.15% accuracy on the Independent dataset, thereby establishing a new state of the art.
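The winning FastText+BiLSTM combination can be sketched as pretrained sequence embeddings feeding a bidirectional LSTM with a binary head. A minimal PyTorch sketch with assumed dimensions and tokenization:

```python
import torch
import torch.nn as nn

class FastTextBiLSTM(nn.Module):
    """Sketch: FastText vectors -> BiLSTM -> binary ACP / non-ACP logit."""
    def __init__(self, fasttext_vectors, hidden=128):
        super().__init__()
        # fasttext_vectors: (vocab_size, emb_dim) tensor of pretrained vectors
        self.embed = nn.Embedding.from_pretrained(fasttext_vectors, freeze=False)
        self.lstm = nn.LSTM(fasttext_vectors.size(1), hidden,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))  # (batch, seq, 2*hidden)
        return self.out(h[:, -1])                # logit from the final step
```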

BELT:Bootstrapping Electroencephalography-to-Language Decoding and Zero-Shot Sentiment Classification by Natural Language Supervision

  • paper_url: http://arxiv.org/abs/2309.12056
  • repo_url: None
  • paper_authors: Jinzhao Zhou, Yiqun Duan, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin
  • for: Proposing a new model and learning framework for translating noninvasive brain signals (EEG) into natural language.
  • methods: Bootstraps EEG representation learning with large pre-trained language models (LMs), using a contrastive learning step to obtain semantically meaningful EEG representations.
  • results: State-of-the-art results on two brain decoding tasks, surpassing baselines by 5.45% and over 10%, with a 42.31% BLEU-1 score on brain-to-language translation and 67.32% precision on zero-shot sentiment classification.
    Abstract This paper presents BELT, a novel model and learning framework for the pivotal topic of brain-to-language translation research. The translation from noninvasive brain signals into readable natural language has the potential to promote the application scenario as well as the development of brain-computer interfaces (BCI) as a whole. The critical problem in brain signal decoding or brain-to-language translation is the acquisition of semantically appropriate and discriminative EEG representation from a dataset of limited scale and quality. The proposed BELT method is a generic and efficient framework that bootstraps EEG representation learning using off-the-shelf large-scale pretrained language models (LMs). With a large LM's capacity for understanding semantic information and zero-shot generalization, BELT utilizes large LMs trained on Internet-scale datasets to bring significant improvements to the understanding of EEG signals. In particular, the BELT model is composed of a deep conformer encoder and a vector quantization encoder. Semantical EEG representation is achieved by a contrastive learning step that provides natural language supervision. We achieve state-of-the-art results on two featured brain decoding tasks, including brain-to-language translation and zero-shot sentiment classification. Specifically, our model surpasses the baseline model on both tasks by 5.45% and over 10% and achieves a 42.31% BLEU-1 score and 67.32% precision on the main evaluation metrics for translation and zero-shot sentiment classification respectively.
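Natural-language supervision via contrastive learning typically takes the form of a CLIP-style InfoNCE objective between paired EEG and text embeddings. A minimal sketch of that objective (BELT's exact loss and encoders are not specified in the abstract):

```python
import torch
import torch.nn.functional as F

def contrastive_eeg_text_loss(eeg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: matched EEG/text pairs on the diagonal are
    pulled together; all other pairings in the batch are pushed apart."""
    eeg = F.normalize(eeg_emb, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    logits = eeg @ txt.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(len(eeg), device=eeg.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```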

SCVCNet: Sliding cross-vector convolution network for cross-task and inter-individual-set EEG-based cognitive workload recognition

  • paper_url: http://arxiv.org/abs/2310.03749
  • repo_url: None
  • paper_authors: Qi Wang, Li Chen, Zhiyuan Zhan, Jianhua Zhang, Zhong Yin
  • for: A generic approach to cognitive workload recognition that exploits common electroencephalogram (EEG) patterns shared across different human-machine tasks and individual sets.
  • methods: A neural network called SCVCNet that removes task- and individual-set-related interference by analyzing finer-grained frequency structures in the EEG power spectral densities, using a sliding cross-vector convolution (SCVC) operation over paired theta- and alpha-power input layers.
  • results: Trained with the regularized least-square method with ridge regression and extreme learning machine theory, and validated on three databases of distinct tasks performed by independent participant groups; average accuracies of 0.6813 and 0.6229 and F1 scores of 0.6743 and 0.6076 under two validation schemes partially exceed previous work.
    Abstract This paper presents a generic approach for applying the cognitive workload recognizer by exploiting common electroencephalogram (EEG) patterns across different human-machine tasks and individual sets. We propose a neural network called SCVCNet, which eliminates task- and individual-set-related interferences in EEGs by analyzing finer-grained frequency structures in the power spectral densities. The SCVCNet utilizes a sliding cross-vector convolution (SCVC) operation, where paired input layers representing the theta and alpha power are employed. By extracting the weights from a kernel matrix's central row and column, we compute the weighted sum of the two vectors around a specified scalp location. Next, we introduce an inter-frequency-point feature integration module to fuse the SCVC feature maps. Finally, we combined the two modules with the output-channel pooling and classification layers to construct the model. To train the SCVCNet, we employ the regularized least-square method with ridge regression and the extreme learning machine theory. We validate its performance using three databases, each consisting of distinct tasks performed by independent participant groups. The average accuracy (0.6813 and 0.6229) and F1 score (0.6743 and 0.6076) achieved in two different validation paradigms show partially higher performance than the previous works. All features and algorithms are available on website:https://github.com/7ohnKeats/SCVCNet.
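Read literally, the SCVC operation weights the theta-power vector by the kernel's central row and the alpha-power vector by its central column, then sums the two around a scalp location. The sketch below is one interpretation of that description, not the authors' code:

```python
import numpy as np

def scvc(theta_vec, alpha_vec, kernel):
    """Sliding cross-vector convolution at one scalp location, read
    literally from the paper's description: weight theta power by the
    kernel's central row, alpha power by its central column, and sum."""
    c = kernel.shape[0] // 2
    row_w, col_w = kernel[c, :], kernel[:, c]
    return float(theta_vec @ row_w + alpha_vec @ col_w)

# slide a 5-sample window over scalp locations, e.g.:
# feature = scvc(theta[i:i+5], alpha[i:i+5], np.random.randn(5, 5))
```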

Uncertainty-driven Exploration Strategies for Online Grasp Learning

  • paper_url: http://arxiv.org/abs/2309.12038
  • repo_url: None
  • paper_authors: Yitian Shi, Philipp Schillinger, Miroslav Gabriel, Alexander Kuss, Zohar Feldman, Hanna Ziesche, Ngo Anh Vien
  • for: Improving the grasp success rate and adaptability of robotic bin picking.
  • methods: Online grasp learning formulated as a reinforcement learning (RL) problem, combined with several uncertainty estimation schemes that drive exploration.
  • results: Experiments on real-world bin picking scenes of varying difficulty show that the proposed approach significantly improves grasp prediction and adapts better to unseen environment settings than conventional online learning methods.
    Abstract Existing grasp prediction approaches are mostly based on offline learning, while ignoring the exploratory grasp learning during online adaptation to new picking scenarios, i.e., unseen object portfolios, camera and bin settings, etc. In this paper, we present a novel method for online learning of grasp predictions for robotic bin picking in a principled way. Specifically, the online learning algorithm with an effective exploration strategy can significantly improve its adaptation performance to unseen environment settings. To this end, we first propose to formulate online grasp learning as an RL problem that allows adapting both the grasp reward prediction and the grasp poses. We propose various uncertainty estimation schemes based on Bayesian Uncertainty Quantification and Distributional Ensembles. We carry out evaluations on real-world bin picking scenes of varying difficulty. The objects in the bin have various challenging physical and perceptual characteristics that can be characterized by semi- or total transparency, and irregular or curved surfaces. The results of our experiments demonstrate a notable improvement of the suggested approach compared to conventional online learning methods which incorporate only naive exploration strategies.
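One of the mentioned schemes, distributional ensembles, can be sketched by scoring grasp candidates with the mean prediction plus a bonus for ensemble disagreement. The model interface and the optimism rule below are assumptions:

```python
import torch

def ensemble_grasp_scores(models, observation, exploration_weight=1.0):
    """Sketch of ensemble-based exploration: rank grasp candidates by
    predicted reward plus a bonus for ensemble disagreement (epistemic
    uncertainty), so poorly-understood grasps get tried more often."""
    with torch.no_grad():
        preds = torch.stack([m(observation) for m in models])  # (M, n_grasps)
    mean = preds.mean(dim=0)
    std = preds.std(dim=0)           # disagreement = uncertainty estimate
    return mean + exploration_weight * std  # optimistic exploration score
```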

Dynamic Hypergraph Structure Learning for Traffic Flow Forecasting

  • paper_url: http://arxiv.org/abs/2309.12028
  • repo_url: None
  • paper_authors: Yusheng Zhao, Xiao Luo, Wei Ju, Chong Chen, Xian-Sheng Hua, Ming Zhang
  • for: Forecasting future traffic conditions on the basis of road networks and past traffic conditions.
  • methods: Complex spatio-temporal correlations are typically modeled with spatio-temporal graph neural networks (GNNs); this work proposes a Dynamic Hypergraph Structure Learning (DyHSL) model to capture the non-pairwise and high-order relations that plain graphs miss.
  • results: Extensive experiments on four popular traffic benchmark datasets demonstrate that DyHSL outperforms a broad range of baseline methods.
    Abstract This paper studies the problem of traffic flow forecasting, which aims to predict future traffic conditions on the basis of road networks and traffic conditions in the past. The problem is typically solved by modeling complex spatio-temporal correlations in traffic data using spatio-temporal graph neural networks (GNNs). However, the performance of these methods is still far from satisfactory since GNNs usually have limited representation capacity when it comes to complex traffic networks. Graphs, by nature, fall short in capturing non-pairwise relations. Even worse, existing methods follow the paradigm of message passing that aggregates neighborhood information linearly, which fails to capture complicated spatio-temporal high-order interactions. To tackle these issues, in this paper, we propose a novel model named Dynamic Hypergraph Structure Learning (DyHSL) for traffic flow prediction. To learn non-pairwise relationships, our DyHSL extracts hypergraph structural information to model dynamics in the traffic networks, and updates each node representation by aggregating messages from its associated hyperedges. Additionally, to capture high-order spatio-temporal relations in the road network, we introduce an interactive graph convolution block, which further models the neighborhood interaction for each node. Finally, we integrate these two views into a holistic multi-scale correlation extraction module, which conducts temporal pooling with different scales to model different temporal patterns. Extensive experiments on four popular traffic benchmark datasets demonstrate the effectiveness of our proposed DyHSL compared with a broad range of competing baselines.
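Updating each node by aggregating messages from its associated hyperedges reduces to a two-step propagation through the incidence matrix. A minimal sketch of one such step (the full DyHSL model adds dynamic structure learning and temporal modules on top):

```python
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    """Sketch of one hypergraph message-passing step:
    nodes -> hyperedges -> nodes through the incidence matrix H."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, H):
        # x: (n_nodes, dim); H: (n_nodes, n_edges) binary incidence matrix
        deg_e = H.sum(0).clamp(min=1)             # hyperedge sizes
        edge_feat = (H.t() @ x) / deg_e[:, None]  # mean of member nodes
        deg_v = H.sum(1).clamp(min=1)             # node degrees
        x_new = (H @ edge_feat) / deg_v[:, None]  # messages back to nodes
        return torch.relu(self.lin(x_new))
```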

Demystifying Visual Features of Movie Posters for Multi-Label Genre Identification

  • paper_url: http://arxiv.org/abs/2309.12022
  • repo_url: None
  • paper_authors: Utsav Kumar Nareti, Chandranath Adak, Soumi Chattopadhyay
  • for: Automated multi-label genre identification of movies from poster images.
  • methods: Deep transformer network with a probabilistic module.
  • results: Encouraging performance and outperformed some contemporary architectures in experimental analysis using 13882 posters from IMDb.
    Abstract In the film industry, movie posters have been an essential part of advertising and marketing for many decades, and continue to play a vital role even today in the form of digital posters through online, social media and OTT platforms. Typically, movie posters can effectively promote and communicate the essence of a film, such as its genre, visual style/ tone, vibe and storyline cue/ theme, which are essential to attract potential viewers. Identifying the genres of a movie often has significant practical applications in recommending the film to target audiences. Previous studies on movie genre identification are limited to subtitles, plot synopses, and movie scenes that are mostly accessible after the movie release. Posters usually contain pre-release implicit information to generate mass interest. In this paper, we work for automated multi-label genre identification only from movie poster images, without any aid of additional textual/meta-data information about movies, which is one of the earliest attempts of its kind. Here, we present a deep transformer network with a probabilistic module to identify the movie genres exclusively from the poster. For experimental analysis, we procured 13882 number of posters of 13 genres from the Internet Movie Database (IMDb), where our model performances were encouraging and even outperformed some major contemporary architectures.

Safe Hierarchical Reinforcement Learning for CubeSat Task Scheduling Based on Energy Consumption

  • paper_url: http://arxiv.org/abs/2309.12004
  • repo_url: None
  • paper_authors: Mahya Ramezani, M. Amin Alandihallaj, Jose Luis Sanchez-Lopez, Andreas Hein
  • for: Optimizing CubeSat task scheduling in Low Earth Orbit (LEO).
  • methods: A hierarchical reinforcement learning approach with a high-level policy for global task distribution and a low-level policy for real-time adaptation as a safety mechanism, using a Similarity Attention-based Encoder (SABE) for task prioritization and an MLP estimator for energy consumption forecasting.
  • results: Simulations across multiple CubeSat configurations show superior convergence and task success rates compared with a MADDPG model and traditional random scheduling.
    Abstract This paper presents a Hierarchical Reinforcement Learning methodology tailored for optimizing CubeSat task scheduling in Low Earth Orbits (LEO). Incorporating a high-level policy for global task distribution and a low-level policy for real-time adaptations as a safety mechanism, our approach integrates the Similarity Attention-based Encoder (SABE) for task prioritization and an MLP estimator for energy consumption forecasting. Integrating this mechanism creates a safe and fault-tolerant system for CubeSat task scheduling. Simulation results validate the superior convergence and task success rate of the Hierarchical Reinforcement Learning approach, outperforming both the MADDPG model and traditional random scheduling across multiple CubeSat configurations.

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

  • paper_url: http://arxiv.org/abs/2309.11998
  • repo_url: https://github.com/lm-sys/fastchat
  • paper_authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric. P Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang
  • for: The paper is written for researchers and developers who want to understand and advance the capabilities of large language models (LLMs) in real-world scenarios.
  • methods: The paper introduces a large-scale dataset called LMSYS-Chat-1M, which contains one million real-world conversations with 25 state-of-the-art LLMs. The dataset is collected from 210K unique IP addresses in the wild and includes a curation process, basic statistics, and topic distribution.
  • results: The paper demonstrates the versatility of the dataset through four use cases: developing content moderation models, building a safety benchmark, training instruction-following models, and creating challenging benchmark questions. The dataset is publicly available and is expected to serve as a valuable resource for understanding and advancing LLM capabilities.
    Abstract Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications. In this paper, we introduce LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art LLMs. This dataset is collected from 210K unique IP addresses in the wild on our Vicuna demo and Chatbot Arena website. We offer an overview of the dataset's content, including its curation process, basic statistics, and topic distribution, highlighting its diversity, originality, and scale. We demonstrate its versatility through four use cases: developing content moderation models that perform similarly to GPT-4, building a safety benchmark, training instruction-following models that perform similarly to Vicuna, and creating challenging benchmark questions. We believe that this dataset will serve as a valuable resource for understanding and advancing LLM capabilities. The dataset is publicly available at https://huggingface.co/datasets/lmsys/lmsys-chat-1m.
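The dataset can be pulled directly with the Hugging Face `datasets` library; field names below follow the dataset card, and access may require accepting the dataset's terms on the Hub:

```python
from datasets import load_dataset

# Loads the one-million-conversation dataset from the Hugging Face Hub.
ds = load_dataset("lmsys/lmsys-chat-1m", split="train")

sample = ds[0]
print(sample["model"])         # which of the 25 LLMs produced the replies
print(sample["conversation"])  # list of {"role": ..., "content": ...} turns
```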

Predictability and Comprehensibility in Post-Hoc XAI Methods: A User-Centered Analysis

  • paper_url: http://arxiv.org/abs/2309.11987
  • repo_url: None
  • paper_authors: Anahid Jalali, Bernhard Haslhofer, Simone Kriglstein, Andreas Rauber
  • for: Evaluating whether explanations of black-box machine learning model predictions improve users' ability to understand and predict model behavior.
  • methods: A user study of two widely used tools, LIME and SHAP, including the effect of counterfactual explanations and misclassifications on users' understanding and prediction of model behavior.
  • results: The comprehensibility of SHAP drops significantly for samples near a model's decision boundary, while counterfactual explanations and misclassifications significantly increase users' understanding of how the model makes decisions; design recommendations for future post-hoc explainability methods are derived from these findings.
    Abstract Post-hoc explainability methods aim to clarify predictions of black-box machine learning models. However, it is still largely unclear how well users comprehend the provided explanations and whether these increase the users' ability to predict the model behavior. We approach this question by conducting a user study to evaluate comprehensibility and predictability in two widely used tools: LIME and SHAP. Moreover, we investigate the effect of counterfactual explanations and misclassifications on users' ability to understand and predict the model behavior. We find that the comprehensibility of SHAP is significantly reduced when explanations are provided for samples near a model's decision boundary. Furthermore, we find that counterfactual explanations and misclassifications can significantly increase the users' understanding of how a machine learning model is making decisions. Based on our findings, we also derive design recommendations for future post-hoc explainability methods with increased comprehensibility and predictability.
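Both tools under study are off-the-shelf libraries. A minimal SHAP example on a tabular model looks roughly like this (the model and data here are placeholders, not the study's setup):

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# TreeExplainer yields per-feature attributions for each prediction; the
# study suggests such explanations are hardest to read for samples near
# the decision boundary.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])
print(shap_values)
```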

Representation Abstractions as Incentives for Reinforcement Learning Agents: A Robotic Grasping Case Study

  • paper_url: http://arxiv.org/abs/2309.11984
  • repo_url: https://github.com/PetropoulakisPanagiotis/igae
  • paper_authors: Panagiotis Petropoulakis, Ludwig Gräf, Josip Josifovski, Mohammadhossein Malmir, Alois Knoll
  • for: This study examines how effectively RL agents solve a robotic control task under a range of state representations.
  • methods: A continuum of state representations is evaluated, from model-based representations with full system knowledge, through hand-crafted numerical features, to image-based representations, measuring agent performance under each.
  • results: RL agents using numerical state representations perform on par with non-learning baselines, while image-based representations built from pre-trained environment embeddings outperform end-to-end trained agents in success rate and transferability.
    Abstract Choosing an appropriate representation of the environment for the underlying decision-making process of the RL agent is not always straightforward. The state representation should be inclusive enough to allow the agent to informatively decide on its actions and compact enough to increase sample efficiency for policy training. Given this outlook, this work examines the effect of various state representations in incentivizing the agent to solve a specific robotic task: antipodal and planar object grasping. A continuum of state representation abstractions is defined, starting from a model-based approach with complete system knowledge, through hand-crafted numerical features, to image-based representations with a decreasing level of induced task-specific knowledge. We examine the effects of each representation on the agent's ability to solve the task in simulation and on the transferability of the learned policy to the real robot. The results show that RL agents using numerical states can perform on par with non-learning baselines. Furthermore, we find that agents using image-based representations from pre-trained environment embedding vectors perform better than end-to-end trained agents, and hypothesize that task-specific knowledge is necessary for achieving convergence and high success rates in robot control.

Rethinking the Evaluating Framework for Natural Language Understanding in AI Systems: Language Acquisition as a Core for Future Metrics

  • paper_url: http://arxiv.org/abs/2309.11981
  • repo_url: None
  • paper_authors: Patricio Vera, Pedro Moya, Lisa Barraza
  • for: This paper examines how large language models (LLMs) have transformed natural language processing (NLP) within artificial intelligence (AI), and re-evaluates the traditional metrics of machine intelligence.
  • methods: It proposes a new evaluation framework, inspired by recent progress in language models, centered on language acquisition and understanding.
  • results: The proposed framework better assesses a machine's language acquisition and understanding abilities and helps address the limitations of traditional evaluation approaches.
    Abstract In the burgeoning field of artificial intelligence (AI), the unprecedented progress of large language models (LLMs) in natural language processing (NLP) offers an opportunity to revisit the entire approach of traditional metrics of machine intelligence, both in form and content. As the realm of machine cognitive evaluation has already reached Imitation, the next step is an efficient Language Acquisition and Understanding. Our paper proposes a paradigm shift from the established Turing Test towards an all-embracing framework that hinges on language acquisition, taking inspiration from the recent advancements in LLMs. The present contribution draws deeply on excellent work from various disciplines, points out the need to keep interdisciplinary bridges open, and delineates a more robust and sustainable approach.

Inferring Capabilities from Task Performance with Bayesian Triangulation

  • paper_url: http://arxiv.org/abs/2309.11975
  • repo_url: None
  • paper_authors: John Burden, Konstantinos Voudouris, Ryan Burnell, Danaja Rutar, Lucy Cheke, José Hernández-Orallo
  • for: This study aims to characterize machine learning systems in richer, more meaningful ways, using diverse experimental data to infer the cognitive profile of a system.
  • methods: The method, implemented with the PyMC probabilistic programming library, introduces measurement layouts that model how task-instance features interact with system capabilities, triangulating these features to infer capabilities from non-populational data.
  • results: Evaluations on 68 actual contestants in the AnimalAI Olympics and 30 synthetic agents for the O-PIAAGETS object permanence battery successfully infer distinct cognitive profiles, demonstrating the potential of capability-oriented evaluation.
    Abstract As machine learning models become more general, we need to characterise them in richer, more meaningful ways. We describe a method to infer the cognitive profile of a system from diverse experimental data. To do so, we introduce measurement layouts that model how task-instance features interact with system capabilities to affect performance. These features must be triangulated in complex ways to be able to infer capabilities from non-populational data -- a challenge for traditional psychometric and inferential tools. Using the Bayesian probabilistic programming library PyMC, we infer different cognitive profiles for agents in two scenarios: 68 actual contestants in the AnimalAI Olympics and 30 synthetic agents for O-PIAAGETS, an object permanence battery. We showcase the potential for capability-oriented evaluation.
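The capability-inference step can be illustrated with a toy PyMC model, a heavily simplified sketch rather than the paper's measurement layouts: a single latent ability interacts with known task-instance difficulties to set success probabilities, and sampling yields a posterior over the capability.

```python
# Toy capability inference in PyMC (difficulties and outcomes are made up).
import numpy as np
import pymc as pm

difficulty = np.array([0.5, 1.0, 1.5, 2.0, 2.5])  # known task-instance features
successes = np.array([1, 1, 1, 0, 0])             # observed task performance

with pm.Model():
    ability = pm.Normal("ability", mu=0.0, sigma=2.0)              # latent capability
    p = pm.Deterministic("p", pm.math.sigmoid(ability - difficulty))
    pm.Bernoulli("obs", p=p, observed=successes)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

print(float(idata.posterior["ability"].mean()))  # posterior mean capability
```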

A Comprehensive Review on Financial Explainable AI

  • paper_url: http://arxiv.org/abs/2309.11960
  • repo_url: None
  • paper_authors: Wei Jie Yeo, Wihan van der Heever, Rui Mao, Erik Cambria, Ranjan Satapathy, Gianmarco Mengaldo
  • for: Surveying and comparing explainability methods for deep learning models in finance, to improve the transparency and trustworthiness of such models.
  • methods: A comparative analysis of explainability methods for deep learning models, categorized according to their characteristics.
  • results: An assessment of these methods' transparency and trustworthiness, a discussion of the concerns and challenges of adopting explainable AI, and directions for future work.
    Abstract The success of artificial intelligence (AI), and deep learning models in particular, has led to their widespread adoption across various industries due to their ability to process huge amounts of data and learn complex patterns. However, due to their lack of explainability, there are significant concerns regarding their use in critical sectors, such as finance and healthcare, where decision-making transparency is of paramount importance. In this paper, we provide a comparative survey of methods that aim to improve the explainability of deep learning models within the context of finance. We categorize the collection of explainable AI methods according to their corresponding characteristics, and we review the concerns and challenges of adopting explainable AI methods, together with future directions we deemed appropriate and important.

On the Definition of Appropriate Trust and the Tools that Come with it

  • paper_url: http://arxiv.org/abs/2309.11937
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Helena Löfström
  • for: This paper focuses on evaluating the efficiency of human-AI interactions, specifically in terms of the human experience of explanations and the user's appropriate trust in the model.
  • methods: The paper compares the definitions of appropriate trust from the literature with model performance evaluation, and offers a novel approach to evaluating appropriate trust by taking advantage of the likenesses between definitions. The paper also provides several straightforward evaluation methods for different aspects of user performance, including measuring uncertainty and appropriate trust in regression.
  • results: The paper's main contribution is a novel approach to evaluating appropriate trust, which offers a more objective and comparative evaluation of explanation methods, together with specific evaluation methods for different aspects of user performance.
    Abstract Evaluating the efficiency of human-AI interactions is challenging, including subjective and objective quality aspects. With the focus on the human experience of the explanations, evaluations of explanation methods have become mostly subjective, making comparative evaluations almost impossible and highly linked to the individual user. However, it is commonly agreed that one aspect of explanation quality is how effectively the user can detect if the predictions are trustworthy and correct, i.e., if the explanations can increase the user's appropriate trust in the model. This paper starts with the definitions of appropriate trust from the literature. It compares the definitions with model performance evaluation, showing the strong similarities between appropriate trust and model performance evaluation. The paper's main contribution is a novel approach to evaluating appropriate trust by taking advantage of the likenesses between definitions. The paper offers several straightforward evaluation methods for different aspects of user performance, including suggesting a method for measuring uncertainty and appropriate trust in regression.

Learning to Recover for Safe Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.11907
  • repo_url: None
  • paper_authors: Haoyu Wang, Xin Yuan, Qinqing Ren
  • for: This work aims at safe learning control, with safety constraints constructed automatically by learning algorithms in sophisticated environments.
  • methods: A three-stage architecture, TU-Recovery, is proposed: a safety critic and a recovery policy are learned before task training to form a safety controller.
  • results: Experiments show that TU-Recovery outperforms its unconstrained counterpart in both reward gaining and constraint violations, and an auxiliary reward further improves its reward-to-cost ratio.
    Abstract Safety controllers are widely used to achieve safe reinforcement learning. Most methods that apply a safety controller rely on handcrafted safety constraints to construct it. However, when the environment dynamics are sophisticated, handcrafted safety constraints become unavailable, so it is worthwhile to construct safety controllers with learning algorithms. We propose a three-stage architecture for safe reinforcement learning, namely the TU-Recovery Architecture. A safety critic and a recovery policy are learned before task training; together they form a safety controller that ensures safety during task training. We then describe a phenomenon induced by disagreement between the task policy and the recovery policy, called the adversarial phenomenon, which reduces learning efficiency and model performance. An auxiliary reward is proposed to mitigate the adversarial phenomenon while helping the task policy learn to recover from high-risk states. A series of experiments are conducted in a robot navigation environment. They demonstrate that TU-Recovery outperforms its unconstrained counterpart in both reward gaining and constraint violations during task training, and that the auxiliary reward further improves TU-Recovery's reward-to-cost ratio by significantly reducing constraint violations.
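The hand-off described in the abstract can be sketched as follows; the policies, critic, and risk threshold are placeholders, not the authors' implementation.

```python
# Schematic TU-Recovery control flow: a pre-trained safety critic gates
# between the task policy and the recovery policy.
import numpy as np

def safety_critic(state):     # placeholder learned risk estimate in [0, 1]
    return float(np.clip(state[0], 0.0, 1.0))

def task_policy(state):       # placeholder task action
    return np.array([1.0, 0.0])

def recovery_policy(state):   # placeholder action that steers back to safety
    return np.array([-1.0, 0.0])

RISK_THRESHOLD = 0.7          # assumed hyperparameter

def act(state):
    # Recovery takes over in high-risk states; an auxiliary reward (not shown)
    # additionally teaches the task policy to avoid states where this happens.
    if safety_critic(state) > RISK_THRESHOLD:
        return recovery_policy(state)
    return task_policy(state)

print(act(np.array([0.9, 0.2])))  # high risk -> recovery action
```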

Unlocking the Heart Using Adaptive Locked Agnostic Networks

  • paper_url: http://arxiv.org/abs/2309.11899
  • repo_url: https://github.com/AstraZeneca/UnlockingHeart
  • paper_authors: Sylwia Majchrowska, Anders Hildeman, Philip Teare, Tom Diethe
  • for: Medical imaging applications, specifically echocardiography datasets.
  • methods: The paper introduces the Adaptive Locked Agnostic Network (ALAN) method, which uses self-supervised visual feature extraction with a large backbone model to produce anatomically robust semantic self-segmentation.
  • results: The self-supervised backbone model robustly identifies anatomical subregions of the heart in an apical four-chamber view; these features are then used to design two downstream models, one for segmenting a target anatomical region and one for echocardiogram view classification.
    Abstract Supervised training of deep learning models for medical imaging applications requires a significant amount of labeled data. This is posing a challenge as the images are required to be annotated by medical professionals. To address this limitation, we introduce the Adaptive Locked Agnostic Network (ALAN), a concept involving self-supervised visual feature extraction using a large backbone model to produce anatomically robust semantic self-segmentation. In the ALAN methodology, this self-supervised training occurs only once on a large and diverse dataset. Due to the intuitive interpretability of the segmentation, downstream models tailored for specific tasks can be easily designed using white-box models with few parameters. This, in turn, opens up the possibility of communicating the inner workings of a model with domain experts and introducing prior knowledge into it. It also means that the downstream models become less data-hungry compared to fully supervised approaches. These characteristics make ALAN particularly well-suited for resource-scarce scenarios, such as costly clinical trials and rare diseases. In this paper, we apply the ALAN approach to three publicly available echocardiography datasets: EchoNet-Dynamic, CAMUS, and TMED-2. Our findings demonstrate that the self-supervised backbone model robustly identifies anatomical subregions of the heart in an apical four-chamber view. Building upon this, we design two downstream models, one for segmenting a target anatomical region, and a second for echocardiogram view classification.
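A minimal PyTorch sketch of the locked-backbone idea; the architecture and shapes are illustrative assumptions, not the ALAN model. The self-supervised backbone is frozen after its one-off training, and a few-parameter white-box head is trained on its features.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(                 # stands in for the large SSL backbone
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)
for p in backbone.parameters():           # "locked": no gradients flow here
    p.requires_grad = False

head = nn.Conv2d(32, 2, kernel_size=1)    # few-parameter segmentation head

x = torch.randn(4, 1, 64, 64)             # dummy batch of echo-like frames
with torch.no_grad():
    feats = backbone(x)
logits = head(feats)                      # per-pixel class scores to be trained
print(logits.shape)                       # torch.Size([4, 2, 64, 64])
```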

MiChao-HuaFen 1.0: A Specialized Pre-trained Corpus Dataset for Domain-specific Large Models

  • paper_url: http://arxiv.org/abs/2309.13079
  • repo_url: None
  • paper_authors: Yidong Liu, FuKai Shang, Fang Wang, Rui Xu, Jun Wang, Wei Li, Yao Li, Conghui He
  • for: To serve specialized domains (such as healthcare, law, and finance) that demand high-quality, domain-specific outputs.
  • methods: The paper first evaluates existing large models on specialized domains and discusses their limitations, then introduces the "MiChao-HuaFen 1.0" pre-trained corpus dataset, tailored for the news and governmental sectors. The dataset is sourced from publicly available internet data from 2022 and underwent multiple rounds of cleansing and processing to ensure high quality and reliable origins.
  • results: The dataset supports the pre-training of large models for Chinese vertical domains and helps propel deep learning research and applications in related fields.
    Abstract With the advancement of deep learning technologies, general-purpose large models such as GPT-4 have demonstrated exceptional capabilities across various domains. Nevertheless, there remains a demand for high-quality, domain-specific outputs in areas like healthcare, law, and finance. This paper first evaluates the existing large models for specialized domains and discusses their limitations. To cater to the specific needs of certain domains, we introduce the ``MiChao-HuaFen 1.0'' pre-trained corpus dataset, tailored for the news and governmental sectors. The dataset, sourced from publicly available internet data from 2022, underwent multiple rounds of cleansing and processing to ensure high quality and reliable origins, with provisions for consistent and stable updates. This dataset not only supports the pre-training of large models for Chinese vertical domains but also aids in propelling deep learning research and applications in related fields.

Audio Contrastive based Fine-tuning

  • paper_url: http://arxiv.org/abs/2309.11895
  • repo_url: None
  • paper_authors: Yang Wang, Qibin Liang, Chenghao Xiao, Yizhi Li, Noura Al Moubayed, Chenghua Lin
  • for: Audio classification tasks with a wide range of applications, such as speech and sound processing.
  • methods: contrastive learning, fine-tuning.
  • results: state-of-the-art results in various settings, robust generalisability.
    Abstract Audio classification plays a crucial role in speech and sound processing tasks with a wide range of applications. There still remains a challenge of striking the right balance between fitting the model to the training data (avoiding overfitting) and enabling it to generalise well to a new domain. Leveraging the transferability of contrastive learning, we introduce Audio Contrastive-based Fine-tuning (AudioConFit), an efficient approach characterised by robust generalisability. Empirical experiments on a variety of audio classification tasks demonstrate the effectiveness and robustness of our approach, which achieves state-of-the-art results in various settings.
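As one plausible instantiation of the contrastive objective (the paper's exact loss and augmentation pipeline are not specified here, so treat this as a generic sketch), an NT-Xent-style loss over two augmented views of the same audio clips looks like:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two augmented views of the same clips."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                     # (2N, D)
    sim = z @ z.t() / temperature                      # cosine similarities
    sim.fill_diagonal_(float("-inf"))                  # exclude self-pairs
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)               # positive = the other view

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)      # dummy audio embeddings
print(nt_xent(z1, z2).item())
```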

A Knowledge-Driven Cross-view Contrastive Learning for EEG Representation

  • paper_url: http://arxiv.org/abs/2310.03747
  • repo_url: None
  • paper_authors: Weining Weng, Yang Gu, Qihui Zhang, Yingying Huang, Chunyan Miao, Yiqiang Chen
  • for: This paper is written for researchers and practitioners working with electroencephalogram (EEG) signals and deep learning methods, particularly those interested in developing supervised learning methods for EEG signals with limited labels.
  • methods: The paper proposes a knowledge-driven cross-view contrastive learning framework (KDC2) that integrates neurological theory to extract effective representations from EEG signals with limited labels. The KDC2 method creates scalp and neural views of EEG signals, simulating the internal and external representation of brain activity, and uses inter-view and cross-view contrastive learning pipelines in combination with various augmentation methods to capture neural features from different views.
  • results: The experimental results on different downstream tasks demonstrate that the proposed method outperforms state-of-the-art methods, highlighting the superior generalization of neural knowledge-supported EEG representations across various brain tasks.
    Abstract Due to the abundant neurophysiological information in the electroencephalogram (EEG) signal, EEG signals integrated with deep learning methods have gained substantial traction across numerous real-world tasks. However, the development of supervised learning methods based on EEG signals has been hindered by the high cost and significant label discrepancies to manually label large-scale EEG datasets. Self-supervised frameworks are adopted in vision and language fields to solve this issue, but the lack of EEG-specific theoretical foundations hampers their applicability across various tasks. To solve these challenges, this paper proposes a knowledge-driven cross-view contrastive learning framework (KDC2), which integrates neurological theory to extract effective representations from EEG with limited labels. The KDC2 method creates scalp and neural views of EEG signals, simulating the internal and external representation of brain activity. Sequentially, inter-view and cross-view contrastive learning pipelines in combination with various augmentation methods are applied to capture neural features from different views. By modeling prior neural knowledge based on homologous neural information consistency theory, the proposed method extracts invariant and complementary neural knowledge to generate combined representations. Experimental results on different downstream tasks demonstrate that our method outperforms state-of-the-art methods, highlighting the superior generalization of neural knowledge-supported EEG representations across various brain tasks.

Multi-level Asymmetric Contrastive Learning for Medical Image Segmentation Pre-training

  • paper_url: http://arxiv.org/abs/2309.11876
  • repo_url: None
  • paper_authors: Shuang Zeng, Lei Zhu, Xinliang Zhang, Zifeng Tian, Qian Chen, Lujia Jin, Jiayi Wang, Yanye Lu
  • for: This work proposes a novel asymmetric contrastive learning framework, JCL, for medical image segmentation.
  • methods: A novel asymmetric contrastive learning strategy pre-trains both the encoder and decoder simultaneously, providing better initialization for segmentation models; a multi-level contrastive loss accounts for correspondences at the feature, image, and pixel levels so that multi-level representations are learned during pre-training.
  • results: Experiments on multiple medical image datasets show that the JCL framework outperforms existing state-of-the-art contrastive learning strategies.
    Abstract Contrastive learning, which is a powerful technique for learning image-level representations from unlabeled data, offers a promising direction for dealing with the dilemma between large-scale pre-training and limited labeled data. However, most existing contrastive learning strategies are designed mainly for downstream tasks of natural images; therefore they are sub-optimal, and even worse than learning from scratch, when directly applied to medical images whose downstream tasks are usually segmentation. In this work, we propose a novel asymmetric contrastive learning framework named JCL for medical image segmentation with self-supervised pre-training. Specifically, (1) a novel asymmetric contrastive learning strategy is proposed to pre-train both encoder and decoder simultaneously in one stage to provide better initialization for segmentation models. (2) A multi-level contrastive loss is designed to take the correspondence among feature-level, image-level and pixel-level projections, respectively, into account to make sure multi-level representations can be learned by the encoder and decoder during pre-training. (3) Experiments on multiple medical image datasets indicate our JCL framework outperforms existing SOTA contrastive learning strategies.

Stochastic stiffness identification and response estimation of Timoshenko beams via physics-informed Gaussian processes

  • paper_url: http://arxiv.org/abs/2309.11875
  • repo_url: https://github.com/gledsonrt/pigptimoshenkobeam
  • paper_authors: Gledson Rodrigo Tondo, Sebastian Rau, Igor Kavrakov, Guido Morgenthal
  • for: This paper presents a machine-learning-based structural health monitoring approach for structural parameter identification and response estimation.
  • methods: A physics-informed machine learning model based on Gaussian processes (GPs): a multi-output GP models the deflections, rotations, strains, bending moments, shear forces, and applied loads of Timoshenko beam elements, and the model is optimized in a Bayesian manner by maximizing the posterior via Markov chain Monte Carlo.
  • results: The approach is validated experimentally; it effectively identifies structural parameters, fuses data from heterogeneous and multi-fidelity sensors, and produces probabilistic predictions of structural responses and internal forces in close agreement with measured data.
    Abstract Machine learning models trained with structural health monitoring data have become a powerful tool for system identification. This paper presents a physics-informed Gaussian process (GP) model for Timoshenko beam elements. The model is constructed as a multi-output GP with covariance and cross-covariance kernels analytically derived based on the differential equations for deflections, rotations, strains, bending moments, shear forces and applied loads. Stiffness identification is performed in a Bayesian format by maximising a posterior model through a Markov chain Monte Carlo method, yielding a stochastic model for the structural parameters. The optimised GP model is further employed for probabilistic predictions of unobserved responses. Additionally, an entropy-based method for physics-informed sensor placement optimisation is presented, exploiting heterogeneous sensor position information and structural boundary conditions built into the GP model. Results demonstrate that the proposed approach is effective at identifying structural parameters and is capable of fusing data from heterogeneous and multi-fidelity sensors. Probabilistic predictions of structural responses and internal forces are in closer agreement with measured data. We validate our model with an experimental setup and discuss the quality and uncertainty of the obtained results. The proposed approach has potential applications in the field of structural health monitoring (SHM) for both mechanical and structural systems.
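A toy version of the Bayesian identification step, using plain Metropolis sampling on a closed-form Timoshenko tip-deflection model instead of the paper's physics-informed GP; the loads, stiffnesses, and noise level are invented for the sketch.

```python
# Infer bending stiffness EI of a cantilever from noisy tip deflections,
# w = F L^3 / (3 EI) + F L / (kAG), with shear stiffness kAG assumed known.
import numpy as np

rng = np.random.default_rng(0)
L, kAG, EI_true, sigma = 2.0, 5.0e6, 1.2e6, 1e-4
F = np.linspace(100.0, 1000.0, 10)                 # applied end loads [N]
w_obs = F * L**3 / (3 * EI_true) + F * L / kAG + rng.normal(0, sigma, F.size)

def log_post(EI):
    if EI <= 0:
        return -np.inf                             # positivity prior
    w = F * L**3 / (3 * EI) + F * L / kAG
    return -0.5 * np.sum((w_obs - w) ** 2) / sigma**2

samples, EI = [], 1.0e6                            # initial guess
for _ in range(20000):
    prop = EI + rng.normal(0, 2e4)                 # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(EI):
        EI = prop
    samples.append(EI)

print(np.mean(samples[5000:]) / 1e6, "x 1e6 (posterior mean EI)")
```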

OSNet & MNetO: Two Types of General Reconstruction Architectures for Linear Computed Tomography in Multi-Scenarios

  • paper_url: http://arxiv.org/abs/2309.11858
  • repo_url: None
  • paper_authors: Zhisheng Wang, Zihan Deng, Fenglin Liu, Yixing Huang, Haijun Yu, Junning Cui
  • for: This paper proposes two novel reconstruction architectures for linear computed tomography (LCT) systems to weaken projection truncation and image the region of interest (ROI).
  • methods: The proposed methods use backprojection filtration (BPF) and two types of reconstruction architectures, Overlay-Single Network (OSNet) and Multiple Networks Overlaying (MNetO), to achieve stable interior reconstruction and avoid the rotation operations of Hilbert filtering.
  • results: Experimental results show that both proposed methods can recover images, with OSNet outperforming BPF in various scenarios. Additionally, ST-pix2pixGAN is superior to pix2pixGAN and CycleGAN, and MNetO exhibits a few artifacts due to the differences among its multiple models.
    Abstract Recently, linear computed tomography (LCT) systems have actively attracted attention. To weaken projection truncation and image the region of interest (ROI) for LCT, the backprojection filtration (BPF) algorithm is an effective solution. However, in BPF for LCT, it is difficult to achieve stable interior reconstruction, and for differentiated backprojection (DBP) images of LCT, multiple rotation-finite inversion of Hilbert transform (Hilbert filtering)-inverse rotation operations will blur the image. To satisfy multiple reconstruction scenarios for LCT, including interior ROI, complete object, and exterior region beyond field-of-view (FOV), and avoid the rotation operations of Hilbert filtering, we propose two types of reconstruction architectures. The first overlays multiple DBP images to obtain a complete DBP image, then uses a network to learn the overlying Hilbert filtering function, referred to as the Overlay-Single Network (OSNet). The second uses multiple networks to train different directional Hilbert filtering models for DBP images of multiple linear scannings, respectively, and then overlays the reconstructed results, i.e., Multiple Networks Overlaying (MNetO). In two architectures, we introduce a Swin Transformer (ST) block to the generator of pix2pixGAN to extract both local and global features from DBP images at the same time. We investigate two architectures from different networks, FOV sizes, pixel sizes, number of projections, geometric magnification, and processing time. Experimental results show that two architectures can both recover images. OSNet outperforms BPF in various scenarios. For the different networks, ST-pix2pixGAN is superior to pix2pixGAN and CycleGAN. MNetO exhibits a few artifacts due to the differences among the multiple models, but any one of its models is suitable for imaging the exterior edge in a certain direction.

BitCoin: Bidirectional Tagging and Supervised Contrastive Learning based Joint Relational Triple Extraction Framework

  • paper_url: http://arxiv.org/abs/2309.11853
  • repo_url: None
  • paper_authors: Luyao He, Zhongbao Zhang, Sen Su, Yuxin Chen
  • for: Improving the accuracy and efficiency of the relational triple extraction (RTE) task and addressing the limitations of existing methods.
  • methods: BitCoin, a bidirectional tagging and supervised contrastive learning based joint relational triple extraction framework, which tags in both directions so that triples can be extracted from subject to object and from object to subject.
  • results: State-of-the-art results on the benchmark datasets, with significantly improved F1 scores on Normal, SEO, EPO, and multiple relation extraction tasks.
    Abstract Relation triple extraction (RTE) is an essential task in information extraction and knowledge graph construction. Despite recent advancements, existing methods still exhibit certain limitations. They simply employ generalized pre-trained models and do not consider the specificity of RTE tasks. Moreover, existing tagging-based approaches typically decompose the RTE task into two subtasks, initially identifying subjects and subsequently identifying objects and relations. They solely focus on extracting relational triples from subject to object, neglecting that once the extraction of a subject fails, all triples associated with that subject fail to be extracted. To address these issues, we propose BitCoin, an innovative bidirectional tagging and supervised contrastive learning based joint relational triple extraction framework. Specifically, we design a supervised contrastive learning method that considers multiple positives per anchor rather than restricting it to just one positive. Furthermore, a penalty term is introduced to prevent excessive similarity between the subject and object. Our framework implements taggers in two directions, enabling triple extraction from subject to object and from object to subject. Experimental results show that BitCoin achieves state-of-the-art results on the benchmark datasets and significantly improves the F1 score on Normal, SEO, EPO, and multiple relation extraction tasks.
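A sketch of a supervised contrastive loss that, as the abstract describes, averages over multiple positives per anchor; this is a generic formulation, and the penalty term against subject-object similarity is omitted.

```python
import torch
import torch.nn.functional as F

def sup_con(features, labels, temperature=0.1):
    """features: (N, D) token embeddings; labels: (N,) class/relation labels."""
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))          # exclude self-pairs
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-likelihood over ALL positives of each anchor, not just one.
    per_anchor = log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor.mean()

feats = torch.randn(6, 64)
labels = torch.tensor([0, 0, 1, 1, 1, 2])
print(sup_con(feats, labels).item())
```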

How Prevalent is Gender Bias in ChatGPT? – Exploring German and English ChatGPT Responses

  • paper_url: http://arxiv.org/abs/2310.03031
  • repo_url: https://github.com/Ognatai/bias_chatGPT
  • paper_authors: Stefanie Urchs, Veronika Thurner, Matthias Aßenmacher, Christian Heumann, Stephanie Thiemichen
  • for: This paper explores how OpenAI's ChatGPT can help users without an IT background draft texts for their daily work, and examines the model's limitations and gender biases.
  • methods: A systematic analysis of prompts and generated responses, inspecting the answers in depth to identify potentially problematic issues, with a special focus on gender bias.
  • results: ChatGPT can indeed help non-IT users draft texts, but its responses must be checked carefully for biases as well as for syntactic and grammatical mistakes.
    Abstract With the introduction of ChatGPT, OpenAI made large language models (LLM) accessible to users with limited IT expertise. However, users with no background in natural language processing (NLP) might lack a proper understanding of LLMs and thus be unaware of their inherent limitations, taking the systems' output at face value. In this paper, we systematically analyse prompts and the generated responses to identify possible problematic issues with a special focus on gender biases, which users need to be aware of when processing the system's output. We explore how ChatGPT reacts in English and German if prompted to answer from a female, male, or neutral perspective. In an in-depth investigation, we examine selected prompts and analyse to what extent responses differ if the system is prompted several times in an identical way. On this basis, we show that ChatGPT is indeed useful for helping non-IT users draft texts for their daily work. However, it is absolutely crucial to thoroughly check the system's responses for biases as well as for syntactic and grammatical mistakes.

Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues

  • paper_url: http://arxiv.org/abs/2309.11838
  • repo_url: None
  • paper_authors: Norbert Braunschweiler, Rama Doddipatla, Simon Keizer, Svetlana Stoyanchev
  • for: This paper investigates the use of large language models (LLMs) such as ChatGPT for document-grounded response generation in information-seeking dialogues.
  • methods: Two methods are used: ChatCompletion, which draws on knowledge from ChatGPT model pre-training, and LlamaIndex, which additionally extracts relevant information from the documents.
  • results: Since LLM-generated, document-grounded responses are significantly more verbose and cannot be adequately assessed by automatic metrics, a human evaluation was conducted over the shared-task winning system, the two ChatGPT variants, and human responses; both ChatGPT variants were rated higher than the shared-task winning system and the human responses.
    Abstract In this paper, we investigate the use of large language models (LLMs) like ChatGPT for document-grounded response generation in the context of information-seeking dialogues. For evaluation, we use the MultiDoc2Dial corpus of task-oriented dialogues in four social service domains previously used in the DialDoc 2022 Shared Task. Information-seeking dialogue turns are grounded in multiple documents providing relevant information. We generate dialogue completion responses by prompting a ChatGPT model, using two methods: ChatCompletion and LlamaIndex. ChatCompletion uses knowledge from ChatGPT model pretraining, while LlamaIndex also extracts relevant information from documents. Observing that document-grounded response generation via LLMs cannot be adequately assessed by automatic evaluation metrics, as the generated responses are significantly more verbose, we perform a human evaluation where annotators rate the output of the shared task winning system, the outputs of the two ChatGPT variants, and human responses. While both ChatGPT variants are more likely to include information not present in the relevant segments, possibly including a presence of hallucinations, they are rated higher than both the shared task winning system and human responses.

Multimodal Transformers for Wireless Communications: A Case Study in Beam Prediction

  • paper_url: http://arxiv.org/abs/2309.11811
  • repo_url: https://github.com/itu-ai-ml-in-5g-challenge/deepsense6g_tii
  • paper_authors: Yu Tian, Qiyang Zhao, Zine el abidine Kherroubi, Fouzi Boukhalfa, Kebin Wu, Faouzi Bader
  • for: Improving beam management for wireless communications in high-frequency bands with large antenna arrays, using multimodal sensing information from cameras, LiDAR, radar, and GPS.
  • methods: A multimodal transformer deep learning framework: convolutional neural networks extract features from sequences of images, point clouds, and radar raw data sampled over time, and transformer encoders learn the hidden relations between feature tokens from different modalities and time instances to produce encoded vectors for the next level of feature extraction.
  • results: The solution trained on image and GPS data achieves the best distance-based beam prediction accuracy of 78.44%, generalizing effectively to unseen day scenarios near 73% and night scenarios over 84%, outperforming other modality combinations and arbitrary data processing techniques; this demonstrates the strength of transformers with feature fusion for radio beam prediction from images and GPS.
    Abstract Wireless communications at high-frequency bands with large antenna arrays face challenges in beam management, which can potentially be improved by multimodality sensing information from cameras, LiDAR, radar, and GPS. In this paper, we present a multimodal transformer deep learning framework for sensing-assisted beam prediction. We employ a convolutional neural network to extract the features from a sequence of images, point clouds, and radar raw data sampled over time. At each convolutional layer, we use transformer encoders to learn the hidden relations between feature tokens from different modalities and time instances over abstraction space and produce encoded vectors for the next-level feature extraction. We train the model on a combination of different modalities with supervised learning. We try to enhance the model over imbalanced data by utilizing focal loss and exponential moving average. We also evaluate data processing and augmentation techniques such as image enhancement, segmentation, background filtering, multimodal data flipping, radar signal transformation, and GPS angle calibration. Experimental results show that our solution trained on image and GPS data produces the best distance-based accuracy of predicted beams at 78.44%, with effective generalization to unseen day scenarios near 73% and night scenarios over 84%. This outperforms using other modalities and arbitrary data processing techniques, which demonstrates the effectiveness of transformers with feature fusion in performing radio beam prediction from images and GPS. Furthermore, our solution could be pretrained from large sequences of multimodality wireless data, on fine-tuning for multiple downstream radio network tasks.
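The focal loss mentioned in the abstract can be sketched in its standard multi-class form; the number of candidate beams and the focusing exponent are assumptions, and the exponential moving average of model weights is not shown.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """logits: (N, num_beams) scores; targets: (N,) ground-truth beam indices."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)                     # probability assigned to the true beam
    return ((1 - p_t) ** gamma * ce).mean()  # down-weights easy examples

logits = torch.randn(16, 64)                 # 64 candidate beams (assumed)
targets = torch.randint(0, 64, (16,))
print(focal_loss(logits, targets).item())
```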

JobRecoGPT – Explainable job recommendations using LLMs

  • paper_url: http://arxiv.org/abs/2309.11805
  • repo_url: None
  • paper_authors: Preetam Ghosh, Vaishali Sadaphal
  • for: This work proposes a job recommendation method based on natural language understanding, recovering the information that traditional approaches lose when converting unstructured data to a structured format.
  • methods: Large Language Models (LLMs) are leveraged to capture information in raw text, and four approaches are compared: (i) content-based deterministic, (ii) LLM guided, (iii) LLM unguided, and (iv) hybrid.
  • results: The LLM-guided and hybrid approaches perform better than the content-based deterministic and unguided LLM approaches, and also have lower time requirements.
    Abstract In today's rapidly evolving job market, finding the right opportunity can be a daunting challenge. With advancements in the field of AI, computers can now recommend suitable jobs to candidates. However, the task of recommending jobs is not the same as recommending movies to viewers. Apart from must-have criteria, like skills and experience, there are many subtle aspects to a job which can decide if it is a good fit or not for a given candidate. Traditional approaches can capture the quantifiable aspects of jobs and candidates, but a substantial portion of the data that is present in unstructured form in the job descriptions and resumes is lost in the process of conversion to structured format. Of late, Large Language Models (LLMs) have taken the AI field by storm with extraordinary performance in fields where text-based data is available. Inspired by the superior performance of LLMs, we leverage their capability to understand natural language for capturing the information that was previously getting lost during the conversion of unstructured data to structured form. To this end, we compare the performance of four different approaches for job recommendations, namely: (i) content-based deterministic, (ii) LLM guided, (iii) LLM unguided, and (iv) hybrid. In this study, we present the advantages and limitations of each method and evaluate their performance in terms of time requirements.

DimCL: Dimensional Contrastive Learning For Improving Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2309.11782
  • repo_url: None
  • paper_authors: Thanh Nguyen, Trung Pham, Chaoning Zhang, Tung Luu, Thang Vu, Chang D. Yoo
  • for: Improving the performance of self-supervised learning (SSL), in particular extending and enhancing non-contrastive-learning (non-CL) frameworks.
  • methods: A strategy called Dimensional Contrastive Learning (DimCL), which performs contrastive learning along the dimensional direction rather than the batch direction, enhancing feature diversity and serving as a regularizer for prior SSL frameworks.
  • results: Extensive experiments across datasets and backbone architectures show that DimCL improves SSL performance, and the hardness-aware property is identified as a key reason for its success.
    Abstract Self-supervised learning (SSL) has gained remarkable success, for which contrastive learning (CL) plays a key role. However, the recent development of new non-CL frameworks has achieved comparable or better performance with high improvement potential, prompting researchers to enhance these frameworks further. Assimilating CL into non-CL frameworks has been thought to be beneficial, but empirical evidence indicates no visible improvements. In view of that, this paper proposes a strategy of performing CL along the dimensional direction instead of along the batch direction as done in conventional contrastive learning, named Dimensional Contrastive Learning (DimCL). DimCL aims to enhance the feature diversity, and it can serve as a regularizer to prior SSL frameworks. DimCL has been found to be effective, and the hardness-aware property is identified as a critical reason for its success. Extensive experimental results reveal that assimilating DimCL into SSL frameworks leads to performance improvement by a non-trivial margin on various datasets and backbone architectures.
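The dimensional direction can be illustrated by transposing the feature matrix before a standard contrastive loss, so that feature dimensions rather than batch samples play the role of contrastive elements; this mirrors the abstract's description, and the regularization weight is an assumption.

```python
import torch
import torch.nn.functional as F

def contrastive(a, b, temperature=0.1):
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    logits = a @ b.t() / temperature
    return F.cross_entropy(logits, torch.arange(len(a)))

f1 = torch.randn(32, 128)   # (batch, dim) features of view 1
f2 = torch.randn(32, 128)   # (batch, dim) features of view 2

batch_cl = contrastive(f1, f2)           # conventional: contrast over rows
dim_cl = contrastive(f1.t(), f2.t())     # DimCL-style: contrast over columns
loss = batch_cl + 0.1 * dim_cl           # DimCL as a regularizer (weight assumed)
print(loss.item())
```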

2DDATA: 2D Detection Annotations Transmittable Aggregation for Semantic Segmentation on Point Cloud

  • paper_url: http://arxiv.org/abs/2309.11755
  • repo_url: None
  • paper_authors: Guan-Cheng Lee
  • for: This work addresses the precise-calibration requirements and high cost that existing multi-modality models face, so that such models can be applied in practical scenarios.
  • methods: 2D Detection Annotations Transmittable Aggregation (2DDATA), with a data-specific branch, called the Local Object Branch, that deals with points inside a given bounding box, exploiting the ease of acquiring 2D bounding box annotations.
  • results: The simple design is shown to transmit bounding box prior information to the 3D encoder model, proving the feasibility of large multi-modality models fused with modality-specific data.
    Abstract Recently, multi-modality models have been introduced because of the complementary information from different sensors such as LiDAR and cameras. However, they require paired data along with precise calibrations for all modalities, and the complicated calibration among modalities hugely increases the cost of collecting such high-quality datasets, hindering their application in practical scenarios. Inheriting from previous works, we not only fuse information from multiple modalities without the above issues, but also exhaust the information in the RGB modality. We introduce 2D Detection Annotations Transmittable Aggregation (2DDATA), designing a data-specific branch, called the Local Object Branch, which aims to deal with points inside a certain bounding box, because of the easiness of acquiring 2D bounding box annotations. We demonstrate that our simple design can transmit bounding box prior information to the 3D encoder model, proving the feasibility of large multi-modality models fused with modality-specific data.

Improve the efficiency of deep reinforcement learning through semantic exploration guided by natural language

  • paper_url: http://arxiv.org/abs/2309.11753
  • repo_url: None
  • paper_authors: Zhourui Guo, Meng Yao, Yang Yu, Qiyue Yin
  • for: This paper proposes a novel RL method that uses an oracle more efficiently to improve RL performance.
  • methods: The agent interacts with the oracle in a selective way: a neural network encodes the current state of the agent and the oracle, the most relevant templated question is retrieved from a corpus of previous interactions, and the oracle's answer is used to update the agent's policy and value function.
  • results: On an object manipulation task, the method significantly improves RL efficiency, reducing the number of oracle interactions needed to reach a given level of performance compared to baseline methods.
    Abstract Reinforcement learning is a powerful technique for learning from trial and error, but it often requires a large number of interactions to achieve good performance. In some domains, such as sparse-reward tasks, an oracle that can provide useful feedback or guidance to the agent during the learning process is of great importance. However, querying the oracle too frequently may be costly or impractical, and the oracle may not always have a clear answer for every situation. Therefore, we propose a novel method for interacting with the oracle in a selective and efficient way, using a retrieval-based approach. We assume that the interaction can be modeled as a sequence of templated questions and answers, and that there is a large corpus of previous interactions available. We use a neural network to encode the current state of the agent and the oracle, and retrieve the most relevant question from the corpus to ask the oracle. We then use the oracle's answer to update the agent's policy and value function. We evaluate our method on an object manipulation task. We show that our method can significantly improve the efficiency of RL by reducing the number of interactions needed to reach a certain level of performance, compared to baselines that do not use the oracle or use it in a naive way.
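The retrieval step can be sketched as follows; the encoder and the question corpus are placeholders for the paper's learned encoder and interaction history.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(text_or_state):
    # Placeholder for the learned neural encoder of agent state and oracle.
    return rng.normal(size=64)

corpus = ["Is the object graspable from above?",
          "Is the gripper aligned with the object?",
          "Should I reposition before closing the gripper?"]
corpus_emb = np.stack([encode(q) for q in corpus])
corpus_emb /= np.linalg.norm(corpus_emb, axis=1, keepdims=True)

state_emb = encode("current state")
state_emb /= np.linalg.norm(state_emb)

best = int(np.argmax(corpus_emb @ state_emb))  # most relevant templated question
print("Ask oracle:", corpus[best])
# The oracle's answer would then drive the policy and value-function update.
```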

How Robust is Google’s Bard to Adversarial Image Attacks?

  • paper_url: http://arxiv.org/abs/2309.11751
  • repo_url: https://github.com/thu-ml/attack-bard
  • paper_authors: Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, Xiao Yang, Yichi Zhang, Yu Tian, Hang Su, Jun Zhu
  • for: This paper studies the adversarial robustness of Google's Bard chatbot, to better understand the vulnerabilities of commercial multimodal large language models.
  • methods: Adversarial examples are generated by attacking white-box surrogate vision encoders or multimodal LLMs, and are shown to mislead Bard into outputting wrong image descriptions.
  • results: The attack succeeds against Bard in 22% of cases based solely on transferability, and the same examples also attack other multimodal LLMs (a 26% success rate against Bing Chat and an 86% success rate against ERNIE bot); two of Bard's defense mechanisms are identified, and corresponding attacks are designed to evade them.
    Abstract Multimodal Large Language Models (MLLMs) that integrate text and other modalities (especially vision) have achieved unprecedented performance in various multimodal tasks. However, due to the unsolved adversarial robustness problem of vision models, MLLMs can have more severe safety and security risks by introducing the vision inputs. In this work, we study the adversarial robustness of Google's Bard, a competitive chatbot to ChatGPT that released its multimodal capability recently, to better understand the vulnerabilities of commercial MLLMs. By attacking white-box surrogate vision encoders or MLLMs, the generated adversarial examples can mislead Bard to output wrong image descriptions with a 22% success rate based solely on the transferability. We show that the adversarial examples can also attack other MLLMs, e.g., a 26% attack success rate against Bing Chat and a 86% attack success rate against ERNIE bot. Moreover, we identify two defense mechanisms of Bard, including face detection and toxicity detection of images. We design corresponding attacks to evade these defenses, demonstrating that the current defenses of Bard are also vulnerable. We hope this work can deepen our understanding on the robustness of MLLMs and facilitate future research on defenses. Our code is available at https://github.com/thu-ml/Attack-Bard. Update: GPT-4V is available at October 2023. We further evaluate its robustness under the same set of adversarial examples, achieving a 45% attack success rate.
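The transfer recipe (attack a white-box surrogate, then submit the perturbed image to the black-box model) can be sketched with a placeholder surrogate network and standard PGD; the loss target and budget below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

surrogate = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in encoder
x = torch.rand(1, 3, 32, 32)                 # clean image in [0, 1]
y = torch.tensor([3])                        # assumed target for the surrogate loss
eps, alpha, steps = 8 / 255, 2 / 255, 10     # L-infinity budget and step size

x_adv = x.clone().detach()
for _ in range(steps):
    x_adv.requires_grad_(True)
    loss = F.cross_entropy(surrogate(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    with torch.no_grad():
        x_adv = x_adv + alpha * grad.sign()                    # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)
    x_adv = x_adv.detach()

print(bool((x_adv - x).abs().max() <= eps + 1e-6))  # perturbation within budget
# x_adv would then be sent to the black-box multimodal model (transfer attack).
```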

LPML: LLM-Prompting Markup Language for Mathematical Reasoning

  • paper_url: http://arxiv.org/abs/2309.13078
  • repo_url: None
  • paper_authors: Ryutaro Yamauchi, Sho Sonoda, Akiyoshi Sannai, Wataru Kumagai
  • for: This work uses large language models (LLMs) for mathematical reasoning and addresses the reasoning and calculation errors present in LLM-generated text.
  • methods: A novel framework that integrates the Chain-of-Thought (CoT) method with an external tool (a Python REPL), prompting LLMs to generate structured text in an XML-like markup language that controls their undesired behaviors.
  • results: Applied to ChatGPT (GPT-3.5), combining CoT and the Python REPL through the markup language enhances the reasoning capability of LLMs, enabling advanced mathematical reasoning with only zero-shot prompting.
    Abstract When utilizing large language models (LLMs) for mathematical reasoning, addressing the reasoning and calculation errors present in the generated text is a crucial challenge. In this paper, we propose a novel framework that integrates the Chain-of-Thought (CoT) method with an external tool (Python REPL). We discovered that by prompting LLMs to generate structured text in an XML-like markup language, we could seamlessly integrate CoT and the external tool and control the undesired behaviors of LLMs. With our approach, LLMs can utilize Python computation to rectify errors within CoT. We applied our method to ChatGPT (GPT-3.5) to solve challenging mathematical problems and demonstrated that combining CoT and Python REPL through the markup language enhances the reasoning capability of LLMs. Our approach enables LLMs to write the markup language and perform advanced mathematical reasoning using only zero-shot prompting.
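A toy illustration of the markup-driven loop; the tag names are invented for this sketch and are not the paper's LPML schema.

```python
# Code inside the model's markup is executed in a REPL-like namespace; in the
# full framework the output is fed back so the model can correct its CoT.
import re

llm_output = """
<THINK>Compute 17 * 24 by splitting: 17*24 = 17*20 + 17*4.</THINK>
<PYTHON>result = 17 * 20 + 17 * 4
print(result)</PYTHON>
"""

namespace = {}
for code in re.findall(r"<PYTHON>(.*?)</PYTHON>", llm_output, re.DOTALL):
    exec(code, namespace)    # prints 408; namespace keeps state across blocks
```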

Choice-75: A Dataset on Decision Branching in Script Learning

  • paper_url: http://arxiv.org/abs/2309.11737
  • repo_url: None
  • paper_authors: Zhaoyi Joey Hou, Li Zhang, Chris Callison-Burch
  • for: This work studies how daily events unfold.
  • methods: Choice-75, the first benchmark challenging intelligent systems to predict decisions given descriptive scenarios, comprising 75 scripts and more than 600 scenarios.
  • results: Large language models perform decently overall, but there is still notable room for improvement in many hard scenarios.
    Abstract Script learning studies how daily events unfold. Previous works tend to consider a script as a linear sequence of events while ignoring the potential branches that arise due to people's circumstantial choices. We hence propose Choice-75, the first benchmark that challenges intelligent systems to predict decisions given descriptive scenarios, containing 75 scripts and more than 600 scenarios. While large language models demonstrate decent overall performance, there is still notable room for improvement in many hard scenarios.

A Differentiable Framework for End-to-End Learning of Hybrid Structured Compression

  • paper_url: http://arxiv.org/abs/2309.13077
  • repo_url: None
  • paper_authors: Moonjung Eo, Suhyun Kang, Wonjong Rhee
  • for: Improving the performance of structured compression techniques.
  • methods: A differentiable framework (DF) based on gradient-based optimization, comprising DML-S for filter selection and DTL-S for rank selection.
  • results: Experimental results show that DF surpasses existing structured compression methods.
    Abstract Filter pruning and low-rank decomposition are two of the foundational techniques for structured compression. Although recent efforts have explored hybrid approaches aiming to integrate the advantages of both techniques, their performance gains have been modest at best. In this study, we develop a Differentiable Framework (DF) that can express filter selection, rank selection, and budget constraint in a single analytical formulation. Within the framework, we introduce DML-S for filter selection, integrating scheduling into existing mask learning techniques. Additionally, we present DTL-S for rank selection, utilizing a singular value thresholding operator. The framework with DML-S and DTL-S offers a hybrid structured compression methodology that facilitates end-to-end learning through gradient-based optimization. Experimental results demonstrate the efficacy of DF, surpassing state-of-the-art structured compression methods. Our work establishes a robust and versatile avenue for advancing structured compression techniques.
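The rank-selection side is the easier half to illustrate. The sketch below shows a plain (non-differentiable) singular value thresholding operator of the kind DTL-S builds on: soft-threshold the singular values of a weight matrix so the small ones vanish, leaving a low-rank reconstruction. The threshold value is an arbitrary illustration, not the learned one, and DF's actual operator sits inside a differentiable end-to-end training loop.

```python
# Singular value thresholding (SVT) sketch for rank selection.
import numpy as np

def svt(weight: np.ndarray, tau: float) -> np.ndarray:
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)        # soft-threshold singular values
    return (u * s_shrunk) @ vt                 # low-rank reconstruction

w = np.random.randn(64, 32)
w_low_rank = svt(w, tau=5.0)                   # tau is an arbitrary example
print(np.linalg.matrix_rank(w_low_rank))       # rank drops as tau grows
```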

FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

  • paper_url: http://arxiv.org/abs/2309.11725
  • repo_url: https://github.com/ai-s2-lab/fluenteditor
  • paper_authors: Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li
  • for: Improving fluency in text-based speech editing, which lets users edit speech by modifying the input text transcript rather than the audio itself.
  • methods: A neural text-based speech editing model that smooths the acoustic transition between the edited region and its neighboring segments and preserves the prosodic style of the original utterance.
  • results: Subjective and objective experiments on VCTK show that FluentEditor surpasses all advanced baselines in naturalness and fluency.
    Abstract Text-based speech editing (TSE) techniques are designed to enable users to edit the output audio by modifying the input text transcript instead of the audio itself. Despite much progress in neural network-based TSE techniques, current techniques have focused on reducing the difference between the generated speech segment and the reference target in the editing region, ignoring its local and global fluency in the context of the original utterance. To maintain speech fluency, we propose a fluency-aware speech editing model, termed FluentEditor, by incorporating fluency-aware training criteria into TSE training. Specifically, the acoustic consistency constraint aims to smooth the transition between the edited region and its neighboring acoustic segments consistent with the ground truth, while the prosody consistency constraint seeks to ensure that the prosody attributes within the edited region remain consistent with the overall style of the original utterance. Subjective and objective experimental results on VCTK demonstrate that our FluentEditor outperforms all advanced baselines in terms of naturalness and fluency. The audio samples and code are available at https://github.com/Ai-S2-Lab/FluentEditor.
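As a rough illustration of the two constraints, the hedged PyTorch sketch below penalizes mel-spectrogram mismatch at the frames flanking the edited region (acoustic consistency) and pulls the edited region's mean prosody embedding toward the utterance-level style (prosody consistency). The specific feature choices and loss forms are our assumptions; the paper's exact formulation may differ.

```python
# Hedged sketch of the two fluency-aware training terms.
import torch
import torch.nn.functional as F

def acoustic_consistency(pred_mel, gt_mel, left, right, k=3):
    """Penalize mismatch at the k frames on each side of the edit boundaries."""
    boundary = torch.cat([pred_mel[:, left - k:left + k],
                          pred_mel[:, right - k:right + k]], dim=1)
    target = torch.cat([gt_mel[:, left - k:left + k],
                        gt_mel[:, right - k:right + k]], dim=1)
    return F.l1_loss(boundary, target)

def prosody_consistency(edit_prosody, utt_prosody):
    """Pull the edited region's prosody embedding toward the utterance style."""
    return F.mse_loss(edit_prosody.mean(dim=1), utt_prosody.mean(dim=1))

pred = torch.randn(2, 100, 80)   # (batch, frames, mel bins)
gt = torch.randn(2, 100, 80)
loss = acoustic_consistency(pred, gt, left=40, right=60) \
     + prosody_consistency(torch.randn(2, 20, 256), torch.randn(2, 100, 256))
print(loss.item())
```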

Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech

  • paper_url: http://arxiv.org/abs/2309.11724
  • repo_url: https://github.com/ai-s2-lab/emopp
  • paper_authors: Rui Liu, Bin Liu, Haizhou Li
  • for: Modeling emotional expressiveness in prosodic phrasing for expressive text-to-speech.
  • methods: An emotion-aware prosodic phrasing model, EmoPP, that accurately mines the emotional cues of an utterance and predicts appropriate phrase breaks.
  • results: Objective and subjective evaluations on the ESD dataset show that EmoPP markedly outperforms the baselines in emotion expressiveness. Audio samples and code are available at https://github.com/AI-S2-Lab/EmoPP.
    Abstract Prosodic phrasing is crucial to the naturalness and intelligibility of end-to-end Text-to-Speech (TTS). There exist both linguistic and emotional prosody in natural speech. As the study of prosodic phrasing has been linguistically motivated, prosodic phrasing for expressive emotion rendering has not been well studied. In this paper, we propose an emotion-aware prosodic phrasing model, termed EmoPP, to mine the emotional cues of an utterance accurately and predict appropriate phrase breaks. We first conduct objective observations on the ESD dataset to validate the strong correlation between emotion and prosodic phrasing. Objective and subjective evaluations then show that EmoPP outperforms all baselines and achieves remarkable performance in terms of emotion expressiveness. The audio samples and code are available at https://github.com/AI-S2-Lab/EmoPP.
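In the spirit of EmoPP, a minimal break predictor can be written as a per-word break/no-break classifier conditioned on an utterance-level emotion embedding, as in the sketch below. The dimensions and the fusion-by-concatenation design are assumptions for illustration, not the released architecture.

```python
# Hedged sketch of emotion-conditioned phrase-break prediction.
import torch
import torch.nn as nn

class BreakPredictor(nn.Module):
    def __init__(self, word_dim=256, emo_dim=64):
        super().__init__()
        self.encoder = nn.GRU(word_dim, 128, batch_first=True, bidirectional=True)
        self.head = nn.Linear(256 + emo_dim, 2)   # break / no-break per word

    def forward(self, word_feats, emo_embed):
        hidden, _ = self.encoder(word_feats)                  # (B, T, 256)
        emo = emo_embed.unsqueeze(1).expand(-1, hidden.size(1), -1)
        return self.head(torch.cat([hidden, emo], dim=-1))    # (B, T, 2)

model = BreakPredictor()
logits = model(torch.randn(2, 12, 256), torch.randn(2, 64))
print(logits.shape)  # torch.Size([2, 12, 2])
```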

A Dynamic Domain Adaptation Deep Learning Network for EEG-based Motor Imagery Classification

  • paper_url: http://arxiv.org/abs/2309.11714
  • repo_url: None
  • paper_authors: Jie Jiao, Meiyan Xu, Qingqing Chen, Hefan Zhou, Wangliang Zhou
  • for: Improving the accuracy and robustness of BCI systems while reducing calibration time for new subjects.
  • methods: A Dynamic Domain Adaptation Based Deep Learning Network (DADL-Net) that maps EEG data into a three-dimensional geometric space, learns temporal-spatial features through a 3D convolution module, strengthens them with a spatial-channel attention mechanism, and refines them with a final convolution module.
  • results: Validated on the BCI Competition IV 2a and OpenBMI datasets, achieving accuracies of 70.42% on OpenBMI and 73.91% on BCIC IV 2a.
    Abstract There is a correlation between adjacent channels of the electroencephalogram (EEG), and how to represent this correlation is an open question. In addition, due to inter-individual differences in EEG signals, new subjects need to spend considerable calibration time on EEG-based motor imagery brain-computer interfaces. To solve these problems, we propose a Dynamic Domain Adaptation Based Deep Learning Network (DADL-Net). First, the EEG data is mapped into a three-dimensional geometric space and its temporal-spatial features are learned through a 3D convolution module; a spatial-channel attention mechanism then strengthens these features, and a final convolution module further learns their spatial-temporal information. Finally, to account for inter-subject and cross-session differences, we employ a dynamic domain-adaptation strategy: the distance between source and target features is reduced by introducing a Maximum Mean Discrepancy (MMD) loss function, and the classification layer is fine-tuned using part of the target-domain data. We verify the performance of the proposed method on the BCI Competition IV 2a and OpenBMI datasets. In intra-subject experiments, accuracy rates of 70.42% and 73.91% were achieved on the OpenBMI and BCIC IV 2a datasets, respectively.
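The domain-adaptation term is standard enough to sketch. Below is a hedged Gaussian-kernel Maximum Mean Discrepancy loss of the kind DADL-Net minimizes to align source- and target-domain EEG features; the kernel form and bandwidth are common defaults assumed here, not the paper's exact settings.

```python
# Hedged sketch of a Gaussian-kernel MMD loss between two feature batches.
import torch

def mmd_loss(source: torch.Tensor, target: torch.Tensor, sigma=1.0):
    def kernel(a, b):
        d = torch.cdist(a, b) ** 2          # pairwise squared distances
        return torch.exp(-d / (2 * sigma ** 2))
    return (kernel(source, source).mean()
            + kernel(target, target).mean()
            - 2 * kernel(source, target).mean())

src = torch.randn(32, 128)        # source-subject feature batch
tgt = torch.randn(32, 128) + 0.5  # shifted target-subject batch
print(mmd_loss(src, tgt).item())  # shrinks as the two domains align
```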