cs.AI - 2023-10-22

A generalized likelihood-weighted optimal sampling algorithm for rare-event probability quantification

  • paper_url: http://arxiv.org/abs/2310.14457
  • repo_url: https://github.com/umbrellagong/gpextreme
  • paper_authors: Xianliang Gong, Yulin Pan
  • for: Efficiently quantifying rare-event statistics of input-to-response systems.
  • methods: A new acquisition function that generalizes the original likelihood-weighted (LW) acquisition with two additional parameters, targeting two weaknesses of the original: (1) the input space associated with rare-event responses is not sufficiently stressed in sampling; (2) the surrogate model may deviate significantly from the true input-to-response function, especially for complex functions and limited samples (a hedged sketch follows the abstract).
  • results: Outperforms the original LW acquisition, showing over an order of magnitude of performance improvement in several test cases, and is applied in an engineering example to quantify rare roll-motion statistics of a ship in a random sea.
    Abstract In this work, we introduce a new acquisition function for sequential sampling to efficiently quantify rare-event statistics of an input-to-response (ItR) system with given input probability and expensive function evaluations. Our acquisition is a generalization of the likelihood-weighted (LW) acquisition that was initially designed for the same purpose and then extended to many other applications. The improvement in our acquisition comes from the generalized form with two additional parameters, by varying which one can target and address two weaknesses of the original LW acquisition: (1) that the input space associated with rare-event responses is not sufficiently stressed in sampling; (2) that the surrogate model (generated from samples) may have significant deviation from the true ItR function, especially for cases with complex ItR function and limited number of samples. In addition, we develop a critical procedure in Monte-Carlo discrete optimization of the acquisition function, which achieves orders of magnitude acceleration compared to existing approaches for such type of problems. The superior performance of our new acquisition to the original LW acquisition is demonstrated in a number of test cases, including some cases that were designed to show the effectiveness of the original LW acquisition. We finally apply our method to an engineering example to quantify the rare-event roll-motion statistics of a ship in a random sea.
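For the flavor of the approach, here is a minimal sketch of likelihood-weighted sequential sampling with a Gaussian-process surrogate. The original LW acquisition weights the predictive variance by the input density over the density of the predicted response; the paper's generalization adds two parameters, modeled here as the exponents `alpha` and `beta`. That parameterization is an illustrative guess, not the paper's formula.

```python
# A minimal sketch of LW-style sequential sampling with a GP surrogate.
# The exponents (alpha, beta) stand in for the paper's two extra parameters.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from scipy.stats import norm, gaussian_kde

def lw_acquisition(gp, candidates, input_pdf, alpha=1.0, beta=1.0):
    """Score Monte-Carlo candidate points; higher is better."""
    mu, sigma = gp.predict(candidates, return_std=True)
    # Density of the surrogate's predicted responses, estimated by KDE.
    p_y = gaussian_kde(mu)(mu) + 1e-12
    # Uncertainty, up-weighted where inputs are likely but responses are rare.
    return sigma**2 * input_pdf(candidates)**alpha / p_y**beta

# One sampling round: score a random candidate pool (Monte-Carlo discrete
# optimization of the acquisition) and query the best point.
rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x[:, 0]) + 0.1 * x[:, 0] ** 3   # toy ItR function
X = rng.normal(size=(8, 1)); y = f(X)
gp = GaussianProcessRegressor().fit(X, y)
pool = rng.normal(size=(2000, 1))
scores = lw_acquisition(gp, pool, lambda x: norm.pdf(x[:, 0]))
x_next = pool[np.argmax(scores)]
```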

Mobile Traffic Prediction at the Edge through Distributed and Transfer Learning

  • paper_url: http://arxiv.org/abs/2310.14456
  • repo_url: None
  • paper_authors: Alfredo Petrella, Marco Miozzo, Paolo Dini
  • for: Predicting mobile network traffic in order to smartly optimize the mobile network.
  • methods: A prediction framework based on edge computing that uses datasets obtained at the edge. Two main deep learning architectures, based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), are tested under different training conditions (a minimal CNN forecaster is sketched after the abstract), and Knowledge Transfer Learning (KTL) is applied to improve model performance while reducing the required computational resources.
  • results: The CNN architectures outperform the RNNs, and an estimate of the required training energy is provided. KTL reduces the energy footprint of the models by 60% for CNNs and 90% for RNNs. Finally, two cutting-edge explainable AI techniques are employed to interpret the learned models.
    Abstract Traffic prediction represents one of the crucial tasks for smartly optimizing the mobile network. Research on this topic has concentrated on making predictions in a centralized fashion, i.e., by collecting data from the different network elements. This translates to a considerable amount of energy for data transmission and processing. In this work, we propose a novel prediction framework based on edge computing which uses datasets obtained on the edge through a large measurement campaign. Two main Deep Learning architectures are designed, based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), and tested under different training conditions. In addition, Knowledge Transfer Learning (KTL) techniques are employed to improve the performance of the models while reducing the required computational resources. Simulation results show that the CNN architectures outperform the RNNs. An estimation of the needed training energy is provided, highlighting KTL's ability to reduce the energy footprint of the models by 60% and 90% for CNNs and RNNs, respectively. Finally, two cutting-edge explainable Artificial Intelligence techniques are employed to interpret the derived learning models.
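Below is a minimal PyTorch sketch of the kind of CNN forecaster the paper compares against RNNs: it maps a window of past traffic volumes to the next value. The window length, channel widths, and kernel sizes are illustrative assumptions, not the paper's architecture.

```python
# A minimal 1-D CNN next-step traffic forecaster (layer sizes are assumptions).
import torch
import torch.nn as nn

class TrafficCNN(nn.Module):
    def __init__(self, window=24):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, x):          # x: (batch, window)
        return self.net(x.unsqueeze(1)).squeeze(-1)

model = TrafficCNN()
past = torch.randn(8, 24)          # 8 series, 24 past time steps each
pred = model(past)                 # next-step prediction, shape (8,)
```

Under this setup, KTL would amount to pre-training the network on one site's traffic and fine-tuning only the final layers on another site's data, which is where the reported energy savings would come from.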

An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI

  • paper_url: http://arxiv.org/abs/2310.14455
  • repo_url: None
  • paper_authors: Ross Gruetzemacher, Alan Chan, Kevin Frazier, Christy Manning, Štěpán Los, James Fox, José Hernández-Orallo, John Burden, Matija Franklin, Clíodhna Ní Ghuidhir, Mark Bailey, Daniel Eth, Toby Pilditch, Kyle Kilian
  • for: Addressing the need for effective governance and regulation of advanced AI systems, particularly in response to the risks they pose.
  • methods: Proposes the creation of an international consortium for AI risk evaluations, bringing together AI developers and third-party evaluators to assess and mitigate risks from advanced AI systems.
  • results: The proposed consortium could play a critical role in coordinating international efforts to manage responsible scaling policies and evaluate risks from advanced AI systems, and could help mitigate societal-scale risks from these systems.
    Abstract Given rapid progress toward advanced AI and risks from frontier AI systems (advanced AI systems pushing the boundaries of the AI capabilities frontier), the creation and implementation of AI governance and regulatory schemes deserves prioritization and substantial investment. However, the status quo is untenable and, frankly, dangerous. A regulatory gap has permitted AI labs to conduct research, development, and deployment activities with minimal oversight. In response, frontier AI system evaluations have been proposed as a way of assessing risks from the development and deployment of frontier AI systems. Yet, the budding AI risk evaluation ecosystem faces significant coordination challenges, such as a limited diversity of evaluators, suboptimal allocation of effort, and perverse incentives. This paper proposes a solution in the form of an international consortium for AI risk evaluations, comprising both AI developers and third-party AI risk evaluators. Such a consortium could play a critical role in international efforts to mitigate societal-scale risks from advanced AI, including in managing responsible scaling policies and coordinated evaluation-based risk response. In this paper, we discuss the current evaluation ecosystem and its shortcomings, propose an international consortium for advanced AI risk evaluations, discuss issues regarding its implementation, discuss lessons that can be learnt from previous international institutions and existing proposals for international AI governance institutions, and, finally, we recommend concrete steps to advance the establishment of the proposed consortium: (i) solicit feedback from stakeholders, (ii) conduct additional research, (iii) conduct a workshop(s) for stakeholders, (iv) analyze feedback and create final proposal, (v) solicit funding, and (vi) create a consortium.

Retrieval-Augmented Chain-of-Thought in Semi-structured Domains

  • paper_url: http://arxiv.org/abs/2310.14435
  • repo_url: https://github.com/vaibhavg152/Retrieval-Augmented-Chain-of-Thought-in-Semi-structured-Domains
  • paper_authors: Vaibhav Mavi, Abulhair Saparov, Chen Zhao
  • for: Question answering in the legal and financial domains, where applying existing QA systems is challenging and requires domain expertise.
  • methods: Explores leveraging the semi-structured nature of legal and financial data to efficiently retrieve relevant context, enabling domain-specialized QA with large language models (LLMs); a retrieve-then-prompt sketch follows the abstract.
  • results: The resulting system outperforms contemporary models and also provides useful explanations for its answers, encouraging the integration of LLMs into legal and financial NLP systems in future research.
    Abstract Applying existing question answering (QA) systems to specialized domains like law and finance presents challenges that necessitate domain expertise. Although large language models (LLMs) have shown impressive language comprehension and in-context learning capabilities, their inability to handle very long inputs/contexts is well known. Tasks specific to these domains need significant background knowledge, leading to contexts that can often exceed the maximum length that existing LLMs can process. This study explores leveraging the semi-structured nature of legal and financial data to efficiently retrieve relevant context, enabling the use of LLMs for domain-specialized QA. The resulting system outperforms contemporary models and also provides useful explanations for the answers, encouraging the integration of LLMs into legal and financial NLP systems for future research.
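A minimal sketch of the retrieve-then-prompt pattern this work builds on: rank titled sections of a semi-structured document by similarity to the question and place only the top sections in the LLM context, keeping the prompt within the model's length limit. TF-IDF stands in for the paper's retriever, and the document, question, and prompt template are hypothetical.

```python
# Retrieve the most relevant sections of a semi-structured document before
# prompting an LLM. TF-IDF is a stand-in for the paper's retrieval method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sections = {                      # semi-structured: section title -> text
    "Termination": "Either party may terminate with 30 days written notice.",
    "Fees": "A late fee of 2% applies to invoices unpaid after 15 days.",
    "Liability": "Liability is capped at the total fees paid in 12 months.",
}
question = "What late fee applies to overdue invoices?"

vec = TfidfVectorizer().fit(list(sections.values()) + [question])
sims = cosine_similarity(vec.transform([question]),
                         vec.transform(sections.values()))[0]
top = sorted(zip(sims, sections), reverse=True)[:2]   # keep best 2 sections

context = "\n".join(f"[{name}] {sections[name]}" for _, name in top)
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer step by step:"
# `prompt` would then be sent to an LLM of choice.
```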

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation

  • paper_url: http://arxiv.org/abs/2310.14424
  • repo_url: None
  • paper_authors: Meriem Boubdir, Edward Kim, Beyza Ermis, Marzieh Fadaee, Sara Hooker
  • for: Human evaluation of large language models, which captures linguistic nuances and user preferences better than automated metrics but is resource-intensive.
  • methods: Metric-based methods that prioritize the data instances which most effectively distinguish between models, reducing the amount of human feedback required (sketched after the abstract).
  • results: Reduces indecisive (or "tie") outcomes by up to 54% compared to a random sample when focusing on the top-20 percentile of prioritized instances, making better use of human feedback and improving evaluation efficiency.
    Abstract Human evaluation is increasingly critical for assessing large language models, capturing linguistic nuances, and reflecting user preferences more accurately than traditional automated metrics. However, the resource-intensive nature of this type of annotation process poses significant challenges. The key question driving our work: "is it feasible to minimize human-in-the-loop feedback by prioritizing data instances which most effectively distinguish between models?" We evaluate several metric-based methods and find that these metrics enhance the efficiency of human evaluations by minimizing the number of required annotations, thus saving time and cost, while ensuring a robust performance evaluation. We show that our method is effective across widely used model families, reducing instances of indecisive (or "tie") outcomes by up to 54% compared to a random sample when focusing on the top-20 percentile of prioritized instances. This potential reduction in required human effort positions our approach as a valuable strategy in future large language model evaluations.
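Below is a minimal sketch of metric-based prioritization: score each prompt by how far apart two models' outputs are under an automatic metric, then send only the top percentile to annotators. Token-overlap distance is a stand-in for the metrics the paper actually studies.

```python
# Prioritize prompts whose model outputs differ most, so human annotators
# see the comparisons most likely to be decisive (non-tie).
def overlap_distance(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(ta & tb) / max(len(ta | tb), 1)

def prioritize(prompts, outputs_a, outputs_b, top_frac=0.2):
    scored = [(overlap_distance(a, b), p)
              for p, a, b in zip(prompts, outputs_a, outputs_b)]
    scored.sort(reverse=True)                 # most-distinguishing first
    k = max(1, int(len(scored) * top_frac))
    return [p for _, p in scored[:k]]         # prompts worth annotating

picks = prioritize(
    ["Summarize X.", "Translate Y."],
    ["x is small", "y means hello"],
    ["x is small", "y translates to farewell"],
)
```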

Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design

  • paper_url: http://arxiv.org/abs/2310.14420
  • repo_url: https://github.com/pnnl/chemreasoner
  • paper_authors: Henry W. Sprueill, Carl Edwards, Mariefel V. Olarte, Udishnu Sanyal, Heng Ji, Sutanay Choudhury
  • for: Discovering novel catalysts, which requires complex reasoning over multiple chemical properties and their trade-offs, leading to combinatorial growth of the search space.
  • methods: A Monte Carlo Tree Search-based approach that improves beyond state-of-the-art chain-of-thought prompting variants to augment scientific reasoning (a generic MCTS skeleton is sketched after the abstract), together with two new reasoning datasets: a curation of computational chemistry simulations, and diverse questions written by catalysis researchers about novel chemical conversion processes.
  • results: Improves over the best baseline by 25.8% and can augment scientists' reasoning and discovery process with novel insights.
    Abstract Discovering novel catalysts requires complex reasoning involving multiple chemical properties and resultant trade-offs, leading to a combinatorial growth in the search space. While large language models (LLM) have demonstrated novel capabilities for chemistry through complex instruction following capabilities and high quality reasoning, a goal-driven combinatorial search using LLMs has not been explored in detail. In this work, we present a Monte Carlo Tree Search-based approach that improves beyond state-of-the-art chain-of-thought prompting variants to augment scientific reasoning. We introduce two new reasoning datasets: 1) a curation of computational chemistry simulations, and 2) diverse questions written by catalysis researchers for reasoning about novel chemical conversion processes. We improve over the best baseline by 25.8\% and find that our approach can augment scientist's reasoning and discovery process with novel insights.
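Here is a compact skeleton of the Monte Carlo Tree Search loop underlying this kind of method: UCB1 selection, expansion via a proposal function (an LLM call in the paper, a placeholder here), evaluation, and reward backpropagation. The `propose` and `reward` functions below are stand-ins, not the paper's prompting or scoring scheme.

```python
# A minimal MCTS skeleton for searching over LLM "thoughts".
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb1(node, c=1.4):
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def propose(state):        # stands in for an LLM proposing refinements
    return [state + f" -> idea{i}" for i in range(3)]

def reward(state):         # stands in for a scientific scoring function
    return random.random()

def mcts(root_state, iters=100):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        while node.children:                     # selection
            node = max(node.children, key=ucb1)
        node.children = [Node(s, node) for s in propose(node.state)]
        leaf = random.choice(node.children)      # expansion
        r = reward(leaf.state)                   # evaluation
        while leaf:                              # backpropagation
            leaf.visits += 1; leaf.value += r; leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).state

best = mcts("catalyst for methane activation")
```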

Vision Language Models in Autonomous Driving and Intelligent Transportation Systems

  • paper_url: http://arxiv.org/abs/2310.14414
  • repo_url: None
  • paper_authors: Xingcheng Zhou, Mingyu Liu, Bare Luka Zagar, Ekim Yurtsever, Alois C. Knoll
  • for: A comprehensive survey of Vision-Language Models (VLMs) in Autonomous Driving (AD) and Intelligent Transportation Systems (ITS), covering current models and datasets as well as future research directions.
  • methods: Surveys state-of-the-art language models, including Large Language Models (LLMs), together with transportation- and driving-specific datasets.
  • results: Identifies numerous applications of VLMs in AD and ITS, including improving driving safety and efficiency and opening new research directions, and discusses the remaining challenges and research gaps that need further work.
    Abstract The applications of Vision-Language Models (VLMs) in the fields of Autonomous Driving (AD) and Intelligent Transportation Systems (ITS) have attracted widespread attention due to their outstanding performance and the ability to leverage Large Language Models (LLMs). By integrating language data, the vehicles, and transportation systems are able to deeply understand real-world environments, improving driving safety and efficiency. In this work, we present a comprehensive survey of the advances in language models in this domain, encompassing current models and datasets. Additionally, we explore the potential applications and emerging research directions. Finally, we thoroughly discuss the challenges and research gap. The paper aims to provide researchers with the current work and future trends of VLMs in AD and ITS.

Be Selfish, But Wisely: Investigating the Impact of Agent Personality in Mixed-Motive Human-Agent Interactions

  • paper_url: http://arxiv.org/abs/2310.14404
  • repo_url: None
  • paper_authors: Kushal Chawla, Ian Wu, Yu Rong, Gale M. Lucas, Jonathan Gratch
  • for: Designing a negotiation dialogue system.
  • methods: Self-play reinforcement learning: an agent is trained by interacting with a simulated user built to imitate human-human dialogue data, and the training procedure is modified in two novel ways to produce agents with diverse personalities.
  • results: Naive self-play fails to learn the value of compromise, often causing the partner to walk away without a deal and hurting overall performance; a selfish agent that maximizes its own outcome while avoiding walkaways performs best.
    Abstract A natural way to design a negotiation dialogue system is via self-play RL: train an agent that learns to maximize its performance by interacting with a simulated user that has been designed to imitate human-human dialogue data. Although this procedure has been adopted in prior work, we find that it results in a fundamentally flawed system that fails to learn the value of compromise in a negotiation, which can often lead to no agreements (i.e., the partner walking away without a deal), ultimately hurting the model's overall performance. We investigate this observation in the context of the DealOrNoDeal task, a multi-issue negotiation over books, hats, and balls. Grounded in negotiation theory from Economics, we modify the training procedure in two novel ways to design agents with diverse personalities and analyze their performance with human partners. We find that although both techniques show promise, a selfish agent, which maximizes its own performance while also avoiding walkaways, performs superior to other variants by implicitly learning to generate value for both itself and the negotiation partner. We discuss the implications of our findings for what it means to be a successful negotiation dialogue system and how these systems should be designed in the future.

O3D: Offline Data-driven Discovery and Distillation for Sequential Decision-Making with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.14403
  • repo_url: None
  • paper_authors: Yuchen Xiao, Yanchao Sun, Mengda Xu, Udari Madhushani, Jared Vann, Deepeka Garg, Sumitra Ganesh
  • for: Improving the performance of large language model (LLM) agents on sequential decision-making problems.
  • methods: Leverages offline data at scale (e.g., logs of human interactions) to improve the in-context learning performance of LLM agents, without finetuning.
  • results: The O3D framework lets LLM agents solve complex, long-horizon tasks without training, by discovering reusable skills and distilling generalizable knowledge across multiple tasks.
    Abstract Recent advancements in large language models (LLMs) have exhibited promising performance in solving sequential decision-making problems. By imitating few-shot examples provided in the prompts (i.e., in-context learning), an LLM agent can interact with an external environment and complete given tasks without additional training. However, such few-shot examples are often insufficient to generate high-quality solutions for complex and long-horizon tasks, while the limited context length cannot consume larger-scale demonstrations. To this end, we propose an offline learning framework that utilizes offline data at scale (e.g, logs of human interactions) to facilitate the in-context learning performance of LLM agents. We formally define LLM-powered policies with both text-based approaches and code-based approaches. We then introduce an Offline Data-driven Discovery and Distillation (O3D) framework to improve LLM-powered policies without finetuning. O3D automatically discovers reusable skills and distills generalizable knowledge across multiple tasks based on offline interaction data, advancing the capability of solving downstream tasks. Empirical results under two interactive decision-making benchmarks (ALFWorld and WebShop) demonstrate that O3D can notably enhance the decision-making capabilities of LLMs through the offline discovery and distillation process, and consistently outperform baselines across various LLMs with both text-based-policy and code-based-policy.

Value of Assistance for Grasping

  • paper_url: http://arxiv.org/abs/2310.14402
  • repo_url: None
  • paper_authors: Mohammad Masarwy, Yuval Goshen, David Dovrat, Sarah Keren
  • for: Robotic grasping tasks where the object pose is uncertain.
  • methods: Probabilistic estimation of the object pose, and a Value of Assistance (VOA) measure for assessing how much a candidate observation is expected to help the grasp (a Monte-Carlo sketch follows the abstract).
  • results: Shown to be effective in both simulated and real-world robotic settings.
    Abstract In many realistic settings, a robot is tasked with grasping an object without knowing its exact pose. Instead, the robot relies on a probabilistic estimation of the pose to decide how to attempt the grasp. We offer a novel Value of Assistance (VOA) measure for assessing the expected effect a specific observation will have on the robot's ability to successfully complete the grasp. Thus, VOA supports the decision of which sensing action would be most beneficial to the grasping task. We evaluate our suggested measures in both simulated and real-world robotic settings.
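A minimal Monte-Carlo sketch of a VOA-style measure: compare the expected grasp success under the current pose belief with the expected success after incorporating a candidate observation. The one-dimensional pose, the success model, and the observation model below are illustrative assumptions, not the paper's formulation.

```python
# VOA-style estimate: expected grasp success after an observation, minus
# expected success before it. All models here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
poses = rng.normal(loc=0.0, scale=2.0, size=5000)     # belief over 1-D pose

def p_success(grasp_at, pose, tol=0.5):
    return np.exp(-((grasp_at - pose) / tol) ** 2)    # toy success model

def expected_success(samples):
    grasp_at = np.mean(samples)                       # grasp at belief mean
    return p_success(grasp_at, samples).mean()

def voa(samples, obs, obs_noise=0.3):
    # Reweight the belief by the observation likelihood (importance weights),
    # then resample to approximate the posterior.
    w = np.exp(-((samples - obs) / obs_noise) ** 2)
    posterior = rng.choice(samples, size=samples.size, p=w / w.sum())
    return expected_success(posterior) - expected_success(samples)

print(voa(poses, obs=0.8))   # positive => the observation is worth taking
```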

Learning to bag with a simulation-free reinforcement learning framework for robots

  • paper_url: http://arxiv.org/abs/2310.14398
  • repo_url: None
  • paper_authors: Francisco Munguia-Galeano, Jihong Zhu, Juan David Hernández, Ze Ji
  • for: Teaching robots to bag deformable objects, such as bags.
  • methods: A learning-based framework that learns the bagging task directly in the real world, without a simulation environment. The task is represented with a set of primitive actions and five states (a toy sketch of this state abstraction follows the abstract), and a reinforcement learning algorithm finds the best grasping points on the bag.
  • results: After about three hours of real-world training, the framework reaches 60% and 80% success rates when starting the bagging task from the folded and unfolded states, respectively, and the trained model generalizes to two bags of different sizes.
    Abstract Bagging is an essential skill that humans perform in their daily activities. However, deformable objects, such as bags, are complex for robots to manipulate. This paper presents an efficient learning-based framework that enables robots to learn bagging. The novelty of this framework is its ability to perform bagging without relying on simulations. The learning process is accomplished through a reinforcement learning algorithm introduced in this work, designed to find the best grasping points of the bag based on a set of compact state representations. The framework utilizes a set of primitive actions and represents the task in five states. In our experiments, the framework reaches 60% and 80% success rates after around three hours of training in the real world when starting the bagging task from the folded and unfolded configurations, respectively. Finally, we test the trained model with two more bags of different sizes to evaluate its generalizability.
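The abstract specifies a five-state task representation with primitive actions. Below is a tabular Q-learning sketch over an assumed toy version of such a state machine; the paper learns grasp points in the real world, so the states, actions, and transitions here are illustrative only.

```python
# Tabular Q-learning over an assumed 5-state bagging abstraction.
import random

STATES = ["folded", "unfolded", "rim_grasped", "opened", "bagged"]
ACTIONS = ["grasp_rim", "lift", "open", "insert"]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def step(s, a):                      # toy transition/reward model
    good = {("folded", "lift"): "unfolded",
            ("unfolded", "grasp_rim"): "rim_grasped",
            ("rim_grasped", "open"): "opened",
            ("opened", "insert"): "bagged"}
    s2 = good.get((s, a), s)         # wrong actions leave the state unchanged
    return s2, (1.0 if s2 == "bagged" else 0.0)

alpha, gamma, eps = 0.5, 0.9, 0.2
for _ in range(2000):                # episodes
    s = "folded"
    for _ in range(50):              # step cap per episode
        if s == "bagged":
            break
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda a: Q[(s, a)]))
        s2, r = step(s, a)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2
```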

Merging Generated and Retrieved Knowledge for Open-Domain QA

  • paper_url: http://arxiv.org/abs/2310.14393
  • repo_url: https://github.com/yunx-z/combo
  • paper_authors: Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Lu Wang
  • for: Improving open-domain question answering (QA) systems by effectively leveraging two sources of information: retrieved passages and large language models (LLMs).
  • methods: A Compatibility-Oriented knowledge Merging (COMBO) framework that matches LLM-generated passages with retrieved counterparts into compatible pairs, based on discriminators trained with silver compatibility labels; a Fusion-in-Decoder-based reader model handles the passage pairs to arrive at the final answer (a toy pairing sketch follows the abstract).
  • results: COMBO outperforms competitive baselines on three out of four tested open-domain QA benchmarks, and demonstrates greater efficacy in scenarios with a higher degree of knowledge conflicts.
    Abstract Open-domain question answering (QA) systems are often built with retrieval modules. However, retrieving passages from a given source is known to suffer from insufficient knowledge coverage. Alternatively, prompting large language models (LLMs) to generate contextual passages based on their parametric knowledge has been shown to improve QA performance. Yet, LLMs tend to "hallucinate" content that conflicts with the retrieved knowledge. Based on the intuition that answers supported by both sources are more likely to be correct, we propose COMBO, a Compatibility-Oriented knowledge Merging for Better Open-domain QA framework, to effectively leverage the two sources of information. Concretely, we match LLM-generated passages with retrieved counterparts into compatible pairs, based on discriminators trained with silver compatibility labels. Then a Fusion-in-Decoder-based reader model handles passage pairs to arrive at the final answer. Experiments show that COMBO outperforms competitive baselines on three out of four tested open-domain QA benchmarks. Further analysis reveals that our proposed framework demonstrates greater efficacy in scenarios with a higher degree of knowledge conflicts.
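A minimal sketch of compatibility-oriented pairing: score every (LLM-generated, retrieved) passage pair with a discriminator and greedily match each generated passage to its most compatible retrieved counterpart. Token overlap stands in for the trained discriminators described in the paper.

```python
# Greedy compatibility pairing of generated and retrieved passages.
def compat(gen: str, ret: str) -> float:
    g, r = set(gen.lower().split()), set(ret.lower().split())
    return len(g & r) / max(len(g | r), 1)   # stand-in discriminator score

def pair_passages(generated, retrieved):
    pairs, used = [], set()
    for g in generated:
        best = max((r for r in retrieved if r not in used),
                   key=lambda r: compat(g, r), default=None)
        if best is not None:
            pairs.append((g, best)); used.add(best)
    return pairs          # fed jointly to a Fusion-in-Decoder-style reader

pairs = pair_passages(
    ["Paris is the capital of France."],
    ["France's capital city is Paris.", "Berlin is in Germany."],
)
```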

ARCOQ: Arabic Closest Opposite Questions Dataset

  • paper_url: http://arxiv.org/abs/2310.14384
  • repo_url: https://github.com/sandrarizkallah/arcoq-dataset
  • paper_authors: Sandra Rizkallah, Amir F. Atiya, Samir Shaheen
  • for: A dataset of closest-opposite questions for the Arabic language, the first of its kind, useful for evaluating antonymy detection systems; its structure follows the English Graduate Record Examination (GRE) closest-opposite questions dataset.
  • methods: The dataset consists of 500 questions, each containing a query word whose closest opposite must be determined from a set of candidate words; each question is associated with the correct answer (a naive embedding baseline is sketched after the abstract).
  • results: The dataset is publicly released with standard development and test splits, along with a benchmark of the performance of different Arabic word embedding models on it.
    Abstract This paper presents a dataset for closest opposite questions in Arabic language. The dataset is the first of its kind for the Arabic language. It is beneficial for the assessment of systems on the aspect of antonymy detection. The structure is similar to that of the Graduate Record Examination (GRE) closest opposite questions dataset for the English language. The introduced dataset consists of 500 questions, each contains a query word for which the closest opposite needs to be determined from among a set of candidate words. Each question is also associated with the correct answer. We publish the dataset publicly in addition to providing standard splits of the dataset into development and test sets. Moreover, the paper provides a benchmark for the performance of different Arabic word embedding models on the introduced dataset.
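A naive embedding baseline for closest-opposite questions picks the candidate least cosine-similar to the query word. Real antonym detection is harder (antonyms often have *similar* embeddings, since they share contexts), which is exactly what such a benchmark probes; the vectors below are toy values, not a real Arabic embedding model.

```python
# Naive closest-opposite baseline: minimum cosine similarity to the query.
import numpy as np

emb = {                       # toy 3-d word vectors (assumed, not real)
    "hot":   np.array([0.9, 0.1, 0.0]),
    "cold":  np.array([-0.8, 0.2, 0.1]),
    "warm":  np.array([0.7, 0.2, 0.0]),
    "humid": np.array([0.3, 0.8, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest_opposite(query, candidates):
    return min(candidates, key=lambda c: cosine(emb[query], emb[c]))

print(closest_opposite("hot", ["cold", "warm", "humid"]))   # -> "cold"
```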

MoPe: Model Perturbation-based Privacy Attacks on Language Models

  • paper_url: http://arxiv.org/abs/2310.14369
  • repo_url: None
  • paper_authors: Marvin Li, Jason Wang, Jeffrey Wang, Seth Neel
  • for: Investigating whether large language models (LLMs) unintentionally leak sensitive information present in their training data.
  • methods: Model Perturbations (MoPe), a new method that determines with high confidence whether a given text is in the training data of a pre-trained language model, given white-box access to the model's parameters. MoPe adds noise to the model in parameter space and measures the drop in log-likelihood at a point $x$, a statistic shown to approximate the trace of the Hessian matrix with respect to the model parameters (sketched after the abstract).
  • results: Across language models ranging from 70M to 12B parameters, MoPe is more effective than existing loss-based attacks and recently proposed perturbation-based methods. The study also examines the role of training point order and model size in attack success, and shows empirically that MoPe accurately approximates the trace of the Hessian in practice. Loss alone is insufficient to determine extractability: some training points with average loss can still be recovered, casting doubt on prior works that use the loss of a point as evidence of memorization or unlearning.
    Abstract Recent work has shown that Large Language Models (LLMs) can unintentionally leak sensitive information present in their training data. In this paper, we present Model Perturbations (MoPe), a new method to identify with high confidence if a given text is in the training data of a pre-trained language model, given white-box access to the models parameters. MoPe adds noise to the model in parameter space and measures the drop in log-likelihood at a given point $x$, a statistic we show approximates the trace of the Hessian matrix with respect to model parameters. Across language models ranging from $70$M to $12$B parameters, we show that MoPe is more effective than existing loss-based attacks and recently proposed perturbation-based methods. We also examine the role of training point order and model size in attack success, and empirically demonstrate that MoPe accurately approximate the trace of the Hessian in practice. Our results show that the loss of a point alone is insufficient to determine extractability -- there are training points we can recover using our method that have average loss. This casts some doubt on prior works that use the loss of a point as evidence of memorization or unlearning.
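A minimal sketch of the MoPe statistic as described: perturb the parameters with Gaussian noise several times and average the resulting drop in log-likelihood at a candidate text. The noise scale and perturbation count are assumptions, and the commented usage shows how it might be wired to a Hugging Face causal LM (model name illustrative).

```python
# MoPe-style score: average log-likelihood drop under parameter noise.
import copy
import torch

@torch.no_grad()
def log_likelihood(model, input_ids):
    out = model(input_ids=input_ids, labels=input_ids)
    return -out.loss.item() * (input_ids.numel() - 1)   # total, not mean

@torch.no_grad()
def mope_score(model, input_ids, n=10, sigma=5e-3):
    base = log_likelihood(model, input_ids)
    drops = []
    for _ in range(n):
        noisy = copy.deepcopy(model)
        for p in noisy.parameters():
            p.add_(torch.randn_like(p) * sigma)          # Gaussian perturbation
        drops.append(base - log_likelihood(noisy, input_ids))
    return sum(drops) / n    # large drop suggests x was a training point

# Usage with a Hugging Face causal LM (model name is illustrative):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("gpt2")
#   model = AutoModelForCausalLM.from_pretrained("gpt2")
#   ids = tok("some candidate text", return_tensors="pt").input_ids
#   print(mope_score(model, ids))
```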

  • paper_url: http://arxiv.org/abs/2310.14358
  • repo_url: None
  • paper_authors: Elena Sergeeva, Anastasia Sergeeva, Huiyun Tang, Kerstin Bongard-Blanchy, Peter Szolovits
  • for: Exploring users' acceptance of AI advice when evaluating the truthfulness of health-related claims under different "advice quality" settings.
  • methods: An exploratory evaluation in which users receive different types of AI advice while assessing health-related claims.
  • results: Even feedback confined to stating that "the AI thinks the statement is false/true" moves more than half of participants' veracity assessments towards the AI suggestion. The type of advice influences acceptance rates, but the sheer effect of receiving a suggestion is often larger than the suggestion-type effect.
    Abstract Previous research on expert advice-taking shows that humans exhibit two contradictory behaviors: on the one hand, people tend to overvalue their own opinions undervaluing the expert opinion, and on the other, people often defer to other people's advice even if the advice itself is rather obviously wrong. In our study, we conduct an exploratory evaluation of users' AI-advice accepting behavior when evaluating the truthfulness of a health-related statement in different "advice quality" settings. We find that even feedback that is confined to just stating that "the AI thinks that the statement is false/true" results in more than half of people moving their statement veracity assessment towards the AI suggestion. The different types of advice given influence the acceptance rates, but the sheer effect of getting a suggestion is often bigger than the suggestion-type effect.

From Chaos to Clarity: Claim Normalization to Empower Fact-Checking

  • paper_url: http://arxiv.org/abs/2310.14338
  • repo_url: None
  • paper_authors: Megha Sundriyal, Tanmoy Chakraborty, Preslav Nakov
  • for: Identifying precise and prominent claims in social media posts that require verification, by introducing a novel task called Claim Normalization (ClaimNorm) and a pioneering approach, CACN, that leverages human reasoning processes and large language models.
  • methods: A two-stage approach that first uses chain-of-thought and claim check-worthiness estimation to comprehend intricate claims, then leverages large language models' in-context learning abilities to provide guidance and improve the claim normalization process.
  • results: On a comprehensive real-world dataset (CLAN) of more than 6k social media posts paired with their normalized claims, CACN outperforms several baselines across various evaluation measures.
    Abstract With the proliferation of social media platforms, users are exposed to vast information, including posts containing misleading claims. However, the pervasive noise inherent in these posts presents a challenge in identifying precise and prominent claims that require verification. Extracting the core assertions from such posts is arduous and time-consuming. We introduce a novel task called Claim Normalization (aka ClaimNorm) that aims to decompose complex and noisy social media posts into more straightforward and understandable forms, termed normalized claims. We propose CACN, a pioneering approach that leverages chain-of-thought and claim check-worthiness estimation, mimicking human reasoning processes, to comprehend intricate claims. Moreover, we capitalize on large language models' powerful in-context learning abilities to provide guidance and improve the claim normalization process. To evaluate the effectiveness of our proposed model, we meticulously compile a comprehensive real-world dataset, CLAN, comprising more than 6k instances of social media posts alongside their respective normalized claims. Experimentation demonstrates that CACN outperforms several baselines across various evaluation measures. A rigorous error analysis validates CACN's capabilities and pitfalls.

Learning Interpretable Rules for Scalable Data Representation and Classification

  • paper_url: http://arxiv.org/abs/2310.14336
  • repo_url: https://github.com/12wang3/rrl
  • paper_authors: Zhuo Wang, Wei Zhang, Ning Liu, Jianyong Wang
  • for: This paper aims to improve the scalability and interpretability of rule-based models for data representation and classification.
  • methods: The proposed method, called Rule-based Representation Learner (RRL), uses a novel training method called Gradient Grafting to optimize the discrete model using gradient descent, and employs a novel design of logical activation functions to increase scalability.
  • results: RRL outperforms competitive interpretable approaches on ten small and four large data sets, and can be easily adjusted to obtain a trade-off between classification accuracy and model complexity for different scenarios.
    Abstract Rule-based models, e.g., decision trees, are widely used in scenarios demanding high model interpretability for their transparent inner structures and good model expressivity. However, rule-based models are hard to optimize, especially on large data sets, due to their discrete parameters and structures. Ensemble methods and fuzzy/soft rules are commonly used to improve performance, but they sacrifice the model interpretability. To obtain both good scalability and interpretability, we propose a new classifier, named Rule-based Representation Learner (RRL), that automatically learns interpretable non-fuzzy rules for data representation and classification. To train the non-differentiable RRL effectively, we project it to a continuous space and propose a novel training method, called Gradient Grafting, that can directly optimize the discrete model using gradient descent. A novel design of logical activation functions is also devised to increase the scalability of RRL and enable it to discretize the continuous features end-to-end. Exhaustive experiments on ten small and four large data sets show that RRL outperforms the competitive interpretable approaches and can be easily adjusted to obtain a trade-off between classification accuracy and model complexity for different scenarios. Our code is available at: https://github.com/12wang3/rrl.
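The abstract's "logical activation functions" admit a common continuous relaxation: conjunction as a product of membership terms and disjunction as its De Morgan dual. The sketch below shows that relaxation; the exact parameterization used in RRL may differ.

```python
# Differentiable logical activations of the kind rule-based representation
# learning builds on (a common relaxation, not necessarily RRL's exact form).
import torch

def conjunction(x, w):
    # x: (batch, n) inputs in [0, 1]; w: (m, n) rule weights in [0, 1].
    # Each output approximates AND over the inputs its weights select:
    # a term contributes (1 - w*(1-x)), so it is ignored when w = 0.
    return torch.prod(1 - w.unsqueeze(0) * (1 - x.unsqueeze(1)), dim=-1)

def disjunction(x, w):
    # De Morgan dual: OR(x) = 1 - AND(not x).
    return 1 - torch.prod(1 - w.unsqueeze(0) * x.unsqueeze(1), dim=-1)

x = torch.tensor([[1.0, 0.0, 1.0]])   # three binarized features
w = torch.tensor([[1.0, 0.0, 1.0]])   # one rule: feature1 AND feature3
print(conjunction(x, w))              # ~1.0: the rule fires
print(disjunction(x, w))              # ~1.0: at least one selected input is 1
```

Because these activations are products, their gradients vanish easily at scale, which is one motivation for training tricks like the paper's Gradient Grafting rather than plain backpropagation.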

CLMSM: A Multi-Task Learning Framework for Pre-training on Procedural Text

  • paper_url: http://arxiv.org/abs/2310.14326
  • repo_url: None
  • paper_authors: Abhilash Nandy, Manav Nitin Kapadnis, Pawan Goyal, Niloy Ganguly
  • For: + The paper is written for proposing a domain-specific, continual pre-training framework for procedural NLP tasks.* Methods: + The framework uses a Multi-Task Learning Framework to optimize two objectives: contrastive learning using hard triplets and a novel mask-step modeling objective.* Results: + The proposed framework (CLMSM) outperforms baselines on recipes (in-domain) and is able to generalize to open-domain procedural NLP tasks.
    Abstract In this paper, we propose CLMSM, a domain-specific, continual pre-training framework, that learns from a large set of procedural recipes. CLMSM uses a Multi-Task Learning Framework to optimize two objectives - a) Contrastive Learning using hard triplets to learn fine-grained differences across entities in the procedures, and b) a novel Mask-Step Modelling objective to learn step-wise context of a procedure. We test the performance of CLMSM on the downstream tasks of tracking entities and aligning actions between two procedures on three datasets, one of which is an open-domain dataset not conforming with the pre-training dataset. We show that CLMSM not only outperforms baselines on recipes (in-domain) but is also able to generalize to open-domain procedural NLP tasks.
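A minimal sketch of the contrastive half of this kind of pre-training: a triplet margin loss pulls a procedure's embedding toward a "hard" similar procedure and pushes it away from a dissimilar one. The encoder, embedding size, and margin are placeholders.

```python
# Triplet margin loss over procedure embeddings (encoder not shown).
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.5):
    # anchor/positive/negative: (batch, dim) procedure embeddings
    d_pos = 1 - F.cosine_similarity(anchor, positive)
    d_neg = 1 - F.cosine_similarity(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

B, D = 4, 128
a, p, n = (torch.randn(B, D) for _ in range(3))
loss = triplet_loss(a, p, n)   # backpropagated into the shared encoder
```

The mask-step modeling objective would be analogous to masked language modeling, except whole procedure steps, rather than individual tokens, are masked and reconstructed.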

A Survey on Semantic Processing Techniques

  • paper_url: http://arxiv.org/abs/2310.18345
  • repo_url: None
  • paper_authors: Rui Mao, Kai He, Xulang Zhang, Guanyi Chen, Jinjie Ni, Zonglin Yang, Erik Cambria
  • for: Surveying recent advances in semantic processing within computational linguistics, and its expansion into and integration with various application domains.
  • methods: Analyzes five semantic processing tasks: word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection, reviewing relevant theoretical research, advanced methods, and downstream applications for each.
  • results: Summarizes and compares research on these semantic processing tasks, discusses technical and application trends, and proposes directions and possibilities for future development.
    Abstract Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyzed five semantic processing tasks, e.g., word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.

Chainpoll: A high efficacy method for LLM hallucination detection

  • paper_url: http://arxiv.org/abs/2310.18344
  • repo_url: None
  • paper_authors: Robert Friel, Atindriyo Sanyal
  • for: Proposing ChainPoll, a new hallucination detection method, together with RealHall, a refined collection of benchmark datasets for assessing the hallucination detection metrics of recent studies.
  • methods: Tasks and datasets from previous hallucination detection studies were assessed and found largely unsuitable for today's strong LLMs; four challenging, real-world-relevant datasets were selected for RealHall, on which ChainPoll is compared against numerous recent hallucination metrics (a hedged sketch of a ChainPoll-style score follows the abstract).
  • results: ChainPoll outperforms on all RealHall benchmarks with an overall AUROC of 0.781, beating the next best theoretical method by 11% and exceeding industry standards by over 23%, while being cost-effective and more transparent than other metrics. Two novel hallucination metrics are also introduced: Adherence (for Retrieval Augmented Generation workflows, evaluating an LLM's analytical capabilities within given documents and contexts) and Correctness (identifying logical and reasoning errors).
    Abstract Large language models (LLMs) have experienced notable advancements in generating coherent and contextually relevant responses. However, hallucinations - incorrect or unfounded claims - are still prevalent, prompting the creation of automated metrics to detect these in LLM outputs. Our contributions include: introducing ChainPoll, an innovative hallucination detection method that excels compared to its counterparts, and unveiling RealHall, a refined collection of benchmark datasets to assess hallucination detection metrics from recent studies. While creating RealHall, we assessed tasks and datasets from previous hallucination detection studies and observed that many are not suitable for the potent LLMs currently in use. Overcoming this, we opted for four datasets challenging for modern LLMs and pertinent to real-world scenarios. Using RealHall, we conducted a comprehensive comparison of ChainPoll with numerous hallucination metrics from recent studies. Our findings indicate that ChainPoll outperforms in all RealHall benchmarks, achieving an overall AUROC of 0.781. This surpasses the next best theoretical method by 11% and exceeds industry standards by over 23%. Additionally, ChainPoll is cost-effective and offers greater transparency than other metrics. We introduce two novel metrics to assess LLM hallucinations: Adherence and Correctness. Adherence is relevant to Retrieval Augmented Generation workflows, evaluating an LLM's analytical capabilities within given documents and contexts. In contrast, Correctness identifies logical and reasoning errors.
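The abstract does not spell out ChainPoll's mechanics; as the name suggests, a natural reading is to poll an LLM judge several times with a chain-of-thought verification prompt and average the votes. The sketch below implements that reading; the prompt wording is an assumption, and `ask_llm` is whatever chat-completion client the caller supplies.

```python
# A ChainPoll-style hallucination score: fraction of "yes, hallucinated"
# verdicts over n chain-of-thought polls of an LLM judge.
from typing import Callable

def chainpoll_score(question: str, answer: str,
                    ask_llm: Callable[[str], str], n: int = 5) -> float:
    prompt = (
        "Does the answer below contain hallucinations, i.e. claims not "
        "supported by the question or by common knowledge? Think step by "
        "step, then end your reply with the single word YES or NO.\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    votes = [ask_llm(prompt).strip().upper().endswith("YES")
             for _ in range(n)]
    return sum(votes) / n   # 0.0 = likely clean, 1.0 = likely hallucinated

# Usage with a stub judge (replace with a real LLM call):
score = chainpoll_score("Who wrote Hamlet?",
                        "Hamlet was written by Dickens.",
                        ask_llm=lambda p: "Step 1: ... YES")
```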

NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval

  • paper_url: http://arxiv.org/abs/2310.14282
  • repo_url: None
  • paper_authors: Uri Katz, Matan Vetzler, Amir DN Cohen, Yoav Goldberg
  • for: Advancing the Named Entity Recognition (NER) task beyond its current form, arguing that LLM capabilities are a beginning rather than the end of NER research.
  • methods: Introduces three harder task variants: finer-grained and intersectional entity types; zero-shot recognition and extraction of those types from entity-type labels; and a novel retrieval setup where the query is a zero-shot entity type and the result is all sentences in a large pre-indexed corpus containing such entities, with their spans. A large silver-annotated corpus of 4 million paragraphs covering 500 entity types is released to support this research.
  • results: All three variants are shown to be far from solved by current models.
    Abstract Recognizing entities in texts is a central need in many information-seeking scenarios, and indeed, Named Entity Recognition (NER) is arguably one of the most successful examples of a widely adopted NLP task and corresponding NLP technology. Recent advances in large language models (LLMs) appear to provide effective solutions (also) for NER tasks that were traditionally handled with dedicated models, often matching or surpassing the abilities of the dedicated models. Should NER be considered a solved problem? We argue to the contrary: the capabilities provided by LLMs are not the end of NER research, but rather an exciting beginning. They allow taking NER to the next level, tackling increasingly more useful, and increasingly more challenging, variants. We present three variants of the NER task, together with a dataset to support them. The first is a move towards more fine-grained -- and intersectional -- entity types. The second is a move towards zero-shot recognition and extraction of these fine-grained types based on entity-type labels. The third, and most challenging, is the move from the recognition setup to a novel retrieval setup, where the query is a zero-shot entity type, and the expected result is all the sentences from a large, pre-indexed corpus that contain entities of these types, and their corresponding spans. We show that all of these are far from being solved. We provide a large, silver-annotated corpus of 4 million paragraphs covering 500 entity types, to facilitate research towards all of these three goals.

RSM-NLP at BLP-2023 Task 2: Bangla Sentiment Analysis using Weighted and Majority Voted Fine-Tuned Transformers

  • paper_url: http://arxiv.org/abs/2310.14261
  • repo_url: https://github.com/ptnv-s/rsm-nlp-blp-task2
  • paper_authors: Pratinav Seth, Rashi Goel, Komal Mathur, Swetha Vemulapalli
  • for: Improving automatic sentiment analysis of Bangla social media content.
  • methods: Experimenting with and fine-tuning various multilingual and pre-trained BERT models, and employing majority-voting and weighted ensemble models to improve accuracy (sketched after the abstract).
  • results: Scored 0.711 on the multiclass classification task and ranked 10th among participants on the leaderboard for the shared task.
    Abstract This paper describes our approach to submissions made at Shared Task 2 at BLP Workshop - Sentiment Analysis of Bangla Social Media Posts. Sentiment Analysis is an action research area in the digital age. With the rapid and constant growth of online social media sites and services and the increasing amount of textual data, the application of automatic Sentiment Analysis is on the rise. However, most of the research in this domain is based on the English language. Despite being the world's sixth most widely spoken language, little work has been done in Bangla. This task aims to promote work on Bangla Sentiment Analysis while identifying the polarity of social media content by determining whether the sentiment expressed in the text is Positive, Negative, or Neutral. Our approach consists of experimenting and finetuning various multilingual and pre-trained BERT-based models on our downstream tasks and using a Majority Voting and Weighted ensemble model that outperforms individual baseline model scores. Our system scored 0.711 on the multiclass classification task and placed 10th among the participants on the leaderboard for the shared task. Our code is available at https://github.com/ptnv-s/RSM-NLP-BLP-Task2 .
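Below is a minimal sketch of the two ensembling schemes named in the paper: hard majority voting over per-model predictions, and a weighted average of class probabilities (in practice the weights would come from validation scores). The probability values are illustrative.

```python
# Majority voting and weighted ensembling over fine-tuned classifiers.
import numpy as np

# probs: (n_models, n_examples, n_classes) softmax outputs of the
# fine-tuned BERT variants; values here are illustrative.
probs = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]],
    [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]],
    [[0.2, 0.5, 0.3], [0.1, 0.3, 0.6]],
])

majority = np.apply_along_axis(
    lambda v: np.bincount(v, minlength=3).argmax(), 0,
    probs.argmax(-1))                     # (n_examples,) hard vote

weights = np.array([0.5, 0.3, 0.2])       # e.g. normalized validation F1
weighted = (weights[:, None, None] * probs).sum(0).argmax(-1)
```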

High-Quality 3D Face Reconstruction with Affine Convolutional Networks

  • paper_url: http://arxiv.org/abs/2310.14237
  • repo_url: None
  • paper_authors: Zhiqian Lin, Jiangke Lin, Lincheng Li, Yi Yuan, Zhengxia Zou
  • for: This paper aims to tackle the challenges of canonical view reconstruction from a single input image, specifically the spatial misalignment between the input and output images.
  • methods: The proposed method uses an Affine Convolution Network (ACN) architecture to handle spatially non-corresponding input and output images, and represents 3D human heads in UV space with multiple components, including diffuse maps for texture, position maps for geometry, and light maps for complex lighting conditions.
  • results: The method generates high-quality UV maps at a resolution of 512 x 512 pixels, while previous approaches typically generate maps at 256 x 256 pixels or smaller.
    Abstract Recent works based on convolutional encoder-decoder architecture and 3DMM parameterization have shown great potential for canonical view reconstruction from a single input image. Conventional CNN architectures benefit from exploiting the spatial correspondence between the input and output pixels. However, in 3D face reconstruction, the spatial misalignment between the input image (e.g. face) and the canonical/UV output makes the feature encoding-decoding process quite challenging. In this paper, to tackle this problem, we propose a new network architecture, namely the Affine Convolution Networks, which enables CNN based approaches to handle spatially non-corresponding input and output images and maintain high-fidelity quality output at the same time. In our method, an affine transformation matrix is learned from the affine convolution layer for each spatial location of the feature maps. In addition, we represent 3D human heads in UV space with multiple components, including diffuse maps for texture representation, position maps for geometry representation, and light maps for recovering more complex lighting conditions in the real world. All the components can be trained without any manual annotations. Our method is parametric-free and can generate high-quality UV maps at resolution of 512 x 512 pixels, while previous approaches normally generate 256 x 256 pixels or smaller. Our code will be released once the paper is accepted.

Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization

  • paper_url: http://arxiv.org/abs/2310.15196
  • repo_url: https://github.com/bill-cjb/emnh
  • paper_authors: Jinbiao Chen, Jiahai Wang, Zizhen Zhang, Zhiguang Cao, Te Ye, Siyuan Chen
  • for: Solving multi-objective combinatorial optimization problems (MOCOPs).
  • methods: Neural heuristics based on deep reinforcement learning; an efficient meta neural heuristic (EMNH) in which a meta-model is first trained (accelerated via a partially architecture-shared multi-task model and stabilized via a scaled symmetric sampling method over weight vectors) and then fine-tuned with a few steps to solve the corresponding single-objective subproblems, handled systematically by an efficient hierarchical method.
  • results: EMNH outperforms state-of-the-art neural heuristics in both solution quality and learning efficiency, and yields solutions competitive with strong traditional heuristics while consuming much less time.
    Abstract Recently, neural heuristics based on deep reinforcement learning have exhibited promise in solving multi-objective combinatorial optimization problems (MOCOPs). However, they are still struggling to achieve high learning efficiency and solution quality. To tackle this issue, we propose an efficient meta neural heuristic (EMNH), in which a meta-model is first trained and then fine-tuned with a few steps to solve corresponding single-objective subproblems. Specifically, for the training process, a (partial) architecture-shared multi-task model is leveraged to achieve parallel learning for the meta-model, so as to speed up the training; meanwhile, a scaled symmetric sampling method with respect to the weight vectors is designed to stabilize the training. For the fine-tuning process, an efficient hierarchical method is proposed to systematically tackle all the subproblems. Experimental results on the multi-objective traveling salesman problem (MOTSP), multi-objective capacitated vehicle routing problem (MOCVRP), and multi-objective knapsack problem (MOKP) show that, EMNH is able to outperform the state-of-the-art neural heuristics in terms of solution quality and learning efficiency, and yield competitive solutions to the strong traditional heuristics while consuming much shorter time.
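The abstract names a "scaled symmetric sampling method with respect to the weight vectors" but does not spell it out. One plausible reading, sketched below purely as an assumption, is to pair each random preference vector on the simplex with all of its coordinate permutations, so every objective is stressed symmetrically within a training batch:

```python
import itertools
import numpy as np

def symmetric_weight_batch(m, rng):
    """Draw one preference vector on the (m-1)-simplex and return all of its
    coordinate permutations (each still sums to 1). Illustrative guess at the
    paper's scheme, not the authors' exact method."""
    w = rng.dirichlet(np.ones(m))
    return np.array(list(itertools.permutations(w)))

rng = np.random.default_rng(0)
batch = symmetric_weight_batch(3, rng)   # shape (6, 3) for 3 objectives
print(batch.sum(axis=1))                 # all ~1.0
```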

Neural Multi-Objective Combinatorial Optimization with Diversity Enhancement

  • paper_url: http://arxiv.org/abs/2310.15195
  • repo_url: https://github.com/bill-cjb/nhde
  • paper_authors: Jinbiao Chen, Zizhen Zhang, Zhiguang Cao, Yaoxin Wu, Yining Ma, Te Ye, Jiahai Wang
  • for: solves multi-objective combinatorial optimization (MOCO) problems with a novel neural heuristic that enhances diversity.
  • methods: uses an indicator-enhanced deep reinforcement learning method and a heterogeneous graph attention mechanism to capture the relations between the instance graph and the Pareto front graph, as well as a multiple Pareto optima strategy to sample and preserve desirable solutions.
  • results: generates a Pareto front with higher diversity, achieving superior overall performance on classic MOCO problems; the method is generic and can be applied to different neural methods for MOCO
    Abstract Most existing neural methods for multi-objective combinatorial optimization (MOCO) problems rely solely on decomposition, which often leads to repetitive solutions for the respective subproblems, and thus a limited Pareto set. Beyond decomposition, we propose a novel neural heuristic with diversity enhancement (NHDE) to produce more Pareto solutions from two perspectives. On the one hand, to hinder duplicated solutions for different subproblems, we propose an indicator-enhanced deep reinforcement learning method to guide the model, and design a heterogeneous graph attention mechanism to capture the relations between the instance graph and the Pareto front graph. On the other hand, to uncover more solutions in the neighborhood of each subproblem, we present a multiple Pareto optima strategy to sample and preserve desirable solutions. Experimental results on classic MOCO problems show that our NHDE is able to generate a Pareto front with higher diversity, thereby achieving superior overall performance. Moreover, our NHDE is generic and can be applied to different neural methods for MOCO.
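An "indicator-enhanced" reward typically scores a candidate by how much it improves a quality indicator of the current Pareto front. As a minimal self-contained sketch (assuming, for illustration, the standard hypervolume indicator on a 2-objective minimization problem; the paper may use a different indicator):

```python
def hv2d(front, ref):
    """Hypervolume of a 2-D minimization front w.r.t. reference point ref."""
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(front):
        if y < prev_y:
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv

def indicator_reward(front, candidate, ref):
    # Reward = hypervolume improvement contributed by the candidate solution.
    return hv2d(front + [candidate], ref) - hv2d(front, ref)

front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(indicator_reward(front, (1.5, 2.5), ref=(5.0, 5.0)))  # 0.75: adds diversity
```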

MIRACLE: Towards Personalized Dialogue Generation with Latent-Space Multiple Personal Attribute Control

  • paper_url: http://arxiv.org/abs/2310.18342
  • repo_url: https://github.com/lzy-the-boys/miracle
  • paper_authors: Zhenyi Lu, Wei Wei, Xiaoye Qu, XianLing Mao, Dangyang Chen, Jixiong Chen
  • for: endowing dialogue agents with more human-like, personalized traits for natural human-machine conversation
  • methods: a personalized dialogue generation method that controls multiple personal attributes (e.g., language style, inner character nuances) within latent-space energy-based models, combining a conditional variational autoencoder with a dedicated energy function and a customized ODE sampling method
  • results: experiments show higher personality controllability and response quality than strong baselines, with flexible attribute composition across diverse dialogue scenarios
    Abstract Personalized dialogue systems aim to endow the chatbot agent with more anthropomorphic traits for human-like interactions. Previous approaches have explored explicit user profile modeling using text descriptions, implicit derivation of user embeddings, or handcrafted prompts for ChatGPT-like models. However, textual personas are limited in describing multi-faceted attributes (\emph{e.g.}, \emph{language style, inner character nuances}), implicit embedding suffers from personality sparsity, and handcrafted prompts lack fine-grained and stable controllability. Hence, these approaches may struggle with complex personalized dialogue generation tasks that require generating controllable responses with multiple personal attributes. To this end, we propose \textbf{\textsc{Miracle}}, a novel personalized dialogue generation method through \textbf{M}ult\textbf{I}ple Pe\textbf{R}sonal \textbf{A}ttributes \textbf{C}ontrol within \textbf{L}atent-Space \textbf{E}nergy-based Models. Specifically, our approach first disentangles complex personality into multi-faceted attributes. Subsequently, we employ a conditional variational auto-encoder to align with the dense personalized responses within a latent joint attribute space. We have also tailored a dedicated energy function and customized an ordinary differential equation sampling method to offer flexible attribute composition and precise attribute control. Extensive experiments demonstrate that \textsc{Miracle} outperforms several strong baselines in terms of personality controllability and response generation quality. Our dataset and code are available at \url{https://github.com/LZY-the-boys/MIRACLE}
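The key mechanism, composing per-attribute energy functions in a shared latent space and sampling a low-energy latent, can be sketched in a few lines. The tiny MLP energies and Euler gradient flow below are placeholders under our assumptions, not the paper's trained models or its exact ODE sampler:

```python
import torch
import torch.nn as nn

dim = 16
# One small energy network per personal attribute (e.g. style, persona, tone).
energies = nn.ModuleList([nn.Sequential(nn.Linear(dim, 64), nn.Tanh(),
                                         nn.Linear(64, 1)) for _ in range(3)])

def total_energy(z, weights):
    # Flexible attribute composition: weighted sum of attribute energies.
    return sum(w * e(z).sum() for w, e in zip(weights, energies))

z = torch.randn(1, dim, requires_grad=True)
weights = [1.0, 1.0, 0.5]            # emphasize attributes 1 and 2 over 3
for _ in range(50):                   # simple Euler discretization of the flow
    grad, = torch.autograd.grad(total_energy(z, weights), z)
    z = (z - 0.1 * grad).detach().requires_grad_(True)
# z now approximates a low-energy latent; a decoder would map it to a response.
```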

UniMAP: Universal SMILES-Graph Representation Learning

  • paper_url: http://arxiv.org/abs/2310.14216
  • repo_url: https://github.com/fengshikun/unimap
  • paper_authors: Shikun Feng, Lixin Yang, Weiying Ma, Yanyan Lan
  • for: This paper aims to propose a universal molecular representation learning model that can effectively leverage both SMILES and graph representations for drug-related applications.
  • methods: The proposed model, UniMAP, uses an embedding layer to obtain token and node/edge representations in SMILES and graph, respectively, followed by a multi-layer Transformer to conduct deep cross-modality fusion. The model is pre-trained on four tasks: Multi-Level Cross-Modality Masking, SMILES-Graph Matching, Fragment-Level Alignment, and Domain Knowledge Learning.
  • results: UniMAP outperforms current state-of-the-art pre-training methods on various downstream tasks, including molecular property prediction, drug-target affinity prediction, and drug-drug interaction. The learned representations are also visualized to demonstrate the effect of multi-modality integration.
    Abstract Molecular representation learning is fundamental for many drug related applications. Most existing molecular pre-training models are limited to a single molecular modality, either SMILES or graph representation. To effectively leverage both modalities, we argue that it is critical to capture the fine-grained 'semantics' between SMILES and graph, because subtle sequence/graph differences may lead to contrary molecular properties. In this paper, we propose a universal SMILES-graph representation learning model, namely UniMAP. Firstly, an embedding layer is employed to obtain the token and node/edge representations in SMILES and graph, respectively. A multi-layer Transformer is then utilized to conduct deep cross-modality fusion. Specifically, four kinds of pre-training tasks are designed for UniMAP, including Multi-Level Cross-Modality Masking (CMM), SMILES-Graph Matching (SGM), Fragment-Level Alignment (FLA), and Domain Knowledge Learning (DKL). In this way, both global (i.e. SGM and DKL) and local (i.e. CMM and FLA) alignments are integrated to achieve comprehensive cross-modality fusion. We evaluate UniMAP on various downstream tasks, i.e. molecular property prediction, drug-target affinity prediction and drug-drug interaction. Experimental results show that UniMAP outperforms current state-of-the-art pre-training methods. We also visualize the learned representations to demonstrate the effect of multi-modality integration.
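Among the four pre-training tasks, SMILES-Graph Matching (SGM) is the most self-contained to illustrate. A common formulation, sketched below as an assumption about the objective (matched pairs vs. in-batch shuffles; the encoders and head are placeholders):

```python
import torch
import torch.nn.functional as F

def sgm_loss(smiles_emb, graph_emb, head):
    """smiles_emb, graph_emb: (B, D) pooled embeddings of the same molecules.
    Binary matching loss: matched pairs are positives, shuffled pairs negatives
    (a fixed point in the shuffle can yield a rare spurious negative; ignored
    here for brevity)."""
    b = smiles_emb.size(0)
    perm = torch.randperm(b)
    pos = torch.cat([smiles_emb, graph_emb], dim=-1)         # matched pairs
    neg = torch.cat([smiles_emb, graph_emb[perm]], dim=-1)   # mismatched pairs
    logits = head(torch.cat([pos, neg], dim=0)).squeeze(-1)  # (2B,)
    labels = torch.cat([torch.ones(b), torch.zeros(b)])
    return F.binary_cross_entropy_with_logits(logits, labels)

head = torch.nn.Linear(2 * 128, 1)
loss = sgm_loss(torch.randn(8, 128), torch.randn(8, 128), head)
```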

Item-Graph2vec: an Efficient and Effective Approach using Item Co-occurrence Graph Embedding for Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2310.14215
  • repo_url: https://github.com/cpu135/item-graph2vec
  • paper_authors: Ruilin Yuan, Leya Li, Yuanzhe Cai
  • for: improving the efficiency of large-scale item-based recommendation systems
  • methods: Item-Graph2vec, an item co-occurrence graph embedding algorithm based on random walks
  • results: unlike Item2vec, Item-Graph2vec has a stable runtime on large-scale datasets; it is about 3 times more efficient on the Douban dataset, with only a small error introduced by random-walk sampling
    Abstract Current item-item collaborative filtering algorithms based on artificial neural networks, such as Item2vec, have become ubiquitous and are widely applied in modern recommender systems. However, these approaches do not scale to large item-based recommendation systems because of their extremely long training time. To overcome the high training time cost and poor stability of current algorithms on large-scale datasets, we describe the item graph embedding algorithm Item-Graph2vec. This algorithm transforms users' shopping lists into an item co-occurrence graph, obtains item sequences through random walks on this co-occurrence graph, and finally trains item vectors on the sequence samples. We posit that because the set of items is stable in size, the size and density of the item co-occurrence graph change only slightly as the training corpus grows. Therefore, Item-Graph2vec has a stable runtime on large-scale datasets, and its performance advantage becomes more and more pronounced as the training corpus grows. Extensive experiments conducted on real-world datasets demonstrate that Item-Graph2vec outperforms Item2vec by a factor of 3 in terms of efficiency on the Douban dataset, while the error introduced by random-walk sampling remains small.
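The pipeline described in the abstract (co-occurrence graph, random walks, skip-gram training) maps directly onto standard tooling. A minimal sketch, with toy baskets and illustrative hyperparameters rather than the paper's settings:

```python
import random
from itertools import combinations
import networkx as nx
from gensim.models import Word2Vec

baskets = [["milk", "bread", "eggs"], ["bread", "butter"], ["milk", "eggs"]]

# 1. Item co-occurrence graph: edge weight = number of baskets sharing both items.
g = nx.Graph()
for basket in baskets:
    for a, b in combinations(set(basket), 2):
        if g.has_edge(a, b):
            g[a][b]["weight"] += 1
        else:
            g.add_edge(a, b, weight=1)

# 2. Item sequences via weighted random walks on the graph.
def walk(graph, start, length, rng):
    seq = [start]
    for _ in range(length - 1):
        nbrs = list(graph[seq[-1]])
        weights = [graph[seq[-1]][n]["weight"] for n in nbrs]
        seq.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return seq

rng = random.Random(0)
walks = [walk(g, n, 10, rng) for n in g.nodes for _ in range(5)]

# 3. Train item vectors with skip-gram on the walk corpus.
model = Word2Vec(walks, vector_size=32, window=3, min_count=1, sg=1)
print(model.wv.most_similar("milk", topn=2))
```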

LUNA: A Model-Based Universal Analysis Framework for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.14211
  • repo_url: None
  • paper_authors: Da Song, Xuan Xie, Jiayang Song, Derui Zhu, Yuheng Huang, Felix Juefei-Xu, Lei Ma
  • for: providing a universal analysis framework for large language models (LLMs) to evaluate their trustworthiness from multiple perspectives
  • methods: the proposed framework, LUNA, leverages various abstract model construction methods and defines evaluation metrics to assess the quality of the abstract model and the semantics of the LLM
  • results: the framework enables versatile analysis of LLMs from multiple quality perspectives in a human-interpretable manner, and can be used to evaluate the trustworthiness of LLMs in various industrial domains
    Abstract Over the past decade, Artificial Intelligence (AI) has had great success and is being used in a wide range of academic and industrial fields. More recently, LLMs have made rapid advancements that have propelled AI to a new level, enabling even more diverse applications and industrial domains with intelligence, particularly in areas like software engineering and natural language processing. Nevertheless, a number of emerging trustworthiness concerns and issues exhibited in LLMs have recently received much attention; without properly addressing them, the widespread adoption of LLMs could be greatly hindered in practice. The distinctive characteristics of LLMs, such as the self-attention mechanism, extremely large model scale, and autoregressive generation schema, differ from classic AI software based on CNNs and RNNs and present new challenges for quality analysis. Up to the present, the field still lacks universal and systematic analysis techniques for LLMs despite the urgent industrial demand. Towards bridging this gap, we initiate an early exploratory study and propose a universal analysis framework for LLMs, LUNA, designed to be general and extensible, to enable versatile analysis of LLMs from multiple quality perspectives in a human-interpretable manner. In particular, we first leverage the data from desired trustworthiness perspectives to construct an abstract model as an auxiliary analysis asset, which is empowered by various abstract model construction methods. To assess the quality of the abstract model, we collect and define a number of evaluation metrics, aiming at both the abstract model level and the semantics level. Then, the semantics, which is the degree of satisfaction of the LLM w.r.t. the trustworthiness perspective, is bound to and enriches the abstract model, which enables more detailed analysis applications for diverse purposes.
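One concrete instance of "abstract model construction" in this spirit is to cluster per-token hidden states into discrete abstract states and estimate a Markov transition matrix over them. The sketch below is our illustration of that idea, with random features standing in for real LLM hidden states:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# hidden[i] = sequence of hidden states (T_i, D) for one generated output.
hidden = [rng.normal(size=(20, 32)) for _ in range(50)]

k = 8
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(hidden))

trans = np.zeros((k, k))
for seq in hidden:
    states = km.predict(seq)
    for s, t in zip(states[:-1], states[1:]):
        trans[s, t] += 1
row = trans.sum(axis=1, keepdims=True)
trans = trans / np.maximum(row, 1.0)   # row-stochastic DTMC over abstract states
# Evaluation metrics (e.g. state coverage, likelihood of held-out traces) would
# then score how faithfully this abstraction tracks the LLM's behavior.
```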

CXR-LLaVA: Multimodal Large Language Model for Interpreting Chest X-ray Images

  • paper_url: http://arxiv.org/abs/2310.18341
  • repo_url: https://github.com/ecofri/cxr_llava
  • paper_authors: Seowoo Lee, M. D., Jiwon Youn, Mansu Kim Ph. D., Soon Ho Yoon, M. D. Ph. D
  • for: developing an open-source multimodal large language model for interpreting chest X-ray images (CXR-LLaVA)
  • methods: trained on 659,287 publicly available chest X-rays: 417,336 labeled for specific radiographic abnormalities (dataset 1) and 241,951 with free-text radiology reports (dataset 2). After pre-training ResNet50 as the image encoder, contrastive language-image pre-training aligned CXRs with their radiographic abnormalities; the model was then fine-tuned on dataset 2, refined with GPT-4 to generate diverse question-answering scenarios. Code: https://github.com/ECOFRI/CXR_LLaVA
  • results: performance varied with model parameters; on the test set, an average F1 score of 0.34 for five pathologic findings (atelectasis, cardiomegaly, consolidation, edema, and pleural effusion), improved to 0.46 through prompt engineering; an average F1 score of 0.30 on the independent set; on a pediatric chest radiograph dataset unseen during training, the model differentiated abnormal radiographs with F1 scores of 0.84-0.85
    Abstract Purpose: Recent advancements in large language models (LLMs) have expanded their capabilities in a multimodal fashion, potentially replicating the image interpretation of human radiologists. This study aimed to develop open-source multimodal large language model for interpreting chest X-ray images (CXR-LLaVA). We also examined the effect of prompt engineering and model parameters such as temperature and nucleus sampling. Materials and Methods: For training, we collected 659,287 publicly available CXRs: 417,336 CXRs had labels for certain radiographic abnormalities (dataset 1); 241,951 CXRs provided free-text radiology reports (dataset 2). After pre-training the Resnet50 as an image encoder, the contrastive language-image pre-training was used to align CXRs and corresponding radiographic abnormalities. Then, the Large Language Model Meta AI-2 was fine-tuned using dataset 2, which were refined using GPT-4, with generating various question answering scenarios. The code can be found at https://github.com/ECOFRI/CXR_LLaVA. Results: In the test set, we observed that the model's performance fluctuated based on its parameters. On average, it achieved F1 score of 0.34 for five pathologic findings (atelectasis, cardiomegaly, consolidation, edema, and pleural effusion), which was improved to 0.46 through prompt engineering. In the independent set, the model achieved an average F1 score of 0.30 for the same pathologic findings. Notably, for the pediatric chest radiograph dataset, which was unseen during training, the model differentiated abnormal radiographs with an F1 score ranging from 0.84 to 0.85. Conclusion: CXR-LLaVA demonstrates promising potential in CXR interpretation. Both prompt engineering and model parameter adjustments can play pivotal roles in interpreting CXRs.
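The contrastive language-image pre-training step that aligns CXR embeddings with text embeddings of their radiographic labels is the standard CLIP-style symmetric InfoNCE objective. A minimal sketch with placeholder encoders (the embeddings would come from the ResNet50 image encoder and a text encoder):

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(img.size(0))         # i-th image matches i-th text
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = clip_loss(torch.randn(16, 512), torch.randn(16, 512))
```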

Learning to Discern: Imitating Heterogeneous Human Demonstrations with Preference and Representation Learning

  • paper_url: http://arxiv.org/abs/2310.14196
  • repo_url: None
  • paper_authors: Sachit Kuhar, Shuo Cheng, Shivang Chopra, Matthew Bronars, Danfei Xu
  • for: addressing the challenges of maintaining the quality of collected data and the suboptimal nature of some demonstrations in practical imitation learning (IL) systems
  • methods: Learning to Discern (L2D), an offline imitation learning framework that learns a latent representation for temporally embedded trajectory segments and uses preference learning to evaluate and learn from demonstrations of diverse quality and style
  • results: L2D effectively assesses and learns from demonstrations of varying quality and style, improving policy performance across a range of tasks in simulation and on a physical robot
    Abstract Practical Imitation Learning (IL) systems rely on large human demonstration datasets for successful policy learning. However, challenges lie in maintaining the quality of collected data and addressing the suboptimal nature of some demonstrations, which can compromise the overall dataset quality and hence the learning outcome. Furthermore, the intrinsic heterogeneity in human behavior can produce equally successful but disparate demonstrations, further exacerbating the challenge of discerning demonstration quality. To address these challenges, this paper introduces Learning to Discern (L2D), an offline imitation learning framework for learning from demonstrations with diverse quality and style. Given a small batch of demonstrations with sparse quality labels, we learn a latent representation for temporally embedded trajectory segments. Preference learning in this latent space trains a quality evaluator that generalizes to new demonstrators exhibiting different styles. Empirically, we show that L2D can effectively assess and learn from varying demonstrations, thereby leading to improved policy performance across a range of tasks in both simulations and on a physical robot.
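Preference learning over latent segment embeddings is typically trained with a Bradley-Terry objective so that segments from higher-quality demonstrations outscore lower-quality ones. A minimal sketch under that assumption (encoder and data are placeholders, not the paper's architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scorer = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

def preference_loss(z_better, z_worse):
    """z_better, z_worse: (B, 64) latent embeddings of segment pairs, where
    the first segment of each pair is labeled higher quality."""
    margin = scorer(z_better) - scorer(z_worse)   # (B, 1)
    return -F.logsigmoid(margin).mean()           # Bradley-Terry loss

loss = preference_loss(torch.randn(32, 64), torch.randn(32, 64))
# At deployment, scorer(z) ranks segments from unseen demonstrators so that
# only high-scoring data is used for behavior cloning.
```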

PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation

  • paper_url: http://arxiv.org/abs/2310.14192
  • repo_url: https://github.com/servicenow/promptmix-emnlp-2023
  • paper_authors: Gaurav Sahu, Olga Vechtomova, Dzmitry Bahdanau, Issam H. Laradji
  • for: improving text classification accuracy and efficiency when training data is limited
  • methods: uses large language models (LLMs) such as GPT3 to generate new examples from the available ones, exploiting the LLM's instruction-following and few-shot classification abilities to produce more useful data augmentations
  • results: generating and relabeling borderline examples transfers knowledge from large LLMs like GPT3.5-turbo into smaller, cheaper classifiers; 2-shot PromptMix outperforms multiple 5-shot data augmentation methods on four text classification datasets (Banking77, TREC6, Subjectivity (SUBJ), and Twitter Complaints)
    Abstract Data augmentation is a widely used technique to address the problem of text classification when there is a limited amount of training data. Recent work often tackles this problem using large language models (LLMs) like GPT3 that can generate new examples given already available ones. In this work, we propose a method to generate more helpful augmented data by utilizing the LLM's abilities to follow instructions and perform few-shot classifications. Our specific PromptMix method consists of two steps: 1) generate challenging text augmentations near class boundaries; however, generating borderline examples increases the risk of false positives in the dataset, so we 2) relabel the text augmentations using a prompting-based LLM classifier to enhance the correctness of labels in the generated data. We evaluate the proposed method in challenging 2-shot and zero-shot settings on four text classification datasets: Banking77, TREC6, Subjectivity (SUBJ), and Twitter Complaints. Our experiments show that generating and, crucially, relabeling borderline examples facilitates the transfer of knowledge of a massive LLM like GPT3.5-turbo into smaller and cheaper classifiers like DistilBERT$_{base}$ and BERT$_{base}$. Furthermore, 2-shot PromptMix outperforms multiple 5-shot data augmentation methods on the four datasets. Our code is available at https://github.com/ServiceNow/PromptMix-EMNLP-2023.
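The two-step generate-and-relabel flow can be sketched with any instruction-following model behind a `llm(prompt) -> str` callable; the prompts below are illustrative paraphrases, not the paper's exact templates:

```python
def promptmix(llm, class_a, class_b, seed_examples):
    # Step 1: generate a borderline example near the class boundary.
    gen_prompt = (
        f"Here are examples of '{class_a}': {seed_examples[class_a]}\n"
        f"Here are examples of '{class_b}': {seed_examples[class_b]}\n"
        f"Write a sentence that is 60% '{class_a}' and 40% '{class_b}'.")
    borderline = llm(gen_prompt)
    # Step 2: relabel with the LLM acting as a few-shot classifier, since
    # borderline generations are at high risk of carrying the wrong label.
    relabel_prompt = (
        f"Classify the following sentence as '{class_a}' or '{class_b}'.\n"
        f"Sentence: {borderline}\nLabel:")
    label = llm(relabel_prompt).strip()
    return borderline, label

# The resulting (borderline, label) pairs then train a small student
# classifier such as DistilBERT.
```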

Randomized Forward Mode of Automatic Differentiation for Optimization Algorithms

  • paper_url: http://arxiv.org/abs/2310.14168
  • repo_url: None
  • paper_authors: Khemraj Shukla, Yeonjong Shin
  • for: optimizing neural network parameters with a randomized forward-mode alternative to backpropagation
  • methods: updates parameters using directional derivatives of the loss, computed efficiently by forward-mode automatic differentiation (Jacobian-vector products) along random directions sampled from Bernoulli, Normal, Wigner, Laplace, or Uniform distributions; the gradient estimate is obtained during the forward pass
  • results: a rigorous convergence-rate analysis, together with computational experiments in scientific machine learning, in particular physics-informed neural networks and Deep Operator Networks
    Abstract Backpropagation within neural networks leverages a fundamental element of automatic differentiation, referred to as reverse-mode differentiation, the vector-Jacobian product (VJP), or, in the context of differential geometry, the pull-back process. The computation of the gradient is important, as the update of neural network parameters is performed using gradient descent. In this study, we present a generic randomized method that updates the parameters of neural networks using directional derivatives of loss functions computed efficiently by forward-mode AD, the Jacobian-vector product (JVP). These JVPs are computed along random directions sampled from different probability distributions, e.g., the Bernoulli, Normal, Wigner, Laplace, and Uniform distributions. The computation of the gradient is performed during the forward pass of the neural network. We also present a rigorous analysis of the proposed methods, providing rates of convergence, along with computational experiments in scientific machine learning, in particular physics-informed neural networks and Deep Operator Networks.
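The randomized forward-gradient update is compact enough to show in full: sample a random direction v, compute the directional derivative with a single JVP (no backward pass), and step along g = (Jv)·v, which is an unbiased gradient estimator when E[vvᵀ] = I (e.g., standard normal or Rademacher directions). A minimal sketch on a toy quadratic loss:

```python
import torch
from torch.func import jvp

def loss(w):
    return (w ** 2).sum()

w = torch.randn(10)
lr = 0.01
for _ in range(100):
    v = torch.randn_like(w)            # Normal direction; Bernoulli/Rademacher,
                                       # Uniform, etc. also fit the framework
    _, dirderiv = jvp(loss, (w,), (v,))   # scalar directional derivative, forward mode
    w = w - lr * dirderiv * v             # forward-gradient step
print(loss(w))                            # decreases toward 0
```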

Graph Convolutional Network with Connectivity Uncertainty for EEG-based Emotion Recognition

  • paper_url: http://arxiv.org/abs/2310.14165
  • repo_url: None
  • paper_authors: Hongxiang Gao, Xiangyao Wang, Zhenghua Chen, Min Wu, Zhipeng Cai, Lulu Zhao, Jianqing Li, Chengyu Liu
  • for: improving automatic emotion recognition for human-computer interaction using multichannel electroencephalography (EEG) signals
  • methods: a distribution-based uncertainty method on a graph convolutional network (GCN) architecture, combined with the graph mixup technique and deep GCN weights (Connectivity Uncertainty GCN, CU-GCN)
  • results: outperforms previous methods on two widely used datasets (SEED and SEEDIV), with positive and significant improvements
    Abstract Automatic emotion recognition based on multichannel Electroencephalography (EEG) holds great potential in advancing human-computer interaction. However, several significant challenges persist in existing research on algorithmic emotion recognition. These challenges include the need for a robust model to effectively learn discriminative node attributes over long paths, the exploration of ambiguous topological information in EEG channels and effective frequency bands, and the mapping between intrinsic data qualities and provided labels. To address these challenges, this study introduces the distribution-based uncertainty method to represent spatial dependencies and temporal-spectral relativeness in EEG signals based on Graph Convolutional Network (GCN) architecture that adaptively assigns weights to functional aggregate node features, enabling effective long-path capturing while mitigating over-smoothing phenomena. Moreover, the graph mixup technique is employed to enhance latent connected edges and mitigate noisy label issues. Furthermore, we integrate the uncertainty learning method with deep GCN weights in a one-way learning fashion, termed Connectivity Uncertainty GCN (CU-GCN). We evaluate our approach on two widely used datasets, namely SEED and SEEDIV, for emotion recognition tasks. The experimental results demonstrate the superiority of our methodology over previous methods, yielding positive and significant improvements. Ablation studies confirm the substantial contributions of each component to the overall performance.
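Of the named components, graph mixup is the easiest to illustrate. For EEG graphs that share a fixed channel layout, one common formulation (our assumption of the technique named in the abstract, with illustrative shapes) interpolates node features, adjacency, and labels of two training graphs:

```python
import numpy as np

def graph_mixup(x1, a1, y1, x2, a2, y2, alpha=0.2, rng=None):
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2    # node (channel) features
    a = lam * a1 + (1 - lam) * a2    # functional-connectivity adjacency
    y = lam * y1 + (1 - lam) * y2    # soft emotion label
    return x, a, y

# e.g. 62 EEG channels with 5 band-power features each (SEED-like layout)
x, a, y = graph_mixup(np.random.rand(62, 5), np.random.rand(62, 62),
                      np.array([1.0, 0.0, 0.0]),
                      np.random.rand(62, 5), np.random.rand(62, 62),
                      np.array([0.0, 1.0, 0.0]))
```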

Augmenting End-to-End Steering Angle Prediction with CAN Bus Data

  • paper_url: http://arxiv.org/abs/2310.14162
  • repo_url: None
  • paper_authors: Rohan Gupta
  • for: improving end-to-end steering angle prediction for autonomous vehicles without the cost of LiDAR or radar sensors
  • methods: sensor-fuses CAN bus data (speed, steering angle, acceleration) with video data to improve computer vision models
  • results: fusing CAN bus data reduces the computer vision model's prediction error, with RMSE dropping from 0.02492 to 0.01970 (about 20%, and up to 80% for some models)
    Abstract In recent years, end-to-end steering prediction for autonomous vehicles has become a major area of research. The primary method for achieving end-to-end steering is to use computer vision models on a live feed of video data. To further increase accuracy, many companies have added data from light detection and ranging (LiDAR) and/or radar sensors through sensor fusion. However, the addition of lasers and sensors comes at a high financial cost. In this paper, I address both of these issues by increasing the accuracy of the computer vision models without the added cost of LiDAR and/or radar sensors. I achieved this by fusing CAN bus data, a vehicle communication protocol, with video data. CAN bus data is a rich source of information about the vehicle's state, including its speed, steering angle, and acceleration. By fusing this data with video data, the accuracy of the computer vision model's predictions can be improved. When I trained the model without CAN bus data, I obtained an RMSE of 0.02492, while the model trained with the CAN bus data achieved an RMSE of 0.01970. This finding indicates that fusing CAN bus data with video data can reduce the computer vision model's prediction error by 20%, with some models decreasing the error by 80%.
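A straightforward way to realize the described fusion is to concatenate a CNN embedding of the camera frame with a small CAN-bus feature vector before the regression head. The architecture sizes below are illustrative, not the paper's model:

```python
import torch
import torch.nn as nn

class FusionSteeringNet(nn.Module):
    def __init__(self, can_dim=3):          # speed, steering angle, acceleration
        super().__init__()
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())            # -> (B, 32)
        self.head = nn.Sequential(
            nn.Linear(32 + can_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))                                 # steering angle

    def forward(self, frame, can):
        # Late fusion: concatenate vision embedding with CAN features.
        return self.head(torch.cat([self.vision(frame), can], dim=-1))

net = FusionSteeringNet()
angle = net(torch.randn(4, 3, 66, 200), torch.randn(4, 3))    # (4, 1)
```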

When Urban Region Profiling Meets Large Language Models

  • paper_url: http://arxiv.org/abs/2310.18340
  • repo_url: None
  • paper_authors: Yibo Yan, Haomin Wen, Siru Zhong, Wei Chen, Haodong Chen, Qingsong Wen, Roger Zimmermann, Yuxuan Liang
  • for: urban region profiling from web-sourced data, to support urban planning and sustainable development
  • methods: UrbanCLIP, the first LLM-enhanced framework to integrate the textual modality into urban imagery profiling: an open-source Image-to-Text LLM generates a detailed description for each satellite image, and the model is trained on the image-text pairs with a contrastive loss and a language modeling loss, yielding natural language supervision for urban visual representation learning
  • results: on predicting three urban indicators in four major Chinese metropolises, an average improvement of 6.1% in R^2 over state-of-the-art methods
    Abstract Urban region profiling from web-sourced data is of utmost importance for urban planning and sustainable development. We are witnessing a rising trend of LLMs for various fields, especially in multi-modal data research such as vision-language learning, where the text modality serves as supplementary information for the image. Since the textual modality has never been introduced into modality combinations in urban region profiling, we aim to answer two fundamental questions in this paper: i) Can the textual modality enhance urban region profiling? ii) If so, in what ways and with regard to which aspects? To answer these questions, we leverage the power of Large Language Models (LLMs) and introduce the first-ever LLM-enhanced framework that integrates the knowledge of the textual modality into urban imagery profiling, named LLM-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining (UrbanCLIP). Specifically, it first generates a detailed textual description for each satellite image with an open-source Image-to-Text LLM. Then, the model is trained on the image-text pairs, seamlessly unifying natural language supervision for urban visual representation learning, jointly with a contrastive loss and a language modeling loss. Results on predicting three urban indicators in four major Chinese metropolises demonstrate its superior performance, with an average improvement of 6.1% in R^2 compared to state-of-the-art methods. Our code and the image-language dataset will be released upon paper notification.
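The joint objective named in the abstract, an image-text contrastive term plus a language-modeling term on the LLM-generated captions, can be sketched as follows. All modules and shapes are placeholders under our assumptions:

```python
import torch
import torch.nn.functional as F

def joint_loss(img_emb, txt_emb, lm_logits, caption_ids, alpha=1.0):
    # Contrastive term (symmetric InfoNCE over the batch).
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    sim = img @ txt.t() / 0.07
    tgt = torch.arange(img.size(0))
    l_con = (F.cross_entropy(sim, tgt) + F.cross_entropy(sim.t(), tgt)) / 2
    # Language-modeling term: predict each caption token from its prefix.
    l_lm = F.cross_entropy(
        lm_logits[:, :-1].reshape(-1, lm_logits.size(-1)),
        caption_ids[:, 1:].reshape(-1))
    return l_con + alpha * l_lm

loss = joint_loss(torch.randn(8, 256), torch.randn(8, 256),
                  torch.randn(8, 12, 1000), torch.randint(0, 1000, (8, 12)))
```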

Are LSTMs Good Few-Shot Learners?

  • paper_url: http://arxiv.org/abs/2310.14139
  • repo_url: https://github.com/mikehuisman/lstm-fewshotlearning-oplstm
  • paper_authors: Mike Huisman, Thomas M. Moerland, Aske Plaat, Jan N. van Rijn
  • for: investigating whether LSTMs trained with backpropagation across tasks are good few-shot (meta-)learners, since deep learning otherwise requires large amounts of data to learn new tasks
  • methods: revisits the Hochreiter et al. (2001) LSTM meta-learning approach on modern few-shot learning benchmarks, and proposes Outer Product LSTM (OP-LSTM) to resolve its identified weaknesses
  • results: LSTMs surprisingly outperform MAML on a simple few-shot sine-wave regression benchmark, but, as expected, fall short on more complex few-shot image classification benchmarks; OP-LSTM resolves the identified issues, yielding competitive within-domain performance and 0.5% to 1.9% higher accuracy in cross-domain settings
    Abstract Deep learning requires large amounts of data to learn new tasks well, limiting its applicability to domains where such data is available. Meta-learning overcomes this limitation by learning how to learn. In 2001, Hochreiter et al. showed that an LSTM trained with backpropagation across different tasks is capable of meta-learning. Despite promising results of this approach on small problems, and more recently, also on reinforcement learning problems, the approach has received little attention in the supervised few-shot learning setting. We revisit this approach and test it on modern few-shot learning benchmarks. We find that LSTMs, surprisingly, outperform the popular meta-learning technique MAML on a simple few-shot sine wave regression benchmark, but that LSTMs, expectedly, fall short on more complex few-shot image classification benchmarks. We identify two potential causes and propose a new method called Outer Product LSTM (OP-LSTM) that resolves these issues and displays substantial performance gains over the plain LSTM. Compared to popular meta-learning baselines, OP-LSTM yields competitive performance on within-domain few-shot image classification, and performs better in cross-domain settings by 0.5% to 1.9% in accuracy score. While these results alone do not set a new state-of-the-art, the advances of OP-LSTM are orthogonal to other advances in the field of meta-learning, yield new insights into how LSTMs work in image classification, and open up a whole range of new research directions. For reproducibility purposes, we publish all our research code publicly.
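The Hochreiter-style LSTM meta-learner discussed above is simple to reproduce on the sine-wave benchmark: the network reads a support sequence of (x_t, y_{t-1}) pairs for one task and is trained, across many tasks, to predict y_t, so the learning algorithm itself ends up encoded in the recurrent weights. A minimal sketch with illustrative hyperparameters:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=2, hidden_size=64, batch_first=True)
head = nn.Linear(64, 1)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), 1e-3)

for step in range(2000):
    # Sample one sine task: y = A * sin(x + phase).
    A, phase = torch.rand(1) * 4 + 1, torch.rand(1) * 3.14
    x = torch.rand(1, 20, 1) * 10 - 5
    y = A * torch.sin(x + phase)
    # Input at step t is (x_t, y_{t-1}); the first step sees a zero label.
    y_prev = torch.cat([torch.zeros(1, 1, 1), y[:, :-1]], dim=1)
    out, _ = lstm(torch.cat([x, y_prev], dim=-1))   # (1, 20, 64)
    loss = ((head(out) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# After meta-training, later predictions within a sequence improve as the LSTM
# "learns" the new task from the earlier (x, y) pairs held in its hidden state.
```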